HANDHELD LIDAR ODOMETRY ESTIMATION AND MAPPING SYSTEM

(1)

V¨

aster˚

as, Sweden

Thesis for the Degree of Master of Science in Engineering - Robotics

30.0 credits

HANDHELD LIDAR ODOMETRY

ESTIMATION AND MAPPING

SYSTEM

Niclas Holmqvist

nht13002@student.mdh.se

Examiner: Masoud Daneshtalab

M¨

alardalen University, V¨

aster˚

as, Sweden

Supervisor: Carl Ahlberg

M¨

alardalen University, V¨

aster˚

as, Sweden

Company supervisor: Daniel Adolfsson,

¨

Orebro University, ¨

Orebro, Sweden

(2)

1 Abstract

Ego-motion sensors are commonly used for pose estimation in Simultaneous Localization And Map-ping (SLAM) algorithms. Inertial Measurement Units (IMUs) are popular sensors but suffer from integration drift over longer time scales. To remedy the drift they are often used in combination with additional sensors, such as a LiDAR. Pose estimation is used when scans, produced by these additional sensors, are being matched. The matching of scans can be computationally heavy as one scan can contain millions of data points. Methods exist to simplify the problem of finding the relative pose between sensor data, such as the Normal Distribution Transform SLAM algorithm. The algorithm separates the point cloud data into a voxelgrid and represent each voxel as a normal distribution, effectively decreasing the amount of data points. Registration is based on a function which converges to a minimum. Sub-optimal conditions can cause the function to converge at a local minimum. To remedy this problem this thesis explores the benefits of combining IMU sensor data to estimate the pose to be used in the NDT SLAM algorithm.

(3)

2 Introduction

Simultaneous Localization And Mapping (SLAM) is a method, commonly used in robots, for constructing a map of the surroundings while keeping track of where in this map the robot is. A common approach to provide the SLAM algorithm with sufficient data is to combine data input from two different types of sensors [1] [2]. One sensor produces data which can represent the surrounding environment while another complements this data by estimating the ego-motion of the sensor. The task of estimating sensor ego-motion has important applications in various fields, such as augmented/virtual reality or autonomous robot control. Ego-motion can be measured by keeping track of the rotation of the wheels of the robot or by measuring the acceleration of the body of the robot.

Inertial Measurement Unit (IMU) is a term used for sensors which combine, usually three of each, accelerometers, gyroscopes, and magnetometers. Accelerometers are sensors which measure the external specific force acting on the sensor, including the gravity pull of the earth. Gyroscopes measure the rate of change of the sensor’s orientation. The magnetometers measures the heading of the sensor by measuring the Earth’s magnetic field. By combining the data from these sensors it is possible to estimate the orientation of the sensor and, for short periods of time, even the translation of the sensor. These types of sensors are subject to extensive research recent years and are continuously improved in form of cost effectiveness, accuracy, power consumption, start-up time, and size. This causes them to be common for a wide variety of applications such as smart-phones, virtual reality headsets, and body-motion capture applications [3]. The common challenge in estimating the orientation and position, usually denoted in combination as pose of the body, to which the sensors are attached, is to compensate for the erroneous estimations which occurs as an effect of integrating the noisy and biased data outputted by an IMU [4].

This thesis aims to improve and test one SLAM algorithm based on Normal Distributions Transform (NDT) [5]. The NDT algorithm uses exteroceptive sensor data, often provided in the form of a point cloud, to compute Gaussian distributions across the environment discretized as a voxelgrid. When not using any ego-motion sensors the algorithm predicts its pose to be the last known pose. This approach is known as zero motion prediction and is an erroneous estimation as long as the system has a velocity. By estimating the pose through the use of IMU data the accuracy of pose estimation can be improved which can in turn improve the performance of the algorithm. A more in detail introduction to this field of work is presented in Section 3. In Section 4 the problem formulation and proposed solution to the problem is described. A detailed description of the different parts of the system developed for this thesis is described in Section 5. How the different steps of reaching the results (which are presented in Section 7) were carried out can be found in Section 6. And finally, a discussion of the results and a conclusion of the work done can be found in Section 8 and 9 respectively.

(5)

3 Background

There are many different algorithms which can solve the SLAM problem. To mention a few there are for example Rao-Blackwellized Particle Filters [6], kalman filters which were among the early approaches to solve SLAM, and GraphSLAM [7], which is probably the most commonly used SLAM method today. The Normal Distribution Transform (NDT) was initially developed for 2D range scans but has been improved to work for 3D scans [8]. The 3D-NDT SLAM method is compared to another popular method for doing registration, the Iterative Closest Point (ICP) method, and results show, among other things, that the NDT stores the 3D-map generated more efficiently than traditional point cloud maps while still being able to do registration.

In [9] a mono camera, a 3D-LiDAR, a GPS, and an IMU is used to estimate the trajectory when flying underneath a bridge where the GPS signal was blocked. The authors made use of a Graph-Based SLAM, composed of a node and a constraint, which made it possible to fuse the pose results of each sensor. This made pose estimation possible even when GPS signal was blocked. In [1] two different types of cameras was used, one standard camera and one event based camera. Results show that the two types of cameras complement one and other to yield a robust state estimation pipeline in combination with an IMU to estimate pose. In [10] the data from a camera, a LiDAR, and an IMU is fused using an Extended Kalman Filter. The initial pose estimation from the IMU is corrected by the position estimates from the camera and LiDAR. By constantly testing different combinations of both sensors and algorithms the process of improving the future localization and mapping techniques may continue.

Which algorithm to use is often dictated by the sensors. To receive information about the surroundings a robot needs exteroceptive sensors, such as a camera or a LiDAR. These types of sensors makes it possible for the robot to receive data with which it can create a representation of its surroundings. The exteroceptive sensor is often coupled with interoceptive sensors such as wheel encoders or IMUs to complement the state estimation of the robot. While these interoceptive sen-sors complement an exteroceptive sensor well, they are poor units by themselves when estimating pose. The sensor data produced by IMUs tend to drift heavily when integrating the data out-putted by the IMU to find position and translation. Slipping of the wheels cause ego motion from wheel encoders to drift. This makes either interoceptive sensor rarely used for navigation purposes without utilizing additional sensors. The position of a pedestrian can be tracked with nothing but an IMU. The advantage of tracking a pedestrian in particular is that the IMU will occasionally have zero velocity. This happens if the IMU is positioned on the foot, allowing velocity estimates to reset each step. This reduces the problem of position and orientation estimates drifting [11]. The wheel and motor encoders has been used extensively in areas unrelated to robotics for years which has resulted in accurate hardware. This research has fostered the evolution of robotics [12]. A combination of an exteroceptive sensor and an interoceptive sensor, for example a camera and an IMU, which is popular when trying to solve the SLAM problem. The use of IMU sensors are becoming more popular as an effect of how the competitive consumer market is causing them to become available at a lower price [13] [2]. This makes it possible to include several IMUs without the price of the system going through the roof. However, low accuracy is one pitfall of the cheap IMUs which has spurred the research in combining the data from several IMUs to achieve good estimations using low priced hardware. On the other hand, state of the art IMUs are also improving as an effect of the demand and extensive research being made in the area.

3.1 Normal Distribution Transform

The NDT algorithm was first introduced in 2003 and has since then been developed and improved extensively. The fundamental routine of the algorithm is that it divides the space scanned into equally sized voxels creating a 3D voxelgrid. The point cloud collected by a scan is then discretized into this voxelgrid. Depending on if the number of data points assigned to a voxel surpasses a certain threshold of data points from the point cloud a Gaussian probability density function is calculated for that voxel. This results in a sparse representation of the scan in which the data points from the point cloud are now represented by normal distributions. An example depicting the two types of maps, produced by the NDT SLAM algorithm, can be seen in figure1.

(6)

is done by finding the transformation between two point sets, M and F , by minimizing the sum of distances between pairs of Gaussian distributions. The transformation between two point sets can be found by minimizing:

fD2D(p) = nM,nF X i=1,j=i −d1exp −d2 2 µ T ij(R T_{CiR + Cj)}−1_µ ij (1) Where p are the transformation parameters over which the function is minimized. R and t are the rotation and translation components of p. µi Ci are the mean and covariance of each Gaussian component. µij = Rµi+ t − µj is the transformed mean vector distance. d1 and d2 are regularization factors (constants).

This function can utilize an initial estimated pose p0 provided by ego motion estimates. The registration between two scans is computed with the initial estimation as starting-point. If p0 is far off the registration will find a sub-optimal solution, alternatively not find a solution at all. On the other hand, if p0is a near perfect guess the convergence is straightforward and accurate.

To make the algorithm less prone to finding sub-optimal results multi-resolution registration can be used [14]. Multi-resolution is a method which utilizes several different resolutions of the voxelgrid when computing registration to converge to a better minimum. Each registration event is computed at a coarser resolution at first and, in an iterative manner, computes registration for finer resolutions. This method is computationally heavy but benefits the algorithm to estimate translation between scans.

The NDT algorithm has been developed and expanded furthermore to include Occupancy Maps [15] [16], performance improvement by clustering and ground segmentation [17], and incorporating odometry in registration [18].

Figure 1: A comparison of the two types of maps; NDT map to the left and point cloud map to the right. The maps were created processing data which were acquired during this thesis work. The NDT algorithm used a multi-resolution of 0.5m3while utilizing IMU pose prediction. As can be seen, the dense point cloud is sparsely represented by normal distributions.

(7)

4 Problem formulation

The NDT algorithm suffers from the need to converge to the correct solution to function properly. If the resolution of the voxelgrid is high in comparison to the error between estimation and actual solution or if there are voxels which are inconsistent the risk of converging to the wrong solution increases. If the algorithm gets stuck in a local minimum the registration might derail and obscure the map. This is commonly solved by increasing the cell size of the voxels in the map, an effect achieved through multi-resolution. The following questions are to be answered in this thesis work. • How is sensor data, outputted by an IMU as linear acceleration and rotation, integrated to

provide initial estimation?

• How can the NDT SLAM algorithm avoid converging to a local minimum? • What are the benefits for complementing LiDAR range scans with IMU data?

To address these questions a data set containing data from both a LiDAR and an IMU is required. To evaluate the method an additional sensor for ground truth is required. The ground truth is obtained by including measured GPS positions of the system in the data set. To acquire this data set a hand-held physical device, with synchronized hardware, was designed. The hand-held system will be covered in the following section.

(8)

5 System Design

In this section a description of the key parts of the hand-held sensory system, developed for pose estimation and LiDAR scans, is presented. The IMU, which is described in more detail than the other sensors because of its central role for this thesis; producing data for pose estimation. The LiDAR sensor produces point cloud data suitable to be used when doing SLAM based on NDT. A description of the hand-held model which holds all the sensors and a brief description of Robot Operating System (ROS) can be found in this section. To evaluate performance of the system a trajectory will be recorded using GPS simultaneously to the SLAM. The GPS positions produced are not used for pose estimation but will instead be used as ground truth to which the estimated trajectory will be compared to produce an absolute trajectory error (ATE). A camera was attached to the system and used by the author as a tool to easier distinguish data sets during the thesis.

5.1 Inertial Measurement Unit

The IMU is a sensor used to determine orientation and acceleration. Generally an IMU consists of a combination of sensors of the types accelerometers, gyroscopes and magnetometers. One accelerometer, one gyroscope and one magnetometer is used for each orthogonal axis to measure the linear acceleration, angular velocity, and general heading. A general description of the sensors follow.

5.1.1 Gyroscopes

The three main types of Gyroscopes are the mechanical gyros, fibre optic gyroscope and the micro-machined electromechanical systems (MEMS).

The mechanical gyroscope is based on the same effect which makes bicycles stable when moving forward. The rotation of a wheel naturally resists changes in orientation. By having the wheel spinning mid air, held up by gimbals to make it free to rotate in all three axes, it is possible to measure the angle of the gimbals when the gyroscope is subjected to rotation. This delicate mechanical device measures absolute orientation, but is prone to errors over time due to friction, and requires some time to get the mechanical parts moving at startup.

Fibre optics is a medium with total internal reflection through which light travels. The gyro-scope based on fibre optics uses a coil of fibre optics, through which a beam of light is shone into each end of the coil. When the beams exit the coil at the opposite ends the rotation of the coil can be measured from the combined intensity of the light beams. The beam that was shone into the coil in the opposite direction of the rotation will exit the coil sooner than the opposite beam. A larger coil produces better results, which causes the technique to be problematic when size is a limiting factor, especially so for embedded sensors.

The MEMS gyroscope is based on the Coriolis effect which states that an object moving in a rotating frame of reference causes an inertial force to act on the object. The effect is described by F = −2mwv. Where F is the force, m is the mass and v is the velocity of the object and w is the angular rotation of the frame of reference. To measure the Coriolis effect the MEMS gyroscopes utilizes a vibrating medium which vibrates along only one axis. When the medium is affected by the Coriolis effect, caused by rotation of the gyroscope, a force can be measured along the axis perpendicular to the vibrating axis, as depicted in Figure 2. The magnitude of these vibrations can be used to calculate the angular velocity.

5.1.2 Accelerometers

Accelerometers can be classified as either mechanical or as a solid state devices. There are also MEMS-accelerometers that can utilize either mechanical or solid state principles.

Mechanical linear accelerometers are usually primitive using nothing but a spring and a dis-placement sensor. When a mass attached to the spring moves, the disdis-placement measured gives the force actuated on the mass, which in turn makes it possible to calculate the acceleration of the sensor using Newton’s second law F = ma.

There are different solid state accelerometers. An example is the surface acoustic wave ac-celerometer where, at one end, a beam is rigidly attached to the base while it has, at the other

(9)

end, a mass attached to it which is free to move, as depicted in Figure3. When the sensor ac-celerates in the direction of the input axis the beam will bend, actuating resonators, which are placed on the beam, at perturbed frequencies proportional to the applied strain of the beam. These perturbed frequencies can be used to calculate the acceleration of the sensor.

Figure 2: Gyroscopes measure the force Fc caused by the Coriolis effect when the mass m is under the effect of both a velocity v and an angular velocity ω.

Figure 3: Accelerometers measure the change in frequencies of the resonators when the sensor accelerate along the input axis of the Surface Acoustic Wave sensor.

5.1.3 Magnetometers

Magnetometers are able to measure the heading of the sensor by the influence of the earths magnetic field. The most common technique is based on the Hall effect, which lets current flow through a conductive plate. When the plate is affected by the earths magnetic field the electrons will flow through the plate with a perturbed path. This causes one side of the conductive plate to have a positive charge and the other a negative charge. This means it is possible to measure a voltage across the plate, which can be be used to find magnetic north.

5.1.4 Estimating orientation and translation

Integration of the gyroscopes yields the orientation of the sensor. Double integration of the ac-celerometers yields translation. The acac-celerometers are subjected to the force of gravity of the earth. Compensation for this constant force, acting on the sensor, is based on the orientation attained from integration of the gyroscopes. The process of integrating data to achieve orientation

(10)

Figure 4: Dead-reckoning. Sensor output from gyroscopes and accelerometers are integrated to produce pose estimate.

and translation can be considered dead-reckoning and results in an estimated relative pose of the sensor. If the initial pose was perfectly known, and if the sensors output was not affected by any errors, the dead-reckoning process, depicted in Figure 4, would lead to perfect pose estimates. This is not the case as neither is true. The output from the sensors are noisy and subjected to a biased error, and if the gravity compensation contains errors a leakage of acceleration will occur, perturbing the measured accelerations caused by movements. These errors grow exponentially when integrated and quadratically when double integrated, which is the process of dead-reckoning. Dead-reckoning works only for short periods of time as the integrated error can drift for several meters in a matter of seconds [4]. To represent orientations different parametrizations can be used, such as Euler angles, rotation vector, rotation matrix, or quaternions. They all contain all the necessary information to transform vectors between frames. In this work quaternions and rotation matrices are used for the calculations. Euler angles are avoided as they suffer from the problem of gimbal lock.

5.1.5 Xsens Motion Tracker

The XSENS Motion Tracker MTi-30 IMU is used for this system. It uses accelerometers, gyro-scopes and magnetometers to detect linear headings where the magnetometer is used as a heading reference. Each sensor component inside the IMU is calibrated with high accuracy. When the IMU is calibrated during production it is subjected to a wide range of various physical effects, such as motion and temperature, to find correct calibration parameters. These parameters are then used when the IMU internally computes the orientation using a Kalman Filter. By combining the measurement of gravity from the accelerometers, and the Earth magnetic north measured by the magnetometers, it compensates for the constant drift caused by integration of gyroscope data. A system compensating orientation drift in this way is referred to as an Attitude and Heading Ref-erence System (AHRS). To measure gravity using the accelerometers the assumption is made that the average acceleration, caused by movement of the system, equals zero. This makes it possible to observe the direction of gravity to stabilize roll and pitch. Yaw is stabilized by utilizing the local magnetic field as a compass [19].

5.2 LiDAR

The NDT algorithm requires dense point clouds. It has been designed to work with laser range data. It could be possible to apply the algorithm on point cloud data derived from a camera sensor. The LiDAR creates more robust point clouds, will not be troubled by illumination problems, and does not require the same extent of preprocessing. The 3D Velodyne HDL-32E LiDAR was used to gather the point cloud data. The sensor has the shape of a cylinder where the top half spins at roughly 10Hz to enable 360◦horizontal field of view. The vertical field of view is +10◦ and −30◦. The sensor produces up to ∼ 1.39 millions points per second and can measure data at a distance of up to 100m [20].

5.3 Hand-held system

To keep all sensors in a fixed relation a hand-held part was designed to act as a backbone for the system. A CAD-model was created using the Computer Aided Design (CAD) software SolidWorks

(11)

Figure 5: A cross section of the handheld system. 1: LiDAR. 2: IMU. 3: 3D-printed part. 4: GPS antenna.

Figure 6: The hand-held system. The LiDAR can be seen standing on top of the backbone part of the system. The IMU can be seen inside. The USB-camera attached to the bottom of the unit was used only to distinguish data sets. GPS antenna is not present in the figure.

and exported and printed in PLA plastic. The design of the model considered placement of sensors to fall into suitable coordinate frames as well as rigidity for fixed relation between sensors. A cross section of the system is depicted in figure5 which show the positioning of each sensor in the hand-held unit. A picture of the system can be seen in figure6. In the picture the GPS antenna is not present. The hand-held system has attached to it the IMU, LiDAR, and the RTK-GPS rover antenna. The hand-held system is complemented by a backpack. All sensors are hooked up to a laptop computer which are carried in the backpack. The LiDAR is connected to the laptop with an ethernet cable via the LiDAR interface box. A 3-cell battery pack supplies power to the LiDAR and LiDAR-interface box.

5.4 Robot Operating System

ROS is an operating system framework designed primarily to make it easy to share tions. ROS provides services, such as hardware abstraction, low-level device control, implementa-tion of commonly used funcimplementa-tionality, message-passing between processes, and package management. Executables can be run as nodes within ROS. Each node can communicate with other nodes via the ROS communication infrastructure, either locally on one machine or across multiple machines. Messages which concern sensor data, pose state, control, and planning are examples which are

(12)

commonly used for communication between nodes. The primary programming languages used in the ROS framework is C++ and Python. Common implementations that are shared as open source comprise hardware drivers, SLAM algorithms, simulation tools, and more. For example, there are ready to use sensor node implementations in ROS that can launch a node which reads sensor data and publishes the data to a topic. This topic can in turn be subscribed to by any other node in the network to process the data1. ROS was used for acquiring the data sets used in this thesis. By launching one ROS node each for the IMU, LiDAR, RTK-GPS, and camera to have all the sensor data be published to topics. The data being published was then recorded and stored as a file on the computer running ROS by starting and stopping the recording manually.

5.5 RTK-GPS

The RTK-GPS C94-M8P-3 from ublox is used for this system. To provide centimeter level accuracy the RTK-GPS technique utilizes the measured phase of the signal between two stations and the GPS signal of both stations. It is a highly accurate positioning system and it is used to record a ground truth trajectory, used as a reference for evaluation.

A GPS receiver can find its position by measuring the time it takes for a signal to travel between the receiver and the satellites. The ionosphere and troposphere of the Earth affects the travel time of the signal. Structures and trees can partially block the signal and flat surfaces such as walls can cause the signal to bounce resulting in multipathing. To measure the time it takes for the signal to travel to the receiver a pseudo random binary sequence contained in the signal is aligned to a similar sequence in the receiver. Once the two signals are aligned the distance between the receiver and the GPS can be estimated. By combining the results from several satellites the receiver can calculate its position.

The RTK-GPS system consists of a base which is absolutely stationary and a rover that can move around. The base keeps track of its GPS position and sends out correction data to the rover, usually via radio or over internet. For this experiment a radio link is set up between the base and the rover (attached to the hand-held system). When broadcasting data via radio it is necessary for the antennas that are communicating to be in line of sight of each other. This can cause problem in rugged landscapes or when moving behind concrete buildings.

When setting up the base station a survey in needs to be made. The survey in lets the base find a position with high accuracy. The survey in process ends when both the predefined desired accuracy and the duration of time has been attained. Once the survey in has been made the base station will start broadcasting the correction data. Once the rover receives these data it will start to solve its own position ambiguities. When the ambiguities are solved the rover can estimate its position with an accuracy down to a few centimeters. The time taken for the rover to solve these ambiguities depends partly on how many and which constellations of satellites it can detect. However, the most important factor for a good RTK-GPS solution (∼2-3cm accuracy) is to have the base station set up with a good satellite coverage (∼10 satellites) without obstructions as the solution produced by the rover is affected by the quality of the correction data produced by the base [21].

(13)

6 Method

To evaluate the system different data sets were acquired by recording data using the hand-held system. This section describes how data was acquired, how Absolute Trajectory Error is measured, and how IMU data is utilized to benefit the NDT algorithm and LiDAR sensor.

6.1 Data Acquisition

The data was recorded at the premises of ¨Orebro University in ¨Orebro. Finding a location, which would enable the GPS antennas to see as many satellites as possible, while still being considered interesting enough to record point cloud data for mapping, proved to be challenging. Several locations were tested, but often too many structures in the vicinity of the GPS antennas obstructed the signals, for a stable RTK-GPS solution to be established. A location was chosen, which had clear view of the sky, to enable a good survey in, for the RTK-GPS station while still allowing the rover to move along the nearby buildings, and to some extent in between them, while maintaining visual contact with the base station to maintain radio link connection. Two data sets were collected to evaluate the system, one while walking and one while running. The different paths along which the different data sets were recorded are depicted in figure7, along with the location of the RTK-GPS base station. The base station for the experiment consist of a laptop computer with the RTK-GPS base connected to it. The GPS-antenna of the RTK-GPS base sits on top of a tripod ∼ 1.5m high. In preparation before recording the data sets the RTK-GPS base station was set up to perform a survey in. The survey in was configured to run for at the least 30 minutes and at the least attain an accuracy of 0.5m. The base station achieved an accuracy of 0.44m after 30minutes.

Figure 7: The area at ¨Orebro University where data was recorded. 1: Trajectory of data set 1. 2: Trajectory of data set 2. 3: Position of RTK-GPS base station.

6.2 Absolute Trajectory Error

To evaluate the performance of the system the Absolute Trajectory Error will be calculated by comparing the estimated trajectory of the SLAM with the Ground Truth measured by the RTK-GPS system. The LiDAR sensor produces data at a rate of 10Hz while the RTK-RTK-GPS system only produces data at 1Hz. This means that each position produced by the algorithm as the estimated trajectory will be compared to a linearly integrated position between two RTK-GPS readings. The recorded GPS-trajectory will be aligned to the estimated trajectory to compensate for errors in

(14)

approximated distance between sensors and for the accuracy of the GPS position reading. The GPS produces (X, Y, Z) coordinates in the cartesian Earth Centered Earth Fixed (ECEF) frame which, has its origo in the center of mass of the earth. Its axes are aligned with the international reference pole. The distance d from the estimated trajectory E to the Ground Truth GT will be summed for all positions T produced resulting in an absolute trajectory error measured in a root mean square error, as seen in Equation1.

AT E = v u u u u t T X i d(Ei, GTi)2 T (1)

The magnitude of this absolute trajectory error is considered a measurement of how well the algorithm performs.

6.3 Estimating pose

To provide an initial pose estimate po to the objective function used in the NDT registration process, dead reckoning was used. While relative orientation estimates using an IMU can be precise, estimating the position of the LiDAR using IMU acceleration data is harder as it requires double time integration. The problem is addressed by finding by filtering the IMU velocity estimates together with drift tree LiDAR velocity estimates based on registration poses.

A reference velocity vref _{is calculated by} vref = 1

∆T(o2− o1) (2)

where ∆T = t2− t1 and o2− o1 is the translation of the system over the last two registrations. The filtered velocityvt_e was obtained by combining vref _{with the estimated velocity vt}_{as seen in} Equation3.

e

vt= αvref_t + (1 − α)vt (3)

Where α is the factor that regulates the impact of vref _{and v}

trespectively. vtis obtained by using e

vt−1and integrating acceleration from t − 1 to t as seen in Equation4. vt=vt−1_e +X

i

aw_t(i)∆T (4)

at(i) is the acceleration measurement of sensor estimate i at the time t. The acceleration aw in Equation4 is expressed in a fixed world frame by rotating the original acceleration according to Equation5.

aw= Rw_lidRlid_imuaimu (5)

where Rwlid is the orientation of the LiDAR in the world frame, R lid

imu is the orientation of the IMU in the LiDAR frame and aimu is the measured acceleration in the IMU frame. Note that the gravitation of the Earth is not included in aimuas it is estimated and subtracted internally in the IMU.

Finally, the position estimate is obtained similar to the velocity in Equation 4 by integrating the velocity from t − 1 to t

ot= ot−1+X i

vw_t(i)∆T (6)

The orientation estimate is obtained by

Rt= Rt−1Rlid_imu∆Rimu_t,t−1Rimu_lid (7) where Rt−1 is the orientation of the lidar pose in the world frame at time t − 1. ∆Rimu_t,t−1 is the difference in orientation of the IMU between the time t − 1. The rotation estimate Rtand position estimate otwas combined and used as an initial guess p₀ to the registration.

(15)

6.4 Undistorting Pointclouds

The LiDAR rotates at 10 Hz. This means that one full scan is produced over the course of 0.1 second. Any rotation will obscure the point cloud. For example, an angular velocity of 90◦/s will produce 0.1 ∗ 90◦= 9◦rotation of the sensor during a scan resulting in a skewed point cloud. Each point cloud cluster provided by the LiDAR sensor is rotated according to the relative orientation obtained. The rotation of the LiDAR at time t0 and t is obtained by

Rlid_t 0 = R imu t0 R imu lid (8)

Rlid_t = Rimu_t Rimu_lid (9)

respectively. So, the rotation from time t0and t is obtained by

Rlid_t,t₀ = Rlid_t Rlid−1_t₀ (10) Given a point segment acquired at time t

Plidt (11)

the points are undistorted by applying

Plid_t,corrected = Rlid_t.t

0P

lid

t (12)

The undistortion of point clouds is utilized whenever the IMU is used in the tests.

6.5 Evaluation

The system will be tested by running a series of tests where the ATE will be measured. To evaluate the performance of the algorithm, using the IMU in comparison to without IMU pose estimation, two extensive tests will be carried out. A third extensive test will evaluate the performance of utilizing the IMU pose estimation for the algorithm when not using the multi-resolution method. The first two tests will be made with the same settings on two different data sets. One which was recorded while walking (data set 1) and the other while running (data set 2). The third test will be run with multi-resolution turned off on data set 1. The algorithm will be tested at different resolutions, 0.2, 0.3, 0.4, 0.5, 0.6, and 0.7 m3_{, where each resolution will be evaluated by running} five different neighbouring resolutions. The product from these tests is a quantitative study of the performance of the algorithm.

(16)

7 Results

Detailed results from test case 1 are presented in a table with exact values from every SLAM sequence. The average trajectory error for each test case is displayed with a graph. Three test cases were performed. Test case 1 and 2 compare the performance of the algorithm running with or without IMU prediction. Test case 3 aims to study the performance of the algorithm with or without utilizing multi-resolution while using IMU prediction.

7.1 Test Case 1

For this test case the data set 1 was used. The data set is recorded while walking. The detailed presentation of the ATE from test case 1 is presented in Table1. A comparison of the data obtained can be seen as a graph in Figure8 where the average error is depicted. It is clear that for coarse resolutions either method produces similar results. Coarse resolutions in this context is > 0.49m3_. For finer resolutions, < 0.49m3_{, the method utilizing the IMU predictions is more robust.}

Mu-Res -0.01 -0.005 +0.0 +0.005 +0.01 Avg 0.2 With IMU 0.181787 0.245115 0.246043 0.24318 0.175244 0.218274 W/O IMU 0.195498 1.138054 1.638658 0.286455 0.96337 0.844407 0.3 With IMU 0.1757 0.174946 0.18039 0.181884 0.17354 0.177292 W/O IMU 1.274207 2.680296 2.142284 0.179436 2.610908 1.777426 0.4 With IMU 0.135585 0.138080 0.132310 0.135266 0.136138 0.135476 W/O IMU 0.137596 0.141001 0.136599 3.140976 0.140318 0.739298 0.5 With IMU 0.131796 0.131899 0.131753 0.132687 0.131633 0.131954 W/O IMU 0.131852 0.132052 0.134159 0.133793 0.130166 0.132404 0.6 With IMU 0.133256 0.132741 0.133434 0.131973 0.13203 0.132687 W/O IMU 0.133145 0.132486 0.132564 0.131416 0.13307 0.132536 0.7 With IMU 0.13319 0.134066 0.133586 0.133789 0.13409 0.133744 W/O IMU 0.132453 0.13281 0.132538 0.131981 0.132279 0.132412 Table 1: Table depicting the Average Trajectory Error for different multi-resolutions used in test case 1, with and without IMU pose estimation. Outlier is marked with bold text.

Figure 8: A graph depicting the average absolute trajectory error produced from test case 1. In the case of mapping with a multi-resolution of 0.4 the algorithm performs well at all resolu-tions except the outlier at 0.405 (0.4+0.005) without the IMU. This mapping sequence produced

(17)

an ATE ∼ 24 times higher than results from tests using the same method but at neighbouring resolutions, as can be seen in Table 1. A visual representation for the trajectory produced in comparison to the ground truth in the XY-plane is depicted in Figure9for both with IMU and the sequence without IMU which produced the outlier. In Figure10a comparison is depicted between the two maps created by the same test. It can be noted that the map has been skewed somewhere in the middle so that half of the map has been rotated.

Figure 9: A comparison between the estimated trajectory and ground truth for test case 1 for 0.405 multi-resolution sequences. With IMU to the left and without IMU to the right (outlier). Red represents the error between estimated trajectory and the ground truth.

Figure 10: Two NDT-maps generated from 0.405 multi-resolution. With IMU to the left and without IMU to the right from test case 1. It can be seen that the map to the right has been skewed.

7.2 Test Case 2

For test case 2 data set 2 was used. The data set was recorded while running. The detailed presentation of the ATE from test case 2 is presented in table2. A comparison of the results can be seen in Figure 11 where the average ATE is depicted. For coarse resolutions the algorithm running with IMU prediction produces slightly better results. Coarse resolutions in this context is > 0.6m3_{. For finer resolutions, < 0.6m}3_{, the method utilizing the IMU produces good results until} a resolution of < 0.4m3 _{where it starts producing noticeably higher ATE values. One extreme} outlier appeared at 0.21m3_{, as can be seen in Table}₂_{, running with the IMU. The outlier was left} out when plotting the graph in Figure11for consideration of readability. The produced map from

(18)

this outlier does not resemble the area recorded in any way, as shown next to a successful SLAM trajectory of a multi-resolution of 0.305m3_{on this same data set, seen in Figure}₁₂_.

Mu-Res -0.01 -0.005 +0.0 +0.005 +0.01 Avg 0.2 With IMU 13.67046 13.60853 13.79115 13.66079 72.825041 13.68273 W/O IMU 13.75409 13.87096 13.89311 13.60138 12.85526 13.59496 0.3 With IMU 10.48414 1.110085 13.29352 1.103186 1.019438 5.402072 W/O IMU 13.7154 13.4356 13.40966 13.09014 13.07739 13.34564 0.4 With IMU 0.922175 1.017006 0.935122 1.072227 1.104153 1.010137 W/O IMU 13.73875 13.7138 0.136599 13.54535 13.17174 13.17174 0.5 With IMU 0.942723 0.861066 0.942038 0.860566 0.797082 0.880695 W/O IMU 7.508296 6.657963 7.282038 6.419649 1.235368 5.820663 0.6 With IMU 0.8046 0.779483 0.820759 0.828549 0.850099 0.816698 W/O IMU 1.120184 1.058158 1.19835 1.124236 1.078919 1.115969 0.7 With IMU 0.806472 0.872928 0.783229 0.750386 0.7636 0.795323 W/O IMU 0.979967 0.985858 0.967881 0.859712 0.936805 0.946045 Table 2: Table depicting the Average Trajectory Error for different multi-resolutions used in test case 2, with and without IMU pose estimation. Outlier is marked with bold text.

Figure 11: A graph depicting the average absolute trajectory error produced from test case 2. The average computation time for registration is depicted as a graph in Figure 13. As can be seen from this graph the computation time does decrease for coarser resolutions. The average computation time is slightly lower when utilizing the IMU prediction for all resolutions at which the method produced a low ATE. The computation time is lower at a resolution of 0.3m3 _running without IMU. None of the two methods produced low ATE results at this resolution, as can be seen in Figure11.

(19)

Figure 12: Two maps produced from test case 2. The top map was produced by the outlier at resolution 0.21m3 with an ATE of ∼ 72m. The bottom map at resolution 0.305m3 had an ATE of ∼ 1m. The top map does not resemble the area scanned at all. In the bottom map the ground and the wall is clear and the trees are distinguishable.

Figure 13: Average computation time for test case 2. Multi-resolution running data set 2.

7.3 Test Case 3

The goal for this test is to find out how the algorithm performs using IMU prediction with multi-resolution turned off. The same data set was used for test case 1, from which the results for the method using IMU prediction and multi-resolution will be used for comparison. The results are

(20)

shown in Figure14. As can be seen, the method produces a low ATE at resolutions > 0.4m3_when multi-resolution is disabled (single-resolution). The total computation time for all registrations are summed, as can be seen in Figure15. The graph show that the data set is processed ∼ 48 seconds (∼ 11%) faster by turning multi-resolution off and instead utilize IMU prediction. Notice how the method running without IMU prediction did not manage to produce a low ATE for this resolution (> 0.4m3) even with multi-resolution for this same data set, see Figure8.

Figure 14: A graph depicting the average ATE from test case 3 (single) and from test case 1 (multi) with IMU.

(21)

Figure 15: Average computation time comparison for algorithm running with and without multi-resolution on data set 1 using IMU prediction.

8 Discussion

The aim for this thesis was to estimate the pose of a hand held system utilizing the data from an IMU. The estimated pose was used to improve the performance of a SLAM algorithm which is based on the Normal Distributions Transform (NDT) applied to point cloud data. A rigid and easy to hold hand held system was assembled holding all the necessary sensors. Data sets were gathered consisting of IMU, LiDAR, and RTK-GPS data in an outdoor structured environment at the premises of ¨Orebro University. Evaluation of the system was carried out by comparing the estimated trajectory of the SLAM algorithm with the RTK-GPS position of the system. It is evident that the performance of SLAM based on NDT is improved by supplying pose estimates based on IMU data for certain parameter configurations.

By placing the IMU in a fixed relation to the exteroceptive sensor the rotation and translation of the system can be estimated. This estimation aids the algorithm to compute registration if the estimation is closer to the true pose than the preceding pose. By utilizing the results from recent registration events the average velocity of the system can be estimated. This average velocity can then be used as an initial velocity when integrating acceleration data outputted by the IMU.

It can be concluded from tests that the high deviation in trajectory error, for finer resolutions, when not utilizing the IMU pose estimations, is a product from the algorithm failing to converge to the correct minimum. Instead, a local minimum has been established skewing the map. This results in a negative spiral, where future registration events are even more prone to continue matching in this local minimum. The registration event might also match the scan at a new local minimum, perturbing the map even further. When utilizing the IMU predictions the registration events are more successful at finer resolutions.

In test case 2 where the velocity of the system was increased when recording the data. As can be seen in the graph for average ATE (Figure 11) from test case 1 the resolution at which either method is able to successfully complete registration with certainty is shifted towards more coarse resolutions when compared to average ATE of test case 2 (Figure8). This suggests that for higher velocities of the system the pose estimation produced by the IMU contributes to successful registrations at a coarser resolution. Additionally, it can be concluded from the tests that for finer resolutions the IMU pose estimation is necessary for successful registrations. It can be stated

(22)

that the relation between velocity and voxel size (resolution) is that when a higher velocity is measured in voxels per second the need for pose estimation increases. For a high velocity and fine resolution the IMU prediction used in this work is unable to produce good enough pose estimates for registration to be successful.

Matching scans depends on recognizing features in the environment to be successful. Mapping an indoor environment requires a finer resolution as the size of objects is limited to the size of the room. For example, corners in a room is often times a strong indication of position. When mapping an outdoor environment on the other hand, the level of detail required for a successful localization becomes coarse. Corners of buildings can be represented at coarse resolutions and are strong indicators for positioning. As shown by the tests, coarse resolutions produce good results when mapping in an outdoor environment. The predicted pose matters less for coarse resolutions as, which has been shown, the difference in measured ATE becomes negligible when comparing methods utilizing estimated pose from IMU and zero motion prediction. Especially so as the computation time is lower for coarser resolutions the incentive to use a fine resolution is lowered.

Test shows that by utilizing the IMU prediction the algorithm can successfully compute reg-istrations at a finer resolution without utilizing the multi-resolution method. Multi-resolution is computationally heavy and it is possible to produce a NDT map at certain resolutions faster by turning multi-resolution off and instead utilize IMU prediction.

To summarize; utilizing IMU for pose prediction contributes to successful registrations at finer resolutions and higher velocities. It also enables certain resolutions to be computed without the need for multi-resolution which in turn decreases computation time for registration. For coarser resolution the contribution from IMU pose estimation becomes less substantial. When mapping an outdoor structured area a coarse resolution can be enough to successfully compute registrations.

(23)

9 Conclusion and Future Work

There are several improvements possible for the system developed in this work. The use of RTK GPS as ground truth for a system, which aims to SLAM in a structured area, was counter intuitive as the RTK-GPS system is heavily dependent on both a strong GPS signal and unobstructed radio communication between the base and the rover. These demands of the antennas of the system can cause problems in the targeted type of area, an outdoor structured area, which was the focus for this thesis. However, with good positioning of the base station, for example on top of a rooftop, the performance would suffice to establish a stable RTK-solution in an even more structured area. When producing the ground truth trajectory using the RTK GPS system the survey in period used was 30 minutes. The accuracy of the trajectory produced can probably be improved by having the base station survey in for a long period of time, for example one week. The problem with the test setup was that the base station used for this thesis relied on the battery power from a laptop computer. A more permanent RTK GPS base station would be preferred if additional data was to be acquired using this method.

Although the hand held system was designed with consideration regarding sensor positions and rigidity of the backbone of the system a redesign is in order. The distance between the IMU and LiDAR can be shorter so that their frames can align better. The part used in this thesis was fortunately extremely sturdy which made rough modifications possible without compromising the fixed relation between sensors.

Incorporating the IMU pose estimation as a soft constraint into the objective function of the algorithm was not utilized in the tests made for this thesis. This method provide additional information in feature poor environments. This can be interesting for the case of data set 4 where the environment is somewhat similar to that of a corridor with a repetitive environment of the parkway.

For this thesis offline SLAM was performed. A next step for this project could be to adapt the system to be able to run online. Once running online, the system can provide the pose estimates as a subsystem to a system which can actuate its position in the world, for example in an unmanned car or aerial vehicle.

(24)

References

[1] A. R. Vidal, H. Rebecq, T. Horstschaefer, and D. Scaramuzza, “Ultimate slam? combining events, images, and imu for robust visual slam in hdr and high speed scenarios,” IEEE Robotics and Automation Letters, vol. PP, no. 99, pp. 1–1, 2018.

[2] B. Suwandi, T. Kitasuka, and M. Aritsugi, “Low-cost imu and gps fusion strategy for apron vehicle positioning,” in TENCON 2017 - 2017 IEEE Region 10 Conference, Nov 2017, pp. 449–454.

[3] Y. Zhang, Y. Fei, L. Xu, and G. Sun, “Micro-imu-based motion tracking system for virtual training,” in 2015 34th Chinese Control Conference (CCC), July 2015, pp. 7753–7758. [4] M. Kok, J. D. Hol, and T. B. Schn, “Using intertial sensors for position and orientation

estimation,” April 2017.

[5] P. Biber, “The normal distributions transform: A new approach to laser scan matching,” 2003.

[6] G. Grisetti, G. D. Tipaldi, C. Stachniss, W. Burgard, and D. Nardi, “Speeding-up rao-blackwellized slam,” in Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006, May 2006.

[7] F. Lu and E. Milios, “Globally consistent range scan alignment for environment mapping,” Autonomous Robots, vol. 4, no. 4, pp. 333–349, Oct 1997. [Online]. Available:

https://doi.org/10.1023/A:1008854305733

[8] M. Magnuss, A. Lilienthal, and T. Ducket, “Scan registration for autonomous mining vehicles using 3d-ndt.”

[9] T. Oh, S. Jung, S. Song, and H. Myung, “Graph-based slam (simultaneous localization and mapping) for bridge inspection using uav (unmanned aerial vehicle),” in The 2017 World Congress on Advances in Structural Engineering and Mechanics ASEM17, August 2017. [10] H. Deilamsalehy and T. C. Havens, “Sensor fused three-dimensional localization using imu,

camera and lidar,” in 2016 IEEE SENSORS, Oct 2016, pp. 1–3.

[11] A. Jimenez, F. Seco, C. Prieto, and J. Guevara, “A comparison of pedestrian dead-reckoning algorithms using a low-cost mems imu,” August 2009.

[12] R. Siegwart and I. R. Nourbakhsh, “Introduction to autonomous mobile robots,” 2004. [13] F. Yu, M. Zhu, J. Wang, and S. Xiao, “An unconventional gps and multiple low-cost imu

integration strategy with individual model for systematic errors and measurements,” in 2017 Forum on Cooperative Positioning and Service (CPGPS 65289;, May 2017, pp. 66–71. [14] T. Stoyanov, M. Magnusson, and A. J. Lilienthal, “Point set registration through minimization

of the l2 distance between 3d-ndt models,” in 2012 IEEE International Conference on Robotics and Automation, May 2012, pp. 5196–5201.

[15] J. Saarinen, H. Andreasson, T. Stoyanov, J. Ala-Luhtala, and A. Lilienthal, “Normal distribu-tions transform occupancy maps: Application to large-scale online 3d mapping,” in Proceed-ings - IEEE International Conference on Robotics and Automation, 05 2013, pp. 2233–2238. [16] J. Saarinen, T. Stoyanov, H. Andreasson, and A. J. Lilienthal, “Fast 3d mapping in highly

dynamic environments using normal distributions transform occupancy maps,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Nov 2013, pp. 4694– 4701.

[17] A. Das, J. Servos, and S. L. Waslander, “3d scan registration using the nomal distributions transform with ground segmentation and point cloud clustering.”

(25)

[18] H. Andreasson, D. Adolfsson, T. Stoyanov, M. Magnusson, and A. Lilienthal, “Incorporating ego-motion uncertainty estimates in range data registration,” 09 2017, pp. 1389–1395. [19] X. Technologies, “Mti user manual document mt0605p,” February 2018.

[20] Velodyne, “http://velodynelidar.com/hdl-32e.html,” Accessed: 29 April 2018.

[21] E. Bollard, “Technical brief how quickly should a gps receiver achieve an rtk fixed solution,” August 2014.

HANDHELD LIDAR ODOMETRY ESTIMATION AND MAPPING SYSTEM

V¨

aster˚

as, Sweden

Thesis for the Degree of Master of Science in Engineering - Robotics

30.0 credits

HANDHELD LIDAR ODOMETRY

ESTIMATION AND MAPPING

SYSTEM

Niclas Holmqvist

nht13002@student.mdh.se

Examiner: Masoud Daneshtalab

M¨

alardalen University, V¨

aster˚

as, Sweden

Supervisor: Carl Ahlberg

M¨

alardalen University, V¨

aster˚

as, Sweden

Company supervisor: Daniel Adolfsson,

¨

Orebro University, ¨

Orebro, Sweden

1

Abstract

Table of Contents

2

Introduction

3

Background

3.1

Normal Distribution Transform

4

Problem formulation

5

System Design

5.1

Inertial Measurement Unit

5.2

LiDAR

5.3

Hand-held system

5.4

Robot Operating System

5.5

RTK-GPS

6

Method

6.1

Data Acquisition

6.2

Absolute Trajectory Error

6.3

Estimating pose

6.4

Undistorting Pointclouds

6.5

Evaluation

7

Results

7.1

Test Case 1

7.2

Test Case 2

7.3

Test Case 3

8

Discussion

9

Conclusion and Future Work

References