
Master of Science Thesis in Electrical Engineering
Department of Electrical Engineering, Linköping University, 2016

Improving SLAM on a TOF Camera by Exploiting Planar Surfaces

Richard Bondemark

LiTH-ISY-EX--16/4984--SE

Supervisor: Hannes Ovrén, ISY, Linköpings universitet
            Erik Ringaby, SICK IVP
Examiner: Per-Erik Forssén, ISY, Linköpings universitet

Computer Vision Laboratory
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2016 Richard Bondemark


Sammanfattning

Simultaneous localization and mapping (SLAM) is the problem of mapping one's surroundings while simultaneously localizing oneself in the map. It is an important and active area of research within robotics. In this master thesis, two approaches are tested for reducing the error in camera position and orientation that accumulates over time in SLAM solutions. The first approach tests 3 different motion models for the camera. Two of the models exploit the knowledge that the camera is mounted on a trolley. These two methods improve the results of the SLAM algorithm. The second approach tries to reduce the error by reducing the noise in the point cloud data used for the mapping. This is done by finding planar surfaces in the point clouds. Median filtering is used as an alternative solution against which the plane estimation is compared. The plane estimation also turns out to reduce the error in the solution, while the median filtering only makes the results worse.


Abstract

Simultaneous localization and mapping (SLAM) is the problem of mapping your surroundings while simultaneously localizing yourself in the map. It is an important and active area of research for robotics. In this master thesis two approaches are attempted to reduce the drift which appears over time in SLAM algorithms. The first approach tries 3 different motion models for the camera. Two of the models exploit the a priori knowledge that the camera is mounted on a trolley. These two methods are shown to improve the results. The second approach attempts to reduce the drift by reducing noise in the point cloud data used for mapping. This is done by finding planar surfaces in the point clouds. Median filtering is used as an alternative noise reduction method to compare against. The plane estimation approach is also shown to reduce the drift, while the median filtering makes it worse.


Contents

Notation

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Delimitations

2 Theory
  2.1 SLAM
  2.2 Iterative Closest Point
  2.3 SLAM using RGB-D camera
  2.4 Map representation
  2.5 Evaluation

3 Method
  3.1 System setup
  3.2 Tested algorithms
    3.2.1 3D SLAM
    3.2.2 2D SLAM
    3.2.3 3D SLAM with regularization
    3.2.4 SLAM with plane estimation
    3.2.5 Median filtering
  3.3 Evaluation

4 Results

5 Discussion, analysis, and conclusions
  5.1 Results
  5.2 Method
  5.3 The work in a larger context
  5.4 Continued work

A Result images


Notation

Notations

  p_i^s     Point i from point cloud s
  p(z)      The z-component of p
  Z_{0:k}   The set {Z_0, ..., Z_k}
  L         Label map
  N         Normal map
  D         Depth discontinuity map
  I(n)      n × n identity matrix

Abbreviations

  EKF    Extended Kalman Filter
  FOV    Field Of View
  ICP    Iterative Closest Point
  IMU    Inertial Measurement Unit
  PCA    Principal Component Analysis
  PCL    Point Cloud Library
  SLAM   Simultaneous Localization And Mapping
  TOF    Time Of Flight

1 Introduction

Simultaneous localization and mapping (SLAM) is the problem of determining one's position (or more typically the position of a robot or vehicle) relative to target locations of unknown, but assumed time invariant, positions [Durrant-Whyte and Bailey, 2006]. This is an important problem in autonomous robotics since it helps in making navigation decisions and gives the robot awareness of its surroundings. The problem has already been solved for laser scanners, 2D cameras and depth (3D) cameras, and there are existing applications working in real time [Endres et al., 2012; Kamarudin et al., 2014]. Figure 1.1 schematically illustrates a robot moving and observing landmarks at different points in time. In SLAM these landmarks would be put in a global map for future reference when calculating one's position.

There are however drawbacks to each data acquisition method. Laser scanners have a very limited field of view, 2D cameras do not give very precise depth data and yield only a sparse point map, and the resolution of 3D cameras is relatively low while they produce large amounts of information that is hard to associate correctly between frames. 2D cameras are interesting in this context because aside from the mapping problem they can also be used for other computer vision tasks where 2D images excel. This is because much of the information in our surroundings is adapted to human vision, such as signs. Also, color information is typically more descriptive than for example a point cloud. The color information can be combined with 3D information, which has become more popular since the introduction of the Microsoft Kinect.

This master thesis was carried out at the offices of SICK IVP in Linköping. The thesis will examine the possibility of using the a priori knowledge that the depth images are from an indoor environment when performing SLAM. The work on the thesis was started in the course TDDD89 Scientific Method and was continuously worked on throughout the thesis project.

The introduction explains the goals of the thesis, the approach to reach these goals, as well as the motivation for why these goals were selected. Chapter 2 explains some concepts that have to be understood in order to understand the rest of the thesis, although this is not a full tutorial for these subjects. This chapter also mentions other approaches to SLAM. Chapter 3 presents the algorithms that will be tested as well as how they will be evaluated, and chapter 4 presents the results of these methods. Finally the results are discussed in chapter 5.

Figure 1.1: A schematic figure of a robot moving and observing landmarks at different times (t = 0, 1, 2).

1.1 Motivation

SLAM is a problem that has to be solved for many applications, particularly in autonomous robot platforms. A good SLAM solution could be used in a variety of products that need to map their environment, particularly indoors since that is where SLAM is most likely to work well. For example, if a rescue squad has to enter a building with an unknown layout, a robot could be sent in first to map the environment, thus giving a better idea of what can be expected once inside. Similarly, if the police or military take pictures for evidence inside a building, the exact location from which each picture was taken can be recorded. Another application example is a robot moving autonomously in an indoor environment, such as a factory floor or a hospital.

One specific problem within SLAM is to figure out how each image frame relates to the global map. Especially when only using point cloud data there is a risk that these transformations from one frame to the global map are wrong due to erroneous point associations or noise. This could result in a completely wrong solution, causing jumps in the global map, or, more commonly, drift in the solution. Man-made structures are typically constructed from flat surfaces, which can be exploited to reduce the risk of erroneous point association, thus increasing the SLAM robustness.

1.2 Aim

The aim of this thesis is to explore ways to increase the precision of a simple 3D camera SLAM implementation in indoor environments. The idea is to do that by regularizing with regard to how the camera moves (along the floor). Man-made environments typically consist of flat surfaces. It would also be interesting to know if these flat surfaces can be exploited to reduce noise in the images from the camera, and if this noise reduction will reduce the drift in the solution.

1.3 Research questions

With the aims in section 1.2 in mind the following questions will be answered:

• How big is the difference in drift when comparing SLAM with and without a regularized camera pose?

• The input to the SLAM algorithm is point clouds. Does reduction of noise in these point clouds through surface estimation reduce drift in the solution?

• How does such a method compare to a simpler method, such as median filtering of the z-component of the point cloud data? The z-component is chosen because it is the dimension that is most influenced by noise.

This will be tested by implementing a simple SLAM algorithm. Additional error functions for the ICP algorithm (see section 2.2), plane estimation, and median filtering will then be added modularly. The different combinations of methods will then be tested on a set of recorded data.

1.4 Delimitations

All images will be captured indoors since the TOF camera works best when not in direct sunlight. The camera will also be mounted on a trolley, thus moving approximately in a 2D plane along the floor. This is a reasonable assumption for an indoor application and it can simplify some calculations.

2 Theory

This chapter covers both relevant theory and related work. First it gives a brief and general introduction to the SLAM problem in section 2.1. It then covers some variations of the important ICP algorithm in section 2.2. The more specific problem of SLAM on time-of-flight cameras is then presented in section 2.3. Map representation is covered in section 2.4, and finally evaluation of the results is covered in section 2.5.

2.1 SLAM

This section will give a brief, general description of the SLAM problem. For a more thorough description see [Durrant-Whyte and Bailey, 2006] and [Bailey and Durrant-Whyte, 2006].

SLAM is typically formulated as a Bayesian posterior probability problem, which would be to maximize the expression in equation 2.1,

    P(x_k, m \mid Z_{0:k}, U_{0:k}, x_0)    (2.1)

where

• x_k is the vehicle position at time k
• m is a list of all observed landmarks
• Z_{0:k} are the sets of landmark observations up to time k
• U_{0:k} is the control input up to time k, for example a steering signal or information from a gyro
• x_0 is the vehicle starting position.


This means that we are trying to estimate the vehicle and landmark locations at time k, given a starting position, control input, and landmark observations up to that point. It is desirable to do this recursively so that at each time step we only update the results based on the last set of observations.

[Durrant-Whyte and Bailey, 2006] describes two typical approaches to probabilistic SLAM: the extended Kalman filter (EKF) approach and the particle filter approach. The EKF approach tracks both the landmark coordinates and the camera pose, and how certain we are of these coordinates and the pose. It formulates a time update and a measurement update. The time update estimates a new vehicle position based on the previous position estimates, the control input (if any) and a motion model. It also increases the uncertainty of the current position. The measurement update updates the estimated vehicle position and the estimated landmark locations based on an observation model. It also decreases the uncertainty of the current position and the landmark positions.
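The two-step structure described above can be illustrated with a minimal linear Kalman filter sketch in Python/NumPy. This is a hedged illustration only: the state layout and the matrices F, B, H, Q, R are placeholders, and a real EKF-SLAM filter would also keep all landmark coordinates in the state and linearize nonlinear motion and observation models around the current estimate (F and H would then be Jacobians).

```python
import numpy as np

def kf_predict(x, P, F, Q, u=None, B=None):
    """Time update: propagate the state with the motion model and grow the uncertainty."""
    x = F @ x if u is None else F @ x + B @ u
    P = F @ P @ F.T + Q
    return x, P

def kf_update(x, P, z, H, R):
    """Measurement update: correct the state with an observation and shrink the uncertainty."""
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```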

The particle filter approach, called FastSLAM, will not be considered in this text, but its main advantage is that the probability distribution is considered over the whole trajectory X_{0:k}, which makes the landmarks independent, thus removing the quadratic complexity in the covariance (uncertainty) matrix.

There are some major issues with SLAM that one needs to take into account when implementing it. Some of these are computational effort and data association.

Computational effort: There are typically many (thousands or millions of) landmarks, and this number keeps growing with the map. With the naive implementation the computations grow quadratically with the number of landmarks. There are some ways to remedy this, for example by updating a local map with a high frequency and the global map with a low frequency.

Data association: Landmarks in new measurements will have to be associated with the landmarks from previous measurements in order for the algorithm to work. They also must not be associated with the wrong landmarks, because then the model might break down completely. Two solutions to this problem are to take into account the structure of several landmarks at the same time and to take into account appearance signatures, for example color or structure.

2.2 Iterative Closest Point

Assume you have two point clouds, each originating from a depth camera, which partly cover the same area. ICP is an algorithm for finding the transform that transforms the source cloud into the target cloud's coordinate frame. The most basic way of doing this is called point-to-point ICP and is described in algorithm 1.

Algorithm 1: Point-to-point ICP

Input: source point cloud P_s, target point cloud P_t, starting solution T
Output: optimized relative transformation T

 1: repeat
 2:   A_d ← ∅
 3:   for all p_i^s ∈ P_s do
 4:     p ← find_closest_point(p_i^s, P_t, T)
 5:     if ||p − p_i^s|| < r then
 6:       A_d ← A_d ∪ {p, p_i^s}
 7:   T ← optimize_alignment(A_d)
 8: until (Change(T) < θ) or iterations > max_iterations

    T = \arg\min_T \frac{1}{2} \sum_{j \in A_d} \left\| T(p_j^s) - p_j^t \right\|^2    (2.2)

For each point in the source cloud, after it has been transformed with T, the algorithm picks the closest point in the target cloud and adds the point pair to A_d. This is done on lines 3-6. The alignment optimization on line 7 is then done as in equation 2.2. This is repeated until convergence or until a maximum number of iterations has been reached.

Erroneous point associations will cause the algorithm to fail. Therefore ICP is very sensitive to local minima, which means the starting solution is important. Since there is no 1-to-1 correspondence between the points, and the point clouds may not completely overlap, one should also ignore point pairs that are too far away from each other. This is done on line 5, where r is the distance threshold. Figure 2.1 illustrates the overlap problem. In this example the two curves should be fitted to each other. The red parts should however not be associated with any part of the other curve.

Figure 2.1: A figure illustrating two curves that are partly identical and could overlap. The green parts are similar while the red parts stretch outside of the other curve’s range.
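As a rough illustration of algorithm 1, here is a minimal point-to-point ICP sketch in Python/NumPy. It assumes unorganized (N, 3) point clouds; the SVD-based (Kabsch) alignment, the KD-tree correspondence search, and the parameter values are illustrative choices rather than the implementation used later in this thesis (which uses the point-to-plane variant described next).

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (SVD / Kabsch) mapping src onto dst, both (N, 3)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, cd - R @ cs
    return T

def icp_point_to_point(source, target, T_init=None, r=0.5, max_iterations=10, tol=1e-6):
    """Iterate correspondence search (lines 3-6) and alignment (line 7) of algorithm 1."""
    T = np.eye(4) if T_init is None else T_init.copy()
    tree = cKDTree(target)
    for _ in range(max_iterations):
        src = source @ T[:3, :3].T + T[:3, 3]     # source cloud under the current estimate
        dist, idx = tree.query(src)               # closest target point per source point
        keep = dist < r                           # drop pairs beyond the distance threshold r
        T_new = best_rigid_transform(src[keep], target[idx[keep]]) @ T
        done = np.linalg.norm(T_new - T) < tol    # Change(T) below threshold
        T = T_new
        if done:
            break
    return T
```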


[Chen and Medioni, 1991] introduced the point-to-plane error metric to ICP. This method takes into account that structures are typically locally planar. The error metric used when optimizing the alignment then changes from the one in equation 2.2 to the one in equation 2.3, where n_j^t is the plane normal at point p_j^t. This error metric has proven superior to the point-to-point error metric in indoor environments.

    T = \arg\min_T \frac{1}{2} \sum_{j \in A_d} \left( \left( T(p_j^s) - p_j^t \right) \cdot n_j^t \right)^2    (2.3)

[Segal et al., 2009] suggest "Generalized ICP" as an improvement to point-to-plane ICP. This method changes the error metric to a more general form. In particular they suggest a form they call plane-to-plane ICP. This form takes into account the structure of both the source and the target point cloud, in the same way point-to-plane ICP takes into account the structure of the target cloud. They show that this somewhat improves the accuracy of ICP, but more importantly makes it more robust to different choices of the distance threshold r. Generalized ICP has been implemented in PCL.¹
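The point-to-plane minimization in equation 2.3 is commonly solved by linearizing the rotation with a small-angle approximation, which turns each iteration into a 6-dimensional linear least-squares problem. The sketch below shows one such Gauss-Newton step; this is a standard technique, but not necessarily how the PCL implementation referred to above solves it.

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized step of point-to-plane alignment.
    src, dst: (N, 3) corresponding points; normals: (N, 3) normals at the target points."""
    # Residual ((R s + t) - d) . n with R ~ I + [w]x gives rows [ (s x n)^T  n^T ] and rhs -(s - d) . n.
    A = np.hstack([np.cross(src, normals), normals])          # (N, 6)
    b = -np.einsum('ij,ij->i', src - dst, normals)            # (N,)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    w, t = x[:3], x[3:]
    wx = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    T = np.eye(4)
    T[:3, :3] = np.eye(3) + wx        # first-order rotation; re-orthonormalize for larger steps
    T[:3, 3] = t
    return T
```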

2.3 SLAM using RGB-D camera

[Henry et al., 2012] describe an algorithm that exploits the advantages of both the depth image from a time-of-flight camera and the pixel values in an RGB image. In the algorithm there are two threads running simultaneously. One is responsible for the frame to frame alignment using an algorithm the authors call RGB-D ICP, the other is responsible for loop closure.

RGB-D ICP works as follows:

1. Extract sparse features from the source and target RGB images.

2. Associate the features with their depth values. In the paper the FAST feature detector is used in combination with the Calonder feature descriptor, implemented in OpenCV.

3. Use RANSAC to find the best transformation between the two sets of features.

4. If necessary perform ICP.

Loop closing: This is done in two steps, detection and correction. Detection is done by performing RANSAC on features from keyframes. Keyframes are chosen such that the loop closing does not take too much computational power, for example every n:th frame or when the visual overlap with the previous keyframe is too small. [Henry et al., 2012] suggest two strategies for correction, pose graph estimation and sparse bundle adjustment. Pose graph estimation is much faster than the bundle adjustment approach, but not as accurate.

¹ http://docs.pointclouds.org/trunk/classpcl_1_1_generalized_iterative_closest_point.html

[Rydell et al., 2016] take a different approach. Their system consists of a Kinect camera and an IMU. From the Kinect they get intensity and depth images. The IMU gives them information about acceleration in 3 dimensions and angular velocities. From the Kinect images SIFT feature points are extracted and considered potential landmarks. The feature points and IMU information are merged with an EKF. The feature points, along with the camera pose, are tracked through the images until they have not been observed for a set amount of time.

2.4 Map representation

Visualization of the result of the SLAM algorithm is not the focus of this thesis, but a few alternatives will be mentioned anyway, since the visualization is necessary to get an idea of how good the algorithm is and can potentially be computationally expensive. [Henry et al., 2012] suggest a method for producing 3D maps. They use a surfel (surface element) representation of the map. Each surfel contains information on location, size, orientation, color, and confidence. As a surfel is observed from several locations its confidence is increased. If other surfels are observed through a surfel its confidence is lowered. If the confidence becomes too small the surfel is removed.

The surfel map generation can be slow however, taking up to 3 seconds per update for a relatively large map [Henry et al., 2012]. For a 2D application, typically a robot moving on the floor, a 2D map might be enough. In [Almansa-Valverde et al., 2012] this is done by implementing an occupancy grid (or rather, a set of occupancy grids). Each cell in the grid contains the certainty that the cell is occupied. As the robot gets detections within the grid, the certainty that the corresponding cells are occupied is increased. Since the final map size is unknown during runtime, grids are added to the map as the robot explores outside the existing grids. Figure 2.2 shows an example of an occupancy grid containing 3 sub grids and with a few cells with detections. The algorithm is performed in 3 main steps.

Step 1: Select which detections to use. If the height of the robot is known, this information can be used to decide which detections are far enough from the floor to be ignored, since the robot could safely pass below them. An example of an object giving such detections could be a table.

Step 2: Calculate map coordinates. The coordinates measured by the robot are relative to the robot and not the map origin, and thus have to be transformed into the global coordinate system. Since in this application this is already done by the SLAM algorithm, it will not be discussed further here.

Step 3: Update the occupancy grid. For each 3D point, increase the probability of occupancy in the cell it falls into. Typically many points will fall into the same cell if it is occupied, therefore each point should only increase the probability by a small value. The update is only performed if more than N points fall into a cell within one frame, in order to prevent noise from ruining the map.

Another step to remove the influence of noise, and also of objects moving around in the scene, is to decrease the occupancy probability in some cells. This is done by checking which cells fall between the robot and the cells with detections in them, similar to [Henry et al., 2012]. These cells are likely to be empty, since the cells which got hits would otherwise be occluded.
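A minimal sketch of the per-frame grid update described in step 3, assuming the points have already been transformed into map coordinates; the log-odds bookkeeping, cell size, increment, and minimum point count are illustrative assumptions, and the dynamic addition of sub grids and the free-space decrease along the ray to the robot are omitted.

```python
import numpy as np

class OccupancyGrid:
    """Minimal 2D occupancy grid updated once per frame."""
    def __init__(self, size=(200, 200), cell=0.1, hit=0.2, min_points=3):
        self.logodds = np.zeros(size)   # 0 corresponds to "unknown" (p = 0.5)
        self.cell = cell                # cell side length in metres
        self.hit = hit                  # increment per occupied cell and frame
        self.min_points = min_points    # N: ignore cells hit by fewer points in one frame

    def update(self, points_xy):
        """Raise occupancy for every cell receiving at least min_points detections this frame."""
        idx = np.floor(points_xy / self.cell).astype(int)
        inside = (idx >= 0).all(axis=1) & (idx < self.logodds.shape).all(axis=1)
        idx = idx[inside]
        counts = np.zeros_like(self.logodds)
        np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)    # points per cell in this frame
        self.logodds[counts >= self.min_points] += self.hit
```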

Figure 2.2: An example of an occupancy grid containing 3 sub grids.

Using a 3D occupancy grid is a bad idea since it requires too much memory to handle efficiently. An alternative is to use a sparse octree representation of the data. An octree is a tree data structure where each node has 8 children. The exact theory will be left out of this thesis, but the main idea is that each level of the tree represents one level of detail, and each node represents one part of space. The space a node represents is split up among its children. If a node contains only empty space it will be a leaf. The depth of the tree is set according to the level of detail wanted. Figure 2.3 contains an example of an image represented by a quadtree, the 2D correspondence to an octree. The squares are the nodes at each level of detail. Sparse octrees have been shown by [Laine and Karras, 2010] to be memory efficient when storing 3-dimensional surface data and time efficient when performing raycasting. For example, [Bengtsson, 2014] uses sparse octrees to represent 3D scans from a Kinect.
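A minimal sparse quadtree sketch corresponding to the 2D example in figure 2.3; an octree is the same idea with eight children and a third coordinate. The unit-square extent and the maximum depth are illustrative assumptions, and nodes are only split where points are actually inserted.

```python
class QuadTree:
    """Sparse quadtree over the unit square: subdivide only where data exists."""
    def __init__(self, x=0.0, y=0.0, size=1.0, depth=0, max_depth=6):
        self.x, self.y, self.size = x, y, size
        self.depth, self.max_depth = depth, max_depth
        self.children = None            # None means this node is (still) a leaf
        self.occupied = False

    def insert(self, px, py):
        self.occupied = True
        if self.depth == self.max_depth:
            return                      # finest level of detail reached
        h = self.size / 2
        if self.children is None:       # split the node into four quadrants on demand
            self.children = [QuadTree(self.x + dx * h, self.y + dy * h, h,
                                      self.depth + 1, self.max_depth)
                             for dy in (0, 1) for dx in (0, 1)]
        child = int(px >= self.x + h) + 2 * int(py >= self.y + h)
        self.children[child].insert(px, py)
```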

2.5 Evaluation

Henry et al. [Henry et al., 2012] evaluate three aspects: the global consistency, the properties of RGB-D ICP, and the properties of the global optimization. The global consistency is tested qualitatively by overlaying the 3D model on a 2D floor plan of the testing environment and comparing them. The RGB-D ICP properties are tested by setting up markers with known distances between them. The camera is then carried between the markers in turn, producing measurements for each marker. These measurements are then compared to the real world measurements.

Figure 2.3: Sparse quadtree example. Each square in the image is represented by a node in the quadtree. Black is the coarsest level of detail, green the finest.

An alternative to evaluating by measuring distances in the image is to apply the algorithm on standard datasets with known ground truth data. [Sturm et al., 2012] present one such method and provide datasets [Sturm, 2014].

3 Method

This chapter describes the system setup in section 3.1, the tested algorithms in section 3.2 and finally the evaluation process in section 3.3.

3.1 System setup

The camera is fastened on a trolley. It is also connected to a laptop via an Ethernet cable. An image of the setup is shown in figure 3.1. The data is gathered by pushing the trolley around the office of SICK IVP at an unhurried walking speed. This setup means that the rigid transformation describing the camera motion between frames can be reduced from 6 degrees of freedom (3 for translation and 3 for rotation) to 3, since the trolley will only move in a 2D plane and only rotate around 1 axis.

The camera used is a SICK Visionary-T, which is a time-of-flight camera that records 144 x 176 pixel images at 30 frames per second.¹ Its depth precision is 3 mm at 1 m distance and 30 mm at 7 m distance. Each frame consists of a depth image, an intensity image and a confidence image. The confidence image gives a confidence between 0 and 1 for each pixel in the depth image. The camera is shown in figure 3.2. The laptop runs on an Intel Core i5-2520M CPU, a 2.50 GHz dual core processor from 2011, and 8 GB of RAM.

3.2 Tested algorithms

Several algorithms have been tested and compared. The first three are different motion models. The last two are pre-processing steps that aim to reduce the noise in the point clouds.

¹ https://www.sick.com/de/en/vision/3d-vision/3vistor-t/c/g358152

Figure 3.1: Image of the camera setup.

1. 3D SLAM
2. 2D SLAM
3. 3D SLAM with regularization
A. Plane estimation
B. Median filtering

3D SLAM is a very basic version of SLAM. Each frame from the camera is assumed to be in the form of a point cloud. For each frame it finds the relative transformation to the previous one. It then adds the point cloud to a global map. The following two algorithms are variations of the first with different limitations added when estimating new camera poses. The last two algorithms complement the first three and can be run in parallel. Plane estimation estimates planes in each local point cloud and then fits the points into these planes. The idea is that this will reduce the noise, and thus the drift that is introduced over time. Median filtering, if activated, is performed before the plane estimation. It is mainly used here as a reference point.

Figure 3.2: Image of the time-of-flight camera used to gather data.

3.2.1 3D SLAM

The first algorithm tested is very simple. A rigid transformation is estimated for each frame in order to place it into the global map. This is done frame-by-frame as in algorithm 2.

Algorithm 2: Simple SLAM

Input: point clouds P_0, ..., P_N, rotation threshold θ_rot, translation threshold θ_trans, maximum number of ICP iterations max_iterations
Output: generated map M, final camera pose T

 1: M ← ∅
 2: T ← I(4)
 3: T_prev ← I(4)
 4: i = 1
 5: while i <= N do
 6:   T* ← T T_prev^{-1} T
 7:   repeat
 8:     T*_temp ← T*
 9:     A_d ← find_closest_points(P_i, P_{i-1}, T*)
10:     T* ← \arg\min_T \sum_{j \in A_d} ((T(p_j^s) − p_j^t) · n_j^t)^2
11:   until (rotation_diff(T*, T*_temp) < θ_rot and translation_diff(T*, T*_temp) < θ_trans) or iterations > max_iterations
12:   T_prev ← T
13:   T ← T*
14:   P_i ← transform_point_cloud(P_i, T*)
15:   M ← M ∪ P_i


where T is defined in equation 3.1.

    T = \begin{pmatrix} r_{0,0} & r_{0,1} & r_{0,2} & t_x \\ r_{1,0} & r_{1,1} & r_{1,2} & t_y \\ r_{2,0} & r_{2,1} & r_{2,2} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}    (3.1)

The algorithm is initialized on lines 1-4. Each transformation is estimated with point-to-plane ICP, using the previous frame (which has been transformed into the global map) as the target frame and the current frame as the source frame. A constant velocity model is assumed, thus the algorithm is initialized with the previous transform multiplied with the relative transform between the two previous frames. This is done on line 6.

Lines 7-11 describe point-to-plane ICP as in section 2.2. The distance threshold r is set to 500 mm, which was decided after testing. Point-to-plane ICP was chosen instead of point-to-point ICP because it generally performs better in man-made environments, which largely consist of flat surfaces. It was chosen over plane-to-plane ICP since plane-to-plane ICP adds complexity to the algorithm without improving the performance.

For each point in the target point cloud its normal is estimated using PCA, using all points within a 100 mm radius. On line 10, T* is then estimated so that the point-to-plane error function is minimized. The correspondence selection and estimation of T* is repeated until T* does not change anymore or a maximum number of iterations has been performed. In this case the algorithm is set to stop if the change in translation is smaller than 0.1 mm and the change in rotation is smaller than 10^-5 radians. The maximum number of iterations is set to 10. This number of iterations is, however, rarely reached in practice. To lower the computational time only a few of the points in the point cloud are used. A 5 pixel border is removed, since these points have been observed to not always be reliable due to distortions close to the edges. All points below a set confidence (0.8) are removed. Finally, random points are removed until only 1500 points remain, which is less than 1/10 of the original number of points.
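A sketch of this point selection step, assuming the frame is available as an organized (144, 176, 3) array together with a per-pixel confidence image; the array layout and the helper name are assumptions made for illustration, while the border width, confidence threshold, and target point count follow the values stated above.

```python
import numpy as np

def preprocess_frame(points, confidence, border=5, conf_thresh=0.8, n_keep=1500, seed=0):
    """Strip the image border, drop low-confidence pixels, then randomly subsample.
    points: (H, W, 3) organized point cloud; confidence: (H, W) values in [0, 1]."""
    h, w = confidence.shape
    rows, cols = np.mgrid[0:h, 0:w]
    inside = ((rows >= border) & (rows < h - border) &
              (cols >= border) & (cols < w - border))
    keep = inside & (confidence >= conf_thresh)
    pts = points[keep]                              # boolean indexing flattens to (M, 3)
    if len(pts) > n_keep:
        rng = np.random.default_rng(seed)
        pts = pts[rng.choice(len(pts), size=n_keep, replace=False)]
    return pts
```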

In the beginning of the project the plan was to initialize ICP with a RANSAC algorithm, as is done by [Henry et al., 2012]. It would sample 2D intensity keypoints in the source frame and try to match them with keypoints in the target frame. Then it would find the best 3D rigid transform between these points (by using their corresponding depth data). This idea was however discarded for several reasons. The biggest problem is that the intensity images from the camera are of very low quality, with low resolution and high noise levels. Many scenes also did not generate any keypoints at all (or at least too few to be of any use). The keypoints that are found are often located at depth discontinuities. This led to the RANSAC algorithm not producing a solution or, worse, giving a bad solution which in turn made the ICP unreliable. Instead the constant velocity model mentioned above is used.


3.2.2 2D SLAM

The second tested algorithm assumes that the camera moves only along its x- and z-axes, i.e. along the floor, and only rotates about its y-axis. This means that T will be defined as in equation 3.2.

    T = \begin{pmatrix} \cos(\theta_y) & 0 & \sin(\theta_y) & t_x \\ 0 & 1 & 0 & 0 \\ -\sin(\theta_y) & 0 & \cos(\theta_y) & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}    (3.2)

In all other aspects it is identical to the method described in section 3.2.1. This should work fine as long as the assumption that the camera only moves in a 2D plane holds. This algorithm could be sensitive to doorsteps or other unevenness in the floor. It also means that a change in height is not possible, so it will not work in an environment with height variations.
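A small helper constructing the constrained transform of equation 3.2; the parameter order is an arbitrary choice for this sketch.

```python
import numpy as np

def planar_transform(theta_y, t_x, t_z):
    """Rigid transform restricted to floor-plane motion: rotation about y, translation in x and z."""
    c, s = np.cos(theta_y), np.sin(theta_y)
    return np.array([[  c, 0.0,   s, t_x],
                     [0.0, 1.0, 0.0, 0.0],
                     [ -s, 0.0,   c, t_z],
                     [0.0, 0.0, 0.0, 1.0]])
```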

3.2.3 3D SLAM with regularization

The third algorithm tested allows 3D movement as in section 3.2.1, but with regularization terms that punish movement along the y-axis and rotations that are not about the y-axis. Thus the equation on line 10 in algorithm 2 is exchanged for equation 3.3,

    T^* = \arg\min_T \sum_{j \in A_d} \left( \left( T(p_j^s) - p_j^t \right) \cdot n_j^t \right)^2 + (C_r \varphi^2)^2 + (C_t t_y^2)^2    (3.3)

where T is defined as in equation 3.1, φ is the deviation of the rotation axis from the y-axis, and C_r and C_t are constants. Testing shows that good values for C_r and C_t are C_r = 1000 and C_t = 100. φ is calculated as in equation 3.4 and is illustrated in figure 3.3,

    \varphi = \arccos \frac{\lVert v \cdot e_y \rVert}{\lVert v \rVert \, \lVert e_y \rVert}    (3.4)

where v is the rotation axis and e_y is the basis vector along the y-axis.
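A sketch of how the penalty terms in equations 3.3 and 3.4 can be evaluated for a candidate pose, given its rotation axis v and vertical translation t_y; the function names are hypothetical and the point-to-plane part of the cost is assumed to be computed elsewhere.

```python
import numpy as np

def rotation_axis_deviation(v, e_y=np.array([0.0, 1.0, 0.0])):
    """phi in equation 3.4: the angle between the rotation axis v and the y-axis."""
    cos_phi = abs(v @ e_y) / (np.linalg.norm(v) * np.linalg.norm(e_y))
    return np.arccos(np.clip(cos_phi, -1.0, 1.0))

def regularized_cost(point_to_plane_cost, v, t_y, C_r=1000.0, C_t=100.0):
    """Equation 3.3: the point-to-plane cost plus penalties on off-axis rotation and vertical motion."""
    phi = rotation_axis_deviation(v)
    return point_to_plane_cost + (C_r * phi**2)**2 + (C_t * t_y**2)**2
```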

3.2.4 SLAM with plane estimation

In an attempt to remove noise from the point clouds, plane estimation was introduced. The algorithm attempts to find all planar surfaces in each frame. It then moves the points belonging to these surfaces into the plane. The plane region detection is done similarly to [Salas-Moreno et al., 2014], with a few modifications. It is summarized in algorithm 3.

    S(x, y) = \begin{cases} 1 & \text{if } \lVert P(x) \cdot N(x) - P(y) \cdot N(y) \rVert < \delta \, (P(x)(z))^2 \text{ and } N(x) \cdot N(y) > \cos(\Theta) \\ 0 & \text{otherwise} \end{cases}    (3.5)

Figure 3.3: A figure describing the rotation error φ. v is the rotation axis and e_y is the basis vector along the y-axis.

Algorithm 3: Plane estimation

Input: organized point cloud P consisting of pixel coordinates u_i and corresponding 3D points x_i
Output: P with points moved into planes

 1: N ← estimate_surface_normals(P)
 2: for all u ∈ P do
 3:   D(u) ← ∏_{u_i ∈ ω} S(u, u_i)            ▷ ω is a 5 × 5 window with u in the center
 4: L ← generate_label_map(D)
 5: for all u ∈ P do
 6:   add_point_to_plane(P(u), Y_{L(u)})
 7: for all planes Y_n do
 8:   estimate_plane_parameters(Y_n)
 9:   for all points x_i ∈ Y_n do
10:     distance_from_plane ← n_n · x_i        ▷ n_n is the plane normal of Y_n
11:     x_i ← x_i − n_n · distance_from_plane

First the surface normal for each point is estimated on line 1. This is done with PCA, using the K nearest neighbors. The depth data from the camera is quite noisy, therefore K needs to be set relatively high. K = 50 was used in this case.
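A hedged sketch of this k-nearest-neighbour PCA normal estimation in Python/NumPy (the thesis uses PCL for this step); the KD-tree search and the sensor-facing orientation flip, which assumes the camera sits at the origin of the frame, are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=50):
    """Normal per point = direction of least variance among its k nearest neighbours."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.empty_like(points)
    for i, nb in enumerate(idx):
        nbrs = points[nb]
        cov = np.cov((nbrs - nbrs.mean(axis=0)).T)   # 3x3 covariance of the neighbourhood
        _, vecs = np.linalg.eigh(cov)                # eigenvectors, eigenvalues ascending
        n = vecs[:, 0]                               # smallest principal axis
        if n @ points[i] > 0:                        # flip so the normal faces the sensor at the origin
            n = -n
        normals[i] = n
    return normals
```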

Then, on lines 2-3, for each point u ∈ P, it is decided whether that point is in a local plane. This is decided by comparing it to all other points within a square window with u in the middle. The comparison is done as in equation 3.5. Assume a plane is parametrized as (n_x, n_y, n_z, d), where n_x, n_y and n_z are the plane normal components and d is the plane distance from the origin. The first part of equation 3.5 then compares the plane distances of the planes passing through these points. The second part of equation 3.5 compares the normals. In this project δ = 6 · 10^-5 and Θ = 0.1571. If all comparisons return 1 the corresponding point in the depth discontinuity map D is set to 1, otherwise to 0.

Given D, a label map L is generated on line 4. This is done with a Union-Find algorithm. Union-Find is a common way to find connected components. It implements two functions, union(A, B) and find(A). Union(A, B) unites the two sets which A and B belong to. Find(A) returns the set A belongs to. In this case each point u_i ∈ P is given a unique set with a corresponding label. Each set is then united with the one to the left and the one above if and only if D(u_i) = 1.

The result is that all points belonging to the same plane will have the same label. L is then used to gather all points into groups (planes) depending on their plane label. This is done on lines 5-6. From each group the corresponding plane parameters are estimated using PCA (lines 7-8). Finally each point is orthogonally projected into the plane it is grouped in.
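A minimal Union-Find labelling sketch for line 4 of algorithm 3, using row-major pixel indexing; requiring the neighbour to be flagged in D as well before merging is an assumption made here for illustration.

```python
def label_planes(D, width, height):
    """Merge each flagged pixel with its flagged left and upper neighbours; return the label map L."""
    parent = list(range(width * height))

    def find(a):                        # representative of the set a belongs to (path halving)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):                    # unite the sets of a and b
        parent[find(a)] = find(b)

    for y in range(height):
        for x in range(width):
            i = y * width + x
            if not D[y][x]:
                continue
            if x > 0 and D[y][x - 1]:
                union(i, i - 1)
            if y > 0 and D[y - 1][x]:
                union(i, i - width)
    return [find(i) for i in range(width * height)]
```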

3.2.5 Median filtering

An optional step before anything else is done is to median filter the frame. The intent is to reduce the noise in surfaces. A median filter is chosen instead of a low pass filter to avoid edge problems in the point cloud. Only the z-component (corresponding approximately to the depth value) of each point is filtered. The filter size is 5 pixels. This size was chosen because a larger filter was deemed to destroy too much detail in the point clouds, while a smaller size would be too small to deal with the large amount of noise that is present in most frames.
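A sketch of this filtering step, assuming the frame is stored as an organized (H, W, 3) array; scipy's median_filter is used here as a stand-in for whatever implementation was actually used.

```python
import numpy as np
from scipy.ndimage import median_filter

def median_filter_depth(points_xyz, size=5):
    """Median filter only the z-component of an organized (H, W, 3) point cloud."""
    filtered = points_xyz.copy()
    filtered[..., 2] = median_filter(points_xyz[..., 2], size=size)
    return filtered
```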

3.3 Evaluation

The problem with evaluating the proposed algorithm on the equipment used for this project is that it is closely tied to the real world. Because the noise and lens distortion are sometimes very extreme they are hard to model. The noise levels also vary considerably depending on the material of the surface. That raises the problem of how to establish a ground truth that can be compared to the algorithm output. One way to establish the ground truth is by using a motion tracking system, e.g. as in [Sturm et al., 2012] or [Huang et al., 2013]. There are two problems with this approach however. The first is that no such system will be available for this project. The second is that it is unfeasible to track the camera movement with such a system over an area as large as the one intended for this project.

Another way to perform evaluation is by running the algorithm on a benchmark dataset with known ground truth. This could provide a good starting point, but these datasets seem to mostly come from Kinect cameras, where the image is of a higher resolution than in the time-of-flight camera used in these experiments. Most of these datasets also seem to come from hand held cameras. Thus the whole system, with a lower resolution time-of-flight camera on a trolley, cannot be evaluated with these datasets. The ground truth for these datasets is typically recorded with motion tracking systems, as in [Sturm et al., 2012].

Evaluation of the system described in section 3.1 will instead be done similarly to [Henry et al., 2012]. They place landmarks that are easily identifiable in the generated map and measure the distances between them. These distances are then compared to the corresponding distances in the map. For this thesis no landmarks will be placed. Instead the locations of the camera will be marked in the real world, and the distance between these locations will be measured with a measuring tape. The estimated extrinsic camera parameters will be considered the output from the algorithm. The estimated pose can then be compared to the measured pose. Two types of tests will be performed:

The first type will measure how well the system handles rotations. This is done by rotating the trolley 360 degrees in place. Marks on the floor are used to make sure the trolley ends at the same location where it starts. Natural lines in the scene, such as angles in the wall and pillars, are used to make sure the trolley is correctly aligned in the start and the stop frame. The camera rotation should then be the same at the end of the sequence as in the beginning. Thus the camera rotation estimated by the SLAM method should also be the same in the end as in the beginning. The results of these tests are the deviation from the starting pose in radians.

The second type will test how much the estimated camera pose drifts from the real pose when moving through a corridor or a room. This is done by moving the trolley approximately 10-15 meters in the current test location and then measuring the exact distance traveled. As mentioned above, markers are placed by the trolley in the starting position and again in the final position. The distance between the markers is then measured using a measuring tape. This distance is compared to the final position of the camera in the SLAM system.

Since the ground truth measurements are not perfect the results will be biased. In the rotation tests the ground truth errors will consist of deviation from the starting point, both in terms of translation and rotation. The translation deviation causes a drift of the projection onto the image sensor. How far an object drifts in the sensor is proportional to its distance from the sensor. For example, consider an object which is at the edge of the FOV and 2 meters from the camera center, which is approximately the distance used in these tests. The FOV is 69°, which means the distance to the other edge of the FOV is 2.24 m. An illustration of the FOV with distances is shown in figure 3.4. The sensor width is 176 pixels, so each centimeter of drift to the side causes approximately 176/224 ≈ 0.79 pixels of drift on the sensor. Because of lens distortions this is not completely true, but it is close enough for small distances. The rotation deviation is caused by the camera not being perfectly rotated 360°. Again, the FOV is 69° and the sensor width is 176 pixels. Thus, an error of one pixel in the alignment due to rotation deviation causes a bias of 69/176 ≈ 0.39 degrees.

In these tests the translation is deemed to be within 5 cm of 0, causing an error of 5 pixels. In addition the misalignment due to erroneous rotation is deemed to be within 5 pixels. This adds up to a total error of at most 10 pixels. This error translates to 0.39 · 10 = 3.9 degrees, or 0.068 radians.

In the translation tests the ground truth errors are caused purely by measurement errors. These errors are deemed to be less than 0.1 m in these tests.

3 datasets of each type are recorded. They are then tested with different parameters in different combinations. The parameters tested are:

Figure 3.4: An illustration of the distances in the FOV.

• With/without median filtering
• With/without plane estimation
• 3D SLAM / 2D SLAM / 3D SLAM with regularization

4 Results

The results of the rotation tests are presented in table 4.1 and the results of the translation tests in table 4.2. The last rows, the mean errors, are also presented as bar graphs in figure 4.1 and figure 4.2. The different parameters are described in section 3.2. The tables present the results in increasing order of mean error magnitude. That is, the parameter combination which gives the smallest mean error is first in the table and the parameter combination which gives the largest mean error is presented last. In figure 4.1 and 4.2 the results are grouped so that each group corresponds to one combination of noise reduction methods, starting with no median filtering or plane estimation to the left and both median filtering and plane estimation to the right.

A few example images of the results have been put in Appendix A.

Table 4.1: The rotation test results for 3 datasets and 12 different method combinations are presented here, along with the mean error. The errors are in radians. The best result for each dataset is marked with an asterisk (*).

SLAM mode  Median filter  Plane estimation  rotateInOffice2  rotateInMeetingArea  rotating3  Mean error
2D         No             Yes               0.27875          0.26627              0.31873    0.28792
3D reg.    No             Yes               0.24520*         0.28801              0.33654    0.28992
2D         No             No                0.32450          0.23605*             0.33070    0.29708
3D reg.    Yes            Yes               0.31157          0.26870              0.37463    0.31830
2D         Yes            Yes               0.31769          0.28125              0.36476    0.32123
2D         Yes            No                0.31141          0.36200              0.33032    0.33458
3D reg.    Yes            No                0.26207          0.39965              0.34240    0.33471
3D         Yes            Yes               0.38167          0.31830              0.41038    0.37012
3D         No             Yes               0.31585          0.37487              0.44939    0.38004
3D reg.    No             No                0.32252          0.38672              0.44801    0.38575
3D         Yes            No                0.42748          0.46024              0.27295*   0.38689
3D         No             No                0.40430          0.30860              0.46308    0.39200

Table 4.2: The translation test results for 3 datasets and 12 different parameter settings are presented here, along with the mean error. The errors are in percent of the ground truth translation. The best result for each dataset is marked with an asterisk (*).

SLAM mode  Median filter  Plane estimation  corridorWalk1  corridorWalk2  officeWalk  Mean error
2D         No             Yes               8.77*          6.23           11.69       8.90
3D reg.    No             Yes               11.55          4.40           10.82       8.92
2D         No             No                15.47          3.92           10.21       9.87
3D         No             No                15.02          4.66           10.69       10.12
3D         No             Yes               15.54          3.01*          14.00       10.85
3D         Yes            Yes               16.99          5.06           11.26       11.10
3D reg.    Yes            Yes               16.68          4.90           11.94       11.17
3D reg.    Yes            No                17.47          5.66           10.93       11.35
2D         Yes            No                19.59          4.90           9.92*       11.47
3D         Yes            No                16.80          5.94           13.49       12.07
3D reg.    No             No                19.80          4.61           13.51       12.64
2D         Yes            Yes               16.50          6.82           14.88       12.73

Figure 4.1: The mean rotation errors are presented here. They are grouped by noise reduction method (with or without median filtering and plane estimation), with one bar each for 3D SLAM, 2D SLAM, and 3D SLAM with regularization. The errors are in radians. The black error bars show the standard deviation in the results.

Figure 4.2: The mean translation errors are presented here. They are grouped by noise reduction method (with or without median filtering and plane estimation), with one bar each for 3D SLAM, 2D SLAM, and 3D SLAM with regularization. The errors are in percent of the ground truth distance. The black error bars show the standard deviation in the results.

5 Discussion, analysis, and conclusions

In section 5.1 the results are analyzed and conclusions are drawn. This section also tries to answer the questions asked in section 1.3. Section 5.2 discusses the chosen methods and what could be done differently, and section 5.3 considers the contexts in which this work could be put into use and the ethical consequences of this. Finally, section 5.4 discusses ideas for continued work.

5.1 Results

We will first look at the different SLAM methods and then at the noise reduction methods. We will also take a look at how the results are affected by the bias from measurement errors in the ground truth.

The results show that 3D SLAM is inferior to the two other methods. This is at least very clear for the rotation error, while the translation errors vary more in the results. That 3D SLAM is inferior is expected, since this method does not use the a priori knowledge of how the camera is expected to move. 2D SLAM also performs slightly better than 3D SLAM with regularization in most cases, even though most of the time they are very similar. How similarly they behave can be adjusted with the C_r and C_t parameters. It is a bit surprising that 3D SLAM with regularization does not outperform 2D SLAM, since the floor is not always completely even and the camera thus should jump and twist a bit while moving. Apparently these movements are too small for the 3D SLAM with regularization to be necessary. In a less stable environment, where the vehicle perhaps moves faster, 3D SLAM with regularization would probably outperform 2D SLAM. C_r and C_t could even be adjusted according to the current speed and environment. 3D SLAM should probably only be used if the camera is moving freely in all 3 dimensions.

Median filtering apparently does more harm than good (compare groups 1 and 2, and groups 3 and 4, in figures 4.1 and 4.2). This is probably because it destroys details in the point cloud data while it cannot remove the noise in the planar surfaces as well as the plane estimation method. The plane estimation method is able to use many more points when reducing the noise in these areas, which improves the noise reduction capability. In areas with much structure, on the other hand, the median filter is likely to destroy this structure, while the plane estimation method will leave these areas untouched.

The measurement errors in the ground truth could be quite big, especially in the rotation test datasets. They are, however, still smaller than the errors in the methods by a factor of 4. Thus the conclusions drawn here are still valid, since the performance order is the same. The measurement errors in the translation test datasets are one order of magnitude smaller than the errors in the methods, and should therefore not affect the results significantly.

5.2 Method

There are some known techniques that have been left out of this thesis to save time but could have been used to improve the results. Instead of using the previous transform as a starting solution for the current one, a Kalman filter with a constant velocity or constant acceleration model could have been used, which should improve the conditions for the ICP algorithm. This would also make the method more similar to EKF SLAM, even though no landmarks would be tracked by the filter.

Regarding the evaluation of the algorithms there are some things to keep in mind. There is of course a risk of small measurement errors in the ground truth, resulting in a bias in the results. This is however considered to be negligible in this context. Another source of bias in the measurement results that could potentially have a bigger impact, however, is if the camera calibration is slightly wrong. A few tests have been done and it seems that this is not a big issue. These biases also do not affect the relative results between the algorithms, since the algorithm errors are deemed to be greater than the measurement and camera calibration errors.

It could have been interesting to create artificial data to test the algorithm with. This would not say much about how good the algorithm is on the actual hardware, but it could have given some information on where the errors appear, since the exact camera pose would be known in each frame. This could then be compared to the algorithm result for each frame.

5.3 The work in a larger context

More cameras in people's homes means a larger risk that these cameras are being used without the user's consent or knowledge. The cameras could be exploited both by the producer, who could for example sell data about the user to marketing companies as is being done with people's internet usage today, and by an outside party who could gain access to the camera illegally to spy on the user.

The more intelligent the camera or image processing software is, the more information can be gained from these activities and the more vulnerable the user becomes. For a SLAM algorithm to be effective in this context the camera has to move around, which seems unlikely (or at least easily noticeable). However, the smarter and cheaper cameras become, the more often they appear in people's homes.

Another aspect of SLAM is automation. Although this discussion ranges over many more contexts than SLAM, it is still relevant. Who is responsible for the actions of an automated unit? Is it the owner, the vendor, the producer or someone else? This question has to be answered, especially in the form of legislation, before automated vehicles can be used in public areas.

5.4 Continued work

There are many ways to improve the algorithm. Many of them are already known and were left out of this thesis for time constraint reasons. One example is the point picking part of the ICP algorithm. Right now points are chosen randomly. [Rusinkiewicz and Levoy, 2001] suggest normal space sampling instead, which improves the convergence rate, especially when dealing with mostly isotropic areas.

The normal estimation calculations could be sped up using integral images, for example as in [Holzer et al., 2012]. Even though speed has not been the focus of this thesis, normal estimation is one of the bottlenecks, along with the optimization in the ICP part of the algorithm.

It would also be interesting to investigate if tracking plane identities helps the ICP algorithm. Then planes would have to be merged as a final step of each iteration of the algorithm. The idea would then be to match whole planes against each other. This is partly done in [Salas-Moreno et al., 2014].

A Result images

This appendix contains a subset of the result images from the testing. The images have been chosen to cover the best and the worst results. Some of the images have been marked to highlight typical errors that appear. Figures A.1 and A.2 are two examples of relatively good results. Figures A.3, A.4, A.5, and A.6 are examples of bad results. Comparing figures A.1 and A.3 shows that it is hard to tell that errors have occurred if the camera has not looped back to a previous location.


Figure A.1: Result of the dataset corridorWalk1 with plane estimation and no median filtering, using 2D SLAM.

Figure A.2: Result of the dataset rotateInMeetingArea with plane estimation and no median filtering, using 2D SLAM. The two marked areas are really the same and should be overlapping.

Figure A.3: Result of the dataset corridorWalk1 with no plane estimation or median filtering, using 3D SLAM with regularization.

Figure A.4: Result of the dataset officeWalk with plane estimation and median filtering, using 2D SLAM.

Figure A.5: Result of the dataset rotateInOffice2 with plane estimation and no median filtering, using 3D SLAM with regularization. The areas marked with black are really the same and should be overlapping. The area marked with blue clearly shows how much the angle has drifted, since this should be a flat wall.

Figure A.6: Result of the dataset rotating3 with no plane estimation or median filtering, using 3D SLAM with regularization. The two marked areas are really the same and should be overlapping.


Bibliography

Sergio Almansa-Valverde, José Carlos Castillo, and Antonio Fernández-Caballero. Mobile robot map building from time-of-flight camera. Expert Systems with Applications, 39(10):8835–8843, aug 2012. doi: 10.1016/j.eswa.2012.02.006.

T. Bailey and H. Durrant-Whyte. Simultaneous localization and mapping (SLAM): part II. IEEE Robotics & Automation Magazine, 13(3):108–117, sep 2006. doi: 10.1109/MRA.2006.1678144.

Morgan Bengtsson. Indoor 3D Mapping using Kinect, 2014. URL http://www.diva-portal.org/smash/record.jsf?pid=diva2%3A716061&dswid=3455.

Y. Chen and G. Medioni. Object modeling by registration of multiple range images. In Proceedings of the 1991 IEEE International Conference on Robotics and Automation, pages 2724–2729, 1991. doi: 10.1109/ROBOT.1991.132043.

H. Durrant-Whyte and T. Bailey. Simultaneous localization and mapping: part I. IEEE Robotics & Automation Magazine, 13(2):99–110, jun 2006. doi: 10.1109/MRA.2006.1638022.

Felix Endres, Jurgen Hess, Nikolas Engelhard, Jurgen Sturm, Daniel Cremers, and Wolfram Burgard. An evaluation of the RGB-D SLAM system. In 2012 IEEE International Conference on Robotics and Automation, pages 1691–1696, may 2012. doi: 10.1109/ICRA.2012.6225199.

P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. The International Journal of Robotics Research, 31(5):647–663, feb 2012. doi: 10.1177/0278364911434148.

S. Holzer, R. B. Rusu, M. Dixon, S. Gedikli, and N. Navab. Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2684–2689, oct 2012. doi: 10.1109/IROS.2012.6385999.

Albert S. Huang, Abraham Bachrach, Peter Henry, Michael Krainin, Dieter Fox, and Nicholas Roy. Visual odometry and mapping for autonomous flight using an RGB-D camera, 2013.

Kamarulzaman Kamarudin, Syed Muhammad Mamduh, Ali Yeon Md Shakaff, and Ammar Zakaria. Performance analysis of the Microsoft Kinect sensor for 2D Simultaneous Localization and Mapping (SLAM) techniques. Sensors (Basel, Switzerland), 14(12):23365–87, jan 2014. doi: 10.3390/s141223365.

Samuli Laine and Tero Karras. Efficient sparse voxel octrees. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games - I3D '10, page 55, New York, New York, USA, feb 2010. doi: 10.1145/1730804.1730814.

S. Rusinkiewicz and M. Levoy. Efficient variants of the ICP algorithm. In Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, pages 145–152, 2001. doi: 10.1109/IM.2001.924423.

Joakim Rydell, Erika Bilock, and Håkan Larsson. Navigate into danger. In 2016 IEEE/ION Position, Location and Navigation Symposium - PLANS 2016, 2016.

Renato F. Salas-Moreno, Ben Glocken, Paul H. J. Kelly, and Andrew J. Davison. Dense planar SLAM. In 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 157–164, sep 2014. doi: 10.1109/ISMAR.2014.6948422.

Aleksandr V. Segal, Dirk Haehnel, and Sebastian Thrun. Generalized-ICP, 2009. URL http://www.robots.ox.ac.uk/~avsegal/resources/papers/Generalized_ICP.pdf.

Jürgen Sturm. RGB-D SLAM Dataset and Benchmark, 2014. URL http://vision.in.tum.de/data/datasets/rgbd-dataset.

Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 573–580, oct 2012. doi: 10.1109/IROS.2012.6385773.
