Institutionen för systemteknik
Department of Electrical Engineering

Master's Thesis (Examensarbete)

Simultaneous Localization And Mapping Using a Kinect™ In a Sparse Feature Indoor Environment

Master's thesis carried out in Automatic Control at the Institute of Technology, Linköping University

by
Fredrik Hjelmare & Jonas Rangsjö

LiTH-ISY-EX--12/4587--SE
Linköping 2012
Handledare (Supervisors): Christina Grönvall (ISY, Linköpings universitet), Johan Kinander (Cybercom), Erik Almqvist (Cybercom), Hannes Töllborg (Cybercom)
Examinator (Examiner): Fredrik Gustafsson (ISY, Linköpings universitet)
Division, Department: Automatic Control, Department of Electrical Engineering, SE-581 83 Linköping
Date: 2012-06-11
Language: English
Report category: Examensarbete (Master's thesis)
URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-78885
ISRN: LiTH-ISY-EX--12/4587--SE
Titel (Swedish title): Simultan lokalisering och kartering med hjälp av en Kinect™ i en inomhusmiljö med få landmärken
Title: Simultaneous Localization And Mapping Using a Kinect™ In a Sparse Feature Indoor Environment
Författare (Authors): Fredrik Hjelmare & Jonas Rangsjö
Keywords (Nyckelord): SLAM, RGBDSLAM, Kinect, TurtleBot, Sparse Feature Environment, Extended Kalman Filter, Robot Operating System
Abstract
Localization and mapping are two of the most central tasks when it comes to autonomous robots. They have often been performed using expensive, accurate sensors, but the fast development of consumer electronics has made similar sensors available at a more affordable price.
In this master thesis a TurtleBot™ robot and a Microsoft Kinect™ camera are used to perform Simultaneous Localization And Mapping, SLAM. The thesis presents modifications to an already existing open source SLAM algorithm. The original algorithm, based on visual odometry, is extended so that it can also make use of measurements from wheel odometry and a single axis gyro. Measurements are fused using an Extended Kalman Filter, EKF, operating in a multirate fashion. Both the SLAM algorithm and the EKF are implemented in C++ using the framework Robot Operating System, ROS.
The implementation is evaluated on two different data sets. One set is recorded in an ordinary office room which constitutes an environment with many landmarks. The other set is recorded in a conference room where one of the walls is flat and white. This gives a partially sparse featured environment.
The result of providing additional sensor information is a more robust algorithm. Periods without credible visual information do not make the algorithm lose track, and the algorithm can thus be used in a larger variety of environments, including those where the possibility to extract landmarks is low. The result also shows that the visual odometry can cancel out drift introduced by the wheel odometry and gyro sensors.
Acknowledgments
We would like to thank our supervisors at Cybercom, Johan Kinander, Erik Almqvist and Hannes Töllborg, for their support during this project. A thanks also goes to our supervisor Christina Grönwall and our examiner Fredrik Gustafsson at LiU. Finally we would also like to thank Johan Ivansson at Cybercom for giving us the opportunity to formulate and perform this thesis project.
Linköping, June 2012 Fredrik Hjelmare & Jonas Rangsjö
Contents
1 Introduction 1
1.1 Simultaneous Localization And Mapping . . . 1
1.2 Background . . . 2
1.3 Problem Formulation . . . 3
1.4 Limitations . . . 4
1.5 Cybercom . . . 4
1.6 Thesis Outline . . . 5
2 TurtleBot 7
2.1 System Overview . . . 7
2.2 iRobotCreate . . . 8
2.3 Kinect™ . . . 9
2.4 Robot Computer . . . 9
2.5 Desktop Computer . . . 9
2.6 Software Enclosed with TurtleBot . . . 9
2.7 Robot Operating System . . . 10
3 Algorithms 13
3.1 Under the Hood of a SLAM Algorithm . . . 13
3.2 RGBDSLAM . . . 15
3.3 Random Sample Consensus . . . 16
3.4 Feature Extraction . . . 18
3.4.1 Scale Invariant Feature Transform . . . 18
3.4.2 Speeded Up Robust Features . . . 18
3.5 Registration . . . 19
3.5.1 Iterative Closest Point - Point to Point . . . 19
3.5.2 Generalized Iterative Closest Point . . . 20
3.6 General Graph Optimization . . . 21
4 Models and Filtering 23
4.1 Notation . . . 23
4.2 Motion Models . . . 25
4.2.1 Original Motion Model . . . 25
4.2.2 2D Constant Velocity Model . . . 26
4.2.3 2D Coordinated Turn Model . . . 26
4.3 Measurement Models . . . 27
4.3.1 Wheel Odometry . . . 27
4.3.2 Gyro . . . 28
4.3.3 Visual Odometry . . . 28
4.3.4 Total Model . . . 29
4.4 Extended Kalman Filtering . . . 30
4.4.1 Filter and Model Evaluation . . . 31
5 Implementation and Evaluation Methods 35
5.1 Filter Integration in RGBDSLAM . . . 35
5.1.1 Covariance of Visual Odometry . . . 36
5.2 Data Collection and Post Processing . . . 37
5.3 Evaluation Criteria . . . 39
5.3.1 Ground Truth Matcher . . . 42
6 Results 45
6.1 Accuracy of Sensors . . . 45
6.2 Evaluation of the SLAM Algorithms . . . 49
6.2.1 Original Algorithm. Figure-Eight Data Set . . . 49
6.2.2 Original Algorithm. Few Landmarks Data Set . . . 52
6.2.3 Modified Algorithm. Few Landmarks Data Set . . . 54
6.2.4 Modified Algorithm. Figure-Eight Data Set . . . 58
6.2.5 Summary of Match Values . . . 64
6.3 3D Maps . . . 64
6.3.1 Original Algorithm. Figure-Eight Data Set . . . 64
6.3.2 Original Algorithm. Few Landmarks Data Set . . . 67
6.3.3 Modified Algorithm. Few Landmarks Data Set . . . 69
6.3.4 Modified Algorithm. Figure-Eight Data Set . . . 71
7 Discussion 73
7.1 Discussion of the Results . . . 73
7.2 Future Work . . . 74
Bibliography 77
A Appendix 81
A.1 Multivariate Gaussian Distribution . . . 81
A.2 Derivatives of Motion Models . . . 81
A.2.1 2D Constant Velocity . . . 82
A.2.2 2D Coordinated Turn . . . 82
1 Introduction
This chapter gives a background to Simultaneous Localization And Mapping, SLAM, and outlines the project by setting up the problem formulation. It also presents the limitations of the project and ends with a thesis outline.
1.1 Simultaneous Localization And Mapping
SLAM is an umbrella term for algorithms that build maps of an unknown environment and at the same time perform localization in that area. The method is often used by robots and other vehicles and is a first step towards making them autonomous. Once a map of the area is created and the vehicle knows its position with respect to the map, the next step, the navigation and route planning step, can take place. As a third step, smart decisions, dependent on the information available to the vehicle, can be made, and the vehicle may then be regarded as truly autonomous. Figure 1.1 shows the possible look of a small map generated using SLAM.
Figure 1.1: A map with a TurtleBot.
1.2 Background
Research in the area of SLAM has been on the agenda of several universities and other institutions worldwide over the last decades. The continuous improvement of the computational capacity of computers has made it possible for SLAM applications to be performed outside of pure test environments, and in recent years even outdoors [Bailey, 2002][Newman et al., 2006][Granström and Callmer, 2008][Kümmerle et al., 2011b]. SLAM has often been performed using a laser scanner, which is an accurate but expensive sensor. In this project the laser scanner is replaced by a Kinect™ camera, which is a much cheaper, but still qualified, sensor unit.
The applications of SLAM are numerous. If mapping of a hazardous or poisonous environment is required, a robot with SLAM-based navigation is an excellent option. Some concrete examples are indoor mapping of houses on fire or mapping of damaged nuclear reactors. Another interesting case is monitoring of electrical power lines. That kind of work might be expensive and time consuming for humans to do, but probably cheaper when using a UAV (unmanned aerial vehicle). The UAV can simply follow the power lines autonomously and at the same time check them for damage.
The research area of SLAM has matured during the past years, and the 2D-SLAM problem using a laser scanner has been implemented in so many cases that it may now be considered solved [Hertzberg et al., 2011]. The field of visual SLAM, i.e. using visual images rather than measurements from wheel sensors or IMUs (Inertial Measurement Units), has grown over the last years. Contributing to this development is a large variety of open source components, aiming at separate tasks in the SLAM machinery [Hertzberg et al., 2011].
Over the years different types of sensors have been used to perform visual SLAM, e.g. 2D laser scanners, mono cameras and stereo cameras [Sturm et al., 2011]. The development in the robotic community has lately shifted towards the use of sensors that are cheaper, lighter, smaller and thus often less accurate. This change allows for the use of UAVs to a greater extent [Steder et al., 2008] and also increases the potential of getting SLAM-based consumer products, e.g. autonomous vacuum cleaners, on the market.
The work load of implementing an embedded SLAM system has decreased as the number of open source packages has increased. The Point Cloud Library, PCL, [Rusu and Cousins, 2011] is a standalone, large scale, open project for processing of 3D point clouds, which comes in handy when working with applications concerning 3D SLAM. Another neat package is OctoMap, a library used to describe mapped environments in a way that is both probabilistic and scalable [Wurm et al., 2010].
1.3 Problem Formulation
In this project SLAM will be performed using a Kinect™, instead of a laser scanner, as the primary sensor. With a Kinect™ the sensor data cannot be expected to be as accurate as from a laser. Laser scanners are well known to have high accuracy in range. The Kinect™ has a maximum range of about eight meters, while a laser scanner often has a maximum range of 50-80 meters.
Apart from being cheaper, the Kinect™ has other advantages compared to a laser scanner. Its measurements are three dimensional (range and bearing in two directions) while the data from a laser scanner is two dimensional (range and bearing). This makes the data more information dense. In addition, the Kinect™ also has an RGB camera which captures the surroundings at video pace. This gives the possibility to get 3D maps with coloured textures.
The ability to recognize similarities between images is a key factor for a visual SLAM algorithm to be successful. Without that ability the algorithm is lost. In this project the performance of indoor visual SLAM using a TurtleBot will be evaluated with the following questions as guidelines:
• How does an indoor environment with few recognizable objects in the images affect the performance of the algorithm?
• What adjustments can be suggested to improve the result in such an environment?
• Can a motion model improve the performance of the algorithm?
1.4 Limitations
In this project the Kinect™ will always be mounted on a TurtleBot, and not be used in combination with any other robot platform. The different algorithms for e.g. visual feature detection, pose estimation or map building will all come from open source libraries, but it is in the scope of this project to make adjustments to them to fit our purposes. Furthermore, the whole robot platform and much of its functionality will be based on ROS, and any possible limitations in ROS will consequently be reflected in the evaluations. It is not in the scope of this project to add any additional sensors to the robot, nor will the existing hardware be modified in any way.
Measurements will be performed indoors, in order not to expose the Kinect™ to too much IR radiation, which would affect the measurements from the depth sensor. When data is collected there will be no moving objects present in the environment, since such disturbances might affect the algorithm in an unpredictable way.
There are no demands on real time performance of the algorithms in this project. Real time performance here refers to the ability to map a room at the same time as the robot moves around. It also means that all computations to build the map are completed at such a pace that the robot can move around freely in the environment and still have the most recent estimate of the surroundings at most one second after the corresponding image was captured. Even though this project does not intend to fulfill any real time performance goals, some time might be spent on trying to achieve real time performance of the system.
1.5 Cybercom
The master thesis project is carried out at Cybercom in Linköping. Cybercom is a consultancy focused on advanced IT and telecom solutions. The company was founded in 1995 and has been quoted on the NASDAQ OMX Nordic exchange since 1999. Today the company has around 1700 employees and offices in Linköping, Gothenburg, Stockholm, Denmark, Finland and Singapore, to name a few places. Furthermore, Cybercom is the main sponsor of "Y-sektionen" (the Master of Science Programme in Applied Physics and Electrical Engineering at LiTH). The office in Linköping has around 60 employees working with large as well as small companies represented in the region.
1.6 Thesis Outline
The thesis continues with Chapter 2, which gives an introduction to the TurtleBot's hardware and software. Chapter 3 explains the concept of SLAM in more detail and presents the open source SLAM algorithm, RGBDSLAM, used in this project. Chapter 4 describes the modeling and filtering used to extend the original RGBDSLAM algorithm. Methods for data collection and evaluation of the results are found in Chapter 5. Finally, the results are presented in Chapter 6 and a discussion is found in Chapter 7.
2 TurtleBot
This chapter starts with an introduction to the hardware and software directly related to the TurtleBot and continues with a short presentation of the vast open source library ROS, which holds much of the base functionality of the TurtleBot.
2.1 System Overview
TurtleBot is a robot kit that provides hardware and software for development of robot algorithms. The hardware consists of an iRobotCreate, a Kinect™ camera and a 150 degrees/second single axis gyro. An ordinary laptop is mounted on the TurtleBot and will be referred to as the robot computer. The software runs on both the robot computer and a more powerful desktop computer. A complete list of the hardware in a TurtleBot is given below; the list relates to Figure 2.1.
A. Mobile Base and Power Board
• iRobotCreate
• 3000 mAh Ni-MH battery pack • 150 degrees/second single axis gyro
• 12 V 1.5 A software enabled power supply (for powering the Kinect™)
B. 3D Sensor
• Microsoft Kinect™
• Kinect™ power board adapter cable
C. Computing: ASUS 1215N
• Processor: Intel® Atom™ D525 dual core
• Memory: 2 GB
• Graphics: NVIDIA® ION™ discrete graphics processor
• Internal hard drive: 250 GB
D. TurtleBot Hardware
• Kinect™ mounting hardware
• TurtleBot structure
• TurtleBot module plate with 1 inch spacing hole pattern
Figure 2.1: Hardware setup for the TurtleBot robot kit.
2.2 iRobotCreate
iRobotCreate is an affordable mobile robot platform for educators, students and developers. It has over 30 sensors which react to both internal and external events, and it is built for indoor use only. In this project the interesting sensors in the iRobotCreate are the single axis gyro and the left and right wheel encoders.
2.3 Kinect™
The Kinect™ camera is a low cost sensor built specially for the Xbox console. It can deliver an RGB image and a depth image in parallel at video rate. Microsoft has not released any official hardware specifications for the Kinect™ sensor, and therefore unofficial specifications from reverse engineering are the most accessible way to get insight into the machinery. The OpenKinect community [http://openkinect.org, 2012] states that the Kinect™ has two cameras, one RGB camera and one range camera. The latter is based on structured light. According to PrimeSense, the manufacturer of the micro controller in the Kinect™, the projected IR points are processed by a PS1080A micro controller to produce a depth image. From the design reference for the PrimeSense PS1080A it can be inferred that:
Field of view (horizontal, vertical, diagonal): 58° H, 45° V, 70° D
Spatial x/y resolution (at 2 m distance from sensor): 3 mm
Depth z resolution (at 2 m distance from sensor): 1 cm
Operation range: 0.8 m to 3.5 m
2.4 Robot Computer
The robot computer enclosed with the TurtleBot is an ASUS eeePC, a lightweight netbook. The main task of the robot computer is to collect data and send it to a desktop computer or record it to a hard drive. The robot computer also interacts with the hardware in the TurtleBot. Due to the computational power of the enclosed robot computer, some calculations can also advantageously be performed on this computer, instead of letting the desktop computer do all the work.
2.5 Desktop Computer
In this thesis an HP EliteBook 8460p laptop will be used as the desktop computer. The computer has an Intel Core i7 (2.8 GHz) CPU, 8 GB internal memory and an Intel® HD Graphics 3000 integrated graphics card. The main task of the desktop computer is to do computations on data provided by the robot computer. Another task is to visualize the result of the algorithms, a task that demands a powerful computer in order to run smoothly.
2.6 Software Enclosed with TurtleBot
An SDK for the TurtleBot, including all software connections between ROS, iRobotCreate and Kinect™, and all transforms between coordinate systems fixed in different parts of the TurtleBot.

A development environment for the desktop, including monitoring software and tutorials for development.

Libraries for visualization, planning, control and error handling, which are specific plug-ins for the TurtleBot in Rviz, a visualization application for ROS described in Section 2.7. The visualization application for TurtleBot can be seen in Figure 2.2. Here also the included package pointcloud_to_laserscan is used.

Demo application examples. The TurtleBot comes with a custom made version of gmapping for easy indoor mapping and planning. This application uses the Kinect™ to simulate a laser scan and together with odometry and gyro data performs 2D-SLAM in real time.
Figure 2.2: Example of the included visualization tool Rviz. The 3D TurtleBot robot model is included with the TurtleBot package.
2.7 Robot Operating System
ROS constitutes the base of the TurtleBot system and defines the means of external communication when building specialized applications. On the ROS homepage [http://www.ros.org, 2012], one can read the following:
"ROS provides libraries and tools to help software developers create robot applications. It provides hardware abstraction, device drivers, libraries, visualizers, message-passing, package management, and more. ROS is licensed under an open source, BSD license".
The ROS version used in this project is ROS Electric Emys.
ROS is built up by stacks, which in turn are built up by packages. A stack delivers the functionality of a whole system or major parts of a system, e.g.
• the TurtleBot stack, which provides code for the TurtleBot,
• the ros_comm stack, which provides internal communication in ROS, and
• the perception_pcl stack, which integrates the Point Cloud Library into ROS.
The packages of this project, new and modified, would be suitable to gather into a stack. A stack typically contains everything from a handful up to a couple of dozen packages.
A ROS package often holds the functionality of a smaller, more delimited task than a stack, e.g. the navigation stack contains various packages that do localization and route planning. Every package contains at least one node. A node typically solves one, or a few, separate tasks, e.g.
• the robot_pose_ekf node does EKF filtering,
• the pcd_viewer node provides a GUI for rendering point clouds, and
• the image_proc node can rectify RGB and depth images.
Nodes are started using the rosrun or roslaunch command. The first command starts one single node, whereas the latter can start multiple nodes at the same time using a specified launch file. The launch file also has the neat property that values of dynamic parameters may be supplied to the nodes. This results in an accessible way of troubleshooting and tuning without recompiling the C++ code.
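A minimal sketch of such a launch file is given below; the package, node and parameter names are invented for the example:

```xml
<launch>
  <!-- Start a (hypothetical) data collection node from a package "my_slam". -->
  <node pkg="my_slam" type="collector" name="collector" output="screen">
    <!-- Dynamic parameter read by the node at startup; changing it here
         requires no recompilation of the C++ code. -->
    <param name="frame_rate" value="10" />
  </node>
  <!-- A second node can be started from the same launch file. -->
  <node pkg="my_slam" type="filter" name="ekf_filter" />
</launch>
```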
A ROS message is a data structure that can be sent between nodes. Messages can hold any information and ROS comes with useful predefined messages. User defined messages are also allowed, and the ROS framework is prepared so that defining user specific messages is easily done.
The communication between nodes is handled either by a topic or a service. A topic implements an asynchronous communication pattern. A message is posted on the topic and the posting node continues its work. The nodes that listen to the topic, the subscribers, get a callback and can then process the information in the posted message. A service implements a synchronous communication model, and any service call is thus blocking for the calling node and requires the immediate attention of the answering node. What defines a service in ROS is the input message type, the service function to be run and the output message type. To define a new service one simply specifies the message types to use in a text file, and a corresponding service class is automatically generated.
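As an example, the classic ROS tutorial service AddTwoInts is defined by a text file with the input fields above a `---` separator and the output fields below it:

```
int64 a
int64 b
---
int64 sum
```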
The ROS framework also has a neat command named rosbag that can record and play back a bag file. A bag file contains the messages published on all the topics specified when calling the rosbag record command.
3 Algorithms
In this chapter the functionality of a generic SLAM algorithm is explained. One of the most central open source packages in this project, RGBDSLAM, is also presented.
3.1 Under the Hood of a SLAM Algorithm
A SLAM algorithm essentially consists of the following steps:
• Data acquisition; in this step measurements from the sensors, e.g. laser scanner or video camera, are gathered.
• Feature extraction; a number of characteristic, and thereby easily recognizable, landmarks are selected from the data set.
• Feature association; landmarks from previous measurements are associated with landmarks from the most recent measurement.
• Pose estimation; the relative change between the landmarks and the position of the vehicle is used to estimate the new pose of the vehicle.
• Map adjustment; the map is updated according to the new pose and the corresponding measurements.
The five tasks are continuously repeated, and a trajectory of position estimates and a map are built up. Figure 3.1 illustrates the process. In Figure 3.1a the robot has received input data. Landmarks are extracted from the data. In the case of
1. Landmarks: in this report Landmarks, Keypoints and Features are synonymous.
visual SLAM, a landmark can be anything that is easily recognizable by a visual sensor, e.g.
• a corner,
• an edge,
• a dot in a protruding color.
The robot in Figure 3.1a has extracted three landmarks and these are stored in a database containing all landmarks that have been observed so far. In Figure 3.1b the robot has moved forward and received new data. The robot extracts landmarks from the data and searches through its database to see if there are any matches with old landmarks. Extracted landmarks that are not found in the database are added, and landmarks giving a match in the database are used to estimate the change of the robot's pose. This is done by measuring the change in distance and angle to the old landmarks. When the new pose is estimated the robot uses this estimate and the measurements to adjust the positions of the landmarks. SLAM can be regarded as a chicken-and-egg problem: a proper map is needed to get a proper pose estimate, and a proper pose estimate is needed to get a proper map.
(a) Robot position and landmarks at time t0.
(b) Robot position and landmarks at time t1 > t0. Compared to Figure 3.1a the robot has found two new landmarks and one of the old landmarks has gone out of sight.
3.2 RGBDSLAM
This thesis project is based on an algorithm called RGBDSLAM, which is a six degrees of freedom (x, y, z, roll, pitch and yaw) algorithm that performs visual SLAM (VSLAM). The algorithm uses both the color (RGB) and the depth (D) information and is generic in the sense that it contains no motion model. The algorithm is developed by researchers from the University of Freiburg in Germany and is described in Engelhard et al. [2011] and Endres et al. [2011]. An implementation of the algorithm is provided as a package in ROS. A schematic overview of the algorithm can be studied in Figure 3.2.
Figure 3.2: Schematic overview of the RGBDSLAM algorithm provided as a package in ROS. The Kinect™ sensor delivers a depth image and an RGB image; point clouds are made from the depth data and matched with the RGB data; features are extracted from the RGB image (SURF, SIFT, ORB); a pairwise 6D transformation is estimated (RANSAC, GICP) against features and corresponding point clouds from the database; global optimization/loop closure (g2o) yields the registered point cloud.
As a first step the depth and RGB images are collected with synchronized timestamps. Then features are extracted from the RGB image by a feature extraction algorithm. RGBDSLAM has multiple feature extraction algorithms implemented. The implementations have different pros and cons in different environments, and they differ in computation time [Juan and Gwun, 2009]. The algorithms implemented are SURF, SIFT and ORB, and the first two are described in Section 3.4. In the next step of the algorithm, extracted features are projected onto the depth image. This step introduces some uncertainty into the chain of operations, mainly due to the synchronization mismatch between depth and RGB images, but also because of interpolation between points with large differences in depth. The fact that a minor misprojection of a feature lying on an object border onto the depth image can result in a big depth error makes features picked at object borders unreliable.
To find a 6D transform for the camera position in this noise, the RANSAC algorithm is used. Features are matched with earlier extracted features from a set of 20 images in the standard configuration. The set consists of a subset including some of the most recently captured images and another subset of images randomly selected from the set of all formerly captured images. Three matched feature pairs are randomly selected and used to calculate a 6D transform. All feature pairs are then evaluated by their Euclidean distance to each other. Pairs whose Euclidean distance is below a certain threshold are counted as inliers. From these inliers a refined 6D transform is calculated using GICP.
3.3 Random Sample Consensus
RANdom SAmple Consensus, or RANSAC for short, is an iterative algorithm used to adapt the parameters of a mathematical model to experimental data. RANSAC is a suitable method when a data set contains a high percentage of outliers, i.e. measurements that suffer from measurement errors so large that their validity is low. The method was first presented in the early eighties by Fischler and Bolles [1981], where it was suggested as a suitable method for automated image analysis.
Assume a mathematical model that has n free parameters which can be estimated given a set of measurements, P. The number of measurements in P has to be greater than n, #P > n. Let S and T be two different varying subsets of P. Given these assumptions the RANSAC algorithm works as follows:
• Randomly select a subset of the measurements in P and call it S. Use S to make a first estimate of the n free parameters of the model.
• Use the current estimate of the model to select a new subset of points, T, from the measurements that are within some error tolerance from the model.
• If T contains more measurements than a given limit, re-estimate the free parameters of the model according to this new subset. Calculate a measure of how well T and the model coincide, store that value and select a new subset S.
• If T does not contain more measurements than the given limit, randomly select a new subset S from P and start all over again.
• If none of the selected subsets holds more measurements than the limit, exit in failure, or if the maximum number of iterations has been reached, exit.
The method is characterized by three parameters:
1. The error tolerance used to determine when data is not part of the model.
2. The maximum number of iterations in the algorithm.
3. The minimum value on the number of measurements in a subset to be used for parameter estimation.
The big advantage of RANSAC is the robust way it handles outliers in the data set. The drawbacks of RANSAC are that it does not guarantee any solution, nor that a given solution is optimal. Furthermore, the three parameters mentioned above are to a large extent problem specific, which means that experimental adjustment to the specific case treated is required.
(a) Example of measurements.
(b) Measurements as seen by RANSAC with the model being a line. Stars are considered as outliers and dots are considered as inliers.
Figure 3.3: Illustration of how RANSAC adapts a linear model to a data set.
3.4 Feature Extraction
The two commonly used feature extractors Scale Invariant Feature Transform, SIFT, and Speeded Up Robust Features, SURF, are briefly described below.
3.4.1 Scale Invariant Feature Transform
SIFT is a method for extracting and detecting landmarks in an image. The method was invented in 1999 [Lowe, 1999] and has since been widely used. The algorithm is, as the name suggests, invariant to scale transformations in images and also to rotational transformations. SIFT is much appreciated for its reliability and repeatability, which are important properties of a feature extractor/detector [Juan and Gwun, 2009]. The features are selected at maxima and minima of the color in an image that has first been smoothed and then filtered through a difference of Gaussians filter (DoG). The continuous two dimensional difference of Gaussians convolution kernel is

$$f(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2+y^2)/(2\sigma^2)} - \frac{1}{2\pi K^2\sigma^2} e^{-(x^2+y^2)/(2K^2\sigma^2)}, \tag{3.1}$$
where K is a constant bigger than one which scales the standard deviation. The DoG filter creates two versions of the original image; one that is somewhat more blurred than the other. The blurring is carried out using approximate Gaussian filtering kernels. Then the most blurred image is subtracted from the other one and maxima and minima in the resulting image are detected. Detected points with low contrast or points lying along edges are discarded. Each detected key-point is characterized by the magnitude and rotation of the pixelwise image
gra-dient. Here Aij represents the value of each pixel, Mij is the magnitude of the
gradient and Rij is the orientation of the gradient.
Mij =
q
(Aij−Ai+1,j)2+ (Aij −Ai,j+1)2 (3.2)
Rij = arctan2(Aij−Ai+1,j, Ai,j+1−Aij) (3.3)
The above calculated values are post processed to increase the robustness to illumination changes.
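As an illustration, equations (3.2) and (3.3) can be evaluated directly on a grayscale image. The sketch below is a minimal example, not the thesis implementation, and assumes a row-major image stored as nested vectors:

```cpp
#include <cmath>
#include <vector>

struct Gradient { double magnitude, orientation; };

// Gradient magnitude and orientation at pixel (i, j) according to
// equations (3.2) and (3.3): finite differences toward the pixel
// below, (i+1, j), and to the right, (i, j+1). A is a grayscale image.
Gradient pixelGradient(const std::vector<std::vector<double>>& A,
                       std::size_t i, std::size_t j) {
    double dv = A[i][j] - A[i + 1][j];   // A_ij - A_{i+1,j}
    double dh = A[i][j] - A[i][j + 1];   // A_ij - A_{i,j+1}
    Gradient g;
    g.magnitude   = std::sqrt(dv * dv + dh * dh);            // (3.2)
    g.orientation = std::atan2(dv, A[i][j + 1] - A[i][j]);   // (3.3)
    return g;
}
```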
3.4.2 Speeded Up Robust Features
The feature detection algorithm SURF is partly inspired by SIFT and was first presented in Bay et al. [2006]. The method aims at being a fast yet reliable algorithm. It is both scale- and rotation invariant and handles blur and changes in illumination in a satisfying manner [Juan and Gwun, 2009]. The filters used in SURF are discretized and cropped, which results in rectangular filters with integral values. In this way the computational load is reduced and the extraction or detection of features is sped up.
3.5 Registration
In order to build a map of several point clouds, registration is necessary. Registration is the process of finding the best possible transformation to stitch two point clouds together. The point clouds are called destination, D, and source, S. This section presents some of the most used and well-known registration methods. In Figure 3.4 an example of registration between two point clouds is shown.
Figure 3.4: Example of registration between two point clouds. The overlap of the point clouds is seen most easily in the bottom of the figure. There is approximately 80% overlap.
3.5.1 Iterative Closest Point - Point to Point
Iterative Closest Point (ICP) is one of the most intuitive registration methods. The algorithm has been widely used and was developed in the early 90’s. A frequently cited analysis of the algorithm is Besl and McKay [1992]. The algorithm can be summarized in two key steps.
1. Find matching feature points between the two point clouds.
2. Compute a transform, T, consisting of a rotation and a translation, that minimizes the Euclidean distance between the matching points.
Iterating these two steps typically gives convergence to the desired transform. A key parameter to tune in the algorithm is the maximum matching distance threshold, e_max. With a complete overlap of the two point clouds a high value of e_max can be used and the algorithm will still converge. The parameter e_max has to be low if matching points only come from a small overlap; the choice of e_max is thus a trade-off between convergence and accuracy. Segal et al. [2009] present the algorithm as in Algorithm 1.
Algorithm 1 Standard Iterative Closest Point algorithm
Input: Two point clouds: D = {d_i}, S = {s_i}
1: Initial guess: transform T_0
Output: Correct transform T
2: T = T_0
3: while not converged do
4:   for i = 1 to N do
5:     d_i = FindClosestPointInD(T · s_i)
6:     if ||d_i − T · s_i|| < e_max then
7:       w_i = 1
8:     else
9:       w_i = 0
10:    end if
11:  end for
12:  T = argmin_T Σ_i w_i ||T · s_i − d_i||²
13: end while
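As a concrete, simplified instance of Algorithm 1, the sketch below implements point to point ICP in 2D, with a brute force closest point search and a closed form least squares estimate of the rigid transform in the argmin step. It is an illustrative reimplementation under stated assumptions, not the thesis code; the 2D restriction is made here for brevity.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Pt { double x, y; };

// A 2D rigid transform: rotation theta followed by translation (tx, ty).
struct Transform2D {
    double theta = 0, tx = 0, ty = 0;
    Pt apply(const Pt& p) const {
        return {std::cos(theta) * p.x - std::sin(theta) * p.y + tx,
                std::sin(theta) * p.x + std::cos(theta) * p.y + ty};
    }
};

// Point-to-point ICP (Algorithm 1, restricted to 2D): match each
// transformed source point to its nearest destination point, reject
// matches beyond e_max (w_i = 0), then re-estimate the transform in
// closed form from the centred matched point sets.
Transform2D icp2d(const std::vector<Pt>& dst, const std::vector<Pt>& src,
                  double e_max, int iterations) {
    Transform2D T;
    for (int it = 0; it < iterations; ++it) {
        std::vector<Pt> s, d;
        for (const Pt& sp : src) {
            Pt ts = T.apply(sp);
            double best = std::numeric_limits<double>::max();
            Pt bestp{0, 0};
            for (const Pt& dp : dst) {
                double dd = std::hypot(dp.x - ts.x, dp.y - ts.y);
                if (dd < best) { best = dd; bestp = dp; }
            }
            if (best < e_max) { s.push_back(sp); d.push_back(bestp); }
        }
        if (s.empty()) break;
        // Centroids of the matched sets.
        Pt cs{0, 0}, cd{0, 0};
        for (std::size_t i = 0; i < s.size(); ++i) {
            cs.x += s[i].x; cs.y += s[i].y; cd.x += d[i].x; cd.y += d[i].y;
        }
        cs.x /= s.size(); cs.y /= s.size(); cd.x /= d.size(); cd.y /= d.size();
        // Closed-form least-squares rotation and translation.
        double num = 0, den = 0;
        for (std::size_t i = 0; i < s.size(); ++i) {
            double ax = s[i].x - cs.x, ay = s[i].y - cs.y;
            double bx = d[i].x - cd.x, by = d[i].y - cd.y;
            num += ax * by - ay * bx;
            den += ax * bx + ay * by;
        }
        T.theta = std::atan2(num, den);
        T.tx = cd.x - (std::cos(T.theta) * cs.x - std::sin(T.theta) * cs.y);
        T.ty = cd.y - (std::sin(T.theta) * cs.x + std::cos(T.theta) * cs.y);
    }
    return T;
}
```

For a destination cloud that is a small translation of the source, the nearest-neighbour matches are correct in the first pass and the closed-form step recovers the transform immediately, illustrating the convergence behaviour discussed above.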
3.5.2 Generalized Iterative Closest Point
A far more advanced algorithm for registration is Generalized ICP, GICP, first presented by Segal et al. [2009]. It can be thought of as a plane to plane matching algorithm. By assigning probabilistic properties to extracted features a maximum likelihood estimate of the transform can be achieved. Furthermore, the probabilistic framework can make use of general research in probabilistic techniques for increased robustness, such as outlier rejection. The probabilistic model assumes that the points in the measured point clouds D̂ = {d̂_i} and Ŝ = {ŝ_i} are independently Gaussian distributed. This generates the probabilistic point clouds D and S with points d_i ∼ N(d̂_i, C_i^D) and s_i ∼ N(ŝ_i, C_i^S). Here C_i^D and C_i^S are the three dimensional covariances associated with the measured points.
By defining T* as the correct transform when perfect compliance between points occurs, it is known that

ŝ_i = T* d̂_i. (3.4)

Defining the function

e_i(T) = s_i − T d_i (3.5)

and considering this function evaluated with T*, as in equation (3.5), gives the probabilistic function

e_i(T*) ∼ N( ŝ_i − T* d̂_i, C_i^S + T* C_i^D (T*)^T ). (3.6)
This is the objective function for optimization, and a maximum likelihood estimate can be found by solving

T = argmax_T ∏_i p(e_i(T)) = argmax_T Σ_i log(p(e_i(T))),

where p is a Gaussian probability density function for a three dimensional variable with independent elements, see Appendix A.1. This can be simplified to

T = argmin_T Σ_i e_i(T)^T ( C_i^S + T C_i^D T^T )^{−1} e_i(T). (3.7)
The solution of Equation (3.7) gives a transform that optimizes the point cloud matching in a probabilistic manner. The equation also reveals that this truly is a Generalized ICP algorithm. By setting

C_i^S = I,
C_i^D = 0,

the standard point to point ICP algorithm is obtained. In this case Equation (3.7) becomes

T = argmin_T Σ_i e_i(T)^T e_i(T) = argmin_T Σ_i ||e_i(T)||²,

which coincides with the point to point ICP algorithm for matching features (w_i = 1), see line 12 in Algorithm 1.
3.6 General Graph Optimization
The General Graph Optimization package (abbreviated g2o) is an open source library which implements a general and efficient graph optimizer. A graph optimizer minimizes a nonlinear error function represented as a graph, see Figure 3.5. The nodes represent vectors of parameters and the edges represent how well two parameter vectors match an external constraint relating them. In the case where g2o is used by RGBDSLAM, each node in the graph represents a state variable of the robot and each edge represents a pairwise observation between the nodes that it connects. The meaning of a pairwise observation is that node B observes its pose with respect to node A, and vice versa. The algorithm is fast due to, among other things, its use of the sparse connectivity property of the graph and advanced solvers for sparse linear systems. The interested reader may refer to Kümmerle et al. [2011a] for a more thorough presentation.

Figure 3.5: To the left a graph with nodes x_1, ..., x_4 and edges e_12, e_23, e_24, e_31, and to the right the corresponding nonlinear error function F(x) = e_12^T I_12 e_12 + e_23^T I_23 e_23 + e_24^T I_24 e_24 + e_31^T I_31 e_31, where I_ij is the information matrix.
4 Models and Filtering
So far this thesis has given an overview of the system and an introduction to the open source algorithm RGBDSLAM. In an attempt to improve the performance of the system, according to the guidelines stated in Section 1.3, three steps are taken:
1. Wheel odometry and gyro data is measured.
2. An appropriate motion model for the robot is implemented.
3. Already existing measurements from visual odometry are fused with wheel odometry, gyro measurements and the motion model using an Extended Kalman Filter.
These three modifications result in an algorithm that in the following will be called the modified algorithm as opposed to the original algorithm.
This chapter introduces the mathematical models used to formulate the SLAM problem in a rigorous manner. The models consist of both motion models that describe the motion of the robot and measurement models that demonstrate how the sensor data is inserted into the framework. The chapter continues with an explanation of key properties of the filter that fuses the motion model and the measurement models.
4.1 Notation
The state vector of the robot at time instance k is called x_k and holds a three dimensional pose, i.e. a three dimensional position and a three dimensional orientation,

x_k = [ X_k  Y_k  Z_k  φ_k  θ_k  ψ_k ]^T. (4.1)
Here X_k, Y_k and Z_k represent the position of the robot in a world fixed coordinate system and φ_k, θ_k and ψ_k, the roll, pitch and yaw angles, are defined as rotations around the world fixed X, Y and Z axes, see Figure 4.1. The robot speed is denoted v_k and is, in the case of 2D movement, the speed in the X- and Y-direction. The angular velocity around the world fixed Z axis, i.e. the time derivative of the yaw angle, is denoted ω_k.
Figure 4.1: Definitions of roll, pitch and yaw.
By using v_k and ω_k an extended state vector on the form

x_k = [ X_k  Y_k  Z_k  φ_k  θ_k  ψ_k  v_k  ω_k ]^T (4.2)

can be formed. This state vector is used everywhere except in the original turtlebot_node package.

The sample time cannot be considered uniform, and thus the sample time from event k − 1 to k is denoted T_k.
The system can be modeled as a state space model, which in generic form is
xk+1= f (xk, uk) + wk, (4.3)
yk= h(xk, uk) + ek, (4.4)
where w_k and e_k are the process and measurement noise with covariance matrices Q and R respectively, i.e.
wk∼ N(0, Q), (4.5)
ek∼ N(0, R), (4.6)
and u_k is the direct input to the system. Inputs to the system are supplied via the state vector, so there are no direct inputs; u_k will therefore be disregarded from now on.
The measurement equation (4.4) will in the forthcoming cases be linear and can consequently be simplified as
yk = h(xk) + ek (4.7)
= Cxk+ ek, (4.8)
while the motion model (4.3) remains nonlinear.
4.2 Motion Models
RGBDSLAM has no motion model and hence there are no constraints on the estimated trajectory of the robot. The absence of a motion model also makes the algorithm heavily dependent on a continuous stream of images that contain sufficiently many re-detectable landmarks. To decrease the dependency on landmarks, well tuned motion models are introduced. This section describes the different motion models considered in this project.
4.2.1 Original Motion Model
The native navigation packages for the TurtleBot contain an EKF with a motion model. The filter uses a 6D model and estimates the robot pose with a 3D position and a 3D orientation estimate. The filter fuses data from the wheel odometry, the gyro and the visual odometry. Given the state vector (4.1) the general motion model is

x_{k+1} = [ X_{k+1}  Y_{k+1}  Z_{k+1}  φ_{k+1}  θ_{k+1}  ψ_{k+1} ]^T = f(x_k) + w_k = [ X_k  Y_k  Z_k  φ_k  θ_k  ψ_k ]^T + w_k. (4.9)
The model describes a constant position and fixed angles around the axes. It is a very simple model and perhaps not an appropriate one for a moving robot.
4.2.2 2D Constant Velocity Model
By using the extended state vector in (4.2), a more dynamic but still relatively simple model can be formed as
x_{k+1} = [ X_{k+1}  Y_{k+1}  Z_{k+1}  φ_{k+1}  θ_{k+1}  ψ_{k+1}  v_{k+1}  ω_{k+1} ]^T = f(x_k) + w_k
        = [ X_k + T_k cos(ψ_k) v_k,  Y_k + T_k sin(ψ_k) v_k,  0,  0,  0,  ψ_k + T_k ω_k,  v_k,  ω_k ]^T + w_k. (4.10)
This model describes a 2D motion where the robot is always in an upright position and travels with constant velocity. This model might fit the experimental setup better than the model in (4.9).
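The time update of (4.10) translates directly into code. The sketch below is illustrative; the State struct is a stand-in for however the state vector (4.2) is stored in the actual implementation.

```cpp
#include <cmath>

// Extended state vector (4.2): position, orientation, speed and yaw rate.
struct State {
    double X, Y, Z, roll, pitch, yaw, v, w;
};

// 2D constant velocity motion model, equation (4.10): the robot stays
// upright (Z = roll = pitch = 0) and moves with constant speed v and
// constant yaw rate w over the (possibly non-uniform) sample time Tk.
State constantVelocityPredict(const State& x, double Tk) {
    State n{};
    n.X   = x.X + Tk * std::cos(x.yaw) * x.v;
    n.Y   = x.Y + Tk * std::sin(x.yaw) * x.v;
    n.yaw = x.yaw + Tk * x.w;
    n.v   = x.v;
    n.w   = x.w;
    return n;
}
```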
4.2.3 2D Coordinated Turn Model
The motion modelling can be extended further by using the current yaw ψ_k and the angular velocity ω_k in a more sophisticated way. This is conveniently done by a 2D discretized coordinated turn model using polar velocity [Gustafsson, 2010], which is given in (4.12).

x_{k+1} = [ X_{k+1}  Y_{k+1}  Z_{k+1}  φ_{k+1}  θ_{k+1}  ψ_{k+1}  v_{k+1}  ω_{k+1} ]^T = f(x_k) + w_k (4.11)
        = [ X_k + (2v_k/ω_k) sin(ω_k T_k/2) cos(ψ_k + ω_k T_k/2),  Y_k + (2v_k/ω_k) sin(ω_k T_k/2) sin(ψ_k + ω_k T_k/2),  0,  0,  0,  ψ_k + T_k ω_k,  v_k,  ω_k ]^T + w_k. (4.12)
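The time update of (4.12) can be sketched as below. The guard that falls back to the constant velocity expression when ω_k is close to zero is an implementation choice added here to avoid division by zero; it is not part of the model as stated in the thesis.

```cpp
#include <cmath>

// State vector (4.2) as plain fields (illustrative struct).
struct CTState {
    double X, Y, Z, roll, pitch, yaw, v, w;
};

// 2D coordinated turn model with polar velocity, equation (4.12).
// For |w| below a small threshold the model is evaluated in its limit,
// which is exactly the constant velocity update (4.10).
CTState coordinatedTurnPredict(const CTState& x, double Tk) {
    CTState n{};
    if (std::fabs(x.w) < 1e-9) {            // limit w -> 0
        n.X = x.X + Tk * std::cos(x.yaw) * x.v;
        n.Y = x.Y + Tk * std::sin(x.yaw) * x.v;
    } else {
        double half = 0.5 * x.w * Tk;
        n.X = x.X + (2.0 * x.v / x.w) * std::sin(half) * std::cos(x.yaw + half);
        n.Y = x.Y + (2.0 * x.v / x.w) * std::sin(half) * std::sin(x.yaw + half);
    }
    n.yaw = x.yaw + Tk * x.w;
    n.v   = x.v;
    n.w   = x.w;
    return n;
}
```

With v_k = 1 and ω_k = π/2 over one second the robot traverses a quarter circle of radius v/ω = 2/π, ending at (2/π, 2/π), which is a quick sanity check of the discretization.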
4.3 Measurement Models
Measurement models are used to define how the sensors' measurements and the states are related. The system consists of three different sensor units:

• wheel odometry,
• gyro and
• visual odometry.

The three units contribute to the measurement equation in different ways. Note that the extended state vector (4.2) is used throughout this section.
4.3.1 Wheel Odometry
Tachometers are used to measure the wheel speeds of the robot. The TurtleBot platform uses specific hardware details and integration of speed to provide measurements of X_k, Y_k, ψ_k and the speed v_k. The measurement model of the wheel odometry sensor is given as

y_odom,k = C_odom x_k + e_odom,k(d) (4.13)

         = [ 1 0 0 0 0 0 0 0 ]
           [ 0 1 0 0 0 0 0 0 ]  x_k + e_odom,k(d) (4.14)
           [ 0 0 0 0 0 1 0 0 ]
           [ 0 0 0 0 0 0 1 0 ]

         = [ X_k  Y_k  ψ_k  v_k ]^T + e_odom,k(d). (4.15)
Here e_odom,k(d) is the sensor noise. The measured states from the odometry sensor are incrementally updated from an initial position and therefore the noise is also increasing. A simplified model for the noise is a Gaussian noise model according to

e_odom,k(d) ∼ N(0, R_odom(d)), (4.16)

where the covariance, R_odom(d), depends on the traveled distance, d, as

R_odom(d) = dS,

where S is a design parameter in the form of a constant covariance matrix for the specific sensor.
4.3.2 Gyro
The robot has a single axis gyro which measures the yaw angle and the angular velocity around the Z axis, i.e.

y_gyro,k = C_gyro x_k + e_gyro,k (4.17)

         = [ 0 0 0 0 0 1 0 0 ]  x_k + e_gyro,k (4.18)
           [ 0 0 0 0 0 0 0 1 ]

         = [ ψ_k  ω_k ]^T + e_gyro,k. (4.19)

The noise e_gyro,k is modeled as Gaussian noise with zero mean,

e_gyro,k ∼ N(0, R_gyro). (4.20)
4.3.3 Visual Odometry
The visual odometry provides information from the RGB camera and the depth camera of the Kinect™. Collection and post processing of data is described in Section 5.2. The concerned algorithms that calculate useful information from the post processed data are presented in Chapter 3. The result of the algorithms is an estimate of the full 3D pose of the robot, i.e.
y_vo,k = C_vo x_k + e_vo,k(votrust) (4.21)

       = [ 1 0 0 0 0 0 0 0 ]
         [ 0 1 0 0 0 0 0 0 ]
         [ 0 0 1 0 0 0 0 0 ]  x_k + e_vo,k(votrust) (4.22)
         [ 0 0 0 1 0 0 0 0 ]
         [ 0 0 0 0 1 0 0 0 ]
         [ 0 0 0 0 0 1 0 0 ]

       = [ X_k  Y_k  Z_k  φ_k  θ_k  ψ_k ]^T + e_vo,k(votrust). (4.23)
Here e_vo,k(votrust) is modeled as Gaussian noise according to

e_vo,k ∼ N(0, R_vo(votrust)). (4.24)

The measurement noise, R_vo(votrust), depends on the parameter votrust, which varies in a nonlinear way as described in Algorithm 3.
4.3.4 Total Model
Concatenation of the three measurement models above gives a total measurement equation on the form
y_k = [ y_odom,k;  y_gyro,k;  y_vo,k ] (4.25)
    = [ C_odom;  C_gyro;  C_vo ] x_k + [ e_odom,k;  e_gyro,k;  e_vo,k ] (4.26)
    = C x_k + e_k, (4.27)
where as a result the matrix C is

C = [ 1 0 0 0 0 0 0 0 ]
    [ 0 1 0 0 0 0 0 0 ]
    [ 0 0 0 0 0 1 0 0 ]
    [ 0 0 0 0 0 0 1 0 ]
    [ 0 0 0 0 0 1 0 0 ]
    [ 0 0 0 0 0 0 0 1 ]
    [ 1 0 0 0 0 0 0 0 ]
    [ 0 1 0 0 0 0 0 0 ]
    [ 0 0 1 0 0 0 0 0 ]
    [ 0 0 0 1 0 0 0 0 ]
    [ 0 0 0 0 1 0 0 0 ]
    [ 0 0 0 0 0 1 0 0 ]. (4.28)
The measurement noise e_k becomes

e_k(d, votrust) = [ e_odom,k(d);  e_gyro,k;  e_vo,k(votrust) ] (4.29)

with a corresponding covariance matrix R,

R(d, votrust) = [ R_odom(d)   0        0              ]
                [ 0           R_gyro   0              ]   (4.30)
                [ 0           0        R_vo(votrust)  ]
due to the assumption that the different measurement noises are independent of each other.
4.4 Extended Kalman Filtering
In this section the filtering process is described. All the sensors provide relative measurements, meaning that the measurements depend on earlier states. The input to the filter is given as the difference between the new measurement and an earlier estimated state x̂_k. Wheel odometry and gyro measurements are processed at full rate and the visual measurements are processed at a sub rate. The filtering process is illustrated in Figure 4.2. The filter is an Extended Kalman Filter using a first order Taylor expansion. It is implemented by the open source project Orocos Bayesian Filtering Library [http://www.orocos.org/bfl, 2012]. The filter used in this project is described by Algorithm 2.
Algorithm 2 Extended Kalman Filter
Require: x̂_{0|0} and P_{0|0}
Time update:
  x̂_{k+1|k} = f(x̂_{k|k})
  P_{k+1|k} = f'_x(x̂_{k|k}) P_{k|k} (f'_x(x̂_{k|k}))^T + Q_k
Measurement update:
  S_k = h'_x(x̂_{k|k−1}) P_{k|k−1} (h'_x(x̂_{k|k−1}))^T + R_k
  K_k = P_{k|k−1} (h'_x(x̂_{k|k−1}))^T S_k^{−1}
  ε_k = y_k − h(x̂_{k|k−1})
  x̂_{k|k} = x̂_{k|k−1} + K_k ε_k
  P_{k|k} = P_{k|k−1} − P_{k|k−1} (h'_x(x̂_{k|k−1}))^T S_k^{−1} h'_x(x̂_{k|k−1}) P_{k|k−1}
where Q_k is the process noise covariance and h(x) defines the measurement equation according to the general notation in Equation (4.7). The measurement equation is linear and therefore the derivative h'_x(x) = ∂Cx/∂x = C is a constant matrix. The derivatives of the motion models, f'_x(x), for the models in Section 4.2 are presented in Appendix A.2.
4.4.1 Filter and Model Evaluation
An overview of the filter and its information flow can be studied in Figure 4.2. The loop updating the filter with visual odometry data is run at a sub rate compared to the loop updating the filter with wheel odometry and gyro measurements, which is run at full filter rate.
Figure 4.2: The filtering process. Odom/gyro measurements (dead reckoning) update the EKF at full rate, while Kinect™ data processed by RGBDSLAM updates it at a sub rate, producing the estimate x̂_k.
A sample of how the input information to the filter may look is shown in Figure 4.3. The covariance of the visual odometry depends on the votrust parameter, which in turn depends on the number of reliable landmarks. The covariance of the wheel odometry sensor depends on the distance traveled since the last visual odometry update. This models the increased uncertainty related to the wheel odometry sensor as the distance to the last visual measurement increases. Figure 4.4 shows the prior distribution of the filter, i.e. the distribution after the system update but before the measurement update. The posterior distribution is updated by the filter during every measurement update. The dots at every third update symbolize that visual odometry has provided information to the filter. These dotted estimates are the ones sent back to the RGBDSLAM algorithm. It can also be seen that the filter covariance increases in the same pattern as the odometry covariance.
Figure 4.3: Measurements from wheel odometry, gyro and visual odometry (filter inputs, X [m] versus Y [m]).
Figure 4.4: Filter updates (X [m] versus Y [m]): prior estimates together with odom/gyro influenced and vo influenced filter estimates.
5 Implementation and Evaluation Methods
In this chapter the cooperation of RGBDSLAM and the filter is described. Methods for data collection and post processing are also presented. The chapter ends with a section that explains the experimental setup and the evaluation criteria used when evaluating the implemented algorithm.

5.1 Filter Integration in RGBDSLAM
The models in Chapter 4 are well defined and strict in a mathematical sense but are not well suited for direct implementation in C++ and ROS. The theoretical models need some adjustments to fit the framework, and additional functionality, e.g. synchronization of data, is required.
Figure 5.1: The tasks of pose estimation. RGBDSLAM performs pose estimation (RANSAC, GICP) and issues a filter call to the EKF, which also receives odom/gyro input; the filtered pose is then received back by RGBDSLAM.
Figure 5.1 gives an overview of how the pose estimation is performed in the algorithm. When new RGB and depth images, wheel odometry data and gyro data are captured, three things can happen:
1. All of the information is processed.
2. Only the information from the odometry and gyro sensors is processed.
3. None of the information is processed.
In the first case, where all data is used, the visual data is joined by the odometry and gyro data in the box named "Filter call". Here the measurements from the three sensors are inserted into a ROS service call and sent to the filter using a blocking call-response method. During the service call the input data to the filter is published on the input topics of the filter node. After reception of new data the filter checks the time stamps of the measurements and, if the three measurements are sufficiently close in time, the internal Gaussian probability distribution is updated accordingly. A new estimate of the mean value of the distribution is computed and returned as the current pose estimate.
In the second case, where the visual data is disregarded, a multirate filter call is carried out. In this case a filter update with only odometry and gyro measurements (and information from the motion model) is used to create a new estimate of the robot position. This estimate is not sent back to RGBDSLAM, but used internally in the filter.
The third case, where none of the data is used, occurs when the parameters for data skipping are used to lower the rate of input data to the algorithm.
If data is scattered in time the distribution is not updated. As a consequence the
time between updates, in Chapter 4 referred to as Tk, is varying.
5.1.1 Covariance of Visual Odometry
The implemented software has a parameter, votrust (as in visual odometry trust), which adapts the trust in the visual data. Its value reflects how well the visual SLAM algorithm has succeeded in finding and matching landmarks in the processed images. The parameter is set according to the following algorithm.
Algorithm 3 Set Trust in Visual Odometry
Input: Number of total matches between images, N_m; number of inliers (found by RANSAC), N_i; boolean use_loop_closure; sequential IDs of the nodes that are matched, ID1 and ID2
Output: Noise parameter for visual measurement, votrust
1: if use_loop_closure then
2:   if ID2 − ID1 < minimum distance for loop closure then
3:     votrust is set to an adjustable value less than one
4:     loop_closure = true
5:   end if
6: end if
7: if !loop_closure then
8:   if votrust > 10 and N_m ≠ 0 and N_i ≥ 10 then
9:     if N_i/N_m > 0.7 then
10:      votrust = votrust/10
11:    end if
12:  else
13:    votrust = 1
14:    if N_i == 0 then
15:      votrust = 10000
16:    else if N_i < 10 then
17:      votrust = 1000
18:    else if N_m ≠ 0 then
19:      if N_i/N_m < 0.75 then
20:        votrust = 5
21:      else if N_i/N_m < 0.8 then
22:        votrust = 3
23:      else if N_i/N_m < 0.85 then
24:        votrust = 2
25:      end if
26:    else
27:      votrust = 1000000
28:    end if
29:  end if
30: end if
As Algorithm 3 shows, the value of the parameter is decreased if there is a possible loop closure, and increased if there are few matches of landmarks between images.
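The branch of Algorithm 3 that runs when no loop closure is detected (lines 7-30) can be sketched as a plain function. This is an illustrative transcription of the algorithm above, not the actual implementation, and the function name is chosen here for clarity.

```cpp
// Non-loop-closure branch of Algorithm 3 (lines 7-30). A high return
// value means low trust in the visual odometry measurement.
double updateVoTrust(double votrust, int Ni, int Nm) {
    if (votrust > 10.0 && Nm != 0 && Ni >= 10) {
        // Trust was low; restore it if the inlier ratio is good.
        if (static_cast<double>(Ni) / Nm > 0.7)
            votrust /= 10.0;
    } else {
        votrust = 1.0;
        if (Ni == 0) {
            votrust = 10000.0;        // no inliers at all
        } else if (Ni < 10) {
            votrust = 1000.0;         // very few inliers
        } else if (Nm != 0) {
            double ratio = static_cast<double>(Ni) / Nm;
            if (ratio < 0.75)       votrust = 5.0;
            else if (ratio < 0.8)   votrust = 3.0;
            else if (ratio < 0.85)  votrust = 2.0;
        } else {
            votrust = 1000000.0;      // no matches at all
        }
    }
    return votrust;
}
```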
5.2 Data Collection and Post Processing
In order to test parameter adjustments or new coding strategies in a repeatable way, it is not feasible to every now and then start up the TurtleBot and run it
along some predefined track. The reasons for this are various, but amongst them are
• the time consumption,
• the variation of the driven route and
• changes in the environment.
The solution is to record the data generated by a drive in such a way that it can be replayed time after time. This is conveniently done by the rosbag command, see Section 2.7.
The visual data is presented in different ways on different topics. The data from the gyro and wheel odometry sensors are given in one single form on one single topic each, so there are no alternatives for how to record them. In addition to the concrete measurement data, the /tf topic is recorded. This topic contains the transformations between different coordinate systems fixed in both the TurtleBot and its surroundings. In the post processing, when RGBDSLAM is running, the /tf topic is supplemented with other transforms published by RGBDSLAM.
Spelled out, all the recorded topics are:

• /camera/depth/camera_info
• /camera/depth/image
• /camera/rgb/camera_info
• /camera/rgb/image_color
• /imu/data
• /odom
• /tf
Recording all these topics gives data at a rate of about 70 [Mb/s]. The data cannot be replayed at full speed, since that would make the input buffers of RGBDSLAM overflow. Instead data is played at a speed of 1/2 to 1/20 of full speed, depending on the parameter values used in the algorithm. The data flow when replaying a bag file can be seen in Figure 5.2.
Figure 5.2: Data flow when playing a bag file. The ellipses are ROS nodes (/rosbag_play, /rgbdslam_mod, /robot_pose_ekf_mod) and the rectangles are ROS topics (/camera/rgb/image_color, /camera/rgb/camera_info, /camera/depth/image, /clock, /tf, /imu/data, /odom).
5.3 Evaluation Criteria
Evaluation methods can be very different depending on the experimental setup and the resources available. A motion tracking system would, for example, give the opportunity to determine the position of the robot with high precision. Sensitive laser scan equipment could, by scanning the environment where the tests are carried out, give a sort of ground truth for the estimated 3D map. In this project, however, such high precision evaluation instruments have not been used. Instead the results have been evaluated using more ordinary techniques: 3D maps have been evaluated using ocular inspection, and the estimated trajectories have been evaluated using the ground truth matcher described in Section 5.3.1. There are three key factors which are altered in order to evaluate the performance of the modified algorithm. Trajectories and maps can be computed
1. with or without filter,
2. with the original algorithm parameters set to give a robust or a less computationally heavy performance and
3. using different data sets.
The parameter alternatives mentioned above are specified in Table 5.1. The different choices essentially consist of deciding how often to call the filter, how many keypoints to extract and how many matches to require between images.
The filter frequency determines how often to send both visual and odometric information to the filter, whereas the multirate filter frequency sets how often to
Table 5.1: Parameter specifications.

Parameter                  | Robust setup | Slimmed setup | Comment
---------------------------|--------------|---------------|--------------------------------------------------
Filter frequency           | 2.5 Hz       | 1.5 Hz        | Vo, odom and gyro measurements
Multirate filter frequency | 7.5 Hz       | 6 Hz          | Only odom and gyro measurements
Max. keypoints             | 400          | 75            | Maximum number of keypoints extracted from an image
Min. keypoints             | 100          | 50            | Minimum number of keypoints extracted from an image
Sufficient matches         | 100          | 60            | Sufficient number of matches between keypoints from different images
Min. matches               | 75           | 20            | Minimum number of matches between keypoints from different images
send odometric data only. The maximum and minimum number of keypoints simply specify how many keypoints to extract from the images using the chosen feature extractor (SURF is used in this project). The sufficient matches and min. matches parameters decide how many matches between the sets of keypoints from different images are to be seen as a sufficient number and a minimum number, respectively. If the sufficient number of matches is reached, the algorithm will not try to find further matches, and if there are fewer than the minimum number of matches, the algorithm will not try to calculate any transform between the corresponding images.
Two data sets have been used in the evaluation. These are referred to as the "figure-eight" data set and the "few landmarks" data set. In the figure-eight set the robot is run two laps in a figure-eight pattern, starting with a left turn at the base of the eight. The data set is quite ordinary in the sense that the room where it is recorded is not very big and the environment gives the robot varying depth and RGB information. The second data set, the few landmarks set, unsurprisingly gives the algorithm rather few landmarks to work with. It is recorded in a partly more sterile environment with a flat, white wall. Both data sets are recorded by driving the robot manually with a PlayStation 3 controller along a track marked with tape on the floor. The different environments can be studied in Figure 5.3 and Figure 5.4.
(a) Overview of the office room.
(b) Another view from the office room.
Figure 5.3: Views from the office in which the figure-eight data set was