Institutionen för systemteknik
Department of Electrical Engineering

Master's Thesis (Examensarbete)

Simultaneous Localization And Mapping Using a Kinect™ In a Sparse Feature Indoor Environment

Master's thesis carried out in Automatic Control at the Institute of Technology, Linköping University

by
Fredrik Hjelmare & Jonas Rangsjö

LiTH-ISY-EX--12/4587--SE
Linköping 2012
Handledare (Supervisors): Christina Grönvall (ISY, Linköpings universitet), Johan Kinander (Cybercom), Erik Almqvist (Cybercom), Hannes Töllborg (Cybercom)
Examinator (Examiner): Fredrik Gustafsson (ISY, Linköpings universitet)
Division, Department: Automatic Control, Department of Electrical Engineering, SE-581 83 Linköping
Date: 2012-06-11
Language: English
Report category: Examensarbete (Master's thesis)
URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-78885
ISRN: LiTH-ISY-EX--12/4587--SE
Titel (Swedish title): Simultan lokalisering och kartering med hjälp av en Kinect™ i en inomhusmiljö med få landmärken
Title: Simultaneous Localization And Mapping Using a Kinect™ In a Sparse Feature Indoor Environment
Författare (Authors): Fredrik Hjelmare & Jonas Rangsjö
Keywords (Nyckelord): SLAM, RGBDSLAM, Kinect, TurtleBot, Sparse Feature Environment, Extended Kalman Filter, Robot Operating System
Abstract
Localization and mapping are two of the most central tasks when it comes to autonomous robots. They have often been performed using expensive, accurate sensors, but the fast development of consumer electronics has made similar sensors available at a more affordable price.
In this master thesis a TurtleBot™ robot and a Microsoft Kinect™ camera are used to perform Simultaneous Localization And Mapping, SLAM. The thesis presents modifications to an already existing open source SLAM algorithm. The original algorithm, based on visual odometry, is extended so that it can also make use of measurements from wheel odometry and a single axis gyro. Measurements are fused using an Extended Kalman Filter, EKF, operating in a multirate fashion. Both the SLAM algorithm and the EKF are implemented in C++ using the framework Robot Operating System, ROS.
The implementation is evaluated on two different data sets. One set is recorded in an ordinary office room which constitutes an environment with many landmarks. The other set is recorded in a conference room where one of the walls is flat and white. This gives a partially sparse featured environment.
The result of providing additional sensor information is a more robust algorithm. Periods without credible visual information do not make the algorithm lose track, and the algorithm can thus be used in a larger variety of environments, including those where the possibility to extract landmarks is low. The result also shows that the visual odometry can cancel out drift introduced by the wheel odometry and gyro sensors.
Acknowledgments
We would like to thank our supervisors at Cybercom, Johan Kinander, Erik Almqvist and Hannes Töllborg, for their support during this project. A thanks also goes to our supervisor Christina Grönwall and our examiner Fredrik Gustafsson at LiU. Finally we would also like to thank Johan Ivansson at Cybercom for giving us the opportunity to formulate and perform this thesis project.
Linköping, June 2012 Fredrik Hjelmare & Jonas Rangsjö
Contents
1 Introduction 1
1.1 Simultaneous Localization And Mapping . . . 1
1.2 Background . . . 2
1.3 Problem Formulation . . . 3
1.4 Limitations . . . 4
1.5 Cybercom . . . 4
1.6 Thesis Outline . . . 5
2 TurtleBot 7
2.1 System Overview . . . 7
2.2 iRobotCreate . . . 8
2.3 Kinect™ . . . 9
2.4 Robot Computer . . . 9
2.5 Desktop Computer . . . 9
2.6 Software Enclosed with TurtleBot . . . 9
2.7 Robot Operating System . . . 10
3 Algorithms 13
3.1 Under the Hood of a SLAM Algorithm . . . 13
3.2 RGBDSLAM . . . 15
3.3 Random Sample Consensus . . . 16
3.4 Feature Extraction . . . 18
3.4.1 Scale Invariant Feature Transform . . . 18
3.4.2 Speeded Up Robust Features . . . 18
3.5 Registration . . . 19
3.5.1 Iterative Closest Point - Point to Point . . . 19
3.5.2 Generalized Iterative Closest Point . . . 20
3.6 General Graph Optimization . . . 21
4 Models and Filtering 23
4.1 Notation . . . 23
4.2 Motion Models . . . 25
4.2.1 Original Motion Model . . . 25
4.2.2 2D Constant Velocity Model . . . 26
4.2.3 2D Coordinated Turn Model . . . 26
4.3 Measurement Models . . . 27
4.3.1 Wheel Odometry . . . 27
4.3.2 Gyro . . . 28
4.3.3 Visual Odometry . . . 28
4.3.4 Total Model . . . 29
4.4 Extended Kalman Filtering . . . 30
4.4.1 Filter and Model Evaluation . . . 31
5 Implementation and Evaluation Methods 35
5.1 Filter Integration in RGBDSLAM . . . 35
5.1.1 Covariance of Visual Odometry . . . 36
5.2 Data Collection and Post Processing . . . 37
5.3 Evaluation Criteria . . . 39
5.3.1 Ground Truth Matcher . . . 42
6 Results 45
6.1 Accuracy of Sensors . . . 45
6.2 Evaluation of the SLAM Algorithms . . . 49
6.2.1 Original Algorithm. Figure-Eight Data Set . . . 49
6.2.2 Original Algorithm. Few Landmarks Data Set . . . 52
6.2.3 Modified Algorithm. Few Landmarks Data Set . . . 54
6.2.4 Modified Algorithm. Figure-Eight Data Set . . . 58
6.2.5 Summary of Match Values . . . 64
6.3 3D Maps . . . 64
6.3.1 Original Algorithm. Figure-Eight Data Set . . . 64
6.3.2 Original Algorithm. Few Landmarks Data Set . . . 67
6.3.3 Modified Algorithm. Few Landmarks Data Set . . . 69
6.3.4 Modified Algorithm. Figure-Eight Data Set . . . 71
7 Discussion 73
7.1 Discussion of the Results . . . 73
7.2 Future Work . . . 74
Bibliography 77
A Appendix 81
A.1 Multivariate Gaussian Distribution . . . 81
A.2 Derivatives of Motion Models . . . 81
A.2.1 2D Constant Velocity . . . 82
A.2.2 2D Coordinated Turn . . . 82
1 Introduction
This chapter gives a background to Simultaneous Localization And Mapping, SLAM, and outlines the project by setting up the problem formulation. It also presents the limitations of the project and ends with a thesis outline.
1.1 Simultaneous Localization And Mapping
SLAM is an umbrella term for algorithms that build maps of an unknown environment and at the same time perform localization in that area. The method is often used by robots and other vehicles and is a first step towards making them autonomous. Once a map of the area is created and the vehicle knows its position with respect to the map, the next step, the navigation and route planning step, can take place. As a third step, smart decisions, dependent on the information available to the vehicle, can be made, and the vehicle may then be regarded as truly autonomous. Figure 1.1 shows the possible look of a small map generated using SLAM.
Figure 1.1: A map with a TurtleBot.
1.2 Background
Research in the area of SLAM has been on the agenda of several universities and other institutions worldwide over the last decades. The continuous improvement of the computational capacity of computers has made it possible for SLAM applications to be performed outside of pure test environments, and in recent years even outdoors [Bailey, 2002][Newman et al., 2006][Granström and Callmer, 2008][Kümmerle et al., 2011b]. SLAM has often been performed using a laser scanner, which is an accurate but expensive sensor. In this project the laser scanner is replaced by a Kinect™ camera, which is a much cheaper, but still qualified, sensor unit.
The applications of SLAM are numerous. If mapping of a hazardous or poisonous environment is required, a robot with SLAM-based navigation is an excellent option. Some concrete examples are indoor mapping of houses on fire or mapping of damaged nuclear reactors. Another interesting case is monitoring of electrical power lines. That kind of work might be expensive and time consuming for humans to do, but probably cheaper when using a UAV (unmanned aerial vehicle). The UAV can simply follow the power lines autonomously and at the same time check them for damage.
The research area of SLAM has matured during the past years, and the 2D-SLAM problem using a laser scanner has been implemented in so many cases that it may now be considered solved [Hertzberg et al., 2011]. The field of visual SLAM, i.e. using visual images rather than measurements from wheel sensors or IMUs (Inertial Measurement Units), has grown over the last years. Contributing to this development is a large variety of open source components, aiming at separate tasks in the SLAM machinery [Hertzberg et al., 2011].
Over the years different types of sensors have been used to perform visual SLAM, e.g. 2D laser scanners, mono cameras and stereo cameras [Sturm et al., 2011]. The development in the robotic community has lately shifted towards the use of sensors that are cheaper, lighter, smaller and thus often less accurate. This change allows for the use of UAVs to a greater extent [Steder et al., 2008] and also increases the potential of getting SLAM-based consumer products, e.g. autonomous vacuum cleaners, on the market.
The work load of implementing an embedded SLAM system has decreased as the number of open source packages has increased. The Point Cloud Library, PCL, [Rusu and Cousins, 2011] is a standalone, large scale, open project for processing of 3D point clouds, which comes in handy when working with applications concerning 3D SLAM. Another neat package is OctoMap, a library used to describe mapped environments in a way that is both probabilistic and scalable [Wurm et al., 2010].
1.3 Problem Formulation
In this project SLAM will be performed using a Kinect™, instead of a laser scanner, as the primary sensor. With a Kinect™ the sensor data cannot be expected to be as accurate as from a laser. Laser scanners are well known to have high accuracy in range. The Kinect™ has a maximum range of about eight meters, while a laser scanner often has a maximum range of 50-80 meters.
Apart from being cheaper, the Kinect™ has other advantages compared to a laser scanner. Its measurements are three dimensional (range and bearing in two directions) while the data from a laser scanner is two dimensional (range and bearing). This makes the data more information dense. In addition, the Kinect™ also has an RGB camera which captures the surroundings at video pace. This gives the possibility to get 3D maps with coloured textures.
The ability to recognize similarities between images is a key factor for a visual SLAM algorithm to be successful. Without that ability the algorithm is lost. In this project the performance of indoor visual SLAM using a TurtleBot will be evaluated with the following questions as guidelines:
• How does an indoor environment with few recognizable objects in the images affect the performance of the algorithm?
• What adjustments can be suggested to improve the result in such an environment?
• Can a motion model improve the performance of the algorithm?
1.4 Limitations
In this project the Kinect™ will always be mounted on a TurtleBot, and not be used in combination with any other robot platform. The different algorithms for e.g. visual feature detection, pose estimation or map building will all come from open source libraries, but it is in the scope of this project to make adjustments to them to fit our purposes. Furthermore, the whole robot platform and much of its functionality will be based on ROS, and any possible limitations in ROS will consequently be reflected in the evaluations. It is not in the scope of this project to add any additional sensors to the robot, nor will the existing hardware be modified in any way.
Measurements will be performed indoors, in order not to expose the Kinect™ to too much IR radiation, which would affect the measurements from the depth sensor. When data is collected there will be no moving objects present in the environment, since such disturbances might affect the algorithm in an unpredictable way.
There are no demands on real time performance of the algorithms in this project. Real time performance here refers to the ability to map a room at the same time as the robot moves around. It also means that all computations to build the map are completed at such a pace that the robot can move around freely in the environment and still have the most recent estimate of the surroundings at most one second after the corresponding image was captured. Even though this project does not intend to fulfill any real time performance goals, some time might be spent on trying to achieve real time performance of the system.
1.5 Cybercom
The master thesis project is carried out at Cybercom in Linköping. Cybercom is a consultancy focused on advanced IT and telecom solutions. The company was founded in 1995 and has been quoted on the NASDAQ OMX Nordic exchange since 1999. Today the company has around 1700 employees and offices in Linköping, Gothenburg, Stockholm, Denmark, Finland and Singapore, to name a few places. Furthermore, Cybercom is the main sponsor of "Y-sektionen" (the Master of Science Programme in Applied Physics and Electrical Engineering at LiTH). The office in Linköping has around 60 employees working with large as well as small companies represented in the region.
1.6 Thesis Outline
The thesis continues with Chapter 2, which gives an introduction to the TurtleBot's hardware and software. Chapter 3 explains the concept of SLAM in more detail and presents the open source SLAM algorithm, RGBDSLAM, used in this project. Chapter 4 describes the modeling and filtering used to extend the original RGBDSLAM algorithm. Methods for data collection and evaluation of the results are found in Chapter 5. Finally, the results are presented in Chapter 6 and a discussion is found in Chapter 7.
2 TurtleBot
This chapter starts with an introduction to the hardware and software directly related to the TurtleBot and continues with a short presentation of the vast open source library ROS, which holds much of the base functionality of the TurtleBot.
2.1 System Overview
TurtleBot is a robot kit that provides hardware and software for development of robot algorithms. The hardware consists of an iRobotCreate, a Kinect™ camera and a 150 degrees/second single axis gyro. An ordinary laptop is mounted on the TurtleBot and will be referred to as the robot computer. The software runs on both the robot computer and a more powerful desktop computer. A complete list of the hardware in a TurtleBot is given below; the list relates to Figure 2.1.
A. Mobile Base and Power Board
• iRobotCreate
• 3000 mAh Ni-MH battery pack • 150 degrees/second single axis gyro
• 12 V 1.5 A software enabled power supply (for powering the Kinect™)
B. 3D Sensor
• Microsoft Kinect™
• Kinect™ power board adapter cable
C. Computing: ASUS 1215N
• Processor: Intel® Atom™ D525 dual core
• Memory: 2 GB
• Graphics: NVIDIA® ION™ discrete graphics processor
• Internal hard drive: 250 GB
D. TurtleBot Hardware
• Kinect™ mounting hardware
• TurtleBot structure
• TurtleBot module plate with 1 inch spacing hole pattern
Figure 2.1: Hardware setup for the TurtleBot robot kit.
2.2 iRobotCreate
iRobotCreate is an affordable mobile robot platform for educators, students and developers. It has over 30 sensors which react to both internal and external events, and it is built for indoor use only. In this project the interesting sensors in the iRobotCreate are the single axis gyro and the left and right wheel encoders.
2.3 Kinect™
The Kinect™ camera is a low cost sensor built specially for the Xbox console. It can deliver an RGB image and a depth image in parallel at video rate. Microsoft has not released any official hardware specifications for the Kinect™ sensor, and therefore unofficial specifications from reverse engineering are the most accessible way to get insight into the machinery. The OpenKinect community [http://openkinect.org, 2012] states that the Kinect™ has two cameras, one RGB camera and one range camera. The latter is based on structured light. According to PrimeSense, the manufacturer of the micro controller in the Kinect™, the projected IR points are processed by a PS1080A micro controller to produce a depth image. From the design reference for the PrimeSense PS1080A it can be inferred that:
Field of view (horizontal, vertical, diagonal): 58° H, 45° V, 70° D
Spatial x/y resolution (at 2 m distance from sensor): 3 mm
Depth z resolution (at 2 m distance from sensor): 1 cm
Operation range: 0.8 m to 3.5 m
2.4 Robot Computer
The robot computer enclosed with the TurtleBot is an ASUS eeePC, a lightweight netbook. The main task of the robot computer is to collect data and send it to a desktop computer or record it to a hard drive. The robot computer also interacts with the hardware in the TurtleBot. Due to the computational power of the enclosed robot computer, some calculations can also advantageously be performed on this computer, instead of letting the desktop computer do all the work.
2.5 Desktop Computer
In this thesis an HP EliteBook 8460p laptop will be used as the desktop computer. The computer has an Intel Core i7 (2.8 GHz) CPU, 8 GB internal memory and an Intel® HD Graphics 3000 integrated graphics card. The main task of the desktop computer is to do computations on data provided by the robot computer. Another task is to visualize the result of the algorithms, a task that demands a powerful computer in order to run smoothly.
2.6 Software Enclosed with TurtleBot
An SDK for the TurtleBot, including all software connections between ROS, iRobotCreate and Kinect™, and all transforms between coordinate systems fixed in different parts of the TurtleBot.

A development environment for the desktop, including monitoring software and tutorials for development.

Libraries for visualization, planning, control and error handling, which are specific plug-ins for the TurtleBot in Rviz, a visualization application for ROS described in Section 2.7. The visualization application for TurtleBot can be seen in Figure 2.2. Here also the included package pointcloud_to_laserscan is used.

Demo application examples. The TurtleBot comes with a custom made version of gmapping for easy indoor mapping and planning. This application uses the Kinect™ to simulate a laser scan and together with odometry and gyro data performs 2D-SLAM in real time.
Figure 2.2: Example of the included visualization tool Rviz. The 3D TurtleBot robot model is included with the TurtleBot package.
2.7 Robot Operating System
ROS constitutes the base of the TurtleBot system and defines the means of external communication when building specialized applications. On the ROS homepage [http://www.ros.org, 2012], one can read the following:
"ROS provides libraries and tools to help software developers create robot applications. It provides hardware abstraction, device drivers, libraries, visualizers, message-passing, package management, and more. ROS is licensed under an open source, BSD license".
The ROS version used in this project is ROS Electric Emys.
ROS is built up by stacks, which in turn are built up by packages. A stack delivers the functionality of a whole system or major parts of a system, e.g.
• the TurtleBot stack, which provides code for the TurtleBot,
• the ros_comm stack, which provides internal communication in ROS, and
• the perception_pcl stack, which integrates the Point Cloud Library into ROS.
The packages of this project, new and modified, would be suitable to gather into a stack. A stack typically contains everything from a handful up to a couple of dozen packages.
A ROS package often holds the functionality of a smaller, more delimited task than a stack, e.g. the navigation stack contains various packages that do localization and route planning. Every package contains at least one node. A node typically solves one, or a few, separate tasks, e.g.
• the robot_pose_ekf node does EKF filtering,
• the pcd_viewer node provides a GUI for rendering point clouds, and
• the image_proc node can rectify RGB and depth images.
Nodes are started using the rosrun or roslaunch command. The first command starts one single node, whereas the latter can start multiple nodes at the same time using a specified launch file. The launch file also has the neat property that values of dynamic parameters may be supplied to the nodes. This results in an accessible way of troubleshooting and tuning without recompiling the C++ code.
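A minimal sketch of such a launch file is given below; the package, node and parameter names are invented for the example:

```xml
<launch>
  <!-- Start a (hypothetical) data collection node from a package "my_slam". -->
  <node pkg="my_slam" type="collector" name="collector" output="screen">
    <!-- Dynamic parameter read by the node at startup; changing it here
         requires no recompilation of the C++ code. -->
    <param name="frame_rate" value="10" />
  </node>
  <!-- A second node can be started from the same launch file. -->
  <node pkg="my_slam" type="filter" name="ekf_filter" />
</launch>
```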
A ROS message is a data structure that can be sent between nodes. Messages can hold any information and ROS comes with useful predefined messages. User defined messages are also allowed, and the ROS framework is prepared so that defining user specific messages is easily done.
The communication between nodes is handled either by a topic or a service. A topic implements an asynchronous communication pattern. A message is posted on the topic and the posting node continues its work. The nodes that listen to the topic, the subscribers, get a callback and can then process the information in the posted message. A service implements a synchronous communication model, and any service call is thus blocking for the calling node and requires the immediate attention of the answering node. What defines a service in ROS is the input message type, the service function to be run and the output message type. To define a new service one simply specifies the message types to use in a text file, and a corresponding service class is automatically generated.
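As an example, the classic ROS tutorial service AddTwoInts is defined by a text file with the input fields above a `---` separator and the output fields below it:

```
int64 a
int64 b
---
int64 sum
```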
The ROS framework also has a neat command named rosbag that can record and play back a bag file. A bag file contains the messages published on all the topics specified when calling the rosbag record command.
3 Algorithms
In this chapter the functionality of a generic SLAM algorithm is explained. One of the most central open source packages in this project, RGBDSLAM, is also presented.
3.1 Under the Hood of a SLAM Algorithm
A SLAM algorithm essentially consists of the following steps:
• Data acquisition; in this step measurements from the sensors, e.g. laser scanner or video camera, are gathered.
• Feature extraction; a number of characteristic, and thereby easily recognizable, landmarks are selected from the data set.
• Feature association; landmarks from previous measurements are associated with landmarks from the most recent measurement.
• Pose estimation; the relative change between the landmarks and the position of the vehicle is used to estimate the new pose of the vehicle.
• Map adjustment; the map is updated according to the new pose and the corresponding measurements.
The five tasks are continuously repeated, and a trajectory of position estimates and a map are built up. Figure 3.1 illustrates the process. In Figure 3.1a the robot has received input data. Landmarks are extracted from the data. In the case of
1. Landmarks: in this report Landmarks, Keypoints and Features are synonymous.
visual SLAM, a landmark can be anything that is easily recognizable by a visual sensor, e.g.
• a corner,
• an edge,
• a dot in a protruding color.
The robot in Figure 3.1a has extracted three landmarks and these are stored in a database containing all landmarks that have been observed so far. In Figure 3.1b the robot has moved forward and received new data. The robot extracts landmarks from the data and searches through its database to see if there are any matches with old landmarks. Extracted landmarks that are not found in the database are added, and landmarks giving a match in the database are used to estimate the change of the robot's pose. This is done by measuring the change in distance and angle to the old landmarks. When the new pose is estimated the robot uses this estimate and the measurements to adjust the positions of the landmarks. SLAM can be regarded as a chicken-and-egg problem: a proper map is needed to get a proper pose estimate, and a proper pose estimate is needed to get a proper map.
(a) Robot position and landmarks at time t0.
(b) Robot position and landmarks at time t1 > t0. Compared to Figure 3.1a the robot has found two new landmarks and one of the old landmarks has gone out of sight.
3.2 RGBDSLAM
This thesis project is based on an algorithm called RGBDSLAM, which is a six degrees of freedom (x, y, z, roll, pitch and yaw) algorithm that performs visual SLAM (VSLAM). The algorithm uses both the color (RGB) and the depth (D) information and is generic in the sense that it contains no motion model. The algorithm is developed by researchers from the University of Freiburg in Germany and is described in Engelhard et al. [2011] and Endres et al. [2011]. An implementation of the algorithm is provided as a package in ROS. A schematic overview of the algorithm can be studied in Figure 3.2.
Figure 3.2: Schematic overview of the RGBDSLAM algorithm provided as a package in ROS. The Kinect™ sensor delivers a depth image and an RGB image; point clouds are made from the depth data and matched with the RGB data; features are extracted from the RGB image (SURF, SIFT, ORB); a pairwise 6D transformation is estimated (RANSAC, GICP) against features and corresponding point clouds from the database; global optimization/loop closure (g2o) yields the registered point cloud.
As a first step the depth and RGB images are collected with synchronized timestamps. Then features are extracted from the RGB image by a feature extraction algorithm. RGBDSLAM has multiple feature extraction algorithms implemented. The implementations have different pros and cons in different environments, and they differ in computation time [Juan and Gwun, 2009]. The algorithms implemented are SURF, SIFT and ORB, and the first two are described in Section 3.4. In the next step of the algorithm, extracted features are projected onto the depth image. This step introduces some uncertainty into the chain of operations, mainly due to the synchronization mismatch between depth and RGB images, but also because of interpolation between points with large differences in depth. The fact that a minor misprojection of a feature lying on an object border onto the depth image can result in a big depth error makes features picked at object borders unreliable.
To find a 6D transform for the camera position in this noise, the RANSAC algorithm is used. Features are matched with earlier extracted features from a set of 20 images in the standard configuration. The set consists of a subset including some of the most recently captured images and another subset of images randomly selected from the set of all formerly captured images. Three matched feature pairs are randomly selected and used to calculate a 6D transform. All feature pairs are then evaluated by their Euclidean distance to each other. Pairs whose Euclidean distance is below a certain threshold are counted as inliers. From these inliers a refined 6D transform is calculated using GICP.
3.3 Random Sample Consensus
RANdom SAmple Consensus, or RANSAC for short, is an iterative algorithm used to adapt the parameters of a mathematical model to experimental data. RANSAC is a suitable method when a data set contains a high percentage of outliers, i.e. measurements that suffer from measurement errors so large that their validity is low. The method was first presented in the early eighties by Fischler and Bolles [1981], where it was suggested as a suitable method for automated image analysis.
Assume a mathematical model that has n free parameters which can be estimated given a set of measurements, P. The number of measurements in P has to be greater than n, #P > n. Let S and T be two different varying subsets of P. Given these assumptions the RANSAC algorithm works as follows:
• Randomly select a subset of the measurements in P and call it S. Use S to make a first estimate of the n free parameters of the model.
• Use the current estimate of the model to select a new subset of points, T, from the measurements that are within some error tolerance from the model.
• If T contains more measurements than a given limit, re-estimate the free parameters of the model according to this new subset. Calculate a measure of how well T and the model coincide, store that value and select a new subset S.
• If T does not contain more measurements than the given limit, randomly select a new subset S from P and start all over again.
• If none of the selected subsets holds more measurements than the limit, exit in failure, or if the maximum number of iterations has been reached, exit.
The method is characterized by three parameters:
1. The error tolerance used to determine when data is not part of the model.
2. The maximum number of iterations in the algorithm.
3. The minimum value on the number of measurements in a subset to be used for parameter estimation.
The big advantage of RANSAC is the robust way it handles outliers in the data set. The drawbacks of RANSAC are that it does not guarantee any solution, nor that a given solution is optimal. Furthermore, the three parameters mentioned above are to a large extent problem specific, which means that experimental adjustment to the specific case treated is required.
(a) Example of measurements.
(b) Measurements as seen by RANSAC with the model being a line. Stars are considered as outliers and dots are considered as inliers.
Figure 3.3: Illustration of how RANSAC adapts a linear model to a data set.
3.4 Feature Extraction
The two commonly used feature extractors Scale Invariant Feature Transform, SIFT, and Speeded Up Robust Features, SURF, are briefly described below.
3.4.1 Scale Invariant Feature Transform
SIFT is a method for extracting and detecting landmarks in an image. The method was invented in 1999 [Lowe, 1999] and has since been widely used. The algorithm is, as the name suggests, invariant to scale transformations in images and also to rotational transformations. SIFT is much appreciated for its reliability and repeatability, which are important properties of a feature extractor/detector [Juan and Gwun, 2009]. The features are selected at maxima and minima of the color in an image that has first been smoothed and then filtered through a difference of Gaussians filter (DoG). The continuous two dimensional difference of Gaussians convolution kernel is

$$f(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2+y^2)/(2\sigma^2)} - \frac{1}{2\pi K^2\sigma^2} e^{-(x^2+y^2)/(2K^2\sigma^2)}, \tag{3.1}$$
where K is a constant bigger than one which scales the standard deviation. The DoG filter creates two versions of the original image; one that is somewhat more blurred than the other. The blurring is carried out using approximate Gaussian filtering kernels. Then the most blurred image is subtracted from the other one and maxima and minima in the resulting image are detected. Detected points with low contrast or points lying along edges are discarded. Each detected key-point is characterized by the magnitude and rotation of the pixelwise image
gra-dient. Here Aij represents the value of each pixel, Mij is the magnitude of the
gradient and Rij is the orientation of the gradient.
Mij =
q
(Aij−Ai+1,j)2+ (Aij −Ai,j+1)2 (3.2)
Rij = arctan2(Aij−Ai+1,j, Ai,j+1−Aij) (3.3)
The above calculated values are post processed to increase the robustness to illumination changes.
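As an illustration, equations (3.2) and (3.3) can be evaluated directly on a grayscale image. The sketch below is a minimal example, not the thesis implementation, and assumes a row-major image stored as nested vectors:

```cpp
#include <cmath>
#include <vector>

struct Gradient { double magnitude, orientation; };

// Gradient magnitude and orientation at pixel (i, j) according to
// equations (3.2) and (3.3): finite differences toward the pixel
// below, (i+1, j), and to the right, (i, j+1). A is a grayscale image.
Gradient pixelGradient(const std::vector<std::vector<double>>& A,
                       std::size_t i, std::size_t j) {
    double dv = A[i][j] - A[i + 1][j];   // A_ij - A_{i+1,j}
    double dh = A[i][j] - A[i][j + 1];   // A_ij - A_{i,j+1}
    Gradient g;
    g.magnitude   = std::sqrt(dv * dv + dh * dh);            // (3.2)
    g.orientation = std::atan2(dv, A[i][j + 1] - A[i][j]);   // (3.3)
    return g;
}
```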
3.4.2 Speeded Up Robust Features
The feature detection algorithm SURF is partly inspired by SIFT and was first presented in Bay et al. [2006]. The method aims at being a fast yet reliable algorithm. It is both scale- and rotation invariant and handles blur and changes in illumination in a satisfying manner [Juan and Gwun, 2009]. The filters used in SURF are discretized and cropped, which results in rectangular filters with integral values. In this way the computational load is reduced and the extraction or detection of features is sped up.
3.5 Registration
In order to build a map of several point clouds, registration is necessary. Registration is the process of finding the best possible transformation to stitch two point clouds together. The point clouds are called destination, D, and source, S. This section presents some of the most used and well-known registration methods. In Figure 3.4 an example of registration between two point clouds is shown.
Figure 3.4: Example of registration between two point clouds. The overlap of the point clouds is seen most easily in the bottom of the figure. There is approximately 80% overlap.
3.5.1 Iterative Closest Point - Point to Point
Iterative Closest Point (ICP) is one of the most intuitive registration methods. The algorithm has been widely used and was developed in the early 90’s. A frequently cited analysis of the algorithm is Besl and McKay [1992]. The algorithm can be summarized in two key steps.
1. Find matching feature points between the two point clouds.
2. Compute a transform, T, consisting of a rotation and a translation, that minimizes the Euclidean distance between the matching points.
Iterating these two steps typically gives convergence to the desired transform. A key parameter to tune in the algorithm is the maximum matching distance threshold, e_max. With a complete overlap of the two point clouds a high value of e_max can be used and the algorithm will still converge. The parameter e_max has to be low if matching points only come from a small overlap; the choice of e_max is thus a trade-off between convergence and accuracy. Segal et al. [2009] present the algorithm as in Algorithm 1.
Algorithm 1 Standard Iterative Closest Point algorithm
Input: Two point clouds: D = {d_i}, S = {s_i}
1: Initial guess: transform T_0
Output: Correct transform T
2: T = T_0
3: while not converged do
4:   for i = 1 to N do
5:     d_i = FindClosestPointInD(T · s_i)
6:     if ||d_i − T · s_i|| < e_max then
7:       w_i = 1
8:     else
9:       w_i = 0
10:    end if
11:  end for
12:  T = argmin_T Σ_i w_i ||T · s_i − d_i||²
13: end while
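As a concrete, simplified instance of Algorithm 1, the sketch below implements point to point ICP in 2D, with a brute force closest point search and a closed form least squares estimate of the rigid transform in the argmin step. It is an illustrative reimplementation under stated assumptions, not the thesis code; the 2D restriction is made here for brevity.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Pt { double x, y; };

// A 2D rigid transform: rotation theta followed by translation (tx, ty).
struct Transform2D {
    double theta = 0, tx = 0, ty = 0;
    Pt apply(const Pt& p) const {
        return {std::cos(theta) * p.x - std::sin(theta) * p.y + tx,
                std::sin(theta) * p.x + std::cos(theta) * p.y + ty};
    }
};

// Point-to-point ICP (Algorithm 1, restricted to 2D): match each
// transformed source point to its nearest destination point, reject
// matches beyond e_max (w_i = 0), then re-estimate the transform in
// closed form from the centred matched point sets.
Transform2D icp2d(const std::vector<Pt>& dst, const std::vector<Pt>& src,
                  double e_max, int iterations) {
    Transform2D T;
    for (int it = 0; it < iterations; ++it) {
        std::vector<Pt> s, d;
        for (const Pt& sp : src) {
            Pt ts = T.apply(sp);
            double best = std::numeric_limits<double>::max();
            Pt bestp{0, 0};
            for (const Pt& dp : dst) {
                double dd = std::hypot(dp.x - ts.x, dp.y - ts.y);
                if (dd < best) { best = dd; bestp = dp; }
            }
            if (best < e_max) { s.push_back(sp); d.push_back(bestp); }
        }
        if (s.empty()) break;
        // Centroids of the matched sets.
        Pt cs{0, 0}, cd{0, 0};
        for (std::size_t i = 0; i < s.size(); ++i) {
            cs.x += s[i].x; cs.y += s[i].y; cd.x += d[i].x; cd.y += d[i].y;
        }
        cs.x /= s.size(); cs.y /= s.size(); cd.x /= d.size(); cd.y /= d.size();
        // Closed-form least-squares rotation and translation.
        double num = 0, den = 0;
        for (std::size_t i = 0; i < s.size(); ++i) {
            double ax = s[i].x - cs.x, ay = s[i].y - cs.y;
            double bx = d[i].x - cd.x, by = d[i].y - cd.y;
            num += ax * by - ay * bx;
            den += ax * bx + ay * by;
        }
        T.theta = std::atan2(num, den);
        T.tx = cd.x - (std::cos(T.theta) * cs.x - std::sin(T.theta) * cs.y);
        T.ty = cd.y - (std::sin(T.theta) * cs.x + std::cos(T.theta) * cs.y);
    }
    return T;
}
```

For a destination cloud that is a small translation of the source, the nearest-neighbour matches are correct in the first pass and the closed-form step recovers the transform immediately, illustrating the convergence behaviour discussed above.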
3.5.2 Generalized Iterative Closest Point
A far more advanced algorithm for registration is Generalized ICP, GICP, first presented by Segal et al. [2009]. It can be thought of as a plane to plane matching algorithm. By assigning probabilistic properties to extracted features a maximum likelihood estimate of the transform can be achieved. Furthermore, the probabilistic framework can make use of general research in probabilistic techniques for increased robustness, such as outlier rejection. The probabilistic model assumes that the points in the measured point clouds D̂ = {d̂_i} and Ŝ = {ŝ_i} are independently Gaussian distributed. This generates the probabilistic point clouds D and S with points d_i ∼ N(d̂_i, C_i^D) and s_i ∼ N(ŝ_i, C_i^S). Here C_i^D and C_i^S are the three dimensional covariances associated with the measured points.
By defining T* as the correct transform when perfect compliance between points occurs, it is known that

ŝ_i = T* d̂_i. (3.4)

Defining the function

e_i(T) = s_i − T d_i (3.5)

and considering this function evaluated with T*, as in equation (3.5), gives the probabilistic function

e_i(T*) ∼ N( ŝ_i − T* d̂_i, C_i^S + T* C_i^D (T*)^T ). (3.6)
This is the objective function for optimization, and a maximum likelihood estimate can be found by solving

T = argmax_T ∏_i p(e_i(T)) = argmax_T Σ_i log(p(e_i(T))),

where p is a Gaussian probability density function for a three dimensional variable with independent elements, see Appendix A.1. This can be simplified to

T = argmin_T Σ_i e_i(T)^T ( C_i^S + T C_i^D T^T )^{−1} e_i(T). (3.7)
The solution of Equation (3.7) gives a transform that optimizes the point cloud matching in a probabilistic manner. The equation also reveals that this truly is a Generalized ICP algorithm. By setting

C_i^S = I,
C_i^D = 0,

the standard point to point ICP algorithm is obtained. In this case Equation (3.7) becomes

T = argmin_T Σ_i e_i(T)^T e_i(T) = argmin_T Σ_i ||e_i(T)||²,

which coincides with the point to point ICP algorithm for matching features (w_i = 1), see line 12 in Algorithm 1.
3.6 General Graph Optimization
The General Graph Optimization package (abbreviated g2o) is an open source library which implements a general and efficient graph optimizer. A graph optimizer minimizes a nonlinear error function represented as a graph, see Figure 3.5. The nodes represent vectors of parameters and the edges represent how well two parameter vectors match an external constraint relating them. In the case where g2o is used by RGBDSLAM, each node in the graph represents a state variable of the robot and each edge represents a pairwise observation between the nodes that it connects. The meaning of a pairwise observation is that node B observes its pose with respect to node A, and vice versa. The algorithm is fast due to, among other things, its use of the sparse connectivity property of the graph and advanced solvers for sparse linear systems. The interested reader may refer to Kümmerle et al. [2011a] for a more thorough presentation.

Figure 3.5: To the left a graph with nodes x_1, ..., x_4 and edges e_12, e_23, e_24, e_31, and to the right the corresponding nonlinear error function F(x) = e_12^T I_12 e_12 + e_23^T I_23 e_23 + e_24^T I_24 e_24 + e_31^T I_31 e_31, where I_ij is the information matrix.
4 Models and Filtering
So far this thesis has given an overview of the system and an introduction to the open source algorithm RGBDSLAM. In an attempt to improve the performance of the system, according to the guidelines stated in Section 1.3, three steps are taken:
1. Wheel odometry and gyro data is measured.
2. An appropriate motion model for the robot is implemented.
3. Already existing measurements from visual odometry are fused with wheel odometry, gyro measurements and the motion model using an Extended Kalman Filter.
These three modifications result in an algorithm that in the following will be called the modified algorithm as opposed to the original algorithm.
This chapter introduces the mathematical models used to formulate the SLAM problem in a rigorous manner. The models consist of both motion models that describe the motion of the robot and measurement models that demonstrate how the sensor data is inserted into the framework. The chapter continues with an explanation of key properties of the filter that fuses the motion model and the measurement models.
4.1 Notation
The state vector of the robot at time instance k is called x_k and holds a three dimensional pose, i.e. a three dimensional position and a three dimensional orientation,

x_k = [ X_k  Y_k  Z_k  φ_k  θ_k  ψ_k ]^T. (4.1)
Here X_k, Y_k and Z_k represent the position of the robot in a world fixed coordinate system and φ_k, θ_k and ψ_k, the roll, pitch and yaw angles, are defined as rotations around the world fixed X, Y and Z axes, see Figure 4.1. The robot speed is denoted v_k and is, in the case of 2D movement, the speed in the X- and Y-direction. The angular velocity around the world fixed Z axis, i.e. the time derivative of the yaw angle, is denoted ω_k.
Figure 4.1: Definitions of roll, pitch and yaw.
By using v_k and ω_k an extended state vector on the form

x_k = [ X_k  Y_k  Z_k  φ_k  θ_k  ψ_k  v_k  ω_k ]^T (4.2)

can be formed. This state vector is used everywhere except in the original turtlebot_node package.

The sample time cannot be considered uniform, and thus the sample time from event k − 1 to k is denoted T_k.
The system can be modeled as a state space model, which in generic form is
xk+1= f (xk, uk) + wk, (4.3)
yk= h(xk, uk) + ek, (4.4)
where w_k and e_k are the process and measurement noise with covariance matrices Q and R respectively, i.e.
wk∼ N(0, Q), (4.5)
ek∼ N(0, R), (4.6)
and u_k is the direct input to the system. Inputs to the system are supplied via the state vector, so there are no direct inputs; u_k will therefore be disregarded from now on.
The measurement equation (4.4) will in the forthcoming cases be linear and can consequently be simplified as
yk = h(xk) + ek (4.7)
= Cxk+ ek, (4.8)
while the motion model (4.3) remains nonlinear.
4.2 Motion Models
RGBDSLAM has no motion model and hence there are no constraints on the estimated trajectory of the robot. The absence of a motion model also makes the algorithm heavily dependent on a continuous stream of images that contain sufficiently many re-detectable landmarks. To decrease the dependency on landmarks, well tuned motion models are introduced. This section describes the different motion models considered in this project.
4.2.1 Original Motion Model
The native navigation packages for the TurtleBot contain an EKF with a motion model. The filter uses a 6D model and estimates the robot pose with a 3D position and a 3D orientation estimate. The filter fuses data from the wheel odometry, the gyro and the visual odometry. Given the state vector (4.1) the general motion model is

x_{k+1} = [ X_{k+1}  Y_{k+1}  Z_{k+1}  φ_{k+1}  θ_{k+1}  ψ_{k+1} ]^T = f(x_k) + w_k = [ X_k  Y_k  Z_k  φ_k  θ_k  ψ_k ]^T + w_k. (4.9)
The model describes a constant position and fixed angles around the axes. It is a very simple model and perhaps not an appropriate one for a moving robot.
4.2.2 2D Constant Velocity Model
By using the extended state vector in (4.2), a more dynamic but still relatively simple model can be formed as
x_{k+1} = [ X_{k+1}  Y_{k+1}  Z_{k+1}  φ_{k+1}  θ_{k+1}  ψ_{k+1}  v_{k+1}  ω_{k+1} ]^T = f(x_k) + w_k
        = [ X_k + T_k cos(ψ_k) v_k,  Y_k + T_k sin(ψ_k) v_k,  0,  0,  0,  ψ_k + T_k ω_k,  v_k,  ω_k ]^T + w_k. (4.10)
This model describes a 2D motion where the robot is always in an upright position and travels with constant velocity. This model might fit the experimental setup better than the model in (4.9).
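The time update of (4.10) translates directly into code. The sketch below is illustrative; the State struct is a stand-in for however the state vector (4.2) is stored in the actual implementation.

```cpp
#include <cmath>

// Extended state vector (4.2): position, orientation, speed and yaw rate.
struct State {
    double X, Y, Z, roll, pitch, yaw, v, w;
};

// 2D constant velocity motion model, equation (4.10): the robot stays
// upright (Z = roll = pitch = 0) and moves with constant speed v and
// constant yaw rate w over the (possibly non-uniform) sample time Tk.
State constantVelocityPredict(const State& x, double Tk) {
    State n{};
    n.X   = x.X + Tk * std::cos(x.yaw) * x.v;
    n.Y   = x.Y + Tk * std::sin(x.yaw) * x.v;
    n.yaw = x.yaw + Tk * x.w;
    n.v   = x.v;
    n.w   = x.w;
    return n;
}
```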
4.2.3 2D Coordinated Turn Model
The motion modelling can be extended further by using the current yaw ψ_k and the angular velocity ω_k in a more sophisticated way. This is conveniently done by a 2D discretized coordinated turn model using polar velocity [Gustafsson, 2010], which is given in (4.12).

x_{k+1} = [ X_{k+1}  Y_{k+1}  Z_{k+1}  φ_{k+1}  θ_{k+1}  ψ_{k+1}  v_{k+1}  ω_{k+1} ]^T = f(x_k) + w_k (4.11)
        = [ X_k + (2v_k/ω_k) sin(ω_k T_k/2) cos(ψ_k + ω_k T_k/2),  Y_k + (2v_k/ω_k) sin(ω_k T_k/2) sin(ψ_k + ω_k T_k/2),  0,  0,  0,  ψ_k + T_k ω_k,  v_k,  ω_k ]^T + w_k. (4.12)
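The time update of (4.12) can be sketched as below. The guard that falls back to the constant velocity expression when ω_k is close to zero is an implementation choice added here to avoid division by zero; it is not part of the model as stated in the thesis.

```cpp
#include <cmath>

// State vector (4.2) as plain fields (illustrative struct).
struct CTState {
    double X, Y, Z, roll, pitch, yaw, v, w;
};

// 2D coordinated turn model with polar velocity, equation (4.12).
// For |w| below a small threshold the model is evaluated in its limit,
// which is exactly the constant velocity update (4.10).
CTState coordinatedTurnPredict(const CTState& x, double Tk) {
    CTState n{};
    if (std::fabs(x.w) < 1e-9) {            // limit w -> 0
        n.X = x.X + Tk * std::cos(x.yaw) * x.v;
        n.Y = x.Y + Tk * std::sin(x.yaw) * x.v;
    } else {
        double half = 0.5 * x.w * Tk;
        n.X = x.X + (2.0 * x.v / x.w) * std::sin(half) * std::cos(x.yaw + half);
        n.Y = x.Y + (2.0 * x.v / x.w) * std::sin(half) * std::sin(x.yaw + half);
    }
    n.yaw = x.yaw + Tk * x.w;
    n.v   = x.v;
    n.w   = x.w;
    return n;
}
```

With v_k = 1 and ω_k = π/2 over one second the robot traverses a quarter circle of radius v/ω = 2/π, ending at (2/π, 2/π), which is a quick sanity check of the discretization.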
4.3 Measurement Models
Measurement models are used to define how the sensors' measurements and the states are related. The system consists of three different sensor units:

• wheel odometry,
• gyro and
• visual odometry.

The three units contribute to the measurement equation in different ways. Note that the extended state vector (4.2) is used throughout this section.
4.3.1 Wheel Odometry
Tachometers are used to measure the wheel speeds of the robot. The TurtleBot platform uses specific hardware details and integration of speed to provide measurements of X_k, Y_k, ψ_k and the speed v_k. The measurement model of the wheel odometry sensor is given as

y_odom,k = C_odom x_k + e_odom,k(d) (4.13)

         = [ 1 0 0 0 0 0 0 0 ]
           [ 0 1 0 0 0 0 0 0 ]  x_k + e_odom,k(d) (4.14)
           [ 0 0 0 0 0 1 0 0 ]
           [ 0 0 0 0 0 0 1 0 ]

         = [ X_k  Y_k  ψ_k  v_k ]^T + e_odom,k(d). (4.15)
Here e_odom,k(d) is the sensor noise. The measured states from the odometry sensor are incrementally updated from an initial position and therefore the noise is also increasing. A simplified model for the noise is a Gaussian noise model according to

e_odom,k(d) ∼ N(0, R_odom(d)), (4.16)

where the covariance, R_odom(d), depends on the traveled distance, d, as

R_odom(d) = dS,

where S is a design parameter in the form of a constant covariance matrix for the specific sensor.
4.3.2 Gyro
The robot has a single axis gyro which measures the yaw angle and the angular velocity around the Z axis, i.e.

y_gyro,k = C_gyro x_k + e_gyro,k (4.17)

         = [ 0 0 0 0 0 1 0 0 ]  x_k + e_gyro,k (4.18)
           [ 0 0 0 0 0 0 0 1 ]

         = [ ψ_k  ω_k ]^T + e_gyro,k. (4.19)

The noise e_gyro,k is modeled as Gaussian noise with zero mean,

e_gyro,k ∼ N(0, R_gyro). (4.20)
4.3.3 Visual Odometry
The visual odometry provides information from the RGB camera and the depth camera of the Kinect™. Collection and post processing of data is described in Section 5.2. The concerned algorithms that calculate useful information from the post processed data are presented in Chapter 3. The result of the algorithms is an estimate of the full 3D pose of the robot, i.e.
y_vo,k = C_vo x_k + e_vo,k(votrust) (4.21)

       = [ 1 0 0 0 0 0 0 0 ]
         [ 0 1 0 0 0 0 0 0 ]
         [ 0 0 1 0 0 0 0 0 ]  x_k + e_vo,k(votrust) (4.22)
         [ 0 0 0 1 0 0 0 0 ]
         [ 0 0 0 0 1 0 0 0 ]
         [ 0 0 0 0 0 1 0 0 ]

       = [ X_k  Y_k  Z_k  φ_k  θ_k  ψ_k ]^T + e_vo,k(votrust). (4.23)
Here e_vo,k(votrust) is modeled as Gaussian noise according to

e_vo,k ∼ N(0, R_vo(votrust)). (4.24)

The measurement noise, R_vo(votrust), depends on the parameter votrust, which varies in a nonlinear way as described in Algorithm 3.
4.3.4 Total Model
Concatenation of the three measurement models above gives a total measurement equation on the form
y_k = [ y_odom,k;  y_gyro,k;  y_vo,k ] (4.25)
    = [ C_odom;  C_gyro;  C_vo ] x_k + [ e_odom,k;  e_gyro,k;  e_vo,k ] (4.26)
    = C x_k + e_k, (4.27)
where as a result the matrix C is

C = [ 1 0 0 0 0 0 0 0 ]
    [ 0 1 0 0 0 0 0 0 ]
    [ 0 0 0 0 0 1 0 0 ]
    [ 0 0 0 0 0 0 1 0 ]
    [ 0 0 0 0 0 1 0 0 ]
    [ 0 0 0 0 0 0 0 1 ]
    [ 1 0 0 0 0 0 0 0 ]
    [ 0 1 0 0 0 0 0 0 ]
    [ 0 0 1 0 0 0 0 0 ]
    [ 0 0 0 1 0 0 0 0 ]
    [ 0 0 0 0 1 0 0 0 ]
    [ 0 0 0 0 0 1 0 0 ]. (4.28)
The measurement noise e_k becomes

e_k(d, votrust) = [ e_odom,k(d);  e_gyro,k;  e_vo,k(votrust) ] (4.29)

with a corresponding covariance matrix R,

R(d, votrust) = [ R_odom(d)   0        0              ]
                [ 0           R_gyro   0              ]   (4.30)
                [ 0           0        R_vo(votrust)  ]
due to the assumption that the different measurement noises are independent of each other.
4.4 Extended Kalman Filtering
In this section the filtering process is described. All the sensors provide relative measurements, meaning that the measurements depend on earlier states. The input to the filter is given as the difference between the new measurement and an earlier estimated state x̂_k. Wheel odometry and gyro measurements are processed at full rate and the visual measurements are processed at a sub rate. The filtering process is illustrated in Figure 4.2. The filter is an Extended Kalman Filter using a first order Taylor expansion. It is implemented by the open source project Orocos Bayesian Filtering Library [http://www.orocos.org/bfl, 2012]. The filter used in this project is described by Algorithm 2.
Algorithm 2 Extended Kalman Filter
Require: x̂_{0|0} and P_{0|0}
Time update:
  x̂_{k+1|k} = f(x̂_{k|k})
  P_{k+1|k} = f'_x(x̂_{k|k}) P_{k|k} (f'_x(x̂_{k|k}))^T + Q_k
Measurement update:
  S_k = h'_x(x̂_{k|k−1}) P_{k|k−1} (h'_x(x̂_{k|k−1}))^T + R_k
  K_k = P_{k|k−1} (h'_x(x̂_{k|k−1}))^T S_k^{−1}
  ε_k = y_k − h(x̂_{k|k−1})
  x̂_{k|k} = x̂_{k|k−1} + K_k ε_k
  P_{k|k} = P_{k|k−1} − P_{k|k−1} (h'_x(x̂_{k|k−1}))^T S_k^{−1} h'_x(x̂_{k|k−1}) P_{k|k−1}
where Q_k is the process noise covariance and h(x) defines the measurement equation according to the general notation in Equation (4.7). The measurement equation is linear and therefore the derivative h'_x(x) = ∂Cx/∂x = C is a constant matrix. The derivatives of the motion models, f'_x(x), for the models in Section 4.2 are presented in Appendix A.2.
4.4.1 Filter and Model Evaluation
An overview of the filter and its information flow can be studied in Figure 4.2. The loop updating the filter with visual odometry data is run at a sub rate compared to the loop updating the filter with wheel odometry and gyro measurements, which is run at full filter rate.
Figure 4.2: The filtering process. Odom/gyro measurements (dead reckoning) update the EKF at full rate, while Kinect™ data processed by RGBDSLAM updates it at a sub rate, producing the estimate x̂_k.
A sample of how the input information to the filter may look is shown in Figure 4.3. The covariance of the visual odometry depends on the votrust parameter, which in turn depends on the number of reliable landmarks. The covariance of the wheel odometry sensor depends on the distance traveled since the last visual odometry update. This models the increased uncertainty related to the wheel odometry sensor as the distance to the last visual measurement increases. Figure 4.4 shows the prior distribution of the filter, i.e. the distribution after the system update but before the measurement update. The posterior distribution is updated by the filter during every measurement update. The dots at every third update symbolize that visual odometry has provided information to the filter. These dotted estimates are the ones sent back to the RGBDSLAM algorithm. It can also be seen that the filter covariance increases in the same pattern as the odometry covariance.
Figure 4.3: Measurements from wheel odometry, gyro and visual odometry (filter inputs, X [m] versus Y [m]).
Figure 4.4: Filter updates (X [m] versus Y [m]): prior estimates together with odom/gyro influenced and vo influenced filter estimates.
5 Implementation and Evaluation Methods
In this chapter the cooperation of RGBDSLAM and the filter is described. Methods for data collection and post processing are also presented. The chapter ends with a section that explains the experimental setup and the evaluation criteria used when evaluating the implemented algorithm.

5.1 Filter Integration in RGBDSLAM
The models in Chapter 4 are well defined and strict in a mathematical sense but are not well suited for direct implementation in C++ and ROS. The theoretical models need some adjustments to fit the framework, and additional functionality, e.g. synchronization of data, is required.
Figure 5.1: The tasks of pose estimation. RGBDSLAM performs pose estimation (RANSAC, GICP) and issues a filter call to the EKF, which also receives odom/gyro input; the filtered pose is then received back by RGBDSLAM.
Figure 5.1 gives an overview of how the pose estimation is performed in the algorithm. When new RGB and depth images, wheel odometry data and gyro data are captured, three things can happen:
1. All of the information is processed.
2. Only the information from the odometry and gyro sensors is processed.
3. None of the information is processed.
In the first case, where all data is used, the visual data is joined by the odometry and gyro data in the box named "Filter call". Here the measurements from the three sensors are inserted into a ROS service call and sent to the filter using a blocking call-response method. During the service call the input data to the filter is published on the input topics of the filter node. After reception of new data the filter checks the time stamps of the measurements and, if the three measurements are sufficiently close in time, the internal Gaussian probability distribution is updated accordingly. A new estimate of the mean value of the distribution is computed and returned as the current pose estimate.
In the second case, where the visual data is disregarded, a multirate filter call is carried out. In this case a filter update with only odometry and gyro measurements (and information from the motion model) is used to create a new estimate of the robot position. This estimate is not sent back to RGBDSLAM, but used internally in the filter.
The third case, where none of the data is used, occurs when the parameters for data skipping are used to lower the rate of input data to the algorithm.
If data is scattered in time the distribution is not updated. As a consequence the
time between updates, in Chapter 4 referred to as Tk, is varying.
5.1.1 Covariance of Visual Odometry
The implemented software has a parameter, votrust (as in visual odometry trust), which adapts the trust in the visual data. Its value reflects how well the visual SLAM algorithm has succeeded in finding and matching landmarks in the processed images. The parameter is set according to the following algorithm.
Algorithm 3 Set Trust in Visual Odometry
Input: Number of total matches between images, N_m; number of inliers (found by RANSAC), N_i; boolean use_loop_closure; sequential IDs of the nodes that are matched, ID1 and ID2
Output: Noise parameter for visual measurement, votrust
1: if use_loop_closure then
2:   if ID2 − ID1 < minimum distance for loop closure then
3:     votrust is set to an adjustable value less than one
4:     loop_closure = true
5:   end if
6: end if
7: if !loop_closure then
8:   if votrust > 10 and N_m ≠ 0 and N_i ≥ 10 then
9:     if N_i/N_m > 0.7 then
10:      votrust = votrust/10
11:    end if
12:  else
13:    votrust = 1
14:    if N_i == 0 then
15:      votrust = 10000
16:    else if N_i < 10 then
17:      votrust = 1000
18:    else if N_m ≠ 0 then
19:      if N_i/N_m < 0.75 then
20:        votrust = 5
21:      else if N_i/N_m < 0.8 then
22:        votrust = 3
23:      else if N_i/N_m < 0.85 then
24:        votrust = 2
25:      end if
26:    else
27:      votrust = 1000000
28:    end if
29:  end if
30: end if
As Algorithm 3 shows, the value of the parameter is decreased if there is a possible loop closure, and increased if there are few matches of landmarks between images.
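The branch of Algorithm 3 that runs when no loop closure is detected (lines 7-30) can be sketched as a plain function. This is an illustrative transcription of the algorithm above, not the actual implementation, and the function name is chosen here for clarity.

```cpp
// Non-loop-closure branch of Algorithm 3 (lines 7-30). A high return
// value means low trust in the visual odometry measurement.
double updateVoTrust(double votrust, int Ni, int Nm) {
    if (votrust > 10.0 && Nm != 0 && Ni >= 10) {
        // Trust was low; restore it if the inlier ratio is good.
        if (static_cast<double>(Ni) / Nm > 0.7)
            votrust /= 10.0;
    } else {
        votrust = 1.0;
        if (Ni == 0) {
            votrust = 10000.0;        // no inliers at all
        } else if (Ni < 10) {
            votrust = 1000.0;         // very few inliers
        } else if (Nm != 0) {
            double ratio = static_cast<double>(Ni) / Nm;
            if (ratio < 0.75)       votrust = 5.0;
            else if (ratio < 0.8)   votrust = 3.0;
            else if (ratio < 0.85)  votrust = 2.0;
        } else {
            votrust = 1000000.0;      // no matches at all
        }
    }
    return votrust;
}
```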
5.2 Data Collection and Post Processing
In order to test parameter adjustments or new coding strategies in a repeatable way, it is not feasible to every now and then start up the TurtleBot and run it
along some predefined track. The reasons for this are various, but amongst them are
• the time consumption,
• the variation of the driven route and
• changes in the environment.
The solution is to record the data generated by a drive in such a way that it can be replayed time after time. This is conveniently done by the rosbag command, see Section 2.7.
The visual data is presented in different ways on different topics. The data from the gyro and wheel odometry sensors are given in one single form on one single topic each, so there are no alternatives for how to record them. In addition to the concrete measurement data, the /tf topic is recorded. This topic contains the transformations between different coordinate systems fixed in both the TurtleBot and its surroundings. In the post processing, when RGBDSLAM is running, the /tf topic is supplemented with other transforms published by RGBDSLAM.
Spelled out, all the recorded topics are:

• /camera/depth/camera_info
• /camera/depth/image
• /camera/rgb/camera_info
• /camera/rgb/image_color
• /imu/data
• /odom
• /tf
Recording all these topics gives data at a rate of about 70 [Mb/s]. The data cannot be replayed at full speed, since that would make the input buffers of RGBDSLAM overflow. Instead data is played at a speed of 1/2 to 1/20 of full speed, depending on the parameter values used in the algorithm. The data flow when replaying a bag file can be seen in Figure 5.2.
Figure 5.2: Data flow when playing a bag file. The ellipses are ROS nodes (/rosbag_play, /rgbdslam_mod, /robot_pose_ekf_mod) and the rectangles are ROS topics (/camera/rgb/image_color, /camera/rgb/camera_info, /camera/depth/image, /clock, /tf, /imu/data, /odom).
5.3 Evaluation Criteria
Evaluation methods can be very different depending on the experimental setup and the resources available. A motion tracking system would, for example, give the opportunity to determine the position of the robot with high precision. Sensitive laser scan equipment could, by scanning the environment where the tests are carried out, give a sort of ground truth for the estimated 3D map. In this project, however, such high precision evaluation instruments have not been used. Instead the results have been evaluated using more ordinary techniques: 3D maps have been evaluated using ocular inspection, and the estimated trajectories have been evaluated using the ground truth matcher described in Section 5.3.1. There are three key factors which are altered in order to evaluate the performance of the modified algorithm. Trajectories and maps can be computed
1. with or without filter,
2. with the original algorithm parameters set to give a robust or a less computationally heavy performance and
3. using different data sets.
The parameter alternatives mentioned above are specified in Table 5.1. The different choices essentially consist of deciding how often to call the filter, how many keypoints to extract and how many matches to require between images.
The filter frequency determines how often to send both visual and odometric information to the filter, whereas the multirate filter frequency sets how often to
Table 5.1: Parameter specifications.

Parameter                  | Robust setup | Slimmed setup | Comment
---------------------------|--------------|---------------|--------------------------------------------------
Filter frequency           | 2.5 Hz       | 1.5 Hz        | Vo, odom and gyro measurements
Multirate filter frequency | 7.5 Hz       | 6 Hz          | Only odom and gyro measurements
Max. keypoints             | 400          | 75            | Maximum number of keypoints extracted from an image
Min. keypoints             | 100          | 50            | Minimum number of keypoints extracted from an image
Sufficient matches         | 100          | 60            | Sufficient number of matches between keypoints from different images
Min. matches               | 75           | 20            | Minimum number of matches between keypoints from different images
send odometric data only. The maximum and minimum number of keypoints simply specify how many keypoints to extract from the images using the chosen feature extractor (SURF is used in this project). The sufficient matches and min. matches parameters decide how many matches between the sets of keypoints from different images are to be seen as a sufficient number and a minimum number, respectively. If the sufficient number of matches is reached, the algorithm will not try to find further matches, and if there are fewer than the minimum number of matches, the algorithm will not try to calculate any transform between the corresponding images.
Two data sets have been used in the evaluation. These are referred to as the "figure-eight" data set and the "few landmarks" data set. In the figure-eight set the robot is run two laps in a figure-eight pattern, starting with a left turn at the base of the eight. The data set is quite ordinary in the sense that the room where it is recorded is not very big and the environment gives the robot varying depth and RGB information. The second data set, the few landmarks set, unsurprisingly gives the algorithm rather few landmarks to work with. It is recorded in a partly more sterile environment with a flat, white wall. Both data sets are recorded by driving the robot manually with a PlayStation 3 controller along a track marked with tape on the floor. The different environments can be studied in Figure 5.3 and Figure 5.4.
(a) Overview of the office room.
(b) Another view from the office room.
Figure 5.3: Views from the office in which the figure-eight data set was