Institutionen för systemteknik
Department of Electrical Engineering

Examensarbete (Master's thesis)

Pose estimation of a VTOL UAV using IMU, Camera and GPS

Master's thesis carried out in Automatic Control
at Linköping Institute of Technology
by
Fredrik Bodesund

LiTH-ISY-EX--10/4360--SE

Linköping 2010

Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden


Pose estimation of a VTOL UAV using IMU,

Camera and GPS

Master's thesis carried out in Automatic Control
at Linköping Institute of Technology
by
Fredrik Bodesund

LiTH-ISY-EX--10/4360--SE

Supervisors: Martin Skoglund, ISY, Linköpings universitet
             Magnus Sethson, CybAero AB

Examiner: David Törnqvist, ISY, Linköpings universitet


Division, Department: Division of Automatic Control, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Date: 2010-10-15
Language: English
Report category: Examensarbete (Master's thesis)
ISRN: LiTH-ISY-EX--10/4360--SE
URL for electronic version: http://www.control.isy.liu.se , http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-60641

Title: Position- och orienteringsskattning av en VTOL UAV med IMU, Kamera och GPS
(Pose estimation of a VTOL UAV using IMU, Camera and GPS)

Author: Fredrik Bodesund


Abstract

When an autonomous vehicle has a mission to perform, it is of high importance that the robot has good knowledge about its position. Without good knowledge of the position, it will not be able to navigate properly and the data that it gathers, which could be of importance for the mission, might not be usable. A helicopter could for example be used to collect laser data from the terrain beneath it, which could be used to produce a 3D map of the terrain. If the knowledge about the position and orientation of the helicopter is poor then the collected laser data will be useless since it is unknown what the laser actually measures.

A successful solution to position and orientation (pose) estimation of an autonomous helicopter, using an inertial measurement unit (IMU), a camera and a GPS, is proposed in this thesis. The problem is to estimate the unknown pose using sensors that measure different physical attributes and give readings containing noise.

An extended Kalman filter solution to the simultaneous localisation and mapping (SLAM) problem is used to fuse data from the different sensors and estimate the pose of the robot. The scale invariant feature transform (SIFT) is used for feature extraction and the unified inverse depth parametrisation (UIDP) model is used to parametrise the landmarks. The orientation of the robot is described by quaternions.

To be able to evaluate the performance of the filter an ABB industrial robot has been used as reference. The pose of the end tool of the robot is known with high accuracy and gives a good ground truth so that the estimations can be evaluated. The results show that the algorithm performs well and that the pose is estimated with good accuracy.

Sammanfattning

When an autonomous vehicle is to carry out a mission, it is of the utmost importance that it has good knowledge of its position. Without this it will not be able to navigate, and the data it collects, relevant to the mission, may be unusable. For example, a helicopter could be used to collect laser data of the terrain beneath it in order to create a 3D map of the terrain. If the knowledge of the helicopter's position and orientation is poor, the collected laser measurements will be useless, since it is not known what the laser actually measures.

This thesis presents a well-functioning solution for position and orientation estimation of an autonomous helicopter using an inertial measurement unit (IMU), a camera and GPS. The problem is to estimate the position and orientation using sensors that measure different physical quantities and whose measurements contain noise.

An extended Kalman filter (EKF) solution to the simultaneous localisation and mapping (SLAM) problem is used to fuse data from the different sensors and estimate the position and orientation. Scale invariant feature transform (SIFT) is used for feature extraction, and unified inverse depth parametrisation (UIDP) is used to parametrise the landmarks. The orientation of the robot is described by quaternions.

To evaluate the estimates, an ABB industrial robot was used as reference during data collection. Since the robot has good knowledge of the position and orientation of its end tool, the performance of the filter can be examined. The results show that the algorithm works well and that the estimates have high accuracy.


Acknowledgments

I would first of all like to thank CybAero for letting me perform this thesis, and especially Magnus Sethson who was my supervisor from CybAero. It has been an interesting and fun task filled with problems and obstacles. I have gained much knowledge and experience during this period because of the difficult and challenging problem.

Second of all I would like to thank my examiner David Törnqvist and supervisor Martin Skoglund for all your input and support. Without your help, I would still be looking for errors in the code and would never have found that special misplaced minus sign.

I would like to thank my parallel thesis companion Erik Almqvist. The time spent working on this thesis would have been very difficult, boring and unsatisfying without you. I hope our frequent visits to various cafés have given you something more than a bigger belly.

I would also like to thank Peter Nordin and Lars Anderson for all your help, especially during the start of the thesis. Your knowledge about the hardware and how to record the data was very valuable and saved Erik and me a lot of headaches.

Finally I would like to thank my wife for all support and patience during my education and this work. It is finally finished and I hope that you are proud of me.


Contents

1 Introduction
  1.1 Background
  1.2 CybAero AB
  1.3 Objectives
  1.4 Related Work

2 Hardware
  2.1 Inertial Measurement Unit
  2.2 Camera
  2.3 Global Positioning System
  2.4 ABB Industrial Robot IRB1400
  2.5 Sensor Platform

3 Related Theory
  3.1 Feature Extraction
    3.1.1 Harris Corner detector
    3.1.2 Speeded Up Robust Features
    3.1.3 Scale Invariant Feature Transform
  3.2 Kinematics
    3.2.1 Rotation
    3.2.2 Quaternions
    3.2.3 Dynamics of Quaternions
  3.3 Filter theory
    3.3.1 Kalman Filter
    3.3.2 Extended Kalman Filter
  3.4 Simultaneous Localisation and Mapping
    3.4.1 Problem Formulation
    3.4.2 SLAM Algorithm
    3.4.3 Implementation of EKF-SLAM

4 Models
  4.1 Coordinate Systems
  4.2 Dynamic Model
  4.3 Measurement Equations
    4.3.1 GPS Measurement Equation
    4.3.2 Camera Measurement Equation
  4.4 Unified Inverse Depth Parametrisation
    4.4.1 Geometry and parametrisation
    4.4.2 Feature Initialization

5 Sensor Calibration
  5.1 Time Synchronization
    5.1.1 Visual Synchronization
    5.1.2 Synchronization by Correlation
  5.2 IMU Calibration
  5.3 Camera Calibration
    5.3.1 Camera Model
    5.3.2 Calibration Method
  5.4 GPS Calibration

6 State Estimation
  6.1 Sensor Fusion
    6.1.1 Data Management
    6.1.2 Time Step
    6.1.3 State Estimation Algorithm
  6.2 Data Association and Map Management
    6.2.1 Landmark Initiation
    6.2.2 Data Association
    6.2.3 Map Management

7 Results
  7.1 Flight Over Book
    7.1.1 Test 1: IMU and GPS
    7.1.2 Test 2: IMU and Camera
    7.1.3 Test 3: IMU, Camera and GPS

8 Conclusions & Future Work

Bibliography

Chapter 1

Introduction

Knowing your position in the world is of vital importance for anyone who wants to go from one place to another. Humans navigate by using inputs from their senses, e.g. eyes and ears. By fusing these inputs with previous knowledge of the environment, a person can come to a conclusion about where he is. If the person is in an unknown environment he gathers new data that is stored in the mind. This new data can be used for navigation in this unknown area, and if he returns to the same place again he can recognise the environment and conclude that he has been there before. To further help the navigation, for example, a map could be used. Humans, however, are not perfect at navigation and it is easy to get lost in unknown terrain, since our memory and senses are not flawless.

When the same problem is transferred to a robot, where the exact position and orientation, also called pose, is unknown, the robot has to rely on data from its sensors to estimate its pose. But the data from the sensors contain noise and therefore the true position can never be known exactly. Another problem is that the estimated position contains an error that adds to the next estimated position error, resulting in a drift from the true position that increases as time goes on. To compensate for this drift the robot must either have knowledge of the drift or a sensor that gives readings of its true position, like the global positioning system (GPS). Together with the sensor data a dynamic model can be used to describe the motion of the robot. There therefore exist kinematic conditions that have to be fulfilled even if the sensor data, because of measurement noise, give readings that contradict more natural motions. By fusing the dynamic model and the sensor readings a better pose estimate can be achieved. The sensor readings and the dynamic model are often weighted and fused in a probabilistic filter, like an extended Kalman filter (EKF) or a particle filter (PF).

To further improve the navigation estimate the robot could build a map of its environment, resulting in simultaneous localisation and mapping (SLAM). By adding landmarks to this map the robot can position itself with respect to these landmarks. The landmarks are used in the same way that we humans use reference objects in our surroundings from which we can navigate. These landmarks can also be used by the robot to recognise that the scene has been previously observed, so called loop closure. In this work only short-time SLAM will be used, which means that only the most recent landmarks will be saved and old landmarks will be removed if not observed over a time period. Therefore only the most recent landmarks will be used for navigation. This can be compared to visual odometry, where features are tracked for the sole purpose of estimating the trajectory.

This part of the work will give a description of the problem and its background, a presentation of CybAero, related work, objectives for the work and a disposition of this report.

1.1 Background

CybAero develops and manufactures vertical take-off and landing unmanned aerial vehicles (VTOL UAV), i.e. unmanned helicopters, see Figure 1.1. One of many applications for these vehicles is to measure the ground and produce a topographic map. These maps can be used in civil applications to measure volumes of garbage, wood chips etc. In military applications they may be used to compare topographic maps with different timestamps in order to reveal newly hidden objects, for example explosives.

Tekniska Verken AB is a company that (among many other services) handles waste from households and industries, producing and supplying district heating and cooling via its own network, as well as electricity. Waste that cannot be recycled or transformed into energy is stored in a landfill. There are regulations and laws concerning the growth of these landfills, hence Tekniska Verken AB has to measure their landfill volumes once a year. Measuring the growth of the landfills is today done by measuring the position of several points across the landfills with a GPS. This is a dangerous task and accidents have occurred. The volume estimate is also very crude, since the measurement points are few and the GPS data has errors.

The idea to use a helicopter equipped with a light detection and ranging (LIDAR) sensor to measure the landfill was first proposed by Tekniska Verken. They then contacted CybAero to see if they could provide this service, which is examined in this work.

1.2 CybAero AB

CybAero develops and manufactures, as mentioned earlier, VTOL UAVs and related sensor systems. Each system is built to meet customer specifications for civilian or military applications.

CybAero got its formal start in 2003, although the research and development for the company’s technology began in 1992 via a joint research project between The Swedish National Defence Research Agency (FOI) and Linköping University. The headquarters of CybAero is located in Linköping, Sweden. Today, CybAero has 18 employees.


(a) Cybaero APID 60 Helicopter.

(b) Helicopter flying over landfills collecting data for pose estimation and 3D mapping.

Figure 1.1. The APID 60 helicopter (a) and the helicopter in flight (b).

1.3 Objectives

The service that CybAero wants to provide for Tekniska Verken AB is to produce a topographic map of the ground with a LIDAR-equipped helicopter. It is then possible to estimate the volume of the landfill from this topographic map. In order to create this topographic map, good knowledge of the pose of the helicopter at each instant is required. Without good knowledge of the position and the orientation of the helicopter it is difficult to know what the LIDAR actually measures. Therefore, the goal of this master's thesis is to implement filters and algorithms that estimate the pose of the helicopter using an inertial measurement unit (IMU), a camera and a GPS. This pose estimate will later be used in the parallel master's thesis Airborne mapping using LIDAR, [1], to produce a topographic map from LIDAR measurements. The goals of this work can be described in the following way:

• Collect data from sensors using an ABB industrial robot to simulate a helicopter.

• Implement filters and algorithms that estimate the pose of the simulated helicopter.

• Evaluate the results and see if they are good enough to meet the requirements stated in the Airborne mapping using LIDAR, [1], Master's thesis.

• Merge the estimated pose with the 3D Mapping with known Poses for UAV master's thesis and evaluate the landfill volume estimated from the model.

1.4 Related Work

Pose estimation and navigation are crucial parts for a UAV, so there are many articles and books of importance to this work. The area of biggest importance to this work is sensor fusion, where the readings of different sensors are combined (fused) to get a better description of the observed system. In this work an IMU, a camera and a simulated GPS will be used to get an estimate of the helicopter movement. Most often a probabilistic filter is used to fuse these measurements together with a dynamic model to get an estimate of the pose.

When it comes to navigation for a robot or UAV, one of the more discussed topics is simultaneous localisation and mapping, see [11], [29] or [8]. This means that the robot maps its environment and uses this map to help navigation. A popular sensor for detecting features in the environment is a camera. Since computational speed has grown fast during the last decades, machine vision has become a hot topic for research and is now seen in many applications concerning, for example, navigation, see [23] or [3], and object recognition, see [20].

A feature in an image is something that is distinguishable from its surroundings and easy to find in another image of the same scene. When the features have been chosen, the robot tracks these features to try to estimate how it is moving in relation to them. The features are stored in a map, making it possible for the robot to determine, when features are extracted, whether they have been observed at an earlier stage. If that is the case then the robot can compensate for bias in its estimated position that has occurred since the last time the same feature was observed.

There is a lot of research surrounding this topic and many articles to read for information. Since SLAM is computationally heavy, many of these articles try to describe different solutions to this problem and how to implement efficient methods using, for example, an EKF or a PF.

A nice introduction to the SLAM problem can be found in [8]. Here the author describes how SLAM works, the different obstacles that one might encounter and different solutions to the problem.

In [5] a solution to the SLAM problem using only a camera is proposed. The author uses the EKF-SLAM framework and the saliency operator of Shi and Tomasi to extract features, see [30]. The single camera approach can also be seen in [16]. A practical experiment of SLAM in an autonomous helicopter can be found in [18]; the EKF-SLAM approach is used there as well.

A proposal on how to use SIFT features in SLAM can be seen in [25]. Here the author proposes how the features can be matched and how statistics can be used to evaluate whether the landmarks behave well.

In [28] a UKF-SLAM approach is used instead, together with a SLAM method where old landmarks are discarded if not observed and matched for a period of time.

Some alternatives to solve the SLAM problem can be seen in [3] and [6] where the problem is seen as an optimization problem.


Chapter 2

Hardware

The sensors and other equipment used in this work are described in this chapter. Figure 2.1 illustrates a possible setup of the sensors on the helicopter. The vector p describes the position of the helicopter in the world fixed XYZ-frame and the marked square area is the field of view for the camera.

(a) Sensors attached to the helicopter. (b) Field of view for the camera.

Figure 2.1. Illustrative picture of how the sensors would be attached to the helicopter.

2.1 Inertial Measurement Unit

An inertial measurement unit (IMU) is a sensor that measures the inertial forces generated by the movement of the sensor. The most frequent applications for an inertial system are navigation, attitude measurement and platform stabilization. An IMU consists of accelerometers, measuring the specific force f, which is the sum of the acceleration a_t and the Earth's gravitational field, in all three dimensions. It also contains gyroscopes that measure the rotational velocity ω_t in all three dimensions. This is shown in Figure 2.2 (b). Since

a_t = \frac{\partial^2}{\partial t^2} p_t  and  ω_t = \frac{\partial}{\partial t} α_t,

the position p_t and angle α_t can be calculated by integration. However, since the measurements contain noise and bias, this calculation will diverge from the true value, and the error will increase as time goes by. This will be further addressed in Chapter 6. This means that sensor readings from the IMU are only trustworthy during shorter time periods and cannot be used to calculate position or orientation from data collected over a longer time interval.
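As a rough illustration of this drift, the following sketch (not part of the thesis implementation; the bias and noise levels are assumed values) dead-reckons a stationary accelerometer axis by double integration. The position error grows roughly quadratically with time because of the bias.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1.0 / 125.0                  # IMU sample period, approximately 125 Hz
    bias, noise_std = 0.02, 0.05     # assumed accelerometer bias [m/s^2] and noise level
    acc = bias + noise_std * rng.standard_normal(125 * 60)   # one minute of "zero" acceleration

    v = np.cumsum(acc) * T           # first integration: velocity
    p = np.cumsum(v) * T             # second integration: position
    print(f"position error after 60 s: {p[-1]:.1f} m")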

IMUs that have good performance even over longer time periods are available, but come with the drawback of a very high price and large physical size. IMUs like these are today used in, for example, military applications and civilian applications with very high demands on reliability, accuracy and safety. It is however, from an economical point of view, interesting to see if similar performance can be achieved using a cheaper device accompanied by smart signal processing.

A Crossbow IMU400CC has been used in this work, which can be seen in Figure 2.2 (a); it delivers data at ≈ 125 Hz. For more information about IMUs see Crossbow's homepage: www.xbow.com.

(a) The inertial measurement unit, Crossbow IMU400CC, used in this work.


(b) The coordinate system in which the IMU delivers data.

Figure 2.2. The IMU (a) and its coordinate system (b).

2.2 Camera

A camera is a device that produces 2D images of a 3D scene, and cameras are today used in a vast number of different applications. For a long time cameras were not widely used as sensors for robots and other machines, because it is computationally heavy to extract relevant information from a picture. Today's computers are, however, fast, and cameras are popular as sensors in many different areas.

In this work, the images produced from the camera will help to support the pose estimation of the robot. By extracting so called features from the images the relative movement of the robot can be estimated by tracking these features. This will be further explained in Section 3.1.

The camera used is a Marlin F-080, see Figure 2.3, producing images with a resolution of 656×494 at either 7.5, 15 or 20 Hz. In this work it is assumed that a frame rate of 7.5 fps is enough. For more information see Allied Vision Technologies' homepage: www.alliedvisiontec.com

Figure 2.3. The camera, Marlin F-080, used in this work.

2.3 Global Positioning System

The global positioning system (GPS) is a satellite navigation system and one of the more crucial and most used systems for navigation today, since it brings information about the receiver's position in a world fixed coordinate system. The GPS system was created and realized by the American Department of Defense in 1973 but became available to anyone with a GPS receiver in 1994. The system is accurate and gives the position with an error of a few meters, depending on how many satellites are available for positioning. A conceptual image of the GPS system is shown in Figure 2.4, found at [34].

Since the robot's movement is very small in comparison to the noise an actual GPS receiver would give, simulated GPS data is used instead. It is made from the position data collected from the ABB robot: by adding noise to the position of the robot, a simulated GPS measurement is obtained, which can be used as input. This simulated measurement is produced at 10 Hz to mimic the real GPS that will be used on the helicopter.
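A minimal sketch of how such a simulated GPS could be generated from the logged robot positions, assuming the 100 Hz ground truth is downsampled to 10 Hz and that the vertical variance is twice the horizontal one as stated in Section 4.3.1; the noise level and function name are illustrative, not taken from the thesis implementation.

    import numpy as np

    def simulate_gps(p_true_100hz, sigma_xy=0.02, rng=None):
        """p_true_100hz: (N, 3) ground-truth positions logged at 100 Hz.
        Returns noisy positions at 10 Hz with var(z) = 2 * var(x) = 2 * var(y)."""
        rng = rng or np.random.default_rng()
        p_10hz = p_true_100hz[::10]                       # downsample 100 Hz -> 10 Hz
        sigma = np.array([sigma_xy, sigma_xy, np.sqrt(2) * sigma_xy])
        return p_10hz + sigma * rng.standard_normal(p_10hz.shape)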

2.4 ABB Industrial Robot IRB1400

In order to evaluate the pose estimation the true pose has to be known. In many cases GPS is used as ground truth, which makes it hard to evaluate how good the estimations are, since it is inaccurate and does not give any information about the orientation. In order to avoid this problem, an ABB industrial robot IRB1400 is used as ground truth, simulating the movement of the helicopter. The position and orientation of the end tool of the robot is given with high precision and gives a good reference for the estimation. The robot logs the position and orientation of the calibrated point of the sensor platform at 100 Hz. The calibration point is located right below the IMU in order to put the IMU as close to the rotation centre as possible and minimise "lever effects" that would cause the IMU to read accelerations due to rotation. The robot itself can be seen in Figure 2.5 (c) and (d).

Figure 2.4. An illustrative image of the GPS system.

2.5 Sensor Platform

The sensor platform is an aluminium plate on which the sensors described in this chapter and a URG-04LX scanning laser range sensor are mounted, see Figure 2.5 (b). The laser is not used for pose estimation but is instead used in the Airborne mapping using LIDAR Master's thesis, see [1], to create maps of the terrain. The platform itself can be seen in Figure 2.5 (a), the LIDAR in Figure 2.5 (b) and the mounting of the sensors in Figure 2.5 (d).


(a) Sensor platform onto which the sensors are attached.

(b) The lidar, URG-04LX, used in the parallel thesis, see [1].

(c) The ABB industrial robot IRB1400.

(d) Sensors mounted on the sensor platform, which is attached to the ABB industrial robot IRB1400.

Figure 2.5. The sensor platform, the LIDAR and the ABB industrial robot.


Chapter 3

Related Theory

The theory used in this work is described in this chapter. The chapter starts with a presentation of image processing and gives a short introduction to different techniques for extracting features in images. These features are tracked between images in order to estimate the pose of the robot in relation to them. Since several different coordinate systems are used in this work, the chapter then continues with how transformations between different coordinate systems are performed and represented. Then theory surrounding non-linear filtering and SLAM is presented, which is used for the actual state estimation of the robot.

3.1 Feature Extraction

A feature in an image is something that is well defined and easy to recognize between different images. For humans, an object like a cow on a grass field or a TV in a living room is easy to distinguish from its surroundings, even if the object is seen from different perspectives. Object recognition is, however, hard and computationally demanding. This means that there has to be a different criterion deciding what is easy to find and recognise between different images. Some of the available feature extractors describe the feature in vector form with a descriptor that is as unique as possible and distinguishable from other features in the image. It is required that the features have a high degree of repeatability, which means that the same feature will be extracted from different images when the same scene is observed. It is also required that the descriptor is similar between images, even if the scene is observed from a slightly different position and angle.

Three different ways to find features in an image are described in this section. In this work only the SIFT extractor is used but the others are presented so that the reader can get an understanding of different ways to find features.


3.1.1 Harris Corner detector

One of the most frequently used feature detectors is the Harris corner detector, see [13]. The Harris corner detector was first proposed by Harris and Stephens in 1988 and has been widely used since, as it is simple to understand and easy to implement. The Harris corner detector is based on the structure tensor, or second order moment matrix. It measures the local changes of the signal when small patches are shifted by a small amount in different directions. The structure tensor is defined as in (3.1), in a point x = (x, y)^T,

A(x, σ) = \sum_{x_i, y_i} g(x, σ) \begin{pmatrix} I_x^2 & I_x I_y \\ I_y I_x & I_y^2 \end{pmatrix} = \begin{pmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_y I_x \rangle & \langle I_y^2 \rangle \end{pmatrix},   (3.1)

where I_x and I_y represent the image derivatives in the x- and y-directions, and the window

g(x, σ) = \frac{1}{2\pi\sigma^2} e^{-(x^2+y^2)/2\sigma^2},   (3.2)

is the Gaussian function. Here (x_i, y_i) are the points, centred around x, which define the neighbourhood area in the analysis, and σ is the standard deviation. The window does not need to be a Gaussian function, but if a circularly weighted window is used then the response will be isotropic.

If both of the eigenvalues of the matrix A are large, then it is likely that the point x is near a corner in the image, and if only one of the eigenvalues is large then x is near an edge. The Harris corner detector is based on an approximation of using the eigenvalues to decide whether the point x is near a corner or not: instead the trace and determinant of A(x) are studied, which are tightly related to the eigenvalues. See [13] for more information.

Since the Harris detector does not come with a descriptor of the feature, other methods have to be used to determine if an extracted feature is the same as one seen in previous images. One way to solve this problem is to use normalized cross correlation (NCC), which is reliable when it comes to finding a match of a template in a larger image.

There are several feature extractors available today that are based on the Harris corner detector but are improved with respect to some of the issues with the original detector, see for example [21].
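A minimal NumPy/SciPy sketch of the Harris response described above, assuming the common corner measure det(A) − k·trace(A)² with k ≈ 0.04 (the constant and the derivative filters are assumptions, not given in the text):

    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def harris_response(img, sigma=1.5, k=0.04):
        """Corner response from the structure tensor A(x, sigma) in (3.1)."""
        img = img.astype(float)
        Ix = sobel(img, axis=1)        # image derivative in the x direction
        Iy = sobel(img, axis=0)        # image derivative in the y direction
        # elements of A, smoothed with the Gaussian window g(x, sigma) from (3.2)
        Ixx = gaussian_filter(Ix * Ix, sigma)
        Iyy = gaussian_filter(Iy * Iy, sigma)
        Ixy = gaussian_filter(Ix * Iy, sigma)
        det, trace = Ixx * Iyy - Ixy**2, Ixx + Iyy
        return det - k * trace**2      # large positive values indicate corners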

3.1.2 Speeded Up Robust Features

Another way to find features is to study the Hessian instead. It is defined as

H(x, σ) = \begin{pmatrix} L_{xx}(x, σ) & L_{xy}(x, σ) \\ L_{yx}(x, σ) & L_{yy}(x, σ) \end{pmatrix},   (3.3)

where L_{xx} denotes convolution of the image with the second order derivative of the Gaussian function, \frac{\partial^2}{\partial x^2} g(x, σ), at a point x. The function g(x, σ) is the same as in (3.2). The Hessian gives responses to shapes like blobs or ridges.

One of the more state-of-the-art methods for extracting features is the speeded up robust features (SURF) detector and descriptor, first presented by Herbert Bay et al. in 2006, see [2]. It is based on an approximation of the Hessian with so called box filters, which can be evaluated very quickly using integral images. The orientation is calculated using Haar wavelets in the circular region of interest defined by the Hessian.

For region description, a square area with the same orientation as the region of interest is chosen and divided into 4×4 sub-areas. The orientation is then calculated in each of these sub-areas in both the x and y direction, again using Haar wavelets. These orientations, and their absolute values, are used to compose the descriptor vector of 4×4×4 = 64 elements.

3.1.3 Scale Invariant Feature Transform

One of the most used feature extractors today is the scale invariant feature transform (SIFT), which is invariant to image rotation, scale and translation, and partially invariant to illumination changes and affine or 3D projections. SIFT was developed by David G. Lowe in 2004, see [20], and has since been widely used in many different applications concerning object recognition and tracking. It uses the maxima and minima of the difference-of-Gaussians (DoG) function to extract features. The DoG function is defined as

D(x, σ) = L(x, k_i σ) − L(x, k_j σ),   (3.4)

where L(x, kσ) represents the scale-space of an image, that is, the image I(x) convolved with the variable-scale Gaussian

L(x, kσ) = g(x, kσ) ∗ I(x).   (3.5)

Here the function g(x, σ) is the same as in (3.2). The variable k represents the different scales of the image and is an octave, meaning that k is doubled between two different scale-spaces L(x, kσ). Hence a DoG image between scales k_i σ and k_j σ is just the difference of the Gaussian-blurred images at scales k_i σ and k_j σ.

A keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the extreme point of the DoG function. These samples are then accumulated into histograms summarizing the contents in 8 orientations over 4×4 sub-regions. Therefore each feature is described using a 4×4×8 = 128-element vector.
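The difference-of-Gaussians stack of (3.4)-(3.5) can be sketched as below. The base scale σ = 1.6 and scale factor k = √2 are typical values from Lowe's paper and are assumptions here; keypoint selection and descriptor computation are omitted.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_stack(img, sigma=1.6, k=np.sqrt(2), n_scales=4):
        """Return the DoG images D(x, sigma) = L(x, k^{i+1} sigma) - L(x, k^i sigma)."""
        L = [gaussian_filter(img.astype(float), sigma * k**i) for i in range(n_scales + 1)]
        return [L[i + 1] - L[i] for i in range(n_scales)]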

An example of extracted features can be seen in Figure 3.1, where a dragonfly is shown which was caught on camera outside my wife's parents' house this summer.


(a) Original image of dragonfly.

(b) Dragonfly image with extracted features.


3.2 Kinematics

In applications where there are several coordinate systems, the relations between these systems have to be known, or estimated, if the position of an object expressed in a coordinate system (B) is to be expressed in a world fixed coordinate system (W). From rigid body mechanics we know that all of these transformations can be described by a translation and a rotation. The translation is a vector describing the displacement between the two systems and the rotation is the change of direction of the coordinate axes, often described by a rotation matrix. These rotations can be parametrized in different ways, for example by Euler angles or unit quaternions, see [26] for an overview.

In Figure 3.2 two coordinate systems are shown. A world fixed coordinate system is represented by the (W) system, and the (B) system represents a, possibly time varying, system that is translated and rotated relative to the (W) system. The point r, measured in the (B) system, is described in the (W) system as

r^W = r_t + R^{WB} r^B,   (3.6)

where r_t is the translation vector between the two coordinate systems, r^B is the vector to the point r in the (B) system and R^{WB} is the rotation matrix from the (B) system to the (W) system.

Figure 3.2. The world fixed system (W) and system (B).

3.2.1 Rotation

The rotation of a vector u around n is shown geometrically in Figure 3.3. The new vector p is described as

p = \cos\varphi\, u + (1 − \cos\varphi)\, n (n \cdot u) − \sin\varphi\, (n \times u).   (3.7)

This is called the rotation formula, and the reader is referred to [14] or [31] for complete deductions. Notice that the rotation is performed counter clockwise in the positive direction. This orientation could equally be obtained by a clockwise rotation of the coordinate frame instead.

Figure 3.3. The rotation of the vector u about n resulting in p.

These vector rotations are used in the following way. Consider a point r measured in the frame (W). The same point could be expressed in the B-system (consider the case where r_t = 0 in Figure 3.2) instead by applying the rotation formula

r^B = \cos\varphi\, r^W + (1 − \cos\varphi)\, n^W (n^W \cdot r^W) − \sin\varphi\, (n^W \times r^W),   (3.8)

where n^W is the unit rotation axis expressed in the W-system and ϕ is the rotation angle. The cross product can be rewritten as

u × (v × w) = v(w · u) − w(u · v),   (3.9)

and can be expressed as a matrix-vector multiplication,

w × v = \begin{pmatrix} 0 & −w_3 & w_2 \\ w_3 & 0 & −w_1 \\ −w_2 & w_1 & 0 \end{pmatrix} v ≜ S(w) v.   (3.10)

These two properties make it possible to rewrite (3.8) as

r^B = \cos\varphi\, r^W + (1 − \cos\varphi)\, n^W (n^W \cdot r^W) − \sin\varphi\, (n^W \times r^W)
    = \cos\varphi\, r^W + (1 − \cos\varphi)\, (n^W \times (n^W \times r^W) + r^W) − \sin\varphi\, (n^W \times r^W)
    = \big( I + (1 − \cos\varphi) S(n^W)^2 − \sin\varphi\, S(n^W) \big) r^W ≜ R^{BW} r^W.   (3.11)

Therefore a rotation of a coordinate frame can be defined using a rotation matrix,

r^B = R^{BW} r^W.   (3.12)

Rotation matrices belong to the orthogonal group, having the properties

R ∈ { A ∈ R^{3×3} : A^T A = I, det A = 1 }.   (3.13)

This means that the inverse of a rotation matrix is its transpose, so the transformation back between the two coordinate systems is given by the transpose of the rotation matrix,

r^W = (R^{BW})^{−1} r^B = (R^{BW})^T r^B = R^{WB} r^B.   (3.14)
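A small sketch of (3.10)-(3.11), using the sign convention of (3.11) as reconstructed above (a frame rotation by the angle ϕ around the unit axis n); the function names are illustrative.

    import numpy as np

    def skew(w):
        """S(w) from (3.10): w x v = S(w) v."""
        return np.array([[0.0, -w[2], w[1]],
                         [w[2], 0.0, -w[0]],
                         [-w[1], w[0], 0.0]])

    def rotation_matrix_BW(n, phi):
        """R^{BW} from (3.11): maps coordinates in (W) to coordinates in (B)."""
        S = skew(np.asarray(n, dtype=float))
        return np.eye(3) + (1.0 - np.cos(phi)) * S @ S - np.sin(phi) * S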

3.2.2 Quaternions

Quaternions are an extension of the complex numbers, defined as a vector of four real numbers q = (q_0, q_1, q_2, q_3)^T, and were first proposed by Sir William Rowan Hamilton in 1844 [12]. A quaternion can also be seen as a vector consisting of a scalar part q_0 and a vector part q as in (3.15). The reader is referred to [19] for more information about quaternions or to [31] for an introduction.

In this section bold font is used to represent vectors in R^3 to simplify notation. In the rest of this work all vectors of any dimension are represented with bold font.

A unit quaternion can be written as

q = \begin{pmatrix} q_0 \\ \mathbf{q} \end{pmatrix}.   (3.15)

The multiplication of two quaternions is defined in the following way,

p ⊙ q = \begin{pmatrix} p_0 q_0 − \mathbf{p} \cdot \mathbf{q} \\ p_0 \mathbf{q} + q_0 \mathbf{p} + \mathbf{p} \times \mathbf{q} \end{pmatrix},   (3.16)

and quaternions have the following properties:

p ⊙ q ≠ q ⊙ p   (3.17)
‖q‖ = \sqrt{\sum_{j=0}^{3} q_j^2}   (3.18)
‖p ⊙ q‖ = ‖p‖ ‖q‖   (3.19)
(p ⊙ q) ⊙ v = p ⊙ (q ⊙ v)   (3.20)

The inverse is defined such that

q^{−1} ⊙ q = q ⊙ q^{−1} = \begin{pmatrix} 1 \\ \mathbf{0} \end{pmatrix}.   (3.21)

This means that the inverse can be calculated as

q^{−1} = \begin{pmatrix} q_0 \\ \mathbf{q} \end{pmatrix}^{−1} = \begin{pmatrix} q_0 \\ −\mathbf{q} \end{pmatrix} / ‖q‖^2.   (3.22)


These properties make it possible to use quaternions to represent rotations. Think of the quaternion as in (3.15), where the scalar part q_0 represents a rotation angle and q is the vector around which the rotation occurs. This is the case in (3.23),

q = \begin{pmatrix} q_0 \\ \mathbf{q} \end{pmatrix} = \begin{pmatrix} \cos θ \\ \sin θ\, \mathbf{n} \end{pmatrix}.   (3.23)

Here θ is the rotation angle around the unit vector n, where n is parametrized using direction cosines,

n = \begin{pmatrix} \cos α \\ \cos β \\ \cos γ \end{pmatrix}.   (3.24)

Direction cosines are a parametrisation making it possible to describe the direction of a vector in a unique way, where each of the angles represents the angle from one of the coordinate axes to the vector, see [31] for a richer description. If this parametrisation is used, vector rotations can be represented using quaternions.

To rotate a vector u about n, which is represented by the quaternion q, the rotated vector can be described by

v = q^{−1} ⊙ u ⊙ q,   (3.25)

where v = (0, \mathbf{v})^T and u = (0, \mathbf{u})^T. The result of this quaternion multiplication is

v = \begin{pmatrix} (\mathbf{q} \cdot \mathbf{u}) q_0 − (q_0 \mathbf{u} − \mathbf{q} \times \mathbf{u}) \cdot \mathbf{q} \\ (\mathbf{q} \cdot \mathbf{u}) \mathbf{q} + q_0 (q_0 \mathbf{u} − \mathbf{q} \times \mathbf{u}) + (q_0 \mathbf{u} − \mathbf{q} \times \mathbf{u}) \times \mathbf{q} \end{pmatrix},   (3.26)

which can be simplified to

v = \begin{pmatrix} 0 \\ 2(\mathbf{q} \cdot \mathbf{u})\mathbf{q} + (q_0^2 − \mathbf{q} \cdot \mathbf{q})\mathbf{u} − 2 q_0\, \mathbf{q} \times \mathbf{u} \end{pmatrix}.   (3.27)

By using (3.23) the expression can be rewritten as

v = \begin{pmatrix} 0 \\ 2 \sin^2 θ\, \mathbf{n}(\mathbf{n} \cdot \mathbf{u}) + (\cos^2 θ − \sin^2 θ)\mathbf{u} − 2 \cos θ \sin θ\, (\mathbf{n} \times \mathbf{u}) \end{pmatrix},   (3.28)

and by using trigonometric identities it can be rewritten as

v = \begin{pmatrix} 0 \\ \cos 2θ\, \mathbf{u} + (1 − \cos 2θ)\mathbf{n}(\mathbf{n} \cdot \mathbf{u}) − \sin 2θ\, (\mathbf{n} \times \mathbf{u}) \end{pmatrix},   (3.29)

which is the same as (3.7) if θ = ϕ/2. Therefore the unit quaternion

q = \begin{pmatrix} \cos(ϕ/2) \\ \sin(ϕ/2)\, \mathbf{n} \end{pmatrix},   (3.30)

describes a rotation by the angle ϕ around the unit axis n.

It is, as described in (3.11), possible to rewrite this vector rotation as a matrix multiplication, which gives the following rotation matrix,

R(q) = \begin{pmatrix} q_0^2 + q_1^2 − q_2^2 − q_3^2 & 2(q_1 q_2 + q_0 q_3) & 2(q_1 q_3 − q_0 q_2) \\ 2(q_1 q_2 − q_0 q_3) & q_0^2 − q_1^2 + q_2^2 − q_3^2 & 2(q_2 q_3 + q_0 q_1) \\ 2(q_1 q_3 + q_0 q_2) & 2(q_2 q_3 − q_0 q_1) & q_0^2 − q_1^2 − q_2^2 + q_3^2 \end{pmatrix}.   (3.31)

Quaternions have the advantage of being more robust than Euler angles, since they are a non-singular parametrisation. They also have a bilinear differential equation, making it possible to describe the dynamics of the quaternions as a matrix multiplication, as we will see in the following section. Euler angles, on the other hand, have fewer parameters and are more intuitive, but suffer from singularities and have non-linear dynamics.
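The quaternion product (3.16) and the rotation matrix (3.31) translate directly into code; a minimal sketch, with quaternions stored as arrays [q0, q1, q2, q3] (the storage order and function names are assumptions):

    import numpy as np

    def quat_mult(p, q):
        """Quaternion product p (.) q as defined in (3.16)."""
        p0, pv = p[0], np.asarray(p[1:], dtype=float)
        q0, qv = q[0], np.asarray(q[1:], dtype=float)
        return np.concatenate(([p0 * q0 - pv @ qv],
                               p0 * qv + q0 * pv + np.cross(pv, qv)))

    def quat_to_rotation(q):
        """Rotation matrix R(q) from (3.31) for a unit quaternion q."""
        q0, q1, q2, q3 = q
        return np.array([
            [q0**2 + q1**2 - q2**2 - q3**2, 2*(q1*q2 + q0*q3),             2*(q1*q3 - q0*q2)],
            [2*(q1*q2 - q0*q3),             q0**2 - q1**2 + q2**2 - q3**2, 2*(q2*q3 + q0*q1)],
            [2*(q1*q3 + q0*q2),             2*(q2*q3 - q0*q1),             q0**2 - q1**2 - q2**2 + q3**2]])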

3.2.3 Dynamics of Quaternions

Consider two systems, a world fixed W-system and a moving body fixed B-system. The dynamics of the quaternions have to be known, since the B-system is time varying and the quaternions represent the relation between the two systems. The differentiation of the quaternions will not be derived here; only the result is presented. The unit quaternion q^{BW} represents the rotation between the two systems and ω^{BW} is the rotational velocity of the B-system relative to the W-system,

\dot{q}^{BW}(t) = \frac{1}{2} q^{BW}(t) ⊙ ω^{BW}(t)
= \frac{1}{2} \begin{pmatrix} 0 & −ω_x & −ω_y & −ω_z \\ ω_x & 0 & ω_z & −ω_y \\ ω_y & −ω_z & 0 & ω_x \\ ω_z & ω_y & −ω_x & 0 \end{pmatrix} \begin{pmatrix} q_0 \\ q_1 \\ q_2 \\ q_3 \end{pmatrix}
= \frac{1}{2} \begin{pmatrix} −q_1 & −q_2 & −q_3 \\ q_0 & −q_3 & q_2 \\ q_3 & q_0 & −q_1 \\ −q_2 & q_1 & q_0 \end{pmatrix} \begin{pmatrix} ω_x \\ ω_y \\ ω_z \end{pmatrix},   (3.32)

where the first matrix is denoted S(ω^{BW}) and the second \tilde{S}(q^{BW}).

The IMU readings are in this work incorporated in the dynamic motion model for the filter. Since the reading of the rotational velocity ω^{BW} contains measurement noise w, this has to be accounted for. The measurement noise w is small and it is approximated that ω^{BW} + w can be used as angular velocity. This gives the following expression,

\dot{q}^{BW} = \frac{1}{2} S(ω^{BW} + w) q^{BW},   (3.33)

which can be written as

\dot{q}^{BW} = \frac{1}{2} S(ω^{BW}) q^{BW} + \frac{1}{2} \tilde{S}(q^{BW}) w.   (3.34)
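A sketch of how (3.32) can be discretised for the filter's time update, assuming a simple Euler step over the sample period T followed by renormalisation; the same factor I_4 + (T/2) S(ω) appears in the dynamic model (4.1).

    import numpy as np

    def s_omega(omega):
        """S(omega) from (3.32)."""
        wx, wy, wz = omega
        return np.array([[0.0, -wx, -wy, -wz],
                         [ wx, 0.0,  wz, -wy],
                         [ wy, -wz, 0.0,  wx],
                         [ wz,  wy, -wx, 0.0]])

    def propagate_quaternion(q, omega, T):
        """One Euler step of the quaternion dynamics, q_{k+1} ~ (I4 + T/2 S(omega)) q_k."""
        q_next = (np.eye(4) + 0.5 * T * s_omega(omega)) @ q
        return q_next / np.linalg.norm(q_next)   # keep the quaternion on the unit sphere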


3.3 Filter theory

Probability filtering deals with the problem of estimating the probability of the system states, x_t, given measurements, y_{1:t}, that is

p(x_t | y_{1:t}).   (3.35)

The following section will give a short description of linear and non linear filtering. These filters are often derived using a Bayesian view where the states are seen as hidden, which means that the states are not directly observed but only represented by a probability density function (PDF).

To read more about the Bayesian framework and its models the reader is referred to [7]. The books [11] and [29] have much information and theory regarding linear and non-linear filtering, derivations of the filters and their different applications. David Törnqvist's PhD thesis [31] gives a nice overview of filtering theory and its applications, and much more information relevant to the theory than what is presented in this work. No bold fonts are used in the following sections, since the states, measurements and inputs can be of any dimension, even scalars.

3.3.1 Kalman Filter

The Kalman filter (KF) was derived around 1960 by R.E. Kalman and R.S. Bucy and has since then been one of the most used tools in signal processing. See [17] for Kalman's own derivation.

The KF is the best linear unbiased estimator (BLUE), which means that the KF is an optimal filter in the case of linear systems and Gaussian noise.

The KF requires linear systems and the following section will present the used model.

Linear State Space Model

The KF estimates the states in a linear state space model,

x_{k+1} = F_k x_k + G^u_k u_k + G^v_k v_k,   (3.36a)
y_k = H_k x_k + H^u_k u_k + e_k.   (3.36b)

The index k here represents the discrete time or sample number, since it does not necessarily have to be interpreted as time. The process and measurement noise are assumed Gaussian, that is

v_k ∼ N(0, Q_k),   (3.37a)
e_k ∼ N(0, R_k).   (3.37b)


Algorithm 1 The Kalman Filter

1: Initialization: \hat{x}_{1|0} = E[x_0] and P_{1|0} = Cov[x_0]

2: Time Update:
\hat{x}_{k+1|k} = F_k \hat{x}_{k|k} + G^u_k u_k   (3.38a)
P_{k+1|k} = F_k P_{k|k} F_k^T + G^v_k Q_k (G^v_k)^T   (3.38b)

3: Measurement Update:
K_k = P_{k|k−1} H_k^T (H_k P_{k|k−1} H_k^T + R_k)^{−1}   (3.38c)
\hat{x}_{k|k} = \hat{x}_{k|k−1} + K_k (y_k − H_k \hat{x}_{k|k−1} − H^u_k u_k)   (3.38d)
P_{k|k} = (I − K_k H_k) P_{k|k−1}   (3.38e)

4: Set k = k + 1 and repeat from step 2
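A compact sketch of Algorithm 1, assuming the model matrices and noise covariances of (3.36)-(3.37) are given as NumPy arrays; the input terms are dropped for brevity.

    import numpy as np

    def kf_time_update(x, P, F, Gv, Q):
        """Time update (3.38a)-(3.38b), without input."""
        return F @ x, F @ P @ F.T + Gv @ Q @ Gv.T

    def kf_measurement_update(x, P, y, H, R):
        """Measurement update (3.38c)-(3.38e), without input."""
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x_new = x + K @ (y - H @ x)
        P_new = (np.eye(P.shape[0]) - K @ H) @ P
        return x_new, P_new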

3.3.2 Extended Kalman Filter

The extended Kalman filter (EKF) is one of the most used methods for state estimation in non-linear dynamic systems. It is an approximation that makes it possible to use the framework from the KF even for non-linear systems. The model used here is non-linear with white noise,

\dot{x}_t = f(x_t, u_t, v_t, t),   (3.39a)
y_t = h(x_t, u_t, t) + e_t.   (3.39b)

The system is sampled, which gives the model

x_{k+1} = f(x_k, u_k, v_k, k),   (3.40a)
y_k = h(x_k, u_k, k) + e_k.   (3.40b)

In order to use the KF theory the system has to be linear. This is achieved by linearising the models around the latest state estimate with a first order Taylor expansion,

f(x_k) ≈ f(\hat{x}_{k|k}, u_k) + F_k (x_k − \hat{x}_{k|k}),   (3.41a)
g(x_k) ≈ g(\hat{x}_{k|k}, u_k) + G^v_k v_k,   (3.41b)
h(x_k) ≈ h(\hat{x}_{k|k−1}, u_k) + H_k (x_k − \hat{x}_{k|k−1}),   (3.41c)


where the matrices F_k, G^v_k and H_k are the Jacobians of the functions, that is

F_k ≜ \frac{∂f(x, u, v, k)}{∂x} \Big|_{(x,u,v,k)=(\hat{x}_{k|k}, u_k, 0, k)},   (3.42a)
G^v_k ≜ \frac{∂f(x, u, v, k)}{∂v} \Big|_{(x,u,v,k)=(\hat{x}_{k|k}, u_k, 0, k)},   (3.42b)
H_k ≜ \frac{∂h(x, u, k)}{∂x} \Big|_{(x,u,k)=(\hat{x}_{k|k−1}, u_k, k)}.   (3.42c)

This is an approximation that does not guarantee optimality, but it has proven to be useful in many cases even though the model is not linear. It usually works fine as long as the rest term, which is neglected in the EKF, is small. Filters like the unscented Kalman filter take this rest term into consideration and try to estimate it. However, as long as the EKF algorithm performs well there is little to gain in estimating the rest term. The reader is referred to [11] for alternative filters.

The EKF algorithm is presented in Algorithm 2.

Algorithm 2 The Extended Kalman Filter

1: Initialization: \hat{x}_{1|0} = E[x_0] and P_{1|0} = Cov[x_0]

2: Time Update:
\hat{x}_{k+1|k} = f(\hat{x}_{k|k}, u_k, k)   (3.43a)
P_{k+1|k} = F_k P_{k|k} F_k^T + G^v_k Q_k (G^v_k)^T   (3.43b)

3: Measurement Update:
K_k = P_{k|k−1} H_k^T (H_k P_{k|k−1} H_k^T + R_k)^{−1}   (3.43c)
\hat{x}_{k|k} = \hat{x}_{k|k−1} + K_k (y_k − h(\hat{x}_{k|k−1}, u_k))   (3.43d)
P_{k|k} = (I − K_k H_k) P_{k|k−1}   (3.43e)

4: Set k = k + 1 and repeat from step 2

3.4 Simultaneous Localisation and Mapping

Simultaneous localisation and mapping (SLAM) refers to the problem of estimating the position of an autonomous robot using observations of landmarks in the environment, whose positions are unknown, while also mapping this environment. These landmarks can be any kind of object in the environment that is easy to find and recognise with an available sensor. Often lasers, sonars or cameras are used to identify features that are candidates for landmarks. Then some criterion is used to determine if the feature is appropriate to incorporate in the filter. Suitable landmarks depend on which sensor is used to extract the features, but they can be a well defined corner, a line or some kind of marker etc.

Many different solutions to SLAM have been suggested over the years, where EKF-SLAM has probably been the most used, see for example [5], [16] or [23], but PF solutions are also popular, see [9] or [31]. Much of the research is about the technical difficulties with SLAM and which approximations can be made while still obtaining good results. A major issue is the computational cost of performing EKF-SLAM and the problem of loop closing, which is to recognise previously seen features even though they have not been observed for a time. In SLAM the landmarks are treated as states in the state vector, which means that the state vector will grow as more landmarks are initiated. What makes EKF-SLAM work is that a full covariance matrix describing the correlation between the robot states and the different landmarks is obtained, see [27]. The problem is that the computational cost of updating these matrices in the EKF grows as O(n²), where n is the number of states, so large maps with over 100 landmarks become hard to handle in real time. The reader is referred to [8] and [31] for more information about computational complexity.

One of the problems with PF solutions is that more particles give better performance, but at a higher computational cost. Another problem is that the PF map becomes degenerate over time and loop closure becomes hard, see [31].

The SLAM problem has been a hot topic for research over the last 20 years and there is a vast amount of literature and articles covering the topic.

The following section will present the EKF-SLAM solution from a Bayesian point of view.

3.4.1 Problem Formulation

In SLAM the PDF of the states, x, and the map, m, is estimated given observations y, that is

p(x_t, m_t | y_{1:t}),   (3.44)

which is similar to (3.35). The only difference is that the map states now also have to be taken into consideration. The state space model becomes similar to the model in (3.40) with the additional map state vector,

x_{k+1} = f(x_k, u_k, v_k, k),   (3.45a)
m_{k+1} = m_k,   (3.45b)
y_k = h(x_k, m_k) + e_k.   (3.45c)

Since SLAM problems often are non-linear in the robot dynamics or the measurement equation, linearisation has to be done in the same way as for the EKF. The Jacobian matrices are calculated in the same way as in (3.42), with the exception of the measurement equation, which has to be linearised around the m_k states as well,

H_k ≜ \frac{∂h(x, m)}{∂(x, m)} \Big|_{(x,m)=(\hat{x}_{k|k−1}, \hat{m}_{k|k−1})}.   (3.46)


It is now possible to use the same framework as for the EKF to make a working SLAM algorithm, but there are a few more concepts of SLAM that have to be presented.

If the robot travels in an unknown environment only the robot states exist from the start, and as the robot travels around, features are extracted and landmarks are initialized. This extends the state vector and covariance matrix according to which parametrisation is used to define the landmarks. Often only the position of the landmark is described, resulting in two or three parameters defining the position in 2D or 3D. Angles can be used to define in which direction the landmark is positioned relative to the robot. In full SLAM six parameters are used to describe the landmark, and in Section 4.4 one of these parametrisations will be presented, which is also the parametrisation used in this work.

When features are extracted, it must be checked whether a feature belongs to an existing landmark or if it is a newly discovered feature that might be an appropriate candidate for a new landmark. This is called the association step and will be further discussed in the map management Section 6.2. If an association is made then the detected landmark will be used in the measurement update and the states, x, m, and the covariance, P, will be updated according to the EKF algorithm. If it is a newly discovered feature, it will be added to the map and the robot will use it for localisation in the following time steps.


3.4.2 SLAM Algorithm

The SLAM algorithm is presented in Algorithm 3. To make the notation simpler, z represents the whole state vector, including the robot states and the map states m, while x represents only the robot states.

Algorithm 3 Simultaneous Localisation and Mapping

1: Initialization:
\hat{z}_{1|0} = \begin{pmatrix} \hat{x}_{1|0} \\ \hat{m}_{1|0} \end{pmatrix},  P_{1|0} = \begin{pmatrix} P^{xx}_{1|0} & P^{xm}_{1|0} \\ P^{mx}_{1|0} & P^{mm}_{1|0} \end{pmatrix}   (3.47)

2: Association step: Detect features and associate with landmarks

3: Measurement Update:
K_k = P_{k|k−1} H_k^T (H_k P_{k|k−1} H_k^T + R_k)^{−1}   (3.48a)
\hat{z}_{k|k} = \hat{z}_{k|k−1} + K_k (y_k − h(\hat{z}_{k|k−1}))   (3.48b)
P_{k|k} = (I − K_k H_k) P_{k|k−1}   (3.48c)

4: Add new landmarks

5: Time Update:
\hat{x}_{k+1|k} = f(\hat{x}_{k|k}, u_k, k)   (3.48d)
P_{k+1|k} = \begin{pmatrix} F_k P^{xx}_{k|k} F_k^T & F_k P^{xm}_{k|k} \\ (P^{xm}_{k|k})^T F_k^T & P^{mm}_{k|k} \end{pmatrix} + \begin{pmatrix} G^v_k Q_k (G^v_k)^T & 0 \\ 0 & 0 \end{pmatrix}   (3.48e)
\hat{m}_{k+1|k} = \hat{m}_{k|k}   (3.48f)

6: Set k = k + 1 and repeat from step 2

In many SLAM applications the map is unknown from the initialization, meaning that only the robot states x and the covariance P^{xx}_{1|0} need to be initialized from the start. It is also worth mentioning that only the associated landmarks will be updated in the EKF measurement update; non-associated landmarks will not be updated.

3.4.3 Implementation of EKF-SLAM

There are several steps in the SLAM algorithm that have to be carried out in order to perform the measurement and time updates. The models, for example, have to be linearised and correct matches between features and landmarks have to be made. These steps are described in more detail in this section.


Linearising

To perform the measurement update the measurement equation has to be linear or linearised. The measurement from the simulated GPS is linear, since it measures the position directly, and no linearisation is needed. The measurement equation for the camera, see Section 4.3.2, is however non-linear and has to be linearised. This could be done analytically, but that is complicated for this measurement model and a numerical method is used instead. The derivative is, in this work, approximated by a central difference, that is

f'(x) ≈ \frac{f(x + h) − f(x − h)}{2h}.   (3.49)

Here f is a function, h is some small number and x is the variable, scalar or vector.
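A small sketch of how the Jacobian of a non-linear measurement function can be formed column by column with the central difference (3.49); the step size and function name are assumptions.

    import numpy as np

    def numerical_jacobian(h, x, eps=1e-6):
        """Approximate dh/dx at x using central differences, one state at a time."""
        x = np.asarray(x, dtype=float)
        y0 = np.asarray(h(x), dtype=float)
        J = np.zeros((y0.size, x.size))
        for i in range(x.size):
            dx = np.zeros_like(x)
            dx[i] = eps
            J[:, i] = (np.asarray(h(x + dx)) - np.asarray(h(x - dx))) / (2.0 * eps)
        return J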

Prediction and innovation

Every time a measurement is made, the corresponding measurement model is used to predict the outcome of the measurement according to the current states. The result of subtracting the prediction from the actual measurement is called the innovation,

ε_k = y_k − h(\hat{z}_{k|k−1}).   (3.50)

Here y_k is the actual measurement, h is the measurement model and \hat{z}_{k|k−1} is the latest state estimate. How much the states will be updated depends on the size of this innovation and the Kalman gain K_k, which is calculated as in Algorithm 2 or 3. The Kalman gain K_k is the optimal feedback gain for convergence and noise suppression, given that the system is linear and that the noise in the models is Gaussian.

The prediction for the GPS measurement is the current state estimate of the position of the robot. The innovation tells in what way the current states should be updated: increase the x position, decrease the z position and so on. The measurements are, as mentioned, noisy, and the Kalman gain controls how the states are actually updated.

The prediction for the landmarks in the map is, however, a little more complex. It is done in the following way. First the position of the landmark in the world is estimated using the measurement equation, see Equation 4.9. The latest landmark states and the current robot position are used to predict at which position the landmark is expected to be. Then the camera model, as described in Section 5.3.1, is applied to compute the pixel location of the predicted landmark. Figure 3.4 visualizes this prediction.

Gating

Figure 3.4. Prediction of landmarks in an image. The red crosses are extracted features and the arrows point at the positions where the landmarks are predicted to appear. The number of the landmark is printed if an association with a feature is made.

When an image is available and features have been extracted, the positions of the landmarks in the map are used to predict where they are expected to appear in the image. Unless a landmark is positioned outside the field of view of the camera, an area around the predicted position will be searched for features. If no features

are found within the area, then the landmark will be unassociated and thus not used in the measurement update. If, however, features are found, then their descriptors will be compared with the landmark's descriptor. This will be further explained in Section 6.2.

The method of restricting in what region the measurement needs to be found is called gating. In this work a constant area around the prediction has been used to define in what region of the image the measurement has to be found for an association and measurement update to be made. This is done for outlier rejection and to prevent misassociations, to which SLAM is very sensitive.

Another way to do this is to use the innovation covariance,

S^i_k = H^i_k P_{k|k−1} (H^i_k)^T + R^i.   (3.51)

A measurement is considered valid if the quantity

(ε^i_k)^T (S^i_k)^{−1} ε^i_k ≤ λ,   (3.52)

is below the predefined threshold λ. Here ε^i_k is the innovation, as described in Equation 3.50, for the observation of landmark i. The matrices H^i_k and R^i are also calculated from this observation. The test checks whether it is likely that the observation is correct. This is a χ²-test with two degrees of freedom and the number λ can be calculated analytically. The test checks whether the innovation can be described by the Gaussian assumption that is used in the EKF. If the threshold is exceeded, the null hypothesis that the innovation is zero-mean Gaussian with covariance S^i_k is rejected, and the measurement is discarded and not used for estimation.

In this work a threshold of λ = 5.99 is used. This value is chosen from a table of the χ² distribution and corresponds to a small probability that the null hypothesis will be rejected and a correct association thrown away.

The acceptance region for searched features is rather large in this work, since it is coupled with this test to ensure that no misassociations are made. An area with a radius of 100 pixels has been used.
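The χ²-gating test of (3.51)-(3.52) can be sketched as follows, with λ = 5.99 as in the text; the function name and interface are illustrative.

    import numpy as np

    def gate(innovation, H_i, P_pred, R_i, lam=5.99):
        """Accept an association if the normalised innovation passes the chi-square test."""
        S_i = H_i @ P_pred @ H_i.T + R_i                     # innovation covariance (3.51)
        d2 = innovation @ np.linalg.solve(S_i, innovation)   # (eps^T S^-1 eps) in (3.52)
        return d2 <= lam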


Chapter 4

Models

This chapter presents the models used in this work. First the different coordinate systems are presented, in which the different sensors and equipment deliver their data. Then the dynamic model is described, which contains the system states that we wish to estimate and how these states relate to each other. The chapter then continues with the measurement equations, which describe how the measurements from each sensor depend on the system states. It ends with a description of the parametrisation used in this work to describe the landmarks used in the SLAM algorithm.

4.1 Coordinate Systems

The coordinate systems used in this work are as follows:

• World frame, (W): The frame in which the position and orientation of the robot are calculated. It is fixed, which means that a position measured in this frame is constant over time.

• Body frame, (B): The coordinate system of the IMU.

• Camera frame, (C): The coordinate system located in the optical centre of the camera. It is assumed that the body frame (B) and camera frame (C) coincide in orientation but are separated by a translation.

• Image frame, (I): The image frame is located in the camera, perpendicular to the optical axis but located at a distance, the focal length, behind the optical centre.

• Robot frame, (R): The coordinate system in which the ABB industrial robot delivers its pose.


4.2 Dynamic Model

The dynamic model is used to describe how the states of the robot and the map evolve over time. The model used in this work to describe the motion of the robot is a constant velocity (CV) model with accelerations as input and quaternions describing its orientation. The state vector x_k = (p_k, v_k, q_k)^T consists of the position of the robot p_k ∈ R^3 described in a world fixed coordinate system, the velocity of the robot v_k ∈ R^3 in the world fixed coordinate system and q_k = (q_0, q_1, q_2, q_3)^T describing the robot's orientation. m_{j,k} denotes the states for the map and j is the index of each map entry.

x_{k+1} = \underbrace{\begin{pmatrix} I_3 & T I_3 & 0_{3×4} \\ 0_{3×3} & I_3 & 0_{3×4} \\ 0_{4×3} & 0_{4×3} & I_4 + \frac{T}{2} S(ω_k) \end{pmatrix} x_k + \begin{pmatrix} \frac{T^2}{2} I_3 \\ T I_3 \\ 0_{4×3} \end{pmatrix} \big( R^{WB}(q_k) u_{a,k} + g \big)}_{f(x_k, u_k)} + \underbrace{\begin{pmatrix} \frac{T^2}{2} I_3 & 0_{3×3} \\ T I_3 & 0_{3×3} \\ 0_{4×3} & \frac{T}{2} \tilde{S}(q_k) \end{pmatrix}}_{G_k} \begin{pmatrix} w_{a,k} \\ w_{ω,k} \end{pmatrix}

m_{j,k+1} = m_{j,k},  j = 1, ..., M_k   (4.1)

where

w_{a,k} ∼ N(0, R_a)   (4.2)
w_{ω,k} ∼ N(0, R_ω)   (4.3)

S(ω_k) = \begin{pmatrix} 0 & −ω_{x,k} & −ω_{y,k} & −ω_{z,k} \\ ω_{x,k} & 0 & ω_{z,k} & −ω_{y,k} \\ ω_{y,k} & −ω_{z,k} & 0 & ω_{x,k} \\ ω_{z,k} & ω_{y,k} & −ω_{x,k} & 0 \end{pmatrix}   (4.4)

\tilde{S}(q_k) = \begin{pmatrix} −q_{1,k} & −q_{2,k} & −q_{3,k} \\ q_{0,k} & −q_{3,k} & q_{2,k} \\ q_{3,k} & q_{0,k} & −q_{1,k} \\ −q_{2,k} & q_{1,k} & q_{0,k} \end{pmatrix}   (4.5)

In (4.1), R^{WB}(q_k) is the rotation matrix describing the rotation from the body fixed coordinate system (B), centred in the IMU, to the world fixed coordinate system (W).

The input u_{a,k} = (a_x, a_y, a_z)^T is the measured IMU acceleration in the body frame. The vector g = (0, 0, ĝ)^T is the gravitation in the world coordinate system that has to be subtracted in order to prevent interference from the Earth's gravitational field. The value of ĝ is estimated before every data collection to compensate for the measured g-vector. This is further explained in Chapter 5.

The process noise is described by w_{a,k} and w_{ω,k} in (4.2) and (4.3). The matrix S(ω_k) in (4.4) describes the dynamics of the quaternions and the matrix \tilde{S}(q_k) in (4.5) is the discretized noise matrix for the quaternions.

Since the input u_k depends on the orientation of the robot, which is described by the quaternions q_k, the model has to be linearised.
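A sketch of one time step of the dynamic model (4.1), noise terms omitted, reusing the quaternion helpers sketched in Section 3.2. Two assumptions are made here: R(q) from (3.31) is taken to map world coordinates to body coordinates, so its transpose is used as R^{WB}(q), and the gravity vector is a placeholder value, since ĝ is estimated before each data collection in the thesis.

    import numpy as np

    def motion_model_step(p, v, q, acc_body, omega, T, g=np.array([0.0, 0.0, -9.81])):
        """Constant velocity model with acceleration input and quaternion orientation."""
        a_world = quat_to_rotation(q).T @ acc_body + g   # R^{WB}(q) u_a + g (sign of g assumed)
        p_next = p + T * v + 0.5 * T**2 * a_world
        v_next = v + T * a_world
        q_next = propagate_quaternion(q, omega, T)       # (I4 + T/2 S(omega)) q_k, renormalised
        return p_next, v_next, q_next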

4.3 Measurement Equations

Since three sensors are used in this work, three different measurement equations might be expected, but since the IMU readings are used directly as input to the dynamic model, only the GPS and the camera have measurement equations.

4.3.1 GPS Measurement Equation

Since the GPS measures the position of the robot directly in the world fixed coordinate system, the measurement is linear and no linearisation of the equation is needed,

y^{gps}_k = p_k + e_{1,k} = \begin{pmatrix} p^x_k \\ p^y_k \\ p^z_k \end{pmatrix} + e_{1,k},   (4.6)

where

e_{1,k} ∼ N(0, R_{gps}).   (4.7)

The rows and columns of the measurement covariance R_{gps} are assumed independent, resulting in a diagonal matrix; however, the variance σ²_{p^z} for p^z_k is twice as large as the variances for p^x_k and p^y_k. This is to model a real GPS measurement, since the altitude is much harder for the GPS to estimate than the latitude and longitude. Since the GPS is simulated from the position data received from the robot by adding noise to the position, only empirical values for the noise are available. In this work it is assumed that noise with a standard deviation of a few centimetres is realistic.

4.3.2 Camera Measurement Equation

In order to use images from a camera as measurements, landmarks are extracted and tracked over time to get information about the movement of the robot relative to these landmarks. Landmark positions are parametrised according to the unified inverse depth parametrisation, see Section 4.4, resulting in

m^c_{j,k} = \begin{pmatrix} p^x_{j,k} \\ p^y_{j,k} \\ p^z_{j,k} \end{pmatrix} = R^{CB} R^{BW}(q_k) \left( \begin{pmatrix} p^x_j \\ p^y_j \\ p^z_j \end{pmatrix} + \frac{1}{ρ_j} d(θ, φ) − p_k − R^{WB} r_c \right).   (4.8)
