
Institutionen för systemteknik
Department of Electrical Engineering

Master's Thesis (Examensarbete)

An Optimization Based Approach to Visual Odometry Using Infrared Images

Master's thesis carried out in Automatic Control at the Institute of Technology, Linköping University

by
Emil Nilsson

LiTH-ISY-EX--10/4386--SE

Linköping 2010

Department of Electrical Engineering, Linköpings universitet

An Optimization Based Approach to Visual Odometry Using Infrared Images

Master's thesis carried out in Automatic Control at the Institute of Technology, Linköping University

by
Emil Nilsson

LiTH-ISY-EX--10/4386--SE

Supervisors: Christian Lundquist, ISY, Linköpings universitet
             Jacob Roll, Autoliv Electronics
             David Forslund, Autoliv Electronics
Examiner: Thomas Schön, ISY, Linköpings universitet

Avdelning, Institution (Division, Department): Division of Automatic Control, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden

Datum (Date): 2010-06-15

Språk (Language): English

Rapporttyp (Report category): Examensarbete (Master's thesis)

URL för elektronisk version: http://www.control.isy.liu.se, http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-57981

ISRN: LiTH-ISY-EX--10/4386--SE

Titel (Title): En optimeringsbaserad metod för visuell odometri med infraröda bilder (An Optimization Based Approach to Visual Odometry Using Infrared Images)

Författare (Author): Emil Nilsson

Nyckelord (Keywords): visual odometry, smoothing and mapping, SAM, SLAM, automobile motion model, FIR, monocular

Abstract

The goal of this work has been to improve the accuracy of a pre-existing algorithm for vehicle pose estimation, which uses intrinsic measurements of vehicle motion and measurements derived from far infrared images.

Estimating the pose of a vehicle, based on images from an on-board camera and intrinsic measurements of vehicle motion, is a problem of simultaneous localization and mapping (SLAM), and it can be solved using the extended Kalman filter (EKF). The EKF is a causal filter, so if the pose estimation problem is to be solved off-line, acausal methods are expected to increase estimation accuracy significantly. In this work the EKF has been compared with an acausal method for solving the SLAM problem called smoothing and mapping (SAM), an optimization based method that minimizes process and measurement noise.

Analyses of how improvements in the vehicle motion model, using a number of different model extensions, affect the accuracy of pose estimates have also been performed.

Sammanfattning

Målet med detta examensarbete var att förbättra precisionen hos en redan existerande algoritms skattningar av ett fordons pose (position och orientering), som använder interna mätningar av fordonets rörelse samt mätningar erhållna från infraröda bilder.

Att skatta ett fordons pose, utifrån bilder från en kamera ombord på farkosten samt interna mätningar av fordonsrörelse, är ett problem av typen samtidig lokalisering och kartering (SLAM), och det kan lösas med ett utökat Kalmanfilter (EKF). EKF är ett kausalt filter, så om skattning av pose ska utföras i efterhand kan icke-kausala metoder istället användas och sådana metoder förväntas ge avsevärt förbättrad precision i skattningarna. I detta examensarbete har EKF jämförts med en icke-kausal metod att lösa SLAM-problemet som kallas glättning och kartering (SAM), en optimeringsbaserad metod som minimerar process- och mätbrus.

Även analyser av hur förbättringar av fordonets rörelsemodell, med ett antal olika modelltillägg, påverkar precisionen i skattningarna av pose har genomförts.


Acknowledgments

First of all I would like to thank David Forslund and Jacob Roll at Autoliv Electronics for their help, in all kinds of aspects, during this master's thesis. Christian Lundquist and my examiner Thomas Schön have also been of great help, constantly sharing ideas and answering questions throughout my work. My thanks also go to Zoran Sjanic and Martin Skoglund for giving me an introduction to SAM.

Furthermore I want to take the opportunity to thank Thomas Karlsson and Göran Forsling at the Mathematics department for, during all my years at the university, always allowing me to ask them questions and for giving me helpful solutions and explanations to mathematical problems and queries.

Last but not least, I would like to thank my family and Anna for all the encouragement and support you have given me throughout the years.

Contents

1 Introduction
1.1 Background
1.2 Problem Formulation
1.3 Autoliv
1.4 Camera System Overview
1.5 Related Work

2 System Modeling
2.1 Coordinate Frames
2.2 Rotation Matrices
2.3 Vehicle Process Model
2.3.1 Basic Model
2.3.2 Model Extensions
2.4 Landmark Parametrization
2.5 Measurement Model

3 Visual Odometry and SLAM
3.1 Extended Kalman Filter
3.2 Feature Detection
3.3 Feature Association
3.4 Smoothing and Mapping
3.5 Feature Association Improvement

4 Results
4.1 Performance Measures
4.1.1 Landmark Measurement Residuals
4.1.2 Trajectory Plot in Camera Image
4.2 SAM and Model Extensions
4.3 Feature Association Improvement

5 Concluding Remarks
5.1 Conclusions
5.1.1 SAM
5.1.2 Vehicle Process Model Extensions
5.1.3 Feature Association Improvement
5.2 Future Work

Bibliography

A Nomenclature
A.1 Mathematical Notations
A.2 Abbreviations
A.3 Landmark States
A.4 Vehicle States
A.5 Variables, Parameters and Functions
A.6 Indices
A.7 Top Notations

B Proof of Matrix Singularity

C Derivation of Landmark Measurement Residual Limit

Chapter 1

Introduction

This chapter describes the problem to be solved in this work, and the context of the problem.

1.1 Background

Autoliv Electronics has, together with the Division of Automatic Control at Linköping University, developed an off-line software tool for estimating the sequence of poses of a vehicle (with pose meaning position and orientation), using images taken by a far infrared (FIR) camera and measurements of vehicle acceleration, speed and yaw rate. This tool uses sensor fusion of the vehicle measurements and FIR images in order to perform simultaneous localization and mapping (SLAM), so it is not only the vehicle poses that are estimated, but also the positions of landmarks seen by the camera. Figure 1.1 illustrates the SLAM problem; it shows five consecutive vehicle poses, and measurements of landmarks from the different vehicle positions. Note that although the camera only measures the direction to landmarks, it is possible to determine a landmark's position as the vehicle moves and more measurements of the landmark are gathered.

1.2 Problem Formulation

The problem to be solved in this work is to find out how the off-line estimation of vehicle poses can be improved in terms of accuracy.

Before this work, pose estimation was performed using an extended Kalman filter (EKF). Since the EKF is a causal method, and the pose sequence estimation is performed off-line, using an acausal method was expected to improve the estimation significantly, so the first course of action was to implement such a method. The problem of recovering the best possible estimate of the poses and landmark positions is called smoothing and mapping (SAM, see for example [1]), and the method proposed was linear least squares SAM, which means iteratively linearizing the vehicle and measurement models and finding a pose sequence estimate by solving a linear least squares problem. Other proposed areas with potential for increasing the estimation accuracy were improvements to the vehicle model, to the feature detection and association algorithms (how to extract measurements from the FIR images), and to the robustness against moving objects.

Figure 1.1: Simultaneous localization (of the vehicle, illustrated by rectangles) and mapping (of the landmarks, illustrated by circles) is performed by making some sort of position measurements of the landmarks (e.g. bearing measurements) from the different vehicle positions.

Since the ground truth for the vehicle poses is unknown, the estimation accuracy must be evaluated by indirect measures, so an important part of this work is to find good measures of estimation accuracy.

1.3 Autoliv

Autoliv is a worldwide leading developer and manufacturer of automotive safety systems such as seatbelts, airbags and safety electronics, with most of the major automobile manufacturers as its customers.

The main activity of Autoliv's subsidiary company Autoliv Electronics, for whom this work is performed, is to develop and manufacture electronic control units (ECU) for controlling airbag deployment. Not long ago, Autoliv Electronics started development of the Night Vision pedestrian detection system [5], a system that, in spite of darkness, is able to warn drivers when pedestrians are in the vehicle's path or moving towards this path. Although the system currently does not automatically detect and warn of animals, they can be seen on the Night Vision display, giving the driver a chance to detect them by using the display in a way similar to the use of rear-view mirrors. The Night Vision system, originally without the pedestrian detection, has been available in production cars since 2005. Figure 1.2 shows the Night Vision system in action, along with visual images for comparison.

1.4 Camera System Overview

Figure 1.2: These two images show comparisons between visual images and Night Vision images: (a) city, (b) countryside.

The Night Vision camera registers radiation in the far infrared (FIR) region at 30 Hz, with a resolution of 320×240 pixels and a spectral range of 8–14 µm. An advantage of using a FIR sensor, compared to near infrared (NIR) sensors, is that body heat radiation from humans and animals lies within the FIR region ([13] suggests that it is easier to detect distant pedestrians using FIR than using NIR). On the other hand, non-living objects and structures are not registered as clearly by a FIR camera, so for the sake of estimating vehicle poses the FIR camera might not be the best choice. The camera system is passive, meaning that it does not actively illuminate the road ahead with FIR radiation. Figure 1.3 shows the camera mounted to the car.

Figure 1.3: The images in this figure show the placement of the Night Vision camera: (a) the camera mounting, (b) a close-up view of the camera mounting.


1.5 Related Work

The EKF-based estimation of the vehicle poses that Autoliv used before this work is described in [9]. Descriptions of a few versions of SAM can be found in [1] and [11], where [11] describes the application of SAM to a problem similar to the one in this work. The work in [1] is on the other hand more focused on fast execution of SAM, using factorizations of the information matrix, and [4] describes how to further improve the execution time by using incremental updates of the factorized information matrix.

In [8] it is shown that the uncertainty in the distance (depth) to landmarks in images is more adequately represented by Gaussian noise when the distance parametrization used is the inverse depth rather than the regular depth.

The model improvements for vehicle pitch dynamics evaluated during this work, namely a constant offset, a second order model and influence from acceleration, are described in [2].

Bundle adjustment (a computer vision term corresponding to what is here referred to as SAM) is compared to filtering in [12]. The authors evaluate bundle adjustment and filtering in terms of the computational cost of decreasing the robot pose estimate covariance, and find that filtering based methods might be the better choice only when the available processing resources are small; they conclude that bundle adjustment is superior in all other cases. Bundle adjustment is still rather costly, and the authors of [10] state that this is because of the choice of a single privileged coordinate frame; a choice which they avoid by taking a relative approach to bundle adjustment. This approach uses some of the poses as references, and each other pose or landmark is related to one of these reference poses.

In the solution developed during this work, feature detection is performed using the Harris corner detector described in [3], and image patches are used as feature descriptors. For association of features in new images, normalized cross-correlation (NCC), see for example [7], between the feature patches and the new images is used. A problem with using image patches as feature descriptors is that they are not invariant to changes in scale and rotation, and [9] suggests using the so-called scale-invariant feature transform (SIFT) described in [6] instead of the Harris detector and NCC of image patches. The SIFT algorithm detects and describes features in images, and among its advantages is that it is invariant to the scale and orientation of the features.


Chapter 2

System Modeling

The system of a vehicle moving in unknown surroundings, with measurements of landmarks which are stationary in these surroundings, and measurements of the vehicle motion, is modeled by a state space model. To begin with we have the state of the vehicle (describing the current status of the position and motion of the vehicle) at time $t$, denoted $x^v_t$. Based on the vehicle state $x^v_{t-1}$ and some input $u_t$ (in our case the acceleration of the vehicle), the next vehicle state $x^v_t$ is given by

$$x^v_t = f(x^v_{t-1}, u_t) + w_t, \tag{2.1a}$$

where $f$ is the motion model function and $w_t$ is Gaussian process noise which compensates for model errors. The position at time $t$ of the $j$th landmark is parametrized by its state $x^l_{j,t}$, and the landmark motion model becomes

$$x^l_{j,t} = x^l_{j,t-1}, \tag{2.1b}$$

since we assume that all landmarks are stationary. Then we have, at time $t$, the vehicle measurements $y^v_t$ according to

$$y^v_t = h^v(x^v_t) + e^v_t, \tag{2.1c}$$

where $h^v$ is the vehicle measurement function and $e^v_t$ is Gaussian measurement noise. Finally, the landmark measurement of the $j$th landmark at time $t$ is denoted $y^l_{j,t}$ and given by

$$y^l_{j,t} = h^l(x^v_t, x^l_{j,t}) + e^l_{j,t}, \tag{2.1d}$$

where $h^l$ is the landmark measurement function and $e^l_{j,t}$ is Gaussian measurement noise.

2.1 Coordinate Frames

There are three relevant coordinate frames for the combined vehicle and camera system:

• World (w): This is considered an inertial frame and it is fixed to the surroundings of the vehicle.

• Vehicle body (b): This frame is fixed to the vehicle, with its origin located in the middle of the rear axle. Coordinate frame b coincides with w at the start of a scenario.

• Camera (c): This frame is fixed relative to b, and it is positioned in the optical center of the camera.

For all three coordinate frames the x-axis points forward, the y-axis points to the right and the z-axis points downwards.

The yaw, pitch and roll angles describe rotations around the z-, y- and x-axes respectively, and unless stated otherwise, the positive direction of an angle is given by the right-hand grip rule applied to the coordinate frame axis around which the rotation occurs.

2.2 Rotation Matrices

This section describes how the coordinates of a fixed point are transformed between three dimensional coordinate systems that have different orientation but the same origin.

The three basic rotation matrices for three dimensional space are given by

$$R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}, \tag{2.2a}$$

$$R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}, \tag{2.2b}$$

$$R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{2.2c}$$

For any rotation matrix we have that $R(-\theta) = R(\theta)^{-1}$, and we also have $R^{-1} = R^T$ by definition.

If the angles yaw ($\alpha$), pitch ($\beta$) and roll ($\gamma$) are used to describe the orientation of coordinate frame $\kappa$ (e.g. a vehicle) as seen from coordinate frame $\lambda$ (e.g. the world), the rotation matrix $R^{\kappa\lambda}$ for converting $\lambda$-coordinates to $\kappa$-coordinates is given by

$$R^{\kappa\lambda} = R_x(-\gamma)R_y(-\beta)R_z(-\alpha) = R_x(\gamma)^T R_y(\beta)^T R_z(\alpha)^T. \tag{2.3}$$

Since rotating from $\kappa$ to $\lambda$ is the inverse of rotating from $\lambda$ to $\kappa$ ($R^{\lambda\kappa} = (R^{\kappa\lambda})^{-1}$), we get

$$R^{\lambda\kappa} = R_z(\alpha)R_y(\beta)R_x(\gamma). \tag{2.4}$$

Equation (2.5) concludes this theory section by defining a notation for the rotation matrix as a function of the three rotation angles:

$$R(\alpha, \beta, \gamma) = R^{\lambda\kappa}. \tag{2.5}$$
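As a concrete illustration of (2.2)–(2.5), the composite rotation can be written as a short function. The following is a minimal sketch in Python/NumPy; the function name and argument order are our own conventions, not those of the thesis implementation.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """R(alpha, beta, gamma) = Rz(alpha) @ Ry(beta) @ Rx(gamma), as in (2.4)-(2.5).

    Converts kappa-frame coordinates to lambda-frame coordinates, with yaw
    alpha, pitch beta and roll gamma given in radians.
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cg, -sg], [0.0, sg, cg]])
    return Rz @ Ry @ Rx

# The inverse rotation, lambda to kappa coordinates, is the transpose (2.3):
# R_kappa_lambda = rotation_matrix(alpha, beta, gamma).T
```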

2.3 Vehicle Process Model

This section describes the vehicle process models, in other words the models for how the vehicle states change over time. The first subsection covers the basic model (also described in [9]) that was used before this work, and the remaining subsections cover the model extensions that have been tested during this work.

2.3.1 Basic Model

Before this work, the vehicle process model used was the one described in this section.

With the states described in Table 2.1, the vehicle state vector at time $t$ is given by

$$x^v_t = \begin{pmatrix} p_t \\ v_{x,t} \\ \psi_t \\ \delta_t \\ \alpha_t \\ \varphi_t \end{pmatrix}, \tag{2.6a}$$

$$p_t = \begin{pmatrix} p_{x,t} \\ p_{y,t} \\ p_{z,t} \end{pmatrix}. \tag{2.6b}$$

The input signal $u_t$ is given by

$$u_t = \dot v_{x,t}, \tag{2.7}$$

where the vehicle acceleration $\dot v_{x,t}$ is in practice measured. By treating $\dot v_{x,t}$ as an input signal instead of as a measurement we avoid having to incorporate it as a state in the vehicle state vector.

Table 2.1: This table contains short descriptions of the vehicle states.

$p$: Vehicle position in world coordinates.
$v_x$: Velocity of the vehicle in its forward direction.
$\psi$: Yaw angle (z-axis rotation), relative to the world coordinate frame.
$\delta$: Front wheel angle, relative to the vehicle's forward direction.
$\alpha$: Road pitch angle, relative to the world coordinate frame xy-plane.
$\varphi$: Pitch angle of the vehicle, relative to the road.

With $T$ as the sampling time, $L$ as the wheel base of the vehicle and $C$ as a pitch damping parameter, we have

$$f(x^v_t, u_{t+1}) = \begin{pmatrix} p_{x,t} + T v_{x,t}\cos\psi_t\cos\alpha_t \\ p_{y,t} + T v_{x,t}\sin\psi_t\cos\alpha_t \\ p_{z,t} - T v_{x,t}\sin\alpha_t \\ v_{x,t} + T u_{t+1} \\ \psi_t + \frac{T v_{x,t}}{L}\tan\delta_t \\ \delta_t \\ \alpha_t \\ C\varphi_t \end{pmatrix}, \tag{2.8}$$

which describes the vehicle process model.

The vehicle process noise $w_t$ is independent and Gaussian, according to

$$w_t = B(x^v_{t-1})\,\omega_t \sim \mathcal{N}\!\left(0, Q(x^v_{t-1})\right), \tag{2.9a}$$

$$Q(x^v_t) = B(x^v_t)\, Q^{\omega} B(x^v_t)^T, \tag{2.9b}$$

$$B(x^v_t) = \begin{pmatrix} T\cos\psi_t\cos\alpha_t & 0 \\ T\sin\psi_t\cos\alpha_t & 0 \\ -T\sin\alpha_t & 0 \\ 1 & 0 \\ 0 & I_{4\times 4} \end{pmatrix}, \tag{2.9c}$$

$$\omega_t = \begin{pmatrix} \omega^{v_x}_t \\ \omega^{\psi}_t \\ \omega^{\delta}_t \\ \omega^{\alpha}_t \\ \omega^{\varphi}_t \end{pmatrix} \sim \mathcal{N}(0, Q^{\omega}), \tag{2.9d}$$

$$Q^{\omega} = \begin{pmatrix} q^{v_x} & & & & \\ & q^{\psi} & & & \\ & & q^{\delta} & & \\ & & & q^{\alpha} & \\ & & & & q^{\varphi} \end{pmatrix}, \tag{2.9e}$$

where $Q^{\omega}$ is diagonal with the process noise variances of the individual noise components on its diagonal.
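To make the state update concrete, the following sketch implements one step of the basic process model (2.8), assuming the state ordering of (2.6). It is a minimal illustration; the numerical values of T, L and C are placeholders, not the parameters used in this work.

```python
import numpy as np

def f_basic(xv, u, T=1.0 / 30, L=2.7, C=0.9):
    """One step of the basic vehicle process model (2.8).

    xv = [px, py, pz, vx, psi, delta, alpha, phi] follows (2.6); u is the
    measured acceleration treated as an input signal (2.7). T (sampling
    time), L (wheel base) and C (pitch damping) are placeholder values.
    """
    px, py, pz, vx, psi, delta, alpha, phi = xv
    return np.array([
        px + T * vx * np.cos(psi) * np.cos(alpha),   # position update
        py + T * vx * np.sin(psi) * np.cos(alpha),
        pz - T * vx * np.sin(alpha),
        vx + T * u,                                   # velocity from input
        psi + T * vx / L * np.tan(delta),             # yaw from wheel angle
        delta,                                        # wheel angle kept
        alpha,                                        # road pitch kept
        C * phi,                                      # damped car pitch
    ])
```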


2.3.2 Model Extensions

This section describes the vehicle process model extensions tested during this work. Note that these model extensions can be used together in any combination.

We need to introduce some new notation: in order to describe the vehicle process model for one of the vehicle states, we use superscripts. For example, for the yaw angle ($\psi$) process model we would use the notation $f^{\psi}$.

Constant Offset in Car Pitch Angle

Since the stationary pitch angle of the camera, relative to the road, might be non-zero, due to the current load in the vehicle or a misaligned camera mounting, a state for the camera pitch offset $\varphi^0$ is added to the state vector. This offset state is interpreted as the stationary car pitch angle, around which the car pitch angle oscillates. The process model for the car pitch angle $\varphi$ and the car pitch angle offset $\varphi^0$ becomes

$$f^{\varphi}(x^v_t, u_{t+1}) = C(\varphi_t - \varphi^0_t) + \varphi^0_t, \tag{2.10a}$$

$$f^{\varphi^0}(x^v_t, u_{t+1}) = \varphi^0_t, \tag{2.10b}$$

where $C$ is the previously mentioned pitch damping parameter.

The camera pitch offset models a constant offset angle, so the process noise variance for $\varphi^0_t$ is zero. The process noise variance for $\varphi$ is independent of whether $\varphi^0$ is included in the vehicle state and process model or not.

Acceleration Offset in Car Pitch Angle

Acceleration (including deceleration) of the vehicle has significant effects on the car pitch angle. The model of this effect is that the stationary car pitch angle, when the acceleration $u$ is constant, becomes $Ku$, where $K$ is a vehicle geometry dependent parameter. This leads to

$$f^{\varphi}(x^v_t, u_{t+1}) = C(\varphi_t - Ku_{t+1}) + Ku_{t+1}. \tag{2.11}$$

This model is very similar to the constant offset model for car pitch; the differences are that for acceleration no new states are required, and that the offset due to acceleration is not necessarily constant.

Note that the choice of a linear relation between acceleration and stationary car pitch angle is mainly motivated by the fact that the springs, which are part of the vehicle suspension, have a linear relation between displacement and force (this is known as Hooke's law).

The process noise variance for $\varphi$ should not be changed when adding the car pitch acceleration offset to the process model.


Second Order Car Pitch Model

A natural extension of the first order model in (2.8) for the car pitch is to use a second order model, modeling the car pitch as a damped harmonic oscillator instead of as the oscillation-free damping which the first order model describes.

With $\tau$ as the characteristic time (sometimes called relaxation time) and $f_n$ as the natural frequency of the car pitch system, the differential equation for car pitch motion is

$$\ddot\varphi + \frac{2}{\tau}\dot\varphi + (2\pi f_n)^2\varphi = \frac{\text{torque}}{\text{moment of inertia}}. \tag{2.12}$$

Since the torque is unknown it will be represented by process noise and subsequently set to zero in

$$\ddot\varphi + \frac{2}{\tau}\dot\varphi + (2\pi f_n)^2\varphi = 0, \tag{2.13}$$

which is the time-continuous equation (without process noise) that the car pitch process model will be based on.

Discretization and approximation of the derivatives are performed according to

$$\varphi_t = \varphi(tT), \tag{2.14a}$$

$$\dot\varphi_t = \dot\varphi(tT) \approx \frac{\varphi_{t+1} - \varphi_t}{T}, \tag{2.14b}$$

$$\ddot\varphi_t = \ddot\varphi(tT) \approx \frac{\dot\varphi_{t+1} - \dot\varphi_t}{T}. \tag{2.14c}$$

The order of the linear ordinary differential equation (2.13) is two. This means that a state space representation of the car pitch system requires two states; $\varphi$ is already a vehicle state, and $\dot\varphi$ is added to the vehicle state vector.

In the absence of process noise, we can identify $\varphi_{t+1}$ as $f^{\varphi}(x^v_t, u_{t+1})$ and $\dot\varphi_{t+1}$ as $f^{\dot\varphi}(x^v_t, u_{t+1})$, so (2.13) and (2.14) give

$$f^{\varphi}(x^v_t, u_{t+1}) = \varphi_t + T\dot\varphi_t, \tag{2.15a}$$

$$f^{\dot\varphi}(x^v_t, u_{t+1}) = \left(1 - \frac{2T}{\tau}\right)\dot\varphi_t - T(2\pi f_n)^2\varphi_t. \tag{2.15b}$$

For the second order model of the car pitch to have the same characteristic time $\tau$ (i.e. the same damping) as the first order model in (2.8), we seek $\tau = \tau(C)$. The first order model in (2.8) is a discretization of the continuous time model

$$\dot\varphi + \frac{1}{\tau}\varphi = 0. \tag{2.16}$$

Using (2.14) to discretize (2.16) results in

$$\varphi_{t+1} \approx \left(1 - \frac{T}{\tau}\right)\varphi_t. \tag{2.17}$$

A comparison of (2.8) and (2.17) finally gives

$$\tau \approx \frac{T}{1 - C}. \tag{2.18}$$

Using the second order model for car pitch, the process noise variance for $\varphi$ is lowered by an order of magnitude compared to the process noise variance when using the basic model, making $\varphi$ tightly connected to $\dot\varphi$. The process noise variance for $\dot\varphi$ is then set to approximately correspond to that of $\varphi$ when the basic model is used, resulting in a value one or two orders of magnitude higher than the basic $\varphi$ process noise variance.
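A minimal sketch of the discretized second order pitch update (2.15), with $\tau$ obtained from $C$ via (2.18); the numerical values of T, C and the natural frequency fn are placeholders, not the tuned values from this work.

```python
import numpy as np

def pitch_second_order(phi, phi_dot, T=1.0 / 30, C=0.9, fn=1.5):
    """Second order car pitch update (2.15): a damped harmonic oscillator.

    tau is chosen via (2.18) so the damping matches the first order model
    phi_{t+1} = C * phi_t. fn is the natural frequency in Hz (placeholder).
    """
    tau = T / (1.0 - C)                     # characteristic time, (2.18)
    phi_next = phi + T * phi_dot            # (2.15a)
    phi_dot_next = (1.0 - 2.0 * T / tau) * phi_dot \
        - T * (2.0 * np.pi * fn) ** 2 * phi  # (2.15b)
    return phi_next, phi_dot_next
```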

Roll Angle

In the basic model, where roll is not a vehicle state, the roll angle is kept constant at zero. For several reasons this might be an inadequate approximation: curves on country roads are banked, and the car may roll quite a bit when driving on uneven roads such as dirt roads. The process model for the combined roll angle $\gamma$ of the car and road is given by

$$f^{\gamma}(x^v_t, u_{t+1}) = \gamma_t. \tag{2.19}$$

Since the roll and pitch angles of automobiles have similar behaviour in terms of amplitude and natural frequency, the process noise variance for the roll angle is set to be approximately the same as the basic model's car pitch process noise variance.

2.4 Landmark Parametrization

In this section the landmark state is described. At time $t$ we have $M_t$ visible landmarks. (Visible means that the landmark has been measured; a landmark may very well be non-visible although it is present in the FIR image, but it cannot be visible if it is not in the image.) The landmark index, which serves as its identification, of visible landmark number $i \in \{1, 2, \ldots, M_t\}$ at time $t$ is denoted $j_t(i)$.

With the landmark states described in Table 2.2, the landmark state vector is given by

$$x^l_t = \begin{pmatrix} x^l_{j_t(1),t} \\ x^l_{j_t(2),t} \\ \vdots \\ x^l_{j_t(M_t),t} \end{pmatrix}, \tag{2.20a}$$

$$x^l_{j,t} = \begin{pmatrix} k^w_{j,t} \\ \theta^w_{j,t} \\ \phi^w_{j,t} \\ \rho_{j,t} \end{pmatrix}, \tag{2.20b}$$

$$k^w_{j,t} = \begin{pmatrix} k^w_{j,t,x} \\ k^w_{j,t,y} \\ k^w_{j,t,z} \end{pmatrix}. \tag{2.20c}$$

Table 2.2: This table contains short descriptions of the landmark states.

$k^w$: The position in world coordinates of the camera at the time when the landmark was first seen.
$\theta^w$: The azimuth angle of the landmark as seen from $k^w$, relative to world coordinate frame directions.
$\phi^w$: The elevation angle of the landmark as seen from $k^w$, relative to world coordinate frame directions, with positive angles towards the positive z-axis.
$\rho$: The inverse depth (the inverse of the distance) from $k^w$ to the landmark.

The reason for using inverse depth rather than the regular distance is that we want a more natural way to assign uncertainty to the estimates of distance. Since the uncertainty of state estimates will be represented by normal distributions, using inverse depth gives higher uncertainty for large distances and lower uncertainty for small distances, and this is precisely what we want.

The landmark state $x^l_{j,t}$ is a parametrization of the position $l^w_{j,t}$ of landmark $j$ at time $t$, and the relationship between position and state, with the landmark position given in world coordinates, is

$$l^w_{j,t} = k^w_{j,t} + \frac{1}{\rho_{j,t}} \underbrace{\begin{pmatrix} \cos\phi^w_{j,t}\cos\theta^w_{j,t} \\ \cos\phi^w_{j,t}\sin\theta^w_{j,t} \\ \sin\phi^w_{j,t} \end{pmatrix}}_{m^w_{j,t}}. \tag{2.21}$$
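The mapping (2.21) from the inverse-depth landmark state to a world position is straightforward to implement; a minimal sketch (function and argument names are our own):

```python
import numpy as np

def landmark_world_position(k_w, theta, phi, rho):
    """World position of a landmark from its inverse-depth state, (2.21).

    k_w: camera position (world frame) when the landmark was first seen,
    theta: azimuth angle, phi: elevation angle, rho: inverse depth (1/distance).
    """
    m_w = np.array([
        np.cos(phi) * np.cos(theta),   # unit direction vector m^w in (2.21)
        np.cos(phi) * np.sin(theta),
        np.sin(phi),
    ])
    return k_w + m_w / rho
```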


2.5 Measurement Model

The model for the vehicle measurements is

$$h^v(x^v_t) = \begin{pmatrix} v_{x,t} \\ \dot\psi_t \end{pmatrix} = \begin{pmatrix} v_{x,t} \\ \frac{v_{x,t}}{L}\tan\delta_t \end{pmatrix}, \tag{2.22}$$

where $L$ is the wheel base of the vehicle. The landmark measurement model is given by

$$h^l(x^v_t, x^l_{j,t}) = \mathcal{P}_n(p^c_{j,t}) = \frac{1}{p^c_{j,t,x}}\begin{pmatrix} p^c_{j,t,y} \\ p^c_{j,t,z} \end{pmatrix}, \tag{2.23a}$$

$$p^c_{j,t} = \begin{pmatrix} p^c_{j,t,x} \\ p^c_{j,t,y} \\ p^c_{j,t,z} \end{pmatrix} = p^c(x^v_t, x^l_{j,t}) = \frac{1}{\rho_{j,t}}\, R^{cb}\!\left( R^{bw}\!\left( \rho_{j,t}(k^w_{j,t} - p_t) + m^w_{j,t} \right) - \rho_{j,t}\, c^b \right), \tag{2.23b}$$

$$R^{cb} = R(\alpha_c, \beta_c, \gamma_c)^T, \tag{2.23c}$$

$$R^{bw} = \begin{cases} R(\psi_t, \alpha_t + \varphi_t, \gamma_t)^T & \text{if the model contains } \gamma, \\ R(\psi_t, \alpha_t + \varphi_t, 0)^T & \text{otherwise,} \end{cases} \tag{2.23d}$$

where $c^b$ is the position of the camera in the vehicle body coordinate frame, and $\mathcal{P}_n(p^c)$ is the so-called normalized pinhole projection of a point $p^c$ given in camera coordinates. Furthermore, $\mathcal{P}_n$ generates normalized camera coordinates, and $\alpha_c$, $\beta_c$ and $\gamma_c$ are the yaw, pitch and roll angles of the camera, relative to the vehicle body coordinate frame.

Both of the measurement noises $e^v_t$ and $e^l_{j,t}$ are independent and Gaussian, according to

$$e^v_t \sim \mathcal{N}(0, S^v), \tag{2.24a}$$

$$e^l_{j,t} \sim \mathcal{N}(0, S^l), \tag{2.24b}$$

$$S^v = \begin{pmatrix} s^{v_x} & 0 \\ 0 & s^{\dot\psi} \end{pmatrix}, \tag{2.24c}$$

$$S^l = \begin{pmatrix} s^c & 0 \\ 0 & s^c \end{pmatrix}, \tag{2.24d}$$

where all the $s$-variables are measurement noise variance parameters.

The FIR camera is of course digital, so it generates images consisting of pixels. To translate between pixel coordinates $(\tilde y \ \ \tilde z)^T$ and normalized camera coordinates $(y \ \ z)^T$, which is the kind of coordinates the landmark measurements $y^l_{j,t}$ are given in, we use

$$\begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} \frac{\tilde y - \tilde y_{ic}}{f_y} \\[4pt] \frac{\tilde z - \tilde z_{ic}}{f_z} \end{pmatrix}, \tag{2.25}$$

where $(\tilde y_{ic} \ \ \tilde z_{ic})^T$ denotes the image center (which is the intersection of the optical axis and the image plane), and $f_y$ and $f_z$ are the focal lengths (given in pixels) in the y- and z-directions respectively.
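As an illustration of the normalized pinhole projection (2.23a) and the pixel conversion (2.25), consider the following sketch. The image center and focal length values are placeholders standing in for the true camera calibration (the center is chosen to roughly match the 320×240 resolution).

```python
import numpy as np

def pinhole_normalized(p_c):
    """Normalized pinhole projection P_n of a point in camera coordinates, (2.23a)."""
    return np.array([p_c[1], p_c[2]]) / p_c[0]

def pixels_to_normalized(pix, y_ic=160.0, z_ic=120.0, fy=400.0, fz=400.0):
    """Pixel coordinates to normalized camera coordinates, (2.25).

    pix = (y_tilde, z_tilde). The image center (y_ic, z_ic) and focal
    lengths fy, fz are placeholder values; the real ones come from the
    camera calibration.
    """
    return np.array([(pix[0] - y_ic) / fy, (pix[1] - z_ic) / fz])
```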


Chapter 3

Visual Odometry and SLAM

This chapter describes how the estimation of vehicle and landmark states is performed, by first explaining how the extended Kalman filter (EKF) is applied to our problem, and then describing the smoothing and mapping (SAM) algorithm, which can be seen as a method to refine the state estimates from the EKF. Finally, an improvement of the feature association is described.

3.1 Extended Kalman Filter

With $x^v_t$ and $x^l_t$ given by (2.6a) and (2.20a) respectively, the true state vector at time $t$ is given by

$$x_t = \begin{pmatrix} x^v_t \\ x^l_t \end{pmatrix}, \tag{3.1}$$

and $\hat x_{t|s}$ is the estimate of $x_t$ using measurements up until time $s \in \{t-1, t\}$.

If the model and the initial values of $\hat x$ and $P$ are accurate, then we have $P_{t|s} = \operatorname{cov}(x_t - \hat x_{t|s})$. The model used is of course not perfectly accurate, but we can still interpret $P_{t|s}$ as a measure of how well $\hat x_{t|s}$ approximates $x_t$. However, because of the linearization done in the EKF, $P$ tends to be underestimated, saying that $\hat x$ is less uncertain than it actually is, and this must be kept in mind if we are looking at the actual values in $P$ (e.g. when plotting confidence bounds for landmark positions). $P_{t|s}$ can be decomposed according to

$$P_{t|s} = \begin{pmatrix} P^v_{t|s} & P^{vl}_{t|s} \\ P^{lv}_{t|s} & P^l_{t|s} \end{pmatrix}, \tag{3.2}$$

where $P^v_{t|s}$ is the covariance of the vehicle states, $P^{vl}_{t|s}$ and $P^{lv}_{t|s}$ are the covariances between the landmark states and the vehicle states, and $P^l_{t|s}$ is the covariance of the landmark states.

The vector $y^l_t$ contains all the landmark measurements for time $t$, and $h^l_t(x_t)$ is the corresponding measurement function; they are defined in (3.3). Remember from Chapter 2 that the landmarks which are measured at time $t$ have indices $j_t(i)$, $\forall i \in \{1, 2, \ldots, M_t\}$, where $M_t$ is the number of measured landmarks at time $t$. We have that

$$y^l_t = \begin{pmatrix} y^l_{j_t(1),t} \\ y^l_{j_t(2),t} \\ \vdots \\ y^l_{j_t(M_t),t} \end{pmatrix}, \tag{3.3a}$$

$$h^l_t(x_t) = \begin{pmatrix} h^l(x^v_t, x^l_{j_t(1),t}) \\ h^l(x^v_t, x^l_{j_t(2),t}) \\ \vdots \\ h^l(x^v_t, x^l_{j_t(M_t),t}) \end{pmatrix}. \tag{3.3b}$$

The EKF uses linearized measurement and process models given by

$$F_t = \left.\frac{\partial f(x^v_{t-1}, u_t)}{\partial x^v_{t-1}}\right|_{x^v_{t-1} = \hat x^v_{t-1|t-1}}, \tag{3.4a}$$

$$H^v_t = \left.\frac{\partial h^v(x^v_t)}{\partial x^v_t}\right|_{x^v_t = \hat x^v_{t|t-1}}, \tag{3.4b}$$

$$H^l_t = \left.\frac{\partial h^l_t(x_t)}{\partial x_t}\right|_{x_t = \hat x_{t|t-1}}. \tag{3.4c}$$

We then get the time update (i.e. the prediction) of the states according to

$$\hat x^v_{t|t-1} = f(\hat x^v_{t-1|t-1}, u_t), \tag{3.5a}$$

$$\hat x^l_{t|t-1} = \hat x^l_{t-1|t-1}, \tag{3.5b}$$

$$P_{t|t-1} = \begin{pmatrix} F_t & 0 \\ 0 & I \end{pmatrix} P_{t-1|t-1} \begin{pmatrix} F_t & 0 \\ 0 & I \end{pmatrix}^T + \begin{pmatrix} Q_t & 0 \\ 0 & 0 \end{pmatrix}, \tag{3.5c}$$

where $Q_t = Q(\hat x^v_{t-1|t-1})$ (see (2.9)).

The next step is the measurement update, which adjusts the state predictions from the time update using the measurements. $S^l_t$ is the noise covariance for the measurement $y^l_t$, so it is a block diagonal matrix with $M_t$ copies of $S^l$ on its diagonal. The measurement update is

$$\hat x_{t|t} = \hat x_{t|t-1} + K^v_t\left(y^v_t - h^v(\hat x^v_{t|t-1})\right) + K^l_t\left(y^l_t - h^l_t(\hat x_{t|t-1})\right), \tag{3.6a}$$

$$P_{t|t} = P_{t|t-1} - K^v_t \begin{pmatrix} H^v_t & 0 \end{pmatrix} P_{t|t-1} - K^l_t H^l_t P_{t|t-1}, \tag{3.6b}$$

$$K^v_t = P_{t|t-1}\begin{pmatrix} H^v_t & 0 \end{pmatrix}^T \left( \begin{pmatrix} H^v_t & 0 \end{pmatrix} P_{t|t-1} \begin{pmatrix} H^v_t & 0 \end{pmatrix}^T + S^v \right)^{-1}, \tag{3.6c}$$

$$K^l_t = P_{t|t-1}(H^l_t)^T \left( H^l_t P_{t|t-1} (H^l_t)^T + S^l_t \right)^{-1}. \tag{3.6d}$$
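A generic sketch of a single EKF measurement update step follows; the thesis applies corresponding corrections for both the vehicle and the landmark measurements in (3.6), whereas this sketch shows the structure for one measurement vector with Jacobian H and noise covariance S. It is an illustration of the form of (3.6), not the thesis implementation.

```python
import numpy as np

def ekf_measurement_update(x, P, y, h_of_x, H, S):
    """One EKF measurement update of the form (3.6).

    x, P: predicted state and covariance; y: measurement vector;
    h_of_x: predicted measurement h(x); H: measurement Jacobian;
    S: measurement noise covariance.
    """
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + S)  # Kalman gain, cf. (3.6c)-(3.6d)
    x_new = x + K @ (y - h_of_x)                  # state correction, cf. (3.6a)
    P_new = P - K @ H @ P                         # covariance correction, cf. (3.6b)
    P_new = 0.5 * (P_new + P_new.T)               # symmetrize, cf. Algorithm 1 step 12
    return x_new, P_new
```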

When a new feature is extracted from an image, a corresponding landmark is added to the landmark state vector $\hat x_{t|t}$, and the covariance matrix $P_{t|t}$ must be expanded to include the uncertainty of the new landmark's states. With $p$ as the vehicle position, $\rho_0$ as the initial inverse depth, $P_{\rho_0}$ as the initial variance for the inverse depth, $y^{nl} = (y^{nl}_y \ \ y^{nl}_z)^T$ as the measured position of the feature (in normalized camera coordinates), and superscript $nl$ meaning new landmark, the new state vector $\hat x^{new}_{t|t}$ and covariance matrix $P^{new}_{t|t}$ are given by

$$\hat x^{new}_{t|t} = \begin{pmatrix} \hat x_{t|t} \\ \hat x^{nl} \end{pmatrix}, \tag{3.7a}$$

$$\hat x^{nl} = \begin{pmatrix} \hat p_{t|t} + R^{wb} c^b \\ \arctan y^{nl}_y + \hat\psi_{t|t} \\ \arctan y^{nl}_z - \hat\varphi_{t|t} \\ \rho_0 \end{pmatrix}, \tag{3.7b}$$

$$R^{wb} = \begin{cases} R(\hat\psi_t, \hat\alpha_t + \hat\varphi_t, \hat\gamma_t) & \text{if the model contains } \gamma, \\ R(\hat\psi_t, \hat\alpha_t + \hat\varphi_t, 0) & \text{otherwise,} \end{cases} \tag{3.7c}$$

$$P^{new}_{t|t} = J \begin{pmatrix} P_{t|t} & 0 & 0 \\ 0 & S^l & 0 \\ 0 & 0 & P_{\rho_0} \end{pmatrix} J^T, \tag{3.7d}$$

$$J = \begin{pmatrix} I & 0 & 0 \\ \frac{\partial \hat x^{nl}}{\partial \hat x_{t|t}} & \frac{\partial \hat x^{nl}}{\partial y^{nl}} & \frac{\partial \hat x^{nl}}{\partial \rho_0} \end{pmatrix}, \tag{3.7e}$$

where the block structure separates the old covariance from the covariance of the new landmark's states, and where the block $\frac{\partial \hat x^{nl}}{\partial \hat x_{t|t}}$ is non-zero only in the columns corresponding to $\hat p_{t|t}$, $\hat\psi_{t|t}$ and $\hat\varphi_{t|t}$. Remember that $c^b$ is the position of the camera in the vehicle body coordinate frame, and note that the covariance for the new landmark's camera position state will be given by the covariance for the vehicle position in combination with $\frac{\partial \hat x^{nl}}{\partial \hat p_{t|t}}$.

Due to the linearization of the non-linear landmark measurement function $h^l$, it sometimes happens that the latest estimate of a landmark's inverse depth is negative. This can happen to landmarks that are distant but have measurements indicating that the current distance estimate is too low, so the distance is increased by lowering the inverse depth estimate. To avoid this problem, all inverse depth estimates in $\hat x^l_{t|t}$ are forced to be above a positive threshold $\rho_{min}$, thereby setting a maximum allowed distance for landmarks. If $\rho_{min}$ is small enough, making the maximum allowed distance large enough, the measurement errors (for landmarks further away than the maximum allowed distance) due to forcing the inverse depth estimates are negligible.

Algorithm 1 describes the EKF algorithm used. $P^v_{1|0}$ is the initial vehicle state covariance, $t_f$ is the interval for when to search for new features, and $M_{max}$ is the maximum number of visible landmarks.

Algorithm 1 Sensor fusion using EKF

1: Initialize all elements in $\hat x^v_{1|1}$ to zero, except for $\hat v_{x,1|1}$ and $\hat\delta_{1|1}$ which are given by the measurements $y^v_1$ using (2.22) with $y^v_1 = h^v(\hat x^v_{1|1})$. Let $P_{1|1} = P^v_{1|0}$.
2: Generate $y^l_1$ by extracting features from the first image.
3: Expand $\hat x_{1|1}$ and $P_{1|1}$ according to (3.7) using $y^l_1$.
4: for $t = 2$ to $N$ do
5:   Generate $\hat x_{t|t}$ from $\hat x_{t-1|t-1}$ and $P_{t|t}$ from $P_{t-1|t-1}$ using the vehicle measurements $y^v_t$, but no landmark measurements.
6:   Predict landmark measurements.
7:   Associate features, resulting in landmark measurements $y^l_t$. (Landmarks whose features cannot be associated with high precision are removed.)
8:   Update $\hat x_{t|t}$ and $P_{t|t}$ using the landmark measurements $y^l_t$. (Landmarks with highly unlikely measurements are removed.)
9:   for all $j$ do
10:     Make sure $\hat\rho_{j,t|t} \geq \rho_{min}$.
11:   end for
12:   Make sure $P_{t|t}$ is symmetric.
13:   if $t \equiv 0 \pmod{t_f}$ and $M_t < M_{max}$ then
14:     Extract new features (at most $M_{max} - M_t$ new features).
15:   end if
16: end for

3.2 Feature Detection

In order to find distinctive, and thereby trackable, features in the image, some sort of feature detection must be performed. In this work the so-called Harris detector (see [3]) is used; this detector finds areas where the image contains significant change in more than one direction (thus, for example, corners, but not straight lines, will be detected). The feature detection algorithm, used in Algorithm 1 and given by Algorithm 2, is run at fixed time intervals, and only if the number of visible landmarks $M_t$ is lower than the maximum allowed number $M_{max}$.

Algorithm 2 Feature detection

1: Smooth the image in order to suppress noise.
2: Compute the image gradient.
3: Compute the Harris measure from the image gradient.
4: Mask the image where the road is expected to be, near the image borders, and around predicted positions of existing features, by setting the Harris measure to zero in these regions.
5: Set $n_M = M_{max} - M_t$.
6: while $n_M > 0$ and the maximum of the Harris measure is above a threshold do
7:   Extract a fixed-size image patch for the feature around the position of the maximum of the Harris measure.
8:   Mask the image around the new feature, by setting the Harris measure to zero in this region.
9:   Decrease $n_M$ by one.
10: end while
11: for all extracted features do
12:   Expand the landmark state and covariance with the new landmark that the extracted feature represents.
13: end for
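Steps 1–3 of Algorithm 2 can be sketched as follows. The particular detector variant (the det − k·trace² Harris measure computed from a smoothed structure tensor) and all parameter values are common choices and assumptions on our part, not necessarily those used in this work.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_measure(image, sigma_d=1.0, sigma_i=2.0, k=0.04):
    """Harris corner measure of a grayscale image, cf. Algorithm 2 steps 1-3.

    sigma_d: smoothing scale before differentiation (noise suppression),
    sigma_i: integration scale for the structure tensor,
    k: Harris constant (placeholder value).
    """
    img = gaussian_filter(image.astype(float), sigma_d)  # step 1: suppress noise
    Ix = sobel(img, axis=1)                              # step 2: image gradient
    Iy = sobel(img, axis=0)
    # step 3: smoothed structure tensor and Harris measure det - k * trace^2
    Sxx = gaussian_filter(Ix * Ix, sigma_i)
    Syy = gaussian_filter(Iy * Iy, sigma_i)
    Sxy = gaussian_filter(Ix * Iy, sigma_i)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```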

3.3 Feature Association

In order to track a feature in a sequence of frames we must perform association, i.e. determine the position of the feature in question in all of the images in the sequence. In this work each feature is represented and characterized by a small image patch. Association of a feature image patch with a camera image, i.e. determining where in the camera image the feature image patch (and thus the feature) is located, is performed using normalized cross-correlation (NCC). The feature association in Algorithm 1 is performed according to Algorithm 3.

Algorithm 3 Feature association

1: for all features in $x^l_t$ do
2:   Predict the feature position in the image.
3:   Select a search region around the predicted position.
4:   if the search region is completely within the image then
5:     Compute the normalized cross-correlation (NCC) of the current image (limited to the search region) and the feature image patch representing the landmark.
6:     Find the maximum of the NCC.
7:     Compute the maximum of the NCC when a small area around the original maximum has been zeroed out.
8:     if the second maximum NCC-value is not close to the original maximum then
9:       Save the position of the original maximum as the new measurement of the feature.
10:     end if
11:   end if
12: end for
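A sketch of steps 2–9 of Algorithm 3: score a search region with NCC, then accept the best peak only if the second-best peak, taken outside a small zeroed-out area around the first, is clearly lower. The search radius, exclusion size, score threshold and acceptance ratio are placeholder values of our own choosing.

```python
import numpy as np

def ncc(patch, window):
    """Normalized cross-correlation of a feature patch with an equally sized window."""
    p = patch - patch.mean()
    w = window - window.mean()
    denom = np.sqrt((p ** 2).sum() * (w ** 2).sum())
    return (p * w).sum() / denom if denom > 0 else 0.0

def associate(patch, image, center, radius=10, exclude=2, ratio=0.9):
    """Search around the predicted position 'center' (row, col) for 'patch'."""
    h, w = patch.shape
    cy, cx = center
    scores = np.full((2 * radius + 1, 2 * radius + 1), -np.inf)
    for i, y in enumerate(range(cy - radius, cy + radius + 1)):
        for j, x in enumerate(range(cx - radius, cx + radius + 1)):
            win = image[y - h // 2 : y + h // 2 + 1, x - w // 2 : x + w // 2 + 1]
            if win.shape == patch.shape:      # skip positions falling off the image
                scores[i, j] = ncc(patch, win)
    i0, j0 = np.unravel_index(np.argmax(scores), scores.shape)
    best = scores[i0, j0]
    if not np.isfinite(best) or best <= 0:    # no usable correlation peak
        return None
    # zero out a small area around the maximum, then find the second maximum
    scores[max(0, i0 - exclude) : i0 + exclude + 1,
           max(0, j0 - exclude) : j0 + exclude + 1] = -np.inf
    second = scores.max()
    if second < ratio * best:                 # unambiguous association
        return (cy - radius + i0, cx - radius + j0)
    return None
```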

3.4 Smoothing and Mapping

The idea behind smoothing and mapping (SAM) is to simultaneously estimate the vehicle states and the landmark states, utilizing previous, present and future measurements, as opposed to the EKF which cannot utilize future measurements. This acausal estimation is performed by solving weighted least squares problems iteratively, until the desired accuracy of the converging solution is achieved.

Since the landmarks are stationary, each landmark is represented by a time-independent landmark state. The vehicle, however, is not stationary, so the vehicle states for all the time steps $t \in \{1, 2, \ldots, N\}$ are included in the state vector of the SAM algorithm. Mathematical descriptions of the states used in the SAM algorithm can be found in (3.8).



Not all landmarks from the EKF run are included in the SAM run; a threshold for the minimum number of times a landmark has been visible is used to sort out landmarks with few measurements. With $M$ landmarks included in the SAM run, the landmark index of landmark number $i \in \{1, 2, \ldots, M\}$ in the SAM run is denoted $j(i)$. The state vector $x$, containing both landmark states and vehicle states, is given by

$$x = \begin{pmatrix} x^v \\ x^l \end{pmatrix}, \tag{3.8a}$$

$$x^v = \begin{pmatrix} x^v_1 \\ x^v_2 \\ \vdots \\ x^v_N \end{pmatrix}, \tag{3.8b}$$

$$x^l = \begin{pmatrix} x^l_{j(1)} \\ x^l_{j(2)} \\ \vdots \\ x^l_{j(M)} \end{pmatrix}, \tag{3.8c}$$

$$x^l_j = \begin{pmatrix} \theta^w_j \\ \phi^w_j \\ \rho_j \end{pmatrix}. \tag{3.8d}$$

The variable $\bar x$ is the current estimate of $x$. During the first SAM iteration, $\bar x$ is initialized from the EKF estimates: the vehicle states are collected from all the time steps of the EKF run, while each landmark state $x^l_j$ is based on the last EKF estimate of the landmark state in question.

Instead of including the camera positions in the landmark states $x^l_j$, we use the vehicle state $x^v_{t_c(j)}$, from the time $t_c(j)$ when landmark $j$ was first seen, to calculate the camera position. This can be done since we assume that the camera is firmly mounted in the car. We can recompose a landmark state using the function $g$, which returns the complete landmark state (i.e. camera position, azimuth angle, elevation angle and inverse depth) according to

$$g(x^v_{t_c(j)}, x^l_j) = \begin{pmatrix} p_{t_c(j)} + R^{wb} c^b \\ x^l_j \end{pmatrix}, \tag{3.9a}$$

$$R^{wb} = \begin{cases} R(\psi_{t_c(j)}, \alpha_{t_c(j)} + \varphi_{t_c(j)}, \gamma_{t_c(j)}) & \text{if the model contains } \gamma, \\ R(\psi_{t_c(j)}, \alpha_{t_c(j)} + \varphi_{t_c(j)}, 0) & \text{otherwise.} \end{cases} \tag{3.9b}$$

Note that $x^l_{j,t}$, with a time index, denotes the landmark state that includes the camera position, whereas $x^l_j$, without a time index, denotes the landmark state that excludes the camera position.

Since the number of landmark measurements $y^l_{j,t}$ is not the same for every time step, $k$ is used to enumerate the complete series of landmark measurements. We introduce the notation $j_k$ for the index of the landmark associated with measurement $k$, and similarly $t_k$ for the time when measurement $k$ was performed. Using this notation, $y^l_{j_k,t_k}$ is the measurement we get from landmark measurement number $k$.

Vehicle measurements $y^v_t$ are performed once every time step, so no notation other than the time $t$ is required to enumerate these measurements.

Linearization at $x = \bar x$ of the process model and measurement model, and using the fact that the input signal $u_t$ is non-variable (it is actually a measurement although we treat it as an input signal), results in

$$\bar x^v_t + \delta x^v_t = f(\bar x^v_{t-1}, u_t) + F_t\,\delta x^v_{t-1} + w_t, \tag{3.10a}$$

$$y^v_t = h^v(\bar x^v_t) + H^v_t\,\delta x^v_t + e^v_t, \tag{3.10b}$$

$$y^l_{j_k,t_k} = h^l\!\left(\bar x^v_{t_k}, g(\bar x^v_{t_c(j_k)}, \bar x^l_{j_k})\right) + H^l_k\,\delta x^v_{t_k} + H^{l,c}_k\,\delta x^v_{t_c(j_k)} + J_k\,\delta x^l_{j_k} + e^l_{j_k,t_k}, \tag{3.10c}$$

where

$$F_t = \left.\frac{\partial f(x^v_{t-1}, u_t)}{\partial x^v_{t-1}}\right|_{x^v_{t-1} = \bar x^v_{t-1}}, \tag{3.11a}$$

$$H^v_t = \left.\frac{\partial h^v(x^v_t)}{\partial x^v_t}\right|_{x^v_t = \bar x^v_t}, \tag{3.11b}$$

$$H^l_k = \left.\frac{\partial h^l(x^v_t, g(\bar x^v_{t_c(j_k)}, \bar x^l_{j_k}))}{\partial x^v_t}\right|_{x^v_t = \bar x^v_{t_k}}, \tag{3.11c}$$

$$H^{l,c}_k = \left.\frac{\partial h^l(\bar x^v_{t_k}, g(x^v_{t_c}, \bar x^l_{j_k}))}{\partial x^v_{t_c}}\right|_{x^v_{t_c} = \bar x^v_{t_c(j_k)}}, \tag{3.11d}$$

$$J_k = \left.\frac{\partial h^l(\bar x^v_{t_k}, g(\bar x^v_{t_c(j_k)}, x^l_j))}{\partial x^l_j}\right|_{x^l_j = \bar x^l_{j_k}}, \tag{3.11e}$$

are Jacobians of the process model and measurement model. By using the residuals

$$a_t = \bar x^v_t - f(\bar x^v_{t-1}, u_t), \tag{3.12a}$$

$$c^v_t = y^v_t - h^v(\bar x^v_t), \tag{3.12b}$$

$$c^l_k = y^l_{j_k,t_k} - h^l\!\left(\bar x^v_{t_k}, g(\bar x^v_{t_c(j_k)}, \bar x^l_{j_k})\right), \tag{3.12c}$$

we can form a least squares optimization expression, minimizing the sum of squared weighted measurement and process noises, according to

$$\begin{aligned} \delta x^* &= \arg\min_{\delta x} \left( \sum_t \left( \|w_t\|^2_{Q_t^{-1}} + \|e^v_t\|^2_{(S^v)^{-1}} \right) + \sum_k \|e^l_{j_k,t_k}\|^2_{(S^l)^{-1}} \right) \\ &= \arg\min_{\delta x} \left( \sum_t \left( \|F_t\,\delta x^v_{t-1} - \delta x^v_t - a_t\|^2_{Q_t^{-1}} + \|H^v_t\,\delta x^v_t - c^v_t\|^2_{(S^v)^{-1}} \right) \right. \\ &\qquad\qquad \left. + \sum_k \|H^l_k\,\delta x^v_{t_k} + H^{l,c}_k\,\delta x^v_{t_c(j_k)} + J_k\,\delta x^l_{j_k} - c^l_k\|^2_{(S^l)^{-1}} \right), \end{aligned} \tag{3.13}$$

where $Q_t = Q(\bar x^v_{t-1})$ (see (2.9)). Note that for $t = 1$ we need $\bar x^v_0$, which we don't have, so we let $F_1 = 0$, $a_1 = 0$ and $Q_1 = P^v_{1|0}$, which is the initial vehicle state covariance for the EKF estimation. This way the resulting term $\|\delta x^v_1\|^2_{Q_1^{-1}}$ makes sure that the smoothed state estimate $\bar x^s$ stays reasonably close to $\bar x$.

The vector norm $\|\cdot\|_{P^{-1}}$ used in (3.13) is the so-called Mahalanobis distance, and it is given by

$$\|e\|^2_{P^{-1}} = e^T P^{-1} e = (P^{-T/2}e)^T (P^{-T/2}e) = \|P^{-T/2}e\|^2_2. \tag{3.14}$$

A problem with using the Mahalanobis distance is that $Q_t$ is singular (see Appendix B for a proof), so its inverse does not exist. To get around this problem we add $\epsilon I$, where $\epsilon > 0$ is a small number, to $Q_t$ before the inversion.

By collecting all the weighted Jacobians ($Q_t^{-T/2}F_t$, $Q_t^{-T/2}$, $(S^v)^{-T/2}H^v_t$, \ldots) in a matrix $A$, and stacking all the weighted residuals ($Q_t^{-T/2}a_t$, $(S^v)^{-T/2}c^v_t$ and $(S^l)^{-T/2}c^l_k$) in a vector $b$, the least squares optimization can be rewritten in the form of a standard linear least squares problem, according to

$$\delta x^* = \arg\min_{\delta x} \|A\,\delta x - b\|^2_2, \tag{3.15}$$

and an example of the structure of $A$ is available in Example 3.1.

Thereafter the sought smoothed state vector $\bar x^s$ is obtained according to

$$\bar x^s = \bar x + \delta x^*. \tag{3.16}$$

In order to get good accuracy for the state estimate we repeat the SAM algorithm, using $\bar x = \bar x^s$, until $\delta x^*$ becomes smaller than some predefined threshold.
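The iteration (3.15)–(3.16) can be sketched as follows, assuming a user-supplied callback build_A_b (our own name) that assembles the weighted Jacobian $A$ and residual vector $b$ at the current estimate, in the manner of Algorithm 4. In practice $A$ is large and sparse, so a sparse solver or factorization (e.g. scipy.sparse.linalg.lsqr) would be preferred; a dense solve keeps the sketch short.

```python
import numpy as np

def sam_iterate(x_bar, build_A_b, tol=1e-6, max_iter=20):
    """Iterated linear least squares SAM, cf. (3.15)-(3.16).

    x_bar: current stacked estimate of all vehicle and landmark states;
    build_A_b(x_bar): callback returning the weighted Jacobian A and the
    weighted residual vector b, linearized at x_bar (cf. Algorithm 4).
    """
    for _ in range(max_iter):
        A, b = build_A_b(x_bar)
        # delta_x* = argmin || A delta_x - b ||^2, (3.15)
        delta_x, *_ = np.linalg.lstsq(A, b, rcond=None)
        x_bar = x_bar + delta_x          # smoothed estimate, (3.16)
        if np.linalg.norm(delta_x) < tol:  # stop when the update is small
            break
    return x_bar
```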

Example 3.1

To show the structure of the matrix $A$, an example scenario is used to illustrate how $A$ is composed of the Jacobians $H^v_t$, $H^l_k$, $H^{l,c}_k$ and $J_k$, with all the weight matrices omitted for better visibility (the symbol $\sim$ used below is to be interpreted as "roughly similar to"). The example scenario has:

• $N = 5$ time steps.
• $M = 2$ landmarks.
• Measurements of landmark $j = 1$ at times $t \in \{1, 2, 3, 4\}$.
• Measurements of landmark $j = 2$ at times $t \in \{3, 4, 5\}$.

This results in the following landmark measurements, enumerated by $k$:

$k$:   1  2  3  4  5  6  7
$t_k$: 1  2  3  3  4  4  5
$j_k$: 1  1  1  2  1  2  2

The structure of $A$, as generated by Algorithm 4, is described in (3.17). Remember that the subscript $k$ of $H^{l,c}_k$ and $H^l_k$ denotes the landmark measurement number, while the subscript $t$ of $H^v_t$ denotes time. Note that:

• $A_{11}$ relates vehicle states to vehicle state prediction errors.
• $A_{21}$ relates vehicle states to measurement prediction errors.
• $A_{22}$ relates landmark states to measurement prediction errors.

$$A = \begin{pmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{pmatrix} \sim \left(\begin{array}{ccccc|cc} -I & & & & & & \\ F_2 & -I & & & & & \\ & F_3 & -I & & & & \\ & & F_4 & -I & & & \\ & & & F_5 & -I & & \\ \hline H^v_1 & & & & & & \\ H^{l,c}_1 + H^l_1 & & & & & J_1 & \\ & H^v_2 & & & & & \\ H^{l,c}_2 & H^l_2 & & & & J_2 & \\ & & H^v_3 & & & & \\ H^{l,c}_3 & & H^l_3 & & & J_3 & \\ & & H^{l,c}_4 + H^l_4 & & & & J_4 \\ & & & H^v_4 & & & \\ H^{l,c}_5 & & & H^l_5 & & J_5 & \\ & & H^{l,c}_6 & H^l_6 & & & J_6 \\ & & & & H^v_5 & & \\ & & H^{l,c}_7 & & H^l_7 & & J_7 \end{array}\right) \tag{3.17}$$

where blank entries are zero blocks, and the vertical line separates the vehicle state columns from the landmark state columns.

It is time consuming to evaluate the Jacobian $H^{l,c}_k$, connecting the camera position to the vehicle state. If only the Jacobian's partial derivatives with respect to the vehicle position are allowed to be non-zero, it is possible to reduce the execution time of the evaluation while the speed of convergence is only marginally lowered, and the approximation's effect on the vehicle pose is almost undetectable. As a result of this, the implementation uses the approximate version of $H^{l,c}_k$. The interpretation of this approximation is that we disregard changes in camera position that stem from changes in the vehicle state angles $\psi_{t_c(j_k)}$ and $\alpha_{t_c(j_k)}$.

Just as with the EKF algorithm, and for the same reason, it is possible to get negative estimates of landmarks' inverse depths after a SAM iteration. This problem is solved in the SAM algorithm like it is in the EKF algorithm, namely by forcing inverse depth estimates below $\rho_{min}$ up to this minimum allowed value.

Algorithm 4 summarizes the implementation of the SAM algorithm. $n_v$ is the number of states in the vehicle model (i.e. the length of $x^v_t$, for any $t$), and $n_l$ is the number of states in a landmark state $x^l_j$.

Algorithm 4 Sensor fusion using SAM

1: Generate $\bar x^v$ from all $\hat x^v_{t|t}$.
2: Generate $\bar x^l$ from all $\hat x^l_{t|t}$, using the last estimate of each individual landmark. Exclude landmarks that have fewer measurements than a predefined threshold.
3: Update $y^l_{j_k,t_k}$ (taken from the EKF algorithm) by removing measurements belonging to landmarks removed in the previous step.
4: repeat
5:   Initialize $a$ and $c$ as empty column vectors, and $A_{21}$ and $A_{22}$ as empty matrices. (These variables will grow iteratively as new rows are added to them.)
6:   for $t = 1$ to $N$ do
7:     if $t = 1$ then
8:       Set $A_{11} = (P^v_{1|0})^{-T/2}\begin{pmatrix} -I & 0^{n_v \times (N-1)n_v} \end{pmatrix}$.
9:       Set $a = 0^{n_v \times 1}$.
10:     else
11:       Append $Q_t^{-T/2}\begin{pmatrix} 0^{n_v \times (t-2)n_v} & F_t & -I & 0^{n_v \times (N-t)n_v} \end{pmatrix}$ to $A_{11}$.
12:       Append $Q_t^{-T/2} a_t$ to $a$.
13:     end if
14:     Append $(S^v)^{-T/2}\begin{pmatrix} 0^{2 \times (t-1)n_v} & H^v_t & 0^{2 \times (N-t)n_v} \end{pmatrix}$ to $A_{21}$.
15:     Append two rows of zeros to $A_{22}$.
16:     Append $(S^v)^{-T/2} c^v_t$ to $c$.
17:     for all $k : t_k = t$ do
18:       if $t = t_c(j_k)$ then
19:         Append $(S^l)^{-T/2}\begin{pmatrix} 0^{2 \times (t-1)n_v} & H^{l,c}_k + H^l_k & 0^{2 \times (N-t)n_v} \end{pmatrix}$ to $A_{21}$.
20:       else
21:         Append $(S^l)^{-T/2}\begin{pmatrix} 0^{2 \times (t_c(j_k)-1)n_v} & H^{l,c}_k & 0^{2 \times (t-t_c(j_k)-1)n_v} & H^l_k & 0^{2 \times (N-t)n_v} \end{pmatrix}$ to $A_{21}$.
22:       end if
23:       Append $(S^l)^{-T/2}\begin{pmatrix} 0^{2 \times (j_k-1)n_l} & J_k & 0^{2 \times (M-j_k)n_l} \end{pmatrix}$ to $A_{22}$.
24:       Append $(S^l)^{-T/2} c^l_k$ to $c$.
25:     end for
26:   end for
27:   Let $A = \begin{pmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{pmatrix}$.
28:   Let $b = \begin{pmatrix} a \\ c \end{pmatrix}$.
29:   Calculate $\delta x^*$ and $\bar x^s$ according to (3.15) and (3.16) respectively.
30:   Set $\bar x = \bar x^s$.
31:   for all $j$ do
32:     Make sure $\bar\rho_j \geq \rho_{min}$.
33:   end for
34: until $\delta x^*$ is smaller than a predefined threshold

3.5 Feature Association Improvement

In order to derive, from the FIR images, measurements that contain much information about the vehicle poses, we want many measurements, but at the same time we want few landmark states, since these are also estimated from the measurements. A method to increase the number of landmark measurements per landmark is described below.

A major disadvantage of associating features using normalized cross-correlation of image patches is that changes in scale are not handled. To overcome this issue, without implementing a whole new feature representation, the image patches that describe the features can be scaled according to

$$L_{j,t} = 2\left\lfloor \frac{D_{j,t} L_0}{2} \right\rfloor + 1, \tag{3.18a}$$

$$D_{j,t} = \frac{\|\hat l^w_{j,t} - \hat k^w_{j,t}\|}{\|\hat l^w_{j,t} - (\hat p_t + R^{wb} c^b)\|} = \frac{1}{\hat\rho_{j,t}\,\|\hat l^w_{j,t} - (\hat p_t + R^{wb} c^b)\|}, \tag{3.18b}$$

where $L_{j,t}$ is the size of the enlarged patch for landmark $j$ at time $t$, and $L_0$ is the size of the original patch (with size being the side length of the quadratic patches). $\hat p_t$, $\hat k^w_{j,t}$, $\hat\rho_{j,t}$ and $\hat l^w_{j,t}$ are EKF estimates of the states and variables described in Section 2.4, and $R^{wb}$ is the same as in (3.7). Note that $l^w_{j,t}$ is not a state and is therefore not itself estimated; it is instead calculated using the estimated landmark states.

The quantity $D_{j,t}$ is a scale factor which, based on the EKF estimates of the landmark state and vehicle state, estimates how much larger landmark $j$ will appear in the image at time $t$, compared to when the landmark was first seen and the image patch describing the feature was stored. The scale factor is simply calculated as the distance to the landmark when it was first measured, divided by the current distance to it.

Scaling the feature patches can, besides prolonging the life-time of landmarks, also be expected to improve the quality of the landmark measurements. The key to this is that when scaling is not used, and we have a landmark patch which ought to be enlarged, the association of this patch with a new image might become offset in some direction. Consider for example the top of a spruce tree, and let's say that the feature patch describing this landmark was stored far away, so that the patch depicts the upper half of the tree. If we associate this feature patch with an image taken at half the original distance to the tree, it is likely that we succeed in finding an association, but the patch will be associated with the top quarter of the tree, so the measurement is higher up than it should be.

Note that $L_{j,t}$ is $D_{j,t}L_0$ rounded to the nearest odd integer. This is done to make sure that the enlarged patch also has one pixel which represents the center of the patch. Also note that $D_{j,t}$ is an approximation of the real scale change, but this approximation holds since the landmarks are small compared to the distance to them, resulting in small angles.
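A minimal sketch of the patch scaling (3.18); the original patch side length L0 is a placeholder, and the inputs are the EKF estimates described above.

```python
import numpy as np

def scaled_patch_size(l_w, k_w, p_cam, L0=11):
    """Scale factor D and enlarged patch side length L from (3.18).

    l_w: estimated landmark world position, k_w: camera position when the
    landmark was first seen, p_cam: current camera position (p_t + R^wb c^b).
    L0 is a placeholder for the original (odd) patch side length.
    """
    D = np.linalg.norm(l_w - k_w) / np.linalg.norm(l_w - p_cam)  # (3.18b)
    L = 2 * int(np.floor(D * L0 / 2)) + 1  # (3.18a): nearest odd integer
    return D, L
```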




Chapter 4

Results

This chapter contains a description of the result presentation, followed by the results of using SAM and the vehicle model extensions, and lastly the results of introducing scale change compensation in the feature association.

4.1 Performance Measures

The results, in terms of vehicle pose estimation accuracy, of SAM and the vehicle model extensions are given in the form of two measures. The first one, landmark measurement residuals, is an indirect but quantified measure of pose estimation accuracy, whereas the second measure, the trajectory plot in the camera image, is direct but non-quantified and also to some extent subjective.

4.1.1 Landmark Measurement Residuals

The root mean square, $\mathrm{RMS}(v)$, of a vector $v = (v_1\ \ldots\ v_N)^T \in \mathbb{R}^N$ is

$$\mathrm{RMS}(v) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} v_i^2}. \tag{4.1}$$

To measure how well the image positions of landmarks are predicted, i.e. how far off the predictions were from the measurements on average, the $\mathrm{RMS}(\tilde c^l)$ of the landmark measurement residuals $\tilde c^l$ (given in pixels) is used. With $f_y$ and $f_z$ as the focal lengths (given in pixels) in the y-direction and z-direction respectively, $M_k = \sum_{t=1}^{N} M_t$ as the total number of landmark measurements, and $t_e(j)$ as the end time for landmark $j$, i.e. the latest time that it was measured, $\tilde c^l$ is given according to

$$\tilde c^l = \begin{pmatrix} \tilde c^l_1 \\ \vdots \\ \tilde c^l_{M_k} \end{pmatrix}, \tag{4.2a}$$

$$\tilde c^l_k = \begin{pmatrix} f_y & 0 \\ 0 & f_z \end{pmatrix} c^l_k, \tag{4.2b}$$

$$c^l_k = \begin{cases} y^l_{j_k,t_k} - h^l(\hat x^v_{t_k|t_k}, \hat x^l_{j_k,t_e(j_k)|t_e(j_k)}) & \text{if EKF estimates are used,} \\ y^l_{j_k,t_k} - h^l(\bar x^v_{t_k}, g(\bar x^v_{t_c(j_k)}, \bar x^l_{j_k})) & \text{if SAM estimates are used.} \end{cases} \tag{4.2c}$$

Note that $c^l_k$ is given in normalized camera coordinates.

The fact that the landmark measurements are discretized (they are given as integer pixel positions) results in a lower limit for the RMS of the landmark measurement residuals: $\mathrm{RMS}(\tilde c^l) \gtrsim \frac{1}{2\sqrt{3}} \approx 0.29$. See Appendix C for the derivation of this limit.

A drawback of this pose estimation measure is that, since it is based on landmark measurements, it depends on which types of landmarks are extracted from the image sequence, and the extracted landmarks are not necessarily the same when using different subsets of the model extensions described in Section 2.3.2. For example, consider a data sequence where we, with one model extension, get a couple of erroneously associated landmarks, but no erroneous associations with another model extension; then it is difficult to compare the pose estimation accuracy based on the $\mathrm{RMS}(\tilde c^l)$ values from these two cases. To make the results more dependable we take the $\mathrm{RMS}(\tilde c^l)$ values for all the 16 combinations of vehicle process model extensions, for both EKF and SAM (a total of 32 values per sequence), and calculate a few averages of the $\mathrm{RMS}(\tilde c^l)$ values for each sequence.

It should also be noted that the measurements of course are not the truth, and their quality depends on how well the feature association algorithm performs.

4.1.2 Trajectory Plot in Camera Image

Plotting the sequence of estimated vehicle positions (the trajectory) in the first FIR camera image provides an intuitive way to evaluate the accuracy of the pose estimates.

The vehicle pose consists of a three-dimensional position and a three-dimensional orientation, and the part of the pose sequence that the trajectory essentially shows is position and yaw angle (since yaw is tightly connected to the sequence of positions), but not pitch angle or roll angle.

The obvious drawback of using the trajectory plot as a measure of pose estimation accuracy is that it is not quantitative but instead has to be assessed subjectively.

4.2 SAM and Model Extensions

We introduce the following abbreviations for the vehicle process model extensions described in Section 2.3.2:

Offset: Constant offset in car pitch angle.
Acc: Acceleration offset in car pitch angle.
Roll: Roll angle.
2nd: Second order model for car pitch.

For each data sequence used to evaluate the SAM algorithm and the model extensions proposed in this thesis, both EKF and SAM estimates have been generated for all the 16 combinations of vehicle process model extensions, and all values of $\mathrm{RMS}(\tilde c^l)$ for the sequences are found in Appendix D. This section summarizes the results found in Appendix D, and for each sequence a few vehicle trajectories are plotted in the first FIR camera image of the sequence. To illustrate the trajectory and landmarks, a map of the estimated landmark positions and vehicle trajectory is shown for each sequence.

It is important to note that not all sequences contain behaviour that the model extensions are meant to handle. If, for example, the true car pitch offset is zero, the offset model extension cannot be expected to improve the pose estimates for that particular sequence. The same goes for the acceleration offset in car pitch angle for any sequence in which the vehicle moves at constant speed. To give an idea about which model extensions are given a chance to improve the results for a particular sequence, all sequences below are accompanied by a short description of their pitch motion, roll motion and acceleration, but note that it is generally unknown whether or not the car pitch has a constant offset.


Sequence A

The car pitch and roll oscillate quite a bit during this sequence, but the acceleration is virtually zero. Table 4.1 presents the results for the mean landmark measurement residuals when using SAM compared to using only the EKF, and when using the different vehicle model extensions compared to not using them. Figure 4.1 shows a top view of the estimated map and trajectory for Sequence A, while Figures 4.2 and 4.3 show trajectory plots which compare the different vehicle model extensions, and SAM versus EKF.

Table 4.1: These tables show the effect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence A.

(a) EKF versus SAM:

Method | RMS($\tilde c^l$)
EKF    | 2.71
SAM    | 0.51

(b) With and without model extensions, using SAM:

RMS($\tilde c^l$)     | Offset | Acc  | Roll | 2nd
With model extension   | 0.52   | 0.51 | 0.48 | 0.50
Without model extension | 0.50  | 0.51 | 0.54 | 0.53

Figure 4.1: Sequence A: Map (x [m] versus y [m]) of the SAM estimates, using the basic vehicle process model.

Figure 4.2: Sequence A: Trajectory for EKF estimates (dashdotted) and SAM estimates (solid), using the basic vehicle process model.

Figure 4.3: Sequence A: Trajectory for SAM estimates, using the models offset (dotted), acc (dashed), roll (dashdotted) and 2nd (solid).


Sequence B

In this sequence there are moderate amounts of pitch and roll motion, but the acceleration is very close to zero. Table 4.2 presents the results for the mean landmark measurement residuals when using SAM compared to using only the EKF, and when using the different vehicle model extensions compared to not using them. Figure 4.4 shows a top view of the estimated map and trajectory for Sequence B, while Figures 4.5 and 4.6 show trajectory plots which compare the different vehicle model extensions, and SAM versus EKF.

Table 4.2: These tables show the effect on the mean landmark measurement residuals of using SAM and vehicle model extensions for Sequence B.

(a) EKF versus SAM:

Method | RMS($\tilde c^l$)
EKF    | 1.18
SAM    | 0.46

(b) With and without model extensions, using SAM:

RMS($\tilde c^l$)     | Offset | Acc  | Roll | 2nd
With model extension   | 0.45   | 0.46 | 0.41 | 0.46
Without model extension | 0.46  | 0.46 | 0.50 | 0.45

Figure 4.4: Sequence B: Map (x [m] versus y [m]) of the SAM estimates, using the basic vehicle process model.
