
Department of Electrical Engineering

Master's Thesis

Using Homographies for Vehicle Motion Estimation

Pär Lundgren

LiTH-ISY-EX–15/4846–SE

Linköping 2015

Department of Electrical Engineering, Linköping University

Using Homographies for Vehicle Motion Estimation

Master's Thesis in Automatic Control
completed at The Institute of Technology, Linköping University,
by

Pär Lundgren

LiTH-ISY-EX–15/4846–SE

Supervisors: Michael Roth (ISY, Linköpings universitet) and Daniel Ankelhed (Autoliv Electronics AB)

Examiner: Martin Enqvist (ISY, Linköpings universitet)

Division, Department: Division of Automatic Control, Department of Electrical Engineering, SE-581 83 Linköping
Date: 2015-06-10
Language: English
Report category: Master's thesis (Examensarbete)
ISBN: —
ISRN: LiTH-ISY-EX–15/4846–SE
Series title and numbering, ISSN: —
Title: Using Homographies for Vehicle Motion Estimation
Author: Pär Lundgren


Abstract

This master’s thesis describes a way to represent vehicles when tracking them through an image sequence. Vehicles are described with a state containing their position, velocity, size, etc. The thesis highlights the properties of homographies due to their suitability for estimation of projective transformations. The idea is to approximately represent vehicles with planes based on feature points found on the vehicles. The purpose of this approach is to estimate the displacement of a vehicle by estimating the transformation of these planes. Thus, when a vehicle is observed from behind, one plane approximates features found on the back and one plane approximates features found on the side, if the side of the vehicle is visible. The projective transformations of the planes are obtained by measuring the displacement of feature points.

The approach presented in this thesis builds on the prerequisite that a camera placed on a vehicle provides an image of its field of view. It does not cover how to find vehicles in an image and thus requires that the patch which contains the vehicle is provided.

Even though this thesis covers large parts of image processing functionalities, the focus is on how to represent vehicles and how to design an appropriate filter for improving estimates of vehicle displacement. Due to noisy feature points, approximation of planes, and estimated homographies, the obtained measurements are likely to be noisy. This requires a filter that can handle corrupt measurements and still use those that are not.

An unscented Kalman filter, UKF, is utilized in this implementation. The UKF is an approximate solution to nonlinear filtering problems and is here used to update the vehicle’s states by using measurements obtained from homographies. The choice of the unscented Kalman filter was made because of its ease of implementation and its potentially good performance.

The result is not a finished implementation for tracking of vehicles, but rather a first attempt at this approach. The result is not better than the existing approach, which might depend on one or several factors such as poorly estimated homographies, unreliable feature points and bad performance of the UKF.


Contents

1 Introduction
  1.1 Background
  1.2 Objective
  1.3 Limitations
  1.4 Related work
    1.4.1 Image processing
    1.4.2 Filtering methods
  1.5 Outline
2 Image Processing
  2.1 Background
  2.2 Feature point extraction
  2.3 Optical flow calculation
    2.3.1 Image registration
    2.3.2 Pyramidal implementation
3 Filtering
  3.1 Extended Kalman filter
  3.2 Unscented Kalman filter
  3.3 Filter properties
4 Projective geometry
  4.1 Background
  4.2 Projective geometry
    4.2.1 Homographies
  4.3 RANSAC
5 Implementation
  5.1 Background
  5.2 Initiating track
  5.3 Tracking
    5.3.1 Sigma points from predicted state
    5.3.2 Predicted measurements
    5.3.3 Optical flow of feature points
    5.3.4 Measurement from homography
    5.3.5 Measurement update
  5.4 Overview
6 Experiments and results
  6.1 Experiment description
  6.2 Obtained result
    6.2.1 Obtained tracks
  6.3 Discussion
  6.4 Evaluation
7 Conclusion
  7.1 Conclusions and remarks
  7.2 Future work

1 Introduction

This thesis is about target tracking of vehicles in the field of view of a camera that is mounted in a car. The goal is to construct a tracker that makes use of feature points in 2D images to create robust estimates of a vehicle’s position, velocity, pose, etc. The objective is to obtain a tracker that can provide the driver with accurate information regarding poses of oncoming and preceding vehicles.

1.1 Background

By utilizing powerful computational resources, a large number of sensors and sophisticated dynamical models it is today possible to obtain good estimates of what the surrounding environment looks like. At Autoliv Electronics today, they detect vehicles and pedestrians from 2D images and generate models of them. This enables a safety system that is capable of providing drivers with helpful information about the road ahead.

The existing system at Autoliv makes use of data from a stereo camera to retrieve information about the road ahead. There are two cameras that together provide two images which are combined into a depth image, a 3D image, of the field of view, FOV, see Figure 1.1. From this it is possible to extract information regarding visible vehicles, for example the distance to a vehicle and its yaw rate.

Badino et al. [2009] introduced the stixel representation of the 3D world. By the use of stixels, that is columns of pixels, it is possible for the current system to estimate the length and the direction of vehicles. Stixels are obtained from the depth image. The concept is that if several adjacent pixels in an image column have roughly the same depth values, then they together constitute a stixel. Hence a stixel indicates that something is perpendicular to the camera axis.


Figure 1.1:The camera is drawn solid with the camera axis as a dashed line. The vehicle is the rectangle heading in the direction of the arrow. The yaw-angle is drawn between the heading direction of the vehicle and the dotted line that is parallel to the camera axis.

If stixels are detected in several neighbouring columns they can be grouped to represent a patch of the image. Grouping stixels requires that neighbouring stixels consist of similar depth values. When observing a vehicle, the stixels that are obtained from one visible side can be grouped into a patch that approximately represents a rectangle. When two sides of a vehicle are visible it is possible to approximate two rectangular patches. The choice of breaking point, where the two patches are divided, is not described here since the exact description of how stixels are grouped lies beyond the scope of this thesis. The stixels in each patch might vary in depth depending on the vehicle’s pose. Seen from above, the rectangles are represented by an L-shaped line consisting of the two sides that are facing the camera. This L-shape is determined by the existing system. The heading direction of the vehicle is determined from the pose and the velocity vector of the obtained L-shape. The yaw rate is the derivative of the estimated yaw angle, as illustrated in Figure 1.1.

Due to the uncertainty of the currently used L-shape corner, the direction of the vehicle is uncertain as well. This motivates Autoliv to investigate additional methods to improve their current system and provide a more accurate representation of the surrounding vehicles.

The first step to track an object through a sequence of images with the help of feature points is to locate the object of interest in the images. This task is done by using a classifier. Since this is outside the scope of this thesis it is not further explained here. Provided the interesting patch of the image, the feature points can be extracted.

Figure 1.2: Feature points located on the side are obtained on a much smaller part of the image even though they might be located further apart in the world. Since the length of the car is greater than the width, the opposite relationship would be desirable.

It is possible to obtain information regarding the object of interest by locating the same features in two consecutive images. The retrieved information depends on the approach that is used to represent the object and the assumptions that are made.

One might choose among different approaches to make use of the data retrieved from tracking feature points. Fundamental for utilizing feature points is to relate their movement in 2D images to the vehicle’s movement on the road. This requires an approach that can utilize the relationship among feature points and approximate the overall displacement between consecutive frames. This will introduce noise and uncertainty due to unreliable tracks of individual feature points. Hence it is important that the chosen approach can handle uncertainty and unreliable measurements.

First of all, individual feature points that are related to the vehicle are extracted and then tracked between images. When tracking feature points individually an important problem arises: one should utilize the tracked feature points and relate their displacement between images to the vehicle’s motion. Since the obtained displacement of features has limited accuracy it is desirable to use a representation that is robust even though unreliable displacements are obtained. Since vehicles are usually observed from behind in a slightly skewed position, the long side of the vehicle is seen within a small patch of the image. This is illustrated in Figure 1.1, where the long side refers to the left of the two visible sides. From this follows that feature points from the long side are more sensitive to changes in yaw angle of the vehicle than features that are from the back of the vehicle. Figure 1.2 illustrates how the vehicle is observed by the camera in Figure 1.1, with feature points drawn.

Figure 1.1 and Figure 1.2 emphasize another problem: the feature points found on the side of the vehicle will transform in a different manner than those on the back. Therefore one must treat the feature points found on each side separately.


Feature points found on one side of the vehicle can be approximated to be located in a plane; hence feature points from two sides can be approximated with two planes. With the planes approximated from feature points one can describe the transformation of feature points between frames with the transformation of planes. Hence it is possible to approximately describe the change of pose of the vehicles from the approximated transformations of planes.
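The plane-to-image mapping described above is, between two frames, a planar homography: a 3×3 matrix acting on image points in homogeneous coordinates (the details are given in Chapter 4). As a hedged illustration of how such a mapping is applied, the helper below is a generic sketch, not the thesis code:

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2D points through a 3x3 homography H via homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # (x, y) -> (x, y, 1)
    mapped = pts_h @ H.T                              # apply H to each point
    return mapped[:, :2] / mapped[:, 2:3]             # divide out the scale

# a pure translation is the simplest homography
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
moved = apply_homography(H, [[0.0, 0.0], [4.0, 1.0]])
```

A general homography additionally encodes rotation, scaling, shear, and perspective in the same 3×3 matrix.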

1.2 Objective

The main objective of this thesis is to find a way to use the tracks of individual feature points for describing the vehicle motion in an accurate way. A crucial aspect is how one chooses to represent the vehicle, since this determines how well the feature points can be utilized for describing projective transformations. The goal is to make use of feature points from both of the vehicle’s visible sides and to design a filter that utilizes the two-plane representation of vehicles to estimate states from the obtained measurements.

Transformation of feature points from both the back and from the side of the vehicle should be described by a projective transformation of a plane. This should enable the ability to capture any motion of the vehicle. Since the obtained displacements of feature points are likely to be noisy, and since all feature points on one side are approximated to be located in a plane, a filter that handles uncertainty is required.

1.3 Limitations

The focus in this thesis is on tracking that involves filtering, modelling and handling of noise. Therefore, the functionalities used for image processing are obtained by using functions from open source libraries.

One initial goal of the thesis was to evaluate different potential filtering methods. The aim was then to evaluate the performance of both the extended Kalman filter, EKF, and the unscented Kalman filter, UKF, separately and then compare them. This has not been performed since the representation of vehicles was considered more interesting to investigate.

Another initial thought for this thesis was to make use of a depth image by augmenting feature points with data from this image and using each feature point’s depth when tracking. This thought was discarded in favour of other interesting approaches for this thesis.

This thesis project was carried out at Autoliv Electronics, Linköping, and their existing system has been used to support this project’s implementation with data and functionalities. The data provided by Autoliv’s existing system consist of images and the prediction of vehicles’ state and state covariance. Two functions are provided, one function that determines the region of interest, ROI, for a vehicle and one transition function. The ROI is derived from a state and it represents a


patch in the image where the vehicle is located. The transition function performs the time update of a state. The use of a given transition function in this thesis is motivated by the objective of utilizing homographies for measurement updates. Therefore no effort has been made to increase the accuracy of Autoliv’s existing transition function.

1.4 Related work

Tracking objects in image sequences has been investigated for some time and a variety of approaches exist.

1.4.1 Image processing

Computer vision is today a highly active research area and there is a wide range of approaches to determine the best representation of the real world from images. Since the goal of this thesis is to track vehicles through image sequences it is vital to find good features to track. Shi and Tomasi [1994] proposed a method for feature point extraction from 2D intensity images. The method is well recognized and has been used widely for many years. The implementation in this thesis will hence use Shi and Tomasi’s method to extract and select feature points.

When tracking feature points there are a number of considerations to take into account. Lucas and Kanade [1981] presented a feature point tracker that performs image registration through a local search. For example, they described how to utilize linearisation of image properties. Bouguet [2001] introduced an algorithm that performs tracking on images represented in a pyramidal form. By using a pyramidal representation, the algorithm he introduced was able to utilize the approximations introduced by Lucas and Kanade [1981] even when the displacement of image features was large between the frames.

These methods are well recognized and generally accepted as high performance methods. Due to their availability and reliability the above methods will be used for the underlying image processing in this thesis.

At Autoliv they use a stereo camera to obtain a depth image, in addition to the gray scale image. Many studies have been performed to evaluate different techniques that utilize the depth information, for example finding feature points in depth images. Among those are Loop Closure by the Help of Surface Elements, surfels [Weise et al., 2009] and Shape Index Mapping, SIM [Gedik and Alatan, 2013, He et al., 2013].

1.4.2 Filtering methods

Since the measurements obtained from tracking feature points can be unreliable it is desirable to have a filter that is able to handle unreliable measurements. A Kalman filter is likely to handle such circumstances rather well and is therefore chosen in this thesis [Gustafsson, 2010].


The Extended Kalman Filter, EKF, is a filter developed for handling nonlinear systems which is the general case for real systems [Welch and Bishop, 1995]. The EKF uses linearisation around the current mean for describing nonlinearities and hence it only approximates the actual system. Since nonlinearities are common when one observes vehicles in traffic the EKF was a potential candidate for use in this project.

Another filter that is developed for handling nonlinearities is the Unscented Kalman Filter, UKF. The UKF was introduced by Julier and Uhlmann [1997] and has since been further refined and given alternative designs. For example, the scaled UKF was introduced by Julier [2002].

The UKF approaches the problem of approximating nonlinearities in a different manner. It predicts the covariance of the system based on a few samples and hence it only approximates the system, as with the EKF. An important difference from the EKF is that the UKF does not linearise around the current mean to estimate the nonlinearities. In some cases the UKF might have advantages over the EKF [Wan and Van Der Merwe].
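To make the sigma-point idea concrete, the following is a minimal sketch of the basic unscented transform in Julier's formulation: 2n+1 sigma points are pushed through the nonlinearity and their weighted statistics approximate the transformed distribution. This is an illustration with an assumed scaling parameter κ, not the exact UKF variant used later in the thesis:

```python
import numpy as np

def unscented_transform(f, m, P, kappa=1.0):
    """Propagate a Gaussian (m, P) through f using 2n+1 sigma points."""
    n = len(m)
    S = np.linalg.cholesky((n + kappa) * P)   # matrix square root of (n+kappa)*P
    sigmas = [m] + [m + S[:, i] for i in range(n)] + [m - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    ys = np.array([f(s) for s in sigmas])     # transformed sigma points
    my = w @ ys                               # predicted mean
    Py = sum(wi * np.outer(y - my, y - my) for wi, y in zip(w, ys))
    return my, Py

# sanity check: for a linear map the transform is exact (mean A m, cov A P A^T)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
m = np.array([1.0, 2.0])
P = np.array([[2.0, 0.5], [0.5, 1.0]])
my, Py = unscented_transform(lambda x: A @ x, m, P)
```

For nonlinear f the result is an approximation, but one that avoids computing Jacobians, which is what distinguishes the UKF from the EKF.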

1.5 Outline

The outline of the thesis is as follows. The three following chapters, Chapters 2-4, contain the theory that is used in this thesis. Chapter 2 contains theory about image processing. It describes the means that are used for extracting features from images and tracking them through image sequences. Chapter 3 describes the filter theories that are of interest. It describes two fundamental filters that are used for nonlinear applications and contains a small part regarding the differences between these two and the arguments for the selected filter. Chapter 4 includes the theory used for the vehicle representation. It describes some parts of the mathematical concept known as projective geometry and the use of homographies. The three remaining chapters, Chapters 5-7, describe the implementation, result and conclusion. Chapter 5 describes how the implementation was performed. It relates in large part to the theory described in Chapters 2-4. Chapter 6 contains an evaluation method and the obtained result. It mentions possible causes for lack of performance. Chapter 7 contains conclusions that have been made during the thesis and some retrospectives regarding the work. It suggests future work to improve the existing implementation and also an alternative approach.


2 Image Processing

This chapter describes the image processing methods that are utilized in this thesis. More specifically it handles topics such as feature point extraction and tracking of feature points.

2.1 Background

The images used in this thesis are gray-scale with 8-bit resolution. Image motion is derived from tracking of feature points, which means that one finds characteristic points in the image and follows these through a sequence of images.

Autoliv utilizes a classifier that supports the tracker with information regarding in which region of the image it is interesting to search for features. This area is only a small part of the entire image and it enables the tracker to find features that are related to the object. The tracker can then extract feature points within this area to find reliable features to use for tracking. The area is referred to as the region of interest, ROI.

When determining image motion by feature point tracking, it is essential to choose reliable feature points. The first step when selecting feature points from images is to determine what information distinguishes good feature points from the rest of the image. Shi and Tomasi [1994] proposed a feature point criterion that is based upon the functionality of the tracker. Their focus was on determining the affine changes of features that arise when one performs tracking of moving objects.

It is assumed that the information that distinguishes a feature point is not transformed in a way that an affine transform cannot describe.


This is assumed since a match between feature points only depends on a small area that is surrounding the feature and not a large patch of the image, which is the case when matching larger objects between images. The feature points depend on a small area since they are chosen due to the characteristics of their absolute closest neighbouring area.

To begin with, image motion can be described as

J(x, y) = I(x − ξ(x, y), y − η(x, y)), (2.1)

where the latter image J can be obtained from the previous image I by moving every pixel, (x, y), from the previous image by the image displacement δ = (ξ, η) of the point x = (x, y).

Shi and Tomasi [1994] emphasized the importance of using an affine motion model that handles the fact that different points can move in different ways within an image. They show that this is superior to the alternative of only describing pure translation for image motion. Affine motion can be represented with δ = Dx + d, where

D = \begin{pmatrix} d_{xx} & d_{xy} \\ d_{yx} & d_{yy} \end{pmatrix}   (2.2)

is a deformation matrix and d is the pure translation of the image centre. Point x in the first image I then moves to point Ax + d in the second image J, see Figure 2.1, where A = 1 + D and 1 is the identity matrix. Given this, two images can be related as

J(Ax + d) = I(x). (2.3)

The affine transformation that is mentioned here is not to be confused with the projective transformation that is used for modelling the entire vehicle, see Chapter 4.
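As a small numeric illustration of the affine motion model above (the matrix entries are made-up example values, not from the thesis):

```python
import numpy as np

D = np.array([[0.02, -0.01],   # deformation matrix D (assumed example values)
              [0.00,  0.03]])
d = np.array([1.5, -0.5])      # pure translation of the window centre
A = np.eye(2) + D              # A = 1 + D, with 1 the identity matrix
x = np.array([10.0, 4.0])      # a point in image I
x_new = A @ x + d              # its corresponding location in image J
```

With D set to zero the model reduces to pure translation, which is the simplification used later in the tracking step.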

2.2 Feature point extraction

Shi and Tomasi [1994] brought forward the necessity of finding features that contain enough information for being tracked reliably. They proposed a feature point criterion that is optimal by construction since it selects features based on the functionality of the tracker.

The objective of tracking a feature is to find the matrix A and the vector d that minimize

\varepsilon = \int_W \big[ J(A\mathbf{x} + \mathbf{d}) - I(\mathbf{x}) \big]^2 w(\mathbf{x}) \, d\mathbf{x},   (2.4)


Figure 2.1: Displacement vector Ax + d in image J corresponds to displacement vector x in image I. The car is just for illustration purposes; in practice it is not an appropriate feature.

where w(x) is a weighting function which for simplicity can be set to 1 [Shi and Tomasi, 1994]. Figure 2.1 shows the displacement vector for the feature window in images I and J, respectively.

Linearisation of J(Ax + d) via a Taylor expansion gives

J(A\mathbf{x} + \mathbf{d}) = J(\mathbf{x}) + \mathbf{g}^T \boldsymbol{\delta},   (2.5)

where \mathbf{g} = [g_x \;\; g_y]^T = [\partial J / \partial x \;\; \partial J / \partial y]^T and \boldsymbol{\delta} = D\mathbf{x} + \mathbf{d}. In accordance with Shi and Tomasi [1993] this yields the 6 × 6 linear system

T\mathbf{z} = \mathbf{a},   (2.6)

where \mathbf{z}^T = [d_{xx} \;\; d_{yx} \;\; d_{xy} \;\; d_{yy} \;\; d_x \;\; d_y] contains the values of the deformation matrix D and the displacement d, a is an error vector and T is shown in (2.8). The error vector a depends on the differences of the images as

\mathbf{a} = \int_W \big[ I(\mathbf{x}) - J(\mathbf{x}) \big] \begin{pmatrix} x g_x \\ x g_y \\ y g_x \\ y g_y \\ g_x \\ g_y \end{pmatrix} w(\mathbf{x}) \, d\mathbf{x}.   (2.7)

The 6 × 6 matrix T, which is obtained from one image, is derived as

T = \int_W \begin{pmatrix} U & V \\ V^T & Z \end{pmatrix} w(\mathbf{x}) \, d\mathbf{x},   (2.8)

where

U = \begin{pmatrix} x^2 g_x^2 & x^2 g_x g_y & xy g_x^2 & xy g_x g_y \\ x^2 g_x g_y & x^2 g_y^2 & xy g_x g_y & xy g_y^2 \\ xy g_x^2 & xy g_x g_y & y^2 g_x^2 & y^2 g_x g_y \\ xy g_x g_y & xy g_y^2 & y^2 g_x g_y & y^2 g_y^2 \end{pmatrix},

V = \begin{pmatrix} x g_x^2 & x g_x g_y \\ x g_x g_y & x g_y^2 \\ y g_x^2 & y g_x g_y \\ y g_x g_y & y g_y^2 \end{pmatrix} \quad \text{and} \quad Z = \begin{pmatrix} g_x^2 & g_x g_y \\ g_x g_y & g_y^2 \end{pmatrix}.

However, since the deformation of a feature between two frames is assumed to be small, the deformation matrix D can be set to a zero matrix. Attempting to determine the deformation can actually lead to poor displacement solutions according to Shi and Tomasi [1994].

By utilizing that D can be set to zero, the error vector a can be rewritten as the error e, where e only contains the last two entries of the vector a. This provides

Z\mathbf{d} = \mathbf{e}, \quad \text{where } \mathbf{d}^T = [d_x \;\; d_y],   (2.9)

which can be used to determine the displacement d.

Z is determined from one image, here image J, and is used to select features. It is necessary that the eigenvalues of the 2 × 2 matrix Z are larger than a certain value. This is to ensure that the feature’s characteristics are reliable to use for tracking. The eigenvalues of Z may not differ too much in magnitude either, since that could correspond to a unidirectional texture pattern.

Two large eigenvalues could for example represent a corner that could be tracked reliably. One large eigenvalue could correspond to a line in the image. In practice, a predefined threshold λ works as a bound for which features are to be chosen and which are not. Thus, the patch is chosen to represent a feature if

min(λ1, λ2) > λ (2.10)

and discarded otherwise.
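A direct, unoptimized sketch of this criterion on a NumPy gray-scale image follows; the function name and window size are illustrative assumptions, not taken from the thesis implementation:

```python
import numpy as np

def min_eig_score(img, win=2):
    """Score each pixel by the smaller eigenvalue of the 2x2 matrix Z,
    accumulated over a (2*win+1)^2 window (the Shi-Tomasi criterion)."""
    gy, gx = np.gradient(img.astype(np.float64))   # image gradients dJ/dy, dJ/dx
    gxx, gyy, gxy = gx * gx, gy * gy, gx * gy
    h, w = img.shape
    score = np.zeros((h, w))
    for y in range(win, h - win):
        for x in range(win, w - win):
            sl = np.s_[y - win:y + win + 1, x - win:x + win + 1]
            Z = np.array([[gxx[sl].sum(), gxy[sl].sum()],
                          [gxy[sl].sum(), gyy[sl].sum()]])
            score[y, x] = np.linalg.eigvalsh(Z)[0]  # min(lambda_1, lambda_2)
    return score

# a corner scores high; an edge and a flat region score (near) zero
img = np.zeros((20, 20))
img[8:, 8:] = 255.0
score = min_eig_score(img)
```

Features would then be kept wherever the score exceeds the threshold λ of (2.10), and discarded otherwise.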

In this project features are drawn from the part of the image that contains the vehicle. The best features are chosen and saved together with the vehicle’s state. The state of a vehicle contains data regarding the vehicle’s position, velocity, etc.

2.3 Optical flow calculation

When good features are found and related to a state in one image they can be used in the next, consecutive, image by finding the corresponding feature points there. The distance with which the feature has moved is then possible to obtain, and this is called the optical flow. The Lucas-Kanade tracker [Lucas and Kanade, 1981] is a method used to determine the optical flow.


Figure 2.2: Displacement vector x + d in image J corresponds to displacement vector x in image I. The car is just for illustration purposes.

2.3.1 Image registration

By utilizing certain similarity measures one can map features between images, known as image registration. By minimizing those similarity measures it is possible to perform image registration with desired precision. The image registration procedure enables the computation of the optical flow.

A measure of similarity between images can be derived by taking either the L_1 or the L_2 norm as

L_{1,\text{norm}} = \sum_{\mathbf{x} \in W} \big| I(\mathbf{x}) - J(\mathbf{x} + \mathbf{d}) \big| \quad \text{and} \quad L_{2,\text{norm}} = \Big( \sum_{\mathbf{x} \in W} \big( I(\mathbf{x}) - J(\mathbf{x} + \mathbf{d}) \big)^2 \Big)^{1/2},   (2.11)

where W is the integration window, that is, the part of the image that is interesting to compare.
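Both measures can be sketched in a few lines; the helper below and its window handling are assumptions for illustration:

```python
import numpy as np

def similarity(I, J, d, ys, xs):
    """L1 and L2 dissimilarity between a window of I (rows ys, cols xs)
    and the corresponding window of J displaced by d = (dx, dy)."""
    diff = I[np.ix_(ys, xs)] - J[np.ix_(ys + d[1], xs + d[0])]
    return np.abs(diff).sum(), np.sqrt((diff ** 2).sum())

# when d matches the true displacement, both measures vanish
I = np.arange(100, dtype=float).reshape(10, 10)
J = np.roll(I, 2, axis=1)            # J is I shifted two pixels to the right
ys, xs = np.arange(3, 7), np.arange(3, 7)
l1, l2 = similarity(I, J, (2, 0), ys, xs)
```

Minimizing such a measure over d is exactly the registration problem solved below.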

The image registration procedure used by the Lucas-Kanade tracker aims to find the displacement vector d = [d_x, d_y] that minimizes the L_2 norm, and thereby the error function ε(d), for a window that surrounds the point. The error function is formulated as

\varepsilon(\mathbf{d}) = \varepsilon(d_x, d_y) = \big( L_{2,\text{norm}} \big)^2 = \sum_{x=u_x-w_x}^{u_x+w_x} \; \sum_{y=u_y-w_y}^{u_y+w_y} \big( I(x, y) - J(x + d_x, y + d_y) \big)^2,   (2.12)

which gives the size of the integration window as (2w_x + 1) · (2w_y + 1) pixels.

The optimal solution is derived by taking the first derivative of ε(d) with respect to d and finding its zero point,

\frac{\partial \varepsilon(\mathbf{d})}{\partial \mathbf{d}} = [0 \;\; 0].   (2.13)

By approximating J linearly, equation (2.12) becomes quadratic in d_x, d_y, and minimization is then possible by solving a linear system of equations.
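A minimal single-level sketch of this linearised solve, valid for small displacements: the gradients of J give the 2×2 normal equations Z d = e of (2.9), which are solved directly. The helper name and window size are made up for the example:

```python
import numpy as np

def lk_step(I, J, cx, cy, win=6):
    """One linearised Lucas-Kanade solve: build the 2x2 normal equations
    Z d = e over the window around (cx, cy) and solve for d = (dx, dy)."""
    I = I.astype(np.float64)
    J = J.astype(np.float64)
    sl = np.s_[cy - win:cy + win + 1, cx - win:cx + win + 1]
    gy, gx = np.gradient(J)                    # gradients of the second image
    gx, gy, diff = gx[sl], gy[sl], (I - J)[sl]
    Z = np.array([[(gx * gx).sum(), (gx * gy).sum()],
                  [(gx * gy).sum(), (gy * gy).sum()]])
    e = np.array([(diff * gx).sum(), (diff * gy).sum()])
    return np.linalg.solve(Z, e)

# synthetic check: J is I shifted one pixel to the right, so d should be ~(1, 0)
yy, xx = np.mgrid[0:31, 0:31]
I = np.exp(-((xx - 15.0) ** 2 + (yy - 15.0) ** 2) / 18.0)
J = np.roll(I, 1, axis=1)
d = lk_step(I, J, 15, 15)
```

Because of the first-order Taylor expansion the estimate degrades as the true displacement grows, which is precisely what the pyramidal scheme of the next section addresses.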

2.3.2 Pyramidal implementation

Due to the first order Taylor expansion, the Lucas-Kanade feature point tracker is only a good approximation when the displacement of feature points is small. Bouguet [2001] introduced a method that was able to utilize the properties of the Lucas-Kanade tracker and also handle large displacement of feature points. This method is based on a pyramidal image representation.

Pyramid image

Given the image I with a resolution of n_x × n_y, an image pyramid is derived by generating subimages recursively. Each pyramid level contains half the resolution of its preceding level.

The native image is hence at level 0 and the pyramid consists of L_m levels. Each level L is derived as

I^L(x, y) = \frac{1}{4} I^{L-1}(2x, 2y) + \frac{1}{8} \big( I^{L-1}(2x-1, 2y) + I^{L-1}(2x+1, 2y) + I^{L-1}(2x, 2y-1) + I^{L-1}(2x, 2y+1) \big) + \frac{1}{16} \big( I^{L-1}(2x-1, 2y-1) + I^{L-1}(2x+1, 2y+1) + I^{L-1}(2x-1, 2y+1) + I^{L-1}(2x+1, 2y-1) \big).   (2.14)

An implementation according to equation (2.14) makes use of the separable lowpass filter [1/4 1/2 1/4] × [1/4 1/2 1/4]^T for anti-aliasing purposes. In practice, however, a lowpass filter like [1/16 1/4 3/8 1/4 1/16] × [1/16 1/4 3/8 1/4 1/16]^T is used, according to the implementation proposed by Bouguet [2001].

Bouguet [2001] defines dummy image values one pixel around the image I^{L-1} according to

I^{L-1}(-1, y) = I^{L-1}(0, y),
I^{L-1}(x, -1) = I^{L-1}(x, 0),
I^{L-1}(n_x^{L-1}, y) = I^{L-1}(n_x^{L-1} - 1, y),
I^{L-1}(x, n_y^{L-1}) = I^{L-1}(x, n_y^{L-1} - 1),
I^{L-1}(n_x^{L-1}, n_y^{L-1}) = I^{L-1}(n_x^{L-1} - 1, n_y^{L-1} - 1),

for 0 ≤ x ≤ n_x^{L-1} - 1 and 0 ≤ y ≤ n_y^{L-1} - 1.


From this follows that equation (2.14) is defined for x and y that satisfy 0 ≤ 2x ≤ n_x^{L-1} - 1 and 0 ≤ 2y ≤ n_y^{L-1} - 1. Hence, the width n_x^L and height n_y^L of image I^L are the largest integers that fulfil the two criteria

n_x^L \leq \frac{n_x^{L-1} + 1}{2}, \quad n_y^L \leq \frac{n_y^{L-1} + 1}{2}.   (2.15)

From the image pyramid determined by (2.14) and (2.15) it is possible to handle large pixel motions in the image, and still keep the window used for integration small in the subimages. This enables the tracker to utilize the first order Taylor expansion. Typical numbers of levels that are used are 2, 3 or 4 depending on the maximum expected optical flow.
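One level of the recursion (2.14)-(2.15) can be sketched directly, with the border handled by the replicated dummy values defined above. This is a straightforward, unoptimized version with an assumed function name:

```python
import numpy as np

def pyr_down(I):
    """One pyramid level according to (2.14): a 3x3 lowpass with weights
    1/4 (centre), 1/8 (edges), 1/16 (corners), taken at even pixels."""
    I = np.asarray(I, dtype=float)
    P = np.pad(I, 1, mode="edge")                          # dummy border values
    K = np.outer([0.25, 0.5, 0.25], [0.25, 0.5, 0.25])     # weights of (2.14)
    nh, nw = (I.shape[0] + 1) // 2, (I.shape[1] + 1) // 2  # sizes from (2.15)
    out = np.zeros((nh, nw))
    for y in range(nh):
        for x in range(nw):
            out[y, x] = (P[2 * y:2 * y + 3, 2 * x:2 * x + 3] * K).sum()
    return out

small = pyr_down(np.full((20, 21), 7.0))   # a constant image stays constant
```

The outer product of [1/4 1/2 1/4] with itself reproduces exactly the 1/4, 1/8, 1/16 weights of (2.14); the wider five-tap kernel mentioned above would slot in the same way.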

Pyramid tracking

For a given point x in image I, the corresponding point x^L = [x_x^L \;\; x_y^L]^T in image I^L is

\mathbf{x}^L = \frac{\mathbf{x}}{2^L}.   (2.16)

The residual pixel displacement for each image level L is derived analogously to equation (2.12). In addition to (2.12), each pyramid level L is provided an initial guess, g^L = [g_x^L, g_y^L]^T, of the optical flow. This gives the expression

ε^L(d^L) = ε^L(d_x^L, d_y^L) = \sum_{x=x_x^L-w_x}^{x_x^L+w_x} \sum_{y=x_y^L-w_y}^{x_y^L+w_y} ( I^L(x, y) - J^L(x + g_x^L + d_x^L, y + g_y^L + d_y^L) )^2   (2.17)

to minimize with respect to d^L.

to minimize with respect to dL. The initial guess, gL, for each image level depends on gL+1and the displacement vector d found in the previously evaluated level,

L + 1, according to

gL= 2(gL+1+ dL+1). (2.18)

Using the initial guess gLm = [0 0]T, the final optical flow can be expressed as

d= Lm X L=0 2LdL (2.19) Iterative computation

Provided the pyramidal representation of the images, {I^L}_{L=0,...,L_m} and {J^L}_{L=0,...,L_m}, the displacement at each level is found by minimizing an error function of the form (2.13). The solution is obtained by iterative computations in a Newton-Raphson style. That is, the optimal solution is found by iterating in the direction of the minimum L2 norm.

Provided the initial guess \bar{v}^{k-1} = [v_x^{k-1}, v_y^{k-1}]^T for each iteration k, and denoting J_k^L(x, y) = J^L(x + v_x^{k-1}, y + v_y^{k-1}), the solution for iteration k is obtained by minimizing the error function

ε^k(\bar{n}^k) = ε^k(n_x^k, n_y^k) = \sum_{x=x_x^L-w_x}^{x_x^L+w_x} \sum_{y=x_y^L-w_y}^{x_y^L+w_y} ( I^L(x, y) - J_k^L(x + n_x^k, y + n_y^k) )^2,   (2.20)

with respect to \bar{n}^k = (n_x^k, n_y^k).

The vector \bar{v} is updated in each iteration according to

\bar{v}^k = \bar{v}^{k-1} + \bar{n}^k.   (2.21)

If the norm of the vector \bar{n}^k is smaller than the accuracy threshold, the iteration can be terminated and the solution \bar{v}^k can be used for the initial guess of the next pyramidal level L - 1. Moreover, we get

d^L = \bar{v}^k   (2.22)

which is used according to (2.18) to update the initial guess g^{L-1} for the next pyramidal level, L - 1.

Optical flow algorithm

Algorithm 1 summarizes the sequence of computations performed when finding the optimal optical flow according to the pyramidal implementation of the Lucas-Kanade tracker described by Bouguet [2001].


Algorithm 1: Find the corresponding vector v in image J to the point u in image I.

    Build pyramidal representations of I and J: {I^L}_{L=0,...,L_m} and {J^L}_{L=0,...,L_m}
    Initial pyramidal guess: g^{L_m} = [g_x^{L_m}, g_y^{L_m}]^T = [0, 0]^T
    for L = L_m to 0 step -1 do
        Location of u on image I^L: u^L = [u_x, u_y]^T = u / 2^L
        Derivative of I^L with respect to x: I_x(x, y) = (I^L(x+1, y) - I^L(x-1, y)) / 2
        Derivative of I^L with respect to y: I_y(x, y) = (I^L(x, y+1) - I^L(x, y-1)) / 2
        Spatial gradient matrix:
            G = \sum_{x=u_x-w_x}^{u_x+w_x} \sum_{y=u_y-w_y}^{u_y+w_y} [ I_x^2(x, y)           I_x(x, y) I_y(x, y)
                                                                        I_x(x, y) I_y(x, y)   I_y^2(x, y) ]
        Initialization of the iteration at level L: \bar{v}^0 = [0, 0]^T
        for k = 1 to K step 1 (or until ||\bar{n}^k|| < accuracy threshold) do
            Image difference: δI_k(x, y) = I^L(x, y) - J^L(x + g_x^L + v_x^{k-1}, y + g_y^L + v_y^{k-1})
            Image mismatch vector:
                \bar{b}_k = \sum_{x=u_x-w_x}^{u_x+w_x} \sum_{y=u_y-w_y}^{u_y+w_y} [ δI_k(x, y) I_x(x, y)
                                                                                    δI_k(x, y) I_y(x, y) ]
            Optical flow (according to Lucas-Kanade): \bar{n}^k = G^{-1} \bar{b}_k
            Guess for next iteration: \bar{v}^k = \bar{v}^{k-1} + \bar{n}^k
        end for
        Final optical flow for level L: d^L = \bar{v}^K
        Guess for next level L - 1: g^{L-1} = [g_x^{L-1}, g_y^{L-1}]^T = 2(g^L + d^L)
    end for
    Final optical flow: d = g^0 + d^0
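To make the inner computation of Algorithm 1 concrete, a single Lucas-Kanade solve \bar{n} = G^{-1} \bar{b} at one pyramid level, with a zero initial guess, can be sketched in NumPy as below. It is a simplified sketch: image values are read at integer positions, whereas Bouguet [2001] interpolates at subpixel positions.

```python
import numpy as np

def lucas_kanade_step(I, J, u, w=4):
    """One Lucas-Kanade solve around the integer point u = (ux, uy):
    build the spatial gradient matrix G and the image mismatch vector b
    over a (2w+1)x(2w+1) window and return n = G^{-1} b."""
    ux, uy = u
    X, Y = np.meshgrid(np.arange(ux - w, ux + w + 1),
                       np.arange(uy - w, uy + w + 1), indexing="ij")
    # central-difference derivatives of I
    Ix = (I[X + 1, Y] - I[X - 1, Y]) / 2.0
    Iy = (I[X, Y + 1] - I[X, Y - 1]) / 2.0
    dI = I[X, Y] - J[X, Y]  # image difference with zero current guess
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = np.array([np.sum(dI * Ix), np.sum(dI * Iy)])
    return np.linalg.solve(G, b)
```

On a smooth image pair where J is I translated by half a pixel, a single step already recovers most of the displacement; the iterative loop of Algorithm 1 refines it further.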


3 Filtering

This chapter describes the two filters that are of interest for this implementation. It describes their properties and compares their suitability for handling measurements obtained from feature point tracking.

In this thesis, the objective is to obtain a filter that handles measurements of feature points obtained by the tracker described in Chapter 2 [Bouguet, 2001]. The filter is required to handle non-linearities such as the non-linear displacement of feature points between frames. The goal is to use the obtained measurements in accordance with their reliability, depending on the measurement model that is used.

Two filters for approximate filtering in non-linear models have been considered for this thesis. Today at Autoliv they utilize an extended Kalman filter, EKF, for the purpose of target tracking. The other interesting alternative is the unscented Kalman filter, UKF, that has become rather popular during the last decades.

3.1 Extended Kalman filter

The EKF is a non-linear version of the Kalman filter. It is widely used and has become a standard technique in a number of nonlinear estimation applications during the last decades, often with good success [Wan and Van Der Merwe, Julier and Uhlmann, 2004].

The EKF can be applied to a nonlinear state-space model

x_k = f(x_{k-1}, u_{k-1}) + w_{k-1}   (3.1a)
z_k = h(x_k) + v_k,   (3.1b)


where f is the transition function and h the observation model. Furthermore, x_k and z_k are the state and measurement at time k, respectively. The process and observation noises, w_k and v_k, respectively, are assumed to be zero mean multivariate Gaussian noises with covariances Q_k and R_k, respectively.

The EKF time update is made according to

\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_{k-1})   (3.2a)
P_{k|k-1} = F_{k-1} P_{k-1|k-1} F_{k-1}^T + Q_k   (3.2b)

where F_{k-1} = ∂f/∂x evaluated at (\hat{x}_{k-1|k-1}, u_{k-1}).

The measurement update is made according to

\tilde{y}_k = z_k - h(\hat{x}_{k|k-1})   (3.3a)
S_k = H_k P_{k|k-1} H_k^T + R_k   (3.3b)
K_k = P_{k|k-1} H_k^T S_k^{-1}   (3.3c)
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \tilde{y}_k   (3.3d)
P_{k|k} = (I - K_k H_k) P_{k|k-1},   (3.3e)

where H_k = ∂h/∂x evaluated at \hat{x}_{k|k-1}.

The matrices P_{k|k-1} and P_{k|k} are the predicted and corrected covariance, respectively, for the state estimation error \hat{x} - x.
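One EKF recursion, equations (3.2)-(3.3), can be sketched as follows. This is a generic sketch rather than Autoliv's implementation; the control input u is omitted and the Jacobians F and H are supplied as functions.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One EKF recursion: time update (3.2) followed by measurement
    update (3.3). F and H return the Jacobians of f and h evaluated
    at the given estimate."""
    # time update (3.2)
    x_pred = f(x)
    Fk = F(x)
    P_pred = Fk @ P @ Fk.T + Q
    # measurement update (3.3)
    Hk = H(x_pred)
    y = z - h(x_pred)                            # innovation (3.3a)
    S = Hk @ P_pred @ Hk.T + R                   # innovation covariance (3.3b)
    K = P_pred @ Hk.T @ np.linalg.inv(S)         # Kalman gain (3.3c)
    x_new = x_pred + K @ y                       # state update (3.3d)
    P_new = (np.eye(len(x)) - K @ Hk) @ P_pred   # covariance update (3.3e)
    return x_new, P_new
```

For a linear model the linearization is exact, so the sketch reduces to the ordinary Kalman filter, which gives a simple sanity check.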

The EKF’s weak point is the linearisation around the state estimate, which might be inadequate. According to Julier and Uhlmann [2004], this is what has driven researchers to find alternative methods. Even though the EKF has some known flaws, it is widely used due to its relatively low computational complexity and its simplicity, and it often works well.

3.2 Unscented Kalman filter

Julier and Uhlmann [1997] proposed the UKF as a new extension of the Kalman filter for non-linear systems. They claimed the performance of the UKF to be equivalent to a Kalman filter in the linear case. The UKF was developed to address the weakness of the EKF, according to Julier and Uhlmann [2004], and it makes no use of linearisation via Taylor expansion.


Unscented transform

Uhlmann [1995] presented in his doctoral dissertation the unscented transform, UT, which would be fundamental for the filter that he and Simon Julier later introduced. The UT is based on the intuition that it is easier to estimate a probability distribution than it is to estimate a non-linear function or transformation [Uhlmann, 1994]. The UT is used for estimation of non-linear transformations by applying the non-linear transform to a finite set of samples, known as sigma points. The sigma points are distributed around the last estimated state so that their mean and covariance are in accordance with the estimated mean and covariance of the state.

The sigma points are hence likely to represent some aspects of the probability distribution of the given state. When the non-linear transformation is applied to the state and the sigma points, the idea is that the result will reflect the transformation of the entire probability distribution, see Figure 3.1.

Figure 3.1: The unscented transform aims to reflect the transformation of the entire probability distribution of the given state.

Each sigma point, χ_i, is assigned a weight, W_i, which can be assigned any non-negative value under the condition that

\sum_{i=0}^{2L} W_i = 1,   (3.4)

where L is the dimension of the state [Julier and Uhlmann, 2004]. 2L + 1 sigma points are generated, one at the mean and 2L at the contour of the covariance. This is to retrieve an unbiased transformation of the state and can be achieved through a number of variants of weights.

Time update

The UKF utilizes the unscented transform to update the state estimate and its covariance. When doing the time update of the state, the first stage is to generate sigma points corresponding to the previous estimate of the state, \hat{x}_{k-1|k-1}, and covariance, P_{k-1|k-1}. The sigma points can be distributed according to the scaled unscented transform as

χ^0_{k-1|k-1} = \hat{x}_{k-1|k-1}
χ^i_{k-1|k-1} = \hat{x}_{k-1|k-1} + ( \sqrt{(L + λ) P_{k-1|k-1}} )_i,   i = 1, ..., L
χ^i_{k-1|k-1} = \hat{x}_{k-1|k-1} - ( \sqrt{(L + λ) P_{k-1|k-1}} )_{i-L},   i = L + 1, ..., 2L   (3.5)

where i and i - L indicate the column of the matrix within parentheses.

λ = α^2(L + κ) - L, where κ usually is set to 0 and α to 10^{-3} [Julier, 2002].

The square root of P_{k-1|k-1} is achieved through Cholesky decomposition of the covariance matrix as

P_{k-1|k-1} = A A^T,   \sqrt{P_{k-1|k-1}} = A^T.   (3.6)

Thereafter, the sigma points are propagated through the transition function f and are time updated according to

\tilde{χ}^i_{k|k-1} = f(χ^i_{k-1|k-1}),   i = 0, ..., 2L.   (3.7)

The sigma points are assigned weights according to the scaled unscented transform proposed by Julier [2002] as

W_m^0 = λ / (L + λ)
W_c^0 = λ / (L + λ) + (1 - α^2 + β)
W_m^i = W_c^i = 1 / (2(L + λ)),   i = 1, ..., 2L   (3.8)

where β is set to 2, which is optimal for the case with a Gaussian distribution [Julier, 2002].
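A sketch of the sigma point generation (3.5), the Cholesky-based square root (3.6) and the scaled weights (3.8) in NumPy (illustrative only, not the thesis implementation):

```python
import numpy as np

def sigma_points_and_weights(x, P, alpha=1e-3, beta=2.0, kappa=0.0):
    """Scaled sigma points (3.5) and weights (3.8). The matrix square
    root uses the Cholesky factorization (3.6); the i-th sigma point
    offset is the i-th column of sqrt((L + lam) * P)."""
    L = len(x)
    lam = alpha**2 * (L + kappa) - L
    A = np.linalg.cholesky((L + lam) * P)   # A @ A.T == (L + lam) * P
    chi = np.vstack([x[None, :], x + A.T, x - A.T])   # (2L+1, L) points
    Wm = np.full(2 * L + 1, 1.0 / (2 * (L + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1.0 - alpha**2 + beta)
    return chi, Wm, Wc
```

A quick sanity check is that the weighted sample mean and covariance of the untransformed sigma points reproduce x and P, which follows from the weights summing to one and the symmetric ± placement in (3.5).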

The weights are used to provide the predicted state and the predicted covariance from the sigma points as

\hat{x}_{k|k-1} = \sum_{i=0}^{2L} W_m^i \tilde{χ}^i_{k|k-1}   (3.9a)
P_{k|k-1} = \sum_{i=0}^{2L} W_c^i ( \tilde{χ}^i_{k|k-1} - \hat{x}_{k|k-1} )( \tilde{χ}^i_{k|k-1} - \hat{x}_{k|k-1} )^T.   (3.9b)

Measurement update

The next step in the UKF algorithm is to predict the measurement, z_k. The result of (3.7) is propagated through the observation model,

γ^i_k = h( \tilde{χ}^i_{k|k-1} ),   i = 0, ..., 2L.   (3.10)

The observations, γ^i_k, combined with their respective weights generate the predicted measurement, \hat{z}_k. This follows from

\hat{z}_k = \sum_{i=0}^{2L} W_m^i γ^i_k.   (3.11)

The estimated measurement covariance is computed according to

P_{z_k z_k} = \sum_{i=0}^{2L} W_c^i (γ^i_k - \hat{z}_k)(γ^i_k - \hat{z}_k)^T.   (3.12)

The state-measurement cross-covariance is also obtained from

P_{x_k z_k} = \sum_{i=0}^{2L} W_c^i (χ^i_{k|k-1} - \hat{x}_{k|k-1})(γ^i_k - \hat{z}_k)^T   (3.13)

and the Kalman gain is obtained as

K_k = P_{x_k z_k} P_{z_k z_k}^{-1}   (3.14)

where R is the covariance matrix of the measurement noise.

The state is then updated by adding the innovation weighted by the Kalman gain to the predicted state, \hat{x}_{k|k-1}. The innovation comes from the difference between the predicted measurement, \hat{z}_k, and the obtained measurement, z_k. The update is carried out according to

\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - \hat{z}_k)   (3.15)

and the covariance is updated as

P_{k|k} = P_{k|k-1} - K_k P_{z_k z_k} K_k^T.   (3.16)
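The measurement update (3.10)-(3.16) can be sketched compactly as below. The sketch assumes additive measurement noise, with the noise covariance R added to P_{z_k z_k}; chi_pred denotes the time-updated sigma points and h the observation model.

```python
import numpy as np

def ukf_measurement_update(x_pred, P_pred, chi_pred, Wm, Wc, h, z, R):
    """UKF measurement update following (3.10)-(3.16)."""
    gamma = np.array([h(c) for c in chi_pred])   # observations (3.10)
    z_hat = Wm @ gamma                           # predicted measurement (3.11)
    dz = gamma - z_hat
    dx = chi_pred - x_pred
    Pzz = (Wc[:, None] * dz).T @ dz + R          # (3.12), with noise R added
    Pxz = (Wc[:, None] * dx).T @ dz              # cross-covariance (3.13)
    K = Pxz @ np.linalg.inv(Pzz)                 # Kalman gain (3.14)
    x_new = x_pred + K @ (z - z_hat)             # state update (3.15)
    P_new = P_pred - K @ Pzz @ K.T               # covariance update (3.16)
    return x_new, P_new
```

For a linear observation model the unscented transform is exact, so this update reproduces the ordinary Kalman filter update, which gives a convenient unit test.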

3.3 Filter properties

Regarding the ability to handle nonlinearities, both the EKF and the UKF are limited since they both rely on approximations that introduce a loss in performance. In a reality where nonlinearities occur, approximations cannot be more than qualified guesses and it is therefore hard to tell which of the filters is best.

Even though the UKF has gained popularity during recent years, there is no conclusive result on whether it is better or worse than the EKF. Julier and Uhlmann [1997] claim that the performance of the UKF is better than the performance of the EKF. Another point of view is given by Gustafsson and Hendeby [2012]. They show that the claimed performance of the UKF does not hold for all cases, but they also show that the UKF succeeds in providing a good approximation for many common sensor models.

The UKF has been chosen in this project due to its ease of implementation and also due to its potentially good performance. The implemented UKF is described in Chapter 5. Details regarding the observation model are described there, i.e. how obtained measurements from feature points are handled.


4 Projective geometry

This chapter describes some parts of the mathematical concept known as projective geometry. In large parts it relates to the theory described by Hartley and Zisserman [2004].

More specifically, it focuses on the means that are used in this thesis for describing transformations of vehicle models. The transformation of interest appears when vehicles are tracked and one tries to match their appearance between frames. The projective transformation is important when matching vehicles that are observed in different poses and from different angles.

4.1 Background

The projective transformation extends the properties of the affine transformation that was mentioned in Chapter 2. For example, by using an affine transformation when tracking objects, the parallel lines belonging to the object are preserved. This is not desirable since it is not realistic to assume that parallel lines are consistent between frames. When, for example, a vehicle seen from behind changes pose, that is, appears from a different angle, the part rotated away from the camera becomes smaller in the image. Similarly, the part that is rotated towards the camera is enlarged. In traffic, these so-called cornerstone effects are likely to appear for vehicles that are observed by a camera.

If the visible sides of a vehicle are represented with two planes, the projective transformation of those planes is a good estimate of how the vehicle is moving. The idea is to represent planes with feature points found on the vehicle and then obtain the projective transformation by tracking the features and matching the planes. Figure 4.1 shows the projective transformation of a rotated plane. This is the kind of transformation that one wants to estimate by tracking feature points of vehicles between frames.

Figure 4.1: Cornerstone effect of a rectangle. The right part is rotated inwards and the left part outwards. The reference is drawn with dashed lines.

4.2 Projective geometry

Projective geometry is the field of geometry where geometric properties are invariant under projective transformations. The projective space, P^2, is what enables projective geometry, and it consists of a set of lines that pass through the origin of a vector space. One property of the projective space is that two parallel lines, in Euclidean space, are said to intersect at infinity. This is analogous to a railway track meeting at the horizon, see Figure 4.2. From this follows that angles are not relevant in a projective space since they are not invariant under projective transformation.

Figure 4.2: Picture from Pixabay [2015]. The parallel tracks intersect at the horizon. In a projective space each line represents a point.


Homogeneous coordinates are used in projective transformations. A point x = (x, y) in 2D-space is represented in homogeneous coordinates as x = (x_1, x_2, x_3)^T, where the scale factor x_3 ≠ 0. The point x can then be described in inhomogeneous coordinates as

x = x_1 / x_3,   y = x_2 / x_3.   (4.1)

Multiplying a point with a constant in homogeneous coordinates results in the same point since it is still on the same line in the projective space. x = (x, y, 1) is one example of a point in homogeneous coordinates which is considered to be equivalent to x = (2x, 2y, 2).
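The conversion of (4.1), together with its inverse using scale factor one, amounts to the following sketch:

```python
import numpy as np

def to_homogeneous(pts):
    """(N, 2) inhomogeneous points -> (N, 3) homogeneous with x3 = 1."""
    return np.hstack([pts, np.ones((len(pts), 1))])

def from_homogeneous(pts_h):
    """(N, 3) homogeneous points -> (N, 2) via (4.1), dividing by x3."""
    return pts_h[:, :2] / pts_h[:, 2:3]
```

Scaling a homogeneous point leaves the inhomogeneous point unchanged; for example, (2, 4, 2) and (1, 2, 1) both map to (1, 2).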

4.2.1 Homographies

Hartley and Zisserman [2004], page 32, define a homography as an invertible mapping h from P^2 to itself such that three points x_1, x_2 and x_3 are on the same line if and only if h(x_1), h(x_2) and h(x_3) are. See Figure 4.3, where each pair of collinear points in the planes P_1 and P_2 occurs where the lines from the origin n intersect the planes.

Figure 4.3: Each projective line intersects the corresponding points in both planes, P_1 and P_2, and they all intersect in n, the origin of the projective space.

Hartley and Zisserman [2004] define a homography as a linear transformation on homogeneous 3-vectors, represented by a nonsingular 3 × 3 matrix, according to

x_2^i = H x_1^i   (4.2)

where

H = [ h_1  h_2  h_3
      h_4  h_5  h_6
      h_7  h_8  h_9 ]

and where the equality is not in value but in the direction of the left- and right-hand side expressions. They are hence equal in the projective space and can differ in magnitude.

A more eloquent way of describing the relation between x_1^i and x_2^i is therefore by the use of the cross-product equation

x_2^i × H x_1^i = 0   (4.3)

where the zero vector comes from the fact that the two vectors are pointing in the same direction.

The text in this part follows the theory presented by Hartley and Zisserman [2004], pages 32-33 and 87-93. Following the notation of Hartley and Zisserman [2004], the j-th row of matrix H can be denoted h^{jT} and thus

H x^i = [ h^{1T} x^i ; h^{2T} x^i ; h^{3T} x^i ].   (4.4)

Denoting x_2^i = [x_2^i, y_2^i, w_2^i]^T, the cross product can be written as

x_2^i × H x_1^i = [ y_2^i h^{3T} x_1^i - w_2^i h^{2T} x_1^i
                    w_2^i h^{1T} x_1^i - x_2^i h^{3T} x_1^i
                    x_2^i h^{2T} x_1^i - y_2^i h^{1T} x_1^i ] = 0.   (4.5)

Exploiting that h^{jT} x_1^i = x_1^{iT} h^j and denoting 0 = [0, 0, 0]^T, (4.5) can be rewritten as

[ 0^T             -w_2^i x_1^{iT}   y_2^i x_1^{iT}
  w_2^i x_1^{iT}   0^T             -x_2^i x_1^{iT}
  -y_2^i x_1^{iT}  x_2^i x_1^{iT}   0^T            ] [ h^1 ; h^2 ; h^3 ] = A_i h = 0.   (4.6)

This provides three sets of equations among which only two are linearly independent. Hence, since A_i only has two linearly independent block rows, the expression can be reduced to

[ 0^T             -w_2^i x_1^{iT}   y_2^i x_1^{iT}
  w_2^i x_1^{iT}   0^T             -x_2^i x_1^{iT} ] [ h^1 ; h^2 ; h^3 ] = A'_i h = 0.   (4.7)

This way of expressing the equation is valid for any homogeneous coordinates defined as x_1^i = [x_1^i, y_1^i, w_1^i]^T and x_2^i = [x_2^i, y_2^i, w_2^i]^T. The scaling parameters w_1^i and w_2^i can be arbitrarily chosen, but for convenience w_1^i = w_2^i = 1 are good values.

Finding H

Considering that H has nine entries might indicate that there are nine degrees of freedom, but since H is only determined up to scale, eight degrees of freedom remain. The scaling can be arbitrarily chosen, for example as ||h|| = 1.

Each corresponding point mapped between two frames determines two degrees of freedom: one point has one x and one y component, so two constraints are added for each mapped point. Hence four point correspondences are required to solve for H.

Provided four points, one has four sets of equations A'_i h = 0, according to (4.7). Stacking the rows of the four A'_i beneath each other into Ah = 0 provides in total eight equations to use for finding the unknown vector h. The solution is obtained by finding the null vector of A.

When more than four points are used to find H, the system of equations is overdetermined. If the points come from image coordinates they are likely to contain noise. From this follows that there is no exact solution to Ah = 0. The interesting solution is instead found by minimizing the norm ||Ah||, which can be achieved by finding the unit singular vector corresponding to the smallest singular value of A.
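The procedure above, stacking two rows of (4.7) per correspondence and taking the singular vector of the smallest singular value, can be sketched as below (with w_1^i = w_2^i = 1; a plain DLT sketch that omits the point normalization Hartley and Zisserman [2004] recommend in practice):

```python
import numpy as np

def fit_homography(x1, x2):
    """DLT estimate of H with x2 ~ H x1. x1 and x2 are (N, 2) arrays of
    corresponding points, N >= 4. Returns the 3x3 matrix H."""
    rows = []
    for (x, y), (u, v) in zip(x1, x2):
        p = (x, y, 1.0)
        # the two linearly independent rows of (4.7), with w2 = 1
        rows.append([0, 0, 0, -p[0], -p[1], -p[2], v * p[0], v * p[1], v * p[2]])
        rows.append([p[0], p[1], p[2], 0, 0, 0, -u * p[0], -u * p[1], -u * p[2]])
    A = np.asarray(rows)
    # h is the right singular vector of the smallest singular value of A
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```

With noise-free correspondences the recovered matrix equals the true homography up to scale, so normalizing both by one entry makes them directly comparable.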

From tracked feature points, described in Chapter 2, the idea is that it is possible to derive a homography that relates the features between the frames. The number of features is required to be more than four in this implementation, and hence an overdetermined system of equations is provided. Due to unreliable feature points it is necessary to detect outliers. A possible way to do this is to utilize a RANSAC method.

4.3 RANSAC

Kovesi [2014] provides a function named ransacfithomography that uses a RANSAC (RANdom SAmple Consensus) method to detect outliers. The function by Kovesi makes use of four randomly chosen points to derive a homography.


The homography is then applied to all other points and the distances between the plane and the points are calculated. Those points that lie within the considered threshold are considered inliers and those that do not are considered outliers.

After a fixed number of trials the homography that fits the largest number of inliers is chosen. The inliers of the best model are used to derive a new homography from an overdetermined system of equations, since it is based on more than four points. The obtained homography is then used to obtain the measured transformation of the ROI.
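A minimal version of such a RANSAC loop can be sketched as follows. It illustrates the general scheme and is not Kovesi's ransacfithomography; the threshold is the transfer distance in pixels, and the DLT helper is the four-point solver from Chapter 4.

```python
import numpy as np

def dlt(x1, x2):
    """Minimal DLT: H with x2 ~ H x1, from (N, 2) correspondences."""
    rows = []
    for (x, y), (u, v) in zip(x1, x2):
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
    return np.linalg.svd(np.asarray(rows))[2][-1].reshape(3, 3)

def ransac_homography(x1, x2, thresh=1.0, trials=200, seed=0):
    """RANSAC over 4-point samples: keep the hypothesis with the most
    inliers (transfer distance below thresh), then refit on all inliers."""
    rng = np.random.default_rng(seed)
    n = len(x1)
    best = np.zeros(n, dtype=bool)
    for _ in range(trials):
        idx = rng.choice(n, 4, replace=False)
        H = dlt(x1[idx], x2[idx])
        # project all of x1 through H and measure the distance to x2
        ph = np.hstack([x1, np.ones((n, 1))]) @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = ph[:, :2] / ph[:, 2:3]
            dist = np.linalg.norm(proj - x2, axis=1)
        inl = np.nan_to_num(dist, nan=np.inf) < thresh
        if inl.sum() > best.sum():
            best = inl
    # final homography from the overdetermined inlier system
    return dlt(x1[best], x2[best]), best
```

With exact inliers and a few grossly corrupted correspondences, a handful of trials is enough for one all-inlier sample to appear, after which the consensus set contains exactly the uncorrupted points.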


5 Implementation

This chapter describes the algorithms and the structure of the implementation. The chapter relates to the theory that is described in the previous chapters. The implementation suits the overall goal of the thesis well in theory since it utilizes both visible sides of vehicles to determine displacement.

5.1 Background

How to represent the objects of interest came to be the most vital part of the implementation. The object of interest is a vehicle observed in a more or less skewed pose. The implementation is provided images and predictions of vehicles' states and state covariances from Autoliv's existing system. The predicted states are estimates of vehicles in terms of position, speed, heading, yaw rate and size. A vehicle's position and heading direction are relative to the ego vehicle, but a vehicle's speed and yaw rate are relative to the world. The existing system also provides a transition function and a function to determine the region of interest, ROI. It is Autoliv's intention to not lay out any details of how they choose to describe the state of a vehicle or how their transition function is designed. The ROI is a patch of the image where the vehicle is located, and the ROI is derived from the vehicle's state. Two quadrangles represent the ROI, one for the back of the vehicle and one for the side. The properties of the ROI depend on the vehicle's position and heading direction. The transition function determines an estimate of a state forward in time. Autoliv's existing transition function is also applicable inversely: by applying the inverse transition function one can estimate a state backwards in time. Since the prerequisites in this project provided a state that was updated in time, it was Autoliv's intention to use the inverse transition function to determine the previous state. In this implementation the ROI is used in combination with the inversely applied transition function to predict measurements.

An interface is used to utilize data and functionality from the existing system. The interface enables one to receive and send data to and from the existing system.

When a predicted state and predicted covariance are received from Autoliv's system, the implementation searches among existing tracks for a related state. Existing tracks consist of estimated states of vehicles and the coordinates of each vehicle's corresponding feature points. Depending on the Euclidean distance to previously tracked vehicles, the tracker will either initiate a new track or update an existing one.

5.2 Initiating track

If no existing track has a state that is close enough to the predicted state, in terms of position, a new track is initiated. The position of a state is described in world coordinates where the origin is set at the ego vehicle, the vehicle where the camera is attached. The estimated position of a vehicle corresponds to the location of the point in Figure 1.1 from where the arrow for heading direction originates. When a new track is initiated the tracker locates new features within the determined ROI, or more specifically, within the area of interest, AOI, which is a more restricted area inside the ROI. This is shown in Figure 5.1, where the AOI is given by the two smaller rectangles that are inside the ROI. The state and the covariance of the initiated track are set equal to the predicted state and covariance as

\hat{x}_{k|k} = \hat{x}_{k|k-1},
P_{k|k} = P_{k|k-1},   (5.1)

where \hat{x}_{k|k-1} and P_{k|k-1} are the predicted state and covariance received from the existing system. The updated state and covariance, \hat{x}_{k|k} and P_{k|k} respectively, are saved and also sent back to Autoliv's system. The image is saved so that one can track the located features in a following image.

The implementation makes use of a function from OpenCV [2014b] to extract feature points. Features are required to be positioned with a minimum distance, pixel-wise, from the closest neighbouring feature. This is to prevent feature points from being located too close to each other since, otherwise, there is a risk that they will be exchanged with each other. Exchanged feature points would decrease the reliability of the tracker since obtained measurements would contain more noise. A relative quality measure is used for feature points: no feature is allowed to have less than 10 percent of the quality obtained from the best feature. The quality is obtained from the left hand-side expression of equation (2.10).

Figure 5.1: Vehicle spotted from behind with feature points initiated on back and side. The ROI is drawn in white and divided into two parts, one representing the back and one the side. The AOI is drawn in green and can be seen as the two inner rectangles.
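The quality criterion can be illustrated with the minimum-eigenvalue (Shi-Tomasi) response that underlies the OpenCV function. This is an illustrative NumPy sketch, not the OpenCV implementation; the window sum is simplified and wraps at the image border.

```python
import numpy as np

def min_eig_response(img, w=2):
    """Minimum eigenvalue of the windowed structure tensor G at every
    pixel, i.e. the feature quality from the left-hand side of (2.10)."""
    Ix, Iy = np.gradient(np.asarray(img, dtype=float))
    def box(a):  # (2w+1)^2 box sum (sketch; wraps at the border)
        out = np.zeros_like(a)
        for dx in range(-w, w + 1):
            for dy in range(-w, w + 1):
                out += np.roll(np.roll(a, dx, axis=0), dy, axis=1)
        return out
    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    tr, det = Sxx + Syy, Sxx * Syy - Sxy**2
    # smaller eigenvalue of [[Sxx, Sxy], [Sxy, Syy]]
    return tr / 2 - np.sqrt(np.maximum(tr**2 / 4 - det, 0.0))

# keep only candidates with at least 10 percent of the best quality
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0  # a white square; its corners are good features
resp = min_eig_response(img)
mask = resp >= 0.1 * resp.max()
```

Corners of the square score high in both eigenvalues, edges only in one, and flat regions in neither, so the 10 percent mask keeps corner-like points only.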

The AOI is used to ensure that found feature points belong to the vehicle and to exclude corners. Feature points located on corners might change characteristics depending on reflections and variations in light. Such feature points are therefore less reliable and not desirable to rely on when estimating the state of a vehicle. A feature point located on a corner could also be difficult to assign to any side of the vehicle since it might not fit any of the planes that represent the vehicle's sides.

Feature points from the side of the vehicle are used only if the side is represented in a patch that is wide enough. Figure 5.1 shows this situation where feature points are drawn on both the back and the side of the vehicle. The image patch representing the side in Figure 5.1 is on the verge of not being utilized for tracking feature points since it is very narrow.


5.3 Tracking

When the Euclidean distance between the predicted state's position and any existing track's position is small enough, the tracker will make use of the matched track's feature points. The feature points belonging to that track are located in the current image and from the tracked feature points the measurement is derived. The idea is to describe the displacement of feature points between images with homographies and apply the same homographies on the ROI that is derived from the predicted state. The result from applying a homography on a ROI is a new ROI. The considered measurement in this implementation is the ROI that is retrieved from applying the measured homography on the predicted ROI.

5.3.1 Sigma points from predicted state

The tracker determines a predicted measurement, based on the properties of the UKF, from the state's ROI. The predicted state, \hat{x}_{k|k-1}, and the predicted covariance, P_{k|k-1}, are used to generate sigma points. The sigma points are distributed according to (3.5), but \hat{x}_{k|k-1} and P_{k|k-1} are used instead of \hat{x}_{k-1|k-1} and P_{k-1|k-1}.

Since the idea of the observation model is to estimate the previous state's ROI, the observation model utilizes the fact that the transition function can be applied inversely to retrieve the previous state. The observation model applies the inverse transition function on each sigma point according to

f^{-1}( \hat{χ}^i_{k|k-1} ) = χ^i_{k-1|k-1}.   (5.2)

The obtained sigma points, χ^i_{k-1|k-1}, are used to determine their respective ROIs. This is possible since a sigma point contains the same type of information as a state. The corners of their ROIs are the observations, γ^i_k, as in (3.10), and they are used to determine the predicted measurement, \hat{z}_k, according to (3.11). The predicted measurement, \hat{z}_k, is hence an estimate of the previous state's ROI.

Figure 5.2 shows the corners that are used to describe a ROI. Only six points are needed to represent a ROI since the two in the middle are used both for the back and for the side. By moving the corners of a ROI one obtains a transformed ROI that is possible to describe by applying homographies on the original. Depending on whether the side of the vehicle is used or not, the measurement will consist of either four or eight points. The side is not used if the patch that constitutes the side in the image is too narrow. When both sides are used there are two sets of points that contain the corner points of the L-shape. The points at the corner are duplicated since the implementation separates the measurements obtained from the back and from the side of the vehicle.


Figure 5.2: Vehicle spotted from behind with the ROI drawn. Points at the corners of the ROI are used to describe the transformation of the vehicle.

5.3.2 Predicted measurements

The obtained sets of corner points are the observations, γ^i_k, of the sigma states, h( \hat{χ}^i_{k|k-1} ), in accordance with (3.10). Observation γ^i_k consists of four pairs of u and v coordinates for the corners as

γ^i_k = [ u^i_{k,1}  v^i_{k,1}  u^i_{k,2}  v^i_{k,2}  u^i_{k,3}  v^i_{k,3}  u^i_{k,4}  v^i_{k,4} ]^T.   (5.3)

The predicted measurement is determined both for the back and for the side of the ROI. Hence two separate observations are obtained for each set of sigma points.

The predicted measurement, \hat{z}_k, is thereafter obtained by using (3.11), and the predicted measurement covariance according to (3.12).

An alternative approach for the predicted measurement would be to apply the transition function inversely on the predicted state \hat{x}_{k|k-1} to retrieve \hat{x}_{k-1|k-1}, then generate sigma points around \hat{x}_{k-1|k-1} and propagate those through the transition function. Doing so would provide a predicted measurement \hat{z}_k for the current state's ROI.

The weights that are used in the filter are chosen according to (3.8) and the Kalman gain is derived as in (3.14), where the covariance matrix R is estimated from data. The matrix R is further tuned manually to give the filter satisfactory properties.

5.3.3 Optical flow of feature points

The coordinates of the related track's feature points and the previous image are used to determine the optical flow. The feature points belonging to the related track, with state \hat{x}_{k-1|k-1}, are searched for within the ROI that belongs to the predicted state, \hat{x}_{k|k-1}. The search for every feature is restricted to the neighbourhood of the feature's location in the previous frame. The matched features in the current frame are required to be within the ROI in order to prevent erroneous features from affecting the performance of the filter.

In summary, the implementation finds new coordinates within the new image for the feature points from the previous image. The feature points' displacements between the images are the optical flow. The search for matching features follows the approach described in Chapter 2 and is performed through the implementation from OpenCV [2014a]. See Algorithm 1 in Chapter 2 for an overview.

5.3.4 Measurement from homography

The matched feature points are used to derive an estimate of the vehicle's transformation. A homography is computed to match the displacement of the feature points such that it describes the displacement from the current frame to the previous one. If both visible sides are used for the track, two homographies are derived independently of each other.

The feature points are converted into homogeneous coordinates with scale factor one, see page 25. The homography is derived with the implementation from Kovesi [2014], which uses the theory described in Chapter 4. The minimum number of feature points is eight, and hence an overdetermined system of equations is provided, see page 27 for details. The requirement of having at least eight tracked feature points applies to each side of the vehicle. The requirement is set to obtain a more robust estimate of the vehicle's transformation between images, since it is assumed that some feature points are tracked erroneously.

This stage introduces uncertainty since it is assumed that all features belong to a flat plane, which in general is not the case with features found in a frame. For example, a feature located on the towbar or the rear-view mirror is distanced from the plane that is represented by the features located on the licence plate or front door.

The assumption that all feature points lie in a plane requires that outliers are detected and removed so that only reliable feature points are used to estimate the homography. This implementation makes use of the ransacfithomography function written by Kovesi [2014] to sort out outliers and provide a homography that is determined from inliers. Figure 5.3 shows a track with two outliers.

The homographies obtained from the tracked feature points are used to provide the measurement, zk. From applying the obtained homographies on the

References
