
Institutionen för systemteknik
Department of Electrical Engineering

Detection and tracking of overtaking vehicles

Master's thesis carried out in Automatic Control at the Institute of Technology, Linköping University

by

Daniel Hultqvist

LiTH-ISY-EX--13/4689--SE

Linköping 2013

Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden


Detection and tracking of overtaking vehicles

Master's thesis carried out in Automatic Control at the Institute of Technology, Linköping University

by

Daniel Hultqvist

LiTH-ISY-EX--13/4689--SE

Supervisors: Johan Dahlin, ISY, Linköpings universitet
             Jacob Roll, Autoliv Electronics
             Fredrik Svensson, Autoliv Electronics

Examiner: Thomas Schön, ISY, Linköpings universitet


Division, Department: Division of Automatic Control, Department of Electrical Engineering, SE-581 83 Linköping
Date: 2013-06-17
Language: English
Report category: Examensarbete (master's thesis)
URL for electronic version: http://www.ep.liu.se
ISRN: LiTH-ISY-EX--13/4689--SE
Title: Detektion samt följning av omkörande fordon (Detection and tracking of overtaking vehicles)
Author: Daniel Hultqvist


Keywords: mono camera, vehicle detection, vehicle overtaking, vehicle cut-in, optical flow, feature tracking


Sammanfattning

The car has become bigger, faster and more advanced for every year since it was invented, and the safety requirements have also become stricter. One rapidly growing area within safety functionality is to use cameras in the car to provide computer-vision based assistance. It can be used to detect people walking out into the street, to warn for wildlife on a cold January night using a night-vision camera, and much more.

The area this thesis investigates is the early detection of cars entering the image from the side. Cars that overtake are only partly visible at first, which makes it hard for standard detection algorithms to obtain a positive response. The ability to detect an overtaking car early is critical for making fast decisions, e.g. braking if the car cuts in too sharply. A new method called the Wall detector is proposed, which detects incoming cars using one-dimensional optical flow. Under the assumption that an overtaking car drives in parallel with the ego-vehicle, both cars move towards the vanishing point in the image. A detection wall, consisting of several detection lines sloping towards the vanishing point, is created such that all such objects move along the lines.

The result is a fast detector with good detection performance in real time. Several methods for detecting movement along the lines are presented, where a feature-point approach proves to be the best. The information from the detection can be passed on to heavier algorithms, e.g. to increase the confidence of a classification or to initialize a new track.


Abstract

The car has become bigger, faster and more advanced for each passing year since its first appearance, and the safety requirements have also become stricter. Computer vision based support is a growing area of safety features, where the car is equipped with a mono or stereo camera. It can be used for detecting pedestrians walking out into the street, giving a warning for wildlife during a cold January night using night-vision cameras, and much more.

This master thesis investigates the problem of detecting and tracking overtaking vehicles. Vehicles that overtake are only partly visible in the beginning, making it hard for standard detection/classification algorithms to get a positive detection. The ability to quickly detect an incoming vehicle is crucial for taking fast counter-measures, such as braking, if needed. A novel approach referred to as the Wall detector is suggested, detecting incoming vehicles using one-dimensional optical flow. Under the assumption that an overtaking car is moving in parallel to the ego-vehicle, both cars are moving towards the vanishing point in the image. A detection wall, consisting of several detection lines angled towards the vanishing point, is created, making all objects that are moving parallel to the ego-vehicle move along these lines.

The result is a light-weight and fast detector with good detection performance in real-time. Several approaches for the Wall detector are implemented and evaluated, revealing that a feature based approach is the best choice. The information from the system can be used as input to heavier algorithms, e.g. to boost the confidence of a classification or to initialize a track.


Acknowledgments

First, I would like to thank Autoliv Electronics for the opportunity to do this master thesis, supplying the idea and an office to carry out the work. A huge thanks to my supervisors at Autoliv Electronics, Jacob Roll and Fredrik Svensson, for their invaluable support, providing both theoretical and practical expertise. A special thanks goes to my supervisor at Linköping University, PhD student Johan Dahlin, for all the discussions that often resulted in a new way of thinking. His comments and suggestions on the report have been especially helpful. I would also like to thank my examiner, Dr. Thomas Schön, ISY at Linköping University, for his valuable comments on both the report and the work throughout the thesis.

Finally, a great thanks to my family. Without your love and support, none of this would have been possible.

Veni, vidi, vici

Linköping, June 2013
Daniel Hultqvist


Contents

1 Introduction
  1.1 Problem formulation
  1.2 Related work
  1.3 Implementation overview
    1.3.1 Detection
    1.3.2 Tracking
  1.4 Limitations
  1.5 Results
  1.6 Autoliv Electronics
  1.7 Outline

2 Optical flow
  2.1 Optical flow approaches
  2.2 One-dimensional optical flow
  2.3 Cost functions
    2.3.1 Estimating the displacement in 1D
  2.4 Two-dimensional optical flow
    2.4.1 Generalization to multiple dimensions
    2.4.2 Reducing complexity
    2.4.3 Model selection
    2.4.4 Finding good features to track

3 The Wall Detector
  3.1 Creating detection lines
  3.2 Retrieving line values
  3.3 Pre-processing of lines
  3.4 Tracking the entire line
    3.4.1 Advantages
    3.4.2 Disadvantages
    3.4.3 Improving the method, sub-lines
  3.5 Tracking 1D features using Lucas-Kanade
    3.5.1 Finding the features
    3.5.2 Putting it together
  3.6 Tracking features using dynamic programming
  3.7 Common issues and solutions

4 Template tracker
  4.1 Creating a detection zone
  4.2 Implementation of the KLT tracker
  4.3 Scale pyramid
  4.4 Feedback to the system
  4.5 Handling template drifting
  4.6 Summary

5 Experimental results and discussion
  5.1 Wall detector evaluation
    5.1.1 Ability to track
    5.1.2 Convergence
    5.1.3 False hits when no overtakes
    5.1.4 Ability to detect incoming vehicles
    5.1.5 Overall performance
  5.2 2D Tracker evaluation
    5.2.1 Convergence
    5.2.2 Overall performance

6 Concluding remarks
  6.1 Conclusions
  6.2 Future work

Bibliography

1 Introduction

The safety features of a vehicle have always been of great importance, and more active features are nowadays being added, e.g. camera-vision based support. If a camera is mounted behind the windshield of the car, the system can e.g. identify pedestrians or determine if a potentially dangerous situation is upon the driver. An example of a potentially dangerous situation is when a car is overtaking and cutting in too close, forcing the driver to brake. It is of great importance to quickly detect any incoming object, so that situation analysis can be performed as early as possible. This thesis investigates how such detections can be made and how to use the resulting output for tracking.

Figure 1.1: Camera to be mounted behind the windshield.

If a vehicle is detected, secondary measurements that are otherwise too costly to always be made can be performed. The implementation is a small part of the complete safety system, which leads to performance requirements on both speed and robustness. The system should be able to run faster than real-time, i.e. allowing the camera update rate to be above 25 Hz, and also give a low percentage of false detections.

1.1 Problem formulation

The main problem in this thesis is to quickly detect incoming movement in an image using a mono camera system. The movements should be processed to give a robust interpretation of specific parts of the scene. Figure 1.2 shows a typical overtake sequence, first displaying normal driving along the highway, then a car entering the field of view. The car is quickly detected and tracked once a confirmed detection is made.

(a) Before vehicle is overtaking. (b) Detect vehicle. (c) Find features to track periodically. (d) Continue tracking.

Figure 1.2: The different stages of the system. At first, the overtaking car has not yet come into the field of view. The car then enters the scene more and more for each frame, allowing it to be detected. Once a confirmed detection is made, the car is tracked outside the detection zone.


At first, the overtaking car needs to be detected. The detection should be made within a detection area which can be fixed or adapted to previous information. A detection should have some form of confirmation that it is a correct detection, reducing the number of false hits.

If a detection is confirmed, the second subsystem that tracks the incoming vehicle should be activated. The tracking system can be computationally heavier than the detection part, but must still show real-time performance. Actions can be taken depending on the estimated velocity of the tracked vehicle, such as braking if the car is cutting in too close.

1.2 Related work

The detection of vehicle overtaking has been researched previously, resulting in a number of different methods. Morizane et al. [1999] proposed an early cut-in vehicle recognition system. The system uses a fixed detection area and investigates if the area has visually changed between frames. The results are dependent on the scenery and are considered not robust enough.

Optical flow [Horn and Schunck, 1981, Lucas and Kanade, 1981] is a computer vision algorithm that estimates how areas of pixels have moved between consecutive frames. The flow can describe how the vehicles are moving in relation to the ego-motion of the host vehicle and also allows the vehicles to be partly occluded. One cut-in detection system was proposed by Batavia et al. [1997], which used one-dimensional optical flow and fixed detection zones to detect incoming vehicles from behind. The general concept of using one-dimensional optical flow is used in this thesis as well, but with the camera looking ahead instead.

Both Baehring et al. [2005] and Garcia et al. [2012] build their solutions around two-dimensional optical flow by investigating the direction of the flow vectors. It is stated in both of these papers that the two-dimensional approach needs additional input to get a robust system, e.g. radar data or inertial sensors. While they are able to run in real-time, it is not clear if they are fast enough to allow multiple other subsystems to run as well. The task of finding two-dimensional feature points suitable for tracking is potentially too heavy to run continuously. Looking a bit outside the scope of this thesis, a stereo camera could also be used, as is done by Barth and Franke [2008]. This can increase robustness and accuracy, but requires additional hardware for both disparity calculations and input data.

One common issue that all optical flow implementations share is camera shakiness, which can introduce errors in the motion vectors. A method of stabilizing the image could improve the robustness of the system. Broggi et al. [2005] propose a method which uses histograms of gradients. While it did not make it into this thesis, it could potentially improve the performance significantly and is worth mentioning for future work.


1.3 Implementation overview

In this thesis, a couple of different methods to detect cutting-in vehicles and track them are implemented and evaluated. The complete system can be divided into two main parts, the detection part and the tracking part. They are connected as seen in Figure 1.3.

Figure 1.3: General flowchart for the algorithms.

The main focus of this thesis lies on the detection of overtaking vehicles. As the thesis was done at Autoliv Electronics, the information from the detector could be passed along to already existing algorithms. The tracking part was mainly used as a visualizer. Both subsystems are now briefly described.

1.3.1 Detection

The detection sub-system consists of the Wall detector, which quickly detects an overtaking vehicle. This detector uses one-dimensional optical flow along detection lines to detect moving objects in the image. Several approaches to calculate and use the optical flow are implemented and evaluated; they are briefly summarized in Table 1.1. If a vehicle is moving along the detection lines in a certain direction, it is considered to be a detected overtaking vehicle. More details about this algorithm are presented in Chapter 3.

Table 1.1: Implemented methods of detecting an overtaking vehicle.

  Entire line tracking: One-dimensional detection lines are created at the far left side of the image. The overall movement of the pixel intensities on the line between consecutive frames is estimated.
  Sub-line tracking: Conceptually the same as the entire line tracking, but with the detection lines cut into smaller parts, allowing more precise tracking.
  Feature point tracking (LK): Find feature points along the detection lines and track these using the Lucas-Kanade algorithm.
  Feature point tracking (DP): Find feature points along the detection lines and track these using a dynamic programming approach.

1.3.2 Tracking

The tracking sub-system consists of a feature detector, which finds corners within a detection zone, and a template tracker. The feature point detector is the Shi-Tomasi detector [Shi and Tomasi, 1994], which provides features with good properties for tracking. The template tracker is based on the well-known KLT tracker [Tomasi and Kanade, 1991], but is optimized for lower computational complexity. A scale pyramid is used to allow large displacements. More details about this algorithm are presented in Chapter 4.

The application is implemented in Matlab using recorded data from test vehicles. Parts of the implementation were ported to unoptimized C/C++ to investigate the proper performance of the implementation.

1.4 Limitations

While the implemented system should put focus on speed, it was not possible to test the system in real-time inside a car. The theory behind the algorithms states that they are fast enough, but an actual implementation in a working car is left as a future improvement. The implemented system currently only detects on the left side of the image; the expansion to the right side is trivial and left as future work as well. The current system only handles the case when the ego-car is driving straight; extending it to work in curves etc. can be done with additional sensor data from the vehicle and is also left as future work.

1.5 Results

The main result from this thesis is a fast detector with good performance. The detector is able to run in real-time when implemented in C++ and has a high detection rate. The information retrieved from the detector can be used as input for heavier algorithms, e.g. tracking algorithms. This is demonstrated by using the information as input to the two-dimensional template tracker. More details about the results can be found in Chapter 5.

1.6 Autoliv Electronics

Autoliv was founded in 1953 by the brothers Stig and Lennart Lindblad under the name Autoservice AB, where the first product was the two-point seatbelt. The name was changed in 1968 to Autoliv AB (AUTOservice Lindblad In Vårgårda). The company grew large by buying several companies during the 1980s and 1990s, and in 1997 it merged with the American company Morton ASP Inc. to create Autoliv Inc.

The Autoliv Inc. group is one of the biggest actors within vehicle safety, with a global market share of 35% in passive safety and 20% in active safety. There are approximately 51,000 employees globally, of whom 4,600 work within research and development.

Autoliv Electronics is a part of the Autoliv Inc. group and specializes in developing active safety features for vehicles such as radar, night vision and camera vision systems. Development of the camera systems is done on both mono camera systems and stereo camera systems. The mono camera is cheaper and easier to integrate inside the vehicle, but lacks a simple solution to get depth in an image. However, algorithms can be developed to overcome this issue and still get a good picture of the surroundings.

1.7 Outline

First, the theory of the thesis is presented in Chapter 2 to give the ground to stand on when implementing the system. The different optical flow methods are presented, starting with the one-dimensional case and then extending it to two dimensions. Besides the Lucas-Kanade algorithm, a dynamic programming solution which matches feature points is also proposed. The complexities of the different methods are described to indicate which methods could be used. Based on the knowledge from the theory chapter, the focus is then shifted to the implementation of the system. The first implementation part, Chapter 3, describes the Wall detector, a detection approach using one-dimensional optical flow. The information from the Wall detector is then passed on to the template tracker, described in Chapter 4.

The evaluation of the implementation and the discussion around it are presented in Chapter 5. The thesis is summarized in Chapter 6 with conclusions and suggestions for future work.

2 Optical flow

This chapter describes methods of tracking movements between two or more consecutive frames, which is the base for the implementation part. Optical flow algorithms are used to track how pixel values have moved between the frames by minimizing some error function. Knowing the movement in some parts of the scene, conclusions can be drawn about the general movement of the entire scene.

2.1 Optical flow approaches

Optical flow can be calculated in a local or a global sense. The local approach performs the calculations within a local region Ω and assumes that the movement vector v is constant in the entire region. A cost function is minimized within the local region to estimate the motion vector. The global approach instead estimates the motion vector for each pixel separately to minimize a cost function over the entire image. In order to get a unique solution, a smoothness term needs to be added, which aims to have as little variation in the motion vector field as possible. The global approach is more sensitive to noise, due to the calculations being performed on the entire image, and it is also more computationally heavy.

Figure 2.1 shows the results of the two approaches trying to estimate the motion vectors in a 2D image with a synthetic box that has moved between two frames. The local approach calculates a constant motion vector in a region around each pixel. As can be seen, the regions within the box cannot be matched, since a unique solution cannot be found. This is because this synthetic image is single colored without any textures. The corners, however, have uniqueness to them, since they include both the background and the box, making it possible to find the correct position in the second frame. The global approach estimates the motion vector field over the entire image. The smoothness term makes sure that the variation in the motion vector field is as small as possible, which in this case makes all the vectors move in the same direction.

(a) Real translation. (b) Local flow. (c) Global flow.

Figure 2.1: Different methods to calculate optical flow. The local approach calculates the movement based on a small region around each pixel. The global approach tries to estimate the motion vectors based on the entire image, trying to minimize the variance between all motion vectors.

The local approach is faster, more robust to noise and works better in real images with textured environments. The smoothness constraint in the global optical flow approach forces motion vectors to be similar, which does not always correspond to the true motion. Instead, the local approach allows the scene to be more dynamic. With these conclusions, the local approach is used in this thesis. One common local optical flow algorithm is the Lucas-Kanade (LK) algorithm [Lucas and Kanade, 1981]. The optical flow can be calculated in single or multiple dimensions using this algorithm. Since the various methods implemented in this thesis use both single and multiple dimension optical flow, both are described in Sections 2.2 and 2.4, respectively.

2.2 One-dimensional optical flow

The main concept of one-dimensional optical flow is to estimate the horizontal disparity h between the curves F(x) and G(x) = F(x + h) that minimizes a specified cost function. An example that illustrates this disparity can be seen in Figure 2.2. The curve values from the previous frame are fixed while the curve values in the current frame are translated along the x-axis. If a cost threshold is met, the matching is considered successful.


Figure 2.2: Left: Interpolated values from a fixed line in an image. Values on the line in the previous frame are blue and in the current frame are red. The disparity between the curves is h. Right: The current curve has been translated to give the best match and the estimated motion h.

The displacement can describe how an object is moving along a line. It is vital that the values are moving either completely to the left or completely to the right in order to obtain a good estimate. If half of the values are moving in one direction and the other half in the other direction, then the tracking fails. This implies that the length of the curve is an important variable; a short line has the potential to be easier to match, while a long line can be more robust against noise, as it removes the effect of outliers by averaging.

2.3 Cost functions

When estimating the displacement in N dimensions, a good cost function \( \epsilon \) should be used; it should preferably be somewhat convex and allow easy further derivations. Some common choices of functions are:

• L1-norm:
\[ \epsilon = \sum_{x \in T} |F(x + h) - G(x)|. \tag{2.1} \]

• L2-norm:
\[ \epsilon = \left( \sum_{x \in T} [F(x + h) - G(x)]^2 \right)^{1/2}. \tag{2.2} \]

• Negative of normalized correlation:
\[ \epsilon = - \frac{\sum_{x \in T} F(x + h)\, G(x)}{\left( \sum_{x \in T} F(x + h)^2 \right)^{1/2} \left( \sum_{x \in T} G(x)^2 \right)^{1/2}}. \tag{2.3} \]


Here, T denotes the local region around the current point of interest and h denotes the displacement of this region between two consecutive frames F(x) and G(x). The L2-norm is the most widely used in the literature, because it yields good results and it is easy to get a good analytic derivative when performing the displacement calculations [Lucas and Kanade, 1981]. In practice, when interpolated pixel intensities are used as input, the selected cost function might not be convex and may therefore contain more than one local minimum. Because of this, the minimization problem can sometimes be difficult to solve. The listed cost functions only take translation into account; they are extended in Section 2.4.3 to allow more advanced transformations, e.g. scaling and skewing.

2.3.1 Estimating the displacement in 1D

In this thesis, the L2-norm in (2.2) is chosen as the cost function for estimating the displacement \( \hat{h} \). Equation (2.2) can be simplified by performing a first-order linear Taylor expansion. The derivative of the cost function w.r.t. h can then easily be calculated and, by setting it to zero, the local minimum can be found,

\[
\frac{\partial \epsilon}{\partial h}
= \frac{\partial}{\partial h} \sum_{x \in T} [F(x + h) - G(x)]^2
\approx \frac{\partial}{\partial h} \sum_{x \in T} [F(x) + h F'(x) - G(x)]^2
= \sum_{x \in T} 2 F'(x) [F(x) + h F'(x) - G(x)] = 0.
\]

Rearranging terms gives us the estimated displacement,

\[
\hat{h} = \frac{\sum_{x \in T} F'(x) [G(x) - F(x)]}{\sum_{x \in T} F'(x)^2},
\]

where \( F'(x) \) denotes the derivative of F(x) w.r.t. the one-dimensional parameter x. The cost function is minimized by an iterative approach using the iterative form,

\[
h_0 = 0, \qquad
h_{k+1} = h_k + \frac{\sum_{x \in T} F'(x + h_k) [G(x) - F(x + h_k)]}{\sum_{x \in T} F'(x + h_k)^2}.
\]

The estimated displacement \( \hat{h} \) is assumed to be small, given that a Taylor expansion was performed. If the real displacement is too large, the expression might not be valid and may result in bad performance, e.g. divergence. One way to get around this problem is to use a scale pyramid, calculating a rough estimate at first and then refining it during each step. More of these tricks are presented in the following chapters that discuss the implementation.
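To make the update concrete, the following Matlab sketch implements the iterative estimate above for two sampled line signals. The function name, the linear extrapolation choice and the stopping parameters are illustrative assumptions and not taken from the thesis implementation.

```matlab
% Minimal sketch of the iterative 1D displacement estimate above.
% F, G: row vectors with the line intensities in the previous and the
% current frame. Returns the estimated sub-sample displacement h.
function h = estimateDisplacement1d(F, G, maxIter, tol)
  x = 1:numel(F);
  h = 0;
  for k = 1:maxIter
    % Resample F at the shifted positions, i.e. F(x + h_k).
    Fs  = interp1(x, F, x + h, 'linear', 'extrap');
    dFs = gradient(Fs);                        % derivative F'(x + h_k)
    dh  = sum(dFs .* (G - Fs)) / sum(dFs.^2);  % update step from above
    h   = h + dh;
    if abs(dh) < tol, break; end               % converged
  end
end
```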

2.4 Two-dimensional optical flow

The one-dimensional optical flow can be extended to N dimensions [Lucas and Kanade, 1981]. In the two-dimensional case, a pixel (assuming images) is tracked between two consecutive frames by first extracting a template around the pixel in the first image. The algorithm then tries to find the best matching position for the template in the second image. Common values for the template side size \( \omega_t \) are 9, 11, 13, 15 and 17 pixels. A demonstration of a template is shown in Figure 2.3, with a large template of 30×30 pixels for better visualization.

(a) Image with template marked. (b) Extracted template.

Figure 2.3: Left: Whole image with the template marked around a pixel. Right: The extracted template that is used for the tracking between consecutive frames.

There are two main concepts of two-dimensional optical flow: dense and sparse. In dense optical flow, the displacement for each pixel in the image is calculated, which gives motion vectors over the entire image. This can be computationally costly if the template is large. Another issue is that not all pixels are suitable for tracking, because they potentially lack uniqueness. An example of this is a pixel located on the road surface, where the template around it would be without any texture, making it impossible to find a unique match. Instead, sparse optical flow tries to find the best pixels/features to track before the actual optical flow calculations are made. The calculations are only performed on the selected features from the detector, which significantly decreases the computational burden. Methods of finding good features to track are described in Section 2.4.4.

The general algorithm using sparse optical flow consists of the following steps:

1. Detect feature points in the previous frame.

2. For each feature point, extract a template of a specified size around the feature point.

3. Estimate the displacement of the template in the next image by minimizing some cost function.


The estimated motion vectors can be used as feedback to the system for the next frame if the tracking should be performed over more than two consecutive frames. If the feature points are moving somewhat monotonically, this reduces the number of iterations needed for convergence.

2.4.1 Generalization to multiple dimensions

The derivation for the optical flow in Section 2.3.1 can be generalized to multiple dimensions and extended to support more transformations than pure translation. Rewriting the equations somewhat, the standard forward additive approach of Lucas-Kanade [Baker and Matthews, 2004] can be written as,

\[ \epsilon = \sum_{x \in T} [I(W(x; p)) - T(x)]^2, \tag{2.4} \]

where I(x) denotes the current frame and T(x) denotes the extracted template around a feature in the previous frame. W(x; p) denotes the warp function that transforms a template according to a selected transformation model. A couple of common transformation models are described in Section 2.4.3. The parameter p denotes the coefficient values used in the selected transformation model, e.g. scaling ratio or translation. These parameters are iteratively adjusted with \( \Delta p \) to minimize the cost function. Equation (2.4) is solved using the Lucas-Kanade algorithm by iteratively minimizing the expression,

\[ \epsilon = \sum_{x \in T} [I(W(x; p + \Delta p)) - T(x)]^2. \tag{2.5} \]

As in the one-dimensional case, a first-order Taylor expansion is made to linearize (2.5), which gives,

\[ \epsilon \approx \sum_{x \in T} \left[ I(W(x; p)) + \nabla I \frac{\partial W}{\partial p} \Delta p - T(x) \right]^2, \tag{2.6} \]

where the gradient of the image I, denoted \( \nabla I = (\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}) \), is calculated within the warped template area. The term \( \frac{\partial W}{\partial p} \) denotes the Jacobian of the warp function. The update parameter \( \Delta p \) is the parameter of interest, so the derivative of the expression w.r.t. \( \Delta p \) is set to zero to find the optimal update step. This can be expressed as,

\[
\frac{\partial \epsilon}{\partial \Delta p} \approx 2 \sum_{x \in T} \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ I(W(x; p)) + \nabla I \frac{\partial W}{\partial p} \Delta p - T(x) \right] = 0
\;\Leftrightarrow\;
\Delta p = H^{-1} \sum_{x \in T} \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ T(x) - I(W(x; p)) \right], \tag{2.7}
\]

where H denotes the (Gauss-Newton approximation of the) Hessian matrix, i.e. \( H = \sum_{x \in T} \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ \nabla I \frac{\partial W}{\partial p} \right] \).


The parameter p is updated using,

\[ p \leftarrow p + \Delta p, \tag{2.8} \]

for the next iteration. To enforce convergence, an additional coefficient \( \alpha \in [0, 1] \) can be added, which is iteratively decreased if the new cost function value is larger than the previous one. The new expression is then

\[ p_{i+1} \leftarrow p_i + \alpha \Delta p. \tag{2.9} \]
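As a small illustration of (2.9), the damping coefficient can be realized as a simple backtracking step. The snippet below is a sketch; costFun is a hypothetical function evaluating the cost (2.4) for a given parameter vector p, and dp is an already computed update step.

```matlab
% Sketch of the damped update (2.9): halve alpha until the step no
% longer increases the cost. costFun, p and dp are assumed given.
alpha = 1;
while costFun(p + alpha*dp) > costFun(p) && alpha > 1/64
  alpha = alpha / 2;
end
p = p + alpha*dp;   % corresponds to p_{i+1} = p_i + alpha * delta p
```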

2.4.2 Reducing complexity

The forward additive approach for solving the optical flow has a complexity of O(n²N + n³) [Baker and Matthews, 2004], where n is the number of warp parameters and N the number of pixels in the template. The operation for calculating the Hessian, which needs to be recalculated in each iteration, is of complexity O(n²N) alone. An alternative approach is the inverse compositional one [Baker and Matthews, 2004], which precomputes many quantities and thus reduces the complexity. The approach switches the roles of the template and the current image and instead minimizes,

\[ \sum_{x \in T} [T(W(x; \Delta p)) - I(W(x; p))]^2, \tag{2.10} \]

where the parameter update is now,

\[ W_{i+1}(x; p) \leftarrow W_i(x; p) \circ W_i(x; \Delta p)^{-1}, \tag{2.11} \]

where the composition operation \( \circ \) is defined as,

\[ W_i(x; p) \circ W_i(x; \Delta p) \equiv W_i(W_i(x; \Delta p); p), \tag{2.12} \]

i.e. the parameter x is warped first with \( \Delta p \) and then with p. Because the template and the image have switched roles, many steps in the forward additive approach can be precomputed for the template, which is constant. This reduces the total complexity to O(nN + n³). For the full derivation of the inverse compositional approach, see Baker and Matthews [2004].

2.4.3 Model selection

The warp transformation is of great importance for the algorithm. The pure translation model has difficulties when the tracked object is changing size, e.g. moving away from or towards the camera; in this case a model with scaling and translation might suffice. An affine model might be the best choice if the object is rotating in the scene, which changes the appearance of the object. A couple of warp transformations and their corresponding Jacobians are presented in Table 2.1.

To illustrate the difference between the models, the similarity and affine transformations are shown in Figure 2.4. The translation and scaling models, which are both included in the other models, are trivial and not shown.


Table 2.1: Warp models and the corresponding Jacobians.

Translation:
\[ W(x; p) = \begin{pmatrix} 1 & 0 & p_1 \\ 0 & 1 & p_2 \end{pmatrix}, \qquad \frac{\partial W}{\partial p} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]

Scale + translation:
\[ W(x; p) = \begin{pmatrix} 1 + p_1 & 0 & p_3 \\ 0 & 1 + p_2 & p_4 \end{pmatrix}, \qquad \frac{\partial W}{\partial p} = \begin{pmatrix} x & 0 & 1 & 0 \\ 0 & y & 0 & 1 \end{pmatrix} \]

Similarity:
\[ W(x; p) = \begin{pmatrix} 1 + p_1 & -p_2 & p_3 \\ p_2 & 1 + p_1 & p_4 \end{pmatrix}, \qquad \frac{\partial W}{\partial p} = \begin{pmatrix} x & -y & 1 & 0 \\ y & x & 0 & 1 \end{pmatrix} \]

Affine:
\[ W(x; p) = \begin{pmatrix} 1 + p_1 & p_3 & p_5 \\ p_2 & 1 + p_4 & p_6 \end{pmatrix}, \qquad \frac{\partial W}{\partial p} = \begin{pmatrix} x & 0 & y & 0 & 1 & 0 \\ 0 & x & 0 & y & 0 & 1 \end{pmatrix} \]
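As an illustration of how Table 2.1 is used, the following Matlab snippet applies the affine warp to a single pixel coordinate; the variable names are illustrative and p is assumed to hold the six affine parameters.

```matlab
% Sketch: apply the affine warp from Table 2.1 to a point (x, y).
% p = [p1 p2 p3 p4 p5 p6] are the affine parameters.
W = [1 + p(1), p(3),     p(5);
     p(2),     1 + p(4), p(6)];
xw = W * [x; y; 1];   % warped 2-by-1 coordinate W(x; p)
```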

(a) Start. (b) Similarity. (c) Affine.

Figure 2.4: A stop sign is transformed using the similarity and affine models. The similarity model rotates and uniformly scales the sign, while the affine model performs skewing, rotation and non-uniform scaling.

2.4.4 Finding good features to track

It is crucial that the templates to be tracked have properties that are suitable for tracking, i.e. avoiding the aperture problem, among others. The template should be robust against noise in the image and preferably have some intuitive appearance. One common feature that can be tracked is corners. There exist plenty of corner detection algorithms, each with its respective advantages and disadvantages. The algorithm used in this thesis is called the Shi-Tomasi detector [Shi and Tomasi, 1994], which has its roots in the Harris corner detector [Harris and Stephens, 1988].

For each pixel in the detection area, a window/template is created around the pixel. The structure tensor A is created based on the values within this window W and it is formulated as,

\[ A = \sum_{W} \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix} = \begin{pmatrix} \langle I_x^2 \rangle & \langle I_x I_y \rangle \\ \langle I_x I_y \rangle & \langle I_y^2 \rangle \end{pmatrix}. \tag{2.13} \]

The corner properties of a pixel can be exploited by analyzing the eigenvalues of the structure tensor. A pixel is considered a corner if both eigenvalues are large and positive. By calculating the minimum of the two eigenvalues, a simple check can be made to see if it is above some pre-set threshold \( \lambda_t \),

\[ \min(\lambda_1, \lambda_2) > \lambda_t. \tag{2.14} \]

If this condition is fulfilled, the pixel is considered a corner. The threshold \( \lambda_t \) is determined empirically; typical values are often found in the range 0.1-0.25. For a real symmetric matrix, the eigenvalues can be found as,

\[
A = \begin{pmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{pmatrix}, \qquad
\det[A - \lambda I] = 0 \;\Rightarrow\; (I_{xx} - \lambda)(I_{yy} - \lambda) - I_{xy}^2 = 0,
\]
\[
\lambda = \frac{1}{2} \left( I_{xx} + I_{yy} \pm \sqrt{(I_{xx} - I_{yy})^2 + 4 I_{xy}^2} \right). \tag{2.15}
\]

Since the only interesting eigenvalue is the smallest one, the final decision rule for determining if a corner is present is,

\[
\frac{1}{2} \left( I_{xx} + I_{yy} - \sqrt{(I_{xx} - I_{yy})^2 + 4 I_{xy}^2} \right) > \lambda_t. \tag{2.16}
\]
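The decision rule (2.16) for a single pixel can be sketched in Matlab as below; Ix and Iy are assumed to be precomputed gradient images, and the function name and window parameter are illustrative.

```matlab
% Sketch of the minimum-eigenvalue corner test (2.16) at pixel (r, c).
function isCorner = minEigCheck(Ix, Iy, r, c, halfWin, lambdaT)
  rows = r-halfWin : r+halfWin;
  cols = c-halfWin : c+halfWin;
  wx = Ix(rows, cols);   wy = Iy(rows, cols);
  Ixx = sum(wx(:).^2);   Iyy = sum(wy(:).^2);   Ixy = sum(wx(:).*wy(:));
  % Smallest eigenvalue of the 2x2 structure tensor, cf. (2.15).
  lambdaMin = 0.5 * (Ixx + Iyy - sqrt((Ixx - Iyy)^2 + 4*Ixy^2));
  isCorner = lambdaMin > lambdaT;
end
```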

Alternatives that can be used are the Harris corner detector, which is very similar, or the FAST (Features from Accelerated Segment Test) corner detector [Rosten and Drummond, 2005]. The Harris detector considers the eigenvalue calculations too expensive and instead looks at other properties of the structure matrix A. The FAST detector exploits the fact that a circle around a corner pixel should have consecutive brighter or darker values on the circle points. The implementation is based on a great number of nested if statements, one for each possible case. Due to its simple structure, it is very fast.


3 The Wall Detector

Now that the theory has been presented, it is time to put it to action. The first part of the complete system is the detector, which identifies incoming cars. The detector must be robust, i.e. have a low number of false detections, and it must also be able to quickly detect an incoming car. Since the detection part needs to run continuously, the detector should be of low complexity. The overtaking detection and tracking system should, in other words, be a tiny part of a larger system, putting extra demands on the performance.

In this chapter, a novel concept called the Wall detector is proposed. Consider the situation when a car is overtaking the ego-vehicle. Both cars are assumed to have movement vectors v1 and v2 with the same direction but different magnitudes, i.e. the motion vectors are parallel. The situation is visualized in Figure 3.1, where the two vectors v1 and v2 are the parallel motion vectors.

Figure 3.1: Overtaking situation where car 1 with velocity v1 is moving faster than car 2 with velocity v2. Both motion vectors v1 and v2 are parallel and intersect in the vanishing point.

All lines that are orthogonal to the image plane are, in central perspective projection, mapped such that the lines intersect in the vanishing point. The vanishing point can be found in an image by examining where all horizontal lines that are orthogonal to the image plane intersect, which is displayed in Figure 3.2.

Figure 3.2: This figure illustrates how to retrieve the vanishing point. All horizontal lines (shown in red) orthogonal to the image plane intersect at the vanishing point, which is illustrated with an orange circle.

When the observer is driving straight, all parallel moving objects seen by the observer are moving towards or away from the vanishing point. This can be exploited by using a line that intersects a starting point and the vanishing point as the axis along which the optical flow should be calculated. Any near-parallel moving object is moving along this axis. With this knowledge, a Wall consisting of several detection lines is created at the edge of the image. If a car is overtaking, the pixel intensities of the vehicle are moving along these detection lines. The movement along these lines can be tracked by calculating the one-dimensional optical flow. Calculating the optical flow in one dimension is very cheap, allowing for a fast detection system.

3.1 Creating detection lines

The detection wall consists of several detection lines that all have a slope towards the vanishing point. When creating the detection lines, depending on the tracking method, at least three main features should be considered:

1. The lines must be angled towards the vanishing point, due to the assumption on how the objects move.

2. The lines should be neither too short nor too long, which would make the matching problem hard to solve, as discussed in Section 2.2.

3. The lines should be placed in such a way that vehicles driving in parallel with the ego-vehicle will be moving along the detection wall.


This thesis investigates the detection of overtaking vehicles, so the areas around the far left and far right of the image are of interest. A start column is created on both sides, and lines with a specified length are drawn evenly spaced from this column towards the vanishing point. Figure 3.3 displays an example of the detection wall, where the blue lines indicate the detection lines.
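A sketch of how such a wall could be constructed is given below; the vanishing point coordinates, start column, line length and spacing are illustrative assumptions.

```matlab
% Sketch: create detection lines angled towards the vanishing point.
vp      = [640, 360];       % assumed vanishing point (x, y) in pixels
startX  = 40;               % start column at the far left of the image
lineLen = 120;              % length of each detection line in pixels
startY  = 120:40:600;       % evenly spaced starting rows
lines   = zeros(numel(startY), 4);   % one [x1 y1 x2 y2] row per line
for i = 1:numel(startY)
  d = [vp(1) - startX, vp(2) - startY(i)];
  d = d / norm(d);          % unit direction towards the vanishing point
  lines(i,:) = [startX, startY(i), ...
                startX + lineLen*d(1), startY(i) + lineLen*d(2)];
end
```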

Figure 3.3: A detection wall consisting of blue detection lines. Each line starts from a user-specified position. The slope of each line is chosen such that it intersects the vanishing point.

3.2 Retrieving line values

When the detection lines have been set, the pixel values on each line need to be retrieved both in the previous and in the current frame. As an image is two-dimensional and sub-pixel accuracy is desired, bilinear interpolation is used to retrieve the intensity values. Bilinear interpolation retrieves the values from the four nearest integer pixels and weighs them together according to the distances to these pixels. Figure 3.4 shows the principle of bilinear interpolation.

Figure 3.4: Bilinear interpolation of a pixel value. Each of the four closest integer pixels contributes to the interpolated value according to its distance to the evaluation point.


The interpolated value can be calculated as,

\[ f(x, y) = p_1 (1 - a)(1 - b) + p_2\, a (1 - b) + p_3 (1 - a)\, b + p_4\, a b, \tag{3.1} \]

where \( p_i \) denotes the pixel intensities in the four nearest integer points. The parameters a and b denote the distances along the x- and y-axes between the first integer point and the wanted sub-pixel point. These distances can be found as,

\[ a = x_s - \lfloor x_s \rfloor, \qquad b = y_s - \lfloor y_s \rfloor. \]
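A direct Matlab sketch of (3.1) is given below; I is assumed to be a grayscale image indexed as I(row, col), and bounds checking is omitted for brevity.

```matlab
% Sketch of bilinear interpolation (3.1) at the sub-pixel point (xs, ys).
function v = bilinearSample(I, xs, ys)
  x0 = floor(xs);   y0 = floor(ys);
  a  = xs - x0;     b  = ys - y0;      % distances a and b as above
  v  = I(y0,   x0  ) * (1-a)*(1-b) ... % p1
     + I(y0,   x0+1) * a    *(1-b) ... % p2
     + I(y0+1, x0  ) * (1-a)*b     ... % p3
     + I(y0+1, x0+1) * a    *b;        % p4
end
```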

3.3 Pre-processing of lines

As in most problems, a pre-processing step is needed to increase the quality of the result and decrease the computational complexity. The first step in this implementation is to check if the line has good enough properties to allow it to be tracked. Examples of such properties could be:

• The standard deviation of the line values should be above a threshold.
• The line values should have at least n extrema.
• The maximum intensity difference should be above a certain threshold.

The reason why this is needed is probably best explained using an example. The detection lines that are located on the road surface have near-constant values, due to the one-colored, non-patterned form of the road. Trying to track these lines or features is a difficult problem, due to the lack of a unique solution. During the implementation phase, the following formula proved to give good results,

\[
\text{Action} =
\begin{cases}
\text{further calculations}, & \text{if } \operatorname{std}\!\left( F'(x) / \max(F(x)) \right) > \sigma_t \\
\text{skip line}, & \text{otherwise.}
\end{cases} \tag{3.2}
\]

The parameter x denotes all the points on the line and \( \sigma_t \) denotes a threshold which is determined empirically. The threshold parameter \( \sigma_t \) is set to a value within the range [0.002, 0.01] and is rarely changed between datasets.

3.4 Tracking the entire line

The simplest way to apply the one-dimensional optical flow in the detector is to investigate how all values along a detection line have moved between the previous and the current frame. The pseudo-code for the implementation is shown in Algorithm 1.

The interpolation only needs to be performed once per line in the previous frame. However, in the iteration phase, the line is translated along its axis, requiring the line values to be re-interpolated in the current frame. The displacement update \( \Delta h \) is calculated using the Lucas-Kanade algorithm, which gives sub-pixel accuracy, as described in Chapter 2.


Algorithm 1 Optical flow tracking of an entire line

Initialize line parameters: startX, stopX, k, m
for all detection lines do
    Interpolate line values in the previous frame. (Section 3.2)
    h ← 0. Δh ← Inf. iteration ← 1.
    while iteration < maxIterations AND Δh > updateThreshold do
        Displace startX, stopX by Δh.
        Interpolate line values in the current frame. (Section 3.2)
        Calculate the error function. (Equation 2.2)
        Update h and Δh. (Section 2.3.1)
        iteration ← iteration + 1.
    end while
    if error < errorThreshold then
        Store successful track offset.
    end if
end for
Perform additional tasks based on the successfully tracked lines.

Figure 3.5 shows the method applied to real data. First, the method is shown in normal operation, when no car is within the field of view. A car then enters the field of view and three different situations are shown: about to enter the detection wall, completely within the detection wall, and about to leave the detection wall. Red and green lines indicate detection lines moving towards and away from the vanishing point. Blue lines indicate discarded lines that did not fulfill the pre-processing constraint, i.e. too low standard deviation. Finally, yellow lines indicate lines of interest where the tracking failed.

3.4.1 Advantages

The main advantage of using the complete line for detection is that it is robust against noise and outliers. Another advantage is that it is easy to formulate the complexity of the method, since there is always exactly one iteration phase per line.

3.4.2 Disadvantages

Although the method can be said to be robust against noise, this is only true when the whole line is moving in one direction. If a car is entering the scene, one part of the line is moving to the right while the other part of the line is moving to the left. Due to this fact, it is not possible to determine one direction that the whole line is moving towards, and the tracking then fails. In other words, it is only possible to track a line if the line values are mostly background or mostly on the car. A second disadvantage is that a long line requires a considerable amount of time to interpolate during each iteration.

3.4.3 Improving the method, sub-lines

Instead of tracking the entire line, the line can be split into smaller sub-lines. The theory remains the same, and even Algorithm 1 is still valid; the only thing that changes is the initial set of lines. Each sub-line has the same computational complexity as before, but the accuracy of the tracking increases. This is once again best explained using an example.

A car is moving into the scene and represents a third of the values on a line. When tracking the entire line, the tracking fails, since an unambiguous offset cannot be determined. If the line instead is split into three sub-lines, the first sub-line is determined to move in the direction of the car while the other two are moving in the same direction as the background. Figure 3.6 shows the same situations as in Figure 3.5, but with each line split into three line segments/sub-lines instead. The most notable change is that a more precise response can be seen in the "about to enter" and "about to leave" situations.

This approach does increase the accuracy of the tracking; however, it is less robust against noise.
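A sketch of splitting one detection line into equally long sub-lines is shown below; the line is stored as [x1 y1 x2 y2], and the number of sub-lines is an illustrative choice.

```matlab
% Sketch: split one detection line into nSub equally long sub-lines.
nSub = 3;
p1 = detLine(1:2);   p2 = detLine(3:4);
t  = linspace(0, 1, nSub + 1);           % break points along the line
sublines = zeros(nSub, 4);
for k = 1:nSub
  sublines(k,:) = [p1 + t(k)  *(p2 - p1), ...
                   p1 + t(k+1)*(p2 - p1)];
end
```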


Figure 3.5: (a) Only background. (b) Car about to enter scene. (c) Car completely within detection zone. (d) Car leaving detection zone.


Figure 3.6: (a) Only background. (b) Car about to enter scene. (c) Car completely within detection zone. (d) Car leaving detection zone.


3.5 Tracking 1D features using Lucas-Kanade

Typically, tracking is successful only if the values that are tracked are unique in some manner. Tracking sub-lines can still fail if the properties of the line are not good enough, and tracking all sub-lines requires unnecessary computational power. Even if the tracking succeeds, the accuracy might be low.

So, instead of tracking a predetermined line segment, feature points on the line are extracted. A small interval around each feature point can then be tracked in the same manner as in the previous method. Using this concept, the most distinct trackable parts of the line are tracked. This also allows for many different moving directions on the line.

3.5.1 Finding the features

Unlike the two-dimensional case, there are no standard methods of finding one-dimensional features. Common unique points in a signal are the extreme points, i.e. minima and maxima. Three methods, described in Table 3.1, were therefore implemented and evaluated for detecting a feature.

Table 3.1: Methods of finding one-dimensional features.

  Extrema: Find all minima and maxima that have a large enough second derivative.
  STD-based: Find all minima and maxima that have a large enough standard deviation (STD) in an interval around the extreme point.
  Slopes: Find all points where the gradient of the line is large enough, i.e. the steepest slopes.

An example of the features found by each method on the same line can be seen in Figure 3.7. Looking only at the line values, the extrema look like the most unique feature points. It is, however, harder to explain what an extremum is in a real picture; in this case it is actually a very thin pole. The most intuitive method is the slope method, where each slope represents an edge in the image, which means that the edges in the image are tracked. This is good for the visualization of the tracking, as one can guess beforehand where the features will be. While slope features are great for visualizing, they are not necessarily the best choice for the actual tracking. The STD method usually gives more features, which can be tracked equally well. In the end, the slopes method is easier to debug and is deemed to give more than enough feature points, which is why it is selected as the standard method.
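A sketch of the slope-based feature detection could look as follows; the threshold, the minimum feature distance and the greedy non-maximum suppression are illustrative assumptions.

```matlab
% Sketch: pick 1D feature points where the line derivative is large,
% i.e. at edges, with a minimum distance between selected points.
function idx = detectSlopeFeatures(F, gradThresh, minDist)
  dF = abs(gradient(F));
  cand = find(dF > gradThresh);
  [~, order] = sort(dF(cand), 'descend');   % strongest slopes first
  cand = cand(order);
  idx = [];
  for c = cand(:).'                         % greedy suppression
    if isempty(idx) || all(abs(idx - c) >= minDist)
      idx(end+1) = c; %#ok<AGROW>
    end
  end
end
```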


Figure 3.7: Detected features on a line using the different methods. The extrema method investigates extreme points and selects the ones with the largest derivative. The STD method investigates the standard deviation within an area around each extreme point. Lastly, the slope method selects the points with the largest derivative, which can intuitively be thought of as edges in the image.

3.5.2 Putting it together

First, the entire detection lines are created as before. Along each detection line, features are detected. Each feature is tracked to the next frame by creating an interval around the feature point and then tracking that interval. Algorithm 2 describes the complete method in pseudo-code.

Figure 3.8 shows the same scenes as in Figures 3.5 and 3.6, but with the feature tracking approach. Instead of tracking fixed lines, a more flexible and precise detection can be made. The slopes method is used for detecting features, which can be seen since the tracked intervals all correspond to edges. In the images, many of the features are found around the chassis edges near the wheelbase and along the door edges. Such features are almost always present on an overtaking car. Under the assumption that has been made, i.e. that the car is traveling along the detection lines, this approach finds more features than a normal 2D detector does, since 2D detectors usually require more uniqueness.


Algorithm 2 LK tracking of 1D features

Initialize line parameters: startX, stopX, k, m
for all detection lines do
    Interpolate line values in the previous frame. (Section 3.2)
    features ← DetectFeatures(interpolated line)
    for all features do
        Create an interval around the feature.
        Interpolate interval values in the previous frame. (Section 3.2)
        h ← 0. Δh ← Inf. iteration ← 1.
        while iteration < maxIterations AND Δh > updateThreshold do
            Displace the interval position by Δh.
            Interpolate interval values in the current frame. (Section 3.2)
            Calculate the error function. (Equation 2.2)
            Update h and Δh. (Section 2.3.1)
            iteration ← iteration + 1.
        end while
        if error < errorThreshold then
            Store successful track offset.
        end if
    end for
end for

function DetectFeatures(line)
    points ← Find candidate feature points on the line.
    Sort points by standard deviation within an area around each point.
    return points
end function


3.6 Tracking features using dynamic programming

The Lucas-Kanade (LK) approach is an iterative solution where each feature point tries to converge towards the best match. This gives more precise solutions, but the speed depends on how many iterations are needed on average. Another approach appears if it is assumed that a unique feature point found in the previous frame is always present in the next frame.

The feature points on the previous detection lines are detected in the same way as before. In addition to these feature points, feature points along the current detection lines are also detected. The feature points on the previous detection lines and the ones on the current detection lines are then matched against each other. This matching problem can be solved by means of dynamic programming (DP).

The problem can be considered a variant of bipartite matching / an altered stable marriage problem. The cost function used is the root mean square error (RMSE) between intervals around the feature points,

\[
\epsilon_{i,j} = \left( \frac{1}{|T|} \sum_{x \in T} [L_{c,i}(x) - L_{p,j}(x)]^2 \right)^{1/2},
\]

where \( L_{c,i} \) denotes the line values within an interval around feature i in the current frame, and \( L_{p,j} \) denotes the line values within an interval around feature j in the previous frame. Algorithm 3 describes one implementation that can be used to solve the problem. Figure 3.9 shows the same situations as before, but with the dynamic programming approach. For comparison with the LK approach, the same cost threshold is used for matching.
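The matching cost is cheap to evaluate; a Matlab sketch is shown below, where Lc and Lp are the interpolated line values in the current and previous frame, i and j are feature positions, and w is the half interval width (all names illustrative).

```matlab
% Sketch of the RMSE matching cost between two feature intervals.
function e = matchCost(Lc, Lp, i, j, w)
  a = Lc(i-w : i+w);            % interval around feature i (current)
  b = Lp(j-w : j+w);            % interval around feature j (previous)
  e = sqrt(mean((a - b).^2));   % root mean square error
end
```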

The performance resembles that of the feature tracking using Lucas-Kanade, but with fewer successfully tracked features. This is because the LK algorithm allows iterative tuning to find a good match, while the dynamic programming approach needs to find exact matches at once. The DP algorithm is faster, but can require higher thresholds to produce more successful tracks, risking more false positives. This means that the DP approach requires a higher match threshold than the LK approach and is effectively less accurate.


Algorithm 3 Dynamic programming solution to matching

Initialize line parameters
for all detection lines do
    fp ← Find feature points in the previous frame.
    fc ← Find feature points in the current frame.
    for all features f in fp do
        f.bestMatch ← −1. f.bestCost ← Inf.
    end for
    for all features f in fc do
        f.bestMatch ← −1. f.bestCost ← Inf.
        f.costs ← Calculate costs to each previous feature in fp.
        Sort f.costs ascending.
    end for
    MatchedPairs ← MatchFeatures(fp, fc)
end for

function MatchFeatures(fp, fc)
    for each feature f in fc do
        for each feature g in fp, sorted ascending according to cost do
            Match ← True if f's cost to g is better than g's current best cost.
            if Match then
                g tells its previous best match, if any, to find a new node.
                g.bestMatch ← f.
                Break and continue with the next feature in fc.
            end if
        end for
    end for
end function


(a) Only background. (b) Car about to enter scene. (c) Car completely within detection zone. (d) Car leaving detection zone.

Figure 3.8: Tracking of features on the detection lines in different situations using Lucas-Kanade.


(a) Only background. (b) Car about to enter scene. (c) Car completely within detection zone. (d) Car leaving detection zone.

Figure 3.9: Tracking of features on the detection lines in different situations using dynamic programming.


3.7 Common issues and solutions

A common situation in real life is repetitive patterns, e.g. poles between highway lanes going in separate directions. Figure 3.10 shows the values of a detection line displaying this phenomenon.


Figure 3.10: The intensity values of a detection line display a repetitive signal. The detection line lies on a snowy background, which has high pixel values, with poles with low pixel values appearing at intervals. Each dip is a pole.

This phenomenon can create a problem during the matching for all methods. The starting position has a large influence in this case, since the iterative nature of the LK risk moving towards the incorrect but still valid solution. One solution is to run the algorithm starting from different starting points. The starting points can be random or evenly spaced out over the signal. Another approach is to detect features in the current line, like in the DP solution, and start from these. If several matches are found, two actions can be made:

1. Select the match that has the lowest cost.

2. Consider the detection line too hard to track and therefore ignore it.

Selecting the best match works well in many cases. However, factors such as lighting conditions, which normally have only small effects, can have larger effects when the margins between candidate matches are small. An additional argument for ignoring the line is that a detection line with incoming objects normally does not contain repetitive patterns like this. A sketch of this decision is shown below; the function name, the candidate-cost representation and the ambiguity ratio are all illustrative choices, not values from the thesis.
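def select_or_ignore(costs, cost_threshold, ambiguity_ratio=1.2):
    # costs: matching costs of all candidate matches on one detection line.
    # Returns the index of the best match, or None if the line looks
    # repetitive and should be ignored.
    valid = sorted((c, k) for k, c in enumerate(costs) if c < cost_threshold)
    if not valid:
        return None                   # no candidate passed the threshold
    if len(valid) == 1:
        return valid[0][1]            # a single, unambiguous match
    best, second = valid[0][0], valid[1][0]
    if second > ambiguity_ratio * best:
        return valid[0][1]            # action 1: clearly best match wins
    return None                       # action 2: near-equal matches, ignore line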


4 Template tracker

The Wall detector discussed in the previous chapter provides the possibility to detect an incoming car. However, it is only effective within the detection area. This information can be used to initialize a more advanced system which is not running continuously. In this thesis, we use the Wall detector to initialize the template tracker introduced in this chapter. The template tracker first detects 2D features within a detection zone that can be either at a fixed position or dynamically adjusted using information from the Wall detector. Once the features have been detected, they are tracked between consecutive frames, allowing the paths of the overtaking cars to be predicted. The features can also be tracked outside the detection zone, making this a more general tracker than the Wall detector. If the tracked features have a velocity directed towards an area dangerous to the ego-vehicle, appropriate actions can be taken, such as braking.

4.1 Creating a detection zone

Finding features within the entire image is not efficient, since most of the detected features are of no interest. Instead, a detection zone is created within which feature points are searched for and tracked. This greatly reduces the computational complexity and also focuses the search on the interesting feature points. If there is no a priori information about the scene, the detection zone is set to a fixed position and size, see Figure 4.1.

If there is information about the scene, such as lane information, the detection zone could dynamically adjust to it. The static detection zone is placed far to the left in the image to be able to quickly detect incoming features. The box is kept thin to reduce complexity and limit the number of unimportant features.

Figure 4.1: Detection zone (in red) for the 2D tracker with feature points marked as yellow blobs. The detection zone is fixed near the left edge of the image, allowing feature detection on the car when only a small part of it is within view. The detection zone is thin, so that the feature points are concentrated on the parts that are interesting for this system.

When the Wall detector is used, the detection zone instead adjusts to the tracked feature points on the detection lines. The maximum and minimum positions of these tracked features are computed and span the detection zone, with a small additional margin.
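A minimal sketch of this zone selection could look as follows, assuming the tracked feature positions are given as (x, y) pixel coordinates; the margin and the fallback zone coordinates are illustrative values only.

import numpy as np

def detection_zone(tracked_points, margin=5, fixed_zone=(0, 40, 200, 360)):
    # tracked_points: (N, 2) array of (x, y) feature positions on the
    # detection lines; empty if the Wall detector reports nothing.
    pts = np.asarray(tracked_points, dtype=float)
    if pts.size == 0:
        return fixed_zone             # fall back to the static zone
    x0, y0 = pts.min(axis=0) - margin
    x1, y1 = pts.max(axis=0) + margin
    return (x0, y0, x1, y1)           # zone spanned by min/max plus margin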

4.2 Implementation of the KLT tracker

The KLT is implemented with the inverse compositional approach described in Chapter 2. Selecting the best warp model is difficult, as the affine model is the most adaptable but risks over-fitting, while the translation model might not suffice. When each of the models was tested, the simple translation model proved to be the best in the sense that it converged to a match far more often than the other models.

The other transformation models could often give better fits for some features. However, the tracking commonly failed due to over-fitting of a single parameter. The main issue is that changes in the translation parameters are held back by changes in the skewing/scaling parameters. The affine model has a tendency to skew the template instead of translating it; quite often the template is skewed all the way down to a line shape. Similar behaviour can be seen for the scaling. This effect can be reduced by introducing some form of regularization, penalizing parameter values that grow too fast or too large.

One approach is to first use the translation model to get an estimate of the translation part, and then use a more specific model for precision. In this thesis, such precision is not needed, since the tracking is only performed for a short period of time before the standard tracker takes over. Because of this, the translation model is always used.
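For reference, translation-only pyramidal LK tracking of this kind is also what OpenCV provides through calcOpticalFlowPyrLK, so a per-frame tracking step could be sketched as below; the window size, pyramid depth and termination criteria are illustrative values, not the ones used in this thesis.

import cv2
import numpy as np

def track_translation(prev_gray, curr_gray, prev_pts):
    # prev_pts: (N, 2) array of feature positions in the previous frame.
    pts = prev_pts.reshape(-1, 1, 2).astype(np.float32)
    curr_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, pts, None,
        winSize=(15, 15),             # template size around each feature
        maxLevel=3,                   # pyramid depth, cf. Section 4.3
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 20, 0.03))
    ok = status.ravel() == 1          # keep only successfully tracked points
    return curr_pts[ok].reshape(-1, 2), pts[ok].reshape(-1, 2)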

In the situation shown in Figure 4.2, the car stays in the overtaking lane during the entire sequence. Because of this, most of the detected feature points are visible throughout the entire sequence as well. When a car is cutting in, the features found on the side of the car are visible during the actual cut-in, but as the car straightens up, these features are lost. In this case, only the features on the rear of the car remain trackable. This is, however, not a large issue, since the information from the features on the side of the car is only useful while the car is overtaking and still very near, or when it is cutting in close.

4.3 Scale pyramid

When a car is overtaking, it is often moving at a much higher speed than the ego-car. This means that the overtaking car moves a large distance in the image between frames. As mentioned in the theory chapter, the Taylor expansions made during the derivation of the algorithms assume a small displacement. To overcome this issue, a scale pyramid is created [Bouguet, 2001]. The tracking is performed on each level of the pyramid, starting from the coarsest scale and moving to the finest. This gives a rough estimate in the beginning, which is refined at each scale level. The scales of the images are defined as

\text{Scale factor} = 2^{L-1}, \qquad (4.1)

where L is the level. In order to visualize the scale pyramid, a smaller downscaling factor of 1.3 is used, since otherwise the proportions in the image would differ too much. This scale pyramid can be seen in Figure 4.3, where each level of the pyramid is shown.

The image needs to be low-pass filtered before the down-sampling to avoid aliasing effects. One possible choice of low-pass filter L_p is the "magic" filter¹,

L_p = \frac{1}{64} \begin{pmatrix} 1 & 3 & 3 & 1 \\ 3 & 9 & 9 & 3 \\ 3 & 9 & 9 & 3 \\ 1 & 3 & 3 & 1 \end{pmatrix}. \qquad (4.2)

This filter is convolved (in 2D) with the image to obtain a blurred image, i.e. I * L_p. The original image is padded with its edge columns/rows to avoid edge effects when convolving. Once the image has been blurred with the filter, it is downsampled by keeping every other pixel value in each row and column. The image needs to be downsampled several times for the scale pyramid, so at each scale the filter is applied first, followed by the actual downsampling.

¹ More information about this special filter and its properties is available at http://
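A sketch of the pyramid construction under these assumptions is given below; the function name is hypothetical, and boundary='symm' approximates the edge-row/column padding described above.

import numpy as np
from scipy.signal import convolve2d

# The "magic" low-pass filter from Eq. (4.2).
MAGIC = np.array([[1, 3, 3, 1],
                  [3, 9, 9, 3],
                  [3, 9, 9, 3],
                  [1, 3, 3, 1]], dtype=float) / 64.0

def build_pyramid(image, levels):
    # Level 1 is the original image; each following level is low-pass
    # filtered and then downsampled by keeping every other pixel.
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        blurred = convolve2d(pyramid[-1], MAGIC, mode='same', boundary='symm')
        pyramid.append(blurred[::2, ::2])
    return pyramid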


Figure 4.2: Sequence of a car being tracked. (a) Start bounding box. (b) More of the car being detected and tracked. (c) Entire car being tracked. (d) Car tracked a long way. Feature points are found within a detection zone and are then tracked using the KLT tracker. A bounding box is created from the min/max coordinates of the feature points, provided that enough feature points have been tracked for longer than a certain time threshold. Feature points are added periodically if a car is still detected.


Figure 4.3: Scale pyramid where the image has been downscaled by a factor 1/1.3 for each level. In the real implementation a scaling factor of 2 is used, but that is harder to visualize properly in the report.

4.4 Feedback to the system

In order to obtain faster convergence and a more robust system, the estimated parameters can be fed back and used as the initial estimate in the next iteration. Given that the velocity of the overtaking vehicle changes little between consecutive frames, the motion vectors are approximately the same from frame to frame. This can be modeled by a Kalman filter with a constant velocity model; however, such a filter is only useful if the tracking is meant to be performed over a longer period of time. In normal situations it is sufficient to use the most recent estimated parameters as initial parameter values.
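A minimal sketch of this feedback, simply storing the most recent motion vector per feature and returning it as the initial guess, could look as follows; the class and identifiers are hypothetical.

import numpy as np

class MotionFeedback:
    # Stores the most recent motion vector per feature and serves it
    # as the initial parameter estimate for the next frame. A
    # constant-velocity Kalman filter could replace this for longer tracks.
    def __init__(self):
        self.last_motion = {}

    def initial_guess(self, feature_id):
        # Small velocity change between frames is assumed, so the
        # previous motion vector is a reasonable starting point.
        return self.last_motion.get(feature_id, np.zeros(2))

    def update(self, feature_id, motion):
        self.last_motion[feature_id] = np.asarray(motion, dtype=float)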

4.5 Handling template drifting

The problem of tracking feature points over a longer consecutive sequence is usually approached in two different ways. The first approach is to extract the templates only once, around the detected feature points. These templates are then tracked between each pair of frames. The same templates are used throughout the entire sequence without being updated to the current scene. The drawback of this method is that it is very sensitive to changes between the frames, such as changing light conditions.
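To make the first approach concrete, the sketch below matches a fixed, never-updated template within a search window in each new frame. It uses normalized cross-correlation via OpenCV's matchTemplate as a stand-in for the KLT-based tracking used in this thesis, purely for illustration; all names are hypothetical.

import cv2

def track_fixed_template(frame, template, search_window):
    # search_window: (x0, y0, x1, y1) region around the previous
    # position; it must be at least as large as the template.
    x0, y0, x1, y1 = search_window
    roi = frame[y0:y1, x0:x1]
    result = cv2.matchTemplate(roi, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, loc = cv2.minMaxLoc(result)   # best correlation peak
    return (x0 + loc[0], y0 + loc[1]), score   # new position and confidence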

References

Jean-Yves Bouguet. Pyramidal implementation of the Lucas-Kanade feature tracker: Description of the algorithm. Technical report, Intel Corporation, 2001.
