
Institutionen för systemteknik

Department of Electrical Engineering

Master's thesis

Vision and Radar Sensor Fusion for Advanced Driver Assistance Systems

Master's thesis carried out in Automatic Control at the Institute of Technology, Linköpings universitet

by

Christian Andersson Naesseth

LiTH-ISY-EX--13/4685--SE

Linköping 2013

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet


Supervisors: Hanna Nyqvist, ISY, Linköpings universitet

Peter Hall, Autoliv Electronics AB

Examiner: Thomas Schön, ISY, Linköpings universitet


Avdelning, Institution / Division, Department: Avdelningen för Reglerteknik (Division of Automatic Control), Department of Electrical Engineering, SE-581 83 Linköping

Datum / Date: 2013-06-14

Språk / Language: Engelska / English

Rapporttyp / Report category: Examensarbete

URL för elektronisk version: http://www.ep.liu.se

ISBN: —
ISRN: LiTH-ISY-EX--13/4685--SE
Serietitel och serienummer / Title of series, numbering: —
ISSN: —

Titel / Title: Vision och Radar Sensorfusion för Avancerade Förarassistanssystem / Vision and Radar Sensor Fusion for Advanced Driver Assistance Systems

Författare / Author: Christian Andersson Naesseth


Nyckelord / Keywords: automotive, sensor fusion, radar, vision, camera, target tracking, EKF, tracking evaluation, OSPA, path prediction, ADAS, car


Sammanfattning

Världshälsoorganisationen förutspår att trafikskador kommer att vara den femte vanligaste dödsorsaken år 2030. Många av dessa skador och dödsfall kan undvikas om alla skulle köra fordon som är utrustade med de senaste förarassistans- och säkerhetssystemen. Några exempel på dessa är autobroms- och kollisionsundvikande system som blir mer och mer populära på marknaden. En studie av Folksam påvisar att på vägar med hastigheter om upp till och med 50 km/h så kan ett autobromssystem reducera personskador med upp till 64 procent. Man uppskattar även att cirka 40 procent av olyckorna reducerades till den grad att ingen personskada skedde.

För att dessa så kallade förarassistanssystem ska vara som mest effektiva krävs god kännedom om den egna bilens omgivning. Var finns andra bilar, motorcyklar och fotgängare i förhållande till den egna bilen? Hur snabbt färdas de och i vilken fil? Hur kör vår egen bil? Finns det hinder i vägen för vår bil? Dessa och många fler frågor kan besvaras av ett väldesignat och vältestat system för situationsmedvetenhet (situational awareness).

I denna rapport designar och utvärderar vi, både kvalitativt och kvantitativt, sensorfusionsalgoritmer för målföljning. Vi använder oss av en kombination av radar- och kamerainformation för att automatiskt hitta och följa relevanta mål i en brusig miljö. Det är intressant att kombinera radar och kamera eftersom dessa har komplementerande attribut som vi vill utnyttja. Radarn har en bra upplösning i avstånd men dålig upplösning i vinkelled. Kameran å andra sidan har bra upplösning i vinkelled. Detta betyder att det finns möjligheter att signifikant öka noggrannheten hos målföljningen i jämförelse med att bara använda endera av dessa sensorer. Vi har också designat algoritmer för att förutsäga hur den egna bilen kommer att köra (path prediction) samt ett enkelt första försök till hotanalys (threat assessment).


Abstract

The World Health Organization predicts that by the year 2030, road traffic injuries will be one of the top five leading causes of death. Many of these deaths and injuries can be prevented by driving cars properly equipped with state-of-the-art safety and driver assistance systems. Some examples are auto-brake and auto-collision avoidance, which are becoming more and more popular on the market today. A recent study by a Swedish insurance company has shown that on roads with speeds up to 50 km/h an auto-brake system can reduce personal injuries by up to 64 percent. In fact, in an estimated 40 percent of crashes, the auto-brake reduced the effects to the degree that no personal injury was sustained.

It is imperative that these so-called Advanced Driver Assistance Systems have good situational awareness to be really effective. It is important that they have adequate information about the vehicle's immediate surroundings. Where are other cars, pedestrians or motorcycles relative to our own vehicle? How fast are they driving and in which lane? How is our own vehicle driving? Are there objects in the way of our own vehicle's intended path? These and many more questions can be answered by a properly designed system for situational awareness.

In this thesis we design and evaluate, both quantitatively and qualitatively, sensor fusion algorithms for multi-target tracking. We use a combination of camera and radar information to perform fusion and find relevant objects in a cluttered environment. The combination of these two sensors is very interesting because of their complementary attributes. The radar system has high range resolution but poor bearing resolution. The camera system, on the other hand, has a very high bearing resolution. This is very promising, with the potential to substantially increase the accuracy of the tracking system compared to just using one of the two. We have also designed algorithms for path prediction and a first threat awareness logic, which are both qualitatively evaluated.


Acknowledgments

I would like to start by thanking Autoliv Electronics for letting me do this project and for supplying a workstation and data with which to carry it out. I would especially like to thank my supervisor at Autoliv Electronics, Peter Hall, for his valuable input and discussions throughout this thesis. On top of this I would also like to thank my co-workers at Autoliv who have helped me and listened to my ideas. Furthermore, I would like to give thanks to my supervisor at Linköping University, Hanna Nyqvist, for her input and comments on my thesis. Lastly, I would like to thank my examiner, Dr. Thomas Schön, for valuable comments on my report.

Linköping, June 2013 Christian Andersson Naesseth


Contents

Notation

1 Introduction
1.1 System Setup
1.2 Sensor Fusion for Target Tracking and Situational Awareness
1.3 Related Work
1.4 Problem Formulation
1.5 Autoliv
1.6 Outline

2 Static Fusion
2.1 Modeling
2.1.1 Covariance Matrix
2.2 Track Association
2.3 Fusion
2.3.1 Fusion of Independent Estimates
2.3.2 Fusion of Dependent Estimates
2.4 Tuning
2.5 Results
2.5.1 Test Data
2.5.2 Results and Discussion

3 Dynamic Fusion
3.1 Target Tracking
3.1.1 Target Representation
3.1.2 Practical Issues
3.1.3 Filter Theory
3.1.4 General Algorithm
3.2 Algorithm Specific Information
3.2.1 Modeling
3.2.2 Practical Issues
3.3 Results
3.3.1 Decentralized Fusion
3.3.2 Centralized Fusion
3.3.3 Fusion Comments

4 Tracking Evaluation
4.1 Evaluation Methods
4.2 Data and Ground Truth
4.3 Results
4.3.1 Conclusions

5 Path Prediction and Threat Assessment
5.1 Ego-vehicle Path Prediction
5.1.1 Quadratic Model
5.1.2 Cubic Model
5.1.3 Clothoid Model
5.1.4 Path Measurements
5.2 Threat Assessment
5.3 Results

6 Concluding Remarks
6.1 Results Summary
6.2 Future Work

A Evaluation Data
A.1 Tables with Mean Values
A.2 Plots with Tracking Evaluation Metrics
A.3 Plots with the OSPA Metric

Notation

Sets

$\mathbb{R}$ : Set of real numbers

Symbols

$W$ : World fixed coordinate system
$E$ : Ego-vehicle coordinate system
$T_i$ : Target $i$
$p_x^{T_iE}$ : Longitudinal position of target $i$ relative to the ego-vehicle coordinate system
$p_y^{T_iE}$ : Lateral position of target $i$ relative to the ego-vehicle coordinate system
$v_x^{T_iW}$ : Longitudinal velocity of target $i$ relative to the world fixed coordinate system
$v_y^{T_iW}$ : Lateral velocity of target $i$ relative to the world fixed coordinate system
$l^{T_i}$ : Real world width of target $i$
$r_h^{T_i}$ : Road height of target $i$
$x^{T_i}$ : State vector of target $i$
$x_k$ : $x(kT_s)$, $T_s$ sampling time
$x_{k+1}$ : $x(kT_s + T_s)$, $T_s$ sampling time
$X$ : Matrix
$I$ : Identity matrix of relevant size
$\operatorname{diag}(x_1, \dots, x_n)$ : Diagonal matrix, with the main diagonal consisting of the scalars $x_1, \dots, x_n$

Abbreviations

ADAS : Advanced Driver Assistance Systems
ECU : Electronic Control Unit
PDA : Probabilistic Data Association
IMM : Interacting Multiple Model
SLAM : Simultaneous Localization and Mapping
SF : Safe Fusion
GF : General Fusion
DF : Decentralized Fusion
CF : Centralized Fusion
WLS : Weighted Least Squares
OSPA : Optimal Subpattern Assignment
SVD : Singular Value Decomposition
EKF : Extended Kalman Filter
GNN : Global Nearest Neighbour
FPT : Feature Point Tracking
RMSE : Root Mean Square Error
RTS : Rauch-Tung-Striebel fixed-lag smoother
CH : Completeness History
TI : Track Initiation time
RTMR : Redundant Track Mean Ratio
STMR : Spurious Track Mean Ratio
PE : Position (tracking) Error
VE : Velocity (tracking) Error

1 Introduction

Every day, many people are involved in traffic accidents that might have been avoided with a properly equipped car. With increasing processing power and cheaper electronics and sensors, today's cars can be equipped with technology that lets the car itself sense and perceive other cars and objects in its surroundings to identify potential threats. This situational awareness can be used to implement many kinds of Advanced Driver Assistance Systems (ADAS) such as adaptive cruise control, lane-departure warning, pedestrian detection, auto-brake, collision avoidance, etc.

A recent study by a Swedish insurance company, Folksam [Fol, 2013], has shown that on roads with speeds of up to 50 km/h, an auto-brake system can reduce personal injuries by up to 64 percent. In fact, in an estimated 40 percent of crashes, the auto-brake reduced the effects to the degree that no personal injury was sustained.

A key component in these systems is situational awareness, i.e. where is the ego-vehicle in relation to other vehicles, objects or obstacles, and where will it be in the future? This problem can be solved using sensor fusion: collecting information from, for instance, radars, cameras and odometry, and fusing this information to track objects, find their position in the ego-vehicle's immediate surroundings and predict the ego-vehicle's intended path.

In this chapter we discuss the sensors and basic system setup used throughout the thesis in Section 1.1, general concepts of sensor fusion for target tracking and situational awareness in Section 1.2, some related work in Section 1.3, the problem formulation in Section 1.4, our collaboration partner Autoliv AB in Section 1.5, and a general outline of the whole thesis in Section 1.6.


Figure 1.1: Sensors from Autoliv AB used in this thesis: (a) monovision camera, (b) radar.

1.1 System Setup

In this thesis we use data collected from an Autoliv test vehicle equipped with a monocular camera and a radar with medium and long range capabilities, see Figure 1.1a and 1.1b. The combination of these two sensors is interesting because of their complementary attributes; the radar has high range resolution but poor bearing resolution, while the camera system has high bearing resolution.

Inputs to our sensor fusion algorithms are ego-vehicle measurements and either tracked objects from both the vision and radar systems designed by Autoliv, or classifications with regions of interest from the vision system and detections with range and bearing data from the radar system. Algorithms are implemented and evaluated offline using MATLAB.

1.2 Sensor Fusion for Target Tracking and Situational Awareness

This thesis will mainly investigate and compare different approaches to camera and radar sensor fusion for vehicle detection and tracking. Fusion takes place on an Electronic Control Unit (ECU), and we have a radar and a vision system that can either output tracked objects or raw measurements. The conceptually different approaches and algorithms that have been applied and evaluated are:

Static Fusion: The radar and vision systems each provide a list of tracked objects and their estimated states. We perform fusion frame by frame, considering the received information to be a static estimate of the target's state.

Decentralized Fusion: The radar and vision systems each provide a list of tracked objects and their estimated states. We assume the estimates are actually measurements and perform target tracking with an extended Kalman filter.

Centralized Fusion: The radar and vision systems each provide raw measurement data. With this we perform target tracking with extended Kalman filtering.


These approaches are interesting to compare to see what gains, if any, we can get by using more computational power, memory, communication bandwidth, etc. Static fusion can be performed with the least amount of computational power, signalling and memory, as no information is saved between frames and the tracked objects received from the two sub-systems are usually fewer in number and smaller in size than raw measurement data. The decentralized algorithm uses substantially more CPU than the static fusion algorithm; communication bandwidth is basically the same, as the input data is identical, but the tracks need to be saved, so slightly more memory is used. Centralized fusion uses raw measurement data, which requires more bandwidth and generates more tracks, resulting in higher memory usage. The relative resource usage of these three algorithms is summarized in Table 1.1. The decentralized and centralized fusion algorithms both use multi-target tracking theory, adding a dynamic element to the fusion algorithm, and hence are also collectively known as dynamic fusion algorithms. Because of restrictions on system design and the ECU, it is interesting to investigate the performance differences between fusion algorithms of various complexities.

Fusion Algorithm   CPU    Memory    Communication
Static             Low    Low       Low
Decentralized      High   Average   Low
Centralized        High   High      High

Table 1.1: Approximate relative resource usage of the fusion algorithms.

To exemplify when the above fusion algorithms can be used to increase situational awareness in traffic, we have also fused information from the tracking system as well as from the ego-vehicle measurements to predict the intended path. This, together with fundamental threat assessment, is found towards the end of this thesis.

Figure 1.2 illustrates all the previously described concepts on a scene from a test drive on a German highway. The multi-target tracking approach used in this scene is the centralized fusion algorithm. To the left we see a bird's eye view of the scene where rectangular green boxes denote fused tracks and the black lines extending from these are estimated velocities. Blue round dots are radar detections and the middle red line is the predicted path. To the right we see the camera image overlaid with yellow regions of interest for classified objects.

1.3 Related Work

In this section we discuss some related work regarding vision and radar fusion for ADAS. A great deal of research has been done on vision and radar fusion because of the sensors' complementary attributes. Research on camera and radar fusion stretches all the way back to the late 1980s. Algorithms for solving this problem are generally very computationally intensive, so the research did not really take off until the late 1990s and the 2000s, as hardware started to catch up. The interest and number of articles have never been higher, which definitely makes this an interesting topic for study. To illustrate this we have included a diagram, see Figure 1.3. This diagram displays the number of articles published each year, starting from the year 1985, that have the topic camera or vision and radar, as found on the Web of Knowledge [WoK, 2013].

Figure 1.2: Target tracking and path prediction illustration. Bird's eye view to the left and camera image to the right. The blue round dots show the radar detections, green rectangles with lines the tracked vehicles and the long red curve the predicted path. In the camera image we see yellow regions of interest from the classifier.

We begin by giving a selection of some of the articles and research in monovision and radar fusion for automotive tracking applications. In Gern et al. [2001] a system using vision for lane detection and fusion for vehicle tracking is proposed. In this article, radar detections are used to give a rough search area for the vision detection algorithms, and vehicle tracking is performed using a Kalman filter. Another early system is Fade, presented in Steux et al. [2002], which fuses the radar and vision on a low level. A promising approach for vehicle tracking using probabilistic data association (PDA) and the interacting multiple model (IMM) filter is given in Liu et al. [2008]. An interesting case study of radar and vision data for tracking is performed in Tango et al. [2008]. Pedestrian tracking across blind regions is performed in Otto et al. [2012] using a fusion approach based on a joint integrated PDA filter. A frontal object perception and tracking system is illustrated in Chavez-Garcia et al. [2012] using Dempster-Shafer theory based sensor fusion.

Figure 1.3: Number of articles with the topic vision/camera and radar published each year according to the Web of Knowledge.

Radar and vision information can also be used for other types of ADAS; here we present some articles on object detection. Motion stereo, which recovers 3D structure using parallaxes due to camera motion, fused with radar for obstacle detection is presented in Kato et al. [2002]. In Sole et al. [2004] a system using vision for radar target validation is described and is shown to improve detection. An article describing vehicle detection, where radar data is used to locate areas of interest in images provided by the camera, is given in Bombini et al. [2006]. Haselhoff et al. [2007] take this idea one step further by using radar to first find a region of interest in the image and then again by fusing the output of the vision classifier. Motion stereo and radar fusion is again applied and refined in Bertozzi et al. [2008] for object detection and classification. Radar detections are used for image segmentation and, together with image data, form a sparse input to a classifier. The classifier is made up of a multilayer in-place learning network, which shows promise compared to other standard classifiers based on neural networks and support vector machines.

An interesting insight into the pros and cons of radar and vision sensors is given in Hofmann et al. [2001], as well as results for a hybrid adaptive cruise control system. When guard rails are present, these can be detected and used to improve and speed up the system performance, as is done in Alessandretti et al. [2007]. Guard rail detections and detections of other stationary objects can also be used to implement local SLAM for better situational awareness. This, as well as tracking, is performed in Vu et al. [2011] using a laser scanner for SLAM in combination with a vision system for tracking of vehicles. In Lundquist [2011] local SLAM is performed using radar detections and several different map and motion models. Compared to much of the discussed related work, we present an approach and evaluation on a slightly higher level of data fusion, as described in Section 1.2. In the last part of this thesis we discuss path prediction and threat assessment. In two of the models we have taken inspiration from e.g. Lundquist [2011], who has used leading vehicles as a source of information to estimate road geometry. We use this approach with a slight twist, using leading vehicles to predict the ego-vehicle path instead of the road curvature.

1.4 Problem Formulation

While studying the related work, the available system setup and data, as well as the different approaches described in Section 1.2, we have identified the following sub-problems and tasks for a vision and radar sensor fusion system that we will study in this thesis:

• Design a tracking algorithm based on static fusion.
• Design a tracking algorithm based on decentralized fusion.
• Design a tracking algorithm based on centralized fusion.
• Tune the tracking algorithms using a suitable metric on real world test data.
• Evaluate and compare the different algorithms on real world test data.
• Design path prediction and fundamental threat assessment algorithms using ego-vehicle measurements and tracked vehicles.

1.5 Autoliv

Autoliv was founded in Vårgårda in 1953 by the two brothers Lennart and Stig Lindblad as a dealership and repair shop for motor vehicles called Lindblads Autoservice. In 1956 they started manufacturing their first safety product, a two-point seatbelt, and in 1968 the company changed its name to Autoliv AB, which stands for AUTOservice Lindblad In Vårgårda. Autoliv AB and Morton ASP merged in 1997, and the company Autoliv Inc., the current company, was formed.

Today, Autoliv Inc. is one of the leading companies in automotive safety and produces a variety of safety products such as airbags, seatbelts and vision systems, among other things. The company has approximately 80 production plants in 29 countries and 10 technical centers in 9 countries. Autoliv has about 48,000 employees, of whom 4,400 work in Research, Development & Engineering.

Autoliv Electronics, a division of Autoliv, develops vision, night vision and radar systems as well as central electronic control units and satellite sensors. They have about 1500 employees primarily in France, Sweden, US and China.

In Linköping, Autoliv Electronics develops systems for night vision, mono vision, stereo vision and also some radar-vision fusion. The night vision system has features such as pedestrian detection and animal detection. The mono vision system features are lane departure warning, speed sign recognition, vehicle collision mitigation, pedestrian warning and high beam automation. The stereo system can do the same as the mono vision system but with higher accuracy and during low light conditions. The stereo system can also perform general object detection.

1.6 Outline

Below is an outline of this thesis with a short summary of the contents of each chapter:

Chapter 2 explains static fusion theory and results when applied to test data.
Chapter 3 explains dynamic fusion theory and results when applied to test data.
Chapter 4 quantitatively evaluates and compares the sensor fusion algorithms for multi-target tracking based on tracking evaluation metrics.
Chapter 5 extends situational awareness with path prediction and tracked vehicle threat assessment.
Chapter 6 summarizes the results from the previous chapters and discusses future work with some concluding remarks.


2 Static Fusion

In this chapter we describe and present the results of two static fusion algorithms. We assume that independent radar and vision tracking systems each provide a static estimate of relevant object states at each sample. By this we mean that for each scan the fusion algorithm associates vision and radar tracks and fuses them as if they were static estimates of the target state. In Section 2.1 we describe the modeling work done to use the static fusion framework, in Section 2.2 how the track-to-track association problem is solved and in Section 2.3 the different algorithms for performing the actual fusion of the estimates. Section 2.4 describes how the algorithms are tuned, and finally in Section 2.5 we present the results when applying the algorithms to test data.

2.1 Modeling

In this chapter we assume tracked objects received from the vision and radar systems are static estimates of each target's state. The state of each target consists of longitudinal position ($p_x^{T_iE}$), longitudinal velocity over ground ($v_x^{T_iW}$), lateral position ($p_y^{T_iE}$) and lateral velocity over ground ($v_y^{T_iW}$), with vector notation

$$x^{T_i} = \begin{pmatrix} p_x^{T_iE} \\ v_x^{T_iW} \\ p_y^{T_iE} \\ v_y^{T_iW} \end{pmatrix}.$$

The estimates received from the vision system for each scan will be denoted by $\hat{x}_v^{T_i}$, $i = 1, \dots, N$, and each radar scan by $\hat{x}_r^{T_j}$, $j = 1, \dots, M$. We further assume that these are multivariate normally distributed with covariance matrices $P_v^{T_i}$ and $P_r^{T_j}$ respectively. We assume the covariance matrices to be unknown, as these might not be supplied if we have no control over the vision or radar tracking systems. This means that the covariance must either be estimated or modeled.

2.1.1 Covariance Matrix

As no information is kept between different scans, we make an assumption on the structure and parameters of the covariance matrix. We have tried two different structures. The first, tuning with noise parameters in cylindrical coordinates, assumes that the covariance matrix, given the angle $\varphi$ to the target, is given by

$$P = \begin{pmatrix} V D_p V^T & 0 \\ 0 & V D_v V^T \end{pmatrix}, \tag{2.1}$$

where

$$V = \begin{pmatrix} \cos\varphi & \sin\varphi \\ \sin\varphi & -\cos\varphi \end{pmatrix}, \quad D_p = \begin{pmatrix} \sigma_r & 0 \\ 0 & \sigma_\varphi \end{pmatrix}, \quad D_v = \begin{pmatrix} \sigma_{\dot r} & 0 \\ 0 & \sigma_{\dot\varphi} \end{pmatrix}.$$

The parameters $\sigma_r$, $\sigma_\varphi$, $\sigma_{\dot r}$, $\sigma_{\dot\varphi}$ are tuning parameters. The second structure is for tuning noise parameters in a Cartesian coordinate system and is given by

$$P = \begin{pmatrix} \sigma_{p_x} & 0 & 0 & 0 \\ 0 & \sigma_{p_y} & 0 & 0 \\ 0 & 0 & \sigma_{v_x} & 0 \\ 0 & 0 & 0 & \sigma_{v_y} \end{pmatrix}, \tag{2.2}$$

where $\sigma_{p_x}$, $\sigma_{p_y}$, $\sigma_{v_x}$, $\sigma_{v_y}$ are the tuning parameters. The first structure lets us tune with the relative strengths of radar and vision measurements in consideration. However, some approximations are made prior to our algorithm regarding radar track velocity, i.e. only range rate is estimated, but the longitudinal velocity is approximated as the x-component of the range rate vector and the lateral velocity is approximated as zero. This is the reason why we decided to also try the standard diagonal covariance matrix.
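To make the two structures concrete, the following Python/NumPy sketch builds (2.1) and (2.2) for a given bearing angle, in the (position, velocity)-blocked ordering written above. This is an illustration only; the thesis implementation was done in MATLAB, and the parameter values below are hypothetical, not the tuned values used later.

    import numpy as np

    def covariance_cylindrical(phi, s_r, s_phi, s_rdot, s_phidot):
        """Structure (2.1): tuning in cylindrical coordinates, rotated by bearing phi."""
        V = np.array([[np.cos(phi), np.sin(phi)],
                      [np.sin(phi), -np.cos(phi)]])
        Dp = np.diag([s_r, s_phi])          # position block
        Dv = np.diag([s_rdot, s_phidot])    # velocity block
        P = np.zeros((4, 4))
        P[:2, :2] = V @ Dp @ V.T
        P[2:, 2:] = V @ Dv @ V.T
        return P

    def covariance_cartesian(s_px, s_py, s_vx, s_vy):
        """Structure (2.2): plain diagonal tuning in Cartesian coordinates."""
        return np.diag([s_px, s_py, s_vx, s_vy])

    # Hypothetical tuning values for illustration only
    P_cyl = covariance_cylindrical(phi=0.1, s_r=1.0, s_phi=4.0, s_rdot=0.5, s_phidot=2.0)
    P_cart = covariance_cartesian(2.0, 2.0, 1.0, 1.0)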

2.2 Track Association

By track association we mean the problem of, at each scan, finding the radar and vision tracks that correspond to the same object. At each scan the vision tracker outputs estimates $\hat{x}_v^{T_i}$, $i = 1, \dots, N$, and the radar tracker $\hat{x}_r^{T_j}$, $j = 1, \dots, M$. This means that we can get at most $\min(M, N)$ fused objects. The approach used can be seen in Algorithm 1. It is basically a standard data association approach with statistical distance calculation and optimal assignment with the auction algorithm, as explained in e.g. Blackman and Popoli [1999]. In this thesis we have considered and implemented two different statistical distances, the first being the Mahalanobis distance ($D_M$) and the second the Bhattacharyya distance ($D_B$). Both these measures describe the distance, in some sense, between two random vectors. Thresholding was also used to prevent association if the corresponding distance was too large. Assignment is performed by the auction algorithm, which minimizes the sum of distances of associated objects given that at most one association can be made per track.

Algorithm 1: Track Association

Input: Vision estimates $\hat{x}_v^{T_i}$, $i = 1, \dots, N$, and radar estimates $\hat{x}_r^{T_j}$, $j = 1, \dots, M$, with respective covariance matrices $P_v^{T_i}$ and $P_r^{T_j}$.
Output: Logical association matrix $A$, $M \times N$.

1. Distance: for $i \leftarrow 1$ to $N$, for $j \leftarrow 1$ to $M$, compute

$$D_{ij} = D_M = \left(\hat{x}_v^{T_i} - \hat{x}_r^{T_j}\right)^T P^{-1} \left(\hat{x}_v^{T_i} - \hat{x}_r^{T_j}\right)$$

or

$$D_{ij} = D_B = \frac{1}{8}\left(\hat{x}_v^{T_i} - \hat{x}_r^{T_j}\right)^T P^{-1} \left(\hat{x}_v^{T_i} - \hat{x}_r^{T_j}\right) + \frac{1}{2}\ln\left(\frac{\det(P)}{\sqrt{\det\left(P_v^{T_i}\right)\det\left(P_r^{T_j}\right)}}\right), \quad \text{where } P = \frac{P_v^{T_i} + P_r^{T_j}}{2}.$$

If $D_{ij} \geq$ threshold, set $D_{ij} = \infty$.

2. Optimal assignment: $A = \operatorname{auction}(D)$, where $A_{ij} = 1$ if radar estimate $j$ is assigned to vision estimate $i$ and $A_{ij} = 0$ otherwise.
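The following Python sketch illustrates Algorithm 1 under a few stated substitutions: the thesis implementation is in MATLAB and uses the auction algorithm, whereas here SciPy's linear_sum_assignment solves the same minimum-sum assignment problem, gated pairs are given a large finite cost instead of infinity, and the covariance P is taken as the average of the two track covariances as written above. Function names and the interface are our own.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def pairwise_distance(x_v, P_v, x_r, P_r, use_bhattacharyya=False):
        """Statistical distance between one vision and one radar estimate."""
        d = x_v - x_r
        P = 0.5 * (P_v + P_r)
        maha = d @ np.linalg.solve(P, d)
        if not use_bhattacharyya:
            return maha                      # D_M
        log_term = 0.5 * np.log(np.linalg.det(P) /
                                np.sqrt(np.linalg.det(P_v) * np.linalg.det(P_r)))
        return maha / 8.0 + log_term         # D_B

    def associate(vision, radar, threshold, use_bhattacharyya=False):
        """vision/radar: lists of (x_hat, P) tuples. Returns associated (i, j) pairs."""
        N, M = len(vision), len(radar)
        D = np.full((N, M), 1e9)             # large cost stands in for infinity
        for i, (xv, Pv) in enumerate(vision):
            for j, (xr, Pr) in enumerate(radar):
                d = pairwise_distance(xv, Pv, xr, Pr, use_bhattacharyya)
                if d < threshold:            # gating
                    D[i, j] = d
        rows, cols = linear_sum_assignment(D)
        return [(i, j) for i, j in zip(rows, cols) if D[i, j] < threshold]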

2.3 Fusion

This section describes the two different algorithms used to perform the actual fusion of the estimates. The first one assumes independent estimates and the second one does not.

2.3.1 Fusion of Independent Estimates

Assuming that the radar and vision estimates are independent, the Weighted Least Squares (WLS) solution for each target, see Gustafsson [2010], is given by

$$P = \left(P_v^{-1} + P_r^{-1}\right)^{-1}, \tag{2.3a}$$
$$\hat{x} = P\left(P_v^{-1}\hat{x}_v + P_r^{-1}\hat{x}_r\right). \tag{2.3b}$$

This is referred to as General Fusion (GF).
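A minimal NumPy sketch of (2.3), assuming the two estimates really are independent (the function name and interface are ours):

    import numpy as np

    def general_fusion(x_v, P_v, x_r, P_r):
        """Weighted least squares fusion (2.3) of two independent estimates."""
        I_v = np.linalg.inv(P_v)            # information matrices
        I_r = np.linalg.inv(P_r)
        P = np.linalg.inv(I_v + I_r)        # (2.3a)
        x = P @ (I_v @ x_v + I_r @ x_r)     # (2.3b)
        return x, P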

2.3.2 Fusion of Dependent Estimates

The independence assumption does not necessarily hold, as the vision and radar estimates are based on tracking of the same objects and we assume that we have no insight into the vision and radar trackers. To be on the safe side we also decided to implement and evaluate an algorithm that can handle dependent estimates. We decided to try the Safe Fusion (SF) algorithm, Gustafsson [2010], which requires no extra information regarding correlation. This algorithm is described in Algorithm 2, which uses the singular value decomposition (SVD) and a change of basis to fuse the two estimates. Essentially this corresponds to picking the best estimate in each dimension of this new basis.

Algorithm 2: Safe Fusion

Input: Two unbiased estimates $\hat{x}_v$ and $\hat{x}_r$ with corresponding information matrices $\mathcal{I}_v = P_v^{-1}$ and $\mathcal{I}_r = P_r^{-1}$.
Output: Fused estimate $\hat{x}$ and covariance matrix $P$.

1. SVD: $\mathcal{I}_v = U_1 D_1 U_1^T$
2. SVD: $D_1^{-1/2} U_1^T \mathcal{I}_r U_1 D_1^{-1/2} = U_2 D_2 U_2^T$
3. Transformation matrix: $T = U_2^T D_1^{1/2} U_1^T$
4. State transformation: $\hat{\bar{x}}_v = T\hat{x}_v$, $\hat{\bar{x}}_r = T\hat{x}_r$, with covariances $\operatorname{Cov}(\hat{\bar{x}}_v) = I$ and $\operatorname{Cov}(\hat{\bar{x}}_r) = D_2^{-1}$ respectively.
5. For $i \leftarrow 1$ to $n_x$, let

$$\hat{\bar{x}}^i = \begin{cases} \hat{\bar{x}}_v^i, & \text{if } D_{2,ii} < 1, \\ \hat{\bar{x}}_r^i, & \text{if } D_{2,ii} \geq 1, \end{cases} \qquad D_{ii} = \begin{cases} 1, & \text{if } D_{2,ii} < 1, \\ D_{2,ii}, & \text{if } D_{2,ii} \geq 1, \end{cases}$$

where $D$ is diagonal.
6. Inverse state transformation:

$$\hat{x} = T^{-1}\hat{\bar{x}}, \qquad P = T^{-1} D^{-1} T^{-T}.$$
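A sketch of Algorithm 2 in NumPy. An eigendecomposition is used in place of the SVD, which is equivalent for the symmetric positive definite information matrices involved; the function name and interface are our own.

    import numpy as np

    def safe_fusion(x_v, P_v, x_r, P_r):
        """Algorithm 2 sketch: fusion that is safe under unknown correlation."""
        I_v = np.linalg.inv(P_v)
        I_r = np.linalg.inv(P_r)
        # Step 1: I_v = U1 diag(d1) U1^T (eigendecomposition of a symmetric PD matrix)
        d1, U1 = np.linalg.eigh(I_v)
        D1_isqrt = np.diag(1.0 / np.sqrt(d1))
        # Step 2: factorize the transformed radar information matrix
        d2, U2 = np.linalg.eigh(D1_isqrt @ U1.T @ I_r @ U1 @ D1_isqrt)
        # Step 3: change of basis in which Cov(x_v) = I and Cov(x_r) = diag(1/d2)
        T = U2.T @ np.diag(np.sqrt(d1)) @ U1.T
        xv_t, xr_t = T @ x_v, T @ x_r
        # Step 5: keep the lower-variance component in each dimension
        take_radar = d2 >= 1.0
        x_t = np.where(take_radar, xr_t, xv_t)
        d = np.where(take_radar, d2, 1.0)
        # Step 6: transform back
        T_inv = np.linalg.inv(T)
        return T_inv @ x_t, T_inv @ np.diag(1.0 / d) @ T_inv.T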

2.4 Tuning

We decided to use the Optimal Subpattern Assignment (OSPA) metric for tuning. Multi-target tracking evaluation using the OSPA metric described in Ristic et al. [2010] can be used to measure the distance between the set of ground truth tracks and the estimated tracks given by a multi-target tracking algorithm. Averaging the OSPA distance over the tracks gives a scalar value which can be used for tuning all the parameters in a target tracking system. A good property of the OSPA metric is that it can account not only for localization errors but also for cardinality and labeling errors.

Largely following the notation in Ristic et al. [2010], we define a track $X$ on the discrete-time points $t = (t_1, \dots, t_K)$ as a labeled sequence of length $K$:

$$X = (X_1, X_2, \dots, X_K), \tag{2.4}$$

where, given the indicator $1_k$ which takes the value one if the track exists at time $t_k$ and zero otherwise, $X_k$, $k = 1, \dots, K$, is given by

$$X_k = \begin{cases} \emptyset, & \text{if } 1_k = 0, \\ \{(\ell, x_k)\}, & \text{if } 1_k = 1. \end{cases} \tag{2.5}$$

In this case $\ell$ denotes the track label, which is unique and constant for each track. We continue by defining the set of true tracks, $\mathbf{X}_k$, and estimated tracks, $\hat{\mathbf{X}}_k$, at $t_k$:

$$\mathbf{X}_k = \left\{\left(\ell_1, x_{k,1}\right), \dots, \left(\ell_m, x_{k,m}\right)\right\}, \tag{2.6}$$
$$\hat{\mathbf{X}}_k = \left\{\left(s_1, \hat{x}_{k,1}\right), \dots, \left(s_n, \hat{x}_{k,n}\right)\right\}, \tag{2.7}$$

where $m$ is the number of true tracks at $t_k$ and $n$ the number of estimated tracks. When $m \leq n$ the OSPA distance $D_{p,c}(\mathbf{X}_k, \hat{\mathbf{X}}_k)$ is defined as

$$D_{p,c}\left(\mathbf{X}_k, \hat{\mathbf{X}}_k\right) = \left(\frac{1}{n}\left(\min_{\pi \in \Pi_n} \sum_{i=1}^{m} d_c\left(\tilde{x}_{k,i}, \tilde{\hat{x}}_{k,\pi(i)}\right)^p + (n - m)\, c^p\right)\right)^{1/p}, \tag{2.8}$$

where $\tilde{x}_{k,i} \triangleq (\ell_i, x_{k,i})$, $\tilde{\hat{x}}_{k,\pi(i)} \triangleq (s_{\pi(i)}, \hat{x}_{k,\pi(i)})$ and

• $d(\tilde{x}, \tilde{\hat{x}})$ is the base distance between a true track and an estimated track at $t_k$;
• $d_c(\tilde{x}, \tilde{\hat{x}}) = \min\left(c, d(\tilde{x}, \tilde{\hat{x}})\right)$ is the cutoff distance between two tracks at $t_k$, where $c > 0$ is the cutoff parameter;
• $\Pi_n$ represents the permutations of length $m$ with elements from $\{1, 2, \dots, n\}$;
• $1 \leq p < \infty$ is the OSPA metric order parameter.

If $m > n$ the OSPA is defined as $D_{p,c}(\mathbf{X}, \hat{\mathbf{X}}) = D_{p,c}(\hat{\mathbf{X}}_k, \mathbf{X}_k)$, and if $\mathbf{X}_k = \hat{\mathbf{X}}_k = \emptyset$ the distance is zero. The base distance between two labeled vectors is given by

$$d\left(\tilde{x}, \tilde{\hat{x}}\right) = \left(d(x, \hat{x})^{p'} + d(\ell, s)^{p'}\right)^{1/p'}, \tag{2.9}$$

where

• $1 \leq p' < \infty$ is the base distance order parameter;
• $d(x, \hat{x})$ is the localization base distance, here the $p'$-norm;
• $d(\ell, s)$ is the labeling error, which is zero if $\ell = s$ and $\alpha$ otherwise.

Parameter $\alpha \in [0, c]$ decides the penalty assigned to the labeling error. To label the estimated tracks we solve the assignment problem resulting from minimizing an OSPA distance, as described in Ristic et al. [2010], and then label each estimated track with the same label as its corresponding true track. The remainder of the estimated tracks are given different unique labels.

For tuning we then, for constant $p$, $c$, $p'$, $\alpha$, minimize

$$\frac{1}{K}\sum_{k=1}^{K} D_{p,c}\left(\mathbf{X}_k, \hat{\mathbf{X}}_k\right), \tag{2.10}$$

i.e. the average OSPA distance for the whole data set, with respect to the tuning parameters, such as noise and initial covariances as well as track management parameters. The minimization was performed by trying different setups and choosing the best one.
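The sketch below evaluates the per-time-instant OSPA distance (2.8) in Python with a Euclidean base distance. For brevity it ignores the labeled base distance (2.9), so only localization and cardinality errors are scored, and the default cutoff c and order p are arbitrary illustration values rather than the ones used in the thesis.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def ospa(X, X_hat, c=10.0, p=1.0):
        """OSPA distance (2.8) between two sets of state vectors at one time instant.

        X, X_hat: arrays of shape (m, d) and (n, d). Labeling errors are ignored
        in this sketch; only localization and cardinality errors are scored.
        """
        m, n = len(X), len(X_hat)
        if m == 0 and n == 0:
            return 0.0
        if m > n:                                  # the metric is symmetric in its arguments
            X, X_hat, m, n = X_hat, X, n, m
        # Cutoff base distances between every true/estimated pair
        D = np.minimum(c, np.linalg.norm(X[:, None, :] - X_hat[None, :, :], axis=2))
        rows, cols = linear_sum_assignment(D ** p)  # optimal sub-pattern assignment
        loc = np.sum(D[rows, cols] ** p)
        card = (n - m) * c ** p
        return ((loc + card) / n) ** (1.0 / p)

The tuning objective (2.10) is then simply the mean of this value over all time instants of the data set.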

2.5 Results

2.5.1 Test Data

The fusion algorithms have been tested on three sequences of rural and highway data for illustration purposes, see below for a brief description:

Rural Road: rural road, one preceding and one oncoming car
Highway 1: highway, three preceding targets, two cars and one truck
Highway 2: highway, dense radar clutter, 10 preceding targets, eight cars and two trucks

The plots in the next section have been generated by defining the world-fixed coordinate system, $W$, with its origin at the ego-vehicle starting position. We have then found the ego-vehicle position by performing dead reckoning based on the known ego-vehicle speed and yaw rate. This is then used to find the true object and tracked object trajectories. For a more detailed discussion of how the reference (true) objects were estimated, see Chapter 4. Given the speed $v_e$ and yaw rate $\dot{\varphi}_e$ we get the ego-vehicle position $(p_{xe}, p_{ye})^T$ relative to $W$ as

$$p_{xe,k+1} = p_{xe,k} + T_s v_e \cos\varphi_k, \tag{2.11}$$
$$p_{ye,k+1} = p_{ye,k} + T_s v_e \sin\varphi_k, \tag{2.12}$$
$$\varphi_{k+1} = \varphi_k + T_s \dot{\varphi}_e, \tag{2.13}$$

and each target and true object position relative to $W$ as

$$p_{x,k}^{T_iW} = p_{xe,k} + p_{x,k}^{T_iE}, \tag{2.14}$$
$$p_{y,k}^{T_iW} = p_{ye,k} + p_{y,k}^{T_iE}, \tag{2.15}$$

where $(p_{x,k}^{T_iE}, p_{y,k}^{T_iE})^T$ is, bar a constant translation between the camera and front bumper (ego-vehicle) coordinate systems, equal to the tracked relative positions. Each test sequence is only approximately 12-30 seconds long, and exact positioning is not needed to evaluate the tracking performance, so the impact of numerical errors is acceptable.
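A small sketch of the dead reckoning in (2.11)-(2.13), in Python for illustration (the thesis used MATLAB); the function name and interface are our own.

    import numpy as np

    def dead_reckon(v_e, yawrate_e, Ts):
        """Integrate ego speed and yaw rate into a world-fixed trajectory, (2.11)-(2.13).

        v_e, yawrate_e: per-sample speed [m/s] and yaw rate [rad/s] arrays.
        Returns x, y positions and heading, starting at the origin of W.
        """
        K = len(v_e)
        px, py, phi = np.zeros(K + 1), np.zeros(K + 1), np.zeros(K + 1)
        for k in range(K):
            px[k + 1] = px[k] + Ts * v_e[k] * np.cos(phi[k])
            py[k + 1] = py[k] + Ts * v_e[k] * np.sin(phi[k])
            phi[k + 1] = phi[k] + Ts * yawrate_e[k]
        return px, py, phi

    # A target's ego-relative position is then shifted into W as in (2.14)-(2.15):
    # p_x_W = px[k] + p_x_E and p_y_W = py[k] + p_y_E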

The plots shown in the next section contain best-fit track trajectories for each true object trajectory.

2.5.2 Results and Discussion

We found that in general the diagonal covariance matrix structure gave better tracking results. The plots in this section have been made using the diagonal structure of the covariance matrix, tuning according to the OSPA metric as described in Section 2.4, and with the General Fusion (GF) algorithm. The Safe Fusion (SF) algorithm generates very similar plots and has not been included here. The differences between these two algorithms lie mainly in position and velocity estimation accuracy, see Chapter 4. The plots show the trajectories of the true objects, solid line, and the relevant tracks, dash-dot/dots/dashes, with the direction of movement depicted by a triangle at the start and end of an object or track.

The first plot depicts the Rural Road data scenario with one oncoming and one preceding target, see Figure 2.1. We can see that the GF algorithm seems to have problems tracking the oncoming target. The vision system's estimated position is somewhat off in the beginning, making the association with the more accurate radar track difficult. This is the case in general for oncoming vehicles, because of transients in the vision system, especially when it comes to velocity estimation. However, the preceding target, which is in front of the ego-vehicle during the whole data sequence, is tracked with good accuracy, as is shown in the figure.

The second plot, Figure 2.2, shows the first two preceding targets of the Highway 1 data set and their corresponding track trajectories. As we can see, the first track, a car, is tracked very well. However, the truck is not tracked for the first 100 or so meters, as the vision estimated position is much closer to the ego-vehicle than the actual position. Position is in general very difficult for a monovision system to estimate. This is because the position estimate relies on an assumption about the real width of the object, and truck widths can vary a lot. It might be possible to perform fusion in this case, as there is a radar track, but making the association would require us to raise the covariances significantly, which would be bad for generalization.

The third plot, Figure 2.3, shows the first four preceding targets of the Highway 2 data set and their corresponding track trajectories. As we can see the static fusion algorithms have no problem tracking the leading vehicles even though there is some radar clutter and several objects. The accuracy is not quite as good as in the previous two examples but the general trajectory is kept.

The last plot, Figure 2.4, shows the next two preceding targets of the Highway 2 data set and their corresponding track trajectories.


Figure 2.1: Rural Road with one oncoming and one preceding target. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is the ego-vehicle starting position.

Figure 2.2: Highway 1, first two preceding targets. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is the ego-vehicle starting position.


Figure 2.3: Highway 2, first four preceding targets. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is the ego-vehicle starting position.

Here we can see that there is a significant portion of the trajectory during which the object is not tracked. This is explained by the relatively long distance to the objects being tracked. They both start out more than 100 m ahead of the ego-vehicle and, because of high radar clutter and the unfavorable vision situation, they are not tracked until the ego-vehicle starts catching up. However, after track initiation is made the objects are tracked with good accuracy.

As we can see in Figure 2.5, the static fusion algorithms have a weakness in that they require both the vision and radar tracks to be of decent to good quality. This figure shows a scene, frame number 196, from the Highway 1 data scenario with the camera picture to the right and the bird's eye view to the left. The tracks are marked r, v and f and colored blue, magenta and green for radar, vision and fused tracks respectively. In this scene there are two leading vehicles, a car and a truck. We can see that the car is tracked accurately by the fusion algorithm. However, because of the large distance between the vision and radar track for the truck, no association is made and hence there is no corresponding fused track. Of course this problem could be solved by increasing the covariances, but this is a trade-off against the number of false associations that is acceptable.

All in all, static fusion can significantly help with the position and velocity estimation when a radar-vision track association is made. However, it also suffers from the drawbacks of both the radar and the vision system. This includes, amongst other things, bad weather conditions for the vision system and guard rail clutter for the radar system.


Figure 2.4: Highway 2, next two preceding targets. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is the ego-vehicle starting position.

Figure 2.5: An example of a difficult situation for the static fusion algorithms. In the bird's eye view to the left, blue rectangles are radar tracks (r), magenta are vision tracks (v) and green are fused tracks (f). The lines from the tracks show the velocity.


Because of these reasons it is our opinion that this fusion algorithm concept needs to be complemented somehow if it is to be used in safety critical applications. One way could be to output the union of all tracks, at least in the more important zones, like in front of the ego-vehicle.


3 Dynamic Fusion

In this chapter we describe the basic theory for multi-target tracking using Extended Kalman Filters (EKF) and then apply it in the decentralized fusion (DF) and centralized fusion (CF) algorithms. The decentralized algorithm takes tracked objects as inputs and considers them as measurements, which are then used in the tracking system. The centralized algorithm takes actual raw measurement data as input and performs tracking. In Section 3.1 we describe target tracking theory, and in Section 3.2 the algorithm specific models and state definitions. In Section 3.3 we present and discuss the results of the two fusion algorithms on test data.

3.1 Target Tracking

Target tracking is the act of inferring relevant information about an object of interest over time, given sensor data. This section gives an introduction to multi-target tracking with the EKF; for a more extensive treatment see e.g. Blackman and Popoli [1999].

3.1.1 Target Representation

Each target is represented by its state, $x$. The temporal evolution of the state is described by a difference equation ($f(\cdot)$) and the measurements, $y$, by a static equation ($h(\cdot)$); both with additive Gaussian noise, $w$ and $e$. In mathematical terms, given known input signals $u_k$:

$$x_{k+1} = f(x_k, u_k) + w_k,$$
$$y_k = h(x_k, u_k) + e_k.$$


For more information on this representation, also known as discrete state-space representation, see e.g. Blackman and Popoli [1999] or Gustafsson [2010].

3.1.2 Practical Issues

In general, the number of tracks, and which measurement belongs to which track, is unknown. The first problem we refer to as track handling and the second, data association. These problems are practical issues that must be addressed in multi-target tracking. There are many approaches and solutions to these problems, for a comprehensive description of these see Blackman and Popoli [1999]. Below, we describe the ones adopted for the decentralized and centralized tracking systems in this thesis.

Data Association

Measurements arrive from the sensors without any information on which measurement belongs to which track; this is the problem which data association solves. We assume that each track can be assigned at most one measurement, which means that the target is not extended, i.e. the resolution of the sensors is such that they will only give rise to one measurement per target per scan.

At each scan there are $N$ tracks and $M$ measurements available. Most data association methods create a distance matrix, $D$, where each element $D_{ij}$ consists of some statistical distance measure between measurement $y_i$ and track $x_j$. If association is unlikely the element is given a very high value. Then assignment is performed by solving the optimization problem

$$\min \sum_{i=1}^{M}\sum_{j=1}^{N} c_{ij} D_{ij}, \quad \text{subject to: } \sum_{i=1}^{M} c_{ij} = 1, \; j = 1, \dots, N, \quad \sum_{j=1}^{N} c_{ij} = 1, \; i = 1, \dots, M, \quad c_{ij} \in \{0, 1\},$$

where $c_{ij} = 1$ means that measurement $i$ is associated to track $j$. This method means that we minimize the sum of distances for associated tracks and measurements. For the decentralized and centralized fusion algorithms we decided to use the Global Nearest Neighbour (GNN) approach for data association. This means the distance is calculated as

$$D_{ij} = \underbrace{\left(y_i - h\left(x_j, u\right)\right)^T S_j^{-1} \left(y_i - h\left(x_j, u\right)\right)}_{\text{Mahalanobis distance, } d_{ij}} + \ln\left[\det\left(S_j\right)\right],$$

where $S_j$ is the innovation covariance of track $j$, see Section 3.1.3.
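A small Python sketch of this GNN association cost for one measurement-track pair; the innovation covariance S is the one computed in the EKF measurement update of Section 3.1.3, and the function name is our own.

    import numpy as np

    def gnn_cost(y, y_pred, S):
        """GNN association cost between a measurement y and a track whose
        predicted measurement is y_pred = h(x_hat, u), with innovation covariance S."""
        eps = y - y_pred
        d_maha = eps @ np.linalg.solve(S, eps)      # Mahalanobis distance
        return d_maha + np.log(np.linalg.det(S))    # log-det term penalizes uncertain tracks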


Track Handling

Track handling solves the problem of discerning true tracks from false objects, spurious measurements and clutter. To do this we employ a kind of filter, or finite-state machine, where each track is assigned a track state, which can take the following values:

Initialized: first state, a measurement has initialized a new track
Pre-confirmed: second state, the track has several measurements assigned
Confirmed: third state, the track is confirmed as a true track

For each state we employ a standard M/N logic for upgrading or downgrading a track. This means that out of the latest N samples, at least M measurements of the relevant type must be assigned to the track (see Section 3.1.2) for it to be in the relevant state. A track can only be upgraded or downgraded one step at each sample, so it cannot go directly from initialized to confirmed, for example. However, a track can be dropped directly from either the initialized state or the pre-confirmed state if enough missed detections occur. A dropped track is deleted.
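A minimal Python sketch of the track states and the M/N confirmation logic. The state names follow the text above, while the class interface, the choice M = 3, N = 5 and the miss limit are hypothetical illustration values; downgrading of confirmed tracks is omitted for brevity.

    from collections import deque

    # Track states used in the thesis
    INITIALIZED, PRE_CONFIRMED, CONFIRMED = 0, 1, 2

    class TrackLogic:
        """M-of-N logic: a track is upgraded one step when at least M of the last N
        samples had an associated measurement, and dropped on too many misses."""

        def __init__(self, M=3, N=5, max_misses=3):
            self.state = INITIALIZED
            self.history = deque(maxlen=N)   # 1 = measurement associated, 0 = missed
            self.M, self.max_misses = M, max_misses
            self.misses = 0

        def update(self, associated: bool):
            self.history.append(1 if associated else 0)
            self.misses = 0 if associated else self.misses + 1
            if self.misses >= self.max_misses and self.state < CONFIRMED:
                return "drop"                # unconfirmed tracks are deleted directly
            if sum(self.history) >= self.M and self.state < CONFIRMED:
                self.state += 1              # upgrade at most one step per sample
            return "keep"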

3.1.3 Filter Theory

Here we give a quick introduction to nonlinear filtering using the EKF, based on the state-space representation in Section 3.1.1. This approach consists of doing two things at each sample: first a time update and then a measurement update. For more information on the EKF and nonlinear filtering see e.g. Gustafsson [2010].

Measurement Update

When data association, as described in Section 3.1.2, has been performed we need to update the tracks with their associated measurements. Let $\hat{x}_{k|k-1}$ denote the estimate at time $k$ given all measurements up to time $k-1$, and $P_{k|k-1}$ its corresponding covariance. The update is then performed with the following computations:

$$S_k = \nabla_x h\left(\hat{x}_{k|k-1}, u_k\right) P_{k|k-1} \nabla_x h\left(\hat{x}_{k|k-1}, u_k\right)^T + R_k,$$
$$K_k = P_{k|k-1} \nabla_x h\left(\hat{x}_{k|k-1}, u_k\right)^T S_k^{-1},$$
$$\varepsilon_k = y_k - h\left(\hat{x}_{k|k-1}, u_k\right),$$
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \varepsilon_k,$$
$$P_{k|k} = P_{k|k-1} - P_{k|k-1} \nabla_x h\left(\hat{x}_{k|k-1}, u_k\right)^T S_k^{-1} \nabla_x h\left(\hat{x}_{k|k-1}, u_k\right) P_{k|k-1}.$$


Time Update

The time update consists of predicting the value at time k + 1 given all values up until k. This results in the following computations:

$$\hat{x}_{k+1|k} = f\left(\hat{x}_{k|k}, u_k\right),$$
$$P_{k+1|k} = \nabla_x f\left(\hat{x}_{k|k}, u_k\right) P_{k|k} \nabla_x f\left(\hat{x}_{k|k}, u_k\right)^T + Q_k.$$
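The two updates can be summarized in a compact Python sketch; a numerical Jacobian is used here for generality, and the function names and interface are our own rather than the thesis implementation.

    import numpy as np

    def jacobian(func, x, u, eps=1e-6):
        """Numerical Jacobian of func(x, u) with respect to x (x is a float array)."""
        x = np.asarray(x, dtype=float)
        fx = func(x, u)
        J = np.zeros((len(fx), len(x)))
        for i in range(len(x)):
            dx = np.zeros_like(x)
            dx[i] = eps
            J[:, i] = (func(x + dx, u) - fx) / eps
        return J

    def ekf_measurement_update(x_pred, P_pred, y, u, h, R):
        H = jacobian(h, x_pred, u)
        S = H @ P_pred @ H.T + R                 # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        eps_k = y - h(x_pred, u)                 # innovation
        x = x_pred + K @ eps_k
        P = P_pred - K @ S @ K.T                 # equivalent to the covariance form above
        return x, P, S

    def ekf_time_update(x, P, u, f, Q):
        F = jacobian(f, x, u)
        return f(x, u), F @ P @ F.T + Q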

3.1.4 General Algorithm

The general algorithm used for both the decentralized and centralized fusion approaches in the coming two chapters is based on Algorithm 3.

Algorithm 3: Multi-target Tracking Algorithm

Input: Measurements from the radar and vision systems.
Output: Tracked targets.

Initialization: $\hat{x}_{0|0}$, $P_{0|0}$

Time recursion:
1. Time update, see Section 3.1.3
2. Measurement update:
   • Data association, see Section 3.1.2
   • Compute the measurement update, see Section 3.1.3
3. Track handling:
   • Update track states, see Section 3.1.2
   • Initialize new tracks based on unassigned measurements
   • Drop old tracks without enough measurements
4. k := k + 1

3.2 Algorithm Specific Information

In this section we will describe algorithm specific models and practical issues for the two implemented dynamic fusion approaches.

3.2.1 Modeling

Decentralized Fusion State

The state of each target consists of longitudinal position ($p_x^{T_iE}$), longitudinal velocity over ground ($v_x^{T_iW}$), lateral position ($p_y^{T_iE}$) and lateral velocity over ground ($v_y^{T_iW}$), with vector notation

$$x^{T_i} = \begin{pmatrix} p_x^{T_iE} \\ v_x^{T_iW} \\ p_y^{T_iE} \\ v_y^{T_iW} \end{pmatrix}. \tag{3.1}$$


Centralized Fusion State

The state of each target consists of longitudinal position ($p_x^{T_iEC}$) relative to the camera, longitudinal velocity over ground ($v_x^{T_iW}$), lateral position ($p_y^{T_iEC}$) relative to the camera, lateral velocity over ground ($v_y^{T_iW}$), real world width of the object ($l^{T_i}$) and road height ($r_h^{T_i}$), with vector notation

$$x^{T_i} = \begin{pmatrix} p_x^{T_iEC} \\ v_x^{T_iW} \\ p_y^{T_iEC} \\ v_y^{T_iW} \\ l^{T_i} \\ r_h^{T_i} \end{pmatrix}. \tag{3.2}$$

Road height is defined as the distance between expected vertical position of the target given the flat-road assumption and the actual vertical position of the target.

Motion Model

As the focus of the thesis is not on modeling car dynamics, we decided to use a simple constant velocity model for prediction purposes. Both algorithms use the same model to propagate position and velocity. We assume that the ego-vehicle's speed and yaw rate, $u_k = (v_e, \dot{\varphi}_e)^T$, are known input signals to the system, and further that the process noise, $w_k$, is white, Gaussian and zero-mean. Then, by augmenting the state vector in (3.1) (or (3.2)) with a 1, i.e. $x^{T_i} = \left(p_x^{T_iE}, v_x^{T_iW}, p_y^{T_iE}, v_y^{T_iW}, 1\right)^T$, we get the motion model in homogeneous coordinates as

$$x_{k+1}^{T_i} = E(u_k, T_s)\, T(T_s)\, x_k^{T_i} + \begin{pmatrix} w_k \\ 0 \end{pmatrix}, \tag{3.3}$$

where the matrices $E$ and $T$ are defined as

$$T(T_s) = \begin{pmatrix} 1 & T_s & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & T_s & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad E(u_k, T_s) = \begin{pmatrix} r_c & 0 & r_s & 0 & t_x \\ 0 & r_c & 0 & r_s & 0 \\ -r_s & 0 & r_c & 0 & t_y \\ 0 & -r_s & 0 & r_c & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \tag{3.4}$$

with

$$r_s = \sin\dot{\varphi}_e T_s, \quad r_c = \cos\dot{\varphi}_e T_s, \quad \Delta x_e = v_e \dot{\varphi}_e^{-1}\sin\dot{\varphi}_e T_s, \quad \Delta y_e = v_e \dot{\varphi}_e^{-1}\left(1 - \cos\dot{\varphi}_e T_s\right),$$
$$t_x = -\Delta y_e \sin\dot{\varphi}_e T_s - \Delta x_e \cos\dot{\varphi}_e T_s, \quad t_y = \Delta x_e \sin\dot{\varphi}_e T_s - \Delta y_e \cos\dot{\varphi}_e T_s,$$

i.e. $t_x$ and $t_y$ express the ego-vehicle displacement during one sample, rotated into the new ego-vehicle frame. For the constant velocity model the process noise is $w_k \sim \mathcal{N}(0, Q_k)$, where

$$Q_k = \begin{pmatrix} \sigma_{\ddot{x}}^2\, G(T_s) G(T_s)^T & 0 \\ 0 & \sigma_{\ddot{y}}^2\, G(T_s) G(T_s)^T \end{pmatrix}, \qquad G(T_s) = \begin{pmatrix} \tfrac{1}{2}T_s^2, & T_s \end{pmatrix}^T.$$

For more details regarding the motion model and derivation see Maehlisch et al. [2006].
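For illustration, the Python sketch below builds T(Ts) and E(u_k, Ts) from (3.4) and applies one prediction step of (3.3) without process noise. The guard for very small yaw rates is our own addition to avoid division by zero; otherwise the expressions follow the equations above.

    import numpy as np

    def cv_motion_matrices(v_e, yawrate_e, Ts):
        """Matrices T(Ts) and E(u_k, Ts) of the ego-compensated constant velocity
        model (3.3)-(3.4), for the homogeneous state (p_x, v_x, p_y, v_y, 1)."""
        T = np.array([[1, Ts, 0, 0, 0],
                      [0, 1, 0, 0, 0],
                      [0, 0, 1, Ts, 0],
                      [0, 0, 0, 1, 0],
                      [0, 0, 0, 0, 1]], dtype=float)
        a = yawrate_e * Ts
        rs, rc = np.sin(a), np.cos(a)
        if abs(yawrate_e) < 1e-6:                 # straight-driving limit
            dx, dy = v_e * Ts, 0.0
        else:
            dx = v_e / yawrate_e * np.sin(a)      # ego displacement during one sample
            dy = v_e / yawrate_e * (1.0 - np.cos(a))
        tx = -dy * rs - dx * rc
        ty = dx * rs - dy * rc
        E = np.array([[rc, 0, rs, 0, tx],
                      [0, rc, 0, rs, 0],
                      [-rs, 0, rc, 0, ty],
                      [0, -rs, 0, rc, 0],
                      [0, 0, 0, 0, 1]], dtype=float)
        return T, E

    def cv_predict(x_hom, v_e, yawrate_e, Ts):
        """One prediction step of (3.3), without process noise."""
        T, E = cv_motion_matrices(v_e, yawrate_e, Ts)
        return E @ T @ x_hom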

The state vector in the centralized fusion algorithm is augmented with $l^{T_i}$ and $r_h^{T_i}$. These are constant under the flat road assumption, which gives us the extra difference equations

$$l_{k+1}^{T_i} = l_k^{T_i} + w_{l,k}, \qquad r_{h,k+1}^{T_i} = r_{h,k}^{T_i} + w_{r_h,k},$$

where we assume that the process noise for these new states is zero-mean Gaussian, i.e.

$$\begin{pmatrix} w_{l,k} \\ w_{r_h,k} \end{pmatrix} \sim \mathcal{N}\left(0, \operatorname{diag}\left(\sigma_l^2, \sigma_{r_h}^2\right)\right).$$

Measurement Models

In the decentralized fusion case the received information from each system is an estimate of the full state vector. With the state vector defined in Section 3.2.1, the resulting function, $h(\cdot)$, is given by

$$y_k^{T_i} = x_k^{T_i} + e_k, \tag{3.5}$$

where the measurement noise is assumed to be Gaussian white noise, $e_k \sim \mathcal{N}(0, R_k)$.

In the centralized fusion case the received measurement information consists of raw radar and vision data. The radar outputs range $r_k$ and bearing $\varphi_k$. The vision classification algorithms give a bounding box for the vehicle or pedestrian, giving us information regarding horizontal and vertical central pixel position and pixel width. If the quality of classification is good, or if, with the flat road assumption, the object is close enough, we also receive an estimate of the real width.

The measurement equations for the radar detections are given by

$$y_{r,k}^{T_i} = \sqrt{\left(p_{x,k}^{T_iEC} - t_x\right)^2 + \left(p_{y,k}^{T_iEC} - t_y\right)^2} + e_{r,k}, \qquad y_{\varphi,k}^{T_i} = \arctan\frac{p_{y,k}^{T_iEC} - t_y}{p_{x,k}^{T_iEC} - t_x} + e_{\varphi,k},$$

where $t_x$, $t_y$ are the x- and y-components of the vector relating the position of the camera coordinate system, $EC$, to that of the radar system, $ER$, given that the camera and radar are aligned. The measurement noise, $e_{(\cdot),k}$, is assumed to be zero-mean Gaussian.


For the vision system we get

$$y_{h_c,k}^{T_i} = -f\,\frac{p_{y,k}^{T_iEC}}{p_{x,k}^{T_iEC}} + e_{h_c,k}, \qquad y_{l_p,k}^{T_i} = f\,\frac{l_k^{T_i}}{p_{x,k}^{T_iEC}} + e_{l_p,k}, \qquad y_{v_c,k}^{T_i} = f\,\frac{r_{h,k}^{T_i}}{p_{x,k}^{T_iEC}} + e_{r_h,k},$$
$$y_{l_c,k}^{T_i} = l_k^{T_i} + e_{l_c,k}, \qquad y_{l_f,k}^{T_i} = l_k^{T_i} + e_{l_f,k},$$

where $f$ is the focal length of the camera and again the measurement noise, $e_{(\cdot),k}$, is assumed to be zero-mean Gaussian.
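The Python sketch below collects the centralized measurement models as plain functions, to be linearized by the EKF of Section 3.1.3. The camera-to-radar translation argument and the use of arctan2 (a numerically safer equivalent of the arctangent above) are our own assumptions, and only a single real-world width measurement is included, whereas the text lists two.

    import numpy as np

    def radar_measurement(x, t_cam_to_radar):
        """Predicted radar range and bearing for the centralized state
        (p_x, v_x, p_y, v_y, l, r_h), positions relative to the camera."""
        px, py = x[0], x[2]
        tx, ty = t_cam_to_radar              # camera-to-radar translation, sensors aligned
        dx, dy = px - tx, py - ty
        return np.array([np.hypot(dx, dy),   # range
                         np.arctan2(dy, dx)])  # bearing

    def vision_measurement(x, f):
        """Predicted vision measurements: horizontal image position, pixel width,
        vertical image position (via road height) and real-world width; f = focal length."""
        px, py, l, rh = x[0], x[2], x[4], x[5]
        return np.array([-f * py / px,       # horizontal centre position
                         f * l / px,         # pixel width
                         f * rh / px,        # vertical centre position
                         l])                 # real-world width estimate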

3.2.2 Practical Issues

The practical issues for a tracking system include, but are not limited to, track handling, data association and a general algorithm; all described in Section 3.1. Another issue in this case is that the radar and vision systems are sampled at different frequencies, the radar slightly slower than the vision system. This we have solved by simply updating with measurements as they become available, resulting in nonuniform sampling.

As target tracking for ADAS has real-time considerations and requires speedy and accurate tracking, we decided that saving the last 0.5 seconds of measurement history for each track and each measurement system, i.e. radar and vision, was adequate for our purposes. As the radar system tends to give a lot of extra detections on non-vehicle objects, we have designed the system so that a track cannot be confirmed unless vision measurements are associated to the track.

In the centralized case, the radar measurement always consists of two values, while the vision measurement can consist of three to five values. When updating a track using vision we only use the relevant information, i.e. we check which measurements are available and accurate at each scan, as described in the previous section. This is because some measurements are only valid under certain circumstances, e.g. the width estimate is only valid if the target is close enough.

Data association is solved as described in Section 3.1, with a GNN approach. Gating is performed with the Mahalanobis distance and a chi-squared test.

The general algorithm used for tracking follows the one described in Section 3.1 with the above modifications.


Figure 3.1: DF algorithm on Rural Road with one oncoming and one preceding target. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is the ego-vehicle starting position.

3.3 Results

The same test data sequences as in Section 2.5 have been used; for a more detailed description of plot generation and tuning see Chapter 2. We have split this section into three subsections. We first discuss and showcase the qualitative results of the two dynamic fusion algorithms separately and then make a few comments comparing the two. In Chapter 4 all algorithms are evaluated extensively, comparing quantitative results from multi-target tracking evaluation metrics.

3.3.1 Decentralized Fusion

Figure 3.1 shows the Rural Road scenario with one preceding vehicle throughout the scenario as well as one oncoming vehicle for a shorter duration. As we can see, the more computationally heavy DF algorithm generates a trajectory very similar to the static fusion algorithms seen in the last chapter. The tracking accuracy on the preceding target even seems slightly worse. The similarities in this case are in line with what we expected, as it is a rather simple scenario where the inputs to the algorithms are the same.

The second figure, Figure 3.2, displays the results of the first two objects on the Highway 1 data set. Here is where we can really see the gains of the df algorithm over the static fusion versions. The truck, corresponding to track 2, has a track associated very early and with good position accuracy. This is because we have


Figure 3.2: df algorithm on Highway 1, first two preceding targets. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is ego-vehicle starting position.

increased the association distance between radar and vision tracks, which can be done without making the tracking worse in other situations. This is made possible by the averaging effects of filtering as well as the track handling principles, i.e. the finite-state machine with the states initiated, pre-confirmed and confirmed. The tracks shown and the algorithm outputs are only the confirmed tracks. The third figure, Figure 3.3, displays results on the Highway 2 data set. Only the first four preceding vehicles are shown in this plot. Here we can see that in general the df algorithm tracks well but has some problems with the objects corresponding to tracks 4 and 5.

The fourth figure, Figure 3.4, displays the results of the next two objects on the Highway 2 data set. Again the df algorithm has certain problems with broken tracks, i.e. it needs to initiate new tracks for a real object it is supposed to track because previous ones diverge. But the position accuracy is good when there is a track assigned. Track 3 in this plot continues for some time because it is actually swapped with a nearby object, a truck, which is not shown in this plot.

In the following figure, Figure 3.5, we show a scene from the Highway 2 data scenario with the camera picture to the right and the bird's-eye view to the left. This scene, frame number 250, contains five preceding vehicles that are of interest to an adas. The tracks are marked r, v and f and coloured blue, magenta and green for radar, vision and fused tracks, respectively. Out of the five vehicles, four cars and one


Figure 3.3: df algorithm on Highway 2, first four preceding targets. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is ego-vehicle starting position.


Figure 3.4: df algorithm on Highway 2, next two preceding targets. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is ego-vehicle starting position.


Figure 3.5: An example of a difficult situation for the df algorithm. In the bird's-eye view to the left blue rectangles are radar tracks (r), magenta are vision tracks (v) and green are fused tracks (f). Lines from the tracks indicate the velocity.

truck, we can see that four of the objects are accurately tracked by the df algorithm. We can see that there is an improvement over the static fusion algorithms when it comes to tracking trucks. The vision and radar measurements stemming from the truck have a fusion track, as is seen by the rectangle in the middle of the bird's-eye view. However, this picture also shows a weakness of the df algorithm in that it still relies on both radar and vision measurements being associated with a fusion track during a set of a few frames. The car in the ego-vehicle lane has no corresponding fusion track because no vision measurements have been associated with it, only radar measurements. Remember that we required at least one associated vision measurement, even with many good radar measurements, for a track to be confirmed. Here there is also a trade-off between letting a measurement be associated with a track more easily, i.e. increasing either the covariance or the threshold, and decreasing the number of false tracks.

3.3.2 Centralized Fusion

Figure 3.6 shows the Rural Road scenario with one preceding vehicle throughout the scenario as well as one oncoming vehicle for a shorter duration. Here we see a substantial improvement in the tracking of the oncoming target compared to the other algorithms. The algorithm's accuracy for the oncoming target looks a bit poor at the beginning of the track, but the difference is on the order of 0.5 m. The preceding vehicle is tracked with higher positional accuracy than by any of the other algorithms and with a quick initiation time.

Figure 3.7 displays the results of the first two objects on the Highway 1 data set. The algorithm does an excellent job of tracking both the car and the truck with


Figure 3.6: cf algorithm on Rural Road with one oncoming and one preceding target. True trajectory given by solid line, arrow markers denote start/end of a track and direction. (0, 0) is ego-vehicle starting position.

high accuracy.

Figure 3.8 displays the results of the first four objects on the Highway 2 data set. Also here the cf algorithm tracks the true objects in a timely and accurate fashion. However, we should mention that a few extra tracks were created for some of the bigger objects.

The last plot, Figure 3.9, shows the results of the next two objects on the Highway 2 data set. The trajectories of the tracks are basically on top of the true object trajectories. The object corresponding to track 2, which is quite far from the ego-vehicle in the beginning, takes some time before it is tracked.

In the last figure, Figure 3.10, we show a scene from the Highway 1 data scenario with the camera picture to the right and the bird's-eye view to the left. This scene, frame number 12, contains one preceding vehicle that is of interest to an adas. The radar measurements are marked by r and are blue, fusion tracks are marked by an f and are green. As we can see, the leading vehicle is tracked as expected. However, on the other side of the median an oncoming vehicle can be seen. The vision measurement for this vehicle has been falsely associated with radar measurements stemming from the guard rail. This is a problem that might be avoided by well-tuned algorithms.
