Degree Project, 30 credits, June 2015
Tracking Vehicles using Multiple
Detections from a Monocular Camera
Viktor Bäck
Faculty of Science and Technology, UTH unit
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student
Abstract
This thesis concerns image based tracking of vehicles using a monocular camera. A classifier is used to detect and classify objects in the images from the camera. For each detected object the classifier outputs several classifications, each including a confidence value. The objective of this thesis is to investigate how these classifications and confidence values can be used in a single target tracking framework in the best possible way. This is achieved by evaluating several tracking methods that utilize the classifications and confidence values in different ways. The relationship between the confidence values and the accuracy of the corresponding classifications is also investigated.
The methods are evaluated using data from real-world scenarios. It is found that classifications with high confidence values are more accurate on average than those with low confidence values. The differences in the average performance for the considered methods are found to be small.
Image based tracking of vehicles is a key component in active safety systems in vehicles. Such systems can warn the driver or automatically brake the vehicle if a collision is about to happen, thereby preventing accidents.
ISSN: 1401-5757, UPTEC F15 055
Examiner: Tomas Nyberg
Subject reviewer: Thomas Schön
Supervisors: Daniel Ankelhed, Niklas Ollesson
Summary
Driving a car is among the most dangerous things an average person does. The driver's safety depends both on her own and on others' ability to stay attentive in traffic. This causes problems, since the human ability to detect and react correctly in dangerous situations is inherently limited. As long as a human is driving the car, these problems will remain. Passive safety systems, such as seat belts and airbags, can mitigate the injuries resulting from a collision. Preventing the accident from occurring, however, requires so-called active safety systems, which is what this work is about.
In the same way that a human uses her senses, a car can use sensors to obtain information about its surroundings. Such sensors can for example be one or several cameras, or radar. The information from these sensors is then processed in stages. First, information is extracted about where objects of interest, such as cars and pedestrians, are located, and at what speed they are moving. Based on this information, one can then assess whether a collision is about to occur. If so, different strategies for avoiding the collision can be applied. For example, the driver can be warned by an audio signal, or the car can automatically apply the brakes.
In this work, different methods for determining the position and velocity of cars are investigated. The only sensor used is a camera. A classifier is used to detect and classify objects in the images from the camera. For each detected object the classifier outputs a confidence value. The goal of this work is to investigate how these classifications and confidence values can best be used for target tracking of cars. This is achieved by evaluating tracking methods that use the classifications and confidence values in different ways. The relationship between the confidence values and the accuracy of the corresponding classifications is also investigated.
The methods are evaluated using data from real-world scenarios. The evaluation shows that classifications with high confidence are on average more accurate than classifications with low confidence. The difference in average performance between the methods is shown to be small.
Acknowledgments
First of all I would like to thank my supervisors Daniel Ankelhed and Niklas Ollesson at Autoliv Electronics AB, as well as my subject reviewer at Uppsala University, Thomas Schön. You always pointed me in the right direction when I got off track.
Special thanks to Karl Granström and Gustaf Hendeby for taking your time and answering my questions. Your ideas and advice have been truly helpful.
Also, thank you Peter Hall and Jacob Roll for the opportunity to do my thesis at Autoliv. It has been really great to get to know Autoliv as a company, and all the wonderful people working there.
I would also like to thank David Molin, who was doing his thesis at Autoliv at the same time as me, for all the interesting conversations and for accompanying me on the climbing walls of Hangaren.
Last but definitely not least, I thank my family for their unfaltering support.
Linköping, June 2015 Viktor Bäck
Contents
Notation
1 Introduction
    1.1 Previous Work
    1.2 Problem Formulation
    1.3 Limitations
    1.4 Model
    1.5 Dataset
    1.6 Outline
2 Target Tracking
    2.1 Filtering
        2.1.1 Kalman Filter
        2.1.2 Extended Kalman Filter
    2.2 Data Association
        2.2.1 Gating
        2.2.2 Nearest Neighbor Association
        2.2.3 Probabilistic Data Association
            2.2.3.1 Association Probability
            2.2.3.2 State Update
        2.2.4 M/N Initiation and Deletion of Tracks
3 Methods
    3.1 Measurement Group Partitioning and Association
    3.2 Clustering Methods
        3.2.1 Average Cluster
        3.2.2 Simple Cluster
        3.2.3 Weighted Sum Cluster
        3.2.4 Nonlinear Weighted Sum Cluster
        3.2.5 Regression Cluster
    3.3 Probabilistic Data Association
4 Error Analysis
    4.1 Track-Marking Pairs
    4.2 Error Norms
        4.2.1 Tracking Error
        4.2.2 5-Largest Tracking Error
        4.2.3 Clustering Error
    4.3 Overtaking Scenarios
    4.4 Confidence Value Error Analysis
5 Results
    5.1 Method Evaluation
    5.2 Method Evaluation in Overtaking Scenarios
    5.3 Confidence Value Evaluation
6 Conclusions
    6.1 Overall Performance
        6.1.1 Simple Cluster
        6.1.2 Nonlinear Weighted Sum Cluster
        6.1.3 Weighted Sum Cluster
        6.1.4 Average Cluster
        6.1.5 Probabilistic Data Association
        6.1.6 Regression Cluster
    6.2 Performance in Overtaking Scenarios
    6.3 Summary
7 Future Work
Bibliography
Notation
Classification, measurement and region of interest (ROI) are words that will be used somewhat interchangeably in this thesis. Strictly speaking, however, a classification is transformed into a measurement, which corresponds to an ROI that can be drawn as a rectangle on the image. Also, image and frame both refer to a single image from the camera mounted on the ego vehicle.
All variables that represent vectors, matrices or sets are written in bold font.
Vectors are by default column vectors.
A table with the most frequently used abbreviations in this thesis is presented below.
Abbreviations
Abbreviation    Explanation
5-LTE           5-Largest Tracking Error
AC              Average Cluster
CE              Clustering Error
EKF             Extended Kalman Filter
GNN             Global Nearest Neighbour
KF              Kalman Filter
NN              Nearest Neighbour
PDA             Probabilistic Data Association
PDF             Probability Density Function
RC              Regression Cluster
ROI             Region of Interest
SC              Simple Cluster
SO              Spatial Overlap
SSM             State Space Model
TO              Temporal Overlap
TE              Tracking Error
WSC             Weighted Sum Cluster
1 Introduction
Active safety systems in vehicles is a field that has gained more and more attention in recent years as a result of embedded electronic systems becoming cheaper and more powerful. It is estimated that more than 90% of all vehicle accidents are caused by human error [1]. The potential of such systems to prevent injuries and save lives is therefore immense.
Systems are currently in use that are able to warn the driver if a threatening situation occurs, or brake the car using autonomous emergency braking (AEB) in order to prevent accidents. The possibility of fully autonomous cars sharing the road with human drivers has also gained a lot of attention in the media lately, not least due to the Google Self-Driving Car project [2].
The European New Car Assessment Programme (Euro NCAP) is a programme that assesses the safety of vehicle models. A rating is assigned to a vehicle model based on how it performs in tests that simulate real-world accident scenarios.
In 2014 AEB was included in the Euro NCAP rating system [3]. This acts as a catalyst which increases the demand for active safety systems.
1.1 Previous Work
Target tracking is a challenging problem that has occupied the research community for several decades. It is also a highly relevant problem, with applications such as active safety systems for vehicles [4], air traffic control [5] and ballistic missile surveillance [6]. The challenges come from the fact that sensors, as well as models, are always inaccurate to some degree. In target tracking applications this inaccuracy typically results in (i) measurements not originating from a target, known as clutter, (ii) measurements originating from another target, (iii) lack of measurements due to occlusion, and (iv) a probability of detection less than unity.
Several methods that handle these problems have been considered. A category of commonly used non-probabilistic methods is based on data association, i.e. explicit association of measurements to tracks. One simple but common method is the Nearest Neighbour (NN) method, which associates the closest gated measurement to the target [7]. The Global Nearest Neighbour (GNN) method is a generalization of NN which handles association of measurements to multiple targets by solving an optimal assignment problem [7]. The Probabilistic Data Association (PDA) filter, first presented in [8], handles single-target tracking in a cluttered environment by considering all events where at most one of the measurements in the target gate is target originated. A multi-target extension of PDA called the Joint Probabilistic Data Association (JPDA) filter is presented in [9]. Track initiation and deletion are added to this framework in the Integrated Probabilistic Data Association (IPDA) filter [10] and its multi-target version, the Joint Integrated Probabilistic Data Association (JIPDA) filter [11]. In [12] the Multi-Hypothesis Tracking (MHT) method is presented, which considers the data association problem over time by considering all measurement-to-track association events. The MHT method suffers from combinatorial growth of the number of association events, and is known to be more computationally demanding and more complicated to implement than e.g. JPDA, which limits its use in real-time systems.
Multi-detection versions of PDA, JPDA and IPDA have recently been presented in [13], [14] and [15], respectively. A common application for these methods is the over-the-horizon radar (OTHR) problem [16].
Another category of methods uses random finite sets (RFS) in order to formulate and solve the problem in a Bayesian setting. Explicit data association is thereby avoided. Examples of such methods are the Probability Hypothesis Density (PHD) filter [17] and the Bernoulli filter [18]. Various extensions of these methods exist that handle multi-target tracking, initialization and deletion of tracks, etc.
1.2 Problem Formulation
The system setup is as follows. A monocular camera is mounted in the front of the ego-vehicle and is directed in the vehicle’s forward direction. Images from the camera are processed in real-time by a detection algorithm, which determines class and pixel location of objects in the image, as well as a confidence value for each detected object. The confidence value gives a measure of the certainty of the classification. The output from the detection stage is then used as measurements in the tracking stage, according to the tracking-by-detection paradigm. The detection is thus done independently of the tracking. The images from the camera have a resolution of 1024x592 pixels.
The objective of this thesis is to investigate how the classifications and the corresponding confidence values can be used in a single target tracking framework in the best possible way. This is achieved by investigating different clustering methods (Section 3.2) and a modified version of the Probabilistic Data Association (PDA) method (Section 3.3). The relationship between the confidence values and the accuracy of the corresponding classifications is also investigated (Section 4.4).
The tracking is done using a model which describes the target motion in 3-dimensional world coordinates. The tracking is evaluated in world coordinates (see Chapter 4) using manually marked reference tracks in the image plane. A simple state transition model is used in order to make it easier to investigate the best use of the classifications. Indeed, using a simple model allows one to isolate the contribution from the classifications, without having to consider contributions from e.g. optical flow. This is done in the hope that the analysis is still valid for more advanced models.
1.3 Limitations
To limit the scope of this thesis, the following assumptions are made.
1. Single target tracking
2. Preceding traffic on highway roads
3. Same model tuning can be used for all implemented methods
Only single target tracking methods are considered. Measurement association problems that arise due to the presence of multiple targets are handled in a way that is independent of the used tracking method, as described in Section 3.1. This is done so that the focus can be on investigating how the single target tracking can be improved. In order to limit the effect of measurement association problems due to multiple targets, difficult scenarios such as traffic in cities are avoided. Hence only data from highway traffic is considered.
It is also assumed that the same model (as given in Section 1.4) can be used for all considered methods.
1.4 Model
For each image frame from the camera, the detection algorithm outputs several classifications for each detected object in the image. Each classification contains information about the class of the object, a region of interest (ROI) which contains information about the pixel location of its corners, as well as an estimate of the certainty of the classification,
\[
P(\text{object is of class } T \mid \text{true detection of object}). \tag{1.1}
\]
The height of the ROI is not detected. Instead it is derived from a constant width-to-height ratio which is dependent on the object class. An example of a set of classifications for a car is shown in Figure 1.1. The confidence value is shown in the upper left corner of each ROI. Note that the ROI of each classification is assumed to be right-angled.
Figure 1.1: Classifications with corresponding confidence values (shown in the upper left corner of each ROI) for a car.
The classifications are then transformed and used as measurements $\mathbf{y}$ in the tracking stage according to
\[
\mathbf{y} =
\begin{pmatrix}
x_{\mathrm{HCP}}\ [\text{pixels}] \\
\mathrm{width}\ [\text{pixels}] \\
y_{\mathrm{bottom}}\ [\text{pixels}] \\
\mathrm{size}\ [\text{meters}]
\end{pmatrix}, \tag{1.2}
\]
where $x_{\mathrm{HCP}}$ is the horizontal center pixel, width is the width in pixels, and $y_{\mathrm{bottom}}$ is the y-coordinate of the bottom edge of the ROI. The component size is an assumed width of the vehicle in world coordinates and is constant for each vehicle class. This prior knowledge about objects based on their class allows one to estimate the distance to the objects. Since size does not contain any new information, it can be considered an artificial measurement.
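The transformation from a classification to the measurement vector (1.2) can be sketched as follows. This is a minimal illustration and not the implementation used in the thesis: the `Classification` container, the function name and the class widths in `ASSUMED_WIDTH_M` are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical assumed real-world widths per vehicle class, in meters.
# The values used in the thesis are not stated; these are placeholders.
ASSUMED_WIDTH_M = {"car": 1.8, "truck": 2.5}


@dataclass
class Classification:
    label: str         # object class
    left: float        # pixel column of the left ROI edge
    right: float       # pixel column of the right ROI edge
    bottom: float      # pixel row of the bottom ROI edge
    confidence: float  # confidence value from the classifier


def to_measurement(c):
    """Build y = (x_HCP, width, y_bottom, size) as in (1.2)."""
    x_hcp = 0.5 * (c.left + c.right)   # horizontal center pixel
    width = c.right - c.left           # ROI width in pixels
    size = ASSUMED_WIDTH_M[c.label]    # artificial measurement, constant per class
    return [x_hcp, width, c.bottom, size]
```

Note that the confidence value is not part of the measurement vector itself; it is carried alongside the measurement.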
As a model for the target dynamics we consider a constant velocity model, where the acceleration is modeled as normally distributed noise. With the measurement signal as in (1.2), the model can be formulated as a state space model (SSM),
\begin{align}
\mathbf{x}_{k+1} &= f(\mathbf{x}_k, \mathbf{u}_k) + \mathbf{w}_k, \tag{1.3a} \\
\mathbf{y}_k &= h(\mathbf{x}_k) + \mathbf{v}_k, \tag{1.3b}
\end{align}
where $\mathbf{w}_k \sim \mathcal{N}(0, \mathbf{Q}_k)$ and $\mathbf{v}_k \sim \mathcal{N}(0, \mathbf{R}_k)$ are process noise and measurement noise, respectively, which are assumed to be normally distributed with covariance matrices $\mathbf{Q}_k$ and $\mathbf{R}_k$. The input vector $\mathbf{u}_k$ contains information about the ego-vehicle motion:
\[
\mathbf{u}_k =
\begin{pmatrix}
v_{\mathrm{ego}} \\
\dot{\theta}_{\mathrm{ego}}
\end{pmatrix}, \tag{1.4}
\]
where $v_{\mathrm{ego}}$ and $\dot{\theta}_{\mathrm{ego}}$ are the velocity and yaw rate of the ego-vehicle, respectively. The state vector, which contains information about a target, has the following components:
\[
\mathbf{x}_k =
\begin{pmatrix}
w \\ x \\ v_x \\ y \\ v_y \\ z
\end{pmatrix}, \tag{1.5}
\]
where $w$ is the width of the vehicle in world coordinates, and $v_x$ and $v_y$ are the velocities in the x- and y-direction, respectively, of the xyz-coordinate system that is attached to the camera in the ego-vehicle, as given in Figure 1.2. The point $(x, y, z)$, which is also measured in world coordinates, is located on the lower center part of the ROI that frames the backside of the target vehicle (see Figure 1.3).
Figure 1.2: Sketch of the ego-vehicle and the xyz-coordinate system. The origin of the coordinate system coincides with the location of the camera, which is placed on the windscreen. The forward direction of the vehicle is indicated by an arrow.
The measurement model (1.3b) models the fisheye effect of the camera and transforms world coordinates $(x, y, z)$ to image coordinates $(x_p, y_p)$ according to the pinhole camera model, which is given by
\[
\begin{pmatrix}
x_p \\ y_p
\end{pmatrix}
= \frac{1}{x}
\begin{pmatrix}
f_x y \\ f_y z
\end{pmatrix}, \tag{1.6}
\]
where $f_x$ and $f_y$ are the focal lengths in the x- and y-direction, respectively.
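As an illustration of (1.6), the sketch below projects a world point to image coordinates and shows how an assumed vehicle width gives a distance estimate. The function names are hypothetical, the fisheye correction is left out, and the distance relation is an assumption derived here from (1.6): two points at the same distance $x$ that are separated laterally by size meters project to points separated by $f_x \cdot \text{size} / x$ pixels.

```python
def project(x, y, z, fx, fy):
    """Pinhole projection (1.6): world point (x, y, z) -> image point (xp, yp).

    x is the distance along the forward direction of the ego-vehicle.
    """
    if x <= 0:
        raise ValueError("the point must be in front of the camera (x > 0)")
    return fx * y / x, fy * z / x


def distance_from_width(width_px, size_m, fx):
    """Distance estimate from ROI width, assuming width_px = fx * size_m / x."""
    return fx * size_m / width_px


# Example with made-up numbers: a point 20 m ahead, 1 m along y and -1 m along z,
# seen by a camera with focal lengths of 1000 pixels.
xp, yp = project(20.0, 1.0, -1.0, 1000.0, 1000.0)
```

This makes explicit why the artificial size component in (1.2) is useful: without an assumed physical width, the pixel width alone does not determine the distance.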
1.5 Dataset
All considered tracking methods are implemented and evaluated in Chapter 5 using data that have been recorded in real-world scenarios. The data includes classifications, motion data of the ego-vehicle and images from the camera. The methods are then evaluated by comparing estimated tracks with reference tracks, known as markings, that represent the true position of objects in the image plane.
Each marking contains the class of the object and an ROI in pixel coordinates (see Figure 1.4),
\[
\mathbf{y}^M =
\begin{pmatrix}
x_{\mathrm{HCP}}\ [\text{pixels}] \\
\mathrm{width}\ [\text{pixels}] \\
y_{\mathrm{bottom}}\ [\text{pixels}]
\end{pmatrix}, \tag{1.7}
\]
with notation as in (1.2). The markings have been obtained by manually drawing ROIs for objects in the recorded image frames, which is done with almost pixel precision.
1.6 Outline
An outline of the chapters is given below.
• Chapter 2 gives an introduction to single target tracking.
• Chapter 3 presents all methods that are considered in this thesis.
• Chapter 4 gives a detailed description of how the methods are evaluated.
• Chapter 5 presents method evaluation results.
• Chapter 6 presents comments on the results from the method evaluation.
• Chapter 7 presents thoughts on future work.
Figure 1.3: ROI corresponding to the state vector xk shown in blue, where the point (x, y, z) is marked with a white dot.
Figure 1.4: Car with a marking shown in red.
2 Target Tracking
Target tracking amounts to estimating the state of one or many targets over time given measurements. In a typical scenario measurements arrive sequentially over time, and whenever a new measurement is available the state estimate is updated accordingly.
The different stages of a typical target tracking algorithm are described in detail in the following sections.
2.1 Filtering
In the filtering step the state estimate is updated based on the previous state estimate and new measurements. This is achieved by using a model of the system, such as the state space model (SSM)
\begin{align}
\mathbf{x}_{k+1} &= f(\mathbf{x}_k, \mathbf{u}_k) + \mathbf{w}_k, \tag{2.1a} \\
\mathbf{y}_k &= h(\mathbf{x}_k) + \mathbf{v}_k, \tag{2.1b}
\end{align}
where $\mathbf{w}_k \sim \mathcal{N}(0, \mathbf{Q}_k)$ and $\mathbf{v}_k \sim \mathcal{N}(0, \mathbf{R}_k)$ are process noise and measurement noise, respectively, which we assume are normally distributed. The state transition function $f$ is in general a nonlinear function of the previous state $\mathbf{x}_k$ and, possibly, an input signal $\mathbf{u}_k$. It should capture the dynamics of the system, and makes it possible to predict future states based on a previous state. The observation function $h$ is also in general a nonlinear function of the state estimate $\mathbf{x}_k$ and contains information about how the state is measured.
2.1.1 Kalman Filter
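The filtering problem amounts to estimating $p(\mathbf{x}_k \mid \mathbf{y}_{1:k})$, i.e. the probability density function of the state at time $k$ given all measurements up to time $k$. In the case when $f$ and $h$ are linear, the SSM (2.1) can be written as
\begin{align}
\mathbf{x}_{k+1} &= \mathbf{F}_k \mathbf{x}_k + \mathbf{B}_k \mathbf{u}_k + \mathbf{w}_k, \tag{2.2a} \\
\mathbf{y}_k &= \mathbf{H}_k \mathbf{x}_k + \mathbf{D}_k \mathbf{u}_k + \mathbf{v}_k, \tag{2.2b}
\end{align}
where $\mathbf{F}_k$, $\mathbf{B}_k$, $\mathbf{H}_k$ and $\mathbf{D}_k$ are constant matrices. Unlike (2.1), an exact solution to the filtering problem exists for (2.2) and is given by the normal distribution $\mathcal{N}(\hat{\mathbf{x}}_{k|k}, \mathbf{P}_{k|k})$, with parameters given by the recursive equations
\begin{align}
\hat{\mathbf{x}}_{k|k} &= \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \tilde{\mathbf{y}}_k, \tag{2.3a} \\
\mathbf{P}_{k|k} &= (\mathbf{I} - \mathbf{K}_k \mathbf{H}_k) \mathbf{P}_{k|k-1}, \tag{2.3b} \\
\tilde{\mathbf{y}}_k &= \mathbf{y}_k - \mathbf{H}_k \hat{\mathbf{x}}_{k|k-1}, \tag{2.3c} \\
\mathbf{K}_k &= \mathbf{P}_{k|k-1} \mathbf{H}_k^T \mathbf{S}_k^{-1}, \tag{2.3d} \\
\mathbf{S}_k &= \mathbf{H}_k \mathbf{P}_{k|k-1} \mathbf{H}_k^T + \mathbf{R}_k. \tag{2.3e}
\end{align}
The equations (2.3) are collectively known as the Kalman filter (KF). State predictions can then be calculated from the filtered state estimate using the dynamics of the model according to
\begin{align}
\hat{\mathbf{x}}_{k+1|k} &= \mathbf{F}_{k+1} \hat{\mathbf{x}}_{k|k} + \mathbf{B}_{k+1} \mathbf{u}_{k+1}, \tag{2.4a} \\
\mathbf{P}_{k+1|k} &= \mathbf{F}_{k+1} \mathbf{P}_{k|k} \mathbf{F}_{k+1}^T + \mathbf{Q}_{k+1}. \tag{2.4b}
\end{align}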
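The measurement update (2.3) and the prediction (2.4) can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used in the thesis; the function names are hypothetical, and the term $\mathbf{D}_k \mathbf{u}_k$ is left out, as it is in (2.3c).

```python
import numpy as np


def kf_update(x_pred, P_pred, y, H, R):
    """Kalman filter measurement update, equations (2.3a)-(2.3e)."""
    innovation = y - H @ x_pred                       # (2.3c)
    S = H @ P_pred @ H.T + R                          # (2.3e)
    K = P_pred @ H.T @ np.linalg.inv(S)               # (2.3d)
    x_filt = x_pred + K @ innovation                  # (2.3a)
    P_filt = (np.eye(len(x_pred)) - K @ H) @ P_pred   # (2.3b)
    return x_filt, P_filt


def kf_predict(x_filt, P_filt, F, Q, B=None, u=None):
    """Kalman filter time update, equations (2.4a)-(2.4b)."""
    x_pred = F @ x_filt
    if B is not None and u is not None:
        x_pred = x_pred + B @ u
    P_pred = F @ P_filt @ F.T + Q
    return x_pred, P_pred


# Example: position-velocity state with unit time step, position is measured.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
x, P = kf_update(np.zeros(2), np.eye(2), np.array([2.0]), H, np.array([[1.0]]))
x, P = kf_predict(x, P, F, 0.01 * np.eye(2))
```

The explicit inverse of $\mathbf{S}_k$ mirrors (2.3d); for larger measurement vectors a linear solve is usually preferred for numerical reasons.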
2.1.2 Extended Kalman Filter
If the state transition model $f$ or the observation model $h$ in (2.1) is nonlinear, then there exists no closed-form solution to the filtering problem. Hence one has to rely on methods that solve the problem approximately. Several such methods have been considered, e.g. the unscented Kalman filter (UKF) and particle filters (PF). In this thesis one of the most common approaches is considered: the so-called extended Kalman filter (EKF). The EKF utilizes a quite intuitive approach: instead of using $\mathbf{H}_k$ in the update equations (2.3), use the Jacobian
\[
\mathbf{J}_{H,k} = \left. \frac{\partial h}{\partial \mathbf{x}} \right|_{\hat{\mathbf{x}}_{k|k-1}}. \tag{2.5}
\]
Similarly, $\mathbf{F}_{k+1}$ is replaced by the Jacobian of $f$ in the prediction equation (2.4b),
\[
\mathbf{J}_{F,k+1} = \left. \frac{\partial f}{\partial \mathbf{x}} \right|_{\hat{\mathbf{x}}_{k|k},\, \mathbf{u}_k}. \tag{2.6}
\]
When calculating the innovation $\tilde{\mathbf{y}}_k$ and the state prediction $\hat{\mathbf{x}}_{k+1|k}$, however, the nonlinear functions $h$ and $f$ should be used:
\begin{align}
\tilde{\mathbf{y}}_k &= \mathbf{y}_k - h(\hat{\mathbf{x}}_{k|k-1}, \mathbf{u}_k), \tag{2.7a} \\
\hat{\mathbf{x}}_{k+1|k} &= f(\hat{\mathbf{x}}_{k|k}, \mathbf{u}_{k+1}). \tag{2.7b}
\end{align}
The EKF is ad hoc in the sense that there is no guarantee that it converges to the correct solution. That being said, it has been empirically shown to work well for many systems with mild nonlinearities.
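A sketch of the EKF measurement update is given below. It is an illustration under stated assumptions: the Jacobian (2.5) is approximated with central finite differences instead of being derived analytically, the observation function is taken to depend on the state only, and all function names are hypothetical.

```python
import numpy as np


def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference approximation of the Jacobian of func at x."""
    x = np.asarray(x, dtype=float)
    n_out = np.asarray(func(x), dtype=float).size
    J = np.zeros((n_out, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        f_plus = np.asarray(func(x + dx), dtype=float)
        f_minus = np.asarray(func(x - dx), dtype=float)
        J[:, i] = (f_plus - f_minus) / (2.0 * eps)
    return J


def ekf_update(x_pred, P_pred, y, h, R):
    """EKF update: (2.3) with H_k replaced by the Jacobian (2.5)."""
    x_pred = np.asarray(x_pred, dtype=float)
    J_H = numerical_jacobian(h, x_pred)      # (2.5), evaluated at the prediction
    innovation = y - h(x_pred)               # (2.7a): the nonlinear h is used here
    S = J_H @ P_pred @ J_H.T + R
    K = P_pred @ J_H.T @ np.linalg.inv(S)
    x_filt = x_pred + K @ innovation
    P_filt = (np.eye(x_pred.size) - K @ J_H) @ P_pred
    return x_filt, P_filt
```

For a linear observation function the update reduces to the ordinary Kalman filter update, which gives a simple sanity check of an implementation.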
2.2 Data Association
If more than one target is being tracked or clutter is present, one faces the problem of how the measurements should be assigned to the different targets, and how they should be used to update the state of each target. Data association methods present a solution to this problem.
2.2.1 Gating
The first step in data association methods often consists of gating the measurements, which means that for each target we only consider measurements that are sufficiently 'close' based on the current estimate of the target's state and some distance norm. This reduces computational cost, and is motivated by the observation that measurements 'far away' from a target are unlikely to have originated from that target. Several different gating strategies exist which consider different distance norms. One of the most commonly used is ellipsoidal gating, where a measurement $\mathbf{y}_k$ is considered to be inside the gate if
\[
d_k^2 := (\mathbf{y}_k - \hat{\mathbf{y}}_k)^T \mathbf{S}_k^{-1} (\mathbf{y}_k - \hat{\mathbf{y}}_k) \le G \tag{2.8}
\]
is fulfilled, where $G > 0$ is a parameter that determines the size of the gate, $\hat{\mathbf{y}}_k$ is the predicted measurement and $\mathbf{S}_k$ is the covariance of the innovation, as calculated in (2.3) (see Figure 2.1a). The volume of the ellipsoidal gating region is
\[
V_G = C_M \sqrt{|\mathbf{S}_k|}\, G^{M/2}, \tag{2.9}
\]
where $M$ is the number of elements in $\mathbf{y}_k$, and $C_M$ is given in terms of the Gamma function as
\[
C_M = \frac{\pi^{M/2}}{\Gamma\!\left(\frac{M}{2} + 1\right)} =
\begin{cases}
\dfrac{\pi^{M/2}}{\left(\frac{M}{2}\right)!}, & M \text{ even}, \\[2ex]
\dfrac{2^{M+1} \left(\frac{M+1}{2}\right)!\, \pi^{(M-1)/2}}{(M+1)!}, & M \text{ odd}.
\end{cases} \tag{2.10}
\]
The distance norm $d_k^2$, known as the (squared) Mahalanobis distance of the innovation, takes into account the uncertainty of the predicted measurement, resulting in a larger gating region if the uncertainty is large. If the state and measurement models are accurate, then $d_k^2$ is approximately $\chi^2$ distributed with $M$ degrees of freedom. Hence it is common to choose a value of $G$ that corresponds to the, say, 99th percentile of the $\chi^2$ distribution, which means that approximately 99% of the measurements are expected to fall into the gate. This often gives a reasonable trade-off between having a gate which is too large, resulting in a high computational burden, and a gate which is too small, excluding measurements that originate from the target.
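The gating test (2.8) and the gate volume (2.9) and (2.10) can be sketched as follows. The function names are hypothetical and the gate size is passed in as a parameter. In the example a two-dimensional measurement is assumed, for which the 99th percentile of the $\chi^2$ distribution is $-2 \ln 0.01 \approx 9.21$.

```python
import math

import numpy as np


def mahalanobis_sq(y, y_pred, S):
    """Squared Mahalanobis distance d^2 of the innovation, as in (2.8)."""
    nu = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(nu @ np.linalg.solve(S, nu))


def in_gate(y, y_pred, S, G):
    """True if the measurement y lies inside the ellipsoidal gate d^2 <= G."""
    return mahalanobis_sq(y, y_pred, S) <= G


def gate_volume(S, G):
    """Volume of the ellipsoidal gate, equations (2.9) and (2.10)."""
    M = S.shape[0]
    C_M = math.pi ** (M / 2) / math.gamma(M / 2 + 1)
    return C_M * math.sqrt(np.linalg.det(S)) * G ** (M / 2)


# Example: two-dimensional measurement with unit innovation covariance.
S = np.eye(2)
G = 9.21
inside = in_gate([1.0, 1.0], [0.0, 0.0], S, G)   # d^2 = 2
outside = in_gate([3.0, 3.0], [0.0, 0.0], S, G)  # d^2 = 18
```

With $M = 2$ and unit covariance the gate is a circle of radius $\sqrt{G}$, so the volume reduces to the familiar area $\pi G$.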
Gating alone does not necessarily solve all data association problems. Consider the following scenarios. (i) Suppose that more than one measurement falls into a target gate; it could be a measurement from another target or it could be clutter. In this case it is not clear how the target state should be updated using the gated measurements. (ii) Another problematic situation is shown in Figure 2.1b, where a measurement is inside the gates of two different targets.
Figure 2.1: (a) Predicted measurement $\hat{\mathbf{y}}_k$ and an ellipsoidal gating region given by $d_k^2 \le G$. Measurements $\mathbf{y}_k^1$ and $\mathbf{y}_k^2$ are inside the gate, while $\mathbf{y}_k^3$ is not. (b) Two overlapping gating regions corresponding to two different tracks, with predicted measurements $\hat{\mathbf{y}}_k^1$ and $\hat{\mathbf{y}}_k^2$, respectively. Measurement $\mathbf{y}_k^2$ is inside both gates.
In the following sections some common data association methods that address problem (i) are presented. However, since the emphasis in this thesis is on single-target tracking, rather than multi-target tracking, methods that handle problem (ii) are not considered. Instead, measurement association problems due to multiple targets are handled as described in Section 3.1.
2.2.2 Nearest Neighbor Association
A simple data association method that is common in radar applications is Nearest Neighbour (NN) data association, where the distance to each measurement inside the gate is calculated according to (2.8). The measurement with the smallest distance value, i.e. the one closest to the predicted measurement, is then used to update the state of the target. The other measurements inside the gate are simply neglected. This resolves the association problem in Figure 2.1a.
There exist global nearest neighbour (GNN) association methods that deal with the problem in Figure 2.1b by associating measurements with tracks in such a way that the total cost of all associations is minimized, thus obtaining an optimal solution to the association problem.
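Single-target NN association can be sketched as below. The function name and the return convention (an index, or `None` when the gate is empty) are choices made for this illustration.

```python
import numpy as np


def nearest_neighbour(measurements, y_pred, S, G):
    """Index of the gated measurement closest to y_pred, or None if the gate is empty.

    Closeness is measured with the squared Mahalanobis distance (2.8).
    """
    S_inv = np.linalg.inv(S)
    best_index, best_d2 = None, G
    for j, y in enumerate(measurements):
        nu = np.asarray(y, dtype=float) - np.asarray(y_pred, dtype=float)
        d2 = float(nu @ S_inv @ nu)
        if d2 <= best_d2:          # inside the gate and at least as close as the best so far
            best_index, best_d2 = j, d2
    return best_index


# Example: three candidate measurements around a prediction at the origin.
candidates = [[1.0, 0.0], [0.5, 0.5], [5.0, 5.0]]
chosen = nearest_neighbour(candidates, [0.0, 0.0], np.eye(2), 9.21)
```

Only the chosen measurement would then be passed to the filter update; the remaining gated measurements are discarded.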
2.2.3 Probabilistic Data Association
The probabilistic data association (PDA) method estimates the posterior state PDF by considering all possible measurement association events. In each association event it is assumed that at most one measurement is target originated and the other measurements are clutter. A derivation (which is inspired by [15]) of the PDA method now follows.
Suppose that $N$ measurements $\{\mathbf{y}\}_k := \{\mathbf{y}_k^j\}_{j=1}^N$ are inside the gate of a track at time $k$, and let $\mathbf{Y}_k := \{\{\mathbf{y}\}_t \mid t \le k\}$ be the set of all measurements up to time $k$. Furthermore, let $A_j$ be the event that measurement $\mathbf{y}^j$ is target originated, and $A_0$ be the event where all measurements are clutter. Let $\mathbf{A}$ be the set of all mutually exclusive and exhaustive events $A_j$.
The Probabilistic Data Association (PDA) filter, first presented in [8], solves the measurement association problem by marginalizing the posterior state PDF with respect to all possible association events,
\begin{align}
p(\mathbf{x}_k \mid \mathbf{Y}_k) &= \sum_{A_j \in \mathbf{A}} p(\mathbf{x}_k, A_j \mid \mathbf{Y}_k) \notag \\
&= \sum_{A_j \in \mathbf{A}} p(\mathbf{x}_k \mid A_j, \mathbf{Y}_k)\, P(A_j \mid \mathbf{Y}_k), \tag{2.11}
\end{align}
where $P(A_j \mid \mathbf{Y}_k)$ is the probability of the association event $A_j$ given measurements up to time $k$.
2.2.3.1 Association Probability
The association probability $P(A_j \mid \mathbf{Y}_k)$ can be expressed in terms of a likelihood and a prior conditioned on the previous measurements according to
\[
P(A_j \mid \mathbf{Y}_k) = \eta\, p(\{\mathbf{y}\}_k \mid A_j, \mathbf{Y}_{k-1})\, P(A_j \mid \mathbf{Y}_{k-1}), \tag{2.12}
\]
where all terms that do not depend on $A_j$ are included in the normalization constant $\eta$. The association $A_j$ at time $k$ is assumed to be independent of previous measurements, i.e. $P(A_j \mid \mathbf{Y}_{k-1}) = P(A_j)$. Also, the prior $P(A_j)$ is assumed to be uniformly distributed, which means that it can be included in the normalization constant.
For $j \ge 1$ the likelihood in (2.12) is proportional to
\[
p(\{\mathbf{y}\}_k \mid A_j, \mathbf{Y}_{k-1}) \propto P_D P_G \lambda^{N-1} \Lambda_k^j, \tag{2.13}
\]
where $P_D$ is the probability of detection, and $P_G$ is the probability that a detected measurement is inside the track gate. The clutter density is assumed to be Poisson distributed with parameter (see [7])
\[
\lambda = \frac{N}{V_G}, \tag{2.14}
\]
where $V_G$ is the volume of the track gate (2.9). The spatial likelihood of the true measurement $\Lambda_k^j$ can be obtained by calculating the corresponding innovation and evaluating the innovation PDF $\mathcal{N}(\mathbf{y}_k^j - h(\mathbf{x}_k) \mid 0, \mathbf{S}_k)$, which is given by
\[
\Lambda_k^j = \frac{e^{-d_j^2/2}}{P_G (2\pi)^{M/2} \sqrt{|\mathbf{S}_k|}}, \tag{2.15}
\]
where $M$ is the dimension of the measurement vector $\mathbf{y}_k$ and $d_j$ is the Mahalanobis distance (2.8). The innovation PDF is restricted to the gating region by including the gating probability $P_G$ in (2.15).
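For $j = 0$ the likelihood is proportional to
\[
p(\{\mathbf{y}\}_k \mid A_0, \mathbf{Y}_{k-1}) \propto (1 - P_D P_G) \lambda^{N}. \tag{2.16}
\]
The normalization constant $\eta$ can now be obtained by marginalizing (2.12) with respect to the association events $A_j$, using (2.13) and (2.16). This yields the following expressions for the association probability,
\[
P(A_j \mid \mathbf{Y}_k) =
\begin{cases}
\dfrac{(1 - P_D P_G)\lambda}{(1 - P_D P_G)\lambda + P_D P_G \sum_{i=1}^{N} \Lambda_k^i}, & j = 0, \\[2.5ex]
\dfrac{P_D P_G \Lambda_k^j}{(1 - P_D P_G)\lambda + P_D P_G \sum_{i=1}^{N} \Lambda_k^i}, & j \ge 1.
\end{cases} \tag{2.17}
\]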
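The association probabilities (2.17) can be computed as in the sketch below, which uses only the Python standard library. The function name and argument layout are hypothetical; the inputs are the squared Mahalanobis distances of the gated measurements, the determinant of $\mathbf{S}_k$, the measurement dimension $M$, and the parameters $P_D$, $P_G$ and $V_G$. Returning probability one for $A_0$ when the gate is empty is an assumption made for this illustration.

```python
import math


def association_probabilities(d2, det_S, M, P_D, P_G, V_G):
    """Return [P(A_0|Y_k), P(A_1|Y_k), ..., P(A_N|Y_k)] according to (2.17).

    d2 holds the squared Mahalanobis distances of the N gated measurements.
    """
    N = len(d2)
    if N == 0:
        return [1.0]                 # no gated measurements: only A_0 is possible
    lam = N / V_G                    # clutter density (2.14)
    norm = P_G * (2.0 * math.pi) ** (M / 2) * math.sqrt(det_S)
    likelihoods = [math.exp(-d / 2.0) / norm for d in d2]   # (2.15)
    denom = (1.0 - P_D * P_G) * lam + P_D * P_G * sum(likelihoods)
    p0 = (1.0 - P_D * P_G) * lam / denom
    return [p0] + [P_D * P_G * L / denom for L in likelihoods]


# Example with made-up parameters: two gated measurements in a 2-D measurement space.
probs = association_probabilities([0.5, 4.0], det_S=1.0, M=2, P_D=0.9, P_G=0.99, V_G=30.0)
```

By construction the returned probabilities sum to one, and a measurement with a smaller Mahalanobis distance receives a larger probability.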
2.2.3.2 State Update
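A weighted sum of the innovations is calculated as
\[
\tilde{\mathbf{y}}_k = \sum_{j=1}^{N} P(A_j \mid \mathbf{Y}_k)\, \tilde{\mathbf{y}}_k^j, \tag{2.18}
\]
where $\tilde{\mathbf{y}}_k^j$ is the innovation corresponding to measurement $\mathbf{y}_k^j$. The sum of PDFs (2.11) is now approximated using a single Gaussian distribution $\mathcal{N}(\mathbf{x}_k \mid \hat{\mathbf{x}}_{k|k}, \mathbf{P}_{k|k})$, where
\[
\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \tilde{\mathbf{y}}_k, \tag{2.19}
\]
and
\[
\mathbf{P}_{k|k} = \mathbf{P}^0_{k|k} + d\mathbf{P}_k, \tag{2.20}
\]
where
\begin{align}
\mathbf{P}^0_{k|k} &= P(A_0 \mid \mathbf{Y}_k)\, \mathbf{P}_{k|k-1} + \bigl(1 - P(A_0 \mid \mathbf{Y}_k)\bigr) \mathbf{P}^*_{k|k}, \tag{2.21} \\
d\mathbf{P}_k &= \mathbf{K}_k \left[ \sum_{j=1}^{N} P(A_j \mid \mathbf{Y}_k)\, \tilde{\mathbf{y}}_k^j (\tilde{\mathbf{y}}_k^j)^T - \tilde{\mathbf{y}}_k \tilde{\mathbf{y}}_k^T \right] \mathbf{K}_k^T, \tag{2.22} \\
\mathbf{P}^*_{k|k} &= (\mathbf{I} - \mathbf{K}_k \mathbf{H}_k) \mathbf{P}_{k|k-1}. \tag{2.23}
\end{align}
The Kalman gain $\mathbf{K}_k$ and the state prediction are calculated as in the standard EKF presented in Section 2.1.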
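The state update (2.18) to (2.23) can be sketched in NumPy as follows. The function name and argument layout are hypothetical. `betas[0]` is taken to be $P(A_0 \mid \mathbf{Y}_k)$ and `betas[j]` the probability of the j-th gated measurement, and `H` stands for $\mathbf{H}_k$ or, in the EKF case, its Jacobian (2.5).

```python
import numpy as np


def pda_update(x_pred, P_pred, innovations, betas, H, R):
    """PDA state update, equations (2.18)-(2.23)."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    beta0 = betas[0]

    M = H.shape[0]
    nu = np.zeros(M)                     # combined innovation (2.18)
    second_moment = np.zeros((M, M))     # sum of beta_j * nu_j nu_j^T in (2.22)
    for b, v in zip(betas[1:], innovations):
        v = np.asarray(v, dtype=float)
        nu += b * v
        second_moment += b * np.outer(v, v)

    x_filt = x_pred + K @ nu                                   # (2.19)
    P_star = (np.eye(len(x_pred)) - K @ H) @ P_pred            # (2.23)
    P0 = beta0 * P_pred + (1.0 - beta0) * P_star               # (2.21)
    dP = K @ (second_moment - np.outer(nu, nu)) @ K.T          # (2.22)
    return x_filt, P0 + dP                                     # (2.20)
```

Two limiting cases are useful as sanity checks: with a single measurement and $P(A_1 \mid \mathbf{Y}_k) = 1$ the update equals the ordinary Kalman update, and with $P(A_0 \mid \mathbf{Y}_k) = 1$ the prediction is returned unchanged.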
2.2.4 M/N Initiation and Deletion of Tracks
If a measurement is not associated to an existing track, it could either be clutter or be a measurement from a new target. For this reason a tentative track is created which is evaluated to make sure that it really is a new target. If the tentative track receives sufficiently many measurements over a period of time, it will be considered a confirmed track, otherwise it will be deleted.
A common track initiation procedure is M/N initiation, which is now explained. In order for a tentative track to be confirmed, it first needs to receive measurements on $N_1$ consecutive time steps. Then, the track must receive measurements on $M_2$ out of the next $N_2$ time steps. If these conditions are fulfilled, the track is considered confirmed.
When a confirmed track stops receiving new measurements, a strategy which deletes the track is employed. One simple strategy is to delete the track if it has not received a new measurement for $N_D$ consecutive time steps.
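The initiation and deletion rules above can be sketched as follows. The function names, the representation of the track history as a list of booleans, and the choice to delete a tentative track as soon as it can no longer satisfy the conditions are assumptions made for this illustration.

```python
def confirm_track(hits, N1, M2, N2):
    """Status of a tentative track: 'confirmed', 'deleted' or 'tentative'.

    hits[t] is True if the track received a measurement at time step t
    after it was created.
    """
    first = hits[:N1]
    if not all(first):
        return "deleted"        # missed a measurement during the first N1 steps
    if len(first) < N1:
        return "tentative"      # first stage not yet complete
    second = hits[N1:N1 + N2]
    if sum(second) >= M2:
        return "confirmed"      # M2 hits out of the next N2 steps
    misses = len(second) - sum(second)
    if misses > N2 - M2:
        return "deleted"        # M2 hits can no longer be reached
    return "tentative"


def should_delete(hits, N_D):
    """True if a confirmed track has had no measurement for N_D consecutive steps."""
    return len(hits) >= N_D and not any(hits[-N_D:])


# Example: N1 = 2 consecutive hits, then M2 = 2 hits out of the next N2 = 3 steps.
status = confirm_track([True, True, True, False, True], N1=2, M2=2, N2=3)
```

In a tracker these functions would be evaluated once per time step, after the measurement association for that step is known.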
3 Methods
This chapter presents a detailed description of all considered methods.
3.1 Measurement Group Partitioning and Association
As the objective of this thesis is not to evaluate the multi-target tracking performance of different methods, but rather their single target tracking performance, all implemented methods utilize an algorithm which forms groups of measurements, and associates at most one such group to each target. Each group of measurements should then correspond to an object in the image. While this is not necessarily always true, it resolves the association problem due to adjacent objects in the image in a consistent manner, which is independent of the used tracking method. The measurement group partitioning and association is described in detail below.
In order to successfully use a single target tracking algorithm in situations where vehicles are located close to each other in the image, groups of measurements are formed and assigned to tracks as explained in Algorithm 1. The overlap in Algorithm 1 is calculated using the spatial overlap norm
\[
\mathrm{SO} = \frac{\operatorname{area}(A_1 \cap A_2)}{\operatorname{area}(A_1 \cup A_2)}, \tag{3.1}
\]
where $A_1$ and $A_2$ are the two measurement ROIs (see Figure 3.1). An example of a measurement group partitioning is shown in Figure 3.2.
Each track is then associated with at most one measurement group by choosing the closest group inside the track gate. The distance to the measurement group is defined as the Mahalanobis distance (2.8) to the measurement in the group with the highest confidence value. This association is done so that a measurement group can be associated to at most one track, thus reducing the risk
Algorithm 1 Measurement Group Partitioning
1: measurements initially have no id
2: count ← 0    ▷ Number of measurement groups
3: while ∃ measurements with no id do
4:   count ← count + 1
5:   find measurement m1 with highest confidence and no id
6:   assign id count to m1 and all measurements that overlap with m1
7: end while
8: for each measurement m2 with more than one id do
9:   for each id i of m2 do
10:    find measurement m3 with highest confidence and id i
11:    calculate overlap of m2 and m3
12:  end for
13:  assign m2 the id i that corresponds to the largest overlap on line 11
14: end for
Figure 3.1: Intersection $A_1 \cap A_2$ of two ROIs (left), and union $A_1 \cup A_2$ (right).
that two tracks are tracking the same target by sharing the same measurement group. It should be noted, though, that when several tracks compete for the same measurement group, the group is associated to one of the tracks more or less at random. A better approach could be used, e.g. one where the group is associated with the closest track. However, this has not been considered, since empirical results suggest that this situation is rare and should not affect the evaluation of the considered methods.
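A minimal Python sketch of the spatial overlap (3.1) and of Algorithm 1 follows. The ROI layout `(x_min, y_min, x_max, y_max)`, the function names, and the rule that two measurements "overlap" when their spatial overlap is strictly positive are assumptions made for illustration. The sketch also interprets m3 on line 10 as the measurement that seeded group i, which matches the example in Figure 3.2.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def spatial_overlap(a: Box, b: Box) -> float:
    """Spatial overlap (3.1): intersection area divided by union area."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def partition(boxes: List[Box], conf: List[float]) -> List[int]:
    """Return one group id (1, 2, ...) per measurement."""
    n = len(boxes)
    ids: List[Set[int]] = [set() for _ in range(n)]
    seeds: List[int] = []  # seeds[g - 1] is the measurement that started group g

    # Lines 2-7: seed a new group from the most confident unassigned measurement.
    while any(not s for s in ids):
        seed = max((i for i in range(n) if not ids[i]), key=lambda i: conf[i])
        seeds.append(seed)
        gid = len(seeds)
        for i in range(n):
            if i == seed or spatial_overlap(boxes[seed], boxes[i]) > 0.0:
                ids[i].add(gid)

    # Lines 8-14: resolve measurements with several ids by largest overlap.
    final: List[int] = []
    for i in range(n):
        best = max(sorted(ids[i]),
                   key=lambda g: spatial_overlap(boxes[i], boxes[seeds[g - 1]]))
        final.append(best)
    return final
```

With four boxes arranged as in Figure 3.2 (confidences 0.95, 0.8, 0.7 and 0.9), the measurement that overlaps both seeds ends up in the group of the seed it overlaps most.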
3.2 Clustering Methods
After the measurements have been partitioned into measurement groups, one approach is to cluster each measurement group into one single measurement, which is then filtered using a standard EKF. These methods are referred to as clustering methods in this thesis. All considered clustering methods are presented below.
Figure 3.2: (a) 4 measurements with corresponding confidence values. (b) The id 1 is assigned to the measurement with confidence value 0.95 and to all measurements that it overlaps. (c) The id 2 is assigned to the measurement with confidence value 0.9 and to all measurements that it overlaps. (d) The measurement with confidence value 0.7 that was assigned two ids is now assigned id 1, since the spatial overlap with the measurement with confidence value 0.95 is greater than that with confidence value 0.9.
3.2.1 Average Cluster
In Average Cluster (AC) the mean value of the measurements is calculated and used as output. Hence the confidence values are not used.
3.2.2 Simple Cluster
In Simple Cluster (SC) the measurement with the highest confidence value is used as output, thereby discarding the other measurements. This method is sensitive to outliers and tends to produce a noisy output. Also, useful information might be lost when discarding measurements.
3.2.3 Weighted Sum Cluster
In Weighted Sum Cluster (WSC) a weighted sum of the measurements is calculated for each measurement group according to
$$ y = \frac{1}{\sum_j p_j} \sum_j p_j y_j, \tag{3.2} $$
where $y_j$ are the measurements in the measurement group, with corresponding confidence values $p_j$. The WSC is less sensitive to outliers than SC since it uses more measurements.
In effect it assumes that the confidence values give an indication of the accuracy of the measurements. Note that a large group of low confidence measurements can give the same contribution to the output as a smaller group of high confidence measurements.
Variations of this method are also considered, where only the $m$ measurements with the highest confidence values are used in (3.2). Note that if $m = 1$, then WSC is reduced to SC.
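The three clustering rules described so far (AC, SC and WSC with an optional limit $m$) can be written compactly as below. The function names and the representation of a measurement as a plain list of coordinates are assumptions for the sake of the example.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def average_cluster(ys: List[Vector]) -> List[float]:
    """AC: component-wise mean of the measurements; confidences are ignored."""
    n = len(ys)
    return [sum(component) / n for component in zip(*ys)]


def weighted_sum_cluster(ys: List[Vector], ps: List[float],
                         m: Optional[int] = None) -> List[float]:
    """WSC (3.2), optionally restricted to the m most confident measurements."""
    order = sorted(range(len(ys)), key=lambda j: ps[j], reverse=True)
    if m is not None:
        order = order[:m]
    total = sum(ps[j] for j in order)
    dim = len(ys[0])
    return [sum(ps[j] * ys[j][d] for j in order) / total for d in range(dim)]


def simple_cluster(ys: List[Vector], ps: List[float]) -> List[float]:
    """SC: the single most confident measurement, i.e. WSC with m = 1."""
    return weighted_sum_cluster(ys, ps, m=1)
```

For two measurements `[0, 0]` and `[10, 20]` with confidences 0.75 and 0.25, AC returns the midpoint, SC returns the first measurement, and WSC returns a point pulled towards the more confident one.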
3.2.4 Nonlinear Weighted Sum Cluster
In Nonlinear Weighted Sum Cluster (NWSC) a weighted sum of the measurements is calculated for each measurement group according to
$$ y = \frac{1}{\sum_j \tilde{p}_j} \sum_j \tilde{p}_j y_j, \tag{3.3} $$
where $y_j$ are the measurements in the measurement group and $\tilde{p}_j$ are transformed confidence values given by
$$ \tilde{p}_j = \begin{cases} \dfrac{0.5\, p_j}{0.85}, & p_j < 0.85 \\ c_1 E(p_j) + c_2, & p_j \ge 0.85. \end{cases} \tag{3.4} $$
Figure 3.3: Transformed confidence values $\tilde{p}$ (blue line) plotted against confidence values $p$. The black dashed line is a guide for the eye.
An explanation of (3.4) now follows. In Chapter 5 the mean error, $E(p)$, for a large number of measurements is calculated and plotted against their corresponding confidence values $p$ (see Figure 5.1b). By doing the transformation (3.4), $p_j$ should contribute to the sum (3.3) in a way that corresponds to the shape of $E(p)$.
This can be seen by noting that $E(p)$ is approximately linear for $p < 0.85$. It also holds that
$$ E(0.85) \approx \frac{\max E(p) - \min E(p)}{2}. \tag{3.5} $$
This results in (3.4), where $c_1$ and $c_2$ are constants such that $\tilde{p}_j$ attains the values 0.5 and 1 at $p_j = 0.85$ and $p_j = 1$, respectively. The transformation (3.4) is visualized in Figure 3.3.
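The constants in (3.4) follow from the two conditions $c_1 E(0.85) + c_2 = 0.5$ and $c_1 E(1) + c_2 = 1$. The sketch below solves them for an arbitrary mean-error function; the name `make_transform` and the callable `mean_error` are hypothetical, and the measured curve $E(p)$ from Chapter 5 is not reproduced here.

```python
from typing import Callable


def make_transform(mean_error: Callable[[float], float],
                   knee: float = 0.85) -> Callable[[float], float]:
    """Build the confidence transformation (3.4) for a given E(p).

    mean_error is a stand-in for the empirical mean error E(p); it must
    satisfy E(knee) != E(1) so that c1 is well defined.
    """
    e_knee, e_one = mean_error(knee), mean_error(1.0)
    # Solve c1*E(knee) + c2 = 0.5 and c1*E(1) + c2 = 1 for c1 and c2.
    c1 = 0.5 / (e_one - e_knee)
    c2 = 1.0 - c1 * e_one

    def transform(p: float) -> float:
        if p < knee:
            return 0.5 * p / knee          # linear branch of (3.4)
        return c1 * mean_error(p) + c2     # branch shaped like E(p)

    return transform
```

With any decreasing stand-in such as `lambda p: 1.0 - p * p`, the returned function gives 0.5 at $p = 0.85$ and 1 at $p = 1$, so the two branches join continuously.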
3.2.5 Regression Cluster
In Regression Cluster (RC) a function $g(y, \theta)$ is fitted to the measurements in a least squares sense by solving the optimization problem
$$ \min_{\theta} \sum_j \left( p_j - g(\hat{y}_j, \theta) \right)^2 + \sum_i r_i(\theta)^2, \tag{3.6} $$
where $\theta = (\theta_1, \theta_2, \ldots, \theta_{n_p})^T$ is a vector with $n_p$ parameters and $r_i(\theta)$ are quadratic penalty terms which impose constraints on $\theta$. The measurement vector in (3.6) is given in the alternative form
$$ \hat{y}_j = \begin{pmatrix} x^{\text{left}}_j \\ x^{\text{right}}_j \\ y^{\text{bottom}}_j \end{pmatrix}, \tag{3.7} $$
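where $x^{\text{left}}_j$ and $x^{\text{right}}_j$ are the x-coordinates of the left and right edges of the ROI of measurement $j$, respectively. We consider a Gaussian shaped function as regression function,
$$ g(y, \theta) = \theta_1 \exp\left( -(y - \mu(\theta))^T \Sigma(\theta)^{-1} (y - \mu(\theta)) \right), \tag{3.8} $$
where
$$ \mu(\theta) = \begin{pmatrix} \theta_2 \\ \theta_3 \\ \theta_4 \end{pmatrix} \tag{3.9} $$
$$ \Sigma(\theta) = \begin{pmatrix} \theta_5 & 0 & 0 \\ 0 & \theta_6 & 0 \\ 0 & 0 & \theta_7 \end{pmatrix}, \tag{3.10} $$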
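with constraints
$$ 0 \le \theta_1 \le 1 \tag{3.11} $$
$$ \min_j x^{\text{left}}_j \le \theta_2 \le \max_j x^{\text{left}}_j \tag{3.12} $$
$$ \min_j x^{\text{right}}_j \le \theta_3 \le \max_j x^{\text{right}}_j \tag{3.13} $$
$$ \min_j y^{\text{bottom}}_j \le \theta_4 \le \max_j y^{\text{bottom}}_j \tag{3.14} $$
$$ 0 \le \theta_5, \theta_6, \theta_7 \le 10^4. \tag{3.15} $$
Hence the number of parameters is $n_p = 7$.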
Note that the confidence values of the measurements cannot be seen as realizations of a PDF. For this reason the normalization constant of the Gaussian distribution is replaced by the parameter $\theta_1$ in (3.8). The Levenberg-Marquardt algorithm [19], which is a nonlinear least squares method, is used to find a local minimum of (3.6). When a solution is found the vector (3.9) is used as output from RC.
A requirement for the RC method to work is that sufficiently many measurements are available when solving (3.6), otherwise the Levenberg-Marquardt algorithm will result in a system of equations that is badly conditioned. For this reason the WSC method is used if the number of measurements is less than 8.
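To make the structure of (3.6) concrete, the sketch below assembles the residual vector that a nonlinear least squares solver would square and sum. The exact form of the penalty terms $r_i(\theta)$ is not given in the text, so the hinge-style penalty and its weight are assumptions, as are all function names. The Gaussian shaped function is evaluated with a diagonal $\Sigma(\theta)$ and no factor 1/2 in the exponent.

```python
import math
from typing import List, Sequence


def gaussian_shape(y: Sequence[float], theta: Sequence[float]) -> float:
    """g(y, theta) from (3.8) with Sigma = diag(theta5, theta6, theta7).

    The diagonal entries must be strictly positive for the division to work.
    """
    amplitude, mu, var = theta[0], theta[1:4], theta[4:7]
    q = sum((yi - mi) ** 2 / vi for yi, mi, vi in zip(y, mu, var))
    return amplitude * math.exp(-q)


def bound_penalty(value: float, lo: float, hi: float, weight: float) -> float:
    """Assumed penalty r_i: zero inside [lo, hi], linear outside (squared by the solver)."""
    return weight * max(lo - value, value - hi, 0.0)


def residuals(theta: Sequence[float], ys: List[Sequence[float]],
              ps: List[float], weight: float = 1e3) -> List[float]:
    """Residual vector whose sum of squares is the objective (3.6)."""
    fit = [p - gaussian_shape(y, theta) for y, p in zip(ys, ps)]
    bounds = [(0.0, 1.0)]                                   # (3.11)
    for d in range(3):                                      # (3.12)-(3.14)
        column = [y[d] for y in ys]
        bounds.append((min(column), max(column)))
    bounds += [(0.0, 1e4)] * 3                              # (3.15)
    penalties = [bound_penalty(t, lo, hi, weight)
                 for t, (lo, hi) in zip(theta, bounds)]
    return fit + penalties
```

A function of this shape could be passed to a Levenberg-Marquardt routine, for example `scipy.optimize.least_squares` with `method="lm"`, starting from an initial guess such as the WSC output; that choice of solver is an illustration and not necessarily what was used in the thesis.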
3.3 Probabilistic Data Association
A modified version of the PDA method in Section 2.2.3 is presented here. As it is suspected that measurements with high confidence are more accurate, the confidence values are included in the modified method. This is done by multiplying the likelihood (2.13), where $y_j$ is the target originated measurement, with the corresponding confidence value $p_j$,
$$ p(\{y\}_k, \{p\}_k \mid A_j, Y^{k-1}) \propto P_D P_G N^{-1} \Lambda_{jk} p_j, \tag{3.16} $$
where $\{p\}_k$ is the set of all confidence values at time $k$. This results in a slightly different expression for the association probability
$$ P(A_j \mid Y^k, \{p\}_k) = \begin{cases} \dfrac{1 - P_D P_G}{(1 - P_D P_G) + P_D P_G \sum_{i=1}^{N} \Lambda_{ik} p_i}, & j = 0 \\[2ex] \dfrac{P_D P_G \Lambda_{jk} p_j}{(1 - P_D P_G) + P_D P_G \sum_{i=1}^{N} \Lambda_{ik} p_i}, & j \ge 1. \end{cases} \tag{3.17} $$
The state update and prediction are identical to those of the standard PDA given in Section 2.2.3.
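As a small numerical illustration of the confidence-weighted association probabilities (3.17), the function below takes the likelihood ratios and confidence values of the gated measurements and returns the probabilities for "no measurement is target originated" followed by one probability per measurement. The function and argument names are assumptions, and the likelihood ratios are taken as given inputs rather than computed from Section 2.2.3.

```python
from typing import List


def association_probabilities(likelihood_ratios: List[float],
                              confidences: List[float],
                              p_d: float, p_g: float) -> List[float]:
    """Association probabilities (3.17); index 0 is the no-association event."""
    weights = [lam * p for lam, p in zip(likelihood_ratios, confidences)]
    miss = 1.0 - p_d * p_g
    denom = miss + p_d * p_g * sum(weights)
    return [miss / denom] + [p_d * p_g * w / denom for w in weights]
```

Two measurements with likelihood ratios 2.0 and 1.0 but confidences 0.5 and 1.0 receive equal association probabilities, which shows how a high confidence value can compensate for a lower likelihood.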
4 Error Analysis
This chapter presents the error norms that are used in the method evaluation in Chapter 5. The errors are calculated for estimated tracks using reference tracks, known as markings (see Section 1.5). Not all estimated tracks have a corresponding marking, however, so there is a need to pair up tracks with markings before the errors can be calculated. This procedure is presented in detail below.
4.1 Track-Marking Pairs
In order to calculate the error for an estimated track, there is a need to determine if a marking exists that corresponds to the tracked object. It is also necessary to determine the time interval that they both have in common. This is achieved by introducing the concepts of spatial overlap and temporal overlap (see [20]). The spatial overlap of a confirmed track and a marking for a single image frame is given by
$$ SO = \frac{\operatorname{area}(T \cap M)}{\operatorname{area}(T \cup M)}, \tag{4.1} $$
where $T$ and $M$ are the ROIs of a track and a marking, respectively (as defined in (3.1)). The temporal overlap is defined as
$$ TO = \text{overlap in frame span}. \tag{4.2} $$
A track and a marking are then paired up if the following criteria are fulfilled:
$$ \overline{SO} \ge 0.2, \qquad TO \ge 18 \text{ frames}, \tag{4.3} $$
where $\overline{SO}$ is the average spatial overlap during the time interval given by the temporal overlap. The criteria (4.3) have empirically been shown to result in good track-marking pairs.
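The pairing test can be sketched as follows, assuming that each track and each marking is stored as a dictionary from frame number to ROI and that the temporal overlap is counted as the number of frames present in both. These data structures and names are illustrative assumptions only.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def spatial_overlap(a: Box, b: Box) -> float:
    """Spatial overlap (4.1): intersection area divided by union area."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0.0 else 0.0


def is_track_marking_pair(track: Dict[int, Box], marking: Dict[int, Box],
                          min_so: float = 0.2, min_to: int = 18) -> bool:
    """Apply the criteria (4.3) to one track and one marking."""
    common = sorted(set(track) & set(marking))   # frames in the temporal overlap
    if len(common) < min_to:
        return False
    mean_so = sum(spatial_overlap(track[f], marking[f]) for f in common) / len(common)
    return mean_so >= min_so
```

A track that shares 20 frames with a marking and covers half of the joint area in each of them is paired, whereas the same track sharing only 10 frames is not.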
4.2 Error Norms
The following sections present error norms that are used in the method evaluation in Chapter 5. Suppose that a marking $y_M$, as given in (1.7), is available for a given track-marking pair at a given frame. The difference $y_M - y$ in image coordinates is then transformed to a difference $P(y_M - y)$ in world coordinates using the transformation (1.6). This is done under the assumption that the x component of the marking is the same as that of the estimated track. The vector $y$ can either be a measurement, the output from the clustering, or obtained using the measurement model (1.3b). The error is then calculated in world coordinates as
$$ E(y) := \left\| P(y_M - y) \right\|_2, \tag{4.4} $$
where $\| \cdot \|_2$ denotes the 3-dimensional Euclidean norm. Note that the artificial measurement size, as given in (1.2), is not included when calculating the error (4.4).
4.2.1 Tracking Error
The tracking error (TE) is presented in Algorithm 2.
Algorithm 2 Tracking Error
1: a ← empty array
2: for each track-marking pair p do
3:   for each frame in the temporal overlap of p do
4:     calculate E(y), where y is obtained from the measurement model (1.3b)
5:     concatenate the error calculated on row 4 to a
6:   end for
7: end for
8: result ← mean value of a    ▷ tracking error
4.2.2 5-Largest Tracking Error
The 5-largest tracking error (5-LTE) is presented in Algorithm 3. The 5-LTE gives a measure of the worst performance of a method.
4.2.3 Clustering Error
The clustering error (CE) is presented in Algorithm 4. The CE can only be calculated for clustering methods, and not for e.g. PDA.
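Given the per-frame errors $E(y)$ of each track-marking pair, the three error norms reduce to a few lines. In this sketch the input is assumed to be a list with one list of errors per pair; the function names are hypothetical. The CE uses the same averaging as the TE, only with errors computed from the clustering output instead of the measurement model.

```python
from statistics import mean
from typing import List


def tracking_error(errors_per_pair: List[List[float]]) -> float:
    """TE (and CE): mean of all per-frame errors over all track-marking pairs."""
    return mean(e for pair in errors_per_pair for e in pair)


def k_largest_tracking_error(errors_per_pair: List[List[float]], k: int = 5) -> float:
    """5-LTE for k = 5: mean of the k largest errors collected from each pair."""
    worst: List[float] = []
    for pair in errors_per_pair:
        worst.extend(sorted(pair, reverse=True)[:k])
    return mean(worst)
```

Pairs with fewer than `k` frames simply contribute all of their errors, which is one possible convention and is not specified in the text.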
4.3 Overtaking Scenarios
In Section 5.2 all methods are evaluated in overtaking scenarios where the target vehicle is seen from an angle (see Figure 4.1). This case is especially difficult since the classifications tend to be non-symmetrically distributed around the target.
Algorithm 3 5-Largest Tracking Error
1: a ← empty array
2: for each track-marking pair p do
3:   for each frame in the temporal overlap of p do
4:     calculate E(y), where y is obtained from the measurement model (1.3b)
5:   end for
6:   concatenate the 5 largest errors calculated on row 4 to a
7: end for
8: result ← mean value of a    ▷ 5-largest tracking error

Algorithm 4 Clustering Error
1: a ← empty array
2: for each track-marking pair p do
3:   for each frame in the temporal overlap of p do
4:     calculate E(y), where y is the output from the clustering
5:     concatenate the error calculated on row 4 to a
6:   end for
7: end for
8: result ← mean value of a    ▷ clustering error
The overtaking scenarios are selected using the following conditions,
$$ x < 30\ \text{m} \tag{4.5a} $$
$$ -10\ \text{m} < y < -1.7\ \text{m} \quad \text{or} \quad 1.7\ \text{m} < y < 10\ \text{m}, \tag{4.5b} $$
where $x$ and $y$ are elements in the state vector $x_k$ (see Section 1.4).
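Read as a lateral band on either side of the host vehicle, the selection rule (4.5) becomes a one-line predicate. The function name is hypothetical, and the symmetric interpretation of the lateral condition is an assumption.

```python
def is_overtaking_scenario(x: float, y: float) -> bool:
    """Conditions (4.5): target closer than 30 m and 1.7 m to 10 m to either side."""
    return x < 30.0 and 1.7 < abs(y) < 10.0
```

The truck in Figure 4.1, at x = 23 and y = 3.1, satisfies this condition.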
4.4 Confidence Value Error Analysis
Most of the tracking methods in this thesis rely on the assumption that the confidence values give an indication of the accuracy of the measurements. It is therefore of interest to investigate whether or not this assumption is true, and if so, to what extent.
The dependence of the measurement accuracy on the confidence values is evaluated by computing the error according to the norm (4.4) for measurements and then plotting the error against the corresponding confidence values. In order to compute the error for a measurement of an object, a reference marking of the object must be available. For each frame in the temporal overlap of each track-marking pair, as given in Section 4.1, the error is calculated for all measurements in the measurement group that was assigned to the track at that frame. This allows one to plot, for example, the mean value and the standard deviation of the error for different confidence value intervals, as is done in Section 5.3.
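The per-interval statistics mentioned above can be computed with a simple binning routine. The sketch assumes that the confidence values and errors of all evaluated measurements have been collected into two parallel lists; the function name, the bin edges and the use of the population standard deviation are illustrative choices.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

BinStats = Tuple[float, float, Optional[float], Optional[float]]


def error_by_confidence(confidences: List[float], errors: List[float],
                        edges: List[float]) -> List[BinStats]:
    """Mean and standard deviation of the error per confidence interval.

    Each interval is [lo, hi), except the last one, which also includes hi.
    Empty intervals yield None for both statistics.
    """
    stats: List[BinStats] = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [e for p, e in zip(confidences, errors)
                  if lo <= p < hi or (hi == edges[-1] and p == hi)]
        if in_bin:
            stats.append((lo, hi, mean(in_bin), pstdev(in_bin)))
        else:
            stats.append((lo, hi, None, None))
    return stats
```

Plotting the third and fourth entries of each tuple against the interval midpoints gives curves of the kind referred to in Section 5.3.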
Figure 4.1: Example of an overtaking scenario. The truck is located at (x, y, z) = (23, 3.1, 1.4).
5 Results
In this chapter the methods presented in Chapter 3 are evaluated. All input signals to the tracking stage have been recorded in real-world scenarios as described in Section 1.5. The error of each method is calculated in world coordinates using the error norms presented in Chapter 4. An error analysis of the confidence values is also given.
All methods are implemented using M/N initiation (see Section 2.2.4) of tracks, with $N_1 = 3$, $M_2 = 2$ and $N_2 = 3$. Tracks are deleted after $N_D = 4$ consecutive misses.
5.1 Method Evaluation
The methods are evaluated using data that consists of about $2 \cdot 10^5$ frames. For each method about 686 track-marking pairs were created, with an average length of about 305 frames.
The TE, 5-LTE and CE are shown in Table 5.1 for each method.
5.2 Method Evaluation in Overtaking Scenarios
The methods are evaluated using data that consists of about $2 \cdot 10^5$ frames. Only track-marking pairs where the track satisfies (4.5) are considered. For each method about 253 track-marking pairs were created, with an average length of about 75 frames.
The TE, 5-LTE and CE for overtaking scenarios are shown in Table 5.2 for each method.
Table 5.1: Estimated error in centimeters for all methods. The smallest and largest mean values in each column are marked with * and †, respectively.

Method          TE              5-LTE           CE
                Mean    Std     Mean    Std     Mean    Std
SC              21.84   11.02   38.36†  14.46   24.43†  12.05
WSC, m = 2      21.27   10.74   37.05   14.39   22.55   11.25
WSC, m = 3      21.04   10.62   36.58   14.26   21.84   10.94
WSC, m = 4      20.93   10.55   36.39   14.19   21.52   10.81
WSC, m = 5      20.89*  10.56   36.35*  14.22   21.37   10.79
WSC, m = 6      20.91   10.58   36.42   14.22   21.32   10.79
NWSC            21.06   10.59   36.47   14.21   21.19*  10.77
WSC             21.20   10.65   36.72   14.25   21.46   10.90
AC              21.38   10.78   37.05   14.50   21.78   11.11
PDA             22.58†  11.30   37.91   14.80   -       -
RC              20.91   10.67   38.09   15.69   22.73   11.83
Table 5.2: Estimated error in centimeters for all methods in overtaking scenarios. The smallest and largest mean values in each column are marked with * and †, respectively.

Method          TE              5-LTE           CE
                Mean    Std     Mean    Std     Mean    Std
SC              17.43   10.69   28.78   14.17   21.10†  12.57
WSC, m = 2      16.96   10.22   26.98   13.02   19.10   11.55
WSC, m = 3      16.94*  10.09   26.68   12.45   18.46   11.11
WSC, m = 4      16.96   10.15   26.67*  12.61   18.19*  10.96
WSC, m = 5      17.19   10.27   26.85   12.59   18.25   10.99
WSC, m = 6      17.37   10.37   27.11   12.64   18.32   11.00
NWSC            17.81   10.38   27.24   12.35   18.32   10.90
WSC             18.22   10.55   27.83   12.33   18.85   11.13
AC              18.63   10.73   28.41   12.36   19.41   11.43
PDA             19.34†  11.76   28.91†  14.00   -       -
RC              17.65   10.28   28.14   12.35   20.08   11.86