
Master of Science Thesis in Electrical Engineering

Department of Electrical Engineering, Linköping University, 2019

Evaluation of Target Tracking Using Multiple Sensors and Non-Causal Algorithms


Evaluation of Target Tracking Using Multiple Sensors and Non-Causal Algorithms

Albin Vestin and Gustav Strandberg

LiTH-ISY-EX--19/5256--SE

Supervisor: Lic. Eng. Gustav Lindmark, ISY, Linköping University
            Dr. Thomas Svantesson, Veoneer Sweden AB
            Dr. Patrik Leissner, Veoneer Sweden AB

Examiner: Assoc. Prof. Gustaf Hendeby, ISY, Linköping University

Division of Automatic Control
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden


Abstract

Today, the main research field for the automotive industry is finding solutions for active safety. In order to perceive the surrounding environment, tracking nearby traffic objects plays an important role. Validation of the tracking performance is often done in staged traffic scenarios, where additional sensors, mounted on the vehicles, are used to obtain their true positions and velocities. The difficulty of evaluating tracking performance complicates its development.

An alternative approach, studied in this thesis, is to record sequences and use non-causal algorithms, such as smoothing, instead of filtering to estimate the true target states. With this method, validation data for online, causal, target tracking algorithms can be obtained for all traffic scenarios without the need for extra sensors. We investigate how non-causal algorithms affect the target tracking performance using multiple sensors and dynamic models of different complexity. This is done to evaluate real-time methods against estimates obtained from non-causal filtering.

Two different measurement units, a monocular camera and a LIDAR sensor, and two dynamic models are evaluated and compared using both causal and non-causal methods. The system is tested in two single object scenarios where ground truth is available and in three multi object scenarios without ground truth.

Results from the two single object scenarios show that tracking using only a monocular camera performs poorly, since it is unable to measure the distance to objects. Here, a complementary LIDAR sensor improves the tracking performance significantly. The dynamic models are shown to have a small impact on the tracking performance, while the non-causal application gives a distinct improvement when tracking objects at large distances. Since the sequence can be reversed, the non-causal estimates are propagated from more certain states when the target is closer to the ego vehicle. For multiple object tracking, we find that correct associations between measurements and tracks are crucial for improving the tracking performance with non-causal algorithms.


Acknowledgments

New city, new friends and so many good memes throughout our time here. First and foremost, we would like to thank all of the Yc nerds for keeping us sane during countless hours of studying.

A big thanks to our supervisors Thomas Svantesson and Patrik Leissner at Veoneer for all the help and feedback during our work. Thank you for setting us up, answering our questions and putting our work into context. Thank you, Daniel Ankelhed, for all your help and advice on our work. A high five to you, Casper Johansson, for making final calls on visual decisions. We would also like to thank Gustaf Hendeby, both for teaching us sensor fusion and for being the examiner of this thesis. We appreciate all of your ideas and input on our thesis. Last but not least, we would like to thank Gustav Lindmark for proofreading our thesis and providing external answers when needed. You truly are "Norrlands Guld".

Last but not least we would like to thank our families. Without you, we would not have become who we are today. Thank you for everything you have ever done, all the food you have cooked, all the support and love you give, and for the happiness we share.

Linköping, June 2019 Albin Vestin and Gustav Strandberg


Contents

Notation ix

List of Figures xiii

List of Tables xiv

1 Introduction . . . 1
1.1 Background . . . 2
1.2 Application . . . 3
1.3 Related Work . . . 3
1.4 Objectives . . . 4
1.5 Project Limitations . . . 4
1.6 Thesis Outline . . . 5

2 Tracking Fundamentals . . . 7
2.1 Overview . . . 8
2.2 Dynamic Modelling . . . 9
2.3 Filtering . . . 9

2.3.1 Extended Kalman Filter . . . 10

2.3.2 Noise Parameter Tuning . . . 11

2.3.3 Iterated Kalman Filter . . . 11

2.4 Fixed Interval Smoothing . . . 12

2.5 Data Association . . . 13

2.5.1 Global Nearest Neighbour . . . 14

2.5.2 Mahalanobis Distance . . . 15

2.6 Track Management . . . 16

2.7 Sensor Fusion . . . 17

2.8 Filter Validation . . . 17

3 The Non-Causal Object Tracker . . . 19
3.1 Design Overview . . . 20

3.2 Track Association and Management . . . 21

3.3 Coordinate Systems . . . 22

(8)

3.4 Measurements . . . 24
3.4.1 Region of Interest . . . 24
3.4.2 LIDAR . . . 26
3.4.3 Fused Measurements . . . 27
3.4.4 Differential GPS . . . 27
3.5 Filtering . . . 28
3.6 Dynamic Modeling . . . 30

3.6.1 Ego Vehicle Dynamics . . . 30

3.6.2 Target Dynamics . . . 32

3.6.3 The IMU Model . . . 33

4 Performance Evaluation . . . 35
4.1 Single Object Tracking . . . 36

4.1.1 Scenario 1: Oncoming Vehicle . . . 36

4.1.2 Scenario 2: Overtaking Vehicle . . . 44

4.1.3 RMSE Comparison . . . 52

4.2 Multi Object Tracking . . . 53

4.2.1 Scenario 3: Independent Targets . . . 53

4.2.2 Scenario 4: Overlapping Targets . . . 58

4.2.3 Scenario 5: Occluded Target . . . 60

4.3 Alternative Dynamic Model . . . 62

4.3.1 IMU Scenario 1 . . . 62

4.3.2 RMSE Comparison . . . 68

4.A IMU Scenario 2 . . . 69

5 Conclusions and Discussion . . . 77
5.1 Non-Causal Object Tracking . . . 78

5.1.1 Multiple Sensors . . . 78

5.1.2 Non-Causal Application . . . 78

5.1.3 Model Complexity . . . 79

5.1.4 Multi Object Tracking . . . 79

5.2 Future Work . . . 79


Notation

STOCHASTIC DEFINITIONS

Notation   Meaning
x          State
P          State covariance
y          Measurement
u          Input
w          Process noise
Q          Process noise covariance
e          Measurement noise
R          Measurement noise covariance

OPERATIONS

Notation    Meaning
Cov         Covariance
max         Maximize
diag(a, b)  Diagonal matrix of elements a and b
A^T         Transpose of matrix A
p(x|θ)      Probability density function of x given θ
≜           Equal by definition
x̂           Estimate of stochastic variable x
f'_x        Partial derivative of function f with respect to variable x
ẋ           Time derivative of x


DISCRETE TIME SYMBOLS

Symbol      Meaning
Ts          Sampling time
x_k         State vector x at time k, x(kTs)
k|m         At time k given measurements up to time m
N           Number of sampling points in sequence
x̂^F_{k|k}   Superscript F indicates causal estimate, using measurements up to time k
x̂^B_{k|k}   Superscript B indicates anti-causal estimate, using measurements from time k up to N

GEOMETRY

Symbol  Meaning
X       Longitudinal Cartesian axis
Y       Lateral Cartesian axis
Z       Vertical Cartesian axis
U       Horizontal image axis
V       Vertical image axis

ABBREVIATIONS

Abbreviation Meaning

EKF Extended Kalman Filter

RTS Rauch-Tung-Striebel Smoother

TFS Two Filter Smoother

SPRT Sequential Probability Ratio Test

ADAS Advanced Driver Assistance Systems

GNN Global Nearest Neighbour

ROI Region Of Interest

IMU Inertial Measurement Unit

LIDAR Light Detection And Ranging

RADAR Radio Detection And Ranging

CNN Convolutional Neural Network

CV Constant Velocity

PDF Probability Density Function


OTHER NOTATIONS

Notation         Explanation
Track            Object of interest whose state is estimated over time
Target           Object of interest at the present time
Ego              The vehicle from which other objects are tracked, also called ego vehicle or ego car
Tracker          The implemented filter to track other objects
Ground truth     The true state
Tentative track  A target that has neither been rejected nor confirmed as a track
Observation      Measurement
Online tracking  Causal/real-time tracking


List of Figures

1.1 Overview of the target product developed by Veoneer . . . 3

2.1 Brief overview of online tracking . . . 8

2.2 Overview of the filter recursion . . . 10

2.3 Gating areas between two tracks and two observations . . . 14

2.4 Example of how the covariance affects the Mahalanobis distance . . . . 15

2.5 An example of 2/2 & 3/4 M/N-logic . . . 16

2.6 A brief overview of the fusion methods . . . 17

3.1 Design overview of the non-causal object tracker . . . 20

3.2 Illustrations of coordinate systems . . . 22

3.3 Overview of the pinhole camera model . . . 23

3.4 Perfect ROI markings done by hand . . . 24

3.5 ROI measurement of a vehicle . . . 25

3.6 A brief overview of the implemented filtering algorithm . . . 29

3.7 Definition of Euler angles in the tracking coordinate system . . . 31

3.8 Transformation between two frames, given by visual odometry . . . 31

4.1 Single object scenario 1: Oncoming vehicle . . . 36
4.2 Scenario 1: Estimates px, vx, py and vy using ROI . . . 37
4.3 Scenario 1: Absolute error of px, vx, py and vy using ROI . . . 38
4.4 Scenario 1: Estimates s, pz and vz using ROI . . . 39
4.5 Scenario 1: Estimates px, vx, py and vy using ROI and LIDAR . . . 41
4.6 Scenario 1: Absolute error of px, vx, py and vy using ROI and LIDAR . . . 42
4.7 Scenario 1: Estimates s, pz and vz using ROI and LIDAR . . . 43
4.8 Single object scenario 2: Overtaking vehicle . . . 44
4.9 Scenario 2: Estimates px, vx, py and vy using ROI . . . 45
4.10 Lateral uncertainties with multiplicative noise . . . 46
4.11 Scenario 2: Absolute error of px, vx, py and vy using ROI . . . 47
4.12 Scenario 2: Estimates s, pz and vz using ROI . . . 48
4.13 Scenario 2: Estimates px, vx, py and vy using ROI and LIDAR . . . 49
4.14 Scenario 2: Absolute error of px, vx, py and vy using ROI and LIDAR . . . 50
4.15 Scenario 2: Estimates s, pz and vz using ROI and LIDAR . . . 51
4.16 Width estimation of a maneuvering vehicle . . . 52
4.17 Scenario 3: Independent targets . . . 53


4.20 Scenario 3: Estimates px, vx, py and vy of Track 2 using ROI and LIDAR . . . 56
4.21 Scenario 3: Estimates s, pz and vz of Track 2 using ROI and LIDAR . . . 57
4.22 Scenario 4: Overlapping targets . . . 58
4.23 Scenario 4: Estimates px, vx, py and vy of two tracks using ROI and LIDAR . . . 59
4.24 Scenario 4: Estimates s, pz and vz of two tracks using ROI and LIDAR . . . 60
4.25 Scenario 5: Occluded target . . . 61
4.26 Scenario 5: Estimates px, vx, py and vy using ROI and LIDAR . . . 61
4.27 Scenario 1: Estimates px, vx, py and vy using ROI (IMU) . . . 63
4.28 Scenario 1: Absolute errors of px, vx, py and vy using ROI (IMU) . . . 64
4.29 Scenario 1: Estimates pz and s using ROI (IMU) . . . 65
4.30 Scenario 1: Estimates px, vx, py and vy using ROI and LIDAR (IMU) . . . 66
4.31 Scenario 1: Absolute errors of px, vx, py and vy using ROI & LIDAR (IMU) . . . 67
4.32 Scenario 1: Estimates pz and s using ROI and LIDAR (IMU) . . . 68
4.33 Scenario 2: Estimates px, vx, py and vy using ROI (IMU) . . . 70
4.34 Scenario 2: Absolute errors of px, vx, py and vy using ROI (IMU) . . . 71
4.35 Scenario 2: Estimates pz and s using ROI (IMU) . . . 72
4.36 Scenario 2: Estimates px, vx, py and vy using ROI and LIDAR (IMU) . . . 73
4.37 Scenario 2: Absolute errors of px, vx, py and vy using ROI & LIDAR (IMU) . . . 74
4.38 Scenario 2: Estimates pz and s using ROI (IMU) . . . 75

List of Tables

4.1 RMSE for both of the single object scenarios . . . 52
4.2 RMSE for both of the single object scenarios (IMU) . . . 68


Chapter 1

Introduction

Today, accidents in traffic are mainly caused by the human factor. In fact, around 94% of all traffic accidents happen as a consequence of human errors [1]. In the future, when more vehicles are automated, traffic accidents are expected to be reduced and many lives saved. To get there, automated vehicles have to be safe, which is ensured by continuous testing and validation.

Reliability of a system lies in getting consistent results from a robust design. To achieve the expected behaviour of a system, extensive testing and validation is required. A recent application where rigorous testing is applied is self-driving cars. For example, self-driving cars need to track the relative positions and velocities of oncoming vehicles, e.g. from images of a mounted camera. But if these estimates are inaccurate, the car may act on a non-existing threat or, even worse, miss the detection of danger. To prevent this, the system should be tested in different traffic scenarios.

To validate the outcome of a system and ensure it is correct, staged scenarios are often used where the results can be compared to the expected outcome. For an autonomous car, the estimated relative position of other objects could be compared to measurements from external sensors, used as ground truth. With this method, the tracking performance could be measured as the difference between the estimate and the true state. This is an effective method but it requires the validation to be performed in staged scenarios.

An alternative approach is to record sequences in traffic and afterwards use knowledge from the whole recording to estimate the true state trajectory. This method does not require the tracked objects to be equipped with additional sensors and could therefore be applied in real traffic scenarios. For online tracking the computational time is limited, but for recorded sequences, this limitation is removed. Another advantage is the possibility to use non-causal algorithms. Since the full sequence is available, the track could be estimated using future measurements to obtain an even more accurate output.


1.1 Background

In the automotive industry, the main focus of the last decades has been to find solutions for active safety rather than for passive safety. The difference between the two is that active safety systems prevent accidents, while passive safety protects the passengers in case of an accident [2]. Examples of passive safety systems are seat belts and airbags. Today, new Advanced Driver Assistance Systems (ADAS) are developed to actively increase the safety of the passengers. Emergency braking, adaptive cruise control and lane keep assist are examples of ADAS that are used in modern cars. All of these features act on information about the surrounding environment of the vehicle. To perceive the traffic environment and estimate movements of the nearby objects, sensor fusion and target tracking play a central part. [3]

Target tracking is used to estimate the relative positions and velocities of the objects in the traffic environment, referred to as tracks. For this to be possible, measurements of the ego movement and detections of the surrounding objects are necessary. The sensors are classified into internal and external sensors. For tracking applications, the internal sensor equipment typically consists of sensors like gyroscopes and accelerometers, measuring angular rates and accelerations, respectively. The traffic environment is instead often observed through external sensors such as camera, LIDAR and RADAR systems. To increase the performance of target tracking, multiple sensors can be used to complement each other. [4]

The target tracking performed in ADAS used for collision avoidance and lane keeping needs to operate in real time, often referred to as online. Online target tracking is limited to causal filtering algorithms in order to finish within a limited time. These requirements are relaxed when performing target tracking offline, which is the case when recorded sequences are evaluated to validate the systems. For offline applications of target tracking, more computationally expensive algorithms that operate non-causally can be applied. This means that estimation can be done using both historical and future data, compared to the causal case where only prior measurements are available. Filters that estimate non-causally are called smoothers. Smoothed estimates are more accurate since more data is used at each time step [5, p. 264].

To estimate variables of interest, measured data can be processed using sensor fusion algorithms, which operate in a prediction and a correction step. The prediction step uses a dynamic model to describe the transition of states through time. This step, often referred to as the time update, results in increased uncertainties of the states [6]. The measurements are then used to correct the predictions in a step called the measurement update, which reduces the state uncertainties [6]. Whether the estimates should rely mainly on the prediction or the correction step is described by filter parameters. Some of these parameters are set to describe the hardware, while others are designed by the user to obtain a system suited for their purpose [7, p. 1]. Using these concepts along with information about the error distributions, estimates of the states can be obtained.


1.2 Application

This thesis is done in cooperation with the company Veoneer, which develops automated driving and ADAS. Their target product is a real-time tracking device for identifying and monitoring the nearby traffic environment. The tracking system uses a camera sensor for monitoring objects, such as vehicles and pedestrians. The relative position and velocity of the detected objects are estimated in order to avoid collisions. The main use of the system is to provide the active safety systems with information about the environment [8]. The product takes images from the camera sensor as input and extracts, for example, Regions Of Interest (ROI) to classify visible objects. Other sensors can also be used as extra inputs when tracking objects, such as Light Detection and Ranging (LIDAR) sensors, Inertial Measurement Units (IMU) and Radio Detection and Ranging (RADAR). The system in action is illustrated in Figure 1.1.

Figure 1.1: Overview of the target product developed by Veoneer, [9]. Copyright Veoneer.

1.3 Related Work

Optimal smoothing techniques are well established and exist in many different variants [10]. A related application of smoothing techniques for high precision tracking is offline navigation systems [11], [12]. In these articles the tracking is improved by using the non-causal Two Filter Smoother (TFS) together with the Extended Kalman Filter (EKF). The smoothing is done by forming a state as the error between the forward and backward recursions. The resulting smoothing algorithms obtain much more accurate estimates for these highly nonlinear systems.

Regarding vehicle tracking, many studies on how to estimate the movement of cars have been done. It is reasonable to model the ego car more thoroughly than the other objects, since all tracks are measured relative to it [4]. One way of modeling track


dynamics is to use an Interacting Multiple Model (IMM) filter, which predicts the states from different motion models in parallel. Each of the predictions is weighted, and the final output of the IMM filter at each time point is a combination of all predictions, based on their weights. In [13], multiple motion models are used to describe the object dynamics. The authors conclude that nonmaneuvering vehicles that travel with constant velocity, for example on highways or at stop signs, can be modeled with lower process noise to improve tracking. This is relevant when the sampling time is large and the process noise is high. However, if the target is constantly maneuvering, the result would not improve significantly compared to a Kalman filter [13].

A related work on vehicle tracking with a monocular camera is [14], where each image pixel is mapped to a longitudinal distance by assuming a flat road. For detecting vehicles they use a trained Convolutional Neural Network (CNN). The detections are in turn used as measurements in an EKF. The filter output is then compared against LIDAR measurements used as ground truth. This work concludes that tracking using a monocular camera and a CNN is sufficient both at close and large distances.

1.4 Objectives

The purpose of this thesis is to investigate how non-causal algorithms affect target tracking performance using multiple sensors and dynamic models of different complexity. This is done to evaluate real-time methods against estimates obtained by smoothing. Consequently, the following questions will be addressed:

• Do non-causal algorithms improve the tracking performance, compared to causal methods?

• Does data from multiple sensors improve the tracking performance, compared to using a single sensor?

• Does a more complex noise model improve the tracking performance?

• Does tracking multiple objects affect the tracking performance?

1.5 Project Limitations

This thesis extends across a timeline of 20 weeks. In order to make the project feasible to complete in time, some limitations have been introduced. Besides the limited amount of time, factors like the amount of useful data sequences also limit the project.

The choice of filtering algorithm is assumed to have a small impact on the tracking performance and is therefore not investigated.

• Filtering choice - Only the EKF will be evaluated.

• Non-causal methods - Only the Two Filter Smoother (TFS) will be evaluated.

• Data association - The Global Nearest Neighbour (GNN) algorithm is the only data association method that will be evaluated.


• Measurement units - Monocular camera, LIDAR, IMU and Differential Global Positioning System (DGPS).

• Only tracking of cars will be evaluated. Pedestrians, buses and other objects will therefore not be tracked nor evaluated.

1.6 Thesis Outline

The thesis is structured as follows. First, in Chapter 2, we introduce the theoretical background of object tracking, including the fundamental concepts and algorithms. Following that, the implemented tracker is described in Chapter 3. Thereafter, the results of the thesis are found in Chapter 4. Finally, conclusions and discussion of the results, together with suggested future work, are found in Chapter 5.


Chapter 2

Tracking Fundamentals

The aim of this chapter is to give the reader insight into methods for common tracking components. The theory and algorithms used in this thesis are presented. Some theory is given in general terms; in such cases, further details are found in the references.

The chapter begins with a brief overview of a general online tracking system. Thereafter, some details are given on how dynamic modelling is used for tracking, characterizing a Constant Velocity (CV) model. Following that, filtering theory and common approaches are presented, including the Extended Kalman Filter, the Iterated Kalman Filter and fundamentals of non-causal smoothing algorithms. Further, the data association problem is presented, outlined with a solution using the Global Nearest Neighbour (GNN) method. Later, some theoretical approaches to track management are presented, describing how to decide if a track is active or not. This is followed by a brief description of sensor fusion techniques. The chapter ends with some alternative considerations on how to validate a filter.


2.1 Overview

A simple schematic of an online tracking algorithm can be seen in Figure 2.1. The different steps and functionalities in the flow chart are:

1. Associating measurements with tracks and initiating new tracks.

• The measurements for the current frame are compared with the track states from the latest time update, and the measurements are associated with the tracks. This is done with, for instance, Global Nearest Neighbour [15, Sec. 6.4]. If there are unassociated measurements, new tracks are initialized. Unassociated tracks are dealt with in the next step.

2. Track management.

• Managing the tracks based on whether they are associated with measurements or not. This can be done with M/N-logic or a score based methodology like the Sequential Probability Ratio Test (SPRT). Both of these use some kind of track score and thresholds to decide whether a track is active or not. [15]

3. State correction.

• Correcting the states for each active track using the associated measurements. 4. State prediction.

• Predicting the states to the next frame for each active track.

Figure 2.1: Brief overview of online tracking. The flow chart cycles through: measurements and processed sensor data; associating measurements with tracks and initiating new tracks; track management; state correction; and state prediction.
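The M/N-logic mentioned in step 2 can be sketched as a sliding-window test over the association history (an illustrative Python sketch; the 2/2 and 3/4 rules follow the example in Figure 2.5, and the function name is our own):

```python
from collections import deque

def mn_confirmed(hits, m, n):
    """True if at least m of the last n frames had an associated measurement."""
    recent = list(hits)[-n:]
    return len(recent) == n and sum(recent) >= m

# A tentative track is confirmed on 2/2 hits; a confirmed track is then
# maintained as long as 3 of the last 4 frames have hits.
hits = deque(maxlen=4)            # rolling association history
hits.extend([True, True])
print(mn_confirmed(hits, 2, 2))   # True: 2/2 rule satisfied, confirm track
hits.extend([False, True])
print(mn_confirmed(hits, 3, 4))   # True: 3 of the last 4 frames had hits
```

A score based rule such as the SPRT would replace the boolean window with an accumulated log-likelihood ratio compared against two thresholds.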


2.2 Dynamic Modelling

Dynamic modelling of a target describes the state transition through time, for example through its kinematics [16]. Depending on the application and complexity, numerous model alternatives exist [16]. One dynamic model that can be used for vehicle tracking is the CV model. It approximates the velocity transition between time instants as constant [13]. A CV model in three dimensions describes the dynamics of the states

$$x_k = \begin{bmatrix} p_k^x & v_k^x & p_k^y & v_k^y & p_k^z & v_k^z \end{bmatrix}^T,$$

where $p^x$, $p^y$ and $p^z$ define the position and $v^x$, $v^y$ and $v^z$ define the velocity in Cartesian coordinates. The time update of the states is modeled with noise

$$w_k = \begin{bmatrix} w_k^{\ddot{x}} & w_k^{\ddot{y}} & w_k^{\ddot{z}} \end{bmatrix}^T,$$

describing the uncertainties along each of the axes. The model

$$x_{k+1} =
\begin{bmatrix}
1 & T_s & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & T_s & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & T_s \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix} x_k +
\begin{bmatrix}
T_s^2/2 & 0 & 0 \\
T_s & 0 & 0 \\
0 & T_s^2/2 & 0 \\
0 & T_s & 0 \\
0 & 0 & T_s^2/2 \\
0 & 0 & T_s
\end{bmatrix} w_k, \quad (2.1)$$

is defined in discrete time with sampling time $T_s$ [6]. An important assumption when modeling the dynamics this way is that the motion in each dimension is independent [6]. The assumption that the velocity is constant implicitly means that the absolute acceleration is zero. In a traffic environment, ego vehicle acceleration and retardation are expected; hence, when tracking from an ego vehicle, the observations are often relative. This can be dealt with by instead tracking in absolute position and velocity, compensating for the movement of the ego vehicle [4].
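The CV model (2.1) can be propagated one step as in the following sketch (illustrative values; the block structure of the matrices is built with a Kronecker product, which reproduces the per-axis blocks of (2.1)):

```python
import numpy as np

Ts = 0.1  # sampling time in seconds (illustrative value)

# State x = [px, vx, py, vy, pz, vz]^T, per the CV model (2.1).
F1 = np.array([[1.0, Ts],
               [0.0, 1.0]])        # per-axis transition block
G1 = np.array([[Ts**2 / 2],
               [Ts]])              # per-axis noise gain block

F = np.kron(np.eye(3), F1)         # block-diagonal 6x6 transition matrix
G = np.kron(np.eye(3), G1)         # 6x3 noise gain matrix

x = np.array([10.0, -2.0, 1.0, 0.0, 0.0, 0.0])  # example relative state
w = np.zeros(3)                                  # acceleration noise sample

x_next = F @ x + G @ w
# px advances by vx*Ts: 10.0 + (-2.0)*0.1 = 9.8, all velocities unchanged
```

With `w = 0` the prediction is purely kinematic; in the filter, the noise enters the state covariance through $G Q G^T$ instead of being sampled.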

2.3 Filtering

One way of estimating states is by using a filtering method, which calculates the states recursively using system models. When working with state space models, filtering can be done in a correction and a prediction step. For the correction step, the measurement model

$$y_k = h(x_k, u_k, e_k), \quad (2.2)$$

is used to describe the probability of the measurements $y_k$ as a function of the states $x_k$, the inputs $u_k$ and the measurement noise $e_k$, at time $k$. The measurement model describes the corresponding Probability Density Function (PDF) $p(y_k|x_k)$. The corrected state is distributed according to the posterior density function $p_{k|k}(x_k|y_{1:k})$. The next step, which is the prediction step, is done using a motion model, such as (2.1). The transition


from the current to the predicted future state is described by the PDF $p(x_{k+1}|x_k)$, giving the prior density function $p_{k+1|k}(x_{k+1}|y_{1:k})$. Figure 2.2 illustrates the recursion. [4, p. 35-36]

Figure 2.2: Overview of the filter recursion, with the connection between the correction step (the likelihood function $p(y_k|x_k)$ from the measurement model $y_k = h(x_k, u_k, e_k)$, yielding $p_{k|k}(x_k|y_{1:k})$) and the prediction step (the transition density $p(x_{k+1}|x_k)$ from the motion model $x_{k+1} = f(x_k, u_k, w_k)$, yielding $p_{k+1|k}(x_{k+1}|y_{1:k})$) [4].

Several approaches exist for estimating the states, depending on the transition density and the likelihood function. For linear models where both distributions are Gaussian, the Kalman filter gives optimal estimates. For nonlinear models with Gaussian distributions for both process and measurement noise, the Extended Kalman Filter and the Unscented Kalman Filter (UKF) are two well known approaches. In cases where the noises are not Gaussian distributed, other approaches such as the particle filter or the marginalized particle filter are more suited. [4, p. 36]

2.3.1 Extended Kalman Filter

Originating from the linear Kalman filter, the Extended Kalman Filter (EKF) approximates a nonlinear model with a linear one around the current state estimate [6]. The nonlinear system is given by

$$\begin{aligned}
x_{k+1} &= f(x_k, u_k, w_k), & \operatorname{Cov}(w_k) &= Q_k, \\
y_k &= h(x_k, u_k, e_k), & \operatorname{Cov}(e_k) &= R_k.
\end{aligned} \quad (2.3)$$

This represents a nonlinear state space model where $x_k$ is the state vector, $u_k$ is a known input and both $w_k$ and $e_k$ are noise vectors with covariances $Q_k$ and $R_k$, respectively. With the index $k|m$ indicating time $k$ given measurements up to time $m$, the EKF recursion is given in Algorithm 2.1 [6].


Algorithm 2.1 Extended Kalman Filter [6]

Given the nonlinear system (2.3) with $\hat{x}_{k|k-1}$ and $P_{k|k-1}$ from the previous time update, the following recursion defines the EKF of order one:

Measurement Update:
$$\begin{aligned}
S_k &= h'_x(\hat{x}_{k|k-1}) P_{k|k-1} (h'_x(\hat{x}_{k|k-1}))^T + h'_e(\hat{x}_{k|k-1}) R_k (h'_e(\hat{x}_{k|k-1}))^T && (2.4a) \\
K_k &= P_{k|k-1} (h'_x(\hat{x}_{k|k-1}))^T S_k^{-1} && (2.4b) \\
\epsilon_k &= y_k - h(\hat{x}_{k|k-1}) && (2.4c) \\
\hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k \epsilon_k && (2.4d) \\
P_{k|k} &= P_{k|k-1} - P_{k|k-1} (h'_x(\hat{x}_{k|k-1}))^T S_k^{-1} h'_x(\hat{x}_{k|k-1}) P_{k|k-1} && (2.4e)
\end{aligned}$$

Time Update:
$$\begin{aligned}
\hat{x}_{k+1|k} &= f(\hat{x}_{k|k}) && (2.4f) \\
P_{k+1|k} &= f'_x(\hat{x}_{k|k}) P_{k|k} (f'_x(\hat{x}_{k|k}))^T + f'_w(\hat{x}_{k|k}) Q_k (f'_w(\hat{x}_{k|k}))^T && (2.4g)
\end{aligned}$$

Here, $f'_x = \frac{\partial f}{\partial x}$ and the shorthand notations $f(x_k) = f(x_k, u_k, w_k)$ and $h(x_k) = h(x_k, u_k, e_k)$ are used.

The EKF of order one uses the Jacobian matrix to approximate the nonlinearities. Other approaches such as the Unscented Kalman Filter instead use sigma points transformed via the nonlinear functions, which reduces higher order linearization errors [5].
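One recursion of Algorithm 2.1 can be sketched as follows (a minimal sketch assuming additive noise, so the noise Jacobians $h'_e$ and $f'_w$ reduce to identity; function and variable names are our own):

```python
import numpy as np

def ekf_step(x, P, y, f, h, F_jac, H_jac, Q, R):
    """One EKF recursion: measurement update (2.4a)-(2.4e) followed by the
    time update (2.4f)-(2.4g), with additive noise assumed."""
    H = H_jac(x)
    S = H @ P @ H.T + R                    # innovation covariance (2.4a)
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain (2.4b)
    x_filt = x + K @ (y - h(x))            # corrected state (2.4c)-(2.4d)
    P_filt = P - K @ S @ K.T               # corrected covariance (2.4e)
    F = F_jac(x_filt)
    x_pred = f(x_filt)                     # predicted state (2.4f)
    P_pred = F @ P_filt @ F.T + Q          # predicted covariance (2.4g)
    return x_filt, P_filt, x_pred, P_pred
```

For a linear model, `f`, `h` and their Jacobians coincide with the system matrices and the recursion reduces to the ordinary Kalman filter.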

2.3.2 Noise Parameter Tuning

The time and measurement update steps in the EKF recursion update the state uncertainties based on the noise covariances $Q_k$ and $R_k$, respectively. The measurement noise $R_k$ reflects the accuracy of the sensor and is therefore given by hardware specifications [7]. For artificial measurements that cannot be directly measured, the noise is expressed as a combination of the covariances of the involved variables [17].

The process noise $w_k$, which expresses the certainty of the motion model, is often more difficult to specify than the measurement noise. For most applications the process noise is therefore tuned by ad hoc methods. The ratio between $Q_k$ and $R_k$ reflects the balance between trusting the measurements and trusting the dynamic model. For this reason, the filter performance is determined by how the parameters are tuned. [7]
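For the CV model (2.1), one common starting point for tuning is to build $Q_k$ from the noise gain matrix of (2.1) and a scalar acceleration variance $q$ (a hedged sketch; the choice $Q = q\,GG^T$ is a standard heuristic and the value of $q$ below is purely illustrative, not prescribed by the thesis):

```python
import numpy as np

Ts = 0.1   # sampling time (illustrative)
q = 2.0    # acceleration noise variance, the main tuning knob (illustrative)

G1 = np.array([[Ts**2 / 2],
               [Ts]])              # per-axis noise gain from (2.1)
G = np.kron(np.eye(3), G1)         # 6x3 gain for all three axes

# Process noise covariance: Q = q * G * G^T. Position and velocity along
# the same axis become correlated (off-diagonal Ts^3/2 * q terms), while
# the axes stay independent.
Q = q * (G @ G.T)
```

Scaling `q` up shifts trust from the CV model toward the measurements; scaling it down does the opposite, per the $Q/R$ ratio discussed above.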

2.3.3 Iterated Kalman Filter

When data from multiple sensors are available, an Iterated Kalman Filter can be used for correcting the states. This approach updates on all available data sequentially in each time frame. For uncorrelated measurements, this is equivalent to performing one measurement update using all of the measurements [6]. The iteration starts with the initialization of the state

$$\begin{aligned}
\hat{x}^0_{k|k} &= \hat{x}_{k|k-1}, && (2.5a) \\
P^0_{k|k} &= P_{k|k-1}. && (2.5b)
\end{aligned}$$

The measurement update at time $k$ is

$$\begin{aligned}
K^i_k &= P^{i-1}_{k|k} \left(\frac{\partial h^i(x_k, u_k)}{\partial x}\right)^T \left( \frac{\partial h^i(x_k, u_k)}{\partial x} P^{i-1}_{k|k} \left(\frac{\partial h^i(x_k, u_k)}{\partial x}\right)^T + R^i_k \right)^{-1}, && (2.6a) \\
P^i_{k|k} &= P^{i-1}_{k|k} - K^i_k \left( \frac{\partial h^i(x_k, u_k)}{\partial x} \right) P^{i-1}_{k|k}, && (2.6b) \\
\hat{x}^i_{k|k} &= \hat{x}^{i-1}_{k|k} + K^i_k \left( y^i_k - \frac{\partial h^i(x_k, u_k)}{\partial x} \, \hat{x}^{i-1}_{k|k} \right), && (2.6c)
\end{aligned}$$

where $i = 1, 2, \ldots, M$ are the available sensors [6]. The final step of the measurement update is

$$\begin{aligned}
\hat{x}_{k|k} &= \hat{x}^M_{k|k}, && (2.7a) \\
P_{k|k} &= P^M_{k|k}. && (2.7b)
\end{aligned}$$
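The sequential update (2.5)-(2.7) can be sketched as follows for linear measurement models, with a matrix $H_i$ standing in for the Jacobian $\partial h^i/\partial x$ (an illustrative sketch; names are our own):

```python
import numpy as np

def sequential_update(x, P, sensor_data):
    """Sequential measurement update over M sensors, per (2.5)-(2.7).
    Each entry of sensor_data is a tuple (y_i, H_i, R_i)."""
    for y_i, H_i, R_i in sensor_data:
        S = H_i @ P @ H_i.T + R_i            # innovation covariance
        K = P @ H_i.T @ np.linalg.inv(S)     # gain for sensor i (2.6a)
        x = x + K @ (y_i - H_i @ x)          # (2.6c)
        P = P - K @ H_i @ P                  # (2.6b)
    return x, P                              # (2.7): the final iterate
```

For uncorrelated measurement noise, this gives the same result as stacking all sensors into one batch measurement update.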

2.4 Fixed Interval Smoothing

In Section 2.3, theory for estimating the prior and posterior states, $\hat{x}_{k|k-1}$ and $\hat{x}_{k|k}$, is presented. These are estimated from a prediction and a correction step respectively, where the historical measurements up to the current time step are used. However, in many cases data has been recorded in advance, meaning that both future and historical data are available for the state estimation. Methods that use this non-causality are called smoothers. [5, p. 263-264]

When estimating states non-causally, fixed interval smoothing is one approach. A smoother applied on the time interval $[1, N]$ estimates $\hat{x}_{k|N}$ in every time step $k \in [1, N]$ using all available measurements within the interval.

The Two Filter Smoother (TFS) [6, p. 178] is an approach that uses non-causality for state estimation and is easily implemented for reversible state space models [10]. Like all fixed interval smoothers, the idea of this method is that the state $\hat{x}_k$, $k = 1, 2, \ldots, N$, is estimated using the information from the whole interval. This is done using two independent filters, where one operates forward in time and the other backwards in time. The outputs of the forward filter at time $k$ are the causal estimates $\hat{x}^F_{k|k}$ and the state covariance $P^F_{k|k}$. The backwards filter outputs the anti-causal estimates $\hat{x}^B_{m|m+1}$ and the state covariance $P^B_{m|m+1}$, where the measurements from time $m+1$ up to $N$ are used for the estimate at time $m$. By fusing these results, the smoothed estimates, $\hat{x}_{m|N}$ and $P_{m|N}$, are acquired. [5, p. 280]

To derive the model used in the backward filter, assume that the dynamic model from the nonlinear state space model (2.3) can be expressed as the backwards recursion

$$x_k = g(x_{k+1}, u_k, w_k),$$
$$y_k = h(x_k, u_k, e_k). \quad (2.8)$$

The TFS is given in Algorithm 2.2.

Algorithm 2.2 Two Filter Smoothing [6, p. 180]

Given the forward (2.3) and backward (2.8) models, the smoothed states $\hat{x}_{k|N}$ and state covariances $P_{k|N}$ in each time step $k = 1, 2, ..., N$ are obtained by:

1. Forward filtering. Apply Algorithm 2.1 using the forward model (2.3) to compute $\hat{x}^F_{k|k}$ and $P^F_{k|k}$, and store them.

2. Backward filtering. Apply Algorithm 2.1 using the backward model (2.8) to compute $\hat{x}^B_{k|k+1}$ and $P^B_{k|k+1}$, and store them. The initialization of the backward filter should be based on no prior information, setting the initial state covariance as infinite.

3. Fusion of the states and state covariances. The smoothed states $\hat{x}_{k|N}$ and state covariances $P_{k|N}$ are obtained by

$$P_{k|N} = \left((P^F_{k|k})^{-1} + (P^B_{k|k+1})^{-1}\right)^{-1}, \quad (2.9a)$$
$$\hat{x}_{k|N} = P_{k|N}\left((P^F_{k|k})^{-1}\hat{x}^F_{k|k} + (P^B_{k|k+1})^{-1}\hat{x}^B_{k|k+1}\right). \quad (2.9b)$$
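The fusion step (2.9) is an information-weighted combination of the two estimates. A minimal sketch:

```python
import numpy as np

def tfs_fuse(x_f, P_f, x_b, P_b):
    """Fuse the forward estimate (x_f, P_f) with the backward estimate
    (x_b, P_b) according to eq. (2.9)."""
    I_f = np.linalg.inv(P_f)             # forward information
    I_b = np.linalg.inv(P_b)             # backward information
    P = np.linalg.inv(I_f + I_b)         # eq. (2.9a)
    x = P @ (I_f @ x_f + I_b @ x_b)      # eq. (2.9b)
    return x, P
```

With equal covariances, the smoothed state is simply the average of the forward and backward estimates, and the smoothed covariance is halved.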

Another method for data smoothing is the Rauch-Tung-Striebel (RTS) smoother. One advantage of RTS is that the estimated states and state covariances for the backward filtering are not computed explicitly, resulting in a computationally more efficient approach to data smoothing [5].

2.5 Data Association

For multiple target tracking there exists a problem of how to pair measurements and targets. This is referred to as the data association problem. There exist several approaches to solving it [15, p. 338-342]; GNN is used in this thesis.

2.5.1 Global Nearest Neighbour

A solution to the data association problem is the Global Nearest Neighbour (GNN) algorithm. This approach minimizes the overall distance between the measurements and the active tracks' states. Here, a brief overview is presented, while further details are found in, for instance, [15, p. 340].

Assume that there exist equally many tracks as measurements. The optimal solution to GNN is the pairing that minimizes the sum of all norms $\sum d_{ij}$, where each track $x_i$ and measurement $y_j$ are only associated once. This optimization problem can be solved with different techniques, including linear programming or defining a score function with steepest descent [15, p. 342-343]. An efficient alternative is the Auction Algorithm [15], which is used in this thesis. Other approaches, such as fuzzy logic, neural networks and genetic algorithms, are discussed in [15, Ch. 7]. The norm between a track $x_i$ and an observation $y_j$ could, for example, be defined as the Euclidean distance $d_{ij} = |x_i - y_j|$. With a large number of either tracks or measurements, the number of possible combinations grows. One way of dealing with this is by defining a threshold called the gating distance, $G_i$, for each track $x_i$, seen in Figure 2.3. Let

$$a_{ij} = G_i - d_{ij} \quad (2.10)$$

be the margin between observation $y_j$ and the edge of the gating region for track $x_i$. If $a_{ij} \geq 0$, it is accepted as a possible association; if $a_{ij} < 0$, the association is invalid and therefore ignored. As an example, two tracks and two measurements are illustrated in Figure 2.3. Here, both of the tracks $x_1$ and $x_2$ can be associated with $y_1$, while $y_2$ only has $x_2$ as a valid assignment. In essence, GNN maximizes the sum of all margins $\sum a_{ij}$, considering all tracks $x_i$ and all measurements $y_j$.

Figure 2.3: Two-dimensional gating areas between two tracks $x_1$, $x_2$ and two observations $y_1$ and $y_2$.
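The gated assignment can be illustrated with a brute-force solver. Note that this exhaustive $O(n!)$ search is only for small examples such as the one in Figure 2.3; the thesis instead uses the Auction Algorithm:

```python
import numpy as np
from itertools import permutations

def gnn_bruteforce(dist, gate):
    """Exhaustive GNN for an n x n distance matrix `dist` and per-track
    gating distances `gate`. Pairs with margin a_ij = G_i - d_ij < 0 are
    invalid and left unassigned; the returned one-to-one pairing
    maximizes the total margin sum(a_ij) from eq. (2.10)."""
    n = len(dist)
    best_margin, best_pairs = -np.inf, []
    for perm in permutations(range(n)):
        # keep only in-gate pairs of this candidate assignment
        pairs = [(i, j) for i, j in enumerate(perm) if gate[i] - dist[i][j] >= 0]
        margin = sum(gate[i] - dist[i][j] for i, j in pairs)
        if margin > best_margin:
            best_margin, best_pairs = margin, pairs
    return best_pairs
```

For the two-track example, a pairing where one distance exceeds its gate simply drops that association rather than forcing a bad match.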

2.5.2 Mahalanobis Distance

The Mahalanobis distance is defined as the distance between two points, normalized with the covariance of the difference between them. The correlation is described by the covariance matrix of the variables. A special case is when the covariance matrix is equal to the identity matrix, in which case the Mahalanobis distance equals the Euclidean distance [18]. The Mahalanobis distance $d_{ij}$ is defined as

$$d^2_{ij} = (x_i - y_j)^T \mathrm{Cov}[x_i - y_j]^{-1} (x_i - y_j) \quad (2.11)$$

where $x_i$ are the track's states and $y_j$ is the observation [19]. An example of how the covariance relates to the association is illustrated in Figure 2.4. In this case the observation has zero uncertainty and no gating is used. In Figure 2.4(a), the Euclidean distance $d_{21}$ between the track $x_2$ and observation $y_1$ is smaller than $d_{11}$. For the Euclidean nearest neighbour algorithm, this would associate the observation with track $x_2$. As seen in the figure, the observation is, however, more likely to belong to $x_1$, since the measurement is within the uncertainty of that track. The Mahalanobis distance, seen in Figure 2.4(b), is normalized with the covariance, which would instead result in $x_1$ being associated with $y_1$.

Figure 2.4: The two tracks $x_1$ and $x_2$ with their state covariances indicated by the dotted ellipsoids, and an observation $y_1$, measured with two different distances: (a) the Euclidean distances $d_{11}$ and $d_{21}$; (b) the Mahalanobis distances $d^m_{11}$ and $d^m_{21}$.
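A minimal sketch of (2.11); the diagonal covariance below is an illustrative placeholder:

```python
import numpy as np

def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance between x and y, eq. (2.11)."""
    d = np.asarray(x) - np.asarray(y)
    return float(d @ np.linalg.inv(cov) @ d)

# A track whose uncertainty is elongated along the first axis: a point
# 2 units away along that axis is only 1 Mahalanobis unit away.
d2 = mahalanobis_sq([2.0, 0.0], [0.0, 0.0], np.diag([4.0, 1.0]))
```

With the identity covariance, the expression reduces to the squared Euclidean distance.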

2.6 Track Management

If a target leaves the area covered by a sensor, there will not be any more detections of it, and the track should be removed. On the other hand, in some frames a track might have no assigned measurement because the system fails to detect the target. This should not cause the tracker to stop tracking the object. The tracker could also receive false measurements, which should be rejected rather than confirmed as tracks. Track management is about handling these situations. To make these decisions, algorithms based on the track histories or on hypothesis tests are used. One algorithm based on track history is M/N-logic, which is used in this thesis. An alternative method is, for example, the Sequential Probability Ratio Test, SPRT, which is based on hypothesis tests [15, Sec. 6.2].

M/N-logic is a method for track management that is based on the track's association history. It is used for both maintenance and initialization of a track. The idea is that the track should be associated with measurements in M of the latest N frames to be confirmed as a track. If it fails, the track is rejected. Figure 2.5 displays an example of M/N-logic used for initialization of a track. Here the tracks are divided into the initiation levels:

• Confirmed track - A track that has passed the M/N-logic.
• Rejected track - A track that has failed the M/N-logic.
• Tentative track - A track that has neither passed nor failed the M/N-logic.

In each time step k, a track can either be associated with a measurement or not, which is represented by the edges of the graph. In this example, the algorithm conditions on two M/N-logic parts, M1/N1 & M2/N2, where M1 = N1 = 2, M2 = 3 and N2 = 4. A confirmed track has been associated with measurements M1 times in the first N1 frames and then M2 times in the next N2 frames. These are the green nodes in the graph. A rejected track is represented by a red node and a tentative track by a yellow node. The use of two M/N-logic parts immediately removes tentative tracks created from false measurements. [20]
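The two-stage confirmation described above can be sketched as follows. The class is a simplified illustration, not the thesis implementation; the stage parameters default to the 2/2 & 3/4 example:

```python
class MNLogic:
    """Two-stage M/N track confirmation. A tentative track must be
    associated in M1 of the first N1 frames, then in M2 of the next N2
    frames, to be confirmed; otherwise it is rejected."""

    def __init__(self, m1=2, n1=2, m2=3, n2=4):
        self.stages = [(m1, n1), (m2, n2)]
        self.stage = 0
        self.hits = 0
        self.frames = 0
        self.status = "tentative"

    def update(self, associated):
        if self.status != "tentative":
            return self.status
        self.frames += 1
        self.hits += bool(associated)
        m, n = self.stages[self.stage]
        if self.hits >= m:                      # stage passed
            self.stage += 1
            self.hits = self.frames = 0
            if self.stage == len(self.stages):
                self.status = "confirmed"
        elif self.frames - self.hits > n - m:   # can no longer reach m of n
            self.status = "rejected"
        return self.status
```

A track created from a single false measurement is rejected on its first unassociated frame, since the first 2/2 stage allows no misses.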

Figure 2.5: Example of M/N-logic used for initialization of a track, showing confirmed, tentative and rejected tracks over time steps k to k+5.

2.7 Sensor Fusion

Sensor fusion algorithms are essentially used to combine information from different sources. The fusion can be done in different ways. The methods discussed in this section are called decentralized and centralized fusion.

The idea of decentralized fusion is to estimate states from the given measurements separately, using a filter for each measurement type [21]. The estimates are then used as measurements in a new filter, where they are fused, see Figure 2.6 [6].

Centralized fusion uses all measurements in the measurement model at the same time. This means that all measurements are processed within a single filter [21], as seen in Figure 2.6.

Figure 2.6: An example of the fusion methods, decentralized and centralized fusion.

2.8 Filter Validation

This section describes how to validate a filter. One way of validating a filter is to compare the estimates to ground truth data. This data can be used as the expected outcome of the filter and hence be directly compared to the estimates. As a measure of performance, the absolute error

$$|\hat{x}_k - x^0_k|, \quad (2.12)$$

between the estimate $\hat{x}_k$ and the true value $x^0_k$, for all time points $k$, can be used. Another measure is the Root Mean Square Error (RMSE),

$$\mathrm{RMSE}(\hat{x}) = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(\hat{x}_k - x^0_k\right)^2} \approx \sqrt{\mathrm{E}\left[(\hat{x} - x^0)^2\right]} = \sqrt{\mathrm{Var}(\hat{x}) + \mathrm{Bias}^2(\hat{x})}, \quad (2.13)$$

where $\hat{x}$ is a stochastic variable and $x^0$ is the ground truth for all time points $k$ [6, p. 473]. This is a measure of how close the estimated states are, on average, to the true values. If $\hat{x}$ is an unbiased estimate of $x^0$, meaning $\mathrm{Bias}(\hat{x}) = 0$, the RMSE is equal to the standard deviation of $\hat{x}$ [22]. As described in Section 2.3.2, the performance of the filter is affected by the tuning. The filter can therefore be tuned until sufficient accuracy is obtained, as indicated by the performance measures. [6, ch. 15]
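A minimal sketch of (2.13):

```python
import numpy as np

def rmse(x_hat, x_true):
    """Root Mean Square Error over N time points, eq. (2.13)."""
    e = np.asarray(x_hat, dtype=float) - np.asarray(x_true, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))
```

For a purely biased estimate (zero error variance) the RMSE equals the bias; for an unbiased estimate it equals the standard deviation of the error, matching the decomposition in (2.13).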

The true states are often unknown outside simulations, meaning they have to be measured or estimated using, for example, external sensors. These sensors are required to be accurate to be useful for filter validation, meaning their measurement noise should be low. Since external sensors usually need to be mounted on the targets, this validation is done in staged scenarios. Other approaches for validation are smoothing instead of filtering, the use of state constraints or artificial measurements, or the use of more sophisticated models. Using extra sensors is often the most beneficial approach when estimating the ground truth. [6, p. 410]

Chapter 3

The Non-Causal Object Tracker

In this chapter, the implemented non-causal object tracker is presented. First, a design overview of the modular tracker is given. Here, the functionalities of each module and the connections between them are described. After this, more detailed descriptions of the tracker are given, beginning with the algorithm to associate measurements with tracks, which is described in combination with the track management. This is followed by a section on the coordinate systems, which covers the tracking and image coordinate systems and the conversion between them. Afterwards, a section on the measurements used by the tracker is presented. Here the types of measurements and how they are included in the model are explained. The implemented non-causal filtering is then explained, followed by the modeled dynamics.

3.1 Design Overview

The tracker is designed with different modules with different responsibilities. Since data is given in an offline environment, whole sequences of recorded data are available in each time step. This is handled by the sequence module, responsible for loading and storing the available data. This data includes all the measurements, inputs and parameters used by the tracker. As illustrated in Figure 3.1, the available data in each time frame are sent from the sequence module to the data association module. Here the measurements are assigned to either confirmed or tentative tracks. In cases of unassigned measurements, the module creates new tentative tracks. The data association also updates each track's status, used for track management. The track management decides whether to confirm or reject the detections as tracks. The tracker module keeps all tracks and iterates a filter for each one of them. Each track is represented by an object, which logs all the assigned measurements and estimates, updates its current state and executes the filter updates. The filter updates use matrices and functions stored in the models module. Here all dynamic models, measurement models and noise matrices are stored. Parameters specified for the sequence, hardware specifications and tuning values are stored in the parameters module. The content of both the models and parameters modules is accessible from all other modules.

Figure 3.1: Overview of the tracker modules (sequence, data association, tracker, models and parameters) and the data flow between them.


The tracker is implemented to use ROI from a monocular camera and LIDAR as measurements. Ego vehicle motion is treated as an input, describing the translation and rotation of the ego vehicle between frames. In sequences where DGPS is available, this data is used for validation. Each track is initialized from an ROI measurement, as a tentative track. The tracker then updates on each assigned measurement type separately. This means that the ROI and LIDAR measurements are used for state correction sequentially, depending on their availability. Each track's logged measurements are used in the backward filtering recursion. The forward and backward estimates are fused by the smoothing technique TFS to achieve the final result.

3.2 Track Association and Management

The tracker uses GNN with the Mahalanobis distance to formulate the data association problem and solves it using the Auction Algorithm. The track management is done with M/N-logic for ROI measurements, see Section 2.6. This particular M/N-logic is a 2/2, 4/8 implementation. The LIDAR measurements are associated separately from the ROI. The Auction Algorithm requires a cost for leaving measurements unassigned. This value should be chosen, based on the measurement certainty, such that the cost of assigning a measurement to a new rather than an existing track is not too large. This means that existing tracks are preferred to be assigned with measurements, rather than creating new tracks for them. If this value is too small, new tracks will be created instead of assigning measurements to existing tracks. Since the association is done between stochastic variables, the Mahalanobis distance is used. The data association and track management are implemented as the following steps:

1. Associate measurements to existing tracks.
2. Manage existing tracks by updating M/N-logic parameters.
3. Create new tracks from unassigned measurements.

Since the association is done in an offline environment and does not depend on the computational capacity, gating is not needed. This means that all measurements and tracks are compared. Also, because whole sequences are available, it is possible to know whether a track will be rejected or confirmed already in the frame in which it is initialized. As a result, once the whole sequence has been tracked, there are no tentative tracks left in any time step. If a track is not assigned a measurement in the current frame, the measurement update is skipped for that track.

3.3 Coordinate Systems

The camera is mounted inside the windshield, close to the rear-view mirror, pointing towards the front of the ego vehicle. All the targets are tracked in an ego vehicle fixed coordinate system with its origin at the camera position. As illustrated in Figure 3.2(a), the longitudinal distance is represented by the X-axis, the lateral distance by the Y-axis and the vertical distance by the Z-axis. Here the longitudinal and lateral axes are parallel with the ego vehicle.

Since the camera is mounted with an angular difference with respect to the ego vehicle, the ROI measurements are given in a camera rotated coordinate system. The origin is shared between both systems. Compared to the vehicle fixed system, the camera rotated coordinate system is tilted around the Y-axis, facing downward from the ego driver's point of view. This makes the $X_R$-axis point towards the ground, as illustrated in Figure 3.2(b).

Figure 3.2: Illustrations of the coordinate systems: (a) tracking coordinate system; (b) camera rotated coordinate system.

The camera projects the world coordinates onto the image sensor, which is modeled as a pinhole camera [23, p. 153]. This model gives a relation between a world position $P_W$ and the measured pixel coordinate $P_I$ in the image plane. The principal axis $X_R$ of the camera rotated coordinate system is defined from the image plane through the camera center, as seen in Figure 3.3(a). The image axes, $U$ for horizontal position and $V$ for vertical position, are defined to compensate for the camera obscura effect. The pinhole camera model is illustrated in Figure 3.3.

Figure 3.3: Overview of the pinhole camera model: (a) pinhole camera model using the rotated camera coordinate system; (b) side view of the pinhole camera model geometry.

From the pinhole camera model, the following relations are obtained

$$P_W \triangleq \begin{pmatrix} p_x & p_y & p_z \end{pmatrix}^T, \quad (3.1a)$$
$$P_I \triangleq \begin{pmatrix} i_u & i_v \end{pmatrix}^T, \quad (3.1b)$$
$$i_u = f\,\frac{p_y}{p_x}, \quad (3.1c)$$
$$i_v = f\,\frac{p_z}{p_x}, \quad (3.1d)$$

where $f$ is the camera focal length, given as a conversion factor between pixels and meters. Here, $i_u$ and $i_v$ are the horizontal and vertical image coordinates.
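The projection (3.1) in code, as a minimal sketch; the focal length $f$ is assumed to already include the meter-to-pixel conversion:

```python
import numpy as np

def project_pinhole(p_world, f):
    """Project a world point P_W = (p_x, p_y, p_z), given in the rotated
    camera frame, to image coordinates P_I = (i_u, i_v), eq. (3.1)."""
    p_x, p_y, p_z = p_world
    return np.array([f * p_y / p_x, f * p_z / p_x])
```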

3.4 Measurements

Multiple sensor measurements are available to the tracker. For targets, measurements are available in the form of ROI and clustered LIDAR data. In staged scenarios, DGPS data are also available to use as ground truth. In this section, the time index is omitted in order to simplify the notation. That is, $y = h(x, u)$ will be used instead of $y_k = h(x_k, u_k)$, where $x$ are the states and $u$ the input to the model.

3.4.1 Region of Interest

From the camera image sensor, the objects are represented as two-dimensional rectangular boxes called Regions of Interest (ROI). These boxes are either computer-classified objects or manually annotated. In online scenarios, only computer markings are available, but in offline scenarios, more accurate markings can be obtained by marking each vehicle by hand in each frame. The hand markings are referred to as perfect ROI, visualized in Figure 3.4.

Figure 3.4: Perfect ROI markings done by hand.

It is important that the markings of perfect ROI are consistent between consecutive frames. Variations would affect the target states, leading to larger uncertainties of both position and velocity of the tracked vehicle. The marking is done with the ROI's edges at the headlights for oncoming vehicles and at the rear lights for preceding vehicles. The ROI measurements are also classified as vehicles, trucks or pedestrians, but in this thesis only vehicles are considered. An ROI measurement consists of

$$y^{ROI} = \begin{pmatrix} i_u & i_v & i_w & a_w \end{pmatrix}^T, \quad (3.2)$$

which is illustrated in Figure 3.5. Here $i_u$ and $i_v$ are the horizontal and vertical coordinates in the image plane, see Figure 3.3(a), while $i_w$ is the measured width of the ROI. The measurement $a_w$ is the assumed width of the target. This assumption is needed for the monocular camera to estimate the longitudinal distance towards the target.

Figure 3.5: ROI measurement of a vehicle.

The measurement model describing the connection between ROI measurements and states is

$$y^{ROI} = \begin{pmatrix} i_u \\ i_v \\ i_w \\ a_w \end{pmatrix} = \begin{pmatrix} -f_u \frac{p_y}{p_x} + c_u \\ -f_v \frac{p_z}{p_x} + c_v \\ f_u \frac{s}{p_x} \\ s \end{pmatrix} + e^{ROI} = h^{ROI}(x, u) + e^{ROI}. \quad (3.3)$$

The camera parameters $f_u$, $f_v$, $c_u$ and $c_v$ describe the horizontal and vertical focal lengths and the horizontal and vertical pixel coordinates of the center pixel, respectively. Here, $a_w$ is the assumed width of targets and $e^{ROI}$ is the measurement noise. The camera parameters are assumed constant during the whole sequence. Since a monocular camera is unable to measure the longitudinal distance of a target, the artificial measurement $a_w$ is used to give the measurement equations (3.3) one unique solution. Without this assumption, a large value of $i_w$ could be explained by either a wide target far away or a small target up close. The assumption introduces uncertainties in the other states. If the assumed width is incorrect, the longitudinal distance $p_x$ will be wrong, affecting the rest of the states, as seen in (3.1).

The covariance of the measurement noise $e^{ROI}$ is defined by

$$R^{ROI} = \begin{pmatrix} i_w^2\sigma_{i_u}^2 & 0 & 0 & 0 \\ 0 & i_w^2\sigma_{i_v}^2 & 0 & 0 \\ 0 & 0 & i_w^2\sigma_{i_w}^2 & 0 \\ 0 & 0 & 0 & i_w^2\sigma_{a_w}^2 \end{pmatrix}. \quad (3.4)$$

Here $\sigma_{i_u}$, $\sigma_{i_v}$, $\sigma_{i_w}$ and $\sigma_{a_w}$ are tuning parameters. The standard deviation of an ROI measurement depends on the pixel width $i_w$ of the target. This is because $i_u$, $i_v$ and $i_w$ vary more for objects that are closer to the camera. The assumed width $a_w$ also becomes more uncertain for closer targets. Since the width of vehicles varies, the tracking performance depends on the difference between the true and the assumed width. The standard deviation of the assumed width is therefore set high, to tolerate a large deviation between the true and the assumed width.
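The ROI measurement model (3.3) as a function; a noise-free sketch, where the camera parameter values used in the test are placeholders:

```python
import numpy as np

def h_roi(x, fu, fv, cu, cv):
    """Noise-free ROI measurement model, eq. (3.3).
    State x = (s, p_x, p_y, p_z, v_x, v_y, v_z)."""
    s, p_x, p_y, p_z = x[0], x[1], x[2], x[3]
    return np.array([
        -fu * p_y / p_x + cu,   # i_u, horizontal image coordinate
        -fv * p_z / p_x + cv,   # i_v, vertical image coordinate
        fu * s / p_x,           # i_w, measured pixel width
        s,                      # a_w, assumed target width
    ])
```

Note how $i_w$ couples the width state $s$ to the longitudinal distance $p_x$: halving $p_x$ doubles the pixel width, which is what lets the assumed width resolve the depth ambiguity.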

3.4.2 LIDAR

LIDAR measurements are collected from a sensor facing the area in front of the ego vehicle. The LIDAR measures the time difference between emitting and receiving a light pulse, which is translated to a measured distance. The LIDAR equipment has large uncertainties for lateral measurements, while the longitudinal distance and velocity are measured accurately. The used equipment can only detect moving vehicles and tends to miss detections of dark-colored vehicles. The measurements given by the LIDAR are

$$y^{LIDAR} = \begin{pmatrix} p_x^{LIDAR} & v_x^{LIDAR} & p_y^{LIDAR} & v_y^{LIDAR} \end{pmatrix}^T, \quad (3.5)$$

where each measurement corresponds to the relative position or velocity of the detected target's closest center point. The LIDAR equipment gathers measurements from point clouds, which are clustered and classified by external software. Furthermore, the LIDAR equipment operates at a sampling rate different from the camera sensor. When LIDAR detections are available to the tracker, the measurement model

$$y^{LIDAR} = \begin{pmatrix} p_x^{LIDAR} \\ v_x^{LIDAR} \\ p_y^{LIDAR} \\ v_y^{LIDAR} \end{pmatrix} = \begin{pmatrix} p_x \\ v_x - \frac{t_x}{T_s} \\ p_y \\ v_y - \frac{t_y}{T_s} \end{pmatrix} + e^{LIDAR} = h^{LIDAR}(x, u) + e^{LIDAR} \quad (3.6)$$

is used. Here, $e^{LIDAR}$ is the measurement noise for LIDAR measurements. Since the velocities $v_x^{LIDAR}$ and $v_y^{LIDAR}$ are relative to the ego vehicle, the absolute velocities $v_x$ and $v_y$ in the state vector need to be compensated with the ego vehicle velocities $\frac{t_x}{T_s}$ and $\frac{t_y}{T_s}$. As in the dynamic model, $t_x$ and $t_y$ are the translations of the ego vehicle along the longitudinal and lateral axes, respectively, between two frames. The measurement noise covariance for $e^{LIDAR}$ is

$$R^{LIDAR} = \begin{pmatrix} \sigma^2_{p_x^{LIDAR}} & 0 & 0 & 0 \\ 0 & \sigma^2_{v_x^{LIDAR}} & 0 & 0 \\ 0 & 0 & \sigma^2_{p_y^{LIDAR}} & 0 \\ 0 & 0 & 0 & \sigma^2_{v_y^{LIDAR}} \end{pmatrix} \quad (3.7)$$

where $\sigma^2_{p_x^{LIDAR}}$, $\sigma^2_{v_x^{LIDAR}}$, $\sigma^2_{p_y^{LIDAR}}$ and $\sigma^2_{v_y^{LIDAR}}$ are tuning parameters.


3.4.3 Fused Measurements

When data from both sensors are available, centralized fusion is used to estimate the states. This results in the measurement model

$$y = \begin{pmatrix} y_k^{ROI} \\ y_k^{LIDAR} \end{pmatrix} = \begin{pmatrix} h^{ROI}(x, u) + e^{ROI} \\ h^{LIDAR}(x, u) + e^{LIDAR} \end{pmatrix}. \quad (3.8)$$

The combination of (3.4) and (3.7) results in the following covariance matrix for uncorrelated measurements [6],

$$R = \begin{pmatrix} R^{ROI} & 0_{4\times4} \\ 0_{4\times4} & R^{LIDAR} \end{pmatrix}. \quad (3.9)$$

Depending on which measurements are available in each frame, the measurement model will vary. The ROI and LIDAR measurements are assumed independent and can therefore be used for correction by the Iterated Kalman Filter, described in Section 2.3.3. In the definition (2.6), $i$ represents the available sensors. In the case where both ROI and LIDAR are available, $i = \{1, 2\}$, where $i = 1$ represents ROI and $i = 2$ represents LIDAR. When only ROI is available, the measurement update is done once with $i = \{1\}$ representing ROI. In the case where only LIDAR is available, $i = \{1\}$ represents LIDAR.

3.4.4 Differential GPS

DGPS provides accurate measurements of absolute positions, but requires that all vehicles are equipped with additional measurement units. Hence, this measurement can only be used in staged scenarios. DGPS measures the absolute position and velocity of a target,

$$y^{DGPS} = \begin{pmatrix} p_x^{DGPS} & v_x^{DGPS} & p_y^{DGPS} & v_y^{DGPS} \end{pmatrix}^T. \quad (3.10)$$

These measurements are good estimates of the true states and can therefore be used for filter validation, as described in Section 2.8.

3.5 Filtering

For multiple target tracking, the filtering problem is a combination of estimating all of the target states and the ego motion, with intermediate correlations. By approximating that there are no correlations between the targets, and that the ego motion does not depend on the target measurements, the filtering problem can be solved using multiple independent filters. Let $x_k^i$ be the state vector (3.16) of track $i \in 1, ..., n$, at time $k \in 1, ..., N$. Define

$$\mathbb{Y}_k = \{y_k^{ROI}, y_k^{LIDAR}\}$$

as all the available measurements and

$$\mathbb{E}_k = \{R_k(\phi, \theta, \psi), t_k\}$$

as the motion of the ego vehicle at time $k$. With the assumption that the motion of the ego vehicle does not depend on the measurements in any frame $k$, the general multi-object tracking problem can be factorized as

$$p(x_k^1, x_k^2, ..., x_k^n, \mathbb{E}_k \mid \mathbb{Y}_k) = p(x_k^1, x_k^2, ..., x_k^n \mid \mathbb{E}_k, \mathbb{Y}_k)\, p(\mathbb{E}_k \mid \mathbb{Y}_k) \approx p(x_k^1, x_k^2, ..., x_k^n \mid \mathbb{E}_k, \mathbb{Y}_k)\, p(\mathbb{E}_k). \quad (3.11)$$

This factorization is analogous to the fastSLAM algorithm, see [6, Sec. 11.3]. This formulation means that all of the track states are estimated using the same filter at time $k$. For this example, it is assumed that all measurements $\mathbb{Y}_k^j$ are associated correctly with each track $x_k^j$. Combining this with the assumption that there are no correlations between the tracks, the factorization

$$p(x_k^1, x_k^2, ..., x_k^n \mid \mathbb{E}_k, \mathbb{Y}_k)\, p(\mathbb{E}_k) = p(\mathbb{E}_k) \prod_{j=1}^{n} p(x_k^j \mid \mathbb{E}_k, \mathbb{Y}_k^j) \quad (3.12)$$

can be performed. This factorization is similar to the one used in the fastSLAM algorithm and in the Marginalized Particle Filter, meaning that each target is tracked separately with a corresponding filter [6]. The association between the measurements of $\mathbb{Y}_k$ and the tracks is managed by the data association, described in Section 3.2.

To estimate the variables of interest, a non-causal smoother has been implemented for each track. The method used for this is the TFS. As described in Section 2.4, the TFS smoothes the states by fusing the estimates from a causal and an anti-causal filter. The filtering for a single track at time $k \in [t_i, t_f]$ is illustrated in Figure 3.6. The track is initialized at time instance $t_i \in [1, N-1]$ and tracked until time step $t_f \in [t_i+1, N]$, which is the first frame in which it is not associated with measurements. The anti-causal filter operates backwards in time using the previously assigned measurements. The results of the forward and backward recursions are then fused using (2.9).

Figure 3.6: A brief overview of the implemented filtering algorithm.

In order to not include the same measurement twice, the corrected states from the forward recursion and the predicted states from the backward recursion are fused. The initialization of both filters is done with no prior information included, meaning $P^F_{t_i|0}$ and $P^B_{t_f|0}$ are set to represent high uncertainties.

For both the causal and the anti-causal filter, the EKF of order one is used, see Algorithm 2.1. The forward filter uses a causal dynamic model and the backward filter uses an anti-causal dynamic model. The implementation is done using the Sensor Fusion and Tracking Toolbox for MATLAB [24].

3.6 Dynamic Modeling

In this section, the modeled dynamics are presented. This includes both the ego vehicle dynamics and the dynamics of the targets used for the first dynamic model evaluated. Thereafter, an alternative model of lower complexity, called the IMU model, is presented.

3.6.1 Ego Vehicle Dynamics

The ego vehicle dynamics are modeled as a translational and rotational transformation between frames. Estimates called visual odometry are extracted from the camera and used as input to the ego vehicle dynamics. The estimates are given between frames and used as the transition to the next or previous frame. Given a target vehicle that is standing still at position $\begin{pmatrix} p_x^k & p_y^k & p_z^k \end{pmatrix}^T$ at frame $k$, the transition to frame $k+1$ due to ego vehicle motion is

$$\begin{pmatrix} p_x^{k+1} \\ p_y^{k+1} \\ p_z^{k+1} \end{pmatrix} = R(\phi^{k+1}, \theta^{k+1}, \psi^{k+1}) \begin{pmatrix} p_x^k \\ p_y^k \\ p_z^k \end{pmatrix} + \begin{pmatrix} t_x^{k+1} \\ t_y^{k+1} \\ t_z^{k+1} \end{pmatrix}. \quad (3.13)$$

Here the ego vehicle translation is represented by the vector $t^{k+1} \triangleq \begin{pmatrix} t_x^{k+1} & t_y^{k+1} & t_z^{k+1} \end{pmatrix}^T$ and the rotation by the matrix $R(\phi^{k+1}, \theta^{k+1}, \psi^{k+1})$. Since the tracking is done in a fixed coordinate system with origin at the ego vehicle, the world is transformed according to the ego vehicle motion. This means that an ego vehicle movement along the positive X-axis results in a negative value of $t_x^{k+1}$ as compensation. The same idea is used for the rotation, where a right turn of the ego vehicle rotates the surrounding objects to the left. The rotation using the Euler angles $\phi^{k+1}$, $\theta^{k+1}$ and $\psi^{k+1}$ defines the rotation matrix (3.14),

where

$$R_x(\phi) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{pmatrix}, \quad (3.15a)$$
$$R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}, \quad (3.15b)$$
$$R_z(\psi) = \begin{pmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}. \quad (3.15c)$$

These rotations are defined with $\phi$ as the roll, $\theta$ as the pitch and $\psi$ as the yaw angle, seen in Figure 3.7.

Figure 3.7: Definition of the Euler angles in the tracking coordinate system.
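The elementary rotations (3.15) and the ego compensation (3.13) can be sketched in code. As in (3.13), $R$ and $t$ describe how the surrounding world moves relative to the ego vehicle:

```python
import numpy as np

def rot_x(phi):     # eq. (3.15a)
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def rot_y(theta):   # eq. (3.15b)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def rot_z(psi):     # eq. (3.15c)
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

def ego_transform(p, R, t):
    """Transition of a stationary world point due to ego motion, eq. (3.13)."""
    return R @ p + t
```

Each factor is orthonormal, so the inverse needed by the backward dynamic model is simply the transpose.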

The total transformation between two consecutive frames is illustrated in Figure 3.8.

Figure 3.8: The rotation R and translation t between two consecutive frames.

3.6.2 Target Dynamics

A Constant Velocity (CV) model is used to describe the target dynamics for causal filtering. The states of the model are

$$x_k = \begin{pmatrix} s^k & p_x^k & p_y^k & p_z^k & v_x^k & v_y^k & v_z^k \end{pmatrix}^T, \quad (3.16)$$

where $s$ is the target vehicle width, $p_x$, $p_y$, $p_z$ are the relative positions and $v_x$, $v_y$, $v_z$ are the absolute velocities of a target. The states are updated according to

$$s^{k+1} = s^k + T_s w_{\dot{s}}, \quad (3.17a)$$
$$\begin{pmatrix} p_x^{k+1} \\ p_y^{k+1} \\ p_z^{k+1} \end{pmatrix} = R^{k+1}\left(\begin{pmatrix} p_x^k \\ p_y^k \\ p_z^k \end{pmatrix} + T_s \begin{pmatrix} v_x^k \\ v_y^k \\ v_z^k \end{pmatrix} + \frac{T_s^2}{2}\begin{pmatrix} w_{\ddot{x}} \\ w_{\ddot{y}} \\ w_{\ddot{z}} \end{pmatrix}\right) + \begin{pmatrix} t_x^{k+1} \\ t_y^{k+1} \\ t_z^{k+1} \end{pmatrix} + T_s\begin{pmatrix} w_{\dot{x}_{ego}} \\ w_{\dot{y}_{ego}} \\ w_{\dot{z}_{ego}} \end{pmatrix}, \quad (3.17b)$$
$$\begin{pmatrix} v_x^{k+1} \\ v_y^{k+1} \\ v_z^{k+1} \end{pmatrix} = R^{k+1}\left(\begin{pmatrix} v_x^k \\ v_y^k \\ v_z^k \end{pmatrix} + T_s\begin{pmatrix} w_{\ddot{x}} \\ w_{\ddot{y}} \\ w_{\ddot{z}} \end{pmatrix}\right), \quad (3.17c)$$

compactly written as (2.3). Here $w_{\ddot{x}}$, $w_{\ddot{y}}$ and $w_{\ddot{z}}$ represent the acceleration noise for the target, $w_{\dot{x}_{ego}}$, $w_{\dot{y}_{ego}}$ and $w_{\dot{z}_{ego}}$ the noise for the ego translation, and $T_s$ is the sampling time. As shorthand notation for (3.15),

$$R^{k+1} \triangleq R(\phi^{k+1} + T_s w_{\dot{\phi}},\; \theta^{k+1} + T_s w_{\dot{\theta}},\; \psi^{k+1} + T_s w_{\dot{\psi}})$$

is used. Here, $w_{\dot{\phi}}$, $w_{\dot{\theta}}$ and $w_{\dot{\psi}}$ represent the noise for the ego rotation. The variance of $w_{\dot{s}}$ is modeled to give the vehicle width a slow update.

The motion of the ego car is modeled with six degrees of freedom, found in Section 3.6.1. The compensation of the rotation Rk+1 and the translation tk+1 of the ego vehicle is done to move each target relatively to the ego vehicle. Since the ego angles are uncertain, the dynamic model has a multiplicative noise. The multiplicative noise in the model is approximated as

xk+1= f (xk, uk, wk) ≈ f (xk, uk, 0) +

∂f (xk, uk, wk)

∂w wk, (3.18)

to get an additive noise. Here, the non-linear dynamic model f (xk, uk, wk) is differenti-ated with respect to the noise vector

w =ws˙ w¨x wy¨ w¨z wφ˙ wθ˙ wψ˙ wx˙ego wy˙ego wz˙ego

T . The process noise covariance matrix used for the filtering is

(47)

3.6 Dynamic modeling 33 Cov ∂f (xk, uk, wk) ∂w wk  = ∂f (xk, uk, wk) ∂w Qk  ∂f (xk, uk, wk) ∂w T (3.19) where Qk= Cov(wk) = diag(σs2˙, σ 2 ¨ x, σ 2 ¨ y, σ 2 ¨ z, σ 2 ˙ φ, σ 2 ˙ θ, σ 2 ˙ ψ, σ 2 ˙ xego, σ 2 ˙ yego, σ 2 ˙ zego), (3.20)

with the variance for each of the noise parameters on the diagonal. The noises are assumed to be uncorrelated and their standard deviations are used as tuning parameters.
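The linearization in (3.18)–(3.19) can be checked numerically. The sketch below uses a finite-difference Jacobian with respect to the noise; the function names are hypothetical, and any sufficiently smooth dynamic model `f(x, u, w)` can be plugged in. For a model that is linear in $w$, the finite-difference Jacobian recovers the noise gain exactly.

```python
import numpy as np

def jacobian_w(f, x, u, nw, eps=1e-6):
    """Finite-difference Jacobian of f with respect to the noise w, at w = 0."""
    f0 = f(x, u, np.zeros(nw))
    J = np.zeros((f0.size, nw))
    for i in range(nw):
        w = np.zeros(nw)
        w[i] = eps
        J[:, i] = (f(x, u, w) - f0) / eps  # one-sided difference per noise entry
    return J

def process_noise_cov(J, sigmas):
    """Linearized process noise covariance (3.19), with Q = diag(sigma^2) as in (3.20)."""
    Q = np.diag(np.asarray(sigmas) ** 2)
    return J @ Q @ J.T
```

For the CV model above, `nw = 10` and `sigmas` would hold the ten tuned standard deviations from (3.20).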

When estimating anti-causally, meaning only future data is used in the filtering, a model describing the dynamics backwards in time is implemented. This dynamic model is stated compactly in (2.8), where the states (3.16) are used. The equations describing the anti-causal state dynamics,

$$
s^k = s^{k+1} - T_s w_{\dot{s}}, \tag{3.21a}
$$
$$
\begin{bmatrix} p_x^k \\ p_y^k \\ p_z^k \end{bmatrix}
= (R^{k+1})^{-1} \begin{bmatrix} p_x^{k+1} - t_x^{k+1} - T_s w_{\dot{x}_{\mathrm{ego}}} \\ p_y^{k+1} - t_y^{k+1} - T_s w_{\dot{y}_{\mathrm{ego}}} \\ p_z^{k+1} - t_z^{k+1} - T_s w_{\dot{z}_{\mathrm{ego}}} \end{bmatrix}
- T_s \begin{bmatrix} v_x^k \\ v_y^k \\ v_z^k \end{bmatrix}
- \frac{T_s^2}{2} \begin{bmatrix} w_{\ddot{x}} \\ w_{\ddot{y}} \\ w_{\ddot{z}} \end{bmatrix}, \tag{3.21b}
$$
$$
\begin{bmatrix} v_x^k \\ v_y^k \\ v_z^k \end{bmatrix}
= (R^{k+1})^{-1} \begin{bmatrix} v_x^{k+1} \\ v_y^{k+1} \\ v_z^{k+1} \end{bmatrix}
- T_s \begin{bmatrix} w_{\ddot{x}} \\ w_{\ddot{y}} \\ w_{\ddot{z}} \end{bmatrix}, \tag{3.21c}
$$

are constructed from reversing the causal model (3.17), expressing xk as a function of xk+1, uk and wk. Similar to (3.17), the backward model also has a multiplicative noise. This is dealt with in the same way as for the forward model, namely to approximate the noise as additive using the Jacobian,

$$
x^k = g(x^{k+1}, u^k, w^k) \approx g(x^{k+1}, u^k, 0) + \frac{\partial g(x^{k+1}, u^k, w^k)}{\partial w}\, w^k. \tag{3.22}
$$

This results in the process noise covariance of (3.19), with $f(x^k, u^k, w^k)$ replaced by $g(x^{k+1}, u^k, w^k)$. The covariance $Q^k$ in (3.20) is the same for both dynamic models.
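A quick sanity check of the reversed model: with the noise set to zero, (3.21) should map the output of (3.17) back to the original state. The sketch below verifies this round trip; the function names and numeric values are illustrative only.

```python
import numpy as np

Ts = 0.05  # sampling time, illustrative value

def forward(x, R, t):
    """Noise-free causal model (3.17); x = [s, px, py, pz, vx, vy, vz]."""
    s, p, v = x[0], x[1:4], x[4:7]
    return np.concatenate(([s], R @ (p + Ts * v) + t, R @ v))

def backward(x1, R, t):
    """Noise-free anti-causal model (3.21): (3.17) solved for x^k."""
    s, p1, v1 = x1[0], x1[1:4], x1[4:7]
    Rinv = R.T                      # inverse of a rotation matrix is its transpose
    v = Rinv @ v1                   # (3.21c) with zero noise
    p = Rinv @ (p1 - t) - Ts * v    # (3.21b) with zero noise
    return np.concatenate(([s], p, v))
```

Substituting (3.17) into (3.21) term by term shows why the round trip holds: the velocity rotation cancels first, after which the position terms cancel as well.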

3.6.3 The IMU Model

An alternative model has been implemented to evaluate the tracking performance using different model complexities. This model is referred to as the IMU model. This section describes both the ego dynamics and the target dynamics used for this model. Here, the dynamics along the longitudinal and lateral axes are modeled as constant velocity, while the vertical position is assumed static. The motion of the ego car is modeled in three degrees of freedom, where $e_{\dot{x}}$ is the longitudinal velocity, $e_{\dot{\psi}}$ is the yaw rate and $e_\theta$ is the difference in the pitch angle between two consecutive frames. The parameters $e_{\dot{x}}$ and $e_{\dot{\psi}}$ are processed measurements from an IMU, while $e_\theta$ is calculated from the images. The ego motion parameters are handled as inputs to the filter and assumed to be without uncertainties. The rotation matrix,

$$
R(e_{\dot{\psi}}, T_s) =
\begin{bmatrix} \cos(e_{\dot{\psi}} T_s) & -\sin(e_{\dot{\psi}} T_s) \\ \sin(e_{\dot{\psi}} T_s) & \cos(e_{\dot{\psi}} T_s) \end{bmatrix}, \tag{3.23}
$$

describes the yaw rotation of the ego vehicle between two consecutive frames. The causal dynamics of the states $x^k = \begin{bmatrix} s^k & p_x^k & v_x^k & p_y^k & v_y^k & p_z^k \end{bmatrix}^T$ are

$$
s^{k+1} = s^k, \tag{3.24a}
$$
$$
\begin{bmatrix} p_x^{k+1} \\ p_y^{k+1} \end{bmatrix}
= R(-e_{\dot{\psi}}^{k+1}, T_s)
\begin{bmatrix} p_x^k + T_s (v_x^k - e_{\dot{x}}^{k+1}) \\ p_y^k + T_s v_y^k \end{bmatrix}, \tag{3.24b}
$$
$$
\begin{bmatrix} v_x^{k+1} \\ v_y^{k+1} \end{bmatrix}
= R(-e_{\dot{\psi}}^{k+1}, T_s)
\begin{bmatrix} v_x^k \\ v_y^k \end{bmatrix}, \tag{3.24c}
$$
$$
p_z^{k+1} = p_z^k - p_x^{k+1} \tan(e_\theta^{k+1}), \tag{3.24d}
$$

modeled with an additive noise, similar to the CV model seen in (2.1). The anti-causal dynamic model is constructed from (3.24), expressing $x^k$ as a function of $x^{k+1}$, $e_{\dot{x}}$, $e_{\dot{\psi}}$ and $e_\theta$. The anti-causal model,

$$
s^k = s^{k+1}, \tag{3.25a}
$$
$$
\begin{bmatrix} p_x^k \\ p_y^k \end{bmatrix}
= R^{-1}(-e_{\dot{\psi}}^{k+1}, T_s)
\begin{bmatrix} p_x^{k+1} \\ p_y^{k+1} \end{bmatrix}
- T_s \begin{bmatrix} v_x^k - e_{\dot{x}}^{k+1} \\ v_y^k \end{bmatrix}, \tag{3.25b}
$$
$$
\begin{bmatrix} v_x^k \\ v_y^k \end{bmatrix}
= R^{-1}(-e_{\dot{\psi}}^{k+1}, T_s)
\begin{bmatrix} v_x^{k+1} \\ v_y^{k+1} \end{bmatrix}, \tag{3.25c}
$$
$$
p_z^k = p_z^{k+1} + p_x^{k+1} \tan(e_\theta^{k+1}), \tag{3.25d}
$$

is modeled with additive noise, in the same way as the causal dynamics.
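The same round-trip property holds for the IMU model: with zero noise, (3.25) inverts (3.24). A sketch with hypothetical names and illustrative ego parameter values:

```python
import numpy as np

def rot2(epsi_dot, Ts):
    """Planar yaw rotation (3.23) over one sample interval."""
    a = epsi_dot * Ts
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

def imu_forward(x, ex_dot, epsi_dot, etheta, Ts):
    """Noise-free causal IMU model (3.24); x = [s, px, vx, py, vy, pz]."""
    s, px, vx, py, vy, pz = x
    R = rot2(-epsi_dot, Ts)
    p = R @ np.array([px + Ts * (vx - ex_dot), py + Ts * vy])  # (3.24b)
    v = R @ np.array([vx, vy])                                 # (3.24c)
    pz1 = pz - p[0] * np.tan(etheta)                           # (3.24d)
    return np.array([s, p[0], v[0], p[1], v[1], pz1])

def imu_backward(x1, ex_dot, epsi_dot, etheta, Ts):
    """Noise-free anti-causal IMU model (3.25)."""
    s, px1, vx1, py1, vy1, pz1 = x1
    Rinv = rot2(-epsi_dot, Ts).T   # rotation inverse is the transpose
    v = Rinv @ np.array([vx1, vy1])                            # (3.25c)
    p = Rinv @ np.array([px1, py1]) \
        - Ts * np.array([v[0] - ex_dot, v[1]])                 # (3.25b)
    pz = pz1 + px1 * np.tan(etheta)                            # (3.25d)
    return np.array([s, p[0], v[0], p[1], v[1], pz])
```

Note that (3.25b) and (3.25d) use the recovered $v^k$ and the *future* $p_x^{k+1}$ respectively, which is why the inversion works out exactly in the noise-free case.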

The IMU model is considered to be of lower complexity than the model presented in Section 3.6.1 and Section 3.6.2, since the motion of the ego vehicle is modeled with three degrees of freedom instead of six. The ego parameters are also handled without uncertainties, resulting in a less complex noise model.
