
Institutionen för systemteknik

Department of Electrical Engineering

Master thesis (Examensarbete)

Tracking Groups of People in Video Surveillance

Master thesis carried out in Automatic Control at the Institute of Technology, Linköping University

by Viktor Edman, LiTH-ISY-EX--13/4693--SE

Linköping 2013

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet


Tracking Groups of People in Video Surveillance

Master thesis carried out in Automatic Control

at the Institute of Technology, Linköping University

by

Viktor Edman, LiTH-ISY-EX--13/4693--SE

Supervisors: Maria Andersson, foi

Karl Granström, isy, Linköpings universitet

Examiner: Fredrik Gustafsson, isy, Linköpings universitet


Division, Department: Division of Automatic Control, Department of Electrical Engineering, SE-581 83 Linköping

Date: 2013-06-13

Language: English

Report category: Examensarbete (Master thesis)

URL for electronic version: http://www.ep.liu.se

ISRN: LiTH-ISY-EX--13/4693--SE

Title: Gruppmålföljning av Människor i Videoövervakning (Tracking Groups of People in Video Surveillance)

Author: Viktor Edman



Abstract

In this master thesis, the problem of tracking groups using an image sequence dataset is examined. Target tracking can be defined as the problem of estimating a target's state given prior knowledge about its motion and some sensor measurements related to the target's state. A popular method for target tracking is the Kalman filter. However, the Kalman filter is insufficient when there are multiple targets in the scene. Consequently, alternative multitarget tracking methods must be applied along with methods for estimating the number of targets in the scene. Multitarget tracking can however be difficult when there are many unresolved targets, e.g. when associating observations with targets in dense crowds. A viable simplification is group target tracking, keeping track of groups rather than individual targets. Furthermore, group target tracking is preferred when the user wants to know the motion and extension of a group, e.g. in evacuation scenarios.

To solve the problem of group target tracking in video surveillance, a combination of gm-phd filtering and mean shift clustering is proposed. The gm-phd filter is an approximation of the Bayes multitarget filter. Pedestrian detections from the image dataset, converted into flat world coordinates, are used as input to the filter. The output of the gm-phd filter consists of Gaussian mixture components with corresponding mean state vectors.

The components are divided into groups by using mean shift clustering. An estimate of the number of members and the group shape is presented for each group. The method is evaluated using both single camera measurements and two cameras partly surveilling the same area.

The results are promising and present a clear visual representation of the groups' characteristics. However, using two cameras gives no improvement in performance, probably due to differences in detections between the two cameras, e.g. a single pedestrian can be observed at two positions several meters apart, making it difficult to determine if it is a single pedestrian or multiple pedestrians.


Sammanfattning

In this master thesis, the problem of tracking groups using image sequences of a surveillance area is examined. Target tracking can be defined as the problem of estimating a target's state given some knowledge about the target's motion as well as measurements related to the target's state. Group target tracking is a good alternative when targets are too close to each other, which leads to a low probability of detection, for example in crowds.

To solve the problem, a combination of a gm-phd filter and mean shift clustering is proposed. The gm-phd filter is an approximation of the Bayes filter for multiple targets. Detections of people in the images, converted to coordinates in a flat world, are used as input to the filter. The output of the filter consists of components of a Gaussian mixture with corresponding state vectors.

The components are divided into groups with mean shift clustering. For each group, the estimated number of members and the group's shape are presented. The proposed method is evaluated using both a single camera and two cameras that partly surveil the same area.

The results are promising and clearly present a group's characteristics. Using two cameras gave no improvement of the results, probably due to differences in detections between the two cameras.


Acknowledgments

As this master thesis concludes my five years spent at Linköping University shooting for a Master of Science degree in Applied Physics and Electrical Engineering, some acknowledgements are in order.

First, I would like to thank my examiner Fredrik Gustafsson along with my supervisors Karl Granström and Maria Andersson for their guidance, support and encouragement during this master thesis. I truly appreciate you giving me the opportunity of writing a conference paper and consequently giving me an insight into the world of research.

I also want to thank my fellow master thesis companions Johan, Joel and Tomas for their company at FOI during this past semester. It has been good to know that I was not the only one having difficulties from time to time. The same goes for the people present at the weekly Wednesday lunches.

Lastly, I would like to thank my family. My father, mother and younger brother have given me more love and support than I ever could have wished for. I would not have been here without you. Furthermore, a heartfelt thanks to my girlfriend Malin for her love, support, inspiration, joyfulness etc. these past years and the years to come. You all have my deepest gratitude!

Linköping, June 2013 Viktor Edman


Contents

Notation

1 Introduction
1.1 Problem Formulation
1.2 Background
1.3 Method and approach
1.4 Layout of the master thesis

2 Data and preprocessing
2.1 Artificial data
2.2 Real data
2.3 Detection of pedestrians
2.4 Conversion between image and world coordinates

3 Target tracking and filtering
3.1 Single target tracking
3.2 Random Finite Sets
3.3 Bayes filter for multiple target tracking
3.4 The Probability Hypothesis Density Filter
3.4.1 Initialization
3.4.2 Time update
3.4.3 Measurement update
3.4.4 Advantages
3.4.5 Disadvantages
3.5 Gaussian mixtures
3.6 Gaussian mixture phd filter
3.6.1 Initialization
3.6.2 Prediction
3.6.3 Measurement update
3.6.4 Merging and Pruning

4 Extraction and presentation of groups
4.1 Gaussian mean shift clustering
4.2 Determining group belonging
4.3 Estimating position and velocity for a group
4.4 Shape estimation of groups

5 Setup and results
5.1 Models and parameters
5.1.1 Data and preprocessing parameters
5.1.2 gm-phd parameters
5.1.3 Parameters for extraction and presentation of groups
5.2 Results
5.2.1 Artificial data
5.2.2 Detections
5.2.3 Single camera
5.2.4 Multiple cameras

6 Discussion and conclusions
6.1 Overall discussion
6.2 Conference paper


Notation

Abbreviations

Abbreviation Description

phd Probability Hypothesis Density

gm-phd Gaussian Mixture Probability Hypothesis Density

rfs Random Finite Set

gtt Group Target Tracking

Coordinates

Notation Description

x x-coordinate in a 3D world

y y-coordinate in a 3D world

z z-coordinate in a 3D world

xc x-coordinate in a 3D camera coordinate system

yc y-coordinate in a 3D camera coordinate system

zc z-coordinate in a 3D camera coordinate system

xu x-coordinate in an undistorted image plane

yu y-coordinate in an undistorted image plane

xd x-coordinate in a distorted image plane

yd y-coordinate in a distorted image plane

xf horizontal image coordinate

yf vertical image coordinate


Camera calibration

Notation Description

Rrot Rotation matrix

T Translation vector

φ Euler angle of the rotation around the x-axis

θ Euler angle of the rotation around the y-axis

ψ Euler angle of the rotation around the z-axis

f Effective focal length

κ Radial lens distortion coefficient

Cx x-coordinate of centre of radial lens distortion

Cy y-coordinate of centre of radial lens distortion

dx Centre to centre distance between adjacent sensor elements in x-direction

dy Centre to centre distance between adjacent sensor elements in y-direction

sx Uncertainty image scale factor

Probability theory

Notation Description

p(•) Probability density function

p (x|z) Probability density over x given z

N(x; µ, Σ) Gaussian probability density function defined over the random vector x with mean vector µ and covariance matrix Σ

Target tracking

Notation Description

xk State vector at time k

Pk State covariance at time k

px,k, py,k Target position at time k

vx,k, vy,k Target velocity at time k

zk Measurement vector at time k

Fk Linear motion model

Hk Linear measurement model

Q Process noise covariance matrix

R Measurement noise covariance matrix

X State space

Z Measurement space

Ts Sample time

Nk Number of targets at time k


Sets

Notation Description

R Real numbers

Ξ Random finite set variable

Xk Random finite set of state vectors in X at time k

Zk Random finite set of measurement vectors in Z at time k

Mk Random finite set of Gaussian mean vectors m at time k

Probability hypothesis density filtering

Notation Description

Dk Probability hypothesis density at time k

wk Weight of Gaussian component at time k

mk Mean vector of a component in a Gaussian mixture at time k

Jk Number of Gaussian components at time k

pS Probability of survival

pD Probability of detection

c(z) Distribution of false alarms

λ Clutter per time step

T Pruning threshold

U Merging threshold

D(g)k Probability hypothesis density for a group g at time k

Mean shift clustering

Notation Description

xk Data point at time k

K(•) Kernel function

Σ Covariance matrix

1 Introduction

The purpose of this master thesis is to examine and propose a method to solve the problem of group target tracking (gtt) in crowded scenes with pedestrians, using cameras as sensors. Tracking groups of people can be used when surveilling crowds of people at e.g. concerts, evacuations and political demonstrations. This master thesis has been done as a final examination for achieving the degree Master of Science in Applied Physics and Electrical Engineering at Linköping University. The work has been performed at the division of Sensor Informatics, Swedish Defence Research Agency (foi).

This chapter elaborates on the problem at hand with some background information. The chapter ends with a summary of the method proposed, along with the layout of the master thesis.

1.1 Problem Formulation

This section presents the problem at hand, and lists the limitations of the master thesis.

The problem at hand is to examine and evaluate group target tracking using images from a camera feed. More precisely, the purpose of the master thesis is to present a possible method for solving this problem using a probability hypothesis density (phd) filter.

The problem of group target tracking can be divided into two sub problems:

1. Tracking a group, i.e. determining the position and the velocity of a group using measurements;

2. Representing the group with a spatial extension and number of members.

The first part of the problem is tracking the group, that is, defining models and examining different methods for determining the position of the group and its velocity. One problem here is, for example, the uncertainty of which observations belong to which group. Additionally, what happens when individuals come very close to or even obscure each other? In this case it is to be expected that these individuals will not be properly detected. How can possible failures in detections be modelled?

The second part of the problem is describing the group, e.g. what conditions have to be met for targets to be in the same group. This gives rise to a series of problems: How close do the persons have to be to each other? What happens when a person leaves or joins the group? When should a group be split into two, and when should two groups be merged? Furthermore, it is desirable to describe the group in terms of approximate extension, shape and number of members.

To simplify the problem, some limitations have been made. The most significant limitation is that the theory and method behind pedestrian detection are not essential to this master thesis. Instead, a suitable tool will be used without guarantees on its performance or stability.

Furthermore, the methods do not have to handle multiple groups merging and splitting efficiently. It is not essential either that the method presented in this master thesis handles the tracking of individual pedestrian targets.

Additionally, proposing a real time solution is not necessary, mostly because the problem will be examined using Matlab [Matlab, 2011].

1.2 Background

The Kalman filter and the particle filter are two popular methods for tracking single targets. These methods are however insufficient when there are multiple targets in the scene. In this case, the problem instead turns into a multiple target tracking problem, which presents several difficulties. Especially associating measurements with the correct targets has proved challenging and the result is often uncertain. Examples of this are when targets are overlapping or when the sensor returns false detections and clutter. In the video tracking problem, overlapping targets present a major difficulty for pedestrian detection algorithms, resulting in unobserved targets. Typical techniques for data association for multiple targets include the Joint Probabilistic Data Association Filter and Multi Hypothesis Tracking, see e.g. [Blackman and Popoli, 1999].

The paragraph above deals with point target tracking, i.e. targets generating no more than one measurement per time step. However, if the point targets are close to each other, making it difficult to separate them, or if the observation of targets is uncertain, a simplification is tracking groups of point targets instead, i.e. group target tracking. In crowded groups, it is easier to associate measurements to the group rather than to individuals of the group. Intuitively, it is easier keeping track of a few groups rather than many individual point targets. Another reason for group target tracking is when the user is not interested in what the individuals do but instead in the behaviour of the group, e.g. in evacuation scenarios. Group tracking has been investigated in several studies and for several applications, see e.g. [Konle, 2011, Baum et al., 2010, Zhan et al., 2008, Clark and Godsill, 2007, Rosswog and Ghose, 2012, McKenna et al., 2000]. Group target tracking using the phd filter has been proposed and somewhat examined in [Mahler, 2002, Swain and Clark, 2011, Mahler and Zajic, 2002, Clark and Godsill, 2007, Mahler, 2003a].

Approaches for solving group tracking can roughly be divided into the following [Blackman and Popoli, 1999]:

1. Group tracking without individual tracks;

2. Group tracking with simplified individual tracks;

3. Individual target tracking which is supplemented by group tracking.

The most suitable approach largely depends on the application. In crowded scenes, with many potentially false detections and clutter, 1. or 2. would probably be the most practical approaches since tracks of all individuals within the group will be difficult to initiate and maintain in dense crowds.

Group tracking uses the same processes as conventional tracking methods, i.e. detection, measurement update and prediction. An additional step required for group tracking is the representation of the group, in the form of shape and size, which is not necessary in point target tracking. The shape and size of the group can also be used to estimate the behaviour of the group. This is done in for example [Andersson et al., 2013], using clustering techniques, and in [Carmi et al., 2012], using the phd filter. The behaviour of the group is in these studies represented by group activity (e.g. fights), merging and splitting.

Using the phd filter combined with cameras as sensors for solving the group tracking problem has been done before [Wang et al., 2006] but it remains largely an unexplored application of the phd filter.

1.3 Method and approach

To solve the problem formulated before, the work of the master thesis has been divided into different stages:

1. Literature survey examining methods and results of what has previously been done;

2. Choosing suitable methods and implementing them in Matlab;

3. Making adjustments and supplements to the methods chosen;

4. Evaluating the results.


This work has resulted in the method summarized below:

1. For each image frame, pedestrians are detected by a method and code provided by [Dollár et al., 2009, 2010]. The output from this method is rectangles, or bounding boxes, corresponding to each detected pedestrian. The mid point of the lower side of each rectangle is assumed to be an estimate of the corresponding pedestrian's footprint. The output of the algorithm is given as a point pI = (xf, yf) in image coordinates.

2. The second step is to transform these points in the image plane to flat world coordinates pW = (x, y)T. This assumes that the video camera is placed on an elevated position and camera calibration parameters are available for the conversion between image and world coordinates.

3. The flat world coordinates are filtered using a gm-phd filter, representing the probability hypothesis density with a Gaussian mixture model. It is important to note that the phd is not a probability density, e.g. different components in the Gaussian mixture do not correspond to different individuals, unlike in classical target tracking methods [Bar-Shalom et al., 2011]. Instead, a component can be interpreted as a hypothesis for the likelihood of a point target, where the weight of the component gives an indication of the hypothesis' validity. A pleasant feature of the gm-phd filter is that the sum of the weights in the Gaussian mixture model corresponds to an estimate of the expected number of pedestrians within the surveillance area.

4. The final step is to apply a clustering algorithm to the gm-phd filter output. The algorithm used in this master thesis to group the Gaussian components is a Gaussian mean shift clustering algorithm. For each cluster, the position and velocity are estimated and the shape of the group is approximated using a level curve of the cluster's gm-phd surface. By summing the weights of the Gaussian components in the cluster, the number of individuals in the respective group is estimated.

The method is summarized in Figure 1.1, Figure 1.2 and Figure 1.3.

1.4 Layout of the master thesis

This section describes the layout of the rest of the master thesis. The next chapter, Chapter 2, describes the datasets that are used in this master thesis and how they can be used as measurements, which later will be input to the gm-phd filter. Chapter 3 presents the filtering applied to the data from Chapter 2. The chapter starts with some background theory for tracking targets, which leads to an introduction of the phd filter and how it is used in this master thesis.

The final chapter presenting the theory and method behind the solution is Chapter 4, which describes the formation of groups along with an estimated shape, position and velocity. The master thesis is concluded with Chapter 5 and Chapter 6, which present the results and the conclusions with possible improvements, respectively.

Figure 1.1: Flow chart of the generation of measurements.

Figure 1.2: Flow chart of the gm-phd filter with measurements as input, generating Gaussian components for the group estimation algorithm.

Figure 1.3: Flow chart of the group estimation: mean shift clustering of the gm components, assignment of group ID:s, estimation of group states and shapes, and visualization of the groups.

2 Data and preprocessing

This chapter presents the datasets used in this master thesis and how they are preprocessed prior to the filtering. The chapter begins with a description of artificially generated measurements in Section 2.1 and continues with Section 2.2 presenting a dataset of image sequences containing pedestrians moving in groups. The last two sections deal with the preprocessing of the real data: Section 2.3 describes the method behind the pedestrian detection and finally Section 2.4 describes how the detections are transformed into a flat world coordinate system to simplify the target tracking.

2.1 Artificial data

The artificial data is used as a comparison when evaluating the real dataset. The idea is that the artificial data is almost ideal, i.e. eliminating uncertainties from a pedestrian detection algorithm.

The data artificially generated represents measurements in a flat 2D world, i.e. no detection algorithm is needed. The data is generated by evaluating two functions, one for the x-coordinates and one for the y-coordinates, giving different values and uncertainties in both directions. Both functions map the sample number to the targets' positions, making the targets move relative to the last time step. When creating the artificial data, it is assumed that all targets are detected, i.e. pD = 1, that there is no clutter and that the probability of a target's survival is pS = 1. The measurement for each target is the true target position with some added Gaussian noise with mean µ and variance σ². The sampling time is assumed to be Ts = 1 s.

See Figure 2.1 for an example of artificially generated groups.
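To make the setup concrete, the following is a minimal Python/NumPy sketch of how such artificial measurements could be generated (the thesis work itself was carried out in Matlab). The trajectories, speeds and formation offsets are illustrative assumptions; only the group sizes, the noise level and the number of frames follow the description above and in Section 5.1.1.

```python
import numpy as np

def generate_artificial_groups(n_frames=20, sigma=0.05, Ts=1.0, seed=0):
    """Generate noisy measurements of two coordinated groups in a flat 2D world.

    One group of eight moves along a vertical trajectory, the other along a
    horizontal one. Every target is detected (pD = 1) and no clutter is added.
    """
    rng = np.random.default_rng(seed)
    # Fixed formation offsets of the eight members around each group centre.
    offsets = np.array([[dx, dy] for dx in (0.0, 0.4) for dy in (0.0, 0.4, 0.8, 1.2)])
    measurements = []  # one (16, 2) array of detections per frame
    for k in range(n_frames):
        centre_v = np.array([2.0, 1.0 + 0.4 * Ts * k])   # group moving in y
        centre_h = np.array([1.0 + 0.4 * Ts * k, 6.0])   # group moving in x
        truth = np.vstack([centre_v + offsets, centre_h + offsets])
        measurements.append(truth + rng.normal(0.0, sigma, size=truth.shape))
    return measurements

frames = generate_artificial_groups()
print(frames[0].shape)  # (16, 2): sixteen detections in the first frame
```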

Figure 2.1: Example of artificially generated groups with σ² = 0.05² for each measurement. Each cross represents a detection, i.e. a measurement.

2.2 Real data

For evaluating the proposed method for solving the group target tracking problem using real data, the PETS 2012 dataset [University of Reading, 2012] is utilized. This dataset consists of image sequences from up to four different camera views, see Figure 2.2 for the camera locations. As can be seen from the figure, the cameras surveil the same 3-way junction.

The cameras are not perfectly synchronised and especially camera number four is out of sync with the others. The cameras take approximately seven pictures per second, which means that the sample time is

$$T_s \approx \frac{1}{7}\ \text{s}. \qquad (2.1)$$

Furthermore, the scenario in the image sequence is that several groups of people move along a road from one edge of the image to the other. Figure 2.3 shows a snapshot from the dataset called Flow Analysis and Event Recognition. As can be seen from the figure, the groups are rather dense with several pedestrians walking close to each other. Therefore, some persons are obscured by other persons, which can cause problems for the detection algorithm, and consequently also for the tracking and estimation of groups.

Figure 2.2: The camera positions in the PETS dataset [University of Reading, 2012]. The image originates from Google Maps [2009] and was later modified by University of Reading.

2.3 Detection of pedestrians

For detection of pedestrians in the images, the methods and the code presented by Piotr Dollár [Dollár et al., 2009, 2010] are used. The detection algorithm uses integral channel features [Dollár et al., 2009] for detecting pedestrians from a single image; no prior information is needed for the detections. Dollár concludes that this method outperforms for instance the method based on histograms of oriented gradients [Dollár et al., 2009].

Partly due to the lack of prior knowledge, the algorithm has difficulties detecting pedestrians that are partially or fully obscured by other objects. This gives rise to missed detections.

The algorithm returns a rectangle, or bounding box, for each detected pedestrian, see Figure 2.4 for an example. It is not within the scope of this master thesis to improve this algorithm; it is only used for extracting measurements from the dataset.

Figure 2.3: A snapshot from camera view 1 in the PETS 2012 dataset [University of Reading, 2012].

2.4 Conversion between image and world coordinates

Tracking objects in the image plane is possible, but the drawback is that the physical motion of pedestrians is harder to model in the image plane. For instance, a target is perceived as moving slower farther away from the camera than close to it. Furthermore, clustering is easier to perform in physical quantities. Instead of tracking in the image plane, the goal is to follow both individuals and groups in the ground plane, i.e. in world coordinates. Hence, the centre point of the lower edge of each bounding box is transformed into world coordinates, which are used as measurements approximating the positions of the pedestrians' feet.

The data from PETS 2012 [University of Reading, 2012] includes a camera calibration file. This file contains different calibration parameters that have been determined by using Tsai's camera calibration model [Tsai, 1987]. These parameters can be used to transform image coordinates (xf, yf) to ground plane coordinates (x, y, z).

Figure 2.4: Example of pedestrian detections in an image.

The first step is to transform the image coordinates (xf, yf) into distorted image coordinates (xd, yd),

$$x_d = d_x (x_f - C_x)/s_x, \qquad (2.2)$$
$$y_d = d_y (y_f - C_y), \qquad (2.3)$$

where dx, dy are the centre to centre distances between adjacent sensor elements in the x and y direction respectively, Cx, Cy are the coordinates of the centre of radial lens distortion and sx is a scale factor compensating for uncertainty due to imperfections in hardware timing for scanning and digitisation.

The second step is to transform the distorted coordinates into undistorted image coordinates (xu, yu),

$$x_u = x_d (1 + \kappa r^2), \qquad (2.4)$$
$$y_u = y_d (1 + \kappa r^2), \qquad (2.5)$$

where

$$r = \sqrt{x_d^2 + y_d^2}, \qquad (2.6)$$

and κ is the radial lens distortion coefficient.

The final step is to transform the undistorted image coordinates into world coordinates. The transformation is given by the following system of equations, assuming that all targets move in the ground plane defined by z(x, y), given by a terrain elevation map,

$$\begin{pmatrix} x_u z_c / f \\ y_u z_c / f \\ z_c \end{pmatrix} = R \begin{pmatrix} x \\ y \\ z(x, y) \end{pmatrix} + T, \qquad (2.7)$$

where f is the focal length, zc is the camera's z-coordinate, which is unknown, R is the rotation matrix,

$$R = \begin{pmatrix} \cos\theta\cos\psi & \cos\psi\sin\phi\sin\theta - \cos\phi\sin\psi & \sin\phi\sin\psi + \cos\phi\cos\psi\sin\theta \\ \cos\theta\sin\psi & \sin\phi\sin\theta\sin\psi + \cos\phi\cos\psi & \cos\phi\sin\theta\sin\psi - \cos\psi\sin\phi \\ -\sin\theta & \cos\theta\sin\phi & \cos\phi\cos\theta \end{pmatrix}, \qquad (2.8)$$

and T is the translation vector,

$$T = \begin{pmatrix} T_x \\ T_y \\ T_z \end{pmatrix}. \qquad (2.9)$$

To solve (2.7) the following assumption is made:

2.1 Assumption. Targets move in a flat world, i.e. z(x, y) = 0.

Given the assumption above, the solution for a flat world is given by

$$x = \frac{(T_x - x_u T_z/f)(y_u R_{3,2}/f - R_{2,2}) - (x_u R_{3,2}/f - R_{1,2})(T_y - y_u T_z/f)}{(x_u R_{3,1}/f - R_{1,1})(y_u R_{3,2}/f - R_{2,2}) - (x_u R_{3,2}/f - R_{1,2})(y_u R_{3,1}/f - R_{2,1})}, \qquad (2.10)$$

$$y = \frac{(x_u R_{3,1}/f - R_{1,1})(T_y - y_u T_z/f) - (T_x - x_u T_z/f)(y_u R_{3,1}/f - R_{2,1})}{(x_u R_{3,1}/f - R_{1,1})(y_u R_{3,2}/f - R_{2,2}) - (x_u R_{3,2}/f - R_{1,2})(y_u R_{3,1}/f - R_{2,1})}. \qquad (2.11)$$

The coordinates (x, y) will be used as measurements in the tracking problem. An example of conversion of coordinates can be seen in Figure 2.5.
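The conversion chain (2.2)-(2.11) can be summarised in a short Python sketch, given here under the flat-world assumption. The function and parameter names are illustrative assumptions; the actual calibration values (dx, dy, Cx, Cy, sx, κ, f, R, T) would come from the PETS calibration file.

```python
import numpy as np

def image_to_world(xf, yf, calib):
    """Convert an image point (xf, yf) to flat-world coordinates (x, y).

    `calib` is a dict with the Tsai parameters used above:
    dx, dy, Cx, Cy, sx, kappa, f, R (3x3 rotation) and T (3-vector).
    """
    # (2.2)-(2.3): image coordinates to distorted sensor coordinates.
    xd = calib["dx"] * (xf - calib["Cx"]) / calib["sx"]
    yd = calib["dy"] * (yf - calib["Cy"])
    # (2.4)-(2.6): compensate for radial lens distortion.
    r2 = xd**2 + yd**2
    xu = xd * (1 + calib["kappa"] * r2)
    yu = yd * (1 + calib["kappa"] * r2)
    # (2.10)-(2.11): solve the 2x2 system obtained from (2.7) with z(x, y) = 0.
    R, T, f = calib["R"], calib["T"], calib["f"]
    a, b = xu * R[2, 0] / f - R[0, 0], xu * R[2, 1] / f - R[0, 1]
    c, d = yu * R[2, 0] / f - R[1, 0], yu * R[2, 1] / f - R[1, 1]
    e, g = T[0] - xu * T[2] / f, T[1] - yu * T[2] / f
    den = a * d - b * c
    return (e * d - b * g) / den, (a * g - c * e) / den
```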

Figure 2.5: An example of detections converted from image coordinates to world coordinates (axes in meters), showing the position of camera 1, the road, and the field of view of camera 1.

3 Target tracking and filtering

This chapter presents the filtering of the measurements used for tracking the targets. Target tracking is the problem of estimating the state vector of the target by combining sensor measurements and knowledge about its mobility [Gustafsson et al., 2010]. The chapter starts by introducing theory for classical single target tracking, which eventually leads to theory about multi target tracking using the phd filter. Finally, the method for multiple target tracking with the gm-phd filter is presented.

3.1 Single target tracking

In single target tracking, the goal is to track a single target. This can be done given some measurements and prior knowledge about the target motion. However, the measurements and the knowledge about the target are in the real world never perfect. A way of dealing with these uncertainties is by modelling the motion and measurements using probability densities p(•), resulting in a motion model and a measurement model respectively. The motion model pk+1|k(xk+1|xk) is used for predicting the next state vector xk+1 and the measurement model pk+1(zk+1|xk+1) connects the measurements to the state vector xk+1. This is popularly done using the recursive single target, single sensor Bayes filter, which is given by the following equations [Mahler, 2007]:

$$p_{k+1|k}(x_{k+1}|z_{1:k}) = \int p_{k+1|k}(x_{k+1}|x_k)\, p_{k|k}(x_k|z_{1:k})\, dx_k \qquad (3.1)$$

$$p_{k+1|k+1}(x_{k+1}|z_{1:(k+1)}) = \frac{p_{k+1}(z_{k+1}|x_{k+1})\, p_{k+1|k}(x_{k+1}|z_{1:k})}{p_{k+1}(z_{k+1}|z_{1:k})} \qquad (3.2)$$

where xk denotes the target state vector at time k, zk denotes the sensor measurements at time k and the Bayes normalization factor is

$$p_{k+1}(z_{k+1}|z_{1:k}) = \int p_{k+1}(z_{k+1}|x_{k+1})\, p_{k+1|k}(x_{k+1}|z_{1:k})\, dx_{k+1}, \qquad (3.3)$$

which is necessary to make the area under the graph equal 1.

Equation (3.1) is commonly known as the time update, where the prior for the next time step is predicted using the knowledge about the target's motion. Equation (3.2) is commonly known as the measurement update, where the posterior is estimated by correcting the prior with the aid of a measurement.

The filter propagates through time according to

$$p_0(x_0) \rightarrow p_{1|0}(x_1|z_0) \rightarrow p_{1|1}(x_1|z_1) \rightarrow \ldots \qquad (3.4)$$

$$p_{k|k}(x_k|z_{1:k}) \rightarrow p_{k+1|k}(x_{k+1}|z_{1:k}) \rightarrow p_{k+1|k+1}(x_{k+1}|z_{1:(k+1)}), \qquad (3.5)$$

where p0(x0) denotes a guess of the target’s initial probability distribution.

If the probability densities are Gaussian and the models are linear, the recursion is known as the Kalman filter.
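For reference, a minimal sketch of that linear-Gaussian special case is given below (not taken from the thesis): the time and measurement updates of a Kalman filter with motion model F, measurement model H and noise covariances Q and R, matching the notation introduced earlier.

```python
import numpy as np

def kalman_time_update(x, P, F, Q):
    """Time update (3.1) for a linear-Gaussian model: predict the next state."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kalman_measurement_update(x_pred, P_pred, z, H, R):
    """Measurement update (3.2): correct the prediction with a measurement z."""
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(len(x)) - K @ H) @ P_pred
    return x, P
```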

3.2 Random Finite Sets

In this master thesis, single target tracking is not a suitable approach to solve the problem, see Chapter 1. A multitarget equivalent is more desirable and will be presented in the next section. However, first one needs to define random finite sets, which are a way of representing multitarget states and measurements. A random finite set is defined by [Mahler, 2007] as follows:

3.1 Definition (Random finite set). A random variable Ξ that draws its instantiations Ξ = X from the hyperspace X of all finite subsets X (the null set ∅ included) of some underlying space X0.

In this master thesis, the underlying space X0 is the Euclidean vector space $\mathbb{R}^{n_x}$, where nx is the number of state variables. Consequently, the hyperspace X consists of all the finite sets according to

$$X = \emptyset, \quad X = \{x^{(1)}\},\ x^{(1)} \in \mathbb{R}^{n_x}, \quad \ldots, \quad X = \{x^{(1)}, \ldots, x^{(N)}\},\ x^{(i)} \in \mathbb{R}^{n_x},\ i \in \{1, \ldots, N\}. \qquad (3.6)$$

3.3 Bayes filter for multiple target tracking

When tracking groups of individuals, there is a need for a filter that can handle multiple targets. Consequently, the Bayes single target filtering described in Section 3.1 will not suffice. This section presents the multi target Bayes filter, which is a generalization of the single target version to multitarget problems [Mahler, 2007]. The generalization is made by modelling the targets' respective state vectors and the measurements as random finite sets.

Let Zk be an rfs of measurements at time k, Xk an rfs of states at time k, pk+1|k(Xk+1|Xk) the multi-target Markov density and pk+1(Zk+1|Xk+1) the multi-source likelihood function. Then the Bayes multitarget filter time update and measurement update are [Mahler, 2007]

$$p_{k+1|k}(X_{k+1}|Z_{1:k}) = \int p_{k+1|k}(X_{k+1}|X_k)\, p_{k|k}(X_k|Z_{1:k})\, \delta X_k, \qquad (3.7)$$

$$p_{k+1|k+1}(X_{k+1}|Z_{1:(k+1)}) = \frac{p_{k+1}(Z_{k+1}|X_{k+1})\, p_{k+1|k}(X_{k+1}|Z_{1:k})}{p_{k+1}(Z_{k+1}|Z_{1:k})}, \qquad (3.8)$$

where the normalization factor is

$$p_{k+1}(Z_{k+1}|Z_{1:k}) = \int p_{k+1}(Z_{k+1}|X_{k+1})\, p_{k+1|k}(X_{k+1}|Z_{1:k})\, \delta X_{k+1}. \qquad (3.9)$$

The propagation through time is analogous to (3.1) but with rfs and set integrals instead of single vectors and vector integrals.

As one can see, this filter resembles the single target Bayes filter and is also theoretically optimal [Mahler, 2007]. However, due to the necessity of computing set integrals, the recursion is not computationally tractable [Mahler, 2007]. Therefore, approximations are needed to be able to use the filter in practice.

3.4 The Probability Hypothesis Density Filter

The probability hypothesis density (phd) filter is a solution to the multi-target tracking problem [Mahler, 2003b, 2007]. Furthermore, the phd filter is an approximation of the optimal first-order multitarget-moment Bayes filter and was proposed as an analogous solution to the constant gain Kalman filter for single targets. Rather than returning a probability distribution, the phd filter returns a probability hypothesis density Dk|k which can be interpreted as a target density, where peaks indicate a greater likelihood of an existing target in that area [Mahler, 2007]. See Figure 3.1 for an example of a phd surface. The phd has the property that the integral over the whole state space is the number of expected targets in the state space, i.e.

$$N_{k|k} \triangleq \int D_{k|k}(x)\, dx, \qquad (3.10)$$

where Nk|k denotes the number of targets and Dk|k(x) denotes the phd [Mahler, 2007]. This is a distinct difference from a probability density function, which when integrated equals 1.

Figure 3.1: A Gaussian mixture phd surface with 136 Gaussian components.

As well as the classic Bayes filter, the phd filter consists of two steps: a time update (prediction) and a measurement update (correction). The recursion is as follows:

$$\ldots \rightarrow D_{k|k}(x) \xrightarrow{\text{phd predictor}} D_{k+1|k}(x) \xrightarrow{\text{phd corrector}} D_{k+1|k+1}(x) \rightarrow \ldots \qquad (3.11)$$

3.4.1 Initialization

Before a prediction can be made, the phd filter requires an initial phd estimate. This prior phd can be described as

$$D_{0|0}(x) = n_0\, s_0(x_0), \qquad (3.12)$$

where n0 denotes the initial estimated number of expected targets and s0 denotes some probability density with peaks at the initial target positions. If very little is known about the initial targets, a wise choice is to model the probability density s0 as a uniform distribution.

3.4.2 Time update

The time update (prediction step) of the phd filter predicts the phd at time k + 1 using prior information up until the current time k, just like the Bayes filter. The phd filter prediction assumes that the multitarget transition model satisfies the following assumptions [Mahler, 2007]:

3.2 Assumption. Target movements are statistically independent.

3.3 Assumption. Targets can disappear from the scene.

3.4 Assumption. New targets can be spawned by existing targets.

3.5 Assumption. New targets can appear in the scene independently of existing targets.

Given these assumptions, the prediction step of the phd filter is given by [Mahler, 2007]:

$$D_{k+1|k}(x_{k+1}) = \underbrace{\gamma_{k+1|k}(x_{k+1})}_{\text{prediction of birth targets}} + \underbrace{\int p_S(x_k)\, p_{k+1|k}(x_{k+1}|x_k)\, D_{k|k}(x_k)\, dx_k}_{\text{prediction of persisting targets}} + \underbrace{\int \beta_{k+1|k}(x_{k+1}|x_k)\, D_{k|k}(x_k)\, dx_k}_{\text{prediction of spawned targets}}. \qquad (3.13)$$

The different factors in the equation above are briefly described in Table 3.1.

Table 3.1: Description of the different factors in the phd predictor (3.13).

Notation Description
pk+1|k(xk+1|xk) The single-target Markov transition density
pS(xk) The probability that a target with state xk at time step k will survive to time k + 1
βk+1|k(xk+1|xk) The likelihood that a group of new targets with state xk+1 will be spawned at time step k + 1 from a single previous target with state xk at time step k
γk+1|k(xk+1) The likelihood that new targets with state xk+1 will enter the scene at time step k + 1

3.4.3 Measurement update

The next step in the phd recursion is the measurement update (correction step) where measurements are used to estimate the phd surface at time k + 1. To do this, the phd filter assumes the following [Mahler, 2007]:

3.6 Assumption. No target generates more than one measurement.

3.7 Assumption. Each measurement is generated by no more than one target.

3.8 Assumption. All measurements are conditionally independent of target states, missed detections and a multiobject Poisson false alarm process.

Given these assumptions and a predicted phd Dk+1|k(xk+1) from the previous section, the measurement update is as follows [Mahler, 2007]:

$$D_{k+1|k+1}(x_{k+1}|Z_{1:(k+1)}) \cong (1 - p_D(x_{k+1}))\, D_{k+1|k}(x_{k+1}) + D_{k+1|k}(x_{k+1})\, p_D(x_{k+1}) \sum_{z \in Z_{k+1}} \frac{p_{k+1}(z_{k+1}|x_{k+1})}{\lambda_{k+1}\, c_{k+1}(z_{k+1}) + \int p_D(x_k)\, p_{k+1}(z_{k+1}|x_k)\, D_{k+1|k}(x_k)\, dx_k}. \qquad (3.14)$$

The different factors in the equation above are briefly described in Table 3.2.

Table 3.2: Description of the different factors in the phd corrector (3.14).

Notation Description
pD(xk+1) Probability of detection at time k + 1 for a target with state xk+1
pk+1(zk+1|xk+1) The sensor likelihood function
λk The average number of Poisson distributed false alarms collected by the sensor
ck(zk+1) The spatial distribution of the false alarms

If there are multiple sensors, these are easily implemented by repeating the measurement update for each sensor with the corresponding sensor model [Mahler, 2007]. In this master thesis, this means that multiple cameras can be utilized for detections.

3.4.4 Advantages

The phd filter has the following advantages [Mahler, 2007]:

• The phd filter has potentially desirable computational qualities. Its computational complexity is O(mn), where n is the number of targets and m is the number of measurements in Z.


• It admits modelling of targets disappearing, birth of new targets as well as spawning of target from existing targets.

• The phd filter does not require measurements to be coupled with existing targets.

3.4.5 Disadvantages

The phd filter has the following disadvantages [Mahler, 2007]:

• Suffers from high variance in estimated number of targets when false and missed detections are present.

• Loss of information since the phd is an approximation of the full multi target distribution pk|k(X).

3.5 Gaussian mixtures

In the next section, the Gaussian mixture phd filter is described, which uses Gaussian mixture models. A Gaussian mixture model is a weighted sum of J Gaussian components,

$$p(x|\pi) = \sum_{i=1}^{J} w^{(i)}\, \mathcal{N}\!\left(x;\, m^{(i)}, P^{(i)}\right), \qquad (3.15)$$

where

$$\pi = \{w^{(i)}, m^{(i)}, P^{(i)}\},\quad i = 1, \ldots, J, \qquad (3.16)$$

and w(i) denotes the weight of Gaussian component i, m(i) denotes the mean vector of Gaussian component i and P(i) the covariance matrix of Gaussian component i.

Gaussian mixture models can be used in many different kinds of applications, e.g. machine learning, speech recognition and, as in this master thesis, target tracking. The reason for this wide area of applications is that a Gaussian mixture model can approximate any function with arbitrary precision. However, a drawback is that the mixture usually grows at an exponential rate when processed in recursion, see e.g. [Schieferdecker and Huber, 2009].
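As a small illustration, the following Python sketch evaluates a Gaussian mixture of the form (3.15). In the gm-phd setting the same expression is used to evaluate the phd surface; the only difference is that the weights then sum to the expected number of targets rather than to 1.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    d = len(mean)
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def gaussian_mixture(x, weights, means, covs):
    """Weighted sum of Gaussian components, as in (3.15).

    With GM-PHD weights the result is a phd value, not a probability density:
    sum(weights) estimates the expected number of targets.
    """
    return sum(w * gaussian_pdf(x, m, P) for w, m, P in zip(weights, means, covs))
```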

3.6 Gaussian mixture phd filter

This section describes the Gaussian mixture phd filter which is a proposed closed form solution to the original phd filter. This is done by estimating the phd with a Gaussian mixture model. The gm-phd filter was first presented in [Vo and Ma, 2006]. It is the gm-phd filter that is used in this master thesis for filtering the measurements.

To derive the Gaussian Mixture phd filter, the following assumptions must be made:

3.9 Assumption. All targets evolve and generate observations independently of each other.

3.10 Assumption. Clutter measurements are generated by a Poisson process and are independent of target-oriented measurements.

3.11 Assumption. The predicted multi-target rfs is Poisson distributed. 3.12 Assumption. The single targets follow a linear Gaussian dynamical model and the sensor measurements are modelled with a linear Gaussian model, i.e.

$$p_{k|k-1}(x_k|x_{k-1}) = \mathcal{N}(x_k;\, F_{k-1} x_{k-1},\, Q_{k-1}), \qquad (3.17)$$

$$g_k(z_k|x_k) = \mathcal{N}(z_k;\, H_k x_k,\, R_k). \qquad (3.18)$$

3.13 Assumption. The survival and detection probabilities are state independent, i.e.

pS(xk) = pS, (3.19)

pD(xk) = pD. (3.20)

3.14 Assumption. The birth and spawn rfs intensities are Gaussian mixtures of the form

$$\gamma_k(x) = \sum_{i=1}^{J_{\gamma,k}} w_{\gamma,k}^{(i)}\, \mathcal{N}\!\left(x;\, m_{\gamma,k}^{(i)},\, P_{\gamma,k}^{(i)}\right), \qquad (3.21)$$

$$\beta_{k|k-1}(x_k|x_{k-1}) = \sum_{j=1}^{J_{\beta,k}} w_{\beta,k}^{(j)}\, \mathcal{N}\!\left(x;\, F_{\beta,k-1}^{(j)} x_{k-1} + d_{\beta,k-1}^{(j)},\, Q_{\beta,k-1}^{(j)}\right), \qquad (3.22)$$


The following sections include the mathematical formulas for prediction and measurement update in the gm-phd filter, pseudo code can be found in [Vo and Ma, 2006].

3.6.1 Initialization

The GM-PHD filter is initialized with a Gaussian mixture intensity

$$D_0(x) = \sum_{i=1}^{J_0} w_0^{(i)}\, \mathcal{N}\!\left(x;\, m_0^{(i)},\, P_0^{(i)}\right), \qquad (3.23)$$

where J0 denotes the initial expected number of targets, w0(i) denotes the weight of the initial i:th target, m0(i) denotes the initial i:th target's state and P0(i) denotes the corresponding covariance.

3.6.2 Prediction

The prediction of the gm-phd filter is given by the following equations:

$$D_{k|k-1}(x) = D_{S,k|k-1}(x) + D_{\beta,k|k-1}(x) + \gamma_k(x), \qquad (3.24)$$

Here, γk(x) is given by (3.21) and

$$D_{S,k|k-1}(x) = p_S \sum_{j=1}^{J_{k-1}} w_{k-1}^{(j)}\, \mathcal{N}\!\left(x;\, m_{S,k|k-1}^{(j)},\, P_{S,k|k-1}^{(j)}\right), \qquad (3.25)$$
$$m_{S,k|k-1}^{(j)} = F_{k-1} m_{k-1}^{(j)}, \qquad (3.26)$$
$$P_{S,k|k-1}^{(j)} = Q_{k-1} + F_{k-1} P_{k-1}^{(j)} F_{k-1}^T, \qquad (3.27)$$
$$D_{\beta,k|k-1}(x) = \sum_{j=1}^{J_{k-1}} \sum_{l=1}^{J_{\beta,k}} w_{k-1}^{(j)} w_{\beta,k}^{(l)}\, \mathcal{N}\!\left(x;\, m_{\beta,k|k-1}^{(j,l)},\, P_{\beta,k|k-1}^{(j,l)}\right), \qquad (3.28)$$
$$m_{\beta,k|k-1}^{(j,l)} = F_{\beta,k-1}^{(l)} m_{k-1}^{(j)} + d_{\beta,k-1}^{(l)}, \qquad (3.29)$$
$$P_{\beta,k|k-1}^{(j,l)} = Q_{\beta,k-1}^{(l)} + F_{\beta,k-1}^{(l)} P_{k-1}^{(j)} \left(F_{\beta,k-1}^{(l)}\right)^T. \qquad (3.30)$$
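A compact sketch of this prediction step is given below, covering the survival terms (3.25)-(3.27) and the spontaneous birth components; the spawn terms (3.28)-(3.30) are omitted for brevity. The list-based component representation is an implementation assumption, not something prescribed by the thesis.

```python
import numpy as np

def gmphd_predict(weights, means, covs, F, Q, p_s, birth_w, birth_m, birth_P):
    """GM-PHD prediction, (3.24)-(3.27): propagate surviving components with the
    motion model and append the spontaneous birth components. Spawned
    components (3.28)-(3.30) are left out in this sketch."""
    w_pred = [p_s * w for w in weights] + list(birth_w)
    m_pred = [F @ m for m in means] + list(birth_m)
    P_pred = [F @ P @ F.T + Q for P in covs] + list(birth_P)
    return w_pred, m_pred, P_pred
```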

Spontaneous birth and spawn birth

In the gm-phd filter, the spontaneous birth model is given by (3.21) as a sum of weighted Gaussian components. For readability, the formula is repeated here:

$$\gamma_k(x) = \sum_{i=1}^{J_{\gamma,k}} w_{\gamma,k}^{(i)}\, \mathcal{N}\!\left(x;\, m_{\gamma,k}^{(i)},\, P_{\gamma,k}^{(i)}\right). \qquad (3.31)$$


The state for the born target is given by m(i)γ,k with the corresponding covariance matrix Pγ,k(i).

Furthermore, it was earlier mentioned that the gm-phd filter can model the birth of targets spawning from another target as well, see (3.32), e.g. missiles released from a fighter aircraft. The equation is repeated below:

$$\beta_{k|k-1}(x_k|x_{k-1}) = \sum_{i=1}^{J_{\beta,k}} w_{\beta,k}^{(i)}\, \mathcal{N}\!\left(x;\, F_{\beta,k-1}^{(i)} x_{k-1} + d_{\beta,k-1}^{(i)},\, Q_{\beta,k-1}^{(i)}\right), \qquad (3.32)$$

where wβ,k(i) denotes the weight of the i:th Gaussian component.

The spawned target’s motion is modelled with Fβ,k−1(i) . How the spawned target is born relative to the parent target is modelled with d(i)β,k−1, e.g. one can assume that a missile initially will move forward relative to the fighter aircraft.

For both birth models, one has to determine the weights wγ(i) and wβ(i) for the Jγ and Jβ components respectively. The reasoning for choosing these weights is the same for both models.

Let X0 be the state space and A be a subset of X0 corresponding to the surveillance area. If all components have their significant probability mass within A, the expected number of births in each time step is $\sum_{i=1}^{J_{\gamma,k}} w_{\gamma,k}^{(i)}$ and $\sum_{i=1}^{J_{\beta,k}} w_{\beta,k}^{(i)}$ respectively. If only some components have their significant probability mass within A, the expected number of births is the sum of those weights. Birth components with only half their probability mass within A will only contribute with half their weight. The reasoning behind all this is that the integral over the phd equals the expected number of targets. When the phd is a Gaussian mixture, the integral of each Gaussian component equals 1 multiplied with the corresponding component's weight, hence resulting in a sum of weights. Consequently, the weights should be chosen as an expectation of the number of births per time step, where a larger weight indicates a higher likelihood for a birth.

Furthermore, in practice, one has to test different placements of the Gaussian birth components along with their weights using intuition and information about the scenario at hand.

3.6.3 Measurement update

The update of the gm-phd filter is given by the following equations:

$$D_k(x) = (1 - p_{D,k})\, D_{k|k-1}(x) + \sum_{z \in Z_k} D_{z,k}(x, z), \qquad (3.33)$$

where

$$D_{z,k}(x, z) = \sum_{j=1}^{J_{k|k-1}} w_k^{(j)}(z)\, \mathcal{N}\!\left(x;\, m_{k|k}^{(j)},\, P_{k|k}^{(j)}\right), \qquad (3.34)$$
$$w_k^{(j)}(z) = \frac{p_{D,k}\, w_{k|k-1}^{(j)}\, q_k^{(j)}(z)}{\kappa_k(z) + p_{D,k} \sum_{l=1}^{J_{k|k-1}} w_{k|k-1}^{(l)}\, q_k^{(l)}(z)}, \qquad (3.35)$$
$$q_k^{(j)}(z) = \mathcal{N}\!\left(z;\, H_k m_{k|k-1}^{(j)},\, R_k + H_k P_{k|k-1}^{(j)} H_k^T\right), \qquad (3.36)$$
$$m_{k|k}^{(j)}(z) = m_{k|k-1}^{(j)} + K_k^{(j)}\left(z - H_k m_{k|k-1}^{(j)}\right), \qquad (3.37)$$
$$P_{k|k}^{(j)} = \left[I - K_k^{(j)} H_k\right] P_{k|k-1}^{(j)}, \qquad (3.38)$$
$$K_k^{(j)} = P_{k|k-1}^{(j)} H_k^T \left(R_k + H_k P_{k|k-1}^{(j)} H_k^T\right)^{-1}. \qquad (3.39)$$

Clutter

Clutter can be seen as all the measurements that are not obviously associated to a target. The clutter in the measurements is modelled as a Poisson process independent of targets according to

$$\kappa_k(z) = \lambda_k\, c_k(z), \qquad (3.40)$$

where λk denotes the expected number of clutter measurements at time k and ck(z) denotes the spatial probability distribution describing how the clutter is spread out in the measurement space. In this master thesis, ck(z) is modelled as a uniform distribution 1/A, where A is the size of the sensor's surveillance area [Granström, 2012]. Consequently, κk(z) can be interpreted as the number of clutter measurements per area unit.

3.6.4 Merging and Pruning

The number of Gaussian components increases with every iteration, which is computationally demanding. To deal with this, a prune and merge procedure is implemented. In a pruning step, components with a weight wk(i) below a threshold T are discarded. Components within a Mahalanobis distance

$$d_M = \sqrt{\left(m_k^{(i)} - m_k^{(j)}\right)^T \left(P_k^{(j)}\right)^{-1} \left(m_k^{(i)} - m_k^{(j)}\right)}, \qquad (3.41)$$
$$d_M = \sqrt{\left(m_k^{(i)} - m_k^{(j)}\right)^T \left(P_k^{(i)}\right)^{-1} \left(m_k^{(i)} - m_k^{(j)}\right)}, \qquad (3.42)$$

below some threshold U from each other are merged into one single Gaussian component according to

$$w_k^{(m)} = \sum_{i \in M} w_k^{(i)}, \qquad (3.43)$$
$$m_k^{(m)} = \frac{1}{w_k^{(m)}} \sum_{i \in M} w_k^{(i)} m_k^{(i)}, \qquad (3.44)$$
$$P_k^{(m)} = \frac{1}{w_k^{(m)}} \sum_{i \in M} w_k^{(i)} \left( P_k^{(i)} + \left(m_k^{(m)} - m_k^{(i)}\right)\left(m_k^{(m)} - m_k^{(i)}\right)^T \right), \qquad (3.45)$$

where M denotes the set of Gaussian components within a Mahalanobis distance U from each other. The Mahalanobis distance is calculated both using the covariance matrix from component i (3.41) and using the covariance matrix from component j (3.42), since they usually are not the same.

If there still are too many Gaussian components after the prune and merge, only the Jmax components with the largest weights are saved for the next iteration.
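The pruning and merging step can be sketched as follows. Note one simplification compared to (3.41)-(3.42): the Mahalanobis distance is computed only with the covariance of the currently strongest component, and U is treated as a distance threshold; these are assumptions of this sketch, not choices made in the thesis.

```python
import numpy as np

def prune_and_merge(weights, means, covs, prune_thresh, merge_thresh, j_max):
    """Pruning and merging of GM-PHD components in the spirit of (3.41)-(3.45)."""
    # Pruning: discard components whose weight is below the threshold T.
    idx = [i for i, w in enumerate(weights) if w > prune_thresh]
    merged = []
    while idx:
        j = max(idx, key=lambda i: weights[i])        # strongest remaining component
        P_inv = np.linalg.inv(covs[j])
        group = [i for i in idx
                 if np.sqrt((means[i] - means[j]) @ P_inv @ (means[i] - means[j])) <= merge_thresh]
        w = sum(weights[i] for i in group)                                 # (3.43)
        m = sum(weights[i] * means[i] for i in group) / w                  # (3.44)
        P = sum(weights[i] * (covs[i] + np.outer(m - means[i], m - means[i]))
                for i in group) / w                                        # (3.45)
        merged.append((w, m, P))
        idx = [i for i in idx if i not in group]
    merged.sort(key=lambda comp: comp[0], reverse=True)
    merged = merged[:j_max]                           # keep at most Jmax components
    return ([c[0] for c in merged], [c[1] for c in merged], [c[2] for c in merged])
```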

4 Extraction and presentation of groups

This chapter describes how the components of the Gaussian mixture, given by the gm-phd filter's output, can be exploited for representing a group. This is done by first applying a clustering algorithm to the Gaussian components, dividing them into different groups, see Section 4.1. In Section 4.2, it is described how groups are identified and tracked through time. Section 4.3 describes how the position and velocity are estimated for each group. Section 4.4 describes how the phd surface can be used for giving an estimate of the corresponding shape of each group.

4.1 Gaussian mean shift clustering

After pruning and merging in the gm-phd filter, the Gaussian components are divided into groups. The division is done by a method known as mean shift clustering [Fukunaga and Hostetler, 1975, Cheng, 1995]. Mean shift clustering is a simple method for shifting data points to an average point in their surroundings [Cheng, 1995]. One advantage of mean shift clustering algorithms compared to other popular clustering algorithms is that the number of clusters is not an input parameter.

In this master thesis, the data points which are clustered are the output from the gm-phd filter, in the form of the Gaussian components' mean vectors in the set $M_k = \{m_k^{(1)}, \ldots, m_k^{(J_k)}\}$. For each point $x_k = m_k^{(r)}$ from Mk, where r ∈ {1, . . . , Jk}, the sample mean is calculated according to

$$x_{k+1} = \frac{\sum_{i=1}^{J_k} K\!\left(m_k^{(i)} - x_k\right) m_k^{(i)}}{\sum_{i=1}^{J_k} K\!\left(m_k^{(i)} - x_k\right)}, \qquad (4.1)$$

where K(•) denotes the kernel function. The difference xk+1 − xk is called the mean shift vector, i.e. the distance the point xk has moved.

Figure 4.1: (a) Data points in a two dimensional space. (b) The data has been divided into four clusters.

In this master thesis, the kernel is a Gaussian distribution, i.e.

$$K\!\left(m_k^{(i)} - x_k\right) = \mathcal{N}\!\left(x_k;\, m_k^{(i)},\, \Sigma\right), \qquad (4.2)$$

where Σ denotes a parameter determining the "size" of the kernel, e.g. a covariance matrix with large elements will lead to larger and consequently fewer clusters. The Gaussian kernel has the property that it will converge to the densest region in its surroundings.

Mean shift clustering is an iterative method and (4.1) will be repeated for the points' new positions until the norms of the mean shift vectors are below some convergence threshold, i.e. when the points do not move farther than a distance δ from the previous iteration. This indicates that the points have found the nearest dense region, i.e. cluster centre.

All points close enough to some cluster centre are considered to be in the same cluster. The output of the algorithm is the number of clusters along with the clusters and their members. The total algorithm is summarized in Algorithm 1. The clusters are reformed into groups in the next section.

Figure 4.1 shows an example of a data point cloud that has been divided into different clusters using this method.


Algorithm 1: Mean shift clustering

input: data, convThresh, sigma, distThresh
while 1 do
    for k = 1:numberOfPoints do
        num = sum_{i=1}^{Jk} N(data(k), data(i), sigma) * data(i)
        den = sum_{i=1}^{Jk} N(data(k), data(i), sigma)
        newData(k) = num/den
    end
    distanceMatrix = DistanceBetweenAllPoints(newData)
    % Put data points close enough to each other in the same cluster
    clusters = DetermineClustersWrtDistance(distanceMatrix, distThresh)
    numberOfClusters = Size(clusters)
    CalculateMeanShiftVector(newData, data)
    if MaxMeanShiftVector < convThresh then
        Break % Exit loop when all points have converged.
    end
end
output: numberOfClusters, clusters
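A runnable NumPy version of Algorithm 1, using the Gaussian kernel (4.2) with an isotropic covariance σ²I, is sketched below. Parameter names mirror the pseudocode; the maximum number of iterations and the exact cluster-assignment rule are assumptions of the sketch.

```python
import numpy as np

def mean_shift_clustering(data, conv_thresh=1e-3, sigma=1.0, dist_thresh=0.5, max_iter=100):
    """Gaussian mean shift clustering of the GM component means (cf. Algorithm 1).

    data is an (n, d) array of points; returns (numberOfClusters, labels)."""
    points = np.asarray(data, dtype=float)
    shifted = points.copy()
    for _ in range(max_iter):
        new = np.empty_like(shifted)
        for k, x in enumerate(shifted):
            # Gaussian kernel weights K(m_i - x) with covariance sigma^2 * I, cf. (4.2).
            w = np.exp(-0.5 * np.sum((points - x) ** 2, axis=1) / sigma ** 2)
            new[k] = w @ points / w.sum()             # sample mean, (4.1)
        converged = np.max(np.linalg.norm(new - shifted, axis=1)) < conv_thresh
        shifted = new
        if converged:
            break
    # Converged points close enough to each other are placed in the same cluster.
    labels = -np.ones(len(points), dtype=int)
    n_clusters = 0
    for k in range(len(points)):
        if labels[k] == -1:
            close = (labels == -1) & (np.linalg.norm(shifted - shifted[k], axis=1) < dist_thresh)
            labels[close] = n_clusters
            n_clusters += 1
    return n_clusters, labels
```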

4.2 Determining group belonging

The mean shift clustering algorithm returns the current clusters. This section describes how these clusters are labelled with a group ID for track continuity. The set of group ID:s for the Gaussian components is denoted $T_k = \{T_k^{(1)}, \ldots, T_k^{(J_k)}\}$ and $T_0^{(i)} = -1\ \forall i \in \{1, \ldots, J_0\}$, i.e. components that have not yet been designated a group will have ID −1. To keep track of the components' group ID:s, a few things have to be added to the gm-phd recursion, see Algorithm 2. Algorithm 2 only describes how the gm-phd filter updates the group ID:s and not how the ID:s are determined. Algorithm 3 assigns each created cluster in the mean shift clustering process a unique group ID. The algorithm is run immediately after the mean shift clustering step.

Basically, the algorithm sorts the clusters in descending order with respect to the number of members. Every cluster is assigned a wanted group ID depending on what group was the most popular among the cluster's Gaussian components during the previous time step. The cluster with the largest estimated number of members gets to choose group ID first. This is repeated until all clusters have been assigned a group ID. If a group ID is already occupied, the cluster will be assigned the second most popular ID and so on. Consequently, a group can change ID from one time step to another if there is a larger group wanting the same ID or if a larger group has merged with the current group. Newly created Gaussian components are given the ID −1. If group ID −1 is the most popular in a cluster, the cluster will be assigned the maximum group ID + 1.

Algorithm 2: Updating group ID:s

input: Jk−1, Tk−1

Prediction
i = 0
for j = 0, . . . , Jγ do
    i = i + 1
    ... % Unchanged code from the GM-PHD filter pseudo code
    Tk|k−1(i) = −1 % Spontaneous birth components are not given a group ID
end
for j = 0, . . . , Jβ do
    for l = 0, . . . , Jk−1 do
        i = i + 1
        ...
        Tk|k−1(i) = Tk−1(l) % Spawned components are given the same group ID as their parent
    end
end
for j = 0, . . . , Jk−1 do
    i = i + 1
    ...
    Tk|k−1(i) = Tk−1(j) % Existing components are given the same group ID as in the previous iteration
end
Jk|k−1 = i

Update
for j = 0, . . . , Jk|k−1 do
    ...
    Tk(j) = Tk|k−1(j) % Group ID:s are updated
end
l = 0
for n = 0, . . . , |Zk| do
    l = l + 1
    for j = 0, . . . , Jk|k−1 do
        ...
        Tk(l · Jk|k−1 + j) = Tk|k−1(j) % Group ID:s are the same as for the corresponding predicted component
    end
    ...
end
Jk = l · Jk|k−1 + Jk|k−1

Prune and merge
For components that are merged, the most popular ID is chosen as the merged component's ID. If two ID:s are equally popular, the smaller will be chosen.

output: Tk, Jk


4.3 Estimating position and velocity for a group

The position and velocity for a group with index g are estimated according to how the Gaussian components are merged in the gm-phd filter, see Section 3.6.4. That is,

$$\bar{w}_k^{(g)} = \sum_{i=1}^{J_k^{(g)}} w_k^{(g,i)}, \qquad (4.3)$$
$$\bar{m}_k^{(g)} = \frac{1}{\bar{w}_k^{(g)}} \sum_{i=1}^{J_k^{(g)}} w_k^{(g,i)} m_k^{(g,i)}, \qquad (4.4)$$

where $\bar{w}_k^{(g)}$ denotes the total weight of the group with index g and $\bar{m}_k^{(g)}$ denotes the corresponding group's mean state vector. (g, i) denotes the i:th component in group g.

Equation (4.3) is also an estimate of the number of individuals in the group, i.e.

$$\bar{N}_k^{(g)} = \bar{w}_k^{(g)}. \qquad (4.5)$$
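In code, (4.3)-(4.5) amount to a weighted average over the components assigned to a group, as in the following sketch:

```python
import numpy as np

def group_state(weights, means):
    """Group total weight, mean state and member estimate, (4.3)-(4.5).

    `weights` and `means` are the GM-PHD components assigned to one group.
    """
    w_group = float(np.sum(weights))                                   # (4.3)
    m_group = np.average(np.asarray(means), axis=0, weights=weights)   # (4.4)
    n_members = w_group                                                # (4.5)
    return w_group, m_group, n_members
```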

4.4 Shape estimation of groups

The shape of a group is estimated through its gm-phd surface. For each group g, the two dimensional gm-phd surface is calculated according to

$$D_k^{(g)}(x_p) = \sum_{i=1}^{J_{g,k}} w_k^{(g,i)}\, \mathcal{N}\!\left(x_p;\, m_{p,k}^{(g,i)},\, P_{p,k}^{(g,i)}\right), \qquad (4.6)$$

where

$$m_{p,k}^{(g,i)} = \begin{pmatrix} p_{x,k}^{(g,i)} \\ p_{y,k}^{(g,i)} \end{pmatrix}, \qquad (4.7) \qquad\quad x_p = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad (4.8)$$


Algorithm 3: Determining group ID

input: numberOfClusters, clusters, Tk

Find each cluster's most wanted ID
for i = 0, . . . , numberOfClusters do
    wantedID(i) = FindMostCommonPreviousID(clusters(i))
    votes(i) = VotesForPreviousID(clusters(i))
end

Assign a group ID to each cluster
occupiedID = ∅
while !IsEmpty(votes) do
    groupID = −1
    while groupID ≤ 0 do
        % Sort so that the cluster with the most members/votes is placed first
        clusterPriorityIndex = Sort(votes, 'descending')
        clusterNr = clusterPriorityIndex(first)
        members = ClusterMembers(clusterNr)
        if wantedID(clusterNr) == −1 then
            groupID = maxOldID
            maxOldID = maxOldID + 1
            votes(clusterNr) = ∅
        else if wantedID in occupiedID then
            wantedID = GetMostWantedIDNotInOccupied(occupiedID, Tk, members)
            groupID = −1
        else
            groupID = wantedID
            votes(clusterNr) = ∅
        end
    end
    occupiedID = [occupiedID, groupID]
    groups(groupID) = clusters(clusterNr)
    % Update the group ID:s for the Gaussian components
    Tk(members) = groupID
end

$$P_{p,k}^{(g,i)} = \begin{pmatrix} \sigma^2_{p_{x,k}^{(g,i)}} & \sigma_{p_{x,k}^{(g,i)}}\, \sigma_{p_{y,k}^{(g,i)}}\, \rho_{p_{x,k}^{(g,i)},\, p_{y,k}^{(g,i)}} \\ \sigma_{p_{y,k}^{(g,i)}}\, \sigma_{p_{x,k}^{(g,i)}}\, \rho_{p_{x,k}^{(g,i)},\, p_{y,k}^{(g,i)}} & \sigma^2_{p_{y,k}^{(g,i)}} \end{pmatrix}, \qquad (4.9)$$

where $\rho_{p_{x,k}^{(g,i)},\, p_{y,k}^{(g,i)}}$ denotes the correlation between $p_{x,k}^{(g,i)}$ and $p_{y,k}^{(g,i)}$.

The surface is then intersected at a threshold τ. This intersection is interpreted as an approximation of the groups' shapes and sizes. The threshold τ is calculated according to

$$\tau = \frac{\max_{x_p} D_k^{(g)}(x_p)}{2\left(\bar{N}_k^{(g)}\right)^2}. \qquad (4.10)$$

This means that the threshold will be at half the value of the maximum peak divided by the estimated number of members in the group squared. This solution allows an estimate of the shape even though it might be very uncertain whether the group actually exists. The formula has no scientific derivation but is the result of intuition and studies of the results.
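A sketch of the shape estimation is given below: the positional phd surface (4.6) of a group is evaluated on a grid and cut at the threshold (4.10), and the resulting boolean mask approximates the group's shape. The grid resolution and the use of SciPy for the Gaussian densities are implementation choices, not part of the thesis.

```python
import numpy as np
from scipy.stats import multivariate_normal

def group_shape_mask(weights, pos_means, pos_covs, xs, ys):
    """Approximate a group's shape by thresholding its positional phd surface.

    Evaluates (4.6) on the grid defined by xs and ys, and cuts it at tau from (4.10).
    Returns the phd grid, the threshold and a boolean mask of the shape.
    """
    X, Y = np.meshgrid(xs, ys)
    grid = np.dstack((X, Y))
    D = np.zeros(X.shape)
    for w, m, P in zip(weights, pos_means, pos_covs):
        D += w * multivariate_normal(mean=m, cov=P).pdf(grid)   # (4.6)
    n_members = np.sum(weights)                                 # (4.5)
    tau = D.max() / (2.0 * n_members**2)                        # (4.10)
    return D, tau, D >= tau
```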

5 Setup and results

In this chapter, the models and parameters used, along with the corresponding results, are presented. The method in this master thesis has been tested for three different setups: artificial data, real data using one camera and real data using two cameras.

5.1 Models and parameters

This section presents the models and parameters used in this master thesis. The parameters have been chosen by trying different values and studying the results. It is difficult to motivate the choice of a single parameter since the parameters are very dependent on each other. However, when choosing the parameters, one has to have the scenario at hand in mind, e.g. that the groups move at distances of meters from each other.

5.1.1 Data and preprocessing parameters

This section presents the parameters used for the data and preprocessing.

Dataset

The artificial dataset is evaluated by simulating 20 consecutive frames. The frames consist of two groups of eight individuals each. One of the groups moves in a coordinated fashion, i.e. the individuals do not move relative to the other group members, along a vertical trajectory, and the other moves in a coordinated fashion along a horizontal trajectory. Each target is given some Gaussian measurement noise with mean µ = 0 and variance σ² = 0.05². However, all targets are detected and there is no clutter, i.e. almost ideal conditions.
