Institutionen för systemteknik
Department of Electrical Engineering

Examensarbete (Master's thesis)

Investigations in Tracking and Colour Classification

Anders Moe
Reg nr: LiTH-ISY-EX-1967
7 December 1998
Linköpings Universitet

Master's thesis carried out in Image Processing at Linköping Institute of Technology by
Anders Moe
Reg nr: LiTH-ISY-EX-1967

Supervisors:
Klas Nordberg, Linköpings Universitet, Sweden
Johan Wiklund, Linköpings Universitet, Sweden

Examiner:
Klas Nordberg, Linköpings Universitet, Sweden
Abstract
In this report, mainly three different problems are considered. The first problem considered is how to filter position data of vehicles. To do so the vehicles have to be tracked. This is done with Kalman filters. The second problem considered is how to control a camera to keep a vehicle in the center of the image, under three different conditions. This is mainly solved with a Kalman filter. The last problem considered is how to use the color of the vehicles to make classification of them more robust. Some suggestions on how this might be done are given. However, no really good method to do this has been found.
Acknowledgments
I would like to thank the people at the Computer Vision lab at ISY for taking the time to answer all kinds of questions and for helping me with the equipment.
I thank my examiner Klas Nordberg and my opponent for correcting this thesis and giving suggestions for improvements.
Contents

1 Introduction
  1.1 WITAS
  1.2 Position measuring
  1.3 Camera control
  1.4 Color classification
  1.5 Problem definitions
      1.5.1 Position filtering
      1.5.2 Camera control
      1.5.3 Color classification

2 Theoretical backgrounds
  2.1 The Kalman filter
      2.1.1 Initialization of the Kalman filter (t=0)
      2.1.2 Setting the variances
  2.2 Adaptive parameter estimation with Kalman filter
  2.3 The steady state
  2.4 Color and color spaces
      2.4.1 The HSV color space
      2.4.2 The GCS
  2.5 From light source to camera output
  2.6 Function approximation and principal component analysis
      2.6.1 Principal Component Analysis
      2.6.2 Polynomial function approximation
      2.6.3 Artificial Neural Networks

3 Position filtering
  3.1 Variance estimations
  3.2 Implementation of the Kalman filter
  3.3 Tracking of the vehicles

4 Camera control
  4.1 Implementation
      4.1.1 Condition 1
      4.1.2 Condition 2
      4.1.3 Condition 3
  4.2 Evaluation

5 Color classification
  5.1 Implementation
  5.2 Evaluation
      5.2.1 HSV and GCS
      5.2.2 Projection in the dominant direction and polynomial transformation
      5.2.3 Angle approximation
      5.2.4 Conclusions

6 Discussions
  6.1 Position filtering
  6.2 Camera control
  6.3 Color classification

Introduction
This thesis is divided into three parts which are more or less connected to each other. The first part deals with tracking and filtering of vehicles, the second with the problem of controlling a camera so that the tracked vehicle is kept in the center of the image, and the third with how the color of the vehicles should be used to separate them from each other. All three problem domains are related to the WITAS project described below.
1.1 WITAS
The Wallenberg Laboratory on Information Technology and Autonomous Systems (WITAS) is formed by four research groups, one from the Department of Electrical Engineering (Computer Vision Laboratory) and three from the Department of Computer and Information Science at Linköping University.
WITAS is engaged in basic research in the area of intelligent autonomous vehicles and other autonomous systems. The main goal is to develop an airborne computer system which is able to make rational decisions. The decisions should be based on various information; some of the information sources are pre-stored geographical information, vision sensors and information sent to the system by radio. For more information, see the WITAS home page [12].
One of the objectives of the airborne system should be to supervise traffic (e.g. detect queues). To do so, the positions of the observed vehicles must be extracted from images taken by a camera on the system. Each vehicle should also be given a specific identity while it is observed, and a filtered position and an estimated velocity should also be extracted. If a specific vehicle is to be tracked over a longer time period, one would like to keep it in the center of the observed area. One would also like to use the color of the vehicles to decrease the probability of mixing them up with each other. These are the problems considered in this thesis.
A simulator has been developed in WITAS. It simulates a helicopter with a camera attached to it. The helicopter is flying above a landscape, and some cars are driving on a road network in the landscape. Some of the evaluation will be done with this simulator.
1.2 Position measuring
There are many ways to detect the presence of vehicles in an image sequence. One way is to first stabilize the sequence with respect to the ground. The optical flow (a velocity field for a sequence of images, see fig. 1.1) is then calculated and thresholded. The centers of gravity of the blobs (see fig. 1.1) are then calculated to get the positions. However, one would like to have the positions of the vehicles in some kind of world coordinates. This can also be done in many different ways. One way is to use a map containing positions of fixed objects (landmarks). Then, if the ground is assumed to be flat, there exists an eight-parameter bi-linear transformation

    y = (Ax + b) / (1 + c^T x)

from image to ground coordinates. Since the position and orientation of the camera varies, so do these parameters, and therefore they have to be estimated in each image. The transformation depends on having four landmarks in the image, since eight knowns are needed to calculate the eight unknowns in a linear system.
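Given four landmark correspondences, the eight parameters can be recovered from the resulting 8x8 linear system: multiplying out y (1 + c^T x) = Ax + b gives two equations per landmark that are linear in A, b and c. The sketch below is my own illustration of this idea (function names and all numeric values are invented, not from the thesis):

```python
import numpy as np

def fit_ground_transform(img_pts, gnd_pts):
    """Estimate A (2x2), b (2), c (2) in y = (A x + b) / (1 + c^T x)
    from four image/ground correspondences."""
    rows, rhs = [], []
    for (x1, x2), (y1, y2) in zip(img_pts, gnd_pts):
        # y1 * (1 + c1 x1 + c2 x2) = a11 x1 + a12 x2 + b1, and similarly for y2
        rows.append([x1, x2, 1.0, 0.0, 0.0, 0.0, -y1 * x1, -y1 * x2])
        rhs.append(y1)
        rows.append([0.0, 0.0, 0.0, x1, x2, 1.0, -y2 * x1, -y2 * x2])
        rhs.append(y2)
    p = np.linalg.solve(np.array(rows), np.array(rhs))
    A = np.array([[p[0], p[1]], [p[3], p[4]]])
    b = np.array([p[2], p[5]])
    c = np.array([p[6], p[7]])
    return A, b, c

def img_to_ground(x, A, b, c):
    x = np.asarray(x, dtype=float)
    return (A @ x + b) / (1.0 + c @ x)

# Recover invented parameters from four synthetic correspondences
A0 = np.array([[1.0, 0.1], [-0.2, 1.2]])
b0 = np.array([3.0, -1.0])
c0 = np.array([1e-3, 2e-3])
img = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (120.0, 80.0)]
gnd = [img_to_ground(p, A0, b0, c0) for p in img]
A1, b1, c1 = fit_ground_transform(img, gnd)
```

With exact correspondences the parameters are recovered exactly; in practice the landmark positions are noisy, so more than four landmarks and a least-squares solve would be preferable.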
These positions will contain an error component. In this case, some of the error is introduced when calculating the optical flow and the centers of gravity of the blobs, and some when calculating the positions of the fixed objects and then assuming them to lie in a plane. The given positions of the fixed objects are probably not very accurate either. The simulator uses a rather different way to extract the positions of the vehicles. However, independent of which method is used to get the positions, some error will be introduced.
The position of the airborne system (called the platform from now on) is likely to be determined with a differential GPS (Global Positioning System) and inertial navigation (if the platform is moving). The accuracy of this positioning depends on the GPS used (for a good differential GPS the accuracy is about one meter) and on the movement of the platform: if the platform is moving, the accuracy can be improved over time by using inertial navigation, but if the platform stops, the accuracy of the positioning will again be that of the GPS. The simulator does not take this into account; the position obtained from the simulator is the correct position of the helicopter. However, some noise could have been added to make it more realistic.
To decrease the error, the positions can be filtered, preferably with a Kalman filter.
1.3 Camera control
To track moving vehicles on the ground with a camera mounted on a platform (in this case an airborne platform), one would like to keep the vehicle in the center of the image while tracking it. To do so, one must predict the movement of the vehicle and the platform (together or separately), since they move between two image frames. One would also like to filter the position (or change in position) of the vehicle and platform, since the positioning of the vehicle and platform may be rather inaccurate and also discretized (as mentioned in the section above). One also has to calculate a control signal to the camera to track the vehicle. The prediction and filtering can be done with a Kalman filter.
1.4 Color classification
The problem with using color for classification is that the apparent colors change with the lighting (e.g. due to shadows). To use color for classification in a good way, one must make some prediction of how probable different changes in the color are. This can be done in different ways. The easiest way is to assume that the distribution of the light spectrum is constant, so that only the intensity changes. This makes it very easy to predict the changes, since this corresponds to a uniform scaling of the color components. However, in practice this assumption is not always valid. For instance, when going from direct sunlight to shadow the spectrum distribution will change, since almost all the light in the shadow has been reflected in the atmosphere. One way is to use some model for the changes, and another is to approximate the changes.
1.5 Problem definitions

1.5.1 Position filtering

The first problem considered in this thesis is to use position data to detect the vehicles, give each an identity, and finally to track them.
The position data comes from an existing program that calculates the optical flow of an image sequence (see fig. 1.1 below) and returns an estimate of the center of gravity of each moving object. The use of a Kalman filter was suggested to obtain predictions of the movements of the vehicles (used when tracking the vehicles) and filtering of the noisy position data (done to get smooth and realistic traces of the cars). The program should be able to handle the following situations.

- Vehicles disappearing during short time periods, which occurs for example when a vehicle goes under a bridge.

- Two position data become one. This happens because the optical flow filter must have an extension in space, so when two vehicles get too close to each other the filter response will be interpreted as one object.
Figure 1.1: Left: From the original sequence. Right: Optical flow
1.5.2 Camera control

The second problem considered in this thesis is to keep a specific vehicle in the center of an image while tracking it. Three different conditions should be considered.
- Known parameters: the position of the platform, the error vector, the camera angles and a camera parameter. (Condition 1)

- Known parameters: only the error vector, the camera angles and the camera parameter. (Condition 2)

- Known parameters: only the error vector and the camera angles. (Condition 3)
The error vector (the vector between the center of the image and the center of the vehicle) is given as two floats (the number of pixels in the (x, y) directions; see fig. 1.2a, where (X, Y) are fixed coordinate axes and y is along the heading of the platform).

If multiplied with the distance to the object in the center of the image, the camera parameter gives an estimate of the relation meter/pixel in the center of the image.

The control signal to the camera should be two angles: one between z (pointing down from the platform) and y, and one between z and x (see fig. 1.2b below).
Figure 1.2: a: Seen from above. b: Seen from one side.
1.5.3 Color classification
The third problem considered in this thesis is to investigate if color can be used as a means to discriminate between different vehicles, and if so, how.

The color classification should be considered for two different tasks:

- When tracking a vehicle, the color should be used to decrease the possibility of mixing vehicles up with each other.

- To re-identify a vehicle by detecting if it has the same color as earlier.
In the WITAS project, the color is not intended to be used as the only feature; it is complemented by other features, e.g. length and width.
Theoretical backgrounds

This chapter contains short presentations of some of the already existing theory which is used in this thesis.

2.1 The Kalman filter
First, a small example where the Kalman filter is useful (for more details about the Kalman filter and proofs, see e.g. [1] or [2]). Assume that we want to estimate the position (in one direction) of an object which is floating in space (no gravity and no friction) from noisy position measurements. The object weighs one kg and is affected by an immeasurable random force. If we assume that the velocity and force are approximately constant between two measurements, we can predict the position (p) and velocity (u) for the next time step according to

    u(t+1) = u(t) + T F(t)                            (2.1)
    p(t+1) = p(t) + T u(t) + (T^2/2) F(t)             (2.2)

where T is the time between two time steps. The measurement can be written

    m(t) = p(t) + e(t)                                (2.3)

where e(t) is the measurement noise. The equations (2.1) and (2.2) can be written in matrix form

    [ p(t+1) ]   [ 1  T ] [ p(t) ]   [ T^2/2 ]
    [ u(t+1) ] = [ 0  1 ] [ u(t) ] + [   T   ] F(t)   (2.4)

which is the form that the Kalman filter uses. If F(t) and e(t) are white, the Kalman filter gives estimates of the position and velocity which are optimal in a sense (eq. 2.10 is minimized).
The Kalman filter has proved useful in a broad range of areas, and like the Wiener filter, it is based on a model for the signal of interest. However, in the Kalman filter the model has been generalized to a steady state:

    x(t+1) = F x(t) + v(t)                            (2.5)
    y(t)   = H x(t) + e(t)                            (2.6)

where x is the state variables and y the measured variables; v and e are process noise and measurement noise, respectively. The stochastic processes v and e are assumed white, i.e.

    E[v(t)] = E[e(t)] = 0                             (2.7)
    E[v(t) v^T(s)] = R_1 δ(t−s)                       (2.8)
    E[e(t) e^T(s)] = R_2 δ(t−s)                       (2.9)

The Kalman filter is the solution to the problem of estimating x(t) from the measurements y(s), 0 ≤ s ≤ t, so that

    E[ |x(t) − x̂(t)|^2 ]                              (2.10)

is minimized, where x̂(t) is the filter estimate. Note that the matrices F, H, R_1 and R_2 can be functions of time. The solution is

Prediction of states:

    x̂(t+1|t) = F x̂(t|t)                               (2.11)

Update of the state estimate:

    x̂(t+1|t+1) = (I − K(t+1) H) x̂(t+1|t) + K(t+1) y(t+1)                          (2.12)

Prediction of the covariance matrix:

    C(t+1|t) = F C(t|t) F^T + R_1                     (2.13)

Update of the covariance matrix:

    C(t+1|t+1) = (I − K(t+1) H) C(t+1|t) (I − K(t+1) H)^T + K(t+1) R_2 K^T(t+1)   (2.14)

Update of the Kalman gain matrix:

    K(t+1) = C(t+1|t) H^T [ H C(t+1|t) H^T + R_2(t+1) ]^{−1}                      (2.15)

If there is a known input signal u(t), i.e. if (2.5) is replaced by

    x(t+1) = F x(t) + G u(t) + v(t)                   (2.16)

the only modification which has to be done is to replace (2.11) with

    x̂(t+1|t) = F x̂(t|t) + G u(t)                      (2.17)
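To make the recursion (2.11)-(2.15) concrete, the floating-object model of (2.4) can be filtered in a few lines of Python. This is my own illustrative sketch, not code from the thesis; the noise levels are invented, and the process covariance is derived from the input matrix [T^2/2, T]^T of (2.4):

```python
import numpy as np

def kalman_step(x_hat, C, y, F, H, R1, R2):
    """One Kalman iteration: predictions (2.11), (2.13), then the gain (2.15)
    and the measurement updates (2.12), (2.14) (Joseph form)."""
    x_pred = F @ x_hat                      # (2.11)
    C_pred = F @ C @ F.T + R1               # (2.13)
    S = H @ C_pred @ H.T + R2
    K = C_pred @ H.T @ np.linalg.inv(S)     # (2.15)
    I = np.eye(len(x_hat))
    x_new = (I - K @ H) @ x_pred + K @ y    # (2.12)
    C_new = (I - K @ H) @ C_pred @ (I - K @ H).T + K @ R2 @ K.T   # (2.14)
    return x_new, C_new

# Constant-velocity model from (2.4), T = 1; force variance 0.01 (invented)
T = 1.0
F = np.array([[1.0, T], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
R1 = 0.01 * np.array([[T**4 / 4, T**3 / 2], [T**3 / 2, T**2]])
R2 = np.array([[1.0]])

rng = np.random.default_rng(0)
x_hat, C = np.zeros(2), 100.0 * np.eye(2)   # "natural" initialization, cf. (2.22)-(2.23)
true_p, true_u = 0.0, 1.0
for _ in range(50):
    true_p += true_u * T
    y = np.array([true_p + rng.normal(0.0, 1.0)])
    x_hat, C = kalman_step(x_hat, C, y, F, H, R1, R2)
```

After a few iterations the covariance C drops well below the measurement variance, and the state estimate tracks the true position and velocity.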
2.1.1 Initialization of the Kalman filter (t=0)
According to Kalman's signal model, x(0) is a stochastic variable with

    E[x(0)] = x_0                                     (2.18)

and

    E[(x(0) − x_0)(x(0) − x_0)^T] = C_0               (2.19)

Obviously, the best initialization is to set

    x̂(0|−1) = x_0                                     (2.20)

and

    C(0|−1) = C_0                                     (2.21)

but if no advance information about x is available, a natural initialization is

    x̂(0|−1) = [0 ... 0]^T                             (2.22)
    C(0|−1) = k I                                     (2.23)

where k is a sufficiently large constant to imply that the uncertainty about the states is large. If some boundaries of the states are known, a reasonable initialization of C is

    C(0|−1) = diag(b_1^2, b_2^2, ..., b_n^2)          (2.24)

2.1.2 Setting the variances
It is often impossible to calculate the true process variance R_1. In this case one has to regard R_1 as a variable to adjust until the behavior of the filter is satisfying. It is the proportion between R_1 and R_2 that determines the behavior of the filter. If R_1 is large and R_2 is small, the filter will be fast but sensitive to noise. For small R_1 and large R_2 the opposite is true. Just setting R_2 to some reasonable value and then adjusting R_1 will often give a good result. If there are no special demands on the filter, a good way to adjust it is to minimize E[(y(t) − H x̂(t|t−1))^2]. To see if it is properly adjusted, one can then look at the so-called innovation, y(t) − H x̂(t|t−1), which should have approximately the same characteristics as e(t) in (2.6). Note that there should be no maneuvers in the test data when looking at the innovation.
2.2 Adaptive parameter estimation with Kalman filter

The parameter variations can be written as

    z(t+1) = z(t) + v(t)                              (2.25)

where z is the parameter vector and v is the change between two samples. v(t) are independent stochastic vectors with mean value zero and covariance matrix R_1(t). The relation between the parameters and the measurements can be written as

    y(t) = H(t) z(t) + e(t)                           (2.26)

where e(t) is the measurement noise. e(t) is assumed to be an independent stochastic variable with mean value zero and variance R_2(t). This is a special case of the steady state in section 2.1: set F = I and let H be a function of time. The solution is

    ẑ(t) = ẑ(t−1) + K(t) (y(t) − H(t) ẑ(t−1))                                     (2.27)

    K(t) = P(t−1) H^T(t) / [ R_2(t) + H(t) P(t−1) H^T(t) ]                        (2.28)

    P(t) = P(t−1) − [ P(t−1) H^T(t) H(t) P(t−1) ] / [ R_2(t) + H(t) P(t−1) H^T(t) ] + R_1(t)    (2.29)
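A minimal numerical sketch of the estimator (2.27)-(2.29), here tracking the two parameters of a linear measurement model with a scalar measurement. The example data, regressors and noise levels are my own inventions, not from the thesis:

```python
import numpy as np

def adaptive_step(z_hat, P, y, H, R1, R2):
    """One step of (2.27)-(2.29); H is a 1-D regressor vector, y a scalar."""
    S = R2 + H @ P @ H                         # denominator of (2.28)
    K = P @ H / S                              # gain (2.28)
    z_new = z_hat + K * (y - H @ z_hat)        # (2.27)
    P_new = P - np.outer(P @ H, H @ P) / S + R1   # (2.29)
    return z_new, P_new

# Estimate z = [offset, slope] in y(t) = z1 + z2 * s(t) + e(t)
rng = np.random.default_rng(1)
z_true = np.array([2.0, 0.5])
z_hat, P = np.zeros(2), 100.0 * np.eye(2)      # large initial uncertainty
R1, R2 = 1e-6 * np.eye(2), 0.01                # small parameter drift assumed
for t in range(200):
    H = np.array([1.0, 0.01 * t])
    y = H @ z_true + rng.normal(0.0, 0.1)
    z_hat, P = adaptive_step(z_hat, P, y, H, R1, R2)
```

Because R_1 is added to P at every step, the estimator never "falls asleep" and can keep tracking parameters that drift over time; with R_1 = 0 this reduces to ordinary recursive least squares.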
2.3 The steady state
The steady state for the cars is obtained by setting the acceleration in each direction (x and y) to be white noise; thus, in the x direction

    d^2 x / dt^2 = b ẇ(t)                             (2.30)

where

    E[ẇ(t)] = 0                                       (2.31)
    E[ẇ(t) ẇ(s)] = δ(t−s)                             (2.32)

Defining the state vector

    X(t) = [ x,  dx/dt ]^T                            (2.33)

this gives the time-continuous model

    dX(t)   [ 0  1 ]          [ 0 ]
    ----- = [ 0  0 ] X(t) + b [ 1 ] ẇ(t)              (2.34)
      dt

Solving the differential equation and making it time discrete gives the steady state

             [ 1  T ]            [ T/2  T/√12 ] [ W_1(t) ]
    X(t+1) = [ 0  1 ] X(t) + b√T [  1     0   ] [ W_2(t) ]    (2.35)

where W_1 and W_2 are independent and N(0,1) (for more details see [3]). Note that the matrix operating on W_1 and W_2 is ambiguous. A steady state with the states x, dx/dt and d^2x/dt^2 can be obtained in a similar way.
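The remark that the matrix operating on W_1 and W_2 is ambiguous can be checked numerically: any matrix M with M M^T equal to the covariance of the discretized process noise is a valid choice, and the particular M in (2.35) reproduces the exact covariance b^2 [T^3/3, T^2/2; T^2/2, T] of integrated continuous white-noise acceleration. A small sketch (my own check; the values of T and b are arbitrary):

```python
import numpy as np

T, b = 0.04, 2.0          # sample time and noise gain (arbitrary values)

# Noise matrix from (2.35): b * sqrt(T) * [[T/2, T/sqrt(12)], [1, 0]]
M = b * np.sqrt(T) * np.array([[T / 2, T / np.sqrt(12)],
                               [1.0,   0.0]])

# Exact covariance of the discretized continuous white-noise acceleration
Q = b**2 * np.array([[T**3 / 3, T**2 / 2],
                     [T**2 / 2, T]])

same = np.allclose(M @ M.T, Q)   # True: M is one valid square root of Q
```

Any other factor, e.g. the Cholesky factor of Q, would drive the same discrete model with identical statistics.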
2.4 Color and color spaces
Humans have three types of cones which are sensitive to different wavelengths of light: one for wavelengths around 600 nm (red), one around 540 nm (green) and one around 450 nm (blue). This gives a three-dimensional representation of colors, the RGB color space. This color representation is widely spread and is the one used in color television. There are many other color spaces, of which two will be presented here: HSV (Hue, Saturation and Value) and GCS (GOP Color Space). For further reading on colors and color spaces see e.g. [6], [7], [8].
2.4.1 The HSV color space
The HSV color space separates the luminance (Value) from the color part by defining the V axis in the [1,1,1] direction of the RGB space. H is the angle around the V axis, with zero corresponding to a vector pointing towards the R axis. S can be thought of as "how pure a color is" (see fig. 2.1 below). The HSV space is really a cylinder, and not a cone or hexcone as in fig. 2.1. However, the perceived change in color as saturation varies is less for dark colors (i.e. ones with a low Value parameter) than for light ones (i.e. ones with a high Value parameter), so the color space is usually distorted to form a cone to help compensate for this perception imbalance. The RGB to HSV transformation is a little strange, as can be seen below.
    max  = max(R, G, B)
    min  = min(R, G, B)
    diff = max - min
    V = max
    if max <> 0 then S = diff/max else S = 0
    if S = 0 then
        H = undefined
    else
        Rd = (max - R)/diff
        Gd = (max - G)/diff
        Bd = (max - B)/diff
        if R = max then H = Bd - Gd
        else if G = max then H = 2 + Rd - Bd
        else if B = max then H = 4 + Gd - Rd
        end
        H = H * 60
        if H < 0 then H = H + 360
    end
Figure 2.1: The HSV space, V lies in the [1,1,1] direction in the RGB space
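The pseudocode above translates directly to Python; the sketch below is my own transcription (not code from the thesis), with RGB components in [0, 1] and H returned in degrees. It can be cross-checked against the standard-library colorsys module, which uses the same algorithm with H as a fraction of a turn:

```python
import colorsys

def rgb_to_hsv(r, g, b):
    """RGB (each in [0, 1]) to HSV following the pseudocode above;
    H is in degrees and None for achromatic colors (S = 0)."""
    mx, mn = max(r, g, b), min(r, g, b)
    diff = mx - mn
    v = mx
    s = diff / mx if mx != 0 else 0.0
    if s == 0:
        return None, s, v
    rd, gd, bd = (mx - r) / diff, (mx - g) / diff, (mx - b) / diff
    if r == mx:
        h = bd - gd          # hue between yellow and magenta
    elif g == mx:
        h = 2 + rd - bd      # hue between cyan and yellow
    else:
        h = 4 + gd - rd      # hue between magenta and cyan
    h *= 60
    if h < 0:
        h += 360
    return h, s, v

h, s, v = rgb_to_hsv(0.2, 0.4, 0.6)   # a medium blue: H near 210 degrees
```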
2.4.2 The GCS
Like the HSV color space, the GCS tries to separate the luminance from the chrominance. The GCS is based on the CIE u'v' 1976 standard. The transformation from RGB space to u'v' space (chrominance space) is done by:

    X = 2.77R + 1.75G + 1.13B
    Y = R + 4.59G + 0.06B
    Z = 0.06G + 5.59B

    u' = 4X / (X + 15Y + 3Z)                          (2.36)
    v' = 9Y / (X + 15Y + 3Z)                          (2.37)

where Y is the luminance. To come to the GCS from CIE u'v' 1976, the white point in the u'v' space is moved to the origin and the Cartesian coordinate system is made polar, with argument zero pointing at green. Instead of using Y as luminance information, L = X + 15Y + 3Z is used.

2.5 From light source to camera output
A standard color camera represents each pixel with three scalars: the RGB values. The RGB value obtained at a specific pixel depends mainly on the color of the surface which projects onto that pixel, on the light sources in the scene, and on the properties of the camera (see fig. 2.2). This section contains a simplified explanation of how these issues are related. First we have the light source, which at a distance r has a spectrum L(λ, r) (λ is the wavelength). Then we have the absorption of the medium which the light travels through. This ought to be rather small if the traveled distance is rather short and the medium is air, so it is neglected. Background reflections (where the light is first reflected by other surfaces before it is reflected on the particular surface corresponding to the pixel), which are not always negligible, are not considered in this model. The light is then reflected by the surface of interest. There are basically two types of reflections: first we have the reflections where the light is reflected "inside" the surface (called sub-surface reflection or body reflection); then we have the case where the light is reflected at the interface (known as specularities, which occur for example when watching some types of surfaces from a particular angle). The second type of reflection is not considered here, so the spectrum of the reflected light will be:

    S(λ) = R(λ, a) L(λ, r)                            (2.38)

where a denotes the photometric angles. The light is then measured by the sensors (or sensor) in the camera, in this case three (RGB). The sensor response is then given by:

    p_k = ∫ F_k(λ) S(λ) dλ,   k = 1, 2, 3             (2.39)

where p_k is the response of the k-th sensor and F_k is the response function of the k-th sensor (see fig. 2.3). For a more detailed explanation see e.g. [9].
2.6 Function approximation and principal component analysis

This section, which presents two types of function approximation, polynomial and neural networks, starts with a short review of principal component analysis.
Figure 2.2: The light reflecting in the surface
Figure 2.3: Example of a set of response functions
2.6.1 Principal Component Analysis

Principal component analysis solves the problem of finding the dominant subspace of a set of vectors [v_1, v_2, ..., v_n] (a group of points (vectors)). This is done by estimating the covariance matrix C for the set:

    C = E[(v − m)(v − m)^T]                           (2.40)
    m = E[v]                                          (2.41)

where m is the mean vector for the cluster. The largest eigenvectors of C represent the dominant subspace for the set. The eigenvectors can be calculated in many different ways, e.g. by using Singular Value Decomposition (SVD).

To make this plausible, one may consider what the elements in the covariance matrix represent. On the diagonal are the variances in the different directions (along the axes), and outside the diagonal are the correlations between the different components. It is rather easy to imagine how a new set of axes should be chosen to make them uncorrelated with each other (the eigenvectors), at least if the set is fairly shaped like a line and the dimension of the space is two or three. The conclusion is that the eigenvectors with the largest eigenvalues represent the dominant subspace for the set.

For more details, see e.g. [10].
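A small sketch of the procedure: estimate the mean (2.41), center the set, and take the dominant direction from an SVD of the centered data (the right singular vectors are the eigenvectors of the covariance estimate (2.40)). The synthetic line-shaped cluster is my own illustration:

```python
import numpy as np

# An elongated 2-D point cluster: large spread along a known direction,
# small spread across it (all values invented for illustration)
rng = np.random.default_rng(2)
direction = np.array([0.8, 0.6])                  # unit vector along the "line"
t = rng.normal(0.0, 5.0, size=(300, 1))           # spread along the line
noise = rng.normal(0.0, 0.2, size=(300, 2))       # small spread across it
v = np.array([1.0, -2.0]) + t * direction + noise

# Mean (2.41), then SVD of the centered set; Vt[0] is the eigenvector of
# the covariance (2.40) with the largest eigenvalue
m = v.mean(axis=0)
_, _, Vt = np.linalg.svd(v - m, full_matrices=False)
dominant = Vt[0]
```

Note that the sign of an eigenvector is arbitrary, so only the direction (up to sign) is meaningful.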
2.6.2 Polynomial function approximation
This type of function approximation is rather easy and straightforward. The problem to solve is to fit a polynomial to a function f as well as possible. For example, a two-variable function f can be approximated with a second-order polynomial

    f̃(x, y) = a + bx + cy + dx^2 + ey^2 + gxy         (2.42)

where a, b, c, d, e, g are the parameters to be determined. This is done by minimizing the error ε, defined as

    ε = Σ_i |f(x_i, y_i) − f̃(x_i, y_i)|^2             (2.43)

This can easily be solved with the least squares method, by writing it in matrix form

    F = Z v                                           (2.44)

where

        [ 1  x_1  y_1  x_1^2  y_1^2  x_1 y_1 ]
    Z = [ ...  ...  ...  ...    ...    ...   ]        (2.45)
        [ 1  x_n  y_n  x_n^2  y_n^2  x_n y_n ]

    v = [ a  b  c  d  e  g ]^T                        (2.46)

    F = [ f(x_1, y_1)  ...  f(x_n, y_n) ]^T           (2.47)

The least squares solution is

    v = (Z^T Z)^{−1} Z^T F                            (2.48)
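The least-squares fit (2.48), and the rank-1 update of (Z^T Z)^{-1} that the text goes on to describe, can be sketched as follows. The target polynomial and the sample points are invented for illustration; the update formula is the Sherman-Morrison identity:

```python
import numpy as np

def design_row(x, y):
    # One row of Z in (2.45) for the second-order polynomial (2.42)
    return np.array([1.0, x, y, x**2, y**2, x * y])

# Fit f(x, y) = 1 + 2x - y + 0.5x^2 (a made-up target) from 50 samples
rng = np.random.default_rng(3)
pts = rng.uniform(-1.0, 1.0, size=(50, 2))
Z = np.array([design_row(x, y) for x, y in pts])
F = np.array([1 + 2 * x - y + 0.5 * x**2 for x, y in pts])

v = np.linalg.solve(Z.T @ Z, Z.T @ F)      # normal equations, (2.48)

# Rank-1 update of (Z^T Z)^{-1} when one new sample arrives,
# instead of recomputing the inverse from scratch
ZtZ_inv = np.linalg.inv(Z.T @ Z)
m = design_row(0.3, -0.7)
updated = ZtZ_inv - np.outer(ZtZ_inv @ m, m @ ZtZ_inv) / (1.0 + m @ ZtZ_inv @ m)
```

With noise-free samples the recovered coefficients equal the target exactly, and the rank-1 update reproduces the inverse of the extended normal matrix.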
The inverse might be very computationally demanding if there are many unknown variables in the polynomial. However, the inverse does not need to be recalculated for every new measurement to be included in the approximation. Instead, the old matrix just has to be modified a little, because

    Z_new^T Z_new = Z^T Z + m m^T                     (2.49)

where Z_new is the new Z matrix with the new measurement included, and m is the new variable vector

    m = [ 1  x_new  y_new  x_new^2  y_new^2  x_new y_new ]^T    (2.50)

The new inverse is then

    (Z_new^T Z_new)^{−1} = (Z^T Z + m m^T)^{−1}
        = (Z^T Z)^{−1} − [ ((Z^T Z)^{−1} m)(m^T (Z^T Z)^{−1}) ] / [ 1 + m^T (Z^T Z)^{−1} m ]    (2.51)

which does not require as much computation.

2.6.3 Artificial Neural Networks
An artificial neural network is constructed by connecting small computational units in a network; the units are called perceptrons. The perceptrons are meant to be simplified models of biological neurons. A perceptron consists of a number of inputs x[n], a transfer function S(Wx) and some outputs u[k]. In this presentation the perceptrons will only have a single output (see fig. 2.4 below).
The function S can be chosen in many different ways. Two common choices are the linear function S(Wx) = Wx and the 'sigmoid' function S(Wx) = 1/(1 + e^{−Wx}). A number of perceptrons are then connected to each other. The following presentation will only consider feed-forward networks (no closed loops in the network). A two-layered structure is presented below (fig. 2.5).
One property of the neural network is its ability to learn things. This can be used for function approximation.

Figure 2.4: A single perceptron with N inputs

The learning is done by applying inputs to which the correct outputs are known and then adjusting the W vectors (see fig. 2.4) in a smart way. The adjustment can be done in mainly two different ways: one way is to adjust them for every input signal (on-line); the other way is to first apply the whole set of input signals and then adjust the vectors (off-line). One way to adjust the W vectors is to use the backpropagation algorithm.
The backpropagation adjustment rules for the layers are:

The error function:

    E_p = (1/2) Σ_{i=1}^{K} (d_i^p − u_i^p)^2         (2.52)

where p is an index for the set of input-output signals and d_i^p is the wanted output signal for the i-th perceptron in the output layer.

Off-line:

    ΔW = −η Σ_{p=1}^{P} ∂E_p/∂W                       (2.53)

On-line:

    ΔW = −η ∂E_p/∂W                                   (2.54)

The output layer:

    ∂E_p/∂W_k = (∂E_p/∂u_p[k]) (∂u_p[k]/∂W_k)         (2.55)

Figure 2.5: Feed-forward network with one hidden layer

The hidden layer:
    ∂E_p/∂W_m = (∂E_p/∂v_p[m]) (∂v_p[m]/∂(W_m x)) (∂(W_m x)/∂W_m)                 (2.56)

where

    ∂E_p/∂v_p[m] = Σ_k (∂E_p/∂u_p[k]) (∂u_p[k]/∂(W_k v)) (∂(W_k v)/∂v_p[m])       (2.57)

and where η in (2.53) and (2.54) denotes the learning rate. For a proof of the backpropagation algorithm and more information about neural networks, see e.g. [11].
Adjustment rules

Here, two other ways to adjust the W vectors are explained. The first method adds a momentum term, a constant to adjust times ΔW_old, to eqs. (2.53) and (2.54). This is done to avoid oscillations. The second method is the delta bar delta rule, which says that if a low-pass filtered version of ΔW times ΔW is positive, the learning rate is increased, and decreased if negative. Thus, if the weights have been e.g. decreasing for a while, the learning rate increases. Both methods make the adjustment faster.
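An illustrative sketch of off-line (batch) backpropagation, per (2.52)-(2.53), for a feed-forward network with one hidden sigmoid layer and a linear output layer. Network size, target function and learning rate are my own choices, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 21).reshape(-1, 1)    # inputs, one feature
d = x**2                                        # desired outputs (made-up target)

W1 = rng.normal(0.0, 1.0, size=(1, 4))          # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, size=(4, 1))          # hidden -> output weights
b2 = np.zeros(1)
eta = 0.05                                      # learning rate

def forward(x):
    v = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))    # hidden sigmoid outputs
    return v, v @ W2 + b2                       # linear output layer

_, u0 = forward(x)
loss_start = 0.5 * np.mean((d - u0)**2)

for _ in range(5000):
    v, u = forward(x)
    err = u - d                                 # dE/du for the error (2.52)
    # Backpropagate: output layer first, then hidden layer via the chain rule
    grad_W2 = v.T @ err / len(x)
    grad_b2 = err.mean(axis=0)
    delta1 = (err @ W2.T) * v * (1.0 - v)       # sigmoid derivative is v(1 - v)
    grad_W1 = x.T @ delta1 / len(x)
    grad_b1 = delta1.mean(axis=0)
    W2 -= eta * grad_W2; b2 -= eta * grad_b2    # off-line update, cf. (2.53)
    W1 -= eta * grad_W1; b1 -= eta * grad_b1

_, u1 = forward(x)
loss_end = 0.5 * np.mean((d - u1)**2)
```

Adding a momentum term or the delta bar delta rule, as described above, would only change the last two update lines.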
Position filtering

This chapter presents an implementation of a Kalman filter for stabilizing measurements of position and velocity, according to the discussion in 1.2, and describes how the evaluation is done.

The test sequence used for variance estimation and evaluation is an 800 frames (32 seconds) long sequence from Hallunda (see fig. 3.2), containing 29 vehicles on the main road and one on the road to the right, and maybe some on the small road to the left (very hard to see).
3.1 Variance estimations

This section considers the problem of estimating the variance of the measurement noise. As mentioned in 2.1.2, the performance of the Kalman filter depends on having estimates of the variances of the measurement noise.
The variances for the positions of the vehicles are estimated in two different coordinate systems, (x, y) and (v, o), where v lies in the direction of the motion of the vehicle and o is orthogonal to v. The (v, o) system is tested since the variance is not expected to be equal in the v and o directions (a common vehicle is not square, which may cause the variances to differ in the v and o directions). To estimate the variances in the x direction, a line at + b with length 19 is fitted at every point except the nine first and last of the tracking data. This is done by minimizing

    Σ_{t=p−9}^{p+9} |at + b − x(t)|^2                 (3.1)

The variance is then estimated at each point by calculating

    Σ_{t=p−9}^{p+9} |at + b − x(t)|^2 W(t)            (3.2)

where W(t) is a Hamming window. The variance in the y direction is estimated in the same way. To estimate the variances in the (v, o) directions, the lines are fitted to the position data (x, y) instead of to (x, t) and (y, t). Then the angle of each line is calculated, and the data is rotated so that the new x direction is parallel to the corresponding line. Then a line is fitted to the rotated data and the variance is estimated as before. To obtain a scalar estimate, the mean of the calculated variances was taken. Note that the variance probably changes with the camera parameters and the distance between the camera and the vehicle. However, the necessary information needed to evaluate an adaptive system was not available.
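The local line-fit variance estimate of (3.1)-(3.2) can be sketched as follows. This is my own illustration on a synthetic track; in particular, normalizing the Hamming window to sum to one is an assumption not spelled out in the text:

```python
import numpy as np

def local_variance(x, half=9):
    """Residual variance of a track x(t) from local straight-line fits of
    length 2*half+1 (cf. (3.1)-(3.2)), skipping the first and last `half`
    points; the Hamming weights are normalized to sum to one (assumption)."""
    w = np.hamming(2 * half + 1)
    w = w / w.sum()
    out = []
    for p in range(half, len(x) - half):
        t = np.arange(p - half, p + half + 1)
        seg = x[p - half: p + half + 1]
        a, b = np.polyfit(t, seg, 1)                 # least-squares line, (3.1)
        out.append(np.sum(w * (a * t + b - seg)**2))  # weighted residuals, (3.2)
    return np.mean(out)

# A vehicle moving in a straight line with noisy measurements (std 0.25)
rng = np.random.default_rng(5)
t = np.arange(200)
track = 0.7 * t + rng.normal(0.0, 0.25, size=200)
est = local_variance(track)
```

Because two parameters are fitted over each 19-point window, the estimate is biased slightly below the true noise variance (here 0.0625), but it stays in the right range for tuning R_2.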
The measurement variance is obtained by using traces from vehicles making no maneuvers. The estimates obtained in the (v, o) directions were R_2v = 0.06 and R_2o = 0.02, so the variances in the (x, y) directions are dependent on the direction of the movement of the vehicle. The process variance is seen as a design parameter.
3.2 Implementation of the Kalman lter
This section explains how the Kalman lter is implemented (which co-ordinate system used, which variables used as states and how it is initi-ated).
The filter is implemented in four ways:

With position and velocity as states in the (x, y) coordinates.
With position and velocity as states in the (v, o) coordinates.
With position, velocity and acceleration as states in the (x, y) coordinates.
With position, velocity and acceleration as states in the (v, o) coordinates.
In the cases when the (x, y) coordinates are used, the variances R_2x and R_2y are set to 0.04, and when (v, o) is used, R_2v is set to 0.06 and R_2o to 0.02. In all four cases the process variance R_1 is seen as a design parameter; it was adjusted to filter as hard as possible under some conditions. The conditions imply that the filter should not lose track of a vehicle which accelerates with 9 m/s^2 along the v axis or with 9 m/s^2 along the o axis. The values in the conditions are selected on the basis that no normal vehicle can accelerate more than this. By doing this it was easier to handle some of the problems that occur when no position data can be found, and I do not think it is so important to get the best possible estimate of the position.
The two implementations which did not include the acceleration in the states proved to give the best results (probably since it is not so easy to make a good estimate of the acceleration when the position data is quite noisy), and since they demand less computation, the two with acceleration included were ruled out. It was harder to observe any difference between the remaining two; the variance of the filtered data was calculated, but no significant difference was found. Consequently, which to choose depends on what is seen as most important: the small amount of calculation needed when using the (x, y) coordinates, or the flexibility of the (v, o) coordinates and the fact that they should be able to give a better result if the filter were adjusted to minimize the error.
The initiation of the filter is done by setting the prediction of the states to zero (which implies that we do not know anything about the position) and setting the covariance matrix to

    [ s^2   0  ]
    [  0   v^2 ]                                                   (3.3)

where s is the width of the viewable area in meters (the position data received was given in meters) and v is some possible velocity (the speed limit ought to be a good value; it is set to 25 m/s in this filter).
3.3 Tracking of the vehicles

This section explains how the tracking of the vehicles is performed, mentions some problems encountered and explains how these were solved.

In the first frame a (Kalman) filter is applied to each position measurement, and in the consecutive frames to every position measurement not already taken. When to remove a filter is explained later. The predictions from the filters are separated into two groups: one labeled 'vehicles' and one 'not vehicles'. To be classified as a vehicle, the filter must have tracked something during a number of samples. This is done to avoid classifying noise as a vehicle.
The tracking is done by taking the position predictions from the filters labeled 'vehicles' and calculating the distance between each prediction and each measured position. This gives a matrix where the value in row k and column l is the distance between prediction k and measurement l. Then each prediction is connected to the measured position closest to the prediction, under the condition that the distance is less than some reasonable value (the movements of the vehicles have some restrictions). This is done when the (x, y) coordinates are used; when the (v, o) coordinates are used, an elliptical distance measure

    D = sqrt( 3 d_v + 5 d_o )

is used instead, since the variance is greater in the v direction than in the o direction. In this way a vector pos is created, see [5]. Example: if the measured position closest to prediction number five is measurement number 2, this gives pos(5) = 2.

If two or more predictions are connected to the same measurement, the ones with the largest distances are not allowed to connect to that measurement, and the connecting step is done again. This is repeated until no two predictions have the same measurement. If there are any measurements left, they are used in the same way with the predictions from the filters labeled 'not vehicles'. If some filter labeled 'not vehicles' does not get connected to any measurement, it is removed. If some filter labeled 'vehicles' does not get connected to any measurement, four things may have happened:
The vehicle drives out of the viewable area.
The vehicle disappears behind some object, e.g. a bridge.
The optical flow program fails to detect the vehicle, e.g. because the car has no motion anymore (it has stopped).
The vehicle gets too close to another vehicle.
In the first three cases there is not much to do without knowing anything about the context (the road, buildings etc.), so the filter keeps predicting the position for a number of steps, and the search area is increased for each step. If no connection is found the filter is removed.
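The association step above (each prediction takes its nearest measurement, and conflicts are re-resolved until no two predictions share a measurement) can be sketched as follows; the gating distance and all names are illustrative assumptions.

```python
import numpy as np

def associate(predictions, measurements, gate=10.0):
    """Return pos, where pos[k] is the measurement index for prediction k
    (or -1 if no measurement within the gate could be assigned)."""
    d = np.linalg.norm(predictions[:, None, :] - measurements[None, :, :],
                       axis=2)                 # distance matrix, row k, col l
    d[d > gate] = np.inf                       # "some reasonable value"
    blocked = np.zeros_like(d, dtype=bool)
    while True:
        cost = np.where(blocked, np.inf, d)
        pos = np.where(np.isfinite(cost.min(axis=1)),
                       cost.argmin(axis=1), -1)
        clash = False
        for m in set(pos[pos >= 0]):
            claimants = np.flatnonzero(pos == m)
            if len(claimants) > 1:             # conflict: keep the closest,
                keep = claimants[np.argmin(d[claimants, m])]
                for k in claimants:
                    if k != keep:
                        blocked[k, m] = True   # forbid the rest and re-match
                clash = True
        if not clash:
            return pos

preds = np.array([[0.0, 0.0], [1.0, 0.0], [50.0, 50.0]])
meas = np.array([[0.4, 0.0], [1.2, 0.1]])
print(associate(preds, meas))                  # third prediction unmatched
```

A prediction that ends up with -1 corresponds to a filter that gets no measurement, which is where the four cases above come in.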
In the case when two vehicles get too close to each other, three different actions can be chosen: let one vehicle overtake the other, let one vehicle drive after the other, or just let them go straight forward according to the predicted velocities. Which alternative to choose depends on the length of x (see fig. 3.1 below) and the velocity vectors.

Figure 3.1: The dot is the measurement; x is orthogonal to v, the velocity vector.
If the scalar product of the two velocity vectors is negative (the vehicles meet each other) they are set to just go straight forward.
If the scalar product of the two velocity vectors is positive and the length of x is smaller than some number, the vehicles are set to drive after each other. If it is greater, the vehicle behind the other (car1) is set to drive past the other (car2). This is done by predicting how car1 will move relative to the measurement. First, the point where car1 should be when the two vehicles separate, relative to the measurement, is calculated by mirroring car1's position in the line spanned by x. Then the time for this movement is estimated by making the assumption that the measurement is the center of gravity of the two vehicles together. This gives

    v_m = (v_1 + v_2) / 2                                          (3.4)
    v_r = v_1 - v_m                                                (3.5)

where v_1 and v_2 are scalar velocities for car1 and car2, respectively, v_m is the estimated velocity of the measurement and v_r is the estimated velocity of car1 relative to the measurement. The time for the event is then obtained by just dividing the distance by v_r. The trace is then easily calculated, and car2 is just placed on the opposite side of the measurement.
3.4 Evaluation
The evaluation which is presented in this section is mainly done by looking at the filtered traces to see if the tracker lost track of any vehicle or mixed vehicles up with each other, and to see if the tracking looks natural (it is not natural if the vehicles drive over each other or jump large distances). The variance was also estimated for the traces.

The first part of the evaluation is rather subjective, but the program did not lose track of the vehicles or mix them up with each other in the test sequence, and I think the tracking looks rather natural, except that vehicles can appear from nowhere when two cars enter the viewable area closely together and then separate (see fig. 3.2 below). However, these problems could not be avoided. Another fault observed in the test sequence was that noise was sometimes classified as a vehicle (one clear case and three not so obvious, since it is not clear whether it is a vehicle or noise). This fault can easily be removed by increasing the number of steps it takes to be classified as a vehicle, but when a measurement is labeled 'not vehicles' it is very vulnerable, since it has a lower priority and is not allowed to disappear during a single frame, so a compromise had to be made. Another fault was that sometimes the estimated positions of the vehicles were too close to each other during overtaking; this was to some extent avoided by setting a minimum value on x (see fig. 3.2) when the maneuver was classified as an overtaking. However, this was not avoided for one of the five overtakings in the test sequence. This was partially caused by the position data switching between one and two measurements a couple of times, and partially by some drawbacks in the routine that connects the measurements to the filters. The routine connects (as described before) the predictions with the measurements closest to them, which causes a problem during an overtaking, because the prediction for the vehicle being passed gets closer to the measurement belonging to the passing vehicle than to its own measurement, and this distance is shorter than the distance between the passing vehicle and this measurement. The result is that the filters get each other's measurements, which makes the predictions approach each other. This would have caused the program to mix them up with each other if the two measurements had not become one again. The variance estimates decreased from
R_x = 0.04, R_y = 0.04 to R_x = 0.002, R_y = 0.002.

Camera control
This chapter presents how the camera control is implemented (to be able to track a vehicle) and how the evaluation is performed.
4.1 Implementation
This section presents how the camera control is implemented for the three different conditions mentioned in subsection 1.5.2.
Known parameters: The position of the vehicle and the position of the platform relative to a fixed point on the ground, the heading of the platform, the error vector, the camera angles and a camera parameter. (Condition 1)

Known parameters: Only the error vector, the camera angles and the camera parameter are known. (Condition 2)

Known parameters: Only the error vector and camera angles are known. (Condition 3)
When an equation is only shown for the x direction, it is equivalent for the y direction.

4.1.1 Condition 1
The predictions of the vehicle and platform movements are done with two Kalman filters with position and velocity as states. Since the error is given in pixels while the predicted movements are in meters, and the two are also given in different coordinate systems, the error is transformed to meters and the predicted movements are transformed to the coordinate system of the platform. To do this, the distance D between the platform and the vehicle is calculated according to

    D = sqrt( p_pZ^2 + (p_vX - p_pX)^2 + (p_vY - p_pY)^2 )         (4.1)

where p_v and p_p are the estimated vehicle and platform positions, respectively (they come from the Kalman filters, in meters). The error in meters, eps_m, can then be calculated as

    eps_m = (eps_p / c) D                                          (4.2)

where eps_p is the error vector in pixels and c is the camera parameter. The predicted movement to the next frame in the platform coordinates, v_pp, can be estimated as

    v_pp = [ sin(phi_p)  -cos(phi_p) ] v_pf                        (4.3)
           [ cos(phi_p)   sin(phi_p) ]

where v_pf is the predicted movement to the next frame for the platform in the fixed coordinates, and phi_p is the angle between the X and y axes (see fig. 1.2b). The vehicle movement v_vf is transformed in the same way. Then the new control angles are easily calculated, according to

    alpha_xn = arctan( (eps_mx + v_vpx - v_ppx + (v_ppZ + p_pZ) tan(alpha_x)) / p_pZ )    (4.4)

A Kalman filter has also been implemented to compensate for the rotation of the helicopter. The filter uses the heading angle and the angular velocity as states. This filter is independent of the other filter, with position and velocity as states, since a helicopter is not constrained to move in the heading direction. The compensation in meters is obtained by calculating

    r  = sqrt( (p_vX - p_pX)^2 + (p_vY - p_pY)^2 )                 (4.5)
    dy = r ( cos(phi_p - d_phi_p) - cos(phi_p) )                   (4.6)
    dx = -r ( sin(phi_p - d_phi_p) - sin(phi_p) )                  (4.7)

where r is the radius from the rotation center, dx and dy are the compensation factors in meters and d_phi_p is the predicted change in angle.

4.1.2 Condition 2
Now no positions are known, but a normalized position of the vehicle relative to the platform can easily be estimated. The estimate is then Kalman filtered to get a prediction of the movement of the vehicle relative to the platform.

    D_norm = sqrt( 1 + tan(alpha_x)^2 + tan(alpha_y)^2 )           (4.8)

    p_xnorm = (eps_xp / c) D_norm + tan(alpha_x)                   (4.9)

The new angle is then given by

    alpha_xn = arctan( (eps_xp / c) D_norm + v_vpxnorm + tan(alpha_x) )    (4.10)

since

    arctan( ( (eps_xp / c) D + v_vpx ) / p_pZ + tan(alpha_x) ) = (4.10)    (4.11)

4.1.3 Condition 3
Now when no camera parameter is known, the program must estimate the camera parameter or some other related parameter. The error can be written as

    eps_xm = p_xm - h tan(alpha_x)                                 (4.12)

where p_xm is the position of the vehicle relative to the platform (in meters) and h is the height of the platform above the ground (in meters). If we then differentiate (4.12) with respect to t we get

    d/dt (eps_xm) = v_xm - h d/dt (tan(alpha_x))                   (4.13)

where v_xm is the vehicle's velocity in m/s. However, the system is discrete, so a finite difference is used:

    (eps_xmn - eps_xm)/T = v_xm/T - h (tan(alpha_xn) - tan(alpha_x))/T     (4.14)

where v_xm is the movement of the vehicle between two frames. Multiplication with cT/D gives

    eps_xpn - eps_xp = v_xp - (c / D_norm) (tan(alpha_xn) - tan(alpha_x))  (4.15)

Here we have the two parameters we want to estimate: the movement in pixels and the camera parameter. The rest is known: the change in error is measured, D_norm can be calculated and the old and new camera angles are known. So the problem turns out to be of the type

    y = a + bx                                                     (4.16)

where (y, x) are known and (a, b) are unknown and vary with time. This problem can be solved in many different ways; here are some of the methods tested.
For each pair of (x, y), (a, b) can be calculated (two equations and two unknowns) if the new and old x, and the new and old y, differ from each other respectively. The a:s and b:s are then low pass filtered.

A single perceptron (as used in neural networks) updated with the back propagation algorithm without momentum.

The least squares method, with and without weighting.

With a Kalman filter.

The first two methods did not give acceptable results, so they are not presented here. Furthermore, since unweighted least squares is a special case of the weighted version and it did not give as good results as the weighted, it is not presented here either.
Weighted least squares

To solve the problem with weighted least squares, it is written with matrices,

    y = Ax                                                         (4.17)

where

    y = [ y[1]  y[2]  y[3]  ...  y[n] ]^T                          (4.18)

    A = [ 1  x[1] ]
        [ 1  x[2] ]
        [ ...     ]                                                (4.19)

and

    x = [ a  b ]^T                                                 (4.20)

The least squares method gives the solution to the overdetermined system that minimizes ||Ax - y||^2, and the weighted least squares solution minimizes ||W(Ax - y)||^2, where W is the weight matrix. The solution is given by

    x = (A^T W^T W A)^{-1} A^T W^T W y                             (4.21)

where the columns in A must be linearly independent (this is checked before solving the equation; if they are not linearly independent, the old a and b are used). In this problem n is set to 10 (10 measurements are used to estimate a and b) to make the estimates follow changes in a and b rather quickly, and W is set to a diagonal matrix with the diagonal elements increasing with increasing indices, to imply that the new measurements are more reliable than the old.
Adaptive parameter estimation with a Kalman filter

The parameter variations can be written as

    z(t + 1) = z(t) + v(t)                                         (4.22)

and the relation between the parameters and the measurements can be written as

    y(t) = H(t) z(t) + e(t)                                        (4.23)

where e(t) is the measurement noise. A Kalman filter is then applied to this, as explained in section 2.2.
In this problem R_1 is time dependent, since a is the movement in pixels and the variation in a depends on the distance and the camera parameter. However, the distance is not known, so R_1 is set to a constant value. The variation in b depends on the normalized distance. R_2 is assumed to be time independent, since y is given in pixels (the measurement error of the error in pixels ought to be rather independent of the distance and the camera parameter). In this case

    H = [ 1  x ]                                                   (4.24)

and

    z = [ a  b ]^T                                                 (4.25)

To avoid instability caused by numerical problems, the value of

    H(t) P(t-1) H^T(t)                                             (4.26)

is checked. If it is negative it is set to zero. It should always be greater than zero, since P(t) is positive definite (P(t) is a covariance matrix). P(t) should also always be symmetric. This is guaranteed by setting

    P(t) = ( P(t) + P^T(t) ) / 2                                   (4.27)

4.2 Evaluation
The evaluation was mainly done with a small test track where a vehicle first drives with constant velocity along a straight line and then along half a turn of a circle. The platform was stationary to make it easy to compare the different solutions, and the camera parameter was fixed. Approximately white, normally distributed noise with varying variance was added to the position of the vehicle. In this test environment all implementations gave approximately equally good results. However, if the zoom of the camera was changed during the test track, the implementations with adaptive parameter estimation obviously did not perform as well as the other ones. Since the implementation constrained by condition one was going to be used, the other ones were not tested any more than this. However, if an implementation under condition three is going to be used, the one using a Kalman filter is preferable, since it is much easier to adjust to perform well.
No numbers on how well the different implementations perform are presented (mean deviation and mean variance), since these depend on so many different things (for example the movement of the car and platform, the measurement noise, the camera parameter and the change of it, the height of the platform and so on), so they would probably not be any usable information.
The condition one implementation shall be further tested in the simulator mentioned in section 1.1. It simulates a helicopter with a camera attached to it; the helicopter is flying above a landscape and some vehicles are driving on a road network in the landscape. One thing the helicopter should be able to do is to follow a vehicle and keep it in the middle of the camera image. The measurement variance for the vehicles and the helicopter shall be estimated and the process variance set to some appropriate value. The measurement variance can be estimated in the same way as in section 3.1. The process variance shall be selected in some suitable way; the case when the car disappears is handled by another program. One way might be to select the parameters that minimize the error under "normal" maneuvers; one must, however, make sure that the filter can handle large maneuvers.
Color classification
This chapter presents some methods for using color to make classification of vehicles more robust under changing illumination.
5.1 Implementation
A large part of the time spent on color classification was used for finding material about this subject (both general information and work done in the area) and then reading it. However, none of the methods found were seen to be applicable to this problem. The reason is that all methods found either make some restriction on the environment (e.g. that the object must have more than one color) that is not valid in this case, or require some information which is not available (e.g. knowing how the color of the object looks under the different possible lighting conditions).
One thing one must assume is that the spectrum of the lighting changes slowly over frequencies, or that the reflectance function R in eq. (2.38) is quite smooth, or both. These assumptions say that two points in the same region of the RGB-space will change in roughly the same way. Another thing assumed is that the scene only contains two different types of illumination (e.g. sun and shadow, or street light and shadow) at the same time.
The plan was first to reduce the 3-dimensional RGB-space to a 2-dimensional space independent of the luminance. The reason for this is that the main problem was assumed to be the illumination changes caused by shadows (and it was so in the test sequences), and one would like to think that the luminance is the only thing changing (see fig. 5.1 below, an RGB plot for six cars in varying lighting, with 40 measurements for each car). Based on this approach, the following methods were tested:

The HS components in the HSV color space (the H component alone was also tried).
The chrominance components in the GCS.
To project the RGB values on a plane with its normal parallel to the dominant direction of change in the RGB space.
To use a polynomial transformation that minimizes the spread.

However, the luminance has to be used as a third component to distinguish two colors with different luminance (e.g. black and white).
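The projection method in the list above can be sketched with principal component analysis; the cluster data and all names are illustrative assumptions.

```python
import numpy as np

def project_out_dominant(rgb):
    """Project (n, 3) RGB measurements of one vehicle on the plane
    orthogonal to their dominant direction of change."""
    centered = rgb - rgb.mean(axis=0)
    # The eigenvector of the covariance with the largest eigenvalue is
    # the dominant direction (here: assumed to be luminance-like).
    vals, vecs = np.linalg.eigh(np.cov(centered.T))
    plane = vecs[:, np.argsort(vals)[:2]]      # the two other eigenvectors
    return rgb @ plane                          # (n, 2) projected values

# A synthetic cluster stretched along one direction, mimicking a vehicle
# seen under varying lighting; the projection collapses that variation.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 1.0, (200, 1))
cluster = np.array([120.0, 60.0, 40.0]) + t * np.array([80.0, 40.0, 27.0])
p = project_out_dominant(cluster)
print(p.std(axis=0))                            # small spread in both axes
```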
Figure 5.1: RGB plots for a red, a white, a blue, a green, a yellow and a black vehicle.
However, these methods did not give very good results (see the evaluation). Therefore some other methods were tested. These methods are based on the idea of getting a function that approximates the angles of the direction of change in the different parts of the RGB-space (see fig. 5.2). The method should also be able to adapt to changes in these directions.
One method tested was to use a model of how the RGB values change with changing lighting. The model used was the simplest possible, a diagonal matrix D. The matrix describes the change in the RGB values R caused by a specific change in the lighting L:

    R_{L+dL} = D R_L                                               (5.1)

If the change in lighting is k dL, the change in the RGB values will be

    R_{L+k dL} = D^k R_L                                           (5.2)

This can be made continuous,

    R_{L1} = F(t) R_{L2}                                           (5.3)

where F(t) can be written as

    F(t) = e^{Mt} = diag( e^{k_R t}, e^{k_G t}, e^{k_B t} )        (5.4)

The constants k can be calculated by selecting two points from a suitable cluster (many measurements and a large difference between the largest eigenvalue and the other two); the points should correspond to measurements in sun and shadow, respectively, and then e.g. t = 1 is set to correspond to this change. The angles corresponding to the direction of change can then easily be calculated by first calculating the tangent to the curve. This model assumes that the camera sensors are linear.
Two different function approximations were used: polynomial function approximation using the mean squares method (see 2.6.2) and neural networks (see 2.6.3). No complete program has been made. However, the method is meant to work in the following way. When tracking of a vehicle shall begin, the RGB vector of its color is calculated. The vector is then sent to the function approximation program, which returns the two angles. Then an ellipsoidal probability field is calculated around the RGB vector, e.g.

    p = 9z^2 + k y^2 + x^2                                         (5.5)

where k is a normalization constant and (x, y, z) are the distances from the RGB vector, with z in the direction described by the angles. The weighting of the z distance should vary with the distance from the origin, since light colors change more with varying lighting than dark colors do. The probability field should probably also be non-symmetric, since e.g. a light color in a shadow can change much more in RGB values than a dark color outside the shadow. This can be achieved by weighting the z distance differently depending on whether the z distance is positive or negative. In the next step the real tracking begins, and the RGB vectors of all vehicles (or all possible vehicles) in the image are sent to the program. The program returns the probabilities for the vehicles, to help the tracker choose the correct vehicle. The tracker then sends the RGB vector of the chosen vehicle to the program, which calculates a new reference point for the probability field by weighting the old reference point with the new RGB vector in some way (e.g. a mean value). And so on, until the vehicle is no longer interesting. To later see how likely it is that a vehicle is the same as this one, the probability is calculated with respect to the reference point.
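The ellipsoidal field of eq. (5.5) can be sketched as follows; the construction of the frame around the z direction and all names are my own assumptions.

```python
import numpy as np

def ellipsoid_score(obs, ref, z_dir, k=1.0):
    """Evaluate eq. (5.5) for an observed RGB vector obs around the
    reference point ref, with z along the given direction of change."""
    z_hat = z_dir / np.linalg.norm(z_dir)
    # Complete an orthonormal frame: any two unit vectors orthogonal
    # to z_hat will do for the x and y directions of eq. (5.5).
    tmp = np.array([1.0, 0.0, 0.0])
    if abs(tmp @ z_hat) > 0.9:
        tmp = np.array([0.0, 1.0, 0.0])
    x_hat = np.cross(z_hat, tmp)
    x_hat /= np.linalg.norm(x_hat)
    y_hat = np.cross(z_hat, x_hat)
    d = obs - ref
    x, y, z = d @ x_hat, d @ y_hat, d @ z_hat
    return 9.0 * z**2 + k * y**2 + x**2        # eq. (5.5), as in the text

ref = np.array([100.0, 60.0, 40.0])
# An observation displaced purely along an assumed direction of change:
print(ellipsoid_score(ref + 5.0, ref, np.array([1.0, 1.0, 1.0])))
```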
The adaptation for the model based method can be done by just updating the matrix (as mentioned above, or with some mean of matrices) at constant time intervals or whenever it is seen to be necessary. The adaptation for the approximating function can be done in many ways. One easy way that might work is to divide the RGB-space into a number of sections (one for rather red colors, and so on). In each section one cluster of RGB measurements is selected and used for training. When a new vehicle has been tracked, principal component analysis is used to get the dominant direction and a variance estimate in that direction. If the variance is small compared to the distance to the origin, the cluster is not used for training. If it is not so small, it is examined which section the cluster belongs to. If the difference between the new and old directions is rather small nothing is done, and maybe also if the difference is very large. If the difference is not too small (and not too large), the old cluster is replaced with the new one and the approximating functions are updated. This is not optimal, since it ought to be possible to some extent to predict how the directions in all sections change if the change in one of them is known.
Figure 5.2: A vector field obtained from an approximating function.
5.2 Evaluation
The evaluation data was made from a number of video sequences taken from a helicopter. The first video sequence was stored as Super-VHS. This was not sufficient, since the bandwidth for the colors is rather small in Super-VHS. The result is that the colors are smoothed out, so that the vehicles get a mixture of the vehicle color and the background color. However, some video sequences stored on Digital BetaCAM were found (the RGB values are stored separately, so that the bandwidth for the colors is the same as the bandwidth for the intensity). Some parts of these sequences were digitized and used to extract test data. The extraction of the test data was done by taking the RGB values from a 3x3 pixels large neighborhood on the vehicle, and then calculating a weighted mean of it. This is done for 40 consecutive frames for each vehicle. The vehicles are rather small in the video sequences (maybe 5-10 pixels). This makes it hard to see the borders of the car, which makes it hard to be sure that the 3x3 neighborhood is inside the borders of the vehicle all the time. This, together with the sampling of the image and the cars not having a uniform color (e.g. the interior), has probably introduced some errors. The extraction of the RGB clusters was done for a total of 34 vehicles.
Mainly six clusters (the clusters in fig. 5.1) are used for evaluation. These clusters were selected to cover as large a part of the RGB-space as possible.
5.2.1 HSV and GCS
These two color spaces were quite quickly dismissed, since they do not work. Fig. 5.3 below shows the HS-plane for the six vehicles. Ideally there would be six separated dot-shaped clusters; since this is clearly not the case, no more needs to be said about this. The GCS gave more or less the same result, so this space is not evaluated further either.
5.2.2 Projection in the dominant direction and polynomial transformation

The direction to project in was chosen as the mean of the dominant directions of the six clusters. The result of projecting these clusters is shown in fig. 5.4. The evaluation data should not be the same as the learning data if a proper evaluation is to be made. However, if a method does not work as well as demanded on the learning data, it certainly will not work well enough on the evaluation data. The result is quite good, except for the red one (the cluster to the right), and therefore this method was also excluded.
Figure 5.3: The HS-plane for the six clusters.
One way is to select the points as the mean value of the projection of the respective cluster in the dominant direction (as mentioned above). The result for a second order polynomial is shown in fig. 5.5. The yellow (at the bottom) and the red (to the right) have improved a little. However, it can be seen from fig. 5.6 (curves of the points in the RGB-space that are transformed to selected points in the 2-dimensional space) that the selection of the points is not so good, since the curve for the red (at the bottom) is very bent. This can obviously cause very large errors. No good way could be found to avoid this problem while keeping the method adaptable, which it was supposed to be. Consequently, this method was also excluded.
5.2.3 Angle approximation
The method using a model was tested by just looking at the lines defined by eq. (5.4) when t goes from 0 to 1, for some point in each cluster (see fig. 5.7). This was done for about 20 clusters, and all except the yellow one seemed to be OK (see fig. 5.7). There was only one yellow car in the test sequence, so it is not really clear whether it is the model that does not work, or whether there are some large errors in the measurements of the RGB values for this vehicle (I could not find any reason for this when looking at how the data had been extracted). The result is that this method is excluded until it is clear what causes the large error for the yellow cluster.
The evaluation of the two function approximation methods was done a little more properly. Both were trained with the set of six clusters,