Investigations in Tracking and Colour Classification

(1)

Institllti'onen for systemteknik

Department of Electrical Engineering

Examensarbete

TE

AndersMoe Reg nr: LiTH-ISY-EX-1967 7 December 1998

LINKOPINGS UNIVERSIIET

(2)

Investigations in Tracking and Colour Classification

Examensarbetet utfOrt i Bildbehandling yid Tekniska Hogskolan i Linkoping av

Anders Moe

Reg nr: LiTH-ISY-EX-1967

Handledare:

Klas Nordberg, Linkopings Universitet, Sverige Johan Wiklund, Linkopings Universitet, Sverige

Examinator:

Klas Nordberg, Linkopings Universitet, Sverige

(3)

Avdelning, Institution Division, department

Department of Electrical Engineering Computer Vision

Sprak Rapporttyp ISBN Language Report: category

D SvenskalSwedish

o

Licentiatavhandling ISRN

o EngelskaJEnglish o Examensarbete

o

C-uppsats _{Serietitel och serienummer}

o

D-uppsats Title of series, numbering

0

o

bvrig rapport

0

LiTH-ISY-EX- 1967

URL fOr elektronisk version

Titel

Title _{Investigations in Tracking and Colour Classification}Undersokningar inom fOljning och fargklassificering

Forfattare Anders Moe

Author Sammanfattning Abstract ISSN Datum Date 1998-12-07

Den har rapporten behandlar huvudsakligen tre olika problem. Det fOrst problemet ar hur man ska filtrera fordons positions data. For att gora detta maste fordonen Ioljas. Detta ar gjort med ett KalmanfiIter. Det andra problemet var att styra en kamera sa att ett givet fordon ligger mitt i bild, tre olika fOrhallande har betraktats. Detta lOstes huvudsakligen med ett Kalmanfilter. Det sista problemet var hur man ska anvanda fordonens farg sa att man far siikrare klassificering av demo Nagra forslag pa hur detta kan goras ges, men ingen riktigt bra metod har hittats.

In this report, mainly three different problems are considered. The first problem considered is how to filter position data of vehicles. To do so the vehicles have to be tracked. This is done with Kalman filters. The second problem considered is how to control a camera to keep a vehicle in the center of the image, under three different conditions. This is mainly solved with a Kalman fil-ter. The last problem considered is how to use the color of the vehicles to make classification of them more robust. Some suggestions on how this might be done are given. However, no really good method to do this has been found.

(4)

Abstract

In this report, mainly three dierent problems are considered. The rst problem considered is how to lter position data of vehicles. To do so the vehicles have to be tracked. This is done with Kalman lters. The second problem considered is how to control a camera to keep a vehicle in the center of the image, under three dierent conditions. This is mainly solved with a Kalman lter. The last problem considered is how to use the color of the vehicles to make classication of them more robust. Some suggestions on how this might be done are given. However, no really good method to do this has been found.

(5)

Acknowledgments

I would like to thank the people at the Computer Vision lab at ISY for taking their time answering all kinds of questions and helping me with the equipment.

I thank my examiner Klas Nordberg and my opponent for correcting this thesis and giving suggestions on improvements.

(6)

Introduction

This thesis is divided into three parts which are more ore less connected to each other. The rst part deals with tracking and ltering of vehicles, the second with the problem of controlling a camera so that the tracked vehicle is kept in the center of the image and the third with how the color of the vehicles should be used to separate them from each other. All three problem domains are related to the WITAS project described below.

1.1 WITAS

Wallenberg laboratory on Information Technology and Autonomous Sys-tems (WITAS) is formed by four research groups, one from the Depart-ment of Electrical Engineering (Computer Vision Laboratory) and three from the Department of Computer and Information Science at Linkoping University.

WITAS is engaged in basic research in the area of intelligent au-tonomous vehicles and other auau-tonomous systems. The main goal is to develop an airborne computer system which is able to make rational de-cisions. The decisions should be based on various information, some of the information sources should be: pre-stored geographical information, vision sensors and information sent to it by radio. For more information, see the WITAS home page [12].

One of the objectives of the airborne system should be to supervise trac (e.g. detect queues). To do so the positions of the observed vehi-cles must be extracted from images taken by a camera on the system. Each vehicle should also be given a specic identity while they are ob-served, and a ltered position and an estimated velocity should also be extracted. And if a specic vehicle should be tracked during any longer

(9)

time period, one would like to keep it in the center of the observed area. Another thing one would like to do is to use the color of the vehicles to decrease the probability of mixing them up with each other. These are the problems considered in this thesis.

A simulator has been developed in WITAS. It simulates a helicopter with a camera attached to it. The helicopter is ying above a landscape and some cars are driving on a road network in the landscape. Some of the evaluation will be done with this simulator.

1.2 Position measuring

There are many ways to detect the presence of vehicles in an image sequence. One way is to rst stabilize the sequence with respect to the ground. The optical ow (a velocity eld for a sequence of images, see g. 1.1) is then calculated and thresholded. The center of gravity of the blobs (see g. 1.1) are then calculated to get the positions. However, one would like to have the positions of the vehicles in some kind of world coordinates. This can also be done in many dierent ways. One way is to use a map containing positions of xed objects (landmarks). Then, if the ground is assumed to be at there exists an eight-parameter bi-linear

transformation (

y

= Ax+b

1+c

Tx) from image to ground coordinates. Since the

position and orientation of the camera varies, so do these parameters and therefore they have to be estimated in each image. The transformation depends on having four landmarks in the image. This is since it needs eight known to calculate eight unknown in a linear system.

These positions will contain an error component. In this case, some of the error is introduced when calculating the optical ow and center of gravity of the blob, and some when calculating the positions of the xed objects and then assuming it to lie in a plane. The given positions of the xed objects is probably neither so accurate. The simulator uses a rather dierent way to extract the positions of the vehicles. However, independent of which method used to get the positions, some error will be introduced.

The position of the airborne system (will be called platform from now on) is likely to be determined with a dierential GPS (Global Posi-tioning Sensor) and inertial navigation (if the platform is moving). The accuracy of this positioning depends on the GPS used (for a good dif-ferential GPS the accuracy is about one meter) and the movement of the platform (if the platform is moving the accuracy can be improved over time by using inertial navigation. However, the error component

(10)

stops, and the accuracy of the positioning will be the accuracy of the GPS). The simulator does not take this into account, thus the position obtained from the simulator was the correct position of the helicopter. However, some noise could have been added to make it more realistic.

To decrease the error the positions can be ltered, preferably with a Kalman lter.

1.3 Camera control

To track moving vehicles on the ground with a camera mounted on a platform (in this case an airborne platform), one would like to keep the vehicle in the center of the image while tracking it. To do so one must predict the movement of the vehicle and the platform (together or separately) since they move between two image frames. One would also like to lter the position (or change in position) of the vehicle and platform, since the positioning of the vehicle and platform may be rather inaccurate and also discretized (as mentioned in the section above). One also has to calculate a control signal to the camera to track the vehicle. The prediction and ltering can be done with a Kalman lter.

1.4 Color classication

The problem with using color for classication is that the apparent colors change with lighting (e.g. due to shadows). To use color for classication in a good way, one must make some prediction on how possible dierent changes in the color are. This can be made in some dierent ways. The easiest way is to assume that the distribution of the light spectrum is constant, thus only the intensity changes. This makes it very easy to predict the changes, since this correspond to a uniform scaling of the color components. However, in practice this assumption is not always valid. For instance, when going from direct sun light to shadow the spec-trum distribution will change, since almost all the light in the shadow has been re ected in the atmosphere. One way is to use some model for the changes, and another is to approximate the changes.

1.5 Problem denitions

1.5.1 Position ltering

The rst problem considered in this thesis is, to use position data to detect the vehicles, give each an identity, and nally to track them.

(11)

The position data comes from an existing program that calculates the optical ow on an image sequence (see g. 1.1 below) and returns an estimate of the center of gravity of the moving objects. The use of a Kalman lter was suggested to obtain predictions of the movements of the vehicles (used when tracking the vehicles) and ltering of the noisy position data (is done to get smooth and realistic traces of the cars). The program should be able to handle the following situations.

Vehicles disappearing under short time periods, which occurs for

example when a vehicle goes under a bridge.

Two position data become one. This happens because the optical

ow lter must have an extension in space, so when two vehicles gets too close to each other the lter response will be interpreted as one object.

Figure 1.1: Left: From the original sequence. Right: Optical ow

1.5.2 Camera control

The second problem considered in this thesis is to keep a specic vehicle in the center of an image while tracking it. Three dierent conditions should be considered.

(12)

of the platform, the error vector, the camera angles and a camera parameter. (Condition 1)

Known parameters: Only the error vector,the camera angles and

the camera parameter are known. (Condition 2)

Known parameters: Only the error vector and camera angles are

known. (Condition 3)

The error vector (the vector between the center of the image and the center of the vehicle) is given as two oats (number of pixels in the

(

x;y

) directions, see g. 1.2a where (

X;Y

) are xed coordinate axes,

y

is along the heading of the platform).

If multiplied with the distance to the object in the center of the

im-age, the camera parameter gives an estimation of the relation

meter=pixel

in the center of the image.

The control signal to the camera should be two angles,

y between

,

z

(pointing down from the platform) and

y

and

x between

z

and

x

(see g. 1.2b below). Y X y x -z Ground the x-direction Center of image in (a) (b) -x Platform Platform Error vector Center of image Car Camera view

Figure 1.2: a: Seen from above. b: Seen from one side.

p

x

1.5.3 Color classication

The third problem considered in this thesis is to investigate if color can be used as a means to discriminate between dierent vehicles, and if so, how.

The color classication should be considered to be used for two dif-ferent tasks:

(13)

When tracking a vehicle. The color should be used to decrease the

possibility of mixing them up with each other.

To re-identify a vehicle by detecting if it has the same color as

earlier.

In the WITAS project, the color is not intended to be used as the only feature; it is complemented by other features, e.g. length and width.

(14)

Theoretical backgrounds

This chapter contains some short presentations of some of the already existing theory which is used in this thesis.

2.1 The Kalman lter

First a small example where the Kalman lter is useful (for more de-tails about the Kalman lter and proofs see e.g. [1] or [2]). Assume that we want to estimate the position (in one direction) of an object which is oating in space (no gravity and no friction) from noisy po-sition measurements. The object weighs one kg and is aected by an immeasurable random force. If we assume that the velocity and force are approximately constant between two measurements, we can predict

) +

T

T

u

(

t

) +

T

2

F

(

t

) (2.2)

where

T

is the time between two time steps. The measurement can be

t

) (2.4)

which is the form that the Kalman lter uses. If

F

(

t

) and

e

(

t

) are

(15)

of the position and velocity, which are optimal in a sense (eq. 2.10 is minimized).

) (2.6)

where

x

is the state variables and

y

the measured variables,

v

and

e

v

T₍

_s

_{)] =}

_R

1

t

,

s

) (2.9)

The Kalman lter is the solution to the problem of estimating

x

(

t

) from

the measurements

y

(

s

)

;

0

s

F

(

t

) (2.16)

the only modication which has to be done is to replace (2.11) with ^

x

(

t

+ 1j

t

) =

F

Obviously, the best initialization is to set ^

x

(0j,1) =

x

0 (2.20) and

C

(0j,1) =

C

0

;

(2.21)

but if no advance information about

x

is available a natural initialization

is

^

x

(0j,1) = [0

:::

0]T (2.22)

C

(0j,1) =

kI

(2.23)

2 n 1 C C C C A (2.24)

(17)

2.1.2 Setting the variances

It is often impossible to calculate the true process variance

R

1. In this

2 to some reasonable value and then

adjust

R

1will often give a good result. If there are no special demands on

the lter, a good way to adjust it is to minimize

E

[(

y

(

t

),

H

H

x

^(

t

j

t

,1), which should have approximately the same

characteristics as

e

(

t

) in (2.6). Note that there should be no maneuvers

in the test data when looking at the innovation.

2.2 Adaptive parameter estimation with Kalman

t

) (2.25)

where

z

is the parameter vector and

(18)

2.3 The steady state

The steady state for the cars is obtained by setting the acceleration in

each direction (

x

and

y

) to be white noise, thus in the x-direction

d

2

x

dt

2 =

b

w

_(

t

) (2.30)

where

E

[ _

w

(

t

)] = 0 (2.31)

E

[ _

w

(

)

:

2 is ambiguous. A steady

state with the states

x

, dx_dt and d2x

dt2 can be obtained in a similar way.

2.4 Color and color spaces

Humans have three types of cones which are sensitive to dierent wave-lengths of light, one for wavewave-lengths around 600 nm (red), one around 540 nm (green) and one around 450 nm (blue). This gives a three di-mensional representation of colors, the RGB color space. This color representation is widely spread and is the one used in color television. There are many other color spaces, of which two color spaces will be presented here HSV (Hue, Saturation and Value) and GCS (GOP Color Space). For further reading on colors and color spaces see e.g. [6],[7],[8].

(19)

2.4.1 The HSV color space

The HSV color space separates the luminance (Value) from the color part by dening the V axis in the [1,1,1] direction of the RGB space. H is the angle around the V axis with zero corresponding to a vector pointing towards the R axis. S can be thought of as \how pure a color is". (see g. 2.1 below). The HSV space is really a cylinder, and not a cone or hexcone as in g. 2.1. However, the perceived change in color as saturation varies is less for dark colors (i.e. ones with a low Value parameter) than for light ones (i.e. ones with a high Value parameter), so the color space is usually distorted to form a cone to help compensate for this perception imbalance. The RGB to HSV transformation is a little strange, which can be seen below.

max=max(R,G,B) min=min(R,G,B) diff=max-min V=max

if max <> 0 then S=diff/max else S=0 if S=0 then H=undefined else Rd=(max-R)/diff; Gd=(max-G)/diff; Bd=(max-B)/diff; if R=max then H=Bd-Gd else if G=Max then H=2+Rd-Bd else if B=max then H=4+Gd-Rd; end end end H=H*60; if H<0 then H=H+360; end

(20)

Figure 2.1: The HSV space, V lies in the [1,1,1] direction in the RGB space

2.4.2 The GCS

Like the HSV color space the GCS tries to separate the luminance from the chrominance. The GCS is based on the CIEu'v'1976 standard. The

transformation from RGB space to

u

0

v

0 space (chrominance space) is

done by: X=2.77R+1.75G+1.13B Y=R+4.59G+0.06B Z=0.06G+5.59B

u

0 = 4

X

+ 15

Y

+ 3

Z

(2.36)

v

0 =

_X

_{+ 15}9

Y

_Y

_{+ 3}

_Z

(2.37)

where

Y

is the luminance. To come to the GCS from CIEu'v'1976,

the white-point in the

u

0

v

0 space is moved to origin and the Cartesian

coordinate system is made polar with argument zero pointing at green.

Instead of using

Y

as luminance information,

L

=

X

+15

Y

+3

Z

is used.

(21)

2.5 From light source to camera output

A standard color camera represents each pixel with three scalars: the RGB-values. The RGB-value obtained at a specic pixel depends mainly on the color of the surface which projects onto that pixel, on the light sources in the scene and the properties of the camera (see g. 2.2). This section contains a simplied explanation on how these issues are

related. First we have the light source which at a distance

r

has a

spectrum

L

(

;r

) (

is the wavelength). Then we have the absorption

of the medium which the light travels through. This ought to be rather small if the traveled distance is rather short and the medium is air, so this is neglected. Background re ections (the light is rst re ected by other surfaces before it is re ected on the particular surface corresponding to the pixel), which are not always neglectable, are not considered in this model. The light is then re ected by the surface of interest. There are basically two types of re ections: rst we have the re ections where the light is re ected \inside" the surface (called sub-surface re ection or body re ection), then we have the case where the light is re ected on the interface (this is known as specularities, and occurs for example when watching some types of surfaces from a particular angle). The second type of re ection is not considered here, so the spectrum of the re ected light will be:

S

(

) =

R

(

;a

)

L

(

;r

) (2.38)

where

a

denotes the photometric angles. The light is then measured by

the sensors (or sensor) in the camera, in this case three (RGB). The sensor response is then given by:

p

k=

Z

F

k(

)

S

(

)

d;k

= 1

;

2

;

3 (2.39)

where

p

kis the response of the k-th sensor and

F

kis the response function

of the k-th sensor (see Fig. 2.3). For a more detailed explanation see e.g. [9].

2.6 Function approximation and principal

com-ponent analysis

This section which presents two types of function approximation; poly-nomial and neural networks, starts with a short review of principal com-ponent analysis.

(22)

Light source

Surface Pixel

Figure 2.2: The light re ecting in the surface

F

λ

F

1 F2 F3

Figure 2.3: Example of a set of response functions

2.6.1 Principal Component Analysis

The principal component analysis solves the problem of nding the dom-inant subspace of a set of vectors [

v

1

;v

2

;:::::;v

v

] (2.41)

where

m

is the mean vector for the cluster. The largest eigenvectors

of C represents the dominant subspace for the set. The eigenvectors can be calculated in many dierent ways, e.g. by using Singular Value Decomposition (SVD).

To make this plausible, one may see what the elements in the covari-ance matrix represents. In the diagonal are the varicovari-ances in the dierent directions (along the axes) and outside the diagonal are the correlation between the dierent components. It is rather easy to imagine how a

(23)

new set of axes should be chosen, to make them uncorrelated (the eigen-vectors) with each other (if the set is fairly shaped like a line, and the dimension of the space is two or three). The conclusion is that the eigen-vectors with the largest eigenvalues represents the dominant subspace for the set.

For more details, see e.g. [10].

2.6.2 Polynomial function approximation

This type of function approximation is rather easy and straight forward.

The problem to solve is to t a polynomial to an function

f

as well as

possible. For example, a two variable function

f

can be approximated

2

+

ey

2

+

gxy

(2.42)

where

a;b;c;d;e;g

are the parameters to be determined. This is done

b

c

d

e

g

3 7 7 7 7 7 7 7 5 (2.46) and

F

= 2 6 4

f

(

= 2 6 6 6 6 6 6 6 4 1

x

new

y

new

x

2 new

y

2 new

x

new

y

func-tion

S

(

Wx

) = 1

1+e

,Wx. A number of perceptrons are then connected to

each other. The following presentation will only consider a feed-forward networks (no closed loop in the network). Below a two layered structure is presented (g. 2.5).

One property of the neural network is its ability to learn things. This can be used for function approximation. The learning is done by

(25)

. . . . . S(Wx) x[1] x[2] x[N] u

Figure 2.4: A single perceptron with N inputs

applying inputs to which the correct outputs are known and then adjust

the

W

(see g. 2.4) vectors in a smart way. The adjustment can be

done in mainly two dierent ways: one way is to adjust them for every input signal (on-line), the other way is to rst apply the hole set of input

signals and then adjust the vectors (o-line). One way to adjust the

W

vectors is to use the backpropagation algorithm.

The backpropagation adjustment rules for the layers are: The Error function:

E

p_{= 12}XK

i=1

(

d

p_i ,

u

p

i)2 (2.52)

where

p

is an indexing for the set of input-output signals and

d

pi is the

wanted output signal for the i:th perceptron in the output layer. O-line:

W

=,

P X p=1

@E

p

@W

(2.53) On-line:

W

=,

@E

p

@W

(2.54)

The output layer:

(26)

x[1] . . . . . . . . . . . . . . . . . . f2 f1 s1 s2 sK . . . . . . . . . . . . . v[1] v[2] u[1] u[2] u[K] . .

x[N] _{Hidden layer} _{Output layer}

fM v[M]

Figure 2.5: Feed-forward network with one hidden layer The hidden layer:

@E

p

@W

m =

@E

p

@v

p_[

_m

_]

@v

_@W

p[_m

m

_x

]

@W

_@W

m_m

x

(2.56) where

@E

p

@v

p_[

_m

_{] =} X k

@E

p

@u

p_[

_k

_]

@u

_@W

p_k[

k

_v

]

_@v

@W

p_[k

_m

v

_] (2.57)

where

is positive the learning rate

is increased, and decrease if negative. Thus, if the weights has been e.g. decreasing for a while, the learning rate increases. Both methods makes the adjustment faster.

(27)

Position ltering

This chapter presents an implementation of a Kalman lter for stabiliz-ing measurements of position and velocity, accordstabiliz-ing to the discussion in 1.2 and how the evaluation is done.

The test sequence used for variance estimation and evaluation is an 800 frames (32 seconds) long sequence from Hallunda (see g. 3.2) containing 29 vehicles on the main road and one on the road to the right, and maybe some on the small road to the left (very hard to see).

3.1 Variance estimations

the variances to dier in the

v

and

o

directions). To estimate the

vari-ances in the

x

direction, a line

at

+

b

with length 19 is tted at every

point except the nine rst and last of the tracking data. This is done by

W

(

t

) is a Hamming window. The variance in the

y

direction

is estimated in the same way. To estimate variance in in the (

v;o

)

directions the lines are tted to the position data (

x;y

) instead of to

(

x;t

) and (

y;t

). Then the angle of each line is calculated and the data is

rotated so that the new x-direction is parallel to the corresponding line. Then a line is tted to the rotated data and the variance is estimated as before. To obtain a scalar estimate the mean of the variances calculated was taken. Note that the variance probably changes with the camera parameters and the distance between the camera and vehicle. However, the necessary information needed to evaluate an adaptive system was not available.

The measurement variance is obtained by using traces from vehicles

making no maneuvers. The estimates obtained in the (

v;o

) directions

were

R

2v = 0

:

06 and

R

2o = 0

:

02, so the variances in the (

x;y

) directions

are dependent of the direction of the movement of the vehicle. The process variance is seen as a design parameter.

3.2 Implementation of the Kalman lter

This section explains how the Kalman lter is implemented (which co-ordinate system used, which variables used as states and how it is initi-ated).

The lter is implemented in four ways:

With position and velocity as states in the

x;y

coordinates.

With position and velocity as states in the

v;o

coordinates.

With position, velocity and acceleration as states in the

x;y

coor-dinates.

With position, velocity and acceleration as states in the

v;o

coor-dinates.

2o

to 0.02. In all four cases the process variance

R

1 is seen as a design

(29)

it was adjusted to lter as hard as possible with some conditions. The conditions imply that the lter should not lose track of a vehicle which

accelerates with 9

m=s

2 along the

v

axis or with 9

m=s

2 along the

o

axis.

The values of the conditions, are selected on the basis that no normal vehicle can accelerate more than this. By doing this it was easier to handle some of the problems that occur when no position data can be found, and I do not think it is so important to get the best possible estimation of the position.

The two implementations which did not include the acceleration in the states did prove to give the best results (probably since it is not so easy to make a good estimation of the acceleration, when the position data is quite noisy), and since they demand less computation, the two with acceleration included were out of consideration. It was harder to observe any dierence between the remaining two; the variance of the ltered data was calculated but no signicant dierence was found. Con-sequently, which to choose depends on what is seen as most important,

the small amount of calculations needed when using the (

x;y

)

coordi-nates, or the exibility of (

v;o

) coordinates and the fact that it should

be able to give better result if the lter was adjusted to minimize the error.

The initiation of the lter is done by setting the prediction of the states to zero (which implies that we do not know anything about the position) and setting the covariance matrix to

"

s

2 0

0

v

2

#

(3.3)

where

s

is the width of the viewable area in meters (the position data

received was given in meters) and

v

is some possible velocity (the speed

limit ought to be a good value, is set to 25m/2 in this lter).

3.3 Tracking of the vehicles

This section explains how the tracking of the vehicles is performed, men-tions some problems encountered and how these were solved.

In the rst frame a lter (Kalman) is applied to each position mea-surement, and in the consecutive frames to every position measurement not already taken. When to remove a lter, is explained later. The predictions from the lters are separated in two groups: one labeled 'vehicles' and one 'not vehicles'. To be classied as a vehicle the lter must have tracked something during a number of samples. This is done

(30)

The tracking is done by taking the position predictions from the l-ters labeled 'vehicles' and calculate the distance between each prediction and each measured position. This gives a matrix where the value in line

k

and column

l

is the distance between prediction

k

and measurement

l

.

Then each prediction is connected to the measured position closest to the

prediction (this is done when the (

x;y

) coordinates are used; when the

(

v;o

) coordinates are used an elliptical distance measurement is used,

D

= p

3

dv

+ 5

do

. This is done since the variance is greater in the

v

direction than in the

o

direction) under the condition that the distance

is less than some reasonable value (the movements of the vehicles has

some restrictions). In this way a vector

pos

is created, see [5].

Example: The measured position closest to prediction number ve

is measurement number 2, gives

pos

(5) = 2.

If two or more predictions are connected to the same measurement, the ones with the largest distances are set to be not allowed to connect to that measurement, and the connecting part is done again. This is done until no predictions have the same measurement. If there are any measurements left they are used in the same way with the predictions from the lters labeled 'not vehicles'. If some lter labeled 'not vehicles' does not get connected to any measurement it is removed. If some lter labeled 'vehicles' does not get connected to any measurement four things may have happened:

The vehicle drives out of the viewable area

The vehicle disappear behind some object, e.g. a bridge

The optical ow program fails to detect the vehicle, e.g. because

the car has no motion anymore (it has stopped)

The vehicle gets to close to another vehicle

In the rst three cases there is not much to do without knowing anything about the context (the road, buildings etc.), so the lter keeps predicting the position under a number of steps, the search area is in-creased for each step. If no connection is found the lter is removed.

In the case when two vehicles gets too close to each other, three dierent actions could be chosen. Let one vehicle overtake the other vehicle, let one vehicle drive after the other or just let them go straight forward according to predicted velocities. Which alternative to choose

depends on the length of

x

(see g. 3.1 below) and the velocity vectors

(31)

, , , , , , g g @ I

x

v

car1 car2 s

Figure 3.1: The dot is the measurement,

x

is orthogonal to

v

the velocity

vector.

If the scalar product of the two velocity vectors is negative (the vehicles meet each other) they are set to just go straight forward.

If the scalar product of the two velocity vectors is positive and the

length of

x

is smaller than some number, the vehicles are set to drive

after each other. If greater, the vehicle behind the other (car1) is set to drive past the other (car2). This is done by predicting how car1 will move relative to the measurement. First the point where car1 should be when the two vehicles separate relative to the measurement is calculated by

mirroring car1's position in the line spanned by

x

, then the time for this

movement is estimated by making the assumption that the measurement is the center of gravity for the two vehicles together. This gives

vm

= (

v

1 +

v

2)

=

2 (3.4)

vr

=

v

1,

vm

(3.5)

where

v

1 and

v

2 are scalar velocities for car1 and car2, respectively.

vm

is the estimated velocity for the measurement and

vr

is the estimated

velocity for car1 relative the measurement. The time for the event is

then obtained by just dividing the distance by

vr

. The trace is then

easily calculated, and car2 is just placed on the opposite side of the measurement.

3.4 Evaluation

The evaluation which is presented in this section, is mainly done by looking at the ltered traces to see if the tracker lost track of any vehicle or mixed them up with each other, and to see if the tracking looked natural (not natural if the vehicles drive over each other or jump large distances). The variance was also estimated for the traces.

The rst part of the evaluation is rather subjective, but the program did not lose track of the vehicles or mix them up with each other in the test sequence and I think the tracking looks rather natural except

(32)

vehicles can appear from nowhere when two cars enter the viewable area closely together and then separate (see g. 3.2 below). However, these two problems could not be avoided. Another fault observed in the test sequence was that noise was sometimes classied as a vehicle (one clear case and three not so obvious, since it is not clear if it is a vehicle or noise). This fault can easily be removed by increasing the number of steps it takes to be classied as a vehicle, but when a measurement is labeled 'not vehicles' it is very vulnerable, since it has a lower priority and not allowed to disappear under a single frame, so a compromise had to be made. Another fault was that sometimes the estimated positions of the vehicles were to close to each other during overtaking, this was

to some extent avoided by setting a min value to

x

Known parameters: The position of the vehicle and the position of

the platform relative to a xed point on the ground, the heading of the platform, the error vector, the camera angles and a camera parameter. (Condition 1)

Known parameters: Only the error vector,the camera angles and

the camera parameter are known. (Condition 2)

Known parameters: Only the error vector and camera angles are

known. (Condition 3)

When an equation is only shown for the

x

direction it is equivalent

^pY) 2 (4.1)

where (^

p

v

;

p

^p) are the estimated vehicle and platform positions,

respec-tively (comes from the Kalman lters, in meters). The error in meters

"

m can then be calculated as

"m ="p

c D

(4.2)

where

"

p is the error vector in pixels,

c

"

mx+ ^

v

vpx,^

v

ppx+ (^

v

ppZ+ ^

p

pZ)

tan

(

x)

^

p

pZ (4.4)

d

p),cos(

p)) (4.6)

dx

=,

r

(sin(

p,

d

p),sin(

p)) (4.7)

where

r

is the radius from the rotation center,

dx

and

dy

are the

com-pensation factors in meters and

d

p is the predicted change in angle

(36)

4.1.2 Condition 2

Now no positions are known, but a normalized position of the vehicle relative to the platform can easily be estimated. The estimate is then Kalman ltered to get a prediction of the movement of the vehicle rela-tive to the platform.

D

norm =

q

1 + tan(

x)2+ tan(

y)2 (4.8)

^

p

xnorm =

"

xp

c D

norm+ tan(

x) (4.9)

The new angle is then given by:

xn= arctan(

"

xp

c D

norm+ ^

v

vpxnorm+ tan(

x)) (4.10)

since

arctan

"

xp

c D

+ ^

v

vpx

_p

_^_pZ+ ^

p

pZ tan(

x) = (4

:

10) (4.11)

4.1.3 Condition 3

Now when no camera parameter is known, the program must estimate the camera parameter or some other related parameter. The error can be written as

"

xm = ^

p

xm,

h

tan(

x) (4.12)

where ^

p

xm is the position of the vehicle relative to the platform (in

meters) and

h

is the height of the platform above the ground (in meters).

If we then dierentiate (4.12) with respect to

t

we get

d

dt

(

"

xm) = ^

v

xm,

h ddt

(tan(

x)) (4.13)

where ^

v

xmis the vehicles velocity in

m=s

. However, the system is discrete

so (4.12) a nite dierence is used.

"

xmn,

"

xm

T

= ^

v

T

xm ,

h

tan(

xn),tan(

x)

T

(4.14)

where ^

v

xm is the movement of the vehicle between two frames,

multipli-cation with _cTD gives

"

xpn,

"

xp= ^

v

xp,

1

c D

norm (tan(

xn),tan(

x)) (4.15)

Here we have the two parameters we want to estimate: the movement

(37)

known (the change in error),

D

norm can be calculated and the old and

new camera angles are known. So the problem turns out to be of the type

y

=

a

+

bx

(4.16)

where (

y;x

) are known and (

a;b

) unknown and vary with time. This

problem can be solved in many dierent ways; here are some of the methods tested.

For each pair of (

x;y

), (

a;b

) can be calculated (two equations and

two unknown) if the new and old

x

, and new and old

y

, dier from

each other respectively. The

a

and

b

:s are then low pass ltered.

A single perceptron (used in neural networks) updated with the

back propagation algorithm without moment.

[1] 1

x

[2] ... ... 3 7 7 7 (4.19)

(38)

and

x

= "

a

b

#

:

(4.20)

The least square gives the solution to the overdetermined system that

minimizesjj

Ax

,

y

jj

2 and the weighted least square solution minimizes

jj

W

(

Ax

,

y

)jj

A

and

b

rather

quickly and

W

is set to a diagonal matrix with the diagonal elements

increasing with increasing indices to imply that the new measurements are more reliable than the old.

Adaptive parameter estimation with a Kalman lter

The parameter variations can be written as

z

(

t

+ 1) =

z

(

t

) +

v

(

t

) (4.22)

and the relation between the parameters and the measurements can be written as

where

e

(

t

) is the measurement noise. A Kalman lter is then applied to

this as explained in section 2.2.

In this problem

R

1 is time dependent, since

a

is the movement in

pixels and the variation in

a

depends on the distance and the camera

parameter. However, the distance is not known so

R

1 is set to a

con-stant value. The variation in

b

depends on the normalized distance.

R

2

is assumed to be time independent since

y

is given in pixels (the

mea-surement error of the error in pixels ought to be rather independent of the distance and camera parameter). In this case are

H

=h 1

x

i (4.24) and

z

= "

a

b

#

:

(4.25)

(39)

The evaluation was mainly done with a small test track where a vehicle rst drives with constant velocity along a straight line and then along half a turn of a circle. The platform was stationary to make it easy to compare the dierent solutions, and the camera parameter was x. Ap-proximately white normal distributed noise with varying variance was added to the position of the vehicle. In this test environment all im-plementations approximately gave equally good results. However if the zoom of the camera was changing during the test track the implementa-tions with adaptive parameter estimation obviously did not perform as well as the other ones. Since the implementation constrained by condi-tion one was going to be used, the other ones were not tested any more than this. However, if an implementation under condition three is going to be used the one using a Kalman lter is preferable since it is much easier to adjust to perform well.

No numbers on how well the dierent implementations perform are presented (mean deviation and mean variance) since these depends on so many dierent things (for example the movement of the car and platform, the measurement noise, the camera parameter and the change of it, the height of the platform and so on), so it would probably not be any usable information.

The condition one implementation shall be further tested in the sim-ulator mentioned in section 1.1. It simulated a helicopter with a camera attached to it, the helicopter is ying above a landscape and some ve-hicles are driving on a road network in the landscape. One thing the helicopter should be able to do was to follow a vehicle and keep it in the middle of the camera image. The measurement variance for the vehicles and the helicopter shall be estimated and the process variance set to some appropriate value. The measurement variance can be estimated in the same way as in section section 3.1. The process variance shall

(40)

the car disapears is handeled by another program. One way might be to select the parameters that minimizes the error under "normal" ma-neuvers, one must however make sure that the lter can handle large maneuvers.

(41)

Color classication

This chapter presents some methods to use color to make classication of vehicles more robust, in changing illumination.

5.1 Implementation

A large part of the time used on color classication was spent on nding material about this subject (both general information and work done in the area) and then read it. However, none of the methods found were seen to be applicable to this problem. The reason is that all methods found either make some restrictions on the environment (e.g. the object must have more than one color) that are not valid in this case, or require some information which are not available (e.g. knowing how the color of the object looks in dierent possible lighting conditions).

One thing one must assume, is that the change in the spectrum of the lighting changes slowly over frequencies, or that the re ectance function

R

in eq. (2.38) is quite smooth, or both. These assumptions says that,

two points in the same region of the RGB-space will change in roughly the same way. Another thing assumed is that the scene only contains two dierent types of illumination (e.g. sun and shadow or e.g. street light and shadows) at the same time.

The plan was rst to reduce the 3-dimensional RGB-space to a 2-dimensional space independent of the luminance. The reason for this is that the main problem was assumed to be the illumination changes caused by shadows (and was so in the test sequences), and one would like to think that the luminance is the only thing changing (see g. 5.1 below, RGB plot for six cars in varying lighting. 40 measurements for each car). Based on this approach, the following method were tested:

(42)

the H component was also tried)

The chrominance components in the GCS

To project the RGB values in a plane with its normal parallel to

the dominant direction of change in the RGB space

To use a polynomial transformation that minimizes the spread

However, the luminance have to be used as a third component to distinguish two colors with dierent luminance (e.g. black and white).

0 50 100 150 200 250 0 50 100 150 200 250 0 50 100 150 200 250 G R B

Figure 5.1: RGB plots for a red, a white, a blue, a green, a yellow and a black vehicle

However, these methods did not give so very good results (see the evaluation). Therefore some other methods were tested. These methods are based on the idea of getting a function to approximate the angles to the direction of change in the dierent parts of the RGB-space (see g. 5.2). The method should also be able to adapt to changes in these directions.

One method tested was to use a model on how the the RGB values change with changing lighting. The model used was the simplest

possi-ble, a diagonal matrix

D

. The matrix describes the change in the RGB

values

R

caused by a specic change in the lighting

L

.

R

L+L=

DR

L (5.1)

If the change in lighting is

k

L

, the change in the RGB values will be

(43)

This can be made continuous

R

L1 =

F

(

t

)

R

L2 (5.3)

Bt 3 7 5 (5.4)

The constants

k can be calculated by selecting two points from a

suitable cluster (many measurements and large dierence between the largest eigenvalue and the other two), the points should correspond to

measurements in sun and shadow, respectively, and then set e.g.

t

= 1

to correspond to this change. The angles corresponding to the direction of change can then easily be calculated by rst calculate the tangent to the curve. This model assumes that the camera sensors are linear.

Two dierent function approximations were used: polynomial func-tion approximafunc-tion using the mean squares method (see 2.6.2) and neu-ral networks (see 2.6.3). No complete program has been made. However, the method is meant to work in the following way. When tracking of a vehicle shall begin, the RGB vector of its color is calculated. The vector is then sent to the function approximation program which returns the two angles. Then an ellipsoid probability eld is calculated around the RGB vector, e.g.

p

= ₉

_z

2+

k

y

2+

x

2 (5.5)

where

k

is a normalization constant and (

x;y;z

) is the distances from

the RGB vector, with

z

in the direction described by the angles. The

weighting of the

z

distance should vary with the distance from the origin,

since light colors changes more with varying lightning than dark colors do. The probability eld should probably also be non-symmetric, since e.g. a light color in a shadow can change in RGB values much more than

a dark color outside the shadow. This can be achieved by weighting the

z

distance dierent depending on if the

z

distance is positive or negative.

In the next step the real tracking begins and the RGB vectors to all vehicles (or all possible vehicles) in the image are sent to the program. The program returns the probabilities for the vehicles to help the tracker to choose the correct vehicle. The tracker then sends the RGB vector of the chosen vehicle to the program, which calculates a new reference point for the probability eld by weighting the old reference point with the new RGB vector in some way (e.g. mean value). And so on until

(44)

interesting. To later see how likely a vehicle is the same as this one, the probability is calculated with respect to the reference point.

The adapting for the model based method can be done by just up-dating the matrix (as mentioned above or with some mean of matrices) with constant time intervals or whenever it is seen necessary. The adapt-ing for the approximatadapt-ing function can be done in many ways. One easy way that might work is to divide the RGB-space in a number of sec-tions (one for rather red colors and so on). In each section one cluster of RGB measurements is selected and used for training. When a new vehicle has been tracked, the principal component analysis is used to get the dominant direction and a variance estimation in that direction. If the variance is small compared to the distance to the origin, it is not used for training. If it is not so small, it is examined to which section the cluster belongs to. If the dierence of the new and old directions is rather small nothing is done, and maybe also if the dierence is very large. If the dierence is not to small (and not too large), the old cluster is replaced with the new and the approximating functions are updated. This is not optimal since it ought to be possible to some extent predict how the directions in all sections change if the change in one is known.

0 50 100 150 200 250 300 0 100 200 300 0 50 100 150 200 250 300 G R B

Figure 5.2: A vector eld obtained from an approximating function

5.2 Evaluation

The evaluation data was made from a number of video sequences taken from a helicopter. The rst video sequence was stored as Super-VHS.

(45)

This was not sucient since the bandwidth for the colors is rather small in Super-VHS. The result is that the colors are smoothed out, which results in the vehicles having a mixture of the vehicles color and the background color. However, some video sequences stored on Digital BetaCAM were found (the RGB values are stored separately, so that the bandwidth for the colors is the same as the bandwidth for the intensity). Some parts of these sequences were digitized and used to extract test data from. The extraction of the test data was done by taking the RGB

values from a 33 pixels large neighborhood on the vehicle, and then

calculate a weighted mean of if. This is done for 40 consecutive frames for each vehicle. The vehicles are rather small in the video sequences

(maybe 510 pixels). This makes it hard to see the borders of the car,

which makes it hard to be sure that the 33 is inside the borders of

the vehicle all the time. This together with the sampling of the image and the cars not having a uniform color (e.g. the interior), has probably introduced some errors. The extraction of the RGB clusters was totally done for 34 vehicles.

Mainly six clusters (the clusters in g. 5.1) are used for evaluation. These clusters are selected to cover as large part of the RGB-space as possible.

5.2.1 HSV and GCS

These two color spaces were quite quickly dismissed, since they do not work. Fig. 5.3 below shows the HS-plane for the six vehicles. Ideally would there be six separated dot shaped clusters, thus no need to say any more about this. The GCS gave more or less the same result, so neither this space is further evaluated.

5.2.2 Projection in the dominant direction and

polyno-mial transformation

The direction to project in was chosen to the mean direction of the dominant direction of the six clusters. The result of projecting these clusters is shown in g. 5.4. The evaluation data should not be the same as the learning data to make a proper evaluation. However, if it does not work as well as demanded on the learning data, it certainly will not work good enough on the evaluation data. The result is quite good, except for the red one (the cluster to the right), and therefore was this method also excluded.

(46)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 S H

Figure 5.3: The HS-plane for the six clusters

One way is to select the points as the mean value of the projection of respectively cluster in the dominant direction (as mentioned above). The result for a second order polynomial is shown in g. 5.5. The yellow (at the bottom) and the red (to the right) have improved a little. However, it can be seen from g. 5.6 (curves of the points in the RGB-space that are transformed to selected points in the 2-dimensional space) that the selection of the points are not so good, since the curve for the red (at the bottom) is very bent. This can obviously cause very large errors. There could not be found any good way to avoid this problem, if it should be adaptable, which it was supposed to be. Consequently, this method was also excluded.

5.2.3 Angle approximation

The method using a model was tested by just looking at the lines dened

by eq. (5.4) when

t

goes from 0 to 1, for some point in each cluster (see

g. 5.7). This was done for about 20 clusters, and all except the yellow seemed to be OK (see g. 5.7). There was only one yellow car in the test sequence. So it is not really clear if it is the model that does not work, or if it is some large errors in the measurements of the RGB values to this vehicle (I could not nd any reason for this, when looking on how the data had been extracted). The result is that this method is excluded until it is clear what causes the large error for the yellow cluster.

The evaluation of the two function approximation methods was done a little more properly. Both were trained with the set of six clusters,

Investigations in Tracking and Colour Classification

Institllti'onen for systemteknik

Department of Electrical Engineering

Examensarbete

TE

LINKOPINGS UNIVERSIIET

o

o

o

o

Abstract

Acknowledgments

Contents

1 Introduction

5

2 Theoretical backgrounds

11

3 Position ltering

24

4 Camera control

31

5 Color classi cation

38

6 Discussions

48

Introduction

1.1 WITAS

1.2 Position measuring

y

1.3 Camera control

1.4 Color classi cation

1.5 Problem de nitions

1.5.1 Position ltering

1.5.2 Camera control

x;y

X;Y

y

meter=pixel

z

y

z

x

1.5.3 Color classi cation

Theoretical backgrounds

2.1 The Kalman lter

p

u

u

t

u

t

T

F

t

p

t

p

t

T

u

t

T

F

t

T

m

t

p

t

e

t

e

t

p

t

u

t

T

p

t

5 Color classication

1.4 Color classication

1.5 Problem denitions

1.5.3 Color classication

_s

_R

_s

_R