Institutionen för systemteknik
Department of Electrical Engineering

Examensarbete

Investigations in Tracking and Colour Classification

Anders Moe
Reg nr: LiTH-ISY-EX-1967
7 December 1998

Linköpings universitet

Investigations in Tracking and Colour Classification

Master's thesis carried out in Image Processing at Linköping Institute of Technology by

Anders Moe

Reg nr: LiTH-ISY-EX-1967

Supervisors (Handledare):
Klas Nordberg, Linköpings universitet, Sweden
Johan Wiklund, Linköpings universitet, Sweden

Examiner (Examinator):
Klas Nordberg, Linköpings universitet, Sweden

Avdelning, Institution (Division, department): Department of Electrical Engineering, Computer Vision
Språk (Language): English
Rapporttyp (Report category): Examensarbete (Master's thesis)
ISRN: LiTH-ISY-EX-1967
Datum (Date): 1998-12-07

Titel (Title): Investigations in Tracking and Colour Classification
(Undersökningar inom följning och färgklassificering)

Författare (Author): Anders Moe

Abstract

In this report, mainly three different problems are considered. The first problem considered is how to filter position data of vehicles. To do so, the vehicles have to be tracked. This is done with Kalman filters. The second problem considered is how to control a camera to keep a vehicle in the center of the image, under three different conditions. This is mainly solved with a Kalman filter. The last problem considered is how to use the color of the vehicles to make classification of them more robust. Some suggestions on how this might be done are given. However, no really good method to do this has been found.

Abstract

In this report, mainly three different problems are considered. The first problem considered is how to filter position data of vehicles. To do so, the vehicles have to be tracked. This is done with Kalman filters. The second problem considered is how to control a camera to keep a vehicle in the center of the image, under three different conditions. This is mainly solved with a Kalman filter. The last problem considered is how to use the color of the vehicles to make classification of them more robust. Some suggestions on how this might be done are given. However, no really good method to do this has been found.

Acknowledgments

I would like to thank the people at the Computer Vision lab at ISY for taking the time to answer all kinds of questions and for helping me with the equipment.

I thank my examiner Klas Nordberg and my opponent for correcting this thesis and giving suggestions for improvements.

Contents

1 Introduction
  1.1 WITAS
  1.2 Position measuring
  1.3 Camera control
  1.4 Color classification
  1.5 Problem definitions
    1.5.1 Position filtering
    1.5.2 Camera control
    1.5.3 Color classification

2 Theoretical backgrounds
  2.1 The Kalman filter
    2.1.1 Initialization of the Kalman filter (t=0)
    2.1.2 Setting the variances
  2.2 Adaptive parameter estimation with Kalman filter
  2.3 The steady state
  2.4 Color and color spaces
    2.4.1 The HSV color space
    2.4.2 The GCS
  2.5 From light source to camera output
  2.6 Function approximation and principal component analysis
    2.6.1 Principal Component Analysis
    2.6.2 Polynomial function approximation
    2.6.3 Artificial Neural Networks

3 Position filtering
  3.1 Variance estimations
  3.2 Implementation of the Kalman filter
  3.3 Tracking of the vehicles

4 Camera control
  4.1 Implementation
    4.1.1 Condition 1
    4.1.2 Condition 2
    4.1.3 Condition 3
  4.2 Evaluation

5 Color classification
  5.1 Implementation
  5.2 Evaluation
    5.2.1 HSV and GCS
    5.2.2 Projection in the dominant direction and polynomial transformation
    5.2.3 Angle approximation
    5.2.4 Conclusions

6 Discussions
  6.1 Position filtering
  6.2 Camera control
  6.3 Color classification

Introduction

This thesis is divided into three parts which are more or less connected to each other. The first part deals with tracking and filtering of vehicles, the second with the problem of controlling a camera so that the tracked vehicle is kept in the center of the image, and the third with how the color of the vehicles should be used to separate them from each other. All three problem domains are related to the WITAS project described below.

1.1 WITAS

Wallenberg Laboratory on Information Technology and Autonomous Systems (WITAS) is formed by four research groups, one from the Department of Electrical Engineering (Computer Vision Laboratory) and three from the Department of Computer and Information Science at Linköping University.

WITAS is engaged in basic research in the area of intelligent autonomous vehicles and other autonomous systems. The main goal is to develop an airborne computer system which is able to make rational decisions. The decisions should be based on various information; some of the information sources should be pre-stored geographical information, vision sensors and information sent to it by radio. For more information, see the WITAS home page [12].

One of the objectives of the airborne system should be to supervise traffic (e.g. detect queues). To do so, the positions of the observed vehicles must be extracted from images taken by a camera on the system. Each vehicle should also be given a specific identity while it is observed, and a filtered position and an estimated velocity should also be extracted. If a specific vehicle is to be tracked over a longer time period, one would like to keep it in the center of the observed area. Another thing one would like to do is to use the color of the vehicles to decrease the probability of mixing them up with each other. These are the problems considered in this thesis.

A simulator has been developed in WITAS. It simulates a helicopter with a camera attached to it. The helicopter is flying above a landscape, and some cars are driving on a road network in the landscape. Some of the evaluation will be done with this simulator.

1.2 Position measuring

There are many ways to detect the presence of vehicles in an image sequence. One way is to first stabilize the sequence with respect to the ground. The optical flow (a velocity field for a sequence of images, see fig. 1.1) is then calculated and thresholded. The centers of gravity of the blobs (see fig. 1.1) are then calculated to get the positions. However, one would like to have the positions of the vehicles in some kind of world coordinates. This can also be done in many different ways. One way is to use a map containing positions of fixed objects (landmarks). Then, if the ground is assumed to be flat, there exists an eight-parameter bilinear transformation

y = \frac{Ax + b}{1 + c^T x}

from image to ground coordinates. Since the position and orientation of the camera vary, so do these parameters, and therefore they have to be estimated in each image. The transformation depends on having four landmarks in the image, since eight known values are needed to calculate the eight unknowns in a linear system.

These positions will contain an error component. In this case, some of the error is introduced when calculating the optical flow and the centers of gravity of the blobs, and some when calculating the positions of the fixed objects and then assuming them to lie in a plane. The given positions of the fixed objects are probably not very accurate either. The simulator uses a rather different way to extract the positions of the vehicles. However, independent of which method is used to get the positions, some error will be introduced.

The position of the airborne system (called the platform from now on) is likely to be determined with a differential GPS (Global Positioning System) and inertial navigation (if the platform is moving). The accuracy of this positioning depends on the GPS used (for a good differential GPS the accuracy is about one meter) and the movement of the platform (if the platform is moving, the accuracy can be improved over time by using inertial navigation; this improvement stops when the platform stops, and the accuracy of the positioning will then be the accuracy of the GPS). The simulator does not take this into account; thus the position obtained from the simulator was the correct position of the helicopter. However, some noise could have been added to make it more realistic.

To decrease the error, the positions can be filtered, preferably with a Kalman filter.
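Estimating the eight parameters from four landmarks reduces to solving an 8x8 linear system, since y(1 + c^T x) = Ax + b is linear in A, b and c. A minimal NumPy sketch of this step (the function names are illustrative, not from the thesis software):

```python
import numpy as np

def fit_ground_transform(img_pts, gnd_pts):
    """Estimate A (2x2), b (2), c (2) of y = (A x + b) / (1 + c^T x)
    from four image/ground point correspondences (8x8 linear system)."""
    M = np.zeros((8, 8))
    r = np.zeros(8)
    for i, (x, y) in enumerate(zip(img_pts, gnd_pts)):
        # Two equations per correspondence, one per ground coordinate
        M[2 * i]     = [x[0], x[1], 1, 0, 0, 0, -y[0] * x[0], -y[0] * x[1]]
        M[2 * i + 1] = [0, 0, 0, x[0], x[1], 1, -y[1] * x[0], -y[1] * x[1]]
        r[2 * i], r[2 * i + 1] = y
    p = np.linalg.solve(M, r)
    return p[[0, 1, 3, 4]].reshape(2, 2), p[[2, 5]], p[[6, 7]]

def apply_ground_transform(A, b, c, x):
    x = np.asarray(x, dtype=float)
    return (A @ x + b) / (1.0 + c @ x)
```

With exact correspondences the parameters are recovered exactly; with noisy landmarks one would instead use more than four points and least squares.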

1.3 Camera control

To track moving vehicles on the ground with a camera mounted on a platform (in this case an airborne platform), one would like to keep the vehicle in the center of the image while tracking it. To do so, one must predict the movement of the vehicle and the platform (together or separately), since they move between two image frames. One would also like to filter the position (or change in position) of the vehicle and platform, since the positioning of the vehicle and platform may be rather inaccurate and also discretized (as mentioned in the section above). One also has to calculate a control signal to the camera to track the vehicle. The prediction and filtering can be done with a Kalman filter.

1.4 Color classification

The problem with using color for classification is that the apparent colors change with lighting (e.g. due to shadows). To use color for classification in a good way, one must make some prediction of how likely different changes in the color are. This can be done in some different ways. The easiest way is to assume that the distribution of the light spectrum is constant, so that only the intensity changes. This makes it very easy to predict the changes, since this corresponds to a uniform scaling of the color components. However, in practice this assumption is not always valid. For instance, when going from direct sunlight to shadow the spectrum distribution will change, since almost all the light in the shadow has been reflected in the atmosphere. One way is to use some model for the changes, and another is to approximate the changes.

1.5 Problem definitions

1.5.1 Position filtering

The first problem considered in this thesis is to use position data to detect the vehicles, give each an identity, and finally to track them.

The position data come from an existing program that calculates the optical flow of an image sequence (see fig. 1.1 below) and returns an estimate of the center of gravity of each moving object. The use of a Kalman filter was suggested to obtain predictions of the movements of the vehicles (used when tracking the vehicles) and filtering of the noisy position data (done to get smooth and realistic traces of the cars). The program should be able to handle the following situations.

* Vehicles disappearing during short time periods, which occurs for example when a vehicle goes under a bridge.

* Two position data becoming one. This happens because the optical flow filter must have an extension in space, so when two vehicles get too close to each other the filter response will be interpreted as one object.

Figure 1.1: Left: From the original sequence. Right: Optical flow

1.5.2 Camera control

The second problem considered in this thesis is to keep a specific vehicle in the center of an image while tracking it. Three different conditions should be considered.

* Known parameters: the position of the platform, the error vector, the camera angles and a camera parameter. (Condition 1)

* Known parameters: only the error vector, the camera angles and the camera parameter are known. (Condition 2)

* Known parameters: only the error vector and the camera angles are known. (Condition 3)

The error vector (the vector between the center of the image and the center of the vehicle) is given as two floats (the number of pixels in the (x, y) directions; see fig. 1.2a, where (X, Y) are fixed coordinate axes and y is along the heading of the platform).

If multiplied with the distance to the object in the center of the image, the camera parameter gives an estimate of the relation meter/pixel in the center of the image.

The control signal to the camera should be two angles: one between -z (pointing down from the platform) and y, and one between z and x (see fig. 1.2b below).

Figure 1.2: a: Seen from above. b: Seen from one side.

1.5.3 Color classification

The third problem considered in this thesis is to investigate if color can be used as a means to discriminate between different vehicles, and if so, how.

The color classification should be considered for two different tasks:

* When tracking a vehicle, the color should be used to decrease the possibility of mixing vehicles up with each other.

* To re-identify a vehicle by detecting if it has the same color as earlier.

In the WITAS project, the color is not intended to be used as the only feature; it is complemented by other features, e.g. length and width.

Theoretical backgrounds

This chapter contains short presentations of some of the already existing theory which is used in this thesis.

2.1 The Kalman filter

First a small example where the Kalman filter is useful (for more details about the Kalman filter and proofs, see e.g. [1] or [2]). Assume that we want to estimate the position (in one direction) of an object which is floating in space (no gravity and no friction) from noisy position measurements. The object weighs one kg and is affected by an immeasurable random force F. If we assume that the velocity and force are approximately constant between two measurements, we can predict the position (p) and velocity (u) for the next time step according to

u(t+1) = u(t) + T\,F(t)   (2.1)

p(t+1) = p(t) + T\,u(t) + \frac{T^2}{2}\,F(t)   (2.2)

where T is the time between two time steps. The measurement can be written

m(t) = p(t) + e(t)   (2.3)

where e(t) is the measurement noise. Equations (2.1) and (2.2) can be written in matrix form

\begin{pmatrix} p(t+1) \\ u(t+1) \end{pmatrix} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} \begin{pmatrix} p(t) \\ u(t) \end{pmatrix} + \begin{pmatrix} T^2/2 \\ T \end{pmatrix} F(t)   (2.4)

which is the form that the Kalman filter uses. If F(t) and e(t) are white noise, the Kalman filter gives estimates of the position and velocity which are optimal in a certain sense (eq. 2.10 is minimized).

The Kalman filter has proved useful in a broad range of areas, and like the Wiener filter it is based on a model for the signal of interest. However, in the Kalman filter the model has been generalized to a state-space model:

x(t+1) = F x(t) + v(t)   (2.5)

y(t) = H x(t) + e(t)   (2.6)

where x is the state vector and y the measured variables; v and e are the process noise and measurement noise, respectively. The stochastic processes v and e are assumed white, i.e.

E[v(t)] = E[e(t)] = 0   (2.7)

E[v(t) v^T(s)] = R_1 \delta(t - s)   (2.8)

E[e(t) e^T(s)] = R_2 \delta(t - s)   (2.9)

The Kalman filter is the solution to the problem of estimating x(t) from the measurements y(s), 0 \le s \le t, so that

E[\,|x(t) - \hat{x}(t)|^2\,]   (2.10)

is minimized, where \hat{x}(t) is the filter estimate. Note that the matrices F, H, R_1 and R_2 can be functions of time. The solution is:

* Prediction of states:

\hat{x}(t+1|t) = F \hat{x}(t|t)   (2.11)

* Update of the state estimate:

\hat{x}(t+1|t+1) = (I - K(t+1)H)\,\hat{x}(t+1|t) + K(t+1)\,y(t+1)   (2.12)

* Prediction of the covariance matrix:

C(t+1|t) = F C(t|t) F^T + R_1   (2.13)

* Update of the covariance matrix:

C(t+1|t+1) = (I - K(t+1)H)\,C(t+1|t)\,(I - K(t+1)H)^T + K(t+1) R_2 K^T(t+1)   (2.14)

* Update of the Kalman gain matrix:

K(t+1) = C(t+1|t) H^T \left( H C(t+1|t) H^T + R_2(t+1) \right)^{-1}   (2.15)

If there is a known input signal u(t), i.e. if (2.5) is replaced by

x(t+1) = F x(t) + G u(t) + v(t)   (2.16)

the only modification that has to be made is to replace (2.11) with

\hat{x}(t+1|t) = F \hat{x}(t|t) + G u(t)   (2.17)
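As a concrete illustration, eqs. (2.11)-(2.15) can be packed into one predict/update routine. The sketch below (NumPy; function name and test model are illustrative, not from the thesis) uses the symmetric form (2.14) for the covariance update:

```python
import numpy as np

def kalman_step(x, C, y, F, H, R1, R2):
    """One predict/update cycle of the Kalman filter, eqs. (2.11)-(2.15)."""
    n = len(x)
    # Prediction of state and covariance, eqs. (2.11) and (2.13)
    x_pred = F @ x
    C_pred = F @ C @ F.T + R1
    # Kalman gain, eq. (2.15)
    K = C_pred @ H.T @ np.linalg.inv(H @ C_pred @ H.T + R2)
    # State update, eq. (2.12)
    x_new = (np.eye(n) - K @ H) @ x_pred + K @ y
    # Covariance update, eq. (2.14)
    IKH = np.eye(n) - K @ H
    C_new = IKH @ C_pred @ IKH.T + K @ R2 @ K.T
    return x_new, C_new
```

For the floating-object example, F and H are the matrices implied by eqs. (2.4) and (2.3).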

2.1.1 Initialization of the Kalman filter (t=0)

According to Kalman's signal model, x(0) is a stochastic variable with

E[x(0)] = x_0   (2.18)

and

E[(x(0) - x_0)(x(0) - x_0)^T] = C_0   (2.19)

Obviously, the best initialization is to set

\hat{x}(0|-1) = x_0   (2.20)

and

C(0|-1) = C_0   (2.21)

but if no advance information about x is available, a natural initialization is

\hat{x}(0|-1) = [0 \;\ldots\; 0]^T   (2.22)

C(0|-1) = k I   (2.23)

where k is a sufficiently large constant to imply that the uncertainty about the states is large. If some boundaries of the states are known, a reasonable initialization of C is

C(0|-1) = \begin{pmatrix} b_1^2 & 0 & \cdots & 0 \\ 0 & b_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & b_n^2 \end{pmatrix}   (2.24)

2.1.2 Setting the variances

It is often impossible to calculate the true process variance R_1. In this case one has to regard R_1 as a variable to adjust until the behavior of the filter is satisfactory. It is the proportion between R_1 and R_2 that determines the behavior of the filter. If R_1 is large and R_2 is small, the filter will be fast but sensitive to noise. For small R_1 and large R_2 the opposite is true. Just setting R_2 to some reasonable value and then adjusting R_1 will often give a good result. If there are no special demands on the filter, a good way to adjust it is to minimize

E[(y(t) - H\hat{x}(t|t-1))^2]

To see if the filter is properly adjusted, one can then look at the so-called innovation y(t) - H\hat{x}(t|t-1), which should have approximately the same characteristics as e(t) in (2.6). Note that there should be no maneuvers in the test data when looking at the innovation.

2.2 Adaptive parameter estimation with Kalman filter

The parameter variations can be written as

z(t+1) = z(t) + v(t)   (2.25)

where z is the parameter vector and v is the change between two samples. The v(t) are independent stochastic vectors with mean value zero and covariance matrix R_1(t). The relation between the parameters and the measurements can be written as

y(t) = H(t) z(t) + e(t)   (2.26)

where e(t) is the measurement noise. e(t) is assumed to be an independent stochastic variable with mean value zero and variance R_2(t). This is a special case of the model in section 2.1: set F = I and let H be a function of time. The solution is

\hat{z}(t) = \hat{z}(t-1) + K(t)\,(y(t) - H(t)\hat{z}(t-1))   (2.27)

K(t) = \frac{P(t-1) H^T(t)}{R_2(t) + H(t) P(t-1) H^T(t)}   (2.28)

P(t) = P(t-1) - \frac{P(t-1) H^T(t) H(t) P(t-1)}{R_2(t) + H(t) P(t-1) H^T(t)} + R_1(t)   (2.29)
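Equations (2.27)-(2.29) have the form of a recursive least squares update with a random-walk model on the parameters. A sketch for a scalar measurement (NumPy; the line-fitting example in the test is illustrative, not from the thesis):

```python
import numpy as np

def adapt_step(z, P, y, H, R1, R2):
    """One step of adaptive parameter estimation, eqs. (2.27)-(2.29).
    z: parameter estimate, P: its covariance, y: scalar measurement,
    H: measurement row vector, R1/R2: process/measurement (co)variances."""
    H = np.atleast_2d(H)
    S = (R2 + H @ P @ H.T).item()        # denominator in (2.28)-(2.29)
    K = P @ H.T / S                      # gain, eq. (2.28)
    z = z + K @ (np.atleast_1d(y) - H @ z)   # parameter update, eq. (2.27)
    P = P - (P @ H.T @ H @ P) / S + R1   # covariance update, eq. (2.29)
    return z, P
```

With R1 = 0 and noiseless data this reduces to ordinary recursive least squares and converges to the true parameters.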

2.3 The steady state

The state model for the cars is obtained by setting the acceleration in each direction (x and y) to be white noise; thus, in the x-direction,

\frac{d^2 x}{dt^2} = b\,\dot{w}(t)   (2.30)

where

E[\dot{w}(t)] = 0   (2.31)

E[\dot{w}(t)\dot{w}(s)] = \delta(t - s)   (2.32)

Defining the state vector

X(t) = \begin{pmatrix} x \\ dx/dt \end{pmatrix}   (2.33)

this gives the time continuous model

\frac{dX(t)}{dt} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} X(t) + b \begin{pmatrix} 0 \\ 1 \end{pmatrix} \dot{w}(t)   (2.34)

Solving the differential equation and making it time discrete gives the discrete state model

X(t+1) = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} X(t) + b\sqrt{T} \begin{pmatrix} T/2 & T/\sqrt{12} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} W_1(t) \\ W_2(t) \end{pmatrix}   (2.35)

where W_1 and W_2 are independent and N(0,1) (for more details, see [3]). Note that the matrix operating on W_1 and W_2 is not unique. A state model with the states x, dx/dt and d^2x/dt^2 can be obtained in a similar way.
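The noise matrix in (2.35) can be checked numerically: multiplied by its own transpose it must give the well-known covariance b^2 [[T^3/3, T^2/2], [T^2/2, T]] of integrated white-noise acceleration. (The factor is not unique, which is the ambiguity noted above: any M O with orthogonal O gives the same covariance.) A quick NumPy check:

```python
import numpy as np

T, b = 0.5, 2.0
M = b * np.sqrt(T) * np.array([[T / 2, T / np.sqrt(12)],
                               [1.0,   0.0]])
# Covariance of the discrete noise term in eq. (2.35): M M^T
Q = M @ M.T
# Closed form for integrated white-noise acceleration
Q_ref = b**2 * np.array([[T**3 / 3, T**2 / 2],
                         [T**2 / 2, T]])
```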

2.4 Color and color spaces

Humans have three types of cones which are sensitive to different wavelengths of light: one for wavelengths around 600 nm (red), one around 540 nm (green) and one around 450 nm (blue). This gives a three-dimensional representation of colors, the RGB color space. This color representation is widely spread and is the one used in color television. There are many other color spaces, of which two will be presented here: HSV (Hue, Saturation and Value) and GCS (GOP Color Space). For further reading on colors and color spaces, see e.g. [6], [7], [8].


2.4.1 The HSV color space

The HSV color space separates the luminance (Value) from the color part by defining the V axis in the [1,1,1] direction of the RGB space. H is the angle around the V axis, with zero corresponding to a vector pointing towards the R axis. S can be thought of as "how pure a color is" (see fig. 2.1 below). The HSV space is really a cylinder, and not a cone or hexcone as in fig. 2.1. However, the perceived change in color as saturation varies is less for dark colors (i.e. ones with a low Value parameter) than for light ones (i.e. ones with a high Value parameter), so the color space is usually distorted to form a cone to help compensate for this perception imbalance. The RGB to HSV transformation is a little strange, as can be seen below.

    max  = max(R, G, B)
    min  = min(R, G, B)
    diff = max - min
    V = max
    if max <> 0 then S = diff/max else S = 0
    if S = 0 then
        H = undefined
    else
        Rd = (max - R)/diff
        Gd = (max - G)/diff
        Bd = (max - B)/diff
        if R = max then H = Bd - Gd
        else if G = max then H = 2 + Rd - Bd
        else if B = max then H = 4 + Gd - Rd
        end
        H = H * 60
        if H < 0 then H = H + 360
    end
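A direct transcription of the pseudocode can be checked against Python's standard colorsys module, which implements the same style of conversion (with hue scaled to [0, 1) instead of degrees):

```python
import colorsys

def rgb_to_hsv_deg(r, g, b):
    """RGB (each in [0,1]) to HSV with H in degrees, following the
    pseudocode above. Returns (H, S, V); H is None for greys."""
    mx, mn = max(r, g, b), min(r, g, b)
    diff = mx - mn
    v = mx
    s = diff / mx if mx != 0 else 0.0
    if s == 0:
        return None, s, v  # hue undefined for greys
    rd, gd, bd = (mx - r) / diff, (mx - g) / diff, (mx - b) / diff
    if r == mx:
        h = bd - gd
    elif g == mx:
        h = 2 + rd - bd
    else:
        h = 4 + gd - rd
    h *= 60
    if h < 0:
        h += 360
    return h, s, v
```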


Figure 2.1: The HSV space, V lies in the [1,1,1] direction in the RGB space

2.4.2 The GCS

Like the HSV color space, the GCS tries to separate the luminance from the chrominance. The GCS is based on the CIE u'v' 1976 standard. The transformation from RGB space to u'v' space (chrominance space) is done by:

X = 2.77R + 1.75G + 1.13B
Y = R + 4.59G + 0.06B
Z = 0.06G + 5.59B

u' = \frac{4X}{X + 15Y + 3Z}   (2.36)

v' = \frac{9Y}{X + 15Y + 3Z}   (2.37)

where Y is the luminance. To get to the GCS from CIE u'v' 1976, the white point in the u'v' space is moved to the origin and the Cartesian coordinate system is made polar, with argument zero pointing at green. Instead of using Y as luminance information, L = X + 15Y + 3Z is used.
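A small sketch of eqs. (2.36)-(2.37) (the white-point shift and the polar form of the full GCS are omitted). Note that (u', v') is invariant to a uniform intensity scaling of (R, G, B), which is the point of a chromaticity space:

```python
def gcs_chromaticity(R, G, B):
    """Map RGB to the (u', v') chromaticity of eqs. (2.36)-(2.37)."""
    X = 2.77 * R + 1.75 * G + 1.13 * B
    Y = R + 4.59 * G + 0.06 * B
    Z = 0.06 * G + 5.59 * B
    L = X + 15 * Y + 3 * Z  # also used as the GCS luminance
    return 4 * X / L, 9 * Y / L
```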

2.5 From light source to camera output

A standard color camera represents each pixel with three scalars: the RGB values. The RGB value obtained at a specific pixel depends mainly on the color of the surface which projects onto that pixel, on the light sources in the scene and on the properties of the camera (see fig. 2.2). This section contains a simplified explanation of how these issues are related. First we have the light source, which at a distance r has a spectrum L(\lambda, r) (\lambda is the wavelength). Then we have the absorption of the medium which the light travels through. This ought to be rather small if the traveled distance is rather short and the medium is air, so it is neglected. Background reflections (light that is first reflected by other surfaces before it is reflected on the particular surface corresponding to the pixel), which are not always negligible, are not considered in this model. The light is then reflected by the surface of interest. There are basically two types of reflections: first we have the reflections where the light is reflected "inside" the surface (called sub-surface reflection or body reflection); then we have the case where the light is reflected at the interface (known as specularities, which occur for example when watching some types of surfaces from a particular angle). The second type of reflection is not considered here, so the spectrum of the reflected light will be:

S(\lambda) = R(\lambda, a)\, L(\lambda, r)   (2.38)

where a denotes the photometric angles. The light is then measured by the sensors (or sensor) in the camera, in this case three (RGB). The sensor response is then given by:

p_k = \int F_k(\lambda)\, S(\lambda)\, d\lambda, \quad k = 1, 2, 3   (2.39)

where p_k is the response of the k-th sensor and F_k is the response function of the k-th sensor (see fig. 2.3). For a more detailed explanation, see e.g. [9].

Figure 2.2: The light reflecting in the surface

Figure 2.3: Example of a set of response functions

2.6 Function approximation and principal component analysis

This section, which presents two types of function approximation, polynomial and neural networks, starts with a short review of principal component analysis.
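Equations (2.38)-(2.39) can be illustrated numerically with toy spectra (all curves below are made up for illustration; real illuminants, reflectances and response functions would come from measurements):

```python
import numpy as np

lam = np.linspace(400, 700, 301)             # wavelengths in nm
dlam = lam[1] - lam[0]
L = np.exp(-((lam - 550) / 120.0) ** 2)      # toy illuminant spectrum
R = 0.2 + 0.6 * (lam > 580)                  # toy reddish surface reflectance
S = R * L                                    # reflected spectrum, eq. (2.38)
# Gaussian sensor response functions centered at blue, green, red (toy)
p = [np.sum(np.exp(-((lam - c) / 40.0) ** 2) * S) * dlam
     for c in (450, 540, 600)]               # eq. (2.39), numerically
```

As expected for a reddish surface, the red sensor gives the largest response.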

2.6.1 Principal Component Analysis

Principal component analysis solves the problem of finding the dominant subspace of a set of vectors [v_1, v_2, \ldots, v_n] (a group of points (vectors)). This is done by estimating the covariance matrix C for the set:

C = E[(v - m)(v - m)^T]   (2.40)

m = E[v]   (2.41)

where m is the mean vector for the set. The eigenvectors of C with the largest eigenvalues represent the dominant subspace for the set. The eigenvectors can be calculated in many different ways, e.g. by using Singular Value Decomposition (SVD).

To make this plausible, one may consider what the elements in the covariance matrix represent. On the diagonal are the variances in the different directions (along the axes), and off the diagonal are the correlations between the different components. It is rather easy to imagine how a new set of axes should be chosen to make the components uncorrelated with each other (at least if the set is fairly shaped like a line, and the dimension of the space is two or three); these new axes are the eigenvectors. The conclusion is that the eigenvectors with the largest eigenvalues represent the dominant subspace for the set.

For more details, see e.g. [10].
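A minimal NumPy sketch of eqs. (2.40)-(2.41) on synthetic 2D data scattered mainly along one direction (the data set is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Points scattered mainly along the direction (1, 2) in 2D
t = rng.normal(size=200)
pts = np.outer(t, [1.0, 2.0]) + 0.05 * rng.normal(size=(200, 2))

m = pts.mean(axis=0)                        # sample version of eq. (2.41)
C = (pts - m).T @ (pts - m) / len(pts)      # sample version of eq. (2.40)
eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
dominant = eigvecs[:, -1]                   # largest-eigenvalue eigenvector
```

Here `np.linalg.eigh` is used since C is symmetric; SVD of the centered data matrix would give the same directions.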

2.6.2 Polynomial function approximation

This type of function approximation is rather easy and straightforward. The problem to solve is to fit a polynomial to a function f as well as possible. For example, a two-variable function f can be approximated with a second order polynomial:

\tilde{f}(x, y) = a + bx + cy + dx^2 + ey^2 + gxy   (2.42)

where a, b, c, d, e, g are the parameters to be determined. This is done by minimizing \epsilon, defined as

\epsilon = \sum_i |f(x_i, y_i) - \tilde{f}(x_i, y_i)|^2   (2.43)

This can easily be solved with the least squares method, by writing it in matrix form

F = Z v   (2.44)

where

Z = \begin{pmatrix} 1 & x_1 & y_1 & x_1^2 & y_1^2 & x_1 y_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & y_n & x_n^2 & y_n^2 & x_n y_n \end{pmatrix}   (2.45)

v = \begin{pmatrix} a & b & c & d & e & g \end{pmatrix}^T   (2.46)

and

F = \begin{pmatrix} f(x_1, y_1) & \ldots & f(x_n, y_n) \end{pmatrix}^T   (2.47)

The least squares solution is

v = (Z^T Z)^{-1} Z^T F   (2.48)

The inverse might be very computation demanding if there are many unknown variables in the polynomial. However, the inverse does not need to be recalculated for every new measurement to be included in the approximation. Instead, the old matrix just has to be modified a little, because

Z_{new}^T Z_{new} = Z^T Z + m m^T   (2.49)

where Z_{new} is the new Z matrix with the new measurement included, and m is the new variable vector

m = \begin{pmatrix} 1 & x_{new} & y_{new} & x_{new}^2 & y_{new}^2 & x_{new} y_{new} \end{pmatrix}^T   (2.50)

The new inverse is then

(Z_{new}^T Z_{new})^{-1} = (Z^T Z + m m^T)^{-1} = (Z^T Z)^{-1} - \frac{((Z^T Z)^{-1} m)(m^T (Z^T Z)^{-1})}{1 + m^T (Z^T Z)^{-1} m}   (2.51)

which does not require as much computation.
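The rank-one update (2.51) (the Sherman-Morrison formula) can be verified numerically against recomputing the inverse from scratch (the data below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=20), rng.normal(size=20)

def row(xi, yi):
    # One row of Z for the polynomial (2.42)
    return np.array([1, xi, yi, xi**2, yi**2, xi * yi])

Z = np.array([row(a, b) for a, b in zip(x, y)])
A_inv = np.linalg.inv(Z.T @ Z)

# A new measurement arrives: update the inverse with eq. (2.51)
m = row(0.3, -0.7)
u = A_inv @ m
A_inv_new = A_inv - np.outer(u, m @ A_inv) / (1 + m @ u)

# Compare with recomputing from scratch
Z_new = np.vstack([Z, m])
A_inv_direct = np.linalg.inv(Z_new.T @ Z_new)
```

The update costs O(k^2) for k parameters instead of the O(k^3) of a fresh inversion.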

2.6.3 Artificial Neural Networks

An artificial neural network is constructed by connecting small computational units in a network; the units are called perceptrons. The perceptrons are meant to be simplified models of biological neurons. A perceptron consists of a number of inputs x[n], a transfer function S(Wx) and some outputs u[k]. In this presentation the perceptrons will only have a single output (see fig. 2.4 below).

The function S can be chosen in many different ways. Two common functions are the linear function S(Wx) = Wx and the sigmoid function S(Wx) = 1/(1 + e^{-Wx}). A number of perceptrons are then connected to each other. The following presentation will only consider feed-forward networks (no closed loops in the network). Below, a two-layered structure is presented (fig. 2.5).

Figure 2.4: A single perceptron with N inputs

One property of the neural network is its ability to learn things. This can be used for function approximation. The learning is done by applying inputs for which the correct outputs are known and then adjusting the W vectors (see fig. 2.4) in a smart way. The adjustment can be done in mainly two different ways: one way is to adjust them for every input signal (on-line); the other way is to first apply the whole set of input signals and then adjust the vectors (off-line). One way to adjust the W vectors is to use the backpropagation algorithm.

The backpropagation adjustment rules for the layers are as follows. The error function:

E_p = \frac{1}{2} \sum_{i=1}^{K} (d_i^p - u_i^p)^2   (2.52)

where p is an index over the set of input-output signals and d_i^p is the wanted output signal for the i-th perceptron in the output layer.

Off-line:

\Delta W = -\eta \sum_{p=1}^{P} \frac{\partial E_p}{\partial W}   (2.53)

On-line:

\Delta W = -\eta \frac{\partial E_p}{\partial W}   (2.54)

The output layer:

\frac{\partial E_p}{\partial W_k} = \frac{\partial E_p}{\partial u_p[k]} \, \frac{\partial u_p[k]}{\partial (W_k v)} \, \frac{\partial (W_k v)}{\partial W_k}   (2.55)

Figure 2.5: Feed-forward network with one hidden layer

The hidden layer:

\frac{\partial E_p}{\partial W_m} = \frac{\partial E_p}{\partial v_p[m]} \, \frac{\partial v_p[m]}{\partial (W_m x)} \, \frac{\partial (W_m x)}{\partial W_m}   (2.56)

where

\frac{\partial E_p}{\partial v_p[m]} = \sum_k \frac{\partial E_p}{\partial u_p[k]} \, \frac{\partial u_p[k]}{\partial (W_k v)} \, \frac{\partial (W_k v)}{\partial v_p[m]}   (2.57)

and \eta denotes the learning rate. For a proof of the backpropagation algorithm and more information about neural networks, see e.g. [11].

Adjustment rules

Here two other ways to adjust the W vectors are explained. The first method adds a momentum term \alpha \Delta W_{old} to eqs. (2.53) and (2.54), where \alpha is a constant to adjust. This is done to avoid oscillations. The second method is the delta-bar-delta rule: if a low-pass filtered version of \Delta W times \Delta W is positive, the learning rate is increased, and if it is negative the learning rate is decreased. Thus, if the weights have been e.g. decreasing for a while, the learning rate increases. Both methods make the adjustment faster.
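The chain rules (2.55)-(2.57) can be verified numerically for a small network with a sigmoid hidden layer and a linear output layer (a sketch; the network shape and names are arbitrary, and the gradients are checked against finite differences in the test):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(W1, W2, x):
    v = sigmoid(W1 @ x)      # hidden layer outputs
    u = W2 @ v               # linear output layer
    return v, u

def gradients(W1, W2, x, d):
    """Backpropagation gradients of E_p = 0.5*|d - u|^2, eqs. (2.55)-(2.57)."""
    v, u = forward(W1, W2, x)
    dE_du = u - d                          # dE_p/du
    g2 = np.outer(dE_du, v)                # output-layer gradient, eq. (2.55)
    dE_dv = W2.T @ dE_du                   # backpropagated term, eq. (2.57)
    g1 = np.outer(dE_dv * v * (1 - v), x)  # hidden-layer gradient, eq. (2.56)
    return g1, g2

W1 = rng.normal(size=(3, 2))   # hidden layer weights
W2 = rng.normal(size=(1, 3))   # output layer weights
x = np.array([0.5, -0.3])
d = np.array([0.7])
g1, g2 = gradients(W1, W2, x, d)
```

An on-line step of eq. (2.54) would then be `W1 -= eta * g1; W2 -= eta * g2`.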

Position filtering

This chapter presents an implementation of a Kalman filter for stabilizing measurements of position and velocity, according to the discussion in 1.2, and describes how the evaluation is done.

The test sequence used for variance estimation and evaluation is an 800 frames (32 seconds) long sequence from Hallunda (see fig. 3.2), containing 29 vehicles on the main road and one on the road to the right, and maybe some on the small road to the left (very hard to see).

3.1 Variance estimations

This section considers the problem of estimating the variance of the measurement noise. As mentioned in 2.1.2, the performance of the Kalman filter depends on having estimates of the variances of the measurement noise.

The variances for the positions of the vehicles are estimated in two different coordinate systems, (x, y) and (v, o), where v lies in the direction of the motion of the vehicle and o is orthogonal to v. The (v, o) system is tested since the variance is not expected to be equal in the v and o directions (a common vehicle is not square, which may cause the variances to differ in the v and o directions). To estimate the variances in the x direction, a line at + b with length 19 is fitted at every point except the nine first and last of the tracking data. This is done by minimizing

    \sum_{t=p-9}^{p+9} |at + b - x(t)|^2    (3.1)

The variance is then estimated at each point by calculating

    \sum_{t=p-9}^{p+9} |at + b - x(t)|^2 W(t)    (3.2)

where W(t) is a Hamming window. The variance in the y direction is estimated in the same way. To estimate the variance in the (v, o) directions, the lines are fitted to the position data (x, y) instead of to (x, t) and (y, t). Then the angle of each line is calculated and the data is rotated so that the new x direction is parallel to the corresponding line. Then a line is fitted to the rotated data and the variance is estimated as before. To obtain a scalar estimate, the mean of the calculated variances was taken. Note that the variance probably changes with the camera parameters and the distance between the camera and the vehicle. However, the information needed to evaluate an adaptive system was not available.
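The local line fit above might be sketched as follows; details such as the window normalization are assumptions, not taken from the thesis.

```python
import numpy as np

# Sketch of the windowed line-fit variance estimate: at each interior
# point a line a*t + b is fitted to 19 samples (eq. 3.1), and the
# residual variance is weighted with a Hamming window (eq. 3.2).
def local_variance(x):
    """Estimate the measurement noise variance of a 1-D track x(t)."""
    n = len(x)
    w = np.hamming(19)
    w = w / w.sum()                         # assumed normalization
    variances = []
    for p in range(9, n - 9):
        t = np.arange(p - 9, p + 10)
        seg = x[p - 9:p + 10]
        a, b = np.polyfit(t, seg, 1)        # minimizes eq. (3.1)
        r = a * t + b - seg
        variances.append(np.sum(r**2 * w))  # eq. (3.2)
    return float(np.mean(variances))        # scalar estimate (mean)
```

For a straight track with added noise, the estimate is close to the true noise variance.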

The measurement variance is obtained by using traces from vehicles making no maneuvers. The estimates obtained in the (v, o) directions were R_{2v} = 0.06 and R_{2o} = 0.02, so the variances in the (x, y) directions are dependent on the direction of the movement of the vehicle. The process variance is seen as a design parameter.

3.2 Implementation of the Kalman filter

This section explains how the Kalman filter is implemented (which coordinate system is used, which variables are used as states, and how it is initiated).

The filter is implemented in four ways:

- With position and velocity as states in the (x, y) coordinates.
- With position and velocity as states in the (v, o) coordinates.
- With position, velocity and acceleration as states in the (x, y) coordinates.
- With position, velocity and acceleration as states in the (v, o) coordinates.

In the cases when the (x, y) coordinates are used, the variances R_{2x} and R_{2y} are set to 0.04, and when (v, o) are used, R_{2v} is set to 0.06 and R_{2o} to 0.02. In all four cases the process variance R_1 is seen as a design parameter; it was adjusted to filter as hard as possible under some conditions. The conditions imply that the filter should not lose track of a vehicle which accelerates with 9 m/s^2 along the v axis or with 9 m/s^2 along the o axis. The values of the conditions are selected on the basis that no normal vehicle can accelerate more than this. By doing this it was easier to handle some of the problems that occur when no position data can be found, and I do not think it is so important to get the best possible estimation of the position.

The two implementations which did not include the acceleration in the states proved to give the best results (probably since it is not so easy to make a good estimation of the acceleration when the position data is quite noisy), and since they demand less computation, the two with acceleration included were ruled out. It was harder to observe any difference between the remaining two; the variance of the filtered data was calculated, but no significant difference was found. Consequently, which one to choose depends on what is seen as most important: the small amount of calculations needed when using the (x, y) coordinates, or the flexibility of the (v, o) coordinates and the fact that they should be able to give a better result if the filter were adjusted to minimize the error.

The initiation of the filter is done by setting the prediction of the states to zero (which implies that we do not know anything about the position) and setting the covariance matrix to

    \begin{bmatrix} s^2 & 0 \\ 0 & v^2 \end{bmatrix}    (3.3)

where s is the width of the viewable area in meters (the position data received was given in meters) and v is some possible velocity (the speed limit ought to be a good value; it is set to 25 m/s in this filter).
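A position-and-velocity Kalman filter of the kind described above, for one coordinate, might be sketched as follows. The sample time and noise levels are illustrative assumptions, and the initiation follows eq. (3.3).

```python
import numpy as np

# Sketch of a Kalman filter with position and velocity as states.
class KalmanCV:
    def __init__(self, T=0.04, r1=0.1, r2=0.04, s=100.0, vmax=25.0):
        self.F = np.array([[1.0, T], [0.0, 1.0]])   # constant-velocity model
        self.H = np.array([[1.0, 0.0]])             # only position is measured
        self.Q = r1 * np.array([[T**3/3, T**2/2],
                                [T**2/2, T]])       # process noise (R_1)
        self.R = np.array([[r2]])                   # measurement noise (R_2)
        self.x = np.zeros(2)                        # initial prediction: zero
        self.P = np.diag([s**2, vmax**2])           # initiation as in eq. (3.3)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]                            # predicted position

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R     # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + (K @ (np.array([z]) - self.H @ self.x)).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
```

On a constant-velocity track with noisy position measurements, the state estimate converges to the true position and velocity.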

3.3 Tracking of the vehicles

This section explains how the tracking of the vehicles is performed, mentions some problems encountered, and explains how these were solved.

In the first frame a (Kalman) filter is applied to each position measurement, and in the consecutive frames to every position measurement not already taken. When to remove a filter is explained later. The predictions from the filters are separated into two groups: one labeled 'vehicles' and one 'not vehicles'. To be classified as a vehicle, the filter must have tracked something during a number of samples.

The tracking is done by taking the position predictions from the filters labeled 'vehicles' and calculating the distance between each prediction and each measured position. This gives a matrix where the value in row k and column l is the distance between prediction k and measurement l. Then each prediction is connected to the measured position closest to the prediction (this is done when the (x, y) coordinates are used; when the (v, o) coordinates are used, an elliptical distance measure D = \sqrt{3d_v + 5d_o} is used, since the variance is greater in the v direction than in the o direction), under the condition that the distance is less than some reasonable value (the movements of the vehicles have some restrictions). In this way a vector pos is created, see [5].

Example: The measured position closest to prediction number five is measurement number 2, which gives pos(5) = 2.

If two or more predictions are connected to the same measurement, the ones with the largest distances are forbidden to connect to that measurement, and the connection step is done again. This is repeated until no two predictions have the same measurement. If there are any measurements left, they are used in the same way with the predictions from the filters labeled 'not vehicles'. If some filter labeled 'not vehicles' does not get connected to any measurement, it is removed. If some filter labeled 'vehicles' does not get connected to any measurement, four things may have happened:

- The vehicle drives out of the viewable area.
- The vehicle disappears behind some object, e.g. a bridge.
- The optical flow program fails to detect the vehicle, e.g. because the car has no motion anymore (it has stopped).
- The vehicle gets too close to another vehicle.
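The connection step, including the re-connection when several predictions claim the same measurement, can be sketched as follows. The gate value is an illustrative assumption.

```python
import numpy as np

# Sketch of the association step: every prediction takes the nearest
# measurement within a gate; when several predictions claim the same
# measurement, all but the closest are barred from it and the step is
# repeated.
def associate(pred, meas, gate=5.0):
    """pred: (K, 2) predicted positions, meas: (L, 2) measurements.
    Returns pos, where pos[k] is the measurement index connected to
    prediction k, or -1 if none is available within the gate."""
    K = len(pred)
    if len(meas) == 0:
        return np.full(K, -1)
    d = np.linalg.norm(pred[:, None, :] - meas[None, :, :], axis=2)
    d[d > gate] = np.inf                      # distance must be reasonable
    while True:
        pos = np.full(K, -1)
        for k in range(K):
            if np.isfinite(d[k].min()):
                pos[k] = int(d[k].argmin())   # closest measurement
        done = True
        for l in set(pos.tolist()) - {-1}:
            claimants = np.where(pos == l)[0]
            if len(claimants) > 1:            # several predictions, one measurement
                keep = claimants[d[claimants, l].argmin()]
                for k in claimants:
                    if k != keep:
                        d[k, l] = np.inf      # barred; re-run the connection
                done = False
        if done:
            return pos
```
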

In the first three cases there is not much to do without knowing anything about the context (the road, buildings etc.), so the filter keeps predicting the position during a number of steps, and the search area is increased for each step. If no connection is found, the filter is removed.

In the case when two vehicles get too close to each other, one of three different actions can be chosen: let one vehicle overtake the other, let one vehicle drive after the other, or just let them go straight forward according to the predicted velocities. Which alternative to choose depends on the length of x (see fig. 3.1 below) and the velocity vectors.

Figure 3.1: The dot is the measurement; x is orthogonal to v, the velocity vector.

If the scalar product of the two velocity vectors is negative (the vehicles meet each other), they are set to just go straight forward.

If the scalar product of the two velocity vectors is positive and the length of x is smaller than some number, the vehicles are set to drive after each other. If it is greater, the vehicle behind the other (car1) is set to drive past the other (car2). This is done by predicting how car1 will move relative to the measurement. First, the point where car1 should be when the two vehicles separate, relative to the measurement, is calculated by mirroring car1's position in the line spanned by x. Then the time for this movement is estimated by making the assumption that the measurement is the center of gravity of the two vehicles together. This gives

    v_m = (v_1 + v_2)/2    (3.4)

    v_r = v_1 - v_m    (3.5)

where v_1 and v_2 are scalar velocities for car1 and car2, respectively, v_m is the estimated velocity of the measurement and v_r is the estimated velocity of car1 relative to the measurement. The time for the event is then obtained by dividing the distance by v_r. The trace is then easily calculated, and car2 is just placed on the opposite side of the measurement.
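The time estimate of eqs. (3.4)-(3.5) is a one-liner:

```python
# Sketch of the overtaking-time estimate: the merged measurement is
# assumed to move with the mean velocity (eq. 3.4), and car1 moves
# relative to it with v_r (eq. 3.5); the time is distance / v_r.
def overtake_time(v1, v2, distance):
    vm = (v1 + v2) / 2.0      # eq. (3.4): velocity of the measurement
    vr = v1 - vm              # eq. (3.5): car1 relative to the measurement
    return distance / vr

# e.g. car1 at 20 m/s passing car2 at 16 m/s with 6 m to cover relative
# to the measurement: vr = 2 m/s, so the maneuver takes 3 s.
```
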

3.4 Evaluation

The evaluation presented in this section is mainly done by looking at the filtered traces to see if the tracker lost track of any vehicle or mixed vehicles up with each other, and to see if the tracking looked natural (it is not natural if the vehicles drive over each other or jump large distances). The variance was also estimated for the traces.

The first part of the evaluation is rather subjective, but the program did not lose track of the vehicles or mix them up with each other in the test sequence, and I think the tracking looks rather natural, except that vehicles can appear from nowhere when two cars enter the viewable area closely together and then separate (see fig. 3.2 below). However, these problems could not be avoided. Another fault observed in the test sequence was that noise was sometimes classified as a vehicle (one clear case and three not so obvious, since it is not clear whether it is a vehicle or noise). This fault can easily be removed by increasing the number of steps it takes to be classified as a vehicle, but when a measurement is labeled 'not vehicles' it is very vulnerable, since it has a lower priority and is not allowed to disappear during a single frame, so a compromise had to be made. Another fault was that sometimes the estimated positions of the vehicles were too close to each other during overtaking; this was to some extent avoided by setting a minimum value on x (see fig. 3.2) when the maneuver was classified as an overtaking. However, this was not avoided for one of the five overtakings in the test sequence. This was partially caused by the position data switching between one and two measurements a couple of times, and partially by some drawbacks in the routine that connects the measurements to the filters. The routine connects (as described before) the predictions with the measurements closest to them, which causes a problem during an overtaking, because the prediction of the vehicle being passed gets closer to the measurement belonging to the passing vehicle than to its own measurement, and this distance is shorter than the distance between the passing vehicle and this measurement. The result is that the filters get each other's measurements, which makes the predictions approach each other. This would have caused the program to mix them up with each other if the two measurements had not become one again. The variance estimates decreased from R_x = 0.04, R_y = 0.04 to R_x = 0.002, R_y = 0.002.


Camera control

This chapter presents how the camera control is implemented (to be able to track a vehicle) and how the evaluation is performed.

4.1 Implementation

This section presents how the camera control is implemented for the three different conditions mentioned in subsection 1.5.2.

- Known parameters: the position of the vehicle and the position of the platform relative to a fixed point on the ground, the heading of the platform, the error vector, the camera angles and a camera parameter. (Condition 1)
- Known parameters: only the error vector, the camera angles and the camera parameter. (Condition 2)
- Known parameters: only the error vector and the camera angles. (Condition 3)

When an equation is only shown for the x direction, it is equivalent for the y direction.
direction.

4.1.1 Condition 1

The predictions of the vehicle and platform movements are done with two Kalman filters with position and velocity as states. Since the error is given in pixels and the predicted movements are in meters, and they are also given in different coordinate systems, the error is transformed to meters and the predicted movements are transformed to the coordinate system of the platform. To do this, the distance D between the platform and the vehicle is calculated according to

    D = \sqrt{\hat{p}_{pZ}^2 + (\hat{p}_{vX} - \hat{p}_{pX})^2 + (\hat{p}_{vY} - \hat{p}_{pY})^2}    (4.1)

where (\hat{p}_v, \hat{p}_p) are the estimated vehicle and platform positions, respectively (they come from the Kalman filters, in meters). The error in meters \varepsilon_m can then be calculated as

    \varepsilon_m = \frac{\varepsilon_p}{c} D    (4.2)

where \varepsilon_p is the error vector in pixels and c is the camera parameter. The predicted movement to the next frame in the platform coordinates \hat{v}_{pp} can be estimated as

    \hat{v}_{pp} = \begin{bmatrix} \sin(\varphi_p) & -\cos(\varphi_p) \\ \cos(\varphi_p) & \sin(\varphi_p) \end{bmatrix} \hat{v}_{pf}    (4.3)

where \hat{v}_{pf} is the predicted movement to the next frame for the platform in the fixed coordinates, and \varphi_p is the angle between the X and y axes (see fig. 1.2b). The vehicle movement \hat{v}_{vf} is transformed in the same way. Then the new control angles are easily calculated, according to

    \alpha_{xn} = \arctan\frac{\varepsilon_{mx} + \hat{v}_{vpx} - \hat{v}_{ppx} + (\hat{v}_{ppZ} + \hat{p}_{pZ})\tan(\alpha_x)}{\hat{p}_{pZ}}    (4.4)

A Kalman filter has also been implemented to compensate for the rotation of the helicopter. This filter uses the heading angle and angular velocity as states. It is independent of the other filter with position and velocity as states, since a helicopter is not constrained to move in the heading direction. The compensation in meters is obtained by calculating

    r = \sqrt{(\hat{p}_{vX} - \hat{p}_{pX})^2 + (\hat{p}_{vY} - \hat{p}_{pY})^2}    (4.5)

    dy = r(\cos(\varphi_p - d\varphi_p) - \cos(\varphi_p))    (4.6)

    dx = -r(\sin(\varphi_p - d\varphi_p) - \sin(\varphi_p))    (4.7)

where r is the radius from the rotation center, dx and dy are the compensation factors in meters, and d\varphi_p is the predicted change in angle.
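The heading compensation above might be sketched as follows; the symbol names follow the reconstruction in the text and are not verified thesis notation.

```python
import math

# Sketch of the heading-rotation compensation: given the predicted
# heading change dphi, a target at radius r from the rotation center
# appears to shift by (dx, dy) meters, as in eqs. (4.5)-(4.7).
def rotation_compensation(pv, pp, phi, dphi):
    """pv, pp: (X, Y) vehicle and platform positions in meters."""
    r = math.hypot(pv[0] - pp[0], pv[1] - pp[1])        # eq. (4.5)
    dy = r * (math.cos(phi - dphi) - math.cos(phi))     # eq. (4.6)
    dx = -r * (math.sin(phi - dphi) - math.sin(phi))    # eq. (4.7)
    return dx, dy
```

For a small heading change the lateral shift dx is approximately r * dphi, while dy is of second order.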

4.1.2 Condition 2

Now no positions are known, but a normalized position of the vehicle relative to the platform can easily be estimated. The estimate is then Kalman filtered to get a prediction of the movement of the vehicle relative to the platform.

    D_{norm} = \sqrt{1 + \tan(\alpha_x)^2 + \tan(\alpha_y)^2}    (4.8)

    \hat{p}_{x,norm} = \frac{\varepsilon_{xp}}{c} D_{norm} + \tan(\alpha_x)    (4.9)

The new angle is then given by

    \alpha_{xn} = \arctan\left(\frac{\varepsilon_{xp}}{c} D_{norm} + \hat{v}_{vpx,norm} + \tan(\alpha_x)\right)    (4.10)

since

    \arctan\left(\frac{\varepsilon_{xp} D / c + \hat{v}_{vpx}}{\hat{p}_{pZ} + \hat{v}_{ppZ}} + \tan(\alpha_x)\right) = (4.10)    (4.11)

4.1.3 Condition 3

Now, when no camera parameter is known, the program must estimate the camera parameter or some other related parameter. The error can be written as

    \varepsilon_{xm} = \hat{p}_{xm} - h\tan(\alpha_x)    (4.12)

where \hat{p}_{xm} is the position of the vehicle relative to the platform (in meters) and h is the height of the platform above the ground (in meters). If we then differentiate (4.12) with respect to t we get

    \frac{d}{dt}(\varepsilon_{xm}) = \hat{v}_{xm} - h\frac{d}{dt}(\tan(\alpha_x))    (4.13)

where \hat{v}_{xm} is the vehicle's velocity in m/s. However, the system is discrete, so a finite difference of (4.12) is used:

    \frac{\varepsilon_{xmn} - \varepsilon_{xm}}{T} = \frac{\hat{v}_{xm}}{T} - h\frac{\tan(\alpha_{xn}) - \tan(\alpha_x)}{T}    (4.14)

where \hat{v}_{xm} is the movement of the vehicle between two frames. Multiplication with \frac{cT}{D} gives

    \varepsilon_{xpn} - \varepsilon_{xp} = \hat{v}_{xp} - \frac{c}{D_{norm}}(\tan(\alpha_{xn}) - \tan(\alpha_x))    (4.15)

Here we have the two parameters we want to estimate: the movement of the vehicle in pixels, \hat{v}_{xp}, and the camera parameter c. The left-hand side is known (the change in error), D_{norm} can be calculated, and the old and new camera angles are known. So the problem turns out to be of the type

    y = a + bx    (4.16)

where (y, x) are known and (a, b) are unknown and vary with time. This problem can be solved in many different ways; some of the methods tested are listed here.

- For each pair of (x, y), (a, b) can be calculated (two equations and two unknowns) if the new and old x, and the new and old y, differ from each other, respectively. The a:s and b:s are then low-pass filtered.
- A single perceptron (as used in neural networks) updated with the backpropagation algorithm without momentum.
- The least squares method, with and without weighting.
- A Kalman filter.

The first two methods did not give acceptable results, so they are not presented here. Furthermore, since the unweighted least squares is a special case of the weighted, and it did not give as good results as the weighted, it is not presented here either.

Weighted least squares

To solve the problem with weighted least squares, it is written with matrices:

    y = Ax    (4.17)

where

    y = \begin{bmatrix} y[1] \\ y[2] \\ y[3] \\ \vdots \\ y[n] \end{bmatrix}    (4.18)

    A = \begin{bmatrix} 1 & x[1] \\ 1 & x[2] \\ \vdots & \vdots \end{bmatrix}    (4.19)

and

    x = \begin{bmatrix} a \\ b \end{bmatrix}    (4.20)

The least squares solution to the overdetermined system minimizes ||Ax - y||^2, and the weighted least squares solution minimizes ||W(Ax - y)||^2, where W is the weight matrix. The solution is given by

    x = (A^T W^T W A)^{-1} A^T W^T W y    (4.21)

where the columns in A must be linearly independent (this is checked before solving the equation; if they are not linearly independent, the old a and b are used). In this problem n is set to 10 (10 measurements are used to estimate a and b) to make the estimates follow changes in a and b rather quickly, and W is set to a diagonal matrix with the diagonal elements increasing with increasing indices, to reflect that the new measurements are more reliable than the old.
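A minimal sketch of this sliding-window fit; the exact weight profile is an assumption, the thesis only states that the diagonal elements increase.

```python
import numpy as np

# Sketch of the weighted least squares fit of y = a + b*x (eq. 4.21)
# over a window of the last n = 10 samples.
def wls_ab(xs, ys):
    """Estimate (a, b) from the last 10 (x, y) pairs, or return None
    if the columns of A are not linearly independent."""
    x = np.asarray(xs[-10:], dtype=float)
    y = np.asarray(ys[-10:], dtype=float)
    A = np.column_stack([np.ones_like(x), x])      # eq. (4.19)
    if np.linalg.matrix_rank(A) < 2:               # independence check
        return None                                # caller keeps old (a, b)
    W = np.diag(np.linspace(0.1, 1.0, len(x)))     # newer samples weigh more
    M = A.T @ W.T @ W @ A
    return np.linalg.solve(M, A.T @ W.T @ W @ y)   # eq. (4.21)
```
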

Adaptive parameter estimation with a Kalman filter

The parameter variations can be written as

    z(t+1) = z(t) + v(t)    (4.22)

and the relation between the parameters and the measurements can be written as

    y(t) = H(t)z(t) + e(t)    (4.23)

where e(t) is the measurement noise. A Kalman filter is then applied to this as explained in section 2.2.

In this problem R_1 is time dependent, since a is the movement in pixels and the variation in a depends on the distance and the camera parameter. However, the distance is not known, so R_1 is set to a constant value. The variation in b depends on the normalized distance. R_2 is assumed to be time independent, since y is given in pixels (the measurement error of the error in pixels ought to be rather independent of the distance and the camera parameter). In this case

    H = \begin{bmatrix} 1 & x \end{bmatrix}    (4.24)

and

    z = \begin{bmatrix} a \\ b \end{bmatrix}    (4.25)


To avoid instability caused by numerical problems, the value of

    H(t)P(t-1)H^T(t)    (4.26)

is checked. If it is negative, it is set to zero. It should always be greater than zero, since P(t) is positive definite (P(t) is a covariance matrix). P(t) should also always be symmetric. This is guaranteed by setting

    P(t) = (P(t) + P^T(t))/2    (4.27)
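The estimator of eqs. (4.22)-(4.27), including the two safeguards, might be sketched as follows; the noise levels are illustrative assumptions.

```python
import numpy as np

# Sketch of the adaptive (a, b) estimator: a random-walk state
# z = [a, b] (eq. 4.22) with scalar measurement y = H z + e (eq. 4.23).
class ParamKalman:
    def __init__(self, r1=1e-3, r2=1.0):
        self.z = np.zeros(2)
        self.P = np.eye(2) * 10.0
        self.R1 = np.eye(2) * r1         # process noise of eq. (4.22)
        self.r2 = r2                     # measurement noise variance R_2

    def update(self, x, y):
        H = np.array([1.0, x])           # eq. (4.24)
        self.P = self.P + self.R1        # time update for a random walk
        hph = float(H @ self.P @ H)      # the quantity in eq. (4.26)
        if hph < 0.0:
            hph = 0.0                    # guard against numerical problems
        K = self.P @ H / (hph + self.r2)
        self.z = self.z + K * (y - H @ self.z)
        self.P = self.P - np.outer(K, H @ self.P)
        self.P = (self.P + self.P.T) / 2.0   # enforce symmetry, eq. (4.27)
        return self.z
```

With a varying regressor x (needed for identifiability), the estimates track the true (a, b).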

4.2 Evaluation

The evaluation was mainly done with a small test track where a vehicle first drives with constant velocity along a straight line and then along half a turn of a circle. The platform was stationary, to make it easy to compare the different solutions, and the camera parameter was fixed. Approximately white, normally distributed noise with varying variance was added to the position of the vehicle. In this test environment all implementations gave approximately equally good results. However, if the zoom of the camera was changed during the test track, the implementations with adaptive parameter estimation obviously did not perform as well as the other ones. Since the implementation constrained by condition 1 was going to be used, the other ones were not tested any further. However, if an implementation under condition 3 is going to be used, the one using a Kalman filter is preferable, since it is much easier to adjust to perform well.

No numbers on how well the different implementations perform (mean deviation and mean variance) are presented, since these depend on so many different things (for example the movement of the car and platform, the measurement noise, the camera parameter and the change of it, the height of the platform, and so on) that they would probably not be usable information.

The condition 1 implementation shall be further tested in the simulator mentioned in section 1.1. It simulates a helicopter with a camera attached to it; the helicopter is flying above a landscape and some vehicles are driving on a road network in the landscape. One thing the helicopter should be able to do is to follow a vehicle and keep it in the middle of the camera image. The measurement variance for the vehicles and the helicopter shall be estimated and the process variance set to some appropriate value. The measurement variance can be estimated in the same way as in section 3.1. The case when the car disappears is handled by another program. One way to set the process variance might be to select the parameters that minimize the error under "normal" maneuvers; one must, however, make sure that the filter can handle large maneuvers.


Color classification

This chapter presents some methods for using color to make classification of vehicles more robust under changing illumination.

5.1 Implementation

A large part of the time spent on color classification was used to find material about this subject (both general information and work done in the area) and to read it. However, none of the methods found were applicable to this problem. The reason is that all methods found either make some restrictions on the environment (e.g. that the object must have more than one color) that are not valid in this case, or require some information which is not available (e.g. knowing how the color of the object looks under the different possible lighting conditions).

One thing one must assume is that the spectrum of the lighting changes slowly over frequencies, or that the reflectance function R in eq. (2.38) is quite smooth, or both. These assumptions say that two points in the same region of the RGB space will change in roughly the same way. Another thing assumed is that the scene only contains two different types of illumination (e.g. sun and shadow, or street light and shadows) at the same time.

The plan was first to reduce the 3-dimensional RGB space to a 2-dimensional space independent of the luminance. The reason for this is that the main problem was assumed to be the illumination changes caused by shadows (and so it was in the test sequences), and one would like to think that the luminance is the only thing changing (see fig. 5.1 below, an RGB plot for six cars in varying lighting, with 40 measurements for each car). Based on this approach, the following methods were tested:

- The chrominance components in the HSV color space (using only the H component was also tried)
- The chrominance components in the GCS
- Projecting the RGB values onto a plane with its normal parallel to the dominant direction of change in the RGB space
- Using a polynomial transformation that minimizes the spread

However, the luminance has to be used as a third component to distinguish two colors with different luminance (e.g. black and white).

Figure 5.1: RGB plots for a red, a white, a blue, a green, a yellow and a black vehicle

However, these methods did not give very good results (see the evaluation). Therefore some other methods were tested. These methods are based on the idea of finding a function that approximates the angles of the direction of change in the different parts of the RGB space (see fig. 5.2). The method should also be able to adapt to changes in these directions.

One method tested was to use a model of how the RGB values change with changing lighting. The model used was the simplest possible, a diagonal matrix D. The matrix describes the change in the RGB values R caused by a specific change in the lighting \Delta L:

    R_{L+\Delta L} = D R_L    (5.1)

If the change in lighting is k\Delta L, the change in the RGB values will be

    R_{L+k\Delta L} = D^k R_L    (5.2)

This can be made continuous:

    R_{L1} = F(t) R_{L2}    (5.3)

where F(t) can be written as

    F(t) = e^{Mt} = \begin{bmatrix} e^{\lambda_R t} & 0 & 0 \\ 0 & e^{\lambda_G t} & 0 \\ 0 & 0 & e^{\lambda_B t} \end{bmatrix}    (5.4)

The constants \lambda_k can be calculated by selecting two points from a suitable cluster (many measurements and a large difference between the largest eigenvalue and the other two); the points should correspond to measurements in sun and shadow, respectively, and then e.g. t = 1 is set to correspond to this change. The angles corresponding to the direction of change can then easily be calculated by first calculating the tangent to the curve. This model assumes that the camera sensors are linear.
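Fitting the diagonal model of eq. (5.4) to two points of a cluster can be sketched as follows; the function names are hypothetical, and t = 1 is taken to correspond to the sun-to-shadow change as described above.

```python
import numpy as np

# Sketch: two RGB points from the same cluster (shadow and sun) define
# lambda via R_sun = F(1) R_shadow, i.e. lambda_k = ln(R_sun/R_shadow)
# per channel; the curve's tangent gives the direction of change.
def fit_lambdas(rgb_shadow, rgb_sun):
    r0 = np.asarray(rgb_shadow, dtype=float)
    r1 = np.asarray(rgb_sun, dtype=float)
    return np.log(r1 / r0)                 # t = 1 matches this change

def change_direction(rgb, lambdas):
    """Unit tangent of the curve F(t) * rgb at t = 0."""
    d = np.asarray(lambdas) * np.asarray(rgb, dtype=float)
    return d / np.linalg.norm(d)
```
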

Two different function approximations were used: polynomial function approximation using the mean squares method (see 2.6.2) and neural networks (see 2.6.3). No complete program has been made; however, the method is meant to work in the following way. When the tracking of a vehicle shall begin, the RGB vector of its color is calculated. The vector is then sent to the function approximation program, which returns the two angles. Then an ellipsoidal probability field is calculated around the RGB vector, e.g.

    p = 9z^2 + ky^2 + x^2    (5.5)

where k is a normalization constant and (x, y, z) are the distances from the RGB vector, with z in the direction described by the angles. The weighting of the z distance should vary with the distance from the origin, since light colors change more with varying lighting than dark colors do. The probability field should probably also be non-symmetric, since e.g. a light color in a shadow can change much more in RGB values than a dark color outside the shadow. This can be achieved by weighting the z distance differently depending on whether the z distance is positive or negative.

In the next step the real tracking begins, and the RGB vectors of all vehicles (or all possible vehicles) in the image are sent to the program. The program returns the probabilities for the vehicles, to help the tracker choose the correct vehicle. The tracker then sends the RGB vector of the chosen vehicle to the program, which calculates a new reference point for the probability field by weighting the old reference point with the new RGB vector in some way (e.g. a mean value), and so on until the vehicle is no longer interesting. To later see how likely it is that a vehicle is the same as this one, the probability is calculated with respect to the reference point.

The adaptation for the model-based method can be done by just updating the matrix (as mentioned above, or with some mean of matrices) at constant time intervals, or whenever it is seen to be necessary. The adaptation for the approximating function can be done in many ways. One easy way that might work is to divide the RGB space into a number of sections (one for rather red colors, and so on). In each section, one cluster of RGB measurements is selected and used for training. When a new vehicle has been tracked, principal component analysis is used to get the dominant direction and a variance estimate in that direction. If the variance is small compared to the distance to the origin, it is not used for training. If it is not so small, it is examined to which section the cluster belongs. If the difference between the new and old directions is rather small, nothing is done, and maybe also if the difference is very large. If the difference is not too small (and not too large), the old cluster is replaced with the new and the approximating functions are updated. This is not optimal, since it ought to be possible to some extent to predict how the directions in all sections change if the change in one is known.

Figure 5.2: A vector field obtained from an approximating function

5.2 Evaluation

The evaluation data was made from a number of video sequences taken from a helicopter. The first video sequence was stored on Super-VHS. This was not sufficient, since the bandwidth for the colors is rather small in Super-VHS. The result is that the colors are smoothed out, so that the vehicles show a mixture of the vehicle color and the background color. However, some video sequences stored on Digital BetaCAM were found (the RGB values are stored separately, so the bandwidth for the colors is the same as the bandwidth for the intensity). Some parts of these sequences were digitized and used to extract test data. The extraction of the test data was done by taking the RGB values from a 3x3 pixel neighborhood on the vehicle and then calculating a weighted mean of them. This is done for 40 consecutive frames for each vehicle. The vehicles are rather small in the video sequences (maybe 5x10 pixels). This makes it hard to see the borders of the car, which makes it hard to be sure that the 3x3 neighborhood is inside the borders of the vehicle all the time. This, together with the sampling of the image and the cars not having a uniform color (e.g. the interior), has probably introduced some errors. The extraction of the RGB clusters was done for a total of 34 vehicles.

Mainly six clusters (the clusters in fig. 5.1) are used for evaluation. These clusters were selected to cover as large a part of the RGB space as possible.

5.2.1 HSV and GCS

These two color spaces were quite quickly dismissed, since they do not work. Fig. 5.3 below shows the HS-plane for the six vehicles. Ideally there would be six separated, dot-shaped clusters, so no more needs to be said about this. The GCS gave more or less the same result, so neither is this space evaluated further.

5.2.2 Projection in the dominant direction and polynomial transformation

The direction to project in was chosen as the mean of the dominant directions of the six clusters. The result of projecting these clusters is shown in fig. 5.4. The evaluation data should not be the same as the learning data in order to make a proper evaluation; however, if a method does not work as well as demanded on the learning data, it certainly will not work well enough on the evaluation data. The result is quite good, except for the red one (the cluster to the right), and therefore this method was also excluded.


Figure 5.3: The HS-plane for the six clusters

One way is to select the points as the mean value of the projection of each cluster in the dominant direction (as mentioned above). The result for a second-order polynomial is shown in fig. 5.5. The yellow (at the bottom) and the red (to the right) have improved a little. However, it can be seen from fig. 5.6 (curves of the points in the RGB space that are transformed to selected points in the 2-dimensional space) that the selection of the points is not so good, since the curve for the red (at the bottom) is very bent. This can obviously cause very large errors. No good way could be found to avoid this problem while keeping the method adaptable, which it was supposed to be. Consequently, this method was also excluded.

5.2.3 Angle approximation

The method using a model was tested by just looking at the lines defined by eq. (5.4) when t goes from 0 to 1, for some point in each cluster (see fig. 5.7). This was done for about 20 clusters, and all except the yellow one seemed to be OK (see fig. 5.7). There was only one yellow car in the test sequence, so it is not really clear whether it is the model that does not work, or whether there are some large errors in the measurements of the RGB values for this vehicle (I could not find any reason for this when looking at how the data had been extracted). The result is that this method is excluded until it is clear what causes the large error for the yellow cluster.

The evaluation of the two function approximation methods was done a little more properly. Both were trained with the set of six clusters,

References
