Feature tracking with automatic selection of spatial scales


Lars Bretzner and Tony Lindeberg

Computational Vision and Active Perception Laboratory (CVAP),
Department of Numerical Analysis and Computing Science,
KTH, S-100 44 Stockholm, Sweden.

Email: bretzner@bion.kth.se, tony@bion.kth.se

Technical report ISRN KTH/NA/P--96/21--SE.

Abstract

When observing a dynamic world, the size of image structures may vary over time. This article emphasizes the need for including explicit mechanisms for automatic scale selection in feature tracking algorithms in order to: (i) adapt the local scale of processing to the local image structure, and (ii) adapt to the size variations that may occur over time.

The problems of corner detection and blob detection are treated in detail, and a combined framework for feature tracking is presented in which the image features at every time moment are detected at locally determined and automatically selected scales. A useful property of the scale selection method is that the scale levels selected in the feature detection step reflect the spatial extent of the image structures. Thereby, the integrated tracking algorithm has the ability to adapt to spatial as well as temporal size variations, and can in this way overcome some of the inherent limitations of exposing fixed-scale tracking methods to image sequences in which the size variations are large.

In the composed tracking procedure, the scale information is used for two additional major purposes: (i) for defining local regions of interest for searching for matching candidates as well as setting the window size for correlation when evaluating matching candidates, and (ii) the stability over time of the scale and significance descriptors produced by the scale selection procedure is used for formulating a multi-cue similarity measure for matching.

Experiments on real-world sequences are presented showing the performance of the algorithm when applied to (individual) tracking of corners and blobs. Specifically, comparisons with fixed-scale tracking methods are included as well as illustrations of the increase in performance obtained by using multiple cues in the feature matching step.

Keywords: feature, tracking, motion, blob, corner, scale, scale-space, scale selection, similarity, computer vision


Contents

1 Introduction
2 The need for automatic scale selection in feature tracking
3 Feature detection with automatic scale selection
  3.1 Normalized derivatives
  3.2 Corner detection with automatic scale selection
  3.3 Blob detection with automatic scale selection
4 Tracking and prediction in a multi-scale context
5 Matching on multi-cue similarity
6 Combined tracking algorithm
7 Experimental results
  7.1 Corner tracking
  7.2 Blob tracking
8 Summary and Discussion
  8.1 Spatial consistency and statistical evaluation
  8.2 Multi-cue tracking
  8.3 Temporal consistency
A Algorithmic details
  A.1 Prediction
  A.2 Feature detection
  A.3 Matching


1 Introduction

Being able to track image structures over time is a useful and sometimes necessary capability for vision systems intended to interact with a dynamic world. There are several computer vision algorithms in which tracking arises as an important subproblem. Some situations are:

- Fixation means maintaining a relationship between a physical point or region in the world and some (usually central) region in a camera system. To maintain such a relationship over time, we have to relate some characteristic properties of the physical point to entities that are measurable from the available image data.

- Object recognition in a dynamically varying environment gives rise to the same type of problem, including the case when the visual agent is active and moves relative to the scene. Examples of the latter are navigation as well as active scene exploration. When objects move relative to the observer, feature tracking is a useful processing step for preserving the identity of image features over time.

- The identity problem is also essential in algorithms for motion segmentation and structure from motion. To compute structural properties or invariant descriptors which depend on the temporal variation of a geometric configuration, some mechanism is needed for matching corresponding image features over time.

There is an extensive literature on tracking methods operating without specific a priori knowledge about the world, such as object models or highly restricted domains. Without any aim of giving an extensive survey, the work in this direction can be classified into three main categories:

Correlation based tracking. The presumably earliest approach to image matching is the correlation technique based on the similarity between corresponding grey-level patches over time. Given a window of some size, which covers an image detail at a certain time moment, the corresponding detail at the next time moment is defined as the position of the window (of the same size) that gives the highest correlation score when compared to the previous patch.

Optical flow based tracking. The definition of an optic flow field gives rise to a motion field in the image domain, which can be interpreted as the result of tracking all image points simultaneously. With respect to the tracking problem, the motion of coherently moving (and possibly segmented) regions computed from optic flow algorithms can be used for guiding tracking procedures, as shown by [Thompson et al., 1993] and [Meyer and Bouthemy, 1994].

Feature tracking. Over the years a large number of approaches have been developed for tracking image features such as edges and corners over time. Essentially, what characterizes a feature tracking method is that image features are first extracted [...] main primitives for the tracking and matching procedures. Concerning corner tracking, [Shapiro et al., 1992b] detect and track corners individually in an algorithm originally aimed at applications such as videoconferencing. [Smith and Brady, 1995] track a large set of corners and use the results in a flow-based segmentation algorithm. [Zheng and Chellappa, 1995] have studied feature tracking when compensating for camera motion, and [Gee and Cipolla, 1995] track locally darkest points with applications to pose estimation. In contour tracking, [Blake et al., 1993, Curwen et al., 1991] use snakes to track moving, deforming image features. [Cipolla and Blake, 1992] apply such an approach to estimate time-to-contact, and [Koller et al., 1994] track combined motion and grey-level boundaries in traffic surveillance. An overview of different approaches to edge tracking can be found in the recent book by [Faugeras, 1993].

The subject of this article is to consider the domain of feature tracking and to complement previous works on this subject by addressing the problem of scale and scale selection in the spatial domain and by introducing new similarity measures in the matching step. In most previous works, the analysis is performed at a single predetermined scale. Here, we will emphasize and show by examples why it is useful to include an explicit mechanism for automatic scale selection to be able to handle situations in which the size variations are large. Besides avoiding explicit setting of scale levels for feature detection, and thus overcoming some of the fundamental limitations of processing image sequences at a single scale, it will be demonstrated how scale levels selected by a scale selection procedure can constitute a useful source of information when defining a similarity measure over time, as well as for adapting the window size for correlation to the local image structure.

Moreover, since the resulting matching algorithm we will arrive at is based on a similarity measure defined as the combination of different discriminative properties, and with small modifications can be applied to tracking of both corners and blobs, we will emphasize this multi-cue aspect as an important component for increasing the robustness of feature tracking algorithms.

The presentation is organized as follows: Section 2 illustrates the need for adaptive scale selection in feature tracking. It gives a hands-on demonstration of the improvement in performance that can be obtained by including a scale selection mechanism when tracking features in image sequences in which the size variations over time are large. Section 3 describes the feature detection step and reviews the basic components in a general principle for scale selection. Sections 4 and 5 explain how the scale information obtained from these processing modules can be used in the prediction step and in the evaluation of matching candidates. Section 6 summarizes how these components can be combined with a classical feature tracking scheme with prediction followed by detection and matching. Section 7 shows the performance of the algorithm when applied to real-world data. Feature tracking using adaptive scales is compared to tracking at one, fixed scale. Comparisons are also made between single-cue and multi-cue similarity measures. Finally, we conclude in section 8 by summarizing the main properties of the method and by outlining natural extensions.


2 The need for automatic scale selection in feature tracking

To extract features from an image, we have to apply some operators to the data. The type of features that can be extracted is largely determined by the spatial extent of these operators. When dealing with real-world data about which no or very little information is available, we can hardly expect to know in advance what scales are relevant for processing a given image. Therefore, a reasonable approach is to consider a large number of scales simultaneously, and this is one of the major motivations for using a multi-scale representation when automatically processing measurement data such as images.

Despite this now rather widespread insight, most work on feature tracking still performs the analysis at one scale only. For correlation based tracking methods, this corresponds to using a fixed-size window over time, and for feature tracking, to detecting image features at the same scale at all time moments. Such an approach will, however, suffer from inherent limitations when applied to real-life image sequences in which the size variations are large. This basic property constitutes one illustration of why a mechanism for automatic scale selection is an essential complement to traditional multi-scale processing in general, and to feature detection and feature tracking in particular.

In an image sequence, the size of image structures may change over time due to expansions or contractions. A typical example of the former is when the observer approaches an object as shown in figure 1. The left column in this figure shows a few snapshots from a tracker which follows a corner on the object over time using a standard feature tracking technique with a fixed scale for corner detection and a fixed window size for hypothesis evaluation by correlation. After a number of frames, the algorithm fails to detect the right feature and the corner is lost. The reason why this occurs is simply the fact that the corner no longer exists at the predetermined scale. As a comparison, the right column shows the result of incorporating a mechanism for adaptation of the scale levels to the local image structure (details will be given in later sections). As can be seen, the corner is correctly tracked over the whole sequence. (The same initial scale was used in both experiments.)

Another motivation for this work originates from the fact that all feature detectors suffer from localization errors due to e.g. noise and motion blur. When detecting rigid body motion or recovering 3D structure from feature point correspondences in an image sequence, it is important that the motion in the scene is large compared to the localization errors of the feature detector. If the inter-frame motion is small, we therefore have to track features over a large number of frames to obtain accurate results. This requirement constitutes a key motivation for including a scale selection mechanism in the feature tracker, to obtain longer trajectories of corresponding features as input to algorithms for motion estimation and recovery of 3D structure.

Concerning the common use of fixed scale levels in tracking methods, it is worth pointing out that in situations where the image features are distinct (e.g. sharp corners on a smooth background), traditional methods using fixed scales [...] scale selection in such situations are that: (i) the actual tuning of the scale parameter can be avoided, (ii) as will be illustrated later, stability over time of the selected scale levels turns out to be a useful discriminative constraint to include in a matching criterion.

3 Feature detection with automatic scale selection

A natural framework to use when extracting features from image data is to define the image features from multi-scale differential invariants expressed in terms of Gaussian derivative operators [Koenderink and van Doorn, 1992, Florack et al., 1992], or more specifically, as maxima or zero-crossings of such entities [Lindeberg, 1994c]. In this way, image features such as corners, blobs, edges and ridges can be computed at any level of scale.

A basic problem that arises for any such feature detector concerns how to determine at what scales the image features should be extracted, or if the feature detection is performed at several scales simultaneously, what image features should be regarded as significant. A framework addressing this problem has been developed in [Lindeberg, 1993, Lindeberg, 1994c]. In summary, one of the main results from this work is a general principle for scale selection, which states that scale levels for feature detection can be selected from the scales at which normalized differential invariants assume maxima over scales. In this section, we shall give a brief review of how this methodology applies to the detection of features such as blobs and corners. The image features so obtained, with their associated attributes resulting from the scale selection method, will then be used as basic primitives for the tracking procedure.

3.1 Normalized derivatives

The scale-space representation [Witkin, 1983, Koenderink, 1984] of a signal $f$ is defined as the result of convolving $f$

    \[ L(\cdot;\, t) = g(\cdot;\, t) * f \tag{1} \]

with Gaussian kernels having different values of the scale parameter $t$

    \[ g(x;\, t) = \frac{1}{2 \pi t} \, e^{-(x^2 + y^2)/(2t)} \tag{2} \]

In this representation, $\gamma$-normalized derivatives [Lindeberg, 1996a] are defined by

    \[ \partial_{\xi} = t^{\gamma/2} \, \partial_x \tag{3} \]

where $t$ is the variance of the Gaussian kernel. From this construction, a normalized differential invariant is then obtained by replacing all spatial derivatives by the corresponding $\gamma$-normalized derivatives.
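As a hedged illustration of why the normalization matters, consider a one-dimensional sinusoid f(x) = sin(ωx), a standard textbook case that is not taken from the text above. Gaussian smoothing with variance t damps its amplitude by exp(-ω²t/2), so the amplitude of the ordinary first derivative decreases monotonically with scale, whereas the amplitude of the γ-normalized derivative with γ = 1 assumes its maximum at t = 1/ω². The sketch below checks this numerically. The function names and the sampling of scale levels are illustrative assumptions.

```python
import math

def derivative_amplitude(omega, t, gamma=0.0):
    """Amplitude of the first derivative of sin(omega*x) at scale t.

    Smoothing sin(omega*x) with a Gaussian of variance t gives
    exp(-omega**2 * t / 2) * sin(omega*x). Differentiation multiplies the
    amplitude by omega, and gamma-normalization by t**(gamma/2) as in (3).
    gamma=0 corresponds to the ordinary (unnormalized) derivative.
    """
    return t ** (gamma / 2.0) * omega * math.exp(-omega ** 2 * t / 2.0)

def select_scale(omega, scales, gamma=1.0):
    """Return the scale level at which the normalized amplitude is largest."""
    return max(scales, key=lambda t: derivative_amplitude(omega, t, gamma))

# Geometric grid of scale levels (hypothetical sampling, ratio 1.05).
scales = [0.01 * 1.05 ** k for k in range(250)]

for omega in (0.5, 1.0, 2.0):
    t_hat = select_scale(omega, scales)
    print(f"omega={omega}: selected t={t_hat:.3f}, 1/omega^2={1 / omega ** 2:.3f}")
```

The selected scale is inversely proportional to the squared frequency, i.e. proportional to the squared wavelength, which is the sense in which maxima over scales of normalized derivatives reflect the size of the underlying structure.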


3.2 Corner detection with automatic scale selection

A common way to define a corner in a grey-level image in differential geometric terms is as a point at which both the curvature of a level curve

    \[ \kappa = \frac{L_{yy} L_x^2 + L_{xx} L_y^2 - 2 L_x L_y L_{xy}}{(L_x^2 + L_y^2)^{3/2}} \tag{4} \]

and the gradient magnitude

    \[ |\nabla L| = \sqrt{L_x^2 + L_y^2} \tag{5} \]

are high [Kitchen and Rosenfeld, 1982, Koenderink and Richards, 1988, Deriche and Giraudon, 1990, Blom, 1992]. If we consider the product of $\kappa$ and the gradient magnitude raised to some power, and choose the power equal to three, we obtain the essentially affine invariant expression

    \[ \tilde{\kappa} = L_{yy} L_x^2 + L_{xx} L_y^2 - 2 L_x L_y L_{xy} \tag{6} \]

with its corresponding $\gamma$-normalized differential invariant

    \[ \tilde{\kappa}_{norm} = t^{2\gamma} \, \tilde{\kappa} \tag{7} \]

In [Lindeberg, 1994a] it is shown how a junction detector with automatic scale selection can be formulated in terms of the detection of scale-space maxima of $\tilde{\kappa}^2_{norm}$, i.e., by detecting points in scale-space where $\tilde{\kappa}^2_{norm}$ assumes maxima with respect to both scale and space. When detecting image features at coarse scales it turns out that the localization can be poor. Therefore, this detection step is complemented by a second localization stage, in which a modified Förstner operator [Förstner and Gülch, 1987] is used for iteratively computing new localization estimates using scale information from the initial detection step (see the references for details).
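To make (6) and (7) concrete, the following sketch evaluates the rescaled curvature measure and its normalized version from given derivative values at a single point and scale. How the derivatives are obtained (for example by Gaussian derivative filtering) is left out, and the function names as well as the default γ = 1 are assumptions made for illustration only.

```python
def kappa_tilde(Lx, Ly, Lxx, Lxy, Lyy):
    """Rescaled level curve curvature, equation (6)."""
    return Lyy * Lx ** 2 + Lxx * Ly ** 2 - 2.0 * Lx * Ly * Lxy

def kappa_tilde_norm(Lx, Ly, Lxx, Lxy, Lyy, t, gamma=1.0):
    """gamma-normalized version, equation (7): t**(2*gamma) * kappa_tilde."""
    return t ** (2.0 * gamma) * kappa_tilde(Lx, Ly, Lxx, Lxy, Lyy)

def corner_strength(Lx, Ly, Lxx, Lxy, Lyy, t, gamma=1.0):
    """Squared normalized measure whose scale-space maxima are sought."""
    return kappa_tilde_norm(Lx, Ly, Lxx, Lxy, Lyy, t, gamma) ** 2

# Example: L(x, y) = x + y**2 at the origin has Lx = 1 and Lyy = 2,
# all other derivatives used here are zero.
print(kappa_tilde(1.0, 0.0, 0.0, 0.0, 2.0))
print(corner_strength(1.0, 0.0, 0.0, 0.0, 2.0, t=3.0))
```

A full detector would evaluate `corner_strength` at every pixel and scale level and keep the points that are local maxima over both space and scale.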

A useful property of this corner detection method is that it leads to selection of coarser scales for corners having large spatial extent. Figure 2 illustrates this property by showing the result of applying the corner detection method to two different images, and graphically illustrating each detected and localized corner by a circle with the radius proportional to the detection scale. Notably, the support regions of these blobs serve as natural regions of interest around the detected corners. As we shall demonstrate later, such regions of interest and context information turn out to be highly useful for a feature tracking procedure.

3.3 Blob detection with automatic scale selection

As shown in the above-mentioned references, a straightforward method for blob detection can be formulated in an analogous manner by detecting scale-space maxima of the square of the normalized Laplacian

    \[ \nabla^2_{norm} L = t \, (L_{xx} + L_{yy}) \tag{8} \]

This operator gives a strong response for blobs that are brighter or darker than their background, and in analogy with the corner detection method, the selected scale levels provide information about the characteristic size of the blob.
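The statement that the selected scale reflects the blob size can be checked on an idealized case that is assumed here purely for illustration: a rotationally symmetric Gaussian blob with variance t0. Since Gaussians are closed under convolution, its scale-space representation at scale t is a Gaussian with variance t0 + t, and the normalized Laplacian (8) at the blob center has the closed form -t / (π (t0 + t)²). The sketch searches a hypothetical grid of scale levels for the maximum of the squared response.

```python
import math

def normalized_laplacian_at_center(t0, t):
    """Normalized Laplacian (8) at the center of a unit-mass Gaussian blob.

    The blob g(x; t0) smoothed to scale t equals g(x; t0 + t), whose
    Laplacian at the origin is -1 / (pi * (t0 + t)**2).
    """
    return -t / (math.pi * (t0 + t) ** 2)

def select_blob_scale(t0, scales):
    """Scale level maximizing the squared normalized Laplacian response."""
    return max(scales, key=lambda t: normalized_laplacian_at_center(t0, t) ** 2)

# Hypothetical geometric grid of scale levels (ratio 1.02).
scales = [0.5 * 1.02 ** k for k in range(400)]

for t0 in (2.0, 8.0, 32.0):
    print(t0, round(select_blob_scale(t0, scales), 2))
```

Up to the resolution of the scale grid, the selected scale equals the variance t0 of the blob, so a blob that grows in the image is detected at a correspondingly coarser scale.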

Figure 3 shows the result of applying this blob detection method to the same images as used in figure 2. As can be seen, a representative set of blob features at different scales is extracted. Moreover, it can be noted how well the blob circles reflect the size variations, in particular, considering how simple operations the blob detection algorithm is based on (Gaussian smoothing, derivative computation, and detection of scale-space maxima).

4 Tracking and prediction in a multi-scale context

When tracking features over time, both the position of the feature and the appearance of its surrounding grey-level pattern can be expected to change. To relate features over time, we shall throughout this work make use of the common assumption about small motions between successive frames.

There are several ways to predict the position of a feature in the next frame based on its positions in previous frames. Whereas the Kalman filtering methodology has been commonly used in the computer vision literature, this approach suffers from a fundamental limitation if the motion direction suddenly changes. If a feature moving in a certain direction has been tracked over a long period of time, then the built-in temporal smoothing of the feature trajectory in the Kalman filter implies that the predictions will continue to be in essentially the same direction, although the actual direction of the motion changes. If the covariance matrices in the Kalman filter have been adapted to small oscillations around the previously smooth trajectory, it will hence be likely that the feature is lost at the discontinuity. [1]

For this reason, we shall make use of simpler first-order prediction, which uses the motion between the previous two successive frames as a prediction to the next frame. [2]
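The first-order prediction can be written in a few lines. This is a minimal sketch: the function name and the representation of a trajectory as a list of (x, y) tuples are assumptions, and the fallback for a trajectory with a single observed position follows the convention described in section 6 of using the original position as the prediction.

```python
def predict_position(history):
    """First-order (constant velocity) prediction of the next position.

    history: list of (x, y) positions, oldest first. With fewer than two
    positions there is no motion estimate, so the last position is returned.
    """
    if not history:
        raise ValueError("need at least one observed position")
    if len(history) == 1:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    # Add the displacement between the two most recent frames once more.
    return (2 * x1 - x0, 2 * y1 - y0)

print(predict_position([(10.0, 5.0), (12.0, 4.0)]))  # (14.0, 3.0)
```

Because only the two most recent positions are used, a sudden change of direction affects the prediction after a single frame, in contrast to a filter with long temporal memory.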

Within a neighbourhood of each predicted feature position, we detect new features using the corner (or blob) detection procedure with automatic scale selection. The support regions associated with the features serve as natural regions of interest when searching for new corresponding features in the next frame. In this way, we can avoid the problem of setting a global threshold on the distance between matching candidates. There is, of course, a certain scaling factor between the detection scale and the size of the support region. The important property of this method, however, is that it will automatically select smaller regions of interest for small-size image structures, and larger search regions for larger size structures. Here, we shall make use of this scale information for three main purposes:

- Setting the search region for possible matching candidates.
- Setting the window size for correlation matching.
- Using the stability of the detection scale as a matching condition.

We set the size of the search region to the spatial extent of the previous image feature, multiplied by a safety factor. Within this window, a certain number of candidate matches are selected. Then, an evaluation of these matching candidates is made based on a combined similarity measure to be defined in the next section.

[1] As will be shown in the experiments in section 7, the resulting feature trajectories may be quite irregular. Enforced temporal smoothing of the image positions of the features, leading to smoother trajectories, would not be appropriate for such data.

[2] Both constant acceleration and constant velocity models have been used, but the latter [...]

5 Matching on multi-cue similarity

Based on the assumption of small inter-frame image motions, we use a multiple cue approach to the feature matching problem. Instead of evaluating the matching candidates using a correlation measure on a local grey-level patch only, as done in most feature tracking algorithms, we combine the correlation measure with significance stability, scale stability and proximity measures as defined below.

Patch similarity. This measure is a normalized Gaussian-weighted intensity cross-correlation between two image patches. Here, we compute this measure over a square centered at the feature and with its size set from the detection scale. The measure is derived from the cross-correlation of the image patches, see [Shapiro et al., 1992a], computed using a Gaussian weight function centered at the feature. The motivation for using a Gaussian weight function is that image structures near the feature center should be regarded as more significant than peripheral structures. Given two brightness functions $I_A$ and $I_B$, and two image regions $D_A \subset \mathbb{R}^2$ and $D_B \subset \mathbb{R}^2$ of the same size $|D| = |D_A| = |D_B|$ centered at $p_A$ and $p_B$ respectively, the weighted cross-correlation between the patches is defined as:

    \[ C(A,B) = \frac{1}{|D|} \sum_{x \in D_A} e^{-(x - p_A)^2} I_A(x) \, I_B(x - p_A + p_B) - \frac{1}{|D|^2} \sum_{x_A \in D_A} e^{-(x_A - p_A)^2} I_A(x_A) \sum_{x_B \in D_B} e^{-(x_B - p_B)^2} I_B(x_B) \tag{9} \]

and the normalized weighted cross-correlation is

    \[ S_{patch}(A,B) = \frac{C(A,B)}{\sqrt{C(A,A) \, C(B,B)}} \tag{10} \]

where

    \[ C(A,A) = \frac{1}{|D|} \sum_{x \in D_A} \left( e^{-(x - p_A)^2} I_A(x) \right)^2 - \frac{1}{|D|^2} \left( \sum_{x \in D_A} e^{-(x - p_A)^2} I_A(x) \right)^2 \tag{11} \]

and $C(B,B)$ is defined analogously. As is well-known, this similarity measure is invariant to superimposed linear illumination gradients. Hence, first-order effects of scene lighting do not affect this measure, and the measure only [...]
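The patch similarity can be sketched in plain Python as follows. Several choices here are assumptions rather than details taken from the text: the two patches are assumed to be already extracted as equally sized square nested lists centered at p_A and p_B, the Gaussian weight has an explicit width parameter `sigma`, and the weight is applied to both patches before the covariance is taken, which makes C(A, A) in (11) the special case B = A and keeps the normalized value in [-1, 1].

```python
import math

def gaussian_weights(size, sigma):
    """Gaussian weight for every pixel of a size x size patch, centered."""
    c = (size - 1) / 2.0
    return [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2.0 * sigma ** 2))
             for j in range(size)] for i in range(size)]

def weighted_covariance(patch_a, patch_b, weights):
    """C(A, B): covariance of the two Gaussian-weighted patches."""
    wa = [w * a for wr, ar in zip(weights, patch_a) for w, a in zip(wr, ar)]
    wb = [w * b for wr, br in zip(weights, patch_b) for w, b in zip(wr, br)]
    n = len(wa)
    mean_product = sum(x * y for x, y in zip(wa, wb)) / n
    return mean_product - (sum(wa) / n) * (sum(wb) / n)

def patch_similarity(patch_a, patch_b, sigma):
    """Normalized weighted cross-correlation, equation (10).

    Assumes neither patch is identically zero, so both variances are positive.
    """
    weights = gaussian_weights(len(patch_a), sigma)
    c_ab = weighted_covariance(patch_a, patch_b, weights)
    c_aa = weighted_covariance(patch_a, patch_a, weights)
    c_bb = weighted_covariance(patch_b, patch_b, weights)
    return c_ab / math.sqrt(c_aa * c_bb)

a = [[1, 2, 3], [4, 9, 6], [7, 8, 5]]
b = [[2 * v for v in row] for row in a]   # same pattern, doubled contrast
c = [[5, 1, 7], [2, 3, 9], [8, 4, 6]]     # unrelated pattern
print(patch_similarity(a, a, sigma=1.0))
print(patch_similarity(a, b, sigma=1.0))
print(patch_similarity(a, c, sigma=1.0))
```

In a tracker, the patch size and `sigma` would be derived from the detection scale of the feature, so that the correlation window grows and shrinks with the image structure.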


Significance stability. A straightforward significance measure of a feature detected according to the method described in section 3 is the normalized response at the local scale-space maximum. For corners, this measure is the normalized level curve curvature according to (7) and for blobs it is the normalized Laplacian according to (8). To compare significance values over time, we measure similarity by relative differences instead of absolute, and define this measure as

    \[ S_{sign} = \left| \log \frac{R_B}{R_A} \right| \tag{12} \]

where $R_A$ and $R_B$ are the significance measures of the corresponding features A and B.

Scale stability. Since the features are detected at different scales, the ratio between the detection scales of two features constitutes a measure of stability over scales. To measure relative scale variations, we use the absolute value of the logarithm of this ratio, defined as

    \[ S_{scale} = \left| \log \frac{t_B}{t_A} \right| \tag{13} \]

where $t_A$ and $t_B$ are the detection scales of A and B.

Proximity. We measure how well the position $x_A$ of feature A corresponds to the position $x_{pred}$ predicted from feature B

    \[ S_{pos} = \frac{\| x_A - x_{pred} \|}{\sqrt{t_B}} \tag{14} \]

where $t_B$ is the detection scale of feature B.

Combined similarity measure. In summary, the similarity measure we make use of is a weighted sum of (10), (12), (13) and (14),

    \[ S_{comb} = c_{patch} S_{patch} + c_{sign} S_{sign} + c_{scale} S_{scale} + c_{pos} S_{pos} \tag{15} \]

where $c_{patch}$, $c_{sign}$, $c_{scale}$ and $c_{pos}$ are tuning parameters to be determined.
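The four cues and their combination (12) to (15) can be sketched as below. The weight values are placeholders, not values from the text. Note also one assumption about signs: the measures (12) to (14) are zero for a perfect match and grow with the mismatch, whereas the patch similarity (10) grows with the quality of the match, so the sketch uses negative weights for the last three cues in order that a larger combined value means a better match. The significance values are assumed to be positive magnitudes.

```python
import math

def significance_stability(r_a, r_b):
    """Equation (12): |log(R_B / R_A)|, zero for identical responses."""
    return abs(math.log(r_b / r_a))

def scale_stability(t_a, t_b):
    """Equation (13): |log(t_B / t_A)|, zero for identical scales."""
    return abs(math.log(t_b / t_a))

def proximity(x_a, x_pred, t_b):
    """Equation (14): distance to the prediction in units of sqrt(t_B)."""
    return math.dist(x_a, x_pred) / math.sqrt(t_b)

def combined_similarity(s_patch, s_sign, s_scale, s_pos, weights):
    """Equation (15): weighted sum of the four cues."""
    return (weights["patch"] * s_patch + weights["sign"] * s_sign
            + weights["scale"] * s_scale + weights["pos"] * s_pos)

# Placeholder tuning parameters (hypothetical values).
weights = {"patch": 1.0, "sign": -0.25, "scale": -0.25, "pos": -0.25}

good = combined_similarity(
    0.95,
    significance_stability(10.0, 11.0),
    scale_stability(4.0, 4.4),
    proximity((52.0, 40.0), (51.0, 40.0), 4.4),
    weights)
poor = combined_similarity(
    0.90,
    significance_stability(10.0, 30.0),
    scale_stability(4.0, 16.0),
    proximity((60.0, 46.0), (51.0, 40.0), 16.0),
    weights)
print(good > poor)
```

In this toy comparison the two candidates have almost the same patch correlation, and it is the scale, significance and position cues that separate them, which is the situation the multi-cue measure is designed for.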

6 Combined tracking algorithm

By combining the components described in the previous sections, we obtain a feature tracking scheme based on a traditional predict-detect-update loop. In addition, the following processing steps are added:

- Quality measure. Each feature is assigned a quality measure indicating how stable it is over time.

- Bidirectional matching. To provide additional information to later processing stages about the reliability of the matches, the matching can be done bidirectionally. Given a feature $F_1$ from the feature set, we first compute its winning matching candidate $F_2$ in the current image. If then $F_1$ is the winning candidate of $F_2$ in the backward matching direction, the match between $F_1$ and $F_2$ is registered as safe. This processing step is useful for signalling possible matching errors.
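The bidirectional check amounts to requiring that two features are each other's winning candidate. The following sketch shows the idea under simplifying assumptions: every feature in the other frame is treated as a candidate (search windows and thresholds are omitted), and the similarity is passed in as a function. The names are illustrative.

```python
def best_match(feature, candidates, similarity):
    """Return the candidate with the highest similarity to `feature`."""
    return max(candidates, key=lambda c: similarity(feature, c), default=None)

def safe_matches(previous, current, similarity):
    """Pairs (F1, F2) that are each other's winning candidate."""
    safe = []
    for f1 in previous:
        f2 = best_match(f1, current, similarity)
        # Backward direction: F1 must also win when matching from F2.
        if f2 is not None and best_match(f2, previous, similarity) == f1:
            safe.append((f1, f2))
    return safe

# Toy example with features reduced to 1D positions and a similarity
# that simply prefers nearby positions.
previous = [0.0, 10.0, 11.0]
current = [0.5, 10.4]
print(safe_matches(previous, current, lambda a, b: -abs(a - b)))
```

In the toy example the feature at 11.0 also selects 10.4 as its best candidate, but 10.4 prefers 10.0 in the backward direction, so that pair is not registered as safe.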

During the tracking procedure each feature is associated with the following attributes:

- its detection scale $t_{det}$,
- its estimated size $D = k_{size} \sqrt{t_{det}}$, bounded from below by $D_{min}$,
- its position,
- its quality value.

An overview of the tracking algorithm is given in figure 4. At a more detailed level, each individual module operates as follows:

Prediction. The prediction is performed as described in section 4. For each feature in the feature set, a linear prediction of the position in the current frame is computed based on the positions of the corresponding feature in the two previous frames. The size of the search window is computed as $k_{w1} D$ (with the size $D$ bounded from below). When a trajectory is initiated, there is no feature history to base the prediction on, so we use a larger search window of size $k_{w2} D$ ($k_{w2} > k_{w1}$) and use the original feature position as the predicted position.
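The relation between detection scale, feature size and search window can be summarized in a short sketch. All numeric constants below are placeholder values chosen for the example, since the text does not state the values of k_size, D_min, k_w1 or k_w2 here.

```python
import math

def feature_size(t_det, k_size, d_min):
    """Estimated size D = k_size * sqrt(t_det), bounded from below by d_min."""
    return max(k_size * math.sqrt(t_det), d_min)

def search_window_size(t_det, has_history, k_size=3.0, d_min=5.0,
                       k_w1=1.5, k_w2=3.0):
    """Search window size: k_w1 * D normally, k_w2 * D for new trajectories.

    All default constants are placeholders, not values taken from the text.
    """
    d = feature_size(t_det, k_size, d_min)
    return (k_w1 if has_history else k_w2) * d

print(search_window_size(16.0, has_history=True))   # 1.5 * 12 = 18.0
print(search_window_size(16.0, has_history=False))  # 3.0 * 12 = 36.0
print(search_window_size(1.0, has_history=True))    # D clamped to 5 -> 7.5
```

The lower bound on D keeps the search window from collapsing for features detected at very fine scales.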

Detection. In each frame, image features are detected as described in section 3. The window obtained from the prediction step is searched for the same kind of features over a locally adapted range of scales $[t_{min}, t_{max}]$, where $t_{max} = k_{range} \, t_{det}$ and $t_{min} = t_{det} / k_{range}$. The number $n$ of detected candidates depends on which feature extraction method we use in the detection step.
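As a small illustration of the locally adapted scale range, the sketch below computes the interval around the previous detection scale and samples it. The geometric spacing of the scale levels and the parameter `n_levels` are assumptions made here, as the text does not specify how the range is discretized.

```python
def scale_search_range(t_det, k_range):
    """Scale interval [t_det / k_range, k_range * t_det] around t_det."""
    if k_range < 1.0:
        raise ValueError("k_range must be at least 1")
    return t_det / k_range, k_range * t_det

def scale_levels(t_det, k_range, n_levels):
    """n_levels scale values spaced geometrically over the search range."""
    t_min, t_max = scale_search_range(t_det, k_range)
    if n_levels == 1:
        return [t_det]
    ratio = (t_max / t_min) ** (1.0 / (n_levels - 1))
    return [t_min * ratio ** k for k in range(n_levels)]

print(scale_search_range(8.0, 2.0))   # (4.0, 16.0)
print(scale_levels(8.0, 2.0, 5))
```

Because the range is centered on the previous detection scale in a multiplicative sense, the detector can follow both expansion and contraction of a feature from frame to frame.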

Matching. The matching is based on the similarity measures described in section 5. The original feature is matched to the candidates obtained from the detection step and the winner is the feature having the highest combined similarity value above a fixed threshold $T_{comb}$ and a patch correlation value above a threshold $T_{patch}$. These thresholds are necessary to suppress false matches when features disappear due to e.g. occlusion.

If a feature is matched, the quality value is increased by $dq_i$ and its position, its scale descriptor, its significance value and its grey-level patch are updated.

If no match is found, the feature is considered unmatched, its quality value is decreased by $dq_d$ and its position is set to the predicted position.

Finally, for each frame, the feature set is parsed to detect feature merges and to remove features having quality values below a threshold $T_q$. When two features merge, their trajectories are terminated and a new trajectory is initiated. In this way, we obtain more reliable feature trajectories for further processing.
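The matching decision and the quality bookkeeping described above can be sketched as follows. Features and candidates are represented as plain dictionaries, which is an assumption of this example, and the update of the significance value and the grey-level patch as well as the merge detection are omitted for brevity. The threshold and increment values in the usage part are placeholders.

```python
def select_winner(candidates, t_comb, t_patch):
    """Pick the candidate with the highest combined similarity among those
    that pass both thresholds. Each candidate is a dict with the keys
    's_comb' and 's_patch'. Returns None when no candidate qualifies."""
    admissible = [c for c in candidates
                  if c["s_comb"] > t_comb and c["s_patch"] > t_patch]
    return max(admissible, key=lambda c: c["s_comb"], default=None)

def update_feature(feature, winner, predicted, dq_i, dq_d):
    """Apply the matched or unmatched update to a feature dict in place."""
    if winner is not None:
        feature["quality"] += dq_i
        feature["position"] = winner["position"]
        feature["t_det"] = winner["t_det"]
    else:
        feature["quality"] -= dq_d
        feature["position"] = predicted
    return feature

def prune(features, t_q):
    """Drop features whose quality value has fallen below the threshold t_q."""
    return [f for f in features if f["quality"] >= t_q]

feature = {"position": (10.0, 10.0), "t_det": 4.0, "quality": 1.0}
candidates = [
    {"position": (11.0, 10.0), "t_det": 4.4, "s_comb": 0.8, "s_patch": 0.9},
    {"position": (14.0, 12.0), "t_det": 9.0, "s_comb": 0.9, "s_patch": 0.3},
]
winner = select_winner(candidates, t_comb=0.5, t_patch=0.6)
update_feature(feature, winner, predicted=(11.5, 10.0), dq_i=1.0, dq_d=2.0)
print(feature)
```

In the example the second candidate has the higher combined value but fails the patch correlation threshold, so the first candidate wins. Keeping an unmatched feature at its predicted position, with reduced quality, allows it to be recovered in later frames before it is finally pruned.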


7 Experimental results

7.1 Corner tracking

Let us first demonstrate the performance of the algorithm when applied to an image sequence consisting of 60 frames. In this sequence, the camera moves in a fairly complex way relative to a static scene. The objects of interest on which the features (here corners) are detected are a telephone and a package on a table. From the junctions detected in the initial frame, a subset of 14 features was selected manually as shown in figure 5.

Figure 6 shows the situation after 30, 50 and 60 frames. In the illustrations, black segments on the trajectories indicate matched positions, while white segments show unmatched (predicted) positions. The matching is based on the combined similarity measure incorporating patch correlation, scale stability, significance stability and proximity. The detection scales of the features are illustrated by the size of the circles in the images, and we see how all corners are detected at fine scales in the initial frame. As time evolves, the detection scales adapt to the size changes of the image structures; tracked sharp corners are still detected at fine scales while blunt corners are detected at coarser scales when the camera approaches the scene.

Figure 7 shows the result of an attempt to track the same corners at fixed scales, using the automatically determined detection scales from the initial image. As can be seen, the sharpest corners are correctly tracked but the blunt corners are inevitably lost. This effect is similar to the initial illustration in section 2.

Figure 8 shows another example for a camera tracking a toy train on a table. In the initial frame, 29 corners were selected manually; 25 on the train and 4 on an object in the background. Some of these corners are enumerated and will be referred to when discussing the performance below.

Corner no   Patch similarity only   Combined similarity measure
1           lost in frame 29        lost in frame 29
2           mismatched in 18        mismatched in 18
3           mismatched in 16        mismatched in 16
4           lost in 83              -
5           mismatched in 63        -
6           lost in 81              lost in 75
7           lost in 33              -
8           lost in 46              lost in 46

Table 1: Table showing when eight of the enumerated corners in the train sequence are lost. Note that out of the corners which are lost when matching on patch similarity only, three corners are tracked during the whole sequence when using the combined similarity measure.

Figure 9 shows the situation after 60, 100 and 140 frames, using the combined [...] show when the algorithm failed to match the corners (stressing the importance of keeping unmatched features over a certain number of frames). Noisy image data and motion blur will increase the number of matching failures. Corners no 2, 3, 6 and 8 are lost due to moving structures in the background causing accidental views. In the last frames of the sequence, corner no 9 has poor localization, since the corner edges are aligned, causing the corner to disappear. The importance of using the combined similarity measure in the matching step is illustrated in the train sequence in figure 10, showing the result of matching on patch correlation only. We see that corners no 4, 5, and 7, which were all tracked using the combined similarity measure, now are lost. Table 1 shows, for both experiments, when the enumerated corners in the train sequence are lost.

7.2 Blob tracking

Let us now apply the same framework for blob tracking. In the train sequence, we manually selected 11 blobs on the train and 2 blobs in the background in the initial frame shown in figure 11. Figure 12 shows the situation after 30, 90 and 150 frames. The size of the circles in the figures corresponds to the detection scales of the blobs. Note how the detection scale adapts to the local image structure when the blobs undergo expansion followed by contraction. All visible blobs except one are tracked during the whole sequence.

Referring to the need for automatic scale selection in feature tracking, as advocated in section 2, it is illustrative to show the results of attempting blob tracking with feature detection at a fixed scale. The scale level for detecting each blob was automatically selected in the first frame and was then kept fixed throughout the sequence. Figure 13 shows the result after 30 and 150 frames. Clearly, the tracker has severe problems due to the expansion and contraction in the sequence.

As a further illustration of the capability of the algorithm to track blobs under large size changes, we applied it to a sequence of 87 images where a person, dressed in a spotted shirt, approaches the camera. In a rectangular area in the initial frame, the 20 most significant blobs were automatically detected, as shown in figure 14. Figure 15 shows the results after 25, 50 and 87 frames when matching on the combined similarity measure. All blobs except one are correctly tracked over the entire sequence.

Figure 16 shows the situation after 25 frames when matching on patch similarity only. Compared to figure 15, three more blobs are now lost, and one blob is mismatched. In scenes like this one, with repetitive, similar structures, the rate of mismatches is considerably higher if we match on patch correlation only instead of using the combined similarity measure.

When trying to track the blobs at a fixed scale, as can be seen in figure 17, most of the blobs are lost already after 25 frames. The last correctly tracked blob is lost after about 50 frames.

In summary, these experiments show that similar qualitative properties hold for blob tracking and for junction tracking: (i) By including the significance values and the selected scale levels in the matching criterion, we obtain a better performance than when matching on grey-level correlation only. (ii) The performance of tracking at adaptively determined scale levels is superior compared to similar tracking at a fixed scale.

Let us finally illustrate how feature tracking with automatic scale selection
over a large number of frames is likely to give us trajectories which correspond
to reliable and stable physical scene points or regions of interest on objects.
By explicitly registering the features that are stable over time, we are able
to suppress spurious feature responses due to noise, temporary occlusions etc.
Figure 18 shows the initial frame of a sequence in which the 10 most significant
blobs have been tracked in a region around the face of the subject. The subject
first approaches the camera and then moves back to the initial position. Figure
19 shows the situation after 20, 45 and 90 frames. We can see that after a while
only four features remain in the feature set and these are the stable features
corresponding to the nostrils and the eyes. This ability to register stable image
structures over time is clearly a desirable quality in many computer vision
applications. Notably, for general scenes with large expansions or contractions,
a scale selection mechanism is essential to allow for such registrations.


8 Summary and Discussion

We have presented a framework for feature tracking in which a mechanism for
automatic scale selection has been built into the feature detection stage and
the additional attributes of the image features obtained from the scale selection
module are used for guiding the other processing steps in the tracking procedure.

We have argued that such a mechanism is essential for any feature tracking
procedure intended to operate in a complex environment, in order to adapt the
scale of processing to the size variations that may occur in the image data as
well as over time. If we attempt to track features by processing the image data
at one single scale only, we can hardly expect to be able to follow the features
over large size variations. This property is a basic consequence of the inherent
multi-scale nature of image structures, which means that a given object may
appear in different ways depending on the scale of observation.

Specifically, based on a previously developed feature detection framework
with automatic scale selection, we have presented a scheme for tracking corners
and blobs over time in which:

• the image features at any time moment are detected using a feature
  detection method with automatic scale selection, and
• this information is used for
  - guiding the detection and selection of new feature candidates,
  - providing context information for the matching procedure,
  - formulating a similarity measure for matching features over time.

Besides avoiding explicit selection of scale levels for feature detection, the
feature detection procedure with automatic scale selection allows us to track image
features over large size variations. As demonstrated in the introductory example
in section 2, we can in this way obtain a substantial improvement in the
performance relative to a fixed-scale feature tracker.

Since the scale levels obtained from the scale selection procedure reflect the
spatial extent of the image structures, we can also use this context information
for avoiding explicit settings of distance thresholds and predefined window sizes
for matching. Moreover, by including the scale and significance information
associated with the image features from the scale selection procedure into a
multi-cue similarity measure, we showed how the reliability of the low-level
matching procedure can be improved.

Of course, there are inherent limitations in tracking each feature individually
as done in this work, and as can be seen from the examples, there are a number
of situations where the tracking algorithm fails. Typically, this occurs because
of rapid changes in the local grey-level pattern around the corner, corresponding
to violations of the assumption about small inter-frame motions.

A notable conclusion that can be made in this context is that despite these
limitations, we have shown by examples that the resulting tracking procedure
is able to track most of the visible features that can be followed over time


framework presented here provides an important step towards overcoming some
of the limitations in previous feature tracking algorithms.

8.1 Spatial consistency and statistical evaluation

In the scheme presented so far, each feature is tracked individually, without any
explicit notion of coherently moving clusters. It is obvious that the performance
of a tracking method can be improved if the latter notion can be introduced, and
the overall motion of the clusters can be used for generating better predictions,
as well as more refined evaluation criteria of matching candidates. To investigate
if the motions of the tracked features possibly correspond to the same rigid
body motion, we might compute descriptors such as affine 3-D coordinates.
Interesting work in this direction has been presented by [Reid and Murray,
1993, Wiles and Brady, 1995, Shapiro, 1995].

It is also natural to include a statistical evaluation of the reliability of
matches as well as their possible agreement with different clusters, as done in
[Shapiro, 1995]. Whereas such an approach has not been explored in this work,
this should not be interpreted as implying that the scale selection method
excludes the usefulness of a statistical evaluation. The main intention behind this
work has been to explore how far it is possible to reach by using a bottom-up
construction of feature trajectories and by including a mechanism for automatic
scale selection in the feature detection step. Then, the intention is that these
two approaches should be applied in a complementary manner, where the scale
selection method serves as a pre-conditioner for generating more reliable
hypotheses with more reliable input data. The scale selection method can also
provide context information over what domains statistical evaluations should
be made.

8.2 Multi-cue tracking

A tracking method based on a single visual cue, like those reviewed in section
1, may perform rather well under certain conditions but may fail
in more complex scenes. In this context, a multi-cue approach to the tracking
problem is natural, i.e. a system in which several types of algorithms operate
simultaneously and the algorithm most suitable to a given situation dominates.
This means that the vision system must have the ability to evaluate the
reliability of the various tracking methods and to switch between them in an
appropriate way.

Initial work in this direction, combining disparity cues with optical flow
based object segmentation, has been performed by [Uhlin et al., 1995]. The
approach developed here lends itself naturally to integration with such techniques,
in which such cues can be used for evaluating candidate feature clusters, and
the feature tracking module in turn can be used as a more refined processing
mechanism for maintaining object hypotheses over time. Of course, this leads to
basic problems of feature selection. One possible approach for addressing such


8.3 Temporal consistency

As a final remark it is worth pointing out that in this work, the image features
in each frame have been extracted independently from each other and without
any other explicit use of temporal consistency than the heuristic condition that
a feature hypothesis is allowed to survive over a few frames. To make more
explicit use of temporal consistency, it is natural to incorporate the notion of
a temporal scale-space representation [Lindeberg and Fagerström, 1996] and
to include scale selection over the temporal scale domain as well [Lindeberg,
1996b].

In this context, it is also natural to combine the feature tracking approach
with a simultaneous calculation of optical flow estimates and to integrate these
two approaches so as to make use of their relative advantages. These subjects,
including the integration of multiple tracking techniques into a multi-cue
framework, constitute major goals of our continued research.


References

[Blake et al., 1993] Blake et al. "Affine-invariant contour tracking with automatic
control of spatiotemporal scale". In Proc. 4th International Conference on Computer
Vision, Berlin, Germany, 1993. IEEE Computer Society Press.

[Blom, 1992] J. Blom. Topological and Geometrical Aspects of Image Structure. PhD
thesis, Dept. Med. Phys. Physics, Univ. Utrecht, NL-3508 Utrecht, Netherlands,
1992.

[Cipolla and Blake, 1992] R. Cipolla and A. Blake. "Surface orientation and time to
contact from image divergence and deformation". In G. Sandini, editor, Proc. 2nd
European Conference on Computer Vision, pages 187-202, Santa Margherita Ligure,
Italy, 1992. Springer Verlag, Berlin.

[Curwen et al., 1991] Curwen et al. "Parallel implementation of Lagrangian dynamics
for real-time snakes". In Proc. British Machine Vision Conference. Springer Verlag,
Berlin, 1991.

[Deriche and Giraudon, 1990] R. Deriche and G. Giraudon. "Accurate Corner
Detection: An Analytical Study". In Proc. 3rd Int. Conf. on Computer Vision, pages
66-70, Osaka, Japan, 1990.

[Faugeras, 1993] O. Faugeras. Three-dimensional computer vision. MIT Press,
Cambridge, Massachusetts, 1993.

[Florack et al., 1992] L. M. J. Florack; B. M. ter Haar Romeny; J. J. Koenderink, and
M. A. Viergever. "Scale and the Differential Structure of Images". Image and Vision
Computing, 10(6):376-388, Jul. 1992.

[Förstner and Gülch, 1987] W. A. Förstner and E. Gülch. "A Fast Operator for
Detection and Precise Location of Distinct Points, Corners and Centers of Circular
Features". In Proc. Intercommission Workshop of the Int. Soc. for Photogrammetry
and Remote Sensing, Interlaken, Switzerland, 1987.

[Gee and Cipolla, 1995] A. H. Gee and R. Cipolla. "Fast visual tracking by
temporal consensus". Technical Report CUED/F-INFENG/TR207, Dept of Engineering,
University of Cambridge, England, 1995.

[Kitchen and Rosenfeld, 1982] L. Kitchen and A. Rosenfeld. "Gray-Level Corner
Detection". Pattern Recognition Letters, 1(2):95-102, 1982.

[Koenderink and Richards, 1988] J. J. Koenderink and W. Richards.
"Two-Dimensional Curvature Operators". J. of the Optical Society of America,
5(7):1136-1141, 1988.

[Koenderink and van Doorn, 1992] J. J. Koenderink and A. J. van Doorn. "Generic
neighborhood operators". IEEE Trans. Pattern Analysis and Machine Intell.,
14(6):597-605, Jun. 1992.

[Koenderink, 1984] J. J. Koenderink. "The structure of images". Biological
Cybernetics, 50:363-370, 1984.

[Koller et al., 1994] D. Koller; J. Weber, and J. Malik. "Robust multiple car tracking
with occlusion reasoning". In J.-O. Eklundh, editor, Proc. 3rd European Conference
on Computer Vision, pages 189-196, Stockholm, Sweden, 1994. Springer Verlag,
Berlin.

[Lindeberg and Fagerström, 1996] T. Lindeberg and D. Fagerström. "Scale-Space with
causal time direction". In Proc. 4th European Conference on Computer Vision,


[Lindeberg, 1993] T. Lindeberg. "On Scale Selection for Differential Operators". In
K. Heia, K. A. Høgdra, B. Braathen, editors, Proc. 8th Scandinavian Conf. on Image
Analysis, pages 857-866, Tromsø, Norway, May 1993. Norwegian Society for Image
Processing and Pattern Recognition.

[Lindeberg, 1994a] T. Lindeberg. "Junction detection with automatic selection of
detection scales and localization scales". In Proc. 1st International Conference on
Image Processing, volume I, pages 924-928, Austin, Texas, Nov. 1994. IEEE
Computer Society Press.

[Lindeberg, 1994b] T. Lindeberg. "Scale Selection for Differential Operators".
Technical Report ISRN KTH/NA/P--94/03--SE, Dept. of Numerical Analysis and
Computing Science, KTH, Stockholm, Sweden, Jan. 1994. (Submitted).

[Lindeberg, 1994c] T. Lindeberg. Scale-Space Theory in Computer Vision. The Kluwer
International Series in Engineering and Computer Science. Kluwer Academic
Publishers, Dordrecht, Netherlands, 1994.

[Lindeberg, 1996a] T. Lindeberg. "Edge detection and ridge detection with automatic
scale selection". In Proc. IEEE Comp. Soc. Conf. on Computer Vision and Pattern
Recognition, 1996, pages 465-470, San Francisco, California, June 1996. IEEE
Computer Society Press.

[Lindeberg, 1996b] T. Lindeberg. "On automatic selection of temporal scales". 1996.

[Meyer and Bouthemy, 1994] F. G. Meyer and P. Bouthemy. "Region-based tracking
using affine motion models in long image sequences". Computer Vision, Graphics,
and Image Processing: Image Understanding, 60(2):119-140, 1994.

[Reid and Murray, 1993] I. D. Reid and D. W. Murray. "Tracking foveated corner
clusters using affine structure". In Proc. 4th International Conference on Computer
Vision, pages 76-83, Berlin, Germany, 1993. IEEE Computer Society Press.

[Shapiro, 1995] L. S. Shapiro. Affine analysis of image sequences. Cambridge
University Press, Cambridge, England, 1995.

[Shapiro et al., 1992a] L. S. Shapiro; H. Wang, and J. M. Brady. "A corner matching
and tracking strategy applied to videophony". Technical Report OUEL 1933/92,
Robotics Research Group, University of Oxford, 1992.

[Shapiro et al., 1992b] L. S. Shapiro; H. Wang, and J. M. Brady. "A matching and
tracking strategy for independently moving objects". In Proc. British Machine
Vision Conference, pages 306-315. Springer Verlag, Berlin, 1992.

[Shi and Tomasi, 1994] J. Shi and C. Tomasi. "Good features to track". In Proc.
IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition, pages 593-600.
IEEE Computer Society Press, 1994.

[Smith and Brady, 1995] S. M. Smith and J. M. Brady. "ASSET-2: Real-time motion
segmentation and shape tracking". IEEE Trans. Pattern Analysis and Machine
Intell., 17(8):814-820, 1995.

[Thompson et al., 1993] W. B. Thompson; P. Lechleider, and E. R. Stuck. "Detecting
moving objects using the rigidity constraint". IEEE Trans. Pattern Analysis and
Machine Intell., 15(2):162-165, 1993.

[Uhlin et al., 1995] T. Uhlin; P. Nordlund; A. Maki, and J.-O. Eklundh. "Towards
an Active Visual Observer". In Proc. 5th International Conference on Computer


[Wiles and Brady, 1995] C. S. Wiles and M. Brady. "Closing the loop on multiple
motions". In Proc. 5th International Conference on Computer Vision, pages 308-313.
IEEE Computer Society Press, 1995.

[Witkin, 1983] A. P. Witkin. "Scale-space filtering". In Proc. 8th Int. Joint Conf. Art.
Intell., pages 1019-1022, Karlsruhe, West Germany, Aug. 1983.

[Zheng and Chellappa, 1995] Q. Zheng and R. Chellappa. "Automatic feature point
extraction and tracking in image sequences for arbitrary camera motion".
International Journal of Computer Vision, 15(1):31-76, 1995.


A Algorithmic details

This appendix gives a detailed listing of the parameters that influence the algorithm
as well as the parameter settings that have been used for generating the experiments.

A.1 Prediction

The parameters determining the size of the search window (see section 6) were

    $k_{size} = 5$
    $k_{w1} = 1.5$
    $k_{w2} = 2 k_{w1}$
    $D_{min} = 16$

A.2 Feature detection

When detecting features with automatic scale selection, the following scale ranges were
used in the initial frame:

    Junction detection      Blob detection
    $t_{min} = 4.0$         $t_{min} = 4.0$
    $t_{max} = 256.0$       $t_{max} = 512.0$

and the parameter $\gamma$ in the normalized derivative concept (see section 3) was set to:

    $\gamma = 0.875$        $\gamma = 1$

When searching for new image features, the search for matching candidates to a feature
detected at scale $t_{det}$ was performed in the interval
$[t_{det} / k_{range}, \; t_{det} \, k_{range}]$, where $k_{range} = 3$.

In all experiments, the sampling density in the scale direction was set to correspond
to a minimum of 5 scale levels per octave. In all other aspects, the feature detection
algorithms followed the default implementation of junction and blob detection with
automatic scale selection described in [Lindeberg, 1994b]. The maximum number of
matching candidates evaluated for each feature was:

    $n = 8$                 $n = 20$
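
As an illustration of the scale sampling described above, the following minimal sketch generates geometrically spaced scale levels over the search interval around a detection scale. It assumes that an octave means a factor of two in the scale parameter t, and the function name `scale_levels` is hypothetical; the default implementation referred to in [Lindeberg, 1994b] may sample the scale direction differently.

```python
import math

def scale_levels(t_det, k_range=3.0, levels_per_octave=5):
    """Geometric scale levels covering [t_det / k_range, t_det * k_range].

    Assumption (not stated in the text): one octave is a factor of 2 in t.
    """
    t_lo, t_hi = t_det / k_range, t_det * k_range
    octaves = math.log2(t_hi / t_lo)
    # Smallest number of steps giving at least `levels_per_octave` steps per octave.
    n_steps = max(1, math.ceil(octaves * levels_per_octave))
    ratio = (t_hi / t_lo) ** (1.0 / n_steps)
    return [t_lo * ratio ** i for i in range(n_steps + 1)]

# Example: search interval around a feature detected at t_det = 16.
levels = scale_levels(16.0)
```

With k_range = 3 the interval spans a factor of nine in t, so the sketch rounds the number of steps up rather than down in order to respect the stated minimum sampling density.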

A.3 Matching

The following thresholds were used in the matching step

    $T_{patch} = 0.75$      $T_{patch} = 0.6$
    $T_{comb} = 0.65$       $T_{comb} = 0.5$

and the parameters for controlling the quality measure over time (see section 6)

    $dq_i = 0.2$
    $dq_d = 0.1$
    $T = 0$
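
To make the role of the quality parameters concrete, here is a small sketch of how a per-feature quality value could evolve over time. It reads dq_i as the increment applied after a successful match, dq_d as the decrement applied after a failed match, and T as the threshold below which a feature is removed. This reading is an assumption based on the algorithm overview in figure 4; the exact rule is defined in section 6. The starting quality of 0 and the function names are likewise hypothetical.

```python
DQ_I = 0.2      # dq_i: assumed increment after a successful match
DQ_D = 0.1      # dq_d: assumed decrement after a failed match
T_REMOVE = 0.0  # T: assumed removal threshold

def update_quality(quality, matched):
    """Return the quality value after one frame."""
    return quality + DQ_I if matched else quality - DQ_D

def frames_survived(history, initial_quality=0.0):
    """Count the frames processed before the quality drops below T_REMOVE.

    `history` is a sequence of booleans, True for a matched frame.
    """
    q = initial_quality
    for n, matched in enumerate(history):
        q = update_quality(q, matched)
        if q < T_REMOVE:
            return n
    return len(history)
```

Under these assumptions, a feature that is never matched is discarded in its first frame, whereas each successful match buys tolerance for later misses, which corresponds to the heuristic that a feature hypothesis may survive over a few unmatched frames.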


Similarity measures: Relative weights. In the experiments presented here, the
following relative weights (see section 5) were used in the combined significance measure
(15):

    Junction detection      Blob detection
    $c_{patch} = 1.0$       $c_{patch} = 1.0$
    $c_{sign} = 0.08$       $c_{sign} = 0.25$
    $c_{scale} = 0.08$      $c_{scale} = 0.08$
    $c_{pos} = 0.1$         $c_{pos} = 0.1$

To give a qualitative motivation for using these orders of magnitude for the relative
weights, let us first estimate the ranges in which these descriptors will vary:

• For the cross-correlation measure, it trivially holds that $|S_{patch}| < 1$. By the
  thresholding operation on this value, $|T_{patch}| = 0.7$, the variation of this entity
  is confined to the interval $|S_{patch}| \in [0.7, 1.0]$. In practice, the relative variations
  are usually in the interval $|S_{patch}| \in [0.8, 1.0]$.

• Concerning the significance measure, the significance values of corners computed
  from an image with grey-level values in the range $[0, 255]$ typically vary in the
  interval $\log R < 25$. Empirically, the relative variations are usually of the order
  of $\Delta \log R < 3$. For blob features, the corresponding values are $\log R < 8$ and
  $\Delta \log R < 1$.

• Concerning the stability of the scale values, the restricted search range given by
  $k_{range}$ implies that the relative variation of this descriptor will always be less
  than $\Delta \log t \approx 1$.

• For the proximity measure the maximum value is
  $\sqrt{2} \cdot 0.5 \, k_{range} \, k_{w1} \approx 5$.
  With smooth scene motions the value is normally considerably smaller.

Motivated by the fact that the relative variation in $S_{patch}$ is about a factor of ten
smaller than the other entities, the relative weights of the components in $S_{comb}$ were
set according to the table above.

Note that the correlation measure is the dominant component, and the relative
influence of the other components corresponds to about half that variation.

The reason why $c_{sign}$ is increased in blob detection is that the dimensions of the
significance measures are different:

    $[\tilde{\kappa}^2_{norm}] = [\text{brightness}]^6$
    $[(\nabla^2_{norm} L)^2] = [\text{brightness}]^2$

Hence, it is natural to increase the coefficient of $S_{sign} = |\log (R_B / R_A)|$ by a factor
of three in blob detection compared to junction detection. As a general rule, we have not
performed any fine-tuning of the parameters, and all parameter values have been the same in
all experiments.
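
The relative weights listed in this appendix can be illustrated with a minimal sketch of a weighted multi-cue score. The exact form of the combined measure (15) is given in section 5 and is not reproduced here. The sketch simply assumes that the patch correlation contributes positively and that the significance, scale and position terms act as penalties. Only S_sign = |log(R_B / R_A)| is taken from the text; the analogous form |log(t_B / t_A)| for the scale term, the pre-normalized position term `s_pos` and all names are assumptions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Weights:
    patch: float
    sign: float
    scale: float
    pos: float

# Relative weights from the table in this appendix.
JUNCTION = Weights(patch=1.0, sign=0.08, scale=0.08, pos=0.1)
BLOB = Weights(patch=1.0, sign=0.25, scale=0.08, pos=0.1)

def combined_similarity(s_patch, r_a, r_b, t_a, t_b, s_pos, w):
    """Hypothetical weighted combination; the form of (15) is assumed, not quoted."""
    s_sign = abs(math.log(r_b / r_a))   # significance difference, as in the text
    s_scale = abs(math.log(t_b / t_a))  # assumed analogous scale difference
    return w.patch * s_patch - w.sign * s_sign - w.scale * s_scale - w.pos * s_pos

# A candidate with twice the significance of the tracked feature, otherwise identical.
score_junction = combined_similarity(0.9, 10.0, 20.0, 8.0, 8.0, 0.0, JUNCTION)
score_blob = combined_similarity(0.9, 10.0, 20.0, 8.0, 8.0, 0.0, BLOB)
```

Under this assumed form, the same change in significance lowers the blob score about three times as much as the junction score, in line with the ratio of the two c_sign values.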


[Panels: Initial frame; Fixed scale tracking; Adaptive scale tracking]

Figure 1: Illustration of the importance of automatic scale selection when tracking
image structures over time. The corner is lost using detection at a fixed scale (left
column), whereas it is correctly tracked using adaptive scale selection (right column).
The size of the circles corresponds to the detection scales of the corner features.


Figure 2: The result of applying the corner detection algorithm with automatic
scale selection to two different grey-level images. (top row) Original grey-level
images. (bottom row) The 100 most significant corners superimposed onto a
bright copy of the original image. Graphically, each corner is illustrated by a
circle with the radius reflecting the detection scale. Observe that a reasonable
set of junction candidates is obtained, and that the circles serve as natural
regions of interest around the corners to be used in further processing.

Figure 3: The result of applying the blob detection algorithm with automatic
scale selection to the same images as used for corner detection in figure 2. The
100 most significant blobs have been graphically illustrated by circles with their


Algorithm:

For each frame:
    For each feature F in the feature set:
        1. Prediction
            1.1 Predict the position of the feature F in the current frame based on
                information from the previous frames.
            1.2 Compute the search region in the current frame based on information
                from the previous frames and the scale of the feature.
        2. Detection
            Detect n candidates C_k over a reduced set of scales in the region of
            interest in the current frame.
        3. Matching
            3.1 Match every candidate C_k to the feature F and find the best match
                using the combined similarity measure.
            3.2 Optionally, perform bidirectional matching to register safe matches.
            3.3 Compare the similarity value to a predetermined threshold:
                If above: consider the feature as matched; update its position, its
                scale descriptor, its significance value, its grey-level patch and
                increase its quality value.
                If below: consider the feature as unmatched; update its position to
                the predicted position and decrease its quality value.
    Parse the feature set to detect feature merges and remove features having
    quality values below a certain threshold.

Figure 4: Overview of the feature tracking algorithm.
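
The control flow of figure 4 can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation used in the experiments: prediction, candidate detection and the similarity measure are passed in as hypothetical callables (`predict`, `detect`, `similarity`), and bidirectional matching, merge detection and the update of significance values and grey-level patches are omitted. The quality increment and decrement default to the values 0.2 and 0.1 listed in appendix A.3, under the assumption that they play these roles.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Feature:
    pos: Tuple[float, float]
    scale: float
    quality: float = 0.0

def track_frame(features, predict, detect, similarity, threshold,
                dq_inc=0.2, dq_dec=0.1, q_min=0.0):
    """Process one frame and return the features that remain in the feature set."""
    survivors = []
    for f in features:
        predicted = predict(f)                    # step 1.1: predicted position
        candidates = detect(predicted, f.scale)   # steps 1.2 and 2: candidates near it
        # Step 3.1: best candidate under the supplied similarity measure.
        best = max(candidates, key=lambda c: similarity(f, c), default=None)
        if best is not None and similarity(f, best) > threshold:
            # Step 3.3, above threshold: adopt the candidate, raise the quality.
            f.pos, f.scale = best.pos, best.scale
            f.quality += dq_inc
        else:
            # Step 3.3, below threshold: move to the prediction, lower the quality.
            f.pos = predicted
            f.quality -= dq_dec
        if f.quality >= q_min:                    # drop features of too low quality
            survivors.append(f)
    return survivors
```

Calling `track_frame` once per frame with the surviving feature set from the previous frame reproduces the outer loop of the figure. Passing the detection step the scale of the tracked feature is what allows the search region and the scale interval to follow the size changes of the image structure.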


Figure 5: The phone sequence: The initial frame with 14 detected corners.


Figure 6: Corner tracking with adaptive scale selection and matching on combined
similarity: the tracked corners in the phone sequence after 30 frames (top), 50 (middle)
and 60 frames (bottom). As can be seen, all corners are correctly tracked.


Figure 7: Corner tracking with fixed scales over time: the tracked corners in the phone
sequence after 30 frames (top), 50 (middle) and 60 frames (bottom). Note that the
blunt corners are lost compared to the adaptive scale tracking in figure 6.


Figure 8: The train sequence: The initial frame with 29 detected corners.


Figure 9: Corner tracking with adaptive scale selection and matching on combined
similarity: the tracked corners in the train sequence after 60 frames (top), 100 (middle)
and 140 frames (bottom).


Figure 10: Matching candidates on patch correlation only: the tracked corners in the
train sequence after 60 frames (top) and 100 frames (bottom). Three more corners are
lost as compared to figure 9.


Figure 11: The train sequence: The initial frame with 13 detected blobs. (The size of
the circles corresponds to the detection scales of the blob features.)


Figure 12: Blob tracking with adaptive scale selection and matching on combined
similarity: the tracked blobs in the train sequence after 30 frames (top), 90 (middle)
and 150 frames (bottom). All blobs are correctly tracked.


Figure 13: Blob tracking using fixed scales in the detection procedure: the tracked
blobs in the train sequence after 30 frames (top), 90 (middle) and 150 frames (bottom).
Only one blob is correctly tracked over the whole sequence.


Figure 14: The initial frame of the shirt sequence with the 20 strongest blobs detected
in a rectangular window. (The size of the circles corresponds to the detection scales of
the blob features.)


Figure 15: Blob matching using the combined similarity measure: the tracked blobs in
the shirt sequence after 25 frames (top), 50 frames (middle) and 87 frames (bottom).
Note how the scales, illustrated by the size of the circles, adapt to the size changes of
the image structures.


Figure 16: Matching the candidates on patch similarity only: the tracked blobs in the
shirt sequence after 25 frames. Compared to the top image in figure 15, three more
blobs are lost and one is mismatched.

Figure 17: Blob tracking using fixed scales in the detection procedure: the tracked
blobs in the shirt sequence after 25 frames. Most blobs are already lost because they
no longer exist at the initially chosen scale.


Figure 18: The initial frame of the face sequence with the 10 most significant blobs
detected in a region around the face of the subject.


Figure 19: Tracking the blobs in the face sequence with automatic scale selection; the
situation after 20, 45 and 90 frames. After about 60 frames only the 4 most stable blobs
remain in the feature set.