Lars Bretzner and Tony Lindeberg
Computational Vision and Active Perception Laboratory (CVAP),
Department of Numerical Analysis and Computing Science,
KTH, S-100 44 Stockholm, Sweden.
Email: bretzner@bion.kth.se, tony@bion.kth.se
Technical report ISRN KTH/NA/P--96/21--SE.
Abstract
When observing a dynamic world, the size of image structures may vary over time. This article emphasizes the need for including explicit mechanisms for automatic scale selection in feature tracking algorithms in order to: (i) adapt the local scale of processing to the local image structure, and (ii) adapt to the size variations that may occur over time.
The problems of corner detection and blob detection are treated in detail, and a combined framework for feature tracking is presented in which the image features at every time moment are detected at locally determined and automatically selected scales. A useful property of the scale selection method is that the scale levels selected in the feature detection step reflect the spatial extent of the image structures. Thereby, the integrated tracking algorithm has the ability to adapt to spatial as well as temporal size variations, and can in this way overcome some of the inherent limitations of exposing fixed-scale tracking methods to image sequences in which the size variations are large.
In the composed tracking procedure, the scale information is used for two additional major purposes: (i) for defining local regions of interest for searching for matching candidates as well as setting the window size for correlation when evaluating matching candidates, and (ii) stability over time of the scale and significance descriptors produced by the scale selection procedure is used for formulating a multi-cue similarity measure for matching.
Experiments on real-world sequences are presented showing the performance of the algorithm when applied to (individual) tracking of corners and blobs. Specifically, comparisons with fixed-scale tracking methods are included as well as illustrations of the increase in performance obtained by using multiple cues in the feature matching step.
Keywords: feature, tracking, motion, blob, corner, scale, scale-space, scale selection, similarity, computer vision
Contents
1 Introduction
2 The need for automatic scale selection in feature tracking
3 Feature detection with automatic scale selection
  3.1 Normalized derivatives
  3.2 Corner detection with automatic scale selection
  3.3 Blob detection with automatic scale selection
4 Tracking and prediction in a multi-scale context
5 Matching on multi-cue similarity
6 Combined tracking algorithm
7 Experimental results
  7.1 Corner tracking
  7.2 Blob tracking
8 Summary and Discussion
  8.1 Spatial consistency and statistical evaluation
  8.2 Multi-cue tracking
  8.3 Temporal consistency
A Algorithmic details
  A.1 Prediction
  A.2 Feature detection
  A.3 Matching
1 Introduction
Being able to track image structures over time is a useful and sometimes necessary capability for vision systems intended to interact with a dynamic world. There are several computer vision algorithms in which tracking arises as an important subproblem. Some situations are:
- Fixation means maintaining a relationship between a physical point or region in the world and some (usually central) region in a camera system. To maintain such a relationship over time, we have to relate some characteristic properties of the physical point to entities that are measurable from the available image data.
- Object recognition in a dynamically varying environment gives rise to the same type of problem, including the case when the visual agent is active and moves relative to the scene. Examples of the latter are navigation as well as active scene exploration. When objects move relative to the observer, feature tracking is a useful processing step for preserving the identity of image features over time.
- The identity problem is also essential in algorithms for motion segmentation and structure from motion. To compute structural properties or invariant descriptors which depend on the temporal variation of a geometric configuration, some mechanism is needed for matching corresponding image features over time.
There is an extensive literature on tracking methods operating without specific a priori knowledge about the world, such as object models or highly restricted domains. Without any aim of giving an extensive survey, the work in this direction can be classified into three main categories:
Correlation based tracking. The presumably earliest approach to image matching is the correlation technique based on the similarity between corresponding grey-level patches over time. Given a window of some size, which covers an image detail at a certain time moment, the corresponding detail at the next time moment is defined as the position of the window (of the same size) that gives the highest correlation score when compared to the previous patch.
Optical flow based tracking. The definition of an optic flow field gives rise to a motion field in the image domain, which can be interpreted as the result of tracking all image points simultaneously. With respect to the tracking problem, the motion of coherently moving (and possibly segmented) regions computed from optic flow algorithms can be used for guiding tracking procedures, as shown by [Thompson et al., 1993] and [Meyer and Bouthemy, 1994].
Feature tracking. Over the years a large number of approaches have been developed for tracking image features such as edges and corners over time. Essentially, what characterizes a feature tracking method is that image features are first extracted and then used as the main primitives for the tracking and matching procedures. Concerning corner tracking, [Shapiro et al., 1992b] detect and track corners individually in an algorithm originally aimed at applications such as videoconferencing. [Smith and Brady, 1995] track a large set of corners and use the results in a flow-based segmentation algorithm. [Zheng and Chellappa, 1995] have studied feature tracking when compensating for camera motion, and [Gee and Cipolla, 1995] track locally darkest points with applications to pose estimation. In contour tracking, [Blake et al., 1993, Curwen et al., 1991] use snakes to track moving, deforming image features. [Cipolla and Blake, 1992] apply such an approach to estimate time-to-contact, and [Koller et al., 1994] track combined motion and grey-level boundaries in traffic surveillance. An overview of different approaches to edge tracking can be found in the recent book by [Faugeras, 1993].
The subject of this article is to consider the domain of feature tracking and to complement previous works on this subject by addressing the problem of scale and scale selection in the spatial domain and by introducing new similarity measures in the matching step. In most previous works, the analysis is performed at a single predetermined scale. Here, we will emphasize and show by examples why it is useful to include an explicit mechanism for automatic scale selection to be able to handle situations in which the size variations are large. Besides avoiding explicit setting of scale levels for feature detection, and thus overcoming some of the fundamental limitations of processing image sequences at a single scale, it will be demonstrated how scale levels selected by a scale selection procedure can constitute a useful source of information when defining a similarity measure over time, as well as for adapting the window size for correlation to the local image structure.
Moreover, since the resulting matching algorithm we will arrive at is based on a similarity measure defined as the combination of different discriminative properties, and with small modifications can be applied to tracking of both corners and blobs, we will emphasize this multi-cue aspect as an important component for increasing the robustness of feature tracking algorithms.
The presentation is organized as follows: Section 2 illustrates the need for adaptive scale selection in feature tracking. It gives a hands-on demonstration of the improvement in performance that can be obtained by including a scale selection mechanism when tracking features in image sequences in which the size variations over time are large. Section 3 describes the feature detection step and reviews the basic components in a general principle for scale selection. Sections 4 and 5 explain how the scale information obtained from these processing modules can be used in the prediction step and in the evaluation of matching candidates. Section 6 summarizes how these components can be combined with a classical feature tracking scheme with prediction followed by detection and matching. Section 7 shows the performance of the algorithm when applied to real-world data. Feature tracking using adaptive scales is compared to tracking at one, fixed scale. Comparisons are also made between single-cue and multi-cue similarity measures. Finally, we conclude in section 8 by summarizing the main properties of the method and by outlining natural extensions.
2 The need for automatic scale selection in feature tracking
To extract features from an image, we have to apply some operators to the data. The type of features that can be extracted is largely determined by the spatial extent of these operators. When dealing with real-world data about which no or very little information is available, we can hardly expect to know in advance what scales are relevant for processing a given image. Therefore, a reasonable approach is to consider a large number of scales simultaneously, and this is one of the major motivations for using a multi-scale representation when automatically processing measurement data such as images.
Despite this now rather widespread insight, most work on feature tracking still performs the analysis at one scale only. For correlation based tracking methods, this corresponds to using a fixed-size window over time, and concerning feature tracking to detecting image features at the same scale at all time moments. Such an approach will, however, suffer from inherent limitations when applied to real-life image sequences in which the size variations are large. This basic property constitutes one illustration of why a mechanism for automatic scale selection is an essential complement to traditional multi-scale processing in general, and to feature detection and feature tracking in particular.
In an image sequence, the size of image structures may change over time due to expansions or contractions. A typical example of the former is when the observer approaches an object as shown in figure 1. The left column in this figure shows a few snapshots from a tracker which follows a corner on the object over time using a standard feature tracking technique with a fixed scale for corner detection and a fixed window size for hypothesis evaluation by correlation. After a number of frames, the algorithm fails to detect the right feature and the corner is lost. The reason why this occurs is simply the fact that the corner no longer exists at the predetermined scale. As a comparison, the right column shows the result of incorporating a mechanism for adaptation of the scale levels to the local image structure (details will be given in later sections). As can be seen, the corner is correctly tracked over the whole sequence. (The same initial scale was used in both experiments.)
Another motivation for this work originates from the fact that all feature detectors suffer from localization errors due to e.g. noise and motion blur. When detecting rigid body motion or recovering 3D structure from feature point correspondences in an image sequence, it is important that the motion in the scene is large compared to the localization errors of the feature detector. If the inter-frame motion is small, we therefore have to track features over a large number of frames to obtain accurate results. This requirement constitutes a key motivation for including a scale selection mechanism in the feature tracker, to obtain longer trajectories of corresponding features as input to algorithms for motion estimation and recovery of 3D structure.
Concerning the common use of fixed scale levels in tracking methods, it is worth pointing out that in situations where the image features are distinct (e.g. sharp corners on a smooth background), traditional methods using fixed scales [...] scale selection in such situations are that: (i) the actual tuning of the scale parameter can be avoided, (ii) as will be illustrated later, stability over time of the selected scale levels turns out to be a useful discriminative constraint to include in a matching criterion.
3 Feature detection with automatic scale selection
A natural framework to use when extracting features from image data is to define the image features from multi-scale differential invariants expressed in terms of Gaussian derivative operators [Koenderink and van Doorn, 1992, Florack et al., 1992], or more specifically, as maxima or zero-crossings of such entities [Lindeberg, 1994c]. In this way, image features such as corners, blobs, edges and ridges can be computed at any level of scale.
A basic problem that arises for any such feature detector concerns how to determine at what scales the image features should be extracted, or if the feature detection is performed at several scales simultaneously, what image features should be regarded as significant. A framework addressing this problem has been developed in [Lindeberg, 1993, Lindeberg, 1994c]. In summary, one of the main results from this work is a general principle for scale selection, which states that scale levels for feature detection can be selected from the scales at which normalized differential invariants assume maxima over scales. In this section, we shall give a brief review of how this methodology applies to the detection of features such as blobs and corners. The image features so obtained, with their associated attributes resulting from the scale selection method, will then be used as basic primitives for the tracking procedure.
3.1 Normalized derivatives
The scale-space representation [Witkin, 1983, Koenderink, 1984] of a signal $f$ is defined as the result of convolving $f$
\[ L(\cdot;\, t) = g(\cdot;\, t) * f \tag{1} \]
with Gaussian kernels having different values of the scale parameter $t$
\[ g(x;\, t) = \frac{1}{2 \pi t} \, e^{-(x^2 + y^2)/(2t)} \tag{2} \]
In this representation, $\gamma$-normalized derivatives [Lindeberg, 1996a] are defined by
\[ \partial_{\xi} = t^{\gamma/2} \, \partial_x \tag{3} \]
where $t$ is the variance of the Gaussian kernel. From this construction, a normalized differential invariant is then obtained by replacing all spatial derivatives by corresponding $\gamma$-normalized derivatives.
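Equations (1)-(3) can be sketched numerically as follows. This is a minimal illustration, assuming NumPy, a sampled and truncated Gaussian kernel, edge padding at the image border and central differences for the derivatives; none of these discretization choices, nor the function names and the default `gamma=1.0`, come from the text.

```python
import numpy as np

def gaussian_kernel_1d(t, truncate=4.0):
    """Sampled 1-D Gaussian with variance t, normalized to unit sum."""
    radius = max(1, int(np.ceil(truncate * np.sqrt(t))))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * t))
    return k / k.sum()

def smooth(f, t):
    """Scale-space representation L(.; t) = g(.; t) * f (separable convolution)."""
    k = gaussian_kernel_1d(t)
    pad = len(k) // 2
    fp = np.pad(np.asarray(f, dtype=float), pad, mode="edge")
    fp = np.apply_along_axis(np.convolve, 1, fp, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, fp, k, mode="valid")

def normalized_gradient(f, t, gamma=1.0):
    """First-order gamma-normalized derivatives t^(gamma/2) * (L_x, L_y)."""
    L = smooth(f, t)
    Ly, Lx = np.gradient(L)          # axis 0 is y (rows), axis 1 is x (columns)
    s = t ** (gamma / 2.0)
    return s * Lx, s * Ly

# A unit-slope ramp in x: L_x = 1 everywhere away from the border,
# so the normalized derivative equals t^(gamma/2).
ramp = np.tile(np.arange(64, dtype=float), (64, 1))
lx, ly = normalized_gradient(ramp, t=4.0, gamma=1.0)
```

For the ramp, the unnormalized derivative is independent of scale, so the normalization factor alone determines how the response grows with $t$.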
3.2 Corner detection with automatic scale selection
A common way to define a corner in a grey-level image in differential geometric terms is as a point at which both the curvature of a level curve
\[ \kappa = \frac{L_{yy} L_x^2 + L_{xx} L_y^2 - 2 L_x L_y L_{xy}}{(L_x^2 + L_y^2)^{3/2}} \tag{4} \]
and the gradient magnitude
\[ |\nabla L| = \sqrt{L_x^2 + L_y^2} \tag{5} \]
are high [Kitchen and Rosenfeld, 1982, Koenderink and Richards, 1988, Deriche and Giraudon, 1990, Blom, 1992]. If we consider the product of $\kappa$ and the gradient magnitude raised to some power, and choose the power equal to three, we obtain the essentially affine invariant expression
\[ \tilde{\kappa} = L_{yy} L_x^2 + L_{xx} L_y^2 - 2 L_x L_y L_{xy} \tag{6} \]
with its corresponding $\gamma$-normalized differential invariant
\[ \tilde{\kappa}_{norm} = t^{2\gamma} \, \tilde{\kappa} \tag{7} \]
In [Lindeberg, 1994a] it is shown how a junction detector with automatic scale selection can be formulated in terms of the detection of scale-space maxima of $\tilde{\kappa}_{norm}^2$, i.e., by detecting points in scale-space where $\tilde{\kappa}_{norm}^2$ assumes maxima with respect to both scale and space. When detecting image features at coarse scales it turns out that the localization can be poor. Therefore, this detection step is complemented by a second localization stage, in which a modified Förstner operator [Förstner and Gülch, 1987] is used for iteratively computing new localization estimates using scale information from the initial detection step (see the references for details).
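The detection step based on (6) and (7) can be sketched as below. This is a simplified illustration, not the detector of [Lindeberg, 1994a]: it assumes NumPy, finite-difference derivatives, `gamma=1.0` and a hand-picked list of scales, and it returns only the single global maximum over the sampled scales instead of all local scale-space maxima. The Förstner localization stage is omitted.

```python
import numpy as np

def smooth(f, t):
    """L(.; t) = g(.; t) * f with a sampled, truncated, separable Gaussian."""
    r = max(1, int(np.ceil(4.0 * np.sqrt(t))))
    x = np.arange(-r, r + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * t))
    k /= k.sum()
    fp = np.pad(np.asarray(f, dtype=float), r, mode="edge")
    fp = np.apply_along_axis(np.convolve, 1, fp, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, fp, k, mode="valid")

def corner_strength_sq(f, t, gamma=1.0):
    """Squared normalized rescaled level-curve curvature, eqs. (6) and (7)."""
    L = smooth(f, t)
    Ly, Lx = np.gradient(L)            # axis 0 is y, axis 1 is x
    Lyy, _ = np.gradient(Ly)
    Lxy, Lxx = np.gradient(Lx)
    kappa = Lyy * Lx ** 2 + Lxx * Ly ** 2 - 2.0 * Lx * Ly * Lxy
    return (t ** (2.0 * gamma) * kappa) ** 2

def strongest_corner(f, scales, gamma=1.0):
    """Return (t, row, col) of the global maximum over the sampled scales."""
    stack = np.stack([corner_strength_sq(f, t, gamma) for t in scales])
    k, row, col = np.unravel_index(np.argmax(stack), stack.shape)
    return scales[k], int(row), int(col)

# Synthetic image: a bright quadrant whose corner sits at row 32, column 32.
image = np.zeros((64, 64))
image[32:, 32:] = 1.0
t_sel, row, col = strongest_corner(image, scales=[1.0, 2.0, 4.0, 8.0])
```

Along the straight parts of the two edges the expression (6) vanishes, so the response concentrates near the corner; the detected position is displaced from the true corner by an amount that grows with scale, which is the localization problem the second stage addresses.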
A useful property of this corner detection method is that it leads to selection of coarser scales for corners having large spatial extent. Figure 2 illustrates this property by showing the result of applying the corner detection method to two different images, and graphically illustrating each detected and localized corner by a circle with the radius proportional to the detection scale. Notably, the support regions of these blobs serve as natural regions of interest around the detected corners. As we shall demonstrate later, such regions of interest and context information turn out to be highly useful for a feature tracking procedure.
3.3 Blob detection with automatic scale selection
As shown in the above mentioned references, a straightforward method for blob detection can be formulated in an analogous manner by detecting scale-space maxima of the square of the normalized Laplacian
\[ \nabla^2_{norm} L = t \, (L_{xx} + L_{yy}) \tag{8} \]
This operator gives a strong response for blobs that are brighter or darker than their background, and in analogy with the corner detection method, the selected scale levels provide information about the characteristic size of the blob.
Figure 3 shows the result of applying this blob detection method to the same images as used in figure 2. As can be seen, a representative set of blob features at different scales is extracted. Moreover, it can be noted how well the blob circles reflect the size variations, in particular, considering how simple operations the blob detection algorithm is based on (Gaussian smoothing, derivative computation, and detection of scale-space maxima).
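The way the selected scale reflects blob size can be checked on an idealized case. The sketch below assumes a unit-mass Gaussian blob of variance `t0` as the input image (an illustrative model, not one of the images in the experiments). Smoothing it with a Gaussian of variance $t$ gives a Gaussian of variance $t_0 + t$, so the normalized Laplacian (8) at the blob centre is $-t / (\pi (t_0 + t)^2)$, and its square is maximal at $t = t_0$.

```python
import math

def normalized_laplacian_at_center(t, t0):
    """t * (L_xx + L_yy) at the centre of a unit-mass Gaussian blob of variance t0."""
    return -t / (math.pi * (t0 + t) ** 2)

def select_scale(t0, scales):
    """Scale at which the squared normalized Laplacian is largest."""
    return max(scales, key=lambda t: normalized_laplacian_at_center(t, t0) ** 2)

# Geometrically sampled scale levels from t = 1 to t = 256 (sampling is an assumption).
scales = [2.0 ** (k / 4.0) for k in range(33)]
selected_small = select_scale(16.0, scales)
selected_large = select_scale(64.0, scales)
```

The selected scale equals the variance of the blob in both cases, so a blob that grows by a factor of four in variance is detected at a scale four times coarser.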
4 Tracking and prediction in a multi-scale context
When tracking features over time, both the position of the feature and the appearance of its surrounding grey-level pattern can be expected to change. To relate features over time, we shall throughout this work make use of the common assumption about small motions between successive frames.
There are several ways to predict the position of a feature in the next frame based on its positions in previous frames. Whereas the Kalman filtering methodology has been commonly used in the computer vision literature, this approach suffers from a fundamental limitation if the motion direction suddenly changes. If a feature moving in a certain direction has been tracked over a long period of time, then the built-in temporal smoothing of the feature trajectory in the Kalman filter implies that the predictions will continue to be in essentially the same direction, although the actual direction of the motion changes. If the covariance matrices in the Kalman filter have been adapted to small oscillations around the previously smooth trajectory, it will hence be likely that the feature is lost at the discontinuity.^1
For this reason, we shall make use of simpler first-order prediction, which uses the motion between the previous two successive frames as a prediction to the next frame.^2
Within a neighbourhood of each predicted feature position, we detect new features using the corner (or blob) detection procedure with automatic scale selection. The support regions associated with the features serve as natural regions of interest when searching for new corresponding features in the next frame. In this way, we can avoid the problem of setting a global threshold on the distance between matching candidates. There is, of course, a certain scaling factor between the detection scale and the size of the support region. The important property of this method, however, is that it will automatically select smaller regions of interest for small-size image structures, and larger search regions for larger size structures. Here, we shall make use of this scale information for three main purposes:
- Setting the search region for possible matching candidates.
- Setting the window size for correlation matching.
- Using the stability of the detection scale as a matching condition.
We set the size of the search region to the spatial extent of the previous image feature, multiplied by a safety factor. Within this window, a certain number of candidate matches are selected. Then, an evaluation of these matching candidates is made based on a combined similarity measure to be defined in the next section.
^1 As will be shown in the experiments in section 7, the resulting feature trajectories may be quite irregular. Enforced temporal smoothing of the image positions of the features, leading to smoother trajectories, would not be appropriate for such data.
^2 Both constant acceleration and constant velocity models have been used, but the latter [...]
5 Matching on multi-cue similarity
Based on the assumption of small inter-frame image motions, we use a multiple cue approach to the feature matching problem. Instead of evaluating the matching candidates using a correlation measure on a local grey-level patch only, as done in most feature tracking algorithms, we combine the correlation measure with significance stability, scale stability and proximity measures as defined below.
Patch similarity. This measure is a normalized Gaussian-weighted intensity cross-correlation between two image patches. Here, we compute this measure over a square centered at the feature and with its size set from the detection scale. The measure is derived from the cross-correlation of the image patches, see [Shapiro et al., 1992a], computed using a Gaussian weight function centered at the feature. The motivation for using a Gaussian weight function is that image structures near the feature center should be regarded as more significant than peripheral structures. Given two brightness functions $I_A$ and $I_B$, and two image regions $D_A \subset \mathbb{R}^2$ and $D_B \subset \mathbb{R}^2$ of the same size $|D| = |D_A| = |D_B|$ centered at $p_A$ and $p_B$ respectively, the weighted cross-correlation between the patches is defined as:
\[ C(A,B) = \frac{1}{|D|} \sum_{x \in D_A} e^{-(x - p_A)^2} I_A(x) \, I_B(x - p_A + p_B) - \frac{1}{|D|^2} \sum_{x_A \in D_A} e^{-(x_A - p_A)^2} I_A(x_A) \sum_{x_B \in D_B} e^{-(x_B - p_B)^2} I_B(x_B) \tag{9} \]
and the normalized weighted cross-correlation is
\[ S_{patch}(A,B) = \frac{C(A,B)}{\sqrt{C(A,A) \, C(B,B)}} \tag{10} \]
where
\[ C(A,A) = \frac{1}{|D|} \sum_{x \in D_A} \left( e^{-(x - p_A)^2} I_A(x) \right)^2 - \frac{1}{|D|^2} \left( \sum_{x \in D_A} e^{-(x - p_A)^2} I_A(x) \right)^2 \tag{11} \]
and $C(B,B)$ is defined analogously. As is well-known, this similarity measure is invariant to superimposed linear illumination gradients. Hence, first-order effects of scene lighting do not affect this measure, and the measure only [...]
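A literal reading of (9)-(11) can be sketched as follows, assuming NumPy and two equally sized patches that have already been cut out so that each feature sits at the patch centre. The explicit width `sigma` of the Gaussian weight is an assumption introduced here: the printed formulas show the weight as $e^{-(x - p)^2}$ without a width parameter, which corresponds to `sigma = 1/sqrt(2)` in pixel units. The function names are hypothetical.

```python
import numpy as np

def gaussian_weights(shape, sigma):
    """Gaussian weight centred on the patch (sigma is an assumed width parameter)."""
    h, w = shape
    y, x = np.mgrid[:h, :w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))

def weighted_cross(a, b, w):
    """C(A, B) of eq. (9) for two aligned patches of equal size."""
    n = a.size
    return (w * a * b).sum() / n - (w * a).sum() * (w * b).sum() / n ** 2

def weighted_auto(a, w):
    """C(A, A) of eq. (11), which applies the weight inside the square."""
    n = a.size
    return ((w * a) ** 2).sum() / n - (w * a).sum() ** 2 / n ** 2

def patch_similarity(a, b, sigma):
    """S_patch of eq. (10)."""
    w = gaussian_weights(a.shape, sigma)
    return weighted_cross(a, b, w) / np.sqrt(weighted_auto(a, w) * weighted_auto(b, w))

patch_a = np.arange(49, dtype=float).reshape(7, 7)
patch_b = patch_a.T.copy()
s_ab = patch_similarity(patch_a, patch_b, sigma=2.0)
s_ba = patch_similarity(patch_b, patch_a, sigma=2.0)
```

The measure is symmetric in its two arguments. Note that, read literally, (11) squares the weight while (9) does not, so comparing a patch with itself does not give exactly 1 under this sketch.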
Significance stability. A straightforward significance measure of a feature detected according to the method described in section 3 is the normalized response at the local scale-space maximum. For corners, this measure is the normalized level curve curvature according to (7) and for blobs it is the normalized Laplacian according to (8). To compare significance values over time, we measure similarity by relative differences instead of absolute, and define this measure as
\[ S_{sign} = \left| \log \frac{R_B}{R_A} \right| \tag{12} \]
where $R_A$ and $R_B$ are the significance measures of the corresponding features $A$ and $B$.
Scale stability. Since the features are detected at different scales, the ratio between the detection scales of two features constitutes a measure of stability over scales. To measure relative scale variations, we use the absolute value of the logarithm of this ratio, defined as
\[ S_{scale} = \left| \log \frac{t_B}{t_A} \right| \tag{13} \]
where $t_A$ and $t_B$ are the detection scales of $A$ and $B$.
Proximity. We measure how well the position $x_A$ of feature $A$ corresponds to the position $x_{pred}$ predicted from feature $B$
\[ S_{pos} = \frac{\| x_A - x_{pred} \|}{\sqrt{t_B}} \tag{14} \]
where $t_B$ is the detection scale of feature $B$.
Combined similarity measure. In summary, the similarity measure we make use of is a weighted sum of (10), (12), (13) and (14),
\[ S_{comb} = c_{patch} S_{patch} + c_{sign} S_{sign} + c_{scale} S_{scale} + c_{pos} S_{pos} \tag{15} \]
where $c_{patch}$, $c_{sign}$, $c_{scale}$ and $c_{pos}$ are tuning parameters to be determined.
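The measures (12)-(15) are simple enough to write out directly. In the sketch below the weight values are placeholders, since this section leaves them as tuning parameters. The negative signs on the three difference terms are likewise an assumption: (12)-(14) are zero for a perfect match and grow with dissimilarity, so they have to count against a candidate if the highest combined value is to win.

```python
import math

def significance_stability(r_a, r_b):
    """S_sign of eq. (12): relative change in significance."""
    return abs(math.log(r_b / r_a))

def scale_stability(t_a, t_b):
    """S_scale of eq. (13): relative change in detection scale."""
    return abs(math.log(t_b / t_a))

def proximity(x_a, x_pred, t_b):
    """S_pos of eq. (14): prediction error in units of sqrt(t_B)."""
    return math.hypot(x_a[0] - x_pred[0], x_a[1] - x_pred[1]) / math.sqrt(t_b)

def combined_similarity(s_patch, s_sign, s_scale, s_pos,
                        c_patch=1.0, c_sign=-0.25, c_scale=-0.25, c_pos=-0.25):
    """S_comb of eq. (15); all weights are placeholder values."""
    return c_patch * s_patch + c_sign * s_sign + c_scale * s_scale + c_pos * s_pos

s_comb = combined_similarity(
    s_patch=0.9,
    s_sign=significance_stability(2.0, 2.0),
    s_scale=scale_stability(4.0, 16.0),
    s_pos=proximity((3.0, 4.0), (0.0, 0.0), 25.0),
)
```

Because (12) and (13) use logarithms of ratios, doubling and halving a value are penalized equally, and dividing the position error by $\sqrt{t_B}$ tolerates larger displacements for features detected at coarser scales.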
6 Combined tracking algorithm
By combining the components described in the previous sections, we obtain a feature tracking scheme based on a traditional predict-detect-update loop. In addition, the following processing steps are added:
- Quality measure. Each feature is assigned a quality measure indicating how stable it is over time.
- Bidirectional matching. To provide additional information to later processing stages about the reliability of the matches, the matching can be done bidirectionally. Given a feature $F_1$ from the feature set, we first compute its winning matching candidate $F_2$ in the current image. If then $F_1$ is the winning candidate of $F_2$ in the backward matching direction, the match between $F_1$ and $F_2$ is registered as safe. This processing step is useful for signalling possible matching errors.
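The bidirectional check can be sketched as follows. The `similarity` callable stands in for the combined measure of section 5, and the toy one-dimensional features are purely illustrative; both, like the function names, are assumptions rather than part of the described system.

```python
def best_match(feature, candidates, similarity):
    """Candidate with the highest similarity to `feature`, or None if there are none."""
    return max(candidates, key=lambda c: similarity(feature, c), default=None)

def match_bidirectionally(f1, previous_features, current_features, similarity):
    """Return (f2, safe): the forward winner and whether the backward winner is f1."""
    f2 = best_match(f1, current_features, similarity)
    if f2 is None:
        return None, False
    back = best_match(f2, previous_features, similarity)
    return f2, back is f1

# Toy example: features are 1-tuples holding a position on a line.
similarity = lambda a, b: -abs(a[0] - b[0])

previous = [(0.0,), (10.0,)]
current = [(1.0,), (9.0,)]
f2, safe = match_bidirectionally(previous[0], previous, current, similarity)

# Here the only candidate is closer to the other previous feature,
# so the forward match is not confirmed in the backward direction.
previous_b = [(0.0,), (4.0,)]
current_b = [(3.0,)]
f2_b, safe_b = match_bidirectionally(previous_b[0], previous_b, current_b, similarity)
```

An unconfirmed match is not necessarily wrong; the flag only passes reliability information on to later processing stages.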
During the tracking procedure each feature is associated with the following attributes:
- its detection scale $t_{det}$,
- its estimated size $D = k_{size} \sqrt{t_{det}}$, bounded from below to $D_{min}$,
- its position,
- its quality value.
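These attributes map naturally onto a small record type. The sketch below is one possible representation; the class name, field names and the numeric values of `k_size` and `d_min` are illustrative assumptions, since this section does not give them.

```python
import math
from dataclasses import dataclass

@dataclass
class TrackedFeature:
    position: tuple          # (x, y) image position
    t_det: float             # detection scale (variance of the Gaussian kernel)
    quality: float = 0.0     # stability over time

    def size(self, k_size, d_min):
        """Estimated size D = k_size * sqrt(t_det), bounded from below by d_min."""
        return max(k_size * math.sqrt(self.t_det), d_min)

coarse = TrackedFeature(position=(40.0, 25.0), t_det=16.0)
fine = TrackedFeature(position=(12.0, 30.0), t_det=1.0)
d_coarse = coarse.size(k_size=3.0, d_min=5.0)
d_fine = fine.size(k_size=3.0, d_min=5.0)
```

The lower bound keeps the search and correlation windows from collapsing for features detected at very fine scales.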
An overview of the tracking algorithm is given in figure 4. At a more detailed level, each individual module operates as follows:
Prediction. The prediction is performed as described in section 4. For each feature in the feature set, a linear prediction of the position in the current frame is computed based on the positions of the corresponding feature in the two previous frames. The size of the search window is computed as $k_{w1} D$ (with the size $D$ bounded from below). When a trajectory is initiated, there is no feature history to base the prediction on, so we use a larger search window of size $k_{w2} D$ ($k_{w2} > k_{w1}$) and use the original feature position as the predicted position.
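The prediction step can be sketched in a few lines. The function names and the numeric values of `k_w1` and `k_w2` are placeholders; only the requirement `k_w2 > k_w1` comes from the text.

```python
def predict_position(history):
    """First-order prediction from the last two positions (most recent last).

    With fewer than two positions there is no motion estimate, so the
    original feature position is used as the prediction.
    """
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

def search_window_size(size_d, has_history, k_w1=1.5, k_w2=3.0):
    """k_w1 * D for an established trajectory, k_w2 * D for a new one."""
    return (k_w1 if has_history else k_w2) * size_d

predicted = predict_position([(0.0, 0.0), (1.0, 2.0)])
initial = predict_position([(5.0, 5.0)])
window = search_window_size(10.0, has_history=True)
window_new = search_window_size(10.0, has_history=False)
```

The prediction simply adds the most recent inter-frame displacement to the latest position, so it reacts to a change of direction after a single frame.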
Detection. In each frame, image features are detected as described in section 3. The window obtained from the prediction step is searched for the same kind of features over a locally adapted range of scales $[t_{min}, t_{max}]$, where $t_{max} = k_{range} \, t_{det}$ and $t_{min} = t_{det} / k_{range}$. The number $n$ of detected candidates depends on which feature extraction method we use in the detection step.
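The locally adapted scale range follows directly from the two formulas above. How the interval is sampled is not stated in this section, so the geometric sampling in `sample_scales` and its `levels_per_octave` parameter are assumptions, as is the value of `k_range` in the example.

```python
def scale_search_range(t_det, k_range):
    """[t_min, t_max] = [t_det / k_range, k_range * t_det]."""
    return t_det / k_range, k_range * t_det

def sample_scales(t_min, t_max, levels_per_octave=2):
    """Geometrically spaced scale levels covering [t_min, t_max]."""
    ratio = 2.0 ** (1.0 / levels_per_octave)
    scales, t = [], t_min
    while t <= t_max * (1.0 + 1e-9):   # small tolerance for rounding
        scales.append(t)
        t *= ratio
    return scales

t_min, t_max = scale_search_range(t_det=8.0, k_range=4.0)
levels = sample_scales(t_min, t_max)
```

Because the range is defined by a ratio around the previous detection scale, it spans the same number of octaves for fine and coarse features.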
Matching. The matching is based on the similarity measures described in section 5. The original feature is matched to the candidates obtained from the detection step and the winner is the feature having the highest combined similarity value above a fixed threshold $T_{comb}$ and a patch correlation value above a threshold $T_{patch}$. These thresholds are necessary to suppress false matches when features disappear due to e.g. occlusion.
If a feature is matched, the quality value is increased by $dq_i$ and its position, its scale descriptor, its significance value and its grey-level patch are updated.
If no match is found, the feature is considered unmatched, its quality value is decreased by $dq_d$ and its position is set to the predicted position.
Finally for each frame, the feature set is parsed to detect feature merges and to remove features having quality values below a threshold $T_q$. When two features merge, their trajectories are terminated and a new trajectory is initiated. In this way, we obtain more reliable feature trajectories for further processing.
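The matching and update rules can be sketched as below. Features are represented as plain dictionaries and all threshold and increment values are placeholders; these choices are assumptions for illustration. Merge detection is omitted, and the significance value and grey-level patch would be refreshed in the same way as the position and scale.

```python
def select_winner(scored_candidates, t_comb, t_patch):
    """Pick the candidate with the highest S_comb among those passing both thresholds.

    scored_candidates: list of (candidate, s_comb, s_patch) tuples.
    """
    admissible = [c for c in scored_candidates if c[1] > t_comb and c[2] > t_patch]
    if not admissible:
        return None
    return max(admissible, key=lambda c: c[1])[0]

def update_feature(feature, winner, dq_i, dq_d):
    """Apply the matched / unmatched update rules to one feature."""
    if winner is not None:
        feature["quality"] += dq_i
        feature["position"] = winner["position"]
        feature["t_det"] = winner["t_det"]
    else:
        feature["quality"] -= dq_d
        feature["position"] = feature["predicted"]
    return feature

def prune(features, t_q):
    """Remove features whose quality value is below the threshold T_q."""
    return [f for f in features if f["quality"] >= t_q]

feature = {"position": (10.0, 10.0), "predicted": (12.0, 10.0), "t_det": 4.0, "quality": 1.0}
candidates = [
    ({"position": (12.5, 10.0), "t_det": 4.5}, 0.8, 0.9),
    ({"position": (30.0, 30.0), "t_det": 4.0}, 0.9, 0.2),   # fails the patch threshold
]
winner = select_winner(candidates, t_comb=0.5, t_patch=0.6)
matched = update_feature(dict(feature), winner, dq_i=1.0, dq_d=1.0)
unmatched = update_feature(dict(feature), None, dq_i=1.0, dq_d=1.0)
kept = prune([matched, unmatched], t_q=1.0)
```

Keeping an unmatched feature at its predicted position, rather than dropping it at once, lets a trajectory survive a few frames of detection failure before its quality falls below the threshold.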
7 Experimental results
7.1 Corner tracking
Let us first demonstrate the performance of the algorithm when applied to an image sequence consisting of 60 frames. In this sequence, the camera moves in a fairly complex way relative to a static scene. The objects of interest on which the features (here corners) are detected are a telephone and a package on a table. From the junctions detected in the initial frame, a subset of 14 features was selected manually as shown in figure 5.
Figure 6 shows the situation after 30, 50 and 60 frames. In the illustrations, black segments on the trajectories indicate matched positions, while white segments show unmatched (predicted) positions. The matching is based on the combined similarity measure incorporating patch correlation, scale stability, significance stability and proximity. The detection scales of the features are illustrated by the size of the circles in the images, and we see how all corners are detected at fine scales in the initial frame. As time evolves, the detection scales adapt to the size changes of the image structures; tracked sharp corners are still detected at fine scales while blunt corners are detected at coarser scales when the camera approaches the scene.
Figure 7 shows the result of an attempt to track the same corners at fixed scales, using the automatically determined detection scales from the initial image. As can be seen, the sharpest corners are correctly tracked but the blunt corners are inevitably lost. This effect is similar to the initial illustration in section 2.
Figure 8 shows another example for a camera tracking a toy train on a table. In the initial frame, 29 corners were selected manually; 25 on the train and 4 on an object in the background. Some of these corners are enumerated and will be referred to when discussing the performance below.
Corner no   Patch similarity only   Combined similarity measure
1           lost in frame 29        lost in frame 29
2           mismatched in 18        mismatched in 18
3           mismatched in 16        mismatched in 16
4           lost in 83              -
5           mismatched in 63        -
6           lost in 81              lost in 75
7           lost in 33              -
8           lost in 46              lost in 46
Table 1: Table showing when eight of the enumerated corners in the train sequence are lost. Note that out of the corners which are lost when matching on patch similarity only, three corners are tracked during the whole sequence when using the combined similarity measure.
Figure 9 shows the situation after 60, 100 and 140 frames, using the combined [...] show when the algorithm failed to match the corners (stressing the importance of keeping unmatched features over a certain number of frames). Noisy image data and motion blur will increase the number of matching failures. Corners no 2, 3, 6 and 8 are lost due to moving structures in the background causing accidental views. In the last frames of the sequence, corner no 9 has poor localization, since the corner edges are aligned causing the corner to disappear.
The importance of using the combined similarity measure in the matching step is illustrated in the train sequence in figure 10, showing the result of matching on patch correlation only. We see that corners no 4, 5, and 7, which were all tracked using the combined similarity measure, now are lost. Table 1 shows, for both experiments, when the enumerated corners in the train sequence are lost.
7.2 Blob tracking
Let us now apply the same framework for blob tracking. In the train sequence, we manually selected 11 blobs on the train and 2 blobs in the background in the initial frame shown in figure 11. Figure 12 shows the situation after 30, 90 and 150 frames. The size of the circles in the figures corresponds to the detection scales of the blobs. Note how the detection scale adapts to the local image structure when the blobs undergo expansion followed by contraction. All visible blobs except one are tracked during the whole sequence.
Referring to the need for automatic scale selection in feature tracking, as advocated in section 2, it is illustrative to show the results of attempting blob tracking with feature detection at a fixed scale. The scale level for detecting each blob was automatically selected in the first frame and was then kept fixed throughout the sequence. Figure 13 shows the result after 30 and 150 frames. Clearly, the tracker has severe problems due to the expansion and contraction in the sequence.
As a further illustration of the capability of the algorithm to track blobs under large size changes we applied it to a sequence of 87 images where a person, dressed in a spotted shirt, approaches the camera. In a rectangular area in the initial frame, the 20 most significant blobs were automatically detected, as shown in figure 14. Figure 15 shows the results after 25, 50 and 87 frames when matching on the combined similarity measure. All blobs except one are correctly tracked over the entire sequence.
Figure 16 shows the situation after 25 frames when matching on patch similarity only. Compared to figure 15, three more blobs are now lost, and one blob is mismatched. In scenes like this one, with repetitive, similar structures, the rate of mismatches is considerably higher if we match on patch correlation only instead of using the combined similarity measure.
When trying to track the blobs at a fixed scale, as can be seen in figure 17, most of the blobs are lost already after 25 frames. The last correctly tracked blob is lost after about 50 frames.
In summary, these experiments show that similar qualitative properties hold for blob tracking and for junction tracking: (i) By including the significance values and the selected scale levels in the matching criterion, we obtain a better performance than when matching on grey-level correlation only. (ii) The performance of tracking at adaptively determined scale levels is superior compared to similar tracking at a fixed scale.
Let us finally illustrate how feature tracking with automatic scale selection over a large number of frames is likely to give us trajectories which correspond to reliable and stable physical scene points or regions of interest on objects. By explicitly registering the features that are stable over time, we are able to suppress spurious feature responses due to noise, temporary occlusions etc. Figure 18 shows the initial frame of a sequence in which the 10 most significant blobs have been tracked in a region around the face of the subject. The subject first approaches the camera and then moves back to the initial position. Figure 19 shows the situation after 20, 45 and 90 frames. We can see that after a while only four features remain in the feature set and these are the stable features corresponding to the nostrils and the eyes. This ability to register stable image structures over time is clearly a desirable quality in many computer vision applications. Notably, for general scenes with large expansions or contractions, a scale selection mechanism is essential to allow for such registrations.
8 Summary and Discussion
We have presented a framework for feature tracking in which a mechanism for
automatic scale selection has been built into the feature detection stage and
the additional attributes of the image features obtained from the scale selection
module are used for guiding the other processing steps in the tracking procedure.
We have argued that such a mechanism is essential for any feature tracking
procedure intended to operate in a complex environment, in order to adapt the
scale of processing to the size variations that may occur in the image data as
well as over time. If we attempt to track features by processing the image data
at one single scale only, we can hardly expect to be able to follow the features
over large size variations. This property is a basic consequence of the inherent
multi-scale nature of image structures, which means that a given object may
appear in different ways depending on the scale of observation.
Specifically, based on a previously developed feature detection framework
with automatic scale selection, we have presented a scheme for tracking corners
and blobs over time in which:
• the image features at any time moment are detected using a feature
  detection method with automatic scale selection, and
• this information is used for
  – guiding the detection and selection of new feature candidates,
  – providing context information for the matching procedure,
  – formulating a similarity measure for matching features over time.
Besides avoiding explicit selection of scale levels for feature detection, the
feature detection procedure with automatic scale selection allows us to track image
features over large size variations. As demonstrated in the introductory example
in section 2, we can in this way obtain a substantial improvement in the
performance relative to a fixed-scale feature tracker.
Since the scale levels obtained from the scale selection procedure reflect the
spatial extent of the image structures, we can also use this context information
for avoiding explicit settings of distance thresholds and predefined window sizes
for matching. Moreover, by including the scale and significance information
associated with the image features from the scale selection procedure into a
multi-cue similarity measure, we showed how we can improve the
reliability of the low-level matching procedure.
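The statement that the selected scale reflects the spatial extent of the image structure can be checked on an idealized blob. The following is a minimal sketch, assuming a rotationally symmetric Gaussian blob of variance `t0` and the scale-normalized Laplacian with normalization exponent 1 (the blob setting listed in appendix A.2). The closed-form centre response follows from the fact that Gaussian variances add under smoothing. The function names and the scale grid are illustrative assumptions, not the implementation used in the experiments.

```python
import math


def normalized_laplacian_at_center(t, t0, gamma=1.0):
    """Magnitude of t^gamma * Laplacian(L) at the centre of a unit-mass
    Gaussian blob of variance t0 after Gaussian smoothing to scale t."""
    s = t0 + t  # variances add when a Gaussian is smoothed by a Gaussian
    return (t ** gamma) / (math.pi * s ** 2)


def select_scale(t0, t_min=4.0, t_max=512.0, levels_per_octave=5):
    """Return the grid scale at which the normalized response is maximal.

    Assumption: one octave is a factor of two in the scale parameter t.
    """
    n = int(math.floor(levels_per_octave * math.log2(t_max / t_min))) + 1
    scales = [t_min * 2.0 ** (i / levels_per_octave) for i in range(n)]
    return max(scales, key=lambda t: normalized_laplacian_at_center(t, t0))


if __name__ == "__main__":
    for t0 in (8.0, 32.0, 128.0):
        print("blob variance", t0, "-> selected scale", select_scale(t0))
```

For this idealized model the response is maximal at `t = t0`, so a blob that grows in the image is detected at a correspondingly coarser scale.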
Of course, there are inherent limitations in tracking each feature individually
as done in this work, and as can be seen from the examples, there are a number
of situations where the tracking algorithm fails. Typically, this occurs because
of rapid changes in the local grey-level pattern around the corner, corresponding
to violations of the assumption about small inter-frame motions.
A notable conclusion that can be made in this context is that despite these
limitations, we have shown by examples that the resulting tracking procedure
is able to track most of the visible features that can be followed over time.
The framework presented here provides an important step towards overcoming some
of the limitations in previous feature tracking algorithms.
8.1 Spatial consistency and statistical evaluation.
In the scheme presented so far, each feature is tracked individually, without any
explicit notion of coherently moving clusters. It is obvious that the performance
of a tracking method can be improved if the latter notion can be introduced, and
the overall motion of the clusters can be used for generating better predictions,
as well as more refined evaluation criteria of matching candidates. To investigate
if the motions of the tracked features possibly correspond to the same rigid
body motion, we might compute descriptors such as affine 3-D coordinates.
Interesting work in this direction has been presented by [Reid and Murray,
1993, Wiles and Brady, 1995, Shapiro, 1995].
It is also natural to include a statistical evaluation of the reliability of
matches as well as their possible agreement with different clusters, as done in
[Shapiro, 1995]. Whereas such an approach has not been explored in this work,
this should not be interpreted as implying that the scale selection method
excludes the usefulness of a statistical evaluation. The main intention behind this
work has been to explore how far it is possible to reach by using a bottom-up
construction of feature trajectories and by including a mechanism for automatic
scale selection in the feature detection step. Then, the intention is that these
two approaches should be applied in a complementary manner, where the scale
selection method serves as a pre-conditioner for generating more reliable
hypotheses with more reliable input data. The scale selection method can also
provide context information over what domains statistical evaluations should
be made.
8.2 Multi-cue tracking
A tracking method based on a single visual cue, like those reviewed in section
1, may have a rather good performance under certain conditions but may fail
in more complex scenes. In this context, a multi-cue approach to the tracking
problem is natural, i.e. a system in which several types of algorithms operate
simultaneously and the algorithm most suitable to a given situation dominates.
This means that the vision system must have the ability to evaluate the
reliability of the various tracking methods and to switch between them in an
appropriate way.
Initial work in this direction, combining disparity cues with optical flow
based object segmentation, has been performed by [Uhlin et al., 1995]. The
approach developed here lends itself naturally to integration with such techniques,
in which such cues can be used for evaluating candidate feature clusters, and
the feature tracking module in turn can be used as a more refined processing
mechanism for maintaining object hypotheses over time. Of course, this leads to
basic problems of feature selection. One possible approach for addressing such
8.3 Temporal consistency
As a final remark it is worth pointing out that in this work, the image features
in each frame have been extracted independently from each other and without
any other explicit use of temporal consistency than the heuristic condition that
a feature hypothesis is allowed to survive over a few frames. To make more
explicit use of temporal consistency, it is natural to incorporate the notion of
a temporal scale-space representation [Lindeberg and Fagerström, 1996] and
to include scale selection over the temporal scale domain as well [Lindeberg,
1996b].
In this context, it is also natural to combine the feature tracking approach
with a simultaneous calculation of optical flow estimates and to integrate these
two approaches so as to make use of their relative advantages. These subjects,
including the integration of multiple tracking techniques into a multi-cue
framework, constitute major goals of our continued research.
References
[Blake et al., 1993] Blake et al. "Affine-invariant contour tracking with automatic
control of spatiotemporal scale". In Proc. 4th International Conference on Computer
Vision, Berlin, Germany, 1993. IEEE Computer Society Press.
[Blom, 1992] J. Blom. Topological and Geometrical Aspects of Image Structure. PhD
thesis, Dept. Med. Phys. Physics, Univ. Utrecht, NL-3508 Utrecht, Netherlands,
1992.
[Cipolla and Blake, 1992] R. Cipolla and A. Blake. "Surface orientation and time to
contact from image divergence and deformation". In G. Sandini, editor, Proc. 2nd
European Conference on Computer Vision, pages 187–202, Santa Margherita Ligure,
Italy, 1992. Springer Verlag, Berlin.
[Curwen et al., 1991] Curwen et al. "Parallel implementation of Lagrangian dynamics
for real-time snakes". In Proc. British Machine Vision Conference. Springer Verlag,
Berlin, 1991.
[Deriche and Giraudon, 1990] R. Deriche and G. Giraudon. "Accurate Corner
Detection: An Analytical Study". In Proc. 3rd Int. Conf. on Computer Vision, pages
66–70, Osaka, Japan, 1990.
[Faugeras, 1993] O. Faugeras. Three-dimensional computer vision. MIT Press,
Cambridge, Massachusetts, 1993.
[Florack et al., 1992] L. M. J. Florack, B. M. ter Haar Romeny, J. J. Koenderink, and
M. A. Viergever. "Scale and the Differential Structure of Images". Image and Vision
Computing, 10(6):376–388, Jul. 1992.
[Förstner and Gülch, 1987] W. A. Förstner and E. Gülch. "A Fast Operator for
Detection and Precise Location of Distinct Points, Corners and Centers of Circular
Features". In Proc. Intercommission Workshop of the Int. Soc. for Photogrammetry
and Remote Sensing, Interlaken, Switzerland, 1987.
[Gee and Cipolla, 1995] A. H. Gee and R. Cipolla. "Fast visual tracking by temporal
consensus". Technical Report CUED/F-INFENG/TR207, Dept of Engineering,
University of Cambridge, England, 1995.
[Kitchen and Rosenfeld, 1982] L. Kitchen and A. Rosenfeld. "Gray-Level Corner
Detection". Pattern Recognition Letters, 1(2):95–102, 1982.
[Koenderink and Richards, 1988] J. J. Koenderink and W. Richards.
"Two-Dimensional Curvature Operators". J. of the Optical Society of America,
5(7):1136–1141, 1988.
[Koenderink and van Doorn, 1992] J. J. Koenderink and A. J. van Doorn. "Generic
neighborhood operators". IEEE Trans. Pattern Analysis and Machine Intell.,
14(6):597–605, Jun. 1992.
[Koenderink, 1984] J. J. Koenderink. "The structure of images". Biological
Cybernetics, 50:363–370, 1984.
[Koller et al., 1994] D. Koller, J. Weber, and J. Malik. "Robust multiple car tracking
with occlusion reasoning". In J.-O. Eklundh, editor, Proc. 3rd European Conference
on Computer Vision, pages 189–196, Stockholm, Sweden, 1994. Springer Verlag,
Berlin.
[Lindeberg and Fagerström, 1996] T. Lindeberg and D. Fagerström. "Scale-Space with
causal time direction". In Proc. 4th European Conference on Computer Vision,
[Lindeberg, 1993] T. Lindeberg. "On Scale Selection for Differential Operators". In
K. Heia, K. A. Høgdra, B. Braathen, editors, Proc. 8th Scandinavian Conf. on Image
Analysis, pages 857–866, Tromsø, Norway, May 1993. Norwegian Society for Image
Processing and Pattern Recognition.
[Lindeberg, 1994a] T. Lindeberg. "Junction detection with automatic selection of
detection scales and localization scales". In Proc. 1st International Conference on
Image Processing, volume I, pages 924–928, Austin, Texas, Nov. 1994. IEEE
Computer Society Press.
[Lindeberg, 1994b] T. Lindeberg. "Scale Selection for Differential Operators".
Technical Report ISRN KTH/NA/P--94/03--SE, Dept. of Numerical Analysis and
Computing Science, KTH, Stockholm, Sweden, Jan. 1994. (Submitted).
[Lindeberg, 1994c] T. Lindeberg. Scale-Space Theory in Computer Vision. The Kluwer
International Series in Engineering and Computer Science. Kluwer Academic
Publishers, Dordrecht, Netherlands, 1994.
[Lindeberg, 1996a] T. Lindeberg. "Edge detection and ridge detection with automatic
scale selection". In Proc. IEEE Comp. Soc. Conf. on Computer Vision and Pattern
Recognition, 1996, pages 465–470, San Francisco, California, June 1996. IEEE
Computer Society Press.
[Lindeberg, 1996b] T. Lindeberg. "On automatic selection of temporal scales". 1996.
[Meyer and Bouthemy, 1994] F. G. Meyer and P. Bouthemy. "Region-based tracking
using affine motion models in long image sequences". Computer Vision, Graphics,
and Image Processing: Image Understanding, 60(2):119–140, 1994.
[Reid and Murray, 1993] I. D. Reid and D. W. Murray. "Tracking foveated corner
clusters using affine structure". In Proc. 4th International Conference on Computer
Vision, pages 76–83, Berlin, Germany, 1993. IEEE Computer Society Press.
[Shapiro, 1995] L. S. Shapiro. Affine analysis of image sequences. Cambridge
University Press, Cambridge, England, 1995.
[Shapiro et al., 1992a] L. S. Shapiro, H. Wang, and J. M. Brady. "A corner matching
and tracking strategy applied to videophony". Technical Report OUEL 1933/92,
Robotics Research Group, University of Oxford, 1992.
[Shapiro et al., 1992b] L. S. Shapiro, H. Wang, and J. M. Brady. "A matching and
tracking strategy for independently moving objects". In Proc. British Machine
Vision Conference, pages 306–315. Springer Verlag, Berlin, 1992.
[Shi and Tomasi, 1994] J. Shi and C. Tomasi. "Good features to track". In Proc.
IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition, pages
593–600. IEEE Computer Society Press, 1994.
[Smith and Brady, 1995] S. M. Smith and J. M. Brady. "ASSET-2: Real-time motion
segmentation and shape tracking". IEEE Trans. Pattern Analysis and Machine
Intell., 17(8):814–820, 1995.
[Thompson et al., 1993] W. B. Thompson, P. Lechleider, and E. R. Stuck. "Detecting
moving objects using the rigidity constraint". IEEE Trans. Pattern Analysis and
Machine Intell., 15(2):162–165, 1993.
[Uhlin et al., 1995] T. Uhlin, P. Nordlund, A. Maki, and J.-O. Eklundh. "Towards
an Active Visual Observer". In Proc. 5th International Conference on Computer
[Wiles and Brady, 1995] C. S. Wiles and M. Brady. "Closing the loop on multiple
motions". In Proc. 5th International Conference on Computer Vision, pages
308–313. IEEE Computer Society Press, 1995.
[Witkin, 1983] A. P. Witkin. "Scale-space filtering". In Proc. 8th Int. Joint Conf. Art.
Intell., pages 1019–1022, Karlsruhe, West Germany, Aug. 1983.
[Zheng and Chellappa, 1995] Q. Zheng and R. Chellappa. "Automatic feature point
extraction and tracking in image sequences for arbitrary camera motion".
International Journal of Computer Vision, 15(1):31–76, 1995.
A Algorithmic details
This appendix gives a detailed listing of the parameters that influence the algorithm
as well as the parameter settings that have been used for generating the experiments.
A.1 Prediction
The parameters determining the size of the search window (see section 6) were
    $k_{size} = 5$
    $k_{w1} = 1.5$
    $k_{w2} = 2 k_{w1}$
    $D_{min} = 16$
A.2 Feature detection
When detecting features with automatic scale selection, the following scale ranges were
used in the initial frame:
    Junction detection      Blob detection
    $t_{min} = 4.0$         $t_{min} = 4.0$
    $t_{max} = 256.0$       $t_{max} = 512.0$
and the parameter $\gamma$ in the normalized derivative concept (see section 3) was set to:
    Junction detection      Blob detection
    $\gamma = 0.875$        $\gamma = 1$
When searching for new image features, the search for matching candidates to a feature
detected at scale $t_{det}$ was performed in the interval
$[t_{det} / k_{range}, \; t_{det} \, k_{range}]$, where $k_{range} = 3$.
In all experiments, the sampling density in the scale direction was set to correspond
to a minimum of 5 scale levels per octave. In all other aspects, the feature detection
algorithms followed the default implementation of junction and blob detection with
automatic scale selection described in [Lindeberg, 1994b]. The maximum number of
matching candidates evaluated for each feature was:
    Junction detection      Blob detection
    $n = 8$                 $n = 20$
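As an illustration of how the restricted scale search could be sampled, the following minimal sketch generates geometrically spaced scale levels in the interval around a detection scale. It assumes that an octave means a factor of two in the scale parameter t; the function name `scale_search_levels` is hypothetical and the sampling scheme is only one way to satisfy the stated minimum of 5 levels per octave.

```python
import math


def scale_search_levels(t_det, k_range=3.0, levels_per_octave=5):
    """Geometrically spaced scales in [t_det / k_range, t_det * k_range].

    Assumption: an octave is a doubling of t. The number of levels is
    rounded up so that the density is at least `levels_per_octave`.
    """
    t_lo, t_hi = t_det / k_range, t_det * k_range
    octaves = math.log2(t_hi / t_lo)
    n = max(2, math.ceil(octaves * levels_per_octave) + 1)
    ratio = (t_hi / t_lo) ** (1.0 / (n - 1))
    return [t_lo * ratio ** i for i in range(n)]


if __name__ == "__main__":
    levels = scale_search_levels(16.0)
    print(len(levels), "levels from", levels[0], "to", levels[-1])
```

With `k_range = 3` the interval spans a factor of nine in t, which is a little over three octaves, so the reduced search covers far fewer levels than the full scale ranges used in the initial frame.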
A.3 Matching
The following thresholds were used in the matching step
    Junction detection      Blob detection
    $T_{patch} = 0.75$      $T_{patch} = 0.6$
    $T_{comb} = 0.65$       $T_{comb} = 0.5$
and the parameters for controlling the quality measure over time (see section 6) were
    $dq_i = 0.2$
    $dq_d = 0.1$
    $T = 0$
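The role of the quality parameters can be sketched as follows. This is an illustrative reading, assuming that $dq_i$ is added to the quality value of a matched feature, $dq_d$ is subtracted for an unmatched feature, and a feature is removed once its quality falls below the threshold $T$. The function names and the initial quality values in the example are hypothetical.

```python
def update_quality(quality, matched, dq_i=0.2, dq_d=0.1):
    """Increase the quality value on a match, decrease it otherwise."""
    return quality + dq_i if matched else quality - dq_d


def survives(quality, threshold=0.0):
    """A feature is kept as long as its quality is not below the threshold."""
    return quality >= threshold


def frames_until_dropped(quality, threshold=0.0, dq_d=0.1):
    """Number of consecutive unmatched frames after which the feature is
    removed, starting from the given quality value."""
    n = 0
    while survives(quality, threshold):
        quality = update_quality(quality, matched=False, dq_d=dq_d)
        n += 1
    return n


if __name__ == "__main__":
    q = 0.25  # hypothetical starting quality
    print("unmatched frames tolerated:", frames_until_dropped(q))
    q = update_quality(q, matched=True)
    print("after one more match:", frames_until_dropped(q))
```

Under these assumptions each successful match buys a feature two additional unmatched frames, which is one way to realize the condition that a feature hypothesis is allowed to survive over a few frames.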
Similarity measures: Relative weights. In the experiments presented here, the
following relative weights (see section 5) were used in the combined similarity measure
(15):
    Junction detection      Blob detection
    $c_{patch} = 1.0$       $c_{patch} = 1.0$
    $c_{sign} = 0.08$       $c_{sign} = 0.25$
    $c_{scale} = 0.08$      $c_{scale} = 0.08$
    $c_{pos} = 0.1$         $c_{pos} = 0.1$
To give a qualitative motivation for using these orders of magnitude for the relative
weights, let us first estimate the ranges in which these descriptors will vary:
• For the cross-correlation measure, it trivially holds that $|S_{patch}| < 1$. By the
  thresholding operation on this value, $|T_{patch}| = 0.7$, the variation of this entity
  is confined to the interval $|S_{patch}| \in [0.7, 1.0]$. In practice, the relative variations
  are usually in the interval $|S_{patch}| \in [0.8, 1.0]$.
• Concerning the significance measure, the significance values of corners computed
  from an image with grey-level values in the range [0, 255] typically vary in the
  interval $\log R < 25$. Empirically, the relative variations are usually of the order
  of $\log R < 3$. For blob features, the corresponding values are $\log R < 8$ and
  $\log R < 1$.
• Concerning the stability of the scale values, the restricted search range given by
  $k_{range}$ implies that the relative variation of this descriptor will always be less
  than $\log t \approx 1$.
• For the proximity measure the maximum value is
  $\sqrt{2} \cdot 0.5 \, k_{range} k_{w1}$ (of the order of 5).
  With smooth scene motions the value is normally considerably smaller.
Motivated by the fact that the relative variation in $S_{patch}$ is about a factor of ten
smaller than the other entities, the relative weights of the components in $S_{comb}$ were
set according to the table above.
Note that the correlation measure is the dominant component, and the relative
influence of the other components corresponds to about half that variation.
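To make the interplay of the weights concrete, the following minimal sketch combines the four cues into one score. Equation (15) is not reproduced in this appendix, so the linear form is an assumption, as is the choice to subtract the significance, scale and position terms (treating them as dissimilarities). The definition of the scale term as the absolute log-ratio of the scale values is assumed by analogy with $S_{sign}$, and `s_pos` stands for an already computed proximity value. All names are hypothetical.

```python
import math

# Weight magnitudes from the table above; signs are handled in the formula.
WEIGHTS = {
    "junction": {"patch": 1.0, "sign": 0.08, "scale": 0.08, "pos": 0.1},
    "blob": {"patch": 1.0, "sign": 0.25, "scale": 0.08, "pos": 0.1},
}


def combined_similarity(s_patch, r_a, r_b, t_a, t_b, s_pos, kind="junction"):
    """Assumed linear multi-cue score: correlation minus weighted penalties.

    s_patch : normalized cross-correlation of the grey-level patches
    r_a, r_b: significance values of the two features
    t_a, t_b: selected scale levels of the two features
    s_pos   : proximity term (distance between prediction and candidate)
    """
    w = WEIGHTS[kind]
    s_sign = abs(math.log(r_b / r_a))   # |log(R_B / R_A)|
    s_scale = abs(math.log(t_b / t_a))  # assumed analogous scale term
    return (w["patch"] * s_patch
            - w["sign"] * s_sign
            - w["scale"] * s_scale
            - w["pos"] * s_pos)


if __name__ == "__main__":
    same = combined_similarity(0.9, 10.0, 10.0, 16.0, 16.0, 0.0)
    changed = combined_similarity(0.9, 10.0, 30.0, 16.0, 32.0, 1.0)
    print("identical descriptors:", same)
    print("changed descriptors:  ", changed)
```

In this reading, two candidates with equal patch correlation are separated by how well their significance, scale and position agree with the tracked feature, which is the situation that arises in scenes with repetitive structures.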
The reason why $c_{sign}$ is increased in blob detection is that the dimensions of the
significance measures are different:
    $[\tilde{\kappa}^2_{norm}] = [\text{brightness}]^6$
    $[(\nabla^2_{norm} L)^2] = [\text{brightness}]^2$
Hence, it is natural to increase the coefficient of $S_{sign} = |\log (R_B / R_A)|$ by a
factor of three in blob detection compared to junction detection. As a general rule, we
have not performed any fine-tuning of the parameters, and all parameter values have
been the same in all experiments.
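The factor of three can be made explicit by a simple scaling argument. The following is a sketch, assuming that the significance value is $R = \tilde{\kappa}^2_{norm}$ for junctions and $R = (\nabla^2_{norm} L)^2$ for blobs, and that the local brightness contrast changes by a factor $a$ between two frames A and B:

```latex
\begin{align*}
L \mapsto a L \quad &\Rightarrow \quad
  R_{junction} \mapsto a^{6} R_{junction}, \qquad
  R_{blob} \mapsto a^{2} R_{blob}, \\
\left| \log \frac{R_B}{R_A} \right|_{junction} &= 6\,|\log a|, \qquad
\left| \log \frac{R_B}{R_A} \right|_{blob} = 2\,|\log a|, \\
\frac{6\,|\log a|}{2\,|\log a|} &= 3 \approx \frac{0.25}{0.08}.
\end{align*}
```

The same contrast change thus moves the junction descriptor three times as far as the blob descriptor on a logarithmic axis, and the larger blob weight compensates for this.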
[Figure 1 panels: initial frame; fixed scale tracking; adaptive scale tracking.]
Figure 1: Illustration of the importance of automatic scale selection when tracking
image structures over time. The corner is lost using detection at a fixed scale (left
column), whereas it is correctly tracked using adaptive scale selection (right column).
The size of the circles corresponds to the detection scales of the corner features.
Figure 2: The result of applying the corner detection algorithm with automatic
scale selection to two different grey-level images. (top row) Original grey-level
images. (bottom row) The 100 most significant corners superimposed onto a
bright copy of the original image. Graphically, each corner is illustrated by
a circle with the radius reflecting the detection scale. Observe that a reasonable
set of junction candidates is obtained, and that the circles serve as natural
regions of interest around the corners to be used in further processing.
Figure 3: The result of applying the blob detection algorithm with automatic
scale selection to the same images as used for corner detection in figure 2. The
100 most significant blobs have been graphically illustrated by circles with their
Algorithm:
For each frame:
  For each feature F in the feature set:
    1. Prediction
       1.1 Predict the position of the feature F in the current frame based on
           information from the previous frames.
       1.2 Compute the search region in the current frame based on information
           from the previous frames and the scale of the feature.
    2. Detection
       Detect n candidates C_k over a reduced set of scales in the region of
       interest in the current frame.
    3. Matching
       3.1 Match every candidate C_k to the feature F and find the best match
           using the combined similarity measure.
       3.2 Optionally, perform bidirectional matching to register safe matches.
       3.3 Compare the similarity value to a predetermined threshold:
           If above: consider the feature as matched; update its position, its
           scale descriptor, its significance value, its grey-level patch and
           increase its quality value.
           If below: consider the feature as unmatched; update its position to
           the predicted position and decrease its quality value.
  Parse the feature set to detect feature merges and remove features having
  quality values below a certain threshold.
Figure 4: Overview of the feature tracking algorithm.
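The control flow in figure 4 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: prediction is replaced by a constant-velocity guess, the detector and the similarity measure are passed in as user-supplied callables (`detect` and `similarity` are hypothetical names), and bidirectional matching, merge detection and the update of the significance value and grey-level patch are omitted.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    pos: tuple                      # (x, y) image position
    scale: float                    # selected scale level t
    quality: float                  # heuristic quality value
    velocity: tuple = (0.0, 0.0)    # last inter-frame displacement


def track_frame(features, detect, similarity, threshold,
                dq_i=0.2, dq_d=0.1, q_min=0.0):
    """Process one frame and return the surviving features.

    detect(pred_pos, scale) -> list of candidate Feature objects near the
    predicted position; similarity(feature, candidate) -> float.
    """
    survivors = []
    for f in features:
        # 1. Prediction (assumed constant-velocity model).
        pred = (f.pos[0] + f.velocity[0], f.pos[1] + f.velocity[1])
        # 2. Detection of candidates in the region of interest.
        candidates = detect(pred, f.scale)
        # 3. Matching: pick the candidate with the highest similarity.
        scored = [(similarity(f, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda sc: sc[0],
                               default=(None, None))
        if best is not None and best_score > threshold:
            # Matched: update position and scale, increase quality.
            f.velocity = (best.pos[0] - f.pos[0], best.pos[1] - f.pos[1])
            f.pos, f.scale = best.pos, best.scale
            f.quality += dq_i
        else:
            # Unmatched: move to the predicted position, decrease quality.
            f.pos = pred
            f.quality -= dq_d
        # Remove features whose quality has fallen below the threshold.
        if f.quality >= q_min:
            survivors.append(f)
    return survivors
```

A caller would supply a detector restricted to the search region and scale interval of each feature, together with a similarity function such as a multi-cue score, and call `track_frame` once per frame.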
Figure 5: The phone sequence: The initial frame with 14 detected corners.
Figure 6: Corner tracking with adaptive scale selection and matching on combined
similarity: the tracked corners in the phone sequence after 30 frames (top), 50 (middle)
and 60 frames (bottom). As can be seen, all corners are correctly tracked.
Figure 7: Corner tracking with fixed scales over time: the tracked corners in the phone
sequence after 30 frames (top), 50 (middle) and 60 frames (bottom). Note that the
blunt corners are lost compared to the adaptive scale tracking in figure 6.
Figure 8: The train sequence: The initial frame with 29 detected corners.
Figure 9: Corner tracking with adaptive scale selection and matching on combined
similarity: the tracked corners in the train sequence after 60 frames (top), 100 (middle)
and 140 frames (bottom).
Figure 10: Matching candidates on patch correlation only: the tracked corners in the
train sequence after 60 frames (top) and 100 frames (bottom). Three more corners are
lost as compared to figure 9.
Figure 11: The train sequence: The initial frame with 13 detected blobs. (The size of
the circles corresponds to the detection scales of the blob features.)
Figure 12: Blob tracking with adaptive scale selection and matching on combined
similarity: the tracked blobs in the train sequence after 30 frames (top), 90 (middle)
and 150 frames (bottom). All blobs are correctly tracked.
Figure 13: Blob tracking using fixed scales in the detection procedure: the tracked
blobs in the train sequence after 30 frames (top), 90 (middle) and 150 frames (bottom).
Only one blob is correctly tracked over the whole sequence.
Figure 14: The initial frame of the shirt sequence with the 20 strongest blobs detected
in a rectangular window. (The size of the circles corresponds to the detection scales of
the blob features.)
Figure 15: Blob matching using the combined similarity measure: the tracked blobs in
the shirt sequence after 25 frames (top), 50 frames (middle) and 87 frames (bottom).
Note how the scales, illustrated by the size of the circles, adapt to the size changes of
the image structures.
Figure 16: Matching the candidates on patch similarity only: the tracked blobs in the
shirt sequence after 25 frames. Compared to the top image in figure 15, three more
blobs are lost and one is mismatched.
Figure 17: Blob tracking using fixed scales in the detection procedure: the tracked
blobs in the shirt sequence after 25 frames. Most blobs are already lost because they
no longer exist at the initially chosen scale.
Figure 18: The initial frame of the face sequence with the 10 most significant blobs
detected in a region around the face of the subject.
Figure 19: Tracking the blobs in the face sequence with automatic scale selection; the
situation after 20, 45 and 90 frames. After about 60 frames only the 4 most stable blobs
remain in the feature set.