Qualitative multi-scale feature hierarchies for object tracking




Lars Bretzner and Tony Lindeberg

Computational Vision and Active Perception Laboratory (CVAP)

Department of Numerical Analysis and Computing Science

KTH (Royal Institute of Technology)

S-100 44 Stockholm, Sweden.

Email: {bretzner, tony}@nada.kth.se

Technical report ISRN KTH/NA/P--99/09--SE

Abstract

This paper shows how the performance of feature trackers can be improved by building a hierarchical view-based object representation consisting of qualitative relations between image structures at different scales. The idea is to track all image features individually, and to use the qualitative feature relations for avoiding mismatches, resolving ambiguous matches and for introducing feature hypotheses whenever image features are lost. Compared to more traditional work on view-based object tracking, this methodology has the ability to handle semi-rigid objects and partial occlusions. Compared to trackers based on three-dimensional object models, this approach is much simpler and of a more generic nature. A hands-on example is presented showing how an integrated application system can be constructed from conceptually very simple operations.



The support from the Swedish Research Council for Engineering Sciences, TFR, and the Swedish National Board for Industrial and Technical Development, NUTEK, is gratefully acknowledged. Accepted for publication in Journal of Visual Communication and Image Representation. An earlier version of this manuscript was presented in M. Nielsen, P. Johansen, O. Olsen and J. Weickert (eds), Proc. Second International Conference on Scale-Space Theories in Computer Vision, (Corfu, Greece), September 1999. Springer-Verlag Lecture Notes in Computer Science, vol 1682, pp. 117-128.

1 Introduction

To maintain a stable representation of a dynamic world, it is necessary to relate image data from different time moments. When analysing image sequences frame by frame, as is commonly done in computer vision applications, it is therefore useful to include an explicit tracking mechanism in the vision system.

When constructing such a tracking mechanism, there is a large freedom in design, concerning how much a priori information should be included into and be used by the tracker. If the goal is to track a single object of known shape, then it may be natural to build a three-dimensional object model, and to relate computed views of this internal model to the image data that occur. An alternative approach is to store a large number of actual views in a database, and subsequently match these to the image sequence.

Depending on what type of object representation we choose, we can expect different trade-offs between the complexity of constructing the object representation and the complexity in matching the object representation to image data.¹ In particular, different design strategies will imply different amounts of additional work when the database is extended with new objects.

The subject of this article is to advocate the use of qualitative multi-scale object models in this context, as opposed to more detailed models. The idea is to represent only dominant image features of the object, and relations between those that are reasonably stable under view variations. In this way, a new object model can be constructed with only minor additional work, and it will be demonstrated that such a weaker approach to object representation is powerful enough to give a significant improvement in the robustness of feature trackers.

A main rationale for the proposed approach is that if we track individual features over long time periods in scenes with changing conditions (e.g., object pose and illumination), the likelihood that features will be mismatched or lost will increase with time. Major aims of the proposed hierarchical representation are to handle such problems, and also to assist in the initialization stage of the feature tracker. When a feature is lost, the relations of the qualitative feature hierarchy model will be used for defining search regions in which the lost feature can be detected. When mismatches occur, relational constraints in the feature hierarchy will be helpful for detecting and rejecting outliers.

The usefulness of such a hierarchical object representation for feature tracking will be demonstrated by experiments on real-world image sequences. Specifically, it will be shown how an integrated non-trivial application to human-computer interaction can be constructed in a straightforward and conceptually very simple way, by combination with a set of elementary scale-space operations.

The presentation is organized as follows: Section 2 presents the general motivations behind the proposed approach, with an overview of related works. In section 3, we first briefly review the multi-scale framework we use for detecting image features, and describe how hierarchical and qualitative feature relations can be defined between these multi-scale image features. Section 4 outlines how such a view-based object [...] experimental results for two sample applications to hand gesture analysis and face tracking, respectively. Finally, section 5 concludes with a summary and discussion concerning other possible applications and generalizations of the proposed ideas.

¹ With the term "complexity", we here refer to both the computational complexity in matching algorithms and the degree of structural complexity that is required when designing the software.

2 Choice of Image Representation for Feature Tracking

The framework we consider is one in which image features are detected at multiple scales. Each feature is associated with a region in space as well as a range of scales, and relations between features at different scales impose hierarchical links across scales. Specifically, we assume that the image features are detected with a mechanism for automatic scale selection (Lindeberg 1998b). In earlier work (Bretzner & Lindeberg 1998a), we have demonstrated how such a scale selection mechanism is essential to obtain a robust behaviour of the feature tracker if the image features undergo large size variations in the image domain.

The rationale for using a hierarchical multi-scale image representation for feature tracking originates from the well-known fact that real-world objects consist of different types of structures at different scales. An internal object representation should reflect this fact. One aspect of this, which we shall make particular use of, is that certain hierarchical relations over scales tend to remain reasonably stable when the viewing conditions are varied. Thus, even if some features are lost during tracking (e.g. due to occlusions, illumination variations, or spurious errors by the feature detector or the feature matching algorithm), it is rather likely that a sufficient number of image features will remain to support the tracking of the other features. Thereby, the feature tracker will have higher robustness² with respect to occlusions, viewing variations and spurious errors in the lower-level modules. As we shall see, the qualitative nature of these feature relations will also make it possible to handle semi-rigid objects within the same framework.

In this way, the approach we will propose is closely related to the notion of object representation. Compared to the more traditional problem of object recognition, however, the requirements are different, since the primary goal is to maintain a stable image representation over time, and we do not need to support indexing and recognition functionalities into large databases. For these reasons, a qualitative image representation can be sufficient in many cases, and offer a higher flexibility by being more generic than detailed object models.

Related works. The topic of this paper touches on both the subjects of feature tracking and object representation. The literature on tracking is large and impossible to review here. Hence, we focus on the most closely related works.

Image representations involving linking across scales have been presented by several authors. (Crowley & Parker 1984, Crowley & Sanderson 1987) detected peaks and ridges in a pyramid representation. In retrospect, a main reason why stability problems were encountered is that the pyramids involved a rather coarse sampling in [...] paths in scale-space, and this idea was made operational for medical image segmentation by (Lifshitz & Pizer 1990) and (Vincken et al. 1997). (Lindeberg 1993) constructed a scale-space primal sketch, in which a morphological support region was associated with each extremum point and paths of critical points over scales were computed delimited by bifurcations. (Olsen 1997) applied a similar approach to watershed minima in the gradient magnitude. (Griffin et al. 1992) developed a closely related approach based on maximum gradient paths, however, at a single scale. In the scale-space primal sketch, scale selection was performed by maximizing measures of blob strength over scales, and significance was measured by the volumes that image structures occupy in scale-space, involving the stability over scales as a major component. A generalization of this scale selection idea to more general classes of image structures was presented in (Lindeberg 1994, Lindeberg 1998b, Lindeberg 1998a), by detecting scale-space maxima, i.e. points in scale-space at which normalized differential measures of feature strength assume local maxima with respect to scale. (Pizer et al. 1994) and his co-workers (Gauch & Pizer 1993) have proposed closely related descriptors, focusing on multi-scale ridge representations for medical image analysis. Psychophysical results by (Burbeck & Pizer 1995) support the belief that such hierarchical multi-scale representations are relevant for object representation.

² According to the terminology proposed by (Toyama & Hager 1999), the automatic scale selection mechanism is essential for the pre-failure robustness of the feature tracker, while the proposed qualitative multi-scale feature hierarchy improves the post-failure robustness.

With respect to the problem of object recognition, (Shokoufandeh et al. 1998) detect extrema in a wavelet transform in a way closely related to the detection of scale-space maxima, and define a graph structure from these image features. This graph structure is then matched to corresponding descriptors for other objects, based on topological and geometric similarity. Earlier graph-like object representations include the classical model-based approach by (Lowe 1985), used in conjunction with perceptual grouping, as well as the distributed aspect hierarchy proposed by (Dickinson et al. 1992). In relation to the large number of works on model based tracking, there are similar aims between our approach and the following works: (Koller et al. 1993) used car models to support the tracking of vehicles in long sequences with occlusions and illumination variations. (Smith & Brady 1995) defined clusters of coherently moving corner features to support the tracking of cars in a qualitative manner. (Black & Jepson 1998b) constructed a view-based object representation using an eigenimage approach to compactly represent and support the tracking of an object seen from a large number of different views. The recently developed condensation algorithm (Isard & Blake 1998, Black & Jepson 1998a) is of particular interest, by explicitly constructing statistical distributions to capture relations between image features. Concerning the specific application to qualitative hand tracking that will be addressed in this paper, more detailed hand models have been presented by (Kuch & Huang 1995, Heap & Hogg 1996, Yasumuro et al. 1999). Related graph-like representations for hand tracking and face tracking have been presented by (Triesch & von der Malsburg 1996, Mauerer & von der Malsburg 1996).

3 Image Features and Qualitative Feature Relations

We are interested in representing objects which can give rise to a rich variety of image features of different types and at different scales. Generically, these image features [...] (iii) two-dimensional (blobs), and we assume that each image feature is associated with a region in space as well as a range of scales.

3.1 Computation of Image Features

When computing a hierarchical view-based object representation, one may at first desire to compute a detailed representation of the multi-scale image structure, as done by the scale-space primal sketch or some of the closely related representations reviewed in section 2. Since we are interested in processing temporal image data, however, and the construction of such a representation from image data requires a rather large amount of computations, we shall here follow a computationally more efficient approach.

We focus on image features expressed in terms of scale-space maxima, i.e. points in scale-space at which differential geometric entities assume local maxima with respect to space and scale (Lindeberg 1998b). Formally, such points are defined by

$$(\nabla (\mathcal{D}_{norm} L(x; s)) = 0) \wedge (\partial_s (\mathcal{D}_{norm} L(x; s)) = 0) \qquad (1)$$

where $L(\cdot; s)$ denotes the scale-space representation of the image $f$ constructed by convolution with a Gaussian kernel $g(\cdot; s)$ with scale parameter (variance) $s$ and $\mathcal{D}_{norm}$ is a differential invariant normalized by the replacement of all spatial derivatives $\partial_{x_i}$ by $\gamma$-normalized derivatives $\partial_{\xi_i} = s^{\gamma/2} \partial_{x_i}$.
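As a small illustration of the scale selection property behind equation (1), the following sketch evaluates the normalized Laplacian (with $\gamma = 1$) at the centre of an ideal two-dimensional Gaussian blob of variance `s0`, using the closed-form expression that follows from the semigroup property of Gaussian kernels, and picks the scale at which its magnitude is largest. The function names and the geometric scale sampling are assumptions made for this example only; they are not taken from the paper.

```python
import math
from typing import Sequence

def normalized_laplacian_at_center(s0: float, s: float) -> float:
    """Magnitude of s * (L_xx + L_yy) at the centre of a Gaussian blob.

    Assumption: the image is an ideal 2-D Gaussian blob f = g(.; s0).
    Smoothing with g(.; s) then gives L = g(.; s0 + s), since the
    variances of Gaussian kernels add under convolution.
    """
    t = s0 + s
    laplacian = -1.0 / (math.pi * t * t)  # (g_xx + g_yy) at the origin for variance t
    return s * abs(laplacian)             # gamma = 1 normalization

def select_scale(s0: float, scales: Sequence[float]) -> float:
    """Return the sampled scale with the strongest normalized response."""
    return max(scales, key=lambda s: normalized_laplacian_at_center(s0, s))

# Hypothetical geometric sampling of the scale parameter (variance), 0.25 .. 256.
SCALES = [2.0 ** (k / 8.0) for k in range(-16, 65)]

small_blob_scale = select_scale(4.0, SCALES)    # blob of variance 4
large_blob_scale = select_scale(16.0, SCALES)   # same blob, twice as wide
```

Under these assumptions the response $s / (\pi (s_0 + s)^2)$ peaks at $s = s_0$, so the selected scale follows the size of the blob, and doubling the blob width multiplies the selected variance by four.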

Two examples of such differential descriptors, which we shall make particular use of here, include the normalized Laplacian (with $\gamma = 1$) for blob detection

$$\nabla^2_{norm} L = s (L_{xx} + L_{yy}) \qquad (2)$$

and the square difference between the eigenvalues $L_{pp}$ and $L_{qq}$ of the Hessian matrix (with $\gamma = 3/4$) for ridge detection

$$\mathcal{A} L_{norm} = s^{2\gamma} |L_{pp} - L_{qq}|^2 = s^{2\gamma} ((L_{xx} - L_{yy})^2 + 4 L_{xy}^2) \qquad (3)$$

see (Lindeberg 1998a) for a more general description. A computationally very attractive property of this construction is that the scale-space maxima can be computed by architecturally very simple and computationally highly efficient operations involving: (i) scale-space smoothing, (ii) pointwise computation of differential invariants, and (iii) detection of local maxima of scalar entities in scale-space.
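The pointwise step (ii) can be sketched as follows for a single pixel, assuming the second-order derivatives $L_{xx}$, $L_{xy}$, $L_{yy}$ at scale $s$ have already been computed by scale-space smoothing. The function names are hypothetical. The helper `hessian_eigenvalues` is included only to check numerically that the two forms of the ridge measure in equation (3) agree.

```python
import math
from typing import Tuple

def normalized_laplacian(lxx: float, lyy: float, s: float, gamma: float = 1.0) -> float:
    """Blob measure of equation (2): s^gamma * (L_xx + L_yy), with gamma = 1."""
    return s ** gamma * (lxx + lyy)

def ridge_strength(lxx: float, lxy: float, lyy: float, s: float,
                   gamma: float = 0.75) -> float:
    """Ridge measure of equation (3): s^(2 gamma) * ((L_xx - L_yy)^2 + 4 L_xy^2)."""
    return s ** (2.0 * gamma) * ((lxx - lyy) ** 2 + 4.0 * lxy ** 2)

def hessian_eigenvalues(lxx: float, lxy: float, lyy: float) -> Tuple[float, float]:
    """Eigenvalues L_pp >= L_qq of the symmetric Hessian [[lxx, lxy], [lxy, lyy]]."""
    mean = 0.5 * (lxx + lyy)
    half_gap = 0.5 * math.sqrt((lxx - lyy) ** 2 + 4.0 * lxy ** 2)
    return mean + half_gap, mean - half_gap

# Consistency check of the two expressions in equation (3) on arbitrary values.
lpp, lqq = hessian_eigenvalues(1.5, -0.7, 0.2)
via_eigenvalues = 4.0 ** (2.0 * 0.75) * (lpp - lqq) ** 2
via_derivatives = ridge_strength(1.5, -0.7, 0.2, s=4.0)
```

Since the eigenvalue gap of a symmetric 2x2 matrix is $\sqrt{(L_{xx} - L_{yy})^2 + 4 L_{xy}^2}$, the ridge measure can be evaluated without an explicit eigen-decomposition.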

Furthermore, to simplify the geometric analysis of image features, we shall reduce the spatial representation of image descriptors to ellipses, by evaluating a second moment matrix

$$\nu = \int_{\eta \in \mathbb{R}^2} \begin{pmatrix} L_x^2 & L_x L_y \\ L_x L_y & L_y^2 \end{pmatrix} g(\eta; s_{int}) \, d\eta \qquad (4)$$

at integration scale $s_{int}$ proportional to the detection scale of the scale-space maximum (equation (1)). Thereby, each image feature will be represented by a point $(x; s)$ in scale-space and a covariance matrix $\Sigma$ describing the shape, graphically illustrated by an ellipse. For one-dimensional features, the corresponding ellipses will be highly [...] descriptors of the second moment matrices will be rather circular. Attributes derived from the covariance matrix include its anisotropy derived from the ratio $\lambda_{max}/\lambda_{min}$ between its eigenvalues, and its orientation defined as the orientation of its main eigenvector.
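The two attributes named above have closed forms for a symmetric 2x2 matrix. The sketch below is a minimal illustration, assuming the covariance matrix is given by its three distinct entries and is positive definite; the function name and argument layout are hypothetical.

```python
import math
from typing import Tuple

def ellipse_attributes(a: float, b: float, c: float) -> Tuple[float, float]:
    """Anisotropy and orientation of the matrix [[a, b], [b, c]].

    Returns (lambda_max / lambda_min, angle of the main eigenvector in radians).
    The eigenvector is only defined up to sign, so the angle is meaningful
    modulo pi.
    """
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 0.0:
        raise ValueError("matrix must be positive definite")
    orientation = 0.5 * math.atan2(2.0 * b, a - c)
    return lam_max / lam_min, orientation

# An elongated descriptor along the diagonal: eigenvalues 9 and 1.
anisotropy, orientation = ellipse_attributes(5.0, 4.0, 5.0)
```

A ratio close to one indicates a nearly circular (blob-like) descriptor, while a large ratio indicates an elongated (ridge-like) one.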

Figure 4 shows an example of such image descriptors computed from a grey-level image, after ranking on a significance measure defined as the magnitude of the response of the differential operator at the scale-space maximum. A trivial but nevertheless very useful effect of this ranking is that it substantially reduces the number of image features for further processing, thus improving the computational efficiency. In a more detailed representation of the multi-scale deep structure of a real-world image, it will often be the case that a large number of the image features and their hierarchical relations correspond to image structures that will be regarded as insignificant by later processing stages.
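The ranking step can be sketched in a few lines, assuming each detected feature carries the response of the differential operator at its scale-space maximum. The `Feature` record and the `strongest` helper are hypothetical names for this illustration; the default of 20 features mirrors the number shown in Figure 4.

```python
import heapq
from typing import List, NamedTuple

class Feature(NamedTuple):
    x: float          # position in the image plane
    y: float
    s: float          # selected scale (variance)
    response: float   # operator response at the scale-space maximum

def strongest(features: List[Feature], n: int = 20) -> List[Feature]:
    """Keep the n features with the largest response magnitude."""
    return heapq.nlargest(n, features, key=lambda f: abs(f.response))

candidates = [
    Feature(10.0, 12.0, 4.0, -0.8),
    Feature(40.0, 15.0, 16.0, 2.5),
    Feature(22.0, 30.0, 8.0, 0.1),
]
top_two = strongest(candidates, n=2)
```

Blobs and ridges would presumably be ranked separately, since their significance measures come from different operators and are not directly comparable.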

3.2 Qualitative Feature Relations

Between the above mentioned features, various types of relations can be defined in the image plane. Here, we consider the following types of qualitative relations:

Spatial coincidence (inclusion): We say that a region $A$ at position $x_A$ and scale $s_A$ is in spatial coincidence relation to a region $B$ at position $x_B$ and at a (coarser) scale $s_B > s_A$ if

$$(x_A - x_B)^T \Sigma_B^{-1} (x_A - x_B) \in [D_1, D_2] \qquad (5)$$

where $D_1$ and $D_2$ are distance thresholds and $\Sigma_B$ is a covariance matrix associated with region $B$. By using a Mahalanobis distance measure, we introduce a directional preference which is highly useful for expressing spatial relations between elongated image features. While the special case $D_1 = 0$ corresponds to an inclusion relation, there are also cases where one may want to explicitly represent distant features, using $D_1 > 0$.

Stability of scale relations: For two image features at times $t_k$ and $t_{k'}$, we assume that the ratio between their scale values should be approximately the same. This is motivated by the physical requirement of scale invariance under zooming

$$\frac{s_A(t_k)}{s_B(t_k)} \approx \frac{s_A(t_{k'})}{s_B(t_{k'})}. \qquad (6)$$

To accept small variations due to changes in view direction and spurious variations from the scale selection mechanism of the feature tracker, we measure relative distances in the scale direction and implement the "$\approx$" operation by $q \approx q' \iff |\log \frac{q}{q'}| < \log T$, where $T > 1$ is a threshold in the scale direction.

Directional relation (bearing): For a feature $A$ related to a one-dimensional feature $B$, the angle is measured between the main eigenvector of $\Sigma_B$ and the vector $x_A - x_B$ from the center $x_B$ of $B$ to the center $x_A$ of $A$ (see Figure 1).

Figure 1: The direction relation (bearing) between two features A and B is the angle between the main eigenvector of $\Sigma_B$ (illustrated by the ellipse) and the vector $x_A - x_B$.

Trivially, these relations are invariant to translations and rotations in the image plane. The scale invariance of these relations follows from corresponding scale invariance properties of image descriptors computed from scale-space maxima: if the size of an image structure is scaled by a factor $c$ in the image domain, then the corresponding scale levels are transformed by a factor $c^2$.

3.3 Qualitative Multi-Scale Feature Hierarchy

Let us now consider a specific example with images of a hand. From our knowledge that a hand consists of five fingers, we construct a model consisting of: (i) the palm, (ii) the five fingers, (iii) a finger tip for each finger, (see figure 2).

Each finger is in a spatial coincidence relation to the palm, as well as a directional relation. Moreover, each fingertip is in a spatial relationship to its finger, and satisfies a directional relation to this feature. In a similar manner, each finger is in a scale stability relation with respect to the palm, and each fingertip is in a corresponding scale stability relation relative to its finger.

Such a representation will be referred to as a qualitative multi-scale feature hierarchy. Figure 3 shows the relations this representation is built from, using UML notation (Fowler & Scott 1997). An attractive property of this view-based object representation is that it only focuses on qualitative object features. There is no assumption of rigidity, only that the qualitative shape is preserved.
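The hand model described above can be written down as a small tree of features with one relation per parent-child link. The sketch below is an assumption-laden illustration: the class names loosely follow the labels in Figure 3 (`Objfeature`, `Relation`), while the fields, the constraint names and the `build_hand_model` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass(eq=False)
class ObjFeature:
    name: str
    kind: str                                   # "blob" or "ridge"
    parent: Optional["ObjFeature"] = None
    children: List["ObjFeature"] = field(default_factory=list)

@dataclass(eq=False)
class Relation:
    parent: ObjFeature
    child: ObjFeature
    constraints: Tuple[str, ...]                # qualitative constraints on the link

# Every link in the hand model carries the same three qualitative relations.
CONSTRAINTS = ("spatial coincidence", "directional", "scale stability")

def build_hand_model(n_fingers: int = 5) -> Tuple[ObjFeature, List[Relation]]:
    """Palm (blob) -> fingers (ridges) -> fingertips (blobs)."""
    palm = ObjFeature("hand", "blob")
    relations: List[Relation] = []
    for i in range(1, n_fingers + 1):
        finger = ObjFeature(f"finger[{i}]", "ridge", parent=palm)
        palm.children.append(finger)
        relations.append(Relation(palm, finger, CONSTRAINTS))
        tip = ObjFeature(f"tip[{i}]", "blob", parent=finger)
        finger.children.append(tip)
        relations.append(Relation(finger, tip, CONSTRAINTS))
    return palm, relations

palm, relations = build_hand_model()
```

Because only the links carry constraints, a feature that is lost during tracking can in principle be searched for from its parent alone, without reference to the rest of the model.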

The idea behind this construction is of course that the palm and the fingertips should give rise to blob responses (equation (2)) and that the fingers give rise to ridge responses (equation (3)). Figure 4 shows an example of how this model can be initialized and matched to image data with associated image descriptors.

To exclude responses from the background, we have here required that all image features should correspond to bright blobs or bright ridges. Alternatively, one could define spatial inclusion relations with respect to other segmentation cues relative to the background, e.g. chromaticity or depth.

Here, we have constructed the graph with feature relations manually, using qualitative knowledge about the shape of the object and its primitives. In a more general setting, however, one can also consider the learning of stable feature relations in an actual setting, based on a richer set of image features as well as a richer vocabulary of [...]

Figure 2: A qualitative multi-scale feature hierarchy constructed for a hand model. (Axes: x, y, s.)

Figure 3: Instance diagram for the feature hierarchy of a hand (figure 2). (Diagram objects: top-hand:Relation, handconstraint:Constraint, hand:Objfeature, hand-finger:Relation, fingerconstraint:Constraint, finger[1]:Objfeature, finger[2]:Objfeature, ..., finger-tip[1]:Relation, finger-tip[2]:Relation, ..., tipconstraint:Constraint, tip[1]:Objfeature, tip[2]:Objfeature, ...)

Figure 4: Illustration of the initialization stage of the object tracker. Once the coarse-scale feature is found (here the palm of the hand), the qualitative feature hierarchy guides the top-down search for the remaining features of the representation. (The left image shows the 20 most significant blob responses (in red) and ridge responses (in blue).) Panels: 20 strongest blobs and ridges; initialized hand model; all hand features captured.
