Use Your Hand as a 3-D Mouse

or,

Relative Orientation from Extended Sequences of Sparse Point and Line Correspondences Using the Affine Trifocal Tensor * **

Lars Bretzner and Tony Lindeberg

Computational Vision and Active Perception Laboratory (CVAP)

Dept. of Numerical Analysis and Computing Science,

KTH, S-100 44 Stockholm, Sweden

Abstract. This paper addresses the problem of computing three-dimensional structure and motion from an unknown rigid configuration of points and lines viewed by an affine projection model. An algebraic structure, analogous to the trilinear tensor for three perspective cameras, is defined for configurations of three centered affine cameras. This centered affine trifocal tensor contains 12 non-zero coefficients and involves linear relations between point correspondences and trilinear relations between line correspondences. It is shown how the affine trifocal tensor relates to the perspective trilinear tensor, and how three-dimensional motion can be computed from this tensor in a straightforward manner. A factorization approach is also developed to handle point features and line features simultaneously in image sequences. This theory is applied to a specific problem in human-computer interaction of capturing three-dimensional rotations from gestures of a human hand. Besides the obvious application, this test problem illustrates the usefulness of the affine trifocal tensor in a situation where sufficient information is not available to compute the perspective trilinear tensor, while the geometry requires point correspondences as well as line correspondences over at least three views.

1 Introduction

The problem of deriving structural information and motion cues from image sequences arises as an important subproblem in several computer vision tasks. In this paper, we are concerned with the computation of three-dimensional structure and motion from point and line correspondences extracted from a rigid three-dimensional object of unknown shape, using the affine camera model.

* The support from the Swedish Research Council for Engineering Sciences, TFR, is gratefully acknowledged. Email: bretzner@nada.kth.se, tony@nada.kth.se

** In Proc. 5th European Conference on Computer Vision (H. Burkhardt and B. Neumann, eds.), vol. 1406 of Lecture Notes in Computer Science, (Freiburg, Germany),


from perspective and orthographic projection have been presented by (Ullman 1979, Maybank 1992, Huang & Lee 1989, Huang & Netravali 1994) and others. With the introduction of the affine camera model (Koenderink & van Doorn 1991, Mundy & Zisserman 1992) a large number of approaches have been developed, including (Shapiro 1995, Beardsley et al. 1994, McLauchlan et al. 1994, Torr 1995) to mention just a few. Line correspondences have been studied by (Spetsakis & Aloimonos 1990, Weng et al. 1992), and factorization methods for points and lines constitute a particularly interesting development (Tomasi & Kanade 1992, Morita & Kanade 1997, Quan & Kanade 1997, Sturm & Triggs 1996). These directions of research have recently been combined with the ideas behind the fundamental matrix (Longuet-Higgins 1981, Faugeras 1992, Xu & Zhang 1997) and have led to the trilinear tensor (Shashua 1995, Hartley 1995, Heyden 1995) as a unified model for point and line correspondences for three cameras, with interesting applications (Beardsley et al. 1996) as well as a deeper understanding of the relations between point features and line features over multiple views (Faugeras & Mourrain 1995, Heyden et al. 1997).

The subject of this paper is to build upon the above mentioned works, and to develop a framework for handling point and line features simultaneously for three or more affine views. Initially, we shall focus on image triplets and show how an affine trifocal tensor can be defined for three centered affine cameras. This tensor has a similar algebraic structure as the trilinear tensor for three perspective cameras. Compared to the trilinear tensor, however, it has the advantage that it contains a smaller number of coefficients, which implies that fewer feature correspondences are required to determine this tensor. It will also be shown that motion estimation from this tensor is more straightforward.

This theory will then be applied to the problem of computing changes in three-dimensional orientation from a sparse set of point and line correspondences. Specifically, it will be demonstrated how a straightforward man-machine interface for 3-D orientation interaction (Lindeberg & Bretzner 1998) can be designed based on the theory presented and using no other user equipment than the operator's own hand. For more details, see (Bretzner & Lindeberg 1998).

2 Geometric problem and extraction of image features

A specific application we are interested in is to measure changes in the orientation of a human hand, as a straightforward interface to transfer 3-D rotational information to a computer using no other user equipment than the operator's own hand. In contrast to previous approaches for human-computer interaction that are based on detailed geometric hand models (such as (Lee & Kunii 1995, Heap & Hogg 1996)) we shall here explore a model based on qualitative features only. This model involves the thumb, the index finger and the middle finger, and for each finger the position of the finger tip and the orientation of the finger are measured in the image domain. Successful tracking of these image features over


which is assumed to be rigid. It is worth noting that neither the trajectories of point features nor those of line features are per se sufficient to compute the motion information we are interested in. The problem requires the combination of point and line features. Moreover, due to the small number of image features, the information is not sufficient to compute the trilinear tensor for perspective projection (see the next section). For this reason, we shall use an affine projection model, and the affine trifocal tensor will be a key tool.

The trajectories of image features used as input are extracted using a framework for feature tracking with automatic scale selection reported in (Bretzner & Lindeberg 1996, Bretzner & Lindeberg 1997). Blob features corresponding to the finger tips are computed from points $(x, y; t)$ in scale-space (Koenderink 1984, Lindeberg 1994) at which the squared normalized Laplacian

$$(\nabla^2_{norm} L)^2 = t^2 (L_{xx} + L_{yy})^2 \tag{1}$$

assumes maxima with respect to scale and space simultaneously (Lindeberg 1994). Such points are referred to as scale-space maxima of the normalized Laplacian. In a similar way, ridge features are detected from scale-space maxima of a normalized measure of ridge strength defined by (Lindeberg 1996)

$$\mathcal{A}L^2_{\gamma\text{-}norm} = t^{4\gamma} (L_{pp}^2 - L_{qq}^2)^2 = t^{4\gamma} \left( (L_{xx} - L_{yy})^2 + 4 L_{xy}^2 \right)^2, \tag{2}$$

where $L_{pp}$ and $L_{qq}$ are the eigenvalues of the Hessian matrix and the normalization parameter $\gamma = 0.875$. At each ridge feature, a windowed second moment matrix (Förstner & Gülch 1987, Bigün et al. 1991, Lindeberg 1994)

$$\mu = \iint_{(\xi, \eta) \in \mathbb{R}^2} \begin{pmatrix} L_x^2 & L_x L_y \\ L_x L_y & L_y^2 \end{pmatrix} g(\xi, \eta; s) \, d\xi \, d\eta \tag{3}$$

is computed using a Gaussian window function $g(\xi, \eta; s)$ centered at the spatial maximum of $\mathcal{A}L_{\gamma\text{-}norm}$ and with the integration scale $s$ tuned by the detection scale of the scale-space maximum of $\mathcal{A}L_{\gamma\text{-}norm}$. The eigenvector of $\mu$ corresponding to the largest eigenvalue gives the orientation of the finger.

The left column in figure 3 shows an example of image trajectories obtained in this way. An attractive property of this feature tracking scheme is that the scale selection mechanism adapts the scale levels to the local image structure. This gives the ability to track image features over large size variations, which is particularly important for the ridge tracker. Provided that the contrast to the background is sufficient, this scheme gives feature trajectories over large numbers of frames, using a conceptually very simple interframe matching mechanism.
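The scale selection step behind (1) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a synthetic Gaussian blob of variance `t0 = 16`, a small hand-picked set of scale levels, and the hypothetical helper names `gaussian_smooth` and `squared_normalized_laplacian`. It searches over scale only, at the known blob center, whereas the tracker described above looks for maxima over scale and space simultaneously.

```python
import numpy as np

def gaussian_smooth(img, t):
    """Separable Gaussian smoothing with variance t (sampled, truncated kernel)."""
    radius = int(np.ceil(4.0 * np.sqrt(t)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * t))
    kernel /= kernel.sum()
    smooth_1d = lambda v: np.convolve(v, kernel, mode="same")
    rows_done = np.apply_along_axis(smooth_1d, 0, img)
    return np.apply_along_axis(smooth_1d, 1, rows_done)

def squared_normalized_laplacian(img, t):
    """(t * (L_xx + L_yy))**2, using second differences on the smoothed image."""
    L = gaussian_smooth(img, t)
    lap = np.zeros_like(L)
    lap[1:-1, 1:-1] = (L[1:-1, 2:] + L[1:-1, :-2] + L[2:, 1:-1] + L[:-2, 1:-1]
                       - 4.0 * L[1:-1, 1:-1])
    return (t * lap) ** 2

# Synthetic blob (assumption for illustration): Gaussian of variance t0.
n, t0 = 129, 16.0
yy, xx = np.mgrid[:n, :n] - n // 2
blob = np.exp(-(xx**2 + yy**2) / (2.0 * t0))

# Evaluate the response at the blob center for a few candidate scales.
scales = [4.0, 8.0, 16.0, 32.0, 64.0]
center = n // 2
responses = [squared_normalized_laplacian(blob, t)[center, center] for t in scales]
selected_scale = scales[int(np.argmax(responses))]
print(selected_scale)
```

For an ideal Gaussian blob the response at the center is proportional to $(t / (t_0 + t)^2)^2$, so the selected scale reflects the size of the blob, which is the property the tracker relies on when features change size.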

3 The trifocal tensor for three centered affine cameras

To capture motion information from the projections of an unknown configuration of lines in 3-D, it is necessary to have at least three independent views. A canonical model for describing the geometric relationships between point correspondences and line correspondences over three perspective views is provided by the


frames can be obtained by factorizing a matrix with image measurements to the product of two matrices of rank 3, one representing motion, and the other one representing shape (Tomasi & Kanade 1992, Ullman & Basri 1991). Frameworks for capturing line correspondences over multiple affine views have been presented by (Quan & Kanade 1997) and for point features under perspective projection by (Sturm & Triggs 1996).

The subject of this section is to combine the idea behind the trilinear tensor for simultaneous modelling of point and line correspondences over three views with the affine projection model. It will be shown how an algebraic structure closely related to the trilinear tensor can be defined for three centered affine cameras. This centered affine trifocal tensor involves linear relations between the point features and trilinear relationships between the line features.

3.1 Perspective camera and three views

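Consider a point $P = (x, y, 1, \lambda)^T$ which is projected by three camera matrices $M = [I, 0]$, $M' = [A, u']$ and $M'' = [B, u'']$ to the image points $p$, $p'$ and $p''$ (equalities between homogeneous image coordinates are understood up to scale):

$$p = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \\ \lambda \end{pmatrix}, \tag{4}$$

$$p' = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a^1_1 & a^1_2 & a^1_3 & u'^1 \\ a^2_1 & a^2_2 & a^2_3 & u'^2 \\ a^3_1 & a^3_2 & a^3_3 & u'^3 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \\ \lambda \end{pmatrix} = \begin{pmatrix} {a^1}^T p + \lambda u'^1 \\ {a^2}^T p + \lambda u'^2 \\ {a^3}^T p + \lambda u'^3 \end{pmatrix}, \tag{5}$$

$$p'' = \begin{pmatrix} x'' \\ y'' \\ 1 \end{pmatrix} = \begin{pmatrix} b^1_1 & b^1_2 & b^1_3 & u''^1 \\ b^2_1 & b^2_2 & b^2_3 & u''^2 \\ b^3_1 & b^3_2 & b^3_3 & u''^3 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \\ \lambda \end{pmatrix} = \begin{pmatrix} {b^1}^T p + \lambda u''^1 \\ {b^2}^T p + \lambda u''^2 \\ {b^3}^T p + \lambda u''^3 \end{pmatrix}. \tag{6}$$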

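Following (Faugeras & Mourrain 1995) and (Shashua 1997), let us introduce the following two matrices

$$r^\alpha_j = \begin{pmatrix} -1 & 0 & x' \\ 0 & -1 & y' \end{pmatrix}, \qquad s^\beta_k = \begin{pmatrix} -1 & 0 & x'' \\ 0 & -1 & y'' \end{pmatrix}. \tag{7}$$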

Then, in terms of tensor notation (where $i, j, k \in [1, 3]$, $\alpha, \beta \in [1, 2]$ and we throughout follow the Einstein summation convention that a double occurrence of an index implies summation over that index) the relations between the image coordinates and the camera geometry can be written

$$\lambda r^\alpha_j u'^j + r^\alpha_j a^j_i p^i = 0, \qquad \lambda s^\beta_k u''^k + s^\beta_k b^k_i p^i = 0. \tag{8}$$

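By introducing the trifocal tensor (Shashua 1995, Hartley 1995)

$$T^{jk}_i = a^j_i u''^k - b^k_i u'^j, \tag{9}$$

and eliminating $\lambda$ between the two relations in (8), we obtain

$$r^\alpha_j s^\beta_k T^{jk}_i p^i = 0. \tag{10}$$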

Written out explicitly, this expression corresponds to the following four relations between the projections $p$, $p'$ and $p''$ of $P$ (Shashua 1997):

$$\begin{aligned}
x'' T^{13}_i p^i - x'' x' T^{33}_i p^i + x' T^{31}_i p^i - T^{11}_i p^i &= 0, \\
y'' T^{13}_i p^i - y'' x' T^{33}_i p^i + x' T^{32}_i p^i - T^{12}_i p^i &= 0, \\
x'' T^{23}_i p^i - x'' y' T^{33}_i p^i + y' T^{31}_i p^i - T^{21}_i p^i &= 0, \\
y'' T^{23}_i p^i - y'' y' T^{33}_i p^i + y' T^{32}_i p^i - T^{22}_i p^i &= 0.
\end{aligned} \tag{11}$$
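The relations (9) to (11) can be checked numerically. The following is a minimal NumPy sketch under stated assumptions: the entries of `A`, `B`, `u1` (for $u'$), `u2` (for $u''$) and the test point are arbitrary illustrative values, not taken from the paper, and the array layout `T[i, j, k]` for $T^{jk}_i$ is a choice made here. The two-row matrices `r` and `s` carry the signs that reproduce (11).

```python
import numpy as np

# Illustrative camera parameters (assumed values): M' = [A, u'], M'' = [B, u''].
A = np.array([[0.90, -0.10, 0.20],
              [0.10, 1.00, -0.30],
              [0.05, 0.02, 1.00]])
u1 = np.array([0.5, -0.2, 0.1])
B = np.array([[1.00, 0.20, -0.10],
              [-0.20, 0.95, 0.30],
              [-0.03, 0.04, 1.00]])
u2 = np.array([-0.4, 0.3, 0.2])

# Eq. (9): T[i, j, k] = a^j_i u''^k - b^k_i u'^j, with A[j, i] = a^j_i.
T = np.einsum("ji,k->ijk", A, u2) - np.einsum("ki,j->ijk", B, u1)

# A point P = (x, y, 1, lam) and its three projections, eqs. (4)-(6).
x, y, lam = 0.3, -0.2, 0.7
p = np.array([x, y, 1.0])
p1 = A @ p + lam * u1
p2 = B @ p + lam * u2
x1, y1 = p1[:2] / p1[2]   # (x', y') after normalizing the third coordinate
x2, y2 = p2[:2] / p2[2]   # (x'', y'')

# Eq. (7): each row of r (resp. s) annihilates p' (resp. p'').
r = np.array([[-1.0, 0.0, x1], [0.0, -1.0, y1]])
s = np.array([[-1.0, 0.0, x2], [0.0, -1.0, y2]])

# Eq. (10): the four trilinear point relations, one per (row of r, row of s).
point_residuals = np.einsum("aj,bk,ijk,i->ab", r, s, T, p)
print(point_residuals)
```

All four entries of `point_residuals` vanish up to rounding error, one for each relation in (11).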

Given three corresponding lines, $l^T p = 0$, $l'^T p' = 0$ and $l''^T p'' = 0$, each image line defines a plane through the center of projection, given by $L^T P = 0$, $L'^T P = 0$ and $L''^T P = 0$, where

$$\begin{aligned}
L^T &= l^T M = (l_1, \; l_2, \; l_3, \; 0), \\
L'^T &= l'^T M' = (l'_j a^j_1, \; l'_j a^j_2, \; l'_j a^j_3, \; l'_j u'^j), \\
L''^T &= l''^T M'' = (l''_k b^k_1, \; l''_k b^k_2, \; l''_k b^k_3, \; l''_k u''^k).
\end{aligned} \tag{12}$$

Since $l$, $l'$ and $l''$ are assumed to be projections of the same three-dimensional line, the intersection of the planes $L$, $L'$ and $L''$ must degenerate to a line and

$$\operatorname{rank} \begin{pmatrix}
l_1 & l'_j a^j_1 & l''_k b^k_1 \\
l_2 & l'_j a^j_2 & l''_k b^k_2 \\
l_3 & l'_j a^j_3 & l''_k b^k_3 \\
0 & l'_j u'^j & l''_k u''^k
\end{pmatrix} = 2. \tag{13}$$

All $3 \times 3$ minors must be zero, and removing each of the first three rows in turn leads to the following trilinear relationships, out of which two are independent:

$$\begin{aligned}
(l_2 T^{jk}_3 - l_3 T^{jk}_2) \, l'_j l''_k &= 0, \\
(l_1 T^{jk}_3 - l_3 T^{jk}_1) \, l'_j l''_k &= 0, \\
(l_1 T^{jk}_2 - l_2 T^{jk}_1) \, l'_j l''_k &= 0.
\end{aligned} \tag{14}$$

These expressions provide a compact characterization of the trilinear line relations first introduced by (Spetsakis & Aloimonos 1990).
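The line relations (13) and (14) can be checked in the same spirit. The sketch below is a minimal illustration, assuming NumPy and arbitrary camera values that are not taken from the paper. The helper `image_line` is hypothetical, and the 3-D line is represented by two points on it. The three relations in (14) are the components of the cross product of $l$ with the vector $w_i = T^{jk}_i l'_j l''_k$, which is how they are evaluated here.

```python
import numpy as np

# Illustrative camera parameters (assumed values): M = [I, 0], M' = [A, u'], M'' = [B, u''].
A = np.array([[0.90, -0.10, 0.20],
              [0.10, 1.00, -0.30],
              [0.05, 0.02, 1.00]])
u1 = np.array([0.5, -0.2, 0.1])
B = np.array([[1.00, 0.20, -0.10],
              [-0.20, 0.95, 0.30],
              [-0.03, 0.04, 1.00]])
u2 = np.array([-0.4, 0.3, 0.2])
T = np.einsum("ji,k->ijk", A, u2) - np.einsum("ki,j->ijk", B, u1)  # eq. (9)

M0 = np.hstack([np.eye(3), np.zeros((3, 1))])
M1 = np.hstack([A, u1[:, None]])
M2 = np.hstack([B, u2[:, None]])

# Two homogeneous 3-D points spanning the line (assumed values).
P_a = np.array([0.3, -0.2, 1.0, 0.7])
P_b = np.array([-0.5, 0.4, 1.0, 0.2])

def image_line(M):
    """Homogeneous image line through the projections of P_a and P_b."""
    return np.cross(M @ P_a, M @ P_b)

l, l1, l2 = image_line(M0), image_line(M1), image_line(M2)

# Eq. (14): w_i = T_i^{jk} l'_j l''_k must be parallel to l.
w = np.einsum("ijk,j,k->i", T, l1, l2)
line_residuals = np.array([l[1] * w[2] - l[2] * w[1],
                           l[0] * w[2] - l[2] * w[0],
                           l[0] * w[1] - l[1] * w[0]])

# Eq. (13): the three back-projected planes span a rank-2 matrix.
planes = np.column_stack([M0.T @ l, M1.T @ l1, M2.T @ l2])
plane_rank = np.linalg.matrix_rank(planes, tol=1e-9)
print(line_residuals, plane_rank)
```

Because the three residuals are the components of one cross product, at most two of them are independent, in line with the remark after (13).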

In summary, each point correspondence gives four equations, and each line correspondence two. Hence, $K$ points and $L$ lines are (generically) sufficient to express a linear algorithm for computing the trilinear tensor (up to scale) if


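Consider next a point $Q = (x, y, \lambda, 1)^T$ which is projected to the image points $q$, $q'$ and $q''$ by three affine camera matrices $M$, $M'$ and $M''$, respectively:

$$q = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = M Q = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ \lambda \\ 1 \end{pmatrix}, \tag{15}$$

$$q' = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = M' Q = \begin{pmatrix} c^1_1 & c^1_2 & c^1_3 & v'^1 \\ c^2_1 & c^2_2 & c^2_3 & v'^2 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ \lambda \\ 1 \end{pmatrix}, \tag{16}$$

$$q'' = \begin{pmatrix} x'' \\ y'' \\ 1 \end{pmatrix} = M'' Q = \begin{pmatrix} d^1_1 & d^1_2 & d^1_3 & v''^1 \\ d^2_1 & d^2_2 & d^2_3 & v''^2 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ \lambda \\ 1 \end{pmatrix}. \tag{17}$$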

Here, the parameterization of $Q$ differs from $P$, since for an image point $q = (x, y, 1)^T$ the projection (15) implies that the three-dimensional point is on the ray $Q = (x, y, \lambda, 1)^T$ for some $\lambda$. By eliminating $\lambda$, we obtain the following linear relationships between the image coordinates of $q$, $q'$ and $q''$:

$$\begin{aligned}
(c^1_3 d^1_1 - c^1_1 d^1_3) x + (c^1_3 d^1_2 - c^1_2 d^1_3) y + d^1_3 x' - c^1_3 x'' + (c^1_3 v''^1 - d^1_3 v'^1) &= 0, \\
(c^2_3 d^1_1 - c^2_1 d^1_3) x + (c^2_3 d^1_2 - c^2_2 d^1_3) y + d^1_3 y' - c^2_3 x'' + (c^2_3 v''^1 - d^1_3 v'^2) &= 0, \\
(c^1_3 d^2_1 - c^1_1 d^2_3) x + (c^1_3 d^2_2 - c^1_2 d^2_3) y + d^2_3 x' - c^1_3 y'' + (c^1_3 v''^2 - d^2_3 v'^1) &= 0, \\
(c^2_3 d^2_1 - c^2_1 d^2_3) x + (c^2_3 d^2_2 - c^2_2 d^2_3) y + d^2_3 y' - c^2_3 y'' + (c^2_3 v''^2 - d^2_3 v'^2) &= 0.
\end{aligned} \tag{18}$$

This structure corresponds to the trilinear constraint (11) for perspective projection, and we shall refer to it as the affine trifocal point constraint.
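The affine trifocal point constraint can be verified with a minimal NumPy sketch. It assumes arbitrary illustrative camera entries, not values from the paper, and a hypothetical helper `affine_point_residuals`. Each of the four relations pairs coordinate $j$ of the second view with coordinate $k$ of the third view and follows from eliminating $\lambda$ between the corresponding two rows of (16) and (17).

```python
import numpy as np

# Illustrative affine cameras (assumed values): the first two rows of M' and M''.
C = np.array([[0.9, -0.1, 0.4],
              [0.2, 1.1, -0.3]])    # c^j_i
v1 = np.array([0.5, -0.2])          # v'
D = np.array([[1.0, 0.3, -0.2],
              [-0.1, 0.8, 0.5]])    # d^k_i
v2 = np.array([-0.4, 0.3])          # v''

def affine_point_residuals(q, q1, q2):
    """Left-hand sides of the four linear relations, indexed by (j, k)."""
    res = np.empty((2, 2))
    for j in range(2):
        for k in range(2):
            res[j, k] = ((C[j, 2] * D[k, 0] - C[j, 0] * D[k, 2]) * q[0]
                         + (C[j, 2] * D[k, 1] - C[j, 1] * D[k, 2]) * q[1]
                         + D[k, 2] * q1[j] - C[j, 2] * q2[k]
                         + (C[j, 2] * v2[k] - D[k, 2] * v1[j]))
    return res

# A point on the ray Q = (x, y, lam, 1) and its three affine projections.
x, y, lam = 0.3, -0.2, 0.7
X = np.array([x, y, lam])
q = np.array([x, y])
q1 = C @ X + v1
q2 = D @ X + v2

consistent = affine_point_residuals(q, q1, q2)
# Moving the point in the third view breaks the constraint.
perturbed = affine_point_residuals(q, q1, q2 + np.array([0.1, 0.0]))
print(consistent, perturbed)
```

The relations are linear in the image coordinates, unlike the perspective case, so a perturbed correspondence shows up directly as a non-zero residual.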

Three lines $l^T q = 0$, $l'^T q' = 0$ and $l''^T q'' = 0$ in the three images define three planes $L^T Q = 0$, $L'^T Q = 0$ and $L''^T Q = 0$ in three-dimensional space with

$$\begin{aligned}
L^T &= l^T M = (l_1, \; l_2, \; 0, \; l_3), \\
L'^T &= l'^T M' = (l'_1 c^1_1 + l'_2 c^2_1, \; l'_1 c^1_2 + l'_2 c^2_2, \; l'_1 c^1_3 + l'_2 c^2_3, \; l'_1 v'^1 + l'_2 v'^2 + l'_3), \\
L''^T &= l''^T M'' = (l''_1 d^1_1 + l''_2 d^2_1, \; l''_1 d^1_2 + l''_2 d^2_2, \; l''_1 d^1_3 + l''_2 d^2_3, \; l''_1 v''^1 + l''_2 v''^2 + l''_3).
\end{aligned}$$

Since $l$, $l'$ and $l''$ are projections of the same three-dimensional line, the intersection of $L$, $L'$ and $L''$ must degenerate to a line and

$$\operatorname{rank} \begin{pmatrix}
l_1 & l'_1 c^1_1 + l'_2 c^2_1 & l''_1 d^1_1 + l''_2 d^2_1 \\
l_2 & l'_1 c^1_2 + l'_2 c^2_2 & l''_1 d^1_2 + l''_2 d^2_2 \\
0 & l'_1 c^1_3 + l'_2 c^2_3 & l''_1 d^1_3 + l''_2 d^2_3 \\
l_3 & l'_1 v'^1 + l'_2 v'^2 + l'_3 & l''_1 v''^1 + l''_2 v''^2 + l''_3
\end{pmatrix} = 2. \tag{19}$$
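The rank condition (19) can be illustrated with a short NumPy sketch, again assuming arbitrary affine camera entries that are not taken from the paper and a hypothetical helper `image_line`. The 3-D line is spanned by two points, and the back-projected planes are formed as $M^T l$, $M'^T l'$ and $M''^T l''$.

```python
import numpy as np

# Illustrative affine cameras (assumed values), written as full 3x4 matrices.
M0 = np.array([[1.0, 0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
M1 = np.array([[0.9, -0.1, 0.4, 0.5],
               [0.2, 1.1, -0.3, -0.2],
               [0.0, 0.0, 0.0, 1.0]])
M2 = np.array([[1.0, 0.3, -0.2, -0.4],
               [-0.1, 0.8, 0.5, 0.3],
               [0.0, 0.0, 0.0, 1.0]])

# Two points Q = (x, y, lam, 1) spanning a 3-D line (assumed values).
Q_a = np.array([0.3, -0.2, 0.7, 1.0])
Q_b = np.array([-0.5, 0.4, 0.2, 1.0])

def image_line(M):
    """Homogeneous image line through the projections of Q_a and Q_b."""
    return np.cross(M @ Q_a, M @ Q_b)

l, l1, l2 = image_line(M0), image_line(M1), image_line(M2)

# Columns are the planes L, L', L'' as in (19); the third entry of L is zero.
planes = np.column_stack([M0.T @ l, M1.T @ l1, M2.T @ l2])
plane_rank = np.linalg.matrix_rank(planes, tol=1e-9)
print(planes, plane_rank)
```

The explicit tolerance in `matrix_rank` is a design choice made here so that rounding noise in the smallest singular value is not counted as a third independent plane.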