Sparse Point and Line Correspondences
in Multiple Affine Views
Lars Bretzner and Tony Lindeberg
Computational Vision and Active Perception Laboratory (CVAP)
Department of Numerical Analysis and Computing Science
KTH (Royal Institute of Technology)
S-100 44 Stockholm, Sweden.
http://www.nada.kth.se/~tony
Email: {bretzner, tony}@nada.kth.se
Technical report ISRN KTH/NA/P--99/13--SE
Abstract
This paper addresses the problem of computing three-dimensional structure and
motion from an unknown rigid configuration of points and lines viewed by an
affine projection model. An algebraic structure, analogous to the trilinear tensor
for three perspective cameras, is defined for configurations of three centered
affine cameras. This centered affine trifocal tensor contains 12 non-zero
coefficients and involves linear relations between point correspondences and
trilinear relations between line correspondences. It is shown how the affine
trifocal tensor relates to the perspective trilinear tensor, and how
three-dimensional motion can be computed from this tensor in a straightforward
manner. A factorization approach is developed to handle point features and line
features simultaneously in image sequences, and degenerate feature configurations
are analysed. This theory is applied to a specific problem in human-computer
interaction of capturing three-dimensional rotations from gestures of a human
hand. This application to quantitative gesture analyses illustrates the usefulness
of the affine trifocal tensor in a situation where sufficient information is not
available to compute the perspective trilinear tensor, while the geometry requires
point correspondences as well as line correspondences over at least three views.
An earlier version of this manuscript was presented in H. Burkhardt and B. Neumann (eds.)
Proc. 5th European Conference on Computer Vision, (Freiburg, Germany), vol. 1406 of
Springer-Verlag Lecture Notes in Computer Science, pp. 141-157, June 1998. The support
from the Swedish Research Council for Engineering Sciences, TFR, and the Swedish National
Board for Industrial and Technical Development, NUTEK, is gratefully acknowledged.
1 Introduction
2 Geometric problem and extraction of image features
3 The trifocal tensor for three centered affine cameras
   3.1 Perspective camera and three views
   3.2 Affine camera and three views
4 The centered affine camera and its relations to perspective
5 Orientation from the centered affine trifocal tensor
6 Joint factorization of point and line correspondences
   6.1 Structure estimation from point and line correspondences
   6.2 Resolving the ambiguity in the rotation estimates
   6.3 Relative weighting of point and line constraints
      6.3.1 Computing the centered affine trifocal tensor
      6.3.2 Finding scale factors of lines prior to factorization
      6.3.3 Simultaneous factorization of points and lines
7 Degenerate situations
   7.1 Degenerate three-dimensional shapes
   7.2 Degenerate three-dimensional motions
8 Experiments
   8.1 Experiments on synthetic test data
      8.1.1 Error measures
      8.1.2 Influence of feature localization errors
      8.1.3 Influence of number of image features
      8.1.4 Influence of perspective effects
      8.1.5 Influence of temporal sampling density
   8.2 Dependency on object shape
   8.3 Conclusions from the synthetic experiments
   8.4 Experiments on real image data
9 Summary and discussion
A Appendix
   A.1 Algebraic constraints on the affine trifocal tensor
   A.2 Experimental investigation of a minimal case
1 Introduction
The problem of deriving structural information and motion cues from image sequences
arises as an important subproblem in several computer vision tasks. In this paper,
we are concerned with the computation of three-dimensional structure and motion
from point and line correspondences extracted from a rigid three-dimensional object
of unknown shape, using the affine camera model.
Early works addressing this problem domain based on point correspondences
from perspective and orthographic projection have been presented by Ullman [1],
Maybank [2], Huang and Lee [3], Huang and Netravali [4] and others. With the
introduction of the affine camera model (Koenderink and van Doorn [5], Mundy
and Zisserman [6]) a large number of approaches have been developed, including
(Shapiro [7], Beardsley et al. [8], McLauchlan et al. [9], Torr [10]) to mention just a
few, see also (Faugeras [11]). Line correspondences have been studied by (Spetsakis
and Aloimonos [12], Weng et al. [13]), and factorization methods for points and lines
constitute a particularly interesting development (Tomasi and Kanade [14], Poelman
and Kanade [15], Quan and Kanade [16], Sturm and Triggs [17]). These directions of
research have recently been combined with the ideas behind the fundamental matrix
(Longuet-Higgins [18], Faugeras [19], Xu and Zhang [20]) and have led to the
trilinear tensor (Shashua [21], Hartley [22], Heyden [23]) as a unified model for point
and line correspondences for three cameras, with interesting applications (Beardsley et
al. [24]) as well as a deeper understanding of the relations between point features and
line features over multiple views (Faugeras and Mourrain [25], Heyden et al. [26]).
The subject of this paper is to build upon the abovementioned works, and to
develop a framework for handling point and line features simultaneously for three
or more affine views. Initially, we shall focus on image triplets and show how an
affine trifocal tensor can be defined for three centered affine cameras. This tensor
has an algebraic structure similar to that of the trilinear tensor for three perspective
cameras. Compared to the trilinear tensor, however, it has the advantage that it contains
a smaller number of coefficients, which implies that fewer feature correspondences
are required to determine this tensor. Motion estimation from this tensor is more
straightforward than for the perspective trilinear tensor. Moreover, the results from
affine motion estimation can be expected to be more robust than perspective analysis
in situations where the perspective effects are small. To handle image features in
more than three images, we shall also develop a factorization approach, which involves
simultaneous handling of point and line features in multiple image frames.
This theory will then be applied to the problem of computing changes in
three-dimensional orientation from a sparse set of point and line correspondences.
Specifically, it will be demonstrated how a man-machine interface for 3-D interaction
can be designed based on the theory presented. The idea is to track point and line
features corresponding to the finger tips and the orientation of the fingers, and to
compute three-dimensional rotations (and translations) assuming rigidity of the hand.
These motion estimates can then be used for controlling the motion of other
computer-controlled equipment (Lindeberg and Bretzner [27]). Notably, we thereby
eliminate the need for external control equipment other than the operator's own hand.
2 Geometric problem and extraction of image features
A main rationale for this work originates from the following question: If we have a
sparse set of image features that have been tracked over a relatively long period of
time, to what extent can such extended feature trajectories be used for computing
the three-dimensional structure and motion of a rigid object? Moreover, we are
interested in exploring whether it is possible to make use of image features that have
been extracted from natural objects. Most works on three-dimensional structure
and motion estimation have been performed under different conditions, by exploiting
dense sets of image features, which have been computed from man-made objects.
Figure 1 shows one specific application, which we will focus on. The idea is to
capture three-dimensional motions as mediated by the gestures of a human hand,
and to use measurements of 3-D rotational information computed in this way for
controlling other computerized equipment, see [27] for a more general description and
Cipolla et al. [28], Freeman and Weissman [29], Maggioni and Kammerer [30] for
related works. In contrast to previous approaches for human-computer interaction
that are based on detailed geometric hand models (such as Kuch and Huang [31], Lee
and Kunii [32], Heap and Hogg [33], Yasumuro et al. [34]), we shall here explore a
model based on qualitative features only. This model involves three to five fingers,
and for each finger the position of the finger tip and the orientation of the finger are
measured in the image domain. Successful tracking of these image features over time
leads to a set of point correspondences and line correspondences. The task is then to
compute changes in the 3-D orientation of such a configuration, which is assumed to
be rigid.
Given only a small number of image features, neither the trajectories of the
point features nor those of the line features are per se sufficient to compute the
motion information we are interested in. For example, when a user holds the hand with
the fingers spread out, we have observed that the positions of the finger tips will
often be in approximately the same plane, leading to ill-conditioned motion estimates
if computed from point features only. Therefore, the ability to combine point
features and line features is of high importance. Moreover, due to the small number of
image features, the information is not sufficient to compute the trilinear tensor for
perspective projection (see the next section). For this reason, we shall use an affine
projection model, and the affine trifocal tensor will be a key tool.
The trajectories of image features used as input are extracted using a framework
for feature tracking with automatic scale selection reported in (Bretzner and
Lindeberg [35, 36]). Blob features corresponding to the finger tips are computed from
points $(x, y;\, t)$ in scale-space (Koenderink [37], Lindeberg [38]) at which the squared
normalized Laplacian

$$(\nabla^2_{norm} L)^2 = t^2 \, (L_{xx} + L_{yy})^2 \tag{1}$$

assumes maxima with respect to scale and space simultaneously (Lindeberg [39]).
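Such points are referred to as scale-space maxima of the normalized Laplacian. In
a similar way, ridge features are detected from scale-space maxima of a normalized
measure of ridge strength

$$\mathcal{A}L^2_{\gamma\text{-}norm} = t^{4\gamma} \left( (L_{pp} - L_{qq})^2 \right)^2 = t^{4\gamma} \left( (L_{xx} - L_{yy})^2 + 4 L_{xy}^2 \right)^2, \tag{2}$$

where $L_{pp}$ and $L_{qq}$ denote the eigenvalues of the Hessian matrix of $L$ and the
normalization parameter is $\gamma = 0.875$ (Lindeberg [40]). At each ridge feature, a
windowed second moment matrix

$$\mu = \iint_{(\xi, \eta) \in \mathbb{R}^2} \begin{pmatrix} L_x^2 & L_x L_y \\ L_x L_y & L_y^2 \end{pmatrix} g(\xi, \eta;\, s) \, d\xi \, d\eta \tag{3}$$

is computed using a Gaussian window function $g(\cdot, \cdot;\, s)$ centered at the spatial
maximum of $\mathcal{A}L_{\gamma\text{-}norm}$ and with the integration scale $s$ tuned by the detection
scale of the scale-space maximum of $\mathcal{A}L_{\gamma\text{-}norm}$. The eigenvector of $\mu$
corresponding to the largest eigenvalue gives the orientation of the finger.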
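As a small illustration of the last step, the following sketch computes the dominant eigenvector of a 2x2 second moment matrix in closed form. It is not the tracker used in the paper: the function names, the discrete sum that stands in for the integral in (3), and the gradient samples are all hypothetical choices made for this example.

```python
import math

def second_moment_matrix(samples):
    """Accumulate the entries of the 2x2 matrix in (3) from a list of
    ((Lx, Ly), weight) pairs; the weights stand in for the window g."""
    m11 = m12 = m22 = 0.0
    for (lx, ly), w in samples:
        m11 += w * lx * lx
        m12 += w * lx * ly
        m22 += w * ly * ly
    return m11, m12, m22

def dominant_eigenvector(m11, m12, m22):
    """Unit eigenvector of [[m11, m12], [m12, m22]] for the largest eigenvalue.

    For a symmetric 2x2 matrix M, the quadratic form v^T M v with
    v = (cos a, sin a) is maximal at a = atan2(2*m12, m11 - m22) / 2.
    """
    angle = 0.5 * math.atan2(2.0 * m12, m11 - m22)
    return math.cos(angle), math.sin(angle)

# Hypothetical gradient samples, all parallel to the direction (3, 4).
samples = [((3.0, 4.0), 1.0), ((-3.0, -4.0), 1.0), ((6.0, 8.0), 0.5)]
vx, vy = dominant_eigenvector(*second_moment_matrix(samples))
print(round(vx, 6), round(vy, 6))
```

Since every sample gradient is parallel to (3, 4), the recovered unit vector lies along that direction, independently of the sign and magnitude of the individual gradients.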
Figure 1: Results of multi-scale tracking of point and line features corresponding to the
finger tips and the fingers of a human hand. (left) grey-level image showing the first frame
in an image sequence, (middle) image features extracted by combining the detection of
scale-space maxima of blob and ridge features [39, 40] with a qualitative hand model in the
form of a multi-scale feature hierarchy [41], (right) feature trajectories obtained by
multi-scale feature tracking [35].
Figure 1 (right) shows an example of image trajectories obtained in this way. An
attractive property of this feature tracking scheme is that the scale selection mechanism
adapts the scale levels to the local image structure. This gives the ability to track
image features over large size variations, which is particularly important for the ridge
tracker. Provided that the contrast to the background is sufficient, this scheme gives
feature trajectories over large numbers of frames, using a conceptually very simple
interframe matching mechanism.
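The way scale selection adapts to the size of an image structure can be illustrated on an idealized model. The sketch below assumes that a finger tip is a unit-mass Gaussian blob of variance `t0` (an assumption of this example, not a claim about real hand images). By the semi-group property of Gaussian smoothing, the scale-space representation at scale `t` is then a Gaussian of variance `t0 + t`, so the squared normalized Laplacian (1) at the blob centre has a closed form. The function names and the scale grid are hypothetical.

```python
import math

def squared_normalized_laplacian_at_center(t, t0):
    """Value of (t * (Lxx + Lyy))^2 at the centre of a unit-mass 2-D Gaussian
    blob of variance t0 observed at scale t, where L = g(., .; t0 + t)."""
    s = t0 + t
    laplacian = -1.0 / (math.pi * s * s)   # Lxx + Lyy at the blob centre
    return (t * laplacian) ** 2

def select_scale(t0, scales):
    """Scale on the grid at which the normalized response is maximal."""
    return max(scales, key=lambda t: squared_normalized_laplacian_at_center(t, t0))

scales = [0.25 * k for k in range(1, 201)]          # 0.25, 0.5, ..., 50.0
selected = {t0: select_scale(t0, scales) for t0 in (2.0, 8.0, 32.0)}
print(selected)
```

In this model the response is proportional to t^2 / (t0 + t)^4, which is maximal at t = t0, so the selected scale follows the size of the blob. This is the property that allows features to be tracked over large size variations.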
3 The trifocal tensor for three centered affine cameras
To capture motion information from the projections of an unknown configuration
of points and lines in 3-D, it is necessary to have at least three independent views.
A canonical model for describing the geometric relationships between point
correspondences and line correspondences over three perspective views is provided by the
trilinear tensor (Shashua [21, 42], Hartley [22], Heyden et al. [26]). For affine
cameras, a compact model of point correspondences over multiple frames can be obtained
by factorizing a matrix with image measurements into the product of two matrices of
rank 3, one representing motion, and the other one representing shape (Tomasi and
Kanade [14], Ullman and Basri [43]). Frameworks for capturing line correspondences
over multiple affine views have been presented by Quan and Kanade [16] and for point
features under perspective projection by Sturm and Triggs [17].
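The rank-3 property behind the factorization of point measurements can be checked on a toy example. The sketch below assumes centered image coordinates (the translation has been removed) and uses small integer matrices chosen only for illustration. It stacks two frames into a 4x4 measurement matrix and verifies that its determinant vanishes while a 3x3 minor does not. All names and numbers are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def det(m):
    """Determinant by Laplace expansion along the first row (exact for integers)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for c in range(len(m)):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total

# Two rows per frame: a hypothetical 2x3 centered affine camera for each frame.
motion = [[1, 0, 2], [0, 1, -1],     # frame 1
          [2, 1, 0], [-1, 3, 1]]     # frame 2
# Four 3-D points as columns, centered so that each row sums to zero.
shape = [[1, 0, 2, -3],
         [0, 2, -1, -1],
         [1, 1, 1, -3]]

measurements = matmul(motion, shape)                  # 4x4 measurement matrix
det_full = det(measurements)                          # vanishes: rank is at most 3
det_minor = det([row[:3] for row in measurements[:3]])
print(det_full, det_minor)
```

Because the measurement matrix is a product of a 4x3 and a 3x4 matrix, its rank cannot exceed 3, which is what the vanishing 4x4 determinant expresses. The non-zero 3x3 minor shows that the rank is exactly 3 for this configuration.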
This section considers the simultaneous modelling of point and line correspondences
over three views with the affine projection model. It will be shown how an algebraic
structure closely related to the trilinear tensor can be defined for three centered
affine cameras. This centered affine trifocal tensor involves linear relations between
the point features and trilinear relationships between the line features.
3.1 Perspective camera and three views
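Consider a point $P = (x, y, 1, \lambda)^T$ which is projected by three camera matrices
$M = [I, 0]$, $M' = [A, u']$ and $M'' = [B, u'']$ to the image points $p$, $p'$ and $p''$
(with equality between homogeneous coordinates understood up to scale):

$$p = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \\ \lambda \end{pmatrix}, \tag{4}$$

$$p' = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} a^1_1 & a^1_2 & a^1_3 & {u'}^1 \\ a^2_1 & a^2_2 & a^2_3 & {u'}^2 \\ a^3_1 & a^3_2 & a^3_3 & {u'}^3 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \\ \lambda \end{pmatrix} = \begin{pmatrix} {a^1}^T p + \lambda {u'}^1 \\ {a^2}^T p + \lambda {u'}^2 \\ {a^3}^T p + \lambda {u'}^3 \end{pmatrix}, \tag{5}$$

$$p'' = \begin{pmatrix} x'' \\ y'' \\ 1 \end{pmatrix} = \begin{pmatrix} b^1_1 & b^1_2 & b^1_3 & {u''}^1 \\ b^2_1 & b^2_2 & b^2_3 & {u''}^2 \\ b^3_1 & b^3_2 & b^3_3 & {u''}^3 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \\ \lambda \end{pmatrix} = \begin{pmatrix} {b^1}^T p + \lambda {u''}^1 \\ {b^2}^T p + \lambda {u''}^2 \\ {b^3}^T p + \lambda {u''}^3 \end{pmatrix}. \tag{6}$$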
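Following Faugeras and Mourrain [25] and Shashua [42], introduce the following two
matrices

$$r^\mu_j = \begin{pmatrix} -1 & 0 & x' \\ 0 & -1 & y' \end{pmatrix}, \qquad s^\nu_k = \begin{pmatrix} -1 & 0 & x'' \\ 0 & -1 & y'' \end{pmatrix}. \tag{7}$$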
Then, in terms of tensor notation (where $i, j, k \in [1, 3]$, $\mu, \nu \in [1, 2]$ and we follow
the Einstein summation convention that a double occurrence of an index implies
summation over that index) the relations between the image coordinates and the
camera geometry can be written

$$\lambda \, r^\mu_j {u'}^j + r^\mu_j a^j_i p^i = 0, \qquad \lambda \, s^\nu_k {u''}^k + s^\nu_k b^k_i p^i = 0. \tag{8}$$
By introducing the trifocal tensor (Shashua [21], Hartley [22])

$$T^{jk}_i = a^j_i {u''}^k - b^k_i {u'}^j, \tag{9}$$

the relations between the point correspondences lead to the trifocal constraint

$$r^\mu_j s^\nu_k T^{jk}_i p^i = 0. \tag{10}$$
Written out explicitly, this expression corresponds to the following four (independent)
relations between the projections $p$, $p'$ and $p''$ of $P$ (Shashua [42]):

$$\begin{aligned}
x'' T^{13}_i p^i - x'' x' T^{33}_i p^i + x' T^{31}_i p^i - T^{11}_i p^i &= 0, \\
y'' T^{13}_i p^i - y'' x' T^{33}_i p^i + x' T^{32}_i p^i - T^{12}_i p^i &= 0, \\
x'' T^{23}_i p^i - x'' y' T^{33}_i p^i + y' T^{31}_i p^i - T^{21}_i p^i &= 0, \\
y'' T^{23}_i p^i - y'' y' T^{33}_i p^i + y' T^{32}_i p^i - T^{22}_i p^i &= 0.
\end{aligned} \tag{11}$$
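The point constraint can be verified numerically on a toy configuration. The sketch below builds the tensor from definition (9), projects one point into the second and third views, and evaluates the four contractions of the tensor with the matrices in (7) and the point p. The camera entries, the test point and the helper names are hypothetical. Exact rational arithmetic from the Python standard library is used so that the residuals vanish exactly rather than approximately, and image coordinates are obtained by normalizing with the third homogeneous coordinate.

```python
from fractions import Fraction

A = [[1, 2, 0], [0, 1, 1], [1, 0, 2]]    # A[j][i] = a^j_i   (hypothetical values)
U1 = [1, -1, 2]                          # u'^j
B = [[2, 0, 1], [1, 1, 0], [0, 1, 3]]    # B[k][i] = b^k_i
U2 = [0, 2, 1]                           # u''^k

def trifocal_tensor(a, u1, b, u2):
    """T[i][j][k] = a^j_i u''^k - b^k_i u'^j, as in (9)."""
    return [[[a[j][i] * u2[k] - b[k][i] * u1[j] for k in range(3)]
             for j in range(3)] for i in range(3)]

def project(m, u, p, lam):
    """Image coordinates (x, y) of the point (p, lam) under the camera [m, u]."""
    h = [sum(m[j][i] * p[i] for i in range(3)) + lam * u[j] for j in range(3)]
    return Fraction(h[0], h[2]), Fraction(h[1], h[2])

def point_residuals(t, p, q1, q2):
    """The four values r^mu_j s^nu_k T^{jk}_i p^i for mu, nu in {1, 2}."""
    r = [[-1, 0, q1[0]], [0, -1, q1[1]]]
    s = [[-1, 0, q2[0]], [0, -1, q2[1]]]
    return [sum(r[mu][j] * s[nu][k] * t[i][j][k] * p[i]
                for i in range(3) for j in range(3) for k in range(3))
            for mu in range(2) for nu in range(2)]

T = trifocal_tensor(A, U1, B, U2)
p, lam = [2, 3, 1], 5                    # p = (x, y, 1) and the fourth coordinate
q1 = project(A, U1, p, lam)              # (x', y')
q2 = project(B, U2, p, lam)              # (x'', y'')
exact = point_residuals(T, p, q1, q2)
shifted = point_residuals(T, p, q1, (q2[0] + 1, q2[1]))   # wrong match in view 3
print(exact, shifted)
```

For the true correspondence all four residuals are zero, while displacing the point in the third view produces non-zero residuals, which is what makes the relations usable as matching constraints.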
Given three corresponding lines, $l^T p = 0$, ${l'}^T p' = 0$ and ${l''}^T p'' = 0$, each image line
defines a plane through the center of projection, given by $L^T P = 0$, ${L'}^T P = 0$ and
${L''}^T P = 0$, where

$$\begin{aligned}
L^T &= l^T M = (l_1, \; l_2, \; l_3, \; 0), \\
{L'}^T &= {l'}^T M' = (l'_j a^j_1, \; l'_j a^j_2, \; l'_j a^j_3, \; l'_j {u'}^j), \\
{L''}^T &= {l''}^T M'' = (l''_k b^k_1, \; l''_k b^k_2, \; l''_k b^k_3, \; l''_k {u''}^k).
\end{aligned} \tag{12}$$
Since $l$, $l'$ and $l''$ are assumed to be projections of the same three-dimensional line,
the intersection of the planes $L$, $L'$ and $L''$ must degenerate to a line, and hence

$$\operatorname{rank} \begin{pmatrix}
l_1 & l'_j a^j_1 & l''_k b^k_1 \\
l_2 & l'_j a^j_2 & l''_k b^k_2 \\
l_3 & l'_j a^j_3 & l''_k b^k_3 \\
0 & l'_j {u'}^j & l''_k {u''}^k
\end{pmatrix} = 2. \tag{13}$$
All $3 \times 3$ minors must be zero, and removing each of the first three rows in turn
leads to the following trilinear relationships, out of which two are independent:

$$\begin{aligned}
(l_2 T^{jk}_3 - l_3 T^{jk}_2) \, l'_j l''_k &= 0, \\
(l_1 T^{jk}_3 - l_3 T^{jk}_1) \, l'_j l''_k &= 0, \\
(l_1 T^{jk}_2 - l_2 T^{jk}_1) \, l'_j l''_k &= 0.
\end{aligned} \tag{14}$$
These expressions provide a compact characterization of the trilinear line relations
first introduced by Spetsakis and Aloimonos [12].
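The line relations in (14) can be checked in the same spirit. The sketch below uses hypothetical integer cameras and two hypothetical 3-D points that span a line. Each image line is computed as the cross product of the two projected homogeneous points, and the three expressions in (14) are evaluated with integer arithmetic. All names and numbers are assumptions made for this example.

```python
A = [[1, 2, 0], [0, 1, 1], [1, 0, 2]]    # A[j][i] = a^j_i   (hypothetical values)
U1 = [1, -1, 2]                          # u'^j
B = [[2, 0, 1], [1, 1, 0], [0, 1, 3]]    # B[k][i] = b^k_i
U2 = [0, 2, 1]                           # u''^k
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ZERO = [0, 0, 0]

def trifocal_tensor(a, u1, b, u2):
    """T[i][j][k] = a^j_i u''^k - b^k_i u'^j, as in (9)."""
    return [[[a[j][i] * u2[k] - b[k][i] * u1[j] for k in range(3)]
             for j in range(3)] for i in range(3)]

def project(m, u, point):
    """Homogeneous image point of a 4-vector under the camera [m, u]."""
    return [sum(m[j][i] * point[i] for i in range(3)) + point[3] * u[j]
            for j in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def image_line(m, u, p1, p2):
    """Line through the images of two 3-D points."""
    return cross(project(m, u, p1), project(m, u, p2))

def line_relations(t, l, l1, l2):
    """The three expressions of (14); c[i] = T^{jk}_i l'_j l''_k."""
    c = [sum(t[i][j][k] * l1[j] * l2[k] for j in range(3) for k in range(3))
         for i in range(3)]
    return [l[1] * c[2] - l[2] * c[1],
            l[0] * c[2] - l[2] * c[0],
            l[0] * c[1] - l[1] * c[0]]

T = trifocal_tensor(A, U1, B, U2)
P1, P2 = [2, 3, 1, 5], [-1, 4, 1, 2]     # two points spanning a 3-D line
l = image_line(IDENTITY, ZERO, P1, P2)
l1 = image_line(A, U1, P1, P2)
l2 = image_line(B, U2, P1, P2)
matched = line_relations(T, l, l1, l2)
mismatched = line_relations(T, l, l1, l2[:2] + [l2[2] + 1])   # perturbed third line
print(matched, mismatched)
```

The three expressions are the components of the cross product between l and the contraction of the tensor with l' and l'', so their vanishing states that the line in the first view is proportional to that contraction. This also explains why only two of the three relations are independent.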
In summary, each point correspondence gives four equations, and each line
correspondence two. Hence, $K$ points and $L$ lines are (generically) sufficient to express
a linear algorithm for computing the trilinear tensor (up to scale) if $4K + 2L \geq 26$
(Shashua [21], Hartley [22]).
3.2 Affine camera and three views
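Consider next a point $Q = (x, y, \lambda, 1)^T$ which is projected to the image points $q$, $q'$
and $q''$ by three affine camera matrices $M$, $M'$ and $M''$, respectively:

$$q = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = M Q = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ \lambda \\ 1 \end{pmatrix} \tag{15}$$

$$q' = \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = M' Q = \begin{pmatrix} c^1_1 & c^1_2 & c^1_3 & {v'}^1 \\ c^2_1 & c^2_2 & c^2_3 & {v'}^2 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ \lambda \\ 1 \end{pmatrix} \tag{16}$$

$$q'' = \begin{pmatrix} x'' \\ y'' \\ 1 \end{pmatrix} = M'' Q = \begin{pmatrix} d^1_1 & d^1_2 & d^1_3 & {v''}^1 \\ d^2_1 & d^2_2 & d^2_3 & {v''}^2 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ \lambda \\ 1 \end{pmatrix} \tag{17}$$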