A Prototype System for Computer Vision Based Human Computer Interaction


Lars Bretzner (1), Ivan Laptev (1), Tony Lindeberg (1), Soren Lenman (2), Yngve Sundblad (2)

(1) Computational Vision and Active Perception Laboratory (CVAP)

(2) Center for User-Oriented IT-Design (CID)

Department of Numerical Analysis and Computing Science

KTH (Royal Institute of Technology)

S-100 44 Stockholm, Sweden.

Technical report ISRN KTH/NA/P--01/09--SE

1 Introduction

With the development of information technology in our society, we can expect that computer systems to a larger extent will be embedded into our environment. These environments will impose needs for new types of human-computer interaction, with interfaces that are natural and easy to use. In particular, the ability to interact with computerized equipment without need for special external equipment is attractive.

Today, the keyboard, the mouse and the remote control are used as the main interfaces for transferring information and commands to computerized equipment. In some applications involving three-dimensional information, such as visualization, computer games and control of robots, other interfaces based on trackballs, joysticks and datagloves are being used. In our daily life, however, we humans use our vision and hearing as main sources of information about our environment. Therefore, one may ask to what extent it would be possible to develop computerized equipment able to communicate with humans in a similar way, by understanding visual and auditive input.

Perceptual interfaces based on speech have already started to find a number of commercial and technical applications. For example, systems are now available where speech commands can be used for dialing numbers in cellular [...]

The support from the Swedish Research Council for Engineering Sciences, TFR, and the Swedish National Board for Industrial and Technical Development, NUTEK, is gratefully acknowledged. On-line video clip demos can be viewed from http://www.nada.kth.se/cvap/gvmdi, and an on-line version of this manuscript can be fetched from http://www.nada.kth.se/cvap/abstracts/cvap251.html.

[...]ing power of computers has reached a point where real-time processing of visual information is possible with common workstations.

The purpose of this article is to describe ongoing work in developing new perceptual interfaces with emphasis on commands expressed as hand gestures. Examples of applications of hand gesture analysis include:

Control of consumer electronics

Interaction with visualization systems

Control of mechanical systems

Computer games

Potential advantages of using visual input in this context are that visual information makes it possible to communicate with computerized equipment at a distance, without need for physical contact with the equipment that is to be controlled. Moreover, the user should be able to control the equipment without need for specialized external devices, such as a remote control.

2 Control by hand gestures

Figure 1 shows an illustration of a type of scenario we are interested in. The user is in front of a camera connected to a computer. The camera follows the movements of the hand, and performs actions depending on the state and the motion of the hand. Three basic types of hand gestures can be identified in such a situation:

A static hand posture implies that the hand is held in a fixed state during a certain period of time, during which the system recognizes the state given a predefined set of states. Examples of interpretations that are possible [...] between different modes for a command involving motion.

Figure 1: Example of a simple situation where the user controls actions on a screen using hand gestures. In this application, the position of the cursor is controlled by the [...]

A quantitative hand motion means that the two-dimensional or the three-dimensional motion of the hand is measured, and the estimated motion parameters (translations and rotations) are used for controlling the motion of other computerized equipment, such as visualization parameters for displaying a three-dimensional object, the volume of a TV or the motion of a robot.

A qualitative hand motion means that the hand moves according to a predefined motion pattern (a trajectory in space-time) and that the motion pattern is recognized from a predefined set of motion patterns. Examples of interpretations include letters (the Palm Pilot sign language) or control of consumer electronics in a similar manner as for static hand postures.

3 A prototype scenario

To be able to test computer-vision-based human-computer interaction in practice, we developed a prototype test bed system, where the user can control a TV set and a lamp using the following types of hand postures:

Three open fingers (figure 2(a)) toggle the TV on or off.

Two open fingers (figure 2(b-c)) change the channel of the TV. With the index finger pointing to one side, the next TV channel is selected, while the previous channel is selected if the index finger points upwards.

Five open fingers (figure 2(d)) toggle the lamp on or off.
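The binding between postures and commands listed above can be sketched as a small dispatch routine. This is a minimal illustration only: the posture labels, the `Room` class and the number of channels are hypothetical names and values introduced here, not part of the prototype.

```python
from dataclasses import dataclass

# Hypothetical posture labels; the report describes the postures only in words.
THREE_FINGERS = "three_open_fingers"
TWO_FINGERS_SIDE = "two_open_fingers_index_sideways"
TWO_FINGERS_UP = "two_open_fingers_index_up"
FIVE_FINGERS = "five_open_fingers"


@dataclass
class Room:
    """Toy stand-in for the controlled equipment (a TV set and a lamp)."""
    tv_on: bool = False
    lamp_on: bool = False
    channel: int = 1
    num_channels: int = 10  # assumed value, for illustration only

    def apply(self, posture: str) -> None:
        """Carry out the action bound to a recognized posture."""
        if posture == THREE_FINGERS:
            self.tv_on = not self.tv_on
        elif posture == TWO_FINGERS_SIDE:
            # Next channel, wrapping around after the last one.
            self.channel = self.channel % self.num_channels + 1
        elif posture == TWO_FINGERS_UP:
            # Previous channel, wrapping around before the first one.
            self.channel = (self.channel - 2) % self.num_channels + 1
        elif posture == FIVE_FINGERS:
            self.lamp_on = not self.lamp_on
        # Unknown postures are ignored.


room = Room()
for posture in (FIVE_FINGERS, THREE_FINGERS, TWO_FINGERS_SIDE):
    room.apply(posture)
```

After this sequence the lamp and the TV are on and the TV shows channel 2. Keeping the binding in one place makes it easy to attach the same postures to other equipment.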

Figure 3 shows a few snapshots from a demonstration, where a user controls equipment in the environment in this way. In figures 3(a)-(b) a user turns on the lamp, in figures 3(c)-(d) he turns on the TV set, and in figures 3(e)-(f) he switches the TV set to a new channel. All steps in this demonstration have [...] system described in next section.

Figure 2: Hand postures controlling a prototype scenario: (a) a hand with three open fingers toggles the TV on or off, (b) a hand with two open fingers and the index finger pointing to one side selects the next TV channel, (c) a hand with two open fingers and the index finger pointing upwards selects the previous TV channel, (d) a hand with five open fingers toggles the lamp on or off.

Figure 3: A few snapshots from a scenario where a user enters a room and turns on the lamp (a)-(b), turns on the TV set (c)-(d) and switches to a new TV channel (e)-(f).

4 A prototype system

To track and recognize hands in multiple states, we have developed a system based on a combination of shape and colour information. At an overview level, the system consists of the following functionalities (see figure 4):

Image capturing

Colour segmentation

Feature detection

Tracking and Pose recognition

Application control

Figure 4: Overview of the main components of the prototype system for detecting and recognizing hand gestures, and using this information for controlling consumer electronics. (Labels in the diagram: Colour image; ROI; Blobs and Ridges; Pose, Position, Scale and Orientation.)

The image information from the camera is grabbed at frame rate, and the colour images are converted from RGB format to a new colour space that separates the intensity and chromaticity components of the colour data. In the colour images, colour feature detection is performed, which results in a set of image features that can be matched to a model. Moreover, a complementary comparison between actual colour and skin colour is performed to identify regions that are more likely to contain hands. Based on the detected image features and the computed skin colour similarity, comparison with a set of object hypotheses is performed using a statistical approach referred to as particle filtering or condensation. The most likely hand posture is estimated, as well as the position, size and orientation of the hand. This recognized gesture information is bound to different actions relative to the environment, and these actions are carried out under the control of the gesture recognition system. In this way, the gesture recognition system provides a medium by which the user can control different types of equipment in his environment. Appendix A gives a more detailed description of the algorithms and computational modules in the system.
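The select-predict-measure cycle behind particle filtering (condensation) can be illustrated with a one-dimensional toy problem. The sketch below is not the authors' tracker: the scalar state, the Gaussian diffusion, the Gaussian likelihood and all numeric values are assumptions made for illustration, whereas the prototype compares hypotheses about the full hand state against image features and skin colour.

```python
import math
import random


def condensation_step(particles, weights, likelihood, noise_std, rng):
    """One select-predict-measure iteration over a list of scalar hypotheses."""
    n = len(particles)
    # Select: resample hypotheses in proportion to their current weights.
    selected = rng.choices(particles, weights=weights, k=n)
    # Predict: diffuse each hypothesis with Gaussian noise (assumed dynamics).
    predicted = [p + rng.gauss(0.0, noise_std) for p in selected]
    # Measure: re-weight each hypothesis by how well it explains the observation.
    raw = [likelihood(p) for p in predicted]
    total = sum(raw)
    if total == 0.0:
        new_weights = [1.0 / n] * n  # degenerate case: fall back to uniform weights
    else:
        new_weights = [w / total for w in raw]
    return predicted, new_weights


def weighted_mean(particles, weights):
    """State estimate as the weight-averaged hypothesis."""
    return sum(p * w for p, w in zip(particles, weights))


TRUE_POSITION = 3.0  # hypothetical hand position along one image axis


def likelihood(p):
    """Assumed Gaussian measurement model centred on the true position."""
    return math.exp(-0.5 * ((p - TRUE_POSITION) / 0.5) ** 2)


rng = random.Random(0)
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
weights = [1.0 / len(particles)] * len(particles)
for _ in range(10):
    particles, weights = condensation_step(particles, weights, likelihood, 0.2, rng)

estimate = weighted_mean(particles, weights)
```

After a few iterations the weighted mean settles close to the assumed true position. In a tracker of the kind described above, each particle would instead hold a complete hand hypothesis (posture, position, size and orientation).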


The problem of hand gesture analysis has received increased attention in recent years. Early work on using hand gestures for television control was presented by (Freeman & Weissman 1995) using normalized correlation; see also (Kuch & Huang 1995, Pavlovic et al. 1997, Maggioni & Kammerer 1998, Cipolla & Pentland 1998) for related works. Some approaches consider elaborated 3-D hand models (Regh & Kanade 1995), while others use colour markers to simplify feature detection (Cipolla et al. 1993). Appearance-based models for hand tracking and sign recognition were used by (Cui & Weng 1996), while (Heap & Hogg 1998, MacCormick & Isard 2000) tracked silhouettes of hands. Graph-like and feature-based hand models have been proposed by (Triesch & von der Malsburg 1996) for sign recognition and in (Bretzner & Lindeberg 1998) for tracking and estimating 3-D rotations of a hand.

The use of a hierarchical hand model continues along the works by (Crowley & Sanderson 1987) who extracted peaks from a Laplacian pyramid of an image and linked them into a tree structure with respect to resolution, (Lindeberg 1993) who constructed a scale-space primal sketch with an explicit encoding of blob-like structures in scale space as well as the relations between these, (Triesch & von der Malsburg 1996) who used elastic graphs to represent hands in different postures with local jets of Gabor filters computed at each vertex, (Lindeberg 1998) who performed feature detection with automatic scale selection by detecting local extrema of normalized differential entities with respect to scale, (Shokoufandeh et al. 1999) who detected maxima in a multi-scale wavelet transform, as well as (Bretzner & Lindeberg 1999), who computed multi-scale blob and ridge features and defined explicit qualitative relations between these features. The use of chromaticity as a primary cue for detecting skin coloured regions was first proposed by (Fleck et al. 1996).

Our implementation of particle filtering largely follows the traditional approaches for condensation as presented by (Isard & Blake 1996, Black & Jepson 1998, Sidenbladh et al. 2000, Deutscher et al. 2000) and others. Using the hierarchical multi-scale structure of the hand models, however, we adapted the layered sampling approach (Sullivan et al. 1999) and used a coarse-to-fine search strategy to improve the computational efficiency, here, by a factor of two.

The proposed approach is based on several of these works and is novel in the respect that it combines a hierarchical object model with image features at multiple scales and particle filtering for robust tracking and recognition. For more details about the algorithmic aspects underlying the tracking and recognition components in the current system, see (Laptev & Lindeberg 2000).

6 The CVAP-CID collaboration

The work is carried out as a collaboration project between the Computational Vision and Active Perception Laboratory (CVAP) and the Center for User-Oriented IT-Design at KTH, where CVAP provides expertise on computer vision, while CID provides expertise on human-computer interaction.

[...] central importance that user studies are being carried out and that the interaction is tested in prototype systems as early as possible. Computer vision algorithms for gesture recognition will be developed by CVAP, and will be used in prototype systems in scenarios defined in collaboration with CID. User studies for these scenarios will then be performed and be developed by CID, to guide further developments.

References

Black, M. & Jepson, A. (1998), A probabilistic framework for matching temporal trajectories: Condensation-based recognition of gestures and expressions, in `Fifth European Conference on Computer Vision', Freiburg, Germany, pp. 909-924.

Bretzner, L. & Lindeberg, T. (1998), Use your hand as a 3-D mouse or relative orientation from extended sequences of sparse point and line correspondences using the affine trifocal tensor, in H. Burkhardt & B. Neumann, eds, `Fifth European Conference on Computer Vision', Vol. 1406 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Freiburg, Germany, pp. 141-157.

Bretzner, L. & Lindeberg, T. (1999), Qualitative multi-scale feature hierarchies for object tracking, in O. F. O. M. Nielsen, P. Johansen & J. Weickert, eds, `Proc. 2nd International Conference on Scale-Space Theories in Computer Vision', Vol. 1682, Springer Verlag, Corfu, Greece, pp. 117-128.

Cipolla, R., Okamoto, Y. & Kuno, Y. (1993), Robust structure from motion using motion parallax, in `Fourth International Conference on Computer Vision', Berlin, Germany, pp. 374-382.

Cipolla, R. & Pentland, A., eds (1998), Computer vision for human-computer interaction, Cambridge University Press, Cambridge, U.K.

Crowley, J. & Sanderson, A. (1987), `Multiple resolution representation and probabilistic matching of 2-d gray-scale shape', IEEE Transactions on Pattern Analysis and Machine Intelligence 9(1), 113-121.

Cui, Y. & Weng, J. (1996), View-based hand segmentation and hand-sequence recognition with complex backgrounds, in `13th International Conference on Pattern Recognition', Vienna, Austria, pp. 617-621.

Deutscher, J., Blake, A. & Reid, I. (2000), Articulated body motion capture by annealed particle filtering, in `CVPR'2000', Hilton Head, SC, pp. II:126-133.

Fleck, M., Forsyth, D. & Bregler, C. (1996), Finding naked people, in `Fourth European Conference on Computer Vision', Cambridge, UK, pp. II:593-602.

Freeman, W. T. & Weissman, C. D. (1995), Television control by hand gestures, in `Proc. Int. Conf. on Face and Gesture Recognition', Zurich, Switzerland.

Heap, T. & Hogg, D. (1998), Wormholes in shape space: Tracking through discontinuous changes in shape, in `Sixth International Conference on Computer Vision', Bombay, India, pp. 344-349.

Isard, M. & Blake, A. (1996), Contour tracking by stochastic propagation of conditional density, in `Fourth European Conference on Computer Vision', Cambridge, UK, pp. I:343-356.

Kuch, J. J. & Huang, T. S. (1995), Vision based hand modelling and tracking for virtual teleconferencing and telecollaboration, in `Proc. 5th International Conference on Computer Vision', Cambridge, MA, pp. 666-671.

Laptev, I. & Lindeberg, T. (2000), Tracking of multi-state hand models using particle filtering and a hierarchy of multi-scale image features, Technical Report ISRN KTH/NA/P--00/12--SE, Dept. of Numerical Analysis and Computing Science, KTH, Stockholm, Sweden.

Lindeberg, T. (1993), `[...] scale-space primal sketch: A method for focus-of-attention', International Journal of Computer Vision 11(3), 283-318.

Lindeberg, T. (1998), `Feature detection with automatic scale selection', International Journal of Computer Vision 30(2), 77-116.

MacCormick, J. & Isard, M. (2000), Partitioned sampling, articulated objects, and interface-quality hand tracking, in `Sixth European Conference on Computer Vision', Dublin, Ireland, pp. II:3-19.

Maggioni, C. & Kammerer, B. (1998), Gesturecomputer - history, design and applications, in R. Cipolla & A. Pentland, eds, `Computer vision for human-computer interaction', Cambridge University Press, Cambridge, U.K., pp. 23-52.

Pavlovic, V. I., Sharma, R. & Huang, T. S. (1997), `Visual interpretation of hand gestures for human-computer interaction: A review', IEEE Trans. Pattern Analysis and Machine Intell. 19(7), 677-694.

Regh, J. M. & Kanade, T. (1995), Model-based tracking of self-occluding articulated objects, in `Fifth International Conference on Computer Vision', Cambridge, MA, pp. 612-617.

Shokoufandeh, A., Marsic, I. & Dickinson, S. (1999), `View-based object recognition using saliency maps', Image and Vision Computing 17(5/6), 445-460.

Sidenbladh, H., Black, M. & Fleet, D. (2000), Stochastic tracking of 3d human figures using 2d image motion, in `Sixth European Conference on Computer Vision', Dublin, Ireland, pp. II:702-718.

Sullivan, J., Blake, A., Isard, M. & MacCormick, J. (1999), Object localization by bayesian correlation, in `Seventh International Conference on Computer Vision', Corfu, Greece, pp. 1068-1075.

Triesch, J. & von der Malsburg, C. (1996), Robust classification of hand postures against complex background, in `Proc. Int. Conf. on Face and Gesture Recognition', Killington, Vermont, pp. 170-175.

A Computational modules in the prototype system

This appendix gives a more detailed description of the algorithms underlying the different computational modules in the prototype system for hand gesture recognition outlined in section 4. In contrast to the main text, this presentation assumes knowledge about computer vision.

A.1 Shape cues

For each image, a set of blob and ridge features is detected. The idea is that the palm of the hand gives rise to a blob at a coarse scale, each one of the fingers gives rise to a ridge at a finer scale, and each finger tip gives rise to a fine scale blob. Figure 5 shows an example of such image features computed from an image.

A.1.1 Colour feature detection

Technically, this feature detection step is based on the following computational steps. The input colour image is transformed from the RGB colour space to a [...]

$$ I = \frac{R + G + B}{3} \tag{1} $$

$$ u = R - G \tag{2} $$

$$ v = G - B \tag{3} $$
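The colour space conversion can be sketched per pixel as follows, assuming that u and v are the channel differences R - G and G - B. The function names and the list-of-rows image layout are choices made for this illustration.

```python
def rgb_to_iuv(r, g, b):
    """Map one RGB pixel to intensity I and chromaticity components (u, v)."""
    intensity = (r + g + b) / 3.0  # equation (1)
    u = r - g                      # equation (2)
    v = g - b                      # equation (3)
    return intensity, u, v


def convert_image(rgb_rows):
    """Apply the conversion to an image stored as rows of (R, G, B) tuples."""
    return [[rgb_to_iuv(*pixel) for pixel in row] for row in rgb_rows]


# A grey pixel has zero chromaticity; a reddish, skin-like pixel does not.
grey = rgb_to_iuv(120, 120, 120)     # (120.0, 0, 0)
reddish = rgb_to_iuv(200, 140, 110)  # (150.0, 60, 30)
```

Because u and v are differences between channels, adding the same offset to R, G and B changes only I. This is what separates intensity from chromaticity.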

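A scale-space representation is computed of each colour channel $f_i$ by convolution with Gaussian kernels $g(\cdot;\, t)$ of different variance $t$, $C_i(\cdot;\, t) = g(\cdot;\, t) * f_i(\cdot)$, and the following normalized differential expressions are computed and summed up over the channels at each scale:

$$ \mathcal{B}_{norm} C = \sum_i t^2 \left( \partial_{xx} C_i + \partial_{yy} C_i \right)^2 \tag{4} $$

$$ \mathcal{R}_{norm} C = \sum_i t^{3/2} \left( \left( \partial_{xx} C_i - \partial_{yy} C_i \right)^2 + 4 \left( \partial_{xy} C_i \right)^2 \right) \tag{5} $$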
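To see why the factor t^2 in the blob measure (4) leads to a meaningful scale, consider a single channel containing one Gaussian blob of variance t0. The sketch below uses the closed-form Laplacian for this idealized input instead of image convolutions. The single-channel setting, the unit-mass blob and the scale grid are assumptions made for illustration.

```python
import math


def blob_response(t, t0):
    """Normalized blob measure t^2 * (Laplacian)^2 at the centre of a Gaussian blob.

    The input is a unit-mass 2-D Gaussian of variance t0, so its scale-space
    representation at scale t is a Gaussian of variance t0 + t, whose Laplacian
    at the centre equals -1 / (pi * (t0 + t)^2).
    """
    laplacian = -1.0 / (math.pi * (t0 + t) ** 2)
    return t ** 2 * laplacian ** 2


def select_scale(t0, scales):
    """Return the scale on the grid at which the normalized response is largest."""
    return max(scales, key=lambda t: blob_response(t, t0))


scales = [0.5 * k for k in range(1, 41)]  # t = 0.5, 1.0, ..., 20.0
selected = select_scale(4.0, scales)      # the maximum falls at t = t0 = 4.0
```

Without the t^2 normalization the response would decrease monotonically with scale. With it, the maximum over scales occurs at t = t0, so the selected scale reflects the size of the blob.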

Then, scale-space maxima of these normalized differential entities are detected, i.e., points at which $\mathcal{B}_{norm}$ and $\mathcal{R}_{norm}$ assume local maxima with respect to space and scale. At each scale-space maximum $(x;\, t)$ a second-moment matrix

$$ \nu = \sum_i \int_{\eta \in \mathbb{R}^2} \begin{pmatrix} (\partial_x C_i)^2 & (\partial_x C_i)(\partial_y C_i) \\ (\partial_x C_i)(\partial_y C_i) & (\partial_y C_i)^2 \end{pmatrix} g(\eta;\, s_{int}) \, d\eta \tag{6} $$

is computed at integration scale $s_{int}$ proportional to the scale of the detected image features. To allow for the computational efficiency needed to reach real-time performance, all the computations in the feature detection step have been implemented within a pyramid framework. Figure 5 shows such features, illustrated by ellipses centered at $x$ and with covariance matrix $\Sigma = t\,\nu_{norm}$, where $\nu_{norm} = \nu / \lambda_{min}$ and $\lambda_{min}$ is the smallest eigenvalue of $\nu$.

Figure 5: The result of computing blob features and ridge features from an image of a hand. (a) circles and ellipses corresponding to the significant blob and ridge features extracted from an image of a hand; (b) selected image features corresponding to the palm, the fingers and the finger tips of a hand; (c) a mixture of Gaussian kernels associated with blob and ridge features illustrating how the selected image features capture the essential structure of a hand.
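The ellipse drawn for each feature can be computed from a second-moment matrix in a few lines. The sketch assumes that the covariance is the detection scale t times the second-moment matrix divided by its smallest eigenvalue. The function names and the example matrix are hypothetical.

```python
import math


def smallest_eigenvalue(nu):
    """Smallest eigenvalue of a symmetric 2x2 matrix ((a, b), (b, c))."""
    (a, b), (_, c) = nu
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean - radius


def feature_covariance(nu, t):
    """Covariance t * nu / lambda_min used to draw a feature as an ellipse."""
    lam_min = smallest_eigenvalue(nu)
    if lam_min <= 0.0:
        raise ValueError("second-moment matrix must be positive definite")
    return tuple(tuple(t * entry / lam_min for entry in row) for row in nu)


# Hypothetical second-moment matrix with gradient energy mostly along x.
nu = ((4.0, 0.0), (0.0, 1.0))
sigma = feature_covariance(nu, t=2.0)  # ((8.0, 0.0), (0.0, 2.0))
```

Dividing by the smallest eigenvalue fixes the smaller variance of the ellipse at t. An isotropic matrix therefore gives a circle of variance t, and anisotropy only stretches the other axis.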


As mentioned above, an image of a hand can be expected to give rise to blob and ridge features corresponding to the fingers of the hand. These image structures, together with information about their relative orientation, position and scale, can be used for defining a simple but discriminative view-based model of a hand. Thus, we represent a hand by a set of blob and ridge features as illustrated in figure 6, and define different states, depending on the number of open fingers.

To model translations, rotations and scaling transformations of the hand, we define a parameter vector $X = (x, y, s, \alpha, l)$, which describes the global position $(x, y)$, the size $s$, and the orientation $\alpha$ of the hand in the image, together with its discrete state $l = 1 \ldots 5$. The vector $X$ uniquely identifies the hand configuration in the image and estimation of $X$ from image sequences corresponds to simultaneous hand tracking and recognition.

Figure 6: Feature-based hand models in different states. The circles and ellipses correspond to blob and ridge features. When aligning models to images, the features are translated, rotated and scaled according to the parameter vector X. (Labels in the diagram: x, y, s, α and the states l = 1, ..., 5.)
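Aligning a model feature to the image according to the parameter vector amounts to a similarity transform. The sketch below shows one way to write it. The `HandState` class, the feature layout and the treatment of feature size as a length are assumptions made for this illustration, not the authors' data structures.

```python
import math
from dataclasses import dataclass


@dataclass
class HandState:
    """Parameter vector X = (x, y, s, alpha, l)."""
    x: float
    y: float
    s: float       # size, used as a scale factor on the model
    alpha: float   # orientation in radians
    posture: int   # discrete state l, 1..5


def place_feature(state, fx, fy, fsize, forient):
    """Map one model feature (position, size, orientation) into the image."""
    cos_a, sin_a = math.cos(state.alpha), math.sin(state.alpha)
    # Rotate and scale the model coordinates, then translate to the hand position.
    ix = state.x + state.s * (cos_a * fx - sin_a * fy)
    iy = state.y + state.s * (sin_a * fx + cos_a * fy)
    # Feature size is treated as a length here, so it scales linearly with s.
    return ix, iy, state.s * fsize, forient + state.alpha


# Hypothetical model feature: a finger ridge one unit along the model's x axis.
state = HandState(x=100.0, y=50.0, s=20.0, alpha=math.pi / 2, posture=2)
ix, iy, size, orient = place_feature(state, 1.0, 0.0, 0.5, 0.0)
# The feature lands at roughly (100, 70) with size 10 and orientation pi/2.
```

The discrete state selects which set of features makes up the model, while the four continuous parameters place that set in the image.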

A.3 Skin colour

When tracking human faces and hands in images, the use of skin colour has been demonstrated to be a powerful cue. In this work, we explore similarity to skin colour in two ways:

For defining candidate regions (masks) for searching for hands.

For computing a probabilistic measure of any pixel being skin coloured.

Histogram-based computation of skin coloured search regions. To delimit regions in the image for searching for hands, an adaptive histogram analysis of colour information is performed. For every image, a histogram is computed for the chromatic (u, v)-components of the colour space. In this (u, v)-space a coarse search region has been defined, where skin coloured regions are likely to be. Within this region, blob detection is performed, and the blob most likely to correspond to skin colour is selected. The support region of this blob in colour space is backprojected into the image domain, which results in [...] interest computed in this way, which are used as a guide for subsequent processing.

Figure 7: To delimit the regions in space where to perform recognition of hand gestures, an initial computation of regions of interest is carried out, based on adaptive histogram analysis. This illustration shows the behaviour of the histogram based colour analysis for a detail of a hand. In the system, however, the algorithm operates on overview images. (a) original image, (b) histogram over chromatic information, (c) backprojected histogram blob giving a hand mask, (d) results of blob detection in the histogram.
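The histogram backprojection step can be sketched with plain Python containers. This is a simplification rather than the method used in the system: blob detection in the histogram is replaced here by taking the most populated bin together with its eight neighbours. The bin size and the search region are made-up values.

```python
from collections import Counter


def skin_mask(uv_image, search_region, bin_size=8):
    """Backproject the dominant (u, v) histogram peak into a binary mask.

    uv_image: rows of (u, v) chromaticity pairs.
    search_region: (u_min, u_max, v_min, v_max), a coarse region assumed to
    contain skin colours. The peak bin plus its 8 neighbours stands in for the
    support region of the selected histogram blob.
    """
    u_min, u_max, v_min, v_max = search_region

    def bin_of(u, v):
        return (u // bin_size, v // bin_size)

    # Histogram over the chromatic components, restricted to the search region.
    hist = Counter(
        bin_of(u, v)
        for row in uv_image
        for (u, v) in row
        if u_min <= u <= u_max and v_min <= v <= v_max
    )
    if not hist:
        return [[False] * len(row) for row in uv_image]

    (peak_u, peak_v), _ = hist.most_common(1)[0]
    support = {(peak_u + du, peak_v + dv) for du in (-1, 0, 1) for dv in (-1, 0, 1)}

    # Backprojection: mark every pixel whose colour falls in the support region.
    return [[bin_of(u, v) in support for (u, v) in row] for row in uv_image]


# Toy 2x3 image: four skin-like pixels and two grey background pixels.
image = [[(60, 30), (62, 28), (0, 0)],
         [(58, 33), (0, 0), (61, 31)]]
mask = skin_mask(image, search_region=(20, 120, 10, 80))
```

For the toy image the mask is true for the four skin-like pixels and false for the two grey ones. Because the histogram is recomputed for every image, the selected peak can follow changes in illumination.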

Probabilistic prior on skin colour. For exploring colour information in this context, we compute a probabilistic colour prior in the following way:

Hands were segmented manually from the background for approximately 30 images, and two-dimensional histograms over the chromatic information (u, v) were accumulated for skin regions and background.

These histograms were summed up and normalized to unit mass.

Given these training data, the probability of any measured image point with colour values (u, v) being skin colour was estimated as

$$ p_{skin}(u, v) = \frac{\max\left(0,\; a\, H_{skin}(u, v) - H_{bg}(u, v)\right)}{\sum_{u, v} \max\left(0,\; a\, H_{skin}(u, v) - H_{bg}(u, v)\right)}, \tag{7} $$
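The estimate in (7) can be evaluated directly from two normalized histograms. The sketch below assumes the numerator is the weighted skin histogram minus the background histogram, clipped at zero. The dictionary representation, the example histograms and the value a = 1.0 are hypothetical, since the value of a is not given in the text above.

```python
def skin_prior(h_skin, h_bg, a):
    """Evaluate the skin colour prior from two normalized (u, v) histograms.

    h_skin, h_bg: dicts mapping a (u, v) bin to its histogram mass.
    a: weight on the skin histogram (an assumed parameter value is passed in).
    """
    bins = set(h_skin) | set(h_bg)
    # Clip at zero so bins dominated by background get no skin probability.
    raw = {b: max(0.0, a * h_skin.get(b, 0.0) - h_bg.get(b, 0.0)) for b in bins}
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("no bin favours skin for this choice of a")
    return {b: value / total for b, value in raw.items()}


# Hypothetical histograms, each normalized to unit mass.
h_skin = {(60, 30): 0.75, (40, 20): 0.25}
h_bg = {(40, 20): 0.5, (0, 0): 0.5}
prior = skin_prior(h_skin, h_bg, a=1.0)
```

In this toy case only the bin (60, 30) keeps a positive value, so it receives the whole probability mass. The bin (40, 20), which is more common in the background than on skin, is suppressed to zero.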
