LiU-ITN-TEK-A--10/068--SE

Magic mirror using motion capture in an exhibition environment

Daniel Eriksson
Thom Persson

2010-11-18

Department of Science and Technology, Linköping University, SE-601 74 Norrköping, Sweden
(Institutionen för teknik och naturvetenskap, Linköpings Universitet, 601 74 Norrköping)

LiU-ITN-TEK-A--10/068--SE

Magic mirror using motion capture in an exhibition environment

Master's thesis in media technology carried out at the Institute of Technology, Linköping University
(Examensarbete utfört i medieteknik vid Tekniska Högskolan vid Linköpings universitet)

Daniel Eriksson
Thom Persson

Supervisor: Thomas Rydell
Examiner: Stefan Gustavson

Norrköping, 2010-11-18

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Daniel Eriksson, Thom Persson

Magic mirror using motion capture in an exhibition environment

Daniel Eriksson
Thom Persson

November 30, 2010

Abstract

Motion capture is a commonly used technique in the movie and computer game industries to record animation data. The systems used in these industries are expensive high-end systems that often use markers on the actor together with several cameras to record. Reasonable results can, however, be achieved with no markers and a single webcam. In this report we take a look at such a system and then use it together with our own animation software. The final product will be placed in an exhibition environment, which restricts the level of interaction with the user that is practical.

Contents

Contents
List of Figures
List of Algorithms
List of Abbreviations

1 Introduction
  1.1 Interactive Institute
  1.2 Visualization Center C
  1.3 Purpose
  1.4 Our task

2 Motion capture
  2.1 Optical systems
    2.1.1 Passive markers
    2.1.2 Active markers
    2.1.3 Markerless
  2.2 Magnetic systems
  2.3 Mechanical systems
  2.4 Body motion capture
  2.5 Facial motion capture
    2.5.1 Feature tracking
    2.5.2 Active appearance models
    2.5.3 visage|SDK
    2.5.4 faceAPI

3 Animation
  3.1 Blend shapes
  3.2 Skeletal animation

4 Blink detection
  4.1 Normal flow
  4.2 State machine

5 Implementation
  5.1 Blend shapes
  5.2 Skinning
  5.3 Filtering
  5.4 Matching landmarks
    5.4.1 Eyebrows
    5.4.2 Mouth
    5.4.3 Jaw movement
  5.5 Blink detection
  5.6 Key framing
    5.6.1 Animation
    5.6.2 Editor
  5.7 Eyes
  5.8 Upper body movement
  5.9 Noise
  5.10 Lighting
  5.11 SSAO

6 Results
  6.1 Performance
  6.2 SSAO
  6.3 Usage Example
  6.4 The installation

7 Conclusions
  7.1 OpenCV
  7.2 Blink detection
  7.3 Tracking and animation separation
  7.4 Future work
    7.4.1 Gaze tracking

A faceAPI landmark standard

B User manual
  B.1 Command line arguments
  B.2 Available Shapes
  B.3 File formats
    B.3.1 Config
    B.3.2 Models
    B.3.3 Animation

Bibliography

List of Figures

1.1 An early concept image of the installation.
2.1 Optical motion capture suit with markers from MoCap for Artists: Workflow and Techniques for Motion Capture [4].
2.2 Example of detected facial features from A robust facial feature tracking system [13].
2.3 Screen capture of example video showcasing visage|SDK from Visage Technologies.
2.4 Part of a screen capture of our final application visualizing the tracked features from faceAPI.
3.1 Example of blend shapes from the book GPU Gems 3 [30].
4.1 The normal flow showing an opening sequence between two eye frames.
4.2 The visualization used by Divjak and Bischof [31] for their state machine.
5.1 Illustrated bone weights.
5.2 Illustrated bone placement.
5.3 Key frame editor.
6.1 Comparison between SSAO on (left) and SSAO off (right).
6.2 Usage examples.
6.3 Usage examples.
6.4 The installation seen from the front.
6.5 The installation seen from the back.
A.1 faceAPI landmark standard.

List of Algorithms

1 General pseudocode
2 Eyebrow pseudocode
3 Open mouth pseudocode
4 Smile pseudocode
5 Blink detection pseudocode

List of Abbreviations

AAM   Active Appearance Model
API   Application Programming Interface
CGI   Computer Generated Imagery
CPU   Central Processing Unit
FIR   Finite Impulse Response
GLSL  OpenGL Shading Language
GPU   Graphics Processing Unit
GUI   Graphical User Interface
IIR   Infinite Impulse Response
LED   Light-Emitting Diode
MIT   Massachusetts Institute of Technology
PCA   Principal Component Analysis
SDK   Software Development Kit
SSAO  Screen Space Ambient Occlusion
VR    Virtual Reality

Chapter 1. Introduction

This Master's thesis was carried out at the Interactive Institute in Norrköping, with the goal of becoming part of an exhibition called "To show what can not be seen" at Norrköping Visualization Center C.

1.1 Interactive Institute

The Interactive Institute (http://www.tii.se) is an experimental media research institute with expertise in art, design and technology. It conducts research in these areas and also provides strategic advice to corporations and public organizations. The Interactive Institute is organized into subgroups located around the country, each one with a slightly different focus. The group this Master's thesis was developed at is called C-Studio and is also a part of the Norrköping Visualization Center C.

1.2 Visualization Center C

Norrköping Visualization Center C is a cooperation between Norrköping Kommun, Linköping University, Norrköping Science Park and The Interactive Institute. Besides the permanent exhibition mentioned above, the center also contains a temporary exhibition, a restaurant and café, office space for the various branches of C, conference rooms, a VR theater and a dome theater.

1.3 Purpose

The purpose of this Master's thesis was to develop an interactive installation that in a playful manner demonstrates how motion capture techniques work and how they are used in the game and movie industries.

1.4 Our task

The task was twofold. The first part was to develop, from scratch or based on existing software, a robust system for tracking a user's movements. The requirements for this system were:

• Real time, or as close to real time as possible.
• Camera based, usable with a single off-the-shelf webcam.
• No user input other than the camera.
• Can handle large head rotations, up to 45 degrees.
• Can handle occlusion.
• Does not use markers or other equipment.

The second part was to develop, from scratch or based on existing software, an application that utilizes the tracking software to render animated characters.

Figure 1.1: An early concept image of the installation.

Chapter 2. Motion capture

Motion capture is a process used in a number of areas including the military [1], clinical medicine [2] and the entertainment industry [3]. Despite the wide use of motion capture and its different implementations, a fitting definition can be found by Kitagawa and Windsor [4]: "Motion capture (mocap) is sampling and recording motion of humans, animals, and inanimate objects as 3D data. The data can be used to study motion or to give an illusion of life to 3D computer models."

A common use of motion capture today can be found in the feature film industry, where an actor's movements are recorded using one of the available systems for motion capture and then translated onto a virtual character. While some call it "Satan's Rotoscope" and see it as a threat to the livelihood of animators, others see it as a promising technique which will allow animators to focus on more creative endeavors [5]. Whatever role motion capture will play in the future, its main qualities today are:

• High speed, making it possible to animate a virtual character much faster than traditional key framing and providing an easy way to do multiple takes with different deliveries from the actor. Some systems are so fast that even real time viewing is possible, giving even more control to the director [6].
• The amount of work needed is not as dependent on the complexity, or the length, of the animation as other common animation techniques like key framing can be.
• Realistic movements and physical interactions, including exchange of forces and secondary motion, are easily recorded.

Although it is a very fast and cost effective technique, the automatic nature of motion capture has a few disadvantages as well:

• Specific hardware, software, and personnel are needed to record and process the data, as well as a designated motion capture area in some cases. The combined cost can be too much for small studios or productions with a low budget.

• The motion capture area may be very limited in volume, severely affecting the possible range of scenes to capture. The system may also be dependent on specific clothing or prohibit the use of metal objects.
• Movements which are not physically possible cannot be captured.
• Artifacts occur when the proportions of the capture subject's limbs differ from those of the computer model.

One of the earliest techniques related to motion capture is rotoscoping, an animation technique invented in 1915 by Max Fleischer [7]. With rotoscoping, animators trace over live-action film, frame by frame, with hand-drawn animation. The primary use of rotoscoping was to help the animators quickly create realistic movement with as little work as possible. Rotoscoping has since been used in many of Disney's animated feature films with human movement, for example "Snow White and the Seven Dwarfs" from 1937, as well as more recent productions like Ralph Bakshi's animated film "The Lord of the Rings" from 1978 [5].

The first successful CGI implementation of motion capture more closely related to today's techniques is the movement of the animated robot "Brilliance". She was produced by Robert Abel and Associates for the National Canned Food Information Council for a commercial aired during the 1985 Super Bowl. Her movements were animated by painting a total of 18 black dots on a live model's joints and photographing the model from multiple angles. The photographs were then processed in the computer to generate the information needed to animate the robot.

Kitagawa and Windsor [4] divide today's motion capture techniques into five main areas: optical, magnetic, mechanical, ultrasonic and inertial systems. The last two are not discussed due to their rare use in the entertainment business.

2.1 Optical systems

Optical systems rely on the use of multiple cameras with overlapping capture areas to gather motion capture data. The cameras are used together with up to hundreds of reflective or light emitting markers attached to the capture subject's joints, or in some recent systems, with no markers at all. At least two cameras need to see a marker in order to triangulate its 3D position, although three or more are preferable for accuracy. Real time viewing is possible, although limited to less accurate motions due to a needed post-processing step where rotational information is calculated for the markers. Problems with the technique are mostly related to occlusion from the capture target or props, as well as a limited capture area requiring a very controlled lighting environment [4].

2.1.1 Passive markers

Passive markers usually have a spherical or circular shape and are coated with a reflective material, often put directly on the skin of the actor or on a slim body suit. Using a light source directed from each camera, the markers reflect the light and appear very bright. The images from the camera are thresholded so only the markers are seen.

2.1.2 Active markers

Active markers are made of light-emitting diodes (LEDs) instead of reflecting external light. This requires the capture target to wear electrical equipment but helps significantly with the identification of each marker. This is achieved either by lighting one marker at a time at the cost of frame rate, or by using a unique combination of amplitude and frequency for each LED.

Figure 2.1: Optical motion capture suit with markers from MoCap for Artists: Workflow and Techniques for Motion Capture [4].

2.1.3 Markerless

Research in computer vision has pushed markerless motion capture techniques forward. At MIT [8] and Stanford [2], techniques using no specific clothing or markers have emerged that extract motion parameters from edges and silhouettes in the image stream. Among commercial systems, the company mova and their Contour Reality Capture system use phosphorescent makeup to capture the geometry and texture of an actor's face.

2.2 Magnetic systems

Magnetic systems use fewer sensors than optical systems, often as few as 12-20, and with a lower sampling rate. The sensors are attached to the capture target and output both position and orientation by measuring the spatial relationship to a magnetic transmitter. This enables real time viewing without any tedious post-processing, but with the drawback of a smaller capture area. Magnetic systems do not suffer from occlusion or problems with sensor identification, but are prone to magnetic and electrical interference caused by metal objects as well as electrical equipment. The output tends to be a bit noisy and the mobility of the capture target is somewhat limited by carrying a battery and wiring for the sensors.

2.3 Mechanical systems

Mechanical systems are worn as an exo-skeletal device consisting of hinge joints, straight rods and potentiometers. The system measures the joint angles of the capture subject and is suitable for real time viewing. There is no occlusion, no interference and no capture area is needed due to the system's high portability. The problems are mostly related to the poor mobility caused by the rigid exo-skeleton's tendency to break, as well as the limited range of motion of the hinge joints. Mechanical systems also rely on accelerometers to measure global translation like walking or jumping, often resulting in data that stays at the same spot on the ground or slides.

2.4 Body motion capture

The task from The Interactive Institute demands that a single webcam is used with no markers, severely limiting the available techniques to mono view markerless optical systems. Many of these systems cannot be considered for real time applications or rely on a too controlled environment [9]. Although promising results can be achieved with a powerful computer and a parallelized implementation, this requires large amounts of training data as well as an initialization by the user [10]. Full body mono view markerless motion capture is overall a very challenging task, especially in an exhibition environment with changing lighting conditions and a crowded background [11]. A more realistic approach, while still moving within the boundaries of the task, is finding a less demanding motion capture target, for example facial motion capture.

2.5 Facial motion capture

Facial motion capture focuses on capturing the head movements as well as the movement of face muscles in order to recreate facial expressions. Although some of the concerns from full body motion capture remain in facial motion capture, it is still a more suitable approach due to the high number of real time implementations available.

2.5.1 Feature tracking

A feature tracker finds and follows facial features from one frame to another. A feature can be any point in a face but is often found on the edge around the mouth, an eye, an eyebrow or any other contrasting part easily found through image processing.

A common first step in locating features to track is to find a face, in order to limit the search. Some use color analysis to find skin tones in an image and then image processing techniques like thresholding and morphological operations to locate the overall face [12, 13]. Others use a machine learning approach where a system is trained to recognize a face based on classifiers from training data [14]. Both techniques can also be used to locate facial features in the limited region of the face. The thresholding approach, however, often identifies the found features by comparing their geometrical information in the image to a predefined face model. Due to the variety of individual appearances, for example skin color and face proportions, we consider both methods far from ideal. One solution is to manually mark facial features from a neutral expression in an initialization phase. This obviously requires tedious user input and is therefore not an approach suited for our application.

Figure 2.2: Example of detected facial features from A robust facial feature tracking system [13].

The second step is tracking the identified feature points from one frame to another. It is possible to repeat the previous procedure for each frame, but a less computationally heavy way is to track the feature points using their neighboring pixels in template matching [13, 15]. The third and last step is approximating head rotations.

Many feature tracking algorithms skip this step entirely and are therefore prone to errors when handling out-of-plane rotations in the input image [14, 16, 17]. Promising results are accomplished using a Kalman filter or the POSIT algorithm together with a simplified 3D face model to predict head movements [15, 13, 18]. The downside is that the technique relies heavily on the similarity between the simplified 3D face model and the geometry of the actual face being captured.

2.5.2 Active appearance models

An interesting approach not too distant from feature tracking is active appearance models (AAM) [19]. An AAM consists of two parts: a shape model and an appearance model. The shape model is defined as a simple triangulated 2D mesh of a human face, or more specifically its vertex positions. The shape model is commonly implemented as a shape vector s containing the positions of the v vertices that make up the mesh.

    s = (x_1, y_1, x_2, y_2, ..., x_v, y_v)^T    (2.1)

AAMs allow linear shape variation, meaning that the shape s can be expressed as the linear combination of a base shape s_0 and n weighted shape vectors s_i.

    s = s_0 + \sum_{i=1}^{n} p_i s_i    (2.2)

The base shape s_0 and the shape vectors s_i are created by manually overlaying the 2D mesh on a series of training images containing faces. Principal Component Analysis (PCA) is then run on the training shapes, creating s_0 from the mean and the s_i from the reshaped eigenvectors with the largest eigenvalues [20].

The appearance model is defined as the pixel content within the base shape s_0. The appearance of an AAM is then an image A(u) where u = (u, v) ∈ s_0. Like the shape model, each appearance A(u) can be described as a linear combination of a base appearance A_0(u) and m weighted appearances A_i(u).

    A(u) = A_0(u) + \sum_{i=1}^{m} \lambda_i A_i(u)    (2.3)

A_0(u) and A_i(u) are created using PCA on the pixel content within each shape in the training images. First the shapes need to be transformed to the shape s_0 through a piecewise affine warp between the corresponding triangles in each shape and s_0. A_0(u) is the mean image and A_i(u) are the eigenimages with the largest corresponding eigenvalues from the PCA.

Given the shape weights p = (p_1, p_2, ..., p_n)^T, the shape model can be calculated using equation 2.2. Likewise, given the appearance weights λ = (λ_1, λ_2, ..., λ_m)^T, the appearance model can be calculated using equation 2.3.
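To make the linear models above concrete, the following is a minimal C++ sketch (not part of the original implementation) of how a shape instance could be assembled from a base shape, shape vectors and weights according to equation 2.2; the variable names are our own.

#include <cstddef>
#include <vector>

// Assemble an AAM shape instance s = s0 + sum_i p_i * s_i (equation 2.2).
// 'base' holds the base shape s0 as interleaved (x1, y1, x2, y2, ...),
// 'modes' holds the n shape vectors s_i and 'p' their weights.
std::vector<float> shapeInstance(const std::vector<float>& base,
                                 const std::vector<std::vector<float>>& modes,
                                 const std::vector<float>& p)
{
    std::vector<float> s = base;                      // start from s0
    for (std::size_t i = 0; i < modes.size(); ++i)    // add each weighted mode
        for (std::size_t j = 0; j < s.size(); ++j)
            s[j] += p[i] * modes[i][j];
    return s;
}

The same loop applies to the appearance model in equation 2.3, with pixel values taking the place of vertex coordinates.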

The AAM instance can now be created by warping the appearance A from its base shape s_0 to the model shape s. Normally the AAM is fitted to a face in an input image, meaning that the optimal shape and appearance weights are found by minimizing the difference between the input image and the AAM instance. There are a number of ways to solve this problem depending on the speed and efficiency needed. Among the fast solutions, the inverse compositional alignment [19] appears to outperform previous algorithms in efficiency. AAMs can be extended to include 3D head pose estimation [21, 22] as well as to handle occlusions [23]. However, the use of training data requires a lot of manual labour, and the models still perform unsatisfactorily on persons excluded from the training data [24].

2.5.3 visage|SDK

One commercially available feature tracker is visage|SDK from Linköping based Visage Technologies AB (http://www.visagetechnologies.com/). Visage Technologies AB offers services and applications involving computer generated virtual characters and computer vision for finding and tracking faces and facial features in images and video. Their real time feature tracker, based on a master's thesis by Nils Ingemars [15], supports out-of-plane head rotations with full 3D head tracking but requires a setup procedure. The setup consists of manually positioning and scaling a 2D projection of the Candide-3 [25] face mesh model over a still image of the capture target at the beginning of the tracking session. Another noticeable issue is the lack of recovery from errors caused by fast head movements or occlusion.

Figure 2.3: Screen capture of example video showcasing visage|SDK from Visage Technologies.

2.5.4 faceAPI

Among the commercially available feature trackers, faceAPI from SeeingMachines (http://www.seeingmachines.com/) stands out. Their real time implementation offers a fully automatic and highly robust face tracker with an estimated 3D head position and orientation. The features available for tracking are eyes, eyebrows and lips, which can all be extracted from a number of movie file formats as well as from any webcam. The tracking is robust to occlusions, fast movements, large head rotations, environment lighting and varying personal traits including facial deformation, skin color, beards and glasses, and it automatically recovers from tracking errors. The tracker is easily integrated, highly configurable and comes with comprehensive documentation. faceAPI fulfills all of our demands and much more and is therefore the optimal tracker for our final application.

Figure 2.4: Part of a screen capture of our final application visualizing the tracked features from faceAPI.

Chapter 3. Animation

The second task in our Master's thesis is to develop a system that animates a 3D face model with the feature point coordinates output by our tracking software. The animation system is to be separated from the tracking system in order to play a key framed animation when no tracking occurs.

A possible animation system is the muscle-based approach, where facial expressions are created by simulating the movement of the underlying muscle structure of the face [26]. Another approach is to simply move the vertices in the face model according to their feature point counterparts from the tracking data and interpolate the leftover vertices. The first approach adds a lot of complexity, since we need to decide which muscles are involved based on the movements of the feature points, which is far from trivial. The second approach is quite promising but requires a higher number of feature points at greater precision than is available in order to create realistic expressions. What is needed is a technique that can create realistic expressions from a small set of feature points without adding a lot of complexity. The technique also needs to be well known in order for us to find a realistic, textured and animated 3D face model resource, since our 3D modeling experience is not enough for the limited timeframe. A technique fulfilling all these demands is blend shapes.

3.1 Blend shapes

Blend shapes is not a new concept; according to Joshi et al. [27] it can be traced back to Parke's work on facial animation in the early 70's [28, 29]. The technique consists of a model with a neutral expression and a finite set of extreme expressions. From that it is possible to construct a virtually infinite number of expressions by blending the neutral expression with the extremes weighted in different ways. The use of blend shapes makes it possible to create details which are hard to track but likely to occur together with trackable details, for example the wrinkles above the nose when the inner feature points of the eyebrows move downward and inward to create a frown.

Figure 3.1: Example of blend shapes from the book GPU Gems 3 [30].

Realistic and textured blend shape resources can be obtained from various places. A popular tool is FaceGen (http://www.facegen.com/) from Singular Inversions, a program used a lot in the game industry to create diverse facial blend shape meshes with little effort.

3.2 Skeletal animation

The linear blending of blend shapes makes it a bad solution for motion involving rotations. To animate such motion, for example head rotations or jaw movement, virtual bones are placed inside the model. Each vertex is then connected with weights to the bones that can affect that vertex. When the bones move, the model is animated, as sketched in the example below.
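The following C++ sketch shows this kind of weighted (linear blend) skinning for a single vertex. It illustrates the general idea rather than the code used in the installation, and the types and names are our own; the matrices are assumed to be column-major, as in OpenGL.

#include <array>
#include <cstddef>

struct Mat4 { float m[16]; };     // 4x4 bone transform, column-major
struct Vec4 { float x, y, z, w; };

// Multiply a column-major 4x4 matrix with a 4-component vector.
Vec4 mul(const Mat4& M, const Vec4& v)
{
    return { M.m[0]*v.x + M.m[4]*v.y + M.m[8]*v.z  + M.m[12]*v.w,
             M.m[1]*v.x + M.m[5]*v.y + M.m[9]*v.z  + M.m[13]*v.w,
             M.m[2]*v.x + M.m[6]*v.y + M.m[10]*v.z + M.m[14]*v.w,
             M.m[3]*v.x + M.m[7]*v.y + M.m[11]*v.z + M.m[15]*v.w };
}

// Skin one vertex: transform it by each influencing bone and blend the
// results with per-bone weights (the weights are assumed to sum to one).
Vec4 skinVertex(const Vec4& vertex,
                const std::array<Mat4, 2>& bones,
                const std::array<float, 2>& weights)
{
    Vec4 result{0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < bones.size(); ++i) {
        Vec4 t = mul(bones[i], vertex);
        result.x += weights[i] * t.x;
        result.y += weights[i] * t.y;
        result.z += weights[i] * t.z;
        result.w += weights[i] * t.w;
    }
    return result;
}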

Chapter 4. Blink detection

Humans blink very often and involuntarily. If our virtual character can match the blinking of the user, it will help to improve the user experience. There is quite a bit of research on detecting and analyzing blinks and eye movement [31, 32, 33]. When observed over a period of time this can give insight into the mental status of the subject [33], for example measuring fatigue in an operator of heavy machinery or critical systems.

A common problem in blink detection is the initial localization of the eyes, a problem faceAPI handles for us. Once the eyes are found, the actual blink detection can commence by matching each eye to a person-specific open-eye template [32] or by searching for motion in the immediate surroundings [31, 33]. The template approach detects blinks by measuring the error between the eye template and an area around the last known eye location. A high error indicates a bad fit, meaning that the eye has changed from open to closed. One problem with the template approach is knowing when the eyes are open for the initial creation of the templates. A bigger problem is the bad matching that naturally occurs when the user turns his head and each eye region is compared to a template seen from a frontal position. This results in false positives, making the technique viable only for stationary heads or head-mounted cameras. An approach more resilient to our head rotations is searching for motion using normal flow.

4.1 Normal flow

The basis of the technique used by Heishman and Duric [33], as well as Divjak and Bischof [31], is to calculate the direction and amplitude of the movement within a frame around the eyes. A common technique for this is optical flow, but it is considered too slow for real time applications by Heishman and Duric, who instead use a similar technique called normal flow. The advantage of normal flow is that it can be computed using only local information. By subtracting the global head movement from the local flow movement around the eyes, a reliable flow direction and magnitude can be found and used to determine the state of the eyes.

Figure 4.1: The normal flow showing an opening sequence between two eye frames. Image from Using Image Flow to Detect Eye Blinks in Color Videos [33].

4.2 State machine

To monitor and update the state of the eyes, Heishman and Duric [33] propose the use of a state machine. In their state machine they have three states: open, opening and closing. Divjak and Bischof [31] extend this and add another state to their state machine: closed. The reason for using a state machine is that normal flow by itself does not tell whether the eye is open or closed. But given the previous state of the eye as well as the information from the flow, it is possible to determine the next state. A potential problem is finding generic thresholds for the flow direction and the magnitude, which together decide the current state.

Figure 4.2: The visualization used by Divjak and Bischof [31] for their state machine. mag is the mean flow magnitude, dir is the dominant flow direction and T is the mean flow magnitude threshold.
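As an illustration of how such a state machine could be expressed in code, the following C++ sketch updates an eye state from the dominant flow direction and the mean flow magnitude compared against a threshold T. It is a generic sketch of the idea described above, not the state machine of Divjak and Bischof or the implementation used later in this thesis, and the transition rules are simplified.

enum class EyeState { Open, Closing, Closed, Opening };
enum class FlowDir  { Up, Down, None };

// Advance the eye state given the dominant flow direction and the mean
// flow magnitude, in the spirit of figure 4.2.
EyeState update(EyeState state, FlowDir dir, float magnitude, float T)
{
    const bool moving = magnitude > T;
    switch (state) {
    case EyeState::Open:
        return (moving && dir == FlowDir::Down) ? EyeState::Closing : EyeState::Open;
    case EyeState::Closing:
        return moving ? EyeState::Closing : EyeState::Closed;
    case EyeState::Closed:
        return (moving && dir == FlowDir::Up) ? EyeState::Opening : EyeState::Closed;
    case EyeState::Opening:
        return moving ? EyeState::Opening : EyeState::Open;
    }
    return state;
}

A blink can then be reported whenever the state passes through Closed and returns to Open.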

Chapter 5. Implementation

The application is primarily written in C++. For face tracking faceAPI is used; other APIs used are OpenGL for graphics, OpenCV (http://opencv.willowgarage.com/wiki/) to interact with the camera and for image processing, and GLFW (http://glfw.sf.net) and Qt (http://qt.nokia.com/products/) for window management and GUI handling. Boost (http://www.boost.org/) is also used for thread management and for easy handling of command line options. Shaders are written in GLSL. OpenGL is favoured over DirectX and the XNA framework since we are already familiar with it. GLFW is used because of its simplicity, while OpenCV is the obvious option for image processing in C/C++.

5.1 Blend shapes

In the blend shape calculations each vertex can be computed independently of the others, making the work suitable for the vertex shader. Each blend shape consists of the difference between the expression mesh and the neutral expression mesh. The differences for vertices and normals are stored in a texture which, together with the texture width, vertex count and number of shapes, is supplied to the shader in order to calculate the correct row and column for the texture lookups when blending. To get the current vertex id the "GL_EXT_gpu_shader4" extension (http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt) is used, enabling the built-in "gl_VertexID" variable, which just as easily could be supplied as an attribute. The "GL_ARB_texture_rectangle" extension (http://www.opengl.org/registry/specs/ARB/texture_rectangle.txt) is also used for rectangle textures with integer indices instead of the standard 2D textures where the coordinates are normalized to the 0 to 1 range. The formula used to calculate the resulting vertices and normals for the shapes can be seen in equation 5.1, where d_k is the difference and w_k is the weight for shape k.

    \vec{v}_{out} = \vec{v}_{base} + \sum_{k=1}^{N} \vec{d}_k w_k    (5.1)

This results in the following shader code:

// Per-shape vertex and normal deltas stored in rectangle textures,
// plus the blend weights and the layout information needed for the lookups.
uniform sampler2DRect vertexDeltas;
uniform sampler2DRect normalDeltas;
uniform sampler2DRect weights;
uniform int vertexCount;
uniform int numOfShapes;
uniform int textureWidth;

...

int id = gl_VertexID;
vec4 vertex = gl_Vertex;
vec3 normal = gl_Normal;

// Accumulate the weighted delta of every blend shape for this vertex.
for (int i = 0; i < numOfShapes; i++) {
    int index = vertexCount * i + id;
    int row = index / textureWidth;
    int col = index % textureWidth;
    vec2 dpos = vec2(col, row);
    vec2 wpos = vec2(i, 1);

    float curr_weight = texture2DRect(weights, wpos).x;

    vec3 vertexDelta = texture2DRect(vertexDeltas, dpos).xyz;
    vertex += vec4(vertexDelta * curr_weight, 0);

    vec3 normalDelta = texture2DRect(normalDeltas, dpos).xyz;
    normal += vec3(normalDelta * curr_weight);
}

The blend shape mesh used in the application was created by the digital artist Kent Trammell (http://www.ktrammell.com).

5.2 Skinning

The model has bones defined in Maya for head rotations, jaw movement and eye movement. Unfortunately the format used to export the model from Maya (Wavefront .OBJ) does not have any support for skinning data, making it impossible to export the bone information.

To solve this, a pre-processing step was added where the bone weights for the head rotations are estimated based on the y-value of each vertex and which part of the mesh it belongs to. In figure 5.1 this is illustrated using green color where the model is affected by rotation and red color where it is not. The gradient is the area where we do a linear interpolation to get a smooth transition from non-affected vertices to affected vertices. For example, it is known that the shirt mesh is not affected by the bone, so there is no point checking the y-values of its vertices. It is also known that the eyes, teeth and tongue meshes are always affected by the current maximum of the bone weight, making it possible to skip the check for their y-values as well. A small sketch of such a weight ramp is shown at the end of this section.

Figure 5.1: Illustrated bone weights.

The bone is manually placed inside the throat of the face mesh, visualized as the top red dot in figure 5.2. The output from faceAPI is the Euler angles for the head rotations, which are used to construct the rotation matrix on the CPU. The rotation matrix is then uploaded to the GPU every frame, where it is applied in the vertex shader after all blend shape calculations.
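The following C++ fragment is a minimal sketch of the kind of y-based weight ramp described above; the threshold values and names are illustrative and not taken from the actual pre-processing code.

// Estimate how much a vertex is affected by the head bone from its y-value.
// Vertices above rampTop get full weight, vertices below rampBottom none,
// and in the gradient between them the weight is a linear interpolation.
// Meshes with a known relationship to the bone skip the y-test entirely:
// the shirt is never affected (weight 0) while the eyes, teeth and tongue
// always get the current maximum weight.
float headBoneWeight(float y, float rampBottom, float rampTop)
{
    float t = (y - rampBottom) / (rampTop - rampBottom);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t;
}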

Figure 5.2: Illustrated bone placement.

Skinning for jaw movement and eye movement is substituted with blend shapes for simplicity.

5.3 Filtering

The values received from faceAPI are very noisy, resulting in jerky movements for the virtual character unless a low pass filter is applied. Both an Infinite Impulse Response (IIR) filter, described in equations 5.2 and 5.3, and a Finite Impulse Response (FIR) filter, described in equation 5.4, are implemented. In equation 5.3, RC is the time constant controlling the behavior of the filter and ∆t is the length of the current time step. In equation 5.4, N is the length of the buffer.

    y[n] = y[n-1] + \alpha (x[n] - y[n-1])    (5.2)

    \alpha = \frac{\Delta t}{RC + \Delta t}    (5.3)

    y[n] = \frac{1}{N} \sum_{k=0}^{N} x[n-k]    (5.4)

For head rotations the IIR filter was found to be unusable: its impulse response is non-zero for an infinite amount of time, making the virtual head movements lag too far behind the user's movements.
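As a concrete illustration of equations 5.2-5.4, the following C++ sketch implements both filters. It is a minimal re-statement of the formulas rather than the code used in the application, and the class names are our own.

#include <cstddef>
#include <deque>
#include <numeric>

// One-pole IIR low pass filter (equations 5.2 and 5.3).
class IIRFilter {
public:
    explicit IIRFilter(float rc) : rc_(rc) {}
    float update(float x, float dt) {
        float alpha = dt / (rc_ + dt);   // equation 5.3
        y_ += alpha * (x - y_);          // equation 5.2
        return y_;
    }
private:
    float rc_;
    float y_ = 0.0f;
};

// Moving-average FIR low pass filter over a buffer of length N (equation 5.4).
class FIRFilter {
public:
    explicit FIRFilter(std::size_t n) : n_(n) {}
    float update(float x) {
        buffer_.push_front(x);
        if (buffer_.size() > n_) buffer_.pop_back();
        float sum = std::accumulate(buffer_.begin(), buffer_.end(), 0.0f);
        return sum / static_cast<float>(buffer_.size());
    }
private:
    std::size_t n_;
    std::deque<float> buffer_;
};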

5.4 Matching landmarks

The tracked feature points are called landmarks in the faceAPI standard. In every frame the differences between landmark coordinates are calculated in order to find the appropriate weights for the blend shapes. For example, the weight for the open mouth blend shape is calculated from the difference between an upper lip landmark and a lower lip landmark. The problem with this approach is finding the largest possible difference, in order to normalize the difference to a value between 0 and 1 and use it as a weight for the blend shape. Every user has a unique appearance and different face proportions, making it hard to specify fixed maximum differences that work for everyone. The approach used is to set all the maximum differences relative to the distance between the eyes. This distance works well as an indicator of head width since it remains the same during the entire user experience, and it is created from the very first tracked image, where a frontal face is required. More precise relationships could be used if the first tracked image always contained a neutral expression, which unfortunately is not the case.

The "lowpassFilter" function in the pseudocode below is always the FIR filter described in section 5.3. The width of the buffer differs between the uses, as the goal was to make the window as narrow as possible to reduce lag.

Algorithm 1 General pseudocode
  pupilSeperation = abs(rightEyeCenter - leftEyeCenter)
  mouthNeutralWidth = 0.7 * pupilSeperation

"rightEyeCenter" in algorithm 1 is landmark 600 and "leftEyeCenter" is landmark 700. See appendix A for the faceAPI landmark standard.

5.4.1 Eyebrows

Algorithm 2 Eyebrow pseudocode
  for each eyebrow do
    browDiff = lowpassFilter(abs((eyeCenter - innerBrowPoint) / pupilSeperation))
    if browDiff < 0.22 then
      amountDown = min((0.22 - browDiff) / 0.7, 1.0)
      setShape(Brow down, amountDown * 1.3)
      setShape(Eye squint, amountDown)
      setShape(Brow up, 0)
      setShape(Eye wide, 0)
    else
      amountUp = min((browDiff - 0.22) / 0.23, 1.0)
      setShape(Brow up, amountUp * 1.3)
      setShape(Eye wide, amountUp)
      setShape(Brow down, 0)
      setShape(Eye squint, 0)
    end if
  end for

"innerBrowPoint" refers to landmarks 302 and 400, "eyeCenter" to landmarks 600 and 700.

5.4.2 Mouth

Algorithm 3 Open mouth pseudocode
  mouthOpen = lowpassFilter(abs(overlipYPos - underlipYPos))
  amountOpen = min(mouthOpen / (0.8 * mouthNeutralWidth), 1.0)
  if amountOpen > 0.3 then
    setShape(Mouth open, ((1.0/0.7) * (amountOpen - 0.3))^2)
  else
    setShape(Mouth open, 0)
  end if

Algorithm 4 Smile pseudocode
  mouthWidth = lowpassFilter(abs(leftMouthCorner - rightMouthCorner))
  mouthMaxWidthDelta = (1.04 * pupilSeperation) - mouthNeutralWidth
  mouthMinWidthDelta = mouthNeutralWidth - (0.5 * pupilSeperation)
  if mouthWidth > mouthNeutralWidth then
    amountWide = min((mouthWidth - mouthNeutralWidth) / mouthMaxWidthDelta, 1.0)
    setShape(Full smile, amountWide)
    setShape(Left temple flex, amountWide)
    setShape(Right temple flex, amountWide)
  else
    setShape(Full smile, 0)
    setShape(Left temple flex, 0)
    setShape(Right temple flex, 0)
  end if

The corresponding landmarks:

• "overlipYPos" is the y-component of landmark 202
• "underlipYPos" is the y-component of landmark 206
• "rightMouthCorner" is landmark 4
• "leftMouthCorner" is landmark 5

5.4.3 Jaw movement

The landmarks for the lips together with the face contour landmarks allow detection of when the user moves his jaw to the right or left. The program does not keep track of the jaw position; instead it looks for movement and initiates the correct animation when it detects a movement to the right or left.

5.5 Blink detection

Using the eye landmarks from faceAPI, a 40x50 pixel sub image is created around each eye for the blink detection calculations. Each eye is processed independently, but we make our virtual character blink with both eyes even if only one eye detects a blink. Otherwise the eyes blink out of sync, which looks very unnatural. This also helps animating blinks for users with asymmetric blink behavior [33], at the cost of removing single eye blinks. To calculate motion around the eyes, Heishman and Duric [33] use the normal flow algorithm and reject optical flow for real time applications. Due to time constraints, and the fact that OpenCV has several optical flow implementations available, optical flow was chosen as the blink detection technique.

Based on the required inputs and what is said about them in [34], the "cvCalcOpticalFlowHS()" function was chosen. Head movements introduce noise into the optical flow calculations. By compensating for roll, and ignoring the results if the pitch or yaw in a frame is above a certain threshold, a large portion of the false positive blinks should be removed.

Algorithm 5 Blink detection pseudocode
  Apply the reverse head rotation around the Z-axis to the image.
  for each eye do
    Calculate optical flow on a 40x50 image centered on the pupil.
    Construct histograms of the directions and magnitudes of the optical flow vectors.
    T = 1.8 + (z-value of the head position) * 0.9
    if the length of the rotation-difference vector < 0.8 then
      if the average magnitude > T and the dominating direction is down then
        Blink with both eyes.
      end if
    end if
  end for

5.6 Key framing

When the system is not tracking a user, it looks a bit boring if the model is not doing anything. This is fixed by playing a predefined animation. The animation is implemented by interpolating between key framed expressions and head rotations.

5.6.1 Animation

In the key frame animation system a key frame is defined by a time and a complete set of all rotations and shape weights. For every time step we calculate how far we are between two key frames on a scale from 0 to 1, and then the following function is used to interpolate between the two.

    y(t) = 6t^5 - 15t^4 + 10t^3    (5.5)

This equation was proposed by Ken Perlin in [35] and is C2 continuous, which means that it is continuous in the second derivative, thus making the transitions between key frames really smooth.

5.6.2 Editor

Hard coding the key frame animations in C++ is very tiresome. Therefore a simple file format was developed to store the key frame data, together with an editor to edit the format. Details on the file format can be found in appendix B.3.3.
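As an illustration of how equation 5.5 can be used to interpolate between two key frames, the following C++ sketch blends a single value; it is a minimal example rather than the actual animation code, and the function names are our own.

// Ken Perlin's C2-continuous fade curve, y(t) = 6t^5 - 15t^4 + 10t^3 (equation 5.5).
float fade(float t)
{
    return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f);
}

// Interpolate one value (for example a shape weight or a rotation angle)
// between two key frames, given the current time.
float interpolate(float value0, float time0, float value1, float time1, float now)
{
    float t = (now - time0) / (time1 - time0);   // normalized position, 0 to 1
    float s = fade(t);
    return value0 + s * (value1 - value0);
}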

Figure 5.3: Key frame editor.

5.7 Eyes

To get a good user experience it is important that the eyes of the virtual character do not stare out into nothing but instead look at the user. This was accomplished by rotating each eye around its center in the opposite direction of the pitch and yaw head rotations, making the virtual character look straight at the user at all times.

5.8 Upper body movement

The shoulder section of the model has no movement caused by blend shapes or the head's bone animation, but if it is completely still it does not feel right. To address this problem, a low frequency and low magnitude noise was added, controlling the upper body rotation around an imaginary bone placed in the hip of the virtual character (the bottom red dot in figure 5.2). This gives the impression of the character swinging back and forth, as if he is shifting his weight from one foot to the other.

5.9 Noise

For the low frequency upper body motion Perlin noise [35] is used, specifically an implementation made by Stefan Gustavsson [36]. The noise is applied independently for each rotation axis and consists of a sum of noises with different magnitudes and frequencies, along the lines of the sketch below.
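The following C++ fragment sketches how such a sum of noise octaves could drive one rotation axis. It assumes a 1D Perlin-style noise function supplied by the caller, and the octave count, frequencies and magnitudes are illustrative values, not the ones used in the installation.

// Sum a few octaves of a 1D noise function to get a low frequency, low
// magnitude rotation offset for one axis.
float upperBodyRotation(float time, float (*noise1d)(float))
{
    const float frequency[] = { 0.1f, 0.23f, 0.47f };   // slow oscillations
    const float magnitude[] = { 2.0f, 1.0f,  0.5f };    // degrees
    float rotation = 0.0f;
    for (int i = 0; i < 3; ++i)
        rotation += magnitude[i] * noise1d(time * frequency[i]);
    return rotation;
}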

5.10 Lighting

For lighting the model the Phong model is used. The light sources are four point lights: one key light, two rim lights and one fill light.

5.11 SSAO

Screen space ambient occlusion (SSAO) is a technique that tries to approximate ambient occlusion in screen space using the depth value and normal in each pixel. An implementation from [37] with some modifications is used. The original code uses the texture coordinates of the fullscreen quad to look up a random vector, resulting in an artifact where the SSAO shadow "slides" over the model when it moves. To fix this, the texture coordinates of the model are used instead for the random vector lookup.

Chapter 6. Results

An important factor to consider when evaluating our work is how the installation has been received in the exhibition. We have not done any formal survey, but from our observations during the opening weekend and what we have heard from the tour guides, people do like the installation. However, there are some usability problems; to some people it is not clear what to do.

6.1 Performance

Our vertex shader is not optimized for performance in any way; for example, the two rows of teeth in our model consist of 50k polygons. They have no blend shapes, but the blend shape calculations in the vertex shader are applied to them anyway for a simpler code structure. Our only performance consideration for the GPU calculations has been to make sure that the anti-aliasing level does not make the framerate drop below 60 fps. On the CPU, faceAPI takes a lot of performance, bringing our development machines with a Core 2 Duo processor to their limit. This became a non-issue on the exhibition computer, which has a more modern Core i7 quad core processor.

6.2 SSAO

Figure 6.1: Comparison between SSAO on (left) and SSAO off (right).

As can be seen in figure 6.1, the visual result is not that good. The image is taken from the exhibition computer, which runs an ATI Radeon 5770 graphics card. On our development machines, running Nvidia 8800 GTS graphics cards, the results were still not good enough but do not look quite as bad. There were also a lot of problems with the framebuffer extension on the ATI card; things that worked on the Nvidia card did not work at all on the ATI card. When using framebuffers we also do not benefit from the anti-aliasing applied to our OpenGL window, since we missed the OpenGL extension that handles anti-aliasing for framebuffer rendering. Our solution was to supersample the texture ourselves by rendering to a texture larger than the window resolution and then scaling it down when displaying it.

6.3 Usage Example

Figure 6.2: Usage examples. (a) Tracked neutral expression. (b) Keyframed expression. (c) Keyframed expression. (d) Tracked sceptical expression.

Figure 6.3: Usage examples. (a) Tracked smiling expression with faceAPI output frame and wire frame. (b) Tracked open mouth with faceAPI output frame and wire frame. (c) Tracked head rotations with faceAPI output frame and wire frame. (d) Tracked closed eyes with faceAPI output frame and wire frame.

6.4 The installation

Figure 6.4: The installation seen from the front.

Figure 6.5: The installation seen from the back.

Chapter 7. Conclusions

During the work we have had a couple of decision points where we had to decide on the direction of the work. The two important ones were choosing a model and choosing a tracking software.

7.1 OpenCV

We found that OpenCV is very good at what it does, but the nature of the C language makes it hard to use and extremely hard to debug. However, OpenCV also has a Python interface which looks a lot easier to use; it would be interesting to see what kind of performance hit replacing our C OpenCV code with the same functionality in Python would have.

7.2 Blink detection

Latency is often a bad thing, but for our blink detection it is actually beneficial since it allows the user to see the virtual character blink. Even though we simplified the algorithms discussed by Heishman et al. [33] and Divjak et al. [31], we are happy with the results. Our goal from the start was to keep track of the open and closed state of the eye, but that turned out not to be feasible with our setup. Besides the flow calculation algorithms, we looked into whether it was possible to filter out the white part of the eye and determine the state of the eye depending on how much "white" we found. This idea was rejected quite early because of the low resolution of our webcam images, and because the amount of "white" varied hugely between just the two of us.

7.3 Tracking and animation separation

As we said in the animation chapter, the goal was to separate the animation and the tracking, and we think we succeeded in that part. Our solution is that they run in separate threads within a single program and communicate via abstract interfaces, making it easy to add other classes that can communicate via the same interfaces.

Of course the separation could be taken to the extreme, where they run in separate processes and communicate via, for example, TCP/IP, but that would be completely overkill for our application.

7.4 Future work

7.4.1 Gaze tracking

It would have been neat if the eyes of the model followed the user's eyes. We had no time to look further into this, but we still think that our solution, with the model always looking forward, is good enough for this installation. If the user looks at the screen then the eyes are looking in the right direction; if the user looks in any other direction he or she will not notice if the eyes do not follow perfectly.

Appendix A. faceAPI landmark standard

Figure A.1: faceAPI landmark standard.

Appendix B. User manual

B.1 Command line arguments

--help          Produces this help message.
--windowmode    Force window mode.
--fullscreen    Force fullscreen mode.
--fsaa arg      Set FSAA level. Default value is 4.
--width arg     Set horizontal resolution.
--height arg    Set vertical resolution.

Fullscreen mode will always override window mode regardless of the ordering of the arguments.
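For example, a typical invocation for a fullscreen exhibition setup could look like the line below; the executable name "magicmirror" is hypothetical, since the actual binary name is not given here.

magicmirror --fullscreen --fsaa 4 --width 1920 --height 1080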

B.2 Available Shapes

We have in total 28 different shapes.

• Left brow down.
• Right brow down.
• Left brow up.
• Right brow up.
• Left eye closed.
• Right eye closed.
• Open mouth.
• Angry pucker with the mouth.
• Full smile.
• Look right.
• Look left.
• Look up.
• Look down.
• Left temple flex.
• Right temple flex.
• Left eye squint.
• Right eye squint.
• Left eye wide.
• Right eye wide.
• Shift jaw left.
• Shift jaw right.
• Crinkle nose.
• Upper lip up.
• Under lip up.
• Right side smile.
• Left side smile.
• Left brow middle down.
• Right brow middle down.

B.3 File formats

B.3.1 Config

struct ConfigFileHeader
{
    int shapes;
    int files;
};

struct FileInfo
{
    char name[40];
};

B.3.2 Models

The binary model files consist of the header seen below, followed by float arrays with the data. The length of these arrays can be calculated using data in the header, and the order of the arrays is: vertices, normals, texture coordinates, weights, vertex deltas and normal deltas.

struct MeshFileHeader
{
    char name[50];
    int numOfShapes;
    int vertexCount;
    unsigned int meshType;
    char textureName[100];
    float gloss;
    float Ka[3];
    float Kd[3];
    float Ks[3];
};

B.3.3 Animation

Our animation file format is based on the .OBJ file format. Since we already had a reader for that, we could reuse some of its logic to read our animation files. This is an example from the actual idle animation that is running in the exhibition:

newkf
time 0
rotation 0 0 0
newkf
time 1.4
rotation 0 5 10
shape 2 0.3
shape 3 0.2
shape 4 1.4
shape 5 1.4
shape 6 1
shape 19 0.3
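To illustrate how this keyword-per-line format could be read, here is a minimal C++ parsing sketch. It is an illustration under our own assumptions (a simple keyframe struct, no error handling), not the reader used by the application.

#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct KeyFrame {
    float time = 0.0f;
    float rotation[3] = {0.0f, 0.0f, 0.0f};
    std::map<int, float> shapeWeights;   // shape index -> weight
};

// Read an animation file where every line starts with a keyword:
// "newkf" opens a new key frame, "time", "rotation" and "shape" fill it in.
std::vector<KeyFrame> loadAnimation(const std::string& path)
{
    std::vector<KeyFrame> frames;
    std::ifstream file(path);
    std::string line;
    while (std::getline(file, line)) {
        std::istringstream in(line);
        std::string keyword;
        in >> keyword;
        if (keyword == "newkf") {
            frames.push_back(KeyFrame{});
        } else if (!frames.empty() && keyword == "time") {
            in >> frames.back().time;
        } else if (!frames.empty() && keyword == "rotation") {
            in >> frames.back().rotation[0] >> frames.back().rotation[1] >> frames.back().rotation[2];
        } else if (!frames.empty() && keyword == "shape") {
            int index; float weight;
            in >> index >> weight;
            frames.back().shapeWeights[index] = weight;
        }
    }
    return frames;
}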
B.3.3 Animation

Our animation file format is based on the .OBJ file format. Since we already had a reader for that, we could reuse some of its logic to read our animation files. This is an example from the actual idle animation running in the exhibition:

newkf
time 0
rotation 0 0 0
newkf
time 1.4
rotation 0 5 10
shape 2 0.3
shape 3 0.2
shape 4 1.4
shape 5 1.4
shape 6 1
shape 19 0.3
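A minimal sketch of a line-based parser for this format is shown below. It assumes one keyword per line, as in the OBJ format and the example above; the keyframe structure and names are illustrative and not the actual reader used in the application.

// Minimal sketch of a parser for the OBJ-style animation format, assuming
// one keyword per line as in the example above. Not the actual reader.
#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Keyframe
{
    Keyframe() : time(0.0f) { rotation[0] = rotation[1] = rotation[2] = 0.0f; }
    float time;
    float rotation[3];
    std::map<int, float> shapeWeights;   // shape index -> weight
};

std::vector<Keyframe> loadAnimation(const char* path)
{
    std::vector<Keyframe> keyframes;
    std::ifstream in(path);
    std::string line;

    while (std::getline(in, line))
    {
        std::istringstream ss(line);
        std::string keyword;
        if (!(ss >> keyword))
            continue;                    // skip empty lines

        if (keyword == "newkf")
        {
            keyframes.push_back(Keyframe());
        }
        else if (keyframes.empty())
        {
            continue;                    // ignore data before the first newkf
        }
        else if (keyword == "time")
        {
            ss >> keyframes.back().time;
        }
        else if (keyword == "rotation")
        {
            float* r = keyframes.back().rotation;
            ss >> r[0] >> r[1] >> r[2];
        }
        else if (keyword == "shape")
        {
            int index;
            float weight;
            if (ss >> index >> weight)
                keyframes.back().shapeWeights[index] = weight;
        }
    }
    return keyframes;
}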
Bibliography

[1] Lockheed Martin - Human Immersive Lab. http://www.lockheedmartin.com/aeronautics/labs/human_immersive.html, visited 7/8 2010.
[2] L. Mundermann, S. Corazza, and T.P. Andriacchi. Accurately measuring human movement using articulated ICP with soft-joint constraints and a repository of articulated models. In Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, pages 1–6, 2007.
[3] Remington Scott. Sparking life: notes on the performance capture sessions for the lord of the rings: the two towers. SIGGRAPH Comput. Graph., 37(4):17–21, 2003.
[4] Midori Kitagawa and Brian Windsor. MoCap for Artists: Workflow and Techniques for Motion Capture. Focal Press, 2008.
[5] Gordon Cameron, Andre Bustanoby, Ken Cope, Steph Greenberg, Craig Hayes, and Olivier Ozoux. Motion capture and cg character animation (panel). In SIGGRAPH '97: Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 442–445, New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co.
[6] Sharon Waxman. Computers join actors in hybrids on screen. http://www.nytimes.com/2007/01/09/movies/09came.html, visited 7/8 2010, January 2007.
[7] Max Fleischer. Method of producing moving-picture cartoons. U.S. patent 1242674, 1915.
[8] Michael Leventon and W. Freeman. Bayesian estimation of 3-d human motion from an image sequence. Technical report, 1998.
[9] K. Onishi, T. Takiguchi, and Y. Ariki. 3d human posture estimation using the hog features from monocular image. pages 1–4, Dec. 2008.
[10] Ryuzo Okada and Björn Stenger. A single camera motion capture system for human-computer interaction. IEICE - Trans. Inf. Syst., E91-D(7):1855–1862, 2008.

[11] Cristian Sminchisescu. 3d human motion analysis in monocular video techniques and challenges. In AVSS '06: Proceedings of the IEEE International Conference on Video and Signal Based Surveillance, page 76, Washington, DC, USA, 2006. IEEE Computer Society.
[12] Jari Hannuksela, Janne Heikkilä, and Matti Pietikäinen. A real-time facial feature based head tracker. In Advanced Concepts for Intelligent Vision Systems, Brussels, pages 267–272, 2004.
[13] Jingying Chen and B. Tiddeman. A robust facial feature tracking system. pages 445–449, Sep. 2005.
[14] Jong-Gook Ko, Kyung-Nam Kim, and R.S. Ramakrishna. Facial feature tracking for eye-head controlled human computer interface. volume 1, pages 72–75, 1999.
[15] Nils Ingemars. A feature based face tracker using extended kalman filtering, 2007.
[16] Taro Goto, Marc Escher, Christian Zanardi, and Nadia Magnenat-Thalmann. Mpeg-4 based animation with face feature tracking. In Proc. Eurographics Workshop on Computer Animation and Simulation '99, pages 89–98. Springer, 1999.
[17] Marian Stewart Bartlett, Gwen Littlewort, Mark Frank, Claudia Lainscsek, Ian Fasel, and Javier Movellan. Recognizing facial expression: Machine learning and application to spontaneous behavior, 2005.
[18] Tommaso Gritti. Toward fully automated face pose estimation. In IMCE '09: Proceedings of the 1st international workshop on Interactive multimedia for consumer electronics, pages 79–88, New York, NY, USA, 2009. ACM.
[19] Iain Matthews and Simon Baker. Active appearance models revisited. International Journal of Computer Vision, 60:135–164, 2003.
[20] Kyungnam Kim. Face recognition using principle component analysis, 2003.
[21] B. Theobald, I. Matthews, S. Boker, and J. F. Cohn. Real-time expression cloning using appearance models.
[22] Daniel F. DeMenthon and Larry S. Davis. Model-based object pose in 25 lines of code. International Journal of Computer Vision, 15:123–141, 1995.
[23] Soumya Hamlaoui and Franck Davoine. Facial action tracking using particle filters and active appearance models. In sOc-EUSAI '05: Proceedings of the 2005 joint conference on Smart objects and ambient intelligence, pages 165–169, New York, NY, USA, 2005. ACM.

[24] Ralph Gross, Iain Matthews, and Simon Baker. Generic vs. person specific active appearance models. Image Vision Comput., 23(12):1080–1093, 2005.
[25] Jörgen Ahlberg. Candide-3 - an updated parameterised face. Technical report, 2001.
[26] Mauricio Radovan and Laurette Pretorius. Facial animation in a nutshell: past, present and future. In Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries, SAICSIT '06, pages 71–79, Republic of South Africa, 2006. South African Institute for Computer Scientists and Information Technologists.
[27] Pushkar Joshi, Wen C. Tien, Mathieu Desbrun, and Frederic Pighin. Learning controls for blend shape based realistic facial animation. In SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, page 17, New York, NY, USA, 2006. ACM.
[28] Frederick I. Parke. Computer generated animation of faces. In ACM '72: Proceedings of the ACM annual conference, pages 451–457, New York, NY, USA, 1972. ACM.
[29] Frederic Ira Parke. A parametric model for human faces. PhD thesis, 1974.
[30] Hubert Nguyen. GPU Gems 3. Addison-Wesley Professional, first edition, 2007.
[31] Matjaz Divjak and Horst Bischof. Real-time video-based eye blink analysis for detection of low blink-rate during computer use. Technical report, 2008.
[32] Michael Chau and Margrit Betke. Real time eye tracking and blink detection with usb cameras. Technical report, 2005.
[33] Ric Heishman and Zoran Duric. Using image flow to detect eye blinks in color videos. In WACV '07: Proceedings of the Eighth IEEE Workshop on Applications of Computer Vision, page 52, Washington, DC, USA, 2007. IEEE Computer Society.
[34] Gary Rost Bradski and Adrian Kaehler. Learning OpenCV, 1st edition. O'Reilly Media, Inc., 2008.
[35] Ken Perlin. Improving noise. In SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 681–682, New York, NY, USA, 2002. ACM.
[36] Stefan Gustavsson. Noise. http://webstaff.itn.liu.se/~stegu/aqsis/aqsis-newnoise/.
[37] Ssao. http://www.gamerendering.com/2009/01/14/ssao/, January 2009.
