Per-actor Based Optimization for Semantic-preserving Facial Rig Generation Using Sample Data

LiU-ITN-TEK-A--21/030--SE

Per-actor Based Optimization for Semantic-preserving Facial Rig Generation Using Sample Data

Josefine Klintberg

2021-06-16

Department of Science and Technology, Linköping University, SE-601 74 Norrköping, Sweden


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, to print out single copies for personal use, and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/

© Josefine Klintberg

Linköping University | Department of Science and Technology
Master's thesis, 30 ECTS | Media Technology | 2021 | LiU-ITN-TEK-A--21/030--SE

Per-actor based optimization for semantic-preserving facial rig generation using sample data

Swedish title: Individuell optimering för semantik-bevarande generation av ansiktsriggar med hjälp av träningsdata

Josefine Klintberg

Supervisor: Apostolia Tsirikoglou
Examiner: Jonas Unger
External supervisors: Dan Englesson, Rasmus Haapaoja & Nils Lerin

Linköpings universitet, SE-581 83 Linköping, +46 13 28 10 00, www.liu.se


Abstract

The creation of high-resolution digital humans and computer generated characters occurs frequently within film production. During the design process of digital humans, special attention is given to the character's face, since it allows for much of the emotional expressiveness, and a facial rig that is to be used within a film production can be built up by hundreds of expressions. The creation of the rig is a time consuming task for artists that can be automated in order to generate personalized facial rigs adapted towards sample data. Emphasizing the need to combine recent research and technology on automatic facial rig generation with the artistic aspects and the usage of digital humans within film production pipelines, this thesis project presents a scalable blendshape optimization framework that is adapted to fit within a VFX-pipeline, provides stability for various kinds of usage and makes the workflow of creating facial rigs more efficient. The framework successfully generates per-actor based facial rigs adapted towards sample data while ensuring that the semantics of the input rig are kept in the process. With its core in a reusable generic model, gradient based deformations, user-driven regularization terms, rigid alignment, and the possibility to split blendshapes into symmetrical halves, the proposed framework provides a stable algorithm that can be applied to any target blendshape. The proposed framework serves as a source for investigating and evaluating parameters and solutions related to automatic facial rig generation and optimization.

Acknowledgments

I would like to direct great appreciation to all the people at Goodbye Kansas for letting me perform my thesis project at the company; it has truly been a great experience. I would also like to thank my examiner and my supervisor at Linköping University for their support throughout the project, and my supervisors at Goodbye Kansas, who kept pushing me and cheering me on and who were always there for a meeting or to discuss some results. I would also like to thank the entire pipeline department at Goodbye Kansas for all the fikas, the game nights and for creating an open environment that inspires dedicated work.

Additionally, I would like to direct thanks to my sister Therese, the one who is always there to keep me on track, and to my family for being the rock to lean on when I need it. To my nephew Nils, who came along to join this party at the very end of the project. To my friends and the dedicated teachers I have met during my MSc studies in Media Technology. To my friends from my previous studies in Engineering Physics, who encouraged me to follow my dreams and pursue my interest in computer graphics; this thesis would not have happened if it were not for you. To Karin, who was here for the beginning, but not the end, you will never be forgotten.

Dictionary

The following terms and abbreviations are of importance for this thesis report. Each term is written in full at its first mention; the abbreviation is used for further mentions.

BFGS – Broyden-Fletcher-Goldfarb-Shanno algorithm; an iterative method for solving unconstrained, non-linear optimization problems.
FACS – Facial Action Coding System; a manual for separating facial expressions according to muscle movements.
CGC – Computer Generated Character.
CGI – Computer Generated Imagery.
HMC – Head Mounted Camera.
L-BFGS-B – Limited-memory BFGS algorithm extended to handle simple bounds.
LU decomposition – Lower-Upper decomposition.
MSE – Mean Squared Error.
obj – Object file; an ASCII file format for describing 3D objects.
Quad mesh – Quadrilateral mesh; a mesh consisting of faces built up by four vertices.
SVD – Singular Value Decomposition.
VFX – Visual effects.

Typographical Conventions

Vector – v
Matrix – M
Code – example_function()
Proper names and software – ExampleName
Abbreviations – stated in full at first mention and defined in the dictionary.

Contents

Abstract
Acknowledgments
Dictionary
Typographical Conventions
Contents
List of Figures
List of Tables
List of Algorithms

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Aim
  1.4 Research questions
  1.5 Delimitations

2 Theory
  2.1 VFX-pipeline: related concepts
    2.1.1 Facial rigs
    2.1.2 3D software
    2.1.3 Motion capture
    2.1.4 Facial scans
    2.1.5 Facial Action Coding System
  2.2 Blendshapes
    2.2.1 Blendshape representation
    2.2.2 Principal components
  2.3 Mesh processing
    2.3.1 Shape matching
    2.3.2 Deformation transfer
    2.3.3 Displacement fields
  2.4 Optimization theory
    2.4.1 Regularization
  2.5 Related work
    2.5.1 Blendshape generation and optimization
    2.5.2 Machine learning approach
    2.5.3 Numerical approaches

3 Method
  3.1 Implementation
    3.1.1 Preprocessing
    3.1.2 Input parsing
    3.1.3 Deformation transfer
    3.1.4 Optimization
    3.1.5 Utility functions
    3.1.6 Dependencies
  3.2 Evaluation

4 Results
  4.1 Pre-study
  4.2 Implementation
    4.2.1 Deformation transfer
    4.2.2 Estimation of blendshape weights
    4.2.3 Optimization and regularization
    4.2.4 Combo-shapes
  4.3 Evaluation
    4.3.1 Deformation transfer
    4.3.2 Blendshape weights estimation
    4.3.3 Optimization algorithm
    4.3.4 Manual labour time

5 Discussion
  5.1 Results
    5.1.1 Optimization algorithm
    5.1.2 Estimation of blendshape weights
    5.1.3 Reproduction of sample data
    5.1.4 Combo-shapes
    5.1.5 Time and efficiency
    5.1.6 Deformation transfer
  5.2 Method
  5.3 The work in a wider context

6 Conclusion
  6.1 Research questions
  6.2 Future work

Bibliography

A Appendix A
  A.1 Base-shapes

List of Figures

2.1 An overview of the light stage for performing facial scans at Goodbye Kansas
2.2 A schematic view of the delta blendshape formulation
2.3 PCA of a randomly generated data set of two dimensions
2.4 An overview of deformation transfer from a source mesh to a target mesh
2.5 The linear system for performing deformation transfer in the case of a source and target mesh of different topology
2.6 The linear system for performing deformation transfer in the case of a source and target mesh of consistent topology
2.7 A comparison of using displacement vectors compared to deformation transfer for transferring a deformation of a line

3.1 A conceptual overview of the blendshape optimization framework
3.2 A flowchart providing an overview of the implemented framework
3.3 Masks generated in Maya by explicitly defining stationary vertices
3.4 An overview of the matrix used to perform blendshape optimization
3.5 A visualization of how the 3×3 matrices were flattened into 1×9 vectors
3.6 A visualization of the matrix B and B^T B for a face k in the mesh

4.1 The result of performing deformation transfer from a generic facial rig to a target facial rig of the same topology
4.2 A visual result of the initial estimation of the blendshape weights used to reproduce the training poses
4.3 A visual result of the optimization workflow for a blendshape
4.4 A visual result of a number of blendshapes after the estimation stage and the corresponding blendshapes after the refinement stage
4.5 The resulting reproduction of scanned data after the blendshape optimization
4.6 A result of how well the scan happy can be reproduced using the blendshape model
4.7 A result of how well the scan sad can be reproduced using the blendshape model
4.8 A number of combo-shapes output by the blendshape optimization framework
4.9 A comparison of deformation transfer compared to displacement vectors
4.10 A closeup comparison of deformation gradients compared to displacement vectors
4.11 A visualization of the influence of the stationary weights w_s
4.12 An image displaying a self-intersection issue when using deformation transfer
4.13 A visualization of the non-zero structure of A^T A
4.14 A plot of processing time versus the number of faces in the target and source meshes for performing deformation transfer
4.15 A plot of the total processing time versus the number of blendshapes used in the optimization process
4.16 A visualization of the affected areas in the optimized blendshapes
4.17 Influence of the value of β in the blendshape optimizer
4.18 A visualization of how the optimized blendshape lipPucker performs in comparison to a modelled blendshape for the same target character

4.19 A visualization of how the optimized blendshape lipCornerPuller performs in comparison to a modelled blendshape for the same target character
4.20 The influence of altering the value of β when running the optimization framework for the scanned expression happy
4.21 A reconstruction of the scan scream with no rigid alignment
4.22 A reconstruction of the scan scream with rigid alignment
4.23 The result of reproducing a complex and asymmetrical training pose
4.24 The result of reproducing a complex and asymmetrical training pose using the method of splitting blendshapes in halves
4.25 A visual comparison between a number of combo-shapes generated by the blendshape optimization framework and the corresponding manually modelled combo-shapes for the same target character

List of Tables

4.1 Mesh data for the facial meshes used for performance measurements, consisting of four different resolutions of the same model
4.2 Blendshape data used for obtaining the results
4.3 Time measurements for performing deformation transfer for one single blendshape
4.4 The influence of λ_s in the initial blendshape weights estimation
4.5 Time measurements for the optimization framework
4.6 Time measurements for the optimization framework for the high-resolution mesh L0 with a varying number of blendshapes used in the optimization
4.7 Manual labour time spent on creating facial rigs

List of Algorithms

1 Blendshape optimization
2 Deformation transfer
3 Shape matching
4 Finding the optimal triangle gradients

1 Introduction

The creation of high-resolution digital humans and computer generated characters (CGCs) is a topic associated with many challenges. Humans are very good at spotting irregularities, and if the visual result of a CGC is not sufficient, it can give a rather unpleasant impression for the viewer, an effect often referred to as the Uncanny Valley [1]. In order to achieve credible digital humans, a combination of technology, design and human perception comes together.

Digital humans occur frequently within film production. The CGCs are often created to mimic an existing actor and are required to reproduce a wide range of the actor's unique appearances and expressions. During the design process of CGCs, special attention is given to the character's face, since it allows for much of the emotional expressiveness. Typical digital character face development includes the creation of facial rigs consisting of a large number of facial expressions, and a facial rig that is to be used within a film production can be built up by hundreds of expressions. The creation of the rig is a time consuming task for artists, where the facial expressions are either sculpted manually per shape, or tweaked by hand from the target rig to be similar to sample data such as performance capture. This time consuming process can be automated in order to generate personalized facial rigs that can be adapted towards sample data.

1.1 Background

At the VFX studio Goodbye Kansas, many CGCs are created for TV shows, films and commercials, and the creation of the facial rigs tends to be a very time consuming process. With an automated framework for personalized facial rig generation, a production ready facial rig could be created more efficiently and require less manual work. This framework should be designed to fit into the existing pipeline for creating production ready facial rigs.

1.2 Motivation

The motivation for this thesis project is to use research findings within the topic of digital humans in order to implement and evaluate an automatic workflow that generates individualized facial rigs from sample data. In the process of the thesis project, the following challenges need to be addressed:

• How to fit such an automatic framework into an existing VFX-pipeline.
• How to enable the framework to use a range of sample data such as scanned geometry, hand sculpted poses or markers from video footage.
• How to ensure that the semantics of the template rig are maintained when fitting the blendshapes in the rig to the sample data.
• How to ensure that the framework can handle and separate mixed expressions, as is often the case with performance capture data.

1.3 Aim

The aim of this thesis project is to investigate how to generate a personalized facial rig with the use of a generic facial rig that is adapted towards target sample data while the semantics of the generic rig remain. Subsequently, this thesis project aims at investigating how such a workflow influences the visual result, the correctness and the processing time.

1.4 Research questions

This thesis project seeks to answer the following research questions:

1. What are the important factors to consider when implementing and designing a framework that generates per-actor based facial rigs using a generic blendshape model and is adapted to fit within an existing VFX-pipeline?
2. What implementations used in the framework will have considerable influence on the visual result, correctness and processing time for the generated facial rig?
3. How can the framework ensure that the semantics of the generic facial rig are maintained?
4. What solutions can be made to handle the case of mixed expressions in scanned data?

1.5 Delimitations

This thesis project is focused on the creation of digital humans for usage within movie production. As highlighted by Li et al. [2], there exists a large gap between high-end and low-end productions of digital humans, and since this thesis project is performed with a focus on the film industry, only a high-end usage application will be considered. Due to the COVID-19 pandemic, the method will be designed with a starting point in remote work, which includes additional time for separate steps in the implementation, since unexpected time might be required to learn internal systems at the company and to set up for remote work.

2 Theory

This chapter provides an overview of the theory behind this thesis project and covers basic concepts used within a VFX-pipeline, mesh modifications and deformations, optimization theory, and some insights into related literature and research.

2.1 VFX-pipeline: related concepts

This section covers some core aspects of a VFX-pipeline that are important for understanding this thesis project and relate to the creation of facial rigs.

2.1.1 Facial rigs

The face of a human is a complex area where many of the human expressions and emotions are shown. With many micro-structures in the skin and movements of muscles, the simple act of smiling involves deformations around the face, such as wrinkles around the eyes or dimples showing up in the cheeks. This is not only a complex relationship for every face individually, but also varies from person to person. The face of a CGC is represented by a facial rig, which essentially is a mesh of the face of a character with integrated control points. The control points allow an animator to steer the deformation of the face and thus create different facial expressions [3]. The creation of a production facial rig to be used in a high quality application, such as film production, requires a lot of design, manual tweaking and care over how the expressions of the CGC should vary. For high-end usage it is also essential to provide a very detailed and realistic, and therefore highly complex, facial rig [4].

2.1.2 3D software

A common way to create a facial rig for a CGC is for artists to sculpt the facial poses in a 3D software package such as Maya [5]. 3D software packages offer a range of tools that allow for deformation and control of a mesh. It is therefore possible to create very specific expressions in isolation, but it is a very time consuming process [4].

2.1.3 Motion capture

Motion capture is a method that was first introduced with the aim of speeding up the animation process, and it is often used to animate humans. In motion capture, sensor data from a moving actor are captured using cameras. These data describe the motion of the actor and can be transferred onto a CGC in a CGI environment. Related to the capturing of faces, there exist specific capture sessions utilizing a Head Mounted Camera (HMC) [6].

2.1.4 Facial scans

Another option to generate a CGC is through scanning systems. Instead of having a single camera on set in the motion capture studio, the actor is put in the middle of a spherical setting, a light stage, with many cameras that capture the face or body of the actor from a range of angles and with different light settings. An overview of a light stage can be seen in Figure 2.1. The result of a light stage capture, such as the one at Goodbye Kansas, is a high-resolution mesh, as well as additional data such as normals, specular maps and gradients, which can be used for producing high-resolution skin textures.

Figure 2.1: Overview of the light stage for performing facial scans at Goodbye Kansas. (Image retrieved with permission from Goodbye Kansas, https://goodbyekansasgroup.com/.)

2.1.5 Facial Action Coding System

The Facial Action Coding System (FACS) is a manual that was first published in 1978 by Ekman and Friesen and later revised in 2002 by Ekman, Friesen and Hager [7], in which facial expressions are divided and separated according to muscle movements. The utilization of FACS expressions can be found both in application and in research within the film and VFX industry [7]. However, it is important to note that capturing FACS expressions may not be perfect, since it is difficult for an actor to perform a coded expression in isolation during a motion capture session or during facial scanning. Movements of particular parts of the face usually engage more facial muscles than desired. For example, it is hard to perform "noseWrinkler" (a FACS expression) without engaging the forehead.

2.2 Blendshapes

It is possible to express a facial model in a number of different ways, such as using a parametric approach or through a physically based description. A common description within the computer graphics industry is the usage of blendshapes [3].

2.2.1 Blendshape representation

A blendshape is essentially a facial mesh for a specific facial expression, such as a FACS expression, and a number of blendshapes are used to build up a facial rig. The blendshapes can be expressed mathematically and provide a way to generate a wide range of facial expressions through linear combinations of different blendshapes and weights. By simply varying the weights, additional combinations and complex expressions can be generated [8]. Since this approach is a form of morphing one shape onto another, the basis vectors in the linear combination are referred to as blendshape targets or morph targets, and the corresponding weights as sliders or blending weights [3]. Using this mathematical description, it is possible to express a blendshape model consisting of n + 1 blendshapes according to Equation (2.1), where f is the resulting face, w_k are the blendshape weights and b_k are the blendshapes [3].

f = \sum_{k=0}^{n} w_k b_k \quad (2.1)

Blendshape weights are typically constrained between 0.0 and 1.0, where a value of 0.0 means that the specific blendshape has no effect on the base-shape and a value of 1.0 means that the specific blendshape is added with full effect to the base-shape [9]. In 3D software such as Maya it is possible to overshoot the values outside of this range for artistic purposes [5]. With the definition of a neutral blendshape, b_0, that describes the target's resting pose, it is natural to describe the blendshapes b_k in the facial rig as offsets from the neutral blendshape. This means that Equation (2.1) can be formulated according to Equation (2.2), also known as the local or delta blendshape notation [3].

f = b_0 + \sum_{k=1}^{n} w_k (b_k - b_0) \quad (2.2)

The corresponding formulation using matrix notation can be seen in Equation (2.3), where B is the blendshape basis as defined in Equation (2.2) [3].

f = b_0 + Bw \quad (2.3)
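To make Equations (2.1)-(2.3) concrete, the following is a minimal sketch (toy data and illustrative names, not code from the thesis) of evaluating the delta blendshape model with NumPy:

```python
import numpy as np

# Toy example of the delta blendshape formulation, Equation (2.3):
# f = b0 + B w, with every shape stored as a flattened (3 * n_vertices,) array.
n_vertices = 4
rng = np.random.default_rng(0)

b0 = rng.standard_normal(3 * n_vertices)          # neutral pose
targets = [b0 + 0.1 * rng.standard_normal(3 * n_vertices) for _ in range(3)]

# Blendshape basis: one column per target, holding the offset from the neutral.
B = np.column_stack([bk - b0 for bk in targets])  # shape (3*n_vertices, n_targets)

w = np.array([0.5, 0.0, 1.0])                     # slider values in [0, 1]
f = b0 + B @ w                                    # blended face, Equation (2.3)
print(f.reshape(n_vertices, 3))
```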

As explained by Lewis et al. [3], the delta formulation of blendshapes can be visualized in a schematic way, with the neutral blendshape at the origin and the blendshape targets situated at the vertices of an N-dimensional hypercube, which can be seen in Figure 2.2. This provides a meaningful way to think of the combination of blendshapes: increasing a blendshape weight, such as w_1, "moves" the neutral pose along the edge towards blendshape b_1 and can result in a number of interim shapes along the way [3]. There exist many benefits of using blendshapes, such as the simple mathematical description and the possibility to perform morphing between shapes, which can be used for animation purposes [10]. The linear formulation and the intuitive interpretation are also helpful for artists [3].

Figure 2.2: A schematic view of the delta blendshape formulation according to [3].

Despite these positive aspects, the usage of blendshapes also comes with some drawbacks and notable characteristics. Even though the linear formulation of blendshapes is easy to interpret, the basis vectors in this model are not orthogonal or independent, which can result in complex problem formulations when handling topics such as blendshape generation [3].

2.2.2 Principal components

Blendshapes can also be formulated using a different technique, known as principal components. Principal component analysis (PCA) is a technique where a problem with many variables is projected onto a subspace to reduce the dimension [3]. PCA often provides a better visualization of data and the possibility to explore variance in the data [3]. An example of PCA, where a randomly generated two-dimensional data set is visualized, can be seen in Figure 2.3. Here, the eigenvectors have been calculated using singular value decomposition (SVD), scaled using the square root of the corresponding eigenvalue, and translated to have their origin at the mean value, which provides a way to investigate the principal axes and the variance of the data.
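The construction behind Figure 2.3 can be reproduced in a few lines; the following sketch (an illustration, not the thesis code) computes the principal axes of a 2D point cloud with NumPy's SVD and scales each axis by the square root of its eigenvalue:

```python
import numpy as np

# Random 2D data set with correlated dimensions, as in Figure 2.3.
rng = np.random.default_rng(1)
points = rng.standard_normal((200, 2)) @ np.array([[2.0, 0.7], [0.0, 0.5]])

mean = points.mean(axis=0)
centered = points - mean

# SVD of the centered data; the rows of Vt are the principal directions.
_, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)

# The covariance eigenvalues relate to the singular values as
# lambda_i = s_i^2 / (n - 1); scale each axis by sqrt(lambda_i).
eigenvalues = singular_values**2 / (len(points) - 1)
for direction, lam in zip(Vt, eigenvalues):
    axis = direction * np.sqrt(lam)  # principal axis, anchored at the mean
    print("axis from", mean, "to", mean + axis)
```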

Figure 2.3: An example of PCA with a randomly generated data set of two dimensions. The eigenvectors are visualized in black, scaled using the square root of the corresponding eigenvalue and translated to have their origin at the mean value. Visualized using Matplotlib [11] and Scikit-learn [12].

PCA can be used for N-dimensional analysis and is suitable for blendshape, or mesh, data processing, since the vertex coordinates can be represented as a 3D point cloud. There exists a simple transformation between the blendshape representation and a PCA representation, which can be useful in certain applications. The conversion between a blendshape and the PCA representation can be done according to

Bw + f_0 = Uc + e_0, \quad (2.4)

where U and c are the PCA eigenvectors and coefficients respectively, f_0 is the neutral face and e_0 is the mean face. Using the PCA representation for blendshapes allows for formulating the blendshapes as an orthogonal basis, but it also means that some of the easy interpretability is lost [3].

2.3 Mesh processing

This section covers mesh processing techniques that are important for this thesis, such as shape matching and mesh deformation approaches, where the focus is on deformation transfer as proposed by Sumner and Popović [13] and deformation using displacement fields.

2.3.1 Shape matching

When handling different meshes and comparing them to each other, it is often required to match them to the same origin and rotation. This is especially the case when looking at blendshape optimization towards sample data. The performance capture or the scanned data can differ significantly in rotation or translation compared to the generic blendshape model, which can further result in worse performance when estimating the blendshape weights. Shape matching can be done in a number of ways, where a common method uses SVD and a least-squares formulation of the problem, as proposed by Arun et al. [14]. The problem consists of finding the optimal translation t and rotation R that best align the two sets of 3D points, {m_1, ..., m_n} and {s_1, ..., s_n}, according to

\min_{R, t} \sum_{i=1}^{n} \|(R m_i + t) - s_i\|^2, \quad (2.5)

where R is a 3×3 rotation matrix, with the assumption that the rotation is centered around the origin, and t is a 3D translation vector [14].

Given that the solution to the least-squares problem in Equation (2.5) is found as R and t, the point sets {m_1, ..., m_n} and {s_1, ..., s_n} will have the same centroid [14]. From the centroids, the center vectors can be computed for each point set according to Equation (2.6), where \bar{m} and \bar{s} are the centroids of the respective point sets [15].

m_{c_i} = m_i - \bar{m}
s_{c_i} = s_i - \bar{s} \quad (2.6)

From the formulation of center vectors, the process of shape matching can be divided into first solving for the optimal rotation and then, given the rotation, calculating the optimal translation [14]. In order to solve for the optimal rotation R, a correlation matrix is calculated according to Equation (2.7) [15], which can also involve weights if implemented as a weighted least-squares formulation [16].

H = \sum_{i=1}^{n} m_{c_i} s_{c_i}^T \quad (2.7)

Using the SVD of H, given as H = U \Lambda V^T, the optimal rotation matrix R can be found according to Equation (2.8) [16].

R = V U^T \quad (2.8)

From the optimal rotation, the optimal translation can be found according to Equation (2.9), which essentially aligns the centroid of the rotated point set {m_1, ..., m_n} with the centroid of the point set {s_1, ..., s_n} [15].

t = \bar{s} - R\bar{m} \quad (2.9)

2.3.2 Deformation transfer

Modifying a blendshape or facial expression requires deformation of the mesh. This can be done in a number of ways, for example by manipulating the actual vertex positions, or by using deformation in gradient space, tangent space or rig-space. A well used and highly adaptable model for blendshape deformation is gradient guided deformation using deformation transfer, as proposed by Sumner and Popović [13].

Deformation transfer is the process of taking a deformation of a source mesh and transferring this deformation to a target mesh, based on a correspondence between the two meshes. This correspondence can be established manually, by finding correspondence points, or through a one-to-one relationship for meshes with the same topology [13]. The method provides a very useful way to transfer deformations to a target mesh so that it behaves in a similar manner as the source mesh, without having to manually create the deformed meshes for the target. This process can be applied to blendshapes, and an example given by Sumner [17] of deformation transfer for a facial mesh can be seen in Figure 2.4. The figure shows the deformation transfer from source facial scans in the top row to the target facial meshes in the bottom row.
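As an aside, the complete shape matching procedure of Section 2.3.1, Equations (2.5)-(2.9), is compact enough to sketch directly; the following is an illustration in NumPy, not the thesis implementation, and the reflection guard is a standard addition not discussed above:

```python
import numpy as np

def shape_match(m_points, s_points):
    """Find R and t minimizing sum ||(R m_i + t) - s_i||^2, Equation (2.5)."""
    m_bar = m_points.mean(axis=0)                 # centroids
    s_bar = s_points.mean(axis=0)
    m_c = m_points - m_bar                        # center vectors, Equation (2.6)
    s_c = s_points - s_bar

    H = m_c.T @ s_c                               # correlation matrix, Equation (2.7)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T                                # optimal rotation, Equation (2.8)

    if np.linalg.det(R) < 0:                      # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T

    t = s_bar - R @ m_bar                         # optimal translation, Equation (2.9)
    return R, t

# Usage: align a rotated and shifted copy of a point set back onto the original.
rng = np.random.default_rng(2)
s = rng.standard_normal((50, 3))
angle = np.pi / 5
Rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
m = (s - 1.5) @ Rz.T                              # misaligned copy
R, t = shape_match(m, s)
print(np.allclose(m @ R.T + t, s))                # True
```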

Figure 2.4: An overview of deformation transfer from a source mesh, facial scans (top row), to a target facial mesh (bottom row). Image source: [17].

Source deformations

The first step of deformation transfer is to describe the deformation in the source mesh. This is the task of finding the mapping between the undeformed vertices v_i, i ∈ 1...3, in the neutral source mesh and the deformed vertices ṽ_i, i ∈ 1...3. This mapping can be expressed as an affine transformation, but there may exist an infinite number of transformations that all describe the same mapping, which makes it impossible to find a unique solution [13]. In order to ensure that the affine transformation between the original vertex points and the translated vertex points is unique, Sumner and Popović [13] propose a method that introduces a fourth vertex to describe each triangle, given a triangulated mesh. The position of this new vertex is given through an offset in the normal direction of each triangle, according to

v_4 = v_1 + \frac{(v_2 - v_1) \times (v_3 - v_1)}{\sqrt{\|(v_2 - v_1) \times (v_3 - v_1)\|}} = v_1 + \hat{n}. \quad (2.10)

With the notation of the affine transformation as J, described by a 3×3 matrix, and the displacement vector as d, the transformation can be expressed for a specific triangle in the mesh [13] according to

J v_i + d = \tilde{v}_i, \quad i ∈ 1...4. \quad (2.11)

The displacement d in Equation (2.11) can be eliminated by subtracting one of the vertices from the others. Inspecting Equation (2.10), it is evident that subtracting v_1 eliminates it from the computation of the fourth vertex and results in v_4 - v_1 = \hat{n}. With this description it is possible to write the triangle gradient V for each non-deformed triangle in the mesh, and the triangle gradient \tilde{V} for each deformed triangle in the mesh, according to Equation (2.12) [13].

V = [v_2 - v_1 \;\; v_3 - v_1 \;\; v_4 - v_1] = [v_2 - v_1 \;\; v_3 - v_1 \;\; \hat{n}]
\tilde{V} = [\tilde{v}_2 - \tilde{v}_1 \;\; \tilde{v}_3 - \tilde{v}_1 \;\; \tilde{v}_4 - \tilde{v}_1] = [\tilde{v}_2 - \tilde{v}_1 \;\; \tilde{v}_3 - \tilde{v}_1 \;\; \hat{\tilde{n}}] \quad (2.12)
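As a concrete illustration of Equations (2.10)-(2.12), and of the deformation gradient J = \tilde{V} V^{-1} formalized in Equation (2.13) just below, the following sketch (toy data, not the thesis code) computes the gradient of a single sheared triangle:

```python
import numpy as np

def triangle_gradient(v1, v2, v3):
    """Build the 3x3 triangle gradient V = [v2-v1, v3-v1, n_hat], Equation (2.12)."""
    n = np.cross(v2 - v1, v3 - v1)
    n_hat = n / np.sqrt(np.linalg.norm(n))   # fourth-vertex offset, Equation (2.10)
    return np.column_stack([v2 - v1, v3 - v1, n_hat])

# Undeformed and deformed versions of a single triangle.
v = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
v_def = [p + np.array([0.0, 0.0, 0.2 * p[0]]) for p in v]   # simple upward shear

V = triangle_gradient(*v)
V_tilde = triangle_gradient(*v_def)

# Deformation gradient J = V_tilde V^{-1}, Equation (2.13); J maps undeformed
# edge vectors to their deformed counterparts.
J = V_tilde @ np.linalg.inv(V)
print(np.allclose(J @ (v[1] - v[0]), v_def[1] - v_def[0]))  # True
```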

Sumner and Popović [13] further formulate the deformation gradient J according to

J = \tilde{V} V^{-1}, \quad (2.13)

where the introduction of the fourth vertex gives a unique way to describe the transformation from the triangle given by the vertices v_i, i ∈ 1...3, to the triangle given by ṽ_i, i ∈ 1...3.

Correspondence

A relationship between the target and source meshes is required in order to decide which triangles in the meshes should behave in a similar manner. This relationship can be expressed in different ways, and Sumner and Popović [13] describe the correspondence in the form of a list of pairs that relate the vertex indices in the source mesh to those in the target mesh, utilizing an iterative closest point algorithm. This correspondence mapping is defined by Sumner [17] for each triangle s in the source mesh and triangle t in the target mesh according to

M = \{(s_1, t_1), ..., (s_{|M|}, t_{|M|})\}. \quad (2.14)

The mapping for correspondence may be a one-to-many mapping that allows one triangle in the source mesh to correspond to many triangles in the target mesh, or vice versa. This allows for a way to process meshes that have a different number of vertices and triangles, and makes the algorithm very general and applicable in multiple scenarios [17]. The correspondence is computed in three steps: a user controlled step where correspondence points are added, a template fitting algorithm, and a triangle pairing algorithm [17].

Template fitting

The aim of template fitting is to deform the target mesh, referred to as the template in this step, such that its shape matches the source mesh, the destination [17]. Notably, the opposite can also be done, that is, matching the source mesh to the target mesh instead. Sumner [17] points out that, when choosing which of the meshes should be used as the template, it is worth considering which mesh has the most vertices and choosing this as the template. The new vertices for the template mesh are given by solving the minimization problem

\min_{\tilde{v}_1 ... \tilde{v}_n} \; w_S E_S + w_I E_I + w_C E_C,
\text{subject to} \; \tilde{v}_{t_k} = m_k, \quad k ∈ 1...m, \quad (2.15)

where E_S is a smoothness term that ensures that adjacent triangles are equal, E_I is a term that makes sure that the optimization is not drifting away and causing a large difference in shape, and E_C is a closest-valid-point term stating that each vertex position in the template mesh should be set to the closest valid point on the destination mesh. w_S, w_I and w_C are constants, t_k is the template vertex index for marker k and m_k is the position of marker k on the destination mesh [17]. It is pointed out by Sumner and Popović [13] that the template fitting is similar to the aim of the deformation transfer, but they also state that "the objective function is designed to deform one mesh into the other, rather than deforming it like the other deforms" [13, p. 4], which makes the aim of the correspondence clear.

Template fitting is also highlighted by Sumner [17] as a standalone application that can create correspondence between two meshes, with the specific example of matching facial scans to a facial mesh.

Triangle pairing

The triangle pairing step is performed in order to create the correspondence mapping according to Equation (2.14). Sumner [17] describes an approach that builds up a many-to-many mapping, where a new entry is added to M given by the closest distance between a triangle in the template and the destination. The new entry is added only if the angle between the triangle normals is less than 90° and the centroids of the triangles are within a maximum distance of each other. This ensures that two triangles that are near each other but have different orientations are not given the same correspondence [13].

The case of consistent topology

In the case of consistent topology between the source mesh and the target mesh, the correspondence map is simply a one-to-one mapping containing the full vertex list for the source and target meshes. The vertices in the source mesh describe the deformation for the vertices in the target mesh corresponding to the same vertex ids [17].

Transfer

The final step of deformation transfer is to perform the actual transfer of the deformation from the source mesh to the target mesh. This is done by finding the optimal deformation gradients T that minimize the difference to the deformation gradients in the source mesh according to

\min_{\tilde{v}_1 ... \tilde{v}_n} \sum_{j=1}^{n_f} \|S_j - T_j\|_F^2, \quad (2.16)

where S_j are the deformation gradients per face in the source mesh, T_j are the deformation gradients per face in the target mesh, n_f is the number of faces and F denotes the Frobenius norm. Given the optimal deformation gradients, it is possible to find the optimal vertex positions [13].

Solving

In order to find the new vertex positions that minimize the difference between deformation gradients according to Equation (2.16), it is necessary to formulate T in such a way that it is possible to relate the vertex positions ṽ and the deformation gradients of the source mesh [13]. Sumner [17] shows that it is possible to use only three vertices for each triangle for the unknown vertex positions, whereby Equation (2.12) is reduced to the 3×2 matrices

V = [v_2 - v_1 \;\; v_3 - v_1],
\tilde{V} = [\tilde{v}_2 - \tilde{v}_1 \;\; \tilde{v}_3 - \tilde{v}_1]. \quad (2.17)

Using the reduction of complexity according to Equation (2.17), it is possible to use the factorization of T according to Equation (2.18) in order to determine the relationship between the deformation gradients T and the deformed vertex positions ṽ, where j is the current face and Q_{jα} and R_j come from the QR factorization of the triangle gradients of the undeformed target mesh [17], such that V_j = Q_j \begin{bmatrix} R_j \\ 0 \end{bmatrix} = [Q_{jα} | Q_{jβ}] \begin{bmatrix} R_j \\ 0 \end{bmatrix} = Q_{jα} R_j [18].

T_j = \tilde{V}_j R_j^{-1} Q_{jα}^T = [\tilde{v}_2 - \tilde{v}_1 \;\; \tilde{v}_3 - \tilde{v}_1] \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \quad (2.18)

The deformation gradients and the deformed vertices have a linear relationship, which allows for formulating Equation (2.16) as a linear matrix system according to

\min_{\tilde{v}_1 ... \tilde{v}_n} \|c - A\tilde{x}\|_2^2, \quad (2.19)

where c contains the deformation gradients of the source mesh, x̃ holds the new, unknown, vertices for the target and A is a large sparse matrix with entries from the QR factorization of the triangle gradients of the undeformed target mesh [17].

In order to solve for the unknown deformed vertex positions ṽ, Sumner and Popović [13] use the least-squares formulation in Equation (2.19). This approach is used since the matrix A has more rows than columns, which results in an overdetermined problem [17]. By expanding the matrix norm and setting the derivative with respect to x̃ to zero, Equation (2.19) can further be expressed as the normal equations

A^T A \tilde{x} = A^T c, \quad (2.20)

which can be solved numerically using any sparse solver [17]. The matrices in Equation (2.20) look different depending on whether the source and target meshes have the same topology or not. The matrix formulation for the two cases can be seen in Figure 2.5, for the case of meshes with different topology, and in Figure 2.6, for the case of consistent topology. The matrix formulations follow the notation by Sumner [17].

In the derivation of the matrices, it is important to consider that the optimization problem is invariant to translation, and the global translation needs to be removed in order to ensure that the output has the correct translation in space [17]. This is solved by Sumner and Popović [13] by choosing one vertex and explicitly setting the displacement according to Equation (2.11). Sumner [17] explains in more detail that it is also possible to handle the global translation in the forming of the matrix A by eliminating the columns corresponding to stationary vertices, multiplying those columns with their respective constrained values and subtracting them from c.

Deformation transfer serves as a useful tool both for deforming meshes of different resolution and in the case of the same topology, where it performs better than displacement-based transfer due to its ability to capture differential changes [17].
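Since A depends only on the undeformed target mesh, the factorization of A^T A can be computed once and reused for every blendshape. A minimal sketch of solving Equation (2.20) with SciPy's sparse LU factorization follows (the toy A merely stands in for the real deformation transfer matrix):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy stand-in for the system in Equation (2.20): A is tall and sparse,
# with one column per free target vertex, and c stacks the source
# deformation gradients (three columns: x, y, z).
rng = np.random.default_rng(3)
A = sp.random(300, 100, density=0.05, random_state=3, format="csc")
A = (A + sp.eye(300, 100) * 10.0).tocsc()   # keep A^T A well conditioned
c = rng.standard_normal((300, 3))

# Normal equations: (A^T A) x = A^T c. A^T A is square, sparse and
# symmetric positive definite, so it can be factorized once...
solver = spla.splu((A.T @ A).tocsc())

# ...and the factorization reused for all three coordinate columns, or for
# every blendshape that shares the same undeformed target mesh.
x = solver.solve(A.T @ c)
print(x.shape)                              # (100, 3)
```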

Figure 2.5: The linear system Ax̃ = c for the case of a source and target mesh of different topology, with A of size 3|M| × n and c of size 3|M| × 3. M is the correspondence according to Equation (2.14), with j ∈ 1...|M| pairs of source and target triangles, x̃ holds the new, unknown, vertices, S is the deformation gradients of the source mesh and the entries in A come from Equation (2.18) [17].

Figure 2.6: The linear system Ax̃ = c for the case of a source and target mesh of the same topology, with A of size 3m × n and c of size 3m × 3. m and n are the number of triangles and the number of vertices in the meshes respectively, x̃ holds the new, unknown, vertices, S is the deformation gradients of the source mesh and the entries in A come from Equation (2.18), added for each target triangle j ∈ 1...m [17].

2.3.3 Displacement fields

Another way to represent a deformation of a mesh is through the usage of displacement fields. This approach explicitly defines a displacement vector for each vertex in the mesh, describing the per-vertex displacement of the current mesh compared to a reference mesh [17]. In the case of facial expressions, this corresponds to defining the displacement vector per vertex between the neutral blendshape and the current blendshape. With this method, it is possible to transfer a deformation between a target mesh and a source mesh similarly to deformation transfer, by taking the displacement vectors and applying them to the source mesh. Although it is a straightforward approach, there exist limitations compared to deformation transfer. Sumner [17] notes that displacement vectors fail at preserving differential changes, as illustrated in Figure 2.7 for the deformation of a line.

Figure 2.7: Displacement vectors compared to deformation transfer for transferring a deformation. (A) A source deformation of a line. (B) A shorter target line after transfer of the deformation with displacement vectors. (C) A shorter target line after transfer of the deformation using deformation gradients. Image source: [17].

2.4 Optimization theory

The basic idea of optimization is to find the solution that maximizes or minimizes a certain criterion given some constraints. The difficulty of solving an optimization problem is related to both its complexity and its dimension. Complexity is often related to whether the optimization problem is linear or non-linear, and to the constraints and how easily they can be applied. Dimension is directly related to the number of decision variables that build up the problem, with the difficulty increasing proportionally with the number of variables; i.e. it is harder to solve large-scale problems that can contain more than a thousand variables [19]. This is an important aspect when handling optimization problems targeting facial meshes, since the number of faces in a high-resolution mesh often exceeds 100,000.

2.4.1 Regularization

Regularization is a method of adding information to a complex or ill-posed optimization problem such that a solution can be found. The solution of an optimization problem with added regularization terms may not be the perfect solution, but it often ensures a good solution that at the same time holds a certain degree of smoothness [20]. The regularization term, or penalty term, should be based on how the residual norm behaves when the input parameter varies [21].
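As a small, generic illustration of the effect of a penalty term (a Tikhonov-style example, not tied to the thesis implementation): adding λ‖x‖² to an ill-conditioned least-squares fit trades a slightly larger residual for a much better behaved solution.

```python
import numpy as np

# Ill-conditioned least-squares problem: two nearly collinear columns.
rng = np.random.default_rng(6)
t = np.linspace(0.0, 1.0, 50)
A = np.column_stack([t, t + 1e-6 * rng.standard_normal(50)])
b = t + 0.01 * rng.standard_normal(50)           # noisy observations

# Unregularized solution: large, cancelling coefficients.
x_plain = np.linalg.lstsq(A, b, rcond=None)[0]

# Tikhonov regularization: minimize ||Ax - b||^2 + lam * ||x||^2,
# solved via the augmented normal equations (A^T A + lam I) x = A^T b.
lam = 1e-3
x_reg = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)

# Typically a huge coefficient norm versus a small, well-behaved one.
print(np.linalg.norm(x_plain), np.linalg.norm(x_reg))
```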

2.5 Related work

There exist previous examples of successful methods for blendshape generation that are of interest for this thesis project. They range in complexity and approach, from simple model transfer methods to more recent machine learning techniques for the blendshape optimization process. A well used method is Example Based Facial Rigging (EBFR) by Li, Weise and Pauly [9], which has its foundation in deformation transfer by Sumner and Popović [13]. The usage of deformation transfer alone for model transfer has disadvantages, since the method does not take collisions into account [3], but this is solved by Li, Weise and Pauly [9] through a larger optimization process and the usage of a higher number of example expressions.

The utilization of gradient based deformations can be seen in additional approaches [22], [23], where the method proposed by Seol, Ma and Lewis [23] utilizes a generic rig combined with a FACS capture session of a target actor as input; they conclude that their facial rig generation technique is well suited for automatic solving as well as manual keyframing. With a solver for rigid motion integrated in the algorithm, the method additionally allows for using sample data such as HMC data as input [23]. The model transfer approach by Li, Weise and Pauly [9] serves as the basis for the method by Ma et al. [22], but with further integration of both a solver for rigid motion and an automatic estimation of blendshape weights. Additionally, Ma et al. [22] utilize Laplacian deformation, as described by Botsch and Sorkine [24], for an initial registration step.

Recent research has also introduced machine learning into the optimization of blendshapes. Li et al. [25] show a two-stage self-supervised learning network that provides a personalized rig from a single scan, which reduces the need for input data as well as greatly improves the generation times.

2.5.1 Blendshape generation and optimization

This section provides a more detailed description of some of the methods related to automatic blendshape generation that are of use within this thesis project.

EBFR, introduced by Li, Weise and Pauly [9], is based on deformation transfer along with an iterative approach for optimization, and has been the foundation for many implementations and much research within facial rigging since 2010. This method uses a generic blendshape model, consisting of a neutral pose and a number of blendshapes for displacements from the neutral pose, A = {A_0, ..., A_n}, along with a set of training poses that also include a neutral pose and displacements from it, S = {S_0, ..., S_m}. The goal is to find a new set of optimized blendshapes, B = {B_0, ..., B_n}, that best reproduces the training poses T_j = B_0 + \sum_{i=1}^{n} α_{ij} B_i, j ∈ 1...m, where the reproduced poses T_j ≈ S_j [9].

The optimization of blendshapes is done through minimization of the global energy, given as the sum of the fitting energy and the regularization energy, for each training pose S_j according to

E_A = E_{fit} + β E_{reg} = \|M^{S_j} - (M_0^B + \sum_{i=1}^{n} α_{ij} M_i^B)\|_F^2 + β \sum_{i=1}^{n} w_i \|M_i^B - M_i^{A^*}\|_F^2, \quad (2.21)

where β is a regularization constant and M denotes triangle gradients according to Equation (2.12): M^{S_j} is the triangle gradient for the training pose j, M_0^B is the triangle gradient for the neutral blendshape and M_i^B is the unknown triangle gradient for the new optimized blendshape B_i. α_{ij} are the estimated blendshape weights and w_i are the regularization weights, given

according to Equation (2.22) with the constants $\kappa$ and $\theta$. $M_i^{A^*} = G_{A_0 \to A_0 + A_i} \cdot M_0^B - M_0^B$, with $G_{A_0 \to A_0 + A_i}$ being the deformation gradient describing the deformation between the neutral blendshape $A_0$ and the expression $A_i$. Additionally, $G_{A_0 \to A_0 + A_i} \approx G_{B_0 \to B_0 + B_i} = (M_0^B + M_i^B)(M_0^B)^{-1}$ [9].

$$w_i = \left( \frac{1 + \|M_i^{A^*}\|_F}{\kappa + \|M_i^{A^*}\|_F} \right)^{\theta} \quad (2.22)$$

By alternating between treating the blendshape weights as constants and the blendshapes as constants when optimizing, Li, Weise and Pauly [9] keep refining the blendshapes according to Equation (2.21) and make new estimations of the blendshape weights through Equation (2.23), while also changing the regularization constants iteratively [9].

$$E_B = \sum_{k=1}^{N} \left\| v_k^{S_j} - \left( v_k^{B_0} + \sum_{i=1}^{n} \alpha_{ij} v_k^{B_i} \right) \right\|_2^2 + \gamma \sum_{i=1}^{n} (\alpha_{ij} - \alpha_{ij}^*)^2 \quad (2.23)$$

In the equation above, $N$ is the total number of vertices, $v_k^{S_j}$ are the vertices of the training pose $S_j$, $v_k^{B_i}$ are the vertices of the blendshape $B_i$, $\gamma$ is a constant to balance the fitting and regularization, and $\alpha_{ij}^*$ are predefined blendshape weights given by the user [9]. The implementation by Li, Weise and Pauly [9] shows benefits with the gradient-based optimization formulation, but it depends on manual input for the estimation of blendshape weights and recommends 10 iterations for refining the blendshapes, which may not be very time efficient.
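To make the weight estimation step concrete, the sketch below poses a problem of the same form as Equation (2.23), stacking the fitting term and the prior term into one linear least-squares system. The bound constraints follow the 0.0 to 1.0 convention used by Ma et al. [22]; all names and array layouts are assumptions for illustration, not the exact implementation of EBFR.

import numpy as np
from scipy.optimize import lsq_linear

def estimate_weights(neutral, blendshapes, pose, gamma, prior):
    """Estimate blendshape weights for one training pose, cf. Eq. (2.23).

    neutral, pose: flattened (3 * n_vertices,) vertex arrays
    blendshapes:   (3 * n_vertices, n_shapes) matrix of delta shapes
    prior:         (n_shapes,) predefined weights; gamma balances the terms
    """
    n = blendshapes.shape[1]
    # ||B w - (pose - neutral)||^2 + gamma * ||w - prior||^2 rewritten
    # as one stacked least-squares system ||A w - b||^2
    A = np.vstack([blendshapes, np.sqrt(gamma) * np.eye(n)])
    b = np.concatenate([pose - neutral, np.sqrt(gamma) * prior])
    return lsq_linear(A, b, bounds=(0.0, 1.0)).x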

An approach targeting the optimization of blendshapes for performance capture data in a fully automatic manner is presented by Ma et al. [22]. The optimization problem in this implementation also targets the minimization of an energy expression, and the problem formulation is given in Equations (2.24)-(2.27) [22].

$$\min_{w_i, R_i, t_i, D, b_0} \sum_{i=1}^{n_f} E_i^g \quad (2.24)$$

$$E_i^g = \| M_i^1 (x_i - p_i) \|^2 \quad (2.25)$$

$$M_i^1 = M_i \otimes I_3 \quad (2.26)$$

$$x_i = (I_{n_v} \otimes R_i)(D w_i + b_0) + (1_{n_v} \otimes t_i) \quad (2.27)$$

In the equations above, $n_f$ is the number of faces in the blendshapes, $p$ is the input performance capture, $M$ is the weight matrix containing per-match quality scores, and $I_{n_v}$ is an identity matrix whose size is the number of vertices $n_v$. $x_i$ is the reconstructed face pose based on the blendshape model, $D$ is the pose offset, $b_0$ is the neutral pose, $w_i$ are the blendshape weights, $1_{n_v}$ is a column vector of ones with length $n_v$, and $R$ and $t$ are the rigid rotation and translation respectively [22]. Like the approach by Li, Weise and Pauly [9], Ma et al. [22] alternate between optimizing the blendshape weights while treating the blendshapes as constants and refining the blendshapes with the blendshape weights treated as constants. Additionally, the rigid motion is solved for in the optimization of blendshape weights, which brings stability to the solver [22]. The optimization problem for finding the optimal weights is given by

$$\min_{w_i, R_i, t_i} E_i^g, \quad (2.28)$$

where $w_i$ are the blendshape weights, which are fixed between 0.0 and 1.0, and $R_i$ and $t_i$ are the rotation and translation respectively [22]. In order to achieve a better estimation of the blendshape weights, Ma et al. [22] introduce two regularization terms. The first one relates to the sparseness of the solution and keeps the combinations of blendshape targets as few as possible. The second one is a temporal smoothness term that penalizes the difference between the current blendshape weights and those found in the previous stage of the optimization process; it aims to keep smoothness throughout the iterations. With these additional regularization terms, the estimation of blendshape weights is extended from Equation (2.28) to

$$\min_{w_i, R_i, t_i} E_i^g + \lambda_s E_i^s + \lambda_t E_i^t, \quad (2.29)$$

where $E_i^s = \|w_i\|_1^2$ is the sparseness regularization term, further expanded as $\|w_i\|_1^2 = \|1_{n_b}^T w_i\|^2$, and $E_i^t = \|w_i - w_{i-1}\|^2$ is the temporal smoothness term. The corresponding constants $\lambda_s$ and $\lambda_t$ control how much influence the respective regularization term has [22].

Ma et al. [22] further optimize the blendshapes, with blendshape weights and rigid motion kept fixed, through minimizing Equation (2.30) with added regularization terms $E_r$ and $E_d$, which ensure that the semantics of the template blendshapes are kept. The energy terms are defined according to Equation (2.31) [22].

$$\min_{D} \; E_{\tilde{g}} + \lambda_r E_r + \lambda_d E_d \quad (2.30)$$

$$E_{\tilde{g}} = \|(W^T \otimes I_{n_b})\,\mathrm{vec}(D) - \mathrm{vec}(\tilde{P})\|^2, \quad E_r = \|G^1 \mathrm{vec}(D) + G^1 (I_{n_b}^T \otimes b_0) - g^*\|^2, \quad E_d = \|\mathrm{vec}(D) - \mathrm{vec}(D^*)\|^2 \quad (2.31)$$

In the equations above, $W$ is a blendshape weight matrix, $I_{n_b}$ is an identity matrix whose size is the number of blendshapes $n_b$, $\tilde{P}$ is a matrix with the reconstructed poses of the facial performance capture, $G$ is the deformation gradient, $G^1 = I_{n_b} \otimes (G \otimes I_3)$, $g^*$ is a stacked vector of all deformation gradients from the initial blendshapes, and $D^*$ is the initial blendshape model [22].

Yet another similar approach for personalized blendshape generation is presented by Seol, Ma and Lewis [23], which also utilizes an iterative optimization process similar to that of Li, Weise and Pauly [9]. The usage of a prior preprocessing step consisting of deformation transfer, along with a head motion estimation, allows for stability in the solve, and it is highlighted that the generation can be applied to HMC capture as input as well as performance capture [23]. The initial alignment of the tracked expressions used as training poses is performed with the usage of Procrustes alignment and expression weight solving for the minimization problem according to

$$\min_{R_i, t_i, w_i} \|\hat{R}_i (b_0 + B_s w_i) + \hat{t}_i - p_i\|^2, \quad (2.32)$$

where $\hat{R}_i = I_{n_v} \otimes R_i$ and $\hat{t}_i = 1_{n_v} \otimes t_i$, with $n_v$ being the number of vertices in the mesh, $R_i$ and $t_i$ are the rigid transformations of the head movement, $b_0$ is the neutral template, and $B_s$ is the actor's initial blendshape basis [23].
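For fixed expression weights, the rigid part of an alignment such as the one in Equation (2.32) can be computed in closed form with an orthogonal Procrustes (Kabsch) step. The sketch below shows that single step only, not the full alternating solve of the method; the function name and array conventions are assumptions.

import numpy as np

def procrustes_align(source, target):
    """Rigid rotation R and translation t aligning source onto target.

    source, target: (n_vertices, 3) arrays with corresponding vertices.
    """
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    # covariance of the centered point sets
    H = (source - mu_s).T @ (target - mu_t)
    U, _, Vt = np.linalg.svd(H)
    # guard against reflections so that R is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t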

blendshapes [23]. The minimization problem formulated to control the expression transfer and find the expression mesh $x_i$ is defined according to

$$\min_{x_i} \|M_i (p_i^* - x_i)\|^2 + \lambda \|L_{b_0}(b_0 + B_s c_i) - L_{b_0} x_i\|^2, \quad (2.33)$$

where $M_i$ is a diagonal matrix containing information about stationary vertices, $c_i$ are predefined blendshape weights from performance capture, $\lambda$ is a regularization constant, $p_i^*$ is the aligned mesh, and $L_{b_0}$ is equal to the matrix $A$ from deformation transfer [23].

2.5.2. Machine learning approach

With the aim of creating fully automatic solutions for the blendshape optimization problem, the natural extension is to involve machine learning in the process. The solution by Li et al. [25] produces both facial assets and blendshapes given a single scan in neutral expression. Focusing on generating personalized blendshapes, this approach utilizes a self-supervised learning scheme with a two-stage framework that includes an estimation and a tuning stage [25].

In the estimation stage, a reconstruction loss is defined such that the blendshapes are optimized to best reproduce a set of expressions [25], in a similar manner as the reproduction of training poses by Li, Weise and Pauly [9], where $P_k$ is the reconstructed expression according to $P_k = S_0 + \sum_{i=1}^{N} \alpha_{ik} S_i$, with blendshape weights $\alpha$, $N$ being the number of expressions in the generic model and $S_i$ the expressions from the generic model [25].

With a target model given as $S_0^j$, the reconstruction using the personalized blendshapes $S_i^j$ can be given as $P_k^{j\prime} = S_0^j + \sum_{i=1}^{N} \alpha_{ik} S_i^j$, and from this, Li et al. [25] define the reconstruction loss for their blendshape generator according to

$$L_{rec} = \sum_{x \in P_k^j} \| P_k^{j\prime}(x) - P_k^j(x) \|_1. \quad (2.34)$$

In order to ensure that the generated blendshapes for the target follow the semantics of the generic blendshapes, Li et al. [25] introduce a regularization term based on minimizing the relative difference in blendshape offsets, according to

$$L_{reg} = \sum_{i=1}^{N} \sum_{x \in S_i} g_i \, m_i(x) \, \| \Delta S_i^j(x) \|_1, \quad \forall i \geq 1, \quad (2.35)$$

where $g_i$ stands for the global weights and $m_i(x)$ for the local weights per vertex in the blendshape $S_i$, according to Equation (2.36), with regularization constants $\lambda_g$ and $\lambda_l$ and $S_i$ being the personalized blendshapes [25].

$$g_i = \frac{\lambda_g}{\sum_{x \in S_i} \|S_i(x)\|_2}, \quad \forall i \geq 1, \qquad m_i(x) = \frac{\lambda_l^i}{\|S_i(x)\|_2}, \quad \forall x \in S_i \quad (2.36)$$

Li et al. [25] combine the reconstruction and regularization terms into a 2D convolutional neural network in the blendshape generator with the loss function according to

$$L_G = L_{rec} + \omega_{reg} L_{reg}. \quad (2.37)$$
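As a schematic restatement of the loss terms in Equations (2.34)-(2.37), detached from any network architecture, the sketch below evaluates the losses as plain array operations. All array shapes and names are assumptions for illustration and do not reflect the internals of the method by Li et al. [25].

import numpy as np

def l_rec(reconstructed, target):
    """L1 reconstruction loss over all vertices, cf. Equation (2.34)."""
    return np.abs(reconstructed - target).sum()

def l_reg(offsets, g, m):
    """Weighted L1 norm of blendshape offsets, cf. Equation (2.35).

    offsets: list of (n_vertices, 3) delta arrays, one per blendshape
    g: global weight per blendshape; m: per-vertex weight arrays
    """
    return sum(
        g[i] * (m[i] * np.abs(dS).sum(axis=1)).sum()
        for i, dS in enumerate(offsets)
    )

def l_generator(reconstructed, target, offsets, g, m, w_reg):
    """Combined generator loss, cf. Equation (2.37)."""
    return l_rec(reconstructed, target) + w_reg * l_reg(offsets, g, m)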

Following the generation stage, a tuning stage is used to achieve the final individualization towards a specific subject. This is done through relaxing the constraints on the blendshape weights and, instead of estimating them, learning them with the usage of a neural network. The tuning stage reuses some of the network architecture from the blendshape generator but is based on another loss function according to Equation (2.38), with the new regularization term according to Equation (2.39) [25].

$$L_{G_{FT}} = L_{rec} + \omega_{reg_{FT}} L_{reg_{FT}} \quad (2.38)$$

$$L_{reg_{FT}} = \sum_{i=1}^{N} \| \Delta S_{i_{FT}}^j - \Delta S_i^j \|_1 \quad (2.39)$$

The regularization term is used to preserve the semantics of the blendshapes that have been generated by the estimation stage; it minimizes the difference between the target blendshape offsets $\Delta S_{i_{FT}}^j$ and the initial blendshape offsets $\Delta S_i^j$ [25].

The approach of using neural networks to generate personalized blendshapes, as proposed by Li et al. [25], shows high benefits in terms of computation time and stability, but this type of implementation also demands a large amount of high-resolution training data.

2.5.3. Numerical approaches

Many of the problem formulations related to blendshape optimization and generation can be broken down into linear systems, which can be solved using a free choice of programming language and method. Some highlighted methods can be found in related work, where Ma et al. [22] use the QP solver in CVXOPT [26] and a sparse linear solver based on lower-upper (LU) decomposition in SciPy to solve the optimization problems for blendshape personalization. The usage of a solver based on LU decomposition is also recommended by Sumner and Popović [13] for efficiency when solving the minimization problem they propose, related to deformation transfer.
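As a brief illustration of the LU-based route in SciPy, the factorization can be computed once and reused for several right-hand sides, which is what makes it attractive for repeated solves such as those in deformation transfer. The matrix and vector below are placeholders, not data from any of the cited methods.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# placeholder sparse system A x = b (A would be, e.g., the normal
# equations of a deformation transfer problem)
A = sp.random(1000, 1000, density=0.01, format="csc") + 10 * sp.identity(1000, format="csc")
b = np.ones(1000)

lu = splu(A)     # factorize once (lower-upper decomposition)
x = lu.solve(b)  # reuse the factorization for each right-hand side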

3. Method

This chapter provides a description of the method for this thesis project and how the implementation of the framework and the evaluation was carried out. The pre-study was mainly conducted in the beginning of the thesis project in order to build a base for the implementations, but it was also carried on throughout the duration of the project. The methodology for gathering relevant theory took its starting point in recent studies and publications within facial rig optimization and was extended as needed to cover a range of topics within computer graphics.

3.1. Implementation

The implementation of the blendshape optimization was broken down into two major parts inspired by the approach by Li et al. [25], with one estimation stage and one refinement stage. A conceptual overview of the implementation can be seen in Figure 3.1. The overall optimization formulation builds upon EBFR as proposed by Li, Weise and Pauly [9] and was extended to a fully automatic framework using findings from Ma et al. [22] for blendshape weight estimation, together with a head motion estimation and alignment in a separate step similar to the method by Seol, Ma and Lewis [23]. The optional usage of an iterative update of the refinement stage provides scalability in the solution and offers research prospects for investigating the influence of optimization parameters and changing the optimization loop for different target blendshapes.

The framework was implemented in Python, which provided rapid prototyping and options to explore various packages when needed. The framework was further designed as a Nix package with a client and a library component and could be executed through the command line in a nix-shell. A flowchart providing an overview of the algorithm used for generating personalized blendshapes in this thesis project can be seen in Figure 3.2, and a more detailed description can be seen in Algorithm 1.

Figure 3.1: A conceptual overview of the blendshape optimization framework. The process consisted of two stages, one estimation stage and one refinement stage, where the latter could further be extended in an iterative manner if needed.

Algorithm 1: Blendshape optimization
Data: Generic blendshape model, target neutral blendshape and sample data
Result: Optimized blendshape model

Estimation stage:
    perform deformation transfer from generic model to target
    estimate initial blendshape weights
    align the training poses
    calculate triangle gradients
    calculate prior parameters and regularization weights

Refinement stage:
    for each refinement step do
        for each face do
            find the optimal triangle gradients for all blendshapes
        end
        for each blendshape do
            calculate the corresponding deformation gradients per face
            perform deformation transfer guided by deformation gradients
            update the vertex positions
        end
        estimate new blendshape weights
        align the training poses
        update the triangle gradients for the training poses
    end
    formulate the combo-shapes
    return optimized blendshape model
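A compact Python skeleton of the two-stage flow in Algorithm 1 is sketched below. Every function called inside it is a hypothetical placeholder for the submodules described in the following sections, so the skeleton illustrates control flow rather than a runnable implementation.

def optimize_blendshapes(generic_model, target_neutral, samples, n_steps=1):
    """Two-stage blendshape optimization, mirroring Algorithm 1."""
    # Estimation stage
    shapes = deformation_transfer(generic_model, target_neutral)
    weights = estimate_initial_weights(shapes, samples)
    poses = align_training_poses(samples, shapes, weights)
    gradients = triangle_gradients(shapes, poses)
    priors, reg_weights = prior_parameters(generic_model, shapes)

    # Refinement stage
    for _ in range(n_steps):
        gradients = solve_optimal_gradients(gradients, weights, priors, reg_weights)
        shapes = apply_deformation_gradients(shapes, gradients)
        weights = estimate_weights(shapes, poses)
        poses = align_training_poses(samples, shapes, weights)

    return build_combo_shapes(shapes, generic_model)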

Figure 3.2: A flowchart providing an overview of the implemented framework, from the input data (generic blendshapes, a neutral target blendshape and facial scans or manually sculpted poses) through deformation transfer, initial blendshape weight estimation and alignment of training poses, the iterative blendshape optimization and weight updates until convergence is reached, to building the combo-shapes of the final personalized blendshapes.

3.1.1. Preprocessing

Before loading the input to the framework, the blendshapes were meshed, triangulated, normalized and registered to the same topology as the initial target blendshape and the generic blendshape model in a preprocessing step using existing VFX software at Goodbye Kansas. This step was introduced to make the framework fit into the existing pipeline and to ensure that the meshes do not contain any errors, which improves the stability of the optimization framework as well as reduces the need for error checks in submodules.

3.1.2. Input parsing

The first step of the framework consisted of loading the input data and storing it for easy and efficient access for the deformation transfer as well as the optimization step. This step required handling of argument parsing, since the framework was to be used in a production pipeline. The usage should be clear and able to handle user input to explicitly set parameters if necessary.

The implementation of the input parsing was done using argparse, with a utility class for handling the optimizer settings. Additional input checks were implemented in order to make the framework production ready and scalable for future additions.

Input variables

The input data was processed as numpy.array objects and consisted of the following required inputs:

• A set of vertex lists for a generic blendshape model, expressed in delta notation from the generic neutral blendshape.
• A vertex list for the neutral target blendshape.
• A set of vertex lists for a number of sample data expressions, expressed in delta notation from the target neutral blendshape.
• A face list that describes the topology of the blendshapes.

The user could also provide the following optional input data:

• A text file containing names of blendshapes that should not be optimized.
• A user-provided mask to determine stationary vertices.
• A user-provided mask for rigid alignment of meshes.
• A settings file with values for optimization settings that overrides the default values in the framework.
• A debug flag for outputting interim blendshapes as obj-files.

Running the framework

Input parsing was added such that the framework loads the input data according to settings provided either through flags in the terminal or through a settings file, which could be used to explicitly modify the values of optimization constants in the framework. An example command for running the blendshape optimizer from the nix-shell, using flags for settings, can be seen in Listing 3.1 with the mandatory input variables.

$ run --INPUT_DIRECTORY "/directory-initial-blendshapes"
      --INPUT_DATA_DIRECTORY "/directory-scans"
      --OUTPUT_DIRECTORY "/directory-output"
      --FACE_LIST_PATH "/path-to-topology-list"

Listing 3.1: Command line example for the optimization framework in a nix-shell.

Additional flags for the optional settings as explained in Section 3.1.2 could be added in a similar manner.

Storing of input

The generic blendshape model was loaded and separated into combination-shapes (combo-shapes) and base-shapes, where base-shapes are blendshapes containing only a single FACS expression and combo-shapes are a blended or weighted sum of multiple base-shapes. Since the generic blendshape model was manually created, the estimation of the blendshape weights for the combo-shapes could be done using the established hierarchy. The numeric values of the blendshape weights were estimated with solve_qp() in Quadprog [27].
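A minimal sketch of how solve_qp() in Quadprog can be used for such a bounded least-squares weight fit is shown below. The data setup and function name are illustrative assumptions; only the solver call reflects the library's interface, which minimizes ½ wᵀG w − aᵀw subject to Cᵀw ≥ c.

import numpy as np
from quadprog import solve_qp

def fit_combo_weights(base_deltas, combo_delta):
    """Fit weights w in [0, 1] such that base_deltas @ w ~ combo_delta.

    base_deltas: (3 * n_vertices, n_bases) matrix of base-shape deltas
    combo_delta: (3 * n_vertices,) delta of the combo-shape
    """
    n = base_deltas.shape[1]
    G = base_deltas.T @ base_deltas + 1e-9 * np.eye(n)  # keep G positive definite
    a = base_deltas.T @ combo_delta
    # constraints C^T w >= c encode 0 <= w <= 1
    C = np.hstack([np.eye(n), -np.eye(n)])
    c = np.concatenate([np.zeros(n), -np.ones(n)])
    w, *_ = solve_qp(G, a, C, c)
    return w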
