
Bachelor of Science in Digital Game Development June 2020

Evaluating the quality of 3D character animation produced by artificial neural networks

A user study

Ossian Edström

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Bachelor of Science in Digital Game Development.

The thesis is equivalent to 10 weeks of full-time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:

Author(s):

Ossian Edström

E-mail: osed16@student.bth.se

University advisor:

M.Sc. Diego Navarro

Department of Computer Science

Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden

Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57


Abstract

Background. In recent years the use of Artificial Neural Networks (ANNs) to generate character animations has expanded rapidly. However, comparisons of the perceived realism and quality of these new approaches against the results of a traditional pipeline have been lacking or non-existent.

Objectives. This thesis aims to provide initial data on whether the output of one of these novel approaches is perceived to be of higher visual quality than similar keyframe-based animations and, if so, why. The objective is therefore to produce keyframe animations to serve as a basis of comparison for the method.

Method. Keyframe animations performing actions similar to those of an ANN-based plug-in for the game engine Unity were handcrafted, after which a questionnaire study was performed. The questionnaire was sent to willing participants, both experienced and inexperienced, to gauge public opinion.

Results. The results indicated that participants considered the animations of the ANN-based plug-in to be more realistic, natural, smooth and appealing overall. Statistical analysis via t-test shows high statistical significance when comparing opinions on quality between the two sets.

Conclusions. The ANN-based approach was considered by participants to be of superior visual quality for the reasons stated above. All experienced participants, and one third of the inexperienced, correctly guessed which of the two animation sets was AI-based. However, the inexperienced participants who guessed wrong stated motivations similar to those who guessed right, which could imply an uncertainty about the capabilities of AI in the public consciousness that was not accounted for in this thesis.

Keywords: Machine Learning, Neural Networks, Character animation, Perception


Acknowledgments

The author would like to thank Diego Navarro for his in-depth support, understanding, guidance and mentorship. The author would also like to thank all participants for their imperative help in gathering data, as well as friends and family for their unending support through challenging times.


Contents

Abstract
Acknowledgments
1 Introduction
1.1 Background
1.2 Aim and objectives
1.2.1 Research Questions and Hypothesis
1.2.2 Motivations, goals and objectives
1.3 Outline
2 Related Work
2.1 ANNs and character animation
2.2 Animation naturalness and realism
3 Method
3.1 Apparatus and Tools
3.1.1 Apparatus
3.1.2 Autodesk Maya
3.1.3 Unity
3.2 Implementation
3.3 Setup and Procedure
3.3.1 Data analysis
3.4 Limitations
4 Results and Analysis
4.1 Results
4.2 Analysis
5 Discussion
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
References
A Supplemental Information
A.1 Full thesis survey

Chapter 1

Introduction

This chapter introduces the context necessary to understand the area, the current problems and the research gap in 1.1. This is followed by 1.2, where the research questions and hypothesis are defined along with goals, objectives and motivations. The chapter ends with 1.3, an outline which briefly describes the contents of the remaining chapters.

1.1 Background

Realistic character animation is an important part of media such as games and movies. However, it is difficult to create realistic animation of human motion, as people readily discern even subtle inaccuracies in movement. A common way to handle this is Motion Capture, also known as MoCap, a technique for capturing and reproducing human motions by recording or "capturing" them. While MoCap adds realism, it is notoriously difficult to edit. This problem has been met with a number of different solutions over the years; for example, in the 2000s Motion Graphs were developed. They are, in short, a method of taking a set of motion data and breaking it down into small nodes between which transitions are automatically generated. This allows users to avoid the hassle of capturing motions specifically meant to connect to each other [9, 1]. In some cases similar techniques use a so-called "state machine", where different actions such as "running" or "walking" are set to be transitioned between, sometimes manually.

The main problem with these approaches is that they require a lot of animator input: animators must record the motion data, cut it into small segments or nodes and label it appropriately [7]. Multiple other techniques have been developed to combat this problem through automation. One notable example is Motion Fields, which in short uses a set of different possible motions and applies reinforcement learning to choose between them at runtime. However, this approach is notably slower than motion graphs at runtime despite being a more automated and agile process, and it has difficulty scaling up to especially large datasets and tasks [13]. In more recent years, Motion Matching is another example where, rather than placing small animations in an overarching structure, animators manually mark what cannot be automatically detected, such as attacks and defense stances for characters. These mark-ups are then used at runtime to quickly find and blend between the current pose and the desired end point [4]. However, the query for these mark-ups and the storage of data in memory is a potential scalability issue as the amount of memory in question increases [7].

The "scalability" or capacity of these approaches to function and be applied efficently as the amount of data (and thus the amount of motions an approach can handle) increases is a re-occurring limitation in the field. This has lead to an increased amount of research into Artifical Neural Networks(ANNs) in recent years as they have an almost unlimited data capacity and low memory usage at runtime.[7] ANNs are a type of data processing inspired by the neurons(nerve cells) of the human brain that in recent years have been used for a wide variety of applications as they can take a variety of inputs to produce specified results [17]. In recent years the use of these networks for simulating character animation has been increasing rapidly due to their high scalability and ability to create new motions from interpolation of the input data. [19]

These new approaches are gaining interest fast and appear to solve many of the problems inherent in scalability and manual animator input. However, many of the recent works [7, 19, 12] state that their techniques look natural or realistic without critically applying user studies to verify this. Stating that a framework has created natural motions based mostly on the use of MoCap data or abstract mathematical calculations may be a problem within the field, since, as stated in previous works, "naturalness" is a highly subjective variable [2]. Furthermore, to the author's knowledge there have been very few published works investigating whether these new techniques look less or more realistic than a more traditional approach. Would a layman even be able to tell the difference between these new approaches and traditional keyframe animation?

The only study found in our research that appears close to answering this question is the work of Dehesa et al. [5], as they apply a technique based on [7] and compare it to traditional animation logic. However, the work of Holden et al. [7] is mostly applicable to locomotion, which [5], due to the nature of their study, did not investigate. This presents a gap in knowledge and a problem pertaining to the realism and naturalness, or "visual quality", of recent ANN-based techniques, especially when compared to the more traditional keyframe animation approach. As such, this work aims to amend this by way of a user study comparing the perceived visual quality of "traditional" (i.e. non-procedurally produced) animations to procedurally produced ones, together with an evaluation of which the participants consider more realistic.

Sebastian Starke, one of the authors of a paper introducing one of the most recent techniques in the area, a so-called Neural State Machine (NSM), as well as of numerous other papers on the subject, has released a product with a trained neural network for the game engine Unity. This product, "Bio Animation", uses neural networks and a database of motion capture data to produce transitions and motions [18, 19]. As this product is one of very few neural-network-based solutions available on the market today, this study intends to use this Unity extension as a basis of comparison for the visual quality of character animation using neural networks.


1.2 Aim and objectives

The central aim is to determine through user studies whether animations made by procedural methods using an ANN and motion capture are perceived to be of higher visual quality than similar hand-crafted animations and, if so, why.

The subsections below first state the research questions and hypothesis, followed by a motivation for the choice of methodology as well as a listing of the objectives of this thesis. Please note that skilled/unskilled and experienced/inexperienced participants are referred to interchangeably throughout this thesis. Experience is defined here as "having studied the subject of 3D animation at a college/university level or having equivalent experience" and was determined by an early yes-or-no question in the user study.

1.2.1 Research Questions and Hypothesis

In order to reach the central aim, the following research questions have been formulated:

• Which character animations are preferred by viewers when comparing those produced by procedural neural network methods and traditionally produced ones and why?

• In what way do "skilled" and "unskilled" people within 3D animation reason when trying to determine if animations have been generated by an AI? Does their reasoning differ and if so what way?

The hypothesis for this thesis is that, due to the precise nature of motion capture as well as the realistic transitions inherent in the neural network approach, viewers will prefer the neural network approach. This is especially expected as the development of that system was overseen by experts, while the author and animator of the traditional keyframe animation approach is relatively inexperienced in the field.

It is anticipated that the reasoning of "skilled" individuals will be more accurate than that of "unskilled" individuals and that their reasoning will in some manner recognise MoCap data due to previous experience with it. The unskilled reasoning is considerably more uncertain and harder to predict, but will likely be less accurate and somewhat more random, as some individuals will simply guess, perhaps on a "hunch".

The results of this thesis may benefit further research into the realism of, and comparison between, ANN-based approaches and traditional industry-standard approaches. They could also serve as a starting point for developers or companies trying to decide whether an ANN-based approach would be worth the effort, or whether their users would not be able to tell the difference between the two approaches.

1.2.2 Motivations, goals and objectives

Several different methodologies were considered as potentially applicable. However, in order to achieve the goal, the question "What research method(s) may be most appropriate to answer these questions?" must first be answered.


The application of procedural neural networks for character animation is a relatively new development. Potentially as part of this novelty, there is a lack of studies comparing the new methods with the current industry standard. Therefore an archival study, for example, could be a counter-productive methodology. The presented research questions require quantitative research in order to establish whether the general public has an overall preference. However, they also require qualitative data to at least partially understand the motivations behind the answers.

Interviews with open-ended questions after exposing participants to the stimuli could potentially be an applicable research method. However, the working conditions made meeting in person impossible, and while online meetings were a possibility, time was limited and the core purpose of this paper was to establish public preference and perception of the recent AI methods. As such a more expedient methodology was required and quantitative answers were more relevant. Therefore the decision was made to arrange an online survey with close-ended questions. However, to get an indication of participants' motivations, a "why?" question was added after many of the options.

The central objectives are therefore to produce "traditional" animations that can be used as a basis for comparison to the Unity extension, and to gather subjective experiences via an online survey. The survey was sent out to as many participants as possible among friends, family and colleagues, as this creates a relative certainty of genuine answers and avoids potential misgivings present in more public studies.

As such the objectives are the following:

• Produce "hand-made" animations in Autodesk Maya that fulfill the same ac- tions as the Neural Network plug-in.

• Script basic character movement in Unity and set up the blending between animations so that motions similar to those of the plug-in are produced at the same time.

• Record a video of the same character performing the exact same movements at the same time but with the two different animation set-ups. Edit this video so that the two are displayed side by side for easier comparison. Upload the video unlisted to YouTube or another video-hosting service.

• Synthesize a form containing questions to ascertain a viewer’s perception of visual quality and preference. Send that and a link to the video to carefully chosen participants.

• Analyze the gathered data and synthesize it into easy-to-read tables and graphs, drawing conclusions from the results and documenting them.

1.3 Outline

This thesis is split into the following chapters:

• Chapter 2 is Related Work, which introduces certain theories required to fully understand the subject as well as previous works attempting to compare two or more approaches to realistic animation and/or evaluate their visual quality, often through user studies.

• Chapter 3 is the Methodology and offers details on the procedures of the project from implementation and data analysis to what apparatus and tools were used.

• Chapter 4 is Results and presents the results of the user study as well as the statistical analysis of its results.

• Chapter 5 is the Discussion, where the results are discussed and put into a larger scientific context. It is also where possible reasons for these results are considered, along with how they might affect our research questions.

• Chapter 6 is the last chapter and sums up the points discussed, the final conclusions to our research questions and recommendations for future work.


Chapter 2

Related Work

This chapter briefly introduces recent theories and developments in the field in 2.1, followed in 2.2 by the works most similar to this thesis that the author could find, which inform on previous work within the field and motivate certain aspects of the approach taken in this thesis.

2.1 ANNs and character animation

This thesis was heavily inspired by the work of Starke et al. and Holden et al. due to the potential of their recent systems for character animation in game development [19, 7]. These studies, in addition to multiple others [15, 21, 14], have indicated that ANNs are a cost-efficient tool performance-wise for generating transitional animations, both in locomotion and in limited object interaction.

In recent years, new systems for locomotion and a broadening range of other character animations based on ANNs and motion capture have been in development [19, 7]. While multiple user studies have been performed to determine the perceived naturalness of other MoCap-based systems, these new systems have no apparent published works evaluating them, nor do they contain user studies in their individual papers. An evaluation of realism (compared to other similar techniques) is, however, made based on the variable of "foot-sliding", a term describing the phenomenon of a character seeming to move in a direction while its feet either hover above the floor or slide along it instead of appearing to propel it forward via contact.

2.2 Animation naturalness and realism

Visually appealing and realistic or otherwise natural motion has been a goal within character animation for many years. One technique to determine this "naturalness" in animations has been the application of pilot or user studies, since it is considered a subjective variable [2, 5]. The following related works focus on determining realism, "naturalness" or visual appeal by applying user or pilot studies, often comparing different techniques to one another.

The work of Wang et al. aims to determine visually appealing motion transitions through linear blending, comparing and developing two different methods of doing so and evaluating the results through user studies. The article concluded that the two methods presented a perceived improvement for participants when applied in their respectively appropriate situations. This study is similar to the work of this paper as it is one of few studies found about determining visual appeal in animation, as well as an example of how to perform user studies on the subject. In addition, the Unity plug-in is in its essence also an attempt at creating visually appealing and efficient motion transitions, as it uses a motion capture database to interpolate and generate motions [20].

Krüger et al. had 39 participants and compared synthesized and natural motions to a reference motion, asking participants in the user study which of A or B was the most similar to a reference motion O. They found, contrary to their hypothesis, that the rendered pose representation (stick figure or point-based) had very little effect on the perceived similarity of the motions each rendition represented. The study compared perceived similarity/dissimilarity using a user study, which is reminiscent of the method applied in this paper [11].

Bandim Faustino proposes a framework to automate motion generation based on motion capture data and an annotation system. Their evaluation approach for determining a motion's "naturalness" is based on rendering three video clips for each of multiple scenarios. One of the clips was the current solution, while the other two were of the potential improvement, one with minimized foot-skating and one without. In a user study, 12 participants were asked how natural the motions appeared in each video, rating it on a 5-point Likert scale between Really Bad and Really Good. The new system, Beyond Mocap, displayed a higher degree of naturalness according to the study results. This study is relevant both for its evaluation and for its relatively small sample size. In addition, it could be considered a comparison between a "traditional" MoCap system and an updated one, albeit not one using deep learning [2].

Koyama et al. present a precomputed one-hop transition approach to the problem of motion transition. In essence it is based on the assumption that the transition between two given poses becomes smoother and more natural the more similar they are. To apply this knowledge, a short motion clip referred to as a "hop" is inserted at "optimal" moments by their framework, which searches for the optimal clip and timing at each transition. They then demonstrate the framework's efficiency at runtime by incorporating it into the Unity game engine. The study mainly appears to determine the "naturalness" and realism of motions by generating multiple scenarios and using a smoothness value together with the researchers' own observations [10]. This article is an example of a different approach to determining "naturalness" than this thesis, but also an example of using the Unity game engine for scientific experimentation and a recent approach to motion transition.

Dehesa et al. [5] build on the concept of PFNNs from [7] to create a framework for VR sword-fighting against an AI-controlled opponent. They used a technical evaluation, a questionnaire study and an interactive user study to determine that their framework produces a more realistic and engaging interaction than the "hand-made" alternative. The two user studies were performed in order to determine realism, interest and immersion. The questionnaire study consisted of a set of seven-point Likert scales of adjectives to describe the character, the Intrinsic Motivation Inventory (IMI) [16] and realism questions adapted from the immersion questionnaire by Jennett et al. [8]. They had 41 participants and analyzed their results with ANOVAs and two-tailed t-tests [5].

In summation, the use of questionnaires and user studies for determining realistic or natural motion has been common practice. However, in recent years, and within the use of ANNs for character animation, the application appears scarce, as the work of Dehesa et al. was the only such study the author was able to find. In addition, it evaluates the immersion and realism perceived by their approach, and by extension PFNNs, while our study expands on this knowledge as it pertains to locomotion rather than static actions. That study was also focused on the underlying logic of the character's responses to certain stimuli, while the approach of this thesis focuses on visual differences. This gap, as well as determining how well participants can distinguish an AI approach from the industry standard, are the primary objectives of this thesis. The approach of this thesis does not take inspiration from that study in particular, as it was found after the user study had already been performed. It is, however, to some degree inspired by some of the previous related works in its use of Likert scales and t-tests to determine statistical significance in the subjective experience of realism.


Chapter 3

Method

As stated in 1.2.2, the methodology arguably most suited to answering our research questions is a user study, especially as it has proven a viable method in the past, as demonstrated in 2.2. The study was conducted via an online survey, as current working conditions made meeting in person impossible. In order to perform a user study, however, traditional keyframe animations first had to be created and implemented in the Unity game engine so that they could be compared to the plug-in. This process is detailed below in 3.2. The animations were produced in Autodesk Maya and the entire process was carried out on a home computer; details of both, and of the other tools, can be found in 3.1. Once all animations were produced, the experiment was set up and conducted as described in 3.3.

3.1 Apparatus and Tools

This section details the software and hardware used for this thesis. Each subsection describes one of these in detail. The only hardware used is described in the Apparatus subsection; all other subsections describe software.

3.1.1 Apparatus

The entire process was conducted from home using a Personal Computer (PC) with the following hardware:

Graphical Processing Unit (GPU): AMD Radeon R9 200 Series
Central Processing Unit (CPU): Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz

The study was conducted remotely and thus it was not possible to ensure all participants had access to the same hardware. Therefore, in order to ensure that the test environment was as similar as possible for all participants, a recording was used as stimuli. This ensured that there were no in-engine render differences resulting from testers using different hardware.

3.1.2 Autodesk Maya

Maya is a 3D computer graphics application used professionally within the gaming and film spheres. It allows anything from animation, modeling and rendering to physics simulation. This study specifically used the 2019 student version. In this study Maya is primarily used to rig, animate and export a 3D character for use in the Unity game engine.

3.1.3 Unity

Unity is a popular, commercially used game engine that can be used by anyone free of charge. It was chosen as the game engine as it is used commercially, contained an appropriate plug-in and has been used for studies within the subject previously, notably the work of Koyama et al. [10].

3.2 Implementation

First, the plug-in "Bio-Animation" by Sebastian Starke was purchased and downloaded from the Unity Asset Store. The plug-in's demo test environment was used as the main scene for the implementation. The default 3D model and skeleton called "Adam" from the "Bio-Animation" plug-in was used when developing the keyframe animations, to minimize differences.

The characters in Starke's plug-in were capable of moving in multiple directions, which were put into the following categories: forwards, backwards, left and right, with different animations for running and walking as well as blends between these motions. A total of nine keyframe animations were produced to cover the same actions as the plug-in: eight running and walking animations, one for each of the four directions, and a ninth idle animation. Each keyframe animation was produced in Autodesk Maya based on video references. These references were acquired mostly by searching for the relevant motions on the web. They consisted of footage of real people performing the actions and came from multiple different sources with varying degrees of quality and applicability. Backwards movement was notably difficult to find and had to be recorded by the author. This disparity in source material may have led to noticeable differences between the motions of the final animations. However, the intent of the comparison is to simulate the traditional approach of keyframe animation, and drawing inspiration and reference from multiple different sources is part of this process. Each video was carefully referenced, sometimes frame by frame, in order to craft each animation as close to realism as possible. The references were not intended to show the exact same motions as the plug-in, as any one person, e.g. walking, was assumed to be as realistic as any other person walking.

The finished animations were then blended with the help of Unity's animation system "Mecanim" to account for input in multiple directions, such as walking forward and to the left simultaneously.

A simple script in C# was developed and tested to allow character movement with a constant velocity and to play the produced animations on keyboard input. At runtime, Unity was configured to show two viewports side by side, each rendering a copy of the same character mesh. The right viewport applied the plug-in locomotion system and the left the traditional state-machine approach with the Maya-produced animations. Both were wired to respond to the same "WASD" keyboard inputs at the same time, and the input was made visible through icons in between the two halves of the split screen.
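The script itself is not reproduced in the thesis; the following is a minimal sketch of the behaviour described above, with hypothetical field names, speeds and a "Speed" Animator parameter. The same component could be attached to both characters so that they respond to identical WASD input.

```csharp
using UnityEngine;

// Minimal sketch of a constant-velocity movement script driven by WASD input.
// Field names, speeds and the "Speed" parameter are illustrative assumptions.
public class ConstantVelocityMover : MonoBehaviour
{
    [SerializeField] private float walkSpeed = 1.4f;  // metres per second, assumed
    [SerializeField] private float runSpeed = 3.5f;   // metres per second, assumed
    [SerializeField] private Animator animator;

    private void Update()
    {
        Vector3 direction = new Vector3(Input.GetAxisRaw("Horizontal"), 0f,
                                        Input.GetAxisRaw("Vertical"));
        bool running = Input.GetKey(KeyCode.LeftShift); // run modifier, assumed
        float speed = running ? runSpeed : walkSpeed;

        if (direction.sqrMagnitude > 0f)
        {
            // Constant velocity: the speed never varies with how the key is pressed.
            transform.Translate(direction.normalized * speed * Time.deltaTime, Space.World);
        }

        // The state machine / blend tree selects idle, walk or run from the current speed.
        animator.SetFloat("Speed", direction.sqrMagnitude > 0f ? speed : 0f);
    }
}
```

One consequence of translating at a fixed speed decoupled from the animation is that some foot sliding can occur, which may relate to the foot sliding participants reported for set A in chapter 4.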


Figure 3.1: The comparison video, with the keyframe animation set labeled A and the AI-generated set labeled B

3.3 Setup and Procedure

After implementation, a video was recorded as both systems responded to the same input, and it was then edited to more clearly and efficiently display each state of walking, running and idling. This video was uploaded to YouTube as an "unlisted" video, which was linked in the Google form and allowed participants to pause and compare frames if they wished. The viewports were identified by a letter edited into the corner of the respective viewport: A for the left and B for the right, as can be observed in the full-screen frame of the video in Fig 3.1. To see how it was embedded in the actual survey, see Fig A.3 in appendix A, where the survey is displayed in its entirety as presented to participants. Note that full screen was available to participants. Participants were not told that one of the animations had been produced by an AI until after they completed the survey.

The form contained questions intended to ascertain whether participants were able to distinguish the "traditionally" produced animations from the procedural ones, which they considered to have a higher visual quality, and whether they had any preference between them and why. The form was created using Google Forms for ease of access, its security protocol and the fact that it is commonly used and so may be easier for some participants to use. The form and video link were then sent to willing, trusted individuals, i.e. friends, family and fellow students who could be assumed with relative certainty to take the study seriously. This could potentially create a risk of bias in certain questions, as the group was not randomly selected. However, fellow students were invited via a link in a Discord group that all technical artist students at BTH have access to, creating an element of randomness. In addition, participants did not know which video was produced by which method, nor that the animations of set B were produced by a plug-in and not by the author himself. These measures should limit bias toward one set over the other; however, they may have created a reluctance to rate the quality of both as poor. Each computer likely displayed the video slightly differently, but all motions should be identical across devices apart from potential differences in viewing resolution and screen settings.

The form was left open for 3 weeks and sent to multiple potential participants. The participants included both people familiar with the field of 3D animation (college/university students who had studied at least one animation course, or equivalent) and those unfamiliar with it. The form applied a five-point Likert scale for determining the participants' opinion of the quality of, and similarity between, the two animation sets, as Likert scales have previously proven valuable in statistical evaluation of visual phenomena, e.g. in Faustino's work [2].

3.3.1 Data analysis

The results for the applicable quality questions were then put through three t-tests to determine whether they hold any statistical significance, with the null hypothesis that no difference was present and the alternative hypothesis that there was, with P <= 0.05 rejecting the null hypothesis. "A" below refers to the keyframe animation set and "B" refers to the ANN-generated animation set. One test compared experienced and inexperienced individuals' ratings of set "A", another compared their ratings of set "B", and a third compared the quality ratings of A and B overall. All t-tests were heteroscedastic (Welch's).
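For reference, the sketch below shows how such a two-tailed Welch's (heteroscedastic) t-test can be computed; it is an illustration rather than the exact tool used for the analysis. The sample arrays are placeholders, and the p-value computation assumes the MathNet.Numerics package is available.

```csharp
using System;
using System.Linq;
using MathNet.Numerics.Distributions; // NuGet package MathNet.Numerics, assumed available

// Minimal sketch of a two-tailed Welch's (heteroscedastic) t-test on two rating samples.
public static class WelchTTest
{
    public static double TwoTailedP(double[] a, double[] b)
    {
        double meanA = a.Average(), meanB = b.Average();
        double varA = a.Sum(x => (x - meanA) * (x - meanA)) / (a.Length - 1); // sample variance
        double varB = b.Sum(x => (x - meanB) * (x - meanB)) / (b.Length - 1);

        double seA = varA / a.Length, seB = varB / b.Length;
        double t = (meanA - meanB) / Math.Sqrt(seA + seB);

        // Welch-Satterthwaite approximation of the degrees of freedom.
        double df = Math.Pow(seA + seB, 2) /
                    (seA * seA / (a.Length - 1) + seB * seB / (b.Length - 1));

        // Two-tailed p-value from the Student's t distribution.
        return 2.0 * (1.0 - StudentT.CDF(0.0, 1.0, df, Math.Abs(t)));
    }

    public static void Main()
    {
        // Hypothetical 1-5 Likert ratings for animation sets A and B (not the survey data).
        double[] setA = { 3, 2, 3, 4, 2, 3, 2, 3 };
        double[] setB = { 4, 5, 4, 5, 4, 4, 5, 4 };
        Console.WriteLine($"p = {TwoTailedP(setA, setB):G4}"); // reject H0 if p <= 0.05
    }
}
```

Welch's variant is appropriate here because the compared groups differ in size and are not assumed to share a variance.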

3.4 Limitations

The most central limitations of this method concern the stimuli presented to participants. The "traditionally" produced animations were not crafted by an experienced professional and are therefore not necessarily indicative of industry-standard animations produced by traditional methods. A more accurate comparison could be achieved by employing a seasoned professional or purchasing animations made by one. However, the produced animations may be of similar quality to those produced by hobbyist animators using the Unity game engine, thus offering a fair comparison to a potentially helpful plug-in for these developers.

It may have been preferable to use MoCap data for the animations of set A, as the plug-in for the AI system generates its motions from a library of motion capture data. MoCap data has previously been recognized to produce more natural and fluid motions than keyframe animation, which may have given the AI system an unfair advantage. However, due to time constraints and equipment availability, this study could not produce MoCap animations for set A as a basis of comparison.

The two characters did not have exactly the same turning or walking speed. This is because the plug-in constantly adjusts these parameters based on previous momentum, in order to pick the most applicable motion capture clips, and based on the motion clips themselves, making it functionally impossible to match with a constant speed.

It is possible that extracting the speed of turning/walking from the plug-in and applying it to the traditional method would have created more visually similar motions, thus simplifying the comparison. However, this would partially invalidate the comparison, as one of the motion systems would rely on the other. This method would remove some differences and thereby make others clearer, but since the intent is to understand the perception of the different methods' end results in their entirety, it was determined to be counter-productive. It may, however, be a useful method when analyzing more intricate details, and one could potentially create an alternate comparison with exact speeds and compare that to the video without them.

The result of limiting the survey to trusted individuals was a relatively small sample size, which is a validity risk. However, this risk may be lower than that of posting the survey more publicly, which would increase the risk of less serious and less invested participants.


Chapter 4

Results and Analysis

This chapter contains the raw data collected by the user study in the results section, followed by an analysis section where a t-test analysis of the difference in rated quality between the two animation sets and between the inexperienced and experienced participants is presented. In this chapter, animation sets A and B are often referred to. For clarification, A refers to the traditionally produced animations, labeled A in the video as described in chapter 3, and B refers to the animation set generated via an ANN.

4.1 Results

The user study received 15 answers, of which 53.3% were from male, 40% from female and one (6.7%) from a non-binary participant. Of these, 40% claimed to be experienced (with experience defined as stated in section 1.2). The experienced group was disproportionately male, with only one female participant. (The Technical Artist programme, which was one of the largest groups asked to participate, is also disproportionately male.)

Participants preferred animation set B overall, with only one participant having no preference, as visible in Fig 4.1 (A). When asked why, multiple participants called it more realistic, natural and smoother. A few participants also mentioned noticing less or no foot sliding in animation set B while finding it prevalent in A. They also vastly preferred the pacing of animation set B, as can be observed in Fig 4.1 (B).

Figure 4.1: Three pie charts of different survey results for preferences displayed in percent. Figure (A) shows responses when asked if participants preferred one set over another. Figure (B) shows pacing preference with pacing defined as the timing of movements of an animation. Figure (C) shows which set participants believed had a higher appeal when defined as having personality or life in its motions.


Figure 4.2: Three pie charts of responses in percent referring to different aspects of realism. Figure (A) shows whether participants considered either set to have sudden or jerky transitions between animations, such as the transition from running to walking. Figure (B) represents the answers when participants were asked if they noticed choppy or jittery movements. Figure (C) is the response distribution when asked if one of the sets moved in a less natural way.

Figure 4.3: Two separate graphs. The left shows what level of quality participants assigned each animation set, from low (1) to high (5); blue is set A, red is set B, and each colour has its own separate percentage distribution. The right shows the perceived difference between the two sets; the number between the percentage and the bar is the number of participants.

80% of participants considered set B more appealing, defined here as having more personality or life in its motions, while 13.3% felt it was about the same between the two. A majority of participants reported animation set B's more fluid and natural movement as their main motivation for this. A few also pointed out that this was most clear in the turning and jogging motions in particular. A chart of this result is visible in Fig 4.1 (C).

All but two participants considered the transitions between animations to be more sudden and jerky in animation set A when compared to B as clarified in Fig 4.2 (A).

86.7% considered the movements of animation set A less natural than those of B, while the rest considered unnatural movement to be about equal between the two, as visible in Fig 4.2 (C). When queried whether they had noticed any jittering or choppy movements, 26.7% reported no observation of such phenomena, 53.3% reported more prevalent choppiness and jittering in set A, and 13.3% reported equal jittering. One person considered it more prevalent in set B. This data is represented in Fig 4.2 (B).

Figure 4.4: The distribution of answers to which set participants believed was produced by an AI

Participants were asked to rate the overall quality of the animation sets on a scale of 1-5, with 1 representing low quality and 5 high. Animation set A had a mean of 2.8 and animation set B a mean of 4.27. Notably, "experienced" participants rated animation set A slightly lower on average than non-experienced individuals; no such difference was observed in the quality responses for animation set B. The distribution of quality responses can be observed on the left of Fig 4.3. The right-hand chart of the figure shows the overall perceived difference between the two motion sets. From this data it is clear that 80% considered the animation sets to be notably different or very different from each other.

The final question of the study informed participants that one of the two options had been animated with a neural network using motion capture data and the other "by hand". All experienced participants correctly guessed B, most stating that the amount of detail in the motions and the transitions between them would be difficult to achieve "by hand". The non-experienced group had an even three-way split between the possible options: A, B and simply not knowing enough to make a judgement. Most who picked A stated the less natural motions and transitions as their motivation. Those who picked B stated similar motivations but arrived at the opposite, and correct, conclusion. Those who picked "No idea" explained that they felt too uncertain and inexperienced to make a choice. The complete percentage distribution for this question is visible in Fig 4.4.

4.2 Analysis

There was a slightly lower average for animation set A when evaluated by the experienced group. In order to ascertain whether this has statistical significance, a two-tailed Welch's t-test was performed between the results for experienced and non-experienced individuals; two such tests were performed, one on participants' opinions of the quality of set A and one on set B. The p-value required for rejection of the null hypothesis was P <= 0.05, as 5 percent is the standard within the scientific community. The final p-values of these t-tests were P ≈ 0.2276 and 0.4927 respectively. This implies that there is no statistically significant difference between experienced and inexperienced individuals' evaluation of the animation quality, at least not within our limited sample size.

A t-test was also performed to ascertain whether there was a statistically significant difference between the perceived quality of set A vs. B in total. This resulted in a p-value of approximately 2.9678e-7, which implies considerable statistical significance. The survey thereby provides strong statistical evidence that animation set B would be considered of higher quality by the general public.


Chapter 5

Discussion

One of the research questions stated in section 1.2.1 above was: "Which character animations are preferred by viewers when comparing those produced by procedural neural network methods and traditionally produced ones and why?"

The results of the questionnaire study show that the ANN-based plug-in was considerably preferred, and a t-test of the preference scores revealed considerable statistical significance despite the relatively small sample size. T-tests, as stated in chapters 3 and 4, are a common statistical analysis method and are generally considered an efficient way to show that results are unlikely to be due to chance when P <= 0.05.

When asked to explain why, most participants cited a higher level of realism and smoothness, both in the motions themselves and in the transitions between them. Several participants mentioned that its motions felt less "floaty" and "bore weight".

The other specified research question was: "In what way do 'skilled' and 'unskilled' people within 3D animation reason when trying to determine if animations have been generated by an AI? Does their reasoning differ and if so in what way?"

100% of experienced participants correctly guessed which option had been produced by an AI. Half of them indicated that the amount of minute detail in animation set B would be hard to replicate without the use of motion capture data. The other half stated the interpolation between animations, the space covered when moving and the movement speed of the character adapting to the animation as their primary reasons. Among those who considered themselves inexperienced, the answers were evenly divided amongst the three possible options: A, B and Don't know. It may be of note that some of those who chose animation set A (the "hand-made" option) stated sudden transitions and choppy movements as their reason for believing it had been made by an AI, with one participant stating their reasoning as: "Because animation set B had a more curated, handmade feel to me". The inexperienced participants who chose correctly likewise pointed to the smooth and natural movements of set B, despite arriving at the opposite conclusion to those who picked A.

The differences in the results for the second RQ show an unexpected outcome, in that the same perceptions of the overall quality of the animations led to opposite conclusions. There are several potential reasons for this. On the one hand, it could be a matter of the individual's perception and trust of AI and its capabilities as a whole, as well as their lack of knowledge of motion capture; this seems likely given the results of the experienced participants. On the other hand, the results could simply be a happenstance of this lack of knowledge, considering that every possible option was picked equally often. Or perhaps it is not about experience at all but rather a more basic, emotional motivation. One of the inexperienced participants who chose correctly did state that "...I hope I'm wrong however, since it would be nice if the better animation was man made", which suggests a certain will, at least among the inexperienced participants, to believe that something "man made" should be better than what is generated artificially.

AI has been publicly discussed more and more since the founding of the field in 1956, and especially since 2009. Over the last 30+ years it has gained successively more positive coverage, to the point where there is roughly 2-3 times more optimistic than pessimistic coverage in the New York Times over that period. However, specific concerns, including a fear of loss of control and ethical concerns, have become much more common in recent years [6]. Such concerns could be a reason that less experienced participants may be reluctant to point out the "better" animation as AI.

In a recent study, 42% of "non-expert" respondents to a survey gave plausible definitions of what an AI is, with 25% attempting to explain it in terms of robots, which was considered a less accurate understanding, though understandable given the close relation between the fields of robotics and AI [3]. Given this, it is plausible that some of the "non-experienced" participants had a lacking understanding of AI, which points to a notable weakness in the questionnaire this thesis applied, as it did not define what an AI is for the uninitiated. However, doing so may also have skewed results by giving participants new information to use for the final identification. Nevertheless, it should be a consideration in any future studies.

The work of Dehesa et al. [5] is, as stated in chapter 2, the most similar study found to this thesis, in that it applies a user study and compares animation based on simple hand-made logic to that of a deep neural network approach (specifically PFNNs). Their results indicated that their framework was consistently seen as more realistic and interesting. The results of this thesis are consistent with theirs in that the recent development of DNNs shows considerable promise over a traditional hand-made approach. However, currently both their approach and the plug-in based on [19] are considerably limited in that the network must be trained within certain narrow parameters. In our case the plug-in is currently only applicable to locomotion, rather than the whole range of human and inhuman motion the traditional approach can cover.

In summation, the results indicate that the animations produced by the Unity plug-in "Bio-Animation" were vastly preferred to the "traditional" animation set and, in conjunction with previous studies [5], indicate that ANN-based animation approaches can generate more realistic motions than previous approaches. It is notable, however, that our "traditional" motion set was not crafted by a professional and as such may not be the best basis of comparison. In addition, the plug-in is based on MoCap data and therefore has an extra level of realism in the motions themselves. Any future studies should consider this, as it may lead to different results. However, this study does clearly indicate that, at least at an amateur level, the plug-in, and by extension potentially other ANN-based approaches, could be a preferred method of applying locomotion character animation in Unity or other game engines. In addition, experienced individuals appear to be very adept at spotting an AI system while non-experienced individuals are not. The reason for this is hard to ascertain with only the collected data, but previous works indicate one potential reason to simply be confusion about what AI and MoCap truly mean, together with a potential unwillingness to admit an AI approach as "superior".

Please note that running the simulation and comparison in real time at a set location may have yielded better results. Unfortunately, due to the current COVID-19 crisis, the evaluation by study participants could not be done in person. As such, the methodology has been specifically developed with this fact in mind, to minimize differences between the numerous indeterminable hardware setups used by participants.


Chapter 6

Conclusions and Future Work

This chapter summarizes and draws the final conclusions of this work in its first section, followed by a section with recommendations for future work within the research area.

6.1 Conclusions

This study investigates and evaluates the perceived realism and appeal of new character animation techniques based on ANNs. Specifically, this study uses the software "Bio-Animation" from the Unity Asset Store and compares its animation to a set of keyframe animations produced "by hand" and using a state machine and blend tree for transitions. "Bio-Animation" is based on the work of Starke et al. [19] and applies a Deep Neural Network (DNN) to generate character animations and transitions from a set of motion capture data and user input. The two final animation sets were then recorded while responding to the same user input at the same time in real time. This video was the central stimulus in a questionnaire which was produced to determine public perception of appeal and realism and to help answer the following research questions:

• Which character locomotion animations are preferred by viewers when com- paring those produced by procedural neural network methods and traditionally produced ones and why?

• In what way do "skilled" and "unskilled" people within 3D animation reason when trying to determine if animations have been generated by an AI? Does their reasoning differ and, if so, in what way?

The results show that there was a statistically significant difference between the perceived quality of the two animation sets, with a majority finding the ANN-generated motion set to be of higher quality and preferring it over the traditionally produced one. Many of the participants stated that they considered its movements and transitions to be more realistic, natural and smooth when asked to motivate their choice. In short, participants preferred the generated motions mostly because of a higher sense of realism.


100% of the "skilled" or "experienced" individuals in our study were able to correctly guess which animation set had been generated by an AI. When asked why, most reasoned either that the level of detail in set B would be too difficult to create by hand and therefore had to be motion capture, or that the interpolation and response to changes in velocity would be too complex without the help of an AI. The "non-experienced" answers were evenly distributed across all three alternatives, which may indicate that the answers were randomly distributed, but our sample size is too small to draw any solid conclusion on that. However, the ones who correctly guessed set B stated the soft, smooth and natural movements as their motivation. Those who guessed set A mentioned similar arguments, namely A having choppier transitions and less organic movement and B having a "...more curated hand-made feel...", as motivations for believing A was made via AI. In conclusion, both experienced and inexperienced participants reasoned similarly, with a focus on how natural the transitions and movements were. However, some experienced individuals also noticed the changes to the animation based on velocity as potential evidence of an AI, while some inexperienced participants considered choppy movements to be an indicator. As such, the central difference in reasoning appears to lie in the understanding of what AI is and what it is capable of. The reasoning of inexperienced participants may also have been motivated by concerns over the development of AI, as discussed in chapter 5.

From a different perspective, these results highlight how important the public's understanding of AI and motion capture is for these types of studies. How well does the general public understand what motion capture truly is, how it functions and what the end results can look like? What about the same questions for AI? There has been some research on this previously, as touched upon in chapter 5.

6.2 Future Work

This work was severely limited by working conditions and time. As such there are many ways it could be expanded. As the amount of realism in the character motions was one of the central ways participants determined which motion set was which, it could be of interest to perform a similar experiment but with motion capture data used for the "traditional" animation set. In addition, multiple ways of transitioning between animations other than blend trees could be tried. Another keyframe animation set made by an expert on the subject could also yield new and interesting results. An increased number of participants, resulting in larger sample sizes, could also improve the accuracy of some data sets; most notably, it could reveal whether the evenly distributed responses of the inexperienced group were a coincidence or whether a larger study would receive enough results to indicate something statistically significant.

The work of Dehesa et al. [5] extracted and applied questions from both the IMI [16] and the immersion questionnaire by Jennett et al. [8], both of which are commonly used in research, the IMI especially in gaming. Unfortunately, the user study of this thesis was created and finished before that work was identified by the author. The more immersion-based questions were less relevant to this thesis, given that the video comparison was never intended as an immersive experience; however, if participants were allowed to play-test the animations in future work, they could be. The use of these tools for determining realism could have contributed considerably to synthesizing a more efficient questionnaire study and is recommended for consideration in any future work within the area.

A user study comparing the perceived realism of multiple different ANN-based character animation techniques could potentially add a more nuanced understanding of what makes certain approaches seem more natural than others. Some such comparison has already been made in [19], but it was based on data such as foot-sliding, jittering and timing, which may be misleading considering the subjective nature of realism.


References

[1] Okan Arikan and D. A. Forsyth. Interactive motion generation from examples, July 2002.

[2] D. J. Bandim Faustino. Beyond Mocap: Animating Soccer Players Based on Positional Tracking Data, 2016. dspace.library.uu.nl.

[3] Stephen Cave, Kate Coughlan, and Kanta Dihal. "Scary Robots": Examining Public Responses to AI. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’19, pages 331–337, New York, NY, USA, January 2019. Association for Computing Machinery.

[4] Simon Clavet. Motion Matching and The Road to Next-Gen Animation. GDC Vault, www.gdcvault.com.

[5] Javier Dehesa, Andrew Vidler, Christof Lutteroth, and Julian Padget. Touché: Data-Driven Interactive Sword Fighting in Virtual Reality. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI '20, pages 1–14, Honolulu, HI, USA, April 2020. Association for Computing Machinery.

[6] Ethan Fast and Eric Horvitz. Long-Term Trends in the Public Perception of Ar- tificial Intelligence. arXiv:1609.04904 [cs], December 2016. arXiv: 1609.04904.

[7] Daniel Holden, Taku Komura, and Jun Saito. Phase-functioned neural networks for character control. ACM Transactions on Graphics, 36(4), July 2017.

[8] Charlene Jennett, Anna L. Cox, Paul Cairns, Samira Dhoparee, Andrew Epps, Tim Tijs, and Alison Walton. Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies, 66(9):641–661, September 2008.

[9] Lucas Kovar, Michael Gleicher, and Frédéric Pighin. Motion graphs. In ACM SIGGRAPH 2008 classes, SIGGRAPH ’08, pages 1–10, Los Angeles, California, August 2008. Association for Computing Machinery.

[10] Yuki Koyama and Masataka Goto. Precomputed optimal one-hop motion transi- tion for responsive character animation. The Visual Computer, 35(6):1131–1142, June 2019.

[11] Björn Krüger, Jan Baumann, Mohammad Abdallah, and Andreas Weber. A Study On Perceptual Similarity of Human Motions. In Workshop on Virtual Reality Interactions and Physical Simulation (VRIPHYS), 2011. The Eurographics Association.

[12] Kyungho Lee, Seyoung Lee, and Jehee Lee. Interactive character animation by learning multi-objective control. ACM Transactions on Graphics, 37(6):180:1–180:10, December 2018.

[13] Yongjoon Lee, Kevin Wampler, Gilbert Bernstein, Jovan Popović, and Zoran Popović. Motion fields for interactive character locomotion. In ACM SIG- GRAPH Asia 2010 papers, SIGGRAPH ASIA ’10, pages 1–8, Seoul, South Korea, December 2010. Association for Computing Machinery.

[14] Soohwan Park, Hoseok Ryu, Seyoung Lee, Sunmin Lee, and Jehee Lee. Learn- ing predict-and-simulate policies from unorganized human motion data. ACM Transactions on Graphics, 38(6):205:1–205:11, November 2019.

[15] Xue Bin Peng, Pieter Abbeel, Sergey Levine, and Michiel van de Panne. Deep- Mimic: example-guided deep reinforcement learning of physics-based character skills. ACM Transactions on Graphics, 37(4):143:1–143:14, July 2018.

[16] Richard M. Ryan. Control and information in the intrapersonal sphere: An extension of cognitive evaluation theory. Journal of Personality and Social Psy- chology, 43(3):450–461, September 1982. Publisher: American Psychological Association.

[17] Sarat Kumar Sarvepalli. Deep learning in neural networks: The science behind an artificial brain, October 2015.

[18] Sebastian Starke. Bio Animation. Unity Asset Store, assetstore.unity.com.

[19] Sebastian Starke, He Zhang, Taku Komura, and Jun Saito. Neural state ma- chine for character-scene interactions. ACM Transactions on Graphics (TOG), 38(6):1–14, November 2019.

[20] Jing Wang and Bobby Bodenheimer. Synthesis and evaluation of linear motion transitions. ACM Transactions on Graphics, 27(1):1:1–1:15, March 2008.

[21] Jungdam Won and Jehee Lee. Learning body shape variation in physics-based characters. ACM Transactions on Graphics, 38(6):207:1–207:12, November 2019.


Appendix A

Supplemental Information

This appendix adds supplemental information. In this case it contains only one section, which is a series of figures of the full questionnaire participants took.

A.1 Full thesis survey


Figure A.1: Introduction and consent form to the survey


Figure A.2: Initial Questions


Figure A.3: Embedded video and follow-up questions


Figure A.4: Further questions


Figure A.5: Further questions


Figure A.6: Supplemental question in case something wasn’t covered.

Figure A.7: Final question with reveal of one of the videos being based on an AI
