A Performance Comparison for 3D Crowd Rendering using an Object-Oriented system and Unity DOTS with GPU Instancing on Mobile Devices.

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2020


MAX TURPEINEN

KTH ROYAL INSTITUTE OF TECHNOLOGY


Degree Programme in Computer Science and Engineering

Date: June 30, 2020

Supervisor: Björn Thuresson

Examiner: Tino Weinkauf



maxtu@kth.se Royal Institute of Technology

Stockholm, Sweden

Abstract

This paper addresses which programming paradigm is suitable for real-time crowd rendering from a performance standpoint, with smartphones as the target platform. Among the most prominent and intuitive programming paradigms is the object-oriented (OO) one, with data-oriented designs becoming more common in recent years. In this paper, Unity’s GameObject approach, built on an object-oriented foundation, is compared with Unity’s DOTS system using GPU instancing. Different test scenarios were arranged and built using Xcode on an iPhone 6S and an iPhone XR. The results from the different scenarios and builds are presented in multiple graphs focusing on the obtained frame rate, CPU usage and GPU usage. The DOTS system outperformed the object-oriented system in six out of eight scenarios, with the iPhone XR yielding better performance. With DOTS currently under development, several acceleration and enhancement techniques that are already available to its counterpart, such as culling or LoD, are yet to be integrated. The OO system is more robust to variation, whereas the DOTS system is better suited when the number of characters increases.

1 Introduction

Virtual crowds [33,48] have practical uses and also contribute aesthetically when trying to reenact real-life scenarios. Examples of practical uses are evacuation scenarios and crowd psychology, where one tries to explain crowd behavior as distinct from individual behavior. Representing a crowd as a group has the advantages of uniform and consistent behavior between characters and lower computational demands, whereas the disadvantage is the difficulty of giving individuals within the group unique characteristics and behaviors.

In the entertainment industry, such as in games and movies, crowds also come into play in a more aesthetic sense, conveying realism by populating areas. A group representation is preferred in real-time strategy games such as Age of Empires (Ensemble Studios, 1997-2019), Total War (The Creative Assembly, 2000-2019) and Starcraft (Blizzard Entertainment, 1998-2017), where the player mostly controls armies rather than individuals. In the sports game genre, we are most likely familiar with titles such as FIFA (Electronic Arts, 1993-2019), NHL (Electronic Arts, 1991-2019) or NBA 2K (Visual Concepts, 1999-2019), which tend to feature an audience of spectators. These are also considered crowds, but without the need for individual character navigation or collision avoidance. Regardless of field of use, there is a constant striving for improved realism, which also puts a strain on performance.

Based on the designated purpose of the crowd, some factors may be more important than others. The most frequent use of crowds can be seen in the gaming industry. In strategy games, unilateral large-scale armies are of utmost importance, meaning the focus lies on numbers rather than on credible variation. In sports games, we often find crowds acting as spectators in the stands, where numbers and variation are important but visual representation is not, since the crowds are never the point of interest. Within the sandbox genre, e.g. GTA (Rockstar Games, 1997-2014) or Minecraft (Mojang, 2009), there is a greater need for interactive environments in which crowd characters come into play. Depending on the designated platform, sought performance and graphical demand, the choice of rendering scheme is of the essence.

Smartphones [55] have gone from being used only in a business context to replacing almost every other type of phone. Due to the rapid development pace of mobile devices along with their constant performance increase, we currently have a wide range of devices at our disposal, each with its own specification. Developing mobile applications optimized for all current mobile generations is therefore a considerably cumbersome task. Looking at performance benchmark tests of different smartphone generations [49], the specifications as well as the performance vary immensely. This is one of the reasons why there is a need for either software adaptations or universal mobile applications.

complexity of the model associated with a distance threshold from the view perspective with a handful of predefined model versions.

The LoD approach is often associated with a polygonal mesh representation, where a less complex 3D model is rendered as the virtual camera moves further from the mesh. The Image-Based Rendering (IBR) [2,4,6,13,21] approach relies on having several (2D) images of the same object captured from different positions and angles to reconstruct the idea of an animated 3D object. Point Sample Rendering (PSR) [18,23] represents the view by a collection of infinitesimal view-independent surface points, each containing information such as depth, normal and color, which is recalculated when the camera view changes. Besides using a single LoD technique, several can be used in parallel to construct a hybrid approach that attempts to utilize the benefits of one approach while avoiding the drawbacks of another. The idea is to neutralize the bottlenecks of each approach and bring forth their individual benefits. Nevertheless, hybrid approaches often add a layer of management and run the risk of having to store multiple collections of the same object. Switching between representations requires communication between hardware components and might stress their bandwidth.

Despite all attempts to construct the optimal rendering scheme, there is no single one that is strictly better than the others in every aspect. The choice of a specific approach therefore depends on the desired result and the available resources. Ideally, one could render endless amounts of characters with arbitrary intelligence and visual satisfaction, but as of today that is not the case. When it comes to rendering crowds, we still struggle with trade-offs between quantity, quality, memory usage and environmental adaptation.

Apart from the different standard approaches, further potential computational improvements can be made by looking at e.g. frustum culling [9], memory usage [13] and alternative data representations [50], as well as visual improvements through manipulation of lighting [19], shadowing [41] and crowd variability (clothing) [28].

An alternative data representation currently under development is Unity’s Data-Oriented Technology Stack (DOTS) [46, 47], which adds a data-oriented (DO) paradigm on top of their more commonly known object-oriented (OO) game engine. The justification for a data-oriented design and Unity DOTS is the CPU performance gain through the Entity Component System (ECS), the C# Job System and the Burst Compiler. What is considered in neither the general OO approach nor the DOTS approach is the utilization of the GPU for improved performance. One suggestion that has been made is to offload the CPU using GPU instancing (2.3.3).

In this paper, the aim is to address the following question: Which paradigm for 3D crowd rendering, based upon object-oriented programming and DOTS with GPU instancing respectively, gives the most efficient performance on a smartphone device of today?

2 Related Work

This section of the paper introduces some relevant concepts of real-time crowd rendering together with the more prominent and controversial techniques. Firstly, a few different crowd animation approaches are introduced (2.1), followed by different character representations (2.2). Lastly (2.3), some of the more modern concepts are touched upon, which are later implemented in the Methodology and Implementation section.

2.1 Character Animation

Rendering refers to the way of presenting what is displayed on the screen at a given time, which is called a render. A sequence of renders is what adds up to an animation. When animating, one preferably aims for a display frequency of at least 30 frames per second. The result of the animation is in many cases heavily dependent on the character representation.

There are a few approaches to crowd rendering, such as flow-based, entity-based and character-based ones. Along with the different approaches there is a range of different aspects to consider, depending on the accessible resources and the desired result.

When rendering crowds on an individual basis, a handful of primitives have been covered throughout the years, and in every approach there is a trade-off between quantity, quality, efficiency and versatility. Creating highly accurate characters with fine-grained appearance [1, 20,29,38] is impractical when trying to run in real time with a demanding environment or larger crowds. Therefore, attempts to make improvements through adaptations in e.g. animations have been made, with varied results.

2.1.1 Skeletal animation. In recent years, the most explored way of doing animations is skeletal animation [27], where the base structure of the character consists of a bone hierarchy, with each bone linked to parts of the character’s skin or mesh. Animating using a skeletal basis is known as skinning, and the process of constructing this setup is often referred to as rigging. Rigging is usually done manually, but tools have recently been created to speed up the process [3,32,35].

The movement of the bones decides how and how much the mesh moves, by carefully mapping each bone to regions of the mesh. Multiple bones can affect the same mesh, which is why each individual bone has to be properly weighted against the mesh [11]. Between frames, the bones are transformed from one keyframe to another, and given an arbitrary time of display, the pose can be interpolated accordingly.
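The weighting and keyframe interpolation described above can be illustrated with a minimal linear blend skinning sketch. It is not taken from the implementation in this paper: the 2D simplification, the function names and the sample values are assumptions made for illustration, and interpolating a single angle linearly stands in for the rotation interpolation a real engine would use.

```python
import math

def lerp(a, b, t):
    """Linearly interpolate between two scalars for t in [0, 1]."""
    return a + (b - a) * t

def rotate_about(point, pivot, angle):
    """Rotate a 2D point around a pivot by an angle in radians."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)

def skin_vertex(vertex, bones, weights):
    """Blend the positions produced by each bone, using per-bone weights that sum to 1."""
    x = y = 0.0
    for (pivot, angle), w in zip(bones, weights):
        bx, by = rotate_about(vertex, pivot, angle)
        x += w * bx
        y += w * by
    return (x, y)

# Two keyframes store the elbow angle; the pose in between is interpolated.
key_a, key_b = 0.0, math.pi / 2
angle = lerp(key_a, key_b, 0.5)      # pose halfway between the keyframes

upper_arm = ((0.0, 0.0), 0.0)        # (pivot, angle): this bone stays still
forearm = ((1.0, 0.0), angle)        # this bone rotates around the elbow at x = 1

# A vertex near the elbow is influenced equally by both bones.
print(skin_vertex((1.2, 0.0), [upper_arm, forearm], [0.5, 0.5]))
```

A vertex weighted entirely to one bone follows that bone rigidly, while mixed weights give the smooth bending around joints that makes careful weighting necessary.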

2.1.2 Per-vertex animation. Another animation approach is the per-vertex approach where, unlike the skeletal approach with its bones, mesh deformations are based on the positions of the vertices themselves. In each keyframe, the vertex positions are stored, which can later be used to obtain new deformations using interpolation [39,54].
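As a rough illustration of per-vertex interpolation, the sketch below blends the stored vertex positions of the two keyframes surrounding a given time. The data layout (a list of (x, y, z) tuples per keyframe) and the function names are assumptions chosen for this example, not the format used by any particular engine.

```python
def interpolate_vertices(frame_a, frame_b, t):
    """Blend two keyframes (lists of (x, y, z) vertex positions) at t in [0, 1]."""
    return [
        tuple(a + (b - a) * t for a, b in zip(va, vb))
        for va, vb in zip(frame_a, frame_b)
    ]

def sample_animation(keyframes, times, time):
    """Find the keyframes surrounding `time` and interpolate between them."""
    if time <= times[0]:
        return list(keyframes[0])
    for i in range(len(times) - 1):
        if times[i] <= time <= times[i + 1]:
            t = (time - times[i]) / (times[i + 1] - times[i])
            return interpolate_vertices(keyframes[i], keyframes[i + 1], t)
    return list(keyframes[-1])  # past the last keyframe: hold the final pose
```

Every keyframe stores the full set of vertex positions, so the memory cost grows with both mesh resolution and animation length.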

With this approach, the animator is given more freedom to alter movements, not being restricted to bones. This is especially useful if facial expressions or hand gestures are needed, which can be hard to achieve with a limited set of bones. Ulicny et al. [50] used the idea of switching between keyframes containing pre-computed deformed meshes, creating the perception of animation. The major drawback of this approach was the large amount of memory needed to store the pre-computed meshes.

2.1.3 Animation Individuality. By individuality in regard to crowds, we refer to diversity in appearance. This goes for variation in animations, but also in clothing etc. When the number of characters in a scene increases, more distinct animations become necessary to maintain the perception of each character being a distinct individual. One attempt to handle this is to add a random offset to the animation of each character, avoiding obvious synchronization [31].
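The random offset idea can be sketched in a few lines. The function names, the uniform distribution and the fixed seed are assumptions for illustration; the cited work may choose offsets differently.

```python
import random

def assign_offsets(character_count, clip_length, seed=0):
    """Give every character its own start offset (in seconds) into a shared clip."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, clip_length) for _ in range(character_count)]

def animation_time(global_time, offset, clip_length):
    """Time within the looping clip at which one character is sampled."""
    return (global_time + offset) % clip_length

offsets = assign_offsets(character_count=5, clip_length=2.0)
print([round(animation_time(1.5, o, 2.0), 2) for o in offsets])
```

All characters still share one clip, so the memory cost stays constant while the crowd no longer moves in lockstep.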

The animation bottleneck seldom lies in the memory footprint, but rather in the CPU computations needed to blend all the different animations to convey individuality amongst all the characters. For most animation representations, the different poses are stored in memory as sets of matrices, which transform the bones or vertices at runtime. Instead of sending the keyframe matrices from the CPU to the GPU, which is costly for the CPU-GPU bandwidth, these can be preloaded directly to the GPU, consuming GPU memory instead [37].

2.2 Level-of-detail representations for characters

A way of improving crowd rendering performance is to use the level-of-detail (LoD) technique [9,15], where the character representation is based on its contribution to the scene [16,34]. The principle is that as a character gets further from the viewer, it is represented in less detail, allowing a cheaper approach to be used. Recent adaptations have gone beyond using the concept of LoD only for appearance, applying it also to sounds and animations [50].

LoDs can be used either as continuous LoDs [14] or discrete LoDs [8], where the latter is the most commonly used. Continuous LoDs, e.g. progressive meshes, imply that the mesh is changed as the virtual camera moves, using a scheme to refine the mesh as we move closer and simplify it as we move further away. By moving the virtual camera back and forth, the appearance might change over time unless the transitions are rigid or stored in memory. The latter would of course be redundant, making the discrete approach more perceivable and manageable.

As for discrete LoDs, different predefined representations are set up to be used at certain distance thresholds from the virtual camera. Depending on the circumstances, one can choose to utilize as many LoDs as possible or needed. Ulicny et al. [50] were able to instantiate more than a thousand characters in their scene using different discrete LoD representations.
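Selecting a discrete LoD by distance threshold can be sketched as follows. The threshold values and mesh names are hypothetical and only serve to show the lookup; real systems often use screen-space size rather than raw distance.

```python
import math

# Hypothetical thresholds: (maximum distance, mesh name), nearest level first.
LOD_LEVELS = [(10.0, "lod0_full"), (25.0, "lod1_half"), (60.0, "lod2_low")]
FALLBACK = "lod3_lowest"

def select_lod(camera_pos, character_pos, levels=LOD_LEVELS, fallback=FALLBACK):
    """Return the first predefined representation whose threshold covers the distance."""
    distance = math.dist(camera_pos, character_pos)
    for max_distance, mesh in levels:
        if distance <= max_distance:
            return mesh
    return fallback

print(select_lod((0.0, 0.0, 0.0), (0.0, 0.0, 30.0)))
```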

A common artifact of the LoD approach is the perceptible degradation when switching from one LoD representation to another, often called the popping artifact [4]. When moving in a scene where LoDing is not done carefully and no further measures are taken, one can clearly observe the switching between models. To avoid this, one can add a delay to the model transition or let both models be displayed in tandem for a short period.
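Displaying both models in tandem is commonly realized as a cross-fade. The sketch below computes the blend weights during such a transition; the linear fade and the function name are assumptions, not a description of a specific engine feature.

```python
def crossfade_weights(time_since_switch, fade_duration):
    """Return (old_model_alpha, new_model_alpha) while both models are drawn together."""
    t = min(max(time_since_switch / fade_duration, 0.0), 1.0)
    return (1.0 - t, t)

# Halfway through a 0.4 s fade both models are drawn at half opacity.
print(crossfade_weights(0.2, 0.4))
```

During the fade both models are rendered, so the measure trades a short increase in rendering cost for a less noticeable switch.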

2.2.1 Polygon-based techniques. As mentioned before, continuous LoDs are not very suitable when it comes to animations [24]; therefore the use of static meshes is preferred (Figure 1). Schmalstieg and Fuhrmann [36] chose to divide the mesh into single- or multiple-bone weighted regions for simplification, later reassembling them strategically. For approaches like these, the main issue is the need to predefine sets of animations, limiting the dynamicity in variation.

Figure 1. A representation of how LoDs are used and represented. As the camera gets further away from the character, the LoD level increases and the complexity decreases. The percentage indicates how much of the original mesh the character consists of.

Landreneau and Schaefer [22] introduced an approach to collapse mesh edges, decided by the given fault tolerance to positioning and weights of each edge. Through this, they were able to construct new vertices and weights through interpolation, achieving new animation keyframes from the otherwise limited set.

2.2.2 Point-based Techniques. Another primitive for representing static meshes is the point-based one, which was initially suggested by Levoy and Whitted [23]. It uses the idea of having the view be represented by a collection of infinitesimal view-independent surface points, each containing information such as depth, normal and color. Each point, along with its information, is then recalculated when the camera or lighting changes. This approach works well in the far distance, not having to render polygons [18] which cannot be seen anyhow.

2.2.3 Image-based Techniques. One of the more commonly used primitives is images. This approach, like the point-based one, is best suited for characters further away from the camera due to the lack of detail. A comparison between these two approaches was made by Millan and Rudomin [30]. The idea is to have several (2D) images taken of the same object from different angles to convince the viewer, when the camera moves, that it is an actual 3D object. These images are often known as billboards, or imposters when talking about characters.

The images can be generated either dynamically or statically. Dynamic generation means that images of an object are generated at runtime and kept for as long as they are valid, as decided by a given heuristic. Aubel et al. [2] were the ones to come up with an initial version, re-using snapshots for more than one frame. With the static approach, one captures the images in advance, storing them for use when needed. Tecchia et al. [40] were the first to try this, later also enhancing the approach by adding shadows derived from the silhouette of the character [38], which only worked convincingly on planar surfaces. A major drawback, besides the lack of detail, is the memory needed to store all the different angles of the model at each frame of each animation cycle.

2.2.4 Hybrid techniques. As the different approaches each have their own benefits and drawbacks, several attempts to shift or eliminate the bottleneck of a particular approach have been made through hybrids [4,5,10,21]. The most intuitive hybrid representation uses static meshes close to the virtual camera and transitions to static imposters further away, which is what Dobbyn et al. [12] did, naming them Geopostors. Geopostors show an increase in performance compared to the purely polygon-based representation and an aesthetic improvement compared to the imposter one. Along with Geopostors, a number of other hybrid approaches have been introduced [4].

2.3 Current Developments

As development is constantly ongoing but rarely accessible to the general public, one usually has to wait until stable and tested versions are released. This is certainly more accurate for hardware than for software, even though new smartphones are released every year. With the continuous hardware progression, the software needs to make full use of it through better-performing paradigms like DOTS or task balancing schemes like GPU instancing, in comparison to the more commonly used paradigm of object-oriented programming.

2.3.1 Smartphones. Although smartphones can be treated as a form of computer [55], there are still a few practical differences. Due to the desired portability and size of smartphones, they cannot be provided with the same consistent power supply or the equivalent processing power. Hence, they use batteries and do not have the same need for a fan or a heat sink. When instead on the verge of overheating, smartphones tend to rely on thermal throttling, forcing applications and processes to shut down, reducing the workload and dissipating the heat over time. The size of a smartphone is made possible by the System on a Chip (SoC) architecture, which packs all the components into a single chip. The main processing components in any computerized system are usually the CPU and the GPU. In the smartphone case, the CPU is heavily pressured, having to maintain active apps [17] and switch between them. The GPU is instead designed to handle more specific tasks related to graphics. Due to the compromises of the SoC and battery power, mobile applications usually run at 30 fps rather than a higher frame rate.

2.3.2 Object-Oriented Programming vs Data Oriented Tech Stack. Recently, the transition towards data-oriented design has had an upswing in the game industry due to its situational performance benefits, which are crucial to e.g. smartphones. Among the adopters is Unity, which is currently developing its DOTS [46] system, aiming to give high performance by default. The system utilizes the C# Job System, allowing for multithreading, the Entity Component System (ECS), enforcing performant code, and the Burst compiler, producing highly optimized machine code. Besides the automated parallelization and optimization of machine code, writing code according to ECS standards is what binds it all together. The traditional object-oriented way of creating a game in Unity is by creating GameObjects [44] and attaching MonoBehaviours [45], giving them functionality like rendering, physics and movement. With ECS, the game is instead split up into three different parts: Entities, Components and Systems. Entities are used to group together components, similar to the traditional GameObjects, but are not as resource demanding. Components are containers that store plain data and, unlike GameObjects, lack logic. Systems are what operate on the entities that have a specific set of components, adding the logic. Just like with GameObjects, we can change entities’ component data using systems, e.g. a rendering system to change the rendering component of an entity (Figure 2). Even though the outcome seems to be the same, the computational strain is different, since the same system can update several different entities with the same components, whereas each GameObject has to handle its own logic for each separate functionality. Furthermore, since component data of the same type is labeled the same, it can be grouped into the same chunk of memory, allowing for easier and faster memory lookups by the corresponding system.
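The difference between the two styles can be sketched conceptually as below. This is not Unity's API: the class and function names are hypothetical, and Python lists only mimic the idea of contiguous component storage without delivering the cache or multithreading benefits that DOTS targets.

```python
from dataclasses import dataclass

# Object-oriented style: every object carries its own data and its own logic.
@dataclass
class Character:
    x: float
    speed: float

    def update(self, dt):
        self.x += self.speed * dt

# ECS style: an entity is just an id, components are plain data arrays.
class World:
    def __init__(self):
        self.position_x = []   # one component type stored together
        self.speed = []        # another component type stored together

    def create_entity(self, x, speed):
        self.position_x.append(x)
        self.speed.append(speed)
        return len(self.position_x) - 1   # the entity id is the array index

def movement_system(world, dt):
    """A single system updates every entity holding position and speed components."""
    for i in range(len(world.position_x)):
        world.position_x[i] += world.speed[i] * dt

characters = [Character(0.0, 1.0), Character(2.0, 3.0)]
for character in characters:
    character.update(0.5)

world = World()
world.create_entity(0.0, 1.0)
world.create_entity(2.0, 3.0)
movement_system(world, 0.5)

print([c.x for c in characters], world.position_x)
```

Both variants compute the same positions. The difference lies in where the logic lives and how the data is laid out, which is what allows a system to process many entities in one pass.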

Figure 2. To the left is a representation of Unity’s traditional way of creating games using GameObjects with attached MonoBehaviours, each handling its own logic. To the right is the ECS representation using Entities, where several Components can be handled by the same system.

2.3.3 GPU Instancing. Another welcome proposal in regard to smartphones is that of balancing the burden between the CPU and the GPU. One way of offloading the primarily exposed component, the CPU, is through GPU instancing [13,25]. GPU instancing utilizes more of the GPU by letting the CPU preprocess data, converting it into a UV texture which can later be handled by the GPU instead. A UV texture is essentially a matrix consisting of RGB colors, where different quantities can be encoded depending on what the axes represent. As an example, one could express a character’s position by converting its x-, y- and z-coordinates into RGB colors, having the x-axis of the UV texture represent all vertices and the y-axis the corresponding positions.
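The encoding of positions as colors can be sketched as follows. The layout (columns are vertices, rows are animation frames), the 8-bit quantization and the function names are assumptions for illustration; an actual implementation may use floating-point textures and a different layout.

```python
def encode_position(pos, lo, hi):
    """Map an (x, y, z) position within [lo, hi] to an 8-bit (r, g, b) texel."""
    return tuple(round((c - lo) / (hi - lo) * 255) for c in pos)

def decode_position(texel, lo, hi):
    """Inverse mapping, as a vertex shader would apply after sampling the texture."""
    return tuple(lo + (c / 255) * (hi - lo) for c in texel)

def bake_texture(frames, lo, hi):
    """Rows are animation frames and columns are vertices (an assumed layout)."""
    return [[encode_position(v, lo, hi) for v in frame] for frame in frames]

# Two frames of a two-vertex mesh with coordinates in [-1, 1].
frames = [
    [(-1.0, 0.0, 1.0), (0.0, 0.0, 0.0)],
    [(1.0, 1.0, 1.0), (0.5, 0.0, 0.0)],
]
texture = bake_texture(frames, -1.0, 1.0)
print(texture[1][1], decode_position(texture[1][1], -1.0, 1.0))
```

The CPU performs this baking once, after which the GPU can look up each vertex position per frame without further CPU involvement. With 8-bit channels the decoded position is only accurate to within one quantization step.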

3 Methodology and Implementation

In this part of the paper we go through how the two different systems, object-oriented GameObjects with MonoBehaviour and DOTS with GPU instancing, were implemented and how they were tested. Onwards, the object-oriented implementation will be referred to as the OO system and the DOTS with GPU instancing implementation as the DOTS system.

To answer the research question, research papers and approaches relating to crowd rendering were studied to get a general overview and understanding of the topic (2. Related Work). For starters, all non-real-time rendering techniques, like ray tracing, could be excluded. One of the most controversial approaches that I came across in regard to crowd rendering with performance in mind was DOTS (2.3.2). By using a data-oriented approach, one can advantageously manage several components within the same update call, handled in parallel through worker threads. This speaks for the increasingly common idea of relying less on hardware and depending more on the software archetype for performance [7]. Since this is an ongoing development and different from the typically used approaches, it seemed like a good choice to implement and investigate further. With DOTS being under development, getting a reference from one of the more typical approaches is an intuitive way of conducting a comparison study. As Unity also has its traditional object-oriented way of creating games and rendering crowds, building both on the same engine would do the two approaches the most justice. Additionally, GPU instancing is introduced to the DOTS implementation, not interfering with the reference implementation, yet making an attempt to balance the workload for improved results.

Both implementations were developed in Unity version 2019.3.0f3 Personal. The two phones used for the tests were an iPhone 6S (32 GB) [51] and an iPhone XR (64 GB) [53], both running iOS 13.4.1. The software (compatible with Unity) used to build on the iPhones was Xcode [52] version 11.4.1. As the DOTS implementation was produced during the development of DOTS, some packages were used in preview versions. The noteworthy package versions related to the DOTS system were Burst 1.2.1 [42] and Entities 0.5.1 [43]. To compensate for the different screen resolutions of the iPhone 6S (1334 x 750) and the iPhone XR (1792 x 828), the render scale of the render pipeline was reduced to 0.81 and 0.665 respectively.
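The effect of the two render scales can be checked with a short calculation. It assumes that the render scale is applied to both the width and the height of the render target, which is an assumption about the pipeline rather than something stated above.

```python
def rendered_pixels(width, height, render_scale):
    """Pixels rendered per frame when both axes are scaled by render_scale."""
    return (width * render_scale) * (height * render_scale)

iphone_6s = rendered_pixels(1334, 750, 0.81)
iphone_xr = rendered_pixels(1792, 828, 0.665)

print(round(iphone_6s), round(iphone_xr))
print(f"relative difference: {abs(iphone_6s - iphone_xr) / iphone_6s:.4%}")
```

Under this assumption both phones render roughly 656,000 pixels per frame, which suggests that the scales were chosen so that the per-frame pixel workload is comparable between the devices.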

The two implementations were developed in parallel with the aim of constructing scenes that resemble each other as closely as possible in every aspect (Figure 3), differing only in the paradigm used. The reasoning was for the tests to be executed as systematically as possible, with each test derived from the previous one, changing or adding only a single element in each iteration. Each change in the scenery was made under the assumption that the OO system would be more robust when adding variation and the DOTS system when adding numbers. Since the OO system performed better than expected, this later led to the tests with the scaled refinement of the mesh.

The reason for using two phones of different generations, with a considerable gap in specification and performance yet both plausibly in use today, was to get an idea of how current phones perform with the different approaches. The idea is not to compare the performance between the two smartphone models, but rather to gain insight into the performance range of relevant smartphones of today. How well an approach performed on each phone could be seen when the application was built and run with the phone plugged into the computer running Xcode. The performance values gathered in Xcode were frame rate (fps), CPU usage and GPU usage.

Figure 3. The OO system running to the left and the DOTS system running to the right, trying to mimic the same scene. The coloring differs between the two approaches since ECS does not support the same shaders for the moment.

All scenes were built from the same base with the basic main camera, directional light and skybox that are included by default when creating a new scene in Unity. A smaller plane forming the ground and a simple box acting as the focal point for the crowd were also added to the scene. Lastly, an empty GameObject was added in the corner of the plane, to which the corresponding spawner script was added depending on the system being tested. Besides all objects in the scene having the same positions, the different spawn objects also had the same attributes attached to their scripts: amongst other things, the prefab itself, the character count along the x- and z-axis, the grid step in x and z, the focal point, as well as some noise in each axis.
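How a spawner with these attributes could place characters is sketched below. This is not the spawner script used in the study: the function names, the uniform noise and the convention of turning each character towards the focal point are assumptions for illustration.

```python
import math
import random

def spawn_positions(origin, count_x, count_z, step_x, step_z,
                    noise_x, noise_z, seed=0):
    """Lay characters out on a count_x by count_z grid with per-axis jitter."""
    rng = random.Random(seed)
    positions = []
    for ix in range(count_x):
        for iz in range(count_z):
            x = origin[0] + ix * step_x + rng.uniform(-noise_x, noise_x)
            z = origin[1] + iz * step_z + rng.uniform(-noise_z, noise_z)
            positions.append((x, z))
    return positions

def facing_angle(position, focal_point):
    """Yaw in radians that turns a character at `position` towards the focal point."""
    return math.atan2(focal_point[0] - position[0], focal_point[1] - position[1])

crowd = spawn_positions((0.0, 0.0), 5, 6, 1.0, 1.0, 0.2, 0.2)
angles = [facing_angle(p, (10.0, 3.0)) for p in crowd]
print(len(crowd), round(angles[0], 3))
```

Using the same parameters and seed in both systems would give identical layouts, which is in line with the aim of keeping the two scenes as similar as possible.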

The implementations were tested against each other on each phone separately, checking for when the frame rate dropped below 30 fps, indicating the upper limit. Even though 30 fps might be considered low, with most current applications moving towards 60 fps or higher, 30 fps is a good benchmark considering that OO is an old concept [4] and phones are not the optimal rendering platform. Every sample of each build was run once and the measurements were taken five minutes into the run to observe potential thermal throttling (2.3.1).

In the first test scenario, the same model and animation were used for all characters. The number of characters was increased in each respective case until the frame rate dropped below 30 fps. Noteworthy is that each measurement was obtained from a single discrete build, meaning the exact break point was not determined but rather estimated between two builds, due to the time consumption of a build. For the second test scenario, nine different crowd characters (Figure 4) along with nine different animations were used, looking at how variation in appearance affects the performance of each system. For the last two test scenarios (assumed to be better suited for the DOTS system), only one variation of crowd character was used, but scaled in detail by reducing the polygon count to 20% and 10% of the original mesh, respectively, using an external asset. This allowed more characters to be spawned, testing for less variety but larger scales while trying to maintain 30 fps. The different test compositions can be viewed in Figure 5.

Figure 4. The OO system running to the left and the DOTS system running to the right, having nine different characters, each with its own animation.

4 Result

The following graphs (Figures 6-13) show the measurement data in frames per second (fps), processing time in milliseconds for the CPU at each frame (CPU) and processing time in milliseconds for the GPU at each frame (GPU) for each specific test. The measurements are taken at a single frame after having the systems run for five minutes to observe potential thermal throttling (2.3.1). In every graph, each sample is presented as bars in groups of three, where the fps bar (first) is colored blue, the CPU bar (second) dark gray and the GPU bar (third) light gray. The coloring is done with the purpose of focusing primarily on the fps, since it is dictated by the combined performance of the CPU and GPU.

The targeted benchmark in all the tests is set to 30 fps (2.3.1), which is also why, whenever the system overperforms, all bars for that sample are capped at the 30 mark as an indication. Similarly, whenever the system underperforms, the bars are capped at the bottom mark of the graph, the exception being the first time the system drops below 30 fps. Note that this does not apply to the last test (Figure 13), where we can see how the frame rate of the OO system keeps dropping as the DOTS system maintains its performance.

Figure 5. The colored squares indicate what system was tested for which settings, with the number of characters represented on the y-axis and the smartphone and test scenario represented on the x-axis. “#” means the test scenario with the same character and “All” includes variation in appearance and animation.

In all the graphs, two different y-axes are used, denoted “frames per second (fps)” and “milliseconds (ms)”, since fps and CPU/GPU processing time have different units. Having both units visible simultaneously makes it easy to compare the performance of the two systems side by side and to distinguish patterns and correlations in the different measurements. Whenever the processing times of both the CPU and the GPU are kept under 33 ms, the frame rate holds at its cap of 30 fps. Conversely, whenever a value exceeds 33 ms, one of the processors is being overworked and a breakpoint has been reached. The x-axis shows the number of characters at the very bottom and, above it, the rendering system tested for that particular number of characters. Note that for all the graphs regarding the iPhone 6S the numeric values (y-axis) range from 22 to 42, and for the iPhone XR they range from 18 to 52. The number of characters and samples (x-axis) also varies across the graphs. The following graphs are presented in pairs, where the same test is executed on different devices, with the iPhone 6S coming first and the iPhone XR second.
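The relation between the 33 ms limit and the 30 fps cap can be made explicit with a small calculation. The sketch below is only a simplified model and not part of the measured setup: it assumes that CPU and GPU work overlap so that the slower processor sets the frame time, and the function names `budget_ms`, `frame_rate` and `bottleneck` are hypothetical.

```python
def budget_ms(target_fps=30.0):
    """Per-frame time budget in milliseconds for a given target frame rate."""
    return 1000.0 / target_fps


def frame_rate(cpu_ms, gpu_ms, cap_fps=30.0):
    """Estimate fps from per-frame CPU and GPU times.

    Assumption (not from the thesis): CPU and GPU work overlap, so the
    slower of the two determines the frame time.
    """
    if cpu_ms <= 0 or gpu_ms <= 0:
        raise ValueError("processing times must be positive")
    frame_ms = max(cpu_ms, gpu_ms)
    return min(cap_fps, 1000.0 / frame_ms)


def bottleneck(cpu_ms, gpu_ms, target_fps=30.0):
    """Return the processors whose frame time exceeds the budget."""
    limit = budget_ms(target_fps)
    return [name for name, ms in (("CPU", cpu_ms), ("GPU", gpu_ms)) if ms > limit]


if __name__ == "__main__":
    print(round(budget_ms(), 1))   # budget at 30 fps, about 33.3 ms
    print(frame_rate(33, 32))      # both within budget, frame rate stays capped
    print(bottleneck(40, 20))      # CPU over budget
```

Under this model a sample with 33 ms on the CPU and 32 ms on the GPU sits just inside the budget, which matches the reading that such a sample is close to its breakpoint.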

Throughout the graphs, regardless of test or device, the far right gives a good indication of where the best-performing system starts dropping. On closer inspection, one will also see that the CPU and GPU times of the DOTS approach are always in line with each other when it reaches its limit. In the iPhone 6S test for number of characters (Figure 6), the DOTS system sustains its frame rate for several more characters added to the scene than the OO system. The OO system already starts dropping in frame rate at 30 characters, whereas the DOTS approach maintains a solid frame rate of 30 fps up until 50 characters. At 45 characters, however, the CPU hits 33 ms in processing time and the GPU 32 ms, meaning the system is closing in on its limit, which also explains the considerable drop to 27.5 fps in the next sample.

Figure 6. The fps-, CPU- and GPU performance on the iPhone 6S given the number of characters for each system.

For the same test on the iPhone XR (Figure 7), unlike on the iPhone 6S, the DOTS system starts dropping frames ahead of the OO system. At 120 characters the difference might seem substantial, but note that the two systems are only three fps apart.

Adding variety to the scene on the iPhone 6S (Figure 8) results in a much quicker fall in fps for both systems, with the greater fall for the DOTS system. Comparing the values for the OO system with the ones without variation (Figure 6), the difference is much smaller than for the DOTS system. This is expected, but bear in mind that the CPU and GPU values are probably even higher for the OO system at 50 characters than for the DOTS system.

Yet again, the OO system outperforms the DOTS system on the iPhone XR with variation included (Figure 9). Strangely enough, both systems show better values than in the test without variation (Figure 7), although they were expected to perform worse due to the added complexity.


Figure 7. The fps-, CPU- and GPU performance on the iPhone XR given the number of characters for each system.

Figure 8. The fps-, CPU- and GPU performance on the iPhone 6S having nine different characters with nine different animations.

Figure 9. The fps-, CPU- and GPU performance on the iPhone XR having nine different characters with nine different animations.

Reducing the mesh of the crowd characters increases the number of characters that can be rendered, even on the iPhone 6S (Figure 10). With each character reduced to a fifth in size, one could make the bold assumption that the system should now be able to handle five times as many characters. This proves to be the case, although it cannot be taken for granted. Both systems even happen to perform better.

Figure 10. The fps-, CPU- and GPU performance on the iPhone 6S having multiple of the same character, reduced in mesh down to 20%.

Downscaling the complexity of the characters to 20% on the iPhone XR (Figure 11) breaks the trend of the OO system outperforming the DOTS system on that device. Instead, the DOTS system outperforms the OO system by a clear margin. At 600 characters we can see how the DOTS system reaches its exact breakpoint: 30 fps, with 33 ms for both the CPU and the GPU. Compared with the test using the original mesh (Figure 7), the OO system handles about four times as many characters with the same performance, whereas the DOTS system surpasses the OO system and handles about six times as many characters.

Figure 11. The fps-, CPU- and GPU performance on the iPhone XR having multiple of the same character, reduced in mesh down to 20%.


As the mesh goes down to 10% on the iPhone 6S (Figure 12), the OO system can no longer keep up with the DOTS system, collapsing at the other end of the graph. Whereas the DOTS system rendered almost 50% more characters than the OO system in the first test on the iPhone 6S (Figure 6), it now renders twice as many characters while maintaining the frame rate of 30 fps.

Figure 12. The fps-, CPU- and GPU performance on the iPhone 6S having multiple of the same character, reduced in mesh down to 10%.

In the last test, on the iPhone XR with the characters at 10% of the original mesh (Figure 13), we can trace the values for the OO system. The CPU processing time per frame steadily increases as the frame rate goes down. Here one can clearly see how dependent the OO system is on the CPU, in contrast to the end of every graph, where the CPU and GPU are kept balanced whenever the DOTS system reaches its limit.

Figure 13. The fps-, CPU- and GPU performance on the iPhone XR having multiple of the same character, reduced in mesh down to 10%.

5 Discussion

This thesis has attempted to address the research question: Which paradigm for 3D crowd rendering, based upon object-oriented programming and DOTS with GPU instancing respectively, gives the most efficient performance on a smartphone device of today?

From the results one can deduce that the DOTS system outperforms the OO system in almost every test case, the exceptions being the tests executed on the iPhone XR with the original mesh, with and without variation in characters and animation. Therefore, given the initial premise that both the iPhone 6S and the iPhone XR are considered smartphone devices of today, one cannot conclude that one system is strictly superior in all regards on today’s smartphones.

The initial expectation was for the OO system to perform worse than the DOTS system overall, mainly due to it treating each crowd character as a unique object, whereas the DOTS system handles components of the same type simultaneously as groups. This of course speaks for a potential downside for the DOTS system as variation increases whereas the OO system should more or less perform the exact same.
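Why variation could hurt a grouping approach more than a per-object approach can be illustrated with a toy count of render submissions. This is a deliberately simplified sketch and not how Unity batches skinned characters internally: it assumes that every distinct (mesh, animation) pair needs its own instanced batch, and the names `draw_calls_per_object` and `draw_calls_instanced` as well as the crowd contents are hypothetical.

```python
def draw_calls_per_object(crowd):
    """Per-object model: one submission per character, however similar they are."""
    return len(crowd)


def draw_calls_instanced(crowd):
    """Grouped model (assumption): one submission per distinct (mesh, animation) pair."""
    return len(set(crowd))


if __name__ == "__main__":
    # 50 copies of one character versus 50 characters spread over 9 variants.
    uniform = [("mesh0", "anim0")] * 50
    varied = [(f"mesh{i % 9}", f"anim{i % 9}") for i in range(50)]

    for name, crowd in (("uniform", uniform), ("varied", varied)):
        print(name, draw_calls_per_object(crowd), draw_calls_instanced(crowd))
```

In this model the per-object count is unaffected by variation, while the grouped count grows with the number of distinct variants, which is the intuition behind the expectation stated above.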

Despite neither system coming out ahead in all the tests, the DOTS system could be considered more promising, seeing how it outperforms the OO system in six out of eight tests. Moreover, in all tests where the DOTS system is inferior, both systems drop below 30 fps at the same character count, implying that the performance difference is marginal. Additionally, whenever the DOTS system outruns the OO system, the results are clearly in favor of the DOTS system. Judging from the tests where the mesh is reduced, the DOTS system renders almost twice as many characters as the OO system before dropping below 30 fps. Noteworthy is also how, whenever the DOTS system is pushed to its limit, the CPU and GPU usage is balanced, since GPU instancing lets the CPU offload work to the GPU. In contrast, whenever the OO system is pushed to its limit, the CPU usage skyrockets whereas the GPU usage stays comparatively low.

Even though we denote the iPhone 6S and iPhone XR as smartphones of today, as of this writing the variations of the iPhone 11 are already available, with improved CPU and GPU that are likely to perform better in the same tests. Besides the general improvement in hardware, a plausible paradigm shift from object-oriented designs to data-oriented designs might bring hardware modifications, e.g. in memory, that give even further improvements in performance. The same can be said about interchangeable processing between the CPU and GPU, which would balance the load and optimize processing time.

5.1 Methodology criticism

Throughout the majority of the samples and tests, the measurements proved to be reasonably stable, only fluctuating around the presented values. Even so, the measured data could have been more accurate had each test been run several times on the same scene, with an overall average presented for each sample. The reason for running only a single execution per sample was the resources and time needed to run the samples. Each sample had to be built for one smartphone and one system at a time, meaning that with each modified variable, another build had to be set up. As we can see in Table 1, more than 80 samples had to be run just to collect the necessary breakpoints. This does not include the tests where four and nine different characters and animations, respectively, were tested separately for each device, adding at least another 40 samples. These samples were not included, since they showed results similar to the samples including both character and animation variation. Had the accuracy of any one sample been questioned, all tests would have had to be rerun to be consistent, including the samples not displayed. With a single sample execution taking approximately 20 minutes (not including the modification time in between runs etc.) and over 100 samples to be tested multiple times, the time needed would have been counted in days.
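The time estimate and the proposed averaging can be sketched in a few lines. The sample count (80 plus 40) and the 20 minutes per execution are taken from the text as lower bounds; the number of repetitions and the fps values in the usage example are hypothetical and only serve as illustration.

```python
from statistics import mean, stdev

MINUTES_PER_SAMPLE = 20   # from the text: roughly 20 minutes per execution
SAMPLES = 80 + 40         # from the text: >80 breakpoint samples plus >=40 variation samples


def total_hours(repetitions, samples=SAMPLES, minutes=MINUTES_PER_SAMPLE):
    """Pure run time in hours, excluding build and modification time."""
    return repetitions * samples * minutes / 60.0


def summarize(runs):
    """Mean and sample standard deviation over repeated runs of one sample."""
    return mean(runs), stdev(runs)


if __name__ == "__main__":
    for reps in (1, 3, 5):
        hours = total_hours(reps)
        print(f"{reps} repetition(s): {hours:.0f} h = {hours / 24:.1f} days")

    # Hypothetical fps readings for one sample, repeated three times.
    print(summarize([27.0, 28.0, 27.5]))
```

Even a single pass over all samples amounts to about 40 hours of pure run time under these figures, so a handful of repetitions quickly reaches a week or more.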

A related potential error is that the smartphones were plugged into the computer one at a time and charged continuously while the tests were built, generating additional heat and potentially causing a drop in performance due to cooling measures. This was not considered during the process, and the charging might have lasted different amounts of time, depending on whether a phone was charging while another scene was constructed etc. This might explain why the OO system performed better in the first tests on the iPhone XR, since the OO system was built first. The majority of the samples were made for the first test, to get a feeling for where the breakpoint was and how diverse the collected data needed to be.

Another flaw to consider is that the numbers in the graphs might be skewed when going from the number-of-characters tests to the tests with nine different characters and animations, because the average character has a less refined mesh than the single character used in the number-of-characters tests. This has no effect on the final result when comparing the two systems, but rather results in an overestimate of performance for both systems when using nine characters and animations. The same reasoning applies to the animations.

Also, when randomizing the offset of the characters in the OO system, the old math package related to MonoBehaviours was used, whereas for the DOTS system the new math package was used. This also meant that the randomness was produced in different ways: the offset for the DOTS system was actually produced via noise instead of through a random function. Consequently, no random seed was used, which could otherwise have made the randomness deterministic and the tests even more alike. Bear in mind that the randomness is only used when initially placing the characters, causing at most a potential delay on the first frame. Nor is it relevant that characters may end up offscreen: since we are not using any culling, a character placed in the scene affects performance in the same way regardless of whether it can be seen.
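How a shared random seed would make the initial placement identical across systems can be shown with a language-neutral sketch. It is written in Python rather than with Unity's math packages, and the grid layout, the parameter names and the function `place_characters` are assumptions made purely for illustration.

```python
import random


def place_characters(count, seed, spacing=2.0, spread=1.0, per_row=10):
    """Place characters on a grid with a small seeded random offset.

    A private generator is used so the result depends only on the seed,
    not on any global random state.
    """
    rng = random.Random(seed)
    positions = []
    for i in range(count):
        row, col = divmod(i, per_row)
        x = col * spacing + rng.uniform(-spread, spread)
        z = row * spacing + rng.uniform(-spread, spread)
        positions.append((x, z))
    return positions


if __name__ == "__main__":
    a = place_characters(30, seed=42)
    b = place_characters(30, seed=42)
    c = place_characters(30, seed=43)
    print(a == b)  # same seed gives the same layout
    print(a == c)  # a different seed gives a different layout
```

Feeding the same seed to both systems would give both the same character layout, removing placement as a source of difference between the tests.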

5.2 Future work

Given the previous observation of the CPU usage skyrocketing for the OO system, one could argue that future work should integrate GPU instancing into the OO system to offload the CPU, which is already possible in Unity today. As previously mentioned (2.2), Level-of-detail (LoD) is one of the most prominent and frequently used enhancement techniques in crowd rendering, and in games in general. Furthermore, adding performance enhancers such as frustum culling [9] would also give a performance increase if used correctly, as has been done previously in OO approaches, among others in Unity.
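The two enhancement techniques named above can be sketched in their simplest form. The code below is an illustration under stated assumptions and not Unity's implementation: the view volume is given as a list of planes with unit normals pointing inward, each character is approximated by a bounding sphere, and the LoD distance thresholds are arbitrary example values. The names `sphere_visible` and `select_lod` are hypothetical.

```python
def sphere_visible(center, radius, planes):
    """Conservative sphere-versus-frustum test.

    planes: list of ((nx, ny, nz), d) with unit normals pointing into the
    view volume, so a point p is inside a plane when dot(n, p) + d >= 0.
    """
    cx, cy, cz = center
    for (nx, ny, nz), d in planes:
        signed_distance = nx * cx + ny * cy + nz * cz + d
        if signed_distance < -radius:
            return False  # the sphere lies completely outside this plane
    return True


def select_lod(distance, thresholds=(10.0, 30.0)):
    """Pick a level of detail: 0 is the most detailed mesh."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)


if __name__ == "__main__":
    # A box-shaped stand-in for a frustum: -5 <= x <= 5 and 0 <= z <= 20.
    planes = [((1, 0, 0), 5), ((-1, 0, 0), 5), ((0, 0, 1), 0), ((0, 0, -1), 20)]
    print(sphere_visible((0, 0, 10), 1.0, planes))   # inside the volume
    print(sphere_visible((8, 0, 10), 1.0, planes))   # fully outside on the x side
    print(select_lod(5.0), select_lod(20.0), select_lod(100.0))
```

Characters that fail the visibility test would be skipped entirely, and the remaining ones would be drawn with a coarser mesh the further away they are, which is where both systems could gain performance.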

6 Conclusion

By comparing the results of the OO and DOTS systems, one can conclude that the DOTS system performs better as the number of characters rises, provided variation is kept low. The OO system performs well with fewer characters, and does a better job of maintaining its performance as variation increases. As for future work, both systems could benefit from further enhancement techniques such as LoD or culling.

Acknowledgments

Foremost, with my sincerest gratitude and deepest appreciation I would like to thank my supervisor Björn Thuresson who has not only guided me in my research and through these severe times, but who has also mentored me through my studies at KTH.

My appreciation also extends to Friend Factory for having me do my research work in their presence, accompanied by my great friend Vincent E. Wong who I’ve had the greatest pleasure of sharing my studies with throughout our journey at KTH.

References

[1] Thalmann D. Aubel A. 2000. Realistic deformation of human body shapes. Proc. Computer Animation and Simulation 2000 (2000).

[2] Thalmann D. Aubel A., Boulic R. 1998. Animated impostors for real-time display of numerous virtual humans. VW ’98 (1998).

[3] Popović J. Baran I. (Ed.). 2007. Automatic rigging and animation of 3d characters. ACM Transactions on Graphics, Vol. 26.

[4] Pelechano N. Beacco, A. and C. Andújar. 2012. Efficient rendering of animated characters through optimized per-joint impostors. Journal of Computer Animation and Virtual Worlds 23 (2012).

[5] Andujar C. Pelechano N. Beacco A., Spanlang B. 2011. A flexible approach for output-sensitive rendering of animated characters. Computer Graphics Forum 30 (2011).

[6] Pelechano N. Spanlang B. Beacco A., Andújar C. 2015. A Survey of Real-Time Crowd Rendering. Computer Graphics Forum, 35, 8 (2015).

[7] Zhirnov V. V. Cavin R., Lugli P. 2012. Science and Engineering Beyond Moore’s Law. Proc. IEEE (2012).


[8] Fahn C. Tsai J. Chen R. Lin M. Chen, H. 2006. Generating high-quality discrete LOD meshes for 3D computer games in linear time. Multimedia Syst. 11 (2006).

[9] Silva C. T. Durand F. Cohen-Or D., Chrysanthou Y. L. 2003. A survey of visibility for walkthrough applications. IEEE Transactions on Visualization and Computer Graphics 9, 3 (2003).

[10] Meyer A. Coic J., Loscos C. 2007. Three LOD for the Realistic and Real-Time Rendering of Crowds with Dynamic Lighting. Research Report RN/06/20, Université Claude Bernard (2007).

[11] Magnenat-Thalmann N. Cordier F. 2005. A data-driven approach for real-time clothes simulation. Computer Graphics Forum 24 (2005).

[12] O’Conor K. O’Sullivan C. Dobbyn S., Hamill J. 2005. Geopostors: a real-time geometry/impostor crowd rendering system. I3D ’05: Proceedings of the 2005 symposium on Interactive 3D graphics and games (2005).

[13] Peng C. Dong Y. 2019. Real-Time Large Crowd Rendering with Efficient Character and Instance Management on GPU. International Journal of Computer Games Technology. (2019).

[14] Buchholz H. Döllner J. 2005. Continuous level-of-detail modeling of buildings in 3D city models. GIS ’05: Proceedings of the 13th annual ACM international workshop on Geographic information systems (2005).

[15] Clark J. H. 1976. Hierarchical geometric models for visible surface algorithms. Communications of the ACM 19, 10 (1976).

[16] Dobbyn S. O’Sullivan C. Hamill J., McDonnell R. 2005. Perceptual evaluation of impostor representations for virtual humans and buildings. Computer Graphics Forum 24, 3. (2005).

[17] Arafhin Mazumder T. Islam R., Islam R. 2010. Mobile Application and Its Global Impact. International Journal of Engineering & Technology, IJET-IJENS, Vol: 10, No:06. (2010).

[18] Bærentzen J. 2005. Hardware-accelerated point generation and rendering of point-based impostors. Journal of Graphics Tools 10 (2005).

[19] Sundstedt V. Bala K. Gutierrez D. O’Sullivan C. Jarabo A., Van Eyck T. 2012. Crowd light: Evaluating the perceived fidelity of illuminated dynamic scenes. Computer Graphics Forum (Proc. EUROGRAPHICS 2012) 31, 2. (2012).

[20] Oat C. Gutierrez D. Jimenez J., Echevarria J. I. 2011. Practical and Realistic Facial Wrinkles Animation. GPU Pro 2 (2011).

[21] Collins S. Žára J. O’Sullivan C. Kavan L., Dobbyn S. 2008. Polypostors: 2d polygonal impostors for 3d crowds. I3D ’08: Proc. of the 2008 symposium on Interactive 3D graphics and games (2008).

[22] Schaefer S. Landreneau E. 2009. Simplification of articulated meshes. Computer Graphics Forum 28, 2 (2009).

[23] Whitted T. Levoy M. (Ed.). 1985. The Use of Points as a Display Primitive.

[24] Behr J. Alexa M. Limper M., Jung Y. 2013. The pop buffer: Rapid progressive clustering by geometry quantization. Computer Graphics Forum 32, 7 (2013).

[25] Day A. M. Lister W., Laycock R. G. (Ed.). 2009. Geometric Diversity for Crowds on the GPU.

[26] Cohen J. Reddy M. Varshney A. Luebke D., Watson B. (Ed.). 2002. Level of Detail for 3D Graphics. Elsevier Science Inc., New York, NY, USA.

[27] Thalmann D. Magnenat-Thalmann N., Laperrière R. 1998. Joint-dependent local deformations for hand animation and object grasping. CHCCS/SCDHM, Edmonton, Alberta., Chapter In Proceedings on Graphics interface’ 88, 26–33.

[28] Collins S. O’Sullivan C. McDonnell R., Dobbyn S. 2006. Perceptual evaluation of lod clothing for virtual humans. Proc. of the 2006 ACM SIGGRAPH/Eurographics symposium on Computer animation, SCA ’06 (2006).

[29] Coleman D. McLaughlin T., Cutler L. 2011. Character rigging, deformations, and simulations in film and game production. ACM SIGGRAPH 2011 Courses, SIGGRAPH ’11 (2011).

[30] Rudomin I. Millan E. (Ed.). 2006. A comparison between impostors and point-based models for interactive rendering of animated models.

[31] Curtis S. Narain R., Golas A. and Lin M. C. 2009. Aggregate dynamics for dense crowd simulation. ACM SIGGRAPH Asia 2009 Papers, SIGGRAPH Asia ’09 (2009).

[32] Sugimoto M. Pantuwong N. 2012. A novel template-based automatic rigging algorithm for articulated-character animation. Computer Animation and Virtual Worlds 23, 2 (2012).

[33] Badler N. Pelechano N., Allbeck J. (Ed.). 2008. Virtual Crowds: Methods, Simulation, and Control. Morgan & Claypool.

[34] Barham P. Barker R. Waldrop M. Ehlert J. Chrislip C. Pratt D., Pratt S. 1997. Humans in large scale, networked virtual environments. Massachusetts Institute of Technology, Chapter Presence, 547–564.

[35] Susin A. Ramirez J., Lligadas X. 2008. Automatic adjustment of rigs to extracted skeletons. Springer Berlin Heidelberg, Chapter In Articulated Motion and Deformable Objects, Perales F., Fisher R., (Eds.), vol. 5098 of Lecture Notes in Computer Science., 409–418.

[36] Fuhrmann A. Schmalstieg D. (Ed.). 1999. Coarse View Dependent Levels of Detail for Hierarchical and Deformable Models.

[37] Oat C. Tatarchuk N. Shopf J., Barczak J. 2008. March of the froblins: Simulation and rendering massive crowds of intelligent and detailed creatures on gpu. ACM SIGGRAPH 2008 Games, SIGGRAPH ’08 (2008).

[38] Pai D. K. Sueda S., Kaufman A. 2008. Musculotendon simulation for hand animation. ACM Trans. Graph. (Proc. SIGGRAPH) 27, 3 (2008).

[39] Lorach T. (Ed.). 2007. Gpu blend shapes.

[40] Chrysanthou Y. Tecchia F. 2000. Real-time rendering of densely populated urban environments. Springer-Verlag, Chapter Proceedings of the Eurographics Workshop on Rendering Techniques 2000, 83–88.

[41] Chrysanthou Y. Tecchia F., Loscos C. 2008. Image-based crowd rendering. IEEE Computer Graphics and Applications 22, 2 (2008).

[42] Unity Technologies. 2020. Changelogs for the unity package “Burst”. Retrieved June 8, 2020 from https://docs.unity3d.com/Packages/com.unity.burst@1.2/changelog/CHANGELOG.html

[43] Unity Technologies. 2020. Changelogs for the unity package “Entities”. Retrieved June 8, 2020 from https://docs.unity3d.com/Packages/com.unity.entities@0.2/changelog/CHANGELOG.html

[44] Unity Technologies. 2020. Explanation of Unity’s GameObject. Retrieved June 8, 2020 from https://docs.unity3d.com/ScriptReference/GameObject.html

[45] Unity Technologies. 2020. Explanation of Unity’s MonoBehaviour. Retrieved June 8, 2020 from https://docs.unity3d.com/ScriptReference/MonoBehaviour.html

[46] Unity Technologies. 2020. Unity’s own ECS sample GitHub repository with description. Retrieved June 8, 2020 from https://github.com/Unity-Technologies/EntityComponentSystemSamples

[47] Unity Technologies. 2020. Unity’s own summary of what DOTS is. Retrieved June 8, 2020 from https://unity.com/dots?_ga=2.55008813.780278623.1590331289-1948476926.1586950390

[48] Musse S. Thalmann D. (Ed.). 2013. Crowd Simulation (2nd ed.). Springer, London.

[49] UL. 2020. Shows a wide range of smartphones with a number indicator as a standardized measure for their performance. Retrieved June 8, 2020 from https://benchmarks.ul.com/compare/best-smartphones?amount=0&sortBy=PERFORMANCE&reverseOrder=true&osFilter=ANDROID,IOS,WINDOWS&test=SLING_SHOT_ES_30&deviceFilter=PHONE&displaySize=3.0,15.0

[50] Thalmann D. Ulicny B., Ciechomski P. d. H. 2004. Crowdbrush: Interactive authoring of real-time crowd scenes. Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, SCA ’04 (2004).

[51] Wikipedia. 2015. Thorough description on the specifications of the iPhone 6S. Retrieved June 8, 2020 from https://en.wikipedia.org/wiki/IPhone_6S

[52] Wikipedia. 2020. Thorough description on compatibility and releases of Xcode. Retrieved June 8, 2020 from https://en.wikipedia.org/wiki/Xcode

[53] Wikipedia. 2020. Thorough description on the specifications of the iPhone XR. Retrieved June 8, 2020 from https://en.wikipedia.org/wiki/IPhone_XR

[54] Alexa M. Hormann K. Winkler T., Drieseberg J. 2010. Multi-scale geometry interpolation. Computer Graphics Forum 29, 2 (2010).

[55] Ni L. Zheng P. (Ed.). 2006. Smart Phone and Next Generation Mobile Computing. Morgan Kaufmann.


TRITA -EECS-EX-2020:490