Tile Based Procedural Terrain Generation in Real-Time: A Study in Performance


Thesis no: MECS-2014-08


David Grelsson

Faculty of Computing

Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Engineering: Game and Software Engineering. The thesis is equivalent to 20 weeks of full-time studies.

Contact Information:

Author(s):

David Grelsson

E-mail: dagc09@student.bth.se

University advisor:

Prof. Johan Hagelbäck
School of Computing

Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden

Internet : www.bth.se
Phone : +46 455 38 50 00
Fax : +46 455 38 50 57


Abstract

Context. Procedural Terrain Generation refers to the algorithmic creation of terrains with limited or no user input. Terrains are an important piece of content in many video games and other forms of simulation.

Objectives. In this study a tile-based approach to creating endless terrains is investigated. The aim is to find out whether real-time performance is possible using the proposed method, and what performance increases can be gained by utilizing the GPU.

Methods. An application that allows the user to walk around on a seemingly endless terrain is created in two versions, one that exclusively utilizes the CPU and one that utilizes both CPU and GPU.

An experiment is then conducted that measures performance of both versions of the application.

Results. The results showed that real-time performance is indeed possible for smaller tile sizes on the CPU. They also showed that the application benefits significantly from utilizing the GPU.

Conclusions. It is concluded that the tile-based approach works well and creates a functional terrain. However, performance is too poor for the technique to be utilized in e.g. a video game.

Keywords: Procedural Terrain Generation, Multi-threading, Performance.


List of Figures

1.1 The draw pipeline in Direct3D version 11.

1.2 The dispatch pipeline in Direct3D version 11.

4.1 Tile expansion as the user moves over the terrain.

4.2 A screenshot where a heightmap and a simple grass texture have been applied.

4.3 A screenshot with more elaborate texturing.

4.4 A screenshot of a river created by software agents.

4.5 A screenshot of a road created by software agents.

6.1 Comparison of execution times (in ms) between CPU and GPU.

6.2 Comparison of execution times (in ms) between different thread group sizes on the GPU.

6.3 Comparison of execution times (in ms) with and without software agents on the CPU.

6.4 Comparison of execution times (in ms) with and without software agents on the GPU.


List of Tables

4.1 Values for variables used to control the fractal function.

5.1 Hardware specification for the PC used in the experiment.

6.1 Comparison of execution times (in ms) between CPU and GPU.

6.2 Comparison of execution times (in ms) between different thread group sizes on the GPU.

6.3 Comparison of execution times (in ms) with and without software agents on the CPU.

6.4 Comparison of execution times (in ms) with and without software agents on the GPU.

7.1 Frame rate expressed as the maximum time each frame is allowed to take, in ms.

7.2 Results from a Student's t-test on the collected data for the CPU/GPU comparison.


Chapter 1 Introduction

Procedural Content Generation (PCG) concerns creating content using algorithms, meaning no human designer or artist is needed. This has the potential to shorten development time, cut costs and inspire new types of content and gameplay. Procedural Terrain Generation (PTG) is a subset of PCG that concerns specifically the creation of terrains.

In this study an application that allows the user to walk around on a procedurally generated terrain is developed. A tile-based approach is used and as the user gets close to the edge of the terrain new tiles are generated along that edge.

Tiles that the user has moved away from are removed once a certain distance is reached between the tile and the user. In this way a seemingly endless terrain is created. To add more detail to the terrain, software agents are used to create rivers and roads. The application is developed in two versions: one version that exclusively utilizes the CPU and one optimized version that utilizes both the CPU and the GPU when generating the terrain. An experiment is conducted that measures the performance of the two versions for different tile sizes. The purpose is to determine whether real-time performance is possible on the CPU, at which tile size real-time performance is no longer possible on the CPU, whether the application benefits from GPU utilization and, if so, at which tile size GPU utilization becomes more effective than exclusively utilizing the CPU.

When generating the terrain, larger tiles are preferable since they allow the user to view more of the terrain at once, making the application more visually impressive. However, as tile size increases, so does the time needed to generate each tile. Larger tiles mean that more height values need to be calculated and that the software agents have a larger data set to traverse.

Terrains are an important piece of content in many video games and other forms of simulation [8, 23]. Quite a few studies exist in the area of procedural terrain generation; however, few of them concern performing the generation in real-time, and as far as the author is aware this study is unique in that it develops a tile-based method for generating a seemingly endless terrain.


1.1 Procedural Content Generation

PCG is "the algorithmical creation of game content with limited or indirect user input" [25], where content refers to things such as terrain, levels, cities, equipment, dialogue, etc. PCG is interesting for developers for a number of different reasons.

Video games require a huge amount of highly detailed content to meet player expectations, which continually rise as games become more complex and grow in scope [6]. Creating all this content by hand can be both time-consuming and expensive. PCG, used both offline and online, can speed up development and mitigate high costs. However, creating a program that produces accurate and fun content can be very difficult.

Offline PCG is when content is generated during development and the finished game ships with all its content already generated. This lowers the demands on the algorithms to create accurate content, since faulty or vapid content can be discarded or handed off to a human designer who can fix problems or add more detail. Offline PCG can also serve as a source of inspiration for human content creators. The algorithms are not constrained by our limited human imagination and could produce content that a human designer would never have thought of.

Online PCG, on the other hand, is when the game generates content as it is played, either during start-up or in real-time during gameplay. This can increase the replay value of a game by offering new experiences to the player every time he returns to the game. It also allows for the emergence of new styles of gameplay that centre around procedurally generated content [24]. If the PCG algorithms can be made efficient enough and consistently create novel content, they open up the possibility of truly endless games where, wherever the player goes, there are new areas to explore and new things to do. The generation of content could also be adjusted to suit a particular playing style or skill level.

1.2 Noise and Fractals

Common techniques for generating terrains include noise and fractals, used either by themselves or in combination to create a heightmap that controls the terrain's elevation. In the context of computer graphics, noise is an apparently stochastic, irregular function. Noise is often used in procedural texturing to introduce randomness into patterns that would otherwise look too monotonous to appear natural. It is often desirable that noise exhibits the following properties:

It is a repeatable function of its inputs, meaning that the same input always returns the same output.

It has a range that is known, usually between -1 and 1.


It is not obviously periodic. Pseudorandom functions always exhibit some level of periodicity; however, the period can be made long enough that it is not noticeable.

A popular noise function is Perlin Noise created by Ken Perlin in 1983 [18].

The function divides the texture space into a lattice where each lattice point contains a pseudorandom wavelet. The function computes a noise value for a given coordinate by determining which cell the coordinate belongs to, computing the wavelets for the lattice points at each corner of that cell and then summing the wavelets. In 2002 Ken Perlin made some improvements to his noise function.

The improved version is called Improved Perlin Noise and sometimes Simplex Noise. Improved Perlin Noise has been shown to generate better-looking noise and to run up to 10% faster than the original noise function [21].

A fractal can be defined as "a geometrically complex object, the complexity of which arises through the repetition of a given form over a range of scales" [18].

The "given form" can also be called a basis function. This function can be just about anything; common examples include an image or a noise function. Fractals can be used to model a number of different natural phenomena. Mountains are a good example: a small part of a mountain looks just as much like a mountain as a larger part does. Trees are another example, where smaller branches are visually similar to larger ones.

Fractal dimension is the relationship between frequency and amplitude, which determines roughness. A terrain generated with a uniform dimension will have the same roughness everywhere. A real terrain, however, often has smooth hills and valleys, rocky patches and mountains with a much rougher surface rising from the smoother ground. By using multifractals, which alter their dimension depending on location, much more realistic terrain with varying roughness can be created.

1.3 GPU Programming and The Compute Shader

In 2001 NVIDIA released the GeForce 3, which was the first GPU to support programmable vertex shaders, exposed through DirectX 8 and extensions to OpenGL [1]. Since then GPUs have continued to become more powerful and programmable and have increasingly come to resemble parallel computers.

This has made it possible to use GPUs for calculations other than those strictly tied to graphics. Problems with a high computational cost that exhibit some level of parallelism can be implemented to run on a GPU to achieve a significant speed-up.

Current graphics pipelines work by moving data through a set of steps called shaders in a predetermined order. Direct3D version 11 supports two different kinds of pipelines. There is the draw pipeline (see figure 1.1), which works as explained above by moving data through shaders. Then there is the dispatch, or compute shader, pipeline (see figure 1.2), which exists outside the draw pipeline and allows programmers to view the GPU as a generic grid of parallel processors.

This allows the programmer to utilize the GPU for general purpose calculations without being forced to render geometry or cast information into textures.

Figure 1.1: The draw pipeline Direct3D version 11.

A compute shader is invoked by calling a dispatch method [14]. This call dispatches a grid where each cell contains a thread group, whose size is specified in the shader. Different thread group sizes can affect performance, since the group size determines how threads are scheduled and how memory is used.

Threads and thread groups can be identified using system values such as SV_DispatchThreadID, which contains the thread's x and y coordinates in the grid.
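The relationship between a dispatch call and these system values can be illustrated on the CPU. The C++ sketch below assumes a 2D dispatch and uses hypothetical helper names (UInt3, groupsFor, dispatchThreadId). It relies on the documented Direct3D 11 relationship SV_DispatchThreadID = SV_GroupID × numthreads + SV_GroupThreadID, and on the fact that ID3D11DeviceContext::Dispatch takes thread group counts rather than thread counts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HLSL's uint3 system values.
struct UInt3 { std::uint32_t x, y, z; };

// Thread groups needed along one axis so that groups * groupSize >= count.
// Dispatch() takes group counts, so 257 vertices with 16-thread groups need 17 groups.
std::uint32_t groupsFor(std::uint32_t count, std::uint32_t groupSize) {
    assert(groupSize > 0);
    return (count + groupSize - 1) / groupSize;   // ceiling division
}

// Per axis: SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID.
UInt3 dispatchThreadId(UInt3 groupId, UInt3 groupThreadId, UInt3 numThreads) {
    return { groupId.x * numThreads.x + groupThreadId.x,
             groupId.y * numThreads.y + groupThreadId.y,
             groupId.z * numThreads.z + groupThreadId.z };
}
```

When the grid size is not a multiple of the group size, some threads fall outside the grid, so a shader dispatched this way would need to compare the thread id with the column and row counts before writing.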


Figure 1.2: The dispatch pipeline Direct3D version 11.

1.4 Problem Statement

A lot of work has been done in the area of PTG. However, very few of the existing studies conduct the generation in real-time, and none of the studies the author has been able to find allow the terrain to expand as the user moves over it.

Utilizing online PTG and allowing the terrain to expand in real-time could allow for some interesting properties in a game utilizing this technique:

It creates a seemingly endless world. Some limitations exist that prevent the terrain from being truly endless, but it allows the player to move a great distance before reaching the edge.

The amount of disk space needed for the game is reduced since the terrain does not exist until the game is played.

If coupled with a learning algorithm the game could adjust the terrain generation to suit a particular playing style or skill level.

The purpose of this study is to investigate a tile-based technique for generating a terrain that can expand in real-time. For each tile a heightmap will be generated. However, a simple heightmap would create a rather bland and uninteresting terrain. In a video game context a player would likely want to see more detail, such as different colouring of the ground, forests, rivers, etc. This has been taken into account in this study by employing a simple colouring technique for the terrain as well as software agents that add rivers and roads. The additional detail serves to estimate performance costs more accurately, were the technique to be used in e.g. a video game. Additionally, the use of software agents further validates the tile-based technique by showing that changes to the terrain do not need to be localized to a single tile but can extend over multiple, largely independent tiles without artefacts such as tearing.


1.5 Aim and Objectives

The aim of this study is to test whether real-time performance is possible on the CPU for procedural generation of a seemingly endless terrain using the "Ridged Multifractal Terrain Model" in combination with software agents. It also investigates possible performance increases from moving parts of the proposed algorithm to the GPU. To achieve this, the following steps are completed:

1. An algorithm for generating heightmaps is selected, in this case the "Ridged Multifractal Terrain Model" [18].

2. The heightmap generation algorithm, as well as a method for tiling pieces of terrain are implemented, exclusively utilizing the CPU.

3. Software agents are implemented that can traverse the terrain and create rivers and roads, also utilizing the CPU.

4. An alternate version of the heightmap generation algorithm is implemented, utilizing the GPU.

5. An experiment is conducted that measures the performance of the two different versions of the application.


Chapter 2 Related Work

Musgrave et al. [18] describe what fractals are and how they can be used to model natural phenomena. Musgrave then goes on to present a number of ontogenetic models for generating terrains that are designed to approximate certain erosion features. The models are varieties of multifractals, and among them is the "Ridged Multifractal Terrain Model", which is utilized in this project.

Génevaux et al. [9] created terrains by simulating hydrological erosion. The user provides a sketch as well as a couple of input parameters. The sketch contains the terrain outlines, river mouths and some river parts. From this the algorithm creates a river network represented by a graph. The output from the river generation is a set of 3D polylines describing river elevations. The rivers are then categorised into procedural primitives such as junctions, springs, deltas and river trajectories that are used in the final rendering. When the river network has been completed, topology data is extracted from the graph for use when constructing the terrain.

Togelius et al. [26] propose a search-based procedural content generation algorithm for strategy game maps. Five objectives that determine player experience are defined, and a multiobjective evolutionary algorithm is created that searches for maps that satisfy pairs of these objectives. Since the objectives are conflicting, Pareto fronts are used to show how the objectives are balanced. The map generation starts with a flat grid upon which bases, resources and mountains are placed. The mountains are generated using Gaussian curves drawn in two dimensions.

Doran and Parberry [6] explore a method that gives the user more control over how the terrain is generated. This is achieved by using intelligent software agents and allowing the user to control the constraints on those agents. The terrain generation is divided into three phases. In the first, or coastline, phase, agents work to outline a landmass that could be surrounded by water. In the second, or landform, phase, a larger number of agents move over the landmass to create beaches, hills, mountains, etc. In the final phase, the erosion phase, rivers are created by eroding parts of the terrain.

Olsen [20] created eroded fractal terrains in real-time. However, Olsen uses a different definition of real-time than the one used here. His algorithm generates terrains in a couple of seconds, which is deemed acceptable since it is not uncommon for games to have loading times of up to 30 seconds [20]. The base terrain is created using 1/f noise combined with Voronoi diagrams. The terrain is then eroded using a speed-optimized thermal erosion algorithm that has been modified to emulate properties of hydraulic erosion.

Mistal [12] presents a GPU-based algorithm that can be used for terrain rendering. The algorithm can create and subdivide a large mesh with distance-based level of detail. The height values for each vertex in the terrain are procedurally generated using the "Ridged Multifractal Terrain Model" described by Musgrave [18]. The algorithm runs in real-time and achieves good performance.

This project differs from the above-mentioned studies in that it aims to create a seemingly endless terrain generated in real-time using a tile-based solution. All of the above-mentioned studies use a finite terrain size, and only one of them [12] achieves real-time performance. This project uses software agents, similar to those described by Doran and Parberry [6], to add more detail to the heightmap (which has been generated using fractals). However, the agents in this project work under stricter performance requirements.


Chapter 3 Method

This study uses quantitative research methods. First an application is implemented, which is later used in an experiment that measures the application's performance. The experiment is run several times with different configurations of the application and measures the time it takes to generate new tiles each time the terrain expands. The result from each run of the experiment is a series of time values that can be compared to determine how the application's performance changes for different configurations.

When generating the terrain, larger tiles are preferable since they allow the user to view more of the terrain at once, making the application more visually impressive. However, as tile size increases, so does the time needed to generate each tile. Larger tiles mean that more height values need to be generated and that the software agents have a larger data set to traverse. This project attempts to answer the following questions:

RQ1: Can seemingly infinitely large terrains be created on the CPU in real-time by tiling pieces of terrain that have been procedurally generated using the "Ridged Multifractal Terrain Model" to create heightmaps and software agents to add more detail in the form of rivers and roads?

RQ2: How large can tiles be before real-time performance on the CPU is no longer possible?

RQ3: At which tile size does the application start to benefit from GPU utilization?

An application starts becoming interactive at around 6 frames per second (fps) [1]. 15 fps can definitely be thought of as real-time. There is a noticeable drop in user performance when moving below 15 fps and only a modest increase when moving from 15 fps up to 30 fps [5]. However, Regan et al. [22] have found that human users can experience latency artefacts with a latency as low as 15 ms (around 67 fps). The most commonly occurring frame rates in video games are 30 and 60 fps [2]. Thus 15 fps is real-time; however, if the method presented in this study is to be used in a video game context, its performance should be at least 30 fps.
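The frame rates above translate directly into a per-frame time budget of 1000 / fps milliseconds. The small C++ helpers below (hypothetical names) make the conversion explicit: 6 fps leaves about 166.7 ms per frame, 15 fps about 66.7 ms, 30 fps about 33.3 ms and 60 fps about 16.7 ms, while the 15 ms latency reported by Regan et al. corresponds to roughly 67 fps.

```cpp
#include <cassert>
#include <cmath>

// Per-frame time budget in milliseconds for a target frame rate.
double frameBudgetMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate corresponding to a given per-frame time in milliseconds.
double fpsForFrameTime(double ms) {
    assert(ms > 0.0);
    return 1000.0 / ms;
}
```

If tile generation were to run synchronously within a frame, its execution time would have to fit inside this budget (together with rendering) to avoid a visible stall.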


To answer the questions above, an application is developed that allows the user to walk around on a procedurally generated, seemingly endless terrain. The terrain is generated in tiles. The heightmap for each tile is generated using the "Ridged Multifractal Terrain Model". When the heightmap has been generated, software agents move over the terrain and add more detail such as rivers and roads. The application is developed in two versions: one that exclusively utilizes the CPU and one where the "Ridged Multifractal Terrain Model" calculations have been moved to the GPU.

An experiment is then conducted where performance is measured for different configurations of the application. Each configuration is run once for each tile size being tested. The tile sizes used in the experiment are 64x64, 128x128, 256x256 and 512x512.


Chapter 4 Implementation

An application was developed using C++ and Direct3D 11 that allows the user to walk around on a procedurally generated terrain. The terrain generation is tile-based. At start-up a tile is created with its center at (0, 0), as well as eight tiles surrounding the original tile. The user starts at position (0, 0) and can move around on the terrain as he pleases. When the user crosses over to an adjacent tile, new tiles are generated in the available spaces around the tile that the user crossed over to (see figure 4.1). When the user has moved a certain distance away from a tile, that tile is removed.
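A minimal sketch of the tile bookkeeping described above, assuming square tiles of side tileSize with tile (0, 0) centred at the origin, tiles identified by integer grid coordinates, and a removal distance measured in whole tiles (the thesis only says "a certain distance"). All names are hypothetical, and generating the contents of each tile is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using TileCoord = std::pair<int, int>;   // (x, z) index of a tile in the tile grid

// Tile containing world position (x, z); tile (i, j) is centred at (i * tileSize, j * tileSize).
TileCoord tileAt(float x, float z, float tileSize) {
    assert(tileSize > 0.0f);
    return { static_cast<int>(std::floor(x / tileSize + 0.5f)),
             static_cast<int>(std::floor(z / tileSize + 0.5f)) };
}

// Ensures the 3x3 block of tiles around `centre` exists and returns the newly added
// tiles (the ones whose contents would now have to be generated).
std::vector<TileCoord> expandAround(std::set<TileCoord>& tiles, TileCoord centre) {
    std::vector<TileCoord> created;
    for (int dz = -1; dz <= 1; ++dz)
        for (int dx = -1; dx <= 1; ++dx) {
            const TileCoord c{centre.first + dx, centre.second + dz};
            if (tiles.insert(c).second) created.push_back(c);
        }
    return created;
}

// Drops tiles more than maxDist tiles away (Chebyshev distance) from `centre`.
void removeDistant(std::set<TileCoord>& tiles, TileCoord centre, int maxDist) {
    for (auto it = tiles.begin(); it != tiles.end();) {
        const int d = std::max(std::abs(it->first - centre.first),
                               std::abs(it->second - centre.second));
        it = (d > maxDist) ? tiles.erase(it) : std::next(it);
    }
}
```

Calling expandAround with the tile under the user each time the user crosses into a new tile generates only the missing neighbours (three tiles for a straight crossing, five for a diagonal one), after which removeDistant can drop tiles that have fallen behind.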

Figure 4.1: Tile expansion as the user moves over the terrain.

At start-up a grid is generated with the input parameters width, depth, number of columns and number of rows. This results in a grid of (number of rows) × (number of columns) vertices, with one end of its diagonal at $\left(-\frac{width}{2}, -\frac{depth}{2}\right)$ and the opposite end at $\left(\frac{width}{2}, \frac{depth}{2}\right)$. Each vertex contains a position, a surface normal and texture coordinates. At this point the vertex positions are in local space; they are later translated to world space by the tile object. Since the grid is flat, all normals have the value (0.0, 1.0, 0.0); they are recalculated once a heightmap has been generated. The texture coordinates range from (0.0, 0.0) to (1.0, 1.0). The textures used on the terrain are all seamless, which allows them to be repeated several times over the terrain by scaling up the texture coordinates in the shader during rendering. This allows for textures with a lower resolution and ensures smooth color transitions between tiles.
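A minimal C++ sketch of this grid set-up, assuming a row-major vertex layout and the same orientation as listing 4.1 (columns run along +x from -width/2, rows run along -z from +depth/2). The Vertex struct and makeGrid function are hypothetical names, not the thesis's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex {
    float px, py, pz;   // local-space position (py holds the height, 0 for the flat grid)
    float nx, ny, nz;   // surface normal
    float u, v;         // texture coordinates in [0, 1]
};

// Builds a flat cols x rows grid spanning (-width/2, -depth/2) to (width/2, depth/2).
std::vector<Vertex> makeGrid(float width, float depth, int cols, int rows) {
    assert(cols >= 2 && rows >= 2);
    std::vector<Vertex> grid;
    grid.reserve(static_cast<std::size_t>(cols) * rows);
    const float dx = width / (cols - 1), dz = depth / (rows - 1);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            Vertex v{};                           // zero-initialises every field
            v.px = -width / 2.0f + c * dx;
            v.pz = depth / 2.0f - r * dz;         // rows run from +depth/2 toward -depth/2
            v.ny = 1.0f;                          // flat grid: every normal points straight up
            v.u = static_cast<float>(c) / (cols - 1);
            v.v = static_cast<float>(r) / (rows - 1);
            grid.push_back(v);
        }
    return grid;
}
```

A tile object would then add its world-space position to px and pz; the shader can scale u and v to repeat the seamless textures.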

4.1 Heightmap Generation

When a new tile is being generated the grid is copied to the new tile object. A heightmap is then generated using the "Ridged Multifractal Terrain Model" [18].

The fractal function's output can be controlled by changing the values of five different variables: number of octaves, dimension, lacunarity, offset and gain.

Octaves is the number of repetitions in the fractal. Dimension is the relationship between frequency and amplitude, which determines the roughness of the generated terrain. Lacunarity determines the size of the gap (the ratio) between successive frequencies used in the fractal construction. Offset controls the multifractality of the function: as offset increases, the function goes from multifractal to monofractal and eventually approaches a flat plane. Gain controls the amplitude of the signal output by the fractal function. The values used for these variables are presented in table 4.1.

The values were selected based on recommendations in [18] and trial-and-error.

Variable      Value
Octaves       8
Dimension     2.0
Lacunarity    2.5
Offset        1.0
Gain          2.0

Table 4.1: Values for variables used to control the fractal function.

As a basis function for the fractal, a gradient noise function called Simplex Noise is utilized. The fractal function is run for each vertex in the grid, generating a height value. It takes the vertex's x- and z-coordinates, translated to world space, as input. These coordinates are then used by the fractal function as input to the noise function. To avoid big leaps between noise values for neighbouring vertices, the coordinates used as input are scaled down by a factor of 1400. A sample of the resulting heightmap can be seen in figure 4.2.
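The sketch below shows the structure of the ridged multifractal as published by Musgrave [18], written in C++. Two assumptions should be kept in mind: the basis function is a simple hash-based 2D gradient noise standing in for the Simplex Noise used in the thesis, and the exponent table (frequency^-H) is rebuilt on every call for self-containment, whereas the thesis computes such data once at start-up. The parameter called dimension in table 4.1 is passed as H; all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Quintic fade curve used by Perlin-style noise.
static float fade(float t) { return t * t * t * (t * (t * 6.0f - 15.0f) + 10.0f); }

// Dot product of a hashed diagonal gradient at lattice point (ix, iz) with offset (fx, fz).
static float gradDot(int ix, int iz, float fx, float fz) {
    std::uint32_t h = static_cast<std::uint32_t>(ix) * 374761393u +
                      static_cast<std::uint32_t>(iz) * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    switch ((h >> 16) & 3u) {
        case 0:  return  fx + fz;
        case 1:  return -fx + fz;
        case 2:  return  fx - fz;
        default: return -fx - fz;
    }
}

// Stand-in 2D gradient noise (NOT Simplex Noise): zero at lattice points, |value| <= 2.
float gradientNoise(float x, float z) {
    const int ix = static_cast<int>(std::floor(x)), iz = static_cast<int>(std::floor(z));
    const float fx = x - ix, fz = z - iz, u = fade(fx), w = fade(fz);
    const float n00 = gradDot(ix, iz, fx, fz);
    const float n10 = gradDot(ix + 1, iz, fx - 1.0f, fz);
    const float n01 = gradDot(ix, iz + 1, fx, fz - 1.0f);
    const float n11 = gradDot(ix + 1, iz + 1, fx - 1.0f, fz - 1.0f);
    const float a = n00 + u * (n10 - n00), b = n01 + u * (n11 - n01);
    return a + w * (b - a);
}

// Ridged multifractal after Musgrave [18].
float ridgedMultifractal(float x, float z, float H, float lacunarity,
                         int octaves, float offset, float gain) {
    assert(octaves >= 1);
    std::vector<float> exponents(octaves);        // frequency^-H for each octave
    float frequency = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        exponents[i] = std::pow(frequency, -H);
        frequency *= lacunarity;
    }
    float signal = offset - std::fabs(gradientNoise(x, z));  // invert |noise| to form ridges
    signal *= signal;                                        // square to sharpen the ridges
    float result = signal;
    for (int i = 1; i < octaves; ++i) {
        x *= lacunarity;
        z *= lacunarity;
        // Weight each octave by the previous signal so detail concentrates on the ridges.
        const float weight = std::clamp(signal * gain, 0.0f, 1.0f);
        signal = offset - std::fabs(gradientNoise(x, z));
        signal *= signal * weight;
        result += signal * exponents[i];
    }
    return result;
}
```

With the values in table 4.1 and the 1/1400 input scaling described above, a height sample for world position (wx, wz) would be ridgedMultifractal(wx / 1400.0f, wz / 1400.0f, 2.0f, 2.5f, 8, 1.0f, 2.0f).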

Figure 4.2: A screenshot where a heightmap and a simple grass texture have been applied.

When the heightmap has been generated, the height values are applied to the vertices in the grid. Normals are then calculated for each vertex in the grid to allow lighting calculations to be performed when the terrain is rendered. The normals are also used to texture the terrain. From the beginning the terrain has a grass texture applied to it. For each vertex, a dot product is calculated between its normal and a vector pointing straight up; that value is then amplified (multiplied by five in this case) and written to a blend map, which is used during rendering to blend in stone and dirt textures. The result (which can be viewed in figure 4.3) is a more varied colouring of the terrain where steeper angles are textured with rock and dirt instead of grass.
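The per-vertex normal and blend-map value can be sketched as follows. The thesis does not say how the normals are computed, so this C++ sketch assumes central differences on a row-major heightmap, with border samples clamped to the tile and rows running along -z as in listing 4.1; the names are hypothetical. The blend value is the dot product with the up vector multiplied by five, as described above; how the shader maps it to texture weights is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Normal at grid vertex (c, r) from central differences of a row-major heightmap.
// dx/dz are the vertex spacings; row index r grows toward -z (as in listing 4.1).
// At the tile border the clamped lookup makes the normal only approximate.
Vec3 vertexNormal(const std::vector<float>& heights, int cols, int rows,
                  int c, int r, float dx, float dz) {
    assert(c >= 0 && c < cols && r >= 0 && r < rows);
    auto h = [&](int cc, int rr) {
        cc = std::clamp(cc, 0, cols - 1);
        rr = std::clamp(rr, 0, rows - 1);
        return heights[static_cast<std::size_t>(rr) * cols + cc];
    };
    const float dhdx = (h(c + 1, r) - h(c - 1, r)) / (2.0f * dx);
    const float dhdz = (h(c, r - 1) - h(c, r + 1)) / (2.0f * dz);  // row r - 1 lies at larger z
    const Vec3 n{-dhdx, 1.0f, -dhdz};
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return {n.x / len, n.y / len, n.z / len};
}

// Blend-map value: dot(normal, up) amplified (x5 in the text). Flat ground gives the
// largest value; steeper slopes give smaller ones.
float slopeBlendValue(const Vec3& n, float amplification = 5.0f) {
    const Vec3 up{0.0f, 1.0f, 0.0f};
    return amplification * (n.x * up.x + n.y * up.y + n.z * up.z);
}
```

For example, a 45-degree slope gives a normal with y component about 0.707 and therefore a blend value of about 3.54, compared with 5.0 on flat ground.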

4.1.1 Heightmap Generation GPU

The CPU version of the fractal function is sequential, stepping through the vertices one at a time. However, the "Ridged Multifractal Terrain Model" allows each point to be evaluated individually, making it suitable for a parallel implementation. As an attempt at optimization, another version of the fractal function described above was implemented on the GPU using the compute shader in Direct3D 11. The author is aware that there are other programming models that can be used to utilize the GPU for general purpose calculations, such as OpenCL [10], CUDA [19] or C++ AMP [13]. However, due to previous experience working with the compute shader and the fact that the application uses DirectX for rendering, the compute shader was chosen to reduce development time. There are several performance comparisons between CUDA and OpenCL [11, 7, 4, 3, 17].

The author has, however, been unable to find any such comparisons that include the compute shader or C++ AMP.

There are a number of different ways a resource can be bound to a shader. In this application the following were used: the shader resource view, the constant buffer and the unordered access view.

Figure 4.3: A screenshot with more elaborate texturing.

A shader resource view (SRV) is a read-only resource that can be bound to any shader stage. A common use of SRVs is to bind textures to a shader.

A constant buffer (cbuffer) is a read-only resource on the GPU that can be updated frequently by the CPU between calls to the GPU. It is commonly used to store variables such as camera data or transformation matrices.

Unordered access views (UAV) were introduced in Direct3D 11 and can be bound only to pixel and compute shaders. The GPU has both read and write access to this type of resource.

The heightmap array that will contain the finished heightmap is bound to the shader as a UAV. Data needed by the noise function, such as permutation tables, gradients and exponents, are created on the CPU during start-up of the application and bound to the shader as SRVs. The following variables are bound to the shader via a constant buffer:

Tile position.

Tile width.

Tile depth.


Tile number of columns.

Tile number of rows.

Two constants used by the noise function that are calculated at start up.

Number of octaves.

Fractal dimension.

Lacunarity.
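On the C++ side, a constant buffer like this is typically mirrored by a plain struct whose layout matches the HLSL cbuffer. The sketch below is an assumption about such a struct: field names, types and ordering are hypothetical, and the two start-up noise constants are given placeholder names. It reflects two Direct3D 11 rules: a constant buffer's byte width must be a multiple of 16, and HLSL packs scalars into 16-byte registers, so the scalars are grouped four at a time with explicit padding.

```cpp
#include <cstdint>

// Hypothetical CPU-side mirror of the heightmap compute shader's constant buffer.
// Every member is a 4-byte scalar, grouped so each line fills one 16-byte register.
struct TileConstants {
    float         tilePosX, tilePosY, tileWidth, tileDepth;  // tile position (x, z) and size
    std::uint32_t colCount, rowCount;                        // number of columns / rows
    float         noiseConst0, noiseConst1;                  // start-up noise constants (placeholder names)
    std::uint32_t octaves;
    float         dimension, lacunarity;
    float         padding;                                   // fills the last register
};

// D3D11 rejects constant buffers whose ByteWidth is not a multiple of 16.
static_assert(sizeof(TileConstants) % 16 == 0, "cbuffer size must be a multiple of 16 bytes");
```

The matching HLSL cbuffer would declare the same members in the same order with float and uint types.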

The compute shader is dispatched with as many threads as there are vertices in the grid. Since the grid is not copied to the GPU, vertex positions need to be calculated for use as input to the noise function. This is done using tile width/depth, tile number of columns/rows, tile position and thread id, as shown in listing 4.1. Other than that, the fractal function looks the same as it does on the CPU.

Listing 4.1: Calculating position using thread id

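// Map the thread's grid coordinates to a local-space position on the tile
// (pos.y holds the world-space z coordinate)...
pos.x = -(width / 2.0) + threadId.x * (width / (colCnt - 1));
pos.y =  (depth / 2.0) - threadId.y * (depth / (rowCnt - 1));
// ...then translate it to world space using the tile's position.
pos.x += tilePosX;
pos.y += tilePosY;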

A resource that has been created with read/write access on the GPU cannot be read directly by the CPU. Therefore, once the compute shader is finished, the heightmap data needs to be copied to a resource that the CPU has read access to (a staging resource) before the data can be used in the subsequent steps of the terrain generation.

4.2 Software Agents

To add more detail to the terrain, software agents were utilized to create rivers and roads. The agents use a greedy search algorithm to find suitable vertices in the grid to add to their path. Each agent is given a vertex in the grid as its starting location and a direction that it aims to move along.

The starting vertex is added to the closed list and set as the current node. The algorithm looks at the current node's neighbours along the x- and z-axes; it does not move diagonally. If a neighbouring node is valid, i.e. in bounds and not already evaluated, it is given a value by a heuristic function and added to the open list. The open list is sorted in ascending order according to the nodes' heuristic values. The node at the front of the open list is added to the closed list and the open list is emptied. The last node in the closed list is set as the current node and the algorithm starts over by looking at its neighbours. The algorithm continues like this until a stopping criterion is met.
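A compact C++ sketch of this greedy search, assuming a row-major grid of cols × rows vertices and a caller-supplied heuristic. "Not already evaluated" is read here as "not already on the path", and a step limit stands in for the agent-specific stopping criteria; greedyPath, Node and maxSteps are hypothetical names. Picking the minimum directly is equivalent to sorting the open list in ascending order and taking its front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Node { int c, r; };   // column and row index in the tile grid

// Greedy search: score the valid 4-neighbours of the current node, move to the
// lowest-scoring one, empty the open list and repeat.
std::vector<Node> greedyPath(Node start, int cols, int rows, int maxSteps,
                             const std::function<float(const Node&, const Node&)>& heuristic) {
    assert(start.c >= 0 && start.c < cols && start.r >= 0 && start.r < rows);
    auto index = [cols](const Node& n) { return static_cast<std::size_t>(n.r) * cols + n.c; };
    std::vector<Node> closed{start};
    std::vector<bool> onPath(static_cast<std::size_t>(cols) * rows, false);
    onPath[index(start)] = true;
    const int dc[4] = {1, -1, 0, 0}, dr[4] = {0, 0, 1, -1};   // x/z neighbours only, no diagonals
    for (int step = 0; step < maxSteps; ++step) {
        const Node cur = closed.back();
        std::vector<std::pair<float, Node>> open;
        for (int k = 0; k < 4; ++k) {
            const Node n{cur.c + dc[k], cur.r + dr[k]};
            if (n.c < 0 || n.c >= cols || n.r < 0 || n.r >= rows || onPath[index(n)]) continue;
            open.push_back({heuristic(cur, n), n});
        }
        if (open.empty()) break;                                // nowhere left to go
        const auto best = std::min_element(open.begin(), open.end(),
            [](const auto& a, const auto& b) { return a.first < b.first; });
        onPath[index(best->second)] = true;
        closed.push_back(best->second);                         // the rest of `open` is discarded
    }
    return closed;
}
```

Because only the single best neighbour is kept at each step, the search never backtracks; river and road agents differ only in the heuristic and stopping rules they plug into this loop.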

Agents will stop if they encounter the edge of a tile. If that edge is adjacent to an already existing tile, one of two things will happen.

If the agent has already crossed tiles once, it is marked as finished and one third of the nodes are removed, starting from the back of the closed list.

This is to remove the noticeable edge that would otherwise appear between tiles.

If the agent is on its starting tile it will be removed completely.

The agents are not allowed to cross over to already existing tiles, since it would be strange if a tile already seen by the user changed its appearance. On the other hand, if there is empty space along the edge where the agent stopped, the agent is marked as incomplete and its last node is saved so that the agent can cross over once a tile is generated adjacent to that edge.
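The edge rules above can be summarised as a small decision function. This C++ sketch assumes an agent stores its closed list as vertex indices and tracks whether it has already crossed a tile boundary; the Agent fields, EdgeOutcome and onTileEdge are hypothetical names. "One third" is taken as the integer part of size / 3.

```cpp
#include <cassert>
#include <vector>

enum class EdgeOutcome { Finished, Removed, Incomplete };

struct Agent {
    std::vector<int> path;          // closed list as vertex indices
    bool crossedTilesOnce = false;
    bool finished = false;
    int savedLastNode = -1;         // where to resume when a neighbour tile appears
};

// Applies the tile-edge rules when an agent reaches the edge of its tile.
EdgeOutcome onTileEdge(Agent& agent, bool neighbourTileExists) {
    assert(!agent.path.empty());
    if (!neighbourTileExists) {                        // empty space: wait for a new tile
        agent.savedLastNode = agent.path.back();
        return EdgeOutcome::Incomplete;
    }
    if (agent.crossedTilesOnce) {                      // trim the last third to hide the seam
        agent.path.resize(agent.path.size() - agent.path.size() / 3);
        agent.finished = true;
        return EdgeOutcome::Finished;
    }
    return EdgeOutcome::Removed;                       // still on its starting tile: discard it
}
```

The caller is expected to delete an agent for which Removed is returned and to keep Incomplete agents until a tile is generated on the other side of that edge.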

Figure 4.4: A screenshot of a river created by software agents.

When a new tile is being generated, it checks whether any of its neighbours has an agent that wants to cross over to it. If that is the case, the tile creates an agent that starts where the neighbour's agent stopped. If no agents are trying to cross over, the tile starts a new river or road, or both. Since water tends to flow downwards, rivers start at the highest location on the tile. It does not necessarily need to be the highest location as long as it is high; however, the highest is the easiest to find without having to sort the values in the heightmap.

Roads are simply given the middle of the tile as a starting location. Agents are given the same direction as the one in which the new tile is being generated, e.g. if the new tile is to the left of the current tile, the agent tries to move to the left. This is done in an attempt to keep the agent from running into already existing tiles and thus being cut short.
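The starting locations and the assigned direction can be computed cheaply. The Python sketch below finds the river start with a single linear pass over the heightmap, so no sorting is needed. It assumes a heightmap stored as rows indexed `[z][x]` and that the new tile is directly adjacent along one axis; the function names are hypothetical.

```python
from typing import List, Tuple

def river_start(heightmap: List[List[float]]) -> Tuple[int, int]:
    """(x, z) of the highest sample, found in one linear pass without sorting."""
    best_x, best_z, best_h = 0, 0, heightmap[0][0]
    for z, row in enumerate(heightmap):
        for x, h in enumerate(row):
            if h > best_h:
                best_x, best_z, best_h = x, z, h
    return best_x, best_z

def road_start(width: int, depth: int) -> Tuple[int, int]:
    """Roads start in the middle of the tile."""
    return width // 2, depth // 2

def agent_direction(current_tile: Tuple[int, int],
                    new_tile: Tuple[int, int]) -> Tuple[int, int]:
    """Direction in which the new tile lies, e.g. (-1, 0) for 'to the left'.

    Assumes the two tiles are neighbours along the x- or z-axis.
    """
    return new_tile[0] - current_tile[0], new_tile[1] - current_tile[1]
```

Finding the maximum is O(n) in the number of heightmap samples, whereas sorting to pick "a high location" would cost O(n log n).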

The river agents use a heuristic function that lowers a node's heuristic value if the node lies in the direction the agent has been assigned to follow.

If the node has a lower height value than the current node, its heuristic value is lowered; if it has a greater height value than the current node, it is given a greater heuristic value. If the agent's only option is a node with a heuristic value that is too great, the agent marks itself as finished and stops, since this means that its only option is to travel uphill and water tends to flow downhill.
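One way to express this heuristic is shown in the Python sketch below. Lower scores are better. The weights `slope_weight` and `direction_bonus`, and the stop threshold `max_score`, are hypothetical tuning values, not values from the thesis.

```python
from typing import List, Tuple

Node = Tuple[int, int]          # (x, z)
Direction = Tuple[int, int]     # unit step along x or z

def river_heuristic(heights: List[List[float]], current: Node, candidate: Node,
                    direction: Direction, slope_weight: float = 10.0,
                    direction_bonus: float = 1.0) -> float:
    """Lower is better: downhill moves and the assigned direction are favoured."""
    dh = heights[candidate[1]][candidate[0]] - heights[current[1]][current[0]]
    score = slope_weight * dh   # negative when going downhill, positive uphill
    step = (candidate[0] - current[0], candidate[1] - current[1])
    if step == direction:
        score -= direction_bonus
    return score

def river_should_stop(best_score: float, max_score: float = 0.0) -> bool:
    """If even the best option scores too high (uphill), the river ends."""
    return best_score > max_score
```

Setting `max_score` to zero means the river stops as soon as every remaining option is uphill, unless the direction bonus is large enough to offset a small climb.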

Once a path has been found, it is traversed and the surrounding area is modified. The terrain is eroded to create a riverbed, which is also given a sand texture by writing to a blend map. The results of this process can be seen in figure 4.4.
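The erosion step can be sketched as a brush applied along the path. This Python sketch assumes a circular brush with linear falloff. It collects the strongest influence per cell first, so overlapping path nodes do not erode the same cell twice, and then lowers the heightmap and writes a sand weight into a single blend-map channel. The `radius` and `depth` parameters and the one-channel blend map are hypothetical.

```python
import math
from typing import List, Tuple

def carve_riverbed(heights: List[List[float]], blend_sand: List[List[float]],
                   path: List[Tuple[int, int]], radius: int = 2,
                   depth: float = 1.0) -> None:
    """Lower terrain around a river path and paint sand into the blend map."""
    rows, cols = len(heights), len(heights[0])
    influence = [[0.0] * cols for _ in range(rows)]
    for px, pz in path:
        for z in range(max(0, pz - radius), min(rows, pz + radius + 1)):
            for x in range(max(0, px - radius), min(cols, px + radius + 1)):
                d = math.hypot(x - px, z - pz)
                if d <= radius:
                    # Linear falloff: 1 on the path, decreasing towards the bank.
                    influence[z][x] = max(influence[z][x], 1.0 - d / (radius + 1))
    for z in range(rows):
        for x in range(cols):
            f = influence[z][x]
            if f > 0.0:
                heights[z][x] -= depth * f                  # erode the riverbed
                blend_sand[z][x] = max(blend_sand[z][x], f)  # sand texture weight
```

The function modifies both maps in place, mirroring the idea that the river only changes data that has already been generated for the tile.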

Figure 4.5: A screenshot of a road created by software agents.

Unlike the river agents, road agents do not look at the closest neighbours of the current node. Instead, they look seven steps ahead. This is because the agents tend to follow a zigzag pattern, which gave satisfying results for rivers by making them wider in certain places, but looks unnatural for roads. Increasing the distance between nodes added to the path remedied this problem. Road agents are also not allowed to move opposite the direction they have been assigned to follow, since road agents often made U-turns and ran into nodes that had already been added to the path, causing them to get stuck. The road agents use a heuristic function that is similar to the one river agents use in that it lowers the heuristic value of a node that lies in the direction the agent has been told to follow. But instead of looking at height values, this heuristic function looks at the angle between the surface normal and a vector pointing straight up: a small angle lowers the score and a large angle raises it. These angles have already been calculated when texturing the terrain and can thus be read from the blend map (see section 4.1). This causes the agent to keep to flat ground and avoid moving up and down steep inclines.
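The following Python sketch covers the two road-specific rules: candidates seven steps away with no U-turns, and a score based on the angle between the surface normal and the up vector. In the thesis the angles are read from the blend map. Here the sketch computes the angle from a normal vector for self-containment, which is an assumption. The weights `angle_weight` and `direction_bonus` are hypothetical.

```python
import math
from typing import List, Tuple

Step = Tuple[int, int]
Node = Tuple[int, int]
STRIDE = 7  # road agents look seven steps ahead

def slope_angle(normal: Tuple[float, float, float]) -> float:
    """Angle in radians between a surface normal and the up vector (0, 1, 0)."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return math.acos(max(-1.0, min(1.0, ny / length)))

def road_candidates(current: Node, direction: Step, width: int,
                    depth: int) -> List[Tuple[Step, Node]]:
    """Nodes STRIDE steps away, excluding the direction opposite the assigned one."""
    x, z = current
    opposite = (-direction[0], -direction[1])
    out = []
    for step in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if step == opposite:
            continue  # no U-turns
        n = (x + step[0] * STRIDE, z + step[1] * STRIDE)
        if 0 <= n[0] < width and 0 <= n[1] < depth:
            out.append((step, n))
    return out

def road_heuristic(normal: Tuple[float, float, float], step: Step, direction: Step,
                   angle_weight: float = 1.0, direction_bonus: float = 0.5) -> float:
    """Lower is better: flat ground and the assigned direction are favoured."""
    score = angle_weight * slope_angle(normal)
    if step == direction:
        score -= direction_bonus
    return score
```

Flat ground, with a normal of (0, 1, 0), contributes an angle of zero, so the direction bonus alone decides between equally flat options.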

Just like with the river agents, once a path has been found, it is traversed and the terrain is modified. In this case no changes are made to the geometry; only the texture is changed to give the look of a dirt road. An example of this can be seen in figure 4.5.

4.3 Deterministic Tiles

In order to make the tile generation deterministic, so that the terrain looks the same when the user returns to a previously visited area, data on each tile must be saved when the tile is generated for the first time. Each time a new tile is generated, a struct containing the following data is saved to an array:

Tile position.

River start position.

River direction.

Road start position.

Road direction.

After new elements have been added to the array, it is sorted in ascending order based on tile position using heapsort, to allow for faster searching using binary search. When a new tile is generated, it first searches this array to see whether there has previously been a tile at the same position. If a match is found, the information above can be used to recreate the tile. If the direction for the river or the road is zero, the tile has no river or no road, respectively.
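The record array, heapsort and binary search can be sketched in Python as follows. The `TileRecord` fields mirror the list above, but the integer pair types and the use of `(0, 0)` as "zero direction" are assumptions. The heapsort is built on `heapq`, and the lookup is a hand-written binary search over tile positions in lexicographic order.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple

Pair = Tuple[int, int]

@dataclass(frozen=True)
class TileRecord:
    tile_pos: Pair      # tile grid coordinate, used as the sort key
    river_start: Pair
    river_dir: Pair     # (0, 0) means "no river on this tile"
    road_start: Pair
    road_dir: Pair      # (0, 0) means "no road on this tile"

def heapsort(records: List[TileRecord]) -> List[TileRecord]:
    """Sort by tile position; the index breaks ties so records are never compared."""
    heap = [(r.tile_pos, i, r) for i, r in enumerate(records)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

def find_tile(sorted_records: List[TileRecord], pos: Pair) -> Optional[TileRecord]:
    """Binary search on tile position; None if the tile has never been generated."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = sorted_records[mid].tile_pos
        if key == pos:
            return sorted_records[mid]
        if key < pos:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

def has_river(r: TileRecord) -> bool:
    return r.river_dir != (0, 0)

def has_road(r: TileRecord) -> bool:
    return r.road_dir != (0, 0)
```

Each lookup costs O(log n) comparisons. Re-sorting after every insertion costs O(n log n), which is one reason a sorted insert or a hash map keyed on tile position may be worth considering as the array grows.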

No information needs to be saved to recreate the heightmap since the input to the fractal function (which always gives the same output for a given input) is the vertex position, which does not change.
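The key property here is that the height is a pure function of world position. The Python sketch below uses fractional Brownian motion over hash-based value noise as a stand-in fractal function. It is not the function used in the thesis, and the hash constants, `seed`, octave count and base frequency are arbitrary assumptions. Because no state is kept between calls, a vertex on an edge shared by two tiles receives the same height from both tiles, and revisiting an area reproduces the same heightmap.

```python
import math

def _hash(ix: int, iz: int, seed: int) -> float:
    """Stateless integer hash of a lattice point to [0, 1)."""
    h = (ix * 374761393 + iz * 668265263 + seed * 2147483647) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 4294967296.0

def _smooth(t: float) -> float:
    return t * t * (3.0 - 2.0 * t)   # smoothstep fade curve

def value_noise(x: float, z: float, seed: int) -> float:
    """Bilinearly blended lattice values; deterministic for a given (x, z, seed)."""
    ix, iz = math.floor(x), math.floor(z)
    fx, fz = _smooth(x - ix), _smooth(z - iz)
    a, b = _hash(ix, iz, seed), _hash(ix + 1, iz, seed)
    c, d = _hash(ix, iz + 1, seed), _hash(ix + 1, iz + 1, seed)
    top = a + (b - a) * fx
    bottom = c + (d - c) * fx
    return top + (bottom - top) * fz

def fbm_height(x: float, z: float, octaves: int = 6, lacunarity: float = 2.0,
               gain: float = 0.5, seed: int = 1337) -> float:
    """Sum of noise octaves, normalised to [0, 1); input is the vertex world position."""
    total, amplitude, frequency, norm = 0.0, 1.0, 1.0 / 64.0, 0.0
    for _ in range(octaves):
        total += amplitude * value_noise(x * frequency, z * frequency, seed)
        norm += amplitude
        amplitude *= gain
        frequency *= lacunarity
    return total / norm
```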


Chapter 5 Experiment

An experiment was conducted that measured the performance of the application with different configurations for four different tile sizes. The aim is to find out the performance differences between CPU and GPU and how much the software agents impact performance.

During the experiment the user had no control over the application; the camera moved by itself at a constant velocity, making a ninety-degree turn every minute. Control of the camera was taken away from the user to ensure the same movement pattern each time the application was executed. The same movement pattern means that the same number of tiles is generated and that rivers and roads are created the same way, since they depend on the direction in which the user is moving. Each time the application was executed it ran for five minutes. Each time the terrain expanded, the time it took to generate all of the new tiles was measured. Most commonly three to five tiles are generated each time the terrain expands (see figure 4.1). Time was measured using the QueryPerformanceCounter [15] and QueryPerformanceFrequency [16] functions. Once the application had finished, a mean value and standard deviation were calculated and written to file along with all the collected time stamps. The following four tile sizes were tested:

64x64

128x128

256x256

512x512

These four sizes were tested on the CPU, with and without river/road agents, and on the GPU, with and without river/road agents. The GPU version was also tested with three different thread group sizes: 8x8x1, 16x16x1 and 32x32x1.
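The thread group size determines how many groups must be dispatched to cover a tile. The Python sketch below computes the group counts that would be passed to `ID3D11DeviceContext::Dispatch` [14]. It assumes one compute thread per heightmap sample on a `tile_size` x `tile_size` grid, which is an assumption about the kernel layout. The check uses the Direct3D 11 limit of 1024 threads per group, which the 32x32x1 configuration reaches exactly.

```python
from typing import Tuple

MAX_THREADS_PER_GROUP = 1024  # Direct3D 11 compute shader limit per thread group

def dispatch_groups(tile_size: int, group_size: int) -> int:
    """Thread groups needed along one axis: ceil(tile_size / group_size)."""
    return -(-tile_size // group_size)

def dispatch_args(tile_size: int, group: Tuple[int, int]) -> Tuple[int, int, int]:
    """(X, Y, Z) group counts for a square tile with one thread per sample."""
    gx, gy = group
    if gx * gy > MAX_THREADS_PER_GROUP:
        raise ValueError("thread group exceeds 1024 threads")
    return dispatch_groups(tile_size, gx), dispatch_groups(tile_size, gy), 1
```

For instance, a 512x512 tile with 32x32x1 groups needs a 16x16x1 dispatch, while 8x8x1 groups need 64x64x1. The total thread count is the same in both cases, which fits the small differences between group sizes reported in chapter 6.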

The results of the experiment are presented in chapter 6. The experiment was executed on a PC with the hardware specification presented in table 5.1.
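The measurement procedure amounts to timing each terrain expansion and summarising the samples. The Python sketch below shows the tick-to-millisecond conversion used with QueryPerformanceCounter/QueryPerformanceFrequency, then uses Python's `time.perf_counter` as a comparable high-resolution clock for a runnable version. `generate_tiles` is a hypothetical callback, and the choice of the sample standard deviation rather than the population one is an assumption, since the text does not specify which was used.

```python
import statistics
import time
from typing import Callable, List, Tuple

def ticks_to_ms(start: int, end: int, frequency: int) -> float:
    """Convert performance-counter ticks to milliseconds (frequency in ticks/s)."""
    return (end - start) * 1000.0 / frequency

def time_expansions(generate_tiles: Callable[[], None], expansions: int) -> List[float]:
    """Time each expansion, i.e. generation of all new tiles, in milliseconds."""
    samples = []
    for _ in range(expansions):
        t0 = time.perf_counter()
        generate_tiles()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return samples

def summarize(samples_ms: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the collected times."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)
```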

No attempt to evaluate the visual quality of the terrain was made during the experiment. The terrain was inspected by the author to some extent during development to make sure the algorithm delivered visual quality to the author's liking and did not produce faulty content, such as tearing between tiles or agents behaving in a manner they are not supposed to. However, this experiment is strictly focused on the performance of the application.

Operating System     Windows 8.1 Pro, 64-bit
CPU                  Intel Core i7-2600K 3.40 GHz
RAM                  8 GB DDR3 1600 MHz
GPU                  NVIDIA GeForce GTX 460 1 GB GDDR5
GPU Driver Version   332.21

Table 5.1: Hardware specification for the PC used in the experiment.


Chapter 6 Results

The experiment yielded results showing that the application benefits significantly from GPU utilization. As can be seen in figure 6.1, the GPU is slightly faster than the CPU at tile size 64x64. As the tiles increase in size, the GPU pulls further and further ahead. At tile size 512x512 the GPU is more than twice as fast as the CPU.

Figure 6.1: Comparison of execution times (in ms) between CPU and GPU (chart "CPU GPU Comparison": elapsed time in ms against terrain tile size).

When comparing different thread group sizes on the GPU, only small differences in performance were detected (see figure 6.2). Depending on tile size, the difference between the fastest and slowest group size ranges from less than a millisecond to 2.6 milliseconds.

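Running the application without software agents on both CPU and GPU improved performance (see figures 6.3 and 6.4). It also revealed heightmap generation as the part of the algorithm taking the most time, except at the smallest tile size, where the agents account for more than half of the execution time.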


Tile Size   CPU (ms)   GPU (ms)
64x64       19.1       14.6
128x128     56.2       35.1
256x256     145.4      74.3
512x512     536.0      249.8

Table 6.1: Comparison of execution times (in ms) between CPU and GPU.
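The relative speed-up follows directly from table 6.1. This small Python sketch divides CPU time by GPU time for each tile size. The values are copied from the table; nothing else is assumed.

```python
TABLE_6_1 = {  # tile size: (CPU ms, GPU ms), copied from table 6.1
    "64x64": (19.1, 14.6),
    "128x128": (56.2, 35.1),
    "256x256": (145.4, 74.3),
    "512x512": (536.0, 249.8),
}

def speedups(table):
    """CPU time divided by GPU time for each tile size."""
    return {size: cpu / gpu for size, (cpu, gpu) in table.items()}

for size, s in speedups(TABLE_6_1).items():
    print(f"{size}: GPU is {s:.2f}x faster than CPU")
```

The ratio grows from about 1.31 at 64x64 through 1.60 and 1.96 to about 2.15 at 512x512, which quantifies the "more than twice as fast" observation.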

Figure 6.2: Comparison of execution times (in ms) between different thread group sizes (8x8, 16x16, 32x32) on the GPU (chart "GPU Thread Group Size Comparison": elapsed time in ms against terrain tile size).

Tile Size   8x8 (ms)   16x16 (ms)   32x32 (ms)
64x64       14.63      15.51        14.91
128x128     35.14      35.96        36.63
256x256     75.79      74.92        74.26
512x512     252.4      249.8        252.0

Table 6.2: Comparison of execution times (in ms) between different thread group sizes on the GPU.

Figure 6.3: Comparison of execution times (in ms) with and without software agents on the CPU (chart "CPU Agent Comparison": elapsed time in ms against terrain tile size, agents on and off).

Tile Size   Agents On (ms)   Agents Off (ms)
64x64       19.08            6.786
128x128     56.17            40.29
256x256     145.5            121.3
512x512     536.0            457.3

Table 6.3: Comparison of execution times (in ms) with and without software agents on the CPU.

Figure 6.4: Comparison of execution times (in ms) with and without software agents on the GPU (chart "GPU Agent Comparison": elapsed time in ms against terrain tile size, agents on and off).

Tile Size   Agents On (ms)   Agents Off (ms)
64x64       14.63            6.932
128x128     35.14            22.43
256x256     74.26            53.52
512x512     249.8            179.4

Table 6.4: Comparison of execution times (in ms) with and without software agents on the GPU.
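The share of execution time attributable to the agents can be estimated from tables 6.3 and 6.4 as (on − off) / on. This Python sketch uses the values from those tables. It assumes that all other work is unchanged when the agents are disabled.

```python
AGENTS = {  # tile size: (agents on ms, agents off ms), from tables 6.3 and 6.4
    "CPU": {"64x64": (19.08, 6.786), "128x128": (56.17, 40.29),
            "256x256": (145.5, 121.3), "512x512": (536.0, 457.3)},
    "GPU": {"64x64": (14.63, 6.932), "128x128": (35.14, 22.43),
            "256x256": (74.26, 53.52), "512x512": (249.8, 179.4)},
}

def agent_share(on_ms: float, off_ms: float) -> float:
    """Fraction of total time attributable to the software agents."""
    return (on_ms - off_ms) / on_ms

for device, rows in AGENTS.items():
    for size, (on, off) in rows.items():
        print(f"{device} {size}: agents ~{agent_share(on, off):.0%} of the time")
```

On the CPU the agents account for roughly 64% of the time at 64x64 but only about 15% at 512x512. This is consistent with the heightmap dominating at larger tile sizes.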


Chapter 7 Analysis

As mentioned in the method chapter, real-time was defined as at least 15 fps, and it was stated that most video games run at either 30 or 60 fps. For easier comparison with the data presented in chapter 6, table 7.1 expresses these frame rates as the maximum time each frame may take if the frame rate is to be achieved.

FPS   Max frame time (ms)
15    67
30    33
60    17

Table 7.1: Frame rate expressed as maximum time each frame is allowed to take in ms.

When comparing these values to the results presented in table 6.1, we can see that the CPU does indeed create tiles in real-time for the tile sizes 64x64 and 128x128. Once a size of 256x256 is reached, however, the CPU is far past the limit for what can be considered real-time. The GPU version of the application is also capable of creating tiles in real-time for the sizes 64x64 and 128x128. Even though the GPU version is significantly faster than the CPU version, it still takes too long to be considered real-time once tiles reach a size of 256x256. This answers both RQ1 and RQ2.
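The comparison above can be checked mechanically. This Python sketch converts a frame rate into a per-frame budget (1000 / fps ms) and tests each entry of table 6.1 against it. Only the table values are taken from the text; the helper names are illustrative.

```python
TABLE_6_1 = {  # tile size: (CPU ms, GPU ms), copied from table 6.1
    "64x64": (19.1, 14.6),
    "128x128": (56.2, 35.1),
    "256x256": (145.4, 74.3),
    "512x512": (536.0, 249.8),
}

def frame_budget_ms(fps: float) -> float:
    """Maximum time one frame may take to sustain the given frame rate."""
    return 1000.0 / fps

def meets_budget(elapsed_ms: float, fps: float) -> bool:
    return elapsed_ms <= frame_budget_ms(fps)

for size, (cpu, gpu) in TABLE_6_1.items():
    verdict = {fps: (meets_budget(cpu, fps), meets_budget(gpu, fps)) for fps in (15, 30, 60)}
    print(size, verdict)  # fps -> (CPU meets budget, GPU meets budget)
```

At 15 fps the budget is about 66.7 ms, so both versions pass at 64x64 and 128x128 and both fail from 256x256 upwards. The GPU's 74.3 ms misses the budget only narrowly.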

It is worth noting that even though the tile size 128x128 can definitely be considered to be generated in real-time, it is not fast enough for most video games: both the CPU (56.2 ms) and the GPU (35.1 ms) exceed the 33 ms available per frame at 30 fps.

Additional speed-up would be required for this to be applicable as anything more than a tech demo. This is especially true considering that what is measured here is strictly the time for generating tiles; in e.g. a video game, time will also be spent each frame on things such as game logic and physics.

To answer RQ3, we need only look at figure 6.1, which shows that the GPU is faster than the CPU already at the smallest tile size tested.

To ensure that the differences used in the CPU/GPU comparison are statistically significant, a two-tailed unpaired t-test was performed. The t-test returned p-values far below any conventional significance level (e.g. 0.05) for every tile size, meaning the differences are statistically significant with a confidence close to 100% (see table 7.2).

Tile Size   t-test (p-value)   Significance Probability
64x64       3.5 × 10⁻²⁴        100%
128x128     5.3 × 10⁻⁶⁴        100%
256x256     1.2 × 10⁻⁹⁸        100%
512x512     2.1 × 10⁻⁷³        100%

Table 7.2: Results from a Student's t-test on the collected data for the CPU/GPU comparison.
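The following Python sketch shows how an unpaired two-sample t statistic with pooled variance (Student's t-test) is computed. The p-value here uses a normal approximation, which is only reasonable when there are many samples per group, as with five-minute runs. This approximation is an assumption of the sketch and not necessarily how the thesis computed its values. With SciPy available, `scipy.stats.ttest_ind(a, b)` returns the statistic together with an exact two-tailed p-value.

```python
import math
import statistics
from typing import Sequence, Tuple

def unpaired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)   # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2

def two_tailed_p_large_sample(t: float) -> float:
    """Two-tailed p-value from the normal approximation to Student's t.

    Only reasonable when the degrees of freedom are large.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))
```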


Chapter 8 Discussion

The terrain created in this study is actually not endless. As stated in section 4.3, a small amount of data needs to be stored for each tile to be able to recreate it. This amount is 40 bytes, which is modest compared to the memory that would be required to store an entire tile, where each vertex is 32 bytes. Still, once the user has crossed enough tiles, the computer will run out of memory.
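A quick calculation shows the scale of the memory limit. The Python sketch below assumes a hypothetical 40-byte record layout of five (x, z) pairs of 32-bit integers, which matches the five saved items but is not confirmed as the thesis layout. It also assumes a tile mesh of `tile_size` x `tile_size` vertices at the stated 32 bytes per vertex.

```python
import struct

RECORD_FORMAT = "<10i"   # hypothetical layout: five (x, z) pairs of 32-bit ints
VERTEX_BYTES = 32        # vertex size stated in the text

def tile_mesh_bytes(tile_size: int) -> int:
    """Memory for a full tile mesh of tile_size x tile_size vertices."""
    return tile_size * tile_size * VERTEX_BYTES

def records_that_fit(budget_bytes: int) -> int:
    """How many per-tile records fit in the given memory budget."""
    return budget_bytes // struct.calcsize(RECORD_FORMAT)
```

Under these assumptions a 512x512 tile mesh takes 8 MiB, while 1 GiB holds about 26.8 million 40-byte records. The limit is therefore real but far away.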

Besides memory usage, there is another limiting factor that prevents the terrain from being endless. The tile coordinates are stored in a 32-bit float variable, which cannot grow indefinitely. Granted, the user has to move a great distance before this becomes an issue, and the problem could be further postponed by using a 64-bit double variable instead, but the limit is still there.
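The precision limit can be demonstrated by rounding values to 32-bit floats. This Python sketch uses the standard `struct` module to emulate single precision. Above 2^24 (16,777,216), consecutive 32-bit floats are 2 apart, so coordinates start to snap to even numbers. The helper names are illustrative, and the spacing function assumes a positive, finite input.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_spacing(x: float) -> float:
    """Gap to the next representable 32-bit float above x (x positive and finite)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - to_float32(x)
```

A 64-bit double pushes the same effect out to 2^53, which postpones the problem but does not remove it.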

8.1 Conclusions

An application was developed that allows the user to walk around on a tile-based procedurally generated terrain. The application was developed in two versions: one that exclusively utilizes the CPU and one that utilizes both CPU and GPU for generating the terrain. Performance was measured in order to answer the questions stated in chapter 3.

The results showed that it was indeed possible to run the application with real-time performance on the CPU for two of the tested tile sizes: 64x64 and 128x128. At a tile size of 256x256, real-time performance was no longer possible on the CPU. The GPU version was faster than the CPU version for all tile sizes.

Even though the GPU version was significantly faster than the CPU version, it was not fast enough to achieve real-time performance on tiles of 256x256 and larger.

It was also shown that while tiles of size 128x128 could be generated in real-time, they could not be generated fast enough for most video games. This means that out of the four tile sizes tested, the smallest (64x64) is the only one that can be generated fast enough for use in a video game. However, 64x64 is a fairly small size, resulting in relatively unimpressive terrain. The technique demonstrated in this study is thus not suitable for use in a video game context without further improvements to its performance.

While the application was still not fast enough for use in video games, the GPU proved a powerful tool for speeding it up. At larger tile sizes the GPU version became more than twice as fast. Utilization of the GPU is definitely worth considering when working with PTG algorithms that exhibit a similar level of parallelism.

Despite poor performance at larger tile sizes, the tile-based approach produced a functional terrain. This shows that a tile-based approach is feasible for creating a large terrain and is worth investigating further to see whether performance can be improved enough to make the technique suitable for a video game or a similar type of application.

8.2 Future Work

Figures 6.3 and 6.4 showed that removing the software agents gave an increase in performance, but that most of the work was done by the heightmap generation. It would be interesting to investigate further optimizations of that part of the code. One example could be to tailor the fractal function more closely to the GPU to better tap into its computing power.

Another interesting direction would be to split up the tile generation over several frames. As shown in figure 4.1, three to five tiles are generated when the terrain expands. By splitting them up and generating only one tile per frame, one might cut the per-frame execution times presented in this study by approximately two thirds. One might also consider using a larger number of small tiles instead of a few large ones, with two or more layers surrounding the tile the user is on instead of just one as shown in figure 4.1. This could also be coupled with a level-of-detail algorithm that allows tiles further away to stay partially generated until the user gets close enough to notice a difference, thus spreading the workload over several frames.
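Spreading tile generation over frames can be as simple as a work queue that is drained a fixed number of tiles per frame. The Python sketch below uses a hypothetical `TileQueue` class and `generate` callback; it illustrates the idea proposed above and is not part of the studied application.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple

Tile = Tuple[int, int]

class TileQueue:
    """Spread tile generation over frames: at most N tiles per update."""

    def __init__(self, generate: Callable[[Tile], None], tiles_per_frame: int = 1):
        self._generate = generate
        self._tiles_per_frame = tiles_per_frame
        self._pending: Deque[Tile] = deque()

    def request(self, tiles: Iterable[Tile]) -> None:
        """Queue tiles needed after a terrain expansion, skipping duplicates."""
        for t in tiles:
            if t not in self._pending:
                self._pending.append(t)

    def update(self) -> List[Tile]:
        """Call once per frame; returns the tiles generated this frame."""
        done = []
        while self._pending and len(done) < self._tiles_per_frame:
            tile = self._pending.popleft()
            self._generate(tile)
            done.append(tile)
        return done

    @property
    def pending(self) -> int:
        return len(self._pending)
```

With three queued tiles and `tiles_per_frame=1`, the expansion completes over three frames, each paying roughly a third of the cost. The trade-off is that the outermost tiles appear a few frames later.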


References

[1] Tomas Akenine-Möller, Eric Haines, and Naty Hoffman. Real-Time Rendering. A K Peters, Ltd., 3rd edition, 2008.

[2] Dmitry Andreev. Real-time frame rate up-conversion for video games: Or how to get from 30 to 60 fps for "free". In ACM SIGGRAPH 2010 Talks, SIGGRAPH '10, pages 16:1–16:1, New York, NY, USA, 2010. ACM.

[3] J.P. Arun, M. Mishra, and S.V. Subramaniam. Parallel implementation of MOPSO on GPU using OpenCL and CUDA. In High Performance Computing (HiPC), 2011 18th International Conference on, pages 1–10, Dec 2011.

[4] G. Bernabe, G.D. Guerrero, and J. Fernandez. CUDA and OpenCL implementations of 3D fast wavelet transform. In Circuits and Systems (LASCAS), 2012 IEEE Third Latin American Symposium on, pages 1–4, Feb 2012.

[5] Mark Claypool and Kajal Claypool. Perspectives, frame rates and resolutions: It's all in the game. In Proceedings of the 4th International Conference on Foundations of Digital Games, FDG '09, pages 42–49, New York, NY, USA, 2009. ACM.

[6] J. Doran and I. Parberry. Controlled procedural terrain generation using software agents. Computational Intelligence and AI in Games, IEEE Transactions on, 2(2):111–119, June 2010.

[7] Jianbin Fang, A.L. Varbanescu, and H. Sips. A comprehensive performance comparison of CUDA and OpenCL. In Parallel Processing (ICPP), 2011 International Conference on, pages 216–225, Sept 2011.

[8] K.D. Forbus, J.V. Mahoney, and K. Dill. How qualitative spatial reasoning can improve strategy game AIs. Intelligent Systems, IEEE, 17(4):25–30, July 2002.

[9] Jean-David Génevaux, Éric Galin, Eric Guérin, Adrien Peytavie, and Bedřich Beneš. Terrain generation using procedural models based on hydrology. ACM Trans. Graph., 32(4):143:1–143:13, July 2013.


[10] Khronos Group. The open standard for parallel programming of heterogeneous systems. https://www.khronos.org/opencl/. [Online; accessed 12-02-2014].

[11] Kamran Karimi, Neil G. Dickson, and Firas Hamze. A performance comparison of CUDA and OpenCL. CoRR, abs/1005.2581, 2010.

[12] Benjamin Mistal. GPU terrain subdivision and tessellation. In Wolfgang Engel, editor, GPU Pro 4: Advanced Rendering Techniques, pages 3–20. A K Peters, 2013.

[13] MSDN. C++ AMP. http://msdn.microsoft.com/en-us/library/hh265137.aspx. [Online; accessed 12-02-2014].

[14] MSDN. ID3D11DeviceContext::Dispatch method. http://msdn.microsoft.com/en-us/library/windows/desktop/ff476405%28v=vs.85%29.aspx. [Online; accessed 13-05-2014].

[15] MSDN. QueryPerformanceCounter function. http://msdn.microsoft.com/en-us/library/windows/desktop/ms644904%28v=vs.85%29.aspx. [Online; accessed 10-02-2014].

[16] MSDN. QueryPerformanceFrequency function. http://msdn.microsoft.com/en-us/library/windows/desktop/ms644905%28v=vs.85%29.aspx. [Online; accessed 10-02-2014].

[17] S. Mukherjeet, N. Moore, J. Brock, and M. Leeser. CUDA and OpenCL implementations of 3D CT reconstruction for biomedical imaging. In High Performance Extreme Computing (HPEC), 2012 IEEE Conference on, pages 1–6, Sept 2012.

[18] F. Kenton Musgrave, David S. Ebert, Darwyn Peachy, Ken Perlin, and Steven Worley. Texturing & Modeling: A Procedural Approach. Morgan Kaufmann Publishers, 3rd edition, 2003.

[19] NVIDIA. What is CUDA. https://developer.nvidia.com/what-cuda. [Online; accessed 12-02-2014].

[20] Jacob Olsen. Realtime procedural terrain generation - realtime synthesis of eroded fractal terrain for use in computer games, 2004.

[21] Ken Perlin. Improving noise. ACM Trans. Graph., 21(3):681–682, July 2002.

[22] Matthew J. P. Regan, Gavin S. P. Miller, Steven M. Rubin, and Chris Kogelnik. A real-time low-latency hardware light-field renderer. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '99, pages 287–290, New York, NY, USA, 1999. ACM Press/Addison-Wesley Publishing Co.


[23] Ruben M. Smelik, Tim Tutenel, Klas Jan de Kraker, and Rafael Bidarra. Declarative terrain modeling for military training games. International Journal of Computer Games Technology, 2010, 2010.

[24] Gillian Smith, Elaine Gan, Alexei Othenin-Girard, and Jim Whitehead. PCG-based game design: Enabling new play experiences through procedural content generation. In Proceedings of the 2nd International Workshop on Procedural Content Generation in Games, PCGames '11, pages 7:1–7:4, New York, NY, USA, 2011. ACM.

[25] Julian Togelius, Emil Kastbjerg, David Schedl, and Georgios N. Yannakakis. What is procedural content generation?: Mario on the borderline. In Proceedings of the 2nd International Workshop on Procedural Content Generation in Games, PCGames '11, pages 3:1–3:6, New York, NY, USA, 2011. ACM.

[26] Julian Togelius, Mike Preuss, and Georgios N. Yannakakis. Towards multiobjective procedural map generation. In Proceedings of the 2010 Workshop on Procedural Content Generation in Games, PCGames '10, pages 3:1–3:8, New York, NY, USA, 2010. ACM.
