Master of Science Thesis in Electrical Engineering

Department of Electrical Engineering, Linköping University, 2016

GPU-based visualisation of viewshed from roads or areas in a 3D environment

Christoph Heilmair


Master of Science Thesis in Electrical Engineering

GPU-based visualisation of viewshed from roads or areas in a 3D environment

Christoph Heilmair

LiTH-ISY-EX--16/4951--SE

Supervisor: Harald Nautsch, ISY, Linköping University
            Daniel Mellergård, Vricon
            Filip Thordarson, Vricon
Examiner: Ingemar Ragnemalm, ISY, Linköping University

Information Coding
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2016 Christoph Heilmair


Abstract

Viewshed refers to the calculation and visualisation of which parts of a terrain are visible from a given observer point. It is used within many fields, such as military planning or telecommunication tower placement. So far, no general fast methods exist for calculating the viewshed for multiple observers that may, for instance, represent a road within the terrain. Additionally, if the terrain contains overlapping structures, such as man-made constructions like bridges, most current viewshed algorithms fail. This report describes two novel methods for viewshed calculation using multiple observers, for terrain that may contain overlapping structures. The methods have been developed at Vricon in Linköping as a Master's Thesis project. Both methods are implemented on the graphics processing unit (GPU) using the OpenGL graphics library, taking a computer graphics approach. Results are presented in the form of figures and images, as well as running time tables using two different test setups. Lastly, possible future improvements are discussed. The results show that the first method is a viable real-time solution and that the second method requires some additional work.


Acknowledgments

I would like to thank my girlfriend Emma for helping and supporting me with this thesis. I would also like to especially thank my supervisors at Vricon, Filip Thordarson and Daniel Mellergård, for always taking the time to help me when I needed it. Lastly, Ingemar Ragnemalm at Linköping University has been a great help when discussing new ideas and methods.

Linköping, May 2016 Christoph Heilmair


Contents

Notation
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Goal
  1.3 Problem formulation
  1.4 Method

2 Theory
  2.1 Brief overview of 3D rendering
    2.1.1 The OpenGL rendering pipeline
    2.1.2 Projection matrices
    2.1.3 Depth buffers
    2.1.4 Framebuffers
  2.2 Digital elevation models
    2.2.1 Heightmap representation
    2.2.2 Triangulated irregular networks
    2.2.3 Summary
  2.3 Visibility

3 Related work
  3.1 Direct LOS based viewshed algorithms
    3.1.1 R3 algorithm
    3.1.2 R2 algorithm
    3.1.3 Xdraw algorithm
    3.1.4 Summary
  3.2 Computer graphics based algorithms
    3.2.1 Shadowmap approach
    3.2.2 Occlusive volumes
    3.2.3 Summary

4 Implementation
  4.1 Test environment
    4.1.1 Terrain generation
    4.1.2 Observer placement system
  4.2 Method 1: Arrays of shadowmaps
    4.2.1 Framebuffer depth attachment modification
    4.2.2 Choice of projection matrix
    4.2.3 Optimizations
    4.2.4 Final overview
  4.3 Method 2: Scene voxelization and ray traversal
    4.3.1 Idea
    4.3.2 Scene voxelization
    4.3.3 Ray traversal
    4.3.4 Sparse voxel octree representation
    4.3.5 Final overview

5 Results
  5.1 Method 1: Arrays of shadowmaps
    5.1.1 Timings and memory consumption
    5.1.2 Artifacts
  5.2 Method 2: Scene voxelization and ray traversal
    5.2.1 Voxelization
    5.2.2 Timings and memory consumption
    5.2.3 Artifacts

6 Discussion
  6.1 Result
    6.1.1 Array of shadowmap method
    6.1.2 Scene voxelization and ray traversal method
    6.1.3 Comparison of the methods
  6.2 Future work

7 Conclusion

Bibliography


Notation

Abbreviations

Abbreviation  Meaning
GPU    Graphics processing unit
LOS    Line of sight
DEM    Digital elevation model
TIN    Triangulated irregular network
API    Application programming interface
GLSL   OpenGL Shading Language
NDC    Normalized device coordinates
CPU    Central processing unit
GIS    Geographic information system
FBM    Fractional Brownian motion
SVO    Sparse voxel octree


List of Figures

1.1 An example of a viewshed in a triangulated irregular network. The observer is represented by the white quad close to the center of the image. The red areas correspond to the terrain that is not visible from the observer's point of view (in every direction).
2.1 A terrain scene rendered from a camera's perspective, using a perspective projection matrix. Note how this looks similar to a picture taken by a real-life camera, i.e. an intuitive sense of distance is present.
2.2 The same terrain as in figure 2.1 rendered with an orthographic projection matrix. Note that it is harder to get a sense of distance in this figure, compared to figure 2.1.
2.3 The depth buffer of a rendered scene. The brightness of each pixel corresponds to how far away that pixel is from the camera. Completely black represents pixels exactly at the camera and completely white is at the far plane of the camera.
2.4 Subfigure 2.4a shows how a terrain heightmap might look. Subfigure 2.4b represents a triangulation of this heightmap. Each pixel in the heightmap has been converted to the 3D coordinates of a triangle vertex. These triangles then form a complete mesh and are rendered, here in wireframe.
3.1 A visualisation of how the R2 algorithm can determine visibility using accumulated slope, here simplified to a 2D case. If the point to be determined visible or not is u, then the maximum accumulated slope along its LOS is a. Since the slope at u is less than a, the maximum accumulated slope so far, we know that u must be invisible to the observer.
3.2 A theoretical cross section of a terrain surface consisting of a flat ground plane and some overlapping structure above it, for instance a bridge. The R3 algorithm would deem the point u invisible although it clearly is not.
3.3 Mapping of terrain points to a logical sphere, by normalizing direction vectors.
3.4 Stereographic projection, from 3D coordinates to 2D. The point P is projected from the north pole N to point P′ in the 2D plane.
4.1 Terrain generated from a heightmap using FBM, with increasing detail due to an increasing number of octaves. 4.1a: 0 octaves, 4.1b: 1 octave, 4.1c: 2 octaves, 4.1d: 5 octaves.
4.2 A colour attachment texture of the framebuffer used in the observer placement system. Each fragment's world coordinate has been mapped to a colour by the fragment shader.
4.3 Issue with the stereographic projection from the 3D sphere to the 2D plane beneath it. The point P is projected through the north pole N. The closer the point P is to N, the further away the point will be projected.
4.4 An observer on terrain, with its shadowmap in the upper right corner of the image. The shadowmap is generated by calculating all LOS directions to all terrain points, mapping them to a unit sphere around the observer, and lastly mapping that unit sphere to the 2D texture using a simple projection that uses the polar coordinates of each LOS.
4.5 Viewshed calculated for the observer in the middle of the image, with the associated shadowmap in the upper right corner. Artifacts are introduced by triangles that cover the entire shadowmap.
4.6 The double stereographic projection used in the method. If a LOS on the unit sphere has its coordinates in the southern hemisphere, the left projection is used. If its coordinates lie in the northern hemisphere, the right projection is used. This ensures that the resulting texture sizes are deterministic.
4.7 A case where the simple ray traversal algorithm described in algorithm 4 fails. Empty voxels are represented by the white squares, and filled voxels by red squares. If the fixed increment used to advance the ray falls at, for instance, the two points indicated by circles, the filled voxel in between them may be missed. This results in the terrain point falsely being classified as visible although it is not.
4.8 A quadtree that demonstrates the principle of how the sparse voxel octree is constructed. Filled voxels are illustrated as red squares and empty voxels are painted white. The main quadrants A, B, C and D, counted in a counter-clockwise fashion starting from the upper left, are each sub-divided into four additional child nodes since at least one of them contains a voxel. This process is repeated until no child node contains any filled voxels. The backslashes represent leaf nodes that are empty, whereas the red leaf nodes represent a filled voxel.
4.9 On the right, an illustration of how the SVO is represented in GPU memory using a 3D texture, using the quadtree on the left as its basis. The x and y texture indices each represent a tree node. Each z layer then corresponds to information about every child in that node. The alpha channel of each child indicates whether the node is empty (alpha = 1), contains a filled voxel (alpha = 2), or has children of its own (alpha = 0). The red and green channels hold information about the texture index at which a child node resides.
5.1 Result of the array of shadowmap method. 5.1a: terrain before any viewshed calculations. 5.1b: viewshed calculated and simultaneously visualized for the 7 observers visible on the mountain top.
5.2 Single triangle-shaped artifact in viewshed, with the northern stereographic projection shadowmap.
5.3 Several triangular artifacts on a mountain side, with the northern stereographic projection shadowmap.
5.4 Result of the voxelization and ray traversal method. 5.4a: terrain before any viewshed calculations. 5.4b: viewshed calculated and simultaneously visualized for the 3 observers visible on the mountain top.
5.5 SVO structure after voxelization of terrain.
5.6 Result of the voxelization. 5.6a: terrain before voxelization. 5.6b: terrain after voxelization.
5.7 Holes in voxelized terrain.
5.8 Artifacts induced by voxel size (red) and the voxelization itself (green).

List of Tables

5.1 Setup rig used for timing measurements.
5.2 Timings for the shadowmap method. S.m. = shadowmap construction, V.s. = viewshed calculation.
5.3 Timings for the voxelization and ray traversal method.


1 Introduction

This report serves as the documentation of a Master’s Thesis project conducted at Linköping University and Vricon. The work was done at Vricon’s offices in Linköping.

This chapter begins by describing some related background and motivation for the project, and ends by presenting the investigated problem formulation and associated delimitations. All algorithms mentioned in this chapter will be described more in-depth in chapter 3.

1.1 Motivation

Viewshed analysis refers to the computation of the visibility of (usually) terrain from a given observer point, i.e. which parts of the terrain are visible to an observer situated at a particular point on this terrain. These calculations are done within the context of some digital representation of the terrain, for instance a set of vertices connected together to form triangles. These types of models are referred to as triangulated irregular networks (TIN). See figure 1.1 for an example of this.

Viewshed analysis is an interesting topic in many fields, for instance siting problems, to ensure that telecommunication towers get maximum coverage [6], [11], or the impact of new buildings on tourism [13]. Another proposed application is within military operations. In this field, given that an accurate TIN or other terrain model exists, accurate viewshed calculations could potentially help in planning the approach of a hostage situation to minimize exposure, or the setup of an ambush. The "reverse" viewshed could also aid in various military scenarios. For instance, if the position of an enemy sniper is known, the viewshed could be used to remain out of sight.

Traditionally, viewshed calculations are done in an offline manner, usually on central processing units (CPU). However, with the increasing parallel processing power of graphics processing units (GPU), it is becoming possible to do these calculations in real time [3], [5]. This opens up new possibilities for geographic information systems (GIS) such as the Vricon Explorer.

Figure 1.1: An example of a viewshed in a triangulated irregular network. The observer is represented by the white quad close to the center of the image. The red areas correspond to the terrain that is not visible from the observer's point of view (in every direction).

There are however still challenges to overcome when it comes to calculating the viewshed for overlapping terrain structures, or for novel applications such as calculating the viewshed along a road within some terrain representation.

1.2 Goal

This thesis aims to fill the most important gaps presented above and propose a method for viewshed analysis and visualisation that works in real time on 3D terrain models. In addition, the observer will not be a single point, but a segment of the terrain such as a road or a building rooftop.


1.3 Problem formulation

Given the background in section 1.1, an algorithm that solves these issues must meet the following criteria.

• Work in real time

• Be able to handle overlapping structures in the terrain

• Calculate viewshed from a road (i.e. a line segment) or area, and not just a single observer

In addition to this, an implementation of this algorithm also needs to be able to visualize the results in a meaningful way, i.e. show from how much of the road a particular terrain point is visible.

This thesis aims to implement an algorithm with the above-mentioned properties.

1.4 Method

In order to solve the presented issues, a thorough literature study on the subject of viewshed analysis will be conducted. Several databases will be used, such as Google Scholar and UniSearch, provided by Linköping University’s library.

Algorithms and methods found during this study will be evaluated against the requirements presented in section 1.3. If an algorithm exists that fulfils every requirement, it will be implemented and further improved upon. If, however, no suitable method exists to solve the issue at hand, a suitable starting point will be chosen and built upon until it does solve the issue.


2 Theory

This chapter provides some underlying knowledge needed in order to understand the problem at hand.

2.1 Brief overview of 3D rendering

In order to fully understand the computer graphics based algorithms presented in chapter 3, as well as the novel algorithms that this thesis implements and presents, a brief overview of modern 3D rendering is required.

2.1.1 The OpenGL rendering pipeline

This thesis utilizes the OpenGL graphics application programming interface (API) as its 3D renderer and for algorithm execution. When rendering meshes, such as a terrain TIN, the raw vertex data goes through a series of stages until it forms triangles and, lastly, rasterized pixels on the screen. Since OpenGL version 2.0 these stages are programmable via shaders. Shaders are small programmes written in the OpenGL Shading Language (GLSL) that are compiled and executed on the GPU. These programmes act on a per-vertex, per-primitive or per-fragment level, and can do so in massive parallelism. It is also important to note that there exists no concept of a world coordinate system in OpenGL itself. The only coordinate system that OpenGL knows about and uses is normalized device coordinates (NDC). These range from -1 to 1 in all three axes, with the origin at the center of the screen. The positive z-axis is oriented towards the viewer, meaning that the negative z-axis points into the screen. This means that all input data needs to be transformed from whatever coordinate system it is defined in to the NDC system. This is particularly important to consider when defining custom projections.


Vertex shader stage The vertex shader acts on a per-vertex level. In a typical OpenGL application, the vertex shader takes care of transforming the input vertices, such as the terrain vertices that will later form triangles, from their own coordinate system to NDC. It does this using a transformation matrix, typically built from a model, view and projection matrix. The model matrix takes the vertex coordinates from whatever model space they were defined in to world coordinates. The view matrix transforms the world coordinates to camera, or view, coordinates. The projection matrix lastly transforms the view coordinates to NDC. Note that if the model is defined in world coordinates, no model matrix is needed. Since the shaders allow full programmability, any operations on the vertex positions can be conducted, and any other values that might be of interest to the next stage can also be calculated.

Geometry shader stage This stage is invoked per-primitive, after the vertex shader, that is, usually per triangle. Any calculations that require the entire triangle can be done here. The output of the geometry shader is zero or more primitives, as specified by the programmer. Note that the use of a geometry shader is optional, whereas the other two stages are required.

Fragment shader stage This is the last shader stage and acts per-fragment. A fragment is a part of a primitive that might become a pixel. The programmer can decide whether to render a particular fragment to the output raster, or to discard it. Typical usage includes per-pixel lighting calculations and texturing. As with the other shader stages, it is however up to the programmer to decide what happens in the shader.

It is, as mentioned, possible to send any information calculated in a previous shader stage to the next one. For instance, it is possible to send some per-vertex attribute calculated in the vertex shader stage to the fragment shader. Since the fragment shader operates on fragments and not vertices, some interpolation is required between the stages. OpenGL supports hardware interpolation, but the programmer may also choose not to interpolate the data at all. If so, the value passed to the fragment shader will be the value calculated for the vertex that provoked that particular fragment.

Additionally, information from the central processing unit (CPU) that is not vertex data can be uploaded to the shaders by using uniform variables. These are declared in whatever shader stage they are needed, and set by the CPU prior to rendering.

Texturing is handled by samplers in OpenGL. These are declared as uniforms in the shader that uses them, usually the fragment shader. The programmer then sets these texture samplers as active and specifies the texture unit from which they will sample data. Textures in OpenGL are not limited to 2D textures, but may be 1D textures, arrays of 2D textures, 3D textures or cube maps, amongst others. The texture data to be sampled in the shaders is, of course, also fully controllable by the programmer.
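To make the stages above concrete, the following is a minimal sketch, not taken from the thesis, of a vertex and fragment shader pair for OpenGL 3.3, embedded as C++ raw string literals; all identifier names (uModel, uTexture, etc.) are illustrative assumptions.

// Minimal shader pair: the vertex shader applies the model, view and
// projection matrices described above; the fragment shader samples a
// 2D texture through a sampler uniform.
const char* vertexSrc = R"(
    #version 330 core
    layout(location = 0) in vec3 inPosition;   // vertex in model space
    layout(location = 1) in vec2 inTexCoord;
    uniform mat4 uModel, uView, uProjection;   // set from the CPU as uniforms
    out vec2 texCoord;                         // interpolated to the fragment stage
    void main() {
        // model space -> world -> view -> clip space (NDC after the divide)
        gl_Position = uProjection * uView * uModel * vec4(inPosition, 1.0);
        texCoord = inTexCoord;
    }
)";
const char* fragmentSrc = R"(
    #version 330 core
    in vec2 texCoord;
    uniform sampler2D uTexture;   // sampler bound to a texture unit by the CPU
    out vec4 fragColor;
    void main() {
        fragColor = texture(uTexture, texCoord);
    }
)";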


2.1.2 Projection matrices

The projection matrix, generally applied in the vertex shader, may implement different types of transformations depending on what is rendered. For instance, if the rendered scene is to look as if a camera took it, a perspective projection is used. This type of projection conveys a sense of depth: objects further away from the camera appear smaller. Another type of projection is an orthographic projection, in which the rendered size of an object is independent of its distance. For the difference between these types of projections, see figures 2.1 and 2.2.

Figure 2.1: A terrain scene rendered from a camera's perspective, using a perspective projection matrix. Note how this looks similar to a picture taken by a real-life camera, i.e. an intuitive sense of distance is present.

It is of course again completely up to the programmer what type of projection is used, if any at all.

2.1.3 Depth buffers

When primitives are rendered to the screen in the pipeline described in subsection 2.1.1, the order in which the primitives are rendered matters for a correct-looking result. If, for instance, a cube defined by 6 quads, each of which is defined by 2 triangles, is rendered, it is important that the triangles at the back are rendered first. If the triangles representing the side that faces the camera are rendered first, and after that the ones at the back, the result would look unnatural and wrong.

Figure 2.2: The same terrain as in figure 2.1 rendered with an orthographic projection matrix. Note that it is harder to get a sense of distance in this figure, compared to figure 2.1.

A way to solve this problem is by using depth buffers. A depth buffer stores the distance from the viewing plane to each rendered pixel as a normalized value on the interval [0, 1]. A value of 0 means that the pixel is at the camera's near plane and 1 means that the pixel is at the camera's far plane, i.e. only just visible. OpenGL supports this type of depth buffer and the accompanying depth test as an option. The depth test, if turned on, is conducted after the fragment shader stage. Each fragment's depth is compared to the value stored at the same screen position, and if the stored value is less than that of the current fragment, the fragment is discarded. This efficiently solves the ordering problem when rendering.

See figure 2.3 for an example of how that depth buffer might look.

2.1.4 Framebuffers

An important concept in 3D rendering is framebuffers. In normal rendering, the pipeline processes the vertex data and any other data, as described in subsection 2.1.1, and ultimately renders the pixels directly to the screen. It is however possible to do the exact same data processing and calculations and render to an offscreen raster, which is what framebuffers are used for.

Figure 2.3: The depth buffer of a rendered scene. The brightness of each pixel corresponds to how far away that pixel is from the camera. Completely black represents pixels exactly at the camera and completely white is at the far plane of the camera.

A framebuffer may have a depth buffer attachment. This attachment works much like a regular texture and simply contains the depth buffer of the scene that is rendered to the framebuffer. In addition, a framebuffer may or may not have colour attachments. These also work like textures and, if used, contain the actual rendered scene as it would have looked had it been rendered to the screen instead of the framebuffer. It is also possible to use only a depth attachment without any colour attachments.
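As a concrete illustration, here is a hedged C++ sketch, assuming OpenGL 3.3+ and an already-initialized context and function loader, of creating a framebuffer with only a depth attachment; the names and the 1024x1024 resolution are arbitrary choices, not values from the thesis.

// Create a depth texture and attach it as the sole attachment of an FBO.
GLuint fbo, depthTex;
glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, 1024, 1024, 0,
             GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                       GL_TEXTURE_2D, depthTex, 0);
glDrawBuffer(GL_NONE);   // no colour attachment is written...
glReadBuffer(GL_NONE);   // ...and none is read
if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    // handle the incomplete-framebuffer error here
}
glBindFramebuffer(GL_FRAMEBUFFER, 0);   // back to the default framebuffer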

2.2 Digital elevation models

Since viewshed calculations need some form of terrain data in order to produce any results or be meaningful, and may produce different results depending on how the data is represented, it is important to define such terrain models. A digital elevation model (DEM) is such a digital representation of terrain. The two main categories relevant to viewshed analysis are vector-based meshes, such as triangulated irregular networks, and raster-based models, such as heightmaps. Both have their own advantages and disadvantages when it comes to performing viewshed algorithms on them.

2.2.1 Heightmap representation

A heightmap terrain model relies on elevation measurements performed on the real terrain at some interval. This data can then, for instance, be mapped to a grayscale image for visualisation or used directly for various calculations. A drawback of this type of DEM is that height information only exists at the grid points where the height was sampled. This may be enough, but in some cases the resolution might simply not be good enough. One way to alleviate this problem is to use triangulated irregular networks.

2.2.2 Triangulated irregular networks

This representation uses vertices connected together to form triangles. These triangles in turn form a complete terrain model, much like regular meshes in video games and other types of digital visualisation. A TIN can of course be created from a heightmap, by using nearby sampled elevation points as the vertices of the triangles. Likewise, by sampling the TIN at the vertices again, a heightmap can be generated back. See figure 2.4 for an example of this.
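As a sketch of this heightmap-to-TIN conversion, under assumed conventions (uniform grid spacing and y as the up axis; all names are illustrative), each grid cell can be split into two triangles:

#include <vector>

// Convert a row-major heightmap into flat TIN vertex data (x, y, z per vertex,
// two triangles per grid cell). `cell` is the assumed grid spacing.
std::vector<float> buildTerrainVertices(const std::vector<float>& height,
                                        int w, int h, float cell) {
    std::vector<float> verts;
    auto push = [&](int i, int j) {
        verts.push_back(i * cell);            // x position on the grid
        verts.push_back(height[j * w + i]);   // y = sampled elevation
        verts.push_back(j * cell);            // z position on the grid
    };
    for (int j = 0; j + 1 < h; ++j)
        for (int i = 0; i + 1 < w; ++i) {
            push(i, j); push(i, j + 1); push(i + 1, j);          // triangle 1
            push(i + 1, j); push(i, j + 1); push(i + 1, j + 1);  // triangle 2
        }
    return verts;
}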

2.2.3 Summary

While these types of terrain models are commonly used, with TINs often generated from some heightmap representation, they will prove not quite sufficient for the task solved by this thesis. The heightmap representation (and consequently a TIN derived from it) inherently lacks the ability to represent overlapping terrain or structures, such as bridges or cliff overhangs. This is because a heightmap is mathematically a parametric surface with a bijective mapping between the planar parametric space and the surface, i.e. for each grid point in the heightmap there exists exactly one 3D point in the resulting surface. This can technically be solved by using multiple heightmaps for layering terrain; typically, however, only a single heightmap is used.

For this thesis, the assumption is made that a mesh representation of the terrain exists that is fully 3D. That is, any viewshed algorithm needs to be able to handle overlapping structures. This assumption is made since that is the type of data that Vricon uses.

(a) A 2D heightmap where each grayscale value represents height. The brighter a pixel, the higher the corresponding height value.

(b) A triangulated irregular network rendered in wireframe.

Figure 2.4: Subfigure 2.4a shows how a terrain heightmap might look. Subfigure 2.4b represents a triangulation of this heightmap. Each pixel in the heightmap has been converted to the 3D coordinates of a triangle vertex. These triangles then form a complete mesh and are rendered, here in wireframe.

2.3 Visibility

In order to have some common ground for all these algorithms, we need to define what is actually meant by visibility. This thesis uses a definition similar to the one in [8].

Definition 2.1. A terrain point u, defined by three coordinates x_u, y_u and z_u, is said to be visible from the observer v, defined by three coordinates x_v, y_v and z_v, iff the straight line segment between u and v does not intersect the terrain at any point other than u and v themselves.

3 Related work

Some previous work done in the field of viewshed calculations is presented in this chapter.

The algorithms are split into two main categories: direct LOS based algorithms and computer graphics based algorithms.

3.1 Direct LOS based viewshed algorithms

This section provides an overview of the more traditional viewshed algorithms found in the literature. They are referred to as direct LOS based since they all take a more mathematical approach to utilizing the LOS in the calculations. In contrast, the GPU-based algorithms described later use more of a computer graphics approach.

3.1.1 R3 algorithm

This algorithm is a brute-force approach to solving the viewshed problem, and is described in [7]. It works by going through each terrain point and checking whether the LOS to the observer is clear, comparing the terrain elevation at each point along the LOS with the elevation of the sight line itself. If no point along this LOS rises above the sight line, the terrain point is deemed visible.

The algorithm is seen as a reference for other algorithms and its output is in general considered to be the exact viewshed [8].
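A hedged C++ sketch of this R3-style test on a heightmap follows; the Grid type, step count and all names are assumptions for illustration, not code from [7].

#include <algorithm>
#include <cmath>
#include <vector>

// Minimal assumed DEM: nearest-sample elevation lookup on a regular grid.
struct Grid {
    int w, h;
    std::vector<float> z;                  // row-major elevation samples
    float height(float x, float y) const {
        int xi = std::min(std::max(int(x + 0.5f), 0), w - 1);
        int yi = std::min(std::max(int(y + 0.5f), 0), h - 1);
        return z[yi * w + xi];
    }
};

// Walk the LOS at sub-cell steps and compare the terrain elevation against
// the linearly interpolated elevation of the sight line.
bool visibleR3(const Grid& dem,
               float ox, float oy, float oz,    // observer position
               float tx, float ty, float tz) {  // terrain point to test
    float dx = tx - ox, dy = ty - oy;
    int steps = int(std::ceil(2.0f * std::max(std::fabs(dx), std::fabs(dy))));
    for (int i = 1; i < steps; ++i) {
        float t = float(i) / float(steps);      // parameter along the LOS
        float losZ = oz + t * (tz - oz);        // sight-line elevation at t
        if (dem.height(ox + t * dx, oy + t * dy) > losZ)
            return false;                       // terrain blocks the sight line
    }
    return true;
}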

3.1.2 R2 algorithm

A more refined version of the R3 algorithm is the R2 algorithm, described in more detail in [8], [7]. It uses a similar approach to its predecessor. However, instead of calculating the elevation of every point along every LOS, the R2 algorithm calculates what is referred to as the slope of the LOS. The slope of a LOS is defined as the angle between some ground plane and the LOS itself.

The algorithm works in two passes. First, the maximum slope along the LOS to each boundary point of the terrain grid is calculated, by simply accumulating the value while moving away from the observer. Next, the algorithm iterates through all terrain points as before, but this time simply compares the current point's slope to the maximum along that LOS. If the exact LOS to that particular point does not exist, the maximum slope is approximated by that of the closest LOS. See figure 3.1 for an illustration.

The R2 algorithm is therefore considerably faster than the R3 algorithm, at the cost of some accuracy due to the approximation involved.

Figure 3.1: A visualisation of how the R2 algorithm can determine visibility using accumulated slope, here simplified to a 2D case. If the point to be determined visible or not is u, then the maximum accumulated slope along its LOS is a. Since the slope at u is less than a, the maximum accumulated slope so far, we know that u must be invisible to the observer.

3.1.3 Xdraw algorithm

A variant of the same principle that R2 works on is the Xdraw algorithm, found often in the viewshed literature [12], [8], [2]. It works in layers of grid points. The first layer corresponds to the 8 grid points closest to the observer, the second layer to the points closest to the first layer, and so on.

It then works in a similar way to R2: the visibility of a grid point depends on whether its slope is greater than the maximum slope along the current LOS. Instead of precalculating the maximum slopes for each LOS, the maximum value is approximated from the two closest neighbours in the previous layer.

This gives even better running-time performance than the R2 algorithm, but at an even greater cost in accuracy.

3.1.4 Summary

A clear pattern of LOS traversal with varying degrees of approximation is present in these methods. Many methods used in geographic information systems (GIS) use some variant of this. Efforts have been made to improve the approximations or running time of these algorithms, for instance by developing GPU parallelizations [10], [15], by optimizing LOS traversal [14], [9], or by a combination of these improvements [2].

The key point is that these types of algorithms do not work with a terrain representation that allows overlapping structures, which the data that Vricon uses does. In fact, it is trivial to show that the reference algorithm R3, as described in section 3.1.1, fails on these types of terrain. Consider figure 3.2. The R3 algorithm would simply traverse the line of sight between the observer and point u and compare elevation values. This would falsely classify u as invisible, since its elevation is lower than that of points that lie on the overlapping terrain (illustrated in the figure as an ellipse).

Figure 3.2: A theoretical cross section of a terrain surface consisting of a flat ground plane and some overlapping structure above it, for instance a bridge. The R3 algorithm would deem the point u invisible although it clearly is not.

It is safe to say that all algorithms that work on the same principle as R3 would fail in this case. It should also be noted that most existing viewshed algorithms found in the literature are in fact of this type. The next section instead describes computer graphics based algorithms, where the overlapping terrain case can be handled.

3.2 Computer graphics based algorithms

In contrast to the direct LOS based algorithms, which primarily work from a mathematical and data-driven viewpoint, the computer graphics based approaches in general recognize that the viewshed problem is very closely related to shadow effects. In fact, if one considers the observer to be a point light source, and what is invisible to the observer to be in shadow, the problems are identical. This opens up a different category of algorithms, many of which are inspired by computer graphics techniques for shadowing.

The following subsections describe two of these.

3.2.1 Shadowmap approach

This approach, presented in [3], is based on traditional shadowmapping, used extensively in video games to create shadows. Its implementation is somewhat more complex than the LOS based approaches described earlier.

The general shadowmapping technique works in two stages. In the first stage, a framebuffer with only a depth attachment is set up and prepared for use. The scene is then rendered to this framebuffer. However, the scene is not rendered from the camera's perspective as usual, but rather from the perspective of whatever light source is to cast the shadows. In the case of directional light sources, such as the sun, this means using an orthographic projection matrix, since that is how the sun sees the world. After this render pass, the depth buffer of the framebuffer contains the normalized distance from every visible fragment to the light source.

The second stage is a normal render pass, where the scene is rendered as usual with the usual projection matrix. However, the texture that contains the depth information from the previous stage is uploaded to the fragment shader. Every fragment is then transformed to the coordinate system that was used in the first render pass, using whatever projection matrix was used then. With the fragment in the light's coordinate system, its distance to the light is calculated and normalized to a value on the interval [0, 1]. This distance value is then compared to the distance value stored in the depth buffer from the first render pass, at whatever position this particular fragment ends up in. If the current fragment's distance is greater than what is stored at its position in the depth buffer texture, the fragment is not visible to the light and is therefore in shadow. If it is less than or equal to the depth buffer value, it is visible to the light and should be lit.
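A hedged GLSL sketch of this comparison in the second pass, for classic shadowmapping rather than the viewshed variant described next, is given below; the varying and uniform names are illustrative, and lightSpacePos is assumed to be computed by the vertex shader using the light's matrices.

const char* shadowFragSrc = R"(
    #version 330 core
    in vec4 lightSpacePos;          // fragment position in the light's clip space
    uniform sampler2D uShadowMap;   // depth buffer from the first render pass
    out vec4 fragColor;
    void main() {
        vec3 p = lightSpacePos.xyz / lightSpacePos.w;  // perspective divide to NDC
        p = p * 0.5 + 0.5;                             // NDC [-1,1] -> [0,1]
        float closest = texture(uShadowMap, p.xy).r;   // nearest depth seen by light
        bool inShadow = p.z > closest;                 // farther than stored depth?
        fragColor = vec4(vec3(inShadow ? 0.3 : 1.0), 1.0);  // darken shadowed pixels
    }
)";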

The authors of [3] propose a similar approach for viewshed calculations. The observer is seen as the light source, and what would be in shadow is simply not visible to the observer. However, a simple orthographic projection cannot be used, since an observer needs 360° visibility around it.

The authors instead propose a somewhat alternative approach. Each fragment is mapped to a virtual sphere by drawing a vector from the observer to the current fragment position and normalizing it; figure 3.3 shows an illustration of this. This gives cartesian sphere coordinates for each fragment. Since the depth buffer cannot operate on 3D coordinates, the sphere coordinates are mapped to 2D using a stereographic projection (see figure 3.4 for an illustration of how the stereographic projection works). The resulting 2D mapping then becomes the shadowmap. In principle, this is normal shadowmapping with a stereographic projection matrix.

Figure 3.3: Mapping of terrain points to a logical sphere, by normalizing direction vectors.


Figure 3.4: Stereographic projection, from 3D coordinates to 2D. The point P is projected from the north pole N to point P′ in the 2D plane.

3.2.2 Occlusive volumes

In [5] a somewhat different algorithm is presented. Occlusive volumes are generated from geometric meshes close to the observer, by extending their back faces a large distance in a direction pointing away from the observer. A check is then made to see if any terrain lies within these volumes, in which case it is invisible from the observer.

Since the method for generating the occlusive volumes may result in broken-up volumes, e.g. when a triangle faces the observer while its neighbour faces away from the observer, special processing to prevent this must be done beforehand.

3.2.3 Summary

The occlusive volumes method is more complex than the shadowmap approach, since pre-processing of the vertex data is required. This includes fixing broken-up volumes that may be created as a result of the generation algorithm. The shadowmap approach requires a finite-resolution texture to be used as the shadowmap, which may result in artifacts in the resulting viewshed. This is something the occlusive volumes method avoids.

The shadowmap algorithm has other advantages, one of which is that it is in essence a widely used technique, albeit in a different field than GIS. The only real difference is the projection matrix used to map terrain point depths to the shadowmap. It is therefore easier to implement in an existing visualization system than the occlusive volumes approach.

In terms of scalability, that is, how well the algorithms handle multiple observers, the shadowmap method performs better, since, at least without modifications, each observer in the occlusive volumes approach would require going through the vertex pre-processing and fixing. With the shadowmap approach, it suffices to create shadowmaps for each observer, which can be done at relatively low cost using the existing pipeline.

All in all, the shadowmap approach seems like a viable starting candidate to solve the issues that this thesis presents.


4 Implementation

This chapter describes the implementations of two proposed solutions to the problems presented in section 1.3. It starts with a description of the test environment used to implement and evaluate these methods.

4.1 Test environment

In order to thoroughly test any methods, an environment to do so in is required. The one used in this thesis consists of a procedural terrain generator, terrain shaders, an observer placement system and a simple way to change the viewshed algorithm.

4.1.1 Terrain generation

The terrain is generated using a mathematical algorithm referred to as fractional Brownian motion (FBM). In essence, this generates a heightmap, as described in section 2.2.1, for a grid of predefined size with all height values initially set to 0. The terrain generator then loops over all grid points and calculates a height value using FBM. FBM works by sampling a 2D noise function, for each grid point, octaves times at increasing frequency and decreasing amplitude, in order to get a height value for the current grid point. See algorithm 1 for pseudocode demonstrating this principle for a single height value.

The noise function may be any 2D noise function. In the test environment, a coherent noise function is used. This means that for any 2D input it always returns the same output, i.e. it is deterministic. This is of course necessary if any viewshed algorithm comparisons are to be done within the terrain.


Algorithm 1 Pseudocode for fractional Brownian motion

octaves = 5
total = 0
frequency = 1
amplitude = 1
for i = 1 : octaves do
    total ← total + amplitude * noise(x * frequency, y * frequency)
    frequency ← frequency * 2
    amplitude ← amplitude * 0.5
end for
map[x][y] = total

What makes this terrain generation very useful is the possibility of getting arbitrary detail in the terrain by simply increasing the number of octaves. Figure 4.1 demonstrates how the detail in the terrain increases with the number of octaves.

For visualisation purposes, normals are calculated using adjacent vertices. The terrain shaders then calculate a basic colour and lighting using the normals.

4.1.2 Observer placement system

For testing and convenience purposes an observer placement system was created within the test environment. This allows the viewshed observers to be placed on the terrain using the mouse.

In order to do this, some way of mapping the 2D screen pixel coordinates, in which mouse callbacks are registered, to the world coordinates of the terrain is required. This issue is solved by using a framebuffer to which the terrain is rendered using the usual projective transformations. The fragment shader used when rendering to this framebuffer maps the x, y and z components of the world coordinates to the r, g and b values, respectively, of the output colour. Since this is a framebuffer, it is not rendered to the screen but rather to a texture in memory. This is done every frame, as often as the terrain is rendered to the screen. See figure 4.2 for an example of the framebuffer texture.

The texture now, via a simple mapping, contains information about world coordinates. In particular, there is a direct relation between screen space coordinates and world coordinates. Since the terrain was rendered to the framebuffer in exactly the same manner as when it is rendered to the screen, i.e. with the same transformations and projection matrix, a 2D point on this texture corresponds exactly to the same 2D point on the screen. This means that whenever a mouse click is registered on the screen, the world coordinates of that point are retrievable by sampling the framebuffer texture at the same point and inversely mapping the three colour components to coordinate components.
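A hedged C++ sketch of the readback step at a mouse click follows; it assumes the colour attachment is a floating-point texture storing world xyz, that mouse coordinates come in window pixels with the origin at the top-left, and that all names (pickFbo and so on) are illustrative.

// Hypothetical helper: returns the encoded world position under the cursor.
void worldPosAtCursor(GLuint pickFbo, int mouseX, int mouseY,
                      int windowHeight, float out[3]) {
    glBindFramebuffer(GL_READ_FRAMEBUFFER, pickFbo);
    glReadBuffer(GL_COLOR_ATTACHMENT0);
    // flip y: GL's window origin is at the bottom-left
    glReadPixels(mouseX, windowHeight - mouseY - 1, 1, 1,
                 GL_RGB, GL_FLOAT, out);
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
    // `out` now holds the colour-encoded coordinate; apply the inverse of
    // whatever mapping the fragment shader used when encoding it.
}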

Using this method, the test environment allows direct placement of observers on the terrain using intuitive mouse clicks. Furthermore, a system exists where multiple observers can be placed along, for instance, a road. The world coordinates of all observers are then stored in CPU memory and are accessible to the viewshed algorithms.

Figure 4.1: Terrain generated from a heightmap using FBM, with increasing detail due to an increasing number of octaves. 4.1a: 0 octaves, 4.1b: 1 octave, 4.1c: 2 octaves, 4.1d: 5 octaves.

4.2 Method 1: Arrays of shadowmaps

The first method is an extension of the work presented in section 3.2.1. This section motivates the need to modify the original method and also describes the implementation of those modifications. The section ends with an overview of the complete method.


Figure 4.2: A colour attachment texture of the framebuffer used in the observer placement system. Each fragment’s world coordinate has been mapped to a colour by the fragment shader.

4.2.1 Framebuffer depth attachment modification

The original method does not allow the viewshed to be calculated for multiple observers, which is a requirement. It is the shadowmap that enables the viewshed calculation for one particular observer, or light source, which means that several shadowmaps are needed to fulfil the multiple-observer requirement.

To do this, the framebuffer used when calculating the shadowmaps is modified to hold an array of depth buffers. Each observer, with all necessary information about it stored in CPU memory, is looped through and has its own shadowmap calculated as one particular slice of the framebuffer's array of depth buffers. In the second step of the shadowmap approach, which performs the actual computation of the viewshed, the shadowmap array is uploaded to the GPU shader program as a regular texture, along with an array that holds each observer's world coordinates. The observers are then looped through, where each loop index corresponds to a slice of the shadowmap array, and the usual shadowmap calculations are done as described in section 3.2.1.
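A hedged C++ sketch of such a depth-array framebuffer follows, using a GL_TEXTURE_2D_ARRAY and glFramebufferTextureLayer (OpenGL 3.2+) to select one slice per observer; the sizes, names and the renderTerrainFromObserver helper are assumptions, and the final method in fact stores two hemisphere maps per observer (see section 4.2.2).

void renderTerrainFromObserver(int i);   // hypothetical shadowmap draw pass

// One depth layer per observer, all held in a single texture array.
GLuint depthArray;
glGenTextures(1, &depthArray);
glBindTexture(GL_TEXTURE_2D_ARRAY, depthArray);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_DEPTH_COMPONENT24,
             mapSize, mapSize, numObservers, 0,
             GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);

glBindFramebuffer(GL_FRAMEBUFFER, shadowFbo);
for (int i = 0; i < numObservers; ++i) {
    // attach slice i of the array as the framebuffer's depth attachment
    glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                              depthArray, 0, i);
    glClear(GL_DEPTH_BUFFER_BIT);
    renderTerrainFromObserver(i);
}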


This method in theory ultimately gives access to the viewshed of each individual observer. However, there are issues related to the projection matrix used, scalability and performance that require further modification. The implementations of these modifications are presented in the following sections.

4.2.2 Choice of projection matrix

The array of depth buffers allows the viewshed to be calculated for any number of observers, at least in theory. There are however still issues related to the original approach upon which this method builds, such as the projection matrix used.

Consider once again figure 3.4, which depicts the stereographic projection originally used in the method described in section 3.2.1. This projection from a sphere to a 2D plane is a necessity, since the method yields 3D coordinates of LOS rays that must be mapped to a 2D texture. There is an apparent, and in fact well-known, issue with the stereographic projection: it does not allow the 3D point situated exactly at the north pole to be projected onto any point of the 2D surface. Such points are instead projected to infinity. In the context of the viewshed calculations, this has the implication that no terrain point may ever be directly above the observer, which is unacceptable. Furthermore, the closer a LOS comes to the north pole, the further out in the resulting 2D shadowmap the point will be projected. See figure 4.3 for an illustration of this.

Figure 4.3: Issue with the stereographic projection from the 3D sphere to the 2D plane beneath it. The point P is projected through the north pole N. The closer the point P is to N, the further away the point will be projected.

An infinitely sized shadowmap would have to be allocated to accommodate every possible terrain point, which is obviously impossible in practice. Instead, a different type of projection can be used.

In essence, the shadowmaps hold information about the closest terrain point for every LOS in every direction around every observer. Since the method operates by mapping each LOS to a unit sphere, a projection from the sphere to a 2D plane is required, as described earlier. Theoretically, the only requirement on this projection is that it uniquely maps each 3D unit sphere coordinate to a 2D coordinate on the plane. One such unique mapping is to simply convert the 3D sphere coordinates to polar coordinates, using the φ and θ angles directly as the x and y coordinates in the depth buffer texture. This also solves the issues that the stereographic projection gave rise to. Algorithm 2 mathematically describes the mapping.

Algorithm 2 Simple unique 3D to 2D mapping

for every LOS mapped to the 3D unit sphere do
    φ ← atan(LOS.y, LOS.x)
    θ ← acos(LOS.z)
end for

Note that using this type of projection removes the intuitive nature of the shadowmap. Compare, for instance, figure 2.3 with figure 4.4. In the first figure, the depth buffer contains a clear image of the terrain. Since the depth buffer shown in the second figure maps polar coordinate angles to the x and y axes, it is more difficult to interpret.

Figure 4.4: An observer on terrain, with its shadowmap in the upper right corner of the image. The shadowmap is generated by calculating all LOS directions to all terrain points, mapping them to a unit sphere around the observer, and lastly mapping that unit sphere to the 2D texture using a simple projection that uses the polar coordinates of each LOS.


The choice of projection does solve the issues it was designed to solve, but introduces new ones. Primarily, these are related to how the shadowmap is generated within OpenGL. Since the φ angle is calculated as the arctangent of the x and y components of the spherical LOS coordinates, and since the transformed vertices will be connected to form triangles, triangles may be created that stretch over the entire width of the shadowmap. This specifically happens when the vertices of a triangle are transformed to opposite sides of the arctangent discontinuity, e.g. when one vertex is transformed close to π and the others close to −π. Since this angle is mapped as-is to the 2D texture, the connected vertices form a triangle that can span the entire texture. See figure 4.5 for an example of this issue, as well as the induced artifacts in the viewshed.

Figure 4.5: Viewshed calculated for the observer in the middle of the image, with the associated shadowmap in the upper right corner. Artifacts are introduced by triangles that cover the entire shadowmap.

To solve both the issues of the originally used stereographic projection and those of the simple polar coordinate mapping, a compromise using two stereographic projections per observer is implemented. By splitting the unit sphere into two hemispheres and using the appropriate stereographic projection depending on whether a LOS coordinate lies in the upper or lower hemisphere, the issue of having to allocate an infinite texture is solved, as shown in figure 4.6.

Figure 4.6: The double stereographic projection used in the method. If a LOS on the unit sphere has its coordinates in the southern hemisphere, the left projection is used. If its coordinates lie in the northern hemisphere, the right projection is used. This ensures that the resulting texture sizes are deterministic.

The issue of triangles being stretched across the entire texture is also non-existent with this method. Since the hemisphere opposite a projection pole is always chosen, the issue of an undefined projection at the poles is also solved. The mathematics for this approach is shown in algorithm 3.

Algorithm 3 Hemisphere stereographic projection from unit sphere to 2D textures

for every observer do
    for every LOS mapped to the 3D unit sphere do
        if LOS.y ≥ 0 then
            X ← LOS.x / (1 + LOS.y)
            Y ← LOS.z / (1 + LOS.y)
            ▷ LOS.TerrainPoint gives the world coordinates of the terrain point that generated that particular LOS
            SouthernTexture[X, Y] ← dist(LOS.TerrainPoint, observer)
        else
            X ← LOS.x / (1 − LOS.y)
            Y ← LOS.z / (1 − LOS.y)
            NorthernTexture[X, Y] ← dist(LOS.TerrainPoint, observer)
        end if
    end for
end for

Note that these projections of course work within the shadowmapping pipeline, that is, they are executed within shader programs on the GPU in massive parallelism. This type of projection works well, with minor artifacts that will be discussed later.


4.2.3 Optimizations

Since the viewshed algorithms presented in this thesis are meant to be used in a real-time application, it is important to consider certain optimizations. For this method, it is for instance sufficient to calculate the shadowmaps once each time new observers are set, in contrast to usual shadowmapping in, for instance, video games, where shadowmaps may have to be recalculated due to dynamic objects changing positions. However, since the number of observers may be too large to calculate every shadowmap within one frame without noticeable hiccups in the application, a system to improve the user experience has been implemented.

This system works by having a set maximum frame time, e.g. 33 ms for a target framerate of 30 Hz, and a set approximation of how long one shadowmap generation will take. It then simply calculates how many shadowmaps can safely be generated while staying within the maximum frame time, and stores the remaining number of calculations in CPU memory, to be executed in the next frame. This is repeated until all shadowmaps have been calculated. Additionally, the shadowmaps that have already been calculated are used immediately in the viewshed calculations.
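A minimal C++ sketch of this budgeting scheme follows; the constants and the generateShadowmap helper are illustrative assumptions.

void generateShadowmap(int observerIndex);   // hypothetical per-observer pass

const double kFrameBudgetMs = 33.0;   // target framerate of ~30 Hz
const double kCostPerMapMs  = 4.0;    // estimated cost of one shadowmap pass

void updateShadowmaps(int numObservers) {
    static int next = 0;              // next map to generate; persists across frames
    int budget = (int)(kFrameBudgetMs / kCostPerMapMs);
    while (budget-- > 0 && next < numObservers)
        generateShadowmap(next++);
    // maps generated so far are used by the viewshed pass immediately
}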

Furthermore, it would be sufficient to do the viewshed calculations themselves only once for each new set of observers. This implementation does, however, not contain such an optimization. This will be discussed later.

4.2.4 Final overview

The complete method works in two major steps. The first step, the shadowmap generation, is done once, while the second, the viewshed pass, is done continuously. The following description provides an overview of both steps.

Shadowmap pass Generate a shadowmap for each observer by rendering the terrain to an offscreen framebuffer with an array of depth buffers as its depth attachment. The shaders for this render pass calculate the LOS to every vertex of the terrain by normalizing the vector between the current observer and the current vertex. This maps each terrain point to a unit sphere around the observer. Map this unit sphere to two different depth buffers in the depth buffer array, using algorithm 3. Set the depth value of each entry in the depth buffer to the distance between the current observer and the current vertex. The depth testing of the framebuffer ensures that only the closest terrain points get written to the depth buffer textures.

Viewshed pass Render the terrain as usual, but also upload the array of depth buffers as well as each observer's position. In the fragment shader, loop over all observers and map each fragment's world coordinate to the unit sphere around each observer, as was done in the previous pass. Convert the unit sphere coordinates to 2D texture coordinates, again using algorithm 3. Now compare the current distance to the observer with the depth value stored at that position in the correct depth buffer texture. If the current distance is greater than the value stored in the depth texture, the current fragment is not visible. Lastly, colour the current fragment according to how many observers it is visible to (sketched below).
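A hedged GLSL sketch of this viewshed pass follows; it assumes the stored depth values are raw observer-to-vertex distances, that the two hemisphere maps of observer i occupy array layers 2i and 2i+1, and that all names, the bias constant and the colouring scheme are illustrative rather than the thesis's exact code.

const char* viewshedFragSrc = R"(
    #version 330 core
    #define MAX_OBSERVERS 32
    in vec3 worldPos;                      // interpolated world coordinate
    uniform sampler2DArray uShadowMaps;    // two layers per observer (N/S)
    uniform vec3 uObservers[MAX_OBSERVERS];
    uniform int uNumObservers;
    out vec4 fragColor;
    void main() {
        int seenBy = 0;
        for (int i = 0; i < uNumObservers; ++i) {
            vec3 los = worldPos - uObservers[i];
            float dist = length(los);
            vec3 dir = los / dist;         // point on the unit sphere
            // hemisphere split and projection as in algorithm 3
            int layer = 2 * i + (dir.y >= 0.0 ? 0 : 1);
            vec2 uv = vec2(dir.x, dir.z) / (1.0 + abs(dir.y));
            uv = uv * 0.5 + 0.5;           // [-1,1] -> [0,1] texture coords
            float closest = texture(uShadowMaps, vec3(uv, layer)).r;
            if (dist <= closest + 0.01) seenBy++;   // small bias against acne
        }
        float f = float(seenBy) / float(max(uNumObservers, 1));
        fragColor = vec4(1.0 - f, f, 0.0, 1.0);  // red = unseen, green = seen by all
    }
)";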

4.3 Method 2: Scene voxelization and ray traversal

The second method takes a different approach to solving the problem. Its idea stems from the first method but aims to eliminate the need for two textures per observer. As with the previous method, its workings are first described, after which an overview of the complete method is provided.

4.3.1 Idea

The previous shadowmap method utilizes the knowledge of how close a (transformed) terrain point may be to the observer whilst still being visible. A simple comparison of distances, after transformations, is needed to determine visibility. Since these comparisons and transformations are executed by shaders, they work in parallel at essentially no performance cost at all. This general method is a tested and widely used one, but issues arise as more observers are added, since each observer needs two shadowmaps. Shader texture lookups are generally fast, but looping and using many different textures for lookups will have a performance cost.

The idea of this method is that all the shaders need in order to calculate the viewshed is some representation of the terrain and some way of accessing this representation using world coordinates. It is then possible for the shaders to ray-traverse through this representation and determine whether every terrain point is visible or not. Using this approach, only one representation of the terrain is required, and it can be reused continuously. This eliminates the need for several textures and may scale better with an increasing number of observers than the previous approach. One such representation is a voxelized version of the terrain, which is also simple to traverse.

The following sections describe how the terrain is voxelized and mapped to a texture that can be used for lookups in the shaders, how to traverse this representation, as well as several optimizations.

4.3.2 Scene voxelization

A voxel representation of a scene consists of cubes of some size that together make up a model, as opposed to triangles making up a model. These cubes may have a certain colour and normal associated with them, much like vertices, but for our purposes it is sufficient to know whether a voxel is empty or not. A voxel is considered empty if it does not contain or intersect a triangle at the same world position, and considered filled if it does. By altering the size of the cubes, different resolutions of a scene can be obtained. If the cubes that represent the voxels are sufficiently small, there is no difference between a triangle representation and a voxel representation of a scene. The issue is that no voxel representation exists for the terrain used by Vricon and this thesis, which is why a voxelization algorithm is required.

Scene voxelization has been studied to some extent, most notably in [4]. The problem lies in converting a TIN mesh into a voxel representation, and in doing so efficiently. Using the method described in [4], a voxel representation of an arbitrarily complex mesh can be generated on the GPU using shaders.

The method works by utilizing the usual rasterization pipeline, with modifications that ultimately make the rasterized pixels correspond to voxels. First, the viewport of the rendering window is set to the dimensions of the desired voxel grid. For instance, if a voxel grid resolution of N×N×N is wanted, the viewport is set to N×N pixels. Next, the terrain is rendered using a special voxelization shader program. The shaders orthographically project each triangle, using one of three projection matrices, so that each triangle's rasterized fragments are transformed into the voxel grid. Since the viewport was set to match the voxel grid dimensions, each fragment within the fragment shader now corresponds to a voxel. The following description provides an overview of the GPU shaders; a condensed shader sketch follows after the list.

Vertex shader A basic pass-through shader that simply sets the vertex positions to the world coordinates of the input data.

Geometry shader In this shader stage, one of three orthographic projection matrices is chosen based on the input triangle's normal. The projection that yields the largest triangle surface after projection is chosen; the three possible projection matrices are aligned with the three coordinate axes respectively. Information about which projection was used is sent to the fragment shader. The reason for this, as opposed to simply using a single orthographic projection, is to minimize holes in the voxelization caused by triangles whose faces are parallel to the projection direction. Such triangles might not be rasterized at all, and therefore not yield any voxels, because they are not visible from that orthographic projection's view.

Fragment shader The fragment shader has a 3D texture bound that has the same dimensions as the voxel grid, for instance 512x512x512 pixels. This texture has been initialized with 0s in every pixel. If the fragment shader runs, its fragment must come from a triangle that should produce a voxel. It is then sufficient to write a 1 at the current fragment's pixel coordinates, accessible through the built-in variable gl_FragCoord. However, before this is done, the fragment shader uses the information passed from the geometry shader about how the current fragment was projected to flip the axes so that they correspond correctly to the 3D texture. The actual writing of the voxel is done with the imageStore function, available since OpenGL 4.2, which allows shaders to write to a bound texture, as opposed to only being able to read from it.
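A condensed GLSL sketch of the geometry and fragment stages is given below. It is a sketch under simplifying assumptions rather than the exact implementation: vertex positions are assumed to arrive already expressed in voxel-grid units [0, N], depth testing and face culling are assumed to be disabled, and the uniform names are illustrative.

// --- geometry shader ---
#version 430
layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;
uniform float gridSize;   // N, the voxel grid resolution
flat out int axis;        // dominant axis, forwarded to the fragment stage

void main()
{
    vec3 e0 = gl_in[1].gl_Position.xyz - gl_in[0].gl_Position.xyz;
    vec3 e1 = gl_in[2].gl_Position.xyz - gl_in[0].gl_Position.xyz;
    vec3 n = abs(cross(e0, e1));   // components are proportional to projected areas
    axis = (n.x > n.y && n.x > n.z) ? 0 : (n.y > n.z ? 1 : 2);
    for (int i = 0; i < 3; ++i) {
        vec3 p = gl_in[i].gl_Position.xyz;   // assumed already in [0, N] voxel units
        // swizzle so the dominant axis becomes the depth axis, then map to NDC
        vec3 q = (axis == 0) ? p.zyx : (axis == 1) ? p.xzy : p.xyz;
        gl_Position = vec4(q / gridSize * 2.0 - 1.0, 1.0);
        EmitVertex();
    }
    EndPrimitive();
}

// --- fragment shader ---
#version 430
layout(r32ui, binding = 0) uniform uimage3D voxelGrid;
uniform float gridSize;
flat in int axis;

void main()
{
    vec3 p = vec3(gl_FragCoord.xy, gl_FragCoord.z * gridSize);
    // undo the swizzle so the voxel lands at its true grid coordinate
    ivec3 v = ivec3((axis == 0) ? p.zyx : (axis == 1) ? p.xzy : p.xyz);
    imageStore(voxelGrid, v, uvec4(1u));
}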


After this fast voxelization, which requires only a single draw call to voxelize the entire scene, the 3D texture that was used in the fragment shader contains a 0 for every empty voxel and a 1 for every filled voxel. It is a complete, variable-resolution representation of the space occupation of the terrain. There are however issues with this approach, which will be described in later sections.

4.3.3 Ray traversal

Given the voxel representation of the scene, visibility can be calculated by traversing a ray from a terrain point to an observer, and checking along this ray whether or not a filled voxel is in its way. A simple approach is to add a fixed increment to the ray origin, in the direction of the observer, for each traversal loop, as shown in algorithm 4; a GLSL version is sketched after the listing.

Algorithm 4 Simple ray traversal algorithm
for every observer O do
    for every terrain point P do
        dir ← normalize(O − P)
        increment ← 1
        dist ← dist(O, P)
        minDist ← 0.5
        ray ← P
        while dist > minDist do
            if voxelFilled(ray) then
                return invisible
            end if
            ray ← ray + dir · increment
            dist ← dist(ray, O)
        end while
    end for
end for
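In GLSL, this fixed-step loop could be sketched as below, with the voxel lookup done against the 3D texture produced by the voxelization. This is a minimal sketch: positions are assumed to be expressed in voxel-grid units, the names are illustrative, and in practice the loop runs per fragment and per observer.

#version 430
uniform usampler3D voxelGrid;   // 1 = filled, 0 = empty, from the voxelization pass

bool voxelFilled(vec3 p)        // p in voxel-grid units
{
    return texelFetch(voxelGrid, ivec3(floor(p)), 0).r != 0u;
}

// true if the straight line from terrain point P to observer O is unobstructed
bool visibleFrom(vec3 P, vec3 O)
{
    const float increment = 1.0;   // fixed step length, in voxels
    const float minDist = 0.5;
    vec3 dir = normalize(O - P);
    vec3 ray = P;
    float dist = distance(ray, O);
    while (dist > minDist) {
        if (voxelFilled(ray))
            return false;          // a filled voxel blocks the line of sight
        ray += dir * increment;
        dist = distance(ray, O);
    }
    return true;
}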

This type of algorithm, of course also executed on the GPU, is fast, but it may miss voxels due to the fixed increment of the ray traversal. See figure 4.7 for an example.

Figure 4.7: A case where the simple ray traversal algorithm described in algorithm 4 fails. Empty voxels are represented by white squares and filled voxels by red squares. If the fixed increments used to advance the ray fall at, for instance, the two points indicated by circles, the filled voxel between them is missed. This results in the terrain point falsely being classified as visible although it is not.

A better, yet slightly less efficient, algorithm is the Amanatides & Woo algorithm presented in [1]. This traversal algorithm is specifically designed to not miss any voxels. Its implementation is detailed in algorithm 5.

Algorithm 5 Amanatides & Woo ray traversal algorithm
function intbounds(s, ds)
    return (ds > 0 ? ceil(s) − s : s − floor(s)) / abs(ds)
end function
for every observer O do
    for every terrain point P do
        dir ← normalize(O − P)
        dist ← dist(O, P)
        minDist ← 0.5
        X ← floor(P.x)
        Y ← floor(P.y)
        Z ← floor(P.z)
        StepX ← sign(dir.x)
        StepY ← sign(dir.y)
        StepZ ← sign(dir.z)
        tMaxX ← intbounds(P.x, dir.x)
        tMaxY ← intbounds(P.y, dir.y)
        tMaxZ ← intbounds(P.z, dir.z)
        tDeltaX ← StepX / dir.x
        tDeltaY ← StepY / dir.y
        tDeltaZ ← StepZ / dir.z
        while dist > minDist do
            if tMaxX < tMaxY then
                if tMaxX < tMaxZ then
                    X ← X + StepX
                    tMaxX ← tMaxX + tDeltaX
                else
                    Z ← Z + StepZ
                    tMaxZ ← tMaxZ + tDeltaZ
                end if
            else
                if tMaxY < tMaxZ then
                    Y ← Y + StepY
                    tMaxY ← tMaxY + tDeltaY
                else
                    Z ← Z + StepZ
                    tMaxZ ← tMaxZ + tDeltaZ
                end if
            end if
            if voxelFilled(X, Y, Z) then
                return invisible
            end if
            dist ← dist((X, Y, Z), O)
        end while
    end for
end for

Using this algorithm on the GPU allows us to determine visibility for each terrain point, and for every observer, in parallel.

4.3.4 Sparse voxel octree representation

The most notable issue with the voxel representation presented in the previous section relates to memory usage. The memory complexity is O(N³), where N is the dimensionality of the terrain data that is to be voxelized. The vast majority of all voxels will be empty, given that terrain tiles usually are much wider than they are high, yet the same amount of memory is required to represent every empty voxel as every filled one. The sparse voxel octree (SVO) representation, detailed in [4], largely reduces this memory consumption by grouping spatially close empty voxels together and spending memory on the group instead. The SVO is an octree (every node has eight or zero children) consisting of a root node, which represents the entire scene, with eight children that each represent one eighth of the scene. The eighths are split spatially along the middle of each of the three axes, giving each child of a node an equal size and yielding 2³ = 8 children. Each child node potentially has eight children of its own, provided that the space the child node represents contains voxels. Figure 4.8 shows this behaviour for a quadtree, from where the step to an octree is trivial.

In figure 4.8, the entire A quadrant is represented by a single leaf node, as opposed to representing each individual empty voxel in A on its own. This significantly reduces the memory complexity, but makes it dependent on the terrain itself. Voxels are however only generated along the surface of the terrain and not inside it, since there is no need for that kind of filling. This makes the SVO a memory-wise efficient representation.

Figure 4.8: A quadtree that demonstrates the principle of how the sparse voxel octree is constructed. Filled voxels are illustrated as red squares and empty voxels are painted white. The main quadrants A, B, C and D, counted counter-clockwise starting from the upper left, are each subdivided into four child nodes if at least one of them contains a voxel. This process is repeated until no child node contains any filled voxels. The backslashes represent leaf nodes that are empty, whereas the red leaf nodes represent filled voxels.

Construction of the sparse voxel octree

The implementation of the SVO is somewhat more involved than the previously described method using a 3D texture in the fragment shader. When using the 3D texture, each voxel generated in the fragment shader was simply written directly to the texture and no post-processing was required. Due to the recursive nature of the SVO, where each child node potentially requires subdivision into eight additional child nodes, such a simple implementation is not possible. Instead, some processing on the CPU is required to build the SVO and to make it usable in the shaders for traversal later. Since the voxelization only needs to be run once for each terrain tile, the additional overhead introduced by the SVO construction is acceptable. The SVO construction has the following main steps:

• Run voxelization again, with a buffer the size of the calculated number of voxels, and store the voxel world positions in the buffer

• Copy the buffer content to CPU memory

• Build a CPU-side SVO from the buffer content

• Build a texture that represents the SVO, to be used in the shaders for traversal

Since there is no way to calculate the SVO directly in the shaders during voxelization, information about all filled voxels is required on the CPU. An imageBuffer in OpenGL can be used for this, since it can be written to in the fragment shader. However, since the buffer requires memory to be allocated up front, a pre-pass is done before the actual voxels are stored. This pass uses an atomic_uint counter in OpenGL that is atomically incremented in the shaders, yielding the required size for the buffer. The shaders are then run again with the attached buffer, and using imageStore the generated voxels' world positions are written to it. The contents of the buffer are then copied to CPU memory using glGetBufferSubData, where the SVO is built recursively from this data.
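The two shader passes could be sketched as below. This is a hypothetical minimal version: the counter and buffer bindings, as well as the variable names, are illustrative, and the surrounding CPU code (creating the atomic counter buffer, sizing the texture buffer from the counter value, and reading it back with glGetBufferSubData) is omitted.

// --- pre-pass fragment shader: count the voxels so the buffer can be sized ---
#version 430
layout(binding = 0) uniform atomic_uint voxelCount;

void main()
{
    atomicCounterIncrement(voxelCount);   // one fragment = one voxel
}

// --- second pass fragment shader: append each voxel's world position ---
#version 430
layout(binding = 0) uniform atomic_uint voxelIndex;
layout(rgba32f, binding = 1) uniform imageBuffer voxelPositions;
in vec3 worldPos;

void main()
{
    uint i = atomicCounterIncrement(voxelIndex);   // unique slot in the buffer
    imageStore(voxelPositions, int(i), vec4(worldPos, 1.0));
}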

The CPU-side SVO is trivial to construct in a recursive fashion using C++ classes. However, the main purpose of the SVO is to be traversed in the shaders in order to calculate the viewshed. For this, a shader-compatible representation of the SVO is required.

The shader-side SVO structure was implemented using a 3D texture with dimensions ⌈√N⌉ × ⌈√N⌉ × 8, where N is the number of nodes in the SVO. Each x, y pair in this texture corresponds to a node in the SVO tree, and each z-layer corresponds to one of that node's children. The alpha channel of each child can take the values 0, 1 or 2. A value of 0 indicates that the current child has eight children of its own, 1 indicates an empty leaf node, and 2 represents a leaf node that contains a filled voxel. If the alpha channel value is 0, the red and green channels hold the x, y index in the texture at which the children of the current node reside. See figure 4.9 for an illustration using a quadtree instead of an octree.

Figure 4.9: On the right, an illustration of how the SVO is represented in GPU memory using a 3D texture, with the quadtree on the left as its basis. Each x, y texture index represents a tree node, and each z layer corresponds to the information about one child of that node. The alpha channel of each child indicates whether the node is empty (alpha = 1), contains a filled voxel (alpha = 2), or has children of its own (alpha = 0). The red and green channels hold the texture index at which a child node resides.

This texture is then uploaded to the terrain shaders and used for the ray traversal in order to calculate the viewshed. The ray traversal in essence works the same as for the full 3D texture. The retrieval of voxel information is however considerably more complex, since the SVO texture needs to be traversed top-down for each increment of the traversal algorithm. A sketch of such a lookup is given below.
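The following is a minimal GLSL sketch of the top-down lookup. It assumes the SVO texture is bound as an unsigned-integer texture, that the voxelized volume is a cube described by sceneMin and sceneSize, and that the child records follow the layout described above; the names and the depth guard are illustrative.

#version 430
uniform usampler3D svoNodes;   // the SVO texture (integer format assumed)
uniform vec3 sceneMin;         // lower corner of the voxelized volume
uniform float sceneSize;       // side length of the (cubic) volume

bool voxelFilled(vec3 p)
{
    ivec2 node = ivec2(0, 0);  // the root node's x, y index
    vec3 lo = sceneMin;        // lower corner of the current node's cube
    float size = sceneSize;
    for (int depth = 0; depth < 32; ++depth) {       // guard against malformed trees
        size *= 0.5;
        vec3 mid = lo + vec3(size);
        ivec3 oct = ivec3(greaterThanEqual(p, mid)); // octant of p within the cube
        int child = oct.x + 2 * oct.y + 4 * oct.z;   // z-layer of that child
        uvec4 rec = texelFetch(svoNodes, ivec3(node, child), 0);
        if (rec.a == 1u) return false;   // empty leaf: nothing in this cell
        if (rec.a == 2u) return true;    // filled leaf: the voxel is occupied
        node = ivec2(rec.rg);            // inner node: descend to its children
        lo += vec3(oct) * size;          // shrink the cube to the chosen octant
    }
    return false;
}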
