Multi Sub-Pass & Multi Render-Target Shading In Vulkan: Performance Based Comparison In Real-time


Bachelor of Science in Computer Science May 2020


Alexander Danliden Steven Cederrand

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science.

The thesis is equivalent to 20 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:

Author(s):

Alexander Danliden

E-mail: aldn17@student.bth.se

Steven Cederrand

E-mail: stce17@student.bth.se

University advisor:

Dr. Prashant Goswami

Department of Computer Science

Faculty of Computing

Blekinge Institute of Technology

SE–371 79 Karlskrona, Sweden

Internet: www.bth.se

Phone: +46 455 38 50 00

Fax: +46 455 38 50 57


Abstract

Background. Games today are becoming more complex in both computation and graphics. Companies want to develop games with state of the art graphics while also having complex game logic. However, the computers of the vast majority of users rarely meet the resulting hardware requirements, which limits the target demographic that a company can reach. This thesis focuses on two different methods of achieving deferred shading in Vulkan, and on how the environment, the number of lights and the number of attachments affect both methods.

Objectives. In Vulkan there are two ways of implementing deferred shading. The first is the traditional way, which uses multiple render-targets. The second utilizes a feature unique to Vulkan known as sub-passes. Our aim is to conduct experiments with these two implementations of deferred shading to determine which one is optimal for a given situation. These situations vary in the number of visible objects and the number of lights in the scene.

Methods. The experiments are conducted with a rendering system that we implemented ourselves. Because both variants of the rendering technique 'deferred shading' are implemented by us, the collected data suffers less from unexpected and unknown variables than it would if the implementations were taken from separate sources. The experiments measure performance in the form of average frames per second and average render frame time (in seconds). To measure time, the system utilizes Vulkan's support for GPU timestamping[7]. To provide reliable measurements without unwarranted errors, each rendering variant utilizes pre-recorded command buffers.

Conclusions. This thesis has shown that using multiple sub-passes within a single render-pass performs faster write operations to the render attachments. This results in lower memory bandwidth usage, which leads to a faster geometry pass. The performance gained from a faster geometry pass can be spent elsewhere to enhance other aspects of the game or graphical application. Lower memory bandwidth usage would also result in longer battery life on mobile phones and laptops.

Keywords: Vulkan, Sub-pass, render-target, Deferred, Shading


Acknowledgments

We would like to thank our supervisor Prashant Goswami for his support and invaluable input throughout the project. We would also like to thank our friends and families.

Steven Cederrand & Alexander Danliden


Abstract
Acknowledgments

1 Introduction
  1.1 Introduction
  1.2 Background
  1.3 Related Work
  1.4 Aim & Objectives
  1.5 Research Question

2 Techniques & Vulkan API
  2.1 Vulkan API
  2.2 Deferred Shading
  2.3 Deferred Shading Using Multiple Render-Target
  2.4 Deferred Shading Using Sub-Passes

3 Method
  3.1 Base Implementation
  3.2 Renderer Implementation
    3.2.1 Base Renderer
    3.2.2 Multiple Render Target Renderer
    3.2.3 Multiple Sub-pass Renderer
  3.3 Experimental Setup
  3.4 Hardware

4 Results, Analysis & Summary
  4.1 Results
  4.2 Analysis
    4.2.1 Attachment Impact
    4.2.2 Performance Stability
    4.2.3 Multi Sub-Pass
    4.2.4 Multi Render-Target
    4.2.5 Summary

5 Discussion
  5.1 Discussion
  5.2 Complications

6 Conclusions & Future Work
  6.1 Conclusion
  6.2 Future Work

References

A Implementations
  A.1 Base Renderer
  A.2 Renderer Systems

B Supplemental Information
  B.1 Multi Render-Target Tables
  B.2 Multi Sub-Pass Tables


Chapter 1 Introduction

1.1 Introduction

Computer graphics have constantly been evolving, from simple flashing pixels to the rendering of complex geometry that may provide the illusion of real life. For the most part, these renderings have not occurred in real time. Rather, computer graphics that visualize reality have been confined to the movie industry and have been statically rendered.

It was not until the last decade that real-time rendering was able to catch up with the visualization performed by the movie industry. Graphically simulating reality has been the aim for many years, which is clear from the graphical evolution of computer games.

Unfortunately, graphically simulating reality is a daunting task from both a software and a hardware point of view. Through the years, solutions have been proposed and put in place to manage the complexity of simulating reality. One of these optimizations is known as Deferred Shading.

1.2 Background

Computer games have grown in complexity and graphical fidelity since they were first conceived. Modern game systems aim to simulate reality within computer graphics, but also require complex game logic to support a certain level of entertainment.

This goal is becoming more and more achievable through further iterations of graphics processing hardware. Unfortunately, the concept is unmarketable: the majority of users are not purchasing state of the art graphics processing units (GPUs).

This is clear from a survey conducted on the platform Steam, in which the most commonly used graphics card is the NVIDIA GeForce GTX 1060[17]. This graphics card was released on 19 July 2016 and is no longer considered state of the art.

Through the efficient use of software solutions and techniques, the graphical aims may be achieved.

There are a handful of graphics APIs (Application Programming Interfaces) that are used in the computer games industry. The most prominent are OpenGL[12] and DirectX11[5]. Software companies outside the game industry that develop interactive media also use these graphics APIs. Static rendering tools utilize these APIs as well[8]. While considerably popular, these APIs are old: OpenGL was initially released in 1992[2] and DirectX11 was presented in 2008 at the Nvision 08 technical conference[1]. Unfortunately, these APIs introduced an abstraction issue, placing too much of the workload on the graphics drivers and forcing the drivers to handle every detail. This is an issue because the drivers have to guess what the programmer intends. OpenGL also suffers severely from a lack of multi-threading support, being locked to the "main-thread". OpenGL is furthermore an implicit API, which implies that resource management is the responsibility of the driver[16].

Vulkan is the follow-up API to OpenGL. Vulkan was introduced in 2015[7] and is a cross-platform 3D graphics and compute API that targets high-performance real-time 3D applications. Vulkan forces programmers to build, set up and support their environment themselves. This differs from OpenGL and DirectX11, which provide a layer of abstraction that hides low-level aspects of graphics programming. The purpose of Vulkan is to remove this abstraction and have the programmer carry out all graphical interactions with the GPU drivers, so that each program may be optimized for its purpose.

A popular shading technique within 3D rendering is deferred shading. Deferred shading separates light calculations from geometry rendering[15] by deferring the light processing to a later stage. The two stages are known as the geometry pass and the lighting pass.

The geometry pass processes meshes and other geometrical objects within a 3D scene and then outputs the results to attachments. Render attachments are buffers that store color information that may be used in a later render-pass. The term G-buffer is often used when referring to render attachments in deferred shading[18].

Figure 1.1: Visualisation of the data contained within the G-buffers. Positions, normals and even specular values are stored as colors.[9]

Figure 1.1 visualises the data that is stored within the G-buffers. All the data within the buffers is stored as colors and is later sampled in the lighting pass.

Separating light and geometry processing has the advantage that the lighting is decoupled from the geometry. Furthermore, lights are only processed for those pixels that are affected by the light source. This implies that a scene may contain a larger number of light sources without severely impacting performance.

Light sources play an important role when creating different visual environments, both realistic and simplistic. Unfortunately, this technique is not free of faults. The disadvantage of deferred shading is the need for multiple draw calls from separate render pipelines: one for outputting geometry data from the geometry pass, and another for outputting a light processed render. A draw call is when the CPU (Central Processing Unit) commands the GPU to execute an entire instance of the render pipeline. Multiple render pipelines are a requirement within deferred shading, due to structural differences in shading, inbound vertex data and texture data.

The reason why issuing multiple draw calls from separate pipelines is a cumbersome task is the need to swap entire pipelines[3]. Pipelines are vast in scale with regard to the number of components that make one up. Figure 1.2 visualises the number of components that are connected to a pipeline.

Figure 1.2: Overview of the components composing a render pipeline[16]

When swapping a pipeline, each component within a pipeline is replaced.

1.3 Related Work

Converting a 3D deferred shading engine from OpenGL to Vulkan may introduce some form of performance increase. It still will not solve the main issue behind deferred shading, which is the need to swap render-pipelines.

In 2017, ARM conducted a study that introduced the prospect of optimizing deferred shading on mobile GPUs with a tile-based architecture. The study proposes that using a Vulkan feature known as sub-passes may decrease render time by 20%[6].

A sub-pass splits a render-pass into multiple steps that may contain the same dependencies. In terms of deferred shading, the geometry pass and the lighting pass would be two dedicated sub-passes in one large render-pass[14].

The major improvement in ARM's study was due to memory locality. Sub-passes remain within the same memory space at all times, namely that of the enclosing render-pass. With this locality in place, there is no need to move render attachments to the main memory of the device: when the second sub-pass (the lighting pass) reads from the attachments, the data is already local. This saves a lot of memory bandwidth[6].
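As a rough illustration of the bandwidth at stake, the sketch below estimates how many bytes per frame travel to and from main memory when every attachment is written once by the geometry pass and read back once by the lighting pass. The function name and all example values (a 1920x1080 target, four attachments, 16 bytes per texel) are assumptions chosen for illustration and are not figures from the ARM study[6]; caches and framebuffer compression are ignored.

```python
def gbuffer_traffic_bytes(width, height, attachments, bytes_per_texel):
    """Bytes moved per frame when every attachment is written once to main
    memory by the geometry pass and read back once by the lighting pass."""
    texels = width * height
    written = texels * attachments * bytes_per_texel
    read_back = written  # the lighting pass reads every texel once
    return written + read_back
```

Under these assumed values, `gbuffer_traffic_bytes(1920, 1080, 4, 16)` gives 265420800 bytes per frame, traffic that merged sub-passes could in principle keep off main memory.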


1.4 Aim & Objectives

In Vulkan there are two ways of implementing deferred shading. The first is the traditional way, which uses multiple render-targets. The second utilizes a feature unique to Vulkan known as sub-passes.

Our aim is to conduct experiments with these two ways of implementing deferred shading to determine which one is the optimal implementation for a given situation. These situations vary depending on the type of visible object and the number of lights in the scene. Our secondary aim is to conduct experiments on the same systems while changing only the number of render attachments that are processed in the rendering stage. This is to determine how the traditional implementation and the Vulkan-specific implementation handle a growing number of render attachments.

The following items are objectives for the research that is being proposed. They generally describe what has to be done to conduct the research that is proposed within this paper.

• To develop and implement a 3D graphical rendering engine that can utilise both multiple sub-passes and multiple render-targets when performing deferred shading.

• To compare the performance of sub-passes and traditional render-targets during deferred shading in Vulkan.

• To determine whether the number of attachments is a key factor when choosing between the sub-pass and the traditional render-target implementation.

• To describe under what circumstances sub-pass deferred shading may outperform multi render-target shading.

1.5 Research Question

• How do the surrounding environment and the number of attachments affect the render time when using sub-passes compared to traditional multi-pass rendering?


Chapter 2 Techniques & Vulkan API

2.1 Vulkan API

Vulkan is a cutting edge 3D API from Khronos. Its main characteristic is that it is a cross-platform graphics and compute API. It is constantly being developed and updated by the Khronos Group, which aims to create a predictable and explicit API that fulfills the expectations of software creators in different fields such as games, mobile development and desktop development. What makes Vulkan great is its low-level design, which lets developers decide exactly how the API should operate and communicate with the graphics driver. Developers are able to control all the interactions that OpenGL abstracted away, which gives them the opportunity to tailor the implementation to their needs and could lead to a significant performance increase[13][11].

Having the ability to control all the interactions and how every resource is handled also introduces disadvantages: the complexity increases drastically. There is a huge difference in code complexity between OpenGL and Vulkan in getting to the stage where a simple textured triangle can be displayed on the screen, as can be seen in figure 2.1.

Figure 2.1: Code complexity illustration between OpenGL and Vulkan

The OpenGL code on the left side of figure 2.1 represents the API calls necessary to draw a simple triangle, while the right side of the figure is a mere description of the tasks that need to be fulfilled to achieve the same result in Vulkan. While having more control can lead to a significant performance increase, the added complexity can instead reduce performance for inexperienced developers who are not familiar with the Vulkan API.

2.2 Deferred Shading

Deferred shading is a rendering technique that postpones all the heavy shader calculations to a later stage[10], as mentioned in section 1. Two render-passes are needed to defer the heavy calculations. The first render-pass collects all the geometry, which is stored in attachments. There is an attachment each for position, normals, colors and depth, and sometimes one for a specular value used in lighting. After the first render-pass, all the attachments that were written to are stored in main memory. Before the second render-pass, where all the lighting calculations take place, the attachments need to be read from main memory into the memory of the GPU in order for the shader to access the values. The benefit of implementing deferred shading is the decoupling of scene geometry from lighting: the light is only computed for the parts that are visible within the scene. Further advantages are a cleaner management of complex lighting resources, easier management of other complex shader resources and a cleaner software pipeline.
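To make the two stages concrete, the following is a minimal CPU-side sketch in Python and not the renderer used in this thesis. The names `geometry_pass` and `lighting_pass`, the three-value G-buffer layout and the Lambert-only light model without attenuation are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GBufferTexel:
    """One pixel of the G-buffer: the per-pixel output of the geometry pass."""
    position: Vec3
    normal: Vec3
    albedo: Vec3


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = dot(v, v) ** 0.5
    return (v[0] / length, v[1] / length, v[2] / length)


def geometry_pass(fragments):
    """First pass: store geometry data per visible pixel; no lighting yet."""
    return {pixel: GBufferTexel(pos, normalize(nrm), alb)
            for pixel, (pos, nrm, alb) in fragments.items()}


def lighting_pass(gbuffer, lights):
    """Second pass: shade each stored pixel once per light (Lambert term only)."""
    image = {}
    for pixel, texel in gbuffer.items():
        intensity = 0.0
        for light_pos in lights:
            to_light = normalize(
                tuple(l - p for l, p in zip(light_pos, texel.position)))
            intensity += max(0.0, dot(texel.normal, to_light))
        image[pixel] = tuple(c * intensity for c in texel.albedo)
    return image
```

The lighting cost in this sketch grows with the number of stored pixels times the number of lights, independent of how much geometry was submitted to the first pass.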

2.3 Deferred Shading Using Multiple Render-Target

Deferred shading that utilizes multiple render-targets is considered to be the traditional implementation of the technique. The technique breaks a single render into two render-passes. Because render-passes do not share local memory with each other, the first render-pass has to output and store the attachment data to texture memory, which the second render-pass can then only read from. The second render-pass thus has to read from texture memory, which is an expensive operation compared to reading from local memory.


2.4 Deferred Shading Using Sub-Passes

Figure 2.2: In this case sub-pass 0 will be merged with sub-pass 1 into a single render-pass[6]

A render-pass in Vulkan that contains multiple sub-passes is called a multi-pass. Normally this would not be any different from using multiple render-passes, but sub-passes can have dependencies connected to each other. One distinct difference is that a dependency can be restricted to a per-pixel region. This is important because it allows the GPU to merge multiple sub-passes into one render-pass. The benefit of doing this is that the result from one sub-pass can be kept in GPU memory for another sub-pass to use, instead of having to be written to and read from main memory[6].
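The effect of a per-pixel dependency can be modelled with a small sketch. The Python code below is a toy model, not Vulkan code: `geometry` and `lighting` are hypothetical stand-ins for the two sub-passes, and the tile size is an assumption. Because the lighting stage only reads the value of its own pixel, running both stages tile by tile produces the same image as two full-frame passes while keeping only one tile of intermediate data alive.

```python
def geometry(pixel):
    """Stand-in for the geometry sub-pass: one G-buffer value per pixel."""
    x, y = pixel
    return x + 10 * y


def lighting(gbuffer_value):
    """Stand-in for the lighting sub-pass: reads only its own pixel's value."""
    return 2 * gbuffer_value + 1


def render_two_passes(width, height):
    """Separate passes: the whole G-buffer exists in memory between them."""
    pixels = [(x, y) for y in range(height) for x in range(width)]
    gbuffer = {p: geometry(p) for p in pixels}       # full-frame intermediate
    image = {p: lighting(gbuffer[p]) for p in pixels}
    return image, len(gbuffer)


def render_merged(width, height, tile):
    """Merged sub-passes: each tile runs both stages before the next tile
    starts, so only one tile of intermediate data is alive at any time."""
    image, peak = {}, 0
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            pixels = [(x, y)
                      for y in range(ty, min(ty + tile, height))
                      for x in range(tx, min(tx + tile, width))]
            local = {p: geometry(p) for p in pixels}  # tile-local intermediate
            image.update({p: lighting(local[p]) for p in pixels})
            peak = max(peak, len(local))
    return image, peak
```

For an 8x8 image and 4x4 tiles, both functions return the same image, but the peak intermediate storage drops from 64 values to 16. Whether a real driver merges the sub-passes in this way is up to the implementation.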


Chapter 3 Method

3.1 Base Implementation

Figure 3.1: UML overview of the structure of the project

An overview of the project is shown in figure 3.1. During startup the main method is invoked, which initializes the application class. When the application class is initialized, it reads an experiment settings header that specifies which renderer to use, how many lights and attachments to use, and for how long each experiment is supposed to run. The responsibility of the application class is to keep updating the application and to keep the currently chosen renderer drawing. The class also keeps track of the time for each frame. After each experiment the application class terminates and cleans up all the used resources so that the next experiment can be as independent as possible. Before the application class is terminated, the collected results are written to a text file together with additional information: which renderer ran, how many lights were used and how many attachments were used.
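The settings read at startup could be represented as follows. This is a minimal sketch in Python; the class name, field names and the result line format are hypothetical and do not reproduce the actual header used in the project. The attachment range of one to four follows the experiments reported in chapter 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSettings:
    """Hypothetical mirror of the experiment settings header."""
    renderer: str            # "MRT" or "MSP"
    light_count: int
    attachment_count: int
    duration_seconds: float

    def __post_init__(self):
        if self.renderer not in ("MRT", "MSP"):
            raise ValueError(f"unknown renderer: {self.renderer}")
        if self.attachment_count not in (1, 2, 3, 4):
            raise ValueError("attachment_count must be between 1 and 4")
        if self.light_count < 0 or self.duration_seconds <= 0:
            raise ValueError("invalid light count or duration")


def result_header(settings):
    """Line written at the top of a result file to identify the run."""
    return (f"renderer={settings.renderer} "
            f"lights={settings.light_count} "
            f"attachments={settings.attachment_count}")
```

Validating the settings once at startup keeps an invalid combination from producing a result file that looks like a legitimate run.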

3.2 Renderer Implementation

3.2.1 Base Renderer

As displayed in figure 3.1, the two rendering systems inherit from the base render system. The base render system initializes the Vulkan API and the dependencies shared by both renderers. The base renderer is in charge of setting up the validation layers, which are used for debugging, general application information such as the version of the API, and which extensions to use, as can be seen in figure A.1.

Another responsibility of the base renderer is to choose which GPU to use and to set up the logical device, which is used for invoking API calls. In addition, it sets up the swap chain, the resources for the swap chain, the command pool, semaphores and fences. The semaphores and fences are used to synchronize the CPU and the GPU[16]. The last thing that is created is the query pool, which is used to write GPU timestamps when a draw call is performed[14].
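Timestamps written to a query pool are raw tick counts and have to be converted before they can be reported in seconds. The following sketch shows one way of doing that conversion. The function and parameter names are hypothetical; `timestamp_period_ns` stands for the device limit `VkPhysicalDeviceLimits::timestampPeriod` (nanoseconds per tick) and `valid_bits` for `VkQueueFamilyProperties::timestampValidBits`. The example values are illustrative only.

```python
def timestamp_delta_seconds(begin_ticks, end_ticks, timestamp_period_ns,
                            valid_bits=64):
    """Convert two raw GPU timestamp values into elapsed seconds.

    The subtraction is taken modulo 2**valid_bits so that a single
    wrap-around of the counter between the two samples is handled.
    """
    ticks = (end_ticks - begin_ticks) % (2 ** valid_bits)
    return ticks * timestamp_period_ns * 1e-9
```

For example, 428000 ticks at an assumed period of 1 ns per tick correspond to 0.000428 s.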

3.2.2 Multiple Render Target Renderer

The initialization of the MRTRenderer, which is the renderer that performs multiple render-target shading, starts by generating pipeline information. This sets properties for the input assembly, rasterizer and viewport and describes the final color attachment that will be presented to the screen, as can be seen in figure A.3. It continues by creating the uniform buffer that holds all the lights and matrices and by setting up the quad that the final result is rendered upon. Both rendering systems use pre-recorded command buffers, which means that the draw calls are recorded during initialization and re-used every frame. This is possible because the environment is static: the only change is the movement of the camera, which is handled through uniform buffers.
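The idea of recording once and replaying every frame can be sketched with a toy model. The class and function names below are hypothetical and no Vulkan calls are involved; the sketch only shows that the recorded commands stay fixed while the uniform data changes per frame.

```python
class CommandBuffer:
    """Toy stand-in for a command buffer: an ordered list of commands."""

    def __init__(self):
        self.commands = []
        self.record_count = 0

    def record(self, draw_calls):
        self.commands = list(draw_calls)
        self.record_count += 1


def run_frames(frame_count, draw_calls):
    """Record once during initialization, then replay the same buffer each
    frame. Only the uniform data (here: a camera angle) changes per frame."""
    buffer = CommandBuffer()
    buffer.record(draw_calls)                       # initialization
    submitted = []
    for frame in range(frame_count):
        uniforms = {"camera_angle": frame * 0.01}   # per-frame uniform update
        submitted.append((tuple(buffer.commands), uniforms))
    return buffer, submitted
```

In this model the recording cost is paid once regardless of how many frames are rendered, which is the property the experiments rely on.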

Shaders in Vulkan are required to be converted to SPIR-V binaries in order for the API to interpret them. The initialization of MRT continues with setting up the actual pipeline, creating the descriptor pool and descriptor sets, and lastly recording the command buffers.

The MRT update method, which can be seen in figure A.6, updates the camera to orbit around the mesh and keeps track of the experiment time. The uniform buffers are updated with new information such as the current position of the camera. The render method of MRT can be seen in figure A.4 and figure A.5. It starts by performing the geometry pass and then performs the last render-pass, which carries out the lighting calculations. Next it swaps out the current image in the swap chain for the one that just finished rendering and presents it to the screen. Lastly, the timestamp is queried from the API and stored.


3.2.3 Multiple Sub-pass Renderer

The MSP renderer follows the same initialization process as the MRT renderer. The MSP initialization process can be seen in figure A.7. It starts by parsing the shaders to account for the number of attachments that are supposed to be used. The camera and the lights are then created. Next the render-pass is set up, which describes the render-pass as a whole. Within the render-pass creation the sub-passes are defined and their dependencies are set up. It continues by setting up the frame buffers, which depend on how many attachments are supposed to be used. If all are used, the frame buffers consist of the swap chain view, color, depth, position, normal and specular attachments. The initialization continues with setting up the uniform buffers and the descriptor sets. Lastly, the pipeline is created and the command buffers are recorded.

The update method, which can be seen in figure A.10, is logically almost identical to that of MRT. The experiment time is tracked, and the camera and the uniform buffers are updated.

The render method in MSP, which can be seen in figure A.8, starts by performing the draw call and then collects the time it took to perform that draw call. The draw method, which performs the actual draw call, can be seen in figure A.9. It starts by acquiring a swap chain image to render to and submits the pre-recorded command buffers together with the swap chain image to the queue for rendering. It then waits for the GPU to finish rendering, with help from the semaphores and fences, before presenting the final image to the screen.

3.3 Experimental Setup

Figure 3.2: A simple plane with 512 point lights within the scene

Figure 3.3: The Stanford dragon with 2048 point lights within the scene

The experiments are conducted on two meshes, a simple plane consisting of 96 vertices and the Stanford dragon consisting of 2614242 vertices[4]. In each iteration the lights are increased exponentially starting from 128 point lights and ending on 2048 point lights. The camera moves in a circular motion around the scene while panning in and out in order to keep the scene dynamic instead of static.
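The set of experiment runs described above can be enumerated as follows. The sketch assumes that the light count doubles at every step, which is consistent with the values 512, 1024 and 2048 mentioned in this thesis, and that every combination of renderer, mesh and attachment count is run. Both are assumptions, and all names are hypothetical.

```python
from itertools import product


def light_counts(start=128, stop=2048):
    """Light counts from start to stop, doubling each step (assumed schedule)."""
    counts = []
    n = start
    while n <= stop:
        counts.append(n)
        n *= 2
    return counts


def experiment_matrix():
    """Every combination of renderer, mesh, attachment count and light count."""
    renderers = ("MRT", "MSP")
    meshes = ("plane", "dragon")
    attachments = (1, 2, 3, 4)
    return list(product(renderers, meshes, attachments, light_counts()))
```

Under these assumptions the schedule contains five light counts and the full matrix contains 80 runs.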

Due to the nature of light calculations, no light calculations can be conducted with fewer than three attachments (position, normal and color). Four attachments refers to position, normal, color and specular value. This is a much more cumbersome lighting calculation and will therefore be the main pillar with regard to light calculation performance.

The metrics that will be gathered for analysis will purely be related to average frame time. Analyzing the average frame time provides a strong overview in regards to real-life potential of a real-time rendering engine.
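For reference, the metric can be computed as in the sketch below. The function names are hypothetical; the frame rate is derived as the reciprocal of the average frame time.

```python
def average_frame_time(frame_times_s):
    """Arithmetic mean of the recorded per-frame times, in seconds."""
    if not frame_times_s:
        raise ValueError("no frames recorded")
    return sum(frame_times_s) / len(frame_times_s)


def frames_per_second(avg_frame_time_s):
    """Average frame rate implied by an average frame time in seconds."""
    return 1.0 / avg_frame_time_s
```

An average frame time of 0.020 s, for instance, corresponds to 50 frames per second.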

The reason 2048 point lights was the upper limit is that a larger number of point lights exceeded the maximum amount of memory a uniform buffer is able to allocate.
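Whether a given number of lights fits can be estimated with simple arithmetic. Vulkan reports the largest range that can be bound as a uniform buffer in `VkPhysicalDeviceLimits::maxUniformBufferRange`. The sketch below is a hedged illustration: the function name, the 65536 byte limit, the 32 bytes per light and the 192 byte header are assumed values chosen to make the arithmetic easy to follow, not measurements from the test system.

```python
def max_lights_in_uniform_buffer(max_range_bytes, bytes_per_light,
                                 header_bytes=0):
    """Largest number of lights that fits in one uniform buffer binding.

    header_bytes is the space taken by other data stored ahead of the
    light array, for example matrices.
    """
    if bytes_per_light <= 0:
        raise ValueError("bytes_per_light must be positive")
    return max(0, (max_range_bytes - header_bytes) // bytes_per_light)
```

With the assumed values, a buffer holding only lights has room for 2048 of them, and reserving 192 bytes for other data reduces that to 2042.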

3.4 Hardware

The following segment specifies the system specifications of the computer that was used to conduct the rendering experiments.

CPU   Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz

GPU   Nvidia GeForce GTX 1070

RAM   12.0GB DDR3

OS    Windows 10 Home


Chapter 4 Results, Analysis & Summary

4.1 Results

When conducting the experiments and gathering the statistics, it was never clear which implementation would perform optimally. Rather, it was clear that the statistics would have to be placed under scrutiny and analyzed. During the execution of the experiments no clear visual side effects occurred.

The following graphs represent the data that was collected from the experiments. The exact statistical data is available in the appendix.

Figure 4.1: Average frame time described in seconds for rendering varying scenes with one attachment.

None of the data collected for figure 4.1 to figure 4.3 involved any form of light calculations. This is due to the reasons mentioned in chapter 3: it is not possible to perform light calculations with fewer than three attachments. There is a clear difference between these figures and figure 4.4. Figure 4.4 contains all the statistics that include light calculations.


Figure 4.2: Average frame time described in seconds for rendering varying scenes with two attachments.

Figure 4.3: Average frame time described in seconds for rendering varying scenes with three attachments.

It must be stated as well that figures 4.1 to 4.3 contain an overlapping plot. The overlapping plot is "Quad MSP Dragon", which has the exact same values as "Quad MRT Dragon".

Figure 4.4: Average frame time described in seconds for rendering varying scenes with four attachments.

We may observe that within figure 4.4 there are multiple plots with a flat trajectory, while other plots have a trajectory similar to that of an exponential graph. The flat plots refer to the time it took to process the geometry pass, while the other plots refer to the render time of the lighting pass.

It is clear from figure 4.4 that multi render-target deferred shading is in general far superior to multi sub-pass shading when it comes to performing a lighting pass. The only clear case in which multi sub-pass shading outperformed multi render-target shading was with 512 light sources. When comparing "Quad MSP Dragon" with "Quad MRT Dragon" at 2048 light sources we find a difference of 0.004657 s, which is a 10% increase in render time. When comparing "Quad MSP Plane" with "Quad MRT Plane" at the same number of light sources, the difference is only 5%.

While this is the case for rendering a 3D environment with light sources, there is an inversion effect in regards to the figures ranging from 4.1 to figure 4.3. In these cases utilizing multi sub-pass shading would seem to be the more attractive option in regards to frame time. But this does leave us with the question as to why this is the case.

It is clear that there is a large difference across figures 4.1 to 4.3 in terms of the "Offscreen MSP Plane" plot. This is not the only differentiating factor. The multiple render-target implementation has a tendency to fluctuate: "Offscreen MRT Plane" seemed to double in render time from figure 4.1 to 4.2, after which it remained stable.

Because the experiments preserve only the average frame time and not the exact frame time of each render, it is difficult to analyse fluctuations within a run. Within figure 4.1 there is a fluctuation in the time it takes to render to the offscreen buffer for "Offscreen MRT Plane". What caused this fluctuation is unclear.

4.2 Analysis

The data collected from the experiments provided great insight into the general performance impact of multiple render-target deferred shading and multiple sub-pass deferred shading. Since the tests collect frame time data for both the geometry pass and the lighting pass, we can analyze both aspects of the system. Keep in mind that if the total rendering time of both passes is of interest, we need only add the render times of the geometry pass and the lighting pass.

4.2.1 Attachment Impact

One of the objectives in section 1 called for an analysis of the performance impact of render attachments on both multi sub-pass and multi render-target deferred shading.

Calculating the performance impact of an increased number of attachments has led to unexpected results. It is important to note as well that, due to the parallel nature of a GPU, it is difficult to determine exactly where a bottleneck may arise or exactly how it may impact performance. The only quantity that can be used to measure the work exactly is the total time.

Increasing the number of attachments when rendering a plane mesh containing 96 vertices gave an average increase in the time spent in the geometry pass of 21.3 μs/attachment for multi render-target rendering, while multi sub-pass rendering under the same scenario gave an average increase of 10.6 μs/attachment. For this experiment there is a clear improvement for multi sub-pass deferred shading: the per-attachment cost is roughly 50% lower.
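The 50% figure follows directly from the two measured per-attachment values. The sketch below shows the arithmetic. The helper `per_attachment_slope` assumes that the average increase is taken as the change between the smallest and the largest attachment count divided by the number of added attachments, which is an assumption about how such a value can be derived, and any sample times passed to it are hypothetical.

```python
def relative_decrease(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


def per_attachment_slope(times_by_attachment_count):
    """Average change in time per added attachment, from {count: time}."""
    counts = sorted(times_by_attachment_count)
    first, last = counts[0], counts[-1]
    delta = times_by_attachment_count[last] - times_by_attachment_count[first]
    return delta / (last - first)
```

With the values reported above, `relative_decrease(21.3, 10.6)` evaluates to approximately 0.50.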

This was not the only case that was measured. The experiments were also conducted on a far more complex mesh, the Stanford Dragon containing 2614242 vertices. The results are truly unexpected. Here we find that the average time changes by -6.3 μs/attachment for multi render-target rendering, that is, it decreases, while for multi sub-pass rendering it increases by 2.6 μs/attachment. This indicates that, under these circumstances, each additional attachment increases the render time of multi sub-pass rendering but decreases the render time of multi render-target rendering.

If one conclusion can be drawn from this analysis, it is that attachments are not free from impacting the performance of either implementation of deferred shading. However, due to the generally small changes in render time, some of which are at the microsecond level, it is difficult to argue that attachment overhead is a key performance factor for multi sub-pass or multi render-target rendering.


4.2.2 Performance Stability

The average frame time for each deferred shading implementation had a natural variance. Due to the use of real-time renderers there is an importance placed on maintaining stability throughout the duration of use.

4.2.3 Multi Sub-Pass

When multi sub-pass deferred shading is used, the main area of instability lies in rendering the Stanford Dragon. Here no table has a stable value for off-screen buffer rendering. Table B.9 has an average frame time of 0.000428 s. Table B.10 suffers from fluctuating values, with an average time of 0.000425 s. Table B.11 has an average frame time of 0.000431 s, and table B.12 one of 0.0004302 s. It is clear from looking at each individual table that the values are considerably unstable. This should not occur, even with an increasing number of lights, since the lights are never considered or utilized during the off-screen stage of deferred shading. It is important to highlight that the test case of rendering a plane mesh was 100% stable with regard to the off-screen stage.
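The spread between the four table averages quoted above can be quantified as follows. This sketch only summarises the per-table averages; the individual values within each table (see appendix B) are not reproduced here. The function name and the choice of summary statistics are assumptions.

```python
from statistics import mean, pstdev


def spread_summary(values):
    """Mean, range, relative range and standard deviation of the values."""
    avg = mean(values)
    value_range = max(values) - min(values)
    return {
        "mean": avg,
        "range": value_range,
        "relative_range": value_range / avg,
        "std_dev": pstdev(values),
    }


# Average off-screen frame times in seconds from tables B.9 to B.12.
TABLE_AVERAGES = [0.000428, 0.000425, 0.000431, 0.0004302]
```

For these four averages, `spread_summary(TABLE_AVERAGES)` gives a range of about 6 μs, or roughly 1.4% of their mean. This measures only the variation between tables, not the fluctuation inside each table.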

4.2.4 Multi Render-Target

In contrast to multi sub-pass deferred shading, multi render-target deferred shading shows considerable fluctuations in both experiments. The reasons for this are unknown, but they may indicate an issue with the traditional shading technique.

4.2.5 Summary

Previously, in section 1, a research question was stated that has not yet been explicitly answered. The research question is the following: How is the surrounding environment and the number of attachments affecting the render time when using sub-passes compared to traditional multi-pass rendering?

The data collected from the experiments clearly shows that the multiple sub-pass renderer responds worse to complex geometry than the multi-target renderer.

It is also noted that the multiple sub-pass renderer performed better with simple geometry than the multi-target renderer.

The number of render attachments does not necessarily impact multi sub-pass or multi-target deferred shading severely, but there is nevertheless a small overhead.

Increasing the number of light sources seems to hit the multiple sub-pass renderer harder than the multi-target renderer when exceeding 1024 light sources. This could be an implementation issue rather than an issue with the technique.


Chapter 5 Discussion

5.1 Discussion

It is not clear which implementation can be considered the overall victor. In several cases multi sub-pass (MSP) rendering outperformed multi render-target (MRT) rendering. While rendering the plane mesh, it was clear that rendering with MSP had a smaller performance impact. This is because of the main differences explained in section 2. The selling point of using multiple sub-passes within a single render-target is that it consumes less memory bandwidth, which provides better performance. The cost of constantly writing to and reading from main memory is far higher than if the data could persist on the GPU. Lower memory bandwidth usage reduces battery consumption on mobile phones and could also increase the frame rate for smoother gameplay.
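To give a sense of scale for the bandwidth argument, the following Python sketch estimates how much G-buffer data is written and read back per frame when the attachments live in main memory. The resolution, attachment format, attachment count and frame rate are hypothetical values chosen for illustration; they are not the settings used in the experiments, and the function name is likewise an assumption.

```python
def gbuffer_traffic_bytes(width, height, bytes_per_pixel, attachments):
    """Bytes moved per frame when every attachment is written once in the
    geometry pass and read back once in the lighting pass."""
    one_way = width * height * bytes_per_pixel * attachments
    return 2 * one_way


# Hypothetical configuration, not taken from the experiments.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 8      # e.g. four 16-bit channels per attachment
ATTACHMENTS = 4
FRAMES_PER_SECOND = 60

per_frame = gbuffer_traffic_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL, ATTACHMENTS)
per_second = per_frame * FRAMES_PER_SECOND

print(f"{per_frame / 1e6:.1f} MB per frame, {per_second / 1e9:.2f} GB per second")
```

Under these assumed values the G-buffer alone accounts for roughly 133 MB of memory traffic per frame. How much of this can actually stay on the GPU when sub-passes are used depends on the GPU architecture and the driver.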

An interesting observation is that MRT seems to perform better than MSP when dealing with lighting calculations. This could be caused by shader differentiation, since MSP uses different sets of shaders due to sub-pass specific implementations. The transition between the first pass and the second pass is faster with MSP, which could mean that MRT performs better during light calculations because of a better implementation. Another possible explanation is that MRT is able to write the final pixel color to the swap chain image faster than MSP. Shader differentiation could be the reason for MRT being faster than MSP during light calculations; however, since the shaders are almost identical, the evidence leans more towards a performance difference when writing to the swap chain image.
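The size of this gap can be read directly from the appendix. The sketch below, written in Python, compares the Quad times of tables B.4 (MRT) and B.12 (MSP) for the Stanford Dragon with four attachments. It assumes that the Quad column measures the lighting pass; the variable and function names are illustrative only.

```python
# Quad times in seconds for the Stanford Dragon with four attachments,
# copied from table B.4 (MRT) and table B.12 (MSP).
lights = [128, 256, 512, 1024, 2048]
quad_mrt = [0.000815, 0.001736, 0.003752, 0.008039, 0.021087]
quad_msp = [0.000942, 0.002120, 0.004706, 0.010818, 0.025744]


def relative_slowdown(msp, mrt):
    """Return how much slower MSP is than MRT, as a fraction of the MRT time."""
    return (msp - mrt) / mrt


slowdowns = [relative_slowdown(s, t) for s, t in zip(quad_msp, quad_mrt)]

for count, slowdown in zip(lights, slowdowns):
    print(f"{count:>4} lights: MSP Quad pass is {slowdown:.1%} slower than MRT")
```

For every light count in these two tables the MSP Quad pass is slower than the MRT one, by between roughly 15% and 35%.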

While using multiple sub-passes instead of multiple render-targets consumes less memory bandwidth, it comes with a downside. A pixel can only read from and write to its own location, meaning that post-effects like blur are not feasible with multiple sub-passes; in that case a second render-target is needed.
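The difference between the two access patterns can be illustrated on the CPU. The following Python sketch is a conceptual model only, not Vulkan code: `per_pixel_pass` mimics the sub-pass restriction, where the output at a location depends only on the input at the same location, while `box_blur_3x3` needs the neighbouring pixels and therefore requires access to the whole image. All names and the tiny test image are hypothetical.

```python
# A tiny grayscale "attachment" stored as rows of pixel values.
image = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]


def per_pixel_pass(src, shade):
    """Sub-pass style access: output (x, y) depends only on src at (x, y)."""
    return [[shade(value) for value in row] for row in src]


def box_blur_3x3(src):
    """Blur needs the neighbours of (x, y), so it must sample the whole image."""
    height, width = len(src), len(src[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += src[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out


halved = per_pixel_pass(image, lambda value: value * 0.5)
blurred = box_blur_3x3(image)
```

The first function could run with nothing but the current pixel available, whereas the second cannot produce a single output value without reading several input locations.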

Stability should always be a key factor behind one's choice of rendering technique. Both implementations vary in stability, with variations on the level of microseconds. It would seem that MSP shading handles the geometry pass in a more consistent and stable way than MRT, whose render times varied throughout the rendering of both meshes.

5.2 Complications

Due to the current unfortunate state of the world with regard to Covid-19, and due to the time constraints, the tests were only performed on a single system. Performing the tests on multiple systems with different hardware could lead to different results. Arguably, MSP would still perform better in terms of memory bandwidth when used on a different system. Because of the time constraints, there was not enough time to create a more complex scene with a lot of geometry to better highlight the differences between using multiple render-targets and one render-target with multiple sub-passes. Performing the tests on a mobile phone would have been optimal, since phones are more sensitive to memory bandwidth consumption.

Having the ability to perform deferred shading with less memory bandwidth could mean that the performance gained can be spent somewhere else in the application. In a game, it could be spent on physics, animation or other parts of the graphics. It could also mean that mobile games could reduce their battery consumption, which could possibly lead to a happier player base. The ability to squeeze out more performance is constantly being sought after, and the fact that the Khronos Group decided to implement such a feature could make other graphics API developers copy and improve it. This could eventually lead to more optimized ways to perform the techniques that are commonly used in games and other graphical applications to improve the visual aspect.


Chapter 6 Conclusions & Future Work

6.1 Conclusion

This thesis has shown that multiple sub-pass deferred shading within a single render-pass performs faster write operations to the attached render attachments. The performance gain from faster write and read operations may be used in other areas to enhance different aspects of the game or graphical application. Consuming less memory bandwidth by using sub-passes would result in a longer battery life on mobile phones, as mentioned in the study conducted by ARM.

While an increasing number of render attachments does not seem to affect multiple sub-pass rendering as harshly as multiple render-target rendering, the number of lights used in a scene does affect multiple sub-pass rendering negatively.

Using sub-passes also limits what the developer can do. Techniques where a pixel needs to know about its neighbors would not be feasible due to the limitations. A pixel can only read and write to itself.

From the results, it is hard to draw a conclusion as to which implementation is better. According to the results, the multiple sub-pass renderer performed better with simple geometry but worse with complex geometry.

The multiple sub-pass renderer took a bigger hit when the number of light sources was increased above 1024, which could be caused by the implementation or could be a weakness of the technique itself. It should be concluded that more research is needed on the topic of utilizing Vulkan sub-passes.

6.2 Future Work

The research provided an insight into the possible results of implementing deferred shading using multiple sub-passes on a standardized personal computer. The results of the experiments indicate that sub-pass deferred shading may be a viable technique for implementing a real-time rendering engine. It has considerable potential.

With more and more companies changing their rendering API from OpenGL/DirectX11 to Vulkan/DirectX12 there is more room for competition between the APIs. Each API may create unique features, which will place it ahead of the competition. With sub-passes being a unique Vulkan feature it may be a stepping stone to boosting its popularity for commercial use.


For sub-pass deferred shading to become a more common implementation, more research should be conducted within this area. Nor should the use of sub-passes be restricted to deferred shading. With the decrease in memory bandwidth usage, the feature may prove useful within other areas of real-time rendering. These areas are unfortunately beyond the scope of this thesis.


References

[1] Directx11 release. http://www.nvidia.com/content/nvision2008/tech_presentations.html.

[2] Opengl release date. https://www.khronos.org/opengl/wiki/History_of_OpenGL.

[3] Render pipeline overview. https://www.khronos.org/opengl/wiki/Rendering_Pipeline_Overview.

[4] Stanford dragon. http://graphics.stanford.edu/data/3Dscanrep/Stanford Dragon.

[5] Directx, Mar 2020. https://en.wikipedia.org/wiki/DirectX DirectX Wiki-Page.

[6] Hans-Kristian Arntzen. Arm multi sub pass deferred rendering. https://community.arm.com/developer/tools-software/graphics/b/blog/posts/vulkan-multipass-at-gdc-2017 ARM Community Site.

[7] Mike Bailey. Introduction to the vulkan® computer graphics api. pages 1–155, 11 2019.

[8] Blender. https://www.blender.org/about/. Accessed: 22-05-2020.

[9] Joey de Vries. https://learnopengl.com/Advanced-Lighting/Deferred-Shading, 16/05-2020. Accessed: 18-05-2020.

[10] Michal Ferko. Real-time lighting eﬀects using deferred shading. 2012.

[11] Andreas Flöjt. Exploiting temporal coherence in scene structure for incremental draw call recording in vulkan. 2018.

[12] Khronos Group Inc. The industry’s foundation for high performance graphics. https://www.opengl.org/about/ OpenGL About Page.

[13] Christoph Kubisch. Transitioning from gl to vulkan, Feb 2016.

[14] Pawel Lapinski. Vulkan Cookbook. Packt Publishing, 2017.

[15] Michael Deering, Stephanie Winner, Bic Schediwy, Chris Duffy, Neil Hunt. The triangle processor and normal vector shader: a vlsi system for high performance graphics. June 1988.

[16] Parminder Singh. Learning Vulkan. Packt Publishing, 2016.

[17] Steam. https://store.steampowered.com/hwsurvey/videocard/?sort=pct Steam User GPU Hardware. Accessed: 14-05-2020.

[18] Tomas Akenine-Möller, Eric Haines, Naty Hoffman, Angelo Pesce, Michał Iwanicki, Sébastien Hillaire. Real-Time Rendering 4th Edition. CRC Press, 2018.


Appendix A

Implementations

A.1 Base Renderer

Figure A.1: First half of the render parent initialize function


Figure A.2: Second half of the render parent initialize function

A.2 Renderer Systems


Figure A.3: MRT initialization method

Figure A.4: First half of MRT Render method


Figure A.5: Second half of MRT Render method

Figure A.6: MRT update method


Figure A.7: MSP initialization method

Figure A.8: The render method of MSP


Figure A.9: The draw method of MSP

Figure A.10: MSP update method


Appendix B

Supplemental Information

B.1 Multi Render-Target Tables

Lights    Offscreen MRT (s)    Quad MRT (s)
128       0,000437             0,000014
256       0,000445             0,000014
512       0,000447             0,000014
1024      0,000447             0,000014
2048      0,000447             0,000014

Table B.1: MRT Data for Dragon Mesh With One Attachment

Lights    Offscreen MRT (s)    Quad MRT (s)
128       0,000438             0,000014
256       0,000439             0,000014
512       0,000438             0,000014
1024      0,000438             0,000014
2048      0,000439             0,000014

Table B.2: MRT Data for Dragon Mesh With Two Attachments

Lights    Offscreen MRT (s)    Quad MRT (s)
128       0,000435             0,000014
256       0,000435             0,000014
512       0,000436             0,000014
1024      0,000436             0,000014
2048      0,000436             0,000014

Table B.3: MRT Data for Dragon Mesh With Three Attachments

Lights    Offscreen MRT (s)    Quad MRT (s)
128       0,000433             0,000815
256       0,000434             0,001736
512       0,00044              0,003752
1024      0,000438             0,008039
2048      0,000428             0,021087

Table B.4: MRT Data for Dragon Mesh With Four Attachments

Lights    Offscreen MRT (s)    Quad MRT (s)
128       0,000024             0,000017
256       0,000024             0,000017
512       0,000025             0,000017
1024      0,000024             0,000017
2048      0,000025             0,000017

Table B.5: MRT Data for Plane Mesh With One Attachment

Lights    Offscreen MRT (s)    Quad MRT (s)
128       0,000067             0,000023
256       0,000072             0,000024
512       0,000071             0,000024
1024      0,000071             0,000024
2048      0,000073             0,000024

Table B.6: MRT Data for Plane Mesh With Two Attachments

Lights    Offscreen MRT (s)    Quad MRT (s)
128       0,000081             0,00002
256       0,000082             0,00002
512       0,000082             0,00002
1024      0,000082             0,00002
2048      0,000086             0,000021

Table B.7: MRT Data for Plane Mesh With Three Attachments

Lights    Offscreen MRT (s)    Quad MRT (s)
128       0,000087             0,000801
256       0,000089             0,00175
512       0,00009              0,003802
1024      0,00009              0,008157
2048      0,000089             0,021471

Table B.8: MRT Data for Plane Mesh With Four Attachments


B.2 Multi Sub-Pass Tables

Lights    Offscreen MSP (s)    Quad MSP (s)
128       0,00042              0,000014
256       0,000427             0,000014
512       0,000429             0,000014
1024      0,000432             0,000014
2048      0,000432             0,000014

Table B.9: MSP Data for Dragon Mesh With One Attachment

Lights    Offscreen MSP (s)    Quad MSP (s)
128       0,000417             0,000014
256       0,000426             0,000014
512       0,000428             0,000014
1024      0,000429             0,000014
2048      0,000428             0,000014

Table B.10: MSP Data for Dragon Mesh With Two Attachments

Lights    Offscreen MSP (s)    Quad MSP (s)
128       0,000421             0,000014
256       0,00043              0,000014
512       0,000434             0,000014
1024      0,000435             0,000014
2048      0,000437             0,000014

Table B.11: MSP Data for Dragon Mesh With Three Attachments

Lights    Offscreen MSP (s)    Quad MSP (s)
128       0,00042              0,000942
256       0,000429             0,00212
512       0,000431             0,004706
1024      0,000431             0,010818
2048      0,00044              0,025744

Table B.12: MSP Data for Dragon Mesh With Four Attachments

Lights    Offscreen MSP (s)    Quad MSP (s)
128       0,000008             0,000015
256       0,000008             0,000015
512       0,000008             0,000015
1024      0,000008             0,000015
2048      0,000008             0,000015

Table B.13: MSP Data for Plane Mesh With One Attachment

Lights    Offscreen MSP (s)    Quad MSP (s)
128       0,000018             0,000015
256       0,000018             0,000015
512       0,000018             0,000015
1024      0,000018             0,000015
2048      0,000018             0,000015

Table B.14: MSP Data for Plane Mesh With Two Attachments

Lights    Offscreen MSP (s)    Quad MSP (s)
128       0,000027             0,000015
256       0,000027             0,000015
512       0,000027             0,000015
1024      0,000027             0,000015
2048      0,000027             0,000015

Table B.15: MSP Data for Plane Mesh With Three Attachments

Lights    Offscreen MSP (s)    Quad MSP (s)
128       0,00004              0,000955
256       0,00004              0,002149
512       0,00004              0,004816
1024      0,00004              0,010749
2048      0,00004              0,022004

Table B.16: MSP Data for Plane Mesh With Four Attachments



Abstract

Conclusions. This thesis has shown that using multiple sub-passes within a single render-target performs faster write operations to the attached render attachments.

Keywords: Vulkan, Sub-pass, render-target, Deferred, Shading

Acknowledgments

We would like to thank our supervisor Prashant Goswami for his support and invaluable input throughout the project. We would also like to thank our friends and families.

Steven Cederrand & Alexander Danliden


Contents

Abstract

Acknowledgments

1 Introduction
1.1 Introduction
1.2 Background
1.3 Related Work
1.4 Aim & Objectives
1.5 Research Question

2 Techniques & Vulkan API
2.1 Vulkan API
2.2 Deferred Shading
2.3 Deferred Shading Using Multiple Render-Target
2.4 Deferred Shading Using Sub-Passes

3 Method
3.1 Base Implementation
3.2 Renderer Implementation
3.2.1 Base Renderer
3.2.2 Multiple Render Target Renderer
3.2.3 Multiple Sub-pass Renderer
3.3 Experimental Setup
3.4 Hardware

4 Results, Analysis & Summary
4.1 Results
4.2 Analysis
4.2.1 Attachment Impact
4.2.2 Performance Stability
4.2.3 Multi Sub-Pass
4.2.4 Multi Render-Target
4.2.5 Summary

5 Discussion
5.1 Discussion
5.2 Complications

6 Conclusions & Future Work
6.1 Conclusion
6.2 Future Work

References

A Implementations
A.1 Base Renderer
A.2 Renderer Systems

B Supplemental Information
B.1 Multi Render-Target Tables
B.2 Multi Sub-Pass Tables

Chapter 1

Introduction

1.1 Introduction

It was not until the last decade that real-time rendering was able to catch up with the visualization performed by the movie industry. Graphically simulating reality has been the aim for many years, which is clear from the graphical evolution of computer games.

Unfortunately, graphically simulating reality is a daunting task from both a software and a hardware point of view. Over the years, solutions have been proposed and put in place to manage the complexity of simulating reality. One of these optimizations is known as Deferred Shading.

1.2 Background

Computer games have grown in complexity and graphical fidelity since they were first conceived. Modern game systems aim to simulate reality within computer graphics, but also require complex game logic to support a certain level of entertainment.

This goal is becoming more and more achievable through further iterations of graphics processing hardware. Unfortunately, the concept is difficult to market, because the majority of users do not purchase state-of-the-art graphics processing units (GPUs).

This is clear from a survey conducted on the platform Steam, in which the most commonly used graphics card is the NVIDIA GeForce GTX 1060 [17]. This graphics card was released on the 19th of July 2016 and is no longer considered state of the art.

Through the efficient use of software solutions and techniques, the graphical aims may be achieved.


A popular shading technique within 3D rendering is named deferred shading.

Deferred shading is a technique that separates light calculations from geometry rendering[15]. The technique defers the light processing to a later stage. The two stages are known as the geometry pass and the lighting pass.

Figure 1.1: Visualisation of the data contained within the G-Buffers. Positions, normals and even specular values are contained as colors.[9]

Figure 1.1 visualises the data that is stored within the G-buffers. All the data within the buffers is stored as colors and is later sampled in the lighting pass.

Separating light and geometry processing has the advantage that the lighting is decoupled from the geometry. Furthermore, lights are only processed for those pixels that are affected by the light source. This implies that a larger number of light sources can be used without severely impacting performance.
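The two stages can be sketched on the CPU as follows. This is a minimal Python illustration of the idea and not the Vulkan implementation used in this thesis: the G-buffer is modelled as a list of per-pixel records, albedo is simplified to a single number, and the light model (a diffuse term with a hard cut-off radius) as well as all function and field names are assumptions made for the example.

```python
import math


def geometry_pass(surfaces):
    """Store per-pixel surface data (the G-buffer) without any lighting."""
    return [
        {"position": s["position"], "normal": s["normal"], "albedo": s["albedo"]}
        for s in surfaces
    ]


def lighting_pass(gbuffer, lights):
    """Shade each pixel from the G-buffer only; geometry is not touched again."""
    shaded = []
    for pixel in gbuffer:
        intensity = 0.0
        for light in lights:
            to_light = [l - p for l, p in zip(light["position"], pixel["position"])]
            distance = math.sqrt(sum(c * c for c in to_light))
            if distance == 0.0 or distance > light["radius"]:
                continue  # this pixel is not affected by the light
            direction = [c / distance for c in to_light]
            n_dot_l = sum(n * d for n, d in zip(pixel["normal"], direction))
            intensity += max(n_dot_l, 0.0)
        shaded.append(pixel["albedo"] * intensity)
    return shaded


# Two visible surface points, both facing straight up.
surfaces = [
    {"position": (0.0, 0.0, 0.0), "normal": (0.0, 1.0, 0.0), "albedo": 0.5},
    {"position": (10.0, 0.0, 0.0), "normal": (0.0, 1.0, 0.0), "albedo": 0.5},
]
# One light directly above the first point; the second is outside its radius.
lights = [{"position": (0.0, 2.0, 0.0), "radius": 5.0}]

shaded = lighting_pass(geometry_pass(surfaces), lights)
```

Only the first pixel lies within the radius of the light, so only that pixel pays for the lighting calculation; the second pixel is skipped and stays unlit.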