Bachelor of Science in Digital Game Development June 2020
Evaluation of Performance on Variable Rate Shading
Jonathan Carrera Iseland Leonard Grolleman
Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden
This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Bachelor of Science in Digital Game Development.
The thesis is equivalent to 10 weeks of full-time studies.
The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.
Contact Information:
Authors:
Jonathan Carrera Iseland
E-mail: joia15@student.bth.se
Leonard Grolleman
E-mail: legr15@student.bth.se
University advisor:
Stefan Petersson
Department of Computer Science
Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57
Abstract
Background. Modern games are becoming more demanding on hardware, and new techniques are being developed to ease these demands. One such optimization technique is Variable Rate Shading (VRS), included in the DirectX 12 API. It allows developers to vary the shading quality of different parts of the frame to improve performance. How efficient VRS is seems to vary, as different benchmark tests report different results. This is most likely because of the different scene environments used in the tests.
Objectives. To further expand the environments used in VRS benchmark tests, this study will focus on measuring and evaluating the performance of VRS in a lightweight environment that differs from the others.
Methods. The method consists of developing a lightweight Direct3D 12 application, implementing the VRS technique, and measuring performance. For a clear evaluation, several tests are conducted, measuring frame time, frame rate, and draw call speed at the different VRS settings and at various resolutions over 1000 iterations.
Results. By measuring the frame time, frame rate, and draw call speed with VRS, it was possible to collect the performance data showcased in this study. The study presents the average performance using 1x1, 2x2, and 4x4 shading rates at 480p, 1080p, and 2160p resolution. The averages were compared between shading rates and resolutions to examine the correlation and deviation. As anticipated, the results generally showed performance improvements when using VRS. However, some settings showed inconsistent deviations between shading rates, and others showed impaired performance.
Conclusions. The conclusion drawn from this study is that VRS improves performance even in lightweight applications, within reasonable boundaries. However, the performance gain was smaller than in other benchmark tests. This suggests that VRS is more useful in more demanding environments.
Keywords: DirectX, Variable Rate Shading, Performance, Render, Benchmark.
Acknowledgments
Special thanks to our supervisor Stefan Petersson, for giving us the idea to work with this subject, for giving us support and advice on different approaches, and for helping us with the development of the 3D-renderer and testing. We would also like to thank Diego Navarro for the continuous academic feedback on our work, as well as Yan Hu for her guidance on academic writing and for giving us direction on where to begin our study. Lastly, we thank family and friends for their feedback and support.
Contents
Abstract
Acknowledgments
1 Introduction
1.1 Aim and Objectives
1.1.1 Aim
1.1.2 Objectives
1.2 Research Questions
2 Background
2.1 Variable Rate Shading
2.2 Benchmarks
2.3 Compute Shader
3 Methodology
3.1 Implementation
3.1.1 D3D12 Application
3.1.2 Variable Rate Shading
3.1.3 Timer class
3.2 Evaluation
3.3 Limitations
3.4 Hardware
4 Results
5 Discussion and Analysis
6 Conclusions and Future Work
6.1 Conclusions
6.2 Future Work
List of Figures
3.1 Stages of the rendering pipeline [35]
3.2 A class diagram of the application
4.1 Bar graph showing the average frame time differences for no SRI initiation, 1x1, 2x2, and 4x4 shading rate at 480p, 1080p, and 2160p resolutions.
4.2 Bar graph showing the average frame rate differences for no SRI initiation, 1x1, 2x2, and 4x4 shading rate at 480p, 1080p, and 2160p resolutions.
4.3 Graph showing how shading rates 1x1, 2x2, and 4x4 at 480p, 1080p and 2160p resolutions deviates in frame time percentage from no SRI.
4.4 Multiple graphs showing frame rate consistency over 1000 iterations for no SRI initiation, 1x1, 2x2, 4x4 at 480p, 1080p, 2160p resolution
4.5 Bar graph showing the average draw call time differences for no SRI initiation, 1x1, 2x2, and 4x4 shading rate at 480p, 1080p, and 2160p resolutions.
4.6 Graph showing how shading rates 1x1, 2x2, and 4x4 at 480p, 1080p, and 2160p resolutions deviates in draw call time percentage from no SRI.
Chapter 1
Introduction
Optimal performance is an essential part of video games [5]. Game developers are continuously pushing for new heights in computer graphics [15]. They are striving to improve the performance and visual quality of their content [32]. As the modern GPU sees rapid growth in computational capacity, maintaining optimal frame rate seems less strenuous.
However, the growing desire for improved quality in real-time rendering increases the computational and shading cost of modern video game content. Today, the majority of users have 1080p as their standard monitor resolution [10]. With the significant growth estimated for the global market of 2160p displays in the coming years [17], the pixel count will increase further. The per-pixel shading cost also becomes more demanding as more video game studios push for more realistic and detailed rendering. Additionally, the increasing per-pixel shading cost in mobile games affects the power consumption of mobile devices, resulting in shorter battery life. Because of the performance constraints imposed by pixel count and per-pixel shading cost, a renderer cannot always afford to deliver the same quality level to every part of its output image. This is especially true for virtual reality devices, where scenes are rendered twice, once for each eye [24].
To counter these problems, GPUs today support various mechanisms to lower the shading costs. Some examples of this that game studios utilize today include multisample anti-aliasing, mixed resolution shading, dynamic resolution rendering, checkerboard rendering, and coarse pixel shading.
Variable rate shading (VRS), or coarse pixel shading, is a mechanism that has recently been getting more attention as a promising optimization [9]. It enables developers to allocate shading capacity per screen region, called a "tile" (for example 16x16 pixels), at rates that vary across the rendered image. This makes it possible to perform shading at a coarser frequency than once per pixel, coloring a group of pixels from a single sample. Developers determine the shading rate within each tile with a shading rate image (SRI), at rates of 1 pixel (1x1), 2 pixels (1x2, 2x1), 4 pixels (2x2), 8 pixels (2x4, 4x2), and 16 pixels (4x4) per sample. The preferred use case of VRS is to assign coarser shading rates to selected parts of the image where doing so barely affects the visual quality. Used in this way, VRS can be regarded as a performance gain with little visible drawback, even though the affected regions are shaded with less detail. Chapter 2 describes this further.
How much more efficient an application becomes through the use of VRS varies. Different benchmark tests have been conducted to evaluate the feature's performance capacity. However, the environments used in these benchmark tests differ in polygon count, light computation, and overall complexity, resulting in varying performance results for the feature. One benchmark test by UL Benchmarks tested the VRS feature in a simple scene and reported approximately 50 % improved performance [1], whereas another test made by a developer from Microsoft measured 14 % to 20 % improved performance in the complex game Civilization 6 [26]. Chapter 2 describes these tests in more detail.
The focus of this study is to further explore the performance efficiency of the VRS feature in a less complex scene, widening the variety of testing environments for VRS. Widening the variety of environments may contribute to other studies of the performance benefits of VRS. Testing VRS in a simpler environment should also show the raw optimization capacity of VRS, as little to no other computation affects the performance of the application. For a sufficient estimation of the performance, several of the feature's settings should be tested.
1.1 Aim and Objectives
1.1.1 Aim
This study aims to analyze the performance of the D3D12 feature Variable Rate Shading when used in a simple render application. The reason is to test VRS in a simpler environment than those used in the benchmark tests by UL and the DirectX developers. This widens the variety of environments used for benchmark tests and isolates the impact of VRS. No extended use of VRS, such as content-aware algorithms, was included. The study only covers the performance of the native VRS support provided by DirectX 12.
The application kept shader and system computations to a minimum and focused on being optimized through multithreading. Performance was measured in frame rate, frame time, and draw call speed for the evaluation of the VRS feature.
To acquire sufficient performance data, several tests were conducted, measuring the three most substantial shading rates of 1x1, 2x2, and 4x4, as well as measuring the application without an SRI for comparison. Each shading rate was set uniformly across the whole SRI, covering every screen pixel, for a series of resolutions. The resolutions 640x480, 1920x1080, and 3840x2160 were used, as they are to date the lowest possible, the most common, and a significantly growing resolution, respectively.
Results should show a clear correlation between shading rate and pixel-count. It will give an overall performance estimation using the set environment.
1.1.2 Objectives
The objectives of the study are the following:
• O1: Develop a D3D12 render application and render a geometric plane.
• O2: Implement VRS feature Tier 2 with the inclusion of an SRI.
• O3: Conduct the tests in the minimal environment.
• O4: Measure the frame time, frame rate, and draw call time for each shading rate at each resolution.
• O5: Evaluate the average performance.
• O6: Compare the results.
1.2 Research Questions
The research questions for this study are as follows:
• RQ1: What is the average performance for drawing a basic geometric plane mesh to a render target using Variable Rate Shading from a DirectX 12 platform?
• RQ2: What is the performance ratio when increasing the shading rates using an SRI?
• RQ3: What is the performance cost for using an SRI in the pipeline?
Chapter 2
Background
2.1 Variable Rate Shading
Variable Rate Shading [9] is a new and promising rendering technique exposed through DirectX 12. It is supported by NVIDIA's Turing architecture, used in the latest RTX 20-series and GTX 16-series graphics cards [3], by Intel's Gen11 graphics, and by AMD's upcoming RDNA 2 architecture [7]. As part of the DirectX 12 API, the flexible technique allows developers to optimize performance as well as quality by dynamically varying the shading rate for different regions across the frame.
The only means of controlling the shading rate before VRS was multisample anti-aliasing (MSAA) [23, 9, 8] combined with super-sampling. MSAA, as the name suggests, takes multiple samples per pixel to remove aliasing along the edges of polygons, whereas VRS reduces the number of samples in various locations. MSAA, together with Multi-Resolution Shading and Lens-Matched Shading [9], is where the idea of VRS derived from. Display resolution affected the performance of virtual reality devices because of the high bandwidth required when rendering to two high-resolution screens simultaneously [22]. This also indicated that display resolution grows faster than pixel throughput [16], which is problematic since performance is a crucial part of a good game experience. The high latency and high rendering requirements of 3D rendering are driven by the game and movie industries. As these industries push for higher and finer graphical quality, performance optimization techniques and hardware follow in development [32, 13, 14, 12, 6].
The principle of VRS is to reduce the number of samples taken across a frame by grouping pixels into tiles during the pixel-shading stage. The final shading rate is resolved in two passes, each controlled by a combiner.
In the first pass, the user chooses how to combine two shading rate values: one passed from the pipeline and one passed from the vertex or geometry shader. Flags determine whether the combiner takes the value from only one source, the higher or lower of the two values, or their sum. The second pass works in the same way, but here the shading rate from the first pass is combined with the shading rate from the SRI. These two passes produce the final shading rate used when drawing to the screen. This allows the user to apply a coarse shading rate where details are hard to perceive, which typically occurs in shadows or in the distance, where details are very dense [2]. Games or applications that implement VRS may therefore gain overall performance with minor visual impact [23, 19, 29].
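The two combiner passes described above can be modelled in a few lines. The following is a minimal sketch and not the D3D12 API itself: the names `ShadingRate`, `Combiner`, `combine`, and `finalRate` are hypothetical, and it assumes that each axis of a shading rate is stored as a log2 value (0 for 1 pixel, 1 for 2 pixels, 2 for 4 pixels) and that the sum is clamped to the coarsest allowed rate.

```cpp
#include <algorithm>
#include <cstdint>

// Coarse pixel size per axis stored as log2: 0 -> 1 pixel, 1 -> 2 pixels, 2 -> 4 pixels.
// (Assumed encoding for this sketch.)
struct ShadingRate {
    std::uint8_t log2X;
    std::uint8_t log2Y;
};

// Hypothetical names for the combiner behaviours described in the text.
enum class Combiner { Passthrough, Override, Min, Max, Sum };

// Combines the rate from the previous step (a) with the new source (b).
// maxLog2 is the coarsest rate allowed per axis (2 means 4 pixels).
inline ShadingRate combine(Combiner op, ShadingRate a, ShadingRate b,
                           std::uint8_t maxLog2 = 2) {
    switch (op) {
    case Combiner::Passthrough:
        return a;  // keep the earlier value
    case Combiner::Override:
        return b;  // take the new value
    case Combiner::Min:
        return {std::min(a.log2X, b.log2X), std::min(a.log2Y, b.log2Y)};
    case Combiner::Max:
        return {std::max(a.log2X, b.log2X), std::max(a.log2Y, b.log2Y)};
    case Combiner::Sum:
        return {static_cast<std::uint8_t>(std::min<int>(maxLog2, a.log2X + b.log2X)),
                static_cast<std::uint8_t>(std::min<int>(maxLog2, a.log2Y + b.log2Y))};
    }
    return a;
}

// Pass one combines the pipeline rate with the shader rate;
// pass two combines that result with the rate read from the SRI.
inline ShadingRate finalRate(ShadingRate pipelineRate, ShadingRate shaderRate,
                             ShadingRate sriRate, Combiner first, Combiner second) {
    return combine(second, combine(first, pipelineRate, shaderRate), sriRate);
}
```

For example, `finalRate({0, 0}, {0, 0}, {2, 2}, Combiner::Passthrough, Combiner::Override)` yields a 4x4 rate, because the second pass replaces the earlier result with the SRI value.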
However, reducing the pixel shading count is not a new method of optimizing performance. Coarse pixel shading is a technique that has been developed since 2014 for DirectX 11 [31], along with the idea of merging pixels [27]. The idea is to take only one sample across a set of multiple pixels and use this sample to color the entire group. The expected saving can be expressed as Screen Resolution / Shading Rate = Samples Per Frame, so that a resolution of 3840x2160 with a shading rate of 2x2 results in the same number of samples as a shading rate of 1x1 at a resolution of 1920x1080. Because of this, the hypothesis was that a higher (coarser) shading rate results in faster computation.
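The relation between resolution, shading rate, and sample count can be checked with a small sketch. The function name `shadedSamples` is hypothetical, and the count is idealized: it assumes a single full-screen primitive and rounds up at the right and bottom edges.

```cpp
#include <cstdint>

// Idealized number of pixel-shader samples when one sample colors a
// rateX x rateY block of pixels. Partially covered blocks at the right
// and bottom edges are rounded up.
constexpr std::uint64_t shadedSamples(std::uint32_t width, std::uint32_t height,
                                      std::uint32_t rateX, std::uint32_t rateY) {
    const std::uint64_t blocksX = (width + rateX - 1) / rateX;
    const std::uint64_t blocksY = (height + rateY - 1) / rateY;
    return blocksX * blocksY;
}
```

With these assumptions, `shadedSamples(3840, 2160, 2, 2)` and `shadedSamples(1920, 1080, 1, 1)` both give 2,073,600 samples, which matches the example in the text.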
VRS support was released in two Tiers. Tier 1, for older versions, only supports a static shading rate on a per-draw basis. Tier 2 adds support for further features, such as defining different shading rates across the image or render target and reusing shading rate sets across several viewports. In addition, it adds SV_ShadingRate as an input to the pixel shader, opening up possibilities for many other implementations, such as making the shading rates content-aware.
2.2 Benchmarks
3DMark is a leading benchmark tool developed by UL for computers and mobile devices. The tool determines the hardware's 3D graphical rendering performance and CPU workload capacity through a series of intensive benchmark tests. It has benchmark tests tailored to specific hardware capabilities, ranging from high-end hardware systems to low-performance systems. The benchmark tests focus on rendering and updating complex game environments in real time. Each benchmark test gives a score based on different performance parameters, which users can compare with the scores of similar systems. To date, 3DMark is the world's most popular and widely used benchmark tool, used by millions of users, hundreds of hardware reviewers, and many of the world's leading manufacturers.
In August 2019, UL Benchmarks added a VRS feature test to their performance testing application, 3DMark 11 [2]. Similar to this study, 3DMark measures the performance of the VRS technique and presents the estimated FPS. The test allows users to try VRS at different settings in a 3D environment and see the performance differences. According to UL Benchmarks, in the 3DMark VRS feature test, which renders a 3D scene with a moving camera, performance improved by approximately 50 % with VRS enabled, with minimal loss in visible quality [1].
DirectX developer Jacques reported measuring a 14 % to 20 % increase in performance while utilizing VRS in an experiment conducted in partnership with Firaxis Games [26]. In the experiment, Firaxis initially tested Tier 1 support with a dynamic shading rate, shading terrain and water at a 2x2 shading rate and smaller assets at a 1x1 shading rate. They measured approximately 20 % improved performance with only a small loss in visual quality. In their other experiment, they used Tier 2 with edge detection to preserve detail. They measured an approximately 14 % performance gain, with a loss in visual quality that was nearly impossible to see.
2.3 Compute Shader
While the graphics processing unit (GPU) primarily handles graphics, general-purpose computation traditionally handled by the central processing unit (CPU) can also be performed on the GPU using a compute shader. Compute shaders are a programmable shader stage that allows the large number of parallel processors on the GPU to perform general computations [20]. This can potentially speed up an application immensely, as far more threads take part in the computation than on the CPU alone.
Chapter 3
Methodology
This thesis is an implementation built upon the analysis of a DirectX 12 feature for 3D rendering. The method involves developing a D3D12 application with VRS and timers incorporated, running timer tests, and evaluating the resulting data.
Developing the application from the ground up allowed full control of the computations on the CPU and GPU, which is desirable when testing VRS. Other computations, such as lighting and geometry processing, which are common in games and would otherwise interfere with the performance, can be excluded in this case. The timers are therefore more accurate and allow isolation of smaller sections of the pipeline, such as the draw call for geometry.
The application kept shader and system computations to a minimum and focused on being optimized through multithreading. Performance was measured in frame rate, frame time, and draw call speed for the evaluation of the VRS feature.
The application also called SetStablePowerState, which keeps the GPU at a stable clock frequency so that measurements are not distorted by power management such as clock boosting or thermal throttling. This enables profiling of GPU usage with consistent results. The data gathering involved measuring the application's frame time, FPS, and draw call speed using the timer and a timestamp query heap. To validate the data, Microsoft recommends their tool PIX for accurate capturing and debugging [11]. This software makes it possible to inspect the GPU and examine whether the resources hold the correct values.
3.1 Implementation
3.1.1 D3D12 Application
The D3D12 application was developed in the Microsoft Visual Studio 2019 (VS) integrated development environment, using the C++ programming language. The application followed a standard graphics pipeline for 3D engines in DirectX 11 and 12 [35].
Figure 3.1: Stages of the rendering pipeline [35]
The application's fundamental structure utilizes the native Windows API. This was done mostly for convenience when creating a window handle that can later be used by the DirectX 12 API when setting up viewports. It also provided a clean message handling system and a simple render loop. The Windows API layer was therefore also responsible for creating and pre-initializing all the DirectX core interfaces and resources. The window and viewports were initialized with the resolutions 640x480, 1920x1080, and 3840x2160. Including more resolutions was not considered necessary, because the performance of the different shading rates was expected to scale in the same way between resolutions, as described in Chapter 2.
For the architecture to give a fair representation of a realistic minimal game environment and of the raw performance of VRS, everything was divided into separate systems:
• The core, which would take care of the engine’s critical DirectX resources.
• The render engine, which held the responsibility for rendering the scene.
• The timer engine, which was kept separate so that it would not be affected by other processes and could provide accurate results.
Frames were prepared in separate threads, driven by a ring buffer that stored them in a queue before they were presented to the frame buffers (see figure 3.2). Note that the driver can only queue up to 3 frames by default, which affects the latency between frames. The Windows Display Driver Model prevents the operating system from queuing more unless the limit is changed manually with IDXGIDevice1::SetMaximumFrameLatency(), up to a maximum of 16 frames [21]. This feature is supported in DXGI 1.1 or higher. For this study, the number of queued frames remained at its default value.
Figure 3.2: A class diagram of the application
For the core, a device was created as an ID3D12Device6 using feature level 12.1 to ensure support for the latest version of VRS, Tier 2. With this device, a feature support check was performed, extracting the options data from the device using OPTIONS6. The data made it possible to fetch the supported Tier, ensuring that Tier 2 was available. The swap chain was created as an IDXGISwapChain4 with two frame buffers to match the traditional pipeline. These buffers were created with the flip discard setting to achieve the best performance, as mentioned in Microsoft's documentation [28].
The render targets were initialized without blending, so that neither the clear color nor the contents of previous frames affected the color output of the pixel shader. They were set up to write all color and alpha channels without performing any logic operations or blending, so that the Red, Green, Blue, and Alpha (RGBA) output of the pixel shader was written to the render target as is. The core was also responsible for the ID3D12CommandQueue, ensuring that other systems synchronized and that all GPU commands were performed in the right order.
The render engine was responsible for all the frames, textures, and other resources needed for rendering. Each frame contained its own ID3D12CommandAllocator and ID3D12GraphicsCommandList5, created with the direct type for immediate execution. This stage could be optimized through the use of bundles if the instructions were the same for every draw call, as the instructions would then be preprocessed by the driver. However, in this study, it was deliberately left as a direct list so that it could be reused for the setup of the system and VRS. All the draw instructions were executed on their own thread to allow more frames to be prepared in parallel.
The ring buffer included in the interface was created using a deque whose entries contain the fence value of a frame and its offset in the queue. A deque was a good choice in this case, since it allows faster insertion at the front as well as the back of the queue compared to a vector, making it suitable as a circular buffer.
The buffer's only purpose was thus to keep track of the frames committed to the command queue, represented by the head, and the finished frames, represented by the tail. The head, the tail, the current number of frames committed, and the maximum number of frames allowed in the buffer are all represented as unsigned short integers. This kept the memory footprint small and the calculations fast.
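A ring buffer of this kind could be sketched as follows. This is an illustrative sketch under stated assumptions and not the application's actual class: the names `FrameEntry`, `FrameRing`, `tryAllocate`, and `releaseCompleted` are hypothetical, the capacity is passed in by the caller, and the completed fence value is assumed to be read from the GPU fence elsewhere.

```cpp
#include <cstdint>
#include <deque>

// One frame in flight: the fence value that signals its completion
// and the slot (offset) it occupies in the ring.
struct FrameEntry {
    std::uint64_t fenceValue;
    unsigned short slot;
};

class FrameRing {
public:
    explicit FrameRing(unsigned short capacity) : capacity_(capacity) {}

    // Reserves the next slot for a frame. Returns false when the ring is
    // full, which tells the caller to release completed frames first.
    bool tryAllocate(std::uint64_t fenceValue, unsigned short& slotOut) {
        if (inFlight_.size() >= capacity_) {
            return false;
        }
        slotOut = head_;
        inFlight_.push_back({fenceValue, head_});
        head_ = static_cast<unsigned short>((head_ + 1) % capacity_);
        return true;
    }

    // Removes every frame at the tail whose fence value has been reached.
    // Returns the number of frames released.
    unsigned short releaseCompleted(std::uint64_t completedFenceValue) {
        unsigned short released = 0;
        while (!inFlight_.empty() &&
               inFlight_.front().fenceValue <= completedFenceValue) {
            inFlight_.pop_front();
            ++released;
        }
        return released;
    }

    unsigned short inFlight() const {
        return static_cast<unsigned short>(inFlight_.size());
    }

private:
    std::deque<FrameEntry> inFlight_;  // tail at the front, head at the back
    unsigned short head_ = 0;          // next slot to hand out
    unsigned short capacity_;          // maximum number of frames in flight
};
```

In a D3D12 application, the value passed to `releaseCompleted` would typically come from the fence that the command queue signals after each frame.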
The maximum number of frames allowed in the queue was equal to the number of available threads in the system, obtained with the std::thread::hardware_concurrency() function. When the buffer was full, the allocation returned an error, which indicated that all completed frames had to be released before the next one could be drawn.
As the study aims to utilize the feature for the whole screen, shading each tile was necessary. Therefore, the vertex shader input signature took a single vertex position of three float values, to which an alpha value was added so that it could be passed along with the shading rate for the given position. The vertex shader's sole task was to pass the data through so that the rasterizer could register whether any geometry was present. To ensure that the GPU shades each tile, the entire frustum needs to be rasterized. A fully rasterized frustum generates fragments over the render target for each sample location, containing the data for the shading process. To achieve full rasterization, a simple geometric plane covering the frustum is enough.
3.1.2 Variable Rate Shading
The question is which Tier of VRS to use for answering the research questions, as both Tiers are suitable for drawing a scene. However, Tier 1 only allows a single shading rate to be applied uniformly across a render target, which could cause the scene to become blurry or show rendering artifacts [29]. Since Tier 2 allows a more dynamic use, SRIs are of higher interest, as they make it possible to position coarser shading more precisely where details are less necessary. Tier 2 is therefore more likely to be used in a real engine, even though the SRI in this study only contained a uniform shading rate.
The VRS resource was prepared before the render loop, avoiding calculations for each draw call that would affect performance. Many features and usage areas of VRS, such as content awareness, are based on the ability to update the SRI values during runtime.
Many studies focus on the performance and visual quality of these techniques [29, 19, 34, 33], but as the focus here lies on the performance of drawing with VRS, it was only of interest to read from the SRI. When assembling the VRS resources, it was possible to fetch the image tile size supported by the graphics adapter in order to set up the palette. This was done through OPTIONS6, fetched from the device as previously mentioned. To obtain the size of the palette, the application window dimensions were divided by the retrieved tile size. It was important that this step was carried out correctly, since the tile size can only take one of three values: 8, 16, or 32. With this, the SRI was created as a committed ID3D12Resource1 with unordered access to allow it to be written directly from the GPU, using an unordered access view (UAV).
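The division of the window size by the tile size can be sketched as follows. The names `SriExtent` and `sriExtent` are hypothetical, and rounding up for resolutions that are not a multiple of the tile size is an assumption of this sketch.

```cpp
#include <cstdint>

// Dimensions of the shading rate image in texels (one texel per tile).
struct SriExtent {
    std::uint32_t width;
    std::uint32_t height;
};

// Divides the render target size by the tile size reported by the adapter
// (8, 16, or 32), rounding up so that partially covered tiles at the right
// and bottom edges still get a texel.
constexpr SriExtent sriExtent(std::uint32_t targetWidth, std::uint32_t targetHeight,
                              std::uint32_t tileSize) {
    return {(targetWidth + tileSize - 1) / tileSize,
            (targetHeight + tileSize - 1) / tileSize};
}
```

With a tile size of 16, a 1920x1080 target gives a 120x68 image under these assumptions, since 1080 is not a multiple of 16 and the last row of tiles is only partially covered.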
Although the vertex and pixel shaders are the only two stages necessary to draw geometry, the application also implemented a compute shader to shift the workload of populating the SRI over to the GPU.
The compute shader's signature contained a single table consisting of a constant buffer view (CBV) and an unordered access view bound to the SRI, using the same dimensions. These views shared the same heap, where the CBV was placed first, as it was accessed more frequently when carrying out the population algorithm.
Before each draw call, the application instructed the combiners to override the uniform shading rate value, both to allow editing in the vertex shader stage and to read the SRI values.
3.1.3 Timer class
The timer class in the application was kept as a separate system to avoid interference with scene rendering. The class divides the timers into two areas of responsibility: one measures frame time and FPS, and the other measures the draw call. Each of the timers ran on its own dedicated thread to ensure that the results were accurate and not affected by the processing of render instructions.
The first timer measured frame time and FPS using the Chrono library. It placed a timestamp at the beginning of a render call, from which the timestamp taken in the previous frame was subtracted, giving the frame time, which was then stored in a float vector. The current timestamp was then saved as the previous timestamp for the next frame. With this approach, the first frame is always invalid, since the previous timestamp does not yet contain a value, and therefore the first value was not included. The values were later saved to a text file for analysis.
Algorithm 1: Simple FPS algorithm for finding the time taken between the previous frame and the current frame.
function fps_count()
    δ = currentTimestamp − previousTimestamp;
    previousTimestamp = currentTimestamp;
    FPS_Vector.push_back(δ);
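Algorithm 1 can be written with the standard Chrono library roughly as follows. This is a minimal sketch, not the application's timer class: the name `FrameTimer` and its members are hypothetical, and the optional `now` parameter exists only so the timer can be driven with known timestamps.

```cpp
#include <chrono>
#include <vector>

class FrameTimer {
public:
    using Clock = std::chrono::steady_clock;

    // Call once at the beginning of every render call. The first call only
    // records a timestamp, because there is no previous frame to compare with.
    void tick(Clock::time_point now = Clock::now()) {
        if (hasPrevious_) {
            const std::chrono::duration<float, std::milli> delta = now - previous_;
            frameTimesMs_.push_back(delta.count());
        }
        previous_ = now;
        hasPrevious_ = true;
    }

    // Frame times in milliseconds, one entry per measured frame.
    const std::vector<float>& frameTimesMs() const { return frameTimesMs_; }

    // Converts a frame time in milliseconds to frames per second.
    static float toFps(float frameTimeMs) { return 1000.0f / frameTimeMs; }

private:
    Clock::time_point previous_{};
    bool hasPrevious_ = false;
    std::vector<float> frameTimesMs_;
};
```

A monotonic clock is used here so that the measured intervals are not affected by adjustments of the system time.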
The second timer, which measured draw call speed, was a GPU timestamp system built upon a class concept from the work of Mikael Olofsson [25]. Here a timestamp query heap was created along with a default committed resource that functions as storage for the timestamps. This system queries timestamps from the GPU queue and calculates the elapsed time. The difference is that this system does not depend on the CPU waiting for the GPU to finish a frame. Instead, the frames keep their own separate resources for parallel querying.
Algorithm 2: Timestamp query
Create_Timer()
    Create Query Heap;
    Create Committed Resource.
Update()
    Set a timestamp into the current position of the command list and bind the position to the heap.
    Add draw command into the current position in the command list.
    Set a second timestamp into the current position of the command list and bind the position to the heap.
    Fetch the timestamps included in the heap and bind them to the committed resource.
    Calculate the time by reading two timestamp positions from the committed resource.
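The last step of Algorithm 2, turning two timestamps into an elapsed time, can be sketched as below. The function name `elapsedMilliseconds` is hypothetical. GPU timestamps are tick counts, so the sketch assumes the tick frequency is supplied by the caller; in D3D12 it would typically be obtained from ID3D12CommandQueue::GetTimestampFrequency.

```cpp
#include <cstdint>

// Converts a pair of GPU timestamps (in ticks) into elapsed milliseconds.
// ticksPerSecond is the timestamp frequency of the queue that recorded them.
// Returns 0 for an invalid frequency or for timestamps in the wrong order.
inline double elapsedMilliseconds(std::uint64_t beginTicks, std::uint64_t endTicks,
                                  std::uint64_t ticksPerSecond) {
    if (ticksPerSecond == 0 || endTicks < beginTicks) {
        return 0.0;
    }
    return static_cast<double>(endTicks - beginTicks) * 1000.0 /
           static_cast<double>(ticksPerSecond);
}
```

Subtracting the raw tick counts before converting to floating point keeps the precision of the difference even when the absolute tick values are large.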
3.2 Evaluation
At this stage, the application could run the tests and output the necessary timing data. Each test initially measured the application's frame time, FPS, and draw call time at the resolutions 480p, 1080p, and 2160p without utilizing an SRI. The same set of tests was then performed with an SRI for each of the shading rates 1x1, 2x2, and 4x4, resulting in 12 different tests. Each test was performed over 1000 consecutive iterations.
The measured data was saved into a spreadsheet for further calculations. The spreadsheet calculated the average frame time, FPS, and draw call speed, as well as how consistent they were. These average values were then used to find how they deviate from one another, followed by a comparison between shading rates and resolutions. With these values, an overall estimation of the VRS performance was presented and discussed.
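The spreadsheet calculations described above amount to an average per test and a relative deviation from a baseline. A minimal sketch, with the hypothetical helper names `mean` and `percentDeviation`, and assuming the no-SRI run is used as the baseline:

```cpp
#include <numeric>
#include <vector>

// Arithmetic mean of the collected samples (for example 1000 frame times).
// Returns 0 for an empty input.
inline double mean(const std::vector<double>& samples) {
    if (samples.empty()) {
        return 0.0;
    }
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Relative change of a measurement against the baseline, in percent.
// A negative result means the measured time is shorter than the baseline.
inline double percentDeviation(double baseline, double value) {
    return (value - baseline) / baseline * 100.0;
}
```

For timing values, a negative deviation therefore corresponds to a performance improvement over the baseline.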
3.3 Limitations
For this experiment, 1000 iterations were considered an acceptable amount, as further iterations would make minimal difference to the results. The samples taken were therefore limited to 1000 iterations. This amount also seems to be common among other benchmark works [25, 19].
The difficulty with performance tests of features is that the architecture behind every application differs [4], resulting in different outcomes for different game engines, benchmark environments, or even hardware [1, 26, 19, 29]. As there are multiple approaches to developing and optimizing an application, there is most likely room for improvement in the pipeline and computation of the application in this study. The application could also be affected by internal background processes of the operating system, as they might block access to threads. This would cause the application to stall and wait for access when constructing new threads [30, 19].
3.4 Hardware
The experiment was run as a hardware-driven application on a system with a LENOVO
LNVNB161216 motherboard running Windows 10 Home x64. The system had an Intel HM370
chipset, an Intel Core i7-9750H processor at 2.6 GHz, an NVIDIA RTX 2080 Max-Q with
8 GB of memory, and 32 GB of SO-DIMM DDR4 RAM at 2666 MHz. These parts were chosen
to represent the current generation of hardware for the RTX 20-Series of graphics
cards, on which VRS Tier 2 was released. A Philips The One 65" 4K UHD LED Smart TV
(65PUS7354/12) was used as an external display to attain support for 4K resolution.
Chapter 4
Results
This chapter covers the results of the conducted experiments. The data show the performance of VRS in the set environment under the given circumstances. The tests were divided by shading rate and screen resolution. They measured frame time, frame rate, and draw call time without an SRI and with an SRI at the shading rates 1x1, 2x2, and 4x4 for the resolutions 480p, 1080p, and 2160p.
The data was processed to calculate the deviation between the results and their consistency.
Figure 4.1 presents the average frame time measured for each shading rate at the different resolutions. The data show clear differences in performance between shading rates as well as between resolutions.
Resolution   No SRI    VRS 1X1   VRS 2X2   VRS 4X4
640x480      3.52 ms   3.42 ms   3.29 ms   3.16 ms
1920x1080    3.48 ms   3.13 ms   3.35 ms   3.23 ms
3840x2160    3.30 ms   3.37 ms   3.39 ms   3.53 ms
Average frame time
Figure 4.1: Bar graph showing the average frame time differences for no SRI initiation, 1x1, 2x2, and 4x4 shading rate at 480p, 1080p, and 2160p resolutions.
At 480p, the frame time decreases with each consecutive shading rate. The data for 1080p, however, show an irregular pattern across shading rates. At shading rate 1x1 and 1080p, the frame time drops considerably compared to when no SRI was used. The deviation between no SRI and the 1x1 shading rate is 11 %. See figure 4.3. As for shading rate 2x2, the data show a ∼7 % higher frame time compared to 1x1. The 4x4 shading rate lowers the frame time by ∼4 % from 2x2. Looking at the pattern, each setting shows regularity except for 1x1, which shows an unexpectedly large drop in frame time compared to the others.
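The percentages discussed above can be approximated from the averages in figure 4.1. The sketch below is illustrative only. It assumes that the deviation is defined as the no SRI frame time divided by the measured frame time, minus one, so that a positive value means a shorter frame time than without an SRI. That definition is an assumption, not stated in the text. Because the table values are rounded to two decimals, the output matches figure 4.3 only approximately.

```python
# Average frame times in ms, copied from figure 4.1.
FRAME_TIME_MS = {
    "640x480":   {"No SRI": 3.52, "1x1": 3.42, "2x2": 3.29, "4x4": 3.16},
    "1920x1080": {"No SRI": 3.48, "1x1": 3.13, "2x2": 3.35, "4x4": 3.23},
    "3840x2160": {"No SRI": 3.30, "1x1": 3.37, "2x2": 3.39, "4x4": 3.53},
}


def deviation_percent(baseline_ms, measured_ms):
    """Assumed definition: positive when the measured time is below the baseline."""
    return (baseline_ms / measured_ms - 1.0) * 100.0


for resolution, row in FRAME_TIME_MS.items():
    deviations = {
        rate: round(deviation_percent(row["No SRI"], row[rate]), 2)
        for rate in ("1x1", "2x2", "4x4")
    }
    print(resolution, deviations)
```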
The test for 2160p shows a more regular pattern in frame time between the shading rates, although the frame time increases for each rate. Between consecutive settings, the deviation from no SRI grows by 0.58 to 3.68 percentage points (see figure 4.3). Performance thus decreases for every consecutive rate at a resolution of 8,294,400 pixels in the set environment. On the no SRI setting, the frame time decreases as the pixel count increases. However, the results for shading rates 2x2 and 4x4 show the frame time increasing with the pixel count. As for frames per second, similar results can be seen regarding patterns and overall performance. See figure 4.2.
Resolution   No SRI    VRS 1X1   VRS 2X2   VRS 4X4
640x480      284 fps   292 fps   304 fps   316 fps
1920x1080    287 fps   319 fps   299 fps   310 fps
3840x2160    303 fps   296 fps   295 fps   283 fps
Average frame rate
Figure 4.2: Bar graph showing the average frame rate differences for no SRI initiation, 1x1, 2x2, and 4x4 shading rate at 480p, 1080p, and 2160p resolutions.
Overall, the frame time improves (decreases) by ∼11 % at most, observed for 1x1 at 1080p and 4x4 at 480p. At 2160p, however, the frame time worsens (increases) by up to ∼6 %.
The preferred case is when the frame time is as consistent as possible throughout the rendering process, leading to a smooth frame rate. For this study, the captured data showed the frame rate to be around ∼300 fps. Estimating the preferred frame time from this gives 1/300 ≈ 0.0033 s, meaning that ∼3 ms (more precisely ∼3.3 ms) was the most optimal frame time in this case.
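The relation between frame rate and frame time used in this estimate is a plain reciprocal, sketched below in Python. The function names are hypothetical and chosen for illustration.

```python
def frame_time_from_fps(fps: float) -> float:
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


def fps_from_frame_time(frame_time_ms: float) -> float:
    """Frame rate for a given frame time in milliseconds."""
    return 1000.0 / frame_time_ms


print(round(frame_time_from_fps(300), 2))  # 3.33
print(round(fps_from_frame_time(3.52)))    # 284
```

The second call reproduces the no SRI entry at 640x480, where 3.52 ms in figure 4.1 corresponds to 284 fps in figure 4.2.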
Resolution   VRS 1X1   VRS 2X2   VRS 4X4
640x480      2.93%     7.25%     11.53%
1920x1080    11.11%    3.98%     7.84%
3840x2160    -2.14%    -2.72%    -6.40%
Frame time deviation in percentage
Figure 4.3: Graph showing how shading rates 1x1, 2x2, and 4x4 at 480p, 1080p, and 2160p resolutions deviate in frame time, in percent, from no SRI.
Frame times generally stayed between 3.00 and 4.00 ms, with a few high peaks as exceptions. See figure 4.4. Each test had a high peak somewhere along the iterations, with an average variance of ∼5.2 ms, where variance here means the difference between the highest and the lowest peak in frame time.
The test with the most consistent frame time was shading rate 2x2 at 480p, with a variance of ∼3.9 ms, whereas the least consistent tests were shading rates 2x2 and 4x4 at 2160p. Both showed a variance of ∼7.16 ms between the high and the low peak.
In general, the tests show that the variation between the high and low peaks in frame time increases with resolution: 480p, 1080p, and 2160p have an average variance of ∼4.4 ms, ∼5.2 ms, and ∼5.9 ms respectively.
Compared to the frame time, the draw call speed showed little to no change between the shading rates within each resolution. Between resolutions, however, the speed differed significantly.
The draw call time varies within each resolution by 0.51 ms, 0.61 ms, and 0.10 ms for 480p, 1080p, and 2160p respectively. See figure 4.5. The variation could be considered insignificant, as the change was less than 1.6 % across the settings. See figure 4.6.
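The per-resolution variation quoted above appears to be the difference between the largest and smallest average draw call time across the four settings. This reading is an assumption. A short Python sketch using the rounded averages from figure 4.5 illustrates it, with the helper name `spread_ms` being hypothetical.

```python
# Average draw call times in ms from figure 4.5 (No SRI, 1x1, 2x2, 4x4).
DRAW_CALL_MS = {
    "640x480":   [32.53, 32.07, 32.02, 32.16],
    "1920x1080": [65.62, 66.24, 66.13, 66.00],
    "3840x2160": [180.90, 181.00, 181.00, 180.91],
}


def spread_ms(values):
    """Difference between the largest and smallest value, rounded to 0.01 ms."""
    return round(max(values) - min(values), 2)


for resolution, values in DRAW_CALL_MS.items():
    print(resolution, spread_ms(values), "ms")
```

With the rounded table values, the 1080p spread comes out as 0.62 ms rather than the 0.61 ms reported in the text. The small gap is presumably due to the text being based on unrounded data.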
[Figure 4.4: twelve line graphs titled "Frame rate consistency", one each for NO SRI, 1x1, 2x2, and 4x4 at 640x480, 1920x1080, and 3840x2160; x-axis: iteration (0 to 1000), y-axis: time in ms.]
Figure 4.4: Multiple graphs showing frame rate consistency over 1000 iterations for no SRI initiation, 1x1, 2x2, and 4x4 at 480p, 1080p, and 2160p resolutions.
Resolution   No SRI      VRS 1X1     VRS 2X2     VRS 4X4
640x480      32.53 ms    32.07 ms    32.02 ms    32.16 ms
1920x1080    65.62 ms    66.24 ms    66.13 ms    66.00 ms
3840x2160    180.90 ms   181.00 ms   181.00 ms   180.91 ms
Average draw call time
Figure 4.5: Bar graph showing the average draw call time differences for no SRI initiation, 1x1, 2x2, and 4x4 shading rate at 480p, 1080p, and 2160p resolutions.
Resolution   VRS 1X1   VRS 2X2   VRS 4X4
640x480      1.45%     1.59%     1.14%
1920x1080    -0.93%    -0.78%    -0.57%
3840x2160    -0.05%    -0.05%    0.00%
Draw call time deviation in percentage