

Linköping University | Department of Electrical Engineering

Master’s thesis, 30 ECTS | Computer Science

Spring 2019 | LiTH-ISY-EX–19/5261–SE

Real-time Depth of Field with Realistic Bokeh

with a Focus on Computer Games

(Swedish title: Real-tids depth of field med realistisk Bokeh)

Anton Christoffersson

Supervisor: Wito Engelke
Examiner: Ingemar Ragnemalm



Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Depth of field is a naturally occurring effect in lenses describing the distance between the closest and furthest object that appears in focus. The effect is commonly used in film and photography to direct a viewer's focus, give a scene more complexity, or improve aesthetics. In computer graphics, the same effect is possible, but since there are no natural occurrences of lenses in the virtual world, other ways are needed to achieve it. There are many different approaches to simulating depth of field, but not all are suited for real-time use in computer games. In this thesis, multiple methods are explored and compared to achieve depth of field in real-time with a focus on computer games. The aspect of bokeh is also crucial when considering depth of field, so a method to simulate a bokeh effect similar to reality is also explored. Three different methods based on the same approach were implemented to research this subject, and their time and memory complexity were measured. A questionnaire was used to measure the quality of the different methods. The result is three similar methods, but with noticeable differences in both quality and performance. The results give the reader an overview of the different methods and directions for implementing them on their own, based on which requirements suit them.


Acknowledgments

I would like to thank my examiner Ingemar Ragnemalm for allowing me to go on with this specific subject on my own and for his feedback throughout the thesis. I would also like to thank my supervisor Wito Engelke, both for his great continuous support and feedback and for our meetings, which were both informative and enjoyable! I have learned a lot during the time of my thesis.


Contents

Abstract

Acknowledgments

Contents

List of Figures

List of Tables

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Delimitations
2 Theory
  2.1 Lens theory
  2.2 Related work
    2.2.1 Ray-tracing
    2.2.2 Accumulation buffer
    2.2.3 Multi-layer methods
    2.2.4 Single-layer methods
3 Method
  3.1 Project environment
    3.1.1 Rendering pipeline
  3.2 Prototypes and shared passes
    3.2.1 Shared passes
    3.2.2 Poisson disc blur
    3.2.3 Separable Gaussian blur
  3.3 Implementation
  3.4 Evaluation method
4 Results
  4.1 Prototypes
    4.1.1 Poisson disc blur
    4.1.2 Separable Gaussian blur
  4.2 Implementation
  4.3 Ray-tracing
  4.4 Evaluation
5 Discussion
  5.1 Results
    5.1.1 Performance
    5.1.2 Quality
  5.2 Method
    5.2.1 Prototypes and implementation
    5.2.2 Evaluation
    5.2.3 Sources
6 Conclusion
  6.1 Future work
Bibliography
A Result images
B Result measurements


List of Figures

1.1 Circle of confusion
2.1 Circle of confusion
2.2 Circle of confusion
3.1 Diagram of rendering pipeline
3.2 Poisson disc sample example
3.3 Diagram of passes during Poisson disc blur
3.4 Filter kernel for Poisson disk blur
3.5 Figure of how the blurriness factor affects CoC size
3.6 Visualization of the halo effect
3.7 Gaussian kernel
3.8 Diagram of passes for the Gaussian blur
3.9 1-component convolution kernel for complex depth of field
3.10 1-component circular kernel for complex depth of field
3.11 4-component circular kernel for complex depth of field
3.12 Diagram of how the different components are added
4.1 UI showcase
4.2 Teapots rendered with Poisson disc blur
4.3 Teapots rendered with Gaussian blur
4.4 Teapots rendered with Circular blur
4.5 Teapots rendered with Arnold in 3ds Max
4.6 Results from questionnaire
5.1 Haloing effect of the different methods
5.2 Bokeh comparison
5.3 Hard edges on blurry objects
A.1 Sponza scene rendered with Poisson disc blur
A.2 Sponza scene rendered with Gaussian blur
A.3 Sponza scene rendered with circular blur


List of Tables

3.1 Coefficients for a 1-component kernel and a 2-component kernel for the complex depth of field
B.1 Time per pass measured on a GTX 970. Circular1C represents a 1-component method. Averaged median from 10 samples per setting
B.2 Total time of each method measured on a GTX 970. Circular1C represents a 1-component method
B.3 Memory calculated from shader variables. Circular1C represents a 1-component method


1 Introduction

When you focus on an object, objects that are further away or closer to you than the object in focus will appear blurry. Depth of field (DoF) is defined as the distance between the furthest and closest objects that still appear "acceptably" sharp. In the real world, DoF is created naturally by how lenses converge light. When light passes through a lens, such as a camera lens, it converges and hits the camera's film, which perceives the light. Only light that comes from a certain distance away from the lens will converge to one point on the film and appear in perfect focus; the plane at this distance is called the plane in focus. Light that is not coming from the plane in focus will instead converge to an area on the film, which is referred to as the circle of confusion (CoC), and the shape this area takes is called bokeh. The further away from the plane in focus the source of the light hitting the lens is, the bigger the circle of confusion becomes, making the object blurrier as its circle of confusion blends together with the light of other objects. However, an object can still be close enough to the plane in focus for its circle of confusion to be small enough to fit inside the depth of field, making the object acceptably sharp, although only light from the plane in focus will appear perfectly sharp. A lens's aperture controls the size of its depth of field, as well as the shape of its bokeh: the bigger the aperture, the smaller the depth of field, as more light will enter the lens. In figure 1.1 the depth of field of the image can clearly be seen by looking at the ground.

1.1 Motivation

DoF is commonly used in movies, games, and photography to direct attention to a specific part of the screen or image, to give a scene depth, or to make a rendered image more realistic by imitating reality. In film and photography, it can also bring an artistic aspect to the image, giving it more depth. A practical example of use could be that while a player is looking through the iron sights of a weapon in a game, only the objects in the crosshair are in focus and everything else is blurry.

However, since there are no lenses present when rendering an image in computer games, a method to simulate this effect is needed. The method should not take up a lot of computational time or memory bandwidth on the GPU, as other things need to be rendered as well. So the goal is to find a way to simulate the effect of DoF that is possible at real-time speeds and at the same time adds a believable DoF that gives the final image a positive effect.


Figure 1.1: Picture showing the real world manifestation of depth of field. Looking at the ground, lines can clearly be seen where the blurry parts start and end, also defining the DoF of the image.

1.2 Aim

Computer games most commonly use a so-called pinhole camera to render their scenes; this means that the aperture is infinitely small, which in itself leads to no natural generation of DoF. However, since the DoF effect can significantly improve the aesthetics of a scene, direct the user's attention, or make the scene more realistic, it is still an effect that has many uses in games. The obstacle with implementing such an effect in a game, though, is that it needs to perform in real-time and cannot take up too much memory. So in this thesis, the aim is to find and implement a method to achieve high-quality DoF that is usable in games. Because the focus of the implementation is towards games, the method needs to be scalable between quality and speed, to make it available to as many users as possible with different resources available. The aim is also to create something believable, so if possible, the implementation should try to mimic how real-life DoF looks, with circular bokeh.

1.3 Research questions

The following questions will be in focus to achieve the aim of the thesis.

1. What techniques can be used for generating depth of field in real-time with a focus on games?

2. Which of these methods would be best suited for real-time generation of depth of field resembling real-life as closely as possible?

3. Is it possible to scale these methods to suit users with different hardware resources?

4. Is it possible to achieve a realistic bokeh effect and still have real-time speeds?


1.4 Delimitations

As there are many different approaches to achieving DoF, the thesis needs to be restricted to exploring only those that are relevant to its goal, which has a focus towards games. For example, the ray-tracing methods explained in chapter 2 are far from being a practical choice in games, so while it is essential to know about all the available methods, less time will be put into exploring them.

To simplify development, the implementation is done solely on Windows, but since OpenGL is multi-platform there is no reason for the implementation not to work on other platforms as well; multi-OS support is, however, not the main focus.

Evaluation of the algorithm is performed by comparing it to two other algorithms; this tests the implementation's performance and quality. Two prototype implementations of more straightforward DoF methods, as well as a comparison with a general implementation of a ray-traced method, are used to compare against the quality of the primary implementation.


2 Theory

This chapter will introduce the most common methods of simulating DoF, some information about what APIs and tools were used to implement the project, as well as how the implementations will be evaluated.

2.1 Lens theory

As mentioned earlier, the lens is what controls the size and shape of the DoF. The two properties of a lens that control DoF are the aperture size and the focal length. The aperture is the size of the hole that light passes through; a camera lens has blades to control its aperture size, as seen in figure 2.1. The larger the aperture, the smaller the DoF, as more light is able to pass through the lens and gets blended together when hitting the camera's film. Alternatively, a smaller aperture gives a larger DoF, making it possible to capture the whole scene in focus. An infinitely small aperture would give an infinitely big DoF. The shape of the aperture also affects the shape of the bokeh: a circular aperture gives a circular bokeh, for example, while a heart-shaped aperture would give a heart-shaped bokeh.

The second property of the lens that controls its DoF is the focal length, which is the distance from the lens to the camera film. The longer the focal length, the shallower the DoF becomes. Figure 3.5 shows the different properties of a lens that affect DoF, as well as how an object not in focus is projected onto a camera's film.
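These two properties are tied together quantitatively by the blur-factor formula that the implementation later uses as equation 3.2. With aperture a, focal length f, plane in focus at distance z_focus, and a point at depth z, the circle of confusion is proportional to

\[ \mathrm{CoC}(z) \;=\; \left| \frac{a \cdot f \cdot (z_{\text{focus}} - z)}{z_{\text{focus}}\,(z - f)} \right| \]

The expression vanishes at z = z_focus and grows with both the aperture and the focal length, matching the qualitative description above.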

2.2 Related work

There are many ways of simulating DoF, and new approaches appear constantly. The most common depth of field methods fall into four different categories: ray-tracing methods, methods using an accumulation buffer, multi-layer methods, and single-layer methods. They all have their perks and flaws, usually trading quality for speed. Below, some of the methods are explained, where each method originates from, and how it has developed over the years.

2.2.1 Ray-tracing

Ray-tracing is a method that shoots rays from the eye through every pixel of the resolution, collecting colors from intersected objects and light sources by bouncing rays around the scene.


Figure 2.1: Picture of a camera lens where the blades controlling its aperture can be seen.

Figure 2.2: Figure showing the concept of the circle of confusion. The blue line shows light coming from the plane in focus, and the solid line shows how light coming from an object in front of the plane in focus projects onto an area of the image plane.

Cook et al. [3] build upon this technique by introducing distributed ray-tracing, meaning that multiple rays are distributed per resolution pixel and the results are averaged to create a more realistic scene and soft shadows. Ray-tracing can be handy for rendering DoF since it gives the ability to simulate an actual lens and its properties by distributing the rays over a lens, creating DoF naturally.

However, since most current graphics hardware is specialized for rendering triangles, these methods are not viable at real-time speeds. With new graphics cards coming out with support for hardware ray-tracing, like Nvidia's RTX 2080 [11], one might see them used more and more in the future.

2.2.2 Accumulation buffer

The methods that use an accumulation buffer, which was introduced by Haeberli et al. [6], have an approach quite similar to ray-tracing. However, instead of shooting multiple rays


distributed over a lens, they render from different camera viewpoints to approximate what a ray-tracer does by blending the different views. Blending different views gives a similar effect as ray-tracing, and if the number of rendered viewpoints is high, these methods can generate very high-quality images with DoF, but also with motion blur and soft shadows.

The problem with this method is that with the increased number of passes and quality comes a reduction in speed and increased memory usage. So while this method is faster than ray-tracing, it is still too slow for use in games.

2.2.3 Multi-layer methods

Multi-layered DoF methods, as the name suggests, sort the scene into layers by depth. C. Scofield initially introduced this type of method in Graphics Gems III [16], where an algorithm he calls a 2½D algorithm is introduced that splits objects into groups, one group for the foreground and one for the background. The layers are blurred separately and then blended from far to near, creating the output image.

Blending from far to near removes artifacts like edge bleeding, since the blurry objects will not blend into the sharp ones and vice versa. The downside of this approach is that it has problems with partial occlusion, since the depths of the objects are averaged when saved into layers. By increasing the number of layers, the averaged depth becomes more precise, and while this increases the quality and helps with artifacts, it also decreases the speed of the algorithm. Another artifact appears when an object stretches through multiple layers, making it seem as if it does not have DoF.

Various ways to speed up the blurring of the layers have been introduced: Kraus et al. [8] introduced a pyramid processing method, and M. Kass [7] proposes an anisotropic heat diffusion function, while Leimkühler et al. [10] combine depth of field with motion blur.

2.2.4 Single-layer methods

There are two different single-layer methods, and they are similar to each other. The first is commonly referred to as splatting, or forward-mapped Z-buffer DoF, and was introduced by Potmesil et al. [13]; it works by drawing and blending circles of confusion (CoC) at each pixel. A CoC, whose size depends on the pixel's z-depth and the lens characteristics, is "splatted" for each source pixel onto the destination image. Blending these CoCs creates the DoF. However, since rendering one CoC per pixel can be quite time-intensive, this method is not viable for real-time; it is, though, commonly used in offline rendering [7].

The second method, commonly called gathering, was introduced by P. Rokita [15] and works similarly to splatting, but instead of splatting from the source image to the destination image, this method gathers information for each destination pixel by looking at the source pixel and its neighbors. How to blend the source pixel's neighbors into the destination image can be done in many different ways. One can use pre-blurring with filters using different numbers of samples like Riguer et al. [14], an anisotropic diffusion partial differential equation like Bertalmio et al. [2], separable Gaussian filtering like Riguer et al. [14], or a Poisson disk distribution to stochastically sample points like Potmesil et al. [13].

Olli Niemitalo presents a circularly symmetric convolution [12] that has the separable property like the Gaussian filter, while at the same time being able to create a circular bokeh. He also presents the kernel with different numbers of components, making it more accurate but slower as the number of components used increases. K. Garcia [5] implemented a DoF based on the one- and two-component kernels Niemitalo presented in the game Madden NFL 17.

Since these methods use a pinhole camera, which is a camera with an infinitely small lens, their perspective is incorrect. This is because in the real world light would hit different parts of the lens, refracting differently, while the pinhole camera is only hit at one point on the lens. As the light comes from only one point, artifacts can appear, such as blurry objects popping


up from behind other objects, since something that can't be seen can't be blurred. Lee et al. [9] address this issue and show a method for splatting where they decompose the scene into three layers: the foreground, the plane in focus, and the background, making this a hybrid between single- and multi-layer methods. By keeping a layer for occluded objects, it becomes possible to track them and in that way blur them correctly when they are partially occluded.


3 Method

The method chapter consists of four parts. First, some information about the project environment is given. Second, two prototypes were implemented, one using Poisson disk blur and one using Gaussian blur. The third part explains the main implementation, which uses a complex, circularly separable kernel for blurring. Lastly, the results were evaluated by measuring time and memory complexity, while a questionnaire was used to test the quality of the implementations.

3.1 Project environment

Here the basics of how the project environment works are laid out, presenting the different APIs and methods used during the project. The project is built with C++ and OpenGL.

3.1.1 Rendering pipeline

The rendering pipeline, as seen in figure 3.1, uses deferred shading to split rendering into different passes. Splitting everything into passes enables lighting to be calculated only once, compared to forward rendering, which calculates shading for each object rendered. Deferred shading works by using multiple render targets (MRT), which means that instead of rendering to the screen directly, the rendering is split up into two or more steps. The first step renders the geometry and saves information like color, normals, and position into the textures of the so-called G-buffer, which is an off-screen framebuffer. These textures are then forwarded to the next pass, which uses the information to calculate things like shading; this pass can either render to the screen or continue rendering to off-screen buffers used by other passes, such as post-processing methods. The reason for doing it this way is to calculate the lighting in the scene only once: in forward rendering, the shader calculating lighting would run once for each object in the scene, even for objects that would later be occluded by other objects. By rendering all of the geometry only once, lighting is calculated only for the objects visible in the current frame. A disadvantage of this is the increased memory cost of the G-buffer, even more so if one wants to use multiple materials or transparency, as additional data would have to be stored. Another disadvantage is that hardware anti-aliasing no longer works and has to be done manually.
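As a concrete illustration of this setup, the following C++/OpenGL sketch creates a G-buffer framebuffer with multiple render targets. It is not the thesis code: the texture formats, the attachment layout, and the helper names are assumptions chosen for brevity (the thesis additionally stores a two-channel depth/blur-factor texture).

```cpp
#include <GL/glew.h>

struct GBuffer {
    GLuint fbo = 0;
    GLuint color = 0, normal = 0, position = 0, depth = 0;
};

GBuffer createGBuffer(int width, int height) {
    GBuffer g;
    glGenFramebuffers(1, &g.fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, g.fbo);

    // Helper that creates one color attachment of the requested format.
    auto makeTarget = [&](GLuint& tex, GLenum internalFormat, GLenum attachment) {
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, width, height, 0,
                     GL_RGBA, GL_FLOAT, nullptr);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glFramebufferTexture2D(GL_FRAMEBUFFER, attachment, GL_TEXTURE_2D, tex, 0);
    };
    makeTarget(g.color,    GL_RGBA8,   GL_COLOR_ATTACHMENT0);  // albedo
    makeTarget(g.normal,   GL_RGBA16F, GL_COLOR_ATTACHMENT1);  // world-space normals
    makeTarget(g.position, GL_RGBA32F, GL_COLOR_ATTACHMENT2);  // world-space position

    // Depth buffer for the geometry pass.
    glGenRenderbuffers(1, &g.depth);
    glBindRenderbuffer(GL_RENDERBUFFER, g.depth);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, g.depth);

    // Render into all three color attachments at once (MRT).
    const GLenum buffers[] = {GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2};
    glDrawBuffers(3, buffers);

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    return g;
}
```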


Figure 3.1: Diagram showing the flow of the different passes of the rendering pipeline and which framebuffers are used during each pass.

Passes

The passes currently available in the environment are: the geometry pass, which renders all of the geometry and outputs color, world position, normals, and depth into textures of the G-buffer; the light pass, which calculates the scene lighting with the help of the data from the G-buffer; and the skybox pass, which uses forward rendering to render a big textured cube around the scene. After that, it is possible to run post-processing passes; right now the only post-processing method used is DoF. Lastly, the color of the post-processing framebuffer is copied to the back buffer and rendered on screen.

Post-processing

The post-processing passes are used to create full-screen effects on the scene after everything else has rendered. It can be used to detect edges, sharpen images or create effects such as depth of field.

3.2 Prototypes and shared passes

The two prototype implementations of depth of field were:

a) using stochastically sampled points distributed with a Poisson disk distribution, and

b) using a separable convolution with a Gaussian kernel.

These prototypes were based on the implementations by Guennadi Riguer et al. [14], where they implemented a Poisson disk blur and a Gaussian blur.

3.2.1 Shared passes

All of the prototypes, as well as the main implementation, share three passes, so to avoid repeating them for each method they are explained here.

Render

The first pass of all methods renders the scene normally to an off-screen framebuffer, while also calculating the linearized depth from the depth buffer. This is then used to calculate each pixel's blur factor, which decides how much the pixel will be blurred later on, based on its distance from the plane in focus.

As the engine used for implementing these methods uses deferred shading, this is split up into two passes, where the geometry pass stores the depth in a two-channel texture: the red channel is the depth and the green channel is the blur factor. The depth is linearized so that 0.0 is the closest point and 1.0 is the point furthest away, as calculated by equation 3.1.


Where depth.r is the red color channel of the depth texture, ViewMatrix is the view matrix, and worldPos is the current position in model space. Since the value is only used as a depth for DoF, there is no need for it to be in clip space, removing the need to multiply with the projection matrix and later restore the linearized depth.

The blur factor is then calculated and stored in the texture's green color channel. Two different methods are used to calculate it: one that gives a more physical representation of the lens, with control over aperture and focal length, and one that gives more intuitive control of the DoF and is a simplification of the actual circle of confusion (CoC) equation.

\[ \text{Depth}.g = \left| \frac{a \cdot f \cdot (z_{\text{focus}} - z)}{z_{\text{focus}}\,(z - f)} \right| \tag{3.2} \]

\[ D = \frac{f}{a} \tag{3.3} \]

Where Depth.g is the green color channel of the depth texture, D is the diameter of the lens, f is the focal length, a is the aperture, z_focus is the distance to the plane in focus, and z is the linearized depth from equation 3.1. The second equation, a simplification of this one, is as follows:

\[ \text{Depth}.g = \left| \frac{z_{\text{focus}} - z}{\text{focalRange}} \right| \tag{3.4} \]

Where focalRange is a variable chosen by the user, which controls the size of the DoF.
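The following C++ sketch mirrors what the Render pass computes per pixel. It is illustrative rather than the thesis' shader code: since equation 3.1 is not reproduced in this extract, the exact depth normalization (near/far plane names and form) is an assumption, while the two blur factors follow equations 3.2 and 3.4.

```cpp
#include <algorithm>
#include <cmath>

// Linearized depth in [0, 1]: 0 at the near plane, 1 at the far plane
// (one possible form of equation 3.1; the normalization is assumed).
float linearizeDepth(float viewSpaceZ, float nearPlane, float farPlane) {
    return std::clamp((-viewSpaceZ - nearPlane) / (farPlane - nearPlane), 0.0f, 1.0f);
}

// Physically motivated blur factor (equation 3.2): aperture a, focal length f,
// plane in focus zFocus, linearized depth z.
float blurFactorPhysical(float z, float zFocus, float a, float f) {
    return std::fabs(a * f * (zFocus - z) / (zFocus * (z - f)));
}

// Simplified, more intuitive blur factor (equation 3.4).
float blurFactorSimple(float z, float zFocus, float focalRange) {
    return std::fabs((zFocus - z) / focalRange);
}
```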

Downsample

The second pass of all methods downsamples the buffer from the first pass to a texture 1/4 of the screen size, 1/2 in x and 1/2 in y. Downsampling to 1/4 of the screen was chosen as it gives a noticeable performance boost with little visual impact, but any amount of downsampling can be used. Downsampling is done so that the post-processing pass has to be calculated for far fewer fragments, without losing too much quality, giving a parameter for trading between quality and speed. The blur factor for each pixel is also saved into the alpha channel of each downsampled pixel's color.

Composite

The last and final pass of every method linearly interpolates between the downscaled blurred image and the full-resolution image based on the blurriness stored in the fullscreen image's alpha channel, giving the final color according to equation 3.5.

FinalColor = mix(fullColor, blurColor, fullColor.a)   (3.5)

3.2.2 Poisson disc blur

The first prototype uses a version of gathering, based on what Potmesil et al. [13] presented. It uses stochastically sampled points, distributed over a circle with a Poisson distribution, which generates random sample points with a minimum distance from each other, making sure no two samples are the same and creating a very even distribution over the whole circle. The minimum distance between the points is important: if there is a gap between the sample points, a gap will also appear in the rendered CoC. An example of sample points randomized this way can be seen in figure 3.2.

This method also makes use of the Z-buffer to generate a blurriness factor that is later used to calculate the size of each pixel's CoC; the CoC is then used together with the previously generated sample points to sample surrounding pixels. A bigger CoC results in sampling color information further away from the pixel.


Figure 3.2: Example of 45 samples of Poisson disc distribution over a circle.

Figure 3.3: Diagram showing the step order of the passes during Poisson disc blur.

This method is performed in a total of four passes: first the two shared passes, then one pass for gathering the colors for each destination pixel, and lastly the shared compositing pass.

First pass: Gather

The first non-shared pass uses a filter kernel, as shown in figure 3.4, to gather the colors around the pixel. The filter's sample points are stochastically sampled according to a Poisson disk distribution and are scaled by the size of the CoC calculated previously.

The filter kernel's sample points are pre-generated on the CPU and sent to the GPU as a uniform; this is to avoid running expensive randomization operations on the GPU each frame. An example of a kernel with 12 sample points is shown in equation 3.6.


Figure 3.4: Figure of filter kernel sampled according to a Poisson disk distribution.

                     ´0.326212 ´0.40581 ´0.840144 ´0.07358 ´0.695914 0.457137 ´0.203345 0.620716 0.96234 ´0.194983 0.473434 ´0.480026 0.519456 0.767022 0.185461 ´0.893124 0.507431 0.064425 0.89642 0.412458 ´0.32194 ´0.932615 ´0.791559 ´0.59771                      dx dy  (3.6) dx= 0.5 scrWidth dy= 0.5 scrHeight (3.7)

The fragment shader in the post-processing pass uses this filter kernel together with the calculated blur factor, which is stored in the alpha channel of the color texture. When choosing which pixels should be sampled, the larger the CoC is, the further away from the pixel's center colors will be sampled; if the CoC is small, it might only sample the current pixel, making that pixel sharp, as illustrated in figure 3.5.

The pixel shader iterates through each sample point of the kernel and uses equation 3.8 to get the coordinate of each color sample. The CoC size is calculated from the blur factor together with a maximum CoC size constant; increasing this constant increases the maximum distance from the center pixel at which colors will be sampled, and the bigger this constant is, the more sample points are needed to sample the whole area accurately. The equation used to calculate the CoC is shown in equation 3.9, so if CoCMax is 10, the biggest CoC will sample up to 10 pixels away from the center pixel.

SampleCoord = CurrentCoord + FilterSample[i] · CoCSize   (3.8)

where CurrentCoord is the current pixel's texture coordinate, FilterSample is the array with the filter kernel set up previously on the CPU, i is the current sample in the array, and CoCSize is the pixel's CoC size calculated in equation 3.9.


Figure 3.5: Figure of how the blurriness factor affects the CoC size and how pixels are sampled. One image shows a blurriness factor of zero, where only the center pixel is sampled and the resulting color stays sharp, and the other shows the maximum blurriness factor of 1, making the sample area as big as Max CoC.

Figure 3.6: Example of the halo effect created when pixels behind sharp objects get blurred with color samples from the sharp object in front of them, creating a fuzzy outline around sharp objects.

CoCSize = SampleDepth.g · CoCMax   (3.9)

The sample texture coordinate is then used to sample the color buffer rendered in the previous pass, and for each sample the color is added to a color sum according to equation 3.10. However, to avoid pixels behind sharp objects blurring with the color of the sharp objects in front of them, which would create a halo around the sharp object, the weight of the color sample's contribution is lowered to one if the sample is in front of the current pixel; this gives sharp objects harder edges. A visual example of this is shown in figure 3.6.

SampleContribution = currentDepth > sampleDepth ? 1.0 : CoCSize
ColorSum += SampleColor * SampleContribution
TotalContribution += SampleContribution   (3.10)

Where SampleContribution is how much the sample's color will contribute to the color of the pixel: the depth of the sample is compared to the current depth, and if the sample is in front of the current pixel its contribution is set to 1. ColorSum is the sum of all the sampled colors, including the middle sample. For each iteration, the SampleContribution is also added into TotalContribution, which is later used to normalize the total color.


Figure 3.7: Graph plotting the Gaussian kernel in 3D space.

PixelColor = ColorSum / TotalContribution   (3.11)

Lastly, the sum of all the sampled colors is divided by the total contribution of all the samples, as seen in equation 3.11, to get the final color of the pixel.
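Putting equations 3.8–3.11 together, the gather step can be sketched in C++ (using glm) as below. The texture fetches are abstracted behind callable parameters, and giving the center sample a contribution of 1 is an assumption; otherwise the logic follows the equations above. All names are illustrative, not the thesis shader code.

```cpp
#include <glm/glm.hpp>
#include <functional>
#include <vector>

// Texture lookups as callables: color at a UV, and the depth texture whose
// r channel is the linear depth and g channel the blur factor.
using ColorFetch = std::function<glm::vec3(glm::vec2)>;
using DepthFetch = std::function<glm::vec2(glm::vec2)>;

glm::vec3 poissonGather(glm::vec2 currentCoord,
                        const std::vector<glm::vec2>& filterSamples,
                        float coCMax,
                        const ColorFetch& sampleColor,
                        const DepthFetch& sampleDepth) {
    float currentDepth = sampleDepth(currentCoord).r;
    float cocSize = sampleDepth(currentCoord).g * coCMax;          // equation 3.9
    glm::vec3 colorSum = sampleColor(currentCoord);                // center sample
    float totalContribution = 1.0f;                                // assumed weight of 1 for the center

    for (const glm::vec2& offset : filterSamples) {
        glm::vec2 sampleCoord = currentCoord + offset * cocSize;   // equation 3.8
        float sampleZ = sampleDepth(sampleCoord).r;
        // Samples in front of the current pixel are weighted down to 1 to reduce halos.
        float contribution = (currentDepth > sampleZ) ? 1.0f : cocSize;
        colorSum += sampleColor(sampleCoord) * contribution;       // equation 3.10
        totalContribution += contribution;
    }
    return colorSum / totalContribution;                           // equation 3.11
}
```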

3.2.3 Separable Gaussian blur

The Gaussian blur method uses a Gaussian function as a convolution kernel to blur the image. The Gaussian kernel created this way is separable, which means that the convolution can be performed in two passes, one horizontal and one vertical. This lowers the computational cost of the convolution from O(n · m) to O(n + m) operations. The Gaussian function in one and two dimensions, respectively, can be seen in equation 3.12, where σ is the standard deviation. A plot of the kernel can also be seen in figure 3.7.

\[ G(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}}, \qquad G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{3.12} \]

This method also downscales the color buffer before applying the blur so that the operations are done on fewer pixels. Lastly, the downscaled blurred image is blended with the sharp fullscreen image by a blurriness factor to create depth of field.

Before rendering, the Gaussian kernel weights are set up in arrays that are later sent to the shaders as uniforms, much like what was done in the previous method with sample points, but now the array contains weights instead. An example of sample weights can be seen in equation 3.13, with both X and Y weights. The weights are calculated with the 1-d Gaussian function in equation 3.12 for each dimension.


Figure 3.8: Diagram of passes and their order for the Gaussian blur depth of field.

                           0.0299455076 0.0299455076 0.0388372280 0.0388372280 0.0483941659 0.0483941659 0.0579383336 0.0579383336 0.0666449517 0.0666449517 0.0736540556 0.0736540556 0.0782085732 0.0782085732 0.0797884911 0.0797884911 0.0782085732 0.0782085732 0.0736540556 0.0736540556 0.0666449517 0.0666449517 0.0579383336 0.0579383336 0.0483941659 0.0483941659 0.0388372280 0.0388372280 0.0299455076 0.0299455076                            (3.13)

First pass: Gauss X

The first non-shared pass performs the convolution along the x-axis. This pass loops through as many pixels as there are sample weights on both sides of the source pixel, plus the source pixel itself. With the weights from equation 3.13, this means sampling 7 pixels to the right and to the left as well as the source pixel, totaling 15 samples in x. The pixels to sample are determined by equation 3.14.

sampleCoord = centerPixel + dx * vec2(i, 0) * blurrFactor   (3.14)

where sampleCoord is the pixel coordinate to be sampled, centerPixel is the coordinate of the current pixel, dx is the size of a pixel in the x direction, i is the current sample index, ranging from the negative number of samples to the number of samples, and blurrFactor is the blurriness factor previously calculated from the linearized depth and stored in the color texture's alpha channel. Using this coordinate, the color at that position is sampled and added to the total color of the destination pixel by multiplying it with the respective weight in the weight array sent from the CPU, as equation 3.15 shows.

totalColor += sampleColor * weightXY[i + numSamples]   (3.15)

Second pass: Gauss Y

This pass performs the convolution in the y direction. It works the same way as the previous pass, but it samples the texture in which the result of the x convolution is stored and uses the corresponding coordinate in y, as shown in equation 3.16.

sampleCoord = centerPixel + dy * vec2(0, i) * blurrFactor   (3.16)


3.3 Implementation

The main method of the thesis uses an algorithm presented by Olli Niemitalo [12] on his blog, where he presents a multi-component algorithm for a complex 2-d convolution kernel that is circularly symmetric, much like a Gaussian kernel, and at the same time separable. The difference is that this kernel creates hard edges around the blurred circle. Since the Gaussian is the only function in the real domain that is both circularly symmetric and separable, complex numbers are needed to create a kernel that has the same properties as a Gaussian function but with hard edges, which makes the bokeh appear circular, imitating reality.

The kernel was created by looking at what conditions a circularly symmetric kernel that is also separable needs to fulfill. To derive the condition, one has to look at how the convolution works. Since a separable kernel is wanted, the 1-d kernels should be identical, f(x) = f(y), and equal to the 2-d kernel, meaning the 2-d kernel can be defined as f(x, y) = f(x) · f(y). Moreover, since the kernel needs to be circularly symmetric, it is known that for the radius r = \sqrt{x^2 + y^2} the kernel can be written as g(r) = f(x, y), giving the following condition for a circularly symmetric kernel:

\[ f\left(\sqrt{x^2 + y^2}\right) = f(x) \cdot f(y) \tag{3.17} \]

This condition can be split into two separate conditions: one magnitude condition and one argument condition.

\[ \left| f\left(\sqrt{x^2 + y^2}\right) \right| = \left| f(x) \cdot f(y) \right| = |f(x)| \cdot |f(y)| \tag{3.18} \]

\[ \arg f\left(\sqrt{x^2 + y^2}\right) = \arg f(x) + \arg f(y) \tag{3.19} \]

The magnitude condition (3.18) shows that the magnitude function needs to be a Gaussian function, as it is the only real function that satisfies this condition. Looking at the argument condition (3.19), it can be seen that the only family of functions that satisfies it is ax².

Knowing what types of functions are needed to fulfill these conditions, together with the knowledge that a complex function can be created by multiplying its magnitude with a complex phasor of its argument, a complex function in polar form can be created as the product of a Gaussian function and a complex phasor with argument proportional to x², giving the following 1-d and 2-d convolution functions.

\[ f(x) = e^{-ax^2} e^{ibx^2} = e^{-ax^2 + ibx^2} = e^{(-a + bi)x^2} \tag{3.20} \]

\[ f(x, y) = e^{-ax^2 + ibx^2} e^{-ay^2 + iby^2} = e^{-a(x^2 + y^2) + ib(x^2 + y^2)} = e^{(-a + bi)(x^2 + y^2)} \tag{3.21} \]

Where x and y are spatial coordinates, a is the scaling constant for the Gaussian function, and b is the scaling constant for the complex phasor. This creates the following real and imaginary kernels, seen in figure 3.9.

Weights are then added for the real and imaginary parts to scale each part respectively, resulting in the final kernel (3.22), which is used to blur the scene and create DoF, combining the real and imaginary parts into the kernel shown in figure 3.10.


Figure 3.9: 2D visualization of the 1-component real and imaginary parts of the complex convolution kernel.

                 a          b          c          d
1-component      0.862325   1.624835   0.767583   1.862321
2-components     0.886528   5.268909   0.411259   -0.548794
                 1.960518   1.558213   0.513282   4.561110

Table 3.1: Coefficients for a 1-component kernel and a 2-component kernel for the complex depth of field.

Figure 3.10: 3D plot of 1-component circular kernel for complex depth of field.

Niemitalo presents the algorithm with multiple components, where each additional component increases the quality of the circle when the components are added together, while also increasing the computation time and memory complexity. The coefficients for the 1-component and 2-component kernels, respectively, are presented in table 3.1. The 1-component and 4-component kernels can be seen in figures 3.10 and 3.11.

Now that it has been shown how the math is derived, it is time to actually implement a circular DoF. As the kernel is separable, the post-process structure is very similar to the Gaussian implementation, consisting of five passes.


Figure 3.11: 3D plot of 4-component circular kernel for complex depth of field.

However, before the calculations can start in the shaders on the GPU, the weights are pre-calculated from the coefficients on the CPU and uploaded to the GPU, avoiding the need to calculate them on the GPU. The real and imaginary parts are stored separately in a 2d vector array uniform.

Garcia introduces a way to pre-normalize the kernel weights so that there is no need to do so later in the shader. He normalizes the weights by multiplying them with a normalization coefficient of 1/α, as shown in equation 3.23.

\[ f_{\text{norm}}(x) = \frac{1}{\alpha} f_k(x) \tag{3.23} \]

where f_norm(x) is the normalized kernel weight, f_k(x) is the kernel weight given by equation 3.20, and α is given by the following equation.

\[ \alpha^2 = \sum_{c}^{C} \sum_{x=-R}^{R} \sum_{y=-R}^{R} c\left(F_k(x)_r F_k(y)_r - F_k(x)_i F_k(y)_i\right) + d\left(F_k(x)_r F_k(y)_i + F_k(x)_i F_k(y)_r\right) \tag{3.24, 3.25} \]

Where C is the number of components, if there are multiple, R is the maximum radius that will be sampled, c and d are coefficients, and F_k(x)_r and F_k(x)_i are the real and imaginary components of the kernel, respectively, shown in the equations below. Since the normalization is calculated for all of the weights in a 2-d convolution, the coefficient is squared.

The real and imaginary parts of equation 3.20 are given in equations 3.26 and 3.27.

\[ F_k(x)_r = e^{-ax^2} \cos(bx^2) \tag{3.26} \]

\[ F_k(x)_i = e^{-ax^2} \sin(bx^2) \tag{3.27} \]

Where a and b are coefficients taken from table 3.1, and x is the spatial coordinate. The implementation is scalable, so the number of pixels to be sampled for each pixel can be chosen, meaning the number of weights calculated and sent to the shader can differ.
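As an illustration, the per-component weights could be precomputed on the CPU roughly as follows. This is a sketch, not the thesis code: the normalization by 1/α is omitted, and the mapping of the integer sample offset to the kernel's x coordinate is an assumption.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Returns (real, imaginary) weight pairs for sample offsets -radius..radius
// for one component, following equations 3.20, 3.26, and 3.27.
std::vector<std::pair<float, float>> complexKernel1D(int radius, float a, float b) {
    std::vector<std::pair<float, float>> weights;
    for (int i = -radius; i <= radius; ++i) {
        // Map the integer offset to a normalized x in [-1, 1] (assumed scaling).
        float x = static_cast<float>(i) / static_cast<float>(radius);
        float gauss = std::exp(-a * x * x);                // magnitude, e^(-a x^2)
        weights.emplace_back(gauss * std::cos(b * x * x),  // F_k(x)_r, equation 3.26
                             gauss * std::sin(b * x * x)); // F_k(x)_i, equation 3.27
    }
    return weights;   // uploaded as a vec2 array uniform (real, imaginary)
}

// Example with the 1-component coefficients from table 3.1:
// auto w = complexKernel1D(8, 0.862325f, 1.624835f);
```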


First pass: Convolution in x axis

After the first two shared passes, it is time to apply the 1-d convolution kernel along the x-axis of the downsampled scene. The convolution creates one real and one imaginary part; the two parts are stored separately until they can be convolved along the y-axis later. Also, since each RGB color value is different, the channels need to be split up and convolved separately from each other in both x and y, storing each RGB value as a 2d vector, one value containing the real part and one containing the imaginary part. Since the implementation uses two components, everything also needs to be stored twice, once for each component. Three RGBA textures are created for the framebuffer, one for each color channel, where RG contains the real and imaginary parts of the first component and BA contains the real and imaginary parts of the second component.

As an example, in an implementation where 8 pixels are sampled in each direction, this adds up to a total of 17 samples in x: 8 on each side plus the center pixel. The color of each pixel is sampled and split up, a complex multiplication is performed between the color and its respective kernel weight calculated previously, and the result is summed with the other samples. After all of the pixels have been sampled, the pass is finished.
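A minimal sketch of this per-channel complex multiply-accumulate, written in C++ with std::complex for clarity; in the actual shader the real and imaginary parts are carried in separate texture channels, and the names used here are illustrative, not the thesis code.

```cpp
#include <complex>
#include <vector>

// Horizontal convolution of one color channel at one pixel: each (real) color
// sample is multiplied by its complex kernel weight and accumulated.
std::complex<float> convolveChannelX(const std::vector<float>& channelRow,
                                     int centerIndex,
                                     const std::vector<std::complex<float>>& kernel,
                                     int radius) {
    std::complex<float> sum(0.0f, 0.0f);
    for (int i = -radius; i <= radius; ++i) {
        int idx = centerIndex + i;
        if (idx < 0 || idx >= static_cast<int>(channelRow.size()))
            continue;                                   // simple border handling (assumed)
        sum += channelRow[idx] * kernel[i + radius];    // real color times complex weight
    }
    return sum;   // real and imaginary parts are then stored in two texture channels
}
```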

Second pass: Convolution in y axis

This pass convolves each RGB texture along the y-axis instead, in much the same way as the previous pass. The pass then takes the norm of each component by adding the component's real and imaginary parts together with a weighted complex addition, as seen in equation 3.28 and visually in figure 3.12, where c and d are the real and imaginary weight coefficients.

\[ \text{Color}(x) = \sum_{c=0}^{C} d_c \cdot F_c(x)_r + c_c \cdot F_c(x)_i \tag{3.28} \]

Lastly, this pass adds the norms of the two components together by performing a weighted complex addition, where d_c and c_c are coefficients and C is the number of components, adding up to the final color value for each color channel.
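A compact sketch of this final combination (equation 3.28) for one color channel, with the c and d coefficients taken from table 3.1; the container types and names are illustrative.

```cpp
#include <array>
#include <complex>
#include <cstddef>

// Weighted combination of the per-component convolution results, following
// equation 3.28: d weights the real part, c the imaginary part.
float combineComponents(const std::array<std::complex<float>, 2>& components,
                        const std::array<float, 2>& c,
                        const std::array<float, 2>& d) {
    float color = 0.0f;
    for (std::size_t k = 0; k < components.size(); ++k)
        color += d[k] * components[k].real() + c[k] * components[k].imag();
    return color;
}
```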

3.4 Evaluation method

The evaluation part of the thesis consists of two parts: one evaluation of the time and memory complexity of the different implementations with benchmarks, and one that tries to measure the quality of the different methods with a questionnaire and image comparisons.

Time and memory measurements

Since the CPU and GPU work asynchronously, the measurements were done by creating OpenGL queries and waiting for each query to finish. The queries are created and started before each DoF algorithm pass and ended directly after; the program then waits for the query to finish to get the time elapsed from the start of the query. This makes sure that all the commands sent to the GPU have finished and that the time given is the time it took the GPU to process the commands sent.
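A sketch of such a timer query, assuming an OpenGL context loaded through GLEW; timeDofPassMs and the callback parameter are illustrative names, not the thesis code.

```cpp
#include <GL/glew.h>
#include <functional>

// Measures the GPU time of one DoF pass in milliseconds using a timer query.
// The pass itself is passed in as a callback so the sketch stays self-contained.
double timeDofPassMs(const std::function<void()>& renderDofPass) {
    GLuint query = 0;
    glGenQueries(1, &query);

    glBeginQuery(GL_TIME_ELAPSED, query);
    renderDofPass();                        // issue the draw calls of the pass
    glEndQuery(GL_TIME_ELAPSED);

    // Retrieving GL_QUERY_RESULT blocks until the GPU has finished the work
    // recorded between glBeginQuery and glEndQuery.
    GLuint64 elapsedNs = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
    glDeleteQueries(1, &query);

    return static_cast<double>(elapsedNs) / 1.0e6;   // nanoseconds to milliseconds
}
```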

To reinforce the results that the queries gave, Nvidia's profiling tool Nsight Graphics [4] was used to profile each pass as well. These tests gave the same results as the queries did. The results from the tests are compiled and presented in table B.2, with the time of each pass shown in table B.1.

Nvidia Nsight only supports basic memory profiling, so to measure the memory usage of each method, the sizes of the different textures, buffers, and variables were instead calculated by hand and compared with the results of Nvidia Nsight.


Figure 3.12: Diagram of how the different components as well as the real and imaginary parts are calculated separately and then added together for the final result.

10 samples were taken for each of the different settings, 5 from queries and 5 from Nsight, and the median was then used as the average. The results from the memory measurements are shown in table B.3.

Questionnaire

The questionnaire contains six questions: three questions regarding the person taking it, and three regarding the quality of the different implementations. It also contains an intro section that explains the concept of DoF and gives a real-life example of DoF and bokeh to introduce the person taking the questionnaire to the subject. The questionnaire was created online with Google Forms to be easily distributed, and it was sent mainly to people with some knowledge of computer graphics or photography. A total of 30 people were surveyed. The questionnaire in full can be seen in Appendix A.


4 Results

In this chapter, the results are presented. The results of this thesis are two prototype implementations and one main implementation; tests have also been performed to measure time and memory complexity, and a questionnaire was used to see which implementation looks best.

The different methods are presented in 3 different ways: one scene with high-intensity blur in the background and a little bit in the foreground, one scene with three teapots and blur in both back- and foreground, and lastly a scene with something blurry very close in the background. The different methods use the same position and direction of the camera as well as the same settings for each scene.

The environment in which everything was implemented contains a UI where the different methods can be interchanged during run-time, which helps with the comparison of the different methods. The UI can control the distance from the camera to the plane in focus, the focal length, and the aperture. The UI can be seen in figure 4.1.

To keep things clear, this chapter has the same structure as the method chapter, starting with the prototypes, then the implementation, and finishing with the evaluation.

4.1 Prototypes

4.1.1 Poisson disc blur

The Poisson disk blur used pre-randomized sample points inside a disc to sample the surrounding pixels. An example of the implementation in use can be seen in figure 4.2. The image contains blurred teapots in a line where the middle teapot is in focus, and both the teapots behind and in front of the teapot in focus are blurred. There is no halo around the teapot in focus, creating quite a realistic DoF. This method also creates a circular bokeh, which can be seen in figure 5.1. The bokeh is a bit jittery but circular.

4.1.2 Separable Gaussian blur

Because of the Gaussian kernel's smooth nature, the Gaussian blur creates a smooth blur where the bokeh has smooth edges instead of hard edges as in real life. An example of the Gaussian blur on the teapot scene can be seen in figure 4.3; some slight haloing can be seen on the sharp object. The Gaussian blur's bokeh can be seen in figure 5.1, which shows how smooth the edges of the bokeh are.

Figure 4.1: Showcase of the environment's UI, showing FPS, camera position, and view direction. The UI also shows which depth of field method is currently in use and provides the functionality to switch between methods. Camera properties that impact the depth of field, such as focal point, focal length, and aperture, can also be seen. Lastly, for the complex depth of field method, the number of components used can be switched between 1 and 2.

4.2 Implementation

The circular separable convolution creates a blur with a circular bokeh. The circular blur of the teapot scene can be seen in figure 4.4. As with the Gaussian blur, a halo can be seen around the sharp object, even more apparent with this method. An example of this method's circular bokeh can be seen in figure 5.1. If the number of components used for the circular blur is reduced to 1, the speed of the implementation matches that of the Gaussian blur, but the loss of quality is quite apparent.

4.3 Ray-tracing

The reference images that were created for the questionnaire and the comparison of quality were rendered in 3ds Max 2020 [1] using the ray-tracer Arnold. A result of the teapot scene rendered in this way can be seen in figure 4.5.


Figure 4.2: Teapots rendered with Poisson disc blur.

Figure 4.3: Teapots rendered with Gaussian blur.

4.4 Evaluation

The results of the evaluation are presented here: first the tables of time and memory complexity, and then the results from the questionnaire.

Time and memory measurements

The time measurements were made on a computer with a GTX 970 graphics card, taking the median of 10 sample points as the average result, giving the results in the following tables, where table B.1 shows the time each pass took and table B.2 shows the total time for each method.

Table B.3 shows the calculated memory complexity of each pass, also showing the data type of each variable.


Figure 4.4: Teapots rendered with Circular blur.

Figure 4.5: Teapots rendered with Arnold in 3ds Max.

Questionnaire

Figure 4.6: Results from the questionnaire, showing respondent age, self-reported DoF knowledge, and the answers to the quality questions.

5 Discussion

5.1 Results

The resulting application is quite versatile, and it is easy to compare the different methods by using the same input values controlled by the UI and by generating images from the same viewpoints, as the DoF method used can be interchanged in real-time. Having different methods to compare with each other is useful, as the different methods differ greatly from each other, making it possible to choose the one that best matches one's needs.

5.1.1 Performance

Looking at the time measurements, we can see that the separable methods are much faster than the non-separable one, which is expected as the complexity is reduced to O(m + n) instead of O(m · n). This means that these methods can be used with much higher sample rates, enabling higher-quality blur and blurrier images, even at higher resolutions. The downscaling also makes a big difference to performance, sometimes reducing the computation time of the methods by more than half. The downscaling of the image also creates a denser blur, as the distance between pixels in a downscaled scene is greater than in a full-scale one, making the same number of samples reach further from the origin than they would at full resolution. A downside of the downscaling is that it can introduce aliasing, such as jittering when moving the camera, as the scaling might not sample the same places each time; at 1/4 of the resolution, however, the jittering is not noticeable. We can also see that the Gaussian filter is a bit faster than the circular one, which makes sense as the second pass of the circular filter needs to sample 3 textures and add the imaginary and real parts together.

The memory complexity of the methods differs quite a lot: both of the prototypes use about 1.5 MB of memory, but the 2-component circular method uses around 6 MB, which is quite a big jump. The memory also scales with the number of components used, so sacrificing quality for memory is possible; the one-component implementation of the circular method only uses a total of about 3.75 MB.


Figure 5.1: Haloing effect of the different methods in the teapot scene. From left to right, Poisson, Gauss, circular.

Figure 5.2: Bokeh comparison of the different methods, generated from a static image. From left to right, Poisson, Gauss, circular.

5.1.2 Quality

The questionnaire shows us that the quality of the different methods does not differ significantly. In the images with the Sponza scene, the circular method was voted the one closest to the reference image by 53.3%, while when comparing the teapots it was the one voted furthest from the reference. The haloing artifact created by not discarding samples in front of the center pixel in the circular and Gaussian methods is apparent in the teapot scene, shown in figure 5.1, which might be the reason behind people choosing differently here, but in total between the two scenes the circular blur still got more votes. The big difference between the methods can be seen when looking at the bokeh, where the Poisson and circular methods create a circular bokeh with hard edges while the Gaussian blur's bokeh has smooth edges; an example of this can be seen in figure 5.2. Over 56% chose the circular method's bokeh as the most realistic one, which was the main reason for using this method to begin with.

Something else that becomes quite apparent when there is a very blurry object in front of a sharp object is the hard edges of the blurred object. Since the object in focus will not get blurred at all, or consider any surrounding pixels, when a blurry object in the foreground occludes parts of the object in focus, the image goes from very blurry to very sharp instantly, as shown in figure 5.3.

5.2 Method

The method of the thesis turned out to be a good way to compare different approaches to the same problem. The differences between the implementations could have been small enough


Figure 5.3: Showcase of the sharp edges that appear around very blurry objects that are in front of sharp backgrounds.

not to be visible, but since the results show that the implementations differ in both appearance and memory/time complexity, the method is a good way to show each method's strengths and weaknesses.

5.2.1 Prototypes and implementation

The actual implementations are a bit flawed, as seen in figures 5.1 and 5.3. While the halo artifact only affects the Gaussian and circular blur, the sharp-edge artifact affects all of the methods, and both greatly decrease the quality of the blur when apparent. The halo artifact could be solved for the two remaining methods in the same way as in the Poisson method, by taking weighted averages of only the samples that are in front of the source pixel. The hard-edge artifact is a bit trickier, but K. Garcia explains in his paper [5] how he splits the screen into fore- and background and grows the foreground by tiling the circle of confusion to a minimum value, effectively growing the silhouettes of each blurry object, making it possible to blur things that are actually in focus behind the blurry objects. A positive side of using shaders for the implementation is that the results are highly replicable even in shading languages other than GLSL; as long as the shading language supports the same features, it should not matter where the method is implemented.

The pre-randomized Poisson disk sample points seem to be a bit badly distributed over the disk, as seen in figure 5.2. The top half of the disk seems to contain fewer samples than the lower half, creating a fairly strange bokeh. This also means that the lower half should contain more of the 64 sample points used than the upper half, yet the lower half of the bokeh still looks worse than the circular method's bokeh, which suggests that it would look worse even with an even distribution.


Except for the haloing and hard-edge artifacts, the Gaussian blur works as intended and is the fastest method of them all by a small margin (0.07 ms compared to the circular blur for a downscaled 25-sample implementation), and while the blur looks nice (coming second in total votes in the questionnaire), the bokeh does not look anything like reality.
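For reference, one of the two 1-D passes of such a separable Gaussian gather might look like the sketch below; the 5-tap kernel, the CoC texture and all names are assumptions rather than the code used in the thesis.

#version 330 core
// One 1-D pass of a separable Gaussian blur; a second pass runs the same
// kernel in the other direction. Scaling the offsets by the centre pixel's
// circle of confusion turns the plain blur into a simple depth of field gather.
uniform sampler2D uColor;
uniform sampler2D uCoC;        // blur radius per pixel, in texels
uniform vec2      uTexelSize;  // 1.0 / resolution
uniform vec2      uDirection;  // (1, 0) for the horizontal pass, (0, 1) for the vertical

in  vec2 vUV;
out vec4 fragColor;

// Normalised 5-tap Gaussian weights (sigma = 1), used here only as an example.
const float WEIGHTS[5] = float[](0.0545, 0.2442, 0.4026, 0.2442, 0.0545);

void main()
{
    float radius = texture(uCoC, vUV).r;
    vec3  sum    = vec3(0.0);

    for (int i = -2; i <= 2; ++i)
    {
        vec2 uv = vUV + uDirection * float(i) * radius * uTexelSize;
        sum += texture(uColor, uv).rgb * WEIGHTS[i + 2];
    }

    fragColor = vec4(sum, 1.0);
}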

The circular DoF received the highest number of total votes in the questionnaire, suggesting it is the highest-quality method. It is also the second fastest, not being much slower than the fastest method, while at the same time creating a perfectly circular bokeh without the visible artifacts seen in the Poisson implementation. This points towards the circular method being, of the three implemented methods, the most correct depth of field implementation based on the presented results. With its comparable time complexity and its higher-quality bokeh, this method beats the other two. The only drawback is that it scales poorly in memory as the number of components increases. If quality can be sacrificed, the number of components can be reduced to one, giving it the same time complexity as the Gaussian implementation.
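To illustrate where the per-component memory cost comes from, a minimal sketch of the final composition for a single-component circular blur is shown below, assuming the two 1-D complex passes have already been run and their real and imaginary parts stored per colour channel; the texture layout, names and the separate combine pass are assumptions, not taken from the thesis implementation.

#version 330 core
// Final combine of a 1-component circular (complex) separable blur: after the
// two 1-D passes every colour channel has a real and an imaginary part, and
// the disk-shaped result is their weighted sum. With C components, C times as
// many intermediate values have to be stored, which is where the memory
// scaling comes from. Assumed layout: uRedGreen = (red re, red im, green re,
// green im), uBlue = (blue re, blue im); the weights come from the kernel fit [12].
uniform sampler2D uRedGreen;
uniform sampler2D uBlue;
uniform vec2      uWeights;   // (real weight a, imaginary weight b)

in  vec2 vUV;
out vec4 fragColor;

void main()
{
    vec4 rg = texture(uRedGreen, vUV);
    vec2 b  = texture(uBlue, vUV).rg;

    float red   = dot(rg.rg, uWeights);  // a * re + b * im
    float green = dot(rg.ba, uWeights);
    float blue  = dot(b,     uWeights);

    fragColor = vec4(red, green, blue, 1.0);
}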

5.2.2 Evaluation

Because of the asynchronous nature of the GPU, measuring time complexity is an uncertain task, but using two different methods to do so makes the results more reliable. Both queries in the code and Nvidia Nsight were used, and they gave the same results, which indicates that the measurements are accurate. However, using queries and Nvidia Nsight might itself impact the performance of the application, so running it without measurements might be faster. Also, since the memory complexity of each method was calculated by hand, the accuracy might not be the best and the real results might differ; a profiling tool would probably perform this task better.

The questionnaire turned out to be a good way to measure quality; as abstract as it is, it was the fastest practical way of comparing the quality of the methods. However, the number of people tested might be too low to draw any substantial conclusions. If more time and a larger base of subjects had been available, the results from the questionnaire could have carried more weight. The questionnaire was also very compact and straightforward, containing only six questions, which has its advantages as it can be taken quickly without confusing the subjects. There would, however, also have been an advantage in having multiple scenes and angles with different lighting and amounts of blur, but the time to render the reference images in 3ds Max was the real bottleneck here, as it took close to 5 hours to render each image, and the scene then had to be replicated, which also took some tweaking. If more time had been available, a more thorough questionnaire could have been considered.

5.2.3 Sources

As this thesis focused only on implementing depth of field, some sources where DoF is combined with other post-effects were neglected. T. Leimkühler et al. [10] combine depth of field and motion blur, which together might save more time. So while multi-layer, accumulation-buffer and ray-traced methods are discussed in theory, if time had been available it would have been interesting to implement and compare methods from all the different approaches, making the thesis a bit broader.

The sources for the Gaussian and Poisson implementations also describe very basic uses of the blurs, and more research could have gone into whether these methods are used in more advanced or improved ways today.

There are many sources and different implementations of Poisson and Gaussian blur, but only two sources on circular blur were used: O. Niemitalo's blog article presenting the method [12] and K. Garcia's paper on his implementation [5]. Since the paper describes the method being implemented in a AAA game, it carries some weight, but it could have been interesting to see different takes on the same method or a bit more research in the area.


6 Conclusion

In this thesis, different depth of field methods have been explained, and three methods suited for real-time use in games have been implemented and examined more closely. One method stands out from the three, giving the result that best resembles reality, both in blur quality and bokeh shape, without being much heavier computationally than the fastest method. All three methods are found to be scalable through the number of sample points and, for the circular method, the number of components, giving the person implementing them a wide choice when trading quality against time and memory complexity. A circular bokeh is also shown to be achievable in real-time with the circular depth of field method. Although artifacts are present in the current implementations, measures to address them are discussed, so they should not make these methods less viable.

If circular bokeh is wanted, then the circular blur is the way to go, but if circular bokeh is not needed and realistic depth of field is not sought after, the Gaussian blur creates similar results faster and with a lower memory cost, making it more suitable. However, since one of the reasons for using depth of field is to make the scene more realistic, and circular bokeh plays a significant role in that, the circular blur method should be taken into consideration.

The circular method could even be used in CGI as a post-effect, since its quality can be scaled quite high: with a 5-component blur the ripple artifacts are 1/250 in size, which would be almost invisible everywhere.

6.1 Future work

First and foremost, future work would go into fixing the artifacts of the different methods. As explained in the discussion section of this thesis, to battle the hard-edge effect one could split the rendered image into foreground and background with the help of the linearized depth and blur them separately while growing the foreground's silhouettes, making them blur into the background more smoothly. A different number of components and samples could even be used for the foreground and the background if there were a use for it. K. Garcia [5] explains that they used a 1-component blur in the foreground, as near depth of field was much less common, saving performance. The haloing effect could also be addressed by only using samples that have a higher depth value than the source pixel, removing the risk of sampling sharp objects in front of the blurred pixel, though one would need to figure out how to calculate the weighted average of the samples for this to work.
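A hypothetical tiling pass of the kind described above could look like the following sketch; the tile size, the signed-CoC convention and all names are assumptions, and the subsequent dilation and foreground blur passes are not shown.

#version 330 core
// Tiling pass used to grow foreground silhouettes: each output texel covers an
// N x N tile of the full-resolution CoC buffer and stores the tile's minimum
// signed CoC (the strongest foreground blur, if negative CoC marks the near
// field). A small dilation over this tile texture then spreads blurry
// silhouettes outwards before the foreground is blurred separately.
uniform sampler2D uCoC;        // signed CoC, negative in front of the focus plane
uniform vec2      uTexelSize;  // 1.0 / full resolution
uniform int       uTileSize;   // e.g. 16

in  vec2  vUV;                 // centre of the current tile in full-resolution UVs
out float tileMinCoC;

void main()
{
    float minCoC = 1e9;

    for (int y = 0; y < uTileSize; ++y)
    {
        for (int x = 0; x < uTileSize; ++x)
        {
            vec2 offset = (vec2(x, y) - 0.5 * vec2(uTileSize)) * uTexelSize;
            minCoC = min(minCoC, texture(uCoC, vUV + offset).r);
        }
    }

    tileMinCoC = minCoC;
}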

Since the environment in which the depth of field methods have been implemented is quite limited, lacking shadows and other important visual cues, it would significantly improve the evaluation if the implementations could be integrated into a more versatile and robust environment, such as a real game engine. This could even improve the performance and quality of the methods, as more optimized or higher-quality ways of downscaling and compositing could be available; only simple methods for these steps were implemented during this thesis. Implementing the methods in a game engine and using them in a real game would also provide more information about how the methods hold together with other effects and whether they are fast enough when resources need to be shared with other features. So far only theoretical conclusions can be made, and only after the implementations have seen real-life usage can practical conclusions be drawn.

The kernel weight generation can be optimized, as explained by O. Niemitalo in his blog [12]. This would not increase the quality or performance of the actual depth of field generation, but the initialization time for the kernel weights would be improved. Niemitalo explains that a can be seen as a weight for the real part and b as a weight for the imaginary part of the 2-D convolution in equation 3.22. The same result can be achieved by multiplying with the complex number a − bi and then discarding the imaginary part. This, together with dropping the requirement that the two 1-D convolutions be identical, makes it possible to absorb the complex multiplication into the 1-D convolutions, which in turn means that the imaginary part of the second 1-D convolution never has to be computed, saving calculation time and the memory for storing it.
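The identity behind this optimization is elementary: if x + yi denotes the complex value produced by the convolution for a pixel, then

\[ \operatorname{Re}\big[(a - bi)(x + yi)\big] = ax + by, \]

so pre-multiplying one of the 1-D kernels by a − bi makes the real part of the separable convolution equal to the desired weighted combination directly.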

It could also be interesting to implement methods from the other approaches, such as multi-layer methods, and compare them with the real-time methods, even though they might not be usable in real-time. This would broaden the scope of the thesis implementation, giving a more in-depth analysis of current depth of field methods.


Bibliography

[1] Autodesk. 3ds Max 2020. 2019.

[2] M. Bertalmio. "Real-time, accurate depth of field using anisotropic diffusion and programmable graphics cards". In: 3DPVT '04 Proceedings of the 3D Data Processing, Visualization, and Transmission, 2nd International Symposium (2004), pp. 767–773. DOI: 10.1109/TDPVT.2004.1335393. URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1335393.

[3] Robert L. Cook, Thomas Porter, and Loren Carpenter. "Distributed Ray Tracing". In: ACM SIGGRAPH Computer Graphics 18.3 (1984), p. 205. ISSN: 00978930. DOI: 10.1145/964965.808599.

[4] NVIDIA Corporation. NVIDIA® Nsight™ Graphics. 2019.

[5] Kleber Garcia. "Circular separable convolution depth of field". In: ACM SIGGRAPH 2017 Talks on - SIGGRAPH '17. New York, New York, USA: ACM Press, 2017, pp. 1–2. ISBN: 9781450350082. DOI: 10.1145/3084363.3085022. URL: http://dl.acm.org/citation.cfm?doid=3084363.3085022.

[6] Paul Haeberli and Kurt Akeley. "The accumulation buffer: hardware support for high-quality rendering". In: ACM SIGGRAPH Computer Graphics 24.4 (1990), pp. 309–318. ISSN: 00978930. DOI: 10.1145/97880.97913. URL: http://portal.acm.org/citation.cfm?doid=97880.97913.

[7] Michael Kass, Aaron Lefohn, and John Owens. "Interactive Depth of Field Using Simulated Diffusion on a GPU". In: Computing (2006), pp. 1–8. ISSN: 1521-0391. DOI: 10.1111/ajad.12361. URL: https://www.researchgate.net/profile/Aaron_Lefohn/publication/241587604_Interactive_Depth_of_Field_Using_Simulated_Difiusion_on_a_GPU/links/0a85e537fd6eab0736000000.pdf.

[8] M. Kraus and M. Strengert. "Depth-of-field rendering by pyramidal image processing". In: Computer Graphics Forum 26.3 (2007), pp. 645–654. ISSN: 14678659. DOI: 10.1111/j.1467-8659.2007.01088.x.

[9] Sungkil Lee, Gerard Jounghyun Kim, and Seungmoon Choi. "Real-time depth-of-field rendering using point splatting on per-pixel layers". In: Computer Graphics Forum 27.7 (2008), pp. 1955–1962. ISSN: 14678659. DOI: 10.1111/j.1467-8659.2008.01344.x.

[10] Thomas Leimkühler and Hans-Peter Seidel. "Laplacian Kernel Splatting for Efficient Depth-of-field and Motion Blur Synthesis or Reconstruction". In: ACM Trans. Graph 37.4 (2018), pp. 1–11. ISSN: 15577368. DOI: 10.1145/3197517.3201379. URL: https://doi.org/10.1145/3197517.3201379.

[11] Nvidia. Nvidia Turing GPU, White Paper. Tech. rep.

[12] Olli Niemitalo. Circularly symmetric convolution and lens blur. URL: http://yehar.com/blog/?p=1495.

[13] Michael Potmesil and Indranil Chakravarty. "A Lens and Aperture Camera Model for Synthetic Image Generation". In: ACM Transactions on Graphics 1.2 (1981), pp. 85–108. ISSN: 07300301. DOI: 10.1145/357299.357300.

[14] Guennadi Riguer, Natalya Tatarchuk, and John Isidoro. "Real-time depth of field simulation". In: ShaderX2: Shader Programming Tips and Tricks with DirectX 9 (2003), pp. 529–556. URL: http://developer.amd.com/media/gpu_assets/shaderx2_real-timedepthoffieldsimulation.pdf.

[15] Przemyslaw Rokita. "Fast generation of depth of field effects in computer graphics". In: Comput. & Graphics 17.5 (1993), pp. 593–595.

A Result images


Figure A.2: Sponza scene rendered with Gaussian blur.


Figure A.4: Questionnaire in full from Google forms.

[Screenshot of the Google Forms questionnaire, titled "Quality of depth of field". The recoverable questions are: "What is your name?", "How old are you?", "How familiar are you with the concept of depth of field?" (rated 1 to 5), two questions of the form "Using this image as reference, which of the following three images resembles it the best?", and "Which of the following three images of the following photo has the most realistic bokeh?", each image question offering the options Image 1, Image 2 and Image 3.]
