
Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Linköping University (Linköpings universitet)

LiU-ITN-TEK-A--19/060--SE

Real-Time Rendering of Soft Shadows

Johannes Deligiannis

LiU-ITN-TEK-A--19/060--SE

Real-Time Rendering of Soft Shadows

Thesis work carried out in Computer Engineering
at the Institute of Technology, Linköping University

Johannes Deligiannis

Supervisor: Stefan Gustavson
Examiner: Jonas Unger

Upphovsrätt (Copyright)

This document is made available on the Internet, or its possible future replacement, for a considerable time from the date of publication, barring exceptional circumstances.

Access to the document implies permission for anyone to read, download, print out single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A subsequent transfer of the copyright cannot revoke this permission. All other use of the document requires the consent of the copyright owner. To guarantee authenticity, security and accessibility, there are solutions of a technical and administrative nature.

The moral rights of the author include the right to be mentioned as the author to the extent required by good practice when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or character.

For additional information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/

Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Real-Time Rendering of Soft Shadows

Johannes Deligiannis


Abstract

This thesis presents a method to render real-time soft shadows efficiently on modern GPUs using a combination of low-discrepancy sampling sequences and spatiotemporal convolution. A theoretical foundation of shadows and shadow rendering is presented, together with common algorithms used for shadow rendering and light space convolution. The method presented is a reasonable candidate for games and other interactive applications aiming to provide visually pleasing soft shadows.


Thanks

I'd like to express my gratitude to Torbjörn Söderman and Daniel Johansson for giving me the opportunity to do my master's thesis at EA DICE, and to my manager Björn Hedberg for giving me time after my employment to complete it.

I’d like to thank my external thesis supervisor at EA DICE, Yasin Uludag, not only for the invaluable input and suggestions on technical issues, but also for making me feel like part of the team from day one.

I want to thank my supervisor at LiU, Stefan Gustavson, for the great support during the initial phase of the work, and extra thanks to my examiner Jonas Unger for letting me pick up where I left off and complete my thesis, almost half a decade later than planned.

And finally I’d like to thank my family and my friends who are always there for support, through thick and thin.


Contents

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions
  1.4 Delimitations
2 Background
  2.1 Shadow Techniques
    2.1.1 Shadow Volumes
    2.1.2 Shadow Mapping
    2.1.3 Shadow Acne
    2.1.4 Cascaded Shadow Maps
  2.2 Shadow Map Filtering
    2.2.1 Percentage-Closer Filtering
      Variance Shadow Maps
      Shadow Map Prefiltering
  2.3 Soft Shadows
    2.3.1 Percentage-Closer Soft Shadows
      Variance Soft Shadow Maps
    2.3.2 Soft Shadows via Prefiltering
  2.4 Screen Space Soft Shadows
  2.5 Temporal Reprojection
    2.5.1 Movement Map
  2.6 Monte Carlo Methods
    2.6.1 Sampling strategies
    2.6.2 Blue Noise Dithering
    2.6.3 Scrambling
      Cranley-Patterson Rotation
  2.7 Denoising
    2.7.1 Box and Median Filter
    2.7.2 Gaussian Filter
3 Implementation
  3.1 Sampling
  3.2 Movement Map
  3.3 Data Format
  3.4 Shadow Post Processing
  3.5 Algorithm Overview
4 Results
  4.1 Sampling
  4.2 Denoising
  4.3 Result
  4.4 Performance
5 Discussion
6 Conclusion
  6.1 Future Work
    Generalization
    Blue Noise Dithering
    Filtering
    Movement Map
Appendices
A Algorithms
  A.1 Sampling
  A.2 Percentage Closer Soft Shadows (PCSS)
B Code
  B.1 Data packing
  B.2 Hammersley point set
  B.3 Halton point set


List of Figures

2.1 Shadow Mapping
2.2 Shadow Acne
2.3 Shadow Bias
2.4 Cascaded Shadow Map
2.5 Umbra & Penumbra
2.6 PCSS
2.7 The visualisation of a screen space reprojection vector
2.8 Sampling pattern: Regular vs Uniform
2.9 Hierarchical Poisson Disk
2.10 Hammersley vs Halton
2.11 Blue Noise Dither Texture
2.12 Blue Noise Dithering Example
2.13 Cranley-Patterson Rotation
2.14 Median & Box filter
3.1 Data packing
3.2 Overview of Render Module
4.1 Reference scene overview
4.2 Regular sampling
4.3 Uniform sampling
4.5 Hammersley sampling
4.6 White noise rotation vs Blue noise rotation
4.7 Blue noise Hammersley
4.8 Left: White noise + Hammersley, Right: White noise + poisson disc sampling
4.9 White noise + Hammersley sampling
4.10 Denoising overview
4.11 Bilateral filter overview
4.12 Result
4.13 Performance analysis

Glossary

BRDF: Bidirectional Reflectance Distribution Function
CSM: Convolution Shadow Maps
CSM: Cascaded Shadow Maps
ESM: Exponential Shadow Maps
GPU: Graphics Processing Unit
HLSL: High-Level Shading Language
MCM: Monte Carlo Method
PCF: Percentage Closer Filtering
PCSS: Percentage Closer Soft Shadows
QMCM: Quasi-Monte Carlo Method
RQMCM: Randomized Quasi-Monte Carlo Method
RTV: Render Target View
SAT: Summed Area Table
VSM: Variance Shadow Maps

Chapter 1

Introduction

1.1 Motivation

Shadows play a crucial role in the understanding of shape, size and relative location of geometry in three dimensional space. Many properties of shadow casters and receivers can be derived by inspecting a shadow silhouette and its interaction with surfaces in the scene. Shadows may even provide clues about the surrounding weather conditions: sun shadows cast during overcast conditions tend to be dimmer and softer due to the absorption and scattering of light rays inside clouds. Furthermore, real-time rendering applications often (but not necessarily) attempt to accurately simulate light transport as a means to generate photorealistic images. Most light transport problems, including shadows, are limited in effectiveness due to the complexity of the calculations involved. For real-time applications, the available computational power is strictly limited and may require a wide range of approximations and assumptions. For this reason it is preferable to reduce the computation time of shadows, not only to produce shadows of higher quality for the same computational cost, but also to free up rendering time for other effects in order to improve the overall image quality.

1.2 Aim

The goal of this thesis project is to investigate different techniques to accelerate the computation of real-time soft shadows while retaining as much image quality as possible. The techniques investigated should also integrate easily into an existing rendering pipeline, engine or framework. In particular, the focus will be on the PCSS algorithm and on different ways to post-process a cheap, low quality shadow mask into a high quality, noise free shadow result.

1.3 Research questions

This thesis will focus on a few key areas of soft shadow rendering, with a focus on PCSS in particular. While light space algorithms exist (such as shadow map convolution), screen space algorithms will be the primary focus of this thesis.

1. What are the different ways to produce soft shadows in real-time rendering applications?

2. Is it possible to denoise shadows in screen space and produce a pleasing result without deviating significantly from a reference implementation?

3. Is it practical to re-use approximate shadows temporally without suffering from typical reprojection artefacts like ghosting?

4. Stochastic sampling of the shadow map is often used for PCSS in practice; how should a sampling pattern be chosen for best results?

1.4 Delimitations

This thesis will focus on a single orthographic directional shadow cast from opaque occluders (e.g. sun shadows), meaning that solutions for multiple lights with overlapping shadows and for lights with different distributional characteristics and shapes (spotlights, cone lights, omnidirectional lights) will not be covered. Additionally, the work assumes a deferred rendering pipeline that allows the shadow mask, if need be, to be processed in screen space per pixel to share information spatially, as opposed to forward rendering where each pixel is lit individually.

Chapter 2

Background

Before diving into the technical aspects of shadows and their implementation in computer graphics, it is useful to define what is meant by a shadow. At first glance it might seem obvious, after all it's something we see and interact with daily, but it turns out even dictionaries have a hard time providing accurate definitions. Of many attempts, one definition [7] perhaps best captures the essence of the phenomenon:

Shadow [is] the region of space for which at least one point of the light source is occluded.

By inspecting the wording closely it can be inferred that indirect illumination is excluded entirely from the definition. While that may seem strange, it's an appropriate distinction: in rendering, shadows normally imply direct illumination, whereas indirect illumination typically falls under the field of ambient occlusion. With a definition in place to base the rest of the work on, it is also useful to define shadows properly using a mathematical formulation.

Let's start with the definition of the fundamental equation in computer graphics, the rendering equation [11]:

L_o(p, \omega) = L_e(p, \omega) + \int_{\Omega^+} f_r(p, \omega, \hat{\omega}) \, L_i(p, \hat{\omega}) \, \cos(n_p, \hat{\omega}) \, d\hat{\omega}   (2.1)

It defines the outgoing light energy (radiance) L_o from a given surface point p and ray direction \omega. A surface point may be a producer of radiance (a light source), described by the term L_e. It may also receive and reflect radiance L_i from other surfaces, for any direction given by the integral over the hemisphere \Omega^+ around the surface normal n_p. How much of the incoming radiance from direction \hat{\omega} is reflected in the outgoing direction \omega is governed by the surface material properties f_r, also known as the Bidirectional Reflectance Distribution Function (BRDF). The BRDF can be anything from a constant value for diffuse materials (spreading energy equally in every direction), through simple shading models such as Lambertian and Blinn-Phong shading, to highly complex BRDFs such as Cook-Torrance and the Ward model, where the physical micro-facet structure of a surface is modelled. The rendering equation models the flow of radiance in a scene, but it's in practice impossible to solve analytically, and naive numerical solutions typically crumble due to the recursive nature of the integral. The entire discipline of computer graphics can be considered a collection of approximations to the rendering equation (or parts of it, at least).

One such subproblem is shadows. To begin carving out the shadow part of the integral, a surface reformulation of eq. 2.1 must first be done. By introducing a second point q in the scene S, substituting the incoming direction \hat{\omega} by the direction p \to q = (q - p)/\|q - p\| and setting L_i(p, p \to q) = L_o(q, q \to p), we obtain:

L_o(p, \omega) = L_e(p, \omega) + \int_{S} f_r(p, \omega, p \to q) \, L_o(q, q \to p) \, G(p, q) \, V(p, q) \, dq   (2.2)

G(p, q) = \frac{\cos(n_p, p \to q) \, \cos(n_q, q \to p)}{\|p - q\|^2}   (2.3)

Where V denotes a visibility function, being either 1, for visible surfaces, or 0 for occluded surfaces. G is referred to as the geometric term, representing the fraction of a surface point area that’s visible on the hemisphere.

For shadow computation a series of further simplifications can be made. If it's assumed that only direct illumination is of interest, then the recursive part of the integral can be removed (L_o is replaced with L_e, since L_o = 0 for non-light sources), and thus only points on the light source l are considered. In addition, a single light source is assumed, meaning that S = L, and the emissive part L_e of the receiving surface is ignored. This results in the direct-lighting equation:

L_o(p, \omega) = \int_{L} f_r(p, \omega, p \to l) \, L_e(l, l \to p) \, G(p, l) \, V(p, l) \, dl   (2.4)

Additionally, if it can be assumed that the distance to the light source is large and the light shape is simple, the geometric term G varies little, and if the BRDF is not very complex the integral can be split into a product of shading and visibility:

L_o(p, \omega) = \int_{L} f_r(p, \omega, p \to l) \, G(p, l) \, dl \cdot \frac{1}{\|L\|} \int_{L} L_e(l, l \to p) \, V(p, l) \, dl   (2.5)

If the emissive part L_e can be considered constant for all directions and points on the surface, what is left to evaluate is the visibility integral:

V_L(p) = \frac{1}{\|L\|} \int_{L} V(p, l) \, dl   (2.6)

The visibility integral is what is typically meant by (soft) shadow computation; shrinking the integration area L to a single point is what produces hard shadows. This formulation also intuitively connects back to the definition of shadows at the start of chapter 2: shadow is the region of space p where at least one point of the light source is occluded, V_L(p) \neq 1.

2.1 Shadow Techniques

To evaluate eq. 2.6, and V in particular, many different algorithms have been developed over the years. The two fundamental techniques that constitute the basis of most shadowing algorithms are Shadow Mapping [26] and Shadow Volumes [5]. In general, shadow maps are considered faster and more flexible, since they can render any type of geometry, while shadow volumes require shadow casters to be represented as polygonal geometry in order to extract a shadow silhouette volume. Shadow volumes do, on the other hand, produce perfectly sharp shadows, whereas shadow mapping suffers from precision issues such as jagged shadow edges and shadow biasing.

Most modern games use shadow mapping, and the Frostbite game engine, in which the work presented here has been implemented, is no exception. For this reason shadow mapping and related techniques will be presented in more depth.

2.1.1 Shadow Volumes

In contrast to shadow mapping, where the scene is converted to a set of sampled depth values, shadow volumes [7] process the shadow casters in the scene with the goal of creating a new set of primitives, volumes, that enclose the shadow cast by each primitive. The volume is defined by the shadow casting primitive (e.g. a triangle) and three quads that extend from the primitive towards infinity. The quads are defined by edge vertices that extend towards infinity along a line from the point light source through the triangle vertices. Any point can then be tested for occlusion by the primitive by being either inside or outside of the constructed shadow volume. The challenge of shadow volumes lies in efficient construction of the volumes. A naive solution would be to extrude a shadow volume for each triangle via a special vertex shader and do the shadow volume inclusion/exclusion test in the pixel shader. However, this would result in many expensive screen-filling quads, which is highly impractical on modern hardware, so in order to accelerate this test triangles may be merged by forming a shadow silhouette volume.

2.1.2 Shadow Mapping

Shadow mapping as described in [26] is one of the most common methods to produce realistic shadows for interactive rendering on modern graphics hardware. The basic idea is to, in addition to the main view, render the scene from the point of view of the light source (such as the sun, a spotlight etc.). Each rendered pixel in the Render Target View (RTV) then by definition represents a lit surface, since nothing is occluding the light source. To evaluate any point in the scene, a simple lookup in the shadow map is enough to determine whether the surface is lit or not. While simple in theory, a number of issues reveal themselves when considering the fact that the shadow map is a discretely sampled function, meaning that any such lookup will suffer from numerical imprecision in the depth storage and from spatial under-sampling.

Each surface depth fragment is stored away in a frame buffer, as seen in fig. 2.1. Both depth maps must be continuously updated each frame to account for both object and camera movement. Depending on the light source type, different projections may be used. For a very distant light source an orthographic projection may be used, as the light rays may be approximated as parallel with respect to each other. For directional light sources on a smaller scale, such as spotlights, perspective projection is typically used.
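To make the lookup above concrete, here is a minimal CPU-side sketch of the depth-compare, assuming a hypothetical ShadowMap container of normalized light-space depths and a fragment already transformed into shadow-map coordinates; it is illustrative only and not the engine implementation.

// Minimal CPU-side sketch of the basic shadow-map test described above.
// Assumes the surface position has already been transformed into light
// space and that the shadow map stores normalized depths in [0, 1].
#include <algorithm>
#include <vector>

struct ShadowMap {
    int width = 0, height = 0;
    std::vector<float> depth;                 // depths rendered from the light
    float at(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return depth[y * width + x];
    }
};

// u, v: shadow-map coordinates in [0, 1]; receiverDepth: fragment depth in
// light space; bias: constant depth bias to suppress shadow acne.
// Returns 1 if the fragment is lit, 0 if it is in shadow.
float hardShadow(const ShadowMap& sm, float u, float v,
                 float receiverDepth, float bias = 0.002f) {
    int x = static_cast<int>(u * (sm.width - 1));
    int y = static_cast<int>(v * (sm.height - 1));
    float occluderDepth = sm.at(x, y);
    return (receiverDepth - bias <= occluderDepth) ? 1.0f : 0.0f;
}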

Issues exist in general with the shadow mapping approach that need to be addressed before it becomes a usable technique in a general purpose game engine. A typical game level requires shadows to be cast both by small objects on the ground and by huge mountains close to the horizon. The massive range in shadow caster scale poses some challenges for the shadow map technique, as a typical depth RTV currently is in the range of 1024x1024 to 2048x2048 pixels for performance and memory reasons. To solve this issue Cascaded Shadow Maps (CSM) can be used, as explained in sec. 2.1.4, and solutions to shadow map aliasing will be presented in sec. 2.2.

Figure 2.1: Left: view rendered from the main camera with the corresponding depth buffer. Center: the shadow map containing depth values rendered from the light source. Right: the shadow map combined with the main view to produce a hard shadow.

2.1.3 Shadow Acne

Due to the numerous quantisation steps involved in the shadow generation, including the limited resolution of the shadow map and of the stored depth, both for the depth fragment in camera space and in shadow space, shadow acne can occur. It happens when the depth value in the shadow map and the back-projected value differ by less than what can be accurately represented by the floating point numbers, and inconsistent results occur across the surface.

Figure 2.2: Left: the shadow acne issue. Right: shadow acne fixed using a normal offset bias.

To avoid incorrect self-shadowing, a depth bias on the shadow test is required. The depth bias works by offsetting the shadow map depth values away from the light source by a certain amount. While it's possible to reduce and even eliminate shadow acne completely using this technique, it comes with a penalty in the form of light leaking, also known as Peter Panning.

The observation can be made that the steeper the receiver plane angle is in relation to the shadow surface, the larger the depth bias needs to be in order to avoid self-shadowing. For this reason each rendered polygon is also offset along the normal direction by a factor based on the slope of the geometry relative to the light source, see fig. 2.3. The downside is that both the constant and the slope bias need to be manually tuned.

2.1.4 Cascaded Shadow Maps

One drawback of the shadow map approach in sec. 2.1.2 is the aliasing caused by the resolution discrepancy between screen space resolution and shadow map resolution. Each depth fragment in the shadow map will typically cover a large number of pixels in screen space, known as perspective aliasing. To optimise the shadow texel to screen space pixel ratio, a CSM [6] [7] technique is often utilised.

Figure 2.3: Left: shadow acne caused by depth fragment imprecision. Self-shadowed areas are marked in red. Center: depth bias along the light direction. Right: depth bias applied along the polygon normal.

The motivation behind CSM is quite intuitive: different areas of the view frustum require different shadow map resolutions. For this reason the view frustum is split into N sections along the view depth, where each section is rendered into a separate shadow map. Normally each cascade is rendered at the same resolution, in order for the shadow maps to be dynamically indexed in the shader using a texture array. The further away from the camera a cascade is located, the larger the area the shadow map covers, thus creating a variable shadow texel to world space coverage.

Figure 2.4: Left: Visualisation of a viewing frustum using 3 separate shadow cascades. Right: CSM coverage in screen space using 5 cascades.

2.2 Shadow Map Filtering

If not addressed, the hard shadow border suffers from aliasing and jagged edges. Increasing the shadow map resolution to a degree where the aliasing disappears is usually not feasible in practice. Instead the shadow map can be anti-aliased in the pixel shader during shadow map back-projection. Some solutions applied to reduce shadow map aliasing may coincidentally also be used to approximate soft shadows, as explained further in sec. 2.3.


2.2.1 Percentage-Closer Filtering

By extending the shadow test to a neighbourhood of pixels in the shadow map and averaging the result, anti-aliasing of the shadow border can be achieved using the Percentage Closer Filtering (PCF) [18] formula:

f_{pcf}(t, \tilde{z}) = \sum_{t_i \in K} k(t_i - t) \, H(z(t_i) - \tilde{z})   (2.7)

Where K is the filter kernel area, k is a filter weight function, H is the Heaviside step function and t is a coordinate in the shadow map coordinate system. The function returns a value in the range [0, 1].
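As an illustration of eq. 2.7, the sketch below evaluates the PCF sum on the CPU with a tent-shaped weight. The depthAt callable, the texel-space kernel and the function name are assumptions made for the example; a real implementation runs in the pixel shader and can use hardware filtering for part of the work.

// Sketch of the PCF sum in eq. 2.7 over a square kernel K with a tent
// (triangle) weight k. depthAt is any callable returning the shadow-map
// depth at integer texel coordinates.
#include <cmath>
#include <cstdlib>
#include <functional>

float pcf(const std::function<float(int, int)>& depthAt,
          int cx, int cy,             // centre texel of the kernel K
          float receiverDepth,        // z~, receiver depth with bias applied
          int radius)                 // kernel half-width in texels
{
    float sum = 0.0f, weightSum = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            // Tent weight k(t_i - t): 1 at the centre, falling off linearly.
            float w = (1.0f - std::abs(dx) / (radius + 1.0f)) *
                      (1.0f - std::abs(dy) / (radius + 1.0f));
            // Heaviside step H(z(t_i) - z~): 1 when the stored depth is at
            // least the receiver depth, i.e. the receiver is lit.
            float lit = (depthAt(cx + dx, cy + dy) >= receiverDepth) ? 1.0f : 0.0f;
            sum += w * lit;
            weightSum += w;
        }
    }
    return sum / weightSum;            // lit fraction in [0, 1]
}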

The choice of filter weight function k greatly impacts the look of the penumbra region. Selecting k as a constant with equal weights for each sample produces banding artefacts, so instead a tent shaped filter kernel is commonly used; in fact the tent shaped kernel is implemented natively in graphics hardware. Another option for large kernels is to randomly select samples inside the filter region; different ways to select this sampling pattern are described in sec. 2.6.1. Another aspect to consider for large filter regions is the potential issue of self-shadowing. Tilted geometry may suffer from self-shadowing when the shadow map geometry starts to deviate from the constant depth \tilde{z}. A solution for this is to reconstruct the receiver plane [7] by transforming screen space depth derivatives to shadow coordinates:

\begin{pmatrix} \partial\tilde{z}/\partial u \\ \partial\tilde{z}/\partial v \end{pmatrix} = \begin{pmatrix} \partial u/\partial x & \partial u/\partial y \\ \partial v/\partial x & \partial v/\partial y \end{pmatrix}^{-T} \begin{pmatrix} \partial\tilde{z}/\partial x \\ \partial\tilde{z}/\partial y \end{pmatrix}   (2.8)

The larger the filter region K, the more expensive the computation becomes, and for very large kernels it becomes prohibitively expensive. For this reason multiple algorithms to pre-process the shadow values in light space have been developed, which will be outlined in the following sections. A key insight is that the desired operation is not a convolution of the shadow map depth values (that would simply lead to nonsensical results) but a convolution of the depth test. Pre-convolving the shadow map then becomes a problem due to the non-linearity of the PCF equation (eq. 2.7); strategies to linearise the equation are used in Convolution Shadow Maps (CSM) and Exponential Shadow Maps (ESM), explained in the following sections. Another approach is to analyse the shadow map statistically through Variance Shadow Maps (VSM).

Variance Shadow Maps

VSM [?] provide an upper bound on the fraction of occluded fragments by converting the shadow map from a per-pixel depth representation to a per-pixel depth distribution. The distribution is approximated by storing the mean and variance of the depth values per pixel and using Chebyshev's inequality to calculate an upper bound on the shadow value on one side of the distribution.

E(x) = \int x \, p(x) \, dx   (2.9)

E(x^2) = \int x^2 \, p(x) \, dx   (2.10)

The depth variance \sigma_z^2 and the expected value \mu_z are then retrieved:

\mu_z = E(x)   (2.11)

\sigma_z^2 = E(x^2) - E(x)^2   (2.12)

The upper bound can then be retrieved via Chebyshev's inequality:

P(x \geq \hat{z}) \leq p_{max} \equiv \frac{\sigma_z^2}{\sigma_z^2 + (\hat{z} - \mu_z)^2}   (2.13)

Meaning that the maximum possible occlusion value p_{max} can be found via this statistical relationship. If a single planar receiver and occluder is assumed, the upper bound is promoted to an equality, as explained in further detail in the original work [?], but for larger deviations from this assumption incorrect shadow results can be produced; for large values of \sigma_z^2 'light leaking' may occur. Note that Chebyshev's inequality only applies for \hat{z} > \mu_z; for points where this is false, the surface point is considered unshadowed.
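The bound in eq. 2.13 amounts to a few arithmetic operations per pixel. The sketch below assumes the two moments E(x) and E(x^2) have already been fetched from the filtered variance shadow map; the variance clamp is a common practical safeguard rather than part of the derivation.

// Sketch of the Chebyshev upper bound in eq. 2.13.
#include <algorithm>

float vsmVisibility(float m1, float m2, float receiverDepth) {
    // Chebyshev's inequality only applies when the receiver lies beyond the
    // mean; otherwise the point is treated as fully lit.
    if (receiverDepth <= m1)
        return 1.0f;
    float variance = std::max(m2 - m1 * m1, 1e-6f);   // sigma_z^2, clamped
    float d = receiverDepth - m1;                     // (z^ - mu_z)
    return variance / (variance + d * d);             // p_max, eq. 2.13
}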

Shadow Map Prefiltering

Prefiltering of the shadow map requires a linearisation of the shadow test in eq. 2.7. The idea is to approximate the shadow test function H in such a way that it becomes a sum of linear factors separable from \hat{z}. First the test is redefined as s(z, \hat{z}) = H(z - \hat{z}), then basis functions of s are defined such that:

\sum_i \alpha_i \, s(B_1(z_i), B_2(z_i), \ldots, \hat{z}) = s\left(\sum_i \alpha_i B_1(z_i), \sum_i \alpha_i B_2(z_i), \ldots, \hat{z}\right)   (2.14)

This is true when s can be defined as a series of additive basis functions:

s(z, \hat{z}) = \sum_k a_k(\hat{z}) \, B_k(z)   (2.15)

Then the convolution of the basis functions can be computed in a separate step:

B_k^{conv}(z) = \sum_{t_i \in K} k(t_i - t) \, B_k(z(t_i))   (2.16)

Now the question is which basis functions to use. The authors in [2] investigated different linearisation schemes, via Taylor expansion and other continuous approximations of the sigmoid, but found that Fourier expansion gave the best results. A reason for this is the shift-invariance property of the Fourier expansion, meaning it has a constant precision along the function domain, as opposed to the Taylor expansion whose accuracy diverges for larger \Delta z = |z - \hat{z}|. The Fourier expansion is defined as an infinite series of sine and cosine terms:

s \approx s_f = \frac{1}{2} + 2 \sum_{k=1}^{M} \frac{1}{c_k} \cos(c_k z) \sin(c_k \hat{z}) - 2 \sum_{k=1}^{M} \frac{1}{c_k} \sin(c_k z) \cos(c_k \hat{z})   (2.17)

Numeric values of c_k can be found in the original work [2]. In practice a finite number of terms will be used; the authors suggest 16 terms for good results, meaning that up to 32 values need to be stored per pixel depth value. Additionally, the sinusoidal nature of the Fourier expansion suffers from ringing artefacts, where over- and under-shoot is present for low values of M. To reduce this issue the authors suggest attenuating each term with a power increasing exponentially relative to the frequency. The attenuation comes with a trade-off in accuracy, as it will decrease the steepness of the step function, leading to imprecision in the shadow test.

Closely related to convolution shadow maps are Exponential Shadow Maps [3]. They are based on a similar idea of linearisation of the shadow test, but use an additional assumption on the blocker distance: z < \hat{z}. This allows the step function to be approximated via a single exponential:

s = \lim_{\alpha \to \infty} e^{-\alpha(z - \hat{z})}, \quad \text{if } z < \hat{z}   (2.18)

The constant is in practice chosen as a large number (\alpha \to c), making the function separable: s = e^{-cz} e^{c\hat{z}}. A high value of the constant c results in a better approximation of the step function, but requires increasing precision of the underlying floating point format. The authors suggest using c = 80 for 32-bit floats, as it seems to provide a good trade-off between precision and accuracy of the approximation. Furthermore, the assumption that z < \hat{z} is bound to be violated under normal circumstances. While it can be argued that it's often true, classification of failure cases still needs to be performed to avoid pixels where the exponential runs off to infinity. One solution is a z-max acceleration structure that allows for a quick lookup of the minimum depth value of an arbitrary region (similar to the movement map in sec. 2.5.1). A less conservative approximation, prone to inaccuracy, is threshold classification, which checks if the ESM value is "too big": s > 1 + \epsilon, where the \epsilon parameter needs to be configured by hand. Once a failure case has been identified, the failed pixels resort to regular PCF. Larger filter areas are more likely to violate the depth distance condition, making ESM better suited for smaller kernels.
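Once the exponential term has been pre-filtered, the per-pixel test is a single multiply and exponential. The sketch below is a minimal CPU version under the z < \hat{z} assumption, with a simple threshold clamp standing in for the failure-case classification; names and the choice of fallback are illustrative.

// Sketch of the exponential shadow test (eq. 2.18) in its separable form
// s = e^{-c z} * e^{c z^}. The pre-blurred exp(-c * z) term would normally
// come from the filtered shadow map; here it is passed in directly.
#include <algorithm>
#include <cmath>

float esmShadow(float filteredExpDepth,   // filtered exp(-c * z) over the kernel
                float receiverDepth,      // z^
                float c = 80.0f)          // suggested constant for 32-bit floats
{
    float s = filteredExpDepth * std::exp(c * receiverDepth);
    // When the z < z^ assumption is violated s can exceed 1; a simple
    // threshold classification clamps it (a fallback to PCF could be
    // triggered here instead).
    return std::min(s, 1.0f);
}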

2.3 Soft Shadows

Shadow mapping as described in sec. 2.1.2 is primarily a binary test: a surface point is either in shadow or it's lit. In reality shadows are more nuanced (literally): a surface point may be either fully or partially occluded by the light source, see eq. 2.6. This results in shadow silhouette regions ranging from partially lit, the penumbra, to fully occluded, the umbra, see fig. 2.5. Despite the computational convenience of approximating light sources as single points, it's often not enough to properly capture all features of shadows for realistic rendering. Even the sun, whose distance to the earth measures in millions of kilometres, will cast shadows with penumbra regions of significant size. When talking about soft shadow rendering, it's mainly this non-binary nature of a light receiving point that is of interest. In this section rectangular area lights will be in focus. Algorithms to evaluate arbitrary shapes exist but will not be presented here.


Figure 2.5: Definition of fully occluded, umbra, and partially occluded, penumbra, regions.

2.3.1 Percentage-Closer Soft Shadows

A popular algorithm to generate soft shadows for real-time applications is PCSS [8]. Due to its simplicity and its improvement over regular shadow mapping, it has gained popularity as the go-to soft shadow method for games. The main idea behind PCSS is to use the anti-aliasing properties of PCF in order to approximate a solution to the visibility integral, eq. 2.6.

The algorithm works by assuming that the light source, the (single) occluding surface and the receiving surface are all parallel to each other. This allows a simple formula using similar triangles to approximate the penumbra size if the blocker distance is known, see fig. 2.6:

w_{penumbra} = \frac{z_{receiver} - z_{occluder}}{z_{occluder}} \, w_{light}   (2.19)

Where z_{receiver} is the distance from the light source to the receiver, z_{occluder} is the distance from the light source to the occluder and w_{light} is the size of the light source, typically controlled by the application.

To approximate the distance to the occluder, the average depth of the blockers inside the blocker search region K_b is found by:

z_{occluder} = \frac{1}{n} \sum_{t_i \in K_b} z(t_i) \, H(z(t_i) - \tilde{z})   (2.20)

The same planar depth bias used for PCF may be applied in this step. Once the average occluder depth and w_{penumbra} have been calculated, a suitable PCF kernel size w_f is found via projection to the shadow map near plane:

w_f = \frac{z_{nearplane}}{z_{receiver}} \, w_{penumbra}   (2.21)

Intuitively the result makes sense: the PCF kernel w_f shrinks when the average blocker distance becomes smaller and grows as the distance becomes bigger. However, it should be pointed out that this approximation of the visibility integral, eq. 2.6, is made under very generous assumptions: rarely will the parallel occluder assumption hold, since even in the most simple scenes the occluder(s) is most likely a more interesting geometric shape than a plane. Additionally, approximating the occlusion fraction via a PCF kernel size is generally incorrect as well.


Figure 2.6: A schematic of PCSS and the variables involved.

Despite this, the results produced by PCSS are visually pleasing, and the method is relatively cheap to run and implement, considering the alternatives.
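For reference, a compact CPU sketch of the three PCSS steps is given below; the generalised version used in this work is listed in alg. A.2. The depthAt accessor, the dense loops and the constants are placeholders for illustration, a real implementation would use the stochastic sample patterns discussed in sec. 2.6 and far fewer taps.

// Illustrative sketch of PCSS: blocker search, penumbra estimation
// (eq. 2.19) and PCF with the projected kernel size (eq. 2.21).
#include <algorithm>
#include <functional>

float pcssShadow(const std::function<float(int, int)>& depthAt,
                 int cx, int cy,      // receiver position in the shadow map
                 float zReceiver,     // light-space depth of the receiver
                 float lightSize,     // w_light, expressed in texels here
                 float zNearPlane)
{
    // 1. Blocker search: average depth of texels closer than the receiver.
    int searchRadius = std::max(1, static_cast<int>(lightSize));
    float blockerSum = 0.0f; int blockerCount = 0;
    for (int dy = -searchRadius; dy <= searchRadius; ++dy)
        for (int dx = -searchRadius; dx <= searchRadius; ++dx) {
            float z = depthAt(cx + dx, cy + dy);
            if (z < zReceiver) { blockerSum += z; ++blockerCount; }
        }
    if (blockerCount == 0)
        return 1.0f;                                 // no blockers: fully lit
    float zOccluder = blockerSum / blockerCount;

    // 2. Penumbra width from similar triangles (eq. 2.19) and the filter
    //    size projected to the near plane (eq. 2.21).
    float wPenumbra = (zReceiver - zOccluder) / zOccluder * lightSize;
    float wFilter   = wPenumbra * zNearPlane / zReceiver;

    // 3. PCF over the estimated kernel.
    int radius = std::max(1, static_cast<int>(wFilter));
    float lit = 0.0f; int taps = 0;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx, ++taps)
            lit += (depthAt(cx + dx, cy + dy) >= zReceiver) ? 1.0f : 0.0f;
    return lit / taps;
}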

In reality, the excessive number of shadow map samples required for large penumbra search regions has called for acceleration structures and pre-filtering of the shadow map. The most common of those algorithms, which are typically an extension of the prefiltering outlined in sec. 2.2, will be presented here.

Variance Soft Shadow Maps

Variance Soft Shadow Maps (VSSM) [28] combine the PCSS framework with the acceleration obtained from the statistical analysis of variance shadow maps (sec. 2.2.1). Before explaining the theoretical foundation, it should be noted that VSSM uses two complementary acceleration structures: a Summed Area Table (SAT) that stores per-pixel variance and average depth, and a min-max hierarchical shadow map that is used to quickly decide, for an arbitrary area, whether a point is fully lit or fully occluded. For all other points the variance shadow map technique is extended to solve soft shadows. For those points, the average blocker depth in a region of N samples can be found via:

\frac{N_1}{N} z_{unocc} + \frac{N_2}{N} z_{occ} = \mu_z   (2.22)

Where z_{unocc} is the average depth of the non-blocking texels (depth \geq \hat{z}) and z_{occ} is the average depth of the blocking texels (depth < \hat{z}). The ratio N_1/N can be found via Chebyshev's inequality (see sec. 2.2.1) as P(x \geq \hat{z}), and consequently N_2/N = 1 - P(x \geq \hat{z}), resulting in the approximation of the average occluder depth:

z_{occ} = \frac{\mu_z - P(x \geq \hat{z}) \, z_{unocc}}{1 - P(x \geq \hat{z})}   (2.23)

While \mu_z is known, z_{unocc} is not. The unoccluded average is approximated as z_{unocc} = \hat{z}, which can be justified on the basis of the parallel blocker and receiver assumption, see fig. 2.6. It should be noted that this is only valid when the "planarity" condition \mu_z \geq \hat{z} holds. Contrary to VSM, where the point is simply considered lit when planarity is false, VSSM solves the problem by applying hierarchical subdivision of the filter kernel until the condition holds true, or the kernel is sufficiently small to allow for a cheap PCF evaluation.

2.3.2 Soft Shadows via Prefiltering

Just as linearisation of the depth test is used to produce a pre-convolved shadow map for edge smoothing, it can be used to efficiently produce soft shadows with varying penumbra. The key insight of convolution soft shadow maps [1] is that the average blocker depth of depth values < \hat{z} can also be expressed as an expansion in the Fourier basis, referred to as CSM-Z (see the original work [1] for more details). The variable filter kernel is then approximated using either texture pyramids or SATs. This technique suffers from the same ringing artefacts that convolution shadow maps do (sec. 2.2.1), and from the consequent ringing suppression techniques that brighten the shadows at contact points.

2.4 Screen Space Soft Shadows

The goal of screen space soft shadows is not to achieve accurate results, but rather to produce a visually pleasing soft penumbra region using a screen space filter. The filter being used typically requires a kernel size \sigma (eqs. 2.42, 2.43) to be defined. Intuitively, the larger the penumbra is, the bigger we can allow the kernel to be without a sharp contact shadow becoming overly blurred. Besides this, a large penumbra region will be noisier due to the larger area of integration, meaning a larger kernel may be required.

As a proxy for the kernel size in screen space, the initial penumbra estimation step of the PCSS algorithm can be utilised. We can use this information to guide the kernel size, as suggested by [15], by projecting the penumbra to screen space via:

d_{screen} = \frac{1}{2\tan(\frac{fov}{2})}   (2.24)

w_{screen\_penumbra} = w_{penumbra} \cdot \frac{d_{screen}}{d_{eye}}   (2.25)

Where fov is the camera field of view, d_{eye} is the screen space depth and w_{penumbra} is the penumbra defined in eq. 2.19. A filter kernel size w_{screen\_penumbra} that approximates the penumbra from the average blocker depth can then be used.
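The projection in eqs. 2.24-2.25 is a handful of operations per pixel; a small sketch, with illustrative parameter names, is shown below.

// Sketch of eqs. 2.24-2.25: projecting the light-space penumbra estimate
// to a screen-space kernel size. fovY is in radians, eyeDepth is the
// linear view-space depth of the pixel.
#include <cmath>

float screenSpacePenumbra(float wPenumbra,   // eq. 2.19, light-space units
                          float fovY,        // vertical field of view
                          float eyeDepth)    // d_eye
{
    float dScreen = 1.0f / (2.0f * std::tan(fovY * 0.5f));   // eq. 2.24
    return wPenumbra * dScreen / eyeDepth;                   // eq. 2.25
}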

2.5 Temporal Reprojection

Evaluating the PCSS value for each pixel is an expensive operation, since it involves multiple texture lookups in the shadow map, both for the blocker estimation step and for the filtering step. The evaluation cost may be reduced efficiently by resorting to different types of sampling approximations (see sec. 2.6) that reduce the number of samples required dramatically. Unfortunately this introduces unwanted noise in the penumbra region. By assuming that the scene objects and the shadow casters are static, it's possible to amortise the evaluation of eq. 2.7 over multiple frames. This amortisation is performed by a history texture lookup, transforming the world position of the object rendered at pixel coordinate (u, v) using the current and previous view (V_n) and projection (P_n) matrices:

p_{n-1, u', v'} = P_{n-1} V_{n-1} V_n^{-1} P_n^{-1} p_{n, u, v}   (2.26)

This process is visualised in fig. 2.7 [16] [20]. Looking up the shadow value in frame n - 1 requires interpolation, considering that (u', v') normally won't end up at an exact pixel centre. The Graphics Processing Unit (GPU) supports linear interpolation natively in hardware, making it an attractive option due to its speed and simplicity. Higher order interpolation methods can be used for better quality (such as bicubic or Catmull-Rom interpolation), however these need to be implemented in software and require additional samples.

Once the shadow value for an arbitrary frame can be retrieved, the pixel shadow value can be calculated through the average:

F_n(p) = \frac{1}{k} \sum_{i=0}^{k-1} f_{n-i}(p)   (2.27)

Where F_n is the PCSS shadow value displayed for frame n and p is the pixel location being evaluated. For obvious reasons the entire history of frame buffers is not cached; instead only the last few frames are stored. Rather than averaging the N previous frames, exponential filtering can be used:

F_n(p) = \alpha f_n(p) + (1 - \alpha) F_{n-1}(p), \quad \alpha \in [0, 1]   (2.28)

Where \alpha is a tuneable smoothing factor; lower values converge more slowly but smooth more aggressively, while higher values respond faster but suppress less noise. \alpha is typically in the [0.05, 0.2] range, but tuning this parameter is entirely up to the application. Another benefit of the exponential filter is that it only requires a single texture cache to be stored in memory, as opposed to eq. 2.27 which requires the last k frames to be stored. For cache lookups that fail, either by deviating significantly in world space position or by falling outside the camera frustum, the cache is considered invalid and the accumulation is reset (\alpha = 1). Reprojection of shadows in scenes with dynamic objects requires extra care, as explained further in sec. 2.5.1.
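The accumulation step itself is trivial once the history value has been reprojected; a minimal sketch is shown below, assuming the history lookup of eq. 2.26 has already produced prevShadow and that historyValid comes from the frustum/position tests and the movement map of sec. 2.5.1.

// Sketch of the exponential history filter in eq. 2.28 with the reset
// behaviour described above.
float accumulateShadow(float currentShadow,   // f_n(p), this frame's PCSS value
                       float prevShadow,      // F_{n-1}(p), reprojected history
                       bool  historyValid,
                       float alpha = 0.1f)    // smoothing factor, typically 0.05-0.2
{
    if (!historyValid)
        alpha = 1.0f;        // reset: ignore the stale history entirely
    return alpha * currentShadow + (1.0f - alpha) * prevShadow;
}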

2.5.1 Movement Map

The reprojection cache for shadows involves an additional level of complexity when considering the problem of moving shadow casters and the impact they can have on distant shadow receivers. There is a need to evaluate not only the movement of the shadow receiving surface; any shadow casting surface that may impact the shadow map lookup must also be taken into account. The key is to find a way to (quickly) decide for a given pixel whether or not anything in the penumbra region has changed since the previous frame, and if it has, invalidate the reprojection cache. One solution would be to store the previous frame's shadow map and compare a given region to the current frame's shadow map; if no delta is found then reprojection is valid. Unfortunately, performing such an operation would be prohibitively expensive, the cost would equal a brute-force PCSS approach, making it pretty much useless. To accelerate this test a movement map has been proposed [21]. The key idea is to render each moving object into a render target of the same resolution as the shadow map as a binary mask, with each pixel being either 1 or 0 depending on whether it is covered by a dynamic object or not. The render target is then used to produce a MIP-map pyramid. The process of producing a MIP-map cascades the binary values upwards in the mip chain; any region in the movement map containing a non-zero value indicates movement, and thus invalidation of the shadow cache.

Figure 2.7: The visualisation of a screen space reprojection vector.

The movement map search region r_{search} is calculated as follows:

r_{search} = \frac{z_{receiver} - d_{nearplane}}{z_{receiver}} \, w_{light}   (2.29)

Once the search area is found, the movement map mip level l_{mip} to sample is calculated:

l_{mip} = \lceil \log_2(2 \cdot r_{search} \cdot w_{SM}) \rceil   (2.30)

The shadow cache test then becomes a simple binary test: if the movement map value at MIP level l_{mip} is non-zero, the shadow cache is invalidated for that pixel. A practical aspect to consider is the bit depth of the movement map; if chosen too low in relation to the shadow map resolution, information risks getting lost at higher mip levels. To avoid this issue, the minimum bit depth of a fixed-point (UNORM) format can be decided according to the following formula:

N_{bit} \geq \lceil 2 \log_2 w_{SM} \rceil   (2.31)

Where N_{bit} is the minimum bit depth needed to represent movement for an arbitrary movement map at resolution w_{SM}. In practice this can be relaxed somewhat, considering that the search area rarely covers the entire movement map.
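A sketch of the resulting cache test is shown below. It assumes a CPU-side, non-empty mip chain of the movement map and w_light expressed as a fraction of the shadow map; the exact reduction mode used when building the mips does not matter for the binary non-zero test.

// Sketch of the movement-map cache test from eqs. 2.29-2.30.
#include <algorithm>
#include <cmath>
#include <vector>

bool penumbraMoved(const std::vector<std::vector<float>>& mip, // mip[level][y * w + x]
                   int shadowMapSize,                          // w_SM
                   float u, float v,         // receiver position in the map, [0, 1]
                   float zReceiver, float dNearPlane, float lightSize)
{
    // Search radius around the receiver, eq. 2.29 (fraction of the map).
    float rSearch = (zReceiver - dNearPlane) / zReceiver * lightSize;
    rSearch = std::max(rSearch, 1.0f / shadowMapSize);   // guard against log2(0)
    // Mip level whose texels cover the whole search region, eq. 2.30.
    int lMip = static_cast<int>(std::ceil(std::log2(2.0f * rSearch * shadowMapSize)));
    lMip = std::clamp(lMip, 0, static_cast<int>(mip.size()) - 1);

    int w = std::max(1, shadowMapSize >> lMip);
    int x = std::min(static_cast<int>(u * w), w - 1);
    int y = std::min(static_cast<int>(v * w), w - 1);
    // Any non-zero value means a dynamic object touched the search region,
    // so the reprojection cache must be invalidated for this pixel.
    return mip[lMip][y * w + x] > 0.0f;
}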


2.6 Monte Carlo Methods

The visibility integral, eq. 2.6, can be approximated using PCSS, but it still requires integration over a potentially large domain. A common technique to approximate multi-dimensional integrals that are either not possible to solve analytically or too expensive to solve in practice is to apply a family of methods known as Monte Carlo Methods. Fundamentally, they work by randomly sampling values f(x_i) over the integration domain \Omega and then estimating the integral using the mean-value theorem:

I = \int_{\Omega} f(x) \, dx   (2.32)

I \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i), \quad x_i \in \Omega   (2.33)

The choice of sample sequence \{x_1, \ldots, x_N\} \in \Omega can greatly impact the overall convergence rate and the error bound for low values of N. When the sample sequence is not chosen from a pseudorandom sequence, but from a low-discrepancy sequence instead, the method is considered a Quasi-Monte Carlo Method (QMCM). While QMCMs have a greater convergence rate than per-pixel uniform sequences, they are prone to aliasing due to reuse between pixels; for this reason a re-randomised version of the sample sequence can be used, known as a Randomized Quasi-Monte Carlo Method (RQMCM).
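A minimal, runnable sketch of the estimator in eq. 2.33 is shown below, here with plain pseudorandom (white noise) samples on the unit square; the low-discrepancy sequences described next would simply replace the sample generator.

// Monte Carlo estimate of a 2D integral over the unit square, eq. 2.33.
#include <cstdio>
#include <random>

template <typename F>
double monteCarlo(F f, int n, unsigned seed = 1) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += f(uni(rng), uni(rng));   // f(x_i)
    return sum / n;                     // (1/N) * sum of f(x_i)
}

int main() {
    // Example: integrate f(x, y) = x * y over [0, 1]^2 (exact value 0.25).
    double estimate = monteCarlo([](double x, double y) { return x * y; }, 1 << 16);
    std::printf("estimate = %f\n", estimate);
}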

2.6.1 Sampling strategies

To evaluate eq. 2.33 a sequence of random numbers needs to be generated. Different sampling strategies are available for this purpose, each with different characteristics that may be evaluated and analysed. One important feature of a sample sequence is the equidistribution of its sample points, also known as discrepancy [23]. Low discrepancy noise is also often referred to as Blue Noise [25]. More formally, a signal is said to exhibit blue noise properties when it has a weak low-frequency component, as opposed to a White Noise signal that has an equal distribution of frequencies. For practical purposes it's also worth considering whether or not the algorithm is hierarchical, meaning that for two sample sequences P_N and P_{N+1} the relationship P_{N+1} = \{P_N, p_{N+1}\} holds true, so the sample sequence can increase in size without the need to recompute the entire sequence.

Below, different sampling algorithms are defined for a set of sample points P_N = \{p_0, \ldots, p_{N-1}\} on the two dimensional unit square, p_i \in [0, 1]^2:

Regular Sampling. Each sample point is placed in a regular grid-like pattern, see fig. 2.8. A non-hierarchical, low discrepancy sample sequence in O(N) will be produced. This pattern suffers from aliasing and banding artefacts, and is typically avoided in practice.

Uniform Random Sampling. The benchmark sampling strategy; it's free of aliasing but may produce unwanted clustering of sample points due to its high discrepancy. The algorithm is hierarchical in O(N), see fig. 2.8.

Figure 2.8: Left: Regular grid pattern with N = 256 samples, see alg. 1. Right: Uniform grid pattern with N = 256 samples, see alg. 3.

Poisson Disk Sampling. Poisson disk sampling distributes samples over the sample space such that no two samples in the sequence have a distance less than a specified radius r: |p_i - p_j| > r for all p_i, p_j \in P_N. It will produce a non-hierarchical, low discrepancy sample sequence. Alg. 4 has a lower bound of O(N^2). Although more sophisticated methods exist in O(N) [4], for offline calculations and small datasets the dart throwing technique, alg. 4, may suffice. Variants of Poisson disc sampling exist where a relaxation of the fixed radius requirement is used [14]. The idea is to begin with an initial radius and, after (many) failed attempts (see alg. 4, line 5), reduce the radius. This process produces a Poisson disc distribution that tries to maximise the radius while at the same time producing a sequence that is better suited for progressive (hierarchical) sampling.

Figure 2.9: Left: Poisson disk sampling with N = 256, as described in alg. 4. Center: the first 16 samples (highlighted in blue) in the sample sequence resemble a high discrepancy sequence with a high concentration of samples in the top right and bottom right corners. Right: the first 16 samples highlighted; the sample set is generated using a hierarchical placement of points according to [14] and may be suitable for cases where progressive sampling is needed.

Hammersley Points. Hammersley sampling is a popular technique in numerical and graphics applications, with the general form defined in eq. 2.34 [27]:

k = a_0 + a_1 p + a_2 p^2 + \ldots + a_r p^r, \quad a_i \in [0, p-1]   (2.34)

Where the non-negative integer k is represented in a prime base p. The radical inverse \Phi is then defined as:

\Phi_p(k) = \frac{a_0}{p} + \frac{a_1}{p^2} + \frac{a_2}{p^3} + \ldots + \frac{a_r}{p^{r+1}}   (2.35)

To generate Hammersley points on a 2D plane, eq. 2.36 can be used (for a definition in n dimensions the reader is referred to [27]):

p_i = \left(\frac{k}{n}, \Phi_{p_1}(k)\right) \quad \text{for } k = 0, 1, 2, \ldots, n-1   (2.36)

Where p_1 = 2 is a popular choice for graphics applications, as the radical inverse of a binary number has a fast implementation (see listing B.3); this is also known as the Van der Corput sequence.

Halton Point Set. If close attention is paid to eq. 2.36, it can be observed that the sequence is not hierarchical, as increasing the number of sample points (n) requires the entire sequence to be recalculated. For applications where N is not known in advance, a hierarchical version can be constructed, also known as the Halton point set [27]:

p_i = (\Phi_{p_1}(k), \Phi_{p_2}(k)), \quad p_1 \neq p_2   (2.37)

This removes the problematic k/n term in eq. 2.36 and replaces it with another Van der Corput sequence. The Halton sequence requires two different primes, typically p_1 = 2 and p_2 = 3.
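The radical inverse and both point sets amount to only a few lines of code. The sketch below is a straightforward CPU version for illustration; the listings in appendix B contain the variants used in the implementation.

// Radical inverse (eq. 2.35) and the Hammersley (eq. 2.36) and Halton
// (eq. 2.37) point sets built from it.
#include <cstdio>
#include <utility>

// Mirror the base-p digits of k around the radix point.
double radicalInverse(unsigned k, unsigned p) {
    double inv = 0.0, base = 1.0 / p;
    while (k > 0) {
        inv += (k % p) * base;
        k /= p;
        base /= p;
    }
    return inv;
}

// Hammersley: (k / n, Phi_2(k)); requires the total count n up front.
std::pair<double, double> hammersley(unsigned k, unsigned n) {
    return { static_cast<double>(k) / n, radicalInverse(k, 2) };
}

// Halton: (Phi_2(k), Phi_3(k)); hierarchical, n need not be known.
std::pair<double, double> halton(unsigned k) {
    return { radicalInverse(k, 2), radicalInverse(k, 3) };
}

int main() {
    for (unsigned k = 0; k < 8; ++k) {
        std::pair<double, double> h = halton(k);
        std::printf("halton[%u] = (%.4f, %.4f)\n", k, h.first, h.second);
    }
}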

Figure 2.10: Left: Hammersley sequence with N = 256 samples, see alg. 2. Right: Halton sequence with N = 256 samples.

2.6.2 Blue Noise Dithering

Just as sample de-correlation is necessary within the per-pixel sample sequence, de-correlation between the sample sequences of neighbouring pixels is also desired. For instance, the scrambling scheme described in sec. 2.6.3 rotates each sample point along a vector drawn from a random number distribution. However, to avoid aliasing caused by neighbouring pixels using too similar rotation vectors, a blue noise dithering texture [9] can be used to retrieve the rotation offset. The blue noise dithering technique takes a white-noise texture as input and attempts to minimise the energy function E by randomly swapping pixel pairs via simulated annealing:

E(M) = \sum_{p \neq q} E(p, q) = \sum_{p \neq q} \exp\left(-\frac{\|p_i - q_i\|^2}{\sigma_i^2} - \frac{\|p_s - q_s\|^{d/2}}{\sigma_s^2}\right)   (2.38)

Where M is a 2-dimensional white-noise input texture with d-dimensional pixel intensity values. The variables p and q denote a pixel pair in the white noise texture, the subscript i indicates the pixel coordinate and s the sample value. \sigma_i and \sigma_s are the two tuneable parameters; as suggested by [9], letting \sigma_i = 2.1 and \sigma_s = 1 yields pleasing results. Pixel distance also needs to be calculated across the wrapping texture borders, so that the resulting texture tiles seamlessly.

The resulting blue noise texture has some interesting properties: due to its weak low-frequency amplitudes it is much easier to denoise the result, see fig. 2.12. Additionally, the raw, not yet denoised, noise is arguably more visually pleasing, as seen in the top row of fig. 2.12.

Figure 2.11: Blue noise dithering as described by eq. 2.38. Left: a reference white noise texture, also chosen as input M to eq. 2.38. Right: the resulting blue noise texture after 2^16 iterations with \sigma_i = 2.1 and \sigma_s = 1 as suggested by [9]. The images are 256 pixels wide and 256 pixels tall.

Figure 2.12: Top left: dithering using the white noise texture. Top right: dithering using the blue noise texture from fig. 2.11. Bottom row: Gaussian blur applied to the images in the top row; both pictures use the same kernel size.


2.6.3 Scrambling

If the sample sequence is expensive to compute it may have to be calculated offline (Poisson disc sampling, for instance, see sec. 2.6). While that can easily be done for small data sets, it's still desirable to let each pixel have a unique set of samples in order to avoid aliasing. However, generating a unique sample sequence per pixel offline may become prohibitively expensive, since both data size and bandwidth may become performance limiting factors. It's therefore desirable to generate a pseudo-random sequence with the same properties as a Poisson disk sequence but still make the sequence unique per pixel, also known as scrambling.

To generate a pseudo-random distribution of samples per pixel, a Poisson disc distribution can be rotated and shifted. This retains the low discrepancy property of the samples while still producing a unique set of samples per pixel.

Cranley-Patterson Rotation

Figure 2.13: Cranley-Patterson rotation of a Poisson disc sample set with vector offset \xi_{x,y} = [0.05, 0.05] (blue = original, red = scrambled).

Cranley-Patterson rotation [13] is a simple technique to scramble samples distributed over a unit square. Each point in the sample sequence is offset by a vector and wraps around at the borders:

\hat{p}_i = (p_i + \xi) \mod 1   (2.39)

Where \hat{p}_i is the scrambled version of p_i, generated by adding a random vector \xi to the sample sequence. The Poisson disc sample sequence should be generated with this in mind, so that the minimum disc radius is respected also across the wrapped-around border.

2.7 Denoising

Since the per-pixel shadow value will be approximated using Monte Carlo Methods (MCM) and QMCMs, the output will be noisy.

The sampled PCSS approximation is noisy by nature; for that reason it may be necessary to denoise the screen space shadow mask for a visually pleasant result [19]. There are also techniques that rely entirely on screen space denoising of a hard (binary) shadow map [29] [15], using a kernel size based entirely on the penumbra width.


The final shadow penumbra result can be denoised over the screen space domain; below, a set of different filtering algorithms is defined.

2.7.1 Box and Median Filter

The median filter is often used to reduce noise in images, and salt and pepper noise in particular. It works by evaluating the neighbouring pixels and replacing the value with the median of the evaluated pixels. This way, highly deviating pixels that are unrepresentative of the signal will be erased. The box filter takes the average value in the kernel; effectively it works as a low pass filter and is a simple way to reduce high frequency noise.

Figure 2.14: Left: Example of a 3x3 box filter. Right: Example of a 3x3 median filter.

2.7.2 Gaussian Filter

The gaussian smoothing operator can be used to filter images and remove unwanted noise and sampling artefacts. It’s a popular technique used in many different image processing applications, including computer graphics. It’s effectively a low pass filter, removing high frequency details while keeping the low frequency characteristics intact. The 2D gaussian kernel has the form:

G_\sigma(x) = \frac{1}{2\pi\sigma^2} e^{-x^2 / 2\sigma^2}   (2.40)

I'(p) = \sum_{q \in S} G_\sigma(\|p - q\|) \, I(q)   (2.41)

Where q are pixel coordinates in a pixel region S around the pixel centre p, \sigma is a tuneable parameter controlling the smoothness of the image (a larger value results in a wider blur kernel) and I is the pixel intensity value (the shadow mask value). The standard Gaussian filter is a low-pass filter suitable for reducing noise in images. While it's extremely effective at reducing noise, it may fail to respect hard boundaries in the image where bleeding is unwanted. A closely related filter is the bilateral filter [24]. It is defined as a weighted average of nearby pixels, similar to eq. 2.41, with an additional weighting factor taking the pixel intensity values into account:

I'(p) = \frac{1}{W_p} \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|I_b(p) - I_b(q)|) \, I(q)   (2.42)

W_p = \sum_{q \in S} G_{\sigma_s}(\|p - q\|) \, G_{\sigma_r}(|I_b(p) - I_b(q)|)   (2.43)

The bilateral filter is thus a combination of a spatial distance and an intensity distance, and it requires an additional weighted average to be computed for the normalisation factor W_p. The guide value I_b is normally the same as the intensity (I_b = I), but as suggested in [12] it can also be the depth value of the screen space depth buffer, I_b = z. This way the edges in screen space will be respected and shadow bleeding across unrelated geometry is largely avoided.

The number of texture lookups needed to evaluate the kernel S grows quadratically with the kernel width. It's possible to separate the filtering into two passes, one vertical followed by one horizontal, reducing the per-pixel cost to linear in the kernel width. For the Gaussian filter this operation is a pure optimisation, meaning it will produce the same output in both the separable and non-separable case. For the bilateral filter it's considered an approximation, but due to the large speedup it is often preferred over the more expensive (although accurate) alternative. The separable bilateral filter may produce artefacts such as streaks.
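As an illustration of eqs. 2.42-2.43, the sketch below evaluates the depth-guided bilateral filter at a single pixel in one (non-separated) pass; the image container and parameter names are placeholders, and the separable variant would run a 1D version of the same loop once per axis.

// Depth-guided bilateral filter at pixel (px, py), eqs. 2.42-2.43.
#include <cmath>
#include <vector>

struct Image {                     // trivial float image, illustrative only
    int w = 0, h = 0;
    std::vector<float> v;
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= w ? w - 1 : x);
        y = y < 0 ? 0 : (y >= h ? h - 1 : y);
        return v[y * w + x];
    }
};

float bilateralAt(const Image& shadow, const Image& depth,
                  int px, int py, int radius,
                  float sigmaS, float sigmaR)
{
    float sum = 0.0f, wSum = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            float ds = static_cast<float>(dx * dx + dy * dy);         // spatial term
            float dr = depth.at(px, py) - depth.at(px + dx, py + dy); // range term (I_b = z)
            float w  = std::exp(-ds / (2.0f * sigmaS * sigmaS)
                                - dr * dr / (2.0f * sigmaR * sigmaR));
            sum  += w * shadow.at(px + dx, py + dy);
            wSum += w;             // normalisation factor W_p, eq. 2.43
        }
    return sum / wSum;
}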

Chapter 3

Implementation

The algorithm presented here is based on a few assumptions regarding GPU hardware and performance characteristics. First of all, it's assumed that aggressively reducing the number of PCSS samples for the blocker and filter search is necessary to achieve performance suitable for real-time rendering. It is also assumed that memory bandwidth is a limiting factor in general, leading to strategies to compress the data being read and written as much as possible, since having multiple full screen rendering passes processing the shadow mask results in many roundtrips to GPU main memory and back.

The second assumption is that denoising is necessary, and that reducing noise is a desired result. The way the algorithm is connected to the engine requires no major changes in the rendering pipeline, PCSS can simply replace the standard shadow rendering algorithms and everything else can be processed as a post processing pass.

As a reference to benchmark the combination of techniques (temporal filter, spatial filtering and sampling) both from a performance and quality perspective, a naive PCSS pass was implemented. A generalised implementation of PCSS can be found in alg. 7.

3.1 Sampling

A 2-channel 64x64 blue noise texture generated offline is tiled over the screen space pixels. The texture pixel value is used to scramble the samples from the pre-generated Poisson disc samples using Cranley-Patterson rotation (sec. 2.6.3). A new set of Poisson disc samples is used each frame in order to get temporally varying samples; this helps the temporal reprojection to converge faster.

3.2 Movement Map

First, all moving objects in the current frame are rendered into a number of movement maps with the engine shadow maps bound as depth read-only render targets, as described in sec. 2.5.1. Since the shadow map resolution exceeds 1024, the R8_UNORM bit format suggested in [21] cannot be used; instead the texture format R16_UNORM is used. After all moving objects have been rendered, a mip chain is generated per movement map. The movement map lookup is based on the kernel size of the average blocker. In order to avoid a temporally unstable kernel size, the blocker search samples are reused each frame.

3.3 Data Format

The PCSS render target is created using R16G16_FLOAT. The 32 bits per pixel contain the shadow accumulation value, the average blocker distance, and a reprojection cache invalidation flag based on the movement map lookup.

Figure 3.1: Per pixel data layout from the PCSS render pass. The number per field represents the number of bits used to store the fixed precision decimal value.

In fig. 3.1 the packing structure is visualized. The shadow value represents the PCSS filtered output in [0, 1]. The average blocker depth is stored as fp16. The last bit is reserved for a boolean indicating whether the movement map lookup invalidates the frame cache, i.e. whether the reprojection is valid.
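As a rough sketch of such a packing (the exact bit layout is defined by fig. 3.1, so the split below, with a 15-bit shadow value, a 1-bit validity flag and a 16-bit blocker depth, is an assumption), a CPU-side equivalent could look as follows:

```cpp
#include <cstdint>
#include <cmath>

// Hypothetical packing of the 32-bit per-pixel PCSS payload: the shadow
// value quantized to 15 bits plus a 1-bit cache invalidation flag in the
// first 16-bit channel, and the average blocker depth (already fp16 encoded
// elsewhere) in the second channel. The real layout is defined by fig. 3.1.
uint32_t packPcssData(float shadow, uint16_t blockerDepthFp16, bool cacheInvalid) {
    uint32_t shadowBits = static_cast<uint32_t>(std::round(shadow * 32767.0f)) & 0x7FFFu;
    uint32_t flag = cacheInvalid ? 1u : 0u;
    uint16_t channelR = static_cast<uint16_t>((shadowBits << 1) | flag);
    return (static_cast<uint32_t>(blockerDepthFp16) << 16) | channelR;
}

void unpackPcssData(uint32_t packed, float& shadow, uint16_t& blockerDepthFp16, bool& cacheInvalid) {
    uint16_t channelR = static_cast<uint16_t>(packed & 0xFFFFu);
    cacheInvalid      = (channelR & 1u) != 0u;
    shadow            = static_cast<float>(channelR >> 1) / 32767.0f;
    blockerDepthFp16  = static_cast<uint16_t>(packed >> 16);
}
```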

3.4 Shadow Post Processing

Once the shadowmap has been evaluated, a screen space buffer containing the per pixel PCSS data from sec. 3.3 is available (fig. 3.2, step 2). In order to reduce noise in the shadow penumbra region a sequence of filtering steps has been implemented. Observing the blocker search condition in alg. A.2 on line 13, it is possible to conclude that a surface is fully lit if no blockers are found. Unfortunately, this may lead to incorrectly lit surfaces in areas where occluders cover a tiny percentage of the blocker search area, leading to salt and pepper noise in the penumbra region. To reduce this noise a median filter (sec. 2.7.1) is applied to the penumbra mask.
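A minimal sketch of a 3x3 median filter over the penumbra mask is given below, assuming a simple single-channel float image representation; in the actual pipeline this would run as a full screen GPU pass.

```cpp
#include <algorithm>
#include <vector>

// 3x3 median filter over a single-channel mask; effective against the
// salt-and-pepper noise produced when the blocker search misses occluders.
void median3x3(const std::vector<float>& src, std::vector<float>& dst,
               int width, int height) {
    dst.resize(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float window[9];
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    int sx = std::clamp(x + dx, 0, width - 1);
                    int sy = std::clamp(y + dy, 0, height - 1);
                    window[n++] = src[sy * width + sx];
                }
            }
            std::nth_element(window, window + 4, window + 9);
            dst[y * width + x] = window[4];   // median of the 9 samples
        }
    }
}
```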

3.5 Algorithm Overview

The high level frame setup is visualized in fig. 3.2. The engine already supplies a state of the art implementation of shadow mapping, shadow bias offset, CSM and PCF. For this reason none of those implementations have been modified. The engine also provides a buffer containing velocity vectors that is shared amongst any render pass that requires temporal accumulation: ambient occlusion, screen space raytracing and temporal anti-aliasing, to name a few. The velocity buffer contains precomputed screen space uv-coordinates to perform quick history lookups, in accordance with eq. 2.26. The engine expects a full screen buffer containing per-pixel shadow information in [0, 1] as input to the deferred lighting render passes. This buffer is later used to modulate the per pixel colour according to the BRDF; more information about the Frostbite™ lighting model can be found in [22].


Inside the Soft Shadow Render Module the movement maps are produced in a separate step. As pointed out in sec. 2.5.1, they could be bound as a second RTV during shadow map generation for efficiency; however, that optimization is left for future work.

Figure 3.2: Overview of the different render passes in the frame. The left section contains already existing render passes and buffers; on the right side are the new passes used to produce soft shadows.

During the PCSS render pass (fig. 3.2, step 2), the shadow maps are queried according to the movement map cache invalidation scheme outlined in sec. 2.5.1; if movement is detected, a higher number of taps is performed. The default blocker sample count is 16, and either 16 or 64 filter samples are selected depending on the frame cache validity.
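The tap count selection reduces to a trivial branch; the sketch below uses the constants stated above, with the surrounding types being placeholders.

```cpp
// Select filter tap counts based on temporal cache validity: when the
// history is invalid (movement detected) more taps are spent to compensate
// for the missing accumulated samples.
struct SampleCounts { int blockerTaps; int filterTaps; };

SampleCounts selectSampleCounts(bool frameCacheValid) {
    SampleCounts c;
    c.blockerTaps = 16;                        // default blocker search count
    c.filterTaps  = frameCacheValid ? 16 : 64; // spend more when history is invalid
    return c;
}
```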

The PCSS sample pattern is drawn from a precomputed hierarchical poisson disk distribution (sec. 2.6.1) containing 64 samples. Each point is rotated according to the Cranley-Patterson rotation in sec. 2.6.3. By tiling a pre-generated 64x64 blue noise texture (sec. 2.6.2) containing 2D offset coordinates over the screen, a low discrepancy rotation vector is generated per pixel to de-correlate neighbouring pixels. See sec. 4.1.

Once the PCSS result has been approximated, it is combined with the history buffer from previous frames using temporal reprojection and exponential filtering (eq. 2.42) in step 3.
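A minimal sketch of this accumulation step is shown below, under the assumptions that the history UV comes from the precomputed velocity buffer and that an exponential moving average with blend factor α is used; the function names and the example α are placeholders.

```cpp
#include <functional>

// Exponential temporal accumulation: reproject into last frame's shadow
// buffer using the velocity buffer UV and blend the new PCSS estimate with
// the history. When the movement map flagged the cache as invalid the new
// sample is used directly.
float temporalBlend(
    const std::function<float(float u, float v)>& sampleHistory,
    float historyU, float historyV,       // from the engine velocity buffer
    float currentShadow, bool cacheValid, float alpha /* e.g. 0.1 */) {
    if (!cacheValid)
        return currentShadow;             // history discarded, start over
    float history = sampleHistory(historyU, historyV);
    return alpha * currentShadow + (1.0f - alpha) * history;
}
```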

Finally, in step 4, the separable bilateral filter from sec. 2.7.2 is applied, with world space positions used as the range weight. The average blocker depth is used to derive the kernel width. The output is sent to the default engine lighting code.


Chapter 4

Results

This chapter presents the results of the implementation, where different measurements of performance and visual quality are compared and reviewed. The shadow computation cost depends on the composition of the shadow casters and shadow receivers in the scene. All captures are done at a screen-space resolution of 1280x720 with 4 CSM cascades, each with a resolution of 2048x2048. The GPU used is a GeForce RTX 2080 Ti. The engine's default implementation of CSM has not been modified. To have a baseline implementation against which all optimisations can be evaluated, a brute force sampling technique was implemented as a reference. It works by sampling the entire PCSS kernel both for the average blocker search and for the filtering step. The least expensive type of scene is one with no shadow casters, since the PCF step and the screen space denoising step may be skipped entirely. Conversely, a scene with large and distant shadow casters, resulting in large penumbras, will be more expensive due to the increased number of texture fetches needed by large PCF and screen space kernels. For this reason the penumbra of a distant shadow caster (see fig. 4.1) covers most of the screen (see figures in sec. 4.1) in order to simulate an expensive setting, such that the performance measurements in fig. 4.13 approach an upper bound. The close-up of the penumbra also supports the visualisation of the effect of different sampling patterns. In sec. 4.1 a selection of the different sampling patterns described in sec. 2.6.1, with sample counts of 16x16 and 64x64, can be viewed. The results highlight a significant difference in penumbra noise depending on the sampling pattern being used.

Section 4.2 highlights the effect of denoising the penumbras from sec. 4.1 using a fixed kernel size (fig. 4.10). The overblurring when not using the bilateral filter can be seen in fig. 4.11. In fig. 4.13 the performance scaling for all combinations of 8, 32, 64 and 128 blocker and filter taps can be viewed. As expected, the execution time scales linearly with the number of texture taps, which suggests that the GPU is memory limited and unable to hide texture fetch latency behind ALU instructions. This data supports the assumed optimization strategy of reducing texture fetches.

The final results can be viewed in sec. 4.3, where blue noise sampling with screen space denoising is compared to a high sample count white noise PCSS implementation (fig. 4.12). GPU timings report a difference of an order of magnitude while still obtaining a similar representation of the penumbra. The denoising results in a slightly enlarged penumbra, but this may be very difficult to spot.

Figure 4.1: A PCSS reference scene with a single shadow caster; a region with a large penumbra is highlighted.

4.1 Sampling


Figure 4.3: White Noise + Uniform sampling. Left: 16x8 spp. Right: 64x64 spp.

Figure 4.4: White Noise + Poisson Disc Sampling. Left: 16x8 spp. Right: 32x32 spp.


Figure 4.6: Blue Noise + Poisson Disc Sampling. Left: 16x8 spp. Right: 32x32 spp.


4.2 Denoising

Figure 4.8: Left: White noise + Hammersley sampling. Right: White noise + Poisson disc sampling.

Figure 4.9: White noise + Hammersley sampling


4.3 Result

Figure 4.12: Top: 256 blocker samples and 256 filter samples, render time 20.12 ms. Bottom: Blue Noise + Hammersley sampling using 8 blocker samples and 8 filter samples with bilateral filter, render time 1.51 ms (PCSS: 0.63 ms + Bilateral: 0.88 ms).


4.4 Performance

Figure 4.13: The number of milliseconds for varying numbers of PCSS blocker and filter taps. The data indicates that a low number of samples, fewer than 8x8, is needed to stay below the millisecond range. Timings displayed are measurements of average frame time over 400 frames.


Chapter 5

Discussion

The measurements clearly show that, at reference sampling rates, the cost grows beyond any reasonable budget for a real-time experience, even in the simplest of scenes. Resorting to MCM is clearly a massive win in performance; however, the noise introduced in the penumbra region may be distracting, and thus performance is simply traded for quality. Moving to QMCM (and RQMCM in particular) paired with blue noise dithering greatly improves the penumbra noise pattern.

Applying screen space denoising using the bilateral filter improves the overall quality; however, it deviates from the reference implementation quite a lot in certain scenarios where the penumbra size varies. Additionally, it requires a lighting pipeline that stores the per pixel shadow mask in a separate render target, which is not always the case, for instance when forward rendering is used. Denoising the penumbra before using the penumbra itself as an estimator for kernel size turned out to be a decent workaround, despite lacking any justification beyond solving a practical problem. Tuning the penumbra scale factor in screen space to reflect the world space penumbra (and thereby skipping the hand-tuning step) would be a requirement for a generic engine implementation.

Temporal accumulation of shadows does a great job at improving quality; however, it turned out to be very difficult to make it work well in the general case. Properly invalidating history buffer samples also turned out to be a difficult task, since the heuristics at hand are approximate and difficult to tune. Most of the time there is a small movement of some shadow caster, resulting in a large area of the screen being invalidated and thus requiring more expensive recalculation. Rotating samples using a time seed (in addition to a pixel seed) for temporally scrambled samples also causes a slight variation in the shadow penumbra region, despite exponential smoothing with a very aggressively chosen α. The movement map, as originally presented in [21], comes with drawbacks of its own: the worst case performance is lower than for regular PCSS, since the generation of the movement map can be very expensive if many objects are in motion at the same time. It also has a significant memory overhead, adding an additional 75% memory cost for each shadowmap.


Chapter 6

Conclusion

A method to improve a noisy, subsampled PCSS implementation has been presented. It is possible to improve performance by orders of magnitude while still retaining a perceptually pleasing result. By temporally reprojecting shadow values over multiple frames a noise free image can be constructed, but it suffers greatly from ghosting artefacts, and in areas where the temporal cache is invalidated the noisy result persists. Choosing a low discrepancy sampling sequence has been shown to be effective in creating a well distributed penumbra region. The conclusion is that temporal reprojection of shadowmaps (the movement map) is too conservative to be useful and too difficult to tweak to work well in the general case.

Everything has been implemented in a modern game engine; the techniques and passes used are easy to integrate and run very well on modern graphics hardware.

6.1 Future Work

Generalization

The work in this thesis is focused on directional light sources. While most applications will have some kind of sun light source present, generating soft shadows from spotlights is also desirable. With multiple light sources, denoising of the penumbra becomes a more complex issue, as the penumbra must be merged from all overlapping lights.

Blue Noise Dithering

The use of a blue noise texture significantly improves the shadow penumbra noise pattern. The argument can be made that the scrambling operation reduces the applicability of the energy function, as it fails to take into account the cyclic nature of the toroidal shift. A more generic approach has been introduced in [10], where the pixel intensity values are replaced with the Monte Carlo estimation error compared to the analytical solution of a family of integrand functions.

References
