
Department of Science and Technology

Institutionen för teknik och naturvetenskap

LiU-ITN-TEK-A--10/065--SE

GPU Accelerated Surface Reconstruction from Particles

Erik Edespong

2010-10-28

LiU-ITN-TEK-A--10/065--SE

GPU Accelerated Surface Reconstruction from Particles

Thesis work carried out in Media Technology at the Institute of Technology, Linköping University

Erik Edespong

Supervisor: Magnus Wrenninge
Examiner: Jonas Unger

Copyright (Upphovsrätt)

This document is made available on the Internet - or its future replacement - for a considerable time from the date of publication, provided that no extraordinary circumstances arise.

Access to the document implies permission for anyone to read, download, print out single copies for personal use, and to use it unchanged for non-commercial research and for teaching. Subsequent transfer of copyright cannot revoke this permission. All other use of the document requires the consent of the copyright owner. To guarantee authenticity, security and accessibility, there are technical and administrative measures in place.

The author's moral rights include the right to be mentioned as the author to the extent required by good practice when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or integrity.

For additional information about Linköping University Electronic Press, see the publisher's web page: http://www.ep.liu.se/

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

GPU Accelerated Surface Reconstruction from Particles

Erik Edespong

eried115@student.liu.se


Abstract

Realistic fluid effects, such as smoke and water, have been pursued by the visual effects industry for a long time. In recent years, particle simulations have gained a lot of popularity for achieving such effects. One problem noted by researchers has been the difficulty of generating surfaces from the particles. This thesis investigates current techniques for particle surface reconstruction. In addition, a GPU-based implementation using constrained mesh smoothing is described. The result is globally smooth surfaces which closely follow the distribution of the particles, though some problems are still apparent. The performance of the algorithm is approximately an order of magnitude faster than its CPU counterpart, but is held back by bottlenecks in sections still running on the CPU.

Contents

1 Introduction
  1.1 Method summary
  1.2 Structure of the report
2 Background survey
  2.1 Fluid simulation
  2.2 Particle surfacing
    2.2.1 Metaballs
    2.2.2 Surface splatting
    2.2.3 Level set based methods
    2.2.4 Beyond metaballs
    2.2.5 Anisotropic kernels
  2.3 Surface extraction
    2.3.1 Marching Cubes
    2.3.2 Marching Tetrahedra
    2.3.3 Marching Tiles
  2.4 Surface smoothing
    2.4.1 Unweighted graph Laplacian
    2.4.2 Weighted surface bilaplacian
    2.4.3 Solvers
    2.4.4 Constraints
3 GPU programming and CUDA
  3.1 CUDA
  3.2 Architecture
  3.3 Memory model
  3.4 Programming interface
    3.4.1 Optimizations
4 Implementation
  4.1 Uniform hashed grid
  4.2 Distance field
  4.3 Mesh creation
  4.4 Smoothing
    4.4.1 Finding matrices
    4.4.2 Solvers
  4.5 Constraints
  4.6 Erosion
  4.7 Attribute transfer
  4.8 Houdini
5 Results
  5.1 Computer specifications
  5.2 Example: Fountain
  5.3 Example: Splash
  5.4 Example: Color and velocity transfer
  5.5 Performance
6 Conclusion
  6.1 Future work

Chapter 1

Introduction

In the realm of visual effects, the creation of natural phenomena such as water, smoke and fire has been pursued since the early 80's, with the release of the movie Star Trek II: The Wrath of Khan [Par82][Ree83]. Since such phenomena are very complex, manual techniques such as key-framing are not a viable option. Instead, these types of effects are created by setting up physical models for the movement of the fluids and simulating the outcome. Because people are used to seeing, for example, water in real life, it is very important that the water acts and looks as expected if it is to be perceived as real; water is expected to have a certain viscosity, and if it runs too slowly it will be easily spotted.

Fluid simulation approaches are usually divided into two main categories: grid-based (Eulerian) approaches and particle-based (Lagrangian) approaches. Both have their own strengths and weaknesses, which will be discussed in Chapter 2.

As one of the world's top visual effects studios, Sony Pictures Imageworks has been working with creating natural phenomena for a long time. The company has proprietary software for both of the simulation approaches mentioned above, the latest addition being a Smoothed Particle Hydrodynamics (SPH) library by Lundqvist in 2009 [Lun09]. This library was integrated with Side Effects' Houdini [Sof10] for use by the artists in the production pipeline.

A remaining step in this pipeline is to convert the particle representation of the fluid to an actual renderable surface. For this task, a tool that ships with Houdini is used. The built-in tool does however have a number of shortcomings which make it unfavorable to use. First of all, it is slow when used with large numbers of particles over large simulation domains; it often takes more time to reconstruct a surface from the particles than it does to simulate their movement. Secondly, it is difficult to create smooth surfaces without significantly reducing the small-scale detail as well as introducing visually disturbing artifacts to the surface. Finally, the tool is quite hard to use for artists without a technical background, with many non-intuitive settings that need to be tuned to get a reasonable surface.

The objective of this thesis is to investigate the creation of surfaces from particle data and the capturing of details of small-scale fluid simulations such as objects falling into water and creating splashes. The goal is to create a prototype for this which is going to be integrated into Houdini. Moreover, the prototype should utilize the parallel capabilities of today’s GPUs for speed benefits. In addition to this, a CPU version should also be created for reference and for use on computers without sufficient GPU support.

1.1 Method summary

The method used in the implementation of this thesis is based on the work of Williams [Wil09]. The method creates an initial surface from a field where each voxel holds the distance to the nearest particle. The surface is created using a new approach based on Marching Tetrahedra called Marching Tiles. The initial surface is quite crude, so a number of iterative smoothing steps are applied in order to make the surface look good. Normally, the problem with smoothing is that surfaces collapse, but the approach applies constraints to the surface after each smoothing step in order to make sure that the surface still represents the underlying particle distribution.

1.2 Structure of the report

Chapter 2 introduces the necessary background for Eulerian and Lagrangian methods and surveys past as well as current techniques for particle surface reconstruction. In Chapter 3, GPU computing and the CUDA computing architecture are presented together with considerations for performance benefits. Implementation details for the necessary steps of particle surface reconstruction using the GPU are presented in Chapter 4. Chapter 5 presents the results of the thesis with both images and timing comparisons between the CPU and GPU versions. Finally, Chapter 6 presents conclusions and suggestions for further improvements.


Chapter 2

Background survey

This chapter begins with a description of the two main fluid simulation approaches. Following this, different methods for creating surfaces from particle data are presented, with a focus on methods that sample the particles onto an underlying field. Then, different ways of extracting surfaces from such fields are described. Finally, the constrained surface smoothing method used in the implementation of this thesis is described.

2.1 Fluid simulation

The goal when simulating natural phenomena is to solve equations which describe the motion of the fluid. A popular set of equations are the Navier-Stokes equations, formulated by Claude-Louis Navier in 1822 and by George Gabriel Stokes in 1845. These equations are used in a wide number of areas and were first used in computer graphics by Kajiya and Von Herzen in 1984 [KVH84]. The difference between computer graphics and many other areas, such as scientific simulation, is that the result does not need to be accurate, or even remotely correct. As long as the result looks realistic to the human eye, it is acceptable, and thus a lot of approximations and simplifications can be made. As noted in Chapter 1, there exist two main approaches to fluid simulation which are used for visual effects: Eulerian and Lagrangian, both of which are described below.


Eulerian approaches are characterized by a simulation domain that is discretized into a static three-dimensional field of voxels. The voxels in the field each hold physical properties such as velocity, density and temperature, and the values at each voxel are calculated by solving the governing equations in every time step. The field offers fast access to the data for any position within the field by means of interpolation from nearby voxels, which can be accessed instantly.

Although grid-based techniques have been the most common way of simulating fluids [Bri08], they have some inherent problems. The finite and fixed nature of the field used for computation can make the representation of fine details difficult. This can of course be solved by increasing the size of the field, however at great cost to computation and memory consumption. Data structures such as octrees have also been used by Losasso et al. [LGF04] for this problem, as well as hybrid methods such as Particle Level Sets. The Particle Level Set method, introduced by Enright et al. [EFFM02], spawns particles at strategic places in the field and then lets these be moved at each time step. The values of the field are then corrected by the particles if they differ too much. Still, even with more advanced methods, the representation of very fine detail is difficult due to the fixed nature of the grid.

The surface, or interface, of a fluid is implicit in Eulerian approaches. It is often represented by a level set φ where each point in the field holds the signed distance to the interface. Values of φ < 0 are considered to be inside the fluid while values of φ > 0 are outside, which implicitly places the surface of the fluid at φ = 0 [OF03].

Once the level set is available, retrieving the surface for rendering is trivial: a Marching algorithm such as Marching Cubes (see Section 2.3) is used to extract an iso-surface.

Whereas Eulerian approaches simulate the fluid at fixed points in space, Lagrangian techniques use a cloud of particles that moves with the fluid and represents it. One of the big advantages of this is that the simulation domain is unrestricted and not bound by any grid, allowing each particle to move freely.


While grid-based methods need to calculate values at each point in the field, even if there is no fluid in the voxel, particle-based approaches only need to do calculations where there are particles, since this is where the fluid is located.

Another advantage Lagrangian methods have is that each particle has a mass, making problems with mass and volume loss non-existent as long as the number of particles is kept constant.

One of the disadvantages with Lagrangian methods noted by researchers is the difficulty of extracting a high-quality renderable surface [EFFM02] [PTB+03].

With the increasing interest in Lagrangian methods, especially Smoothed Particle Hydrodynamics, surface reconstruction from particles has also gained some attention and is the main focus of this thesis.

It should be noted that the surfacing techniques below are not restricted to any particular simulation technique. Any algorithm that simulates particles and has the properties needed by the surfacing algorithms can be used. SPH is used in the examples and results because it is the main system targeted during the work of this thesis.

2.2 Particle surfacing

2.2.1 Metaballs

The field of reconstructing surfaces from particles got one of its first contributions from Blinn [Bli82] when he introduced blobbies, or metaballs, in 1982. The approach creates an influence field from the particles. Each point in the field evaluates the positions of the particles with a function and sums the results to get a final influence value. By doing this it is possible to get a smooth blending between particles, see Figure 2.1. A general description of how metaballs are constructed can be seen in Equation 2.1, where x_i is the position of the particle, x is the voxel position in the field and W is a function that describes the influence, also called a smoothing kernel.

The Gaussian function seen in Equation 2.2, with a and b being constants, is the influence function that Blinn used, but any function that is smooth and goes to zero at a maximum value can be used.

φ(x) = Σ_i W(x − x_i)    (2.1)

W(r) = a e^(−b‖r‖)    (2.2)

In the case of functions like that of Equation 2.2, it can be seen that the influence will drop off very rapidly for increasing distances between x and x_i, but never reaches zero. Because accuracy is not of critical importance, it is wise to specify a radius h, outside of which the particle influence is not considered. Equation 2.3 shows another smoothing kernel, used by Rosenberg et al. [RB08]. This kernel will be computationally faster than Equation 2.2, partly because of the lack of square roots and the exponential, but mainly because particles outside of h do not need to be processed.

W(r, h) = (‖r‖/(√2 h))^4 − (‖r‖/(√2 h))^2 + 0.25  if ‖r‖ < h,  0 otherwise    (2.3)

Once the field is computed, it is very easy to extract the isosurface using one of the methods described in Section 2.3.
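To make Equations 2.1 and 2.3 concrete, the sketch below evaluates the summed influence field on a voxel grid with one CUDA thread per voxel. It is not the thesis implementation: the brute-force loop over all particles stands in for the spatial search described in Section 4.1, and the struct, kernel and parameter names are illustrative only.

```cuda
#include <cuda_runtime.h>

// Hypothetical description of the sample grid (names are made up).
struct FieldDesc { int nx, ny, nz; float spacing; float3 origin; };

__device__ float kernelW(float3 r, float h)              // Equation 2.3
{
    float d2 = r.x * r.x + r.y * r.y + r.z * r.z;
    if (d2 >= h * h) return 0.0f;                         // finite support
    float s2 = d2 / (2.0f * h * h);                       // (|r| / (sqrt(2) h))^2
    return s2 * s2 - s2 + 0.25f;
}

__global__ void metaballField(const float3* particles, int numParticles,
                              float h, FieldDesc f, float* phi)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int numVoxels = f.nx * f.ny * f.nz;
    if (i >= numVoxels) return;

    // Voxel index -> world-space position of the voxel centre.
    int x = i % f.nx, y = (i / f.nx) % f.ny, z = i / (f.nx * f.ny);
    float3 p = make_float3(f.origin.x + x * f.spacing,
                           f.origin.y + y * f.spacing,
                           f.origin.z + z * f.spacing);

    float sum = 0.0f;                                     // Equation 2.1
    for (int j = 0; j < numParticles; ++j) {
        float3 r = make_float3(p.x - particles[j].x,
                               p.y - particles[j].y,
                               p.z - particles[j].z);
        sum += kernelW(r, h);
    }
    phi[i] = sum;
}
```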

Though they are easy to implement, metaballs have a number of disadvantages as well. The surface extracted from them is very dependent on the iso-level chosen for extraction as well as the distance h used in the summation [APKG07].

Figure 2.1: Surfacing using metaballs. When two particles are brought together, the combined influence will create a smooth blending (middle), but the technique is bad at representing flat surfaces (right).

Metaballs are also bad at representing flat surfaces, see the rightmost image of Figure 2.1, where the underlying spherical nature of the particle representation is clearly exposed. Adams et al. [APKG07] also note that disturbing temporal discontinuities appear when adding or removing particles.

2.2.2 Surface splatting

Another early technique, introduced by Reeves [Ree83], involves rendering particles used to model fuzzy objects by additively blending them onto the image plane. If there are enough particles, the individual particles will blend together and create a smooth surface. The downside of this is that the technique requires a large amount of particles to yield good results. The method also suffers from problems with single particles being visible at the edges of the surface [RB08]. This technique, also called splatting, is also used by Sims [Sim90], where it is implemented to render waterfalls, snow storms etc. In more recent work, billboards are used for surfacing data from SPH simulations by Müller [MCG03], but in subsequent papers this approach seems to have been abandoned [MSKG05].

Though it is easy to implement and gives plausible results for particle simulations [MCG03], it does not seem to have gained much traction in visual effects. Instead, the method has had more success in the fields of visualization of scanned point data and real-time applications [ZPvBG01][BHZK05][ALD06].

2.2.3 Level set based methods

Premože et al. [PTB+03] introduced a technique based on the level set method [OS88] to construct surfaces for their particle fluid simulations. The level set method is used to track the surface by first creating a force field from the particle distribution for the first frame and then using a partial differential equation to iteratively evolve the surface until it converges. For subsequent frames, the field from the previous frame is used for initialization, allowing for continuity of the surface between frames.

The downside with this method is that it seems to be very slow compared to other surface reconstruction algorithms. The authors also note that the method has difficulties with creating sharp boundaries when colliding with objects or other fluids.

2.2.4 Beyond metaballs

As described in Section 2.2.1, metaballs create a field by summing the influence of nearby particles, as explained by Equation 2.1. A problem with the early smoothing kernels is that if many particles are grouped closely together, the field will have very high values in that area, creating a large, unnatural blob. In 2003, Müller et al. [MCG03] used the density from their SPH simulation to normalize the particle contribution to the field, see Equation 2.4.

φ(x) = Σ_i m_i (1/ρ_i) W(x − x_i, h_i)    (2.4)

Noting that this approach solves "some scaling issues but does not significantly reduce the bumps on what should be flat surfaces", Zhu and Bridson [ZB05] proposed a different field creation method. It is based on the principle that the influence field should be represented by the signed distance field for lone particles. Given a single particle position x_0 with radius r_0, the resulting field for a lone particle is:

φ(x) = ‖x − x_0‖ − r_0    (2.5)

With this as a base, they formulate a generalized solution for an arbitrary number of particles by using a weighted average of the positions and radii:

φ(x) = ‖x − a(x)‖ − r(x)    (2.6)

a(x) = Σ_i w(x − x_i, h_i) x_i / Σ_i w(x − x_i, h_i)    (2.7)

r(x) = Σ_i w(x − x_i, h_i) r_i / Σ_i w(x − x_i, h_i)    (2.8)

where w is a smooth decaying weight function with finite influence radius h_i. It is easy to see that for particles with no neighbors within h_i, the field is reduced to the field of Equation 2.5. Because the approach uses averages of the particle positions, spurious blobs may appear in concave areas where the actual surface position does not correspond to the position given by Equation 2.6. Even so, the authors note a big improvement in comparison to previous techniques.

Figure 2.2: Comparison between metaballs (red line), Zhu et al. [ZB05] (green curve) and Adams et al. [APKG07] (blue line). Image from [APKG07].

Adams et al. [APKG07] improved the method of Zhu et al. [ZB05] by tracking the particle-surface distance for each particle using a signed distance field. They used this field in each time step to adjust the per-particle distance to the surface. By making use of this distance instead of the radius in Equation 2.8, together with the weight function in Equation 2.9, the authors note an improvement, as can be seen in Figure 2.2.

w(r, h) = (1 − (‖r‖/h)^2)^3  if ‖r‖ < h,  0 otherwise    (2.9)
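For reference, a sketch of how the field of Equations 2.6-2.8 can be evaluated at a single point, using a finite-support weight like that of Equation 2.9, is shown below. It is an illustration rather than code from the thesis; the neighbour list is assumed to come from some spatial search structure and all names are made up.

```cuda
#include <cuda_runtime.h>

__device__ float weightW(float3 r, float h)               // Equation 2.9
{
    float q = (r.x * r.x + r.y * r.y + r.z * r.z) / (h * h);
    if (q >= 1.0f) return 0.0f;
    float t = 1.0f - q;
    return t * t * t;
}

// Evaluate phi(x) = |x - a(x)| - r(x) from a precomputed neighbour list.
__device__ float averagedDistanceField(float3 x,
                                       const float3* pos, const float* radius,
                                       const int* neighbours, int numNeighbours,
                                       float h)
{
    float  wSum = 0.0f, rAvg = 0.0f;
    float3 aAvg = make_float3(0.0f, 0.0f, 0.0f);
    for (int k = 0; k < numNeighbours; ++k) {
        int j = neighbours[k];
        float3 d = make_float3(x.x - pos[j].x, x.y - pos[j].y, x.z - pos[j].z);
        float  w = weightW(d, h);
        wSum += w;
        aAvg.x += w * pos[j].x; aAvg.y += w * pos[j].y; aAvg.z += w * pos[j].z;
        rAvg  += w * radius[j];
    }
    if (wSum == 0.0f) return 1e30f;                        // no particle influence
    aAvg.x /= wSum; aAvg.y /= wSum; aAvg.z /= wSum;        // Equation 2.7
    rAvg   /= wSum;                                        // Equation 2.8
    float3 d = make_float3(x.x - aAvg.x, x.y - aAvg.y, x.z - aAvg.z);
    return sqrtf(d.x * d.x + d.y * d.y + d.z * d.z) - rAvg;  // Equation 2.6
}
```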

2.2.5 Anisotropic kernels

Basing their work on Müller et al. [MCG03] (Section 2.2.4), Yu and Turk [YT] note two fundamental problems which make the surfaces of the original approach look bumpy. The first problem is the nature of Lagrangian methods, which makes the distribution of particles very irregular. Secondly, the authors note that the spherical shape of the kernels used is not good for representing the particle density distribution near a surface.

To solve the problem of irregularity, the authors suggest the use of a smoothing step on the particle positions before the surface reconstruction. The smoothed particle centers are described by Equation 2.10, where w is a weighting function with finite support which determines the influence from nearby particles and λ a constant such that 0 < λ < 1.

x'_i = (1 − λ) x_i + λ Σ_j w_ij x_j / Σ_j w_ij    (2.10)

This equation can be solved for all positions simultaneously by setting up a sparse matrix system and using an iterative solver, which will be discussed further in Section 2.4.

To solve the second problem, Yu and Turk [YT] proposed a generalization of the smoothing kernels used by the techniques in the previous sections. With the exception of Premože et al. [PTB+03], who use the tangential and perpendicular velocity of the particles to scale their kernel, most other authors use spherically shaped kernels. Instead of using a constant support radius h, as in Equation 2.3, Yu and Turk propose that a positive definite matrix G should be used. This matrix is used to rotate and transform r to create a kernel that better represents the density near a particle. The matrix G is determined by performing a weighted version of Principal Component Analysis [KC03] on the particle positions within a neighborhood. This method constructs a weighted covariance matrix, C, and then performs an eigendecomposition where the eigenvectors give the principal axes and the eigenvalues give the scaling factors with which to scale the kernel.

Figure 2.3: Comparison between isotropic kernels and the kernel description presented by Yu and Turk. Figure from [YT].

The result of this is that kernels that lie near a surface are flattened in the normal direction and stretched in the tangential direction of the surface, while lone particles and particles within the volume remain spherical since the density is distributed equally in every direction. A comparison between this technique and Zhu et al. can be seen in Figure 2.3.

2.3 Surface extraction

The techniques described in the previous sections that build a field, φ, from particles need some kind of algorithm for turning the field into a polygon mesh. The most popular way of doing this is to create an iso-surface from a constant value in the field, which can be done in a number of different variations.

2.3.1 Marching Cubes

In 1987, Lorensen and Cline [LC87] introduced the Marching Cubes algorithm. The authors divide the field, φ, into cubes, where each corner is assigned to be either inside or outside the surface based on the iso-value selected. They then conclude that a polygon can only pass through the cube in a finite number of ways. For a cube, this number is 2^8 = 256, which can be reduced to 15 cases by using symmetry. Three of these cases can be seen in Figure 2.4. By using a lookup table it is possible to quickly get the topological structure of a cube given the inside/outside state of each corner. The final polygon can then be obtained by creating vertices at interpolated positions along the cube's edges. By doing this ("marching") for every cube in the field, a mesh can be constructed from the resulting polygons.

The problem with the original approach is the possibility of ambiguities in the cases, causing non-manifold geometry [Dü88]. Solutions for this problem have been introduced by, among others, [MSS94], [Che95] and [LLWVT03], who introduce complementary cases to the original 15 to solve the ambiguities.
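The core of the case lookup is compact enough to sketch: the inside/outside states of the eight corners form an 8-bit index into precomputed tables, and vertices are placed on the cube edges by linear interpolation of the field. The sketch below only shows these two pieces; the actual edge and triangle tables are the standard published ones and are omitted, and the names are illustrative.

```cuda
#include <cuda_runtime.h>

// The inside/outside state of the eight corners forms an 8-bit case index
// (2^8 = 256 cases) into the precomputed Marching Cubes tables.
__device__ int cubeCaseIndex(const float cornerPhi[8], float isoValue)
{
    int index = 0;
    for (int c = 0; c < 8; ++c)
        if (cornerPhi[c] < isoValue)   // corner is inside the surface
            index |= (1 << c);
    return index;                      // 0..255, used to look up the triangles
}

// Vertices are placed on cube edges by linear interpolation of the field.
// Assumes phiA != phiB, which holds when the corners straddle the iso-value.
__device__ float3 interpolateVertex(float3 pA, float3 pB,
                                    float phiA, float phiB, float isoValue)
{
    float t = (isoValue - phiA) / (phiB - phiA);
    return make_float3(pA.x + t * (pB.x - pA.x),
                       pA.y + t * (pB.y - pA.y),
                       pA.z + t * (pB.z - pA.z));
}
```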

Figure 2.4: Three of the fifteen base cases for Marching Cubes, with the inside/outside state of each corner visualized.

2.3.2 Marching Tetrahedra

Another method solving the ambiguity of Marching Cubes was introduced by Bloomenthal [Blo88], who proposed the use of tetrahedra instead of cubes. In addition to removing ambiguities, there are only three unique ways in which a polygon can pass through a tetrahedron, depicted in Figure 2.5, making implementation even easier. As noted by [SML], though, this technique creates surfaces with more triangles than Marching Cubes, and artificial bumps in the surface may appear due to the choice of orientation of the tetrahedra.

Figure 2.5: Shows the three ways that a polygon can pass through a tetrahedron.

2.3.3 Marching Tiles

In his thesis, Williams [Wil09] noted that any technique based on marching structures with dihedral angles greater than or equal to 90°, such as Marching Cubes or Marching Tetrahedra, can produce vertices with valence (the number of edges incident on a vertex) as low as four. Such vertices represent poor sampling of the underlying domain and can also cause problems when applying numerical operators such as smoothing on the surface.

To solve this problem, Williams introduces a new space-filling tile consisting of 46 different tetrahedra. This tile, as seen in Figure 2.6, is constructed such that each edge has at least five tetrahedra incident on it, thus guaranteeing a valence of at least five. The tile can be marched in the same way as with the other techniques, but some more lookups need to be done in order to get the correct tetrahedron for a given place in the field. The structure of the tile and how to construct it can be found in [Wil09].

Figure 2.6: The space-filling tile used by Williams from two different angles. A lookup table is used to find which tetrahedron to march depending on location in the field.

2.4 Surface smoothing

Mesh smoothing has been an area of active research for a long time and has many applications, such as denoising data and design of high-quality surfaces [BPK+07]. A popular method for smoothing meshes is called Laplacian smoothing, where the vertices of a mesh are gradually moved in the direction of the Laplacian of each vertex. Depending on the approximation used for the Laplacian, different results can be achieved [Bra04].

The concept of smoothing has been used in the creation of surfaces from particles before. For example, Houdini [Sof10] can use a Gaussian filter to smooth the field before creating the initial surface and also smooth the resulting mesh. While such techniques do create a smoother surface, the problem is that pre-surface filtering may create unwanted artifacts, as seen in Figure 2.7. Both filtering and post-surface smoothing also suffer from the lack of connection to the underlying particle representation and are thus not able to distinguish between noise and actual small-scale structures.

Figure 2.7: A surface created with the Houdini surfacer, with the underlying particles overlaid. Note that the surface does not coat all particles and that there are artifacts in the form of polygons where there are no particles.

In his thesis, Williams [Wil09] presents an approach for creating smooth and flat surfaces. Instead of using complex kernels to create the initial influence field, a distance field is used where each voxel holds the distance to the nearest particle. The surface is created at r_outer using Marching Tiles (Section 2.3.3).

The surface is then iteratively smoothed while applying constraints to make sure that the surface stays true to the underlying particles. The basic idea of the method can be seen in Figure 2.8.

The method poses the smoothing as a problem of constrained nonlinear optimization, and two different quadratic objective functions are used for measuring the smoothness of the surface.

Figure 2.8: The vertices of the constructed surface (red line) will be located between r_outer and r_inner when the algorithm is finished. Figure from [Wil09].

2.4.1 Unweighted graph Laplacian

The first smoothing step used by Williams is based on an approximation of the Laplacian that uses the inter-vertex connectivity of the mesh, introduced by [Tau95]. The discrete Laplacian of a vertex x_i is approximated by:

∇²x_i = Σ_{j ∈ N_1(i)} W_ij x_j    (2.11)

where N_1(i) is the inclusive 1-ring neighborhood of vertex i; that is, all the vertices that share an edge with vertex i plus the vertex itself. W_ij is set to −1 and W_ii is set to the number of elements in N_1(i), creating a weighted connectivity graph for all the vertices. It is now possible to form

∇²x = Ax    (2.12)

where x is an N×1 vector with all the vertices in the mesh and A is an N×N sparse matrix containing the weights for each of the vertices. Iteratively minimizing x^T A x will now result in each vertex moving towards the center of all its neighbors.
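The effect of this energy is easiest to see in its explicit form: every vertex is pulled toward the average of its 1-ring neighbours. The sketch below shows one such explicit step with one thread per vertex; the thesis itself instead minimizes x^T A x with a linear solver (Sections 2.4.3 and 4.4.2), so this is only meant to convey the idea. The array names and the CSR-style neighbour lists are assumptions.

```cuda
#include <cuda_runtime.h>

// One explicit graph-Laplacian smoothing step: move each vertex a fraction
// of the way toward the average of its 1-ring neighbours. The neighbour
// lists are stored CSR-style: nbrStart has numVertices + 1 entries.
__global__ void graphLaplacianStep(const float3* xIn, float3* xOut,
                                   const int* nbrStart, const int* nbrList,
                                   int numVertices, float stepSize)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numVertices) return;

    float3 avg = make_float3(0.0f, 0.0f, 0.0f);
    int begin = nbrStart[i], end = nbrStart[i + 1];
    for (int k = begin; k < end; ++k) {
        int j = nbrList[k];
        avg.x += xIn[j].x; avg.y += xIn[j].y; avg.z += xIn[j].z;
    }
    int n = end - begin;
    if (n == 0) { xOut[i] = xIn[i]; return; }
    avg.x /= n; avg.y /= n; avg.z /= n;

    xOut[i] = make_float3(xIn[i].x + stepSize * (avg.x - xIn[i].x),
                          xIn[i].y + stepSize * (avg.y - xIn[i].y),
                          xIn[i].z + stepSize * (avg.z - xIn[i].z));
}
```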

2.4.2 Weighted surface bilaplacian

In the second smoothing step, Williams minimizes x^T B x, where B is a quadratic approximation of the surface bilaplacian:

B = W^T D^(−1) W    (2.13)

In Equation 2.13, W is an N×N matrix, N being the number of vertices in the mesh, where an element W_ij is set to the sum of the cotangents of the opposite angles of the two triangles incident on the edge connecting vertex i and vertex j, see Figure 2.9 (a) [DC98]. D is a diagonal matrix where element D_ii is the total area of all triangles that vertex i is a part of, see Figure 2.9 (b). Minimizing the objective function will improve the curvature of the mesh, creating a better looking surface.

Figure 2.9: (a) The angles used for element W_ij. (b) All areas incident on a vertex x_i.

2.4.3 Solvers

When the A and B matrices have been retrieved, it is possible to set up and solve the smoothing optimization problem as a linear system. This system can then be solved using, for example, the Gauss-Seidel iterative method or Successive Over-Relaxation (used by Williams [Wil09]). Other researchers have also used preconditioned bi-conjugate gradient for similar problems with good results [DC98].

The constraints described below do make the optimization problem non-linear, but they are handled separately and are not encoded into the matrices, thus it is still possible to use linear solvers.

2.4.4 Constraints

The problem with just smoothing is that it can cause significant shrinkage of the mesh. If enough iterations are made, the mesh will collapse to the barycenter of the vertices. Williams solves this by applying a constraint to the vertices after every smoothing step. By enforcing Equation 2.14, where x is a vertex and P_i a particle, the resulting surface is not only a good representation of the underlying particles; the author also notes that, given particles arranged on a grid of size d, a flat surface will be produced if Equation 2.15 is fulfilled.

r_inner ≤ min_i ‖x − P_i‖ ≤ r_outer    (2.14)

r_outer^2 > d^2/2 + r_inner^2    (2.15)

Chapter 3

GPU programming and CUDA

Performance has always been a driving factor when hardware vendors create new products, whether it is the transfer speed of USB ports, the response time of a touch screen or the number of pixels in a digital camera. The clock frequency of the central processing unit (CPU) of a computer is no different and there has been an exponential increase in performance over the past decades.

The CPU is constructed for running a single thread of sequential instructions and works well for many tasks, but this is a major drawback when having multiple tasks that can run independently of each other, e.g. heavy calculations. Although multi-core CPUs have been around since 2004, the underlying hardware is still made for sequential computations. This is probably why researchers without access to massively parallel supercomputers started looking at General Purpose computing on the Graphics Processing Unit (GPGPU) for doing their computations. The GPU is very different from the CPU in that it is made with the intention of performing tasks in parallel, i.e. rendering pixels on a screen, and can be a great deal faster for computations than the CPU if the computations can be parallelized.

Before the introduction of the Geforce 8 GPU from NVIDIA, the only possibility for GPGPU was to make use of graphics APIs such as OpenGL or DirectX. This made GPGPU both hard to master and also put restrictions on the type of calculations you could make due to the limitations of the APIs. In November 2006, together with the Geforce 8 card, NVIDIA also introduced the CUDA computing architecture, which was made in order to solve these limitations. Since then, other APIs have been released, most notably OpenCL, which is a cross-platform API for general GPU programming.

The GPU programming work of this thesis is done exclusively in CUDA, and therefore the remainder of this chapter will focus on the architecture and programming model of CUDA.

3.1 CUDA

CUDA, or Compute Unified Device Architecture, is as hinted above a general purpose hardware and software architecture aimed at massively parallel and computationally intensive applications.

Developed by NVIDIA, at the time of writing, CUDA is only supported by GPUs from NVIDIA, such as the Geforce series. These GPUs are viewed as devices that provide additional processors to the main CPU, capable of executing tens or hundreds of thousands of threads in parallel.

3.2 Architecture

As the underlying architecture is constructed for parallel programming, it is necessary to have knowledge about it in order to fully utilize its potential.

Every CUDA enabled GPU contains a number of streaming multiprocessors, each containing a number of cores. The multiprocessors have a Single Instruction, Multiple Data (SIMD) architecture. This means that at a given time, every core in the streaming multiprocessor executes the same instruction, but on different data. [Cor10c]

Because not all code is suitable for running in parallel, a CUDA program will typically consist of passages run on either the CPU (the host from now on) or the GPU (the device from now on). The passages with no data parallelism are implemented on the host using normal C or C++ code, and the sections that exhibit large amounts of data parallelism are implemented using device code (see Section 3.4).

Figure 3.1: An example of a grid of blocks. Image from the NVIDIA CUDA Programming Guide [Cor10c].

When a device function, called a kernel, is called, hundreds to tens of thousands of lightweight threads are launched to operate on the data. These threads are put into one of possibly many blocks, which in turn form a two-dimensional grid. The sizes of the grid and the blocks can be specified by the programmer. The grid and block indices are also known to each thread, making it possible to shape them to fit the data they operate on. Figure 3.1 shows an example of a grid of blocks. The first kernel launches a grid of size (3,2) where each block contains 15 threads (5,3).
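A minimal, self-contained example of this launch configuration is sketched below: a (3,2) grid of (5,3) blocks, matching the figure, where each thread derives its own 2D index from the block and thread indices. The kernel and the data it operates on are placeholders invented for the example.

```cuda
#include <cuda_runtime.h>

// Each thread computes its 2D index from the block and thread indices and
// operates on one element of the data.
__global__ void scale2D(float* data, int width, int height, float factor)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    data[y * width + x] *= factor;
}

int main()
{
    const int width = 15, height = 6;
    float* d_data = nullptr;
    cudaMalloc(&d_data, width * height * sizeof(float));
    cudaMemset(d_data, 0, width * height * sizeof(float));

    dim3 block(5, 3);                                 // 15 threads per block
    dim3 grid((width + block.x - 1) / block.x,        // (3, 2) blocks
              (height + block.y - 1) / block.y);
    scale2D<<<grid, block>>>(d_data, width, height, 2.0f);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```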


In order to allow for future changes in the hardware such as adding more cores, CUDA is designed for transparent scalability. This makes it possible to create applications that will run on future generations of GPUs without having to change the code because the architecture supports such changes.

3.3 Memory model

There are several different memory types available to the CUDA programmer. Each of these memories has different size, scope and lifetime. Figure 3.2 shows the different memories and their relationships.

• Global memory is the main memory of the GPU and is the only one giving read and write access to both the host and the device. It is accessible from all threads, but it is not cached, making access times high even if a coalesced access pattern is followed.

• Constant memory is like global memory, but it can only be written to from the host. It is also cached, so the slow device memory is read only on a cache miss.

• Texture memory is a cached memory accessible to all threads, with hardware support for special operations such as linear and higher order interpolation.

• Registers are extremely fast device memories used to store local variables. They are local to a thread and the memory will be released once the thread is done executing.

• Local memory is a part of the global memory, but the memory is still local to a thread. Since the on-chip memory is limited, the use of too many registers will cause them to be placed in the local memory instead.

• Shared memory is a very fast on-chip memory which is shared between the threads of a block, allowing for collaboration without having to resort to the slow global memory (see the sketch after this list). The lifetime of the shared memory is the lifetime of a kernel.
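As a small illustration of the shared-memory point above, the sketch below stages data in on-chip memory, synchronizes the block, and performs a block-wide reduction without further global-memory traffic. It assumes a block size of 256 threads and is not taken from the thesis.

```cuda
#include <cuda_runtime.h>

// Threads in a block stage data in shared memory, synchronize, and then
// cooperate on a tree reduction. Assumes blockDim.x == 256.
__global__ void blockSum(const float* input, float* blockSums, int n)
{
    __shared__ float cache[256];                 // one element per thread
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    cache[tid] = (i < n) ? input[i] : 0.0f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) cache[tid] += cache[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = cache[0];
}
```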

Figure 3.2: A diagram of the CUDA memories. Image from the NVIDIA CUDA Programming Guide [Cor10c].

3.4 Programming interface

There are currently two Application Programming Interfaces (APIs) for programming CUDA.

The first one is the CUDA Driver API, which is a low-level interface that gives good control over the device and has no dependency on the runtime library. However, the driver API is known to have more verbose code and it is harder to debug than C for CUDA, which is the second API. [Cor10a]

C for CUDA is a higher level interface implemented on top of the driver API. It is built as an extension to normal C, exposing new syntax for creating and launching kernels etc. It also contains a number of functions for data transfer to and from the GPU, thread synchronization and OpenGL interoperability, to name a few.

3.4.1 Optimizations

Simply getting a CUDA program to compile is not enough to get great performance. There are a number of things that need to be considered in order to achieve fast programs.

• The highest priority should always be to maximize the parts of the program that can run in parallel. By choosing correct algorithms and execution configuration for each kernel launch (maximizing the resource usage), extreme speedups can be achieved.

• Memory optimization should be done by first minimizing the data transferred between the host and the device. Such transfers have low bandwidth and are very slow compared to transfers between other memories. Access of global memory should always be coalesced, and the number of global memory accesses from kernels should be kept as low as possible, letting threads within a block use the shared memory instead.

• Control flow instructions such as if- and for-statements can have a big impact on performance if the threads within a warp (a group of threads that execute in lockstep) are not running the same code. If one thread in a warp takes a different route in an if-statement which takes longer time to execute, all the other threads in the same warp will wait for that thread to finish. By grouping threads or the data wisely, such clogs can be avoided.

• Instruction optimization can be made by using alternative versions of heavy arithmetic functions optimized for speed. These functions often have a tradeoff when it comes to precision though, so they should only be used when they do not affect the quality of the result.

More in-depth information about CUDA and many more optimization strategies can be found in the CUDA Programming Guide [Cor10c], the Best Practices Guide [Cor10a], or in the two introductory books Programming Massively Parallel Processors by Kirk and Hwu [KmWH10] and CUDA by Example by Sanders and Kandrot [SK10].


Chapter 4

Implementation

The implementation made in this thesis closely follows the work of Williams [Wil09], which seemed to yield very good results while at the same time having many passages allowing for massively parallel operations on the GPU. A CPU version of the surfacer was constructed as well. This was done because not all artists had good enough graphics cards to handle huge amounts of data, and also because the surface reconstruction should be able to run on the render farm, which currently consists only of CPUs. Though different algorithms are used for some of the operations, the end result should be the same.

Figure 4.1 shows an overview of the data flow and the different operations performed from particles to a final surface for a normal run of the algorithm. The following sections will explain these steps in more depth after a data structure for fast searching is described.

Figure 4.1: All the steps in the implementation of the surfacing algorithm.

4.1 Uniform hashed grid

In the field creation, the constraint and the attribute transfer steps of Figure 4.1, a spatial search is needed to quickly find particles near a certain point, be it a position in the field or a vertex on the surface.

For this, an implementation based on the work of Le Grand [LG07] is used. In this method, space is divided into a fixed-size grid of dimensions (d_x, d_y, d_z) with cells of equal size (c_x, c_y, c_z). A hash value according to Equation 4.1 is then computed for every particle position P and stored together with the particle ID. The particles are then put into a one-dimensional array which is sorted using a parallel radix sort [SHG09] such that all particles within a cell are adjacent to each other. Two additional arrays are then created to store the start and end indices of each cell in the big array, allowing for fast lookup of nearby particles for positions that hash to the same value.

P_hash = m d_x d_y + l d_x + k    (4.1)

k = ⌊P_x / c_x⌋ mod d_x,   l = ⌊P_y / c_y⌋ mod d_y,   m = ⌊P_z / c_z⌋ mod d_z    (4.2)

By choosing an appropriate size for the cells, the closest particle to a point can be found by looking in the cell that the particle position is hashed to, or in one of the adjacent cells.
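A sketch of the cell hash of Equations 4.1-4.2 is shown below. The struct and function names are illustrative; the wrap of negative indices is an implementation detail added so the sketch is well defined for positions outside the grid origin.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Grid dimensions (dx, dy, dz) and cell sizes (cx, cy, cz), chosen so that
// neighbours are found in a particle's own cell or one of the adjacent cells.
struct HashGrid { int dx, dy, dz; float cx, cy, cz; };

__host__ __device__ inline unsigned int cellHash(float3 p, HashGrid g)
{
    int k = ((int)floorf(p.x / g.cx)) % g.dx;          // Equation 4.2
    int l = ((int)floorf(p.y / g.cy)) % g.dy;
    int m = ((int)floorf(p.z / g.cz)) % g.dz;
    k = (k + g.dx) % g.dx;                             // keep indices non-negative
    l = (l + g.dy) % g.dy;
    m = (m + g.dz) % g.dz;
    return (unsigned int)(m * g.dx * g.dy + l * g.dx + k);   // Equation 4.1
}
```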

The uniform hashed grid is fast, light-weight in terms of memory and easy to implement. A sample implementation of this structure can be found in the CUDA SDK [SG08], though it is heavily coupled with that specific implementation.

4.2 Distance field

The CPU version creates the distance field by rasterizing the particles onto the underlying field, checking nearby voxels inside the influence radius for their distance value and updating them if necessary. Since it is not required to store distances in voxels that are not close to a surface, a sparse field from the Field3D library [Wre10] was used for storage.

The rasterization process causes race conditions when run in parallel if atomic operations are not used. These operations were not widely supported on the graphics cards installed at Imageworks, so a different technique was used for the GPU version. A sparse data structure was constructed to reduce the memory footprint, much like the sparse field above. The first step was to calculate which of the sparse blocks would need allocation. Blocks that were too far from any particles to contribute to the surface could simply be skipped. Then, for all the allocated blocks, each voxel position was visited to search for the distance to the nearest particle using the uniform hash grid. The size of the hashed grid was chosen such that if no particle was found within the cell or its neighbors, the distance could safely be set to ∞.

4.3 Mesh creation

For the mesh creation, Marching Tiles (Section 2.3.3) was implemented. Imageworks already had implementations of Marching Cubes for both the CPU and the GPU, so these were integrated into the tool as well.

Marching Tiles was done on the GPU by first computing which of the tetrahedra in the tiles would actually contain triangles. This was done so there would be no need to march through empty tetrahedra, and also to know how many triangles the final mesh would have so that proper allocations could be made. Then the actual marching of the tetrahedra was performed, and all the edges in the field that would create a vertex were saved together with the two end values. The reason for this is that simply creating the triangles directly would create just triangles and not a mesh. Because of this it is necessary to weld the vertices, or in this case the edges, together. The benefit of welding edges instead of the final vertex positions is that the edges have discrete coordinates, while the vertices have floating point coordinates which may be subject to rounding errors due to interpolation direction etc. The edge weld was constructed by modifying the algorithm from [HB10], and then the final vertex values were retrieved, creating a mesh with correct connectivity information.
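The idea behind such an edge weld can be sketched with Thrust: every generated vertex is identified by the grid edge it lies on, encoded as a 64-bit key; sorting and de-duplicating the keys yields one welded vertex per unique edge, and a vectorized binary search maps every triangle corner to its welded vertex index. This is loosely modelled on the Thrust weld example [HB10], not on the tool itself, and the key layout and names are illustrative.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/unique.h>
#include <thrust/binary_search.h>

// cornerEdgeKeys: one key per triangle corner, encoding the grid edge the
// vertex lies on. After the weld, cornerToVertex holds the final index of
// each corner into the list of unique (welded) vertices.
void weldEdges(const thrust::device_vector<unsigned long long>& cornerEdgeKeys,
               thrust::device_vector<unsigned long long>& uniqueEdgeKeys,
               thrust::device_vector<int>& cornerToVertex)
{
    // Unique, sorted edge keys = the welded vertices.
    uniqueEdgeKeys = cornerEdgeKeys;
    thrust::sort(uniqueEdgeKeys.begin(), uniqueEdgeKeys.end());
    uniqueEdgeKeys.erase(thrust::unique(uniqueEdgeKeys.begin(), uniqueEdgeKeys.end()),
                         uniqueEdgeKeys.end());

    // For every triangle corner, find the index of its edge among the unique
    // keys; this gives the connectivity of the welded mesh.
    cornerToVertex.resize(cornerEdgeKeys.size());
    thrust::lower_bound(uniqueEdgeKeys.begin(), uniqueEdgeKeys.end(),
                        cornerEdgeKeys.begin(), cornerEdgeKeys.end(),
                        cornerToVertex.begin());
}
```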

4.4 Smoothing

As can be seen in Figure 4.1, the smoothing is first done for x^T A x and then two times for x^T B x, once with constraints and the second time without. The first two steps move the vertices to good positions, while the unconstrained step improves the smoothness of the surface normals while not moving the vertices too much [Wil09]. The result of different numbers of smoothing steps minimizing x^T A x can be seen in Figure 4.2.

The matrices processed in the implementation are all very sparse. Each row in a matrix has as many elements as there are vertices in the mesh, but only an average of seven elements are non-zero. For memory reasons, the matrices in the implementation were all represented using a sparse matrix storage scheme called Compressed Row Storage. While this method makes element access slower, the memory consumption is extremely small in comparison to a dense matrix.

Figure 4.2: Different numbers of smoothing steps for x^T A x with constraints. (top) Initial surface. (middle) 2 steps. (bottom) 6 steps.

4.4.1 Finding matrices

The process of computing the A and B matrices was implemented on the CPU because of the serial nature of the procedure and a lack of time to find appropriate parallel versions. A is found by creating an array of associative arrays and looping over the connectivity list of the mesh, adding the vertices of each triangle to the appropriate associative array. The sparse matrix is then constructed by extracting the values from the arrays. The matrix B is constructed in a similar fashion.

4.4.2 Solvers

For the actual solving of the linear systems, a Jacobi solver adapted for sparse data was used. Even though this method might converge slower than Gauss-Seidel or Successive Over-Relaxation, the fact that it is very easy to parallelize and that the parallel version is much faster makes up for this, see Section 5.5.
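The reason Jacobi parallelizes so easily is that every row can be updated independently from the values of the previous iteration. A sketch of one iteration for a system A x = b stored in Compressed Row Storage, with one thread per row, is shown below; the variable names are illustrative and the kernel assumes every row has a diagonal entry.

```cuda
#include <cuda_runtime.h>

// One Jacobi iteration for a sparse system in CRS format, one thread per row.
__global__ void jacobiIteration(const int* rowStart, const int* colIndex,
                                const float* value, const float* b,
                                const float* xOld, float* xNew, int numRows)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= numRows) return;

    float diag = 0.0f, sum = 0.0f;
    for (int k = rowStart[row]; k < rowStart[row + 1]; ++k) {
        int col = colIndex[k];
        if (col == row) diag = value[k];            // diagonal element
        else            sum += value[k] * xOld[col];
    }
    xNew[row] = (b[row] - sum) / diag;              // Jacobi update
}
```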

4.5 Constraints

The shrinking effect of the smoothing, which can be seen as a negative effect in some cases, is what makes surfaces flat in this implementation. The constraints make sure that the vertices are allowed to move to better positions while still staying true to the particle positions. The constraints are enforced after each smoothing step by simply using a hashed grid to look up the closest particle to each vertex and moving the vertex along the radial vector if it is found to be inside or outside the stipulated radii.
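A sketch of this projection onto the shell of Equation 2.14 is given below. The nearest-particle indices are assumed to have been found beforehand with the uniform hashed grid of Section 4.1; the kernel and parameter names are invented for the illustration.

```cuda
#include <cuda_runtime.h>

// Push each vertex back onto the shell [rInner, rOuter] around its closest
// particle. nearestParticle[i] is assumed to come from a hashed-grid lookup.
__global__ void enforceConstraints(float3* vertices, int numVertices,
                                   const float3* particles,
                                   const int* nearestParticle,
                                   float rInner, float rOuter)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numVertices) return;

    float3 p = particles[nearestParticle[i]];
    float3 d = make_float3(vertices[i].x - p.x,
                           vertices[i].y - p.y,
                           vertices[i].z - p.z);
    float dist = sqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
    if (dist < 1e-6f) return;                       // degenerate case, leave as is

    float clamped = fminf(fmaxf(dist, rInner), rOuter);
    float s = clamped / dist;                       // scale along the radial vector
    vertices[i] = make_float3(p.x + s * d.x, p.y + s * d.y, p.z + s * d.z);
}
```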

4.6 Erosion

In an attempt to be able to represent very thin structures without using a huge number of particles with small radii, an optional erosion step was added, where each vertex is moved along its negative normal direction. The result of this operation can be seen in Figure 4.3.

While easy to implement and straightforward, this method very easily creates self-intersecting triangles if a large value is used. A surface that looks good in one frame might also have many self-intersections in the next one if large values are used.

4.7 Attribute transfer

As a last optional step, the ability to transfer attributes from the particles to the vertices of the mesh was added. Such attributes can for example be color, see Figure 5.3, or velocity, which can be used for motion blur. This was done by specifying an influence radius for the vertices, using that as the size of the spatial hash to find all relevant particles within the radius, and computing a weighted average to get the final attribute value.
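A sketch of such a gather is shown below: each vertex averages particle attributes (here a colour) within an influence radius, weighted by a simple falloff. The per-vertex candidate lists are assumed to come from the spatial hash of Section 4.1, and the falloff weight is an illustrative choice, not necessarily the one used in the tool.

```cuda
#include <cuda_runtime.h>

// One thread per vertex: gather particle colours inside the influence radius
// and take a weighted average. candStart/candList are CSR-style candidate
// lists produced by the uniform hashed grid.
__global__ void transferColor(const float3* vertices, int numVertices,
                              const float3* particlePos, const float3* particleColor,
                              const int* candStart, const int* candList,
                              float radius, float3* vertexColor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numVertices) return;

    float3 acc = make_float3(0.0f, 0.0f, 0.0f);
    float wSum = 0.0f;
    for (int k = candStart[i]; k < candStart[i + 1]; ++k) {
        int j = candList[k];
        float3 d = make_float3(vertices[i].x - particlePos[j].x,
                               vertices[i].y - particlePos[j].y,
                               vertices[i].z - particlePos[j].z);
        float d2 = d.x * d.x + d.y * d.y + d.z * d.z;
        if (d2 >= radius * radius) continue;
        float w = 1.0f - d2 / (radius * radius);     // simple illustrative falloff
        acc.x += w * particleColor[j].x;
        acc.y += w * particleColor[j].y;
        acc.z += w * particleColor[j].z;
        wSum += w;
    }
    if (wSum > 0.0f)
        vertexColor[i] = make_float3(acc.x / wSum, acc.y / wSum, acc.z / wSum);
}
```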

4.8 Houdini

The final program was designed to be independent of any particular software package, taking only a list of particles and a structure containing the surfacing parameters. In the prototype, the functionality of the surfacer was exposed by constructing a Surface Operator (SOP) for Houdini using the Houdini Development Kit. A screenshot of the SOP can be seen in Figure 4.4. The parameters seen in the figure mask many of the real ones for ease of use for the artist.

Figure 4.4: A screenshot of the Houdini operator created for the particle surface reconstruction.


Chapter 5

Results

5.1 Computer specifications

All the work in this thesis, including the timing experiments, was done on an eight-core 2.5 GHz computer with 24 GB of RAM and a Quadro FX 3700 graphics card with 512 MB of memory. Note that the CPU version was not threaded, thus only making use of one core.

5.2 Example: Fountain

In Figure 5.1, a sequence from an SPH simulation of a fountain pouring a stream of water can be seen. The simulation had approximately 120,000 particles and was surfaced on a 325×600×100 grid. The surfacing took about two seconds on the GPU, and 30 seconds if done on the CPU, in production quality.


5.3 Example: Splash

Figure 5.2 shows the surfacing of a splash where an object collides with a big pool of particles. The simulation as well as the surfacing was done in three different passes in order to add more small-scale detail to the final sequence. Note that this makes the small-scale particles unable to interact with the pool in the simulation.

5.4 Example: Color and velocity transfer

The ability to transfer attributes such as color and velocity can be seen in Figure 5.3. A cube acts as source for all the particles which are colored based on their coordinates.

5.5 Performance

The tests in this section were run ten times each, and the result is the average of these runs. The tests were run on the same dataset, 475,000 particles that were sampled and surfaced on a 500×650×350 field. Figure 5.4 shows a bar chart of the time for the different parts of the surfacing, both for the CPU and the GPU version. Figure 5.5 shows the percentages of the processing times for the GPU version. Note that these results are only an example and that the results may vary considerably depending on particle distribution and field size. However, it can be seen that the GPU version is significantly faster where full utilization of the GPU can be achieved. It can also be seen that overhead and the sections that are done on the CPU take up more than 70% of the computation time for the GPU version.

Figure 5.4: Illustrates the execution time for different parts of the program on both the CPU and the GPU.


Figure 5.5: Shows how the processing time is divided between the different parts of the GPU program.


Chapter 6

Conclusion

The goal of this thesis was to evaluate the field of reconstructing surfaces from particle data and to create a prototype for this using the GPU for integration into Houdini. The technique implemented should be faster than the current technique and also be easier for artists to work with. Furthermore it should be easier to represent flat surfaces without introducing artifacts.

Because the method presented in this thesis involves more steps and is more computationally expensive than the native Houdini version (which basically performs the first two steps of Figure 4.1), it may not be fair to compare them directly. Even so, the general impression when working with the GPU version is that it is still faster than the native application. When not doing any smoothing or constraints, the GPU version is much faster.

The usability of the plugin created is not easy to estimate, because the prototype was not rolled out to be tested by artists until one week prior to the completion of the thesis. The feedback received, though, seemed positive, and the SOP does hide a lot of unnecessary options from the end user, instead exposing combinations of them as, for example, "Smoothness" or "Mesh detail".

The technique used for surfacing and smoothing, based on the work of Williams [Wil09], creates a final surface that is globally smooth and also flat under certain conditions (Equation 2.15). The technique also creates a surface that stays true to the particle positions. The smoothing can however create collapsing features under certain conditions, when the vertices snap to the "wrong" particles due to large smoothing step sizes. The method is also sensitive to changes in the field, something that becomes a problem when surfacing particles with almost no velocity. Many of the simulations used for testing had particles that were supposed to be settled but still oscillated a little bit, which caused the resulting surface to flicker due to differences in the field from one frame to the next. This becomes extra evident when using reflective materials for the surface.

6.1 Future work

The paper by Yu and Turk [YT] has some interesting ideas and the results are visually very pleasing. The implementation of anisotropic kernels should be the first priority for further work. Yu and Turk also acknowledge that their solution would benefit from the work of Williams [Wil09], and a combined solution would be good to try. The changes needed would be in the creation of the field as well as in the constraint enforcement, where a pure distance to the nearest particle can no longer be used. Instead, the constraints need to be based on the individual elliptical kernel of each particle.

Another thing that the evaluating artists seemed to want was the possibility to smooth parts of the mesh differently from others. This could easily be done either by creating a field that scales the factors in the matrices, or it could be done automatically based on properties such as velocity. Using velocity to scale the elements in the matrices, it would be possible to smooth more heavily in areas where particles are not moving, and smooth less where a lot is going on.

The smoothing of the particle positions mentioned in Yu et al. [YT] should help with the temporal flickering. This smoothing could possibly also be weighted by velocity, letting areas of particles with low velocity be more dependent on neighbors than high velocity particles, where flickering does not pose a problem.

As can be seen in Section 5.5, there are parts of the program that are over 100 times faster than the corresponding CPU parts, but the overall performance is held back by bottlenecks. One of the big bottlenecks is the computations that are still done on the CPU.

One possible optimization could be to reuse the B matrix in the second step instead of recomputing it, which would gain time but possibly not affect the final surface negatively. Multithreading the CPU parts would also benefit the GPU version, but if possible it would be better to try to move all of the computations to the GPU. This would not only give a speedup in the computations; the current overhead of moving data back and forth would be removed as well. CUDA is also moving very fast, and new libraries and features are arriving rapidly, many of which might be useful, for example CUSPARSE [Cor10b], an NVIDIA-developed library for handling sparse matrices added in September 2010.


Acknowledgements

I would like to thank my supervisor Magnus Wrenninge for all his help and invaluable input.

I would also like to thank the talented people at Sony Pictures Imageworks with whom I have had the fortune to interact, especially Chris Allen, Sosh Mirsepassi and Viktor Lundqvist.

Finally, I would like to thank Jonas Unger for helping me get into the IPAX program, and Swedbank-Sparbanksstiftelsen Alfa as well as Professor Anders Ynnerman's Foundation for the scholarships that I received.


Bibliography

[ALD06] Bart Adams, Toon Lenaerts, and Philip Dutré. Particle Splatting: Interactive Rendering of Particle-Based Simulation Data, 2006.

[APKG07] Bart Adams, Mark Pauly, Richard Keiser, and Leonidas J. Guibas. Adaptively sampled particle fluids. In SIGGRAPH '07: ACM SIGGRAPH 2007 papers, page 48, New York, NY, USA, 2007. ACM.

[BHZK05] Mario Botsch, Alexander Hornung, Matthias Zwicker, and Leif Kobbelt. High-quality surface splatting on today's GPUs. Proceedings Eurographics/IEEE VGTC Symposium Point-Based Graphics, 0:17–141, 2005.

[Bli82] James F. Blinn. A Generalization of Algebraic Surface Drawing. ACM Trans. Graph., 1(3):235–256, 1982.

[Blo88] Jules Bloomenthal. Polygonization of implicit surfaces. Computer Aided Geometric Design, 5(4):341–355, 1988.

[BPK+07] Mario Botsch, Mark Pauly, Leif Kobbelt, Pierre Alliez, Bruno Lévy, Stephan Bischoff, and Christian Rössl. Geometric modeling based on polygonal meshes. In SIGGRAPH '07: ACM SIGGRAPH 2007 courses, page 1, New York, NY, USA, 2007. ACM.

[Bra04] Nicholas Bray. Notes on mesh smoothing, 2004.

[Che95] Evgeni V. Chernyaev. Marching Cubes 33: Construction of Topologically Correct Isosurfaces. Technical report, Institute for High Energy Physics, 142284, Protvino, Moscow Region, Russia, 1995.

[Cor10a] NVIDIA Corporation. CUDA C Best Practices Guide, Version 3.2, 2010. [Online; accessed 18-September-2010].

[Cor10b] NVIDIA Corporation. CUDA CUSPARSE Library, 2010.

[Cor10c] NVIDIA Corporation. NVIDIA CUDA C Programming Guide, Version 3.2, 2010. [Online; accessed 18-September-2010].

[Dür88] Martin J. Dürst. Re: additional reference to "marching cubes". SIGGRAPH Comput. Graph., 22(5):243, 1988.

[DC98] Mathieu Desbrun and Marie-Paule Cani. Active Implicit Surface for Animation. In Wayne A. Davis, Kellogg S. Booth, and Alain Fournier, editors, Graphics Interface 1998, pages 143–150, Vancouver, BC, Canada, June 1998. Canadian Human-Computer Communications Society. Published under the name Marie-Paule Cani-Gascuel.

[EFFM02] Douglas Enright, Ronald Fedkiw, Joel Ferziger, and Ian Mitchell. A hybrid particle level set method for improved interface capturing. J. Comput. Phys., 183(1):83–116, 2002.

[HB10] Jared Hoberock and Nathan Bell. Thrust: Weld vertices example. http://code.google.com/p/thrust/source/browse/examples/, October 2010.

[KC03] Yehuda Koren and Liran Carmel. Visualization of Labeled Data Using Linear Transformations. Information Visualization, IEEE Symposium on, 0:16, 2003.

[KmWH10] David B. Kirk and Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Addison-Wesley Professional, 1 edition, 2010.

[KVH84] James T. Kajiya and Brian P. Von Herzen. Ray tracing volume densities. SIGGRAPH Comput. Graph., 18(3):165–174, 1984.

[LC87] William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. In SIGGRAPH '87: Proceedings of the 14th annual conference on Computer graphics and interactive techniques, pages 163–169, New York, NY, USA, 1987. ACM.

[LG07] Scott Le Grand. GPU Gems 3 - Broad-Phase Collision Detection with CUDA, chapter 32, pages 697–722. Addison-Wesley, 2007.

[LGF04] Frank Losasso, Frédéric Gibou, and Ron Fedkiw. Simulating water and smoke with an octree data structure. In SIGGRAPH '04: ACM SIGGRAPH 2004 Papers, pages 457–462, New York, NY, USA, 2004. ACM.

[LLWVT03] Thomas Lewiner, Hélio Lopes, Antônio Wilson Vieira, and Geovan Tavares. Efficient implementation of marching cubes cases with topological guarantees. Journal of Graphics Tools, 8(2), December 2003.

[Lun09] Viktor Lundqvist. A smoothed particle hydrodynamic simulation utilizing the parallel processing capabilities of the GPUs. Master's thesis, Linköping University, Department of Science and Technology, 2009.

[MCG03] Matthias Müller, David Charypar, and Markus Gross. Particle-based fluid simulation for interactive applications. In SCA '03: Proceedings of the 2003 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 154–159, Aire-la-Ville, Switzerland, 2003. Eurographics Association.

[MSKG05] Matthias Müller, Barbara Solenthaler, Richard Keiser, and Markus Gross. Particle-based fluid-fluid interaction. In SCA '05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 237–244, New York, NY, USA, 2005. ACM.

[MSS94] Claudio Montani, Riccardo Scateni, and Roberto Scopigno. A modified look-up table for implicit disambiguation of Marching Cubes. The Visual Computer, 10(6):353–355, 1994.

[OF03] Stanley Osher and Ronald Fedkiw. Level Set Methods and Dynamic Implicit Surfaces. Springer, 2003.

[OS88] Stanley Osher and James A. Sethian. Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys., 79(1):12–49, 1988.

[Par82] Paramount. Star Trek II: The Wrath of Khan (movie), 1982.

[PTB+03] Simon Premože, Tolga Tasdizen, James Bigler, Aaron Lefohn, and Ross T. Whitaker. Particle-Based Simulation of Fluids. Computer Graphics Forum, 22(3):401–410, 2003.

[RB08] Ilya D. Rosenberg and Ken Birdwell. Real-time particle isosurface extraction. In I3D '08: Proceedings of the 2008 symposium on Interactive 3D graphics and games, pages 35–43, New York, NY, USA, 2008. ACM.

[Ree83] William T. Reeves. Particle systems—a technique for modeling a class of fuzzy objects. In SIGGRAPH '83: Proceedings of the 10th annual conference on Computer graphics and interactive techniques, pages 359–375, New York, NY, USA, 1983. ACM.

[SG08] Simon Green. CUDA Particles. NVIDIA Corporation, 2008.

[SHG09] Nadathur Satish, Mark Harris, and Michael Garland. Designing efficient sorting algorithms for manycore GPUs. In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, pages 1–10, Washington, DC, USA, 2009. IEEE Computer Society.

[Sim90] Karl Sims. Particle Animation and Rendering Using Data Parallel Computation. In Computer Graphics, pages 405–413, 1990.

[SK10] Jason Sanders and Edward Kandrot. CUDA by Example: An Introduction to General-Purpose GPU Programming. Morgan Kaufmann Publishers, 2010.

[SML] Will Schroeder, Ken Martin, and Bill Lorensen. The Visualization Toolkit, Third Edition. Kitware Inc.

[Sof10] Side Effects Software. Houdini. http://www.sidefx.com/, October 2010.

[Tau95] Gabriel Taubin. A signal processing approach to fair surface design. In SIGGRAPH '95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 351–358, New York, NY, USA, 1995. ACM.

[Wil09] Brent Warren Williams. Fluid Surface Reconstruction from Particles. Master's thesis, The University of British Columbia, 2009.

[Wre10] Magnus Wrenninge. Field3D: A file format and data structures for storing 3D voxel and simulation data, 2010. Open source project at Sony Pictures Imageworks.

[YT] Jihun Yu and Greg Turk. Reconstructing Surfaces of Particle-Based Fluids Using Anisotropic Kernels. In 2010 ACM SIGGRAPH/Eurographics symposium on Computer animation.

[ZB05] Yongning Zhu and Robert Bridson. Animating sand as a fluid. In SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pages 965–972, New York, NY, USA, 2005. ACM.

[ZPvBG01] Matthias Zwicker, Hanspeter Pfister, Jeroen van Baar, and Markus Gross. Surface splatting. In SIGGRAPH '01: Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 371–378, New York, NY, USA, 2001. ACM.
