Department of Science and Technology
Linköping University
SE-601 74 Norrköping, Sweden

LiU-ITN-TEK-A--08/108--SE

Large fused GPU volume rendering

Master's thesis in Scientific Visualization
carried out at the Institute of Technology, Linköping University

Stefan Lindholm

Supervisor: Gianluca Paladini
Examiner: Anders Ynnerman


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Abstract

This master's thesis describes the underlying theory and implementation of a fused GPU volume rendering pipeline. The open source XIP framework, largely developed at Siemens Corporate Research, is extended with fusion capabilities through a Binary Space Partitioning approach. Regions in the intersection pattern of multiple volumes are identified and subsequently rendered in a cell based fashion using either Texture Slicing or Raycasting. The results demonstrate interactive frame rates for reasonable scenes and are encouraging, as the implementation can be extended with several key acceleration methods.


Acknowledgments

I would like to thank my supervisor Gianluca Paladini and the Imaging and Visualization team at Siemens Corporate Research for their support and expertise. Special thanks also go to Anders Ynnerman as my examiner, as well as to Andres Sievert, Veronica Giden and Patric Ljung for their help in the creation of this thesis.


Contents

1 Introduction
  1.1 Introduction
    1.1.1 Aim
    1.1.2 Outline
    1.1.3 Limitations
  1.2 Background
    1.2.1 Introduction
    1.2.2 The GPU
    1.2.3 Direct Volume Rendering
    1.2.4 Fusion in DVR
    1.2.5 Large Data Sets in DVR and Fusion
2 Theory and Design
  2.1 Problem Description
  2.2 Approach
    2.2.1 Binary Space Partitioning
    2.2.2 Geometrical Homogeneity and Complete Cells
    2.2.3 Storage Structures
  2.3 Design
    2.3.1 Volume Representation
    2.3.2 Fusion Overview
3 Implementation
  3.1 Fusion Module
    3.1.1 Generating Region Representations
    3.1.2 On Plane Threshold
    3.1.3 Partition Plane Selection
  3.2 Render Module
    3.2.1 Render Queue
    3.2.2 Cell Based Rendering
    3.2.3 Rendering Methods
    3.2.4 Instantiated Shaders
  3.3 Extra Functionality
    3.3.1 Clip Planes and Early Ray Termination
    3.3.2 Fusion with Simple Convex Mesh Geometry
4 Results
  4.1 Storage Structures
  4.2 Bsp-tree Generation and Complexity
  4.3 Partition Plane Selection Methods
  4.4 Real Application Impact
  4.5 Proxy Geometry
  4.6 Fusion Module Results
  4.7 Render Module Results
5 Conclusion and Possibilities
  5.1 Conclusion
  5.2 Known Limitations
  5.3 Future Research
Bibliography
A Vocabulary
B Partitioning


Chapter 1

Introduction

This thesis concerns the rendering of multiple overlapping volumetric data sets in a computer graphics (CG) environment. The first sections below provide an introduction to the thesis and to the field of medical visualization. This is followed by a background giving overviews of existing methods and techniques for accomplishing this visualization.

1.1 Introduction

Medical imaging and the visualization of medical data are an important part of many medical fields, ranging from medical trials to surgical planning. The acquisition of data can be performed by a wide selection of methods, such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), while ever more intricate techniques are constantly developed for the visualization of this data. In addition, and regardless of the specific visualization technique, more refined approaches are being devised that involve multiple data sources at the same time. From a medical point of view, this simultaneous visualization of multiple volumes can contribute to a greater sense of context or highlight relational aspects of different data types. Examples include x-ray scans of specific organs placed within a transparent body, blood flow visualized through the model of a heart, or regional activity visualized in the context of a human brain.


1.1.1 Aim

The aim of this thesis is to account for the functionality required to perform fused visualization of multiple volumes, and to describe the implementation of this functionality carried out at Siemens Corporate Research (SCR) in Princeton, USA. It also aims to highlight difficulties of, and possible solutions to, performing fused rendering, both in a general sense and in relation to specific rendering methods.

1.1.2 Outline

The first sections of chapter 1 are dedicated to ensuring a certain level of necessary knowledge regarding visualization in general and the specifics of fused rendering. A full problem description, presentations of certain approaches and a design overview are given in chapter 2. Chapter 3 deals with the implementation done at SCR, while results and a following discussion can be found in chapters 4 and 5. A vocabulary of abbreviations and terms can be found in appendix A.

1.1.3 Limitations

This thesis is not intended as a complete review of, or comparison between, sampling methods or variations of such, since not enough time has been available to perform the necessary testing. The fact that such reviews are bound to be application specific also puts them out of scope for this thesis. Decisions regarding rendering specifics, such as the choice of sampling scheme, have been made on a general basis or for debugging purposes during implementation. No rendering methods other than Texture Slicing and Raycasting will be discussed, as they are dominant within DVR and no other methods were requested by Siemens.

1.2 Background

This section aims to present some of the cornerstones in the area of GPUs, DVR and fusion. The intention is not to provide a complete knowledge base, but to highlight the parts required to understand the problem and the solution in this thesis. The knowledge entry level is that of GPUs and volume rendering. Readers unfamiliar with the underlying fields of computer science and CG may benefit from additional background reading, while experienced readers can focus on the later sections of this chapter.

Real Time Volume Graphics [Engel et al., 2006] is highly recommended as a source of knowledge on a wide selection of topics concerning DVR. For the work discussed in this thesis and the following text, relevant parts include the underlying theoretical background, the introduction to the GPU, key algorithms and higher level optimizations. For the underlying knowledge in CG the reader is referred to Computer Graphics: Principles and Practice [Foley et al., 1996].

1.2.1 Introduction

For the further discussion of the possibilities and shortcomings of fusion in computer graphics, it is important to understand a few key things: the general problem being solved, the platform used for the solution, and how the problem is represented on that platform. The following subsections contain an overview of the hardware, followed by a theoretical presentation of the rendering problem and its role in volume visualization. Several key components that are vital to performing this hardware rendering are presented as either general (e.g. rendering methods and transfer functions) or fusion specific (e.g. segmented rays and sampling schemes).

1.2.2 The GPU

Figure 1.1. Multiprocessors work independently and in parallel. Execution units within a multiprocessor work in unison and in parallel.

While initially created to offload the CPU by taking over most graphics-related operations, the GPU is nowadays utilized as a secondary processing unit in many applications. It constitutes the platform of operation, and its architecture governs how the problem of volume rendering is solved. In short, the GPU can be thought of as a cluster of processing cores. These are called multiprocessors and operate independently and in parallel, solving identical problems over a series of inputs. Within each multiprocessor, several execution units work in unison, instruction by instruction, on a single input per unit; see figure 1.1. The GPU pipeline can be programmed through small user-defined programs commonly called shaders. The main inputs and outputs of these shaders are passed via textures and render buffers, whereas some metadata can be made available in the shaders through so-called uniform variables, or simply uniforms.
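As a minimal host-side illustration of this input path, the sketch below (using standard OpenGL calls loaded through GLEW) binds a volume texture to a texture unit and exposes it to a shader together with a uniform. The handle names and the "sampleRate" uniform are assumptions for the sake of the example; the program and texture are presumed to have been created elsewhere.

#include <GL/glew.h>

// Hedged sketch: 'program' and 'volumeTex' are assumed to be a compiled
// GLSL program and an uploaded 3D texture, created elsewhere.
void bindShaderInputs(GLuint program, GLuint volumeTex, float sampleRate)
{
    glUseProgram(program);

    // Expose the volume data to the shader through texture unit 0.
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_3D, volumeTex);
    glUniform1i(glGetUniformLocation(program, "volume"), 0);

    // Meta data, such as the sampling rate, is passed as a uniform.
    glUniform1f(glGetUniformLocation(program, "sampleRate"), sampleRate);
}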


While enjoying the gains of parallelism and the speedups offered by highly specialized hardware and the parallel build described above, GPUs also have their disadvantages. One highly limiting factor is the inefficient execution of if-clauses. In the worst case, within a single processing core, all branches of an if-clause are executed for all pixel fragments [Fernando, 2004]. This behavior is inherent in the architecture of the GPU and is a result of the high level of parallel optimization that requires units within the same core to execute the same instruction simultaneously. The resulting time penalty can be avoided if the triggering of fragments and the usage of conditionals are such that the fragments of all units within a single core are guaranteed to take the same branch. Another shader-related drawback is that some advanced algorithms have to be split over two separate shaders, possibly resulting in costly state changes. Raycasting on arbitrary convex polyhedra is an example of such an algorithm, where different steps of the algorithm depend on each other.

1.2.3 Direct Volume Rendering

Direct Volume Rendering (DVR) is simply volume rendering in the absence of intermediate surface representations. Other methods of visualization, such as standard surface rendering in 3D graphics, contain stages where geometrical representations are computed as an initial step and visual properties are calculated at surface level [Engel et al., 2006]. The difference is that the proxy geometry used for DVR exists only to trigger the rendering, rather than as a representation in itself, and that visual properties are calculated in the interior of the volumetric data.

Original Problem of DVR

Figure 1.2. Stages of the information flow in DVR, from patient to pixel values.

Regardless of what rendering method is applied, the literature states that while the origins of DVR lie in physical models of light transportation, in practice it corresponds to an accumulation of color contributions from a series of volumetric data samples [Preim and Bartz, 2007; Engel et al., 2006; Hansen and Johnson, 2005]. In theory, the underlying function is a continuous integral describing the transportation of light through a medium along a certain path, see figure 1.2(a). In CG this integral is approximated with a discrete sum, evaluated as an iterative sequence of small contribution calculations along the view direction through the scene, figure 1.2(e). To account for absorption, the composition of the color contributions must be performed sequentially with appropriate blending, equation 1.1 or 1.2 depending on traversal direction. With the GPU as the platform, the accumulation is typically performed into an off-screen buffer or directly into the framebuffer, with one integral being approximated per pixel. These integrals are the general problem we are trying to solve.

Front to back:   C_out = C_in + (1 − α_in) C_i        α_out = α_in + (1 − α_in) α_i     (1.1)

Back to front:   C_out = (1 − α_i) C_in + C_i         α_out = (1 − α_i) α_in + α_i      (1.2)
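Equation 1.1 in code form, as a minimal CPU-side sketch: the hypothetical sample callback is assumed to return classified color samples with pre-multiplied alpha, as the equations above presume.

#include <cstddef>
#include <functional>

struct ColorSample { float r, g, b, a; };  // color channels pre-multiplied by alpha

// Front-to-back accumulation per equation 1.1: each new sample is weighted
// by the transparency (1 - alpha_in) accumulated so far. 'sample' is a
// hypothetical callback returning the classified color sample at step i.
ColorSample integrateRay(const std::function<ColorSample(std::size_t)>& sample,
                         std::size_t numSteps)
{
    ColorSample out = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < numSteps; ++i) {
        const ColorSample s = sample(i);
        const float t = 1.0f - out.a;  // remaining transparency
        out.r += t * s.r;
        out.g += t * s.g;
        out.b += t * s.b;
        out.a += t * s.a;
    }
    return out;
}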

Volumes as 3D Signals

Viewing the volume data as a discrete three-dimensional signal opens up an entire field of techniques and tools related to Digital Signal Processing. From this viewpoint, as discussed in [Engel et al., 2006] chapter 9, DVR can be seen as the reconstruction process of an original signal, as illustrated in figure 1.2(a-c). This reconstruction will seldom be perfect in a real world case (in fact, for perfect reconstruction the signal must be a continuous piecewise linear function). The reason is that the choice of interpolation technique during data sampling in CG directly translates to a reconstruction filter in signal processing. Furthermore, no interpolation scheme is available that matches the only filter that would give a perfect reconstruction, the sinc filter. Standard tri-linear interpolation implies a tent-shaped filter, and even though it is known to introduce errors, it is used in all work covered by this thesis as it is supported by GPU hardware.

Rendering Methods

Two of the most widely spread methods of DVR on modern GPUs are Raycasting and Texture Slicing, figures 1.3(a) and 1.3(b). These methods present algorithmic and practical solutions to the objectives of DVR as discussed above. In Raycasting, a hull of the desired sample area is used as proxy geometry and rendered once or twice to trigger entry and exit points of all sample rays simultaneously. The actual sampling along these rays then takes place in the shader program executed per pixel fragment. Texture Slicing, on the other hand, relies on proxy geometry for each slice of samples, where one sample per ray is triggered by a polygonal slice of the desired sample area. If perspective projection is used without compensation, the sampling grid is permuted: for Raycasting the grid becomes spherical with consistent step length, while for Texture Slicing, since the slicing polygons are still planar, it is the step length that is altered depending on screen position.

Figure 1.3. Two frequently discussed methods of using GL proxy geometry for sample acquisition [Engel et al., 2004]: (a) Raycasting, (b) Texture Slicing.

Transfer Functions and Lookup Tables

While a light transportation integral requires color contributions, most scientific data holds little information about actual optical properties or color contributions at the sample points. Rather, a mapping is performed to convert the sampled scalar values (most often density values) into color contributions. This mapping is done with a transfer function (TF). Prior to the mapping, values are said to be pre-classification and are here called data samples. Post-classification is accordingly the label for values after the mapping has been performed; the values are now called color samples. These post-classification color samples are accumulated in the composition equations 1.1 and 1.2.

Figure 1.4. Typical one-dimensional TF used to convert scalar data samples to color samples.

A TF is in general a continuous function that often cannot easily be represented analytically. Instead the TF is sampled at discrete points, creating a discrete representation called a Lookup Table (LUT) that can be used for the mapping. Simply put, the data sample is used as a coordinate to access the LUT and extract a color sample. Figure 1.4 illustrates a one-dimensional LUT where values in the range [0.0, 1.0] are mapped to different colors. Typically, in CG a LUT is represented as a texture and the actual lookup is performed as a texture fetch in the GL.
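A minimal CPU-side sketch of such a lookup, assuming a 256-entry LUT and data samples normalized to [0.0, 1.0]; on the GPU the same operation is a single texture fetch with hardware interpolation.

#include <algorithm>
#include <array>
#include <cstddef>

struct RGBA { float r, g, b, a; };

// Nearest-entry LUT lookup: the data sample acts as a coordinate into the
// table, just as the texture fetch does on the GPU. The 256-entry
// resolution is an assumption for the example.
RGBA classify(const std::array<RGBA, 256>& lut, float dataSample)
{
    const float clamped = std::clamp(dataSample, 0.0f, 1.0f);
    const std::size_t index = static_cast<std::size_t>(clamped * 255.0f);
    return lut[index];  // the resulting color sample
}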

Alpha Correction

α̃_i = 1 − (1 − α_i)^(Δx̃ / Δx)    (1.3)

If a constant sampling frequency is maintained, then no inter-sample weighting other than the TF lookup is typically performed. If the frequency is variable, or if the overall intensity is to be maintained through a series of images rendered with different frequencies, a per-sample weight is introduced (this should follow from figure 1.2(e), e.g. if half of the integral was sampled twice as often). In the literature this weight is called alpha correction and is computed as in equation 1.3. While maintaining a fairly constant overall intensity, this correction by no means removes the errors associated with discrete approximations, figure 1.2(e).
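Equation 1.3 in code form (a small sketch; dxNew and dxRef correspond to Δx̃ and Δx):

#include <cmath>

// Opacity (alpha) correction per equation 1.3: adjusts a sample's alpha
// when the actual step length dxNew differs from the reference dxRef.
float alphaCorrect(float alpha, float dxNew, float dxRef)
{
    return 1.0f - std::pow(1.0f - alpha, dxNew / dxRef);
}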

1.2.4 Fusion in DVR

Aside from the general aspects of DVR, fusion introduces additional criteria. When fusing multiple overlapping volumes in DVR, the requirements of equations 1.1 and 1.2 regarding sequential color sample composition along the viewing direction still apply. Rendering the volumes one by one before a final composition would represent a sum that invalidates the iterative discrete approximation, since the samples would no longer be composited sequentially. Fusion is the art of allowing multiple volumes to be rendered without breaking any of these laws.

Segmented Rays

As discussed for the software Raycaster in [Grimm et al., 2004], different segments of a sampling ray can be processed separately; this approach is here called segmented rays. The segments are blended together while care is taken not to violate the blending order of the segments themselves or of their samples. The problem thus becomes one of finding and sampling all segments along the view direction that exhibit different combinations of overlapping volumes.

Although these segments can be chosen arbitrarily, in fused DVR they typically conform to the regions in the intersection pattern of the volumes (higher granularity can be applied for optimizations). As illustrated in figure 1.5, this pattern and its regions correspond well to set theory. As an example, in a rendering of scene (a), regions (b), (c) and (d) can be rendered separately in any order, while blending has to take place as (d) → (c) → (b). If composition is integrated in the rendering, as it is in most hardware CG to avoid temporary storage, then the rendering order also has to be (d) → (c) → (b).

Figure 1.5. To avoid wasted samples a scene can be split down to its intersections [Commons, 2008]: (a) Cu ∪ Sp, (b) Cu \ Sp, (c) Cu ∩ Sp, (d) Sp \ Cu.

Increased Complexity

As opposed to the single region in regular DVR, rendering and blending several small regions introduces an overhead in the overall rendering. This overhead stems from the necessity to keep track of the segments and their blending, and introduces additional computational and storage requirements. Since these issues are highly method specific, further discussion is deferred until chapter 3.

Sampling Schemes

Under the assumption that sampling is costly and should be kept to a minimum, volumes should ideally be sampled according to their respective data resolutions and not outside of their individual boundaries. Brute force sampling, figure 1.6(a), is hardly satisfying in this context, given no assumptions of aligned volumes or related volume resolutions. A globally selected frequency, figure 1.6(b), also carries an overhead as it forces oversampling of low resolution volumes. A per region uniform sampling frequency, figure 1.6(c), limits this overhead while introducing a variable sample frequency within individual volumes. Since such a variation alters the approximation error illustrated in figure 1.2(f), artifacts such as incorrect and abrupt color changes between neighboring pixels can appear; opacity correction alone is not sufficient to avoid this problem. Interleaved sampling, figure 1.6(d), was discarded by [Rösler et al., 2006] for introducing unspecified 'artifacts' and likewise deemed limited in [Plate et al., 2007] due to 'opacity errors'.


Figure 1.6. Sampling schemes for multiple overlapping volumes; the colored curves indicate sampled density for the different volumes along the view axis. (a) Brute force: sample all volumes everywhere according to the highest resolution of any single volume. (b) Global frequency: sample volumes within their own boundaries according to the highest resolution of any single volume. (c) Per region frequency: sample volume intersection regions according to the highest resolution among the volumes present in each region. (d) Interleaved sampling: sample all volumes independently according to their specific resolutions and interleave the samples between volumes.

Composition Schemes

Figure 1.7. Composition schemes for multiple overlapping volumes: (a) independent TF lookups per data sample, (b) combined TF lookup for data samples.

An interesting area of research, when it comes to using TFs for mapping data samples to color samples, is how to use more than one data value for each lookup. These are called multidimensional TFs and can, when fusion is present, be used to investigate new possibilities in how to composite volumes. One such composition is to use the overlap of two volumes and display the difference between samples rather than visualizing the volume data itself; this is sometimes used in heat or flow field visualizations. Independent TF lookups are, however, more commonly used and are the only ones used in this work.


1.2.5 Large Data Sets in DVR and Fusion

Figure 1.8. Rubik’s cube is an example of a 3x3x3 bricking.

Any data to be visualized through the GPU has to be present in VRAM at the time of rendering. In some cases, such as large volumes or large numbers of volumes, this presents a problem, as all required data takes up more space than is available on the GPU. One solution to this problem is to divide the volume into smaller parts, commonly called bricks, that are rendered separately so that only one such brick is required to be stored on the GPU at any time. Ordinary rendering methods still apply, although additional considerations have to be made regarding boundary constraints and interpolation.

In fusion, it is often desired to identify intersection regions between volumes. When dealing with fusion and multiple bricked volumes, this region identification has to be carried all the way down to brick level to ensure that no more than one brick from each overlapping volume must be present in VRAM at any time. The significance and implications of this are discussed in section 2.2.1.


Chapter 2

Theory and Design

This chapter is mainly concerned with fusion, as those parts of the solution were less known beforehand. The rendering, on the other hand, known to be implemented as Texture Slicing and Raycasting and thus documented throughout the literature, required substantially less design and is therefore treated directly in the implementation chapter.

2.1 Problem Description

As part of the caBIG™ initiative, the open source eXtensible Imaging Platform (XIP) has been developed and includes a pipeline for volume rendering. In short, the task is to extend and/or rebuild this existing rendering pipeline to support fusion of multiple volumes. Modules in the pipeline are expected to support large volumes through bricking and to include the two rendering methods of Texture Slicing and Raycasting. Since XIP is open source and developed using a plug-in methodology, key demands such as modularity, simplicity and extendability are a priority (described in more detail below). Furthermore, memory consumption and rendering speed are also prioritized, and interactive frame rates are desired.

Modularity, Simplicity and Extendability

Volumes should be represented in such a way that more than one fusion module can operate on the same data, with volumes being included or excluded at will on a per module basis, figure 2.1(Fusions). The same modularity should also apply to the rendering step, with different renderers operating independently against a single fusion module, figure 2.1(Renderings). Such a shared pipeline is a priority for producing split views or piecewise outputs from the same scene. The modularity also implies simplicity through clearly defined parts, and extendability is likewise simplified by the fact that single pieces of the pipeline can be replaced.

Figure 2.1. Modularity in the pipeline.

Confinements

Under the requirement of interactivity, this thesis does not cover any fusion schemes at the data level, such as pre-process re-sampling or fusion of data on disk. The idea is instead to fuse the individual contributions calculated from different data sets into a single image directly at render time, rather than fusing the data itself.

2.2 Approach

Following the problems of large data and the desired support for volume bricking, regions in the intersection pattern between volumes have to be identified. This implies some scheme to partition space. This section presents Binary Space Partitioning (BSP), together with a few other key elements, as a way to reach the goals and solve the stated problem.


2.2.1 Binary Space Partitioning

In short, BSP recursively subdivides space into pairs of subspaces by partitioning planes. All such conceptual spaces are called cells. Although the initial space can be bounded, such as a polyhedron, it does not have to be (in fact, in this thesis the initial space is unbounded and the root node represents the full world space in CG). The process is recursive, with all resulting subspaces in turn subdivided further, and proceeds until a specified criterion is met for the resulting cells. One of the most prominent features of BSP compared to other partitioning schemes is that the orientation of the partitioning planes is chosen arbitrarily rather than being axis aligned. The following text explains why simpler schemes are not sufficient and also describes how BSP can generate a tree structure.

Partitioning Alignment

Axis aligned methods are often simple and efficient but cannot in themselves accomplish accurate representations of all regions unless the volumes themselves are axis aligned. As illustrated in figure 2.2, regions are often approximated rather than represented using such methods. These inaccurate representations can trigger situations where region intersections end up completely inside approximations. With region subdivision down to brick level, this leads to several bricks per volume having to be present in VRAM at the time of rendering (if bricks are created using octrees, the required number can be as high as eight bricks). The Binary Space Partitioning (BSP) used in this work is a non axis aligned scheme capable of producing accurate representations of intersections of multiple convex polyhedra.

Tree Representation

Each subspace in BSP can be represented by a node in a binary tree structure called a Bsp-tree, where each node corresponds to a cell. In this tree, all internal nodes have one positive and one negative child, named after the partitioning plane half spaces they represent. The only nodes not associated with planes, and that have no children, are the leaf nodes of the tree. Consequently, the space represented by any node in the tree is defined by the initial space and all planes stored in its direct ancestors when applied in top-down order. It also means that, for any given node, every cell of its descendants constitutes a proper subset of the cell of that node. See figure 2.4(Bsp-tree) for an example and [Foley et al., 1996] for further reading.
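A minimal sketch of such a node, under assumed names (the thesis does not spell out its exact data layout): internal nodes store their partitioning plane and two children, while leaves store only the fragments of their cell.

#include <memory>
#include <vector>

// Hypothetical plane: all points p with dot(n, p) + d >= 0 lie in the
// positive half space.
struct Plane { float nx, ny, nz, d; };

struct VolumeFragment;  // per-volume geometry piece, see section 2.3.1

struct BspNode {
    Plane                         plane;      // valid for internal nodes only
    std::unique_ptr<BspNode>      positive;   // child in the positive half space
    std::unique_ptr<BspNode>      negative;   // child in the negative half space
    std::vector<VolumeFragment*>  fragments;  // fragments contained in this cell

    bool isLeaf() const { return !positive && !negative; }
};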



Figure 2.2. (a) Uniform axis aligned partitioning and (b) non uniform axis aligned partitioning yield region approximations, while (c) non axis aligned partitioning can be said to yield a representation.

2.2.2 Geometrical Homogeneity and Complete Cells

In addition to the common terms of space partitioning, an additional, or extended, definition of homogeneity is required for the work in this thesis.

Figure 2.3. Geometrical homogeneity. Cells (a), (b) and (c) are geometrically homogenous and thus complete, while (d) is incomplete. None of the cells are strictly homogenous.

Homogenous qualities in cells are an important aspect of BSP and often constitute the condition for breaking the recursive subdivision. Homogeneity as described in [Foley et al., 1996] occurs when there are no boundaries inside a cell, including boundaries towards empty space. However, in this implementation the subdivision is driven with the objective of finding intersecting regions amongst multiple convex polyhedra. No requirement exists to find any boundaries between these polyhedra and empty space, only between the polyhedra themselves. Strict homogeneity is therefore not required. Hence, the concept of geometric homogeneity is introduced. A cell is said to be geometrically homogenous if all polyhedra within the cell are equal and occupy the same space, i.e. their geometrical representations, including position, are equal. This means that while a cell may be non homogenous in a strict sense, by spanning both occupied and non occupied space, it can still be geometrically homogenous. It also follows that cells occupied by fewer than two polyhedra are geometrically homogenous by definition. In figure 2.3, cells (a), (b) and (c) are all geometrically homogenous (two of them by definition) while cell (d) requires further subdivision. A geometrically homogenous cell is called complete if it fulfills the criteria and is not subdivided further; thus all leaf nodes in a tree represent complete cells. Cells represented by internal nodes are always incomplete. When discussing a Bsp-tree where each cell is represented by a node, the completeness of the cell directly determines the completeness of the node.

2.2.3 Storage Structures

As a direct implication of the desire for minimal memory usage, the two data structures Object Pools and Segmented Arrays are introduced. These structures are available for main RAM only and are not applicable to GPU memory.

Object Pools

An object pool (or resource pool) is basically a stack where objects that are no longer used are stored. Instead of having objects allocated and destroyed on demand, this scheme keeps objects passive for future use and thus avoids the cost associated with frequent allocations (the actual cost is OS and runtime dependent). A slight overhead is generated from managing the stack, but it is often negligible in relation to the performance gains from avoiding frequent allocations. For more information see [Kircher and Jain, 2002].
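A minimal sketch of the idea (not the thesis implementation), with the stack of released objects backed by a std::vector:

#include <memory>
#include <vector>

// Minimal object pool sketch: acquire() reuses a previously released
// object when one is available, otherwise allocates a new one.
template <typename T>
class ObjectPool {
public:
    std::unique_ptr<T> acquire()
    {
        if (free_.empty())
            return std::make_unique<T>();
        std::unique_ptr<T> obj = std::move(free_.back());
        free_.pop_back();
        return obj;
    }

    // Return an object to the pool instead of destroying it.
    void release(std::unique_ptr<T> obj) { free_.push_back(std::move(obj)); }

private:
    std::vector<std::unique_ptr<T>> free_;  // the passive objects
};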

Segmented Arrays

Segmented arrays are beneficial in the same way as object pools in that they limit the number of allocations while keeping the memory footprint low, in a classic memory versus performance tradeoff. The idea is simply to segment the allocation of memory so that every time more memory is needed, an entire chunk is allocated. The memory layout can be thought of as a two-dimensional array where one row at a time is allocated. The size of these allocations can be selected depending on what kind of data the structure will hold, with an additional speed-up if the size is kept a power of two. In applications with a fairly time invariant memory footprint, the allocated chunks can remain allocated for fast direct access, or otherwise be released when no longer used.
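A minimal sketch of such a structure (the chunk size is an assumed value; this is not the thesis implementation). Growing one chunk at a time keeps existing elements in place, and a power-of-two chunk size lets the index split compile down to a shift and a mask:

#include <cstddef>
#include <memory>
#include <vector>

// Minimal segmented array sketch: grows one fixed-size chunk at a time,
// so existing elements never move.
template <typename T, std::size_t ChunkSize = 256>
class SegmentedArray {
    static_assert((ChunkSize & (ChunkSize - 1)) == 0, "power of two");
public:
    T& operator[](std::size_t i)
    {
        return chunks_[i / ChunkSize][i % ChunkSize];
    }

    void push_back(const T& value)
    {
        if (size_ == chunks_.size() * ChunkSize)  // all chunks full
            chunks_.push_back(std::make_unique<T[]>(ChunkSize));
        (*this)[size_++] = value;
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::unique_ptr<T[]>> chunks_;  // one allocation per chunk
    std::size_t size_ = 0;
};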


2.3 Design

Algorithm 1 describes the main steps in solving the fusion problem. It starts with the objective of finding spatial representations of all regions in the intersection pattern of the volumes through the use of BSP. Rendering can then be performed using these region representations, involving only those volumes that occupy each region.

Algorithm 1 Fusion Pipeline
Require: representations for all involved volumes
1. Insert all volumes as original fragments in the Bsp-tree root node
2. Apply BSP until all leaf nodes are complete and the Bsp-tree is built
3. Create proxy geometry according to camera position
4. Render proxy geometry with settings according to the represented volumes

2.3.1 Volume Representation

Before any fusion can be initiated, the volumes must be represented within the framework. The solution presented uses two main descriptions: one in the pipeline, and one called Volume Fragments (not to be confused with pixel fragments in CG) used in the fusion process.

In the Pipeline

This description contains only two things per volume: a pipeline specific index and a transformation matrix. Other volume specific information, such as resolution, transformation, storage, GL access point (texture unit in OpenGL) and LUT information, is maintained per volume but is not part of any encapsulating structure. The index and matrix pairs populate a list used by fusion modules in the environment, while all other info is made available directly in the rendering shaders. No need was identified for more comprehensive representations in the framework. This is in contrast to the more complex representations found in [Rösler et al., 2006], called V-objects, and in [Plate et al., 2007], under the name of lenses.

Volume Fragment

The second representation is used within fusion modules and consists of a purely geometrical description of each volume, called a volume fragment or simply fragment. These fragments are what drive the space partitioning, as described later. The fusion process begins with one fragment per volume, see figure 2.4(Original Fragments), before subdivision such that, in the end, several smaller fragments combine to form a complete volume, see figure 2.4(Resulting Fragments). While volumes in the pipeline can be shared by multiple fusion modules, each such module has a unique set of fragments.

Volume Fragment Boundaries

Each fragment holds a list of boolean flags, with one entry per original boundary of the represented volume, signifying whether the boundary is open or closed. Initially, all boundaries of a fragment are considered open. Once a boundary polygon of a fragment ends up on, or completely outside, a plane defining the cell to which the fragment belongs, that boundary is closed. This gives a way to control a cell's geometrical homogeneity and completeness as defined in section 2.2.2: if all boundaries of all fragments in a cell are closed, then the cell is complete. Considering the gray fragment in the middle of figure 2.3, belonging to the light blue square, this fragment would have a list of four boundaries (top, bottom, right, left) with all but the top boundary marked closed, as those original edges are no longer part of the fragment or are positioned on partition planes.
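A hedged sketch of what such a fragment might look like (all names and the exact layout are assumptions; the thesis does not list them):

#include <vector>

struct Vec3    { float x, y, z; };
struct Polygon { std::vector<Vec3> vertices; };  // convex boundary polygon

// Hypothetical volume fragment: the convex hull polygons plus one
// open/closed flag per original volume boundary.
struct VolumeFragment {
    int                  volumeIndex;     // pipeline index of the source volume
    std::vector<Polygon> hull;            // convex polygonal hull of the fragment
    std::vector<bool>    boundaryClosed;  // one flag per original boundary

    bool allBoundariesClosed() const
    {
        for (bool closed : boundaryClosed)
            if (!closed) return false;
        return true;
    }
};

// A cell is complete when allBoundariesClosed() holds for every fragment
// it contains.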

2.3.2 Fusion Overview

Figure 2.4. Fusion solution, with the main goal of finding spatial representations for the individual regions separated to the left.

All volumes are initially represented in the root node of a Bsp-tree with one volume fragment each, see figure 2.4(Original Fragments). BSP is then applied to the scene one plane at a time. This recursive procedure continues until all space is divided into complete cells, with a multitude of subdivided volume fragments representing each volume, see figure 2.4(Resulting Fragments). Since space is divided in a binary way and the plane normals are known, a view dependent order of the cells can be extracted from the Bsp-tree without any need for sorting in the render module.


Chapter 3

Implementation

The two main modules implemented in XIP, fusion and rendering, are described here. All implementation is done in C++ using the Open Inventor and OpenGL APIs, while the rendering execution is shared between OpenGL and GLSL. Object Pools and Segmented Arrays are used for the management of primitives such as Bsp-tree nodes and volume fragments. In particular, the pools are used as caches for the population of primitives within specific modules, while lists of primitives and storage of geometrical data are implemented using the segmented arrays. The VTune performance analyzer was used extensively for tracing bottlenecks during development of the structures.

3.1 Fusion Module

The fusion module is a conceptual grouping of the pipeline functionality that concerns the camera independent aspects of fusion and their underlying structures. Its main assignment is to divide space and to generate a representation of the resulting intersection pattern of the volumes.

3.1.1 Generating Region Representations

Once all volumes are represented in the root node as fragments, fusion is initiated and the recursive BSP method described in section 2.2.1 is carried out as in algorithm 2 (for a detailed version, see appendix B, algorithm 11). In short, an initial check is performed for each node regarding its completeness: if the node is complete, it is marked as a leaf and the recursive branch is closed. If the node exhibits fragments with open boundaries, i.e. it is incomplete, a plane is retrieved and the node is split before algorithm 2 is executed on its children.

Algorithm 2 Build node
Require: node to start recursive subdivision
1. if (node complete) then
2.   mark node as leaf
3. else
4.   find best partitioning plane for node
5.   create children and split node
6.   recursively run Build node on children
7. end if

Figure 3.1. Splitting a node by a specified plane comes down to a sorting and, if necessary, partitioning of all fragments within the node.

When splitting a node by a given plane, all fragments that do not directly intersect the plane are sorted among the child nodes, while those fragments that do intersect the plane are in turn split by the same plane. See figure 3.1 for an illustration, where the center fragment is split while the two fragments to the left and right are sorted. This scheme is repeated on every level, from fragments through polygons down to individual lines and vertices, see appendix B, algorithms 9 and 10. For efficient polygon clipping, the implementation is based on the Sutherland-Hodgman algorithm as described in [Foley et al., 1996]. This algorithm traverses the vertices of a polygon in CW or CCW order while keeping track of all vertices on the positive side of the plane, and creates new vertices at any edge plane intersections. The algorithm is here extended to support splitting of polygons, storing both resulting halves. Furthermore, the requirement on all levels for primitives to be convex is fulfilled by definition, as all entities resulting from a split of any convex entity by a plane are known to be convex. However, care must be taken to close the geometrical representations of the pieces created in the partitioning. In the case of polyhedra this means adding a polygon at the place of the intersection to close the hull.
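A sketch of the polygon-splitting step under assumed types, following the both-halves extension of Sutherland-Hodgman described above; the capping polygon that closes each resulting hull is omitted here:

#include <cstddef>
#include <vector>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };  // signed distance: dot(n, p) + d

static float signedDistance(const Plane& pl, const Vec3& p)
{
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// Split a convex polygon by a plane, keeping both halves. Vertices are
// assumed to be in consistent winding order; both outputs remain convex.
void splitPolygon(const std::vector<Vec3>& poly, const Plane& pl,
                  std::vector<Vec3>& positive, std::vector<Vec3>& negative)
{
    const std::size_t n = poly.size();
    for (std::size_t i = 0; i < n; ++i) {
        const Vec3& a = poly[i];
        const Vec3& b = poly[(i + 1) % n];
        const float da = signedDistance(pl, a);
        const float db = signedDistance(pl, b);

        if (da >= 0.0f) positive.push_back(a);
        if (da <= 0.0f) negative.push_back(a);

        // Edge crosses the plane: emit the intersection point to both halves.
        if ((da > 0.0f && db < 0.0f) || (da < 0.0f && db > 0.0f)) {
            const float t = da / (da - db);
            const Vec3 hit = { a.x + t * (b.x - a.x),
                               a.y + t * (b.y - a.y),
                               a.z + t * (b.z - a.z) };
            positive.push_back(hit);
            negative.push_back(hit);
        }
    }
}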


Figure 3.2. An on-plane threshold (the area within the dashed lines) is introduced in all intersection tests to counter the effects of numerical inaccuracies. Green geometries can be sorted to the top half space without being split, since their primitives are all considered on or above the partitioning plane.

3.1.2 On Plane Threshold

During plane geometry intersection tests, numerical errors occur due to the limited precision of floating point arithmetic. Situations where at least one vertex is positioned close to a plane can wrongly trigger a splitting of the polyhedron it belongs to. To avoid such situations, an on-plane threshold is introduced, giving the plane a thickness: points within this thickness are said to be on the plane and thus belong to both half spaces. Without a threshold, testing a single polygon is done by counting positive and negative vertices. With the threshold, this is expanded to also count the number of vertices positioned on the plane, within the threshold. If vertices exist strictly on one side of the plane or within the threshold, then the polygon is deemed not to intersect the plane. This extends to all types of geometry, as seen in figure 3.2, where all green geometry is handled as completely inside the upper half space while blue geometry intersects the plane and needs to be split. The thickness of the threshold is kept small, within the range of the precision of the float data type.
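A small sketch of the thresholded classification (the epsilon value is an illustrative choice, not the thesis's):

enum class Side { Positive, Negative, On };

// Classify a signed distance against the plane using the on-plane
// threshold: values within +/- epsilon count as lying on the plane and
// thus belong to both half spaces.
Side classify(float signedDistance, float epsilon = 1e-5f)
{
    if (signedDistance >  epsilon) return Side::Positive;
    if (signedDistance < -epsilon) return Side::Negative;
    return Side::On;
}

// A convex polygon intersects the plane only if it has vertices strictly
// on both sides; 'On' vertices never force a split.
bool intersectsPlane(int numPositive, int numNegative)
{
    return numPositive > 0 && numNegative > 0;
}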

3.1.3 Partition Plane Selection

If a Bsp-tree node is not complete and should be subdivided further, a partitioning plane has to be found. As described in section 2.3.1, the completeness of a cell is determined through the boundary states of all fragments in that cell, where each closed boundary brings the subdivision closer to completion. Any plane has the potential to close one or more boundaries, but only a limited set of planes is guaranteed to do so. This limited set is defined as all planes that coincide with an open boundary of any of the fragments in the cell. Choosing partitioning planes from this set provides a way to reach a completed subdivision, as in algorithm 3.


Algorithm 3 Find partitioning plane
Notation: Fx is fragment x in node, such that x ∈ [0, NF]
Notation: Bxk is boundary k of fragment x, such that k ∈ [0, NBx]
Require: node T for list of fragments
1. for all (Fi and Fj in T such that 0 ≤ i < j ≤ NF) do
2.   for all (open Bik in Fi) do
3.     if (Bik separates Fi and Fj) then
4.       store boundary Bik as partition plane in T
5.       close boundary Bik in fragment Fi
6.       break /* jump to if (plane found) */
7.     end if
8.   end for
9. end for
10. ...
11. if (plane found) then
12.   split T with plane
13. else
14.   close any remaining open boundaries in all fragments in T
15.   mark T as leaf
16. end if

Figure 3.3. The complexity of the subdivision depends on how the planes are chosen: (a) 6 cells, 5 planes; (b) 10 cells, 9 planes.

Selection Scheme

While the set of boundary closing planes provides a good selection of relevant planes, the internal order in which they are applied must be defined. One of the main goals in determining the order of the planes is to minimize the complexity of the Bsp-tree. Although planes from the same limited set were chosen in both figure 3.3(a) and 3.3(b), the complexity is almost doubled for the latter. As a direct result of this, it can be argued that choosing a plane that completely separates as many fragments as possible, while keeping intersections to a minimum, is preferable. This is demonstrated in the choice of initial planes in figures 3.3(b) (long double arrowed diagonal line) and 3.3(a) (long double arrowed vertical line), where the latter sets up a much better position for closing multiple boundaries and keeping the number of cells to a minimum. However, spending resources finding the "best" plane comes with a penalty: all open boundary planes of all fragments have to be tested against all other fragments, in a vertex by vertex manner, before any choice can be made. Thus, there is a tradeoff between Bsp-tree complexity and performance, and which solution is optimal depends on the situation. If a scene consists of few volumes but behaves such that the tree must be generated often, a low complexity tree might not be worth the extra time it takes to compute. On the other hand, in a more static environment with high complexity, the extra generation time can be negligible and well worth the reduction in tree complexity. In this work, two schemes of different complexity are implemented, in accordance with the two situations described above. A possible scoring heuristic is sketched below.
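The thesis does not spell out its exact selection rule, so the following is a hedged sketch of the "best plane" idea described above: score each candidate by the fragments it cleanly separates minus those it would split, and pick the highest scoring candidate. All names and weights here are illustrative assumptions.

#include <vector>

struct Plane;           // partitioning plane candidate
struct VolumeFragment;  // fragment within the node (section 2.3.1)

enum class Relation { Separated, Split, Untouched };

// Hypothetical helper: how the candidate plane relates to one fragment
// (cleanly on one side with a boundary closed, split by it, or untouched).
Relation classifyFragment(const Plane& plane, const VolumeFragment& fragment);

// Score a candidate plane: reward fragments it cleanly separates and
// penalize fragments it would split.
int scorePlane(const Plane& plane,
               const std::vector<VolumeFragment*>& fragments)
{
    int score = 0;
    for (const VolumeFragment* f : fragments) {
        switch (classifyFragment(plane, *f)) {
            case Relation::Separated: score += 2; break;
            case Relation::Split:     score -= 3; break;
            case Relation::Untouched:             break;
        }
    }
    return score;  // the candidate with the highest score is chosen
}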

3.2 Render Module

The second module of the pipeline performs the actual rendering and conceptually begins after the creation of the Bsp-tree in the fusion module. Its main parts include the creation of the render queue and its traversal, in which the selection of instantiated shaders is a central step.

3.2.1 Render Queue

To keep the pipeline consistent regardless of rendering scheme, and to further emphasize the separation of the fusion and render modules, the idea of a render queue is introduced. The linear queue is created by a traversal of a Bsp-tree where each leaf cell generates a queue entry, see figure 2.4(Render queue) for an illustration. Proxy geometry to be used in the rendering, as discussed per rendering method in section 1.2.3, is stored in each entry along with additional information such as the volumes present. Each render module maintains a private queue.
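A hedged sketch of what a queue entry might hold (assumed names and layout; the thesis does not list them):

#include <vector>

struct Vec3    { float x, y, z; };
struct Polygon { std::vector<Vec3> vertices; };

// Hypothetical render queue entry: one entry per complete leaf cell, in
// the view dependent order extracted from the Bsp-tree.
struct RenderQueueEntry {
    std::vector<int>     volumes;        // indices of the volumes present in the cell
    std::vector<Polygon> proxyGeometry;  // slicing polygons or the cell hull itself
};

using RenderQueue = std::vector<RenderQueueEntry>;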

As all cells to be rendered are leaves, known to be geometrically complete, the polygonal hull of any fragment within a cell can be used as a geometrical description of all occupied space within that cell. In Texture Slicing, the intersection points between this description and the desired slicing planes are used to create slicing polygons. For Raycasting, the polygons of the hull itself are used directly as proxy geometry. Additional polygons are inserted in case the fragment is clipped by the GL front or back clip planes, since this would otherwise create holes in the rendered geometry.


3.2.2 Cell Based Rendering

With the inherent depth sort among the cells, rendering becomes the task of iteratively rendering all cell entries in the queue. The combination of volumes in each entry is used to select a shader according to one of the instantiation schemes described in section 3.2.4. To avoid artifacts due to misplaced samples between adjacent rays on opposite sides of a cell border, all sample positions in all cells must be enforced to follow a global pattern. In Texture Slicing using view dependent slicing polygons, this comes down to enforcing a z-offset in the creation of each plane, such that planes in adjacent cells always end up edge to edge. For Raycasting, the same type of offset is enforced in the shader as a manipulation of the ray entry point depending on the camera position.

3.2.3 Rendering Methods

Two hardware renderers are implemented using GLSL programs, one based on Texture Slicing and the other on Raycasting. Both renderers share common features in shader selection and render queue execution, as described in previous sections. As stated in section 3.2.1, each renderer initially fills its queue entries with appropriate proxy geometry before rendering is carried out according to algorithm 4 or 5. This step is slightly more complex in cell based rendering than in regular DVR, since the areas are formed by arbitrary convex polyhedra as opposed to being cuboid. Furthermore, since rendering is performed on hardware using a GL API, the state of this API must be set according to the desired functionality. In particular, the blending is set according to the method of choice and its implementation, such that the requirements of sequentiality stated in section 1.2.3 are not violated. The choice of which sampling schemes from section 1.2.4 to implement for each rendering method was made based on availability and time, as no requests were made by SCR.

Texture Slicing

Creating slice polygons for regular DVR can be done analytically by calculating the intersections of planes with the bounding cube. However, the complexity of the plane intersection calculations increases in the fusion case due to the non cuboid polyhedra defining the rendering areas. Also, the total number of slices increases with an increasing number of cells, which in turn adds more complexity. The implications of this are noticeable in the results of chapter 4 and discussed in chapter 5.

The rendering process of the Texture Slicer implemented in XIP follows algorithm 4, where the sample point manipulation discussed above is already present in the placement of the proxy geometry and thus does not have to be considered in the shader programs. The only sampling scheme implemented in the Texture Slicer is Global Frequency.

Algorithm 4 Texture Slicing
Notation: Ex is entry x in queue, such that x ∈ [0, NE]
Require: queue Q
1. bind render buffer as render target
2. enable accumulative blending
3. for all (entries Ei in Q such that 0 ≤ i < NE) do
4.   pick shader according to present volumes in Ei and set uniforms
5.   render front facing geometry in Ei
6. end for

Raycasting

Raycasters for standard DVR can be implemented as single-pass shaders, avoiding costly state changes in the GL, where exit points for triggered rays are calculated directly in the shader from the planes of the bounding cuboid (this is sometimes reversed, so that exit points are triggered and entry points are calculated). The polyhedral nature of the rendering areas in cell based rendering requires an additional pass, in which the exit points are rendered into a texture for access in the main rendering step. This two-pass approach is further extended so that the resulting alpha after the rendering of one cell is accessible in the rendering of the next (it is stored in the alpha channel of the exit point texture). A ping-pong like swapping of render targets is thus performed twice for each cell, all according to the segmented rays discussed in section 1.2.4.

Both Global Frequency and Interleaved Sampling variants are implemented for Raycasting, and the manipulation of entry points for inter cell consistency is implemented directly in the shader programs. An additional sampling point manipulation is also introduced, as a conversion from the native spherical sampling grid of Raycasting to a uniform grid matching the one found in Texture Slicing.

3.2.4 Instantiated Shaders

In a naive shader implementation, the execution of specific volumes would be governed by conditionals, as seen in example 3.1 (top part). However, poor hardware support for conditionals in shader programs on the GPU prevents a single shader from effectively skipping volumes in this manner.



Algorithm 5 Raycasting
Notation: Ex is entry x in queue, such that x ∈ [0, NE]
Require: queue Q
1. for all (entries Ei in Q such that 0 ≤ i < NE) do
2.   bind render buffer as texture
3.   bind exit point buffer as render target
4.   bind exit point shader and set uniforms
5.   enable replacing blending
6.   render back facing geometry in Ei
7.   bind exit point buffer as texture
8.   bind render buffer as render target
9.   pick shader according to present volumes in Ei and set uniforms
10.  enable accumulative blending
11.  render front facing geometry in Ei
12. end for

Instead, a more advanced scheme is introduced where one shader is compiled for every number of volumes to be fused. This way, five shaders are used for a five volume setup, with the first shader sampling only a single volume, the second sampling two volumes, and so on. While removing the overhead of poor conditional execution, the programmer no longer has the freedom of arbitrary blending for specific volumes, although they can still be manipulated using specific TFs. If two sequential cell renderings share the same number of volumes, no state change has to be made, as the same shader is kept active and only uniform variables are updated.

Yet another scheme is implemented that takes specific volumes into account, at the cost of a growing number of compiled shaders. The baseline this time is that one shader is compiled for each unique combination of volumes; a five volume setup would thus result in 32 unique shaders. Freedom in blending for specific volumes is restored, while an overhead is created as a state change is performed for virtually every rendered cell.

Writing 32, or even five, shader source files for a five volume setup is hardly an option, for obvious reasons such as code duplication. This problem is solved by using pre-compilation macros to include or exclude volume contributions, see example 3.1 (bottom part), as opposed to a naive if-clause, example 3.1 (top part). Source code is written only once before being instantiated multiple times using different macro setups. Both non-naive schemes described above utilize this in the implementation.

Example 3.1: If-clause dependent shader code vs. compiler macros

{
    ...
    // naive: run-time conditional ('volumeXPresent' is an assumed uniform flag)
    if (volumeXPresent)
        color += (1.0 - color.a) * sampleVol( volumeX, pos );
    ...
}

{
    ...
    // instantiated: the contribution is included or excluded at compile time
    #ifdef VOLUME_X
    color += (1.0 - color.a) * sampleVol( volumeX, pos );
    #endif
    ...
}
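On the host side, such instantiation can be done by prepending a different set of #define lines to the shared source before compilation. A minimal sketch using standard OpenGL calls (assumed names; this is not the actual XIP code):

#include <GL/glew.h>
#include <string>
#include <vector>

// Compile one instance of the shared fragment shader source, with one
// #define per volume that the instance should sample (e.g. "VOLUME_X").
GLuint instantiateShader(const std::string& sharedSource,
                         const std::vector<std::string>& volumeMacros)
{
    std::string source = "#version 120\n";
    for (const std::string& macro : volumeMacros)
        source += "#define " + macro + "\n";
    source += sharedSource;

    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    const char* src = source.c_str();
    glShaderSource(shader, 1, &src, nullptr);
    glCompileShader(shader);
    return shader;  // error checking omitted for brevity
}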

3.3 Extra Functionality

In addition to the core functionality of DVR, rendering volumes, there exist several performance optimizations and functional tools that increase the value of the visualization. Some of these additions have been included in the pipeline and are presented here.

3.3.1 Clip Planes and Early Ray Termination

The importance and usage of volume clipping is thoroughly discussed in [Engel et al., 2006] chapter 15. In this work, volumes can be clipped by the insertion of clip planes on a per fusion module basis, thus clipping all volumes within the Bsp-tree of that module. A potential acceleration of the generation of the Bsp-tree is also noticeable for every clip plane that is added, thanks to a lowered initial complexity: some volumes, or at least parts of volumes, can be cut away, reducing both the overall number of primitives and the number of potential partitioning planes.

Standard Early Ray Termination (ERT) is available for Raycasting, in such a way that rays are terminated once a certain threshold is reached for the saturation of the pixel associated with that ray. And although an already saturated ray can be re-triggered as additional cells are processed, the rendering is aborted before the stepping along the ray is initiated. This topic is discussed in both [Krüger and Westermann, 2003] and [Engel et al., 2006].
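A sketch of the per-cell check this implies (the saturation threshold of 0.99 is an illustrative value, not the thesis's):

// Early ray termination, cell based variant: the alpha accumulated in
// previous cells is consulted before any stepping through this cell.
void castRayThroughCell(float accumulatedAlpha /* from previous cells */)
{
    const float ertThreshold = 0.99f;  // illustrative saturation threshold
    if (accumulatedAlpha >= ertThreshold)
        return;  // abort before the ray stepping is initiated

    // ... march through the cell, compositing samples front to back ...
}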

3.3.2 Fusion with Simple Convex Mesh Geometry

Originally implemented for crude representations of medical instruments integrated in DVR, support exists in the implementation for correct fusion between simple convex polygon geometry, called mesh geometry, and the volumetric data. This way, standard polygon models can be present in DVR, see algorithm 6, with correct blending and with the use of textures or lighting made to the likeness of instruments or tools. The requirement of simplicity lies in the fact that a mesh is inserted like any other volume and thus affects the complexity of the tree accordingly. A 20,000 polygon mesh present inside a volume would, for example, cause on the order of 20,000 subdivisions. Below, a scheme is presented for Raycasting to address this issue.

Algorithm 6 Fusion with simple mesh geometry
1. Add mesh as a 'volume' to the tree
2. Generate Bsp-tree and render queue
3. ...
4. for all (cells containing the geometry) do
5.   render back facing mesh polygons
6.   render volume data
7.   render front facing mesh polygons
8. end for

Proposed Scheme for Fusion with Complex Mesh Geometry

For more complex geometry, one approach is to add a simple bounding box as a 'volume' in the Bsp-tree and have the fusion take place as in algorithm 7. This scheme only works with Raycasting and is not yet implemented in the pipeline.

Algorithm 7 Fusion with advanced mesh geometry
1. for all (cells containing the mesh bounding box) do
2.   render surrounding volume data (entry and exit points come from front and back facing hull polygons, respectively)
3.   render back volume data (entry points come from front facing hull polygons or back facing mesh polygons, whichever is closest; exit points come from back facing hull polygons)
4.   render back facing mesh polygons
5.   render intermediate volume data (entry and exit points come from front and back facing mesh polygons, respectively)
6.   render front facing mesh polygons
7.   render front volume data (entry and exit points come from front facing hull and mesh polygons, respectively)
8. end for


Chapter 4

Results

The results presented in this chapter include comparisons of partition plane selection schemes and the impact of volume placement on tree generation time and complexity. Certain aspects of the fusion and rendering modules are also presented with illustrations. All benchmark timing was performed using VTune Performance Analyzer (VTune for short), while application frame rates were measured directly in XIP.

All tests were run on a 2.8GHz Pentium 4 machine equipped with 1GB RAM and a GeForce 8800 GT graphics card with 512MB VRAM. The head data set used for rendering has a 256³ resolution at 12 bits.

Figure 4.1. Two different scenes are used in all performance testing: one dense worst case scenario and one sparse situation. (a) Dense proxy geometry. (b) Dense scene. (c) Sparse proxy geometry. (d) Sparse scene. The sparse geometry in (c) also includes a big enclosing volume that is left out in this figure for illustrative purposes.

All final results are directly dependent on the complexity of the Bsp-trees, as highlighted in section 1.2.4. This complexity is in turn highly dependent on the relative placement of the involved volumes, and testing is performed using two different scene setups. As seen in figure 4.1, a dense scene represents a worst case scenario, while a sparse scene depicts a distributed scenario where several small volumes are rendered inside an encapsulating big volume.

4.1 Storage Structures

Figure 4.2. Relative VTune benchmark results for three tested allocation methods. The developed Segmented Arrays and Object Pools presented in section 2.2.3 are compared against standard fixed-size C++ arrays and a naive allocate-on-demand scheme. A dense and computationally expensive scene was used during the experiments.

A direct result of the Object Pools and Segmented Arrays is, as seen in figure 4.2, that relatively little speed has to be sacrificed while the memory footprint is considerably smaller compared to standard C++ arrays. While being large enough for the specific test cases, the C++ arrays in these charts cannot handle arbitrary memory demands; the segmented arrays, on the other hand, handle all cases. The third pair of columns, belonging to the New/Delete memory scheme, is mostly present as a reference. Timing was performed with VTune in a test environment that was executed between ten thousand and one million times.
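As a rough illustration of the object pool idea (the actual classes from section 2.2.3 differ in detail; the class name and chunk size here are assumptions):

    // Minimal object pool sketch: objects are allocated in fixed-size
    // chunks and handed out sequentially, avoiding one heap allocation
    // per object. Released objects are recycled through a free list.
    #include <cstddef>
    #include <vector>

    template <typename T, std::size_t ChunkSize = 256>
    class ObjectPool
    {
    public:
        T* allocate()
        {
            if (!freeList_.empty())             // recycle a released object
            {
                T* obj = freeList_.back();
                freeList_.pop_back();
                return obj;
            }
            if (chunks_.empty() || used_ == ChunkSize)
            {
                chunks_.push_back(new T[ChunkSize]);  // grow by one chunk
                used_ = 0;
            }
            return &chunks_.back()[used_++];
        }

        void release(T* obj) { freeList_.push_back(obj); }

        ~ObjectPool()
        {
            for (std::size_t i = 0; i < chunks_.size(); ++i)
                delete[] chunks_[i];
        }

    private:
        std::vector<T*> chunks_;
        std::vector<T*> freeList_;
        std::size_t used_ = ChunkSize;  // forces a first chunk on first allocate
    };

The pool trades a slightly larger reserved footprint within each chunk for predictable allocation cost, which matches the small speed sacrifice visible in figure 4.2.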

4.2 Bsp-tree Generation and Complexity

Tree complexity, here measured in number of primitives (nodes, fragments, polygons), grows roughly as nodes ∝ volumes³, as seen in figure 4.3(a), and the storage requirements for polygons follow the same pattern. This growth rate is only apparent in scenes with dense volume placement, i.e. worst case scenarios where all volumes overlap each other without relative alignment. A comparison of VTune 'Minimum Time' performance between dense and sparse scenes in charts 4.3(c) and 4.3(d) shows that a sparse placement directly translates to shorter tree generation times, with a time gain between 40% and over 300% for increasing numbers of involved volumes.


Figure 4.3. Time, complexity and storage charts for generation of Bsp-trees with varying numbers of volumes. Noticeable differences appear depending on partition plane selection scheme and scene density. (a) Complexity in primitives. (b) Storage requirements. (c) Generation measurements in VTune (benchmarking) and XIP (application) for a dense scene (worst case); the complexities for the 'Min Time' and 'Min Complexity' schemes are almost identical here due to the scene density. (d) Generation times (columns) and complexities (lines) for a sparse scene (good natured).

4.3 Partition Plane Selection Methods

Generation times for Bsp-trees in figure 4.3 are of course directly proportional to their complexity. Even so, since the complexity in turn depends both on the placement of the volumes and on how the partitioning planes are selected, some interesting remarks can be made. As discussed in section 3.1.3, the complexity of the tree can possibly be lowered if more resources are spent on partition plane selection; hence a tradeoff between complexity and speed arises. This is illustrated by the difference in tree complexity in figure 4.4 and confirmed by the time charts in figure 4.3(d), which even indicate an overall speed-up for the complexity minimizing strategy, a gain derived directly from its lowered complexity.


Figure 4.4. Comparison of the two implemented ways to choose partitioning planes and their effect on the complexity of a simple scene: (a) Minimum Time, (b) Minimum Complexity. The performance oriented method in (a) spends less time per plane selection but results in higher complexity.

Such gains, however, only appear when the volume placement is sparse, e.g. figure 4.1(c). If the volume placement is dense, the extra resources spent on partition plane selection become futile and result in an overall penalty. The Bsp-tree in figure 4.7(a) is an example of this, as its resulting structure is independent of which selection scheme is used. Also, figure 4.3(c) shows a time penalty for the 'minimum complexity' scheme in XIP, while the complexities (not illustrated) for both schemes are similar.
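A minimal sketch of the idea behind the 'minimum complexity' strategy is given below; the types and the cost measure (the number of fragments a candidate plane would split) are illustrative assumptions rather than the exact heuristics of the implementation.

    // Sketch: pick the candidate partitioning plane that splits the
    // fewest fragments. A 'minimum time' scheme would instead accept
    // the first usable candidate without scoring.
    #include <cstddef>
    #include <vector>

    struct Plane    { float a, b, c, d; };    // plane equation ax+by+cz+d = 0
    struct Fragment { /* polygon data omitted */ };

    // Assumed helper: true if 'plane' cuts through 'frag'.
    bool splitsFragment(const Plane& plane, const Fragment& frag);

    const Plane* selectMinComplexityPlane(
        const std::vector<Plane>& candidates,
        const std::vector<Fragment>& fragments)
    {
        const Plane* best = 0;
        std::size_t bestCost = static_cast<std::size_t>(-1);

        for (std::size_t i = 0; i < candidates.size(); ++i)
        {
            std::size_t cost = 0;             // fragments this plane splits
            for (std::size_t j = 0; j < fragments.size(); ++j)
                if (splitsFragment(candidates[i], fragments[j]))
                    ++cost;

            if (cost < bestCost)
            {
                bestCost = cost;
                best = &candidates[i];
            }
        }
        return best;  // 0 if there are no candidates
    }

The extra inner loop over all fragments is exactly the additional cost per plane selection that only pays off when a sparse placement leaves room for a smaller tree.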

4.4 Real Application Impact

So far, all performance measurements discussed have been made in VTune. However, there exists a close relation between real application measurements made in XIP and VTune benchmarking. This is apparent in figure 4.3(c), where real application tests closely follow the synthetic estimations. This relation holds as long as no other major bottlenecks (typically rendering) appear in the pipeline for the measured scene.

4.5 Proxy Geometry

The proxy geometry for Texture Slicing and Raycasting is shown in figures 4.6(a) and 4.6(b), with the final rendering in 4.6(c).

Figure 4.5. Three points on proxy geometry: the relative amount needed for Texture Slicing and Raycasting, the portion of it that is queue geometry (which needs to be redrawn every frame during interaction), and the growth of both with increased scene density. (a) The queue makes up the greater part of the proxy geometry consumption for Texture Slicing. (b) Raycasting carries little interaction overhead with low amounts of queue geometry.

As can be seen in the charts of figure 4.5, the amount of geometry needed for Texture Slicing grows drastically with increased scene complexity. For a dense scene during interaction, roughly 60'000 polygons need to be recomputed each frame for Texture Slicing; the same number for Raycasting is a few thousand. An increase from 256 to 512 in sampling depth for Texture Slicing results in an 88% increase of proxy geometry in a sparse scene, while in a dense scene the increase is 60%. Raycasting proxy geometry is not dependent on the sampling depth.

Figure 4.6. Screenshots of a fused rendering of three skulls: (a) proxy geometry for Texture Slicing, (b) proxy geometry for Raycasting, (c) final rendering. Rendering runs at 25 and 15 fps for Texture Slicing and Raycasting respectively, on a 512×512 viewport with 512 slices on a unit volume.


Figure 4.7. Screenshots of a fused rendering of two volumes: (a) Bsp-tree, (b) parts, (c) queue. Grey polyhedrons represent incomplete cells (internal nodes) while colored ones imply complete cells (leaves).

4.6 Fusion Module Results

Figure 4.7 illustrates a small Bsp-tree and its resulting queue for a dual volume intersection. For a setup with the volumes aligned in one dimension, the tree exhibits a total of 13 fragments in 9 nodes, resulting in a queue with 5 entries. The non axis aligned partitioning required to support bricking, as discussed in chapter 2, is apparent in figure 4.7(b).

4.7 Render Module Results

Overall application performance is highly dependent on several factors. The influences of interaction, scene complexity and sampling density on rendering speed have been isolated and are illustrated in figures 4.8(a), 4.8(b) and 4.8(c) respectively. Previously in this chapter, the relation between scene volume placement and Bsp-tree generation times was demonstrated. In line with this relation, a comparison between the dense and sparse placement measurements in figure 4.8(b) and those for doubled sampling density in 4.8(c) shows that volume placement has a higher impact on the overall frame rate than sampling density. Raycasting exhibits an invariance in rendering speed in terms of interactivity, while Texture Slicing demonstrates a considerable drop in frame rates once camera interaction is present. The bottom blue line in figure 4.8(a) is in fact two almost identical lines, with and without camera interaction for Raycasting. This effect originates in the amount of proxy geometry slices that need to be recalculated every frame as the camera changes position.

Figure 4.8. Frame rate charts of isolated factors regarding rendering speed. (a) Texture Slicing (TS) displays a considerable drop in frame rates during interaction while Raycasting (RC) numbers remain identical. (b) Comparison of rendering speed with respect to scene complexity when rendering with a density of 512 samples for a unit volume. (c) Effects on rendering speed caused by increased sampling density in a sparse scene.

In figure 4.5(a) this corresponds to the render queue (red) parts of the columns. The effect also grows if the complexity of a tree is, for any reason, increased.

Figure 4.9. Difference for identical renderings between Raycasting and Texture Slicing, shown as the simple RGB difference multiplied by 100: (a) 8 bit buffer, (b) 16 bit floating point buffer, (c) 32 bit floating point buffer, (d) rendering.

Tests were performed on the relative difference between images rendered with Texture Slicing and Raycasting at various texture and buffer precisions on the GPU. The difference is calculated per RGB channel as 100.0 · |raycasting(x_i, y_j) − slicing(x_i, y_j)|. As seen in figure 4.9, the difference decreases with increased buffer precision. The noticeable colored artifacts in 4.9(a) are due to the Raycasting entry and exit points becoming faulty when low precision buffers are used. No distinguishing differences in rendering speed were detected during the comparisons of different buffer precisions. This experiment also emphasizes the modularity of the pipeline, as the renderings of Texture Slicing and Raycasting as well as the comparison calculations are done in realtime using a single fusion module. On top of this, the proxy geometry is also rendered before all parts are visualized in a split viewport, as seen in figure 4.10.
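A minimal sketch of the comparison measure is given below; in the pipeline it runs as a realtime shader pass, whereas here it is written as CPU code over two assumed float RGB images.

    // Per-channel difference image: 100 * |raycasting - slicing|, clamped
    // to the displayable range. 'Image' is an assumed simple container.
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Image
    {
        int width, height;
        std::vector<float> rgb;  // width * height * 3, values in [0, 1]
    };

    Image differenceImage(const Image& raycasting, const Image& slicing)
    {
        Image diff = { raycasting.width, raycasting.height,
                       std::vector<float>(raycasting.rgb.size()) };

        for (std::size_t i = 0; i < diff.rgb.size(); ++i)
            diff.rgb[i] = std::min(
                1.0f, 100.0f * std::fabs(raycasting.rgb[i] - slicing.rgb[i]));

        return diff;
    }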

Figure 4.10. Four rendering modules operate on a single fusion module to produce a split viewport visualization.
