
Department of Science and Technology

LiU-ITN-TEK-A--17/032--SE

Concepts of Hybrid Data Rendering

Torsten Gustafsson

2017-06-21

Concepts of Hybrid Data Rendering

Thesis carried out in Media Technology at the Institute of Technology, Linköping University

Torsten Gustafsson

Supervisor: Rickard Englund
Examiner: Ingrid Hotz

Norrköping, 2017-06-21


Copyright

The publishers will keep this document online on the Internet - or its possible

replacement - for a considerable time from the date of publication barring

exceptional circumstances.

The online availability of the document implies a permanent permission for

anyone to read, to download, to print out single copies for your own use and to

use it unchanged for any non-commercial research and educational purpose.

Subsequent transfers of copyright cannot revoke this permission. All other uses

of the document are conditional on the consent of the copyright owner. The

publisher has taken technical and administrative measures to assure authenticity,

security and accessibility.

According to intellectual property law the author has the right to be

mentioned when his/her work is accessed as described above and to be protected

against infringement.

For additional information about the Linköping University Electronic Press

and its procedures for publication and for assurance of document integrity,

please refer to its WWW home page:

http://www.ep.liu.se/

Concepts of Hybrid Data Rendering

A thesis performed in Media Technology at Linköping University.

Torsten Gustafsson

Examiner: Ingrid Hotz

Supervisor: Rickard Englund

Norrköping, June 25, 2017

Department of Science and Technology, Linköping University

Abstract

This thesis describes methods for advanced visualization of 3D data. Order-independent transparent data from multiple datasets of varying types can be combined using an A-buffer. In this thesis, an implementation of such a buffer is presented, along with an optimisation of the A-buffer based on discarding unnecessary fragments. A line integral convolution method for 3D volumes is also presented, where the choice of noise input and the resolution of the input volume compared to the output volume are discussed. The resulting method is able to visualize multiple types of data with order-independent transparency. The rendering speed is generally fast enough for an interactive user experience.

Contents

1 Introduction
   1.1 Purpose
   1.2 Report Structure
2 Background
   2.1 Visualization of Data
   2.2 Mesh Rendering
   2.3 Volume Rendering
   2.4 Related work
   2.5 Inviwo
   2.6 Rendering of Volumes and Meshes
      2.6.1 Differences of Volume and Mesh Data
      2.6.2 Order-Independent Transparency
      2.6.3 Hybrid Data Rendering
      2.6.4 Line Integral Convolution For Volumes
   2.7 A-Buffer Rendering Methods
      2.7.1 Using 3D Textures
      2.7.2 Per-Pixel Linked Lists
      2.7.3 Using Fragment Pages for Improved Cache-Coherency
      2.7.4 Early Fragment Discarding
3 Method
   3.1 Rendering Hybrid Data
   3.2 A-buffer and Meshes
   3.3 3D Line Integral Convolution
      3.3.1 Random Noise-Volume Generation
4 Results and Evaluation
   4.1 A-buffer method
      4.1.1 Customizable Properties
   4.2 3D-LIC method
   4.3 Performance
      4.3.1 Memory Utilization
      4.3.2 Rendering Speed
   4.4 Future Work
5 Case Study
   5.1 Visualization of a Heart Dataset
      5.1.1 A-buffer Methods
      5.1.2 3D-LIC
   5.2 Visualization of Protein Molecules

1 Introduction

In visualization we try to develop the concepts and tools necessary for understanding data. This data can look very different depending on the context; data sets may consist of scalar, vector, volumetric or geometric data, to name a few. In many applications it is a combination of these different data types that makes for an intuitive visualization.

When performing advanced volume rendering it is sometimes necessary to render geometric data on top of volumetric data sets. To achieve this, a hybrid data visualization method needs to be used. A problem with rendering hybrid data sets is how to render the different data sets in a correctly depth-sorted, per-pixel order.

There exist many different methods for rendering volumetric data, some of which will be discussed here. This thesis focuses primarily on volume raycasting in combination with mesh rendering, as well as 3D line integral convolution.

1.1 Purpose

This thesis aims to discuss and present some interesting hybrid data visualization methods, which may be useful when the more traditional rendering methods are not enough.

Much of the focus has been on implementing an efficient A-buffer renderer to store fragments before rendering them to screen. This allows us to render any kind of data together in the same view. This type of rendering allows us to render transparency of volumes and geometry without extra work, since the rendering is done in two passes, and the alpha value is easily stored together with the colour values.

Many of the visualizations used in this paper are based on a dataset of a human heart. This dataset will be discussed in detail in chapter 5.

A method for reducing the amount of data that has to be passed to the A-buffer will be presented. We choose to call this method "Early Fragment Discarding", because of its similarities with the commonly used method called "Early Ray Termination" in volume raycasting.

The actual implementation was made in Inviwo [Inv]. The main reason for this was that Inviwo is developed by the people at the InfoVis department, which made it easy to work around unexpected problems with the implementation. It is also a powerful tool for performing visualizations such as the ones presented in this thesis.

The thesis aims to answer the following research questions:

• Can a hybrid data visualization method be implemented in Inviwo?
• How can we render such a visualization in real-time?
• Which problems will be the most important to solve in terms of performance?

1.2 Report Structure

The structure of this thesis report is as follows:

• The second chapter, Background, will discuss why we want to visualize data, and present some of the methods used for it, as well as some related work in the field. As the work focuses a lot on A-buffer methods, theory on such methods will be presented, as well as a performance improvement to these, based on early discarding of unnecessary fragments.

• The third chapter, Method, will present the development process, discuss the problems that arose, and the solutions that were used.
• The fourth chapter, Results and Evaluation, will discuss the results of this work, and some ideas for possible future work in this area.
• The fifth chapter, Case Study, will discuss some of the application areas this work was tested on.

2 Background

This chapter will discuss some general visualization methods, and present the platform used for the specific implementations.

2.1 Visualization of Data

Good visualization of data is important for understanding the information different datasets may contain. There exist many techniques and methods to visualize data, some of them specialized for a specific type of data. For example, some datasets consist of points in space, which are meant to be connected into triangles and rendered as simple geometry. Volume data, on the other hand, can be rendered using a wide range of techniques [Eng+06], and while one technique may fit well for some problems, other volume rendering techniques may be better suited when other information needs to be highlighted.

In many cases the data needs to be visualized using multiple methods to extract all of the interesting information. A visualization method can be considered better than another if it is able to show more information in a way that is easy for the user to understand. Who is using the visualization is also important to know, since an experienced user may be more interested in the details of the data, while others may want to see the bigger picture, to understand what they are looking at.

2.2 Mesh Rendering

Mesh rendering is based on points in 3D space. Meshes are often rendered by grouping these points in groups of three and rendering them as triangles, with each triangle containing colour information.

2.3 Volume Rendering

Volume rendering is often based on some kind of grid in 3D space, where each element is called a voxel (derived from the name pixel, which is the name of an element in 2D space). These elements contain information such as colour or intensity. Volumes are often visualized using volume raycasting.

2.4 Related work

There exist many methods that achieve specialized rendering of data. In many cases the rendering of multiple, separate objects is necessary. For opaque objects, this is easily achieved using a depth buffer that only allows new fragments that are closer than the currently stored fragment to overwrite it. In the OpenGL pipeline [KSS17] this method is built in and can easily be used in code.
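The depth-test rule just described can be simulated on the CPU. The sketch below is hypothetical (not OpenGL API), and mirrors the default "keep the closer fragment" comparison:

```cpp
#include <limits>

// Simulated depth-buffer test: a new fragment replaces the stored one
// only if it is closer to the camera (smaller depth), mirroring
// OpenGL's default GL_LESS depth comparison.
struct Pixel {
    float depth = std::numeric_limits<float>::max(); // "cleared" depth
    unsigned color = 0;
};

// Returns true if the fragment was written, false if it was discarded.
inline bool depthTest(Pixel& p, float fragDepth, unsigned fragColor) {
    if (fragDepth < p.depth) {
        p.depth = fragDepth;
        p.color = fragColor;
        return true;
    }
    return false; // occluded fragment is thrown away
}
```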

Problems occur when these fragments can be transparent, since there is no way of knowing whether the object currently being rendered is behind or in front of another object before we start rendering that object. To solve this problem, multiple techniques have been proposed. One of these is based on performing multiple depth tests using two depth buffers; this method is called 'Depth Peeling' [Eve01]. A problem with this method is that it may become slow if the scene has a high depth complexity.

There exists another method, based on using a buffer to store a list of fragments per screen pixel. This method is called the A-buffer [Car84], and is a widely used method that has been the basis for many improvements in the years since it was first proposed. An example of such an improvement is Crassin's method [Cra10] of using dynamically sized lists of fragments. A problem with the A-buffer method is how to optimise the local memory for each pixel's fragment list. Because of the way GPU shaders work, dynamic array sizes are impossible to implement in a shader program. A solution to this is proposed by Lindholm et al. [Lin+14]. Their method, called "per-pixel array optimization", is based on using multiple shaders that use different array sizes, and sending pixels of a certain depth complexity to a shader with a similar array size. Lindholm also proposes another improvement called "per-pixel depth-peeling", which removes the problem of having to allocate too-large array sizes completely.

Another method is the k-buffer [Bav+07]. It works similarly to the traditional Z-buffer, which stores only the frontmost fragment for each pixel, but it can store up to k fragments, sorted and blended in a single pass. A possible advantage of this compared to the A-buffer is that it does not have to define a maximum scene depth it will be able to handle.

When rendering data sets containing vector fields, a popular method is 'Line Integral Convolution' (LIC). This area is well explored in the 2D case, but when it comes to 3D-LIC, the works are not as numerous. Falk et al. [FW08] present a specialized method of 3D-LIC whose rendering speed is largely independent of the input size. Interrante [Int97] discusses the rendering of multiple overlapping volume surfaces using 3D-LIC, by using the volume's isovalue surfaces as the surfaces to texture.

2.5 Inviwo

Inviwo is a visualization toolkit developed at Linköping University [Inv]. It is written in C++, and uses OpenGL and GLSL to render graphics output. The methods presented in this thesis have been implemented as parts of the Inviwo pipeline.

2.6 Rendering of Volumes and Meshes

In this section we will discuss specialized rendering methods for different types of data, mainly volume data and mesh data. We will briefly explain the classic approach for rendering these data types, and then discuss how these different methods can be combined.

2.6.1 Differences of Volume and Mesh Data

Mesh data is based on vertex points in 3D space. These points are often related to each other, in that we can make triangles out of their relation to each other. These triangles are then rendered to the screen. This rendering method is very common and almost all forms of 3D graphics use it. Volume rendering, on the other hand, can be performed in many different ways. The volume data is often represented by a cubic grid, where each element is called a voxel. These voxels may either contain a single value each (scalar data) or multiple values (vector data). The most common method of rendering volume data is volume raycasting, which is based on sending a ray from the camera (eye) coordinates through the volume, and sampling each voxel it passes through.

Because of the way these data types differ, it is difficult to render them simultaneously. We need a unified method that handles both data types at the same time in order to make it possible.

2.6.2 Order-Independent Transparency

A problem of rendering meshes occurs when multiple triangles intersect each other, or some triangles are behind others. To correctly render this information, we need to know the depth of each fragment rendered. Using a depth buffer, we can check the depth of each new fragment and compare it to the current depth value stored at that screen coordinate. If its value is lower, we overwrite it; otherwise we throw it away. This is all well and good, but what if we want to see the triangles that are behind the front ones?

This is where the classic RGB (Red, Green, Blue) colour space got replaced with RGBA (Red, Green, Blue, Alpha), in order to store the alpha value, or opacity, of each fragment as well. The fragments that occupy each screen coordinate can then be sorted and blended together using the alpha blending equation, shown in eq. 2.1:

C.a = 1 − (1 − fg.a) · (1 − bg.a)
C.rgb = (fg.rgb · fg.a + bg.rgb · bg.a · (1 − fg.a)) / C.a        (2.1)

In this equation, C represents the colour resulting from blending the foreground colour fg with the background colour bg. The blended RGB values depend on the blended alpha value, which therefore needs to be calculated first.

A problem with this approach arises when multiple, separate objects (data sets) are to be rendered one at a time. Because they are rendered separately, we have no way of sorting their fragments together using conventional methods (for example, OpenGL's built-in alpha blending only handles one object at a time), and we will thus have problems when they intersect each other. There are several methods that solve this problem. One of them uses a buffer which first stores all the fragments, and then renders them to the screen in a second pass. This method is called the A-buffer. Another method, called depth peeling, utilizes the built-in depth buffer to perform a rendering technique commonly called 'ping-ponging': a rendering pass is performed on the scene, the resulting texture is stored to a buffer, and the scene is rendered again using the stored texture data. This may be repeated as many times as needed, and in the case of depth peeling, every such rendering pass takes us one fragment deeper into the scene. This means that a scene with a highest depth complexity of 30 fragments would need 30 rendering passes for the depth-peeling algorithm to render the whole scene. On the other hand, it is rarely necessary to blend over that many fragments, as the accumulated alpha value often reaches values close to one after relatively few fragments.

Figure 2.1: The A-buffer is able to combine mesh data and volume data by storing the mesh fragments, and then rendering them together with a volume. (a) Simple geometries, (b) a raycasted volume, (c) combined scene with added transparency.

2.6.3 Hybrid Data Rendering

Using an A-buffer, we can render almost anything, as long as we can convert the data points' coordinate systems into a unified one. For instance, we can render multiple meshes (standard rendering) and volumes (volume raycasting), storing the fragment data in our A-buffer instead of rendering it directly to screen. Then we sort the fragments and render them in a second pass. Figure 2.1 shows how this can look.

In order to render fragments with an A-buffer, for each fragment we must be able to retrieve its screen coordinates, its depth value, and its colour value. In order to render fragments of varying transparency, an alpha value must also be stored.

Figure 2.2: Visualizations of a vector field dataset. (a) A streamline visualization, (b) a LIC visualization of the same dataset, (c) 2.2b seen from another angle.

Converting the coordinate systems between a volume and a mesh can be done using the entry- and exit points of the volume to convert it to world coordinates, while the mesh's coordinates are converted to world coordinates using its transformation matrix. The important thing about having a correct coordinate system when using an A-buffer is to get a correct depth value for each fragment. If a volume and a mesh each use their own depth coordinates, either the volume or the mesh will most likely render completely in front of the other. Their own fragments will still be correctly depth-sorted among themselves, since they are based on the same coordinate system.

2.6.4 Line Integral Convolution For Volumes

Line Integral Convolution (hereafter referred to as 'LIC') is useful for intuitive rendering of vector fields. When performing LIC on volumes, the method must be able to handle three dimensions, compared to the two that are necessary for images.

LIC is based on calculating output values from noise data and a vector field. Thus, in order to perform LIC on volumes, a 3D noise volume must be available, as well as a volume containing vector data. For each voxel in the output volume, points are sampled along its streamline in the vector field using a discrete integration method. The streamline is traversed forwards, backwards, or in both directions. Typically using a fixed step size, the streamline is traversed for a limited number of steps, until enough points have been sampled. The mean value of these points is then used as the voxel's final colour value.

Figure 2.2a shows a visualization of the streamlines of a vector field. Figures 2.2b and 2.2c show how a 3D-LIC of the same vector field might look.
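The per-voxel procedure described above can be sketched as follows. The vector field and noise volume are passed in as hypothetical callables, and simple Euler steps are used for brevity:

```cpp
#include <array>
#include <cmath>
#include <functional>

// One output voxel of a 3D-LIC: walk the streamline forwards and
// backwards with a fixed step size h, sample the noise volume at each
// visited point, and return the mean of all samples.
using Vec3 = std::array<float, 3>;

inline float licVoxel(const Vec3& seed,
                      const std::function<Vec3(const Vec3&)>& field,
                      const std::function<float(const Vec3&)>& noise,
                      float h, int steps) {
    float sum = noise(seed);
    int count = 1;
    for (float dir : {1.0f, -1.0f}) {        // forward and backward pass
        Vec3 p = seed;
        for (int i = 0; i < steps; ++i) {
            Vec3 v = field(p);
            float len = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
            if (len == 0.0f) break;          // stagnation point: stop
            for (int k = 0; k < 3; ++k) p[k] += dir * h * v[k] / len;
            sum += noise(p);
            ++count;
        }
    }
    return sum / count;                      // mean of all sampled points
}
```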

Discrete Integration Methods

There exist many discrete integration methods, but only the two most prevalent will be discussed here: Euler integration and fourth-order Runge-Kutta integration. Both methods are based on fixed step sizes, and Runge-Kutta's method can be seen as an extension of Euler integration, since first-order Runge-Kutta integration works exactly the same as Euler's method.

y_(n+1) = y_n + h · f̂(p_n)        (Euler integration)    (2.2)

Euler integration first takes the vector field value at its starting point, and moves in that direction by the length of the step size. Then it takes the new point it landed on and does the same, until all steps are complete. This method is often inaccurate, as the vector field typically turns faster than the length of the step size. Using a smaller step size reduces the error, but increases computation time as more steps are needed. Eq. 2.2 shows how new points are calculated. Here y_(n+1) is the new value to be calculated (in vector form), y_n is its previous value, h is the step length, and f̂(p_n) is the (normalized) direction of the vector field at point p_n. The variable n starts off at 0 and iteratively increases for a fixed number of steps.

k_1 = f(p_n)
k_2 = f(p_n + (h/2) · k_1)
k_3 = f(p_n + (h/2) · k_2)
k_4 = f(p_n + h · k_3)
y_(n+1) = y_n + h · (k_1 + 2k_2 + 2k_3 + k_4) / 6        (Runge-Kutta 4 integration)    (2.3)

Another method is fourth-order Runge-Kutta integration. Instead of moving directly in the direction of the vector field at the current point, it samples the direction at the starting point and, from that same starting point, moves in that direction by half the step size. It then samples the direction of the vector field at that point, and moves again from the start point using half the step size. In the last step it samples a fourth point using the full step size. When all points are sampled, it calculates in which direction to move based on these four sampled directions. This method is generally much more accurate than Euler integration. Eq. 2.3 shows how it calculates the new direction. In this equation, the different k values are used to calculate the resulting new value (for each step) y_(n+1), f(x) is the vector field value at point x, and p_n is the current position (point) for each step.

2.7 A-Buffer Rendering Methods

The concept of the A-buffer is open to different implementation methods. This section explains two of the most common ones, based on 3D textures and on linked lists of fragments.

An A-buffer that is able to handle varying transparency and colour of fragments needs to store RGBA data (red, green, blue and alpha) as well as depth information. This amounts to five separate values, which are not easily stored in a single data container: since more than four components are so rarely needed, OpenGL's built-in data types do not include a five-component container. Instead, we need to define two buffers to store the data, for example one that holds the RGBA data and a separate one that holds the depth value.

2.7.1 Using 3D Textures

This is most likely the most intuitive method when writing an A-buffer implementation. It is based on defining a 3D texture to store A-buffer elements, which is used as an array of 2D textures. For each coordinate in the image, an array of fragments is stored, to later be used for rendering the final image. Figure 2.3 shows an illustration of how such a texture may look. Here it is clear that many elements will often remain unused, and are thus taking up unnecessary space in the global GPU memory.

Figure 2.3: Illustration of a 3D texture storing fragments. The z-dimension holds the depth complexity of each final screen pixel, while the x-y dimensions hold the screen pixel coordinates. Here we see an example of a scene of non-uniform depth: out of 3x3x3 = 27 possible fragments, only 9 are used. In real applications, the number of unused slots is often a lot higher than this.

The process goes like this: geometry is first rendered as usual, using a shader that stores each fragment into the next available slot of the 3D texture. When multiple meshes are rendered, every fragment of each mesh will be stored, but there is no way of knowing in which order they were rendered. This is solved in the next step, where a new shader is used. This shader renders the whole screen space as a rectangle. Using the fragment coordinates, it takes the list of fragments that has been stored for each screen coordinate, and sorts that list based on the stored depth values. Alpha blending is then performed on each fragment in the list to generate the final pixel value.
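A CPU-side sketch of one pixel's column of such a texture, with a fill counter, an append step, and the depth sort of the resolve pass (names are hypothetical; a GPU version would use image stores and atomic counters):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One z-column of the 3D texture: a fixed number of fragment slots plus
// a fill counter. store() appends unsorted; resolve() sorts by depth so
// the slots can then be alpha-blended front to back.
struct Frag { float depth; unsigned color; };

struct AbufferPixel {
    std::vector<Frag> slots;   // stands in for the pixel's z-column
    std::size_t used = 0;
    explicit AbufferPixel(std::size_t maxDepth) : slots(maxDepth) {}

    bool store(const Frag& f) {
        if (used >= slots.size()) return false;  // column full: fragment lost
        slots[used++] = f;
        return true;
    }
    void resolve() {
        std::sort(slots.begin(), slots.begin() + used,
                  [](const Frag& a, const Frag& b) { return a.depth < b.depth; });
    }
};
```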

2.7.2 Per-Pixel Linked Lists

Another method is to define a buffer to store fragments, with an accompanying buffer that stores pointers between these elements. By having the first fragment of each coordinate point to the zero index, and having the remaining ones point back to their previous fragment, a lot of memory can be saved compared to the 3D-texture method, since the buffer can be defined to be exactly as large as necessary. This is advantageous especially for scenes with a high variation of depth complexity. Figure 2.4 shows the concept of using linked lists of fragments with an A-buffer.

Figure 2.4: A screen texture stores, per pixel, a pointer to its last stored fragment, which in turn points to its previous fragment. By following these pointers, we are able to access the RGBA + depth values of each fragment in the indexed container; a global counter tracks the next available fragment slot.

A problem with using this method is that it will generally be slower than a texture-based method, due to bad cache coherency between the fragments, since every fragment is stored separately in the buffer. The L1 and L2 caches are the fastest memory layers of a graphics card, used as an intermediary between the GPU cores and the global graphics memory. They are very limited in size, and modern graphics cards fetch chunks of memory at a time to improve speed. But if the coherency between elements (in our case fragments) is bad, i.e. they are stored in far-apart locations in memory, the cache will not manage to bring in multiple elements in a single fetch, and more fetches will be required.
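The head-pointer and next-pointer bookkeeping can be sketched on the CPU as below. On the GPU the counter increment would be an atomic operation; the names here are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-pixel linked lists: a screen-sized head buffer, a flat fragment
// buffer, and a "next" pointer per fragment. Each new fragment becomes
// the pixel's head and points back at the previous head.
constexpr std::uint32_t kEnd = 0xFFFFFFFFu;   // end-of-list marker

struct LinkedAbuffer {
    std::vector<std::uint32_t> head;          // one entry per pixel
    std::vector<float> depth;                 // global fragment storage
    std::vector<std::uint32_t> next;
    explicit LinkedAbuffer(std::size_t pixels) : head(pixels, kEnd) {}

    void store(std::size_t pixel, float d) {
        std::uint32_t idx = static_cast<std::uint32_t>(depth.size());
        depth.push_back(d);                   // the "global counter" grows here
        next.push_back(head[pixel]);          // new fragment points at old head
        head[pixel] = idx;
    }
    // Walk one pixel's list, newest fragment first.
    std::vector<float> collect(std::size_t pixel) const {
        std::vector<float> out;
        for (std::uint32_t i = head[pixel]; i != kEnd; i = next[i])
            out.push_back(depth[i]);
        return out;
    }
};
```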

2.7.3 Using Fragment Pages for Improved Cache-Coherency

A way to improve the coherency in memory is to utilize pages of fragments, instead of using single fragments as elements of the linked list. This is done by defining a fixed ”page size” to be used by the linked list. A pointer in the list will then point to a page of fragments instead of a single fragment.

This method should theoretically remove (or at least minimize) the problem of bad cache coherency when using a linked list of fragments. Figure 2.5 shows how the pages would be stored in the linked list.

Figure 2.5: Here the screen texture points to a page of fragments, and a global counter tracks the next available page. In this illustration the page size is three fragments, but it can essentially be any number; the pixel's last page may contain empty fields. By storing the fragments in pages, we get better performance on the GPU.
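A sketch of the page-based variant: the per-pixel pointer now references a page, and a new page is allocated only when the current one is full. The page size and names are assumptions for the sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Page-based fragment storage: the per-pixel list links pages of
// kPageSize fragments rather than single fragments, so consecutive
// fragments of a pixel land in adjacent memory (better cache behaviour).
constexpr std::size_t kPageSize = 4;
constexpr std::uint32_t kEnd = 0xFFFFFFFFu;

struct Page {
    float depth[kPageSize];
    std::size_t used = 0;
    std::uint32_t prev = kEnd;    // link to the pixel's previous page
};

struct PagedAbuffer {
    std::vector<std::uint32_t> lastPage;   // per-pixel page pointer
    std::vector<Page> pages;               // global page pool
    explicit PagedAbuffer(std::size_t pixels) : lastPage(pixels, kEnd) {}

    void store(std::size_t pixel, float d) {
        std::uint32_t p = lastPage[pixel];
        if (p == kEnd || pages[p].used == kPageSize) {
            Page fresh;                     // allocate a new page for this pixel
            fresh.prev = p;
            pages.push_back(fresh);
            p = static_cast<std::uint32_t>(pages.size() - 1);
            lastPage[pixel] = p;
        }
        pages[p].depth[pages[p].used++] = d;
    }
};
```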

2.7.4 Early Fragment Discarding

In many scenes, there are a lot of fragments that are stored unnecessarily. When the alpha values of the closest fragments are high enough to completely occlude the fragments behind them, we would optimally want a way of knowing beforehand which fragments those would be, since that would allow us to skip the occluded fragments completely in the first pass.

Here we present a method we call "Early Fragment Discarding", which partially solves this problem. It is based on a buffer that stores the previous frame's highest depth at which a fragment was stored. In the first rendering pass, for every fragment passed into the shader, we check whether its depth is higher than this value. If it is, the fragment is probably not going to be visible in the final image and is thus discarded.

By storing fewer fragments, the performance increases for multiple reasons. One is that storing a fragment in itself takes time; another is that with fewer fragments, sorting them becomes faster.

Information loss when using the highest stored value as the threshold for discarding fragments is a bit high, especially during rotation, which is something we want to avoid. By only using this method while the user is dragging the scene (rotation, translation etc.), and then performing a full rendering once the user stops interacting, a good compromise between performance and correctness can be achieved. Since the fragment depth is stored in the value space 0-1, we can easily add a margin to the threshold as well, which allows us to keep some of the fragments that could potentially be incorrectly discarded during interaction with the scene. A good value for this margin was found to be 5%, at which most of the correct fragments were kept, while still significantly improving performance compared to not using the method at all.
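The threshold-plus-margin test described above can be sketched as follows (hypothetical names; in practice the per-pixel buffers would live in GPU memory and the test would run in the first-pass shader):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Early Fragment Discarding: keep last frame's deepest stored depth per
// pixel and, while the user interacts, drop incoming fragments deeper
// than that threshold plus a small margin (5% worked well).
struct DiscardBuffer {
    std::vector<float> prevMax;   // threshold from the previous frame
    std::vector<float> currMax;   // max depths being built this frame
    float margin = 0.05f;
    explicit DiscardBuffer(std::size_t pixels)
        : prevMax(pixels, 1.0f), currMax(pixels, 0.0f) {}

    // True if the fragment is probably occluded and can be skipped.
    bool discard(std::size_t pixel, float d) const {
        return d > prevMax[pixel] + margin;
    }
    // Called whenever a fragment actually gets stored.
    void recordStored(std::size_t pixel, float d) {
        currMax[pixel] = std::max(currMax[pixel], d);
    }
    // Swap in this frame's maxima as the next frame's thresholds.
    void endFrame() {
        prevMax.swap(currMax);
        std::fill(currMax.begin(), currMax.end(), 0.0f);
    }
};
```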

3 Method

This chapter discusses the methods that were used in implementing the actual modules and processors in Inviwo. It presents the data types that can be visualized, as well as the options users have for how to visualize the data.

3.1 Rendering Hybrid Data

One of the goals of the work was to research methods of rendering different data types together. Specifically, mesh data and volume data were the focus of the work. These data types are well managed in Inviwo, so the implementation mostly consisted of adding the functionality of the A-buffer to render these data types together.

Inviwo is built around processors, which are connected through inports, outports and connectors. It has data types for meshes as well as volumes, so a processor that is supposed to handle any number of these data types should take a list of meshes and a list of volumes. Volume raycasting in Inviwo is generally optimised by finding the optimal entry- and exit points for any given volume. This means that these should first be calculated in separate processors, and then sent to the rendering processor. If multiple volumes were to be allowed, multiple entry- and exit points would also have to be calculated. Mainly for this reason, it was decided that the rendering processor should only handle one volume. Another reason was that rendering multiple volumes together is rarely necessary, and that there exist other methods of doing so in Inviwo without using an A-buffer.

3.2 A-buffer and Meshes

Because the method only needs to handle one volume, the A-buffer can be simplified to only handle the mesh fragments. By first sending these to the A-buffer shader, the fragments get stored (unsorted) in a list, which is then sent to the second rendering pass, where the fragments are sorted. After the mesh fragments have been sorted, standard volume raycasting can be performed on the volume, where at each step we compare the depth of the foremost fragment in the A-buffer list with the current raycasted depth; if the mesh fragment's depth is lower, we insert the mesh fragment first and remove it from the mesh fragment list. By doing this until all fragments have been rendered, a scene with order-independent transparency of multiple data types is obtained. This method can be likened to performing the merge step of merge sort on two sorted lists.

Figure 3.1: Two LIC visualizations of the same volume vector field, with different noise inputs: (a) 3D-LIC using purely random noise, (b) 3D-LIC with Poisson disks as noise input. While random noise gives a good result, Poisson disks will often give more distinct lines.
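The merge described above can be sketched as a plain merge of two depth-sorted streams. In the real renderer the "ray" side is produced incrementally during raycasting rather than as a prebuilt list, so this is an illustration only:

```cpp
#include <cstddef>
#include <vector>

// Interleave depth-sorted mesh fragments with depth-ordered ray samples,
// exactly like the merge step of merge sort, yielding one front-to-back
// stream ready for alpha blending.
struct Frag { float depth; unsigned color; };

inline std::vector<Frag> mergeByDepth(const std::vector<Frag>& mesh,
                                      const std::vector<Frag>& ray) {
    std::vector<Frag> out;
    std::size_t i = 0, j = 0;
    while (i < mesh.size() && j < ray.size())
        out.push_back(mesh[i].depth < ray[j].depth ? mesh[i++] : ray[j++]);
    while (i < mesh.size()) out.push_back(mesh[i++]);
    while (j < ray.size())  out.push_back(ray[j++]);
    return out;
}
```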

3.3 3D Line Integral Convolution

A 3D-LIC method was implemented using Euler integration as well as Runge-Kutta order 4 integration. For every voxel in a vector field volume, a generated noise volume is traversed for a certain number of steps in the vector field's forward and backward directions. At every step, the noise texture is sampled. When all the points have been sampled, their mean value is used as the final colour of the voxel.
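The per-voxel convolution can be sketched as follows. The function names, the fixed step length, and treating the noise volume as a callable are assumptions made for the sketch:

```python
# Minimal sketch of 3D-LIC for a single voxel: step through the vector
# field forwards and backwards from the voxel, sample the noise volume
# at each position, and average the samples.

def lic_voxel(pos, vector_field, noise, steps=10, h=0.5):
    """pos: (x, y, z); vector_field(p) -> (vx, vy, vz);
    noise(p) -> scalar in [0, 1]."""
    samples = [noise(pos)]
    for direction in (+1.0, -1.0):          # forward, then backward
        p = pos
        for _ in range(steps):
            v = vector_field(p)
            p = tuple(c + direction * h * vc for c, vc in zip(p, v))
            samples.append(noise(p))
    return sum(samples) / len(samples)       # mean -> output voxel value
```

Averaging along the streamline correlates noise values along the flow direction, which is what produces the line-like patterns in figure 3.1.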


3.3.1 Random Noise-Volume Generation

In order to get good results from any LIC, it is important to use a good random generation model for the noise. In this case, using a 3D volume as the noise input, two types of noise were tested: purely random noise, where each voxel gets a random value between zero and one, and noise generated using Poisson disks in three dimensions.

Of these two algorithms, the Poisson disk method was generally better at reducing residual noise in the output volume. The method is based on generating points in space, starting from a randomly generated point.
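A minimal CPU sketch of Poisson-disk point generation by rejection ("dart throwing") is given below. The real implementation runs on the GPU; the parameter names and the simple rejection strategy are assumptions made for illustration, and only the minimum-distance constraint that defines Poisson-disk noise is preserved:

```python
import random

# Sketch of 3D Poisson-disk point generation: starting from one random
# point, candidates are accepted only if they keep a minimum distance r
# to all points generated so far.

def poisson_points(n, r, tries=10000, seed=0):
    rng = random.Random(seed)
    pts = [tuple(rng.random() for _ in range(3))]   # random starting point
    for _ in range(tries):
        if len(pts) >= n:
            break
        c = tuple(rng.random() for _ in range(3))
        if all(sum((a - b) ** 2 for a, b in zip(c, p)) >= r * r
               for p in pts):
            pts.append(c)
    return pts
```

The check against all previous points is what makes the method slow compared to purely random noise, where each voxel is independent.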


4 Results and Evaluation

This chapter will discuss the results of the work, as well as the conclusions drawn from them.

4.1 A-buffer method

Two A-buffer rendering methods have been presented: one using 3D-textures and one using linked lists of fragment pages.

The 3D-texture method has been fully implemented. A user may set parameters such as the maximum depth that the A-buffer will handle (this represents the size of the z-dimension of the 3D-textures), and may choose to perform a special rendering that highlights areas with high depth complexity in the scene.

The other method, which is based on linked lists, has been fully implemented, but has some unexplained bugs, shown in figure 4.1. In figure 4.1a it is apparent that artefacts appear when rendering basic geometries. The problem is less apparent when using more complex meshes, see figure 4.1b. Why these artefacts appear is still unknown, but since they only appear for certain page sizes, it is most likely due to how the GPU handles multithreading. There may be a flaw in the write-lock currently in place, allowing multiple fragments to be written to the same memory slot before the global counter is increased.
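The suspected race can be illustrated without GPU code. The explicit interleaving below is a deterministic stand-in for what the GPU scheduler may do; the function names and data layout are assumptions for the sketch:

```python
# Deterministic illustration of the suspected write-lock flaw: two
# "threads" append a fragment by reading a shared next-slot counter and
# incrementing it afterwards. With the broken interleaving, both read
# the counter before either increment, so one fragment is overwritten.

def append_broken(slots, counter, frag_a, frag_b):
    slot_a = counter          # thread A reads the counter
    slot_b = counter          # thread B reads it too (the race)
    slots[slot_a] = frag_a
    counter = slot_a + 1
    slots[slot_b] = frag_b    # overwrites frag_a: same slot
    counter = slot_b + 1
    return slots, counter

def append_atomic(slots, counter, frag_a, frag_b):
    # With an atomic fetch-and-add, each append gets a unique slot.
    slot_a, counter = counter, counter + 1
    slot_b, counter = counter, counter + 1
    slots[slot_a] = frag_a
    slots[slot_b] = frag_b
    return slots, counter
```

In GLSL the atomic variant corresponds to reserving a slot with an atomic counter increment before writing, rather than reading and incrementing in separate steps.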

The memory allocated for this method is dynamic, in the sense that rendering starts with a low allocation value; if the buffer fills up during rendering, the procedure stops, expands the memory buffers, and restarts. This repeats until the whole scene fits within the limits. In many cases this gives much better memory utilization than the 3D-texture-based method.
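The grow-and-retry loop can be sketched as follows. The starting capacity, growth factor, and the stand-in for the GPU pass are assumptions for illustration:

```python
# Sketch of the grow-and-retry allocation strategy: render with a small
# buffer; if it overflows, enlarge the buffer and restart, until the
# whole scene fits. `fragments_needed` stands in for the real GPU pass
# reporting how many fragment slots the scene required.

def render_with_regrowth(fragments_needed, capacity=1024, growth=2.0):
    attempts = 0
    while True:
        attempts += 1
        overflow = fragments_needed > capacity   # "buffer filled up"
        if not overflow:
            return capacity, attempts
        capacity = int(capacity * growth)        # expand and restart
```

Because the capacity grows geometrically, only a handful of restarts are needed even when the initial guess is far too small.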

This answers the first research question, ”Can a hybrid data visualization method be implemented in Inviwo?”. The simple answer is that it can. Inviwo has full support for modern OpenGL, which allows us to use an A-buffer to render fragments in two separate passes. The current implementation only


(a) Artefacts appearing on basic geometries.

(b) On smaller meshes, it is much more subtle.

Figure 4.1: An issue when using certain page sizes on the linked list method is that visible artefacts may appear. This is most likely due to how the GPU handles multiple threads, overwriting values before the counter (which points to the next available memory slot) is increased.

supports Nvidia graphics cards, however. Whether it is possible to add support for AMD cards is still uncertain, as it has not been tested.

4.1.1 Customizable Properties

Both methods have helper processors for performing special rendering of transparent meshes. A ”mesh properties” processor takes a mesh as input and outputs the mesh data in a format that the A-buffer processor can handle. This processor allows the user to set a transparency value for the input mesh, as well as choose special rendering options for it. Only one special rendering option has been implemented: it renders the mesh using line rendering instead of the more standard triangle rendering. These lines take on the same transparency values, and the option is meant as an example of the rendering methods that can be implemented in a separate mesh properties processor.

The A-buffer compositor processor allows the user to set the size of the 3D-texture, i.e. to allocate more memory, which allows scenes with higher depth-complexity to be rendered. The user may also choose to render the scene in a way that visualizes the depth complexity. This is done by counting the number of fragments for each screen pixel and finding the pixel with the highest count. The scene is then rendered in red, with intensity values based on this maximum. Pixels with low depth-complexity will then get colours closer to black, while high-complexity pixels will get more


saturated shades of red.
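The normalization described above can be sketched in a few lines. Representing the image as nested lists and the exact colour mapping are assumptions for the sketch:

```python
# Sketch of the depth-complexity visualization: count fragments per
# pixel, normalize by the maximum count, and map the result to a red
# intensity (black = low complexity, saturated red = high).

def depth_complexity_image(counts):
    """counts: 2D list of per-pixel fragment counts -> 2D list of
    (r, g, b) with r in [0, 1]."""
    peak = max(max(row) for row in counts) or 1   # avoid division by zero
    return [[(c / peak, 0.0, 0.0) for c in row] for row in counts]
```

This view makes it easy to see which parts of a scene drive the memory requirements of the A-buffer.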

4.1.2 Problems and Solutions

When implementing an A-buffer renderer, some problems appeared that we had not anticipated and had to solve. The most prominent of these was the linked-list artefacts on geometries, explained at the top of this section. We have still not found a solution for these, but when rendering many small meshes, such as in the heart dataset (see chapter 5), the artefacts were far too small to be visible, and the method can still be considered viable in certain situations.

When handling scenes with very large depth-complexity, the A-buffer may sometimes go out of bounds of the allocated memory. In such situations, instead of risking runtime errors or undefined behaviour, we set the screen pixels with too many fragments to black. Even if we in some situations have most of the information, we do not want to present it; it was considered better to show the user clearly that more memory needs to be allocated to get a correct visualization.

4.2 3D-LIC method

The line integral convolution method presented in this thesis is a basic variant of 3D-LIC. It allows the user to choose between two different noise inputs: pure random noise, and noise generated using Poisson disks. It works by integrating both forwards and backwards for each output voxel. Two integration methods are available: the Euler method and the Runge-Kutta order 4 method.
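The two integration steps can be sketched as follows; the function shapes (a point and a vector-field callable) are assumptions for the sketch:

```python
# One explicit Euler step and one classical Runge-Kutta 4 step through
# a 3D vector field f, from point p with step size h.

def euler_step(f, p, h):
    v = f(p)
    return tuple(c + h * vc for c, vc in zip(p, v))

def rk4_step(f, p, h):
    def add(a, b, s):  # a + s * b, componentwise
        return tuple(x + s * y for x, y in zip(a, b))
    k1 = f(p)
    k2 = f(add(p, k1, h / 2))
    k3 = f(add(p, k2, h / 2))
    k4 = f(add(p, k3, h))
    return tuple(c + h / 6 * (a + 2 * b + 2 * cc + d)
                 for c, a, b, cc, d in zip(p, k1, k2, k3, k4))
```

RK4 evaluates the field four times per step, so it is more expensive, but it follows curved streamlines far more accurately than Euler for the same step size.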

The results vary greatly based on which random noise generation method is used, as well as on the dimensions of the noise volume. A large noise volume generally gives better results, as more fine-grained details can be extracted, but the disadvantage of too large sizes is that performance suffers. Pure random noise works well and is fast, since each voxel does not depend on anything other than its own value during generation of the volume. This means that a large noise volume can be generated without big performance hits. The Poisson disks method, however, is quite a bit slower, as each new point depends on all nearby points during the volume generation. As the actual implementation was done on the GPU,


the noise generation often failed when attempting to generate very large volumes using Poisson disks. This is probably because OpenGL drivers limit how long a shader may run within a single frame (anything that takes longer than a couple of seconds automatically fails). The reason will need to be studied in more detail for a solution to be reached. Because of this problem, only small input volumes could be tested with Poisson disks, which is apparent in figure 5.4b, where large points are visible due to the noise volume having smaller dimensions than the vector field volume.

4.3 Performance

All performance tests presented here were made on a computer with an Nvidia GTX 580 GPU with 4552 MB of graphics memory, and an Intel Xeon W3550 CPU with four cores and a clock speed of 3.07 GHz, unless otherwise specified.

The A-buffer methods presented in this thesis are based on two rendering passes: the first writes the fragments to a fragment container (which differs depending on the method used); the second renders them to screen. When measuring the performance of the A-buffer methods, it is therefore relevant to measure the performance of these passes separately.

Table 4.1: Performance of the two A-buffer methods. The average depth-complexity was ∼12 fragments for the scene these tests were made on.

Method        Max Depth  Page Size  Avg. FPS
3D Texture    64         -          ∼22
3D Texture    128        -          ∼18
Linked Lists  64         8          ∼18
Linked Lists  128        8          ∼16
Linked Lists  128        2          ∼15
Linked Lists  128        1          ∼14

In order to get a good time measure of each pass, timestamps were taken before the first pass, after the first pass, and after the second pass. By calculating the differences between these timestamps we could measure the time taken for each pass. When running the tests with different options meant to speed up one of the rendering passes, it was discovered that both time differences changed drastically, rather than just one, as would have been expected. This is probably due to hardware optimisations performed on the CPU and/or the GPU, and these measurements may therefore not be fully accurate.
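The timestamp scheme can be sketched as follows; the pass functions are placeholders for the real GPU work, and a real measurement would also need to wait for GPU command completion before each timestamp:

```python
import time

# Sketch of timing two rendering passes: take timestamps before pass
# one, between the passes, and after pass two, then use the differences.

def time_passes(pass_one, pass_two):
    t0 = time.perf_counter()
    pass_one()
    t1 = time.perf_counter()
    pass_two()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1   # seconds spent in each pass
```

The caveat about asynchronous GPU execution is one likely explanation for why both differences changed when only one pass was modified.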

Still, the biggest bottleneck of the first pass is definitely the writing to textures/buffers. We cannot affect this without changing the base method, since all the data must be written in order to be used later. For the second pass, the depth sorting can be considered the biggest bottleneck. A simple insertion sort is used to sort all the fragments for each screen pixel. This was estimated to be ”good enough”, since most pixels should not have a depth-complexity of more than 10 fragments in most scenes. Performance could possibly be improved by a better sorting method. One reason this was not tested is that GLSL does not allow recursive functions, which most O(N log N) sorting methods are based on; other workarounds for implementing such a sorting method on the GPU were skipped due to time constraints. Disabling depth sorting completely currently improves overall performance by around 20-30%, but will of course give incorrect results.
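The per-pixel sort can be sketched as a plain insertion sort on fragment records; the (depth, colour) tuple representation is an assumption for the sketch:

```python
# Sketch of the per-pixel depth sort: a simple insertion sort on
# (depth, color) fragment records, as used in the second pass. For the
# short lists typical of one pixel (around 10 fragments), the O(N^2)
# cost is usually acceptable.

def sort_fragments(frags):
    frags = list(frags)
    for i in range(1, len(frags)):
        f = frags[i]
        j = i - 1
        while j >= 0 and frags[j][0] > f[0]:   # compare depths
            frags[j + 1] = frags[j]
            j -= 1
        frags[j + 1] = f
    return frags
```

Insertion sort also needs no recursion, which is why it maps directly to GLSL, unlike most O(N log N) alternatives.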

4.3.1 Memory Utilization

The method using a 3D-texture should be faster than the linked list version, since a texture pre-allocates the memory it needs, which means that its elements are stored aligned in memory; this is good when fetching data into the L1 and L2 caches. By using pages of fragments, this speed difference should theoretically be removed. One clear advantage of the linked list version, however, is its memory utilization on the GPU.

Table 4.2: Results from a test performed by Crassin [Cra10], which compared the memory utilization of the 3D-textures and linked lists methods when rendering transparent objects.

Dimensions (px)  3D Textures  Linked Lists
512x512          64 MB        6.5 MB
768x708          132.7 MB     11.7 MB
1680x988         405 MB       27.42 MB

A disadvantage of the 3D-texture method is that its memory must be pre-allocated based on the maximum depth that it will be able to handle. The linked lists method should theoretically be able to handle any depth-complexity, but since GLSL does not allow dynamically sized arrays, a fixed value was used even for this method. This limits the performance of the method drastically. The advantage of this method, though, is the option to allocate less memory and still get correct results, compared to a 3D-texture, where the depth of the texture must be at least as big as the depth of the pixel with the highest depth complexity. According to Crassin [Cra10], the advantage in memory utilization of the linked lists method can be huge when objects of high depth-complexity are rendered in parts of the full screen space, while other parts are either empty or very simple (which is often the case in real rendering situations). Table 4.2 shows Crassin's results when comparing the two methods in some typical rendering situations.
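The 3D-texture figures in table 4.2 are consistent with a straightforward capacity calculation. The fragment size of 4 bytes and a depth of 64 slots are inferred assumptions, not stated in the source:

```python
# Estimate of the pre-allocated 3D-texture size: every pixel reserves
# `depth` slots of `bytes_per_fragment` bytes, used or not. With 4
# bytes per fragment and depth 64, this reproduces the 64 MB figure
# for a 512x512 viewport in table 4.2.

def texture_abuffer_bytes(width, height, depth, bytes_per_fragment=4):
    return width * height * depth * bytes_per_fragment

mib = texture_abuffer_bytes(512, 512, 64) / (1024 * 1024)  # 64.0 MiB
```

The linked-list variant instead pays only for fragments actually produced, which is why its numbers in table 4.2 track scene complexity rather than viewport size.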

4.3.2 Rendering Speed

The method that uses linked lists of fragments is mostly redundant at the moment, because of how the second rendering pass is implemented. GPU shaders do not allow local arrays of dynamic size, which is a problem for this method, as its main advantage is that it can handle dynamically sized fragment lists. This means that, performance-wise, the linked lists method could theoretically be a lot faster than it currently is. Its current advantage over the 3D-texture method is that it does not allocate more global memory than necessary, which allows it to handle complex scenes even on a graphics card with little global memory.

Table 4.1 compares the two A-buffer methods. The rendering window used for these tests was 800 by 800 pixels. Note that the Early Fragment Discarding method was not used here.

The early fragment discarding method was tested with scenes of varying depth complexity, object transparency and resolution. The results are shown in table 4.3. Here it is clear that the method can improve the performance of the A-buffer significantly under the right circumstances. An unexpected result was that for low resolutions, removing the margin yielded worse results than using a 5% margin.

While removing the margin generally increases performance, the visual impact of doing so is considered too big. A margin of around 5% was considered good enough, while using 10% often removed the visual artefacts altogether. The performance gain of removing the margin is also small enough that keeping a margin is easily justifiable.

Table 4.3: Performance of the Early Fragment Discarding method using different parameters. All tests were performed on the same dataset. The average depth-complexity of the scenes was ∼10 fragments for the 400x400 px scene and ∼7 for the 800x800 px one.

Dimensions (px)  Alpha Value  Method   Margin  Avg. FPS
400x400          0.2          not EFD  -       ∼30
400x400          0.2          EFD      5%      ∼35
400x400          1.0          EFD      0%      ∼44
400x400          1.0          EFD      5%      ∼46
400x400          1.0          EFD      10%     ∼43
800x800          0.2          not EFD  -       ∼24
800x800          0.2          EFD      5%      ∼32
800x800          1.0          EFD      0%      ∼41
800x800          1.0          EFD      5%      ∼40
800x800          1.0          EFD      10%     ∼39

Since the research questions focused heavily on the resulting rendering speed of the algorithms used, these results can be used to answer them. First, the question ”How can we render such a visualization in real-time?” (referring to the hybrid data visualization method) is answered by the method used, namely the A-buffer. Because of the way the A-buffer works, and the improvements made to the implemented version of it, a real-time visualization is achieved.

The other research question related to rendering speed is ”Which problems will be the most important to solve in terms of performance?”. The A-buffer, which is based on two rendering passes, has its main bottleneck in the storing of the fragments. This problem was solved to a large extent by introducing the EFD method. Another, smaller, performance bottleneck is the sorting of the fragments; a faster sorting method than the one currently implemented would most likely improve performance further.


4.4 Future Work

Our hybrid data rendering method using an A-buffer is fully working and ready to be used in multiple types of scenes. What may definitely be improved, though, is the performance. Both A-buffer methods have a big performance problem in the second rendering pass, due to the local array size for each pixel being fixed to a predefined value. This problem may be reduced or even removed using either the per-pixel array optimization or the per-pixel depth-peeling method proposed by Lindholm et al. [Lin+14]. The performance gain of these methods could be up to 50%, based on tests using limited sizes for the local arrays and comparing with the static size necessary for a full rendering of the same scene.

As discussed in the Results section, a better sorting algorithm could potentially be implemented to improve performance for scenes with high depth complexity.

An improvement to the 3D-LIC method would be better noise generation. The Poisson disks method is generally good, but it is very slow at generating high-resolution noise in 3D space. The optimal resolution of the noise input relative to the output resolution could be studied in more detail, in order to improve performance while keeping a good visual representation.


5 Case Study

Case studies were performed on a couple of datasets. This chapter will discuss how these studies were performed and the results gained from them.

5.1 Visualization of a Heart Dataset

The implemented methods were tested on a dataset of a human heart [Bus+15]. The dataset was acquired using MRI scans of a healthy volunteer, and it represents the 4D blood flow of the heart, with the fourth dimension representing the blood flow over time. Figure 5.1 shows a volume and a streamline visualization of the dataset. The volume visualization was made using volume raycasting of the heart's scalar values (not the blood flow).

In many cases, a pathline visualization is more intuitive than a streamline one, since pathlines follow a particle along its path over time. A pathline generator for the heart dataset was already available. A visualization of the pathlines of the heart can be seen in figure 5.2.

5.1.1 A-buffer Methods

A goal of this thesis was to render the blood flow together with the ”shell” of the heart. A blood flow visualization already existed, but because of the method used to visualize it, it was difficult to see where the edges, or the shell, of the heart were. Volume raycasting helps make this clear, by defining a transfer function that highlights the edges of the actual heart.

These methods could easily be combined using the A-buffer method described in this thesis. By generating meshes from the blood flow visualization and using volume raycasting, we could combine the two with an A-buffer compositor processor. Figure 5.3 shows the results of some of these visualizations. Comparing figures 5.2 and 5.3b, it is apparent that some of the shading is lost when using the A-buffer. This is simply a feature that has not been implemented yet, and it could most likely be added without problem.


(a) Streamline visualization of the blood flow of a heart.

(b) The heart dataset rendered using volume raycasting.

Figure 5.1: The heart dataset contains both vector and scalar values, allowing visualization methods like streamlines and volume raycasting to be performed on it.

5.1.2 3D-LIC

A 3D-LIC representation of the heart was made in order to visualize the streamlines of the heart at specific time intervals. Optimally, a LIC of the pathlines of the blood flow would have been implemented, but due to time constraints this was not done. Visualizations of the heart's streamlines are considered more difficult to interpret, as the user only sees the blood flow at a specific time-step; to get a true understanding of the data, all timesteps must be visualized one by one. Figure 5.4 shows two visualizations of the 3D-LIC, generated using random noise and Poisson disks as noise input.


Figure 5.2: Pathline visualization of the heart dataset. The magnitude of the velocities is shown using a colour-scale from blue to yellow to red. The red area is in the middle and is mostly hidden in this image.


(a) Combined pathlines and volume raycasting visualizations

(b) Pathline visualization using the A-buffer

(c) Same as 5.3b, but with 80% transparency

Figure 5.3: A-buffer visualizations of the heart. Figure 5.3a shows the volume ”shell” together with a pathline visualization. Figures 5.3b and 5.3c show the A-buffer rendering of only the pathlines. Compared to figure 5.2, some of the shading is lost here, but on the other hand, transparency can now easily be added, as seen in 5.3c. With the added transparency, the red areas of the pathlines are more visible.


(a) LIC representation using random noise

(b) LIC representation using Poisson disks as noise input

Figure 5.4: The 3D-LIC was generated using two different noise inputs: random noise (5.4a) and Poisson disks (5.4b). Here the random noise gives clearer lines, which is probably due to the Poisson disks method having generated too few points.


5.2 Visualization of Protein Molecules

Datasets of two different protein molecules were used for testing the A-buffer. The datasets were taken from the Protein Data Bank and represent a protein of Scheffersomyces stipitis [PDB17] and a yeast protein [PDB16]. The Scheffersomyces stipitis protein is much larger than the yeast variant.

Using this data, several mesh representations were made for the purposes of this case study: a surface visualization, a cartoon visualization, a licorice visualization, and a Van der Waals force visualization. The purpose of the cartoon visualization is to draw the protein ”backbone”. The licorice visualization highlights the important protein side chains. The Van der Waals force is one of the weak chemical forces and represents a distance-dependent interaction between atoms.

5.2.1 Using the A-buffer

Using the A-buffer presented in this thesis, combined visualizations of these data representations can be made. In figure 5.6, different combinations of these representations are visualized together, using the yeast protein dataset. By showing multiple representations of the same dataset together like this, a user can more easily understand how the representations relate to each other.

Combinations of the large protein dataset are presented in figures 5.7, 5.8 and 5.9. By combining representations two at a time, a user may be able to derive additional information. Showing more than two at once will often clutter the data, and the user may not be able to separate what is what. A solution that still allows the user to interact with all representations at once is to make the transparency value of each individual representation adjustable. This lets the user effectively hide data that is not currently relevant, allowing better focus on the important parts, while still being able to interact with every data representation.
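The per-representation transparency can be sketched as a scaling of each fragment's own alpha by a user-set value before compositing. The fragment record layout and function names are assumptions for the sketch:

```python
# Sketch of per-representation transparency: each fragment carries the
# id of its representation, and a user-set alpha per representation
# scales the fragment's own alpha before the depth-sorted fragments
# are composited back to front with the 'over' operator.

def composite_pixel(fragments, rep_alpha):
    """fragments: [(depth, rep_id, (r, g, b, a))], any order;
    rep_alpha: {rep_id: user alpha in [0, 1]} (0 hides a representation)."""
    r = g = b = 0.0
    for _, rep, (fr, fg, fb, fa) in sorted(fragments, reverse=True):
        a = fa * rep_alpha.get(rep, 1.0)   # scale by the user setting
        r = fr * a + r * (1.0 - a)         # back-to-front 'over'
        g = fg * a + g * (1.0 - a)
        b = fb * a + b * (1.0 - a)
    return (r, g, b)
```

Setting a representation's alpha to zero removes it from the composite without removing its fragments, so it can be faded back in interactively.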


(a) Surface visualization (b) Cartoon visualization

(c) Licorice visualization (d) Van der Waals force visualization

Figure 5.5: Different mesh visualizations of the large protein dataset [PDB17]. 5.5a shows a surface visualization; 5.5b shows the backbone of the protein (cartoon); 5.5c visualizes the important protein side chains (licorice); in 5.5d, the Van der Waals force between the atoms is visualized as spheres.


(a) Surface and licorice visualization

(b) The Van der Waals force and licorice visualizations.

(c) Cartoon and licorice visualization (d) All four representations combined.

Figure 5.6: Combined representations of a yeast protein. 5.6a highlights the surface of the protein together with a licorice representation. The Van der Waals force rendered together with the licorice representation in 5.6b shows how two different forces connect the atoms. 5.6c shows the relation of the protein backbone and its side chains. Finally, in 5.6d all representations are combined into one.


(a) Surface and licorice visualization

(b) Surface and cartoon visualization

Figure 5.7: Combining the surface visualization with the licorice and cartoon representations, in 5.7a and 5.7b, gives a clear understanding of the size of the protein when visualizing specific data.


Figure 5.8: Cartoon and licorice visualizations combined, for the large protein. Here we can see how the protein backbone relates to the side chains. The side chains mostly follow the backbone closely, but will sometimes stretch out from it.


Figure 5.9: All four representations of the large protein combined. While this may not be very useful to visualize at once, as we get a lot of clutter, it serves to show what is possible with this kind of visualization. An argument for using it is that the user may set the transparency of each representation separately, making it possible to show all the relations in one window while still keeping control of the interesting parts of the data.


Bibliography

[Bav+07] L. Bavoil et al. “Multi-Fragment Effects on the GPU using the k-Buffer”. In: I3D ’07: Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games, pages 97-104 (2007).

[Bus+15] M. Bustamante et al. “Atlas-based analysis of 4D flow CMR: Automated vessel segmentation and flow quantification”. In: Journal of Cardiovascular Magnetic Resonance (2015).

[Car84] L. Carpenter. “The A-buffer, an antialiased hidden surface method”. In: SIGGRAPH ’84: Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, pages 103-108 (1984).

[Cra10] C. Crassin. OpenGL 4.0+ ABuffer V2.0: Linked lists of fragment pages. July 2010. url: http://blog.icare3d.org/2010/07/.

[Eng+06] K. Engel et al. Real-Time Volume Graphics. Wellesley, Massachusetts: A K Peters, Ltd., 2006.

[Eve01] C. Everitt. “Interactive order-independent transparency”. In: NVIDIA OpenGL Applications Engineering (2001).

[FW08] M. Falk and D. Weiskopf. “Output-Sensitive 3D Line Integral Convolution”. In: IEEE Computer Society (2008).

[Int97] V. Interrante. “Illustrating surface shape in volume data via principal direction-driven 3D line integral convolution”. In: SIGGRAPH ’97: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques, pages 109-116 (1997).

[Inv] Inviwo. Inviwo - Interactive Visualization Workshop. url: https://www.inviwo.org/.

[KSS17] J. Kessenich, G. Sellers, and D. Shreiner. OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 4.5. Boston, Massachusetts: Addison-Wesley, 2017.

[Lin+14] S. Lindholm et al. “Hybrid Data Visualization Based on Depth Complexity Histogram Analysis”. In: Computer Graphics Forum (2014).


[PDB16] PDB. Crystal structure of yeast Cdt1 C-terminal domain. Nov. 2016. url: http://www.rcsb.org/pdb/explore/explore.do?structureId=5meb.

[PDB17] PDB. Crystal structure of Scheffersomyces stipitis Rai1 in complex with (3’-NADP)+ and calcium ion. Jan. 2017. url: http://www.rcsb.org/pdb/explore/explore.do?structureId=5ulj.
