http://www.voreen.org
Our Contribution
• Common acceleration techniques for raycasting (e.g., bricking) cannot efficiently make use of CUDA-specific features such as shared memory.
• Approach: Decompose the volume into image-order slabs instead of object-order bricks, so that rays rather than voxels are grouped.
• The voxel sampling results for all rays inside a slab are explicitly cached in on-chip shared memory, which is not available to shaders. This saves memory bandwidth when lighting techniques require gradient calculation, because neighboring voxels no longer have to be fetched from the texture multiple times.
• Results:
  • For Phong lighting, the technique can improve the frame rate by up to 78%.
  • Slab-based raycasting without lighting is not expected to increase performance and only shows the inherent overhead of the technique.
  • This shows how much of a difference the use of shared memory can make compared to a shader implementation.
• Conclusion: Only a small performance advantage is possible when directly porting raycasting, but additional hardware features can give CUDA implementations a substantial advantage compared to shaders.
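The bandwidth argument above can be illustrated with a small CPU-side model. This is only a sketch under stated assumptions, not the poster's CUDA kernel: the rays of a slab are assumed to run parallel to the z axis with unit step size, a Python dict stands in for shared memory, and `CountingVolume`, `gradients_direct`, and `gradients_cached` are hypothetical names introduced for illustration.

```python
class CountingVolume:
    """Toy scalar field that counts how often it is sampled (assumption:
    stands in for a 3D texture; the field itself is arbitrary)."""

    def __init__(self):
        self.fetches = 0

    def fetch(self, x, y, z):
        self.fetches += 1
        return x * x + 2.0 * y + 3.0 * z


def interior(w, h, d):
    """Sample positions that have all six neighbors inside the slab."""
    return [(x, y, z)
            for x in range(1, w - 1)
            for y in range(1, h - 1)
            for z in range(1, d - 1)]


def gradients_direct(vol, w, h, d):
    """Central differences with six separate fetches per sample point."""
    grads = {}
    for (x, y, z) in interior(w, h, d):
        grads[(x, y, z)] = (
            (vol.fetch(x + 1, y, z) - vol.fetch(x - 1, y, z)) / 2.0,
            (vol.fetch(x, y + 1, z) - vol.fetch(x, y - 1, z)) / 2.0,
            (vol.fetch(x, y, z + 1) - vol.fetch(x, y, z - 1)) / 2.0,
        )
    return grads


def gradients_cached(vol, w, h, d):
    """Fetch every sample of the slab once, then differentiate in the cache."""
    cache = {(x, y, z): vol.fetch(x, y, z)
             for x in range(w) for y in range(h) for z in range(d)}
    grads = {}
    for (x, y, z) in interior(w, h, d):
        grads[(x, y, z)] = (
            (cache[(x + 1, y, z)] - cache[(x - 1, y, z)]) / 2.0,
            (cache[(x, y + 1, z)] - cache[(x, y - 1, z)]) / 2.0,
            (cache[(x, y, z + 1)] - cache[(x, y, z - 1)]) / 2.0,
        )
    return grads


# Usage: compare fetch counts for a slab of 16x14 rays and 16 steps.
direct_vol, cached_vol = CountingVolume(), CountingVolume()
gradients_direct(direct_vol, 16, 14, 16)
gradients_cached(cached_vol, 16, 14, 16)
print(direct_vol.fetches, cached_vol.fetches)
```

In this model both variants produce identical gradients; the cached variant touches each sample once, whereas the direct variant re-fetches every neighbor for each sample point.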
Acknowledgments
This work was partly supported by grants from Deutsche Forschungsgemeinschaft, SFB 656 MoBil Münster (project Z1). The presented concepts have been integrated into the Voreen volume rendering engine.
3D Texture Caching
• To evaluate the influence of the hardware texture cache on volume raycasting, we created a random permutation of the start and end point textures to deliberately destroy the coherence of the texture fetches.
• Result: Raycasting relies heavily on texture cache coherence; acceleration schemes must therefore preserve data locality.
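The permutation experiment can be sketched as follows. This is a minimal illustration, assuming the start and end point textures are flattened to per-pixel lists and that one shared permutation is applied to both so that every pixel still receives a matching start/end pair; the poster does not spell out these details, and `permute_ray_textures` is a hypothetical helper.

```python
import random


def permute_ray_textures(entry_points, exit_points, seed=0):
    """Apply one shared random permutation to both per-pixel lists.

    Each pixel keeps a valid (entry, exit) pair, but neighboring pixels
    now process unrelated rays, so their texture fetches lose coherence.
    """
    if len(entry_points) != len(exit_points):
        raise ValueError("entry and exit textures must have the same size")
    order = list(range(len(entry_points)))
    random.Random(seed).shuffle(order)
    shuffled_entry = [entry_points[i] for i in order]
    shuffled_exit = [exit_points[i] for i in order]
    return shuffled_entry, shuffled_exit, order


# Usage: four rays through a unit cube along the z axis.
entry = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0)]
exit_ = [(x, y, 1.0) for (x, y, _) in entry]
shuffled_entry, shuffled_exit, order = permute_ray_textures(entry, exit_)
```

The returned `order` would be needed to map the rendered pixels back to their original positions if the image itself is of interest; for a pure frame-rate comparison it can be ignored.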
Slab-Based Raycasting:
Efficient Volume Rendering with CUDA
Jörg Mensmann*, Timo Ropinski, Klaus Hinrichs
Department of Computer Science, University of Münster, Germany
Overview
GPU-based raycasting [Krüger and Westermann 2003] is the state-of-the-art rendering technique for interactive volume visualization. The ray traversal is usually implemented in a fragment shader, utilizing the hardware in a way that was not originally intended. New programming interfaces for stream processing, such as CUDA, support a more general programming model and the use of additional device features, which are not accessible through traditional shader programming. We propose a slab-based raycasting technique that is modeled specifically to use these features to accelerate volume rendering. This technique is based on experience gained from comparing fragment shader implementations of basic raycasting to implementations directly translated to CUDA kernels. The comparison covers direct volume rendering with a variety of optional features, e.g., gradient and lighting calculations.
Volume Raycasting with CUDA
• As a preliminary test, we analyzed basic raycasting with full Phong lighting, which requires multiple texture fetches per sample point.
• GLSL shaders from an existing volume rendering system were ported.
• Entry and exit points were generated by rendering a proxy geometry.
• No major speedup was expected from simply translating the raycasting shaders to CUDA kernels, as they use the same hardware.
• Result: Speedups of up to 30% were reached compared to OpenGL fragment shaders (depending on data set, GPU, viewport size, and lighting model).
• Selection of the CUDA block size can have a great influence on rendering performance, see the graph below (tested with an NVIDIA GeForce GTX 280):
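Because the best block size depends on the configuration, it is typically found by measurement. The following sketch shows one way such a search could be organized on the host side; it is not taken from the poster. The limit of 512 threads per block matches GPUs of the GTX 280 generation but is treated here as an assumption, the candidate widths are arbitrary, and `measure_fps` is a hypothetical callback that would render a few frames with the given block size and return the frame rate.

```python
def candidate_block_sizes(max_threads=512, widths=(8, 16, 32)):
    """Enumerate (width, height) pairs that fit into one thread block.

    Heights are restricted to even values to keep the search small.
    """
    sizes = []
    for w in widths:
        for h in range(2, max_threads // w + 1, 2):
            sizes.append((w, h))
    return sizes


def best_block_size(measure_fps, sizes):
    """Return the block size with the highest measured frame rate."""
    return max(sizes, key=measure_fps)


# Usage with a stand-in measurement function (no real rendering here).
def fake_measure_fps(size):
    return 1.0 if size == (16, 30) else 0.0


print(best_block_size(fake_measure_fps, candidate_block_sizes()))
```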
References
KIM, J. 2008. Efficient Rendering of Large 3-D and 4-D Scalar Fields. PhD thesis, University of Maryland, College Park.
KRÜGER, J., AND WESTERMANN, R. 2003. Acceleration techniques for GPU-based volume rendering. In Proceedings of IEEE Visualization, 287-292.
MARŠÁLEK, L., HAUBER, A., AND SLUSALLEK, P. 2008. High-speed volume ray casting with CUDA. IEEE Symposium on Interactive Ray Tracing, 185.
[Graph: engine data set (256²x128, 8 bit) and vmhead data set (512²x294, 16 bit); series: default, random]
* For questions, please contact:
mensmann@uni-muenster.de
lighting          viewport   basic   slab    speedup   bsopt
none (overhead)   512²       158.5   122.0   -23.0%    16x14
none (overhead)   768²       131.1    74.1   -43.5%    16x14
none (overhead)   1024²      100.2    43.4   -56.7%    16x30
Phong             512²        38.1    67.9   +78.2%    16x30
Phong             768²        27.1    34.1   +25.8%    16x30
Phong             1024²       17.2    19.5   +12.7%    16x30
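The speedup column follows from the two performance columns. As a small worked example, assuming that "basic" and "slab" are frame rates and that speedup is the relative change of the slab value over the basic value (the poster does not state the formula explicitly):

```python
def speedup_percent(basic_fps, slab_fps):
    """Relative frame rate change of slab-based over basic raycasting."""
    return (slab_fps / basic_fps - 1.0) * 100.0


# Usage: the two 512² rows of the table.
print(round(speedup_percent(158.5, 122.0), 1))  # no lighting
print(round(speedup_percent(38.1, 67.9), 1))    # Phong lighting
```

Since the listed frame rates are rounded, recomputed percentages may differ slightly from the printed speedup values for some rows.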