UPPSALA UNIVERSITET / UPPSALA UNIVERSITY
Inst. för informationsteknologi / Information Technology
Avd. för teknisk databehandling / Dept. of Scientific Computing
A Distributed Radiosity Renderer
by Mikael Lindkvist, Andreas Söderlund and Daniel Evestedt
Report No. 2000:3
January 27, 2000
Abstract
This document describes the theory of radiosity and an implementation of a distributed radiosity engine. Since the results of radiosity calculations are view-independent, radiosity is well suited for a realtime walkthrough engine. The program that we developed consists of two parts, both implemented using Java and Java3D.
The first part deals with the distribution and execution of radiosity calculations. It is based on the progressive radiosity method, with modifications that allow it to execute in parallel, distributed over a network. The second part displays the result and lets the user walk around in the scene in realtime. The main theoretical issue investigated and dealt with in this project is how to partition a scene in a way that is suited for both parallel calculations and efficient network distribution.
Contents
1 Introduction
  1.1 The MAD project
    1.1.1 Distributed Radiosity
    1.1.2 3D viewer
2 Radiosity
  2.1 The radiosity equation
  2.2 Progressive radiosity
  2.3 Form factors
    2.3.1 Hemicube approximation
    2.3.2 Raycasting
    2.3.3 Nusselt approximation
  2.4 Visibility calculations
    2.4.1 Bounding Volumes
  2.5 Adaptive meshing
3 Distributing the radiosity calculations
  3.1 Standard radiosity distribution
  3.2 Progressive radiosity distribution
4 Implementation
  4.1 Overview
  4.2 World setup
    4.2.1 Patching and partitioning
  4.3 World distribution
  4.4 The radiosity calculations
  4.5 Class hierarchy
    4.5.1 World
    4.5.2 LocalWorld
    4.5.3 RemoteWorld
    4.5.4 Partition
    4.5.5 PartitionImpl
    4.5.6 PartitionServer
    4.5.7 RadiosityNode
    4.5.8 RadiosityTriangle
    4.5.9 RadiosityBound
    4.5.10 FormfactorCalculator
    4.5.11 RayCaster
  4.6 Textures
5 Results
  5.1 Scaling of the system
  5.2 Load balancing with adaptive subdivision
  5.3 Textures
6 Conclusion
  6.1 Proposal of further research
7 Acknowledgements
A Java3D
  A.1 Introduction to Java3D
    A.1.1 Structure
    A.1.2 Interaction
    A.1.3 Java3D in a Radiosity 3D Engine
B File-formats
  B.1 Choice of Java3D loader
  B.2 The LightWave Scene format
C How to use the program
  C.1 RadiosityRenderer
    C.1.1 Parameters
  C.2 Rendering in local mode
  C.3 Rendering in distributed mode
  C.4 ViewSceneFile
  C.5 Parameters
  C.6 Viewing a Scene file
  C.7 Note on memory usage
1 Introduction
Most 3D-engines use reflection models that approximate the physical world, because these are less time- and CPU-consuming than the physical reflection model [18]. A classic example is the Phong model [11]. This model is view-dependent, which means that the object color has to be recalculated when the user moves around in the world. A big disadvantage of this method is that it uses a local reflection model, which produces unrealistic shadows because only light coming directly from the lightsources is considered when computing the color of an object. Reflections between objects are ignored, and no visibility tests are made.
The raytracing model is of little use in a 3D-engine. The fact that it is view-dependent and far more time- and CPU-consuming than the approximating models makes it unsuitable for real-time graphics.
The raytracing model is a global reflection model. This means it uses the reflections from other objects when computing the color of an object. This model is much closer to the physical world than the approximate models above, and therefore produces very realistic images, especially images with a lot of specular highlights. The "problem" is that it cannot handle diffuse reflections correctly; e.g. raytraced shadows tend to be very sharp.
Another popular 3D model is the radiosity model. This is a global reflection model, and it is even more time- and CPU-consuming than the raytracing model, but it has the advantage of being view-independent. This makes it possible to solve the radiosity equation once and for all to obtain the color intensity of the object surfaces, and then use this information in every frame update with little or no additional computation. Of course even this method has disadvantages. The biggest one is that the world must be static. If one or more objects are moved, the radiosity equations have to be solved all over again. Radiosity is in many ways the opposite of raytracing. It handles diffuse reflection correctly, but cannot handle specular highlights.
1.1 The MAD project
The aim of this project is to develop and implement a distributed radiosity 3D-engine using Java and the Java3D package. Because the radiosity model is view-independent, as mentioned above, the radiosity renderer computes the color intensity of all object surfaces and then hands this over to the 3D-engine for viewing. The renderer uses the Lightwave scene file format. See Appendix B for more details.
1.1.1 Distributed Radiosity
The radiosity model requires large computational resources. Because of this, it is a good idea to subdivide the computations and run them in parallel. This can be done in many ways, depending on what kind of hardware and network are available, for example using Java threads on a multiprocessor machine [7]. However, we chose a distributed solution. This solution has the advantage of being easy to implement in Java using RMI (Remote Method Invocation) and of running on ordinary computer networks. The disadvantages of this technique are the large communication overhead and the need to deal with load balancing among the participants.
1.1.2 3D viewer
The 3D viewer displays the color intensities computed by the renderer. It is a simple viewer implemented using the Java3D package. This should be quite an easy task, since the Java3D package has a lot of built-in features.
2 Radiosity
Radiosity is a technique for simulating the flow of light between perfectly diffuse surfaces in a closed environment. It is extremely computationally expensive but generates nice effects such as color bleeding and soft shadows. The good thing about it is that once the computation has been done the solution is completely view-independent, which makes it easy to move around in the environment with no extra radiosity calculations. To compute the radiosity for a patch we take into consideration the effects all other patches have on this patch. Hence a slight change of the environment requires a total recomputation of the solution, which makes radiosity a poor fit for design work, where you want to see the effect of a change to a model without having to wait too long. Research has been done to cope with this problem and several solutions have been proposed [14]. For example, an iterative solution can be used where the designer gets a preview of the final solution, sees the effects of the changes quickly and decides to discard or keep them. The picture then gets better and better the more iterations are done. There are several different techniques for an iterative approach; they are described in more detail in Section 2.2.
As we said before, radiosity handles only the diffuse colors of the environment. Hence view-dependent effects like specular highlights, mirrors and refraction are not handled. To get the most realistic images, hybrid methods that combine radiosity with other techniques can be used [10] [13]. For example, all the diffuse lighting can be calculated with a radiosity algorithm, and specular highlights and other effects can then be added with the help of raytracing.
2.1 The radiosity equation
In the radiosity model each surface is divided into many small patches (the size of the patches depends on how accurate the solution is to be), each of which is assumed to have a constant radiosity. The radiosity of a surface is the rate at which energy leaves that surface, i.e. energy per unit time per unit area. The radiosity of a given patch is given by
\[
B_i = E_i + \rho_i \sum_{j=1}^{n} B_j F_{ij} \tag{1}
\]
where
$B_i$ is the radiosity of patch $i$,
$E_i$ is the emitted radiosity of patch $i$ (if $E_i > 0$ the patch is considered a lightsource),
$\rho_i$ is the reflectivity of patch $i$,
$F_{ij}$ is the form factor from patch $i$ to patch $j$.
Each patch has a similar equation, and to get the radiosity values for all patches we can set up a linear equation system to solve:
\[
\begin{pmatrix}
1-\rho_1 F_{11} & -\rho_1 F_{12} & \cdots & -\rho_1 F_{1n} \\
-\rho_2 F_{21} & 1-\rho_2 F_{22} & \cdots & -\rho_2 F_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-\rho_n F_{n1} & -\rho_n F_{n2} & \cdots & 1-\rho_n F_{nn}
\end{pmatrix}
\begin{pmatrix} B_1 \\ B_2 \\ \vdots \\ B_n \end{pmatrix}
=
\begin{pmatrix} E_1 \\ E_2 \\ \vdots \\ E_n \end{pmatrix}
\tag{2}
\]
After this system has been solved we have the radiosity value for all the patches in the model.
The heaviest computations are done in the form factor calculations. The form factor is a value indicating how much one surface affects another. It is a purely geometric property, depending only on the relative position of the two patches. Much of the computation goes into deciding whether there are any occluding patches between them. There are many ways to calculate and approximate the form factor; they are handled in more detail in Section 2.3.
Once the form factor calculations are done, the problem comes down to solving the equation system. This can be done with a standard method like Jacobi or Gauss-Seidel. The problem is that the number of patches tends to get quite large, so large that storing the system becomes a problem. The largest equation systems that can be handled today are about the size 30000x30000. A radiosity solution can easily end up in much larger systems than that, so another approach must be taken. Cohen [2][4] has suggested the incremental solutions described below.
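To make the standard approach concrete, the following is a minimal Gauss-Seidel sketch for system (2). It is an illustration only, not code from the renderer: the class name `GaussSeidelRadiosity`, the dense `f[i][j]` storage and the stopping parameters `tol` and `maxSweeps` are assumptions made for the example, and dense storage is exactly what becomes impractical for large scenes.

```java
/** Illustrative Gauss-Seidel solver for B_i = E_i + rho_i * sum_j F_ij * B_j. */
final class GaussSeidelRadiosity {

    /**
     * Sweeps over all patches until the largest change in any B_i is below
     * tol, or until maxSweeps sweeps have been made.
     * e = emitted radiosity, rho = reflectivity, f[i][j] = form factor F_ij.
     */
    static double[] solve(double[] e, double[] rho, double[][] f,
                          double tol, int maxSweeps) {
        int n = e.length;
        double[] b = e.clone(); // start from the emitted radiosity
        for (int sweep = 0; sweep < maxSweeps; sweep++) {
            double maxChange = 0.0;
            for (int i = 0; i < n; i++) {
                double gathered = 0.0;
                for (int j = 0; j < n; j++) {
                    if (j != i) {
                        gathered += f[i][j] * b[j];
                    }
                }
                // Row i of system (2), solved for B_i using the newest values.
                double updated = (e[i] + rho[i] * gathered) / (1.0 - rho[i] * f[i][i]);
                maxChange = Math.max(maxChange, Math.abs(updated - b[i]));
                b[i] = updated;
            }
            if (maxChange < tol) {
                break;
            }
        }
        return b;
    }
}
```

For two patches with E = (1, 0), reflectivity 0.5 each and F_12 = F_21 = 0.5, the system reduces to B_1 = 1 + 0.25 B_2 and B_2 = 0.25 B_1, so the sketch should converge towards B_1 = 16/15 and B_2 = 4/15.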
2.2 Progressive radiosity
There are several variations of progressive radiosity, but they all follow the same basic algorithm.

    for each iteration do
        select a surface i
        calculate F_ij for all surfaces j
        for each surface j do
            update radiosity of surface j
            update emission of surface j
        end for
        set emission of surface i to zero
    end for

This loop is executed until the unshot light in the scene is below a given threshold.
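The loop above can be sketched in Java as follows. This is a simplified illustration rather than the renderer's code: it assumes the form factors are already available in a matrix `f` (a real progressive solver computes one row per iteration), that patches are planar so a patch does not light itself, and it picks the shooter with the most unshot energy. All names are hypothetical.

```java
/** Illustrative progressive radiosity (shooting) on precomputed form factors. */
final class ProgressiveShooting {

    /**
     * emission = E_i, rho = reflectivity, area = patch areas,
     * f[i][j] = form factor from patch i to patch j.
     * Stops when the total unshot energy drops below threshold.
     */
    static double[] solve(double[] emission, double[] rho, double[] area,
                          double[][] f, double threshold, int maxIterations) {
        int n = emission.length;
        double[] radiosity = emission.clone();
        double[] unshot = emission.clone(); // the "emission" updated in the loop
        for (int it = 0; it < maxIterations; it++) {
            // Select the surface with the most unshot energy and sum the total.
            int shooter = 0;
            double total = 0.0;
            for (int k = 0; k < n; k++) {
                total += unshot[k] * area[k];
                if (unshot[k] * area[k] > unshot[shooter] * area[shooter]) {
                    shooter = k;
                }
            }
            if (total < threshold) {
                break;
            }
            double shot = unshot[shooter];
            unshot[shooter] = 0.0; // set emission of the shooter to zero
            for (int j = 0; j < n; j++) {
                if (j == shooter) {
                    continue;
                }
                // Radiosity received by j: energy leaving the shooter towards j,
                // spread over the area of j and scaled by the reflectivity of j.
                double delta = rho[j] * shot * f[shooter][j] * area[shooter] / area[j];
                radiosity[j] += delta;
                unshot[j] += delta;
            }
        }
        return radiosity;
    }
}
```

Because each iteration only needs the form factors from the current shooter, the full matrix never has to be stored in a real implementation; the matrix is used here only to keep the example short.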
The advantage of progressive radiosity is that you can generate a picture showing how the scene looks after each iteration, and do not have to wait for the entire calculation to finish before seeing any result. It also does not have the same memory requirements as the standard solution. You have the option to stop the calculation before it converges if you think that the picture is good enough.
The different variations are:
Gathering, where one patch is updated by gathering light energy from all other patches.
Shooting, where the light energy from one patch is distributed to all other patches.
Shooting and sorting, where the patch with the most unshot light energy is always chosen as the shooting patch and all its energy is distributed to all other patches.
Shooting and sorting converges fastest, and gathering converges slowest.
In Figures 1 and 2 you can see snapshots from our progressive radiosity renderer. The figures show the Cornell box after 2, 8, 200 and 1000 iterations.
Figure 1: Cornell boxes during progressive radiosity: (a) after 2 iterations, (b) after 8 iterations.
Figure 2: Cornell boxes during progressive radiosity: (a) after 200 iterations, (b) after 1000 iterations.
2.3 Form factors
The form factor is an important part of the radiosity calculation. It is a property indicating how much a given patch affects another patch. The property is purely geometric, depending on the distance between the patches, the area of the patches, their position relative to each other and the occlusion by other patches. Because of this it only has to be computed once. The formula for the form factor is
\[
F_{ij} = \frac{1}{A_i} \int_{A_i} \int_{A_j} \frac{\cos\theta_i \cos\theta_j}{\pi r^2} V_{ij} \, dA_j \, dA_i \tag{3}
\]
where
$F_{ij}$ is the form factor between patch $i$ and patch $j$,
$A_i$ is the area of patch $i$,
$P_i$ is a point on patch $i$,
$V_{ij}$ is the visibility between patch $i$ and patch $j$,
$r$ is the distance between the two patches (between the points $P_i$ and $P_j$),
$\theta_i$ is the angle between the normal of patch $i$ and the vector from patch $i$ to patch $j$.
Figure 3: Two patches showing the form factor variables ($\theta_i$, $\theta_j$, $N_i$, $N_j$, $A_i$, $A_j$, $P_i$, $P_j$ and $r$).
Much of the calculation in this equation goes into the visibility term, since any patch can be an occluding patch. Hence all patches have to be taken into consideration. There are some ways to reduce these calculations, which are discussed further in Section 2.4. The formula includes a double integral, which is not practical to solve analytically. Hence several techniques to approximate the form factor have evolved.
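For intuition about what these approximations compute, consider two patches that are small compared with the distance between them. Under that assumption (ours, not a step taken in the report) the integrand of the form factor integral is nearly constant over both patches, and the double integral collapses to a point-to-point estimate:

\[
F_{ij} \approx \frac{\cos\theta_i \cos\theta_j}{\pi r^2} \, V_{ij} \, A_j
\]

For example, two mutually visible patches that face each other head-on ($\theta_i = \theta_j = 0$, $V_{ij} = 1$) at distance $r = 1$, with $A_j = 0.01$, give $F_{ij} \approx 0.01 / \pi \approx 0.0032$.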
2.3.1 Hemicube approximation
The hemicube, originally proposed by Cohen [3][4], is a geometrical approximation of the form factor. A hemicube (half a cube) is placed around the receiving patch and its surface is divided into cells. The smaller the cells, the better the approximation. Each cell is then assigned a delta form factor, which indicates how much a surface in the direction of the cell will affect the receiving patch. All other patches are then projected onto this hemicube, and each cell stores the distance and identity of the closest patch projected onto it. The form factor of a patch is then approximated as the sum of the delta form factors of the cells in which that patch is the closest one. Unfortunately there are some problems with the hemicube, since some lightsources and occluding surfaces can be missed, e.g. if they get projected onto the edges of the delta cells.
Figure 4: The hemicube. All triangles are projected onto the surface of the cube and the form factor is the sum of the delta form factors of the cells it is projected onto.
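The delta form factors can be tabulated once per hemicube resolution. The sketch below uses the standard closed-form expressions for a unit hemicube (top face at height 1, side faces at distance 1), which the report does not spell out; the class and method names are hypothetical. Since the hemicube covers the whole hemisphere above the patch, the delta form factors of all cells should sum to approximately 1, which gives a simple sanity check.

```java
/** Illustrative delta form factors for a unit hemicube centred on a patch. */
final class HemicubeDeltaFormFactors {

    /** Cell of area dA centred at (x, y) on the top face (z = 1). */
    static double top(double x, double y, double dA) {
        double d = x * x + y * y + 1.0;
        return dA / (Math.PI * d * d);
    }

    /** Cell of area dA centred at offset y and height z on a side face. */
    static double side(double y, double z, double dA) {
        double d = y * y + z * z + 1.0;
        return z * dA / (Math.PI * d * d);
    }

    /**
     * Sums the delta form factors of every cell. The top face has res x res
     * cells and each of the four side faces has res x res/2 cells, so res
     * should be even.
     */
    static double total(int res) {
        double h = 2.0 / res; // edge length of one cell
        double dA = h * h;
        double sum = 0.0;
        for (int a = 0; a < res; a++) {
            double u = -1.0 + (a + 0.5) * h;
            for (int b = 0; b < res; b++) {
                double v = -1.0 + (b + 0.5) * h;
                sum += top(u, v, dA);
            }
            for (int c = 0; c < res / 2; c++) {
                double z = (c + 0.5) * h;
                sum += 4.0 * side(u, z, dA); // the four side faces are identical
            }
        }
        return sum;
    }
}
```

A patch's form factor is then the sum of the tabulated values of the cells in which it is the closest patch, as described above.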
2.3.2 Raycasting
In raycasting a random point is selected on the source patch and a ray is shot to it from each of the corners of the receiving patch to check whether there are any occluding patches in between. A form factor estimate is then made for each vertex. To get better approximations several rays can be shot, and the form factor is then taken as a mean of the individual estimates.
With raycasting all patches are taken into account and none are missed. On the other hand, more calculations are needed.
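A per-vertex estimate of this kind might look like the sketch below. It is not the renderer's RayCaster: the source patch is assumed to be a parallelogram given by a corner and two edge vectors, the visibility test is passed in as a callback, and all names are hypothetical. Each ray contributes the point-to-point kernel cos θ_i cos θ_j / (π r²), and the mean over all rays is scaled by the source area.

```java
import java.util.Random;
import java.util.function.BiPredicate;

/** Illustrative ray-sampled form factor from a receiver vertex to a source patch. */
final class RaycastFormFactor {

    /**
     * p, n    = receiver vertex and its unit normal
     * o, e1, e2 = corner and edge vectors of the source parallelogram;
     *             e1 x e2 must point towards the side the source emits from
     * visible = returns true if nothing blocks the segment between two points
     */
    static double estimate(double[] p, double[] n,
                           double[] o, double[] e1, double[] e2,
                           int rays, Random rng,
                           BiPredicate<double[], double[]> visible) {
        double[] ns = cross(e1, e2);
        double area = Math.sqrt(dot(ns, ns));
        double sum = 0.0;
        for (int s = 0; s < rays; s++) {
            double u = rng.nextDouble();
            double v = rng.nextDouble();
            double[] q = new double[3]; // random point on the source patch
            double[] d = new double[3]; // vector from receiver to that point
            for (int k = 0; k < 3; k++) {
                q[k] = o[k] + u * e1[k] + v * e2[k];
                d[k] = q[k] - p[k];
            }
            double r2 = dot(d, d);
            double cosR = dot(n, d);          // r * cos(theta) at the receiver
            double cosS = -dot(ns, d) / area; // r * cos(theta) at the source
            if (cosR <= 0.0 || cosS <= 0.0 || !visible.test(p, q)) {
                continue; // facing away or occluded: this ray contributes nothing
            }
            sum += cosR * cosS / (Math.PI * r2 * r2);
        }
        return area * sum / rays;
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }
}
```

Passing the visibility test as a callback keeps the estimator independent of how occlusion is decided, in the same spirit as the exchangeable form factor calculator described in Section 4.5.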
2.3.3 Nusselt approximation
Nusselt developed a geometric approximation of the form factor. The form factor between two patches is approximated by projecting one of the patches onto half a sphere which is set up around the other patch. This projection is then projected down to the base. The form factor is the projected area at the base divided by the base area.
Figure 5: Nusselt approximation. The form factor is approximated by B/A, where B is the projected area and A the base area.
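As a worked example of this construction (ours, not taken from the report), consider a disk of radius $R$ that is parallel to the receiving point's surface and centred at height $h$ directly above it. Seen from the point, the rim of the disk lies at an angle $\alpha$ from the normal with

\[
\sin\alpha = \frac{R}{\sqrt{R^2 + h^2}} .
\]

Projecting the disk onto a unit hemisphere gives a spherical cap of half-angle $\alpha$, and projecting that cap down onto the base gives a disk of radius $\sin\alpha$. With a base of unit radius this yields

\[
F = \frac{B}{A} = \frac{\pi \sin^2\alpha}{\pi} = \frac{R^2}{R^2 + h^2},
\]

so a disk whose radius equals its height ($R = h$) has a form factor of $1/2$.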
2.4 Visibility calculations
Before a form factor between two points on different surfaces can be computed, one must know whether these points are visible to each other. The simplest way to do this is to cast a ray from one point to the other and check whether any other object in the world intersects this ray. As one can see, this implies a lot of computation, i.e. one ray-object intersection test for each object. The accuracy of the visibility computation depends on the number of sample points used.
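Since all objects in this renderer are triangles, the basic visibility test is a segment-triangle intersection. The report does not say which intersection routine is used; the sketch below uses the well-known Möller-Trumbore test as a stand-in, with hypothetical names, and reports a hit only when the intersection lies strictly between the two end points.

```java
/** Illustrative occlusion test: does triangle (a, b, c) block the segment p-q? */
final class SegmentTriangle {

    static boolean blocks(double[] p, double[] q,
                          double[] a, double[] b, double[] c) {
        final double eps = 1e-9;
        double[] dir = sub(q, p);        // unnormalised, so t = 1 is the point q
        double[] e1 = sub(b, a);
        double[] e2 = sub(c, a);
        double[] h = cross(dir, e2);
        double det = dot(e1, h);
        if (Math.abs(det) < eps) {
            return false;                // segment is parallel to the triangle
        }
        double inv = 1.0 / det;
        double[] s = sub(p, a);
        double u = inv * dot(s, h);      // first barycentric coordinate
        if (u < 0.0 || u > 1.0) {
            return false;
        }
        double[] w = cross(s, e1);
        double v = inv * dot(dir, w);    // second barycentric coordinate
        if (v < 0.0 || u + v > 1.0) {
            return false;
        }
        double t = inv * dot(e2, w);     // position of the hit along the segment
        return t > eps && t < 1.0 - eps; // strictly between p and q
    }

    static double[] sub(double[] x, double[] y) {
        return new double[] {x[0] - y[0], x[1] - y[1], x[2] - y[2]};
    }

    static double dot(double[] x, double[] y) {
        return x[0] * y[0] + x[1] * y[1] + x[2] * y[2];
    }

    static double[] cross(double[] x, double[] y) {
        return new double[] {
            x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]
        };
    }
}
```

Two points are then mutually visible if no triangle in the world blocks the segment between them, which is the per-object cost the paragraph above refers to.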
2.4.1 Bounding Volumes
Bounding volumes are a way to group adjacent objects (polygons) together and enclose them in "virtual" volumes. This creates a hierarchy of objects and groups of objects (the algorithm may be recursive). The idea is that if the ray does not intersect the bounding volume, it does not intersect any of the objects inside it. The shape of the bounding volume may vary. A simple shape, such as a box, is easy to use in intersection computations, but it does not closely fit the objects enclosed. The other extreme is a convex hull. This fits the enclosed objects much better, but intersection computations on it are more complicated [16].
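For the simple box case, the intersection test can be done with the usual "slab" method, sketched below for an axis-aligned box. The class is hypothetical and is not the RadiosityBound class of the implementation; it only illustrates why a box is cheap to test.

```java
/** Illustrative axis-aligned bounding box with a segment intersection test. */
final class BoundingBox {
    private final double[] min;
    private final double[] max;

    BoundingBox(double[] min, double[] max) {
        this.min = min.clone();
        this.max = max.clone();
    }

    /** Returns true if the segment from p to q touches the box. */
    boolean intersectsSegment(double[] p, double[] q) {
        double tMin = 0.0; // parameter range of the segment still inside all slabs
        double tMax = 1.0;
        for (int k = 0; k < 3; k++) {
            double d = q[k] - p[k];
            if (d == 0.0) {
                // Segment is parallel to this slab: it must start inside it.
                if (p[k] < min[k] || p[k] > max[k]) {
                    return false;
                }
            } else {
                double t0 = (min[k] - p[k]) / d;
                double t1 = (max[k] - p[k]) / d;
                if (t0 > t1) {
                    double tmp = t0;
                    t0 = t1;
                    t1 = tmp;
                }
                tMin = Math.max(tMin, t0);
                tMax = Math.min(tMax, t1);
                if (tMin > tMax) {
                    return false; // the slabs do not overlap along the segment
                }
            }
        }
        return true;
    }
}
```

Only if this test succeeds do the triangles (or child volumes) inside the box need to be tested individually.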
2.5 Adaptive meshing
To get a solution of good quality, the surfaces in the scene have to be subdivided into many small patches [19]. The more and smaller the patches, the better the solution will be, but this also increases the memory and CPU requirements. A way to cope with this is adaptive meshing. With this technique the patches are subdivided further than the original meshing only in areas where it is needed, i.e. at shadow borders and boundaries between objects. By using this technique we do not divide areas that do not need dividing and therefore get fewer patches.
To know when to divide a patch, we calculate the radiosity for several points on the patch and check whether the difference in radiosity between the points is above a certain threshold value. If it is, the patch is subdivided into smaller pieces, and the radiosity values for these new patches are calculated and checked for possible further subdivision. This process continues until no further subdivision is needed. The results of adaptive meshing can be seen in Figure 6.
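The subdivision test can be sketched as a short recursive routine. The version below is a simplification made for illustration: it samples the radiosity only at the three corners of a triangle, splits a triangle into four at its edge midpoints, and adds a depth limit so that recursion always terminates. The names and the callback supplying the radiosity are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Illustrative adaptive subdivision of a triangle driven by a radiosity threshold. */
final class AdaptiveMesher {

    /** Returns the leaf triangles, each as an array of three points. */
    static List<double[][]> mesh(double[] a, double[] b, double[] c,
                                 ToDoubleFunction<double[]> radiosity,
                                 double threshold, int maxDepth) {
        List<double[][]> out = new ArrayList<>();
        subdivide(a, b, c, radiosity, threshold, maxDepth, out);
        return out;
    }

    private static void subdivide(double[] a, double[] b, double[] c,
                                  ToDoubleFunction<double[]> radiosity,
                                  double threshold, int depthLeft,
                                  List<double[][]> out) {
        double ra = radiosity.applyAsDouble(a);
        double rb = radiosity.applyAsDouble(b);
        double rc = radiosity.applyAsDouble(c);
        double spread = Math.max(ra, Math.max(rb, rc)) - Math.min(ra, Math.min(rb, rc));
        if (spread <= threshold || depthLeft == 0) {
            out.add(new double[][] {a, b, c}); // smooth enough: keep as one patch
            return;
        }
        double[] ab = mid(a, b);
        double[] bc = mid(b, c);
        double[] ca = mid(c, a);
        subdivide(a, ab, ca, radiosity, threshold, depthLeft - 1, out);
        subdivide(ab, b, bc, radiosity, threshold, depthLeft - 1, out);
        subdivide(ca, bc, c, radiosity, threshold, depthLeft - 1, out);
        subdivide(ab, bc, ca, radiosity, threshold, depthLeft - 1, out);
    }

    private static double[] mid(double[] p, double[] q) {
        double[] m = new double[p.length];
        for (int k = 0; k < p.length; k++) {
            m[k] = 0.5 * (p[k] + q[k]);
        }
        return m;
    }
}
```

With a constant radiosity the triangle is returned unchanged, while a sharp edge such as a shadow border keeps splitting the triangles that straddle it until the depth limit is reached.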
3 Distributing the radiosity calculations
Since the radiosity computations are demanding, it is quite natural to try to distribute them over several computers to achieve better performance. Most workplaces today have a lot of computers connected in a network. If each one of these machines could calculate a share of the solution, the performance would be much better than when running on just one computer, even a very fast one. Ideally the performance gain from adding computers would be linear. Unfortunately we don't live in a perfect world, and communication overhead causes the performance gain to shrink the more nodes are added. To minimize this effect we must try to keep the network communication very low and let the nodes work on their own most of the time.
Figure 6: Cornell box with adaptive meshing: (a) viewed as wireframe, (b) viewed as wireframe using adaptive meshing.
3.1 Standard radiosity distribution
In the standard radiosity solution the distribution comes down to
1. Distributing the form factor calculation and possibly the storage of them.
2. Distributing the solution of the linear equation system.
To calculate the form factor between two patches we unfortunately have to consider all the other patches in the environment in order to check for visibility. This is not good, since the problem then has bad locality and it might be difficult to distribute the calculation without a lot of communication between the nodes. One way to do it is to distribute the whole world to each node. This is not desirable, however, since the world can be very large. We also get the problem of keeping some form of consistency between the copies of the world. The best way would be to divide the patches over the nodes and let each node take care only of the patches it has. Then we could have a master node which asks the slaves questions about their patches. When e.g. a visibility test has to be done, the master can let each node decide whether any of its patches lies in between.
When the form factors have been calculated the solving of the equation system remains. There are several ways to do this.
3.2 Progressive radiosity distribution
As described before, the progressive radiosity loop is the following:

    for each iteration do
        select a surface i
        calculate F_ij for all surfaces j
        for each surface j do
            update radiosity of surface j
            update emission of surface j
        end for
        set emission of surface i to zero
    end for
There have been several attempts to distribute this loop. One is control distribution, where several instances of the iteration are started at the same time and execute concurrently [1]. The problem with this approach is that information has to be exchanged between the different instances of the loop as the radiosity values get updated. This generates a lot of overhead, especially if a shooting method is used, since in shooting methods every patch gets updated in each iteration. Gathering, on the other hand, updates only the value of one patch in each iteration, which leads to a lot less communication between the nodes. The bad thing about it is that you have to store the entire world at each node.
Another approach is to distribute the data and let each node perform calculations on its own dataset. This can be done by distributing the inner loop. When the shooting surface has been selected it is sent to all nodes, and each node performs form factor calculations and radiosity updates locally. What needs to be stored at each node is the original model of the world, so that visibility calculations can be performed locally. This approach is nice since the only overhead is the decision of which patch to choose and the broadcast of the patch information. The hard part is to get good load balancing, so that no node is idle for a long time while the other nodes are doing computations. This might not be a problem if we have equally fast computers and the patches do not change during the iterations, since we can then just give each node an equal number of patches to handle. But if we use adaptive meshing, the subdivision of patches might make the system unbalanced.
To cope with this we can do some kind of smart initial distribution, which takes into account how patches are likely to be divided and distributes patches that will probably be divided to different nodes. Alternatively we can use some sort of dynamic load balancing, which finds out if some node is idle and moves work there. The latter requires extra overhead, which is unwanted. It might even take more time to move the work than to just wait for the other nodes to finish. Careful consideration is needed when selecting which technique to use.
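One simple static scheme of this kind, and the one Section 4.3 describes for our implementation, is to order the elements so that geometric neighbours are adjacent and then deal them out to the nodes one at a time. A minimal sketch, assuming the ordering has already been done and using hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative round-robin distribution of ordered elements over nodes. */
final class RoundRobinDistribution {

    /**
     * ordered   = elements sorted so that geometric neighbours are adjacent
     * nodeCount = number of nodes (partitions) to deal the elements to
     */
    static <T> List<List<T>> deal(List<T> ordered, int nodeCount) {
        List<List<T>> partitions = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++) {
            partitions.add(new ArrayList<>());
        }
        for (int k = 0; k < ordered.size(); k++) {
            // Neighbouring elements k and k + 1 always land on different nodes.
            partitions.get(k % nodeCount).add(ordered.get(k));
        }
        return partitions;
    }
}
```

Dealing seven elements to three nodes gives the partitions [0, 3, 6], [1, 4] and [2, 5], so a cluster of neighbouring elements that later gets subdivided is spread over all nodes instead of overloading one of them.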
4 Implementation
This section covers the implementation of our distributed radiosity engine. Java-specific documentation for classes and methods was generated using javadoc. It is distributed along with the source code. For information about Java3D, see Appendix A.
4.1 Overview
Our radiosity solution uses progressive radiosity with shooting and sorting. We have chosen this approach since the systems of equations would probably be too large to solve directly. It is also nice to be able to show intermediate solutions, since we don't have to wait for the entire solution to finish before seeing if something is wrong. To calculate the form factors we use the raycasting technique, and we have also implemented adaptive meshing to get more realistic scenes.
For the distribution of the radiosity calculations we have chosen Java's RMI (Remote Method Invocation). It is a very easy way to get things done on another node without explicitly sending messages. Once a remote object has been initialized it can be used like any other object. Methods can be called just as usual, but the execution will take place on another node. This makes it very easy to distribute things.
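The sketch below illustrates what "used like any other object" means in practice. The interface and method names are invented for the example and are not the Partition interface of our implementation; the only RMI-specific parts are that the interface extends `java.rmi.Remote` and that its methods declare `RemoteException`. The calling code is the same whether it is handed a local object, as here, or a stub obtained from an RMI registry.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

/** Hypothetical remote interface; the method name is illustrative only. */
interface RemotePartition extends Remote {
    double maxUnshotFlux() throws RemoteException;
}

/** Plain in-memory implementation, standing in for a partition on a slave. */
final class InMemoryPartition implements RemotePartition {
    private final double[] unshot;

    InMemoryPartition(double[] unshot) {
        this.unshot = unshot.clone();
    }

    @Override
    public double maxUnshotFlux() {
        double max = 0.0;
        for (double u : unshot) {
            max = Math.max(max, u);
        }
        return max;
    }
}

/** Caller that neither knows nor cares where the partitions actually live. */
final class RmiCaller {
    static double largest(RemotePartition[] partitions) throws RemoteException {
        double best = 0.0;
        for (RemotePartition p : partitions) {
            // With a real stub this call would be executed on another node.
            best = Math.max(best, p.maxUnshotFlux());
        }
        return best;
    }
}
```

In a real deployment the implementation would additionally be exported (for example with `java.rmi.server.UnicastRemoteObject`) and bound in a registry, which is the job of the PartitionServer class described in Section 4.5.6.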
The basic structure of the implementation is shown in Figure 7, which illustrates the flow of the program. We have tried to make our implementation as modular as possible, so that it is easy to change e.g. the algorithm for form factor calculation.
Figure 7: Flowchart of the distributed radiosity engine. The master reads the input file, converts it to the scene representation, performs the initial meshing and distributes the patches to the remote partitions on the slaves, where each partition is initialized. The following steps are repeated until the total unshot flux is below the threshold: the master requests the patch with most flux, each slave returns its local max-flux patch, the master sends out the shooting patch, and each slave performs its local radiosity update. Finally the master requests the calculated values, merges the patch trees and passes the result to the 3D engine.
4.2 World setup
The objects used in the world are all represented as triangles. The loader reads in a model of the world specified in the Lightwave scene file format (lws) and transfers it to a Java3D scene object. We then convert this to our own representation, called RadiosityNode, since we do not need all the information in the Java3D scenegraph and we need to do some manipulation which cannot easily be done otherwise. The triangles in the tree are then subdivided down to a specified area, which is given by the user.
4.2.1 Patching and partitioning
The actual subdivision, or patching, of the geometry is done as shown in Figure 8. We start from the RadiosityNode obtained from the Java3D SceneGraph (1). The triangles of this RadiosityNode are patched into a number of smaller triangles (2). Note that, depending on the original sizes of the triangles, some triangles might be patched more than others and some might not be patched at all. Once we have such a tree we can partition it into as many partitions as we have clients in the system (3). The black boxes in the figure represent null references.
Figure 8: Geometry patching and partitioning: (1) original tree, (2) patched tree, (3) tree partitioned into two partitions.
Figure 9: Distribution of scene (legend: original scene patch, copied to all nodes; element after initial meshing, distributed among the nodes; bounding box).
4.3 World distribution
We distribute the world representation by creating a remote object on each slave with its part of the patches. Every node gets the original representation of the world before the meshing, in order to speed up visibility calculations. Then the meshed elements of the world get distributed equally over all nodes. To get some kind of load balancing, the elements are ordered so that elements that are close to each other geometrically are also close in the ordering. This is due to the observation that if an element gets subdivided, the elements next to it will most likely be subdivided too. This is because areas that get subdivided, e.g. shadow borders and boundaries between objects, stretch over several adjacent patches. When the ordering is done we just go through the elements one at a time and distribute them to different nodes.
4.4 The radiosity calculations
When all this is done we can begin calculating the radiosity values for the patches. First we need to get the patch with most unsent flux as shooting patch. This is done by the master invoking a method on all of the slaves, requesting their patch with most unsent flux. The master then compares the slaves' patches and chooses the one with most unsent flux. Another method is then invoked on all slaves with this patch, requesting the slaves to update the radiosity values of their own local patches with respect to how the shooting patch affects them. This includes form factor calculations, which in our case, since we use raycasting, also means visibility calculations for the rays. To speed these calculations up, every slave uses the original structure with the triangles. When every node has finished its update, the patch with most unsent flux is chosen again and we start over. This is continued until the total unsent flux in the world gets below a certain threshold. Then we convert our representation back to a SceneGraph and pass it to the 3D-engine.
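The master's side of this exchange can be sketched as follows. The code is a simplified stand-in written for illustration: the `Slave` interface is a local placeholder for the remote partitions (in our implementation these calls go over RMI), `ArraySlave` is a toy partition that only clears the flux of the patch that was shot, and all names are hypothetical.

```java
import java.util.List;

/** Illustrative master loop: pick the global shooting patch, then broadcast it. */
final class MasterLoop {

    /** A proposed shooting patch: owning slave, local patch id, unsent flux. */
    static final class Candidate {
        final int slave;
        final int patch;
        final double flux;

        Candidate(int slave, int patch, double flux) {
            this.slave = slave;
            this.patch = patch;
            this.flux = flux;
        }
    }

    /** Hypothetical stand-in for a remote partition. */
    interface Slave {
        Candidate localMax();           // local patch with most unsent flux
        double totalUnshot();           // sum of unsent flux in this partition
        void update(Candidate shooter); // local radiosity update for this shooter
    }

    /** Toy partition: a real slave would also add reflected flux to its patches. */
    static final class ArraySlave implements Slave {
        private final int index;
        private final double[] unshot;

        ArraySlave(int index, double[] unshot) {
            this.index = index;
            this.unshot = unshot.clone();
        }

        public Candidate localMax() {
            int best = 0;
            for (int k = 1; k < unshot.length; k++) {
                if (unshot[k] > unshot[best]) {
                    best = k;
                }
            }
            return new Candidate(index, best, unshot[best]);
        }

        public double totalUnshot() {
            double sum = 0.0;
            for (double u : unshot) {
                sum += u;
            }
            return sum;
        }

        public void update(Candidate shooter) {
            if (shooter.slave == index) {
                unshot[shooter.patch] = 0.0; // the shooter has sent all its flux
            }
        }
    }

    /** Runs shooting iterations and returns how many were performed. */
    static int run(List<Slave> slaves, double threshold, int maxIterations) {
        int iterations = 0;
        while (iterations < maxIterations) {
            double total = 0.0;
            Candidate best = null;
            for (Slave s : slaves) {
                total += s.totalUnshot();
                Candidate c = s.localMax();
                if (best == null || c.flux > best.flux) {
                    best = c;
                }
            }
            if (best == null || total < threshold) {
                break; // converged: total unsent flux is below the threshold
            }
            for (Slave s : slaves) {
                s.update(best); // broadcast the shooting patch to every slave
            }
            iterations++;
        }
        return iterations;
    }
}
```

The only data crossing the network per iteration is one candidate from each slave and one broadcast of the chosen shooter, which is why this scheme keeps the communication overhead low.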
4.5 Class hierarchy
The structure of the program is shown in Figure 10. This section gives a description of all the parts.
4.5.1 World
The World class represents everything in a given scene. It contains all triangles and their surface properties. It also contains a FormFactorCalculator, which calculates the form factor between the triangles. This can be done in many ways, and to change the algorithm for how the form factor is calculated we only need to change the FormFactorCalculator of the world. World is actually an abstract class defining which methods should be available when implementing a world representation.
Figure 10: Structure of the radiosity engine. The diagram relates the classes World, LocalWorld, RemoteWorld, Partition, PartitionImpl, PartitionServer, RadiosityNode, RadiosityTriangle, RadiosityBound, FormfactorCalculator and RayCaster through "extends", "implements" and "contains" relations.
4.5.2 LocalWorld
Extends the World class by implementing a representation of the world with a RadiosityNode. This class is used when the entire world is to be stored and all the calculations are to be done on one machine.
4.5.3 RemoteWorld
Implements a world representation where the world is divided into partitions and every partition is stored on a different machine. This way the calculations can be done in parallel on the different partitions, which speeds things up significantly.
4.5.4 Partition
Interface defining the operations which are to be done on a partition of a World, e.g. updating the radiosity of all the patches in the partition.
4.5.5 PartitionImpl
Implementation of the Partition interface. It represents the Partition with a RadiosityNode structure.
4.5.6 PartitionServer
Does the initial RMI setup. Starts up a server which takes care of the remote calls. It creates a new partition object and binds it to the rmiregistry. This makes it accessible to the master.
4.5.7 RadiosityNode
The base class for the tree representation. Contains a Vector of child RadiosityNodes to represent the tree.
4.5.8 RadiosityTriangle
Contains the information about a triangle needed for radiosity calculations, such as vertices, radiosity values and normal.
4.5.9 RadiosityBound
Bounding primitives used for speeding up the visibility calculations. These are not used in our current implementation.
4.5.10 FormfactorCalculator
Interface used to implement different methods of calculating form factors.
4.5.11 RayCaster
Implementation of the FormFactorCalculator interface. Calculates the form factor using the ray casting method (see Section 2.3.2).
4.6 Textures
The number of triangles in a scene can be very large if a good radiosity solution is computed. This results in a lot of triangles to show in the 3D-engine. In order to bound how slowly a scene is rendered we can, instead of showing all these small triangles, calculate a texture from all the triangles in an original surface. This texture can then be applied to the original triangle, and the frame rate will be constant, independent of how many triangles the original ones have been subdivided into. The problem with this approach is that we need a lot of memory to store the textures, and we need a different texture for every original patch. This approach works best when the original scene has large triangles which are subdivided into many smaller triangles. If the triangles are small from the beginning it should not be used.
5 Results
5.1 Scaling of the system
We have tested the program on Sun Ultra 10 Creator 3D machines. We tried out some different configurations. We rendered the Cornell box locally and distributed over 2, 4 and 8 machines, with patch area 1000 and 10 rays for the raycaster. The results can be seen in Figure 11. As you can see, the system scales very well for at least 8 machines. Unfortunately we didn't have access to any more machines, so we could not do any further testing. The results indicate, however, that the scaling would be good for more machines as well, since the speedup so far is close to linear. The actual time and convergence can be seen in Figure 12.
Figure 11: Scaling of the renderer.
Figure 12: Rendering time.
5.2 Load balancing with adaptive subdivision
Our program offers adaptive subdivision as an option. As Figure 6(b) shows, triangles are subdivided further along shadow borders and along borders between objects. We have tried to distribute the triangles so that they will be subdivided equally on all nodes. This is done by distributing triangles that are close to each other on different nodes [12] [19]. The result of this load balancing is shown in Figure 13. In this scenario we used the Cornell box scene on six nodes with adaptive meshing. After the initial meshing the scene consisted of 1464 patches, which were divided equally between the nodes. Calculating the scene then subdivided these patches, so that the final scene had 8593 patches. The load balance is quite good: node 5 is the one that has to wait the most and is idle for about 20 % of the time. A dynamic load balancing algorithm might do better, but the balancing itself carries a lot of overhead, which may cost more performance than it gains. Our static algorithm is much easier to implement and gives a good enough result, so we did not think it necessary to replace it with a dynamic one without knowing whether it would increase performance at all.
The drawback is that all machines must be equally fast; otherwise the system runs at the speed of the slowest computer.
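The interleaving idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the project: the class name InterleavedDistribution, the representation of patches by their centroids, and the lexicographic sort used as a measure of closeness are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: static load balancing by interleaving neighbours. */
final class InterleavedDistribution {

    /**
     * @param centroids centroids[i] = {x, y, z} of patch i (assumed representation)
     * @param nodeCount number of render nodes
     * @return for each node, the indices of the patches assigned to it
     */
    static List<List<Integer>> distribute(double[][] centroids, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort lexicographically by (x, y, z) so that patches that are close in
        // space tend to end up next to each other in the list. This ordering is
        // a crude stand-in for whatever notion of closeness a real mesher uses.
        Arrays.sort(order, (a, b) -> {
            for (int axis = 0; axis < 3; axis++) {
                int c = Double.compare(centroids[a][axis], centroids[b][axis]);
                if (c != 0) {
                    return c;
                }
            }
            return 0;
        });
        List<List<Integer>> nodes = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++) {
            nodes.add(new ArrayList<>());
        }
        // Deal the sorted patches out round-robin: consecutive (nearby) patches
        // land on different nodes.
        for (int k = 0; k < order.length; k++) {
            nodes.get(k % nodeCount).add(order[k]);
        }
        return nodes;
    }
}
```

With 1464 initial patches and six nodes, such a scheme gives each node 244 patches. Because neighbouring patches sit on different nodes, a region that is heavily subdivided later (for example a shadow border) tends to add work to all nodes rather than to one.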
Figure 13: Load balancing using 6 machines and adaptive subdivision.
5.3 Textures
The texture part did not live up to our expectations. In our experiment the textured scene was slower to render and took up much more memory than the scene made up of triangles only. The textured scene contained 44 triangles with textures, while the normal scene contained 2234 triangles without textures. The graphics card may have lacked good support for textures, but we do not know. The one advantage of the textured scene was that its memory use and speed did not change no matter how much the triangles were subdivided. So if the scene is divided into ever smaller triangles, the textured scene will eventually be the faster one.
6 Conclusion
We have developed a distributed radiosity renderer and viewer with decent scalability and load balancing. The solution is not optimal, but since we only had a limited amount of time for this project, we chose a simple and robust solution.
Radiosity computations are very time- and CPU-consuming, and since Java runs on a virtual machine, it was not a very good choice of programming language in that respect. On the other hand, it was very easy to develop the distributed part using Java's RMI, and the slaves can run on any platform with Java installed. This is an important feature when dealing with heavy calculations like these. Another nice feature is that, with little effort, the masters and slaves can run as applets in browsers all around the world.
6.1 Proposal of further research
The most annoying thing about radiosity is that the world has to be static.
This rules out any interaction with the world. Finding a way to move objects and light sources around would be interesting to investigate. If the form factors are saved when a scene is calculated, the solution can be updated much faster when something is moved, because not all form factors have to be recalculated. If we only want to change the lighting in the scene, we do not have to recalculate any form factors at all. The problem may be storing them all in memory, since the number of patches in a world tends to be vast. Another option may be to approximate the shadows instead of recomputing the radiosity solution [14] and still get reasonably good effects.
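The relighting case can be illustrated with a small sketch. It assumes that the form factors are kept as a dense matrix and re-solves the standard radiosity equation B_i = E_i + rho_i * sum_j F_ij B_j for a new emission vector by plain Jacobi (gathering) iteration. This is a simplification for illustration and not the progressive method used by our renderer; the class name Relight and its parameters are hypothetical.

```java
/** Illustrative sketch only: re-solve radiosity for new lighting with cached form factors. */
final class Relight {

    /**
     * Solves B = E + diag(rho) * F * B by Jacobi iteration.
     *
     * @param formFactors stored matrix, formFactors[i][j] = F_ij (assumed dense)
     * @param reflectance rho_i for each patch
     * @param emission    E_i for each patch (the only input that changes when relighting)
     * @param iterations  number of Jacobi sweeps to perform
     * @return the radiosity B_i of each patch
     */
    static double[] solve(double[][] formFactors, double[] reflectance,
                          double[] emission, int iterations) {
        int n = emission.length;
        double[] b = emission.clone();
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                // Gather the light arriving at patch i from all other patches.
                double gathered = 0.0;
                for (int j = 0; j < n; j++) {
                    gathered += formFactors[i][j] * b[j];
                }
                next[i] = emission[i] + reflectance[i] * gathered;
            }
            b = next;
        }
        return b;
    }
}
```

Calling solve twice with the same formFactors and reflectance but different emission arrays gives two lighting setups without any new visibility or form factor computation. The memory concern is real, though: a dense matrix for a scene of 8593 patches holds 8593 * 8593, about 73.8 million, entries, which is roughly 590 MB at 8 bytes per entry.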
Another thing to do is to try to optimize the load balancing in a distributed system, either by coming up with an improved static load balancing algorithm or by implementing a dynamic one, so that the balance improves when the machines have different speeds.
Our renderer does not handle texture maps. It would be nice to have textures on the objects that control the reflection and emission. This would make it much easier to model more realistic scenes. There is, however, a problem with how to handle texture maps in a radiosity renderer.
The reflection of each patch would depend on the texture map, and the texture map does not have a constant reflection value. A naive solution would be to subdivide the patches down to the size of the pixels in the texture, since the reflection would then be constant over each patch. This would, however, result in a lot of extra computation. Another way might be to do a rough subdivision and let the reflection of each patch be the mean value of the texture pixels on the patch. This is not a particularly good solution either, since it does not preserve the resolution of the texture map. Better methods have been suggested in the literature [6].
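The mean-value approach can be sketched in a few lines. The sketch assumes that the texels covering a patch have already been collected into an array of packed 0xRRGGBB values; how the texels are found for a triangular patch is left out. The class name MeanReflectance is hypothetical.

```java
/** Illustrative sketch only: one reflectance per patch as the mean of its texels. */
final class MeanReflectance {

    /**
     * @param texels packed 0xRRGGBB values of the texture pixels covering the patch
     *               (assumed to be gathered by the caller)
     * @return mean reflectance {r, g, b}, each component in [0, 1]
     */
    static double[] of(int[] texels) {
        if (texels.length == 0) {
            throw new IllegalArgumentException("patch covers no texels");
        }
        long r = 0;
        long g = 0;
        long b = 0;
        for (int t : texels) {
            r += (t >> 16) & 0xFF;
            g += (t >> 8) & 0xFF;
            b += t & 0xFF;
        }
        // Normalise the 0..255 channel sums to a mean in 0..1.
        double scale = 255.0 * texels.length;
        return new double[] {r / scale, g / scale, b / scale};
    }
}
```

A patch covering one pure red and one black texel gets the reflectance (0.5, 0, 0), which shows the loss of detail: the radiosity solution sees a uniformly dark red patch.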
Another interesting extension is some sort of multi-pass solution. The radiosity solution gives us the diffuse colors of all surfaces. This is not enough, however, if we would like to put metallic or other shiny surfaces on an object: we would not get any specular highlights or mirror effects. To get these effects we could use our radiosity colors as the base colors and add a layer on top that handles them, e.g. by letting the 3D engine add the specular colors. This would make the overall impression much better.
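As an illustration of such a second pass, the sketch below adds a Phong specular term [11] on top of a radiosity color. It is a sketch under assumptions, not part of our renderer: the class name SpecularPass, the material parameters ks and shininess, and the use of a single light direction are hypothetical. In a radiosity scene the light sources are emitting surfaces, so a real implementation would first have to choose representative light directions.

```java
/** Illustrative sketch only: Phong specular highlight layered over a radiosity colour. */
final class SpecularPass {

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /**
     * Phong specular term for unit vectors: normal n, direction to light l,
     * direction to viewer v. ks and shininess are assumed material parameters.
     */
    static double specular(double[] n, double[] l, double[] v,
                           double ks, double shininess) {
        double nDotL = dot(n, l);
        if (nDotL <= 0.0) {
            return 0.0; // the light is behind the surface
        }
        // Mirror direction of l about n: r = 2 (n . l) n - l
        double[] r = new double[3];
        for (int i = 0; i < 3; i++) {
            r[i] = 2.0 * nDotL * n[i] - l[i];
        }
        double rDotV = Math.max(0.0, dot(r, v));
        return ks * Math.pow(rDotV, shininess);
    }

    /** Adds a white highlight on top of the radiosity colour, clamped to 1. */
    static double[] shade(double[] radiosityRgb, double specular) {
        double[] out = new double[3];
        for (int c = 0; c < 3; c++) {
            out[c] = Math.min(1.0, radiosityRgb[c] + specular);
        }
        return out;
    }
}
```

The radiosity color stays view independent and can still be precomputed; only the specular term has to be evaluated per frame, since it depends on the viewer direction.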
The raycasting algorithm requires a lot of visibility calculations. As it is now, we calculate a ray/patch intersection for every patch in the world to see if it is in the way of the ray. This could be sped up by using bounding boxes, so that objects that are close to one another are bundled together inside a box. We would then not have to consider these patches unless the ray hits the box, and would thus avoid a lot of unnecessary computation.
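The box test itself is cheap. A minimal sketch of the common slab test for a ray against an axis-aligned bounding box is shown below. The class name RayBoxTest and the array-based representation of points and directions are assumptions made for the example.

```java
/** Illustrative sketch only: slab test of a ray against an axis-aligned box. */
final class RayBoxTest {

    /**
     * Returns true if the ray origin + t * dir, t >= 0, intersects the box
     * spanned by the corners min and max.
     */
    static boolean hits(double[] origin, double[] dir, double[] min, double[] max) {
        double tNear = 0.0;
        double tFar = Double.POSITIVE_INFINITY;
        for (int axis = 0; axis < 3; axis++) {
            if (dir[axis] == 0.0) {
                // The ray is parallel to this slab: it misses unless the
                // origin already lies between the two planes.
                if (origin[axis] < min[axis] || origin[axis] > max[axis]) {
                    return false;
                }
            } else {
                double t1 = (min[axis] - origin[axis]) / dir[axis];
                double t2 = (max[axis] - origin[axis]) / dir[axis];
                // Shrink the parameter interval in which the ray is inside all slabs.
                tNear = Math.max(tNear, Math.min(t1, t2));
                tFar = Math.min(tFar, Math.max(t1, t2));
                if (tNear > tFar) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Only when hits returns true do the ray/patch intersections for the patches inside the box have to be computed; otherwise the whole group is skipped with a single test.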
7 Acknowledgements
We would like to thank Mark Ollila for his endless stream of possible and impossible ideas, Hans Frimmel for his administrative work that enabled our work to proceed smoothly at all times, Anders Sjoberg for just being there, and NCSA's Java3D group for the great Portfolio package. And finally, all the people frequently hanging around in the 446 lab, who provided us with inspiration and many good ideas (and breaks).
References
[1] Alan G. Chalmers and Derek J. Paddon. Parallel processing of progressive refinement radiosity methods. Second Eurographics Workshop on Rendering (Photorealistic Rendering in Computer Graphics), pages 149-159, 1994.
[2] Michael Cohen, Shenchang Eric Chen, John R. Wallace, and Donald P. Greenberg. A Progressive Refinement Approach to Fast Radiosity Image Generation. In Computer Graphics (ACM SIGGRAPH '88 Proceedings), volume 22, pages 75-84, August 1988.
[3] Michael Cohen and Donald P. Greenberg. The Hemi-Cube: A Radiosity Solution for Complex Environments. In Computer Graphics (ACM SIGGRAPH '85 Proceedings), volume 19, pages 31-40, August 1985. Also in Tutorial: Computer Graphics: Image Synthesis, Computer Society Press, Washington, 1988.
[4] Michael F. Cohen and John R. Wallace. Radiosity and Realistic Image Synthesis. Academic Press Professional, Boston, MA, 1993.
[5] Thomas A. Funkhouser. Coarse-Grained Parallelism for Hierarchical Radiosity Using Group Iterative Methods. In Computer Graphics Proceedings, Annual Conference Series, 1996 (ACM SIGGRAPH '96 Proceedings), pages 343-352, 1996.
[6] Reid Gershbein, Peter Schroder, and Pat Hanrahan. Textures and Radiosity: Controlling Emission and Reflection from Texture Maps. Technical Report CS-TR-449-94, Department of Computer Science, Princeton University, Princeton, NJ, March 1994.
[7] Adam Johansson. Distributed progressive radiosity using Java, 1999.
[8] Sun Microsystems. Java3D documentation, 1999. Retrieved January 18, 2000 from the World Wide Web: http://java.sun.com/products/java-media/3D/forDevelopers/j3dapi/index.html.
[9] Sun Microsystems. The Java3D tutorial, 1999. Retrieved January 18, 2000 from the World Wide Web: http://java.sun.com/products/java-media/3D/collateral/.
[10] Laszlo Neumann and Attila Neumann. Radiosity and Hybrid Methods. ACM Transactions on Graphics, 14(3):233-265, July 1995.
[11] Bui-T. Phong. Illumination for computer generated pictures. Communications of the ACM, 18(6):311-317, June 1975.
[12] G. Schaufler, W. Sturzlinger, and C. Wild. Load balancing for a parallel radiosity algorithm. Technical Report CEI PACT D4V-3, University Linz, January 1995.
[13] Frank Schoeffel. Online radiosity in interactive virtual reality applications. In ACM Symposium on Virtual Reality Software and Technology 1997 (ACM VRST '97), pages 201-208. ACM Press, September 1997.
[14] Frank Schoeffel and Michael Meixner. Realtime shadow feedback for interactive radiosity scenes using shaded multipoints. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology (VRST '98), November 1998.
[15] Brian Smits, James Arvo, and Donald Greenberg. A Clustering Algorithm for Radiosity in Complex Environments. In Computer Graphics Proceedings, Annual Conference Series, 1994 (ACM SIGGRAPH '94 Proceedings), pages 435-442, 1994.
[16] Wolfgang Sturzlinger. Bounding volume construction using point clouds. 12th Spring Conference on Computer Graphics, pages 239-246, June 1996. ISBN 80-223-1032-8.
[17] Seth Teller, Celeste Fowler, Thomas Funkhouser, and Pat Hanrahan. Partitioning and Ordering Large Radiosity Computations. In Computer Graphics Proceedings, Annual Conference Series, 1994 (ACM SIGGRAPH '94 Proceedings), pages 443-450, 1994.
[18] A. Watt. Rendering techniques: Past, present and future. ACM Computing Surveys, 28(1):157-159, March 1996.
[19] Yizhou Yu, Oscar H. Ibarra, and Tao Yang. Parallel progressive radiosity with adaptive meshing. In Lecture Notes in Computer Science (IRREGULAR '96: Parallel Algorithms for Irregularly Structured Problems), volume 1117, pages 159-170, Berlin, Germany, 1996. Springer-Verlag.
A Java3D
A.1 Introduction to Java3D
Java 3D is a high-level API for creating advanced 3D graphics in Java [8] [9]. It supports many of the most common techniques used in 3D graphics applications:
Surface attributes. For example textures and colors (and many techniques for manipulating textures).
Lights. Supports different shading models, such as Gouraud and flat shading. Different types of light sources are also included, for example ambient, directional, point and spot lights.
Manipulating objects. Objects can be grouped together and treated as a single object. Objects can be translated, rotated and scaled using matrix operations. Customized matrices can of course be applied as well.
Bounding volumes. Several types of bounding volumes can be used. There are methods that automatically compute a bounding volume around an object or a group of objects.
There are of course many other features not listed here, such as collision detection and sound. The Java3D API uses OpenGL as the interface to the hardware, which makes it very fast and platform independent, since OpenGL implementations exist for most platforms.
A.1.1 Structure
Java3D uses a rather complicated tree structure, which can be manipulated in various ways. There are basically two types of objects in the tree: the Node and the Node Component. Node is the base class for Java3D objects, for effects and behaviors that can be applied to objects, and for the different types of groups that contain objects. Node Components, on the other hand, specify the attributes (for example color and texture) and the geometric shapes of a Leaf, which is a type of Node. These tree structures are called scene graphs in Java3D. The Leaf objects contain references to appropriate subclasses of Node Component. Figure 14 shows a simple scene graph.
To render a scene graph, it has to be inserted into a Locale object, which in turn is inserted into a VirtualUniverse object. The Locale can be seen as "a world in the universe". Theoretically a VirtualUniverse can contain several Locale objects, but at this point the authors of Java3D do not recommend this.
[Figure 14 diagram. Node labels: VirtualUniverse, Locale, BranchGroup nodes, TransformGroup nodes, Behavior node, Shape3D node, Appearance, Geometry, View Platform, View, Other Objects.]
Figure 14: Java3D Scenegraph tree.
A.1.2 Interaction
The Java3D API provides methods to interact with the user in various ways.
Input devices such as keyboard and mouse allow the user to move objects in the scene graph or to move the viewpoint. Collisions between objects can be detected using the Behavior class in Java3D. This class defines, for example, how an object should behave when it collides with another object, or what should happen when a certain amount of time has elapsed.
A.1.3 Java3D in a Radiosity 3D Engine
When using radiosity, many of the features in Java3D become unnecessary, because all shadows are computed in advance in the radiosity computations and not during viewing. This means that we do not need lights from Java3D, because light sources in the radiosity model are ordinary surfaces emitting energy. However, we do need the collision detection parts, and the many file loaders available for Java3D make it quite a handy tool for us.
B File-formats
B.1 Choice of Java3D loader
At first we planned to use the 3DStudio file format, but we could not find a decent 3DS loader for Java3D. We then discovered Sun's LightWave Scene loader for Java3D. It worked fine on Sun machines, but not on Linux and Windows machines. Quite tired of non-working loaders, we finally found NCSA's Portfolio package, which contains several loaders, for example a LightWave Scene loader that worked on all tested platforms (as Java programs should). This is the one we are using.
B.2 The LightWave Scene format
You can specify the emittance of an object in LightWave, but unfortunately Java3D does not handle it. To be able to specify the emittance we instead use the specular component of the objects. The reason for using the specular value is simply that specular values are not used in the radiosity model, since we are only dealing with diffuse light.
C How to use the program
We have created two programs, the RadiosityRenderer and the ViewSceneFile classes. RadiosityRenderer renders a LightWave Scene and produces a Scene file. The Scene file can then be viewed using the ViewSceneFile program.
C.1 RadiosityRenderer
The rendering process starts when the user executes the RadiosityRenderer class.
Several parameters can be used; they are explained in Section C.1.1. The program uses the following syntax:
RadiosityRenderer [parameters] <LightWave Scene>
<output Scene file>
Though all parameters are optional, most of them should be changed depending on the LightWave Scene data. You should at least set the patch area.
C.1.1 Parameters
There are several parameters that adjust the quality and speed of the rendering process:
-dr, disable rendering. Disables the rendering process and saves a scene file.
-h n h1 h2 ... hn, hosts to use in distributed mode. n is the number of hosts to be used in a distributed rendering process. h1, h2, ..., hn are the names of the hosts.
-it n, stop after n iterations.
-mt n, adaptive meshing threshold value. If the radiosity values at the vertices of a triangle differ by more than this value, the triangle will be subdivided.
-nr n, number of rays. n determines how many rays the raycaster should use. A large number of rays will produce a more realistic scene, but will increase the rendering time.
-pa n, maximal patch area. n is the maximal patch area. Triangles with an area larger than this value will be subdivided. Knowledge about the size of the "original triangles" (the triangles in the LightWave scene file) is recommended to be able to set this value properly.
-q, quiet mode. Don't print iteration numbers and percent left.
-se n, stop when a fraction n (between 0 and 1) of the total unsent flux is left.
-si n, save a scene at every n:th iteration. Uses the output filename every time, overwriting previously saved scenes.
-sn n, snapshot JPEG picture every n:th iteration.
-tf f n, use radiosity values from scene file f. Start counting iterations from number n.
-ut n, use textures with n samples per world coordinate.
C.2 Rendering in local mode
For example, to run the renderer in local mode using the LightWave file foobar.lws and write the Scene file foobar.scene, with patch area 1000 and 5 rays:
java RadiosityRenderer -pa 1000 -nr 5 foobar.lws foobar.scene
You can stop the rendering process by pressing any key on the keyboard.
The scene file will be saved as soon as the current iteration is done.
C.3 Rendering in distributed mode
First you need to start the rmiregistry (used by RMI) and the PartitionServer on each host:
rmiregistry
java PartitionServer
For example, if you want to render the LightWave file foobar.lws and write the Scene file foobar.scene with patch area 1000 and 5 rays, using the hosts andreas, daniel and igge:
java RadiosityRenderer -pa 1000 -nr 5 -h 3 andreas daniel igge foobar.lws foobar.scene
Pressing any key will stop the rendering process and save the Scene file.
C.4 ViewSceneFile
The ViewSceneFile class displays the Scene file and lets the user navigate in the scene using the arrow keys and the Page Up/Down keys. Several parameters can be used; they are explained in Section C.5. The program uses the following syntax:
java ViewSceneFile [parameters] <Scene file>
C.5 Parameters
-wf, use wireframes. Displays the Scene using wireframes.
-ut, use textures. Can only be used if the scene was rendered using the texture parameter (see Section C.1.1).
-lw, view LightWave file.
C.6 Viewing a Scene file
To view a scene file called foobar.scene, type:
java ViewSceneFile foobar.scene
C.7 Note on memory usage
When rendering and viewing very large scenes, Java may run out of memory.
However, you can tell Java to use more memory with the parameter -mxnm, where n is the number of megabytes. For example, -mx256m tells Java to use 256 MB of memory.