display
Master's thesis carried out in Image Coding
By
Joel de Vahl
LiTH-ISY-EX05/3748SE
An OpenGL application usually renders to a single frame. Multi-view or 3D
displays, on the other hand, need more images representing different viewing
directions on the same scene, but modifying a large number of applications
would be unsuitable and problematic. However, intercepting and modifying
the OpenGL calls before they reach the GPU would dramatically decrease the
amount of work needed to support a large number of applications on a new type
of multi-view or 3D display. This thesis describes different ways of
intercepting, enqueueing and replaying these calls to support rendering from
different viewpoints. Intercepting with both a custom implementation of
opengl32.dll and an OpenGL driver is discussed, and enqueueing using classes,
function pointers and enumeration of functions is tried. The different
techniques are discussed briefly, with the focus being a working
implementation. This results in a full-blown OpenGL interceptor with the
ability to enqueue and replay a frame multiple times while modifying
parameters such as the projection matrix. This implementation uses a custom
implementation of opengl32.dll that is placed in the application directory to
be loaded before the real one. Enqueueing is performed by enumerating all
OpenGL calls and pushing this enumeration value and all call data to a list.
Replaying is done by reading the same list and calling the function
1 Introduction
1.1 Background
1.2 Purpose
1.3 Scope
1.4 Method
1.5 Overview
2 3D displays and interaction
2.1 Why is there a need for an interceptor?
2.1.1 Other possible applications
2.2 3D displays
2.2.1 Simple stereo displays
2.2.2 Autostereoscopic displays
2.2.3 Holoform, more than stereo
2.2.4 Other types of display
2.3 Stereo rendering
2.4 Desired user settings
2.5 Conclusions
3 The Win32 DLL and OpenGL system
3.1 The Win32 DLL
3.2 Implicit and explicit linking
3.3 The DLL loading sequence
3.4 The OpenGL chain
3.5 Conclusion
4 Intercepting the calls
4.1 Helping headers and structures
4.2 Trojan DLL
4.3 OpenGL ICD
4.4 DLL injection
4.5 Conclusions
5 Enqueueing and replaying
5.1 Object wrapping
5.2 Function pointers
5.3 Enumeration
5.4 Conclusions
6 Final design and implementation
6.1 Specification parser
6.1.1 Calculation of parameter positions
6.2 Tables
6.3 The queue
6.4 Replayers
6.5 Extensions
6.6 User input
7 Call modifications
7.1 Classes
7.1.1 Vector calls
7.2 Vertex arrays
7.3 Modified projection matrices and viewports
7.4 Other special cases
8 Discussion
8.1 Implementation
8.1.1 Intercepting
8.1.2 Enqueueing
8.1.3 Replaying
8.1.4 Overall design
8.2 Application support
8.2.1 Working applications
9.1 Networked rendering
9.2 Call aliases
9.3 Advanced function classes
9.4 Multiple configurations
9.5 Pluggable render backends
9.6 Ports and other APIs
9.7 OpenGL extension
Bibliography
A ICD Interface
B Intercepting
B.1 Using the helper headers
B.2 Forward table
B.3 Loading ICD pointers
C Enqueueing and Replaying
C.1 Enqueue function
C.2 Object wrapping
C.3 Function pointers
C.4 Enumerated
2.1 Example of an autostereoscopic display.
2.2 Example of a full 3D display.
2.3 Frustums for rendering of a stereo pair.
3.1 The DLL export table.
3.2 The DLL import table.
3.3 The Win32 OpenGL chain.
4.1 Application initialization with Trojan DLL.
4.2 Application initialization with ICD.
5.1 UML class diagram of the object wrapping method.
6.1 Flow between the tables.
1 Explicit loading of a DLL (pseudocode).
2 Enqueueing using function pointers (pseudocode).
Introduction
1.1 Background
Regular OpenGL applications render a 3D scene to a single screen space, a
frame. When rendering for a multi-view or 3D display, more views or angles
of the same scene need to be produced. Many OpenGL drivers already support
stereo rendering, but autostereoscopic displays need to rely on applications
supporting their rendering algorithms. One such autostereoscopic display is
the Scanning Slit Display being developed by Setred AB. This project was
targeted primarily at supporting Setred's display, but the work is applicable
to all autostereoscopic displays.
The need to provide per-application support makes development and adoption
of new display technology harder and more time consuming than it would be if
3D cards directly supported multi-view rendering in their drivers.
1.2 Purpose
The purpose of this thesis is to investigate different ways of intercepting
OpenGL calls under Microsoft Windows to provide a rendering architecture for
3D displays, and to implement a set of these techniques. The interceptor
should support basic OpenGL versions and work with OpenGL applications that
use the fixed-function pipeline, without rendering to textures or using other
advanced rendering techniques. It should also be as easy as possible for the
user to control.
1.3 Scope
The scope of this report is to discuss three different intercepting and
enqueueing techniques, to select one of each type as the most suitable for
the purpose, and to provide a detailed description of the final
implementation of the selected techniques.
1.4 Method
This report is formed as an investigation and implementation of a chosen
solution. The method used is divided into four parts: study of previous work,
implementation and testing of the selected techniques, a final production and
presentation of the work.
The study of previous work is intended to widen the perspective and to
provide initial solution ideas for the problems presented. The implementation
and testing stage is to test the selected techniques enough to select the
best according to the criteria of the final implementation, and the final
implementation is to test the selected techniques thoroughly and show which
limitations they imply. The presentation stage is when the production of the
report takes place, aided by notes and documentation taken during the whole
process.
1.5 Overview
This thesis is divided into four parts. The first part, Chapter 2, provides
background information on 3D displays and the need for an interceptor and/or
special driver support. The second part, Chapters 3, 4 and 5, presents the DLL
and OpenGL systems in Microsoft Windows, interception methods and methods
for enqueueing and replaying OpenGL calls. The third part, Chapters 6 and 7,
presents the final design and the modifications made to make calls work when
replayed. The last part, Chapters 8 and 9, discusses the limitations of the
system and
3D displays and interaction
This chapter aims to introduce what a 3D display is and why an interceptor
might be needed. Several types of displays are discussed and different
rendering algorithms are outlined.
2.1 Why is there a need for an interceptor?
Today's 3D accelerators (GPUs) have support for rendering to a single frame
buffer or to both a left and a right buffer, providing support for a single
display (regular monitor) or a stereo display.
In a typical GPU implementation, a single vertex is transformed and the
corresponding primitive is rasterized to pixels. A single triangle can in that
way only be transformed to one view space to be rasterized at one time. This
is not in line with what 3D displays require, as a 3D frame can be composed of
multiple views. To produce these multiple views the rendering commands must be
iterated N times (where N is the total number of views) with different model
view and/or projection matrices.
Generating the views could be the responsibility of either the driver or the
application, but today's drivers do not support that kind of rendering, and
modifying each application to support each 3D display's different rendering
algorithm would be too much work. Until GPUs and drivers provide broader
support for 3D displays, there is a need to intercept calls between the
application and the driver. This extra layer will record the commands required
to render a regular 2D frame and replay it as many times as needed from
different angles to create a 3D frame suitable for the target display.
Ideally, this would be done by the GPU or the driver, but since current
hardware does not expose this functionality, alternative implementations are
needed.
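The record-and-replay idea can be sketched in a few lines of C++. This is only an illustration: the `Command` alias, `viewOffsets` and `replayForViews` are hypothetical names, and the evenly spaced camera offsets are an assumption made for the sketch, not the rendering algorithm of any particular display.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A recorded command; it receives the horizontal camera offset of the view
// it is being replayed for (hypothetical representation).
using Command = std::function<void(double cameraOffset)>;

// Evenly spaced camera offsets centred on the original camera position
// (assumption: a real display may need a different distribution).
std::vector<double> viewOffsets(std::size_t viewCount, double spacing) {
    std::vector<double> offsets;
    offsets.reserve(viewCount);
    const double centre = (static_cast<double>(viewCount) - 1.0) / 2.0;
    for (std::size_t i = 0; i < viewCount; ++i) {
        offsets.push_back((static_cast<double>(i) - centre) * spacing);
    }
    return offsets;
}

// Replay the whole recorded frame once per view.
void replayForViews(const std::vector<Command>& frame,
                    const std::vector<double>& offsets) {
    for (double offset : offsets) {
        for (const Command& command : frame) {
            command(offset);
        }
    }
}
```

With three views and a spacing of 2.0, the recorded frame would be replayed with offsets -2, 0 and 2.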
Some GPU drivers already support stereo rendering [1], but whether this is
done by enqueueing and replaying the whole scene or by drawing each primitive
first in the left and then in the right buffer is a closely guarded secret of
the GPU vendors. Other interceptors exist, the most well known being Chromium
renderers for large display walls (multiple projectors or screens tiled).
2.1.1 Other possible applications
The Stanford Chromium system [5], formerly WireGL, is an interceptor system
made for networked rendering. Chromium supports multiple render nodes
connected to a provider. The nodes are in turn connected to projectors or
other types of large screens to provide large video walls. This application
has in turn inspired several new ideas on how to use the Chromium architecture
to perform geometric transformations on the OpenGL command stream. One of them
presents a way to slice scenes to provide better ways to visualize
architectural 3D scenes [11].
As mentioned before, interceptors can be used for debugging applications by
looking at the calls they make. Applied to OpenGL, debugging an application
can provide information on states, break on certain calls and track memory
usage for textures and geometry. The commercial debugger gDEBugger [16]
provides an interface for viewing all textures and display lists created by
the application, along with breakpoints, logs and statistics.
2.2 3D displays
There are many different 3D display systems available, ranging in price from
under 100 SEK to several million SEK. The range covers everything from simple
red/blue glasses to multi-projector systems for full wall displays. This
section is intended to shed some light on different types of 3D displays.
2.2.1 Simple stereo displays
A stereo display is a display that provides two separate images to the viewer,
one for the left and one for the right eye, with the limitation that only
those two images are rendered, independent of the number of viewers and their
position. The simplest example of that kind of system is red and blue glasses,
where a red filter for the left eye and a blue filter for the right eye filter
out the two different images from a regular screen or projector.
Other variations on this theme can provide full color for both eyes. One way
is to polarize the light in orthogonal directions when passing out of the
screen and use polarizing lenses in the glasses. Another is to use a simple
LCD shutter for each eye and display the stereo pair sequentially, switching
the eye that the image will reach. The last approach requires active glasses
that have to be synchronised with the screen, usually by wire or IR.
2.2.2 Autostereoscopic displays
Unlike regular stereo, autostereoscopic displays [3] do not require the user to
parallax barriers is used to transmit different images in different
directions. An example of this is shown in Figure 2.1. The left and right
images are interlaced and a barrier mask is placed to block the left image
from reaching the right eye and vice versa.
This kind of screen puts severe constraints on the user. Depth is only
experienced when the eyes are receiving the correct image, which requires the
user to hold their head in a fixed position. To enable movement, several
screens use a technique called head-tracking. A head-tracking system follows
the user's head and/or eyes and updates the direction in which the images are
correct to match. This puts a limit on how many simultaneous users the display
can have.
2.2.3 Holoform, more than stereo
Holoform displays are another type of 3D display. This type of display
requires more than two views of the scene (therefore more than stereo) but can
provide depth experience for all viewers inside the display's view cone.
Figure 2.2 provides an example of this, where a set of images from the left
and right are sent out, and the user can receive a 3D experience inside the
area where both left and right images are projected.
This type of screen does in general have worse depth quality since there are
no optimized viewing positions, but provides another very important depth
cue: motion parallax. Up to this point all other types of screen discussed
have provided only two views, independent of the user position. Holoform
displays allow users to move freely inside the view cone, providing the
ability to look around objects to a certain degree. Examples of such displays
are the display developed by Setred AB and the 3D TV display developed by
Mitsubishi Research [8].
2.2.4 Other types of display
Apart from the types of screen mentioned above, there are a couple of
completely different display techniques. One of them is to project an image
set onto multiple stacked LCDs, one at a time. Depending on the number of such
LCDs, the user
Common to most of the displays described in this chapter is that they all
require something more than just the standard D-SUB/DVI output from the
graphics card, but the rendering algorithms might vary very much between them.
2.3 Stereo rendering
A stereo pair is two images of the same scene taken with a slight displacement
of the camera position (see Figure 2.3). This is usually done from an existing
view frustum by selecting an image projection plane at the distance D from
the camera and a certain stereo separation. Assuming a right-handed coordinate
system, x axis to the right, camera positioned at the origin and looking along
negative z (standard OpenGL setup), the camera is displaced both left and
right, rendering one image from each position.
As the camera is moved sideways, the intersections between each frustum and
the image projection plane are also translated. To keep the physical image
plane the same between these translations, the frustums are sheared. The
above-mentioned figure shows how the frustum is sheared when moving the
camera a
2.4 Desired user settings
A 3D display has many possible configuration options that are not needed for
a regular screen. As OpenGL has no defined image projection plane and most
rendering algorithms for 3D displays require one to be defined, the
interceptor might need to have one defined. Different programs may require
different coordinate systems, or at least a different scale of geometry,
making the image projection plane a good value to be user-controlled in some
fashion. As seen in the stereo rendering example in Figure 2.3, the camera is
translated sideways. Different eye separation values will give different
viewing results, which may also be something the user will want to tweak.
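How the projection plane distance and the eye separation feed into a sheared frustum can be sketched as follows. The `Frustum` struct and `shearedFrustum` function are hypothetical names introduced for illustration; the arithmetic is the usual similar-triangles construction for an off-axis frustum and is not taken from the thesis implementation.

```cpp
#include <cassert>
#include <cmath>

// Bounds of an asymmetric frustum, in the form taken by glFrustum.
struct Frustum {
    double left, right, bottom, top, zNear, zFar;
};

// halfWidth/halfHeight: half extents of the window on the image projection plane.
// planeDistance: distance D from the camera to that plane.
// cameraOffset: sideways displacement of the camera along x
//               (negative for the left view, positive for the right view).
Frustum shearedFrustum(double halfWidth, double halfHeight, double planeDistance,
                       double cameraOffset, double zNear, double zFar) {
    assert(planeDistance > 0.0 && zNear > 0.0 && zFar > zNear);
    // Similar triangles: scale extents on the projection plane to the near plane.
    const double scale = zNear / planeDistance;
    Frustum f;
    // The window stays fixed in space, so seen from the moved camera its
    // edges shift by -cameraOffset; this is the shear.
    f.left = (-halfWidth - cameraOffset) * scale;
    f.right = (halfWidth - cameraOffset) * scale;
    f.bottom = -halfHeight * scale;
    f.top = halfHeight * scale;
    f.zNear = zNear;
    f.zFar = zFar;
    return f;
}
```

The resulting bounds would be passed to glFrustum, and the scene would additionally be translated by the negated camera offset along x. The width of the frustum is unchanged by the offset; only its position shifts.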
The nVidia stereo driver supports a multitude of controls [2, 1], the most
notable being clamping the depth experience both at the front and at the back
of the view frustum. This means that objects very near to and very far away
from the viewer are projected according to the regular view frustum to the
image projection plane. These distances are also something that could be user
controlled.
2.5 Conclusions
As more and more types of 3D displays become available, different rendering
algorithms must be supported, either by the application or by some other kind
of layer. This makes an interceptor a feasible solution to the problem of
providing per-application support for each possible rendering algorithm for
multi-view
The Win32 DLL and OpenGL system
This chapter provides an overview of how DLLs (Dynamic-Link Libraries) are
handled in Microsoft Windows. It also aims to describe how OpenGL calls are
handled in Windows and how an OpenGL call reaches the hardware after it is
made from an application.
3.1 The Win32 DLL
On Windows, the PE (Win32 Portable Executable) format defines a standard
layout for both DLLs and standard Win32 executables (EXE), both on disk and in
memory [13, 14]. On disk, the DLL is composed of a set of headers and a
section table describing what sections are available. A section can be either
a code or data section, but where there is just one type of code section there
can be many types of data sections.
Algorithm 1 Explicit loading of a DLL (pseudocode).
function_pointer glBegin;
dll_handle handle;
handle = LoadLibrary("opengl32.dll");
glBegin = GetProcAddress(handle, "glBegin");
// Use acquired pointer
FreeLibrary(handle);
One important part for us is the export table (seen in Figure 3.1). This table
contains information on the number of exported symbols, their addresses inside
the DLL and their names. The addresses of these symbols can then be fetched
using the different methods described in Section 3.2. Another important table
is the import table (seen in Figure 3.2), which holds information on all DLLs
to be loaded implicitly and which symbols to load from them.
3.2 Implicit and explicit linking
DLLs can be loaded in two different ways: implicit (load-time) and explicit
(run-time) linking.
Implicit linking happens when linking to the DLL and its .lib file at compile
time, marking certain symbols in the code to be imported from that specific
DLL. When the executable that links to the DLL is loaded (this can be either a
DLL or a regular EXE), the import table is traversed and all DLLs are loaded
into memory. An executable that is implicitly linked against another contains
a table of symbols to import. This table contains the name and a dummy pointer
to the function to be imported. Until the implicitly linked DLL is loaded, the
symbols in this table cannot be used. When the DLL to be loaded is fully in
memory, the loader will traverse the import table, looking up each symbol from
the DLL's export table by name and replacing the dummy pointer with the real
library is loaded by calling the LoadLibrary function and released with the
FreeLibrary function. When a DLL is loaded, the developer can call
GetProcAddress to request a specific symbol. This might look as in
Algorithm 1, where opengl32.dll is loaded and an exported symbol is fetched.
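The two linking styles can be modelled without any Windows API. The sketch below is a simplified simulation, not the PE loader: `ExportTable`, `ImportEntry`, `lookupSymbol` and `resolveImports` are hypothetical names, and a `std::map` stands in for the export table.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Proc = void (*)();

// Export table of a loaded DLL: symbol name -> address (simplified model).
using ExportTable = std::map<std::string, Proc>;

// One import table entry: the name to import and a pointer that stays a
// dummy (nullptr here) until the loader patches it.
struct ImportEntry {
    std::string name;
    Proc address = nullptr;
};

// Explicit-linking style lookup by name, similar in spirit to GetProcAddress;
// returns nullptr for unknown symbols.
Proc lookupSymbol(const ExportTable& exports, const std::string& name) {
    ExportTable::const_iterator it = exports.find(name);
    return it == exports.end() ? nullptr : it->second;
}

// Implicit-linking style fix-up: patch every import entry from the export
// table. Returns false if any symbol could not be resolved.
bool resolveImports(std::vector<ImportEntry>& imports, const ExportTable& exports) {
    bool allResolved = true;
    for (ImportEntry& entry : imports) {
        entry.address = lookupSymbol(exports, entry.name);
        if (entry.address == nullptr) {
            allResolved = false;
        }
    }
    return allResolved;
}
```

In this model an interceptor only has to supply an export table with the same names as the original for the application's imports to resolve to its functions.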
3.3 The DLL loading sequence
When loading a DLL, Windows looks in specific places for the requested file.
Using the default behaviour on Windows XP, the following directories are
checked in order [10]:
1. The directory from which the application loaded.
2. The current directory.
3. The system directory (c:\windows\system32).
4. The 16-bit system directory (c:\windows\system).
5. The Windows directory (c:\windows).
6. The directories in the PATH environment variable.
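The effect of this ordering can be illustrated with a small simulation. `findDll` is a hypothetical helper and the file-existence check is passed in as a callback, so the sketch does not touch the real file system or the Windows loader.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Returns the first "<directory>\<dllName>" for which fileExists is true,
// or an empty string if no directory in the search order contains the DLL.
std::string findDll(const std::vector<std::string>& searchOrder,
                    const std::string& dllName,
                    const std::function<bool(const std::string&)>& fileExists) {
    for (const std::string& directory : searchOrder) {
        const std::string candidate = directory + "\\" + dllName;
        if (fileExists(candidate)) {
            return candidate;  // earlier directories win over later ones
        }
    }
    return std::string();
}
```

A copy of opengl32.dll placed in the application directory is found before the one in system32, because the application directory comes first in the list.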
During loading of a DLL an optional entry point is called, if available. This
function, called DllMain [10], has very restricted functionality and should
only perform the simplest of setup. Calls like LoadLibrary have to be made at
a later point.
3.4 The OpenGL chain
OpenGL support in Windows is provided by the opengl32.dll located in the
system32 directory. This DLL exports symbols corresponding to OpenGL 1.1 with
the addition of some WGL extensions for selecting rendering contexts and pixel
formats. Most applications using OpenGL link to opengl32.lib, which implies
implicit loading of opengl32.dll, but there are some applications, most
notably the Quake series by id Software, that load the DLL explicitly and
fetch the required symbols. WGL functions are platform-specific (Microsoft
Windows) extensions to OpenGL that provide support for selecting pixel format
and other frame buffer related functionality.
When OpenGL support for Windows was introduced, there were two ways for a
hardware vendor to provide support: either providing a Mini Client Driver
(MCD) or an Installable Client Driver (ICD). Both these drivers are included
in the Windows 2D GDI driver package, provided by graphics card vendors to
make the operating system utilize the card's capabilities fully. The MCD is a
primitive rasterization path that exports a number of calls to Microsoft's
OpenGL implementation, to be used instead of software blitting. As all of
Microsoft's implementation could be used as a fallback for features not
supported in hardware, the hardware vendor could concentrate on getting OpenGL
support up and running while implementing the much more complicated ICD.
The ICD is a full-blown OpenGL implementation, where the vendor has to
implement all OpenGL calls and provide a software implementation for anything
that the hardware is unable to provide. Since MCD support was removed from
Windows 98, providing the much more complicated ICD is the only way to
achieve accelerated OpenGL on Microsoft Windows.
The opengl32.dll in turn fetches the hardware vendor implementation from an
Installable Client Driver (ICD) installed by the vendor. This ICD DLL is found
by looking at the registry key
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\OpenGLDrivers.
Under this key the vendor provides information on which DLL to load as the
ICD. The ICD then provides a table with all the OpenGL 1.1 calls. While the
ICD is mostly undocumented in the Platform SDK provided by Microsoft, many
alternative OpenGL implementations have reverse engineered this functionality.
The open source OpenGL implementation Mesa 3D [15] has an ICD implementation
which defines the function calls listed in Appendix A. If no ICD is
registered, the OpenGL DLL can fall back onto a Mini Client Driver (MCD) or
even a software driver [12]. The complete OpenGL chain is shown in Figure 3.3.
The SetContext call is probably the most important call here. It returns a
pointer to a table where the first element is a DWORD (a Microsoft-specified
32-bit data type) containing the total number of function pointers in the
table. The order of these function pointers has also been reverse engineered
by Mesa.
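The shape of such a table, and how an interceptor could hand back its own version of it, can be sketched as below. The layout is a deliberate simplification: `DispatchTable` and `wrapTable` are hypothetical names, a `std::vector` replaces the fixed C array a driver would use, and the real ordering of the entries is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Proc = void (*)();

// Simplified view of the table: a 32-bit entry count followed by the pointers.
struct DispatchTable {
    std::uint32_t count = 0;     // plays the role of the leading DWORD
    std::vector<Proc> entries;   // assumption: a real table is a plain C array
};

// Build the table an interceptor would return: same count and order as the
// real table, with selected slots replaced by interceptor functions.
// A nullptr (or missing) override keeps the real entry.
DispatchTable wrapTable(const DispatchTable& real,
                        const std::vector<Proc>& overrides) {
    assert(real.entries.size() >= real.count);
    DispatchTable wrapped;
    wrapped.count = real.count;
    wrapped.entries.resize(real.count, nullptr);
    for (std::uint32_t i = 0; i < real.count; ++i) {
        const bool hasOverride = i < overrides.size() && overrides[i] != nullptr;
        wrapped.entries[i] = hasOverride ? overrides[i] : real.entries[i];
    }
    return wrapped;
}
```

Keeping the count and the slot order identical matters because the caller indexes the table by position rather than by name.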
3.5 Conclusion
The easiest way to intercept OpenGL calls seems to be to override the DLL
original ICD or opengl32.dll and placing the intercepting ICD or Trojan in
their place is also feasible, but as that would collide with the consistency
of the system it was decided to intercept either by changing the ICD name in
the registry or by
Intercepting the calls
The first step to intercepting a function call is to somehow fetch the call
from the sender before it reaches the receiver. Some logic can then process
the function arguments and decide what to do. This chapter describes different
interception strategies for OpenGL on Windows. The different strategies all
come from a single base idea: tricking some part of the system into loading a
different DLL than usual.
4.1 Helping headers and structures
To make implementation and testing easier, a couple of headers were created to
ease the creation and processing of the multitude of OpenGL calls that need to
be implemented for an interceptor. These were later created using the
specification parser described in Chapter 6. The headers consist of calls to a
C macro called PROCESS_NAME that contains the following information, in order:
prefix The prefix that the function call has: Drv for ICD calls, wgl or gl.
The wgl calls are separated from regular gl calls since they are platform
specific, and are not implemented by the ICD.
ret The return type of the function.
name The function name, without the prefix.
args_tn All arguments with type, written as in the function declaration. E.g.
(ArgType1 arg1, ArgType2 arg2, ... ArgTypeN argN).
args_n All arguments without type (just the name), written as in a function
call. E.g. (arg1, arg2, ... argN).
num The ICD number of the function, only used for determining the order in
headers and then undefining the macro. The reason for these headers was to
be able to create some kind of loop where a specified operation could be done
per call, with all the information about the call provided. For example,
creating a structure with typed function pointers to all OpenGL 1.1 calls.
Appendix B provides examples of how these headers can be used to perform a
multitude of per-call operations, including enumeration and loading of
function pointers.
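This define, expand, undefine cycle is commonly known as the X-macro technique. The sketch below is an illustration under stated assumptions: the call list is written as a macro (`GL_CALL_LIST`, a hypothetical name) instead of a separate generated header so that the example is self-contained, and `glBegin`/`glEnd` are local stubs, not the real OpenGL entry points.

```cpp
#include <cassert>

// Local stubs standing in for two real entry points (assumption).
static int g_lastMode = -1;
void glBegin(unsigned int mode) { g_lastMode = static_cast<int>(mode); }
void glEnd() { g_lastMode = -1; }

// Stand-in for the generated header: one PROCESS_NAME line per call.
#define GL_CALL_LIST \
    PROCESS_NAME(gl, void, Begin, (unsigned int mode), (mode), 0) \
    PROCESS_NAME(gl, void, End, (), (), 1)

// Use 1: an enumeration with one value per call.
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
    ENUM_##prefix##name = num,
enum CallId { GL_CALL_LIST CALL_COUNT };
#undef PROCESS_NAME

// Use 2: a structure with one typed function pointer per call.
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
    ret (*prefix##name) args_tn;
struct GlFunctions { GL_CALL_LIST };
#undef PROCESS_NAME

// Use 3: fill in every pointer (here simply the address of the local stub).
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
    table.prefix##name = &::prefix##name;
GlFunctions loadFunctions() {
    GlFunctions table;
    GL_CALL_LIST
    return table;
}
#undef PROCESS_NAME
```

Adding a call to the list automatically extends the enumeration, the pointer structure and the loader, which is the kind of per-call loop the headers are meant to provide.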
4.2 Trojan DLL
Intercepting with what I call a Trojan DLL is simply putting a new DLL with
the same name as the original one (opengl32.dll in our case) in the DLL search
path before the one in system32 (see Section 3.3). Note that Trojan in this
context does not mean a provider of malicious code, but an OpenGL
implementation that fools the application into thinking it is the real one. To
make this work, the DLL must export the exact same symbols as the original and
cannot load-time link to the real opengl32.dll, since a DLL with that name is
already loaded. On Windows XP, all system files are under the control of
Windows File Protection (WFP). The result is that no system files can be
overwritten, by mistake or on purpose, without turning the protection
completely off. This dismisses the approach of renaming the original
opengl32.dll while letting the Trojan take its place.
Figure 4.1 describes how the initialization sequence and the handling of
OpenGL calls is done when a Trojan DLL interceptor is loaded. Due to the
nature of the Windows DLL loader (described in Chapter 3), the Trojan DLL
cannot load the real OpenGL DLL when inside DllMain; this has to be done when
the first call is made to the exported symbols. During this initialization
(marked as Init interceptor in the figure), the real OpenGL DLL and all
exported symbols of that DLL are loaded and stored internally in the
interceptor. All calls can now be intercepted and modified.
To implement this, a DLL with all OpenGL 1.1 and WGL functionality needs to
be created. The list of symbols to export can be obtained by looking at the
list of symbols that opengl32.dll provides (using the dumpbin utility in
Microsoft's Platform SDK). This DLL is then placed in each application's
directory and is in that way loaded as the default OpenGL DLL on systems with
the default DLL search path. Once fully loaded, some initialization code is
run that loads the real DLL using a LoadLibrary call, together with logic that
loads all the real OpenGL function pointers for internal storage in the
Trojan. Incoming calls to the Trojan's exported functions can then be
processed before forwarding to the real DLL's function pointers. The pointers
can be loaded and used as shown in Appendix B.
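The first-call initialization and the forwarding step can be sketched in portable C++. Everything here is a stand-in: `realGlBegin` plays the role of the function in the real DLL, `realGl()` marks where LoadLibrary and GetProcAddress would be called, and `interceptedGlBegin` is what the Trojan would export under the name glBegin. The sketch is not thread-safe.

```cpp
#include <cassert>

using BeginProc = void (*)(unsigned int);

// Pointers to the real implementation, filled in on first use.
struct RealGl {
    BeginProc glBegin = nullptr;
};

// Stand-in for the real driver entry point (assumption for this sketch).
static unsigned int g_realLastMode = 0;
static void realGlBegin(unsigned int mode) { g_realLastMode = mode; }

// Counts how often the "real DLL" was loaded, to show it happens only once.
static int g_loadCount = 0;

// Lazy initialisation: this must not run from DllMain, so it is triggered by
// the first intercepted call instead. A real interceptor would call
// LoadLibrary and GetProcAddress at the marked place.
static RealGl& realGl() {
    static RealGl table;
    if (table.glBegin == nullptr) {
        table.glBegin = &realGlBegin;  // <- load the real pointer here
        ++g_loadCount;
    }
    return table;
}

// What the Trojan would export under the name glBegin.
void interceptedGlBegin(unsigned int mode) {
    // ... inspect, enqueue or modify the call here ...
    realGl().glBegin(mode);
}
```

Because every exported function goes through `realGl()`, the application can call any entry point first and the real pointers are still loaded exactly once.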
4.3 OpenGL ICD
Creating an OpenGL ICD is a variation on the Trojan DLL. This method
functions, only the ICD functions for requesting and managing contexts and
pixel formats are exported. The real symbols can then be imported when the
SetContext function is called, returning pointers to the internal GL functions
in the DLL.
Figure 4.2 shows the difference between intercepting with the Trojan DLL and
the ICD. Since the ICD interface specifies that DrvSetContext has to be called
before calling any OpenGL functions, we can concentrate on loading the real
ICD there. The handling of OpenGL calls is not shown, but does not differ from
the Trojan version other than that modification of the call happens after the
OpenGL DLL instead of before.
Building upon the Trojan DLL implementation described above, the DLL needs
to implement all OpenGL calls and all ICD calls. Using the same approach
as when intercepting wglSwapBuffers, DrvSetContext can be forced to return
pointers to the internal OpenGL function calls inside our ICD. This is shown in
4.4 DLL injection
One method of interception, described in [9], is to hook onto DLL load
messages for a specific thread and force the loading of our Trojan DLL. This
approach is simply another way of connecting the Trojan to the application
without the need to copy files. The downside is that applications that should
be intercepted have to be launched by some kind of control application that
initializes the injection code.
This method was not investigated further, but could be a viable alternative to
copying the Trojan DLL to each application folder.
4.5 Conclusions
The Trojan DLL seems to be the most straightforward implementation of an
interception mechanism, but might require some user interaction before an
application can be intercepted. An ICD on the other hand could work completely
without user interaction, but requires some thought on which function pointers
are returned and builds upon undocumented functionality in Windows. Both
these methods seem viable alternatives for an OpenGL interceptor
implementation. DLL injection could also be possible, injecting either the
Trojan or the ICD into the host process, but this was not tested due to time
limitation and
Enqueueing and replaying
When the structures for intercepting calls are in place, as described in
Chapter 4, the calls can be altered and changed. The aim of this thesis is to
enqueue and replay calls, which means that the DLL needs to have some internal
structure to hold the calls. This chapter will discuss how this information
can be stored and replayed. Three different approaches will be discussed
together with their strengths and limitations.
A function call can be seen as a message containing a message ID (the function
name) and some message data (the arguments). Some calls require information
to be returned, which means that the OpenGL commands need to be executed
while enqueueing to keep the real driver in the correct state for return
values on later calls. Chapter 4 described how the calls could be intercepted,
concluding that all available methods require a function pointer to be called
with the right convention and number of arguments. This means that the DLL
will implement all OpenGL calls and that each call will have a function body
that can contain any logic needed to enqueue that call, with the limitation
that the return value needs to be the return value of the real OpenGL call in
the context in which it is called.
The most important features required of the enqueueing techniques are enqueue
speed and the possibility to handle different types of OpenGL functions. One
major problem is the possibility to handle vector functions, functions that
have a pointer as argument and can fetch a fixed or arbitrary amount of data
from that pointer. A test interceptor was created for each of the techniques
to evaluate functionality and performance.
5.1 Object wrapping
This method can be seen as an implementation of the Command design pattern
[4], encapsulating a call (or command) into an object to be replayed at will.
Each call can be represented as an object with private variables holding the
data sent as parameters. Defining an abstract base class GlCall which has the
through the calls and executing them easy. The real call object then only
needs to implement a constructor that takes all the arguments and stores them
internally, and implement execute() to call the real OpenGL call with the
stored arguments.
For example, the call void glBegin(GLenum mode) can be represented as an
object of type GlCall_glBegin with an internal variable of type GLenum holding
the same value as the argument; see Appendix C for a sample C++ implementation
of one call.
This method can support different types of functions, for example vector
functions, by letting the object constructor store all data in the vector and
by providing a pointer to the stored vector when calling the real OpenGL
function.
The main problem with this technique is that creating and deleting objects
each frame has a huge impact on performance. Although this method is easy
to implement, it did not perform well on large data sets (many function
calls). This could be solved by pooling the objects inside a specified memory
segment.
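A minimal sketch of the object wrapping idea is given below. It follows the GlCall and GlCall_glBegin names used in the text, but it is not the Appendix C code: `realGlBegin` is a logging stub standing in for the real OpenGL function, and `unsigned int` replaces GLenum so that no OpenGL header is needed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for the real OpenGL entry point; it just logs the mode it receives.
static std::vector<unsigned int> g_beginLog;
static void realGlBegin(unsigned int mode) { g_beginLog.push_back(mode); }

// Abstract base class: every recorded call can be executed the same way.
class GlCall {
public:
    virtual ~GlCall() = default;
    virtual void execute() const = 0;
};

// Wraps glBegin(GLenum mode); the argument is stored inside the object.
class GlCall_glBegin : public GlCall {
public:
    explicit GlCall_glBegin(unsigned int mode) : mode_(mode) {}
    void execute() const override { realGlBegin(mode_); }
private:
    unsigned int mode_;
};

using CallQueue = std::vector<std::unique_ptr<GlCall>>;

// Replay every recorded call in order; can be repeated once per view.
void replay(const CallQueue& queue) {
    for (const std::unique_ptr<GlCall>& call : queue) {
        call->execute();
    }
}
```

Each enqueued call costs one heap allocation here, which is the per-frame overhead the paragraph above refers to.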
5.2 Function pointers
This method is a low-level version of the object wrapping technique mentioned
above. Instead of having a vector of object pointers, a vector of, for
example, unsigned ints is used to store the pointer to the real OpenGL
function, the number of unsigned ints the data takes and the data (see
Appendix C for the enqueue abstraction). As seen, the data is pushed in
reverse order, so that replaying can be done without looping backwards through
the queue. The call is replayed by fetching the function pointer, pushing the
enqueued number of arguments to the stack and calling the function pointer
using a low-level CALL assembly instruction; see Algorithm 2 for a quick
overview using pseudocode and Appendix C for a sample C++ implementation of
one call. Note that the call in the example algorithm is a fictitious OpenGL
call used to illustrate how the technique works. Replaying can be done in a
loop, not requiring special replay functions to be called (like the execute
method in the object wrapping technique). Although this can look like a good
way to handle replaying, it makes handling
Algorithm 2 Enqueueing using function pointers (pseudocode).
function glFunctionEnqueue(data[N]) do
    enqueue(address_of(glFunctionReplay))
    enqueue(N)
    for i = N..1 do
        enqueue(data[i])
    end
end

function Replay() do
    int offset = 0
    while offset < queue.size() do
        functionAddress = queue[offset]
        offset += 1
        N = queue[offset]
        offset += 1
        for i = 0..(N-1) do
            system_stack.push(queue[offset + i])
        end
        CALL functionAddress
        offset += N
    end
end
it might not be feasible due to the hard-coded function pointer and the fixed
amount of data to be passed as arguments on the system stack.
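The queue layout can be illustrated in portable C++, with one substitution that should be stated plainly: instead of pushing arguments onto the system stack and issuing a CALL instruction, the sketch stores a pointer to a thunk with a uniform signature that receives the argument block. The names `Word`, `Thunk`, `enqueueCall` and `replayQueue` are hypothetical, arguments are stored in forward order, and the cast between function pointers and integers is only conditionally supported by the C++ standard, although common compilers accept it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uintptr_t;  // wide enough to hold a pointer
using Thunk = void (*)(const Word* args, std::size_t count);

static std::vector<Word> g_queue;

// Layout per call: [thunk address][argument count][arguments...]
void enqueueCall(Thunk thunk, const Word* args, std::size_t count) {
    g_queue.push_back(reinterpret_cast<Word>(thunk));
    g_queue.push_back(static_cast<Word>(count));
    g_queue.insert(g_queue.end(), args, args + count);
}

// Generic replay loop: no per-call replay logic is needed here.
// Thunks must not modify the queue while it is being replayed.
void replayQueue() {
    std::size_t offset = 0;
    while (offset < g_queue.size()) {
        Thunk thunk = reinterpret_cast<Thunk>(g_queue[offset]);
        const std::size_t count = static_cast<std::size_t>(g_queue[offset + 1]);
        thunk(g_queue.data() + offset + 2, count);
        offset += 2 + count;  // skip address, count and arguments
    }
}

// Example thunk for a fictitious call that sums its arguments.
static Word g_lastSum = 0;
void fakeCallThunk(const Word* args, std::size_t count) {
    g_lastSum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        g_lastSum += args[i];
    }
}
```

The replay loop knows nothing about individual calls, which mirrors the strength and the weakness described above: it is simple, but every call has to fit the same fixed calling scheme.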
5.3 Enumeration
Enumeration is a variation on both of the previously mentioned enqueueing
techniques. As in the function pointer technique, all data is stored in a list
of unsigned ints, but instead of storing the function pointer, a unique
identifier for each call is stored, and the function arguments are stored
after that.
Replaying can be done by having a table of function pointers indexed by the
unique identifier pushed to the queue. To be able to handle different types of
functions, a replay function is defined. This function is written like
unsigned int glFunction(unsigned int offset) and takes the offset of the first
data in the queue as an argument. The function can then replay using any logic
it wants to and returns the offset to one position past the last data, the
position of the next function identifier. This way, the enqueueing function
can put arbitrary data in the queue, assuming that the replaying function can
parse it and return the offset of the next function, making way for very
specialized functions. The same enqueue abstraction as used in Section 5.2 can
be used, but modified so that arguments with size larger than 4 bytes are not
enqueued in swapped order. Algorithm 3 shows a pseudocode implementation using
this method, using the same fictitious OpenGL call as in the function pointer
example. See Appendix C for a sample C++ implementation of one call using this
method.
From a different point of view, this method is almost exactly the same as the
object wrapping method described above. A constructor (enqueue function)
allocates memory in the queue for the data to be stored and an execute
function (replay function) reads the specified data from the queue and
provides it to the real OpenGL function. The main difference is that instead
of naming the variables, this technique works completely with the addresses
inside the queue, not needing to allocate memory each time a new call is made.
5.4 Conclusions
All the above discussed enqueueing methods work, but with varying flexibility
and performance. The enumeration method has enough flexibility (can handle
both specialized enqueue and replay functions) and operates in a pre-allocated
memory space which removes the most dominant performance bottleneck of the
function glFunctionEnqueue(data[N]) do
    enqueue(enumeration(glFunction))
    for i = 1..N do
        enqueue(data[i])
    end
end

function glFunctionReplay(offset) do
    data[N]
    for i = 0..(N-1) do
        data[i + 1] = queue[offset + i]
    end
    realGl.glFunction(data)
    return offset + N
end

function Replay() do
    offset = 0
    while offset < queue.size() do
        functionEnumeration = queue[offset]
        functionPointer = get_replay_func(functionEnumeration)
        offset += 1
        offset = functionPointer(offset)
    end
end
Final design and implementation
This chapter describes the final design and implementation created in this
thesis, together with utilities to simplify adding new extensions and OpenGL
versions. The design is heavily based on the Trojan and ICD examples described
in Chapter 4, with a separate DLL for each implementation. These DLLs act
as front ends to the real interceptor, a statically linked library common to both
front ends.
6.1 Specification parser
Writing each OpenGL function by hand is neither feasible nor time effective. To
make the interceptor more maintainable and flexible, a small Ruby script was
created. This script parses a number of extension files containing information
about all OpenGL functions and generates functions for enqueueing and
replaying each call. As a base, the Chromium specification file was used. This file was
parsed into the internal data structure in the Ruby script and written to file.
Parsing is done before compiling the interceptor, providing information about
all possible supported calls.
Each function call is specified using the following keywords:
name Name of the function call, without prefix. E.g. Begin instead of glBegin.
This line must be the first line of the specified call.
prefix The prefix of the specified call. Either gl, wgl or Drv.
return Return type of the function.
param Specifies an argument to the function. Takes two whitespace-separated
arguments: the first is the name of the argument and the second is the
type, which can contain whitespace (e.g. const GLint *). This keyword
can be used several times in a function specification, but the names have [...]
vector [...] the first is the name of the parameter (as specified by param) and the
second is the number of elements in the vector.
category The version of OpenGL or the name of the extension where this call
was introduced.
number The ICD number of the specified call. Set to -1 for all other calls.
type Function classification using the keys specified in Chapter 7.
name Vertex3fv
prefix gl
return void
param v const GLfloat *
vector v 3
category 1.0
number 137
type
Table 6.1: Example of a function call specification.
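The parser in this thesis is a Ruby script; purely as an illustration of the record format in Table 6.1, the sketch below reads one such record into a structure using C++. The types CallSpec and Param and the function parseSpec are hypothetical names introduced for this example, and the handling of the vector keyword assumes it refers to a parameter declared earlier with param.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one specification record.
struct Param { std::string name, type; int vectorSize = 0; };
struct CallSpec {
    std::string name, prefix, ret, category, type;
    int number = -1;
    std::vector<Param> params;
};

// Remove leading and trailing blanks.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Parse one record given as "keyword value" lines.
CallSpec parseSpec(const std::string& text) {
    CallSpec spec;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream in(line);
        std::string key;
        if (!(in >> key)) continue;          // skip blank lines
        if (key == "param") {
            Param p;
            in >> p.name;
            std::string rest;
            std::getline(in, rest);          // the type may contain spaces
            p.type = trim(rest);
            spec.params.push_back(p);
        } else if (key == "vector") {
            std::string pname;
            int count = 0;
            in >> pname >> count;
            for (Param& p : spec.params)
                if (p.name == pname) p.vectorSize = count;
        } else if (key == "number") {
            in >> spec.number;
        } else {
            std::string rest;
            std::getline(in, rest);
            const std::string value = trim(rest);
            if (key == "name") spec.name = value;
            else if (key == "prefix") spec.prefix = value;
            else if (key == "return") spec.ret = value;
            else if (key == "category") spec.category = value;
            else if (key == "type") spec.type = value;
        }
    }
    return spec;
}
```

A code generator could then walk the params list to emit the enqueue and replay functions for the call.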
6.1.1 Calculation of parameter positions
As shown in Section 5.3, replay functions must calculate the sizes of arguments and
add those together to find the position of the next argument in the queue. This
is done in the parser, generating code that compiles to a constant when using
an optimizing C compiler. When enqueueing, all data sizes are rounded up to
the next multiple of 8. This is also true for vector calls, but here it is the total
number of bytes enqueued that is rounded up to the next multiple of 8.
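The following sketch shows how such positions can be expressed so that the compiler folds them into constants. It assumes the 32-bit queue words of the listings in Appendix C, so sizes are rounded up to whole 4-byte words here; the helper wordsFor and the example call with three arguments are hypothetical.

```cpp
#include <cstddef>

// Number of 32-bit queue words needed for `bytes` bytes, rounded up to
// whole words. For arguments of at most 4 bytes this gives the same result
// as the expression (sizeof(T) < 4 ? 4 : sizeof(T)) >> 2 in Appendix C.
constexpr std::size_t wordsFor(std::size_t bytes) {
    return (bytes + 3) / 4;
}

// Offsets (in words) of the arguments of a hypothetical call
// f(unsigned char a, double b, const float c[3]), relative to the first
// data word of the call.
constexpr std::size_t offsetA = 0;
constexpr std::size_t offsetB = offsetA + wordsFor(sizeof(unsigned char));
constexpr std::size_t offsetC = offsetB + wordsFor(sizeof(double));
// For a vector argument the total byte count is rounded, not each element.
constexpr std::size_t offsetNext = offsetC + wordsFor(3 * sizeof(float));

static_assert(offsetB == 1, "a one-byte argument occupies one queue word");
```

Since every offset is a constant expression, a generated replay function pays no run-time cost for locating its arguments.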
6.2 Tables
The design of the interceptor revolves around a number of tables of function
pointers. These tables control the behaviour of the interceptor at a given point,
making different interception strategies possible.
In The table containing all entry points to the DLL and all OpenGL calls
that are available using wglGetProcAddress. Each function in this table
forwards the call to the receive table. This extra step is needed to provide
the ability to change interception strategies for receiving calls.
Receive The table containing function pointers to whatever will happen when
a call is received. This can usually be a pointer to the real OpenGL
implementation or to an enqueue function. These pointers are interchangeable
Replay The table containing function pointers to the replay functions currently
in use. This table is used by the replay functions.
Debug The table containing function pointers to the debug functions. These
functions use the same calling convention as the replay functions and dump
all data enqueued in a frame to file.
Real The table containing function pointers to the real OpenGL
implementation.
The application is only aware of the In table, or at least the content of it. This
is where all the function calls will arrive at the interceptor. The function from
the In table will then call the pointer associated with the same function in the
receive table.
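The forwarding from the In table through the receive table can be sketched as follows. This is a reduced illustration with a single call; CallTable, inBegin, setIntercepting and the trace string are hypothetical names, and the "real" and "enqueue" functions only record that they were called.

```cpp
#include <string>

// Hypothetical one-call table standing in for the full OpenGL function set.
struct CallTable {
    void (*begin)(unsigned int mode);
};

static std::string trace;   // records what happened, for illustration only

// Real table entry: would call into the real OpenGL implementation.
static void realBegin(unsigned int mode) {
    trace += "real:" + std::to_string(mode) + ";";
}
// Enqueue entry: would store the call in the queue instead.
static void enqueueBegin(unsigned int mode) {
    trace += "enqueue:" + std::to_string(mode) + ";";
}

static CallTable realTable    = { &realBegin };
static CallTable receiveTable = { &realBegin };   // pass-through by default

// In-table entry point: the only function the application ever sees.
// Its address never changes; it forwards through the receive table.
void inBegin(unsigned int mode) { receiveTable.begin(mode); }

// Switching interception strategy is a pointer swap in the receive table.
void setIntercepting(bool on) {
    receiveTable.begin = on ? &enqueueBegin : realTable.begin;
}
```

The extra indirection is what allows the strategy to change after the application has already fetched and cached its function pointers.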
6.3 The queue
The enqueue strategy used in the interceptor is the same as the one described
in Section 5.3. An array of unsigned ints contains a unique identifier and
arbitrary data per function call. This data is enqueued on an individual basis by
a function, and replayed by a replay function that mimics the enqueue when
reading the data.
6.4 Replayers
A frame is considered done when wglSwapBuffers is called, and the F
queued using the enqueue functions) a replayer can use the queue to perform
different rendering algorithms to create the necessary images.
Figure 6.2: Application flow during replaying of the queue.
The interceptor implements a multitude of different renderers for both testing
and rendering of images to a 3D display.
RedrawClear Simple dummy redraw function that just clears the queue and
returns.
RedrawTiled Redraw function that loops through the queue 16 times,
rendering the scene to a 4x4 grid.
RedrawRedBlueStereo Renders red and blue images of the scene, with skewed
frustums, creating an image suitable for red/blue stereo glasses.
RedrawCoreRender Renders scene for holoform 3D display.
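As an illustration of the tiled replayer, the sketch below replays a frame once per tile of a 4x4 grid and computes the target rectangle for each pass. The names Viewport, tileViewport and redrawTiled are hypothetical, the tile numbering (row by row from the lower-left corner) is an assumption, and the callback stands in for the loop that actually walks the queue.

```cpp
#include <functional>

struct Viewport { int x, y, width, height; };

// Viewport of tile `index` (0 .. grid*grid-1) in a grid x grid layout,
// numbering tiles row by row starting at the lower-left corner.
Viewport tileViewport(int index, int grid, int windowWidth, int windowHeight) {
    const int w = windowWidth / grid;
    const int h = windowHeight / grid;
    return { (index % grid) * w, (index / grid) * h, w, h };
}

// Replays the same frame once per tile. `replayQueue` stands in for the
// replay loop; in this sketch it only receives the target viewport.
void redrawTiled(int grid, int windowWidth, int windowHeight,
                 const std::function<void(const Viewport&)>& replayQueue) {
    for (int i = 0; i < grid * grid; ++i)
        replayQueue(tileViewport(i, grid, windowWidth, windowHeight));
}
```

With grid set to 4 the callback runs 16 times, matching the description of RedrawTiled above.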
6.5 Extensions
As opengl32.dll only provides functions for OpenGL 1.1, all other functions are
provided to the application using the wglGetProcAddress function. A call to
glGetString(GL_EXTENSIONS) will return a list of extensions supported by
the driver. To force the application to only use the extensions the interceptor
implements, an array of strings is created by the specification parser. This list is
compared with the string from the driver, returning only the intersection of the two.
pointer, the interceptor intercepts the call and returns a function pointer to the
In table. The real pointer is then stored in the real table instead.
New extensions are added by adding a new file with the extension's name (e.g.
the extension named GL_EXT_framebuffer_objects becomes GL_EXT_framebuffer_objects.txt).
This file is parsed using the specification parser and implementations for all
extension functions are created.
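The filtering of the extension string can be sketched as below. The function name filterExtensions is hypothetical; the sketch assumes the driver string is the usual space-separated list returned by glGetString(GL_EXTENSIONS) and that the interceptor's list is available as a set of names.

```cpp
#include <set>
#include <sstream>
#include <string>

// Returns the space-separated extensions that appear both in the driver's
// extension string and in the interceptor's list, in driver order.
std::string filterExtensions(const std::string& driverExtensions,
                             const std::set<std::string>& implemented) {
    std::istringstream in(driverExtensions);
    std::string name, result;
    while (in >> name) {
        if (implemented.count(name) == 0) continue;   // not handled by us
        if (!result.empty()) result += ' ';
        result += name;
    }
    return result;
}
```

Comparing whole names rather than searching for substrings avoids a false match when one extension name is a prefix of another.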
6.6 User input
The input system is designed around Windows hooks to plant a system-wide
keyboard hook. This is due to full screen applications (mainly games) taking
full control of the system and not letting other applications be visible. It also
provides a way to always fetch certain keyboard keys without being forced to
use a graphical user interface. The keys are forwarded into the interceptor and
are either processed at once or stored as a message that is read before replay. A
method called FrameControll is called from all calls that swap OpenGL buffers,
which will trigger redraw, queue clear and message processing before swapping
the buffers.
Currently, there are controls for turning the interceptor on/off, switching to
different replayers, setting the distance of the image projection plane and dumping
Call modifications
Not all calls are suitable for enqueueing and replaying, and some replayers might
need to substitute some calls with special calls during replay. This chapter
describes the different classes the functions were partitioned into and how they
act.
7.1 Classes
Calls were categorized to create different logic depending on the call type. The
following list describes call modifiers that can be put in the type field in the
function specification files. No modifier means that the call should be
intercepted.
Get A function with the purpose of fetching data from OpenGL. Does not need
to be queued since no real caller exists during replay and no state changes
in OpenGL are made. Calls in this class are forwarded to the real table
and do not have any enqueuer or replayers (the pointer in the enqueue
table is equal to the pointer in the real table).
Special This function has a special implementation, for example
wglSwapBuffers, wglGetProcAddress and glGetString. Used for calls that need
special care. The pointer in the enqueue table points to glNameSpecial.
None No text in the type field. This call should be intercepted and enqueued.
7.1.1 Vector calls
A vector call is a call that takes a pointer to a memory segment and uses that
during execution of the function. Since we are unaware of whether this data
has changed during the application's execution or not, we cannot assume it is
correct when replaying. To solve this, all vector arguments have to be defined
enqueuer copies the memory segment to the queue and the replayer calls the
real OpenGL call with a pointer to the enqueued data, skipping the size of the
data when returning. For calls with a variable but limited amount of data, the
upper limit is used.
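A minimal sketch of a vector call follows, using a call that takes three floats. The names queue_, vertex3fvEnqueue, vertex3fvReplay and lastVertex are hypothetical. In the interceptor the real OpenGL function would receive a pointer to the enqueued data; in this sketch the data is copied into lastVertex so that the effect can be observed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit floats");

static std::vector<std::uint32_t> queue_;   // hypothetical 32-bit word queue
static float lastVertex[3];                 // stands in for the real call

// Enqueuer: copy the three floats behind the pointer into the queue, so
// that later changes to the application's memory cannot affect the replay.
void vertex3fvEnqueue(const float* v) {
    const std::size_t start = queue_.size();
    queue_.resize(start + 3);
    std::memcpy(&queue_[start], v, 3 * sizeof(float));
}

// Replayer: read the enqueued copy (here into lastVertex) and return the
// offset one past the vector data.
unsigned int vertex3fvReplay(unsigned int offset) {
    std::memcpy(lastVertex, &queue_[offset], 3 * sizeof(float));
    return offset + 3;
}
```

The copy at enqueue time is what makes the replay independent of whatever the application does with its buffer afterwards.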
7.2 Vertex arrays
One special class of functions is the vertex array functions. This class can be
divided into pointer and draw calls, the pointer calls providing information on
where to read the arrays and how (stride, type etc.) while the draw calls do the
actual drawing. According to the OpenGL specification [7], the memory is read
during the draw call, transferring geometry to the driver and hardware.
The memory location provided by the pointer calls is not of a known size until
a draw call is made, making direct caching of this data inside the queue
unfeasible. Instead, a draw call could be exploded into regular immediate mode calls
(glVertex, glColor etc.). This is described in the OpenGL specification and is
what we used here.
Each pointer call is stored as an internal state to be used later. When a draw
call arrives, the arrays are looped through in the specified order and the
corresponding immediate mode calls are enqueued. When replaying, these calls are
treated just like any regular calls.
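The explosion of a draw call can be sketched as follows for a single array of float positions. The names ArrayState, vertexPointer, drawArrays and emitted are hypothetical, only two or three float components per vertex are handled, and a vector of vertices stands in for the enqueued immediate mode calls.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// State remembered from a pointer call (float data only in this sketch).
struct ArrayState {
    const unsigned char* base = nullptr;
    int size = 3;                 // components per vertex, 2 or 3 here
    std::size_t stride = 0;       // bytes between vertices, 0 = tightly packed
};

struct Vertex { float x, y, z; };
static std::vector<Vertex> emitted;   // stands in for enqueued glVertex calls
static ArrayState vertexArray;

// Pointer call: nothing is read yet, only the description is stored.
void vertexPointer(int size, std::size_t stride, const void* pointer) {
    vertexArray.base = static_cast<const unsigned char*>(pointer);
    vertexArray.size = size;
    vertexArray.stride = stride;
}

// Draw call: only now is the number of vertices known, so the array is
// read here and turned into one immediate mode call per vertex.
void drawArrays(int first, int count) {
    const std::size_t step = vertexArray.stride
        ? vertexArray.stride
        : vertexArray.size * sizeof(float);
    for (int i = first; i < first + count; ++i) {
        float c[3] = {0.0f, 0.0f, 0.0f};
        std::memcpy(c, vertexArray.base + i * step,
                    vertexArray.size * sizeof(float));
        emitted.push_back({c[0], c[1], c[2]});
    }
}
```

The per-vertex loop also makes the cost visible: every element becomes a separate enqueued call.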
7.3 Modified projection matrices and view ports
As applications can modify the projection matrix at any time during the frame,
the replayers will need a way to ensure that their modified matrices are used
even after the modification. To achieve this, the glMatrixMode call is modified
to store whether the application has switched to GL_PROJECTION and back, and
glBegin is modified to add the replayer's projection matrix before rendering any
geometry.
Some replayers might want to change the viewport to render to just a part of
the screen, which means that glViewport has to be modified during replay to
scale and translate the viewport changes according to the replayer's needs.
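The scale and translate step for glViewport can be sketched as below. Rect and remapViewport are hypothetical names, and the sketch assumes that the application's viewport is expressed relative to the full window and that truncating integer division is acceptable.

```cpp
struct Rect { int x, y, width, height; };

// Maps a viewport the application requested for the full window into the
// sub-rectangle `target` chosen by a replayer: scale by the size ratio,
// then translate to the target origin. Integer division truncates.
Rect remapViewport(const Rect& requested, int windowWidth, int windowHeight,
                   const Rect& target) {
    return {
        target.x + requested.x * target.width / windowWidth,
        target.y + requested.y * target.height / windowHeight,
        requested.width * target.width / windowWidth,
        requested.height * target.height / windowHeight
    };
}
```

An application viewport covering the whole window maps exactly onto the target rectangle, while partial viewports keep their relative position inside it.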
7.4 Other special cases
Some calls that can be considered as vector calls take a pointer to a memory
location together with row and column stride and the number of data to read.
Discussion
The interception techniques and enqueueing methods described in this thesis
should be applicable to most cases where function calls need to be intercepted. With
very little overhead, the functions are caught, processed and forwarded to a
suitable receiver or the real receiver. This chapter discusses how the implementation
of the selected techniques performs and what might be improved.
8.1 Implementation
8.1.1 Intercepting
The interception techniques used were the Trojan DLL and the ICD. Both
these methods work very well. The Trojan is probably the most straightforward
implementation possible, just exporting and forwarding selected symbols,
while the ICD is a bit more tricky due to more function pointers to keep track
of. DLL injection was not investigated further, but might remain a feasible
alternative to both implemented techniques.
As for a comparison between the implemented techniques, the Trojan is (as
said above) by far the easiest way to implement an interceptor on Windows.
The only information needed is the function declarations, which are given by the
specification and the specification parser, and a way to handle various calls to
get new OpenGL functions beyond 1.1 (wglGetProcAddress). Tables can then
easily be set up to provide desired interception functionality. This technique
can be extended to the ICD, but that requires some thought on how pointers
are provided and handled between the ICD, the application and the OpenGL
DLL. Performance wise, they are both on par since they both work on the same
set of function pointers, although at different stages in the OpenGL chain.
As a DLL cannot load another DLL explicitly inside DllMain, both the ICD and
the Trojan need to have some mechanism to load their function pointers before
the first call is processed. For now, this is done by checking the isInitialized flag
on each call, and calling Init() if this flag is false. As this only is true for the first
ControlList).
8.1.2 Enqueueing
The queue implemented uses a std::vector<unsigned int>, which stores a 32-bit
chunk of data in each queue position. This size was used both since it is the
register size of 32-bit CPUs and the size of a function pointer on that platform.
The vector is initialized to a fixed number of elements, but is expanded each
time it is filled. This way, it will reach a level where no new allocations are
needed, and the queue will operate in a fixed memory space.
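The behaviour described above relies on std::vector keeping its allocated storage when it is cleared. The sketch below wraps this in a small class; FrameQueue and its member names are hypothetical and not the interceptor's actual queue.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical frame queue: grows while the first frames are recorded,
// then reuses the same storage because clear() keeps the capacity.
class FrameQueue {
public:
    explicit FrameQueue(std::size_t initialWords) { words_.reserve(initialWords); }
    void push(unsigned int word) { words_.push_back(word); }
    void clear() { words_.clear(); }   // size becomes 0, storage is kept
    std::size_t size() const { return words_.size(); }
    std::size_t capacity() const { return words_.capacity(); }
private:
    std::vector<unsigned int> words_;
};
```

After the largest frame has been recorded once, later frames of the same or smaller size cause no further allocations.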
Raw enqueue performance is not the primary goal of this project since it is
targeted at at least a couple of replays, where the fill rate and GPU bandwidth
will be the bottlenecks. On the other hand, the enqueue performance hit can
be measured using the Clear redrawer that just clears the queue. On a regular
Quake 2 map the performance drops to about 70% of the original performance
using this replayer. This is for a scene that takes up 81 KB in the queue.
8.1.3 Replaying
Aside from the obvious workarounds for projection matrices and viewport fixes,
the replayers are working fine. The replay overhead is almost none, since it
consists of just a loop with two queue reads and a function call to the replay
function in it. The replay function then does another call and fetches a total of
N variables from the queue, where N is the number of arguments the specified
function takes.
8.1.4 Overall design
The most visible drawback of the interceptor is the performance decrease when
using vertex arrays. Although it seems like an obvious target for optimization, that
task might be harder than it looks at first glance (see the OpenGL
specification [7]). Each element in the vertex array, color array, texture array etc. is
exploded into a separate call, adding both call and queue header overhead.
8.2 Application support
8.2.1 Working applications
As the purpose of this thesis states, the interceptor should work with as many
standard rendering applications as possible. In its current state, it works on a
whole range of applications. From simple triangle test applications to full games
The current implementation assumes that the application does not use shaders
for multi-pass rendering or render to textures. It is also assumed that the model view
and projection matrices are provided to the shader using the built-in OpenGL
matrices, and not using uniforms or other non-standard constructs. The
applications must also swap buffers using wglSwapBuffers, since that is the only
function that will trigger redraw and a queue clear.
Another problem can occur when an application removes an object (texture,
display list etc.). All creation and deletion of objects are ignored (not enqueued
and replayed), which could cause problems if the application creates and deletes
objects during a frame and not just at the start and end of the program. This
problem could be avoided by having a remove list that stores all objects to be
Future work
This chapter contains thoughts on things that can be researched and
implemented in the interceptor.
9.1 Networked rendering
As holoform displays can require more than 50 different viewing angles per frame
to create an acceptable image, rendering performance might be a bottleneck in
reaching real-time performance. The interceptor could propagate the render
queue to other computers on the network, the same way Chromium does, making
them render the same frame from different angles in parallel.
9.2 Call aliases
As OpenGL evolved, many functions that were originally part of an extension
were included into the specification. Many of these functions are the same
functions with another name, taking the same parameters and having the same
specified functionality. The specification could contain information about this,
allowing the parser to skip generating enqueuers and replayers for the renamed
calls, forwarding all calls to just one enqueuer. This would reduce total code
size and might provide additional optimization for networked rendering since
fewer calls would have to be enumerated, which in turn would help compressing
data.
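One possible shape for such alias information is a lookup from the renamed call to the call whose enqueuer and replayer should be used. The table and the function canonicalName below are hypothetical; the two entries are examples of extension functions that were later promoted to the core specification under a new name.

```cpp
#include <map>
#include <string>

// Hypothetical alias information as it could appear in the specification:
// extension-era name -> core name with the same parameters and behaviour.
static const std::map<std::string, std::string> aliases = {
    {"glActiveTextureARB", "glActiveTexture"},
    {"glBlendColorEXT", "glBlendColor"},
};

// Name of the call whose enqueuer and replayer should be generated and used.
std::string canonicalName(const std::string& name) {
    const auto it = aliases.find(name);
    return it == aliases.end() ? name : it->second;
}
```

A generator using this lookup would emit one enqueuer per canonical name and point every alias at it.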
9.3 Advanced function classes
The current implementation provides automatic parsing of vector class functions
that have all data sequentially in memory. There are other types of functions
that pass a dynamic amount of memory, and the specification parser could be
extended to parse these. One example is the glMap* family of functions that
As the ICD is connected to all OpenGL applications on the system, some kind
of selective interception should be implemented. The user should be able to
select which applications to intercept, and maybe even set some configuration
options depending on how that application behaves. This could probably be
done by fetching information on the current process and reading configuration
parameters from a config file.
9.5 Pluggable render back ends
Render back ends (Tiled, RedBlue and CoreRender) are implemented as
separate functions inside the interceptor. To make the interceptor more flexible,
these back ends should be implemented as separate DLLs using a defined
interface.
9.6 Ports and other APIs
The techniques discussed and implemented on OpenGL for Windows could be
adapted to work with both Direct3D on Windows and OpenGL on other
platforms. Most operating systems provide some sort of dynamically linked libraries
which can be intercepted using a trojan library. The specification parser could
then be modified to generate interception code for nearly any API.
9.7 OpenGL extension
As more and more 3D displays become available on the market, it might not
be suitable for each display vendor to provide an interceptor with the required
rendering back end. To avoid a multitude of different interceptors, a standard
interface could be defined, letting the vendor hook onto the OpenGL driver
and control rendering of the different view ports. All the interception and
enqueueing could then be performed inside the GPU vendors' drivers (which is
probably already done in NVIDIA's stereo drivers).
To bypass some of the limitations of the interceptor implemented in this thesis,
an OpenGL extension that controls the enqueueing could be exposed, allowing
the application to somewhat control which parts of the rendering to be enqueued
[1] NVIDIA Corporation. NVIDIA 3D Stereo User's Guide. NVIDIA
Corporation, 7.5 edition, July 2005.
[2] NVIDIA Corporation. NVIDIA GPU Programming Guide 2.4.0. NVIDIA
Corporation, 2005.
[3] N. A. Dodgson. Autostereoscopic 3D Displays. Computer, 38(8):31-36, Aug.
2005.
[4] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides.
Design Patterns, Elements of Reusable Object-Oriented Software. Professional
Computing Series. Addison-Wesley, 1995.
[5] Greg Humphreys, Ian Buck, Matthew Eldridge, and Pat Hanrahan.
Distributed rendering for scalable displays. In Supercomputing '00: Proceedings of
the 2000 ACM/IEEE conference on Supercomputing (CDROM), page 30,
Washington, DC, USA, 2000. IEEE Computer Society.
[6] Greg Humphreys and Pat Hanrahan. A distributed graphics system for
large tiled displays. In VIS '99: Proceedings of the conference on
Visualization '99, pages 215-223, Los Alamitos, CA, USA, 1999. IEEE Computer
Society Press.
[7] Mark Segal and Kurt Akeley. The OpenGL Graphics System: A Specification
(Version 2.0). opengl.org, 2004.
[8] Wojciech Matusik and Hanspeter Pfister. 3D TV: A scalable system for
real-time acquisition, transmission, and autostereoscopic display of
dynamic scenes. ACM Trans. Graph., 23(3):814-824, 2004.
[9] Daniel S. Myers and Adam L. Bazinet. Intercepting arbitrary functions on
windows, unix and macintosh osx platforms. Master's thesis, University
of Maryland, 2004.
[10] MSDN Network. Platform SDK: DLLs, Processes, and Threads. In
Microsoft Corporation, September 2005.
[11] Christopher Niederauer, Mike Houston, Maneesh Agrawala, and Greg
Humphreys. Non-invasive interactive visualization of dynamic
architectural environments. In SI3D '03: Proceedings of the 2003 symposium on
November 1997.
[13] Matt Pietrek. An in-depth look into the win32 portable executable file
format. MSDN Magazine, February 2002.
[14] Matt Pietrek. An in-depth look into the win32 portable executable file
format, part 2. MSDN Magazine, March 2002.
[15] The Mesa project. Mesa 3D OpenGL implementation. www.mesa3d.org.
[16] Graphic Remedy. gDEBugger. www.gremedy.com/products.php.
ICD Interface
BOOL DrvCopyContext(HGLRC hGlrcSrc, HGLRC hGlrcDst, UINT mask)
HGLRC DrvCreateContext(HDC hDc)
BOOL DrvDeleteContext(HGLRC hGlrc)
HGLRC DrvCreateLayerContext(HDC hDc, int iLayerPlane)
ICDCallTable * DrvSetContext(HDC hDc, HGLRC hGlrc, void * callback)
void DrvReleaseContext(HGLRC hGlrc)
BOOL DrvShareLists(HGLRC hGlrc1, HGLRC hGlrc2)
BOOL DrvDescribeLayerPlane(HDC hDc, int iPixelFormat, int iLayerPlane,
UINT nBytes, LPLAYERPLANEDESCRIPTOR plpd)
int DrvSetLayerPaletteEntries(HDC hDc, int iLayerPlane, int iStart,
int cEntries, CONST COLORREF * pcr)
int DrvGetLayerPaletteEntries(HDC hDc, int iLayerPlane, int iStart,
int cEntries, COLORREF * pcr)
BOOL DrvRealizeLayerPalette(HDC hDc, int iLayerPlane, BOOL bRealize)
BOOL DrvSwapLayerBuffers(HDC hDc, UINT fuPlanes)
int DrvDescribePixelFormat(HDC hDc,
UINT nBytes,
LPPIXELFORMATDESCRIPTOR ppfd)
PROC DrvGetProcAddress(LPCSTR lpszProc)
int DrvSetPixelFormat(HDC hDc,
int iPixelFormat)
BOOL DrvSwapBuffers(HDC hDc)
Intercepting
B.1 Using the helper headers
typedef struct GLImplementation{
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
ret (__stdcall* prefix##name##) args_tn;
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "wgltable.h"
#undef PROCESS_NAME
} GLImplementation;
GLImplementation realGl;
void LoadPointers(){
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
*((FUNCTION*) &realGl.##prefix##name##) = \
(FUNCTION)GetProcAddress(glHandle, #prefix #name );
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "wgltable.h"
#undef PROCESS_NAME
}
B.2 Forward table
GLImplementation receiveGl;
extern "C"{
return receiveGl.##prefix##name##args_n##;\
}
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "wgltable.h"
#undef PROCESS_NAME
BOOL wglSwapBuffersIntercept(HDC hDc){
// Do stuff
return realGl.wglSwapBuffers(hDc);
}
};
void SetupForwardTables(){
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
receiveGl.##prefix##name = realGl.##prefix##name##;
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "wgltable.h"
#undef PROCESS_NAME
receiveGl.wglSwapBuffers = wglSwapBuffersIntercept;
}
B.3 Loading ICD pointers
typedef struct ICDCallTable{
DWORD numCalls;
PROC table[336];
} ICDCallTable;
ICDCallTable* APIENTRY DrvSetContextSpecial(HDC hDc,
HGLRC hGlrc,
void *callback){
static ICDCallTable* icdTable = NULL;
if(!icdTable){
// Get the GL calltable from the real ICD
icdTable = realGl.DrvSetContext(hDc, hGlrc, callback);
// Fetch all calls to our calltable
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
*((FUNCTION*) &realGl.##prefix##name##) = \
#include "gltable-1.1.h"
#undef PROCESS_NAME
// Rewrite calls for interception
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
icdTable->table[##num##] = \
(PROC)##prefix##name##;
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#undef PROCESS_NAME
SetupForwardTables();
}
return icdTable;
}
void SetupForwardTables(){
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
receiveGl.##prefix##name = \
realGl.##prefix##name##;
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "icdtable.h"
#undef PROCESS_NAME
receiveGl.DrvSetContext = DrvSetContextIntercept;
}
Enqueueing and Replaying
C.1 Enqueue function
template<class T>
inline void enqueue(const T t){
if(sizeof(T) == 1){
const unsigned char* c =
reinterpret_cast<const unsigned char*>(&t);
queue.push_back(static_cast<unsigned int>(c[0]));
}
else if(sizeof(T) == 2){
const unsigned short int* i =
reinterpret_cast<const unsigned short int*>(&t);
queue.push_back(static_cast<unsigned int>(i[0]));
}
else if(sizeof(T) == 4){
const unsigned int* i =
reinterpret_cast<const unsigned int*>(&t);
queue.push_back(i[0]);
}
else if(sizeof(T) == 8){
const unsigned int* p =
reinterpret_cast<const unsigned int*>(&t);
queue.push_back(p[1]);
queue.push_back(p[0]);
}
else{
assert(false);
}
}
C.2 Object wrapping
class GlCall{
public:
GlCall() { };
virtual ~GlCall() { };
virtual void execute() = 0;
};
class GlCall_glBegin : public GlCall{
private:
GLenum mode;
public:
GlCall_glBegin(GLenum _mode) : mode(_mode) { };
~GlCall_glBegin() { };
void execute(){
realGl.glBegin(mode);
};
};
void glBeginEnqueue(GLenum mode){
queue.push_back(new GlCall_glBegin(mode));
realGl.glBegin(mode);
}
void Replay(){
for(size_t i = 0; i < queue.size(); i++)
queue[i]->execute();
}
C.3 Function pointers
void glBeginEnqueue(GLenum mode){
enqueue(reinterpret_cast<unsigned int>(realGl.glBegin));
enqueue((static_cast<unsigned int>((sizeof(mode)<4?4:sizeof(mode)) + 0)) >> 2);
enqueue(mode);
return realGl.glBegin(mode);
}
void Replay(){
unsigned int offset = 0;
unsigned int faddr;
unsigned int numArgs;
unsigned int ac;
unsigned int arg;
while(offset < queue.size()){
faddr = queue[offset];
offset++;
numArgs = queue[offset];
offset++;
ac = numArgs;
_asm{
push eax;
push ebx;
mov ebx, esp;
}
while(ac > 0){
arg = queue[offset];
_asm{
mov eax, arg;
push eax;
}
offset++;
ac--;
}
_asm{
mov eax, faddr;
call eax;
mov esp, ebx;
pop ebx;
pop eax;
}
}
}
C.4 Enumerated
typedef unsigned int (*REPLAY_FUNCTION)(unsigned int);
void glBeginEnqueue(GLenum mode){
enqueue(GlFuncEnum_glBegin);
enqueue(mode);
return realGl.glBegin(mode);
}
unsigned int glBeginReplay(unsigned int offset){
realGl.glBegin( *reinterpret_cast<GLenum *>(&(queue[offset + 0 ])) );
return offset + 0 + ((sizeof(GLenum)<4?4:sizeof(GLenum)) >> 2);
}
void Replay(){
unsigned int offset = 0;
unsigned int fenum;
unsigned int faddr;
while(offset < queue.size()){
fenum = queue[offset];
faddr = *replayPointers[fenum];
offset++;
offset = (reinterpret_cast<REPLAY_FUNCTION>(faddr))(offset);
}
}
Modified call
void APIENTRY glMap2fEnqueue(GLenum target,
GLfloat u1, GLfloat u2, GLint ustride, GLint uorder, GLfloat v1, GLfloat v2, GLint vstride, GLint vorder,
const GLfloat * points){
unsigned int numElems = GetNumMap2Elements(target);
enqueue(GlFuncEnum_glMap2f);
enqueue(numElems);
enqueue(target);
enqueue(u1);
enqueue(u2);
enqueue(uorder);
enqueue(v1);
enqueue(v2);
enqueue(vorder);
for(unsigned int j = 0; j < static_cast<unsigned int>(vorder); j++)
for(unsigned int i = 0; i < static_cast<unsigned int>(uorder); i++)
for(unsigned int e = 0; e < numElems; e++)
enqueue(points[e + i*ustride + j*vstride]);
return realGl.glMap2f(target, u1, u2, ustride, uorder,
v1, v2, vstride, vorder, points);
}
unsigned int glMap2fReplaySpecial(unsigned int offset){
unsigned int numElems = queue[offset];
offset++;
GLint uorder = *reinterpret_cast<GLint *>(&(queue[ ... ]));
realGl.glMap2f(
*reinterpret_cast<GLenum *>(&(queue[ ... ])),
*reinterpret_cast<GLfloat *>(&(queue[ ... ])),
*reinterpret_cast<GLfloat *>(&(queue[ ... ])),
*reinterpret_cast<GLint *>(&(queue[ ... ])),
*reinterpret_cast<GLint *>(&(queue[ ... ])),
*reinterpret_cast<GLfloat *>(&(queue[ ... ])),
*reinterpret_cast<GLfloat *>(&(queue[ ... ])),
*reinterpret_cast<GLint *>(&(queue[ ... ])),
*reinterpret_cast<GLint *>(&(queue[ ... ])),
reinterpret_cast<const GLfloat *>(&(queue[ ... ]))
);
return ... ;