
Intercepting OpenGL calls for rendering on 3D display


Academic year: 2021


Degree project carried out in Image Coding

by

Joel de Vahl

LiTH-ISY-EX05/3748SE

An OpenGL application usually renders to a single frame. Multi-view or 3D displays, on the other hand, need more images representing different viewing directions on the same scene, but modifying a large number of applications would be unsuitable and problematic. However, intercepting and modifying these calls before they reach the GPU would dramatically decrease the amount of work needed to support a large number of applications on a new type of multi-view or 3D display. This thesis describes different ways of intercepting, enqueueing and replaying these calls to support rendering from different view points. Intercepting with both a custom implementation of opengl32.dll and an OpenGL driver is discussed, and enqueueing using classes, function pointers and enumeration of functions is tried. The different techniques are discussed briefly, with the focus being a working implementation. The result is a full-blown OpenGL interceptor with the ability to enqueue and replay a frame multiple times while modifying parameters such as the projection matrix. This implementation uses a custom implementation of opengl32.dll that is placed in the application directory to be loaded before the real one. Enqueueing is performed by enumerating all OpenGL calls and pushing this enumeration value and all call data to a list. Replaying is done by reading the same list and calling the function associated with each enumeration value.

Contents

1 Introduction
    1.1 Background
    1.2 Purpose
    1.3 Scope
    1.4 Method
    1.5 Overview
2 3D displays and interaction
    2.1 Why is there a need for an interceptor?
        2.1.1 Other possible applications
    2.2 3D displays
        2.2.1 Simple stereo displays
        2.2.2 Autostereoscopic displays
        2.2.3 Holoform, more than stereo
        2.2.4 Other types of display
    2.3 Stereo rendering
    2.4 Desired user settings
    2.5 Conclusions
3 The Win32 DLL and OpenGL system
    3.1 The Win32 DLL
    3.2 Implicit and explicit linking
    3.3 The DLL loading sequence
    3.4 The OpenGL chain
    3.5 Conclusion
4 Intercepting the calls
    4.1 Helping headers and structures
    4.2 Trojan DLL
    4.3 OpenGL ICD
    4.4 DLL injection
    4.5 Conclusions
5 Enqueueing and replaying
    5.1 Object wrapping
    5.2 Function pointers
    5.3 Enumeration
    5.4 Conclusions
6 Final design and implementation
    6.1 Specification parser
        6.1.1 Calculation of parameter positions
    6.2 Tables
    6.3 The queue
    6.4 Replayers
    6.5 Extensions
    6.6 User input
7 Call modifications
    7.1 Classes
        7.1.1 Vector calls
    7.2 Vertex arrays
    7.3 Modified projection matrices and viewports
    7.4 Other special cases
8 Discussion
    8.1 Implementation
        8.1.1 Intercepting
        8.1.2 Enqueueing
        8.1.3 Replaying
        8.1.4 Overall design
    8.2 Application support
        8.2.1 Working applications
    9.1 Networked rendering
    9.2 Call aliases
    9.3 Advanced function classes
    9.4 Multiple configurations
    9.5 Pluggable render backends
    9.6 Ports and other APIs
    9.7 OpenGL extension
Bibliography
A ICD Interface
B Intercepting
    B.1 Using the helper headers
    B.2 Forward table
    B.3 Loading ICD pointers
C Enqueueing and Replaying
    C.1 Enqueue function
    C.2 Object wrapping
    C.3 Function pointers
    C.4 Enumerated

List of Figures

2.1 Example of an autostereoscopic display.
2.2 Example of a full 3D display.
2.3 Frustums for rendering of a stereo pair.
3.1 The DLL export table.
3.2 The DLL import table.
3.3 The Win32 OpenGL chain.
4.1 Application initialization with Trojan DLL.
4.2 Application initialization with ICD.
5.1 UML class diagram of the object wrapping method.
6.1 Flow between the tables.

List of Algorithms

1 Explicit loading of a DLL (pseudocode).
2 Enqueueing using function pointers (pseudocode).

1 Introduction

1.1 Background

Regular OpenGL applications render a 3D scene to a single screen space, a frame. When rendering for a multi-view or 3D display, more views or angles of the same scene need to be produced. Many OpenGL drivers already support stereo rendering, but autostereoscopic displays need to rely on applications supporting their rendering algorithms. One such autostereoscopic display is the Scanning Slit Display being developed by Setred AB. This project was targeted primarily at supporting Setred's display, but the work is applicable to all autostereoscopic displays.

The need to provide per-application support makes development and adoption of new display technology harder and more time consuming than it would be if 3D cards directly supported multi-view rendering in their drivers.

1.2 Purpose

The purpose of this thesis is to investigate different ways of intercepting OpenGL calls under Microsoft Windows to provide a rendering architecture for 3D displays, and to implement a set of these techniques. The interceptor should support basic OpenGL versions and work with OpenGL applications that use the fixed-function pipeline, without rendering to textures or using other advanced rendering techniques. It should also be as easy as possible for the user to control.

1.3 Scope

The scope of this report is to discuss three different intercepting and enqueueing techniques, select one of each type as the most suitable for the purpose, and provide a detailed description of the final implementation of the selected techniques.

1.4 Method

This report is formed as an investigation and implementation of a chosen solution. The method used is divided into four parts: study of previous work, implementation and testing of the selected techniques, a final production, and presentation of the work.

The study of previous work is intended to widen the perspective and to provide initial solution ideas for the problems presented. The implementation and testing is to provide enough testing of the selected techniques to select the best according to the criteria of the final implementation, and the final implementation is to test the selected techniques thoroughly and show which limitations they imply. The presentation stage is when the production of the report takes place, aided by notes and documentation taken during the whole process.

1.5 Overview

This thesis is divided into four parts. The first part, Chapter 2, provides background information on 3D displays and the need for an interceptor and/or special driver support. The second part, Chapters 3, 4 and 5, presents the DLL and OpenGL systems in Microsoft Windows, interception methods, and methods for enqueueing and replaying OpenGL calls. The third part, Chapters 6 and 7, presents the final design and the modifications made to make calls work when replayed. The last part, Chapters 8 and 9, discusses the limitations of the system and

2 3D displays and interaction

This chapter aims to introduce what a 3D display is and why an interceptor might be needed. Several types of displays are discussed and different rendering algorithms are outlined.

2.1 Why is there a need for an interceptor?

Today's 3D accelerators (GPUs) support rendering to a single frame buffer or to both a left and a right buffer, providing support for a single display (regular monitor) or a stereo display.

In a typical GPU implementation, a single vertex is transformed and the corresponding primitive is rasterized to pixels. A single triangle can in that way only be transformed to one view space and rasterized at a time. This is not in line with what 3D displays require, as a 3D frame can be composed of multiple views. To produce these multiple views, the rendering commands must be iterated N times (where N is the total number of views) with different model view and/or projection matrices.

Generating the views could be the responsibility of either the driver or the application, but today's drivers do not support that kind of rendering, and modifying each application to support each 3D display's different rendering algorithm would be too much work. Until GPUs and drivers provide broader support for 3D displays, there is a need to intercept calls between the application and the driver. This extra layer will record the commands required to render a regular 2D frame and replay them as many times as needed from different angles to create a 3D frame suitable for the target display. Ideally, this would be done by the GPU or the driver, but since current hardware does not expose this functionality, alternative implementations are needed.

Some GPU drivers already support stereo rendering [1], but whether this is done by enqueueing and replaying the whole scene or by drawing each primitive first in the left and then in the right buffer is a closely guarded secret of the GPU vendors. Other interceptors exist, the most well known being Chromium

renderers for large display walls (multiple projectors or screens tiled).

2.1.1 Other possible applications

The Stanford Chromium system, formerly WireGL [5], is an interceptor system made for networked rendering. Chromium supports multiple render nodes connected to a provider. The nodes are in turn connected to projectors or other types of large screens to provide large video walls. This application has in turn inspired several new ideas on how to use the Chromium architecture to perform geometric transformations on the OpenGL command stream. One of them presents a way to slice scenes to provide better ways to visualize architectural 3D scenes [11].

As mentioned before, interceptors can be used for debugging applications by looking at the calls they make. Applied to OpenGL, debugging an application can provide information on states, break on certain calls, and track memory usage for textures and geometry. The commercial debugger gDEBugger [16] provides an interface for viewing all textures and display lists created by the application, along with breakpoints, logs and statistics.

2.2 3D displays

There are many different 3D display systems available, ranging in price from under 100 SEK to several million SEK. The range covers everything from simple red/blue glasses to multi-projector systems for full wall displays. This section is intended to shed some light on different types of 3D displays.

2.2.1 Simple stereo displays

A stereo display is a display that provides two separate images to the viewer, one for the left and one for the right eye, with the limitation that only those two images are rendered, independent of the number of viewers and their position. The simplest example of that kind of system is red and blue glasses, where a red filter for the left eye and a blue filter for the right eye filter out the two different images from a regular screen or projector.

Other variations on this theme can provide full color for both eyes. One way is to polarize the light in orthogonal directions when it leaves the screen and use polarizing lenses in the glasses. Another is to use a simple LCD shutter for each eye and display the stereo pair sequentially, switching the eye that the image will reach. The last approach requires active glasses that have to be synchronised with the screen, usually by wire or IR.

2.2.2 Autostereoscopic displays

Unlike regular stereo, autostereoscopic displays [3] do not require the user to

parallax barriers is used to transmit different images in different directions. An example of this is shown in Figure 2.1. The left and right images are interlaced and a barrier mask is placed to block the left image from reaching the right eye and vice versa.

This kind of screen puts severe constraints on the user. Depth is only experienced when the eyes are receiving the correct images, which requires the user to hold their head in a fixed position. To enable movement, several screens use a technique called head tracking. A head-tracking system follows the user's head and/or eyes and updates the direction in which the images are correct to match. This puts a limit on how many simultaneous users the display can have.

2.2.3 Holoform, more than stereo

Holoform displays are another type of 3D display. This type of display requires more than two views of the scene (therefore more than stereo) but can provide a depth experience for all viewers inside the display's view cone. Figure 2.2 provides an example of this, where a set of images from the left and right are sent out, and the user can receive a 3D experience inside the area where both left and right images are projected.

This type of screen does in general have worse depth quality since there are no optimized viewing positions, but provides another very important depth cue: motion parallax. Up to this point, all other types of screen discussed have provided only two views, independent of the user's position. Holoform displays allow users to move freely inside the view cone, providing the ability to look around objects to a certain degree. Examples of such displays are the display developed by Setred AB and the 3D TV display developed by Mitsubishi Research [8].

2.2.4 Other types of display

Apart from the types of screen mentioned above, there are a couple of completely different display techniques. One of them is to project an image set onto multiple stacked LCDs, one at a time. Depending on the number of such LCDs, the user

Common to most of the displays described in this chapter is that they all require something more than just the standard D-SUB/DVI output from the graphics card, but the rendering algorithms might vary very much between them.

2.3 Stereo rendering

A stereo pair is two images of the same scene taken with a slight displacement of the camera position (see Figure 2.3). This is usually done from an existing view frustum by selecting an image projection plane at the distance D from the camera and a certain stereo separation. Assuming a right-handed coordinate system, with the x axis to the right and the camera positioned at the origin and looking along negative z (the standard OpenGL setup), the camera is displaced both left and right, rendering one image from each position.

As the camera is moved sideways, the intersections between each frustum and the image projection plane are also translated. To keep the physical image plane the same between these translations, the frustums are sheared. The above mentioned figure shows how the frustum is sheared when moving the camera a

2.4 Desired user settings

A 3D display has many possible configuration options that are not needed for a regular screen. As OpenGL has no defined image projection plane and most rendering algorithms for 3D displays require one to be defined, the interceptor might need to have one defined. Different programs may require different coordinate systems, or at least a different scale of geometry, making the image projection plane a good value to be user-controlled in some fashion. As seen in the stereo rendering example in Figure 2.3, the camera is translated sideways. Different eye separation values will give different viewing results, which may also be something the user will want to tweak.
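The two user settings discussed here, the distance to the image projection plane and the sideways camera offset, are enough to derive a sheared frustum from an application's original one. The following is a minimal sketch of that calculation, not the thesis implementation; the names `Frustum`, `EyeView` and `shearedView` are hypothetical, and the frustum parameters are assumed to follow the convention of `glFrustum` (left, right, bottom, top, near, far).

```cpp
#include <cassert>
#include <cmath>

// Frustum described the same way as the arguments to glFrustum.
struct Frustum {
    double left, right, bottom, top, zNear, zFar;
};

// Result for one eye or view (hypothetical helper type).
struct EyeView {
    Frustum frustum;    // sheared frustum for this camera position
    double translateX;  // translation to apply to the scene along x
};

// Moves the camera eyeOffset units along x (positive = right) while keeping
// the window on the image projection plane, planeDistance units in front of
// the original camera, fixed in space.
EyeView shearedView(const Frustum& base, double planeDistance, double eyeOffset) {
    assert(planeDistance > 0.0 && base.zNear > 0.0);
    // Seen from the moved camera, the window on the projection plane is
    // shifted by -eyeOffset. Scaling back to the near plane gives the shear.
    const double shift = eyeOffset * base.zNear / planeDistance;
    EyeView view;
    view.frustum = base;
    view.frustum.left -= shift;
    view.frustum.right -= shift;
    // Moving the camera right is the same as moving the scene left.
    view.translateX = -eyeOffset;
    return view;
}
```

When replaying a frame for one view, an interceptor could pass `view.frustum` to `glFrustum` in place of the application's projection and apply `view.translateX` with `glTranslated` before the recorded model view transformations. For N views, the function would be called once per view with offsets spread evenly around zero.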

The nVidia stereo driver supports a multitude of controls [2, 1], the most notable being clamping the depth experience both at the front and at the back of the view frustum. This means that objects very near to and very far away from the viewer are projected according to the regular view frustum to the image projection plane. These distances are also something that could be user controlled.

2.5 Conclusions

As more and more types of 3D displays become available, different rendering algorithms must be supported, either by the application or by some other kind of layer. This makes an interceptor a feasible solution to the problem of providing per-application support for each possible rendering algorithm for multi-view

3 The Win32 DLL and OpenGL system

This chapter provides an overview of how DLLs (dynamic-link libraries) are handled in Microsoft Windows. It also aims to describe how OpenGL calls are handled in Windows and how an OpenGL call reaches the hardware after it is made from an application.

3.1 The Win32 DLL

On Windows, the PE (Win32 Portable Executable) format defines a standard format for both DLLs and standard Win32 executables (EXE), both on disk and in memory [13, 14]. On disk, the DLL is composed of a set of headers and a section table describing what sections are available. A section can be either a code or data section, but where there is just one type of code section there can be many types of data sections.

Algorithm 1 Explicit loading of a DLL (pseudocode).

    function_pointer glBegin;
    dll_handle handle;
    handle = LoadLibrary("opengl32.dll");
    glBegin = GetProcAddress(handle, "glBegin");
    // Use acquired pointer
    FreeLibrary(handle);

One important part for us is the export table (seen in Figure 3.1). This table contains information on the number of exported symbols, their addresses inside the DLL and their names. The addresses of these symbols can then be fetched using the different methods described in Section 3.2. Another important table is the import table (seen in Figure 3.2), which holds information on all DLLs to be loaded implicitly and which symbols to load from them.

3.2 Implicit and explicit linking

DLLs can be loaded in two different ways: implicit (load-time) and explicit (run-time) linking.

Implicit linking happens when linking to the DLL and its .lib file at compile time, marking certain symbols in the code to be imported from that specific DLL. When the executable that links to the DLL is loaded (this can be either a DLL or a regular EXE), the import table is traversed and all DLLs are loaded into memory. An executable that is implicitly linked against another contains a table of symbols to import. This table contains the name and a dummy pointer to the function to be imported. Until the implicitly linked DLL is loaded, the symbols in this table cannot be used. When the DLL to be loaded is fully in memory, the loader will traverse the import table, looking up each symbol from the DLL's export table by name and replacing the dummy pointer with the real address.

With explicit linking, the library is loaded by calling the LoadLibrary function and released with the FreeLibrary function. When a DLL is loaded, the developer can call GetProcAddress to request a specific symbol. This might look as in Algorithm 1, where opengl32.dll is loaded and an exported symbol is fetched.
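The name-based lookup that ties an import table to an export table can be illustrated without any Windows API. The sketch below is a portable model only: `ExportTable`, `ImportEntry`, `resolveImports` and `callImport` are hypothetical names, and a `std::map` stands in for the export table that a real PE file stores in its headers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using FuncPtr = void (*)();

// Stand-in for a loaded DLL's export table: symbol name -> address.
using ExportTable = std::map<std::string, FuncPtr>;

// One row of an import table: the wanted name and a pointer that stays a
// dummy (null) value until the exporting DLL has been loaded.
struct ImportEntry {
    std::string name;
    FuncPtr address = nullptr;
};

// Mimics what the loader does for implicit linking: every import is looked
// up by name and its dummy pointer is replaced with the real address.
// Returns the number of symbols that could not be resolved.
std::size_t resolveImports(std::vector<ImportEntry>& imports,
                           const ExportTable& exports) {
    std::size_t unresolved = 0;
    for (ImportEntry& entry : imports) {
        const auto found = exports.find(entry.name);
        if (found != exports.end()) {
            entry.address = found->second;
        } else {
            ++unresolved;
        }
    }
    return unresolved;
}

// Calling through an import is only valid after it has been resolved.
void callImport(const ImportEntry& entry) {
    assert(entry.address != nullptr);
    entry.address();
}
```

An interceptor that exports the same names as the real DLL is resolved in exactly the same way, which is why the application cannot tell the two apart at this level.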

3.3 The DLL loading sequence

When loading a DLL, Windows looks in specific places for the requested file. Using the default behaviour on Windows XP, the following directories are checked in order [10]:

1. The directory from which the application loaded.

2. The current directory.

3. The system directory (c:\windows\system32).

4. The 16-bit system directory (c:\windows\system).

5. The Windows directory (c:\windows).

6. The directories in the PATH environment variable.
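The consequence of this ordering is that the first directory containing a file with the requested name wins. The following sketch models that rule in portable C++; `findLibrary` is a hypothetical helper, and the file system is replaced by a caller-supplied predicate so that the example does not depend on Windows.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Returns the full path of the first match in searchDirs, or an empty
// string if no directory contains the file. fileExists is a stand-in for a
// real file system query.
std::string findLibrary(const std::string& fileName,
                        const std::vector<std::string>& searchDirs,
                        const std::function<bool(const std::string&)>& fileExists) {
    assert(!fileName.empty());
    for (const std::string& dir : searchDirs) {
        const std::string candidate = dir + "\\" + fileName;
        if (fileExists(candidate)) {
            return candidate;  // earlier directories shadow later ones
        }
    }
    return std::string();
}
```

With the application directory listed before c:\windows\system32, a copy of opengl32.dll placed next to the executable is found first.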

During loading of a DLL, an optional entry point is called, if available. This function, called DllMain [10], has very restricted functionality and should only perform the simplest of setup. Calls like LoadLibrary have to be made at a later point.

3.4 The OpenGL chain

OpenGL support in Windows is provided by the opengl32.dll located in the system32 directory. This DLL exports symbols corresponding to OpenGL 1.1, with the addition of some WGL functions for selecting rendering contexts and pixel formats. Most applications using OpenGL link to opengl32.lib, which implies implicit loading of opengl32.dll, but there are some applications, most notably the Quake series by id Software, that load the DLL explicitly and fetch the required symbols. WGL functions are platform-specific (Microsoft Windows) extensions to OpenGL that provide support for selecting pixel formats and other frame buffer related functionality.

When OpenGL support for Windows was introduced, there were two ways for a hardware vendor to provide support: either providing a Mini Client Driver (MCD) or an Installable Client Driver (ICD). Both these drivers are included in the Windows 2D GDI driver package, provided by graphics card vendors to make the operating system utilize the card's capabilities fully. The MCD is a primitive rasterization path that exports a number of calls to Microsoft's OpenGL implementation, to be used instead of software blitting. As all

Microsoft's implementation could be used as a fallback for features not supported in hardware, the hardware vendor could concentrate on getting OpenGL support up and running while implementing the much more complicated ICD. The ICD is a full-blown OpenGL implementation, where the vendor has to implement all OpenGL calls and provide software implementations for anything that the hardware is unable to provide. Since MCD support was removed from Windows 98, providing the much more complicated ICD is the only way to achieve accelerated OpenGL on Microsoft Windows.

The opengl32.dll in turn fetches the hardware vendor implementation from an Installable Client Driver (ICD) installed by the vendor. This ICD DLL is found by looking at the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\OpenGLDrivers. Under this key the vendor provides information on which DLL to load as the ICD. The ICD then provides a table with all the OpenGL 1.1 calls. While the ICD is mostly undocumented in the Platform SDK provided by Microsoft, many alternative OpenGL implementations have reverse engineered this functionality. The open source OpenGL implementation Mesa 3D [15] has an ICD implementation which defines the function calls listed in Appendix A. If no ICD is registered, the OpenGL DLL can fall back onto a Mini Client Driver (MCD) or even a software driver [12]. The complete OpenGL chain is shown in Figure 3.3.

The SetContext call is probably the most important call here. It returns a pointer to a table where the first element is a DWORD (a Microsoft-specified data type of size 32 bits) containing the total number of function pointers in the table. The order of these function pointers has also been reverse engineered by Mesa.
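A table of this shape, a count followed by that many function pointers, lends itself to interception by copying it and swapping selected entries. The sketch below illustrates the idea in portable C++ under loud assumptions: `DispatchTable`, `kMaxProcs`, `wrapTable` and the functions in namespace `demo` are all hypothetical, the table size is tiny, and the real ICD table layout and ordering are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using GlProc = void (*)();

// Hypothetical, much reduced model of the table: a 32-bit count followed by
// function pointers. A real table holds every OpenGL 1.1 entry point.
constexpr std::uint32_t kMaxProcs = 8;
struct DispatchTable {
    std::uint32_t count;      // number of valid entries in procs
    GlProc procs[kMaxProcs];
};

// Copies the real table and substitutes every entry for which a non-null
// replacement is supplied. Entries without a replacement pass through.
DispatchTable wrapTable(const DispatchTable& real,
                        const std::vector<GlProc>& replacements) {
    assert(real.count <= kMaxProcs);
    DispatchTable wrapped = real;
    for (std::uint32_t i = 0; i < real.count && i < replacements.size(); ++i) {
        if (replacements[i] != nullptr) {
            wrapped.procs[i] = replacements[i];
        }
    }
    return wrapped;
}

// Stand-ins for driver functions and one intercepting function.
namespace demo {
int realCalls = 0;
int interceptedCalls = 0;
void realBegin() { ++realCalls; }
void realEnd() { ++realCalls; }
void interceptedEnd() { ++interceptedCalls; realEnd(); }
}  // namespace demo
```

An intercepting driver would return the wrapped table to its caller, so that calls reach the interceptor first and are then forwarded to the real entry.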

3.5 Conclusion

The easiest way to intercept OpenGL calls seems to be to override the DLL

original ICD or opengl32.dll and placing the intercepting ICD or Trojan in their place is also feasible, but as that would collide with the consistency of the system it was decided to intercept either by changing the ICD name in the registry or by

4 Intercepting the calls

The first step in intercepting a function call is to somehow fetch the call from the sender before it reaches the receiver. Some logic can then process the function arguments and decide what to do. This chapter describes different interception strategies for OpenGL on Windows. The different strategies all come from a single base idea: tricking some part of the system into loading a different DLL than usual.

4.1 Helping headers and structures

To make implementation and testing easier, a couple of headers were created to ease the creation and processing of the multitude of OpenGL calls that need to be implemented for an interceptor. These were later generated using the specification parser described in Chapter 6. The headers consist of calls to a C macro called PROCESS_NAME that takes the following information, in order:

prefix The prefix that the function call has: Drv for ICD calls, wgl or gl. The wgl calls are separated from regular gl calls since they are platform specific and are not implemented by the ICD.

ret The return type of the function.

name The function name, without the prefix.

args_tn All arguments with type, written as in the function declaration, e.g. (ArgType1 arg1, ArgType2 arg2, ... ArgTypeN argN).

args_n All arguments without type (just the name), written as in a function call, e.g. (arg1, arg2, ... argN).

num The ICD number of the function, only used for determining the order in

headers and then undefining the macro. The reason for these headers was to be able to create some kind of loop where a specified operation could be done per call, with all the information about the call provided. For example, creating a structure with typed function pointers to all OpenGL 1.1 calls.

Appendix B provides examples of how these headers can be used to perform a multitude of per-call operations, including enumeration and loading of function pointers.
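The define, expand, undefine pattern described in this section is commonly known as the X-macro technique. The sketch below shows it in a single file, assuming three sample calls with plain C++ argument types. Only the macro name PROCESS_NAME and its six-field layout come from the text; `GL_CALL_LIST`, the `AS_*` macros, `CallId`, `GlFunctions`, `kCallNames` and `callName` are hypothetical, and a list macro replaces the separate header files so that the example is self-contained.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for a generated header: one PROCESS_NAME entry per call, in the
// order (prefix, ret, name, args_tn, args_n, num).
#define GL_CALL_LIST(PROCESS_NAME)                                    \
    PROCESS_NAME(gl, void, Begin, (unsigned mode), (mode), 0)         \
    PROCESS_NAME(gl, void, End, (), (), 1)                            \
    PROCESS_NAME(gl, void, Vertex2f, (float x, float y), (x, y), 2)

// Per-call operation 1: an enumeration value for every call.
#define AS_ENUM(prefix, ret, name, args_tn, args_n, num) CALL_##prefix##name = num,
enum CallId { GL_CALL_LIST(AS_ENUM) CALL_COUNT };
#undef AS_ENUM

// Per-call operation 2: a structure with one typed function pointer per call.
#define AS_POINTER(prefix, ret, name, args_tn, args_n, num) ret (*prefix##name) args_tn;
struct GlFunctions { GL_CALL_LIST(AS_POINTER) };
#undef AS_POINTER

// Per-call operation 3: a table of symbol names, e.g. for looking up the
// real functions by name.
#define AS_NAME(prefix, ret, name, args_tn, args_n, num) #prefix #name,
const char* const kCallNames[] = { GL_CALL_LIST(AS_NAME) };
#undef AS_NAME

// Looks up the exported symbol name for an enumeration value.
const char* callName(CallId id) {
    assert(id < CALL_COUNT);
    return kCallNames[id];
}
```

Adding a call to the list automatically extends the enumeration, the pointer structure and the name table, which is the kind of per-call loop the headers were created for.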

4.2 Trojan DLL

Intercepting with what I call a Trojan DLL is simply putting a new DLL with the same name as the original one (opengl32.dll in our case) in the DLL search path before the one in system32 (see Section 3.3). Note that Trojan in this context does not mean a provider of malicious code, but an OpenGL implementation that fools the application into thinking it is the real one. To make this work, the DLL must export the exact same symbols as the original and cannot load-time link to the real opengl32.dll, since a DLL with that name is already loaded. Since Windows XP, all system files are under the control of Windows File Protection (WFP). The result is that no system files can be overwritten, by mistake or on purpose, without turning the protection completely off. This rules out the approach of renaming the original opengl32.dll while letting the Trojan take its place.

Figure 4.1 describes how the initialization sequence and the handling of OpenGL calls are done when a Trojan DLL interceptor is loaded. Due to the nature of the Windows DLL loader (described in Chapter 3), the Trojan DLL cannot load the real OpenGL DLL when inside DllMain; this has to be done when the first call is made to the exported symbols. During this initialization (marked as Init interceptor in the figure), the real OpenGL DLL and all exported symbols of that DLL are loaded and stored internally in the interceptor. All calls can now be intercepted and modified.

To implement this, a DLL with all OpenGL 1.1 and WGL functionality needs to be created. The list of symbols to export can be obtained by looking at the list of symbols that opengl32.dll provides (using the dumpbin utility in Microsoft's Platform SDK). This DLL is then placed in each application's directory and is in that way loaded as the default OpenGL DLL on systems with the default DLL search path. Once fully loaded, some initialization code is run that loads the real DLL using a LoadLibrary call, along with logic that loads all the real OpenGL function pointers for internal storage in the Trojan. Incoming calls to the Trojan's exported functions can then be processed before being forwarded to the real DLL's function pointers. The pointers can be loaded and used as shown in Appendix B.
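The deferred initialization described here, where nothing is loaded in DllMain and the real library is bound on the first intercepted call, can be sketched in portable C++. Everything in the example is a stand-in: namespace `real_gl` plays the role of the real opengl32.dll, `trojan_glBegin` plays the role of an exported function, and `initInterceptor` marks the place where a real Trojan would call LoadLibrary and GetProcAddress.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the real OpenGL DLL; it only records what it receives.
namespace real_gl {
std::vector<unsigned> received;
void glBegin(unsigned mode) { received.push_back(mode); }
}  // namespace real_gl

// Internal storage for the real function pointers.
struct RealFunctions {
    void (*glBegin)(unsigned) = nullptr;
};

RealFunctions g_real;
bool g_initialized = false;
int g_initCount = 0;  // only here to make the behaviour observable

// In a real Trojan this step would load the real DLL and fetch every
// exported symbol; here the stand-in is bound directly.
void initInterceptor() {
    g_real.glBegin = &real_gl::glBegin;
    g_initialized = true;
    ++g_initCount;
}

// Model of one exported function of the Trojan.
void trojan_glBegin(unsigned mode) {
    if (!g_initialized) {
        initInterceptor();  // deferred from load time to the first call
    }
    assert(g_real.glBegin != nullptr);
    // Interception logic (logging, enqueueing, modification) would go here.
    g_real.glBegin(mode);
}
```

Because every exported function starts with the same check, the interceptor is initialized exactly once regardless of which OpenGL or WGL call the application makes first.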

4.3 OpenGL ICD

Creating an OpenGL ICD is a variation on the Trojan DLL. This method

functions, only the ICD functions for requesting and managing contexts and pixel formats are exported. The real symbols can then be imported when the SetContext function is called, returning pointers to the internal GL functions in the DLL.

Figure 4.2 shows the difference between intercepting with the Trojan DLL and the ICD. Since the ICD interface specifies that DrvSetContext has to be called before calling any OpenGL functions, we can concentrate on loading the real ICD there. The handling of OpenGL calls is not shown, but does not differ from the Trojan version other than that modification of the call happens after the OpenGL DLL instead of before.

Building upon the Trojan DLL implementation described above, the DLL needs to implement all OpenGL calls and all ICD calls. Using the same approach as when intercepting wglSwapBuffers, DrvSetContext can be forced to return pointers to the internal OpenGL function calls inside our ICD. This is shown in

4.4 DLL injection

One method of interception, described in [9], is to hook onto DLL load messages for a specific thread and force the loading of our Trojan DLL. This approach is simply another way of connecting the Trojan to the application without the need to copy files. The downside is that applications that should be intercepted have to be launched by some kind of control application that initializes the injection code.

This method was not investigated further, but could be a viable alternative to copying the Trojan DLL to each application folder.

4.5 Conclusions

The Trojan DLL seems to be the most straightforward implementation of an interception mechanism, but might require some user interaction before an application can be intercepted. An ICD, on the other hand, could work completely without user interaction, but requires some thought on which function pointers are returned and builds upon undocumented functionality in Windows. Both these methods seem viable alternatives for an OpenGL interceptor implementation. DLL injection could also be possible, injecting either the Trojan or the ICD into the host process, but this was not tested due to time limitations and

5 Enqueueing and replaying

When the structures for intercepting calls are in place, as described in Chapter 4, the calls can be altered and changed. The aim of this thesis is to enqueue and replay calls, which means that the DLL needs to have some internal structure to hold the calls. This chapter will discuss how this information can be stored and replayed. Three different approaches will be discussed together with their strengths and limitations.

A function call can be seen as a message containing a message ID (the function name) and some message data (the arguments). Some calls require information to be returned, which means that the OpenGL commands need to be executed while enqueueing to keep the real driver in the correct state for return values on later calls. Chapter 4 described how the calls could be intercepted, concluding that all available methods require a function pointer to be called with the right convention and number of arguments. This means that the DLL will implement all OpenGL calls and that each call will have a function body that can contain any logic needed to enqueue that call, with the limitation that the return value needs to be the return value of the real OpenGL call in the context in which it is called.

The most important features required of the enqueueing techniques are enqueue speed and the ability to handle different types of OpenGL functions. One major problem is the handling of vector functions, functions that have a pointer as an argument and can fetch a fixed or arbitrary amount of data from that pointer. A test interceptor was created for each of the techniques to evaluate functionality and performance.

5.1 Object wrapping

This method can be seen as an implementation of the Command design pattern [4], encapsulating a call (or command) into an object to be replayed at will. Each call can be represented as an object with private variables holding the data sent as parameters. Defining an abstract base class GlCall which has the

through the calls and executing them easy. The real call object then only needs to implement a constructor that takes all the arguments, store the arguments internally, and implement execute() to call the real OpenGL call with the stored arguments.

For example, the call void glBegin(GLenum mode) can be represented as an object of type GlCall_glBegin with an internal variable of type GLenum holding the same value as the argument; see Appendix C for a sample C++ implementation of one call.

This method can support different types of functions, for example vector functions, by letting the object constructor store all data in the vector and by providing a pointer to the stored vector when calling the real OpenGL function.

The main problem with this technique is that creating and deleting objects each frame has a huge impact on performance. Although this method is easy to implement, it did not perform well on large data sets (many function calls). This could be solved by pooling the objects inside a specified memory segment.
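As an illustration of how a vector function fits into the object wrapping method, the sketch below wraps a three-component vertex call. It is not the sample from Appendix C: the class `GlCall_glVertex3fv`, the `Frame` alias and `replay` are hypothetical, and `real_gl::glVertex3fv` is a stand-in that records its input so the example runs without OpenGL. The names `GlCall` and `execute()` follow the text.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Stand-in for the real driver function; it records what it is given.
namespace real_gl {
std::vector<float> received;
void glVertex3fv(const float* v) { received.insert(received.end(), v, v + 3); }
}  // namespace real_gl

// Abstract base class: one object per enqueued call.
class GlCall {
public:
    virtual ~GlCall() = default;
    virtual void execute() const = 0;
};

// Wrapper for a vector function. The constructor copies the data, because
// the application's pointer may no longer be valid when the call is replayed.
class GlCall_glVertex3fv : public GlCall {
public:
    explicit GlCall_glVertex3fv(const float* v) {
        assert(v != nullptr);
        std::copy(v, v + 3, v_);
    }
    void execute() const override { real_gl::glVertex3fv(v_); }

private:
    float v_[3];
};

// A frame is an ordered list of call objects.
using Frame = std::vector<std::unique_ptr<GlCall>>;

// Replays the whole frame; can be called once per view.
void replay(const Frame& frame) {
    for (const std::unique_ptr<GlCall>& call : frame) {
        call->execute();
    }
}
```

Each enqueued call costs one heap allocation in this form, which is the per-frame overhead the section identifies as the main weakness of the method.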

5.2 Function pointers

This method is a low-level version of the object wrapping technique mentioned above. Instead of having a vector of object pointers, a vector of, for example, unsigned ints is used to store the pointer to the real OpenGL function, the number of unsigned ints the data takes, and the data (see Appendix C for the enqueue abstraction). As seen, the data is pushed in reverse order, so that replaying can be done without looping backwards through the queue. The call is replayed by fetching the function pointer, pushing the enqueued number of arguments to the stack and calling the function pointer using a low-level CALL assembly instruction; see Algorithm 2 for a quick overview using pseudocode and Appendix C for a sample C++ implementation of one call. Note that the call in the example algorithm is a fictitious OpenGL call used to illustrate how the technique works. Replaying can be done in a loop, not requiring special replay functions to be called (like the execute method in the object wrapping technique). Although this can look like a good way to handle replaying, it makes handling

Algorithm 2 Enqueueing using function pointers (pseudocode).

    function glFunctionEnqueue(data[N]) do
        enqueue(address_of(glFunctionReplay))
        enqueue(N)
        for i = N..1 do
            enqueue(data[i])
        end
    end

    function Replay() do
        int offset = 0
        while offset < queue.size() do
            functionAddress = queue[offset]
            offset += 1
            N = queue[offset]
            for i = 1..N do
                system_stack.push(queue[offset + i])
            end
            CALL functionAddress
            offset += N + 1
        end
    end

it might not be feasible due to the hard-coded function pointer and the fixed amount of data to be passed as arguments on the system stack.

5.3 Enumeration

Enumeration is a variation on both of the previously mentioned enqueueing
techniques. As in the function pointer technique, all data is stored in a list
of unsigned ints, but instead of storing the function pointer, a unique
identifier for each call is stored, and the function arguments are stored after
that.

Replaying can be done by having a table of function pointers indexed by the
unique identifier pushed to the queue. To be able to handle different types of
functions, a replay function is defined. This function is written like
unsigned int glFunction(unsigned int offset) and takes the offset of the first
data in the queue as an argument. The function can then replay using any logic
it wants to and returns the offset to one position past the last data, the
position of the next function identifier. This way, the enqueueing function can
put arbitrary data in the queue, assuming that the replaying function can parse
it and return the offset of the next function, making way for very specialized
functions. The same enqueue abstraction as used in Section 5.2 can be used, but
modified so that arguments with size larger than 4 bytes are not enqueued in
swapped order. Algorithm 3 shows a pseudocode implementation using this method,
using the same fictive OpenGL call as in the function pointer example. See
Appendix C for a sample C++ implementation of one call using this method.

From a different point of view, this method is almost exactly the same as the
object wrapping method described above. A constructor (enqueue function)
allocates memory in the queue for the data to be stored and an execute function
(replay function) reads the specified data from the queue and provides it to
the real OpenGL function. The main difference is that instead of naming the
variables, this technique works completely with the addresses inside the queue,
not needing to allocate memory each time a new call is made.
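The enumeration technique can be illustrated with a minimal C++ sketch. It is not the thesis implementation: realVertex2f stands in for a real OpenGL entry point, and g_queue, CallId, vertex2fEnqueue, vertex2fReplay and kReplayTable are hypothetical names chosen for this example. It also copies bits with std::memcpy instead of the reinterpret_cast used in Appendix C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit floats");

// Hypothetical stand-in for a real OpenGL entry point: it only records its arguments.
static std::vector<float> g_seen;
static void realVertex2f(float x, float y) { g_seen.push_back(x); g_seen.push_back(y); }

// The queue: one 32-bit slot per identifier or argument.
static std::vector<std::uint32_t> g_queue;

// One enumeration value per intercepted call (only one call in this sketch).
enum CallId : std::uint32_t { kVertex2f = 0, kCallCount };

static void enqueueFloat(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // store the raw bit pattern
    g_queue.push_back(bits);
}

static float readFloat(std::size_t offset) {
    float f;
    std::memcpy(&f, &g_queue[offset], sizeof f);
    return f;
}

// Enqueue function: identifier first, then the arguments in call order.
void vertex2fEnqueue(float x, float y) {
    g_queue.push_back(kVertex2f);
    enqueueFloat(x);
    enqueueFloat(y);
}

// Replay function: receives the offset of its first argument and returns
// the offset of the next identifier.
std::size_t vertex2fReplay(std::size_t offset) {
    realVertex2f(readFloat(offset), readFloat(offset + 1));
    return offset + 2;
}

using ReplayFn = std::size_t (*)(std::size_t);
static const ReplayFn kReplayTable[kCallCount] = { vertex2fReplay };

// Replay loop: look up the replay function by identifier and let it
// consume its own data.
void replay() {
    std::size_t offset = 0;
    while (offset < g_queue.size()) {
        const std::uint32_t id = g_queue[offset];
        assert(id < kCallCount);
        offset = kReplayTable[id](offset + 1);
    }
}
```

Because the queue is left untouched by replay(), the same frame can be replayed any number of times, which is what the replayers rely on.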

5.4 Conclusions

All the above discussed enqueueing methods work, but with varying flexibility
and performance. The enumeration method has enough flexibility (can handle
both specialized enqueue and replay functions) and operates in a pre-allocated
memory space which removes the most dominant performance bottleneck of the

function glFunctionEnqueue(data[N]) do
    enqueue(enumeration(glFunction))
    for i = 1..N do
        enqueue(data[i])
    end
end

function glFunctionReplay(offset) do
    data[N]
    for i = 0..(N-1) do
        data[i + 1] = queue[offset + i]
    end
    realGl.glFunction(data)
    return offset + N
end

function Replay() do
    offset = 0
    while offset < queue.size() do
        functionEnumeration = queue[offset]
        functionPointer = get_replay_func(functionEnumeration)
        offset += 1
        offset = functionPointer(offset)
    end
end

6 Final design and implementation

This chapter describes the final design and implementation created in this
thesis, together with utilities to simplify adding new extensions and OpenGL
versions. The design is heavily based on the Trojan and ICD examples described
in Chapter 4, with a separate DLL for each implementation. These DLLs act as
front ends to the real interceptor, a statically linked library common to both
front ends.

6.1 Specification parser

Writing each OpenGL function by hand is neither feasible nor time effective. To
make the interceptor more maintainable and flexible, a small Ruby script was
created. This script parses a number of extension files containing information
about all OpenGL functions and generates functions for enqueueing and replaying
each call. As a base, the Chromium specification file was used. This file was
parsed into the internal data structure in the Ruby script and written to file.
Parsing is done before compiling the interceptor, providing information about
all possible supported calls.

Each function call is specified using the following keywords:

name Name of the function call, without prefix. E.g. Begin instead of glBegin.
This line must be the first line of the specified call.

prefix The prefix of the specified call. Either gl, wgl or Drv.

return Return type of the function.

param Specifies an argument to the function. Takes two whitespace-separated
arguments, the first is the name of the argument and the second is the
type, which can contain whitespace (e.g. const GLint*). This keyword
can be used several times in a function specification, but the names has

the first is the name of the parameter (as specified by param) and the
second is the number of elements in the vector.

category The version of OpenGL or the name of the extension where this call
was introduced.

number The ICD number of the specified call. Set to -1 for all other calls.

type Function classification using the keys specified in Chapter 7.

name Vertex3fv

prefix gl

return void

param v const GLfloat *

vector v 3

category 1.0

number 137

type

Table 6.1: Example of a function call specification.

6.1.1 Calculation of parameter positions

As shown in Section 5.3, replay functions must calculate the sizes of arguments
and add those together to find the position of the next argument in the queue.
This is done in the parser, generating code that compiles to a constant when
using an optimizing C compiler. When enqueueing, all data sizes are rounded up
to the next multiple of 8. This is also true for vector calls, but here it is
the total number of bytes enqueued that is rounded up to the next multiple of 8.

6.2 Tables

The design of the interceptor revolves around a number of tables of function
pointers. These tables control the behaviour of the interceptor at a given
point, making different interception strategies possible.

In The table containing all entry points to the DLL and all OpenGL calls
that are available using wglGetProcAddress. Each function in this table
forwards the call to the receive table. This extra step is needed to provide
the ability to change interception strategies for receiving calls.

Receive The table containing function pointers to whatever will happen when
a call is received. This can usually be a pointer to the real OpenGL
implementation or to an enqueue function. These pointers are interchangeable

Replay The table containing function pointers to the replay functions currently
in use. This table is used by the replay functions.

Debug The table containing function pointers to the debug functions. These
functions use the same calling convention as the replay functions and dump
all data enqueued in a frame to file.

Real The table containing function pointers to the real OpenGL implementation.

The application is only aware of the In table, or at least the content of it.
This is where all the function calls will arrive to the interceptor. The
function from the In table will then call the pointer associated with the same
function in the receive table.
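The relation between the In, receive and real tables can be sketched in C++ as follows. This is only an illustration under simplifying assumptions: the table holds a single entry, and GlTable, inBegin, enqueueBegin, setupPassThrough and setupEnqueueing are hypothetical names, not taken from the interceptor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical table with a single entry; the interceptor generates one
// entry per OpenGL call.
struct GlTable {
    void (*Begin)(unsigned int mode);
};

static std::vector<unsigned int> g_log;  // records which layer handled a call
static GlTable g_real;                   // pointers to the real implementation
static GlTable g_receive;                // pointers to the current strategy

// Stand-ins for the real call and for an enqueue function.
static void realBegin(unsigned int mode)    { g_log.push_back(mode); }
static void enqueueBegin(unsigned int mode) { g_log.push_back(mode + 1000u); }

// "In" table entry: the only function the application sees. It always
// forwards to whatever the receive table currently points at.
void inBegin(unsigned int mode) { g_receive.Begin(mode); }

// Strategy 1: pass every call straight through to the real implementation.
void setupPassThrough() {
    g_real.Begin = realBegin;
    g_receive = g_real;
}

// Strategy 2: route calls to the enqueue function instead.
void setupEnqueueing() {
    g_real.Begin = realBegin;
    g_receive.Begin = enqueueBegin;
}
```

Switching strategy only rewrites pointers in the receive table; the exported entry point handed to the application never changes.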

6.3 The queue

The enqueue strategy used in the interceptor is the same as the one described
in Section 5.3. An array of unsigned ints contains a unique identifier and
arbitrary data per function call. This data is enqueued on an individual basis
by a function, and replayed by a replay function that mimics the enqueue when
reading the data.

6.4 Replayers

A frame is considered done when wglSwapBuffers is called, and the F

queued using the enqueue functions) a replayer can use the queue to perform
different rendering algorithms to create the necessary images.

Figure 6.2: Application flow during replaying of the queue.

The interceptor implements a multitude of different renderers for both testing
and rendering of images to 3D display.

RedrawClear Simple dummy redraw function that just clears the queue and
returns.

RedrawTiled Redraw function that loops through the queue 16 times,
rendering the scene to a 4x4 grid.

RedrawRedBlueStereo Renders red and blue images of the scene, with skewed
frustums, creating an image suitable for red/blue stereo glasses.

RedrawCoreRender Renders scene for holoform 3D display.
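As an illustration of what a tiled replayer has to compute for each of its 16 passes, the following C++ sketch derives the screen rectangle of one tile in a grid. The Rect type and the tileRect function are hypothetical and not part of the interceptor; tiles are assumed to be numbered row by row starting at the bottom-left corner, matching the lower-left origin used by glViewport.

```cpp
#include <cassert>

struct Rect { int x, y, width, height; };

// Rectangle covered by tile `index` (0 .. cols*rows-1) on a screen of
// screenW x screenH pixels. Any remainder pixels at the right and top
// edges are left unused in this sketch.
Rect tileRect(int index, int cols, int rows, int screenW, int screenH) {
    assert(cols > 0 && rows > 0);
    assert(index >= 0 && index < cols * rows);
    const int tileW = screenW / cols;
    const int tileH = screenH / rows;
    return Rect{ (index % cols) * tileW, (index / cols) * tileH, tileW, tileH };
}
```

A replayer along these lines would call tileRect(i, 4, 4, width, height) for i = 0..15, set the viewport to the result and replay the queue once per tile.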

6.5 Extensions

As opengl32.dll only provides functions for OpenGL 1.1, all other functions are
provided to the application using the wglGetProcAddress function. A call to
glGetString(GL_EXTENSIONS) will return a list of extensions supported by
the driver. To force the application to only use the extensions the interceptor
implements, an array of strings is created by the specification parser. This
list is compared with the string from the driver, returning only the
intersection of the two.

pointer, the interceptor intercepts the call and returns a function pointer to
the In table. The real pointer is then stored in the real table instead.

New extensions are added by adding a new file with the extension's name (e.g.
the extension named GL_EXT_framebuffer_objects becomes GL_EXT_framebuffer_objects.txt).
This file is parsed using the specification parser and implementations for all
extension functions are created.
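The filtering of the extension string can be sketched in C++ as below. The function name filterExtensions and the use of std::set are assumptions made for this example; the sketch keeps only the names that appear both in the space-separated string reported by the driver and in the list of extensions the interceptor implements.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Returns the space-separated extension names that occur in the driver
// string and are also present in `implemented`, preserving driver order.
std::string filterExtensions(const std::string& driverExtensions,
                             const std::set<std::string>& implemented) {
    std::istringstream in(driverExtensions);
    std::string name;
    std::string result;
    while (in >> name) {                       // split on whitespace
        if (implemented.count(name) == 0) {
            continue;                          // not implemented: hide it
        }
        if (!result.empty()) {
            result += ' ';
        }
        result += name;
    }
    return result;
}
```

A special glGetString implementation could return such a filtered string in place of the one from the driver, so the application never asks for functions the interceptor cannot enqueue.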

6.6 User input

The input system is designed around Windows hooks to plant a system-wide
keyboard hook. This is due to full screen applications (mainly games) taking
full control of the system and not letting other applications be visible. It
also provides a way to always fetch certain keyboard keys without being forced
to use a graphical user interface. The keys are forwarded into the interceptor
and are either processed at once or stored as a message that is read before
replay. A method called FrameControll is called from all calls that swap OpenGL
buffers, which will trigger redraw, queue clear and message processing before
swapping the buffers.

Currently, there are controls for turning the interceptor on/off, switching to
different replayers, setting the distance of the image projection plane and dumping

7 Call modifications

Not all calls are suitable for enqueueing and replaying, and some replayers
might need to substitute some calls with special calls during replay. This
chapter describes the different classes the functions were partitioned into and
how they act.

7.1 Classes

Calls were categorized to create different logic depending on the call type.
The following list describes call modifiers that can be put in the type field
in the function specification files. No modifier means that the call should be
intercepted.

Get A function with the purpose of fetching data from OpenGL. Does not need
to be queued since no real caller exists during replay and no state changes
in OpenGL are made. Calls in this class are forwarded to the real table
and do not have any enqueuer or replayers (the pointer in the enqueue
table is equal to the pointer in the real table).

Special This function has a special implementation, for example
wglSwapBuffers, wglGetProcAddress and glGetString. Used for calls that need
special care. The pointer in the enqueue table points to glNameSpecial.

None No text in the type field. This call should be intercepted and enqueued.

7.1.1 Vector calls

A vector call is a call that takes a pointer to a memory segment and uses that
during execution of the function. Since we are unaware of whether this data
has changed during the application's execution or not, we cannot assume it is
correct when replaying. To solve this, all vector arguments have to be defined

enqueuer copies the memory segment to the queue and the replayer calls the
real OpenGL call with a pointer to the enqueued data, skipping the size of the
data when returning. For calls with a variable but limited amount of data, the
upper limit is used.
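The handling of a vector call can be sketched in C++ using a three-element float vector, as in the glVertex3fv example of Table 6.1. The names g_queue, realVertex3fv, vertex3fvEnqueue and vertex3fvReplay are hypothetical. Unlike the thesis, which hands the real call a pointer straight into the queue, this sketch copies the data into a local array first, which sidesteps aliasing concerns in portable C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit floats");

static std::vector<std::uint32_t> g_queue;

// Hypothetical stand-in for the real glVertex3fv: remembers the last vector.
static float g_last[3];
static void realVertex3fv(const float* v) { std::memcpy(g_last, v, sizeof g_last); }

// Enqueue: copy the pointed-to data itself into the queue, because the
// application may change or free it before the frame is replayed.
void vertex3fvEnqueue(const float* v) {
    const std::size_t start = g_queue.size();
    g_queue.resize(start + 3);
    std::memcpy(&g_queue[start], v, 3 * sizeof(float));
}

// Replay: rebuild the vector from the queue, call the real function and
// return the offset just past the copied data.
std::size_t vertex3fvReplay(std::size_t offset) {
    assert(offset + 3 <= g_queue.size());
    float v[3];
    std::memcpy(v, &g_queue[offset], sizeof v);
    realVertex3fv(v);
    return offset + 3;
}
```

Modifying the application's array after the enqueue no longer affects what is replayed, which is the property the vector keyword is meant to guarantee.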

7.2 Vertex arrays

One special class of functions is the vertex array functions. This class can be
divided into pointer and draw calls, the pointer calls providing information on
where to read the arrays and how (stride, type etc.) while the draw calls do the
actual drawing. According to the OpenGL specification [7], the memory is read
during the draw call, transferring geometry to the driver and hardware.

The memory location provided by the pointer calls is not of a known size until
a draw call is made, making direct caching of this data inside the queue
unfeasible. Instead a draw call could be exploded into regular immediate mode
calls (glVertex, glColor etc.). This is described in the OpenGL specification
and is what we used here.

Each pointer call is stored as an internal state to be used later. When a draw
call arrives, the arrays are looped through in the specified order and the
corresponding immediate mode calls are enqueued. When replaying, these calls are
treated just like any regular calls.
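How a draw call can be exploded using the stored pointer state may be sketched in C++ as follows. The sketch handles only a float vertex array and a glDrawArrays-style range; VertexPointerState, Vertex and explodeDrawArrays are hypothetical names. It follows the OpenGL conventions that a stride of 0 means tightly packed data and that missing components default to z = 0 and w = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// State remembered from the latest pointer call (float data only here).
struct VertexPointerState {
    int size = 0;                         // components per vertex: 2, 3 or 4
    std::size_t strideBytes = 0;          // 0 means tightly packed
    const unsigned char* base = nullptr;  // start of the application's array
};

struct Vertex { float c[4]; };

// Turns the range [first, first + count) into one vertex per element,
// i.e. the arguments of the immediate mode calls that would be enqueued.
std::vector<Vertex> explodeDrawArrays(const VertexPointerState& s, int first, int count) {
    assert(s.base != nullptr && s.size >= 2 && s.size <= 4);
    assert(first >= 0 && count >= 0);
    const std::size_t components = static_cast<std::size_t>(s.size);
    const std::size_t stride =
        s.strideBytes != 0 ? s.strideBytes : components * sizeof(float);
    std::vector<Vertex> out;
    out.reserve(static_cast<std::size_t>(count));
    for (int i = first; i < first + count; ++i) {
        Vertex v = {{0.0f, 0.0f, 0.0f, 1.0f}};   // defaults for z and w
        std::memcpy(v.c, s.base + static_cast<std::size_t>(i) * stride,
                    components * sizeof(float));
        out.push_back(v);
    }
    return out;
}
```

In the interceptor each resulting element would become an enqueued glVertex call (and similarly for colors and texture coordinates), which is also where the per-element overhead discussed in Chapter 8 comes from.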

7.3 Modified projection matrices and view ports

As applications can modify the projection matrix at any time during the frame,
the replayers will need a way to ensure that their modified matrices are used
even after the modification. To achieve this, the glMatrixMode call is modified
to store if the application has switched to GL_PROJECTION and back, and
glBegin is modified to add the replayer's projection matrix before rendering
any geometry.

Some replayers might want to change the viewport to render to just a part of
the screen, which means that glViewport has to be modified during replay to
scale and translate the viewport changes according to the replayer's needs.
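The scale-and-translate step for glViewport can be sketched in C++ like this. Viewport and remapViewport are hypothetical names, and the sketch assumes the application's viewport is expressed relative to a full screen of screenW x screenH pixels while the replayer wants the output confined to the rectangle target.

```cpp
#include <cassert>

struct Viewport { int x, y, width, height; };

// Maps a viewport requested by the application for the full screen into
// the sub-rectangle `target` chosen by the replayer. Integer division
// truncates, so sizes may be off by a pixel for uneven ratios.
Viewport remapViewport(const Viewport& app, int screenW, int screenH,
                       const Viewport& target) {
    assert(screenW > 0 && screenH > 0);
    Viewport r;
    r.x = target.x + app.x * target.width / screenW;
    r.y = target.y + app.y * target.height / screenH;
    r.width = app.width * target.width / screenW;
    r.height = app.height * target.height / screenH;
    return r;
}
```

A modified glViewport replay function could apply such a mapping to its enqueued arguments before calling the real glViewport.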

7.4 Other special cases

Some calls that can be considered as vector calls take a pointer to a memory
location together with row and column stride and the number of data to read.

8 Discussion

The interception techniques and enqueueing methods described in this thesis
should be applicable to most cases where function calls are to be intercepted.
With very little overhead, the functions are caught, processed and forwarded to
a suitable receiver or the real receiver. This chapter discusses how the
implementation of the selected techniques performs and what might be improved.

8.1 Implementation

8.1.1 Intercepting

The interception techniques used were the Trojan DLL and the ICD. Both
these methods work very well. The Trojan is probably the most straightforward
implementation possible, just exporting and forwarding selected symbols,
while the ICD is a bit more tricky due to more function pointers to keep track
of. DLL injection was not investigated further, but might remain a feasible
alternative to both implemented techniques.

As for a comparison between the implemented techniques, the Trojan is (as
said above) by far the easiest way to implement an interceptor on Windows.
The only information needed is the function declarations, which are given by
the specification and the specification parser, and a way to handle various
calls to get new OpenGL functions beyond 1.1 (wglGetProcAddress). Tables can
then easily be set up to provide the desired interception functionality. This
technique can be extended to the ICD, but that requires some thought on how
pointers are provided and handled between the ICD, the application and the
OpenGL DLL. Performance-wise, they are both on par since they both work on the
same set of function pointers, although at different stages in the OpenGL chain.

As a DLL cannot load another DLL explicitly inside DllMain, both the ICD and
the Trojan need to have some mechanism to load their function pointers before
the first call is processed. For now, this is done by checking the
isInitialized flag each call, and calling Init() if this flag is false. As this
only is true for the first

ControlList).
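The lazy initialization described above, checking a flag on every call and loading the function pointers on first use, can be sketched in C++ as follows. Only the names isInitialized and Init come from the text; g_initCount, g_realCall, realImpl and interceptedCall are hypothetical, and the real Init() would load opengl32.dll or the ICD and resolve its function pointers instead of assigning a local function.

```cpp
#include <cassert>

static bool isInitialized = false;
static int g_initCount = 0;                 // only used to observe the sketch
static int (*g_realCall)(int) = nullptr;    // pointer resolved during Init()

// Hypothetical stand-in for a function living in the real library.
static int realImpl(int x) { return x * 2; }

// Stand-in for loading the real library and its function pointers.
void Init() {
    g_realCall = realImpl;
    ++g_initCount;
    isInitialized = true;
}

// Every intercepted entry point starts with the same check, so the
// pointers are guaranteed to be loaded before they are used.
int interceptedCall(int x) {
    if (!isInitialized) {
        Init();
    }
    return g_realCall(x);
}
```

The check costs one branch per call, but the initialization work itself runs only once. The sketch is not thread safe, which a real interceptor may need to consider.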

8.1.2 Enqueueing

The queue implemented uses a std::vector<unsigned int>, which stores a 32-bit
chunk of data in each queue position. This size was used both since it is the
register size of 32-bit CPUs and the size of a function pointer on that
platform. The vector is initialized to a fixed number of elements, but is
expanded each time it is filled. This way, it will reach a level where no new
allocations are needed, and the queue will operate in a fixed memory space.
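The allocation behaviour can be illustrated with a small C++ wrapper around std::vector. CallQueue is a hypothetical class, and it uses reserve() for the initial allocation, which may differ from how the interceptor sizes its vector. The point shown is that clear() empties the vector but keeps its capacity, so after the first few frames no further allocations occur.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class CallQueue {
public:
    explicit CallQueue(std::size_t initialCapacity) { data_.reserve(initialCapacity); }

    // Grows the underlying storage only when the current capacity is exhausted.
    void push(unsigned int value) { data_.push_back(value); }

    // Empties the queue at the end of a frame; the allocated memory is kept.
    void clear() { data_.clear(); }

    std::size_t size() const { return data_.size(); }
    std::size_t capacity() const { return data_.capacity(); }

private:
    std::vector<unsigned int> data_;
};
```

Once a frame of typical size has been enqueued, later frames of the same size reuse the same memory block.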

Raw enqueue performance is not the primary goal of this project since it is
targeted at at least a couple of replays, where the fill rate and GPU bandwidth
will be the bottlenecks. On the other hand, the enqueue performance hit can
be measured using the Clear redrawer that just clears the queue. On a regular
Quake 2 map the performance drops to about 70% of the original performance
using this replayer. This is for a scene that takes up 81 KB in the queue.

8.1.3 Replaying

Aside from the obvious workarounds for projection matrices and viewport fixes,
the replayers are working fine. The replay overhead is almost none, since it
consists of just a loop with two queue reads and a function call to the replay
function in it. The replay function then does another call and fetches a total
of N variables from the queue, where N is the number of arguments the specified
function takes.

8.1.4 Overall design

The most visible drawback of the interceptor is the performance decrease when
using vertex arrays. Although it seems like an obvious bottleneck to optimize,
that task might be harder than it looks at first glance (see the OpenGL
specification [7]). Each element in the vertex array, color array, texture
array etc. is exploded into a separate call, adding both call and queue header
overhead.

8.2 Application support

8.2.1 Working applications

As the purpose of this thesis states, the interceptor should work with as many
standard rendering applications as possible. In its current state, it works on
a whole range of applications. From simple triangle test applications to full games

The current implementation assumes that the application does not use shaders
for multi-pass and render to textures. It is also assumed that the model view
and projection matrices are provided to the shader using the built-in OpenGL
matrices, and not using uniforms or other non-standard constructs. The
applications must also swap buffers using wglSwapBuffers, since that is the
only function that will trigger redraw and a queue clear.

Another problem can occur when an application removes an object (texture,
display list etc.). All creation and deletion of objects are ignored (not
enqueued and replayed), which could cause problems if the application creates
and deletes objects during a frame and not just at the start and end of the
program. This problem could be avoided by having a remove list that stores all
objects to be

9 Future work

This chapter contains thoughts on things that can be researched and
implemented in the interceptor.

9.1 Networked rendering

As holoform displays can require more than 50 different viewing angles per
frame to create an acceptable image, rendering performance might be a
bottleneck on reaching real time performance. The interceptor could propagate
the render queue to other computers on the network, the same way Chromium does,
making them render the same frame from different angles in parallel.

9.2 Call aliases

As OpenGL evolved, many functions that were originally part of an extension
were included into the specification. Many of these functions are the same
functions with another name, taking the same parameters and having the same
specified functionality. The specification could contain information about
this, allowing the parser to skip generating enqueuers and replayers for the
renamed calls, forwarding all calls to just one enqueuer. This would reduce
total code size and might provide additional optimization for networked
rendering since fewer calls would have to be enumerated, which in turn would
help compressing data.

9.3 Advanced function classes

The current implementation provides automatic parsing of vector class functions
that have all data sequentially in memory. There are other types of functions
that pass a dynamic amount of memory, and the specification parser could be
extended to parse these. One example is the glMap* family of functions that

As the ICD is connected to all OpenGL applications on the system, some kind
of selective interception should be implemented. The user should be able to
select which applications to intercept, and maybe even set some configuration
options depending on how that application behaves. This could probably be
done by fetching information on the current process and reading configuration
parameters from a config file.

9.5 Pluggable render back ends

Render back ends (Tiled, RedBlue and CoreRender) are implemented as separate
functions inside the interceptor. To make the interceptor more flexible,
these back ends should be implemented as separate DLLs using a defined
interface.

9.6 Ports and other APIs

The techniques discussed and implemented on OpenGL for Windows could be
adapted to work with both Direct3D on Windows and OpenGL on other platforms.
Most operating systems provide some sort of dynamically linked libraries
which can be intercepted using a trojan library. The specification parser could
then be modified to generate interception code for nearly any API.

9.7 OpenGL extension

As more and more 3D displays become available on the market, it might not
be suitable for each display vendor to provide an interceptor with the required
rendering back end. To avoid a multitude of different interceptors, a standard
interface could be defined, letting the vendor hook onto the OpenGL driver
and control rendering of the different viewports. All the interception and
enqueueing could then be performed inside the GPU vendors' drivers (which is
probably already done in nVidia's stereo drivers).

To bypass some of the limitations of the interceptor implemented in this thesis,
an OpenGL extension that controls the enqueueing could be exposed, allowing
the application to somewhat control which parts of the rendering to be enqueued

[1] NVIDIA Corporation. NVIDIA 3D Stereo User's Guide. NVIDIA Corporation, 7.5 edition, July 2005.

[2] NVIDIA Corporation. NVIDIA GPU Programming Guide 2.4.0. NVIDIA Corporation, 2005.

[3] N. A. Dodgson. Autostereoscopic 3D Displays. Computer, 38(8):31-36, Aug. 2005.

[4] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns, Elements of Reusable Object-Oriented Software. Professional Computing Series. Addison-Wesley, 1995.

[5] Greg Humphreys, Ian Buck, Matthew Eldridge, and Pat Hanrahan. Distributed rendering for scalable displays. In Supercomputing '00: Proceedings of the 2000 ACM/IEEE conference on Supercomputing (CDROM), page 30, Washington, DC, USA, 2000. IEEE Computer Society.

[6] Greg Humphreys and Pat Hanrahan. A distributed graphics system for large tiled displays. In VIS '99: Proceedings of the conference on Visualization '99, pages 215-223, Los Alamitos, CA, USA, 1999. IEEE Computer Society Press.

[7] Mark Segal and Kurt Akeley. The OpenGL Graphics System: A Specification (Version 2.0). opengl.org, 2004.

[8] Wojciech Matusik and Hanspeter Pfister. 3D TV: A scalable system for real-time acquisition, transmission, and autostereoscopic display of dynamic scenes. ACM Trans. Graph., 23(3):814-824, 2004.

[9] Daniel S. Myers and Adam L. Bazinet. Intercepting arbitrary functions on windows, unix and macintosh osx platforms. Master's thesis, University Of Maryland, 2004.

[10] MSDN Network. Platform SDK: DLLs, Processes, and Threads. In Microsoft Corporation, September 2005.

[11] Christopher Niederauer, Mike Houston, Maneesh Agrawala, and Greg Humphreys. Non-invasive interactive visualization of dynamic architectural environments. In SI3D '03: Proceedings of the 2003 symposium on

November 1997.

[13] Matt Pietrek. An in-depth look into the win32 portable executable file format. MSDN Magazine, February 2002.

[14] Matt Pietrek. An in-depth look into the win32 portable executable file format, part 2. MSDN Magazine, March 2002.

[15] The Mesa project. Mesa 3D OpenGL implementation. www.mesa3d.org.

[16] Graphic Remedy. gDEBugger. www.gremedy.com/products.php.

A ICD Interface

BOOL DrvCopyContext(HGLRC hGlrcSrc, HGLRC hGlrcDst, UINT mask)
HGLRC DrvCreateContext(HDC hDc)
BOOL DrvDeleteContext(HGLRC hGlrc)
HGLRC DrvCreateLayerContext(HDC hDc, int iLayerPlane)
ICDCallTable * DrvSetContext(HDC hDc, HGLRC hGlrc, void * callback)
void DrvReleaseContext(HGLRC hGlrc)
BOOL DrvShareLists(HGLRC hGlrc1, HGLRC hGlrc2)
BOOL DrvDescribeLayerPlane(HDC hDc, int iPixelFormat, int iLayerPlane,
                           UINT nBytes, LPLAYERPLANEDESCRIPTOR plpd)
int DrvSetLayerPaletteEntries(HDC hDc, int iLayerPlane, int iStart,
                              int cEntries, CONST COLORREF * pcr)
int DrvGetLayerPaletteEntries(HDC hDc, int iLayerPlane, int iStart,
                              int cEntries, COLORREF * pcr)
BOOL DrvRealizeLayerPalette(HDC hDc, int iLayerPlane, BOOL bRealize)
BOOL DrvSwapLayerBuffers(HDC hDc, UINT fuPlanes)
int DrvDescribePixelFormat(HDC hDc,
                           UINT nBytes,
                           LPPIXELFORMATDESCRIPTOR ppfd)
PROC DrvGetProcAddress(LPCSTR lpszProc)
int DrvSetPixelFormat(HDC hDc, int iPixelFormat)
BOOL DrvSwapBuffers(HDC hDc)

B Intercepting

B.1 Using the helper headers

typedef struct GLImplementation{
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
    ret (__stdcall* prefix##name##) args_tn;
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "wgltable.h"
#undef PROCESS_NAME
} GLImplementation;

GLImplementation realGl;

void LoadPointers(){
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
    *((FUNCTION*) &realGl.##prefix##name##) = \
        (FUNCTION)GetProcAddress(glHandle, #prefix #name );
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "wgltable.h"
#undef PROCESS_NAME
}

B.2 Forward table

GLImplementation receiveGl;

extern "C"{

    return receiveGl.##prefix##name##args_n##;\
}
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "wgltable.h"
#undef PROCESS_NAME

BOOL wglSwapBuffersIntercept(HDC hDc){
    // Do stuff
    return realGl.wglSwapBuffers(hDc);
}
};

void SetupForwardTables(){
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
    receiveGl.##prefix##name = realGl.##prefix##name##;
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "wgltable.h"
#undef PROCESS_NAME
    receiveGl.wglSwapBuffers = wglSwapBuffersIntercept;
}

B.3 Loading ICD pointers

typedef struct ICDCallTable{

DWORD numCalls;

PROC table[336];

} ICDCallTable;

ICDCallTable* APIENTRY DrvSetContextSpecial(HDC hDc,

HGLRC hGlrc,

void *callback){

static ICDCallTable* icdTable = NULL;

if(!icdTable){

// Get the GL calltable from the real ICD

icdTable = realGl.DrvSetContext(hDc, hGlrc, callback);

// Fetch all calls to our calltable

#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \

*((FUNCTION*) &realGl.##prefix##name##) = \

#include "gltable-1.1.h"

#undef PROCESS_NAME

// Rewrite calls for interception

#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
icdTable->table[##num##] = \
(PROC)##prefix##name##;
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#undef PROCESS_NAME

SetupForwardTables();
}
return icdTable;
}

void SetupForwardTables(){
#define PROCESS_NAME(prefix, ret, name, args_tn, args_n, num) \
receiveGl.##prefix##name = \
realGl.##prefix##name##;
#include "gltable-1.0.h"
#include "gltable-1.1.h"
#include "icdtable.h"
#undef PROCESS_NAME
receiveGl.DrvSetContext = DrvSetContextIntercept;
}

C Enqueueing and Replaying

C.1 Enqueue function

template<class T>

inline void enqueue(const T t){

if(sizeof(T) == 1){

const unsigned char* c =

reinterpret_cast<const unsigned char*>(&t);

queue.push_back(static_cast<unsigned int>(c[0]));

}

else if(sizeof(T) == 2){

const unsigned short int* i =

reinterpret_cast<const unsigned short int*>(&t);

queue.push_back(static_cast<unsigned int>(i[0]));

}

else if(sizeof(T) == 4){

const unsigned int* i =

reinterpret_cast<const unsigned int*>(&t);

queue.push_back(i[0]);

}

else if(sizeof(T) == 8){

const unsigned int* p =

reinterpret_cast<const unsigned int*>(&t);

queue.push_back(p[1]);

queue.push_back(p[0]);

}

else{

assert(false);

}

}

C.2 Object wrapping

class GlCall{

public:

GlCall() { };

virtual ~GlCall() { };

virtual void execute() = 0;

};

class GlCall_glBegin : public GlCall{

private:

GLenum mode;

public:

GlCall_glBegin(GLenum _mode) : mode(_mode) { };

~GlCall_glBegin() { };

void execute(){

realGl.glBegin(mode);

};

};

void glBeginEnqueue(GLenum mode){

queue.push_back(new GlCall_glBegin(mode));

realGl.glBegin(mode);

}

void Replay(){

for(size_t i = 0; i < queue.size(); i++)

queue[i]->execute();

}

C.3 Function pointers

void glBeginEnqueue(GLenum mode){
    // Function pointer, number of 32-bit argument slots, then the argument.
    enqueue(reinterpret_cast<unsigned int>(realGl.glBegin));
    enqueue((static_cast<unsigned int>((sizeof(mode)<4?4:sizeof(mode)) + 0)) >> 2);
    enqueue(mode);
    return realGl.glBegin(mode);
}

void Replay(){
    unsigned int offset = 0;
    unsigned int faddr;
    unsigned int numArgs;
    unsigned int ac;
    unsigned int arg;

    while(offset < queue.size()){
        faddr = queue[offset];
        offset++;
        numArgs = queue[offset];
        offset++;
        ac = numArgs;
        _asm{
            push eax;
            push ebx;
            mov ebx, esp;
        }
        while(ac > 0){
            arg = queue[offset];
            _asm{
                mov eax, arg;
                push eax;
            }
            offset++;
            ac--;
        }
        _asm{
            mov eax, faddr;
            call eax;
            mov esp, ebx;
            pop ebx;
            pop eax;
        }
    }
}

C.4 Enumerated

typedef unsigned int (*REPLAY_FUNCTION)(unsigned int);

void glBeginEnqueue(GLenum mode){
    enqueue(GlFuncEnum_glBegin);
    enqueue(mode);
    return realGl.glBegin(mode);
}

unsigned int glBeginReplay(unsigned int offset){
    realGl.glBegin( *reinterpret_cast<GLenum *>(&(queue[offset + 0 ])) );
    return offset + 0 + ((sizeof(GLenum)<4?4:sizeof(GLenum)) >> 2);
}

void Replay(){
    unsigned int offset = 0;
    unsigned int fenum;
    unsigned int faddr;

    while(offset < queue.size()){
        fenum = queue[offset];
        faddr = *replayPointers[fenum];
        offset++;
        offset = (reinterpret_cast<REPLAY_FUNCTION>(faddr))(offset);
    }
}

D Modified call

void APIENTRY glMap2fEnqueue(GLenum target,
                             GLfloat u1, GLfloat u2, GLint ustride, GLint uorder,
                             GLfloat v1, GLfloat v2, GLint vstride, GLint vorder,
                             const GLfloat * points){
    unsigned int numElems = GetNumMap2Elements(target);

    enqueue(GlFuncEnum_glMap2f);
    enqueue(numElems);
    enqueue(target);
    enqueue(u1);
    enqueue(u2);
    enqueue(uorder);
    enqueue(v1);
    enqueue(v2);
    enqueue(vorder);

    // Copy the control points into the queue as one packed block.
    for(unsigned int j = 0; j < static_cast<unsigned int>(vorder); j++)
        for(unsigned int i = 0; i < static_cast<unsigned int>(uorder); i++)
            for(unsigned int e = 0; e < numElems; e++)
                enqueue(points[e + i*ustride + j*vstride]);

    return realGl.glMap2f(target, u1, u2, ustride, uorder,
                          v1, v2, vstride, vorder, points);
}

unsigned int glMap2fReplaySpecial(unsigned int offset){
    unsigned int numElems = queue[offset];
    offset++;
    GLint uorder = *reinterpret_cast<GLint *>(&(queue[ ... ]));

    realGl.glMap2f(
        *reinterpret_cast<GLenum *>(&(queue[ ... ])),
        *reinterpret_cast<GLfloat *>(&(queue[ ... ])),
        *reinterpret_cast<GLfloat *>(&(queue[ ... ])),
        *reinterpret_cast<GLint *>(&(queue[ ... ])),
        *reinterpret_cast<GLint *>(&(queue[ ... ])),
        *reinterpret_cast<GLfloat *>(&(queue[ ... ])),
        *reinterpret_cast<GLfloat *>(&(queue[ ... ])),
        *reinterpret_cast<GLint *>(&(queue[ ... ])),
        *reinterpret_cast<GLint *>(&(queue[ ... ])),
        reinterpret_cast<const GLfloat *>(&(queue[ ... ]))
    );
    return ... ;
}


Copyright

The publishers will keep this document online on the Internet – or its possible

replacement – for a period of 25 years starting from the date of publication

barring exceptional circumstances.

The online availability of the document implies permanent permission for

anyone to read, to download, or to print out single copies for his/her own use

and to use it unchanged for non-commercial research and educational purpose.

Subsequent transfers of copyright cannot revoke this permission. All other uses

of the document are conditional upon the consent of the copyright owner. The

publisher has taken technical and administrative measures to assure authenticity,

security and accessibility.

According to intellectual property law the author has the right to be

mentioned when his/her work is accessed as described above and to be protected

against infringement.

For additional information about the Linköping University Electronic Press

and its procedures for publication and for assurance of document integrity,

please refer to its www home page:

http://www.ep.liu.se/.
