

Department of Science and Technology

Institutionen för teknik och naturvetenskap

LIU-ITN-TEK-A-18/047--SE

Gaze-driven interaction in video games

Mohamed Al-Sader

2018-10-19

LIU-ITN-TEK-A-18/047--SE

Gaze-driven interaction in video games

Thesis work carried out in Media Technology
at the Institute of Technology, Linköping University

Mohamed Al-Sader

Supervisor: Jimmy Johansson
Examiner: Stefan Gustavson


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Linköping University | Department of Science and Technology

Master thesis, 30 ECTS | Media Technology

2018 | LIU-ITN/LITH-EX-A--2018/xxx--SE

Gaze-driven interaction in video games

This thesis is presented for the degree of Master of Science at Linköping University

Mohamed Al-Sader

Supervisor: Jimmy Johansson
Examiner: Stefan Gustavson




Abstract

The introduction of input devices with natural user interfaces in gaming hardware has changed the way we interact with games. Hardware with motion-sensing and gesture-recognition capabilities removes the constraint of interacting with games through typical traditional devices like mouse-keyboard and gamepads. This changes the way we approach games and how the game communicates back to us as the player, opening new levels of interactivity.

In this thesis we examine how eye tracker technology can be used in games. Eye tracking technology has previously been in extensive use within areas of support, aiding people with disabilities. It has also been used in marketing and usability testing. To date, the use of eye tracking technology within games has been very limited. This thesis will cover how to integrate Tobii's eye tracker in EA DICE's Frostbite 3 game engine and how to improve the gaze accuracy of the device through filtering. It will also cover the use of eye tracking technology in rendering methods. In addition, we will study how the eye tracker technology can be used to simulate the human visual system when changing our focus point and when we adapt to new luminance conditions in the scene. This simulation will be integrated with the implementation of depth of field and tone mapping in Frostbite 3.

Keywords:

gaze direction, gaze-dependent DOF, gaze-dependent tone mapping, depth of field, eye tracker, gaze precision, gaze accuracy, measurement noise.


Acknowledgments

I would like to thank my supervisor Torbjörn Söderman, Technical Director at EA DICE, for the guidance and valuable feedback he provided me. I would like to thank the developers from the Frostbite and Battlefield 4 teams for the support and input they gave me: Mattias Unger for giving me input on the filter implementation, Toby Vockrodt who helped me integrate my filter implementations with the entity system, Charles de Rousiers who helped me understand the depth of field implementation in Frostbite 3 and Sébastien Hillaire who gave me very valuable feedback on my implementation of gaze-dependent depth of field, which is based on his work. In particular I would like to thank Tobias Bexelius for his valuable advice, encouragement and help in solving problems in all of my work throughout my time at EA DICE. Next I would like to thank the staff at Tobii Technology for their support: Dzenan Dzemidzic and Fredrik Lindh who familiarized me with the eye tracker, and Tobias Lindgren and Anders Olsson for supporting me with any questions and problems I had with the hardware. I would also like to thank my examiner Stefan Gustavson at Linköping University for the patient guidance and support he provided me and for making this project possible together with EA DICE. Finally I would like to thank my family and friends for proofreading and for their patience and endless encouragement.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 EA DICE
  1.4 Tobii Technology
    1.4.1 Tobii Eye Tracker
    1.4.2 How eye tracking works - Pupil Centre Corneal Reflection
2 Background and Related Work
  2.1 Eye Tracker Filtering
    2.1.1 Gaze accuracy
    2.1.2 Gaze precision
  2.2 Gaze-dependent Depth Of Field
  2.3 Gaze-dependent Tone Mapping
3 Implementation and results
  3.1 Eye Tracker integration in Frostbite 3
  3.2 Eye Tracker Filtering
    3.2.1 Average filter
    3.2.2 Moving Average Filter
    3.2.3 First-order Low-pass filter
      3.2.3.1 Time constant against Cutoff Frequency
    3.2.4 Online Cursor Filter
      3.2.4.1 False Alarms
    3.2.5 Spatial Filter
  3.3 Gaze-dependent Depth of Field
    3.3.1 Simulation of accommodation phenomenon
  3.4 Gaze-dependent tone mapping
    3.4.1 Scene dependency
4 Discussion
  4.1 Eye tracker filters
  4.2 Gaze-dependent Depth of Field
5 Conclusion
  5.1 Future work and improvements
    5.1.1 Foveated rendering
    5.1.2 Maladaptation with gaze-dependent tone mapping
    5.1.3 Gaze interaction with game logic


List of Figures

1 Pupil Centre Corneal Reflection (PCCR) remote eye tracking technique where an image of the reflections on the cornea and pupil, created from the illumination of a light source, is captured and then used to calculate the gaze direction. Image courtesy to [1].
2 Gaze precision and accuracy metrics from the performance of an eye tracker. The black circle illustrates the real gaze point of the client while the red x marks are the measured gaze points. The bottom and upper left examples show less dispersion between each gaze sample and thus good precision compared to the bottom and upper right examples. For the accuracy the bottom left and upper right examples show that samples are at a much smaller angular distance from the real gaze point giving better accuracy. Image courtesy to [2].
3 The dashed line is the measured gaze point and the solid line is the actual gaze point. The distance between them shows the gaze accuracy.
4 Thin lens camera model. Image courtesy to [17].
5 Pinhole camera model. Image courtesy to [17].
6 The Zone system maps scene zones to print zones with the middle brightness of the scene mapped to the middle print zone. Image courtesy to [23].
7 HDRI technique showing pictures of a scene taken at different stops. Image courtesy to [27].
8 Flow chart of eye tracker integration in Frostbite 3.
9 Signal with measurement noise.
10 Signal filtered by average filter.
11 Signal filtered by average filter (zoomed).
12 Block Diagram of Moving Average Filter.
13 First-order low-pass filter. The square represents the cutoff frequency.
14 Butterworth First-order low-pass filter. The square represents the cutoff frequency.
15 Backward Euler Differentiation Approximation. Image courtesy to [34].
16 Signal filtered by First-order Low-pass filter.
17 Signal filtered by First-order Low-pass filter (zoomed).
18 Time constant τ against cutoff frequency fc.
19 First-order Low-pass filter applied to a signal with low and high cutoff frequencies.
20 First-order Low-pass filter applied to a signal with low and high cutoff frequencies (zoomed).
21 Block diagram of Online Cursor Filter. Image courtesy to [11].
22 Signal filtered by Online Cursor Filter.
23 Signal filtered by Online Cursor Filter (zoomed).
24 Edge of signal filtered by Online Cursor Filter (zoomed).
25 False Alarms in Online Cursor Filter.
26 False Alarms in Online Cursor Filter (zoomed). Left: Arrow 1, Middle: Arrow 2, Right: Arrow 3.
27 Online Cursor Filter with large threshold giving slow response when steady-state of the signal changes.
28 Hyperbolic Tangent Function. The blue line shows the standard function while the red line shows the function scaled.
29 Signal filtered by Spatial filter.
30 Signal filtered by Spatial filter (zoomed).
31 Circle of Confusion. Image courtesy to [19].
32 Flow chart of gaze-dependent DOF in Frostbite 3.
33 Autofocus system represented by a focus zone centered on a filtered gaze point.
34 Gaussian Function.
35 Focal point calculated using spatial weighting of the focus zone.
36 Focal point calculated as average depth of the focus zone.
37 Gaze-dependent DOF.
38 Gaze-dependent DOF.
39 Focal distance calculated with accommodation effect.
40 Focal distance calculated with accommodation effect (zoomed).
41 Flow chart of gaze dependent tone mapping in Frostbite 3.
42 The tone mapping operator (a) [32] uses the constraint from Equation 37 while (b) [37] does not use it. We see that (b) has poor quality from pure black regions when gazing at a high luminance region (the sun). In (a) we get better quality since we still see details in lower luminance regions similar to the global method in (c) [23]. Image courtesy to [32].
43 Without the constraint in step 3 the tone mapping operator gives large values for the log-average luminance in high luminance regions of the scene as seen in the red curve. Using the constraint gives the tone mapping operator a logarithmic behavior similar to the global method. Image courtesy to [32].
44 Gaze-dependent tone mapping when gazing at regions with low luminance. In this scene we see details in low luminance regions clearly while high luminance regions are overexposed.
45 Gaze-dependent tone mapping when gazing at high luminance regions. We see details in the high luminance regions while low luminance regions become darker.
46 Gaze-dependent tone mapping with the constraint in [32] being used. This method preserves more details in the regions with low luminance while gazing at regions with high luminance.
47 Gaze-dependent tone mapping without the constraint used. The luminance in dark regions becomes lower.
48 Spatial Filter and Online Cursor Filter.
49 Spatial Filter and Online Cursor Filter (zoomed).


List of Tables

1 Abstract interface class for eye trackers.
2 Interface for Tobii Eye Trackers.


1 Introduction

1.1 Motivation

Input commands in video games traditionally come from three different devices: mouse, keyboard and gamepads. Games played on a computer use a mouse-keyboard combination while home game systems generally use a gamepad. This has been the preferred format for the current generation of home game systems and computers, and for generations going back decades. In the past decade this restriction has gradually been removed from both consoles and computers with the introduction of devices using a natural user interface: voice and gesture commands (Kinect), motion commands (Wii Remote and PS Move) and touch commands (Wii U gamepad and tablets). One of the most natural means of interaction the human body offers is the eyes. With an eye tracker we get a natural user interface that communicates between us, the client, and the software by tracking the gaze of our eyes. By integrating this technology in video games, new ways of interacting with games become possible. For example, rendering effects or game logic that in a real environment would depend on the eyes could use an eye tracker to receive feedback from the eyes and create a more immersive experience. The aim of a player-controlled character, AI behavior and interaction could be driven by analyzing the gaze of the player.

1.2 Aim

The aim of this thesis is to research how Tobii's eye tracker technology can be used to interact with games. The thesis will cover integration of the Tobii X2-30 eye tracker in Frostbite 3, filtering of gaze data received from the eye tracker and finally integration with some of the rendering techniques in Frostbite 3. The report is structured in the following way:

• Chapter 1 gives an introduction to eye trackers and eye tracker technology. It also gives a short introduction to EA DICE and Tobii Technology, who collaborated on this thesis.

• Chapter 2 provides background and related work on the filtering and on the eye tracker integration with the rendering techniques in Frostbite 3.

• Chapter 3 demonstrates the implementations and results of the eye tracker integration, the filtering of input data from the eye tracker and the integration with rendering techniques in Frostbite 3.

• Chapter 4 includes a discussion surrounding the results obtained throughout this thesis; advantages and disadvantages of the filtering of input data and the effect of an eye tracker on the rendering techniques.


1.3 EA DICE

EA DICE is a Swedish video game developer. Founded in 1992 and based in Stockholm, EA DICE is today well-established within the game industry with critically acclaimed video game series such as Battlefield and Mirror’s Edge. The company is also the developer of the Frostbite game engine which today is used by multiple EA studios.

1.4 Tobii Technology

Tobii Technology is a Swedish hi-tech company specializing in the development of hardware and software for eye tracking. Founded in 2001 with headquarters in Stockholm, Tobii Technology is a global leader in eye tracker technology with a heavy focus on research and development. With their eye tracking technology involved in a wide range of fields such as usability, automotive, computer games, human behavior, marketing research and neuroscience, Tobii Technology has multiple partnerships with well-established software and hardware vendors around the world. The company is divided into three business units: Tobii Dynavox, Tobii Pro and Tobii Tech. Each unit focuses on different fields where their technology is applicable.

1.4.1 Tobii Eye Tracker

Tobii eye trackers are used in three main fields: assistive communication, human behavior and volume products (computer hardware, automotive and games). Development of eye trackers with assistive technology comes from Tobii Dynavox and focuses on people with speech disabilities. These eye trackers are devices integrated with computers or tablets that offer other assistive technology together with the eye tracking capability, such as the software or a touching mechanism. Tobii Pro develops eye trackers used to study human behavior. For example, eye trackers can be used by firms wanting client feedback by studying where the client is gazing on their websites or advertisements. They could be used by researchers in academic institutions to study human behavior and interaction with virtual environments, among other things. Eye trackers developed by Tobii Pro come in various formats: as standalone devices mounted on monitors, as glasses and as screen-based eye trackers. Tobii Tech develops eye trackers used in volume products such as VR, games and automotive. These eye trackers also come in different formats. For gaming, eye trackers are mounted on the computer monitor to track the player's gaze. In automotive applications, such as cars, a chip is integrated and then used for tasks such as identifying the driver for personalization of car settings, or for safety, such as detection of distraction or drowsiness of the driver.

1.4.2 How eye tracking works - Pupil Centre Corneal Reflection

The pupil center corneal reflection (PCCR) technique is one of the most common remote eye tracking techniques and also the one Tobii's eye trackers are based on. In PCCR a camera is used to capture an image of the reflections on the cornea and pupils created by the illumination of a light source. The two reflections form an angle from which a direction vector is calculated. This vector is used together with other geometrical features coming from the reflections to retrieve the gaze direction. Tobii's technique works similarly to PCCR; a near-infrared light source in the eye tracker illuminates the eyes to create the reflections on the cornea and pupil. The image sensors capture two images of the reflections, and the gaze direction is then calculated using an image processing algorithm [1], see Figure 1.


Figure 1: Pupil Centre Corneal Reflection (PCCR) remote eye tracking technique where an image of the reflections on the cornea and pupil, created from the illumination of a light source, is captured and then used to calculate the gaze direction. Image courtesy to [1].


2 Background and Related Work

2.1 Eye Tracker Filtering

The performance of an eye tracker is important as it determines how well we interact with the virtual world of a game. Poor interaction caused by poor gaze data from the eye tracker makes the client feel disconnected from the response of the game, while good gaze data provides new intuitive ways of interacting with the game alongside increased immersion. The performance of an eye tracker is defined by three core metrics affecting the gaze data the most: robustness, accuracy and precision [2], see Figure 2. By determining the size of a focus zone centered on the client's gaze point and by filtering gaze data, both gaze accuracy and precision are improved, which gives the client a more polished in-game experience with the eye tracker.

Figure 2: Gaze precision and accuracy metrics from the performance of an eye tracker. The black circle illustrates the real gaze point of the client while the red x marks are the measured gaze points. The bottom and upper left examples show less dispersion between each gaze sample and thus good precision compared to the bottom and upper right examples. For the accuracy the bottom left and upper right examples show that samples are at a much smaller angular distance from the real gaze point giving better accuracy. Image courtesy to [2].


2.1.1 Gaze accuracy

The gaze accuracy is defined as the average angular distance between the gaze point measured by the eye tracker and the real gaze point (see Figure 2), and it is measured in degrees of the visual angle [3, 4]. This is illustrated in Figure 3, where the dashed line shows the real gaze point and the solid line shows the measured gaze point. Gaze points measured by the eye tracker rarely correspond to the real gaze points. Head position relative to the eye tracker, changes in gaze direction and environmental interference such as illumination affect the gaze accuracy, and each degree of accuracy is an error from the real gaze point. For instance, the Tobii X2-30 Eye Tracker measured under ideal conditions at a distance of 60-65 cm has an average gaze accuracy of 0.4° [3]. Assuming we are sitting at a distance of 65 cm from the eye tracker and we are using a 24" display (1920x1200), we show that a gaze accuracy of 0.4° corresponds to an error of more than one screen pixel. Let C denote the measured gaze point and F denote the real gaze point in Figure 3. The gaze accuracy is denoted as α, the distance as d and the eye as E. We want to solve for the unknown variable b, which is the average angular distance error. From Figure 3 we have the triangle ECF, giving Equation 1, from which we can solve for b:

\[ \tan\left(\alpha \cdot \frac{\pi}{180}\right) = \frac{b}{d} \iff b = d \cdot \tan\left(\alpha \cdot \frac{\pi}{180}\right) \tag{1} \]

With d = 26" (65 cm) and α = 0.4° as input we get an average angular distance error of b ≈ 0.18" (4.57 mm). The pixel density, measured in pixels per inch (PPI), is calculated from Equation 2:

\[ \text{PPI} = \frac{d_p}{d_i}, \qquad d_p = \sqrt{w_p^2 + h_p^2} \tag{2} \]

d_p is the diagonal resolution of the monitor in pixels, w_p and h_p are the width and height of the resolution in pixels and d_i is the diagonal size in inches. With w_p = 1920, h_p = 1200 and d_i = 24" we get PPI ≈ 94.3. Multiplying this by the angular distance error b gives an error of 94.3 · 0.18 ≈ 17 pixels from the real gaze point. We see that even under ideal conditions the gaze accuracy error of the Tobii X2-30 Eye Tracker will be larger than one pixel, which means the client is not necessarily gazing at the measured point. This error is treated, in calculations heavily influenced by the gaze area, by letting the pixels in a focus zone centered on the measured gaze point contribute to the final result. The size of the focus zone is then based on the average angular distance error.
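The calculation above is easy to repeat for other setups. The following C++ sketch simply reproduces Equations 1 and 2; it is an illustration added for convenience and not part of the thesis implementation (the function name and the example values in main are ours).

#include <cmath>
#include <cstdio>

// Pixel error caused by a given gaze accuracy, following Equations 1 and 2.
// accuracyDeg: gaze accuracy in degrees, distanceInch: viewing distance in inches,
// wPx/hPx: monitor resolution in pixels, diagonalInch: monitor diagonal in inches.
double gazeErrorInPixels(double accuracyDeg, double distanceInch,
                         double wPx, double hPx, double diagonalInch)
{
    const double pi = 3.14159265358979323846;
    // Equation 1: angular distance error b, in the same unit as the viewing distance.
    double b = distanceInch * std::tan(accuracyDeg * pi / 180.0);
    // Equation 2: pixel density of the monitor in pixels per inch.
    double ppi = std::sqrt(wPx * wPx + hPx * hPx) / diagonalInch;
    return b * ppi;
}

int main()
{
    // Tobii X2-30 (0.4 degrees) at 26" (about 65 cm) from a 24" 1920x1200 monitor.
    std::printf("error: %.1f pixels\n", gazeErrorInPixels(0.4, 26.0, 1920.0, 1200.0, 24.0));
    return 0;
}

The printed value, roughly 17 pixels, matches the figure derived above.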


Figure 3: The dashed line is the measured gaze point and the solid line is the actual gaze point. The distance between them shows the gaze accuracy.

2.1.2 Gaze precision

When interacting with the virtual world of a game the player will fixate their gaze on points of interest in the world. This makes the precision of the input device very important because its output must accurately match the player's fixations. Having the eyes fixed on a point when using an eye tracker is referred to as a fixation, and gaze precision defines how accurately the eye tracker measures a fixation based on how well it measures the same gaze point [5]. However, measurement systems output digitized signals corrupted by noise, resulting in poor precision [6]. In an eye tracker this gives fixations where gaze points are dispersed instead of being focused on the player's point of interest, which yields poor precision (see Figure 2). High frequency measurement noise in an eye tracker comes from different factors such as influence from the eyes and interference from the environment in which the measurement is taking place [7, 8]. The noise caused by the eyes comes from different eye movements such as micro saccades, tremors, drifts, blinking, gaze directions in extreme peripheral regions where the precision degrades, and other physiological properties [2, 8–10]. System inherent noise comes from interference in the environment. Examples of this could be the illumination affecting the image captured by the image sensors of the eye tracker, imperfections in the algorithm used to estimate the gaze point from the image [11, 12] and limitations of the hardware used for the algorithm calculation. Poor precision particularly affects eye trackers used in entertainment media such as video games where the input from the player needs to result in an immediate and accurate output on the screen. In this case the output from the game would not correspond to the player's expected input, causing the player and the game to be out of sync. Similar to the gaze accuracy, this can lead to various issues such as difficulties in UI interactions or rendering artifacts caused by incorrect gaze data. To improve the precision a low-pass filter is applied to attenuate high frequencies. Choices of low-pass filters could be a first-order low-pass filter, an average filter or dynamic filters such as a spatial filter that adjusts its filter characteristics based on the distance between two gaze points.

2.2 Gaze-dependent Depth Of Field

The Human Visual System (HVS) regularly changes its focus point as the eyes scan the environment. The view in the central part of the eye's visual field is perceived sharply while the view outside of it (the parafoveal and peripheral vision) has less detail and is perceived as blurry. This is directly related to the foveal vision, in which the fovea gives a sharp vision of the view seen in the central two degrees of the visual field [13–15]. Changing the focus point causes the fovea to be directed in a new direction and the image of the view in the new direction is then projected on the fovea. Similarly, there is a depth range around the focus point where the projected image is perceived as sharp by the eyes while anything in front of or behind this range loses detail and is perceived as blurry. The depth range where the image is in focus is known as depth of field (DOF). The distance to this range is known as the focal distance (also known as focus distance).

DOF behaves similarly in cameras, where the optical lens focuses or diverges incoming light. Light rays emitted at the focal distance will be refracted by the optical lens and converge to a single point on the image plane (the film in the camera), giving us a sharp image. Light rays emitted at a distance outside the focal distance will diverge more from each other after the lens refraction and end up intersecting the image plane in a conic-like shape, see Figure 4. This shape is approximated by the Circle of Confusion (CoC) [16–18], whose diameter is proportional to the distance the light rays are emitted from. If the diameter becomes large enough to make the intersecting shape distinguishable from the smallest point that a human eye can see, it contributes to a blurring effect.

In computer graphics the default camera model used is a pinhole camera model. This model always gives a sharp image because theoretically it has an infinitely small aperture that only allows one light ray from each point in the scene to pass through [17], see Figure 5. This causes the projected point on the image plane to be infinitely small. However, since real cameras are equipped with lenses and have a finite aperture size, thereby allowing multiple light rays from each point in the scene to pass through, they can cause the occurrence of the CoC. Rendering the projected point on the image plane corresponds to a pixel on the screen. For an infinitely small aperture this means we only sample the pixel we are currently processing when determining its color. If we simulate an aperture with a finite size and the CoC is large enough, the neighboring pixels are sampled when determining the color of the current pixel, which results in a blurry effect.


Figure 5: Pinhole camera model. Image courtesy to [17].

DOF has been simulated in fields like photography and cinematography for a long time [16] and was introduced in the early age of computer graphics [19]. Today it is a powerful rendering technique in the VFX and games industry where developers use it to give a deeper immersion. It is also used as a tool to guide the client's focus in the scene being displayed. In games depth of field has traditionally been static; a parameter controlling the focal distance is set and the scene is then blurred accordingly. For some types of games this doesn't impose any gameplay restriction. For example, in FPS games the player focuses almost exclusively on the visor, which is located in the center of the screen [20]. Based on this, developers may simply set the focal distance to be within a range that keeps the center of the screen always in focus. Nevertheless this contradicts the natural behavior of the HVS and cameras, where the focal distance is changed according to the player's gaze. Gaze-dependent DOF removes this restriction by simulating the behavior of the eyes. In this technique the focal distance is used as a dynamic parameter whose value is determined by the area the player is gazing at. This makes the focal distance always dependent on the player's gaze point. This technique is implemented in two ways:

• Focal distance from depth of a pixel: The coordinates of the gaze point are used to retrieve the focal distance from the depth of a pixel [21]. A pixel is sampled at the coordinates, and the depth of the pixel is then sampled from the depth buffer of the rendered scene, transformed to the correct coordinate system and then used as the focal distance when calculating the CoC.

• Focal distance from focus zone: A focus zone centered on the filtered gaze point is used to calculate the focal distance [22]. The depth of each pixel in the focus zone is sampled from the depth buffer and stored in a new buffer. The average depth is then calculated from the focus zone depth buffer. This method is further enhanced by associating weights with the pixels. The first approach uses a Gaussian function to calculate the weight of each pixel in the focus zone. The center of the focus zone, which corresponds to the gaze point, has the maximum weight, which gradually decreases for pixels further away from the center. The effect of this is that more importance is given to depths located around the center of the focus zone. The second approach uses semantic weighting, where objects in the game have a weight. This creates a priority system where the depth of one object has higher priority than that of another object when both fall within the focus zone. A sketch of the Gaussian-weighted variant is shown below.
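As a concrete sketch of the Gaussian-weighted variant, the focal distance can be computed as a weighted average of the depths inside the focus zone. This is only an illustration of the idea described above, not the Frostbite 3 implementation; the buffer layout, the zone radius and the Gaussian width sigma are assumptions.

#include <cmath>
#include <vector>

// Gaussian-weighted average depth of a square focus zone centered on the gaze point.
// depth: linear depth per pixel (row-major), width/height: buffer size,
// gazeX/gazeY: filtered gaze point in pixels, radius: half-size of the zone in pixels,
// sigma: standard deviation of the Gaussian weight in pixels.
float focalDistanceFromFocusZone(const std::vector<float>& depth, int width, int height,
                                 int gazeX, int gazeY, int radius, float sigma)
{
    float weightedDepth = 0.0f;
    float weightSum = 0.0f;
    for (int y = gazeY - radius; y <= gazeY + radius; ++y)
    {
        for (int x = gazeX - radius; x <= gazeX + radius; ++x)
        {
            if (x < 0 || y < 0 || x >= width || y >= height)
                continue; // ignore pixels outside the screen
            float dx = float(x - gazeX);
            float dy = float(y - gazeY);
            // The weight is largest at the gaze point and falls off with distance.
            float w = std::exp(-(dx * dx + dy * dy) / (2.0f * sigma * sigma));
            weightedDepth += w * depth[y * width + x];
            weightSum += w;
        }
    }
    // No valid pixels in the zone (gaze point far off-screen): signal "no focus change".
    return weightSum > 0.0f ? weightedDepth / weightSum : -1.0f;
}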


2.3 Gaze-dependent Tone Mapping

The dynamic range of luminance in the real world is very high, spanning from the light of a star to the light of the sun. The range of luminance between the two covers about ten orders of magnitude in absolute range, and a single scene can span four orders of dynamic range between shadows and highlights [23]. Mapping the high dynamic range of luminance in the real world to a display format such as print on photographic paper or monitors (CRT, LCD etc.) in order to reproduce the luminance details is challenging due to the significantly lower dynamic range they have. Mapping from high dynamic range to low dynamic range is known as tone mapping, and a well-known tone mapping method for non-digital black-and-white photography is the Zone System [24–26]. This method divides photography into two categories of zones: scene zones and print zones. Scene zones represent an approximation of the luminance range in the scene while print zones represent an approximation of the reflectance of a print. There are eleven print zones ranging from pure black to pure white. In turn there may potentially be many more scene zones because of the much higher dynamic range in the scene. Reproducing the high dynamic range in the print is done by mapping the scene zones to print zones. This is done by first finding the luminance range of the middle brightness of the scene (known as the middle gray) and mapping it to the middle print zone, and then finding the luminance ranges for darker and lighter regions and mapping those to other print zones, see [23–26] for more details and Figure 6 for an illustration of the Zone System. In digital photography there are techniques such as high dynamic range imaging (HDRI) to reproduce the high dynamic range from exposure values known as stops. In this technique several pictures of a scene are captured at different exposure levels, or stops, and then blended together to get the final image, see Figure 7 for an illustration of this technique.

Figure 6: The Zone system maps scene zones to print zones with the middle brightness of the scene mapped to the middle print zone. Image courtesy to [23].


Figure 7: HDRI technique showing pictures of a scene taken at different stops. Image cour-tesy to [27].

Many tone mapping algorithms used in computer graphics and digital photography are based on traditional photography and treat all parts of the scene equally. While this gives good results it contradicts the natural behavior of the HVS. The dynamic range of the eye at any given moment is limited to four orders of magnitude, but the eyes have temporal adaptation to "extend" this dynamic range under varying luminance conditions. This is done by moving the detailed vision to the new luminance range in the scene, which makes it sensitive to a dynamic range of ten orders of magnitude [23, 28, 29]. For instance, when entering a dark room from a bright room the change in luminance is intense and the temporal adaptation adjusts the eye to the dark environment, allowing it to see more details in the dark. Another example is stepping into a movie theater whilst a movie is playing. At first the visible details are primarily from the movie display while the theater is dark. After a while the eye has adapted to a new range of luminance that includes darker regions, allowing the observer to see some details of the movie theater itself. The temporal adaptation in the HVS adapts to an area covered by one degree of the viewing angle around the gaze direction of the observer, and areas that fall outside have significantly less impact on the result of the adaptation [30]. Most tone mapping algorithms do not simulate the temporal adaptation of the HVS [29], opting for a global approach by treating the luminance of the whole scene equally. In addition, the global approach influences the image quality, as it has been shown that the gazed-at areas highly influence the quality of the final image [31]. In gaze-dependent tone mapping a global tone mapping approach [23] can be used to derive a local gaze-dependent approach by using the pixels of an HDR image in a focus zone corresponding to the gaze area instead of the entire image [32]. This is a closer simulation of the temporal adaptation and helps improve the image quality.
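A minimal sketch of this idea is given below: the log-average (adaptation) luminance is computed only from the pixels inside a focus zone around the gaze point and then used in a simple global operator in the spirit of the method in [23]. The linear luminance buffer, the square zone and the operator chosen here are assumptions for illustration; this is not the Frostbite 3 implementation nor the exact operator of [32].

#include <algorithm>
#include <cmath>
#include <vector>

// Log-average luminance of a square focus zone centered on the gaze point.
// luminance: linear scene luminance per pixel (row-major), width/height: buffer size,
// gazeX/gazeY: gaze point in pixels, radius: half-size of the focus zone in pixels.
float logAverageLuminance(const std::vector<float>& luminance, int width, int height,
                          int gazeX, int gazeY, int radius)
{
    const float delta = 1e-4f; // avoids log(0) on pure black pixels
    double logSum = 0.0;
    int count = 0;
    for (int y = std::max(0, gazeY - radius); y <= std::min(height - 1, gazeY + radius); ++y)
        for (int x = std::max(0, gazeX - radius); x <= std::min(width - 1, gazeX + radius); ++x)
        {
            logSum += std::log(delta + luminance[y * width + x]);
            ++count;
        }
    return std::exp(float(logSum / std::max(count, 1)));
}

// Simple global operator driven by the gaze-dependent adaptation luminance:
// scale the pixel to middle gray and compress with L / (1 + L).
float tonemapPixel(float pixelLuminance, float adaptationLuminance, float middleGray = 0.18f)
{
    float scaled = middleGray * pixelLuminance / adaptationLuminance;
    return scaled / (1.0f + scaled);
}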


3 Implementation and results

3.1 Eye Tracker integration in Frostbite 3

Using C++ as the codebase, a class is implemented to integrate the eye tracker in Frostbite 3. The engine delegates all tasks related to the eye tracker device to the class. This includes connecting and disconnecting from the device, starting and stopping the eye tracking, loading user profile data for the device and fetching and sending gaze data. To allow different eye trackers from the same or different hardware vendors to work with the engine, an abstract class is first implemented. This class is a base class for all eye tracking devices and all eye tracker implementations derive from this class. To keep the interface simple the abstract class has four pure virtual functions handling common functionality in all eye trackers: initializing the eye tracker, sampling gaze data, retrieving gaze data and retrieving the sampling frequency of the eye tracker. The interface of the class is shown in Table 1.

class EyeTracker
{
public:
    // Ctor and Dtor
    EyeTracker();
    virtual ~EyeTracker();

    virtual void initialize() = 0;
    virtual void sample(float deltaTime) = 0;
    virtual Vec2 getRawGazePoint() const = 0;
    virtual float getSamplingFrequency() const = 0;

    // Other member functions...

private:
    // Member variables...
};

Table 1: Abstract interface class for eye trackers.

initialize() loads the eye tracker settings, connects the eye tracker and starts the tracking. Utility functions for initializing the eye tracker may be available depending on the supplied library. Among the utility functions supplied with the Tobii X2-30 library are loading and validation of system and profile configurations, error registrations when connecting and disconnecting from the eye tracker and error registrations when starting and stopping the eye tracking. The error registration [...] loop. The library function starting the tracking has a callback function which is called every time new gaze data is sampled.

sample(...) becomes a special case when using the Tobii X2-30. Originally the function was included in the base class to stay consistent with the interface used by all input devices in Frostbite 3. Since the sampling function is called every frame by the engine, the idea was extended so it would sample the gaze data every frame. However, this created a conflict with the Tobii X2-30 C API. As mentioned earlier, the function starting the tracking registers a callback function called every time new gaze data is available, and the gaze data is cached in this callback function. This makes the callback function in essence a sample function. Using the sample function derived from the engine as the callback function is not possible because its type signature differs from the type signature required for the callback function in the Tobii X2-30 C API. Due to this the sampling function derived from the engine was left empty at first, and then later used to check the connection of the eye tracker every frame. If the state showed that it wasn't connected then it would try to re-connect to the device. This goes against the intention of the sample function in the engine. However, due to constraints imposed by both APIs it was left in this state.

getRawGazePoint() and getSamplingFrequency() are ordinary getter functions. The former returns a two-dimensional vector representing the raw gaze data on the monitor. This is the gaze data cached every time by the callback function registered with the tracking. The latter simply returns the sampling frequency of the eye tracker. The Tobii X2-30 has a sampling frequency of 30 Hz.

class TobiiEyeTracker : public EyeTracker
{
public:
    // Ctor and Dtor
    TobiiEyeTracker();
    ~TobiiEyeTracker();

    void initialize() override;

    // Sampling function of engine
    void sample(float deltaTime) override;

    Vec2 getRawGazePoint() const override;
    float getSamplingFrequency() const override;

    // Other member functions...

private:
    // Callback function to register with the tracking
    static void gazeCB(const tobiigaze_gaze_data* gaze_data,
                       void* userData);

    // Member variables...
};

Table 2: Interface for Tobii Eye Trackers.

The class declaration in Table 2 shows the interface for Tobii eye trackers. From the class declaration we see that gazeCB(...) (the callback function registered with the tracking) has a completely different signature from sample(...) inherited from the base class, which shows the signature constraint described earlier. An instance of the TobiiEyeTracker class is created in an input device manager handled by Frostbite 3, which in turn calls initialize() to start the tracking. A flow chart of the integration is shown in Figure 8.
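To illustrate the caching behavior described above, a minimal sketch of what the registered callback could do is shown below. It is not the actual Frostbite 3 code: the member m_rawGazePoint and the helper toScreenPoint(...) are names invented for this example, and thread safety and error handling are omitted.

// Illustrative sketch only. The callback caches the most recent gaze point so that
// getRawGazePoint() can return it when the engine polls the device each frame.
void TobiiEyeTracker::gazeCB(const tobiigaze_gaze_data* gaze_data, void* userData)
{
    // The TobiiEyeTracker instance is passed as user data when the callback is registered.
    TobiiEyeTracker* self = static_cast<TobiiEyeTracker*>(userData);

    // toScreenPoint(...) is a hypothetical helper converting the SDK sample to a
    // two-dimensional point on the monitor; m_rawGazePoint is an assumed member variable.
    self->m_rawGazePoint = toScreenPoint(*gaze_data);
}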


Figure 8: Flow chart of eye tracker integration in Frostbite 3.

3.2 Eye Tracker Filtering

As previously covered in Section 2.1.2, high frequency measurement noise gives poor precision during fixations. This yields a noisy signal due to the eye tracker sampling gaze data dispersed around the point of interest of the player. This noise is seen in the high spikes in the signal, see Figure 9.

Reducing the measurement noise by attenuating high frequencies requires the signal to be filtered. Since the noise mainly consists of high frequencies, low-pass filters are very good candidates for noise attenuation. Four different low-pass filters are implemented and compared to each other:

• Average filter: A filter that uses N samples and calculates the arithmetic mean.

• First-order low-pass filter: A discretized first-order low-pass filter that allows low frequencies below a cutoff frequency to pass while attenuating frequencies above the cutoff frequency. The filter is affected by the sampling frequency and a time constant.

• Online cursor filter: A dynamic filter that adjusts itself to the signal trend between two mean values. This filter aims to simulate a mouse cursor [11].

• Spatial filter: A dynamic filter that adjusts its filter trend based on the distance between two gaze points.

The implementation and results of each filter are described in the following sections. Note that each filter is shown on a different session of recorded gaze data. Since we are mainly concerned with the results of applying the filters rather than the results for a specific set of gaze data, different recording sessions don't affect the final result; any set of gaze data should show similar results when the same filter is applied to it. Similarly, the dimensions of the gaze data don't affect the result, and thus only one dimension is shown in the results for simplification.

3.2.1 Average filter

One of the more common choices of filter when attenuating noise is the average filter. As the name implies, the average of a number of samples is calculated using Equation 3:

\[ \bar{x} = \frac{1}{N}\sum_{i=0}^{N-1} x_i \tag{3} \]

x̄ denotes the mean value, x_i is the measured sample and N is the total number of samples used in the calculation of the mean value. We see how the filter attenuates noise in a signal by looking at the standard deviation, which shows the dispersion between the samples and the mean value of the samples. The standard deviation is calculated from Equation 4:

\[ \sigma = \sqrt{\frac{\sum_{i=0}^{N-1}(x_i - \bar{x})^2}{N - 1}} \tag{4} \]

where σ is the standard deviation. For uncorrelated noise, the standard deviation of an average of N samples is σ/√N, so increasing the number of samples N makes the filtered output less dispersed from the mean value. When samples are less dispersed from the mean the noise is being attenuated. Applying the average filter to a signal is shown in Figures 10 and 11. In Figure 10 we see the filtered signals (green and red) being much smoother compared to the raw signal, which fluctuates, and we see that this behavior holds in practice. Figure 11 shows a subset of the same result, where the smoothing effect of the filter is seen more clearly.

Although the average filter performs good noise attenuation, it shows poor results in maintaining edge sharpness, which is seen in Figure 10. In comparison to the filtered signals the raw signal maintains very sharp edges. Increasing the number of samples makes the edges lose their sharpness even more. This is seen when comparing the green signal with the red signal, where the number of samples is increased from 20 to 100. The edges represent the step responses, which are sudden changes in the signal. Step responses in eye tracking occur when saccades occur, and less edge sharpness means a slower step response. This should be avoided since a slow step response makes the eye tracker slow at tracking the player's gaze. It would be the equivalent of input lag when moving the mouse cursor from one point to another.

Another disadvantage of the average filter is that it considers all samples, even the oldest, to be equally important. If the eye tracker were producing a signal that kept a constant mean (if a fixation were occurring) this would be desirable. However, generally the player will be scanning the virtual world, causing the signal to have varying means as the gaze point regularly moves to different positions in the virtual world. If the mean varies regularly the signal is changing its trend, and in that case old samples should not weigh equally (or at all) on the output, to avoid being influenced by stale gaze data. Overall the average filter performs noise attenuation well but suffers on two points: it loses edge sharpness, which affects how fast it follows the trend of the signal, and it does not prioritize recent gaze data. A tradeoff between edge sharpness and noise attenuation should be considered when using this filter.
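For reference, a direct implementation of Equation 3 over a window of the N most recent samples could look like the sketch below. It is an illustration only, not the engine code, and it deliberately recomputes the sum on every call (N additions per output), which is what the moving average filter in the next section avoids.

#include <cstddef>
#include <numeric>
#include <vector>

// Average filter: returns the arithmetic mean of the last N samples (Equation 3).
class AverageFilter
{
public:
    explicit AverageFilter(std::size_t n) : m_maxSamples(n) {}

    float filter(float sample)
    {
        m_samples.push_back(sample);
        if (m_samples.size() > m_maxSamples)
            m_samples.erase(m_samples.begin()); // drop the oldest sample in the window
        // N additions and one division for every filtered output.
        return std::accumulate(m_samples.begin(), m_samples.end(), 0.0f) /
               static_cast<float>(m_samples.size());
    }

private:
    std::size_t m_maxSamples;
    std::vector<float> m_samples;
};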


Figure 11: Signal filtered by average filter (zoomed).

3.2.2 Moving Average Filter

The average filter has drawbacks in memory consumption and computational efficiency. Equation 3 shows that the filter requires us to store N samples in memory when calculating the mean value. If N is a small number then this is trivial on modern computers, but if it is large, which may be the case in simulations of complex systems, the filter becomes very wasteful with memory. For example, let's assume we have a data set where each sample consists of two variables of type float. The amount of memory consumed by this data set can be calculated from Equation 5:

\[ \text{size}(N)_{\text{dataSet}} = N \cdot \text{sizeof}(\text{float}) \cdot 2 \tag{5} \]

Equation 5 shows that the memory consumption grows linearly. Assuming the size of a float is four bytes (the size used by floating point units following the single-precision floating-point format of the IEEE 754 standard, see [33]), a set of 1 000 samples (two floats per sample) requires 8 000 bytes and a set of 10 000 samples requires 80 000 bytes, which is wasteful of memory. Similarly, the computational efficiency will be unsatisfying; looking at Equation 3 again we see that N additions and one division are performed. For example, assuming we have a data set of 1 000 samples and need to calculate the mean every frame in a program running at 30 FPS, this adds up to 30 000 additions and 30 divisions per second. Although 30 000 additions are trivial on modern hardware, they still waste many clock cycles that could be used for other instructions. Furthermore, divisions are generally expensive compared to other arithmetic operations since the division algorithms executed by the CPU's Arithmetic Logic Unit are less efficient, so a good rule is to avoid divisions if not absolutely necessary.

One way to optimize the filter is to remove the division in Equation 3 by caching the result of the division 1/N in memory and multiplying by it instead. The computational efficiency of the filter can then be optimized further by making the filter recursive. A recursive filter uses the previous output from the filter as a reference, see Equation 6.

\[ \bar{x}_k = \bar{x}_{k-1} + \frac{1}{N}\left(x_k - x_{k-N}\right) \tag{6} \]

Here x̄_k is the average of the N latest samples at instant k. We see that x̄_k depends on three variables: the previous average x̄_{k-1}, the newest sample x_k and the oldest sample from the previous average, x_{k-N}. Equations 7–9 show how Equation 6 is derived:

\[ \bar{x}_k = \frac{1}{N}\sum_{i=k-N+1}^{k} x_i \tag{7} \]

\[ \bar{x}_{k-1} = \frac{1}{N}\sum_{i=k-N}^{k-1} x_i \tag{8} \]

\[ \bar{x}_k - \bar{x}_{k-1} = \frac{1}{N}\left(\sum_{i=k-N+1}^{k} x_i - \sum_{i=k-N}^{k-1} x_i\right) \iff \bar{x}_k - \bar{x}_{k-1} = \frac{1}{N}\left(x_k - x_{k-N}\right) \iff \bar{x}_k = \bar{x}_{k-1} + \frac{1}{N}\left(x_k - x_{k-N}\right) \tag{9} \]

Equations 7 and 8 calculate the current and previous averages, while Equation 9 subtracts the latter from the former and solves for x̄_k. This filter is called the Moving Average Filter (MAF) because the filtered output uses the most recent N samples. One can think of it as a window moving over the data set for each instant k that is calculated.

Equation 6 shows that only one addition is required, and this is true for any number of samples used in this filter. Comparing the number of additions performed by the MAF to the Average Filter when using 1 000 samples shows that the MAF performs three orders of magnitude fewer additions, which is a large improvement at the cost of very small changes in the filter implementation. While the MAF improves computational efficiency it still needs to store N samples like the Average Filter, since x_{k-N} (the oldest sample from the average of the previous instant) needs to always be available when calculating the output at instant k. In addition it also needs to store the previous average. Overall the MAF is preferred, but it should be noted that it suffers from the same drawback in memory consumption as the Average Filter. A block diagram in standard DSP notation for Equation 6 is shown in Figure 12, illustrating how the previous output is added back into the calculation, giving the recursive trait of the filter.
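A sketch of Equation 6 using a circular buffer for the last N samples is shown below (an illustration only, not the engine code). As discussed above it still stores N samples, but each output costs only a constant amount of arithmetic, and 1/N is cached to avoid the division.

#include <cstddef>
#include <vector>

// Moving Average Filter (Equation 6): avg_k = avg_{k-1} + (x_k - x_{k-N}) / N.
// The last N samples are kept in a circular buffer so the oldest one can be subtracted.
class MovingAverageFilter
{
public:
    explicit MovingAverageFilter(std::size_t n)
        : m_samples(n, 0.0f), m_invN(1.0f / static_cast<float>(n)) {}

    float filter(float sample)
    {
        // x_{k-N}: the slot about to be overwritten holds the oldest sample in the window.
        float oldest = m_samples[m_head];
        m_samples[m_head] = sample;
        m_head = (m_head + 1) % m_samples.size();

        // Recursive update; the buffer starts at zero, so the first N outputs ramp up.
        m_average += (sample - oldest) * m_invN;
        return m_average;
    }

private:
    std::vector<float> m_samples;
    std::size_t m_head = 0;
    float m_invN;
    float m_average = 0.0f;
};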


Figure 12: Block Diagram of Moving Average Filter.

3.2.3 First-order Low-pass filter

A first-order low-pass filter (LPF) allows low frequencies to pass through while attenuating frequencies higher than the cutoff frequency. The cutoff frequency is defined as the frequency where the filter attenuates the output −3 dB below the nominal output of the passband. The gaze data from the eye tracker gives a signal with high frequency fluctuations, so applying a low-pass filter attenuates the fluctuations, which yields a smoother signal.

Two examples of low-pass filters are shown in Figure 13 and Figure 14. In the figures the magnitude of frequencies starts being attenuated from 1.01 rad/s and 0.398 rad/s respectively, both of which are the cutoff frequencies of the filters. Smaller frequencies are not attenuated and fall within the range of the passband.


Figure 14: Butterworth First-order low-pass filter. The square represents the cutoff frequency.

The equation for a digital LPF can be derived from the transfer function of the filter in the Laplace domain shown in Equation 10:

\[ F(s) = \frac{Y(s)}{X(s)} = \frac{1}{1 + \tau s} \tag{10} \]

F(s) is the transfer function, X(s) and Y(s) are the Laplace transforms of the time-domain filter input and output, τ is the time constant describing the time to reach 63% of the new steady-state value for the filter's step response, and s is the complex argument. The transfer function describes the ratio of the filter output and input. If s is substituted with jω the transfer function outputs lower values as ω increases, which conforms to the behavior of a low-pass filter. Cross-multiplying the denominators in Equation 10 yields Equation 11:

\[ (1 + \tau s) \cdot Y(s) = X(s) \iff Y(s) + \tau s \cdot Y(s) = X(s) \tag{11} \]

Equation 11 is then transformed from the Laplace domain to the time domain by taking the inverse Laplace transform, yielding Equation 12:

\[ y(t) + \tau \cdot \frac{dy}{dt} = x(t) \tag{12} \]

Equation 12 gives us the equation of an LPF in the continuous time domain. Since digital filters operate in discrete time we need to discretize Equation 12. We substitute t with t_n to denote a point in discrete time and discretize the derivative using the Backward Euler Differentiation Approximation. This method gives us an approximation of the derivative, illustrated in Figure 15.


Figure 15: Backward Euler Differentiation Approximation. Image courtesy to [34].

Figure 15 shows the plot of a function of time y(t), the function at two discrete-time points y(t_k) and y(t_{k-1}), the step size h, the exact slope of the function ẏ(t_k) and the slope of the secant line formed between the two points representing the approximated derivative, which we denote ẏ'(t_k). We see the error between the exact and approximated derivative, |ẏ(t_k) − ẏ'(t_k)|, becoming smaller with smaller step sizes since the slope formed by the secant lines approaches the exact slope, see [35] for examples illustrating this. While a smaller step size gives a better approximation of the exact derivative, making it too small introduces a round-off error. The round-off error then affects the result more than the error between the real and approximated value, and as such the approximation stops improving beyond a certain threshold of the step size h [35]. The equation for the Backward Euler Differentiation Approximation is shown in Equation 13:

\[
\frac{dy}{dt} \approx \frac{y(t_n) - y(t_{n-1})}{h}, \qquad h = \frac{1}{s_{freq}}
\tag{13}
\]

t_n is a time in discrete time and h is our step size, which represents the sampling interval of the eye tracker, where s_freq is the sampling frequency of the eye tracker. Substituting the derivative in Equation 12 with Equation 13 gives us Equation 14:

\[
y(t_n) + \tau \cdot \frac{y(t_n) - y(t_{n-1})}{h} = x(t_n)
\tag{14}
\]

Rearranging Equation 14 to solve for y(t_n) yields Equation 15:

\[
y(t_n) = \left(\frac{\tau}{h + \tau}\right) \cdot y(t_{n-1}) + \left(\frac{h}{h + \tau}\right) \cdot x(t_n)
\tag{15}
\]


The fraction τ/(h + τ) is commonly denoted as α:

\[
\alpha = \frac{\tau}{h + \tau}, \qquad (1 - \alpha) = \frac{h}{h + \tau}
\tag{16}
\]

Substituting the fractions in Equation 15 with Equation 16 yields Equation 17, the final equation of the discretization process and the discretized LPF:

\[
y(t_n) = \alpha \cdot y(t_{n-1}) + (1 - \alpha) \cdot x(t_n)
\tag{17}
\]

Figures 16 and 17 show the LPF applied to a noisy signal.


The filtered output from applying the LPF shows sufficiently good noise reduction, similar to the AF and MAF, but the memory consumption is vastly decreased in the LPF. Equation 17 removes the constraint of storing data from N time instances in the past (see Equations 3 and 9 of the AF and MAF). Instead, it only requires data from the current and previous time instance. Similar to the AF and MAF, division can be avoided by caching the values of α and (1 − α), reducing the update to two multiplications.

One problem in Equation 17 is choosing the value of y(t_{n-1}) at the beginning of an eye tracking session, where n = 0. Since no filtering has been performed at that time instance we are left with an undefined value for y(t_{n-1}). One solution is to let y(t_{n-1}) = x(t_n). Using the same value as the first input gives a filtered output with the same value as the first raw input, and this becomes the first "true" filtered output for the previous time instance.
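A minimal sketch of Equation 17, including this initialization and the caching of α and (1 − α), could look as follows in C++. This is an illustration under stated assumptions, not the implementation used in this work; the cutoff and sampling frequencies are simply passed in as parameters.

// Sketch of the discretized first-order low-pass filter from Equation 17.
// alpha and (1 - alpha) are cached so each update costs two multiplications and one addition.
class LowPassFilter {
public:
    LowPassFilter(double cutoffHz, double samplingHz) {
        const double pi  = 3.14159265358979323846;
        const double tau = 1.0 / (2.0 * pi * cutoffHz);  // Equation 18
        const double h   = 1.0 / samplingHz;             // sampling interval of the eye tracker
        alpha = tau / (h + tau);                         // Equation 16
        oneMinusAlpha = 1.0 - alpha;
    }

    double filter(double x) {
        if (!initialized) {
            previousOutput = x;      // let y(t_{n-1}) = x(t_n) for the very first sample
            initialized = true;
        }
        previousOutput = alpha * previousOutput + oneMinusAlpha * x;  // Equation 17
        return previousOutput;
    }

private:
    double alpha = 0.0;
    double oneMinusAlpha = 0.0;
    double previousOutput = 0.0;
    bool initialized = false;
};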

3.2.3.1 Time constant against Cutoff Frequency

The relation between the time constant τ and the cutoff frequency f_c in the LPF is shown in Equation 18:

\[
\tau = \frac{1}{2\pi f_c}
\tag{18}
\]

Recall that τ describes the time it takes for the step response to reach 63% of the new steady-state of the signal. If we set f_c to a large frequency, τ becomes smaller and the filtered output reaches the steady-state faster. However, a larger f_c also means allowing a wider range of high frequencies to fall within the passband in the frequency domain, which introduces more noise in the signal. A balance needs to be found between τ and f_c where a new steady-state is reached fast enough without introducing too much noise. Equation 18 is illustrated in Figure 18, where the curve has a horizontal asymptote at τ = 0:

Figure 18: Time constant τ against cutoff frequency f_c.
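As a purely illustrative calculation (the sampling rate and cutoff frequencies here are assumed values, not taken from the eye tracker used in this work), a 90 Hz eye tracker and f_c = 1 Hz give

\[
\tau = \frac{1}{2\pi \cdot 1\ \mathrm{Hz}} \approx 0.159\ \mathrm{s}, \qquad
h = \frac{1}{90\ \mathrm{Hz}} \approx 0.011\ \mathrm{s}, \qquad
\alpha = \frac{0.159}{0.011 + 0.159} \approx 0.94,
\]

whereas f_c = 5 Hz gives τ ≈ 0.032 s and α ≈ 0.74. The higher cutoff thus weights the new sample more, responding faster to saccades but smoothing less.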

The result of applying a small and a large time constant is shown in Figure 19. The green signal has a large τ for a small f_c, which is the opposite of the red signal. The noise attenuation should then be considerably stronger in the green signal than in the red signal, while having less sharpness in the edges since it takes longer to reach the new steady-state of the signal. This is illustrated more clearly in Figure 20. The red signal shows the opposite trait of the green signal for small τ and large f_c.

Figure 19: First-order Low-pass filter applied to a signal with low and high cutoff frequencies.



3.2.4 Online Cursor Filter

One restriction of the low-pass filter is its static behavior. The filter does not change its characteristics and instead performs the same filtering regardless of the signal trend. In Figure 19 we see the low-pass filter showing two trends. The first trend is that the noise in the signal is attenuated at the cost of an increased time to reach the new steady-state value when a saccadic eye movement occurs. The second trend is reaching the new steady-state of the signal fast, but at the cost of more noise in the signal. This relation between noise attenuation and increased time to reach the steady-state of the signal is shown in Equation 18, as described in the previous section, and is clearly seen from the green and red signals in Figures 19 and 20. An improvement to the filter would be strong noise attenuation during fixations while also reaching the steady-state of the signal fast when saccades occur in the input signal. The filter would adjust itself to the signal trend, making it dynamic. In other words, if the input signal has a constant mean then a fixation is occurring and strong noise attenuation is applied, and if the input signal has a significant change in the mean the filter applies weaker attenuation to reduce the time for reaching the new steady-state of the signal.

A filter with dynamic behavior is the Online Cursor Filter (OCF) [11]. The OCF was used to simulate mouse cursor movement by using the gaze data as input to the mouse cursor on the monitor. Various experiments were performed, such as allowing the user to fixate the cursor on desktop icons and to respond quickly to saccadic eye movements. During fixations the noise was attenuated, and for prompt responses to saccades the gaze data was attenuated less to avoid long times to reach the new steady-state of the signal.

In the implementation of the LPF we saw in Equation 18 that the time constant controls the noise attenuation and the time it takes to respond to saccades in the input signal. This is used in the implementation of the OCF, where the time constant τ depends on the trend of the signal. The underlying idea is that if the signal has a constant mean, a fixation is occurring and τ is set large to perform strong noise attenuation. However, if the mean of the signal changes significantly, a saccade occurs and τ is set to a very small value to avoid the filtered output lagging behind the new steady-state of the signal. Once the filtered output matches the new mean of the input signal, τ grows large again to increase the noise attenuation unless a new saccade occurs. The steps for implementing the OCF are described below:

1. Calculate two means from the samples of six discrete time instances: one mean from the samples {s[t_n], s[t_{n-1}], s[t_{n-2}]} and the other mean from the samples {s[t_{n-3}], s[t_{n-4}], s[t_{n-5}]}.

2. Calculate the mean difference mean_diff between the two means from the previous step.
3. Check if mean_diff exceeds a threshold g. If it exceeds the threshold a saccade occurs and an alarm is triggered, which sets the time constant τ to a very small value τ_small for weak noise attenuation.

4. Use Equation 17 to calculate the filtered output with the same α defined in Equation 16.
5. Reset τ along an exponential curve with a step size dτ until it reaches a sufficiently large value τ_large for strong noise attenuation.
A code sketch illustrating these steps is given directly after this list.
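The following C++ fragment is a rough, illustrative sketch of the steps above and not the implementation from [11]; the threshold, τ_small, τ_large, the warm-up handling and the way τ is grown back (a multiplicative growth factor per sample stands in here for the step size dτ) are assumptions made for the example.

#include <algorithm>
#include <array>
#include <cmath>

// Sketch of the Online Cursor Filter steps listed above.
class OnlineCursorFilter {
public:
    OnlineCursorFilter(double samplingHz, double threshold,
                       double tauSmall, double tauLarge, double tauGrowth)
        : h(1.0 / samplingHz), g(threshold), tauSmall(tauSmall),
          tauLarge(tauLarge), tauGrowth(tauGrowth), tau(tauLarge) {}

    double filter(double x) {
        // Step 1: shift the six stored samples back one instance and insert the newest one.
        for (int i = 5; i > 0; --i) samples[i] = samples[i - 1];
        samples[0] = x;

        // Step 2: difference between the mean of the three newest and the three oldest samples.
        const double recentMean = (samples[0] + samples[1] + samples[2]) / 3.0;
        const double olderMean  = (samples[3] + samples[4] + samples[5]) / 3.0;
        const double meanDiff   = std::fabs(recentMean - olderMean);

        // Step 3: if the difference exceeds the threshold g, treat it as a saccade
        // and make tau very small for weak noise attenuation.
        if (meanDiff > g) tau = tauSmall;

        // Step 4: apply Equation 17 with alpha computed from Equation 16.
        const double alpha = tau / (h + tau);
        output = alpha * output + (1.0 - alpha) * x;

        // Step 5: grow tau back toward tauLarge (exponential growth assumed here)
        // so that noise attenuation becomes strong again during fixations.
        tau = std::min(tau * tauGrowth, tauLarge);
        return output;  // initialization with the first raw sample is omitted for brevity
    }

private:
    std::array<double, 6> samples{};  // s[t_n] ... s[t_{n-5}]
    double h, g, tauSmall, tauLarge, tauGrowth;
    double tau;
    double output = 0.0;
};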

In step 1 the oldest sample s[t_{n-5}] is removed when the sample s[t_n] of the next time instance is received, and the remaining samples are pushed back one time instance. This is similar to the AF and MAF, where we store samples from N discrete time instances in the past and always remove the oldest one when a new sample is received.

In step 2 the difference between the means is calculated to allow a smaller threshold to be used. If the difference were taken only between two samples of the raw gaze data, the difference could exceed the threshold even when the player is gazing at the same point, because of the noise. This would lead to false saccades being triggered, which would set the time constant to a very small value. Increasing the threshold value avoids this but in turn degrades the filter
