Department of Science and Technology
Linköping University

LiU-ITN-TEK-A--08/117--SE

Visual attention analysis using eyetracker data

Master's thesis in media technology, carried out at the Institute of Technology, Linköping University

Andreas Ferm

Supervisor: Reiner Lenz
Examiner: Reiner Lenz

Norrköping, 2008-11-21


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/

Abstract

Little research has been done on how a person searches for a specific image when presented with a set of images, such as the results presented by image search engines. By investigating the properties of this search we might be able to present the images in a different manner to ease the user's search for the image he/she is looking for. The work was performed at Chiba University under the supervision of Norimichi Tsumura and Reiner Lenz.

I created an experimental platform which first showed a target image and then a 7 × 4 grid of images in which the user's task was to locate the target image. The experiment data was recorded with a NAC EMR-8B eyetracker that saved the data as both a video and a serial data stream. The data was later used to extract certain characteristics for different image sets, such as how the eye fixates and how different image sets affect the scan.

The initial place where the user started his/her search was dependent on where the user was previously fixating. It was also more probable that subsequent fixations were placed in close proximity to the previous fixation. My results also show that the search task was slightly faster when images were placed with high contrast between neighboring images, i.e. dark images next to bright ones.


Contents

1 Introduction
  1.1 Project aim
  1.2 Background
  1.3 Acknowledgment
2 Data collection
  2.1 Setup
    2.1.1 About the experiment
    2.1.2 Images
    2.1.3 Experiment platform
    2.1.4 Eyetracker
    2.1.5 Calibration
  2.2 Experiment
  2.3 Data partitioning
    2.3.1 Picking out the sessions
    2.3.2 Final steps
3 Analysis
  3.1 Fixation detection
    3.1.1 Algorithm
    3.1.2 Fixation detection result
  3.2 Fixation analysis
    3.2.1 Average number of fixations
    3.2.2 Average scanpath length
    3.2.3 On-target fixation
    3.2.4 Placement of initial and second fixation
    3.2.5 High vs. low contrast
    3.2.6 Transition probability
  3.3 Statistical comparison
    3.3.1 Pairwise sequence alignment
    3.3.2 Permutation test
    3.3.3 Japanese vs. Non-Japanese
4 Conclusions
  4.1 Result
    4.1.1 Amount and placement of fixations
    4.1.2 Scanning pattern
    4.1.3 High/Low contrast environment
    4.1.4 Statistical comparisons on groups of scanpaths
  4.2 Lessons learnt
    4.2.1 Gathering high quality data
    4.2.2 Number of users and user information
    4.2.3 Setting specific goals
    4.2.4 Discuss problems
    4.2.5 Write a journal during the experiment
  4.3 Future research
    4.3.1 On-target fixation
    4.3.2 Stochastic processes
    4.3.3 Placement order of fixations
References
A Interesting sessions
B PHP Code
  B.1 index.php
  B.2 functions.php
C Matlab Code
  C.1 extractTimecode.m
  C.2 getTime.m
  C.3 extractTracker.m
  C.4 shiftCalibration.m
  C.5 makeString.m
  C.6 idt.m
  C.7 editDist.m
  C.8 Ptest.m


List of Figures

2.1 Target image and grid of images
2.2 Concept showing high (left) and low (right) contrast between images
2.3 The eyetracker and its output
  (a) The camera module of the eye tracker
  (b) The DSP unit for the eye tracker
  (c) The IR image from the camera
  (d) The thresholded IR image from the camera
2.4 Output captured by the computer
2.5 Saved screenshot with the cropped time bar image
2.6 Eyetracker coordinates off-centered
2.7 Data structure after assembly of the data
3.1 The principal components of a scanpath
3.2 A part of the grid showing before and after the fixation detection
3.3 Graph showing the average number of fixations/scene
3.4 Average scanpath length per session
3.5 Scanpath length depending on number of fixations
3.6 Average on-target fixations per session
3.7 On-target distribution of sessions
3.8 3D view of initial and secondary fixation distribution
  (a) 3D view of initial fixation distribution
  (b) 3D view of secondary fixation distribution
  (c) Amount of initial fixations per grid cell
  (d) Amount of secondary fixations per grid cell
3.9 Comparing the distribution of odd and even sessions
3.10 Transition probabilities based on distance between cells
  (a) Transition probabilities based on distance, point plot
  (b) Transition probabilities based on distance, average probability for each distance
3.11 Graphical explanation of the distances used in the permutation test
  (a) d_between is the average distance between users of different groups
  (b) d_within is the average distance between users within the same group
3.12 Reading directions for Japanese vs. European languages
3.13 75x1 vector containing the p-values for each session


List of Tables

2.1 Image Categories
2.2 Structure of SQL table userdata
2.3 Structure of SQL table images
3.1 Characteristics of the sessions
3.2 Region-Of-Interest defined


Chapter 1

Introduction

1.1 Project aim

The aim of this project is to collect and study data on how people search for a specific image among a set of images. There are many applications where a person is presented with a collection of images, and I would like to study whether there are any special characteristics in how people search for a given image. Another aim of the project is to create a set of Matlab tools making future research using eye tracker data more efficient.

1.2 Background

Eye tracking equipment is being used in more and more fields. It is widely used in usability tests, cognitive science and psycholinguistics. Techniques similar to eye tracking have recently been introduced into cars as well, where they are used to monitor the eye activity of the driver to prevent accidents caused by drowsiness. Another commercial application for the eye tracker is to enable people with severe paralysis to interact with a computer, by navigating the mouse cursor with the eyes and blinking to click. My research focuses on visual attention and scanpath characteristics when the stimulus is a grid of images, similar to the results presented by online image search engines.

1.3 Acknowledgment

This research was done at Chiba University under the supervision of prof. Norimichi Tsumura and prof. Reiner Lenz, who supported me and gave me access to their equipment and experience. I would like to thank Ph.D. students Keita Hirai and Shinji Nakagawa for showing me how to use the eye tracker equipment and for being there to discuss problems. I would also like to extend a thank you to all the people who took the time to help me with my data collection. A big thank you to Picsearch for providing me with the images used in the experiment. Finally, this project wouldn't have been possible without the scholarships I received from Lennings and the Japan Sasagawa Foundation.

1.4 Source code

The source code referred to in this report can be obtained by contacting either me or my supervisor, Reiner Lenz.


Chapter 2

Data collection

2.1 Setup

2.1.1 About the experiment

The main purpose of this experiment is to find out how a person scans a set of pictures in search of a given picture. In my experiment I had 23 test subjects, but due to problems during the experiment the data from 6 people turned out to be unusable. This left usable data from 17 test subjects: 4 female and 13 male, 3 European and 14 Japanese. All were university students aged between 21 and 31. The participants were all volunteers and no compensation was given.

2.1.2 Images

The images used for the experiment were supplied by Picsearch and consist of 100 images from 15 categories, see Table 2.1. Along with the images, Picsearch also provided statistics about the images' popularity, which were used later when determining which pictures would be the target pictures in the experiment.

Andy Warhol, Animal, Beach, Car, Cat, Claude Monet, Dog, Doll, Flower, Food, Fruit, Garden, Lion, Love, Metallica

Table 2.1: Image Categories

2.1.3 Experiment platform

The experiment platform was based on MySQL¹ and PHP², handling the display of images and the collection and storage of user data in a database. The server was run on the same machine as the one used to show the experiment, to reduce the effect of network lag. Opera was chosen as the web browser since it is fast and lightweight.

SQL database

The SQL database contained the tables userdata and images. Userdata was the table where the information about the user was stored; an example is shown in Table 2.2. The images table contained information about the images used in the test; apart from the filename it also contained different statistical measures such as the mean of the image and various histograms. These histograms were used for calculating the similarities of the images in the experiment. I tried several different histograms but ended up using a 3D RGB histogram with 8 bins/channel, as it was fast and accurate enough for my experiment. The fields of the images table can be seen in Table 2.3, and the complete database can be reconstructed by importing the eyetracker.sql file.

¹ http://www.mysql.com
² http://www.php.net

Name    | Gender | Age | Japanese | Times
Andreas | male   | 25  | no       | 1.232, 3.223, ...

Table 2.2: Structure of SQL table userdata

Filename    | Category | ImageMean | RGB-histogram        | HSV-histogram        | ISH-histogram
sadf832.jpg | animal   | 0.63      | 0.00231, 0.0152, ... | 0.0581, 0.0007, ...  | 0.0094, 0.0111, ...

Table 2.3: Structure of SQL table images

PHP

The experiment starts by collecting information about the user, as shown in Table 2.2. Initially the user is shown a picture centered on the screen for 3 seconds, then the screen goes blank for 1.5 seconds, and after that a 7x4 grid of images is shown to the user, see Figure 2.1. The user is then asked to locate the initial image in the grid and click on it. After clicking on the image a new target image is shown, and the process is repeated 5 times in each category, for a total of 75 times per user. The target image and the images in the grid are not chosen randomly; instead the 5 target images for each category are chosen as the most popular ones in the given category.

The images in the grid are chosen differently: for each target image an array of images is sorted by similarity to the target image. For every odd session the grid is made up of images as similar to the target image as possible, while for every even session the grid is made up of alternating similarities, thus maximizing the contrast between the images. To calculate the similarities between the pictures I used the RGB histograms and calculated their differences using the histogram intersection algorithm [Swain and Ballard, 1991]. See Figure 2.2 for a visual description of the high and low contrast sessions, and the code in Appendix B.1 and Appendix B.2.

Figure 2.1: Target image and Grid of images

Figure 2.2: Concept showing high(left) and low(right) contrast between images
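As a complement to the PHP code in Appendix B.2, the similarity measure can be sketched in Matlab as follows. This is a minimal illustration, not the thesis implementation; the function names and the assumption that the histograms are normalised to sum to 1 are mine (in Matlab the two functions would live in separate .m files).

% Histogram intersection distance between two 3D RGB histograms with
% 8 bins per channel (cf. distance()/histInt() in Appendix B.2).
% Assumes both histograms are normalised so that their bins sum to 1.
function d = histIntersectionDistance(h1, h2)
    common = sum(min(h1(:), h2(:)));   % mass shared by the two histograms
    d = 1 - common;                    % 0 = identical, 1 = completely disjoint
end

% Building such a histogram from an RGB image with values in 0..255:
function h = rgbHist8(im)
    q = min(floor(double(im) / 32) + 1, 8);                % quantise to 1..8 per channel
    lin = sub2ind([8 8 8], q(:,:,1), q(:,:,2), q(:,:,3));  % linear bin index per pixel
    h = reshape(accumarray(lin(:), 1, [512 1]), [8 8 8]);
    h = h / sum(h(:));                                     % normalise to sum to 1
end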

2.1.4 Eyetracker

The eyetracker used in the experiment is a NAC EMR-8B that samples data at 60 Hz. It can handle 2 camera modules, one for each eye, and a DSP (Digital Signal Processor) unit. I used 1 camera for my experiments; the camera module and the DSP can be seen in Figure 2.3(a) and Figure 2.3(b). The camera module has an infrared light source and an infrared image sensor. The camera module feeds a thresholded image to the main DSP unit, which transforms the image into a gaze vector that is then translated into screen coordinates for the output data stream, see Figure 2.3(c) and Figure 2.3(d) for the output of the camera. The output from the eyetracker is a video stream, showing the user's desktop plus a marker for the eye position and a time bar showing the time in hh:mm:ss:ff format, and serial data containing the frame number and eye coordinates.

(a) The camera module of the eye tracker

(b) The DSP unit for the eye tracker

(c) The IR image from the camera

(d) The thresholded IR image from the camera

Figure 2.3: The eyetracker and its output

2.1.5 Calibration

Whenever a new user is to use the eyetracker equipment, a calibration needs to be performed to ensure that the eye coordinates can cover the entire screen and that they actually represent where the user is looking. The calibration is done by having the user focus on 9 separate points on the screen, marking with the click of a button when he/she is focusing on each point. To confirm that the calibration was successful the user is told to follow the mouse cursor; if the eye position on the TV matches the mouse cursor then the calibration was successful. Since the camera is so closely zoomed in on the pupil, this system is quite sensitive to body movements from the user. This sometimes causes the eye to end up outside the view of the camera, which leads to unusable data.

2.2 Experiment

The test was performed on 2 separate computers, where computer no. 1 was running the web server and the web browser and computer no. 2 received the video and serial data from the eyetracker. For each test a pair of files was saved: an avi file and a plain text file containing the serial output information such as frame number and the x,y position of the eye. The video and serial data can be found in the data folder. The screen that the user is watching has a native resolution of 1440x900@75Hz; this video output is scaled in the eye tracker to 640x480@60Hz interlaced, see Figure 2.4. This change in aspect ratio is compensated for before the data is used.

2.3 Data partitioning

The files collected so far contain data from the start of the experiment to the last second, but the only really interesting parts for this experiment are the data for each session when the user is scanning the grid of images. So before we can actually use the data it needs to be extracted and partitioned into a usable data structure.

2.3.1 Picking out the sessions

As mentioned earlier, each experiment has 75 sessions, and for each session we want to extract the data recorded while the grid is shown. This could be done by simply going through all the video files manually, writing down the start and finish time for each session. This process would be very time consuming, and most of all boring, so instead I decided to use Matlab to extract the time information automatically.

Figure 2.4: Output captured by the computer

Finding the start/end

Matlab has built-in functions to read movies, but their use is very limited since a movie is loaded uncompressed into memory, consuming too much memory to be effective. My solution was to use videoIO, which works as a bridge connecting Matlab with the well known video decoding/encoding library ffmpeg. As opposed to Matlab's built-in video functions, videoIO creates a video object which reads 1 frame at a time, reducing memory usage drastically at the cost of slightly increased overhead. The program goes through each frame, extracts the top and bottom rows of the grid and joins them into one image. The reason I only work on the top and bottom rows is to avoid false indications when the target image is shown in the center of the screen. The number of objects in the cropped image is computed using Matlab's bweuler function, and a variation coefficient is calculated as well, see Equation 2.1, where i is the cropped image. The number of objects and the variation coefficient are compared to the previous frame, and by trial and error I found when the difference between the two frames was 'big enough' to indicate the start or end of a session; the whole frame and a crop around the time bar are then saved to jpeg files for further processing, see Figure 2.5. The Matlab code can be found in Appendix C.1.

$$\text{variation coefficient} = \frac{\sigma(i)}{\bar{i}} \qquad (2.1)$$

Figure 2.5: Saved screenshot with the cropped time bar image
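A minimal Matlab sketch of the per-frame boundary test described above (the actual implementation is extractTimecode.m in Appendix C.1). The thresholding with graythresh/im2bw and the tolerance values objTol and cvTol are illustrative assumptions, not taken from the thesis code.

% Flags a frame as a possible session start/end by comparing it to the
% previous frame. 'frame' and 'prevFrame' are the joined top+bottom grid rows.
function boundary = isSessionBoundary(frame, prevFrame, objTol, cvTol)
    cv    = @(i) std(double(i(:))) / mean(double(i(:)));      % Equation 2.1
    nObj  = bweuler(im2bw(frame, graythresh(frame)));         % object count, current frame
    nPrev = bweuler(im2bw(prevFrame, graythresh(prevFrame))); % object count, previous frame
    boundary = abs(nObj - nPrev) > objTol || ...
               abs(cv(frame) - cv(prevFrame)) > cvTol;
end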

Manual removal

During the capture process some problems could, and did, arise, which resulted in some sessions not getting their start/end times extracted correctly. Another problem with the above-mentioned function was that the video was interlaced, so sometimes when a session started or ended the algorithm detected false positives. To remedy the latter I wrote the program pruneScreenshots.m, which goes through all the extracted screenshots from each .avi file and deletes a screenshot if it is closer than 5 frames to the previous one. Fixing the problem of missing sessions in the extracted screenshots was a laborious task where I ended up not trusting the computer; instead I visually inspected the screenshots, removing incomplete pairs missing either a start or an end, and noting which session numbers were missing. The session numbers were then corrected by hand in Matlab.

Extracting the time

After the previous function is finished we have a set of pairwise images just waiting to be converted into frame numbers. Every time bar image is chopped up into 8 small images, each containing 1 character of the time bar. These subimages are then sent to a very basic character recognition function that compares each image to a set of known images; the code can be seen in Appendix C.2. The resulting numbers are then concatenated and transformed into a frame number. Now every session has 2 frame numbers associated with it, one start and one end frame.
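The character recognition can be sketched as simple template matching. This is only an illustration of the idea in getTime.m (Appendix C.2); the variable names templates and labels, and the sum-of-squared-differences criterion, are assumptions.

% Recognize one time bar character by comparing it to known reference images.
% 'templates' is a cell array of reference images of the same size as subImg,
% 'labels' holds the corresponding characters.
function c = recognizeChar(subImg, templates, labels)
    err = zeros(1, numel(templates));
    for k = 1:numel(templates)
        diff = double(subImg) - double(templates{k});
        err(k) = sum(diff(:).^2);          % sum of squared differences
    end
    [~, best] = min(err);                  % closest template wins
    c = labels(best);
end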

2.3.2 Final steps

Preparing eye tracker coordinates

Now we have a set of start/end frame numbers for each user. These frame numbers are used to pick out the eye coordinates from the serial data text files, where each line starts with the frame number. The code is pretty straightforward and can be seen in Appendix C.3. As mentioned before, the eye tracker data had been scaled down from 1440x900 to 640x480, and since this is a change of aspect ratio I decided to scale the eye tracker coordinates back to 1440x900, which is done using the scaleTrackerData.m file.
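The rescaling itself amounts to multiplying each coordinate by the ratio between the native and the recorded resolution. A one-line Matlab sketch, where xy is an assumed N-by-2 matrix of [x y] samples (the thesis code is scaleTrackerData.m):

% Undo the eyetracker's 1440x900 -> 640x480 scaling, restoring screen coordinates.
xyScreen = [xy(:,1) * 1440/640, xy(:,2) * 900/480];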

After visually inspecting the avi files recorded in the experiment, one can see that for some users the calibration was not 100% accurate. This can be seen clearly when the user is looking at the target image before the grid is shown and the average center coordinate is far off from the image's coordinates, see Figure 2.6. So I went through all the video files and measured an average translation that would center the tracker data. Each user's tracker data was then translated based on these measurements, resulting in hopefully well calibrated, accurate data; see Appendix C.4 for how my samples were shifted.

Figure 2.6: Eyetracker coordinates off-centered

Putting it together

At this point I had three different structure variables in Matlab, each containing related data about the user and data from the experiment. The final step of the data preparation is to put all the gathered data together in one usable structure. The resulting structure variable contains fields for the filename of the captured avi, information about the user, and a structure that contains the eye tracker data for each session. This structure can be found in the file


Chapter 3

Analysis

3.1 Fixation detection

Even after partitioning the eyetracker data, we are still faced with too much data to handle and analyse. A common way to reduce the data is to utilize the fact that a scanpath consists of two components, namely saccades and fixations. Saccades are quick, simultaneous movements of both eyes in the same direction, as opposed to fixations, which are the short breaks when the eye is stationary. When the eye is not fixating it is performing a straight saccadic movement to the next fixation, see Figure 3.1. So what we will do is find all fixations in the scanpath and use them as a representation of our data; the reason we can do this is that little or no visual processing is achieved during a saccade [Fuchs, 1971].

Figure 3.1: The principal components of a scanpath.

3.1.1 Algorithm

There are a few different algorithms to detect fixations [Salvucci and Goldberg, 2000], and I decided on a dispersion-based algorithm. In eye movement research the minimum threshold time for a fixation is usually between 100-200 ms; for my algorithm I used 150 ms as the threshold. The dispersion can be seen as the spread of the points, and the threshold is commonly the circular area covered by 0.5-1 degrees of visual angle from the viewer. In my algorithm I used 1 degree as the threshold, which was converted into a pixel distance using the measured distance between the screen and the user. Below is the pseudo code of the algorithm; the Matlab implementation can be seen in Appendix C.6. For this experiment I did not save the duration of the fixations; for other experiments that might be interesting, and it could easily be added in idt.m.

Algorithm 3.1.1: IDT(trackerData, dispersionThreshold, durationThreshold)

  while there are still points left do
    Initialize a window over the first points to cover the duration threshold
    if dispersion of window points <= dispersionThreshold then
      Add additional points to the window until dispersion > dispersionThreshold
      Note a fixation point at the centroid of the window points
      Remove the window points from the remaining points
    else
      Remove the first point from the remaining points
  return (fixation points)
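A minimal Matlab sketch of the I-DT procedure above. This is not the idt.m implementation from Appendix C.6; it is a simplified version to show the structure of the algorithm. At 60 Hz the 150 ms duration threshold corresponds to 9 samples.

% points: N-by-2 matrix of [x y] eye coordinates; durThr in samples; dispThr in pixels.
function fixations = idtSketch(points, durThr, dispThr)
    fixations = [];
    i = 1;
    while i + durThr - 1 <= size(points, 1)
        j = i + durThr - 1;                                  % window covering durThr samples
        if dispersion(points(i:j, :)) <= dispThr
            % grow the window while the points stay within the dispersion threshold
            while j < size(points, 1) && dispersion(points(i:j+1, :)) <= dispThr
                j = j + 1;
            end
            fixations(end+1, :) = mean(points(i:j, :), 1);   % fixation at window centroid
            i = j + 1;                                       % remove window points
        else
            i = i + 1;                                       % remove the first point
        end
    end
end

function d = dispersion(w)
    % dispersion = (max x - min x) + (max y - min y), as in Salvucci and Goldberg (2000)
    d = (max(w(:,1)) - min(w(:,1))) + (max(w(:,2)) - min(w(:,2)));
end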

3.1.2 Fixation detection result

After performing the fixation detection, the reduction in the amount of data can be seen by comparing the red circles in Figure 3.2, which shows a part of the grid before and after fixation detection. This reduction removes the temporal relation within the data, but for my research I am more interested in which image the user is gazing at, and in what order, than in how long they are doing so.

Figure 3.2: A part of the grid showing before and after the fixation detection

3.2 Fixation analysis

3.2.1 Average number of fixations

The average number of fixations is considered to be a measure of how complex or difficult a task is. So what I have done here is to create a 75x1 vector, Avg_NoOffFixations_PerSession.mat, which contains the average number of fixations per session. This gives us a value for how difficult the different sessions are on average. As can be seen in Figure 3.3, there are a few interesting peaks which might be worth further investigation; the reason I did not include the peak at session 72 was due to some problems during the experiment, when the web browser crashed twice for just this session.


Figure 3.3: Graph showing the average number of fixations/scene

Sessions of interest

From visual inspection of Figure 3.3 I picked out 15 sessions of interest based on their high number of fixations. In Table 3.1 below I have collected the sessions and their characteristics. The images in the table are shown in Appendix A.

Sessions | Category | Similar pictures | Duplicates | First Session | Last Session | Contrast

6 Beach x x x low
7 Beach x x high
8 Beach x x low
9 Beach x x high
10 Beach x x x low
16 Claude Monet x x low
20 Claude Monet x x low
21 Andy Warhol x x low
23 Andy Warhol x x low
30 Dog x low
40 Food x low
46 Doll x x x low
50 Doll x x x low
60 Lion x x low
63 Garden x low

Table 3.1: Characteristics of the sessions

The first five rows in the table are sessions 6-10, which are all the sessions in the beach category. Inspecting the images in the beach category shows that it contains many duplicate images and that the non-duplicates are very similar to the target image. We can also see other interesting patterns, for example that 5 of the above sessions are the first session in their category and that 6 of them are the last session in their category. Apart from the beach category, all of the sessions in Table 3.1 are low contrast sessions, which is interesting because it could be interpreted as it being more complex to find the correct image when its neighboring images are similar to the target image. The question of how a high and low contrast environment affects the scanpath is investigated further in Subsection 3.2.5.

3.2.2 Average scanpath length

The length of a scanpath is the distance that the eye moves while scanning a session, also called the saccadic distance. Since the saccadic movements between fixations are more or less straight lines, I defined the length of a scanpath as the summed pixel distance between all fixations in a session. The file Avg_Scanpath_Length_PerSession.mat is a 75x1 vector containing the average number of pixels the eye travels per session. The graph showing the average scanpath length per session can be seen in Figure 3.4. If we compare that graph to the graph of the average number of fixations per session, Figure 3.3, we can see that they look quite similar. In Figure 3.5 I have plotted how the scanpath length depends on the number of fixations, and we can see that the scanpath length seems to be linearly dependent on the number of fixations.
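For a single session the scanpath length defined above is a one-liner in Matlab, where fixPts is an assumed N-by-2 matrix of fixation coordinates from the fixation detection:

% Summed pixel distance between consecutive fixations in one session.
scanpathLength = sum(sqrt(sum(diff(fixPts, 1, 1).^2, 2)));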


Figure 3.4: Average scanpath length per session


Figure 3.5: Scanpath length depending on number of fixations

3.2.3 On-target fixation

The on-target fixation is the ratio between the number of fixations on the target image and the total number of fixations. This is a value between 0 and 1, where theoretically 0 means that the person never fixated on the target image and 1 means that the person only fixated on the target image. Avg_OnTarget_Fixation_PerSession.mat contains a 75x1 vector with the average on-target fixation per session, and the graph can be seen in Figure 3.6. In Figure 3.7 I have plotted the histogram of the average on-target fixation. I find the result interesting, since I can't see any relation to the average scanpath length or the average number of fixations. Looking at the distribution we can also see that most of its weight is close to 0, and from Figure 3.6 we can see that there are even sessions with an average on-target fixation of 0. I see three reasons why the average on-target fixation can be 0:

1. The calibration of the eyetracker is not centered properly, so the recorded coordinates miss the target even though the eye actually fixated on the correct image at least once.

2. The fixation detection is done improperly, with too long a time threshold and/or too small a dispersion threshold.

3. The eye doesn't need to focus on an image to decide whether it is the correct image or not, but relies on peripheral vision.


Figure 3.6: Average on-target fixations per session


Figure 3.7: On-target distribution of sessions

3.2.4 Placement of initial and second fixation

The initial fixation is the cell the user fixates on first, and the secondary fixation is where the user fixates after the first one. FixationPosition_First.mat and FixationPosition_Second.mat are both 7x4 matrices. Each cell in the matrices corresponds to an image cell in the grid and contains the sum of all fixations over all users and sessions. This might tell us what part of the grid catches the user's attention, and in what order. I have arranged the data in Tables 3.8(c)-3.8(d), showing where in the grid the users look first and second. Each table is accompanied by a 3D plot showing the same data, Figures 3.8(a)-3.8(b).

(a) 3D view of initial fixation distribution
(b) 3D view of secondary fixation distribution

(c) Amount of initial fixations per grid cell (7x4 grid, one line per grid row):
  1  10  37 127  26   3   5
  3  30 195 444 153  30   8
  4   8  42  47  17  18   7
  0   1   1   1   4   2   2

(d) Amount of secondary fixations per grid cell:
  6  27  69  71  42  15   3
 12 157 203 180 105  53   8
  6  36  90  59  27  33   4
  3   4   0   1   1   0   0

Figure 3.8: 3D view of initial and secondary fixation distribution

As can be seen in Table 3.8(c) and Figure 3.8(a), most initial fixations are heavily concentrated around the center of the grid. The second fixations, Figure 3.8(b) and Table 3.8(d), are also highly centered, though there is a stronger tendency towards a horizontal movement to the left than towards vertical or diagonal movements. When using search engines the most preferred result is usually shown in the upper left corner, which might be a reason why the secondary fixations show a tendency to move towards the left as well. One possible reason that the first fixations are centered like this could be that the user was led towards the center of the screen before, when the target image was shown. Since there is no other stimulus before the grid is shown, the eye has a tendency to remain stationary, which my results show.

3.2.5 High vs. low contrast

As mentioned earlier, the images shown in the sessions alternate between a high and a low difference between neighboring images. When I initially planned the experiment I was curious about the difference between showing images in a high and a low contrast environment. I was even quite sure that there would be a significant difference in the time needed to find the right image. The files Odd_Time.mat and Even_Time.mat contain vectors of size 744x1 and 499x1. These vectors hold the time used for each session in low and high contrast respectively. To calculate the time I used the original data from the eyetracker and divided the number of samples by 60 to get the time in seconds. The reason the low contrast vector (Odd_Time) has more data is that each category has 5 sessions, of which 3 are low contrast and 2 are high.

In Figure 3.9 I have plotted the probability density distribution of the odd and even sessions, and surprisingly the distributions are almost identical. This is interesting when compared to the result in Subsection 3.2.1, where the sessions with an exceptionally high number of fixations belonged to low contrast sessions. There is, however, a small difference between high and low contrast sessions when looking at the average time: the average time for low contrast is 3.9347 s and 3.6831 s for high contrast. This means that the high contrast sessions on average take 6.4% less time, which might well explain the results in Subsection 3.2.1.


Figure 3.9: Comparing the distribution of odd and even sessions

3.2.6 Transition probability

If we see a session as a stochastic process, we define the session as X(t), where X is a stochastic variable, X : t → G, and G is the 7 × 4 grid of images. We can describe the elements of G as g_k. If s_n ∈ G are the states at time n, we can use this to characterise X by its transition probabilities,

$$p_{i,j} = P(X_{n+1} = j \mid X_n = i).$$

This gives us the quadratic transition matrix of dimension n × n, where n is the number of possible states:

$$P = \begin{pmatrix} 1\to1 & 2\to1 & \cdots & n\to1 \\ 1\to2 & 2\to2 & \cdots & n\to2 \\ \vdots & \vdots & \ddots & \vdots \\ 1\to n & 2\to n & \cdots & n\to n \end{pmatrix}$$
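A sketch of how such a matrix can be estimated from the fixation data in Matlab. Here states is an assumed vector of grid-cell indices (1..28) visited by a scanpath; in a real run, transitions across session boundaries would have to be excluded, and the resulting matrix corresponds to TransitionMatrixForAll.mat.

% Count transitions i -> j and normalise each column to sum to 1.
nStates = 28;
P = zeros(nStates);
for k = 1:numel(states) - 1
    P(states(k+1), states(k)) = P(states(k+1), states(k)) + 1;
end
colSums = sum(P, 1);
colSums(colSums == 0) = 1;                 % avoid division by zero for unused states
P = P ./ repmat(colSums, nStates, 1);      % each column now sums to 1 (or 0 if unused)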

Impact of distance on fixation location

In Subsection 3.2.4 I investigated where the initial and secondary fixations were placed; in this subsection I will use the transition probabilities to see if there is a relation between the probability of moving between two cells and their distance from each other. Using fixation data from all the available sessions I created the matrix TransitionMatrixForAll.mat. It is a 28 × 28 matrix and each column sums to 1.0. I also created the 28 × 28 matrix DistanceMatrix_forGRIDelements.mat, which contains the city block distance between the corresponding cells in TransitionMatrixForAll.mat. With these two matrices I was able to plot the probabilities based on the distance between the cell pairs. Figure 3.10(a) shows all the data points, while Figure 3.10(b) shows the average probability for each city block distance.
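The distance matrix and the averaging behind Figure 3.10(b) can be sketched as follows. The column-wise numbering of the cells is an assumption; the ordering in DistanceMatrix_forGRIDelements.mat may differ. P is the 28 × 28 transition matrix from the previous sketch.

% City block distances between all pairs of cells in the 7x4 grid,
% and the average transition probability per distance.
[col, row] = meshgrid(1:7, 1:4);
cells = [row(:), col(:)];                                 % 28 x 2 list of (row, col)
D = abs(bsxfun(@minus, cells(:,1), cells(:,1)')) + ...
    abs(bsxfun(@minus, cells(:,2), cells(:,2)'));         % 28 x 28 city block distances

dists = unique(D(:));
avgP = zeros(size(dists));
for k = 1:numel(dists)
    avgP(k) = mean(P(D == dists(k)));                     % average probability per distance
end
plot(dists, avgP, 'o-');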


(a) Transition probabilities based on distance, point plot


(b) Transition probabilities based on distance, average probability for each distance

Figure 3.10: Transition probabilities based on distance between cells

We can clearly see in Figure 3.10(b) that the probability looks like it follows a half-normal distribution, telling us that it is more probable that the eye's next fixation will be in a region close to the previous fixation than further away.

3.3 Statistical comparison

For my experiment there are a few questions that I would like to have answered with statistical significance. When comparing two scanpaths one could of course compare them visually, but this non-scientific approach is prone to many errors due to the researcher's expectations and the lack of computations backing up the researcher's statements [Feusner and Lukoff, 2008]. A common method to compare scanpaths to each other is to use pairwise sequence alignment [Josephson and Holmes, 2002][Pan et al., 2004].

3.3.1 Pairwise sequence alignment

Sequence alignment is a method mostly used in the field of bioinformatics, where it is used for example to identify similarities between different DNA or protein sequences. There are many different algorithms to perform sequence alignment, and for my research I have implemented a method that calculates the distance/cost of transforming one scanpath into another. To be able to use this algorithm we first have to modify our data so that it can be represented by a string.

Data modification

Our fixations are currently defined as points with x- and y-coordinates, but since the experiment shows many images we are more interested in which image the user is looking at than in the screen coordinates. So the grid has been divided into a set of ROIs (Regions Of Interest), where each region is given a label, which can be seen in Table 3.2. Using this table we can transform our fixation points into a string by checking in which ROI each fixation lies and appending that label to a fixation string; the Matlab code can be seen in Appendix C.5. After the algorithm is finished each session is described as a string, and a whole new toolbox becomes available, allowing us to use algorithms similar to the ones used in bioinformatics to compare our scanpaths.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b

Table 3.2: Region-Of-Interest defined
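A minimal Matlab sketch of this conversion (the thesis implementation is makeString.m in Appendix C.5). The assumptions that the 7x4 grid covers the whole 1440x900 screen and that the labels in Table 3.2 are assigned row by row, left to right, are mine and may differ from the actual code.

% Convert fixation coordinates into an ROI string. fixPts is an N-by-2 [x y] matrix.
labels = ['A':'Z', 'a', 'b'];               % the 28 ROI labels from Table 3.2
cellW = 1440 / 7;
cellH = 900 / 4;
str = '';
for k = 1:size(fixPts, 1)
    c = min(max(ceil(fixPts(k,1) / cellW), 1), 7);    % grid column of the fixation
    r = min(max(ceil(fixPts(k,2) / cellH), 1), 4);    % grid row of the fixation
    str(end+1) = labels((r - 1) * 7 + c);             % append the corresponding label
end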

Edit distance

The edit distance is a measure of how similar two strings are, in the sense that it outputs a cost for transforming one string into another [Josephson and Holmes, 2002]. The transformation is done using 3 different operations, insertion, deletion and substitution, where each operation has a cost. In its simplest form each operation has a cost of 1. My approach is to use a dynamic programming algorithm, which can be seen in Appendix C.7.

Substitution cost matrix

As mentioned earlier, the most primitive solution is to set all edit costs to 1, which feels intuitively wrong for my research. My regions of interest are cells in a grid, see Table 3.2, so it seems more appropriate to set the substitution cost based on the distance to the cell being substituted. For example, the cost of substituting an 'A' with a 'B' should be lower than the cost of substituting an 'A' with a 'U'. So for the substitution cost I created a 28x28 matrix containing the city block distances between all cells, which was then normalised between 0 and 1. The substitution matrix can be found in DistanceMatrix_forGRIDelements.mat; a part of it is shown in Table 3.3.

      A      B      ...  Z      a      b
A     0      0.111  ...  0.777  0.888  1
B     0.111  0      ...  0.666  0.777  0.888
...
Z     0.777  0.666  ...  0      0.111  0.222
a     0.888  0.777  ...  0.111  0      0.111
b     1      0.888  ...  0.222  0.111  0

Table 3.3: Part of the normalised substitution cost matrix
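The dynamic programming edit distance with this distance-based substitution cost can be sketched in Matlab as follows. This is an illustration, not the editDist.m implementation from Appendix C.7; insertion and deletion costs of 1 are assumed.

% s1, s2: ROI strings; labels: the 28-character alphabet from Table 3.2;
% subCost: 28x28 normalised city block distance matrix used as substitution cost.
function d = editDistSketch(s1, s2, labels, subCost)
    n = length(s1); m = length(s2);
    D = zeros(n + 1, m + 1);
    D(:, 1) = (0:n)';                    % cost of deleting all of s1
    D(1, :) = 0:m;                       % cost of inserting all of s2
    for i = 1:n
        for j = 1:m
            sc = subCost(labels == s1(i), labels == s2(j));   % 0 when the labels match
            D(i+1, j+1) = min([D(i,   j+1) + 1, ...           % deletion
                               D(i+1, j  ) + 1, ...           % insertion
                               D(i,   j  ) + sc]);            % substitution
        end
    end
    d = D(n + 1, m + 1);
end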

3.3.2 Permutation test

With the tools mentioned above we can now compare 2 scanpaths to each other, but this alone is not a good method to compare groups of scanpaths. An example of comparing groups would be to compare the scanpaths of women and men to see if they scan the page differently.

Recent research has proposed a statistical method to compare groups of scan patterns [Feusner and Lukoff, 2008]. The proposed method is based on a permutation test that gives the probability of the proposed null hypothesis being true. A null hypothesis can for example be: "Japanese people scan a set of images the same way as Europeans." The way to get the p-value is to calculate a reference distance for the experiment grouping and compare it to the distances obtained for all possible regroupings. So for each test we start by grouping the test candidates into the experimental groups and calculate the reference distance as in Equation 3.1:

$$d^* = d_{between} - d_{within} \qquad (3.1)$$

d_between and d_within are both average string edit distances, but computed over different groupings of the elements; see Figure 3.11 for a graphical explanation.

(a) d_between is the average distance between users of different groups
(b) d_within is the average distance between users within the same group

Figure 3.11: Graphical explanation of the distances used in the permutation test

When this is done for the experiment grouping we have our reference distance $d^*_{ref}$. To get the p-value we examine all possible groupings of the users and calculate a $d^*$ for each of them, and then take the ratio between the number, ℵ, of groupings which have a $d^*$ at least as large as the reference $d^*_{ref}$ and the number of all groupings, Equation 3.2:

$$p = \frac{\aleph(d^* \geq d^*_{ref})}{\aleph(d^*)} \qquad (3.2)$$

The code for my implementation of the permutation test, and instructions on how to use it, can be seen in Appendix C.8. Even for relatively small groups the number of possible groupings can grow prohibitively large; if we have one group of size n and one of size m, then the number of possible groupings is

$$\frac{(n+m)!}{n!\,m!} \qquad (3.3)$$
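The structure of the test can be sketched in Matlab as below. This is not Ptest.m from Appendix C.8: the helper names are mine, distFun is any pairwise scanpath distance (e.g. the edit distance above), each group is assumed to have at least two members, and d_within is taken here as the mean of the two groups' within-group averages, which may differ in detail from the thesis implementation.

% Permutation test for two groups of scanpath strings (cell arrays groupA, groupB).
function p = permTestSketch(groupA, groupB, distFun)
    scans = [groupA(:); groupB(:)];
    nA    = numel(groupA);
    nTot  = numel(scans);

    refD   = dStar(scans, 1:nA, distFun);        % d* for the experimental grouping
    splits = nchoosek(1:nTot, nA);               % every possible regrouping
    ds     = zeros(size(splits, 1), 1);
    for k = 1:size(splits, 1)
        ds(k) = dStar(scans, splits(k, :), distFun);
    end
    p = sum(ds >= refD) / numel(ds);             % Equation 3.2
end

function d = dStar(scans, idxA, distFun)
    % d* = average between-group distance minus average within-group distance
    idxB = setdiff(1:numel(scans), idxA);
    dBetween = avgDist(scans, idxA, idxB, distFun);
    dWithin  = (avgDist(scans, idxA, idxA, distFun) + avgDist(scans, idxB, idxB, distFun)) / 2;
    d = dBetween - dWithin;
end

function d = avgDist(scans, I, J, distFun)
    vals = [];
    for i = I
        for j = J
            if i ~= j
                vals(end+1) = distFun(scans{i}, scans{j}); %#ok<AGROW>
            end
        end
    end
    d = mean(vals);
end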

3.3.3 Japanese vs. Non-Japanese

One question that I wanted to answer was whether there is a significant difference in how Japanese people scan a set of images compared to Europeans. A reason for such a difference could be the fact that Japanese is traditionally read from the top-right corner downwards, as opposed to most 'western' languages that are read from the top-left corner to the right, see Figure 3.12 for a graphical explanation. The permutation method was used to do the statistical comparison with the null hypothesis H0:

H0 = Japanese scan a set of images in the same way as Europeans

Figure 3.12: Reading directions for Japanese vs. European languages

It should be mentioned that the distribution of Japanese and non-Japanese participants is far from equal, with 14 Japanese and 3 Europeans; this could mean that the result reflects individual differences rather than group differences. If I were to redo this experiment I would rather have more users and fewer sessions. Since each user has 75 sessions, I decided to do the permutation test for each session separately, not mixing different sessions. This resulted in a 75x1 vector with a p-value for each session, P_Value_JapNonJap_PerSession.mat, and I took the average p-value as the p-value for the null hypothesis.

The p-vector is shown in Figure 3.13; we can see that the p-values for the different sessions range from 0.01 to 1. As in all statistical tests, p-values less than 5% are regarded as a good significance level for rejecting the null hypothesis. The average p-value for the whole test was 41.6%, which fails to reject the null hypothesis that the two sets come from the same distribution. In other words, we cannot statistically say that there is a difference between how Japanese and Europeans scan a set of images presented in the way they are in this experiment.

Figure 3.13: P-value per session, Japanese vs. Non-Japanese

Chapter 4

Conclusions

4.1 Result

The research I have done has primarily had the goal of investigating whether there are any characteristics in how people scan a set of images while searching for a specific one. After doing the analysis I have come to a few conclusions, which are presented in their respective subsections below. They should however be taken with a grain of salt, as there are many uncertainties in the experiment and there are questions which should be studied more in depth.

4.1.1 Amount and placement of fixations

Analysing the properties of fixations in an eyetracker protocol is quite a big field that can say many things about the scene, its complexity and its content. I found it interesting that the placement of the first fixation was strongly influenced by where the user was looking before the images were shown. This could be used by image search engines, for example, to decide where to put the "search" button and the search-text input field. In other words, according to my results, it would be good to place these key items close to where the most relevant search result will be shown.

4.1.2 Scanning pattern

The way the eye moves when scanning can also be used to decide how to present a set of images in a search environment. While scanning the images, the user is more likely to fixate on images that are close to the previous fixation, with a slight preference for horizontal scanning over vertical or diagonal scanning. If we apply this to the image search engine, the efficiency of the search might be enhanced by placing the highly rated images close to each other instead of placing them from left to right, one row at a time.

4.1.3 High/Low contrast environment

It was slightly faster to find the correct image when there was a big difference between the neighboring images, e.g. dark pictures placed next to light pictures. This could be used to reduce the time the user needs to find a specific image, but the result might be different if the user doesn't have a specific image in mind when using an image search engine.

4.1.4 Statistical comparisons on groups of scanpaths

The statistical comparison of groups of scanpaths seems interesting, though it is hard to confirm how well it performs since I did not have any data with known scanning characteristics. The biggest drawback of this method is that it becomes terribly slow as the size of the groups grows, though it might be possible to randomly pick a subset of the permutations to reduce the amount of computation needed.


4.2 Lessons learnt

Working on a single project for a longer period is definitely a learning experience. I will try to collect most of the bigger ’lessons learnt’, and things I might have done differently if I were to redo the experiment, in the following subsections.

4.2.1 Gathering high quality data

When gathering the data it was hard to imagine the importance of getting good quality data that was well calibrated and correct throughout the whole experiment. The camera focusing on the eye was so zoomed in that even the slightest movement could move the eye outside the camera's range, which would result in corrupt data. Another very common, although quite specific, problem was eye lashes: for a person with naturally narrow eyes, the eye lashes might cover parts of the pupil and interfere with the thresholding of the image, producing jerky, fake eye movements in the data.

Possible solutions to some of the problems would be to put the camera further away or change the lens, so that the person's eye can move a little without leaving the view of the camera. A possibility might also be to secure the head to the head rest instead of just resting it against it, as I did in my experiment. I would also have added some calibration points to be recorded with the data, so that if the data is not calibrated correctly one could more easily measure how much all points should be translated to center the coordinates.

4.2.2 Number of users and user information

The average time for an experiment session was around 30 minutes per user, of which 5-6 minutes were spent on setup and calibration and the rest on the actual experiment. People tended to lose focus halfway through the experiment. So if I were to do this kind of experiment again I would try to gather more users; instead of 17 users and 75 sessions I would prefer to switch the numbers and have 75 users with around 20 sessions each.

Some of the people who helped me with my test were not fluent in English, so I found that having the test instructions written in their native language made the whole experiment go more smoothly.

4.2.3 Setting specific goals

The planning part of a research project should not be underestimated. If one has clear goals on what one wants to achieve, it is much easier to shape the experiment in a way that will help reach those goals. I had a few vague goals initially, which I now feel led me slightly astray; if I had defined them in more detail and customised the experiment accordingly, I might have reached further on some questions.

4.2.4 Discuss problems

One easily gets caught up in one's own problems and in interpreting other people's work, and in my case I had a hard time discussing my problems since I didn't have anyone around who had worked on this topic before. I learnt that at those times it is good to swallow one's pride and contact other researchers in the field. For example, I was in contact with Matthew Feusner, who was able to explain some parts of one of his publications that were a bit ambiguous, which helped me out when I was stuck.

4.2.5 Write a journal during the experiment

I tried my best to take notes during the whole project, but I didn't pay attention to small details, which meant that when writing the report it was easy to forget to mention them. This made the report quite difficult to read for someone who hasn't done the project; hopefully, if you have come this far in the report, most things are clear.

4.2.6 Other lessons learnt

There are too many lessons learnt to mention them all in what is supposed to be a small part of the report. But while doing any project of this kind you get to practice your problem solving and programming skills, and most of all you learn that not getting the results you were expecting doesn't mean that the project is a failure.


4.3 Future research

Due to time constraints not all questions that arose could be investigated. I have collected a few of the interesting questions that might be addressed in future research.

4.3.1 On-target fixation

In some sessions I found that the number of on-target fixations was close to, or equal to, zero. I became curious to dig deeper into why this was: whether it was just because of some problems in the algorithms or the calibration, or whether the eye can successfully use peripheral vision to find the correct images. To go deeper into this question one would need to eliminate as many uncertainties as possible. The calibration has to be very accurate and different fixation detection thresholds should be tried. Another factor might be to extend the area around the fixation point, since the eye does not look at a fixed pixel but has a fixation area, which might depend for example on the size of the pupil.

4.3.2 Stochastic processes

I only skimmed the surface of using a Markov model in an attempt to extract different characteristics of the scanpath and the user. In Subsection 3.2.6 I used transition probabilities to check how probable a jump was depending on its distance from the previous image. A possible extension might be to create a probability matrix that also takes the directions of the movements into account. For my experiment that would mean creating a matrix covering the range of movements ±6 in horizontal movement and ±3 in vertical movement. I think there is great potential in using stochastic representations of eyetracker data, and it hasn't been explored much yet by other researchers.

4.3.3 Placement order of fixations

Since I checked the placement of the first and second fixations in Subsection 3.2.4, it might be interesting to continue with the third and fourth fixations and so on, and then use each of the 3D plots as a frame in a timelapse. It might not be the most scientific approach, but it could give an interesting visual result.


References

[Feusner and Lukoff, 2008] Feusner, M. and Lukoff, B. (2008). Testing for statistically significant differences between groups of scan patterns. In ETRA '08: Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, pages 43-46, New York, NY, USA. ACM.

[Fuchs, 1971] Fuchs, A. (1971). The saccadic system. In The Control of Eye Movements, pages 343-362.

[Josephson and Holmes, 2002] Josephson, S. and Holmes, M. E. (2002). Visual attention to repeated internet images: testing the scanpath theory on the world wide web. In ETRA '02: Proceedings of the 2002 Symposium on Eye Tracking Research & Applications, pages 43-49, New York, NY, USA. ACM.

[Pan et al., 2004] Pan, B., Hembrooke, H. A., Gay, G. K., Granka, L. A., Feusner, M. K., and Newman, J. K. (2004). The determinants of web page viewing behavior: an eye-tracking study. In ETRA '04: Proceedings of the 2004 Symposium on Eye Tracking Research & Applications, pages 147-154, New York, NY, USA. ACM.

[Salvucci and Goldberg, 2000] Salvucci, D. D. and Goldberg, J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. In ETRA '00: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications, pages 71-78, New York, NY, USA. ACM.

[Swain and Ballard, 1991] Swain, M. J. and Ballard, D. H. (1991). Color indexing. Int. J. Comput. Vision, 7(1):11-32.


Appendix A

Interesting sessions


Appendix B

PHP Code

B.1

index.php

<?php
include 'connect.php';
include 'functions.php';

/* Global variables */
$histType = "hsvHist64"; /* rgbHist64, rgbHist128, hsvHist16, hsvHist64, hsvHist128, ihs */

$contrast = (isset($_GET['contrast'])) ? $_GET['contrast'] : ''; // 0: low contrast, 1: high contrast
if (empty($contrast)) $contrast = 0;

$runNo = (isset($_GET['runNo'])) ? $_GET['runNo'] : '';
if (empty($runNo)) $runNo = 0;

$phase = (isset($_GET['phase'])) ? $_GET['phase'] : ''; // which phase of the experiment
if (empty($phase)) $phase = 0;

$debug = (isset($_GET['debug'])) ? $_GET['debug'] : ''; // send in 1 for debug output
if (empty($debug)) $debug = 0;

$timer = (isset($_GET['timer'])) ? $_GET['timer'] : '';
$uid   = (isset($_GET['uid'])) ? $_GET['uid'] : '';

$xDim = 7;               // number of pics in x-axis
$yDim = 4;               // number of pics in y-axis
$totImg = $xDim * $yDim;

/* Sets up the correct category */
$meta = setCategory($runNo);

/*
 If the time was sent, calculate the time the pictures were shown and update the DB
*/
if (!empty($timer) && !empty($uid))
{
    $deltaTime = microtime(true) - $timer;
    $deltaString = $deltaTime . "|";
    $sqlString = "UPDATE user SET expTime=CONCAT(expTime, '$deltaString') WHERE uid=$uid";
    $timeRes = mysql_query($sqlString)
        or die("something went wrong");
}

/*
 Set up the sql query and get the data
*/
$imageQuery = "SELECT * FROM images WHERE meta='$meta' ORDER BY pop DESC;";
$imagesRes = mysql_query($imageQuery)
    or die("something went wrong");

while ($line = mysql_fetch_object($imagesRes)) {
    $img_url[]  = $line->url;                     // The 'folder/filename.jpg'
    $img_meta[] = $line->meta;                    // The category name in the sqlDB
    $histString = $line->$histType;               // The histogram string of numbers
    $img_histogram[] = explode(' ', $histString); // Histogram array of numbers
    $img_pop[]  = $line->pop;                     // The popularity term
}

/*
 Removes entries, making the actual target image the first one in the array
*/
list($img_url, $img_histogram, $img_pop) =
    pruneArrays($img_url, $img_histogram, $img_pop, $runNo);

/*
 Calculate the distance from the target image
*/
$counter = 0;
foreach ($img_url as $value) {
    $distance[$counter] = distance($img_histogram[0], $img_histogram[$counter]);
    $counter = $counter + 1;
}

/*
 Sort $img_url based on $distance. This can sort by $img_url too if
 distance[x] == distance[x+1], so let's go around it!
*/
$distKeys = array_keys($distance);
array_multisort($distance, $distKeys, $img_url, $img_pop);

/*
 Put the pictures to be displayed in an array
*/
list($toShow, $pop, $dist) = pickPictures($img_url, $img_pop, $contrast, $totImg, $distance);

/*
 Now depending on which $phase we will show different things.

 $phase=0 shows only the target image for 5 seconds, then redirects the page to
 $phase=1 which shows a white blank page for x seconds and then redirects to
 $phase=2 that shows the actual image grid.

 Not the best or smartest way, and some unnecessary calculations, but it is ok
 for this experiment.
*/

if ($phase == 0)
{
    $forwardUrl = "setTimeout(\"location.href=\'index.php?runNo=$runNo&uid=$uid&contrast";
    $forwardUrl = $forwardUrl . "=$contrast&phase=1\'\",[5000]);";

    // $index = (int) fmod($runNo, 5);
    $index = 0;

    // The beginning of this assignment was cut off at a page break in the
    // original listing; the "<table id='wrapper'>" opening is a reconstruction.
    $imString = "<table id='wrapper'>
        <tr><td><center><img src='images/$toShow[$index]'></center></td></tr>
        <script language=javascript>" . $forwardUrl . "</script>";
}
elseif ($phase == 1)
{
    $forwardUrl = "<script language=javascript>setTimeout(\"location.href=\'index.php?runNo=$runNo&uid";
    $forwardUrl = $forwardUrl . "=$uid&contrast=$contrast&phase=2\'\", [1500]);</script>";
    $imString = $forwardUrl;
}
else
{
    /*
     $toShow is now filled with the pictures that are to be displayed.
     shape_grid places the images in a pre-defined, but varying, order.
     showImages builds the html code for the picture table.
     Start the timer to see how long the user searches for the image.
    */
    $outGrid = shape_grid($toShow, $contrast, $xDim, $yDim, $runNo, $pop);
    // $outGrid = $toShow;
    $timer = microtime(true);
    $imString = showImages($outGrid, $xDim, $yDim, $runNo, $distance, $pop, $contrast, $timer, $uid);
}

/*
 Create the html file and print it
*/
$returnString = "<html><head><link rel='stylesheet' type='text/css' href='style.css'>";
$returnString = $returnString . "</head><body>\n";
$returnString = $returnString . $imString;
$returnString = $returnString . "</body></html>";

/*
 Insert data into mysql db
 sqlQuery = "SELECT * FROM images WHERE uid='$uid';";
*/
print($returnString);
?>
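The script is self-forwarding: a run is started by loading index.php with the user id and run number as GET parameters, after which the page redirects itself through the phases on its own. As an illustrative sketch (the parameter values below are examples only, not taken from an actual session):

    index.php?uid=1&runNo=0&contrast=0&phase=0   (target image, shown for 5 s)
    index.php?uid=1&runNo=0&contrast=0&phase=1   (blank page, 1.5 s)
    index.php?uid=1&runNo=0&contrast=0&phase=2   (the 7 x 4 image grid)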

B.2 functions.php

<?php
/* Functions */

function distance($imhist1, $imhist2)
{
    /*
     Use a distance method: either histogram intersection,
     L2 or Kullback-Leibler. Feel free to add more dist functions.
    */
    $method = 'histInt';
    $result = array_map($method, $imhist1, $imhist2);
    $resultSum = array_sum($result);

    if ($method == 'histInt') {
        return 1 - $resultSum;
    } else {
        return $resultSum;
    }
}

/* Distance functions */

function l2dist($hist1, $hist2)
{
    $result = pow($hist1 - $hist2, 2);
    return $result;
}

function histInt($hist1, $hist2)
{
    $result = min($hist1, $hist2);
    return $result;
}

function kullback($hist1, $hist2)
{
    // add a small, small number to each bin to avoid log(0)
    $hist1 = $hist1 + 0.000001;
    $hist2 = $hist2 + 0.000001;
    $result = $hist1 * (log($hist1) - log($hist2));
    return $result;
}
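/*
 For reference: distance() combines the per-bin helpers above with array_map()
 and array_sum(), so for two histograms h1, h2 over bins i the measures are

   histogram intersection distance : 1 - sum_i min(h1[i], h2[i])
   L2 distance                     : sum_i (h1[i] - h2[i])^2
   Kullback-Leibler divergence     : sum_i h1[i] * (log h1[i] - log h2[i])

 (kullback() adds a small constant to every bin to avoid log(0)).
*/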

/*
 Sets up the list for the output grid
*/
function shape_grid($imArray, $contrast, $xDim, $yDim, $runNo)
{
    $mode = fmod($runNo, 5); // gives us 0:4 depending on the series

    switch ($mode) {
        case 0:
            for ($counter = 0; $counter < 4; $counter++)
            {
                $imArray[] = array_shift($imArray);
            }
            break;
        case 1:
            for ($counter = 0; $counter < 9; $counter++)
            {
                $imArray[] = array_shift($imArray);
            }
            break;
        case 2:
            for ($counter = 0; $counter < 20; $counter++)
            {
                $imArray[] = array_shift($imArray);
            }
            break;
        case 3:
            // The loop bound for this case was cut off at a page break in the
            // original listing; 13 below is only a placeholder value.
            for ($counter = 0; $counter < 13; $counter++)
            {
                $imArray[] = array_shift($imArray);
            }
            break;
        case 4:
            for ($counter = 0; $counter < 17; $counter++)
            {
                $imArray[] = array_shift($imArray);
            }
            break;
    }
    return array_reverse($imArray);
}

function showImages($outGrid, $xDim, $yDim, $runNo, $distance, $pop, $contrast, $timer, $uid)
{
    $runNo++;
    $contrast++;
    $contrast = fmod($contrast, 2);

    $returnString = "<table id='wrapper' border='0'><table id='imtable' cellspacing='0'";
    $returnString = $returnString . " cellpadding='10'>"; // initialize the table
    $counter = 0;

    for ($y = 1; $y <= $yDim; $y++) {
        $returnString = $returnString . "<tr>";
        for ($x = 1; $x <= $xDim; $x++) {
            $imgUrl = urlencode($outGrid[$counter]);
            $returnString = $returnString . "<td align='center' style='vertical-align:middle'>";
            $returnString = $returnString . "<a href='index.php?runNo=$runNo&contrast=$contrast&";
            $returnString = $returnString . "uid=$uid&timer=$timer'>" . "<img src='images/$imgUrl'>";
            $returnString = $returnString . "</a><br></td>";
            $counter++;
        }
        $returnString = $returnString . "</tr>";
    }
    $returnString = $returnString . "</table></table>";
    return $returnString;
}

function setCategory($runNo)
{
    switch ((int) ($runNo / 5)) {
        case 0:  $meta = 'animal';         break;
        case 1:  $meta = 'beach';          break;
        case 2:  $meta = 'car';            break;
        case 3:  $meta = 'claude%20monet'; break;
        case 4:  $meta = 'andy%20warhol';  break;
        case 5:  $meta = 'dog';            break;
        case 6:  $meta = 'flower';         break;
        case 7:  $meta = 'food';           break;
        case 8:  $meta = 'metallica';      break;
        case 9:  $meta = 'doll';           break;
        case 10: $meta = 'love';           break;
        case 11: $meta = 'lion';           break;
        case 12: $meta = 'garden';         break;
        case 13: $meta = 'fruit';          break;
        case 14: $meta = 'cat';            break;
        default:
            echo "test is finished!";
            break;
    }
    return $meta;
}

function pickPictures($img_url, $img_pop, $contrast, $totImg, $distance)
{
    /*
     Pick out the target image first
    */
    $toShow[] = array_shift($img_url);
    $pop[]    = array_shift($img_pop);
    $dist[]   = array_shift($distance);

    $counter = 0;
    if ($contrast == 0) {
        for ($counter; $counter < $totImg; $counter++) {
            $toShow[] = array_shift($img_url);
            $pop[]    = array_shift($img_pop);
            $dist[]   = array_shift($distance);
        }
    }
    else { // Here we take every other one with high and low distance
        for ($counter; $counter < $totImg; $counter++) {
            if (fmod($counter, 2) == 0) {
                $toShow[] = array_pop($img_url); // from the bottom
                $pop[]    = array_pop($img_pop);
                $dist[]   = array_pop($distance);
            }
            else {
                $toShow[] = array_shift($img_url);
                $pop[]    = array_shift($img_pop);
                $dist[]   = array_shift($distance);
            }
        }
    }
    return array($toShow, $pop, $dist);
}

function pruneArrays($img_url, $img_histogram, $img_pop, $runNo)
{
    $counter = 0;
    while ($counter < (int) fmod($runNo, 5))
    {
        $tmp = array_shift($img_url);
        $tmp = array_shift($img_histogram);
        $tmp = array_shift($img_pop);
        $counter++;
    }
    return array($img_url, $img_histogram, $img_pop);
}

/* END Functions */
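index.php includes connect.php, which is not reproduced in this appendix. A minimal sketch of what such a file would need to provide is given below, assuming the legacy mysql_* API used in the listings above; the host, credentials and database name are placeholders, not the values used in the experiment:

<?php
// connect.php - minimal sketch with placeholder credentials
$dbLink = mysql_connect('localhost', 'db_user', 'db_password')
    or die('could not connect to the database');
mysql_select_db('experiment_db', $dbLink)
    or die('could not select the database');
?>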
