Classiﬁcation of discrete stress levels in users using eye tracker and K- Nearest Neighbour algorithm

(1)

Classification of discrete stress levels in users using eye tracker and K- Nearest Neighbour

algorithm

Mirjam Bor´ en

Mirjam Bor´en Spring 2020

Master Thesis in Interaction Technology and Design, 30 credits Supervisor: Anders Broberg

Extern Supervisor: Alexandra Bj¨ornham Examiner: Ola Ringdahl

Master of Science Programme in Interaction Technology and Design, 300 credits

(2)

Abstract

The advancement of the Head Mounted Display (HMD) used for Virtual Real- ity (VR) has come a long way and now the option of eye tracking is available in some HMD. The eyes show physiological responses when healthy individuals are stressed, justifying eye tracking as a tool to estimate at minimum, the very presence of stress. Stress can present itself in many shapes and may be caused by different factors such as work, social situations, cognitive load and many others. The stress test Group Stroop Color Word Test (GSCWT) can induce four different levels of stress in users; no stress, low stress, medium stress and high stress.

In this thesis GSCWT was implemented in a virtual reality and users had their pupil dilation and blinking rate recorded. The data was then used to train and test a K-Nearest Neighbour algorithm (KNN). The KNN- algorithm could not accurately predict between the four different stress classes but it could predict the presence or absence of stress.

VR has been used successfully as a tool for practicing different social skills and other everyday life skills for individuals with Autism Spectrum Disorder (ASD).

By correctly identifying the stress level in the user in VR, tools for practicing social skills for ASD individuals could be more personalized and improved.

(3)

Acknowledgment

Thanks to Alexandra Bj¨ornham at Tensaii and to Anders Broberg at Ume˚a University for the supervision and guidance throughout this project. Also many thanks to everyone at Tensaii for the help with the tests and discussions which gave me a better understanding of human behaviour and how to present it in numbers. Thank you for the daily support which helped me getting through the toughest parts of this thesis.

(4)

1 Introduction

This thesis is an isolated part of a larger project that aims to develop a tool with the purpose of helping individuals with Autism Spectrum Disorder by preparing affected individuals for everyday challenges and certain situations they might encounter. This thesis will focus on the contribution of complementary knowl- edge on how to classify and determine the stress levels in users using eye tracking.

There are individual differences in stress responses when exposed to stress and one way to assess the stress level in an individual would be to measure it in real time. To classify not only the presence of stress but also what level of intensity a user is experiencing would increase the possibilities to adjust applications according to each user’s individual needs. Classification of stress levels could lead to a potential evaluating tool of user’s performance under stress in real time.

It would also allow dynamic systems the possibility to adjust according to its users’ stress levels in real time.

Stress can be measured through several physiological responses and eye tracking have been proven successful in understanding how stress affect users [31]. Eye tracking is a technology that allows tracking of the activity of the eyes. It can gather data such as; where the user is looking and for how long, if the user is blinking and how the pupils dilate [65]. There is however a lack of applications that can determine discrete levels of stress in users using the physiological responses of the eyes.

This thesis has explored the possibility of classifying discrete levels of stress using the physiological responses of the eyes. There are many different types of stress but the focus here has been on eye responses from cognitive stress. To understand the different stress responses, data from eye responses was collected from individuals exposed to an environment designed to induce various levels of stress. The two eye responses; pupil dilation and blinking rate was selected for this purpose and the data was collected using an eye tracker. The eye tracker used to collect data was located inside a Head-Mounted Display (HMD) and used in a Virtual Reality (VR). VR is an immersive technology where the user with the help of a HMD placed in front of their eyes can experience a computer simulated environment [43]. The classifying algorithm K- nearest neighbour (KNN) was then used in the analysis of the test user data to see if the correct discrete stress level could be predicted.

1.1 Objectives

The objective of this thesis is to evaluate if changes in pupil dilation and blinking rate can be used to predict different levels of stress using the classification algorithm KNN. Since the stress protocol used to gather the data only had been proven to work using other physiological responses than the eyes, this also indirectly becomes an evaluation of the stress protocol’s relation to eye-related

(7)

stress responses. To answer the question ”Can eye tracking in a Virtual Reality environment be used to build a KNN to predict the stress level in a user?” the project had three distinct goals;

• Implement the stress protocol and collect eye tracking data from test users.

• Build a KNN- classifier using the eye tracking data

• Validate classifier by testing if the classifier can predict the stress level This is an explorative hypothesis of a potentially new method to predict different levels of stress found in users using an eye tracker. Other studies has been made with similar intent, but no other study have tried to make a classifier using data from the eyes combined with the modified Stroop stress protocol, Group Stroop Color Word Test (GSCWT) and a KNN algorithm.

1.2 Thesis Organization

The theory is divided into three parts; ”Technology”, ”Human Anatomy and Physiological Reactions” and ”Machine Learning” which will explain the necessary background information needed to understand this work.

The ”Technology” chapter will introduce the different technologies used to develop and run the selected stress protocol. The chapter begins with an explanation of what virtual reality is and how it differs from other reality technologies.

It will describe two known problems in the area of virtual reality; motion sickness and the vergence accommodation conflict. At the end of the chapter is the software used for the development and testing together with the specific eye tracking equipment that was used in the study described.

The second theory part, ”Human Anatomy and Physiological Reactions” is focusing on the human anatomy and more specifically; stress responses. The chapter starts with a presentation of the human anatomy, focusing on the nervous system and how the different parts of the body are connected and affected during stress. After that is an explanation on how to define stress and how stress affects us, both in the long and in the short term. The main focus of this part is how stress affects our cognitive ability and how eye stress responses have been measured during earlier studies. The anatomy of the eyes are briefly explained and at the end of the chapter together with a presentation of the target group this thesis aims to support with the conclusions from the study.

The last theory chapters is ”Machine Learning” and it will describe what it is and how it can be used to analyse data collected from user tests. It gives a more detailed explanation of what classification is and explains the classification algorithm KNN that was used in the analysis.

(8)

The methodology chapter will motivate why different methods were chosen and describe the stress protocol. It also describes how the test data was analysed using machine learning.

The Result chapter contains data and test results gathered with the protocol mentioned in the Methodology chapter. The Discussion is where the results are analysed and discussed. The Conclusion is a short summary of the results and discussion combined with thoughts on potential future work that could be useful for this area. The last chapter of the thesis is the Author’s journey. At the very end of the thesis is the reference list and appendix which contains a summary table of abbreviations of the terminology used in the report together with the questionnaire used during the users tests.

2 Technology

This chapter will present the different technologies used to implement the stress protocol and gather the user data. The stress test was built in virtual reality which is one of several reality technologies. Virtual reality is a reality technology where the user can access a simulated, digital world using a head mounted display. The eye tracker that was used in this study was located inside the head mounted display. It was used to measure and gather data from the different eye responses of the users. Since the head mounted display tries to trick the user’s sense of movement and perception in order to simulate a different, immersive reality, motion sickness and fatigue are common problems and needs to be considered during user tests in a virtual reality.

2.1 Virtual Reality

Virtual reality (VR) was described by Ivan Sutherland in 1968 [62], as ”sur- round the user with displayed three-dimensional information”. Another way to describe VR as Sharmistha Mandal [43] phrased it is; “VR is a technology which allows the user to interact with a computer-simulated environment”.

Virtual reality is an immersive technology that allows the user access to a computer generated simulation [43]. With the use of a head mounted display (HMD), the user is immersed into the simulation and the real world is shut out [29]. The simulated environment can be a representation of the past, the present, the future or any other kind of imaginary world the creator decides [43]. This gives the advantage of having a completely controlled and safe environment, especially for researchers during experiments and tests [13]. Training done in a VR environment has also been proven to be transferable to the real world when surgeons who had additional practice in VR showed improved performance in the operating room [57].

The first HMD was invented by Ivan Sutherland in 1968 and was named ”Sword

(9)

of Damocles” [43], [62].

2.1.1 Other Reality Technologies

There are other reality technologies in addition to VR, two of them are Aug- mented Reality (AR) and Mixed Reality(MR) [29], [64]. They are similar to each other but each with their own strengths and weaknesses.

Augmented Reality (AR) adds a layer of information on top of the real world [29], [64]. In contrast to virtual reality, the user will still be able to see the physical world in addition to the digital object added through AR [64]. This technology received a lot of attention during the Pokemon Go era when users could find pokemons in different places by looking through their camera [5].

Another area for AR is the IKEA mobile application which allows users to try the furniture in their apartment using AR [33]. A common way to use AR is by using the mobile phone or tablet and its camera [64], [29], [5].

The last mentioned reality technology, Mixed Reality (MR) is similar to AR in the aspect that the user can still see the physical world with the digital object placed in it. MR is also similar to VR since it allows the objects to interact with the environment [5], [64]. Microsoft’s HoloLens is a device that uses MR.

If there was a continuum with these three realities, VR with its full immer- sion of the user using a HMD would be on one side and AR with its overlay of information on top of the real world would be on the other side. MR would be in the middle of these two with its visibility of the physical world and digital objects but with the additional interaction between the object and the environment.

2.1.2 Motion Sickness in Virtual Reality

There are a lot of different movements that can cause motion sickness. It can be caused by traveling in cars, planes or boats. It can also come from carousels in amusement parks and VR [25]. The type of motion sickness in VR can also be referred to as “cyber sickness” [40], [47]. Earlier it was suggested that these two types of sicknesses were different [47]. The reason for this was that “classic”

motion sickness comes from stimulation to the vestibular system and does not necessary depend on vision [40], since even blind people can experience motion sickness [28]. The cybersickness one can experience in VR can occur with the use of only the visual elements and without the stimulation of the vestibular system [40]. A study compared these two sicknesses and concluded that the symptoms and physiological changes were identical but still referred cybersickness as a subtype of motion sickness [47].

Another cause of motion sickness in VR is the frame rate. Individuals are less sensitive to the frame rate of a monitor than in a VR using a HMD since

(10)

the latter is more immersive [14].

2.1.3 Vergence Accommodation Conflict

Another big problem for VR and HMD is the Vergence Accommodation Conflict [39]. In the real world the eyes estimate the distance using two ways; vergence and accommodation. Vergence can be estimated by measuring how much the eyes need to rotate in opposite directions in order to focus on the same point.

The eye does not need to rotate as much to focus on something close compared to something that is far away [36]. The other way for the eye to determine the distance of an object is by measuring how much the lens is focusing [70]. When looking at objects close to us, there is an increase in focal power, and the lens contracts. When the object is more distant, the focal power is decreased and the lens becomes more relaxed [36]. Vergence distance can be simulated in VR [70]. The problem for VR in HMD is that the display is placed in front of the eyes and the focal distance, the depth the eyes are measuring does not reach further than the display causing contradicting information, see Figure 1. This contradiction of depth information from the eyes is called the ”Accommodation Vergence Conflict”.

Figure 1: Vergence and Focal distance [70]. A) How the eyes uses both vergence and focus to measure distance in real life. C) Shows the blur caused by the accommodation when the eye is focusing on something far away in real life. B) How the HMD placed in front of the eyes can only simulate the vergence when the display is placed statically in front of the eyes. D) The close, flat display does not fool the eyes completely and they accommodate as if the object were close

2.2 Eye Tracking Equipment

The headset used in this thesis is the HTC Vive Pro Eye together with two SteamVR Base Stations 2.0, and one hand controller of the 2018 model. The

(11)

HTC Vive Pro Eye is a HMD with a built in eye tracker with the possibility of tracking the following [65];

• Timestamp (device and system)

• Gaze origin

• Gaze direction

• Pupil position

• Pupil size

• Eye openness

Calibration of the eye tracker is done to get more accurate measurements of the users’ eyes. During the calibration, the user is presented with a dot and is told to fixate on it. Once fixated, the dot disappears and another dot appears in a different position. The calibration used in HTC Vive Pro Eye is a five point calibration [65], which means a total of five point will appear one after another during the calibration. Since every individual is unique and eye tracking is very sensitive, HTC recommends that a calibration of the eye tracker is done for each user and every time the HMD is adjusted [69]. There is also a recommendation to perform a new calibration every five to ten minutes during longer sessions due to head motions [14]. Both the stress protocol and eye tracking collection was made in an application built in Unity, a development platform which supports both 2D and 3D development and is used for almost two-thirds of the AR and VR experiences [68].

2.3 Summary of Technology

For this study was virtual reality (VR) technology was chosen. VR has the advantage of creating a completely controlled environment which is useful for certain studies. The head mounted display (HMD), HTC VIVE PRO EYE contains an eye tracker that can be used to record different eye response of the user. The disadvantage of VR is that it can only simulate depth to a certain degree and this simulated depth does not fool everyone. Another problem is that VR is known to cause cybersickness, a subtype of motion sickness.

3 Human Anatomy and Physiological Reactions

Humans are complex organisms with several different systems in their bodies. These systems have specific functions but affect each other in more ways than we understand today. To understand how humans react during stress it is important to understand how we function biologically. Depending on the presence or lack of stress, different systems in the human body activates and triggers different physiological reactions. Stress in humans have been proven to be measurable physiologically. The difficulty is to determine what type of

(12)

stress is affecting us since different types of stress triggers different responses.

Since it is the responses of the eyes that are going to be measured in this study, this chapter will cover the basic anatomy and behaviours of the eyes. At the end of the chapter is information about Autistic Spectrum Disorder (ASD) and how affected individuals might react slightly differently when exposed to stress compared to healthy individuals.

3.1 The Nervous System

The nervous system in humans is a very complex system and consists of two parts, the Central Nervous System (CNS) and the Peripheral Nervous System (PNS). The CNS is made up of the cells in the brain and the spine, and the PNS includes all the other nerves in the body [34], [17]. The PNS and the CNS are connected and signals are sent back and forth continuously. Signals are sent from the brain through the CNS to our muscles allowing us to move our bodies or just wiggle our toes. In return, the PNS sends back information to the CNS using sensory receptors that can signal different smells, vision, hearing, taste and touch e.t.c. [17], [19].

The PNS is divided into two smaller branches; the somatic and the autonomic nervous system [66]. The somatic nervous system is the system we can control with our free will. It enables us to move our legs and bend our fingers, to open our mouths and control other various muscles in the body [19]. The autonomic nervous system handles all the functions that we can not control with our own will. It controls thing such as the heart rate, respiration and smooth muscles, and sends information from the sensory receptors in the visceral organs back to the CNS. Three branches make up the autonomic nervous system; the sympathetic, the parasympathetic and the enteric nervous system [66].

The sympathetic system makes us more alert and prepared for some sort of activity, also known as “fight or flight” response [15], [19]. Physiologically it increases the pupil size, heart rate [19], dilates the lungs [15], increases the blood flow to skeletal muscles and cardiac muscle e.t.c. [19]. This is a response to sudden stress or a threatening situation where the body prepares itself for either flight or to fight [15]. While this system prepares the body for action, another branch in the autonomic nervous system acts as its counterpart, namely the parasympathetic nervous system. It is a system that calms the body down [15].

The parasympathetic nervous system can be called “rest and digest” and in contrast to the sympathetic system it lowers the heart rate, contracts the pupils and stimulates the digestion e.t.c. [19]. The third and last branch of the autonomic nervous system is the enteric nervous system which is known for its functions in digestion [16]. Since the eyes are part of both the parasympathetic and the sympathetic nervous system which regulates the pupil dilation and contraction, it can be used to observe and measure stress levels in an individual.

(13)

3.2 Stress

When exposed to stress, the body responds with what is called a stress response.

The stress responses are a double-edged sword, with one side helping us to make quick decisions in life threatening situations, a skill we developed through evo- lution. Acute stress is an exposure to stress that puts the body’s muscles and hormones on alert and is over within a shorter time frame [9]. Chronic stress is when the body is continuously prepared for fight or flight and never exits this state. This can be damaging in the long term with muscle tension, heart diseases and contribute to both anxiety and depression [55], [9], [27]. Chronic stress is getting a lot attention since it comes with real physiological consequences after prolonged exposure and it is affecting a considerable amount of people. Only in Canada reported 23 % of the population at the age of 15 or over, that they perceived “quite a lot” or “high levels of stress” most of their days during 2014 [12]. Stress can also be caused when there is a difference between how much we can preform and others expectations [27].

Cognition is how we perceive and interpret stimuli on a cognitive level. This includes decision making, attention, memory and learning [71], [56]. The effect of stress on our cognitive abilities varies depending on which ability is affected by what type of stress; such as origin, intensity and durability [56].Overall, stress has a negative effect on our cognition [71]. Stress is experienced mentally but it can be measured through physiologically symptoms such as heart rate, heart variability, pupil dilation, blinking rate and more [63], [71].

The effect of cognitive load on the blinking rate has been examined in several different studies and there is no simple answer dictating how the blinking rate will behave for all situations. Depending on what kind of task the user is performing the blinking rate behaves differently. One study showed that most of its users had a higher blinking rate pattern during conversations, lower during rest and an even lower blinking rate when reading [8]. The more visually demanding a task is, the more the blinking rate is reduced [54]. The explanation why blinking is reduced during visually demanding tasks is that a blink might cause a loss of information [54], [44].

One study had users doing math of different difficulties and measured an increase in pupil dilation with the increasing difficulties [35]. It also measured a reduced number of blinks with the increasing difficulty levels and cognitive load.

The same study found that a sudden increase of difficulty might increase the number of blinks but the constant increase of difficulty resulted in a reduced number of blinks. Another study showed that people with higher IQ did not dilate the pupil as much as the lower scoring peers, reasoning that this might be because they did not experience the same cognitive load [1].

One study used the ”Stroop”-stress test, a stress protocol which is very similar to the one used in the study for this thesis, and measured the blinking rate of

(14)

its participants and found that with the increasing difficulty the blinking rate decreased. The reason for this according to the study, would be the increasing visual demand for the more difficult levels [51]. As the difficulty increased, it reached a point where the performance started to decrease and was accompanied with an increased blinking rate.

3.3 Anatomy of the Eye

The eyes are located in a socket in the skull called the orbit [10] and the size of the eye of an adult measures 2,5 cm in diameter, with only one-sixth of the eye’s surface area exposed to the outside world [19].

There are six muscles attached to the eye, allowing the eye to look up, down, left, right and in different angles [10]. The eye is covered with a protective tear film made up of three different layers; a mucus layer, a watery layer produced by the lacrimal glands and an oily layer that is produced by the meibomian glands [19], [10]. The conjunctiva is an additional protective membrane covering the insides of the eyelids and over the eyes [19], [10].

On the front of the eyes is a bulky, transparent dome- shaped layer that allows light to enter the eye, past the lens and into the pupil. Already at the cornea, the light begins to refract [19] in order to focus the light to hit the retina [10]. Approximately 70% of the focusing power comes from the cornea and 30% from the lens [10].

There are three types of pupilliminary responses; two contracting and one dilation response [46]. The two contracting responses are the pupil light response that contracts the pupil when the light is too bright and the pupil near response that makes the pupil contract when an object is close. The third response, the psychosensory pupil response indicates either an emotional arousal or a cognitive overload by dilation of the pupil [46].

3.4 Autistic Spectrum Disorder

The Diagnostic and Statistical Manual of Mental Health Disorders (DSM) is given by the American Psychiatric Association [4] and is recognized by World Health Organization (WHO) as a standard classification of mental disorders [49]. With the release of the latest edition, DSM-5 in 2013; disorders such as Asperger’s, autistic disorder, pervasive developmental disorder not otherwise specified (PDD-NOS), and childhood disintegrative disorder, that once were diagnosed separately are now categorized under the umbrella term Autism spectrum disorder (ASD) [26], [38]. Some studies referred to in this thesis have used target groups defined by the old terminology, for example; ”Autism” and ”As- perger’s”, but are included since they are a part of the since updated definition of ASD at the time of writing.

(15)

According to data gathered 2014 from the Autism and Developmental Disabil- ities Monitoring (ADDA) Network, a group founded by Centers for Disease and Control Prevention (CDC) [23] which is under U.S Department of Human Health and services [23], is at the time of writing using the most recent study for statistics of the prevalence of Autism in the United States referred to by CDC [24]. ADDA estimates that one in 59 children at the age of eight years old has ASD in the United States (U.S) and that approximately 31% of the ASD individuals have a intellectual disability of IQ≤70 [6]. High functioning autism (HFA) is not an official medical diagnosis but it is often used to describe individual diagnosed with autism or PDD-NOS with an IQ≥70 [20]. There is a gender difference in the prevalence of ASD between male and female estimated to 3:1 [42].

Unfortunately there are limited research on stress signals of the eyes of individuals with ASD. Since there are no research stating that individuals with ASD would react differently except with a possibility of a stronger or a weaker reaction since young ASD individuals have larger tonic pupil sizes compared to non-ASD individuals [2] and given that certain stress reactions of the eyes are determined by the autonomic nervous system, eye tracking was decided as a possible tool to estimate stress for not only healthy individuals, but also individuals with ASD until proven otherwise.

3.5 Summary Human Anatomy and Physiological Reac- tions

Even though stress is a mental phenomenon, it can be measured physiologically.

A balance between two nervous systems keeps our body and its stress responses in check and adjusts accordingly when needed. Stress is a wide term and can be caused by different things, and depending on the nature of the stress, its origin, intensity, and many other factors we respond differently. Blinking rate have been proven to decrease as the cognitive load increases but if there is a sudden increase of stress, there is an increase of the blinking rate. The pupil dilation is also shown to increases with an increasing cognitive load.

4 Machine Learning

Machine learning is a branch in Artificial Intelligence and is about how to teach the program to find patterns in existing data and use this to form predictions [21]. By using machine learning to train an algorithm on the test users’ eye responses, predictions of stress levels in new, untested users can be made. There are several types of learning used in machine learning that can be used in different situations. Depending on the shape of the data training the algorithm, different types of learning are possible. The type of learning will then decide what machine learning processes and algorithms can be used. The prediction of a stress level in a user is a process called ”classification”. The classification

(16)

process requires a certain type of learning and can be performed using different algorithms.

At the end of the chapter is data cleaning, the removal of unwanted data and the model training problems overfitting and underfitting explained.

4.1 Different Types of Learning

During supervised learning the algorithm is given the input X and the output Y to train the algorithm to map the function Y = f(x) [22]. The data given to the algorithm for the algorithm to learn will be referred to in this thesis as training data. Using the training data where the input has the desired output labeled, the algorithm can train itself to predict the output value when new, unlabeled input X is given. For supervised learning to predict as correctly as possible, it requires enough training data to approximately map the function [30].

Unsupervised learning in contrast to supervised learning, does not have any labels, or Y - values to the given input X in the training data. Instead of trying to find the Y, the task is to understand the structure of the input, based on the inputs’ features [22]. This is used for cluster algorithm and association algorithms [22], [7]. Semi- supervised learning is a mix between the supervised and the unsupervised learning where the data set is incomplete. In an incomplete data set, there are datapoints where the input is missing a labeled output in the data set [22]. The last type of machine learning algorithm is the reinforcement learning. It is when the algorithm tries to complete a task, relying on its pre- vious experiences. Should the performance improve, the algorithm is rewarded through positive feedback [22].

”Arguably, Machine Learning models have one sole purpose; to generalize well”- Anas Al-Masri [45]

In machine learning, overfitting and underfitting are two extremes on both sides when we train and test our models. An underfit model can not predict correctly since it has not learned enough from the data and can therefore not generalize.

An overfit model has learned too much from the data. The model is specialized on that particular training data and can not generalize when receiving new data [45]. These two concepts can be better explained using bias and variance. Bias is defined as the the difference between our prediction and the true label of the datapoint. Variance is error from fluctuations in our training data. Should the model have a high variance, it will include the random noise from the training data into the model [45]. Usually, an underfit model has low variance but high bias and an overfit model has a low bias and a high variance. Balancing the variance and bias against each other is called the ”Bias- Variance Tradeoff”

where the desired outcome for a model is a low bias and a low variance.

(17)

4.2 Classification

Classification in machine learning is a supervised learning and is the process of predicting the class of a given datapoint based on its features. To train an algorithm for classification, the training data is given to the algorithm where each of the datapoints, the input X has been labeled with a class as their Y- output [30]. After a program has been given enough data to approximate a mapping function between the input X and output Y it should be able to, based on its existing training datapoints’ features, correctly predict what type of predetermined class a new datapoint should be labeled as [3]. Much like if a child is shown several pictures on dogs and told ”dog” by it’s parent and later when the child sees a dog, the child should be able to tell ” dog” [30].

Classification can be binary i.e. deciding between two classes, or decide between several classes [3]. Binary classes in the example with the child could be “dog”

and “not dog”. Several classes could be different types of animals or different discrete levels of stress. Since the output is a label of a class, classifications will only give a discrete value as the output [41], [30] .

4.3 K - Nearest Neighbour

One of the classification algorithm used in machine learning is the K- nearest neighbours (KNN). Both classification and regression problems can be solved using KNN [67]. A regression problem will in contrast to classification give a real number as the output value, such a weight or dollars [30].

The KNN- algorithm is a lazy learning algorithm which means it will store the training data until it is time to predict new data. It does not have a training phase, in contrast to eager learning algorithms, the other type of learning algorithm used in classification. Eager learners have a training phase where it builds a model which it uses during the prediction phase. Because of this, the prediction phase will be slower for the lazy learning algorithms compared to the eager learners since it has to check against the training data while the eager learners use their constructed model made during the training phase [67], [3].

The KNN- Algorithm is as follows:

1. The training and the test data is loaded

2. An integer “k” to represent the number of neighbours is initiated

3. Use a distance function to calculate the distances between each test data and each of the rows of the training data. The most common one is the Euclidean function but this thesis will also use the Manhattan and Minkowski function.

4. Sort the distances in ascending order

5. Select “k” number of neighbours from the sorted distance data collection

(18)

6. Determine what class the test data belongs to, based on the labels from the k number of selected neighbours of training data

[30], [67].

Useful areas for KNN would be in recommendation systems, to recommend similar products or articles to customers [30]. Other areas include speech recognition and image recognition [67].

One disadvantage with KNN is that the algorithm becomes slower with increasing data or a high k- value [30]. Two advantages with KNN is that it is an easy algorithm to understand and it makes no assumptions about the data, which is useful in the case of nonlinear data [67].

4.3.1 K- Neighbours

The KNN- algorithm uses the principle that similar things stay together and by using this principle predicts what class new data belongs to. The ”K” in KNN represents the number of neighbours we look at to decide what we should classify the new, input data as [67], [30]. Figure 2 is an example of when a dataset is given a new input and need to determine the label. To determine the label of the new input the majority of closest neighbours, in this case k=3 are inspected. Two of the neighbours belong to the green category, with the number

”2”, and only one of the three closest neighbours belong to the red category ”3”.

Using the majority of neighbours, the new input will be labelled as green ”2”.

1

1 1

2 2

2

3 3

3 3 10 ?

20

0

0 100 200 300

Figure 2: The new data will be determined by the majority of the neighbours, with k=3. In this case the green no. 2 class is in majority over the one red no.3 class.

Selecting the most suitable ”k” is not the easiest task since there are no determined k-value that works the for all equations. The best ”k” is something that must be tested by the algorithm. A ”k” that is too small will be easily

(19)

affected by small fluctuations in the data [30], [41]. A too big ”k” will take longer time to process, complicate the model [41] and lower the variance but increase the bias with smoother decision boundaries [41], [61]. It is recommended to have an odd number ”k” to have a tiebreaker when the algorithm is using majority to decide the class label [30], [67].

4.3.2 Determine the Distance Value

To find similarities among the different data points, we use a distance function.

The distance between the elements represents the difference, making elements with a distance of zero equivalent [58]. To determine what class a datapoint belongs to, the KNN- algorithm observes the labels of the closest ”k” neighbours.

To decide the closest ”k” neighbours, a distance function is used. There are several different functions to choose from and the choice of distance function can affect the classification accuracy of the KNN- algorithm. The most commonly used distance function for the KNN- algorithm is the Euclidean function [32].

The Minkowski distance function (1) is a generalized distance function and can be manipulated to other known distance functions [58], all used to calculate the distance between the point x and point y. This is done by calculating the absolute sum of difference between the coordinates in the two vectors x and y, with the xi and yi representing the different variables in the vectors and n being the number of variables used [58]. The factor p changes depending on what distance function is being used.

n

X

i=1

|xi− yi|^p

!^1/p

(1) The Manhattan distance (2) is calculated using the Minkowski distance metric, using p= 1.

n

X

i=1

|xi− yi|¹

!1/1

(2) The Euclidean distance, (3) the most commonly used distance metric for KNN- algorithms is calculated using the Minkowski distance metric, using p= 2.

n

X

i=1

|xi− yi|²

!1/2

(3)

[58]

4.3.3 Weights; Distance or Uniform

The way to determine the label of a data point is to look at the values of the closest ”k” neighbours. When the weight is uniform, the majority vote of the closest neighbours are used regardless of how close or distant they are to the data

(20)

point in question. In some situations, it might be better to consider the closer neighbours more than the distant ones. This can be done by assigning weights to the different data points. When weight is used, the neighbours’ weight is proportional to the inverse of the distance to the data point the algorithm is trying to classify [11]. This means that the neighbours closer to the data point weigh more than the data points further away. This weight will be referred to as ’distance’ in this thesis.

4.3.4 Training and Testing Sample Size

The data used for KNN is divided into two parts, one for training and one for testing. The training data used in supervised learning, such as KNN has a complete data set for every X and Y value. The training data is used to train the algorithm and is usually bigger than than the test data, which is used to validate the algorithm after the training [67]. In this thesis, two different sizes of the training and test data are tested.

4.4 Data Cleaning

The access to information is critical to most sectors today. The information available affects the decisions made in governments, research and the economy.

There is a lot of information out there today and more is being produced every second but just because the information is available does not mean we should use all of it. All information available might not be relevant and some of it could be incorrect [53]. This is where data cleaning, the detection and removal of incorrect and poor quality data comes in. Data cleaning is important since incorrect information could lead to results that does not represent the actual situation in a correct way, causing poor and costly decisions.

A single data collection, in the form of a file or a database can contain small errors such as spelling errors, missing entries or other forms of invalid data [53].

When many single data collections are merged, more complex and informative data sets are created. Should these single data sets contain several errors each, the errors will add up in the merged data set. For the merged data set to re- main accurate after the merges, elimination of extra unnecessary information and duplicates from the different data sets might be necessary [53].

4.5 Summary of Machine Learning

Machine learning is a branch of Artificial Intelligence and is about teaching the program to find patterns in existing data and use this to form predictions [21].

Depending on the shape of the data, different types of teaching can be applied and with that, different algorithms. The classification algorithm K- nearest neighbours (KNN) have several different variables such as the ”k”-value, distance function, weight and size of the training and test data, all of which can be adjusted to fit the data set better.

(21)

The preparation of the data before it is used in machine learning is also important. It might be necessary to clean the data to remove poor quality data from the data set to avoid misleading results [53].

5 Methodology

A stress test was implemented in a virtual reality using an eye tracker capable of registering the blinking and pupil dilation of the users. The test was conducted with 11 users but due to technical problems with the collection of data, one of the stress levels for one of the users was removed. The data from one of the tests; T0, was used to find the baseline for each user. The baseline was used to calculate the average change in pupil dilation and blinking rate for each of the stress level, for every user. Supervised learning was used to train the KNN- algorithm by using the changes of the eyes as input and the stress level as output.

The collected data was used in a KNN- algorithm made by following a tutorial [59] and several tests where the four different variables used in the KNN- algorithm were changed. The four variables were; the distance function, weight, the test and training data size and the ”k”- neighbours. More information about these variables and their roles in the KNN- algorithm is found in Chapter 4. An additional fifth variable was changed to test the KNN- algorithm; the number of features which is more related to the data in the dataset than the KNN- algorithm itself.

5.1 Choice of Test

There are at least four well known, protocols proven to induce stress in participants. The protocols are Tier Social Stress Test (TSST), Stroop, bicycle ergometer test (Ergometer) and Cold Pressor Test (CPT) [60]

By examining and comparing these four test, the Stroop test was deemed the most suitable of them. The reasons to exclude the other tests were the following;

The Ergometer is a test where the participant is using a stationary bicycle for a specified amount of time, at a specified level of intensity [60]. Time and intensity can vary depending on the purpose of experiment, as done in the following studies; [60], [48]. Ergometer is the most efficient test of the four mentioned to induce autonomic stress responses. Due to the fear of to much head motions and thereby disturbing the eye tracker during the procedure, the Ergometer was excluded. One important note mentioned in [60] is that the Ergometer is more related to physical stress than physiological induced stress. A study using this test might have to specify what type of stress they want to achieve more specific than ”stress” [60]. The TSST is the most efficient protocol to induce stress in social-evaluative threat situations [60]. That means situations where the test subject experiences social stress from potentially being judged in a negative

(22)

way by others [52]. This method was excluded since no TSST protocol explain- ing how to induce different intensities of stress could be found. Whereas for the Stroop test, one proven protocol inducing different intensity of stress was found;

Group Stroop Color Word Test (GSCWT) [27] and it will be described later in this section. CPT is a pain induced test where the participant is instructed to place a hand in cold water, approximately 3-4°C for as long as possible. This test will be stopped after 3 minutes, but the participants are not informed of this in advance [60]. The CPT was excluded since Stroop measures an increase in both physiologically and perceived stress by participants compared to CPT [60]. In addition to mentioned factors above, Stroop has been proven successful in a virtual environment [50].

5.1.1 Stroop Test

During a Stroop test the participant is shown a word of a color, example ”red”,

”green”, ”yellow” or ”blue”. Each of the words are written in one of the colors mentioned above, but not necessarily the same color as the word. For example, if the word ”red” is written in a blue color, the participant have to say the word blue. The participant is told to try to only give correct answers.

5.2 Group Stroop Color Word Test Protocol

Using the same layout as the known protocol [27], the GSCWT is made up of six different stages, where each of the stages have a different design for how the words are presented for the user. There are two different screens presented to the user at different time points, one source screen and one destination screen.

5.2.1 Layout of the Stages

There are three different ways to write and color the different word presented for the user are; congruent, incongruent and white, see Figure 3 below. Congruent is when the color of the text is the same as the written word, for example; the text “red” would be in a red color. Incongruent is when there is a mismatch between the written word and the color of the word, for example; the text “red”

would be colored blue, green or any other color other than red. The third way of presenting the words is by having the word on the source screen either congruent or incongruent but colored all the words on the destination screen white.

Table 1 shows how each of the stages are designed.

(23)

Figure 3: The three ways a button can be presented. Top button is congruent, the middle button is incongruent and the bottom button is what is referred to as ”white”

Stage Description Table

Stage no. Source word Destination words

Stage 1 Congruent White

Stage 2 Incongruent White Stage 3 Congruent Congruent Stage 4 Incongruent Congruent Stage 5 Congruent Incongruent Stage 6 Incongruent Incongruent

Table 1: Table showing the structure of the source word and destination words for every stage

On top of these six different stages, there are six different levels of complexity. Each complexity group (CG) is labeled 1 to 6. where 1, (CG1) is the lowest complexity and group 6, (CG6) is the highest complexity. Every CG is made up of a combination of the six stages. There are also three levels of interference, each designed to induce either low, medium or high level of stress.

Every interference level will consist of six CG where every group is made up of a combination of the six stages (ST). The first level of interference is shown in Table 2, the Low Complex Group (LCG), shows the stages 1-6 in a sequence throughout LCG 1-6. The second level of interference, the medium interference have the medium complexity groups (MCG) in a pattern taken straight from the protocol [27], as shown in Table 3. The order in the complexity groups for

(24)

high level of interference, the high complexity groups (HCG) are randomized, see Table 4. These groups where randomized using the randomizing generator from random.org [18].

Complexity Group Design for stages in Low level Interference Low Complex Group Low level interference

LCG1 ST1 ST2 ST3 ST4 ST5 ST6

Table 2: Table showing how the different stages are ordered for every complexity group in the low level interference

Complexity Group Design for Medium level Interference Medium Complex Group Medium level interference

MCG1 ST1 ST2 ST3 ST4 ST5 ST6

Table 3: Table showing how the different stages are ordered for every complexity group in the medium level interference

Complexity Group Design for High level Interference High Complex Group High level interference

HCG1 ST2 ST3 ST6 ST4 ST1 ST5

Table 4: Table showing how the different stages are ordered for every complexity group in the high level interference

Following the protocol [27], the subject will be presented the source word

(25)

for a duration of 1 second. Then the second screen, the destination screen will appear after 0,5 seconds. The subject will have 3 seconds to choose the correct answer from the options presented on the destination screen. Every stage will be presented to the user twice, which will give each of the interference tests a duration of approximately five minutes, see Equation (4);

4, 5seconds/stage × 6stages × 6CG × 2(eachstagetwice) ÷ 60 ≈ 5minutes. (4) 5.2.2 Collection of Eye Data

The data cleaning in this thesis for the collection of the pupil dilation and blinking rate data is relying on a study proposing to eliminate the first 20 seconds to give the subject time to stabilize the blinking rate in the new visible environment [8]. Since the collection time for this thesis was almost 3 minutes longer than the proposing study, the first 30 seconds was discarded for good measure. The average time between blinks is between 2-10 seconds for adults [37]. This also lead to a data cleaning with the removal of possible double blinks.

If there was less than 2 seconds between two blinks, the blink was not included in the analysis.

5.2.3 The User’s Own Opinions

In the breaks between the different tests, the user was asked to fill in a stress evaluation form. The form was given to the user directly after each stress level test, including the baseline, T0 test. The form consisted of two questions when the user selected its stress level and mental fatigue on a four graded scale starting at ”No stress” and ”No fatigue” up to ”High stress”/ ”High fatigue”, see appendix.

5.3 Analysis of the Data

After the data had been collected from the user tests, data cleaning and an analysis was performed. The analysis of the data was conducted with a KNN- algorithm where different variables were tested. In total there were 11 user tests and data was collected during each of the T0, low, medium and high stress test.

For one of the users, the low level stress test had to be removed. At the end of the user testing there were a total of 11 data points for T0, medium and high stress level test but only 10 data points for the low stress test. Resulting in a total of 43 data points to analyze.

5.3.1 Preparation of User Data

The average size of the pupils for each of the users during each stress level was used as the pupil dilation attribute. For each of the different stress levels the user did during the user testing, data was gathered from both the right and the left eye. Both eye’s pupil sizes were measured continuously and saved. For each collected data point, an average of the right and left pupil size was calculated.

(26)

After that, an average pupil size for that user during that specific stress level, was calculated.

The blinking rate for each user was also calculated for each stress level. The number of times the user closed its eyes during the entire test was collected and then used to get the average number of blinks per minute. For it to be collected as a blink, there was a requirement that the eyes had to be open for two or more seconds between the blinks.

The data from the first test for each user where no stress was induced, referred to as T0, was used as the baseline. The changes from this baseline in the pupil dilation and blinking rate was then calculated in percentage for the low, medium and high stress level tests and used in the KNN-algorithm. The KNN- algorithm was implemented following a tutorial made by ”Simplilearn” [59].

5.3.2 KNN analaysis

The KNN- algorithm have several variables that can be adjusted and affect the resulting prediction. To get a more correct value, the KNN-algorithm was run 1000 times and the average from the 1000 runs was used as the result. Figure 6 shows how the average predicting value got more stabilized after 1000 runs.

The complete data set contained a total of six attributes;

• The order the test were taken

• The user’s perceived stress level

• The user’s perceived mental fatigue

• The user’s changes in blinking rate with T0 as 1

• The user’s changes in pupil dilation with T0 as 1

• The class label

The variables tested were the ”k”- value, the number of features, different weight and different sizes on the training and test data. First the ”k”-value was tested using five different neighbours with a test size of 10% but without the ”Order”

attribute. This was done for both uniform weight and distance weight. The same procedure was done again, this time with the ”Order” attribute. To test the size of the training data size, the best predictions from both the with and without ”Order” attribute tests were used and tested with a 20% test data size.

The last test was a binary classification and used the best predictions produced from the 10% test sample size, with ”Order” attribute and uniform weight.

The binary classification had T0 labeled as no stress (”0”) and the three other stress levels were labeled as stress (”1”). This tests whether or not the KNN- algorithm can detect the presence of stress using the collected data.

(27)

6 Result

The results were produced by testing the data from the 11 user tests that had been collected and prepared for the analysis. To get a better understanding of how the different attributes are connected, they are presented visually in the beginning of the chapter. The rest of the chapter shows how the predictions are affected when different variables are changed. The variables are adjusted and tested until the best combination of variables for each of the three distance functions using this data set have been found. At the end of the chapter there is a binary classification to test how the KNN- algorithm performs using only two classes; Stress and No stress.

6.1 Pupil Dilation and Blinking Rate

Figure 4: The distribution of the four stress levels, T0 is blue, low is yellow, medium is green and high stress level is red

The data collection for all the users blinking rate and their change in pupil dilation for each of the different stress levels is visible in Figure 4. A blink was registered as a blink if the eye had been open for at least 2 seconds between two blinks.

(28)

Figure 5: Blinking rate change plotted against pupil dilation change colored in the four different stress levels

Only using the features ”Blinking rate” and ”Pupil dilation” does not give visible clustering for the different stress levels, as shown in Figure 5. The color blue is representing T0, and the orange, green and red colors represent low, medium and high stress level tests, in that order.

6.2 Testing the Different Variables

The data set was split into a training set and a test set. The test set was tested with two different sizes. The input data used was standardized to a value between 0 and 1. More details on this later in this chapter in Table 10 and Table 11. The algorithm used the training data and tested with different values for the ”k” neighbours, weight, with/without the ”Order” feature and for different sizes of the test and training set. The average percentage of correct predictions of 1000 runs was then collected in the different tables. Figure 6 below, is an example of a test to illustrates how the averages are more stable after 1000 runs.

(29)

Figure 6: An example of how the averages are becoming more stable at 1000 runs compare to < 100 runs

6.2.1 Finding the Neighbour ”k” without ”Order” Feature The data set for Table 5 and Table 6 contained

• The class label

Average prediction correct of 1000 runs in percentage (%) Weight: Uniform, Without Order, Test size: 10%

Neighbours(k) 1 2 3 4 5

Euclidean 33,22 28,92 36,82 33,48 27,88 Manhattan 33,2 29,4 38,42 33,9 31,64 Minkowski 33,02 28,94 35,94 35,2 28,32

Table 5: Table showing how the three different distance algorithms perform on average during 1000 runs with different ”k”- neighbour values. Using a uniform weight, no order and a test sample size of 10%. The bold text shows the best prediction for each of the distance functions

(30)

Average prediction correct of 1000 runs in percentage (%) Weight: Distance, Without Order, Test size: 10%

Euclidean 33,56 31,88 33,98 34,94 31 Manhattan 33,08 33,86 35,68 37,48 37,2 Minkowski 32,38 32,9 34,12 34,42 30,48

Table 6: Table showing how the three different distance algorithms perform on average during 1000 runs with different ”k”- neighbour values. Using a distance weight, no order and a test sample size of 10%. The bold text shows the best prediction for each of the distance functions

In Table 5 each of the three different algorithms got their highest percentage of correct predictions with a ”k”- neighbour value of 3. In Table 6, the three different distance functions got their highest percentage of correct predictions with a ”k”- neighbour value of 4. The only difference between the two tables, Table 5 and Table 6 is that the weight is either ”uniform” or ”distance” which is defined in 4.3.3. The highest percentage correct predictions for each of the distance functions are found in Table 5, using uniform weight.

6.2.2 Finding the Neighbour ”k” With ”Order” Feature The data set for Table 7 and Table 8 contained

• The order the test were taken

• The class label

Average prediction correct of 1000 runs in percentage (%) Weight: Uniform, With Order, Test size: 10%

Table 7: Table showing how the three different distance algorithms perform on average during 1000 runs with different ”k”- neighbour values. Using a uniform weight, the order feature and a test sample size of 10%. The bold text shows the best prediction for each of the distance functions

(31)

Average prediction correct of 1000 runs in percentage (%) Weight: Distance, With Order, Test size: 10%

Table 8: Table showing how the three different distance algorithms perform on average during 1000 runs with different ”k”- neighbour values. Using a distance weight, the order feature and a test sample size of 10%. The bold text shows the best prediction for each of the distance functions

In Table 7 with uniform weight, the Euclidean got its highest percentage of correct predictions with a ”k” neighbour of 2. Both Manhattan and Minkowski got their highest percentage of correct predictions with a ”k” neighbour of 3.

In Table 8 with distance weight, the Euclidean got its highest percentage of correct predictions with a ”k” neighbour of 3. Both Manhattan and Minkowski got their highest percentage of correct predictions with a ”k” neighbour of 4.

The only difference between the two tables, Table 7 and Table 8 is that the weight is either ”uniform” or ”distance”. The difference between the two tables; Table 5 and Table 6, compared with the two tables; Table 7 and Table 8 is the addition of the ”Order” feature.

6.2.3 Comparing the Best k With and Without Additional Feature The best predictions from Table 5 to Table 8 summarized in two different tables assorted by the two weights; uniform and distance. Table 9 and Table 10 compares the predictions from with and without the addition of the feature

”Order”.

Average prediction correct of 1000 runs in percentage (%) Weight: Uniform, Test size: 10%

Without order k With order K

Euclidean 36,82 3 43,28 2

Manhattan 38,42 3 46,1 3

Minkowski 35,94 3 42,72 3

Table 9: Table summarizing how the three different distance algorithms with a uniform weight performed on average during 1000 runs with their best ”k”- neighbour values from Table 5 and Table 7. Comparing the presence of the ”Order” feature and a test size of 10%. The bold text shows the best prediction for each of the distance functions

(32)

Average prediction correct of 1000 runs in percentage (%) Weight: Distance, Test size: 10%

Euclidean 34,94 4 36,84 3

Manhattan 37,48 4 40,44 4

Minkowski 34,42 4 37,5 4

Table 10: Table summarizing how the three different distance algorithms with a distance weight performed on average during 1000 runs with their best ”k”- neighbour values from Table 6 and Table 8. Comparing the presence of the ”Order” feature and a test size of 10%. The bold text shows the best prediction for each of the distance functions

Both Table 9 and Table 10 shows that for all of the three distance functions, the addition of an ”Order” feature yielded a higher percentage of correct predictions for both uniform and distance weight. Comparing the predictions with the

”Order” feature show that a uniform weight (Table 9) has a higher prediction for all of the distance functions compared to the distance weight in Table 10.

6.2.4 Comparing the Best k With a Different Test Sample Size The best predictions from Table 5 to Table 8 summarized in two different tables assorted by the two weights; uniform and distance. Table 11 and Table 12 compares the predictions with and without the addition of the feature ”Order”.

It is done the same way as Table 9 and Table 10 but this time using a test sample size of 20%.

Average prediction correct of 1000 runs in percentage (%) Uniform, Test size: 20%

Euclidean 35,36 3 43,29 2

Manhattan 37,51 3 44,5 3

Minkowski 35,08 3 41,4 3

Table 11: Table showing how the three different distance algorithms perform on average during 1000 runs with their best ”k”- values taken from the uniform weight tables; Table 5 and Table 7. A comparison between the with and without the ”Order” feature using a test size of 20%. The bold text shows the best prediction for each of the distance functions

(33)

Average prediction correct of 1000 runs in percentage (%) Distance , Test size: 20%

Euclidean 34,02 4 37,63 3

Manhattan 37,72 4 39,59 4

Minkowski 32,84 4 37,19 4

Table 12: Table showing how the three different distance algorithms perform on average during 1000 runs with their best ”k”- values taken from the distance weight tables; Table 6 and Table 8. A comparison between the with and without the ”Order” feature using a test size of 20%. The bold text shows the best prediction for each of the distance functions

Both Table 11 and Table 12 shows that for all of the three distance functions, the addition of an ”Order” feature yielded a higher percentage of correct predictions for both uniform and distance weight using a test sample size of 20%. The best predictions for both weights using a 20% test sample size was found when using the ”Order” feature. When looking at the ”Order” feature addition, all the distance functions had a higher prediction using the uniform weight (Table 11) compared to the distance weight (Table 12).

6.2.5 Comparing the Two Test Sample Sizes

The best predictions from the different tables when a test sample size of 10%

was used and when a test sample size of 20% was used. Summarized in two different tables assorted by the two weights; uniform and distance.

Average prediction correct of 1000 runs in percentage (%) Uniform, with order

10% k 20% k

Euclidean 43,28 2 43,29 2

Manhattan 46,1 3 44,5 3

Minkowski 42,72 3 41,4 3

Table 13: Comparing the 10% and the 20% test sample size for uniform weight with order. The 10% entries are from Table 9 and the entries in the 20% column are from Table 11.

(34)

Average prediction correct of 1000 runs in percentage (%) Distance, with order

10% k 20% k

Euclidean 36,84 3 37,63 3

Manhattan 40,44 4 39,59 4

Minkowski 37,5 4 37,19 4

Table 14: Comparing the 10% and the 20% test sample size for distance weight with order. The 10% entries are from Table 10 and the entries in the 20% column are from Table 12.

Table 13 and Table 14 show that the Euclidean distance function has the best prediction with a 20% test sample size, for both the uniform and the distance weight. These two tables also show that a 10% test sample size yielded a better prediction for both Manhattan and Minkowski for both uniform and distance weight compared to a 20% test sample size.

6.2.6 The Best Combination of Variables for Each Distance Func- tions

The best performance of the KNN- algorithm for each of the distance functions can be achieved with the following variables:

• Euclidean: With order, Uniform, k = 2, 20% test sample = 43,29%

• Manhattan: With order, Uniform, k = 3, 10% test sample = 46,1%

• Minkowski: With order, Uniform, k = 3, 10% test sample = 42,72%

The best performance for the KNN- algorithm with the data set provided is achieved by using the Manhattan distance function, with order, uniform weight, k = 3, and a 10% test sample.

6.2.7 Comparing the Performance of the Algorithm With That of a Monkey

From each of the distance functions summarized in 6.2.6, a prediction value was retrieved. The prediction covered the four classes; T0, low stress, medium stress and high stress. In Table 15 below, ”Total” is the best prediction for each of the distance functions and ”T0” is the percentage of the prediction consisting of the T0 class. The ”Stress Levels” is the remaining percentages after ”Total”-”T0”, representing the correct prediction for the three classes; low stress, medium stress and high stress.

(35)

Average prediction correct of 1000 runs in percentage (%)

Total T0 Stress Levels

Euclidean 43,29 25 18,29

Manhattan 46,1 26 20,1

Minkowski 42,72 24,96 17,76

Table 15: Taking the best prediction for each algorithm and separating it between the class T0 and the three classes; low stress, medium stress and high stress

Four classes were used in this algorithm; T0, low stress, medium stress and high stress. T0 has a value around 25% which represents 1/4 of the classes.

This means the program can identify the T0 most of the times. Since the algorithm ran 1000 times and the number of data points representing each of the different classes was almost equal, each of the four classes would represent 25% if everything was predicted correctly. The remaining three classes are sharing on a prediction closer to 20% or less, meaning the program have more difficult to identify among these stress levels. Had the three remaining classes been guessed at random, the algorithm would have generated a prediction of 25% since there would have been 1 in 4 it would have picked the correct class. In conclusion, the algorithm performs worse than random for all the classes except for T0.

6.2.8 Binary Classification

The datapoint were changed to a binary classification, meaning only two classes to classify between. T0 had the same label of ”0”, representing no stress, while the other stress levels got a label of the ”1”, to indicate a stress level. The best predictions without the ”Order” feature with a 10% test sample size was selected. The uniform weight was selected from Table 9 since it performed with a better prediction than the distance weight listed in Table 10.

Average prediction correct of 1000 runs in percentage (%) Weight: Uniform, Without Order, Test size: 10%

k Total T0 Stress Levels

Euclidean 3 81,72 15,72 66

Manhattan 3 85,3 22,6 62,7

Minkowski 3 79,82 14,94 64,88

Table 16: Table showing the predictions when only using binary labels, stress, no stress

The binary classification had a much higher prediction than the most opti- mized variable combinations seen in 6.2.6 where four different classes were used

Classiﬁcation of discrete stress levels in users using eye tracker and K- Nearest Neighbour algorithm

Classification of discrete stress levels in users using eye tracker and K- Nearest Neighbour

algorithm

Mirjam Bor´ en

Abstract

Acknowledgment

Contents

1 Introduction

1.1 Objectives

1.2 Thesis Organization

2 Technology

2.1 Virtual Reality

2.2 Eye Tracking Equipment

2.3 Summary of Technology

3 Human Anatomy and Physiological Reactions

3.1 The Nervous System

3.2 Stress

3.3 Anatomy of the Eye

3.4 Autistic Spectrum Disorder

3.5 Summary Human Anatomy and Physiological Reac- tions

4 Machine Learning

4.1 Different Types of Learning

4.2 Classification

4.3 K - Nearest Neighbour

4.4 Data Cleaning

4.5 Summary of Machine Learning

5 Methodology

5.1 Choice of Test

5.2 Group Stroop Color Word Test Protocol

5.3 Analysis of the Data

6 Result

6.1 Pupil Dilation and Blinking Rate

6.2 Testing the Different Variables