Detecting Cognitive Impairment with Eye Tracking Data during Picture Description

Degree project in Technology, first cycle, 15 credits

Stockholm, Sweden 2020


Swedish title: Detektera Kognitiva Svårigheter med Eye Tracking Data under Bildbeskrivning

MIMMI ANDERSSON

LOUISE VON SYDOW YLLENIUS

KTH


M. Andersson¹, Student, KTH Royal Institute of Technology, and L. von Sydow Yllenius², Student, KTH Royal Institute of Technology

Abstract—The number of people suffering from Alzheimer’s disease and other dementia-related diseases is expected to grow at an accelerating rate, and the cost of these diseases to Swedish healthcare is high. Many ongoing research projects in dementia diagnostics aim to detect cognitive impairment at an earlier stage, which would reduce healthcare costs and improve quality of life for those affected. This work investigates whether cognitive impairment can be classified based on a person’s eye movements. More specifically, it explores the possibility of automating an established picture description task that is widely used in traditional dementia diagnostics. To do this, eye tracking data was collected while patients performed the task. The eye tracking data was then parsed into eye movement features, and Binary Logistic Regression was used to classify these eye movements. The average classification accuracy reached 73%. The results did not confirm that eye tracking can be used to automate neuropsychological tests with sufficiently high accuracy, but a machine learning approach for detecting deviations in eye movement patterns appears promising. Furthermore, this work analyzes the possibilities for practically implementing eye tracking in Swedish healthcare in order to detect cognitive impairment at an earlier stage. Provided that an eye tracker can detect cognitive impairment with an accuracy equal to or higher than that of a medical professional, the study argues that automated neuropsychological tests at health clinics could be the key to detecting cognitive impairment at an earlier stage.

Index Terms—Dementia, Swedish healthcare, diagnostic, neuropsychological test, automatisation, eye tracking, classification, machine learning

Summary (translated from Swedish)—The number of people suffering from Alzheimer’s disease and other dementia-related diseases is expected to grow at an accelerating rate, and the cost of these diseases to Swedish healthcare is high. There are many ongoing research projects in dementia diagnostics that analyse people’s eye movements in order to detect cognitive impairment at an early stage. This research aims to reduce costs and improve quality of life for those who fall ill. This work investigates whether machine learning can be used to classify cognitive impairment based on a person’s eye movements. More concretely, it investigates the automation of an established picture description task that is widely used in dementia diagnostics today. Data representing people’s eye movements was therefore collected while they performed various dementia tests. From this data, eye movement features were derived, and binary logistic regression was used to classify the eye movements. On average, the model classified 73% of the cases correctly. This result cannot confirm that the technique can be used to automate neuropsychological tests with sufficiently high accuracy. However, using machine learning to detect deviations in the eye movements of people with cognitive impairment looks promising. The possibility of practically implementing eye movement analysis in Swedish healthcare is also analysed. The result shows that, if it is possible to design a model that diagnoses better than a professional physician, it can be argued that automated neuropsychological tests could be a key to detecting cognitive impairment at an early stage.

Index terms (translated from Swedish)—Dementia, Swedish healthcare, diagnostics, neuropsychological tests, automation, eye tracking, classification, machine learning

Submitted 2020-05-07. This work is part of an active 5-year research project named EACare, led by representatives from both the Department of Speech, Music and Hearing at the Royal Institute of Technology and the Karolinska Institute. We wish to thank Jonas Beskow for supervising and for providing the necessary information and materials to make this project possible. We also want to thank Krister Håkansson, who supported us with medical knowledge about cognitive impairment. Last but not least, we want to thank Olga Mikheeva for helping us obtain and interpret the data.

¹ Mimmi is a student in Industrial Engineering and Management specialising in Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden (email: mimmiand@kth.se).

² Louise is a student in Industrial Engineering and Management specialising in Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden (email: yllenius@kth.se).

I. BACKGROUND & GOAL

Healthcare for people with dementia costs Swedish society around 63 billion SEK per year, which is more than the total cost of cancer and heart disease combined. As human lifespan increases and dementia correlates with higher age, the number of patients is expected to grow at an accelerating rate [1]. Most of the patients diagnosed today would have been able to live a normal life if symptoms of Mild Cognitive Impairment (MCI) had been detected at an early stage (K. Håkansson, Associate Professor of Psychology, personal interview, 2020-02-20). If diagnostic methods could improve and detect MCI more accurately, the result would be not only reduced healthcare costs but also improved quality of life for many individuals. One fundamental reason why MCI is not always detected at an early stage is that the symptoms are intermittent and the time span between regular diagnostic tests is long (K. Håkansson, Associate Professor of Psychology, personal interview, 2020-02-20). This makes the timing and, consequently, the frequency of the tests crucial for detecting MCI at an earlier stage.


Several neuropsychological tests are routinely used in the basic investigation to assess cognitive capacity [2]. Many of these tests do not necessarily require the presence of a medical professional and could therefore be automated. If such automation were successful, it would enable potential patients to take the tests themselves, earlier and more often.

A solution like this could help innovate healthcare. This study therefore also reflects on how such an innovation could be practically implemented in the Swedish healthcare regime, and on what challenges could be encountered, from a multi-level perspective.

In summary, this work aims to find a way to automate a neuropsychological test used in dementia investigations and to reflect on what innovations such a solution could entail.

II. THEORY

A. The Cookie Theft Picture

Clinical experts in speech and language disorders routinely use picture description tasks to assess cognitive impairment. The most well-known picture used in this type of task is the so-called Cookie Theft picture, which depicts a mother and her two children standing in a kitchen. In the picture, the mother is doing the dishes while water overflows from the sink. Meanwhile, the two children are stealing cookies from a cookie jar, and the boy is about to fall off the chair he is standing on.

In a medical examination, the participant is asked to describe what is happening in the picture, with or without a time limit. The participant’s cognitive ability is then assessed based on predetermined criteria [2]:

1) What types of events the participant acknowledges and in which order they are described.

2) Which semantic categories are used in the description (whether the descriptions are general or specific).

3) Whether the participant shows the ability to refer correctly to previously described objects or events.

4) How causal and temporal relationships are described. The ability to describe a static picture in a correct temporal order is crucial to how a person experiences events in the real world.

5) The ability to use mental state language. Examples could be “the boy is trying to reach the cookie jar” or “the water is overflowing but the woman does not seem to notice”.

6) What structural language is used and whether there are any deficiencies in phonology, syntax or semantics.

7) General cognition and perception. Attention, memory and utterance planning can be impaired in people with cognitive impairment. Repetitions, or descriptions that do not contribute any new information, are therefore also considered.

B. Eye movements in connection to cognitive impairment

Several scientific studies have examined the correlation between eye movements and cognitive impairment. Many studies have shown differences in oculomotor function between patients with dementia due to Alzheimer’s Disease (AD) and healthy controls, where the oculomotor function is the function that controls the eye movements and eyelids. A common method for evaluating oculomotor function is to administer prosaccade and antisaccade tasks, in which a patient is asked to either look at or avoid looking at objects appearing on a screen. Results of these tasks have shown that saccades are slower and less accurate as a result of AD. Pupil size has also been shown to increase with cognitive load and, therefore, to be larger for people with AD than for healthy controls [3]. These studies suggest that oculomotor function is linked to cognitive functions, and eye movements could therefore be used as a biomarker for cognitive impairment.

C. Eye movements in connection to speech production during picture description

Numerous eye tracking studies show a strong correlation between eye movements and language production. When a person is asked to describe objects in a picture, the person usually fixates on the objects in the order of mention [4]. Eye tracking studies have shown that healthy persons fixate on an object until a morphological form of the object name has been mentioned. The fixation time on the object is therefore related to the time it takes for the person to recognize the object and mention its correct name [5]. A natural consequence of this is that the fixation time on an object is correlated with the length of the object name. Another is that the fixation time on an object decreases with increasing speech rate [6].

D. Eye movement tracking and analysis

One way to analyse eye movements is to record them with an electronic eye tracking device. An eye tracker photographs each eye at a high frequency and outputs raw data such as the timestamp, the x and y coordinates of the gaze position, and the pupil diameter for each photo. [7]

Eye movements can be divided into oculomotor events and representations [8]. Simple oculomotor events are what can be seen in a plot of the raw data from the eye tracker, for example blinks, saccades and fixations. These are countable and can be measured as a single numerical value but can have different properties such as velocity, duration etc. Definitions of some of these events and algorithms for detecting them are explained below:


| Oculomotor event | Definition | Detection algorithm | Properties |
|---|---|---|---|
| Fixations | Eyes are fixated at one point with a duration between 50-600 ms. | Detected by a maximum allowed dispersion threshold. Data samples must lie within a certain area (radius) for at least 50 ms to be a fixation. | Counts, duration, dispersion |
| Saccades | Rapid movement of both eyes in the same direction, between two or more phases of fixation. | Detected by a velocity threshold. The velocity threshold is usually set somewhere between 40-100°/s. | Counts, velocity, length, acceleration |
| Blinks | Both eyes close. | Detected when x, y = 0 or when the pupil diameter = 0. | Counts, duration |
| Smooth pursuits | Slow eye movements. | Detected as movement with velocity between 30-40°/s. | Counts, velocity, length |
| Artefacts | Recorded eye movements that cannot be classified as any of the events above. | Detected for instance if the data shows saccades with a velocity higher than 900°/s (the maximum saccade velocity for humans is 900°/s). | Counts etc. |

TABLE I: Oculomotor events and their properties [8].
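The dispersion-threshold rule for fixations in Table I can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the function names, the `(t_ms, x, y)` sample layout and the 30-pixel dispersion limit are assumptions, and only the 50 ms minimum duration is taken from the table. Dispersion is measured here as the x-range plus the y-range of a window, a common simplification, whereas the table speaks of a radius.

```python
def dispersion(window):
    """Spread of a window of (t_ms, x, y) samples: x-range plus y-range."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=30.0, min_duration_ms=50.0):
    """Return fixations as (start_ms, end_ms, centre_x, centre_y) tuples.

    samples: list of (t_ms, x, y) tuples sorted by time.
    max_dispersion is an assumed value in screen pixels.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow an initial window until it spans the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough data left for another fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the samples stay close together.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # slide past the first sample and try again
    return fixations
```

The count, duration and centre of each returned fixation correspond to the "counts, duration, dispersion" properties listed for fixations in the table.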

Furthermore, a scanpath is a trace of a person’s eye-movements and events during a certain timespan. Scanpath visualizations are suitable for inspection of similarities or differences in oculomotor behaviour. A representation of a scanpath can therefore reveal information which otherwise could not be measured. [8]

Fig. 1: Example of a scanpath visualization.

E. Innovations in healthcare from a multi-level perspective

To understand how new technologies and innovations arise in society, a multi-level perspective can be a valuable approach. The multi-level perspective (MLP) is an analytical tool that can be used to explain industrial and technological change over time. The MLP examines change at three different levels, where processes interact and align to produce an outcome in the socio-technical system, which in this case is the Swedish healthcare system. These levels are called the landscape, the regime, and the technological niche. Firstly, the landscape is the macro level and is defined as the broader contextual development in our environment. Cultural patterns, politics, economic crises, and climate change are examples of factors that constitute and affect the landscape. Secondly, the regime is the meso level and represents the socio-technical system that is constituted by institutions, rules, organisations, consumers, and technologies. In this case, Swedish healthcare can be seen as a socio-technical regime whose actors and components are, amongst others, hospitals (both regulated and private), patients, healthcare workers, medicine, and patient acts. Finally, the technological niche is defined as the micro-environment where technological novelties can be fostered and tested before they potentially innovate the regime. In this case, a dementia diagnostic device is the technological niche. [9]

The regime is influenced by the landscape, and pressure from the landscape creates an imbalance in the regime, which opens up opportunities for new technologies to break through and innovate the regime. This is how new innovations can emerge, and it is therefore important to analyse the landscape level as well as the regime level in order to understand how automated dementia diagnostics can be accepted and implemented by the regime. [9]

III. PROBLEM

To summarize the theory above, there are many studies on eye movements in relation to cognitive impairment. However, there are no clear research findings on the eye movement behaviour of people with cognitive impairment during picture description. If distinctive eye movement behaviour could be found for people with MCI during picture description, the Cookie Theft test could be automated.

When trying to detect significant differences in eye movement behaviour without knowing beforehand what the differences might be, machine learning can be a valuable approach. More specifically, a binary classification algorithm can be used to classify sequences of eye movements into two disjoint classes. The problem statements are thus defined as follows:

I. Is it possible to classify eye movements during picture description into two cognitive states (MCI and not MCI) with an accuracy higher than the baseline and with a significance level of 0.05?

II. Can eye tracking technology, from a multi-level perspective, be practically implemented in Swedish healthcare?

IV. METHOD

To investigate eye movements of patients with MCI, the work was done in collaboration with a project called EACare.


The EACare project aims to develop a system that can detect dementia at an early stage using social robotics and machine learning (ML). Via this project, the study was granted access to eye tracking data of patients with MCI, collected by doctors at the Karolinska Institute. The provided eye tracking data was collected from 14 patients, 6 diagnosed with some form of cognitive impairment and 8 diagnosed as healthy.

To detect significant differences in eye movements due to cognitive impairment, an ML algorithm that solves binary classification problems was used. More specifically, Binary Logistic Regression (BLR) was applied to classify the eye tracking data into two states: normal eye movement behavior (0) or symptoms of MCI (1). The raw data set, the data processing method, and the BLR classification are described below.

A. Raw data set

The data from the Karolinska Institute was collected with a Tobii Pro Nano, a screen-based eye tracker that captures eye tracking data at 60 Hz. The output for each data point is unprocessed and includes the timestamp, gaze position, pupil diameter, gaze vector, and validity code for each eye (Tobii Pro Nano description webpage, 2018). The gaze position gives the 2D coordinates on the screen in the Active Display Coordinate System (ADCS). The gaze vector is a 3D vector in the User Coordinate System (UCS). The origin vector makes it possible to calculate the angular velocity between two gaze positions. The validity code is binary and determines whether or not the data point is included.

Fig. 2: Visualization of the Eye Tracking features (Used with permission of Tobii AB.)
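As an illustration of how an angular velocity between two consecutive samples could be derived from 3D direction vectors, the sketch below computes the angle between two vectors and multiplies it by the sampling rate. The function names and the example vectors are hypothetical, and exactly which vectors the study fed into this calculation is not specified in the text; only the 60 Hz rate is taken from the description above.

```python
import math

SAMPLING_RATE_HZ = 60.0  # Tobii Pro Nano rate stated in the text


def angle_between_deg(v1, v2):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_angle))


def angular_velocity_deg_per_s(v1, v2, rate_hz=SAMPLING_RATE_HZ):
    """Angular velocity between two consecutive gaze direction samples."""
    return angle_between_deg(v1, v2) * rate_hz
```

At 60 Hz, two consecutive samples whose directions differ by 1° correspond to 60°/s, which lies inside the 40-100°/s saccade threshold range given in Table I.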

In addition to the eye tracking data, information about the patients’ diagnoses was obtained. These diagnoses were AD, MCI and subjective MCI, and were simplified to 1 (some type of cognitive impairment) and 0 (only self-perceived or no cognitive impairment). Finally, the corresponding audio recordings for each patient were obtained.

B. Data processing

The data processing consisted of both quantitative and qualitative methods. In the quantitative part, the raw data was processed in three stages. Firstly, all artefacts and samples without values were filtered from the raw data. Secondly, oculomotor events were detected in the data from each patient. The algorithms for detecting these oculomotor events are described in Table I. Thirdly, the different properties of these events were calculated and provisionally used as features in the feature vectors, where each vector represented one patient.
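The first and third stages can be sketched in a few lines. This is an illustration under stated assumptions rather than the study's code: the dict-based sample layout, the key names and both function names are hypothetical, and the filter simply drops samples that are flagged invalid or that carry zero gaze coordinates or zero pupil diameter.

```python
def filter_samples(samples):
    """Stage 1: keep only samples that are flagged valid and carry a value.

    Each sample is assumed to be a dict with the keys 't_ms', 'x', 'y',
    'pupil' and 'valid' (1 = valid, 0 = invalid).
    """
    return [
        s for s in samples
        if s["valid"] == 1 and not (s["x"] == 0 and s["y"] == 0) and s["pupil"] > 0
    ]


def fixation_features(fixations, recording_s):
    """Stage 3: summarise the detected fixations of one patient.

    fixations: list of (start_ms, end_ms) pairs produced by stage 2.
    Returns (fixations per second, average fixation duration in ms).
    """
    if not fixations or recording_s <= 0:
        return 0.0, 0.0
    durations = [end - start for start, end in fixations]
    return len(fixations) / recording_s, sum(durations) / len(durations)
```

Normalising the fixation count by the recording length is one way to make patients with different recording times comparable.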

In order to evaluate the calculated features and select the most promising ones, a manual qualitative analysis was also performed. Firstly, calculated features that involved events such as saccades and microsaccades had to be removed. Previous studies have shown significant differences in these features between patients with and without MCI, but the sampling frequency of the device was not high enough to measure saccades and microsaccades accurately. Secondly, features such as the number of blinks and the number of gaze points outside the screen were removed. No previous studies show that these are correlated with MCI, and since none of them showed high validity or correlation, they could have misled the training of the logistic regression model.

Thirdly, the corresponding audio recordings for each patient were examined together with the calculated features, to see whether the positions of the calculated fixation points matched what the patient said. It was found that the features involving fixation points have deficiencies, since the positions of the fixation points do not correspond to what the patient says. It is considered unlikely that a patient has no gaze data in, for example, the area of the boy in the Cookie Theft picture when the patient mentions the word ‘boy’ in the audio recording. Since fixations are an important event for the BLR training, they are nevertheless included in the features, despite their low reliability.

Another important part of feature selection was to choose as few features as possible while keeping the important ones. The number of features should not be too high relative to the number of data points. According to guidelines, the minimum number of data points required for good classification performance is N = 10*f/p, where N is the number of data points, f is the number of features, and p is the smaller of the proportions of negative and positive cases [10]. In this experiment, N = 14 and p = 6/14 ≈ 0.43, which gives a recommended maximum of f = N*p/10 ≈ 0.6 features. This cannot be met in practice, since at least one feature is needed.
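The arithmetic behind this guideline can be checked with a few lines of Python. The function name is hypothetical; the inputs are the figures given in the text (14 patients, 6 of them positive cases).

```python
def max_features(n_points, p_smallest_class):
    """Rearrange N = 10 * f / p to f = N * p / 10."""
    return n_points * p_smallest_class / 10


p = 6 / 14                    # smaller class proportion, about 0.43
f_max = max_features(14, p)   # 14 * (6 / 14) / 10, about 0.6
points_needed = 10 * 5 / p    # data points the guideline asks for with five features
print(round(f_max, 2), round(points_needed))
```

Read in the other direction, the same guideline would call for roughly 117 data points for a model with five features, compared with the 14 available here.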

The final features were selected because previous studies support a significant correlation with MCI. The features were also designed to reduce the effect of individual differences and of shortcomings in the raw data set. The final features selected were:

1. Number of fixation points per second

Results have shown that patients with probable AD have a lower fixation point frequency than patients without during clock reading tasks [11].

2. Average duration of fixations

Results have shown that patients with probable AD have on average significantly longer fixations than patients without during clock reading tasks [11].


3. Dispersion of fixations

Involuntary microsaccades (i.e. small eye movements) have been shown to differ between people with MCI and healthy subjects, which in turn could affect the dispersion of fixations [12].

4. Average variance in pupil size

Pupil size is known to increase with cognitive load [3].

5. Areas of Interest visited within the first 15 seconds of the test

In the Cookie Theft assessment, the types of events mentioned and the order in which they are mentioned are taken into account [3]. To be able to measure this, the salient events in the picture were represented by Areas of Interest (AoIs). The AoIs chosen are the faces of the mother, the boy, and the girl, the cookie jar, the overflowing water, and the tilting chair.
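One way the fifth feature could be computed is sketched below: each AoI is approximated by a rectangle, and a fixation counts as a visit if it starts within the first 15 seconds and its centre falls inside the rectangle. The rectangle coordinates, the AoI names and the function name are all hypothetical, and how the study encoded the visited AoIs as a single feature value is not specified in the text.

```python
# Hypothetical AoI rectangles as (x_min, y_min, x_max, y_max) in screen pixels.
AOIS = {
    "mother_face": (900, 100, 1000, 200),
    "boy_face": (300, 80, 380, 160),
    "girl_face": (150, 250, 230, 330),
    "cookie_jar": (350, 20, 430, 90),
    "water": (850, 400, 1000, 500),
    "chair": (280, 450, 400, 600),
}


def aois_visited(fixations, aois=AOIS, window_ms=15_000):
    """Return the set of AoI names hit by a fixation starting within the window.

    fixations: list of (start_ms, x, y) tuples, with time measured from test start.
    """
    visited = set()
    for start_ms, x, y in fixations:
        if start_ms > window_ms:
            continue  # outside the first 15 seconds
        for name, (x0, y0, x1, y1) in aois.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                visited.add(name)
    return visited
```

The size of the returned set, or a 0/1 indicator per AoI, could then serve as the numeric feature value.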

C. Classification

The final features formed the 14 feature vectors that were used in the BLR model. Since the project was provided with eye tracking data from 14 patients, 8 of whom were diagnosed as healthy, the baseline was set to 8/14, i.e. 57%, which is the accuracy obtained by always predicting the majority class. The model was then trained with a gradient descent algorithm, where the training was interrupted if the validation loss increased more than five times in a row, which was treated as overfitting, or if the norm of the gradient fell below 0.01.
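The training loop and its two stopping rules can be sketched in plain Python as follows. This is a minimal illustration, not the study's code: the function names, the learning rate, the iteration cap and the zero initialisation are assumptions, while the patience of five consecutive rises in validation loss and the gradient-norm limit of 0.01 are taken from the text.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(theta, x):
    """Probability of class 1; theta[0] is the bias (dummy) parameter."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def log_loss(theta, X, y):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(theta, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def gradient(theta, X, y):
    """Gradient of the mean cross-entropy with respect to theta."""
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = predict_proba(theta, xi) - yi
        grad[0] += err
        for k, v in enumerate(xi, start=1):
            grad[k] += err * v
    return [g / len(X) for g in grad]


def train_blr(X_train, y_train, X_val, y_val, lr=0.1, max_iter=10_000,
              patience=5, grad_tol=0.01):
    """Gradient descent with the two stopping rules described in the text.

    Returns (theta, reason), where reason is 'overfitting', 'small_gradient'
    or 'max_iter'. lr and max_iter are assumed values.
    """
    theta = [0.0] * (len(X_train[0]) + 1)
    prev_val, rises = float("inf"), 0
    for _ in range(max_iter):
        grad = gradient(theta, X_train, y_train)
        if math.sqrt(sum(g * g for g in grad)) < grad_tol:
            return theta, "small_gradient"
        theta = [t - lr * g for t, g in zip(theta, grad)]
        val = log_loss(theta, X_val, y_val)
        rises = rises + 1 if val > prev_val else 0
        if rises > patience:
            return theta, "overfitting"
        prev_val = val
    return theta, "max_iter"
```

Returning the stopping reason alongside the parameters makes it straightforward to count how often training ended because of overfitting rather than a small gradient.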

In order to evaluate the model with so few data points, 14-fold cross-validation (i.e. leave-one-out cross-validation) was used. Thus, for every training run, a different data point was excluded from the training data and used as validation data. The model classified each held-out data point, giving 14 outcomes in total. Based on these 14 outcomes, the accuracy, number of overfitted training runs, average gradient vector, and average values of the theta parameters were calculated and recorded. Because the outcome differs between training runs, the 14-fold validation was repeated 100 times in order to obtain a more reliable result.
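The evaluation scheme can be sketched independently of the classifier. In the illustration below, `leave_one_out` and `validation_metrics` are hypothetical names, and the threshold classifier is only a stand-in so that the example runs on its own; the study used logistic regression instead.

```python
def leave_one_out(X, y, fit, predict):
    """Hold out each data point once; return the list of predictions."""
    predictions = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        predictions.append(predict(model, X[i]))
    return predictions


def validation_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = MCI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Stand-in classifier for the demo only: a threshold halfway between the
# class means of a single feature.
def fit_threshold(X, y):
    mean0 = sum(x[0] for x, t in zip(X, y) if t == 0) / y.count(0)
    mean1 = sum(x[0] for x, t in zip(X, y) if t == 1) / y.count(1)
    return (mean0 + mean1) / 2, mean1 >= mean0


def predict_threshold(model, x):
    threshold, positive_above = model
    return int((x[0] > threshold) == positive_above)


X = [[0.1], [0.2], [0.3], [0.8], [0.9], [0.25]]
y = [0, 0, 0, 1, 1, 1]
y_pred = leave_one_out(X, y, fit_threshold, predict_threshold)
accuracy, precision, recall = validation_metrics(y, y_pred)
```

As a sanity check against Table II: with 6 positive patients out of 14, a run with 3 true positives, 2 false positives and 3 false negatives gives A = 9/14 ≈ 0.64, P = 0.60 and R = 0.50, which is consistent with the first row of the table.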

All data processing and calculations were done in Python.

D. Implementation in Swedish healthcare

In order to investigate the possibilities of implementing, and thereby automating, the Cookie Theft picture test at hospitals and clinics in Sweden, a qualitative study was carried out. Firstly, to gather knowledge about the healthcare system from a social perspective, i.e. how patients and other actors, such as healthcare workers, relate to possible automation, an in-depth interview was held with an expert in the field. The full interview is presented in Appendix A. Secondly, to gain more knowledge about an implementation from a practical and technical perspective, an information search was made on which processes have been successfully automated in healthcare so far.


V. RESULTS

A. Results of the Classification Model

| Test Nr. | A | P | R | A > baseline (57%) | % overfittings | Avg gradient vector | Avg θ1 | Avg θ2 | Avg θ3 | Avg θ4 | Avg θ5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.64 | 0.60 | 0.50 | true | 71% | [0.03, 0.10, −0.03, 0.00, 0.04, 0] | 0.48 | 0.47 | 0.11 | 0.37 | 0.16 |
| 2 | 0.71 | 0.75 | 0.50 | true | 79% | [−0.03, 0.06, −0.07, −0.00, 0.03, 0] | 0.32 | 0.33 | −0.26 | 0.25 | 0.09 |
| 3 | 0.71 | 0.75 | 0.50 | true | 86% | [0.05, 0.17, −0.03, 0.00, 0.06, 0] | 0.18 | 0.04 | 0.01 | 0.01 | 0.10 |
| 4 | 0.50 | 0.33 | 0.17 | false | 86% | [−0.044, 0.029, −0.083, −0.004, 0.03, 0] | 0.41 | 0.17 | 0.21 | 0.17 | 0.12 |
| 5 | 0.79 | 1.00 | 0.50 | true | 86% | [−0.03, 0.056, −0.06, −0.00, 0.03, 0] | 0.40 | 0.29 | 0.02 | 0.09 | 0.06 |
| 6 | 0.79 | 0.80 | 0.67 | true | 86% | [0.01, 0.11, −0.05, 0.00, 0.04, 0] | 0.28 | 0.19 | 0.08 | 0.11 | 0.19 |
| 7 | 0.58 | 0.50 | 0.33 | true | 79% | [−0.01, 0.08, −0.06, −0.00, 0.04, 0] | 0.31 | 0.01 | 0.01 | 0.39 | 0.18 |
| 8 | 0.86 | 1.00 | 0.67 | true | 93% | [0.06, 0.22, −0.02, 0.01, 0.06, 0] | 0.01 | 0.20 | 0.16 | 0.13 | 0.04 |
| 9 | 0.71 | 1.00 | 0.33 | true | 79% | [−0.08, −0.40, 0.28, −0.12, −0.21, 0.43] | 0.06 | 0.07 | 0.00 | 0.03 | 0.00 |
| 10 | 0.71 | 0.75 | 0.50 | true | 79% | [−0.02, 0.07, −0.06, −0.00, 0.04, 0] | 0.37 | 0.20 | 0.06 | 0.12 | 0.10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 100 | 0.64 | 0.67 | 0.33 | true | 79% | [0.010, 0.11, −0.04, 0, 0.04, 0] | 0.28 | 0.31 | 0.06 | 0.27 | 0.13 |
| Total avg | 73% | 72% | 45% | 89% true | 84% | [0.01, 0.06, 0.05, 0.02, 0.04, 0] | 0.27 | 0.18 | 0.01 | 0.18 | 0.00 |

TABLE II: Results of the first 10 and the last 1 out of 100 tests.

Validation Metrics (A = Accuracy, P = Precision, R = Recall), Gradient Descent Metrics, and Average Theta Parameters (where the dummy theta parameter is not included)

The results of the classification model show that the average accuracy across the 100 tests reached 73%. The accuracy exceeded the baseline of 57% in 89 out of 100 tests, i.e. with a significance level of 11%. The results also show that the gradient descent algorithm was interrupted due to overfitting in 84% of the training runs. In the other 16%, it was interrupted because the norm of the gradient was low. Furthermore, some of the theta parameters had consistent average values and some did not. The first, second and fourth parameters kept a consistent sign, negative or positive, across the tests. The other two parameters were less consistent, and their average values differed more between tests.

B. Results of the automated healthcare processes search

| Company | Process |
|---|---|
| Enlitic | AI deep learning to streamline radiology diagnoses [9] |
| Pathai | More accurate cancer diagnosis with AI [13] |
| Buoy health | An intelligent symptom checker [14] |
| Zebra M.V | AI-powered radiology assistant [15] |
| Freenome | Earlier cancer detection with AI [16] |
| Coala Life | Self-screening device for arrhythmia investigations at home [17] |
| Uppsala university | Self-screening device for detecting cervical cancer [18] |

TABLE III: Companies and their processes that have been successfully implemented in today’s healthcare.

The results of the automated healthcare processes search show that there are many established automated processes in today’s healthcare. Some of these processes enable patients to do screening at home, in order to allow more regular examinations for a specific disease.

C. Results of the in-depth interview

| Subject | Comment | Conclusion |
|---|---|---|
| Disadvantages of regular screening | Overdiagnosis could occur where early stage diseases are detected in screening tests but do not contribute to better treatment or health. Since some dementia related diseases are not treatable, this is a factor to take into consideration. Faulty diagnostics can also occur if no control function is implemented. | Dementia is not always preventable. |
| Disadvantages of automation from a patient perspective | There is a certain distrust towards fully digital solutions. There was a case at the new Karolinska Institute where they started a project to make everything wireless, which resulted in alarm systems ceasing to function, and patients died. Many people hear a lot about problems in this field, which can contribute to skepticism. From a patient’s point of view, a lot of patients I talk to want to meet a physician in person, because there is confidence in doctors’ professional competence and authority. The positive thing about digital diagnostics is that you can reduce the human error factor in diagnostics. | Patients need care with human contact. |
| Disadvantages of automation from a hygiene perspective | Sterilization of tools is something that has been automated in many departments, but it has in some cases resulted in dirty tools, and many departments have returned to manual sterilization. The use of artificial intelligence in radiology is another area that I have heard has been successful. | Manual sterilization of tools is sometimes better. |

TABLE IV: Summary and conclusions from interview (I. von Sydow, Medicine Doctor, Personal interview 2020-04-15)

The results of the interview show that both patients and hospital staff experience disadvantages with automated care processes. These disadvantages mainly take the form of a lack of safety and hygiene.


VI. DISCUSSION

A. Classification Model

When interpreting the results of the classification model, it is important to remember how little data was fed into the BLR model. It is also important to keep the low quality of the data in mind. These two limitations can be expected to cause problems with the BLR training and thus produce invalid results. The data set consisted of recordings from only 14 patients, which is too few to even out individual differences. Oculomotor function is individual, and eye movements and pupil size can therefore differ from person to person. This problem is typified by the result for the fourth parameter. Since it is consistently negative, it indicates that the patients with MCI in this data set had a lower pupil variance than the healthy participants, whereas in theory the opposite should hold, since pupil size increases with cognitive load. Thus, the expected correlation between pupil variance and cognitive impairment was not found in the data set. These problems are considered a significant reason why the gradient descent algorithm was interrupted by overfitting in such a high percentage of cases, especially given the high number of features in relation to the number of data points.

The poor data quality, on the other hand, is typified by the quantitative analysis of the calculated features, where the fifth feature proved to be unreliable. According to the gaze data, none of the patients looked at all the areas selected as AoIs during the first 15 seconds. This is considered unlikely, because all patients mention events in these areas of the picture. With this in mind, the feature could have been removed, but it is believed to be of significant importance from a semantic perspective, and to provide valuable information if this model is used further with data of higher quality. This quality problem also weakens the correlation and contributes to the associated theta parameter being trained to an incorrect value, which in turn has a major impact on the classification result.

However, despite the problems, the results show that there is potential in the BLR model and that the parameters in many cases seems to be trained in the right direction. For instance, in the first two parameters it is possible to distinguish results that align with the theory, with a negative value for the first and a positive value for the second parameter respectively. Since healthy people are more likely to have a higher frequency of fixations, as well as shorter fixations, these parameter values appear to have been trained in the correct direction. Also, the average accuracy is well above the baseline, which indicates that gaze data could provide valuable insight to dementia diagnostics.

To answer the first problem statement: with this model, the accuracy unfortunately does not exceed the baseline at a significance level of 0.05. However, with a larger and more reliable data set, it is possible that this could be achieved. If improved data could be obtained, the model could be tested again and a more informed answer to this question could be given.
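
To illustrate why an accuracy above baseline can still fail a significance test on a small sample, the sketch below runs a one-sided exact binomial test. It is not the test procedure used in this work: the count of 10 correct predictions out of 14 and the baseline of 0.5 are hypothetical numbers chosen for illustration, and `binomial_p_value` is an illustrative helper.

```python
from math import comb


def binomial_p_value(correct, n, baseline):
    """One-sided exact binomial test: P(X >= correct) for X ~ Bin(n, baseline)."""
    return sum(
        comb(n, k) * baseline**k * (1 - baseline) ** (n - k)
        for k in range(correct, n + 1)
    )


# Hypothetical example: 10 of 14 held-out predictions correct (about 71%),
# tested against an assumed chance baseline of 0.5.
p = binomial_p_value(10, 14, 0.5)
significant = p < 0.05
print(f"p-value = {p:.4f}")  # p-value = 0.0898
```

Under these assumed numbers, the smallest count that would be significant at the 0.05 level is 11 of 14, which shows how demanding the test is when only a handful of recordings are available.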

B. Method

Furthermore, the method used to classify eye features is considered appropriate. Using a BLR model in the field of diagnostics is a valid approach when trying to discover new patterns. However, for the algorithm to find new patterns or correlations, many more features have to be used. Several additions could be made to the method to improve the classification. A speech-to-text algorithm could complement the eye tracking data by capturing linguistic features that eye tracking cannot. The converted audio recordings could then be parsed into features and used in the BLR algorithm. Furthermore, with an eye tracker that samples at a higher frequency, more subtle oculomotor events could be detected. Subtle deviations in the oculomotor system, such as microsaccades and intrusions, have been shown to differ with MCI and would be interesting to use as features.
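
The step of parsing a recording into features can be sketched as follows. This is an illustration under stated assumptions rather than the feature definitions used in this work: the `Fixation` record, the AOI names and the four features (loosely modelled on those discussed in this section) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import Optional


@dataclass
class Fixation:
    start_s: float       # onset in seconds from the start of the task
    duration_s: float    # fixation duration in seconds
    pupil_mm: float      # mean pupil diameter during the fixation
    aoi: Optional[str]   # name of the area of interest hit, or None


def eye_features(fixations, task_length_s, aois, window_s=15.0):
    """Turn one recording into a feature dictionary for a classifier."""
    if not fixations:
        raise ValueError("recording contains no fixations")
    # AOIs that were fixated at least once within the first window_s seconds.
    seen_early = {
        f.aoi for f in fixations if f.aoi is not None and f.start_s < window_s
    }
    return {
        "fixation_rate_hz": len(fixations) / task_length_s,
        "mean_fixation_s": mean([f.duration_s for f in fixations]),
        "pupil_variance": pvariance([f.pupil_mm for f in fixations]),
        "early_aoi_coverage": len(seen_early & set(aois)) / len(aois),
    }


# Hypothetical recording with four fixations over a 40-second task.
recording = [
    Fixation(0.5, 0.25, 3.0, "aoi_a"),
    Fixation(2.0, 0.25, 3.5, "aoi_b"),
    Fixation(20.0, 0.5, 4.0, "aoi_c"),
    Fixation(21.0, 0.5, 3.5, None),
]
features = eye_features(recording, task_length_s=40.0, aois=["aoi_a", "aoi_b", "aoi_c"])
```

Features derived from a transcript or from microsaccade detection could be added to the same dictionary, so the classifier input grows without changing the training code.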

C. Implementing eye tracking in health care from a multi level perspective

When interpreting the results of the qualitative methods, the study has provided many ideas on how to practically implement eye tracking to detect MCI at an earlier stage. With regard to the multi level perspective, the health care regime in Sweden is characterized by rigidity because of the many medical regulations and patient laws that govern it. These factors can inhibit change in the system and make it more difficult to implement a solution like this. However, health care is facing major changes in the digital field. More and more health care processes adopt machine learning and artificial intelligence approaches, and more people turn to digital medical appointments instead of physical ones. For an invention to be accepted by the regime, it is important that all actors in the regime can accept it.

The results of the interview show that many patients and health care employees are sceptical towards digitalizing medical assessment because of earlier failures, but this scepticism could be minimized by letting the digitalized processes complement physical care rather than replace it. Based on the results of the information search, it can be concluded that the regime has accepted several technological niches in order to automate processes in healthcare, but that safety is a critical factor.

However, the results of the search for automated healthcare processes show that there are today many established methods that have been accepted by the regime. Some of these enable frequent medical examinations that patients can perform themselves at home. However, the results also show that most innovations that enable automation of diagnosis are processes that often do not occur in connection with patient contact, and therefore do not have to overcome as much resistance from patients to break through. It can therefore be concluded that it is practically possible to introduce regular visits at health clinics for people in certain demographic or genetic groups who are at greater risk of developing cognitive impairment. During these visits, the cognitive tests could be automated with eye tracking without the presence of a doctor. This efficient resource management would not only enable more frequent visits, but also enable people to get tested before developing any noticeable symptoms of cognitive impairment. With an ongoing collection of patient data and a relative assessment over time, one could detect deviations from a person's normal oculomotor function, allowing for diagnosis of MCI at an earlier stage.
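
A relative assessment over time could, for example, compare each new measurement with the same person's earlier visits. The sketch below is one possible reading of that idea, not a validated clinical method: the z-score rule, the threshold of 2.0 and the fixation duration values are assumptions made for illustration.

```python
from statistics import mean, stdev


def deviation_score(history, latest):
    """z-score of the latest measurement relative to the person's earlier visits."""
    if len(history) < 2:
        raise ValueError("need at least two earlier visits to estimate the spread")
    spread = stdev(history)
    if spread == 0:
        raise ValueError("earlier visits show no variation; z-score is undefined")
    return (latest - mean(history)) / spread


# Hypothetical mean fixation durations (seconds) from four earlier visits.
earlier_visits = [0.25, 0.30, 0.25, 0.30]
z = deviation_score(earlier_visits, latest=0.45)
flagged = abs(z) > 2.0  # assumed threshold, not a clinical cut-off
```

Comparing a person with their own history rather than with a population average sidesteps some of the individual variation in oculomotor function, at the cost of requiring several earlier visits.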

In order to answer the second problem statement, given that an eye tracker can detect differences in cognitive capacity with an accuracy higher than a trained medical professional, it would be possible to implement eye tracking in Swedish healthcare from a multi level perspective. In this case, the safety of the technological niche would be high enough to be legally and socially accepted in Swedish healthcare.

VII. CONCLUSION

It can be concluded that there are many interesting and useful eye movement features that are possible to use in an ML algorithm. Eye movement detection appears to be an appropriate approach to diagnose MCI because of the numerous established correlations between the oculomotor system function and cognitive capacity.

Returning to the purpose of this work, it cannot be determined whether it is possible to automate the neuropsychological test used in dementia investigations. This is because the BLR model was unable to classify MCI with a higher accuracy than a medical professional could achieve. Despite the limitations of the data, the results showed that it is possible to learn something from the provided gaze data. The accuracy is stable and exceeds the baseline, and the parameters have in many cases been trained in the right direction.

The main shortcomings of this work, which have a major impact on the results, are caused by the low quantity and quality of the data set used. These problems have caused incorrect feature calculations and have misled the training of the theta parameters in the BLR algorithm.

It is, however, arguable that implementing an ML algorithm could enable an automated clinical assessment, which could improve the conditions for detecting MCI at an earlier stage.

In summary, the model and method look promising for detecting significant patterns and deviations in eye movement behaviour. With more data of higher quality, the model in this study could be improved and thereafter reassessed. If the model then proves to work with an accuracy equal to or higher than that of assessments made by professionals, there are great opportunities for implementing it in the Swedish health care system.

APPENDIX

A. List of questions asked in the in-depth interview

• Can you foresee any challenges or problems with automating diagnostic tests in hospitals and clinics in Sweden?

• Do you think there are any social factors, such as cultural patterns or social norms, that can prevent the acceptance of digitalised health care? Are patients and healthcare employees willing and ready to adopt new innovations?

• Do you have any examples of processes that have been automated, or of attempts that have failed? If so, why do you think they were successful or not?


M. Andersson (Stockholm, 1992) is a student in Industrial Engineering and Management specialising in Computer Science. This work is a Bachelor thesis in the Management and Computer Science field. She has during this work specialised in the data processing and the machine learning aspect.

L. von Sydow Yllenius (Stockholm, 1994) is a student in Industrial Engineering and Management specialising in Computer Science. This work is a Bachelor thesis in the Management and Computer Science field. She has specialised in the theoretical background of this work and focused on the eye movement behaviour in connection to cognitive impairment.
