
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

A Comparison on Image, Numerical and Hybrid based Deep Learning for Computer-aided AD Diagnostics

Sebastian Buvari

Kalle Pettersson


Degree Project in Computer Science
Date: June 23, 2020

Supervisor: Alexander Kozlov


Abstract

Alzheimer’s disease (AD) is the most common form of dementia, accounting for 60-70% of the 50 million active dementia cases worldwide. It is a degenerative disease that causes irreversible damage to the parts of the brain associated with thinking and memory.

Considerable time and effort has been put towards diagnosing and detecting AD in its early stages, and a field showing great promise in aiding early-stage detection is deep learning. The main issue with deep learning in the field of AD detection is the lack of the relatively large datasets typically needed to train an accurate deep learning network.

This paper examines whether combining image based and numerical data from MRI scans can increase the accuracy of such a network.

Three deep learning neural network models were constructed with the TensorFlow framework to be used as AD classifiers, using numerical, image and hybrid input data gathered from the OASIS-3 dataset.

The results of the study showed that the hybrid model had a slight increase in accuracy compared to the image based and numerical based models.

The report concluded that a hybrid AD classifier shows promise as a more accurate and stable model, but the results were not conclusive enough to give a definitive answer.


Sammanfattning

[Translated from Swedish]

Alzheimer’s disease (AD) is the most common form of dementia, accounting for 60-70% of the 50 million people suffering from dementia around the world. Alzheimer’s is a degenerative disease that causes irreversible damage to the parts of the brain associated with memory and cognitive ability.

Much time and many resources have gone into developing methods for detecting and diagnosing AD in its early stages, and a research area showing great potential is deep learning.

The main problem with deep learning in AD diagnostics is the lack of the relatively large datasets usually required for a network to learn to make good evaluations.

The goal of this paper is to explore whether a combination of image based and numerical data from MRI scans can increase the accuracy of a network.

Three different deep learning neural network models were constructed with the TensorFlow framework to be used as AD classifiers with numerical, image and hybrid based input data gathered from the OASIS-3 dataset.

The results showed that the hybrid model had a slight increase in accuracy compared to the image based and numerical networks.

The conclusion of this study is that a hybrid based network shows promising results as a method for increasing the accuracy of a network intended to diagnose AD. However, the results are not conclusive enough to give a definitive answer.


Acknowledgements

We want to thank Alexander Kozlov for mentoring us throughout the project.

Data were provided by OASIS-3: Principal Investigators: T. Benzinger, D. Marcus, J. Morris; NIH P50AG00561, P30NS09857781, P01AG026276, P01AG003991, R01AG043434, UL1TR000448, R01EB009352. AV-45 doses were provided by Avid Radiopharmaceuticals, a wholly owned subsidiary of Eli Lilly.


Contents

1 Introduction
   1.1 Problem definition
   1.2 Aim, purpose and research question
   1.3 Approach
   1.4 Thesis outline

2 Theoretical Background
   2.1 Alzheimer’s Disease
      2.1.1 What is Alzheimer’s Disease
      2.1.2 Classic Alzheimer’s diagnostics
      2.1.3 MRI scans and AD
   2.2 Deep Learning
      2.2.1 What is Deep Learning
      2.2.2 Convolutional Neural Networks
   2.3 Deep Learning for Alzheimer’s detection
   2.4 Dataset
      2.4.1 OASIS - Open Access Series of Imaging Studies
      2.4.2 OASIS-3 dataset
      2.4.3 NIfTI and FreeSurfer data

3 Method
   3.1 Retrieving the Data
      3.1.1 NIfTI data
      3.1.2 FreeSurfer data
   3.2 The Classifiers
   3.3 Making the image based classifier
   3.4 Making the numerical based classifier
   3.5 Structure of hybrid classifier

4 Results
   4.1 Image classifier
   4.2 Numerical classifier
   4.3 Hybrid Classifier
   4.4 Final Evaluation Chart

5 Discussion
   5.1 General Discussion of Results
   5.2 Scientific error
   5.3 Future Work

6 Conclusion

References


1 Introduction

1.1 Problem definition

Alzheimer’s disease (AD) is a degenerative neurological disease which causes irreversible damage to parts of the brain associated with the ability to think and memorize [7]. AD is the most common form of dementia, making up 60-70% of the 50 million active cases worldwide, and is growing at a staggering rate of 10 million new cases each year [25]. As such, AD is a rising global issue. For effective treatment it is critical that the disease is diagnosed at an early stage [19]. Previously, the only way to confirm an AD diagnosis was to examine the brain for signs of damage and shrinkage post mortem [2]. Today, clinicians are able to accurately diagnose patients with the assistance of non-invasive techniques such as magnetic resonance imaging (MRI). An accurate diagnosis, however, requires considerable time and expertise of the clinician, making for a time-consuming and costly procedure. A prospect for improving this process is the modern rise of machine learning.

Machine learning allows for the classification of images by training on large data sets containing many images. Deep learning is a class of machine learning models with the special quality of being able to take raw data as input and deliver an output through the use of so-called hidden layers. This results in ease of use while maintaining a high level of accuracy, making deep learning a prime contender to make an impact on the medical field. Several studies in the field of AD detection through deep learning have been conducted [6, 7, 10, 11, 14]. The studies show promising results in the detection of early stage AD, the detection of developed AD, and the conversion from mild cognitive impairment (MCI) to AD. A recurring problem with the development of deep learning methods in the field is the limited number of MRI scans provided in AD data sets: while machine learning methods typically rely on large data sets to train on, AD data sets are usually relatively small.


1.2 Aim, purpose and research question

The purpose of this report is to examine three types of deep learning neural networks in the context of AD detection: an image based neural network, a neural network based on numerical brain measurements, and finally a hybrid neural network using both images and brain measurements to classify AD. Deep learning models working with complex data tend to require a large data set for accurate results, but the supply of MRI scans of AD-afflicted patients is fairly limited in this context (the OASIS-3 dataset contains about 500).

Therefore the question arises: how do we supplement a smaller data set to yield better results? By mixing numerical data with imagery, a hybrid deep learning model may yield better results due to the increase in available data. This leads us to the research question for this report:

Is a hybrid deep learning model using both MRI imagery and numerical brain measurements more accurate in detecting AD than a corresponding purely image based model?

1.3 Approach

Through studying previous work in the field of AD detection and deep learning, as well as implementations using hybrid data, we attempt to find inspiration for a model capable of classifying AD. The classification will be done on MRI brain scans provided by the OASIS-3 data set, together with numerical data from the FreeSurfer formatted data paired with the scans in the data set. The implementations will be done using TensorFlow, a machine learning framework made by Google, together with Keras, a high level wrapper for TensorFlow. The implementations will be tested on the data set. By analyzing the results from the implementations, an answer to the research question may be formed. The results will tell whether the hybrid data can yield a higher accuracy than purely image or numerically based methods, whether it provides lower accuracy due to overfitting, or whether it makes no difference at all.


1.4 Thesis outline

This thesis is ordered as follows. In the background, state-of-the-art research in the field of computer aided medical diagnostics is presented.

Additionally, sections about what Alzheimer’s disease is and how it is traditionally diagnosed are presented, followed by a section about how MRI images are used in diagnosis. Furthermore, the definitions and concepts behind deep learning models and their workings are presented. Thereafter, the OASIS project and the accompanying data used in this thesis are presented. In the method section the implementations are presented. In the results we compare the three algorithms’ accuracy and loss in different contexts and explain these. In the discussion we analyze whether the results are reliable, what they mean for the field of AD detection, and how they can be used in future work. Finally, we conclude whether a hybrid solution can bring value to the field of AD detection.


2 Theoretical Background

2.1 Alzheimer’s Disease

2.1.1 What is Alzheimer’s Disease

Alzheimer’s Disease (AD) is a progressive neurodegenerative disease which tends to affect the elderly. AD is the most common form of dementia, accounting for upwards of 80% of reported cases [24]. Worldwide there are approximately 50 million people suffering from dementia, with a rapid increase of almost 10 million new cases each year [25]. Although deaths related to many other common diseases such as strokes and cardiovascular diseases are decreasing, deaths associated with AD increased by 89% from the year 2000 to 2014 [24].

Being a progressively degenerative disease, AD tends to start developing slowly and grow gradually worse over time, resulting in symptoms ranging from language problems, disorientation, mood swings and behavioural issues all the way to the gradual loss of bodily functions, eventually resulting in death [3].

The causes of AD are complex and generally poorly understood [3], but they seem to be related to a combination of environmental and genetic factors. The most common factors associated with the risk of developing AD are age and some environmental risk factors including smoking, strokes, heart disease, depression, arthritis and diabetes. Some of the genes that can be linked to the risk of developing AD are APP, PSEN1 and PSEN2. Some lifestyle choices, such as exercising and staying intellectually stimulated, appear to decrease the risk of developing AD [21]. AD is an irreversible disease, meaning that there is no known way of stopping or reversing it, so treatment focuses on relieving symptoms [25]. Examples of treatment are drugs such as cholinesterase inhibitors and memantine, as well as physical exercise [24].


2.1.2 Classic Alzheimer’s diagnostics

The problem with diagnosing Alzheimer’s disease is that a definitive diagnosis can only be obtained post mortem by careful examination of the brain [2].

The two main pathological abnormalities looked for are senile plaques and neurofibrillary tangles, both only visible through a microscope.

The core of a senile plaque contains a large amount of a protein called beta amyloid. The plaques are usually located in the space between neurons and tend to be found in the hippocampus and the cerebral cortex, areas of the brain associated with memory and decision making respectively (see figure 2.1a). Amyloid plaques form dense structures that do not dissolve, and as an individual’s disease worsens the number of plaques increases and they often spread to other parts of the brain. The neurofibrillary tangles are found within the brain cells themselves. The tangles form when the protein tau, which is associated with the microtubules that transport material inside the cell, is severely altered and starts to clump together. This impairs the cell’s internal transport system, which contributes to the eventual death of the cell (see figure 2.1b) [16][4].

(a) Senile plaques (b) Neurofibrillary tangles

Figure 2.1: Microscopic view of Senile plaques and Neurofibrillary tangles from a brain affected with Alzheimer’s disease [16]


Even though a definitive diagnosis is hard to obtain, there are other methods with high accuracy a doctor will use to diagnose patients. During a diagnosis the doctor will look for external signs indicating that a patient might suffer from AD; these are the same signs mentioned in the previous subsection. If the doctor suspects that a patient has AD, they will order additional tests to rule out other potential diseases with similar symptoms, such as thyroid disorder, vitamin B-12 deficiency or other pre-existing conditions that might contribute to the displayed symptoms. Various brain scans such as MRI and PET scans are also used both to rule out other diseases and to help doctors diagnose AD, as discussed in the next section [15].

2.1.3 MRI scans and AD

Magnetic resonance imaging or MRI as it’s commonly known is a technique used in radiology for creating images of the body. The MRI machine uses strong magnetic fields, radio waves and computers to create images of areas within the body [12]. There are several different types of MRI scans such as T1w and T2w.

T1w MRI scans are based on the longitudinal relaxation of tissues; in brain scans this renders different types of tissue in different grayscale colours. In T1w scans fat appears brighter while water and blood appear darker, so gray matter appears gray and white matter appears white [5]. When MRI is used to aid a doctor in diagnosing AD, the doctor will look for deterioration and atrophy. Atrophy of medial temporal structures is a sign of mild cognitive impairment (MCI), and specifically atrophy in the hippocampus is a sign of the conversion from MCI to AD [9].


2.2 Deep Learning

2.2.1 What is Deep Learning

Deep learning is a class of machine learning techniques known for using representation learning. Representation learning means that the system can be presented with raw data and use it directly for feature detection or classification. These techniques are commonly applied in the development of modern image recognition systems. The most common form of deep learning is supervised learning. The idea behind supervised learning, in the case of image recognition, is to let the machine learn by repeatedly being exposed to images which may or may not contain the object to be identified. By exposing the machine to many Yes and No instances of images, where the object either appears (Yes instance) or does not (No instance), the machine is able to determine which features of the images are important in determining whether the object is contained within the image.

The values that are internally tweaked by the machine are commonly referred to as weights [14]. The tweaking of these values is commonly done through backpropagation algorithms [17]. There are different models deep learning algorithms can use to go from the input data to an output vector of most likely answers. This vector is the machine’s representation of which answer is most likely correct; for example, if the machine was looking for a carrot in an image and the Yes entry of the vector had the greatest value, the machine would say that there is indeed a carrot in the image. A deep neural network is used to process the image and determine an output.

Convolutional neural networks are a class of such neural networks and are commonly used in the field of image recognition [14].


2.2.2 Convolutional Neural Networks

Convolutional neural networks (CNN) are a network architecture where the layers are not necessarily fully connected but instead work to give value to the spatial context of the input. A fully connected architecture would treat pixels close to each other and far apart the same, meaning distance has no real effect: it ignores the spatial context of the input pixels. The ability to value the spatial structure of the image is desirable for obvious reasons.

Convolutional neural networks are designed to give this support [18].
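The practical effect of this weight sharing can be illustrated with a simple parameter count. The numbers below are our own illustrative assumptions (a 100x100 grayscale input, 16 units/filters, a 3x3 kernel), not values taken from the thesis:

```python
# Parameter-count arithmetic illustrating why convolutional layers suit images.
h, w, channels = 100, 100, 1
units = 16          # dense units / convolutional filters
kernel = 3          # 3x3 convolution kernel

# A fully connected layer over the flattened image connects every pixel to
# every unit, ignoring which pixels are neighbours.
dense_params = (h * w * channels) * units + units        # weights + biases

# A 3x3 convolution reuses the same small kernel at every spatial position,
# so its parameter count is independent of the image size.
conv_params = (kernel * kernel * channels) * units + units

print(dense_params)  # 160016
print(conv_params)   # 160
```

The convolutional layer needs three orders of magnitude fewer parameters here precisely because it exploits the spatial structure the fully connected layer ignores.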


2.3 Deep Learning for Alzheimer’s detection

Given the extensive medical research into Alzheimer’s disease, it is no surprise that the disease has attracted attention in the field of machine learning. The studies closest to our own proposal are those focusing on image classification using MRI scans and deep learning.

Islam et al. [10] created a deep learning based multi-class classifier for Alzheimer’s disease using MRI scans provided by the OASIS-3 dataset, with a test accuracy of 73.5%. The implementation was constructed using TensorFlow with convolutional neural networks as the network architecture. The input to the network was a single T1-weighted MRI image and the output was a prediction of one of four classes: non-demented, very mild dementia, mild dementia and moderate dementia. As the OASIS-3 dataset is relatively small in machine learning terms, the implementation focused on maximizing the recognition of impactful features from the MRI scans [10]. Their network model was inspired by the Inception-V4 network [22] and yielded an improved result compared to what the report refers to as a traditional inception module [10].

Ladislav [20], in a related study, examines the viability of TensorFlow in the field of deep learning within computational biology. The study looks at usage scenarios for TensorFlow as a framework as well as its benefits and drawbacks. The conclusion of the study is that TensorFlow is a suitable framework for the field as it has fast compilation times, streamlined graphics, and is developed and maintained by Google, which the report describes as being “in the forefront of deep learning model development”. Keras is a high level wrapper used with TensorFlow to simplify some of the framework’s otherwise low level and hard to implement concepts, such as constructing a neural network. The higher level of abstraction that Keras offers makes TensorFlow more accessible to newer researchers in the field.


Several state-of-the-art literature studies have been conducted within the field of deep learning for AD classification. Falahati et al. [6] reviewed 50 articles in the field and concluded that machine learning and multivariate methods have great potential to aid clinical practice and should be validated and tried against traditional diagnosis performed by clinicians. A distinction was made that the accuracy of the methods might be limited by the data the models are based on and not the method itself. They also stressed the importance of further investigating different combinations of biomarkers, as well as combining clinical measures with models, to further improve the detection of MCI-to-AD conversion [6]. Jo et al. [11] conducted a similar study reviewing 16 papers within the field, all using either deep learning alone or a combination of deep learning and traditional machine learning. The study found that a hybrid combination resulted in higher accuracy than methods using only deep learning. The increase of studies applying deep learning to AD diagnosis can be attributed to its ease of use: in traditional machine learning it is important to find well-defined features, which becomes increasingly difficult as the complexity of the data increases. Deep learning’s ease of use comes from the user not having to explicitly decide which features are important for image classification; this is instead decided through the algorithm’s hidden layers [11].


2.4 Dataset

2.4.1 OASIS - Open Access Series of Imaging Studies

The Open Access Series of Imaging Studies, better known as the OASIS project, is an open-source project that aims to make neuroimaging datasets freely available to the scientific community. The OASIS project compiles and distributes a large multimodal dataset in the hope of facilitating future discoveries in basic and clinical neuroscience. The data provided by OASIS spans a broad demographic, containing thousands of subjects showing both normal aging and Alzheimer’s disease. The OASIS project provides three large datasets containing scan data from 2975 MRI sessions and 1608 PET sessions collected from 1664 patients. The three datasets are called OASIS-1, OASIS-2 and OASIS-3, the latter of which has been used in this report [1].

2.4.2 OASIS-3 dataset

The OASIS-3 dataset is a longitudinal neuroimaging, clinical and cognitive dataset for normal age-related deterioration and Alzheimer’s disease. The dataset has been retrospectively compiled from data collected from 1098 patients, gathered from several ongoing projects spanning 30 years through the Washington University in St. Louis (WUSTL) Knight ADRC. The patients represented in the OASIS-3 dataset consist of 609 cognitively normal subjects and 489 subjects showing various degrees of cognitive decline, spanning ages 42-95. The data is anonymous, and the dates on which the patients were scanned have been normalized to reflect the days from entry into the study. The data in the OASIS-3 dataset consists of both MRI and PET data.

The MRI data is compiled from 2168 MR sessions, which include T1w, T2w, FLAIR, ASL, SWI, time of flight, resting-state BOLD, and DTI, most of which also contain numeric volumetric segmentation files produced through FreeSurfer.


2.4.3 NIfTI and FreeSurfer data

NIfTI is a medical imaging file format and is the format used for storing MRI images in the OASIS-3 dataset. NIfTI files are 3-dimensional, consisting of MRI image slices along all three axes [13]. FreeSurfer is a structural MRI analysis software whose main use is to generate a 3D reconstruction of the brain [8]. A FreeSurfer-generated CSV file in the OASIS-3 dataset contains volumetric measurements of the brain [13].


3 Method

3.1 Retrieving the Data

The data provided by OASIS-3 [13] was partitioned into Yes and No instances based on the medical diagnosis linked to each patient from whom data was collected. The No-instances were the patients who had received a diagnosis of "Cognitively Normal"; all other patients were classified as Yes-instances.
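This partitioning amounts to a simple filter on the diagnosis field. The sketch below uses pandas with hypothetical column names and toy rows; the real OASIS-3 clinical files use their own schema:

```python
import pandas as pd

# Toy stand-in for the OASIS-3 clinical data; "subject_id" and "diagnosis"
# are hypothetical column names for illustration only.
sessions = pd.DataFrame({
    "subject_id": ["S1", "S2", "S3", "S4"],
    "diagnosis": ["Cognitively Normal", "AD Dementia",
                  "Cognitively Normal", "Vascular Dementia"],
})

# No-instances: sessions diagnosed "Cognitively Normal".
no_instances = sessions[sessions["diagnosis"] == "Cognitively Normal"]

# Yes-instances: everything else.
yes_instances = sessions[sessions["diagnosis"] != "Cognitively Normal"]

print(len(no_instances), len(yes_instances))  # 2 2
```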

3.1.1 NIfTI data

The dataset was extended by including scans from all the sessions a patient had taken part in, resulting in 1324 MRI image files classified as No-instances and 821 classified as Yes-instances. To achieve a balanced dataset, only 821 images were chosen from each partition, giving a total of 1642 images used in our study. The 1642 images were partitioned into three parts: 70% were placed in the training dataset, 15% in the validation dataset and 15% in the final evaluation dataset, which was used for testing the neural network after training and gathering the results. An image slice of a healthy brain can be seen in figure 3.1a and an image slice of a brain with AD in figure 3.1b.

The image files collected were 3-dimensional (256x256x52) NIfTI files consisting of image slices along all three axes. We used slices collected along the z-axis; for each subject a single slice located at the center of the brain was chosen and used in subsequent training and testing. The images were all scaled down to 100x100 pixels. The choice of selecting only one slice and scaling it down was made in order to reduce the complexity and time required to train the neural network.
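The center-slice selection and downscaling step can be sketched as follows. In the real pipeline the volume would be loaded from a NIfTI file (e.g. with nibabel: volume = nibabel.load("scan.nii.gz").get_fdata()); here a random stand-in volume of the same shape is used, and nearest-neighbour index sampling stands in for a proper image-resize routine:

```python
import numpy as np

# Stand-in for a loaded NIfTI volume of shape (256, 256, 52).
volume = np.random.rand(256, 256, 52)

# Pick the single slice at the center of the brain along the z-axis.
center_slice = volume[:, :, volume.shape[2] // 2]

# Downscale 256x256 -> 100x100 by nearest-neighbour index sampling.
idx = np.linspace(0, center_slice.shape[0] - 1, 100).astype(int)
small = center_slice[np.ix_(idx, idx)]

print(small.shape)  # (100, 100)
```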


(a) T1w MRI scan of a Cognitively normal brain

(b) T1w MRI scan of brain with AD

Figure 3.1: Cognitively normal vs AD brain MRI scan [23]

3.1.2 FreeSurfer data

The FreeSurfer data was collected and filtered from a CSV file provided by OASIS-3 [13] containing data from all MRI sessions in the dataset. From this data, 10 datapoints for each image were selected and added to the dataset. The datapoints describe the volume of different types of brain matter in different parts of the brain, for instance the total volume of gray matter and the total volume of cortical white matter; some sample data is shown in figure 3.2a.

Before feeding the data to the neural network, the datapoints were normalized to values in the range [0,1] in order to obtain a common scale between the various datapoints without distorting differences in the ranges of values. The training/validation/final-evaluation data split corresponds to the split used for the image dataset.
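The normalization described above is min-max scaling per datapoint: x' = (x - min) / (max - min). A minimal numpy sketch, with made-up FreeSurfer-style volumes for illustration:

```python
import numpy as np

# Illustrative FreeSurfer-style volumetric measurements (mm^3); two datapoints
# (columns) over three scans (rows). The values are made up.
volumes = np.array([[684000.0, 512000.0],
                    [702000.0, 498000.0],
                    [655000.0, 530000.0]])

# Min-max scaling per column maps each datapoint into [0, 1] without
# distorting the relative differences within that datapoint.
mins = volumes.min(axis=0)
maxs = volumes.max(axis=0)
normalized = (volumes - mins) / (maxs - mins)
```

Each column now spans exactly [0, 1], so a datapoint measured in hundreds of thousands of cubic millimetres no longer dominates one measured in thousands.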

(a) The 10 FreeSurfer datapoints used

Figure 3.2: Example of raw FreeSurfer data before normalization


3.2 The Classifiers

Three different deep learning neural networks were constructed. The neural networks were designed to be simple in structure and appropriate for their respective input type. The first neural network was based on image input (MRI scans), the second on numerical data presented in a CSV file, and the third was a combination of the prior two, taking both an image and numerical values as input. The intention of creating three different deep neural networks was not to test the effectiveness of each input datatype but to determine whether the third, hybrid neural network would produce a better result than the individual image based and numerical based neural networks.

The classifiers were made using Python, mainly with the deep learning focused libraries TensorFlow and Keras. They were produced in the Jupyter Notebook environment. Other libraries, such as pandas and sklearn, were used to put the training and validation data into proper deep learning input form.


3.3 Making the image based classifier

The image based classifier featured 7 layers in total. The neural network is kept simple, as the focus of the classifier is not high accuracy on its own, which is common in previous work [11], but to serve as a comparison base for the hybrid classifier.

The layers implemented were two convolutional 2D layers, two max pooling 2D layers, a flatten layer, and two dense layers; the order of implementation and final structure can be seen in figure 3.3a below.

(a) image classifier

Figure 3.3: The layer structure of the image based deep learning neural network
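The thesis does not list the exact layer hyperparameters, so the sketch below is one plausible Keras realisation of the seven layers named above; the filter counts, kernel sizes and unit counts are our own assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_image_classifier(input_shape=(100, 100, 1)):
    """Sketch of the 7-layer image classifier described in the text."""
    model = models.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),  # convolutional 2D layer 1
        layers.MaxPooling2D((2, 2)),                   # max pooling 2D layer 1
        layers.Conv2D(64, (3, 3), activation="relu"),  # convolutional 2D layer 2
        layers.MaxPooling2D((2, 2)),                   # max pooling 2D layer 2
        layers.Flatten(),                              # flatten layer
        layers.Dense(64, activation="relu"),           # dense layer 1
        layers.Dense(1, activation="sigmoid"),         # dense layer 2: AD / not AD
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_image_classifier()
```

The sigmoid output gives a single AD probability, matching the binary Yes/No labelling of the dataset.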


3.4 Making the numerical based classifier

The numerical based classifier consisted of three dense layers, as the neural network simply takes 10 numerical inputs (see figure 3.4a). The structure of the network can be seen in the figure below.

(a) numerical classifier

Figure 3.4: The layer structure of the numerical based deep learning neural network
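As with the image model, the unit counts in the sketch below are assumptions; it only mirrors the described three-dense-layer structure over the 10 FreeSurfer datapoints:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_numerical_classifier(n_features=10):
    """Sketch of the three-dense-layer numerical classifier."""
    model = models.Sequential([
        tf.keras.Input(shape=(n_features,)),       # 10 FreeSurfer measurements
        layers.Dense(32, activation="relu"),       # dense layer 1
        layers.Dense(16, activation="relu"),       # dense layer 2
        layers.Dense(1, activation="sigmoid"),     # dense layer 3: AD / not AD
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_numerical_classifier()
```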


3.5 Structure of hybrid classifier

To give a fair comparison between the different deep learning neural networks, the hybrid classifier needs to be equivalent to the previous structures combined. The aim is to measure whether better results can be achieved through the use of mixed data, not through an inherently more complicated neural network; as such, our hybrid structure is a combination of the image based classifier and the numerical based classifier. The hybrid classifier takes image input through the corresponding image classifier layers, while numerical data is taken through the numerical based classifier layers. The passed data is then concatenated before finally being run through two final dense layers for training and evaluation. The structure of the hybrid classifier can be seen below in figure 3.5a.

(a) hybrid classifier

Figure 3.5: The layer structure of the hybrid deep learning neural network
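The two-branch-plus-concatenation structure maps naturally onto the Keras functional API. The sketch below reuses the same assumed branch layers as the earlier sketches (the hyperparameters remain our assumptions) and joins the branches with a concatenation before two final dense layers:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_hybrid_classifier(image_shape=(100, 100, 1), n_features=10):
    """Sketch of the hybrid classifier: image branch + numerical branch."""
    # Image branch: the image classifier's layers, minus its output head.
    img_in = tf.keras.Input(shape=image_shape)
    x = layers.Conv2D(32, (3, 3), activation="relu")(img_in)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)

    # Numerical branch: dense layers over the 10 FreeSurfer measurements.
    num_in = tf.keras.Input(shape=(n_features,))
    y = layers.Dense(32, activation="relu")(num_in)
    y = layers.Dense(16, activation="relu")(y)

    # Concatenate both branches, then the two final dense layers.
    z = layers.concatenate([x, y])
    z = layers.Dense(64, activation="relu")(z)
    out = layers.Dense(1, activation="sigmoid")(z)

    model = Model(inputs=[img_in, num_in], outputs=out)
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_hybrid_classifier()
```

Training such a model takes a pair of inputs per sample, e.g. model.fit([images, measurements], labels, ...).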


4 Results

The following results were produced by the three deep learning neural networks presented in the method section of this report. The graphs were generated with TensorBoard, which is part of the TensorFlow framework. The number of epochs each model was trained and validated for differs, because the models reach peak accuracy after different numbers of epochs. The number of epochs was decided by trial and error. The graphs and the accuracy of the models are further examined in the discussion section of the report.

4.1 Image classifier

The results of the image model can be seen in figure 5.2. The model was trained for 50 epochs with a batch size of 32 and evaluated on the validation data each epoch. The loss for training and validation is plotted in figure 4.1a, and the accuracy of each epoch for validation and training in figure 4.1b. Epoch loss can be observed to unconventionally increase for the validation data, while seemingly having no major impact on epoch accuracy. After 50 epochs the final evaluation dataset of 231 images was used as input, resulting in an accuracy of 71.429%.

(a) Pink: Training dataset, Green: Validation dataset

(b) Pink: Training dataset, Green: Validation dataset

Figure 4.1: Graphs showing results from the image neural network


4.2 Numerical classifier

As shown in figure 5.1, the numerical model was run on the training and validation data for 200 epochs with a batch size of 32. In figure 4.2a the loss for both the training and validation data is plotted, and in figure 4.2b the accuracy. After running the trained numerical model on the final evaluation dataset, an accuracy of 73.593% was achieved.

(a) Blue: Training dataset, Red: Validation dataset

(b) Blue: Training dataset, Red: Validation dataset

Figure 4.2: Graphs showing results from the FreeSurfer neural network

4.3 Hybrid Classifier

The results of the hybrid model can be seen in figure 5.3. The model was trained for 30 epochs with a batch size of 32 and evaluated on the validation data each epoch.

The loss for training and validation is plotted in figure 4.3a, and the accuracy of each epoch for validation and training in figure 4.3b. After 30 epochs the final evaluation dataset of 231 images and numerical value sets was used as input, resulting in an accuracy of 74.891%.


(a) Red: Training dataset, Blue: Validation dataset

(b) Red: Training dataset, Blue: Validation dataset

Figure 4.3: Graphs showing results from the combined neural network

4.4 Final Evaluation Chart

Figure 4.4 shows the results from running the three different classifiers on the final evaluation dataset.

Figure 4.4: Table showing the result from running the 3 different classifiers on the Final Evaluation dataset


5 Discussion

5.1 General Discussion of Results

The results show that a slight accuracy increase can be achieved with the hybrid input neural network compared to both the image and numerical neural networks (see figure 4.4). This gives an encouraging indication that hybrid input can indeed increase the accuracy of AD classifiers, but we do not find the increase in accuracy substantial enough to undoubtedly prove the hybrid model superior. However, the increase in accuracy is not the only factor suggesting that a hybrid neural network could yield better results than a purely image or numerically based AD classifier. When we examine the graphs for the numerically based AD classifier, they seem reasonable: loss decreases every epoch for both training and validation data, and accuracy increases every epoch for both. This is what is to be expected of a learning neural network. We can also see that the curve starts to flatten at higher epochs, meaning we are closing in on the classifier’s peak performance, that being the model’s highest accuracy. Continuing training after peak performance would put the model at risk of overfitting, resulting in a less accurate model. As such, the numerical model seems to be behaving normally in the sense of common behaviour for deep learning neural networks. However, the same cannot be said for the behaviour of the image based AD classifier.

Examining the loss graph for the image neural network, we observe that while the training loss decreases every epoch, the validation loss is actually increasing. We have no clear answer to why this is the case; since deep learning neural networks act as a sort of black box, pinpointing what causes the increase in loss is difficult. What can clearly be observed is that while the validation loss keeps increasing, accuracy is not decreasing, so the model is indeed learning. However, around 25 epochs the accuracy curve flattens while the loss still increases each epoch. A possible reason is that the model is improving at classifying the images while simultaneously becoming less confident in its classifications. An example makes this clearer. A patient is classified either as having AD or not having AD: if the neural network is 50.1% certain that an image (patient) shows AD, it classifies the image as AD. In other words, if the model goes from assigning an image 90% AD probability in one epoch to 85% in the next, the patient is still classified as AD, but the classifier is further from being 100% certain. The model may thus be getting better at classifying scans close to the 50% certainty mark, which improves accuracy in early epochs, while at the same time the loss increases as the model grows less certain of its predictions.
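The effect described above, accuracy holding steady while loss grows, can be reproduced numerically with binary cross-entropy. In this illustrative example (the probabilities are made up), both predictions classify an AD-positive patient correctly, yet the less confident one incurs a higher loss:

```python
import math

def binary_cross_entropy(y_true, p):
    """Binary cross-entropy for one prediction p in (0, 1)."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# True label: patient has AD (1). Both probabilities exceed the 0.5
# threshold, so the classification (and hence accuracy) is identical...
print(binary_cross_entropy(1, 0.90))  # ≈ 0.105
print(binary_cross_entropy(1, 0.60))  # ≈ 0.511
# ...but the loss grows as the model becomes less certain.
```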

However the same does not seem to be true for the hybrid based AD classifier.

When observing the graphs for the hybrid neural network, they appear to combine traits of both the image and numerical models. The hybrid model matches the learning speed of the image model's early epochs, with a steep increase in accuracy over few epochs. It does not take after the strange behaviour of the image model's loss; instead it follows the expected pattern of the numerically based AD classifier. The accuracy flattens when nearing 25 epochs, the same as for the image model, whereas the numerical model's curve only starts to flatten close to 200 epochs. It therefore seems fair to say that the hybrid model combines the stability of the numerical model, the learning speed of the image model and a slight increase in accuracy compared to the two.

5.2 Scientific error

In this project patients were classified to have AD if they had any diagnosis linking


accuracy in the model.

Another source of error might lie within the OASIS-3 dataset itself, as we found inconsistencies in how the NIfTI files are structured. At least two cases were found in the dataset where a NIfTI file had slices swapped between the z-axis and x-axis. Since we assume that all slices we use lie along the z-axis, we also expect the network to learn to identify slices from that same axis. If the dataset contains slices from a different axis, this might confuse the classifier and decrease its accuracy.
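A sanity check of this kind can be automated by reading each NIfTI file's affine matrix, which encodes how voxel axes map to anatomical directions. The sketch below is a simplified reimplementation of what libraries such as nibabel already provide (`nibabel.aff2axcodes`); files whose derived orientation codes differ from the majority could then be flagged before slicing:

```python
import numpy as np

def axis_codes(affine):
    """Map each voxel axis of a NIfTI affine to its dominant
    anatomical direction: R/L, A/P or S/I."""
    labels = (("L", "R"), ("P", "A"), ("I", "S"))  # (negative, positive) world axis
    codes = []
    for col in range(3):
        direction = affine[:3, col]
        world_axis = int(np.argmax(np.abs(direction)))
        sign = 1 if direction[world_axis] > 0 else 0
        codes.append(labels[world_axis][sign])
    return tuple(codes)

# Identity affine: voxel axes align with Right, Anterior, Superior.
print(axis_codes(np.eye(4)))  # ('R', 'A', 'S')
```

A volume with its x- and z-axes swapped, as found in the dataset, would yield `('S', 'A', 'R')` instead and could be excluded or reoriented.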

5.3 Future Work

In order to further increase the classification accuracy using this relatively small dataset, adaptations to both the network itself and the data could be implemented.

The following are a few suggestions of what could be done to potentially achieve this.

1. A more complex classifier using other deep learning techniques could yield better results. For example, Islam et al. [10] reached an accuracy of 73.5% using a multi-class image classifier built with the same framework as this report (TensorFlow) on the OASIS-3 dataset.

2. More pre-processing steps could be implemented to increase the quality of the input data, which in theory should increase the accuracy of the classifier. Pre-processing the images by removing the skulls (skull stripping) and converting the pictures into gray-scale images are techniques that could potentially yield a further increase in accuracy.

3. Instead of using only one slice per MRI scan, using several slices might further increase the accuracy.

Also, the slices used in this report were located at the center of the brain along the z-axis, but there is no guarantee that slices along this axis are the most suited for detecting AD. Using slices along the x- or y-axis could potentially increase the accuracy.
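Selecting slices along alternative axes is simple once a scan is loaded as a 3-D array (e.g. via nibabel's `get_fdata`). The sketch below uses a synthetic random volume in place of a real MRI scan (the volume dimensions are illustrative); it extracts the center slice along each axis and rescales it to [0, 1] grayscale:

```python
import numpy as np

def center_slice(volume, axis):
    """Take the middle slice of a 3-D volume along the given axis
    and rescale its intensities to the [0, 1] range."""
    index = volume.shape[axis] // 2
    slc = np.take(volume, index, axis=axis).astype(float)
    lo, hi = slc.min(), slc.max()
    return (slc - lo) / (hi - lo) if hi > lo else np.zeros_like(slc)

# Synthetic stand-in for an MRI volume (e.g. 176 x 256 x 256 voxels).
volume = np.random.default_rng(0).integers(0, 4096, size=(176, 256, 256))

for axis in range(3):  # x-, y- and z-axis center slices
    slice_2d = center_slice(volume, axis)
    print(axis, slice_2d.shape, slice_2d.min(), slice_2d.max())
```

Feeding the network several such slices per scan, or slices from different axes, would only require changing the `index`/`axis` arguments in a loop like the one above.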


6 Conclusion

Is a hybrid deep learning model using both MRI imagery and numerical brain measurements more accurate at detecting AD than a corresponding purely image based model? While the results are not fully conclusive, there are promising signs that a hybrid model could be a general improvement in the field of deep learning AD detection. The accuracy in itself did not increase enough to be considered a significant improvement, but the model did show other desirable traits in the form of stabilization. The models used were very basic, with the sole intent of comparing the accuracy of a hybrid model against its image and numerical counterparts. Higher accuracy could be achieved in future work by combining the hybrid approach with more complex models.


References

[1] About the OASIS Brains project. URL: https://www.oasis-brains.org/ (visited on 03/17/2020).

[2] Bobinski, Matthew et al. “The histological validation of post mortem magnetic resonance imaging-determined hippocampal volume in Alzheimer’s disease”. In: Neuroscience 95.3 (1999), pp. 721–725.

[3] Burns, Alistair and Iliffe, Steve. “Alzheimer’s disease.” In: BMJ 338 (2009), b158.

[4] De Leon, M. J. et al. “MRI and CSF studies in the early diagnosis of Alzheimer’s disease”. In: Journal of Internal Medicine 256.3 (2004), pp. 205–223. DOI: 10.1111/j.1365-2796.2004.01381.x. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2796.2004.01381.x.

[5] Bell, Daniel J and Jones, Jeremy. T1 weighted image. URL: https://radiopaedia.org/articles/t1-weighted-image (visited on 01/25/2020).

[6] Falahati, Farshad, Westman, Eric, and Simmons, Andrew. “Multivariate data analysis and machine learning in Alzheimer’s disease with a focus on structural magnetic resonance imaging”. In: Journal of Alzheimer’s disease 41.3 (2014), pp. 685–708.

[7] Farooq, Ammarah et al. “A deep CNN based multi-class classification of Alzheimer’s disease using MRI”. In: 2017 IEEE International Conference on Imaging systems and techniques (IST). IEEE. 2017, pp. 1–6.

[8] FreeSurfer. WorkFlows. URL: http://surfer.nmr.mgh.harvard.edu/fswiki/WorkFlows (visited on 06/04/2020).

[9] Frisoni, Giovanni B et al. “The clinical use of structural MRI in Alzheimer disease”. In: Nature Reviews Neurology 6.2 (2010), pp. 67–77.

[10] Islam, Jyoti and Zhang, Yanqing. “A novel deep learning based multi-class classification method for Alzheimer’s disease detection using brain MRI data”. In: International Conference on Brain Informatics. Springer. 2017, pp. 213–222.

[11] Jo, Taeho, Nho, Kwangsik, and Saykin, Andrew J. “Deep learning in Alzheimer’s disease: diagnostic classification and prognostic prediction using neuroimaging data”. In: Frontiers in aging neuroscience 11 (2019), p. 220.

[12] Shiel, William C. Jr. Medical Definition of Magnetic resonance imaging. Dec. 2018. URL: https://www.medicinenet.com/disease_prevention_in_women_pictures_slideshow/article.htm (visited on 03/25/2020).

[13] LaMontagne, Pamela J et al. “OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer Disease”. In: medRxiv (2019). DOI: 10.1101/2019.12.13.19014902. URL: https://www.medrxiv.org/content/early/2019/12/15/2019.12.13.19014902.

[14] LeCun, Yann, Bengio, Yoshua, and Hinton, Geoffrey. “Deep learning”. In: Nature 521.7553 (2015), pp. 436–444.

[15] Mayo Clinic. Diagnosing Alzheimer’s: How Alzheimer’s is diagnosed. URL: https://www.mayoclinic.org/diseases-conditions/alzheimers-disease/in-depth/alzheimers/art-20048075 (visited on 03/23/2020).

[16] NCRAD. Understanding Autopsy Reports. URL: https://ncrad.iu.edu/understanding_autopsy_reports.html (visited on 03/20/2020).

[17] Nielsen, Michael A. Neural networks and deep learning. Vol. 2018. Determination Press: San Francisco, CA, USA, 2015. Chap. 2.


[20] Rampasek, Ladislav and Goldenberg, Anna. “Tensorflow: Biology’s gateway to deep learning?” In: Cell Systems 2.1 (2016), pp. 12–14.

[21] Ridge, Perry G et al. “Alzheimer’s disease: analyzing the missing heritability”. In: PloS one 8.11 (2013).

[22] Szegedy, Christian et al. “Inception-v4, inception-resnet and the impact of residual connections on learning”. In: Thirty-first AAAI conference on artificial intelligence. 2017.

[23] T1-weighted MRI scan. Oasis. URL: https://central.xnat.org/app/template/index.html.

[24] Weller, Jason and Budson, Andrew. “Current understanding of Alzheimer’s disease diagnosis and treatment”. In: F1000Research 7 (2018).

[25] WHO - World Health Organisation. Dementia. Sept. 2019. URL: https://www.who.int/en/news-room/fact-sheets/detail/dementia (visited on 03/17/2020).


TRITA-EECS-EX-2020:340
