
Predicting Myocardial Infarction using Textual Prehospital Data and Machine Learning.

Yvette Jane van der Haas

Data Science, master's level 2021

Luleå University of Technology


Yvette Jane van der Haas 961120-T501

Luleå Tekniska Universitet
Leiden University Medical Center

Supervisors:
E. Stoop (Data Scientist, LUMC)
E. de Koning (Physician researcher, LUMC)
S. Saguna (Assistant Professor, LTU)

Predicting Myocardial Infarction using Textual Prehospital Data and Machine Learning.


Abstract

A major healthcare problem is the overcrowding of hospitals and emergency departments, which leads to negative patient outcomes and increased costs. In a previous study performed by Leiden University Medical Centre, a new and innovative prehospital triage method was developed in which nurse paramedics could consult a cardiologist for patients with cardiac symptoms via a live connection on a digital triage platform. This triage method resulted in a recall of 0.995 and a specificity of 0.0113.

This study addresses the following research question: ‘Would there be enough (good) information gathered on the prehospital scene to make a machine learning model able to predict myocardial infarction?’.

Different pre-processing steps, several features (both premade and self-made), multiple models (Support Vector Machine, K Nearest Neighbour, Logistic Regression and Random Forest), various outcome settings and hyperparameters were tested, leading to the final results: recall = 0.995 and specificity = 0.1101. These were obtained with the features selected by a cardiologist and the Support Vector Machine model.

The outcomes are checked with an extra explainability layer named Explain Like I’m Five. This check illustrates that the created machine learning model is trained mostly on medical Dutch words that are relevant for this prediction.


Content

1 INTRODUCTION
1.1 Background
1.2 Motivation
1.3 Problem Definition
1.4 Data
1.5 Ethics
1.6 Research Methodology
1.7 Limitations
1.8 Thesis Structure
2 RELATED WORK
2.1 Machine Learning
2.2 Natural Language Processing
2.3 Healthcare
2.4 Emergency Department
2.5 Cardiac
2.6 Myocardial Infarction
3 THEORY
3.1 Machine Learning Models
3.2 Metrics for Machine Learning Models
4 METHOD
4.1 Preprocessing
4.2 Text to Number
4.3 Training Machine Learning Model
4.4 Interpretability of Machine Learning Model
5 RESULTS
5.1 Model Selection
5.2 Feature Selection
5.3 Iterative Evaluation
5.4 Hyperparameter Tuning
5.5 Explanation of Final Machine Learning Model
6 DISCUSSION
7 CONCLUSION AND FUTURE WORK
8 REFERENCES
9 APPENDIX


1 INTRODUCTION

This introduction gives a brief overview of the problem and its context.

1.1 Background

A major healthcare problem is the overcrowding of emergency departments (ED) which leads to negative patient outcomes and increased costs. In a previous study, the HART-c study [1], performed by Leiden University Medical Centre (LUMC), a new and innovative prehospital triage method was developed. Nurse paramedics (NP) could consult a cardiologist for patients with cardiac symptoms, such as chest pain or palpitations. A live connection between the ambulance and the cardiologist was established, giving insight into real-time vital parameters, such as the electrocardiogram (ECG). During this live connection, the cardiologist and the NP decide together whether hospital admission is necessary (see Figure 1).

Figure 1, a scenario diagram of the current hospital triage

1.1.1 Text Analysis


1.1.2 Myocardial Infarction

The research done by de Koning et al. [1] focuses on cardiac complaints and not all possible complaints which might warrant an ED visit. Since a major part of these cardiac complaints is chest pain (in this research around 50%), which might be caused by myocardial infarction, this will be the target for this research. By focusing specifically on chest pain the analysis can be made more precise.

Definition myocardial: Medical name for the muscle of the heart

1.2 Motivation

Solving this research problem is important because it can reduce overcrowding and healthcare costs. It also gives medical staff time to spend on other patients, improving the quality of care.

In the current situation, there are 6747 ambulance rides for chest pain, of which 5936 concern patients who ultimately did not suffer a myocardial infarction. This is more than 85% of all rides. One ambulance ride in The Netherlands costs around 600-700 euros [10]. If the predictions were optimal, the cost and time spent on those 5936 ambulance rides could be used for other patients or research.

1.3 Problem Definition

The aim of this study is to investigate whether a machine learning model can predict if a patient needs to be brought to the hospital because the patient is suffering from myocardial infarction, by using prehospital textual data.

Definition recall: Measures how often a test correctly generates a positive result for people who have the condition that’s being tested for (also known as the “true positive” rate). A high-recall test will flag almost everyone who has the disease and not generate many false-negative results. [2]

Definition specificity: Measures a test’s ability to correctly generate a negative result for people who don’t have the condition that’s being tested for (also known as the “true negative” rate). A high-specificity test will correctly rule out almost everyone who doesn’t have the disease and won’t generate many false-positive results. [2]

1.4 Data

The data used consists of 270 columns (see Appendix I for all column names) and 7458 rows, gathered from September 2018 till September 2020. Of importance: from March 2020 onward the (medical) world was heavily affected by COVID-19. Among all the data columns, there are 14 columns with free text which do not contain patient information such as name or address. There are another 11 columns that contain values and have been classified as important for myocardial infarction prediction by a physician-researcher [1] (see Appendix III). The last column in this dataset is the target feature/ground truth, which shows whether the patient had a myocardial infarction (target=1) or did not have one (target=0). Appendix II shows all the columns of one patient from the dataset as an example.

1.5 Ethics


1.6 Research Methodology

The data used for this research was gathered before the start of this study by the NPs in the ambulance. ML is used to analyse the data. To find the best possible model, different ML models and techniques are tested, and the explanation of the ML model is also examined. The aim is to find a pattern between prehospital data and myocardial infarction, which makes this study experimental quantitative research [12].

1.7 Limitations

The research is limited to the preparation and processing of the dataset and the writing of code. The currently available data is all the data there is; there is no possibility of obtaining more. For the text classification, only one prediction is made: whether the patient needs to be brought to the ED because of myocardial infarction. Other illnesses which might cause an ED visit are not examined. Furthermore, it should be taken into account that no budget is available during this research.

1.8 Thesis Structure


2 RELATED WORK

The research problem can be translated into the following research question: ‘How to combine ML and Natural Language Processing (NLP) to predict if the patient needs an ED visit because of myocardial infarction, using textual prehospital data?’. The existing solutions from previous scientific work in this area are described in this chapter.

2.1 Machine Learning

ML can be defined as ‘a computer program that learns from experience E, with respect to some class of tasks T and performance measure P, if its performance on the tasks in T, as measured by P, improves with experience E’ [16]. ML comprises computer algorithms that learn from data without explicit guidance. Thus, computers do not need to be manually programmed, but can independently change and improve their algorithms.

ML has different methods of approaching a problem [17]. Supervised learning, unsupervised learning and semi-supervised learning are the most common methods. Although all these methods have the same goal - to arrive at insights, patterns and relationships that can lead to decisions - they use different approaches.

2.1.1 Supervised

In supervised learning, the algorithm is given input examples together with the desired output (the labels). During training the model makes fewer and fewer errors and can eventually produce the correct output for new input [16]. Supervised learning can be used, for example, when classifying dog and cat photos. In this case, the input to the model is a set of photos, each with the label 'dog' or 'cat'. Based on this input, the model learns and can eventually predict the label of new images.

2.1.2 Unsupervised

When using unsupervised learning, no labels are provided with the input of the algorithm. This is therefore unsupervised learning, in which no guidance is provided by entering examples with the desired output [17]. In other words, the algorithm has to figure out what it is shown without labels (the correct output). During this process, the computer itself divides the input into categories, grouping elements whose data are very similar. Unsupervised ML could be applied by a streaming service: the service clusters all series and movies it offers, and a customer gets recommendations based on what they have watched before.

2.1.3 Semi-supervised


2.2 Natural Language Processing

There are several other branches within ML. One of these branches is Natural Language Processing, abbreviated to NLP. The ultimate goal of NLP is to read, decipher and understand human languages. Most NLP techniques are based on ML techniques [23]. Currently, NLP is used in many places, for example:

 Spellcheck
 Spam filters
 Search engines
 Siri, Alexa and Google Assistant

NLP can be divided into processing and understanding language. During processing, words and sentences are converted into numbers, the language that the computer uses. This series of numbers is often put into a vector [24].

2.3 Healthcare

ML can be practised in different areas, one of which is healthcare. The research papers by Johnson et al. [3] and Chen et al. [4] show examples of how ML is used in this specific area. Johnson et al. [3] focus on the pre-processing of medical data and the most common pitfalls. The paper written by Chen et al. [4] gives a good analysis of the general approach to ML in healthcare and concentrates on how to work with textual medical data.

2.4 Emergency Department

Several studies apply ML in the emergency department. Taylor et al. [5] use a data-driven ML approach to predict in-hospital mortality for ED patients with sepsis. Another study, done by Hong et al. [6], tests different ML models for predicting hospital admissions at ED triage. The last research paper, from Goto et al. [7], focuses on the possibility of improving the prediction of clinical outcomes for children in the ED by testing different models.

2.5 Cardiac

The research focuses on myocardial infarction, as the research question makes clear; other diseases are outside the research scope. The paper “Artificial Intelligence in Cardiology: Applications, Benefits and Challenges” [8] shows how AI works and what its applications in cardiology are, and goes deeper into the subjects of Deep Learning (DL) in cardiology and unsupervised learning in cardiology.

2.6 Myocardial Infarction

A myocardial infarction has specific symptoms. Research done by Six et al. [9] shows that a HEART score can be calculated which helps a cardiologist safely rule out myocardial infarction. The HEART score represents: History, ECG, Age, Risk factors and Troponin. These five factors might give the most information for a prediction with AI. Each factor is scored from 0 to 2. The lower the HEART score, the lower the chance of myocardial infarction. See Appendix IV for a scheme.

 History: History is scored higher if the symptoms are typical for myocardial infarction: chest pain, its onset and duration, its relationship to exercise, stress or cold, its localisation, its concomitant symptoms and its response to sublingual nitrates. The more of those symptoms, the higher the chance of myocardial infarction.


 Age: When the patient's age is between 45 and 64, there is a slightly higher risk of myocardial infarction. At an age higher than 65, the risk is even greater. Therefore, the higher the age, the higher the score.

 Risk factors: The number of risk factors the patient has that increase the risk of myocardial infarction. These risk factors are: currently treated diabetes mellitus, current or recent (< one month) smoker, diagnosed hypertension, diagnosed hypercholesterolaemia, family history of coronary artery disease and obesity.


3 THEORY

To give more clarity, the research is divided into steps; a graphical overview of those steps can be seen in Figure 2. Each step can be handled with multiple techniques, all of which have their pros and cons. These techniques are explained throughout this chapter.

Figure 2, block diagram of the steps

3.1 Machine Learning Models

In this study, supervised text classification is performed [13]. On the basis of the text, the different documents are classified into the groups ‘myocardial infarction’ and ‘no-myocardial infarction’. In ML there are many different models and not all of them are suitable for supervised text classification [14, 15]. The most popular models which fit this criterion are Support Vector Machine (SVM), Random Forest (RF), K Nearest Neighbour (KNN) and Logistic Regression (LR).

3.2 Metrics for Machine Learning Models


Figure 3, confusion matrix [19]

3.2.1 Options

The metric options are described in the next paragraphs.

3.2.1.1 Accuracy

The accuracy indicates the proportion of predictions that is correct and gives a general picture of the prediction. The score will be between 0 and 1. The formula for the accuracy is:

accuracy = (true positives + true negatives) / (true positives + true negatives + false positives + false negatives)

3.2.1.2 Precision

The precision gives a more detailed picture than the accuracy: it is the fraction of positive predictions that is correct. The score will be between 0 and 1. It is calculated by:

precision = true positives / (true positives + false positives)

3.2.1.3 Recall

The recall also gives a more detailed picture than the accuracy: it is the fraction of actual positives that is correctly predicted. The score will be between 0 and 1. It is calculated by:

recall = true positives / (true positives + false negatives)

3.2.1.4 Specificity

Specificity gives a good picture of the rate of false positives [2]. The score will be between 0 and 1. It is calculated by:

specificity = true negatives / (true negatives + false positives)

3.2.1.5 F-score

The F-score, also called F1 Score, is a combination of precision and recall. When no choice can be made between the precision and the recall, the F-score can be useful. This way, both scores do not have to be compared, the F-score is used instead. The score will be between 0 and 1. The F-score is the harmonic mean of precision and recall and is calculated by:

F-score = (2 · precision · recall) / (precision + recall)

3.2.1.6 Fbeta Score

The Fbeta score is a generalisation of the F-score in which the parameter beta determines how much more weight recall gets relative to precision; with beta = 2, recall counts more heavily than precision [20]. The score will be between 0 and 1. It is calculated by:

Fbeta score = ((1 + beta²) · precision · recall) / (beta² · precision + recall)

3.2.2 Original Study Results

To give an example of how the calculations are done, the decisions made in the original study by the cardiologist and the NPs on the research dataset are used. See Table 1 and the calculations below.

Table 1, confusion matrix

                                        Predicted 1.0    Predicted 0.0
Actual 1.0 (myocardial infarction)             811                4
Actual 0.0 (no myocardial infarction)         5936               68

With this data, the accuracy, precision, recall, specificity, F-score and Fbeta score can be calculated. For all of these metrics, the higher the score, the better the outcome.

Accuracy = (811 + 68) / (811 + 68 + 4 + 5936) = 0.1289
Precision = 811 / (811 + 5936) = 0.1202
Recall = 811 / (811 + 4) = 0.9950
Specificity = 68 / (68 + 5936) = 0.0113
F-score = (2 · 0.1202 · 0.9950) / (0.1202 + 0.9950) = 0.2144
Fbeta score = ((1 + 2²) · 0.1202 · 0.9950) / (2² · 0.1202 + 0.9950) = 0.4052

3.2.3 Healthcare

For this healthcare question, it is important to keep the number of false negatives at most as high as in the original study, and preferably lower. The false-negative count matters because it measures how many patients have a myocardial infarction but are kept at home because the prediction is ‘no-myocardial infarction’. This can be measured by calculating the recall. For the recall score there is a minimum of 95% set by the hospital, but preferably the recall should be equal to or better than the original results achieved by the NPs.

Another important aspect of this research is to reduce the number of false positives to prevent an overcrowded ED. This can be measured by calculating specificity or precision. These two measurements need to be as high as possible and show how many people are taken to the hospital when they could have stayed at home.
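To make these metric definitions concrete, the following is a minimal sketch in plain Python that recomputes the scores from the counts in Table 1 (the variable names are chosen for illustration only):

# Counts from Table 1 (original triage by the cardiologist and the NPs)
tp, fn = 811, 4      # patients with a myocardial infarction
fp, tn = 5936, 68    # patients without a myocardial infarction

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
specificity = tn / (tn + fp)
f_score     = 2 * precision * recall / (precision + recall)

beta  = 2  # recall is weighted more heavily than precision
fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

print(round(recall, 4), round(specificity, 4), round(fbeta, 4))
# approximately 0.9951, 0.0113 and 0.4052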


4 METHOD

Different techniques are used during the research. They are explained in the following sections of this chapter.

4.1 Pre-processing

In the process of preparing the data, both regex (regular expressions) and stemming are used. Regex can easily be used to remove or transform words in the dataset. Stemming also changes the words in the dataset, but unlike regex this does not have to be specified manually.

4.1.1 Regex

With regex, it is possible to remove punctuation with just one line of code. The same applies to stop word removal and the expansion of abbreviations.

4.1.2 Stemming

There are multiple ways to process language; these techniques can be divided into syntactic and semantic analysis. Within syntactic techniques, the words are often changed to improve how they are passed to the ML model, for instance changing the word hands into the word hand. Semantic analysis looks into the meaning of words; think of the word left, which can mean the opposite of right or, for example, that someone left the room. [23]
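As a small illustration of these two pre-processing steps, the following sketch combines a regex that strips punctuation with the Dutch Snowball stemmer (the example sentence is made up; the exact stems depend on the stemmer):

import re
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("dutch")

def preprocess(text):
    """Lowercase, strip punctuation with a regex, then stem every word."""
    text = re.sub(r"[^\w\s]", " ", text.lower())   # regex: replace punctuation by spaces
    return " ".join(stemmer.stem(word) for word in text.split())

print(preprocess("Patient heeft pijn op de borst, geen koorts."))
# the exact stems that come out depend on the Dutch Snowball rules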


4.2 Text to Number

To train an ML model with text, the text must first be translated into something a computer can work with: the text is transformed into a vector. The technique used here is called Tf-Idf, short for term frequency (Tf) and inverse document frequency (Idf).

Tf-Idf is a technique that first looks at the proportion of a certain word in a document; this is the Tf score. The more often a word appears in a document, the greater the relevance of this word. Next, it looks at how often the word appears in the other documents; this is the Idf score. The logarithm in the Idf formula dampens this inverse document ratio: if a word appears often in one document but also in many other documents, it is probably a common word. The two scores are multiplied and result in the weight of a word in a given document. The higher the Tf-Idf score, the more distinctive the word is for that document; the lower the score, the more common the word [11].

Below are the formulas of Tf, Idf and Tf-Idf:

Tf = (number of times word x appears in the document) / (total number of words in that document)
Idf = log(number of documents / number of documents containing word x)
W(x) = Tf · Idf

Example: there is a document with 100 words and the word ‘pain’ appears in it 12 times. The Tf score for ‘pain’ is Tf = 12 / 100 = 0.12. There are 10,000 documents and the word ‘pain’ appears in 300 of them, so Idf = log(10,000 / 300) = 1.52. The final Tf-Idf score is W(pain) = 0.12 · 1.52 = 0.182.
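The same worked example can be verified with a few lines of Python; a base-10 logarithm is assumed here to match the numbers above (note that scikit-learn's TfidfVectorizer uses a slightly different, smoothed Idf variant):

import math

tf  = 12 / 100                    # 'pain' appears 12 times in a 100-word document
idf = math.log10(10_000 / 300)    # 'pain' appears in 300 of the 10,000 documents
print(round(tf, 2), round(idf, 2), round(tf * idf, 2))   # 0.12, 1.52 and roughly 0.18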


One of the options of the Tf-Idf vectorizer is the ngram_range parameter. This is normally set to (1,1), which means only single words are used as features. When it is changed to (2,2), the vectorizer only uses pairs of consecutive words, which are treated as a single feature (think of ‘myocardial infarction’ instead of ‘myocardial’ and ‘infarction’). When it is set to (1,3), the vectorizer can combine one, two or three words.
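A minimal sketch of this parameter on two made-up toy documents (depending on the scikit-learn version, the feature names are retrieved with get_feature_names_out or the older get_feature_names):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["patient heeft pijn op de borst", "patient heeft geen pijn"]  # toy documents

# ngram_range=(1, 3): single words, word pairs and word triples are all candidate features
vec = TfidfVectorizer(ngram_range=(1, 3))
X = vec.fit_transform(docs)

print(X.shape)                          # (2 documents, number of 1- to 3-grams)
print(vec.get_feature_names_out()[:5])  # e.g. 'borst', 'de', 'de borst', 'geen', 'geen pijn'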

4.3 Training Machine Learning Model

For this NLP research, it was decided to compare multiple models that fit the problem best. In the following sub-sections, the ML model used in the final solution is described: how it works and the mathematics behind it. Additionally, the techniques cross-validation and grid search are explained.

4.3.1 Support Vector Machine

Support Vector Machine (SVM) is the ML model used in the final solution of this research. SVM classifies samples into two classes, with a hyperplane separating these two classes.

Figure 4, a hyperplane in 3d [25]

4.3.1.1 How Does SVM Choose a Class?

The data points can be interpreted as vectors. The length of a vector tells in how many dimensions the hyperplane is defined.

Example: In NLP, vectors of words are used. Figure 4 shows a hyperplane created for a problem with three dimensions, so the length of the vectors is three. These dimensions could be interpreted as:

- Width is the word 'patient'
- Depth is the word 'has'
- Height is the word 'POB' (pain on chest)
- Red dots are myocardial infarction
- Blue dots are no-myocardial infarction

The hyperplane tries to show as accurately as possible which word makes the vector a ‘myocardial infarction’, or ‘no-myocardial infarction’.

The optimal hyperplane is as far away from the different clusters as possible. The closest data points ultimately determine the position of the hyperplane; these data points are the support vectors. When these data points are removed, a new position for the hyperplane is determined. When training a model, only the support vectors are searched for; the other training data is ignored. [25]


Figure 5, hyperplane and formulas [26]

4.3.1.2 Theoretical Details of SVM

Both classes into which an SVM classifies have a y-value, which is either +1 or −1; these two values each represent a class. There are also two other hyperplanes, which run through the support vectors. In Figure 5 they are indicated by the blue and green text. The formulas are as follows: [26]

w · x + b ≥ +1

w · x + b ≤ −1

The two classes are assigned the values +1 and −1, as can be seen in the formulas above. If w · x + b ≥ +1, a point is classified as blue; if w · x + b ≤ −1, it is classified as green. A linear function can be written as y = ax + b, or equivalently 0 = ax + b − y. In SVM, the following version of the latter formula is used:

w · x + b = 0

where w = (a, −1) is the normal vector and x = (x, y) is a vector with the values of a data point. The formula above is the formula for the optimal hyperplane. It is now known that w · x + b ≥ +1 for the points with y = +1 and w · x + b ≤ −1 for the points with y = −1.

Multiplying each constraint by its class value y, the two can be combined into y(w · x + b) − 1 ≥ 0, which holds for both y = +1 and y = −1. To determine the optimal hyperplane mathematically, the margin must first be determined. The distance between the hyperplane through the blue support vectors and the optimal hyperplane is (1 − b) / |w|, and the distance between the hyperplane through the green support vectors and the optimal hyperplane is (−1 − b) / |w|. The margin between the hyperplanes of the green and blue support vectors is therefore:

M = (1 − b) / |w| − (−1 − b) / |w| = 2 / |w|

M is twice the margin, namely the sum of the margin from the optimal hyperplane to the support vectors of one class and the margin from the optimal hyperplane to the support vectors of the other class. The margin can thus be written as:

margin = 1 / |w|

To determine the optimal hyperplane, this margin is maximised, which can also be written as:

max 1 / |w| = min |w|

For mathematical convenience this is usually replaced by minimising (1/2)|w|², which gives the following optimisation problem:

minimise (1/2)|w|²  subject to  y(w · x + b) − 1 ≥ 0 for every data point


4.3.1.3 Reliability

To calculate the reliability of the model, the distance between the data point and the optimal hyperplane is used. The further away the data point is from the hyperplane, the higher the expectation that the prediction is correct.
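In scikit-learn this distance is exposed through decision_function; a rough sketch on toy data standing in for the Tf-Idf vectors:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data standing in for the Tf-Idf vectors (illustration only)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel='linear', class_weight='balanced').fit(X_train, y_train)

# Signed distance to the separating hyperplane: the larger the absolute value,
# the further the point lies from the boundary and the more reliable the prediction.
scores = clf.decision_function(X_test)
print(scores[:5])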

4.3.1.4 Multi-class

In previous figures, a separation was always made between two groups with the hyperplane. To be able to use SVM also in a classification problem, with multiple classes, the technique one-vs-all is used. This means that the above-described method is used to determine the optimal hyperplane. In the case of multiple classes, multiple optimal hyperplanes are created. Namely, one for each class that is in the dataset. Instead of class1 against class2 against class3, the strategy is to perform class1 against class2+class3, class2 against class1+class3 and class3 against class1+class2.
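This study itself has only two classes, but as an illustration of the one-vs-all idea, scikit-learn's OneVsRestClassifier fits one SVM, and thus one hyperplane, per class (the iris dataset below is purely a stand-in):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # three classes, purely illustrative

# One hyperplane per class: each SVM separates its class from all other classes combined
clf = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)
print(len(clf.estimators_))          # 3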

4.3.2 Cross-validation

Figure 6, cross-validation

In cross-validation the dataset is split into a number of folds; each fold is used once as test data while the remaining folds are used for training (see Figure 6). The number of folds may vary; usually it is between 5 and 10. In that case, each training session uses between ten and twenty per cent of the data as test data.
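A minimal sketch of 10-fold cross-validation with scikit-learn, again on toy data standing in for the vectorised text:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # toy stand-in data

# 10 folds: every sample is used exactly once as test data (10% per fold)
scores = cross_val_score(SVC(kernel='linear'), X, y, cv=10, scoring='recall')
print(scores.mean())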

4.3.3 Hyperparameter Tuning with GridSearchCV

The SVM model has several hyperparameters. These hyperparameters can be varied until the best result is achieved. Instead of requiring the researcher to enter new values for the parameters and retrain every time, GridSearchCV [27] can do this in one go. For this technique, a set of candidate values is defined for every hyperparameter. Below is an example of what this looks like:

parameters = {'C':[1,10,100,1000],'gamma':[1,0.1,0.001,0.0001]}
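Combined with cross-validation, such a search could look roughly like the sketch below (toy data again; the study itself tuned C with recall as the scoring metric, see Chapter 5.4):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # toy stand-in data

parameters = {'C': [1, 10, 100, 1000], 'gamma': [1, 0.1, 0.001, 0.0001]}

# Every combination of C and gamma is trained and scored with 10-fold cross-validation
search = GridSearchCV(SVC(), parameters, scoring='recall', cv=10)
search.fit(X, y)
print(search.best_params_)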


4.4 Interpretability of Machine Learning Model

After training, an ML model produces a result, but how the model arrives at that result is often seen as a black box. In healthcare it is very important to know why a model chooses a class; therefore, to know whether the model makes reasonable decisions, an explainability technique is used.

4.4.1 ELI5

A technique that can be used here is called ELI5, which stands for Explain Like I’m Five. This technique can show, on a global and a local level, what the most important words are for a certain class. The higher the score for a word, the more influence it has on the output. [28]
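A rough sketch of how ELI5 can be called for a scikit-learn text model is given below. It assumes that the eli5 package accepts the linear-kernel SVC used in this study (LinearSVC behaves similarly otherwise); the two documents and their labels are made up:

import eli5
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

docs   = ["patient heeft pijn op de borst en zweet", "patient heeft hoofdpijn"]  # toy notes
labels = [1, 0]                                                                  # 1 = myocardial infarction

vec = TfidfVectorizer()
clf = SVC(kernel='linear').fit(vec.fit_transform(docs), labels)

# Global explanation: the most influential words per class
print(eli5.format_as_text(eli5.explain_weights(clf, vec=vec, top=10)))

# Local explanation: the word contributions for one specific document
print(eli5.format_as_text(eli5.explain_prediction(clf, docs[0], vec=vec)))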

4.4.1.1 Local Explanation of the Model

The local ELI5 shows per document (I) which class it has chosen, (II) the ELI5 score contribution for and against the chosen class, and (III) per word, whether the word affects the class positively or negatively. See Figure 7 for a representation of local ELI5. In this example, red means ‘no-myocardial infarction’ and green stands for the class ‘myocardial infarction’. The more intense the colour in part III, the more negative or positive the effect of the word.

The figure above represents an example. It shows that (I) the model has chosen Y=1.0, which means it predicted a ‘myocardial infarction’ with a probability of 0.505. The second thing to be seen is (II) that the total contribution of the words towards this class is 1.955, while the contribution against this class, labelled <BIAS>, has an ELI5 score of 0.956. The third part (III) is the full document with words highlighted in red and green. In Figure 7, green means ‘counts towards class y=1.0’ and red means ‘counts against y=1.0’. The darker the colour, the higher the score.

4.4.1.2 Global Explanation of the Model

The global ELI5 shows the most important words for both classes and their scores. Green means ‘counts towards class y=1.0’, red means ‘counts towards class y=0.0’. See Figure 8 below for an example of the top ten words.


5 RESULTS

This study is done in five steps: (1) model selection, (2) feature selection, (3) iterative evaluation, (4) hyperparameter tuning and (5) explanation of the ML model. These steps are graphically shown in Figure 9.

Figure 9, the flowchart of how to get the right results

In the first step, a first selection is made among the four ML models, resulting in two models. The selection is based on the fbeta score (see Chapter 3.2 for an explanation of the metrics). The second step reduces the number of features to five, based on the fbeta and recall scores. The third step is finding weaknesses in the created models and further improving the features and models. The fourth step is fine-tuning the hyperparameters, and the fifth step is adding an explanation to the final model.


To know whether a combination of multiple features works, three more features are created before the steps start. These three new features are based on the information in Appendix III. One is all features combined, named ‘compiledALL’. Another combines all the features with text, named ‘compiledTEXT’. The last extra feature, named ‘compiledSELECT’, combines all the features that have been selected by the cardiologist.

5.1 Model Selection

The first step is to reduce the number of models from four to two different ML models. These are chosen from the following models: SVM, RF, KNN and LR. Here, the default ML models are used. The pre-processing steps are the same for each model: (I) removal of punctuation marks, (II) making all words lowercase, (III) removal of stop words (except negations like ‘never, not, no’, since they can give a phrase a completely different meaning) and (IV) stemming.

The results of the created models (see Appendix V for used code) can be seen in Table 2. These results show the fbeta score, with a beta of 2, per feature and ML model. It can be seen that the overall outcomes of the SVM and the KNN models are the highest.

Table 2, outcome of the four different ML models per feature

Features: SVM RF KNN LR

Tractus anamnese 0.871984 0.185657 0.882708 0.199732
Lichamelijk onderzoek 0.262735 0.838472 0.880697 0.260054
Toelichting behandeling 0.870643 0.840483 0.88941 0.865952
Overwegingen 0.290214 0.270107 0.882708 0.301609
Bijzonderheden 0.867292 0.865952 0.883378 0.865952
compiledALL 0.817694 0.715818 0.869303 0.8063
compiledTEXT 0.813673 0.727212 0.867292 0.813673
compiledSELECT 0.816354 0.66555 0.868633 0.80563

Mean fbeta score: 0.709391 0.609959 0.878213 0.629199

5.2 Feature Selection

The second step aims to reduce the number of features. Currently, there are 17 different features and the aim is to go down to five. Here, the default ML models are used. The pre-processing steps for SVM and KNN are: (I) removal of punctuation marks, (II) making all words lowercase, (III) removal of stop words (except negations like ‘never, not, no’, since they can give a phrase a completely different meaning) and (IV) stemming.

The created models (see Appendix VI for the used code) resulted in the scores shown in Table 3. These results show the fbeta (beta of 2) and recall score per feature. It can be seen that the fbeta score of most features is between 0.75 and 0.89 for both models; KNN scores overall higher than SVM. When looking at the recall score, however, the scores of the KNN model are worse than those of the SVM model. Since the fbeta scores are more constant, it is chosen to select the top five features in terms of recall. These are the features ‘MedischKladblokMKA’, ‘Schouw’, ‘Lichamelijk onderzoek’, ‘Overwegingen’ and ‘compiledSELECT’, which all have a recall score higher than 0.6.

Table 3, fbeta and recall score of SVM and KNN model per feature

Features: SVM fbeta, SVM recall, KNN fbeta, KNN recall

Exposure vrij in te vullen 0.864611 0.04 0.882708 0
AMPLE Medicatie omschrijving 0.543566 0.588571 0.878686 0.0114286
AMPLE Past omschrijving 0.704424 0.325714 0.881367 0.0285714
AMPLE Event omschrijving 0.792895 0.365714 0.882038 0.0171429
Reden van melding 0.701743 0.548571 0.869303 0.0685714
Schouw 0.438338 0.714286 0.878016 0.0171429
Speciale anamnese 0.847855 0.131429 0.882708 0
Tractus anamnese 0.871984 0.0342857 0.882708 0
Lichamelijk onderzoek 0.262735 0.851429 0.880697 0
Toelichting behandeling 0.870643 0.188571 0.88941 0.0914286
Overwegingen 0.290214 0.794286 0.882708 0.0114286
Bijzonderheden 0.867292 0.0228571 0.883378 0.00571429
compiledALL 0.817694 0.571429 0.869303 0.125714
compiledTEXT 0.813673 0.554286 0.867292 0.114286
compiledSELECT 0.816354 0.6 0.868633 0.131429

5.3 Iterative Evaluation

In this step, three phases are worked through: model reduction, finding the best pre-processing settings and setting the threshold.

5.3.1 Phase One

The first phase is to reduce the model options. The same models and default settings from the feature selection step lead to the results shown in Table 4. The table shows that the specificity is significantly better for the KNN model, but the recall and false-negative (fn) count are worse. Since these last two are the most important indicators, the choice is made to continue with the SVM model.

Table 4, the outcome of SVM and KNN model per feature

Features: fbeta recall specificity tp tn fp fn

MedischKladblokMKA 0.865952 0.0457143 0.974943 8 1284 33 167
Schouw 0.878016 0.0171429 0.992407 3 1307 10 172
Lichamelijk onderzoek 0.880697 0 0.997722 0 1314 3 175
Overwegingen 0.882708 0.0114286 0.998481 2 1315 2 173
compiledSELECT 0.868633 0.131429 0.966591 23 1273 44 152

5.3.2 Phase Two

The second phase will look into more processing techniques. These advanced pre-processing techniques are:

 Changing the NaN values to one fixed placeholder word such as ‘NOINFORMATIONHERE’. This makes it possible to see whether the model trains on empty rows, and may improve the model.

 Change the ngram_range parameter in the Tf-Idf vectorizer from (1,1) to (1,3).

 There is a list of abbreviations (see Appendix VII) that are often used. The author of the notes can choose to write an abbreviation or the full word; for the model these look completely different even though they mean the same. By changing the abbreviations to the full words this problem can be reduced.

All of the above-named techniques are combined and tested for all the different features. As a result, per feature, the best method is used as a pre-processing step for that specific feature. Per feature we will test and report the results of the following methods: (1) change Nan-values, (2) change ngram_range, (3) change abbreviations, (4) change Nan-value & ngram_range, (5) change Nan-values & abbreviations, (6) change ngram_range & abbreviations and (7) all of the three combined.
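As a small, hypothetical illustration of steps (1) and (3): the actual abbreviation list is in Appendix VII, so the two entries below are made-up examples.

import pandas as pd

# Hypothetical subset of the abbreviation list in Appendix VII
abbreviations = {"pob": "pijn op de borst", "vg": "voorgeschiedenis"}

def expand_abbreviations(text):
    return " ".join(abbreviations.get(word, word) for word in text.split())

df = pd.DataFrame({"Schouw": ["pt heeft pob", None]})          # toy data
df["Schouw"] = df["Schouw"].fillna("NOINFORMATIONHERE")        # (1) replace NaN values with one word
df["Schouw"] = df["Schouw"].apply(expand_abbreviations)        # (3) replace abbreviations by full words
print(df["Schouw"].tolist())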


Table 5 reports the number of false negatives, instead of recall, to show the difference between the trained models more clearly. Per feature, the value in bold is the best one for that feature. The full results can be found in Appendix VIII.

Table 5, false-negative counts for different ways of pre-processing per feature

Features: Basic, Nan, Ngram, Abbv, Nan & ngram, Nan & abbv, Ngram & abbv, All

MedischKladblok 69 68 122 73 121 71 73 121
Schouw 50 53 159 55 79 59 55 73
Lichamelijk onderz 26 26 165 21 31 25 21 29
Overwegingen 36 34 167 32 45 36 32 45
compiledSELECT 70 70 109 71 109 71 71 105

5.3.3 Phase Three

Since the fn value, and thus the recall, is the most important value in this study, the focus in this step is on setting the threshold. The threshold is found using trial and error and should lead to a recall score of 0.95, 0.995 and 1. The first score is the given aim (from the hospital), the second is the recall score the original study reached and the last number, one, would be the perfect score. The threshold determines from which predicted probability, given by the model, a sample is assigned to which class.

The code in Appendix IX shows per feature (training done with the optimal pre-processing steps, see phase two) the thresholds that reach a minimum recall of 0.95, 0.995 and 1. In Table 6 the results for ‘compiledSELECT’ are shown. Thresholds of 0.955, 0.983 and 0.991 correspond to recall scores of 0.95, 0.995 and 1, respectively; a simplified sketch of such a threshold sweep is given after Table 6. This was the feature with the best precision and specificity, even slightly better than the original triage. The results of the other features can be seen in Appendix X.

Table 6, the outcome for different thresholds from feature compiledSELECT

Threshold: fbeta recall precision specificity tp tn fp fn

0.954553 0.415018 0.950556 0.182271 0.331006 769 1707 3450 40
0.983471 0.205665 0.995056 0.145307 0.0818305 805 422 4735 4
0.991287 0.154375 1.0 0.138196 0.0217181 809 112 5045 0
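The full code is in Appendix IX. A simplified sketch of such a sweep is given below; it assumes the common convention of predicting ‘myocardial infarction’ whenever the predicted probability of that class is at least the threshold, which may differ from the exact convention used in the study, and the helper name and toy data are made up.

import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

def pick_threshold(y_true, p_mi, target_recall):
    """Largest threshold t such that predicting class 1 for p_mi >= t
    still reaches the target recall (hypothetical trial-and-error helper)."""
    best_t = 0.0
    for t in np.linspace(0.0, 1.0, 1001):
        y_pred = (p_mi >= t).astype(int)
        if recall_score(y_true, y_pred) >= target_recall:
            best_t = t
    return best_t

# p_mi would come from e.g. pipe.predict_proba(X_test)[:, 1]; toy values are used here
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
p_mi = np.clip(0.3 * y_true + 0.7 * rng.random(500), 0, 1)

t = pick_threshold(y_true, p_mi, target_recall=0.995)
print(t, confusion_matrix(y_true, (p_mi >= t).astype(int)))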

5.4 Hyperparameter Tuning

The candidate values for the parameter C are [0.01, 0.1, 1, 10, 100, 1000]. The optimum is found with cross-validation and grid search on the evaluation metric recall (see Chapter 4.3 for the explanation of cross-validation and grid search). The number of folds used is ten. The best value found for the hyperparameter C is 0.1. The different outcomes are shown in Table 7. For these results, a new threshold is set, since the new parameter influences the outcome of the model. The used code can be found in Appendix XI.

Table 7, the outcome for different thresholds with optimal hyperparameter

Threshold: fbeta recall precision specificity tp tn fp fn

0.5 0.869938 0.0294715 0.659091 0.997683 29 6459 15 955
0.898230 0.291231 0.950203 0.151491 0.191072 935 1237 5237 49
0.975628 0.165862 0.995935 0.136168 0.0396973 980 257 6217 4
0.985361 0.147761 1 0.13406 0.0182268 984 118 6356 0

5.5 Explanation of Final Machine Learning Model

To add an explainability layer to the final ML model, one extra training iteration has to be done without grid search (ELI5, the technique used for the explainability layer, does not support a model that is trained with grid search). For this training iteration the SVM model, the feature ‘compiledSELECT’ and hyperparameter C = 0.1 are used. The used code is found in Appendix XII and the outcome of this training can be seen in Table 8.

Table 8, outcome final chosen model

Threshold: fbeta recall precision specificity tp tn fp fn


5.5.1 Global Explanation

There are two ways of adding an explanation to a model (see Chapter 4.4.1). These are global and local. The global explanation with the top-20 words is given in Figure 10 (and for top-50 words, see Appendix XIII). In these results, the green words have a positive effect and the red a negative effect on the class ‘myocardial infarction’. The medical Dutch words which are the output in the top-20 and top-50 results do make sense [21].

Figure 10, top-20 global explanation for the class ‘myocardial infarction’

5.5.2 Local Explanation

The local explanation is given per document. It can be used to find out what influenced the model to misclassify a sample. Figures 11 and 12 are two examples of documents that should have been classified as ‘myocardial infarction’, but were classified as ‘no-myocardial infarction’.


Figure 11, local explanation of wrong classified document 566 where red words represent the class ‘myocardial infarction’ and green ‘no-myocardial infarction’

Figure 12, local explanation of wrong classified document 3798 where red words represent the class ‘myocardial infarction’ and green ‘no-myocardial infarction’

In the first example (Figure 11) there are many medical Dutch words which influence the decision making. For instance, dm (diabetes) and hypertensief are very interesting words for predicting myocardial infarction [21]. However, some words do not make a lot of sense, such as dochter (daughter), and bandgevoel should have favoured the class ‘myocardial infarction’ [21].

The second example (Figure 12) is more confusing [21]. It can be seen that the model takes words such as kinderopvang (children’s daycare) and leidinggev (leadership) as evidence for the class ‘myocardial infarction’, while it classifies the word tia under ‘no-myocardial infarction’.


6 DISCUSSION

The study aimed to see if an ML model could predict whether a patient needs to be brought to the hospital because the patient is suffering from myocardial infarction, preferably with better recall and specificity scores than the original study, but at least with equal scores. The original scores are respectively 0.995 and 0.0113 for recall and specificity. The final ML model of this research has a recall score of 0.995 and a specificity score of 0.1101.

The achieved results show an improvement of roughly 10 percentage points in specificity at the same recall score. This implies that the ML model finds relevant information in the data and, as far as tested, has better specificity and recall than the original study findings. The ML model mainly scores badly on patients who are taken to the hospital but could have been kept at home because they do not suffer from a myocardial infarction. However, the model scores very well on the patients who are experiencing a myocardial infarction: ‘only’ 0.5% of this group is predicted to stay home although they should have been transported to the hospital.

During the creation of this final ML model, not all commonly used methods were tested; due to the considerable time limit of the study, they could not be implemented. Without these techniques, the recall and specificity achieved here may well be poorer than what is possible. The unapplied methods, which might have made a difference, are described in the following sections.

Numerical Values

The numerical values in the dataset were not used, so it is unknown whether a combination of a numerical-feature trained ML model and a text-feature trained ML model would suffice. The numerical features are also used by the cardiologist and the NP to make a decision, so they could give the ML model more information to train on.

Advanced Embedding

During the research, the Tf-Idf vectorizer was used. This method is known for being easy, effective and quick to use. There are also more advanced vectorisation methods, known as embeddings. These pre-trained embedding techniques probably retain more information, so less information is lost when words are transformed into numbers.

Different Approach

While selecting the ML models, supervised ML models that are good at classifying textual data were evaluated. Since the dataset is imbalanced (90% vs 10%), an outlier detection approach could also have been used. Such an approach does not look at the classes directly but at which samples are the outliers in the data set; in this case, the minority class of the imbalanced data set (the 10%) could be treated as outliers.

Neural Network


7 CONCLUSION AND FUTURE WORK

The study aimed to find out whether the prehospital text data offers enough information to be handled with an ML model. The most important requirement was not to have a worse recall than the original results, and preferably a better specificity. The recall and specificity of the original study are respectively 0.995 and 0.0113. The final ML model of this research has a recall score of 0.995 and a specificity score of 0.1101.

The results of this study show an increase in specificity of almost 10 percentage points in comparison with the original study. This result is established by setting a threshold at a recall score of 0.995, which is the aim of the study. The results are achieved with the feature set selected by a physician-researcher and an SVM model. The major recall improvement is made by setting a threshold on the prediction probability.

Further Research

Future research could improve the current ML model. Even though the aims of this study are reached, there is still a lot to improve or to inspect. First of all, it is advised to research the advantage of combining an ML model trained on numerical features with an ML model trained on text features. Secondly, further research could be done with more ‘advanced’ techniques. For example, one could choose to work with an embedding like BERT [29], a different methodology like outlier detection, or different models such as LSTM.


8 REFERENCES

[1] de Koning, E., Biersteker, T. E., Beeres, S., Bosch, J., Backus, B. E., Kirchhof, C. J., Alizadeh Dehnavi, R., Silvius, H. A., Schalij, M., & Boogers, M. J. (2021). Prehospital triage of patients with acute cardiac complaints: study protocol of HART-c, a multicentre prospective study. BMJ Open, 11(2), e041553.

https://doi.org/10.1136/bmjopen-2020-041553

[2] HealthNewsReview.org, Science, C., & HealthNewsReview.org. (2021). Understanding medical tests: sensitivity, specificity, and positive predictive value. HealthNewsReview.Org.

https://www.healthnewsreview.org/toolkit/tips-for-understanding-studies/understanding-medical-tests-sensitivity-specificity-and-positive-predictive-value/

[3] Johnson, A. E. W., Ghassemi, M. M., Nemati, S., Niehaus, K. E., Clifton, D., & Clifford, G. D. (2016). Machine Learning and Decision Support in Critical Care. Proceedings of the IEEE, 104(2), 444–466. https://doi.org/10.1109/jproc.2015.2501978

[4] Chen, M., Hao, Y., Hwang, K., Wang, L., & Wang, L. (2017). Disease Prediction by Machine Learning Over Big Data From Healthcare Communities. IEEE Access, 5, 8869–8879.

https://doi.org/10.1109/access.2017.2694446

[5] Taylor, R. A., Pare, J. R., Venkatesh, A. K., Mowafi, H., Melnick, E. R., Fleischman, W., & Hall, M. K. (2016). Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach. Academic Emergency Medicine, 23(3), 269–278.

https://doi.org/10.1111/acem.12876

[6] Hong, W. S., Haimovich, A. D., & Taylor, R. A. (2018). Predicting hospital admission at emergency department triage using machine learning. PLOS ONE, 13(7), e0201016.

https://doi.org/10.1371/journal.pone.0201016

[7] Goto, T., Camargo, C. A., Faridi, M. K., Freishtat, R. J., & Hasegawa, K. (2019). Machine Learning–Based Prediction of Clinical Outcomes for Children During Emergency Department Triage. JAMA Network Open, 2(1), e186937. https://doi.org/10.1001/jamanetworkopen.2018.6937

[8] Artificial intelligence in cardiology: applications, benefits and challenges. (2018). British Journal of Cardiology, https://www.jacc.org/doi/full/10.1016/j.jacc.2018.03.521. https://doi.org/10.5837/bjc.2018.024

[9] Six, A. J., Backus, B. E., & Kelder, J. C. (2008). Chest pain in the emergency room: value of the HEART score. Netherlands Heart Journal, 16(6), 191–196. https://doi.org/10.1007/bf03086144

[10] Formatics . (2021). EHBO Bladel. Badel Ehbo. http://www.ehbobladel.nl/news/show/twijfel-niet-bel-112#:%7E:text=Een%20ambulancerit%20(spoed)%20komt%20gemiddeld,wordt%20betaald%20door%20de%2 0zorgverzekeraar

[11] sklearn.feature_extraction.text.TfidfVectorizer — scikit-learn 0.24.1 documentation. (2020). TFIDF. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

[12] Bhandari, P. (2021, February 15). An introduction to quantitative research. Scribbr. https://www.scribbr.com/methodology/quantitative-research/


[14] Khan, D. R. Z., & Allamy, H. (2013). Training Algorithms for Supervised Machine Learning: Comparative Study. INTERNATIONAL JOURNAL OF MANAGEMENT & INFORMATION TECHNOLOGY, 4(3), 354–360. https://doi.org/10.24297/ijmit.v4i3.773

[15] F.Y, O., J.E.T, A., O, A., J. O, H., O, O., & J, A. (2017). Supervised Machine Learning Algorithms: Classification and Comparison. International Journal of Computer Trends and Technology, 48(3), 128–138. https://doi.org/10.14445/22312803/ijctt-v48p126

[16] Heida, M. (2019, July 26). Wat is Machine Learning? Internet of Things Nederland. https://internetofthingsnederland.nl/wat-is-machine-learning/

[17] 3Bplus. (2020, May 4). Methoden van machine learning – supervised en unsupervised learning. https://3bplus.nl/methoden-van-machine-learning/

[18] Korbut, D. (2019, March 7). Machine Learning Algorithms: Which One to Choose for Your Problem. Medium. https://blog.statsbot.co/machine-learning-algorithms-183cc73197c

[19] Shung, K. P. (2020, April 10). Accuracy, Precision, Recall or F1? - Towards Data Science. Medium. https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9

[20] Brownlee, J. (2020, January 14). A Gentle Introduction to the Fbeta-Measure for Machine Learning. Machine Learning Mastery.

https://machinelearningmastery.com/fbeta-measure-for-machine-

learning/#:%7E:text=have%20been%20made.-,It%20is%20calculated%20as%20the%20ratio%20of%20correctly%20predicted%20positive,examples%20that %20could%20be%20predicted.&text=The%20result%20is%20a%20value%20between%200.0%20for%20no% 20recall,for%20full%20or%20perfect%20recall

[21] Koning, E. (2021, May). Personal interview [Personal interview].

[22] Harlan, A. (2020). You Might Be Leaking Data Even if You Cross Validate. Github. https://alexforrest.github.io/you-might-be-leaking-data-even-if-you-cross-validate.html

[23] Garbade, M. J. (2018, October 15). A Simple Introduction to Natural Language Processing. Medium. https://becominghuman.ai/a-simple-introduction-to-natural-language-processing-ea66a1747b32

[24] Ameisen, E. (2019, April 17). How to solve 90% of NLP problems: a step-by-step guide. Medium. https://blog.insightdatascience.com/how-to-solve-90-of-nlp-problems-a-step-by-step-guide-fda605278e4e

[25] Understanding Support Vector Machines: A Primer. (2019, July 1). Machine Learning in Action. https://appliedmachinelearning.blog/2017/03/09/understanding-support-vector-machines-a-primer/

[26] Chen, L. (2019, January 7). Support Vector Machine — Simply Explained - Towards Data Science. Medium. https://towardsdatascience.com/support-vector-machine-simply-explained-fee28eba5496

[27] sklearn.model_selection.GridSearchCV — scikit-learn 0.24.1 documentation. (2021). Sklearn. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

[28] Pflugfelder, E. H. (2016). Reddit’s “Explain Like I’m Five”: Technical Descriptions in the Wild. Technical Communication Quarterly, 26(1), 25–41. https://doi.org/10.1080/10572252.2016.1257741


9 APPENDIX

Appendix I – Data Columns Names
Appendix II – Text Data Columns
Appendix III – Data Selection
Appendix IV – Heart
Appendix V – Code Step One
Appendix VI – Code Step Two
Appendix VII – Abbreviations
Appendix VIII – Outcome Step Three, Phase Two
Appendix IX – Code Step Three, Phase Three
Appendix X – Outcome Step Three, Phase Three
Appendix XI – Code Step Four
Appendix XII – Code Step Five


Appendix I – Data Columns Names

All 270 column names available in the data are translated to English and shown below:


Left pupil Pupil right Glucose Temperature Exposure free to fill in AMPLE Medication AMPLE Medication description AMPLE Past AMPLE Past description

AMPLE Event prior notice Prior Announcement POB.1 PALP.1 DYSPN COLLAPS AMPLE Event description

Reason for reporting Schouw Special anamnesis Tractus anamnesis Physical examination atient Deceased Working diagnosis arrhythmia Working diagnosis ACS Work diagnosis resuscitation Work diagnosis specialty Work diagnosis injury code Diagnostic cardiology Secondary diagnosis: neurology Internal medicine Secondary diagnosis intoxication Secondary diagnosis paediatrics Secondary diagnosis psychiatry Secondary diagnosis pulmonology Secondary diagnosis traumatology/surgery Specialist diagnosis obstetrics Secondary diagnosis: revalidation Explanation of treatment/follow-up care REANIMATION Recitals Treatment - Intubation Treatment - Ventilation Treatment - Infusion Treatment - Medication Treatment - Resuscitation Clinical picture - Nausea Clinical picture - Shock/bleeding Clinical picture - Chest pain Clinical picture - Collapse Clinical picture - Acute abdomen Clinical picture - Loss of consciousness Clinical picture - Insult Clinical picture - Headache Clinical picture - Fever Clinical picture - Intoxication Hypo-hyperthermia Pain in the back or flank Obstetric Syndrome Clinical picture - Psychological disorder Non-Trauma Treatment - Intubation Non-Trauma Treatment - Ventilation Non Trauma Treatment - Infusion Non Trauma Treatment - Medication Non Trauma Treatment - Resuscitation Destination Reception type Reported arrival time Arrival time

Details


Appendix II – Text Data Columns

There are 7458 rows in total. The example is study-id 16-18-195651.

Name in Dutch Name in English

Non NaN

Example Example in English

Ritnummer Trip number 7458 16-18-195651 16-18-195651

Geslacht Gender 7458 Vrouw Women

Leeftijd Age 7451 94.0 94.0

Medisch Kladblok MKA

Medical Notepad MKA

7444 dd dec cordis benauwd ah 40 sat 90% mogelijk pob gehad vandaag. nu niet is

aanspreekbaar vg: pacemaker/ nr beleid ha blijft tp

dd dec cordis benumbed ah 40 sat 90% possible pob had today. not now is

approachable vg: pacemaker/ no policy ha stays tp HF (hartfrequentie) HF (heart rate) 6062 78.0 78.0 SYS (systolische bloeddruk) SYS (systolic blood pressure) 5494 182.0 182.0 DIA (diastolische bloeddruk) DIA (diastolic blood pressure) 5458 97.0 97.0

Bleek Pale 908 Nan Nan

Koud, klam, transpireren Cold, clammy, sweating 396 Nan Nan Toelichting ECG Explanation ECG

4329 Af bundeltak links Off bundle branch left Exposure vrij in te vullen Exposure free to fill in 847 Nan Nan AMPLE Medicatie omschrijving AMPLE Medication description

5613 om de dag een tbl lasix hart tabletje

marcomar morfine. pleister

one tbl lasix every other day heart tablet marcomar morphine patch AMPLE Past omschrijving AMPLE Past description 6378 PM ivm hartritmestoornis,sinds 7 jaar

PM due to cardiac arrhythmia, for 7 years

AMPLE Event omschrijving

AMPLE Event description

5088 al 2 weken aan het rommelen niet lekker/ niet fit

3 weken in het alrijne nagekeken

alles goed

toen al had mw middenrif klachten

nu geen pob wel kortademig

vandaag gegeten en gedronken geen diarree

afgelopen nacht niet geslapen niet lekker

zelfstandige dame 1 x p/week thuiszorg

/huishoudelijke hulp/ die eigenlijk niks doet

messing around for 2 weeks not feeling well/not fit

3 weeks in the alrijne checked Everything fine

Even then mrs. had diaphragm complaints

now no pob but short of breath Eaten and drunk today no diarrhoea

did not sleep last night not feeling well

self-employed lady 1 x p/week home care


nr beleid ,wil wel graag 95 jaar worden

no policy, but would like to live to be 95 Reden van melding Reason for notification 7449 dd:dec cordis benauwd ah 40 x pmin sat 90%

mogelijk pob gehad vandaag nu niet is aanspreekbaar vg; PM / nr beleid ha nablijft tp dd:dec cordis stuffy ah 40 x pmin sat 90%

possible pob today not now

is approachable fg; PM / no policy ha stays tp Schouw Inspection 4360 mw ligt in bed

is kortademig

stapt er zelf uit,gaat eerst naar toilet

is kortademig ,goeie kleur, praat redelijke zinnen

mw lies in bed is short of breath

gets out of bed herself, goes to the toilet first

is short of breath, good colour, talks reasonable sentences Speciale anamnese Special anamnesis 2360 Nan Nan Tractus anamnese Tractus anamnesis

855 a:vrij ,praat redelijke zinnen b; bdz longen crepitaties c: irr 68 d geen koorts thorax bdz crepitaties 4 l O2 sat 98% buik soepel

afgelopen weken last van middenrif / bovenbuik gehad en daarvoor morfinepleister gehad met succes

a; free, talks reasonable sentences b; bdz lungs crepitations c:irr 68 d no fever thorax bdz crepitations 4 l O2 sat 98% abdomen supple

last few weeks suffering from diaphragm / upper abdomen and had a morphine patch with success Lichamelijk onderzoek Physical examination 1696 Nan Nan Werkdiagnose specialisme Work diagnosis specialism 7455 Cardiologie Cardiology Werkdiagnose letselcode Work diagnosis injury code

6994 ["ACS - instabiele AP/non-STEMI","hartritmestoornis"," overig","decompensatio cordis"]

ACS - unstable AP/non-STEMI", "arrhythmia", "other", "decompensatio cordis"] Toelichting behandeling/ vervolgzorg Explanation treatment / follow-up care 2538 Nan Nan

Overwegingen Considerations 2076 dec cordis dec cordis Ziektebeeld –

Pijn op de borst

7458 False False

Bijzonderheden Particulars 794 toenemend

benauwd,crepiteerd over beide longvelden

geen pob

sat met 4 liter 98% 40 Mg lasix iv gehad geen koorts

increasing respiratory distress, crepitations in both lung fields no pob

sat with 4 litres 98% 40 Mg lasix iv given no fever


Appendix III – Data Selection

In the table below, all data columns are marked to indicate whether they were selected as important by a cardiologist and whether they are free-text columns. A code fragment after the table sketches how the selected free-text columns can be combined into a single compiled feature.

Column name (Dutch) | Column name (English) | Selected by cardiologist | Text column
Geslacht | Gender | X | -
Leeftijd | Age | X | -
Medischkladblok MKA | Medical Notepad MKA | X | X
HF (hartfrequentie) | HF (heart rate) | X | -
SYS (systolische bloeddruk) | SYS (systolic blood pressure) | X | -
DIA (diastolische bloeddruk) | DIA (diastolic blood pressure) | X | -
Bleek | Pale | X | -
Koud, klam, transpireren | Cold, clammy, sweating | X | -
Toelichting ECG | Explanation ECG | X | X
Exposure vrij in te vullen | Exposure free to fill in | X | X
AMPLE Medicatie omschrijving | AMPLE Medication description | - | X
AMPLE Past omschrijving | AMPLE Past description | X | X
AMPLE Event omschrijving | AMPLE Event description | X | X
Reden van melding | Reason for notification | X | X
Schouw | Inspection | X | X
Speciale anamnese | Special anamnesis | X | X
Tractus anamnese | Tractus anamnesis | - | X
Lichamelijk onderzoek | Physical examination | - | X
Werkdiagnose specialisme | Work diagnosis specialism | X | -
Werkdiagnose letselcode | Work diagnosis injury code | X | -
Nevendiagnose cardiologie | Additional diagnosis cardiology | X | -
Toelichting behandeling/vervolgzorg | Explanation of treatment/follow-up care | - | X
Overwegingen | Considerations | - | X
Ziektebeeld POB | Clinical picture POB (chest pain) | X | -
Bijzonderheden | Particulars | X | X
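The code in Appendices V and VI uses three compiled features (compiledALL, compiledTEXT and compiledSELECT) next to the individual columns. Their exact construction is not listed in the appendices; the fragment below is a minimal sketch that assumes compiledTEXT is the concatenation of all free-text columns and compiledSELECT the concatenation of the text columns that also carry the cardiologist mark in the table above.

import pandas as pd

df = pd.read_csv("~/ambulance/data/Data.csv")

# All free-text columns (see the table above and Appendix V).
text_columns = ["MedischKladblokMKA", "Toelichting ECG", "Exposure vrij in te vullen",
                "AMPLE Medicatie omschrijving", "AMPLE Past omschrijving",
                "AMPLE Event omschrijving", "Reden van melding", "Schouw",
                "Speciale anamnese", "Tractus anamnese", "Lichamelijk onderzoek",
                "Toelichting behandeling/vervolgzorg", "Overwegingen", "Bijzonderheden"]

# Text columns that are also marked as selected by the cardiologist.
selected_text_columns = ["MedischKladblokMKA", "Toelichting ECG",
                         "Exposure vrij in te vullen", "AMPLE Past omschrijving",
                         "AMPLE Event omschrijving", "Reden van melding", "Schouw",
                         "Speciale anamnese", "Bijzonderheden"]

# Assumption: the compiled features are plain concatenations of the raw text fields.
df["compiledTEXT"] = df[text_columns].fillna(" ").astype(str).agg(" ".join, axis=1)
df["compiledSELECT"] = df[selected_text_columns].fillna(" ").astype(str).agg(" ".join, axis=1)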


Appendix V – Code Step One

# Step One: compare four classifiers per text feature using the F2 score.
import advertools as adv
import numpy as np
import pandas as pd
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import WhitespaceTokenizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# load dataset
path = "~/ambulance/data/Data.csv"
df = pd.read_csv(path)

columnnames = ["MedischKladblokMKA", "Toelichting ECG", "Exposure vrij in te vullen",
               "AMPLE Medicatie omschrijving", "AMPLE Past omschrijving",
               "AMPLE Event omschrijving", "Reden van melding", "Schouw",
               "Speciale anamnese", "Tractus anamnese", "Lichamelijk onderzoek",
               "Toelichting behandeling/vervolgzorg", "Overwegingen", "Bijzonderheden",
               "compiledALL", "compiledTEXT", "compiledSELECT"]

# stopwords settings: Dutch stop words, but keep the negations
stop = sorted(adv.stopwords['dutch'])
stop.remove('niet')
stop.remove('niets')
stop.remove('zonder')

# stemmer settings
stemmer = SnowballStemmer("dutch")
w_tokenizer = WhitespaceTokenizer()

def stemm_text(text):
    return ' '.join(stemmer.stem(w) for w in w_tokenizer.tokenize(str(text)))

# execute preprocessing: remove punctuation, lowercase, remove stop words, stem
df = df.replace(np.nan, ' ', regex=True)
for name in columnnames:
    df[name] = df[name].str.replace(r'[^\w\s]', ' ', regex=True)
    df[name] = df[name].str.lower()
    df[name] = df[name].apply(
        lambda x: ' '.join(word for word in x.split() if word not in stop))
    df[name] = df[name].apply(stemm_text)

# make frame to save the F2 score per feature (column) and per model
results = pd.DataFrame(columns=['SVM', 'RF', 'KNN', 'LR'], index=columnnames)

models = {
    'SVM': SVC(kernel='linear', class_weight='balanced'),
    'RF': RandomForestClassifier(max_depth=2, random_state=0, class_weight='balanced'),
    'KNN': KNeighborsClassifier(n_neighbors=2),
    'LR': LogisticRegression(random_state=0, class_weight='balanced'),
}

# fit every model on every single feature and store the F2 score on the test set
for model_name, clf in models.items():
    for name in columnnames:
        X = df[name]
        y = df['hart_infarct']
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                            random_state=0)
        pipe = make_pipeline(TfidfVectorizer(), clf)
        pipe.fit(X_train, y_train)
        y_preds = pipe.predict(X_test)
        results.at[name, model_name] = fbeta_score(y_test, y_preds,
                                                   average='micro', beta=2)

print(results)
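Step One ranks the features with the F-beta score at beta = 2 (F2), which weights recall more heavily than precision: F_beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall). The fragment below is a small check of that formula against scikit-learn, using made-up labels; it uses the default binary average so the precision/recall trade-off stays visible, whereas the appendix code calls fbeta_score with average='micro'.

from sklearn.metrics import fbeta_score, precision_score, recall_score

# Hypothetical labels and predictions, only to illustrate the F2 computation.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

p = precision_score(y_true, y_pred)           # 2 / 3, about 0.67
r = recall_score(y_true, y_pred)              # 2 / 4 = 0.50
beta = 2
f2_manual = (1 + beta**2) * p * r / (beta**2 * p + r)
f2_sklearn = fbeta_score(y_true, y_pred, beta=2)
print(f2_manual, f2_sklearn)                  # both about 0.526, closer to recall than to precision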


Appendix VI – Code Step Two

# Step Two: evaluate KNN and SVM per feature on F2 score, recall, specificity
# and confusion-matrix counts.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Data loading, the column list (columnnames) and the text preprocessing are
# identical to Appendix V (Code Step One) and are assumed to have been executed,
# so df and columnnames are available here.

# make frame to save results per feature (column)
results = pd.DataFrame(columns=['fbeta', 'recall', 'specificity',
                                'tp', 'tn', 'fp', 'fn'], index=columnnames)

def report(pipe, name, X_test, y_test):
    # store F2 score, recall, specificity and the confusion-matrix counts
    y_preds = pipe.predict(X_test)
    results.at[name, 'fbeta'] = fbeta_score(y_test, y_preds, average='micro', beta=2)
    CM = confusion_matrix(y_test, y_preds)
    TN = CM[0][0]
    results.at[name, 'tn'] = TN
    FN = CM[1][0]
    results.at[name, 'fn'] = FN
    TP = CM[1][1]
    results.at[name, 'tp'] = TP
    FP = CM[0][1]
    results.at[name, 'fp'] = FP
    results.at[name, 'specificity'] = TN / (TN + FP)
    results.at[name, 'recall'] = TP / (TP + FN)

vec = TfidfVectorizer()

# K Nearest Neighbour
clf = KNeighborsClassifier(n_neighbors=2)
for name in columnnames:
    X = df[name]
    y = df['hart_infarct']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=0)
    pipe = make_pipeline(vec, clf)
    pipe.fit(X_train, y_train)
    report(pipe, name, X_test, y_test)
print(results)

# Support Vector Machine
clf = SVC(kernel='linear', class_weight='balanced')
for name in columnnames:
    X = df[name]
    y = df['hart_infarct']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=0)
    pipe = make_pipeline(vec, clf)
    pipe.fit(X_train, y_train)
    report(pipe, name, X_test, y_test)
print(results)
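The report function above derives recall (sensitivity) and specificity directly from the confusion matrix: recall = TP / (TP + FN) and specificity = TN / (TN + FP). The fragment below is a small worked example with made-up labels that shows how the counts map onto scikit-learn's confusion-matrix layout (rows are actual classes, columns are predicted classes).

from sklearn.metrics import confusion_matrix

# Hypothetical labels (1 = myocardial infarction) and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

CM = confusion_matrix(y_true, y_pred)         # [[TN, FP], [FN, TP]] = [[3, 2], [1, 2]]
TN, FP, FN, TP = CM[0][0], CM[0][1], CM[1][0], CM[1][1]
recall = TP / (TP + FN)                       # 2 / 3, about 0.67: infarctions that are caught
specificity = TN / (TN + FP)                  # 3 / 5 = 0.60: non-infarctions ruled out
print(recall, specificity)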
