An Analysis of the Reliability of Internet-Based Symptom Checkers


DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020


Eva Despinoy and Sarah Narrowe Danielsson


Degree Project in Computer Science, DD142X

Swedish Title

En analys av pålitligheten av internetbaserade självtester

Authors

Eva Despinoy <eva.despinoy@gmail.com>

Sarah Narrowe Danielsson <sarahnarrowedanielsson@gmail.com>

KTH Royal Institute of Technology

School of Electrical Engineering and Computer Science

Examiner

Pawel Herman

Supervisor

Jeanette Hällgren Kotaleski


Abstract

Symptom checkers are online tools that suggest diagnoses and/or give triage advice based on symptoms entered by a person. The purpose of this study is to investigate whether a selection of symptom checkers is reliable by analyzing the diagnosis results and triage advice they give for specific symptoms, and by comparing the questions asked by the different checkers. Four general symptom checkers were tested with symptoms for five different illnesses, and four symptom checkers designed specifically for covid-19 were tested with symptoms corresponding to different severity levels of the illness. The questions were compared using the Jaccard Index and Cosine Similarity, after an implementation of the RAKE algorithm had transformed the lists of questions into arrays of keywords. Manual categorizing was used to analyze the triage advice and the diagnosis results. The results showed that the symptom checkers' questions were not very similar to each other. The manual categorizing of the general symptom checkers' triage advice showed that most of the general checkers recommended that the patient use hospital services even though it might not have been necessary. In contrast, the covid-19 checkers avoided recommending the use of hospital services. The general symptom checkers never placed the tested illness lower than fifth in the list of possible diagnoses, and in most cases the correct illness was placed first. In conclusion, according to this study symptom checkers can be seen as quite reliable. However, further studies are needed to address the weaknesses of this study, such as the small amount of data and the imperfect question comparison results.


Sammanfattning

Online symptom checkers are tests that, based on a number of entered symptoms, suggest diagnoses and/or give recommendations for what the next step in handling one's symptoms should be. This could be seeing a doctor or treating the symptoms with self-care. The purpose of this study is to investigate whether certain selected symptom checkers are reliable, by examining the suggested diagnoses, the recommendations and the questions. The study was carried out on four general symptom checkers, into which symptoms for five different illnesses were entered, and on four online symptom checkers created specifically for covid-19, into which symptoms were entered for four different cases showing symptoms of the disease to varying degrees.

The string comparison tools used to compare the checkers' questions were the Jaccard Index and Cosine Similarity. The questions had first been transformed into lists of keywords with the RAKE algorithm. The suggested diagnoses and the recommendations were compared and categorized manually. The results showed that the questions from the different symptom checkers were very different from each other. The manual categorization showed that the general checkers most often recommended seeking medical care even though it may not always have been needed. In contrast, the covid-19 checkers avoided recommending contact with healthcare services. None of the general websites placed the correct diagnosis lower than fifth in the list of suggested diagnoses. In most cases the correct diagnosis was in first place. Finally, the results of this study showed that online symptom checkers appear to be reliable. However, more studies are needed to address the weaknesses of this study, such as the small amount of data and potentially flawed results from the question comparisons.


1 Introduction
1.1 Research Question
1.2 Purpose
1.3 Scope
1.4 Disposition

2 Background
2.1 Online Self-Diagnosis
2.2 Symptom Checkers
2.3 Illnesses
2.4 RAKE-NLTK
2.5 Jaccard Index
2.6 Cosine Similarity
2.7 Difference Between Jaccard Index and Cosine Similarity
2.8 Past Studies

3 Method
3.1 Data Collection
3.2 Data Analysis

4 Results
4.1 General Symptom Checkers
4.2 Covid-19 Symptom Checkers

5 Discussion
5.1 Analysis of results
5.2 Analysis of the Method
5.3 Comparison with Past Research
5.4 Future Research

6 Conclusion

7 References


1 Introduction

Our society is in constant technological evolution. Today, most people who live in developed countries have access to the internet [1]. This opens up many new possibilities in different fields, including the medical field. One behavior that has arisen from this is that people search online when they feel sick in order to find out what their condition might be. They might even self-diagnose using the information they find. This often happens before meeting a doctor, or even replaces such visits. The tools available to them are online forums where users answer each other’s questions, websites where doctors answer users’ questions, and online symptom checkers. These resources are not guaranteed to be reliable, which could result in people being given the wrong diagnosis or being advised incorrectly about whether they should see a doctor. If a tool is overly optimistic and assumes that a person is not very sick, the person might not take the symptoms seriously and might not get help in time.

Conversely, if the tool always directs the patient to a doctor, it can waste resources and money. If the symptom checker outputs an incorrect diagnosis, the patient might believe that this flawed diagnosis is correct. This could in turn lead to the patient self-medicating wrongly or even persuading a doctor that the diagnosis is correct.

1.1 Research Question

This report examines a sample of online symptom checkers to answer the following questions:

How reliable are online symptom checkers?

Is it possible to answer this question by comparing checkers using tools such as the Jaccard Index, Cosine Similarity and manual categorizing, and by examining whether the checkers find the correct diagnosis when given specific symptoms for an illness?


1.2 Purpose

The purpose of the report is to assess different symptom checker websites and to state how reliable they are. This is done by comparing the questions asked and the results given by these checkers. The findings of the study are also compared to previous studies in the field. The method and the results could also serve as a basis for analyzing more symptom checkers.

1.3 Scope

This paper investigates self-diagnoses that are specifically based on using symptom checkers that are conducted like questionnaires, in the form of websites or apps. This means that other ways of conducting self-diagnosis online by for example googling one’s symptoms or asking a community are excluded from the study.

In addition to this, the report examines only a chosen sample of symptom checker websites and diseases and symptoms, to make an assessment around those.

1.4 Disposition

In chapter two, symptom checkers are further described and categorized. Relevant past studies are cited, and the diseases and symptoms that were used to test the symptom checkers are described. In addition to this, the algorithm used to extract the keywords of the symptom checkers' questions is described, as well as the Jaccard Index and the Cosine Similarity, which are the tools used to compare the different symptom checkers' questions. In chapter three, the exact steps of the data collection and analysis are described, as well as the symptom checkers and input data used. In chapter four, the results of the study are presented. In chapter five, the results are discussed and compared with previous studies. The method is also discussed, and possible future research is suggested. Chapter six presents the conclusion of the study.


2 Background

This section describes the central themes of the report, the tools used in the analysis and the inputs used in the method. It also summarizes past research made about the subject.

2.1 Online Self-Diagnosis

The term self-diagnosis was used in the introduction. Self-diagnosis is the act of identifying a medical condition in oneself. Jutel (2010) states in her literature review of 51 papers that 31% of them found self-diagnosis to be reliable and desirable, 23% found it unreliable but desirable if it were to become more reliable, and 28% found it neither reliable nor desirable. The other studies had mixed views [2].

A UK study from 2016 states that 1 in 4 UK citizens self-diagnose through the internet instead of contacting a doctor [3]. Several studies have examined how this kind of self-diagnosis affects people psychologically. A term that is commonly used is “cyberchondria”. The first study connected to this term was conducted by Microsoft in 2014 [4]. Cyberchondria can be seen as a form of hypochondria that is triggered by information online.

This thesis has chosen not to look at the psychological aspect of self-diagnosis but rather at the technical one. How reliable are online symptom checkers?

2.2 Symptom Checkers

There are two types of symptom checkers, as described in the study conducted by Semigran et al. (2015). The first type tries to assess a diagnosis and the second type is classed as triage symptom checkers. Checkers of the first type usually consist of questions about symptoms and health background, and give a list of diseases as a result. The diseases are usually ranked by how likely it is that the user has them. Triage symptom checkers give advice on how one should proceed. This advice could for example be ‘self-care’ or ‘seek help from a doctor’. There are also symptom checkers that both offer triage advice and suggest diagnoses [5].

An example of a triage symptom checker is the one that Region Stockholm (SLL) has published for covid-19, which is used in the study. Figure 2.1 below shows an example of what a question asked by the symptom checker looks like, and Figure 2.2 shows one possible output of the test. In this case the test recommends self-care (egenvård).

Figure 2.1: A question from the Region Stockholm covid-19 test.

Figure 2.2: One possible result of the Region Stockholm covid-19 test.

An example of a hybrid between a triage symptom checker and an assessing

diagnosis symptom checker used in the study is Symptomate, which is shown

in the figures below. Figure 2.3 below shows a question asked by the symptom

checker and Figure 2.4 shows a possible output of the test. In this case the test

results in the recommendation of emergency care and suggests that it could be

either migraine or a brain tumor.


Figure 2.3: A question from the Symptomate test.

Figure 2.4: A possible result of the Symptomate test.

2.3 Illnesses

In the following section the diseases used in the study are briefly described.

2.3.1 Covid-19

Covid-19 is caused by a virus in the SARS family known as SARS-CoV-2. The symptoms of the disease are similar to those of the common cold. Fever occurs in 88% of cases, and dry cough and tiredness are also common symptoms. The disease is not lethal to most people, but it can be very dangerous for individuals who are over 70 or already sick. The disease is a droplet infection, which means that it spreads through sneezes and coughs. The only test that has 100% accuracy is a blood test. [6]


2.3.2 Tonsillitis

Tonsillitis is an inflammation of the tonsils. Some symptoms and signs of the disease are swollen tonsils, sore throat, difficulty swallowing, fever, headache and tender lymph nodes on the sides of the neck. It is more common to get tonsillitis when young due to the fact that the tonsils’ ability to stop infection is stronger then. The tonsils act as the immune system’s first line of defense but after puberty their ability declines. The most common way to test for tonsillitis is by swabbing the throat. [7][8]

2.3.3 Pneumonia

Pneumonia is an inflammation of the tissue in one or both lungs. The symptoms can appear suddenly within 24 to 48 hours or build up over a couple of days. Some symptoms of pneumonia are a dry or wet cough, difficulty breathing, rapid heartbeat, high body temperature, sweating and shivering, chest pain and loss of appetite. Pneumonia can be difficult to diagnose since the symptoms are similar to those of the common cold, bronchitis and asthma. A doctor is usually able to diagnose patients by asking questions about their symptoms, but some cases might require blood tests and x-rays. [9]

2.3.4 Migraine

Migraine is a form of headache that is usually combined with vomiting, nausea and light sensitivity. Migraines can last from 4 hours to several days and in some cases even longer. Migraines can be triggered by various conditions such as stress, lack of sleep and skipping meals. There is no test to diagnose migraine; the diagnosis is instead made by a doctor investigating what could cause the headaches and ruling out other potential diagnoses. Migraine is more common in younger people and usually decreases with age. [10]

2.3.5 Irritable Bowel Syndrome (IBS)

Irritable bowel syndrome is a condition associated with the digestive system. The most common symptoms of IBS are stomach pain or cramps, bloating, diarrhoea and constipation. The severity of the symptoms can vary from day to day. Certain foods or drinks may trigger the symptoms. Other, less common symptoms of IBS are flatulence, tiredness and lack of energy, backache and incontinence. There is no test for IBS and there is no diet that works for everyone. The patient has to discuss how to move forward with a doctor and/or dietitian in order to find a diet that works. [11][12]

2.3.6 Coeliac Disease

Coeliac disease is a condition in which the immune system attacks the person’s own tissue when they eat gluten. This prevents the body from absorbing nutrients from food. Common symptoms of coeliac disease are diarrhoea, stomach aches, bloating and flatulence, indigestion and constipation. Other symptoms include fatigue due to malnutrition, unintentional weight loss, itchy rash, infertility, nerve damage and problems with coordination, balance and speech. There is no cure for coeliac disease but the symptoms decrease when a gluten-free diet is followed. Coeliac disease is diagnosed by blood tests and biopsy. After the diagnosis, additional tests may be performed to check how the condition has affected the patient. [13][14]

2.4 RAKE-NLTK

The RAKE (Rapid Automatic Keyword Extraction) algorithm determines key phrases in a text by analyzing the frequency of the appearance of certain words and how they appear in conjunction with other words [15]. The algorithm parameters are stopwords, which are words with limited lexical meaning like “and” or “the”, a set of phrase delimiters and a set of word delimiters. The input is a text document which is first partitioned into candidate keywords. These are words or sequences of words delimited by stopwords or punctuation. A score is then given to each word in the candidate keywords list. This score is given by the formula:

$$\text{score}(\text{word}) = \frac{\text{degree}(\text{word})}{\text{frequency}(\text{word})}$$

The frequency indicates how many times the word appears in the candidate list, and the degree counts how many words the word co-occurs with across the candidate keywords in which it appears, the word itself included. The score of a candidate keyword is the sum of the scores of its words, and the top-scoring candidates are then selected as keywords for the document.

NLTK (Natural Language Toolkit) is a Python platform which provides libraries for working with human language data [16]. RAKE-NLTK is a library which uses the RAKE algorithm to determine the key phrases of a document, with the help of the NLTK platform, which is used to find stopwords [17].
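The scoring rule described above can be illustrated with a minimal pure-Python sketch. It is not the RAKE-NLTK implementation: the stopword list is a tiny hand-picked assumption (RAKE-NLTK relies on NLTK's stopword lists), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not NLTK's list).
STOPWORDS = {"a", "an", "and", "are", "do", "have", "in", "is",
             "of", "or", "the", "to", "you", "your"}


def candidate_phrases(text):
    """Split text into candidate keywords at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def word_scores(phrases):
    """Score each word as degree(word) / frequency(word)."""
    frequency = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            # Co-occurrences within the phrase, the word itself included.
            degree[word] += len(phrase)
    return {word: degree[word] / frequency[word] for word in frequency}


def rank_phrases(text):
    """Rank candidate keywords by the sum of their word scores."""
    phrases = candidate_phrases(text)
    scores = word_scores(phrases)
    ranked = {" ".join(p): sum(scores[w] for w in p) for p in phrases}
    return sorted(ranked.items(), key=lambda item: item[1], reverse=True)


text = "Do you have a severe headache? Is the headache located in the forehead?"
print(rank_phrases(text))
```

In this made-up example the word "headache" appears in two two-word candidates, so its degree is 4 and its frequency is 2, which gives it a score of 2.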

2.5 Jaccard Index

The Jaccard Index is used to describe the similarity between two sample sets. It is given by the formula:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The resulting index is a number between 0 and 1 which represents the similarity of the two sets, where 1 is total similarity and 0 total dissimilarity [18].

In the implementation of this index in the report, multisets were used instead of sets. This means that one element can appear several times in the same list. This is because the appearance of a word several times in an array means that it was used several times in the questions which bears an interesting meaning, as it is of value to study if it also occurs several times in other arrays.
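A multiset version of the index can be sketched in Python with `collections.Counter`. This is an illustration rather than the code used in the study, and it assumes that the multiset intersection and union take the minimum and maximum count of each word respectively. The function name and the convention of returning 0.0 for two empty lists are also assumptions.

```python
from collections import Counter


def multiset_jaccard(words_a, words_b):
    """Jaccard Index on multisets: |A ∩ B| / |A ∪ B| with min/max counts."""
    a, b = Counter(words_a), Counter(words_b)
    intersection = sum((a & b).values())  # minimum count of each word
    union = sum((a | b).values())         # maximum count of each word
    if union == 0:
        return 0.0  # convention chosen here for two empty lists
    return intersection / union


# "headache" is shared once; the union holds headache x2, pain, nausea.
print(multiset_jaccard(["headache", "headache", "pain"], ["headache", "nausea"]))  # 0.25
```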

2.6 Cosine Similarity

To calculate the Cosine Similarity between two arrays of words, each array is converted to a vector which contains the number of times that each word appears in the text.

Cosine Similarity is then computed with the formula:

$$\text{similarity}(A, B) = \cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|}$$

Here a and b are the vectors consisting of the term frequency of each word appearing in the texts A and B respectively, and θ is the angle between a and b. Because term frequencies are never negative, the result ranges from 0 to 1, with 0 meaning that the texts have no words in common and 1 meaning that the two term-frequency vectors point in the same direction. [19]
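The calculation can be sketched in a few lines of Python. This is an illustrative version built on word counts only; the function name is hypothetical, and library vectorizers may tokenize the words differently and therefore give slightly different values.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine of the angle between the term-frequency vectors of two word lists."""
    a, b = Counter(words_a), Counter(words_b)
    # Only words present in both lists contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here when a list is empty
    return dot / (norm_a * norm_b)


# Dot product 2 * 1 = 2; vector lengths sqrt(5) and sqrt(2).
print(cosine_similarity(["headache", "headache", "pain"], ["headache", "nausea"]))
```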


2.7 Difference Between Jaccard Index and Cosine Similarity

Both measures reward shared words, but they normalize differently. In the Jaccard Index the numerator is the size of the intersection of the two multisets, in which each word is counted the minimum number of times it occurs in either multiset, and the denominator is the size of their union, in which each word is counted the maximum number of times. In Cosine Similarity the numerator is the dot product of the two term-frequency vectors, so a word that is repeated in both texts contributes the product of its counts, and the denominator is the product of the lengths of the two vectors. As a result, a large difference in how often a shared word occurs lowers the Jaccard Index sharply but affects the Cosine Similarity much less.

Example:

Multiset 1: “Friday Friday Friday Friday Friday Friday”

Multiset 2: “Friday Thursday”

The Jaccard Index implemented with multisets is then:

$$\frac{|\{\text{Friday}\}|}{|\{\text{Friday}, \text{Friday}, \text{Friday}, \text{Friday}, \text{Friday}, \text{Friday}, \text{Thursday}\}|} = \frac{1}{7} \approx 0.143$$

For the Cosine Similarity, the numerical vectorization of multiset 1 is (6, 0) and that of multiset 2 is (1, 1). The Cosine Similarity is then:

$$\frac{(6 \cdot 1) + (0 \cdot 1)}{\sqrt{6^2 + 0^2} \cdot \sqrt{1^2 + 1^2}} \approx 0.707$$

2.8 Past Studies

Previous studies have investigated the accuracy of symptom checkers. One of the

most cited is a study by Semigran et al. (2015) which tested the accuracy of 23

symptom checkers of different kinds. All of the checkers were in English. The

conclusion of the study was that these symptom checkers had very low diagnosis

accuracy. Also, the triage symptom checkers often recommended seeking help

from a doctor when self-care was a more reasonable recommendation. However, the study mentioned that these tools were quite new at the time, and could very well be improved in the next few years. [5]

There are also quite a few studies that have investigated the results of symptom checkers for one specific disease. One such study was conducted by Powley et al. (2016), who investigated the use of symptom checkers for inflammatory arthritis. Real patients were asked to enter their symptoms into two symptom checkers, one triage and one diagnosis. The study concluded that the diagnoses were frequently inaccurate and that the triage advice was most often inappropriate. The authors also suggested that the advice to use emergency services could result in inappropriate use of the healthcare system. [20]

One study conducted by Cornell University (2016) claimed that they had created a symptom checker that gave better diagnosis results than an actual doctor [21].

However, this study was later criticized by three scientists in the medical journal The Lancet. These scientists declared that the methods used in the study were so heavily flawed that the results could not be considered valid [22].

Another study, conducted by Chambers et al. (2019), reviewed a total of 29 publications covering 27 studies in order to find evidence of positive effects of symptom checkers. The study found that the accuracy of diagnosis symptom checkers was low. No specific number was given, probably because the different studies investigated in the paper presented their results differently. Another finding was that in 85% of cases, algorithm-based triage symptom checkers gave the advice to see a doctor. [23]

These studies show that it is hard to create a symptom checker with good results.

It is also hard to prove the accuracy of these symptom checkers, which makes the

evaluation of them tricky.


3 Method

The method that was used to examine the different symptom checker websites is described in this section. Part 3.1 explains how the data collection was carried out. Part 3.2 explains how the analysis of the data was conducted.

3.1 Data Collection

The data collected were the questions asked and the answers given by different symptom checkers when provided with chosen input values. This information was then analyzed to answer the research question.

3.1.1 Chosen Websites

Firstly, the websites used in the analysis were chosen. They were selected based on the reliability of the organisations which published them.

The study is divided into two parts, one focusing on checkers diagnosing different illnesses, and one on websites which focus on how the user should react based on which covid-19 symptoms they have.

Symptom Checkers Diagnosing Different Illnesses (General Symptom Checkers)

• Symptomate symptom checker (Triage & assessing diagnosis) https://symptomate.com/

• Mayo Clinic symptom checker (Triage & assessing diagnosis) https://www.mayoclinic.org/symptom-checker/select-symptom/itt-20009075

• Isabel symptom checker (Triage & assessing diagnosis) https://symptomchecker.isabelhealthcare.com/

• Australian governmental organisation healthdirect (Triage) https://healthdirect.gov.au/symptom-checker/tool

Covid-19 Self-Test Tools

All the covid-19 tools used are triage symptom checkers.

• Symptom checker in the app “Kry”

• The self-test published by Region Stockholm https://corona.sll.se/

• The self-test published by the Welsh National Health Service https://www.nhsdirect.wales.nhs.uk/SelfAssessments/symptomcheckers/COVID19.aspx

• The self-test recommended by the French government https://maladiecoronavirus.fr/

3.1.2 Input Data

To decide which data should be entered, the sought output was first decided. The output could be a specific disease (for example “Migraine”) or an assessment of how serious the condition is based on the triage advice. This section describes exactly which symptoms were entered to try to obtain each sought output.

The complete list of questions for each symptom checker, in addition to the values inputted can be found in Appendix A and Appendix B respectively.

General Input Data

When the tests asked general questions about the individual using the tool, the answers followed the template below, a standard person based on the average woman in Sweden in 2016.

Gender: Female

Age: 41

Height: 166 cm

Weight: 68 kg

Other: No pregnancy, smoking or chronic disease. No painful menstrual periods.

[24][25]

Input Data for General Symptom checkers

Symptoms to input were decided based on the diseases that had been chosen. The

exact input for each symptom checker and disease are enclosed in the appendix.


Disease 1: Tonsillitis

Input symptoms (6): swollen tonsils, sore throat, headache, bad breath, fever (between 38 °C and 40 °C), difficulty swallowing

Disease 2: Pneumonia

Input symptoms (5): wet cough, difficulty breathing, rapid heartbeat, high body temperature (between 38 °C and 40 °C), chest pain (stabbing)

Disease 3: Migraine

Input symptoms (5): headache (pulsating and on one side), nausea, dizziness, loss of appetite, light sensitivity

Disease 4: IBS

Input symptoms (7): stomach cramps (below belly button), diarrhoea, bloating, constipation, tiredness, stomach pain that decreases after bowel movement or passing gas, stomach pain after eating

Disease 5: Coeliac Disease

Input symptoms (5): diarrhoea (foamy), fatigue, unexplained iron-deficiency anemia, bone or joint pain, missed menstrual periods

Input Data for Covid-19 Symptom Checkers

For testing this type of tool, the following four cases were created, with the input given below.

Case 1: Person with no symptoms.

Case 2: Person who shows some symptoms. The standard answer for this person was a fever of 38.5 °C, and some coughing and tiredness.

Case 3: Person who shows a lot of symptoms. The standard answer for this person was a fever of 39.5 °C, coughing, tiredness, difficulty breathing, loss of taste and muscle pain.

Case 4: Person who shows some symptoms (the same as Case 2), has risk factors and has met a person who has covid-19 and/or travelled abroad recently.


3.2 Data Analysis

After inputting the chosen symptoms into the websites, the questions used by the websites and the given answers were saved in a Google Sheets document.

The analysis of the data for both types of symptom checkers followed several steps.

3.2.1 General Symptom Checkers

Firstly, the questions asked by the different symptom checkers were compared.

The motivation behind this is that if the questions are found to be similar, the reliability of the different checkers should also be similar. Furthermore, if several checkers published by reliable sources ask similar questions, this suggests that their results are reliable. The questions were converted to Python arrays, with one array listing the questions asked by one symptom checker. Each array was then converted to one long string consisting of all the questions, and the “key expressions” of the string were extracted with the RAKE-NLTK library. Then, punctuation was removed from the resulting array. After this, the expressions were further divided into single words. Finally, the Jaccard Index and the Cosine Similarity were calculated for each pair of keyword arrays. An illustration of the process is presented in Appendix E, and the code used for the conversion is shown in Appendix D.
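The clean-up steps that follow the keyword extraction can be sketched as below. This is an illustration only and not the code from Appendix D: the key phrases are made-up stand-ins for the output of a keyword extractor, the checker names and the function name are hypothetical, and the alphabetical sorting is a cosmetic assumption.

```python
import string
from itertools import combinations


def phrases_to_keywords(key_phrases):
    """Strip punctuation from key phrases and split them into single words."""
    table = str.maketrans("", "", string.punctuation)
    words = []
    for phrase in key_phrases:
        words.extend(phrase.translate(table).split())
    return sorted(words)


# Hypothetical key phrases standing in for the output of a keyword extractor.
phrases_by_checker = {
    "Checker A": ["severe headache", "headache located?", "pregnant"],
    "Checker B": ["headache", "pain worsened", "onset"],
}

keywords_by_checker = {
    name: phrases_to_keywords(phrases)
    for name, phrases in phrases_by_checker.items()
}

# Every unordered pair of checkers is then compared once.
pairs = list(combinations(keywords_by_checker, 2))
print(keywords_by_checker["Checker A"])
print(pairs)
```

Repeated words are kept on purpose, since the comparison is made on multisets of keywords.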

Secondly, the triage advice that was given by the symptom checkers was analyzed.

The advice was first classified in different categories (different types of answers) as it seemed easier to understand the different answers and to compare them with each other. The different advice was then evaluated and compared to each other.

The goal of this was to see if the triage advice seemed to be well-tailored to the symptoms inputted.

Thirdly, the accuracy of the diagnosis was assessed by checking the rank of the correct diagnosis in the list of suggested diagnoses. The ranks were also presented in a table to be analyzed.
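The rank check can be expressed as a small helper function. The sketch below is hypothetical: in the study the check was made by reading the checkers' output, and the list of suggestions here is invented for illustration.

```python
def diagnosis_rank(suggested, correct):
    """Return the 1-based position of the correct diagnosis, or None if absent."""
    target = correct.strip().lower()
    for position, name in enumerate(suggested, start=1):
        if name.strip().lower() == target:
            return position
    return None


# Hypothetical checker output, not taken from the study's data.
suggestions = ["Tension headache", "Migraine", "Sinusitis"]
print(diagnosis_rank(suggestions, "migraine"))   # 2
print(diagnosis_rank(suggestions, "Pneumonia"))  # None
```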


3.2.2 Covid-19

The questions were again first converted to Python arrays. The arrays that consisted of French and Swedish questions were translated into English by an implementation of Google Translate. Then, the questions and the triage advice were analyzed in the same way as the data produced by the general symptom checkers. This type of symptom checker did not output diagnoses.


4 Results

In this section, the results from the data analysis are presented. The exact questions, input and output for each case are enclosed respectively in Appendix A, B and C.

4.1 General Symptom Checkers

4.1.1 Similarity of questions

The questions that were processed for the general symptom checkers are the ones asked by the checkers when inputted with migraine symptoms. The keywords extracted from the different arrays of questions are the following:

Keywords of the Symptomate questions (length = 55)

['1', '10', '2500', '8200', 'add', 'age', 'arms', 'cholesterol', 'cigarettes', 'describe', 'dizziness', 'episodes', 'even',

'experiencing', 'following', 'ft', 'headache', 'headache',

'headaches', 'headaches', 'high', 'hypertension', 'injured', 'last', 'legs', 'level', 'lightheadedness', 'located', 'location', 'long', 'move', 'obese', 'overweight', 'past', 'please', 'please',

'pregnant', 'recently', 'recently', 'regions', 'scale', 'sea', 'select', 'select', 'select', 'sex', 'similar', 'smoke', 'strong', 'symptoms', 'symptoms', 'try', 'usually', 'weakness', 'would']

Keywords of the Mayo Clinic questions (length = 12)

['accompanied', 'choose', 'duration', 'headache', 'located', 'onset', 'pain', 'recurrence', 'relieved', 'symptom', 'triggered', 'worsened']

Keywords of the Isabel questions (length = 43)

['activities', 'affecting', 'age', 'better', 'birth', 'cancer',

'changed', 'condition', 'conditions', 'country', 'daily', 'days',

'describe', 'develop', 'diabetes', 'discomfort', 'etc', 'feel',

'gender', 'heart', 'hours', 'last', 'list', 'long', 'long',


'medication', 'much', 'pain', 'pregnant', 'quickly', 'recently', 'residence', 'select', 'serious', 'symptoms', 'symptoms', 'symptoms', 'symptoms', 'symptoms', 'taking', 'term', 'visited', 'words']

Keywords of the Health Direct questions (length = 108)

['activities', 'age', 'anything', 'anything', 'anywhere', 'area', 'area', 'arm', 'arms', 'bad', 'bleeding', 'blow', 'blue', 'body', 'body', 'bothering', 'bright', 'bruise', 'bumps', 'came', 'chest', 'chin', 'clearly', 'clusters', 'confused', 'could', 'difficulty', 'discoloured', 'drooped', 'drowsy', 'extremely', 'facial', 'feeling', 'flat', 'following', 'gender', 'head', 'headache', 'headache',

'illness', 'include', 'injured', 'isolated', 'itchy', 'joints', 'knock', 'last', 'light', 'like', 'like', 'likely', 'limbs', 'looking', 'looks', 'mouth', 'move', 'onset', 'others', 'pain', 'painful', 'patches', 'patient', 'pinprick', 'possible', 'purple', 'raise', 'raised', 'rash', 'red', 'red', 'required', 'say',

'serious', 'serious', 'severe', 'severe', 'severe', 'skin', 'skin', 'skin', 'slight', 'small', 'small', 'small', 'smile', 'speak', 'speech', 'speed', 'spots', 'spots', 'stop', 'stops', 'stroke', 'suddenly', 'symptom', 'symptoms', 'symptoms', 'symptoms',

'symptoms', 'symptoms', 'tiny', 'understand', 'unusually', 'usual', 'weakness', 'weakness', 'week', 'without']

Table 4.1: The Jaccard Index of the different types of combinations of symptom checker questions (which have been previously transformed into an array of keywords)

               | Symptomate | Mayo Clinic | Isabel | Australia H.D.
Symptomate     | 1          | 0.0308      | 0.1011 | 0.0724
Mayo Clinic    | 0.0308     | 1           | 0.0185 | 0.0345
Isabel         | 0.1011     | 0.0185      | 1      | 0.0786
Australia H.D. | 0.0724     | 0.0345      | 0.0786 | 1

The diagonal, which is shaded grey in the original table, holds the index of each symptom checker with itself, which is always 1.


Table 4.2: The Cosine Similarity of the different types of combinations of symptom checker questions (which have been previously transformed into an array of keywords)

               | Symptomate | Mayo Clinic | Isabel | Australia H.D.
Symptomate     | 1          | 0.1035      | 0.3113 | 0.2053
Mayo Clinic    | 0.1035     | 1           | 0.0358 | 0.1127
Isabel         | 0.3113     | 0.0358      | 1      | 0.31
Australia H.D. | 0.2053     | 0.1127      | 0.31   | 1

The Jaccard Index and the Cosine Similarity show the same trend in tables 4.1 and 4.2. The highest score is yielded by the comparison between Symptomate and Isabel and the lowest between Isabel and Mayo Clinic.
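As a minimal sketch of how two such measures can be computed from keyword lists, the following Python functions implement a set-based Jaccard index and a count-based cosine similarity. The function names, the toy keywords and the treatment of repeated keywords are assumptions made for illustration; the code used in this study is linked in Appendix D and may differ in these details.

```python
from collections import Counter
from math import sqrt


def jaccard_index(keywords_a, keywords_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(keywords_a), set(keywords_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty lists are treated as dissimilar
    return len(set_a & set_b) / len(union)


def cosine_similarity(keywords_a, keywords_b):
    """Cosine of the angle between the keyword-count vectors of the two lists."""
    counts_a, counts_b = Counter(keywords_a), Counter(keywords_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy example with hypothetical keywords (not taken from the study's data)
list_a = ["pain", "fever", "cough", "cough"]
list_b = ["pain", "rash", "cough"]
print(jaccard_index(list_a, list_b))
print(cosine_similarity(list_a, list_b))
```

Both functions return a value between 0 and 1, and a list compared with itself gives 1, which corresponds to the diagonal of the tables.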

4.1.2 Triage Advice

The triage advice given by the different symptom checkers for the same symptoms was classified into categories to make it easier to analyze and compare. The manual categorization uses five categories: ‘emergency’, ‘urgent’, ‘slightly urgent’, ‘call a nurse immediately’ and ‘self-care’. Emergency means that the triage result stated that the patient should immediately call an ambulance or go to an emergency room. Urgent means that the patient should see a doctor within 24 hours. Slightly urgent means that the patient should see a doctor either within a couple of days or if the symptoms get worse. Call a nurse immediately means that the advice given was to call a nurse immediately. Self-care means that the triage advice stated that the patient can take care of their symptoms at home.

Table 4.3: The triage advice given by the different symptom checkers for the different illnesses

Categories: Emergency | Urgent (within 24 hrs) | Slightly Urgent (couple of days) | Call a Nurse Immediately | Self-Care

Symptomate: Migraine, IBS, Celiac Disease, Tonsillitis, Pneumonia

Mayo Clinic: IBS, Migraine | Celiac Disease, Tonsillitis, Pneumonia

Isabel: Pneumonia | Migraine, Celiac Disease, Tonsillitis, IBS

Australia H.D.: IBS, Pneumonia | Celiac Disease | Migraine | Tonsillitis

The table shows that Symptomate advised seeing a doctor within 24 hours for all diseases. Mayo Clinic usually advised that the patient should see a doctor if certain symptoms appeared and once gave the advice to go to the emergency room if the patient had certain symptoms. Isabel usually stated that the patient should go to a family physician or similar and only once stated that the patient should go to an emergency room. The Australian Health Direct was the only symptom checker that gave self-care as advice and was also the checker with the most diverse advice.

4.1.3 Accuracy of Diagnosis

The Australian Health Direct is a triage symptom checker and was therefore not used for this part of the test. The accuracy was measured by looking at how high the sought diagnosis was placed in the list of the suggested diseases. For example if a symptom checker had put the correct diagnosis in the first position then it would receive a 1 for this disease. The numbers in the boxes in Table 4.4 show the placement of the disease in the result list for the different symptom checkers that were tested.

Table 4.4: The position of the sought disease for the different symptom checkers

              Migraine  IBS  Celiac Disease  Tonsillitis  Pneumonia
Symptomate    1         1    1               2            1
Mayo Clinic   1         2    1               2            2
Isabel        1         1    1               1            5

The table shows that no symptom checker placed the correct diagnosis lower than fifth place. Symptomate had the correct diagnosis as the top choice in four of the five cases and placed it second in the remaining case. Isabel Symptom Checker also had the correct diagnosis as the top choice in four of the five cases, but placed it fifth in the remaining case. Mayo Clinic had the correct diagnosis as the top choice for two of the diseases and ranked it second for the others. The diseases that all of the symptom checkers correctly suggested as the top choice were celiac disease and migraine.
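The summary above can be reproduced from Table 4.4 with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the helper name `summarize` are assumptions, while the positions are copied from the table.

```python
# Positions of the sought diagnosis, copied from Table 4.4 (1 = listed first)
positions = {
    "Symptomate": {"Migraine": 1, "IBS": 1, "Celiac Disease": 1, "Tonsillitis": 2, "Pneumonia": 1},
    "Mayo Clinic": {"Migraine": 1, "IBS": 2, "Celiac Disease": 1, "Tonsillitis": 2, "Pneumonia": 2},
    "Isabel": {"Migraine": 1, "IBS": 1, "Celiac Disease": 1, "Tonsillitis": 1, "Pneumonia": 5},
}


def summarize(ranks):
    """Return (number of top-ranked cases, worst position) for one checker."""
    values = list(ranks.values())
    return sum(1 for r in values if r == 1), max(values)


summary = {checker: summarize(ranks) for checker, ranks in positions.items()}

# Diseases that every checker listed first
diseases = list(positions["Symptomate"])
unanimous = [d for d in diseases if all(positions[c][d] == 1 for c in positions)]

for checker, (top_hits, worst) in summary.items():
    print(f"{checker}: top choice in {top_hits}/5 cases, worst position {worst}")
print("Listed first by all checkers:", unanimous)
```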


4.2 Covid-19 Symptom Checkers

4.2.1 Similarity of Questions

The keywords of the different arrays of questions that were obtained are the following:

Kry (length = 57)

['19', '2', 'apply', 'asthma', 'better', 'breathing', 'bronchitis', 'cancer', 'cardiologist', 'changed', 'chemotherapy', 'chronic', 'cirrhosis', 'contact', 'cortisone', 'cough', 'covid', 'diagnosed', 'dialysis', 'difficulty', 'disease', 'dry', 'failure', 'failure', 'failure', 'feel', 'feeling', 'fever', 'feverish', 'followed', 'following', 'following', 'health', 'health', 'heart', 'inflammatory', 'issues', 'kidney', 'long', 'none', 'outside', 'past', 'past', 'recently', 'respiratory', 'select', 'someone', 'sweden', 'symptoms', 'term', 'therapy', 'travelled', 'treatment', 'two', 'weeks', 'weeks', 'worse']

SLL (length = 66)

['19', '2', 'apply', 'asthma', 'better', 'breathing', 'bronchitis', 'cancer', 'cardiologist', 'changed', 'chemotherapy', 'chronic', 'cirrhosis', 'contact', 'cortisone', 'cough', 'covid', 'diagnosed', 'dialysis', 'difficulty', 'disease', 'dry', 'failure', 'failure', 'failure', 'feel', 'feeling', 'fever', 'feverish', 'followed', 'following', 'following', 'health', 'health', 'heart', 'inflammatory', 'issues', 'kidney', 'long', 'none', 'outside', 'past', 'past', 'recently', 'respiratory', 'select', 'someone', 'sweden', 'symptoms', 'term', 'therapy', 'travelled', 'treatment', 'two', 'weeks', 'weeks', 'worse']

NHS (length = 69)

['19', '5', '70', 'activities', 'age', 'asthma', 'blood', 'bone', 'breathe', 'breathless', 'cancer', 'cancer', 'certain', 'concerned', 'condition', 'condition', 'condition', 'continuous', 'copd', 'coronavirus', 'cough', 'covid', 'cystic', 'daily', 'developed', 'diabetes', 'example', 'fibrosis', 'get', 'heart', 'high', 'ill', 'illness', 'immune', 'increased', 'infections', 'leukaemia', 'likely', 'lung', 'makes', 'marrow', 'means', 'medicine', 'much', 'new', 'organ', 'person', 'pregnant', 'risk', 'serious', 'severe', 'severe', 'severe', 'speak', 'stopped', 'struggling', 'symptoms', 'system', 'taking', 'temperature', 'transplant', 'treatment', 'types', 'unable', 'usual', 'weakens', 'words', 'years', 'years']

maladiecoronavirus (length = 119)

['24', '24', '3', '48', 'aches', 'age', 'allows', 'azathioprine', 'blood', 'body', 'breath', 'calculate', 'cancer', 'cardiology', 'chest', 'chronic', 'chronic', 'code', 'complications', 'corticosteroids', 'cough', 'cough', 'cyclophosphamide', 'cyclosporine', 'days', 'days', 'decrease', 'decrease', 'defense', 'diabetic', 'dialyzed', 'diarrhea', 'disease', 'disease', 'disease', 'disease', 'drink', 'effort', 'epidemiological', 'examples', 'factor', 'factor', 'failure', 'feed', 'followed', 'heart', 'high', 'highest', 'hours', 'hours', 'hours', 'immune', 'immunosuppressive', 'include', 'increase', 'index', 'infection', 'infections', 'information', 'known', 'last', 'last', 'least', 'little', 'liver', 'loose', 'loss', 'make', 'mass', 'methotrexate', 'monitoring', 'muscle', 'normal', 'noticed', 'noticed', 'pain', 'past', 'perform', 'pregnant', 'pressure', 'quel', 'recent', 'reduces', 'referred', 'renal', 'respiratory', 'risk', 'risk', 'sharp', 'shortness', 'since', 'size', 'smell', 'sore', 'speak', 'specific', 'stools', 'system', 'tacrolimus', 'take', 'take', 'taste', 'temperature', 'therapy', 'three', 'throat', 'tiredness', 'treatment', 'treatment', 'unable', 'unbalanced', 'unusual', 'unusual', 'unusual', 'us', 'vascular', 'weight', 'years', 'zip']

The Jaccard Index and the Cosine Similarity show a similar trend for the covid-19 checkers’ question comparisons, as shown in Table 4.5 and Table 4.6. Both give the highest score to the SLL & Kry pair. The lowest Cosine Similarity is given to the NHS & Kry pair (0.1174), while the lowest Jaccard Index is given to the Kry & maladiecoronavirus pair (0.0667), narrowly below the NHS & Kry pair (0.0678).

Table 4.5: The Jaccard Index of the different combinations of symptom checker question keywords

                     Kry     SLL     NHS     maladiecoronavirus
Kry                  1       0.1389  0.0678  0.0667
SLL                  0.1389  1       0.0888  0.0819
NHS                  0.0678  0.0888  1       0.093
maladiecoronavirus   0.0667  0.0819  0.093   1

Table 4.6: The Cosine Similarity of the different combinations of symptom checker question keywords

                     Kry     SLL     NHS     maladiecoronavirus
Kry                  1       0.2842  0.1174  0.1866
SLL                  0.2842  1       0.1606  0.2299
NHS                  0.1174  0.1606  1       0.1789
maladiecoronavirus   0.1866  0.2299  0.1789  1

4.2.2 Triage Advice

The answers were classified into three categories: “Self-care”, “Call to get more information” and “Call the ambulance”.

Table 4.7: Classification of the answers given by the different symptom checkers

                     Self-care       Call to get more information  Call the ambulance
Kry                  Case 1          Case 2, Case 4                Case 3
SLL                  Case 1, Case 2  Case 3, Case 4
NHS                  Case 1          Case 2, Case 4                Case 3
maladiecoronavirus   Case 1          Case 2, Case 4                Case 3

Reminder of the cases:

Case 1: Person with no symptoms

Case 2: Person who shows some symptoms

Case 3: Person who shows a lot of symptoms

Case 4: Person who shows some symptoms (the same as Case 2), has risk factors and has met a person who has coronavirus or travelled abroad

The results show that the advice is the same for all checkers in all cases, except for SLL’s symptom checker, which appears to give less “alarming” advice than the others. In case 2, where a person shows some symptoms, SLL encourages self-care, while the other checkers advise calling a professional to get more information. In case 3, which represents a person with a lot of symptoms including problems with breathing, SLL advises calling to get more information instead of immediately calling an ambulance as the other checkers do.
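The comparison above can also be expressed as a small Python sketch that looks for cases in which the checkers do not all give the same category. The dictionary layout and the function name `disagreements` are assumptions made for illustration; the categories are copied from Table 4.7.

```python
SELF_CARE = "Self-care"
CALL_INFO = "Call to get more information"
AMBULANCE = "Call the ambulance"

# Triage category per case, copied from Table 4.7
advice = {
    "Kry": {1: SELF_CARE, 2: CALL_INFO, 3: AMBULANCE, 4: CALL_INFO},
    "SLL": {1: SELF_CARE, 2: SELF_CARE, 3: CALL_INFO, 4: CALL_INFO},
    "NHS": {1: SELF_CARE, 2: CALL_INFO, 3: AMBULANCE, 4: CALL_INFO},
    "maladiecoronavirus": {1: SELF_CARE, 2: CALL_INFO, 3: AMBULANCE, 4: CALL_INFO},
}


def disagreements(advice_by_checker):
    """Map each case with non-unanimous advice to the category given by each checker."""
    cases = sorted(next(iter(advice_by_checker.values())))
    result = {}
    for case in cases:
        given = {checker: answers[case] for checker, answers in advice_by_checker.items()}
        if len(set(given.values())) > 1:
            result[case] = given
    return result


differing = disagreements(advice)
for case, given in differing.items():
    print(f"Case {case}: {given}")
```

Only cases 2 and 3 are reported, and in both of them SLL is the checker that deviates.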


5 Discussion

5.1 Analysis of results

In this part, the results found are analyzed and discussed with the goal of answering the research question which asked if online symptom checkers are reliable.

5.1.1 General Symptom Checkers

Analysis of Questions

Looking at Table 4.1 and Table 4.2, the lowest Jaccard Index and Cosine Similarity values were yielded when the questions from Mayo Clinic were compared with the questions from the other symptom checkers. This is probably related to the fact that Mayo Clinic was the symptom checker with the fewest keywords (12 in total), because it asked the fewest questions. Both the Cosine Similarity and the Jaccard Index divide a measure of the overlap between two keyword lists by a measure of their combined size, so two lists with few words in common and a large size difference give a small numerator and a large denominator, and therefore a small value. The fact that Mayo Clinic’s symptom checker asked few questions in comparison with the other symptom checkers is interesting. How could this symptom checker be so sure of suggesting the correct diagnosis with so few questions? Could this affect the reliability? Judging by the diagnosis results, it does not seem to. This is discussed further in the analysis of the diagnosis results.

The lowest value for both the Jaccard Index and the Cosine Similarity was obtained for Mayo Clinic (12 keywords) and Isabel (43 keywords). The two checkers had only one word in common, “pain”. Of the three pairs involving Mayo Clinic, this is the one with the smallest difference in the number of keywords. This suggests that the low value is not only a matter of list size, but is related to the fact that the symptom checkers either asked different questions or used different words when asking similar questions.
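The value reported for this pair can be checked with a short calculation. The sketch below assumes that the index is computed as the number of shared keywords divided by the combined keyword count minus the shared keywords; this is an assumption about the implementation, although it reproduces the 0.0185 in Table 4.1 from the counts given above (12 and 43 keywords, one in common). The helper name is hypothetical.

```python
def jaccard_from_counts(count_a, count_b, shared):
    """Jaccard index from two keyword counts and the number of shared keywords."""
    return shared / (count_a + count_b - shared)


# Mayo Clinic (12 keywords) and Isabel (43 keywords) with one shared keyword
mayo_isabel = jaccard_from_counts(12, 43, 1)
print(round(mayo_isabel, 4))  # 0.0185

# Hypothetical comparison: the same single shared keyword against a list of 108
larger_list = jaccard_from_counts(12, 108, 1)
print(round(larger_list, 4))  # 0.0084
```

The second call illustrates the size effect described earlier: with the overlap held fixed, a longer list only increases the denominator and so lowers the index.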

The highest value for both the Jaccard Index and the Cosine Similarity was obtained for Symptomate (55 keywords) and Isabel (43 keywords). These two were the symptom checkers with the most similar number of keywords. They also had several words in common, such as “pregnant”, “age”, “symptoms” and “long”. The combination of a similar number of keywords, the highest values for the Jaccard Index and the Cosine Similarity, and several keywords in common suggests that these questions were indeed similar. However, the values were still rather low, 0.1011 for the Jaccard Index and 0.3113 for the Cosine Similarity, on a scale of 0 to 1. This suggests that even though these symptom checkers had the most similar questions, they were still quite different.

In conclusion, judging only by the comparison between the questions from the different symptom checkers, the questions did not seem to be very similar. This was the case even when taking factors such as different numbers of keywords into account. The symptom checkers Isabel and Symptomate were the checkers with the most similar questions.

Triage Advice

The triage advice of the different symptom checkers varied.

One result that stood out was that the Symptomate symptom checker always recommended seeing a doctor within 24 hours. A possible reason is that the organisation publishing the symptom checker would rather be safe than sorry, in order to avoid legal problems in the future.

The Mayo Clinic’s triage advice never told the user outright to seek a doctor immediately. The advice was always conditional on certain specific symptoms: the symptom checker would say that if the user has or develops any of these symptoms, they should see a physician or go to the nearest emergency department. This made the advice of this symptom checker the hardest to categorize. It was eventually decided to classify the advice as ‘urgent (within 24 hours)’ if it consisted of seeking emergency help should certain symptoms appear, and as ‘slightly urgent’ if it consisted of consulting a doctor should certain symptoms appear. It was not possible to use the category ‘emergency’ since the website never explicitly said to seek emergency care immediately. There could be many reasons why this website only told the user to seek professional help in case of certain symptoms. One possible reason is that the website did not want the user to seek professional help for symptoms that could be handled with self-care.

The Isabel symptom checker only suggested where the user could seek help if they wanted to. It never explicitly said that the user should seek help, but if the site suggested, for example, emergency services, then this was categorized as emergency triage advice.

The Australian Healthdirect site gave the most diverse triage advice. It was also the only symptom checker that gave the recommendation of self-care. This symptom checker is produced by the Australian government, and its purpose is to improve the health literacy amongst patients. The reason behind the diverse advice could therefore be to make sure that patients are aware of their symptoms and how they can move forward.

The big differences in how these sites formulated their triage advice made them hard to compare with each other. Still, Symptomate was the only checker to always suggest seeing a doctor within 24 hours, and the Australian Healthdirect was the only checker to recommend self-care. Overall, most of the sites appear to prefer being safe rather than sorry, recommending that users see a doctor or call a nurse rather than take care of themselves without any consultation.

Diagnosis Results

With one exception, all of the chosen diseases were ranked first or second in the list of potential diseases by all symptom checkers. The exception was that Isabel ranked pneumonia in fifth place. If reliability were determined only from these results, the symptom checkers would be deemed very reliable.

However, these good results could be due to the fact that the inputted symptoms are common for the diseases picked, which could have made the diseases easy for the symptom checkers to diagnose. Symptomate did best when it came to suggesting the correct diagnosis, and this could be because it asked the most questions. Also, how the user answered a certain question affected the next questions. This suggests that some kind of technology, for example artificial intelligence, was used to adapt the questions and make it easier for the checker to give the correct diagnosis. Another possibility is that the illnesses that were chosen as input were easy for the symptom checkers to find since they were quite common (except celiac disease). If the chosen illnesses had been rarer, the checkers might have ranked them lower, since one factor contributing to the ranking could be how common an illness is.

The results outputted by the different symptom checkers were quite similar. This is interesting considering that the questions were quite different according to the Jaccard Index and Cosine Similarity values. It could mean that there are several ways in which a symptom checker can be constructed and still make good guesses about the diseases. The symptom checkers were also able to rank similar diseases separately, even when both appeared in the list of suggested diseases. For example, in the result list for Mayo Clinic when putting in symptoms for IBS, IBS placed second and celiac disease placed third, so the symptom checker was just barely able to distinguish them.

5.1.2 Covid-19 Symptom Checkers

Analysis of Questions

Table 4.5 and Table 4.6 show that the Jaccard Index and the Cosine Similarity of the questions asked by the different symptom checkers seem to show a similar trend. They also show that the questions seem to be quite different from one symptom checker to another. This is surprising, as one would expect symptom checkers that are centered on the same illness to ask similar questions. The two symptom checkers which appear to be most similar are the one published by Kry, which is a Swedish healthcare app, and the one published by the county of Stockholm (SLL). Their similarity could be explained by the fact that these two checkers were the only ones which had the same country of origin.

To further understand the values given by the Jaccard Index and the Cosine Similarity, one can look more closely at the lists of questions and keywords. Firstly, it is possible that some questions or keywords have the same meaning but are written differently. For example, the lists of keywords contain both “coronavirus” and “covid”, which refer to the same thing but will not be recognised by the indexes as being the same word. Another example is that the NHS asks the question “Do you have any of the following health issues? Select all that apply. [...] Heart failure under treatment and followed by a cardiologist [...]”, and maladiecoronavirus asks two questions which are translated from French to English as “Do you have a heart or vascular disease” and “Do you take a treatment referred to cardiology”. These questions all aim to assess whether the person has health problems related to the heart, but only a few of their keywords match: related words such as “cardiologist” and “cardiology” are counted as different.
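The effect of such spelling differences can be illustrated with a small Python sketch. The synonym table and the two keyword sets below are hypothetical and chosen only for illustration; no such normalization was applied in this study.

```python
# Hypothetical normalization table mapping variants to one common form
SYNONYMS = {"coronavirus": "covid", "cardiologist": "cardiology"}


def normalize(keywords):
    """Replace each keyword by its common form if one is listed."""
    return {SYNONYMS.get(word, word) for word in keywords}


def jaccard(set_a, set_b):
    """Set-based Jaccard index."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Illustrative keyword sets (not the study's data)
first = {"covid", "cough", "cardiologist"}
second = {"coronavirus", "cough", "cardiology"}

before = jaccard(first, second)
after = jaccard(normalize(first), normalize(second))
print(before, after)
```

In this toy case only “cough” matches before normalization, while all three keywords match afterwards, so the index rises from 0.2 to 1.0.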

Secondly, the list of keywords of the questions given by the French website is much longer than the other lists (119, while the others have an average length of 64). This is because the website used 21 or 22 questions (depending on the given answers), while the other websites used on average 7 questions. This may mean that the website asks “extra questions” in addition to the “basic questions” that are probably asked by all websites. These extra questions lower the indexes, as they increase the denominator of the fraction without increasing the numerator as much.

It is also interesting to note that the indexes rate the covid-19 questions as more alike, on average, than the questions asked by the general symptom checkers. This can probably be linked to the fact that this type of checker concentrates on the specific symptoms which appear in covid-19, while the other checkers need to be more general.
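The averages behind this observation can be recomputed from the tables. The sketch below uses the six off-diagonal values of each of Tables 4.1, 4.2, 4.5 and 4.6; the variable names are assumptions made for illustration.

```python
from statistics import mean

# Pairwise values copied from the upper triangles of the tables
general_jaccard = [0.0308, 0.1011, 0.0724, 0.0185, 0.0345, 0.0786]  # Table 4.1
general_cosine = [0.1035, 0.3113, 0.2053, 0.0358, 0.1127, 0.31]     # Table 4.2
covid_jaccard = [0.1389, 0.0678, 0.0667, 0.0888, 0.0819, 0.093]     # Table 4.5
covid_cosine = [0.2842, 0.1174, 0.1866, 0.1606, 0.2299, 0.1789]     # Table 4.6

averages = {
    "general Jaccard": mean(general_jaccard),
    "covid-19 Jaccard": mean(covid_jaccard),
    "general cosine": mean(general_cosine),
    "covid-19 cosine": mean(covid_cosine),
}

for name, value in averages.items():
    print(f"{name}: {value:.3f}")
```

For both measures the covid-19 average is the higher one, although the gap is clearer for the Jaccard Index than for the Cosine Similarity.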

Analysis of Triage Advice

As shown in section 4.2.2 Triage Advice, the answers given by the different symptom checkers were all the same, except the ones given by the symptom checker published by SLL. The answers given by the SLL checker tended to be less alarming than the others. A further observation is that the questions asked by SLL showed a relatively high similarity with those of the other symptom checkers, compared to the other pairs. This suggests that the difference in answers cannot be explained by the questions being asked differently. One explanation could be the different policies of the countries, as Sweden has taken a liberal approach to countering the coronavirus epidemic, which is very different from the approaches of the other countries in the EU. It could also be explained by the fact that the 1177 health information telephone line is trusted with deciding whether the person should call an ambulance or not.

5.2 Analysis of the Method

The aim of this section is to discuss the reliability of the method used, and thereby how reliable the results are.

A problem that arises when analyzing symptom checkers by comparing their questions is that it is hard to know which symptom checker is the best based on that information. Even if all checkers used exactly the same questions, it would not necessarily mean that they are reliable, as it could just as well mean that all of the checkers are unreliable in the same way. This report found the questions to be different, but it is not possible to know which of the checkers are reliable and which are not based on this information. Also, the Jaccard Index and Cosine Similarity showed broadly the same trends, which means that using only one of them would probably be sufficient.
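One way to quantify how closely the two measures agree is a rank correlation over the six checker pairs in each group. The sketch below uses Spearman's coefficient, which is a choice made here for illustration and not part of the method of this study; the helper names are hypothetical and the values are copied from Tables 4.1, 4.2, 4.5 and 4.6.

```python
def ranks(values):
    """Rank positions (1 = largest value); assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equally long lists without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Same pair order in both lists of each group (upper triangles of the tables)
general_jaccard = [0.0308, 0.1011, 0.0724, 0.0185, 0.0345, 0.0786]
general_cosine = [0.1035, 0.3113, 0.2053, 0.0358, 0.1127, 0.31]
covid_jaccard = [0.1389, 0.0678, 0.0667, 0.0888, 0.0819, 0.093]
covid_cosine = [0.2842, 0.1174, 0.1866, 0.1606, 0.2299, 0.1789]

rho_general = spearman(general_jaccard, general_cosine)
rho_covid = spearman(covid_jaccard, covid_cosine)
print(rho_general, rho_covid)
```

A coefficient of 1 means that both measures order the pairs identically, which is the case for the general symptom checkers. For the covid-19 checkers the coefficient is lower, since the two measures agree on the most similar pair but order the remaining pairs differently.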

The same problem which was outlined above for the questions applies to the triage results. A solution would be to let medical experts analyze the results. They could then give their opinion on how relevant they find the different questions, and on whether they find the triage advice reasonable.

Another problem that was encountered was the translation of the questions from the Swedish and the French symptom checkers to English. The translator used was Google Translate, and the translations contained some mistakes. However, the sentences were understandable and the keywords seemed to be correctly translated.

In addition to this, the data used is not sufficient to make the study statistically significant. To make a broad assessment of whether symptom checkers are generally reliable, a large number of them would have to be included in the study. It would also be necessary to test a much larger number of illnesses, including both common and uncommon ones.

5.3 Comparison with Past Research

The aim of this section is to compare the results and method in this report with results and methods from other research papers.

There are quite a few differences between this study and previous work in the field. The biggest one is that most (if not all) of the past research focuses on the results of the symptom checkers. No other study could be found that analyzes the questions asked by different symptom checkers. In that sense, this study explores a new way of assessing the reliability of symptom checkers.

Compared with research that has investigated the accuracy of symptom checkers in general, there is a big difference in the amount of data collected. The study by Semigran et al. investigated a total of 23 different symptom checkers and 45 different diseases. When determining the accuracy of several symptom checkers for several different diseases, the amount of data collected is usually much larger than in this study. There are some studies with around the same amount of data, but these usually have a different focus: they either focus on only one condition, or they have a different purpose than investigating accuracy or reliability.

Another difference in method is how the symptoms are entered into the different symptom checkers and how it is determined whether they have chosen the right diagnosis. Semigran et al. used clinical vignettes, which are cases that can be used by physicians to study symptoms. They are not necessarily based on real patients, but most vignettes are written by medical professionals and come with a correct diagnosis. Other studies have asked real patients to enter their symptoms into different symptom checkers and compared the result with the diagnosis of a doctor. Some studies have also used real cases without asking patients to enter their symptoms.

It was easier to draw conclusions from the results of the previous studies than from the results of this study. This is largely because those studies had more data to draw conclusions from. Also, most of the previous studies either tested the accuracy of the symptom checkers or compared the results from symptom checkers with the diagnosis of a doctor. It is easier to reach a decisive conclusion when conducting studies this way than by comparing different symptom checkers.

5.4 Future Research

This section suggests possible future research which could be made in this field, and how it could be related to this study.

It would be interesting in the future to use this technique to analyze more symptom checkers, with more illnesses, as the statistical significance is greater with more data to analyze. It might also make it easier to discern trends.

If the same or a similar method is used, the process of filling in the data on the websites could be automated.

It would probably be useful to also have a healthcare expert evaluate the results, since they could likely explain and interpret them better. It would probably also be useful to conduct the study with inputs which come from real cases, as the symptom checker inputs would then be related to the output illnesses in a more realistic way.

It could also be interesting to replicate the study in a couple of years to see if the results change.

It could also be interesting to see if other methods can be used for comparing the questions of the symptom checkers. Perhaps a machine learning model could be designed to find patterns in the questions and connect them to the results of the different checkers.

6 Conclusion

Firstly, for the general symptom checkers, this study’s results showed that they seem to be reliable at suggesting the correct diagnosis. However, to be statistically significant, the same study should be conducted on more symptom checkers and with additional illnesses. The illnesses should also include more uncommon ones than those tested in this study.

Secondly, the triage advice was found to be potentially too alarming for most of the general symptom checkers, and it differed considerably between them. On the other hand, three of the covid-19 symptom checkers had the exact same triage recommendations. They seemed to be less alarming than the general symptom checkers. This could be due to the fact that governments do not want sick people to go to the hospital and infect other people when their condition is not life-threatening. Lastly, the string comparison of the questions asked by the symptom checkers showed that these questions seemed to be very different from each other. This is interesting, since it could mean that despite asking different questions the checkers can still come to the same conclusion. This difference could also mean that some symptom checkers are in fact more reliable than others. However, because of the sources of error named in the discussion section, the conclusion that the questions are very different cannot be considered definite.

In conclusion, according to the results of this study, symptom checkers can be quite reliable. However, for this to be statistically significant, additional studies are needed to address the weaknesses of this study.

7 References

This section lists the references used in the report in order of appearance.

[1] Roser, Max, Ritchie, Hannah and Ortiz-Ospina, Esteban, Internet, 2020, retrieved from https://ourworldindata.org/internet

[2] Jutel, Annemarie, A Discursive Systematic Review of the Medical Literature, 2010, The Journal of Participatory Medicine, https://participatorymedicine.org/journal/evidence/research/2010/09/15/self-diagnosis-a-discursive-systematic-review-of-the-medical-literature/

[3] PushDoctor.co.uk, Digital Healthcare Review, 2016, https://www.pushdoctor.co.uk/digital-health-report

[4] White Ryen W. & Horvitz Eric, Cyberchondria: Studies of the Escalation of Medical Concerns in Web Search, Microsoft Research, 2008-11, https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/TR-2008-178.pdf

[5] Semigran Hannah L, Linder Jeffrey A, Gidengil Courtney & Mehrotra Ateev, Evaluation of symptom checkers for self diagnosis and triage: audit study, the BMJ, 2015-07-08, https://www.bmj.com/content/351/bmj.h3480.full

[6] Cordenius Maud, Schultz Susanna, Covid-19 – coronavirus, 1177 Vårdguiden, 2020-04-13, https://www.1177.se/Stockholm/sjukdomar--besvar/infektioner/ovanliga-infektioner/covid-19-coronavirus/

[7] Mayo Clinic Staff, Tonsillitis, Mayo Clinic, 2018-12-13, https://www.mayoclinic.org/diseases-conditions/tonsillitis/symptoms-causes/syc-20378479

[8] Institute for Quality and Efficiency in Health Care, Tonsillitis: Overview, National Center for Biotechnology Information, 2019-01-17, https://www.ncbi.nlm.nih.gov/books/NBK401249/

[9] National Health Service Staff, Overview Pneumonia, National Health Service, 2019-06-30, https://www.nhs.uk/conditions/pneumonia/

[10] Newman, Lawrence C, Migraine Headaches, WebMd, 2018-05-13, https://www.webmd.com/migraines-headaches/migraines-headaches-migraines#1

[11] NHS staff, Irritable Bowel Syndrome (IBS), National Health Service, UK, 2017-10-09, https://www.nhs.uk/conditions/irritable-bowel-syndrome-ibs/

[12] Michigan Medicine Staff, Irritable Bowel Syndrome (IBS) and Functional Bowel Disorders, University of Michigan, No date, https://www.uofmhealth.org/conditions-treatments/digestive-and-liver-health/irritable-bowel-syndrome-ibs-and-functional-bowel-disorders

[13] NHS staff, Coeliac disease, National Health Service UK, 2019-12-03, https://www.nhs.uk/conditions/coeliac-disease/

[14] Celiac Disease Foundation Staff, Symptoms of Celiac Disease, Celiac Disease Foundation, No date, https://celiac.org/about-celiac-disease/symptoms-of-celiac-disease/

[15] Rose Stuart, Engel Dave, Cramer Nick & Cowley Wendy, Automatic Keyword Extraction from Individual Documents, 2010, 10.1002/9780470689646.ch1

[16] NLTK project, Natural Language Toolkit, 2020, https://www.nltk.org/

[17] Sharma, Vishwas B, rake-nltk 1.0.4, https://pypi.org/project/rake-nltk/

[18] Real Raimundo & Vargas Juan M, The probabilistic basis of Jaccard’s index of similarity, 1996, https://www.jstor.org/stable/2413572?casa_token=yzs2_IM1vzMAAAAA:NJQMez686qKdsLgubCELuox0iPn_BHxd4bQsLvvnfwz0uC1FGloiQSvmU7NJaxMNRmC0uZE0WTR9WQeAjnyDXoQIAmbMvXeZsCa8MyHlSmia9F1m9Ww&seq=2#metadata_info_tab_contents

[19] Jiawei Han, Micheline Kamber, Jian Pei, Data Mining (Third edition), 2012, Chapter 2.4.7 Cosine Similarity, pages 77-78

[20] Powley Lucy, McIlroy Graham, Simons Gwenda & Raza Karim, Are online symptoms checkers useful for patients with inflammatory arthritis?, BMC, 2016-08-24, https://bmcmusculoskeletdisord.biomedcentral.com/articles/10.1186/s12891-016-1189-2

[21] Middleton, Katherine, Butt, Mobasher, Hammerla, Nils, Hamblin, Steven, Mehta, Karan & Parsa, Ali, Sorting out symptoms: design and evaluation of the ‘babylon check’ automated triage system, 2016-06-7, https://arxiv.org/pdf/1606.02041.pdf

[22] Fraser Hamish, Coiera Enrico & Wong David, Safety of patient-facing digital symptom checkers, The Lancet, 2018-11 24–30, Pages 2263-2264, https://www-sciencedirect-com.focus.lib.kth.se/science/article/pii/S0140673618328198#bib6

[23] Chambers Duncan, Cantrell Anna J, Johnson Maxine, Preston Louise, Baxter Susan K, Booth Andrew & Turner Janette, Digital and online symptom checkers and health assessment/triage services for urgent health problems: systematic review, BMJ Open, 2019-08-1, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6688675/

[24] Samuelsson, Charlotte, Varannan svensk har övervikt eller fetma, Statistiska Centralbyrån, 2018-10-09, https://www.scb.se/hitta-statistik/artiklar/2018/varannan-svensk-har-overvikt-eller-fetma/

[25] Statistiska Centralbyrån, Kommuner med högst medelålder, 31 december 2019 jämfört med 31 december 2018, 2020-03-19, Statistiska Centralbyrån, https://www.scb.se/hitta-statistik/statistik-efter-amne/befolkning/befolkningens-sammansattning/befolkningsstatistik/pong/tabell-och-diagram/topplistor-kommuner/kommuner-med-hogst-och-lagst-medelalder-31-december-2016-jamfort-med-31-december-2015/

Appendices

The appendices (except A.5) are attached as links as they are of a significant size.

A Symptom checker questions

https://docs.google.com/spreadsheets/d/1TGo9WJns9tro1FmXGruKpMVnHL3B-WvO8U83Q6MJN9s/edit?usp=sharing

B Answers inputted to symptom checker questions

https://docs.google.com/spreadsheets/d/1JRiQkCfsE2qdb-dBeLu0SssGf-nBsAC3NJ4Y0vFGnEs/edit?usp=sharing

C Results outputted by symptom checkers

https://docs.google.com/spreadsheets/d/1uCDtJWoH1TMatPkMEJmbOyd-caI874P0RquQyKGuiao/edit?usp=sharing

D Code for conversion of arrays of questions to Jaccard Index & Cosine Similarity

https://gits-15.sys.kth.se/despinoy/kex
