DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 300 CREDITS
STOCKHOLM, SWEDEN 2015
Clinical Assessment for Deep Vein Thrombosis using Support Vector Machines
A DESCRIPTION OF A CLINICAL ASSESSMENT AND COMPRESSION ULTRASONOGRAPHY JOURNALING SYSTEM FOR DEEP VEIN THROMBOSIS USING SUPPORT VECTOR MACHINES
DANIEL ÖBERG
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION
D. ÖBERG
Master’s Thesis at NADA
Supervisor: J. Lagergren
Examiner: J. Lagergren
Abstract
This master’s thesis describes a journaling system for compression ultrasonography and a clinical assessment system for deep vein thrombosis (DVT). We evaluate Support Vector Machine (SVM) models with linear and radial basis function kernels for predicting deep vein thrombosis, and for facilitating the creation of new clinical DVT assessments.
Data from 159 patients were analysed. On our dataset, Wells Score with a high clinical probability has an accuracy of 58%, a sensitivity of 60% and a specificity of 57%; these figures should be compared to those of our base model: an accuracy of 81%, a sensitivity of 66% and a specificity of 84%, a 23 percentage point increase in accuracy. The diagnostic odds ratio went from 2.12 to 11.26. However, a larger dataset is required to report anything conclusive.
As our system is both a journaling and a prediction system, every patient examined improves the accuracy of the assessment.
Referat
Clinical assessment of deep vein thrombosis using SVMs

This report describes a journaling system and a system for clinical assessment of deep vein thrombosis. Our model is based on a Support Vector Machine with linear and radial basis function kernels, used to establish the presence of deep vein thromboses and to assist in the creation of new assessment models.

159 patient journals were used to establish that Wells Score has a clinical accuracy of 58%, a sensitivity of 60% and a specificity of 57%, which can be compared with our model, which has an accuracy of 81%, a sensitivity of 66% and a specificity of 84%: a 23 percentage point increase in accuracy. The diagnostic odds ratio went from 2.12 to 11.26. However, a larger dataset is needed to report anything conclusive.

As our system is used both for journaling and for clinical assessment, every examined patient will contribute to higher accuracy in the model.
Contents

1 Introduction
2 Medical Background
3 Technical Background
  3.1 Slack variables
  3.2 Different error costs
  3.3 Radial Basis Function
4 Implementation
  4.1 Environment
  4.2 Preprocessing
  4.3 Architecture
  4.4 Data format
5 Results
  5.1 Test set
  5.2 Baseline
  5.3 Benchmarking
  5.4 Results
6 Discussion
  6.1 Class Weight (Aka. How much is a life worth?)
  6.2 What is a question worth?
7 Conclusion
8 Future Work
  8.1 Regression
  8.2 Network Synchronisation & Security
  8.3 Statistical Power
  8.4 Non Negative Matrix Factorisation
9 Abbreviations
10 Appendix
Bibliography
Chapter 1
Introduction
Every year, 1.6 per 1000 inhabitants suffer from venous thrombosis, blood clots in their veins (Nordström et al. 1992). Venous thromboses formed in the deep veins are most often found in the legs. Multiple ways to assess deep venous thrombosis exist, but diagnostics for this dangerous condition range from the accurate but expensive contrast venography to the cheap but unreliable clinical assessment (Schumann and Ewigman 2007).
With this in mind, deep vein thrombosis can be assessed with high sensitivity using a D-dimer test; D-dimer is a small protein fragment present in the blood after a blood clot is degraded. A high concentration of D-dimer correlates with thrombosis, but false positive readings can occur from liver disease, high rheumatoid factor, inflammation, and many other factors (Kabrhel et al. 2010), which makes the specificity low.
Another way is the gold standard for DVT clinical assessment, Wells Score, which was introduced in the paper “Accuracy of clinical assessment of deep-vein thrombosis”. The assessment is performed with a simple scoring system and a yes-or-no questionnaire regarding the patient’s medical history, and was created by univariate and stepwise logistic regression analysis of 529 patients’ clinical data. Wells Score is a very quick assessment, as it only has 9 significant variables (Gao and Yang 2008).
A combination of Wells Score and a D-dimer test reliably excludes DVT in adults without the need for imaging studies (Ho and others 2010), such as compression ultrasonography (CUS), which can be both painful and expensive.
In the case that DVT cannot be reliably ruled out, a compression ultrasonography can be used to find a thrombus or the lack of one. In a compression ultrasonography the examiner starts compressing the veins (usually beginning with the femoral vein as far proximally as possible) and works distally towards the feet, using a probe containing transducers to send pulses of sound into the leg. When the sound hits a material with a different density, part of the sound wave is reflected back to the probe. The compression is done to check for a reduced coefficient of compressibility, as the clots block the veins from compressing normally. The examiner typically uses this to check for thrombosis every 2-5 cm, noting the absence or presence of occluded veins.
This process is often noted down either on paper or in a system which cannot intelligently be used for anything other than storage and reference. We hope that with our software both the assessment and the compression ultrasonography findings can help train a new kind of improved assessment.
Chapter 2
Medical Background
Although this is a thesis in the area of Computer Science, some knowledge of deep vein thrombosis is necessary. The most important thing to understand is what a deep vein thrombosis actually is.
The blood’s unique ability to coagulate, and thereby stop bleeding and start the healing of wounds, is of vital importance. Every day, small wounds are healed in the blood vessels. In this process more than 50 different substances work together (D. Bergqvist et al. 2002) to make sure that the blood does not coagulate in the vessels in any case other than when wounded. Some substances even have the role of dissolving coagulated blood.
This balance between the stimulating and the inhibitory substances is sensitive. Coagulated blood can produce blood clots, thromboses, which lead to degraded circulation; on the other hand, if the inhibitory substances outbalance the stimulating factors, the result is dangerous internal bleeding.
Blood clots usually get stuck in a vein in the calf of the leg, or near the femur in big veins where the blood flow is slower. Often one side of the clot is stuck to the vein wall while the other side is free and continuously built upon, sometimes measuring several decimetres long (D. Bergqvist et al. 2002).
If part of a clot breaks free and is allowed to travel to the lung, we call it a pulmonary embolism: a serious condition that 1000 patients a year die from in Sweden alone. These figures should be compared to the ~4000 who get diagnosed with pulmonary embolism or the ~8000 patients who get diagnosed with deep vein thrombosis.
If we instead focus on the US, one study found that the average yearly incidence of first lifetime venous thromboembolism between 1966 and 1990 was 117 per 100 000 people; in a similar study for Europe in general the rate is as high as 183 per 100 000 people (Antovic and Blombäck 2010).
The rate of occurrence might be even higher than that; one study concluded that, quote (Sandler and Martin 1989):

Pulmonary embolism was thought to be the cause of death in 239 of 2388 autopsies performed (10%): 15% of these patients were aged less than 60 years and 68% did not have cancer. Of these patients, 83% had deep-vein thrombosis (DVT) in the legs at autopsy, of whom only 19% had symptoms of DVT before death. Only 3% of patients who had DVT at autopsy had undergone an investigation for such before death.
The Swedish hospitals’ costs of venous thromboembolism alone were estimated at 0.375 billion SEK in 1999 (D. Bergqvist et al. 2002). Part of the high cost is due to the difficulty of confirming the diagnosis without either a thrombosonography, a D-dimer test or, in rare cases, intravenous venography; indeed, some forms of DVT remain clinically inapparent.
The decision to order a thrombosonography is, in several guidelines, based in large part entirely on a pretest risk assessment like Wells Score. The patients with low risk get a D-dimer blood test and only go on to ultrasonography if the test is positive. The ones with high risk go straight to ultrasonography without getting a D-dimer test.
Thrombosonography, the use of high-frequency sound to visualise body tissue, is preferred over other methods of proximal DVT assessment as it is a non-invasive procedure with very high sensitivity and specificity (96% and 98%, respectively) (Gaitini 2006). The main criterion for diagnosing DVT is to find a reduced coefficient of compressibility. This is done by compressing the vein under observation and checking for relatively low compressibility.
Large thrombi often become pronounced after a couple of days, but even a normal vein can produce an echo that looks similar to a thrombus.
As doing ultrasonographies for every patient would be far too expensive and time-consuming, a blood sample test, namely D-dimer, combined with a clinical assessment is preferable.
Clinical assessment was long considered unreliable. P.S. Wells and his colleagues challenged this dogma in 1995 with the publication of his now well-known paper “Accuracy of clinical assessment of deep-vein thrombosis”. The use of their clinical model is now standard practice in DVT diagnosis.
Table 2.1. Wells score (Bounameaux, Perrier, and Righini 2010)

Variable                                            Points
Cancer treatment during the past 6 months           +1
Lower leg paralysis or plastering                   +1
Bed rest > 3 days or surgery < 4 weeks              +1
Pain on palpation of deep veins                     +1
Swelling of entire leg                              +1
Diameter difference on affected calf > 3 cm         +1
Pitting oedema (affected side only)                 +1
Dilated superficial veins (affected side)           +1
Alternative diagnosis at least as probable as DVT   -2
A score is given by analysing the patient’s medical history, where each criterion increases the score by one, except if an alternative diagnosis is possible, which decreases the score by two.
The old variant of Wells Score divided the probability into three classes Low, Intermediate and High.
Table 2.2. Clinical probability for Wells score

Low            0 total
Intermediate   1-2 total
High           > 2 total
A more recent modification of the score only has two groups, namely likely or unlikely DVT (Le Gal, Carrier, and Rodger 2012).
Chapter 3
Technical Background
With machine learning algorithms we can figure out how to assess deep vein thrombosis by generalising from examples; as more data becomes available, the better the assessment becomes. To be more specific, we are looking for a binary classification algorithm to improve on the modified Wells Score.
In other words, we want to estimate a function $f : \mathbb{R}^N \to \{\pm 1\}$, where $\mathbb{R}^N$ represents the input space with $N$ features/dimensions.
We focus on generalisation when creating a classifier: a classifier that cannot generalise might still have a low training error, but this does not imply a low expected test error.
Given that we have training samples in the form $T = \{(x_1, y_1), \ldots, (x_\ell, y_\ell)\} \subseteq (X \times Y)^\ell$, where the output domain is $Y = \{-1, +1\}$ and the input space is $X \subseteq \mathbb{R}^N$:

$$f(x) = \begin{cases} y' & \text{if } \{y' \mid \exists (x', y') \in T \wedge x' = x\} \neq \emptyset \\ -1 & \text{otherwise} \end{cases} \quad (3.1)$$
The classifier shown in equation 3.1 is an example of a function that does not generalise and therefore does not learn. One should note that generalisation is a double-edged sword, and that finding the proper capacity is a research area in itself in machine learning.
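As an illustration only (not part of the thesis software), the memorizing classifier of equation 3.1 can be sketched in C++ as a table lookup; all names here are hypothetical:

```cpp
#include <map>
#include <vector>

// Sketch of the memorizing "classifier" in equation 3.1: it returns the
// stored label when the exact input vector appeared in the training set T,
// and -1 otherwise. Training error is zero, yet nothing is learned.
using Sample = std::vector<double>;

struct MemorizingClassifier {
    std::map<Sample, int> table;  // the training set T, keyed on input vectors

    void train(const Sample& x, int y) { table[x] = y; }

    int predict(const Sample& x) const {
        auto it = table.find(x);
        return it != table.end() ? it->second : -1;  // unseen input: default -1
    }
};
```

Any input not seen verbatim during training falls through to the $-1$ branch, which is exactly why such a classifier fails to generalise.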
Support Vector Machines were developed by Cortes and Vapnik (Cortes and Vapnik 1995) for classification. The basic gist is that classification is done by finding the maximal margin of separation between two classes (the optimal hyperplane, also known as the widest street), which can be seen as the generalisation, for linearly separable patterns.
With support vector classifiers the hyperplane can be described by the unknown, $u$, and the margin vector, $w$ (see figure 3.1):

$$w \cdot u \geq C \quad (3.2)$$
Figure 3.1. Optimal hyperplane
We are interested in which side the $u$ vector is on, so we project $u$ onto $w$, and if this is bigger than some constant $C$ then we say that $u$ is a positive sample.
By setting $b = -C$ we get our decision rule:

$$(w \cdot u) + b = 0, \quad w \in \mathbb{R}^N,\; u \in \mathbb{R}^N,\; b \in \mathbb{R} \quad (3.3)$$

which corresponds to the decision function:

$$f(x) = \operatorname{sign}((w \cdot u) + b) \quad (3.4)$$
The problem is that we know neither $w$ nor $b$. However, by adding additional constraints we can calculate them.
Taking a positive and a negative sample and arbitrarily requiring them to be greater than one and less than minus one respectively:

$$w \cdot x_+ + b \geq 1 \quad (3.5)$$

$$w \cdot x_- + b \leq -1 \quad (3.6)$$

Then, introducing a variable $y_i$ that is $+1$ for positive samples and $-1$ for negative samples, we can combine these into:

$$y_i (x_i \cdot w + b) \geq 1 \quad (3.7)$$
And with that we can add the extra constraint that

$$y_i (x_i \cdot w + b) - 1 = 0 \quad (3.8)$$

should hold where $x_i$ is on the margin.
Now, if we want the widest margin possible, we can take the difference of a negative and a positive sample on the margins and project it onto a unit normal:

$$\text{WIDTH} = (x_+ - x_-) \cdot \frac{w}{\|w\|} \quad (3.9)$$

which can be simplified further to

$$(x_+ - x_-) \cdot \frac{w}{\|w\|} = \frac{2}{\|w\|} \quad (3.10)$$
The way one maximises this in the support vector machines algorithm is with Lagrange multipliers, but to do that we need a constraint, which we happen to have already mentioned: $y_i (x_i \cdot w + b) - 1 = 0$. In this case it is easier to minimise $\frac{1}{2}\|w\|^2$ than to maximise $\frac{2}{\|w\|}$.
The Lagrange multiplier equation we try to optimise becomes:

$$L = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left( y_i (w \cdot x_i + b) - 1 \right) \quad (3.11)$$

which is something we can put into quadratic programming solvers to get the $w$, $b$ and $\alpha_i$ parameters.
The result is one unique solution, independent of whether we add non-support-vector points, as the Lagrange multipliers ($\alpha_i$) become zero for these points.
But there is a problem with this solution. If we try it against our dataset we get an accuracy of 20.75%! What is happening? The problem is that we do not have a cleanly linearly separable dataset. Using the dataset, we see that the solver has given up on finding a solution and is classifying all of the input vectors as +1 (existing deep vein thrombosis), which gives a sensitivity of 100% and a specificity of 0%.
3.1 Slack variables
What we need, then, is to relax the constraints to allow for a so-called soft margin by using slack variables, $\xi_i$:

$$y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad (3.12)$$
$$\underset{w,\, \xi,\, b}{\arg\min} \left\{ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \right\} \quad (3.13)$$
where $\sum_i \xi_i$ is an upper bound on the number of training errors and $C$ is a constant for assigning a higher penalty to errors.
With this we note a 26 percentage point increase in accuracy compared to our baseline (figure 3.2).
Figure 3.2. Linear SVM compared to Wells Score
Even though the accuracy is better, the sensitivity, which is arguably more important in our case (the thrombosonography is relatively cheap), has a 24 percentage point decrease.
3.2 Different error costs
As with most machine learning algorithms, we have the possibility to train the model in a way that counteracts an unbalanced dataset, and to tweak the sensitivity or specificity for a given purpose.
                          H0 is actually true   H0 is actually false
We conclude H0 is true    correct conclusion    Type II error
We conclude H0 is false   Type I error          correct conclusion

H0 is called the null hypothesis, and H1 is called the alternative hypothesis.
Or, put another way:

                          H0 is actually true    H0 is actually false
We conclude H0 is true    True Positives (TP)    False Positives (FP)
We conclude H0 is false   False Negatives (FN)   True Negatives (TN)
Higher sensitivity corresponds to an increased likelihood of the SVM recommending that the doctor go through with a sonography. Doctors might strive for 100% sensitivity, but this impacts the accuracy negatively. Higher sensitivity often leads to lower specificity, which, if it were 0%, would just make every doctor always recommend a sonography no matter what. This corresponds to zero Type I errors.
$$\text{classification accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Sensitivity: proportion of actual positives which are predicted positive

$$\text{sensitivity} = \frac{TP}{TP + FN}$$

Specificity: proportion of actual negatives which are predicted negative

$$\text{specificity} = \frac{TN}{TN + FP}$$
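These three measures can be computed directly from a confusion matrix; the following helpers are an illustrative sketch, not taken from the thesis code:

```cpp
// Illustrative helpers computing the measures defined above from the four
// cells of a confusion matrix.
struct Confusion {
    double tp, tn, fp, fn;  // true/false positives and negatives
};

// (TP + TN) / (TP + TN + FP + FN)
double accuracy(const Confusion& c) {
    return (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn);
}

// TP / (TP + FN): proportion of actual positives predicted positive
double sensitivity(const Confusion& c) { return c.tp / (c.tp + c.fn); }

// TN / (TN + FP): proportion of actual negatives predicted negative
double specificity(const Confusion& c) { return c.tn / (c.tn + c.fp); }
```

For instance, a split of 22 true positives, 107 true negatives, 19 false positives and 11 false negatives over 159 patients gives an accuracy of about 81%.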
So how do we get the SVM to prioritise a high sensitivity? Some researchers have proposed using different penalty parameters to handle unbalanced data (Osuna, Freund, and Girosi 1997). We can show that this works very well even for balanced data that needs a higher sensitivity.
If we go back to the formulas, we need to minimise:

$$y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad (3.14)$$

$$\underset{w,\, b,\, \xi}{\arg\min} \left\{ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \right\} \quad (3.15)$$
and instead of $C$ use different misclassification costs for the positive and negative class examples, $C_+$ and $C_-$:
$$\underset{w,\, b,\, \xi}{\arg\min} \left\{ \frac{1}{2}\|w\|^2 + C_+ \sum_{i : y_i = +1} \xi_i + C_- \sum_{i : y_i = -1} \xi_i \right\} \quad (3.16)$$
This goes by the name different error costs (DEC). By assigning a higher misclassification cost to the positive samples ($C_+ \geq C_-$) we get a higher sensitivity. In other words, we skew the separating hyperplane towards the positive set.
With this we get a sensitivity of 84.90% when we change the misclassification penalty ratio to 0.9 and 0.1, which is a 24.2 percentage point increase while still having a 3.1 percentage point increase in accuracy compared to our baseline.
3.3 Radial Basis Function
But there are still hurdles to overcome, as the questions asked by the doctor might not be statistically independent from one another. For example, cancer correlates heavily with age, and it is not a linear correlation (Ukraintseva and Yashin 2003). This rules out simpler classification algorithms like Naïve Bayes, but not Support Vector Machines.
Even though we have made strides so far, what we have shown has only been able to account for linearly separable points. But with the so-called kernel trick, where we map data into a richer feature space and then construct a hyperplane in that space, we are able to classify points that were not linearly separable in their previous space.
We call the function that maps from the vector $x$ to another input space $\phi(x)$. By doing this simple transformation we now need to compute:

$$K(x, y) = \phi(x) \cdot \phi(y) \quad (3.17)$$

But here we see that we do not really need $\phi(x)$ on its own, and can instead focus on $K(x, y)$, which we call our kernel function.
By using the radial basis function (RBF) kernel:

$$K(x, y) = e^{-\gamma \|x - y\|^2} \quad (3.18)$$

where $\gamma$ is a chosen constant, we get a fast and effective nonlinear kernel.
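As a sketch (independent of the OpenCV implementation used later), equation 3.18 is a one-liner over the squared Euclidean distance:

```cpp
#include <cmath>
#include <vector>

// RBF kernel of equation 3.18: K(x, y) = exp(-gamma * ||x - y||^2).
// Assumes x and y have the same dimension.
double rbf_kernel(const std::vector<double>& x,
                  const std::vector<double>& y,
                  double gamma) {
    double sq_dist = 0.0;  // squared Euclidean distance ||x - y||^2
    for (std::size_t i = 0; i < x.size(); ++i) {
        double d = x[i] - y[i];
        sq_dist += d * d;
    }
    return std::exp(-gamma * sq_dist);
}
```

Note that $K(x, x) = 1$ for any $x$, and the value decays towards 0 as the points move apart, at a rate controlled by $\gamma$.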
With similar accuracy to our baseline and the linear SVM, we can get 100% sensitivity with the RBF kernel.
Specifically, we are using an SVM with C-classification and an RBF kernel, because of its good general performance and its few parameters (Meyer and Wien 2014).
Chapter 4
Implementation
4.1 Environment
One of the requirements was that the implementation should be able to run on Apple iOS platforms. This made it natural to develop the software in a combination of C, C++ and Objective-C. For the machine learning we chose the OpenCV library, developed by Intel’s Russia research center in Nizhny Novgorod for realtime computer vision. This library contains implementations of both RBF and linear kernels, as it adopted the SVM C++ library libsvm. Effort has gone into making sure the SVM theory presented matches the implementation, though some details have been glossed over, for example an explanation of Lagrange duality.
4.2 Preprocessing
The first step was preprocessing the data to use as a base for the support vector machines. Access was given to 159 anonymous patients’ DVT journals, without identifiable information. From the journals we extracted the Wells score information and whether a DVT or occlusion was found. Note that from our point of view an occlusion and a DVT are the same.
Table 4.1. Count of each label in dataset
DVT Nothing found
33 126
As we can see, we have a heavy bias towards the nothing-found category, even though we are merging the DVT and occlusion columns. In a way this is good for us, as it means our assessment system has actual value. It is also surprising, as many of the patients in the dataset have already gone through a Wells score assessment, which should make the dataset heavily biased in the other direction, with very few in the nothing-found category. The binary yes-or-no questions were converted to 1.0 and -1.0 respectively and stored in memory as a matrix.
4.3 Architecture
After the preprocessing step the actual implementation began. We used the typical Model-View-Controller concept and ended up with six controllers:

Features
Statistics
Settings
PatientJournal
RiskAssessment
Sonography
One starts at the PatientJournal controller, where one is required to enter a valid social security number for the patient, the family name and the examiner in order to proceed to the risk assessment. In the risk assessment the examiner is presented with questions fetched from a JSON file. If no such file exists, the JSON file will be created with the assessment questions corresponding to Wells Score.
4.4 Data format
One of the goals of the assessment was not only to use the features of Wells Score, but also to be able to find new features that are better for assessing deep vein thrombosis. We would prefer to have doctors experiment and collaborate with different clinical assessments and share the data, but we could not find any existing open format for sharing clinical assessments and compression ultrasonographies; therefore we needed to create one from scratch.
The format is based upon A. Thurin’s DVT journals and conforms to the JSON spec (Bray 2014) with RFC 3339 timestamps (Klyne and Newman 2002).
Within this format, the risk of name clashes between features needed to be addressed. For that purpose we assign a universally unique identifier (more commonly known by its abbreviation: UUID) to each feature, even the Wells score features. A UUID is simply a 128-bit value commonly used in distributed systems to identify information without significant central coordination. With UUIDs, the probability of a feature id clash is approximately

$$p(n) \approx 1 - e^{-\frac{n^2}{2 \cdot 2^x}}$$

where $n$ is the number of features in our case and $x$ is the number of random bits in the identifier.
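The bound can be evaluated directly; this sketch (with hypothetical names) takes $x = 122$ for the random bits of a version-4 UUID:

```cpp
#include <cmath>

// Birthday-bound approximation from the formula above: the probability that
// n randomly generated ids collide when each id carries x random bits
// (x = 122 for a version-4 UUID).
double collision_probability(double n, double x) {
    return 1.0 - std::exp(-(n * n) / (2.0 * std::pow(2.0, x)));
}
```

Even with a thousand features the probability is vanishingly small, which is why no central coordination of feature ids is needed.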
With this we can train models by just sending a list of UUIDs, and the system will find the patients with this set of features and return an SVM. We hope that making it easy to create disjoint sets of features will encourage experimentation.
The UUID must be represented by 32 uppercase hexadecimal digits, displayed in five groups separated by hyphens (e.g. B6C7A40E-6FA3-4C91-B31B-918C8776D474).
The JSON-keys are not optional and all the corresponding values must be non-empty.
JSON-Key                    JSON-Value
group                       String, name of a group the feature belongs to
standardValue               Float or boolean, start value
riskAssesmentItemsModelId   String, UUID
timeCreated                 String, yyyy-MM-dd'T'HH:mm:ssZZZZZ
descriptionText             String, long description
shortName                   String, short description

JSON-Key                    Example
group                       “wells_score”
standardValue               false
riskAssesmentItemsModelId   “52004621-75CB-422F-9FBE-EC0D77C3E4A8”
timeCreated                 “2015-01-11T14:16:04+01:00”
descriptionText             “Paralysis or recent plaster cast”
shortName                   “Paralysis”
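Assembled from the example column above, a complete feature definition might look like:

```json
{
  "group": "wells_score",
  "standardValue": false,
  "riskAssesmentItemsModelId": "52004621-75CB-422F-9FBE-EC0D77C3E4A8",
  "timeCreated": "2015-01-11T14:16:04+01:00",
  "descriptionText": "Paralysis or recent plaster cast",
  "shortName": "Paralysis"
}
```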
The things to note are:

• leftLeg & rightLeg contain examination points. The valid values for the child nodes are any one of the set:
  – “T1”, Thrombosis found level 1 - biggest
  – “T2”, Thrombosis found level 2
  – “T3”, Thrombosis found level 3 - smallest
  – “Tr”, Thrombosis remnant
  – “x”, Removed thrombosis via surgery
  – “#”, Missing
  – “?”, Not visible
  – “-”, Not surveyed
  – “0”, Normal

• riskAssessments children have the features, with UUIDs as keys and floats or booleans as values. The UUID keys are represented by 32 uppercase hexadecimal digits, displayed in five groups separated by hyphens (e.g. B6C7A40E-6FA3-4C91-B31B-918C8776D474).
Key                            Value
patientsSocialSecurityNumber   String, number without spaces and dashes
patientsName                   String
examinersName                  String, person responsible for assessment
riskAssessment                 Node, see notes
sonographyExamination          Node
examiner                       String, person responsible for sonography
idCheck                        Boolean, patient’s ID has been checked
additionalInformation          String, sonography information
anamnesis                      String
complications                  Boolean
normalRepositoryVariance       Boolean
rightLeg                       Node, see notes
leftLeg                        Node, see notes
v-fem-communis-inguen          String
fem-sup-dist                   String
tibialis-post                  String
s-magna-prox-femur             String
iliaca-ext                     String
poplitea-prox                  String
tibialis-ant                   String
peronea                        String
gastrocnemius                  String
fem-com                        String
s-parva-prox                   String
poplitea-dist                  String
soleus                         String
fem-profunda                   String
v-fem-superficialis            String
Key Example
patientsSocialSecurityNumber “9912290104”
patientsName “Svensson”
examinersName “Andersson”
riskAssessment —
sonographyExamination —
examiner “Andersson”
idCheck true
additionalInformation “Patient became sick”
anamnesis —
complications false
normalRepositoryVariance —
rightLeg —
leftLeg —
v-fem-communis-inguen “T1”
fem-sup-dist “Tr”
tibialis-post “T3”
s-magna-prox-femur “Tr”
iliaca-ext “T2”
poplitea-prox “x”
tibialis-ant “Tr”
peronea “Tr”
gastrocnemius “#”
fem-com “Tr”
s-parva-prox “?”
poplitea-dist “?”
soleus “—”
fem-profunda “Tr”
v-fem-superficialis “Tr”
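Putting the example values together, a journal entry could plausibly look like the following; the exact nesting of the riskAssessment, sonographyExamination and leg nodes is an assumption based on the “Node, see notes” entries, and only a few vein keys are shown:

```json
{
  "patientsSocialSecurityNumber": "9912290104",
  "patientsName": "Svensson",
  "examinersName": "Andersson",
  "riskAssessment": {
    "52004621-75CB-422F-9FBE-EC0D77C3E4A8": false
  },
  "sonographyExamination": {
    "examiner": "Andersson",
    "idCheck": true,
    "additionalInformation": "Patient became sick",
    "complications": false,
    "rightLeg": {
      "v-fem-communis-inguen": "T1",
      "poplitea-prox": "x",
      "tibialis-post": "T3"
    },
    "leftLeg": {
      "fem-sup-dist": "Tr",
      "gastrocnemius": "#",
      "s-parva-prox": "?"
    }
  }
}
```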
Chapter 5
Results
5.1 Test set
Access was given to 159 patients’ DVT journals without identifiable information. The information was manually extracted. It contained the Wells score questions and answers, and whether a DVT or occlusion was found.
Table 5.1. Count of each label in dataset
DVT Nothing found
33 126
All SVMs were trained with optimised $C$ and $\gamma$ values, considered optimal when the 5-fold cross-validation estimate of the test set error is minimal, searching for $C$ values between $2^{-5}$ and $2^{15}$ and $\gamma$ values between $2^{-15}$ and $2^{3}$. This should help against overfitting.
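The search grid itself is easy to reproduce; this sketch generates one candidate value per exponent (the actual step size used is not stated in the text):

```cpp
#include <cmath>
#include <vector>

// Exponential grid of candidate hyperparameter values, 2^low .. 2^high,
// as used in the cross-validated search for C and gamma.
std::vector<double> exponential_grid(int low_exp, int high_exp) {
    std::vector<double> grid;
    for (int e = low_exp; e <= high_exp; ++e)
        grid.push_back(std::pow(2.0, e));
    return grid;
}
```

With `exponential_grid(-5, 15)` for $C$ and `exponential_grid(-15, 3)` for $\gamma$, every $(C, \gamma)$ pair is scored by 5-fold cross-validation and the pair with the lowest estimated test error is kept.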
One should note that this dataset does not say anything about the DVT rate in the general population, as there is a heavy bias towards DVT: the patients who come to Klinisk Fysiologi to be examined have in most cases already been examined by doctors and are thought to have DVT.
5.2 Baseline
As a baseline we use Wells Score. Philip S. Wells et al. modeled Wells Score using univariate and stepwise logistic analysis (see Wells et al. 1997). This is the current gold standard when it comes to DVT assessment.
Table 5.2. Wells score (Bounameaux, Perrier, and Righini 2010)

Variable                                            Points
Cancer treatment during the past 6 months           +1
Lower leg paralysis or plastering                   +1
Bed rest > 3 days or surgery < 4 weeks              +1
Pain on palpation of deep veins                     +1
Swelling of entire leg                              +1
Diameter difference on affected calf > 3 cm         +1
Pitting oedema (affected side only)                 +1
Dilated superficial veins (affected side)           +1
Alternative diagnosis at least as probable as DVT   -2
A score is given by analysing the patient’s medical history, where each criterion increases the score by one, except if an alternative diagnosis is possible, which decreases the score by two.
The old variant of Wells Score divided the probability into three classes Low, Intermediate and High.
Table 5.3. Clinical probability for Wells score

Low            0 total
Intermediate   1-2 total
High           > 2 total
5.3 Benchmarking
We knew that an SVM with an RBF kernel is fast, but we were interested in just how fast training and assessment could be done with our very modest dataset. We wanted to show that SVM prediction with the RBF kernel could be run on every single change of the assessment questionnaire.
The hardware used was an iPad Air 2 (model A1566). Our tests show that this is very reasonable as the median of the time for prediction is in the sub millisecond range with a median of 0.11 ms. Even the training of the SVM model has a median of 1.79 ms. The benchmark was done with the Wells Score features and our existing dataset previously mentioned in this paper.
The following version of Clang was used:

Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix
With the compiler directives (warning directives removed):

clang -x objective-c -arch arm64 -fmessage-length=0 -fdiagnostics-show-note-include-stack
-fmacro-backtrace-limit=0 -std=c11 -fobjc-arc -fmodules -fmodules-prune-interval=86400
-fmodules-prune-after=345600 -fpascal-strings -O0

With the training model params:

svm_type = CvSVM::C_SVC;
kernel_type = CvSVM::RBF;
gamma = 0.033750;
C = 12.500000;
class_weights = {0.167914, 0.832086};
term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);
As the performance was more than enough in our case, we never benchmarked with release flags (-O3 or -Os).
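The medians reported above can be collected with a harness along these lines (a sketch; the thesis’s actual benchmarking code is not shown):

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Run an operation (e.g. SVM prediction or training) `runs` times and
// return the median elapsed wall-clock time in milliseconds.
double median_time_ms(const std::function<void()>& op, int runs) {
    std::vector<double> times;
    times.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        op();
        auto stop = std::chrono::steady_clock::now();
        times.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(times.begin(), times.end());
    return times[times.size() / 2];  // median (upper element for even runs)
}
```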
Figure 5.1. Time for classification: DVT prediction (SVM - RBF), µ = 0.12 ms, median = 0.11 ms, σ = 0.04, sample size = 364.
Figure 5.2. Time for training: DVT training (SVM - RBF), Gaussian kernel density estimation, µ = 1.87 ms, median = 1.79 ms, σ = 0.48, sample size = 2261.
5.4 Results
In this section we report results obtained by applying support vector machines to the patient data. The baseline is Wells Score with intermediate probability if not otherwise stated.
We started with a linear SVM and got an accuracy of 84.91%, but it only had a sensitivity of 36.36%, so it became evident that we needed to approach the problem with different error costs (DEC) in mind.
For comparison we have included Wells Score with both intermediate and high probability, as defined by table 5.3.
The $C$ values are 312.50 for both the linear SVM with DEC and the SVM with 100% sensitivity, and 2.5 for the SVM RBF with class weight 0.81. The $\gamma$ value is 0.50 for the SVMs with the RBF kernel.
Figure 5.3. Linear SVM with soft margins and different error costs compared to Wells Score. C : 12.5
With our dataset, the RBF kernel performs better than the linear kernel (figure 5.5). In theory, an SVM with an RBF kernel is going to outperform Wells Score on a good dataset as long as the data is nonlinear. The ROC curve in figure 5.5 hints that this is indeed the case for our test set.
                     Class Weight   Accuracy   Sensitivity   Specificity   BCR      DOR
SVM RBF              0.9226         58.49%     100.00%       35.71%        67.85%   ∞
Wells Score MEDIUM   N/A            23.12%     97.05%        3.17%         50.11%   1.08
SVM RBF              0.8196         81.11%     66.66%        84.92%        75.79%   11.26
SVM Linear           0.9193         65.40%     63.63%        65.87%        64.75%   3.37
Wells Score HIGH     N/A            58.49%     60.60%        57.93%        59.27%   2.11
Figure 5.4. SVM with RBF kernel compared to Wells Score.
Figure 5.5. Comparison of receiver operating characteristics (ROC) with varying class weights. C: 2.5; γ: 0.50625; 100 iterations, 0.01
Chapter 6
Discussion
6.1 Class Weight (Aka. How much is a life worth?)
We have shown that we can counteract an unbalanced dataset and tweak the sensitivity or specificity by choosing the class weight. Wells Score, on the other hand, has three risk classes for DVT: low, intermediate and high. In our tests, Wells Score on patient data with a high risk has a sensitivity of just 60.60% with an accuracy of 58.49%. As the sensitivity rate is complementary to the false negative rate, this means that 39.40 percent would be wrongly classified as not having deep vein thrombosis. More sobering is looking at the medium risk class, which has a false negative rate of 2.95%; but keep in mind that its specificity is only 3.17%, a 96.83% false positive rate. As seen, we can choose either high accuracy or high sensitivity. In one case we tweak the class weight value to get a sensitivity similar to Wells Score with a high risk, and get a 26.99 percentage point increase in specificity; the resulting class weight is then 0.9226/0.0774. If we instead maximise the sensitivity to 100%, we see a 35.27 percentage point increase in accuracy compared to Wells Score with medium risk.
6.2 What is a question worth?
Taking a subset of only the more relevant features is called feature selection. Feature selection is important in our case not for the performance of the SVMs, but for the limited time of doctors: asking the patients thousands of questions would not be feasible. As the median training time with our dataset is 1.79 ms, it opens up the possibility of training SVMs without a feature and checking the difference in the balanced error rate (BER), which is the average of the error rate of the positive class and the error rate of the negative class.
$$\text{BER} = \frac{FP/(TN + FP) + FN/(FN + TP)}{2}$$
But only looking at the BER does not give us the whole picture. A feature is not worth much if very few people have had the symptom. Because of this, we show the number of positive features to indicate how common they are in the dataset. We also show the rate of deep vein thrombosis given the symptom: the number of confirmed DVTs divided by the total number for the subset that had the symptom.
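The BER above is straightforward to compute from the confusion-matrix cells; an illustrative helper (not from the thesis code):

```cpp
// Balanced error rate as defined above: the mean of the positive-class and
// negative-class error rates.
double balanced_error_rate(double tp, double tn, double fp, double fn) {
    double positive_error = fn / (fn + tp);  // FN / (FN + TP)
    double negative_error = fp / (tn + fp);  // FP / (TN + FP)
    return (positive_error + negative_error) / 2.0;
}
```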
[Three bar charts over the ten features (Dilated superficial veins (affected side), Cancer, Alternative diagnosis at least as probable as DVT, Bed rest > 3 days or surgery < 4 weeks, Pitting oedema (affected side only), Diameter difference on affected calf > 3 cm, Swelling of entire leg, Previous DVT diagnostic, Pain on palpation of deep veins, Paralysis or recent plaster cast), showing for each feature its Balanced Error Rate (BER), its count of positive answers in the dataset, and the probability of DVT given the symptom.]