Case Based Reasoning method for analysing Physiological sensor data


International Master’s Thesis

Case Based Reasoning method for analysis of

Physiological sensor data

Asif Moinul Islam

Technology

Studies from the Department of Technology at Örebro University 0 örebro 2012


Supervisor: Amy Loutfi, Mobyen Uddin Ahmed


Abstract

Remote healthcare is a demanding and emerging research area. Rising healthcare costs in developed countries have led policy makers to look for alternative models of healthcare rather than relying on the traditional healthcare system. Although advances in sensor technology, the availability of devices such as smartphones and improvements in artificial intelligence have brought remote healthcare close to reality, plenty of issues remain to be solved before it becomes a commonly used healthcare model. In this thesis, two vital physiological parameters, pulse rate and oxygen saturation, were studied using the Case-Based Reasoning technique in order to uncover patterns in them. A three-tiered application focusing on remote healthcare was developed. The results of the thesis could be used as a starting point for further research on these two physiological parameters in order to detect anomalous health conditions.


Acknowledgements

Firstly, I would like to thank my supervisors Amy Loutfi and Mobyen Uddin Ahmed for their guidance and thoughtful suggestions. I wish to express my gratitude to the volunteers who selflessly helped me during the data collection. Finally, I am grateful to my family for their tremendous support.


... cases
4.2 Comparison with SVM
4.2.1 Results of the experiment of the data set A of pulse cases
4.2.2 Results of the experiment of the data set B of pulse cases
4.2.3 Results of the experiment of the data set C of pulse cases
4.2.4 Results of the experiment of the data set D of pulse cases
4.2.5 Results of the experiment of the data set A of oxygen saturation cases
4.2.6 Results of the experiment of the data set B of oxygen saturation cases
4.2.7 Results of the experiment of the data set C of oxygen saturation cases
4.2.8 Results of the experiment of the data set D of oxygen saturation cases
4.3 Results summary
5 Conclusions
5.1 Summary
5.2 Future work
Appendices
A Class Diagram of Hub Software
B Class Diagram of Web Server Application
C Class Diagram of Data Analysis Application
D Database SQL Script


List of Figures

2.1 CBR Cycle
2.2 Spectroscopic and absorptive properties of oxygenated hemoglobin (HbO2) and deoxygenated hemoglobin (Hb)
2.3 ECG signal
3.1 Three tier system developed for collecting pulse oximeter data and analyzing data
3.2 Nonin WristOx2 Model 3150
3.3 Android application to collect data from pulse oximeter
3.4 Software to analyze cases in casebase
3.5 PSD curve of pulse data
3.6 PSD curve of oxygen saturation data
3.7 Wavelet decomposition
3.8 Pulse rate of a subject at different steps of data collection
3.9 Pulse case example
3.10 Oxygenation case example
4.1 Graphical view of the software to analyze cases
4.2 An example of anomalous case
4.3 Percentage of picking the case of same subject as nearest neighbor case (correct identification of subjects) for pulse and oxygen saturation cases for different sets of features
4.4 Plot of pulse rate of all the samples of a subject
4.5 Plot of oxygen saturation of all the samples of a subject
4.6 Percentages of correct identification of right subjects of Pulse data cases using CBR (5 nearest neighbor) and SVM (linear kernel)
4.7 Percentages of correct identification of right subjects of Oxygen Saturation data cases using CBR (5 nearest neighbor) and SVM (linear kernel)
A.1 Class diagram of Hub Software
B.1 Class diagram of web server application
C.1 Class diagram of Data Analysis Application


List of Tables

4.1 Details of the subjects
4.2 The number of times pulse data cases were retrieved from the cases of the same subject as nth (n=1 to 5) nearest neighbor positions out of 40 test cases when different feature sets were used
4.3 Percentages of picking the cases of same subject as nearest neighbor cases of pulse data at different neighborhood size (k) and different sets of features
4.4 Results of CBR of the nearest case retrieval of pulse data cases
4.5 The number of times oxygen saturation cases were retrieved from the cases of the same subject as nth (n=1 to 5) nearest neighbor out of 40 test cases
4.6 Percentages of picking the cases of same subject as nearest neighbor cases of oxygenation data at different neighborhood size (k) and different sets of features
4.7 Results of CBR of the nearest case retrieval of oxygen saturation data cases
4.8 Formulation of different experiment data sets
4.9 Results of SVM subject classification of pulse cases for data set A
4.10 Results of CBR subject classification of pulse cases for data set A
4.11 Results of SVM subject classification of pulse cases for data set B
4.12 Results of CBR subject identification of pulse cases for data set B
4.13 Results of SVM subject classification of pulse cases for data set C
4.14 Results of CBR subject identification of pulse cases for data set C
4.15 Results of SVM subject classification of pulse cases for data set D
4.16 Results of CBR subject identification of pulse cases for data set D
4.17 Results of SVM subject classification of oxygenation cases for data set A
4.18 Results of CBR subject identification of oxygenation cases for data set A
4.19 Results of SVM subject classification of oxygenation cases for data set B
4.20 Results of CBR subject identification of oxygenation cases for data set B
4.21 Results of SVM subject classification of oxygenation cases for data set C
4.22 Results of CBR subject identification of oxygenation cases for data set C
4.23 Results of SVM subject classification of oxygenation cases for data set D
4.24 Results of CBR subject identification of oxygenation cases for


Chapter 1

Introduction

1.1 Motivation

Healthcare costs in developed countries are increasing rapidly every year. In 2010, the healthcare cost of the USA was 17% of its Gross Domestic Product (GDP), and it has increased rapidly over the last few decades. In countries such as Japan, Canada, Germany and some countries in Europe, healthcare spending has either crossed or come close to 10% of GDP.¹ As conventional healthcare systems struggle to cope with the increasing demand and mounting costs, many developed countries have been looking for new models of healthcare. Some causes of the increase in healthcare costs are the growing aged population, chronic diseases, obesity, and the rising charges of health providers, hospitals and physicians. Although the increase in treatment volume is one of the significant causes of rising healthcare costs, it can be reduced if in-person interaction between doctors and patients is lessened, which can be done by integrating advanced technology into the healthcare system. With the wide-reaching deployment of mobile and wireless networks, the wireless infrastructure can support many current and emerging healthcare applications.

Periodic monitoring of vital health parameters such as pulse rate, blood oxygen level and blood pressure can be very helpful for the early detection of diseases, reducing both treatment time and treatment cost. Remote monitoring of health parameters can likewise help with the early detection and prevention of diseases. The expansion of wireless technology, advances in sensor technology and the rapid increase in the use of smartphones and the internet have made it easier to monitor physiological data remotely. Mobile computing and biomedical sensor technology furnish a new model for healthcare, namely remote healthcare [29, 17], that can reduce the cost [13] and improve the quality of healthcare services.

¹ http://www.oecd.org/unitedstates/BriefingNoteUSA2012.pdf


Researchers have been interested in health monitoring in out-of-hospital conditions for quite some time. There are still plenty of issues to be dealt with before remote healthcare becomes a widely used healthcare model. Some of the key challenges in implementing a remote healthcare model are the use of proper intelligent systems for multi-parametric data analysis, personalization of the systems used for remote management, supporting healthcare professionals in decision making, correlating the multi-parametric data with established biomedical knowledge to derive clinically relevant indicators, and a proper alert system. Embedding expert knowledge for the autonomous monitoring of physiological data is the key challenge, and choosing an appropriate artificial intelligence technique and implementing it properly could be the way to build an effective remote healthcare system. The inclusion of expert knowledge is a large ambition. This thesis attempts to make a concrete step towards this goal by using new methods for the analysis of remote healthcare data collected by a suite of sensors that can measure physiological data continuously.

1.2 Problem Statements and Objectives

This thesis investigates pulse rate and oxygen saturation data collected from people of different ages during different activities, with the aim of finding patterns in those two physiological parameters. The motivation above makes clear the importance of remote monitoring for a successful healthcare model, so a system capable of remote monitoring is also an aspiration of this thesis. The thesis also examines case-based reasoning as the core AI technique for analyzing the collected data. The objectives of the thesis can be listed as follows:

• Feature extraction from pulse rate and oxygen saturation data

• Case formulation using selected features to implement CBR (Case-Based Reasoning)

• Develop a three-tiered application to collect and store data

• Data collection

• Analyze data using case-based reasoning

1.3 Contributions of this thesis

The most important target of the thesis was to apply the case-based reasoning method to physiological sensor data collected from a pulse oximeter in order to find patterns in it. A literature review shows that much work has been done using ECG sensors, but not many studies were found on pulse rate and oxygen saturation data. Some major contributions of this thesis are:


• A three-tiered application that can be used as the base for remote healthcare using data collected with a pulse oximeter.

• A four-minute data collection protocol consisting of four different activities that could be used for personal profiling.

• The finding that the pulse rate data of an individual follow a pattern when the individual follows some scripted tasks, whereas for oxygen saturation the pattern is not clearly distinguishable from that of other individuals.

• Implementation of a CBR application using a variety of features from different domains, extracted from pulse rate and oxygen saturation data, that could be useful for disease diagnosis later on.

1.4 Outline

The rest of this thesis is organised as follows.

Chapter 2 provides a background study mainly focusing on case-based reasoning and different biomedical studies related to physiological parameters such as heart rate.

Chapter 3 gives an overview of the system developed in this project, a description of the features extracted and the processing of the data.

Chapter 4 presents the experimental results, focusing on subject identification based on the data collected. It also provides a performance comparison between the system developed and a Support Vector Machine.

Chapter 5 contains the summary and future directions.

1.5 Publications

Some of the work presented in this thesis has been published in a conference paper, available at http://oru.diva-portal.org/smash/get/diva2:540932/FULLTEXT01

• M.U. Ahmed, A.M. Islam, and A. Loutfi. A case-based patient identification system using pulseoximeter and a personalized health profile. 2012.


Chapter 2

Background

This chapter provides an overview of case-based reasoning and a comparison of case-based reasoning with some other artificial intelligence techniques. A separate section covers the physiological sensor technology used in this thesis and related technology. Some related works focusing on case-based reasoning in medicine and remote health monitoring are also included.

2.1 Introduction to Case-Based Reasoning

Case-based reasoning (CBR) is a machine learning technique that exploits solutions of previously solved problems to solve a new problem. The base assumption of CBR is that similar problems have similar solutions. That is, if problems A and B are close to each other then their solutions will also be close to each other, so if the solution of one of the two problems is known, that solution can be used to solve the other problem. The problem can be of the planning, diagnosis or design type [20]. In CBR terms, a case stands for a problem situation [1]. CBR can be viewed as a research paradigm [40, 35], as a perspective on human cognition [35] and as a methodology [39]. It is analogous to human problem solving, in which an individual tries to recall a previously solved problem similar to the current one. In a CBR system the memory of all stored previous cases is called the case base. The structure and representation of cases, the memory model used to organize the cases and the selection indexes used to identify the cases are three general issues to be considered when creating a case base.

CBR consists mainly of four processes [1]: retrieve, reuse, revise and retain. These processes are termed the four REs. The CBR cycle is illustrated in Figure 2.1.

Figure 2.1: CBR Cycle [1].

1. Retrieve: In the retrieve process, one or more cases similar to the current case are retrieved from the database of previously solved cases using some matching algorithm. Retrieval is one of the most important research areas in CBR. Some commonly used retrieval methods are the nearest neighbor algorithm, induction methods and knowledge-guided indexing.

2. Reuse: In the reuse process, a solution for the current problem is proposed using the retrieved cases. If needed, the solution is adapted in this step to fit the new case.

3. Revise: In the revise process, the proposed solution is tested, either by implementing it in real life or through a simulator.

4. Retain: In the retain process, the case base is updated with the successful solution of the problem.
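The four processes above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that cases are numeric feature vectors compared by Euclidean distance and that a solution is a label taken by majority vote from the retrieved cases. The names `Case`, `CaseBase`, `reuse`, `revise` and `solve` are hypothetical and are not taken from the thesis software.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Case:
    features: list   # problem description as a numeric feature vector
    solution: str    # known solution, here simply a label


@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    def retrieve(self, query, k=1):
        """Retrieve: return the k stored cases nearest to the query."""
        ranked = sorted(self.cases, key=lambda c: math.dist(c.features, query))
        return ranked[:k]

    def retain(self, case):
        """Retain: store a confirmed case for future reuse."""
        self.cases.append(case)


def reuse(retrieved):
    """Reuse: propose the most common solution among the retrieved cases."""
    votes = {}
    for case in retrieved:
        votes[case.solution] = votes.get(case.solution, 0) + 1
    return max(votes, key=votes.get)


def revise(proposed, confirmed=None):
    """Revise: stand-in for real-world testing; an expert may override."""
    return confirmed if confirmed is not None else proposed


def solve(case_base, query, k=1, confirmed=None):
    """Run one pass of the cycle and return the final solution."""
    retrieved = case_base.retrieve(query, k)
    proposed = reuse(retrieved)
    final = revise(proposed, confirmed)
    case_base.retain(Case(list(query), final))
    return final
```

In this sketch the revise step is reduced to an optional override; in a real system it would involve testing the proposed solution before the case is retained.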

There are two kinds of systems in CBR: problem solving and interpretive [40, 21]. Problem solving systems mainly focus on constructing the solution of a new problem case by modifying the solutions of previously solved cases. Interpretive systems evaluate new cases based on their similarities to or differences from previous cases.

Relational database representation, object-oriented representation and predicate logic based techniques are some common methods used to represent cases. In many practical situations uncertainty makes it difficult to express cases with precise feature values; for this reason soft computing methods are also used to represent cases.


2.2 Comparison of CBR with other methods

In most problem-solving AI techniques, each problem is treated as unique [21]. This is not very efficient, as it leaves no opportunity to reduce the reasoning steps by using previous experience.

2.2.1 Rule-Based Reasoning

In rule-based reasoning, a problem is broken down into a set of individual rules, where each rule solves part of the problem. To solve the whole problem the rules are combined. A rule-based system has a set of production rules of the form IF X THEN Y, where X is a condition and Y is the action to be carried out if the condition X holds. The problem domain needs to be well known in order to create these rules; that is, to create them one has to know how to solve the problem. CBR systems can work better in partially understood domains, as in CBR it is not mandatory to know how the problems are solved; rather, precedents need to be found. Knowledge acquisition is a bottleneck in rule-based reasoning, as experts must be interviewed to capture the rules. Creating these rules can be extremely complex and time consuming. In addition, most real-world problems are hard to model by rules. On the other hand, knowledge acquisition is much easier in CBR, as many domains (e.g., medicine, design) have existing cases that can be used to fill the case database [35].
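To make the IF X THEN Y form concrete, the following minimal sketch represents each production rule as a condition/action pair and fires every rule whose condition holds. The rule contents and thresholds are invented purely for illustration; they are neither clinical guidance nor part of the thesis, and the names `Rule`, `RULES` and `fire` are hypothetical.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    condition: Callable[[dict], bool]  # X: predicate over the known facts
    action: str                        # Y: what to do when X holds


# Hypothetical rules; the thresholds are illustrative assumptions only.
RULES = [
    Rule(lambda facts: facts["pulse"] > 100, "flag high pulse rate"),
    Rule(lambda facts: facts["spo2"] < 90, "flag low oxygen saturation"),
]


def fire(rules, facts):
    """Return the actions of all rules whose condition holds for the facts."""
    return [rule.action for rule in rules if rule.condition(facts)]
```

Every rule here had to be written by hand, which mirrors the knowledge acquisition bottleneck described above: someone must already know which conditions matter and where the thresholds lie.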

2.2.2 Artificial Neural Network

An artificial neural network is a mathematical model of problem solving inspired by the working principle of the neurons of the human brain. Like CBR, it does not have to go through the knowledge acquisition bottleneck. In this machine learning technique, the system is trained to obtain the appropriate weights for solving a particular problem. The system works well when the problem domain is well understood and the data are purely numerical. For complex structured data CBR works much better than an artificial neural network [38]. The working procedure of a neural network is like a black box: the output of the network is a function of weight vectors that depends on the network's architecture and the training mode used. The validity of the system's decision cannot be justified because of the nature of its internal mechanism. In the medical domain, where explanation and justification are very important, CBR is more appropriate than a neural network.

2.2.3 Support Vector Machine

A Support Vector Machine is a machine learning tool used for classification tasks. It finds a hyperplane that separates the feature space into two classes with the maximum margin. It is a good tool for binary classification, but it runs slowly as it is computationally expensive. Like an artificial neural network, and in contrast to case-based reasoning, this method is a black box.

2.3 CBR in medicine

Case-based reasoning (CBR) in medicine is a rapidly growing research area. With the increasing use of artificial intelligence in decision support systems in clinical practice, the use of CBR is expected to grow briskly within a few years. CBR suits medical decision making because expert knowledge in this domain consists not only of rules but also of experience. Because of the continuous expansion of the medical knowledge base it is becoming difficult for one person to hold all the knowledge of a given field. The applications of CBR in medicine are focused mainly on diagnosis, classification, planning and tutoring.

One of the first CBR systems built in the health science domain was CASEY [23]; its task was to diagnose heart failure patients by comparing them to earlier patients whose diagnoses were already known. MNAOMIA [9] was a CBR-based diagnosis and treatment system for eating disorders in psychiatry. Another early CBR system was PROTOS [7], which was used for diagnosing audiological disorders. Some other significant early CBR systems are MEDIC [37], which is used to diagnose pulmonary disease; ALEXIA [6], which determines the etiology of a patient's hypertension; ROENTGEN [8], which assists in designing radiation therapy plans; MacRad [32], which helps to interpret radiological images; HPISIS [30], which diagnoses degenerative brain diseases using image segmentation of CT and MR brain images; and T-IDDM [31], which supports the management of insulin-dependent diabetes mellitus patients.

CARE-PARTNER [34] is a decision support system for the long-term follow-up care of stem-cell transplant patients. FM-Ultranet [10] is used for interpreting ultrasound scans and diagnosing fetal malformations and abnormalities. NutriGenomics [25] is a system that provides nutrition counseling based on individual genetics, health and lifestyle. Somnus [24] supports the diagnosis and treatment of obstructive sleep apnea. WHAT [15] is a tutoring medical CBR system used for the education of sports medicine students. Some other recent CBR systems used in medicine are GS.52 [16], for the diagnosis of dysmorphic syndromes; COSYL [36], used for liver patient treatment strategies; and TeCoMED [33], used for forecasting epidemics of infectious diseases.

Ahmed et al. [2] proposed a computer-aided biofeedback system for stress diagnosis and treatment. Finger temperature was used as the physiological parameter, and features were extracted from the time series data. Data were collected following a 9-minute protocol in which subjects had to do different tasks for different durations. For retrieving similar cases, a modified distance function, a similarity matrix and fuzzy similarity were investigated. The experiment was carried out on seven subjects, and the fuzzy similarity function was found to be the most successful, with an accuracy of 85%.

Some reasons why CBR is preferred in the medical domain are listed below.

Knowledge acquisition: Acquisition of knowledge is relatively easy in a CBR system, as large corpora of medical data are already available in different medical institutions and can be used as cases either as they are or with minor changes.

Clear separation between objective and subjective knowledge: In medicine, expert knowledge consists of a combination of rules and the experience of physicians. Medical knowledge-based systems therefore contain both the objective knowledge found in textbooks and the subjective knowledge collected from physicians' experience, which is limited in space and time and changes frequently. In CBR systems subjective and objective knowledge can be used separately in the form of a case.

Easy to update and maintain: In CBR, the inclusion of new cases makes it possible to update the changeable knowledge of the system automatically.

Transparency: Unlike black-box AI systems, a CBR system provides a set of solutions that are close to the new problem situation. As a result, the decision proposed by the CBR system can be justified by comparison with previous problem situations. Because of this transparent nature, CBR is preferred in the medical domain.

Cognitive adequateness: A CBR system works much like a physician, who tries to recall a previously treated patient similar to a new patient.

2.4 Physiological Sensor technology for remote health monitoring

Developments in sensor technology have made it possible to use wearable physiological sensors to measure pulse rate, oxygen saturation, electrocardiogram, blood pressure etc. and to send the measurements wirelessly for remote monitoring. Two significant techniques, pulse oximetry and electrocardiography, are discussed in the following sections.

2.4.1 Pulse Oximeter

A pulse oximeter [4] provides immediate measurements of arterial oxygenation by determining the color of the blood between a light source and a photodetector. Oxygenated hemoglobin absorbs more infrared light and lets more red light pass, while deoxygenated hemoglobin absorbs more red light and lets more infrared light pass. A pulse oximeter uses two wavelengths of light (red and infrared) to determine the percentage of hemoglobin in the blood that is saturated with oxygen. This percentage is called oxygen saturation (SpO2). Oxygen saturation is considered the fifth most significant vital sign, following pulse rate, body temperature, blood pressure and respiration. Besides blood oxygen saturation, a pulse oximeter also measures the pulse rate, which is the number of heartbeats per minute. As the heart beats, the walls of the arteries expand and contract. The transmitted light responds to this pulsation, from which the pulse oximeter measures the pulse rate, usually averaging over 5-20 seconds. Figure 2.2 shows the difference between the light-absorptive properties of oxygenated and deoxygenated blood.

Figure 2.2: Spectroscopic and absorptive properties of oxygenated hemoglobin (HbO2) and deoxygenated hemoglobin (Hb).

Although the pulse oximeter was first used for vital sign monitoring during operations and anesthesia, its non-invasive, immediate, real-time monitoring capability has expanded its use to diagnosis, screening, patient follow-up and self-monitoring. Nowadays pulse oximeters are widely used in hospitals, homes, clinics and rehabilitation centers for screening for conditions such as sleep apnea and cardiac disorders, and for monitoring the vital signs of post-operative patients and persons undergoing treatments that need continuous or periodic monitoring.

2.4.2 Electrocardiogram sensor

An electrocardiogram (ECG) is a noninvasive recording of the electrical activity of the heart. This technique is widely used nowadays in the clinical evaluation of cardiovascular diseases. An ECG consists of the P wave, associated with the contraction of the atria; the QRS complex, associated with the contraction of the ventricles; and the T/U waves, associated with the repolarization of the ventricles. An ECG signal is shown in Figure 2.3. ECG can be used to assess heart rhythm, to diagnose poor blood flow to the heart muscle, and to diagnose a heart attack and abnormalities of the heart such as heart chamber enlargement and abnormal electrical conduction.

Figure 2.3: An ECG signal consists of P wave, QRS complex, and T/U waves.

Advances in sensor technology have made it possible to develop wearable ECG sensors with wireless communication capability. Besides the traditional 12-lead ECG, various types of light, wearable wireless ECG sensors are being used in continuous remote health monitoring as well as in sports physiology.

2.5 Android based Physiological Monitoring

G. Koshmak [22] worked on developing an Android-based patient monitoring system. Pulse rate and oxygen saturation from a pulse oximeter were used as the physiological parameters. The Android system developed in that project was capable of collecting data from the pulse oximeter sensor through Bluetooth. The subjects' activity was recorded using the accelerometer sensors of the mobile phone. The collected data were analyzed using three techniques: (1) change point detection, (2) anomaly detection and (3) activity correlation.

The change point detection technique was used to detect sudden drops or jumps during the monitoring process. To detect change points, the data were split into portions and the mean value of each portion was calculated. If the value collected at a certain point was higher or lower than the mean value by more than a threshold, that point was considered a change point.
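The procedure just described can be sketched as follows. This is a hedged illustration rather than the implementation in [22]: fixed-length, non-overlapping portions are an assumption, and the function name `change_points` and the parameters `segment_len` and `threshold` are hypothetical.

```python
from statistics import mean


def change_points(values, segment_len, threshold):
    """Return indices of values that deviate from their portion's mean
    by more than the threshold.

    The series is split into consecutive portions of segment_len samples
    (the last portion may be shorter).
    """
    flagged = []
    for start in range(0, len(values), segment_len):
        segment = values[start:start + segment_len]
        segment_mean = mean(segment)
        for offset, value in enumerate(segment):
            if abs(value - segment_mean) > threshold:
                flagged.append(start + offset)
    return flagged
```

Because an extreme value also shifts the mean of its own portion, the choice of portion length and threshold interacts; the source does not state which values were used.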

The next step after change point detection was to detect anomalies in the data. The main aim of the anomaly detection procedure was to search the whole data and find unusual and rare pieces of data. Anomalies were detected using the Symbolic Aggregate Approximation (SAX) technique. The complexity of the brute force algorithm used in [19] was O(m^2), so it was not applicable to large databases. To reduce the complexity, the brute force algorithm was modified and its complexity was reduced to O(m).
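As background for the SAX technique named above, the sketch below shows only its discretisation step: z-normalisation, piecewise aggregate approximation (PAA) and mapping to symbols. It does not reproduce the anomaly search of [19] or its modified version. The alphabet size of four, the rounded breakpoints and the name `sax` are assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Approximate breakpoints that cut the standard normal distribution
# into four equiprobable regions (assumed alphabet size of four).
BREAKPOINTS = [-0.67, 0.0, 0.67]
ALPHABET = "abcd"


def sax(series, n_segments):
    """Convert a numeric series into a SAX word of n_segments symbols.

    Assumes len(series) >= n_segments; samples left over after splitting
    into equal-width segments are ignored.
    """
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # PAA: replace each equal-width segment by its mean
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]
    # a value equal to a breakpoint is assigned to the higher symbol
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in paa)
```

Once a series is reduced to short words like this, unusual subsequences can be searched for by comparing words instead of raw samples.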


The third parameter used was the level of the activity of the patient during data collection as it could be crucial for decision making. The activity was measured using the following equation.

\[ \mathrm{Act} = E\left[\,\left| v_a^2 - E\left[v_a^2\right] \right|\,\right], \qquad v_a = \sqrt{a_x^2 + a_y^2 + a_z^2} \]

where $a_x$, $a_y$ and $a_z$ are the values obtained from the accelerometer sensor for the X, Y and Z axes respectively. The activity level was saved so that it could be used for future analysis by the physicians.
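The activity measure above can be computed directly from a list of accelerometer readings. The sketch below assumes that the expectation E[.] is estimated by the sample mean over the recording window, which the source does not state explicitly; the function name `activity_level` is hypothetical.

```python
from statistics import mean


def activity_level(samples):
    """Mean absolute deviation of the squared acceleration magnitude.

    samples: sequence of (ax, ay, az) accelerometer readings.
    """
    # v_a^2 = ax^2 + ay^2 + az^2, so the square root never needs to be taken
    va_squared = [ax * ax + ay * ay + az * az for ax, ay, az in samples]
    expected = mean(va_squared)                      # estimate of E[v_a^2]
    return mean([abs(v - expected) for v in va_squared])
```

A device lying still gives a nearly constant magnitude and therefore an activity level close to zero, while movement spreads the squared magnitudes around their mean and raises the value.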

The system was used on two persons with diseases and one person in normal health by collecting data from them continuously for three days. The techniques used detected more change points for the diseased persons than for the healthy person.

2.6 Summary

While the literature has addressed how to use body area networks to monitor physiological data, little emphasis has been placed on the challenge of analysing this data. With the exception of [22], much focus has been put on creating the necessary technical infrastructure (sensors, communications, gateway). At the same time, there exist techniques to assist in bridging the gap between expert knowledge and sensor data, which can be difficult or unintuitive to understand. As explained in this chapter, CBR is one such method. This thesis will apply CBR to the problem of physiological monitoring. However, there are a number of challenges in this context which make this application non-trivial. One challenge is that it is very difficult to obtain an expert's opinion of the continuous monitoring data. As stated in [22], experts are unaccustomed to looking at measurements this way and rather rely only on thresholds to determine reasonable ranges of data. Further, these thresholds are quite broad. A second challenge is that these systems are mainly intended for individuals who already suffer from a decline in health. Therefore, they may exhibit patterns that are unusual in comparison to a healthy individual but normal for them. Consequently, an individualised profile is necessary. This work will therefore exploit the ability of CBR to do matching in order to determine whether an individual deviates from his/her normal profile. A deviated pattern can then be described in terms of known cases from other individuals in order to relate and describe these deviations in an intuitive way to an expert.


Chapter 3

Methodology

This chapter describes the three-tier system developed for analyzing physiological data collected from a pulse oximeter sensor, together with the data processing, feature extraction, case representation and case retrieval methods.

3.1 System overview

The system is based on a three-tier architecture. The client side consists of a pulse oximeter, which is the data collection device in the system, and an Android system that acts as an intermediate component or hub. The second tier is the server, which collects the data, calculates features (described in section 3.2) and stores them in a database. In the third tier there is an internet-based desktop application that can be used to analyze the collected data. The system is illustrated in Figure 3.1.

Figure 3.1: Three tier system developed for collecting pulse oximeter data and analyzing data.


3.1.1 Data collection device

A Nonin WristOx2 Model 3150 [28] was used as the data collection device. This pulse oximeter model is capable of transferring data through Bluetooth. Its oxygen saturation range is 0-100% and its pulse rate range is 18-300 beats per minute. It is capable of recording oxygen saturation with an accuracy of ±2 digits and pulse rate with an accuracy of ±3%.

Figure 3.2: Nonin WristOx2 Model 3150.

3.1.2 Hub device and software

A Samsung ‘Galaxy Tab GT-P1000’ tablet running Android 2.3.3 was used as an intermediate hub to collect the data from the pulse oximeter device. Software was developed for the Android platform whose primary task is to act as an intermediary between the pulse oximeter and the web server. After collecting the data from the pulse oximeter, it saves the sensor data and the subject's contextual information to a text file. When data collection is finished, it sends the text file to a web server. Figure 3.3 contains a screenshot of the Android application that acts as hub software, collecting the data from the pulse oximeter and sending it to the server. The class diagram of the software is attached in Appendix A.


Figure 3.3: Android application to collect data from pulse oximeter.

3.1.3 Web server

The web server of the system was developed in Java. The server receives the text file sent from the Android device, extracts features from the file and stores them in a MySQL database. It also saves the text file on the server for future use. The class diagram of the software is attached in Appendix B and the SQL script of the database is attached in Appendix D.

3.1.4 Data analysis application

An internet-based desktop application was developed in Java to analyze the pulse rate and oxygen saturation collected from the subjects. It can retrieve cases similar to a target case (sections 3.4 and 3.5) using different sets of features. A view for graphical representation is included in the software, which helps to get a clear picture of the nearest retrieved cases. Retrieved cases of a target case, sorted by similarity, are shown in Figure 3.4. The class diagram of the software is attached in Appendix C.

Figure 3.4: Software to analyze cases in casebase.

3.2 Feature extraction

Like all other time series, the data obtained from the pulse oximeter device using the protocol (described in section 3.3) is high dimensional. The dimensionality needs to be reduced in order to analyze the data set or find patterns in it. In this thesis, features from the time domain, frequency domain and time-frequency domain were extracted.

Statistical features are used to find patterns in time series data. Some common statistical features, namely maximum, minimum, arithmetic mean [11] and standard deviation [11], were considered for the time domain.

Very often, time series data in the time domain does not offer many features to extract. For this reason the data needs to be transformed from the time domain to the frequency domain. Moreover, it has been observed in various studies that the data of healthy and diseased persons are distinguishable in the frequency domain [14, 41].


A Fourier transform of the input signal was performed to obtain the frequency domain features. The Fourier transform is one of the most influential methods of signal processing. It maps the time domain signal to the frequency domain, which makes certain features visible that were not visible in the time domain data. The Fourier transform of a signal y(t) can be defined as

$$Y(f) = \int_{-\infty}^{\infty} y(t)\, e^{-j 2\pi f t}\, dt$$

In the Discrete Fourier Transform (DFT), the input function is discrete, that is, the input data are sampled. The DFT of a time series of length N is given as

$$Y\!\left(\frac{n}{NT}\right) = \frac{1}{N} \sum_{k=0}^{N-1} y(kT)\, e^{-j 2\pi n k T / (NT)}$$

where T is the sampling interval. Applying the Discrete Fourier Transform reveals the periodicities in the input data and the relative strengths of any periodic components.

To calculate the frequency domain features, the power spectral density (PSD) was first calculated from the squared amplitude of the Discrete Fourier Transform of the data, computed with the Fast Fourier Transform (FFT) algorithm, and scaled to the sampling frequency range by normalizing it to the frequency bin width. The data were zero padded so that the number of samples became a power of two, as required for applying the FFT algorithm. From the power spectral density, the following features were calculated [14, 18, 5]: low frequency power, high frequency power, the ratio of low frequency power to high frequency power, low frequency power peak amplitude, high frequency power peak amplitude, low frequency power peak frequency and high frequency power peak frequency.

Frequencies between 0.04 Hz and 0.15 Hz were considered low frequency (LF) and frequencies between 0.15 Hz and 0.4 Hz were considered high frequency (HF) [14, 18, 5, 27, 26], as shown in Figures 3.5 and 3.6. The power in the high and low frequency regions was calculated by numerical integration of the power spectral density over the corresponding frequency range. The units of power spectral density and power for the pulse rate were BPM² Hz⁻¹ and BPM² respectively, where BPM stands for beats per minute.

The frequency domain features for the oxygen saturation were calculated similarly, but in that case the units of power spectral density and power were (%)² Hz⁻¹ and (%)² respectively.
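The frequency domain feature calculation described above can be sketched as follows. This is a minimal illustration using NumPy, not the thesis implementation (which was written in Java). The function names, the 1 Hz sampling frequency in the usage lines and the choice to pass in a zero-mean signal are assumptions made for the example; the band limits are the ones stated in the text.

```python
import numpy as np

LF_BAND = (0.04, 0.15)  # Hz, low frequency band as stated in the text
HF_BAND = (0.15, 0.40)  # Hz, high frequency band as stated in the text


def power_spectral_density(samples, fs):
    """One-sided PSD from the squared FFT amplitude, normalized to the bin width.

    `fs` is the sampling frequency in Hz (an assumption; not stated in the text).
    """
    x = np.asarray(samples, dtype=float)
    n = len(x)
    n_fft = 1 << (n - 1).bit_length()       # next power of two
    spectrum = np.fft.rfft(x, n=n_fft)      # rfft zero-pads x up to n_fft
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    psd = np.abs(spectrum) ** 2 / (fs * n)  # units: (signal unit)^2 / Hz
    psd[1:-1] *= 2.0                        # fold in the negative frequencies
    return freqs, psd


def band_features(freqs, psd, band):
    """Power (trapezoidal integration), peak amplitude and peak frequency in a band."""
    low, high = band
    mask = (freqs >= low) & (freqs <= high)
    f, p = freqs[mask], psd[mask]
    if f.size == 0:
        return {"power": 0.0, "peak_amplitude": 0.0, "peak_frequency": float("nan")}
    power = float(np.sum((p[1:] + p[:-1]) * np.diff(f)) / 2.0)
    peak = int(np.argmax(p))
    return {"power": power, "peak_amplitude": float(p[peak]), "peak_frequency": float(f[peak])}


def frequency_features(samples, fs):
    """The seven frequency domain features named in the text."""
    freqs, psd = power_spectral_density(samples, fs)
    lf = band_features(freqs, psd, LF_BAND)
    hf = band_features(freqs, psd, HF_BAND)
    ratio = lf["power"] / hf["power"] if hf["power"] > 0 else float("inf")
    return {"lf": lf, "hf": hf, "lf_hf_ratio": ratio}


# Usage: one minute of a synthetic 0.1 Hz oscillation sampled at an assumed 1 Hz.
t = np.arange(60)
features = frequency_features(np.sin(2 * np.pi * 0.1 * t), fs=1.0)
```

With a real recording, a constant offset (for example a resting pulse of about 70 BPM) leaks into the lowest bins after zero padding, so subtracting the mean before the transform may be worth considering; the text does not say whether this was done.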

The wavelet transform is used to extract features from biomedical signals [12]. The wavelet technique is one of the most advanced tools for processing non-stationary signals. The continuous wavelet transform associated with a mother wavelet ψ(t) can be defined as


Figure 3.5: PSD curve of pulse data. LF and HF regions are marked by blue and purple respectively.

Figure 3.6: PSD curve of oxygen saturation data. LF and HF regions are marked by blue and purple respectively.

$$W(a, b) = \int_{-\infty}^{\infty} y(t)\, \psi_{a,b}(t)\, dt$$

where y(t) is any square integrable function and a, b are the scaling and translation parameters respectively. By evaluating the continuous wavelet transform at dyadic intervals, the signal can be expressed as

$$y(t) = \sum_{k=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} d_j(k)\, 2^{j/2}\, \psi(2^j t - k)$$

where $d_j$ is the discrete wavelet transform of the signal y(t).

Figure 3.7: Wavelet decomposition.

Unlike the Fourier transform, the wavelet transform keeps both time and frequency information; as a result, features extracted from the wavelet transform are considered time-frequency domain features. Figure 3.7 shows the wavelet decomposition at different levels. The statistical features maximum, minimum, arithmetic mean and standard deviation were calculated from the approximation coefficients of the level 1 wavelet decomposition. The ‘Daubechies 2’ function was used as the mother wavelet. Symmetric padding was used to make the number of data samples a power of two for implementing the discrete wavelet transform.
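The time-frequency features can be illustrated with a small from-scratch sketch. It pads the signal symmetrically to a power of two, computes level 1 approximation coefficients with the standard Daubechies 2 scaling filter, and takes the four statistics named above. The function names and the periodic handling of the signal boundary are assumptions for illustration; the thesis does not describe its boundary handling, and a wavelet library may produce slightly different edge coefficients.

```python
import numpy as np

SQRT3 = np.sqrt(3.0)
# Daubechies 2 scaling (low-pass) filter coefficients; they sum to sqrt(2).
DB2_LOWPASS = np.array([1 + SQRT3, 3 + SQRT3, 3 - SQRT3, 1 - SQRT3]) / (4 * np.sqrt(2.0))


def pad_to_power_of_two(samples):
    """Extend the signal at its end by mirroring until its length is a power of two."""
    x = np.asarray(samples, dtype=float)
    target = 1 << (len(x) - 1).bit_length()
    return np.pad(x, (0, target - len(x)), mode="symmetric")


def db2_approximation(samples):
    """Level 1 approximation coefficients: low-pass filter, then keep every 2nd value.

    The boundary is treated as periodic (an assumption made for this sketch).
    """
    x = pad_to_power_of_two(samples)
    n = len(x)
    # Row k holds the indices 2k, 2k+1, 2k+2, 2k+3 (wrapped around the end).
    idx = (2 * np.arange(n // 2)[:, None] + np.arange(4)[None, :]) % n
    return x[idx] @ DB2_LOWPASS


def wavelet_features(samples):
    """Maximum, minimum, mean and standard deviation of the approximation coefficients."""
    a = db2_approximation(samples)
    return {"max": float(a.max()), "min": float(a.min()),
            "mean": float(a.mean()), "std": float(a.std())}


# Usage: a constant signal of 60 samples is padded to 64 and yields 32 coefficients.
constant_features = wavelet_features([5.0] * 60)
```

For a constant input every approximation coefficient equals the input value times √2, which gives a quick sanity check of the filter coefficients.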

3.3 Data collection protocol

The goal of data collection was to collect as much data as possible within a short period of time, as well as to obtain the pulse rate and oxygen saturation of a subject at different states of activity. To achieve this goal, a four-step protocol was followed in which each subject performs four scripted tasks of one minute each. The first step is called ‘baseline’; the subject does not perform any activity, and the step lets the subject get used to the device. In the second minute, the subject breathes deeply, inhaling through the nose and exhaling through the mouth. This type of breathing is also known as diaphragmatic or abdominal breathing. The third step is termed ‘activity’; the subject has to walk briskly. In the last step, the subject is asked to sit down and try to relax. A similar protocol, with slightly different task durations, was used in [3]. Figure 3.8 illustrates the pulse rate of a subject at the different steps of the data collection protocol.


Figure 3.8: Pulse rate of a subject at different steps of data collection.

3.4 Case formulation

To formulate a case, all features from the three domains (section 3.2) in the four sessions (section 3.3), plus the subject’s contextual information (age, weight, gender and blood pressure), were taken. Blood pressure was measured twice, once before and once after taking the measurements from the pulse oximeter. As a result, each case contained a total of 67 (4 × 15 + 7) features. More study is needed to find the correlation between a subject’s pulse rate and oxygen saturation; for this reason, the pulse rate and oxygen saturation data were treated as different sets of the same case and compared separately when similarity matching was done. An example case is illustrated in Figure 3.9 and Figure 3.10.
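The feature count can be checked with a short sketch that enumerates the feature names of one case. All identifiers below are hypothetical labels chosen for illustration. In particular, splitting each blood pressure measurement into a systolic and a diastolic value is an assumption that makes the seven contextual features add up; the text only states that blood pressure was measured twice.

```python
# Feature names per domain, following section 3.2 (labels are hypothetical).
TIME_DOMAIN = ["max", "min", "mean", "std"]
FREQUENCY_DOMAIN = [
    "lf_power", "hf_power", "lf_hf_ratio",
    "lf_peak_amplitude", "hf_peak_amplitude",
    "lf_peak_frequency", "hf_peak_frequency",
]
TIME_FREQUENCY_DOMAIN = ["wavelet_max", "wavelet_min", "wavelet_mean", "wavelet_std"]

# The four one-minute sessions of the protocol in section 3.3.
SESSIONS = ["baseline", "deep_breathing", "activity", "relaxation"]

# Contextual features; the systolic/diastolic split is an assumption.
CONTEXTUAL = [
    "age", "weight", "gender",
    "systolic_before", "diastolic_before",
    "systolic_after", "diastolic_after",
]


def case_feature_names():
    """All feature names of one case: 4 sessions x 15 features + 7 contextual."""
    per_session = TIME_DOMAIN + FREQUENCY_DOMAIN + TIME_FREQUENCY_DOMAIN
    names = [f"{session}.{feature}" for session in SESSIONS for feature in per_session]
    return names + CONTEXTUAL


feature_names = case_feature_names()
```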

3.5 Case retrieval

Figure 3.9: An example of a case formulated from pulse rate.

The nearest neighbor case is retrieved using a similarity measurement function. The similarity of a feature between two cases was measured using the normalized Euclidean distance between the feature values. A non-numeric feature like gender is converted to a numeric value (1 for male, 0 for female). The function to calculate the similarity of a feature between two cases is given in Equation 3.1.

$$\mathrm{sim}(T_i, S_i) = \frac{|T_i - S_i|}{\max(i) - \min(i)} \qquad (3.1)$$

where $T_i$ and $S_i$ are the values of the i-th feature of the target and source case respectively, and max(i) and min(i) are the maximum and minimum values of the i-th feature over all cases. Note that this ratio is 0 for identical feature values, whereas the similarity values reported in Chapter 4 use 1 for identical cases, which corresponds to taking one minus this ratio. The similarity between two cases was measured using the weighted average over all the features under consideration. The function for calculating the similarity between two cases is given in Equation 3.2.

$$\mathrm{sim}(T, S) = \frac{\sum_{i=1}^{n} w_i \times \mathrm{sim}(T_i, S_i)}{\sum_{i=1}^{n} w_i} \qquad (3.2)$$

where $w_i$ is the weight of feature i. In this thesis the weight of every feature under consideration was set to one, as domain expert knowledge was not available.
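A minimal sketch of the retrieval step is given below. It assumes that the per-feature similarity is one minus the normalized distance, so that identical values score 1, in line with the 0 to 1 similarity scale used in Chapter 4. The function names, the dictionary-based case representation and the example values are hypothetical.

```python
def feature_ranges(cases):
    """Minimum and maximum of each feature over all cases."""
    names = cases[0].keys()
    return {n: (min(c[n] for c in cases), max(c[n] for c in cases)) for n in names}


def feature_similarity(t, s, low, high):
    """One minus the normalized distance (assumption: 1 means identical values)."""
    if high == low:
        return 1.0  # the feature is constant over all cases
    return 1.0 - abs(t - s) / (high - low)


def case_similarity(target, source, ranges, weights=None):
    """Weighted average of the feature similarities (Equation 3.2)."""
    # Only features that appear in `weights` are considered; default weight is 1.
    weights = weights if weights is not None else {n: 1.0 for n in target}
    total = sum(w * feature_similarity(target[n], source[n], *ranges[n])
                for n, w in weights.items())
    return total / sum(weights.values())


def retrieve(target, case_base, weights=None, k=3):
    """Return the k most similar (case_id, similarity) pairs, best first."""
    ranges = feature_ranges(list(case_base.values()) + [target])
    scored = [(case_id, case_similarity(target, case, ranges, weights))
              for case_id, case in case_base.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Usage with two made-up features per case.
case_base = {
    "b": {"mean": 72.0, "std": 5.0},
    "c": {"mean": 90.0, "std": 9.0},
}
target = {"mean": 70.0, "std": 5.0}
ranked = retrieve(target, case_base)
```

Passing a `weights` dictionary that contains only some feature names restricts the comparison to those features, which mirrors the feature selection option of the analysis software.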


Chapter 4

Results Analysis

To build the initial case library, as well as to carry out the experimental work for testing the system, data were collected from 16 persons aged between 25 and 59. Table 4.1 contains the details of the subjects, including body mass index. Generally a case consists of a problem space and a solution space. In this thesis, the solution space of a case is empty, as no classification of the collected samples was done by physicians. To investigate whether there are patterns in an individual’s profile, the data of 8 of the subjects were collected more than once. The similarity measurement function of CBR returns a value between 0 and 1, where 1 stands for an identical case and 0 for a completely different case. The result section contains the similarity values of the cases of the 4 subjects that have 10 cases each in the case base.

Table 4.1: Details of the subjects

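| Subject | Gender | Age | Weight (kg) | Height (cm) | BMI |
|---|---|---|---|---|---|
| 3 | M | 28 | 63 | 177 | 20.1 |
| 6 | M | 26 | 62 | 166 | 22.5 |
| 7 | M | 27 | 70 | 170 | 24.2 |
| 8 | M | 28 | 85 | 178 | 26.8 |
| 9 | M | 30 | 70 | 180 | 21.6 |
| 10 | F | 57 | 75 | × | × |
| 11 | F | 59 | 60 | × | × |
| 12 | F | 50 | 70 | × | × |
| 13 | F | 55 | 65 | × | × |
| 14 | F | 30 | 67 | × | × |
| 15 | F | 45 | 66 | × | × |
| 16 | M | 25 | 58 | 166 | 21.0 |
| 17 | M | 25 | 59 | 166 | 21.4 |
| 18 | M | 28 | 63 | 164 | 23.4 |
| 19 | M | 25 | 80 | 171 | 27.4 |
| 20 | F | 27 | 58 | 165.50 | 21.2 |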


4.1 Nearest case retrieval

Software was developed (section 3.1.4) to retrieve similar cases using the selected features. A user can select a case and find out the similarity of the other cases to it. The retrieved cases are sorted in descending order of similarity so that the user can easily find the most similar cases. The software also includes an option to select which features are used to calculate the similarities between cases.

The graphical view of the software (illustrated in Figure 4.1) contains the graph of the target case and of its three nearest neighbor cases. The similarity of the cases is easier for the user to see in a graphical representation of the case data than in a textual view of the case features. As a result, the graphical view gives a rough indication of whether the retrieved cases are really similar to the target case.

Figure 4.1: Graphical view of the software to analyze cases.

The cases of the subjects with the most data were analyzed to find out how many times a case of the same subject becomes the nearest neighbor of each case. It was assumed that a data set of a subject collected at a certain time will not differ much from the data sets of the same person collected at other times, as all subjects follow the same scripted tasks during data collection. Similarities between the cases were measured using feature sets of the different domains (time, frequency, time-frequency) to observe the effect of each domain. The results obtained are presented in the tables of sections 4.1.1 and 4.1.2. In the results, only the highest ranked nearest neighbor is included; that is, if the system picks cases of the same subject as both first and second nearest neighbor, it is counted as first nearest neighbor only.

4.1.1 Results of nearest case retrieval of cases of pulse data

The outcome of the nearest case retrieval of pulse data cases for the 40 targeted cases of 4 persons is documented in Tables 4.2, 4.3 and 4.4.

Table 4.2: The number of times pulse data cases were retrieved from the cases of the same subject at the nth (n=1 to 5) nearest neighbor position out of 40 test cases when different feature sets were used

| Features used | # of cases retrieved at 1st nearest position | # of cases retrieved at 2nd nearest position | # of cases retrieved at 3rd nearest position | # of cases retrieved at 4th nearest position | # of cases retrieved at 5th nearest position |
|---|---|---|---|---|---|
| All | 32 | 2 | 1 | 1 | 1 |
| All but Contextual | 22 | 9 | 1 | 2 | 0 |
| Time domain | 22 | 9 | 2 | 2 | 1 |
| Frequency domain | 12 | 10 | 4 | 3 | 1 |
| Time-Frequency domain | 23 | 8 | 1 | 2 | 0 |

Table 4.3: Percentages of picking the cases of same subject as nearest neighbor cases of pulse data at different neighborhood sizes (k) and different sets of features

| Features used | k=1 | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|---|
| All | 80% | 85% | 87.5% | 90% | 92.5% |
| All but Contextual | 55% | 77.5% | 80% | 85% | 85% |
| Time domain | 55% | 77.5% | 82.5% | 87.5% | 90% |
| Frequency domain | 30% | 55% | 65% | 72.5% | 75% |
| Time-Frequency domain | 57.5% | 85% | 80% | 85% | 85% |

From the pulse data results in Table 4.2, it can be seen that out of 40 test cases, a case of the same subject was retrieved within the 5 nearest neighbors in 37 cases when all features were used, 34 cases when contextual features were excluded, 36 cases when only time domain features were used, 30 cases when only frequency domain features were used and 34 cases when only time-frequency domain features were used. It can be observed from Table 4.3 that, for the pulse data, the system retrieves cases of the same subject within the 5 nearest neighbors 92.5% of the time when all features are used, but when the subjects’ contextual features are excluded the rate of picking cases of the same subject as the target case drops to 85%. When the features of each domain were considered separately, the time domain features gave the best result. The retrieval rates for cases of the same subject were 90%, 75% and 85% for time domain, frequency domain and time-frequency domain features respectively.
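The relation between the per-rank counts of Table 4.2 and the percentages of Table 4.3 is a cumulative sum divided by the number of test cases. A small sketch, with a hypothetical function name:

```python
from itertools import accumulate


def cumulative_retrieval_rates(counts_by_rank, total_cases):
    """Percentage of test cases with a same-subject case within the k nearest neighbors."""
    return [100.0 * hits / total_cases for hits in accumulate(counts_by_rank)]


# "All" row of Table 4.2: counts at the 1st to 5th nearest position, 40 test cases.
rates_all_features = cumulative_retrieval_rates([32, 2, 1, 1, 1], 40)
# rates_all_features == [80.0, 85.0, 87.5, 90.0, 92.5], the "All" row of Table 4.3
```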

Table 4.4 shows the results of the nearest case retrieval for 40 cases of pulse data. In that experiment the contextual features (age, gender, weight, blood pressure) were not considered. For ‘subject 3’, it can be observed that only ‘case 11’ is somewhat far from the rest of the subject’s data; its nearest match was a case of another male of the same age. For ‘subject 6’, ‘case 4’, ‘case 5’ and ‘case 7’ are different from the rest of his data. Among these three cases, ‘case 7’ can be considered an anomaly, as it is very far from the rest of his data and its nearest match is the data of a 55-year-old female. For ‘subject 7’, ‘case 6’ could be considered an anomaly, as it is very far from the rest of his data and its nearest match is also a case of a 55-year-old female subject. For ‘subject 8’, it can be observed that no case is very far from the rest of his cases.


Table 4.4: Results of CBR of the nearest case retrieval of pulse data cases

| Case id | Subject id | Nearest neighbor of all cases: similarity | Nearest neighbor of all cases: subject id of the retrieved case | Nearest neighbor case of the same subject: similarity | Nearest neighbor case of the same subject: rank of the neighbor |
|---|---|---|---|---|---|
| 1 | 3 | 0.89622444 | 3 | 0.89622444 | 1 |
| 2 | 3 | 0.89622444 | 3 | 0.89622444 | 1 |
| 3 | 3 | 0.89196324 | 18 | 0.88664037 | 4 |
| 9 | 3 | 0.8716371 | 7 | 0.8663386 | 2 |
| 11 | 3 | 0.87961376 | 18 | 0.8533469 | 6 |
| 12 | 3 | 0.8547735 | 3 | 0.8547735 | 1 |
| 20 | 3 | 0.8547735 | 3 | 0.8547735 | 1 |
| 21 | 3 | 0.8896725 | 19 | 0.8713875 | 4 |
| 42 | 3 | 0.8764847 | 19 | 0.87357056 | 2 |
| 64 | 3 | 0.87357056 | 3 | 0.87357056 | 1 |
| 4 | 6 | 0.8929274 | 7 | 0.8266135 | 19 |
| 5 | 6 | 0.8916123 | 17 | 0.84520924 | 9 |
| 7 | 6 | 0.8902807 | 13 | 0.83529824 | 10 |
| 22 | 6 | 0.91714406 | 6 | 0.91714406 | 1 |
| 27 | 6 | 0.8723873 | 6 | 0.8723873 | 1 |
| 33 | 6 | 0.91332173 | 8 | 0.9054356 | 2 |
| 38 | 6 | 0.8873378 | 6 | 0.8873378 | 1 |
| 40 | 6 | 0.8873378 | 6 | 0.8873378 | 1 |
| 46 | 6 | 0.91714406 | 6 | 0.91714406 | 1 |
| 48 | 6 | 0.9116768 | 8 | 0.9054356 | 2 |
| 6 | 7 | 0.8469611 | 13 | 0.7958793 | 6 |
| 8 | 7 | 0.89087164 | 7 | 0.89087164 | 1 |
| 23 | 7 | 0.8688261 | 7 | 0.8688261 | 1 |
| 28 | 7 | 0.90541744 | 10 | 0.90497977 | 2 |
| 30 | 7 | 0.8833768 | 17 | 0.90497977 | 2 |
| 32 | 7 | 0.9246789 | 17 | 0.89897245 | 3 |
| 35 | 7 | 0.90572274 | 7 | 0.90572274 | 1 |
| 39 | 7 | 0.90572274 | 7 | 0.90572274 | 1 |
| 50 | 7 | 0.87628376 | 13 | 0.86452645 | 2 |
| 51 | 7 | 0.8791342 | 8 | 0.8607237 | 6 |
| 10 | 8 | 0.912681 | 8 | 0.912681 | 1 |
| 24 | 8 | 0.90613145 | 8 | 0.90613145 | 1 |
| 41 | 8 | 0.89518356 | 8 | 0.89518356 | 1 |
| 45 | 8 | 0.912681 | 8 | 0.912681 | 1 |
| 49 | 8 | 0.9163174 | 8 | 0.9163174 | 1 |
| 52 | 8 | 0.90613145 | 8 | 0.90613145 | 1 |
| 53 | 8 | 0.90027654 | 8 | 0.90027654 | 1 |
| 57 | 8 | 0.9163174 | 8 | 0.9163174 | 1 |
| 58 | 8 | 0.90425646 | 9 | 0.90389085 | 2 |
| 63 | 8 | 0.9103581 | 10 | 0.90566325 | 2 |


4.1.2 Results of nearest case retrieval of oxygenation cases

The outcome of the nearest case retrieval of oxygenation data cases for the 40 targeted cases of 4 persons is documented in Tables 4.5, 4.6 and 4.7.

Table 4.5: The number of times oxygen saturation cases were retrieved from the cases of the same subject at the nth (n=1 to 5) nearest neighbor position out of 40 test cases

| Features used | # of cases retrieved at 1st nearest position | # of cases retrieved at 2nd nearest position | # of cases retrieved at 3rd nearest position | # of cases retrieved at 4th nearest position | # of cases retrieved at 5th nearest position |
|---|---|---|---|---|---|
| All | 20 | 5 | 0 | 4 | 3 |
| All but Contextual | 12 | 6 | 4 | 2 | 1 |
| Time domain | 12 | 4 | 3 | 2 | 5 |
| Frequency domain | 6 | 7 | 7 | 3 | 3 |
| Time-Frequency domain | 12 | 5 | 4 | 0 | 4 |

Table 4.6: Percentages of picking the cases of same subject as nearest neighbor cases of oxygenation data at different neighborhood sizes (k) and different sets of features

| Features used | k=1 | k=2 | k=3 | k=4 | k=5 |
|---|---|---|---|---|---|
| All | 50% | 62.5% | 62.5% | 72.5% | 80% |
| All but Contextual | 30% | 55% | 60% | 85% | 62.5% |
| Time domain | 30% | 40% | 47.5% | 52.5% | 65% |
| Frequency domain | 15% | 32.5% | 50% | 57.5% | 65% |
| Time-Frequency domain | 30% | 42.5% | 52.5% | 52.5% | 62.5% |

From the oxygenation data results in Table 4.5, it can be seen that out of 40 test cases, a case of the same subject was retrieved within the 5 nearest neighbors in 32 cases when all features were used, 25 cases when contextual features were excluded, 26 cases when only time domain features were used, 26 cases when only frequency domain features were used and 25 cases when only time-frequency domain features were used. From Table 4.6, it can be observed that the retrieval rate for cases of the same subject within the five nearest neighbors was 80%, but it dropped to 62.5% when contextual features were excluded. The retrieval rate was 65% when either time domain or frequency domain features were used and 62.5% when time-frequency domain features were used.

Table 4.7 shows the results of the nearest case retrieval for 40 cases of oxygen saturation data. In that experiment the contextual features (age, gender, weight, blood pressure) were not considered. For ‘subject 3’, cases 1, 2, 3 and 11 are different from the rest of the cases, and ‘case 1’ could be considered an anomaly, as its closest match among the other cases of this subject has the lowest similarity. Inspecting that case (Figure 4.2), it was observed that at some point during data collection the oxygen saturation level was below 90%, which is truly an anomaly, as a blood oxygen level of less than 90 percent is considered abnormal. For ‘subject 6’, ‘case 4’ and ‘case 5’ are different from the rest of his data. For ‘subject 7’, ‘case 23’ and ‘case 32’ could be considered anomalies, as they are very far from the rest of his data. For ‘subject 8’, ‘case 58’ is the most different from the rest of his data, and its closest match is the data of a subject with a very different body mass index.


Table 4.7: Results of CBR of the nearest case retrieval of oxygen saturation data cases

| Case id | Subject id | Nearest neighbor of all cases: similarity | Nearest neighbor of all cases: subject id of the retrieved case | Nearest neighbor case of the same subject: similarity | Nearest neighbor case of the same subject: rank of the neighbor |
|---|---|---|---|---|---|
| 1 | 3 | 0.8915531 | 16 | 0.87227124 | 6 |
| 2 | 3 | 0.96119326 | 12 | 0.91429865 | 9 |
| 3 | 3 | 0.9311559 | 17 | 0.90141267 | 9 |
| 9 | 3 | 0.911606 | 6 | 0.8985948 | 2 |
| 11 | 3 | 0.93750995 | 16 | 0.91429865 | 10 |
| 12 | 3 | 0.9425295 | 3 | 0.9397215 | 2 |
| 20 | 3 | 0.9718714 | 3 | 0.9718714 | 1 |
| 21 | 3 | 0.9718714 | 3 | 0.9718714 | 1 |
| 42 | 3 | 0.9336297 | 19 | 0.9336297 | 2 |
| 64 | 3 | 0.9397215 | 3 | 0.9397215 | 1 |
| 4 | 6 | 0.95881146 | 8 | 0.8937622 | 19 |
| 5 | 6 | 0.92266977 | 3 | 0.8819374 | 10 |
| 7 | 6 | 0.89132154 | 8 | 0.8788708 | 2 |
| 22 | 6 | 0.92828774 | 8 | 0.8886192 | 9 |
| 27 | 6 | 0.94316417 | 8 | 0.93924785 | 2 |
| 33 | 6 | 0.93924785 | 6 | 0.93924785 | 1 |
| 38 | 6 | 0.96504796 | 8 | 0.95009106 | 2 |
| 40 | 6 | 0.9068299 | 18 | 0.87306654 | 4 |
| 46 | 6 | 0.8835546 | 18 | 0.8835546 | 7 |
| 48 | 6 | 0.9110712 | 6 | 0.9110712 | 1 |
| 6 | 7 | 0.9070576 | 11 | 0.89567643 | 2 |
| 8 | 7 | 0.9298797 | 7 | 0.9298797 | 1 |
| 23 | 7 | 0.8915241 | 16 | 0.78129363 | 40 |
| 28 | 7 | 0.9298797 | 7 | 0.9298797 | 1 |
| 30 | 7 | 0.9062441 | 7 | 0.9062441 | 1 |
| 32 | 7 | 0.917316 | 8 | 0.760276 | 42 |
| 35 | 7 | 0.9171369 | 19 | 0.9065314 | 3 |
| 39 | 7 | 0.8359213 | 6 | 0.8331242 | 2 |
| 50 | 7 | 0.8974729 | 6 | 0.88393044 | 4 |
| 51 | 7 | 0.9065314 | 7 | 0.9065314 | 1 |
| 10 | 8 | 0.92828774 | 6 | 0.8864837 | 7 |
| 24 | 8 | 0.93168396 | 18 | 0.86957335 | 5 |
| 41 | 8 | 0.96090865 | 8 | 0.96090865 | 1 |
| 45 | 8 | 0.94316417 | 6 | 0.89802897 | 11 |
| 49 | 8 | 0.9490119 | 17 | 0.9282778 | 3 |
| 52 | 8 | 0.96090865 | 8 | 0.96090865 | 1 |
| 53 | 8 | 0.96504796 | 6 | 0.96504796 | 6 |
| 57 | 8 | 0.9371015 | 17 | 0.92792976 | 3 |
| 58 | 8 | 0.93938 | 16 | 0.8898427 | 14 |
| 63 | 8 | 0.91697466 | 10 | 0.88008815 | 3 |


4.1.3 Comparison of results between pulse and oxygenation cases

For the oxygen saturation cases, the retrieval rate of cases of the same subject was lower than for the pulse cases. It can be seen from Figure 4.3 that the rates of picking a case of the same subject as the target case were 10-20% lower for oxygen saturation data than for pulse data. The reason can be seen in Figures 4.4 and 4.5, where all the pulse rate and oxygen saturation samples of a subject are plotted. It can be observed from those figures that many pulse rate samples follow a close trail, but this is not the case for the oxygen saturation samples. This means that the pulse rate of an individual tends to follow a certain pattern, which is the main reason why the rate of picking a case of the same subject is better for pulse rate than for oxygen saturation.

Figure 4.3: Percentage of picking the case of same subject as nearest neighbor case (correct identification of subjects) for pulse and oxygen saturation cases for different sets of features.

4.2 Comparison with SVM

To compare the performance of CBR with another machine learning technique, a Support Vector Machine (SVM) was implemented. As each subject performed the same scripted tasks during data collection, the subjects (persons) of the data samples can be treated as the classes. The experiment tested how many times SVM can pick the subject of a case when the case is supplied. It was done targeting 40 cases of the 4 subjects that have 10 cases each. To perform the test properly using different combinations of training and test sets, the data were divided into 4 sets termed set A, B, C and D, which are described in Table 4.8.

Figure 4.4: Plot of pulse rate of all the samples of a subject.

Figure 4.5: Plot of oxygen saturation of all the samples of a subject.

In data set A, the training set was constructed by taking half of the data samples (cases) of the targeted subjects plus the data samples (cases) of the rest of the subjects, and the test set was constructed by taking the other half of the data samples (cases) of the targeted subjects, which were not used in the training set. Set B was constructed by excluding the data samples of the non-targeted subjects from the training set of set A. Set C was constructed like set A, except that the data samples of the targeted subjects that were used in the training set were put into the test set and vice versa. Set D was formed by reversing the training and test sets of set B.

Table 4.8: Formulation of different experiment data sets

| Experiment set | Training set | Test set |
|---|---|---|
| Set A | 20 data samples of the 4 subjects that have 10 samples each + all the data samples of the rest of the 12 subjects | 20 data samples of the 4 subjects that have not been used during the training of experiment set A |
| Set B | 20 data samples of the 4 subjects that have 10 samples each | 20 data samples of the 4 subjects that have not been used during the training of experiment set B |
| Set C | 20 data samples of the 4 subjects that have 10 data samples each that were used in the test set of experiment sets A and B + all the data samples of the rest of the 12 subjects | 20 data samples of the 4 subjects that have not been used during the training of experiment set C |
| Set D | 20 data samples of the 4 subjects that have 10 data samples each that were used in the test set of experiment sets A and B | 20 data samples of the 4 subjects that have not been used during the training of experiment set C |

The data samples (cases) of the non-targeted subjects were excluded in sets B and D in order to observe how much this reduction affects the performance of the SVM and CBR systems. For SVM, four types of kernel functions were used: linear, quadratic, Gaussian and polynomial. The results of the tests are presented in detail below, compared with the performance of nearest neighbor case retrieval using the similarity measurement function of CBR.
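The construction of the four experiment sets can be sketched as follows. The function name, the tuple-based case representation and the toy data are hypothetical; which half of a targeted subject's cases goes into the training set of set A is supplied by the caller, since the text does not state how the halves were chosen.

```python
def build_experiment_sets(cases, targeted_subjects, first_half_ids):
    """Split cases into the training and test sets of experiment sets A to D.

    cases: list of (case_id, subject_id) pairs.
    targeted_subjects: ids of the subjects whose cases are tested.
    first_half_ids: case ids of targeted subjects used for training in set A.
    """
    targeted = [c for c in cases if c[1] in targeted_subjects]
    others = [c for c in cases if c[1] not in targeted_subjects]
    half_1 = [c for c in targeted if c[0] in first_half_ids]
    half_2 = [c for c in targeted if c[0] not in first_half_ids]
    return {
        "A": {"train": half_1 + others, "test": half_2},
        "B": {"train": half_1, "test": half_2},           # A without non-targeted data
        "C": {"train": half_2 + others, "test": half_1},  # A with the halves swapped
        "D": {"train": half_2, "test": half_1},           # B with the halves swapped
    }


# Toy usage: two targeted subjects with two cases each, one non-targeted subject.
toy_cases = [(1, 3), (2, 3), (4, 6), (5, 6), (99, 9)]
toy_sets = build_experiment_sets(toy_cases, targeted_subjects={3, 6}, first_half_ids={1, 4})
```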

4.2.1 Results of the experiment of the data set A of pulse cases

In data set ‘A’, data samples of non targeted subjects and half of the data of the targeted subjects were included for training SVM.


Table 4.9: Results of SVM subject classification of pulse cases for data set A

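The last four columns give the subject id picked using the different kernel functions of SVM.

| Case id | Subject id | Linear | Quadratic | Gaussian | Polynomial |
|---|---|---|---|---|---|
| 2 | 3 | 18 | 18 | 18 | 18 |
| 9 | 3 | 3 | 20 | 3 | 17 |
| 12 | 3 | 7 | 17 | 18 | 7 |
| 21 | 3 | 18 | 7 | 3 | 6 |
| 64 | 3 | 3 | 6 | 3 | 6 |
| 5 | 6 | 7 | 17 | 17 | 17 |
| 22 | 6 | 6 | 19 | 6 | 16 |
| 33 | 6 | 6 | 16 | 16 | 16 |
| 40 | 6 | 6 | 6 | 6 | 6 |
| 48 | 6 | 16 | 16 | 8 | 16 |
| 8 | 7 | 7 | 17 | 8 | 17 |
| 28 | 7 | 17 | 17 | 17 | 17 |
| 32 | 7 | 18 | 17 | 16 | 17 |
| 39 | 7 | 8 | 16 | 7 | 16 |
| 51 | 7 | 8 | 16 | 16 | 16 |
| 24 | 8 | 16 | 8 | 8 | 8 |
| 45 | 8 | 8 | 17 | 8 | 16 |
| 52 | 8 | 8 | 8 | 8 | 8 |
| 57 | 8 | 8 | 8 | 8 | 8 |
| 63 | 8 | 8 | 8 | 8 | 8 |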


Table 4.10: Results of CBR subject classification of pulse cases for data set A

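| Case id | Subject id | Nearest neighbor of all cases: similarity | Nearest neighbor of all cases: subject id of the retrieved case | Nearest neighbor case of the same subject: similarity | Nearest neighbor case of the same subject: rank of the neighbor |
|---|---|---|---|---|---|
| 2 | 3 | 0.90453696 | 3 | 0.90453696 | 1 |
| 9 | 3 | 0.8784376 | 3 | 0.8784376 | 1 |
| 12 | 3 | 0.8699464 | 3 | 0.8699464 | 1 |
| 21 | 3 | 0.8926932 | 6 | 0.88482463 | 2 |
| 64 | 3 | 0.8867796 | 3 | 0.8867796 | 1 |
| 5 | 6 | 0.89476126 | 17 | 0.85716933 | 6 |
| 22 | 6 | 0.91972286 | 6 | 0.91972286 | 1 |
| 33 | 6 | 0.91158414 | 6 | 0.91158414 | 1 |
| 40 | 6 | 0.89910847 | 6 | 0.89910847 | 1 |
| 48 | 6 | 0.91158414 | 6 | 0.91158414 | 1 |
| 8 | 7 | 0.9022731 | 7 | 0.9022731 | 1 |
| 28 | 7 | 0.9149073 | 7 | 0.9149073 | 1 |
| 32 | 7 | 0.9226424 | 17 | 0.90952754 | 2 |
| 39 | 7 | 0.9094948 | 7 | 0.9094948 | 1 |
| 51 | 7 | 0.8833768 | 6 | 0.87527496 | 4 |
| 24 | 8 | 0.9159386 | 8 | 0.9159386 | 1 |
| 45 | 8 | 0.9193934 | 8 | 0.9193934 | 1 |
| 52 | 8 | 0.9159386 | 8 | 0.9159386 | 1 |
| 57 | 8 | 0.9250604 | 8 | 0.9250604 | 1 |
| 63 | 8 | 0.9155193 | 8 | 0.9155193 | 1 |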

Tables 4.9 and 4.10 show the performance of SVM and CBR case retrieval for data set A of the pulse cases. From Table 4.9, it can be seen that out of the 20 test cases of data set A, SVM picked the right subject 11 times when the linear and Gaussian functions were used as kernel. On the other hand, Table 4.10 shows that CBR picked a case of the same subject 16 times as the first nearest neighbor and 19 times within the 5 nearest neighbors.

4.2.2 Results of the experiment of the data set B of pulse cases

In data set ‘B’, data samples of the non targeted subjects of data set ‘A’ were excluded, so only half of the data of the targeted subjects were included for training SVM.


Table 4.11: Results of SVM subject classification of pulse cases for data set B

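The last four columns give the subject id picked using the different kernel functions of SVM.

| Case id | Subject id | Linear | Quadratic | Gaussian | Polynomial |
|---|---|---|---|---|---|
| 2 | 3 | 3 | 3 | 3 | 3 |
| 9 | 3 | 3 | 3 | 3 | 7 |
| 12 | 3 | 7 | 8 | 7 | 3 |
| 21 | 3 | 6 | 7 | 3 | 7 |
| 64 | 3 | 3 | 6 | 3 | 3 |
| 5 | 6 | 7 | 6 | 7 | 7 |
| 22 | 6 | 6 | 3 | 6 | 6 |
| 33 | 6 | 6 | 6 | 7 | 6 |
| 40 | 6 | 6 | 7 | 6 | 6 |
| 48 | 6 | 6 | 7 | 7 | 7 |
| 8 | 7 | 7 | 7 | 8 | 7 |
| 28 | 7 | 7 | 7 | 8 | 7 |
| 32 | 7 | 7 | 7 | 7 | 7 |
| 39 | 7 | 7 | 7 | 6 | 7 |
| 51 | 7 | 8 | 7 | 7 | 7 |
| 24 | 8 | 8 | 8 | 8 | 8 |
| 45 | 8 | 8 | 8 | 8 | 8 |
| 52 | 8 | 8 | 8 | 8 | 8 |
| 57 | 8 | 8 | 8 | 8 | 8 |
| 63 | 8 | 8 | 8 | 8 | 8 |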


Table 4.12: Results of CBR subject identification of pulse cases for data set B


Tables 4.11 and 4.12 show the performance of SVM and CBR case retrieval for data set B of the pulse cases. From Table 4.11, it can be seen that out of the 20 test cases of data set B, SVM picked the right subject 17 times and 12 times when the linear and Gaussian functions were used as kernel respectively. On the other hand, Table 4.12 shows that CBR picked a case of the same subject 14 times as the first nearest neighbor and 19 times within the 5 nearest neighbors.

4.2.3 Results of the experiment of the data set C of pulse cases

In data set ‘C’, data samples of non-targeted subjects and half of the data of the targeted subjects were included for training SVM. The difference between data sets ‘A’ and ‘C’ is that the training data samples of the targeted subjects in data set ‘A’ are used for testing and vice versa.


Table 4.13: Results of SVM subject classification of pulse cases for data set C

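The last four columns give the subject id picked using the different kernel functions of SVM.

| Case id | Subject id | Linear | Quadratic | Gaussian | Polynomial |
|---|---|---|---|---|---|
| 1 | 3 | 16 | 18 | 3 | 7 |
| 3 | 3 | 3 | 3 | 3 | 7 |
| 11 | 3 | 3 | 17 | 17 | 17 |
| 20 | 3 | 3 | 3 | 3 | 3 |
| 42 | 3 | 3 | 3 | 3 | 3 |
| 4 | 6 | 17 | 17 | 17 | 16 |
| 7 | 6 | 3 | 3 | 3 | 3 |
| 27 | 6 | 6 | 6 | 6 | 6 |
| 38 | 6 | 6 | 17 | 6 | 16 |
| 46 | 6 | 6 | 6 | 6 | 6 |
| 6 | 7 | 7 | 7 | 19 | 3 |
| 23 | 7 | 7 | 7 | 7 | 7 |
| 30 | 7 | 7 | 7 | 7 | 16 |
| 35 | 7 | 7 | 7 | 7 | 17 |
| 50 | 7 | 7 | 16 | 18 | 17 |
| 10 | 8 | 8 | 8 | 8 | 17 |
| 41 | 8 | 8 | 8 | 8 | 18 |
| 49 | 8 | 8 | 8 | 8 | 8 |
| 53 | 8 | 8 | 17 | 7 | 17 |
| 58 | 8 | 8 | 8 | 8 | 8 |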


Table 4.14: Results of CBR subject identification of pulse cases for data set C


Tables 4.13 and 4.14 show the performance of SVM and CBR case retrieval for data set C of the pulse cases. From Table 4.13, it can be seen that out of the 20 test cases of data set C, SVM picked the right subject 17 and 13 times when the linear and Gaussian functions were used as kernel respectively. On the other hand, Table 4.14 shows that CBR picked a case of the same subject 16 times as the first nearest neighbor and 18 times within the 5 nearest neighbors.

4.2.4 Results of the experiment of the data set D of pulse cases

In data set ‘D’, data samples of the non targeted subjects of data set ‘C’ were excluded, so only half of the data of the targeted subjects were included for training SVM.


Table 4.15: Results of SVM subject classification of pulse cases for data set D

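The last four columns give the subject id picked using the different kernel functions of SVM.

| Case id | Subject id | Linear | Quadratic | Gaussian | Polynomial |
|---|---|---|---|---|---|
| 1 | 3 | 3 | 3 | 3 | 7 |
| 3 | 3 | 3 | 3 | 3 | 3 |
| 11 | 3 | 3 | 7 | 3 | 7 |
| 20 | 3 | 3 | 3 | 3 | 3 |
| 42 | 3 | 3 | 3 | 3 | 3 |
| 4 | 6 | 6 | 7 | 6 | 7 |
| 7 | 6 | 3 | 3 | 3 | 3 |
| 27 | 6 | 6 | 6 | 6 | 6 |
| 38 | 6 | 6 | 6 | 6 | 7 |
| 46 | 6 | 6 | 6 | 6 | 6 |
| 6 | 7 | 3 | 3 | 3 | 3 |
| 23 | 7 | 7 | 7 | 7 | 7 |
| 30 | 7 | 7 | 7 | 7 | 7 |
| 35 | 7 | 7 | 7 | 3 | 7 |
| 50 | 7 | 7 | 7 | 3 | 7 |
| 10 | 8 | 8 | 8 | 8 | 8 |
| 41 | 8 | 7 | 8 | 8 | 8 |
| 49 | 8 | 8 | 8 | 8 | 8 |
| 53 | 8 | 8 | 8 | 7 | 8 |
| 58 | 8 | 8 | 8 | 8 | 8 |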


Table 4.16: Results of CBR subject identification of pulse cases for data set D


Tables 4.15 and 4.16 show the performance of SVM and CBR case retrieval for data set ‘D’ of the pulse cases. From Table 4.15, it can be seen that out of the 20 test cases of data set ‘D’, SVM picked the right subject 17 times and 15 times when the linear and Gaussian functions were used as kernel respectively. On the other hand, Table 4.16 shows that CBR picked a case of the same subject 17 times as the first nearest neighbor and 20 times within the 5 nearest neighbors.

4.2.5 Results of the experiment of the data set A of oxygen saturation cases

In data set ‘A’, data samples of non targeted subjects and half of the data of the targeted subjects were included for training SVM.


Table 4.17: Results of SVM subject classification of oxygenation cases for data set A

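The last four columns give the subject id picked using the different kernel functions of SVM.

| Case id | Subject id | Linear | Quadratic | Gaussian | Polynomial |
|---|---|---|---|---|---|
| 2 | 3 | 3 | 6 | 6 | 16 |
| 9 | 3 | 7 | 3 | 7 | 9 |
| 12 | 3 | 17 | 16 | 3 | 16 |
| 21 | 3 | 3 | 3 | 3 | 3 |
| 64 | 3 | 17 | 16 | 17 | 16 |
| 5 | 6 | 3 | 3 | 3 | 3 |
| 22 | 6 | 9 | 9 | 9 | 9 |
| 33 | 6 | 6 | 6 | 6 | 6 |
| 40 | 6 | 7 | 6 | 7 | 7 |
| 48 | 6 | 7 | 3 | 16 | 6 |
| 8 | 7 | 18 | 8 | 7 | 7 |
| 28 | 7 | 17 | 8 | 7 | 7 |
| 32 | 7 | 17 | 3 | 3 | 17 |
| 39 | 7 | 7 | 6 | 7 | 7 |
| 51 | 7 | 7 | 6 | 7 | 6 |
| 24 | 8 | 8 | 19 | 18 | 17 |
| 45 | 8 | 8 | 17 | 16 | 16 |
| 52 | 8 | 8 | 8 | 8 | 8 |
| 57 | 8 | 8 | 8 | 8 | 8 |
| 63 | 8 | 10 | 14 | 19 | 16 |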
