Using a Smartphone to Detect the Standing-to-Kneeling and Kneeling-to-Standing Postural Transitions


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018

Using a Smartphone to Detect the Standing-to-Kneeling and Kneeling-to-Standing Postural Transitions

DAN SETTERQUIST

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Using a Smartphone to Detect the Standing-to-Kneeling and Kneeling-to-Standing Postural Transitions

DAN SETTERQUIST

Master in Computer Science

Date: April 7, 2018

Supervisor: Hedvig Kjellström (KTH), Eric Schmidt (Bontouch AB)

Examiner: Olov Engwall

Swedish title: Smartphone-baserad detektion av posturala övergångar mellan stående och knästående ställning

School of Computer Science and Communication


Abstract

In this report we investigate how well a smartphone can be used to detect the standing-to-kneeling and kneeling-to-standing postural transitions. Possible applications include measuring the time spent kneeling in groups of workers prone to knee-straining work.

Accelerometer and gyroscope data were recorded from a group of 10 volunteers while they performed a set of postural transitions according to an experimental script. The set included the standing-to-kneeling and kneeling-to-standing transitions, in addition to a selection of transitions common in knee-straining occupations.

Using the recorded video, the data was labeled and segmented into a data set consisting of 3-second sensor-data segments in 9 different classes.

The classification performance of a number of different LSTM networks was evaluated on the data set. When evaluated in a user-specific setting, the best network achieved an overall classification accuracy of 89.4 %. The network achieved precision 0.982 and recall 0.917 for the standing-to-kneeling transitions, and precision 0.900 and recall 0.900 for the kneeling-to-standing transitions.

When the same network was evaluated in a user-independent setting it achieved an overall accuracy of 66.3 %, with precision 0.720 and recall 0.746 for the standing-to-kneeling transitions, and precision 0.707 and recall 0.604 for the kneeling-to-standing transitions.

The network was also evaluated in a setting where only accelerometer data was used. Its performance was similar to that achieved when using data from both the accelerometer and the gyroscope.

The classification speed of the network was evaluated on a smartphone. On a Samsung Galaxy S7 the average time needed to perform one classification was 38.5 milliseconds. The classification can therefore be done in real time.


Sammanfattning

This report investigates the possibility of using a smartphone to detect postural transitions between standing and kneeling. One possible application of such detection is measuring the amount of time spent kneeling in certain occupational groups.

Accelerometer and gyroscope data were recorded from a group of 10 volunteers while they performed a set of postural transitions, including transitions from standing to kneeling and from kneeling to standing. By reviewing video recorded during the experiments, portions of the recorded data were marked as belonging to particular postural transitions. The data was segmented, yielding a data set consisting of 3-second segments of sensor data in 9 different classes.

The performance of a number of different LSTM networks was evaluated on the data set. The best network achieved an overall accuracy of 89.4 % when evaluated in a user-specific setting, with precision 0.982 and recall 0.917 for the standing-to-kneeling transitions, and precision 0.900 and recall 0.900 for the kneeling-to-standing transitions.

When the same network was evaluated in a user-independent setting it achieved an overall accuracy of 66.3 %, with precision 0.720 and recall 0.746 for the standing-to-kneeling transitions, and precision 0.707 and recall 0.604 for the kneeling-to-standing transitions.

The network was also evaluated in a configuration where only accelerometer data was used, and achieved performance similar to that obtained when both accelerometer and gyroscope data were used.

The classification speed of the network was evaluated on a smartphone. When the classification was performed on a Samsung Galaxy S7 the average running time was 38.5 milliseconds, which is fast enough for real-time use.


Acknowledgements

First, I would like to thank Eric Schmidt, for being my supervisor at Bontouch and providing excellent feedback and support for my work.

Secondly, I am very grateful to Hedvig Kjellström and Olov Engwall for providing academic guidance as well as being very responsive to my questions.

Lastly, I would like to thank my good friend David Rydberg for proofreading this report and cheering me on.

Stockholm, April 2018

Dan Setterquist


Contents

1 Introduction
1.1 Research Question
1.2 Motivation
1.3 Bontouch AB
1.4 Delimitations
1.5 Ethical, Societal and Sustainability Aspects
1.6 Report Outline

2 Previous Research
2.1 HAR with Feature Extraction
2.2 HAR End-to-End

3 Theory
3.1 Smartphone Sensors
3.2 Machine Learning
3.2.1 Classification and Supervised Learning
3.2.2 Cost Function
3.2.3 Gradient Descent
3.2.4 Stochastic Gradient Descent
3.2.5 RMSprop
3.3 Neural Networks
3.4 Recurrent Neural Networks
3.5 Long Short-Term Memory
3.6 Common Architectural Components
3.6.1 Rectified Linear Units
3.6.2 Softmax Activation
3.6.3 Categorical Cross Entropy
3.6.4 Dropout
3.7 Model Evaluation
3.7.1 Data Splitting
3.7.2 Confusion Matrix
3.7.3 Accuracy and Error
3.7.4 Precision and Recall
3.8 TensorFlow and Keras

4 Data Collection
4.1 Design
4.1.1 Data Collection App
4.1.2 Experimental Procedure
4.1.3 Delimitations
4.1.4 Choice of Phone
4.1.5 Placement of Phone
4.1.6 Synchronization
4.1.7 Labeling
4.1.8 Extraction
4.2 Results
4.2.1 Labeling
4.2.2 Extraction
4.2.3 Data Set

5 Classification
5.1 Experiments
5.1.1 Experiment 1
5.1.2 Experiment 2
5.1.3 Experiment 3
5.1.4 Experiment 4
5.1.5 Experiment 5

6 Discussion
6.1 Classification
6.1.1 Experimental Results
6.1.2 Sample Context
6.2 Data Collection
6.3 Future Work

7 Conclusions

Bibliography


List of Figures

3.1 The sensor coordinate system of an Android smartphone.
3.2 The model of a neuron.
3.3 A Multilayer Perceptron (MLP) with two input nodes, two hidden layers and one output node.
3.4 An LSTM unit.
4.1 The placement of the phone during data collection.
4.2 a) A volunteer using a screwdriver while kneeling. b) A volunteer using a carpenter's ruler while squatting.
4.3 Box plot of the marked lengths of the transitions in each class.
4.4 The marked length of each standing-to-kneeling and kneeling-to-standing transition.
4.5 A standing-to-kneeling transition extracted at 3 different window locations.
5.1 The data set splits for the experiments.
5.2 The confusion matrix of the LSTM(256) model on the test set.
5.3 The combined confusion matrix of the runs of experiment 2.
5.4 The combined confusion matrix of the runs of experiment 3.


List of Tables

5.1 The performance of the models evaluated in the grid search.
5.2 The performance of the LSTM(256) model when trained with learning rate tuning.
5.3 The performance of the LSTM(256) model on the test set.
5.4 The performance of the LSTM(256) models in experiment 2.
5.5 The performance of the LSTM(256) models in experiment 3.
5.6 The performance of the LSTM(256) model on the test set when using only accelerometer data.

Glossary

User-independent system A user-independent HAR system is a HAR system that is not tuned for any specific group of users.

User-specific system A user-specific HAR system is a HAR system that is tuned to a specific group of users. During the training of the system, training data from the targeted users is included.

Mathematical Notation

⊙ Element-wise multiplication.



Acronyms

API Application Programming Interface.

ARC Activity Recognition Chain.

HAR Human Activity Recognition.

LSTM Long Short-Term Memory.

MLP Multilayer Perceptron.

ReLU Rectified Linear Unit.

RNN Recurrent Neural Network.

SGD Stochastic Gradient Descent.


Chapter 1

Introduction

Occupations that require workers to spend a significant amount of time kneeling are associated with knee problems. Examples of such occupations are floor and carpet layers and carpenters. In [1] Jensen and Eenberg perform a literature review of studies investigating links between kneeling, squatting or heavy physical work and knee disorders. They review 16 studies investigating knee osteoarthrosis, 5 studies investigating knee bursitis, and 3 studies investigating meniscal lesions. All 16 studies showed a significantly increased risk of knee osteoarthrosis in subjects with kneeling or squatting work. Knee osteoarthrosis is a deterioration of the knee joint which can cause pain and debilitation [2]. Further, Jensen and Eenberg found that all 5 studies showed an increased risk of knee bursitis in subjects with kneeling work. Knee bursitis is an inflammation of the bursa of the knee and causes pain [2]. The links between kneeling work and knee problems have also been shown in later studies, for instance in [3] and [4].

People with knee-straining occupations could therefore benefit from better support. Previously, work within the field of Human Activity Recognition (HAR) has been used to support people in various applications. HAR is a field within computer science concerned with detecting the actions and intentions of users. The detection is usually based on some sensor data, for instance a video stream of the user or data from some sensor worn by the user. The sensor data is analyzed and interpreted to determine what actions the user is performing. The goal of the activity recognition is often to assist the user with what they are doing or trying to accomplish. HAR is therefore related to fields such as ubiquitous computing, human-computer interaction and artificial intelligence. [5]

HAR research has many applications, for example in medicine and security. An example application of HAR is the detection of falls in the elderly. Falls are a major cause of trauma and death in older people [6]. By automatically detecting these events the response time can be reduced and help can arrive faster. A similar and current application of HAR is the Hövding bicycle helmet. The helmet has an inflatable airbag that triggers when a fall or accident is detected, protecting the wearer [7].

This report describes a thesis work within the HAR field aimed at helping workers with knee-straining occupations. The research question and motivation for this thesis are described next.

1.1 Research Question

The basis for this thesis is the following research question: How reliably can a smartphone be used to detect the standing-to-kneeling and kneeling-to-standing postural transitions?

The research question is evaluated by implementing a prototype system which runs on a smartphone and detects these postural transitions. This work also aims to answer questions regarding some design decisions of such a system, showing how it is best implemented in practice.

This project required a data set containing smartphone sensor data from the kneeling postural transitions. An extensive search at the start of the project revealed no such data available for the research. A major effort and contribution of this work was therefore the construction of such a data set. The capture and construction of this data set are described in Chapter 4.


1.2 Motivation

A system capable of detecting the standing-to-kneeling and kneeling-to-standing postural transitions has many use cases. By detecting these postural transitions it is possible to estimate the amount of time a worker spends kneeling. This statistic can help determine if workers are spending excessive amounts of time kneeling and if some workers are kneeling disproportionately more than others. This can help inform whether some workers would benefit from more support in their work.

Workers with knee-straining work often wear knee pads to protect their knees. These knee pads wear out with time and need to be replaced. Measuring the time a worker spends kneeling can therefore also help determine how often a worker's knee pads need to be replaced.

The main motivation for this work is a previous system that was developed to solve this problem. The previous system is able to detect the postural transitions quite accurately when operational, but suffers from a number of drawbacks. Some of the most important drawbacks of this system are:

• It adds additional costs to the knee pads.

• It uses external hardware sensors with batteries that need to be charged.

• The underlying hardware is experimental and can be somewhat unreliable.

Based on these limitations it was proposed that it might be possible to detect these postural transitions with the sensors of a modern smartphone. A system based on a smartphone would have a number of benefits:

• Since many people already own smartphones the additional costs would be low.

• By leveraging the users’ own phones there is no need to keep track of and charge additional sensors or hardware.

• Packaging the system as an app makes it easy to deploy and maintain.


1.3 Bontouch AB

This thesis work was commissioned by Bontouch AB, a company focusing on mobile application development founded in 2007. The products created by Bontouch AB are used by over 50 million people in 196 countries. This thesis work was performed at Bontouch AB's Stockholm office during the period August 2017 to January 2018.

1.4 Delimitations

Some delimitations were made in advance to limit the scope of the project.

The research was limited to the Android platform. Android was chosen because it is an open-source platform with a large market share, supported by several phone manufacturers.

In this project a single phone model was used for experimentation. Because the underlying sensor hardware is similar across platforms and phone models, the results of this project should still be fairly general.

The data needed in this project was collected from a number of volunteers in a controlled environment. This project is therefore limited to this type of data. The data collection was subject to several delimitations which are discussed in Chapter 4.

1.5 Ethical, Societal and Sustainability Aspects

Care has been taken to present the contents of this report in an ethical and unbiased way. The goal was to produce a scientific report with reproducible results. In this section this work is put into a larger context and some broad potential consequences of the project are discussed.


This work is closely related to the topic of automatically monitoring people's actions and behaviors. It is clear that technology in this field could be used for unethical purposes. Even if the application of detecting when someone is kneeling down might have a low risk of abuse, the techniques are general enough to be used for other types of monitoring. Especially when these techniques are used together with smartphones, the potential for large-scale abuse becomes elevated. Research in this area could therefore be seen as problematic. However, the case for this type of research is still strong. As mentioned above, this type of research has many positive applications and can greatly help people in their everyday lives. The hope is that these benefits outweigh the risks.

Even though detecting kneeling would seem like a type of monitoring with low risk, it is still easy to come up with ways in which this data could be used for ethically questionable purposes. As mentioned previously, kneeling work is linked to knee problems, and data on a subject's kneeling behavior could therefore potentially be used to predict future knee problems. This information could be of interest to companies offering health insurance, which could use such data to raise health care premiums for people who spend significant amounts of time kneeling. This type of monitoring is not unprecedented. In the US several health insurers have started offering financial incentives for people to stay physically active, using wearable fitness trackers to ensure that certain fitness goals are reached. One such program is UnitedHealthcare Motion® by UnitedHealthcare. The following quote from the website of UnitedHealthcare Motion [8] is a general description of the program:

Using wearable devices, UnitedHealthcare Motion helps participants track steps, set goals and earn financial incentives when they reach daily walking targets.

A later quote describes the goals the participants need to achieve to get the rewards:

Every day participants seek to achieve the three FIT goals.

• Frequency: 500 steps in 7 minutes; six times a day, at least one hour apart ($1.50 per day/$1.00)

• Intensity: 3,000 steps in 30 minutes ($1.25 per day/$1.00)


• Tenacity: 10,000 steps in one day ($1.25 per day/$1.00)

This program and similar programs by other health insurers show that this type of data can be valuable to these companies, and that these companies use techniques similar to those used in this project to adjust pricing.

In general, any work that supports workers and helps them perform their jobs in a safer manner can have positive impacts on society. The European Agency for Safety and Health at Work estimates that work-related accidents and injuries cost the EU €476 billion annually [9]. Because of these costs there is a need to use human resources in a smarter and more sustainable way. Reducing work-related injuries would be beneficial both to reduce the number of sick days, increasing productivity, and to reduce the cost of and burden on the health care system.

This project has some environmental aspects worth noting. The previous system used to detect kneeling, described briefly in Section 1.2, uses external hardware sensors. Manufacturing and shipping these sensors has an environmental cost. Manufacturing smartphones also comes at a cost, but a smartphone is a product that many people already own. From a sustainability perspective there might therefore be benefits to using these phones opportunistically instead of creating new hardware.

To perform HAR using a phone, the phone's sensors need to be continuously sampled. This will likely increase the battery consumption of the phone, and therefore its environmental footprint. This extra battery consumption seems unavoidable when using a phone to perform HAR, but care can be taken to reduce the consumption as much as possible, for example by limiting the number of sensors used.

In this project data was collected from a number of volunteers. All of the volunteers consented to be a part of this project. In this report the volunteers are anonymized and only described in general terms.


1.6 Report Outline

Below is a short outline of this report and a description of the contents of each chapter.

• Chapter 1, Introduction, gives a brief introduction to the problem domain, states the research question of this report, and discusses the delimitations and ethical considerations of this project.

• Chapter 2, Previous Research, reviews a number of recent research papers which are relevant to this project and traces the progress of the HAR field over the last decade. The reviewed research is the basis for the method selection of the project.

• Chapter 3, Theory, contains a review of necessary background information. It focuses on the sensors and machine learning techniques used in this project.

• Chapter 4, Data Collection, describes the data collection that was done in this project. It describes the experimental procedure, the labeling of the data, the resulting data set and the delimitations of the data collection.

• Chapter 5, Classification, describes the classification experiments that were performed on the data set described in Chapter 4, and the results of these experiments.

• Chapter 6, Discussion, contains a discussion of the results presented in Chapter 5, as well as a general discussion of the project.

• Chapter 7, Conclusions, restates the conclusions drawn from the results in brief form.


Chapter 2

Previous Research

In this chapter a summary of a number of recent research papers within the HAR field is presented. The papers are closely related to the problem of this report and form the basis of the method selection of this thesis.

2.1 HAR with Feature Extraction

A well-referenced HAR study is that by Bao and Intille [10]. In the study 20 subjects wore 5 biaxial accelerometers as they performed 20 different activities. The accelerometers were placed on the right hip, right wrist, right ankle, left upper arm and above the left knee. The 20 activities were selected to involve different parts of the body and to differ in intensity, and included walking, running, watching TV, reading and vacuuming. Several classifiers were tested on the data set. Results for recognizing the activities were good, with decision tree classifiers having the best performance at an overall accuracy of 84 %.

The work of Bao and Intille has been cited numerous times. Previous works had mostly focused on data collected in controlled laboratory settings. Bao and Intille had their subjects collect and label their data on their own. This allowed their subjects to perform the actions under ordinary circumstances. It was one of the first works that showed that it is possible to perform HAR and get good classification results with naturalistic accelerometer data.


Earlier work within the HAR field, such as that by Bao and Intille, often used standalone sensors. When modern smartphones started including sensors such as accelerometers and gyroscopes, a lot of interest was shown in using these phones in HAR research. Two larger studies investigating the use of phones for HAR were done by Kwapisz et al. [11] and Ortiz [12], which are described next.

In [11] Kwapisz et al. perform activity recognition using Android phones. Accelerometer data was collected from phones placed in the pockets of 29 volunteers while they performed 6 different activities: sitting, standing, walking, jogging, ascending stairs and descending stairs. The data was split into 10-second segments which were turned into feature vectors by extracting 43 features from each segment. Three different classifiers were tested on the data. An MLP model had the best overall performance with an accuracy of 91.7 %.

In [12] Ortiz investigates the use of a smartphone for HAR. A group of 30 volunteers performed 6 different activities while wearing a smartphone on a belt around the waist. Data was collected from the accelerometer and the gyroscope of the phone. The performed activities were sitting, standing, lying down, walking, ascending stairs and descending stairs. The data was segmented and each segment was turned into a feature vector by extracting features from both the time and frequency domains of the signals. A number of different classifiers were evaluated on the data set. The best performing model was a support vector machine, which had an accuracy of 96.5 % when using data from both sensors and features from both domains.
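The feature-extraction approach used in these studies can be sketched in a few lines. The exact feature sets of Kwapisz et al. and Ortiz are not reproduced here; the window size, sampling rate and feature choices below are illustrative assumptions only.

```python
import numpy as np

def extract_features(window):
    """Turn one sensor-data window of shape (n_samples, n_channels)
    into a flat feature vector with simple time-domain features and
    one frequency-domain feature per channel."""
    feats = []
    for ch in window.T:  # iterate over channels
        # Time-domain features.
        feats += [ch.mean(), ch.std(), ch.min(), ch.max()]
        # Frequency-domain feature: total spectral energy of the
        # mean-removed signal, computed with the real FFT.
        spectrum = np.abs(np.fft.rfft(ch - ch.mean()))
        feats.append(float(np.sum(spectrum ** 2)))
    return np.array(feats)

# A hypothetical 3-second window sampled at 50 Hz with 6 channels
# (3 accelerometer axes + 3 gyroscope axes).
window = np.random.randn(150, 6)
fv = extract_features(window)
print(fv.shape)  # (30,): 5 features per channel
```

A vector like this, rather than the raw signal, is what a classical classifier such as an MLP or a support vector machine is then trained on.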

The methods of Bao and Intille, Kwapisz et al. and Ortiz have a similar structure and use the same techniques. These techniques were formalized and described in depth by Bulling et al. In [5] they present the Activity Recognition Chain (ARC), a general framework for performing HAR. The ARC has several steps, of which the most important are listed below.

1. Sensor data acquisition During sensor data acquisition the labeled sensor data that will form the basis for the HAR project is collected.

2. Data segmentation In the data segmentation step the collected sensor data is segmented into smaller parts. This is often done using a sliding window. The window is moved along the time dimension of the data and at each position a segment of constant length is extracted.

3. Feature extraction Each segment is turned into a feature vector by extracting a set of features from the signals. Several different types of features have been used and evaluated in HAR research.

4. Training The extracted feature vectors are used in the training step of a classifier. Once the data segments have been turned into feature vectors any standard machine learning classifier can be trained on the data.

5. Evaluation Evaluation of the trained classifier can be done using standard machine learning evaluation techniques. Several different classifiers have been used and evaluated in HAR research.

Note that the methods of Bao and Intille, Kwapisz et al. and Ortiz all fit into this framework.
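The data segmentation step of the ARC can be sketched as a sliding window over a multivariate time series. The window length and step below are arbitrary choices for illustration (a 50 % overlap), not values prescribed by the framework.

```python
import numpy as np

def sliding_windows(data, window_len, step):
    """Cut a multivariate time series of shape (n_samples, n_channels)
    into fixed-length windows, advancing the window by `step` samples
    along the time dimension."""
    starts = range(0, len(data) - window_len + 1, step)
    return np.stack([data[s:s + window_len] for s in starts])

# 10 seconds of 6-channel data at 50 Hz, cut into 3-second windows
# with 50 % overlap.
data = np.random.randn(500, 6)
segments = sliding_windows(data, window_len=150, step=75)
print(segments.shape)  # (5, 150, 6)
```

Each of the resulting segments is then passed on to feature extraction (or, in the end-to-end approaches of Section 2.2, fed to the network directly).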

To illustrate the ARC, Bulling et al. perform a small case study. They record data from three accelerometers and gyroscopes located on the arm of two volunteers. The volunteers performed everyday activities like watering a plant and cutting with a knife, as well as some tennis swings. They evaluate different features and classifiers on the data set.

They test both user-specific systems, where the training data contains samples from both participants, and user-independent systems, where the training is done with data from one volunteer and the testing is done on data from the other. In general the user-specific performance was higher than the user-independent performance. By testing different features they conclude that the selected features can have a significant impact on performance, but that it is hard to know beforehand which features will work well.

The feature selection can therefore be a hard problem. The feature extraction also affects the performance and footprint when doing real-time classification. In [12] Ortiz found that the feature extraction step used 92 % of the processing time of the classifier even when using a reduced feature set.


2.2 HAR End-to-End

Recently there have been some works that employ deep learning techniques to skip the feature extraction and instead use the raw data, for instance the works of Ordonez and Roggen [13] and Hammerla et al. [14]. These works follow a general trend in machine learning enabled by modern deep learning techniques: more and more research moves away from feature engineering and instead lets the models learn the relevant features during training. These works are described next.

In [13] Ordonez and Roggen use deep learning techniques to perform HAR. They present the DeepConvLSTM network architecture and evaluate it on two public data sets. The DeepConvLSTM network uses a combination of convolutional neural units for feature extraction and Long Short-Term Memory (LSTM) units to model temporal relationships. By using these techniques the classification can be done on the raw data without extracting features. The data sets used are the OPPORTUNITY data set and the Skoda data set.

The OPPORTUNITY data set was introduced in [15]. It consists of data from 12 volunteers as they performed activities of daily living in a sensor-rich environment. Sensors were worn by the volunteers on their bodies and placed on various objects in the environment. The activities performed included drinking a cup of coffee, eating a sandwich and cleaning. Ordonez and Roggen used a subset of the data to evaluate the DeepConvLSTM network.

The Skoda data set was introduced in [16]. It consists of data from 1 volunteer performing activities in car manufacturing, including opening and closing the doors and opening and closing the trunk. Ordonez and Roggen used data from 10 accelerometers placed on the right arm of the volunteer.

The DeepConvLSTM network of Ordonez and Roggen achieved state of the art performance on both data sets at the time, outperforming previously published results.

In [14] Hammerla et al. investigate a number of different types of deep networks on 3 different HAR data sets. The data sets used were the OPPORTUNITY data set, the PAMAP2 data set (sensor data from 9 volunteers performing 12 lifestyle activities) and the Daphnet Gait data set (accelerometer data from 10 volunteers suffering from Parkinson's disease).

Hammerla et al. investigate deep feed-forward networks, convolutional networks and LSTM recurrent networks with a number of different hyperparameter settings. A bidirectional LSTM network achieves state of the art performance on the OPPORTUNITY data set, outperforming the DeepConvLSTM network presented by Ordonez and Roggen.

They conclude that recurrent networks outperform convolutional networks for short activities that have a natural ordering. Another conclusion is that the performance of convolutional and recurrent networks is less sensitive to hyperparameter settings than that of deep feed-forward networks.

The works of Ordonez and Roggen and Hammerla et al. show that it is possible to perform HAR without feature extraction and still get good results.


Chapter 3

Theory

This chapter contains the necessary theoretical background for the report. Section 3.1 describes the accelerometer and gyroscope sensors in Android phones. Section 3.2 briefly reviews basic machine learning concepts and algorithms. Sections 3.3-3.6 focus on neural networks, which are important in this project. Section 3.7 reviews model evaluation techniques. Finally, Section 3.8 presents two machine learning libraries used in this project.

3.1 Smartphone Sensors

Modern smartphones come with a multitude of sensors. Examples are accelerometers, gyroscopes, magnetometers, barometers and orientation sensors. The Android operating system allows these sensors to be sampled at user-specified rates. The type of data generated depends on the sensor [17].

The accelerometer measures the acceleration of the phone. Each sampling of the accelerometer gives three values: the acceleration of the phone in the x-, y- and z-directions. The coordinate system is defined relative to the default orientation of the phone's screen: the x-axis points to the right of the screen, the y-axis points to the top of the screen and the z-axis points out in front of the screen. Figure 3.1a shows the coordinate system of the phone. The acceleration is measured in m/s². [18]


The accelerometer works by measuring the forces affecting the sensor. Because the phone is influenced by the force of gravity, the reported acceleration will always contain a gravity component. A stationary phone reports a total acceleration of approximately 9.8 m/s² [18]. Because of the gravity component the acceleration signal contains implicit information about the orientation of the device.

The gyroscope measures the rotational speed of the phone. The rotational speed is measured around the x-, y- and z-axis. Positive rotation is determined by the right-hand rule: from the perspective of the origin, positive rotation is reported when the phone turns clockwise around an axis. The rotational speed is measured in rad/s.

Sampling the accelerometer and gyroscope repeatedly gives the acceleration and rotational speed of the phone over time. This data forms a multivariate time series. See Figure 3.1b for an example of accelerometer data sampled from an Android phone.


Figure 3.1: a) The sensor coordinate system of an Android smartphone. Reproduced from [18]. b) Accelerometer data sampled from an Android phone.

3.2 Machine Learning

Machine learning algorithms are algorithms that learn from data. In this section basic machine learning concepts are presented and explained.


3.2.1 Classification and Supervised Learning

Classification is a type of machine learning problem in which the goal is to assign a class or label to each input. An algorithm or model that classifies data is called a classifier. One example of classification is taking photos of flowers and labeling them with the name of the flower.

In classification it is common to have a training data set where each training sample x has an associated correct label y. Learning that is done by using training data with correct labels is called supervised learning [19]. The goal of the learning is to get a classifier that can correctly classify the training samples, but which is also able to generalize and correctly classify samples that are not present in the training set. This classifier can then be used to label new and unknown data.

In supervised learning the model is presented with the correctly labeled training data and the parameters of the model are updated to increase its performance. For this to work there has to be a way of measuring the current performance of the model. This is done with the cost function.

3.2.2 Cost Function

An important concept of a machine learning problem is the associated cost function (also called error function or loss). The cost function is a measure of the current error of the algorithm [19, Ch 4]. A simple cost function for a classifier could for instance be to simply count how many labels it gets wrong. There are many standard cost functions available [20] that are used in different settings.

The goal of the training is to minimize the cost function by changing the parameters of the model. Minimizing the cost function is an optimization problem, and if the cost function has well-behaved derivatives it is often done with gradient descent.


3.2.3 Gradient Descent

Gradient descent is a conceptually simple optimization algorithm which is easy to understand in the one-dimensional case. Let f(x) be a univariate differentiable function with derivative f′(x). To minimize f(x) we select a random starting point x_0 and calculate f′(x_0). If f′(x_0) > 0 we know that f(x) is increasing at x_0. From this we know that we can get a smaller value of f(x) by calculating f(x_0 − ε) (for sufficiently small ε) and thus we let x_1 = x_0 − ε. By a similar argument, if f′(x_0) < 0 we let x_1 = x_0 + ε. The calculations are then redone with x_1. This process continues until a stopping criterion is reached. [19, Ch 4]

General gradient descent is the extension of this algorithm to the multivariate case. Given a function f(x_1, ..., x_m) and starting values x_0 we calculate the gradient ∇f = (∂f/∂x_1, ..., ∂f/∂x_m). The gradient points in the direction of steepest ascent and the negative gradient points in the direction of steepest descent. To minimize the function we therefore adjust the variables x_0 in the direction of the negative gradient [19, Ch 4].

In machine learning the function to minimize is the cost function L(θ), where θ denotes the parameters of the model. Using gradient descent to minimize the cost we calculate the gradient ∇L(θ) and then update the parameters by

θ′ = θ − ε∇L(θ)

where ε is a constant controlling the step size, called the learning rate.

It is clear from this description that the algorithm is not guaranteed to find a global minimum, or even any minimum. It can get stuck in a local minimum or a saddle point, or start to oscillate. It is still widely employed in practice and works well in many contexts.
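The update rule above can be sketched in a few lines of Python. The quadratic example function and the step count are illustrative choices, not part of the text:

```python
import numpy as np

def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Minimize a function by repeatedly stepping against its gradient."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - lr * grad(theta)   # theta' = theta - epsilon * gradient
    return theta

# Example: L(theta) = (theta - 3)^2 has gradient 2 (theta - 3) and minimum at 3.
theta_min = gradient_descent(lambda t: 2 * (t - 3.0), theta0=[0.0])
```

With this learning rate the iterate contracts toward the minimum by a constant factor per step, so it converges quickly on this convex example.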

3.2.4 Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a way to speed up gradient descent by exploiting the structure of the cost function. The cost function is often an average or expected loss over the training samples, which can be written as

L(θ) = E_{(x,y)}[L(f(x, θ), y)]


where f(x, θ) is the output of the model for input x and parameters θ, y is the true label for input x and L is the loss for the sample. In this case the gradient is also an expectation

∇L(θ) = E_{(x,y)}[∇L(f(x, θ), y)].

When doing normal gradient descent with such a loss function, the gradient of the sample loss L needs to be calculated for every sample in the training set. This means that the entire training set needs to be traversed for every parameter update. One such full traversal is called an epoch. The trick of SGD is to use the fact that the gradient is an expectation, which means that it can be estimated using only a few training samples. When using SGD the training set is divided into smaller parts called mini batches. The gradient is estimated and the parameters are updated after each mini batch has been processed. The parameters are therefore updated several times per epoch, which can greatly speed up learning [19, Ch 8]. SGD is a form of Monte Carlo algorithm in that it is fast to compute but the estimated gradient might have an error.

3.2.5 RMSprop

The learning rate ε is an important hyperparameter when using gradient descent and can have a large impact on the learning. Tuning the learning rate may require significant effort. Several methods have been proposed that automatically tune the learning rate during the training, with RMSprop [19, Ch 8] being one such method. In RMSprop the learning rate of each parameter is tuned individually, which stems from the observation that the loss function can be much more sensitive to some parameters than others. RMSprop accumulates the squared components of the gradients during training. The current gradient is scaled by the inverse of the square roots of the accumulated gradient components. A hyperparameter ρ is used to exponentially decay the accumulated values as the learning progresses. If g_t is the gradient computed for mini batch t, the RMSprop update is given by

a_t = ρ a_{t−1} + (1 − ρ) g_t ⊙ g_t

Δθ_t = −(ε / √(δ + a_t)) ⊙ g_t


θ_{t+1} = θ_t + Δθ_t

where a_t are the accumulated values, ε is the global learning rate, δ is a small constant used to prevent division by very small numbers, and ⊙ denotes elementwise multiplication.

Because of the way RMSprop is formulated, parameters with large gradient components will have their learning rate decreased, while the learning rate of parameters with small gradient components will be boosted. This leads to faster learning for parameters where learning would otherwise be slow, while dampening the effects of large gradients in other directions.

3.3 Neural Networks

Neural networks are a type of machine learning algorithm loosely inspired by neuroscience. The basic computational unit of a neural network is a mathematical model of the neurons in the human brain.

A single neuron takes a vector x = (x_1, ..., x_m) as input. Each input has a corresponding weight, so the neuron has a weight vector w = (w_1, ..., w_m). The output of the neuron is calculated by

f(x, w) = g(Σ_{i=1}^m x_i w_i) = g(x · w)   (3.1)

where g(z) is the activation function [21]. A simple activation function is the threshold function

g_t(z) = 1 if z > θ, 0 if z ≤ θ.

With the threshold function the neuron could act as a simple binary classifier, dividing the inputs into two different classes. There are many different activation functions with various properties. Figure 3.2 shows a diagram of the model of equation 3.1.


Figure 3.2: The model of a neuron.

Neural networks are created by connecting many neurons together into an overall structure. A common architecture is the multilayer perceptron (MLP). In the MLP the neurons are organized into layers. Each neuron in a layer has connections to every neuron in the previous and next layers, which is why the layers are commonly referred to as fully-connected or dense. The first layer consists of the input nodes, the last layer of the output nodes, and the layers in between are called hidden layers. Figure 3.3 shows a simple MLP with two input nodes, two hidden layers and one output node. The MLP in Figure 3.3 could be used as a binary classifier of the input data by thresholding the output.

Figure 3.3: An MLP with two input nodes, two hidden layers and one output node.

The output of the MLP is computed layer by layer. If x is the output of the previous layer the output of the current layer y is calculated by

y = g(b + Wx)

where W is a matrix containing the weights of the edges between the previous layer and the current layer and b is a bias vector.
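The layer computation y = g(b + Wx) can be sketched as follows, with hand-picked (untrained) weights for a tiny 2-2-1 network; the weight values are made up for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def dense_layer(x, W, b, g=relu):
    """One fully-connected layer: y = g(b + W x)."""
    return g(b + W @ x)

# A tiny 2-2-1 MLP with hand-picked (untrained) weights.
x = np.array([1.0, 2.0])
h = dense_layer(x, W=np.array([[1.0, -1.0], [0.5, 0.5]]), b=np.zeros(2))
out = dense_layer(h, W=np.array([[1.0, 1.0]]), b=np.array([0.5]), g=lambda z: z)
```

Computing the output of a full network is just this layer computation repeated, with each layer's output fed to the next.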


MLPs are trained using supervised learning. The input samples are given to the network and the current outputs are calculated. The outputs are compared to the known targets and the error is calculated. The network learns by adjusting the weights to minimize the error. The weights are therefore the learnable parameters of the model and correspond to θ in Section 3.2.3. The algorithm used to update the weights is called backpropagation, which is a form of gradient descent. Using backpropagation the gradient can be successively pushed back in the network so that all weights can be updated based on the error at the output nodes. [21]

3.4 Recurrent Neural Networks

An MLP is a type of feed-forward network. In feed-forward networks the data flows only in one direction, from the inputs to the outputs. Feed-forward networks have no persistent state. After training the network never changes and the same input always gives the same output.

Recurrent Neural Networks (RNNs) have connections that allow the data to flow back in the network. This gives these types of networks a state which is used when producing the output. The state is a form of memory, allowing these networks to alter their outputs based on previous data.

A simple RNN is a modification of the MLP where each layer gets its own output at the previous time step as input, in addition to the output of the previous layer. This modifies the calculation of the output of a layer as follows. If x_t is the output of the previous layer at time step t, the output of the current layer y_t is calculated by

y_t = g(b + W x_t + U y_{t−1})

where U is the weight matrix of the recurrent connections of the layer.

RNNs are natural when processing time series. Because they can remember previous inputs, RNNs can learn time dependencies in the data. The time steps of time series data can also be fed to the network one at a time, which allows RNNs to operate on variable length data.
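A sketch of this recurrent update for a single layer, assuming a tanh activation and randomly initialized weights (both are illustrative choices):

```python
import numpy as np

def rnn_step(x_t, y_prev, W, U, b):
    """y_t = g(b + W x_t + U y_{t-1}), here with tanh as activation g."""
    return np.tanh(b + W @ x_t + U @ y_prev)

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2))   # input weights
U = rng.normal(size=(3, 3))   # recurrent weights
b = np.zeros(3)

y = np.zeros(3)                     # initial state
for x_t in np.ones((5, 2)):         # a sequence of 5 time steps of 2-d input
    y = rnn_step(x_t, y, W, U, b)   # the state carries information forward
```

The loop makes the variable-length property concrete: the same step function is applied once per time step, however many there are.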


The network produces an output for each time step, which means that the network outputs a sequence for each time series input. When doing classification usually only the last output is kept, which is the output the network produces after observing the entire sequence.

Recurrent networks are trained similarly to MLPs with an extension of backpropagation called backpropagation through time. Backpropagation through time propagates the gradients back through every time step to update the weights of the network. This leads to a problem of RNNs, namely the problem of vanishing and exploding gradients. The problem arises because the gradient needs to be propagated back many time steps. Each propagation can cause the gradient to decrease or increase exponentially. If the gradient decreases and becomes very small (vanishes) then learning is impossible. If the gradient increases exponentially (explodes) learning becomes unstable [22] [19, Ch 8]. This effect is very pronounced and it can be difficult to train basic RNNs even on sequences where dependent events are separated by only 10 to 20 time steps [22] [19, Ch 10].

3.5 Long Short-Term Memory

Long Short-Term Memory (LSTM) networks were introduced by Hochreiter and Schmidhuber in 1997 [23] and address the problems with vanishing and exploding gradients of basic RNNs. LSTM units introduce several new concepts and have a more complex structure than the basic RNN units. There are several versions of LSTM units, but commonly they consist of an internal cell state, an input gate, a forget gate and an output gate. The cell state is a memory where the cell can store values. The gates control access to the cell state. Figure 3.4 gives a schematic view of an LSTM unit.

Figure 3.4: An LSTM unit.

In the figure the input gate is marked by G_t^i, the forget gate by G_t^f and the output gate by G_t^o. These gates are in essence small neural networks. They take as input the input vector x_t and the previous output of the unit y_{t−1}. They have associated weight matrices and bias vectors that they transform their inputs with. Finally they have an activation function, which is usually the logistic function. The logistic function clamps the output of the gates between 0 and 1. Since the output of the gates is used in multiplications at certain points in the LSTM unit, they control the data flow in the unit. If a gate outputs a value close to 1 the corresponding data gets through, and if the gate outputs a value close to 0 the data is removed. The input gate determines which parts of the cell state get updated with new values, the forget gate determines which parts of the cell state are remembered to the next time step, and the output gate determines which parts of the cell state are output to the rest of the network.

During training the gates learn to open and close at appropriate times so that the LSTM unit can learn the data. The gate activations are given by

G_t^i = σ(W_{xi} x_t + W_{yi} y_{t−1} + b_i)

G_t^f = σ(W_{xf} x_t + W_{yf} y_{t−1} + b_f)

G_t^o = σ(W_{xo} x_t + W_{yo} y_{t−1} + b_o)

where the W are weight matrices, the b are bias vectors and σ is the logistic function. The potential new cell state is marked by ĉ_t in Figure 3.4. It is calculated by

ĉ_t = g_i(W_{xc} x_t + W_{yc} y_{t−1} + b_c)

which is very similar to the gate activations. Here g_i is an activation function. Based on the gate activations and the potential new state, the new cell state is calculated as

c_t = ĉ_t ⊙ G_t^i + c_{t−1} ⊙ G_t^f.

This is where the gates control what gets stored in the cell state. In ĉ_t ⊙ G_t^i the input gate determines which parts of the new state to keep, and in c_{t−1} ⊙ G_t^f the forget gate determines which parts of the previous cell state to remember. When the new cell state has been calculated, the output of the unit is given by

y_t = g_o(c_t) ⊙ G_t^o

where g_o is an activation function. The elementwise multiplication allows the output gate to determine which parts of the output get sent out to the rest of the network.

LSTM networks do not suffer from the gradient problems of basic RNNs and can learn long term dependencies in the data. [23] [22] [24, Ch 4] [25] [19, Ch 10]
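The LSTM equations above can be sketched as a single NumPy time step. Stacking the four weight blocks into one matrix, and using tanh for both g_i and g_o, are implementation choices for this sketch, not part of the formulation above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, y_prev, c_prev, Wx, Wy, b):
    """One LSTM time step. The input, forget and output gate weights and the
    cell-candidate weights are stacked into single matrices (4 blocks of n)."""
    n = c_prev.shape[0]
    z = Wx @ x_t + Wy @ y_prev + b
    g_i = sigmoid(z[0 * n:1 * n])        # input gate
    g_f = sigmoid(z[1 * n:2 * n])        # forget gate
    g_o = sigmoid(z[2 * n:3 * n])        # output gate
    c_hat = np.tanh(z[3 * n:4 * n])      # candidate cell state
    c_t = c_hat * g_i + c_prev * g_f     # gated cell update
    y_t = np.tanh(c_t) * g_o             # gated output
    return y_t, c_t

rng = np.random.default_rng(2)
n, m = 3, 2                              # state size, input size
Wx = rng.normal(size=(4 * n, m))
Wy = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)

y, c = np.zeros(n), np.zeros(n)
for x_t in np.ones((4, m)):
    y, c = lstm_step(x_t, y, c, Wx, Wy, b)
```

Note that the cell state c is updated only by gated elementwise operations, which is what lets gradients flow across many time steps.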

3.6 Common Architectural Components

Designing a neural network requires many architectural decisions. This section contains some common choices that are used as defaults in many contexts.

3.6.1 Rectified Linear Units

The rectified linear function r(x) is defined as r(x) = max(0, x).

It is commonly used as an activation function in neural networks. Units that use the rectified linear function as activation are called Rectified Linear Units (ReLUs).


3.6.2 Softmax Activation

Softmax is a layer activation function that is often used in the output layer of the network when doing classification. The output layer is chosen to have the same number of units as the number of classes.

Each unit in the output layer learns to activate for a single class. The softmax function takes the activations of the output layer and turns them into a probability for each class. The function is defined as

σ(x) = (e^{x_1} / Σ_j e^{x_j}, ..., e^{x_m} / Σ_j e^{x_j}).

All values of the output vector are between 0 and 1 and sum to 1, and the output can thus be interpreted as a probability distribution over the classes. [19]
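A direct NumPy sketch of the softmax function; subtracting the maximum is a standard numerical-stability trick not mentioned above:

```python
import numpy as np

def softmax(x):
    """Turn a vector of activations into a probability distribution."""
    e = np.exp(x - np.max(x))   # subtracting the max avoids overflow
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
```

The subtraction does not change the result, since a common factor e^{-max} cancels in the numerator and denominator.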

3.6.3 Categorical Cross Entropy

Categorical cross entropy is a cost function. When using softmax activation in the output layer, the network outputs a probability distribution over the classes. The true labels can also be represented as probability distributions where all the probability mass is concentrated on a single class. Cross entropy is a way to measure the similarity between these distributions. If the distributions are similar the cross entropy is close to 0, and if they are dissimilar it is large. Minimizing the cross entropy therefore pushes the output distributions to be similar to the target distributions. If f(x, θ) is the network output for input x, y is the target distribution and the length of f(x, θ) and y is M, then the cross entropy is defined as

H(y, f(x, θ)) = −Σ_{j=1}^M y_j log(f(x, θ)_j).

Since in the target distribution all the probability is focused on one class, this is equivalent to

H(y, f(x, θ)) = −log(f(x, θ)_j)

where j is the index of the class with the probability mass. The loss function is then defined as the average cross entropy over all the training samples. If D is the


training data with N samples, the loss is defined as

L(θ) = −(1/N) Σ_{(x,y)∈D} log(f(x, θ)_j).

Note that since the loss function is defined as an average it is well suited to be used with SGD.
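The average cross-entropy loss can be computed directly from one-hot targets; the example probabilities are made up:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    """Average of H(y, f) = -sum_j y_j log f_j over the rows (samples)."""
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One-hot targets; each row of y_pred is a softmax output.
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = categorical_cross_entropy(y_true, y_pred)   # mean of -log 0.9 and -log 0.8
```

Because the targets are one-hot, only the log-probability of the correct class contributes to each sample's loss.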

3.6.4 Dropout

Dropout is used to prevent the network from overfitting to the training data and to increase its ability to generalize to unseen data. When using dropout, the outputs of a fraction of the network's units are set to 0 for a batch. These units are therefore not used during that training cycle. The units to drop are usually sampled at random for each batch. Dropout increases the robustness of the network, as it is forced to learn redundant representations of the data, and prevents it from fitting to noise. [19, Ch. 7]
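A sketch of one common variant, "inverted" dropout, where the surviving activations are rescaled by 1/(1 − rate) during training; the rescaling detail is an assumption beyond the description above:

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero out a random fraction `rate` of the activations.
    Survivors are scaled by 1/(1 - rate) so the expected value is unchanged."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10000)
h_drop = dropout(h, rate=0.5, rng=rng)   # roughly half the units are zeroed
```

At test time the function is simply not applied, and thanks to the rescaling no further correction of the activations is needed.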

3.7 Model Evaluation

To find a good model, different models need to be objectively compared. This is done by comparing metrics between models. This section describes some commonly used model evaluation techniques.

3.7.1 Data Splitting

To perform model evaluation the data set is commonly split into 3 subsets: the training set, the validation set and the test set. The training set is used in the training to update the parameters of the model. It is based on the errors on the training set that the gradient descent is performed, which means that the performance of the model on the training set usually increases almost monotonically as training progresses. Since the goal is to get a model that generalizes well to unseen data, the performance on the training set is not a useful measure. The validation set is used to tune the hyperparameters of the model to get good generalization. Using the validation set it is for instance possible to determine


when to stop the training. After each model update the performance on the validation set is evaluated. When the performance on the validation set stops improving or starts to degrade the training is stopped, as that is a sign that the model is beginning to overfit to the training data. When the training is done the performance of the model is evaluated on the test set. Since this set has not been used to tune either the parameters or the hyperparameters of the model, the performance is a good indicator of the performance of the model in a real setting. [19, Ch. 5]

3.7.2 Confusion Matrix

A confusion matrix [26, Ch. 2] is a way to visualize the performance of a classifier. Each row and column in the matrix corresponds to one class. For an N-class classification problem the confusion matrix is therefore an N × N matrix. For each prediction, the cell corresponding to the row of the true class and the column of the predicted class is incremented. When the predictions have been made the confusion matrix holds information about which classes are commonly confused and misclassified. If the classifier is perfect the confusion matrix is diagonal, as every predicted class matches the true class.

3.7.3 Accuracy and Error

The accuracy [26, Ch. 2] of a model is the fraction of correct classifications out of all classifications. Given an N-class classification problem with an N × N confusion matrix C, the accuracy is defined as

accuracy = (Σ_{i=1}^N C_{ii}) / (Σ_{i=1}^N Σ_{j=1}^N C_{ij}).

The error is the fraction of incorrect classifications out of all classifications. It is therefore defined as

error = 1 − accuracy.


3.7.4 Precision and Recall

The precision [26, Ch. 2] of a class is the fraction of correct predictions of that class out of all predictions of that class. Given an N × N confusion matrix C, the precision of class c is defined as

precision_c = C_{cc} / Σ_{i=1}^N C_{ic}.

The recall [26, Ch. 2] of a class is the fraction of correctly classified samples of that class out of all samples of that class. Given an N × N confusion matrix C, the recall of class c is defined as

recall_c = C_{cc} / Σ_{i=1}^N C_{ci}.
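The accuracy, precision and recall definitions map directly onto NumPy operations on the confusion matrix; the example matrix is made up:

```python
import numpy as np

def metrics(C):
    """Accuracy and per-class precision/recall from an N x N confusion
    matrix C, where C[i, j] counts true class i predicted as class j."""
    accuracy = np.trace(C) / C.sum()
    precision = np.diag(C) / C.sum(axis=0)   # column sums: all predictions of c
    recall = np.diag(C) / C.sum(axis=1)      # row sums: all samples of c
    return accuracy, precision, recall

C = np.array([[8, 2],
              [1, 9]])
acc, prec, rec = metrics(C)
```

For this matrix the accuracy is 17/20, the precision of class 0 is 8/9 and the recall of class 1 is 9/10.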

3.8 TensorFlow and Keras

TensorFlow is a software library for numerical computations developed by Google. In TensorFlow computations are represented as graphs: nodes represent mathematical operations and edges represent data flowing between the nodes. TensorFlow is a general computation framework but it is often used for deep learning. It was originally developed to research machine learning and deep neural networks. [27]

Keras is a Python library for deep learning developed by François Chollet. Keras provides high-level abstractions and an Application Programming Interface (API) for designing machine learning models. The goal of Keras is to allow rapid prototyping and experimentation. Complex deep neural network models can often be created with just a few lines of code. [28]

Keras provides utilities for the entire machine learning pipeline: data preprocessing, model construction, training, testing and evaluation. By default Keras uses TensorFlow as backend to do numerical calculations. After a Keras model has been trained the backing TensorFlow model can be exported. It is then possible to import the trained TensorFlow model on an Android smartphone by using the TensorFlow


Android API [29]. Therefore it is possible to train a model on a desktop computer using Keras and then later use the model for real-time classification on an Android smartphone.
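As an illustration of the Keras API, a minimal sketch of an LSTM classifier for 3-second windows of 6-channel sensor data (150 time steps at 50 Hz, 9 classes). The layer sizes and dropout rate are illustrative only and not the configurations evaluated in this thesis; the sketch uses the `tensorflow.keras` module path:

```python
# Input shape: 3-second windows at 50 Hz (150 time steps) with 6 channels
# (3-axis accelerometer + 3-axis gyroscope), and 9 output classes.
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential

model = Sequential([
    LSTM(64, input_shape=(150, 6)),    # keeps only the last output (Section 3.4)
    Dropout(0.5),                      # regularization (Section 3.6.4)
    Dense(9, activation='softmax'),    # one unit per class (Section 3.6.2)
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Training would then be a single call to `model.fit` with the windowed sensor data and one-hot labels.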


Chapter 4

Data Collection

This project required a data set containing smartphone sensor data from the kneeling postural transitions. During the initial research no suitable existing data set was found for this project. A major effort and a contribution of this work was therefore gathering such a data set experimentally. The data collection and the resulting data set are described in this chapter.

The captured data was used as a basis for machine learning methods.

Since these algorithms learn from data, the training data should mimic real-world data as closely as possible. Because of the constraints of the project the data could not be gathered in the field. The data was instead captured from a group of volunteers in a more laboratory-style setting. To keep the data as real as possible the volunteers performed naturalistic actions that could occur in the targeted occupations.

4.1 Design

The overall design of the data collection was similar to the data collection method used by Ortiz [12]. The collection was performed by the method outlined in the following steps:

1. An app was developed that records sensor data from the accelerometer and gyroscope of an Android phone.

2. The app developed in step 1 was used to collect sensor data from


a group of volunteers. The volunteers performed a set of tasks while having a phone running the app in one of their pockets.

The volunteers were filmed during the experiment.

3. The recorded video of the volunteers was used to label the collected sensor data.

4. The labeling of step 3 was used to extract labeled sensor data samples from the collected data.

The elements of the data collection are described in the following sections.

4.1.1 Data Collection App

The first step was to develop an app which samples the accelerometer and gyroscope of an Android smartphone. The app records the timestamp, x-acceleration, y-acceleration, z-acceleration and the rotational speed around the x-, y- and z-axis of each sample. The app samples the sensors at 50 Hz, as this frequency was used successfully by Ortiz [12]. The data is stored as comma-separated values.
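The stored format can be parsed with a few lines of Python. The column order follows the description above, while the sample values, the timestamp unit and the use of NumPy are illustrative assumptions:

```python
import numpy as np
from io import StringIO

# A hypothetical three-row excerpt in the app's format: timestamp (s),
# x/y/z acceleration (m/s^2) and x/y/z rotational speed (rad/s) at 50 Hz.
csv = StringIO(
    "0.00,0.1,9.7,0.3,0.01,0.00,0.02\n"
    "0.02,0.2,9.8,0.2,0.00,0.01,0.02\n"
    "0.04,0.1,9.8,0.3,0.01,0.01,0.01\n"
)
data = np.loadtxt(csv, delimiter=",")
t, accel, gyro = data[:, 0], data[:, 1:4], data[:, 4:7]
```

At 50 Hz the timestamps are spaced 20 ms apart, which is how the window lengths in Section 4.1.8 translate into sample counts.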

4.1.2 Experimental Procedure

The experimental procedure consisted of a set of tasks that the volunteers performed. The tasks were designed to be as similar as possible to tasks that a real-world user of the system could perform. The experimental procedure performed by the volunteers consisted of the following activities, designed to collect data from different transitions:

1. Kneeling down, tightening a screw on the floor, and standing back up.

2. Stepping up on a chair, loosening and tightening a light bulb, and stepping down.

3. Squatting down, performing measurements with a carpenter’s ruler, and standing back up.

4. Sitting down on a chair, relaxing, and standing back up.

5. A couple of minutes of free movement.


Activities 1-4 were repeated a number of times by each volunteer while activity 5 was done once at the end of the experiment. The procedure was designed to create a data set with 9 different classes. Step 1 was intended to capture the standing-to-kneeling and kneeling-to-standing transitions. Step 2 was intended to capture the stepping-up-chair and stepping-down-chair transitions. Step 3 was intended to capture the standing-to-squatting and squatting-to-standing transitions. Step 4 was intended to capture the standing-to-sitting and sitting-to-standing transitions. Step 5 was intended to capture a large variety of motions in a null class.

The transitions of main concern in this thesis are the standing-to-kneeling and kneeling-to-standing transitions. The other transitions were included because they could potentially be confused with the kneeling transitions and they seem common in knee-straining occupations.

The null class was designed to sample from the vast space of actions not included in the other classes. During step 5 the volunteers were not allowed to perform any of the transitions occurring in steps 1-4 but were otherwise allowed to move around freely.

4.1.3 Delimitations

To limit the scope of the data collection the following delimitations were made:

• At the beginning of the procedure the volunteers were instructed to place both knees on the ground when they knelt down. The data collection therefore only focuses on the "proper" kneeling position and does not include postural transitions to other types of kneeling positions, for example where only one knee is on the ground.

• The group of volunteers consisted mostly of employees of Bontouch AB. Therefore the volunteers might not be a representative sample of the target population of people with knee-straining occupations.

• During the data collection the smartphone was placed in the same pocket in the same orientation for every volunteer. Since the sensor coordinate system is defined relative to the screen of the


phone, the orientation of the phone affects the sensor data. A system trained on the data might therefore only detect the postural transitions when using a similar phone placement.

• The data collection was done using a single phone. The accelerometers and gyroscopes of Android phones report absolute acceleration and rotational speed. The data should therefore not differ much between phones.

Some aspects of these delimitations are discussed further in Section 6.2.

4.1.4 Choice of Phone

The data collection was performed using a Samsung Galaxy S7 running Android 7.0 (Android Nougat). The Galaxy S7 was released during the first half of 2016 [30] and, together with the similar Galaxy S7 Edge, it sold 55 million units in the period from the release to the first quarter of 2017 [31]. The Galaxy S7 was selected because it represents a modern yet widely used and available phone.

4.1.5 Placement of Phone

The phone was placed in the front right pocket of each volunteer. The phone was placed with the top of the phone pointing downwards and the screen of the phone facing the thigh of the volunteer. The volunteers were allowed to use their own pants and therefore there might be some variation in the placement of the phone. The phone was not fastened but placed naturally in the pocket. Figure 4.1 shows the placement of the phone and the coordinate system of the phone in this placement.

4.1.6 Synchronization

The experiments were filmed using a Samsung Galaxy S6 Edge mounted on a tripod. The video was recorded at 30 frames per second in 640x480 resolution. The recorded video and sensor data needed to be synchronized. This was done with the help of the data collection app. The data collection app was programmed to flash the screen of the device


Figure 4.1: The placement and coordinate system of the phone during data collection. The person depicted as a silhouette is facing towards you.

with red color when the recording of the sensor data was started. At the start of each run the device was shown to the camera as the recording was started. The offset between the video and the sensor data could then be determined by finding the first frame of the video which showed the red flash of the device screen.

4.1.7 Labeling

The recorded sensor data was labeled by reviewing the recorded video.

The recorded video was reviewed in the VLC media player. The VLC media player extension Jump to time allows the time of a frame to be determined with sub-second precision. Using the Jump to time extension, the start and end times of each transition were marked.

During labeling the following set of guidelines was used to determine when a transition started and ended:

• The standing-to-kneeling transition starts at the first bending of the knees and ends when both knees are on the ground.

• The kneeling-to-standing transition starts when the first knee leaves the ground and ends when the person has reached a fully upright position.


• The standing-to-squatting transition starts at the first bending of the knees and ends when the person has reached a stationary squatting position.

• The squatting-to-standing transition starts at the first upward motion of the body and ends when the person has reached a fully upright position.

• The stepping-up-chair transition starts when the first foot leaves the ground to step up and ends when both feet are on the chair.

• The stepping-down-chair transition starts when the first foot leaves the chair to step down and ends when both feet are on the ground.

• The standing-to-sitting transition starts at the first bending of the body and ends when the person has reached a stationary sitting position.

• The sitting-to-standing transition starts at the first forward or upward movement of the torso and ends when the person has reached a fully upright position.

See Section 6.2 for a discussion on the use of these guidelines in prac- tice.

4.1.8 Extraction

The labeled samples were extracted by selecting data from time windows determined by the start and end times of the transitions. The length of the time window was selected based on the average duration of the transitions. Each transition was extracted at 3 different locations in the window:

• For the first location, a margin of 200 milliseconds was subtracted from the start time of the transition and the low end of the window was placed at this point.

• For the second location, the center of the transition was determined by averaging the start and end times and the window was centered on this point.

• For the third location, a margin of 200 milliseconds was added to the end time of the transition and the high end of the window was placed at this point.
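The three window placements can be sketched as below. This is an illustrative sketch, not the thesis code: times are in seconds, the 3-second window length matches the extracted segments described in the abstract, and the function name is made up.

```python
WINDOW_S = 3.0   # window length in seconds
MARGIN_S = 0.2   # 200 ms margin around the transition

def window_locations(start_s, end_s):
    """Return the (low, high) bounds of the three extraction windows
    for a transition with the given start and end times."""
    # 1. Low end of the window 200 ms before the transition start.
    first = (start_s - MARGIN_S, start_s - MARGIN_S + WINDOW_S)
    # 2. Window centered on the midpoint of the transition.
    center = (start_s + end_s) / 2
    second = (center - WINDOW_S / 2, center + WINDOW_S / 2)
    # 3. High end of the window 200 ms after the transition end.
    third = (end_s + MARGIN_S - WINDOW_S, end_s + MARGIN_S)
    return [first, second, third]

# A transition lasting from 10.0 s to 11.0 s.
print(window_locations(10.0, 11.0))
# [(9.8, 12.8), (9.0, 12.0), (8.2, 11.2)]
```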

The transitions were extracted at different locations as a simple form of data augmentation and to encourage the networks to learn features that are robust to different placements. The margin was added
