
Multi-factor Authentication: System proposal and analysis of continuous authentication methods

Academic year: 2022


Markus Fält

Thesis - Department of Information Systems and Technology
Main field of study: Computer Engineering

Credits: 300

Semester, year: Spring, 2020

Supervisor: Johannes Lindén, johannes.linden@miun.se / Mikael Norberg, mikael.norberg@forsakringskassan.se
Examiner: Tingting Zhang, tingting.zhang@miun.se

Course code/registration number: DT005A

Degree programme: Master of Science in Engineering, Computer Engineering


Abstract

It is common knowledge that the average user has multiple online accounts which all require a password. Some studies have shown that the number of passwords for the average user is around 25. Considering this, one can see that it is unreasonable to expect the average user to have 25 truly unique passwords. Because of this, multi-factor authentication could potentially be used to reduce the number of passwords to remember while maintaining, and possibly exceeding, the security of unique passwords. This thesis therefore aims to examine continuous authentication methods as well as to propose an authentication system for combining various authentication methods. This was done by developing an authentication system using three different authentication factors. This system used a secret sharing scheme so that the authentication factors could be weighted according to their perceived security.

The system also proposes a secure storage method for the secret shares, and its feasibility is shown. The continuous authentication methods were tested by applying various machine learning methods to two public datasets. The methods were graded on accuracy and the rate at which the wrong user was accepted. This showed that random forests and decision trees worked well on the particular datasets. Ensemble learning was then tested to see how the two continuous factors performed once combined into a single classifier. This gave an equal error rate of around 5%, which is comparable to state-of-the-art methods used for similar datasets.


Acknowledgments

For this thesis I would like to thank both of my supervisors: Johannes Lindén, for helpful advice regarding both testing and machine learning, and Mikael Norberg, for helping to define the idea of the thesis. I would also like to thank Försäkringskassan for allowing me to collaborate with them in the process of writing this thesis.


Table of Contents

Abstract
Acknowledgments
Table of Contents
Terminology
1 Introduction
1.1 Background
1.2 Problem motivation
1.3 Overall aim
1.4 Concrete and verifiable goals
1.5 Scope
1.6 Outline
2 Theory
2.1 Authentication in general
2.1.1 Authentication types
2.1.2 Biometric data
2.1.3 Biometric authentication system structure
2.1.4 Advantages & Disadvantages
2.2 Finite field
2.3 Secret sharing
2.3.1 Shamir's Secret Sharing
2.3.2 Multi secret sharing
2.4 Continuous authentication
2.4.1 Performance measurements
2.5 Related work
2.5.1 Smartphone as a biometric service for web authentication
2.5.2 Identifying users of portable devices from gait pattern with accelerometers
2.5.3 An analysis of different approaches to gait recognition using cell phone-based accelerometers
2.5.4 Pace independent mobile gait biometrics
3 Methodology
3.1 Study of the research area
3.2 Development of a general authentication system
3.3 Study of continuous authentication
3.4 Developing continuous authentication models
3.5 Combining continuous authentication models
4 Implementation
4.1 Overview
4.1.1 Technologies used
4.2 Android application
4.2.1 Enrollment of authentication factors
4.2.2 Secure storage of authentication data
4.2.3 Secure communication with authentication server
4.3 Authentication server
4.3.1 Authentication method
4.3.2 Database design
4.3.3 Integration with continuous authentication
4.4 User interface
4.4.1 Application
4.4.2 Electron application
5 Results
5.1 Continuous authentication
5.1.1 Authentication using gait
5.1.2 Authentication using keystroke
5.1.3 Combining continuous authentication methods
6 Discussion
6.1 The authentication application in practice
6.2 Storage of secret shares
6.3 Considerations for biometric factors
6.4 Application of continuous authentication
6.5 Ethics & Social Aspects
7 Conclusions
7.1 Future Work
References


Terminology

Abbreviation Description
2FA Two-factor authentication
API Application programming interface
AUC Area under curve
CSS Cascading style sheets
CSV Comma-separated values
EER Equal error rate
ER Entity-Relationship
FAR False acceptance rate
FN False negative
FP False positive
FRR False rejection rate
GPS Global positioning system
HTML Hypertext markup language
HTTPS Hypertext transfer protocol secure
JSON JavaScript object notation
MFA Multi-factor authentication
MLP Multi-layer perceptron
QR code Quick Response code
ReLU Rectified linear unit
ROC Receiver operating characteristic
SFA Single-factor authentication
SGD Stochastic gradient descent
SVC Support vector classifier
SVM Support vector machine
Std Standard deviation
TAR True accept rate
TLS Transport layer security
TN True negative
TP True positive


1. Introduction

This thesis covers the area of multi-factor authentication. This chapter presents the motivation and background for the thesis, as well as its goals and scope.

1.1 Background

Protecting user accounts is important not only because sensitive data may be accessed by unwanted parties, but also to avoid intended or unintended tampering. The first widely adopted authentication factor was the password, used as a single factor to prove the user's identity. According to Florencio and Herley [1], the average online user had about 25 different online accounts requiring passwords in 2007. Password protection is still the standard for most authentication systems, likely because it is simple to implement and use. A password is also very easy to change in case it is shared with other parties.

Since passwords are easily shared, some high-security systems require additional authentication factors. These additional factors can be security questions, key-cards, or fingerprints.

A system relying on two authentication factors is commonly referred to as a two-factor authentication system, while a system using more than two factors is referred to as a multi-factor authentication system [2]. Both are typically used only when the system is protecting some sensitive data, and the number of authentication factors depends on how much authentication complexity users will accept to protect their data.

Research is being done to examine how behavior can be used as an authentication method. Historically, behavior has been used for identification by analyzing the operating rhythm of military telegraph operators [3, 2]. A modern example could be to analyze how a smartphone touch-screen is used, or to examine the phone's generated accelerometer data.

Behavioral authentication methods that continuously sample the user's behavior for authentication are referred to as continuous or active authentication [4]. The two terms are often used interchangeably in the multi-factor authentication literature.


Continuous authentication methods can produce systems that are very difficult to fool with fake data. However, due to the nature of continuous authentication, such systems often have a high tolerance, leading to high error rates. Because of this, continuous authentication methods should be supplemented with additional identification factors.

1.2 Problem motivation

As the amount of digitization increases, so does the potential for critical data breaches. Therefore, it is necessary to develop robust systems that can stop or reduce the spread of stolen personal data.

This could be achieved by strengthening a user's identity through a combination of several biometric and non-biometric factors. Most authentication systems have different requirements, both in terms of security and in how complicated the authentication process can be for users. Therefore, continuous authentication might be a good way to provide extra security while also potentially remaining unnoticeable to the user.

With the high availability of smartphones and wearable technology, new authentication methods are being proposed that may be easier to use and more secure than passwords. Since smartphones contain a variety of different sensors, there are several ways to collect identifying features of the user. As an example, Apple has introduced both fingerprint and facial recognition as authentication methods [5, 6].

1.3 Overall aim

The overall aim of this thesis is to examine the field of multi-factor authentication and find a way to use continuous authentication to strengthen users' identity. This will be done by proposing an authentication scheme combining multiple biometric and non-biometric factors to allow for many authentication methods.

1.4 Concrete and verifiable goals

Solving the overall problem of this thesis can be done by dividing the problem into several more concrete goals.


1. Study the field of authentication and multi-factor-authentication.

2. Develop an authentication system that can easily be adapted to use different authentication methods.

3. Examine the field of continuous authentication.

4. By exploring different machine learning models, exceed the state-of-the-art performance of continuous authentication using publicly available datasets.

5. Examine how these models can be combined to strengthen the confidence in the identity through ensemble learning methods.

6. Evaluate the continuous authentication models in terms of accuracy, false-acceptance-rate, and equal-error-rate.

1.5 Scope

In this thesis, when examining potential solutions, no consideration will be taken of solutions for groups that fail to enroll. Reflections on this will be presented in Chapter 6.

This thesis leaves privacy issues to be discussed in Chapter 6; when looking for potential solutions, privacy issues will not be considered.

The thesis does not aim to show how a continuous authentication system is implemented, but rather examines several methods and suggests a way to combine multiple independent authentication methods.

When examining different continuous authentication methods publicly available datasets will be used for testing and evaluation. Data collection will only be done for demonstration purposes.

1.6 Outline

Chapter 2 covers the necessary theory needed for understanding this thesis' construction and results. Chapter 3 presents the method for achieving the concrete and verifiable goals presented in Section 1.4. Chapter 4 presents the construction and implementation of an authentication system. Chapter 5 presents the results of testing various machine learning methods on publicly available datasets. Chapter 6 discusses the practicality of the authentication system as well as social and ethical aspects. Chapter 7 presents the conclusions of the thesis, reflects on the results, and presents the potential for future work.


2. Theory

This chapter presents the theory required for understanding the results and analysis of this thesis. Related works are presented at the end of the chapter.

2.1 Authentication in general

Authentication is simply the process of proving the identity of a user to a system. Essentially, the user provides some data x so that the system can compute F(x). If F(x) = y, where y is some stored value, then the identity of the user is proved. The security of this depends on how difficult it is for an adversary to reproduce x. [2, 7]

Using the previously mentioned definition of authentication, it is easy to imagine how simple password protection works. The same general definition also applies to more complex authentication systems with more factors of identification.
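As a concrete illustration of the F(x) = y definition, password authentication can be sketched as below. This is a minimal sketch, not part of the thesis system: the function names, the choice of PBKDF2, and the salt handling are assumptions made for the example.

```python
import hashlib
import hmac
import os

def F(x: str, salt: bytes) -> bytes:
    # F hashes the user-supplied secret x together with a per-user salt.
    return hashlib.pbkdf2_hmac("sha256", x.encode(), salt, 100_000)

# Enrollment: the system stores (salt, y), where y = F(password, salt).
salt = os.urandom(16)
y = F("correct horse battery staple", salt)

def authenticate(x: str) -> bool:
    # Authentication succeeds when F(x) equals the stored value y.
    return hmac.compare_digest(F(x, salt), y)

print(authenticate("correct horse battery staple"))  # True
print(authenticate("wrong password"))                # False
```

Here the adversary's difficulty in reproducing x corresponds to guessing the password; the stored y alone does not reveal it.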

2.1.1 Authentication types

Generally, authentication systems can be categorized into three categories: single-factor (SFA), two-factor (2FA), and multi-factor authentication (MFA). The identification factors themselves can also be categorized into three categories: knowledge-based, ownership-based, and biometric-based factors.

Knowledge-based factors are something that the user knows to prove their identity, for example a password or a PIN code. Ownership-based factors are something that the user owns to prove their identity, for example a pass-card or a key. Biometric-based factors are something that the user is to prove their identity, for example some feature of the person such as a fingerprint or the iris of the eye. [2]

SFA only uses one factor to authenticate the user. This is usually some knowledge-based factor like a password, or an ownership-based factor like a key. 2FA uses two factors to authenticate the user; this usually involves a combination of knowledge- and ownership-based factors. An example of 2FA could be unlocking a door with a key-card and a PIN code: the ownership proof would then be the key-card and the knowledge proof the PIN code. MFA is when the authentication system requires more than two authentication factors to prove the user's identity. [8]

MFA usually involves not only knowledge- and ownership-based factors but also some biometric factor. For example, this could mean requiring a PIN code, a card, and a fingerprint, as some gyms do to prevent borrowing of membership cards. Another example of where MFA is used is the Swedish mobile application "Mobilt BankID" [9]. Here the application is tied to the phone, so the user proves ownership of the phone; the application also requires a code to be entered by the user as a proof of knowledge; and the application uses GPS to analyze the user's behavior. A model of a user's location could be used to verify the user's behavior, and since behavior can be considered a feature of the user, this GPS data is a type of biometric data.

2.1.2 Biometric data

Traditionally, authentication and identification have been based on knowledge and ownership. Knowledge- and ownership-based authentication methods rely on authentication factors that can be forgotten, disclosed, lost, or stolen [10]. This is not the case with biometric data. Biometric data can be used as direct proof of the user since it relies on features of the user, such as the user's voice, fingerprint, face, or behavior. A person's biometric characteristics are unique to that person and cannot be duplicated easily.

A challenge with biometric-based authentication is that the sensor data usually suffers from some type of noise, making it difficult to get consistent samples. Since authentication is a binary decision, this creates the possibility for false rejections and false acceptances, measured as the false-accept-rate (FAR) and false-reject-rate (FRR) [11]. Compared to knowledge- and ownership-based data, only biometric data suffers from noise in the samples.

A fingerprint cannot be stolen in the same way a password or a key-card can. Instead, biometric factors suffer from the possibility of false acceptance, which means that biometric data is not a sufficiently strong factor by itself [12]. This is due to biometric sampling methods having a non-zero FAR value. Therefore, using biometric-based authentication in combination with knowledge- or ownership-based authentication is recommended. The biometric data can then be seen as a supplemental factor that strengthens confidence in the identification proof.

2.1.3 Biometric authentication system structure

Matyáš and Říha [10] present a layered model as a general structure of a biometric authentication system. The model consists of seven layers: First measurement, Creation of master characteristics, Storage of master characteristics, Acquisition, Creation of new characteristics, Comparison, and Decision.

• First measurement - This is usually the first time a user uses the system, and it is important to register their biometric characteristics with high quality. If high-quality samples cannot be taken, the user cannot be registered with the system. Some people have missing fingers, eye damage, etc., and depending on the biometric sensor these people may not be able to register. Everyone who is unable to register forms a "fail to enroll" group.

• Creation of master characteristics - A template of the biometric characteristics is created. The raw sensor data should generally not be stored or used for comparison.

• Storage of master characteristics - There are four ways to store the master template: on a card, in a central database, on a workstation, or directly on the authentication terminal. For large-scale systems only the first two are practical: storing the template on a card or in a central database. Storing the template on a card is usually better for privacy reasons.

• Acquisition - To authenticate a user, their biometric characteristics must be sampled and compared to the stored master template. The authentication terminal should also in some way check that the biometric samples belong to a live person, for example by proving that the user is physically at the authentication terminal, so that remote adversaries cannot access the system.

• Creation of new characteristics - This is the processing stage when the user's biometrics are sampled. Here, characteristics are extracted from the sample obtained in the acquisition step. The sample could be of lower quality, so the number of extracted characteristics may be lower than in the master template.

• Comparison - Here the extracted biometric characteristics are compared to a master template. If the user also needs to be identified from the extracted characteristics, then the comparison needs to be done against all the stored master templates.

• Decision - A yes/no decision is made based on the comparison. A security threshold decides how close the comparison has to be, and can be set either for high security or for high reliability. For high security the threshold is set to minimize the false accept rate (FAR); for high reliability it is set to minimize the false reject rate (FRR). Usually a low FRR means that the FAR is high, and vice versa.

2.1.4 Advantages & Disadvantages

The main advantage of biometric authentication is that the identity is proved directly, since biometric characteristics are unique and cannot be changed. A user's biometric factors cannot be shared or stolen like a key or a password. Biometric data can, however, be stolen from systems and networks, but it should not be considered secret, so stolen biometric data should not break security in the same way as a stolen password [10].

Biometric authentication systems are not very accurate compared to password authentication. Achieving a low FAR often means that the authentication process will be fairly slow. Some users cannot enroll in the authentication system, which means that alternative ways need to be developed so that these users can use the system as well. This can result in a more complicated, less secure, or more expensive system [10].

Biometric data should not be considered secret, so the security of a biometric system cannot be based on the secrecy of the users' biometric characteristics. These systems may also violate users' privacy, since biometric characteristics can be considered sensitive data. Biometric authentication also removes the possibility for users to have multiple identities registered, since the identities would all share the same biometric characteristics [10].

2.2 Finite field

A finite field, also known as a Galois field, is a finite set of elements in which it is possible to add, subtract, multiply, and invert each element. Galois fields are usually represented as GF(p^n), where p is a prime number and n is a positive integer [13]. In this thesis, however, we will only cover Galois fields where n = 1; these fields can also be called "prime fields".

A prime field contains the elements 0, 1, 2, ..., p−1. Performing addition, subtraction, and multiplication is trivial in such a field: these operations are simply performed by taking the result modulo p. Inversion is a bit more difficult, because finding the inverse of an element requires the extended Euclidean algorithm [14].
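Prime-field arithmetic can be sketched as follows. This is a minimal illustration (the function names are my own, not from the thesis): addition and multiplication reduce modulo p, while inversion uses the extended Euclidean algorithm.

```python
def ext_gcd(a, b):
    # Extended Euclidean algorithm: returns (g, x, y) with a*x + b*y = g = gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def inv_mod(a, p):
    # Multiplicative inverse of a in the prime field GF(p).
    g, x, _ = ext_gcd(a % p, p)
    if g != 1:
        raise ValueError("element not invertible")
    return x % p

p = 13
print((7 + 9) % p)    # addition in GF(13): 16 mod 13 = 3
print((7 * 9) % p)    # multiplication: 63 mod 13 = 11
print(inv_mod(7, p))  # inverse of 7 is 2, since 7*2 = 14 = 1 mod 13
```

In later sections these field operations are exactly what secret-sharing arithmetic is built on.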

2.3 Secret sharing

Secret sharing is the act of splitting a secret among a set of users. No user can access the secret by themselves, but together they can reconstruct it. One entity, called the dealer, knows the secret; the dealer constructs all the shares and deals them to the users. The users can then only access the secret if they all agree to do so. The number of users is n and the threshold value is t, where t indicates the minimum number of users that have to cooperate to access the secret.

According to Binu and Sreekumar [15] there are three cases: t = 1, t = n, or n > t > 1. If t = 1 then the secret can simply be shared with each of the users. If t = n then the dealer can simply generate n−1 random values and give them to n−1 of the users. The last user instead gets:

s XOR p_1 XOR p_2 XOR ... XOR p_(n−1) (2.1)

where p_1, ..., p_(n−1) are the random values and s is the secret. To reconstruct the secret, the users simply perform this calculation:

s = p_1 XOR p_2 XOR ... XOR p_n (2.2)


In this type of solution each share needs to be the same size as the original secret. If n > t > 1 it is a bit more difficult, since there are many subsets of users that may be able to reconstruct the secret. The naive solution is to use the same construction as for t = n, once for each subset. This requires each user to store a large number of shares: with n = 100 and t = 50, each user would have to store C(99, 49) ≈ 5.04·10^28 shares. To get around this storage problem, Shamir's secret sharing scheme can be used.
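The t = n XOR construction above can be sketched as follows (a minimal illustration; the byte-string XOR and helper names are my own):

```python
import os
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    # Bitwise XOR of two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

def deal(secret: bytes, n: int) -> list:
    # n−1 random shares; the last share is s XOR p1 XOR ... XOR p(n−1)  (Eq. 2.1).
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor, shares, secret))
    return shares

def reconstruct(shares: list) -> bytes:
    # s = p1 XOR p2 XOR ... XOR pn  (Eq. 2.2); all n shares are required.
    return reduce(xor, shares)

secret = b"top secret"
shares = deal(secret, 5)
print(reconstruct(shares) == secret)      # True
print(reconstruct(shares[:4]) == secret)  # False with overwhelming probability
```

Note that each share is as long as the secret itself, which is exactly the storage drawback discussed above.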

2.3.1 Shamir’s Secret Sharing

Shamir [16] presents a secret sharing scheme based on polynomials to generate secret shares and reconstruct secrets. There are essentially two requirements that Shamir's secret sharing scheme has to fulfill:

1. Knowing t or more shares S_i makes S easy to compute.

2. Knowing t−1 or fewer shares S_i makes S difficult to compute.

A scheme that fulfills these requirements is usually referred to as a (t, n) threshold scheme. Shamir's secret sharing scheme uses the fact that a polynomial of degree t−1 can be reconstructed from t points.

Assume the secret S is an element in a prime field F of size P, where 0 < t ≤ n < P and S < P. The dealer chooses t−1 random positive values as coefficients a_1, ..., a_(t−1), where a_i < P, and sets the secret to be the constant coefficient by letting a_0 = S. This is used to construct the polynomial seen in Equation 2.3.

f(x) = Σ_{i=0}^{t−1} a_i x^i = a_0 + a_1 x + a_2 x^2 + ... + a_(t−1) x^(t−1) (2.3)

The shares for the n users are constructed from this polynomial by giving each user a share S_i = (i, f(i)), where 0 < i ≤ n, together with the prime P. Note that no user gets the share S_0 = (0, f(0)), since f(0) is the secret.

The secret can be reconstructed given that at least t shares are available. Using Lagrange interpolation, the coefficient a_0 can be recovered. First, the Lagrange basis polynomials are calculated as seen in Equation 2.4.

l_j(x) = Π_{1≤m≤t, m≠j} (x − x_m) / (x_j − x_m)
       = (x − x_1)/(x_j − x_1) · ... · (x − x_(j−1))/(x_j − x_(j−1)) · (x − x_(j+1))/(x_j − x_(j+1)) · ... · (x − x_t)/(x_j − x_t) (2.4)

Once all of the Lagrange basis polynomials have been calculated, the interpolating polynomial can be calculated as seen in Equation 2.5.

L(x) = Σ_{j=1}^{t} y_j l_j(x) (2.5)

Simplifying this gives the original coefficients of f(x) and therefore reveals the secret S. This method can be improved, since many unused constants are calculated.
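The (t, n) scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the prime and helper names are my own, and a real implementation would use a larger field and a cryptographic random number generator. Since only a_0 = S is needed, the Lagrange polynomial is evaluated directly at x = 0 instead of recovering all coefficients (this is the improvement hinted at above, as the unused constants are never computed).

```python
import random

P = 2**31 - 1  # a Mersenne prime; the field size is an assumption for this sketch

def deal(secret, t, n):
    # f(x) = a0 + a1*x + ... + a(t-1)*x^(t-1) with a0 = secret (Eq. 2.3).
    coeffs = [secret] + [random.randrange(1, P) for _ in range(t - 1)]
    def f(x):
        return sum(a * pow(x, i, P) for i, a in enumerate(coeffs)) % P
    # Share S_i = (i, f(i)) for i = 1, ..., n; (0, f(0)) is never handed out.
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation (Eqs. 2.4-2.5) evaluated at x = 0 recovers a0 = S.
    s = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (0 - xm) % P
                den = den * (xj - xm) % P
        s = (s + yj * num * pow(den, -1, P)) % P
    return s

shares = deal(secret=123456789, t=3, n=5)
print(reconstruct(shares[:3]))   # 123456789: any 3 of the 5 shares suffice
print(reconstruct(shares[2:5]))  # 123456789
```

`pow(den, -1, P)` computes the modular inverse (Python 3.8+), corresponding to the field inversion described in Section 2.2.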

2.3.2 Multi secret sharing

A multi-secret sharing scheme is a secret sharing scheme in which multiple secrets are shared during one sharing process. Such schemes are useful when several secrets should be protected by the same amount of data needed to protect a single secret. Multi-secret sharing schemes can be classified into two types: one-time-use and multi-use schemes. In a one-time-use scheme, once the secrets have been reconstructed the dealer has to redistribute new shares to all participants. A multi-use scheme only requires participants to ever keep one share. Most secret sharing schemes today are one-time-use.

Based on Shamir's secret sharing scheme, Yang, Chang, and Hwang [17] propose a multi-secret sharing scheme with a threshold t. The proposed scheme uses a two-variable one-way function f(r, s) that maps a secret s and a value r to a fixed-length bit string. P_1, ..., P_p denote the p secrets that will be shared among the n users. The dealer begins by randomly generating n secret shares s_1, ..., s_n, which are sent out to the participants over secure channels. Once the secret shares have been distributed, the dealer randomly chooses a value r and calculates f(r, s_i) for i = 1, ..., n. The scheme then acts differently depending on the number of secrets (p) compared to the threshold (t).


If p ≤ t, then a prime q is chosen and a polynomial of degree t−1 is constructed as seen in Equation 2.6, where 0 < P_1, P_2, ..., P_p, a_1, a_2, ..., a_(t−p) < q:

h(x) = P_1 + P_2 x + ... + P_p x^(p−1) + a_1 x^p + a_2 x^(p+1) + ... + a_(t−p) x^(t−1) mod q (2.6)

Once h(x) has been constructed, a value y_i = h(f(r, s_i)) mod q is calculated for i = 1, ..., n. Then (r, y_1, y_2, ..., y_n) is published to all users. This results in n+1 public values.

If p > t, then a prime q is chosen and a polynomial of degree p−1 is constructed as seen in Equation 2.7, where 0 < P_1, P_2, ..., P_p < q:

h(x) = P_1 + P_2 x + ... + P_p x^(p−1) mod q (2.7)

Once h(x) has been constructed, a value y_i = h(f(r, s_i)) mod q is calculated for i = 1, ..., n. The published values are then (r, h(1), h(2), ..., h(p−t), y_1, y_2, ..., y_n). This results in n+p−t+1 public values.

Lagrange interpolation is used when reconstructing the secrets. As when constructing the shares, there are two cases.

The first case is when p ≤ t. Here, t participants calculate their own f(r, s_i) values; the f(r, s_i) value is the x_i corresponding to y_i. Using t pairs (f(r, s_i), y_i), the polynomial h(x) can be reconstructed. Numbering the t pairs i = 1, ..., t, the polynomial reconstruction can be written as seen in Equation 2.8.

h(x) = Σ_{i=1}^{t} y_i Π_{1≤j≤t, j≠i} (x − f(r, s_j)) / (f(r, s_i) − f(r, s_j)) mod q (2.8)

The second case is when p > t. In addition to the t values f(r, s_i), the p−t values h(i) for i = 1, ..., p−t are also needed. The polynomial reconstruction can then be written as seen in Equation 2.9.

h(x) = Σ_{i=1}^{t} y_i Π_{1≤j≤t, j≠i} (x − f(r, s_j)) / (f(r, s_i) − f(r, s_j)) + Σ_{i=1}^{p−t} h(i) Π_{1≤j≤p−t, j≠i} (x − j) / (i − j) mod q (2.9)


2.4 Continuous authentication

Most people today have smartphones that contain multiple sensors. The data collected by these sensors could be used for identification, to prevent access by an adversary. A system identifying users based on their behavior when using their smartphone is usually called an active or continuous authentication system. This implies that the system uses continuous biometric data, in the form of position, touch gestures, or keystrokes.

The biometric data used needs to be both continuous and transparent [18]: the system needs to be able to re-authenticate the user periodically, and the authentication should not interrupt the user. These types of systems follow an enrollment and an authentication phase just like regular biometric authentication systems [18]. The main difference is that enrollment involves training a model and requires vastly more samples than a regular biometric authentication system.

There are two possible modes for these types of systems: a verification mode, where the system simply gives a yes/no answer to whether the user is authenticated or not, and an identification mode, where the system connects the behavior to a user. [18]

2.4.1 Performance measurements

When evaluating the produced authentication model there are several measurements to consider, for example the true-accept-rate (TAR), false-accept-rate (FAR), and false-reject-rate (FRR). It can also be useful to produce receiver operating characteristic (ROC) curves, which show the relationship between FAR and TAR by plotting TAR on the y-axis and FAR on the x-axis. From the ROC curve an alternative to accuracy can be used for measuring the quality of the authentication model: the area under curve (AUC) value. The AUC value falls in the (0.5, 1.0) range, where 0.5 represents random guessing and 1.0 ideal results [18].
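The FAR/FRR trade-off and the equal error rate (EER) used to grade models later in this thesis can be sketched as follows. This is a toy illustration with made-up classifier scores, not data from the thesis; the function names are my own.

```python
def far_frr(genuine, impostor, threshold):
    # Accept a sample when its score is >= threshold.
    far = sum(s >= threshold for s in impostor) / len(impostor)  # false accepts
    frr = sum(s < threshold for s in genuine) / len(genuine)     # false rejects
    return far, frr

def equal_error_rate(genuine, impostor):
    # Sweep thresholds over all observed scores; the EER is where FAR ≈ FRR.
    best_gap, eer = float("inf"), None
    for thr in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, thr)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy classifier scores: genuine users tend to score high, impostors low.
genuine = [0.9, 0.8, 0.85, 0.7, 0.95, 0.6]
impostor = [0.2, 0.3, 0.1, 0.4, 0.65, 0.25]

print(equal_error_rate(genuine, impostor))  # 1/6 ≈ 0.167
```

Raising the threshold lowers FAR at the cost of FRR and vice versa; the EER is the single operating point where the two error rates coincide, which is why it is a convenient one-number summary of a biometric classifier.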

As with all biometric authentication systems, one needs to consider certain characteristics when choosing between the different traits. According to Mahfouz, Mahmoud, and Eldin [18], important biometric behavioral characteristics are:

• Universality - Users should have the trait.

• Uniqueness - The trait should distinguish between users.

• Permanence - The trait should not vary over time.

• Collectability - The trait should be easy to measure.

• Performance - The accuracy of the measurements should be robust.

• Acceptability - Users should be willing to present the trait.

• Circumvention - The trait should not be susceptible to spoofing or other attacks.

2.5 Related work

This sub-section presents articles that are related to this thesis.

2.5.1 Smartphone as a biometric service for web authentication

Michelin et al. [19] present a way to use smartphones as a biometric service for web authentication. The authentication system presented uses the native Android facial recognition solution as a way to gather biometric data. The proposed system requires a computer running a web client, a restricted web server, and a smartphone.

When the user tries to access the web server, a request containing the user information is sent to the server. The server then determines whether this is a valid user. If the user is determined to be valid, the server generates a hash based on the user info and a timestamp.

The server then sends this hash back to the user so that the web client can present it as a QR code. The user scans this QR code with the smartphone, proving that the user is physically located at the computer. The smartphone can now be considered physically owned by the user, so it can be used as a biometric reader. The smartphone provides the server with a list of available biometric readers, which the server can sort according to reliability before asking for a reading.

If the server later needs to ensure that the smartphone is still in the possession of the user, a new biometric reading can be required, or a new QR code can be sent to the web client to ensure that the user is still at the computer.

The method for distributing the ownership hash presented in this related work is similar to the way this thesis implements the ownership-hash distribution.

2.5.2 Identifying users of portable devices from gait pattern with accelerometers

Mantyjarvi et al. [20] propose a method for authenticating users based on their gait pattern. The method presented requires the user to carry a portable device with an accelerometer. This device gathers 3-D data that is then processed using correlation, frequency-domain methods, and data distribution statistics. It is assumed that each user has a unique gait signal, that there is a characteristic distribution of frequency components for each person, and that the shape of the signal affects the data distribution.

The tests used 36 different subjects walking a 20-meter distance at different speeds. The analysis of the gathered data produced a receiver operating characteristic (ROC) curve, which showed authentication based on correlation to give the best results. The authentication method presented showed that an equal-error-rate of 7% could be achieved.

The article relates to this thesis by testing classification methods on the same type of continuous authentication factor.

2.5.3 An analysis of different approaches to gait recognition using cell phone-based accelerometers

Muaaz and Mayrhofer [21] present a method for gait recognition using a smartphone's accelerometers. For this, a dataset containing three-dimensional accelerometer data was used. The data was collected from 51 subjects walking 18.5 meters in two directions for a total of 37 meters. The data was collected for each subject in two sessions on different days. The data could not be collected at regular intervals; because of this, linear interpolation was applied between data points so that the data appeared to be collected at 100 hertz.

The classification method used was a support vector machine (SVM) combined with a piecewise linear approximation. This resulted in an equal-error-rate of 22.49% to 33.3%. Without piecewise linear approximation the equal-error-rate achieved was 16.26% to 28.21%.

The article relates to this thesis in that similar machine learning methods are tested for continuous authentication.

2.5.4 Pace independent mobile gait biometrics

Zhong, Deng, and Meltzner [22] present a method for authentication based on gait that is robust against variance in walking speed. The data used is collected from mobile phone accelerometer sensors, with 51 distinct subjects walking at varying speeds. A representation of the subjects' gait is built using gait dynamics images, a representation that is invariant to sensor orientation. Using advanced machine learning methods, the noise in the sensor data is reduced.

An algorithm based on dynamic programming is proposed for classifying the gait dynamics image representations. The equal-error-rate achieved with this algorithm was between 3.89% and 7.22%.

The article relates to this thesis by also using gait data for classification. However, the article does not use the raw sensor data. Instead, a representation called gait dynamics images was used, and the classification was done on these images.


3. Methodology

This section presents the method that will be used to achieve the concrete and verifiable goals presented in Section 1.4.

3.1 Study of the research area

Studying the area of authentication and multi-factor authentication will be done by reading relevant articles. This involves articles that cover a wide area of the subject, such as surveys of the research area, articles that discuss how biometric data can be collected from smartphone sensors, etc. Articles that cover relevant cryptography methods will also be studied as needed. Articles will be accessed through the University library or online databases.

The relevant information from this study is presented in the Theory section of this thesis.

The main goal of this study is to find a method for the implementation of the authentication system that is mentioned in Section 1.4.

3.2 Development of a general authentication sys- tem

The implementation of this thesis covers both the implementation of the core authentication system and the continuous authentication system. The vision is that these two systems will work together to provide the necessary security while still being easy to use. However, since the core authentication system should be easier to implement as a proof-of-concept, it might not be possible to also develop a proof-of-concept of the continuous authentication system. This is because the continuous system will require training to get good accuracy, and since public datasets will be used it might be difficult to recreate a way to collect similar data as in the public datasets.

The core authentication system is not expected to generate results that can be easily compared with existing systems, but will mainly be built as a proof-of-concept. In contrast, the continuous authentication will mainly be used for result analysis, while the implementation and integration with the core authentication system is not of high priority.

Using the information gained from the study of the research area, a proof-of-concept system will be developed with at least three authentication factors: one knowledge, one ownership, and one biometric factor. The system should be developed in such a way that the different authentication factors can easily be exchanged for others. Different authentication factors should be able to have different levels of importance to the authentication result, so that a potential system can be customized to whatever the security requirements are.

The development of the core authentication system plans to use an Android application to collect authentication factors. This will require the Android application to implement some way to securely store secret hash values. For example, if the phone is to be used as an ownership proof, then the application will need to be able to store some unique secret hash that is only known to the user's phone and the core system server.

The system is planned to use three authentication factors: one knowledge, one ownership, and one biometric factor. The knowledge factor can simply be a password, while the ownership factor should be some unique secret hash only known to the system and the user. This would require some method of communicating the secret ownership hash value from the core system to the user's phone, either by text entry or through a QR-code. The biometric factor is planned to be either a fingerprint scan or facial recognition, since the required sensors, such as front-facing cameras and fingerprint readers, are readily available on most smartphones.

A server needs to be developed both to serve a website for the authentication system and to communicate with a central database. This server will need to implement some authentication mechanism and a secure method of storage for all the user secrets. To achieve this, Node.js is planned to be used, because of the large number of available JavaScript packages. For example, Node.js gives access to many encryption/decryption packages, as well as packages for generating QR-codes, etc. For the database, a suitable SQL implementation is planned to be used.

3.3 Study of continuous authentication

Similar to the study of the research area in general, this study will make use of the University library and online databases to access articles. The information gained through the study is presented in the Theory section.

The goal of this study is to find information about which continuous authentication methods work well and what datasets are available. Training and testing methods from different articles will be examined to see what methods are generally used in these circumstances.

3.4 Developing continuous authentication models

For the development of the continuous authentication models, public datasets will be used. These datasets must fulfill a set of requirements.

First, it should be possible to distinguish subjects from the data. This is so that a model can be trained for one subject, where that subject's data is used for correct authentication and other subjects' data is used as incorrect authentication.

Second, the data should be expected to show differences between subjects' behavior. Because of this it is important to consider other studies when choosing datasets, since using a type of behavior that does not vary across people will produce inaccurate authentication results.

Third, the different datasets should have enough subjects to allow for a mapping between subjects of the different datasets.

3.5 Combining continuous authentication models

For the training of the continuous authentication methods, several distinct datasets will be used, so that it is possible to gauge how the different behavioral factors compare both in terms of accuracy and practicality. This thesis will also look at how ensemble learning can be used to combine the different continuous authentication methods. To be able to do this, the different datasets need to have some symbolic relationship. This relationship is a mapping between subjects from one dataset to another.

It is expected that once a mapping is made between a set of datasets, a model can be trained to identify one subject for each of the datasets. These models can then be combined into an ensemble learning model, possibly creating a more accurate classifier. How this affects the FAR value needs to be examined. If introducing ensemble learning methods increases both the classification accuracy and the FAR value, one could argue that the system has become less secure because of the ensemble learning methods.

The two main datasets that will be used are the 93 human gait dataset collected by Vajdi et al. [23] and the keystroke dynamics benchmark dataset collected by Killourhy and Maxion [24].

The 93 human gait dataset contains sensor data from smartphones recorded while a subject is walking. The dataset records features such as the roll, yaw, and pitch of the phone, as well as various others. The dataset is interesting since it contains a large number of samples with many subjects. Since many features may be used for authentication, this dataset could serve as a source of many different authentication factors for testing ensemble learning methods.

The keystroke dynamics benchmark dataset contains data of many subjects typing the same password. The dataset contains measurements such as how long a particular key is being held down, the time between a key release and the next keypress, as well as the time between two keys being pressed. This allows a model to be produced per subject that can authenticate that subject. However, due to the nature of this type of authentication it may be difficult to suggest a training method for a real-world scenario, because training such a classifier would require both positive and negative examples of the password being typed, and providing the negative examples would require revealing the password. Using this dataset might still reveal some interesting insight into whether or not a user's typing rhythm can be used as an authentication factor.


4. Implementation

This section describes the design and implementation of a multi-factor authentication system.

4.1 Overview

The multi-factor authentication system uses a smartphone to collect both the biometric and non-biometric authentication factors. For this purpose an Android application was developed. The functionality of this application includes collection of authentication factors, secure storage of any persistent data, and secure communication with an authentication server. The authentication server keeps a database of users and hash values to confirm a user's authentication factors.

Authentication is done by each factor unlocking a set of secret shares that can then be combined to form an authentication secret. For more details on the theory behind this see Section 2.3.1.

Figure 4.1: Authentication factor enrollment process.

Figure 4.1 shows the process of registering a new user. In step 1 the user enters their name and it is sent to the authentication server; the server then responds in step 2 with a hash value based on the username and a timestamp. Once the user computer has the hash value it is presented on the screen in the form of a QR-code. This QR-code is then scanned using the mobile application in step 3. Now the smartphone can be used as a proof of ownership, since only it and the server should know the secret hash value. The user is then asked to enter a password and a fingerprint into the mobile application. This data is then sent back to the authentication server together with the secret hash value in step 4. In steps 5 and 6 the server responds, telling the user computer and the mobile application whether the registration was successful or not.

Figure 4.2: Authentication factor verification process.

Figure 4.2 shows the process of verifying a user. In step 1 the user enters their username and it is sent to the authentication server. The authentication server then enters a state where it is expecting to receive authentication factors from the user's smartphone. In step 2 the user enters a password, a fingerprint, or both into the mobile application, and the data is then sent to the authentication server together with the ownership proof in the form of a hash value. The server tries to authenticate the user using the data and responds with the result in steps 3 and 4.

4.1.1 Technologies used

• Node.js - A cross-platform JavaScript runtime environment that allows JavaScript to be executed outside of a web-browser. [25]

• Electron - Framework for creating native desktop applications using web technologies such as JavaScript, HTML and CSS. [26]

• SQLite3 - A relational database management system that is embedded into the end program. SQLite3 does not act as a traditional client-server database engine. [27]

• Android Studio - A development environment for the Android operating system. [28]

• Mobile Vision API - A framework that provides the functionality of detecting objects in photos and video. The API includes detectors for faces, barcodes and text. [29]

• HTTPS - A protocol for secure communication over the hypertext transfer protocol. Security is guaranteed by the use of certificates, where third parties can confirm the identity of a server. [30]

4.2 Android application

This sub-section covers the construction and implementation of the Android application.

4.2.1 Enrollment of authentication factors

The different factors that the user has to provide when enrolling are an ownership hash value, a fingerprint hash value, and a password. These three factors represent an ownership proof, a biometric proof, and a proof of knowledge.

The ownership hash value is entered by the user scanning the QR-code presented on their computer screen. The application achieves this using the Google Mobile Vision API [29]. This API needs a picture, or a series of pictures taken from a video stream, to analyze for faces, QR-codes, or text. This means that the application has to be able to provide pictures for analysis. This can be done either by using the camera API that Google provides to start a camera inside the application, or by using Android intents to launch a camera application to take a picture. The advantage of using Google's camera API is that it allows for continuous scanning of a video stream. This removes the need for scanning the QR-code again if the scan fails, since the video stream will be scanned until it is successful. However, implementing this feature was judged to be quite time consuming and not necessary for the aim of this thesis. Using Android intents made the implementation quite fast and did not require much code outside of interacting with the Vision API itself.

Android intents allow an Android application to use the functionality of other applications. For example, if an application needs to take a picture it can simply launch another camera application to take the picture. This is done by first asking the Android operating system for an application that can provide the required functionality. If an application is found it can be launched and asked to handle the output in a specific way. The output is generally returned directly, but in the case of a picture or a video it is usually saved on the device. In the case of this application a picture was saved temporarily in a location private to the application. The saved picture is then scanned by the Mobile Vision API. If a QR-code is found the value is returned and stored; otherwise the application asks the user to take another picture.

The fingerprint feature was used because the particular Android device used had a fingerprint sensor. At first facial recognition was planned to be used instead, but the Android device did not support that method of authentication. The Mobile Vision API is also only capable of face detection, not recognition [31]. Android versions 9 and 10 introduce biometric authentication services; these include both authentication based on facial recognition and fingerprint detection [32]. The Android device used in this thesis ran Android version 8.1, which required the use of the deprecated fingerprint manager service. This makes it possible to ask the user to scan their fingerprint; if the fingerprint is accepted by the fingerprint manager, a random hash string is generated. This hash string is enrolled as the fingerprint hash value, since it can be used as a proof of successful fingerprint authentication.

The password is enrolled by simply asking the user to enter a password into the application. The password, however, does not need to be stored in the application memory, since the user should know it.

4.2.2 Secure storage of authentication data

The storage of both the ownership hash value and the fingerprint hash value is done in the same way: a feature called shared preferences is used to store the values. The shared preferences feature, provided by the Android operating system, is a method for application developers to store options and settings for a specific application. Although the preferences of one application can be set to be private, the values are never stored encrypted. Because of this, both the ownership hash value and the fingerprint hash value need to be encrypted on the device before storage.

This can be achieved by using the Android Keystore [33]. This is a method for storing encryption and decryption keys in a secure way, where other applications or data extractors will have difficulty finding them. The Android Keystore allows the developer to decide when and how the keys should be accessed. For example, one could require the user to authenticate themselves before the application has access to the keys. This makes it possible to store the hash secrets securely on an Android device and use these secrets as identification proofs.

4.2.3 Secure communication with authentication server

To communicate securely between the smartphone and the authentication server, the HTTPS protocol was used. This allows for the posting of JSON (JavaScript Object Notation) objects containing the necessary information to be enrolled or verified. Using HTTPS requires the server to have a TLS (Transport Layer Security) certificate; this certificate then needs to be signed by a certificate authority so that the user's smartphone can trust that the connection is secure. However, since this thesis does not aim to produce an authentication system that is ready to be used in the real world, self-signed certificates were used for development. Using self-signed certificates is not recommended in a product that is planned for release, since it requires that the user completely trusts the identity of the server.

4.3 Authentication server

This sub-section covers the construction and implementation of the authentication server.

4.3.1 Authentication method

The authentication method uses Shamir's secret sharing scheme. This allows for a secret to be divided into a set of shares. Given a threshold t out of n shares, the original secret can be reconstructed only if t or more shares are available. For a more detailed explanation see Section 2.3.1.

Figure 4.3: Method for distributing and collecting secret shares.

Figure 4.3 shows how a secret can be divided into an arbitrary number of sets of secret shares. Each of these sets can then be distributed to an arbitrary number of different authentication systems. These systems can be completely independent and distributed away from each other. This means that even if one of the authentication systems is compromised, the original secret should still be secure as long as each system keeps fewer than t shares.

Figure 4.4 shows an example of how one could think about the secret shares. We can view each secret share as being a sample point on a polynomial function, and the secret as being the polynomial function sampled at x = 0. The reason we can think about the secret sharing scheme in this manner is that if we have t points we can always construct a polynomial of degree t−1. As long as we can construct the original polynomial we can always find the secret value. This secret sharing method can be applied to the real numbers; however, when implementing it on a computer it is not possible to use floating-point arithmetic, because of floating-point error. Instead the implementation needs to use a finite number field where each element is invertible. The way this is implemented in the authentication server is to use a prime field with a prime number p of elements.

(34)

Figure 4.4: Example polynomial.
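The polynomial view translates directly into code: pick random coefficients for a degree t−1 polynomial with the secret as the constant term, and hand out samples at x = 1…n as the shares. The following is a sketch, assuming an illustrative Mersenne prime field; the randomness shown is not cryptographically secure, and a real system would draw coefficients with a CSPRNG:

```javascript
// Sketch of Shamir share generation: f(0) = secret, shares are (x, f(x) mod P).
// P and the randomness below are illustrative assumptions, not the thesis parameters.
const P = 2n ** 127n - 1n; // a Mersenne prime, for illustration only
const mod = (a) => ((a % P) + P) % P;

function makeShares(secret, t, n) {
  // Random polynomial of degree t-1 with the secret as the constant term.
  const coeffs = [secret];
  for (let i = 1; i < t; i++) {
    coeffs.push(BigInt(Math.floor(Math.random() * 2 ** 52))); // NOT cryptographically secure
  }
  const shares = [];
  for (let x = 1n; x <= BigInt(n); x++) {
    let y = 0n;
    // Horner evaluation of the polynomial at x, modulo P.
    for (let j = coeffs.length - 1; j >= 0; j--) y = mod(y * x + coeffs[j]);
    shares.push([x, y]);
  }
  return shares;
}

const shares = makeShares(1234567n, 3, 5); // threshold 3 out of 5 shares
console.log(shares.length); // 5
```

Any three of the five shares then determine the polynomial, and with it the secret at x = 0, while two shares reveal nothing about it.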

We can then implement all the necessary operations: addition, subtraction, multiplication, and inversion. Addition, subtraction, and multiplication are all done by performing the calculation and then returning the result modulo p. For inversion, however, the extended Euclidean algorithm needs to be used. The extended Euclidean algorithm allows us to calculate values s and t such that:

gcd(r0, r1) = s·r0 + t·r1 (4.1)

We perform the inversion by setting the value r0 to be p and r1 to be the value x we want to invert. We then get:

gcd(p, x) = 1 = s·p + t·x (4.2)

Since s·p ≡ 0 mod p, this implies that the value t can be used as the inverse of x:

x⁻¹ ≡ t mod p ⇐⇒ x·t ≡ 1 mod p (4.3)

Once we can perform addition, subtraction, multiplication, and inversion on all the elements in the finite field, it is possible to apply Lagrange interpolation to reconstruct the original polynomial that contains all the secret shares. Once the polynomial is reconstructed, the secret can be found by sampling at x = 0.
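Equations 4.1–4.3 and the interpolation step can be sketched in a few lines of Node.js BigInt arithmetic. The prime and the example polynomial below are illustrative assumptions, not the thesis implementation:

```javascript
// Sketch: finite-field inversion (extended Euclidean algorithm) and
// Lagrange interpolation at x = 0. P and the demo shares are illustrative.
const P = 2n ** 127n - 1n;
const mod = (a) => ((a % P) + P) % P;

// Returns t such that x * t ≡ 1 (mod P), as in Eq. 4.3.
function invert(x) {
  let [r0, r1] = [P, mod(x)];
  let [t0, t1] = [0n, 1n];
  while (r1 !== 0n) {
    const q = r0 / r1; // BigInt division truncates, as the algorithm requires
    [r0, r1] = [r1, r0 - q * r1];
    [t0, t1] = [t1, t0 - q * t1];
  }
  return mod(t0);
}

// Lagrange interpolation evaluated at x = 0 over t shares [x, y].
function reconstruct(shares) {
  let secret = 0n;
  for (const [xi, yi] of shares) {
    let num = 1n, den = 1n;
    for (const [xj] of shares) {
      if (xj === xi) continue;
      num = mod(num * xj);        // numerator:   product of the x_j
      den = mod(den * (xj - xi)); // denominator: product of (x_j - x_i)
    }
    secret = mod(secret + yi * num * invert(den));
  }
  return secret;
}

// Demo: three shares of the secret 42 on f(x) = 42 + 7x + 3x^2 (threshold t = 3).
const f = (x) => mod(42n + 7n * x + 3n * x * x);
const demoShares = [[1n, f(1n)], [2n, f(2n)], [4n, f(4n)]];
console.log(reconstruct(demoShares) === 42n); // true
```

Note that any three distinct sample points of the degree-2 demo polynomial recover the same constant term.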

When a user enrolls their authentication data, a secret is generated using a cryptographically strong pseudo-random number generator. This secret is then made into n secret shares, and each of the authentication factors is assigned a distinct subset of those shares. This allows one to weigh each authentication factor according to its perceived security level. For example, a biometric authentication feature may be unique to the user but suffer from a non-zero false-acceptance-rate. This means that the authentication feature cannot be trusted fully, and the weight of the feature should reflect that.

For this authentication system the ownership factor has been assigned a majority of the required secret shares. This is to make the ownership proof required by the system. The remaining shares have then been equally distributed among the remaining authentication factors. By controlling the distribution of secret shares it is possible to make some authentication factors optional and others non-optional. This authentication system views the fingerprint and password as optional, in that only one of them needs to be provided. This allows users some freedom in how they authenticate themselves to the system. Using a similar method for authentication can allow system developers to easily interchange the authentication factors with others and provide a large selection of different authentication methods.
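This weighting can be expressed as a simple share allocation. The numbers below are illustrative assumptions, not the thesis configuration: with a threshold of t = 6 out of n = 7 shares, ownership holds five shares and is therefore mandatory, while the password and fingerprint each hold one share and are interchangeable:

```javascript
// Sketch: making ownership mandatory and password/fingerprint interchangeable
// by controlling how many of the n shares each factor unlocks.
// The threshold and allocation below are illustrative assumptions.
const threshold = 6; // t shares are needed to reconstruct the secret
const allocation = { ownership: 5, fingerprint: 1, password: 1 }; // n = 7 in total

function canReconstruct(providedFactors) {
  const unlocked = providedFactors.reduce((sum, f) => sum + (allocation[f] || 0), 0);
  return unlocked >= threshold;
}

console.log(canReconstruct(["ownership", "password"]));    // true
console.log(canReconstruct(["ownership", "fingerprint"])); // true
console.log(canReconstruct(["password", "fingerprint"])); // false: ownership is required
```

Swapping in a different allocation changes which factor combinations are sufficient without touching the rest of the system.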

4.3.2 Database design

Storing the secret shares can be done using three different methods: the secret shares are either all stored locally on a user device, stored in some central database, or distributed among several different authentication systems.

Storing the secret shares locally on a user device creates a single point of failure for each user, because if the device is compromised by malware or hacking, all the secret shares may be accessed by an opponent. Even if the secret shares are stored in an encrypted form, the problem of storing the encryption keys remains.


Figure 4.5: ER diagram of the database.

This thesis uses a central database to store the secret shares. The ER-diagram for the central database can be seen in Figure 4.5. As can be seen, the user stores a name and a secret; this is the secret that must be reconstructed to be authenticated. The users table does not have any relation with any of the other tables. The only relation is that a certain set of secret shares in the secret share table can reconstruct the secret stored in the users table. This is done so that in a database containing a large number of users and secret shares, it should be difficult to create a mapping between a user and the correct secret shares. Due to the authentication method used, the problem an opponent would have to solve to map the secret shares to the correct users is to randomly guess a subset of size n where all the shares can reconstruct the correct secret. If we let the number of users in the database be N, then the number of secret shares in the database is N·n. The possible number of guesses is the binomial coefficient C(N·n, n). Imagine if the opponent generated all possible guesses and wanted to test them to find which one matched a specific user. We can assume that n is constant and that N grows as users are added.

C(N·n, n) = (N·n)! / ((N·n−n)!·n!) (4.4)

In big-O notation, n is constant, so:

1/n! = O(1) (4.5)

The big-O notation for the remainder of the formula is then:

(N·n)! / (N·n−n)! = N·n·(N·n−1)·(N·n−2)·...·(N·n−n+1)
= N^n·(n·(n−1/N)·(n−2/N)·...·(n−(n−1)/N)) = O(N^n·n^n) = O((N·n)^n) (4.6)

This is because:

lim_{N→∞} n·(n−1/N)·(n−2/N)·...·(n−(n−1)/N) = n^n (4.7)

Knowing this, we can estimate the average time it takes for an opponent to match one user to their secret shares as t·(N·n)^n / 2, where t is the number of seconds needed for one check. If N = 1000000, n = 10, and t = 0.2, this results in an estimated time of 1·10^69 seconds. This assumes that the opponent cannot categorize the secret shares by authentication method. Also note that in Figure 4.5 there is a one-to-one relationship between the ownership hash table and the password table. This is merely a practical implementation detail to make each password entry unique. In reality there should be no relationship between the data of the different authentication methods, because the user should provide this relation.
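The figure of 1·10^69 seconds can be reproduced with a few lines; the values are taken from the text, and floating-point arithmetic is accurate enough at this scale:

```javascript
// Numeric check of the brute-force estimate t * (N*n)^n / 2 from the text.
const N = 1e6;  // users in the database
const n = 10;   // secret shares per user
const t = 0.2;  // seconds needed for one check
const seconds = (t * Math.pow(N * n, n)) / 2;
console.log(seconds.toExponential(0)); // ≈ 1e+69
```

For comparison, this is dozens of orders of magnitude longer than the age of the universe, which is on the order of 4·10^17 seconds.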

Distributing the secret shares to several third-party authentication systems can provide some additional security. Each system should have fewer shares than the threshold t. We assume that each of the third-party systems cannot create a relationship between their secret shares and another system's secret shares, so that even if two systems wanted to collaborate to reconstruct the secrets, they should not be able to do it. The reason this type of storage solution is quite secure is that if one of the third-party authentication systems becomes compromised, none of the secrets become known, and new secret shares can be generated and distributed as soon as the main server is alerted. Generating new secret shares should not require the users to update all their credentials; potentially only the authentication factors related to the compromised system need to be updated.

4.3.3 Integration with continuous authentication

Integrating continuous authentication into the authentication system is out of the scope of this thesis. However, it is still worth discussing how this integration could be implemented.

One of the benefits of introducing continuous authentication in an authentication system is that it allows for fraud detection. Fraud detection allows us to analyze behavior and detect patterns of behavior that are out of the norm. Thus the user can be informed of the strange behavior so that they may take appropriate action.

Continuous authentication may also allow for a better user experience. An equivalent amount of security can be provided while making the system easier to use. This is because continuous authentication does not rely on an authentication process in which the user actively has to participate. Instead the user participates passively while engaging in behavior that they would engage in regardless.

Integrating continuous authentication could be done by treating the continuous authentication method as any other authentication method. This would involve distributing secret shares to the continuous authenticator. The continuous authenticator would then only provide the secret shares to the authentication system once it has achieved a certain confidence in the user's identity. Due to the high false-acceptance-rate associated with continuous authentication, the continuous authenticator cannot decide if the user is verified based on one sample authentication. Instead the continuous authenticator has to periodically evaluate the authentication data by calculating a probability of the user being authenticated. A threshold value therefore needs to be set as the dividing line between the user being authenticated and not authenticated. How this threshold value is set should reflect the number of secret shares assigned to the continuous authenticator.

One of the problems with integrating continuous authentication is the training process. Authentication is a binary decision: either a user is verified or not verified. This requires the authentication models to be trained on data that represents both a verified user and a non-verified user. Due to privacy concerns the authentication models cannot be trained on user devices, since training the models on user devices would require distributing behavioral data to the user devices. Instead training the models in a secure central server potentially does
