Examiner: György Dán
School of Electrical Engineering and Computer Science Host company: KTH
Swedish title: Tillämpningen av generative adversarial networks för
attacker mot kontinuerlig autentisering
Abstract
Cybersecurity has been a hot topic over the past decades, with many approaches proposed to secure our private information. One of the emerging approaches in security is continuous authentication, in which the computer system authenticates the user by monitoring the user's behavior during the login session. Although research on continuous authentication has made significant progress, the security of state-of-the-art continuous authentication systems is far from perfect. In this thesis, we explore the ability of classifiers used in continuous authentication and examine whether they can be bypassed by samples of user behavior produced by generative models.
In our work, we considered four machine learning classifiers as the continuous authentication system: one-class support vector machine, support vector machine, Gaussian mixture model, and an artificial neural network. Furthermore, we considered three generative models used to mimic the user behavior: generative adversarial network, kernel density estimation generator, and MMSE-based generator. The considered classifiers and generative models were tested on two continuous authentication datasets. The results show that generative adversarial networks achieved superior results, with more than 50% of the generated samples passing continuous authentication.
Sammanfattning
Cybersecurity has been a hot topic in recent decades, with many approaches created to secure our private information. One of the emerging approaches in security is continuous authentication, where the computer system authenticates the user by monitoring their behavior during the login session. Although research on continuous authentication has made significant progress, the security of state-of-the-art continuous authentication systems is far from perfect. In this thesis, we examine the ability of classifiers used in continuous authentication and investigate whether they can be fooled with the help of generative models. In our work, we used four machine learning classifiers as the continuous authentication system: one-class support vector machine, support vector machine, Gaussian mixture model, and an artificial neural network. Furthermore, we considered three generative models used to mimic the user's behavior: generative adversarial network, kernel density estimation generator, and MMSE-based generator. The considered classifiers and generative models were tested on two continuous authentication datasets. The results show that generative adversarial networks achieved superior results, with more than 50% of the generated samples passing continuous authentication.
1 Introduction
1.1 Motivation
1.2 Scope of the Thesis
1.3 Research Question
1.4 Ethics
1.5 Outline
2 Background
2.1 Classifiers
2.1.1 Support Vector Machines
2.1.2 Gaussian Mixture Model
2.1.3 Neural Network
2.2 Generative Models
2.2.1 KDE Generator
2.2.2 GAN
2.3 Evaluation Metrics
3 Related Work
3.1 Multi Factor Authentication
3.1.1 Biometric Authentication
3.2 Continuous Authentication
3.2.1 CA for PCs
3.2.2 CA for Mobile Devices
3.2.3 CA for Drivers
3.3 Application of GAN in Cybersecurity
3.3.1 PassGAN
3.3.2 MalGAN
3.3.3 IDSGAN
3.3.4 GAN in Continuous Authentication
4 Methods
4.1 Datasets
4.1.1 CMU Keystroke Dataset
4.1.2 Touchalytics Dataset
4.2 Attack Model against CA
4.2.1 Generative Models
4.2.2 Classification Models
4.3 Training Procedure
5 Results
5.1 Model Evaluation
5.2 Performance of Classifiers
5.3 Performance of Generative Models
5.3.1 Averaged Generative Ability of Generative Models
5.3.2 Effects of the Amount of Training Data
5.3.3 Execution Time
6 Conclusions
6.1 Future Work
Bibliography
1 Introduction
1.1 Motivation
User authentication is an extremely important part of computer and network system security. Currently, password-based authentication is still one of the most popular approaches, and it only authenticates the user at the initial login stage. Because attackers can apply brute-force and dictionary password searches, which have proven to be very efficient [1], such one-time authentication is easily broken when users choose relatively simple passwords or reuse the same password for multiple accounts.
There are many ways to enhance the information security of private and corporate devices compared to one-time password authentication. One such approach is multi-factor authentication (MFA), which authenticates a user using multiple means, each called an authentication factor. Authentication factors can be biometric information such as fingerprints, faces, or signatures, or one-time generated passwords such as a PIN code. However, both password authentication and MFA share the same security flaw: a user is only authenticated at the initial login.
To ensure reliable user authentication during the entire active login period, a continuous authentication scheme is required [2]. The emerging concept of continuous authentication (CA) is based on continuously computing how certain it is that the device is logged in by a legitimate user.
Recent research on continuous authentication is based on behavioral information, where the recorded user input is used to verify the legitimate user [3, 4, 5, 6]. The recorded user input typically used for authentication can be keystroke patterns on a keyboard, patterns of pointing devices such as mice and touchscreens, transaction characteristics, or several other indicators of user behavior. To model this behavior with mathematical models, many machine learning algorithms such as support vector machines (SVMs), the Gaussian mixture model (GMM), or K-nearest neighbors (KNN) have been applied to continuous authentication. A classifier based on these algorithms is trained to distinguish the user input of different users.
Even though research on how to build a CA system has made significant progress, the security of state-of-the-art CA systems is far from perfect. Consider a system in which keystroke patterns are recorded for continuous authentication. If attackers could model the probability distribution of the recorded keystroke patterns, the CA system could easily be bypassed. For mobile devices, the same method can be applied to typing and movement patterns. This brings up the question of how to generate similar keystroke patterns and pointing-device movement patterns. The fast-developing field of generative models, and especially the invention of generative adversarial nets (GAN) [7], which achieved superior performance in image generation and data augmentation tasks, can offer a promising solution to this problem.
1.2 Scope of the Thesis
The desired outcome of the project is to propose an attacking framework that can bypass a CA system after learning the distribution of the recorded user input data. Bypassing CA is done by generating similar data, feeding it into the CA classifiers, and checking how many attempts from an unauthorized user are incorrectly accepted. The core of the framework is the GAN, which has proved powerful at estimating data distributions via an adversarial process. Sample generation is a learning process: generative models minimize the difference between fake user behavior data and authorized user behavior data. Hypothetically, the classifiers are expected to wrongly classify the generated samples as legitimate attempts more often than they do when recorded data from other users is used as unauthorized attempts.
The evaluation standard is to compare the rate of mistakenly accepted attempts before and after using the GAN. One goal of the project is that the attacking framework should be compatible with different devices; therefore, we used several datasets corresponding to different typical user input. Because the shape of the user input data differs, one challenge of this project is to adapt the structure of the GAN to different user input.
1.3 Research Question
The main question this thesis aims to answer is: to what extent, and with how much training data, can the output of a generative adversarial network be used to bypass continuous authentication based on user behavioral information?
1.4 Ethics
The damage caused by cybercrime, including theft of mobile devices and digital information, has led to severe economic losses. According to the Symantec 2019 Internet Security Threat Report [8], the situation is getting worse, with a 56% increase in web attacks and a 33% increase in mobile ransomware. To reduce the damage and build a more reliable security system, it is necessary to find existing vulnerabilities and fix them. The purpose of generating adversarial attacks to bypass a CA system is to find such vulnerabilities. It is important to note that our work cannot improve the performance of CA systems by directly fixing the existing vulnerabilities. To prevent malicious usage, our work is not directly implementable in the real world.
1.5 Outline
The rest of the thesis is organized as follows: Chapter 2 gives the background needed on continuous authentication, generative adversarial nets, and evaluation metrics; Chapter 3 presents the related work; Chapter 4 explains the method used for testing the hypothesis of this thesis; Chapter 5 presents the results and analyzes them. Finally, Chapter 6 concludes the thesis by recapitulating the results and proposing future work.
2 Background
In this chapter, we first discuss the basic theory of the machine learning classifiers that we use in this thesis for continuous authentication. We then discuss the basic theory of the generative models used for bypassing continuous authentication. Finally, we discuss the metrics used to evaluate the performance of the generative models.
2.1 Classifiers
Many machine learning algorithms have been applied to the problem of building a continuous authentication system, including support vector machines (SVMs), the Gaussian mixture model (GMM), and neural networks. In the following, we provide a short description of each of these algorithms.
2.1.1 Support Vector Machines
Support vector machines (SVMs) are machine learning models aimed at analyzing data for classification and regression. SVMs were introduced in the early 1990s by Vapnik [9]. The fundamental theory behind SVMs is statistical learning: by minimizing the empirical risk and the confidence interval of the learning machine, good generalization capability can be achieved [10].
Consider a sample space $\Omega = \{(u_i, v_i),\ i = 1, \dots, N\}$, where the $u_i$ represent input vectors and the $v_i$ represent the linked targets. The linear binary classification problem can be described as finding an optimal hyperplane $w^T u + b = 0$ that separates the $u_i$ based on $v_i \in \{0, 1\}$. The number of values that $v$ can take depends on the number of classes/groups that need separation. Fig 2.1 gives an example of how to separate two groups with a linear hyperplane.
Figure 2.1: An example of SVM using linear kernel.
By calculating the direct distance of each point to the hyperplane and minimizing the sum of these distances, the problem can be turned into a quadratic program:

\[
\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i, \tag{2.1}
\]

with the constraint condition

\[
\xi_i = \max\bigl(0,\ 1 - v_i(w \cdot \phi(u_i) + b)\bigr), \quad \xi_i \ge 0, \quad i = 1, \dots, N,
\]

where $w$ is the hyperplane's weight vector; $b \in \mathbb{R}$ is a bias term; $\phi(\cdot)$ is a function mapping the training samples $u_i$ into a high-dimensional space; $C > 0$ is the penalty factor for misclassified samples; and $\xi_i$ is the hinge loss.
Then, we get the dual problem:

\[
\max_a Q(a) = \sum_{i=1}^{N} a_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} a_i a_j v_i v_j \, \phi(u_i) \cdot \phi(u_j),
\]
\[
\text{subject to} \quad \sum_{i=1}^{N} a_i v_i = 0, \quad 0 \le a_i \le C, \quad i = 1, 2, \dots, N. \tag{2.2}
\]
The classification can now be described as:

\[
v = \mathrm{sgn}(w^T \phi(u) + b) = \mathrm{sgn}\Bigl(\sum_{i=1}^{N} a_i v_i \bigl(\phi(u) \cdot \phi(u_i)\bigr) + b\Bigr), \tag{2.3}
\]

where $K(u, u_i) = \phi(u) \cdot \phi(u_i)$ is called the kernel function. Different SVM algorithms apply different kernel functions, and the shape of the hyperplane depends on the kernel function. Fig 2.2 gives an example of an SVM with a non-linear kernel function.
Figure 2.2: An example of SVM using non-linear kernel.
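As a concrete sketch of the kernel classifier in Eq. (2.3), the code below fits an RBF-kernel SVM to toy two-class data (an inner cluster surrounded by a ring, which no linear hyperplane can separate). The dataset, the choice of scikit-learn, and the parameter values are illustrative assumptions, not taken from the thesis.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Class 0: points near the origin; class 1: points on a surrounding ring.
inner = rng.normal(0.0, 0.3, size=(50, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, 50)
outer = np.c_[2.0 * np.cos(angles), 2.0 * np.sin(angles)] + rng.normal(0.0, 0.1, (50, 2))
U = np.vstack([inner, outer])
v = np.array([0] * 50 + [1] * 50)

# C is the penalty factor from Eq. (2.1); the RBF kernel plays the role
# of K(u, u_i) in Eq. (2.3).
clf = SVC(kernel="rbf", C=1.0).fit(U, v)

# A point at the origin and a point on the ring should fall on opposite
# sides of the learned non-linear decision boundary.
pred = clf.predict([[0.0, 0.0], [2.0, 0.0]])
```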
Normally, an SVM separates samples from two or more classes. But when we only want to know whether test data belongs to the same class as the training data, the one-class SVM is a good solution. Its quadratic program is slightly different from Equation 2.1:
\[
\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i - \rho, \tag{2.4}
\]

with the constraint condition

\[
\xi_i = \max\bigl(0,\ \rho - w \cdot \phi(u_i)\bigr), \quad \xi_i \ge 0, \quad i = 1, \dots, N,
\]

where the constant $\rho$ is the threshold. Then the classification can be described as:

\[
v = \mathrm{sgn}(w^T \phi(u) - \rho) = \mathrm{sgn}\Bigl(\sum_{i=1}^{N} a_i K(u, u_i) - \rho\Bigr). \tag{2.5}
\]
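A minimal sketch of Eqs. (2.4)–(2.5) in the CA setting, assuming scikit-learn's `OneClassSVM` (which parameterizes the problem with ν rather than the penalty C above). The synthetic data stand in for a legitimate user's behavioral features; none of the values come from the thesis.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for recorded behaviour of the legitimate user.
legit = rng.normal(0.0, 0.5, size=(200, 2))

# nu bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(legit)

# predict() returns +1 (accept as the legitimate user) or -1 (reject):
# a point inside the learned region vs. one far outside it.
decisions = ocsvm.predict(np.array([[0.1, -0.2], [5.0, 5.0]]))
```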
2.1.2 Gaussian Mixture Model
The Gaussian mixture model (GMM) is a density estimator and a commonly used type of classifier for clustering. In a GMM, the D-dimensional feature vector is modeled using a linear combination of M Gaussians. For object clustering, each object is represented by a GMM and is referred to by its model $\lambda$, collectively represented by the notation:

\[
\lambda = \sum_{i=1}^{M} \rho_i \cdot \mathcal{N}(x \mid \mu_i, \Sigma_i), \tag{2.6}
\]

where the $\mu_i$ are mean vectors, the $\Sigma_i$ are covariance matrices, and the $\rho_i$ are mixture weights. Fig 2.3 shows a simple example of clustering 1-dimensional features using a GMM.
Figure 2.3: An example of clustering using GMM.
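The mixture of Eq. (2.6) can be sketched with scikit-learn's `GaussianMixture` on toy 1-dimensional data, in the spirit of Fig 2.3. The data, the number of components, and the cluster locations are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two well-separated 1-D clusters centred at -3 and +3.
x = np.concatenate([rng.normal(-3.0, 0.5, 300),
                    rng.normal(3.0, 0.5, 300)]).reshape(-1, 1)

# M = 2 Gaussians; fit() estimates the mu_i, Sigma_i and rho_i of Eq. (2.6)
# via expectation-maximization.
gmm = GaussianMixture(n_components=2, random_state=0).fit(x)

means = np.sort(gmm.means_.ravel())      # estimated mu_i, sorted
weights = gmm.weights_                   # estimated rho_i, summing to 1
```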
2.1.3 Neural Network
A neural network is a computing system whose basic computing element is called a neuron. A collection of neurons operating together at a specific depth within the network is called a layer. Neural networks are widely used in machine learning and have achieved promising results in many areas, including image recognition, speech recognition, natural language processing, and cybersecurity tasks.
Consider a sample space $\Omega = \{(u_i, v_i),\ i = 1, \dots, N\}$, where the $u_i$ represent input vectors and the $v_i$ represent the desired targets. After an input vector is fed into a one-layer neural network, the output is:

\[
\hat{v}_i = \sigma(W u_i + b), \tag{2.7}
\]

where $\hat{v}_i$ is the output, $\sigma(\cdot)$ is called the activation function, $W$ are the weights of the neural network, and $b$ is a bias term. The error between the output $\hat{v}_i$ and the desired target $v_i$ is then defined as:

\[
E(\Omega, W) = \hat{v}_i - v_i. \tag{2.8}
\]

The widely used algorithm for training a neural network to find the best $W$ is gradient descent:

\[
W^{T+1} = W^{T} - \alpha \, \frac{\partial E(\Omega, W^{T})}{\partial W}, \tag{2.9}
\]

where $T$ is the number of training iterations (epochs) and $\alpha$ is a constant called the learning rate. Fig 2.4 gives a simple example of a neural network structure.
Figure 2.4: An example structure of a neural network.
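Equations (2.7)–(2.9) can be sketched end-to-end as a one-layer network trained by gradient descent. This toy NumPy version uses a sigmoid activation, a cross-entropy-style gradient averaged over the samples, and a linearly separable labeling rule, all of which are illustrative choices rather than the thesis's implementation.

```python
import numpy as np

def sigmoid(z):
    # Activation function sigma(.) from Eq. (2.7).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy task: target v = 1 when the sum of the two inputs is positive.
U = rng.normal(size=(200, 2))
v = (U.sum(axis=1) > 0).astype(float)

W = np.zeros(2)
b = 0.0
alpha = 0.5                                # learning rate alpha, Eq. (2.9)

for _ in range(500):                       # T training epochs
    v_hat = sigmoid(U @ W + b)             # forward pass, Eq. (2.7)
    grad_W = U.T @ (v_hat - v) / len(v)    # gradient of the error wrt W
    grad_b = np.mean(v_hat - v)
    W -= alpha * grad_W                    # update step, Eq. (2.9)
    b -= alpha * grad_b

# Training accuracy on the toy task after convergence.
acc = np.mean((sigmoid(U @ W + b) > 0.5) == (v > 0.5))
```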
2.2 Generative Models
The main target of the generative models is to produce a sequence of data that has the same distribution as the target user input. To achieve this goal, three generative models were selected in this thesis: a simple generative model based on MMSE, a kernel density estimation generator, and a GAN.
2.2.1 KDE Generator
Kernel density estimation (KDE) is a non-parametric way to estimate the prob- ability density function of a random variable.
Let $\Omega = \{x_i,\ i = 1, \dots, N\}$ be a univariate independent and identically distributed sample drawn from some distribution with an unknown density $f(\cdot)$. The goal of kernel density estimation is to estimate the shape of this function $f(\cdot)$ using some basic distribution functions. The kernel density estimator is

\[
\hat{f}(x; h) = \frac{1}{N} \sum_{i=1}^{N} K(x - x_i; h), \tag{2.10}
\]

where $h$ is the bandwidth, a smoothing parameter controlling the trade-off between bias and variance in the result, and $K(x; h)$ is the kernel function, a well-known density function such as the Gaussian, Tophat, or Exponential.
Figure 2.5: An example of the distribution estimation using KDE with differ- ent bandwidths.
Figure 2.6: The shape of possible kernel functions of KDE.
Fig 2.5 shows the influence of bandwidth and Fig 2.6 shows the shape of
possible kernel functions.
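A minimal sketch of using Eq. (2.10) as a generator, assuming scikit-learn's `KernelDensity`: fit the estimator on recorded samples, then draw new "user behavior" samples from the estimated density. The data, bandwidth, and sample counts are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Synthetic stand-in for recorded user-input features (1-D).
recorded = rng.normal(0.0, 1.0, size=(500, 1))

# Gaussian kernel K and bandwidth h from Eq. (2.10).
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(recorded)

# The generator step: draw new samples from the estimated density f_hat.
fake = kde.sample(100, random_state=0)
```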
2.2.2 GAN
The generative adversarial network is a framework for estimating a probability distribution via an adversarial process. As shown in Fig 2.7, it contains two main parts, a generator (G) and a discriminator (D), which are trained simultaneously. Both G and D are neural networks. The purpose of G is to capture the data distribution, while the purpose of D is to estimate the probability that a sample came from the training data rather than from G.
Figure 2.7: The structure of GAN.
To learn the generator's distribution over the data $X$, a fixed-length noise variable $Z$ is used as input. The discriminator is trained to maximize the probability of assigning the correct label both to generated samples and to samples from the dataset, while the generator is trained to minimize the probability of the discriminator correctly labeling generated samples. In general, the framework is trained by playing a mini-max game with value function $V(G, D)$:

\[
\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_x}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]. \tag{2.11}
\]
In practice, Equation 2.11 may not provide enough gradient for the generator to learn well. At the early stage of training, when G is poor, D can reject samples from G with high confidence because they are obviously different from samples from the training dataset. In this case, $\log(1 - D(G(z)))$ saturates. Instead of training G to minimize $\log(1 - D(G(z)))$, another solution is to train G to maximize $\log D(G(z))$. Therefore, the value function $V(G, D)$ is changed as follows:
Figure 2.8: An example of the relationship among EER, FAR and FRR.
\[
\min_G \max_D
\]
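The saturation argument above can be checked numerically. The NumPy sketch below (an illustration, not the thesis's code) evaluates the discriminator loss implied by Eq. (2.11) together with the saturating and non-saturating generator losses, and compares their slopes when $D(G(z))$ is close to zero, i.e. when D confidently rejects the fakes.

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator maximizes E[log D(x)] + E[log(1 - D(G(z)))] (Eq. 2.11);
    # written here as a loss to be minimized.
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def g_loss_saturating(d_fake):
    # The generator term G minimizes in Eq. (2.11).
    return np.mean(np.log(1.0 - d_fake))

def g_loss_nonsaturating(d_fake):
    # The alternative objective: maximize log D(G(z)).
    return -np.mean(np.log(d_fake))

# Slopes (finite differences) at D(G(z)) ~ 0, where D rejects confidently:
d0, eps = 1e-3, 1e-6
grad_sat = (g_loss_saturating(np.array([d0 + eps]))
            - g_loss_saturating(np.array([d0]))) / eps
grad_nonsat = (g_loss_nonsaturating(np.array([d0 + eps]))
               - g_loss_nonsaturating(np.array([d0]))) / eps
# |grad_nonsat| is far larger than |grad_sat| here, which is why the
# non-saturating objective gives G a usable learning signal early on.
```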