Examiner: György Dán
School of Electrical Engineering and Computer Science Host company: KTH
Swedish title: Tillämpningen av generative adversarial networks för
attacker mot kontinuerlig autentisering
Abstract
Cybersecurity has been a hot topic over the past decades, with many approaches proposed to secure our private information. One of the emerging approaches in security is continuous authentication, in which the computer system authenticates the user by monitoring the user's behavior during the login session. Although research on continuous authentication has made significant progress, the security of state-of-the-art continuous authentication systems is far from perfect. In this thesis, we explore the ability of classifiers used in continuous authentication and examine whether they can be bypassed by samples of user behavior produced by generative models.
In our work, we considered four machine learning classifiers as the continuous authentication system: one-class support vector machine, support vector machine, Gaussian mixture model, and an artificial neural network. Furthermore, we considered three generative models used to mimic the user behavior: generative adversarial network, kernel density estimation generator, and MMSE-based generator. The considered classifiers and generative models were tested on two continuous authentication datasets. The results show that generative adversarial networks achieved superior results, with more than 50% of the generated samples passing continuous authentication.
Sammanfattning
Cybersecurity has been a hot topic in recent decades, with many approaches created to secure our private information. One of the emerging approaches in security is continuous authentication, where the computer system authenticates the user by monitoring their behavior during the login session. Although research on continuous authentication has made significant progress, the security of state-of-the-art continuous authentication systems is far from perfect. In this thesis, we examine the ability of classifiers used in continuous authentication and investigate whether they can be fooled with the help of generative models. In our work, we used four machine learning classifiers as the continuous authentication system: one-class support vector machine, support vector machine, Gaussian mixture model, and an artificial neural network. Furthermore, we considered three generative models used to mimic the user's behavior: generative adversarial network, kernel density estimation generator, and MMSE-based generator. The considered classifiers and generative models were tested on two continuous authentication datasets. The results show that generative adversarial networks achieved superior results, with more than 50% of the generated samples passing continuous authentication.
1 Introduction
1.1 Motivation
1.2 Scope of the Thesis
1.3 Research Question
1.4 Ethics
1.5 Outline
2 Background
2.1 Classifiers
2.1.1 Support Vector Machines
2.1.2 Gaussian Mixture Model
2.1.3 Neural Network
2.2 Generative Models
2.2.1 KDE Generator
2.2.2 GAN
2.3 Evaluation Metrics
3 Related Work
3.1 Multi Factor Authentication
3.1.1 Biometric Authentication
3.2 Continuous Authentication
3.2.1 CA for PCs
3.2.2 CA for Mobile Devices
3.2.3 CA for Drivers
3.3 Application of GAN in Cybersecurity
3.3.1 PassGAN
3.3.2 MalGAN
3.3.3 IDSGAN
3.3.4 GAN in Continuous Authentication
4 Methods
4.1 Datasets
4.1.1 CMU Keystroke Dataset
4.1.2 Touchalytics Dataset
4.2 Attack Model against CA
4.2.1 Generative Models
4.2.2 Classification Models
4.3 Training Procedure
5 Results
5.1 Model Evaluation
5.2 Performance of Classifiers
5.3 Performance of Generative Models
5.3.1 Averaged Generative Ability of Generative Models
5.3.2 Effects of the Amount of Training Data
5.3.3 Execution Time
6 Conclusions
6.1 Future Work
Bibliography
1 Introduction
1.1 Motivation
User authentication is an extremely important part of computer and network system security. Currently, password-based authentication is still one of the most popular approaches, and it only authenticates the user at the initial login stage. Because attackers can apply brute-force and dictionary password searches, which have proven to be very efficient [1], such one-time authentication is easily broken when users choose relatively simple passwords or reuse the same password for multiple accounts.
There are many ways to enhance the information security of private and corporate devices compared to one-time password authentication. One such approach is multi-factor authentication (MFA), which authenticates a user using multiple means, each called an authentication factor. Authentication factors can be biometric information such as fingerprints, faces, or signatures, or one-time generated passwords such as a PIN code. However, both password authentication and MFA share the same security flaw: a user is only authenticated at the initial login.
To ensure reliable user authentication during the entire active login period, a continuous authentication scheme is required [2]. The emerging concept of continuous authentication (CA) is based on continuously computing how certain it is that the device is logged in by a legitimate user.
Recent research on continuous authentication is based on behavioral information, where the recorded user input is used to verify the legitimate user [3, 4, 5, 6]. The recorded user input typically used for authentication can be keystroke patterns on a keyboard, patterns of pointing devices such as mice and touchscreens, transaction characteristics, or several other indicators of user behavior. To model this behavior with mathematical models, many machine learning algorithms such as support vector machines (SVMs), the Gaussian mixture model (GMM), or K-nearest neighbors (KNN) have been applied to continuous authentication. A classifier based on these algorithms is trained to distinguish the user input of different users.
Even though research on how to build a CA system has made significant progress, the security of state-of-the-art CA systems is far from perfect. Consider a system in which keystroke patterns are recorded for continuous authentication. If attackers could model the probability distribution of the recorded keystroke patterns, the CA system could easily be bypassed. For mobile devices, the same method can be applied to typing and movement patterns. This brings up the question of how to generate similar keystroke patterns and pointing-device movement patterns. The fast-developing field of generative models, and especially the invention of generative adversarial nets (GAN) [7], which achieved superior performance in image generation and data augmentation tasks, can offer a promising solution to this problem.
1.2 Scope of the Thesis
The desired outcome of the project is to propose an attacking framework that can bypass a CA system after learning the distribution of the recorded user input data. Bypassing CA is done by generating similar data, feeding it into the CA classifiers, and checking how many attempts from an unauthorized user are incorrectly accepted. The core of the framework is the GAN, which has proved powerful at estimating data distributions via an adversarial process. Sample generation is a learning process: generative models minimize the difference between fake user behavior data and authorized user behavior data. Hypothetically, the classifiers are expected to wrongly classify the generated samples as legitimate attempts more often than they do when recorded data from other users is used as unauthorized attempts.
The evaluation standard is to compare the rate of mistakenly accepted attempts before and after using the GAN. One goal of the project is that the attacking framework should be compatible with different devices; therefore, we used several datasets corresponding to different typical user input. Because the shape of the user input data differs, one challenge of this project is to adapt the structure of the GAN to different user input.
1.3 Research Question
The main question this thesis aims to answer is: to what extent, and with how much training data, can the output of a generative adversarial network be used to bypass continuous authentication based on user behavioral information?
1.4 Ethics
The damage caused by cybercrime, including theft of mobile devices and digital information, has led to severe economic losses. According to the Symantec 2019 Internet Security Threat Report [8], the situation is getting worse, with a 56% increase in web attacks and a 33% increase in mobile ransomware. To reduce the damage and build a more reliable security system, it is necessary to find existing vulnerabilities and fix them. The purpose of generating adversarial attacks to bypass a CA system is to find such vulnerabilities. It is important to note that our work cannot improve the performance of CA systems by directly fixing the existing vulnerabilities. To prevent malicious usage, our work is not directly implementable in the real world.
1.5 Outline
The rest of the thesis is organized as follows: Chapter 2 gives the background needed on continuous authentication, generative adversarial nets, and evaluation metrics; Chapter 3 presents the related work; Chapter 4 explains the method used for testing the hypothesis of this thesis; Chapter 5 presents the results and analyzes them. Finally, Chapter 6 concludes the thesis by recapitulating the results and proposing future work.
2 Background
In this chapter, we first discuss the basic theory of the machine learning classifiers that we use in this thesis for continuous authentication. We then discuss the basic theory of the generative models used for bypassing continuous authentication. Finally, we discuss the metrics used to evaluate the performance of the generative models.
2.1 Classifiers
Many machine learning algorithms have been applied to the problem of building a continuous authentication system, including support vector machines (SVMs), the Gaussian mixture model (GMM), and neural networks. In the following, we provide a short description of each of these algorithms.
2.1.1 Support Vector Machines
Support vector machines (SVMs) are machine learning models aimed at analyzing data for classification and regression. SVMs were introduced in the early 1990s by Vapnik [9]. The fundamental theory behind SVMs is statistical learning: by minimizing the empirical risk and the confidence interval of the learning machine, good generalization capability can be achieved [10].
Consider a sample space $\Omega = \{(u_i, v_i),\ i = 1, \dots, N\}$, where the $u_i$ represent input vectors and the $v_i$ represent the linked targets. The linear binary classification problem can be described as finding an optimal hyperplane $w^T u + b = 0$ that separates the $u_i$ based on $v_i \in \{0, 1\}$. The number of values that $v$ can take depends on the number of classes/groups that need separation. Fig 2.1 gives an example of how to separate two groups with a linear hyperplane.
Figure 2.1: An example of SVM using linear kernel.
By calculating the direct distance of each point to the hyperplane and minimizing the sum of these distances, the problem can be turned into a quadratic program:

\[
\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i, \tag{2.1}
\]

with the constraint condition

\[
\xi_i = \max\bigl(0,\ 1 - v_i(w \cdot \phi(u_i) + b)\bigr), \quad \xi_i \ge 0, \quad i = 1, \dots, N,
\]

where $w$ is the hyperplane's weight vector; $b \in \mathbb{R}$ is a bias term; $\phi(\cdot)$ is a function mapping the training samples $u_i$ into a high-dimensional space; $C > 0$ is the penalty factor for misclassified samples; and $\xi_i$ is the hinge loss.
Then, we get the dual problem:

\[
\max_a Q(a) = \sum_{i=1}^{N} a_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} a_i a_j v_i v_j \, \phi(u_i) \cdot \phi(u_j),
\]
\[
\text{subject to} \quad \sum_{i=1}^{N} a_i v_i = 0, \quad 0 \le a_i \le C, \quad i = 1, 2, \dots, N. \tag{2.2}
\]
The classification can now be described as:

\[
v = \mathrm{sgn}(w^T \phi(u) + b) = \mathrm{sgn}\Bigl(\sum_{i=1}^{N} a_i v_i \bigl(\phi(u) \cdot \phi(u_i)\bigr) + b\Bigr), \tag{2.3}
\]

where $K(u, u_i) = \phi(u) \cdot \phi(u_i)$ is called the kernel function. Different SVM algorithms apply different kernel functions, and the shape of the hyperplane depends on the kernel function. Fig 2.2 gives an example of an SVM with a non-linear kernel function.
Figure 2.2: An example of SVM using non-linear kernel.
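As a concrete sketch of the kernel classifier in Eq. (2.3), the code below fits an RBF-kernel SVM to toy two-class data (an inner cluster surrounded by a ring, which no linear hyperplane can separate). The dataset, the choice of scikit-learn, and the parameter values are illustrative assumptions, not taken from the thesis.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Class 0: points near the origin; class 1: points on a surrounding ring.
inner = rng.normal(0.0, 0.3, size=(50, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, 50)
outer = np.c_[2.0 * np.cos(angles), 2.0 * np.sin(angles)] + rng.normal(0.0, 0.1, (50, 2))
U = np.vstack([inner, outer])
v = np.array([0] * 50 + [1] * 50)

# C is the penalty factor from Eq. (2.1); the RBF kernel plays the role
# of K(u, u_i) in Eq. (2.3).
clf = SVC(kernel="rbf", C=1.0).fit(U, v)

# A point at the origin and a point on the ring should fall on opposite
# sides of the learned non-linear decision boundary.
pred = clf.predict([[0.0, 0.0], [2.0, 0.0]])
```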
Normally, an SVM separates samples from two or more classes. But when we only want to know whether test data belongs to the same class as the training data, the one-class SVM is a good solution. Its quadratic program is slightly different from Equation 2.1:
\[
\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i - \rho, \tag{2.4}
\]

with the constraint condition

\[
\xi_i = \max\bigl(0,\ \rho - w \cdot \phi(u_i)\bigr), \quad \xi_i \ge 0, \quad i = 1, \dots, N,
\]

where the constant $\rho$ is the threshold. Then the classification can be described as:

\[
v = \mathrm{sgn}(w^T \phi(u) - \rho) = \mathrm{sgn}\Bigl(\sum_{i=1}^{N} a_i K(u, u_i) - \rho\Bigr). \tag{2.5}
\]
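A minimal sketch of Eqs. (2.4)–(2.5) in the CA setting, assuming scikit-learn's `OneClassSVM` (which parameterizes the problem with ν rather than the penalty C above). The synthetic data stand in for a legitimate user's behavioral features; none of the values come from the thesis.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for recorded behaviour of the legitimate user.
legit = rng.normal(0.0, 0.5, size=(200, 2))

# nu bounds the fraction of training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(legit)

# predict() returns +1 (accept as the legitimate user) or -1 (reject):
# a point inside the learned region vs. one far outside it.
decisions = ocsvm.predict(np.array([[0.1, -0.2], [5.0, 5.0]]))
```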
2.1.2 Gaussian Mixture Model
The Gaussian mixture model (GMM) is a density estimator and a commonly used type of classifier for clustering. In a GMM, the D-dimensional feature vector is modeled using a linear combination of M Gaussians. For object clustering, each object is represented by a GMM and is referred to by its model $\lambda$, collectively represented by the notation:

\[
\lambda = \sum_{i=1}^{M} \rho_i \cdot \mathcal{N}(x \mid \mu_i, \Sigma_i), \tag{2.6}
\]

where the $\mu_i$ are mean vectors, the $\Sigma_i$ are covariance matrices, and the $\rho_i$ are mixture weights. Fig 2.3 shows a simple example of clustering 1-dimensional features using a GMM.
Figure 2.3: An example of clustering using GMM.
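The mixture of Eq. (2.6) can be sketched with scikit-learn's `GaussianMixture` on toy 1-dimensional data, in the spirit of Fig 2.3. The data, the number of components, and the cluster locations are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two well-separated 1-D clusters centred at -3 and +3.
x = np.concatenate([rng.normal(-3.0, 0.5, 300),
                    rng.normal(3.0, 0.5, 300)]).reshape(-1, 1)

# M = 2 Gaussians; fit() estimates the mu_i, Sigma_i and rho_i of Eq. (2.6)
# via expectation-maximization.
gmm = GaussianMixture(n_components=2, random_state=0).fit(x)

means = np.sort(gmm.means_.ravel())      # estimated mu_i, sorted
weights = gmm.weights_                   # estimated rho_i, summing to 1
```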
2.1.3 Neural Network
A neural network is a computing system whose basic computing element is called a neuron. A collection of neurons operating together at a specific depth within the network is called a layer. Neural networks are widely used in machine learning and have achieved promising results in many areas, including image recognition, speech recognition, natural language processing, and cybersecurity tasks.
Consider a sample space $\Omega = \{(u_i, v_i),\ i = 1, \dots, N\}$, where the $u_i$ represent input vectors and the $v_i$ represent the desired targets. After an input vector is fed into a one-layer neural network, the output is:

\[
\hat{v}_i = \sigma(W u_i + b), \tag{2.7}
\]

where $\hat{v}_i$ is the output, $\sigma(\cdot)$ is called the activation function, $W$ are the weights of the neural network, and $b$ is a bias term. The error between the output $\hat{v}_i$ and the desired target $v_i$ is then defined as:

\[
E(\Omega, W) = \hat{v}_i - v_i. \tag{2.8}
\]

The widely used algorithm for training a neural network to find the best $W$ is gradient descent:

\[
W^{T+1} = W^{T} - \alpha \, \frac{\partial E(\Omega, W^{T})}{\partial W}, \tag{2.9}
\]

where $T$ is the number of training iterations (epochs) and $\alpha$ is a constant called the learning rate. Fig 2.4 gives a simple example of a neural network structure.
Figure 2.4: An example structure of a neural network.
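Equations (2.7)–(2.9) can be sketched end-to-end as a one-layer network trained by gradient descent. This toy NumPy version uses a sigmoid activation, a cross-entropy-style gradient averaged over the samples, and a linearly separable labeling rule, all of which are illustrative choices rather than the thesis's implementation.

```python
import numpy as np

def sigmoid(z):
    # Activation function sigma(.) from Eq. (2.7).
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy task: target v = 1 when the sum of the two inputs is positive.
U = rng.normal(size=(200, 2))
v = (U.sum(axis=1) > 0).astype(float)

W = np.zeros(2)
b = 0.0
alpha = 0.5                                # learning rate alpha, Eq. (2.9)

for _ in range(500):                       # T training epochs
    v_hat = sigmoid(U @ W + b)             # forward pass, Eq. (2.7)
    grad_W = U.T @ (v_hat - v) / len(v)    # gradient of the error wrt W
    grad_b = np.mean(v_hat - v)
    W -= alpha * grad_W                    # update step, Eq. (2.9)
    b -= alpha * grad_b

# Training accuracy on the toy task after convergence.
acc = np.mean((sigmoid(U @ W + b) > 0.5) == (v > 0.5))
```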
2.2 Generative Models
The main target of the generative models is to produce a sequence of data that has the same distribution as the target user input. To achieve this goal, three generative models were selected in this thesis: a simple generative model based on MMSE, a kernel density estimation generator, and a GAN.
2.2.1 KDE Generator
Kernel density estimation (KDE) is a non-parametric way to estimate the prob- ability density function of a random variable.
Let $\Omega = \{x_i,\ i = 1, \dots, N\}$ be a univariate independent and identically distributed sample drawn from some distribution with an unknown density $f(\cdot)$. The goal of kernel density estimation is to estimate the shape of this function $f(\cdot)$ using some basic distribution functions. The kernel density estimator is

\[
\hat{f}(x; h) = \frac{1}{N} \sum_{i=1}^{N} K(x - x_i; h), \tag{2.10}
\]

where $h$ is the bandwidth, a smoothing parameter controlling the trade-off between bias and variance in the result, and $K(x; h)$ is the kernel function, a well-known density function such as the Gaussian, Tophat, or Exponential.
Figure 2.5: An example of the distribution estimation using KDE with differ- ent bandwidths.
Figure 2.6: The shape of possible kernel functions of KDE.
Fig 2.5 shows the influence of bandwidth and Fig 2.6 shows the shape of
possible kernel functions.
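A minimal sketch of using Eq. (2.10) as a generator, assuming scikit-learn's `KernelDensity`: fit the estimator on recorded samples, then draw new "user behavior" samples from the estimated density. The data, bandwidth, and sample counts are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Synthetic stand-in for recorded user-input features (1-D).
recorded = rng.normal(0.0, 1.0, size=(500, 1))

# Gaussian kernel K and bandwidth h from Eq. (2.10).
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(recorded)

# The generator step: draw new samples from the estimated density f_hat.
fake = kde.sample(100, random_state=0)
```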
2.2.2 GAN
The generative adversarial network is a framework for estimating a probability distribution via an adversarial process. As shown in Fig 2.7, it contains two main parts, a generator (G) and a discriminator (D), which are trained simultaneously. Both G and D are neural networks. The purpose of G is to capture the data distribution, while the purpose of D is to estimate the probability that a sample came from the training data rather than from G.
Figure 2.7: The structure of GAN.
To learn the generator's distribution over the data $X$, a fixed-length noise variable $Z$ is used as input. The discriminator is trained to maximize the probability of assigning the correct label both to generated samples and to samples from the dataset, while the generator is trained to minimize the probability of the discriminator correctly labeling generated samples. In general, the framework is trained by playing a mini-max game with value function $V(G, D)$:

\[
\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_x}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]. \tag{2.11}
\]
In practice, Equation 2.11 may not provide enough gradient for the generator to learn well. At the early stage of training, when G is poor, D can reject samples from G with high confidence because they are obviously different from samples from the training dataset. In this case, $\log(1 - D(G(z)))$ saturates. Instead of training G to minimize $\log(1 - D(G(z)))$, another solution is to train G to maximize $\log D(G(z))$. Therefore, the value function $V(G, D)$ is changed as follows:
Figure 2.8: An example of the relationship among EER, FAR and FRR.
\[
\min_G \max_D
\]
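The saturation argument above can be checked numerically. The NumPy sketch below (an illustration, not the thesis's code) evaluates the discriminator loss implied by Eq. (2.11) together with the saturating and non-saturating generator losses, and compares their slopes when $D(G(z))$ is close to zero, i.e. when D confidently rejects the fakes.

```python
import numpy as np

def d_loss(d_real, d_fake):
    # Discriminator maximizes E[log D(x)] + E[log(1 - D(G(z)))] (Eq. 2.11);
    # written here as a loss to be minimized.
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def g_loss_saturating(d_fake):
    # The generator term G minimizes in Eq. (2.11).
    return np.mean(np.log(1.0 - d_fake))

def g_loss_nonsaturating(d_fake):
    # The alternative objective: maximize log D(G(z)).
    return -np.mean(np.log(d_fake))

# Slopes (finite differences) at D(G(z)) ~ 0, where D rejects confidently:
d0, eps = 1e-3, 1e-6
grad_sat = (g_loss_saturating(np.array([d0 + eps]))
            - g_loss_saturating(np.array([d0]))) / eps
grad_nonsat = (g_loss_nonsaturating(np.array([d0 + eps]))
               - g_loss_nonsaturating(np.array([d0]))) / eps
# |grad_nonsat| is far larger than |grad_sat| here, which is why the
# non-saturating objective gives G a usable learning signal early on.
```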