
MÄLARDALEN UNIVERSITY
SCHOOL OF INNOVATION, DESIGN AND ENGINEERING
VÄSTERÅS, SWEDEN

Thesis for the Degree of Bachelor of Science (15 credits) in Computer Science | DVA331

ANOMALY DETECTION IN NETWORKS USING AUTOENCODER AND UNSUPERVISED LEARNING METHODS

Simon Azmoudeh Fard

Sad17001@student.mdh.se

Examiner: Sasikumar Punnekkat, Mälardalen University, Västerås, Sweden

Supervisor: Miguel León Ortiz, Mälardalen University, Västerås, Sweden

Co-Supervisor: Ning Xiong, Mälardalen University, Västerås, Sweden


Abstract

The increasing popularity of networking devices at workplaces has led to an exponential increase in the frequency of network attacks, making the protection of networks more and more important. Because of this increase in network activity, workplaces have started to leave anomaly detection in the hands of artificial intelligence. However, the current methods of detecting anomalies cannot accurately detect all of them.

In this thesis, I propose a training method for autoencoders that shows how k-Means Clustering can be combined with an autoencoder for feature extraction through the use of differential evolution. The features extracted from this autoencoder are then used to classify the network activity of the KDD-99 dataset, so that accuracies and false-positive rates can be compared with other anomaly detection methods. The results of this thesis show that it is possible to combine k-Means Clustering with autoencoders through differential evolution. However, the proposed training method leads to a decrease in the accuracy of the classifiers. The classifiers reached around 19% accuracy when using features extracted from the autoencoder trained with my proposed method, as opposed to around 94% accuracy when using features extracted from an autoencoder that is not combined with k-Means Clustering. However, this is only preliminary research, and as such the results of this thesis should not be used for any real anomaly detection systems.


Acknowledgements

I would like to thank my supervisor, Miguel León, for his continuous support throughout the writing of this thesis.

I would also like to thank my friends, family, and most importantly my partner for helping motivate me to finish school and pushing me to improve myself to become the best that I can be.


Table of Contents

1. Introduction
2. Background
   2.1. Machine Learning
      2.1.1. Unsupervised Learning
      2.1.2. Supervised Learning
3. Related Work
4. Problem Formulation
5. Method
6. Implementation and Experiments
   6.1. Data preprocessing
   6.2. Feature Extraction
   6.3. Classification
   6.4. Experiment Environment
      6.4.1. Dataset
      6.4.2. Software
      6.4.3. Hardware
7. Results
8. Discussion
9. Conclusions
10. Future Work
References


List of Figures

Figure 1. Typical architecture of an autoencoder. From [15].
Figure 2. Shows the process of k-Means Clustering. From [17].
Figure 3. Shows a possible structure of a Decision Tree when using a Titanic Survivor dataset. From [22].
Figure 4. Shows how a Random Forest classifier classifies a data point. From [24].
Figure 5. Shows how one-hot encoding splits up a feature into several features to avoid having non-numeric values. From [28].
Figure 6. Shows the implementation of a Decision Tree classifier using scikit-learn.
Figure 7. The parameters used when creating the k-Nearest Neighbors classifier.
Figure 8. Shows how the fitnesses of the different autoencoders improved over time. The Y-axis shows the fitness value and the X-axis shows the generation.


List of Tables

Table 1. The results when using the classifiers with (2) or (5) extracted features from the autoencoder trained with the scheme DE/pBest/1 without k-Means Clustering.
Table 2. The results when using the classifiers with (2) or (5) extracted features from the autoencoder trained with the scheme DE/rand/1 without k-Means Clustering.
Table 3. The results when using the classifiers with (2) or (5) extracted features from the autoencoder trained with the scheme DE/pBest/1 with k-Means Clustering.
Table 4. The results when using the classifiers with (2) or (5) extracted features from the autoencoder trained with the scheme DE/rand/1 with k-Means Clustering.


1. Introduction

Today, many networking devices used by workplaces handle sensitive data [1]. It is crucial for every workplace to keep sensitive data stored as safely as possible to avoid having the data stolen or corrupted. With the exponential growth of networking devices and cloud-based services, workplaces have seen an exponential increase in the frequency of attacks on their networks, which makes the security of these networks ever more important [2], [3]. A network that gets accessed by an unauthorised or malicious user can experience severe disruptions.

To combat anomalies, such as unauthorised or malicious accesses to networks and faults in components or software, Intrusion Detection Systems (IDS) are used [2], [3]. There are different approaches to an IDS, two of which are misuse detection (or signature-based detection) and anomaly detection (or specification-based detection).

The misuse detection approach is the more traditional one [2], [3]. It uses the patterns of previous attacks to detect intrusions in the network [5]. The effectiveness of a misuse detection IDS is based solely on the available patterns of previous attacks, which means that the IDS is unable to detect new, or zero-day, attacks on the network [4], [5]. The efficiency of the IDS may also degrade as the database containing all the previous attack patterns grows [2], [3].

An IDS that uses the anomaly detection approach works by establishing what the normal behaviour of a system is [4]. Any behaviour that deviates from the normal behaviour gets classified as an anomaly. The anomaly detection approach can therefore detect anomalies without any prior knowledge of them. This is an important feature of anomaly detection in networks, since new ways to attack a network and new attack variations are constantly being developed [5]. However, the anomaly detection approach has a high false alarm rate [6], mainly because legitimate activity that the IDS has not yet seen gets classified as an intrusion.

Several Artificial Intelligence (AI) and Machine Learning (ML) methods have been used in anomaly detection over the years, e.g., neighbour-based methods, such as k-Means Clustering and k-Nearest Neighbors, and dimensionality reduction methods, such as autoencoders and Principal Component Analysis [7]. Neighbour-based methods check the number of neighbours of a data point, where few or no neighbours indicate anomalous data. Dimensionality reduction methods use a subspace that can describe the normal data; any data point with a high reconstruction error gets classified as an anomaly. All these anomaly detection methods are flawed in that they can never detect all the anomalies [1]-[8]. The datasets used in these papers were released around 20 years ago, e.g., KDD-99 [9], which means that they do not take newer attacks into consideration.

In this thesis, I propose and develop an anomaly detection method that uses both an autoencoder and k-Means Clustering to extract features of networking data. The extracted features will then be tested with some classifiers to determine if the proposed method leads to an increase in anomaly detection accuracy of these classifiers.

The results of my thesis show that it is indeed possible to combine an autoencoder with k-Means Clustering by training the autoencoder using differential evolution and having the k-Means Clustering be a part of the fitness function. However, the anomaly detection classifiers have poor accuracy when using features extracted from an autoencoder trained this way with the parameters described in this thesis. In section 2 of this thesis, I introduce the reader to the different methods and technologies used in anomaly detection. Related works are presented in section 3. Section 4 contains a description of the problem that this thesis focuses on. In section 5, I discuss how I will go about solving the problem. Section 6 describes the work I did, and the results from that work are shown in section 7. Section 8 contains a discussion of the work I have done and the results achieved from it. The conclusions of this thesis are in section 9, and section 10 contains suggestions for future work.


2. Background

In this section, the reader is introduced to the key concepts used in this thesis: autoencoders, k-Means Clustering, differential evolution, and classifiers.

2.1. Machine Learning

Machine Learning is a term often used in Artificial Intelligence that refers to machines learning and improving based on their own experiences [10]. As technology grows, there is an increasing demand for fast processing and analysis of data. Machine Learning is often used in the field of data analysis to predict the future behaviour of a system based on experience. Machine Learning consists of two phases. The first phase is the training phase, where a Machine Learning model is created based on previous data. The second phase is the testing phase, where that model is used to process new data.

There are several different machine learning methods each with their own unique training phase [10], two of them are unsupervised learning and supervised learning.

2.1.1. Unsupervised Learning

Unsupervised learning is one of the different techniques available for machine learning [11], [12]. A machine that learns using the unsupervised learning technique receives inputs $x_1, x_2, \ldots, x_n$ but no information about what the expected output should be. Therefore, the machine can only try to group these inputs based on their properties and the similarities between them [13]. Two examples of unsupervised learning methods are clustering and dimensionality reduction [12], [13].

2.1.1.1. Autoencoder

An autoencoder is an unsupervised artificial neural network [14]. Instead of learning based on the output and the label of the data, as in supervised learning, it learns by trying to recreate the input through a series of layers. An autoencoder consists of an encoder part and a decoder part, Fig. 1. These parts can consist of one or more hidden layers of decreasing or increasing size for the encoder and decoder, respectively. The layers consist of neurons that each take the output from the previous layer as input with different weights. The autoencoder starts off by compressing the input by reducing the input dimensions (encoding). After the encoding it reaches the bottleneck, where the data has the lowest number of dimensions. After the bottleneck, the autoencoder increases the dimensions until the original size is reached (decoding). The output after decoding is then compared to the original input, and an error rate is calculated based on the difference. This error rate gets backpropagated throughout the entire autoencoder: each layer uses the error rate to update the weights of all its neurons and then calculates a new error rate to send to the next layer.

Figure 1. Typical architecture of an autoencoder. From [15].
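The following is a minimal Keras sketch of this encoder/bottleneck/decoder structure. The layer sizes here are hypothetical placeholders; the architectures actually used in this thesis are given in section 6.4.2.

```python
# Minimal autoencoder sketch (hypothetical sizes; see section 6.4.2 for the
# architectures used in the experiments).
from keras import layers, models

input_dim = 120  # size of one preprocessed input vector (assumption)

inputs = layers.Input(shape=(input_dim,))
x = layers.Dense(30, activation="relu")(inputs)        # encoder
code = layers.Dense(5, activation="relu")(x)           # bottleneck: extracted features
x = layers.Dense(30, activation="relu")(code)          # decoder
outputs = layers.Dense(input_dim, activation="sigmoid")(x)

autoencoder = models.Model(inputs, outputs)
# Conventional training minimizes the reconstruction error via backpropagation.
autoencoder.compile(optimizer="adam", loss="mse")

# The encoder alone maps an input to its low-dimensional features.
encoder = models.Model(inputs, code)
```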


2.1.1.2. k-Means Clustering

k-Means Clustering is one of the more popular unsupervised learning methods [16]. The objective of k-Means Clustering is to group together data that are similar to one another. To do this, k-Means Clustering looks for k clusters in a given dataset, Fig. 2, hence the "k" in the name. The clusters are found by placing centroids in the data and assigning each data point to the closest centroid. After each data point has been assigned a centroid, the centroid moves to the center of its assigned data points. The process continues until the centroids do not move anymore. The "Means" in k-Means Clustering refers to this final center point of the clusters. After getting the final center points of the clusters, it is easy to predict which class a new data point should belong to by calculating which centroid is closest to it.

The following pseudo-code explains the process of k-Means Clustering [4]; a runnable sketch follows Fig. 2.

I. Select the number of clusters to use (k).

II. Select k random points to use as centroids.

III. For each data point, calculate the distance to each centroid using the Euclidean distance, and assign it to the closest one.

IV. Change the location of the centroids based on the mean value of the data points that are assigned to them.

V. Repeat from step III until the centroids do not change their position or the position changes less than a specified threshold.

Figure 2. Shows the process of k-Means Clustering. From [17].
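As a concrete illustration of the steps above, the following is a small NumPy sketch (illustrative only; it is not the thesis implementation and assumes no cluster ever becomes empty).

```python
# NumPy sketch of the k-Means pseudo-code above (illustrative).
import numpy as np

def k_means(data, k, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # II. Pick k random data points as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    while True:
        # III. Assign each point to its closest centroid (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # IV. Move each centroid to the mean of its assigned points
        # (assumes no cluster becomes empty).
        new_centroids = np.array([data[labels == j].mean(axis=0) for j in range(k)])
        # V. Stop when the centroids move less than the threshold.
        if np.linalg.norm(new_centroids - centroids) < tol:
            return new_centroids, labels
        centroids = new_centroids
```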

2.1.1.3. Differential Evolution

Differential evolution is an evolutionary algorithm that, as the name suggests, mimics evolution to solve optimization problems through mutation, recombination, and selection [18]. Differential evolution starts off by creating a starting population where each individual gets a randomised solution to the problem, which is then optimized through mutation, recombination, and selection. This is done until a certain ending criterion has been met, such as a maximum number of generations or a good enough solution. Differential evolution contains the following steps: initializing the starting population, mutation, crossover, and selection.

2.1.1.3.1. Initializing starting population

As stated previously, differential evolution starts solving a problem by first initializing a starting population, creating individuals, which are D-dimensional vectors, with randomised values within a certain boundary, $b_{\min}$ and $b_{\max}$ [18].


2.1.1.3.2. Mutation

In the mutation step, a donor vector is created for each individual in the population based on the mutation scheme used [19]. Two examples of mutation schemes are DE/pBest/1 and DE/rand/1 [18]. DE/pBest/1 works by randomly choosing one of the pBest individuals, where pBest is the top p% best individuals in the population, and performing vector operations on it, Eq. (1) [18].

$$v_j = x_{pBest} + F(x_{r1} - x_{r2}) \qquad (1)$$

where $v_j$ represents the donor vector of individual $j$, $x_{pBest}$ is the randomly chosen best individual, $x_{r1}$ and $x_{r2}$ are two random unique individuals that are not $x_{pBest}$, and $F$ is the mutation factor.

DE/rand/1 is similar to DE/pBest/1, but instead of choosing one of the best individuals it takes a random one, Eq. (2) [18].

$$v_j = x_{r1} + F(x_{r2} - x_{r3}) \qquad (2)$$

2.1.1.3.3. Crossover

During the crossover step, a new vector is created that takes values from the old vector and the donor vector [18], [19]. The user first selects a crossover rate (CR) between 0 and 1, and a random index of the vectors. Then, for each index of the two vectors, a number between 0 and 1 is randomised; if it is less than the crossover rate, or if the index is the chosen random index, the value is copied from the donor vector to the new vector; otherwise, the value is copied from the old vector.

2.1.1.3.4. Selection

In the selection step, the old vector gets replaced by the new vector if the new vector gets a better fitness value during evaluation [19]. The fitness function that evaluates the vector depends on the problem being solved.
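Putting the three steps together, one DE generation can be sketched as follows (illustrative; it uses the DE/rand/1 scheme of Eq. (2) with binomial crossover, and assumes a fitness function that is being minimized, which matches the fitness used later in this thesis).

```python
# One generation of differential evolution: DE/rand/1 mutation, binomial
# crossover, and greedy selection (illustrative sketch, minimization).
import numpy as np

def de_generation(pop, fitness, F=0.8, CR=0.9, rng=None):
    rng = rng or np.random.default_rng(0)
    n, d = pop.shape
    scores = np.array([fitness(x) for x in pop])
    for j in range(n):
        # Mutation, Eq. (2): v = x_r1 + F * (x_r2 - x_r3), r1, r2, r3 distinct.
        r1, r2, r3 = rng.choice([i for i in range(n) if i != j], size=3, replace=False)
        donor = pop[r1] + F * (pop[r2] - pop[r3])
        # Crossover: take the donor value with probability CR; one randomly
        # chosen index always comes from the donor.
        mask = rng.random(d) < CR
        mask[rng.integers(d)] = True
        trial = np.where(mask, donor, pop[j])
        # Selection: keep the trial vector if its fitness is better (lower).
        trial_score = fitness(trial)
        if trial_score < scores[j]:
            pop[j], scores[j] = trial, trial_score
    return pop, scores
```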

2.1.2. Supervised Learning

Supervised learning is similar to unsupervised learning, but instead of providing no information about the expected output of an input, we provide what we expect the output to be. This way the machine learns the similarities of the inputs based on the expected outputs. Supervised learning is used for algorithms such as classifiers [20].

2.1.2.1. Classifiers

Classifiers are algorithms used to categorize data into classes [21]. This is done by using the values of certain features of the data to determine the class that the data belongs to. They can be used to categorize several things, e.g., health data (to tell if a person is sick or healthy), network data (to tell if a connection is anomalous or normal), and email data (to tell if an email is spam or not).

2.1.2.1.1. Decision Tree

Decision Tree is a type of classifier that creates a tree of conditions and leaves based on the features available to it [22]. Consider a dataset of the passengers on the Titanic, where the features might be their age, sex, and the number of spouses/children with them, and the decision tree is supposed to classify whether a passenger survived. It might create a condition that checks if the passenger was a woman, another condition that checks if the passenger was a child, and a third condition that checks the number of spouses/children that were with them on the ship. Through these conditions, a leaf node is reached that classifies that person as having survived or died, Fig. 3 (a toy example in code follows the figure).


Figure 3. Shows a possible structure of a Decision Tree when using a Titanic Survivor dataset. From [22].
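The scikit-learn interface for this is compact; the following toy sketch mirrors the Titanic example above (the data is made up for illustration).

```python
# Toy Decision Tree in the spirit of the Titanic example above
# (hypothetical data, not from the thesis).
from sklearn.tree import DecisionTreeClassifier

# Features per passenger: [is_female, age, spouses_children_aboard]
X = [[1, 29, 0], [0, 4, 1], [0, 40, 3], [1, 8, 2], [0, 35, 0]]
y = [1, 1, 0, 1, 0]  # 1 = survived, 0 = died

tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict([[0, 30, 1]]))  # predicted class for a new passenger
```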

2.1.2.1.2. K-Nearest Neighbors

k-Nearest Neighbors is an algorithm that assumes that similar data points exist close together [23]. When classifying a new data point, this algorithm checks the nearest neighbors: by calculating the distances to all neighbors and picking out the k closest ones, we can assume that the new data point belongs to the most common class among those k nearest neighbors. The following pseudocode explains the process (a Python sketch follows the list):

I. Load the training and test data.

II. Choose the value of k.

III. For each data point in the test data:

i. Calculate the Euclidean distance to all data points in the training data.

ii. Store the distances in a list and sort it.

iii. Take the first k data points in the list.

iv. Check which class is the most common among those k data points.

v. Assign that class to the data point from the test data.
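A direct translation of steps III.i-III.v (illustrative sketch, not the thesis code):

```python
# k-Nearest Neighbors prediction for a single test point (sketch).
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    # i-ii. Euclidean distances to all training points.
    dists = np.linalg.norm(np.asarray(train_X) - np.asarray(x), axis=1)
    # iii. Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # iv-v. Majority class among the k neighbors.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```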

2.1.2.1.3. Random Forest

A random forest classifier is an expanded decision tree classifier where several decision trees get created, each of them uncorrelated to the others [24]. The classifier works by going through all these decision trees, getting a class prediction from each one, and then classifying the data as the class with the highest prediction count, Fig. 4.

Figure 4. Shows how a Random Forest classifier classifies a data point. From [24].


3. Related Work

Anomaly detection using artificial intelligence is a popular research area in computer science, and the number of papers that research this topic appears to be boundless. In this section, I review several papers that are relevant to my research.

Chen et al. [7] proposed two autoencoder-based network anomaly detection methods, one using a conventional autoencoder and one using a convolutional autoencoder. By using dimensionality reduction, these methods could find anomalies if the reconstruction error exceeded a certain threshold $\theta$. They trained their autoencoder on a training set containing only normal behaviour, which made sure that the autoencoder would be optimized to reconstruct normal behaviour with a low reconstruction error. Since the autoencoder had been optimized for normal behaviour, any data that deviates from it would have a higher reconstruction error and could then be classified as an anomaly. The results from the research conducted by Chen et al. [7] show that using dimensionality reduction methods to find non-linear correlations between the features of the networking data improves the anomaly detection accuracy. When compared to other well-known anomaly detection methods, e.g., k-Nearest Neighbors and Support Vector Machines, the autoencoder-based methods outperformed them in both false-positive rate and accuracy.

Shah and Trivedi [25] proposed a feature reduction model for the KDD-99 dataset that uses information gain (IG) to reduce the number of features from 41 down to 22. The model then uses those features with a backpropagating neural network to classify the data and measure the performance. Shah and Trivedi compared the results with another model that does not reduce the number of features and found that their feature reduction model performs better than or comparably to the full-feature model. The most notable differences were the size of the dataset, which got reduced by 35%, and the time required to process 40 epochs of the neural network, which got reduced by 46%.

Yan [26] utilizes entropy theory to establish a feature vector, containing Source IP, Destination IP, Source Port, Destination Port, Packet Size, and Packet Type, for the dataset used in his paper, which was KDD-99. With the established feature vector, Yan classified the network traffic as either normal or anomalous with the help of a Support Vector Machine. Yan's proposed method of establishing a feature vector with the help of entropy theory, classified by a Support Vector Machine, managed to reach accuracies of about 89%. When using a ROC curve as the performance evaluation metric, his proposed method managed to beat other methods, such as Incremental LOF and HPStream.

Okada [27] evaluates the ability to use differential evolution to train the weights and biases of an autoencoder instead of training it using backpropagation of the error. Okada argues that differential evolution could be a better way to train an autoencoder, since evolutionary algorithms are population-based stochastic search algorithms, whereas backpropagation is a gradient-based single-point search algorithm. In his paper, Okada explains that there are two different ways differential evolution can be used to train an autoencoder: one is to optimize the weights and biases of the autoencoder, and the other is to also optimize the number of layers and the number of neurons in each layer. For his experiments, Okada compared the results of using differential evolution (DE) to those of using a genetic algorithm (GA), evolutionary strategy (ES), and particle swarm optimization (PSO). DE had the lowest error during the training phase with an error rate of 8.99%, which is better than GA (9.67%) and ES (9.07%) but worse than PSO (8.69%). During the testing phase, however, DE had an error rate of 11.43%, which was worse than PSO (11.18%) and ES (11.23%) but better than GA (12.63%).


4. Problem Formulation

There is a need to develop different anomaly detection methods. Previous work shows that not every anomaly gets detected using known methods [1]-[8]. Even if a method manages to accurately detect all anomalies, there will still be a need for continuous research on the topic, as attackers will always develop newer attacks. For this thesis, I investigated whether extracting features using an autoencoder together with k-Means Clustering is possible and whether it leads to an overall improvement in anomaly detection accuracy compared to only using an autoencoder. This leads to the following research questions:

• RQ1: How can an autoencoder be used together with k-Means Clustering to extract features for anomaly detection?

• RQ2: What classifier is the best at correctly classifying anomalous data using extracted features?


5. Method

In order to answer my research questions (RQ1 & RQ2), I used an experimental methodology consisting of the following steps:

• Literature review of existing anomaly detection classifiers.

• Implement feature extraction using autoencoder together with k-Means Clustering and feature extraction using only autoencoder.

• Use the implementation to test the anomaly detection classifiers by using a networking dataset with attacks.

• Analyze the results from the tests.

First, I performed a literature review on anomaly detection classifiers by analyzing related works. Previous work on anomaly detection using different methods was examined and used as a foundation for this thesis. The information gathered from these research papers includes the anomaly detection accuracy and the dataset and method used, which is necessary to make a proper comparison of my results with the results of the reviewed papers.

In the second step of the process, I implemented feature extraction that uses both an autoencoder and k-Means Clustering to extract the features that were used in the next step. For the feature extraction using only an autoencoder, the same autoencoder was used.

In the third step, I identified suitable anomaly detection classifiers and a suitable dataset that had anomalies present. The selected dataset had features extracted from it using the feature extractors of the previous step, which were then used by the classifiers to detect anomalies. Finally, I collected all the quantitative data from the tests, which were the anomaly detection accuracies and the false-positive rates of the different anomaly detection classifiers. With the quantitative data I could also see whether using an autoencoder together with k-Means Clustering for feature extraction led to a higher anomaly detection accuracy compared to using only an autoencoder.


6. Implementation and Experiments

6.1. Data preprocessing

The chosen datasets for my experiments were two subsets of the KDD-99 dataset [9], KDD-99 10% for training and KDD-99 corrected for testing, as KDD-99 is one of the most used datasets for network anomaly detection. This dataset has 41 features, 30 of which are continuous values. The dataset contains numeric values, both integers and floating points, but it also contains non-numeric values, which is why preprocessing was necessary. For the non-numeric values, one-hot encoding was used. One-hot encoding is the process of dividing a feature up into several features, based on the number of unique values of that feature, in order to only use numeric values. The new feature corresponding to the value listed in the old feature is the one that contains the value 1, Fig. 5. During preprocessing I also normalized all the numeric values to make sure every value falls within the same range.

Figure 5. This figure shows how one-hot encoding splits up a feature into several features to avoid having non-numeric values. From [28].
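This preprocessing can be sketched with pandas and scikit-learn as follows (the file path is an assumption; columns 1-3 of KDD-99 hold the symbolic protocol_type, service, and flag features).

```python
# Sketch of the preprocessing above: one-hot encode the symbolic KDD-99
# columns and normalize everything to [0, 1] (file path is an assumption).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("kddcup.data_10_percent", header=None)
labels = df.pop(df.columns[-1])             # last column holds the attack label
df = pd.get_dummies(df, columns=[1, 2, 3])  # protocol_type, service, flag
X = MinMaxScaler().fit_transform(df)        # scale every feature to [0, 1]
```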

6.2. Feature Extraction

The feature extractor used in my implementation is an autoencoder implemented using the Keras library for Python [29]. The autoencoder extracts features by reducing the dimensionality of the input. The training process of an autoencoder usually involves backpropagation of the error, but my implementation trains it with differential evolution instead. By using differential evolution, I was able to use k-Means Clustering to help with the training, which made it possible to answer my first research question (RQ1).

The values I used differential evolution to optimize were the weights of the autoencoder as well as the centroid values for the k-Means Clustering. To calculate the fitness of an agent, I extracted the weights from the agent and set the weights of the autoencoder to the extracted weights. With the new weights, the training set was run through the autoencoder again.

The fitness function used consists of two parts. The first part calculates the mean squared error of the autoencoder based on the difference between the input values and the output values. The second part calculates the average distance over the entire training set to the closest k-Means centroid, using the Euclidean distance on the extracted features of a given data point, Eq. (3).

$$d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_i - q_i)^2 + \cdots + (p_n - q_n)^2} \qquad (3)$$
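A sketch of this two-part fitness is given below. The helper split_agent, which unpacks a DE agent into a list of Keras weight arrays plus the centroid matrix, is hypothetical; the exact encoding of agents is not detailed in this thesis.

```python
# Two-part fitness: reconstruction MSE plus the mean Euclidean distance,
# Eq. (3), from each extracted feature vector to its closest centroid.
import numpy as np

def fitness(agent, autoencoder, encoder, X, n_centroids=2, n_features=5):
    # Hypothetical helper: the agent carries both weights and centroids.
    weights, centroids = split_agent(agent, n_centroids, n_features)
    autoencoder.set_weights(weights)
    # Part 1: mean squared reconstruction error over the training set.
    mse = np.mean((autoencoder.predict(X) - X) ** 2)
    # Part 2: average distance to the closest centroid in feature space.
    feats = encoder.predict(X)
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return mse + dists.min(axis=1).mean()
```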


6.3. Classification

After the training of the autoencoder was complete, I used it to extract the features of the training data and extracted the labels from the dataset to use for the supervised learning of the classifiers, Fig. 6. The classifiers I used for this thesis were Decision Tree, k-Nearest Neighbors, Random Forest, and Artificial Neural Network (ANN). Decision Tree, k-Nearest Neighbors, and Random Forest were implemented using the scikit-learn library for Python [30]; the ANN was implemented using the Keras library for Python [29]. After the training was completed, I extracted the features and labels of the test data. The classifiers then made predictions based on the extracted features, which were compared to the actual labels of the data.

For the classification process the following metrics were used: Accuracy, Eq. (4), False-Positive Rate, Eq. (5), and False-Negative Rate, Eq. (6).

$$Accuracy = \frac{\text{correctly predicted anomalies} + \text{correctly predicted normals}}{\text{total network activity}} \qquad (4)$$

$$FalsePositiveRate = \frac{\text{incorrectly predicted normals}}{\text{total normal network activity}} \qquad (5)$$

$$FalseNegativeRate = \frac{\text{incorrectly predicted anomalies}}{\text{total anomalous network activity}} \qquad (6)$$

Figure 6. Shows the implementation of a Decision Tree classifier using scikit-learn.
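Computed directly from predictions, Eqs. (4)-(6) amount to the following (illustrative sketch; label 1 marks an anomaly and 0 normal activity):

```python
# Accuracy, false-positive rate, and false-negative rate, Eqs. (4)-(6).
import numpy as np

def rates(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = np.mean(y_true == y_pred)
    fpr = np.mean(y_pred[y_true == 0] == 1)  # normals flagged as anomalies
    fnr = np.mean(y_pred[y_true == 1] == 0)  # anomalies missed
    return accuracy, fpr, fnr
```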

6.4. Experiment Environment

In this sub-section, I explain the different parameters and materials used for my experiments.

6.4.1. Dataset

As stated in sub-section 6.1, the datasets used for my experiments were two subsets of the KDD-99 dataset [9]. The KDD-99 10% dataset, with 494021 rows of data, was used for training, and the KDD-99 corrected dataset, with 311029 rows of data, was used for testing. The corrected dataset is the same as the KDD-99 test dataset but with added labels. These were selected because KDD-99 is one of the more popular datasets used in anomaly detection research, which makes it easier to compare my results with others.

6.4.2. Software

The software used in my implementation of an anomaly detector uses the Keras library for Python [29] to implement the autoencoder and the ANN classifier, while the remaining classifiers come from the Scikit-learn library for Python [30].

For my experiments I trained four different autoencoders. Two of the autoencoders had layers of the following sizes: 120, 30, 5, 30, and 120; one of them was trained using the differential evolution scheme DE/pBest/1 and the other using DE/rand/1. The last two autoencoders had layers of sizes 120, 2, and 120 and were also trained using DE/pBest/1 and DE/rand/1, in order to see if there was any improvement from using more layers and to be able to compare which of the schemes was more suitable for this method. The activation function used by the autoencoders was relu for every layer except the output layer, which used sigmoid.
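These two architectures can be sketched with a small builder (sketch, not the thesis code; the input dimension of 120 follows the layer sizes listed above).

```python
# The two autoencoder architectures above: 120-30-5-30-120 and 120-2-120,
# relu everywhere except the sigmoid output layer (sketch).
from keras import layers, models

def build_autoencoder(hidden_sizes, input_dim=120):
    inp = layers.Input(shape=(input_dim,))
    x = inp
    for size in hidden_sizes:
        x = layers.Dense(size, activation="relu")(x)
    out = layers.Dense(input_dim, activation="sigmoid")(x)
    return models.Model(inp, out)

deep = build_autoencoder([30, 5, 30])  # five extracted features
shallow = build_autoencoder([2])       # two extracted features
```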

The differential evolution that is used to train the autoencoder uses k-Means Clustering with k = 2, has a boundary of [−5, 5] for all values, and uses the following parameters:

• Population size = 20
• Mutation factor = 0.8
• Recombination factor = 0.9
• Generations = 200
• When the DE/pBest/1 scheme is used, p = 0.1

The k-Nearest Neighbor classifier had its n_neighbors parameter set to 5, Fig. 7. The Decision Tree classifier and the Random Forest classifier used the default parameters. For information about the default parameters see [31].

Figure 7. The parameters used when creating the k-Nearest Neighbors classifier.

The Artificial Neural Network (ANN) used for classification had the following structure: an input layer with neurons based on the number of extracted features, five hidden layers with the sizes 50, 25, 10, 5, and 3, and an output layer with one neuron. Every layer uses the activation function relu except for the output layer, which uses sigmoid for binary classification.
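As a sketch, the ANN classifier described above can be written in Keras as follows (the loss and optimizer are assumptions; this thesis does not state them).

```python
# ANN classifier: hidden layers 50-25-10-5-3 with relu and a single
# sigmoid output neuron for binary classification (sketch).
from keras import layers, models

def build_ann(n_features):
    inp = layers.Input(shape=(n_features,))
    x = inp
    for size in (50, 25, 10, 5, 3):
        x = layers.Dense(size, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # 1 = anomaly, 0 = normal
    model = models.Model(inp, out)
    # Assumed training setup; not specified in the thesis.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```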

6.4.3. Hardware

The following are the specifications of the computer used for the experiments:

• Processor: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz, 1800 MHz, Quad Core
• 8 GB RAM


7. Results

By looking at how the fitness values improved over time when training the autoencoders with differential evolution, a worse performance could already be expected from the autoencoders that used k-Means Clustering as part of the fitness function. The autoencoders that did not use k-Means Clustering had a starting fitness of around 0.44, whereas the autoencoders that used k-Means Clustering had a starting fitness of about 58 and 17 when trained with the DE/pBest/1 scheme and the DE/rand/1 scheme, respectively. The final fitnesses after 200 generations were 0.16 and 0.19 for the autoencoders without k-Means Clustering (DE/pBest/1 and DE/rand/1), and 0.31 and 0.47 for the autoencoders with k-Means Clustering (DE/pBest/1 and DE/rand/1), Fig. 8.

Figure 8. Shows how the fitnesses of the different autoencoders improved over time. The Y-axis shows the fitness value and the X-axis shows the generation.

When using an autoencoder that only extracted two features, the results of the classifiers were poor, regardless of which scheme was used for the differential evolution that trained it and whether or not the fitness function used k-Means Clustering. Decision Tree, k-Nearest Neighbors, Random Forest, and ANN all had 19.48% accuracy, Tables 1-4, with a 0% false-positive rate and a 100% false-negative rate for every autoencoder except the one trained using differential evolution with the DE/pBest/1 scheme and without k-Means Clustering. With that autoencoder, Decision Tree, k-Nearest Neighbors, and Random Forest had an accuracy of 19.49% with a false-positive rate of 0.01% and a false-negative rate of 99.99%, whereas the ANN had an accuracy of 19.48% with a false-positive rate of 0% and a false-negative rate of 100%, Table 1.

Table 1. The results when using the classifiers with (2) or (5) extracted features from the autoencoder trained with the scheme DE/pBest/1 without k-Means Clustering.

Classifier                 Accuracy    False-Positive Rate    False-Negative Rate
Decision Tree (2)          19.49%      0.01%                  99.99%
Decision Tree (5)          93.39%      5.10%                  6.98%
k-Nearest Neighbors (2)    19.49%      0.01%                  99.99%
k-Nearest Neighbors (5)    94.29%      0.29%                  6.73%
Random Forest (2)          19.49%      0.01%                  99.99%
Random Forest (5)          94.35%      1.25%                  6.71%
ANN (2)                    19.48%      0%                     100%
ANN (5)                    92.41%      2.73%                  8.76%


When extracting five features using an autoencoder that was trained without k-Means Clustering, there was an increase in accuracy for all the classifiers. For the autoencoder that was trained with the DE/pBest/1 scheme the Decision Tree had an accuracy of 93.39%, the k-Nearest Neighbors had an accuracy of 94.29%, the Random Forest had an accuracy of 94.35%, and the ANN had an accuracy of 92.41%. The false-positive rates for the classifiers were 5.10% for Decision Tree, 0.29% for k-Nearest Neighbors, 1.25% for Random Forest, and 2.73% for ANN and they had a false-negative rate of 6.98%, 6.73%, 6.71%, and 8.76%, respectively, Table. 1.

When using the autoencoder that was trained with the DE/rand/1 scheme, the Decision Tree classifier saw an accuracy of 93.75%, the k-Nearest Neighbors an accuracy of 93.63%, the Random Forest an accuracy of 95.13%, and the ANN an accuracy of 94.04%. With this autoencoder the classifiers had a false-positive rate of 1.08%, 1.95%, 0.81%, and 0.92%, and a false-negative rate of 7.50%, 7.44%, 5.86%, and 7.18%, respectively, Table 2.

When extracting five features using an autoencoder that was trained by a differential evolution that used k-Means Clustering, the classifiers had the same accuracy, false-positive rate, and false-negative rate as the classifiers that used 2 extracted features from the autoencoder trained using the DE/pBest/1 scheme with k-Means Clustering, Table 3. With the autoencoder that was trained by a differential evolution using the DE/rand/1 scheme with k-Means Clustering, the classifiers had an increased false-positive rate of 0.02 percentage points and a decreased false-negative rate of 0.01 percentage points, Table 4. When comparing these results with those from Chen et al. [7], we can see that the results from my research are worse. Chen et al. used an autoencoder that was trained using backpropagation and achieved an accuracy of 95.85%, whereas my proposed solution never goes above 20%. The classifiers that use extracted features from the autoencoders trained without k-Means Clustering come close, with Random Forest reaching the highest accuracy of 95.13%. The results from Shah and Trivedi [25] show that their model of feature extraction using information gain with a neural network classifier also leads to a higher accuracy than the results presented here.

As the results show, using k-Means Clustering as part of the fitness function of the differential evolution that trains the feature-extracting autoencoder leads to the classifiers not being able to classify anomalies correctly, compared to not using k-Means Clustering as part of the fitness function. However, these results may be a consequence of the low number of clusters used when training with k-Means Clustering, as the data points may be spread out, which would lead to a higher fitness.

Table 2. The results when using the classifiers with (2) or (5) extracted features from the autoencoder trained with the scheme DE/rand/1 without k-Means Clustering.

Classifier                 Accuracy    False-Positive Rate    False-Negative Rate
Decision Tree (2)          19.48%      0%                     100%
Decision Tree (5)          93.75%      1.08%                  7.50%
k-Nearest Neighbors (2)    19.48%      0%                     100%
k-Nearest Neighbors (5)    93.63%      1.95%                  7.44%
Random Forest (2)          19.48%      0%                     100%
Random Forest (5)          95.13%      0.81%                  5.86%
ANN (2)                    19.48%      0%                     100%
ANN (5)                    94.04%      0.92%                  7.18%


Table 3. The results when using the classifiers with (2) or (5) extracted features from the autoencoder trained with the scheme DE/pBest/1 with k-Means Clustering.

Classifier                 Accuracy    False-Positive Rate    False-Negative Rate
Decision Tree (2)          19.48%      0%                     100%
Decision Tree (5)          19.48%      0%                     100%
k-Nearest Neighbors (2)    19.48%      0%                     100%
k-Nearest Neighbors (5)    19.48%      0%                     100%
Random Forest (2)          19.48%      0%                     100%
Random Forest (5)          19.48%      0%                     100%
ANN (2)                    19.48%      0%                     100%
ANN (5)                    19.48%      0%                     100%

Table 4. The results when using the classifiers with (2) or (5) extracted features from the autoencoder trained with the scheme DE/rand/1 with k-Means Clustering.

Classifier                 Accuracy    False-Positive Rate    False-Negative Rate
Decision Tree (2)          19.48%      0%                     100%
Decision Tree (5)          19.48%      0.02%                  99.99%
k-Nearest Neighbors (2)    19.48%      0%                     100%
k-Nearest Neighbors (5)    19.48%      0.02%                  99.99%
Random Forest (2)          19.48%      0%                     100%
Random Forest (5)          19.48%      0.02%                  99.99%
ANN (2)                    19.48%      0%                     100%
ANN (5)                    19.48%      0%                     100%


8. Discussion

The obtained results of my thesis could be useful for researchers, as they show how it is possible to combine different methods with an autoencoder with the help of an evolutionary algorithm such as differential evolution. However, with my limited time I was unable to implement a good fitness function for the k-Means Clustering part of the fitness, which is why the classification results from those extracted features are worse than those from the autoencoders that did not use k-Means Clustering. Without a good fitness function to represent the method combined with the autoencoder, the results will be poor. As shown in section 7, these results are worse than those of the related works in this field.

A literature review was performed for this thesis, but it was in no way systematic, and as such many contributions to this field could have been missed. Nevertheless, the information gathered from the surveyed papers proved sufficient for my research.

The chosen method to combine with an autoencoder for this thesis was k-Means Clustering, which led to poor results from the classifiers. The poor results may be because of the low number of clusters used in my experiments: the data points might have been spread out, which leads to the low number of clusters behaving poorly in the fitness function of the differential evolution process. An increase in the number of clusters could improve the results and should be considered for future work. However, other methods might be more suited to combine with an autoencoder and might lead to better results.

In this thesis, only four classifiers were used, and as such the method needs to be tested more thoroughly using more classifiers to identify the best classifier to use with the training method proposed in this thesis. Since the research for this thesis is mostly about how to combine different methods with an autoencoder, the results are in no way limited to anomaly detection in networking and can be used for any research that involves using an autoencoder for feature extraction.


9. Conclusions

The main goal of this thesis was to propose a new method to detect anomalies in networks by using autoencoders together with k-Means Clustering to extract features. To answer RQ1, a literature review was performed on different training techniques for the autoencoder. During the review, a training technique was found that uses an evolutionary algorithm, which I could modify to combine k-Means Clustering with the autoencoder by having the k-Means Clustering be a part of the fitness function.

To see if it was an efficient solution to the problem and to answer RQ2, the features extracted from the trained autoencoder were used by several chosen classifiers, which showed that all the classifiers had equal results. This means that, when using the parameters explained in section 6.4, no classifier performs better than the others. However, the accuracies of all the classifiers were poor with these extracted features when compared to the accuracies of the same classifiers on autoencoders trained without k-Means Clustering.

The research conducted in this thesis is only preliminary, and as such the results should not be used for any real anomaly detection systems. However, there is considerable room for improvement of the results with continued research on this topic, by finding more optimal parameters to use.


10. Future Work

There are several things that can be done to complement the research I have conducted. I only used four different classifiers to classify the anomalies, which means that there are classifiers that have not been tested using the method discussed in this thesis. Any of the untested classifiers could perform better than the ones seen here.

The dataset used for this thesis was the KDD-99 dataset. It would be interesting to see how this method performs on other datasets that contain fewer or more features to extract from, and to compare the results across the different datasets.

The training of the autoencoder also has room for improvement by using more hidden layers and/or finding an optimal number of features to extract. It could also prove beneficial to use more centroids for the k-Means Clustering used by the fitness function of the differential evolution.

Finally, I only tested two mutation schemes for the differential evolution in this thesis: DE/rand/1 and DE/pBest/1. There are more mutation schemes [32] that could be tested with this method to find the most optimal scheme.


References

[1] A. Vikram and Mohana, "Anomaly detection in Network Traffic Using Unsupervised Machine Learning Approach," in Proceedings of the Fifth International Conference on Communication and Electronics Systems (ICCES 2020), Coimbatore, India, 2020, pp. 476-479. [Online]. Available: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9137987, Accessed on: 2021-03-09.

[2] K. Leung and C. Leckie, "Unsupervised anomaly detection in network intrusion detection using clusters," in Proceedings of the Twenty-eighth Australasian Conference on Computer Science, Newcastle, Australia, 2005, pp. 333-342. [Online]. Available: https://crpit.scem.westernsydney.edu.au/confpapers/CRPITV38Leung.pdf, Accessed on: 2021-03-09.

[3] S. Singh and G. Kaur, "Unsupervised Anomaly Detection in Network Intrusion Detection Using Clusters," in Proceedings of the National Conference on Challenges & Opportunities in Information Technology (COIT-2007), Mandi Gobindgarh, India, 2007, pp. 107-110. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.85.2811&rep=rep1&type=pdf, Accessed on: 2021-03-09.

[4] I. Syarif, A. Prugel-Bennett and G. Wills, "Unsupervised clustering approach for network anomaly detection," in Proceedings of the Fourth International Conference on Networked Digital Technologies (NDT 2012), Dubai, United Arab Emirates, 2012, pp. 135-145. [Online]. Available: https://eprints.soton.ac.uk/338221/1/Unsupervised_Clustering_and_Outlier_Detection_approach_for_network_anomaly_detection_-_camera_ready_new.pdf, Accessed on: 2021-03-09.

[5] M. Panda and M. R. Patra, "Ensemble of classifiers for detecting network intrusion," in Proceedings of the International Conference on Advances in Computing, Communication and Control (ICAC3'09), Mumbai, India, 2009, pp. 510-515. [Online]. Available: https://dl.acm.org/doi/pdf/10.1145/1523103.1523204, Accessed on: 2021-03-09.

[6] A. Lazarevic, L. Ertoz, V. Kumar, A. Ozgur and J. Srivastava, "A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection," in Proceedings of the 2003 SIAM International Conference on Data Mining, San Francisco, CA, USA, 2003, pp. 25-36. [Online]. Available: https://epubs.siam.org/doi/abs/10.1137/1.9781611972733.3, Accessed on: 2021-03-15.

[7] Z. Chen, C. K. Yeo, B. S. Lee and C. T. Lau, "Autoencoder-based network anomaly detection," in Proceedings of the 2018 Wireless Telecommunication Symposium, Phoenix, AZ, USA, 2018, pp. 1-5. [Online]. Available: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8363930, Accessed on: 2021-03-09.

[8] F. Falcão et al., "Quantitative Comparison of Unsupervised Anomaly Detection Algorithms for Intrusion Detection," in Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, Limassol, Cyprus, 2019, pp. 318-327. [Online]. Available: https://dl.acm.org/doi/pdf/10.1145/3297280.3297314, Accessed on: 2021-03-09.

[9] UCI, "KDD Cup 1999 Data," Oct. 28, 1999. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, Accessed on: 2021-04-19.

[10] A. Hodzic and D. Skulj, "Data Driven Anomaly Control Detection For Railway Propulsion Control Systems," MSc thesis, Department of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden, 2020. [Online]. Available: https://mdh.diva-portal.org/smash/get/diva2:1437916/FULLTEXT01.pdf, Accessed on: 2021-05-24.

[11] IBM, "Unsupervised Learning," 2020. [Online]. Available: https://www.ibm.com/cloud/learn/unsupervised-learning, Accessed on: 2021-04-15.

[12] Z. Ghahramani, "Unsupervised learning," in Advanced Lectures on Machine Learning, vol. 3176, 1st ed., O. Bousquet, U. von Luxburg, G. Rätsch, Eds. Germany: Springer, 2004, pp. 72-112.

[13] R. Abukhader and S. Kakoore, "Artificial Intelligence for Vertical Farming – Controlling the Food Production," MSc thesis, Department of Innovation, Design and Engineering, Mälardalen University, Västerås, Sweden, 2021. [Online]. Available: https://mdh.diva-portal.org/smash/get/diva2:1526309/FULLTEXT01.pdf, Accessed on: 2021-04-15.

[14] W. Badr, "Auto-Encoder: What Is It? And What Is It Used For? (Part 1)," Towards Data Science, Apr. 22, 2019. [Online]. Available: https://towardsdatascience.com/auto-encoder-what-is-it-and-what-is-it-used-for-part-1-3e5c6f017726, Accessed on: 2021-04-16.

[15] A. Dertat, "Applied Deep Learning - Part 3: Autoencoders," Towards Data Science, Oct. 3, 2017. [Online]. Available: https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798, Accessed on: 2021-05-24.

[16] M. J. Garbade, "Understanding K-means Clustering in Machine Learning," Towards Data Science, Sep. 12, 2018. [Online]. Available: https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1, Accessed on: 2021-04-16.

[17] C. Piech, "K Means," Stanford, 2013. [Online]. Available: https://stanford.edu/~cpiech/cs221/handouts/kmeans.html, Accessed on: 2021-05-24.

[18] S. Das and P. N. Suganthan, "Differential Evolution: A Survey of the State-of-the-Art," IEEE Transactions on Evolutionary Computation, vol. 15, no. 1, pp. 4-31, Feb. 2011.

[19] M. Hamilton and J. Nyman, "A comparison of differential evolution and a genetic algorithm applied to the longest path problem," BSc thesis, Department of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, 2018. [Online]. Available: http://kth.diva-portal.org/smash/get/diva2:1241471/FULLTEXT02.pdf, Accessed on: 2021-05-24.

[20] IBM, "Supervised Learning," 2020. [Online]. Available: https://www.ibm.com/cloud/learn/supervised-learning, Accessed on: 2021-05-24.

[21] F. Pereira, T. Mitchell and M. Botvinick, "Machine learning classifiers and fMRI: A tutorial overview," Mathematics in Brain Imaging, vol. 45, pp. 199-209, Nov. 2018, doi: 10.1016/j.neuroimage.2008.11.007.

[22] P. Gupta, "Decision Trees in Machine Learning," Towards Data Science, May 17, 2017. [Online]. Available: https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052, Accessed on: 2021-05-24.

[23] R. Gandhi, "K Nearest Neighbours — Introduction to Machine Learning Algorithms," Towards Data Science, Jun. 13, 2018. [Online]. Available: https://towardsdatascience.com/k-nearest-neighbours-introduction-to-machine-learning-algorithms-18e7ce3d802a, Accessed on: 2021-05-24.

[24] T. Yiu, "Understanding Random Forest," Towards Data Science, Jun. 12, 2019. [Online]. Available: https://towardsdatascience.com/understanding-random-forest-58381e0602d2, Accessed on: 2021-05-24.

[25] B. Shah and B. H. Trivedi, "Reducing Features of KDD CUP 1999 Dataset for Anomaly Detection Using Back Propagation Neural Network," in 2015 Fifth International Conference on Advanced Computing & Communication Technologies, 2015, pp. 247-251.

[26] G. Yan, "Network Anomaly Traffic Detection Method Based on Support Vector Machine," in 2016 International Conference on Smart City and Systems Engineering (ICSCSE), Hunan, China, 2016, pp. 3-6. [Online]. Available: https://ieeexplore-ieee-org.ep.bib.mdh.se/document/7825024, Accessed on: 2021-06-14.

[27] H. Okada, "Evolutionary Training of Autoencoders by Differential Evolution," International Journal of Science and Engineering Investigations, vol. 9, no. 103, pp. 22-26, 2020.

[28] D. Becker, "Using Categorical Data with One Hot Encoding," Kaggle, 2018. [Online]. Available: https://www.kaggle.com/dansbecker/using-categorical-data-with-one-hot-encoding, Accessed on: 2021-05-24.

[29] F. Chollet et al., "Keras," Jun. 18, 2020. [Online]. Available: https://github.com/fchollet/keras, Accessed on: 2021-04-19.

[30] O. Grisel et al., "Scikit-learn," Apr. 29, 2021. [Online]. Available: https://github.com/scikit-learn/scikit-learn, Accessed on: 2021-05-22.

[31] Scikit-learn, "Supervised Learning," 2007. [Online]. Available: https://scikit-learn.org/stable/supervised_learning.html, Accessed on: 2021-05-24.

[32] M. Leon and N. Xiong, "Investigation of mutation strategies in differential evolution for solving global optimization problems," in Artificial Intelligence and Soft Computing, Springer, June 2014, pp. 372-383.
