
Anomaly Detection in Industrial Networks using a Resource-Constrained Edge Device


Anomaly Detection in Industrial Networks using a Resource-Constrained Edge Device

Anton Eliasson

Computer Science and Engineering, master's level 2019

Luleå University of Technology

Department of Computer Science, Electrical and Space Engineering


ABSTRACT

The detection of false data-injection attacks in industrial networks is a growing challenge in the industry because it requires knowledge of application- and protocol-specific behaviors. Profinet is a common communication standard in the industry with the potential to encounter this type of attack. This motivates an examination of whether a solution based on machine learning, with a focus on anomaly detection, can be implemented and used to detect abnormal data in Profinet packets. Previous work has investigated this topic; however, a solution is not yet available on the market. Any solution that aims to be adopted by the industry must detect abnormal data at the application level and run the analytics on a resource-constrained device.

This thesis presents an implementation, which aims to detect abnormal data in Profinet packets represented as online data streams generated in real-time. The implemented unsupervised learning approach is validated on data from a simulated industrial use-case scenario. The results indicate that the method manages to detect all abnormal behaviors in an industrial network.


PREFACE

First of all, I want to thank my family for the support you have given me, not only during this thesis but also during my entire study period at LTU.

This thesis work was carried out in collaboration with HMS Networks, where I especially would like to thank my supervisor, Henrik Arleving, for your ideas, help, and feedback throughout the project. I also want to thank the rest of the members of the company who helped me, especially Mattias Svensson who assisted me with technical problems with the software and hardware at the beginning of the project.

I would also like to thank my supervisor at LTU, Sergio Martin Del Campo Barraza.

Your quick responses and feedback have been amazing and have helped me a lot in finishing the report. Your knowledge in machine learning has been really valuable as well, and you have taught me a lot during these months.

I also want to take the opportunity to thank the team at Stream Analyze for guiding me with the analytics platform, especially Johan Risch. Your help has been incredible.

Anton Eliasson


CONTENTS

Chapter 1 – Introduction
1.1 Background
1.1.1 Attacks on industrial systems
1.1.2 Intrusion detection in Industrial Networks
1.1.3 Network Analytics
1.2 Related work
1.3 Problem formulation
1.3.1 Delimitations
1.4 Thesis outline

Chapter 2 – Theory
2.1 Profinet
2.1.1 Profinet system model
2.1.2 Profinet Cyclic IO Data Exchange
2.1.3 Abnormality in Profinet packets
2.2 Stream Processing
2.3 Feature Engineering
2.3.1 Features
2.3.2 Selection of features
2.4 Anomaly Detection
2.5 Machine learning algorithm for anomaly detection
2.5.1 DBSCAN

Chapter 3 – Methodology
3.1 Tools
3.1.1 Hardware
3.1.2 Software
3.2 Architecture
3.3 Generation of network data
3.4 Data processing
3.4.1 Data collection
3.4.2 Pre-processing
3.4.3 Feature extraction
3.5 Creation of training and validation sets
3.7.1 Validation of clustering model
3.7.2 Validation of the online data stream case

Chapter 4 – Results
4.1 Clustering method
4.1.1 Scenario A
4.1.2 Scenario B
4.1.3 Online data stream case

Chapter 5 – Discussions, Conclusions and Future Work
5.1 Discussions and Conclusions
5.2 Future Work

Appendix A – Tables

CHAPTER 1 – Introduction

There is a current need to protect industrial control systems from attacks. False data-injection attacks, where application data is modified in the network packets with the aim of interrupting the normal behavior of an industrial process, are one type of attack that can cause major problems if not detected in time. Intrusion detection systems need to detect these attacks as early as possible; therefore, the analysis of network traffic occurs over continuous sequences of data in real-time. Moreover, it is difficult to obtain knowledge about all kinds of existing attacks, especially unknown attacks that have never been seen before. For these reasons, a solution based on anomaly detection and unsupervised machine learning is implemented in this project, which aims to detect abnormal data in industrial network packets. The performance of the anomaly detection solution is investigated and conclusions are drawn to determine whether it is possible to detect abnormal data in an industrial network using deep packet inspection and machine learning. The examination of the proposed solution is extended to evaluate the performance of the machine learning implementation when it runs on a resource-constrained edge device.

1.1 Background

1.1.1 Attacks on industrial systems

During the development of industrial control systems, security has not been a particular priority. The reason is that factory floors have been isolated from the outside world, so there has been no need to secure the systems from attacks and intrusions. However, interest in security and malware detection capabilities is increasing considerably due to a higher demand for devices that are part of the Industrial Internet-of-Things (IIoT).

One example of real-world malware is the well-known computer worm Stuxnet, which attacked Iran's nuclear plant in 2010 [1]. Stuxnet aimed to target industrial controllers, specifically Siemens programmable logic controllers (PLCs), used to control industrial processes such as industrial centrifuges for the separation of nuclear material. The program that controls and monitors the PLCs, called Step7, uses a specific library for the communication with the PLCs. The worm managed to get control over this library and was able to handle requests sent to the PLCs. Next, malicious code was created and introduced to target specific PLCs. The payload of the data sent from the PLCs to the centrifuges was changed, setting the speed much lower or higher than the regular behavior. The slow speed causes the uranium process to run inefficiently, while the high speed can potentially destroy the centrifuges. Before the attack was made, the malware recorded the normal operation of the centrifuges' speed. The recorded data was then fed to the monitoring program, WinCC, preventing the system from alerting on anomalous behavior, which made it difficult to detect the abnormal speed [2].

Other attacks, such as Denial-of-Service (DoS), replay attacks, and deception attacks, have recently received attention in industrial systems as well. A DoS attack aims to drain the resources of the network to make them unavailable and prevents the devices on the network from communicating with each other [3]. Replay attacks intercept a valid data transmission and then repeat the transmission with the purpose of fetching the same information as the original data transmission [3]. In deception attacks, also known as false data-injection attacks, the integrity of the data in the transmitted packets is modified; Stuxnet is one example of such an attack [3]. False data-injection attacks are harder to detect and are not as frequently investigated as DoS attacks [4]. Other types of attacks on industrial systems are eavesdropping, Man-in-the-Middle attacks, and various types of worms, trojans, and viruses [5]. Detection of these attacks in an industrial environment is crucial in order to have a secure system. Depending on the type of attack, the detection method may differ. This work focuses on the detection of false data-injection attacks.

1.1.2 Intrusion detection in Industrial Networks

Intrusion detection systems for industrial systems can be divided into two categories, network-based or host-based [6, 7]. Network-based systems collect and analyze the entire network communication while the host-based solution identifies intrusion behavior on each individual node. As network-based intrusion detection systems only need to be installed on one point in a network and can analyze the inbound and outbound traffic of all configured devices on the network, they are often more suitable for automation networks [7]. In addition, host-based intrusion detection systems require additional memory and computing resources, which can affect the industrial process.

Each of these two methods can, in turn, be either misuse-based or anomaly-based.

Misuse-based intrusion detection systems compare the incoming data with a predefined signature, which is why they are also known as signature-based intrusion detection systems. Meanwhile, anomaly-based intrusion detection systems compare the current behavior against a learned normal behavior and try to identify abnormal patterns [6]. The main difference between signature- and anomaly-based systems lies in the concepts of attack and anomaly. An attack is an operation that aims to put the security of a system at risk [8], while an anomaly, in a network security context, is a behavior suspected of being a security risk because it does not resemble the historical behavior. Signature-based systems are very good at detecting predefined attacks but lack the ability to detect unknown and unseen behavior. Anomaly-based detection techniques have the potential to detect these unseen events. There is great interest in analyzing network traffic with an anomaly detection procedure, due to the increase of new unknown attacks on modern industrial control systems. This makes it possible to identify not only old known threats but also abnormal behavior that has not been seen before.

False data-injection attacks can only be detected with deep packet inspection, where the payload of the packets sent over the network is explored [9]. Since industrial systems include multiple different communication protocols and standards, each protocol needs its own analysis, and therefore the packet inspection might differ depending on the protocol.

Profinet is one of the many communication standards used in the industry and is one of the emerging protocols found in many industrial applications today [7]. The protocol communicates over Industrial Ethernet and is one of the leading Industrial Ethernet standards on the market [10]. The Profinet application data in the packets sent over the network varies depending on the application. Taking the example of the centrifuges of Iran's nuclear program, the actual application data of the Profinet packets can be the desired speed of the operating motor.

Anomaly-based intrusion detection systems in information technology (IT) systems are not so common in practice. The reason is the dynamic behavior of regular IT systems, which makes it difficult to define a proper model for normal behavior. For industrial networks, however, communication is often much more structured and steady [7]. These considerations motivate our work on intrusion detection systems that can inspect industrial communication protocols on the application level with deep packet inspection and detect anomalies in the network data.

1.1.3 Network Analytics

Where to run the analytics of network data, and data in general, matters for several reasons. Latency and network bandwidth are affected differently depending on where the analytics runs.

Virtually unlimited resources in the cloud have resulted in the emergence of many different cloud services. Running analytics in the cloud requires that the data to be analyzed is sent from the data source to the server continuously. An alternative is to run analytics in the local network, closer to the data source. Edge computing is the concept of running analytics directly at the network edge [11]. Ahmed [11] compares the pros and cons of cloud computing and edge computing. Cloud computing has the benefit that hardware capabilities are scalable, resulting in practically unlimited resource capabilities. The disadvantage of cloud-based solutions is that latency and jitter are often high, and sending continuous data to the cloud requires a constant high bandwidth. Edge-based solutions, on the other hand, can maintain low latency, since the traffic to the cloud can be reduced. Edge computing is therefore suitable for applications that require analytics on a great volume of data in real-time. The drawback is that edge devices are often resource-constrained.

Detecting anomalies in network data requires analytics on large data volumes in real-time. This thesis work investigates an edge-based solution, where the detection of anomalies in network traffic is made on a resource-constrained edge device.

1.2 Related work

Numerous studies have been made on information security and intrusion detection systems in general [12]. Wressnegger et al. [13] discuss the need for protocol specifications to analyze the data content in industrial networks and present a content-based anomaly detection framework for binary protocols. Their method manages to detect 97.1 % of the attacks in their dataset with only 2 false alarms out of 100,000 messages.

Most of the research within network-based intrusion detection systems on industrial networks is focused on detecting anomalies in the traffic flow characteristics, such as throughput, port number, and IP addresses. Mantere et al. [14] analyze possible features for use in a machine learning based anomaly detection system. The research is however limited to the IP traffic and does not consider the actual payload of the traffic.

A few papers focus specifically on the Profinet standard. Sestito et al. [15] present a method for detecting anomalies in Profinet networks. Their method uses an Artificial Neural Network (ANN) to classify the incoming data into 4 different classes, where one of the classes is normal operation. The authors derive 16 different traffic-related features used for the ANN and conclude that their methodology may be successful for anomaly detection in any Profinet network. Their method is, however, based on supervised learning and requires labeled data for classification. Schuster et al. [9] also present an approach for anomaly detection in Profinet. Unlike the method above, their model analyses all network data, including flow information, application data, and the packet sequences, by doing deep packet inspection. They extract features from the actual packet data, such as MAC address, packet type, and packet payload. Schuster et al. [16] present the results of applying the one-class SVM algorithm for detecting anomalies in Profinet packets.

The work in this thesis focuses on similar approaches, with deep packet inspection and unsupervised anomaly detection methods. Additionally, the method presented in this work is based on a stream-based approach, where the machine learning algorithm performs the analytics online on the incoming data. Mulinka and Casas [17] compare different stream-based machine learning algorithms for the case of detecting abnormal network traffic in an online manner.


1.3 Problem formulation

Analyzing industrial networks with deep packet inspection is not completely straightforward. One of the key challenges is the need for specific knowledge about each individual protocol. Today, there is no solution on the market that can inspect industrial network packets at payload level, detect malware with an anomaly detection approach, and, in addition, run the analytics on a resource-constrained device. Being able to detect unknown malware and additionally run the analytics locally on the edge would have many benefits.

HMS Networks AB, a Swedish company from Halmstad, supplies products and solutions for industrial communication and the Industrial Internet-of-Things. The company has a large amount of experience in industrial communications and is a leader in providing software that connects industrial devices to different industrial networks and IoT systems.

HMS constantly strives to prototype new solutions and possibilities within the market.

With the new technology within edge computing and machine learning, the company wishes to examine the possibilities to design and implement a machine learning-based solution running on a resource-constrained edge device from HMS.

Detection of anomalies in an industrial protocol is an interesting use case for both the company and the industry. Since HMS has deep knowledge about specific industrial protocols, the company wants to examine the possibility of using machine learning to detect anomalies in industrial network packets. This project is narrowed down to the Profinet standard because the way of detecting anomalies is not the same for all protocols. Therefore, the goal of this project is to implement and evaluate a machine learning solution to detect anomalies in Profinet packets. The solution should run on one of HMS's resource-constrained edge devices. In summary, the aim of the project is to investigate and answer the following research questions:

Q1 Can abnormal data in an industrial network be detected using Deep Packet Inspection and Machine Learning?

Q2 To what extent, as it relates to performance, speed, and system footprint, can an unsupervised anomaly detection algorithm be useful on a resource-constrained edge device?

Q3 Is the implementation of the solution feasible on an existing device available on the market?

1.3.1 Delimitations

As described in the problem formulation, the intention of the thesis is to investigate whether it is possible to detect abnormal data in industrial networks. Unfortunately, it will not be possible to test and validate the implemented solution on network data generated from an actual industrial environment. Therefore, data will be simulated.


Real world network data would be needed to deploy the implementation as a complete solution in production.

1.4 Thesis outline

This thesis is structured as follows: Chapter 2 provides the background theory about the different elements used in the proposed solution, such as theoretical information about Profinet, stream processing, feature engineering, anomaly detection, and machine learning. Chapter 3 describes the method used in the implementation of this work, the tools used in the implementation and how they fit together. This chapter also provides a description of how the different parts in the solution are generated, implemented and validated. Chapter 4 presents the results of the proposed solution, while Chapter 5 provides conclusions to the research questions stated in the problem formulation and future work.


CHAPTER 2 – Theory

Machine learning has existed for several decades; however, it is only recently that it has gained popularity and that many of its methods have become feasible to implement in real-world applications. Machine learning can be described as the process of using algorithms that learn from data to predict outcomes on later observations [42]. This is in contrast to algorithms that are explicitly programmed by humans.

A predictive machine learning model requires data for its training. When the model has been trained and is provided with inputs, it makes prediction outputs based on the data the model was trained with. In the case where detecting abnormal data in industrial networks is the goal, the model is trained with data from network traffic, and the model defines if new incoming data is normal or abnormal. The inputs to the model, also known as features, must be extracted from the data source. This process is called feature extraction and is a crucial step in the design of a machine learning solution.

The first step in the design of a machine learning solution is data collection, which in this project is data in the form of Profinet packets. Data processing includes data collection, pre-processing and feature extraction. The processed data can then be used as input to train a model, which finally can make a decision based on future inputs.
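The chain described above can be sketched as a generic processing pipeline. All four stages here are hypothetical placeholders (the concrete pre-processing, features, and model are described in Chapter 3); the sketch only illustrates how the stages feed into each other:

```python
def pipeline(raw_packets, preprocess, extract_features, model):
    """Generic processing chain: collected packets are pre-processed,
    turned into feature vectors, and passed to a model that labels each
    observation. The stage implementations are placeholders."""
    for packet in raw_packets:
        features = extract_features(preprocess(packet))
        yield model.predict(features)
```

Each stage is passed in as a callable, so the same skeleton works whether the model is trained offline or updated online.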

2.1 Profinet

Industrial Ethernet technology has recently increased in popularity by offering higher speed, longer connection distance, and the ability to connect more nodes than the traditional serial Fieldbus protocols on the factory floor [21]. Among several Industrial Ethernet standards, Profinet is one of the most common in the industry today, used in solutions such as factory automation, process automation, and motion control applications. Depending on the type of functionality and requirements for the data transmission over the network, Profinet offers two variants of functionality. The first one, defined as Profinet CBA (Component Based Automation), is suitable for component-based machine-to-machine communication via TCP/IP. The other variant is Profinet IO, used for data exchange between controllers and devices. This thesis will focus only on Profinet IO.

2.1.1 Profinet system model

A Profinet IO system consists of the following different device classes that communicate with each other:

IO-Controller A Profinet IO-Controller is typically the Programmable Logic Controller (PLC) where the control program runs. The IO-Controller exchanges information with the IO-Devices in the network, and acts as a provider of output instructions to the devices and a consumer of input data from the devices [18, 20].

IO-Device Profinet IO-Devices are distributed I/O field devices that can exchange data with one or several IO-Controllers [20].

IO-Supervisor An IO-Supervisor can be a personal computer (PC), programming device (PG), or a human-machine interface (HMI). The purpose of an IO-Supervisor can be commissioning or diagnostics [20].

A communication path between an IO-Controller and an IO-Device must be established before they can communicate, which is done during system startup [20].

When the IO-Controller is initialized, it sets up a connection, called Application Relation (AR), to each IO-Device using Distributed Computing Environment / Remote Procedure Calls (DCE RPC) [33]. The AR specifies Communication Relations (CR) where specific types of data are sent. The different CRs that exist in an AR are Record data CR, IO data CR, and Alarm CR. Figure 2.1 illustrates the application and communication relations.

Figure 2.1: AR and CR between IO-Controller and IO-Device. Picture taken from [20].


2.1.2 Profinet Cyclic IO Data Exchange

Profinet provides services such as the cyclic transmission of I/O data (RT and IRT), acyclic transmission of data (parameters, detailed diagnostics, etc.), acyclic transmission of alarms, and address resolution [20]. However, this project will only deal with the cyclic IO data exchange, where data is sent from an IO-Device to an IO-Controller. The cyclic transmission of I/O data in Profinet IO occurs in the IO data CR, where cyclic data is sent between an IO-Controller and an IO-Device. The data is always transmitted in real-time according to the definitions in IEEE and IEC for high-performance data exchange of I/O data [20]. The real-time communication in Profinet IO is separated into four classes, as illustrated in Table 2.1.

RT CLASS 1 is used for unsynchronized communication within a subnet, whereas RT CLASS 2 can be used for either unsynchronized or synchronized communication. RT CLASS 3 supports Isochronous Real-Time with clock rates of under 1 ms and jitter below 1 µs. The last class, RT CLASS UDP, uses unsynchronized communication between different subnets. This project will deal with Profinet IO real-time class 1.

Profinet communication occurs in the data link layer, using the Ethernet protocol, according to the Open Systems Interconnection model (OSI model). An Ethernet frame in Profinet, illustrated in Figure 2.2, consists of a 16-byte header block containing the destination address, source address, Ethertype, and Frame ID. The Ethertype is set to 0x8892, which indicates that the protocol used in the payload is Profinet. The Frame ID differentiates the Profinet IO service used; for cyclic data exchange with real-time class 1, the values are between 0x8000 and 0xBBFF. The payload, normally with a size between 40 and 1500 bytes, is the application data sent between an IO-Controller and IO-Device.

The cycle information, called the cycle counter, sets the update time of the cyclic data sent from the provider [18]. The frame also includes status information, used for validation of data status and transfer status in the cyclic exchange.
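The frame layout described above can be sketched as a small parser. This is an illustrative sketch based only on the fields named in the text, not a complete Profinet decoder: it assumes the 16-byte header (destination, source, Ethertype, Frame ID) and assumes the status trailer (cycle counter, data status, transfer status) occupies the last four bytes of the frame.

```python
import struct

PROFINET_ETHERTYPE = 0x8892  # indicates a Profinet payload

def parse_profinet_rt_frame(frame: bytes) -> dict:
    """Parse the 16-byte header and assumed 4-byte status trailer of a
    Profinet RT frame: dst MAC (6 B), src MAC (6 B), Ethertype (2 B),
    Frame ID (2 B), payload, cycle counter (2 B), data status (1 B),
    transfer status (1 B)."""
    ethertype, frame_id = struct.unpack(">HH", frame[12:16])
    if ethertype != PROFINET_ETHERTYPE:
        raise ValueError("not a Profinet frame")
    cycle_counter, data_status, transfer_status = struct.unpack(">HBB", frame[-4:])
    return {
        "dst": frame[0:6].hex(":"),
        "src": frame[6:12].hex(":"),
        "frame_id": frame_id,
        # Cyclic IO data with RT class 1 uses Frame IDs 0x8000-0xBBFF
        "rt_class_1": 0x8000 <= frame_id <= 0xBBFF,
        "payload": frame[16:-4],
        "cycle_counter": cycle_counter,
        "data_status": data_status,
        "transfer_status": transfer_status,
    }
```

A real implementation would follow the full IEC 61158/61784 field definitions; the sketch only covers the fields relevant to the discussion here.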

Table 2.1: Real-time classes in Profinet IO.

Real-time classes in Profinet IO

Class Functionality

RT CLASS 1 Unsynchronized communication within a subnet.

RT CLASS 2 Unsynchronized or synchronized communication.

RT CLASS 3 Isochronous Real-Time communication.

RT CLASS UDP Unsynchronized communication between different subnets.


Figure 2.2: Profinet frame. Picture taken from [19].

2.1.3 Abnormality in Profinet packets

The meaning of abnormal data in Profinet packets depends on the application and requires the definition of normal behavior. A common element of all false data-injection attacks is that the contents of the packets differ from the normal case. The bytes in the payload are modified to cause damage or interfere with the regular industrial process.

The main challenge in the detection of anomalies is that the detection system does not know the significance of each byte in the packet. Expanding on the Stuxnet example, where the speed of the centrifuges was changed, the speed is represented by a specific number of bytes in the payload. Detecting the abnormal behavior of the speed requires that the detection system knows where the data for the speed is located in the payload. This is not possible if the detection system is supposed to work for the general case with many different applications. Therefore, a more general approach is studied in this project, where the aim is to detect deviations in the behavior of the packets.

An industrial process is often quite static in normal operation. As long as no unexpected behavior occurs in the process, the inputs and outputs of a PLC stay within a limited interval. During an attack, however, the static operation is disrupted and the interval of the input and output values is likely to widen. As a result, the range of value combinations the bytes in the payload can take increases, meaning more variation in the payload. Therefore, we make two assumptions for this project:

1. The payload of the packets varies little during normal operation resulting in a limited number of combinations.

2. The payload of the packets varies more during an attack resulting in additional variations in the data that would not appear during normal operation.
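To illustrate the two assumptions, a hypothetical monitor could track how many distinct payload values occur inside a sliding window of packets and flag a window whose variation exceeds the maximum observed during normal operation. This is a deliberately simplified sketch of the idea, not the detection method of the thesis:

```python
from collections import deque

class PayloadVariationMonitor:
    """Illustrative sketch: count distinct payloads in a sliding window.
    During training (normal operation) the largest observed count becomes
    the baseline; later windows exceeding it are flagged as abnormal."""

    def __init__(self, window_size: int = 100):
        self.window = deque(maxlen=window_size)
        self.baseline = 0  # max distinct payloads seen during training

    def train(self, payload: bytes) -> None:
        """Feed one packet payload observed during normal operation."""
        self.window.append(payload)
        self.baseline = max(self.baseline, len(set(self.window)))

    def check(self, payload: bytes) -> bool:
        """Return True if the window now varies more than it ever did
        during normal operation (assumption 2)."""
        self.window.append(payload)
        return len(set(self.window)) > self.baseline
```

The window size and the baseline rule are arbitrary choices for the illustration; the thesis instead uses clustering over extracted features.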

These assumptions are backed by domain experts at the company. The detection method will use these assumptions to detect abnormal behavior. Abnormal behavior at the packet level in an industrial process is thus related to how much the content varies during a time period.

One drawback of anomaly-based intrusion detection systems is the high number of false alarms. Not all anomalies are equivalent to an attack. For example, a fast increase in user activity in a network may be the result of a DoS or worm attack, but it can also result from the installation of new software under normal network operation [34]. Furthermore, abnormal data in Profinet does not necessarily mean that there is an attack either. It could be due to physical disturbances in the sensors that disrupt the control loop in a PLC.

However, the aim of the system in this project is to detect abnormal behavior. Further analysis, outside the scope of this project, is required to decide whether the detected abnormal behavior is an attack or not.

2.2 Stream Processing

A data stream can be described as a continuously growing sequence of data items [24].

Data streams exist in many different shapes in the real world; some examples are sensor data, network traffic, and sound waves. Many applications today require analyzing incoming data streams in an online manner without actually storing the data.

Anomaly detection in network traffic by collecting all data and storing it in a traditional database for offline analytics is not practical or sufficient when the goal is to detect the anomalies in real-time. Furthermore, it is not always possible to store the large volume of data that is required for processing because of constraints in capacity. The continuous data used in many applications is often massive, unbounded and evolves over time [27].

Storing all incoming data in the memory of the device where the analytics takes place may not always be feasible, especially if the device is resource-constrained. A tool that makes it possible to query data directly on the incoming streams without storing all data is often necessary. A Data Stream Management System (DSMS) is software that can process incoming queries over continuous data streams, in contrast to a DBMS (database management system), which works only for static data. The queries are often written in a scripting language similar to SQL and can be used to perform various kinds of filtering, calculations, and statistics on the streaming data. The queries over the data streams are continuous queries (CQs), which means they run continuously until stopped by the user and produce a streaming result as long as the queries are active [25].

This is in contrast to traditional queries on databases where the queries are executed until the requested data is delivered.
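The difference between a CQ and a one-shot database query can be sketched with a generator that evaluates a predicate over each item as it arrives, yielding results for as long as the stream is active. The payload-length predicate below is a made-up example, not a query from the actual analytics platform:

```python
def continuous_query(stream, predicate):
    """Minimal sketch of a continuous query: rather than storing the
    stream and querying it afterwards, the filter runs over each item
    as it arrives and yields matches until the stream ends or the
    consumer stops iterating."""
    for item in stream:
        if predicate(item):
            yield item

# Hypothetical usage: flag payload lengths outside the 40-1500 byte
# range that Profinet frames normally use (the packet dicts are made up).
packets = [{"len": 60}, {"len": 20}, {"len": 1400}, {"len": 2000}]
alerts = list(continuous_query(iter(packets),
                               lambda p: not 40 <= p["len"] <= 1500))
```

Because the generator is lazy, memory use stays constant regardless of how long the stream runs, which is the property that makes CQs attractive on a resource-constrained device.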


2.3 Feature Engineering

Anomaly detection methods based on a machine learning model use measurable information as input to decide whether the incoming data follows a regular pattern or not.

How to represent the input, also known as features, to a machine learning algorithm is an important element of a machine learning project [26]. Often, the collected raw data cannot be used directly as input; instead, derived features need to be constructed from the data source. Feature engineering is the task of choosing the features to be used by the machine learning algorithm, where the choice is often based on domain knowledge from experts within the specific field.

2.3.1 Features

Tran [41] proposes two different categories of network-related features with their respective subcategories and examples:

• Packet traffic features:

Packet traffic-related features inspect individual packets and extract useful information from their header and content. The author divides the packet traffic features into four subcategories.

– Basic packet features: The simplest category is basic packet features, where basic header fields such as source port, destination port, header length, and various status flags are some examples.

– Content-based packet features: Content-based packet features are derived from the actual content of the packets.

– Time-based packet features: Time-based packet features count occurrences of a certain variable during a time period. One example of a time-based packet feature is the number of frames from a destination to the same source in the last t seconds.

– Connection-based packet features: The last subcategory, connection-based packet features, identifies characteristics of the relation between sender and receiver. The number of frames to a unique destination in the last n packets from the same source is one example of a connection-based packet feature.

• Network flow traffic features:

The other category is network flow traffic-related features. The main difference between packet traffic features and network flow traffic features is that the latter inspects the flow, i.e. a sequence of packets, between source and destination. Analyzing sequences of packets is useful because it can reveal patterns that otherwise might be hidden at the individual packet level.


– Basic features: An example of a basic traffic flow feature is the length of the flow (in seconds).

– Time-window features: The author mentions the number of flows to unique destination IP addresses inside the network in the last t seconds from the same source as an example of a time-window feature.

– Connection-based features: The number of flows to a unique destination IP address in the last n flows from the same source is an example of a connection-based traffic flow feature.

The idea discussed by Tran [41] is applied to TCP and IP traffic; however, the idea remains the same for Profinet traffic. In summary, features can be constructed in various domains, such as the time domain, connection domain, and frequency domain. Which features to select depends on the detection goal and on what the abnormal behavior might look like.
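To make the categories concrete, the two example features quoted above can be sketched in Python. The tuple layout of a packet record used here is an assumption for illustration, not part of the thesis implementation.

```python
def frames_in_last_t_seconds(packets, src, t, now):
    """Time-based packet feature: number of frames from `src`
    observed within the last t seconds.
    `packets` is a list of (timestamp, src, dst) tuples."""
    return sum(1 for ts, s, _ in packets if s == src and now - ts <= t)

def unique_destinations_last_n(packets, src, n):
    """Connection-based feature: number of distinct destinations
    among the last n packets sent by `src`."""
    last_n = [dst for _, s, dst in packets if s == src][-n:]
    return len(set(last_n))

pkts = [(0.0, "A", "X"), (0.5, "A", "Y"), (1.0, "B", "X"), (1.5, "A", "X")]
n_frames = frames_in_last_t_seconds(pkts, "A", t=1.0, now=1.5)  # -> 2
n_dsts = unique_destinations_last_n(pkts, "A", n=3)             # -> 2
```

The same windowed-counting pattern carries over directly to Profinet frames, with MAC addresses taking the role of IP addresses.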

2.3.2 Selection of features

The selection of features to be used in this project is based on domain knowledge and intuition about which measurements show a clear distinction between normal and abnormal data. The constructed features build on the assumption stated in Section 2.1.3 that the payload varies more during abnormal operation. Since the goal is to use deep packet inspection and detect abnormalities in the content of the packets, content-based packet features are used. The features also have to take connections into consideration, because in an industrial environment the data sent between an IO-Controller and different IO-Devices might not be the same.

The IO-Controller can have separate data transmissions to different devices, resulting in diverse data depending on the source and destination connection. Therefore, the machine learning model needs inputs that separate the connections. Another aspect to take into account is that abnormal behavior may not be detectable by a simple analysis of each individual packet; detecting variations in data might require examining several packets over a window or sequence of packets. Schuster et al. [16] construct feature vectors based on sequences of multiple packets, and a similar approach is used in this work. The selected features are:

• Standard deviation of the payloads from source to destination for the n last packets

• Number of distinct payloads from source to destination for the n last packets
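A minimal sketch of how these two features could be computed over a window of payloads. The real implementation runs in OSQL on the edge device; this Python version only illustrates the computation.

```python
from statistics import pstdev

def window_features(payloads):
    """Compute the two selected features over the n last packets of
    one source->destination connection. `payloads` is a list of
    bytes objects (the cyclic I/O data of each packet)."""
    # Normalize each byte to [0, 1) and take the standard deviation
    # over all bytes of all payloads in the window.
    values = [b / 256 for p in payloads for b in p]
    std = pstdev(values) if len(values) > 1 else 0.0
    # Number of distinct payloads in the window.
    distinct = len(set(payloads))
    return std, distinct

window = [bytes([10, 20]), bytes([10, 20]), bytes([10, 30])]
std, distinct = window_features(window)  # distinct -> 2
```

During normal operation both values stay small and stable; an injection attack that varies the payload should drive both upward.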

2.4 Anomaly Detection

The goal of anomaly detection, also referred to as outlier detection, is to identify patterns in data that differ from the expected behavior [35]. The expected behavior


depends on the underlying distribution of the data. Anomalies are those behaviors or objects that are not considered normal. Anomaly detection has been studied by the statistics community since the early 19th century and finds use in several applications, among them intrusion detection. In intrusion detection systems and network security, the aim is to find known or unknown anomalies that indicate an attack or a virus. Garcia-Teodoro et al. [8] divide anomaly-based network intrusion detection systems into three categories:

• Statistical-based

• Knowledge-based

• Machine learning-based

Statistical-based methods fit a statistical model representing the stochastic behavior of the system and assume that normal data belongs to the high-probability regions of the model, whereas anomalies lie in the low-probability regions. Incoming data is compared with the trained model to estimate whether it is an anomaly or not. The anomaly decision is often based on an anomaly score with a predefined threshold: if the score exceeds the threshold, the system flags the incoming data as an anomaly [8]. Knowledge-based intrusion detection systems are built by human experts and are often defined by rules describing normal system behavior. The main advantage of knowledge-based anomaly detection systems is the ability to relate the acquired information to the knowledge encoded in the model; another advantage is that the number of false alarms is often low. The third approach listed above is machine learning-based methods, which are the focus of this thesis.
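The statistical thresholding idea can be illustrated with a minimal Gaussian sketch, where the anomaly score is the z-score of an observation. The threshold value 3.0 is an arbitrary choice for illustration.

```python
from statistics import mean, pstdev

def fit_normal_model(samples):
    """Fit a simple Gaussian model of normal behavior."""
    return mean(samples), pstdev(samples)

def is_anomaly(x, mu, sigma, threshold=3.0):
    """Flag x as anomalous when its anomaly score (z-score)
    exceeds the predefined threshold."""
    return abs(x - mu) / sigma > threshold

mu, sigma = fit_normal_model([10, 11, 9, 10, 10, 12, 9])
normal = is_anomaly(10.5, mu, sigma)   # False: near the mean
attack = is_anomaly(100.0, mu, sigma)  # True: far from the normal region
```

Knowledge-based and machine learning-based systems replace the fitted distribution with hand-written rules and a learned model, respectively, but the score-versus-threshold decision step is common to many detectors.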

Machine learning-based anomaly detection uses machine learning algorithms to classify data as either normal or abnormal. A feature vector can be described as X(t) ∈ R^d at time t ∈ [0, T]. Consider two states w_q, q ∈ {0, 1}, that define normal or abnormal operation, where w_0 = 0 indicates normal data and w_1 = 1 stands for anomalous data.

In machine learning, a mapping between X and w_q is learned from historic measurements. Consider a data set D of m measurements, where each measurement is an observation of X. With x_i(t) denoting the i-th observation, also called a training sample in machine learning, D can be written as D = {x_i(t)}_{i=1}^{m}. Furthermore, the label set can be written as D_l = {y_i}_{i=1}^{k}, with k measurements, where each y_i is an individual sample of w_q, called a label, with q set to either 0 or 1. A pair (x_i(t), y_i) is a labeled measurement. For example, (x_i(t), 0) is a sample from normal operation and (x_i(t), 1) an abnormal sample. There are therefore three types of data sets of measurements.

• Normal data: D_n = {(x_i(t), 0)}_{i=1}^{k-u}

• Undefined data: D = {x_j(t)}_{j=1}^{m}



• Anomalous data: D_l = {(x_r(t), 1)}_{r=1}^{u}

The equations are based on work by Thottan et al. [34]. In this project, D is the raw measurement data of Profinet packets. D_n contains the measurements made while the system is running in normal mode, that is, when no attacks occur. D_l corresponds to observations made while an attack, i.e. abnormal behavior, is happening. Anomaly detection learns a mapping from a training set of measurements to the operation state w_q. This mapping can then be used to classify new incoming events as either normal or anomalous [34].

A training set contains combinations of D_n, D, and D_l. When D_l is included, the learning is said to be supervised, since labels are included in the training set. Although supervised learning methods can provide higher accuracy than unsupervised learning, there are some drawbacks: labels are often very difficult and time-consuming to obtain in practice.

The knowledge of how the network and its packets behave is often too limited to assign proper labels. These concerns apply to network behavior in general, and to the packet level in particular. Another disadvantage is that labeling all possible attacks is not feasible, particularly for new attacks that have never been seen before [36]. When only D is included in the training set, the learning is unsupervised. In unsupervised learning, the goal is to detect anomalies by only looking at the properties of, and relations between, the elements in the data set. No labels are required for either normal or abnormal data. Unsupervised anomaly detection is thus a viable approach because it does not need predefined labels to construct a model, and for these reasons this work uses unsupervised learning.

2.5 Machine learning algorithm for anomaly detection

This section describes the anomaly detection algorithm used in this work. Clustering is a common method for anomaly detection. Clustering is an unsupervised classification method used to separate data into groups (clusters), where each cluster has similar characteristics [28]. In clustering, the data to be grouped is not labeled beforehand, as is the case for supervised learning methods. Instead, the method tries to group a collection of unlabeled patterns into meaningful categories obtained from the data itself [28]. The selection of suitable features for the clustering method is important in order to recognize patterns among the different clusters [29]. Clustering methods can be divided into four categories [30]:

• Partitioning methods

• Density-based methods

• Hierarchical methods

• Grid-based methods


Partitioning methods divide the data into k partitions, where each partition is a cluster. A well-known partitioning-based method is the k-means algorithm, which uses the mean to decide which cluster an observation belongs to. Density-based algorithms divide data into groups based on density: points that lie close together form high-density regions and are grouped into the same cluster, while points in low-density regions are considered outliers. Hierarchical methods group objects into hierarchical structures, while grid-based methods divide objects into grid structures. Chen and Tu [31] describe density-based methods as natural and attractive for data streams, as they can find arbitrarily shaped clusters and need to examine the data only once. They also handle noise well and do not require a prior specification of the number of clusters, unlike the partitioning-based k-means algorithm, where the number of clusters k has to be defined in advance. Examples of density-based methods are DBSCAN, OPTICS, and DenStream. DenStream is based on DBSCAN and has additional features that enable the algorithm to be used with evolving data streams [40]. The algorithm studied in more depth in this work is DBSCAN. DBSCAN is selected over other clustering methods because it can find arbitrarily shaped clusters and consequently there is no need to define the number of clusters beforehand. Clustering is used in this project instead of another unsupervised machine learning method because of its ability to group observations into several groups. As described in Section 2.3.2, the features take connections into account, which means that the model might divide observations into groups related to the data sent over each connection. Other methods besides clustering, namely One-Class Support Vector Machine and Local Outlier Factor, were also intended to be tested in this thesis, but this was not possible due to time limits.

2.5.1 DBSCAN

A well-known density-based clustering algorithm is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). DBSCAN takes two user-defined input parameters: the neighborhood distance Eps and the minimum number of points MinPts. Given a set of points to be clustered, each point can be classified as either a core point, a reachable point, or an outlier point. A core point is a point with at least MinPts neighbors (including the point itself) within the radius Eps, where the distance can be measured with an arbitrary metric such as the Euclidean distance. Each neighbor of the core point within the Eps radius is called a directly density-reachable point and belongs to the same cluster as the core point. These neighbors can in turn be core points; in that case, the points in their neighborhoods are also included in the same cluster, where each such point is a density-reachable point. Non-core points that are density-reachable are called border points. All other points, which are not density-reachable from any other point, are called outliers or noise points and are not included in any cluster [32].



Figure 2.3: DBSCAN. Picture taken from [32].

Figure 2.3 shows an illustration of the DBSCAN model. The MinPts parameter is set to 4 and Eps is visualized by the circles. Point A and all the other red points are core points, since each has at least four points within its neighborhood. The yellow points B and C are not core points, but since they are density-reachable from point A they still belong to the same cluster and are defined as border points. Point N is not reachable from any other point and is thus considered an outlier. The task of the DBSCAN algorithm is to compute clusters and find outliers, or anomalies, as illustrated in the model. Figure 2.4 shows the pseudocode for DBSCAN.
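As a complement to the pseudocode, the procedure can be sketched as a minimal (unoptimized, O(n^2) neighbor search) Python implementation that labels every point with a cluster id, or -1 for noise:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns one label per point;
    -1 marks outliers (noise), 0, 1, ... mark clusters."""
    labels = [None] * len(points)  # None means unvisited
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:       # not a core point
            labels[i] = -1            # noise (may become a border point later)
            continue
        labels[i] = cluster           # start a new cluster at core point i
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:       # density-reachable border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors(j)) >= min_pts:  # j is also core: keep expanding
                seeds.extend(neighbors(j))
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = dbscan(pts, eps=1.5, min_pts=3)
# the first four points form one cluster; (10, 10) is labeled -1
```

In the anomaly detection setting, the points labeled -1 are exactly the observations reported as anomalies.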


Figure 2.4: Pseudocode of DBSCAN. Picture taken from [32].


Chapter 3: Methodology

This chapter describes the methodology behind the implementation of the project. It presents the various software and hardware tools that are used and the architecture of how these tools fit together. It also describes how self-generated network data is processed to create the selected features for the machine learning model, and how the model is implemented and validated. In short, network data is generated according to a simulated use case. The network traffic is sniffed and collected on an edge device where a machine learning platform is installed. The final solution consists of a machine learning algorithm that runs on the device and classifies the incoming data as either normal or abnormal.

3.1 Tools

3.1.1 Hardware

Edge device One of the goals of the project is to run the analysis on a resource-constrained edge device. The edge device used is a development board from HMS called Beck DK151 (DB150). The board has an embedded controller called SC145 with 128 megabytes (MB) of working memory and 64 MB of flash memory, and it comes with an ARM Cortex-A7 processor. The device has a built-in Linux-based operating system called RTOS-LNX. The machine learning program runs on this device.

Profinet IO Controller A Siemens S7-1200 PLC is used as the Profinet IO Controller. The PLC had to be configured, using the Windows program TIA Portal, to exchange cyclic data with the desired IO device before system startup. The PLC can provide output data to IO devices and also consume data from IO devices. For this project, it is configured to consume incoming data from an IO device (Anybus X-gateway). It can handle communication at 10/100 Mbps.



Profinet IO Device An Anybus X-gateway from HMS Networks with Modbus TCP to Profinet IRT translation is used as the Profinet IO Device. The device permits sending cyclic I/O data between Modbus TCP networks and Profinet: the gateway translates Modbus data into Profinet cyclic I/O data, so Profinet frames can be sent from the X-gateway to the PLC. The reason a Modbus TCP gateway is included in the project is that it makes it possible to generate Profinet data in a very flexible way, as described further below.

Network sniffer A network tap is used to sniff network traffic for monitoring. The tap sits on the connection between the IO Controller and the IO Device and forwards the traffic to the edge device. It should be clarified that the edge device is not a Profinet IO Device; it only collects the traffic sniffed by the network tap.

Visual Analyzer A personal computer (PC) is used for data visualization and for deploying queries over the data streams.

3.1.2 Software

Packet generator The initial idea was to program the PLC to send Profinet data to an IO device. However, after discussions with employees at HMS, it was decided to change the architecture due to the complexity of generating the desired Profinet data. Instead, a packet generator was written in Python to send the desired cyclic data from the Anybus X-gateway to the PLC. The Modbus/TCP client library pyModbusTCP is used in the Python program to write data over Modbus TCP to the gateway. The data is then transferred by the gateway over Profinet to the PLC. This architecture makes it very flexible to try different structures of the data sent over the Profinet network, and changes can be made very quickly. The operation of the Python program is described in Section 3.3.

Packet collector The cyclic Profinet traffic is sniffed and collected on the edge device.

The software application that reads the incoming traffic on the Ethernet port of the device is written in the programming language C. The board includes an application programming interface (API) called Packet API that provides functions for receiving Ethernet packets. The application reads all incoming traffic and filters out cyclic Profinet I/O packets.

Stream engine The analysis is made with the help of a stream processing and analysis system. The platform is called sa.engine and is provided by a company named Stream Analyze. The platform supports online analysis of data streams, including deployment of statistical and machine learning models to resource-constrained edge devices. The largest configuration of the software requires only 7 MB of storage and was installed on the board without issues. The platform permits the creation


of data streams in the programming language C. Since C is used for the data collection, the collection and the creation of the stream are merged into the same program.

The stream consists of arrays, where each array contains the data sources intended for analysis. Once the initialization of the streams is completed, they are used by the analysis tool in the platform to run continuous queries over the streams. For the analyst, the platform provides a Visual Analyzer running on a PC. The Visual Analyzer consists of a graphical user interface where queries can be written and deployed to edge devices. The queries are written in a language similar to SQL called OSQL. A CQ analyzes the data streams and the result is sent back to the Visual Analyzer, where it can be visualized either as a text result or as appropriate graphical plots. The communication between the Visual Analyzer and the edge device occurs over TCP. sa.engine also has support for developing machine learning models in the Visual Analyzer. After training, a model can be deployed on the edge device where the online analytics runs.

3.2 Architecture

The overall architecture of the project is described in Figure 3.1. The Python script writes data over Modbus TCP to the Anybus X-gateway. The gateway then translates the data into Profinet and sends the Profinet frames via the network tap to the PLC. The PLC acts as a Profinet IO Controller, which receives input from the IO Device (Anybus X-gateway). In a real-world setting, this input could be a sensor value acting as an input to a control loop in the PLC, or a simple monitoring measurement. The meaning of the Profinet traffic is not relevant for this project, and neither is the direction of the traffic, since the goal is to find anomalies in the packets. While the traffic is generated between the IO Device and the IO Controller, the network tap sniffs the packets and sends them to the edge device, where the anomaly detection algorithm runs. Data visualization occurs on the laptop via sa.engine's Visual Analyzer.

3.3 Generation of network data

The aim of this project is to detect anomalies in Profinet packets. An already running system setup generating real data is not available, and neither is actual data generated from a real attack. Therefore, a simulated use case needs to be constructed, where both normal and abnormal data are generated. This simulated case, implemented in Python, strives to be as realistic as possible. It should also be generic, meaning that the detection method should work for the general case and not be tailored to one specific attack scenario. As stated in Section 2.1.3, normal data is more static than abnormal data, which is taken into consideration in the script. The script therefore sticks to the idea that a PLC normally operates in strict patterns with small changes in inputs and outputs.


Figure 3.1: Overall architecture of hardware and data transmission.

Taking an industrial robot as an example, the movement of the robot is fixed; the same holds for a conveyor belt. The idea and structure of the generated data were defined in conjunction with experts from HMS. Although the use case is supposed to be generic, it should be grounded in real-world logic. In an industrial setting that uses Profinet, a PLC sends all information needed in the cyclic I/O payload. Taking the example of an electric motor drive, information such as speed and direction is embedded into the payload block.

The case is therefore modeled on a motor drive scenario where speed and direction are randomized and embedded into the Profinet packets. The script generates normal data and abnormal data separately. The generation of data for normal operation follows this procedure:

1. Randomize time t (0-5 seconds)

2. Randomize speed s (0-30)

3. Randomize direction d (0 or 1)

4. Write speed and direction into registers in Anybus-X Gateway for t seconds.

5. Profinet cyclic I/O data is sent to the PLC.

6. Repeat steps 1 to 5 until program is stopped by the user.


For abnormal data generation, the procedure looks the same. The only difference is the speed range s, which is randomized between 0 and 1000. The abnormal data will have a larger variance.
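The randomization procedure can be sketched as follows. The actual script additionally writes each sample to the X-gateway's holding registers via pyModbusTCP; the register layout and client setup are configuration-specific and only indicated in a comment here.

```python
import random

def drive_sample(abnormal=False):
    """Randomize one (time, speed, direction) tuple following the
    procedure above. Abnormal operation only widens the speed range,
    which gives the abnormal data its larger variance."""
    t = random.uniform(0, 5)                         # hold time in seconds
    s = random.randint(0, 1000 if abnormal else 30)  # speed
    d = random.randint(0, 1)                         # direction
    return t, s, d

samples = [drive_sample() for _ in range(100)]       # normal operation
# Each sample would then be held for t seconds while being written to
# the gateway, e.g. client.write_multiple_registers(speed_reg, [s, d]),
# and the gateway forwards the values as Profinet cyclic I/O data.
```

Because only the speed range differs, the detector cannot rely on any single magic value; it has to pick up the increased variance of the payload.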

3.4 Data processing

The collected data needs to be processed to extract useful information from the raw traffic sniffed from the network. The raw network traffic needs to be filtered and structured to create proper features for the detection method. The steps involved in the data processing stage are filtering, cleaning, normalization, and feature extraction.

3.4.1 Data collection

Since no external data is available beforehand, all data is collected directly from the traffic in the network. After the production of the Profinet traffic, as described in Section 3.3, the network data is ready to be sniffed and captured. Raw Profinet packets are collected directly on the edge device as described in Section 3.1.

3.4.2 Pre-processing

Useful information has to be derived from the raw network traffic to detect abnormal data. The network traffic is filtered to consider only Profinet frames, which in turn are filtered down to cyclic I/O data. Therefore, only frames with a frame ID between 0x8000 and 0xBBFF are used. This filtering is done in the C application running on the edge device. To ensure that there are no duplicates or missing values, Wireshark is used to record a sample of the traffic on the PC. The recorded traffic file is then compared with the output on the device to verify that the traffic on the edge device is identical. This comparison revealed that no packets were missing and no duplicates were found.
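The cyclic I/O filter can be sketched in a few lines (shown here in Python rather than the C of the real application). The frame-ID extraction assumes an untagged Ethernet frame, where the Profinet EtherType 0x8892 is followed directly by the two-byte frame ID; VLAN-tagged frames would shift these offsets.

```python
def profinet_frame_id(eth_frame):
    """Return the frame ID of a raw Ethernet frame, or None if the
    frame is not Profinet (EtherType 0x8892 at offset 12)."""
    if int.from_bytes(eth_frame[12:14], "big") != 0x8892:
        return None
    return int.from_bytes(eth_frame[14:16], "big")  # big-endian frame ID

def is_cyclic_io(frame_id):
    """Cyclic I/O data uses frame IDs in the range 0x8000-0xBBFF."""
    return frame_id is not None and 0x8000 <= frame_id <= 0xBBFF

# A synthetic 16-byte frame: 12 header bytes, EtherType, frame ID 0x8001.
frame = bytes(12) + (0x8892).to_bytes(2, "big") + (0x8001).to_bytes(2, "big")
keep = is_cyclic_io(profinet_frame_id(frame))  # True: cyclic I/O frame
```

Frames outside the range, such as alarm frames, are simply dropped before the stream is built.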

When the filtering and cleaning stage is completed, the next step is to generate data streams containing the data sources. The data streams are created with sa.engine in the same application running on the device. Each element in a data stream is represented as an array containing all the data sources intended for the stream. The collected frames include useful information, as indicated in Figure 2.2, from which data sources can be fetched. The following data sources, all based on each individual packet, are used:

• Timestamp

• Packet size

• MAC source address


• MAC destination address

• IO Data size

• Frame ID

• Cycle counter

• IO Data

Each item in the stream is represented by these packet characteristics. When the data stream is ready, it is queried from the Visual Analyzer and the elements in the stream are saved to a .json file on the PC. This procedure is carried out separately for normal and abnormal operation, resulting in two data sets: one containing only normal data and one containing only abnormal traffic.

The file with normal data consists of 78349 rows (packets) and 8 columns (data sources), whereas the file representing abnormal data contains 46376 rows. Having the data stored in a file provides greater flexibility to experiment with the data.

3.4.3 Feature extraction

The collected data sources need to be converted into useful features that can be used in the machine learning model. The individual data sources in the stream cannot be used directly as input to the machine learning algorithm; instead, a data transformation is required. Thus, a data set containing the features to be used by the machine learning algorithm is constructed. The feature engineering process is carried out offline, on the data sources stored in the .json files. The feature construction is implemented in a function called feature_extraction(windows) using the OSQL language. The function takes a stream of windows of a specific size and stride. The windows are created by the built-in function winagg(s, sz, str) in sa.engine, which forms a stream of windows of size sz and stride str over a stream s. The stream s is created by calling the function read_stream(file), which creates a stream containing the elements of a .json file. The stream contains the data sources listed in the previous subsection. The selected features described in Section 2.3.2 are thereby constructed over windows of the data sets. An explanation of how the selected features are extracted is provided below.
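The windowing behavior of winagg(s, sz, str) can be illustrated with a Python analogue; this is a sketch of the semantics described above, not sa.engine code.

```python
def winagg(stream, size, stride):
    """Form a stream of windows of `size` elements over `stream`,
    moving `stride` elements between consecutive windows."""
    window = []
    for element in stream:
        window.append(element)
        if len(window) == size:
            yield list(window)
            window = window[stride:]  # keep the overlap for the next window

windows = list(winagg(iter(range(6)), size=3, stride=2))
# -> [[0, 1, 2], [2, 3, 4]]
```

With a stride smaller than the window size, consecutive windows overlap, so every packet contributes to several feature vectors.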

Standard deviation of the payloads from source to destination for the n last packets

This feature is created by taking the standard deviation of the bytes of all payloads in a window of size n. The bytes in the payload are first divided by 256 (the number of possible byte values) to get a normalized value between 0 and 1.
