Generating Datasets Through the Introduction of an Attack Agent in a SCADA Testbed : A methodology of creating datasets for intrusion detection research in a SCADA system using IEC-60870-5-104

Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Datateknik

2021 | LIU-IDA/LITH-EX-A--2021/013--SE

How a SCADA test environment using the IEC-60870-5-104 protocol can, while under attack, produce data for use in network-based intrusion detection systems

August Fundin

Supervisor: Chih-Yuan Lin

Examiner: Simin Nadjm-Tehrani

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

Abstract

In December 2015 a power outage was caused by a hacking attack in Ukraine. This further highlighted the ongoing increase of attacks on critical infrastructure and the vulnerabilities of the aging industrial control systems governing it. Supervisory Control and Data Acquisition (SCADA) is an example of such a system. Studying the intrusion of adversaries and anomalies in SCADA systems is no easy feat. Administrators of SCADA systems rarely share data as they risk getting their weaknesses detected. Hence, datasets containing this data need to be acquired through other means.

In this study, a SCADA testbed simulating a real-world counterpart was used to create datasets for intrusion detection. As the testbed had no previously documented attacks, this study also investigated how the testbed reacted to generated attacks. This study focused on attacks on the communication protocol IEC-60870-5-104. The chosen approach to obtain datasets was to construct a so-called attack-bot, generating attacks during scenarios where network traffic was recorded. After a scenario, a user has access to labeled network traffic, ready to be used when training intrusion detection systems.

This kind of data is traditionally challenging to create. There are few publicly available high-quality testbeds, and generating data without a testbed comes with a whole set of difficulties. The results illustrate how this study’s approach can generate high-quality data with rather little effort.

Acknowledgments

I would like to thank Chih-Yuan Lin and Simin Nadjm-Tehrani, my supervisor and my examiner, for the guidance as well as the valuable feedback and discussions throughout my work.

I would like to follow that up with a hearty thanks to Erik Westring, Peter Andersson and Tommy Gustafsson at FOI for aiding me with RICS-el.

And finally, thanks to all of you who gave me much needed encouragement when I needed it!

Contents

Abbreviations

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research Questions
1.4 Delimitations
1.5 Thesis Outline

2 Background
2.1 SCADA
2.2 IEC-60870-5-104
2.3 SCADA Vulnerabilities
2.4 SCADA Exploits
2.5 RICS-el

3 Related Work
3.1 Dataset Generation
3.2 Attack Types and Attack Evaluation

4 Methodology of Dataset Generation
4.1 Attack-Bot Implementation
4.2 Experiment Setup
4.3 Dataset Generation Workflow

5 Attack Generation in RICS-el
5.1 Attack Model
5.2 Attack Scenario Implementation

6 Method of Evaluation
6.1 Dataset Requirements
6.2 Evaluation of Datasets Requirements

7 Results and Evaluations
7.1 Impact of Attacks
7.2 Created Datasets
7.3 Review of Requirements

8 Discussion
8.1 Results
8.2 Method
8.3 Sources
8.4 The Work in a Wider Context

9 Conclusion
9.1 Dataset Creation in RICS-el
9.2 Attack Generation in RICS-el
9.3 Future Work

A Appendix Attack-Bot Configurations
A.1 List of Flags
A.2 Configfile Options

B Appendix Attack-Code Template

List of Figures

2.1 An overview of SCADA
2.2 APDU with fixed and variable length
2.3 APCI control field formats
2.4 Information contained in an ASDU
2.5 Overview of RICS-el
2.6 Interactions between bots and SCADA in RICS-el
4.1 Network configuration in the experiment setup
4.2 The attack-bot in RICS-el’s dataflow
4.3 The dataset generation workflow
4.4 Running scheduled attack scenarios

List of Tables

2.1 Common ASDU functions in RICS-el
5.1 Overview of implemented attack-scenarios
5.2 IP addresses of IEC-104 devices
6.1 Attack success criteria
7.1 Result of the scanning attack
7.2 Results of the DoS attacks
7.3 Results of the sequence attack
7.4 Results of the MitM attacks
7.5 Result of the replay attack
7.6 Results of the injection attacks
7.7 Recorded datasets
7.8 Operator actions in each scenario

Abbreviations

APCI Application Protocol Control Information.

APDU Application Protocol Data Unit.

ARP Address Resolution Protocol.

ASDU Application Service Data Unit.

CoT Cause of Transmission.

CSV Comma Separated Values.

DMZ Demilitarized Zone.

DoS Denial-of-Service.

FOI Swedish Defence Research Agency.

HMI Human-Machine-Interface.

ICMP Internet Control Message Protocol.

ID Identity.

IDS Intrusion Detection System.

IE Information Element.

IEC International Electrotechnical Commission.

IEC-104 IEC-60870-5-104.

IO Information Object.

IOA Information Object Address.

IP Internet Protocol.

IT Information Technology.

ITF Invalid Time Flag.

LAN Local Area Network.

MitM Man-in-the-Middle.

NSTB National SCADA Test Bed.

NTP Network Time Protocol.

ORG Originator Address.

OT Operation Technology.

pcap Packet Capture.

PLC Programmable Logic Controllers.

RICS Resilient Information and Control Systems.

RTT Round-Trip Time.

RTU Remote Terminal Units.

S3 SUTD Security Showdown.

SCADA Supervisory Control and Data Acquisition.

SQ Structure Qualifier.

SSH Secure Shell.

STARTDT Start Data Transfer.

STOPDT Stop Data Transfer.

SUTD Singapore University of Technology and Design.

TCP Transmission Control Protocol.

TCP/IP Internet protocol suite.

TESTFR Test Frame.

TTL Time To Live.

VM Virtual Machine.

VPN Virtual Private Network.

1 Introduction

SCADA is a control system that encompasses both devices interfacing with physical machinery and computers of geographically distributed critical infrastructure, such as power grids. Organizations managing power grids need SCADA systems to control and monitor safe and reliable operations [10].

SCADA systems and their protocols were previously used in isolated networks with proprietary solutions. However, this has changed over the last decades. Components are now standardized instead of specialized, to improve maintainability. Instead of proprietary software, more publicly known software is used to ease the integration of systems. Connections between SCADA networks and the organization’s corporate networks have been added. These changes have made SCADA systems easier to operate, but they have also made them more vulnerable. The connections to the corporate network open new paths for intruders to penetrate the system. Devices and protocols that then become exposed often have known vulnerabilities [10].

Cyberattacks targeting SCADA systems are undeniably happening in today’s society. The power grid cyberattack in Ukraine, December 2015, is believed to be the first example of a power outage deliberately caused by a hacking attack [25]. Since then, there has been an increase in reports of attacks on SCADA systems with malicious intent [41, 29]. Security researchers need to find ways of detecting anomalies and intrusions in SCADA systems. However, research on SCADA systems used in production risks disruptions since such a system needs to be in constant operation [19]. Another issue is that intrusions in SCADA systems, such as the cyberattack in Ukraine, are unique events. This complicates the reproduction of data necessary for studies on intrusion detection. Therefore, datasets with recorded or generated traffic with the characteristics of the examined SCADA system need to be used instead to enable the development, evaluation and comparison of different defense mechanisms.

An important method of defense is to introduce a Network Intrusion Detection System (NIDS) in SCADA networks [51]. A NIDS is an automated process that monitors events within the traffic of a system, analyzing data in search of signs of adverse events [51]. The development of a NIDS requires realistic datasets to configure and attune the NIDS, so that it detects intrusions effectively. The datasets need to contain normal traffic and labeled attack traffic so that researchers can distinguish attacks from normal operation [49]. Introducing anomalies or attacks to a SCADA system in production to acquire such data is out of the question. One method to generate datasets for NIDS development is to record data in testbeds that imitate the behavior of SCADA systems in production.

Unfortunately, reliable and publicly available datasets have traditionally been few. Those available have been criticized for being out of date, lacking proper labeling and containing defects not present in real-world applications [28, 30].

This study presents a method to generate datasets in a virtual testbed [5] with an emulated power grid of about 20 substations. The testbed runs similarly to the real-world application, allowing the creation of reliable and adaptable datasets to study. In this study, a program is built to enable attack generation in the virtual SCADA testbed. The creation of the attack-generating program contributes to the research field surrounding the security of SCADA systems in two main ways. Firstly, it offers a methodology to create relevant datasets generated in the virtualized testbed. Secondly, it investigates the testbed’s reactions to different attacks and the support and limitations of attack generation within it.

1.1 Motivation

Cyberattacks against power grid control systems constitute a threat to the availability and reliability of power. Since it is infeasible to run tests on such systems in production there is a need for testbeds that correlate closely with the real system. A testbed needs to allow repeatable tests and simulations for the evaluation of different methods. The testbed used in this study is called RICS-el [5] and is created by the Swedish research center Resilient Information and Control Systems (RICS). In RICS-el long data streams can be collected during simulations, generating datasets for research. With the introduction of simulated agents involved in the power grid, such as operators and attackers, generating relevant datasets would be made easier.

It is of relevance to conduct a study on dataset generation in a virtualized testbed rather than a physical testbed. If a virtual testbed can be deemed as believable and effective as a physical testbed there is a lot to gain. One advantage of a virtual testbed is that it is more cost-effective and easier to maintain. Another advantage is that virtual testbeds allow for easier system recovery. The state of the virtual system can be saved and loaded rapidly making experiments easier to replicate.

In early 2020, when this project started, there was no reliable way to generate datasets containing attacks in RICS-el. Through the creation of a program that generates deterministic attacks in RICS-el, researchers would be able to collect data in repeatable experiments in a reliable way, streamlining the process of finding new and better methods to detect intrusions in SCADA systems.

Also, attacks and their impacts on RICS-el’s SCADA system have not been previously documented. Since RICS-el is a virtual environment it might not react in the same way to certain input or situations as a real power grid would. If RICS-el behaves too differently from its real-world counterparts then the generated datasets are not as useful. Therefore, the impact of generated attacks needs to be documented.

1.2 Aim

The purpose of this study is to provide a systematic method to generate datasets for intrusion detection research in SCADA systems. This is to be achieved by creating a bot, an attack-bot, generating attacks within the virtualized testbed RICS-el. Ideally, the attack-bot in conjunction with the virtual nature of RICS-el would create a flexible environment for researchers and operators in training alike.

RICS-el’s implementation differs from a real power grid in several ways, such as the amount of bandwidth, how data is processed in the SCADA system and the fact that its devices are virtual rather than authentic. To generate datasets containing relevant attacks, there is a need to evaluate which attacks RICS-el responds to believably. This study aims to document attack impacts to increase the understanding of this topic.

1.3 Research Questions

Given the demand for a reliable approach to dataset generation, the following research questions are answered in this study.

1. How can datasets, including realistic network traffic and labeled attacks, for intrusion detection research in SCADA systems be realized in RICS-el?

2. Do attacks generated in this study have an appropriate effect on RICS-el? What impact do the attacks have on RICS-el?

1.4 Delimitations

In this study, an attack-bot is defined as a program that, after startup, launches attacks without the need of any user input.

The attack generation is carried out in the RICS-el testbed and all attacks are restricted to the IEC-60870-5-104 (IEC-104) protocol.

Because of time constraints put on this study, the attack-bot need not generate more than five different types of attacks. The focus is on attack impact assessment and network intrusions. Hence, process-aware Intrusion Detection Systems (IDSs) or datasets containing process data are not part of this study.

1.5 Thesis Outline

In chapter 2, the relevant theory and terminology surrounding SCADA and IEC-104 are covered together with their respective vulnerabilities. The testbed RICS-el is also presented more thoroughly. Chapter 3 covers how others have chosen to generate datasets and successful attacks on SCADA systems. The approach taken to generate the datasets is described in chapter 4. Chapter 5 defines an attack model and the attacks. Then, chapter 6 covers how the datasets and attacks were evaluated. The evaluation of attacks and datasets is given in chapter 7, which also lists the resulting datasets. Chapter 8 discusses the results, reviews the method and reflects on the work in a wider context. Finally, chapter 9 describes to what extent the aim has been achieved and presents the answers to the research questions.

2 Background

This chapter covers the theoretical background and terminologies necessary to understand this study. Section 2.1 describes what SCADA is and how communication works within SCADA systems. A common communication protocol in SCADA is IEC-104 and the relevant aspects of that protocol are explained in section 2.2. Specific vulnerabilities of SCADA systems and IEC-104 are described in section 2.3 and section 2.4 lists examples of different attacks on SCADA systems. The SCADA testbed using IEC-104, which is being attacked in this study, is described in section 2.5.

2.1 SCADA

This section presents a typical SCADA architecture and describes the components in the architecture. The section also covers the basics of SCADA communication with a focus on protocols and traffic behavior.

SCADA is a control system for monitoring and controlling geographically distributed physical processes in real-time. It is widely used in power grid infrastructure. A lot of resources have been devoted to ensuring safety and that the processes run as expected. Managing safe, functioning processes is the priority of SCADA. SCADA systems interact with physical processes and assets such as controllers and sensors. Readings are sent from the outer nodes of the SCADA network to a central control system, from where the physical process can be regulated through the hands of an operator or automatic processes [51, 10].

In SCADA systems, much emphasis is put on availability, since access to commands and data is necessary to deliver proper operation. There is also a very low tolerance for delays, as slow reactions could lead to the system entering an unsafe state [10].

2.1.1 Architecture and Components

There is no conventional architecture of SCADA systems [51] but there is an idea of an architectural pattern which conceptually consists of three layers: the field stations, the control network and the corporate network [10], see Figure 2.1.

The lower layer consists of the field stations, each containing one or more field devices and controllers. Examples of such could be Remote Terminal Units (RTU), Programmable Logic Controllers (PLC), sensors and actuators. The RTUs of a SCADA system typically hold process setpoints. These setpoints define bounds or goals which the system uses in its control of actuators. The lower layer receives control commands from the control network and sends monitoring data in return.

The control network has a Human-Machine-Interface (HMI) where operators can overview and manually control the current state of the system. The supervisory control of the system is located at the control server/data acquisition. The historian is a database that logs the process information for the entire system.

The upper layer, the corporate network, is interested in production scheduling, passing information about forthcoming load forecasts or foreseen changes in the operational capacity. Operators on this level need access to data in the control network [51].

access to data and remote control of the system.

Figure 2.1: An overview of SCADA

2.1.2 Communication

Traffic features in SCADA differ from traditional Information Technology (IT). For instance, most packets sent in a SCADA network are generated by a machine repeating the same actions over and over [31]. One such action is a control network continually polling the field devices, which constantly send data in return [51].

SCADA and traditional IT do not only have different users and usage, they have different priorities. Confidentiality and integrity, meaning that unauthorized users or processes should not be able to access or alter information, are main concerns in traditional IT [51]. However, SCADA systems need to ensure safe and functioning processes and therefore need to prioritize availability of information [7]. SCADA systems interact with physical assets, having a direct impact on physical systems. If an operation or response is delayed, the system could be brought to an unsafe state [56]. As such, SCADA systems have a low tolerance for jitter and delay. Therefore, the communication protocols were not designed with security as a top priority, but rather with good performance and functionality [43].

Communication channels between the field devices and the control network are, for example, wire, radio or satellite [51]. The choice often depends on the amount of infrastructure available to the field device, which could operate at a remote location. All SCADA devices are expected to run for a very long time, up to 20 years [10]. During that time, there is an expectation that every device is responsive to incoming connections.

2.2 IEC-60870-5-104

This section covers the IEC-104 communication protocol, specifically how the packets are structured and how communication works where IEC-104 is implemented. IEC-104 is an international standard used in SCADA systems, especially within the field of electricity distribution and power system automation in European countries, where its main functionality is telemetry gathering [4].

IEC-104 holds a close similarity to IEC-101, which was the first standard developed by the International Electrotechnical Commission (IEC) in the set of standards 60870 part 5 [13]. In IEC-104 the application layer introduced in IEC-101 is preserved. However, the standard was extended to be used over the Internet protocol suite (TCP/IP), changing the transport, network, link and physical layer services.

2.2.1 How Communication Works

IEC-104 supports bidirectional communication but differentiates between control and monitor direction [13]. Transmissions from the control network to the field devices are sent in the control direction. Transmissions going in the other direction, from a field station to the control network, are sent in the monitor direction.

The IEC-104 protocol supports three different modes of operation: control/request, periodic and spontaneous [44, 32]. With the control/request mode, the control network polls a field station for data transmissions. In periodic mode, data is sent with a predefined interval. A spontaneous transmission occurs whenever there is an update of the system’s state, for example when a switch is toggled or a measured value differs enough from a previously sent value.
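The spontaneous mode can be illustrated with a small threshold check. The following is a minimal sketch, not taken from RICS-el: the function name, the `deadband` parameter and the sample readings are hypothetical, and a real field device may use a different change criterion.

```python
def spontaneous_updates(samples, deadband):
    """Yield the samples that would trigger a spontaneous transmission.

    Assumption: a value is reported when it differs from the last
    reported value by at least `deadband` (hypothetical criterion).
    """
    last_sent = None
    for value in samples:
        if last_sent is None or abs(value - last_sent) >= deadband:
            last_sent = value
            yield value


# Hypothetical frequency readings in Hz; only large enough changes are sent.
readings = [50.00, 50.01, 50.02, 50.20, 50.21, 49.90]
print(list(spontaneous_updates(readings, deadband=0.1)))  # [50.0, 50.2, 49.9]
```

In periodic mode the same device would instead send every reading at a fixed interval, regardless of how much the value has changed.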

The communication between the control network and each field device is initiated like any other connection using Transmission Control Protocol (TCP). Involved nodes identify themselves using the Internet Protocol (IP) addressing system. IEC-104 has designated port 2404. When a connection has been established, user data transfer is not automatically enabled. It has to be initiated by a controlling station sending a Start Data Transfer (STARTDT). Open connections may be periodically checked with a Test Frame (TESTFR), typically after a period of time where no data has been transmitted [44].

IEC-104 and TCP use acknowledgement numbers and sequence numbers in a similar fashion. Each byte in a TCP stream has a sequence number. In a sent TCP packet, the sequence number is the byte number of the first byte of data. The acknowledgement number is the sequence number of the next byte the sender expects to receive. The sequence numbers are used to check the validity of a packet and to know if buffered bytes have to be re-sent [47]. The initial sequence number is chosen at random to mitigate connection hijacking [47].

In IEC-104, the sender also holds sent packets in a buffer until they have been acknowledged by its own sender sequence number, returned as a receive sequence number. All packets in the buffer with equal or lesser sequence numbers are then released. To avoid overflow an acknowledgement is typically sent in response to a longer data transmission in one direction [44].
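The buffering described above can be sketched as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, sequence numbers are treated as 15-bit counters that wrap around, and the receive sequence number is interpreted as the number of the next frame the peer expects, so frames numbered before it are released.

```python
from collections import OrderedDict

SEQ_MODULO = 1 << 15  # assumption: 15-bit sequence counters that wrap around


class SendBuffer:
    """Hypothetical model of a sender holding unacknowledged I-format frames."""

    def __init__(self):
        self.next_send_seq = 0
        self.unacked = OrderedDict()  # send sequence number -> payload

    def send(self, payload):
        """Store the payload until it is acknowledged and return its number."""
        seq = self.next_send_seq
        self.unacked[seq] = payload
        self.next_send_seq = (seq + 1) % SEQ_MODULO
        return seq

    def acknowledge(self, recv_seq):
        """Release all frames sent before `recv_seq`, oldest first."""
        if recv_seq != self.next_send_seq and recv_seq not in self.unacked:
            raise ValueError("receive sequence number outside the send window")
        released = []
        while self.unacked and next(iter(self.unacked)) != recv_seq:
            seq, _ = self.unacked.popitem(last=False)
            released.append(seq)
        return released


buffer = SendBuffer()
for payload in (b"a", b"b", b"c"):
    buffer.send(payload)
print(buffer.acknowledge(2))  # [0, 1]
print(list(buffer.unacked))   # [2]
```

Releasing from the oldest entry onwards keeps the logic correct when the counter wraps from its maximum back to zero.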

2.2.2 The Network Packet

The communicated packets are called Application Protocol Data Unit (APDU) and exist in three different formats; the S, U and the I-format [44]:

• The Information transfer format, I-format, has an Application Service Data Unit (ASDU) used for sending detailed information such as measurements and commands.

• The S-format is used for numbered Supervisory functions. A typical use for this frame is to acknowledge received APDUs.

• The U-format, U as in Unnumbered control functions, is used for the activation and confirmation mechanisms STARTDT, Stop Data Transfer (STOPDT) and TESTFR.

An APDU always contains a header, the Application Protocol Control Information (APCI). The APCI is constrained to six bytes of data: a start byte, a byte specifying the length and four control fields. The control fields detail the format type and the sequence numbers. As S-format and U-format APDUs consist only of an APCI, both are of a fixed length. The I-format is different as it contains an ASDU of variable length, holding sensitive information about the system. An overview of the APDUs is shown in Figure 2.2 (based on Matoušek [44]).

The start byte of the APDU is 0x68, and the length field counts the bytes of the APDU excluding the start byte and the length byte. The four control fields that follow in the APCI are shown in Figure 2.3. The control fields of an I-format specify the message direction. The sequence numbers are initially set to zero and then incremented by one for each APDU and direction.

Figure 2.3: APCI control field formats
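As an illustration of the APCI layout, the sketch below classifies a raw APDU by format. It is a minimal example and not part of the attack-bot: the function name is hypothetical, and it assumes the commonly documented bit layout in which the lowest bits of the first control field distinguish the I, S and U formats and the sequence numbers occupy the upper 15 bits of each field pair.

```python
START_BYTE = 0x68

# First control field values of the six U-format functions (assumed layout).
U_FUNCTIONS = {
    0x07: "STARTDT act", 0x0B: "STARTDT con",
    0x13: "STOPDT act", 0x23: "STOPDT con",
    0x43: "TESTFR act", 0x83: "TESTFR con",
}


def parse_apci(frame: bytes) -> dict:
    """Classify an APDU and extract the sequence numbers it carries."""
    if len(frame) < 6:
        raise ValueError("an APCI is six bytes long")
    start, length, cf1, cf2, cf3, cf4 = frame[:6]
    if start != START_BYTE:
        raise ValueError("missing 0x68 start byte")
    if length != len(frame) - 2:
        raise ValueError("length must exclude the start and length bytes")
    if (cf1 & 0x01) == 0:  # lowest bit 0: I-format
        return {
            "format": "I",
            "send_seq": (cf1 | (cf2 << 8)) >> 1,
            "recv_seq": (cf3 | (cf4 << 8)) >> 1,
        }
    if (cf1 & 0x03) == 0x01:  # lowest bits 01: S-format
        return {"format": "S", "recv_seq": (cf3 | (cf4 << 8)) >> 1}
    return {"format": "U", "function": U_FUNCTIONS.get(cf1, "unknown")}


print(parse_apci(bytes([0x68, 0x04, 0x07, 0x00, 0x00, 0x00])))
# {'format': 'U', 'function': 'STARTDT act'}
```

A parser of this kind is a natural first step when extracting features from recorded traffic, since the format decides whether an ASDU follows the header.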

An ASDU can have 256 different type identifications, functions, of which 67 are defined in the standard. The functions of relevance to this study are listed in Table 2.1.

Table 2.1: Common ASDU functions in RICS-el

Function     Description
M_SP_NA_1    Single point information
M_ME_NA_1    Normalized measured value
M_SP_TB_1    Single point information with time tag
M_DP_TB_1    Double point information with time tag
M_ME_TF_1    Measured short floating point value with time tag
C_DC_NA_1    Double command
C_DC_TA_1    Double command with time tag
C_SE_NC_1    Set-point command, short floating point value

The structure of the fields found in an ASDU can be seen in Figure 2.4 (based on Matoušek [44]).

Figure 2.4: Information contained in an ASDU

• The type of the ASDU is specified in the first byte, Type identification, which will dictate what function the Information Object (IO) has.

• The Structure Qualifier (SQ) is used for specifying if the ASDU has a sequence of one or more IOs, or if it has a sequence of Information Elements (IEs) of one type. An ASDU can hold up to 127 objects or elements, specified in Number of objects.

• T is a test bit set to 1 during test conditions, signaling it should not change the system state.

• The P/N bit indicates whether a command is confirmed or not when sent in mirrored direction.

• The Cause of Transmission (CoT) hints at why the ASDU is sent.

• The Originator Address (ORG) is used to explicitly declare the controlling station.

• The ASDU address field is the address of the station to which all objects in the ASDU are associated.
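The fields listed above can be read from the first six bytes of an ASDU. The sketch below is illustrative only: the function and key names are hypothetical, and it assumes the field sizes commonly used with IEC-104 (a one-byte type identification, a one-byte cause of transmission followed by a one-byte originator address, and a two-byte little-endian ASDU address). The numeric type identifications are those assigned by the standard to the functions in Table 2.1.

```python
# Type identification numbers of the functions listed in Table 2.1.
TYPE_NAMES = {
    1: "M_SP_NA_1", 9: "M_ME_NA_1", 30: "M_SP_TB_1", 31: "M_DP_TB_1",
    36: "M_ME_TF_1", 46: "C_DC_NA_1", 50: "C_SE_NC_1", 59: "C_DC_TA_1",
}


def parse_data_unit_identifier(asdu: bytes) -> dict:
    """Decode the data unit identifier (assumed six-byte layout)."""
    if len(asdu) < 6:
        raise ValueError("the data unit identifier is six bytes long")
    type_id, vsq, cot_byte, org = asdu[0], asdu[1], asdu[2], asdu[3]
    return {
        "type": TYPE_NAMES.get(type_id, f"type {type_id}"),
        "sq": bool(vsq & 0x80),           # structure qualifier
        "number_of_objects": vsq & 0x7F,  # up to 127 objects or elements
        "test": bool(cot_byte & 0x80),    # T bit
        "negative": bool(cot_byte & 0x40),  # P/N bit
        "cot": cot_byte & 0x3F,           # cause of transmission
        "org": org,                       # originator address
        "asdu_address": int.from_bytes(asdu[4:6], "little"),
    }


# Hypothetical double command with cause of transmission 6 to station 10.
print(parse_data_unit_identifier(bytes([46, 0x01, 0x06, 0x00, 0x0A, 0x00])))
```

The information objects that follow the identifier would be decoded according to the type identification, which this sketch leaves out.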

Following the data unit identifier, there are one or more objects. Each field of such an object contains:

• The Information Object Address (IOA), referring to different addresses on the RTU that is being controlled.

• The IEs contain the transmitted information.

• There are 32 defined type identifications in the standard that have a field reserved for a time tag. IEC-104 has a built-in security mechanism for noting that the devices were not synchronized, the Invalid Time Flag (ITF), in a packet’s time tag [9]. This bit is set in the acquisition function if any inconsistencies are recognized.
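A time tag and its invalid flag can be decoded as sketched below. This is an assumption-laden example: it presumes the seven-byte time format commonly used by the time-tagged types (milliseconds, minutes, hours, day, month and a two-digit year), and it presumes that the flag referred to above as the ITF is the top bit of the minutes byte. The function name and the `century` parameter are hypothetical.

```python
from datetime import datetime


def parse_time_tag(tag: bytes, century: int = 2000):
    """Return (timestamp, invalid_flag) for a seven-byte time tag.

    Assumed layout: bytes 0-1 milliseconds within the minute (little-endian),
    byte 2 minutes (low 6 bits) plus the invalid flag (top bit),
    byte 3 hours (low 5 bits), byte 4 day of month (low 5 bits),
    byte 5 month (low 4 bits), byte 6 two-digit year (low 7 bits).
    """
    if len(tag) != 7:
        raise ValueError("expected a seven-byte time tag")
    millis = int.from_bytes(tag[0:2], "little")
    invalid = bool(tag[2] & 0x80)
    stamp = datetime(
        century + (tag[6] & 0x7F),
        tag[5] & 0x0F,
        tag[4] & 0x1F,
        tag[3] & 0x1F,
        tag[2] & 0x3F,
        millis // 1000,
        (millis % 1000) * 1000,
    )
    return stamp, invalid


# 2021-03-15 12:30:45.500 with the invalid flag cleared.
stamp, invalid = parse_time_tag(bytes([0xBC, 0xB1, 0x1E, 0x0C, 0x0F, 0x03, 0x15]))
print(stamp.isoformat(), invalid)  # 2021-03-15T12:30:45.500000 False
```

Checking this flag while labeling recorded traffic could help separate clock problems from genuinely delayed or replayed packets, although that use is a suggestion rather than something described in the source.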

2.3 SCADA Vulnerabilities

This section outlines vulnerabilities in a typical SCADA system architecture and vulnerabilities in the communication protocols.

2.3.1 Vulnerabilities in Architecture and Components

As devices in SCADA are expected to run for a long time, a lot of legacy software and hardware is in operation within SCADA systems. Old software and hardware often have vulnerabilities that are no longer present in newer versions or that have since been patched to avert known threats [43]. Leaving a machine running for a long time also accumulates fragmentation, leaving it vulnerable to buffer overflow [56].

Built with availability as a top priority, authentication and encryption were downplayed as both practices hinder availability. When remote access was introduced to enhance ease of access to SCADA systems, the lack of authentication and encryption became an apparent issue. Especially when proprietary solutions were exchanged for commercial off-the-shelf solutions. While new solutions made integration and maintenance easier, the domain knowledge needed to cause harm drastically decreased [56].

Disconnecting SCADA networks from the Internet, or restricting access by implementing a Virtual Private Network (VPN), is not enough to mitigate risks and secure the system [10]. For instance, operators cannot protect field devices from malicious commands sent from the control room. This type of insider attack constitutes the majority of attacks on SCADA systems. Of course, an operator might make the wrong call or issue the wrong command accidentally. But anyone with access to a physical connection to the SCADA system could just bypass the VPN or HMI.

2.3.2 Vulnerabilities in the Communication

As mentioned in section 2.3.1, there is no authentication or encryption built into SCADA systems. This is true for the communication protocol IEC-104 too. There are few security bits, no digital signatures and there is no checksum. Instead, IEC-104 depends on lower layers for data integrity [45].

However, the lower layer TCP/IP used in IEC-104 has plenty of publicly known vulnerabilities [50, 21]. Examples of such:

• SYN flooding. A SYN packet is used to initiate communication in TCP. If a receiver of a SYN packet responds with an acknowledgement the original sender will confirm the connection, enabling an exchange of data. This is called a TCP handshake.

A SYN flood is a continuous stream of SYN packets sent to a device. Without precautions, the device will respond to each SYN it receives but the TCP handshake is not completed in the attack. This could leave the targeted device unresponsive to legitimate traffic as it is waiting for a handshake confirmation [6].

• Spoofing of the Address Resolution Protocol (ARP spoofing). ARP spoofing is a technique that allows an adversary to intercept data frames in a Local Area Network (LAN) [39].

Vulnerabilities in SCADA traffic exist in both control and monitor directions. An adversary could for instance block the data transfer in the monitor direction through the utilization of SYN flooding. This can have a big impact as the SCADA operations completely depend on the data received from the RTUs [45]. The vulnerabilities can be exploited in the control direction too. The adversary may use ARP spoofing to intercept data and modify its content. This is harmful since any issued command from the control network is seen as a legitimate one by the receiver.
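From a detection point of view, the SYN flooding behaviour described above shows up as handshakes that are started but never completed. The following sketch counts such half-open handshakes per client and server pair. It is a simplified illustration and not a method from this study: the event format, the function name and the host labels are hypothetical, and a real detector would also need timeouts and retransmission handling.

```python
from collections import Counter


def half_open_counts(events):
    """Count handshakes that were started but not completed.

    `events` is an iterable of (client, server, kind) tuples, where kind is
    "SYN" for a handshake start and "ACK" for the client's final
    confirmation (hypothetical, simplified event format).
    """
    pending = Counter()
    for client, server, kind in events:
        key = (client, server)
        if kind == "SYN":
            pending[key] += 1
        elif kind == "ACK" and pending[key] > 0:
            pending[key] -= 1
    return +pending  # unary plus drops pairs with no pending handshakes


events = [
    ("hmi", "rtu", "SYN"), ("hmi", "rtu", "ACK"),  # completed handshake
    ("x", "rtu", "SYN"), ("x", "rtu", "SYN"), ("x", "rtu", "SYN"),
]
print(dict(half_open_counts(events)))  # {('x', 'rtu'): 3}
```

A persistently high count towards one device would be one possible indicator that the device is being kept waiting for handshake confirmations.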

2.4 SCADA Exploits

This section lists different types of attacks and examples of attacks on SCADA, from research or historical events.

2.4.1 Scanning

To find suitable targets within the system, an adversary depends on scanning the network. A worm that was discovered in 2010 targeted nuclear power plants in Iran in an attack that became known as Stuxnet [20]. The worm scanned for specific machines to spread to in order to cause as much damage as possible.

SCADA networks contain devices that are infrequently updated, are proprietary or lack an interface to commonly used methods of scanning. As such, using scanning tools designed for traditional IT networks might cause unexpected issues [12].

2.4.2 Replay

A replay attack is when authentic data is delayed or repeated by an adversary. As SCADA systems do not prioritize integrity there are few security mechanisms preventing unauthorized transmissions such as a replayed packet.

The adversaries behind Stuxnet used a replay technique to hide the manipulation of generators. Data sent to a PLC from peripheral sensors were recorded for 21 seconds. Then, further updates from the sensors were dropped and the recorded data was sent in its place [20].

2.4.3 Man-in-the-Middle

Man-in-the-Middle (MitM) attacks are attacks in which an adversary captures, and possibly modifies, data sent to or from an unknowing target. Decisions taken at the HMI are based on the data it receives from the RTU, and the RTU assumes that traffic from the HMI is legitimate. This makes the system very susceptible to attacks that focus on misinformation. An adversary can obtain direct or indirect control of the system by modifying transmitted information.

Maynard et al. [39] identified three general phases of any MitM attack, applicable on IEC-104 too:

1. Detection: During this phase, the adversary identifies targets enabling the coming phases.

2. Capture: Data, packets and payloads, are collected from the system.

3. Attack: The adversary uses collected data to harm the system.

In their work, ARP spoofing was used to enable MitM attacks in systems using IEC-104. One tested attack altered the CoT of IEC-104 traffic. ARP spoofing has been shown to be identifiable by a NIDS [6, 53].

2.4.4 Denial-of-Service

A Denial-of-Service (DoS) attack makes an asset unavailable to its intended user. If packets from the RTU are missing or delayed, the information (on which the control of the system is based) is unreliable. If packets from the HMI get delayed or do not reach the RTU, human control of the power grid is lost.

A historical example is the malware Industroyer [17] that left a part of Kyiv without electricity for an hour. The malware was, among other things, capable of preventing connections to devices using a serial connection.

There are many ways to achieve DoS in SCADA systems, for example saturating the resources of the target, re-routing packets to an unintended destination or maliciously altering servers [56].

2.4.5 Sequence Attacks

Sequence attacks misplace packets in a communication stream in order to disrupt the SCADA process.

Chromik [10] built and evaluated an IDS tailored for power distribution systems using IEC-104. One of the attacks used in evaluation aimed to interfere with interlocks used in power distribution. Basically, an interlock requires a sequence of steps to be taken in the correct order to complete an action. Execution of the sequence in the wrong order risks potentially dangerous situations. This can be exploited by an adversary willfully changing the order of the sequence or bypassing a step. One approach taken by Chromik to disrupt a sequence was to randomly drop packets. It was found that removing 0.1% of the transmitted packets did increase the number of observed transitional anomalies by the evaluated IDS.
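The dependence of an interlock on packet order can be illustrated with a small sketch. It is not Chromik's implementation: the step names, the function names and the strict in-order check are hypothetical, chosen only to show how a dropped packet breaks a required sequence.

```python
import random


def run_interlock(steps, received):
    """Return True only if `received` completes `steps` in the required order.

    A command arriving out of order is treated as an interlock violation.
    """
    expected = 0
    for command in received:
        if command != steps[expected]:
            return False  # wrong order or skipped step
        expected += 1
        if expected == len(steps):
            return True
    return False  # sequence never completed


def drop_packets(packets, rate, rng):
    """Randomly remove packets with probability `rate` (0.0 to 1.0)."""
    return [p for p in packets if rng.random() >= rate]


# Hypothetical three-step switching sequence.
steps = ["open_breaker", "ground_line", "close_isolator"]
print(run_interlock(steps, steps))                               # complete sequence
print(run_interlock(steps, ["open_breaker", "close_isolator"]))  # one step dropped
print(len(drop_packets(list(range(1000)), 0.001, random.Random(1))))
```

With a drop rate of 0.001, the value corresponding to the 0.1% mentioned above, only a few packets are removed, yet any one of them may belong to a sequence that then fails the check.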

2.4.6 Desynchronization

A time-synchronization attack targets synchronization mechanisms to bring the nodes of a system out-of-sync [48]. SCADA systems, having a low tolerance for delay, could possibly be very susceptible to time-synchronization attacks. Baiocco and Wolthusen [8, 9] exploited the Network Time Protocol (NTP) to destabilize their SCADA testbed. During experimentation, they managed to deny the operator any control of the system and achieved causality re-ordering of events logged in the historian.

2.4.7 Packet Injection

By supplying fabricated data to a target in newly crafted packets, an adversary would not need to be as dependent on monitored traffic in the network as with MitM or replay attacks. Gao et al. [55, 42] define an injection attack as an attack that introduces false data into a control system. They show that injection attacks have the potential of modifying process setpoints, interrupting process control or communication or modifying device configurations. As each packet in both control and monitor direction is processed as authentic, SCADA is very susceptible to these kinds of attacks.

One instance of a packet injection attack was the Maroochy Shire Sewage Spill [43], which took place in Australia in the year 2000. A disgruntled employee installed company software on their computer and took control of the waste management system, ultimately releasing millions of liters of sewage into surrounding waters.

2.5 RICS-el

RICS-el [5] is a virtual, experimental testbed, created with the goal of enabling repeatable experiments for security research within SCADA systems. This section describes RICS-el’s design and compares RICS-el to a real-world SCADA system.

RICS-el was realised in a collaboration between RICS and the Swedish Defence Research Agency (FOI)¹. The ambition was to enable the creation of scenarios closely resembling a real-world utility. This included the generation of attacks, the implementation of defensive mechanisms and the generation of open datasets in long streams to be used in comparative research. The real-world utility emulated in RICS-el is the power grid.

2.5.1 Design

RICS-el consists of a network of Virtual Machines (VMs), connecting an office IT segment with a SCADA system, see Figure 2.5 (based on Almgren et al. [5]) for an overview.

1. The RTU Emulator represents the RTUs of a SCADA system. The power grid simulator contains lower-level field devices.

2. The control network is represented by the so-called Operation Technology (OT) LAN. The OT-LAN consists of an HMI, Active Directory, control server and a backup control server. The control network communicates with the RTU Emulator using IEC-104 in spontaneous mode.

3. The Demilitarized Zone (DMZ), OT DMZ, is used to separate the enterprise network from the control network.

4. The enterprise network has access to a simulated Internet and is separated from the OT side with firewalls.

Figure 2.5: Overview of RICS-el

¹ https://www.foi.se/

Everything in RICS-el is virtualized. The power grid is emulated, built using an operator training simulator module from a well-known SCADA company. Three emulated RTUs connect the SCADA front end to the power grid through a Wide Area Network (WAN). The WAN consists of 15 nodes forming a meshed network. The RTUs contain no control loops but serve as interpreters between the power system and the SCADA system. The sensors in RICS-el monitor electrical quantities and some actuator states. Present actuators are breakers and generators. The backlink sends traffic data from all RTUs within the power system which has not been converted to IEC-104. Every host present is running on a VM, which allows for high flexibility in configuration.

Each machine in the network of RICS-el has its network address(es), operating system and administrator password documented on an intranet, accessible after being granted an account by a system administrator.

To streamline the process of creating scenarios in RICS-el a so-called ScenarioBot has been created. It allows a user to execute pre-programmed events such as starting a recording of network traffic, which is stored in Packet Capture (pcap) files. There is also a second bot called the OT-Bot, a more specialized version of the ScenarioBot. The OT-Bot issues scheduled commands from the HMI to an RTU. An overview of the relations between the present bots and the components of RICS-el is shown in Figure 2.6.

Figure 2.6: Interactions between bots and SCADA in RICS-el

A user of RICS-el is able to schedule a scenario to run in the ScenarioBot. The user can load pre-configured actions into the OT-Bot and schedule when the actions are to be executed. In turn, the OT-Bot executes its actions in the HMI, controlling the power grid through commands sent to the RTU. The commands available in the OT-Bot are:

1. Open or close a breaker.

2. Change a generator setpoint.

At the start of a scenario, the ScenarioBot synchronizes two internal databases, unifying initial values and behavior of the power grid. When ready, the ScenarioBot starts to record network traffic in the firewall between the control network and the simulated power grid. When the scenario ends, the ScenarioBot will fetch the recording as a pcap file accessible to the user. The network is configured so that the ScenarioBot has access to the LANs of all other nodes, to ease the addition of future actions.

2.5.2 Real World Comparison

There are three major differences between RICS-el and a real power grid:

1. In a real-world counterpart there is no backlink. Instead, data is measured by sensors within the power grid, sent through field devices to the control network. There are packets traveling that route in RICS-el too, but the WAN does not make the RTUs and SCADA front end truly geographically distant. The meshed network is in place to make it appear to be so.

2. The power grid in RICS-el is not relying on actual RTUs but generates RTU traffic itself. The emulated RTUs in RICS-el only convert this traffic to the IEC-104 standard. Furthermore, the emulated RTUs in RICS-el contain no registers, logic, control loops or programming.

3. The third and the second layer of the SCADA system in RICS-el are realized by duplicating a training simulator module.

The training simulator module offers less control than a real-world counterpart. However, it does behave like the real counterpart, with generated warnings, alarms et cetera. For example, at the HMI the entire overview is available to the user but not everything can be interacted with. Generators can have their setpoint altered and breakers connecting generators and stations to a power line can be opened or closed. The historian was not available. The simulator module is dependent on a database to know about the state of the system. There is one in connection to the power grid and one in the OT-LAN. Both of them need to have their data unified at system startup, a task assigned to the ScenarioBot.

There is no functioning synchronization mechanism, such as NTP, between the virtual hosts in RICS-el. Instead, each virtual host uses the internal clock of the host that the VM is running on.

3 Related Work

This chapter contains important related work on the topic of dataset generation and experimental attacks within SCADA systems. Section 3.1 covers methodologies of dataset generation and an overview of existing testbeds related to RICS-el. Previous attacks carried out in SCADA system testbeds are introduced in section 3.2.

3.1 Dataset Generation

Without access to SCADA systems in production, one common approach to creating datasets is to build a testbed mirroring a real-world SCADA system. Datasets are then generated when the system is operational and relevant data is recorded. Previous work with virtual testbeds is covered in section 3.1.1 and work in physical testbeds in section 3.1.2. Another method is to create the traffic synthetically, generating traffic in accordance with a set of rules or replicating traffic from previous captures. The approach using synthetic traffic is touched upon in section 3.1.3.

3.1.1 Using a Virtual Testbed

One of the most well-known datasets in anomaly detection research is the DARPA 99 dataset [34]. Seven weeks of traffic were recorded, including both normal operation and attacks. The DARPA 99 dataset does, however, fail one of the more challenging requirements of dataset generation: that background traffic and injected attacks carry the same characteristics. If the packets differ too much from each other, it becomes easy to discover anomalies. For example, when Mahoney trained an anomaly detector on the DARPA 99 dataset, he found that the Time To Live (TTL) field in the IP header had to be set to zero to not make the attacks too easy to detect [36].

TASSCS by Mallouhi et al. [37] is a testbed that simulates an electrical grid, HMI, PLC, IDS and a network including DMZ and an office segment. TASSCS is constructed primarily to perform anomaly-based IDS research. That said, TASSCS lacks remote access [11] which is needed to be a competitive candidate for dataset generation.

In 2018, Maynard et al. [40] took a minimalistic approach to building a SCADA testbed. They show that researchers have access to simplistic and cheap alternatives for experimental research and dataset generation. The testbed can be deployed on the researchers’ computer. They use VMs to simulate each machine, and the SCADA network can be set up on virtual hosts in any way the user sees fit. The testbed is an open framework that currently supports IEC-104. An important aspect of the testbed is that it allows for physical devices or networks to be connected. One dataset has been documented using this testbed in a setup with five RTUs, one HMI and a historian. The small size of the testbed is also one of its flaws as it has no support for a DMZ or remote devices.

In contrast to RICS-el, the mentioned testbeds lack software for scenario creation. This hinders repeatability and the ability to log each action taken in the environment. The only prior datasets generated in RICS-el were recorded during normal operation. Lin and Nadjm-Tehrani [32, 33] used such datasets to analyze patterns in IEC-104 traffic.

3.1.2 Using a Physical Testbed

The DARPA 99 dataset [34] was not generated in a SCADA environment. In fact, when the development of RICS-el took off, SCADA testbeds and datasets were hard to gain access to [5]. However, since DARPA 99, a number of testbeds have been built to address this scarcity. One example of a SCADA testbed is the National SCADA Test Bed (NSTB) [18]. NSTB is a United States national lab that utilizes a power grid test range with miles of electrical transmission lines and several substations. There are no published datasets for public use generated at NSTB. Another physical testbed is the European project CRUTIAL [16]. CRUTIAL includes a teleoperation and a microgrid testbed. Data gathered in CRUTIAL are statistics about the effect of attacks rather than the network data needed during NIDS development.

More recent testbeds, that have been actively involved in dataset generation, are three testbeds called SWaT, WADI and EPIC created at Singapore University of Technology and Design (SUTD) [26]. The testbeds at SUTD are physically implemented to enhance realism. In general, what is gained in fidelity and reliability while using a physical testbed is lost in repeatability, scalability and cost [46]. For instance, RICS-el showcases its scalability through having a virtual office segment accessible.

SWaT [38] was built in 2015 and models a water treatment facility, enabling experimental research of SCADA systems. In 2016, WADI [54] was added as an extension to SWaT forming a complete water treatment, storage and distribution network. One year later the testbed EPIC [1, 2] was introduced supplying both SWaT and WADI with power.

Datasets have been collected in SWaT during normal operation and attack scenarios [26]. The attacks have been conducted in the context of tests and research purposes but also during events called showdowns [23, 6, 35]. During a showdown teams compete to come up with effective and deceptive attacks, generating datasets with a diverse set of attacks.

All datasets from the three testbeds at SUTD contain network traffic. Some have sensor readings and data from the historian too. Whether a dataset is labeled or not varies. If attacks have been launched in a controlled setting, the dataset is generally labeled. S. Adepu et al. [24] illustrated how to generate labeled datasets with attacks in SWaT. They noted that the labeling was quite straightforward. When action was taken to start or stop an attack, relevant information was stored. They logged the following:

• Start time of the attack.

• Stop time of the attack.

• Attack points, the target(s) of the attack.

• Start state, the target(s) initial state.

• Attack description.

• Attack value, the substituted value of a sensor.

• Attackers’ intent of the attack.

What is lacking from this description is whether any commands were issued from the control network and if and how those commands were part of the labeling too. The bots in RICS-el make sure that events, such as issued commands, are repeatable between experiments and can be logged.
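The logged items listed above map naturally onto a single record per attack. The following is a minimal sketch of such a record, assuming one entry per attack and JSON as the storage format. The class name `AttackLabel`, the field names and the example values are hypothetical and are not taken from Adepu et al. or from RICS-el.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AttackLabel:
    """One log entry per attack, mirroring the items listed above."""
    start_time: float     # start time of the attack (seconds since epoch)
    stop_time: float      # stop time of the attack
    attack_points: list   # target(s) of the attack
    start_state: dict     # initial state of the target(s)
    description: str      # free-text attack description
    attack_value: object  # substituted value, if any
    intent: str           # attacker's intent

    def to_json(self):
        """Serialize the record so that it can be stored next to a capture."""
        return json.dumps(asdict(self), sort_keys=True)


# Example values are placeholders only.
label = AttackLabel(100.0, 160.0, ["RTU1"], {"breaker": "closed"},
                    "value substitution", 0, "mislead the operator")
print(label.to_json())
```

Commands issued from the control network could be logged with a similar record, which is the gap noted in the description above.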

3.1.3 Using Synthetic Data

Synthetic data generation could remove the need for a testbed, greatly reducing the cost of dataset generation. Despite that, the generation of SCADA datasets containing synthetic data or synthetically injected attacks is to the best of our knowledge not a particularly explored topic. One example of the creation of synthetic data, not used to create datasets, is a SCADA traffic generator by Al-Dalky et al. [15]. The generator generated malicious traffic based on the rules of a NIDS. Consequently, the generator did not create complete attacks but rather a stream of individual packets not conforming to the rules of a certain NIDS. This generator failed to create realistic scenarios with attacks one can expect in real applications.

However, the synthetic generation of datasets in traditional IT is a more active research area. One example is the tool ID2T by Garcia C. et al [14]. Garcia C. et al defined requirements on both generated datasets and the tools that generate them. How these requirements are implemented in this study is described in section 6.1. ID2T takes input in the form of recorded background traffic and injects synthetic attacks into that traffic. ID2T also generates labels for what attacks are used and when. As ID2T is limited to the supplied background data, it can only approximate the effects of attacks that would change the state of the system, an issue that is not as prominent when using a testbed. ID2T also has a problem with labeling the datasets if attacks are present in the provided background traffic, whereas when using a testbed one can control whether attacks are present or not.

An issue with synthetic attacks is that the payload of a packet in legitimate traffic is incredibly hard to estimate [30]. The state of the system could heavily influence the information found in payloads. By carrying out attacks within a SCADA system, the dependency on provided background data and the need for estimating the effect of an attack on payloads are removed. Furthermore, the state of the system can be observed, making the correctness of generated attacks much easier to verify in comparison to synthetically created attacks. Another difficulty with synthetic attacks is to inject traffic with realistic characteristics and timing [30]. Timing of attacks is a non-issue when attacks are generated in a testbed.

In conclusion, RICS-el has the potential to generate datasets without the difficulties and restrictions of synthetic generation.

3.1.4 This Study’s Approach to Dataset Generation

Being virtual, RICS-el is both cost-efficient and flexible in comparison to a physical counterpart. The fact that RICS-el is virtual and supports the generation of scenarios using bots also greatly increases the repeatability of events and experiments. This feature is important in order to generate consistent datasets while using different configurations in scenarios. Other benefits of generating datasets in RICS-el rather than a physical testbed include:

• VMs can be easily added or removed to suit the needs for a specific scenario by a system administrator.

• The system can be reset much quicker.

• There are no physical devices that could be harmed during attacks.

RICS-el implements a SCADA system communicating with the IEC-104 protocol, which only one other cited source used (Maynard et al. [40]). Their testbed is built to be easily deployed, making it less expansive than RICS-el. For instance, RICS-el contains an office segment while the testbed of Maynard et al. does not. The biggest concern, however, is that their testbed is not accessible remotely, nor does it have devices located at a distance from the control network. The testbed also lacks the means of creating scenarios. Hence, RICS-el is seen as better suited than the testbed by Maynard et al. when it comes to generating accessible datasets in repeatable scenarios.

The datasets generated in RICS-el will be labeled similarly to the straightforward approach of S. Adepu et al [24]. When an action is taken by the ScenarioBot, such as starting or stopping an attack, the action is logged to be used when labeling datasets.
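Applying such a log to a capture amounts to comparing packet timestamps with the logged start and stop times. The sketch below assumes that the log is available as (start, stop, name) tuples and that packet timestamps use the same clock. The function name `label_packets` and the example values are hypothetical.

```python
def label_packets(packet_times, attack_log):
    """Assign a label to every packet timestamp.

    packet_times: capture timestamps in seconds.
    attack_log:   (start, stop, name) tuples logged when attacks start and stop.
    A packet inside an interval gets the attack name, otherwise "normal".
    """
    labels = []
    for t in packet_times:
        label = "normal"
        for start, stop, name in attack_log:
            if start <= t < stop:
                label = name
                break
        labels.append(label)
    return labels


# Hypothetical capture with one replay attack active between t=10 and t=20.
print(label_packets([5, 10, 15, 20, 25], [(10, 20, "replay")]))
```

Interval-based labeling is coarse, since legitimate packets sent while an attack is active receive the attack label as well. A finer labeling would also need per-packet information.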

3.2 Attack Types and Attack Evaluation

This section offers an introduction to previously carried out attacks in SCADA systems and their results.

The attacks in this study are generated in a testbed where no previous attacks have been documented. Attacks covered in this section have influenced the choice of attacks performed in this study and functioned as a basis for the evaluation of attacks.

3.2.1 Scanning

Maynard et al. [40] identified nodes using the IEC-104 protocol in a network using Nmap and a script developed by Timorin [52]. The script detected IEC-104 devices by sending TESTFR packets in the network, to which an IEC-104 node was expected to reply. On confirmation, a STARTDT packet was sent and the reply was checked for the common address. If it was not given, the script broadcast an interrogation command. As known IEC-104 devices were found by the scan, it was evaluated as successful.

That approach, however, failed to address how to find IEC-104 devices when the adversary does not know what network identity (network ID) the RTUs use. This study uses the same approach to identify IEC-104 devices but complements the scanning with a few commonly used networking tools.
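The TESTFR and STARTDT packets used in such a scan are unnumbered control (U-format) frames of six octets. The sketch below only builds and recognizes these frames as byte strings and does not send anything. The control field values follow the IEC 60870-5-104 frame layout, while the function names and dictionary are illustrative and unrelated to the script by Timorin.

```python
START_BYTE = 0x68  # first octet of every IEC-104 APDU

# First control octet of each U-format frame (the remaining three are zero).
U_FRAMES = {
    "STARTDT_ACT": 0x07, "STARTDT_CON": 0x0B,
    "STOPDT_ACT": 0x13, "STOPDT_CON": 0x23,
    "TESTFR_ACT": 0x43, "TESTFR_CON": 0x83,
}
U_NAMES = {code: name for name, code in U_FRAMES.items()}


def build_u_frame(name):
    """Return the six octets of the named U-format frame."""
    return bytes([START_BYTE, 0x04, U_FRAMES[name], 0x00, 0x00, 0x00])


def parse_u_frame(data):
    """Return the frame name, or None if `data` is not a known U-format frame."""
    if len(data) != 6 or data[0] != START_BYTE or data[1] != 0x04:
        return None
    return U_NAMES.get(data[2])


print(build_u_frame("TESTFR_ACT").hex())
print(parse_u_frame(bytes.fromhex("680483000000")))
```

A node answering a TESTFR activation with the corresponding confirmation frame is what identifies it as an IEC-104 device in the approach described above.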

3.2.2 Man-in-the-Middle

In a MitM attack that Maynard et al. [39] conducted in a system using IEC-104, certain packets had their CoT changed, together with the state an RTU’s Input/Output port was in. The tool Wireshark was used to analyze network traffic and verify the impact of the attack. Maynard et al. could confirm that the sent packet differed from the received and that the modified packet was accepted as legitimate by the receiver.

Maynard et al. built a library to support further MitM attacks on the IEC-104 protocol. It provides the means of doing ARP spoofing with Ettercap, TCP network functions, code skeletons and building blocks to add on new attacks. This study uses this library to modify IEC-104 payloads in more ways than just altering the CoT.

3.2.3 Replay

In the study mentioned in section 3.2.2, Maynard et al. also conducted a replay attack. They captured all monitored packets in transmission between an RTU and HMI and filtered out non-IEC-104 traffic using Wireshark before replaying the traffic. The replayed packets were rejected, dropped by the receiver’s TCP/IP stack as TCP sequence numbers of the captured packets were not legitimate. In other words, the replay attack was not effective. The replayed packets in this study use sequence numbers that the receiver accepts.

Using RICS-el, Lin and Nadjm-Tehrani [33] created a 20-second traffic sequence which was replayed with the original inter-arrival times. The resulting attack was shown to be detectable by their NIDS, especially when event rates in the original traffic were low and when the attack was active for longer periods. However, in contrast to this study, their replay attack was not done live in RICS-el but on previously generated pcap files.
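Replaying a sequence with its original inter-arrival times means shifting every capture timestamp by the same offset. The sketch below works on a plain list of timestamps. The function names and example values are illustrative and do not reproduce the tooling of Lin and Nadjm-Tehrani.

```python
def inter_arrival_times(timestamps):
    """Time gaps between consecutive packets of a captured sequence."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def replay_schedule(timestamps, replay_start):
    """Send times that keep the original gaps but begin at `replay_start`."""
    if not timestamps:
        return []
    offset = replay_start - timestamps[0]
    return [t + offset for t in timestamps]


# Hypothetical three-packet capture replayed from t=100 s.
captured = [0.0, 0.5, 2.0]
print(inter_arrival_times(captured))
print(replay_schedule(captured, 100.0))
```

Because only the offset changes, the gaps in the replayed traffic equal those of the capture.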

3.2.4 Denial-of-Service

The testbed used in the SUTD Security Showdown (S3) 2016 [6] used a protocol dependent on TCP/IP. This dependency was exploited by a team using SYN flooding. The SYN flood made the field device unreachable to anyone, as it was busy waiting for handshake acknowledgements. This made the HMI unable to acquire any further information from this field device, which was seen as a verification of a successful attack.

Another attack during the S3 2016 was an attack using ARP spoofing to capture packets intended for the HMI. The packets were redirected to another destination and dropped. All the while, millions of packets were sent in batches to the field device. The attack was considered a success because the operator lost connection to the testbed’s RTUs [3].

Kalluri et al. [27] conducted various DoS attacks in a SCADA network. To measure the impact of attacks they observed the time it took for the HMI to receive a response from an RTU. In this study, ARP spoofing in combination with dropping packets and sending large quantities of packets to SCADA devices are used as DoS techniques. Ping, the Round-Trip Time (RTT) for Internet Control Message Protocol (ICMP) echo request packets, is used in order to evaluate attack impact.
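An RTT-based evaluation can be sketched as a comparison between samples collected before and during an attack. The sketch below assumes that RTT samples are already available as a list in milliseconds, with `None` marking an echo request that received no reply. The function name `summarize_rtt` and the sample values are hypothetical.

```python
from statistics import median


def summarize_rtt(samples):
    """Summarize ping results.

    samples: RTT values in milliseconds, with None for a lost echo request.
    Returns the loss ratio and the median RTT of the answered requests.
    """
    answered = [s for s in samples if s is not None]
    loss = 1 - len(answered) / len(samples)
    return {"loss": loss, "median_ms": median(answered) if answered else None}


# Placeholder samples, not measurements from RICS-el.
baseline = summarize_rtt([1.0, 1.2, 0.9, 1.1])
attack = summarize_rtt([None, None, 50.0, None])
print(baseline, attack)
```

A clear rise in loss ratio or median RTT during the attack window, relative to the baseline, would then indicate an impact on availability.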

3.2.5 Desynchronization

Baiocco and Wolthusen [8] conducted two attacks on the NTP protocol to achieve desynchronization in their IEC-104 testbed. In the first attack, the NTP servers had their clocks altered which propagated invalid time settings through the network. They noted that devices labeled or re-labeled packets with inaccurate time tags, or started to wait for packet source synchronization. If the time gap became too wide, there was a system re-synchronization that left logs in the historian ambiguous, making it hard to determine what request led to what response. Baiocco and Wolthusen concluded that this may sabotage control loops and make it more problematic to conduct auditing or intrusion detection.

In the second attack, Baiocco and Wolthusen [9] fed the RTUs fabricated NTP packets, desynchronizing them from the rest of the system. When a targeted RTU was out of sync by a certain threshold, about 30 seconds, no commands the operator sent were executed. Despite that, all commands were accepted and processed by the RTU. The ITF was not set in the RTU, as it was in sync with the NTP server from its perspective. However, the HMI noticed inconsistencies in the time tags which caused it, to no avail, to check the status between RTU and NTP servers. The attack was seen as successful as it denied the operator control of the targeted RTU.

Affecting the SCADA process through NTP is not possible in RICS-el. So, while NTP is an interesting attack vector, the attacks above cannot be reproduced as described. Rather, an attack is created where the time tags of IEC-104 packets are modified before reaching their receiver, simulating system desynchronization.
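IEC-104 time tags are commonly carried as seven-octet CP56Time2a values, so modifying a time tag amounts to decoding the value, shifting it and encoding it again. The sketch below is an illustration under stated assumptions and not the attack code of this study: years are assumed to lie in 2000 to 2099, the day of week uses ISO numbering, and the invalid and summer-time flags are discarded. All function names are hypothetical.

```python
from datetime import datetime, timedelta


def decode_cp56(tag):
    """Decode a 7-octet CP56Time2a value (status flags are ignored)."""
    ms = tag[0] | (tag[1] << 8)  # milliseconds within the minute, 0..59999
    return datetime(2000 + (tag[6] & 0x7F), tag[5] & 0x0F, tag[4] & 0x1F,
                    tag[3] & 0x1F, tag[2] & 0x3F, ms // 1000, (ms % 1000) * 1000)


def encode_cp56(dt):
    """Encode a datetime as CP56Time2a with all status flags cleared."""
    ms = dt.second * 1000 + dt.microsecond // 1000
    return bytes([ms & 0xFF, ms >> 8, dt.minute, dt.hour,
                  dt.day | (dt.isoweekday() << 5),  # day of month + day of week
                  dt.month, dt.year % 100])


def shift_time_tag(tag, seconds):
    """Return a copy of the time tag moved by `seconds` (may be negative)."""
    return encode_cp56(decode_cp56(tag) + timedelta(seconds=seconds))


original = encode_cp56(datetime(2021, 3, 15, 10, 59, 45, 500000))
print(decode_cp56(shift_time_tag(original, 30)))
```

Decoding to a `datetime` before shifting lets the standard library handle the carry into minutes, hours and days.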

3.2.6 Packet Injection

Gao et al. [55] offer two examples of how an adversary could perform an injection attack: either through injecting commands into the network, or through overwriting an RTU’s programming or register settings. The latter is not possible in RICS-el as present RTUs are emulations. The work of Gao et al. revolved around the communication protocol Modbus. Modbus is a protocol without digital signatures or authentication which makes it vulnerable to injection attacks, much like IEC-104. They describe a scenario in which the adversary uses ARP spoofing to introduce completely new, fabricated packets into the traffic.

In an investigation by Morris and Gao [42], injection attacks of various degrees of complexity are covered. In one of the attacks, Morris and Gao changed the setpoints of how much a water tank should contain, making the system regulate the amount of water to an unwanted level. With that observation, the attack was evaluated as a success. This attack is launched in this study, with a generator as a target, through the injection of fabricated packets in the established connection between the HMI and an RTU.

3.2.7 Conclusion of Attacks

The desynchronization attack cannot be implemented in RICS-el as described. Instead, a MitM attack will be used to make devices appear out-of-sync.

Other attacks are executed as described, with small adjustments to fit conditions given in RICS-el and the IEC-104 protocol.

4 Methodology of Dataset Generation

This chapter covers how datasets were generated in RICS-el, using the ScenarioBot and a new attack tool called the attack-bot.

The chapter begins with section 4.1 presenting the attack-bot implementation and location in RICS-el. The attack-bot was created to generate attacks in repeatable experiments. The following section 4.2 explains the setup for dataset generation, covering relevant data flow and component interaction. This information is needed to fully comprehend the workflow for dataset generation, presented in section 4.3.

4.1 Attack-Bot Implementation

This section covers implementation details regarding the attack-bot, set up on a node in RICS-el called the MitM Machine. The location of the attack-bot was chosen so that MitM attacks were possible. The MitM Machine had the operating system Ubuntu 16.04 and the MitM library by Maynard et al. installed. The MitM Machine was on the same LAN as the HMI, the OT-LAN, see Figure 4.1.

Figure 4.1: Network configuration in the experiment setup

To provide a usable interface, the following parameters were configurable on the attack-bot:

• What attack the attack-bot should generate. If no attack was specified, a random attack was selected.

• IP addresses of attack targets. If not specified, the default addresses of one RTU and the HMI in RICS-el were used.

• Duration of the attack in seconds. If not specified the attack-bot defaulted to generate attacks until terminated.

The complete command-line interface of the attack-bot is presented in Appendix A. This is the interface used by the ScenarioBot to initiate the generation of any attack implemented in the attack-bot.
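The three parameters and their defaults can be sketched as a small command-line parser. This is not the interface from Appendix A: the option names, the attack names and the placeholder addresses (taken from the documentation range 192.0.2.0/24) are assumptions made for illustration.

```python
import argparse
import random

# Hypothetical attack names, not the names used by the attack-bot.
ATTACKS = ["scan", "mitm", "replay", "dos", "desync", "injection"]

# Placeholder addresses standing in for the default RTU and HMI.
DEFAULT_TARGETS = ["192.0.2.10", "192.0.2.20"]


def parse_args(argv=None):
    """Parse the three parameters described above and apply their defaults."""
    parser = argparse.ArgumentParser(description="attack-bot interface sketch")
    parser.add_argument("--attack", choices=ATTACKS, default=None,
                        help="attack to generate (random if omitted)")
    parser.add_argument("--targets", nargs="+", default=DEFAULT_TARGETS,
                        help="IP addresses of the attack targets")
    parser.add_argument("--duration", type=float, default=None,
                        help="duration in seconds (until terminated if omitted)")
    args = parser.parse_args(argv)
    if args.attack is None:
        args.attack = random.choice(ATTACKS)  # no attack given: pick one
    return args


args = parse_args(["--attack", "replay", "--duration", "60"])
print(args.attack, args.targets, args.duration)
```

A duration of `None` stands for running until terminated, matching the default behaviour described above.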

To make the attack-bot portable and easy to deploy on another node or system it was created with Bash and Python3. The attacks were written in C, using the attack-code template in Appendix B.

4.2 Experiment Setup

This section gives an overview of the interactions between nodes involved in the dataset generation. The experiments were conducted at the lower two layers of RICS-el’s SCADA system, the so-called OT side, consisting of the OT-LAN, RTU and power grid. Nodes of interest were the power grid, ScenarioBot, HMI, RTU and a firewall between the HMI and RTU. The HMI and the power grid each provided an overview in which breakers, generators and voltage levels of the power grid could be seen. Both machines were running on Windows 10 and had Wireshark installed.

An addition made to RICS-el’s OT side was the attack-bot, tasked with generating attacks. The setup of the virtual environment is shown in Figure 4.2.

Figure 4.2: The attack-bot in RICS-el’s dataflow

Orange arrows represent normal IEC-104 traffic flow in the monitor and control directions. However, the attack-bot was able to re-route traffic to and from the HMI through ARP spoofing. This made traffic follow the red arrows between the HMI and firewall instead. Yellow arrows show which components the bots in RICS-el could control through a set of actions using Secure Shell (SSH). These arrows are one-directional as no component generates a response. The blue arrows depict communication not using IEC-104 and where no attacks or bots were involved.

During experimentation, the ScenarioBot was used to create scenarios. In a scenario, an event could be that the attack-bot started generating attacks or that the OT-Bot issued a control command. Each scenario was tested at least once in RICS-el, running for 30 minutes with attacks active for large portions of the scenario.

4.3 Dataset Generation Workflow

This section covers the workflow of the dataset generation, an iterative process of six steps. Each step is detailed in its respective section. The workflow of dataset creation is shown in Figure 4.3.

Figure 4.3: The dataset generation workflow

All scenarios were arranged in a queue for processing. During script preparation, the scripts of the ScenarioBot and attack-bot were updated to suit the first scenario in the queue. The update included the following actions:

• Attacks were added to or adjusted in the attack-bot.

• New actions were added to the ScenarioBot if required.

When an attack was ready to be tested, the ScenarioBot and OT-Bot were used for system configuration and scheduling of events as follows:

• The ScenarioBot readied the system for a scenario.

• Events were scheduled in the ScenarioBot.

• IEC-104 commands were scheduled in the OT-Bot.

The ScenarioBot was used to launch the scheduled events, creating a scenario. One such event could be starting the attack-bot, which would then begin generating one or more attacks.

When a scenario finished, data was collected, forming a dataset. The dataset was then processed before being evaluated. After the evaluation step, a new iteration of the workflow was executed with the next scenario in the queue.
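The iteration over the scenario queue can be sketched as a simple loop. This is only an outline under stated assumptions: `run_workflow` and the six callables passed to it are hypothetical names standing in for the six workflow steps, and no part of it is taken from the actual bot scripts.

```python
from collections import deque


def run_workflow(scenarios, prepare, configure, run, collect, process, evaluate):
    """Process queued scenarios one at a time through the six workflow steps."""
    queue = deque(scenarios)
    results = []
    while queue:
        scenario = queue.popleft()      # first scenario in the queue
        prepare(scenario)               # 1. script preparation
        configure(scenario)             # 2. system configuration
        run(scenario)                   # 3. running the attack scenario
        raw = collect(scenario)         # 4. data collection
        dataset = process(raw)          # 5. data processing
        verdict = evaluate(dataset)     # 6. data evaluation
        results.append((scenario, verdict))
    return results
```

Passing the steps in as functions keeps the loop independent of how each step is carried out, so a step can be replaced without touching the iteration itself.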

4.3.1 Script Preparation

In the first iteration of each attack scenario, an attack script, executable by the attack-bot, was created to suit the attack scenario. To add a new attack to the attack-bot, a script detailing the function call to launch the attack had to be added to the attack-bot’s source code. When the attack-bot had a new attack implemented, an action was added to the ScenarioBot to launch the attack. This was done by specifying what argument the ScenarioBot needed to launch the attack-bot with to start the new attack.

If the data evaluation step of the dataset generation workflow showed that the attack needed to be altered, that was done during this step too. Examples of alterations were, but were not limited to, updated attack timings or altered payload modification values.

A configuration file stated which IP addresses the attack-bot should target. If targets changed between attack scenarios, this file needed to be updated.
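The format of this configuration file is not described here, so the following is only a sketch under an assumed format: a JSON document with a hypothetical `targets` key holding a list of addresses. It shows how such a file could be read and each address validated before an attack is launched.

```python
import ipaddress
import json


def load_targets(text):
    """Return the target addresses from an (assumed) JSON configuration.

    Raises ValueError if any entry is not a valid IPv4 or IPv6 address.
    """
    config = json.loads(text)
    # The "targets" key is a hypothetical name chosen for this sketch.
    return [str(ipaddress.ip_address(addr)) for addr in config["targets"]]
```

Validating the addresses when the file is loaded means that a mistyped target is reported before the scenario starts rather than partway through a recording.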

4.3.2 System Configuration

The system configuration was an action scheduled at the start of each scenario. The action was a reset command, called system state reset, issued by the ScenarioBot to stabilize the system. It reverted the system to normal operation after an attack scenario, readying it for additional experiments. The system state reset included the following steps:

• Shutting down the attack-bot’s attack generation.

• Setting process values in the internal databases to predefined values. This was to restore the power grid to a given, operational state.

• Unifying and synchronizing the internal databases.

• Enabling incoming connection requests on the involved actors (RTU, HMI).

• Resetting communication between RTU and HMI. This was caused indirectly when the power grid was restored.

The reset temporarily stopped the IEC-104 traffic. When the IEC-104 traffic resumed, the system was stable and ready for use. According to observations made during resets, it took about one minute for RICS-el to become stable again.
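Since resumed IEC-104 traffic indicated a stable system, waiting for the reset to complete can be sketched as polling with a timeout. The function below is an illustration only: `traffic_seen` is a hypothetical callable that reports whether IEC-104 traffic has been observed again, and the default timeout of 120 seconds is an assumption chosen to leave margin over the observed one minute.

```python
import time


def wait_until_stable(traffic_seen, timeout=120.0, poll=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until traffic_seen() is true or the timeout expires.

    Returns True if traffic resumed in time, otherwise False.
    The clock and sleep parameters exist so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if traffic_seen():
            return True
        sleep(poll)
    return False
```

A scenario would only proceed to the recording step once this returns `True`; a `False` result suggests that the reset did not complete as expected.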

4.3.3 Running an Attack-Scenario

Actions during a scenario were initiated remotely from the ScenarioBot. The ScenarioBot launched actions by establishing SSH connections to the different nodes where actions needed to be executed. When connected to a node, scripts were started using a command-line interface, triggering the actions.
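Launching a script on a remote node in this way can be sketched as building an `ssh` command line. The helper below is a hypothetical example, not the ScenarioBot's code: the user name, host address and script path in the usage note are placeholders, and the sketch only constructs the command rather than executing it.

```python
import shlex


def build_ssh_command(user, host, script, args=()):
    """Build the argument list for running a script on a remote node via ssh."""
    # Quote the remote command so arguments containing spaces survive the remote shell.
    remote_command = shlex.join([script, *args])
    return ["ssh", f"{user}@{host}", remote_command]
```

For example, `build_ssh_command("bot", "192.0.2.30", "./attack_bot.sh", ["--attack", "replay"])` yields a list that could be passed to `subprocess.run(..., check=True)` to trigger the action on the node.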

The ScenarioBot used the following actions in each scenario:

1. System state reset: RICS-el became stable and ready for a new scenario.

2. Start recording: This action started a recording of all network traffic monitored in the firewall.

3. Launch the OT-Bot: Through this action, the ScenarioBot told the OT-Bot to load and execute a set of commands, such as opening or closing breakers or changing the generator setpoints.

4. Start an attack: This action made the attack-bot launch an attack from its location in the OT-LAN. The attack to launch could be specified before the scenario started.
