Investigating Security Issues in Industrial IoT: A Systematic Literature Review

Västerås, Sweden

Thesis for the Degree of Master of Science in Computer Science with Specialization in Software Engineering, 15.0 credits

Vasilije Milinic

vmc20002@student.mdh.se

Examiner: Sasikumar Punnekkat

Mälardalen University, Västerås, Sweden

Supervisors: Maryam Vahabi, Abu Naser Masud

Mälardalen University, Västerås, Sweden

Abstract

The use of Internet-of-Things (IoT) makes it possible to interconnect Information Technology (IT) and Operational Technology (OT) into a completely new system. This convergence is often known as Industrial IoT (IIoT). IIoT brings many benefits to industrial assets, such as improved efficiency and productivity, reduced cost, and a reduction of human error. However, the high inter-connectivity opens new possibilities for cyber incidents. These incidents can cause major damage, such as halting production on a manufacturing line, or catastrophic havoc for companies, communities, and countries in the form of power outages, floods, and fuel shortages. It is important that such incidents are predicted, stopped, or alleviated at any cost. Moreover, these incidents are a strong motivation for researchers and practitioners to investigate known security problems and find potential mitigation strategies.

In this thesis, we identify what types of IIoT systems have been investigated in the literature. We examine whether software-related issues can lead to security problems, and we survey the methods proposed to mitigate the security threats. We employ the systematic literature review (SLR) methodology to collect this information. The results are gathered from papers published in the last five years and show an increased interest in research in this domain. We find that software vulnerabilities are a concern for IIoT systems, mainly firmware vulnerabilities and buffer overflows, and that many attacks can cause damage, most commonly injection and DDoS attacks. Many different solutions have been proposed to address the identified problems, and we summarize them. Furthermore, we identify a research gap concerning the update process in these systems and devices, as well as a problem with the unsupervised software supply chain.

Keywords: Internet-of-Things (IoT), Industrial Internet of Things (IIoT), Systematic Literature Review (SLR), security, vulnerabilities, attacks, mitigation

Acknowledgements

I would like to express my deepest gratitude to my supervisors: first of all to Maryam Vahabi, who gave me the opportunity to investigate a topic I am interested in, and secondly to Abu Naser Masud, for his unforeseen participation in this project. Their guidance and advice at every milestone of this study supported me in completing the thesis. Thank you to all my friends who gave words of encouragement and advice when I needed them. Finally, and most important of all, my greatest gratitude goes to my family, who are my biggest motivation and stimulus.

List of Figures

1 Phases of the SLR process

2 Study selection process

3 Snowballing procedure. Source: adapted from [21]

4 Distribution of all papers which were considered in this study by year

5 Distribution of final selected papers by year

6 Distribution of final selected papers by database

7 Distribution of final selected papers by type

List of Tables

1 Distribution of studies by databases

2 Distribution of publications after title screening

3 Distribution of publications after abstract screening

4 Distribution of publications after full reading of texts

5 Distribution of publications throughout the steps of study selection

6 Data extraction form

7 Distribution of final selected papers by publication source

8 Systems classification for RQ1

9 Security problems classification for RQ3

10 Security problems with their origins

11 Main security problems and solutions summarized

12 Limitations proposed in the final papers

13 Final Included Studies

14 Synthesized data from the final studies

Acronyms

ADS Anomaly Detection System

APE Advanced Persistent Escaper

ARP Address Resolution Protocol

CPS Cyber Physical Systems

DDoS Distributed Denial-of-Service

DoS Denial-of-Service

FTP File Transfer Protocol

HMI Human-Machine Interface

ICS Industrial Control Systems

IDS Intrusion Detection System

IIoT Industrial Internet of Things

IoT Internet of Things

IT Information Technology

MAC Media Access Control

MES Manufacturing Execution System

MitM Man-in-the-Middle

ML Machine Learning

MTU Master Terminal Unit

OT Operational Technology

OWASP Open Web Application Security Project

PLC Programmable Logic Controller

RFID Radio-Frequency Identification

RTU Remote Terminal Unit

SCADA Supervisory Control and Data Acquisition

SDN Software-Defined Networking

SQL Structured Query Language

TCP Transmission Control Protocol

WSN Wireless Sensor Network

1 Introduction

Internet of Things (IoT) devices have been around for quite some time, and the concept of IoT was first introduced by a member of the RFID development community in 1999. In recent years, IoT devices have seen a spike in popularity, mainly because of the rise of mobile devices, embedded systems, and cloud computing [1]. Many definitions of IoT exist in the literature. IoT can be defined as everyday objects that can read and recognize data, and communicate with each other using numerous sensors which can be controlled over the Internet [1]. A more formal definition would be: "Group of infrastructures interconnecting connected objects and allowing their management, data mining and the access to the data they generate" [2].

As the term implies, Industrial Internet of Things (IIoT) is the use of IoT in the industrial domain. The overall goal of the IIoT system is to interconnect Industrial Control Systems (ICS) with enterprise systems and business processes. IIoT is often seen as the integration of Information Technology (IT) and Operational Technology (OT), where IT is the enterprise network and OT is the network in which the manufacturing process is taking place [3].

IoT devices have a diversity of uses and are deployed in homes, offices, cities, and even military settings. From environmental information such as levels of humidity and temperature to health statistics such as the heartbeat of a patient, these devices transfer and store different types of data. Depending on the application and the actual use of an IoT device, the data can be confidential [4]. Given that IoT devices are not very powerful and many of them are inexpensive, they are usually not designed with security in mind. For example, Alladi et al. [4] described how attackers managed to inject malicious code into Google's Nest Thermostat by exploiting vulnerabilities in the boot process. This attack allowed the installation of custom software, which ultimately made the device act as part of a botnet. Eventually, a compromised device would be the first step to gaining control of other devices in the network and to spying on the members of the household. When it comes to security, the authors of [5] state that the major goal for IoT is to ensure conventional authentication mechanisms and to keep data confidential.

Similar to IoT devices, their industrial counterparts can also be attacked and exploited. Panchal et al. [3] reported many kinds of attacks that are possible for IIoT, ranging from authentication and denial of service attacks to remote code execution. Since IIoT systems are connected to corporate networks which are remotely accessible through the Internet, there is always a risk of cyber threats. In the past, there have been serious cyber incidents on OT and industrial systems. To name a few, in 2010 there was Stuxnet, which is known as the world’s first digital weapon. Stuxnet was a sophisticated malware, capable of self-replication. It attacked specific hardware in the Natanz uranium-enrichment facility, damaging one-fifth of Iran’s nuclear centrifuges [6]. Around that time, there was also Night Dragon, a series of coordinated cyber-attacks targeted towards global oil, energy, and petrochemical companies. The Night Dragon attacks only stole valuable information; however, they showed how easy it is, using simple techniques, to take control of a Human-Machine Interface (HMI) in ICS [6]. More recently, Colonial Pipeline, responsible for carrying 45% of gasoline, jet, and diesel fuel for the USA’s East Coast, was targeted with a ransomware attack, which resulted in a complete shutdown and fuel shortages throughout the country [7].

Even though high inter-connectivity in industrial settings brings many benefits, such as reduced cost and increased productivity, it also leaves industrial systems vulnerable to cyber threats. If not treated properly, these threats can cause incidents like the halting of production processes, or environmental damage such as power outages, toxic release, floods, and so on. It is important to predict and stop incidents like these, as they have an extreme impact on employees, companies, communities, and countries. Moreover, the prevention of such incidents is a strong motivation for researchers to find potential flaws in IIoT systems, as well as to try to discover possible security solutions.

1.1 Problem formulation

In general, IoT devices are developed with very limited security considerations in mind, and as such are extremely prone to security incidents and attacks. Using these devices in industry provides new opportunities to improve efficiency, lower production costs, and reduce human error, but a vulnerable device can be a stepping stone for breaching an industrial network. Moreover, an outage in industrial settings where everything is automated, even one lasting milliseconds, can have serious consequences [8]. Software-related problems for IIoT are not widely summarized in the literature, and we want to see if software issues can be a problem for an overall industrial network. Researchers use different contexts when talking about IIoT, different systems are investigated, and many solutions exist. All of this needs to be synthesized so that other researchers can see what has been investigated and what is yet to be discovered.

This thesis does not plan to invent anything new. Instead, we focus on finding the security issues, vulnerabilities, and risks that have been investigated in the literature for Industrial IoT, together with the solutions or mitigation strategies proposed to overcome these problems. We will look into the types of systems that the researchers have used and the different downsides and benefits of the strategies they have proposed.

Overall goal: To gain knowledge about currently investigated risks and solutions for security related problems in Industrial IoT and possibly identify the research gaps in this domain.

To achieve our goal we will perform a systematic literature review (SLR). To successfully finish this project, we ask five different research questions which will guide the focus of the study and those are given in the next subsection.

1.2 Research questions

This subsection will give the research questions (RQs) which we will answer and explain the goals behind them. The research questions are as follows:

• RQ1: What are the IIoT systems that were studied or presented in literature?

• RQ2: Are those systems or parts of the study available to the open source community?

• RQ3: What are the security risks and vulnerabilities that were reported in the studied papers?

– RQ3.1: Which parts of the systems (e.g. software, firmware, network protocol, data, hardware) are the origins of the security threats and vulnerabilities?

• RQ4: What are the methods used to mitigate the security threats and vulnerabilities?

• RQ5: What are the limitations of the methods in RQ4 to mitigate the security threats and vulnerabilities?

RQ1 and RQ2 are related. Since there is no general IIoT system and researchers propose their own, we want to see what the major contributions and types of these systems are. Moreover, we want to see what systems have been used to find certain vulnerabilities and potential attack vectors. With RQ2 we want to see if researchers have published these systems to the open source community, and if other researchers can replicate their systems and studies to validate the security problems.

The other three research questions are also related. The main goal of RQ3 is to see what types of vulnerabilities exist in IIoT devices and systems. The terms vulnerabilities, risks, and attacks will sometimes be used interchangeably for this RQ, but they all address the same objective, which is to point out flaws in the investigated systems that potential adversaries can use to gain access to the system, cause damage, or threaten its security in any way. Generally, a vulnerability can be seen as a flaw or a weakness in a software system that an adversary can use to gain access to a system and perform actions reserved for highly privileged users. All systems have vulnerabilities; the only remaining question is whether they are exploited. We define risk as the probability that critical and sensitive information will be exposed to those who are not part of the system. An attack can be defined as a deliberate attempt to steal crucial information or resources of a particular IT system. For the sub-question of RQ3, we want to see where these risks originate in the mentioned systems. For example, there might be a code injection vulnerability in an IIoT device that can be used to gain unauthorized access to the system. The origin of this vulnerability could be in the operating system, the firmware of the device, or in the software development process where the software was not correctly tested and audited. We want to categorize these origins, as this will later help software and network engineers better understand the problems they face when developing and implementing these systems. Some studies will probably not mention the exact source of these security risks, and for those, we will infer where the risks originate based on common knowledge. Furthermore, for RQ4 we want to see what has been proposed to partially or fully mitigate these risks and attacks. Finally, for RQ5 we want to see what the downsides of the mentioned solutions are. For this question, we only consider the limitations that are clearly stated or can be logically inferred from the reports.

1.3 Thesis outline

The structure of this report follows the systematic literature review methodology. In this Section 1, we provide an introduction, formulate the problem to be addressed, and construct the research questions. Section 2 covers the basic background behind IIoT systems, common security problems, and the SLR process. In Section 3, the systematic literature review protocol is defined, with the phases of the SLR pointed out and explained in detail. In Section 4, the results of the study are presented and summarized. Section 5 discusses the results. Section 6 presents related work in this area and, finally, Section 7 concludes the study and summarises everything that was done.

2 Background

We have already provided some examples of the importance of security. This section gives more insight into security issues in software and networking. Security is a broad topic, and therefore it is not possible to cover all of its aspects in this thesis. In general, security can be seen as the ability of a system to protect data and information from "unauthorized access, while still providing access to people and systems that are authorized" [9]. In other words, security aims to stop anyone who is not a part of the system from using its resources. Three basic characteristics of a secure system in the IT domain are defined as: (i) confidentiality, (ii) integrity, and (iii) availability, also known as the CIA triad. Confidentiality is the ability to protect data from unauthorized access. Integrity means that data should not be exposed to manipulation, and availability indicates that a system must be available for legitimate use [9]. However, when it comes to the OT domain, safety and availability of hardware and software are the main priorities.

2.1 Vulnerabilities and attacks

This section gives an overview of different vulnerabilities and attacks which will be mentioned later in the report. The Open Web Application Security Project (OWASP) defines a vulnerability as a flaw in an application that allows an adversary to cause damage to the stakeholders of the application [10]. Stakeholders of an application, or any system in general, are the people that rely and depend on the application, such as its users, owners, developers, maintainers, sponsors, etc. Therefore, if a vulnerability is exploited to harm a system or its resources, these people would feel the damage. There are many reasons why an attacker might want to exploit a certain vulnerability. Usually, vulnerabilities are exploited to access a system without privilege or to make a system unresponsive. OWASP maintains different lists of popular vulnerabilities based on the platform. For example, they have a list of Top 10 vulnerabilities for web applications, mobile applications, cloud, etc. Correspondingly, they publish a list of Top 10 vulnerabilities for IoT [11]; the latest available one is from 2018. The list includes vulnerabilities like weak, guessable, or hardcoded passwords, lack of a secure update mechanism, and insecure default settings.

Apart from these, it is important to mention some other popular vulnerabilities such as buffer overflow, Cross-Site Scripting (XSS), SQL injection, zero-day, and directory traversal. All of the vulnerabilities and attacks mentioned in this section are described in [10] and [12]. A buffer overflow occurs when a program tries to put more information in a buffer than it is designed to hold. A buffer overflow can cause a system or a program to crash, or it can create an initial opportunity for an attacker to penetrate the system. The programming languages C and C++ are more susceptible to this vulnerability as they do not have memory protection built in by default. XSS is a web application vulnerability, classified as an injection attack, where malicious scripts are injected into a trusted website. With this attack, session cookies and tokens can be stolen, which ultimately leads to account hijacking. SQL injection is similar to XSS. In this situation, an attacker tries to inject an SQL query via some input on a client application. If the input is not properly validated and filtered, an attacker can gain access to the database and read and write to it without any additional privilege [10]. Directory traversal is a type of vulnerability where arbitrary files on a system can be read without higher privilege. These files can include operating system files, application code, and secret credentials [12]. Vulnerabilities are also possible in firmware. Firmware is software embedded in the hardware of a device to provide low-level control for that specific hardware. Since firmware is typically stored in read-only or other non-volatile memory, it is hard to change without using special programs. This means that vulnerabilities must be treated correctly during firmware development to avoid leaving an entry point for an attack.
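The SQL injection described above can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module with an in-memory database; the table, the user names, and the two lookup functions are hypothetical and chosen only for illustration. The first function concatenates client input into the query text, while the second passes it as a bound parameter, which is one common way of preventing the injection.

```python
import sqlite3


def build_db():
    # Hypothetical in-memory database with two users and their secrets.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL text itself.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Safer: the input is bound as a parameter and is never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = build_db()
payload = "nobody' OR '1'='1"  # input crafted to make the WHERE clause always true

unsafe_rows = lookup_unsafe(conn, payload)  # leaks the secrets of every user
safe_rows = lookup_safe(conn, payload)      # no user has this literal name
print(len(unsafe_rows), len(safe_rows))
```

With the crafted input, the concatenated query returns every row, whereas the parameterized query treats the whole input as a user name and returns nothing.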

Simply explained, a zero-day vulnerability is any type of vulnerability that is unknown to the public and, more importantly, to the developers and vendors of a system who are responsible for fixing it. A zero-day attack is an attack where a zero-day vulnerability is exploited. As long as the vulnerability is unknown to the public, anti-virus and other types of software used to detect malware are not able to detect this attack, and there is no defense against it. As a matter of fact, four zero-day vulnerabilities were used in the well-known Stuxnet worm [13].

Some of the most popular attacks on IT systems include Denial-of-Service (DoS) and Man-in-the-Middle (MitM) attacks [12]. DoS, as the name suggests, is a type of attack in which attackers overload the resources of a particular system to make it unresponsive. Distributed Denial-of-Service (DDoS) is the same attack but on a much larger scale. DDoS attacks typically involve large botnets which are controlled by a remote attacker. These botnets are essentially groups of infected computers, IoT devices, or any other type of device that can be connected to a network.

Problems are also possible in network protocols. In the TCP protocol, a connection is established by performing what is known as a three-way handshake. In short, a client who wants to communicate sends a SYN message to a server, to let it know that it wants to connect. The server responds with a SYN-ACK message, confirming that the initial connection request was received and that the server wants to communicate back. After that, the client sends an ACK message, completing the three-way handshake, and data can start flowing between these two points in a network [14]. One example of a DoS attack (known as a SYN attack) is that the client never sends the final ACK message to complete the handshake, leaving the server waiting and, if done on a large scale, making it unresponsive to real requests. In MitM, an attacker tries to position himself between two points of communication, relaying and altering the traffic and thus acting as a man in the middle. If done correctly, an attacker can see all information exchanged between the two points in a network. This attack is often performed by poisoning the Address Resolution Protocol (ARP) cache, which associates the attacker’s MAC address with the IP address of a legitimate machine, making the sender think that the data is going to a legitimate computer while in reality it is going to the attacker.
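The handshake and the SYN attack described above can be sketched with a toy model. This is not a real TCP stack: the class ToyServer, its backlog_size limit, and the client names are hypothetical, and real systems additionally use timeouts and countermeasures that are left out here. The sketch only shows how half-open connections that are never completed can exhaust a fixed-size queue.

```python
class ToyServer:
    """Toy model of a server's half-open connection queue (not a real TCP stack)."""

    def __init__(self, backlog_size):
        self.backlog_size = backlog_size  # hypothetical limit on half-open connections
        self.half_open = set()            # clients that sent SYN but no final ACK yet
        self.established = set()          # clients that completed the handshake

    def on_syn(self, client):
        # Steps 1 and 2: the client sends SYN, the server answers with SYN-ACK
        # only if it still has room to remember the half-open connection.
        if len(self.half_open) >= self.backlog_size:
            return "DROPPED"
        self.half_open.add(client)
        return "SYN-ACK"

    def on_ack(self, client):
        # Step 3: the final ACK completes the handshake and frees the slot.
        if client in self.half_open:
            self.half_open.remove(client)
            self.established.add(client)
            return "ESTABLISHED"
        return "IGNORED"


server = ToyServer(backlog_size=3)

# A legitimate client completes all three steps.
normal = [server.on_syn("client-A"), server.on_ack("client-A")]

# SYN attack: senders start the handshake but never send the final ACK.
for i in range(3):
    server.on_syn(f"spoofed-{i}")

# A second legitimate client now finds the queue full.
blocked = server.on_syn("client-B")
print(normal, blocked)
```

In this model the legitimate client is refused only because the attacker's unfinished handshakes occupy every slot, which is the effect the SYN attack relies on.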

2.2 Industrial Internet-of-Things

In Section 1, we briefly touched upon IIoT systems. This section gives a more general overview of what these systems consist of. The terms IoT and IIoT are similar, but they cannot be used interchangeably. Most of the time, the term IoT refers to consumer IoT, which is "human-centered". IoT communication can be seen as machine-to-user, where "the things are smart consumer electronic devices interconnected with each other in order to improve human awareness of the surrounding environment, saving time and money" [15]. In contrast, IIoT aims to merge IT and OT, and it can be regarded as machine-to-machine. The general use of IT is the manipulation of data, usually in a business environment, using computers, networking, and other physical devices. On the other hand, as the name suggests, OT is used to control industrial operations [16]. Supervisory Control and Data Acquisition (SCADA) systems are one example of OT, and they are the ones most often addressed in the literature. SCADA systems use software and hardware components to control industrial processes, monitor real data, and interconnect different sensors, pumps, and motors [17]. Critical systems like power delivery, industrial manufacturing, and gas transportation all use SCADA. The main components of a SCADA system include Programmable Logic Controllers (PLCs), Remote Terminal Units (RTUs), and HMIs. PLCs and RTUs are small special-purpose computers that are designed and programmed to control certain processes in industry. HMIs are screen-like dashboards, usually touch screens, that are used to control, monitor, and communicate with machines and computer programs. An HMI can be any screen that a person uses to interact with a device, but the term usually refers to the industrial domain. If any of these devices has software vulnerabilities, such a problem can be a starting point for a total system takedown. These devices can be considered a type of IIoT [18].

2.3 Systematic Literature Review

This section gives an overview of the SLR methodology, which follows the guidelines outlined by Kitchenham and Charters in [19]. There are two different types of research studies: primary and secondary. Primary studies address specific issues and try to answer particular research questions; they typically employ scientific procedures like experiments or case studies. Secondary studies focus on reviewing and synthesizing existing knowledge to provide evidence and answer specific research questions while identifying research gaps. A Systematic Literature Review, often called a systematic review, is a type of secondary study. SLR studies apply an unambiguous and distinct protocol to identify and evaluate existing literature, with the goal of answering specific research question(s). The SLR process should be unbiased, transparent, and reproducible [19]. A systematic review protocol, as the name suggests, is a plan which outlines how a certain study will be carried out. Each SLR starts with a protocol, and in Section 3 we present ours and give a detailed explanation of how this SLR study was performed. Kitchenham and Charters [19] gave the following three reasons why a researcher might want to carry out a systematic literature review:

• To make a summary of existing evidence

• To point out current research gaps

• To provide background for new research

Moreover, Kitchenham and Charters [19] argue that most research begins with some sort of literature review. If we want to find answers on a certain topic, the natural first step is to look at the currently published books and papers which provide evidence and knowledge. Still, a high number of literature reviews are not systematic, as they do not follow a specific strategy and do not report the methodology in detail. However, being systematic requires a lot more effort, and this is what [19] describes as a disadvantage of SLR. In contrast, Kitchenham and Charters say that the advantage of having a well-defined, predefined methodology is that the outcomes of the study are unlikely to be biased.

There are three big stages in the SLR process:

• Planning the review, where researchers seek to identify why a review is needed, specify research questions, and develop and evaluate a review protocol.

• Conducting the review, where researchers perform the tasks that were outlined in the previous stage. These tasks include selecting primary studies, evaluating their quality, extracting relevant data, and analyzing it.

• Reporting the review, where everything that was performed in the previous stage needs to be written down and evaluated.

3 Research methodology

This section presents the SLR methodology, which is carried out by following the guidelines in [19] and [20]. We try to be as detailed as possible when explaining the research protocol and all of its steps so that the study can be replicated. All SLRs have a defined protocol that needs to be followed. Ours is no different, and it includes:

• Formulation of research questions

• Explanation of the search process

• Selection of inclusion and exclusion criteria

• Quality assessment

• Data extraction

• Collection and analysis of data

The following subsections further explain these key steps of the review protocol. We have already given some background information on the three main phases of the SLR process. Figure 1 visualizes the overall SLR process with those three main phases.

Figure 1: Phases of the SLR process

3.1 Search process

After defining the research questions in Section 1.2, the next step is to conduct the search process and find published studies that will help answer the research questions. The initial step here is to define a search string that will be used for an online search in digital databases. A search string is a set of keywords that, together with Boolean operators, forms a query that will find relevant studies. In this thesis, we focus on three digital libraries: IEEE Xplore Digital Library (https://ieeexplore.ieee.org/), ACM Digital Library (https://dl.acm.org/), and Science Direct (https://www.sciencedirect.com/). These libraries were chosen based on the suggestions in [19] as most relevant for software engineering. Given the time constraints of the thesis project, it was not possible to cover other databases that would probably contain relevant studies, such as SpringerLink, Google Scholar, and Scopus.

Our search string represents a set of keywords that were identified from the research questions as most representative. Since most researchers use the abbreviation IIoT for Industrial Internet of Things, we use it as the keyword that restricts the results to papers on industrial IoT. The words vulnerabilities and risks are used as synonyms. Considering that this thesis is specific to the software engineering domain, we mostly want to focus on software-related problems for IIoT and potentially find software-based solutions. In addition, it would be beneficial for researchers to see how closely software-related problems are tied to network problems. For this reason, the keywords software and networking, even though not clearly present in the research questions, are put in the search string to limit the resulting papers to these two areas. Certainly, there are other domains where potential security issues may arise for IIoT, such as physical security, but with the limited time for this thesis, it is not feasible to cover all issues. Even though in RQ2 we ask if the researchers open source their systems, we did not put this keyword in the search string, because it would greatly limit the number of papers, and because that RQ can be answered in a binary fashion. The same applies to RQ5. The search string used for this study is given below:

iiot AND security AND (vulnerabilities OR vulnerability OR risk OR risks) AND (software OR networking)
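The Boolean structure of the search string above can be sketched as a small predicate. This is only an illustration of the query logic, assuming simple case-insensitive whole-word matching on a metadata string; the function names and sample titles are hypothetical, and the actual search engines of the digital libraries may tokenize or match terms differently.

```python
import re


def tokens(text):
    # Lower-case the text and split it into alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_search_string(metadata):
    # iiot AND security AND (vulnerabilities OR vulnerability OR risk OR risks)
    # AND (software OR networking)
    words = tokens(metadata)
    return (
        "iiot" in words
        and "security" in words
        and bool(words & {"vulnerabilities", "vulnerability", "risk", "risks"})
        and bool(words & {"software", "networking"})
    )


# Hypothetical metadata strings, not taken from the reviewed papers.
samples = [
    "Security risks in IIoT software update mechanisms",
    "IIoT security overview",
    "Industrial IoT security vulnerabilities in networking",
]
results = [matches_search_string(s) for s in samples]
print(results)
```

The third sample shows the effect of relying on the abbreviation: a record that spells out "Industrial IoT" without using "IIoT" anywhere in its metadata would not match under this strict reading of the query.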

Besides the search string, we want to limit our investigation to peer-reviewed publications, so we filter the results to include conference papers and journal articles. Furthermore, we want to see how well this topic was researched over roughly the last decade, so we filter the results by the year range 2010-2021. The same search string was applied to all three databases. On IEEE Xplore, we filtered the results to Conferences and Journals. On Science Direct, we filtered the results by article type to research article and by subject area to computer science. On ACM, we filtered the publications to Proceedings and Journals.

After querying the digital libraries with the given search string and applying the mentioned filters, each library returned a different number of papers. Table 1 shows the distribution of identified studies across the digital libraries. The IEEE database returned 58 papers, while the ACM and Science Direct databases returned 77 and 184 papers, respectively. All of these 319 papers go to the next stage, where we need to select the appropriate ones. It is important to mention that the search string was applied to all metadata. For example, it is possible that some papers have a keyword in the reference list and never mention that term in the text. All databases have advanced search options to query only by title or by abstract, but this would substantially limit the number of returned results. After identifying the publications, the next step is to apply the selection criteria, which are covered in the next subsection.

Name                         | URL                            | Number of studies
IEEE Xplore Digital Library  | https://ieeexplore.ieee.org/   | 58
ACM Digital Library          | https://dl.acm.org/            | 77
Science Direct               | https://www.sciencedirect.com/ | 184

Table 1: Distribution of studies by databases
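The total in Table 1 can be checked with a few lines of arithmetic. The counts below are the ones reported in the table; the percentage shares are derived here only for illustration and are not stated in the original text.

```python
# Number of returned studies per digital library, as listed in Table 1.
counts = {"IEEE": 58, "ACM": 77, "Science Direct": 184}

total = sum(counts.values())  # papers that enter the study selection stage

# Share of each library in the initial result set, in percent (derived value).
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)
print(shares)
```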

3.2 Study selection

Study selection is a major step in the SLR process. In this step, we want to filter the publications that got returned in the previous step when we queried digital databases. In addition, we only want to focus on papers that are in scope to answer our research questions. Inclusion and exclusion criteria are as follows.

Inclusion criteria

• Studies proposing security risks and vulnerabilities in IIoT

• Studies with applicable mitigation strategies or security solutions

• Studies available in full-text and written in English

Exclusion criteria

• Existing secondary studies including other surveys, taxonomies, systematic maps, and literature reviews

• Short papers < 4 pages

• Papers that do not mention risks related to software or networking

• Papers that are too generic and only talk about security in general, meaning that they do not mention any vulnerability and attack, or papers that give common solutions, e.g., deploying a basic firewall on the network

• Books and gray literature

Our criteria are defined in such a way as to limit the selection to publications that can specifically answer the research questions. We focus on studies that propose security vulnerabilities in IIoT systems and papers that suggest relevant mitigation strategies, as these two criteria will answer RQ3 and RQ4. It is possible that a publication only mentions certain vulnerabilities, without going into much detail on their origins or causes, but gives a good mitigation strategy; we include these papers as well, as long as the mentioned vulnerability or risk is related to software or networking. Some studies might not have a specific IIoT system, but concerning RQ1, we can answer this question by saying that researchers have used a generic IIoT system. Moreover, for RQ1, we only include a study if the paper specifically talks about an IIoT system. A lot of papers just mention IIoT along the way, for example, in the introduction, but then talk about something else, e.g., Digital Twin. Since we applied the search string to all metadata, it is possible that some of the returned results are not specific to IIoT, and these papers are excluded. There are a few existing surveys and reviews for the IIoT domain, which will be presented in Section 6, but we do not want to focus on such studies as they would introduce bias. In addition, we exclude papers which are too generic, meaning they do not bring in any specific vulnerabilities or solutions, but rather talk in general about security for IIoT.

The inclusion and exclusion criteria are first applied to paper titles. In this step, some papers can be excluded because the title already shows that, e.g., a paper is an existing survey or taxonomy. Next, the criteria are applied to the abstracts of the remaining studies. Like the previous step, this step substantially reduces the number of papers. Papers that remain after this step are read in full text and the criteria are applied again. In the end, we are left with the final set of papers that will be used in the study. Figure 2 visually presents this study selection process.
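The staged screening described above can be sketched in Python as follows. This is an illustrative sketch rather than a tool used in the study: the `judge` callable stands in for the human reviewer, the `Verdict` values and the `screen` function are hypothetical names, and the rule that an uncertain paper is passed on to the next stage (except at the final full-text stage) is an assumption about how the screening is applied.

```python
from enum import Enum


class Verdict(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"
    UNSURE = "unsure"


# Screening order: titles first, then abstracts, then full texts.
STAGES = ("title", "abstract", "full_text")


def screen(paper: dict, judge):
    """Return (accepted, stage_at_which_rejected).

    `judge(paper, stage)` is a hypothetical stand-in for the reviewer's
    decision against the inclusion and exclusion criteria.
    """
    for stage in STAGES:
        verdict = judge(paper, stage)
        is_last = stage == STAGES[-1]
        # Early stages only drop clear exclusions; doubtful papers move on.
        # The full-text stage must end in an explicit INCLUDE.
        if verdict is Verdict.EXCLUDE or (is_last and verdict is not Verdict.INCLUDE):
            return False, stage
    return True, None
```

Recording the stage at which a paper is rejected makes it straightforward to reproduce per-stage counts such as those reported for title, abstract, and full-text screening.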

To find any further relevant studies that might help us answer the RQs, we perform snowballing. We follow the guidelines from Wohlin [21], who says that snowballing is done to identify additional papers by looking into the reference lists of existing papers and into the papers that cite them. Snowballing is performed on the papers that have made it to the final stage of selection. Both backward and forward snowballing are performed. Backward snowballing is a process where we look into the reference lists of the final selected papers and identify potential papers that might be relevant to our study. Inclusion and exclusion criteria are also applied to these papers. Forward snowballing is a process where we find the papers which have cited our final selected papers and try to identify whether the citing papers are relevant to our study. If any of the backward or forward snowballed papers are relevant and able to answer our RQs, we include them. Figure 3 represents the snowballing procedure which is given in [21].
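As an illustration of the two directions, the sketch below performs a single backward and forward pass over a small citation graph. The function name, the two lookup tables (`references`, `cited_by`) and the `is_relevant` predicate are assumptions made for this example; in the study itself these steps were carried out manually, and the predicate stands in for applying the inclusion and exclusion criteria.

```python
def snowball(selected: set, references: dict, cited_by: dict, is_relevant):
    """One snowballing pass over the final selected papers.

    references[p] -> papers that p cites (used for backward snowballing)
    cited_by[p]   -> papers that cite p (used for forward snowballing)
    is_relevant   -> hypothetical stand-in for the selection criteria
    """
    backward_pool = set()
    forward_pool = set()
    for paper in selected:
        backward_pool.update(references.get(paper, ()))
        forward_pool.update(cited_by.get(paper, ()))

    # Papers already in the final set are not candidates again.
    backward = {p for p in backward_pool - selected if is_relevant(p)}
    # A paper found in both directions is counted once, as backward.
    forward = {p for p in forward_pool - selected - backward if is_relevant(p)}
    return backward, forward
```

A fuller procedure could repeat the pass on newly included papers until no new candidates appear; the single pass shown here is a simplifying assumption.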

Figure 2: Study selection process

Figure 3: Snowballing procedure Source: Adapted from [21]

The total initial number of publications when we combine all databases was 319. After going through all the publications and reading just the titles, 39 papers were excluded. Table 2 gives an overview of publications after title screening.

Database         Accepted   Rejected
IEEE             51         7
ACM              67         10
Science Direct   162        22

Table 2: Distribution of publications after title screening

After title screening, abstract screening excluded 167 publications. It is important to mention that we were conservative when reading the abstracts of papers, meaning that if there was any doubt about whether a paper should be excluded, we sent that paper to the next stage. We employed this approach in title screening as well. If it was very clear from the abstract that the paper was unable to answer our RQs and did not meet the inclusion criteria, that paper did not go to the next stage. Kitchenham [19] notes that studies relevant to software engineering and the IT domain tend to have poor abstracts and that researchers should also review the conclusions of a study if uncertain whether to include or exclude it. We took the same approach, and read the conclusions of the papers that were excluded after the abstract screening, just to be certain. The same approach was used throughout the snowballing procedure. Table 3 shows the distribution of publications after abstract screening.

Database         Accepted   Rejected
IEEE             35         16
ACM              30         37
Science Direct   48         114

Table 3: Distribution of publications after abstract screening

After abstract screening, and after reading the full texts of the remaining papers, the final set of publications counts 24 studies. There were no duplicates. A total of 113 full texts were read and 89 publications were excluded in this stage. Table 4 shows the distribution of publications after reading full texts.

Database         Accepted   Rejected
IEEE             14         21
ACM              5          25
Science Direct   5          43

Table 4: Distribution of publications after full reading of texts

The final set of studies which we found in the digital databases counts 24 publications. After applying the snowballing procedure to the final included papers, 5 more papers were identified as relevant. Table 5 gives an overview of the entire study selection process and the number of papers that remained after each step. With forward and backward snowballing we found 2 and 3 studies respectively.

Stage   Activity                                                    # of publications
0       Query the databases and collect the results                 319
1       Screen the titles and apply inclusion/exclusion criteria    280
2       Read the abstracts and apply inclusion/exclusion criteria   113
3       Read full texts and apply inclusion/exclusion criteria      24
4       Perform forward and backward snowballing                    5
5       Final set of relevant publications                          29

Table 5: Distribution of publications throughout the steps of study selection
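The per-stage totals can be cross-checked against the per-database counts reported in Tables 1 to 4. The short Python sketch below does this arithmetic; the numbers are copied from those tables, while the variable and function names are only illustrative.

```python
# Papers returned per database (Table 1).
initial = {"IEEE": 58, "ACM": 77, "Science Direct": 184}

# Papers rejected per database at each screening stage (Tables 2 to 4).
rejected = {
    "title": {"IEEE": 7, "ACM": 10, "Science Direct": 22},
    "abstract": {"IEEE": 16, "ACM": 37, "Science Direct": 114},
    "full_text": {"IEEE": 21, "ACM": 25, "Science Direct": 43},
}

snowballed = 5  # papers added by forward and backward snowballing


def funnel(initial: dict, rejected: dict):
    """Return the running totals after each stage and the final per-database counts."""
    remaining = dict(initial)
    totals = [sum(remaining.values())]
    for stage in ("title", "abstract", "full_text"):
        for database, count in rejected[stage].items():
            remaining[database] -= count
        totals.append(sum(remaining.values()))
    return totals, remaining


totals, per_database = funnel(initial, rejected)
final_total = totals[-1] + snowballed
```

Here `totals` holds the counts for stages 0 to 3 of Table 5, and `final_total` corresponds to stage 5.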

There were a lot of studies that talk about IoT security and mention IIoT in just a few sentences. Even though industrial IoT is a subset of IoT, and a lot of security-related problems and vulnerabilities are inherited, we only wanted to focus on papers that specifically mention issues for industry-specific devices and systems. Going through all papers that mention IoT, or including the IoT keyword in the search string, would yield a high number of papers, which would not be feasible to cover entirely.

3.3 Data extraction and quality assessment

After concluding the study selection phase, the last step is to extract the data from the final selected papers. This data should help us answer the research questions and will be analyzed to gather findings. To store all data in one place and later point out which papers were considered, we use an Excel workbook. To make the study transparent, we provide the original Excel file which is used for data collection (see footnote 4). To assess the quality of our studies, the following criteria were used, based on "yes" and "no" scoring:

• Is there a clear description on the aims/objectives of the research?

• Is the paper based on research?

• Is there an adequate description of the context in which the research was carried out?

• Were mitigation strategies well described?

• Is the study of value for research/academia or practice/industry?

The quality assessment criteria are based on and adapted from checklists in [22]. Besides these, papers that provide an answer to RQ5 can be considered to be of better quality, as they state their own limitations.
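A yes/no checklist of this kind can be recorded and summarized very simply. The Python sketch below is one possible way to do so; the criterion keys are shorthand labels invented here, and summing the "yes" answers into a single score is an assumption, since the text only specifies "yes" and "no" scoring.

```python
# Shorthand labels (hypothetical) for the five quality criteria listed above.
QUALITY_CRITERIA = (
    "clear_aims",
    "based_on_research",
    "adequate_context",
    "mitigation_described",
    "valuable",
)


def quality_score(answers: dict) -> int:
    """Count the criteria answered with 'yes' (True).

    Summing the answers is an assumed aggregation, not the study's stated method.
    """
    missing = [c for c in QUALITY_CRITERIA if c not in answers]
    if missing:
        raise ValueError(f"unanswered criteria: {missing}")
    return sum(1 for c in QUALITY_CRITERIA if answers[c])
```

For example, a paper answering "yes" to every criterion except the description of mitigation strategies would score 4 out of 5 under this assumed aggregation.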

As a first step, we put all papers that got returned from one of the databases in one spreadsheet and then go through each paper. Each database has its own spreadsheet with the papers from that particular database. Initial data that is put in each spreadsheet is:

• ID

• Authors

• Link to the paper

• Paper title

• Citation count

• Online date

• Type of paper

4 https://docs.google.com/spreadsheets/d/1d2QyP_CnVGNd2s1j_9br_yA3pBdplIZl5S2FWZsCIMw/edit?usp=sharing

This data is either entered manually or by exporting the search results to a CSV file and then importing that file into the Excel workbook. We already mentioned in Section 3.2 how we screened the papers, based on title, abstract, or full text. We color code the entire row in a spreadsheet green if the paper is selected to be included in the final study, or red if the paper is rejected. Using this color-coding technique, we can in the end filter only the papers that made it to the final stage. The data that was extracted from all final papers is:

1. Paper ID: Each database had their own spreadsheet and every paper had a unique ID. With this, it was easier to differentiate between databases.

2. Authors: Names of authors who wrote the publication.

3. Link: HTTP link to the publication in the digital database or to the PDF file containing the paper.

4. Paper title: It is important to include the title, as title screening is the first step in excluding non-relevant papers.

5. RQ1: Types of IIoT systems considered in the paper.

6. RQ2: Are the mentioned systems open sourced, or do they use some open-source software for experiments?

7. RQ3: Risks and vulnerabilities with their origins.

8. RQ4: Mitigation strategies that are offered.

9. RQ5: Clearly identifiable limitations which are mentioned by the authors.

10. Citation Count: Used for forward snowballing.

11. Online date: Year in which the paper was published.

12. Type of paper: Conference paper or journal article; in the end, we want to see their distribution.

13. Backward snowballing: Papers that had the possibility of being included for backward snowballing after going through titles of the final selected paper's references.

14. Forward snowballing: Papers that had the possibility of being included for forward snowballing after going through titles of the papers which cited final selected papers.

Table 6, given below, gives an example of the data that was extracted from the final selected papers. Kitchenham [19] explained that a lot of studies returned by a search may be totally irrelevant to the topic, and that researchers should record the reasons why a certain study is excluded, but only after discarding the totally irrelevant papers. In contrast, we included the reasons for exclusion for all studies which were considered, and this can be seen in the provided Excel file.

To perform snowballing on the final included papers, we went through the reference lists and citations of those papers. Each database provides the number of citations for each publication, with a list of the papers which cite the publication, and we manually went through these lists on each database. While performing snowballing, studies that were identified as candidates for inclusion were put in a separate spreadsheet. Not all studies that can be snowballed were put into this spreadsheet, as this was not necessary. A lot of papers that we went through while snowballing were excluded based on title, either because they are not relevant to the topic we are investigating or because they fall under some of the exclusion criteria. The final selected papers are all put together into one spreadsheet, where they are further analyzed, and this process along with the answers to the research questions is given in Section 4.

Data item          Value                                                             RQ
ID                 Integer, unique to every paper in the spreadsheet
Authors            Names of the authors
Link               HTTP link to PDF file of the paper
Title              Paper Title
System type?       e.g. testbed or generic system                                    RQ1
Open source?       Yes (how) or no                                                   RQ2
Vulnerabilities?   e.g. buffer overflow                                              RQ3
Mitigation?        e.g. intelligent penetration testing technique                    RQ4
Limitations?       e.g. poor performance                                             RQ5
Citations          Number of papers that have cited the study
Online date        Year of publication
Type of paper      Conference or journal
Backward           Reference number in the study e.g. [14] with link to the paper
Forward            Link to the potential paper that cited included study

Table 6: Data extraction form
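For readers who prefer to keep such a form in code rather than in a spreadsheet, the data extraction form could be modelled as a simple record type. The Python sketch below is an assumption-laden illustration: the class name, field names, and the `answer_for` helper are invented here, and the example values are placeholders built from the examples in the form, not data from any reviewed paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractionRecord:
    paper_id: int                 # unique within one database spreadsheet
    authors: str
    link: str
    title: str
    system_type: str              # RQ1, e.g. "testbed" or "generic system"
    open_source: str              # RQ2, "yes (how)" or "no"
    vulnerabilities: List[str]    # RQ3, e.g. ["buffer overflow"]
    mitigation: str               # RQ4
    limitations: str              # RQ5
    citations: int                # used for forward snowballing
    online_year: int
    paper_type: str               # "conference" or "journal"
    backward: List[str] = field(default_factory=list)  # backward snowballing candidates
    forward: List[str] = field(default_factory=list)   # forward snowballing candidates


# Which field answers which research question (hypothetical helper mapping).
RQ_FIELDS: Dict[str, str] = {
    "RQ1": "system_type",
    "RQ2": "open_source",
    "RQ3": "vulnerabilities",
    "RQ4": "mitigation",
    "RQ5": "limitations",
}


def answer_for(record: ExtractionRecord, rq: str):
    """Return the extracted value that answers the given research question."""
    return getattr(record, RQ_FIELDS[rq])


# Placeholder record; none of these values describe a real paper.
example = ExtractionRecord(
    paper_id=1,
    authors="A. Author, B. Author",
    link="https://example.org/paper.pdf",
    title="Placeholder title",
    system_type="testbed",
    open_source="no",
    vulnerabilities=["buffer overflow"],
    mitigation="intelligent penetration testing technique",
    limitations="poor performance",
    citations=0,
    online_year=2019,
    paper_type="conference",
)
```

Keeping the research-question mapping explicit makes it easy to pull, for instance, all RQ3 answers across the final papers when summarizing vulnerabilities.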

3.4 Evaluating the research protocol

The research protocol in every SLR should be evaluated by someone other than the author, to avoid bias. Kitchenham [19] stated that the protocol should be evaluated by external experts if funding is available. In this thesis project that is not feasible. Moreover, she said that PhD students should provide the research protocol to their supervisors for constructive criticism. In this thesis, we employ the same approach, as both supervisors have reviewed the research protocol and were constantly updated on the work.

One of the supervisors had a few suggestions for research questions and, based on those suggestions, we conducted a pilot study prior to our digital database search. The pilot study included 6 papers and was mostly based on the research questions that were asked in the actual study. These papers included a study on IoT security [23] which we found interesting. To find papers for the pilot study, we queried IEEE Xplore with a few different search strings, and five papers were selected at random in the end. One of them is [24], where a vulnerability scanner for IIoT is discussed, while the rest of the pilot study papers ([25], [26], [27], and [28]) ended up in the actual study because they were returned by the final search string and fulfilled the inclusion criteria. After the pilot study, we agreed on the research questions.

If there were any suggestions or comments on the work, during the study selection process or in any other step, the supervisors would point out their concerns. It was already explained how the search string was formed, and it is important to mention that it was formed in collaboration with the supervisors after examining a few possible alternatives. The same approach was used when deciding on inclusion and exclusion criteria.

4 Results

In this section, we give an overview of the final studies and answer our research questions.

4.1 Overview of the Studies

This section gives an overview of all included publications which are used for our investigation, in order to answer the research questions. A complete list of all final publications is available in Appendix A.

We previously stated in Section 3.1 that we filter the database results to only show papers that were published in the last 10 years (2010-2021), as we wanted to see how this topic has been investigated throughout the years. However, when we queried the IEEE Xplore database, the earliest publication year among all returned papers was 2017. If we search IEEE Xplore with the search string iiot AND security, the earliest paper is from 2015. The same situation occurs for the two other databases which were used to search for publications. This seems normal, taking into account that IIoT is a relatively new term. The study selection and data extraction process was performed in March/April 2021. On Science Direct, there were 184 initial papers; 58 of those were published in 2021, and 61 were published in 2020. On ACM Digital Library there were 77 initial papers, of which 32 and 25 were published in 2019 and 2020 respectively. Considering this, we can say that IIoT security has seen an increase in research in the last few years. Figure 4 visualizes this trend in a line chart, as it includes the distribution by year of all papers that were considered in this study.

Figure 4: Distribution of all papers which were considered in this study by year

In Figure 5 we can see the distribution by year of publication of the papers that made it to the final stage of selection. Three digital databases were considered when we searched for publications: IEEE Xplore Digital Library, ACM Digital Library, and Science Direct. IEEE seems to be the best source for this topic, as 14 papers were taken from that database and 5 each from ACM and Science Direct, while 5 papers were snowballed. Figure 6 visualizes the distribution per database for the final included papers. As already mentioned, we only focused on conference papers and journal articles, and their ratio can be seen in Figure 7. Table 7 presents a list of publication sources for all final papers. We can see that the papers come from many different sources; however, a larger portion of them are from major computer science journals.

Figure 5: Distribution of final selected papers by year

Figure 6: Distribution of final selected papers by database

Source                                                                                                      Number of publications   Study
IEEE Access                                                                                                 3                        [29], [30], [31]
2018 IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA)             1                        [32]
2018 IEEE International Conference on Industrial Internet (ICII)                                           1                        [33]
2019 IEEE 5th World Forum on Internet of Things (WF-IoT)                                                    1                        [25]
2019 International Conference on Applied Electronics (AE)                                                   1                        [34]
IEEE Internet of Things Journal                                                                             5                        [18], [26], [35], [36], [37]
IEEE Transactions on Emerging Topics in Computing                                                           1                        [38]
IEEE Transactions on Industrial Informatics                                                                 1                        [39]
2020 5th International Conference on Smart and Sustainable Technologies (SpliTech)                          1                        [28]
2019 IEEE International Conference on Industrial Cyber Physical Systems (ICPS)                              1                        [40]
2018 IEEE 13th International Symposium on Industrial Embedded Systems (SIES)                                1                        [41]
SenSys-ML 2019: Proceedings of the 1st Workshop on Machine Learning on Edge in Sensor Systems               1                        [42]
BDIOT 2019: Proceedings of the 3rd International Conference on Big Data and Internet of Things              1                        [43]
ARES '19: Proceedings of the 14th International Conference on Availability, Reliability and Security       1                        [44]
ACM Transactions on Internet Technology                                                                     1                        [45]
ACM Transactions on Embedded Computing Systems                                                              1                        [46]
IEEE Communications Magazine                                                                                1                        [47]
2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC)                                     1                        [27]
Computers & Security Journal                                                                                1                        [16]
Procedia Computer Science Journal                                                                           1                        [48]
Computer Communications                                                                                     1                        [49]
Information Processing & Management                                                                         1                        [50]
Journal of Information Security and Applications                                                            1                        [51]

Table 7: Publication sources of the final papers

4.2 Answers to the Research Questions

This section presents the results which were gathered from the final selected papers and answers our research questions.

4.2.1 Types of systems

In total, there are six types of systems that have been identified in the final studies. Most of the systems that were mentioned, observed, and used in the final studies were unspecified, meaning that the paper was not specific in explaining how the system works or looks. Eleven final studies do not include a specific system.

The second type of system that researchers have used to find specific problems related to IIoT is the testbed. Testbeds are used in experiments when researchers want to perform a repeatable and transparent study. Barreiros et al. [52] pointed out that in software engineering, testbeds are often used to evaluate software architecture. Moreover, they said that an advantage of a testbed is that replications are much faster to conduct once the testbed has been created and assembled. For instance, the authors in [48] used a real modular system as a testbed to show possible attack scenarios in smart manufacturing. They mentioned that it is often not possible, mainly due to costs and downtime, to have a real production system for research purposes. Furthermore, they stated that their testbed is a fine approximation of a real system. Their system is composed of different PLCs, HMIs, actuators, and Arduino-based sensors. Similarly, the authors in [26] used an IIoT testbed of a water storage tank that monitors turbidity and water levels. This testbed included components like HMIs and PLCs as well. Moustafa et al. [33] proposed their own testbed configuration to test their intelligent penetration testing technique; it consisted of IIoT devices and sensors, with IIoT services and different virtual machines. Negi et al. [40] argued that using a testbed when assessing vulnerabilities or security solutions in critical systems is inevitable, and they used a real lab-scale power distribution testbed with PLCs, relays, power meters, and different protocol switches. A SCADA system testbed, which simulated functions of a metro railway control system, was used in [16]. This testbed consists of field devices such as cameras, sensors, and PLCs, a fog environment with HMIs and servers, and a cloud environment with cloud servers. However, the authors in [16] did not go into much detail to explain how the testbed was implemented. Intending to demonstrate how a ransomware attack might be possible in IIoT, the researchers in [36] built a small experimental testbed to mimic a real IIoT system. They said that high levels of fidelity are achieved when the testbed presents a simple instantiation of an actual system. Their testbed is built with simple devices and tools, as well as a programming language, which all replicate a real IIoT system. Moreover, they used simple devices such as an LED output actuator to mimic the on and off state of a pump relay, and they have a controller written in JavaScript to send commands to this pump relay.

Five papers present IIoT systems in different layers, entities, or levels. For example, the authors in [46] used a virtualized system and presented it in application, virtualization, middle, and device layers to show how one attack scenario is viable across multiple layers. Likewise, Yan et al. [47] presented their IIoT architecture in three layers: perception, network, and application. However, they did not explain in detail why such an architecture is beneficial or detrimental. In a similar fashion, Rathee et al. [50] gave a theoretical hybrid architecture for a multi-national IIoT system which has offices in multiple countries. The advantage of such an architecture is that control of any action, legal or illegal, can be done from a single place no matter where a branch of the company is located. Fang et al. [35] presented a universal model for an IIoT control system in three layers: device, data forwarding, and data processing. Again, such a system is only theoretical, as they showed it to explain how their data security scheme is possible. Similarly, Shuai et al. [49] split the IIoT architecture into four entities: industrial sensor nodes, industrial management gateway, registration authority, and user.

There are two papers in which particular IoT and IIoT devices are studied. In [27], the Haier Smart Care IoT device and the Itron Centron CL200 Smart Meter IIoT device are examined and analyzed for security flaws. This study did not focus on a specific IIoT system, but rather considered the devices themselves as a case study, to find specific problems. Haier Smart Care is a smart device that can control and read data from sensors such as a smoke detector or a remote power switch [27]. Even though this device is primarily used in the home environment, it can also be used in industry. Additionally, a lot of security problems are inherited from IoT by their industrial counterparts. Itron Centron is a smart meter whose prime function is to measure energy usage for a customer and report this data to a nearby substation [27]. On the same note, [28] presented security challenges for a Wi-Fi-connected IIoT beer cooler and a serving unit. The idea behind these two devices is to have consumption control and preventive maintenance, as the manufacturer will monitor the usage of the devices.

Two papers used a specific IIoT system. These include [29], in which the authors used an example of an IoT solar network integrated with an industrial plant control, and [32], in which the authors instrumented an IIoT system that offers Modbus TCP/IP services. The authors in [29] use the IoT solar network only as an example to demonstrate their graphical representation of vulnerabilities. In contrast, Flores and Mugarza [32] implemented their own IIoT system to carry out vulnerability discovery while the system is operating. They had two implementations in which their framework is validated. These implementations are based on virtualization technologies, XEN and Docker, with different hardware setups, to experimentally verify that the suggested solution has no performance issues.

An experimental system was used in one of the final selected studies. In [45], Jiang et al. built a low-cost physical system using two i.MX6 micro-controllers, integrated with two CAN buses and servo motors, to identify potential attack surfaces of an IIoT system. Even though their experimental system is similar in type to the previously mentioned testbeds, the word testbed was never mentioned in the paper; therefore, we classified this system as experimental.

As we already noted, a lot of papers did not specify what type of system they were working with. Such papers only presented their solutions and theoretically explained possible vulnerable points. Papers that explained in detail how an IIoT system works or looks have studied a specific testbed and used it to find certain problems or to demonstrate their solutions. All papers that have clearly presented their systems and reported them in a particular way, either as a system under test (virtualized, hardware-based, etc.) or as a conceptual system, were mentioned and explained in this section, and this concludes the answer to RQ1. Table 8 gives a classification of the systems that were studied and presented in the final papers and shows which papers considered which system. As there are six categories of systems, and a major portion of them are unspecified, we can affirm what we already mentioned when defining this RQ: there is no standardized system for IIoT, and every system is good or bad for a given purpose.

System classification                                              Papers
Papers with unspecified system                                     [25], [18], [38], [30], [41], [31], [42], [43], [44], [51], [37]
Papers which use testbeds                                          [33], [34], [26], [39], [40], [16], [48], [36]
Papers that present systems in different layers, entities or levels   [46], [47], [35], [49], [50]
Papers that analyze specific devices                               [28], [27]
Papers with specific systems                                       [29], [32]
Paper with an experimental system                                  [45]

Table 8: Systems classification for RQ1
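As a quick consistency check on the classification, the per-category counts can be tallied and compared with the size of the final set. The Python sketch below copies the reference numbers from Table 8; the short category labels and variable names are chosen here for illustration only.

```python
# Reference numbers per system category, copied from Table 8.
classification = {
    "unspecified": [25, 18, 38, 30, 41, 31, 42, 43, 44, 51, 37],
    "testbed": [33, 34, 26, 39, 40, 16, 48, 36],
    "layers/entities/levels": [46, 47, 35, 49, 50],
    "specific devices": [28, 27],
    "specific system": [29, 32],
    "experimental": [45],
}

counts = {category: len(papers) for category, papers in classification.items()}
all_papers = [p for papers in classification.values() for p in papers]

# Every final paper should appear in exactly one category.
no_duplicates = len(all_papers) == len(set(all_papers))
total = len(all_papers)
```

The tally gives eleven papers with an unspecified system and eight testbed papers, and the total matches the 29 publications in the final set.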

4.2.2 Open Source

When it comes to RQ2, we wanted to see whether researchers open source the systems under analysis, i.e., those which are used to find certain vulnerabilities and develop potential mitigation strategies. Our assumption, when defining this research question, was that if researchers open source their systems, then the studies and experiments they have conducted can be easily replicated. However, after going through the final included studies and carefully examining the texts, we were not able to find many papers that mentioned open sourcing anything related to the study. The keywords open source or open-source were not included in the search string because this question can be easily answered without them. In addition, the number of initial papers would have been significantly smaller.

In fact, 22 of the final included studies never mentioned open source in their work. Even though some papers never talked about open source, they said that their study is easily replicable, so we mention those papers here as well. For example, in [33] Moustafa et al. used a testbed of an IIoT network environment to identify exploitation paths. They stated that their testbed is available to anyone on request so that other researchers can utilize the testbed or its configurations for their research. Comparably, the authors in [27] used specific devices for security analysis (see Section 4.2.1) and these devices are available for purchase, therefore their study can be reproduced. Only the authors in [34] have open sourced their testbed to show the achievability of their approach. On the contrary, Zolanvari et al. [26] have not open sourced the testbed which they used in the study, and they argued that industrial companies never publish their network data due to laws and privacy regulations, and that because of this, real-world data sets for IIoT systems are not available to researchers.

Some authors used open-source software to execute their experiments and analyze security problems, and we think this is also important to mention, as those studies can be duplicated to some extent. Jiang et al. [45] tried to identify attack vectors in IIoT devices and networked embedded systems. They used an experimental system to conduct their investigation, and for one part of the study, regarding the Mirai attack, they have an open-source contribution. Mirai is an attack in which vulnerable network devices, typically running Linux, are targeted to build a botnet, which is then usually used to launch large DDoS attacks. Jiang et al. [45] released the code which they used to extend the Mirai attack and which is needed to conduct their case study. Zhou et al. [16] used an open-source network IDS which helped them propose a DDoS mitigation scheme. Falco et al. [18] did not specify on what type of IIoT system their study was carried out, but rather talked in general about SCADA IIoT systems. However, their study can be repeated with ease, as they used publicly available sources and open-access databases to find known vulnerabilities and designed a risk prioritization scheme based on those. Similarly, Flores and Mugarza [32] adapted an open-source library to implement an IIoT Modbus service interface. Some studies designed and used specific algorithms as a security solution or to determine different risks, and such studies included pseudocode in their reports without mentioning an open-source contribution.

All other final included studies that were not introduced in this section, do not have anything related to the open-source community or have not explicitly stated such contributions in their reports. We can conclude that a large majority of researchers do not open source their systems or anything related to their studies.

4.2.3 Vulnerabilities and their origins

In this subsection, we provide answers to RQ3 and RQ3.1, as they are related. After examining the final papers, we noticed three types of security problems that papers mostly addressed. Those were classified into vulnerabilities, attacks, and risks. The most frequently mentioned vulnerability concerns firmware in IIoT devices. Vulnerabilities in firmware, mostly originating from poor software development practices, are often exploited to gain access to a specific device and later to a network or a system. Negi et al. [40] managed to find hard-coded credentials for FTP in the firmware of an RTU. They tried using the same credentials for FTP on a different PLC device, and it worked. The particular FTP device has an embedded web server which they managed to access, because they found that password collisions were possible due to the device using a vulnerable VxEncrypt library. To show how firmware vulnerabilities, or malicious firmware in general, are problematic, the researchers in [48] made their own firmware based on a popular open-source library for an ambient sensor. They altered the library and added malicious code routines, which could slow down network communication and change temperature readings. This showed how malicious firmware could jeopardize the security of a particular device and of the network as a whole. Considering that a lot of firmware is based on open-source libraries, and that in the past there have been intentional attempts to infect these with malicious code [48], this is a major concern for developers who write firmware for specific IoT devices.

Likewise, buffer overflow vulnerabilities are often reported when researchers discuss software-specific vulnerabilities. This type of vulnerability is mostly found in low-level programming languages, C in particular, which are common in industrial systems. Because these systems need to be constantly in operation and are almost never restarted, memory fragmentation becomes a problem. Because of memory fragmentation, devices are more susceptible to buffer overflows [18]. Heap overflow, a type of buffer overflow, is another software vulnerability and is mentioned in two papers ([38] and [46]). Since most of the final included studies do not explain why a buffer overflow is dangerous, we assume, based on common knowledge, that it makes it possible to write executable code to a particular device or to make it unresponsive, which in turn lowers the availability of the system. Moreover, XSS is also mentioned for the case where an IIoT system is interconnected with a Web Management System [45]. All papers point out conceptually that such a vulnerability is plausible, without showing practically or experimentally how. Commonly, XSS is preventable while the software is still in development, and it is important to think security first when designing software for critical systems.

Besides these three commonly known vulnerabilities, authors in the final papers mentioned directory traversal and zero-day vulnerabilities as potential threats. Directory traversal, or path traversal as some authors call it, is usually exploitable because of poor filtering and validation of user-provided input. If this vulnerability is exploited, an adversary could gain access to files that are owned by high-privileged users, such as a root user, which could compromise confidentiality in SCADA systems [26]. The researchers in [51] only mentioned that zero-day exploits are common in IIoT systems, while Sha et al. [46] explained that a zero-day vulnerability cannot be used to perform an advanced persistent attack, but it can be the first step towards one. Even though the authors did not explain zero-day vulnerabilities further, we know from previous cyber incidents (see Section 2.1) that this kind of vulnerability has been used to gain access to industrial systems. The authors also did not specify the origin of this type of vulnerability, but we can judge that it originates from weaknesses in software.
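The kind of input validation whose absence enables directory traversal can be sketched as follows. This is a minimal illustration under our own assumptions, not a method taken from [26]: the function name `resolve_inside` is hypothetical, and the check assumes a POSIX-style file system.

```python
import os
import tempfile


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Resolve user_path under base_dir and reject paths that escape it."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # commonpath equals base only when candidate is base itself or lies below it.
    if os.path.commonpath([base, candidate]) != base:
        raise PermissionError(f"path escapes base directory: {user_path!r}")
    return candidate


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # stand-in for a directory served by an HMI
    print(resolve_inside(base, "logs/today.txt"))  # accepted
    try:
        resolve_inside(base, "../../etc/passwd")   # traversal attempt
    except PermissionError as err:
        print("rejected:", err)
```

Resolving the path before comparing it matters: a check that only searches the raw input for `..` can be bypassed with absolute paths or symbolic links, whereas the comparison above is made on the final location.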

DoS and DDoS attacks are the most reported type of attack for IIoT systems. In fact, twelve papers mentioned this attack as a problem that is likely to occur in any part of an IIoT system. The main goal of such an attack is to limit availability in an IIoT system, where availability is considered crucial. Usually, problems in network protocols are the origin of a DDoS attack; for example, TCP is misused by sending a large number of SYN messages and never replying with ACK messages. These are known as DoS flooding attacks. DoS attacks can also originate in software [38], if a specific software-related problem stops the device from running properly. Flooding attacks can be separated into network-level and application-level categories. If the flooding originates in TCP or other network protocols, it is a network-level attack. If resources such as CPU, memory, or a database are disrupted, it is an application-level flooding attack [47]. Most of the papers that mentioned DoS or DDoS attacks did not provide examples of how or why those are possible. The papers that did are those whose main goal is to stop such attacks. Their proposed solutions for DDoS mitigation are further elaborated in the next subsection (4.2.4).
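The SYN flooding behaviour described above can be sketched as a count of half-open connections per source. This is a simplified illustration under our own assumptions and not a mitigation scheme from the reviewed papers: the event format, the function names, and the threshold are hypothetical, and the SYN-ACK step of the real TCP handshake is left out.

```python
from collections import Counter


def half_open_by_source(events):
    """Count SYNs per source that were never followed by an ACK.

    events is an iterable of (source_ip, flag) pairs, flag being "SYN" or "ACK".
    """
    half_open = Counter()
    for src, flag in events:
        if flag == "SYN":
            half_open[src] += 1
        elif flag == "ACK" and half_open[src] > 0:
            half_open[src] -= 1  # handshake completed, connection no longer half-open
    return half_open


def suspected_flooders(events, threshold=100):
    """Return sources whose half-open count reaches the (assumed) threshold."""
    counts = half_open_by_source(events)
    return sorted(src for src, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    normal = [("192.0.2.10", "SYN"), ("192.0.2.10", "ACK")] * 5
    flood = [("203.0.113.7", "SYN")] * 150  # SYNs that are never acknowledged
    print(suspected_flooders(normal + flood))
```

In a distributed attack the SYN messages come from many sources, so a per-source threshold alone would not be sufficient; the sketch only shows why unanswered SYN messages exhaust connection state on the target.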

MitM attacks are the second most mentioned attack for IIoT systems. Most of the papers only stated that MitM attacks are possible, without going into detail on the origins of this attack. ARP spoofing, or ARP poisoning, is mentioned in [42] and [40] as one way this attack occurs; again, the problem is related to a network protocol. Injection attacks in general are mentioned in five papers, including SQL injection and malicious command injection. SQL injection is possible in HMI systems and PLC devices, as both possess web servers that could be attacked [26]. Protocols such as Modbus use function codes to send commands between the Master Terminal Unit (MTU) and slave devices. This is where MitM and injection attacks correlate, as a MitM position can be utilized to inject malicious commands into the traffic between the MTU and slave devices [39].
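To make the role of Modbus function codes more concrete, the sketch below parses a Modbus TCP frame and flags write requests, which are the kind of command an adversary in a MitM position would likely inject. It is our own illustration and not the approach of [39]. The frame layout (a 7-byte MBAP header followed by the function code) and the listed function codes follow the public Modbus specification; the function names and the sample frames are assumptions.

```python
import struct

# Standard Modbus function codes that modify state on the slave device.
WRITE_FUNCTION_CODES = {
    0x05: "Write Single Coil",
    0x06: "Write Single Register",
    0x0F: "Write Multiple Coils",
    0x10: "Write Multiple Registers",
}


def parse_modbus_tcp(frame: bytes) -> dict:
    """Split a Modbus TCP frame into MBAP header fields and function code."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header and function code")
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": frame[7],
    }


def is_write_request(frame: bytes) -> bool:
    """Return True if the frame carries a state-changing function code."""
    return parse_modbus_tcp(frame)["function_code"] in WRITE_FUNCTION_CODES


if __name__ == "__main__":
    # Sample frames constructed by hand for this illustration.
    read_holding = bytes.fromhex("00010000000601030000000A")  # function code 0x03
    write_single = bytes.fromhex("0002000000060106000100FF")  # function code 0x06
    for frame in (read_holding, write_single):
        print(hex(parse_modbus_tcp(frame)["function_code"]), is_write_request(frame))
```

Because the function code is carried in plain form in every frame, anyone able to modify the traffic can replace a read request with a write request, which is what links the MitM and injection attacks discussed above.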

Other attacks that were reported include backdoor and impersonation attacks. An HMI system can get infected with a backdoor virus, which then opens a port in the HMI and allows an adversary to connect remotely. This can open further options for an adversary to do reconnaissance or launch a DoS attack on a SCADA system [26]. Moreover, a backdoor can also be considered a vulnerability in specific firmware or software [38]. Two papers ([31] and [49]) explained how an impersonation attack could work, specifically for a Wireless Sensor Network (WSN). Such an attack is possible if an attacker is able to impersonate a user or a gateway node in the WSN [31]. The authors of [37] mentioned this attack, but did not explain how it can be achieved.

We classify all other problems that were reported in the final papers as risks. Although using strong passwords is usually the first recommendation that any software or network engineer will make, password-related issues were the most reported risk. Eight papers mentioned that IIoT