Visualization of cyber security attacks


Department of Science and Technology

Institutionen för teknik och naturvetenskap

LiU-ITN-TEK-A--20/008--SE

Visualisering av

cybersäkerhetsangrepp

Jennifer Bedhammar

Oliver Johansson

2020-06-01


Thesis carried out in Media Technology at the Institute of Technology at Linköping University

Jennifer Bedhammar

Oliver Johansson

Supervisor Katerina Vrotsou

Examiner Jonas Löwgren


Copyright

The publishers will keep this document online on the Internet - or its possible

replacement - for a considerable time from the date of publication barring

exceptional circumstances.

The online availability of the document implies a permanent permission for

anyone to read, to download, to print out single copies for your own use and to

use it unchanged for any non-commercial research and educational purpose.

Subsequent transfers of copyright cannot revoke this permission. All other uses

of the document are conditional on the consent of the copyright owner. The

publisher has taken technical and administrative measures to assure authenticity,

security and accessibility.

According to intellectual property law the author has the right to be

mentioned when his/her work is accessed as described above and to be protected

against infringement.

For additional information about the Linköping University Electronic Press

and its procedures for publication and for assurance of document integrity,

please refer to its WWW home page:

http://www.ep.liu.se/


Linköpings universitet

Linköping University | Department of Science and Technology

Master’s thesis, 30 ECTS | Media Technology

2020 | LIU-ITN/LITH-EX-A--20/001--SE

Visualization of Cyber Security Attacks

Visualisering av cybersäkerhetsangrepp

Jennifer Bedhammar

Oliver Johansson

Supervisor : Katerina Vrotsou Examiner : Jonas Löwgren


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Jennifer Bedhammar, Oliver Johansson


Abstract

The Swedish Defence Research Agency (FOI) simulates cyber attacks for research and education purposes in its cyber range, CRATE, with a system called SVED. This thesis describes the process of creating a visualization of the log files produced by SVED, with the purpose of increasing the users’ comprehension of the log files and thereby increasing their knowledge of the simulated attacks.

To create an effective visualization, a user study was held to identify the users’ needs, experiences and requirements. Several designs were created based on the results, and one was selected and refined using feedback from workshops. A web-based implementation of the design was created using the D3.js library, which included a directed graph, an icicle chart and a network graph to visualize the data. Thereafter, an evaluation was held to analyze whether the implementation was more effective than the log files, by letting the participants solve tasks defined by the user study.

The results from the evaluation indicate that the visualization has a higher success rate than the log files when solving the tasks. They also indicate that finding the solution requires less time with the visualization. However, since the evaluation tasks were based on the user study, the results only show that the visualization is more effective when solving similar tasks.

For further development the visualization could be improved with features like real time rendering and linkage with FOI’s internal systems. Additionally, with more research and further testing, the visualization could be used as a tool for standardization of graphics in cyber space.

In conclusion, a visualization of the log files has been implemented and, according to the evaluation, the visualization increases the users’ comprehension of the data in SVED’s log files.


Acknowledgments

We would like to give special thanks to our supervisor and examiner at Linköping University, Katerina Vrotsou and Jonas Löwgren, for all the advice and feedback during the thesis. We would also like to thank our supervisor, Teodor Sommestad, and client, Hannes Holm, at FOI, for the support and guidance throughout the work. Thanks to everyone who participated in our user studies, workshops and evaluations.

Lastly, we would like to thank everyone at FOI for the warm welcome and the interesting discussions during the coffee breaks. Time flies when you are having fun!


2 Background and Related Work
2.1 CRATE
2.2 SVED
2.3 Visualization Design
2.4 Evaluation of a Visualization
2.5 Related Work

3 Design
3.1 User Study
3.2 Priority List
3.3 Early Design
3.4 Workshop 1
3.5 Visualization Design
3.6 Workshop 2
3.7 Final Design

4 Implementation
4.1 The Data
4.2 Attack Graph
4.3 Statistics
4.4 Network Map
4.5 Network Graph
4.6 Timeline
4.7 Logs

5 Evaluation
5.1 Evaluation Method
5.2 Evaluation Results

6 Discussion
6.1 User Study and Visualization Design
6.2 Implementation
6.3 Evaluation
6.4 Research Questions
6.5 Future Work

7 Conclusion

Bibliography

A Tasks
A.1 Part 1
A.2 Part 2


List of Figures

2.1 Screenshot of the SVED GUI, showing an example game and its attack graph.
3.1 Different design versions of a single-page layout.
3.2 Design with a scroll layout.
3.3 Design with a tab layout.
3.4 Final design concept of the attack graph tab.
3.5 Final design concept of the statistics tab.
3.6 Final design concept of the network tab.
4.1 Graph illustrating the basic structure of the data in the REST API.
4.2 The start page.
4.3 The attack graph.
4.4 The different attack node options.
4.5 The statistics.
4.6 The network map.
4.7 The network graph.
4.8 The different network node options.


List of Tables

2.1 The different action types in SVED.
5.1 Evaluation results.
5.2 Summary of the evaluation results.
5.3 Time spent to complete the tasks.
A.1 Evaluation tool order per participant
A.2 Evaluation results - tasks in part 1


1 Introduction

Cyber threats and IT security are becoming more and more important as the world becomes more digitized. In the Global Risks Report 2018 by the World Economic Forum [6], cyber attacks are listed as the third most likely and the sixth most impactful global risk. Cyber attacks against businesses have almost doubled since 2012, and in 2016 alone more than 4 billion data records were breached. Beyond the financial costs of responding to cyber threats, the attacks can disrupt important infrastructure and systems in our society. The ability to detect, monitor and handle these threats efficiently is therefore key to protecting our systems and ourselves.

The efficiency of a technological support system is often determined by the balance between functionality and the human’s ability to understand the system. FOI, the Swedish Defence Research Agency, simulates cyber attacks on virtual machines, using a system called SVED (Scanning, Vulnerabilities, Exploits and Detection), to educate people on how to detect and react to them. These simulations currently result in text logs that describe the attacks, but the logs can be difficult to read and their structure makes the events difficult to track. This thesis describes the process of creating a visualization of the data in the log files. By using information visualization, the abstract data in the log files can be turned into interactive graphical displays, which according to John Goodall [8] amplify cognition by taking advantage of human perceptual capabilities, like recognizing patterns and detecting anomalies.

1.1 Background

Cyber security is a highly active research area at agencies around the globe, and FOI is one of these agencies. Running cyber security exercises on live networks would be very expensive and risky, so FOI has created a virtual environment called CRATE (Cyber Range And Training Environment). CRATE configures a large number of virtual machines to create a controlled environment that emulates different networks and exposes them to cyber threats [5]. To manage and execute these cyber attacks, FOI created SVED. This tool allows the user to design large-scale customizable cyber attacks with minimal effort, recording all activity of the attack in the form of text logs [9]. The activity of the cyber threat can be difficult for the user to comprehend when it is all described in text, since an attack can generate a large amount of text logs. FOI wants the user to be able to understand and analyze the activities of the attacks better. A visualization of the text logs generated by SVED could increase the users’ comprehension of the activity during a simulated attack.

1.2 Purpose and Approach

The purpose of this thesis is to develop a prototype for interactively visualizing SVED’s log files to increase the comprehension of the user. Achieving this involves a number of steps:

• Deciding which aspects of the data are relevant and filtering it.

• Finding libraries and visualization methods that are suitable for the filtered data.

• Implementing the visualization in a web application.

• Evaluating the visualization to determine if the visuals actually make the log file content easier to comprehend.

1.3 Research Questions

This thesis will answer the research questions listed below:

1. What tasks do the users aim to solve when examining the cyber attack log files? How are these tasks performed today?

2. What types of visualization methods are most appropriate for visualizing the log files, based on analysis of previous implementations and the users’ tasks?

3. Does the visualization enable the users to solve the defined tasks?

1.4 Delimitations

The thesis will focus on creating an effective visualization of the log files, not on improving or altering the SVED framework. Pre-existing data (log files) is stored in a REST API provided by FOI, so no additional data needs to be collected.


2 Background and Related Work

This section describes FOI’s systems (CRATE and SVED) and discusses related work in visualizing cyber attacks and evaluating visualizations.

2.1 CRATE

CRATE is a cyber range created and maintained by FOI to simulate realistic cyber networks in a virtual environment for research studies [16]. The purpose of a cyber range is to experiment with the simulated network to test its flaws, detect malicious behaviour and resist cyber attacks. CRATE is located in FOI’s facility in Linköping and consists of approximately 750 servers, each instrumented with up to twenty VirtualBox machines. The VirtualBox machines can be manipulated with a user interface called CRATEweb or by script, and are able to handle actions like simple social network features, search engines, and typical enterprise software. The attacks carried out in CRATE are handled by the tool SVED, which is also created and managed by FOI [9].

2.2 SVED

SVED is a tool to plan, execute and log complex attacks against the network simulated in CRATE [16]. The tool is built with Python and JavaScript/HTML code and can be controlled with a web-based GUI or a REST API [9]. SVED is able to specify attacks in numerous ways to differentiate them and make them more realistic. The GUI can, for example, control the duration, scale and wait time between attacks, as well as control what actions to take when attacks fail or succeed. Furthermore, it can determine the routines of the virtual users, like what websites they usually visit. SVED can also reset the virtual environment in CRATE to the state prior to the cyber attacks, if desired. All events handled by SVED result in a large amount of detailed text logs. The logs contain, for example, the time in milliseconds when the attacks were executed, when an attacker gained control of a computer or when users accessed specific websites [16]. SVED is updated through CRATE with general information on cyber security. The information gathered includes vulnerability databases and software, exploits and network intrusion detection signatures from various sources, such as the US National Vulnerability Database (NVD), the Exploit Database, and the Snort Talos and Emerging Threats rule sets [9].


Figure 2.1: Screenshot of the SVED GUI, showing an example game and its attack graph.

Table 2.1: The different action types in SVED.

Action type   Description
Auxiliary     Handles features of the game, such as proxy resets and VLAN switching
Shellcode     Remote control actions
Exploit       Attacks aimed at acquiring privileges on machines
Scanner       Used for information gathering and mapping out networks

2.2.1 The Game Process

Attacks built in the SVED GUI are constructed as attack graphs, see Figure 2.1, where each node represents an action and the edges determine the requirements to proceed to the next action. Table 2.1 contains a description of the different types of actions that exist in SVED. When an attack graph has been constructed and an environment in CRATE has been set up with organizations, networks and machines, an attack can be launched. Attacks simulated in the virtual environment are called "games", and when a game is run multiple times the logs are saved as different "sessions" so that the outcomes can be compared.

Games can be initialized to test software against attacks or be used in exercises held by FOI to test the participants’ skills against cyber attacks. When SVED launches an attack against machines in the environment, the actions have different outcomes depending on how the victim defends itself, which determines the actions that are queued next [16]. When no more actions are able to run, or the session is terminated manually, the session is finished and the environment can be reset to its initial state.
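The game process described above can be illustrated with a minimal sketch. The graph layout, the action names and the "success"/"failure" edge labels are hypothetical and are not taken from SVED; the sketch only shows the idea that each node is an action and that the outcome of an action determines which actions are queued next.

```python
from collections import deque

# Hypothetical attack graph (not SVED's real format): each node is an action,
# and each outgoing edge names the outcome required to proceed.
ATTACK_GRAPH = {
    "scan_network": {"success": ["exploit_server"], "failure": []},
    "exploit_server": {"success": ["open_shell"], "failure": ["exploit_browser"]},
    "exploit_browser": {"success": ["open_shell"], "failure": []},
    "open_shell": {"success": [], "failure": []},
}


def run_session(graph, start, outcome_of):
    """Run actions until none are queued; return the (action, outcome) history."""
    queue = deque([start])
    history = []
    done = set()
    while queue:
        action = queue.popleft()
        if action in done:
            continue  # an action is only run once per session
        done.add(action)
        outcome = outcome_of(action)
        history.append((action, outcome))
        # The outcome decides which actions are queued next.
        queue.extend(graph[action][outcome])
    return history


# Assumed outcomes for one session: the defender blocks the server exploit.
outcomes = {"exploit_server": "failure"}
history = run_session(ATTACK_GRAPH, "scan_network",
                      lambda action: outcomes.get(action, "success"))
for action, outcome in history:
    print(f"{action}: {outcome}")
```

Running the same graph with different outcomes corresponds to running a game as several sessions, each with its own history.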

2.2.2 The Data

As a game is running, logs are produced by the actions to give feedback on the progress and status of the game. These logs are stored in a database under the running session. The session and logs can be accessed through the REST API by using the session ID number, and through the GUI by selecting the session label in a drop-down list. In the GUI the logs are displayed 10 at a time in a table with pagination, and in the REST API they are shown in JSON format. The GUI shows the same log information as the REST API, but the names of the columns/keys differ. Both platforms show the session label and associated logs, but the REST API also displays the game ID, game label, log count, session ID and session status.

In the REST API a log is defined as a JSON object:

{
    "data": can contain action settings, status etc,
    "id": unique ID for the log,
    "log_source_id": unique ID for the action producing the log,
    "log_source_type": label for the action type,
    "status": status of the action/event,
    "time_stamp": date and time for the log
}

and in the GUI each row is a log, with the following column names: data, entry, source, type, event and time.

The logs are analyzed to get a sense of what actions led to the different outcomes. To analyze the logs, the users either go through them manually, log by log, or use the REST API or GUI and write scripts to extract specific information. Both approaches require some previous knowledge about the structure of the logs as well as the patterns that occur in them, while the latter also requires knowledge of keywords to look for.
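As an illustration of the script-based approach, the following minimal sketch filters and groups logs that follow the key structure described above. The sample response, the session-level key names, the status values and the timestamp format are assumptions made for the example, not SVED’s actual output.

```python
import json
from collections import defaultdict

# Hypothetical REST API response for one session. The log keys follow the
# structure described above; all values and the outer keys are invented.
SESSION_JSON = """
{
  "session_label": "example session",
  "logs": [
    {"data": "target 10.0.0.5", "id": 1, "log_source_id": 11,
     "log_source_type": "Scanner", "status": "success",
     "time_stamp": "2020-03-02T10:15:00.120"},
    {"data": "target 10.0.0.5", "id": 2, "log_source_id": 12,
     "log_source_type": "Exploit", "status": "failure",
     "time_stamp": "2020-03-02T10:15:04.300"},
    {"data": "target 10.0.0.7", "id": 3, "log_source_id": 13,
     "log_source_type": "Exploit", "status": "success",
     "time_stamp": "2020-03-02T10:15:09.850"}
  ]
}
"""


def filter_logs(logs, source_type=None, status=None):
    """Return the logs matching the given action type and/or status."""
    return [
        log for log in logs
        if (source_type is None or log["log_source_type"] == source_type)
        and (status is None or log["status"] == status)
    ]


def group_by_source(logs):
    """Group logs by the ID of the action that produced them."""
    grouped = defaultdict(list)
    for log in logs:
        grouped[log["log_source_id"]].append(log)
    return dict(grouped)


session = json.loads(SESSION_JSON)
successful_exploits = filter_logs(session["logs"], source_type="Exploit", status="success")
for log in successful_exploits:
    print(log["time_stamp"], log["log_source_id"], log["data"])
```

Even in this small form, the script presupposes knowledge of the key names and of the status keywords to search for, which is the prior knowledge the text refers to.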

2.3 Visualization Design

When designing a cyber security visualization there are a number of issues and concerns that must be addressed according to Shiravi et al. [15]. Some examples are: situation awareness, user experience, occlusion, visualization techniques and evaluation.

Situation awareness and occlusion are issues that occur due to the large amount of data that is generated by modern networks. Occlusion and overcrowding occur when the visualization tries to show every aspect of the data in the same display [15]. The issue can be solved by pre-processing, filtering and summarizing the data. Situation awareness can be improved through the same process: by prioritizing and only projecting critical events, the visualization becomes easier to comprehend and shows more relevant parts of the data. User experience is a term used in design where developers define the target audience to make the encounter with the system fully understandable for its users. By knowing the users’ goals, requirements and work experience, the visualization can be designed to increase its effectiveness. Information about the target audience is often acquired through evaluations [15]. When deciding on the visualization technique, Shiravi et al. [15] mostly discuss the problem of using 3D visualization methods on a 2D plane. Using three dimensions often forces the user to zoom, rotate and pan the display, which can result in them losing sight of the data, and occlusions are difficult to avoid. However, another dimension can be added without causing these issues by including colors, glyphs or small multiples.
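A minimal sketch of the summarizing step, assuming events that carry a timestamp and a status: instead of drawing every event, the events are counted per time bucket and status, so that a display only needs one mark per bucket. The event fields, the ISO timestamp format and the one-minute bucket size are assumptions chosen for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical events; field names and values are invented for illustration.
EVENTS = [
    {"time_stamp": "2020-03-02T10:15:00.120", "status": "success"},
    {"time_stamp": "2020-03-02T10:15:30.000", "status": "failure"},
    {"time_stamp": "2020-03-02T10:15:45.500", "status": "failure"},
    {"time_stamp": "2020-03-02T10:16:02.000", "status": "success"},
]


def summarize_by_minute(events):
    """Count events per (minute, status) bucket instead of keeping each one."""
    counts = Counter()
    for event in events:
        moment = datetime.fromisoformat(event["time_stamp"])
        counts[(moment.strftime("%H:%M"), event["status"])] += 1
    return counts


summary = summarize_by_minute(EVENTS)
for (minute, status), count in sorted(summary.items()):
    print(minute, status, count)
```

The trade-off is the one noted above: the summary is easier to read, but individual events are no longer visible unless details are offered on demand.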

It is also important to create a visual distance between the objects in the visualization according to Lallie et al. [13], and one of the most efficient ways is to use contrasting shape and color pairings. For example, there is a large perceptible distance between a circle and a rectangle, and between the colors red and green. However, the efficiency of the color pairing also depends on how the shapes are colored: filled shapes have a more perceptible distance than shapes with different edge colors or textures.

Best et al. [3] discuss the challenges of creating visualizations within cyber security that ensure the user comprehends the data. They identified a number of key points that are recommended to keep in mind, some of which are listed below [3].

• The “Big Data” problem is common in any visualization, and cyber security is no exception. The large amount of data can make it difficult to extract the important spectrum. A solution is to extract a small subset to be compared instead, but this can result in false conclusions, since data is missing.

• The correlation between data can be challenging for a user to comprehend, and it is therefore important to show a distinct connection between correlated data.

• The quality of a dataset can affect the result. Sometimes the data can be corrupt or invalid, hence the data needs to come from a valid source and be visualized in a correct manner for the user to trust the visualization. This highly depends on the expertise of the user.

• One challenge is to understand what actions were taken to counter a specific attack and whether the result was successful or not. The visualization needs to be clear regarding the actions and outcomes of different approaches.

• Analysts need to balance risks and rewards when taking actions against cyber attacks. The goal is to minimize the risks, and to take specific actions the analysts need to be confident about their decision. Therefore the analysts need enough data to understand the threat, and the data needs to be comprehensible.

2.4 Evaluation of a Visualization

It is important to understand both the visualization and the target audience to achieve an optimal result. To get a better understanding of the user and the visualization’s capabilities to represent data, evaluations can be carried out [15].

User evaluations have a key role when designing visualizations. They allow developers to measure the effectiveness and performance of the visualization as well as to understand domain-specific concerns, such as predicting user work style and decision making. Staheli et al. [17] discuss two types of evaluations for visualizations: the first type is carried out during the development and can test various features and examine user reactions, while the second type is performed at the end of the development to test the maturity and level of finality of the product.

When creating tasks or questions for an evaluation, the ordering of the tasks may affect the evaluation results, especially when using a within-subject design for the study (the participants test all conditions, e.g. user interfaces). Keppel et al. [12] discuss the advantages and limitations of using a within-subject design, where the main advantage is the efficient use of subject resources. The researcher can collect more data per subject, which is vital if subjects are scarce, and save time on preparations and instructions. The limitations come from testing the same subject several times. The results for the different conditions (interfaces) might be similar since the same subject is tested on both. Another limitation that might arise from the design is incidental effects. The subject may perform better due to practice or may perform worse due to fatigue. Other possible effects are carryover, contrast and context effects, where the subject is affected by previous tests. Keppel et al. [12] define the effects as follows:

• Carryover effect: "occurs when a treatment has a transient effect that carries over to affect whatever condition is administered immediately after it."

• Contrast effect: "occurs when two treatments interact in a way that depends on both conditions".

• Context effect: "in which a subject’s behaviour is influenced by the context provided by exposure to other conditions in an experiment".

To prevent these effects Keppel et al. suggest not presenting conditions in the same order every time and not testing the same material twice. To do this one can either randomize the order and materials, or counterbalance the incidental effects by systematically constructing the tests.
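Counterbalancing can be sketched as follows, assuming two conditions and four participants. The condition names and participant labels are hypothetical, and the round-robin assignment over all possible orders is one simple scheme, not necessarily the one used in this thesis.

```python
from itertools import permutations


def counterbalanced_orders(conditions, participants):
    """Assign condition orders round-robin over all possible orders.

    Every order is used equally often when the number of participants
    is a multiple of the number of possible orders.
    """
    orders = list(permutations(conditions))
    return {
        participant: orders[index % len(orders)]
        for index, participant in enumerate(participants)
    }


# Hypothetical conditions and participant labels.
conditions = ["log files", "visualization"]
participants = ["P1", "P2", "P3", "P4"]
schedule = counterbalanced_orders(conditions, participants)
for participant, order in schedule.items():
    print(participant, "->", " then ".join(order))
```

With two conditions there are only two possible orders, so full counterbalancing is cheap. The number of orders grows factorially with the number of conditions, at which point randomization or a partial scheme becomes more practical.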

2.5 Related Work

Visualizations of cyber attacks have been implemented before this thesis, but they are usually created to give the user a tool to detect and analyze cyber threats rather than to show the progression and outcome of the attacks. Visualization tools have been developed for the latter purpose as well; however, none of them have been implemented to fit the data produced by CRATE and SVED.

2.5.1 Cyber Security Visualizations

Each year there is a conference called VizSec (IEEE Symposium on Visualization for Cyber Security) where researchers and practitioners in information visualization and security gather to discuss and present new cyber security visualization techniques [18]. The VizSec conference started in 2004 and had released 193 proceedings papers as of December 2019. These papers show the many different approaches to visualizing cyber security data, with some papers focusing on certain kinds of cyber attacks, like malware or port scanning. In this section, some examples of previous visualization methods from the VizSec conference papers are presented. This thesis will focus on the SVED-generated log files, and depending on what kind of attacks they contain, different graphs and charts can be used, as in the following papers.

2.5.1.1 Log File Visualizations

For visualizing log files, systems such as ELVIS, CORGI and OCEANS have been created. ELVIS (Extensible Log VISualization) is a security-oriented log visualization tool that allows visual exploration and linking of numerous types of log files through relevant representations [11]. ELVIS has two main phases. In the first phase, the user imports log files and is given a global overview in the form of a summary view. In the second phase, the user selects fields to be represented, and those are then automatically represented in a relevant way by choosing an appropriate chart or graph. CORGI (Combination, Organization and Reconstruction through Graphical Interactions) was created by the same people as ELVIS and is also a security-oriented log visualization tool, allowing exploration and linking through representations and global filtering [10]. CORGI gives an overview of the file with possibilities to select, filter and explore specific fields and log files further, similar to ELVIS. The tool uses sparkline-type bar charts to show the field distribution in selected log files and area graphs to display the event time distribution for each log file. OCEANS (Online Collaborative Explorative Analysis on Network Security) is an online visual analysis system, showing temporal overviews of netflow and different log data through multi-level visualizations [4]. The system uses a timeline overview consisting of several horizon graphs and then shows more detailed content in ring graphs and a connection river graph.

An extension to a PHP-Intrusion Detection System has also been created by Alsaleh et al. [1], where PHPIDS log files are visualized to try to understand web server attacks better. They explore and study the suitability of several data visualizations, for example the scatter plot, tree view, IP address aggregation graphs, bar graphs, ring views and parallel coordinates.

2.5.1.2 Other Visualizations

The following examples are not connected to log files but describe other types of cyber security systems and visualizations used in different papers.

IMap by Fowler et al. [7] visualizes network activity by using internet maps. By visualizing large volumes of dynamic network data they enable detection of security threats. A canonical map is used to represent the autonomous system level of the internet topology and heat maps are used for the aggregated IP traffic. PERCIVAL (Proactive and rEactive attack and Response assessment for Cyber Incidents using Visual AnaLytics) is another tool for visualizing cyber security [2]. Specifically, it is a situational awareness tool that relies on an attack graph representation, supporting proactive and reactive analysis. The attack graph is used to analyze attack paths, path topology, likelihood and risk, as well as to analyze the attack progress along proactively computed attack paths and to provide insights on possible evolutions of attacks [2]. To visualize a specific network, for example a malware distribution network, Peryt et al. [14] used a directed graph. The directed graph captured the temporal topological structure of the MDN (Malware Distribution Network) at a given point in time, with the nodes acting as either intermediaries facilitating malicious traffic, malicious hosts or root malicious hosts.


3 Design

This chapter describes the user study and design process of the visualization.

3.1 User Study

The first step of creating the visualization was to identify the potential users and ask them to participate in a user study. The aim of the study was to discern the users’ needs and experiences of CRATE and SVED, and to find out their purpose in using the text logs as well as any requests for the visualization design.

A list of potential users was given by the supervisor at FOI, and one participant was found after discussing the log files at a meeting. A total of four people agreed to participate in the study. All participants worked at the Information Security and IT-Architecture unit at FOI, and they were all familiar with CRATE. Two of them had never seen the log files produced by SVED, but all of them were interested in a visualization of the log files. Some potential use cases were education, debugging and showcasing for clients.

The user study consisted of an interview, divided into three parts. The first part had questions about the users’ experience with CRATE and SVED. The second part focused on the log files, for example what information they look for, if the information varies, and pros and cons of using the text logs. The final part was about the visualization. The participants were asked about preferences of visualization methods, opinions on different design alternatives and if they had any specific requests or thoughts about the visualization.

The results from the user study clarified a range of different tasks that the users want to perform when using the visualization and how they solve these tasks today. Problems like debugging, finding anomalies or comparing the results of games in CRATE are solved today by searching the log files manually, using the GUI or scripts. Such manual work can be tedious, and the findings can be hard for an outsider to comprehend. By using a visualization, they would like to solve these tasks more easily and show the data in a way that is easy to comprehend. The tasks they would like to solve are:

• Compare sessions and detect the differences.

• Observe the status of a network and see compromised machines.

• Follow the attack, see which network it attacks and where it is stopped.

• See how much of the network the aggressor is able to see.

• See which attacks are successful, when they succeed and how many there are.

• Follow the duration of the attacks and compare it to how participants in exercises perceived it.

• Filter the content by specific injectors.

• Get the log files for a specific action or machine within a session.

3.2 Priority List

The user study also established the most important elements to include in the visualization. An attack graph plotting the actions of the attacker was considered to be the most important graph, followed by graphs showing the number of affected machines and succeeded or failed attacks. A list of what to prioritize when implementing the visualization was created from the results of the user study, and is listed below.

• MVP (Minimum Viable Product) - Must have:

– A simplified overview of the data.

– Attack graph: to give status and feedback of the attacks, with different visualizations depending on attack type.

– Graph illustrating number of affected machines.

– Graph illustrating successful/failed attacks.

– Be able to filter the data on games, sessions and machines (attacker).

• Good to have:

– The possibility to get more details on demand of attacks and the networks etc. in the visualization.

– A timeline to go back and forth in time.

– Network map of organizations, networks and machines.

– Attack plan, showing what is planned.

– Ability to filter on the attack type.

• Nice to have:

– Network map: showing the attacker’s point of view.

– Differentiate attacks further (e.g. browser from server exploit).

– Add the logs to the visualization.

– Add possibility to compare sessions.

– Connect with CRATE Exercise Control (which is a system they use to evaluate participants during exercises).

3.3 Early Design

The priority list was used as a foundation for creating multiple concept designs, some of which were selected and digitized.

The early design concepts consisted of three different types of layouts. The first type was a single page layout where all visualization elements were visible at the same time and on the same page, see Figure 3.1. The second type was also a single page layout but where the visualization elements were larger than the window size and would therefore be accessed by scrolling in the page, see Figure 3.2. The final type was a layout with multiple tabs where the visualization elements were categorized into respective tabs, except a timeline which was visible on all tabs, see Figure 3.3.

Figure 3.1: Different design versions of a single-page layout, panels (a) to (d).

Figure 3.3: Design with a tab layout.

3.4 Workshop 1

The design concepts, as well as different versions of icons and graph types, were introduced to the thesis supervisor and the participants of the user study at a workshop to get their opinions on the different options. The participants favored the tab layout (see Figure 3.3) over the other options, but they did not like the pie charts and deemed the statistics page quite unnecessary. The statistics could be interesting for outsiders but would not be important for the scientists at FOI.

For the network graph the participants preferred the icicle chart (Figure 3.1b) or a traditional network graph, but wished for fewer colors within the graphs. They liked both the icons and the geometric shapes in the graphs and would like both implemented in the visualization, possibly with a button to switch between them. The participants also liked the idea of a timeline, but pointed out that actions can run over a long time and can also occur simultaneously. The actions within the timeline would therefore have to be separated somehow.

For the attack graph the participants thought the colored edges made it confusing to follow the advancement of the attack. They were not sure if a colored edge meant that the action (node) had the indicated status or if it was an edge condition. They also pointed out that the attacks in SVED can be cyclical, so visualizing them as a tree would not be suitable. The participants also wanted access to the logs within the visualization to be able to compare the graphs with the actual data.

Their feedback led to the creation of a new design consisting of a mix of previously mentioned concepts with some alterations based on their opinions. The new design was shown to the supervisor at a meeting and was approved as the final design.

3.5 Visualization Design

The final design concept was constructed using the previous concept designs and the feedback from the first workshop. The digitized version was created in GIMP and is shown in Figures 3.4, 3.5 and 3.6.

Figure 3.4: Final design concept of the attack graph tab.

Figure 3.6: Final design concept of the network tab.

3.6 Workshop 2

Another workshop was held halfway through the project to get feedback on the current progress of the implementation. The participants consisted of the same people as the previous workshop as well as other employees from the unit with an interest in the thesis.

During the workshop a summary of the user study and the previous workshop was given, and an implementation of the design in section 3.5 was demonstrated and explained to the participants. They were also asked a couple of prepared questions regarding specific design alternatives and functionalities. The participants gave feedback on the implementation and discussed some improvement ideas. One of their ideas was to indicate more clearly when actions start and end within the timeline, and also to show the connection between the actions in the timeline and the attack graph. On the statistics page they suggested adding a graph visualizing the time distribution of the different types of actions in the attack graph. On the network page they liked the icicle chart as a network map but requested other colors to be able to distinguish attacked machines more easily. They also wanted to mark which logs determine the colors. More general ideas were to change the IP input from a select list to a text input (in case the data is transferred to other IPs), and to update the time input when the timeline handle is moved.

The feedback from the discussion was written down and potential alterations were put into a priority list, which was used as a plan for further implementation. Section 3.7 mentions some of the alterations implemented after the workshop.

3.7 Final Design

The final design consists of all the elements presented in the digitized concept, see Figures 3.4, 3.5 and 3.6, with some alterations based on the feedback from the second workshop. Some of the major differences are: the colors in the icicle chart, the appearance of the attack graph and timeline, and addition of a network graph.

4 Implementation

Once the design was settled, the implementation could commence. The visualization was implemented as a web application and was constructed with JavaScript, together with HTML and CSS, for front-end development, and with PHP and AJAX (Asynchronous JavaScript and XML) for data handling. The project files were structured into two categories: the first consisted of files that gather and parse data from the REST API and distribute it to the second category, which consists of the files that build the visualization elements. Every visualization element in the project was implemented in a separate file to ease the workflow when coding simultaneously.

4.1 The Data

The data was accessed through the REST API using PHP and AJAX. AJAX allowed the web application to be updated without reloading the whole page, and by using the XMLHttpRequest object the web application could send data to and request data from the server (database). PHP was used to handle the requests from AJAX, read the content of the database and send it back to the application. Figure 4.1 shows the basic structure of the data used for the visualization. The data was received in JSON format and was then parsed to fit the data structure required by the target visualization element. Several requests could be necessary to access the wanted data, since games, organizations, networks and sessions are separated on the REST API and are only connected by ID numbers, as illustrated in Figure 4.1. To get victim systems (machines) in a game one must:

1. Request the selected game by ID

2. Get organization ID-list

3. Request all the separate organizations by ID

4. Get network ID-list

5. Request all the separate networks by ID

6. Get victim system list

Figure 4.1: Graph illustrating the basic structure of the data in the REST API.

Figure 4.2: The start page.

This requires a lot of function calls and requests. To reduce the number of requests, the data is only fetched once when pressing "load data", and important information is stored in the graph nodes for easier access later (when updating icons for status feedback for example).
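The six lookup steps listed above can be sketched as a chain of requests. This is only a minimal illustration, not the code used in the project: the endpoint paths and the field names `organizations`, `networks` and `victimSystems` are hypothetical, and the transport function `fetchJson` is passed in as a parameter so that any request mechanism can be used.

```javascript
// Sketch of the chained lookups needed to list the victim systems of a game.
// Endpoint paths and field names are hypothetical; the real REST API may
// name them differently.
async function getVictimSystems(gameId, fetchJson) {
  // Steps 1-2: request the game and read its list of organization IDs.
  const game = await fetchJson(`/games/${gameId}`);

  // Steps 3-4: request every organization and collect the network IDs.
  const organizations = await Promise.all(
    game.organizations.map((id) => fetchJson(`/organizations/${id}`))
  );
  const networkIds = organizations.flatMap((org) => org.networks);

  // Steps 5-6: request every network and collect the victim systems.
  const networks = await Promise.all(
    networkIds.map((id) => fetchJson(`/networks/${id}`))
  );
  return networks.flatMap((net) => net.victimSystems);
}

// One possible transport, using the browser Fetch API. This is an assumption
// made for brevity; the thesis used XMLHttpRequest with a PHP back end.
const browserFetchJson = (path) => fetch(path).then((res) => res.json());
```

Even for a small game the number of requests grows with the number of organizations and networks, which illustrates why fetching the data only once is worthwhile.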

4.1.1 Filtering the data

The web application has a filter section that determines which data to display in the visualization, see Figure 4.2. The filter has six input fields: IP-address, game, session, injectors, date and time. The first four inputs are in the form of select lists and the last two are input fields of the types date and time.

When loading the page or switching the selected IP-address, the application fetches the name and ID of all games in the selected database and adds them to the game select list. When a game is selected, the application fetches the injectors along with the names and IDs of all sessions in the game, and puts the data in their respective select lists. The date and time inputs are set to the current date and time when loading the page and can be set to any other time of choice, but automatically change to a valid time if the input is invalid. At the bottom of the filter section is a button that, when pressed, requests data from the REST API according to the settings and sends the data to the functions that create the visualizations.

Figure 4.3: The attack graph.

4.2 Attack Graph

The desired visualization of the attack graph was a structured directed graph that was easy to follow, since a graph can be confusing if there are many nodes and no apparent direction. Research into directed graph libraries led to dagre.js, which had a solution to create directed graphs in a structured way and had support for rendering with the D3 library. The attack graph shows the progress of the attacks and displays the status of each action with different colors, see Figure 4.3. It was implemented using two primary functions: one that creates and renders the graph, and another that updates the nodes of the rendered graph to give feedback. The idea was to render the graph once a session was loaded and then update the nodes to give feedback at different time frames. The function that constructs the graph was implemented using functions from the dagre.js library, which provided the functionality to create nodes and connect them with edges. The nodes in the attack graph were initialized with information such as ID, action type, an image, and a description. These were connected with edges, which contained information about the conditions and requirements to be fulfilled. After the nodes and edges were initialized, the D3 renderer was used to render the graph in a structured manner.

The function that updates the graph was implemented to switch the images in the nodes instead of re-rendering the graph, which reduces the computation time when visualizing large sessions. Data containing the active actions in the attack graph are sent to the function, which in turn accesses the respective nodes and switches each node image depending on the resulting condition to give feedback to the user. Two sets of images for the nodes were created: one set that contains detailed images, see Figure 4.4a, and another set containing simple geometrical shapes, see Figure 4.4b. The detailed images were made to give a better representation of the node types but tend to be blurry when the nodes are small and numerous, and vice versa for the geometrical shapes. Both sets of node images were designed to have a large perceptible visual distance between the action types, with distinguishable icons and contrasting shapes. A switch button was added in the upper right corner so that the user can choose which set of images fits the current session best.
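The update step described above can be illustrated with a small sketch that swaps node images in place without touching the layout. The pairing of action types with icons follows Figure 4.4, but the function names, the status strings and the file paths are assumptions made for the example, not the project's actual identifiers.

```javascript
// Icon names per action type for the two icon sets (pairs from Figure 4.4).
const ICONS = {
  detailed: { auxiliary: "cog", exploit: "skull", shellcode: "console", scan: "radar" },
  simple: { auxiliary: "diamond", exploit: "triangle", shellcode: "square", scan: "circle" },
};

// Hypothetical naming scheme: one image file per icon and status.
function iconFor(actionType, status, iconSet) {
  return `icons/${iconSet}/${ICONS[iconSet][actionType]}-${status}.svg`;
}

// Switch the image of every active action's node; the graph itself is not
// re-rendered. Returns the number of nodes whose image actually changed.
function updateNodeImages(nodesById, activeActions, iconSet) {
  let changed = 0;
  for (const { id, status } of activeActions) {
    const node = nodesById.get(id);
    if (!node) continue; // the action is not part of the rendered graph
    const image = iconFor(node.actionType, status, iconSet);
    if (node.image !== image) {
      node.image = image;
      changed += 1;
    }
  }
  return changed;
}
```

Because only the image reference of the affected nodes changes, the cost of an update depends on the number of active actions rather than on the size of the graph.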

To give the user more interaction and information, a tooltip was implemented that appears when hovering over a node in the attack graph. The tooltip gives information about the name of the action, its ID, a short description, which injector it uses, and the first and last time it is used.

Figure 4.4: The different node options for the attack graph available by using the switch button, where a) contains the detailed icons and b) contains the simple geometrical icons. Icon description: cog/diamond = auxiliary action, skull/triangle = exploit action, console/square = shellcode action and radar/circle = scan action.

4.3 Statistics

The statistics page consists of three bar charts and a game summary with two lists of information, see Figure 4.5. The bar charts were implemented using the D3.js library and show data on attacked machines, the time distribution of different types of actions and the occurrences of each action status. The status distribution was obtained by counting the occurrences of each state in the logs, the attacked machines were obtained by counting the number of attacked machines in the network data, and the time distribution was given by calculating the duration of each action.
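Two of the aggregations behind the bar charts can be sketched as follows. The field names `status`, `type`, `start` and `end`, and the use of ISO time stamps, are assumptions made for the example; the actual log format may differ.

```javascript
// Count how often each status occurs in the logs.
function countStatuses(logs) {
  const counts = {};
  for (const log of logs) {
    counts[log.status] = (counts[log.status] || 0) + 1;
  }
  return counts;
}

// Sum the duration of the actions, in seconds, per action type.
function durationByType(actions) {
  const totals = {};
  for (const action of actions) {
    const seconds = (Date.parse(action.end) - Date.parse(action.start)) / 1000;
    totals[action.type] = (totals[action.type] || 0) + seconds;
  }
  return totals;
}
```

Each returned object maps a category to a number, which is the shape a bar chart typically needs: one bar per key.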

The summary includes general information about the game and session, for example start time, end time and which actions are active. The first list shows data for the whole game while the second list shows data up to the time set in the timeline. Some variables were obtained directly from the data, such as the game name and the number of logs. Other variables were obtained by counting objects in the data or going through the content of objects, such as the occurrences of different states and which actions produce logs.

4.4 Network Map

The network map shows the hierarchy of the network in the selected game and gives feedback on the session in terms of attacked computers and networks in the organizations by changing the cell color, see Figure 4.6. The network map was implemented as a zoomable icicle chart from the D3.js library. It required a specific data format to build the hierarchy, with the game as the root and organizations, networks and machines as children. The data was fetched as in the example in section 4.1, with multiple requests due to the structure of the database. The data was built up as follows:

    "Game": {
        "Organization A": {
            "Network A": {
                "Machine A": 1,
                "Machine B": 1
            },
            "Network B": {...}
        },
        "Organization B": {...}
    }

Here the value of each machine represents its status. The default value is one, which is the minimum needed to make a cell appear in the icicle chart. If a machine has been accessed in some way (e.g. scanned) or has been attacked, the value is set to two.

Figure 4.5: The statistics.

The color of a machine cell is determined by the machine’s value and the parent’s color. If the machine has a value of two, it is either red, orange, yellow or blue, depending on the attack type and status. If the machine has been subjected to a successful exploit it becomes red, and if it was subjected to an attempted exploit it becomes orange. If the attack was an auxiliary, scan or shellcode action the machine becomes yellow, and if a network has been scanned the machine becomes blue. Otherwise it takes the color of its parent, given that the parent is not the game.

If a network contains accessed or attacked machines it will take the color of that child, unless the network is already marked as attacked (red). This is checked by comparing the number of children (machines in the network) with the sum of their values. If the sum is larger, at least one machine has been accessed or attacked.
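The color rules and the network check described above can be summarized in a short sketch. The machine fields `value`, `attack` and `succeeded`, as well as the label `"network-scan"`, are hypothetical names introduced for the example; only the rules themselves are taken from the text.

```javascript
// Decide the cell color of a machine from its value and attack information.
function machineColor(machine, parentColor) {
  if (machine.value !== 2) return parentColor; // untouched: inherit from parent
  if (machine.attack === "exploit") {
    return machine.succeeded ? "red" : "orange"; // successful vs. attempted
  }
  if (machine.attack === "network-scan") return "blue";
  return "yellow"; // auxiliary, scan or shellcode action
}

// A network contains an accessed or attacked machine when the sum of its
// machine values exceeds the number of machines (every default value is 1).
function hasTouchedMachine(network) {
  const values = Object.values(network);
  const sum = values.reduce((total, value) => total + value, 0);
  return sum > values.length;
}
```

For example, `hasTouchedMachine({ "Machine A": 1, "Machine B": 2 })` is true because the sum of the values (3) is larger than the number of machines (2).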

Accessed and attacked machines are extracted by checking the output (Data) of the logs, see Figure 4.1. If the output contains certain keywords, it usually contains the target IP addresses for the action. These IP addresses are extracted and stored in an array that is passed to the network map and network graph. If the IP addresses match machines in the network, those cells and nodes get updated with new values and colors.
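A minimal sketch of such an extraction is shown below. The thesis does not list the keywords, so the ones used here are placeholders, and the log field name `data` is likewise an assumption; the IPv4 pattern is a generic regular expression rather than the project's own.

```javascript
// Placeholder keywords; the actual keywords are not given in the text.
const KEYWORDS = ["target", "rhost"];

// Generic IPv4 pattern with each octet limited to 0-255.
const IPV4 = /\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b/g;

// Collect the unique IP addresses from all log outputs that mention a keyword.
function extractTargetIps(logs) {
  const ips = new Set();
  for (const log of logs) {
    const output = String(log.data || "");
    const lower = output.toLowerCase();
    if (!KEYWORDS.some((keyword) => lower.includes(keyword))) continue;
    for (const ip of output.match(IPV4) || []) ips.add(ip);
  }
  return [...ips];
}
```

The resulting array can then be compared with the IP addresses of the machines in the network hierarchy to decide which cells and nodes to update.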

A tooltip was also added to give information on the type of node, its name, and IP, if it is a machine.

Figure 4.6: The network map.

Figure 4.7: The network graph.

4.5 Network Graph

An additional representation of the network was implemented using the same data as the network map, but represented as a network graph, see Figure 4.7. The difference between the two representations is that the network map only shows the hierarchy of the network, while the network graph shows the connections between the networks within the organizations, but does not show the hierarchy as distinctly as the network map.

The network graph was implemented using the D3 library. The nodes and edges of the organizations were created using the hierarchical data and separated from each other using D3’s force simulator. When a graph is created the force is applied for a couple of seconds until the nodes are well separated. Then the simulation is stopped and the icons are added to the nodes, where the organization, network and machine nodes each have different images. Like the network map, the network graph is implemented to mark attacked machines and networks. To mark an attacked machine or network, its icon is switched to another image with a different color. Similar to the attack graph, two sets of images were created, one with detailed icons and one with geometrical shapes, for the user to switch between depending on what fits the current session best. The two node options are shown in Figure 4.8.

Figure 4.8: The different node options for the network graph available by using the switch button, where a) contains the detailed icons and b) contains the simple geometrical icons. Icon description: screen/square = machine, signal/circle = network and humans/triangle = organization.

A tooltip was created to give more feedback to the user. The tooltip appears when hovering over the nodes and displays the type of node, its name, and its IP if it is a machine.

4.6 Timeline

The goal of the timeline was to show the duration of the session and allow the user to observe the graphs at different time frames. It was also decided that the timeline should display when different actions in the attack graph were active. This was achieved by using a swim lane graph, where the actions could be separated into different lanes (see Figures 4.3, 4.5, 4.6 and 4.7). Beneath the lanes an axis was added that scales with the duration of the session, and since a session might last longer than 24 hours it was implemented to scale by dates. Actions were inserted as rectangles in separate lanes depending on the type of action, and each rectangle spans the time from when the action became active until it either succeeded or failed. A marking was added at the start and end of each rectangle to make it possible to detect when actions start or end if multiple actions overlap.
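The placement of the rectangles can be illustrated with a sketch that maps time stamps linearly onto the width of the timeline and assigns one lane per action type. The function and field names are hypothetical, and a plain linear mapping is used here in place of a D3 scale to keep the example self-contained.

```javascript
// Compute one rectangle per action for a swim lane timeline.
// Times are ISO strings; width and laneHeight are in pixels.
function layoutSwimLanes(actions, sessionStart, sessionEnd, width, laneHeight) {
  const t0 = Date.parse(sessionStart);
  const span = Date.parse(sessionEnd) - t0;
  const scale = (iso) => ((Date.parse(iso) - t0) / span) * width;

  // One lane per action type, in order of first appearance.
  const lanes = [...new Set(actions.map((action) => action.type))];

  return actions.map((action) => ({
    id: action.id,
    x: scale(action.start),
    y: lanes.indexOf(action.type) * laneHeight,
    width: scale(action.end) - scale(action.start),
    height: laneHeight,
  }));
}
```

Overlapping actions of different types end up in different lanes, while overlapping actions of the same type share a lane, which is why the start and end markings are needed.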

A tooltip was added to the rectangles to display the actions that belong to each rectangle. The tooltip was implemented to show the labels of all actions at the time corresponding to the mouse position and to highlight the respective actions in the attack graph, if it is displayed.

An interactive handle was implemented in the timeline to allow the user to set a time of choice, which filters the data sent to the other graphs. The handle was implemented as a line which extracts time stamps from the timeline and then scales depending on the width of the timeline to decide its position. When the user interacts with the handle it calls the functions which update the other graphs and filters them with the selected time stamp.
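The conversion between the handle position and a time stamp, and the filtering that follows, might look like the sketch below. The function names and the log field `timestamp` are assumptions made for the example.

```javascript
// Convert a handle position in pixels to a time in milliseconds.
// Positions outside the timeline are clamped to its ends.
function positionToTime(x, width, startMs, endMs) {
  const clamped = Math.min(Math.max(x, 0), width);
  return startMs + (clamped / width) * (endMs - startMs);
}

// Keep only the logs produced up to and including the selected time.
function logsUntil(logs, timeMs) {
  return logs.filter((log) => Date.parse(log.timestamp) <= timeMs);
}
```

When the handle is moved, the selected time would be computed with `positionToTime` and the filtered logs passed on to the functions that update the other graphs.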

Figure 4.9: The log pop-up showing logs produced by a specific action (CTRL-clicked node).

A play button was implemented that, when pressed, continuously moves the handle forward and updates the other graphs simultaneously to show the sequence of events in the game. The play speed is determined by the timeline settings, seen in the bottom right corner of Figure 4.9. The function was implemented by selecting the handle and increasing its value by a fraction of the timeline’s length. For each step it calls the functions to update the other graphs, until it reaches the end of the timeline. Once the button is pressed the icon is changed to a pause symbol, and when pressed again it stops the handle and switches the icon back to a play symbol.
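The play behaviour can be sketched as a small state object whose `tick` method would be called repeatedly by a timer. All names are hypothetical and the timer itself is left out, so this only illustrates the stepping logic, not the actual implementation.

```javascript
// Create a player that advances a handle along a timeline of a given length.
// `fraction` is the share of the timeline covered per step (the play speed),
// and `onStep` is called with the new handle value after every step.
function createPlayer(length, fraction, onStep) {
  let value = 0;
  let playing = false;
  return {
    // Toggle play/pause and return the icon the button should now show.
    toggle() {
      playing = !playing;
      return playing ? "pause" : "play";
    },
    // Advance one step if playing; stop automatically at the end.
    tick() {
      if (!playing) return value;
      value = Math.min(value + length * fraction, length);
      onStep(value);
      if (value >= length) playing = false;
      return value;
    },
  };
}
```

In a browser, `tick` could for instance be driven by `setInterval`, with `onStep` calling the functions that update the other graphs.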

4.7 Logs

The logs were added to the graphs in the form of a pop-up, see Figure 4.9. The purpose of including the logs was to give the user more details on what has happened during the game. To access the logs and open the pop-up, the user "CTRL-clicks" on either a node in the attack graph or a machine in the network map or graph. A window then appears containing a table with the relevant logs, which depends on the clicked graph. The attack graph pop-up contains logs produced by the action that has been clicked, and the network graph and map pop-up contains all logs affecting the clicked machine. In addition to listing the log contents, each log in the table has been given a row that tells which color the log gives the selected action or machine.

5 Evaluation

An evaluation of the visualization was held when all the components were implemented and all known bugs were solved.

5.1 Evaluation Method

The evaluation consisted of a number of tasks derived from the answers in the user study (section 3.1), which were to be solved using either the log files or the visualization on a computer with Windows 10. The purpose of the evaluation was to examine if the visualization was more effective than the log files for solving the defined tasks. The tasks were divided into two groups with three categories each. The categories were general tasks, specific tasks and comparisons. The exact tasks are shown in Appendix A, sections A.1 and A.2. Below are examples of tasks in the different categories:

• General task: Find out when the game started.

• Specific task: Check how many logs there were at time 2019-09-25 10:34:15.

• Comparison: Compare session 38 and session 39 and see if they have any failed actions in common.

The tasks were distributed so that the two groups would require the same type of information and be similar in difficulty. If one group had the task "count the number of successful attacks", the other had for example "count the number of failed attacks". The purpose of this was to get two separate but equal tests, one for the log files and one for the visualization. Which group of tasks was used for the log files and which for the visualization was alternated between the participants.

Those invited to participate were the same group as in the user study. All participants were therefore employed at FOI and had varied knowledge of the log files. The majority of the participants had seen earlier implementations of the visualization at the workshops but had never interacted with it. Before the tasks were introduced, a short explanation of the visualization was given to the participants who had never seen it, and a PDF on how to navigate within the REST API was shown to those who had never seen the log files before.

The participants were told to read the tasks out loud and think aloud when trying to solve them. If they did not understand a task or did not know how to continue they could ask for a hint, and if they could not complete a task at all they were allowed to continue to the next one. Their comments and results were written down, as well as whether they had asked for help or failed to complete the tasks. The task results were categorized as either correctly solved, wrongly solved or not solved. "Correctly solved" meant the participant completed the task with a correct result, "wrongly solved" meant the participant completed the task but with an incorrect result, and "not solved" meant the participant could not complete the task. They were also timed during the evaluation, with one timer for each task category and tool.

5.2 Evaluation Results

There was a total of five participants in the evaluation. Their times, as well as how well they completed the tasks, are shown in Tables 5.1, 5.2 and 5.3. Participants 1, 3, 4 and 5 had little or no experience with the log files prior to the evaluation. Participant 2, however, was very experienced with the log files and therefore had the upper hand compared to the other participants when using the log files during the test. None of the participants had interacted with the visualization though, and only three of the five participants had attended the second workshop.

Table 5.2 shows that the visualization has a higher success rate than the log files when solving tasks (86.2% compared to 53.8%). The number of "not solved" tasks also differs markedly between the tools: the participants completed 45 of the 65 tasks with the logs (69.2%) and all 65 tasks with the visualization (100.0%). The number of "wrongly solved" tasks is very similar, however, with 10 for the logs and 9 for the visualization. The results of each task are shown in a table in Appendix A, section A.3.
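The percentages above follow directly from the task counts, as the short calculation below shows. The counts are taken from Table 5.2; the variable and function names are only illustrative.

```javascript
// Task counts per tool, taken from Table 5.2 (65 tasks per tool in total).
const TASKS = 65;
const counts = {
  logs: { correct: 35, wrong: 10, notSolved: 20 },
  visualization: { correct: 56, wrong: 9, notSolved: 0 },
};

// Percentage of the total, rounded to one decimal.
const percent = (n, total) => Math.round((n / total) * 1000) / 10;

// Success rate: correctly solved tasks. Completion rate: all tasks that were
// completed, whether the result was correct or not.
function summarize(c) {
  return {
    successRate: percent(c.correct, TASKS),
    completionRate: percent(c.correct + c.wrong, TASKS),
  };
}
```

For the logs this gives a success rate of 35/65 and a completion rate of (35 + 10)/65, and for the visualization 56/65 and (56 + 9)/65.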

Table 5.1 shows a more detailed view of the results in Table 5.2, where the result of each participant is displayed per section (task category). The results show that every participant had a success rate with the visualization that was at least as high as with the log files in every task category, except for participant 4 in General Tasks.

Table 5.3 displays the time it took each participant to complete the sections in the evaluation. It shows that all participants, except for participant 1, completed the tasks faster with the visualization. However, participant 1 skipped many tasks with the logs because they did not know how to proceed, which affected their time in the evaluation.

At the end of the evaluation each participant was asked which tool they would prefer if similar tasks were to be solved again, and all five participants favored the visualization over the log files. However, the most experienced user (Participant 2) commented that they would prefer the log files when debugging, since scripts could find most of the problems and no visual representation is needed during debugging when the knowledge of the data is high.

After answering the final question the participants were asked if they had any further input or comments regarding the visualization, log files or evaluation. Most of the comments were about minor design changes, for example changing font size and color to improve readability, and moving all the text, tabs and filtering together for faster navigation. Another comment concerned frameworks: the participants thought it was positive that no frameworks were used, since they can become outdated. The participants explained that they have had problems in the past when working on projects with outdated frameworks, since it has been difficult to merge the old framework with a new updated one.

Table 5.1: Evaluation results. P = participant, T = total.

                                      Logs                      Visualization
                              P1  P2  P3  P4  P5   T      P1  P2  P3  P4  P5   T
General Tasks
  Correctly solved w/o hint    2   3   3   4   3  15       5   6   4   3   4  22
  Correctly solved with hint   0   0   0   0   0   0       0   0   0   0   2   2
  Wrongly solved w/o hint      0   1   2   0   0   3       0   0   1   3   0   4
  Wrongly solved with hint     1   0   1   1   0   3       1   0   1   0   0   2
  Not solved                   3   2   0   1   3   9       0   0   0   0   0   0
Specific Tasks
  Correctly solved w/o hint    1   6   5   2   4  18       4   6   6   4   5  25
  Correctly solved with hint   0   0   1   0   0   1       1   0   0   1   1   3
  Wrongly solved w/o hint      0   0   0   1   1   2       1   0   0   1   0   2
  Wrongly solved with hint     0   0   0   0   0   0       0   0   0   0   0   0
  Not solved                   5   0   0   3   1   9       0   0   0   0   0   0
Comparison Task
  Correctly solved w/o hint    0   1   0   0   0   1       1   1   1   0   1   4
  Correctly solved with hint   0   0   0   0   0   0       0   0   0   0   0   0
  Wrongly solved w/o hint      0   0   0   0   1   1       0   0   0   1   0   1
  Wrongly solved with hint     0   0   1   0   0   1       0   0   0   0   0   0
  Not solved                   1   0   0   1   0   2       0   0   0   0   0   0

Table 5.2: Summary of the evaluation results. N = number of tasks.

                       Logs          Visualization
                     N      %          N      %
Correctly solved    35   53.8         56   86.2
Wrongly solved      10   15.4          9   13.8
Not solved          20   30.8          0    0

Table 5.3: Time spent to complete the tasks.

                   Participant 1   Participant 2   Participant 3   Participant 4   Participant 5
                   Logs    Vis.    Logs    Vis.    Logs    Vis.    Logs    Vis.    Logs    Vis.
General Tasks       7:25    8:00   17:30    3:00   20:23   13:34   14:44    9:47   19:31   13:51
Specific Tasks      4:13    4:50    6:03    2:47    9:22    6:38   25:17    4:32   11:16    6:47
Comparison Task     2:05    2:03    2:35    0:30    8:18    0:31    0:22    7:41    3:12    0:41
Total time         13:43   14:53   26:08    6:17   38:03   20:43   40:23   22:00   33:59   21:19

6 Discussion

This chapter will discuss the results from the user study, design and workshops, evaluation and final visualization. It will also discuss the choices regarding the implementation methods and answer the research questions for this thesis.

6.1 User Study and Visualization Design

The user study gave a lot of insight into potential use cases, which turned out to be wider than anticipated. It showed that the number of log file users was very limited, but that the number of potential users of the visualization was a lot bigger. The log files were mainly used for debugging at the time, but the visualization could also be used for exercises (education) and showcasing for clients, which opens it up to a lot more users. The wide variety in the participants’ experiences of the log files was beneficial for the study as well, since it gave different points of view and feedback. The purpose of the user study was to learn the users’ goals, requirements and work experience in order to increase the effectiveness of the design. Based on the results from the user study, the first part was achieved. The designs were also created based on the priority list that was put together after the user study, and according to the workshop feedback and the final evaluation the design has been effective in fulfilling the users’ needs, which means the last part has been achieved as well.

The components in the visualization were based on the user study, and the design and actual components were decided based on related work, previous knowledge and the workshops. The main components, the attack and network representations, both had several different design versions at first.

The attack representation had two different versions: the first was an attack graph and the other an attack tree. The users wanted to be able to follow the attack path and progression, and both the graph and the tree provide a clear structure and show the topology of the attack path. Graphs have been used to analyze attacks in similar visualizations, as mentioned in section 2.5.1.2. Another reason these structures were included in the design was that the SVED GUI uses a graph structure when creating the attack, and having a consistent structure makes the transition between the two GUIs easier for the user. Both the tree and the graph were presented at the first workshop, but the tree design was discarded since there could be cyclical attacks in a game. However, the resulting implementation is a mix of the two, since the dagre.js library is able to create structured directed graphs. This made it possible to combine the distinct path topology of a tree with the cyclical properties of a graph.
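The reason a pure tree could not be used can be illustrated with a small sketch: a tree cannot contain a cycle, whereas a directed graph can. The code below is a minimal example in plain JavaScript under stated assumptions. The step names in `edges` and the helper `hasCycle` are hypothetical and are not taken from the thesis implementation, SVED or dagre.js.

```javascript
// Hypothetical attack steps; the edge list is illustrative, not taken from SVED logs.
const edges = [
  ["scan", "exploit"],
  ["exploit", "escalate"],
  ["escalate", "scan"], // returning to an earlier step creates a cycle
];

// Returns true if the directed graph described by edgeList contains a cycle.
function hasCycle(edgeList) {
  // Build an adjacency list that contains every node, including pure targets.
  const adjacency = new Map();
  for (const [from, to] of edgeList) {
    if (!adjacency.has(from)) adjacency.set(from, []);
    if (!adjacency.has(to)) adjacency.set(to, []);
    adjacency.get(from).push(to);
  }

  // Node state: undefined = unvisited, 1 = on the current path, 2 = finished.
  const state = new Map();

  function visit(node) {
    if (state.get(node) === 1) return true; // back edge: cycle found
    if (state.get(node) === 2) return false; // already explored, no cycle here
    state.set(node, 1);
    for (const next of adjacency.get(node)) {
      if (visit(next)) return true;
    }
    state.set(node, 2);
    return false;
  }

  for (const node of adjacency.keys()) {
    if (visit(node)) return true;
  }
  return false;
}

const cyclicAttack = hasCycle(edges);
const treeLikeAttack = hasCycle([["scan", "exploit"], ["scan", "login"]]);
```

An attack path for which `hasCycle` returns true cannot be drawn as a tree without duplicating nodes, which is consistent with the decision to use a layered directed graph instead.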

The network representation had three different versions. Some were focused on the hierarchical structure and others on the connections within the network. The users mainly wanted to be able to see which machines and networks had been attacked during the game, and thus several charts and graphs fit that description. In the early design examples in section 3.3, an icicle chart, a circle packing chart and a graph were drawn up and presented at the workshop. The participants of the first workshop preferred the icicle chart and the graph as representations of the network, where the icicle chart gives an overview of the network with clear indications of which networks have been attacked and the graph gives a more detailed view of the network.
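As a rough illustration of how an overview such as the icicle chart could indicate which networks have been attacked, the sketch below propagates an attacked flag from machines up to their parent networks. The hierarchy, the field names (`name`, `children`, `attacked`) and the helper `markAttacked` are hypothetical and are not taken from the thesis implementation.

```javascript
// Hypothetical network hierarchy; names and fields are illustrative only.
const network = {
  name: "root",
  children: [
    {
      name: "office",
      children: [
        { name: "pc-1", attacked: false },
        { name: "pc-2", attacked: true },
      ],
    },
    {
      name: "dmz",
      children: [{ name: "web-1", attacked: false }],
    },
  ],
};

// Marks every network that contains at least one attacked machine.
// Returns true if the node itself or any of its descendants was attacked.
function markAttacked(node) {
  if (!node.children) return Boolean(node.attacked); // leaf: a machine
  let anyAttacked = false;
  for (const child of node.children) {
    // Visit every child (no short-circuit) so that all subtrees get marked.
    if (markAttacked(child)) anyAttacked = true;
  }
  node.attacked = anyAttacked;
  return anyAttacked;
}

markAttacked(network);
```

A nested structure of this kind is the sort of input that D3's hierarchy layouts, such as `d3.hierarchy` combined with `d3.partition`, are designed to consume, so the computed flag could then be used to style each rectangle in the overview.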

Another component with several versions was the statistics, where the methods ranged from simple bar charts to pie charts and sunburst charts. The idea was to give more details on the log contents, much like the visualization examples in section 2.5.1.1, which give an overview using graphs and summary views and then provide details by selecting parts of the data or by displaying the distribution of certain events in other graphs. The bar chart was chosen at the workshop because the difference between bars is easier to distinguish than the difference between slices of a pie chart.
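The kind of aggregation that a statistics bar chart relies on can be sketched as follows. This is a minimal example in plain JavaScript; the log entries, the `type` field and the helper `countByType` are assumptions made for illustration and do not reflect the actual SVED log format.

```javascript
// Hypothetical log entries; the "type" values are illustrative only.
const logEntries = [
  { type: "exploit" },
  { type: "scan" },
  { type: "exploit" },
  { type: "login" },
  { type: "exploit" },
  { type: "scan" },
];

// Counts entries per type and returns one { type, count } object per bar,
// sorted so that the tallest bar comes first.
function countByType(entries) {
  const counts = new Map();
  for (const { type } of entries) {
    counts.set(type, (counts.get(type) || 0) + 1);
  }
  return [...counts]
    .map(([type, count]) => ({ type, count }))
    .sort((a, b) => b.count - a.count);
}

const bars = countByType(logEntries);
```

Each element of `bars` corresponds to one bar, with `count` mapped to the bar height. Comparing lengths along a common axis in this way is what makes the differences easier to read than the angles of pie slices.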

Every component of the design was created with the challenges mentioned by Shiravi et al. [15] and Best et al. [3] in mind, see section 2.3. A user study and two workshops were held to increase the effectiveness of the design. The data was filtered and divided into several visualizations to avoid occlusion and overcrowding, and the visualizations have been kept as simple and clean as possible in order to present the results of the attack clearly. As for the graphical components in the visualizations, the icons (nodes in the graphs) have been designed with distinct and diverse shapes and colors so that they can easily be told apart, again to make the results clear and the data comprehensible, as recommended in section 2.3. The status colors in the graphs are based on basic color association, where most people associate green with something good, red with something bad and yellow or orange with something to be cautious about. The blue color in the network graphs was chosen because it is neutral and easily differentiated from the other colors. The path conditions (edges) in the attack graph were set to have the same colors and dash marks as SVED’s attack graph to keep a sense of consistency and make the transition between the visualization and SVED’s GUI smoother.
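The color associations described above can be expressed as a simple lookup. The sketch below is only an illustration: the status names (`good`, `bad`, `caution`), the concrete color values and the helper `statusColor` are assumptions, not the identifiers or colors used in the thesis implementation.

```javascript
// Hypothetical status names mapped to the color associations described in the text.
const STATUS_COLORS = new Map([
  ["good", "green"],     // something good
  ["bad", "red"],        // something bad
  ["caution", "orange"], // something to be cautious about
]);

// Blue is treated as the neutral fallback, as in the network graphs.
const NEUTRAL_COLOR = "blue";

// Returns the color for a status, falling back to the neutral color
// for any status that has no explicit association.
function statusColor(status) {
  return STATUS_COLORS.get(status) ?? NEUTRAL_COLOR;
}

const exampleColors = ["good", "bad", "caution", "unknown"].map(statusColor);
```

Keeping the mapping in one place makes it straightforward to keep node colors consistent across several views.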

6.2 Implementation

The visualization was mainly constructed with JavaScript, using HTML and CSS for the front-end and PHP and AJAX for data handling. No framework was used for the back-end or front-end implementation during the development. The advantage of using a framework is that it could simplify the workflow and increase the replicability of the thesis. However, it was uncertain how well the D3.js and dagre.js libraries would work together with frameworks, and therefore no framework was used. Furthermore, it will probably be easier for an outsider to understand explicit JavaScript code than a framework they might never have used. Not using a framework also lowers the risk of the code becoming outdated and makes it easier to develop further, as one participant mentioned during the evaluation, see section 5.2.

Two workshops were held during the thesis, one of which took place late in the implementation phase. Additional workshops could have been added throughout the thesis to ensure that the project was on the right track during the implementation and to give the researchers at FOI more chances to provide feedback. This would in theory increase the chances of creating a product that fits their vision of the visualization even better.

Not all planned elements were implemented in the visualization. According to the priority list created during the user study everything from the MVP (Minimum Viable Product) and

Linköping University | Department of Science and Technology

Master’s thesis, 30 ECTS | Media Technology

2020 | LIU-ITN/LITH-EX-A--20/001--SE

Visualization of Cyber Security

Attacks

Visualisering av cybersäkerhetsangrepp

Jennifer Bedhammar

Oliver Johansson
