Modelling and simulating Identity and Access Management based lateral movement in a cloud infrastructure

DEGREE PROJECT IN ENGINEERING PHYSICS, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2019


Using simulation techniques and graph theory for computer security applications

EMMA FILIPSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES


Master in Engineering Physics Date: June 5, 2019

Supervisor: Benjamin Greschbach Examiner: Anatoly Belonoshko

School of Engineering Sciences (SCI) Host company: Spotify AB


Abstract

Cloud infrastructures are easily accessible, reliable, and scalable. Their use in industry is increasing, and they are becoming a vital part of many companies.

Lateral movement based on Identity and Access Management (IAM) permissions of service accounts is a type of threat that is amplified in cloud infrastructures. It refers to the movement of an attacker between projects and resources hosted in a cloud infrastructure. The attacker using lateral movement gets access to more resources and thereby, the risk of sensitive data leakages increases.

This thesis proposes methods to visualize and analyze the possible IAM based lateral movements. It also suggests strategies to reduce the possibilities for lateral movement. For that, two optimization problems are solved. The first is isolating two sets of projects. The second is finding the most powerful service accounts.

By modelling the possible IAM based lateral movement, security engineers can get a visualization of their cloud infrastructure and better understand the security risks. In this thesis, the IAM based lateral movement is visualized with graph models.

The optimization problem of separating two sets of nodes in the graph model is solved with the graph algorithm "max-flow min-cut". The min-cut represents permissions of service accounts that need to be removed in order to separate the two sets of projects in the cloud infrastructure. It can be used to isolate sensitive projects.

The most powerful service accounts are the accounts that, if removed, remove the most possibilities for lateral movement. Identifying them can be an effective way to improve the security posture of a cloud infrastructure, for example by reducing their permissions or by adding controls such as monitoring and alerting.

The resulting models and optimizations were useful for security engineers. They were built in a matter of seconds and could be applied to large-scale cloud infrastructures.

Keywords— Computer Security, Cloud Infrastructure, Lateral Movement, Identity and Access Management, IAM


Acknowledgements

First and foremost, I would like to thank my supervisor Benjamin Greschbach for guiding me in this work and for everything he taught me. With constant support, he helped structure this work and introduced me to all the new concepts needed to complete this project.

I would also like to thank my examiner Prof. Anatoly Belonoshko for important feedback and help during the project. With his help, I was able to apply physics knowledge to the music and tech industry.

While working on this project, I was surrounded by amazing people at Spotify HQ. I would like to give my special thanks to the Wasabi team who made my days extra enjoyable and meaningful. I am grateful to all of you I had the pleasure to work with during this and other related projects. I am particularly grateful for the time my manager Katja Lotz, and again supervisor Benjamin Greschbach, spent on spell-checking this thesis.


Chapter 1 Introduction

Cloud based services are constantly growing in diversity and scale. They provide easy access and reliable infrastructures, but also pose new challenges for security engineers in the work to protect the infrastructures from attacks. A new amplified type of threat, using lateral movement based on Identity and Access Management (IAM) permissions of service accounts, has been introduced with cloud computing. [1]

IAM is the management of policies that control which identities have access to which resources. An IAM policy is a collection of role-bindings associating identities with roles to a resource. The role decides what permissions the identity has within the resource. For example, permission to view a resource is managed with the role ’viewer’.

Lateral movement is the technique attackers use when moving through a network after an initial compromise to reach more valuable targets. Lateral movement usually refers to taking over servers using network connections. This thesis focuses on attackers who take over servers by using credentials available on the already compromised server. In a cloud infrastructure, this latter kind of attack is especially relevant, as machine-to-machine credentials are highly standardized and can easily be tested and prepared by an attacker before compromising the first server. In a cloud infrastructure, projects take the place of servers, but the same techniques apply. [2]

IAM based lateral movement between projects is possible if a service account within one project has a role that allows usage of resources, including service accounts, within another project. This is because service accounts can be both identities and resources.


Ideally, all possibilities for lateral movement should be removed to restrain the reach of a potential attacker. Service accounts have access to different projects for various reasons and with different permissions, and it might not be possible to remove all of them. There is a trade-off between security and usability, but the dependency does not have to be linear. The goal of this thesis is to find ways of increasing security posture with the least decrease in usability.

Modelling the IAM permissions based lateral movement in a cloud infrastructure will provide security engineers the visualization of the problem they cannot get from the permissions data only. Using graphs to model this, nodes represent projects and edges represent possible lateral movement.

Besides the visualization, two optimization problems are tackled in this thesis: separating two sets of projects and finding the most powerful service accounts. The optimization problems were chosen based on the two aspects of the lateral movement threat they address: securing the most sensitive projects and finding the powerful service accounts that need to be monitored or removed. Both problems are solved using graph models. Separating two sets of projects gives an optimal way of isolating sensitive projects, and finding the most powerful service accounts can guide security engineers on which accounts need extra control or should be removed. The resulting suggestions can be used to reduce the security risk of lateral movement, and thereby increase security posture, in an optimal way.

1.1 Research Question

Lateral movement based on IAM permissions of service accounts is an amplified type of threat in cloud infrastructures. From a security engineering perspective, it is of interest to reduce the possibilities for lateral movement with minimal impact on legitimate infrastructure usage.

To address this issue and provide a tool for improving security posture, questions covering different aspects of the problem were formulated. First, how can IAM based lateral movement in a cloud infrastructure be modelled? Furthermore, what is the smallest amount of permissions to remove from existing service accounts to completely separate two sets of projects? And what are the most powerful service accounts, and how many possibilities for lateral movement can be removed by removing those accounts?


Chapter 2 Background

This chapter gives an introduction to computational physics and how simulation and modelling are used to simulate physical phenomena. It introduces computer security, cloud services, and how Identity and Access Management (IAM) is used to control permissions in these services. Related work is also included.

2.1 Computer simulation methods

The fast-growing field of computer technology makes it possible to think of physical systems in new ways. Formulating physical laws as rules for a computer is a natural and practical way of solving physical problems. Gould et al. have identified modelling and simulation as the most important methods within computational physics. [3]

Modelling is a powerful tool to understand physical phenomena through visual representation. Humans can more easily determine patterns and trends from a visualization of data compared to looking at the data itself. [3]

Computer simulations model real events and outcomes. Simulation techniques can easily be controlled, as parameters and functions are manually set. Simulations can be reproduced and generate the exact same result. [4]


2.1.1 Modelling movement with flow networks

A flow network is a network graph with flow between nodes. It can be used to model any system of connections where there is some kind of transportation or movement. [5]

The flow network is a directed graph G = (V, E) where V represents a set of nodes and E a set of directed edges. A source is a node with only outbound edges and a sink is a node with only inbound edges. Every edge e = (u, v) has a capacity c(u, v), which represents the maximum flow allowed through that edge. See Figure 2.1 for an example of a flow network with a source s and a sink t. [5]

[Figure 2.1: flow network with edges s → n1 (capacity 2) and n1 → t (capacity 3).]

Figure 2.1: A simple flow network with a source (s) and a sink (t). The edge labels represent capacity.

The max-flow min-cut problem in a flow network is to find the maximum amount of flow that can be sent from one node to another. By the max-flow min-cut theorem, this value equals the capacity of the minimum cut that separates the two nodes. [5]

The flow $f_{uv}$ through an edge $(u, v) \in E$ is restricted by $f_{uv} \le c(u, v)$ and, for every node $v$ other than the source and the sink, by $\sum_{u : (u,v) \in E} f_{uv} = \sum_{w : (v,w) \in E} f_{vw}$. That is, the flow must be less than or equal to the capacity, and the flow into a node must equal the flow out of the node. In the example flow network in Figure 2.1, this means that the flow $f_{n1,t}$ is at most 2, since the flow into n1 is at most 2. [5]
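The two constraints can be checked mechanically. The following is a minimal sketch, assuming that capacities and flows are stored as dictionaries keyed by directed edge; the function name `is_feasible_flow` and this data layout are illustrative choices and are not taken from the thesis.

```python
from collections import defaultdict


def is_feasible_flow(capacity, flow, source, sink):
    """Check the capacity and conservation constraints for a flow assignment.

    capacity and flow map directed edges (u, v) to non-negative numbers.
    """
    # Capacity constraint: 0 <= f(u, v) <= c(u, v) on every edge.
    for edge, value in flow.items():
        if value < 0 or value > capacity.get(edge, 0):
            return False

    # Conservation: flow into a node equals flow out of it,
    # except at the source and the sink.
    balance = defaultdict(int)
    for (u, v), value in flow.items():
        balance[u] -= value
        balance[v] += value
    return all(net == 0 for node, net in balance.items()
               if node not in (source, sink))


# Figure 2.1: s -> n1 with capacity 2, n1 -> t with capacity 3.
capacity = {("s", "n1"): 2, ("n1", "t"): 3}

ok_flow = {("s", "n1"): 2, ("n1", "t"): 2}
too_much = {("s", "n1"): 2, ("n1", "t"): 3}  # violates conservation at n1
```

Here `ok_flow` satisfies both constraints, while `too_much` respects the capacities but sends more flow out of n1 than enters it.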

The s,t-cut C = (S, T) in a graph G = (V, E) is a partition of V into two disjoint sets S and T such that s ∈ S and t ∈ T. The cut-set of an s,t-cut is the set of edges going from S to T; removing these edges leaves no path from s to t. The minimum cut is the s,t-cut whose cut-set has the smallest total capacity. [5]

The max-flow min-cut can be solved using different algorithms. For small and simple flow graphs, it can be easily determined by looking at the graph. In the example in Figure 2.1, the max-flow min-cut between the source s and the sink t is 2. The smallest cut-set that separates s and t into two disjoint sets is the edge e = (s, n1) with capacity 2. The maximum flow in the graph is the same on both edges, where the capacity is the limiting factor for $f_{s,n1}$ and the incoming flow is the limiting factor for $f_{n1,t}$.


2.2 Computer security

Computer security is the protection of computer infrastructure from digital attacks. In case of an attack, it is important to keep the damage as low as possible. Predicting the outcome of an attack makes it possible to minimize the damage, as certain actions can be prevented. An attack can lead to confidential information leaking, unwanted changes in code, or access to accounts. If the security engineers have accurate predictions of possible attack paths, it is easier for them to prevent severe attacks. [6]

There is a trade-off between usability and security. For example, longer passwords are more secure but less user-friendly than shorter passwords. The problem is to find ways to maximize both the usability and the security of systems. The two are also linked: systems need to be usable in order to be secure. [7]

2.3 Cloud services

The on-demand offering of computer services and resources via the internet is often called cloud services. Cloud services became accessible on a larger scale around the year 2000. Today, almost 20 years later, cloud services are essential for both individual users and companies in the industry. [8]

There are three main service models for cloud services: Software as a Service (SaaS) is the service where consumers are provided with applications running on a cloud infrastructure. Platform as a Service (PaaS) is the service where the consumer may deploy onto the cloud infrastructure, and have control over the deployed applications. Infrastructure as a Service (IaaS) provides processing, storage, networks, and other fundamental computing resources allowing the consumer to deploy and run software. The consumer has control over operating systems, deployed applications and storage. The biggest difference between PaaS and IaaS is the control of the operating systems. [9] This paper focuses on IaaS systems.

Resources within an organization can be structured in different projects in the cloud infrastructure. A project is created for each new application containing cloud resources (virtual servers, databases, networks, storage), settings, permissions, and other metadata of the specific application. [10]


2.4 Access Control

Managing which users should have access to what resources is done by access control policies. These policies specify the resource that can be accessed, by whom, and under what conditions. [1]

Proper access control is an important line of defense for preventing attacks. Managing access control is done by system administrators or data owners. It should follow the principle of least privilege, giving permissions only to those who really need them. [11] Three of the top 10 threats identified by the Open Web Application Security Project (OWASP) involve access control issues. [12]

Access control can be managed using roles. The method is called role-based access control (RBAC). Using RBAC, identities are given roles, and roles specify access control rights. The decision of permission is based on the role of the identity. For example, at a university the students are permitted to read an exam. The teacher is permitted to read and edit the exam. Using RBAC, instead of allowing all individual students the right to read, everyone who is a student gets those permissions by the role. Role-based access control can be applied to all access control frameworks, and it reduces the total number of rules to keep track of. It is more efficient to store access rights of roles than of subjects. [11]
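The university example can be written down as two small lookup tables. This is only an illustrative sketch: the identity names and the helper `is_allowed` are hypothetical and are not part of any particular RBAC framework.

```python
# Roles map to permissions; identities map to roles (all names hypothetical).
ROLE_PERMISSIONS = {
    "student": {"read"},
    "teacher": {"read", "edit"},
}
IDENTITY_ROLES = {
    "alice": "student",
    "bob": "student",
    "carol": "teacher",
}


def is_allowed(identity, action):
    """Decide access from the identity's role, not from the identity itself."""
    role = IDENTITY_ROLES.get(identity)
    return action in ROLE_PERMISSIONS.get(role, set())
```

Adding a new student only requires one entry in `IDENTITY_ROLES`; the access rights themselves are stored once per role.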

2.5 Identity and Access Management

Identity and Access Management (IAM) is a tool for managing access control within cloud based platforms. With IAM, all identities are given access to resources by roles. Identities can be users, service accounts, or groups. A group is a collection of users and service accounts. A resource can for example be a project, database, or service account. [13]

2.5.1 Service Accounts

Service accounts provide a way to set up an identity that is not tied to a person but rather an application or service. They are similar to regular user accounts in the sense that they can be granted permissions to resources, such as being allowed to read and write to a database or to read files in a storage bucket. A special property of service accounts is that they are not only identities, but also resources that other identities can get permission to use. [14]

2.5.2 Roles

Roles decide what permissions the identity has within resources. They can be more general, like ’viewer’ (allowed to view the resource), or more specific, like ’log-writer’ (allowed to write to logs). [13]

2.5.3 Policy and Role-Bindings

The IAM policy is the collection of role-bindings associating identities with roles. For each resource, the policy contains the bindings of identities that have permission to access the resource and with what role. More than one identity can be in the policy with the same role on the same resource. In Figure 2.2, the policy is visualized with three examples of identities and roles. An example of a policy containing role-bindings, formatted in JSON, is found in Figure 2.3.

The example policy is tied to a resource where Emma, all admins, and the service account sa-1 are owners. Ben is a viewer of the resource. [15]

[Figure 2.2: a policy connects an identity (user account, group, or service account) and a role (owner, writer, or viewer) to a resource (project, database, or service account).]

Figure 2.2: A policy of an identity with a role is connected to a resource

Figure 2.3: Example of policy with role-bindings, formatted in JSON
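Figure 2.3 is not reproduced here. As a rough sketch only, the policy described above could look as follows. The field names (`bindings`, `role`, `members`), the member prefixes, and the e-mail addresses are assumptions loosely modelled on the Google Cloud IAM policy layout; they are not taken from the original figure.

```python
import json

# Hypothetical policy for one resource: Emma, the admins group, and sa-1
# are owners; Ben is a viewer.
policy = {
    "bindings": [
        {
            "role": "roles/owner",
            "members": [
                "user:emma@example.com",
                "group:admins@example.com",
                "serviceAccount:sa-1@example.com",
            ],
        },
        {
            "role": "roles/viewer",
            "members": ["user:ben@example.com"],
        },
    ]
}


def members_with_role(policy, role):
    """Return all identities bound to `role` in the policy."""
    return [member
            for binding in policy["bindings"] if binding["role"] == role
            for member in binding["members"]]


# Serialize the policy as JSON text.
print(json.dumps(policy, indent=2))
```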


2.5.4 Inheritance

A common way of structuring cloud resources is in a hierarchy. The organization hosts projects, which in turn host resources. If an identity has a role with permissions tied to a project, this identity has the same permissions on all resources within the project. In the same way, identities having roles with permissions tied to the organization have the same permissions in all projects and resources within the organization. This inheritance is visualized in Figure 2.4. The IAM roles and permissions are inherited from higher up in the hierarchy, where the organization is the root level. [15]

[Figure 2.4: Organization → Project → Resource]

Figure 2.4: Hierarchy of inheritance of roles and permissions
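The inheritance rule can be sketched as a walk from a resource up to the organization, collecting role-bindings along the way. The resource names, the `PARENT` and `DIRECT_BINDINGS` tables, and the function `effective_roles` are hypothetical and serve only as illustration.

```python
# Hypothetical hierarchy: each resource points to its parent (None for the root).
PARENT = {
    "org": None,
    "project-1": "org",
    "database-1": "project-1",
}

# Role-bindings granted directly on a resource: resource -> {identity: {roles}}.
DIRECT_BINDINGS = {
    "org": {"emma": {"viewer"}},
    "project-1": {"sa-1": {"owner"}},
    "database-1": {"ben": {"writer"}},
}


def effective_roles(identity, resource):
    """Collect roles granted on the resource itself and on all its ancestors."""
    roles = set()
    while resource is not None:
        roles |= DIRECT_BINDINGS.get(resource, {}).get(identity, set())
        resource = PARENT[resource]
    return roles
```

In this sketch, sa-1 is an owner of database-1 only because it is an owner of project-1, while ben has no role on project-1 because roles are not inherited upwards.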

2.6 Lateral movement

Lateral movement refers to attackers moving through a network after an initial compromise.

2.6.1 Life cycle of lateral movement attacks

The life cycle of a lateral movement based attack can be abstracted into the steps shown in Figure 2.5. These steps are essential parts of the attack, and are important to be aware of when creating models and investigating incidents. [16]

The first step in the life cycle is the initial compromise, which is followed by command and control. Command and control is established between the compromised resource and the attacker, allowing the attacker to move on to lateral movement to expand the reach of the attack. This step can be repeated, so the attacker gains access to more and more resources. Finally, as the attacker reaches the goal destination (planned or by chance), she executes the actual malicious action, such as data exfiltration or service disruption. [16]

Removing the possibilities of any of these steps will mitigate computer attacks and improve the security posture of a network. This paper focuses on reducing the possibilities for lateral movement.

[Figure 2.5: Initial compromise → Command and control → Lateral movement → Data exfiltration or service disruption]

Figure 2.5: Life cycle of lateral movement-based attack

2.6.2 Lateral movement using service accounts

If an attacker compromises a project in a cloud based platform, she can impersonate a service account from that project to get access to another project. This lateral movement is possible if the service account has a role that permits using resources in the other project. According to the inheritance of permissions described in Section 2.5.4, if the role is associated with a project, all its resources are included.

Service accounts can be both resources and identities in some situations, and this is used by attackers using service accounts for lateral movement. In Figure 2.6, service account sa-1 has permission to access and use project-2. The service account sa-2 is one of the resources in this project, so by inheritance, sa-1 gets permission to use sa-2. As an identity, sa-2 has permission to access and use project-3. In this case, the attacker can get from project-1 to project-3 by lateral movement using service accounts. In Figure 2.6, the policy associated with project-2 has a role-binding with sa-1 as an identity with a role that allows usage of the resources in project-2, where sa-2 is included. The policy associated with project-3 has a role-binding with sa-2 as an identity.

[Figure 2.6: project-1 (hosting sa-1) → project-2 (hosting sa-2) → project-3]

Figure 2.6: Lateral movement using service accounts

The possibility of lateral movement can be removed by removing either the service accounts completely or the role-bindings that enable the movement. If one service account has multiple roles in different projects, removing the service account will remove all these possible lateral movements. However, if more than one service account in a project has a role in another project, removing one of these service accounts would not remove the possibility of lateral movement between the two projects.

2.7 Related work

Goodman et al. (2015) use bipartite graphs to investigate lateral movement. They use computational physics to model lateral movement in a network, in order to detect malicious actors. The users and the computers are divided into two disjoint sets of nodes, and therefore the problem can be modelled as a bipartite graph. The malicious lateral movement is seen as random walks with a higher probability of moving in a direction where targets have greater value. Since the work focuses on classifying a lateral movement as malicious or benign, the method is to find the lateral movements that move from low to higher valued targets. [17]

Purvine et al. use graph metrics to mitigate lateral movement attacks. They model the reachable machines dynamically with a reachability graph. Properties of machines, like storage of user credentials, may change over time, and thereby also the vulnerability state. Reachability scores were calculated and combined with vulnerability scores. Over time, the metrics from data without attacks were compared with metrics from data with at least one attack. The result was used to find mitigation strategies. The dynamic metrics the authors introduced are more specific and specialized for networks and computer vulnerabilities. For the mitigation strategies to work, the users need to know the vulnerability state of every machine. [18] While the concept of reachability is very similar, the work presented in this thesis focuses on IAM based lateral movement.


Chapter 3 Methods

This chapter describes how the cloud infrastructures are modelled and how IAM based lateral movement can be visualized. The simulations of separating two sets of projects and finding the most powerful service accounts are described and analyzed.

3.1 The models

The models were created based on data of service accounts in a cloud infrastructure; the project they belong to and their role in different projects. The models provide visualization of possible IAM based lateral movement in a cloud infrastructure. Models based on synthetic data were created to be presented in this paper.

3.1.1 The infrastructure graph

A network graph was used to model possible lateral movements within a cloud infrastructure. A directed multigraph G = (V, A) was created where the nodes V represent projects and the multiset A represents possible paths for lateral movement using service accounts. There exists a directed edge $e = (p_1, p_2) \in A$ if there exists a service account in project $p_1$ that has permission to access and control resources in project $p_2$. This reflects the definition of lateral movement using service accounts stated in Section 2.6 in the Background chapter of this paper.

The connection between one node and another can consist of many edges, as there can be more than one service account in a project $p_1$ that enables lateral movement to project $p_2$. Therefore, the resulting graph will be a multidigraph. Also, one service account can have several roles in different projects, and therefore each edge represents a role-binding and not the service account itself.

The cloud infrastructure data was transformed to graph data as described in Algorithm 1. The existing data from the cloud infrastructure contained all service accounts, the host project of each service account, and the roles of each service account. The algorithm adds edges for all service accounts that have a role that allows usage of service accounts in the target project.

Building the graph discards the information about which service account roles each edge represents. This information was therefore stored separately in a dictionary, result.

    for project in projects do
        result[project] = {}
        graph.addNode(project)
    end
    for serviceAccount in serviceAccounts do
        hostProject = serviceAccount.project
        policy = serviceAccount.policy
        for role, resource in policy do
            if useServiceAccountPermission in role then
                targetProject = resource
                graph.addEdge(hostProject, targetProject)
                result[hostProject][targetProject].add(serviceAccount)
            end
        end
    end

Algorithm 1: Conversion of cloud infrastructure data to graph.
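Algorithm 1 could be made runnable along the following lines. This is a sketch under stated assumptions: the input layout (a list of dictionaries with `name`, `project`, and `policy` keys), the permission name `useServiceAccount`, and the `ROLE_PERMISSIONS` table are hypothetical stand-ins for the real cloud data, and the graph is kept as a plain edge list instead of a graph library object.

```python
from collections import defaultdict

# Hypothetical permission name and role definitions, for illustration only.
USE_SERVICE_ACCOUNT = "useServiceAccount"
ROLE_PERMISSIONS = {
    "owner": {"view", "edit", USE_SERVICE_ACCOUNT},
    "viewer": {"view"},
}


def build_infrastructure_graph(projects, service_accounts):
    """Return (edges, result) for the lateral-movement multidigraph.

    edges  : list of (host_project, target_project), one entry per role-binding
    result : result[host][target] -> list of service accounts enabling the edge
    """
    result = {project: defaultdict(list) for project in projects}
    edges = []
    for sa in service_accounts:
        host = sa["project"]
        for role, target in sa["policy"]:
            # Only roles that allow using service accounts create an edge.
            if USE_SERVICE_ACCOUNT in ROLE_PERMISSIONS.get(role, set()):
                edges.append((host, target))
                result[host][target].append(sa["name"])
    return edges, result


# The situation in Figure 2.6, plus one viewer binding that creates no edge.
projects = ["project-1", "project-2", "project-3"]
service_accounts = [
    {"name": "sa-1", "project": "project-1", "policy": [("owner", "project-2")]},
    {"name": "sa-2", "project": "project-2",
     "policy": [("owner", "project-3"), ("viewer", "project-1")]},
]
edges, result = build_infrastructure_graph(projects, service_accounts)
```

Using `defaultdict(list)` for the inner dictionaries avoids the missing-key problem that a literal translation of `result[hostProject][targetProject].add(...)` would run into.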

3.1.2 Visualization

When the graph model was visualized, functions such as zooming and moving nodes were also implemented. Furthermore, a function where the user could search for a specific project was implemented: showing the subgraph containing that project and all connecting ones reachable with lateral movement.

Multiple edges between two nodes were displayed as one edge to increase readability of the graphs. No labels were created for the edges for the same reason. Labels were added to nodes to describe the projects they represent.

3.1.3 The synthetic infrastructure

While algorithms were tested and executed in a real cloud infrastructure, synthetic graph models were created to be presented in this thesis. These graph models were created with synthetic data.

Model graphs were created with different density regarding the number of edges compared to nodes. The density of a directed graph is calculated according to equation 3.1, where E is the number of edges and N is the number of nodes [19].

$$d = \frac{E}{N (N - 1)} \qquad (3.1)$$
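Equation 3.1 can be evaluated directly. The helper names below are illustrative and do not come from the thesis.

```python
def directed_density(num_edges, num_nodes):
    """Density of a directed graph as in Equation 3.1."""
    return num_edges / (num_nodes * (num_nodes - 1))


def edges_for_density(density, num_nodes):
    """Number of edges that gives the requested density (Equation 3.1 solved for E)."""
    return density * num_nodes * (num_nodes - 1)
```

For example, `edges_for_density(0.0008, 1000)` gives roughly 800 edges, and `directed_density(800, 1000)` gives a density close to 0.0008.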

The synthetic data was built according to Algorithm 2. The structure of the data was made in the same way as the result dictionary that was created in Algorithm 1 for real cloud infrastructure data.


    Function CreateSyntheticData(numberOfProjects, averagenumServiceAccounts, averagenumRoles):
        for projectNumber in range(0, numberOfProjects) do
            data['project-' + projectNumber] = {}
        end
        saNumber = 1
        for project in data do
            numberOfServiceAccounts = randomInt(0, 2 ∗ averagenumServiceAccounts)  # equal probability distribution
            for i in range(0, numberOfServiceAccounts) do
                numberOfRoles = randomExp(mean=averagenumRoles)
                for j in range(0, numberOfRoles) do
                    targetProject = randomChoice(data.keys())  # equal probability distribution
                    data[project][targetProject].append('sa-' + saNumber)
                end
                saNumber++
            end
        end
        return data
    End

Algorithm 2: Creating the synthetic data.
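A possible Python rendering of Algorithm 2 is sketched below. Several details are assumptions rather than the author's implementation: the exponential sample is rounded to the nearest integer (the pseudocode does not say how it becomes a loop count, and rounding shifts the mean slightly), a `seed` parameter is added for reproducibility, and, as in the pseudocode, a project may be drawn as its own target.

```python
import random
from collections import defaultdict


def create_synthetic_data(number_of_projects, average_num_service_accounts,
                          average_num_roles, seed=None):
    """Sketch of Algorithm 2: data[host][target] -> list of service accounts."""
    rng = random.Random(seed)
    names = [f"project-{i}" for i in range(number_of_projects)]
    data = {name: defaultdict(list) for name in names}
    sa_number = 1
    for project in names:
        # Uniform on {0, ..., 2 * average}, so the mean equals the average.
        number_of_service_accounts = rng.randint(
            0, 2 * average_num_service_accounts)
        for _ in range(number_of_service_accounts):
            # Exponentially distributed role count, rounded to an integer
            # (the rounding rule is an assumption).
            number_of_roles = round(rng.expovariate(1 / average_num_roles))
            for _ in range(number_of_roles):
                target = rng.choice(names)  # uniform over all projects
                data[project][target].append(f"sa-{sa_number}")
            sa_number += 1
    return data
```

Calling `create_synthetic_data(1000, 1, 0.8)` corresponds to the parameter choice described for the sparsest model graph.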

The probability distributions were chosen to resemble real cloud infrastructures. The number of roles for each service account was distributed exponentially to specify that a few powerful service accounts have a lot of permissions while most service accounts have few permissions.

The algorithm creates synthetic data to be modelled with a directed graph. Algorithm 2 creates a synthetic infrastructure with a total number of role-bindings approximately equal to averagenumRoles ∗ averagenumServiceAccounts ∗ numberOfProjects. The algorithm is a function that takes these parameters as input.

The number of projects in the synthetic graph models was set to 1000. One model graph was created with density d = 0.0008, which corresponds to approximately 800 edges in a graph with 1000 nodes according to Equation 3.1.

Two other models were created with densities d = 0.004 and d = 0.05. The densities were chosen low because of the visualization aspect, where dense graphs with many nodes are hard to visualize in a good way. Low densities also represent reality better as every project usually does not need access to all other projects.


For the graph model with density d = 0.0008, a random number of service accounts hosted in each project was set between 0 and 2, meaning the average number was set to 1. Each service account had a random number of roles in other projects, exponentially distributed with an average of 0.8. These numbers were chosen to make the number of roles less than the number of projects, as the number of roles becomes approximately 1 ∗ 0.8 ∗ numberOfProjects = 800.

For the graph with density d = 0.004, the number of service accounts in each project was randomized between 0 and 4. The number of roles for each service account was randomized exponentially with a mean value of 2. The number of roles became approximately 2 ∗ 2 ∗ numberOfProjects = 4000.

The graph model with density d = 0.05 was created with a mean value of 5 service accounts in each project. The number of roles for each service account was randomized exponentially with a mean value of 10. The number of roles became approximately 5 ∗ 10 ∗ numberOfProjects = 50000.

For better visualization of the results in this thesis, synthetic data with ten projects was also created. Three model graphs were created with densities d = 0.08, d = 0.4, and d = 0.8. The same number of service accounts and roles in each project were used as for the graphs with 1000 projects.

3.1.4 The attacker

The attacker's lateral movement was modelled from one entry point node $p_e$, a project that the attacker compromised first, then following all possible paths in the infrastructure graph G to compromise as many projects and resources as possible. Therefore, the possible compromise would be the set of all reachable nodes from $p_e$ in G. It was assumed that all connections had the same probability of being used by the attacker for lateral movement.

The target projects for the attacker were represented with target nodes. In real cloud infrastructures, these could be projects that contain sensitive information, such as personal data.
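The set of projects an attacker can reach from an entry point can be computed with a breadth-first search. The sketch below assumes the graph is given in the nested-dictionary layout of the result dictionary from Algorithm 1; the function name `reachable_projects` and the example data are illustrative.

```python
from collections import deque


def reachable_projects(result, entry):
    """Return all projects reachable from `entry` via lateral movement (BFS)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        project = queue.popleft()
        for target in result.get(project, {}):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# The chain from Figure 2.6.
result = {
    "project-1": {"project-2": ["sa-1"]},
    "project-2": {"project-3": ["sa-2"]},
    "project-3": {},
}
```

With this data, an attacker entering at project-1 can reach all three projects, while an attacker entering at project-3 cannot move anywhere else.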


3.2 The simulations

The simulations were based on the models of the attacker and infrastructure.

The simulations solve the optimization problems that are stated in the research questions: what is the smallest amount of permissions to remove from existing service accounts to completely separate two sets of projects? And what are the most powerful service accounts, and how many possibilities for lateral movement can be removed by removing those accounts?

3.2.1 Separating entry projects from target projects

The optimization problem of separating possible entry projects from target projects of the attacker, by removing as few role-bindings and permissions as possible, was translated to a graph optimization problem. The problem was formulated as finding the max-flow min-cut separating the target nodes from the entry nodes in a flow network. Removing an edge in the graph is equal to removing the role-bindings that make the lateral move possible.

The sets of entry and target projects were chosen manually in the real cloud infrastructure. Entry projects could be projects that are more vulnerable. Examples of vulnerable projects are projects where external users have permission to access resources.

For the synthetic data models presented in this paper, the sets of entry and target projects were chosen randomly. The projects were randomized so that no project could appear twice in any set and no project could appear in both sets.

The max-flow min-cut theorem applies to flow graphs. The multidigraph of the cloud infrastructure was therefore converted into a flow graph. The definition of a flow graph can be found in Section 2.1.1 in the Background chapter of this paper. The nodes in the graphs represent projects and the edges represent possible lateral movement and role-bindings. The capacity of each edge was calculated as the number of service account roles enabling the lateral movement. This is demonstrated in Figure 3.1 for the edges $e = (p_1, p_{g1})$. In the visualization of the models, multiple edges between the same nodes are represented with only one edge. The information of how many role-bindings each edge represented was collected from the result dictionary.

The flow graph was built according to the pseudocode in Algorithm 3. The input data is the result dictionary that is created in Algorithm 1. A visualization of the process is shown in Figure 3.1.

Function CreateFlowGraph(multidigraph, entryNodes, targetNodes):
    graph = new empty directed graph
    for project in multidigraph do
        graph.addNode(project)
        for targetProject in multidigraph[project] do
            # capacity = number of service account roles enabling the edge
            capacity = length(multidigraph[project][targetProject])
            graph.addEdge(project, targetProject, capacity = capacity)
        end
    end
    graph.addNode('source')
    graph.addNode('sink')
    for entryNode in entryNodes do
        graph.addEdge('source', entryNode, capacity = ∞)
    end
    for targetNode in targetNodes do
        graph.addEdge(targetNode, 'sink', capacity = ∞)
    end
    return graph
End

Algorithm 3: Creating the flow graph with a source and a sink
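As an illustration, Algorithm 3 could be written in Python roughly as below. This is a sketch under assumptions: the flow graph is stored as a plain dictionary of dictionaries instead of a graph library object, the result dictionary is assumed to map each project to its reachable projects and the list of enabling service account roles, and the example data is hypothetical.

```python
import math


def create_flow_graph(result, entry_nodes, target_nodes):
    """Return capacity[u][v] for the flow graph with a source and a sink."""
    capacity = {}
    for project, targets in result.items():
        capacity.setdefault(project, {})
        for target_project, role_bindings in targets.items():
            capacity.setdefault(target_project, {})
            # Capacity = number of service account roles enabling the edge.
            capacity[project][target_project] = len(role_bindings)
    # The source feeds every entry node with unlimited capacity.
    capacity["source"] = {}
    for entry in entry_nodes:
        capacity.setdefault(entry, {})
        capacity["source"][entry] = math.inf
    # Every target node drains into the sink with unlimited capacity.
    capacity["sink"] = {}
    for target in target_nodes:
        capacity.setdefault(target, {})["sink"] = math.inf
    return capacity


# Hypothetical result dictionary: three roles enable the move p1 -> p_g1.
result = {
    "p_e1": {"p1": ["sa-1"]},
    "p1": {"p_g1": ["sa-2", "sa-3", "sa-4"], "p_g2": ["sa-5"]},
}
flow_graph = create_flow_graph(result, ["p_e1", "p_e2"], ["p_g1", "p_g2"])
```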


Figure 3.1: Visualization of the conversion from network multidigraph, with entry and target nodes, to the weighted flow graph with source (s) and sink (t).

The max-flow min-cut theorem can be used to separate one node from another in a flow graph. In the flow graph created from the cloud infrastructure graph, a source and a sink were added to enable separation of two sets of nodes. The source was connected to all entry nodes by edges with infinite capacity. The target nodes were connected to the sink by edges with infinite capacity. The max-flow min-cut separating the source and the sink therefore also separates the entry nodes from the target nodes, as edges with infinite capacity will never be part of the min-cut.

The min-cut separating the source and the sink in the flow graph was found using the highest-label preflow-push algorithm, as it is considered one of the fastest algorithms and takes less space than, for example, the Shiloach-Vishkin algorithm [20] [21].

The min-cut returned the sum of the capacity on the edges that need to be removed (the value of the min-cut) and all the roles of the service accounts the edges represent.
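The following sketch shows how the value of the min-cut and its cut-set can be obtained from such a flow graph. Note that it does not implement the highest-label preflow-push algorithm used in the thesis. It uses the simpler Edmonds-Karp augmenting path method, which by the max-flow min-cut theorem yields the same min-cut value. The dictionary layout, the function name min_cut, and the example graph are assumptions, and the entry and target sets are assumed to be disjoint.

```python
from collections import deque
import math


def min_cut(capacity, source="source", sink="sink"):
    """Return (cut value, cut-set) using Edmonds-Karp augmenting paths."""
    # Residual capacities, including zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    def bfs_parents():
        """Breadth-first search over edges with remaining capacity."""
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    value = 0
    parents = bfs_parents()
    while sink in parents:
        # Collect the augmenting path and its bottleneck capacity.
        path, v = [], sink
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        value += bottleneck
        parents = bfs_parents()

    # Nodes still reachable from the source form one side of the cut.
    reachable = set(parents)
    cut_set = [(u, v) for u, nbrs in capacity.items()
               for v, cap in nbrs.items()
               if cap > 0 and u in reachable and v not in reachable]
    return value, cut_set


# Hypothetical flow graph with two entry nodes and two target nodes.
capacity = {
    "source": {"p_e1": math.inf, "p_e2": math.inf},
    "p_e1": {"p1": 1},
    "p_e2": {"p1": 1},
    "p1": {"p_g1": 3, "p_g2": 1},
    "p_g1": {"sink": math.inf},
    "p_g2": {"sink": math.inf},
}
cut_value, cut_set = min_cut(capacity)
```

In this example the cheapest separation removes the two capacity = 1 edges into p1 instead of the edges leaving p1, whose capacities sum to four.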

3.2.2 Finding the most powerful service accounts

The goal of finding the most powerful service accounts was to find service accounts to focus on, either to remove or to control and monitor. Therefore, the most powerful service accounts were the ones that enabled the most lateral movement. Removing those service accounts would reduce the possibilities for lateral movement the most. The number of possible lateral movements that could be removed was found by simulating the removal of each service account.

The graph problem of finding the most powerful service accounts was solved using the flow graph of the cloud infrastructure created in Algorithm 3 in the previous subsection. All edges in the graph with capacity = 1 are removed if the service account that enables the connection is removed. Edges with capacity higher than 1 have more than one service account that enables the connection, and therefore the edge remains even if one of them is removed.

Two scores were calculated for each service account present in the graph. The first score is the number of edges with capacity = 1 that the service account causes. The second score is the number of edges with capacity higher than 1 that the service account is part of. The scoring was made according to Algorithm 4.

for project in result do
    for targetProject, serviceAccounts in result[project] do
        if length(serviceAccounts) == 1 then
            oneCapacityScore[serviceAccounts[0]] += 1
        else
            for serviceAccount in serviceAccounts do
                higherCapacityScore[serviceAccount] += 1
            end
        end
    end
end

Algorithm 4: Scoring of the service accounts
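A Python rendering of Algorithm 4 could look like the sketch below. It assumes that the result dictionary maps each project to its reachable projects and that each entry in an edge's list is a distinct service account. The function name and the example data are hypothetical.

```python
from collections import Counter


def score_service_accounts(result):
    """Return (one_capacity_score, higher_capacity_score) per service account."""
    one_capacity_score = Counter()
    higher_capacity_score = Counter()
    for project, targets in result.items():
        for target_project, service_accounts in targets.items():
            if len(service_accounts) == 1:
                # The single account is the only thing keeping this edge alive.
                one_capacity_score[service_accounts[0]] += 1
            else:
                # Removing one account only lowers the capacity of this edge.
                for service_account in service_accounts:
                    higher_capacity_score[service_account] += 1
    return one_capacity_score, higher_capacity_score


# Hypothetical result dictionary with three edges.
result = {
    "p1": {"p2": ["sa-1"], "p3": ["sa-1", "sa-2"]},
    "p2": {"p3": ["sa-2"]},
}
one_score, higher_score = score_service_accounts(result)
```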

All service accounts were then sorted, first based on the score from the capacity = 1 edges, then based on the score from higher capacity edges. Therefore, if more than one service account had the same capacity = 1 score, these were ranked based on the higher capacity score.

A top list was created for the ten most powerful service accounts in the cloud infrastructure. The top list presents how many edges in the graph model are removed (capacity = 1 edge score) and how many connections are reduced in capacity (higher capacity edge score) by removing each specific service account. Removing an edge between two nodes in the model graph is equivalent to removing the possible lateral movement between the projects. Reducing the capacity will not remove the edge or the possibility for lateral movement, but the reduction in capacity is a step towards a more secure infrastructure.

The total score oneCapacityScore + higherCapacityScore for a service account represents how many projects can be reached at once by compromising that service account.
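The two-level sorting and the top list can be sketched as below. The scores in the example are taken from four rows of the results table for density d = 0.4. The function name top_list and the final tie-break on the account name are assumptions added to make the ordering deterministic.

```python
def top_list(one_capacity_score, higher_capacity_score, size=10):
    """Rank accounts by capacity = 1 score, then by higher capacity score."""
    accounts = set(one_capacity_score) | set(higher_capacity_score)
    ranked = sorted(
        accounts,
        key=lambda sa: (
            -one_capacity_score.get(sa, 0),     # primary: edges removed
            -higher_capacity_score.get(sa, 0),  # secondary: edges weakened
            sa,                                 # assumed tie-break: name
        ),
    )
    rows = []
    for sa in ranked[:size]:
        one = one_capacity_score.get(sa, 0)
        higher = higher_capacity_score.get(sa, 0)
        # Last column is the total score oneCapacityScore + higherCapacityScore.
        rows.append((sa, one, higher, one + higher))
    return rows


one_capacity_score = {"sa-548": 10, "sa-253": 10, "sa-1032": 11, "sa-678": 10}
higher_capacity_score = {"sa-253": 1}
ranking = top_list(one_capacity_score, higher_capacity_score)
```

Here sa-253 is ranked above sa-548 and sa-678 because all three share the same capacity = 1 score and sa-253 has the larger higher capacity score.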


Chapter 4 Results

In this chapter the results are presented. All resulting models and simulations presented in this paper are based on synthetic data generated as described in the section about synthetic infrastructure in the Methods chapter.

4.1 The graph models

The models of the synthetic infrastructures are shown in Figures 4.1 and 4.2.

Both models are examples of what the cloud infrastructure of a large-scale company with 1000 projects could look like. Nodes represent projects within the infrastructure and edges represent possible lateral movement using service accounts. Multiple edges between two nodes are presented as one edge to increase readability. Figure 4.1 shows the graph with density d = 0.0008. Figure 4.2 shows the graph with density d = 0.004. The small dots represent isolated projects.

The graph model with density d = 0.05 is not visualized. The graph did not render in a readable way as it was too dense.

For more detailed and readable graphs, Figures 4.3, 4.4, and 4.5 show graphs based on the synthetic data containing only ten projects. These model graphs have densities d = 0.08, d = 0.4, and d = 0.8. They have different densities than the graphs with 1000 nodes. It was not possible to create the graphs with the same densities, as the smallest possible density with 10 nodes is d = 1/(10 · 9) ≈ 0.01. Instead, the average number of outgoing edges from each node is the same as in the larger graph models.
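The lower bound on the density can be checked with a short calculation. The sketch assumes the usual definition of the density of a directed graph, d = m / (n (n − 1)) for m edges and n nodes, which is consistent with the value 1/(10 · 9) above. The function name is hypothetical.

```python
def density(n_nodes, n_edges):
    """Density of a directed graph: edges divided by possible ordered pairs."""
    return n_edges / (n_nodes * (n_nodes - 1))


# A graph with ten nodes and a single edge has the smallest non-zero density.
smallest_density = density(10, 1)

# The sparsest large model (d = 0.0008) lies below that bound, so it cannot
# be reproduced with only ten nodes.
reproducible = smallest_density <= 0.0008
```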


Figure 4.1: The model graph with 1000 projects and d = 0.0008


The modelling took less than one second to compute.

4.2 The optimization problems

The results of the simulation separating two sets of projects and the simulation of finding the most powerful service accounts are presented in this section. All simulations took less than one second to compute.

4.2.1 Separating entry projects from target projects

In the synthetic graph models of cloud infrastructure, random entry and target projects were chosen. For the models with 1000 projects, 10 random entry projects were separated from 10 random target projects. For the models with 10 projects, one random entry project was separated from one random target project.


The value of the min-cut is the minimal total capacity of the edges that need to be removed in order to completely separate the two sets of nodes. The edges represent role-bindings of service accounts and roles that enable the lateral movement. The min-cut therefore represents the minimal number of role-bindings that need to be removed in order to remove all possibilities for lateral movement between the entry projects and the target projects.

Density    Mean min-cut
0.0008     0.28
0.004      17
0.05       383

Table 4.1: Min-cut separating 10 random entry projects from 10 random target projects in the synthetic graph models with 1000 projects, averaged over 100 runs.

The average min-cut separating 10 random entry projects from 10 random target projects in the synthetic model graphs with 1000 projects is found in Table 4.1. The mean min-cuts were averaged over 100 runs. The average min-cut separating one random entry project from one random target project in the synthetic model graphs with 10 projects is found in Table 4.2.

Density    Mean min-cut
0.08       0.3
0.4        1.3
0.8        8

Table 4.2: Min-cut separating one random entry project from one random target project in the synthetic graph models with 10 projects, averaged over 100 runs.

The min-cuts separating entry project project-2 from target project project-10 in the model graphs with 10 projects were visualized. In the model graphs, the cut-set of the min-cut was colored red. The results are found in Figures 4.6, 4.7, and 4.8. In Figure 4.8, the cut-set consists of seven edges but min-cut = 8. The edge between project-4 and project-10 has capacity = 2, meaning two role bindings need to be removed in the cloud infrastructure to remove the edge in the model graph.


Figure 4.6: Min-cut separating project-2 and project-10 in the graph model with 10 projects and d = 0.08. The projects are not connected, min-cut = 0.

Figure 4.7: Min-cut separating project-2 and project-10 in the graph model with 10 projects and d = 0.4. The red edge is part of the cut-set. Min-cut = 1.


Figure 4.8: Min-cut separating project-2 and project-10 in the graph model with 10 projects and d = 0.8. Red edges are part of the cut-set. Min-cut = 8.

4.2.2 The most powerful service accounts

The most powerful service account is the service account that, if removed, would decrease the possibility for lateral movement the most. In the graph model, this represents the largest decrease in density. The top ten most powerful service accounts in the graph models with different densities are presented in Tables 4.3, 4.4, and 4.5.

In the tables, the capacity = 1 edges are the ones that can be removed by removing the service account. The higher capacity edges are the ones whose capacity can be decreased by removing the service account, meaning that the edge will still be present and the lateral movement is still possible. The decrease in capacity means that there are fewer service accounts that enable the lateral movement. The sorting is based on the capacity = 1 edges first. The total score oneCapacityScore + higherCapacityScore is the total number of projects that can be reached at once, if the service account is compromised.


Density d = 0.08

Service account    Number of capacity = 1 connections    Number of higher capacity connections
sa-334             7                                     0
sa-416             6                                     0
sa-138             5                                     0
sa-229             5                                     0
sa-80              4                                     0
sa-103             4                                     0
sa-174             4                                     0
sa-176             4                                     0
sa-254             4                                     0
sa-366             4                                     0

Table 4.3: Most powerful service accounts in the graph model with density d = 0.08. The number of capacity = 1 connections is the number of edges that can be removed.

Density d = 0.4

Service account    Number of capacity = 1 connections    Number of multiple capacity connections
sa-870             15                                    0
sa-1115            12                                    0
sa-247             11                                    0
sa-421             11                                    0
sa-589             11                                    0
sa-1001            11                                    0
sa-1032            11                                    0
sa-253             10                                    1
sa-548             10                                    0
sa-678             10                                    0


Service account    Number of capacity = 1 connections    Number of multiple capacity connections
sa-349             78                                    7
sa-2677            68                                    3
sa-3068            66                                    2
sa-3551            63                                    0
sa-574             60                                    1
sa-4310            59                                    6
sa-1797            57                                    5
sa-4059            56                                    4
sa-934             56                                    0
sa-1904            55                                    8


Chapter 5 Discussion

The visualization based on the models and the results from the simulations turned out to be very useful for security engineers. As a first step, the visualization of the lateral movement in the cloud infrastructure helped engineers understand how different projects can be reached by an attacker. A dense model graph shows a lot of connections and possibilities for lateral movement, which implies a high security risk as it allows an attacker to move through the infrastructure. A sparse graph is a sign of a secure infrastructure, as fewer connections mean fewer possibilities for an attacker to move in the infrastructure.

Without additional controls, an infrastructure with a lot of powerful service accounts poses a higher risk and might be a sign of an insecure infrastructure. Lateral movement is, however, not the only thing deciding the security posture of an infrastructure.

It might be easier for an attacker to stay unnoticed in a dense graph model, as it takes more effort to control such an environment. Even if the security team detects the intrusion, it is hard to follow the movement of the attacker if there are many paths for her to take. Besides removing powerful permissions, this can be mitigated by having more controls on the service accounts such as monitoring and alerting.

A dense graph model might be a sign of an environment that is easy for developers to work in, as a lot of accounts have access to a lot of resources. It does not have to be the case though, as people might add too many permissions out of convenience, and old roles that are no longer needed might still be around.

When a developer wants to add specific permissions to a resource, it is done by adding a role to the identity. It is more convenient to grant all permissions in a powerful role than to customize permissions or evaluate less powerful roles. By choosing one of the most powerful roles, developers do not have to figure out which permissions constitute the smallest feasible set. The increased usability is not enough motivation for the increase in security risk it implies. Roles should be kept as restricted as possible, only allowing the identity to have the permissions that are truly needed, even if it takes more time and effort.

During incident response, the visualization of the cloud infrastructure can be very useful. If a project or service account is compromised, this information can be used as input to the model to find other possibly compromised projects and service accounts through lateral movement. The incident response team can then narrow down the search space for further compromised projects. As a result, they can save a lot of time and quick actions can be taken to contain an incident at an early stage. In this use case, there can be big differences in time needed to contain such an incident depending on how dense the model graph is.
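The incident response use case can be illustrated with a breadth-first search over the model graph, starting from the compromised projects. This is a sketch, not the tooling used in the thesis. It assumes that the result dictionary maps each project to the projects that can be reached from it in one lateral move, and the function name and example data are hypothetical.

```python
from collections import deque


def possibly_compromised(result, compromised_projects):
    """Return all projects reachable through lateral movement."""
    seen = set(compromised_projects)
    queue = deque(compromised_projects)
    while queue:
        project = queue.popleft()
        # Follow every edge that represents a possible lateral move.
        for target_project in result.get(project, {}):
            if target_project not in seen:
                seen.add(target_project)
                queue.append(target_project)
    # Report only the additional projects, not the known starting points.
    return seen - set(compromised_projects)


# Hypothetical result dictionary.
result = {
    "project-1": {"project-2": ["sa-1"]},
    "project-2": {"project-3": ["sa-2"], "project-1": ["sa-3"]},
    "project-4": {"project-1": ["sa-4"]},
}
search_space = possibly_compromised(result, ["project-1"])
```

In the example, project-4 is left out of the search space because it can reach the compromised project but cannot be reached from it.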

The synthetic model graphs with ten nodes were made with different densities than the larger graphs with 1000 nodes. They cannot be compared to each other. The smaller and more readable example graphs were created to illustrate how the algorithms work.

The two optimization problems were chosen because of the aspects of IAM based lateral movement they can address. The min-cut can be used to isolate the most sensitive projects. The top list can be used to identify the most sensitive service accounts, which then can be removed or controlled with for example monitoring and alerting.

Running the min-cut to separate more exposed projects, such as internet-facing production services, from more sensitive projects, such as projects containing personal data, can guide engineers in their job to secure the infrastructure. The min-cut algorithm identifies which service accounts should be constrained in permissions, in order to decrease the security risk in an optimal way. This simulation can be useful to prevent severe attacks and to protect the most sensitive projects.

The top list of the most powerful service accounts provided insights on which service accounts enable the most lateral movement. The simulation returns suggestions on which service accounts should be removed and also how that would affect the model graph. It returns the number of edges that will be removed for each service account on the top list. If it is not possible to remove a service account on the top list, the protection of it should be increased in other ways.

The results of the calculations made on the models all illustrate that having more permissions within a cloud infrastructure leads to more dense graphs and more possibilities for an attacker to move within the infrastructure. An attacker who compromises a service account with a lot of permissions can reach a lot of resources within that cloud infrastructure.

The modelling and simulations took less than one second to compute, even though the data sets were quite large. The methods can be applied to larger-scale companies and still be effective.

The min-cut and the top list can both be measurements of the security posture of a cloud infrastructure. They only look at one part of the problem each, and neither can be used on its own to describe the security posture of a cloud infrastructure.

They are not linearly dependent: decreasing the scores in the top list will not always decrease the value of the min-cut isolating the sensitive projects. By first removing the most powerful service account in the top list, the min-cut can be reduced completely, a little, or not at all. It is therefore not possible to claim that removing the most powerful service account will decrease the min-cut, even though both are measurements of security risk. Figures 5.1 and 5.2 show two examples of cloud infrastructure where sa-2 is the most powerful service account and the min-cut is the separation of p_e and p_g. Figure 5.1 shows an example where the min-cut is not changed when the most powerful service account is removed. Figure 5.2 shows an example where the min-cut is completely reduced to zero if the most powerful service account is removed.

Figure 5.1: Removing the most powerful service account: sa-2. In this case, the min-cut separating p_g from p_e is still one.

Figure 5.2: Removing the most powerful service account: sa-2. In this case, the min-cut separating p_g from p_e is reduced to zero.
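The two situations can be reproduced on small hypothetical graphs, as sketched below. The graphs are not the ones drawn in Figures 5.1 and 5.2, since their exact edges are assumptions here. The min-cut value is computed with the Edmonds-Karp method rather than the highest-label preflow-push algorithm used in the thesis, and all function names are hypothetical.

```python
from collections import deque


def min_cut_value(result, entry, target):
    """Value of the min-cut between two projects (Edmonds-Karp)."""
    residual = {}
    for u, targets in result.items():
        for v, accounts in targets.items():
            residual.setdefault(u, {})[v] = len(accounts)
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(entry, {})
    value = 0
    while True:
        # Breadth-first search for a path with remaining capacity.
        parents = {entry: None}
        queue = deque([entry])
        while queue and target not in parents:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        if target not in parents:
            return value
        path, v = [], target
        while parents[v] is not None:
            path.append((parents[v], v))
            v = parents[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        value += bottleneck


def remove_service_account(result, account):
    """Simulate the removal of one service account from every edge."""
    return {u: {v: [sa for sa in accounts if sa != account]
                for v, accounts in targets.items()}
            for u, targets in result.items()}


# sa-2 enables the most edges, but none of them lies between p_e and p_g.
unchanged = {"p_e": {"p1": ["sa-1"]}, "p1": {"p_g": ["sa-3"]},
             "p2": {"p3": ["sa-2"]}, "p3": {"p2": ["sa-2"]}}
# sa-2 enables the only path from p_e to p_g.
reduced = {"p_e": {"p1": ["sa-2"], "p2": ["sa-1"]}, "p1": {"p_g": ["sa-2"]}}

before_a = min_cut_value(unchanged, "p_e", "p_g")
after_a = min_cut_value(remove_service_account(unchanged, "sa-2"), "p_e", "p_g")
before_b = min_cut_value(reduced, "p_e", "p_g")
after_b = min_cut_value(remove_service_account(reduced, "sa-2"), "p_e", "p_g")
```

In both graphs sa-2 has the highest capacity = 1 score, yet only in the second one does its removal change the min-cut.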


Conclusions

IAM based lateral movement in cloud infrastructures is possible if service accounts have roles that enable usage of other service accounts. It is an amplified threat in cloud infrastructures. The models and simulations in this thesis illustrate that cloud infrastructures with a lot of powerful roles have a higher security risk. The graph models show how lateral movement can be done to a larger extent in denser graphs.

The min-cut simulations separating sensitive projects from other projects illustrate the increased risk of an attacker reaching sensitive projects in an infrastructure with a dense model graph. The min-cut differs a lot between the graphs with different densities, which shows how much more difficult it is to isolate sensitive projects in a dense graph model than in a sparse graph model.

The top list of the most powerful service accounts illustrates that dense graph models have more powerful service accounts. Powerful service accounts can pose a higher security risk, as they allow an attacker to access a lot of resources at once if they are compromised. Identifying these high risk accounts provides useful input for security engineers in order to increase the security posture in a cloud infrastructure. The information can be used to remove accounts or add other controls such as alerting and monitoring.

For already existing cloud infrastructures, it can be difficult to change permissions of service accounts without disrupting production services. The results from the simulations in this thesis can be used to increase security posture more effectively. For new adopters of cloud infrastructures, the models in this thesis underline the importance of being restrictive when granting new roles and permissions.


Bibliography

[1] John Backes et al. “Semantic-based Automated Reasoning for AWS Access Policies using SMT”. In: Oct. 2018, pp. 1–9. doi: 10.23919/FMCAD.2018.8602994.

[2] TrendMicro. Lateral movement: How do threat actors move deeper into your network? http://about-threats.trendmicro.com/cloud-content/us/ent-primers/pdf/tlp_lateral_movement.pdf. [Online; accessed 17-May-2019]. 2013.

[3] Harvey Gould, Jan Tobochnik, and Wolfgang Christian. An Introduction to Computer Simulation Methods Third Edition (revised). 2007.

[4] M. E. Kuhl et al. “Cyber attack modeling and simulation for network security analysis”. In: 2007 Winter Simulation Conference. Dec. 2007, pp. 1180–1188. doi: 10.1109/WSC.2007.4419720.

[5] Lester Randolph Ford Jr. and D. R. Fulkerson. Flows in Networks. 1962.

[6] Fuguo Li. “Study on security and prevention strategies of computer network”. In: 2012 International Conference on Computer Science and Information Processing (CSIP). Aug. 2012, pp. 645–647. doi: 10.1109/CSIP.2012.6308936.

[7] S. Garfinkel and H. R. Lipford. Usable Security: History, Themes, and Challenges. Morgan Claypool, 2014. isbn: 9781627055307. url: https://ieeexplore-ieee-org.focus.lib.kth.se/document/6920435.

[8] Nationalencyklopedin. Molnet. http://www.ne.se.focus.lib.kth.se/uppslagsverk/encyklopedi/lång/molnet. [Online; accessed 13-March-2019]. 2018.

[9] Peter Mell and Timothy Grance. “The NIST Definition of Cloud Computing”. In: Special Publication (NIST SP) - 800-145. Sept. 2011.


[10] Google Cloud. Google Cloud Platform Overview. https://cloud.google.com/docs/overview/. [Online; accessed 31-January-2019]. 2018.

[11] Roberto Tamassia. Introduction to Computer Security: Pearson New International Edition. Pearson Education UK, 2013.

[12] Open Web Application Security Project. OWASP Top Ten 2017 Project. https://www.owasp.org/index.php/Category:OWASP_Top_Ten_2017_Project. [Online; accessed 21-May-2019]. 2017.

[13] Google Cloud. Cloud Identity and Access Management - Overview. https://cloud.google.com/iam/docs/overview. [Online; accessed 31-January-2019]. 2018.

[14] Google Cloud. Understanding Service Accounts. https://cloud.google.com/iam/docs/understanding-service-accounts. [Online; accessed 06-February-2019]. 2018.

[15] Google Cloud. Granting, changing, and revoking access to resources. https://cloud.google.com/iam/docs/granting-changing-revoking-access. [Online; accessed 16-April-2019]. 2019.

[16] A. Bohara et al. “An Unsupervised Multi-Detector Approach for Identifying Malicious Lateral Movement”. In: 2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS). Sept. 2017, pp. 224–233. doi: 10.1109/SRDS.2017.31.

[17] E. Goodman et al. “Using Bipartite Anomaly Features for Cyber Security Applications”. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA). Dec. 2015, pp. 301–306. doi: 10.1109/ICMLA.2015.69.

[18] Emilie Purvine, John R. Johnson, and Chaomei Lo. “A Graph-Based Impact Metric for Mitigating Lateral Movement Cyber Attacks”. In: Proceedings of the 2016 ACM Workshop on Automated Decision Making for Active Cyber Defense. SafeConfig ’16. Vienna, Austria: ACM, 2016, pp. 45–52. isbn: 978-1-4503-4566-8. doi: 10.1145/2994475.2994476. url: http://doi.acm.org.focus.lib.kth.se/10.1145/2994475.2994476.


[19] T. Coleman and J. Moré. “Estimation of Sparse Jacobian Matrices and Graph Coloring Problems”. In: SIAM Journal on Numerical Analysis 20.1 (1983), pp. 187–209. doi: 10.1137/0720013. eprint: https://doi.org/10.1137/0720013. url: https://doi.org/10.1137/0720013.

[20] A V Goldberg and R E Tarjan. “A New Approach to the Maximum Flow Problem”. In: Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing. STOC ’86. Berkeley, California, USA: ACM, 1986, pp. 136–146. isbn: 0-89791-193-8. doi: 10.1145/12130.12144. url: http://doi.acm.org.focus.lib.kth.se/10.1145/12130.12144.

[21] Ravindra K. Ahuja et al. “Computational investigations of maximum flow algorithms”. eng. In: European Journal of Operational Research 97.3 (1997), pp. 509–542. issn: 0377-2217.

