
DEGREE PROJECT FOR MASTER OF SCIENCE IN ENGINEERING COMPUTER SECURITY

Automated file extraction in a cloud environment for forensic analysis

Kevin Gustafsson | Emil Sundstedt

Blekinge Institute of Technology, Karlskrona, Sweden, 2017

Supervisor: Kurt Tutschku, Department of Communication Systems, BTH


Abstract

The possibility to use the snapshot functionality of OpenStack as a method of securing evidence has been examined in this paper. In addition, the possibility of extracting evidence automatically using an existing automation tool has been investigated.

The usability of snapshots in a forensic investigation was examined by conducting a series of tests on both snapshots and physical disk images. The results of the tests were then compared to evaluate the usefulness of the snapshot. Automatic extraction of evidence was investigated by implementing a solution using Ansible and evaluating the algorithm based on the existing standard ISO 27037.

It was concluded that the snapshots created by OpenStack behave similarly enough to physical disks to be useful in a forensic investigation. The proposed algorithm for automatic evidence extraction does not appear to breach the standard.

Keywords: Forensic, Qcow, OpenStack, Snapshot


Sammanfattning (Swedish abstract)

The possibility of using OpenStack's snapshot functionality as a method of securing evidence has been examined in this paper. In addition, the possibility of extracting evidence automatically using an existing automation tool has been investigated.

The usability of snapshots in a legal investigation was examined by conducting a series of tests on both snapshots and physical disk images. The results of the tests were then compared to evaluate the usefulness of the snapshot. Automatic extraction of evidence was investigated by implementing a solution using Ansible and evaluating the algorithm against the existing standard ISO 27037.

It was concluded that the snapshots created by OpenStack behave sufficiently like a physical disk for the images to be usable in a legal investigation. The proposed algorithm for extracting evidence automatically does not appear to breach the standard.

Keywords: Forensics, Qcow, OpenStack, Snapshot


Preface

This thesis is the final part of a five-year education at Blekinge Institute of Technology, leading to a degree of Master of Science in Engineering: Computer Security. We would like to thank City Network AB for the time they spent helping us and for the workspace in their office.

A special thanks to our advisor Anders Carlsson, who has helped us with contact information and provided us with ideas on how to solve the task.

Thanks to Vida Ahmadi for helping us get in contact with City Network.

Thanks to Kurt Tutschku, who has been our supervisor throughout the thesis.

Thanks to Jonas Virdegård and Jim Keyzer, who have helped us as external resources.

"I am the wisest man alive, for I know one thing, and that is that I know nothing."

- Socrates


Nomenclature

Acronyms

API Application Programming Interface
CSP Cloud Service Provider
DEFR Digital Evidence First Responder
GDPR General Data Protection Regulation
IaaS Infrastructure-as-a-Service
IDS Intrusion Detection System
IoT Internet of Things
IP Internet Protocol
ISO International Organization for Standardization
kB Kilobyte
LVM Logical Volume Management
NTP Network Time Protocol
PaaS Platform-as-a-Service
PID Process identifier
Qcow QEMU copy on write
RAM Random Access Memory
RB Rättegångsbalken
SaaS Software-as-a-Service
SSH Secure Shell
UUID Universally Unique Identifier
VM Virtual Machine
YAML YAML Ain't Markup Language


Table of Contents

Abstract
Sammanfattning (Swedish)
Preface
Nomenclature
Acronyms
1 Introduction
1.1 Introduction
1.2 Background
1.3 Objectives
1.4 Delimitations
1.5 Thesis question
2 Theoretical Framework
2.1 What is cloud computing
2.2 Forensic science
2.3 Automated tools
2.4 Technical standards
2.5 Laws and preliminary investigation
2.6 Similar work
3 Method
3.1 Tests of Qcow disk image
3.2 Algorithm for automated extraction
3.3 Analysis of Qcow using forensic tools
3.4 Prove non-repudiation of snapshots
4 Results
4.1 Test results
4.2 Proposed algorithm for automated extraction
4.3 Findings in Qcow snapshot
4.4 Proving non-repudiation of snapshots
5 Discussion
5.1 Proposed method
5.2 ISO 27037
5.3 Why we chose Ansible
5.4 OpenSSH or Paramiko
5.5 Test results
5.6 Identifying a virtual machine
5.7 Using backing and overlay files
5.8 CLI history file
5.9 Virtual Introspection
5.10 Ethics
5.11 Sustainable Development
6 Conclusions
6.1 Is it possible to use a snapshot as evidence?
6.2 Prove non-repudiation of a snapshot
7 Recommendations and Future Work
7.1 Implementation in OpenStack
7.2 Implementation on a hypervisor level
7.3 Check additional hypervisors
7.4 Additional file systems
7.5 Automatic learning and extraction
7.6 Test in court
7.7 Checkpoint snapshot
7.8 Snapshotting in containers
References


1 INTRODUCTION

1.1 Introduction

Crime has been an inconvenient truth since the beginning of humanity and has unfortunately become a fact of today's society. As the capabilities of technology have evolved throughout the past decade, so have the methods of committing crimes. Today, almost all types of crime involve electronic equipment in some way, if not to commit the act itself then to communicate or to ease the task. This is unlikely to change as computer systems play an ever more important role in our everyday lives: cell phones are more likely than not to be present, and additional equipment is being introduced daily, Internet of Things (IoT) devices for example.

As illegal actions, such as selling and buying drugs, obtaining and distributing child pornography, money laundering, piracy, etc., moved onto the Internet, so did the investigators of crimes. The standard procedure when conducting a forensic investigation includes the collection of electronics that may contain evidence for the case. This means that the physical hardware found is collected and brought to a forensically sound lab for analysis.

Cloud technology allows users to move their activity away from in-house hardware and into the cloud. This means that any illegal activity, such as distributing illegal material, which was previously conducted via in-house solutions or otherwise outsourced to rented servers, can now be moved into the cloud. Due to the nature of the cloud, data can be moved seamlessly between servers, data centres and even countries. This seamless structure adds an extra layer of obstacles to a forensic investigation, as data can be spread across multiple servers and countries, rendering a physical collection impossible. In addition, there are laws which prevent authorities from shutting down an entire cloud just to investigate a single user's actions.

1.2 Background

City Network Hosting AB is a Swedish company currently located in Karlskrona but with infrastructure around the world. Their initial business was to offer a web hosting service to individual users as well as companies. Their focus has lately shifted from web hosting to cloud computing. Because they offer infrastructure to their customers, they might be hosting systems involved in potential crimes. A solution which could be used to secure digital evidence located in a cloud would enhance the perception of the business as well as its credibility.

In addition, in May 2018 the new General Data Protection Regulation (GDPR) will take effect. This EU regulation will replace the current regulations in Sweden. The regulation shifts the power over information related to individuals back to the individual. If the regulation is not followed as expected, companies can face severe penalties of up to 4% of their turnover. Due to this, there is an interest for City Network Hosting AB in being able to secure evidence from systems running in their cloud, both to be able to dismiss any accusations and to aid a pending investigation.

1.3 Objectives

Our objectives with this project are to investigate the available snapshot functionality in the cloud environment OpenStack, which could potentially be used to secure evidence. The outcome of the snapshot function will be evaluated against a traditional disk copy, and an algorithm for securing a snapshot and extracting evidence found within will be proposed. A simple proof of concept solution will be implemented as far as possible to evaluate how practical our proposed solution is. We will most likely come up with alternatives to our solutions that we will not be able to implement due to our time constraints.

We will focus on preserving non-repudiation when securing and extracting evidence. Non-repudiation within the field of digital security is composed of two parts:

• Integrity of data

The integrity of potential evidence should be secured. This means that it should be possible to prove that any data extracted has not been altered.

• Authenticity

It should be possible to prove that extracted data has been extracted from the alleged system.

1.4 Delimitations

Thanks to City Network Hosting AB's generosity in agreeing to work with us, we have chosen to focus our work on the same type of setup as they are currently running. This means that we are only looking into a solution that would primarily work with OpenStack running KVM/QEMU as the virtualization layer. Any other type of setup, cloud orchestrator or virtualization technique is outside the scope of this work.

1.4.1 Forensic process

A standard complete digital forensic process consists of the following steps:

1. Preparation

This is the phase during which the examiner plans the case: expected search warrants needed, special equipment, required knowledge, expected systems, etc.

2. Survey

Surveying a crime scene is when potential sources of evidence are identified and noted. This includes both hardware and digital evidence.

3. Documentation

When potential sources of evidence have been identified, they have to be properly documented.

Documentation is a crucial part of all stages of an investigation, and any handling of evidence and actions taken should be carefully documented.

4. Preservation

Sources of potential evidence should be preserved in such a way that the digital evidence can be authenticated at a later time. Methods of preservation differ between cases and sources; changes made to potential evidence should always be kept to a minimum.

5. Examination and Analysis

Collected potential evidence is then moved to a special facility designed for the purpose of digital evidence analysis. Everything collected is inspected, and evidence found is preserved and documented.

6. Reconstruction

When the examiner believes all evidence collected has been identified and preserved, the events are reconstructed, trying to acquire a complete picture of what happened.


7. Reporting

The examination is finalised by writing a report of all the findings. This is one of the most important stages, as it is the report that is usually presented in court.

While all of the stages in a digital forensic process are important, we are only focusing on potential methods of preserving digital evidence found in a virtual machine running on OpenStack. We assume appropriate preparations have been made and potential sources have been identified.

1.4.2 Acquisition constraints

We do not consider special cases where there might exist legal constraints which prevent an analyst from collecting some evidence. Such constraints would prevent the use of snapshotting, and thus the use of any algorithm proposed in this paper. In cases where the disk is allowed to be copied, we assume the analyst has obtained the proper authorization to read the data.

1.4.3 Encrypted systems

It is possible to encrypt virtual machines running within a cloud, much like it is possible to encrypt the physical hard drive of a running system. We do not consider this possibility, as an encrypted virtual machine would render our solution useless. An encrypted virtual machine will, in this case, most likely affect an investigation in the same way as an encrypted physical system would, and thus falls outside our scope.

1.4.3.1 Attached volumes

In a cloud it is possible to attach volumes to the virtual machines. Our solution will not take that into account. Volumes in OpenStack use Logical Volume Management (LVM) to create a logical disk, and we are only looking at the snapshot of a virtual machine, which generates a QEMU copy on write (Qcow) disk image.

1.5 Thesis question

Is it possible to use the snapshot functionality in OpenStack as an alternative to a forensic disk clone?

If the OpenStack snapshot functionality is an alternative to a forensic disk clone, is it possible to automatically extract data from the snapshot in a forensically sound manner?

Is it possible to prove the non-repudiation of the snapshot?


2 THEORETICAL FRAMEWORK

2.1 What is cloud computing

The modern concept of Cloud Computing was introduced by Amazon back in 2006 when they launched Elastic Compute Cloud, a service where excess resources were offered to the public for a modest cost. The concept has since evolved into a hot topic in the digital world. The National Institute of Standards and Technology defines Cloud Computing as [3]:

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.

This type of service allows its users to pay only for the resources required at the moment, a so-called pay-per-use model. The solution is commonly referred to as elastic, as it also provides the capability to add and remove resources seamlessly as demand grows and shrinks. As customers only purchase what they need from a Cloud Service Provider (CSP), there is no need for the customer to purchase and maintain their own infrastructure. Instead, this is now up to the CSP [4].

A report written at UC Berkeley Reliable Adaptive Distributed Systems Laboratory explains [5] that Cloud Computing introduces three new aspects from the hardware’s point of view:

1. Cloud Computing creates the illusion of infinite hardware being available on-demand.

2. Resources can be added as demand increases. This means that anyone can start out small, and if there is ever a need for additional resources, they can be added seamlessly.

3. Cloud consumers do not have to make a long-term commitment. This means that it is possible to pay for resources by the hour.

As Dale says, the three aspects above only describe the physical nature of Cloud Computing. This model also blurs the lines of ownership of data. The key change for cloud consumers is that data that was previously stored in-house is now being moved to outsourced locations [6].

Different models of Cloud Computing solutions can be offered to customers. Three models are commonly discussed alongside Cloud Computing. Software-as-a-Service (SaaS) is a model where complete applications running on the cloud infrastructure are offered to the customers; examples of SaaS applications are the ones hosted by Google (Google Drive, Google Mail, Google Calendar, etc.) or the music streaming service Spotify. Platform-as-a-Service (PaaS) is the second model, which allows cloud consumers to host their own applications on the CSP's infrastructure; solutions such as AWS Elastic Beanstalk, Microsoft Azure, and Google App Engine run on PaaS. The last model is Infrastructure-as-a-Service (IaaS), where the CSP's infrastructure itself is offered to the customers; the infrastructure can be used to launch and host Virtual Machines (VMs). See figure 2.1 for a visual presentation.

Figure 2.1: Management distribution between cloud models.

2.1.1 OpenStack overview

OpenStack is an open source project initiated by NASA and Rackspace back in 2010 [7]. A non-profit corporation was established in 2012 whose purpose is to promote OpenStack. Over 500 companies around the globe come together to develop and maintain the project. At its core, OpenStack is a cloud orchestration framework for setting up an IaaS cloud, which could potentially be extended to PaaS and SaaS.

The architecture of OpenStack has been designed to support deployments in a modular fashion. This was achieved by developing components as independent services. The general purpose of a service is to abstract and manage a set of resources. The modular approach allows for multiple different deployments of OpenStack, and if a particular service is not part of the desired deployment it can safely be left out. All services available in OpenStack belong to one of three groups:

• Control is the group which contains all services that expose an Application Programming Interface (API), such as interfaces, databases, and the message bus.

• Network contains services that run and manage the network of the deployment.

• Compute groups all services that manage the VMs.

In a smaller deployment, all the desired services can be deployed on the same node. It is generally recommended to distribute the services onto different nodes so as to support scaling. OpenStack was designed to scale horizontally (by adding additional resources), thus it is recommended to at least deploy the compute service on a separate node from the control and network services, so that additional compute nodes can be added [10].

OpenStack was developed with the intention to abstract the underlying infrastructure and its communication. This is realised by the use of vendor-provided drivers, which prevents technology lock-ins [8]. OpenStack is also capable of running on top of multiple different hypervisors (described below) such as KVM, QEMU, VMWare, Xen, and Hyper-V.

Because OpenStack uses a modular architecture, services can be added and removed as needed. Currently three services are required to run a core version of OpenStack [9]:

• Nova is the controller node of OpenStack. It is responsible for the life cycles of instances (VMs) in the cloud (spawning, scheduling, terminating, etc.).

• Glance service stores and manages images used to launch instances in the cloud.

• Keystone service is responsible for authentication and authorization of actions made against available services in OpenStack.

The services mentioned above form the bare core; additional services are deployed to add functionality. Examples of such services are Neutron (networking), Swift (object storage), Cinder (block storage) and Horizon (dashboard).

Figure 2.2: OpenStack architecture.

2.1.2 Virtualization techniques

A hypervisor, also called a Virtual Machine Monitor, is hardware, software, or firmware [27] that can create and run VMs. A computer system that runs a hypervisor which runs one or more VMs is called a host machine, and each VM is called a guest machine. In this paper we will use the term VM instead of guest machine. The hypervisor presents the VMs with a virtual operating platform, a virtual hardware, and manages the execution of the VMs. There is no limit on which operating systems can share the same hardware, which means Linux, Windows and OS X can all run on the same physical hardware.

There are two types of hypervisors, type 1 and type 2:


• Type 1 - A type 1 hypervisor runs directly on the hardware and does not rely on a host operating system. The first hypervisors IBM created were of this type. Modern examples are Xen, Oracle VM Server, Hyper-V, and VMware ESX/ESXi.

• Type 2 - These run on top of the host operating system, just as a normal program does. All virtual machines launched on a type 2 hypervisor run as separate processes on the host system. Modern examples are VirtualBox, VMware Player, VMware Workstation, and QEMU.

In this paper we will look at QEMU and KVM [17]. KVM is a hypervisor, or to be precise an accelerator, that runs VMs with the help of QEMU. KVM does not perform any emulation by itself, and thus needs QEMU to emulate the hardware. Since KVM is built into the Linux kernel, it runs code directly on the physical hardware; the kernel switches the VM's process into a guest mode. This means the VM runs as a thread directly on the hardware, while its virtual hardware is provided by QEMU within the host. To KVM, all VMs are just threads running on the hardware; all memory checks, scheduling, etc. are done by QEMU. The advantage of this is that we can use the QEMU commands on the host to alter the VM, which is more flexible if we later need to switch hosts, while still keeping the speed of running the VM directly on the physical hardware.

To manage all these layers, OpenStack uses the libvirt library. Libvirt is an API that can be used both locally and remotely to manage the lifecycle of VMs running on KVM. Libvirt supports more hypervisors than KVM, but KVM is the only one we will look at.

A command is sent from virsh, the command line tool for libvirt, to the libvirt library. Libvirt then sends the command to QEMU, which sends it to KVM. At the KVM layer, it is run on the hardware. If the hypervisor instead were Xen, we would use the Xen tools to send the command directly down to the hypervisor. It is also possible to run libvirt on a Xen hypervisor.

Figure 2.3: Command flow in libvirt.
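To make the chain concrete, the disk of a running VM can be located from the host using only these layers. Below is a minimal sketch assuming a host with libvirt and qemu-utils installed; the domain name instance-00000001 and the disk path are hypothetical placeholders.

# List running domains known to libvirt
virsh list
# Show the block devices attached to a domain (hypothetical name)
virsh domblklist instance-00000001
# Inspect the Qcow image behind the domain's disk
qemu-img info /var/lib/nova/instances/<uuid>/disk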

As stated before, we are only considering QEMU and KVM, but OpenStack has the ability to run different hypervisors. The algorithm we are proposing should work on all hypervisors as long as the virtual disk is handled the same way.

2.2 Forensic science

Forensics, or forensic science, is "the application of scientific knowledge and methodology to legal problems and criminal investigations", according to an English dictionary. The early days lacked standardised forensic practices, which let criminals avoid punishment; criminal investigators relied heavily on confessions and witnesses. One of the first real breakthroughs in forensics was the use of fingerprints. When it became possible to connect a crime scene and a person through fingerprints, it became easier to prove that a specific person committed the crime, or at least was at the crime scene. Today forensic science has come a long way and has well-defined standards for how evidence should be collected and what is allowed in a specific scenario.

Along with new technologies, forensic science must evolve; standards and laws need to be changed and updated. During the early 80s, personal computers became more accessible, which meant that more people started to use them. At the same time, a new type of crime was recognised: hacking. To find and convict criminals in this new medium, a new forensic method had to be formed, leading to the term digital forensics.

2.2.1 Digital forensic

Digital forensics is the name for recovering and investigating material found on digital devices. A digital device is a device that is capable of storing digital data, for example computers, mobile phones, memory cards, etc. Even networks can be included in this term, although networks do not store any digital data. With the rapid advancement in technology, digital forensics had to be split into sub-areas, and today forensic investigators specialise in a specific digital medium or system. For example, there are investigators that are considered Android forensics specialists, database forensics specialists, network forensics specialists, etc. This paper will investigate cloud forensics and snapshot forensics.

A digital investigation has many applications. The most common task is to support or refute a hypothesis in court, but this is not the only application scope. Corporations can use forensics to investigate an internal information leak or data breach, something the company does not necessarily want the police to be involved in, but where the company wants to know when and where an event happened.

A digital investigation usually consists of four steps: collection, acquisition, analysis and reporting [1].

• Collection - Collection, according to Swedish police standards, is the search, collection and securing of information stored on a digital medium intended to be used as evidence.

During this step everything is documented with photographs and video, when possible. The purpose of this is mainly to be able to reconstruct the scene in a lab during the analysis phase, but also to ensure nothing is lost during transport.

During the first years of collecting digital evidence, the police always shut down systems that were running and brought them back to a lab for analysis. This is not always the best way of doing it, and in some cases it is even illegal, according to the principle of proportionality.


The principle of proportionality states that the damage caused by an investigation must be proportional to the crime being investigated. It is, for example, not allowed to collect all disks from a cloud provider if a single virtual machine or user is the subject of a minor crime.

When a running system with encrypted disks is encountered, the best option is to do as much of the forensic process on the live system as possible. When the system is powered off, the data is encrypted and, if the key is unknown, lost. Performing an analysis of a running system is called live forensics.

• Acquisition - Sometimes during an investigation, collection of the digital systems of interest is not feasible, meaning that the systems cannot be removed from their original location. This could happen because of physical or legal constraints. In this situation the data of interest is acquired logically, meaning that the data is copied logically and brought back to a lab. What is acquired and how much is determined from case to case.

If all data is to be acquired, a forensic copy of all disks is made; this ensures the original data is not modified. If data that is not involved in the crime or event is found, it can be deleted from the copy. Confidential data concerning a third party is also to be treated accordingly, so as not to cause unnecessary suspicion, expense or inconvenience.

Sometimes data that requires specific authorization is present on a system. In these cases it might be preferable to acquire only specific data instead of a complete copy. An example of data that is not to be acquired during an analysis is email correspondence between the suspect and a lawyer.

• Analysis - There is no clear line between the analysis phase and the collection/acquisition phase, and they are usually done in parallel. In this phase, the data that was collected or acquired in the previous phase is analysed with the purpose of securing data that will be used as evidence. The collection/acquisition and analysis phases are iterative, meaning they are repeated multiple times as new potential evidence is found. The analysis should be done in consultation with the inquiry leader.

Once the chain of events has been understood, the analyst attempts to recreate it. This is done to confirm the hypothesis of what happened.

• Report - The documentation of all the methods and actions taken and any evidence discovered is compiled into one report. This also includes the actions taken during the collection/acquisition. The report of the investigation is the most essential document: usually, a lawyer or prosecutor will only read this report and base the case upon it.

Keep in mind that the steps above are very generic and simplified. They are also based on the method the Swedish police use and do not necessarily follow other countries' standards, even if the general steps are similar.


2.2.2 Cloud forensics

Today it is possible to run and maintain a whole company using cloud services. This requires investigators to find new ways to handle data, since the old standards are not suitable for a cloud environment [11]. The first problem occurs when data is no longer bound to a physical location or a specific disk: when stored in the cloud, a file could be fractured into smaller pieces and spread across multiple disks. This creates a problem when an investigator must collect evidence for a case. As described above, the usual way to collect evidence is to take the physical disk. This is unrealistic in most cases when working in a cloud, since it would harm the cloud provider and all users running virtual machines on the same server. In extreme cases it has been done, for example during the investigation of the torrent site The Pirate Bay.

Since it is not feasible to collect the disks in a cloud environment, new methods are required. The method researched in this paper looks into the possibility of using a snapshot, a copy of the virtual disk at a specific time, as a forensic disk copy.

2.2.3 Physical versus logical copy of a disk

When referring to a physical disk copy, we mean the specific method of acquiring a disk: an analyst takes the physical disk, puts it in a dedicated write blocker and then copies the disk bit by bit to another hard drive. This is to ensure that nothing changes on the original disk. The original drive is then stored in a safe place.

A logical copy is when we are not using dedicated hardware to make the copy. A simple "dd" command in Linux would be sufficient to make this kind of copy. Since the disk we are copying is connected to a computer, we cannot know that the disk is in a read-only state, even if we mount it that way; the computer the disk is attached to could be broken or infected. This is why an analyst should use a dedicated write blocker from a known provider as often as possible. In the case of analysis in the cloud, an analyst needs to take a logical copy, since the physical disk is not accessible in most cases.

2.2.3.1 Non-volatile Data

Non-volatile memory is storage that has the capability to hold saved data even when the power is cycled. Examples of non-volatile memory are flash memory, SD cards, optical disks, and magnetic devices. Magnetic memory refers to hard disk drives, floppy disks, and magnetic tape. This type of memory is usually used as secondary storage or consistent long-term storage in a system.

The most common PC will store its boot partition and file system on secondary storage, usually a hard disk drive or a solid-state disk. In a forensic analysis this is the device where most of the data is found. This data is easy to collect, since it does not require any special equipment to create a logical copy. During an investigation a full disk copy is preferred over the acquisition of only certain files: if only the files of interest that were found are copied, important data that could have been hidden on the disk is lost. Such data could be found in the slack space of files or in unallocated memory. Slack space is the bytes in a block that are not used by the file allocating the block. For example, if a file with a size of three Kilobytes (kB) occupies a block with a size of four kB, the block has one kB of unused memory which could be used to hide data. To ensure all data is collected, a full disk image can be created of the disk.
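The slack space arithmetic can be observed with standard tools. A small sketch, assuming a Linux file system with 4 kB blocks; the file name is arbitrary:

# Create a file of 3000 bytes
dd if=/dev/zero of=slackdemo bs=1 count=3000
# Print apparent size versus allocated space; on a 4 kB-block file system
# the file occupies 8 units of 512 bytes (4096 bytes), leaving ~1 kB of slack
stat -c 'size: %s bytes, allocated: %b blocks of %B bytes' slackdemo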


A bit by bit copy also ensures that deleted files that have not yet been overwritten are accessible in the disk image. Files on a disk are stored differently depending on the file system. On a Unix system all files are associated with an inode, which is identified by an integer called the inode number. Inodes store metadata about the file and a reference to where the data is stored on the physical disk. What happens with the inode when a file is deleted depends on the file system in use: for example, deletion of a file on ext3 will result in a complete wipe of the information in the inode, which is then marked as unallocated, whereas on ext4 the status of the inode is simply changed to unallocated, which means the metadata is left untouched until the inode is allocated again. The file data is not deleted or overwritten on the disk and can still be accessed if the location is known. This means that deleted files on a system can be recovered until they are overwritten on the physical disk. The method for extracting these files is called file carving. The same method can be used on a Windows operating system, but instead of inodes, Windows uses a Master File Table to keep track of files.
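As a crude illustration of the idea behind carving: if part of the content of a deleted file is known, its byte offset in a raw image can be located and the surrounding bytes extracted. A minimal sketch, assuming a raw image disk.img and a known search string; real carving tools instead search for file headers and footers:

# Locate the byte offset of known content (-a: treat binary as text,
# -b: print the byte offset, -o: print only the match)
grep -a -b -o "This is myfile" disk.img
# Extract 4096 bytes starting at the reported offset (123456 is a placeholder)
dd if=disk.img of=carved.bin bs=1 skip=123456 count=4096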

2.2.3.2 Volatile data

Data found in digital systems which is stored on a volatile storage medium, such as Random Access Memory (RAM), is usually referred to as volatile data. That is because this type of data is usually lost forever if the system is powered down. Volatile data increases the difficulty of collecting relevant evidence from a digital system because it must be collected on a live system.

When collecting data on a live system, the examiner will alter the system in various ways, which might compromise additional evidence located on the system. Because of the risks involved when collecting volatile data, the examiner should weigh the pros and cons of doing so (usually at the scene, although criteria are commonly defined beforehand). If the examiner decides to collect volatile data, then it is good practice to use a script, written on an alternative system, which performs the actual collection. This method is preferred because all actions alter the system in various ways, and a script eliminates the risk of typos and unnecessary commands being executed on the system.

There are different types of volatile data that can be collected from a digital system; which data is interesting will differ from case to case and from system to system. Volatile data that might be of interest are contents of memory, network configuration, network connections, running processes, open files, login sessions and operating system time. Note that this list is not exhaustive. Volatile data of interest will change from case to case depending on the reason for the investigation. For example, network connections, login sessions and network configuration might be of particular interest if the system has been involved in malicious activity taking place on the network, whereas running processes and open files might be of more interest if the investigation concerns piracy or child pornography. If there is uncertainty concerning which volatile data to collect, the examiner should always collect as much data as possible, as the possibility to collect the data is lost once the system is powered down. Since some information is more time sensitive than other information, volatile data should be collected according to its time sensitivity. An example of such differences is that network connections and login sessions might time out at any time, while it is unlikely that the network configuration or operating system time will change.

Recommended order of collection (a minimal collection script following this order is sketched after the list):

1. Network connections - Active network connections can reveal outgoing connections, which could point to additional systems of interest for the case. Incoming connections could potentially show active backdoors on the system. Netstat is software commonly present on systems which can be used to print active network connections.


2. Login sessions - Active login sessions could show malicious sessions active on the system, which could have been used to launch attacks.

3. Contents of memory - The memory of a system could potentially store a lot of interesting information only available while the system is powered on. Examples of interesting data could be malicious software which only resides in memory, keys used to encrypt the entire system or parts of it, and data which has not yet been written to persistent storage.

4. Running processes - A process could be started on a system in a way that wipes any traces of itself from persistent storage, thus only running in RAM. This means that any traces of the process/program are lost upon system shutdown.

5. Open files - Files that are open on a system could point an examiner in the direction of interesting data.

6. Network configuration - It is important to document how the network of the system is configured: whether there are multiple network interfaces, what Internet Protocol (IP) addresses are present on the system, and whether the routing table has been modified.

7. Operating system time - The time of the operating system must be documented as it is the key to being able to determine the sequence of events. This time might also differ between systems as they might be located in different time zones or might just have been configured incorrectly.

If possible, the examiner should also check the system for potential rootkits that have been installed on the system. The reason is that they might have been installed with the purpose of feeding false data when the volatile data is collected.
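A minimal collection script following the order above might look as follows. This is a sketch for a Linux system and assumes the listed tools are present; in practice the script and its binaries should be brought on trusted media, and the contents of memory would be captured with a dedicated imaging tool rather than the commands shown here.

#!/bin/sh
# Collect volatile data in order of time sensitivity and log everything
# to a single file so the collection order itself is documented.
OUT=volatile-$(hostname)-$(date -u +%Y%m%dT%H%M%SZ).log
{
  echo "--- network connections"; netstat -an
  echo "--- login sessions";      who -a
  echo "--- running processes";   ps auxww
  echo "--- open files";          lsof -n
  echo "--- network config";      ip addr; ip route
  echo "--- system time";         date -u
} > "$OUT" 2>&1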

2.2.3.3 Snapshotting in QEMU

A snapshot is an exact copy of a disk at a specific time [12]. Since our test environment is DevStack running QEMU, each instance will be stored as a Qcow2 or Qcow3 disk image depending on version. There is no significant difference between the two, except some added functionality in Qcow3; therefore we treat the two versions the same way. We do not analyse the first version of Qcow files, since they are not used in newer OpenStack deployments.

Qcow images are so-called copy-on-write images. The technique is to use a disk image file, the Qcow image, as a backing file, and to use overlay files which store changes. When an instance is launched, all necessary data is read from the backing image. When the instance then writes anything to disk, a new Qcow image is created to store these changes; this is called an overlay. This way of launching and storing data creates smaller disks for each instance launched from the same backing file, since only changes are written to the overlay. Each instance launched from the same backing image has its own overlay with its changes to the backing image. The backing image is opened in read-only mode when an instance is running, preventing changes. A change to this image would affect all instances launched from that specific backing file.

When a snapshot is taken of a Qcow image, the current overlay is saved and a new one is created. The old overlay will still use the original backing image as its base, but the new overlay will instead use the old one as its backing image. The snapshots are in this way linked with each other, and all snapshot overlays are required to be able to run the system without data loss. It is important that every backing file is stored and loaded in read-only mode [13]. See figure 2.4 for a visual presentation.

Figure 2.4: Qcow Snapshot.
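The backing-file mechanism can be reproduced by hand with qemu-img. A minimal sketch, assuming qemu-utils is installed; the file names are hypothetical (newer QEMU versions also require the backing format to be given explicitly with -F qcow2):

# Create a base image and an overlay that records only changes to it
qemu-img create -f qcow2 base.qcow2 1G
qemu-img create -f qcow2 -b base.qcow2 overlay.qcow2
# The overlay lists base.qcow2 as its backing file
qemu-img info overlay.qcow2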

If a user wants to remove the backing image, it is possible to do so by merging all the snapshots into one overlay that includes the data needed from the backing image. This is what OpenStack does. The command used is called Block Pull, or Block Stream in some parts of the source code. This command creates a new Qcow image to act as a new overlay. Libvirt then streams, or pulls, the data from all sources into the new file. Every block that is occupied on the overlays is copied over to the new overlay. Because the copy only checks whether a block is occupied, and not whether it is allocated inside the Qcow image, it should be possible to recover deleted files from the snapshot. When the merging is complete, a Qcow image containing all data from every snapshot and the backing file has been created. This new file will not have any backing file, and new instances can be launched from it. See figure 2.5 for a visual presentation.

Figure 2.5: Snapshot Merge.
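The merge can be reproduced offline as well: qemu-img convert reads through the whole backing chain and writes a standalone image, much like the Block Pull described above. A sketch with hypothetical file names:

# Produce a standalone Qcow image containing all data reachable
# through overlay.qcow2 and its backing files
qemu-img convert -O qcow2 overlay.qcow2 standalone.qcow2
# The result has no backing file and can be booted on its own
qemu-img info standalone.qcow2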

Note that the current versions of OpenStack do not support snapshotting with multiple snapshots merging into one. The current solution only uses one overlay for each instance, and each time this overlay is snapshotted a new launchable Qcow image is created. From a forensic standpoint this is much better, since the risk of overwriting data is smaller than when merging several overlays into one.


When the snapshot command is issued in OpenStack, the Nova service handles the request [14]. Nova then looks up which compute node the instance to be snapshotted is running on and opens a connection to the hypervisor running on that node. Nova then sanitises all parameters received in the request and passes them to the hypervisor driver library, in our case libvirt, since DevStack is using QEMU in our solution. Libvirt then creates a new Qcow image, denies the running machine CPU time (causing all execution on the VM to halt) and starts the stream/pull command to mirror the snapshotted disk [15]. When the stream is complete, a new standalone Qcow image has been created and the VM is once again permitted CPU time.
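Seen from the outside, this whole flow is triggered by a single API call. A sketch using the OpenStack command line client, assuming valid credentials are exported in the environment; the server and image names are hypothetical:

# Ask Nova to snapshot a running instance into a new Glance image
openstack server image create --name evidence-snapshot suspect-vm
# Download the resulting image for offline analysis
openstack image save --file evidence-snapshot.qcow2 evidence-snapshot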

2.2.4 Importance of timestamps

When analysing a computer system during a forensic investigation, the examiner must carefully consider the available timestamps. This is because the interesting information found in a system has to be pieced together in order to determine the sequence of events. In the best of worlds, the system clock found on the systems would have been synchronized with a hierarchical network of systems designed to synchronise clocks. Unfortunately this is not always the case.

System clocks might not have been synchronized, the internal clock of the system might drift by ticking either too fast or too slow, the clock might have been incorrectly configured by a system administrator, etc. Issues like these increase the difficulty of piecing events back into the correct sequence. In addition, it is not improbable that system clocks have been configured to different time zones.

Due to these facts it falls on the investigator to take note of all the different time configurations that might be present on the systems, so that timestamps later found on the systems can be recalculated to a common base, from which they can be put in order of events.

Network Time Protocol (NTP) is a time protocol designed by David L. Mills with the purpose of synchronising time between computer systems. Computer systems configured to use NTP communicate with an accurate time server, or a set of servers, to retrieve the time. Due to the network latency when communicating with other systems, a modified version of Marzullo's algorithm called the "intersection algorithm" is utilized to calculate and correct the time inaccuracy introduced by the network latency. The current version in use is version 4, which has been standardized and can be found in RFC 5905 [16].
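In practice, documenting the time configuration of a live Linux system before collection can be as simple as the following sketch (assuming systemd's timedatectl is available):

# Record wall-clock time in UTC, the time zone and NTP synchronisation status
date -u
timedatectl
# Compare the hardware clock against the system clock (requires root)
hwclock --show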

2.3 Automated tools

Server configuration and management was traditionally performed manually by the personnel managing the systems. As the number and complexity of systems grew, so did the risk of mistakes and the time required to perform those tasks. The introduction and growth of the Cloud has also affected configuration needs. Typically, large systems have to be configured in a very specific way, and even a small mistake can have devastating consequences. CSPs offer the ability to quickly launch VMs to be used by the consumer, and these machines have to be configured quickly for production.

Software configuration management tools exist to automate the process of configuring systems. The automated process inherently speeds up configuration, makes it uniform, and minimizes potential human mistakes. Various tools exist, the majority of which are open source. Four of the most popular automation tools are Ansible, Puppet, Chef and Salt. A brief explanation of the four can be found below:

• Ansible - is a clientless solution which uses Secure Shell (SSH) to connect to nodes. It is designed with a focus on security and reliability. Tasks in Ansible are defined as "Playbooks", which are written using YAML Ain't Markup Language (YAML) syntax. Various modules developed and maintained by Ansible are available for use, and user-contributed modules can be used for additional functionality. Modules are traditionally written in Python, as it is the language Ansible is written in, but any language is supported. (A minimal playbook sketch follows at the end of this section.)

• Puppet - is the most mature tool of the four. It uses a client-server model to communicate with the nodes, which means that agents (client software) are installed on all nodes. Puppet uses a custom language to describe how systems should be configured. Ruby is the language under the hood of Puppet, and thus it is the language to be used when writing modules.

• Chef - uses a client-server model for communication with the nodes, but also supports a standalone mode. Ruby and Erlang are the languages used to write Chef (Ruby/Erlang on the server, Ruby on the client). Self-written modules have to be written in Ruby.

• Salt - is a Python-based configuration management tool which uses a client-server model when communicating with the nodes. Tasks are defined as scripts called Salts, which can be added or removed to define configuration scenarios. User-written modules should be written in Python to comply with the Salt module design.

Configuration and management tools are a relatively new kind of solution and new options keep becoming available, so the list above is not exhaustive. These tools offer slightly different possibilities and options for configuration and management, and therefore suit different needs.
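To give a flavour of how such tasks are expressed, the following sketch writes a minimal Ansible playbook and runs it against an inventory. The host group, file paths and playbook contents are hypothetical, and the sketch assumes Ansible is installed and SSH access to the targets is configured:

# A playbook is plain YAML; this one fetches a file from the target hosts
cat > fetch_file.yml <<'EOF'
- hosts: suspects
  tasks:
    - name: Fetch a file of interest from the remote system
      fetch:
        src: /home/ubuntu/myFile
        dest: ./collected/
EOF
# Run the playbook against the hosts listed in the inventory file
ansible-playbook -i inventory fetch_file.yml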

2.4 Technical standards

Standards are norms established to standardise technical systems and methods. Standards can be issued by companies, regulatory bodies, standards organizations, etc. Who issues a standard and why depends on the needs of a closed group and/or the public. Companies issue private standards to standardise methods and functionality within the company, while standards organizations issue standards to publicly unify something. One example of a standards organization is the International Organization for Standardization (ISO). Its headquarters are located in Geneva, Switzerland, but it issues standards affecting 162 countries as of March 2017.

Standards designed to describe digital forensic guidelines and best practices are sparse. Generally, the legal authorities in each country have to establish guidelines for how digital forensic investigations should be carried out. This includes planning, evidence collection, analysis and reporting. These guidelines aim to unify the forensic process and ensure the highest probability of evidence authenticity within a specific country. Guidelines may or may not impose difficulties when investigating between countries, as regulations may differ. In October 2016, ISO published a standard (27037) which proposes guidelines for identification, collection, acquisition and preservation of evidence found in digital systems [25]. This standard is described in more detail below.


2.4.1 ISO 27037

This standard describes activities for identifying, collecting, acquiring and preserving digital evidence. The guidelines concern digital storage, devices and equipment that might contain digital evidence. Four general requirements are discussed:

• Auditability - All actions made during an investigation should be documented. This is because the process should be available for evaluation and all decisions should be argued for.

• Repeatability - An independent Digital Evidence First Responder (DEFR) should be able to follow the documentation of the acquisition process and end up with the same result. There might be circumstances when this is not possible; the DEFR should be able to argue for this.

• Reproducibility - An independent DEFR should be able to reproduce the acquisition process under different conditions and when using different tools and still end up with the same result. Reasons for deviations should be argued for.

• Justifiability - All actions and decisions made during the acquisition should be justifiable. This means that the DEFR should be able to show and argue that the decisions made were the best choice.

General requirements established for the handling processes are: any handling of evidence or potential evidence should be kept to a minimum; any changes made to collected evidence should be carefully documented; the process should comply with any rules established locally; and actions should never be taken beyond the competence of the investigator. The four initial key processes of the standard are described as follows:

• Identification - of digital storage and devices which potentially contain evidence. All identified sources should be ordered according to volatility, and the order of collection/acquisition should be determined. Sources containing potential evidence might not be easily identified, as they might be hidden, stored at a remote site, located in the cloud, etc.

• Collection - is a process where sources of potential digital evidence are collected to be transported to a controlled laboratory. Sources might be encountered either powered on or powered off and will likely require different approaches; locally established guidelines should be followed.

• Acquisition - is a process to gather data where the sources of potential digital evidence cannot be removed from their original location. In this situation an acquisition is desirable. The process should produce a digital copy of the source, and the DEFR should verify that the copy is an exact copy of the source. If verification is impossible (e.g. on live systems or a faulty disk) the DEFR should document the situation and prepare to defend the decision.

• Preservation - of evidence, either collected or acquired, is of utmost importance to prevent potential evidence from being tampered with or destroyed. The DEFR should be able to demonstrate the integrity of the evidence (prove that it is the same evidence as initially collected); a hash-based integrity check is sketched below.

The standard also describes the importance of documenting any handling of evidence so as to preserve its trustworthiness (also known as preserving the chain of custody). There are also general recommendations regarding surrounding requirements such as personnel, roles, responsibilities, competence, etc.
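Demonstrating integrity in the sense of the Preservation requirement is commonly done with cryptographic hashes: a hash is computed at acquisition time and recomputed at every later verification. A minimal sketch with hypothetical file names:

# At acquisition: compute and store a hash of the evidence file
sha256sum snapshot.qcow2 > snapshot.qcow2.sha256
# At any later point: verify that the evidence is unchanged
sha256sum -c snapshot.qcow2.sha256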

2.5 Laws and preliminary investigation

2.5.1 Rättegångsbalken

Rättegångsbalken (RB) is a Swedish legal document which consists of procedural laws [26]. This document is mainly a collection of laws concerning trials (who can be prosecuted, when, why, etc.). There are also laws which describe how preliminary investigations should be carried out and how evidence should be handled. A few paragraphs found in RB are of particular interest when discussing legal actions in a cloud environment. They are given below in English translation.

RB 23:4 - A preliminary investigation shall be conducted so that no one is unnecessarily exposed to suspicion or caused expense or inconvenience.

RB 27:1 - Objects that can reasonably be assumed to be of importance to the investigation of a crime, or to have been taken from someone through crime, or to be forfeited because of crime, may be seized. The same applies to objects that can reasonably be assumed to be of importance to an investigation concerning forfeiture of the proceeds of criminal activity under chapter 36, section 1 b of brottsbalken. Coercive measures under this chapter may be ordered only if the reasons for the measure outweigh the intrusion or other detriment that the measure entails for the suspect or for any other opposing interest.

RB 27:10 - A seized object shall be taken into custody by the person who executed the seizure. If it can be done without risk and is otherwise appropriate, the object may however be left in the holder's possession. An object left in the holder's possession shall be sealed or marked as seized, unless this appears unnecessary. An object that is not taken into custody or sealed may be used by the holder, unless otherwise decided.

RB 27:15 - To secure the investigation of a crime, a building or room may be closed off, access to a specific area may be prohibited, a prohibition against moving a certain object may be issued, or other similar measures may be taken.

The laws mentioned above are just examples of Swedish laws that affect legal investigations of cloud environments.

2.6 Similar work

Work similar to that in this thesis has been done by Sameera Almulla, Youssef Iraqi and Andrew Jones and can be read in their paper "Digital Forensic of a Cloud Based Snapshot" [29]. Their research goals were to investigate the snapshot functionality of the Xen hypervisor, the possibilities of using the snapshot during a forensic investigation, and whether this can be done without reconstructing the originating environment. Their proposed solution was to create a snapshot of a VM using the snapshot function in the Xen hypervisor. A copy-on-write image was created by the operation, which was analysed using the open-source forensic tool DFF [28]. They found that during an analysis of the snapshot, files deleted using the graphical interface, the command line and shredding tools could all be successfully retrieved.

They concluded that the copy-on-write snapshot generated by the Xen hypervisor could possibly be used as a method of extracting and preserving evidence in a forensically sound manner. They also concluded that the original environment was not required.


3 METHOD

3.1 Tests of Qcow disk image

We will attempt to answer our first research question, "Is it possible to use the snapshot functionality in OpenStack as an alternative to a forensic disk clone?", by using an experimental method. A series of experiments will be conducted both on a snapshot generated by OpenStack of a VM and on a forensic disk clone of a physical server. The two systems will be installed to mimic each other as closely as possible, and the outcomes of the experiments conducted on the two systems will be compared.

The experimental method was chosen because we are foremost interested in how closely a snapshot generated by OpenStack resembles a forensic disk clone. This method allows us to create a hypothesis based on information learned during the pre-study. A series of experiments simulating real-world scenarios can then be conducted and their results compared. The comparison should clearly show whether a snapshot could be used as an alternative to a forensic disk clone.

A literature study could have been conducted to answer our research question. However, we felt that the available documentation and research is inadequate for drawing a conclusion about the usefulness of the Qcow2 format. Also, conducting experiments on the actual disk clones resembles the actual work that could be done during a forensic investigation.

We would have preferred to use some kind of simulation technique to compare the usefulness of the Qcow2 format. A simulator would have allowed us to test various scenarios with just minor differences and execute them multiple times to verify the results. However, we were unable to find a simulator which could get the job done.

Another way to extract files from a cloud would be to use introspection, a technique where data can be extracted directly from the VM by injecting a program into its RAM and running it from there. This technique is not fully developed for KVM, and it would also fall outside the scope of this paper.

All tests are performed on DevStack running Cinder v. 2.0.1, Glance v. 2.6.0, Neutron v. 6.1.0, Nova v. 7.1.0 and Keystone v. 3.8. Each test will create its own instance so as not to affect the results. Before the tests, the OS_ variables will be set to allow the playbook to connect to DevStack and create an instance.

export OS_USERNAME=admin
export OS_PASSWORD=z2o3xjE4eN6PSLux963e
export OS_AUTH_URL=http://10.1.0.37:5000/
export OS_PROJECT_NAME=demo

All tests will be repeated on a physical machine running Ubuntu Server 16.04 (with ext4 as the file system). On this machine, a forensic extraction will be made using the dd command in Linux. Between each test, the disk will be overwritten with a file created by dd, just as it was before any tests were performed.
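
A minimal sketch of these two dd operations, assuming the physical disk is /dev/sdb (the device and file names are placeholders, not the exact ones used):

dd if=/dev/zero of=/dev/sdb bs=4M status=progress
dd if=/dev/sdb of=disk_image.dd bs=4M conv=noerror,sync status=progress

The first command overwrites the disk between tests; the second creates the forensic image, where conv=noerror,sync makes dd continue past read errors and pad incomplete blocks.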

3.1.1 Scenario 1: Find a file on a snapshot

This test aims to find an existing file on a Qcow disk.

Hypothesis: The file will be found on the system.


1. Create test environment - In the cloud: Create an Ubuntu Server 16.04 instance and inject an SSH-key for easy access.

On the physical machine: Create a clean installation of Ubuntu Server 16.04 on the disk.

2. Connect to the VM (Cloud only) - Use SSH to connect to the instance using the SSH-key.

3. Create a file and write to disk - Create a file with some unique, searchable content.

echo "This is myfile" > myFile sync

The sync command is used to ensure that the file is written to disk. Without it, there is a risk that the file might reside in RAM and not on the disk. The snapshot function should do this, but to be sure it is done manually.

4. Create an image of the instance - In the cloud: Create a snapshot of the VM using Horizon.

On the physical machine: Create an image of the disk using dd.

5. Find the file on the disk image - Cat the file and use grep to find the file content.

cat <disk_image> | grep -a "This is myfile"

The"-a"flagtellsgreptotreattheoutputasASCII. 3.1.2 Scenario2:Findadeletedfileonasnapshot

This test aims to show that a file that has been deleted can still be found on a Qcow disk.

Hypothesis: The file content will be found even if the file is deleted.

1. Repeat step 1-3 from Test 1, subsection 3.1.1

2. Delete the file - Remove the file from the disk.

rm myFile
sync

3. Create an image of the instance - In the cloud: Create a snapshot of the VM using Horizon.

On the physical machine: Create an image of the disk using dd.

4. Find the file on the disk image - Cat the file and use grep to find the file content.

cat <disk_image> | grep -a "This is myfile"

3.1.3 Scenario 3: Find files with the same name

This test is done to see if we can find the old content of a file or if it will be overwritten with new content.

Hypothesis: We believe we will be able to find both contents of the file.

1. Repeat step 1-3 from Test 1, subsection 3.1.1

2. Delete the file -



rm myFile
sync

3. Create a new file with the same name - Use the same name, since we want to check if we can still see the old content on the disk.

echo "New content same filename" > myFile sync

4. Create an image of the instance - In the cloud: Create a snapshot of the VM using Horizon.

On the physical machine: Create an image of the disk using dd.

5. Find the file on the disk image - Cat the file and use grep to find the file content.

cat <disk_image> | grep -a "New content same filename"

6. Find the content of the old file - Cat the file and use grep to find the file's old content.

cat <disk_image> | grep -a "This is myfile"

3.1.4 Scenario 4: Find original file content from a changed file

This test aims to check what happens with a file if its content is updated, and whether the old content will still be accessible.

Hypothesis: Both contents will be found on the disk.

1. Repeat step 1-2 from Test 1, subsection 3.1.1

2. Create a file and write to disk - Create a file inside the home folder. Write some content that is easy to search for.

echo "This is the original content" > myFile sync

3. Change the content of the file - Since we already have a file, open it and change the content. We are using a text editor that does not create a new file when writing to disk. We used nano for this; vim creates a new file, which changes the inode.

nano myFile

Change the content of the file to "This is the modified content"

4. Create an image of the instance - In the cloud: Create a snapshot of the VM using Horizon.

On the physical machine: Create an image of the disk using dd.

5. Find the file on the disk image - Cat the file and use grep to find the file content.

cat <disk_image> | grep -a "This is the modified content"


6. Find the old content on the snapshot disk image - Cat the file and use grep to find the old content.

cat <disk_image> | grep -a "This is the original content"

3.1.5 Scenario 5: Find a deleted file on a filled disk

This test will check what happens if the whole disk is used, and thus the deleted file has been overwritten.

Hypothesis: The content will not be found.

1. Repeat step 1-3 from Test 1, subsection 3.1.1

2. Delete the file -

rm myFile
sync

3. Fill the disk with random data - Create a file that is big enough to fill the disk. This should overwrite all blocks that are not in use, thus preventing us from finding the deleted file.

dd if=/dev/urandom of=./file bs=4k iflag=fullblock,count_bytes count=15G

4. Create an image of the instance - In the cloud: Create a snapshot of the VM using Horizon.

On the physical machine: Create an image of the disk using dd.

5. Find the file on the disk image - Cat the file and use grep to find the file content.

cat <disk_image> | grep -a "This is myfile"

3.2 Algorithm for automated extraction

We will use a comparative method to define an algorithm which could be used to extract potential evidence automatically. The algorithm will be defined by studying existing recognised methods and trying to adapt those to the cloud.

Our proposed algorithm will be compared with, and discussed against, the existing methods.

The algorithm should preferably be used in an actual case and tested in court, as that is the final evaluation. Unfortunately, this is not possible, as the algorithm has to be established and revised before being applied to a real case. In addition, a case could span several years, which is not feasible within this paper.

An alternative method of establishing the algorithm could be to conduct interviews with people involved in actual cases (prosecutors, laymen, forensic workers, etc.). Questions like "What invalidates evidence?" and "What should be considered when collecting evidence?" should be asked. We opted not to take this approach, as it would require a great commitment from the people being interviewed and a large number of participants, which would be difficult to



get hold of. We have, however, discussed the algorithm with forensic analysts who work for the Swedish Police and the Swedish Tax Agency. Both parties agree that the algorithm could be tested in a court case.

Our algorithm will be run and tested on DevStack, a test environment that is easy to set up and runs OpenStack with QEMU. This machine has been running in City Cloud, City Network's public cloud, which runs on KVM; this should not have any effect on our solution. The test environment is running Ubuntu Server 16.04 with 16 GB of RAM, 4 cores and 100 GB of storage.

DevStack installs all the OpenStack components necessary to run a cloud: Nova, Glance, Cinder, Neutron and Horizon. The versions of these components have changed during the project, since we are using the master branch of the DevStack git repository for the installation. Several installations have been done, because DevStack has a tendency to crash.

The proposed algorithm will be tested against a scenario where an arbitrary virtual machine has been infected or is acting strangely. This strange behaviour would be recognised by an Intrusion Detection System (IDS), which then executes an Ansible playbook containing our algorithm.

This scenario will be tested by creating a virtual machine which is considered a regular VM running in the cloud. A file which could be of interest is created; this is the file to be retrieved automatically.
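
As a sketch, the IDS alert hook could trigger the extraction like this (the playbook, inventory and variable names are hypothetical, not the exact ones used):

ansible-playbook -i inventory extract_evidence.yml -e "instance_name=suspect-vm"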

3.2.1 Choice of automation tools

We are using Ansible as our automation tool, because Ansible is a clientless solution which only requires SSH to access the clients. Most of its modules are written in Python, and Python is required on the machine the modules run on. Ansible transfers the modules to the targeted clients and executes them in the specified order. This should not pose a problem, since Unix distributions generally have Python installed.

3.2.1.1 Raw commands

Ansible offers the option to run raw commands directly via SSH on a client. When a module is used, Ansible uploads the module into the /var/tmp directory, from which it runs the module. A majority of the modules require Python to run. The raw option allows Ansible to install Python, or any other software, as if the command had been written in the command line interface. In a forensic analysis, it is important to keep the alteration of the machine under investigation to a minimum.

If a normal module is used, Ansible will write data to the disk, which will almost certainly alter the system and potentially destroy evidence. Ansible's raw module solves this for us: since we are not uploading any files to the machine, we keep the system alteration to a minimum. The raw option allows us to run Unix commands to monitor volatile data on the system. The commands we opted to run are listed below; a playbook sketch follows the list.

1. netstat -anp - Netstat can be used to monitor the networking subsystem, meaning active connections, the routing table, interfaces, etc. The "-a" option tells netstat to output both listening and non-listening sockets. "-n" prevents netstat from looking up symbolic hostnames and prints IP addresses instead. The "-p" flag tells netstat to include the process identifier (PID) of the program owning each socket.

2. netstat -rn - The "-r"-flag is used to print the kernel routing table.


3. w -i - The w program shows all users logged on to the system and what they are currently doing. The "-i" flag is used to print the IP address of each user instead of the hostname.

4. ps -aux - The ps command outputs information about running processes on the system. "-ax" prints all processes, and "-u" adds user-oriented information for each process.

5. lsof -n - lsof stands for "list open files" and does just that. "-n" tells lsof to print IP addresses instead of hostnames.

6. ifconfig - This command outputs information about network interfaces on the system.

7. date -R - Prints the date of the system. The "-R" flag forces the output to be formatted according to RFC 2822, for example: Tue, 18 Apr 2017 10:01:37 +0200

These commands follow the order in which the information should be gathered according to the recommendations stated in section 2.2.3.2. The list could easily be expanded to include additional programs and procedures.
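
As a sketch, the first two of these commands could be expressed as raw tasks in a playbook (the host group and variable names are assumptions, not the exact thesis implementation):

- hosts: suspect_vm
  gather_facts: no
  tasks:
    - name: List active connections with owning PIDs (most volatile data first)
      raw: netstat -anp
      register: netstat_connections

    - name: Print the kernel routing table
      raw: netstat -rn
      register: routing_table

The registered variables hold the command output, so it can later be written to a file on the analysis machine instead of on the machine under investigation.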

3.2.1.2 Gather_facts module

This module is automatically run on the machine before any other modules. Since this is a forensic analysis, any additional module could alter the system. For the same reason as with the raw commands, this is not desirable, and the module is therefore disabled.
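
Disabling the module is done at play level; a minimal sketch, with an assumed host group name:

- hosts: suspect_vm
  gather_facts: no   # skip the implicit setup module so nothing extra is uploaded to the target
  tasks:
    - raw: date -R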

3.2.1.3 Ignore_errors option

If Ansible goes into an error state, the whole playbook stops its execution. In some of the plays, a module failure is not considered a complete failure of the execution, and the playbook should continue with the next task. An example is the raw command "which lsof", which returns the path of the program lsof if it is installed. A failure in this task only tells the play that the program is not installed, not that something went wrong with the play, and the playbook should continue.
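
A sketch of how this check could look in the playbook (the task and variable names are assumptions):

- name: Check whether lsof is installed
  raw: which lsof
  register: lsof_path
  ignore_errors: yes   # a non-zero exit only means lsof is missing, not that the play failed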

3.2.2 Snapshot solution

The proposed algorithm will use the snapshot function in OpenStack to create a stand-alone Qcow image of an instance, in which the overlay disk of the instance is merged with its backing file. This is the best solution when using OpenStack. Another solution would be to use Ansible to connect to the compute node hosting the instance of interest and create a direct copy of the overlay disk. The problem with this alternative is that we do not know which node the instance is running on; in addition, copying the disk of a running system might result in a corrupt copy. To be able to connect to the hosting node, the IDS would need access to each and every node in OpenStack. For simplicity, it is easier to use the functionality exposed via OpenStack, which also ensures that the state of the targeted instance is unchanged, so that the created snapshot is consistent.
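
The equivalent manual steps, sketched with the OpenStack command line client (the instance and image names are hypothetical; the playbook drives the same API calls):

openstack server image create --name evidence-snap suspect-vm
openstack image save --file evidence-snap.qcow2 evidence-snap

The first command asks Nova to snapshot the instance into a Glance image; the second downloads that image as a stand-alone Qcow2 file for analysis.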

3.3 Analysis of Qcow using forensic tools

The forensic tool used is The Sleuth Kit, a collection of command line tools together with a C library. These tools allow a user to analyse a disk image. The Sleuth Kit is the backbone of Autopsy and other open-source forensic tools. The web interface of Autopsy was used during the investigation.



Start the web interface and create a new case. Open the snapshot image and browse the file system. The deleted file names can be found in their folder, together with the inode of each file.

The Autopsy web interface was not able to find the content of the file. Since we know the content of the file, we used the Linux command grep to find it directly in the snapshot. We were able to find the content, which shows that another forensic tool can find the content of a deleted file in a snapshot when Qcow disks are used.
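
A sketch of the equivalent command line workflow (file names are hypothetical): since The Sleuth Kit operates on raw images, the snapshot is first converted with qemu-img, after which fls lists deleted entries and icat extracts content by inode.

qemu-img convert -O raw snapshot.qcow2 snapshot.raw
fls -r -d -o <partition_offset> snapshot.raw
icat -o <partition_offset> snapshot.raw <inode>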

3.4 Prove non-repudiation of snapshots

The integrity of the created snapshots will be proved by hashing (a one-way function). Hashing should be sufficient, since any change made to the snapshot will result in a different hash value. We will study the workflow of the snapshot functionality and compare the different stages at which the hashing could be performed.
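
As a minimal sketch, the hash could be computed right after the snapshot is secured and verified at any later stage (file names are hypothetical):

sha256sum evidence-snap.qcow2 > evidence-snap.qcow2.sha256
sha256sum -c evidence-snap.qcow2.sha256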

We will attempt to find methods of proving the origin of the snapshot (authenticity) by studying unique data found in VMs. Available unique data will be located by analysing the flow at VM launch.
