Ensuring Continuous Security in the Cloud and Compliance with GDPR


UPTEC IT 17008

Degree project 30 credits, May 2017

Ensuring Continuous Security in the Cloud and Compliance with GDPR

Felix Färjsjö

Eric Stenberg


Faculty of Science and Technology, UTH Unit

Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Phone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

Ensuring Continuous Security in the Cloud and Compliance with GDPR

Felix Färjsjö, Eric Stenberg

Scania is currently in the process of migrating from an on-premise infrastructure to a cloud environment. In parallel, the General Data Protection Regulation (GDPR) will come into effect in 2018, and the combination of a migrating infrastructure and a new regulation created a need for guidance on how to proceed. This thesis' goal is to establish guidelines for the Connected Services department on how to conduct development in a cloud environment whilst complying with the GDPR. The finalized versions of these guidelines are the result of several interviews with experts in the field, along with a proof of concept showing how to secure an example application in a cloud environment.

Subject reviewer: Andreas Hellander. Supervisor: Christian Illanes, Scania AB


Popular Scientific Summary (translated from Swedish)

Scania AB is on a journey with the goal of moving its current IT infrastructure from on-premise systems to Amazon Web Services, a provider of cloud services. One product affected by this journey is the Fleet Management Portal, which provides an overview and evaluation of a transport company's truck fleet in terms of, for example, fuel consumption. To provide such an overview and evaluation, data must be collected about the transport company's drivers and the trucks they operate.

Since the collected data can be tied to an individual, it is considered personal data, and Scania AB will therefore be affected by the General Data Protection Regulation (in Swedish: Allmänna Dataskyddsförordningen). The regulation was established by the European Commission and comes into force on 25 May 2018, replacing the current Personal Data Act (Personuppgiftslagen) with respect to the collection and handling of personal data about citizens within the EU.

Given the ongoing journey to the cloud and the introduction of the GDPR, questions arose internally at Scania AB regarding cloud security and how the ongoing collection of personal data is affected by the new regulation.

By studying the GDPR and building a proof of concept, the authors have formulated guidelines for how development of the Fleet Management Portal should be carried out to meet the requirements imposed by the regulation and by the new cloud infrastructure. The guidelines also cover how the organization around the Fleet Management Portal should change to meet the new requirements.

In addition to studying the regulation and building the proof of concept, the authors conducted internal and external interviews, to gain insight into how organizational changes should be carried out in practice in order to follow the guidelines created by the authors.

The guidelines are intentionally formulated to be as general as possible, so that they are applicable in several branches of Scania AB where questions about cloud security and the collection of personal data may arise. They should be seen as preliminary aids for developing cloud-based products that handle personal data, as they will likely need to be revised in the future once the actual application of the GDPR becomes known.


Acknowledgments

The authors would like to acknowledge a group of people who made this thesis possible. The people mentioned in this section have been helpful in providing insight, tips and ideas, along with constructive discussions on various subjects.

The authors would like to give their thanks to the thesis' supervisor Christian Illanes in the Delivery Engineering team at Scania Connected Services for all help provided during the course of this thesis. The help provided includes provision of ideas, insight into the Scania organization and the inclusion of the authors in the organization. The authors would also like to thank Magnus Sarenius and Anders Lundsgård, also part of the Delivery Engineering team, for being a platform for discussions and for their help with software-based problems and troubleshooting of environmental factors.

The authors also wish to thank Peter Waher, for help in understanding the extent of the GDPR, and Tomas Rimming at KnowIT Secure, for setting up interviews and for general tips on security frameworks.

Last but not least, the authors would like to give their thanks to the thesis' subject reviewer Andreas Hellander at Uppsala University for believing in and encouraging the project.


If you think cryptography is the answer to your problem, then you don’t know what your problem is.

Peter G. Neumann


Contents

1 Introduction
  1.1 Motivation and Goals
  1.2 Purpose
  1.3 Delimitation
2 Related Work
3 Scania's Current Organization and Security Delegations
  3.1 Scania IT
  3.2 Delivery Engineering
  3.3 Feature Teams
  3.4 Fleet Management Portal
  3.5 Consequences
  3.6 Security Delegations
4 General Data Protection Regulation - GDPR
  4.1 Pre-GDPR
  4.2 Transition from DPD to GDPR
  4.3 Upcoming Problems with the Transition to GDPR
5 Key Definitions in GDPR and their Impact
  5.1 Controller and Processor of Data
    5.1.1 Controller of Data
    5.1.2 Processor of Data
  5.2 Categories of Data
    5.2.1 Personal Data
    5.2.2 Sensitive Personal Data
  5.3 Special Permits to Collect and/or Process Personal Data
    5.3.1 Cases Permitted if Compliant with other Articles and Laws
    5.3.2 Cases which are not Compliant with GDPR unless Permitted by Law
  5.4 Anonymous/Pseudonymous Data
    5.4.1 Anonymous Data
    5.4.2 Pseudonymous Data
  5.5 Technical Requirements of GDPR
    5.5.1 Principles of Proportions and State of the Art
    5.5.2 Privacy by Design
    5.5.3 Privacy by Default
  5.6 Non-technical Requirements of GDPR
    5.6.1 Data Protection Officer
    5.6.2 Rights of Data Subjects
    5.6.3 Documentation
6 Amazon Web Services
  6.1 Elastic Cloud Computing
  6.2 Security Groups
  6.3 Elastic Load Balancing
  6.4 Relational Database Service
7 Theory
  7.1 Cloud Architecture
    7.1.1 Infrastructure as a Service
    7.1.2 Platform as a Service
    7.1.3 Software as a Service
  7.2 Cloud Security
    7.2.1 Securing Data
  7.3 Utilizing the Cloud
    7.3.1 DevOps
    7.3.2 Continuous Delivery
    7.3.3 Infrastructure as Code
    7.3.4 Microservice Architecture
8 Method
  8.1 Background Interviews
  8.2 Proof of Concept
    8.2.1 Apache Tomcat
    8.2.2 Direct Web Remoting
    8.2.3 MySQL
    8.2.4 Amazon Web Services
  8.3 Expert Interview
  8.4 Internal Interview
  8.5 Penetration Testing of the Web Service
    8.5.1 Injections
    8.5.2 Broken Authentication and Session Management
    8.5.3 Cross Site Scripting Attack - XSS
    8.5.4 Insecure Direct Object References
    8.5.5 Security Misconfiguration
    8.5.6 Sensitive Data Exposure
    8.5.7 Missing Function Level Access Control
    8.5.8 Cross Site Request Forgery
    8.5.9 Using Components with Known Vulnerabilities
    8.5.10 Unvalidated Redirects and Forwards
9 Results
  9.1 Background Interviews
  9.2 Internal Interview
  9.3 Expert Interview
  9.4 Proof of Concept
    9.4.1 User Interface of POC
10 Discussion
  10.1 Methods
  10.2 Results
    10.2.1 Background Interviews
    10.2.2 Expert Interview
    10.2.3 Internal Interview
    10.2.4 Proof of Concept
11 Conclusion
  11.1 Guidelines for Feature Teams
  11.2 Guidelines for Delivery Engineering team
  11.3 General Organization at Connected Services - Recommendations
    11.3.1 Short-term
    11.3.2 Long-term
12 Future Work
A Baseline Questions for Background Interviews
B Baseline Questions for Expert Interviews
C Guidelines for Feature Teams
  C.1 Guidelines for GDPR Compliance
  C.2 Guidelines for Cloud Security
D Guidelines for Delivery Engineering in Cloud Administration and Basic Security
  D.1 Guidelines for AWS Monitoring of Feature Team and Application Actions
  D.2 Guidelines for routines in terms of GDPR requests
E Images of the Proof of Concept


List of Figures

1  A graphical representation of the cloud stack, showing levels of responsibility between cloud provider and the organization.
2  An illustration of continuous delivery, showing the full stack of a developer. Developers can push code which is automatically tested and deployed into production if it passes the tests. (Source and Copyright: Anders Lundsgård, Scania AB)
3  An illustration of the differences between a monolithic and a microservice architecture. The different levels are mapped to each other to show how failures may affect the architecture.
4  An illustration of the POC stack, showing the underlying cloud infrastructure together with the full stack of the POC instance.
5  An illustration of login handling in the POC, showing the flow of the login procedure and its result.
6  An illustration of DWR mapping JavaScript to Java in the POC.
7  A graphical representation of the VPC and its subnets used for the POC. Public subnets are used to abstract underlying instances and provide access control through load balancers. Instances and resources are placed in the private subnets with limited access depending on the application.
8  A graphical representation of the public and private subnets, along with EC2 configuration, used for the POC.
9  Injection attack plan, showing attack vectors and results.
10 Broken authentication and session management, showing attack vectors and results.
11 XSS attack plan, showing attack vectors and results.
12 Direct reference attack plan, showing attack vectors and results.
13 Security misconfiguration, showing attack vectors and results.
14 Sensitive data exposure, showing attack vectors and results.
15 Missing function level access control, showing attack vectors and results.
16 Cross site request forgery, showing attack vectors and results.
17 A screenshot of the login page with username and password fields.
18 A screenshot of the query fields and the unsafe query handler, accepting SQL injections due to insufficient input validation.
19 A screenshot of the query fields and the safe query handler. By utilizing input validation the harm of SQL injection is handled.
20 A screenshot of the administration page. Admins are able to add and remove users as well as assign them roles.
21 A screenshot of the login page with username and password fields.
22 A screenshot of the error page.
23 A screenshot of the index page, accessible to all users.
24 A screenshot of the index page where a query statement has been made in Safe mode.
25 A screenshot of the index page where a query statement has been made in Unsafe mode.
26 A screenshot of the index page where a Select * statement has been made in Unsafe mode.
27 A screenshot of the index page where a Select * statement has been made in Safe mode. Safe mode utilizes input validation.
28 A screenshot of the maintenance page, accessible only to admin and maintenance roles.
29 A screenshot of the maintenance page with the Get option chosen.
30 A screenshot of the maintenance page with the Set option chosen.
31 A screenshot of the maintenance page with the Del option chosen.
32 A screenshot of the maintenance page with the Put option chosen.
33 A screenshot of the admin page, accessible only to admin roles.
34 A screenshot of the admin page with the Add User option chosen.
35 A screenshot of the admin page with the Delete option chosen.
36 A screenshot of the logout page.


1 Introduction

The General Data Protection Regulation (GDPR) comes into effect on the 25th of May 2018, and along with the regulation comes uncertainty. Organizations are uncertain about what the GDPR implies, and how to comply with it so as not to risk heavy economic punishments. In parallel, cloud computing has become a major part of companies' organizational strategies. The GDPR coming into effect together with major organizational restructuring, following a migration to cloud infrastructure, is bound to cause confusion. To fully utilize the cloud infrastructure whilst maintaining compliance with the GDPR, organizations will require knowledge of both cloud security and what the GDPR implies. Being prepared for what is coming will prove beneficial to organizations, as being unprepared can cause major setbacks in growth, ultimately leading to a company's downfall.

Scania Connected Services is currently migrating their on-premise infrastructure to a cloud environment, and questions regarding securing applications in the new environment emerged. Considering the relevance of the GDPR, and its correlation with IT security, this thesis' goal is to find out how to securely operate in a cloud environment whilst maintaining compliance with the GDPR.

1.1 Motivation and Goals

In this thesis, the authors will study how Scania feature teams should work to fully utilize a cloud infrastructure provided by Amazon Web Services while still complying with internal and external security ruling. The internal security ruling is the code of conduct for Scania feature teams, and the external ruling is the result of legal obligations following the introduction of the GDPR.

The final goal of this thesis is to create a security guidance reflecting each level of the Connected Services organization. The security guidance includes checklists for the two divisions of Connected Services: the Delivery Engineering team and the feature teams. Each checklist includes checkpoints, and following these checkpoints should result in a generally secure application as well as compliance with the GDPR.

Given the big impact that the introduction of the GDPR will have on the future of IT, the authors were greatly motivated to learn about the regulation and to work in close proximity to an organization carefully preparing for its introduction.

1.2 Purpose

The thesis' purpose is to show how the Connected Services department at Scania can conduct its work in a cloud environment while enforcing current security ruling, both internal and external. To show how developers in feature teams can conduct their work in the cloud environment, a microservice similar to services currently provided by Connected Services will be developed and deployed on cloud infrastructure. The proof of concept (POC) will work as a baseline for how current services offered by Connected Services can be deployed on cloud infrastructure, showing necessary practices when working in such environments.

These practices involve security responsibilities within the team and proving compliance with GDPR, such as how customer data should be handled.

Questions to be answered:

1. How should guidelines for the feature teams be defined in order for their service to be adapted to the cloud?

2. How should guidelines for the feature teams be defined in order for their service to be GDPR compliant in terms of security?

3. How will the GDPR affect Scania at an organizational level in terms of security delegation, and is there a recommended path to strive toward?

1.3 Delimitation

The thesis will focus on internal software and data at Scania from a security and availability perspective. The chosen service provider for the cloud architecture is Amazon Web Services (AWS), the reason being that Scania has chosen AWS as their primary service provider. The project will not perform a migration of infrastructure or internal data, but will, through a POC, establish a baseline for how work can be conducted in a cloud environment in compliance with internal and external security ruling. Internal ruling will be that of Scania AB and external ruling will be that of the GDPR. The POC will be developed according to Infrastructure as a Service (IaaS) as a service model, i.e. the authors will themselves set up the necessary environments in AWS. The POC will not include a current service provided by the Fleet Management Portal (FMP), but will instead incorporate a microservice developed by the authors with characteristics similar to those of an FMP microservice. The results given by the POC will serve as a reference for feature teams in their own adoption of the cloud environment.


2 Related Work

In the article Security Analysis in the Migration to Cloud Environments[1], the authors note that cloud infrastructure does not necessarily differ from on-premise infrastructure in terms of security, but that new risks and challenges arise when migrating to cloud infrastructure. The authors also mention how a migration can provide an opportunity to evaluate infrastructure security against the demands of modern security requirements. The focus when migrating legacy applications should not be on portability, but rather on preserving or enhancing the security functionality of the legacy application.

The authors mention how cloud service providers have put a lot of effort into added security benefits for their customers:

- Security and benefits of scale: the economic gains of large-scale security measures, along with added robustness and scalability.
- Security as a market differentiator: security has become a selling point for cloud service providers, with added effort spent on security to attract customers.
- Standardized interfaces for managed security services: cloud service providers can offer standardized interfaces for managing security.
- Rapid, smart scaling of resources: cloud service providers can dynamically allocate resources to increase resilience.

These benefits, combined with the facts that cloud service providers often have departments specialized in cloud security and that the uniformity of cloud platforms promotes automation of security management and disaster recovery functionality, add redundancy to customer data.

For the migration process itself, the authors reference other works on the matter. One of these references is Migrating your Existing Applications to the Cloud (Varia, J., 2010), where the author proposes a phased migration as follows: (1) Cloud Assessment Phase; (2) Proof of Concept Phase; (3) Data Migration Phase; (4) Application Migration Phase; (5) Leverage of the Cloud; (6) Optimization Phase. The author of this phased strategy also gives some examples of security best practices, such as management of user credentials, restriction of user access to resources, and encryption of data at rest and in transit.
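As a minimal sketch of the last of these best practices, encryption of data at rest, the following Java example encrypts a record with AES-GCM via the standard javax.crypto API before it would be written to storage. The class, method and record names are illustrative assumptions and are not taken from the referenced work.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class AtRestEncryptionSketch {
    private static final int GCM_TAG_BITS = 128; // authentication tag length
    private static final int IV_BYTES = 12;      // recommended IV size for GCM

    // Encrypt a plaintext record; the random IV is prepended to the
    // ciphertext so decryption can recover it.
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ct = cipher.doFinal(plaintext);
        byte[] out = new byte[IV_BYTES + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES);
        System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
        return out;
    }

    // Decrypt a stored blob produced by encrypt(); GCM also verifies
    // integrity, so a tampered blob makes doFinal() throw.
    static byte[] decrypt(SecretKey key, byte[] blob) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(GCM_TAG_BITS, Arrays.copyOfRange(blob, 0, IV_BYTES)));
        return cipher.doFinal(Arrays.copyOfRange(blob, IV_BYTES, blob.length));
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey key = kg.generateKey();
        // Hypothetical personal-data record of the kind discussed in this thesis.
        byte[] record = "driver=4711;position=59.19,17.63".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(key, record);
        System.out.println(new String(decrypt(key, stored), StandardCharsets.UTF_8));
    }
}
```

Because GCM is an authenticated mode, this one sketch covers both confidentiality at rest and tamper detection; key management (storage and rotation of the AES key) is deliberately out of scope here.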

The authors then reference works about various approaches to a cloud migration. One of these is A Benchmark of Transparent Data Encryption for Migration of Web Applications in the Cloud (Hu, J.; Klein, in Proceedings of the 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, Chengdu, China, 12–14 December 2009; pp. 735–740). The authors of this article analyze the requirements of cloud applications and data encryption for securing e-commerce applications in a cloud environment. They conclude that user data and critical business data should be encrypted, and they evaluate various methods for encryption. The article focuses on assurance of data privacy and access control, not on the migration procedure per se.

Another referenced work is A Case Study of Migrating an Enterprise IT System to IaaS (Khajeh-Hosseini, A.; Greenwood, D.; Sommerville, I., in Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing, Miami, FL, USA, 3–10 July 2010). This work is a case study on the migration of a legacy IT system in the oil and gas industry. The study identifies risks and benefits of migrating this legacy system from several perspectives, examples being project managers, technical managers and business development staff. The result of the study is a framework for decision-makers in the planning of infrastructure migration. However, the study does not consider the security aspects of a cloud infrastructure.

The authors of the original literature study then present the results of their study. They conclude that there are various methodologies and frameworks for how a migration to cloud infrastructure can be realized, but that the referenced works are lacking in the security aspects of a migration. The authors claim that there is an urgent need for methodologies, techniques and tools for migration strategies that give further consideration to security aspects.


3 Scania’s Current Organization and Security Delegations

Scania's organizational structure is wide, with many branches. Branches stretching from trucks and engines, to the software in the stereo, to massive databases create an enormous IT organization with its core at Scania IT. The two divisions forming the Connected Services department are the feature teams and Delivery Engineering. The end product that Connected Services provides customers with is the Fleet Management Portal.

3.1 Scania IT

Scania IT handles the current infrastructure, located in Södertälje, where all data is stored. They are responsible for the continuous uptime of the hardware, including its security. These responsibilities also give them a position to decide how other branches in Scania's organization are allowed to deploy new services. Scania IT is further divided into divisions with various responsibilities, one of these divisions being ISec. ISec is responsible for all security regarding Scania's intellectual property, including motor designs, the Scania brand and the IT infrastructure.

3.2 Delivery Engineering

Delivery Engineering currently monitors the infrastructure provided by Scania IT. This monitoring includes observing user requests to the Fleet Management Portal and observing downtime of services. Delivery Engineering also handles deployments of software on the infrastructure, utilizing a deployment pipeline with automated pulls from the feature teams' version control software (VCS). At the time of this thesis, Delivery Engineering also communicates the feature teams' needs of software and hardware to Scania IT.

3.3 Feature Teams

The feature teams at Connected Services are the teams that design, develop and maintain the services that together form the Fleet Management Portal. Each feature team is led by a product owner (PO), and each team is responsible for its respective service functionality.

3.4 Fleet Management Portal

Fleet Management Portal, or FMP, offers a variety of services to customers.

The services are collected and presented to the customer through a browser user interface (BUI). The extent of services provided is based upon customer choice, and services are sold in packages. A basic package is available and included for owners of all new Scania vehicles. Through various levels of subscription, a road carrier receives more services in their provided BUI to further optimize their business. The services provided by FMP are mostly built around data collected from each truck in a road carrier's fleet; examples of data extracted from trucks are position, velocity and acceleration. Each service uses REST APIs[2] to receive and send data between services.
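As a hedged sketch of this service-to-service pattern, a minimal telemetry endpoint might look as follows, using only the JDK's built-in HTTP server. The path, query parameter and JSON fields are illustrative assumptions and not FMP's actual API.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class TelemetryServiceSketch {
    // Hypothetical telemetry record; FMP's real schema is not public.
    static String telemetryJson(String truckId) {
        return "{\"truckId\":\"" + truckId + "\",\"position\":[59.19,17.63],"
                + "\"velocityKmh\":82.0,\"fuelLevelPct\":64.5}";
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // GET /trucks/telemetry?id=... returns one truck's latest readings as JSON.
        server.createContext("/trucks/telemetry", exchange -> {
            String query = exchange.getRequestURI().getQuery();   // e.g. "id=ABC123"
            String id = (query != null && query.startsWith("id="))
                    ? query.substring(3) : "unknown";
            byte[] body = telemetryJson(id).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
    }
}
```

A sibling service would then fetch GET /trucks/telemetry?id=ABC123 and parse the JSON response, which is the essence of the REST-based data exchange between FMP services described above.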

3.5 Consequences

Unfortunately, the organizational structure, with Scania IT having the last say in matters of deployment, has led to feature teams circumventing Scania IT to conduct their own experiments with cloud environments. This has led to irregularity between feature teams, as they have separately conducted cloud migrations without the necessary oversight. In turn, this circumvention has resulted in Scania IT feeling neglected by the feature teams. Scania IT responded by providing a baseline template for how a base infrastructure should be defined from their point of view. Other than the template, no security regulations are in place concerning actual cloud security.

The divergence between Scania IT and the feature teams has created a sort of “Us and Them” mentality, where the feature teams feel restrained by Scania IT when they try to be innovative, while Scania IT feels ignored in the organization. Should this mindset persist during the future full migration to a cloud environment, the result could be an even greater negative impact on the Scania organization. This is not a scenario Scania wants to happen, and avoiding it requires an organizational restructuring.

3.6 Security Delegations

In terms of security, Scania IT has sole responsibility as of today. Scania IT is in charge of the security of the on-premise infrastructure as well as the security of customer end products. Feature teams follow a code of conduct in their work. The code of conduct demands that feature teams not harm Scania through any application, but does not explicitly demand any security in the teams' applications; the teams are free to implement security measures by themselves.

In the future federation between AWS and Scania, there must be a change in the delegation of security responsibilities, including the necessary security knowledge within feature teams. The organization should also be more harmonized in its approach to and handling of the feature teams' applications, where neither Scania IT nor the feature teams restrict the innovation and flexibility of the other.


4 General Data Protection Regulation - GDPR

The external ruling in focus for this thesis is the General Data Protection Regulation. The GDPR was adopted by the European Union in 2016 and declares how organizations, authorities and other entities are allowed to collect and use personal data regarding citizens of the European Union or persons located within the European Economic Area[3]. The reason the GDPR has such a big focus in this thesis is that it forces every data-collecting entity to regard IT security and commit to organizational restructuring in order to comply with the regulation, from the organizational level down to each feature team.

4.1 Pre-GDPR

Before the adoption of the GDPR, there was a directive accepted in 1995 with the same goal concerning the protection of privacy for citizens: the Data Protection Directive (DPD), or Directive 95/46/EC. The aim of the DPD was to create a baseline for how governments should handle and transport personal data. A key difference between the DPD and the GDPR (or the key difference between a directive and a regulation) is how they are adopted in practice by member states. A directive supplies various goals that each member state has to fulfil, and offers leeway for each member state in how these goals are reached. A regulation, on the other hand, is equal for all and applied in full force in all member states. There are some permissions to deviate from the regulation to comply with national law, examples being police matters[4].

The DPD has a set of principles in place to create the foundation for the privacy of data subjects who have data collected about them. These principles were incorporated into the GDPR, where they were also strengthened.

Firstly, data subjects should be given notice about the data collection concerning them. No data collection may take place if the subject has not been given notice of the collection and the reasons behind it. The reasons behind the collection should be defined by its purpose for the collection to take place. Data in the collection is to be handled according to this purpose, and may only in special cases be handled differently, if it is in accordance with national laws. Disclosure of such collection and processing should be given to the data subject. Hence, the data subject is to be made aware of who is collecting the data, why they are collecting it, and to whom they might present the data during or after processing.

Unless the collector has the support of national laws for the data collection, the collection may only take place with consent from the subject the data is collected about. This consent may be given in conjunction with other agreements, such as an employment contract or usage of a service. A key feature of the GDPR is that consent that is easily given should be as easily revoked, something that was not the case with the DPD. The collected data should be handled with secure means, ensuring the security of the data in transit, during processing and when at rest in storage units. The level of security is to be directly proportionate to the sensitivity of the data, with regard to the impact on the data subject's privacy that a breach would result in.

All data collected about the data subject should be able to be presented to the data subject, and be available for correction of incorrect data without unnecessary procedures. In the event that a collector does not comply with the stated rules, the data subject should have the ability to hold the collector responsible for such breaches.

In reality, companies have been able to follow these principles and still hoard user data. By using the consent clause in the directive, companies and web pages could bury consent in their terms and agreements and/or prompt the user into giving consent in order to access the website or company services[5]. This situation was not a violation of the directive, but it obliged users to consent in order to access different services. The DPD lacked a framework for how a user could give and revoke consent. The effect of this lack of framework was that companies made it easy for users to give consent, while not providing a simple way of revoking that same consent. The consent terms gave data-handling companies a legal platform to store and analyze user data for profit. These terms proved to be an issue, as a majority of users do not know what they are consenting to, other than accessing a service[6].

4.2 Transition from DPD to GDPR

As the resulting effect of the DPD's introduction was not enough in terms of security and integrity for data subjects, the European Commission put forward the replacement regulation, the GDPR, in 2012. The GDPR is more extensive in terms of privacy and integrity for data subjects, and failure to comply will result in major financial impacts on companies. In April 2016, after debate and rewriting of the regulation, the GDPR was accepted. The regulation entered into force in May 2016, and data handlers have until the 25th of May 2018 to resolve non-compliance[7].

4.3 Upcoming Problems with the Transition to GDPR

As the regulation is enforced on nations rather than adopted, there will be different problems with the transition. The lack of documentation declaring how fines and security demands are proportional is a major issue. This lack of documentation causes GDPR to be an unexplored territory without the means for organizations to make their own map and plan for what is eventually coming.

Even if an organization managed to make such a map, it is not certain that the map would resemble another organization's. In these cases, authorities must be given clear directives on how GDPR is meant to be interpreted; authorities lacking clear directives is a breeding ground for problems.

DPD gave a recommendation of introducing a Data Protection Officer (DPO) in organizations. The DPO was responsible for the company's handling of personal user data, and acted as a contact person for users in terms of questions, corrections and removal of data. With GDPR, a DPO is no longer a recommendation but a demand: every organization or authority handling personal data must have a DPO. However, given the lack of directives on how GDPR is to be interpreted, the education of an organization's or authority's DPO might become a problem, as there is a risk of the education lacking in some areas. Having an inadequately trained DPO as the organization's contact person during an audit will be problematic.


5 Key Definitions in GDPR and their Impact

This section provides a guide to key definitions important for understanding the concept of GDPR. The remainder of the thesis builds upon these definitions and the compliance requirements they entail.

5.1 Controller and Processor of Data

GDPR contains two fundamental categories regarding handlers of data. The terms Controller and Processor have not changed in definition in the transition from DPD to GDPR, and the two terms do not exclude each other, as a controller may also be the processor of data.

5.1.1 Controller of Data

The controller of data is the instance that determines the purpose and means of collection and processing of personal data. The controller is described as a legal person or persons, such as companies or organizations. A controller has the responsibility not only to comply with GDPR but also to demonstrate this compliance to the authorities. This compliance needs to be demonstrated on several levels: technical, procedural and organizational.

In the case of an audit, the controller will need to show that they comply with GDPR, both in terms of security around the personal data stored and the security enforced from the controller to its processors[8].

A key aspect of these operations is that the controller is ultimately responsible for the security of the data, the processing around the data, and ensuring that data subjects' rights are met. This means that the controller will need to conduct its own audits to verify that its processors comply with the framework the controller has decided is appropriate.

The regulation also applies if many controllers are involved in the handling of personal data. GDPR does not differentiate between them, but treats them as a single, large controller. This implies that should one part-controller be non-compliant, all controllers might be affected by the audit.

In these cases, it is important that controllers have open and transparent agreements between themselves to ensure compliance on all levels at each controller and in the collaboration between controllers[9].

5.1.2 Processor of Data

The processor of data is an entity that processes data according to a framework given by a controller. The processor does not itself define the purpose or means of the processing, as these are defined by the framework. A processor cannot exist without a controller of the data.

GDPR regulates a processor to a lesser extent than a controller, which implies that the controller has more responsibility in defining the framework for the processor to use. This framework therefore has to define in detail how processing should be done. The processor will need to demonstrate, through agreements and audits, that it complies with the given framework as well as with GDPR. The controller must perform audits evaluating processor compliance on all levels.

A key feature of GDPR is that if a processor deviates from the given framework, the processor automatically becomes a controller. As a controller is subject to higher demands, such as obtaining consent for data collection, this is not desirable.

Unlike a controller, a processor must comply with the demand that when data has been processed and the results presented, all data at the processor level must be deleted or returned to the controller. Whether it is to be deleted or returned is decided by the given framework[10]. This demand is non-negotiable between a controller and a processor: should the data not be deleted, the processor has deviated from the given framework and has thereby become a controller.

Processing of Data

Depending on who you ask, the definition of processing differs. To avoid any variety or room for self-interpretation, GDPR defines processing as:

”‘processing’ means any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction;” - Article 4(2)[11]

This implies that any form of data handling, whether at rest in storage units, in transit or in the processing pipeline, is considered processing of data.

With such a broad definition, GDPR encapsulates all types of processing and makes it difficult to interpret differently or to bypass. Should there be uncertainty about whether processing of data is taking place, the potential processor should look at the actual data: if the data is personal data, processing of personal data is being done. This holds whether or not the process is automated.

5.2 Categories of Data

It is important that controllers and processors differentiate their data handling between categories of data. Depending on the category, different security measures must be incorporated, proportionate to the sensitivity of the data. There also exist categories of data that require special permissions from authorities to be collected.


5.2.1 Personal Data

Personal data is a relative term, depending on what categories of data an organization, authority or other entity handles. Under GDPR, personal data is considered to be:

”Personal data” means any information relating to an identified or identifi- able natural person (”data subject”); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that person. -Article 4(1)[12]

This means that any data usable for identification of a data subject is considered personal data, and that data points with a direct or indirect connection to such identifying data automatically become included in that set of data. GDPR does not specify the degree of connection required between a set of data and data points, or between two sets of data, for them to be considered indirectly connected. Any connection between a set of data and data points, or between two or more sets of data, thus classifies the data as personal data.

GDPR does not specify the type of connection required for the data to be considered personal data. Any data in one database that can be used to identify a data subject in a database located elsewhere could be considered personal data. A case example of such a connection is a user database where a data subject is linked to billing information stored in a separate database. A common data handling activity is the logging of requests to a server, web requests being one example. Such logs often store the origin IP address, which in conjunction with a user id and/or name is considered personal data. Being compliant with GDPR would in this case mean surrounding this data with proportionately taken security measures.

5.2.2 Sensitive Personal Data

Sensitive personal data is categorized as data regarding certain individual characteristics of a data subject. Unlike personal data, which is categorized in the same manner for everyone, sensitive personal data is categorized as:

Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation shall be prohibited. -Article 9(1)[13]

This categorization implies that any data regarding a data subject, be it political, religious, ethnic etc., is by default prohibited from collection and/or processing. GDPR does state cases where sensitive personal data may be collected and/or processed; should no such case be applicable, that data handling may not take place. These cases are addressed in Section 5.3.

5.3 Special Permits to Collect and/or Process Personal Data

As previously mentioned in Section 5.2.2, the processing of sensitive data is by default prohibited according to Article 9(1). However, EU member states are able to enforce stricter laws, conditions and regulations than those stated in GDPR, should it be deemed necessary. Article 9(2) states use cases where section 1 does not apply[14]. Should these exclusions apply and allow processing of sensitive personal data, an impact assessment must always be performed and evaluated prior to collection and processing. The evaluation should include whether infringement of data subject rights and freedoms is present[15], as sensitive personal data is always considered at risk because of the personal information stored.

5.3.1 Cases Permitted if Compliant with other Articles and Laws

Consent

Processing of sensitive personal data may be allowed should the data subject have given explicit consent to such processing. However, it is important that the processing conducted is not in conflict with GDPR or other member states' laws.

Obligations and Rights

Processing of sensitive personal data may take place without consent from affected data subjects should it be deemed necessary for the controller and/or processor to exercise their rights, or where it is deemed necessary for employment and social security in accordance with member state law. Every instance of such processing must be accompanied by proportionate safety measures to protect the rights and interests of the data subject.

Vital Interest

In special cases, a controller or processor may be considered in need of processing sensitive personal data while the data subject targeted is unable to give consent for physical or legal reasons. In these cases, the processing may only be conducted should it be in the vital interest of the data subject.


Internal Processing for Members

A foundation, association or any other nonprofit body with a political, philosophical, religious or trade union focus may conduct processing on member data, provided the processing relates only to current or former members and does not include any other data points. The processed data may not be disclosed to third parties unless consent has been given by the data subjects.

Public Data

Any legal person or corporation may use sensitive personal data about a subject if the supplied data is already publicly accessible. GDPR states that this condition applies where the data subject is the source of the public data; given that Sweden currently has public records of data subjects, this might be subject to future legal determination.

Processing as Needed in Exercise of Law

Processing may take place should it be deemed necessary for the exercise of national and international law. This includes cases with legal claims and the exercise of courts’ legal jurisdiction.

Public Interest

Public interest is a ground on which processing may take place, provided it still follows national law. The processing should be proportionate to its goal with respect to the rights of data protection, and the protection of the data must be proportional to the data processed and sufficient to protect the fundamental rights and interests of data subjects.

Healthcare

As health data is categorized as genetic and biometric data, the processing of such data is prohibited by default. Some exclusions allowing processing exist: when processing is conducted for preventive or occupational medicine, for assessment of the working capacity of an employee, for diagnosis, or for usage in medical or social systems and services stated by member state laws.

Cross-border Healthcare

As with data processing in healthcare, the processing of data for cross-border healthcare is allowed if it is deemed to be in the public interest, such as preventing healthcare threats or ensuring high standards of quality for healthcare and medical products. Cross-border healthcare requires suitable and specific safeguard measures to protect the rights and freedoms of data subjects, and also requires the professional secrecy of those involved.


Research

If a research project can motivate the processing of sensitive personal data and it is deemed to be in the public interest, data processing may be allowed. Projects may have scientific, historical or statistical purposes but must always be accompanied by suitable safeguards to protect the rights and freedoms of data subjects.

5.3.2 Cases which are not Compliant with GDPR unless Permitted by Law

Even if some cases are permitted when an impact assessment has been made before collection and processing, there are cases where GDPR explicitly prohibits any sort of collection and automated processing unless overruled by national law. The strictly forbidden cases are those where the data is exclusively or partly criminal data records. GDPR does permit the disclosure of criminal personal data to a third party, but does not permit processing of such data to be automated. A third party given this data cannot create registers of its own; that is only allowed for the executive authority.

5.4 Anonymous/Pseudonymous Data

This section covers the processing of anonymous and pseudonymous data derived from personal data. The following definitions are subject to change, as legal trials will be needed to differentiate between the different interpretations and the security measures connected with the definitions.

5.4.1 Anonymous Data

If a controller of data wants to use the collected data for purposes other than what has been consented to, the controller will need to anonymize the data so that it can in no way be used to identify a data subject. In reality, this is a particularly hard task to achieve, as there are cases where data stored at the controller is anonymous on its own but, in conjunction with external data points, makes identification of the data subject possible[16]. However, should the data be anonymized as a result of aggregation, for example in statistical applications, the data is not subject to GDPR and there are therefore no restrictions or need for consent for the controller to use the data in any way.
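
A minimal sketch of anonymization by aggregation, in the spirit of the fleet-management example running through this thesis. All field names and values are illustrative assumptions; the point is only that the aggregated result drops every identifier, so it can no longer be traced back to a single driver.

```python
from statistics import mean

# Hypothetical per-driver records (names and fields are illustrative).
records = [
    {"driver_id": "d1", "fuel_l_per_100km": 32.5},
    {"driver_id": "d2", "fuel_l_per_100km": 29.1},
    {"driver_id": "d3", "fuel_l_per_100km": 35.0},
]

def aggregate_fuel_consumption(records):
    """Return only the fleet-wide average, dropping all identifiers.

    The result can no longer be traced back to any single driver,
    which is the property aggregation-based anonymisation relies on.
    """
    return {"fleet_avg_l_per_100km":
            round(mean(r["fuel_l_per_100km"] for r in records), 1)}

print(aggregate_fuel_consumption(records))  # {'fleet_avg_l_per_100km': 32.2}
```

Note that aggregation only helps when the groups are large enough; an "average" over one driver would still be personal data.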

5.4.2 Pseudonymous Data

Pseudonymous data is data that has been split so that the key values needed for identification are stored in another location. As this data can only identify a data subject together with the key values, the security demands on the data itself are lower. Even so, there are added security demands on the key data that must be implemented. GDPR defines pseudonymous data as:


‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data is not attributed to an identified or identifiable natural person; -Article 4(5)[3]

Because of this, the owner of the collected data has the possibility to use the data in different ways, as long as proportional security measures are implemented to protect the privacy of the data subjects.
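
The split described in the definition can be sketched as follows. This is an illustration only: all record fields are assumptions, and in practice the key table would live in a separate system with stricter technical and organizational controls, as Article 4(5) requires.

```python
import secrets

# Separately stored key table. In a real system this would be a
# different store with stricter access controls; here it is a dict
# purely for illustration.
key_table = {}

def pseudonymise(record):
    """Replace direct identifiers with a random token.

    The remaining record can only be re-linked to a person via the
    key table, which GDPR requires to be kept separately and protected.
    """
    token = secrets.token_hex(8)
    key_table[token] = {"name": record["name"],
                        "licence_no": record["licence_no"]}
    return {"subject_token": token,
            "fuel_l_per_100km": record["fuel_l_per_100km"]}

def reidentify(pseudonymous_record):
    """Re-link a pseudonymous record using the separately held key table."""
    return key_table[pseudonymous_record["subject_token"]]

p = pseudonymise({"name": "Jane Doe", "licence_no": "AB1234",
                  "fuel_l_per_100km": 31.0})
assert "name" not in p                      # no direct identifier left
assert reidentify(p)["name"] == "Jane Doe"  # possible only with the key table
```

The design choice is exactly the one in the definition: without the key table, the pseudonymous record cannot be attributed to a natural person.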

5.5 Technical Requirements of GDPR

In the beginning, there was compliance. -Unknown

So far, this report has covered the defining parts of GDPR and their implications. GDPR does, however, also cover technical requirements that organizations, authorities and other entities controlling or processing personal data must follow. The technical requirements are not defined in the sense of which technical solutions must be implemented, which would probably become outdated over time, but instead state that technical solutions should be state of the art.

GDPR also repeatedly mentions privacy by design and privacy by default; their definitions and implications, along with those of state of the art, will be discussed in this section.

5.5.1 Principles of Proportions and State of the Art

A quote that in a way captures the upcoming section is the following by Peter G. Neumann:

If you think cryptography is the answer to your problem, then you don’t know what your problem is.

The reason for the quote's relevance is that there is a general lack of knowledge of information security in the IT business[17]. In an effort to reduce the effects of this problem, GDPR states that technical solutions in organizations, authorities and other entities are to be state of the art when it comes to information security. As state of the art is a relative term for technical solutions, GDPR states that entities should implement sufficient security using the principles of proportions together with state-of-the-art routines and techniques. GDPR defines the principles of proportions as:

”Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organizational measures to ensure a level of security appropriate to the risk, including inter alia as appropriate: ” - Article 32[18]

The important characteristic of state of the art, and what makes GDPR relevant for something as fluctuating as information security, is that it is dynamic. State of the art changes with the rise of new technologies, and what was state of the art the previous year might not be state of the art in current and upcoming years. The implication of this dynamic nature is that every controller and processor will need to revise their systems on a regular basis. The previous method of “code and forget” is no longer accepted.

Two key questions controllers and processors must ask themselves are: ”What is state of the art today?” followed by ”What will be state of the art tomorrow?”.

These questions can only be answered by a person or entity with extensive knowledge in the field of information security. In the event that a processor or controller lacks personnel with the required expertise, the processor or controller will have to consult authorities and/or other organizations able to provide said expertise in order to be state of the art. There are no excuses for organizations, authorities and other entities not implementing their information security in accordance with the state of the art[18].

5.5.2 Privacy by Design

A common methodology for building IT architecture is the Waterfall method. The Waterfall method includes the following steps, in order: specification, design, construction, integration, test, installation and maintenance[19]. A flaw with this method is the difficulty of being flexible and updating the specification should issues arise during any of the aforementioned steps.

GDPR has acknowledged this inflexibility of some software methodologies and proclaims that issues regarding security should be addressed in the early specification and design stages. Dealing with and providing solutions for security issues in early stages of development is called privacy by design, and this design philosophy goes well with newer software methodologies such as Agile, but can also be incorporated into methodologies such as Waterfall[20].

Privacy by design can to some degree be deemed to consist of state-of-the-art technologies together with organizational procedures. Organizations using the Waterfall method have occasionally neglected security in an IT system's specification and implementation, and lacked organizational procedures. Security flaws that occurred were instead fixed during the later integration and test steps, if fixed at all[21][22]. The approach of fixing security flaws at later steps is no longer allowed under GDPR, as privacy by design is demanded. Security and privacy concerns must be included in every step of Waterfall-type methodologies and in each iteration of Agile-type methodologies.

A common way of considering an IT system secure has been to wrap the system with a layer of security instead of fixing the core issue. Core issues can be a lack of input validation, a lack of procedural validation and a lack of bug fixes in web applications[23]. An example of wrapping the system instead of fixing a core issue is where a lack of input validation in a web application is addressed by adding input validation in the frontend, when it should be fixed in the backend part of the application.

In addition to incorporating security thinking in the design of a system, instead of wrapping it with a security layer in later stages, organizational security measures must also be incorporated in the operation of the system. This incorporation includes a code of conduct for the organization, how security measures are to be implemented, and routines in the event of a breach.

5.5.3 Privacy by Default

Incorporation of the state of the art and privacy by design before and during the implementation of a system should result in the system having privacy by default. Privacy by default means having implemented security measures, and having these measures activated by default. In conjunction with privacy by design, the implication is that the whole system must be secure in its design and that the default setting of its security features is maximum privacy for data subjects. A system administrator should never be required to add or activate security features during installation or deployment of code, as these features should be in place and activated by default. However, the system administrator is not prevented from deactivating or lowering security features if this is deemed necessary[24].
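
A minimal sketch of what privacy by default can look like in a service configuration. All field names and values are assumptions for the example; the point is that every security feature starts in its most protective state, so an administrator never has to remember to switch anything on, but may still relax a setting deliberately.

```python
from dataclasses import dataclass

@dataclass
class ServiceConfig:
    """Illustrative deployment configuration with privacy by default.

    Every security-relevant field defaults to its most protective
    value; an administrator may explicitly relax a setting, but never
    has to opt in to protection.
    """
    tls_enabled: bool = True            # encrypted transport on by default
    data_at_rest_encrypted: bool = True  # storage encryption on by default
    analytics_opt_in: bool = False       # data subject must actively opt in
    log_retention_days: int = 30         # minimal retention by default

# A plain instantiation yields the secure configuration.
default_cfg = ServiceConfig()
assert default_cfg.tls_enabled and not default_cfg.analytics_opt_in

# Relaxing a setting is possible, but requires an explicit decision.
relaxed_cfg = ServiceConfig(log_retention_days=90)
```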

A part of privacy by default is the principle of least privilege. This principle requires that a module of a system can only access what is necessary for the module to function[25]. During the development of a system, developers tend to run their modules with administrator privileges, granting them unlimited access to underlying parts of the system. When the module is then deployed with the rest of the application, access problems can arise due to insufficient privileges. These access problems, in combination with pressure to deploy applications fast, can cause developers to take shortcuts, such as granting raised privileges to a module instead of modifying it to use the least privilege required.
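
In an AWS setting, least privilege is typically expressed as an IAM policy. The sketch below shows a policy in IAM's JSON document structure, held as a Python dict; the bucket name and statement id are illustrative assumptions. The module is granted read access to a single bucket and nothing else, contrasted with the wildcard shortcut the paragraph above warns against.

```python
# A least-privilege policy in AWS IAM's JSON structure: read access to
# one bucket only. Bucket name and statement id are illustrative.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FleetDataReadOnly",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-fleet-data/*",
        }
    ],
}

# The inverse of least privilege: the wildcard grant developers may be
# tempted to use as a deployment shortcut.
admin_shortcut = {"Effect": "Allow", "Action": "*", "Resource": "*"}
```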

5.6 Non-technical Requirements of GDPR

In addition to the technical requirements, GDPR also covers non-technical requirements. These state how organizations, authorities and other entities should act and respond to breaches, requests from data subjects and authority audits.

5.6.1 Data Protection Officer

As previously mentioned in Section 4.3, GDPR requires organizations, authorities and other entities handling or processing personal data to have a Data Protection Officer available as a contact person. In addition to being a contact person for authority audits, the responsibilities of the DPO also include tasks bound to the actual collection and processing of personal data. As the responsibilities of the DPO are of both organizational and technical nature, GDPR recommends that the person or persons appointed to the role possess adequate knowledge of information security and of the legal aspects of GDPR[26].

The appointed DPO should also have direct executive capabilities. Controllers and processors are obligated to include the DPO in a timely manner when questions or situations regarding personal data arise. The DPO is also a protected entity, meaning that controllers or processors may not influence the DPO in its line of duty. The controller or processor is also responsible for keeping the DPO's knowledge sufficient for the DPO to carry out its duties[27]. As part of its duties, the DPO reports to the highest management level at the controller or processor, while remaining available to data subjects should they want to exercise their rights.

It is therefore recommended that major companies appoint several Data Protection Officers to ease the workload. This also minimizes the risk that the DPO is unable to complete its tasks in time for new features to be deployed.

5.6.2 Rights of Data Subjects

The right of transparency means that a data subject has the right to know whether personal data connected to the data subject is being processed by an organization, authority or other entity. If so, the data subject has the right to know what data is being processed and where, why data about them is being processed, for how long the processing has been going on, and whether the data has been shared with a third party. Any automation performed on a data subject, be it automatic decisions or profiling, must be declared should the data subject ask[28]. To further emphasize transparency, the information on data subjects must be communicated to the data subject in a timely manner from when the request was received.

The right of accessibility and portability means that a data subject has the right to gain access to the personal data currently being processed. A copy of the data must be given to the data subject in a commonly used electronic format unless otherwise specified. Should the data have been processed automatically, the data must be given in a structured as well as machine-readable format. This means that if a data subject wants to transfer their data to another controller, the current controller cannot hinder this by giving the data subject the data in an unreadable form[29].
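
A sketch of what a portability export can look like: the subject's data is serialized to JSON, a commonly used, structured and machine-readable format. The record contents and field names are illustrative assumptions.

```python
import json

# Illustrative data held on one subject; field names are assumptions.
subject_record = {
    "subject_id": 4711,
    "name": "Jane Doe",
    "trips": [{"date": "2017-03-01", "km": 420}],
}

def export_for_portability(record):
    """Serialise a subject's data to a structured, machine-readable
    format (JSON), as the right of portability requires for data that
    has been processed automatically."""
    return json.dumps(record, indent=2, sort_keys=True)

exported = export_for_portability(subject_record)
assert json.loads(exported) == subject_record  # round-trips losslessly
```

The round-trip assertion is the essential property: another controller can parse the export and recover exactly the data that was held.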

The right of rectification means that a data subject has the right to have any incorrect information connected to the data subject corrected. This right also implies that should the data subject consider their personal data at a controller or processor to be incomplete, the data subject can demand that the data be completed[30].


The right of erasure means that a data subject has the right to have all or parts of the personal data connected to that data subject erased in a timely manner, including any copies of the data shared with third parties. This creates a requirement that controllers and processors have erasure techniques in place. In cases where processing of personal data is done according to national laws, freedom of speech etc., this right might not apply[31].

The data subject may, however, waive this right when entering legal agreements with a controller. The controller will then have consent agreements in place for as long as specified in the legal agreement. GDPR does still state that data collection should be kept to the minimum required, regardless of legal agreements.

The right of restricting processing means that a data subject has the right to restrict processing of all or parts of the data subject's personal data. One example is when the data subject has claimed their personal data to be incorrect; that claim needs to be validated before further processing occurs. The subject may also restrict all or part of the processing if the processing is based on consent rather than legal agreements. While the controller or processor adjusts their records or restricts the scope of processing, no processing of the affected data may be conducted[32].

5.6.3 Documentation

For an organization, authority or other entity to prove compliance with GDPR in the case of an audit, documentation routines need to be in place regarding the processing of personal data. As previously mentioned in Section 5.1.2, processing is defined as every operation performed on personal data, including storage, transfer and modification of such data.

GDPR has a set of required documentation points that organizations, authorities or other entities must be able to provide at all times. These documentation points can be found in Article 30[33] together with Recital 82[34]. The required documentation points are as follows:

• Name and contact information of the controller, its joint controllers, the representative of the controller and the assigned DPO of the controller.

• Purpose of the processing being conducted.

• A description of the categories of data subjects and the categories of personal data of the data subjects.

• A description of whom the results and/or personal data will be disclosed to, including categories of such recipients and recipients outside the EU. Should data be disclosed to organizations, authorities or other entities outside the EU, documentation of this transfer must be included, with explicit documentation of the safety measures taken.


• Where possible, the advised time for erasure of categories of data should be included.

• Where possible, a general description of the technical and organizational security measures should be included, according to Article 32, which is discussed in Section 5.5.

Further documentation is required from the processor. In addition to the above requirements, the processor must also provide documentation of all categories of processing activities performed on behalf of the controller.
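
The documentation points above can be kept as structured records rather than free text, which makes it easy to verify that nothing is missing before an audit. The sketch below shows one such record as a Python dict; every concrete value is an illustrative assumption, not a real Scania record.

```python
# A minimal record of processing activities following the documentation
# points above (Article 30). All concrete values are illustrative.
processing_record = {
    "controller": {"name": "Example Transport AB", "dpo": "dpo@example.com"},
    "purpose": "Fuel-consumption evaluation for fleet owners",
    "data_subject_categories": ["drivers"],
    "personal_data_categories": ["driver id", "trip data", "fuel data"],
    "recipients": ["fleet owner (EU)"],
    "third_country_transfers": None,   # none; otherwise document safeguards
    "advised_erasure_time": "24 months",
    "security_measures": "TLS in transit, encryption at rest, access logging",
}

# A simple completeness check over the mandatory points.
required = {"controller", "purpose", "data_subject_categories",
            "personal_data_categories", "recipients"}
assert required <= processing_record.keys()  # all mandatory points present
```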


6 Amazon Web Services

Amazon Web Services (AWS) is a platform offering various services in the cloud computing area. AWS incorporates all service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). AWS also includes services such as Elastic Compute Cloud (EC2) and Simple Storage Service (S3), together with services that further expand on the compute and storage services, such as monitoring, networking and management services.

Services provided by AWS are hosted in various locations around the world. Each hosting area is called a Region, and each region incorporates several Availability Zones (AZs). A Virtual Private Cloud (VPC) is a virtual network which separates EC2 instances running in the VPC from other instances in the AWS public cloud. A VPC can span several AZs, creating redundancy should one zone fail. Further division can be made using Subnets: a VPC is divided into subnets on network address ranges, each subnet having its own range.

6.1 Elastic Compute Cloud

Elastic Compute Cloud provides users of AWS with rapid provisioning of computing power. The computing power is based on virtual servers in the AWS cloud environment. Each running virtual server is called an instance, and an instance is allotted a number of virtual CPUs (vCPUs). A common general-purpose instance type is the t2, which suits workloads that do not use the CPU at full capacity consistently but where occasional bursts are to be expected. The t2 type comes in several models, differing in number of vCPUs and in memory size, ranging from 0.5 GiB (t2.nano) up to 32 GiB (t2.2xlarge)[35].
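
Since the t2 models differ mainly in memory, choosing a model can be reduced to picking the smallest one that meets a requirement. The helper below is hypothetical; the per-model memory figures follow AWS's published t2 specifications:

```python
# Memory per t2 model in GiB, smallest (t2.nano) to largest (t2.2xlarge),
# per AWS's published instance specifications.
T2_MEMORY_GIB = [
    ("t2.nano", 0.5), ("t2.micro", 1), ("t2.small", 2), ("t2.medium", 4),
    ("t2.large", 8), ("t2.xlarge", 16), ("t2.2xlarge", 32),
]

def smallest_t2_for(required_gib):
    """Return the smallest t2 model whose memory meets the requirement."""
    for model, mem in T2_MEMORY_GIB:
        if mem >= required_gib:
            return model
    raise ValueError("requirement exceeds the t2 family")

print(smallest_t2_for(6))  # t2.large
```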

6.2 Security Groups

A security group works as a virtual firewall for running instances. How the security group is configured controls the inbound and outbound traffic to the instance. Because security groups are configured at the instance level, each instance can open different ports and accept traffic from different network address ranges. Rules for inbound and outbound traffic are configured separately[36].
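
The effect of a single ingress rule can be illustrated with a small, local sketch. The dictionary below mimics the IpPermission structure used by AWS tooling; rule_allows is a hypothetical helper for illustration, not an AWS API call:

```python
import ipaddress

def rule_allows(rule, port, source_ip):
    """Check whether one ingress rule permits TCP traffic on `port` from `source_ip`."""
    if not rule["FromPort"] <= port <= rule["ToPort"]:
        return False
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(r["CidrIp"]) for r in rule["IpRanges"])

# A rule opening HTTPS (port 443) to a single office network range only.
https_rule = {
    "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
    "IpRanges": [{"CidrIp": "192.168.10.0/24"}],
}
print(rule_allows(https_rule, 443, "192.168.10.17"))  # True
print(rule_allows(https_rule, 443, "10.0.0.1"))       # False
```

Traffic not matched by any rule is implicitly denied, which is why the second check fails.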

6.3 Elastic Load Balancing

Elastic Load Balancing (ELB) distributes incoming traffic between running EC2 instances. Using an ELB increases fault tolerance by ensuring that only healthy instances receive inbound traffic. If instances exist in several AZs, traffic is primarily routed to instances in the same AZ. Should there be no healthy instances in one AZ, traffic is rerouted to instances in other AZs[37].
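
The routing preference described above, healthy instances only and same-AZ first, can be sketched as a small selection function. This is a simplified illustration of the behaviour, not ELB's actual algorithm:

```python
def eligible_targets(instances, preferred_az):
    """Return instances eligible to receive traffic: healthy instances in the
    preferred AZ, falling back to healthy instances in any AZ."""
    healthy = [i for i in instances if i["healthy"]]
    same_az = [i for i in healthy if i["az"] == preferred_az]
    return same_az or healthy

fleet = [
    {"id": "i-1", "az": "eu-west-1a", "healthy": True},
    {"id": "i-2", "az": "eu-west-1a", "healthy": False},
    {"id": "i-3", "az": "eu-west-1b", "healthy": True},
]
print([i["id"] for i in eligible_targets(fleet, "eu-west-1a")])  # ['i-1']
```

If i-1 also became unhealthy, the fallback would route traffic to i-3 in the other AZ.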

6.4 Relational Database Service

Amazon Relational Database Service (RDS) offers easy setup, operation and scaling of relational databases in the AWS cloud environment. RDS provides several different database engines, including open-source MySQL, and allows replication over several AZs to enhance reliability and availability[37].
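
As a sketch, the parameters below illustrate what a request for a Multi-AZ MySQL instance could look like. The identifier and credentials are placeholders, and the keys follow the shape of the boto3 create_db_instance call; setting MultiAZ asks RDS to maintain a standby replica in another Availability Zone:

```python
# Placeholder parameters for an RDS instance request (no API call is made here).
db_params = {
    "DBInstanceIdentifier": "example-fleet-db",  # placeholder name
    "Engine": "mysql",
    "DBInstanceClass": "db.t2.micro",
    "AllocatedStorage": 20,                      # GiB
    "MasterUsername": "admin",                   # placeholder credentials
    "MasterUserPassword": "change-me",
    "MultiAZ": True,                             # standby replica in another AZ
}
print(db_params["MultiAZ"])  # True
```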

7 Theory

7.1 Cloud Architecture

Cloud solutions are divided into different service models, or abstraction layers. When looking into the cloud as a solution to some arbitrary issue, e.g. cost or scalability, choosing the correct service model (or a combination of service models) is important. The choice should be made to best utilize already established internal knowledge, as well as based on which parts of the architecture a company wants full control over. The three service models are (in order of level of abstraction, lowest to highest): Infrastructure as a Service, Platform as a Service and Software as a Service. The service models have in common that most of the underlying physical infrastructure is dealt with by the service provider, i.e. networking, servers, disk storage etc. are no longer handled by the company's IT department. The physical infrastructure is instead packaged as services reachable via interfaces. The further up the cloud stack a consumer goes, the more layers of underlying infrastructure are abstracted away. An illustration of the cloud stack, showing the levels of abstraction between the service models and comparing them to on-premise solutions, can be seen in Figure 1.

Figure 1: A graphical representation of the cloud stack, showing levels of re- sponsibility between cloud provider and the organization.

7.1.1 Infrastructure as a Service

With IaaS, the service provider supplies the customer with cloud infrastructure. The customer provisions processing, storage, networks and other computing resources themselves, and has control over operating systems, storage and the customer's own deployment of applications. In contrast with traditional on-premise infrastructure, where a customer installed and configured purchased servers, software or network equipment, the IaaS service model instead offers these as resources purchasable by the customer[38][39]. The available resources are reachable via interfaces, which can be web-based management consoles or command line interfaces (CLIs).

7.1.2 Platform as a Service

Using PaaS as a service model, the customer can deploy applications, self-developed or acquired, to the cloud. The applications are created using tools supported by the service provider, for example programming languages, libraries or services. The underlying cloud infrastructure, such as networks, servers, operating systems or storage, is not managed by the customer, as it is beyond the customer's scope. However, the deployed applications and their configuration are within the customer's control[38][39].

7.1.3 Software as a Service

The SaaS service model is where the service provider offers applications running on cloud infrastructure to the customer. These applications can be accessed via various client interfaces, such as web browsers or programs. As with PaaS, the underlying cloud infrastructure is abstracted from the customer[39].

7.2 Cloud Security

When discussing a migration from on-premise infrastructure to the cloud, the number one concern proclaimed by corporate employees has been whether or not a cloud solution is viable from a security perspective. However, recent surveys show that security is no longer the sole top challenge with cloud solutions: it is now considered tied with two other challenges, lack of resources and/or expertise, and managing cloud spend. These results are on a general level; in more detail, security remains the number one concern among corporations that are at the beginning of their cloud adoption. That security is less of a concern among more mature cloud adopters stems from those corporations gaining experience on the matter[40]. From the surveys one can see that concern over cloud security has declined as corporations gain experience. As this decline is the result of experience, one can conclude that cloud service providers have made the necessary commitments to security, and that once corporations discover this, the perceived lack of security in the cloud is refuted.

7.2.1 Securing Data

Some corporations might say that putting data outside of the corporation's firewall is bad in terms of security, as control over to whom data is reachable is left to third-party service providers. Following the introduction of various governmental acts, such as the Patriot Act in 2001, governments gained legal rights to seize data from companies to varying degrees. As the Patriot Act is often cited as being the most comprehensive, it is important to know that the US
