
LiU-ITN-TEK-A--17/022--SE

Evaluation of cloud-based infrastructures for scalable applications

Carl Englund

Master's thesis carried out in Computer Engineering at the Institute of Technology, Linköping University

Supervisor: Peter Steneteg

Examiner: Pierangelo Dell'Acqua


Copyright (Upphovsrätt)

This document is held available on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

Access to the document implies permission for anyone to read, to download, to print out single copies for personal use, and to use it unchanged for non-commercial research and for teaching. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document require the consent of the author. To guarantee authenticity, security and accessibility, there are solutions of a technical and administrative nature.

The author's moral rights include the right to be mentioned as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in such a form or context that is offensive to the author's literary or artistic reputation or character.

For additional information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Abstract

The usage of cloud computing to move away from local servers and infrastructure has grown enormously over the last decade. The ability to quickly scale server capacity and resources when needed can both save money for companies and help them deliver high-end products to their customers that function correctly at all times, even under heavy load.

To meet today's challenges, one of the strategic directions of Attentec, a software company located in Linköping, is to examine the world of cloud computing in order to deliver robust and scalable applications to their customers. To this end, this thesis investigates the usage of cloud services for deploying scalable applications which can adapt to usage peaks within minutes.


Acknowledgments

First of all I would like to thank Attentec for giving me the opportunity to carry out my thesis at their company. It has been a rewarding journey where I felt extremely welcome, and I hope this thesis will help them improve their work with present and new customers in the future. Secondly I would like to especially thank my supervisor at Attentec, Martin Andersson, for interesting discussions and help during the master's thesis. I would also like to thank my examiner at Linköping University, Pierangelo Dell'Acqua. Thanks also to all the people I have met during my studies: Sara, my classmates, people I have met through various student associations, and all other amazing people at late nights and early mornings over a beer or ten. Finally I would like to thank my family for supporting and encouraging me at all times throughout my studies.

Norrköping, June 2017
Carl Englund


Contents

List of Tables ix
Listings x

1 Introduction 1
1.1 Background . . . 1
1.1.1 Attentec . . . 1
1.1.2 The Cloud . . . 1
1.1.3 Internet of Things . . . 2
1.2 Motivation . . . 3
1.3 Aim . . . 3
1.4 Research questions . . . 4
1.5 Delimitations . . . 4
1.6 Report structure . . . 4

2 Theory 6
2.1 Cloud Computing . . . 6
2.2 Deployment models . . . 7
2.3 Service models . . . 7
2.4 Google Cloud . . . 9
2.5 Amazon Web Services . . . 9
2.6 Microsoft Azure . . . 9
2.7 Scaling in cloud environments . . . 10
2.7.1 Load testing . . . 10
2.8 Software Containers . . . 10
2.8.1 Docker . . . 12
2.8.2 Security in Docker . . . 14
2.9 Deploying to the cloud . . . 16
2.9.1 Kubernetes . . . 16
2.10 System Architecture . . . 18
2.11 Application deployed . . . 20

3 Method 21
3.1 Preparing a Docker image . . . 21
3.2 Creating Kubernetes Clusters . . . 22
3.2.1 Google Cloud . . . 22
3.2.2 Microsoft Azure . . . 23
3.2.3 Amazon Web Services . . . 24
3.3 Deploying the application . . . 25
3.4 Using the Kubernetes Dashboard . . . 25
3.5 Testing with Locust . . . 26

4 Results and Discussion 29
4.1 Verifying Autoscaling capabilities . . . 29
4.2 Discussion . . . 33
4.3 Cloud services . . . 33
4.3.1 Summarizing the Cloud Services . . . 34
4.4 Using Docker and Kubernetes . . . 34
4.4.1 Docker . . . 34
4.4.2 Kubernetes . . . 34
4.4.3 Security in Docker . . . 35
4.5 Results . . . 35

5 Conclusion 36
5.1 Research questions . . . 36
5.2 Aim . . . 37
5.3 Future Work . . . 37

Bibliography 39


List of Figures

1.1 Gartner hype cycle . . . 2
2.1 Different service models within a cloud . . . 8
2.2 Traditional hypervisor virtualization vs Container virtualization . . . 11
2.3 ARP spoofing attack . . . 15
2.4 System Architecture . . . 19
3.1 Adding a deployment object in the Kubernetes Dashboard . . . 26
3.2 Launching virtual users in Locust . . . 28
3.3 Locust dashboard showing requests . . . 28
4.1 CPU and Users over Time for Google Cloud . . . 30
4.2 Replicas over time for Google Cloud . . . 30
4.3 CPU and Users over Time for AWS . . . 31
4.4 Replicas over time for AWS . . . 31
4.5 CPU and Users over Time for Azure . . . 32
4.6 Replicas over time for Azure . . . 32


List of Tables

B.1 Load testing on Google Cloud . . . 45
B.2 Load testing on AWS . . . 46
B.3 Load testing on Azure . . . 47


Listings

2.1 Example Dockerfile . . . 12

2.2 Building a Docker Image . . . 13

2.3 Running a Docker Image . . . 13

2.4 Listing running containers . . . 13

2.5 Example output when listing running containers . . . 13

2.6 A Kubernetes configuration file . . . 17

2.7 Launching a Horizontal Pod Autoscaler . . . 18

2.8 Formula used for calculating the amount of Pods needed . . . 18

3.1 Dockerfile for the SIMS-application . . . 21

3.2 Uploading the Docker image to Google Container Registry . . . 22

3.3 Setting a computing zone . . . 23

3.4 Launching a cluster on Google . . . 23

3.5 Creating a resource group on Azure . . . 23

3.6 Launching a cluster on Azure . . . 23

3.7 Creating an S3 bucket for storing cluster state . . . 24

3.8 Launching a cluster with Kops . . . 24

3.9 Launching the Kubernetes dashboard . . . 25

3.10 Test cases developed with Locust. Filename loadtest.py . . . 26

3.11 Starting up Locust . . . 27

1 Kubernetes Deployment configuration file for the application replica . . . 43
2 Kubernetes Service configuration file for the application replica . . . 43

3 Kubernetes Deployment configuration file for the MongoDB replica . . 44

1 INTRODUCTION

This master's thesis was carried out at Attentec AB within the Master's programme in Media Technology and Engineering at the Department of Science and Technology at Linköping University. This introductory chapter presents the company in section 1.1.1. In section 1.2 the motivation for the thesis is explained. Section 1.3 describes the aim of the thesis. Furthermore, section 1.4 states the research questions that will be answered at the end of the report. Delimitations to the research questions are presented in section 1.5. Finally, the structure of the report is explained in section 1.6.

1.1 Background

1.1.1 Attentec

Attentec AB[1], from now on referred to as Attentec, is an IT consulting company based in Linköping. They focus on Internet of Things, streaming media and modern software development for their various customers. They offer consulting services that add innovation and increased flexibility, leading to successful projects and prosperity for their customers.

1.1.2 The Cloud

The cloud and its perks have become a major topic within computer science in the past few years. The concept has however existed for several decades, spanning back to when big computer machines were used together with time sharing, a technique still in use today. Today cloud computing is all about running small, scalable services that have high elasticity and varying usage peaks. It has become an alternative to running expensive and space-consuming servers locally at companies.


1.1.3 Internet of Things

Internet of Things, from now on called IoT, is often defined as a new era of computing[2] where technology leaves what is often referred to as traditional computing and moves on to having devices everywhere in everyday life connected to a network. There can be connected microwaves, scales, sensor networks and a million other devices alike. Cloud computing combined with IoT is a very powerful tool to assess and ease everyday life, in which devices and sensors that require very little computing power or storage can be connected to the cloud and send reports on demand. A survey from July 2011 by Gartner[3], a company conducting technology research, shows Internet of Things as a "Technology Trigger" expected to explode in the coming 5-10 years, whereas cloud computing is on the edge of the "Peak of Inflated Expectations" with only 2-5 years until massive success. See figure 1.1.

A more recent press release[5] from Gartner in 2016 seems to show that they were right about the massive success of cloud computing. It says that "By 2020, a corporate "no-cloud" policy will be as rare as a "no-internet" policy is today", as well as "Cloud will increasingly be the default option for software deployment. The same is true for custom software, which increasingly is designed for some variation of public or private cloud.".

1.2 Motivation

Cloud computing is a market that has received a huge boost in the last couple of years. Imagine being able to scale an application within minutes of a usage peak, and when the peak has faded being able to scale your hardware down to the minimum amount needed. This is possible using today's cloud service techniques. Since most cloud services available today also use the "pay for what you use" paradigm, the costs of hosting applications in the cloud may also be significantly lower than hosting your own servers.

Attentec has a wide range of customers with different needs and requirements. They need to be able to build large scalable applications which can handle a varied range of active users. Attentec can at the moment neither scale applications to their current needs nor provide sufficient support for large applications. In order to make this easier for both Attentec and their customers, the usage of scalable cloud-based solutions will be examined in this thesis.

1.3 Aim

Attentec is currently hosting several of their customers' projects on their own locally based servers, which have limited bandwidth and scaling capabilities. This produces limitations in terms of how many users Attentec can handle and what kind of support they can offer. To overcome these limitations Attentec needs to be able to offer customers scalable applications that can grow in computing power depending on the number of active users and the computing power they need. There is also the issue of guaranteeing the availability of the servers, an issue that could require Attentec to provide support around the clock.

Many big corporations have solved similar problems employing cloud-based hosting services like Microsoft Azure[6], Google Cloud[7] or Amazon Web Services[8]. In order to easily use these cloud services one can take advantage of containers to deploy applications. A container is something that wraps an application with all its dependencies in a box which can be deployed to several cloud services, so that the compatibility problems occurring when using different platforms can be avoided. The choice for this thesis is to use Docker[9] containers and their flexibility to solve the problem. The aim of this master's thesis is to provide Attentec with guidelines on how and when they should deploy their applications on a cloud service, and to some extent which hosting service might suit their needs best.

1.4 Research questions

1. What kind of cloud services can and should be used to host scalable IoT applications?

2. How can these cloud services be tested for performance?

3. Are container techniques such as Docker suitable for deploying applications to the cloud?

4. How secure is it to host applications in Docker?

1.5 Delimitations

In order to narrow down the project and make it possible to carry through during the 20 weeks allocated for the thesis, a number of delimitations had to be set. The delimitations are needed since the subject examined is vast and there may be no single true answer to the research questions at hand. By narrowing them, the hope is that it will be easier to answer the questions in a reasonable way.

• The cloud services that will be examined for the first issue will be limited to Amazon Web Services, Google Cloud, and Microsoft Azure.

• When discussing container techniques most time will be put on evaluating Docker. Other techniques may be examined, but may also be excluded from the final evaluation.

• Security aspects examined in the fourth issue will only be related to Docker.

1.6 Report structure

• Chapter 1 introduces Attentec, the problem as well as the aim for how to solve it.

• Chapter 2 describes the theory behind the intended solution.

• Chapter 3 describes how the solution was implemented, from theory to actual results.

• Chapter 4 presents the results of the thesis and offers a discussion of the results and how they were reached.

• Chapter 5 concludes the thesis, summarizing what has been done and what should be done in the future.

2 THEORY

This chapter explains the theory behind the thesis and how the different parts relate to each other. It pinpoints important aspects of cloud computing, development for the cloud and security. It also discusses virtualization techniques and how to deploy to the cloud.

2.1 Cloud Computing

Cloud computing is, according to the NIST[10] definition:

A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.

In short, it is a way to swiftly provision new server resources when your application is experiencing high traffic, and to scale those resources down when they are not needed. NIST mentions five essential characteristics of the cloud computing model:

On-demand self-service. A user can provision the wanted resources for their applications on-demand on their own, by clicking a button.

Broad network access. Services are available over the network and can be accessed through standard web browsers; the clients do not have to be high-resource-demanding products.

Resource pooling. The provider of the service has pools of computing resources. Each server can serve multiple customers, utilizing the computing power of entire machines instead of having one machine per user.

Rapid elasticity. Resources can be provisioned elastically, for example using a load balancer that performs the scaling automatically, increasing the computing power of the customer's service when needed and scaling resources down when they are not.

Measured service. The customer's service is monitored in order to display the resources used and when and how usage peaks happen.

2.2 Deployment models

A cloud can have several different deployment models for hosting and deploying your application. According to NIST[10] the following models are the most commonly used.

Private Cloud

A private cloud is a cloud set up for exclusive use by a single company or organization. It can be owned and operated by this organization, a third party or some combination of these.

Community Cloud

A community cloud is set up to serve a specific community of organizations that have a common interest in the technique.

Public Cloud

A public cloud is open for use by the general public, it can be owned by a business, academic or government organization.

Hybrid Cloud

A hybrid cloud is a mix of two or more of the above deployment models; the parts can be run differently for each user but are bound together to enable standardized data storage or application portability.

2.3 Service models

When deploying or using services in the cloud there are several different service models covering different users' needs and usages. The most common ones used today can be divided into three different services[11], whose hierarchy can be visualized as a simple pyramid, seen in figure 2.1.


Figure 2.1: Different service models within a cloud[12]

Infrastructure as a Service

Infrastructure as a service (IaaS) means that the consumer has access to alter the underlying infrastructure as they wish. Customers can choose what environment they want to run on the machine, for example operating system, databases, languages and frameworks. They can also provision processing power, storage, networks and other computing resources to their liking and usage. Companies offering IaaS are for example Amazon through AWS, Google through Google Cloud and Microsoft with their service Azure.

Platform as a Service

Platform as a service (PaaS) differs from IaaS in that the consumer does not control the underlying infrastructure of the service. They do however have control of deployed applications and possibly configuration settings for the hosting environment, as well as provisioning of processing power, memory, storage and network. Examples of PaaS providers are Heroku[13] and OpenShift[14]. These providers let users deploy repositories with code in different languages, making their services available through the Internet.

Software as a Service

Software as a Service (SaaS) is when the user controls neither the infrastructure nor the platform being used. Software as a service can be a piece of software run in the cloud for the intended customer to use and interact with. Examples of SaaS providers are Google, Facebook or ShareLaTeX. Users can for example handle documents, communicate with each other or utilize planning tools available through the service.

2.4 Google Cloud

Google Cloud is a cloud service developed by Google. It was ranked highest in the general ranking of five cloud services in the JDN/CloudScreener/Cedexis U.S. Cloud Benchmark[15]. Google also received the highest rank in terms of pricing in the same article; this is considered to be due to the per-minute billing model that Google employs, as well as an automatic discount that applies with increased usage of the service.

In terms of regions, Google has the lowest number compared to Amazon Web Services and Microsoft Azure, with a total of 9 regions available today. But they are expanding, and throughout 2017 they will add 11 more regions to this number[16]. Regions can be important in terms of latency for some applications.

2.5 Amazon Web Services

Amazon Web Services is a cloud service developed by Amazon. In the JDN/CloudScreener/Cedexis U.S. Cloud Benchmark[15] it ranked second, closely following Google Cloud. It also ranks second in the pricing ranking of the same article, but Amazon does not use the same per-minute billing model as Google. Instead Amazon uses a per-hour billing model, meaning that every usage is rounded up to the closest hour, which might increase the price of deploying an application to their cloud if it is only used for a limited amount of time.

Amazon has 16 regions across the world and 3 more coming up[17]. They are also the only cloud service examined with a region coming up in Sweden in 2017[18].

2.6 Microsoft Azure

Microsoft Azure is a cloud service developed by Microsoft. It ranked fourth in the JDN/CloudScreener/Cedexis U.S. Cloud Benchmark[15] general ranking, and fourth in the pricing ranking as well. Azure does however use the same billing model as Google, where users are billed per minute of usage of the cloud service.

In terms of regions, Azure has the most of the three services, with a total of 34 regions all over the world and 6 more planned[19].

2.7 Scaling in cloud environments

In cloud environments there are several ways to scale an application. The most common way is probably to scale the infrastructure being used, either by scaling vertically, which means adding more power to the server, or by scaling horizontally, which means adding more servers to serve your application[20]. One can also scale the containers being used on the infrastructure, by assigning more power to them or by spinning up more containers. The different scaling techniques can be performed manually or, most commonly, by an automated load balancer. With this tool users can define thresholds for when to scale and how to scale.

2.7.1 Load testing

There are a lot of tools for load testing applications on the web. Some of these are wrk[21], Apache JMeter[22], Tsung[23] and Locust[24]. Locust was chosen for this thesis since it is easy to set up and has the capability of running the tests distributed over multiple machines.

Locust uses Python[25] to define user behaviour, and has the capability to swarm a service with millions of users. It can be monitored through a web interface. It is designed to help companies test their services before sending them out to production, and to identify bottlenecks or problems occurring when the load on the service reaches high levels.

Locust will be used to test the load balancing capabilities of all cloud services used in this thesis.
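Since the ability to run distributed tests is one of the reasons Locust was chosen, the following sketch shows how such a run might be started with a Locust version from the time of writing (0.7/0.8); the file name and the master's IP address are illustrative assumptions:

# On the machine that aggregates results and serves the web interface:
$> locust -f loadtest.py --master --host=http://example.com
# On each additional load-generating machine, pointing at the master:
$> locust -f loadtest.py --slave --master-host=192.168.0.100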

2.8 Software Containers

Containers are an effective way of encapsulating an application, making it possible to use the same environment to develop, test and deploy the application. Comparing containers to traditional hypervisor[26] solutions for virtualization, there are some significant differences.


Figure 2.2: Traditional hypervisor virtualization vs Container virtualization[27]

In figure 2.2 some differences between hypervisor virtualization and container-based virtualization can be seen. The different parts of the figure are described below.

First of all, traditional hypervisors provide access to a system's hardware, meaning the programmer still has to install operating systems and handle the amount of resources being given at a certain time. This corresponds to the hypervisor block in figure 2.2. Containers on the other hand use protected portions of the current operating system, meaning they virtualize on top of an existing system. The container creates an isolated process on the operating system that utilizes the already available operating system functions.

Secondly, since hypervisors have access to hardware and have to install their own operating system, they take up a large chunk of hardware at all times. Since containers virtualize on top of existing systems, they only take up space when actually running their desired service and can power down after that. Using a container is similar to starting and shutting down an application and is, as such, very fast compared to having to start an entire operating system stack as in the hypervisor approach. Containers do of course have disadvantages compared to hypervisors as well. One disadvantage is that the container has more contact area with the kernel, meaning that an attacker might have an easier time breaking in and getting access to processes they should not. This is discussed further in section 2.8.2.

Examples of software container services are Docker[9] and rkt[28].

2.8.1 Docker

Docker is a container service for containerizing your applications. The Docker client runs natively on Linux, Windows and OS X. By default these clients connect to a local Docker daemon running a virtual environment managed by Docker. This makes Docker able to run Linux-based containers within OS X or Windows, or Windows-based containers on Windows. When having a big application with several small services running, such as an Nginx[29] web server, a Mongo[30] database, a caching system like Redis[31] and a back-end like Flask[32], it becomes very tedious to run everything in the same environment. A single service such as the back-end might end up doing 90% of the work while the database just waits for requests. Instead of having everything tightly coupled in the same application, it should be decoupled. This makes it easy to distribute the power needed for the services evenly. For example, one could spin up two containers for the back-end and distribute the load evenly between these two containers if the back-end experiences high load. Docker makes this easy by utilizing something called Dockerfiles. A Dockerfile is somewhat of a recipe of commands that describes an application, what it should do and how it should be built. An example Dockerfile can be viewed in listing 2.1.

1 FROM node:4.5
2 COPY . /app/src
3 WORKDIR /app/src
4 RUN curl https://install.meteor.com | sh
5 RUN meteor npm install
6 EXPOSE 3000
7 ENTRYPOINT meteor --settings settings.json

Listing 2.1: Example Dockerfile

The lines in listing 2.1 are explained in the following list:

Line 1: Fetches a previously created image.

Line 2: Copies your current working directory on your host machine to the Docker directory /app/src

Line 3: Changes the working directory of the Docker container to /app/src

Line 4: Installs Meteor in the Docker container

Line 5: Installs the npm dependencies of the Meteor project

Line 6: Exposes port 3000 on the Docker container so it can be accessed from the host

Line 7: Runs a Meteor command in the Docker container to start the Meteor server

To use this, the Dockerfile has to be built into an image. This is done by issuing the command seen in listing 2.2; the -t flag with the following name tags the image so it can easily be found later. In order to run the image after it has been built we can use the command in listing 2.3.

$> docker build -t example-image PATH_TO_DOCKERFILE

Listing 2.2: Building a Docker Image

$> docker run -d -p 3000:3000 example-image

Listing 2.3: Running a Docker Image

Listing 2.3 shows the command to run a Docker image. The -d flag means that we want to detach the container from our host, meaning it will run as a background process. The port 3000 which is exposed in the image is mapped to port 3000 of the host machine. The last argument means that our container image called example-image will be started. If everything works it is possible to connect to our Meteor application at the address http://localhost:3000.

In order to list running Docker containers one can use the command in listing 2.4.

$> docker ps -a

Listing 2.4: Listing running containers

By running this command the result could look something like listing 2.5, where one example container is running. The first column shows the randomly generated container ID. The IMAGE column displays which image is running. The container is running the command "meteor --settings settings.json"; the COMMAND column usually corresponds to the ENTRYPOINT issued in the Dockerfile. We can see that the image was created 3 days ago and that the container has been up for 10 seconds. Port 3000 of the host operating system is mapped to the container's port 3000 over TCP. The name of the container, if not set, is randomly generated by Docker.

$> docker ps -a
CONTAINER ID  IMAGE          COMMAND                             CREATED     STATUS         PORTS           NAMES
2c8e02        example-image  "meteor --settings settings.json"   3 days ago  Up 10 seconds  3000->3000/tcp  random_name

Listing 2.5: Example output when listing running containers
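Beyond listing containers, a few everyday commands are useful for managing their lifecycle. A short sketch, reusing the hypothetical container ID and image name from listing 2.5:

$> docker logs 2c8e02        # print the container's stdout/stderr
$> docker stop 2c8e02        # stop the running container
$> docker rm 2c8e02          # remove the stopped container
$> docker rmi example-image  # remove the image itself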

2.8.2 Security in Docker

Security in containerization techniques is a complex matter. This section will focus on Docker, but similar problems may exist in other container techniques as well. Compared to traditional virtual machines, containers have a tighter coupling to the host operating system.

According to Combe et al.[33], Docker security relies on three factors: isolation of processes at the userspace level managed by the Docker daemon, enforcement of this isolation by the kernel, and network operations security.

By default, the isolation configuration of Docker containers is quite strict. There are however problems with networking. Since all containers share a common network bridge when using default settings, the container system is open to Address Resolution Protocol (ARP) spoofing attacks[34] between containers.

An ARP spoofing attack is a man-in-the-middle technique used to monitor data being sent between devices in a network. By sending an ARP reply with its own MAC address to two devices, the attacker effectively makes all data sent between those two devices pass through the man in the middle first. An illustration of this can be seen in figure 2.3. It is possible to prevent this by disabling network communication between containers; however, if this is done a big strength of multi-container applications disappears, since the containers can no longer communicate with each other.


Figure 2.3: ARP spoofing attack[35]
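Disabling the inter-container communication mentioned above is done on the daemon, not per container. A minimal sketch, assuming the Docker daemon can be restarted with custom flags:

# Start the Docker daemon with inter-container communication disabled.
# This blocks ARP spoofing between containers on the default bridge,
# but also prevents legitimate multi-container communication.
$> dockerd --icc=false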

Another issue mentioned by Combe et al. is host hardening. Docker supports a few different host hardening modules today, such as SELinux, AppArmor and Seccomp. These are used to limit Docker containers' access rights to certain processes and other user-related features in the default filesystem, although the modules differ in their restrictiveness. Since Docker containers use some host OS capabilities to virtualize, such as network configurations, it is possible to access these from within a Docker container, making it possible to change host OS configurations directly from a container. For example, the default AppArmor profile allows full access to the filesystem, the network and all actions available in all Docker containers, but not to actions coming directly from the host OS. So even if these security modules protect the host from containers, they do not protect containers from other containers. To solve this, Combe et al. suggest writing your own profiles depending on how much protection your containers need.

Furthermore, a security issue when using Docker is the distribution of images. Docker has a feature called content trust which is not enabled by default. Content trust in this case means that the publisher of a Docker image needs to sign it before pushing it to the remote registry. When enabling content trust, the user can verify that any image pulled is the one its publisher signed. When content trust is disabled, which it is by default, the images pulled from the Docker Hub cannot be trusted.
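Content trust is enabled per shell session through an environment variable; a minimal sketch, where the image name is an illustrative assumption:

# Enable Docker Content Trust for this shell session (off by default)
$> export DOCKER_CONTENT_TRUST=1
# Pulls now fail unless the tag has been signed by its publisher
$> docker pull user/example-image:latest
# Pushes will sign the image with the publisher's key
$> docker push user/example-image:latest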

2.9 Deploying to the cloud

The whole idea behind using containers is to be able to use the same environment from development, through testing, to running applications in production. This is where container techniques excel compared to virtual machines, by making developers able to easily run the same environment everywhere from development to production. Docker also makes it easy to move between different clouds, since most cloud providers today support Docker images natively. A Docker image being used on Amazon can in theory be deployed to the Google Cloud platform seamlessly, without many configuration changes. How seamless this really is will be examined in the implementation part of this thesis project. When deploying a large application utilizing a vast number of different services, it can be tedious to deploy every container by itself. Docker offers a tool called Docker Swarm, which is an orchestrator for organizing and managing several different containers running small services. Another container orchestrator which has received large support lately is Kubernetes[36]. Kubernetes is officially supported by both Google Cloud and Microsoft Azure, and was therefore chosen to be used in this thesis.

2.9.1 Kubernetes

Kubernetes is a container orchestration tool for deploying, managing and scaling services. Since containers, and especially Docker, are recommended to follow a single-responsibility principle, users can have several different containers for any given project. In order to organize these in a simple and easy way, Google created Kubernetes. While Amazon and Azure do not force you to use Kubernetes, it is supported by several large IT companies such as IBM, Microsoft and Red Hat. According to [37], Kubernetes is becoming the core cloud technology; it will therefore be investigated and used in this thesis.

Kubernetes utilizes something called Nodes to organize several Pods[38] that work together. The Nodes are run on the actual servers and there is one server instance per Node. Containers are stored in Pods, which share the same storage memory and network distributed from the Nodes.

Kubernetes supports configuration files for easily storing deployments and being able to create them without having to remember all the commands necessary through the command line. The file format Kubernetes uses for its configuration files is YAML, since it is, according to the Kubernetes community, more configuration friendly compared to JSON. A configuration file for a Kubernetes deployment can be seen in listing 2.6.

1 apiVersion: extensions/v1beta1
2 kind: Deployment
3 metadata:
4   name: meteor
5 spec:
6   replicas: 1
7   strategy: {}
8   template:
9     metadata:
10       labels:
11         service: meteor
12     spec:
13       containers:
14       - env:
15         - name: ROOT_URL
16           value: http://localhost:3000
17         image: example-image
18         name: meteor
19         resources:
20           limits:
21             cpu: 200m
22             memory: 100Mi
23         ports:
24         - containerPort: 3000
25           protocol: TCP
26       restartPolicy: Always
27 status: {}

Listing 2.6: A Kubernetes configuration file

The configuration file in listing 2.6 starts up a deployment with the name meteor. The kind of configuration file is defined with the keyword kind. There are two different kinds available, either Deployment or Service. A deployment configuration is a way to write declarative updates for Pods, which hold the containers that we will be using. It will collect a Docker container image stored at the address given in the file; this particular container is stored in the Google Container Registry. It also has a restart policy which says "Always". This means that if the container should crash for some reason, it will be restarted indefinitely until the Pod that the deployment is hosted on is removed or the deployment file is updated.
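For completeness, a Service object of the second kind mentioned above might look as follows. This is a minimal sketch, assuming the Pods carry the label service: meteor as in listing 2.6 and that the cloud provider can provision an external load balancer:

$> kubectl create -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: meteor
spec:
  type: LoadBalancer   # ask the cloud for an externally reachable address
  selector:
    service: meteor    # route traffic to the Pods labeled in listing 2.6
  ports:
  - port: 80           # port exposed by the load balancer
    targetPort: 3000   # port the Meteor container listens on
    protocol: TCP
EOF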

Horizontal Pod Autoscaler

The Horizontal Pod Autoscaler[39] (HPA) is the Kubernetes tool for automatically scaling the number of Pods based on their resource usage. Users can set a desired target for CPU or memory usage; custom metrics are currently available in alpha support. For example, in listing 2.6 the CPU limit is set to 200m. This refers to 200 millicpu[40] and corresponds to 20% of the total CPU of the Node. This means it is possible to have 5 Pods per Node in the case of this thesis, since each Node had 1 core. If a Pod breaches the desired target utilization, a new Pod will be launched and the resource usage can be divided between several Pods instead of a single one.

An HPA can be launched with the command seen in listing 2.7. The deployment foo is autoscaled with a minimum of 1 replica of the deployment and a maximum of 5 replicas. The target CPU percentage is set to 80%.

$> kubectl autoscale deployment foo --min=1 --max=5 --cpu-percent=80

Listing 2.7: Launching a Horizontal Pod Autoscaler

The autoscaler works by taking the average resource utilization across every minute divided by the requested resources for a Pod. The target number of replicas is then calculated using the formula in listing 2.8.

TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)

Listing 2.8: Formula used for calculating the amount of Pods needed

For example, three Pods at 90% utilization each against an 80% target give ceil(270 / 80) = 4 Pods. While launching or removing replicas, noise can occur in the metrics. Therefore scaling up cannot happen if any other rescaling has occurred in the last three minutes, and scaling down will not happen if any rescaling occurred within the last five minutes. Scaling will also not occur if the Pod resource consumption is within a 10% tolerance of the target consumption. This is to avoid problems that might occur while having unstable load.
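Once launched, the state of an autoscaler can be inspected at any time; a small sketch for the hypothetical deployment foo from listing 2.7:

# Show current and target CPU utilization and replica counts for all HPAs
$> kubectl get hpa
# Detailed view, including recent scaling events
$> kubectl describe hpa foo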

2.10 System Architecture

In order to more easily explain the different tools utilized in this thesis, a figure was constructed; see figure 2.4.


Figure 2.4: System Architecture

As seen in the figure, the top of the hierarchy involves the cloud service, Amazon, Azure or Google Cloud in this case. The cloud service launches a series of machines linked together. The machines run Kubernetes, and for this thesis three virtual machines have been used: one for the Kubernetes master, which coordinates everything inside the Kubernetes cluster, and two virtual machines running one Node each.

Each Node has at least one process called a Kubelet that is responsible for communicating with the Kubernetes master and the other Nodes. It also has one or multiple Pods, which run the actual containers where the application lives.

The Pods are a Kubernetes abstraction that has one or more application containers running. The containers can also share resources between themselves, such as storage and networking, as stated above.

2.11 Application deployed

The example application used in this project to test cloud technologies is an application prototype for SIMS Recycling Solutions, developed at Attentec during the summer of 2016. It is a garbage bin monitoring system utilizing IoT devices on every bin in order to see their position, fill rate and other useful information that can be used to analyze the data collected about these bins. It is built for the web using Meteor[41] for the back-end and React[42] for the front-end. In order to sync data from the garbage bins, a library called KAA[43] is also utilized. KAA is an open-source IoT platform for monitoring, managing and configuring connected devices. SIMS also uses Jasper[44], a platform developed by Cisco for IoT connectivity management. It allows for monitoring devices as well as sending new deployments directly to connected devices.

2.12 Related Work

Dirk Merkel[45] foresees in his article that Docker is a technique on the rise and mentions that "It is quickly being adopted by the Linux community and making its way to production environments". He also mentions that Docker has not been developed to replace virtual machines; rather, the two can work together, each having their own strengths and weaknesses.

Claus Pahl[46] talks about the benefits of using containerized applications and PaaS clouds to manage and orchestrate applications. He emphasizes the need for a cluster management architecture for handling large clouds with many containers, and mentions Kubernetes as a solution to this.

Ang Li et al.[47] present a system for comparing cloud providers based on a few different metrics. They state that "CloudCmp measures the elastic computing, persistent storage, and networking services offered by a cloud along metrics that directly reflect their impact on the performance of customer applications.". Their software offers companies a possibility to select the best cloud for their applications and usages, as well as an end-to-end benchmark showing cloud providers how to progress and enhance their clouds' performance.

3 METHOD

This chapter describes the steps taken in order to deploy the example application to three different cloud services: Google, Amazon and Microsoft. It discusses how the example application was packaged into a Docker image for later use when deploying to the cloud, and how Kubernetes was used to simplify the deployment across these services and make the application available on the Internet. The chapter also introduces the test cases used to test the stability and load balancing methods available through these cloud services. The testing platform used, Locust, is also explained with regard to how it was used to set up the tests and run them. All steps are explained so they can be followed and replicated for other applications.

3.1 Preparing a Docker image

In order to prepare the application for deployment to the different cloud services, a Dockerfile was created with the needed commands. This file can be used by other projects using the same technologies. See listing 3.1.

1 FROM node:4.5
2 RUN useradd applicationUser
3 RUN mkdir /home/applicationUser
4 RUN chown applicationUser /home/applicationUser
5 USER root
6 COPY . /app/src
7 WORKDIR /app/src
8 RUN chown -R applicationUser /app/*
9 EXPOSE 8080
10 ENV PORT 8080
11 USER applicationUser
12 ENTRYPOINT MONGO_URL=mongodb://$MONGO_SERVICE_HOST:$MONGO_SERVICE_PORT METEOR_SETTINGS="$(cat settings.json)" node main.js

Listing 3.1: Dockerfile for the SIMS-application

The Dockerfile above differs from the one given in listing 2.1. Since the application is being deployed and Meteor does not support running in production directly through Meteor commands, it has been built into a Node project. A new user called applicationUser has been created for security reasons, in case unauthorized users get access to the container. If this should happen they can only execute commands that applicationUser is allowed to; this means that they will not have root access, but they can still access Node. The MONGO_SERVICE_HOST and MONGO_SERVICE_PORT environment variables are set by Kubernetes and point to the service called mongo which runs in the Kubernetes cluster[48].

3.2 Creating Kubernetes Clusters

This section covers how the Kubernetes clusters were created on the different cloud services.

3.2.1 Google Cloud

Google Cloud has a simple dashboard for controlling applications deployed on their service. One can choose between installing the Cloud SDK locally, making it possible to reach the cloud API through a terminal, and using the built-in dashboard online. There is also a built-in cloud shell available online if one chooses not to install the SDK.

The Docker image created for the application described in section 3.1 was uploaded to the Google Container Registry to simplify the deployment. This was done by simply tagging the image with the correct project tag and using the Google Cloud SDK to push it to the correct registry. The commands issued for this can be seen in listing 3.2.

1 $> docker tag user/example-image gcr.io/your-project-id/example-image
2 $> gcloud docker -- push gcr.io/your-project-id/example-image

Listing 3.2: Uploading the Docker image to Google Container Registry
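Before it can be tagged and pushed, the image must of course have been built from the Dockerfile in listing 3.1; a minimal sketch, where the name user/example-image is carried over from listing 3.2:

# Build the image from the Dockerfile in the current directory
$> docker build -t user/example-image .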

In order to start setting up a Kubernetes cluster on Google Cloud, one first has to choose the desired computing zone. This basically tells us where the hosting servers are located. There are at the moment 20 different zones available, located all over the world. The zone used for this thesis was europe-west1-b. To set the zone, the command in listing 3.3 was issued.

1 $> gcloud config set compute/zone europe-west1-b

Listing 3.3: Setting a computing zone

When the computing zone has been chosen, all that is left is to launch the Kubernetes cluster. This was once again done using the Google Cloud SDK; the command issued can be seen in listing 3.4.

1 $> gcloud container clusters create sims-cluster

Listing 3.4: Launching a cluster on Google
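After creation, kubectl needs credentials for the new cluster before any deployment commands can be issued against it. A minimal sketch, assuming the cluster name from listing 3.4:

# Fetch cluster credentials and write them to the local kubeconfig
$> gcloud container clusters get-credentials sims-cluster
# Verify that the nodes are up
$> kubectl get nodes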

When the cluster had been established, the Kubernetes dashboard was used to set up the deployments and services. This is where configuration files were used in order to easily set up the desired application.

3.2.2 Microsoft Azure

To install Kubernetes on Azure, the Azure CLI 2.0 can be used. One first needs to create a resource group in a location where Azure has support for the Azure Container Service, see listing 3.5.

1 $> RESOURCE_GROUP=my-resource-group
2 $> LOCATION=westeurope
3 $> az group create --name=$RESOURCE_GROUP --location=$LOCATION

Listing 3.5: Creating a resource group on Azure

Once this has been run, the creation of the Kubernetes cluster can proceed by setting a DNS prefix and a name for the cluster. The desired orchestrator also has to be set, which in this case is Kubernetes, see listing 3.6.

1 $> DNS_PREFIX=some-unique-value
2 $> CLUSTER_NAME=any-acs-cluster-name
3 $> az acs create --orchestrator-type=kubernetes --resource-group=$RESOURCE_GROUP --name=$CLUSTER_NAME --dns-prefix=$DNS_PREFIX --generate-ssh-keys

Listing 3.6: Launching a cluster on Azure
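As on Google Cloud, kubectl must then be pointed at the new cluster. A sketch using the acs commands available in Azure CLI 2.0 at the time of writing, reusing the variables from listings 3.5 and 3.6:

# Install kubectl if it is not already present
$> az acs kubernetes install-cli
# Download credentials and configure kubectl for the cluster
$> az acs kubernetes get-credentials --resource-group=$RESOURCE_GROUP --name=$CLUSTER_NAME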


3.2.3 Amazon Web Services

Since Amazon does not support Kubernetes natively on their platform, it has to be installed manually. In order to ease this process a tool called Kops[49] was used. Kops is a tool for deploying production-grade, highly available Kubernetes clusters from the command line.

When using Kops and AWS together, one has to set up a dedicated AWS IAM[50] user for Kops. IAM is AWS's Identity and Access Management service, which lets different tools and other users control your AWS account. Kops can then use this access to create and launch the different services hosted on AWS. In this case Kops needs access to set up EC2[51] instances, routing in Route 53[52] and a VPC, as well as access to create an S3[53] bucket for storing state. All these access rights were set up using the AWS dashboard.

Moving on, when setting up a cluster on AWS a hosted zone in Route 53 is needed. One can either register a new domain and use this, or move an existing domain to AWS. It was chosen to register a new domain to save time, completely skipping any configuration steps for moving a domain.

When the domain had been set up, an S3 bucket had to be created in order to store the state of the Kubernetes cluster. This was done using the AWS SDK, and the command issued can be seen in listing 3.7. The zone chosen for the bucket storage was once again eu-west-1. Note that even though the names of the zones are similar, the servers can be located at different locations, but should be somewhere in Western Europe.

1 $> aws s3api create-bucket --bucket prefix-example-com-state-store --region eu-west-1

Listing 3.7: Creating an S3 bucket for storing cluster state

After setting up the S3 bucket and the Route 53 hosted zone, it is time to create the cluster. This is done by using Kops and referencing the domain address and S3 bucket through environment variables. After this the cluster can be created through a simple command, see listing 3.8.

1 $> export NAME=mycluster.example.com
2 $> export KOPS_STATE_STORE=s3://prefix-example-com-state-store
3 $> kops create cluster \
4      --zones eu-west-1a \
5      ${NAME}

Listing 3.8: Launching a cluster with Kops
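Depending on the Kops version, kops create cluster may only write the cluster specification to the state store. A sketch of the commands that then actually build and verify the cluster:

# Apply the stored specification and create the AWS resources
$> kops update cluster ${NAME} --yes
# Poll until the master and the nodes report ready
$> kops validate cluster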

After having the cluster set up, there is only one step left in order to access the dashboard. When using Kops the dashboard is not started by default. It can be started with the command seen in listing 3.9, which fetches the official dashboard configuration file for Kubernetes and deploys it.

1 $> kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.5.0.yaml

Listing 3.9: Launching the Kubernetes dashboard
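The dashboard is not exposed publicly by default; one way to reach it is through the local proxy built into kubectl. A minimal sketch:

# Open an authenticated local proxy to the cluster's API server
$> kubectl proxy
# The dashboard is then reachable in a browser at http://localhost:8001/ui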

3.3 Deploying the application

In order to deploy the application to the cloud, configuration files for Kubernetes were implemented. There are 4 files, which cover a deployment and a service object for the application, as well as a deployment and a service object for the database used. The application was separated from the database, since the database was not deemed important for testing the performance of the application. The files used can be seen in Appendix A.

The database instance was started first in order for the Meteor application to have something to connect to. The database was not automatically scaled at all, since the tests were performed to focus solely on the application instance and the load from the HTTP requests. The mongo-disk referenced at line 28 in the deployment file for the database is a drive stored at Google Cloud which has 200 gigabytes of storage and will persist after the replicas for the database have been taken down. This is to protect the database in case anything should happen to the MongoDB replicas. From here on, the deployment process for the different clouds will be explained, going through how to set up the applications through the built-in dashboard for Kubernetes. From the dashboard, the deployment is the same for all clouds.

3.4 Using the Kubernetes Dashboard

When the cluster has been set up and runs on each of the cloud services, one can begin to deploy the desired application. In order to make the deployment replicable and the same on all cloud services, the Kubernetes dashboard was utilized, since its graphical user interface makes it easy to replicate what is going on.

A deployment and a service were set up, and the chosen way to do this was to use the predefined configuration files mentioned previously. It is also possible to set this up directly in the dashboard with the help of some forms. But since the configurations are the same over all cloud platforms, it was easier to use a configuration file instead of repeating the same steps on every cloud platform.

In order to add a deployment object, all that is needed is to go to the Deployments route in the left menu of the dashboard and upload a new deployment file. See figure 3.1.

Figure 3.1: Adding a deployment object in the Kubernetes Dashboard
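The dashboard upload in figure 3.1 has a direct command-line equivalent, which can be handy for scripting the same deployment; a minimal sketch with hypothetical file names for the objects in Appendix A:

# Create the deployment and service objects from their configuration files
$> kubectl create -f app-deployment.yaml
$> kubectl create -f app-service.yaml
# Check that the objects came up
$> kubectl get deployments,services,pods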

3.5 Testing with Locust

The test uses simple HTTP requests such as GET and POST in order to send requests to the site, log in to the interface using a dummy account and navigate around in the SIMS dashboard.

In order to test the cloud services and the load balancing, the test cases were implemented with Locust. The file used for setting up the test cases can be seen in listing 3.10. An explanation of the test cases follows below.

1 from locust import HttpLocust, TaskSet
2
3 def indexRoute(l):
4     l.client.get("/")
5
6 def stationRoute(l):
7     l.client.get("/stations")
8
9 def usersRoute(l):
10     l.client.get("/users")
11
12 def mapRoute(l):
13     l.client.get("/map")
14
15 def loginRoute(l):
16     l.client.post("/login", {"login-email":"admin@example.com", "login-password":"admin"})
17
18 class UserBehavior(TaskSet):
19     tasks = {indexRoute: 1, stationRoute: 2, usersRoute: 3, mapRoute: 4}
20
21     def on_start(self):
22         loginRoute(self)
23
24 class WebsiteUser(HttpLocust):
25     task_set = UserBehavior
26     min_wait = 5000
27     max_wait = 9000

Listing 3.10: Test cases developed with Locust. Filename loadtest.py

First off, there are four task functions visiting different routes, indexRoute, stationRoute, usersRoute and mapRoute, plus a loginRoute function. The task routes are given different weights, as seen on line 19; a higher weight means the task is executed more often, so mapRoute is visited the most and indexRoute the least. As mentioned earlier, each simulated user starts by logging in to the application, as seen in the function on_start on line 21.

In order to run the test cases above, the command in listing 3.11 is issued.

1 $> locust -f locust_files/loadtest.py --host=http://example.com

Listing 3.11: Starting up Locust

When navigating to the Locust website, the user is prompted to select how many users should be launched for the test, and the hatch rate, meaning how fast the number of users should increase over time until reaching its full amount. An image of this view can be seen in figure 3.2. Figure 3.3 shows the dashboard with the requests happening in real time and their current results for successes, errors and the time per request.

Figure 3.2: Launching virtual users in Locust

4 RESULTS AND DISCUSSION

The main result achieved in this master's thesis is an application deployed in the same way on three different cloud platforms. The load testing results are presented below, followed by a discussion of the thesis.

4.1 Verifying Autoscaling capabilities

In order to answer the research question stated in the beginning of the report, How can these cloud services be tested for performance?, Locust was set up to launch requests against the sites deployed on the different cloud services.

The scaling of the application was set up using the Horizontal Pod Autoscaler (HPA) in Kubernetes with a CPU target of 80% and a maximum of 8 replicas. The results acquired can be seen in the figures below.


Figure 4.1: CPU and Users over Time for Google Cloud

Figure 4.2: Replicas over time for Google Cloud

Figure 4.3: CPU and Users over Time for AWS

Figure 4.4: Replicas over time for AWS

Figure 4.5: CPU and Users over Time for Azure

Figure 4.6: Replicas over time for Azure

As seen in the figures above, there are two figures per cloud service. The left figures show the CPU percentage used (red bars) and the number of users actively accessing the site (blue bars), with the time passed on the X-axis. The right figures show the number of replicas the cloud service is using for the application on the Y-axis, again with time on the X-axis. The tables used to generate these graphs can be seen in Appendix B.

4.2 Discussion

From here on, the different cloud services, the technology used and the results achieved from the load testing will be discussed.

4.3 Cloud services

The three different cloud services tested have their strengths and weaknesses. Summarized below are my thoughts about Google Cloud, Amazon Web Services and Microsoft Azure.

• Google Cloud
  + Native support for Kubernetes
  + Highest performance
  + Per-minute billing
  - Few cloud regions

• Amazon Web Services
  + Many regions available, soon also in Sweden
  - No native support for Kubernetes
  - No per-minute billing

• Microsoft Azure
  + Native support for Kubernetes
  + Per-minute billing
  - Lowest performance


4.3.1 Summarizing the Cloud Services

Should I recommend one cloud service, I would definitely recommend Google Cloud. Google was the most intuitive and easiest to set up, and it is the cheapest and highest-performing cloud of the three tested according to [15]. While it has few cloud regions to choose from, it is actively expanding and there are at the moment 9 new regions planned[54]. While both Amazon Web Services and Microsoft Azure are worthy competitors, they fall short in both intuitiveness when setting up Kubernetes and pricing of their services. They do however have more regions; Amazon stands out here especially with their expansion to Sweden[18]. If having many regions is important for your company, it might be better to go with Amazon. When going down to the application level, using a container orchestrator such as Kubernetes makes it easy to switch between clouds, and users may try out different clouds to find out what suits them best.

4.4 Using Docker and Kubernetes

In order to test the autoscaling capabilities of the clouds and compare them with each other, the example application had to be deployed to these services. How to deploy the application was discussed in the Method chapter.

4.4.1 Docker

This thesis used Docker to deploy the application to the cloud services. Docker was found to be a very good approach for packaging applications and setting up highly scalable applications. Setting up Docker can be tedious if it is not introduced at the beginning of a project, but the time it saves during further development, together with the ease of deploying an application to any cloud, indicates that using Docker saves time overall. This of course depends on the type of application, but the ability to deploy on barebone infrastructure should prevail over having a specialized Platform as a Service solution for every application. When deploying to Infrastructure as a Service platforms, however, a container orchestrator is needed, as mentioned in the chapters above.
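As an illustration of what this packaging looks like, below is a minimal Dockerfile sketch for a Python web application. The base image, file names and port are assumptions made for the example, not the actual configuration used in the thesis:

    # Start from an official Python base image (assumed here).
    FROM python:3
    WORKDIR /app

    # Install the dependencies first so that this layer is
    # cached between builds when only the source code changes.
    COPY requirements.txt .
    RUN pip install -r requirements.txt

    # Copy the application source code into the image.
    COPY . .

    # The application is assumed to listen on port 8000.
    EXPOSE 8000
    CMD ["python", "app.py"]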

4.4.2 Kubernetes

The choice for this thesis was to use Kubernetes as a container orchestrator. Using Kubernetes proved to be a good option for having a replicable process when deploying applications. However, for some cloud services there was more work setting it up than for others. Amazon, for example, required setting everything up manually by combining several of their services, whereas Google Cloud and Azure had built-in installers for launching it. With Kubernetes' built-in Horizontal Pod Autoscaler (HPA) it was easy to scale the example application both up and down depending on the load of the application.

4.4.3 Security in Docker

The security of using Docker was examined in chapter 2. It was found that using container techniques can open you up to certain attacks, such as the ARP spoofing attack. This can however be avoided by using host hardening modules such as SELinux, AppArmor or Seccomp. While the default profiles for these might be quite open, they can be highly customized to suit one's security needs and safely protect applications from outside attacks.

Distributing Docker images can also be a problem. The content trust feature of Docker can prevent this by having the creators of Docker images sign them before publishing them to a public Docker registry. The user can then verify which image is being pulled from the public registry and confirm whether or not it is the intended one.
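In practice, content trust is enabled through an environment variable for the Docker client. A minimal sketch, where the image name is hypothetical:

    # Enable content trust for all docker commands in this shell.
    export DOCKER_CONTENT_TRUST=1

    # Pushing now signs the image with the publisher's key, and
    # pulling fails unless a trusted signature exists for the tag.
    docker push example/webapp:1.0
    docker pull example/webapp:1.0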

4.5 Results

As can be seen in the figures from testing the autoscaling capabilities of the cloud services, the desired scaling occurred when the replicas reached more than 80% CPU usage. After a rescaling, the CPU usage remains above 80% in some cases. This is because some time can pass from the scaling decision until the new instance actually starts operating. There is also a delay before another scaling can occur, to avoid situations where a scaling has happened but is not yet visible in the metrics.

Looking at the figures for the different cloud services, we can see that the difference between them is noticeable, but not large. Google Cloud is the only one creating 4 replicas. Both Azure and AWS are, however, very close to breaking the limit for creating another replica when they are at 3 replicas. AWS is the service with the highest CPU usage. This happened because it did not break the 80% barrier at 500 users, leaving it with only 1 replica available when 1000 users were launched against the site. AWS corrected this nicely by creating 2 replicas at once instead of one at a time.

Looking at the results, it is hard to draw any general conclusions. Since the test was isolated to a single application, it is hard to say which cloud is the best, or which service has the best performance. Since they are all hosted in different locations, albeit all in Europe, there could be some difference in response time. The results do however show that the autoscaling capabilities function as intended and that it is possible to scale out the number of replicas.


5 Conclusion

This chapter summarizes the thesis and tries to answer the research questions that were stated in chapter 1. It also discusses the aim of the thesis and whether or not it has been achieved. The section on future work concludes the thesis.

5.1 Research questions

• What kind of cloud services can and should be used to host scalable IoT applications? This thesis has successfully tested the cloud services Google Cloud, Amazon Web Services and Microsoft Azure. The result was a scalable, containerized web application deployed in the same way on all three clouds. In order to have a replicable deployment method, the container orchestrator Kubernetes was used, which was supported to varying degrees on all three services. Other cloud services that also support Kubernetes can be used as well, and the deployment process for applications should be exactly the same.

• How can these cloud services be performance tested so that the scaling of the applications happens in a suitable way?

Any load testing application that can connect to a website and launch GET or POST requests can be used. This thesis used Locust to performance test the applications. In order to scale the application properly, it was deployed to a Kubernetes cluster. Kubernetes could then monitor the CPU usage of each container it deployed and add or remove containers when needed.

• Are container techniques such as Docker suitable for deploying applications to the cloud?


Docker was found highly suitable for deploying containers to the cloud. With the help of Kubernetes, one can utilize the flexible pricing models that different cloud services offer. It is also possible to easily scale an application up to a desired number of servers in order to handle large user loads, as well as to scale it down when usage is low.

• How secure is it to host applications with container techniques such as Docker? There were some security problems found when using Docker, such as the possibility of ARP spoofing attacks. These can however be avoided by using host hardening modules to strengthen the security of a container. Another security issue mentioned in chapter 2 was the distribution of Docker images. A solution to this is to use the built-in Content Trust feature of Docker.

5.2 Aim

The aim of this thesis was to provide guidelines to Attentec about what is needed to deploy their scalable applications to a cloud service, and which hosting service suits their needs best.

In order to reach this aim, different cloud services as well as container techniques and container orchestrators were examined. The thesis found that wrapping applications in Docker containers and using Kubernetes to deploy them to the selected cloud services worked very well. With this method, Attentec can choose from a variety of cloud services. The thesis tested Google Cloud, Amazon Web Services and Microsoft Azure. All three had a similar way of deploying the application once Kubernetes was set up. What differed most was how to set up a Kubernetes cluster on these services, and in the author's opinion the easiest way to set it up was using Google Cloud. It did, however, not have as many regions available for its servers, which might make the other cloud services a better choice depending on what application is being hosted.

5.3 Future Work

Future work on the subject of this thesis could span various directions. For example, there could be an investigation of how to further automate the deployment process to the cloud services. Tools such as Terraform [55] could automate the process even further by making configurations for how the clouds should be created and used, essentially writing the infrastructure as code. This thesis has focused on the three largest cloud providers. There are other providers, such as DigitalOcean or IBM Cloud, which might provide better support for more

(48)

More work could be done on investigating how to scale databases. This thesis has focused solely on scaling applications and has not considered databases with major data distribution. Finally, the different laws and regulations that may apply to data integrity, depending on in which country the cloud servers are located, could be investigated.


Bibliography

[1] Attentec. www.attentec.se. [Online; accessed 3-February-2017].
[2] Vinodkumar Tiwari. "Study of Internet of Things (IoT): A Vision, Architectural Elements, and Future Directions". In: International Journal of Advanced Research in Computer Science 7.7 (2016).
[3] Gartner, technology research. http://www.gartner.com/technology/home.jsp. [Online; accessed 3-February-2017].
[4] Gartner Hype Cycle 2011 Figure. http://gold-group.com/2013/10/22/qr-codes-beyond-hype-cycle/. [Online; accessed 6-April-2017].
[5] Gartner press release 2016. http://www.gartner.com/newsroom/id/3354117. [Online; accessed 3-February-2017].
[6] Microsoft Azure. https://azure.microsoft.com. [Online; accessed 3-February-2017].
[7] Google Cloud. https://cloud.google.com/. [Online; accessed 3-February-2017].
[8] Amazon Web Services. https://aws.amazon.com/. [Online; accessed 3-February-2017].
[9] Docker. https://www.docker.com. [Online; accessed 3-February-2017].
[10] Peter M. Mell and Timothy Grance. "SP 800-145. The NIST Definition of Cloud Computing". In: (2011).
[11] Ben Kepes. Understanding the Cloud Computing Stack: SaaS, PaaS, IaaS. https://support.rackspace.com/whitepapers/understanding-the-cloud-computing-stack-saas-paas-iaas/. 2013.
[12] Cloud Service Models Figure. http://technologyadvice.com/blog/information-technology/iaas-vs-paas/. [Online; accessed 11-June-2017].
[14] OpenShift. https://www.openshift.com/. [Online; accessed 3-February-2017].
[15] Cloud Benchmark Comparison. http://www.journaldunet.com/us-cloud-benchmark/JDN-CloudScreener-Cedexis-us-cloud-benchmark.pdf. [Online; accessed 30-May-2017].
[16] Google Cloud Regions. https://cloud.google.com/compute/docs/regions-zones/regions-zones. [Online; accessed 20-June-2017].
[17] Amazon Web Services Regions. https://aws.amazon.com/about-aws/global-infrastructure/. [Online; accessed 20-June-2017].
[18] AWS. https://aws.amazon.com/blogs/aws/coming-in-2018-new-aws-region-in-sweden/. [Online; accessed 3-February-2017].
[19] Azure Regions. https://azure.microsoft.com/en-us/regions/. [Online; accessed 20-June-2017].
[20] Luis M. Vaquero, Luis Rodero-Merino, and Rajkumar Buyya. "Dynamically scaling applications in the cloud". In: ACM SIGCOMM Computer Communication Review 41.1 (2011), pp. 45-52.
[21] wrk. https://github.com/wg/wrk. [Online; accessed 5-June-2017].
[22] Apache JMeter. http://jmeter.apache.org/. [Online; accessed 5-June-2017].
[23] Tsung. http://tsung.erlang-projects.org/. [Online; accessed 5-June-2017].
[24] Locust. http://locust.io/. [Online; accessed 6-February-2017].
[25] Python. https://www.python.org/. [Online; accessed 11-June-2017].
[26] Hypervisors. https://en.wikipedia.org/wiki/Hypervisor. [Online; accessed 20-June-2017].
[27] Hypervisor vs Container virtualization Figure. https://www.twistlock.com/resources/all-about-container-technology/. [Online; accessed 6-April-2017].
[28] Rkt Container Service. https://coreos.com/rkt. [Online; accessed 3-February-2017].
[29] Nginx. https://www.nginx.com/resources/wiki/. [Online; accessed 6-April-2017].
[30] MongoDB. https://www.mongodb.com/. [Online; accessed 6-April-2017].
[31] Redis. https://redis.io/. [Online; accessed 6-April-2017].
