Improving Software Deployment and Maintenance: Case study: Container vs. Virtual Machine

Degree Project in Computer Engineering, First Cycle, 15 credits
Stockholm, Sweden 2018

Improving Software

Deployment and Maintenance

Case study: Container vs. Virtual Machine

OSCAR FALKMAN MOA THORÉN

KTH


Abstract

Setting up one's software environment and ensuring that all dependencies and settings are the same across the board when deploying an application can nowadays be a time-consuming and frustrating experience. To address this, the industry has come up with an alternative deployment environment called software containers, or simply containers. These are intended to eliminate the current troubles with virtual machines and create a more streamlined deployment experience.

The aim of this study was to compare this deployment technique, containers, against the currently most popular method, virtual machines. This was done through a case study in which an already developed application was migrated to a container and deployed online using a cloud provider's services. The application was then deployed via the same cloud service directly onto a virtual machine, enabling a comparison of the two techniques. During these processes, information was gathered concerning the usability of the two environments. To gain a broader perspective on usability, an interview was conducted as well, resulting in more well-founded conclusions.

The conclusion is that containers are more efficient in their use of resources. This could improve the service provided to customers by improving its quality through more reliable uptimes and speed of service. However, containers also grant more freedom and transfer most of the responsibility to the developers. This is not always a benefit in larger companies, where regulations must be followed, where a certain level of control over development processes is necessary, and where quality control is very important.

Further research could examine whether containers can be adapted to another company's current environment, and how different cloud providers' services differ.

Keywords

Docker; Kubernetes; Container; Virtual Machines; Cloud services;

Deployment; Maintenance


Improving Software Deployment and Maintenance. Case study: Containers vs. Virtual Machines

Abstract (Swedish)

Setting up and configuring one's development environment, and ensuring that all dependencies and settings are the same everywhere when deploying an application, can nowadays be a time-consuming and frustrating process. To improve this, the industry has developed an alternative deployment environment called "software containers", or simply "containers". These are intended to eliminate the current problems with virtual machines and create a more streamlined deployment experience.

The aim of this study was to compare this new deployment technique, containers, with the most widely used technique today, virtual machines.

This was carried out through a case study, in which an already developed application was migrated to a container and then deployed publicly through a cloud-based service. The application could then be deployed via the same cloud-based service but onto a virtual machine instead, enabling a comparison of the two techniques. During this process, information was also collected on the usability of the two techniques.

To gain a more nuanced perspective on usability, an interview was also held, which resulted in somewhat more well-founded conclusions.

The conclusion reached was that containers are more resource-efficient. This can improve the service offered to customers by improving its quality through reliable uptimes and speed. However, a container solution means that more freedom, and thus also more responsibility, is transferred to the developers. This is not always an advantage in larger companies, where rules and restrictions must be followed, a certain control over development processes is necessary, and strict quality control is often very important.

Further research can be carried out to investigate whether containers can be adapted to a company's current development environment. Different cloud services for deploying applications, and the differences between them, are also an area for further investigation.

Keywords

Docker; Kubernetes; Container; Virtual machines; Cloud services;

Deployment; Maintenance


Table of Contents

1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Goal
1.5 Methods
1.6 Limitations and Delimitations
1.7 Outline
2 Containers and virtual machines as deployment techniques
2.1 Early Deployment Methods
2.2 Virtualization & Virtual Machines
2.3 Container Virtualization
2.4 Comparison Between (Docker) Containers and VMs
2.5 Literature and Related Work
3 Methodology
3.1 Research Approach
3.2 Outline of Research Process
3.3 Data Collection Methods
3.4 Tools
3.5 Project method
3.6 Documentation and modelling method
3.7 Evaluating Performance and Usability
4 Case study: Application Deployment
4.1 Phase 1: Container
4.2 Phase 2: Virtual Machine
5 Benefits and Drawbacks of Using Containers
5.1 Results from Performance Tests
5.2 Personal Observations and Usability
5.3 Interview
5.4 Theoretical Material
5.5 Conclusions
6 Discussion
6.1 Methods
6.2 Research Result Revisited
6.3 Ethical Aspects
6.4 Sustainability
6.5 Future Work/Research
Appendix A - Code
Appendix B - Interview (in Swedish)


1 Introduction

As software has evolved, it has become increasingly dependent on other resources, either from public repositories or from other parts of one's own software. As more and more outside code is included, deployment and maintenance complexity increases. This created the need to isolate the environment of each application in order to increase security and stability, and virtual machines thus became an incredibly important part of software deployment. Nowadays, however, with the rise of microservices and an increasing need for efficiency, virtual machines do not always meet all the needs of service providers. In the midst of this, software containers made an entrance, causing a lot of buzz in the industry because of their small size and minimal overhead. Is all the praise and hype really valid? Can containers help providers increase their efficiency? These are some of the questions this thesis tries to answer.

1.1 Background

1.1.1 Software Deployment and Maintenance

The configuration needed to set up one's workspace before starting a project is something most software developers would rather skip. It can be time-consuming, frustrating and, not to mention, repetitive, since the process often has to be repeated each time a new project is started [1].

However, skipping that part has not been an option. This changed with the introduction of container environments. Now, developers can configure the workspace once and simply reuse that configuration when starting a new project, greatly increasing the time developers can be productive [1].

Deployment is often a daunting task for larger companies, since it often means managing many different integrations at once and making them cooperate. It can also mean that release management has to write several build and configuration scripts that have to be bug-free, or the company will have bugs in production even though the software itself might be reasonably bug-free. This is just one of many things that containers aim to solve.

In medium- to large-scale companies, containers could have an even greater impact, since they could replace the need for the huge deploy and build scripts that have to be run for each server instance [1]. Instead, one could configure the application and environment once and quickly push this to all servers at the same time, greatly improving the production flow and even minimizing the number of errors during deployment.


1.1.2 Software Containers

A container contains the software and everything it needs to run, making the software both stand-alone and system-independent [2]. This means that it can run on any platform (system-independent) and does not depend on any other program (stand-alone). One could describe it as a miniature OS, but one that only needs to run the processes of a single task rather than an entire OS (as in a virtual machine), making containers smaller in size and less resource-hungry.

It also helps to sandbox the software, making it easier to create a secure environment, not only from a security perspective [3] but from a developer's perspective as well, since it is possible to isolate different releases from each other. These factors suggest that containers could be beneficial for scaling a program [1].

1.1.3 Kubernetes

Kubernetes is an orchestration manager: a tool for managing, scheduling and deploying clusters. A cluster is a collection of, in this case, containers that either communicate with each other or work together. Kubernetes assists with controlling the containers and provides various helper functions, e.g. restarting containers or keeping services running at all times. Kubernetes thus makes it possible to handle scalable services, even if the containers are not running on one's own computer.
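As a concrete illustration of the helper functions described above, a minimal sketch of a Kubernetes Deployment manifest is shown below. The image name `my-app:1.0` and all labels are hypothetical, not taken from the thesis; the manifest simply asks Kubernetes to keep three replicas of a container running and to replace them if they fail.

```shell
# Hypothetical sketch: ask Kubernetes to keep 3 replicas of a
# container image running at all times (all names are made up).
cat > deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3            # Kubernetes restarts/replaces pods to hold this count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0   # the container image to schedule
EOF

kubectl apply -f deployment.yaml   # hand the desired state to the cluster
```

Kubernetes then continuously reconciles the cluster toward this desired state, which is what allows it to keep services running without manual intervention.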

1.1.4 Tradedoubler - The Main Stakeholder

Tradedoubler is a pioneering affiliate-marketing business whose purpose is to simplify advertising on the Internet. Tradedoubler acts as a broker between advertisers and publishers to simplify their interaction.

It gives advertisers the ability to create marketing campaigns and see detailed statistics on how, when and from where their sales were generated.

Moreover, Tradedoubler gives publishers the ability to use profitable advertisements on their sites, such as blogs, by using campaigns created by the advertisers in the network. Tradedoubler also handles back-end tasks, like invoicing advertisers and creating payments for the publishers, so that the transactions between these parties go smoothly.

As Tradedoubler's affiliate network technology has evolved over the years, it has resulted in a heterogeneous environment deployed across several servers. Creating an isolated development environment, and then easily testing and deploying changes, is becoming increasingly challenging. An isolated development environment is one where the software or OS you are running does not interfere with any other software that is currently running. This ensures the software does not depend on anything unexpected, and that the developer is not trying to solve bugs that exist within other software.



To make development and deployment easier, Tradedoubler has started to create applications within Docker containers.

1.2 Problem

1.2.1 Issues with Current Deployment Methods

One of the most used deployment technologies in the industry at the moment is the use of virtual machines (VMs) along with scripts for deployment. While this is a working solution, it is an unnecessarily complicated and bug-ridden process.

Virtual machines require servers to run a full OS, even though all the features provided by a full OS might not be necessary, and the OS has to be configured before being used [4, pp. 3–6]. Another matter of concern when using VMs is resources: a virtual machine requires the resources of an entire OS, even when running a single application [5]. As an example, when using a VM on a server, the VM requires a fixed amount of RAM on the server (or all of it). RAM is crucial for complex calculations and tasks, but also for being able to run several processes at the same time and for providing those processes with working memory [6]. The RAM that is allocated to the VM will be unavailable to the host, as well as to any other machines connected to it, making it troublesome to run several sandboxed environments at the same time (a common scenario for security reasons).

This forces one either to predict the maximum amount of RAM the machines will utilize or to add additional resources to account for worst-case scenarios.

RAM allocation and management is not the only issue with VMs, however; storage is another problem, since a modern OS can take up more than 32 GB of storage [7]. Imagine running four VMs on the same machine: the fixed storage would take up an entire 128 GB disk, not including any programs that might have to be installed on them.
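A small sketch can make the fixed-allocation point concrete. The commands below are a hypothetical example using QEMU (the sizes and file names are made up, and not from the thesis's setup): the RAM given with `-m` is reserved for the guest whether or not the guest uses it, and each guest needs its own OS disk.

```shell
# Hypothetical sketch of provisioning one VM with QEMU.
# Create a disk image with a 32 GB capacity for the guest OS (one per VM):
qemu-img create -f qcow2 guest1.img 32G

# Boot the VM with 4 GB of RAM pinned to it; that memory is unavailable
# to the host and to other VMs for as long as the guest runs:
qemu-system-x86_64 -m 4096 -hda guest1.img -cdrom ubuntu-server.iso
```

Running four such guests would reserve four full OS disks and four fixed RAM allocations up front, which is exactly the over-provisioning problem described above.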

Disregarding the technical issues, there is also the need to configure the environment for each server. The servers might have different software versions, minor changes, or even different operating systems, each of which might require its own unique configuration.

1.2.2 Containers as an Alternative

It is worth exploring whether containers are a valid alternative for the deployment process, and whether they improve on the current experience by minimizing the amount of configuration and resources needed.

The question of when either deployment technique is suitable is also of interest, as well as what difference there is between VMs and containers regarding performance, and whether optimizations can be made in this field as well.


1.2.3 Research Questions

Given the potential of containers, we are interested in how they could be used to improve today's complicated deployment techniques and in what environments they are suitable.

Hence, the aim of this thesis is to uncover the benefits and drawbacks of using containers when deploying applications, both in a personal and in a business environment, and to present conclusions based on these findings. This leads us to the following research question:

RQ: What are the benefits and drawbacks of using containers for software deployment and maintenance in personal and business environments?

1.3 Purpose

The purpose of the thesis is to disclose what benefits and potential drawbacks container-based applications, deployed with Kubernetes, have over traditional application deployments on physical servers.

The results will help Tradedoubler decide whether they should invest in the use of containers together with Kubernetes for deployment. If containers prove to be a good solution for large-scale deployment, this could potentially increase their efficiency and improve developer satisfaction. However, should the conclusion be that containers have significant drawbacks, it could prove better to use the old-fashioned way instead: virtual machines and build scripts.

The results could benefit others by helping them make more informed decisions concerning deployment when in the process of developing applications. Other companies might benefit from them as well, since they could be influenced to either use or not use containers in order to improve their efficiency. The results could also prove to be beneficial to individuals, even if there are issues for larger companies.

1.4 Goal

The goal is to provide information about application deployment with containers, with its pros and cons compared to deployment on virtual machines. We also intend to back up our conclusions with statistics wherever possible.

A case study is performed in which a stateful sample application is created using Docker containers and then configured as a Kubernetes cluster locally using Minikube. The application is then deployed on Google Cloud Platform's Kubernetes Engine.
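The workflow just described can be sketched as a short sequence of commands. This is an assumed outline, not the thesis's actual scripts; the image, manifest and cluster names are hypothetical.

```shell
# Hypothetical outline of the case-study workflow.
docker build -t sample-app:1.0 .       # package the app as a container image
minikube start                         # start a local single-node Kubernetes cluster
kubectl apply -f app-deployment.yaml   # run the app as a cluster locally

# ...verify locally, then deploy to Google Kubernetes Engine:
gcloud container clusters create test-cluster
kubectl apply -f app-deployment.yaml   # the same manifest, now in the cloud
```

The point of the sketch is that the same container image and manifest move unchanged from the local Minikube cluster to the cloud provider's cluster.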

The results of the application deployments, regarding performance and usability, are documented, evaluated and discussed.


The report will also present the benefits and potential drawbacks of container-based applications deployed with Kubernetes in comparison to traditional scaled application deployment on physical servers.

1.5 Methods

To reach the goals of the thesis and acquire the correct and necessary results, a strategy for carrying out the project is needed. This strategy consists of methods that will ensure the project is conducted successfully.

The research methods can be divided into two categories, qualitative methods and quantitative methods, although some methods can belong to both.

This thesis is conducted using qualitative research methods in order to fulfill its purpose and goals. A smaller set of data derived from a case study is investigated and analyzed in order to formulate a theory, making a qualitative approach more suitable than a quantitative one.

To answer the research question, several activities are conducted. A literature study is performed in order to provide sufficient background information for a deeper understanding of the relevant area. A case study, in which an application is deployed using containers and VMs, is also performed and analyzed with performance and usability in mind. An interview is conducted in which an employee at Tradedoubler provides useful insights into the current solution.

1.6 Limitations and Delimitations

As a basis for the thesis, the option of using bare metal as a deployment technique is disregarded, since its use case is not really comparable with the virtualization techniques. Bare-metal provisioning is also not of interest to Tradedoubler because of the company's growing nature.

There are many different virtual machines, several different OSs, different applications and container types. However, we have limited ourselves to the most used or accessible option in each field, thus using Docker for containers, Ubuntu Server as the OS, and a simple self-developed application [8], [9]. Using the simple application, we hope to show the minimal differences, although it is of course not possible to capture all of them.

We have also been limited to using Google Cloud Platform, since cloud servers are expensive and we could therefore only test the one Tradedoubler kindly lent us access to. We also limit ourselves to Kubernetes as an orchestration manager, since it is the one of interest to Tradedoubler and we lack the resources to test another.


1.7 Outline

The remainder of the thesis is structured as below:

● Chapter 2 defines the background of the thesis and establishes the information necessary to fully understand the rest of the thesis, for example a more thorough explanation of what a container is and what the deployment processes look like.

● Chapter 3 gives an overview of the methods that were used during the project, including the research and practical methods used and discussions about why they were chosen.

● Chapter 4 describes the case study we performed. This includes the development of the Docker container, the Kubernetes and virtual machine deployment, both locally and online on the Google Cloud Platform, and the testing on both virtual machines and containers.

● Chapter 5 compiles the results, which consists of those from the testing, together with the replies to the interview.

● Chapter 6 is a discussion of the results and the information we gained during the literature study. It also contains a summary of the conclusions we have reached and discusses directions for future research.


2 Containers and virtual machines as deployment techniques

This chapter explains several of the central concepts mentioned in the text that may be needed for a deeper understanding of what the project is about.

2.1 Early Deployment Methods

In the beginning of the internet, there was not much traffic, and one could simply run applications directly on the computer. As more and more people gained access to the internet, the number of servers needed increased, but the resources on each server were not fully utilized. This was very costly, since the hardware sat using only a certain percentage of its full capacity. At the same time, applications grew increasingly complex and dependent on other software, which could also cause issues between applications. Furthermore, hackers grew more sophisticated, and having the applications directly on the server could increase the potential damage if a security issue was found.

The search for an answer to all of these troubles led to an interest in virtualization.

2.2 Virtualization & Virtual Machines

Today, virtualization is one of the most used deployment technologies, most commonly in the form of virtual machines together with deployment scripts.

Virtualization is the act of abstracting one or more components to simulate something real. In our case, this refers to simulating the components and resources of a computer on top of existing hardware, making it possible to simulate multiple resources, such as processors or disks, with just a single resource. This means that one is able to run software not designed for the real system, without changing the software or the system. [10, pp. 32–33]

However, it is not only subsystems that can be virtualized; virtualization can also be applied to an entire system, thus creating a virtual machine. Virtual machines make it possible to run different architectures and/or operating systems on a single machine. They have grown very popular with the introduction of cloud platforms and scaling, since virtual machines make it possible to provide efficient and secure environments for each cloud user. [11], [10, p. 36]

2.2.1 VMM

The virtual machine monitor (VMM), also known as the hypervisor [12, p. 68], is the virtualization software that emulates the interfaces and resources so that the guest operating system can interact with the hardware believing it is running on the real hardware. There are two different types of hypervisors: type 1 and type 2.

2.2.1.1 Type 1 Hypervisor

A type 1 hypervisor is a VMM that runs on bare metal, meaning it runs directly on the hardware, with no operating system or anything else in between. It can then run the guest operating system as a user process.

When a kernel call is executed by the guest operating system, a trap to the kernel occurs. This enables the VMM to inspect the issued code. If the code was issued by the guest kernel, it is forwarded to be run by the kernel. If a user process issued the instruction, the VMM emulates what the actual kernel would have done had the instruction been executed on an installed operating system. [12, pp. 569–570]

Type 1 hypervisors are fast: since they are very close to the hardware, they can run the virtual OS at the same speed as an actual OS, and they do not need to share resources with an actual OS.

2.2.1.2 Type 2 Hypervisor

Type 2 hypervisors run, as opposed to type 1 hypervisors, as user processes on top of an operating system. User processes run like normal processes; however, before running code, the hypervisor scans through it and replaces all sensitive calls, such as kernel calls, with procedures of the hypervisor that can handle the calls. This is called binary translation. [12, p. 570]

These hypervisors use a lot of optimizations and caching, which enables them to execute user programs at near bare-metal performance. For system calls, they can actually outperform type 1 hypervisors in the long run, because trap calls ruin many of the caching features in the actual CPU, while with binary translation these are all left alone. This has led to type 1 hypervisors using binary translation for some calls as well, in order to increase performance. [12, p. 570]

2.2.2 Usages

As mentioned earlier, virtual machines enable efficient use of physical resources by making it possible to divide the physical resources while still giving the illusion of access to the entire resource. This makes it possible to reduce operating costs by reducing the amount of unused resources, and thus to offer the same performance with fewer resources for the company and/or cloud provider. It also improves the service provider's situation, since application performance can be improved while costs decrease: more resources can be allocated when needed and released when they are not. It is possible to allocate more physical resources to a machine, or reduce them, on the fly, or to migrate (move) virtual machines between different physical machines without shutting down the virtual machine. [11] This makes virtual machines excellent for cloud computing, where migration and/or resource management is common and can otherwise be quite costly. It can also improve ordinary self-hosting, though the benefits may not be as large.

Furthermore, virtual machines are used to provide secure sandboxing for each user, since each user gets access to a single system. If a user tries to do something that is not allowed, or tries to hack the system, all they get access to is their own environment; at worst, they only destroy their own environment, while other users are unaffected. It is also easier to place restrictions on resource access on a virtual machine, making it harder for users to throttle the services by various means. [10, p. 36]

2.2.3 Issues with Virtual Machines

So, what are the issues with virtual machines? Although they are a well-established, working solution, the process is still fairly bug-ridden and unnecessarily complicated.

Virtual machines require servers to run a full OS, even though the full set of features an OS provides might not be required, and the OS has to be configured before being used [4, pp. 3–6]. Furthermore, there is an issue regarding resources when using VMs, since the resources of an entire OS must be provided even when there is only one application to run [5]. As an example, when using a VM on a server, the VM takes up a fixed amount of the Random-Access Memory (RAM) on the server, or all of it. RAM is crucial for computers: it is necessary for complex calculations and tasks, but also for running several processes at the same time and providing those processes with working memory [6]. The issue with the VM taking up a fixed amount of RAM is that once the memory is taken, it becomes unavailable to the host and thus also to the other machines. This makes it troublesome to run several different sandboxed environments at the same time, a common scenario for security reasons. Furthermore, it forces you either to predict the maximum amount of RAM the machines will utilize or to add additional resources to account for worst-case scenarios.

Not only are RAM allocation and management troublesome with VMs; storage is as well, as a modern OS can take up more than 32 GB of storage [7]. Imagine running four VMs on the same machine: the fixed storage would take up an entire 128 GB disk. What is worse, that memory is just for the OSs, not including the programs that the company or developers want to use.

Past the technical issues, there is also the hurdle of configuring the environment for each server; servers might have different versions, small changes, or even different OSs, each of which might require its own unique configuration. This led to a search for something that could solve these problems but still provide the flexibility of virtual machines. Thus came the containers.


2.3 Container Virtualization

Containers are an alternative, or supplement, to virtual machines that is lightweight, portable and high-performing. However, containers are altogether different from virtual machines, because they take an entirely different approach to virtualization; a graphical comparison can be seen in figures 1 and 2. Instead of hardware (or para-) virtualization, containers virtualize the processes, simply called process virtualization. As can be seen in figure 1, this means that instead of virtualizing the entire operating system, only the user processes, A, B and C, are virtualized, together with their dependencies, bins/libs. [13], [14]

This is possible because the containers all run on top of the same kernel [15], the host OS's kernel, unlike virtual machines, which emulate the entire system including the kernel (the guest OS). The containers can then interact with the underlying system via standard system calls, giving near-native performance [16]. Since the kernel is already started with the underlying system (the host OS), containers can have an almost instantaneous startup time (excluding the time taken to launch the developer's own application), which allows for significantly decreased overhead for both storage and processes. [14]

Figure 1. The Docker structure when running applications. [17]

Figure 2. A VM structure when running applications. [17]
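The kernel sharing described above can be observed directly. The following is a small demonstration sketch (assuming Docker is installed on a Linux host): a container reports the host's kernel version, because it has no guest kernel of its own.

```shell
# The host's kernel version:
uname -r

# The same command inside a minimal container prints the same kernel
# version, since containers share the host kernel rather than booting
# their own (a VM would report its guest kernel instead):
docker run --rm alpine uname -r
```

This is also why a container can start almost instantaneously: there is no kernel to boot, only the application process to launch.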

2.3.1 Docker

Docker has almost become a synonym for container technology, but in fact there are several different technologies behind containers; Docker has, however, been the most successful in terms of popularizing container usage. So, what is the difference between Docker and other container technologies? Docker is not a technology of its own but is built upon other container technologies such as LXC (a part of Linux containers), which has been in development since 2008. What Docker did, and still does, differently is that it tries to simplify the management of these technologies, which on their own can be very hard to manage [2], [18]. By unifying these technologies and providing a simple daemon for running the management tools, Docker has succeeded in bringing containers to the masses.

2.3.1.1 Technical Background

By using LXC, it is possible for Docker to place each container in a separate environment with its own user groups, network device and virtual IP address. Together with the tools of cgroups, another component of LXC, Docker can gather metrics that allow constant monitoring, and thus handle the resource management of the containers. This makes it possible to run each container in its own environment, without the need to emulate a system. [2]
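As an illustration of this resource management, the sketch below uses Docker's resource flags to cap a container's resources (enforced by cgroups underneath) and then reads the live metrics; the image and container names are hypothetical.

```shell
# Start a hypothetical container with hard resource limits:
# at most 256 MB of RAM and half a CPU core.
docker run -d --name my-app --memory=256m --cpus=0.5 my-app:1.0

# Read live cgroup-based metrics (CPU %, memory use/limit, I/O):
docker stats --no-stream my-app
```

Unlike a VM's fixed allocation, these limits are upper bounds: the container consumes host resources only as its processes actually need them.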

Another difference is that Docker uses a layering filesystem, AUFS, which allows users to take previous containers as a base for their own application and then just add more on top. Using this system, it is easy to create images, where an image is a package of the application that includes everything needed to run it. Together with Docker's hub, a repository system not unlike GitHub, it is easy to create new projects without having to install all dependencies from scratch. AUFS also helps with downloading and updating containers: if a user has already installed an image that another application uses, Docker can reuse that image instead of requiring the user to download it again. The layering likewise helps with updating, since only the differences need to be downloaded rather than the entire image. [2], [14]

All of this ends up making the creation of containers as simple as creating a file, called a Dockerfile, in which you write what to include, install and run. Moreover, each statement in this file is one layer in the AUFS system, making it easy to control how many or how few layers your application has. [19]

2.3.1.2 Usages

The nature of Docker makes it great for microservices, because of the lightweight containers and the separation of the content. Moving monolithic services to microservices can massively improve the service performance. [20]

It is also possible to reduce costs by using Docker for deployment, something Docker advertises quite heavily, as well as providing a website for calculating the return on investments. [21]

Another reason for using Docker containers is that you can move your service between different cloud providers without a hassle, since you only need to move the data. The application and all its dependencies are handled by the Docker application. By using Docker, all you need to do is start the application and then provide the database data. [22] This also improves the updatability and patchability of the web services. It only requires you to apply the updated images and then restart the cluster, something that takes a very short time and is possible to accomplish while still having some servers running, thus decreasing the downtime of such updates. It also improves the data handling, since the data is not part of the application, reducing risks of data corruption and other kinds of data issues.
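As a sketch of how such an update can be applied while some servers keep running, a Kubernetes Deployment can declare a rolling-update strategy; the names, image tag and replica count below are assumptions for illustration only:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                  # keep three copies of the service running
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1        # at most one replica is down during an update
      maxSurge: 1              # at most one extra replica is started during an update
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:1.1   # changing this tag triggers the rolling update
```

Replicas are then replaced one at a time with the new image, so the service as a whole stays reachable throughout the update.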


Docker also massively improves the field for DevOps, since the development teams no longer work independently from the operations team, and vice versa [23]. This reduces many of the headaches that these two teams tend to have with each other, and also makes it possible to operate and evolve applications swiftly. This is because it is now much simpler to automate processes and, in addition, it decreases the number of business layers and/or other teams that the application needs to go through. In some cases, it might even be possible to merge the two teams, developers and operations, into a single DevOps team, possibly reducing company costs while maintaining the same or improved efficiency. [23] [24]

Of course, these reasons can also be seen as negative for a larger company; however, it is possible that this is simply because companies are unwilling to cede that much control to developers. One can then ask oneself whether this actually is a disadvantage, since developers now can have a more direct influence on the development process, and it does not implicitly mean that they will do whatever they want.

2.3.2 Kubernetes

As previously introduced, Kubernetes is a portable, open-source platform for automating the deployment, management and scaling of containerized applications across clusters of hosts [25] [26]. It was designed to support an extremely diverse range of workloads. Its building blocks are loosely coupled and extensible thanks to the Kubernetes API, which is used by internal components as well as by extensions and containers.

Kubernetes runs its components and workloads on a collection of nodes. A “node” refers to either a physical or a virtual machine that runs Kubernetes. A collection of nodes, together with some other resources needed to run an application, is referred to as a cluster. There are two kinds of nodes: master nodes and worker nodes. Every cluster has one master node and can have zero or more worker nodes. A cluster that is “highly available”, meaning up, running and available for use over a longer period of time, has multiple master nodes [27] [28].

The master node operates as the brain of the cluster and is largely responsible for the centralized logic that Kubernetes provides. It exposes the API to clients, keeps track of the health of other nodes, is in charge of scheduling and manages communication between components [29] [26]. It is the primary point of contact for the cluster. The worker nodes, however, are responsible for running workloads, or containers, with the help of both local and external resources. Every node must therefore have a container runtime, like Docker, installed. The master node distributes the workloads to the worker nodes, which will create and destroy containers accordingly, as well as route and forward traffic as necessary [29].

The basic operative unit in Kubernetes is called a “pod” and can be described as a collection of one or more, tightly coupled, containers that run in a shared context. An illustration of a multi-container pod can be seen in figure 3, where one container implements a file puller and the other a web server. A pod may have one or multiple applications that run within it and together implement a service. Usually, there is one container per application, although it is possible that the different containers belong to just one application [26]. A pod provides a higher-level abstraction that simplifies the deployment and management of applications. This is where the scheduling, scaling and replication of containers is handled, automatically, instead of on the individual container level [26] [27].

Figure 3. A multi-container pod with a volume for shared storage between the containers. [28]

A pod also provides automated sharing of resources. All containers within a pod share a network (IP and port space) and can locate and communicate with each other through “localhost”. This means that the usage of ports must be coordinated between the applications in a pod. Naturally, services that listen on the same port can’t be deployed within the same pod [27] [28] [26].

The containers within a pod also share volumes that can be mounted into an application's filesystem, as illustrated in figure 3. A volume is basically a directory that may or may not contain data. The aspects of a volume, such as its content and backing medium, are determined by the type of volume. A pod must specify which volumes exist in the pod and where these should be mounted within the containers. The abstraction of volumes solves, in addition to the sharing of files between containers, some issues concerning persistence of data. Files within a container are relatively short-lived, which could prove to be an inconvenience for non-trivial applications. When a container crashes, it will be restarted with a “clean slate”, meaning the files will be lost. Volumes preserve the files, since the lifetime of a volume is consistent with the lifetime of the pod rather than the container. The volume will only cease to exist when the pod ceases to exist [28] [26].
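A sketch of what such a pod could look like as a manifest, loosely following the file puller/web server example in figure 3; all names, images and mount paths are assumptions for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: two-container-pod
spec:
  volumes:
  - name: shared-data              # volume declared at the pod level
    emptyDir: {}                   # lives exactly as long as the pod does
  containers:
  - name: file-puller
    image: example/file-puller:1.0
    volumeMounts:
    - name: shared-data
      mountPath: /data             # the puller writes fetched files here
  - name: web-server
    image: nginx:1.25
    ports:
    - containerPort: 80            # ports are shared within the pod, so they must not collide
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html   # the server reads what the puller fetched
```

Both containers mount the same volume, so files written by one are immediately visible to the other, and they survive a restart of either container.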

Pods are also relatively short-lived entities. They will not withstand node and scheduling failures, nor other types of evictions that could occur because of node maintenance or lack of resources [28]. When a pod is created, it will be assigned a unique ID and then scheduled to a node, where it will run until it is either terminated according to the present restart policy or deleted. Should a node that has a pod scheduled to it suddenly become unavailable for some reason, the pod will be scheduled for deletion. It will not be rescheduled to a new functioning node, but an identical replication of the pod, with only a new UID, can replace it [28].

2.3.2.1 Minikube

Minikube³ is a tool for easily running a small instance of Kubernetes locally. A single-node Kubernetes cluster is started inside a virtual machine on the local computer. It lets both inexperienced and experienced users try out Kubernetes, or use it for day-to-day development and testing, without having to pay for a cloud provider [30].

2.4 Comparison Between (Docker) Containers and VMs

Before starting, it is important to keep in mind what was mentioned earlier: containers are not virtual machines.

Docker [22] makes the comparison that virtual machines are like houses: they have all the necessities needed in order to live there and provide protection from the outside. Containers, on the other hand, are like apartments; they also provide the protection and the basic necessities, like the houses. However, necessities such as water and electricity are shared between all the apartments in the apartment complex. It is also possible to rent or buy all sorts of apartment sizes, everything from single-room apartments, without kitchen, bedrooms and so on, to entire penthouse apartments. In a house, however, it can be very hard to find exactly what one wants; one often has to make some sort of compromise.

This is a good way of emphasizing one of the biggest differences between the two. A virtual machine will often consist of a full operating system onto which the developer has installed the application and possibly added, or removed, files, programs or dependencies. Whereas a container starts in the other direction, the developer only adds the application and the dependencies needed for it to run, possibly giving less overhead.

One can take it one step further and claim that Docker is not a virtualization technology, but an application delivery technology. What this means is that Docker containers are not used for virtualizing a system, where one takes everything an application uses and packages it in a system as if it were running on bare metal. Instead, each container is a part of the application, and any data that the container might have is not part of the container, but rather sits directly on the disk. This means that if one needs to change the application architecture, or back up or update the application, one does not need to do any pre-procedures at all, other than backing up the data volume. [22]

³ What is Minikube?


Virtual machines are more secure than containers by default, since they grant additional isolation. [31] In a virtual machine you are inside the contained environment whatever you do, but with containers you are still on top of the host system's kernel, making it more likely for security breaches to occur. An example of this is that SELinux and other kernel modules are not part of the namespacing in the Docker containers. This means that if a Docker process manages to get access to one of these, it is accessing the actual host system [32]. Thus, it is important to treat privileged processes with care, as if they were running directly on the system. This does not mean that containers by definition are insecure, only that there is at least one less security measure available by default. Furthermore, Docker uses the Hub/Store for download and upload of the container images, even during runtime, which can lead to security issues. There are naturally several ways to contain this, by e.g. downloading static images once, but the fact remains that these are conscious actions the developer has to take rather than default measures. [32] [33]

In further comparison, virtual machines are not as vulnerable to DDoS attacks and noisy neighbors as Docker containers [34, pp. 32 – 34]. Docker is more geared towards using several containers online at the same time, due to it being an application delivery technology. While the increased density of the application is favorable, this can cause additional troubles as well. With no restrictions, this means that if one container is attacked, the whole network might be under heavy load and go down. This can be solved by using an orchestration manager such as Kubernetes, limiting the maximum amount of resources each container can take and thus preventing starvation of the resources.
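As a sketch of such a restriction, a Kubernetes container spec can declare resource requests and limits; the pod name, image and values below are assumptions chosen only for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limited-pod
spec:
  containers:
  - name: web
    image: example/web:1.0
    resources:
      requests:
        cpu: "250m"        # guaranteed share: a quarter of a CPU core
        memory: "128Mi"
      limits:
        cpu: "500m"        # hard cap, so this container cannot starve its neighbors
        memory: "256Mi"    # exceeding this gets the container killed and restarted
```

With limits like these in place, a single overloaded or attacked container is throttled rather than allowed to drag the whole node down.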

Containers run on the kernel, giving far faster startup times than a VM, since most of the system is already started for a container. All a container needs to do is basically launch the application the user wants. A VM, on the other hand, needs to launch the kernel, then the entire operating system and then, finally, the user application.

However, it is not only the hardware efficiency which makes a case for containers; it is also the fact that Docker can be used with orchestration managers in order to create clusters which act as a single machine. Using this, one can give the service high availability through functions such as automatic restarts and load balancing, among others.

It is possible to run containers on virtual machines in order to have the best of both worlds: the containment and isolation of the virtual machines, with the scalability and resource management of containers. So even with containers on the rise, virtual machines are not disappearing soon.

2.5 Literature and Related Work

2.5.1 Overview of Virtual Machine Resources

Table 1 shows the most substantial articles and resources which might be used by the interested reader for more in-depth learning concerning virtual machines.

Title: Virtual Machines: Versatile Platforms for Systems and Processes
Author: James Edward Smith, Jim Smith, Ravi Nair
Background: For more reading about virtual machines, this is an excellent book that can give you everything from a deep dive into the technical details to some lighter reading about what virtual machines are.
Link: Here

Title: The Architecture of Virtual Machines
Author: James Edward Smith, Ravi Nair
Background: This journal article gives an overview of virtual machines, which can be compelling if you don't want to read as much. It is written by the same people who wrote the book, so expect the same quality.
Link: Here

Table 1. Recommended virtual machine resources.

2.5.2 Overview of Container, Docker and Kubernetes Resources

Table 2 shows the most substantial articles and resources which might be used by the interested reader for more in-depth learning concerning containers, Docker and Kubernetes.

Title: Docker: Lightweight Linux Containers for Consistent Development and Deployment
Author: Dirk Merkel
Background: While a bit dated, since the technology has evolved rapidly in this field, it is a great journal article which brings up everything regarding containers, including their history and how to use them.
Link: Here

Title: What is a container
Author: Docker
Background: For a quick introduction to Docker and containers in an easily understood form, Docker provides a short read giving you the basics.
Link: Here

Title: What is Kubernetes?
Author: The Kubernetes Authors
Background: Kubernetes themselves provide very thorough and informative documentation about their service, and it is a great source for detailed descriptions of everything the service entails.
Link: Here

Title: An Introduction to Kubernetes
Author: Justin Ellingwood
Background: Digital Ocean has a Kubernetes tutorial that describes the basics in an informative yet easily understood way, great for anyone relatively new to the subject.
Link: Here

Table 2. Recommended Docker and Kubernetes resources.

2.5.3 Related Work

2.5.3.1 Docker vs Virtual Machine in Deployment

Despite the fact that Docker and containers are popular at the moment, there are few reports written about the differences between Docker/containers and virtual machines. One example, though, close to home, is the report Improving Software Development Environment by Rickard Erlandsson and Eric Hedrén [35]. There they discuss topics similar to this report, but they analyze the different techniques as development environments instead of as deployment techniques. While their report touches on the same subject as ours, we found that the difference between the reports is significant enough that they do not overlap, and that both add something to the discussion of using these different techniques.

2.5.3.2 Docker vs Virtual Machine Performance Analysis

A few scientific tests regarding the performance of Docker compared to that of virtual machines have been made. One example is AM Joy [36], who performs and analyses some tests similar to our data performance tests, primarily the CPU tests.

3 Methodology

This chapter describes the research approach, the outline of the research process, the data collection methods, the tools used during the case study and the general project method that has been adopted.

3.1 Research Approach

3.1.1 Quantitative and Qualitative Methods

When scientific research is conducted, it is important to apply appropriate methods that will steer the work in the direction of correct and proper results. Methods can generally be divided into two categories: quantitative methods and qualitative methods.

Quantitative research methods are more aligned with measurable data like numbers and percentages. Mathematical, computational and statistical techniques and experiments are used to investigate an observable phenomenon. Large sets of hard data are analyzed, forming statistics that can be used for drawing conclusions and verifying or falsifying theories and hypotheses [37].

Qualitative research methods, on the other hand, are more aligned with soft data and information that is not really measurable. A particular case is studied more closely, often through observations and interviews, for a deeper understanding. Smaller sets of data are analyzed in order to form hypotheses and theories [37].

Which type of method is best suited for a project or research can be difficult to determine. A good place to start is to identify whether the project is about proving a phenomenon through large sets of data or experiments, or about investigating a phenomenon or small sets of data to build theories by exploring the environment [37]. If it is the first, then quantitative research methods are appropriate; if it is the latter, qualitative research methods. Once a type has been chosen, the methods of the other type should not be used.

3.1.2 Inductive, Deductive and Abductive Methods

There are three commonly used research approaches: the inductive, the deductive and the abductive approach.

The inductive approach means that theories, with different explanations derived from observations, are created. Qualitative methods are used for collecting data, which is then analyzed for better understanding and for formulating different views of the observed phenomenon. There must be enough data to be able to explain why something is happening and to build a theory. [37]

The deductive approach means that theories or hypotheses are investigated and tested in order to falsify or verify them. Quantitative methods with large sets of data are used for testing the theories or hypotheses. The theory that is to be tested must be expressed in terms and variables that are measurable. The expected outcome that is to be achieved must be stated as well. [37]

3.1.3 Case Study

A case study is a research strategy which consists of an empirical study. This study is a way of harnessing information by investigating the difference between real life and the theories. The data is harnessed by conducting an exhaustive examination of one specific case, as well as its related contextual data. The case study can then be used as a basis from which one can draw conclusions about the specific context, with a large data set as evidence. [37, p. 5]

3.1.4 Interview

An interview is a qualitative research method which can be divided into three underlying types: structured, semi-structured and unstructured.

Structured interviews are a sort of verbal survey, relying on a questionnaire where predetermined questions are written down. They do not allow for much variation or follow-up questions, but are instead focused on amassing more data in a shorter time.

Semi-structured interviews consist of a few questions which frame the area to talk about, but allow the researcher to ask follow-up questions to get further details or explanations. This type is used to get concrete data, while still allowing more elaborate answers from the interviewed subjects.

Unstructured interviews have no predetermined questions, other than a possible starting question. The rest of the interview continues on from the answer to that initial question, with questions made up on the fly for the area of interest. These are used when you want to study the subject in depth, but can also be somewhat challenging for the interviewed subjects to answer. [37] [38]

3.1.5 Bunge’s Scientific Method for Technology

Bunge’s method for conducting scientific research has ten ordered cognitive operations, where each next step is based on the outcome of previous work. The method can be applied to all types of inquiries, technological or humanistic, and not just inquiries of a scientific nature. [39, pp. 253 – 254]

1. Identify a problem.

2. State the problem clearly.

3. Search for information, methods or instruments.


4. Try to solve the problem with the help of the means collected in the previous step.

5. Invent new ideas, produce new empirical data, design new artifacts.

6. Obtain a solution.

7. Derive the consequences.

8. Check the solution.

9. Correct possibly incorrect solution.

10. Examine the impact.

3.1.6 Adopted Research

This thesis is conducted using qualitative research methods. A smaller set of data, derived from a case study, is investigated and analyzed in order to formulate a theory, which made a qualitative approach more suitable than a quantitative one.

The other method we used in this project is Bunge's method. However, we have made certain changes in order to align it with our project.

3.2 Outline of Research Process

This section presents the outline of the research process. It has been illustrated in a flowchart in figure 4.

Figure 4. Illustration of the research process.

3.2.1 Preparations

Understanding the problem encompasses the first two steps in Bunge's method: identify a problem and state the problem clearly. We started off by discussing what it was that we really wanted to do, in order to narrow the focus and make it more distinct from our perspective. As a result of our discussion, we could state the first iteration of the research questions.

Having narrowed our focus, we started with the literature studies, which were our means to search for information, methods or instruments. These studies included video lectures to learn about Docker, as well as journals and news articles that we thought were relevant to the subject. Here we could also read articles that would highlight some of the differences between the requirements on software in personal and business environments.

3.2.2 Case Study: Phase 1

Next, in accordance with Bunge's steps 4 and 5, we conducted a case study that we divided into two phases. Phase 1 involved transforming a normal application into a containerized application using Docker containers. Here, the literature study was essential in order to succeed, because of the prerequisites required. Once the application had been containerized, we deployed the application in Google Cloud with the help of Kubernetes. Finally, we observed the results, in regard to performance and usability, and documented these.

3.2.3 Case Study: Phase 2

After phase 1 of the case study, we quickly moved on to phase 2. This involved deploying the same application, non-containerized, on a virtual machine. The results were observed, once again with regard to performance and usability, and documented.

3.2.4 Interview

After the case study had been performed, an interview was conducted and documented in order to provide us with valuable insights from the development team at Tradedoubler. The goal of the interview was to discover in what way the subjects perceived that containers could help the business as a whole, and whether they thought they could make use of containers themselves when developing. We used a semi-structured format in our interview, in order to find answers to the questions we had formulated while being able to steer the interview according to the subject's answers.

3.2.5 Analysis and Result

The results from both the interview and the case study were then compiled, allowing us to make a more informed comparison between the different deployment techniques. Our findings were then analyzed with the help of the information gathered during the literature study. A final conclusion was then drawn regarding the benefits and drawbacks of using containers for deployment and maintenance in the different environments.

3.3 Data Collection Methods

This section outlines the different methods used to collect data, a very important component in this project. During this project we used data from three different sources: a literature study, which was used as the foundation for our theories; a case study in two phases, giving us data to examine; and an interview, backing our assumptions about the current developer environment.

3.3.1 Literature Study

As mentioned earlier, a literature study is performed in order to provide sufficient background information for a deeper understanding of the relevant area of interest. This was especially true in this case, since we did not know much about Docker and the subject at first, though we found it interesting. After this study, however, we felt proficient enough to be able to reach a conclusion regarding the usability of Docker and containers.

During the first stage of the study we went through video lectures from Udemy, called Docker Mastery: The Complete Toolset From a Docker Captain, one of the top IT & Development courses at the time. We also learned Kubernetes using the interactive tutorials given at the official Kubernetes site.

From these courses we learned about Docker and Kubernetes well enough to complete phase 1 of the case study. We then used KTH Primo [40], ScienceDirect [41] and Google Scholar [42] to find relevant books, journals, articles and other papers, in order to find more specific knowledge regarding Docker and virtual machines. We also read up on the different methodologies using the paper given by the school, and read through other project reports to get an overview of how the report should be formed.

3.3.2 Case Study Observations

Observation is to collect data by studying a phenomenon and recording data about said phenomenon. The observations are made as the last step of each phase in the case study, meaning after the deployment using containers and Kubernetes, as well as after the deployment using a virtual machine.

The observations are essential for the comparison between the different deployment techniques. The findings from the observations are arranged and presented in suitable form.

3.3.3 Interview

After having finished the literature and case studies, we still did not have enough information about what people really thought about the current developer environment. If the technology is improved, but everyone is already content with the current environment, a change could be an unnecessary burden to some. We wanted to find out if containers could solve some problems developers face today, so what could be better than asking them directly? Using their answers, we found it easier to argue for what containers really could be used for in the current environment.

We held one interview. The interview was conducted with a developer from one of the three development groups within Tradedoubler; he was part of the database group. This was a group which had not fully adopted containers, but did use them to some degree. The interview questions can be seen in appendix B. The answers were then gathered, organized and presented. Using these answers, we can more easily back up our claims with the views of someone who works with this on a daily basis.

3.4 Tools

During this project we used a Lenovo Ideapad U330P with Ubuntu 18.04 LTS and a MacBook Pro (2013) with macOS 10.13.4 (High Sierra). Visual Studio Code was used for all code editing, with the majority of the work oriented towards developing the Dockerfile and Kubernetes files. Docker was used to create the environments implemented in the research, and Kubernetes was used as an orchestration manager for the Docker containers, together with Minikube on the local machines. When deploying the application online, we used the Google Cloud Platform. Draw.io was used for creating the flowcharts in this thesis.

3.5 Project Method

3.5.1 Goals

The goal of this project can be divided into three different sub goals, as defined by Eklund [43].

Effect Goal

Effect goals are the goals concerning the why of the project, meaning in what way the project can be of use. This thesis is about two of the different deployment options that exist, how they compare to each other and what their strengths and weaknesses are. The goal of this thesis is to improve the current development structure. By informing the industry of the strengths and weaknesses of the two different deployment options, we hope to help improve its current development environment.

Project Goal

The project goal is the reason why we are doing this project and producing this thesis. The project is the final examination of our time at KTH, with the goal of testing whether we can apply our knowledge and deepen our understanding of the subjects learned during the course of our education, without being in the school environment for constant support [44]. Another part of this goal is to be able to provide Tradedoubler with some sort of research and result that could help them decide on their deployment methods in the future.

Result Goal

This is the goal that is the expected result of the project. In our case, that is a scientific thesis containing the results of our research and its conclusions.

3.5.2 Applied Methods

Project Management Triangle

During our project we used the project management triangle, or simply “project triangle”, as defined by Sven Eklund [43]. As can be seen in figure 5, the three different vertices illustrate the three different core principles that are required in a project: cost, time and function. In order to have a margin for error, one of the corners should be flexible. We decided to let the function vertex be the flexible one, using the MoSCoW method.

Figure 5. The Project Triangle [45]

The MoSCoW Method

One of the ways to introduce flexibility to the function vertex is by using the MoSCoW method. The MoSCoW method is a way to prioritize the requirement specification by dividing the requirements into a few categories [43]: Must, Should, Could and Won't. These contain, respectively, the requirements that must be done in time, the requirements that should be done but can be done a bit later than the musts, the requirements that would be nice to complete but are not necessary to complete the project, and the requirements that do not need to be done during this project. Using this method, we came up with the following prioritizations:

Must have

● We must reach the requirements set by the school. This is done by completing a thesis in which we show that we can understand and apply scientific methods for conducting a larger research project.

● We must also answer the research question which we set in this thesis.

Should have

● An interview which can help us provide valid opinions from developers regarding our subject.

References
