DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018
Improving Software
Deployment and Maintenance
Case study: Container vs. Virtual Machine
OSCAR FALKMAN MOA THORÉN
KTH
Abstract
Setting up one's software environment and ensuring that all dependencies and settings are the same across the board when deploying an application can nowadays be a time-consuming and frustrating experience. To solve this, the industry has come up with an alternative deployment environment called software containers, or simply containers. These are intended to eliminate the current troubles with virtual machines and to create a more streamlined deployment experience.
The aim of this study was to compare this deployment technique, containers, against the currently most popular method, virtual machines. This was done through a case study in which an already developed application was migrated to a container and deployed online using a cloud provider's services. The application could then be deployed via the same cloud service, but onto a virtual machine directly, enabling a comparison of the two techniques. During these processes, information was gathered concerning the usability of the two environments. To gain a broader perspective on usability, an interview was conducted as well, resulting in more well-founded conclusions.
The conclusion is that containers are more efficient in their use of resources. This could improve the service provided to customers by raising the quality of the service through more reliable uptimes and faster response. However, containers also grant more freedom and transfer most of the responsibility to the developers. This is not always a benefit in larger companies, where regulations must be followed, where a certain level of control over development processes is necessary, and where quality control is very important.
Further research could examine whether containers can be adapted to another company's current environment, and how different cloud providers' services differ.
Keywords
Docker; Kubernetes; Container; Virtual Machines; Cloud services;
Deployment; Maintenance
Förbättring av utplacering och underhåll av mjukvara
Fallstudie: Containers vs. Virtuella maskiner
Abstrakt
Att sätta upp och konfigurera sin utvecklingsmiljö, samt att försäkra sig om att alla beroenden och inställningar är lika överallt när man distribuerar en applikation, kan numera vara en tidskrävande och frustrerande process. För att förbättra detta har industrin utvecklat en alternativ distributionsmiljö som man kallar “software containers” eller helt enkelt “containers”. Dessa är ämnade att eliminera de nuvarande problemen med virtuella maskiner och skapa en mer strömlinjeformad distributionsupplevelse.
Målet med denna studie var att jämföra denna nya distributionsteknik, containrar, med den mest använda tekniken i dagsläget, virtuella maskiner.
Detta genomfördes med hjälp av en fallstudie, där en redan färdigutvecklad applikation migrerades till en container, och sedan distribuerades publikt genom en molnbaserad tjänst. Applikationen kunde sedan distribueras via samma molnbaserade tjänst men på en virtuell maskin istället, vilket möjliggjorde en jämförelse av de båda teknikerna. Under denna process, samlades även information in kring användbarheten av de båda teknikerna.
För att få ett mer nyanserat perspektiv vad gäller användbarheten, så hölls även en intervju, vilket resulterade i något mer välgrundade slutsatser.
Slutsatsen som nåddes var att containrar är mer effektiva resursmässigt. Detta kan förbättra den tjänst som erbjuds kunder genom högre kvalitet, med pålitligare upptider och snabbare tjänster. Däremot innebär en containerlösning att mer frihet, och därmed även mer ansvar, förflyttas till utvecklarna. Detta är inte alltid en fördel i större företag, där regler och begränsningar måste följas, där en viss kontroll över utvecklingsprocesser är nödvändig och där det ofta är mycket viktigt med strikta kvalitetskontroller.
Vidare forskning kan utföras för att undersöka huruvida containers kan anpassas till ett företags nuvarande utvecklingsmiljö. Olika molntjänster för distribuering av applikationer, samt skillnaderna mellan dessa, är också ett område där vidare undersökning kan bedrivas.
Nyckelord
Docker; Kubernetes; Container; Virtuella maskiner; Molntjänster;
Utplacering; Underhåll
Table of Contents
1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Goal
1.5 Methods
1.6 Limitations and Delimitations
1.7 Outline
2 Containers and virtual machines as deployment techniques
2.1 Early Deployment Methods
2.2 Virtualization & Virtual Machines
2.3 Container Virtualization
2.4 Comparison Between (Docker) Containers and VMs
2.5 Literature and Related Work
3 Methodology
3.1 Research Approach
3.2 Outline of Research Process
3.3 Data Collection Methods
3.4 Tools
3.5 Project method
3.6 Documentation and modelling method
3.7 Evaluating Performance and Usability
4 Case study: Application Deployment
4.1 Phase 1: Container
4.2 Phase 2: Virtual Machine
5 Benefits and Drawbacks of Using Containers
5.1 Results from Performance Tests
5.2 Personal Observations and Usability
5.3 Interview
5.4 Theoretical Material
5.5 Conclusions
6 Discussion
6.1 Methods
6.2 Research Result Revisited
6.3 Ethical Aspects
6.4 Sustainability
6.5 Future Work/Research
Appendix A - Code
Appendix B - Interview (in Swedish)
1 Introduction
As software has evolved, it has become increasingly dependent on other resources, either from public repositories or from other parts of one's own software. As more and more outside code is included, the complexity of deployment and maintenance increases. This created the need to isolate the environment of each application in order to increase security and stability, and virtual machines thus became an incredibly important part of software deployment. Nowadays, however, with the rise of microservices and an increasing need for efficiency, virtual machines do not always fill all the needs of the providers. In the midst of this, software containers made an entrance, causing a lot of buzz in the industry because of their small size and minimal overhead. Is all the praise and hype really valid? Can containers help providers increase their efficiency? These are some of the questions this thesis tries to answer.
1.1 Background
1.1.1 Software Deployment and Maintenance
The configuration needed to set up one's workspace before starting a project is something most software developers would rather skip. It can be time-consuming, frustrating and, not least, repetitive, since one often has to go through this process each time one starts a new project [1].
However, skipping that part has not been an option. This changed with the introduction of container environments: developers can now configure the workspace once and simply reuse that configuration when they start a new project, greatly increasing the time developers can spend being productive [1].
Deployment is often a daunting task for larger companies, since it often means managing all the different integrations at once and making them cooperate. It can also mean that release management has to write several build and configuration scripts that must be bug-free; otherwise bugs will appear in production even though the software itself might be reasonably bug-free. This is just one of many things that containers aim to solve.
In medium to large scale companies, containers could have an even greater impact, since they could replace the huge deploy and build scripts that need to run for each server instance [1]. Instead, one could configure the application and environment once and quickly push this to all servers at the same time, greatly improving the production flow and even minimizing the number of errors during deployment.
1.1.2 Software Containers
A container contains the software and everything it needs to run, making the software both stand-alone and system independent [2]. This means that it can run on any platform (system independent) and does not depend on any other program (stand-alone). One could label it a miniature OS, except that it only needs to run the processes of a task rather than an entire OS (as in a virtual machine), making containers smaller in size and less resource-hungry.
It also helps to sandbox the software, making it easier to create a secure environment, not only from a security perspective [3] but from a developer's perspective as well, since it is possible to isolate different releases from each other. These factors suggest that containers could be beneficial when scaling a program [1].
1.1.3 Kubernetes
Kubernetes is an orchestration manager: a tool for managing, scheduling and deploying clusters. A cluster is a collection of, in this case, containers that either communicate with each other or work together. Kubernetes assists with controlling the containers and provides various helper functions, e.g. restarting failed containers or keeping services running at all times. Kubernetes thus makes it possible to operate scalable services, even if the containers are not running on one's own computer.
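As an illustrative sketch (the application name, labels and image tag below are hypothetical, not taken from the case study), a minimal Kubernetes Deployment manifest might look as follows; the `replicas` field declares how many copies of the container Kubernetes should keep running, restarting any that fail:

```yaml
# deployment.yaml — hypothetical example manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 3              # keep three containers running at all times
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: sample-app
        image: sample-app:1.0   # hypothetical Docker image
        ports:
        - containerPort: 8080
```

Applying such a manifest with `kubectl apply -f deployment.yaml` asks the cluster to converge on the declared state; if a container crashes, Kubernetes starts a replacement automatically.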
1.1.4 Tradedoubler - The Main Stakeholder
Tradedoubler is a pioneering affiliate marketing business, whose purpose is to simplify advertising on the Internet. Tradedoubler acts as a broker between advertisers and publishers to simplify their interaction.
It gives advertisers the ability to create marketing campaigns and to see detailed statistics on how, when and from where their sales were generated.
Moreover, Tradedoubler gives publishers the ability to place profitable advertisements on their sites, such as blogs, using campaigns created by the advertisers in the network. Tradedoubler also handles back-end tasks, like invoicing advertisers and creating payments for the publishers, so that the transactions between these parties go smoothly.
As Tradedoubler’s affiliate network technology has evolved over the years, it has resulted in a heterogeneous environment deployed across several servers.
Creating an isolated development environment, and then easily testing and deploying changes, is becoming increasingly challenging. An isolated development environment is one where the software or OS you are running does not interfere with any other software that is currently running. This ensures that the software does not depend on anything unexpected, and that the developer does not end up trying to solve bugs that exist within other software.
To make development and deployment easier, Tradedoubler has started to create applications within Docker containers.
1.2 Problem
1.2.1 Issues with Current Deployment Methods
One of the most used deployment technologies within the industry at the moment, is the use of virtual machines (VMs) along with scripts for deployment. While it is a working solution, it is an unnecessarily complicated and bug-ridden process.
Virtual machines require servers to run a full OS, despite the fact that all the features provided by a full OS might not be necessary, and the OS has to be configured before being used [4, pp. 3–6]. Another matter of concern when using VMs is resources: a virtual machine requires the resources of an entire OS, even when running a single application [5]. As an example, when using a VM on a server, the VM requires a fixed amount of the server's RAM (or all of it). RAM is crucial for complex calculations and tasks, but also for being able to run several processes at the same time and for providing those processes with working memory [6]. The RAM allocated to the VM is unavailable to the host, as well as to any other machines connected to it, making it troublesome to run several sandboxed environments at the same time (a common scenario for security reasons).
This forces one to either predict the maximum amount of RAM that the machines will utilize or add additional resources to account for worst case scenarios.
RAM allocation and management is not the only issue with VMs, however; storage is another problem, since a modern OS can take up more than 32 GB of storage [7]. Imagine running four VMs on the same machine: the fixed storage alone would fill an entire 128 GB disk, and this does not include any programs that might have to be installed on them.
Disregarding the technical issues, there is also the need to configure the environment for each server. Servers might have different software versions, minor changes, or even different operating systems, each of which might require its own unique configuration.
1.2.2 Containers as an Alternative
It is worth exploring whether containers are a valid alternative for the deployment process, and whether they improve on the current experience by minimizing the amount of configuration and resources needed.
The question of when either deployment technique is suitable is also of
interest, as well as what difference there is between VMs and containers
regarding performance, and whether optimizations can be made in this field
as well.
1.2.3 Research Questions
Given the potential of containers, we are interested in how these could be used to improve today's complicated deployment techniques, and in what environments they are suitable.
Hence, the aim of this thesis is to uncover the benefits and drawbacks of using containers when deploying applications, both in a personal and in a business environment, and to present conclusions based on these findings. This leads us to the following research question:
RQ: What are the benefits and drawbacks of using containers for software deployment and maintenance in personal and business environments?
1.3 Purpose
The purpose of the thesis is to disclose what benefits and potential drawbacks container-based applications, deployed with Kubernetes, have over traditional application deployments on physical servers.
The results will help Tradedoubler decide whether they should invest in the use of containers together with Kubernetes for deployment. Should containers prove to be a good solution for large-scale deployment, this could potentially increase their efficiency and improve developer satisfaction. However, should the conclusion be that containers have significant drawbacks, it could prove better to use the old-fashioned approach instead: virtual machines and build scripts.
The results could benefit others by helping them make more informed decisions concerning deployment when in the process of developing applications. Other companies might benefit from them as well, since they could be influenced to either use or not use containers in order to improve their efficiency. The results could also prove to be beneficial to individuals, even if there are issues for larger companies.
1.4 Goal
The goal is to provide information about application deployment with containers, including its pros and cons compared to deployment on virtual machines. We further intend to back up our conclusions with statistics wherever possible.
A case study is performed in which a stateful sample application is created using Docker containers and configured as a Kubernetes cluster locally using Minikube. The application is then deployed to Google Cloud Platform's Kubernetes Engine.
The results of the application deployments, regarding performance and usability, are documented, evaluated and discussed.
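As a rough sketch of this workflow (file names, image tags and cluster names below are hypothetical, not the ones used in the actual case study), the local and cloud deployments could proceed along these lines:

```shell
# Build a Docker image of the sample application (tag is hypothetical)
docker build -t sample-app:1.0 .

# Run the application as a local Kubernetes cluster with Minikube
minikube start
kubectl apply -f deployment.yaml   # hypothetical manifest file
kubectl get pods                   # verify that the containers are running

# Deploy to Google Kubernetes Engine (cluster name is hypothetical;
# the image must also be pushed to a registry the cluster can reach)
gcloud container clusters create sample-cluster
kubectl apply -f deployment.yaml
```

The same manifest drives both the local and the cloud deployment, which is one of the portability arguments examined in this thesis.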
The report will also present the benefits and potential drawbacks of container-based applications deployed with Kubernetes in comparison to traditional scaled application deployment on physical servers.
1.5 Methods
To be able to reach the goals of the thesis and acquire the correct and necessary results, a strategy for how to carry out the project is needed. This strategy consists of methods that will ensure that the project is conducted in a successful manner.
The research methods can be divided into two categories, qualitative methods and quantitative methods, although some methods can belong to both.
This thesis will be conducted using qualitative research methods in order to fulfill its purpose and goals. A smaller set of data, derived from a case study, will be investigated and analyzed in order to formulate a theory, making a qualitative approach more suitable than a quantitative one.
To answer the research question, several activities are conducted. A literature study is performed in order to provide sufficient background information for a deeper understanding of the relevant area. A case study, in which an application is deployed using containers and VMs, is also performed and analyzed with performance and usability in mind. Finally, an interview is conducted in which an employee at Tradedoubler provides useful insights into the current solution.
1.6 Limitations and Delimitations
As a basis for the thesis, the option of using bare metal as a deployment technique is disregarded, since its use case is not really comparable with the virtualization techniques. Bare-metal provisioning is also not of interest to Tradedoubler, given the company's continuous growth.
There are many different virtual machines, several different operating systems, and different applications and container types. However, we have limited ourselves to the most used or accessible option in each field: Docker for containers, Ubuntu Server for the OS, and a simple self-developed application [8], [9]. Using the simple application, we hope to highlight the minimal differences, though it is of course not possible to capture all of them.
We have also been limited to using the Google Cloud Platform, since cloud servers are expensive and we could therefore only test the one Tradedoubler kindly lent us access to. We likewise limit ourselves to Kubernetes as the orchestration manager, since it is the one of interest to Tradedoubler and we lack the resources to test another.
1.7 Outline
The remainder of the thesis is structured as below:
● Chapter 2 defines the background of the thesis and establishes the information necessary to fully understand the rest of the thesis. Examples include a more thorough explanation of what a container is and what the deployment processes look like.
● Chapter 3 gives an overview of the methods that were used during the project. This includes the research and practical methods used and
discussions about why they were chosen.
● Chapter 4 describes the case study we performed. This includes the development of the Docker container, the Kubernetes and virtual machine deployment, both locally and online on the Google Cloud Platform, and the testing on both virtual machines and containers.
● Chapter 5 compiles the results, which consists of those from the testing, together with the replies to the interview.
● Chapter 6 is a discussion regarding the results and the information we
gained during the literature study. It also contains a summary of the
conclusions we have reached and discusses directions for future
research.
2 Containers and virtual machines as deployment techniques
This chapter explains several of the biggest concepts mentioned in the text, which may be needed for a deeper understanding of what the project is about and what is happening.
2.1 Early Deployment Methods
In the early days of the internet there was not much traffic; back then, one could simply run applications directly on the computer. As more and more people gained access to the internet, the number of servers needed increased, but the resources on each server were not fully utilized. This was very costly, since expensive hardware stood idle at only a certain percentage of its full capacity. At the same time, applications grew increasingly complex and dependent on other software, which could also cause issues between applications. Furthermore, hackers grew more sophisticated, and having applications directly on the server could increase the potential damage done if a security issue was found.
The search for an answer to all of these troubles led to an interest in virtualization.
2.2 Virtualization & Virtual Machines
Today, virtualization is one of the most used deployment technologies; most deployments use virtual machines together with deployment scripts.
Virtualization is the act of abstracting one or more components to simulate something real. In our case, this refers to simulating the components and resources of a computer on top of existing hardware, making it possible to simulate multiple resources, such as processors or disks, with just a single resource. This means that one is able to run software not designed for the real system, without changing the software or the system. [10, pp. 32–33]
However, it is not only subsystems that can be virtualized; the same can be applied to an entire system, thus creating a virtual machine. These virtual machines make it possible to run different architectures and/or operating systems on a single machine. Virtual machines have grown very popular with the introduction of cloud platforms and scaling, since they make it possible to provide efficient and secure environments for each cloud user. [11] [10, p. 36]
2.2.1 VMM
The virtual machine monitor (VMM), also known as a hypervisor [12, p. 68], is the virtualization software that emulates interfaces and resources so that the guest operating system can interact with the hardware as if it were running on real hardware. There are two types of hypervisors: type 1 and type 2.
2.2.1.1 Type 1 Hypervisor
A type 1 hypervisor is a VMM that runs on bare metal, meaning it runs directly on the hardware with no operating system or anything else in between. It can then run the guest operating system as a user process.
When a kernel call is executed by the guest operating system, a trap to the kernel occurs, which enables the VMM to inspect the issued code. If the code was issued by the guest kernel, the code is forwarded to be run by the kernel. If the instruction was issued by a user process, the VMM instead emulates what the actual kernel would have done had the instruction been executed on an installed operating system. [12, pp. 569–570]
Type 1 hypervisors are fast, since they are very close to the hardware and can thus run the virtual OS at the same speed as an actual OS, without needing to share resources with one.
2.2.1.2 Type 2 Hypervisor
Type 2 hypervisors run, as opposed to type 1 hypervisors, as user processes on top of an operating system. User processes are run like normal processes; however, before running code, the hypervisor scans through it and replaces all sensitive calls, such as kernel calls, with procedures of the hypervisor that can handle them. This is called binary translation. [12, p. 570]
These hypervisors use a lot of optimizations and caching, which enables them to execute user programs at near bare-metal performance. For system calls, they can actually outperform type 1 hypervisors in the long run, because trap calls ruin many of the caching features of the actual CPU, whereas binary translation leaves these alone. This has led to type 1 hypervisors using binary translation for some calls as well, in order to increase performance. [12, p. 570]
2.2.2 Usages
As mentioned earlier, virtual machines enable efficient use of physical resources by making it possible to divide the physical resources while still giving the illusion of access to the entire resource. This makes it possible to reduce operating costs by reducing the amount of unused resources, so that a company and/or cloud provider can own fewer resources yet offer the same performance. It also improves the service provider's situation, since application performance can be improved while costs decrease: more resources can be allocated when needed and released when they are not. It is even possible to allocate or remove physical resources for a machine on the fly, or to migrate (move) virtual machines between different physical machines without shutting them down [11]. This makes virtual machines excellent for cloud computing, where migration and resource management are common and can otherwise become quite costly. They can also improve ordinary self-hosting, though the benefits there may not be as large.
Furthermore, virtual machines are used to provide secure sandboxing for each user, since each user gets access to a single system. If a user tries to do something that is not allowed, or tries to hack the system, all they can reach is their own environment; at worst they destroy only that, while the other users are unaffected. It is also easier to place restrictions on resource access in a virtual machine, making it harder for users to throttle the services by various means. [10, p. 36]
2.2.3 Issues with Virtual Machines
So, what are the issues with virtual machines? Although they are a well-established, working solution, the process is still fairly bug-ridden and unnecessarily complicated.
Virtual machines require servers to run a full OS, while perhaps not requiring the full set of features that an OS provides, and the OS has to be configured before being used [4, pp. 3–6]. Furthermore, there is an issue regarding resources when using VMs, since the resources of an entire OS must be provided even when there is only one application to run [5]. As an example, when using a VM on a server, the VM takes up a fixed amount of the server's Random-Access Memory (RAM), or all of it. RAM is crucial for computers, since it is necessary for complex calculations and tasks, but also for running several processes at the same time and providing those processes with working memory [6]. The issue with the VM taking a fixed amount of RAM is that, once taken, the memory becomes unavailable to the host and thus also to the other machines. This makes it troublesome to run several different sandboxed environments at the same time, a common scenario for security reasons. Furthermore, it forces you to either predict the maximum amount of RAM that the machines will utilize or add additional resources to account for worst-case scenarios.
Not only is RAM allocation and management a problem with VMs, but so is storage: a modern OS can take up more than 32 GB of storage [7]. Imagine running four VMs on the same machine: the fixed storage alone would fill an entire 128 GB disk. Worse, that memory is just for the OS, not including the programs that the company or developers want to use.
Beyond the technical issues, there is also the hurdle of configuring the environment for each server, where servers might run different versions, contain small changes, or even run different operating systems, each of which might require its own unique configuration. This led to a search for something that could solve these problems while still providing the flexibility of the virtual machines. Thus came the containers.
2.3 Container Virtualization
Containers are an alternative, or supplement, to virtual machines that offers lightweight sizes, portability and high performance. However, containers differ from virtual machines altogether, because they take an entirely different approach to virtualization; a graphical comparison can be seen in figures 1 and 2. Instead of hardware (or para-) virtualization, containers virtualize processes, simply called process virtualization. As can be seen in figure 1, this means that instead of virtualizing the entire operating system, only the user processes (A, B and C) are virtualized, together with their dependencies (bins/libs). [13] [14]
This is possible because the containers all run on top of the same kernel [15], the host OS's kernel, unlike virtual machines, which emulate the entire system including the kernel of the guest OS. The containers can then interact with the underlying system via standard system calls, giving near-native performance [16]. Since the kernel is already started with the underlying host OS, containers can have an almost instantaneous startup time (excluding the time taken to launch the developer's own application), which allows for significantly decreased overhead for both storage and processes. [14]
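The shared kernel is easy to observe in practice. In the following sketch (assuming Docker is installed on a Linux host), the kernel release reported from inside an Alpine container is identical to the host's, because no separate kernel is ever booted:

```shell
# Kernel release of the host
uname -r
# Same kernel release, reported from inside a container
docker run --rm alpine uname -r
```

A VM running the same command would instead report the release of its own guest kernel.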
Figure 1. The Docker structure when running applications. [17]
Figure 2. A VM structure when running applications. [17]
2.3.1 Docker
Docker has almost become a synonym for container technology, but in fact there are several different technologies behind containers; Docker has simply been the most successful in terms of popularizing their usage. So, what is the difference between Docker and other container technologies? Docker is not a technology on its own, but is rather built upon other container technologies such as LXC (a part of Linux Containers), which has been in development since 2008. What Docker did, and still does, differently is that it simplifies the management of these technologies, which on their own can be very hard to manage [2] [18]. By unifying these technologies and providing a simple daemon for running the management tools, Docker has succeeded in bringing containers to the masses.
2.3.1.1 Technical Background
By using LXC, Docker can isolate containers in a separate environment with their own user groups, network device and virtual IP address.
Together with the tools of cgroups, a kernel feature that LXC also builds on, Docker can gather metrics that allow constant monitoring, and can thus handle the resource management of the containers. This makes it possible to run containers in their own environments, without the need to emulate a system. [2]
Another difference is that Docker uses a layering filesystem, AUFS, which allows users to take previous containers as a base for their own application and simply add more on top. Using this system, you can easily create images, where an image is the packaging of an application that includes everything needed to run it. Together with Docker Hub, a repository system not unlike GitHub, it is easy to create new projects without having to install all dependencies from scratch. AUFS also helps with the downloading and updating of containers: if a user has already installed an image that another application uses, Docker can reuse that image instead of requiring the user to download it again. The layering likewise helps with updating, since only the differences need to be downloaded rather than the entire image. [2], [14]
All of this ends up making the creation of containers as simple as creating a file, called a Dockerfile, where you write what to include, install and run.
Moreover, each statement in this file is one layer in the AUFS system, making it easy to control how many or how few layers your application has. [19]
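A minimal Dockerfile might look like the following sketch (the base image and application files are hypothetical, chosen for illustration); each instruction becomes one layer, so images sharing a base only store the layers that differ:

```dockerfile
# Hypothetical Dockerfile for a small Node.js service.
# Each instruction below becomes one layer in the image.

# Reuse an existing image as the base layer
FROM node:8-alpine
# Set the working directory inside the image
WORKDIR /app
# Copy the dependency manifest first, so this layer stays cached
# as long as package.json does not change
COPY package.json .
# Install dependencies (its own cacheable layer)
RUN npm install
# Copy the application source
COPY . .
# Command to run when a container starts from this image
CMD ["node", "server.js"]
```

Building with `docker build -t myapp .` turns each instruction into a cached layer; rebuilding after a source change only re-executes the instructions from the first changed layer onward.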
2.3.1.2 Usages
The nature of Docker makes it great for microservices, because of the lightweight containers and the separation of content. Moving monolithic services to microservices can massively improve service performance. [20]
It is also possible to reduce costs by using Docker for deployment, something Docker advertises quite heavily; Docker even provides a website for calculating the return on investment. [21]
Another reason for using Docker containers is that a service can be moved between different cloud providers without hassle, since only the data needs to be moved; the application and all its dependencies are handled by Docker. All that is required is to start the application and then provide the database data. [22] This also improves the updating and patching of web services: it only requires applying the updated images and then restarting the cluster, which takes very little time and can be done while some servers are still running, thus decreasing the downtime of such updates. It also improves data handling, since the data is not part of the application, reducing the risk of data corruption and other kinds of data issues.
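Such a low-downtime update can be expressed declaratively, for example with Kubernetes (described in section 2.3.2). The sketch below is a hypothetical Deployment manifest; the service name, image tag and replica count are assumptions for illustration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-service            # hypothetical service name
spec:
  replicas: 3                  # keep three copies of the service running
  selector:
    matchLabels:
      app: web-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1        # at most one copy down during an update
      maxSurge: 1              # at most one extra copy during an update
  template:
    metadata:
      labels:
        app: web-service
    spec:
      containers:
      - name: web
        image: myrepo/web-service:1.1   # changing this tag triggers a rolling update
        ports:
        - containerPort: 80
```

Changing the image tag and reapplying the manifest replaces the containers one at a time, so some servers keep serving traffic throughout the update.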
Docker also greatly improves the conditions for DevOps, since the development teams no longer work independently from the operations teams, and vice versa [23]. This removes much of the friction these two teams tend to have with each other, and also makes it possible to operate and evolve applications swiftly, because it becomes much simpler to automate processes and because it decreases the number of business layers and/or other teams that the application needs to pass through. In some cases, it might even be possible to merge the developer and operations teams into a single DevOps team, possibly reducing company costs while keeping the same or improved efficiency. [23] [24]
Of course, these reasons can also be seen as drawbacks by a larger company, though this may simply reflect an unwillingness to cede that much control to developers. One may then ask whether this actually is a disadvantage, since developers gain more direct influence over the development process, which does not imply that they will do whatever they want.
2.3.2 Kubernetes
As previously introduced, Kubernetes is a portable, open-source platform for automating the deployment, management and scaling of containerized applications across clusters of hosts [25] [26]. It was designed to handle an extremely diverse range of workloads. Its building blocks are loosely coupled and extensible through the Kubernetes API, which is used by internal components as well as by extensions and containers.
Kubernetes runs its components and workloads on a collection of nodes. A “node” refers to either a physical or a virtual machine that runs Kubernetes. A collection of nodes, together with some other resources needed to run an application, is referred to as a cluster. There are two kinds of nodes: master nodes and worker nodes. Every cluster has at least one master node and can have zero or more worker nodes. A cluster that is “highly available”, meaning up and running and available for use over a long period of time, has multiple master nodes [27] [28].
The master node operates as the brain of the cluster and is largely responsible for the centralized logic that Kubernetes provides. It exposes the API to clients, keeps track of the health of other nodes, is in charge of scheduling and manages communication between components [29] [26]. It is the primary point of contact for the cluster. The worker nodes, in contrast, are responsible for running workloads, or containers, with the help of both local and external resources. Every node must therefore have a container runtime, like Docker, installed. The master node distributes the workloads to the worker nodes, which create and destroy containers accordingly, as well as routing and forwarding traffic as necessary [29].
The basic operative unit in Kubernetes is called a “pod” and can be described as a collection of one or more tightly coupled containers that run in a shared context. An illustration of a multi-container pod can be seen in figure 3, where one container implements a file puller and the other a web server. A pod may have one or multiple applications running within it that together implement a service. Usually there is one container per application, although several containers may also serve a single application [26]. A pod provides a higher-level abstraction that simplifies the deployment and management of applications: scheduling, scaling and replication are handled automatically at the pod level instead of at the individual container level [26] [27].
Figure 3. A multi-container pod with a volume for shared storage between the containers. [28]
A pod also provides automated sharing of resources. All containers within a pod share a network (IP and port space) and can locate and communicate with each other through “localhost”. This means that the usage of ports must be coordinated between the applications in a pod; naturally, services that listen on the same port cannot be deployed within the same pod [27] [28] [26].
The containers within a pod also share volumes that can be mounted into an application's filesystem, as illustrated in figure 3. A volume is basically a directory that may or may not contain data. The aspects of a volume, such as its content and backing medium, are determined by the type of volume. A pod must specify which volumes exist in the pod and where these should be mounted within the containers. The abstraction of volumes solves, in addition to the sharing of files between containers, some issues concerning persistence of data. Files within a container are relatively short-lived, which could prove an inconvenience for non-trivial applications: when a container crashes, it will be restarted with a “clean slate”, meaning the files will be lost. Volumes preserve the files, since the lifetime of a volume follows the lifetime of the pod rather than that of the container. The volume only ceases to exist when the pod ceases to exist [28] [26].
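A pod manifest corresponding to the setup in figure 3 could be sketched as follows; the container images and mount paths are hypothetical assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  volumes:
  - name: shared-content              # volume shared by both containers
    emptyDir: {}                      # lives exactly as long as the pod does
  containers:
  - name: file-puller
    image: example/file-puller:1.0    # hypothetical image
    volumeMounts:
    - name: shared-content
      mountPath: /data                # the puller writes files here
  - name: web-server
    image: nginx
    ports:
    - containerPort: 80               # ports are shared across the whole pod
    volumeMounts:
    - name: shared-content
      mountPath: /usr/share/nginx/html  # the server serves the pulled files
```

Both containers share the pod's IP address, and the `emptyDir` volume survives individual container restarts but is deleted together with the pod.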
Pods themselves are also relatively short-lived entities. They will not withstand node or scheduling failures, nor other types of evictions that could occur because of node maintenance or lack of resources [28]. When a pod is created, it is assigned a unique ID (UID) and then scheduled to a node, where it runs until it is either terminated according to the present restart policy or deleted. Should a node that has a pod scheduled to it suddenly become unavailable, the pod is scheduled for deletion. It will not be rescheduled to a new, functioning node, but an identical replica of the pod, with only a new UID, can replace it [28].
2.3.2.1 Minikube
Minikube is a tool for easily running a small instance of Kubernetes locally. A single-node Kubernetes cluster is started inside a virtual machine on the local computer. It lets both inexperienced and experienced users try out Kubernetes, or use it for day-to-day development and testing, without having to pay for a cloud provider [30].
2.4 Comparison Between (Docker) Containers and VMs
Before starting, it is important to keep in mind what was mentioned earlier: containers are not virtual machines.
Docker [22] makes the comparison that virtual machines are like houses: they provide all the necessities needed to live there and protection from the outside. Containers, on the other hand, are like apartments: they also provide protection and the same basic necessities as houses, but necessities such as water and electricity are shared between all the apartments in the complex. It is also possible to rent or buy apartments of all sizes, everything from single-room apartments without a kitchen or separate bedroom, to entire penthouses. With a house, however, it can be very hard to find exactly what one wants; one often has to make some sort of compromise.
This is a good way of emphasizing one of the biggest differences between the two. A virtual machine often consists of a full operating system onto which the developer has installed the application and possibly added, or removed, files, programs or dependencies. A container starts from the other direction: the developer only adds the application and the dependencies needed for it to run, which potentially gives less overhead.
One can take this one step further and claim that Docker is not a virtualization technology but an application delivery technology. Docker containers are not used to virtualize a system, where everything an application uses is packaged into a system as if it were running on bare metal. Instead, each container is a part of the application, and any data the container uses is not part of the container but sits directly on the disk. This means that if one needs to change the application architecture, back up the application or update it, no pre-procedures are needed other than backing up the data volume. [22]
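This separation of application and data can be made concrete with a Docker Compose file; the sketch below is hypothetical (the application image and service names are assumptions), but it shows the principle of keeping data in a named volume on disk, outside the containers:

```yaml
# Hypothetical docker-compose.yml illustrating data kept outside the containers.
version: "3"
services:
  db:
    image: postgres:10
    volumes:
      - db-data:/var/lib/postgresql/data  # database files live in the named volume on disk
  app:
    image: myrepo/myapp:latest            # hypothetical application image
    depends_on:
      - db
volumes:
  db-data:   # survives when the app or db container is replaced or updated
```

Replacing either image leaves `db-data` untouched, which is why updating or restructuring the application only requires backing up the data volume.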