
DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2017

Improving Software Development Environment

Docker vs Virtual Machines

RICKARD ERLANDSSON
ERIC HEDRÉN

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Abstract

The choice of development environment can be crucial when developing software, yet little research exists comparing development environments. Docker is relatively new software for setting up and handling container environments. In this research, the possibility of using Docker as a software development environment is investigated and compared against virtual machines as a development environment.

The purpose of this research is to examine how the choice of development environment affects the development process. The work was qualitative, with both an inductive and a deductive approach. It included a case study with two phases: one in which virtual machines and one in which Docker were used to implement a development environment. Observations were made after each implementation. The data from the two implementations were then compared and evaluated against each other.

The results from the comparisons and the evaluation clearly show that the choice of development environment can influence the process of developing software. Different development environments affect the development process differently, both for good and for bad. With Docker, it is possible to run more environments at once than when using virtual machines. Docker also stores the environments in a clever way, so that they take up less space on secondary storage than virtual machine environments do. This is because Docker uses a layer system for containers and their components. When using Docker, no Graphical User Interface (GUI) is provided for installing and managing applications inside a container; this can be a drawback, since some developers may need a GUI to work. The lack of a GUI also makes it harder to get an Integrated Development Environment (IDE) to work properly with a container, for example to debug code.

Keywords: DevOps, virtualization, Docker, virtual machines, software development, development environment.


Abstract

The choice of development environment can be decisive when developing software. Few studies exist today comparing development environments.

Docker is relatively new software for setting up and managing container environments. In this study, the possibility of using Docker as a development environment is investigated and compared against virtual machines as a development environment.

The purpose of the study is to see how the choice of development environment affects the software development process. The work was carried out qualitatively, with both an inductive and a deductive approach. It also included a field study with two phases: one in which virtual machines and one in which Docker were used to implement a development environment. Observations were made after each implementation. The data from each implementation were compared and evaluated against each other.

The results from the comparisons and the evaluation show that the choice of development environment influences the software development process. Different development environments affect the development process differently, in both good and bad ways. With Docker, it is possible to run more environments at once than when using virtual machines. Docker also stores the environments in a clever way that makes them take up less space on secondary storage compared to virtual machines. This is because Docker uses a layer system for containers and their components. When Docker is used, no Graphical User Interface (GUI) is provided for installing or managing applications inside a container; this can be a drawback, since some developers may need a GUI to work. The lack of a GUI makes it harder to get an Integrated Development Environment (IDE) to work properly with a container, for example to debug code.

Keywords: DevOps, virtualization, Docker, virtual machines, software development, development environments.


Table of Contents

1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Delimitations
1.5 Outline

2 Background
2.1 DevOps
2.2 Development and infrastructure
2.3 Virtual machines
2.4 Docker
2.5 Literature and related work

3 Method
3.1 Research approach
3.2 Project plan
3.3 Data collection methods
3.4 Evaluation Criteria
3.5 Development – tools and methods

4 Implementation of case study environments
4.1 The development environment
4.2 Virtual machines as an environment
4.3 Docker as an environment

5 Evaluation of the environments
5.1 Observation findings compared and evaluated
5.2 Overview of the evaluation-data

6 Discussion
6.1 Methods
6.2 Research questions revisited
6.3 Ethics
6.4 Sustainability

7 Conclusions
7.1 Future work
7.2 Work contribution

8 References


1 Introduction

For an IT company that produces and maintains software products to be successful, there are a lot of parts and areas that must work together. There are mainly two groups associated with a piece of software: developers and operators.

The developers work on the creation of the software: they plan, create, test and deliver it. The operators are the ones associated with the release of the software: they configure and monitor it. How these two groups work and cooperate introduces an area called Development Operations (DevOps) [1]. DevOps is an important part of a company's IT infrastructure. By working on and improving DevOps-related issues, companies can become more efficient in terms of time and money. For many companies, it is about reducing the time to the software's release [1].

1.1 Background

1.1.1 Development Environment

DevOps includes the collaboration between a company’s two teams, developers and operators. The DevOps-area can be divided into two parts, the dev-area and the ops-area.

One part of the DevOps area is the development process (Dev). During this process, developers at companies try to be as efficient as possible, and different kinds of software therefore exist to help them with that.

The other part is called the operations area (Ops). Operators working within this area handle the finished product from the developers by taking care of the release of the product, configuring it and monitoring it. In this study, the focus is only on the development process inside the dev area.

Developers creating an application work inside an environment. This environment can provide the right tools and software that the developer and the application under development require. This is called a development environment [4]. How to create, set up and manage these environments falls under the term infrastructure, more exactly under the term "infrastructure as code" [1, p. 135]. The purpose of an infrastructure is mainly to automate the development process and to support the work between developers and operators [1, p. 136].

Docker is software that helps developers. Docker provides a way to package applications into Docker images. The images can later be executed as containers inside a Docker engine. With the Docker software, developers can also share their own images, or use others', via Docker Hub [6]. The Docker engine supports several different operating systems [2] and works well with well-established version-handling systems like Git [6].

The use of virtual machines is common, in many ways. A virtual machine uses a hypervisor to be able to run a chosen, independent operating system inside another one. Virtual machines that provide a complete operating system are called (Operating) System Virtual Machines [33, p. 424]. In some cases, virtual machines are used to provide a development environment prepared with the right software, which the current application under development requires. The virtual machine can then be shared between developers, so that they all have the same technical conditions when working.

1.1.2 TechniaTranscat

TechniaTranscat is a company that creates and maintains software for Product Lifecycle Management (PLM) [10] systems. Inside these PLM systems, various kinds of information about products' lifecycles are handled. The lifecycle includes everything from the planning of a product, through development and design, to production, sales, training and support.

TechniaTranscat has its main office in Stockholm, Sweden and is also based in other parts of Europe, India and North America. They have more than 500 employees in total [9].

The company is related to this research in that the research's case study is performed at their main office, and they are interested in the results.

1.2 Problem

Using virtual machines in a development process, to provide developers with a development environment, comes with some drawbacks.

Nowadays, operating systems have grown bigger than they were a few years ago, and they are still growing. A virtual machine running a clean installation of, for example, Windows 10 requires a minimum of 20 GB of free hard disk space [7]. When it is also prepared with the software required for the development environment, the size increases even more. A disk image for a virtual machine that is ready to use as a development environment can take up around 50 GB of hard disk space. This means that on a computer with only a 250 GB hard drive, only 3-4 such disk images can be stored at once before running out of space. For developers who, for example, use around 8 virtual disk images in one day, this becomes a problem. One solution today is to use external storage: export the virtual machines and transfer them back and forth to the external storage when needed. The transfer process for removing and adding virtual machines is very time-consuming because of the size of the files, which are usually around 50 GB or more. In the long term, this is not an efficient solution, and surely a better one can be found.


A running virtual machine requires a fixed amount of the host's resources [5, p. 9]. It allocates, for example, Random Access Memory (RAM) [8, p. 181-254] [12], which becomes unavailable to the host computer. This is a problem when a developer wants to run multiple virtual machines at once, because the computer has a finite amount of RAM. It can be crucial for a developer who is also using other virtual machines, or other software that is quite heavy for a computer.

Due to software updates, developers' preferences and their tasks inside the virtual machine, each virtual machine soon starts to differ from the other virtual machines that were distributed to all developers. In the end, you can end up with software that works on one developer's virtual machine but doesn't work on another's.

The use of third-party software is very common in software development, for example to make the process faster and more reliable, to get more automation, and to make the work easier for the developers. The Docker software may be a solution, or an improvement over using virtual machines. With our knowledge of these problems, we have a starting point that leads us to our research questions:

RQ1. In which way can a development environment influence software development?

RQ2. How can the choice of the development environment enhance the development process?

To answer these questions, observations will be conducted in a case study. The answers will be the results of analyzing and comparing the data from the observations; see chapter 3.

1.3 Purpose

Many IT companies use virtual machines in different ways, and many of them use virtual machines as a development environment. The purpose of this study is that these companies, or other individuals such as developers, can make use of the results.

The goal is to acquire more knowledge on how third-party software can enhance, and be used in, the development process, especially Docker as an environment compared to virtual machines.

The ones who will benefit the most from this study are software developers, especially those who are currently using virtual machines as their development environments.

TechniaTranscat is one of the companies that can make use of the results. They are currently using virtual machines as part of their development environment. Our study may change their way of developing software, by starting to use Docker instead.

1.4 Delimitations

This research considers how Docker can enhance the development process of a software product. Docker may well be applicable to the process in other ways than as a development environment, but due to the short period of time, there is no time to consider all other possibly useful areas for Docker. Therefore, the focus will be on using Docker in, or as, a development environment.

When it comes to virtual machine software, there is a lot to choose from. We have chosen to use Oracle VM VirtualBox [30] in this study, because it fits our needs, is easy to use, and is well-established and widely used virtual machine software. Oracle VM VirtualBox is also what TechniaTranscat currently uses for their development environment.

1.5 Outline

The remainder of this report is presented here, chapter by chapter, along with corresponding subchapters.

In chapter 2: Background, a detailed background for the different aspects of the study is presented: a definition of DevOps and how it relates to development environments; what a development environment is and which different kinds exist; and explanations of Docker and virtual machines and why they are important for the study.

In chapter 3: Method, the methods and methodologies used in this study are presented, along with various theoretical methods and methodologies that exist.

In chapter 4: Implementation of case study environments, the implementations made in the case study are presented.

In chapter 5: Evaluation of the environments, the observations of the environments will be presented, explained and evaluated.

In chapter 6: Discussion, the research in general is discussed, including methods, results, conclusions and ethics. Future work is also presented here.

In chapter 7: Conclusions, the conclusions that can be drawn from this research are presented.


2 Background

In this chapter, all the necessary background information for this thesis will be presented. Under subchapter 2.1 there will be an explanation regarding the DevOps term. In subchapter 2.2, development environments and their meaning will be described. Subchapter 2.3 introduces virtual machines, a detailed description of what a virtual machine is and how it can be used in the development process. In subchapter 2.4 there will be a detailed explanation regarding the Docker software, what it is, how it works, how it can be used as a development environment and which platforms it supports. Subchapter 2.5 will go through literature and related work to this research.

2.1 DevOps

DevOps is the term people assign to the collaboration between software developers and Information Technology (IT) [21] professionals, in other words the operators. Developers want to deliver bug fixes, changes and new features. Operators want a reliable and stable product. Because of the two groups' different needs, conflicts can occur. Conflicts between these two groups can arise during deployment, after deployment, or when it comes to performance [1, p. 20-22].

The word DevOps is a combination of the words development (Dev) and operations (Ops). Development represents the software developers, such as programmers, testers and quality assurance personnel. Operations represents the people who place the developed software into production and then manage it [1]. See Figure 1 for an illustration of DevOps.

Figure 1: Overview of DevOps [23]

The major goal of using tools and methods inside the DevOps area is not only to enhance the quality of the product being created and managed, but also to speed up the process of creation [1, p. 4].

2.2 Development and infrastructure

In this chapter, we will go through some parts that fall under development and infrastructure. During development, the use of virtual development environments is common. The IDE (Integrated Development Environment, see subchapter 2.2.2) that the developers use can be either inside or outside of it.

Some things that fall under the infrastructure term are how the developer sets up the environment, what kind of development environment to use, how the development environments are distributed (if it is a virtual environment), and how the developer should work. It is common that documentation exists to keep information about the infrastructure.

2.2.1 Development environments

Development environments can most easily be explained as one or several computer systems that enhance or automate the procedures inside a System Development Life Cycle (SDLC) [15, 16]. The development environment provides the developer and the software under development with the right set of tools and applications that both require. The SDLC describes the different phases that a system product passes through during its life, see Figure 2.

Figure 2: A graphical representation of a typical SDLC [17]

Different kinds of environments exist, and four categories follow below. These categories are examples of trends that have influenced development environments [15].

• Language-oriented environment (LOE)

An environment that is developed around, and circles around, one programming language [15]. This kind of environment often provides a Graphical User Interface (GUI) [19, p. 1680-1681] for the user [18]. The environments are used for "code targeted to deploy to distributed platforms (Windows, Linux, Unix)" [18]. Languages like C++, Ada and Pascal are included in these kinds of environments.

• Structured-oriented environment (SOE)

SOE is an environment that allows the programmer, or the user, to alter the core of the program: its structure. This helps the user, who no longer needs to remember the syntactic entities of each specific language [20, p. 26].

• Toolkit environment

These kinds of environments provide language-independent support and include tools like version control and configuration management [15].

• Method-based environment

These environments include tools for certain specification and design techniques, and for tasks like team and project management [15].

2.2.2 Integrated development environment

An Integrated Development Environment, also referred to as an IDE, is an environment that provides tools for the developer for writing, compiling and running code. The IDE is software that collects all the needed functionality in one place [36].

2.2.3 Traditional infrastructure handling

When, for example, a new developer joins a team, he or she must start by setting up, on his or her local computer, the environment that the software under development requires. Sometimes this means running scripts or running installation files manually. Even with fully updated documentation of the infrastructure, this can be hard and time-consuming, and even more so if the documentation of the infrastructure is vague or doesn't exist at all. In some companies, this is the case [1, p. 137].

2.2.4 Virtual development environments

Virtualizing the development environment basically means putting the whole environment inside a virtual machine (see subchapter 2.3). The environment, in its current state with all software installed, can then be distributed among developers (see Figure 3). There is then no need for the individual developer to manually install all the required software.

Figure 3: Illustration of a developer using virtual development environments from a cloud or external storage. VM stands for “virtual machine”.

A developer who wants to use a virtual development environment needs to install virtual machine software on his or her computer, so that it can run and handle virtual machines, and then download the environment and run it. The actual code that the developer is working on can either be stored locally on the host or inside the virtual machine, with the use of shared folders (see subchapter 2.3.3).

2.3 Virtual machines

A virtual machine can be explained as an emulator, or an independent machine, that runs and executes on top of a hardware or software platform. Instead of using a computer's hardware directly to function, it relies on software to run its processes, with the help of a hypervisor [14] (see subchapter 2.3.1).

System virtual machines are one kind in use today. These machines are used to give developers an environment where different processes belonging to many different users can exist together [5, p. 17]. System virtual machines often support different kinds of operating systems, which makes them optimal for developers who need to use different software on different operating systems [5, p. 18]. Figure 4 shows the structure of a computer that runs several virtual machines, each running an individual operating system.

(14)

Page 9 of 54

Figure 4: Structure of a computer running several virtual machines using a hypervisor. The first machine is marked with green color. [27]

2.3.1 Hypervisor

A hypervisor, also referred to as a VMM (Virtual Machine Monitor) [8, p. 472], is software that creates the illusion that you can boot and create virtual machines and install software and operating systems on them, just as if they had direct contact with the hardware. A hypervisor can exist directly on top of the hardware, called a type 1 hypervisor, or on top of an operating system, called a type 2 hypervisor. A type 2 hypervisor runs very much like a normal process [8, p. 477-478] (see Figure 5). The hypervisor intercepts the virtual machine's instructions and sends them to the host machine's hardware. By doing this, a hypervisor allows multiple machines to share the same hardware platform (see Figure 3).

Figure 5: Example of different types of Hypervisor-based virtualization [25]

(15)

Page 10 of 54

For virtualization to work properly, the hypervisor should try to fulfill goals in three different categories [8, p. 475]:

• Safety – It should have full control of all the virtual machines.

• Fidelity – A program inside a virtual machine should behave identically to the same program running directly on hardware.

• Efficiency – The hypervisor should intervene with the running code inside the virtual machine as little as possible.

Two common hypervisors are Microsoft's Hyper-V (type 1) and Oracle's VirtualBox (type 2). Microsoft's Hyper-V is the hypervisor used by the Docker software (see subchapter 2.4).

2.3.2 Guest operating system

Whether a type 1 or a type 2 hypervisor is used, the operating system running on top of the hypervisor is called a Guest Operating System, or a Guest System [8, p. 477]. These guest operating systems are normal operating systems, running independently of each other. The guest operating system doesn't have full access to the host's hardware resources, simply because it runs inside a virtual machine, which, as mentioned earlier, has limited resources.

2.3.3 Shared folders

When it comes to where the developer wants to store their code, there are two options. Either the developer stores the code inside the virtual machine and works on it from inside the machine, or the code is stored outside the virtual machine, locally on the host computer. To achieve the latter, the developer must use something called Shared Folders, sometimes also referred to as Synced Folders or Mapped Folders. These folders make the files inside them accessible from inside the virtual machine (guest system). Several virtual machine software packages and virtual machine management tools have this functionality [37] [38].
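In VirtualBox, for example, a shared folder can be attached from the host's command line. This is an illustrative sketch only; the machine name, share name and host path are hypothetical, and VirtualBox must be installed for the command to work:

```shell
# Attach the host directory /home/dev/project to the VM "dev-vm" as a
# shared folder named "code"; --automount mounts it automatically in the guest.
VBoxManage sharedfolder add "dev-vm" --name code \
    --hostpath /home/dev/project --automount
```

Inside a Linux guest with Guest Additions installed, the folder then typically appears under /media/sf_code.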

2.4 Docker

Docker is the largest existing software for handling container environments. Docker is used to solve existing problems like "works on my machine" problems. With Docker, you can run your own or others' applications inside a Docker container (see Figure 6). It is also used by developers to run applications side by side and thereby gain better compute density [2]. Docker as software can be used by both developers and operators. The following subchapters will go through how developers can use it.


This study focuses only on Docker for developers, since using Docker as a development environment is what this study is about.

Figure 6: Example of the Docker software structure, the marked area represents a container (see subchapter 2.4.1.3) [28]

2.4.1 Docker basics

The basic parts of the Docker software are explained here.

2.4.1.1 Dockerfile

A Dockerfile [32] is a file that is used to create a Docker image (see subchapter 2.4.1.2). The Dockerfile is a basic file with no specific file extension, containing different kinds of commands with a specific syntax, to be used when creating the image. The Dockerfile for example defines environment variables, copies or adds files to the image, and runs installations. After creating a Dockerfile, the user simply uses the Docker build command to automatically turn it into an image.
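As an illustration of what such a file can look like, here is a minimal sketch of a Dockerfile. The base image, package and paths are hypothetical examples, not taken from the case study:

```dockerfile
# Minimal illustrative Dockerfile; every instruction below adds a layer.
FROM ubuntu:16.04                # base image
ENV APP_HOME=/opt/app            # define an environment variable
RUN apt-get update && apt-get install -y openjdk-8-jdk   # run installations
COPY . $APP_HOME                 # copy files from the build context into the image
WORKDIR $APP_HOME
CMD ["./start.sh"]               # default command when a container starts
```

Running `docker build -t myimage .` in the directory containing this file would then turn it into an image named myimage.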

2.4.1.2 Image

A Docker image is the base of a Docker container (see subchapter 2.4.1.3): a container's root filesystem. An image has no state and can never change after it has been created. The image itself is basically a collection of layers. Each layer in an image is itself an image (see Figure 7) and contains the differences made to a filesystem. These layers are read-only and are stacked in order so that they are displayed as a single view. The "stacking" is done by the Docker storage driver [29].

A new image that’s created, based on an existing image, shares the layers from the existing base image. By doing this a Docker image saves diskspace as it is

(17)

Page 12 of 54

not a copy of an already existing image but instead, reuses the same layers from the base image where they don’t differ.

2.4.1.3 Container

A Docker container, also called a container image, is built upon an image (see subchapter 2.4.1.2). When a container is created, a new layer is added on top of the existing image layers. This new layer is readable and writable. All changes made to the container after creation are written to this specific layer. The changes can for example be writing, deleting or just modifying files in the container [29]. The container encapsulates the applications installed inside it from everything outside of it. It doesn't matter which operating system the Docker software is installed on; an application inside a container will always behave the same. A visual representation of a container on a host machine can be seen in Figure 6, and a detailed representation of a container and its image can be seen in Figure 7.

Figure 7: Representation of a container and its image layers (colored area) [29]

After changes have been made to a container, its state can be saved and the container layer added to the rest of the image layers, making it read-only. More containers can then be created with the newly created image as their base. If a container is deleted, the read-write layer is removed, and with it all the changes made after the creation of the container. The thin container layer makes it possible for several containers to use the same base image, since all the changes are saved in this specific layer.
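The layer behaviour described above can be sketched with the standard Docker CLI. This is illustrative only: it assumes a Docker installation, and the image and container names are hypothetical:

```shell
docker run -it --name demo ubuntu bash   # adds a writable layer on top of the image
# ... inside the container: install something, e.g. apt-get install -y curl, then exit ...
docker commit demo demo-with-curl        # freezes the container layer into a new image
docker run -it demo-with-curl bash       # a new container; base layers are shared with ubuntu
docker rm demo                           # deleting a container discards only its read-write layer
```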

2.4.2 Docker products, tools and commands

Docker provides some useful products and tools that make it easier for anybody with a container-based solution to turn it into a production-ready application. The Docker software also has some built-in commands; useful and relevant ones will be explained here. Some commands exist to let developers take their applications to the next level when it comes to efficiency, portability and user-friendliness.

2.4.2.1 Docker hub

Docker Hub is a service whose purpose is to serve developers with useful tools: tools for linking to code repositories, building and testing images, and pushing images to the Docker cloud (see subchapter 2.4.2.2). The cloud-based Docker Hub service also provides a centralized resource for searching images, user and team collaboration, distribution and change management, and for making the workflow more automated [6].

2.4.2.2 Docker cloud

Docker Cloud provides a cloud environment for users, making it possible for them to manage, deploy and monitor containers inside their own infrastructure [34]. Docker Cloud is software that enhances the collaboration between developers and operators (DevOps) [35]. Developers are given the opportunity to deliver their code continuously and to run and test their application through the cloud, while operators can see and run the application as well.

2.4.2.3 Docker registry

The Docker Registry is a server-side application that can be used to let users store and distribute Docker images. The Docker Registry is used by users who want full control over where images are stored and how they are distributed, and who want these features tightly integrated into an in-house development or company workflow. It is an alternative to Docker Hub, but requires more maintenance and needs to be configured [13].

2.4.2.4 Docker commands

There are many Docker commands in the Docker CLI (Command-Line Interface), with different purposes. Some useful and relevant ones will be explained here. Keep in mind that these commands take arguments; only the commands themselves are explained. To see all available commands and their arguments, see [39].

Commands:

• Docker build – After creating a Dockerfile, the Docker build command is used to build an image, based on the instructions inside the file.

• Docker run – To create and start a container with a given image, the Docker run command is used.

• Docker start/stop – These commands are used to either start or stop an already created container.

• Docker-compose – With this command, the Docker software allows a developer to run multi-container applications. The developer can create several services, each with its own Dockerfile, run the services inside containers, and define, with a compose file (see subchapter 4.3.3.3.2), an application that consists of all the chosen containers/services [55]. The services inside the containers will run together and work as one application in an isolated environment.

• Docker commit – Running the commit command on a container will generate a new image that consists of all changes that have been made to the container since it first started.
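To sketch what the Docker-compose workflow mentioned above can look like, a minimal compose file might be written as follows. The service names, images and ports are hypothetical examples, not taken from the case study:

```yaml
# docker-compose.yml -- defines a two-service application
version: "3"
services:
  web:
    build: ./web            # built from the Dockerfile in ./web
    ports:
      - "8080:80"           # host port 8080 -> container port 80
    depends_on:
      - db                  # start the database service first
  db:
    image: postgres:9.6     # pulled from a registry instead of built
    environment:
      POSTGRES_PASSWORD: example
```

Running `docker-compose up` in the directory containing this file would then build the images as needed and start both containers as one application.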

2.4.3 Docker as an environment

Docker is itself a software environment [31], consisting of one or several Docker containers. It is the Dockerfile, mentioned in subchapter 2.4.1.1, that defines the development environment for each image and therefore for each container. Thanks to the flexibility of the Dockerfile, the development environments can be custom made for each specific development team. The image containing the environment can then be distributed to the entire development team via the Docker Hub or Docker Cloud services and be kept constantly updated.

The development environment is encapsulated from the rest of the system and works in the same way as on any other operating system that uses the Docker Engine [41].


2.4.4 Supported platforms

Docker comes in two editions, the Docker Community Edition (CE) and the Docker Enterprise Edition (EE). They come with different features and can be used on a variety of different platforms, see Table 1 [40].

Table 1. The table shows which platforms each Docker edition supports.

Docker Community Edition (CE):

• Ubuntu

• CentOS

• Azure

• Amazon EC2

• Windows 10 (Pro Edition)

• macOS

• Fedora

• Debian

Docker Enterprise Edition (EE):

• Ubuntu

• CentOS

• Azure

• Amazon EC2

• Windows Server 2016

• Red Hat Enterprise Linux

• Oracle Linux

• SUSE Enterprise Linux Server

2.5 Literature and related work

There are many books on how virtual machines work and on their benefits and drawbacks. These books have been used in the literature study and are included as references in this thesis. Docker, on the other hand, is not as widespread, and there are not as many books on how the Docker software works, how it should be used, and on its benefits and drawbacks. Docker Inc. has provided an eBook [42] that works as an explanatory manual for virtualization administrators regarding Docker and virtual machines. This eBook has been used in the literature study.


3 Method

There are many different methods and methodologies to choose from when doing a project, and they are very important for planning and for steering the project in the right direction. This chapter will theoretically go through some existing ones and present those that are used in this study. Our work process is also presented.

3.1 Research approach

This chapter describes various research approaches and states which ones are used in our study.

3.1.1 Qualitative and quantitative methods

All methods can be divided into two groups: quantitative methods and qualitative methods [3].

Quantitative methods focus on variables from experiments and testing in order to accept theories or hypotheses. Qualitative methods focus on the understanding of meanings, opinions and behaviors, and use these for hypotheses and theories or for developing computer systems and inventions [3]. The choice between these two is key when choosing a method strategy to work with, since they steer the whole project. One can also choose to use both, which is called triangulation.

3.1.2 Inductive and deductive approach

A research can have an inductive or deductive approach, or it can have both.

Using an inductive approach, the research's outcome will be based on observations and patterns. This approach is about gaining an understanding of a phenomenon and establishing different views of it by collecting and analyzing data.

The deductive approach is about verifying or falsifying hypotheses. To do this, the approach often applies quantitative methods to large data sets, measuring variables and basing the outcome on them [3, p. 5].

3.1.3 Case Study

A case study is a research strategy for carrying out the research. The strategy includes organizing, planning, designing and conducting research. The case study involves an empirical investigation about a specific case in a real-life context using several sources of evidence [3, p. 5].

(23)

3.1.4 Applied research method

The applied research method should be used when looking for the answer to a specific question or when trying to solve known and practical problems [3, p. 4].

3.1.5 Chosen approach

Our choice was to conduct qualitative research, because of the small data set available and the fact that the study is about software.

We chose to have an inductive as well as a deductive approach to our research. The deductive approach is used to collect the specific data parameters from the implementations in the case study that can be seen in subchapter 3.2.2. The data parameters are based on TechniaTranscat's interests (see subchapter 3.4.2) and on what defines a good development environment (see subchapter 3.4.1). The inductive approach is used because we are also collecting data that had not been decided on when the research started. This is done with our qualitative methods, and the data is then analyzed to gain understanding of the subject. Qualitative methods are the most common way of collecting data when having an inductive approach [3, p. 5].

The existence of the research questions, mentioned in subchapter 0, indicates that the research should make use of the applied research method. To answer and investigate these questions, a case study is used, studying a specific case in a real-life context.

3.1.5.1 Scientific background

For a research method to be scientific, it must meet some requirements [26, p. 16]:

1. It must be objective. It should deliver the same result no matter who uses it.

2. It must be possible to verify the used methods with alternative methods.

3. It should be deeply rooted in hypotheses or theories that can explain how the methods work.

The methods used in this research cover these requirements. According to Bunge M. [58, p. 263-264], there exists a common general sequence of operations for handling research problems; it was introduced a while ago but still has a useful meaning. Below follows the sequence, where each step is marked in bold if it has any relevance to this research and, in that case, it is explained in which way. Note that the research is not strongly connected to every step, and to some steps not connected at all. That is because it is a general sequence; a scientific research is not required to follow these steps exactly. This walkthrough of the sequence is done to show that this research has a scientific background.

1. Spotting the problem(s).

• This step has been applied through discussions with employees at the company, to define what problems there are with using virtual machines as their development environment, and by doing the literature study. The results of these are the problems mentioned in subchapter 0.

2. Choice of approach (which includes background knowledge, methodics, and goal).

• This step is applied by the literature study (see subchapter 3.3.1) to gain the necessary information about each software (virtual machines and Docker), to be able to implement the environment. The choice of approach is to perform a case study with two different phases, one for each software (see subchapter 3.2.2). The plan is then to collect data using the different data collection methods (see subchapter 3.3) and then analyze (see subchapter 3.2.3) and compare the data.

3. Formulation of problem within chosen approach (i.e. transformation of raw initial problem into well-defined problem).

4. Search for existing relevant information (i.e. scanning of background knowledge stored in theories, tables, data banks, etc.).

• Made with the help of the literature study (see subchapter 3.3.1), to collect data that is relevant to this research.

5. Design of a research plan.

• Parts of the method chapter, especially subchapter 3.2, lay the base for a research plan. The design of the research plan was made before the research started. The research was divided into different phases to get a good overview of the project and to see what needed to be done.

6. Implementation of plan (conceiving hypotheses, building theories, making computations, performing measurements, designing techniques or artifacts, etc.).

• A specific case related to the research is implemented in the case study’s different phases mentioned in subchapter 3.2.2. The implementations are then later observed (see subchapter 3.4) to get data for the evaluation and comparisons (see chapter 5).

7. Trying (checking) the outcome of previous step.

• After the implementation in each phase of the case study is done, observations are made, to gain the data/information that is needed for the evaluation. The observations and the data being collected are described in subchapter 3.4.

8. Evaluation of candidate solution (idea, data, artifact, or what have you) in the light of the background knowledge as well as the outcome of the previous step.

• When the case study (see subchapter 3.2.2) is done, an evaluation of the collected data is made (see subchapter 3.3.2).

9. If solution is found acceptable, extend or revise background knowledge.

• When the result of the research is complete, it is evaluated and presented to TechniaTranscat; if satisfaction is achieved, the background knowledge is changed.

10. Otherwise start research process again, either replicating it or modifying some components (e.g. hypotheses or methods).


3.2 Project plan

The research is divided into several parts to ensure that the time plan can be met. Each part is allotted a number of work weeks, limited to two weeks per part. This leaves one week to use as margin in case problems occur. This project plan shows and describes what has been done and how it has been done. Figure 8 shows an illustration of the research process and its different parts.

Figure 8: An illustration of the research process.

3.2.1 Preparation

The research started with a literature study (see subchapter 3.3.1) and with trying to understand the problem, the task, the context and all the different parts that were going to be involved in the research. The literature study did not stop there; it worked as a good base but continued during the whole research when needed.



The preparations relate to steps 1 and 2 of the sequence in subchapter 3.1.5.1.

3.2.2 Case study

The case study focuses on two implementations in two different phases. In each implementation, a test application will be implemented as a part of a development environment. The test application is an application that TechniaTranscat uses in their development. Observations will be made to collect the necessary data/information to analyze and compare, in order to be able to draw conclusions. See subchapter 3.4 for information regarding the observations.

Phase 1

The first phase of the case study was to implement a virtual machine as a development environment (see subchapter 4.2). Here, the earlier literature study regarding virtual machines came to good use, as did discussions with TechniaTranscat about their interests and how they work.

This section relates to steps 2, 6 and 7 in the sequence from subchapter 3.1.5.1.

Phase 2

The task of the second phase was to use Docker as a development environment. The main goal was to install the test application inside a Docker environment (see subchapter 4.3).

The first attempt was to install the software and its accompanying databases and servers inside one single container. Once this had been done, an attempt was made to install the software, the database and the server in one container each and connect them together.

To succeed with this, the literature study about Docker came to good use, as did discussions with people at the company who have knowledge about installing the test application, for example about performing the installation silently and without interaction.

This section relates to steps 2, 6 and 7 in the sequence from subchapter 3.1.5.1.

3.2.3 Analysis and result

To analyze is to make research data from observations (see subchapter 0) and from other data collection methods interpretable, so that the data can later be related to the research questions [48, p. 31]. The analysis is often executed from time to time during the entire course of the research [48, p. 60-61].



In this section of the research, the analysis of the findings from the observations takes place. The analysis was made after the case study's phases (see subchapter 3.2.2), when all the data from the observations had been collected. The analysis includes an evaluation and comparisons of the data (see chapter 5).

This section relates to steps 6, 7, 8 and 9 in the general sequence in subchapter 3.1.5.1.

3.3 Data collection methods

It is very important to analyze, discuss and draw conclusions from data. In this research, we have collected data through two major sources: a literature study and several observations of our case study. The literature study data was used to gain valid background information, to be able to implement the two environments, and to write the background facts in this research. The observations collected data from the implementations made during the case study.

3.3.1 Literature study

A literature study was made in the beginning and laid the ground for the entire research. Its goal was to collect the necessary information needed for the project. This information was then processed and organized to act as a base for the case study and the research questions.

The literature study started off with the work of M. Hüttermann [1] to get a good understanding about DevOps and development environments, followed by Docker’s explanatory eBook about Docker containers and virtual machines [42] to get a simplified understanding about the differences between Docker containers and virtual machines. Many of the Docker references in this research have also been used in the literature study to understand Docker and to get the knowledge on how to use it.

Some of the other references have also been used for a more detailed information gathering around DevOps, development environments and virtual machines.

To find the references and books that lay as a base for the literature study, several search tools were used: Google Scholar [45], IEEE [44], Springer [46], KTH Primo [43] and ScienceDirect [47].

The literature study relates to steps 2 and 4 in the sequence in subchapter 3.1.5.1.


3.3.2 Observations from the case study

Observation is the collection of data, qualitative as well as quantitative. Observations can be made to acquire evidence for a hypothesis. Observation is the method one chooses to collect data about the real world; as Jarl Backman writes, “If one wants to know something about the real world, one must observe it” [48, p. 31]. The term observation covers all methods that give the user empiric data [48, p. 31].

Observations can be indirect, which means that the researcher must rely on observations that have been made by research outsiders. Or a researcher can use direct observations, which are observations made by the researchers themselves. The latter are the observations defined in subchapter 3.4.3.

Observations in this research are made after each case study phase, to be used in the analysis of the two development environments. The observations that were made are presented in subchapter 3.4.3, and the evaluation of their data is presented in chapter 5.

3.4 Evaluation Criteria

This subchapter explains what a good development environment is (see subchapter 3.4.1) and presents TechniaTranscat's interests (see subchapter 3.4.2). The research's observations are presented in subchapter 3.4.3.

3.4.1 Aspects of a good development environment

It's difficult to define what makes a development environment good, because every developer has their own specific preferences. One can always compare two results, for example the findings from the observations made in each case study phase. But to be able to evaluate these findings, there must exist some kind of baseline for what a good development environment is.

No matter which developer you ask, there are some aspects of a development environment that can save time and resources, depending on which environment one chooses.

Aspects of a good development environment:

• It should be easy to set up the environment.

• It shouldn’t take much time to set up the environment.

• Multiple virtual environments should be able to run simultaneously.

• It shouldn’t take much of the computer’s or the host computer’s resources.

• It should be easy to distribute the environments (size factor).

• It’s good if we can ensure that environments are the same/stay the same for each developer over time.



• It’s good if it’s possible to reset the environment to its initial state at any point of time.

• It should be easy to update the environment.

Some of these have been acquired from discussions with experts at the company and from the authors' own experiences.

3.4.2 TechniaTranscat’s interests

TechniaTranscat wanted the following list to be evaluated with the Docker software.

• The size of the existing virtual machines tends to grow as they are used. Is there a solution in Docker to prevent this or handle it more easily?

• Usually, one person creates the development environment for the development team. As the environment gets distributed across the team, it starts to live its own life due to software updates and developers adding more programs to the virtual machine. This can end up with software working on one machine but not on another. Is there a better way of handling this in Docker?

• The number of running virtual machine environments is limited to 2-3 due to a fixed stack of system resources. Explore how many Docker environments can be active on the same host computer.

• The possibility to use ad-hoc applications in a smooth way with Docker. Ad-hoc applications are small applications that are not used very often and require some effort to set up.

3.4.3 Observations

The observations made during the case study are based on the aspects that define a good development environment (see subchapter 3.4.1) and on TechniaTranscat's interests (see subchapter 3.4.2).

The following observations were made after the implementations in each case study phase:

• The amount of resources that the environment allocates.

• Setup and managing of the environment.

• The limit on the number of simultaneously running environments.

• The ease of upgrading the application in the environment.

• The deployment process, i.e. deploying the application onto the Tomcat server using an IDE.

These observations fulfill steps 6 and 7 in the general sequence in subchapter 3.1.5.1.


3.5 Development – tools and methods

Below follows the tools and methods that have been used during the case study, both in phase one and two. The tools and methods are used to create and configure virtual machines and to create and use Dockerfiles.

3.5.1 Tools

The development was done on a HP EliteBook provided by TechniaTranscat.

Software like Oracle VM VirtualBox and Docker were used to create the development environments that were implemented in the research.

Notepad++ [50] was used to edit all Dockerfiles, XML files and bash files.

Lucidchart [51] was used to create the flowcharts for the thesis.

3.5.2 Engineering Methods

During the research, some engineering methods were used. Following chapters will go through the research’s engineering specific aspects.

3.5.2.1 Goals

The research has several goals. As Sven Eklund [53, p. 139] describes, the project goal can be divided into three different goals.

Effect goals

The study should deliver results on how the choice of development environment can enhance the development process. Specifically, in our case study, the Docker software will be evaluated against virtual machines as a development environment. The goal is that these results can come to use or contribute to the industry in some way and have a scientific standpoint.

TechniaTranscat is interested in these results.

Project goals

The project goal of the study is to investigate how students use their previous experience as engineers, in a work environment that is not school based. The project and study should also provide research and its results that can help TechniaTranscat in the future choice of development environment.

TechniaTranscat and the school are interested in these results.


Result goals

The result goal is this scientific thesis, which contains the research's conclusions and results.

TechniaTranscat and the school are interested in this goal.

3.5.2.2 Applied methods

Daily meetings – Daily meetings were used during both phases of the case study. The participants were the supervisor from TechniaTranscat and the researchers. The purpose was not only to give a status check but also to set a plan for the day and discuss how the plan's goal should be achieved. These daily meetings are very much like the Daily Scrum used in the agile project method Scrum [49, p. 302].

Pair programming – A method very much like pair programming has been applied during the creation of the Dockerfiles, although no actual programming was involved. The method was applied in the form of two developers cooperating on one workstation to create these files. One developer creates the file while the other observes as each line gets entered. These two roles are switched frequently throughout the work, so that each developer gets to both create and observe [54].



4 Implementation of case study environments

This chapter will go through how the implementation of virtual development environments (see subchapter 0) in our case study was made: with virtual machines (phase 1) and with the Docker software (phase 2). Subchapter 4.1 goes through the development environment that was used in the case study.

Subchapter 4.2 will briefly go through how the environment was created with virtual machines. Subchapter 4.3 describes and explains in detail how the Docker software was used to create a development environment, from the creation of the Dockerfiles that lay the ground for the Docker images, to the composition of the final containers.

4.1 The development environment

In the specific case in the case study, the development environment consisted of one web-based application and an IDE. The application itself required other applications to be able to run properly. Therefore, the development environment in total included a Tomcat server, JDK 1.7.0_79 (Java Development Kit) [52], an Oracle Database 11.2.0.1 Enterprise Edition, Eclipse Neon 3 [24] as the IDE, and Gradle 2.7 as the build tool. Both implementations were made on an HP EliteBook 840 G1 with Windows Pro Edition.

4.2 Virtual machines as an environment

The chapters that follow will go through how the environment was created with virtual machines. Subchapter 4.2.1 will give a short overview about the virtual machine environments. The creation of the environment will be explained in subchapter 4.2.2.

4.2.1 Overview of virtual machines as a development environment

As mentioned in subchapter 4.1, the development environment in this specific case of the case study consisted of different parts. To be able to implement the environment, one needs a software to run virtual machines. The virtual development environment was implemented as a guest operating system (see subchapter 2.3.2). In this guest operating system, all the necessary applications, such as the Tomcat server, the Eclipse IDE, the Gradle build tool, the Oracle database and the web application itself, were installed. This was performed manually, like in any other operating system.



4.2.2 Creating an environment on a virtual machine

Creating the environment on a virtual machine does not differ from installing it directly on a normal computer and will therefore not be explained in this section. VirtualBox has a GUI, so the user can interact with the guest operating system. When all applications needed for the environment are installed, the structure can look like the one seen in Figure 9.

Figure 9: An illustration of the virtual development environment implemented with the use of a virtual machine.

4.3 Docker as an environment

The following subchapters will go through in detail how the Docker environment for the case study was created. The Docker Community Edition for Windows 10 was used in the case study. In subchapter 0, there is an overview of Docker as a development environment. How to use Dockerfiles and all other components in practice is explained in subchapter 4.3.2, and the implementation of the development environment is presented in subchapter 4.3.3.



4.3.1 Overview of Docker as a development environment

To be able to implement this in the Docker software, Dockerfiles had to be created that defined the different parts of the environment (the Tomcat server, the Oracle database etc.). These Dockerfiles (see subchapter 4.3.2) are then built with the Docker build command (see subchapter 4.3.2.2) into images (see subchapter 2.4.1.2). The images can then be executed in containers (see subchapter 2.4.1.3), resulting in containers with isolated applications that work as normal.

4.3.2 How to use Dockerfiles

This chapter goes through how to use Dockerfiles: how to use them in practice, what to think about when using them, and some Docker commands that were used in our case study.

4.3.2.1 Preparations for working with Dockerfiles

Depending on what purpose a Dockerfile has, it may need some external files.

For example, installation binaries or script files for a specific application. All these files are put together inside the folder where the Dockerfile is located.

Each Dockerfile should have their own folder, to keep everything organized.

When the Dockerfile is in place and has all its files in the same folder or subfolders, the creation and editing of the Dockerfile can start.
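As a sketch of this organization (the folder and file names below are hypothetical, not those of the case study), the build context of each image could be laid out like:

```
docker-environments/
├── oracle-db/
│   ├── Dockerfile
│   ├── install/     (installation binaries, response files)
│   └── scripts/     (setup scripts copied into the image)
└── tomcat/
    ├── Dockerfile
    └── conf/        (server configuration files)
```

Running the Docker build command inside one of these folders sends only that folder's contents to the Docker Engine as the build context, which keeps builds small and independent.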

4.3.2.2 Using Dockerfiles

A Dockerfile is a text file containing instructions and commands to be executed by the Docker Engine to create a Docker image. A Dockerfile defines an environment by defining which applications, or parts of an application, should be installed [32], see Figure 11. With the help of the Docker build command, an image can be created containing all the needed software.

In this research, Dockerfiles have been used to create images, from which containers can then be created, see Figure 10. These Dockerfiles contain all the commands that are needed to install a specific application and its required components, as mentioned in subchapter 4.1.

Figure 10: Illustrates the relations between a Dockerfile, an image and a container.

A container can be prepared for an installation of a specific application, with the actual installation then done manually inside the container (see subchapter 4.3.3.2.1). This requires commands in the Dockerfile that prepare the image/container for this, by for example organizing script and installation files inside the image.



Figure 11: The Dockerfile used to create the Oracle database.


4.3.2.3 Dockerfile commands

Here follow explanations of the commands seen in Figure 11.

FROM: The FROM command is vital when creating a Dockerfile; it is the first command in a Dockerfile, and without it the Dockerfile is not valid. An image cannot work properly and allow installations of applications if the FROM command is missing. The command specifies which base image the new image will be created from, for example an operating system image.

ENV: The ENV command declares the environment variables that will be used by the image/container.

COPY: This command copies files and folders from the host machine into the image. The files and folders can later be accessed by the image and the container running on the image.

RUN: This command runs terminal bash commands.

ADD: Does the same as the COPY command but can also copy files from remote URLs and automatically extract compressed archives.

USER: Specifies which user the following commands will be executed as.

WORKDIR: Sets the working directory for any Docker commands that follow in the Dockerfile.

VOLUME: Defines a volume for a service, see subchapter 4.3.3.3.3.

EXPOSE: Informs the Docker Engine that the container is listening on a specific port at runtime.

CMD: Defines the commands that should be executed when a container is started/created.
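As a minimal sketch combining several of these instructions (the base image, paths, port and file names are illustrative assumptions, not the ones used in the case study), a Dockerfile could look like:

```dockerfile
# Base image to build upon (FROM is mandatory and must come first)
FROM ubuntu:16.04

# Environment variable available inside the image and the container
ENV APP_HOME=/opt/app

# Copy local files from the build context into the image
COPY scripts/ ${APP_HOME}/scripts/

# Run shell commands at build time
RUN chmod +x ${APP_HOME}/scripts/start.sh

# Work from the application directory for the following instructions
WORKDIR ${APP_HOME}

# Document that the container listens on port 8080 at runtime
EXPOSE 8080

# Command executed when a container is started from the image
CMD ["./scripts/start.sh"]
```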

4.3.3 Create a development environment out of Dockerfiles

When the Dockerfiles have been created and defined by all the commands and instructions inside them, it is time to create the development environment.

As mentioned in subchapter 4.1, the development environment consists of an application and an IDE. The application itself consists of several parts, for example a database, a web server and other required libraries. This can be implemented in several ways using Docker. The following subchapters will explain and show two different approaches to how it can be done.

4.3.3.1 System structure and organization

Depending on which approach is used, the structure and organization of the application inside the Docker Engine can vary. The IDE will be installed on the host computer in both approaches. In the first approach, everything that is related to the application (database, web server, JDK etc.) is inside one container. The Docker Engine must only have that container up and running for the application to be fully functional, see Figure 12. In the second approach, each part of the application gets its own container (see Figure 13). This means that each container must be up and running for the application to be fully functional. The Docker Engine links these containers together with the help of a Docker compose file (see subchapter 4.3.3.3.2).



Figure 12: Illustration of the structure inside the Docker Engine when using the first approach.

Figure 13: Illustration of the structure inside the Docker Engine when using the second approach.
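A hedged sketch of how the second approach could be expressed in a compose file (the service names, image names and ports are illustrative assumptions, not the ones from the case study):

```yaml
version: "3"
services:
  database:
    image: oracle-db        # built from the database Dockerfile
    ports:
      - "1521:1521"
  webserver:
    image: app-tomcat       # built from the Tomcat/application Dockerfile
    ports:
      - "8080:8080"
    depends_on:
      - database            # start the database container first
```

With a file like this, the docker-compose command starts all the containers together, so that they work as one application.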


The IDE

Windows 10, the platform that was used in the case study, does not have a VNC [22] system installed by default. Such a system is required if one is going to use the graphical interface of an application that is inside a Docker container. If the IDE were running inside a container, one would not be able to use it. In the case study, there was a need to implement the development environment with as few components installed on the host machine as possible, which resulted in the IDE being installed on the host computer. This was implemented by using Docker's equivalent to shared folders (see subchapter 2.3.3), volumes (see subchapter 4.3.3.3.3), so that a container can access files located on the host computer using, for example, a compose file [55].
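As an illustrative sketch of this setup (the image name and paths are hypothetical), a volume entry in a compose file can expose a host directory, such as an IDE workspace, to a container:

```yaml
services:
  webserver:
    image: app-tomcat           # hypothetical image name
    volumes:
      # host path : container path. The container sees the files that
      # the host-side IDE edits, without copying them into the image.
      - ./workspace/webapp:/usr/local/tomcat/webapps/app
```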

4.3.3.2 Approach 1 – An application in one container

One could define and build the whole application with one Dockerfile. However, the build context when building that image can become very large in size, and for a large application it is nicer to have some organization and to see that each step (Dockerfile) works as it should independently. Therefore, it can be good to split the application's parts into separate Dockerfiles. By building these on top of each other (see subchapter 4.3.3.2.1), one will in the end, after building the last Dockerfile into an image, have an image that includes the whole application, see Figure 12.

4.3.3.2.1 Building images on top of each other

This solution results in a final image that contains all the necessary parts of the application, see Figure 14. Each image is built on the basis of the previous image, from a committed container. The actual installations are made inside the containers, and the Docker commit command is used to create an image from the changes inside the container (see subchapter 4.3.3.2.2).
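A hedged sketch of this commit chain (the image and container names are hypothetical; the actual installations are performed interactively inside each container):

```shell
# Build the base image from the first Dockerfile
docker build -t env-base ./base

# Run a container and perform the first installation manually inside it
docker run -it --name db-step env-base bash

# Freeze the container's changes into a new image
docker commit db-step env-db

# Repeat for the next part, building on the committed image
docker run -it --name server-step env-db bash
docker commit server-step env-db-server
```

After the last commit, the final image contains all the installed parts, and ordinary containers can be started from it.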
