
Data Processing and Collection in Distributed Systems

Sara Andersson

Computer Science and Engineering, master's level

2021

Luleå University of Technology


Abstract

Distributed systems can be seen in a variety of applications that are in use today. Tritech provides several systems that to some extent consist of distributed systems of nodes. These nodes collect data, and the data has to be processed. A problem that often appears when designing these systems is deciding where the data should be processed, i.e., which architecture is the most suitable one for the system. Deciding the architecture for these systems is not simple, especially since the answer changes rather quickly due to the development in these areas.

The thesis aims to perform a study regarding which factors affect the choice of architecture in a distributed system and how these factors relate to each other. To be able to analyze which factors affect the choice of architecture, and to what extent, a simulator was implemented. The simulator received information about the factors as input and returned one or several architecture configurations as output. The input factors to the simulator were chosen by performing qualitative interviews. The factors analyzed in the thesis were: security, storage, working memory, size of data, number of nodes, data processing per data set, robust communication, battery consumption, and cost. From the qualitative interviews as well as from the prestudy, five architecture configurations were chosen. The chosen architectures were: thin-client server, thick-client server, three-tier client-server, peer-to-peer, and cloud computing.

The simulator was validated against the three given use cases: agriculture, the train industry, and industrial Internet of Things. The validation consisted of five existing projects from Tritech. From the results of the validation, the simulator produced correct results for three of the five projects. From the simulator results, it could be seen which factors affect the choice of architecture more than others and which are hard to provide in the same architecture because they conflict. The conflicting factors were security together with working memory and robust communication. Working memory together with battery consumption also turned out to be conflicting factors that are hard to provide within the same architecture. Therefore, according to the simulator, the factors that affect the choice of architecture the most were working memory, battery consumption, security, and robust communication.

Using the results of the simulator, a decision matrix was designed whose purpose was to facilitate the choice of architecture. The evaluation of the decision matrix consisted of four projects from Tritech, including the three given use cases: agriculture, the train industry, and industrial Internet of Things. The evaluation showed that, of the two architectures that received the most points, one was the architecture used in the validated project.


Sammanfattning

Distributed systems can be seen in a variety of applications that are in use today. Tritech works with several products that to some extent consist of distributed systems of nodes. What these systems have in common is that the nodes collect data, and this data will in one way or another need to be processed. A question that often needs to be answered when setting up the architecture for such projects is where the data should be processed, i.e., which architecture configuration is the most suitable for the system. Making these decisions has proven not always to be simple, and the answer changes relatively quickly with the development taking place in these areas.

This thesis aims to perform a study of which factors affect the choice of architecture for a distributed system and how these factors relate to each other. To be able to analyze which factors affect the choice of architecture, and to what extent, a simulator was implemented. The simulator took the factors as input and returned one or several architecture configurations as output. The input factors for the simulator were chosen by conducting qualitative interviews. The factors analyzed in this thesis were: security, storage, working memory, size of data, number of nodes, data processing per data set, robust communication, battery consumption, and cost. From the qualitative interviews and from the prestudy, five architecture configurations were also chosen. The chosen architectures were: thin-client server, thick-client server, three-tier client-server, peer-to-peer, and cloud computing.

The simulator was validated within the three given use cases: agriculture, the train industry, and industrial IoT. The validation consisted of five existing projects from Tritech. Based on the results of the validation, the simulator produced correct results for three of the five projects. From the simulator's results, it could be seen which factors have a larger impact on the choice of architecture and which are difficult to combine in one and the same architecture configuration. These factors were security together with working memory and robust communication. Working memory together with battery consumption also proved to be factors that were difficult to combine in the same architecture configuration. Therefore, according to the simulator, it can be seen that the factors that affect the choice of architecture were working memory, battery consumption, security, and robust communication.

Using the simulator's results, a decision matrix was designed whose purpose was to facilitate the choice of architecture. The evaluation of the decision matrix consisted of four projects from Tritech, which included the three given use cases: agriculture, the train industry, and industrial IoT. The result of the evaluation of the decision matrix showed that, of the two architectures that received the most points, one was the architecture used in the validated project.


Acknowledgments

Firstly, I would like to thank my supervisor at Tritech, Thomas Danielsson, for his guidance, support, and availability throughout the thesis. Thanks to all project managers, senior architects, and developers at Tritech for participating in the qualitative interviews that were performed during this thesis and for giving constructive feedback.

I would also like to thank my examiner at Luleå University of Technology, Peter Parnes, for providing valuable feedback during the thesis process. Finally, I would like to thank my family for their support throughout my studies.

With my warmest regards,
Sara Andersson
Stockholm, Sweden, June 13, 2021


Contents

Abstract
Sammanfattning
Acknowledgments
Glossary
Acronyms
1 Introduction
    1.1 Background
        1.1.1 Company background
    1.2 Motivation
    1.3 Problem definition
    1.4 Ethical aspects
    1.5 Delimitations
    1.6 Method
    1.7 Thesis structure
2 Related work
    2.1 Simulator
        2.1.1 SimGrid
        2.1.2 SimEvents
        2.1.3 Cooja simulator
    2.2 Decision matrix
3 Theory
    3.1 Chosen factors
        3.1.1 Security
        3.1.2 Storage
        3.1.3 Battery Consumption
        3.1.4 Working memory
        3.1.5 Number of nodes
        3.1.6 Data processing per data set
        3.1.7 Size of data
        3.1.8 Robustness in communication
        3.1.9 Cost
    3.2 Architectural styles
        3.2.1 Tiered architecture
    3.3 Architecture models in distributed systems
        3.3.1 Client-Server
            3.3.1.1 Thin-client server architecture
            3.3.1.2 Factors impact on thin-client server architecture
            3.3.1.3 Thick-client server architecture
            3.3.1.4 Factors impact on thick-client server architecture
            3.3.1.5 Three-tiered client-server architecture
            3.3.1.6 Factors impact on three-tier client-server architecture
        3.3.2 Peer-To-Peer
            3.3.2.1 Structured overlay networks
            3.3.2.2 Unstructured overlay networks
            3.3.2.3 Hierarchical overlay networks
            3.3.2.4 Factors impact on peer-to-peer architecture
        3.3.3 Cloud computing
            3.3.3.1 Service models in cloud computing
            3.3.3.2 Deployment models
            3.3.3.3 Factors impact on the cloud computing architecture
        3.3.4 Edge computing systems
            3.3.4.1 Fog computing
            3.3.4.2 Factors impact on edge computing architecture
        3.3.5 Summary
    3.4 Simulator
    3.5 Decision Matrix
4 Implementation
    4.1 Tools
        4.1.1 Tkinter
        4.1.2 SimPy
        4.1.3 Time
    4.2 Simulator
        4.2.1 Thin-client server simulation
        4.2.2 Thick-client server simulation
        4.2.3 Three-tiered client-server simulation
        4.2.4 Cloud computing simulation
        4.2.5 Peer-to-peer simulation
        4.2.6 Calculations and result evaluation
            4.2.6.1 Average transfer time
            4.2.6.2 Data processing time
            4.2.6.3 Total time
            4.2.6.4 Energy consumption
            4.2.6.5 Predefined communication protocol values
            4.2.6.6 Evaluation of the result
        4.2.7 Testing methodology
    4.3 Decision matrix
        4.3.1 Testing methodology
5 Results
    5.1 Simulator
        5.1.1 Factor tests
            5.1.1.1 Low data processing
            5.1.1.2 High data processing
            5.1.1.3 Incorrect test cases
    5.2 Decision matrix
        5.2.1 Rating factors
            5.2.1.1 Number of nodes
            5.2.1.2 Size of data
            5.2.1.3 Battery consumption
            5.2.1.4 Data processing per data set
            5.2.1.5 Security
            5.2.1.6 Storage
            5.2.1.7 Cost
            5.2.1.8 Robust communication
            5.2.1.9 Working memory
            5.2.1.10 Rating factor values
        5.2.2 The resulting decision matrix
        5.2.3 Evaluation of decision matrix
6 Discussion
    6.1 Simulator
        6.1.1 Implementation of the simulator
        6.1.2 Factor tests
            6.1.2.1 Incorrect test cases during factor test
        6.1.3 Validation tests
            6.1.3.1 User input factors
    6.2 Decision matrix
        6.2.1 Implementation of the decision matrix
        6.2.2 Evaluation of the decision matrix
            6.2.2.1 User input weight of each factor
    6.3 Summary
7 Conclusion and future work
    7.1 Conclusion
        7.1.1 Research Question 1
        7.1.2 Research Question 2
    7.2 Future work
Bibliography
A Qualitative interviews
    A.1 Interview questions
    A.2 Summary interviews
        A.2.1 Summary factors
        A.2.2 Summary architectures
B Summary - Factors impact on architecture
C Factor test result
    C.1 Factor test - High data processing

List of Figures

3.1 Client-server architecture
3.2 Three-tiered client-server architecture
3.3 Peer-to-peer architecture
3.4 A typical architecture of edge computing networks
4.1 Flowchart of the simulator
4.2 Illustrates the simulation of the thin-client server architecture
4.3 Illustrates the simulation of the thick-client server architecture
4.4 Illustrates the simulation of the three-tiered client-server architecture
4.5 Illustrates the simulation of the cloud computing architecture
4.6 Illustrates the simulation of the unstructured peer-to-peer architecture with six peers
5.1 The number of times the simulator suggested each architecture when running low data processing
5.2 The number of times the simulator suggested each architecture along with each factor when running low data processing
5.3 The number of times the simulator suggested each architecture when running high data processing
5.4 The number of times the simulator suggested each architecture along with each factor when running high data processing
5.5 The resulting decision matrix containing the defined rating factors
C.1 Factor test when using high data processing, communication protocol Bluetooth, 100 nodes, and 1 MB of data
C.2 Factor test when using low data processing, communication protocol Bluetooth, 100 nodes, and 1 MB of data

List of Tables

4.1 Communication protocol table
5.1 Incorrect test cases conducted from the factor tests
5.2 Validation of the simulator using six test cases based on existing data from Tritech
5.3 Decision matrix rating factors
5.4 Evaluation of the decision matrix

Glossary

Concurrency A system is concurrent if it can handle multiple program requests, computations, etc., simultaneously.

Multitenancy solutions Refers to one resource serving several users.

Thick-client device Could for example be a workstation or a laptop. In comparison to a thin-client device, it can provide better Central Processing Unit (CPU) speed, larger Random Access Memory (RAM), and faster Internet connectivity.

Thin-client device Could for example be a smart mobile device, such as a smartphone, a tablet, a sensor device, or an IoT device.

Acronyms

CPU Central Processing Unit
GB Gigabyte
IoT Internet of Things
LPWAN Low Power Wide Area Network
NIST National Institute of Standards and Technology
RAM Random Access Memory
TCP Transmission Control Protocol
WSN Wireless Sensor Network

Chapter 1

Introduction

Distributed systems can be seen in a variety of applications that are in use today, where one of the most well-known ones is the World Wide Web [1, p.3]. Due to the fast development in this area, distributed systems can today be seen in our daily lives in a variety of sizes [1, p.3]. They can be seen both in a car or an aircraft and in a system where millions of nodes participate [1, p.3].

There exist several definitions of distributed systems; in this thesis, the most suitable definition is the one given in [2]: "A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system" [2, p.2]. To appear as a single coherent system to the user, the computing elements of the system need to collaborate, and this collaboration can be considered one of the most important properties of a distributed system [2, p.2].

A system intended to be used in a real-world environment needs to be designed to handle all kinds of difficulties and threats [1, p.38]. Several design challenges have been identified when designing a distributed system [1, p.38]. To facilitate the management of these design challenges, several design patterns can be used [3, p.31]. These patterns can also affect the resulting system properties, such as performance and effectiveness [3, p.31].

1.1 Background

Tritech Technology AB works with several products that to some extent consist of a distributed system of nodes. Examples of areas are agriculture, the train industry, and industrial Internet of Things. What these systems have in common is that the nodes collect data and the data then has to be processed. A question that often appears when deciding the architecture for these systems is whether the data should be processed on the nodes, in the network, or in the cloud, i.e., which architecture configuration is the most suitable for the system. Knowing the specifics of the subsystem, i.e., how many resources in the form of power supply, storage, and working memory will be required and where the "intelligence" is most advantageously placed, helps in deciding the most suitable architecture configuration.

These decisions are not always simple, and the answer also changes rather quickly due to the development in these areas. There is also uncertainty at both Tritech and their customers about which factors affect the choice and to what extent.

The thesis aims to perform a theoretical study regarding distributed system architectures and which factors affect the architecture and how. With this knowledge, a simulator can be implemented to investigate which factors affect the choice of architecture and how these factors relate to each other.

1.1.1 Company background

The thesis will be performed at Tritech Technology AB. Tritech is a consulting company located in Stockholm with customers who work with Internet of Things in a wide variety of areas. Historically, Tritech has been a hardware and embedded systems company, but it has in recent years broadened its expertise and now performs tasks covering the entire software stack.

1.2 Motivation

The purpose of the thesis is to help Tritech provide better decision support for their customers and reduce the number of oversized and/or incorrectly constructed systems, thereby reducing project costs.

1.3 Problem definition

The following research questions have been formed:

• RQ1: Which factors affect the choice of architecture and how do they relate to each other?

• RQ2: With the given result from RQ1, how should a decision matrix be designed to facilitate the choice of architecture?

1.4 Ethical aspects

The ethical aspect considered in this thesis is the handling of sensitive data. The simulator needs input regarding the system that is supposed to be implemented. This information can be considered sensitive and must not be known to anyone except the people involved in these projects. When using the simulator, this sensitive data might need to be known and can therefore become a risk.

1.5 Delimitations

The thesis has the following delimitations:

• The simulator is not tested for use cases other than the three given ones: agriculture, the train industry, and industrial Internet of Things.


• Architectures other than those used at Tritech and those found in the prestudy are not considered.

• The decision matrix is not evaluated for use cases other than the three given ones: agriculture, the train industry, and industrial Internet of Things.

1.6 Method

The thesis will commence with a planning phase where a time plan and a work structure will be set. The next phase, the research phase, will consist of studies related to distributed systems to determine which factors affect the architecture model and how. The research phase will also consist of qualitative interviews with experienced architects, developers, and project leaders at Tritech. Using the studied theory, there will then be an implementation phase where the simulator will be implemented. The simulator will be tested and evaluated using data from existing distributed system projects. From the results of the simulator, a decision matrix will be formed to facilitate the choice of architecture. The evaluation and the results from the simulator and the decision matrix will be analyzed and discussed to answer the given research questions.

1.7 Thesis structure

The thesis is structured as follows. Chapter 2 covers the related work found during the prestudy of the thesis. Chapter 3 covers the theory of the thesis. The purpose of the theory is to gain knowledge about distributed system architectures, both architectures that are used at Tritech and other architectures, as well as knowledge about which factors affect the architecture and how. The theory also includes how the simulator and the decision matrix are supposed to be implemented to answer the questions defined in section 1.3. Chapter 4 covers the implementation. It includes how the simulator is implemented to realize the architectures found in the prestudy, and how the decision matrix is designed to facilitate the choice of architecture. Chapter 5 covers the results gained from the validation of the simulator. It also includes how the decision matrix should be constructed according to the simulation results, as well as the evaluation of the decision matrix. Chapter 6 covers the discussion of the results gained from both the simulator and the decision matrix. Lastly, chapter 7 concludes the thesis, answers the research questions defined in section 1.3, and covers future work.


Chapter 2

Related work

Throughout the thesis, a simulator and a decision matrix are to be implemented. This chapter covers the simulator tools that were considered in the thesis as well as decision matrix techniques that facilitate the decision-making process.

2.1 Simulator

The chosen factors that the simulator in the thesis is supposed to consider are security, storage, battery consumption, working memory, robust communication, number of nodes, data processing per data set, size of data, and cost. There are not many simulators available that receive input from the user regarding important factors, simulate distributed architectures, and return one or several architecture recommendations based on the user input. There do exist simulator tools that can simulate a distributed system and can be associated with the thesis. These tools are explained in this section.

2.1.1 SimGrid

One framework that provides simulation of several distributed system architectures is SimGrid [4]. SimGrid provides distributed simulation of several architecture configurations, for example cloud computing, fog computing, and peer-to-peer computing [4]. By using SimGrid, the user can compare the simulated architectures in terms of their designs as well as parameters such as bandwidth and latency [4]. The SimGrid framework is a suitable tool for providing a reliable overview of how a distributed algorithm performs. The purpose of the simulator in the thesis, however, is to compare architectures as well as recommend the most suitable one according to several user-input factors. Since the SimGrid framework does not provide useful tools regarding the performance of all the chosen factors that the thesis simulator needs to take into account, the SimGrid framework is not suitable for the simulator in this thesis.

2.1.2 SimEvents

Another simulator framework that might have been a suitable solution is SimEvents [5]. SimEvents provides simulation tools for message exchange in a distributed environment as well as an overview of resource use [5]. When using SimEvents, several performance characteristics can be simulated, for example routing, delays in processing, and communication [5].


SimEvents would provide an overview of how each component manages the predefined tasks and would have been suitable for simulating which architecture is most suitable when information about the hardware specifics is available. In the thesis simulator, the hardware specifics of the system that is supposed to be simulated might not be known when running the simulator, and since the user is supposed to get an overview of which hardware specifics are suitable depending on which architecture configuration the simulator recommends, the SimEvents simulator is not suitable in this case.

2.1.3 Cooja simulator

Another simulation tool that was considered in the thesis was Cooja [6]. Cooja is a network simulator designed specifically for Wireless Sensor Networks (WSN) and is provided by the Contiki [7] operating system [8]. By using the Cooja simulator, code can be tested before it is deployed onto the devices, and different properties, such as power consumption, can be estimated [9]. The Cooja simulator also provides several tools that could be useful during simulation, such as the location of each node in the network, output, and a timeline [6].

This simulator would provide an overview of several hardware specifics, such as battery consumption, which is a factor included in the thesis; therefore this simulation tool was considered. It would have been a good complement for simulating battery consumption, but since it does not provide the comparisons needed for each architecture configuration, this tool is not suitable for implementing the thesis simulator.

2.2 Decision matrix

From the results of the simulator, a decision matrix is supposed to be designed. The purpose of the decision matrix is to facilitate the choice of architecture, and there have been previous studies regarding how a decision matrix can be designed to facilitate the decision-making process. The studies that are relatable to the thesis are explained in this section.

In [10], the decision matrix method is illustrated by showing how it can be utilized to facilitate the choice of airplane torque tube material. The different material options are defined and a rating scale of 1-5 is set, where 1 is the lowest-ranked material and 5 the highest-ranked. Using this scale, a rating factor is set for each material option based on the properties and cost of the materials. The study explains that a small change in the rating factors will affect the results and that, without a weighting factor, all criteria have the same importance, which also affects the choice of material. Therefore, the study adds a weighting factor with the same ranking scale. Multiplying the weighting factor with the rating factor constitutes a decision factor. The material option with the highest sum of decision factors is the most suitable one to use according to the decision matrix. By providing this weighting factor system, the most important criteria can be prioritized. The study also mentions that the decision matrix can be used as a tool to facilitate the decision process, but that the final decision should not be based only on the decision matrix.
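To make the weighted-scoring idea above concrete, the following is a minimal sketch in Python (the criteria, options, and 1-5 values are made-up illustrations, not data from [10]): each option's decision score is the sum of rating times weight over all criteria, and the option with the highest total is preferred.

```python
# Minimal sketch of a weighted decision matrix (illustrative values only).
# Each criterion has a weight (1-5) and each option a rating (1-5) per criterion.

weights = {"strength": 5, "weight": 4, "cost": 3}

ratings = {
    "aluminium": {"strength": 3, "weight": 5, "cost": 4},
    "steel":     {"strength": 5, "weight": 2, "cost": 5},
    "composite": {"strength": 4, "weight": 5, "cost": 2},
}

def decision_score(option_ratings, weights):
    # Decision factor per criterion = rating * weight; the total is their sum.
    return sum(option_ratings[c] * weights[c] for c in weights)

scores = {option: decision_score(r, weights) for option, r in ratings.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

As the study points out, such a score should support the decision process rather than replace the final decision.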


In [11], another kind of ranking system is used when designing the decision matrix. The ranking system compares concepts using "+" and "-". The study also specifies steps for how a decision matrix with this ranking system should be designed and evaluated. It begins with the user setting one design option to be used as a reference design; this is placed in the first column and the rest of the designs are listed after it. Each criterion is listed as a row. When all the concepts and criteria are listed, comparisons can be made. The comparisons are performed by comparing each concept in each criterion to the reference concept. If it is better than the reference concept, a "+" is added, and if it is worse, a "-" is added. The "+" and "-" are then summed up to get a total score for each concept, and the design that returns the highest score can be considered the most suitable design according to the comparisons made.
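A minimal sketch of this reference-based "+"/"-" comparison, again with hypothetical concepts and values rather than data from [11]: each concept gains +1 for every criterion where it beats the reference and -1 where it is worse.

```python
# Minimal sketch of a Pugh-style comparison against a reference concept.
# Only better/worse than the reference matters: +1 / -1 (ties count as 0).

reference = {"security": 3, "cost": 4, "scalability": 2}

concepts = {
    "concept_a": {"security": 4, "cost": 3, "scalability": 4},
    "concept_b": {"security": 2, "cost": 5, "scalability": 2},
}

def pugh_total(concept, reference):
    total = 0
    for criterion, ref_value in reference.items():
        if concept[criterion] > ref_value:
            total += 1   # "+": better than the reference
        elif concept[criterion] < ref_value:
            total -= 1   # "-": worse than the reference
    return total

totals = {name: pugh_total(c, reference) for name, c in concepts.items()}
print(totals, "->", max(totals, key=totals.get))
```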


Chapter 3

Theory

This chapter covers the findings from the literature review performed during the prestudy. These findings give a theoretical overview of the thesis. The main focus of the literature study is to describe the chosen factors that affect the choice of architecture, which architectures are in use, how they work, and how the factors theoretically affect these architectures. The findings of the study are utilized during the implementation of the simulator to be able to see how the factors affect the choice of architecture.

The literature for the study was found in databases that provided studies relatable to the thesis. The academic papers used in the study were mainly found using the library search tool provided by Luleå University of Technology, which gave access to databases such as Springer Link, ASTM, ACM, IEEE, and Science Direct.

3.1 Chosen factors

This section covers the chosen factors that affect the choice of architecture in a distributed system. Depending on how these factors affect the architecture, the choice of a distributed system architecture can be facilitated. The factors were chosen according to the results from the prestudy as well as from the qualitative interviews. The results of the qualitative interviews can be seen in appendix A.

3.1.1 Security

Since the nodes that constitute a distributed system could be distributed over a certain area, and since the entire area might not be controlled, several security challenges have to be met [12, p.9]. The data transfer used in distributed systems might also make the system vulnerable, which can for example be seen in Internet of Things (IoT) networks [13].

In an IoT network, to be able to perform a secure data transfer, information confidentiality, integrity, and availability must be handled in a secure manner [13]. Since the sensors that constitute an IoT network are connected to the Internet, several security challenges arise [13]. One example is authentication [13]. Many users might be interested in the same specific service, and for the system to be secure, authentication suitable for the devices must be provided [13]. Another important security challenge that needs to be met during data transfer is that the user should be able to get the services needed without being controlled by any unapproved system [13].

3.1.2 Storage

The nodes that constitute a distributed system have a certain storage capability. A single node may have difficulties storing a huge amount of various data [14]. The storage of a distributed system can be configured differently depending on what kind of architecture is used [1, p.13]. Examples are storing the data on servers or using a cloud service [1, p.13]. Examples of companies that provide cloud storage are Google, Facebook, Amazon, and Microsoft [15]. By using cloud storage, the user can store a huge amount of data [15].

3.1.3 Battery Consumption

A distributed system may consist of various kinds of system components, such as computers and sensor devices [2, p.2]. One thing these components have in common is that they consume a certain amount of electric power. Some components, for example sensor devices, are limited in battery capacity [16, p.248]. This power management challenge can for example be seen in IoT wireless sensor networks [16, p.236].

Since the energy consumption of a device depends on the software application that is in use, it is hard to estimate the total energy consumption [17].

3.1.4 Working memory

The factor working memory is focused on the client node of the system. The working memory works as a read-and-write memory and is the internal memory used by the Central Processing Unit (CPU) [18]. The data stored in the working memory is only available as long as the device is running [18].

The nodes that form a distributed system can either be powerful computers or smaller devices [2, p.2], and can therefore differ in working memory capacity.

3.1.5 Number of nodes

Scalability is an important property when designing a distributed system [1, p.19]. Adding more nodes to the system can affect the system's performance [2, p.16]. One example can be seen if there were only one server that computes optimal routes by considering the current traffic [2, p.16]. As the number of requests increases, the server might get overloaded and the responses back to the clients would be delayed [2, p.16].

3.1.6 Data processing per data set

Data processing in a distributed system can for example consist of statistical analysis or searching for patterns [19]. When using distributed systems, the load is distributed between several devices where the processing is performed [19]. This provides the advantage that a large amount of information can be processed faster, which can be important for information systems that need to provide fast results to the end-user [19].


Depending on the amount of data that needs to be processed, different approaches can be used [1, p.13]. For example, to be able to perform the data processing that might be required for the system, additional physical servers or cloud facilities can be used [1, p.13].

3.1.7 Size of data

The size of the data can vary depending on the environment the system is operating in. For example, in an environment where IoT devices operate, a huge amount of data can be collected [20]. If the size is too large for the device, the system needs to expand to be able to handle the heavier load.

3.1.8 Robustness in communication

According to [21], robust communication can be defined as "communication that can withstand intentional or unintentional disturbances of various kinds". In a distributed system, communication between the nodes in the network can be performed using different approaches [2, p.2]. The communication can consist of techniques for both wired and wireless networks [2, p.2].

To be able to use the applications that are located in the cloud, such as online databases, robust communication is required [22].

3.1.9 Cost

According to the qualitative interviews in appendix A, cost is an important factor when designing a distributed system, since the nodes that constitute the system cannot be too expensive if the system is to be profitable.

Estimating the cost when designing a distributed system can be a great challenge [23]. The main reason this is considered a challenge is that it is hard to know specific details about future systems [23]. Due to the fast development in this area, it is hard to speculate how future systems might change in technology as well as in size, which affects the cost estimation [23].

3.2 Architectural styles

When designing a distributed system, several architectural styles can be used [2, p.56]. The main focus of an architectural style is how the components interact with each other, for example their connections and their functionality in the system [2, p.56].

The thesis will focus on the tiered architectural style, since this is the one used in the chosen architecture models, which are described in section 3.3.

3.2.1 Tiered architecture

When using a tiered architecture, the functionality of the system can be divided into individual layers, where each layer is responsible for providing its functionality [1, p.52-53]. A specific functionality can be placed on a node that is suitable for providing it [1, p.52-53]. The node could for example be a server [1, p.52-53]. The different functionalities of each layer can for example be divided as follows:

• Presentation tier – This tier consists of the user interface and collects data from the user [24].

• Application tier – This tier processes the data that is sent from the presentation tier and can also perform different operations on the data tier [24].

• Data/Resource tier – This tier is also known as the database tier [24]. The database tier stores the data that is sent from the application tier [24].

When using a two-tiered architecture, the three tiers explained above are divided between two separate devices, for example a client and a server [1, p.53]. The presentation tier functionality could be placed at the client, and the functionality regarding the application tier and the database tier could be placed at the server, but this could be configured in other ways as well [2, p.78]. In a three-tiered architecture, the three tiers explained above are divided onto three separate devices [1, p.53]. In this architecture, a client device is responsible for the presentation tier, a server is responsible for the application tier, and a separate database server is responsible for the data tier [1, p.53]. The tiered architecture can be generalized to n-tiered, where the system is divided into n components [1, p.53].
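As an illustration of this tier split, the following is a minimal Python sketch (class and method names are hypothetical, not from the thesis): in a three-tiered setup each tier would run as a separate component, while in a two-tiered, thin-client setup the application and data tiers would be co-located on the server.

```python
# Minimal sketch of a three-tiered request flow (hypothetical names).
# Presentation tier: collects input from the user and shows results.
# Application tier: processes the data.
# Data tier: stores and retrieves the processed data.

class DataTier:
    def __init__(self):
        self._rows = []

    def store(self, record):
        self._rows.append(record)
        return len(self._rows)          # id of the stored record

class ApplicationTier:
    def __init__(self, data_tier):
        self.data_tier = data_tier

    def process(self, raw_value):
        processed = {"value": raw_value, "squared": raw_value ** 2}
        return self.data_tier.store(processed)

class PresentationTier:
    def __init__(self, app_tier):
        self.app_tier = app_tier

    def submit(self, user_input):
        record_id = self.app_tier.process(int(user_input))
        return f"stored as record {record_id}"

# In a three-tiered deployment each class would run on its own node;
# in a thin-client two-tier setup, ApplicationTier and DataTier share a server.
ui = PresentationTier(ApplicationTier(DataTier()))
print(ui.submit("7"))
```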

3.3 Architecture models in distributed systems

This section covers the architecture models that will be utilized in the thesis. It includes a brief introduction to the architectures and explains how the factors declared in section 3.1 affect each architecture model. By knowing how the factors theoretically affect the architecture, the simulator can be implemented according to the theoretical framework.

The architecture of a certain system refers to the components that constitute the system and the relationships between these components [25, p.28]. The main purpose of the system architecture is to design the system to be able to manage both current and future requirements on the system [25, p.28].

There exist several different kinds of distributed system architectures, but the thesis will focus on the architectures that showed to be of interest in the prestudy and in the qualitative interviews. The results of the qualitative interviews can be seen in appendix A. Therefore, the thesis will focus on the architectures Client-server, Peer-to-peer, and hybrid architectures such as Cloud computing and Edge computing.

3.3.1 Client-Server

One of the most used architectures in distributed systems is the client-server architecture [25, p.29]. In the client-server architecture, clients communicate with one or several servers to be able to access resources that the server maintains, which could for example be files [25, p.29].


Figure 3.1: Client-server architecture

Figure 3.1 shows the client-server architecture where six clients request and receive services from a centralized server.

The client-server architecture can be used for several applications; one example is when many different components are interested in the same service [26, p.48]. In that case, this service can be placed at a server that is designed to handle all the different client requests [26, p.48]. By using this approach, the clients can access the same resource by interacting with the server [26, p.48].

The client-server approach can for example be seen in the online game EVE Online [1, p.5-6], where data is maintained at the server and the client can retrieve the data by, for example, using player consoles [1, p.5-6].

The client-server organization can use the tiered architecture [2, p.78], which is described in more detail in section 3.2.1. Since one of the focuses of the thesis is data processing and where the data processing should be performed, the thesis will focus on the two-tiered architecture with thin and thick clients as well as the three-tiered architecture.

3.3.1.1 Thin-client server architecture

In the thin-client server architecture, the presentation tier is placed at the thin-client device, and the application logic as well as the data logic is placed at the server [25, p.30]. With this model the client is very dependent on the server, which increases the processing load on the network and the server [25, p.30]. A thin-client device could for example be a smart mobile device [27], such as a tablet, a sensor device, or an IoT device [27].

3.3.1.2 Factors impact on thin-client server architecture

The factors that are explained in section 3.1 affect the thin-client server architecture in the following ways:

Security:

When using a client-server model, there are security challenges due to the data transfer between the client and the server [28]. The security challenges regarding data transfer are explained in more detail in section 3.1.1. Since the thin-client device only handles the presentation tier, all data are managed and maintained at the server, which makes the system safer [29].


Storage:

As mentioned above, thin-client devices could for example be smart mobile devices. These devices might have very small data storage [25, p.30]. Therefore, the storage in a thin-client server model is handled at the server.

Battery consumption:

As mentioned above, a thin-client device can for example be a smart mobile device, which means that the thin-client device can be limited in power capacity. In a thin-client server model, the server performs the processing as well as the managing of data [30, p.279]. Since the processing is performed at the server, the energy consumption in the thin-client devices decreases [30, p.279]. Since multiple thin-clients can access a shared server, this provides lower energy consumption [30, p.279].

Working memory:

A thin-client device could for example be a smart mobile device [27]. These mobile devices are limited in resources such as memory, processing, battery, CPU, and storage [27]. Due to this, these devices are limited in performing certain computing functions [27].

Number of nodes:

As mentioned above, the processing of data is performed at the server, and one disadvantage of the thin-client server approach is that it might cause a lot of traffic on the network and that the server needs to be able to handle many client requests [25, p.30].

Data processing per data set:

In a thin-client server architecture, the data processing is performed at the server. The server could for example be a high-end workstation [31, p.13] that often runs a standard server operating system [32].

Size of data:

The thin-client device might have problems with fast data transmission due to its limitations [27]. The problem regarding data transfer can for example be seen when Big data (large, complex data sets [33]) is transferred from the thin-client device to the application server [27]. Transferring such a huge amount of data will not only cause a heavy load on the network but also affect the quality of service negatively [27].

Robustness in communication:

When using a thin-client server approach, a good computing experience might be hard to achieve without close to perfect network conditions [34]. Research has been performed in this area, and one example of a technology that has been mentioned is Low Power Wide Area Network (LPWAN) [35]. As the development of information technology increased, several IoT devices started to be used in environments that required a wider range of communication [35], and LPWAN is an example of a technology that can be used for this [35]. Using this technique, devices can communicate at a distance of 15 km, which provides the ability for IoT devices to transmit data over longer distances [35]. For these devices to achieve robust communication, they can estimate how robust the connection is [35]. But since these devices might run into computing problems, due to the lack of computing resources, it can be hard to evaluate statistical information about data transmission in these devices [35].

Cost:

The thin-client server architecture is cost efficient [36], since the thin-client device consists of fewer hardware parts and also has a longer lifetime [36]. Another reason why this architecture is considered cost-efficient is the reduced costs of management and maintenance of the system [36].


3.3.1.3 Thick-client server architecture

When using the thick-client server architecture, the client consists of the presentation tier as well as the application tier [25, p.30]. The server is responsible for the data tier and thereby manages the data [25, p.30]. This means that the data processing is performed at the client device [25, p.31]. If the client properties are known before choosing a system architecture, then the thick-client server approach can suitably be assigned [25, p.31]. In comparison to the thin-client server model, the thick-client server model can be more complex to use, since the client is responsible for both the presentation tier and the application tier, which means that the application in use must be installed and updated on all client devices [25, p.31].

A thick-client device, such as a workstation or a laptop, can in comparison to a thin-client device provide better CPU speed, larger Random Access Memory (RAM), and faster Internet connectivity [37].

3.3.1.4 Factors impact on thick-client server architecture

The factors that are explained in section 3.1 affect the thick-client server architecture in the following ways:

Security:

The thick-client server model raises several security challenges [38]. Since the application tier is located at the client device, the management must be performed at the client [38]. This gives the end-user the ability to affect applications and data, which might affect the thick-client device negatively [38]. Because the data is processed at the thick client, it also causes problems regarding control of the data [38]; the end-user might, for example, download sensitive data to a memory device [38].

Storage:

As mentioned above, the thick client processes the data on the client node and then sends the data for storage to a database. Since a thick-client device could for example be a workstation, it has internal disk storage and local processing capability [30, p.279]. As mentioned above, a thick-client device usually offers better hardware capacity.

Battery Consumption:

Since the thick client handles the presentation and application logic, some processing is performed at the client device, and therefore the power consumption is high when using a thick-client server model [30, p.282].

Working Memory:

Since the thick-client server architecture requires local processing on the client device, the client must be equipped to handle the data processing. As mentioned above, in comparison to a thin-client device, a thick-client device has faster CPU speed and larger RAM.

Number of nodes:

In contrast to the thin-client server model, processing and storing can be done locally on the thick client, which means that the communication with the server will not be as loaded as in the thin-client server approach [25, p.30].

Data processing per data set:

The data processing is performed at the client, and when using thick clients, the processing power is decentralized [39]. By providing decentralized processing power, the chance of negative effects on performance during the data processing decreases [39].


Size of data:

As mentioned above, the thick-client device is equipped with both large storage and good Internet connectivity.

Robustness in communication:

As mentioned above, the thick-client device has faster Internet connectivity than the thin-client device. Thick-client devices can also perform some functionality without being connected to the server [39]. This means that the thick-client server model is not as dependent on robust communication as the thin-client server model.

Cost:

A thick-client server model can be considered costly, since thick clients have high maintenance and upgrading costs [38].

3.3.1.5 Three-tiered client-server architecture

When using the three-tiered architecture, the presentation tier, application tier, and database tier are divided onto three separate infrastructures [25, p.31]. This constitution of devices could for example be that the presentation tier consists of a lightweight device, the second tier is a server that manages the data processing, and the third tier is a database [1, p.53].

Figure 3.2: Three-tiered client-server architecture

Figure 3.2 shows an example of the interaction between the three tiers when a request is sent.

3.3.1.6 Factors impact on three-tier client-server architecture

The factors that are explained in section 3.1 affect the three-tier client-server architecture in the following ways:

Security:

A three-tiered web application architecture is usually considered to be much more secure than two-tier and n-tier applications [40]. The study [40] showed that a three-tiered web application was the most secure and that the three-tiered architecture provided security in several aspects [40]. It could be considered secure both regarding data storage and regarding fetching the data [40].

Storage:

The three-tier architecture consists of a separate storage tier and can be scaled easily without affecting the other tiers [24].

Battery Consumption:

As for the thin-client server model, the battery capacity of the client device is limited. Since the processing of data is performed at the server, the main concern is the battery consumption during the transmission of the data from the client to the server. According to the study [41], when using a multi-tier wireless multimedia sensor network, the multimedia sensors, among other things, transmit a huge amount of data, which causes the system to need high networking capacity, which increases the power consumption [41]. The results from the study [41] showed that a multi-tiered solution was more energy efficient than a single-tier solution.

Working Memory:

The presentation tier can consist of a thin-client device [1, p.53]. As mentioned for the thin-client model, thin-client devices are limited in resources such as memory, processing, battery life, CPU, and storage [27]. Due to this, these devices are limited in performing certain computing functions [27].

Number of nodes:

Using a three-tier architecture improves scalability. By separating the different layers, each can be scaled according to need [24]. If the number of clients or client requests were to increase and cause the performance of the server to be negatively affected, new servers can be added to the system [24].

Data processing per data set:

As figure 3.2 shows, the data is collected at the presentation tier and is then sent to the application server for processing. The application server can be scaled to be able to handle the processing of the data [24].

Size of data:

The client device in a three-tier architecture might be a thin-client device [1, p.53]. As for the thin-client server approach, there might be a problem when sending a huge amount of data over the network to the application server [27]. The data is then managed at the server, and as mentioned above, since the architecture is divided into tiers it is easy to scale [24]. This means that if any of the tiers cannot handle the data size, it can be scaled without affecting the other tiers [24].

Robustness in communication:

As figure 3.2 shows, in a three-tier architecture there has to be communication between the presentation tier and the logic tier, as well as between the logic tier and the data tier. One disadvantage of using this type of architecture is the increased network traffic and latency caused when communication between three infrastructures is needed [1, p.53].

Cost:

Using a three-tier architecture reduces the development costs [42], but it does not reduce the cost of the hardware resources [43]. When managing three infrastructures, there is increasing complexity [1, p.53], which might increase the cost of the system.

3.3.2 Peer-To-Peer

In a peer-to-peer architecture, every peer in the network acts both as a client and as a server [1, p.47]. In a peer-to-peer network, the peers that are part of the network share hardware resources [44, p.6]. Examples of resources are processing power, storage, and network link capacity [44, p.6]. In a peer-to-peer system, the peers in the network can share their resources without requiring the support of a centralized server or authority [44, p.6].


Figure 3.3: Peer-to-peer architecture

Figure 3.3 illustrates the peer-to-peer architecture where the peers share resources without using a centralized server.

The communication between the peers is done using a communication network known as the underlay network [45, p.2]. The communication network that is most often used is the Internet, but other communication networks can also be used [45, p.2].

In a peer-to-peer system, data replication is performed to increase performance, reliability, and availability [46, p.148]. Data replication creates copies of data objects in the network, and these copies are referred to as replicas [46, p.148]. With replication, if one replica is changed or updated, then all replicas in the peer-to-peer network should be automatically updated so that no replica becomes outdated [46, p.148]. Several data-replication techniques can be used, and using these techniques makes it easier to locate certain content in a peer-to-peer network [46, p.148].
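As a toy illustration of this update propagation, the following Python sketch (hypothetical names, not from the thesis or [46]) keeps every replica of an object consistent by pushing each update to all peers that hold a copy:

```python
# Toy sketch of replica propagation in a peer-to-peer network (hypothetical names).
# Each peer holds copies (replicas) of some objects; an update to one replica
# is pushed to every other peer that stores the same object.

class Peer:
    def __init__(self, name):
        self.name = name
        self.replicas = {}          # object_id -> value

    def store(self, object_id, value):
        self.replicas[object_id] = value

class ReplicationGroup:
    def __init__(self, peers):
        self.peers = peers

    def update(self, object_id, value):
        # Propagate the new value to every peer holding a replica,
        # so no replica is left outdated.
        for peer in self.peers:
            if object_id in peer.replicas:
                peer.store(object_id, value)

p1, p2, p3 = Peer("p1"), Peer("p2"), Peer("p3")
for p in (p1, p2):                  # only p1 and p2 hold a replica of "sensor-42"
    p.store("sensor-42", 17)

ReplicationGroup([p1, p2, p3]).update("sensor-42", 21)
print(p1.replicas, p2.replicas, p3.replicas)
```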

In a peer-to-peer network, every peer uses a routing table [45, p.2]. This routing table normally consists of information about the neighboring peers [45, p.2]. For peers to be able to find a resource that is located at other peers in the network, routing protocols are used [45, p.3]. By using the routing protocol, peers can communicate and send messages to other peers across the network [45, p.3].
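A minimal, naive sketch of this idea (hypothetical names; real overlays use more sophisticated routing protocols): each peer only knows its neighbors, and a lookup is forwarded hop by hop until a peer holding the resource is found.

```python
# Naive flooding-style lookup over peers that only know their neighbors
# (hypothetical sketch; real overlays use structured routing protocols).

class Peer:
    def __init__(self, name, resources=()):
        self.name = name
        self.resources = set(resources)
        self.routing_table = []     # neighboring peers

    def lookup(self, resource, visited=None):
        visited = visited or set()
        if self.name in visited:
            return None
        visited.add(self.name)
        if resource in self.resources:
            return self.name
        for neighbor in self.routing_table:   # forward the request hop by hop
            found = neighbor.lookup(resource, visited)
            if found:
                return found
        return None

a, b, c = Peer("a"), Peer("b"), Peer("c", resources={"file.txt"})
a.routing_table = [b]
b.routing_table = [a, c]
print(a.lookup("file.txt"))         # -> "c"
```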

There exist different types of overlay networks that can be used in a peer-to-peer network; the thesis will focus on structured overlay networks, unstructured overlay networks, and hierarchical overlay networks.

3.3.2.1 Structured overlay networks

In a structured overlay network, the peers are organized into a geometric topology, for example a logical ring [47, p.21].

In a structured overlay network, data replication provides fault tolerance [48]. Different data replication strategies can be used in overlays; some examples are successor-list replication, multiple hash functions, and symmetric replication [48]. In a structured overlay that uses a logical ring as topology, successor-list replication is the strategy that has proven to be the most used one [48].

3.3.2.2 Unstructured overlay networks

Unlike the structured overlay, the unstructured overlay network is not organized according to a specific topology [47, p.21]. Instead, the peers in the network are organized randomly [47, p.21] and can connect to any peer in the network [45, p.3-4].

Some examples of data replication techniques used in unstructured peer-to-peer networks are uniform replication and proportional replication [46, p.155]. The uniform replication strategy replicates all objects to all peers in the network [46, p.155]. The proportional replication strategy replicates data depending on how often the data is requested [46, p.155].

3.3.2.3 Hierarchical overlay networks

When using a hierarchical overlay network, more powerful peers are used, known as super-peers [49]. All the peers in the network are assigned to one or more super-peers; the super-peers manage the assigned peers and also manage the routing for all requests in the network [49].

Data replication in a super-peer network can be done by using a replication factor [46, p.158]. With this approach the data is replicated in the network according to the replication factor [46, p.158].

3.3.2.4 Factors impact on peer-to-peer architecture

The factors that are explained above affect the peer-to-peer architecture in the following ways:

Security:

One challenge in peer-to-peer systems is security [50]. In some peer-to-peer systems, any peer can join the network; due to this lack of control, a peer can contact a malicious peer, which might cause the system to be negatively affected [50]. One example of a negative effect would be if the malicious peer were to misroute messages in the network [50].

Storage:

Since a peer-to-peer architecture is decentralized, it can provide distributed storage [51]. A peer-to-peer system is also easy to scale, and storage is one of the properties that peer-to-peer networks have proven to perform well in [51].

Battery Consumption:

Peer-to-peer applications need peers to be awake and available to be able to route messages [52]. The devices that constitute a peer-to-peer network are limited in battery capacity, and if a peer becomes unavailable, the routing of messages could be affected [52]. The energy consumption in a peer-to-peer network is distributed along the network [53]. Because of this, it can be hard to implement a peer-to-peer model that is energy efficient [53]. When designing a peer-to-peer model that is supposed to be energy efficient, other properties get affected as well [53]. For example, the quality of service can be negatively affected when providing an energy-efficient system [53]. Providing both an energy-efficient system and a good quality of service is hard in this case [53].

Working memory:

Peer-to-peer systems may consist of all kinds of hardware components; one node can be a powerful computer while another can be a lightweight device [54, p.2]. Due to this, the working memory of a node in a peer-to-peer system may vary [54, p.2]. As mentioned above, in a peer-to-peer network some hardware resources are distributed across the network, and one example of such a resource is processing power [55, p.10].

Number of nodes:

Peer-to-peer systems are highly dynamic, and peers should be able to join and leave the network whenever they want [56, p.227]. In a structured overlay network, this is managed according to a specific control mechanism [56, p.227].

Since a peer-to-peer network may be very large and consist of many nodes, one important property of a peer-to-peer network is that it needs to be scalable [57, p.119]. Peers in an unstructured overlay should be able to join or leave the network without affecting the system negatively [57, p.119].

The paper [58] investigates the question of how many clients a super-peer in a super-peer network should take on to maximize efficiency [58]. The study concluded that a super-peer should not reject any peer that wants to join its network [58]. According to the study, if the super-peer's network becomes too large, the super-peer should choose a peer from its network to act as a "redundant super-peer" [58]. Another solution would be to split the peer-to-peer network, where the chosen peer becomes the super-peer of half of the network [58].
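
The sketch below illustrates, under stated assumptions, the policy described above: a join is never rejected, and once the cluster grows past a threshold, one peer is promoted and given half of the cluster. The threshold value and the way peers are chosen are hypothetical and not taken from [58].

# Sketch of the super-peer growth policy: accept every join, split when too large.
MAX_PEERS = 4   # illustrative threshold, not a value from the cited study

def handle_join(cluster: list[str], new_peer: str) -> tuple[list[str], list[str]]:
    """Return (peers kept by the current super-peer, peers handed to a promoted super-peer)."""
    cluster = cluster + [new_peer]          # a joining peer is never rejected
    if len(cluster) <= MAX_PEERS:
        return cluster, []
    half = len(cluster) // 2
    promoted = cluster[0]                   # one peer is chosen to act as a new super-peer
    return cluster[half:], [promoted] + cluster[1:half]

kept, split_off = handle_join(["p1", "p2", "p3", "p4"], "p5")
print(kept)       # ['p3', 'p4', 'p5'] stay with the original super-peer
print(split_off)  # ['p1', 'p2'] where 'p1' now acts as the new super-peer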

Data processing per data set:

One important property of a peer-to-peer system is that it uses distributed computing [51]. If a request needs further processing, peers in the network can use so-called idle cycles to perform the extra processing that is needed [51]. One example of this can be seen in the article [59], where a framework is designed in which the peers of the network can share their hardware resources to perform tasks that require extra computation [59].

Size of data:

Peer-to-peer systems are a popular way to share huge volumes of data; one example is BitTorrent [60], which is a content distribution system [61]. By using BitTorrent, peers can communicate with other peers to share large files [61]. This allows several subscribers that are interested in a file to access it easily [61].
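
The sketch below illustrates the general piece-based idea behind such content distribution: a large payload is split into fixed-size pieces with per-piece hashes, so that different peers can serve, and a downloader can verify, individual pieces. The piece size and hash choice are assumptions made for illustration and do not reproduce the actual BitTorrent protocol.

# Sketch: splitting a large payload into fixed-size, verifiable pieces.
import hashlib

PIECE_SIZE = 256 * 1024   # 256 KiB pieces, an assumption for the example

def split_into_pieces(data: bytes):
    """Yield (index, sha1-of-piece, piece); the hashes let a downloader verify each piece."""
    for offset in range(0, len(data), PIECE_SIZE):
        piece = data[offset:offset + PIECE_SIZE]
        yield offset // PIECE_SIZE, hashlib.sha1(piece).hexdigest(), piece

payload = b"x" * (600 * 1024)                 # a 600 KiB dummy file
pieces = list(split_into_pieces(payload))
print(len(pieces), [p[0] for p in pieces])    # 3 pieces: indices 0, 1, 2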

Robustness in communication:

As mentioned above, peer-to-peer networks are often deployed on the Internet to be able to communicate with other peers in the network. In a peer-to-peer system, the information is replicated to several nodes. This means that if one node fails, the entire peer-to-peer system does not fail, since the replicated data is located at other peers as well.

Cost:

When using a peer-to-peer system, there is a lower cost of ownership as well as of maintenance [54, p.18].

3.3.3 Cloud computing

Cloud computing has become an important architecture for distributed computing [62]. In recent years, cloud computing has become a suitable solution for many companies regarding the management of huge amounts of data [62].

When more servers need to be added in a client-server model, cloud computing is a suitable solution since it can provide this easily [28]. There has also been research on how to combine cloud computing and peer-to-peer systems [63]; by combining these architectures, a peer-to-peer cloud can be created [63].

When using cloud computing, computing services such as servers, storage, and databases can be accessed over the Internet [64]. This also provides the ability to easily add and remove resources depending on need [64]. The customers of cloud computing, which can be known as cloud customers, can for example be thin- and thick-clients [65, p.55-57].

For cloud computing to be able to provide computing and storage services, virtualization is used [66, p.308]. According to [66], virtualization refers to "the abstraction of computing resources (CPU, storage, network, memory, application stack, and database) from applications and end users consuming the service" [66, p.308]. The virtualization technique in cloud computing allows one physical machine to function as several virtual machines [67].

The cloud computing architecture can, according to the National Institute of Standards and Technology (NIST) [68], be described by the following properties:

On-demand self-service: A cloud consumer should be able to use the computing services, for example storage, whenever it wants [69].

Broad network access: A cloud consumer should be able to easily use the resources provided by the cloud over the network from several locations [65, p.2].

Resource pooling: Multitenancy solutions refer to one resource serving several users [70]. By using multitenancy solutions, cloud computing resources can be shared [71].

Rapid elasticity: It should be possible to adjust the cloud services according to user requests [71].

Measured service: The cloud services should be able to be measured in order to control the cloud resources [69].

3.3.3.1 Service models in cloud computing

To be able to provide cloud services to customers with different needs, different types of services can be used [72]. There are three main service models: Software as a Service, Infrastructure as a Service, and Platform as a Service [72].

Software as a Service: With Software as a Service, also known as SaaS, the cloud consumer can access the cloud applications by using several different client devices [73, p.17]. The cloud consumer does not control the cloud architecture [73, p.17].

Infrastructure as a Service: With Infrastructure as a Service, also known as IaaS, the cloud consumer is provided with computing resources, such as processing and storage [73, p.17]. The consumer is in control of these services but does not, as for SaaS, control the cloud architecture [73, p.17].

Platform as a Service: Platform as a Service, also known as PaaS, provides the cloud consumer with a complete environment for development in the cloud [74]. By using this, the cloud consumer can deploy a variety of applications onto the cloud [74]. As for IaaS and SaaS, the cloud consumer does not have any control over the cloud architecture [73, p.17].


3.3.3.2 Deployment models

The deployment model of cloud computing refers to the type of cloud. Examples of deployment models are:

Private cloud: The cloud computing resources are managed and used by one organization [73, p.17].

Public cloud: In contrast to the private cloud, the cloud resources are available to the public [73, p.18].

Community cloud: The cloud computing services are shared among a limited group of organizations [73, p.17].

Hybrid cloud: A hybrid cloud combines two or more of the cloud deployment models (private, community, or public) [73, p.18].

3.3.3.3 Factors impact on the cloud computing architecture

The factors that are explained in section 3.1 affect the cloud computing architecture in the following ways:

Security:

In cloud computing, security is a major challenge [65, p.34]. Using the cloud computing architecture raises a lot of security issues [65, p.34]. When applications are provided in the cloud, they become vulnerable to attackers [65, p.34]. One example is that if the input to a cloud program is not controlled and checked by a security mechanism, sensitive information from data centers can be accessed [65, p.34].

Storage:

When using cloud computing, cloud storage is provided [75]. The cloud storage is managed by a cloud provider, which provides the data storage over the Internet [75]. By using cloud storage, the user can scale the storage as the amount of data increases or decreases [76].

Battery consumption:

When using cloud computing, the possibility of processing the data in the cloud, as well as the possibility of moving applications to the cloud, makes it possible for the client to use devices that are less power-consuming [77]. Another power consumption benefit is that the data centers are often shared by several users, which provides an energy benefit [77].

Working memory:

Some client devices that are used in cloud computing, for example smartphones and sensors, are limited in storage, computation power, capacity, battery life, and memory [78]. Since cloud computing can process a large amount of data, these devices can choose to perform the data processing in the cloud instead of on the physical device [78].

Number of nodes:

One benefit of using cloud computing is the ability to scale resources depending on the client request [79].
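
As a rough illustration of this elasticity, the sketch below computes how many cloud instances would be needed for a given request load. The per-instance capacity and the loads used in the example are hypothetical values, not figures from the cited sources.

# Sketch: scaling the number of instances to the current request load.
import math

REQUESTS_PER_INSTANCE = 100    # assumed capacity of one instance (requests per second)

def desired_instances(current_requests_per_s: float, minimum: int = 1) -> int:
    # scale out when the load grows, but never below a minimum number of instances
    return max(minimum, math.ceil(current_requests_per_s / REQUESTS_PER_INSTANCE))

print(desired_instances(20))    # 1 instance is enough for a low load
print(desired_instances(450))   # 5 instances are needed for a higher load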

Data processing per data set:

When using cloud computing, the data processing can be moved to the cloud, and cloud computing has become a good approach when it comes to processing large amounts of data [80].

Size of data:

The client device in a cloud computing architecture might be a thin-client device. As for the thin-client server approach, there might be a problem when sending a huge amount of data over the network to the application server or the cloud [27]. The data is then managed in the cloud. Since one of the properties of cloud computing is rapid elasticity, the cloud services, such as storage, can be scaled according to need [71].

Robustness in communication:

When using cloud computing, communication with the cloud is essential for the users to be able to use the cloud services [65, p.55]. There has been significant development regarding personal computer hardware, which provides a more robust connection [65, p.53].

Cost:

The cost of using cloud computing follows a "pay-as-you-go" model [81]. This means that you pay for the resources that you have used [81]. Using cloud services replaces the need for locally maintained servers, databases, and other resources, which decreases the cost [82].
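
A small worked example of the pay-as-you-go idea is given below, where the monthly bill is simply the usage multiplied by unit prices. All prices are hypothetical illustration values and not any provider's real rates.

# Worked "pay-as-you-go" example: you only pay for the resources actually used.
COMPUTE_PRICE_PER_HOUR = 0.05       # assumed price per server hour
STORAGE_PRICE_PER_GB_MONTH = 0.02   # assumed price per stored GB and month

def monthly_cost(server_hours: float, stored_gb: float) -> float:
    return server_hours * COMPUTE_PRICE_PER_HOUR + stored_gb * STORAGE_PRICE_PER_GB_MONTH

# e.g. one server running 200 h and 50 GB stored during the month:
print(f"{monthly_cost(200, 50):.2f}")   # 10.00 + 1.00 = 11.00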

3.3.4 Edge computing systems

Today, several IoT computing environments collect a huge amount of data, and sending all this data to the cloud might not be efficient due to both security problems and network problems [83]. Therefore, there is a need for a distributed computing architecture that is located closer to the connected devices [83]. One such computing architecture is edge computing, which moves the storage, computing, networking, and data management to "the edge" of the network [83].

As mentioned in the cloud computing section, a distributed system may combine architectures. One example is when edge computing is combined with the client-server architecture, which forms "edge-server systems" [2, p.90]. When combining these architectures, the servers are located "at the edge" [2, p.90].

Edge computing can also be combined with the peer-to-peer architecture. This can be seen in the article [84], where these architectures are combined to be able to distribute computations to any participating node in the system [84].

There are different approaches to constructing an edge computing architecture; one general architecture divides the system into three main layers: the terminal layer, the edge layer, and the cloud computing layer [85]. This architecture can be seen in figure 3.4, and a small sketch of the corresponding data flow follows the layer descriptions below.

Figure 3.4: A typical architecture of edge computing networks

Terminal Layer: This is the layer where the client devices are located. These devices can for example be sensors, cameras, etc [85]. The terminal layer is responsible for collecting the data and sending it to the boundary layer for processing and managing the data [85].

Boundary Layer: This layer can also be referred to as the edge layer [85]. It receives data from the terminal layer and processes it [85]. When the data has been processed, the processed data is sent to the cloud layer for management [85]. The physical devices located at this layer can for example be routers, switches, gateways, etc [85].

Cloud Layer: This layer provides cloud computing [85]. This means that it provides resources such as data processing as well as storage [85]. If the edge layer is not able to completely perform the data processing that is needed, the processing can be completed in the cloud layer, since it has higher computing capacity [85].
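
The minimal sketch below illustrates this layered flow under stated assumptions: the terminal layer collects raw readings, the edge (boundary) layer reduces them, and the cloud layer stores the aggregated result. All function names, the sample values, and the averaging step are assumptions made for illustration.

# Minimal sketch of the terminal -> edge -> cloud data flow described above.
from statistics import mean

def terminal_collect() -> list[float]:
    # terminal layer: raw samples from a sensor (hypothetical temperature readings)
    return [21.2, 21.4, 21.1, 21.5]

def edge_process(samples: list[float]) -> dict:
    # edge layer: reduce the raw data before it is sent upstream
    return {"avg": round(mean(samples), 2), "count": len(samples)}

cloud_storage: list[dict] = []

def cloud_store(summary: dict) -> None:
    # cloud layer: keep the aggregated result for further analysis
    cloud_storage.append(summary)

cloud_store(edge_process(terminal_collect()))
print(cloud_storage)   # [{'avg': 21.3, 'count': 4}]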

3.3.4.1 Fog computing

Another computing paradigm that brings the computation closer to the devices is fog computing [83]. As with edge computing, fog computing also performs computations closer to the client devices [83]. Fog computing differs from edge computing in that it can perform its services anywhere along the path between the cloud and the devices, whereas edge computing usually performs the computing services at the edge of the network [83].

3.3.4.2 Factors impact on edge computing architecture

The factors that are explained in section 3.1 affect the edge computing architecture in the following ways:

Security:

By using an edge computing architecture, it is easier to provide good security and privacy [86, p.11]. According to [86, p.13], edge security includes device security, network security, data security, and application security [86, p.13].

Storage:

The edge devices can for example be devices such as cameras, sensors, and drones, which means that they have limited storage capacity [87]. The local persistent storage in an edge device can be about 1 gigabyte (GB) [87]. The data can therefore be stored at the edge server or in the edge cloud [87]. The edge server can provide hundreds of gigabytes of local storage, and the edge cloud can provide further storage if needed [87].

Battery consumption:

In edge computing, the energy consumption of the edge devices is reduced since the data processing is performed at the edge, which reduces the use of network bandwidth and thereby decreases the power consumption [85].

Working memory:

As mentioned above, the edge computing architecture can be divided into three main layers. The terminal layer consists of devices such as mobile terminals and IoT devices. To be able to minimize the terminal service delay, the devices in the terminal layer are limited in computing power [85]. These devices therefore collect all kinds of data and upload it to the edge layer, where it is processed and managed.

Number of nodes:

According to [88], edge nodes can be easily added, replaced, or upgraded by using an appropriate edge computing framework [88].

Data processing per data set:

By processing the data closer to where it is collected, fast data processing is provided [85]. Edge computing thereby provides faster data processing than cloud computing [85].
