Evaluation of a synchronous leader-based group membership protocol


Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer Science

Bachelor thesis, 16 ECTS | Information Technology

Spring 2017 | LIU-IDA/LITH-EX-G--17/084--SE

Evaluation of a synchronous leader-based group membership protocol

Utvärdering av ett synkront ledarbaserat protokoll för gruppmedlemskap

Anton Tengroth

Chi Vong

Supervisor: Mikael Asplund

Examiner: Nahid Shahmehri



Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Anton Tengroth, Chi Vong


Students in the 5-year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates in a demonstration of a working product and a written report documenting the results of the practical development process, including requirements elicitation. During the final stage of the semester, students form small groups and specialise in one topic, resulting in a bachelor thesis. The current report represents the results obtained during this specialisation work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.


Abstract

A group membership protocol is a mechanism that handles mobile nodes in a dynamic environment and maintains these nodes as members of a group. Such nodes can, for instance, be the growing number of connected devices, which makes the set of devices in a distributed system more dynamic. In this thesis, a synchronous leader-based group membership protocol (SLMP) is evaluated. We ran simulations in which SLMP handles nodes joining and crashing at different frequencies in a noisy environment, while varying the length of the timeout, the frequency of nodes joining and crashing, and the packet loss rate. The simulations show that all these parameters affect the performance of the protocol in different ways. When nodes join and crash at a high frequency it is wise to have a short timeout, but if the packet loss rate is also high, the performance of the protocol will decrease. However, even if the packet loss rate is high, the protocol can still deliver a good service, provided that the timeout is long enough and the rate at which nodes join and crash is not too high.


Acknowledgments

We would like to thank our supervisor, Mikael Asplund, for all the help and support we received. We would also like to thank our classmates for their valuable feedback.


List of Figures

2.1 System architecture

2.2 Illustration of views and leader relations

2.3 Synchronous rounds of SLMP

4.1 Simulation result of varying timeout

4.2 Simulation result of varying churn rate

4.3 Simulation result of varying packet loss rate


List of Tables

3.1 Parameter values used in the simulations

3.2 Default parameter values


1 Introduction

Many new opportunities have opened up as technology has evolved and allowed more devices, for instance intelligent vehicles, to connect to each other. Intelligent vehicles can potentially coordinate themselves to create a safer traffic environment and collaborate to minimize resource consumption. This demands a protocol that effectively handles these mobile vehicles (which can be perceived as nodes) and provides them with membership in a group. The goal of such a protocol is to provide an accurate status of the group and let the nodes be aware of their surroundings and connect to other nodes by exchanging messages. The field of group membership has been extensively studied and there exist many group membership algorithms [1,7]. However, it is still challenging to handle a dynamic group on an unreliable communication channel where nodes can join, crash or leave at any time.

In an application with a group membership protocol implemented, the host broadcasts the status of the group in the form of a message called a view, and all active nodes of the group have to acknowledge the view back to the host [4,7]. Group membership protocols are very useful in collaborative applications where users need to keep track of the members of a group. In collaborative applications, events must be distributed to a number of users to preserve a common view (e.g. intelligent vehicles and multiplayer games) [5]. In these applications, users may come and go, which means that the system is dynamic. In this thesis, we provide a brief overview of a protocol called Synchronous Leader-based Membership Protocol (SLMP) [2] in Chapter 2 and evaluate the protocol in Chapter 4.

1.1 Problem definition

Our aim is to do a performance evaluation of SLMP and analyze the effect of different parameters on the protocol. Being aware that the protocol performs differently depending on the needs of the application, we evaluate the protocol in a scenario with intelligent vehicles that have to coordinate with each other to achieve given goals, for instance to drive on a straight road or to cross an intersection without having an accident. The task of SLMP is to provide an accurate status of the group of vehicles and let them coordinate with each other by exchanging messages while handling environmental obstacles in an appropriate way. The evaluation will be done in simulations. This will not give us the true performance of the protocol, but it gives a sufficiently good idea of the performance at a low cost in resources, which is appropriate given that SLMP is still in its early stages. In the simulations we will vary the following parameters:

• churn rate is the rate of nodes joining and crashing,

• packet loss rate is the rate for messages to get lost due to the unreliable communication channel and

• timeout is the time limit for the failure detector to detect nodes that crashed.

It is valuable to evaluate how SLMP performs when these parameters vary, since group membership protocols have to deal with the same parameters in the real world. A real-world scenario can be a highly populated city with many intelligent vehicles. The rate at which these vehicles appear and disappear is the churn rate. The vehicles will keep trying to stay connected with other vehicles or devices while moving around. As the available radio spectrum is limited, the vehicles have to share the same frequencies, which causes noise. The noise in turn causes packet loss. A high packet loss rate increases the risk that the protocol assumes a vehicle has crashed even though it has not [3]. To get the best performance, the value of the timeout needs to depend on the churn rate and the packet loss rate. The question is whether there exists a connection between these parameters.

1.2 Research questions

We address the following research questions:

• How do the churn rate and the packet loss rate affect the performance and availability of SLMP?

• Are there any optimal values for the timeout depending on the churn rate and the packet loss rate?

1.3 Approach

Our approach to answering the questions in Section 1.2 is to systematically simulate the protocol with preselected parameter values. These values have been chosen, with the scenario in mind and to the best of our knowledge, in order to see what effects the parameters have on the performance of SLMP. The metrics used in the evaluation and a more detailed description of our approach are given in Chapter 3.

1.4 Delimitations

We will only evaluate the protocol in simulations. Due to the limited time, we limit the number of parameters and the values tested for these parameters. We only consider the intelligent vehicle scenario described in Section 1.1.


2 Background

This chapter gives the reader the information necessary to understand the protocol and the performance evaluation of it. We briefly describe the system architecture, distributed systems, and how group membership and SLMP work. Related work on performance evaluation of group membership algorithms is also introduced here.

2.1 System architecture

The system architecture of applications that implement SLMP is shown in Figure 2.1. In an unreliable communication channel, packets may disappear along the route. The purpose of the group membership protocol is to provide membership to a group of nodes. The membership protocol reports any changes in the group to the application. The application then acts and notifies the membership protocol. Failure detection is performed within the group membership protocol.


2.2 Synchronous distributed systems

A distributed system is a system where hardware or software components located at networked computers coordinate and communicate by passing messages to each other. The components interact with each other in order to achieve a common goal defined by the application. There are two major categories of distributed systems: synchronous and asynchronous. In synchronous distributed systems, each node has a clock that is synchronized with all other nodes in the system. These synchronized clocks make it possible to establish a real-time order of events among all the nodes. Synchronized clocks are crucial for synchronous systems since events must occur in a defined order; for example, messages can never be delivered in a random order. To maintain this order, synchronous systems have an upper bound on the message transmission delay from one node to another. Because of this, synchronous distributed systems make strong assumptions about time. Although synchronous distributed systems can be built, it is difficult to meet these assumptions in the real world. The other category, asynchronous distributed systems, neither imposes an order of events nor makes strong timing assumptions. Messages can be delayed for arbitrary periods of time and clocks can be out of sync, which makes asynchronous systems more suitable for real-world scenarios [5].

2.3 Group membership management

The goal of group membership management is to reach agreement in a dynamic group of nodes. Group membership protocols manage a number of nodes as a group in a list called a view. A node is considered a group member if its ID is stored in the view. According to Chockler et al. [4], a membership service is a vital part of a view-oriented group communication system. The task of the membership service is to maintain the status of a dynamic environment and deliver a correct view of that status. A view is correct if it contains all nodes in the environment at a specific time, and a service that achieves this is accurate. Views that contain nodes that no longer exist in the environment, or that do not contain nodes that exist in the environment, affect the accuracy of the service negatively. Whenever the view changes, the service reports the changes to the members by installing a new view. The membership service strives to install the same view for all members of the group. Group membership needs three operations: join, leave and exclude. A node may join and leave as it wishes. A node may also be excluded from the group if it crashes or is perceived to have crashed, e.g. due to packet loss [4].
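The three operations and the notion of a correct view can be illustrated with a minimal Python sketch. The class name `View` and the function `is_correct` are hypothetical and not taken from any SLMP implementation; node IDs are assumed to be integers, and a view is treated as correct when it contains exactly the nodes present in the environment.

```python
class View:
    """A view: the set of node IDs currently regarded as group members."""

    def __init__(self, members=()):
        self.members = set(members)

    def join(self, node_id):
        # A node is a member once its ID is stored in the view.
        self.members.add(node_id)

    def leave(self, node_id):
        # Voluntary departure of a node.
        self.members.discard(node_id)

    def exclude(self, node_id):
        # Removal by the service after a crash or a suspected crash.
        self.members.discard(node_id)


def is_correct(view, alive_nodes):
    """True if the view holds exactly the nodes present in the environment."""
    return view.members == set(alive_nodes)


view = View([1, 2, 3])
view.exclude(3)                       # node 3 is wrongly suspected
print(is_correct(view, [1, 2, 3]))    # False: a live node is missing
view.join(3)
print(is_correct(view, [1, 2, 3]))    # True
```

In this sketch `leave` and `exclude` have the same effect on the view; they differ only in who initiates the removal.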

2.4 Overview of the SLMP

In this section, we briefly introduce the main principles of SLMP. A more detailed description of the protocol can be found in "Specification, Implementation and Verification of Dynamic Group Membership for Vehicle Coordination" [2]. SLMP is a synchronous leader-based membership protocol, which means that one node of the group is selected as leader and the rest of the nodes are followers. The leader continuously broadcasts messages such as the view to confirm the membership of the group. By sending messages to all the members of the group, the leader can establish who is a member of the group. When a follower gets the view message, it processes it and sends back a claim response to the leader to claim its position in the group. The leader has a view which consists of all the members of the group at any given time. In the protocol, a view consists of at least one node and exactly one leader. See Figure 2.2 for an illustration of views and the joining process. An arrow in the figure from node i to node j indicates that the leader of node i is node j.


Fig. 2.2: Illustration of views and leader relations. Legend: Leading; Joining/Following/Waiting; view.

The protocol allows for the following changes: nodes may fail permanently, new nodes may appear, messages may be lost, the leader may remove followers presumed crashed, and followers may leave the existing group if the leader is assumed to have crashed. SLMP uses a round-based model [8] in which nodes send messages to other nodes during each round and make decisions based on those messages.

2.5 SLMP phases

Each round can be divided into three recurring phases for every node, as shown in Figure 2.3. These phases are send, act and wait. Before the send phase can start and nodes proceed to broadcast messages over the communication channel, all nodes in the system have to be synchronized. Whenever a node in the system receives a message, it calls an event to act upon the received message. After a predefined time interval, the nodes proceed from the send phase to the act phase. In this phase, the nodes change their internal state variables and then enter the wait phase until a new send phase is initiated. The whole process takes one round duration T.

Fig. 2.3: Synchronous rounds of SLMP
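The three phases can be sketched as a function that is called once per round. This is a minimal illustration, not the SLMP implementation: the `Node` class, its methods and the choice to broadcast only the node ID are assumptions made for the example, and packet loss and timing are ignored.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.inbox = []          # messages received during the send phase
        self.heard_from = set()  # internal state updated in the act phase

    def message(self):
        # What the node broadcasts; here simply its own ID (an assumption).
        return self.node_id

    def on_receive(self, msg):
        # Event triggered for every message that arrives.
        self.inbox.append(msg)

    def act(self):
        # Update internal state from the messages collected this round.
        self.heard_from = set(self.inbox)
        self.inbox.clear()


def run_round(nodes):
    # Send phase: every node broadcasts to all other nodes.
    for sender in nodes:
        for receiver in nodes:
            if receiver is not sender:
                receiver.on_receive(sender.message())
    # Act phase: nodes change their internal state variables.
    for node in nodes:
        node.act()
    # Wait phase: idle until the next send phase (nothing to do here).


nodes = [Node(i) for i in range(3)]
run_round(nodes)
print(sorted(nodes[0].heard_from))  # [1, 2]
```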

The send phase mainly consists of two events, the sending and receiving of messages. The message sent by a node is determined by the state of the node. No messages are sent if the node is in the Waiting state. If the node is in the Leading state, it sends the current view together with another view, nextView, that will be valid in the next round. Nodes in either of the other two states send out their view and state.


The act phase is based on the four states a node can have during its runtime: Leading, Following, Joining or Waiting. All nodes begin in the Leading state and a node changes its state depending on the process. Suppose, for example, that a group of nodes already exists and a new node appears. The new node will soon receive the view of the existing group and send a request to join the group if the leader of that group is a better leader. The definition of a better leader is based on conditions set by the application. The state of the new node then changes to Joining and later to Following when the node is accepted into the group. Every node has a timeout variable; leaders keep track of inactive followers while followers keep track of their leader. Whenever a node in the Following state has not received any view from the leader within the timeout, the node changes back to the Leading state. Likewise, when the leader of a group has not heard from a node within the timeout, the leader excludes that node from the group. The desired state of the protocol is when all the nodes are members of the group and follow one leader [2].
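The timeout handling described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the protocol specification in [2]: the names `State`, `Follower` and `exclude_silent` are hypothetical, the timeout is counted in rounds, and a node is assumed to be timed out once it has been silent for `timeout` consecutive rounds.

```python
from enum import Enum


class State(Enum):
    LEADING = "Leading"
    FOLLOWING = "Following"
    JOINING = "Joining"
    WAITING = "Waiting"


class Follower:
    """Tracks how many rounds have passed since the leader was last heard."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.state = State.FOLLOWING
        self.silent_rounds = 0

    def act(self, view_received):
        if view_received:
            self.silent_rounds = 0
        else:
            self.silent_rounds += 1
            if self.silent_rounds >= self.timeout:
                # Leader presumed crashed: go back to the Leading state.
                self.state = State.LEADING
        return self.state


def exclude_silent(view, silent_rounds, timeout):
    """Leader side: keep only followers silent for fewer than `timeout` rounds."""
    return {n for n in view if silent_rounds.get(n, 0) < timeout}


follower = Follower(timeout=2)
print(follower.act(view_received=False))  # State.FOLLOWING
print(follower.act(view_received=False))  # State.LEADING
```

With a short timeout, a few lost views in a row are enough to trigger the fallback, which is the trade-off between fast crash detection and false suspicion that the evaluation studies.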

2.6 Related works

The group membership problem has been widely studied and implemented. One of the first works that studied and evaluated group membership protocols is due to Cristian [6]. He evaluated three synchronous group membership protocols and tested how well they performed depending on the rate at which nodes failed and joined the system. To see how well the protocols performed, he evaluated how fast each protocol detected failed and joined nodes, and the message overhead of the protocol. There are some major differences between the work of Cristian and ours. One is that Cristian evaluated other group membership protocols than we do. Another is that Cristian did an analytic evaluation without any simulations, whereas we use a simulation environment to study the performance of the protocol. Another evaluation of a round-based group membership protocol is "A Reliability Evaluation of a Group Membership Protocol" [9]. In that evaluation, the protocol undergoes experiments that include different fault models, both permanent and transient. Examples of faults are nodes crashing, nodes failing to receive messages, and communication error bursts. These fault models are applied together with different parameters during the experiments and evaluated through reliability models. Even though the protocol and the executed experiments are much more complex than our work, they give us insight into how a group membership protocol can be evaluated and which metrics and parameters are interesting to look at when performing the evaluation.

One algorithm that is very similar to SLMP is VIRTUS [7], which is built on top of the group membership principles and is also a synchronous round-based protocol. The major difference is that VIRTUS guarantees virtual synchrony: whenever a member fails to implement a view, all other members discard the view, which ensures that all members get the same information or none at all. The approach to evaluating VIRTUS also differs from ours. Its authors implemented VIRTUS on hardware and did real-world experiments, while we test SLMP in a simulation environment.

There is also research that studies view-oriented group communication in a broader spectrum. An example is the work of Chockler et al. [4], who surveyed over 30 group communication systems. The survey discussed different applications of group communication systems and the usefulness of these applications. The work presented a framework for all group communication systems, which is a good basis for analyzing the strengths and weaknesses of different properties within a group communication system.


3 Methodology

In this chapter, the method and the settings used to evaluate SLMP are described. The chosen metrics and the way the data collection was performed are discussed. In particular, we look at three parameters: the packet loss rate, the timeout value and the churn rate.

3.1 Evaluation metrics

We consider the following metrics to evaluate the SLMP:

• The ratio of correct views. The main task of a group membership service is to maintain a correct view that reflects the nodes in the environment. The ratio of correct views describes how large a share of the delivered views were correct.

• Join processing time & occurrences. How long it takes for a node to join an existing group, counted from the time the node sent its first join request to the leader. The faster the protocol includes a new node in the group, the faster the node can get updated information from the group, which is something to strive for. We also look at how often nodes send join requests during a simulation.

To calculate the ratio of correct views, we divide the number of views that contain all the nodes in the environment by the total number of rounds during the simulation. The total number of rounds equals the total number of views produced, since one view is produced each round.
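The calculation can be written as a short function. This is a sketch, not code from the simulation environment: the function name and the representation of views and environment snapshots as sets of node IDs are assumptions, and a view is counted as correct only when it matches the environment exactly (Section 2.3 notes that views containing crashed nodes also reduce accuracy).

```python
def ratio_of_correct_views(views, environments):
    """Fraction of rounds in which the view equals the set of nodes present.

    views[i] is the view produced in round i and environments[i] is the
    set of nodes actually present in round i.
    """
    if len(views) != len(environments):
        raise ValueError("expected one view and one snapshot per round")
    if not views:
        return 0.0
    correct = sum(1 for view, env in zip(views, environments)
                  if set(view) == set(env))
    return correct / len(views)


# Three rounds; the view misses node 3 in the second round,
# so 2 of 3 views are correct.
views = [{1, 2}, {1, 2}, {1}]
environments = [{1, 2}, {1, 2, 3}, {1}]
ratio = ratio_of_correct_views(views, environments)
```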

The joining process is started when a node sends out its first request to join a group and its leader. The process ends when the node gets accepted in the group and changes its state to Following.


3.2 Data collection and settings

To get data that reflect the performance of the protocol well, many simulations are needed to achieve consistent results, because one simulation may give results that differ from another. We decided to run 100 simulations, where we assume each round is 1 second long and each simulation covers 1 hour, which is 3600 rounds. The results in Chapter 4 are averages over 100 simulations.

As we cannot run with all possible parameter values, they need to be limited to some range. In the scenario described in Section 1.1, we have a group of intelligent vehicles that need to coordinate with each other in order to achieve given goals. The values of the parameters were chosen with that scenario in mind and also with the interest of evaluating SLMP under extreme conditions such as a high packet loss rate and a high churn rate. The table below presents the parameter values that were used in the simulations.

Table 3.1: Parameter values used in the simulations

Parameter | Values (units)

Round duration | 1 (seconds)

Timeout (TO) | 1, 2, ..., 30 (rounds)

Packet loss rate (PLR) | 0, 10, ..., 100 (percent)

Churn rate (CHR) | 0, 0.5, ..., 5 (percent)

Round duration

In real-time applications, it is desirable to have as short a round time as possible. However, it is a waste of energy and resources if the round is too short. To simplify the evaluation, we considered 1 second to be a reasonable duration of one round. It can be adjusted depending on the needs of the application.

Timeout values

The timeout decides for how many rounds a node can be silent before it gets removed from a view. It is therefore important to study how different timeout values affect the performance of SLMP. If the timeout is too long, the protocol will be inefficient as its view will contain crashed nodes. This will be reflected in the ratio of correct views. If the timeout is too short, the failure detector will assume that functional nodes have crashed due to packet loss. The range listed in Table 3.1 is tested during the evaluation.

Packet loss rate

The packet loss rate strongly affects the performance. When consecutive packets from the same node are lost, SLMP removes the node from future views although the node has not crashed. This affects the ratio of correct views. To see how big an impact this has on the performance of SLMP, we vary the rate from no packet loss at all to total packet loss, for completeness.

Churn rate

To see how the protocol handles churn, nodes are able to appear and crash. We want a steady rate of nodes appearing and crashing, since otherwise the number of nodes would either grow without bound or decrease until no nodes are left. For this reason, we chose to have the same rate for nodes appearing and crashing during the simulations. Our goal was to have an average of 5 nodes in the environment during each simulation. With a churn rate of 2%, there is each round a 2 percent chance for a new node to join the environment and also a 2 percent chance for a node to crash and leave the environment. If one round is seen as one second and we have an average of 5 nodes, this means that each minute an average of 1.2 nodes crash and 1.2 nodes join the environment when the churn rate is 2%.
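The figure of 1.2 nodes per minute follows from a short calculation, assuming that the churn rate is the per-round probability that one node crashes (and, separately, that one node joins) and that the round duration is T = 1 s. The symbol p_churn is introduced here only for illustration.

```latex
\[
  E[\text{crashes per minute}]
  = p_{\text{churn}} \cdot \frac{60\ \text{s}}{T}
  = 0.02 \cdot \frac{60\ \text{s}}{1\ \text{s}}
  = 1.2
\]
```

The same expression gives the expected number of joins per minute, since the two rates are set equal.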

Default parameters

One important note is that while we vary the values of one parameter, we have to fix the other parameters to one value. Otherwise it would be difficult to see the impact of each parameter, and the results could be misleading. For this reason, we set default values for the parameters that are not being varied. Table 3.2 shows the chosen default values.

Number of nodes

The number of nodes in the simulations should not have a big impact on the performance of SLMP if the churn rate is tuned so that the probability of a node joining or crashing each round is the same as in the simulations we ran. The difference is that more messages are broadcast during the simulations if we have more nodes, which means that there is a larger probability for some packet to get dropped each round; this can affect the performance negatively. However, the chance of the same node dropping packets two or three rounds in a row is still the same as with fewer nodes, which means that if the timeout is longer than 1 round the performance should not be affected much. The time required to evaluate the protocol also increases with the number of nodes, and therefore we decided to run the simulations with a relatively small number of nodes. This number can be seen in Table 3.2.
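The argument about consecutive losses can be made concrete with a rough calculation. It assumes, which the text does not state explicitly, that packet losses are independent between rounds and that a node is removed only when its messages are lost in k consecutive rounds, where k is the timeout and p the packet loss rate. The example uses the default values p = 0.1 and k = 4.

```latex
\[
  P(\text{messages from one node lost in } k \text{ consecutive rounds}) = p^{k},
  \qquad
  p = 0.1,\; k = 4 \;\Rightarrow\; p^{k} = 10^{-4}
\]
```

Under these assumptions the per-node probability does not depend on how many other nodes are present, which is consistent with the reasoning above.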

Table 3.2: Default parameter values

Parameter | Values (units)

Average number of nodes (AVGN) | 5 (nodes)

Timeout (TO) | 4 (rounds)

Packet loss rate (PLR) | 10 (percent)

Churn rate (CHR) | 1 (percent)
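The one-parameter-at-a-time design can be expressed as a small configuration sketch. The class `SimConfig` and the function `sweep` are hypothetical names and do not come from the C++ simulation environment; the default values and ranges are those of Tables 3.1 and 3.2, with percentages written as fractions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SimConfig:
    avg_nodes: int = 5          # AVGN (nodes)
    timeout: int = 4            # TO (rounds)
    packet_loss: float = 0.10   # PLR as a fraction
    churn: float = 0.01         # CHR as a fraction
    rounds: int = 3600          # one simulated hour at 1 s per round


def sweep(default, name, values):
    """Vary one parameter while all others keep their default value."""
    return [replace(default, **{name: value}) for value in values]


default = SimConfig()
timeouts = sweep(default, "timeout", range(1, 31))                     # 1..30
loss_rates = sweep(default, "packet_loss", [i / 10 for i in range(11)])  # 0..100%
churn_rates = sweep(default, "churn", [i / 200 for i in range(11)])      # 0..5%
```

Each configuration would then be simulated 100 times and the results averaged, as described in Section 3.2.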

3.3 Simulation environment

The simulation environment is written in C++ by Mikael Asplund and built on the simple foundation of group membership principles. At the beginning of a simulation, the values of the different parameters are loaded, and the simulation then runs until the set number of rounds has been reached.

At the beginning of each round, there is a probability for a node to crash and a probability for a new node to join. The IDs of crashed nodes are stored and the crashed nodes are removed from the population in the simulation. If a new node has appeared, its ID is stored and the node is added to the population. Each functioning node goes through the sending and receiving phases. Also, whenever a node is joining a group, the time of this join process is stored.

In the sending phase, the leader sends out a view. In the simulation environment, every node has a probability (the packet loss rate) of not receiving the view. When that happens, the node does not receive the packet and the round continues.
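The loss model can be sketched as an independent random draw per receiver. This is an illustration in Python, not the C++ code of the simulation environment; the function name `deliver_view` is hypothetical, and the sketch assumes that losses are independent between nodes and rounds.

```python
import random


def deliver_view(followers, packet_loss_rate, rng):
    """Return the set of followers that receive the leader's view this round.

    Each follower independently misses the view with probability
    packet_loss_rate.
    """
    # rng.random() is uniform on [0, 1), so a rate of 0 never drops
    # a packet and a rate of 1 always drops it.
    return {node for node in followers if rng.random() >= packet_loss_rate}


rng = random.Random(42)  # fixed seed so that a run can be repeated
received = deliver_view({1, 2, 3, 4, 5}, 0.10, rng)
```

Passing the random generator as an argument keeps individual simulation runs reproducible, which is convenient when averaging over many runs.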

At the end of the round, some statistics will be calculated such as the ratio of correct views, and then a new round will be started again and the process repeats itself.


When the simulation has finished, the statistics are output. We are also aware that the device that runs the protocol might affect the result because of shared memory and resources. However, since SLMP is synchronous and based on rounds, we consider this error margin minor, and therefore we ran the simulations of the protocol on a single device only.


4 Results

In this chapter, the results of the performed simulations are shown. The first three sections cover the timeout, the churn rate, and the packet loss rate. For each of them, we show the results for the ratio of correct views and for join processing time & occurrences. The graphs of join processing time and occurrences have two lines and two y-axes. The blue line, read against the left y-axis, represents how many rounds it takes for a node to join a group, while the orange line, read against the right y-axis, represents how many times nodes try to join a group. The last section shows how the timeout can be set to maximize the number of correct views. Each dot in the graphs represents an average over 100 simulations.

4.1 Varying timeout

In this section, the value of the timeout is varied while the packet loss rate and the churn rate are fixed. In the following figures, the x-axis represents the timeout and the y-axis represents the metrics defined in Section 3.1. Figure 4.1a shows the ratio of correct views. The graph indicates that a longer timeout results in a smaller share of correct views. When the churn rate is 1% and the packet loss rate is 10%, a timeout of 3 rounds would be preferable if the protocol should deliver as many correct views as possible.


Fig. 4.1: Simulation result of varying timeout. (a) Ratio of correct views. (b) Join processing time & occurrences.

As the blue line in Figure 4.1b shows, the number of rounds it takes for a node to join a group, counted from its first join request, is nearly constant. Neither this time nor the average number of occurrences of nodes wanting to join a group depends on the timeout value. The results vary somewhat between different values, but there are no major differences between the simulations. The small differences can be explained by random variation, where one simulation might have had more lost messages or joining nodes than another.

4.2 Varying churn rate

In this section, the values of churn rate will be varied while the values of timeout and packet loss rate will be fixed.

Fig. 4.2: Simulation result of varying churn rate. (a) Ratio of correct views. (b) Join processing time & occurrences.

Figure 4.2a shows that an increasing churn rate affects the ratio of correct views negatively. When nodes have a higher chance of crashing and joining the environment, it is harder for the protocol to maintain a correct view.

As in Figure 4.1b, the average join processing time is not affected much by the churn rate. One difference from Figure 4.1b is that the orange line in Figure 4.2b shows that higher churn rates entail more occurrences where nodes want to join the group. Since higher churn rates lead to more nodes joining the simulation, it is only logical that more nodes send join requests to the group leader.


4.3 Varying packet loss rate

In this section, the values of the parameter packet loss rate will be varied with fixed values of the parameter timeout and churn rate.

Fig. 4.3: Simulation result of varying packet loss rate. (a) Ratio of correct views. (b) Join processing time & occurrences.

When the packet loss rate is high, the leader of a group is more likely to remove live nodes from the view because their messages were lost, which results in incorrect views. This can be seen in Figure 4.3a: as the packet loss rate increases, the ratio of correct views decreases.
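The removal behaviour described above can be illustrated with a small sketch. This is not the simulator's code: the class `LeaderView` and its member functions are hypothetical names, and the sketch assumes that every follower is expected to send one message per round and is removed once `timeout` consecutive rounds pass without one reaching the leader.

```cpp
#include <map>
#include <vector>

// Hypothetical leader-side bookkeeping; all names are illustrative only.
class LeaderView {
public:
    explicit LeaderView(int timeoutRounds) : timeout_(timeoutRounds) {}

    // Called when a message from a node reaches the leader in the current
    // round. A node that is not yet tracked is added to the view.
    void onMessage(int nodeId) { followers_[nodeId].heard = true; }

    // Called once at the end of every round. Followers that stayed silent
    // for `timeout_` consecutive rounds are removed; their ids are returned.
    std::vector<int> endOfRound() {
        std::vector<int> removed;
        for (auto it = followers_.begin(); it != followers_.end();) {
            State& s = it->second;
            s.silentRounds = s.heard ? 0 : s.silentRounds + 1;
            s.heard = false;  // reset for the next round
            if (s.silentRounds >= timeout_) {
                removed.push_back(it->first);
                it = followers_.erase(it);  // erase returns the next iterator
            } else {
                ++it;
            }
        }
        return removed;
    }

    bool contains(int nodeId) const { return followers_.count(nodeId) > 0; }

private:
    struct State {
        int silentRounds = 0;  // consecutive rounds without a message
        bool heard = false;    // message received in the current round
    };
    int timeout_;
    std::map<int, State> followers_;
};
```

Note that the leader cannot distinguish a crashed follower from one whose messages were all lost, so a live follower is removed whenever the loss streak reaches the timeout. That is the source of the incorrect views discussed above.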

When the packet loss rate increases, new nodes that join the simulation have a harder time establishing contact with the group leader. The messages they send to establish contact run a larger risk of being dropped, and therefore the join processing time also increases. When the packet loss rate gets very high, almost no packets are delivered, hence the long join processing time in Figure 4.3b. How often nodes join a group also depends on the packet loss rate. If a node is removed from a group because of packet loss, it has to join a group again. But if the packet loss rate is too high, the node will not be able to join a group, since almost no packets are received by the leader or sent back to the node. The number of occurrences where nodes join a group therefore decreases rapidly after 60%, as can be seen in Figure 4.3b.

4.4 Optimal timeout

Based on the results from varying the timeout, churn rate and packet loss rate, we can now adjust our values to find the optimal timeout for different packet loss rates and churn rates. Each line in the diagrams represents simulations with a different churn rate, and each diagram represents a different packet loss rate. The range of the x-axis varies between the diagrams, as we want to find the timeout that gives the highest ratio of correct views.
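Selecting the optimal timeout from a series of simulation results amounts to picking the timeout with the highest ratio of correct views. A minimal sketch follows; the struct `TimeoutResult` and the function `optimalTimeout` are hypothetical names, not part of the simulator.

```cpp
#include <vector>

// One data point of a timeout sweep for a fixed churn rate and packet
// loss rate. Illustrative type, not taken from the simulator.
struct TimeoutResult {
    int timeoutRounds;        // timeout used in the simulations
    double correctViewRatio;  // measured ratio of correct views, in [0, 1]
};

// Returns the timeout with the highest ratio of correct views, or -1 if
// no results are given. On ties, the entry that appears first wins.
int optimalTimeout(const std::vector<TimeoutResult>& results) {
    int best = -1;
    double bestRatio = -1.0;
    for (const TimeoutResult& r : results) {
        if (r.correctViewRatio > bestRatio) {
            bestRatio = r.correctViewRatio;
            best = r.timeoutRounds;
        }
    }
    return best;
}
```

If the results are sorted by increasing timeout, the tie rule favours the shortest of equally good timeouts, which also keeps failure detection fast.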

Fig. 4.4: Simulation result of varying timeout, churn rate & packet loss rate. Ratio of correct views with (a) PLR=10%, (b) PLR=30%, (c) PLR=50%, (d) PLR=70%.

Figure 4.4a shows that a higher churn rate results in a lower ratio of correct views. The blue line, which represents the least dynamic environment (1% churn rate), has the highest ratio of correct views regardless of the timeout. The orange line has the second highest ratio regardless of the timeout, and so on. A similar pattern can be seen in all the figures: a lower churn rate always allows the protocol to produce a higher ratio of correct views if the timeout is set right. This is expected, since a more dynamic environment makes it harder for the leader to keep track of all the nodes joining and crashing during the simulation and to produce correct views.

The figures also show that if the timeout is set to a higher value, the protocol can still maintain good performance even though the packet loss rate is high. This can be seen in Figure 4.4b, where the ratio of correct views reaches almost the same values as in Figure 4.4a if the timeout is set to around 6 rounds instead of the 2-3 rounds in Figure 4.4a. When the timeout is longer, the nodes have a higher chance of getting at least one message delivered to the leader before the timeout expires, even if the packet loss rate is high.
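A back-of-the-envelope calculation suggests why a timeout of about 6 rounds at 30% packet loss behaves like 2-3 rounds at 10%. The sketch below rests on assumptions that are not taken from the simulator: a live follower sends one message per round, losses are independent, and the follower is removed only if all messages within the timeout window are lost. The function name `falseRemovalProbability` is hypothetical.

```cpp
#include <cmath>

// Probability that every message from a live follower is lost during one
// timeout window, given a packet loss rate `plr` and a timeout in rounds.
// Assumes one message per round and independent losses.
double falseRemovalProbability(double plr, int timeoutRounds) {
    return std::pow(plr, timeoutRounds);
}
```

Under these assumptions, a timeout of 3 rounds at 10% loss gives 0.1^3 = 0.001, and a timeout of 6 rounds at 30% loss gives 0.3^6 ≈ 0.0007. The two settings expose a live follower to a similar risk of false removal, which is in line with the similar ratios of correct views in Figures 4.4a and 4.4b.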

The figures also show that it is possible to maintain good service even though there is a high packet loss rate. Setting the timeout to a high value if the packet loss rate is high will

5 Discussion

In this chapter, the method will be discussed along with the approach that was chosen and what alternatives exist. Also, the result from the simulations will be discussed.

5.1 Method discussion

Our chosen method was to evaluate the protocol in a simple simulation environment written in C++. We could instead have used a more advanced simulator, which would have given more refined results but would also have required much more time to set up. We could get even better results by evaluating the protocol in reality with real components, but, as with the advanced simulator, this would take a lot of time to set up, and it would also cost far more in resources such as hardware and money.

We consider the scenario described in Section 1.1 relevant. The number of intelligent vehicles is increasing, and one of the current challenges is to coordinate them with a membership protocol to allow a safer traffic environment and reduced resource consumption. The default parameter values chosen in Section 3.2 are considered relevant for the scenario, but running the protocol with only those values would not be a good way to evaluate it. Instead, we ran the protocol with the default parameters while varying one of the parameters in Table 3.1 at a time. If we had chosen another scenario, we would have used different parameter values, which would have given slightly different results. If we had, for example, simulated a scenario with a multiplayer game in mind, we might have used a higher churn rate, which would have led to results where the performance of SLMP deteriorates. The impact of the different parameters would still be the same, though.

To get a more extensive view of how the protocol responds to the parameters, we could have run simulations with even more values for the churn rate and group size, and also looked at more metrics. Having few metrics might lead to wrong conclusions because they cover only a small part of the performance and efficiency. One example of another metric is how fast a node rejoins another group after a leader has crashed. However, one problem with this metric is distinguishing between two cases: the leader has really crashed, or the leader's messages were lost due to high packet loss in the link, in which case the node rejoins quickly because the leader has not really crashed. This gives two different kinds of times which, when combined, give a misleading result. Due to our time frame, we needed to limit our evaluation to the values and metrics described in Section 1.4.

Some deviations in our results may occur due to circumstances in the individual simulations. One example of a possible deviation is when the leader of a group crashes frequently during a simulation. This forces the nodes to leave the group, become their own leaders and form a new group, which affects the result negatively. Another possible deviation is a difference in the average number of nodes during a simulation. In some simulations, the average number of nodes might be a bit lower or higher than usual, which leads to either increased or decreased performance of the protocol, since a larger group of nodes is more affected by the churn and packet loss rate. As said, these deviations are caused by circumstance, and running 100 simulations to get our results drastically decreases their impact.
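The effect of averaging over many runs can be quantified with the standard error of the mean, which shrinks with the square root of the number of runs. The sketch below shows one way such a summary could be computed; the struct `Summary` and the function `summarize` are hypothetical names and are not taken from the simulator.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mean of a metric over independent simulation runs, together with the
// standard error of that mean. Illustrative type only.
struct Summary {
    double mean;
    double stdError;
};

// Summarises one metric (for example the ratio of correct views) over
// all runs. With fewer than two runs the standard error is reported as 0.
Summary summarize(const std::vector<double>& runs) {
    const std::size_t n = runs.size();
    if (n == 0) return {0.0, 0.0};
    double sum = 0.0;
    for (double x : runs) sum += x;
    const double mean = sum / n;
    if (n == 1) return {mean, 0.0};
    double squares = 0.0;
    for (double x : runs) squares += (x - mean) * (x - mean);
    const double sampleVariance = squares / (n - 1);  // unbiased estimate
    return {mean, std::sqrt(sampleVariance / n)};
}
```

With 100 runs, the standard error is one tenth of the run-to-run standard deviation, so a deviation in a single simulation has only a small effect on the reported average.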

5.2 Result discussion

Our graphs of the ratio of correct views during simulations with different parameter values provide a good indication of how well the protocol performs depending on the different parameters (churn, packet loss, timeout). In a more dynamic environment, the protocol needs to adapt its properties in order to preserve high performance and reliability. The join processing time responded to the different parameters as expected. The timeout and churn rate do not noticeably affect how long it takes for a node to join a group after its first request. The churn rate increases the number of occurrences of nodes joining the group because nodes join and crash more often. When the packet loss rate is high, the nodes have difficulties communicating with each other. This means that join requests and responses get lost, which postpones the join process.

The graphs in Section 4.4 show that the protocol is able to handle unstable communication channels. When the packet loss rate is 50% and the churn rate is 1%, the ratio of correct views reaches 80% if the timeout is long enough, which is almost the same result as when the packet loss rate was 10%. This means that even with high packet loss rates it is possible for the protocol to provide good service. When both the churn rate and the packet loss rate are high, the performance of the protocol decreases. At least one of the rates needs to be low for the protocol to achieve good results, as can be seen in Section 4.4. To get the best performance from the protocol, it is necessary to set the timeout according to the packet loss rate and churn rate of the environment.

6 Conclusion

The purpose of this thesis was to identify which main parameters affect the performance of SLMP, and in what way. The simulations we have done show that three major factors affect how well the protocol performs: the churn rate, the packet loss rate, and the timeout. A high churn rate makes it harder for the protocol to keep up with all the nodes joining the simulation and crashing, which means that fewer views reflect the correct group dynamics. The packet loss rate has an obvious effect on the simulations. When fewer messages are delivered, it is more difficult for SLMP to know whether nodes really have crashed or whether messages have been lost along the way. If the leader, following the protocol, starts to remove nodes from the view because of lost messages, the view will not reflect the environment correctly, because nodes should only be removed from the view when they have failed.

The timeout parameter decides how long the leader waits before removing stale followers. When the chance of node failure is low, it is preferable to have a longer timeout, since the protocol can then afford to wait longer before removing a node from the group without affecting its accuracy too much. With a longer timeout, there is a smaller chance that the leader of the group removes a follower due to packet loss. If the node failure rate is high, there is a large chance that a stale node actually has failed, and it is then wise to have a shorter timeout so that the node is removed from the view faster. If either the packet loss rate or the churn rate is low, the protocol can deliver good service even if the other rate is high. An optimal value for the timeout parameter, depending on churn and packet loss, can be found by analyzing the results in Section 4.4. The optimal timeout is where the ratio of correct views reaches its highest value.

With these results, we have seen how SLMP performs with different parameters in a simulation environment. In future work it would be desirable to test and evaluate the protocol with varying settings in the real world, since reality is complex and always changing. The performance results of SLMP in both simulated and realistic environments can be compared with those of other group membership protocols to determine the advantages and disadvantages of the different protocols.

There are many options for further work with SLMP, for example exploring other metrics to evaluate it. It would also be valuable to have functions that estimate the packet loss rate and the node failure probability, so that the protocol can adapt better to its environment and set an appropriate timeout depending on these inputs.
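As an illustration of what such functions could look like, the sketch below estimates the packet loss rate from message counters and picks the shortest timeout that keeps the risk of removing a live follower below a target. It is only a sketch under stated assumptions and not part of SLMP: the function names, the counters and the target probability are hypothetical, losses are assumed to be independent with one message per round, and the node failure probability is not modelled.

```cpp
#include <cmath>

// Estimated packet loss rate from the number of messages the leader
// expected and the number it actually received. Hypothetical helper.
double estimatePacketLossRate(long expected, long received) {
    if (expected <= 0) return 0.0;  // nothing observed yet
    if (received < 0) received = 0;
    if (received > expected) received = expected;
    return 1.0 - static_cast<double>(received) / expected;
}

// Shortest timeout T in [minRounds, maxRounds] for which the probability
// plr^T of losing T messages in a row is at most `targetFalseRemoval`.
// Returns maxRounds if no shorter timeout meets the target.
int chooseTimeout(double plr, double targetFalseRemoval,
                  int minRounds, int maxRounds) {
    for (int t = minRounds; t < maxRounds; ++t) {
        if (std::pow(plr, t) <= targetFalseRemoval) return t;
    }
    return maxRounds;
}
```

Because a high churn rate favours a shorter timeout, a complete solution would also need to lower `maxRounds` as the estimated node failure probability grows. How to balance the two estimates is left open here.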

Bibliography

[1] Yair Amir, Louise Elizabeth Moser, Peter Michael Melliar-Smith, Deborah Anne Agar-wal, and P. Ciarfella. “The Totem single-ring ordering and membership protocol”. In: ACM Transactions on Computer Systems 13.4 (Nov. 1995). DOI: 10 . 1145 / 210223 . 210224.

[2] Mikael Asplund, Jakob Lovhall, and Emilia Villani. “Specification, Implementation and Verification of Dynamic Group Membership for Vehicle Coordination”. In: 2017 IEEE 22nd Pacific Rim International Symposium on Dependable Computing (PRDC). IEEE, Jan. 2017, pp. 321–328.ISBN: 978-1-5090-5652-1.DOI:10.1109/PRDC.2017.57.

[3] Tushar Deepak Chandra and Sam Toueg. “Unreliable failure detectors for reliable dis-tributed systems”. In: Journal of the ACM 43.2 (Mar. 1996).

[4] Gregory V. Chockler, Idit Keidar, and Roman Vitenberg. “Group communication spec-ifications: a comprehensive study”. In: ACM Computing Surveys 33.4 (Dec. 2001). DOI: 10.1145/503112.503113.

[5] George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair. Distributed Systems: Concepts and Design. 5 edition. Pearson, 2011.ISBN: 978-0132143011.

[6] Flaviu Cristian. “Reaching agreement on processor-group membership in synchronous distributed systems”. In: Distributed Computing 4.4 (Dec. 1991). DOI: 10 . 1007 / BF01784719.

[7] Federico Ferrari, Marco Zimmerling, Luca Mottola, and Lothar Thiele. “Virtual Syn-chrony Guarantees for Cyber-physical Systems”. In: 2013 IEEE 32nd International Sym-posium on Reliable Distributed Systems. IEEE, Sept. 2013.DOI:10.1109/SRDS.2013.11. [8] Eli Gafni. “Round-by-Round Fault Detectors: Unifying Synchrony and Asynchrony”. In: Proceedings of the 17th Annual ACM Symposium on Principles of Distributed Computing (PODC). New York, New York, USA: ACM Press, 1998, pp. 143–152.ISBN: 0897919777. DOI:10.1145/277697.277724.

[9] Valério Rosset, Pedro F Souto, Paulo Portugal, and Francisco Vasques. “A Reliability Evaluation of a Group Membership Protocol”. In: Computer Safety, Reliability, and Secu-rity. Ed. by Francesca Saglietti and Norbert Oster. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. Chap. A Reliabil.ISBN: 978-3-540-75101-4.DOI:10 . 1007 / 978 3 -540-75101-4_37.
