Tools and methods for evaluation of overlay networks


IT Licentiate theses 2007-004


BY

OLOF RENSFELT

September 2007

DIVISION OF COMPUTER SYSTEMS

DEPARTMENT OF INFORMATION TECHNOLOGY

UPPSALA UNIVERSITY

UPPSALA

SWEDEN

Dissertation for the degree of Licentiate of Philosophy in Computer Science


Olof Rensfelt

Olof.Rensfelt@it.uu.se

Division of Computer Systems Department of Information Technology

Uppsala University Box 337 SE-751 05 Uppsala

Sweden

http://www.it.uu.se/

© Olof Rensfelt 2007

ISSN 1404-5117


Abstract

Overlay networks are a popular way to deploy new functionality that does not currently exist in the Internet. Such networks often use the peer-to-peer principle, where users act as both servers and clients at the same time. We evaluate how overlay networks perform in a mix of strong and weak peers. The overlay system studied in this thesis is Bamboo, which is based on a distributed hash table (DHT).

For the performance evaluation we use both simulations in NS-2 and emulations in the PlanetLab testbed. One of our contributions is an NS-2 implementation of the Bamboo DHT. To simulate nodes joining and leaving, NS-2 is modified to be aware of the identity of overlay nodes.

To control experiments on PlanetLab we designed Vendetta. Vendetta is both a tool to visualize network events and a tool to control the individual peer-to-peer nodes on the physical machines. PlanetLab does not support bandwidth limitation, which is needed to emulate weak nodes. Therefore we designed a lightweight connectivity tool called Dtour.

Both the NS-2 and PlanetLab experiments indicate that a system like Bamboo can handle as much as 50% weak nodes and still serve requests. However, the lookup latency and the number of successful lookups suffer as network dynamics increase.


Acknowledgments

I would like to thank my supervisor Lars-Åke Larzon. He has always been very helpful and supportive and makes it fun to go to work. I would also like to thank my secondary supervisor Per Gunningberg, not only for all the help with the thesis but also for creating such a creative work environment in the communication research group (CoRe).

I have very much enjoyed working with Sven Westergren and without him this thesis would have been very different. His PlanetLab skills have been invaluable and I am very grateful to him. I would also like to thank Peter Drugge for all his work on Vendetta and Magnus Rundlöf for implementing Dtour. It has been great fun working with all of them.

I would also like to thank Arnold Pears for his feedback on the thesis as well as the other CoRe group members. They have helped much through discussions and feedback. So I would like to thank Christian Rohner, Erik Nordström, Oskar Wibling, Laura Feeney, Thabotharan Kathiravelu, Ioana Rodhe, and Fredrik Bjurefors.

Past members who I would also like to thank for helping me when I was a new student are Henrik Lundgren and Richard Gold.


Included papers

Paper A: A bandwidth study of a DHT in a heterogeneous environment

Olof Rensfelt and Lars-Åke Larzon

Uppsala University Technical report no: 2007-017

Paper B: Vendetta - A Tool for Flexible Monitoring and Management of Distributed Testbeds

Olof Rensfelt, Lars-Åke Larzon and Sven Westergren

In the proceedings of TridentCom 2007, May, Orlando. © 2007 IEEE. Reprinted, with permission.

Paper C: Evaluating a DHT in a heterogeneous environment

Olof Rensfelt, Sven Westergren and Lars-Åke Larzon

Submitted for publication

Comments on my participation

Paper A: I implemented the overlay system in NS-2 and performed the simulations. I also analyzed the data and was the main author of the report.

Paper B: I participated in the design process and implemented the C client of Vendetta. Vendetta was implemented as a Master's thesis project which I supervised.

Paper C: I played a major role in designing the connectivity modeling as a preloaded library and I implemented the generic filter support. I worked both on the simulations and the PlanetLab experiments and I was the main author of the paper.


List of work not included in the thesis

A LUNAR over Bluetooth

Olof Rensfelt, Richard Gold and Lars-Åke Larzon

Proceedings of the 4th Scandinavian Workshop on Wireless Ad-Hoc Networks 2004, May, Johannesberg

B LUNAR - A Lightweight Underlay Network Ad-hoc Routing Protocol and Implementation

Christian Tschudin, Richard Gold, Olof Rensfelt and Oskar Wibling

Next Generation Teletraffic and Wired/Wireless Advanced Networking (NEW2AN'04) 2004, February, St. Petersburg

C Addressing heterogeneity in Peer-to-Peer networks

Olof Rensfelt and Lars-Åke Larzon

poster

Proceedings of the Swedish National Computer Networking Workshop 2004, November, Karlstad

D NoteNet

Sven Westergren, Peter Drugge, Olof Rensfelt and Lars-Åke Larzon

demonstration

Mobisys 2006, June, Uppsala

E A bandwidth study of a DHT in a heterogeneous environment

Olof Rensfelt and Lars-Åke Larzon

poster

Swedish National Computer Networking Workshop 2006, October, Luleå

F Dtour - An Approach to Reproducibility on PlanetLab

Olof Rensfelt, Lars-Åke Larzon and Sven Westergren

poster

ACM SigComm 2007, August, Kyoto


Contents

1 Thesis introduction
1.1 Introduction
1.2 Overlay networks
1.2.1 Peer to peer networks
1.2.2 Unstructured overlay networks
1.2.3 Structured overlay networks
1.2.4 Distributed Hash Tables
1.2.5 Metrics
1.3 Evaluation methods
1.3.1 Simulation
1.3.2 Network emulation
1.3.3 Real experiments
1.4 Summary of papers
1.5 Ongoing work
1.5.1 Experimental setup
1.5.2 Investigating network instability
1.6 Conclusions and future work
Bibliography

2 Paper A: A bandwidth study of a DHT in a heterogeneous environment
2.1 Introduction
2.2 Overlays
2.3 Distributed Hash Tables
2.3.1 Bamboo
2.3.2 Management traffic
2.4 Implementation
2.4.1 NS-2 specifics
2.4.2 Packet handler
2.4.3 Router
2.4.4 Agent
2.4.5 Data storing
2.4.6 Other differences to Bamboo
2.5 Evaluation
2.5.1 Physical network layout
2.5.2 Overlay network layout
2.5.3 Stabilization time
2.5.4 Measurements
2.5.5 Simulation specifics
2.6 Network variables
2.6.1 Size
2.6.2 Node capacities
2.6.3 Churn rate
2.7 Discussion
Bibliography

3 Paper B: Vendetta - A Tool for Flexible Monitoring and Management of Distributed Testbeds
3.1 Introduction
3.2 Vendetta
3.2.1 Vendetta and PlanetLab
3.3 Case: A DHT testbed
3.3.1 DHT canvas
3.3.2 Monitor configuration
3.3.3 Working with the monitor
3.3.4 Node client configuration
3.3.5 Using Vendetta with the DHT testbed
3.4 Discussion
3.5 Conclusion and Future Work
Bibliography

4 Paper C: Evaluating a DHT in a heterogeneous environment
4.1 Introduction
4.2 Related work
4.3 Experiment setup
4.3.1 Simulation setup
4.3.2 PlanetLab setup
4.4 Results
4.4.1 Comparing simulations to PlanetLab measurements
4.4.2 PlanetLab results
4.5 Discussion
4.6 Future work
4.7 Conclusions
Bibliography


Chapter 1

Thesis introduction

1.1 Introduction

As new networking technologies constantly evolve, we need to gain new understanding of how to evaluate them. A dominating trend in the Internet is that a wider spectrum of devices is connected. The change from a network environment consisting mainly of desktop machines to a network with mobile users, cellphones, and new phenomena like peer to peer networks stretches the capabilities of the current design of the Internet. Weak Internet nodes become more common and strong nodes become stronger, and network heterogeneity increases as access technologies range from low-bandwidth wireless networks to Gigabit Ethernet.

It is not only the nodes that are changing but also how nodes communicate. New communication models appear: nodes acting as both clients and routers in ad hoc networks, intermittent connectivity in delay tolerant networks, data centric communication in sensor networks, and the use of overlay systems to provide indirection points.

New applications need new functionality in the Internet, but deploying new functionality in the Internet itself is a very slow process. The overlay concept avoids this by adding the new functionality in a layer between the application and the existing Internet.

Many overlay services are peer to peer networks, which create traffic patterns that were not foreseen during the design of the Internet.

This new functionality, combined with the increasing heterogeneity, creates a need for new ways to evaluate the performance of the system. Although much of the experience gathered from evaluating applications and protocols in a fixed network is still valuable, it does not cover how to study the impact of node mobility, or how a system handles nodes joining and leaving the network in an unpredictable manner.

The main question explored in this thesis is whether it is possible to have nodes with limited connectivity, for example cell phones, as members of applications based on distributed hash tables (DHTs) and how a mix of different nodes would influence the performance of such a system. The work is motivated by the increasing number of 3G enabled devices, and the availability of flat rate pricing for such devices, which makes it more likely that users would use a bandwidth intensive application. The contribution is an increased understanding of how a DHT performs in a heterogeneous environment. Different methods are used to explore the impact of heterogeneity.

Another novelty is our design of the evaluation tools, including the extension of previous tools for this environment.

Previous work on the DHT Pastry in heterogeneous environments[4] suggests that management traffic could be a problem because it congests the access links of mobile nodes. The network management was redesigned in Bamboo[22], the follow-up DHT to Pastry, which performs updates periodically rather than when changes occur. One advantage is that it can handle higher network dynamics and avoids feedback loops.

Our first evaluation approach uses simulations, which allow us to configure link parameters like bandwidth and delay. We use the NS-2 simulator[9], the "standard" simulator in the research community. Studying a DHT in simulation means that the DHT has to be re-implemented from scratch, which is a very time consuming task (Paper A). The simulations of Bamboo indicate that mobile nodes can participate in the DHT, but the complexity of the system makes it impossible to simulate scenarios long enough to make any firmer claims. The most valuable outcome of the simulations was the understanding of how to design scenarios with thin nodes and mobility which place stress on the system. The insight that the complexity of the problem made simulations hard to perform was also valuable.

Our second approach was to perform experiments on PlanetLab[6]. As a part of this work, our tool Vendetta was developed to manage and analyze the experiments (Paper B). Vendetta consists of two parts, the client and the monitor. The client runs on every node that participates in the experiment and controls the application that is evaluated. The monitor is a central control and visualization application. When the application runs, the client captures its output and continuously parses it for predefined log entries. When a log entry is found, the client performs a configurable action. Such an action can, for example, be to stop the application and send a message to the monitor. The monitor receives such messages from the nodes in the experiment and can visualize them in a 3-D canvas. The contribution of Vendetta is not the individual functions but how they are incorporated into one powerful tool, and Vendetta has proved very valuable for understanding unexpected behavior in Bamboo.

Figure 1.1: A logical network on top of the existing network
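The client behavior described for Vendetta (scan application output for predefined log entries and run a configurable action on each match) can be illustrated with a minimal Python sketch. This is not Vendetta's code: the log line format, the rule list `RULES`, and the names `make_log_handler` and `report_join` are hypothetical and exist only for this illustration.

```python
import re


def make_log_handler(rules, send_to_monitor):
    """Build a function that checks one output line against all rules.

    rules: list of (compiled regex, action) pairs (hypothetical format).
    send_to_monitor: callable that delivers a message to the monitor.
    """
    def handle_line(line):
        for pattern, action in rules:
            match = pattern.search(line)
            if match:
                # A predefined log entry was found: run its configured action.
                action(match, send_to_monitor)
    return handle_line


def report_join(match, send_to_monitor):
    # Example action: tell the monitor which node joined the overlay.
    send_to_monitor({"event": "joined", "node": match.group(1)})


# Assumed log entry format; a real application defines its own.
RULES = [(re.compile(r"Joined ring as (\w+)"), report_join)]
```

In use, the handler would be fed every line the evaluated application prints, and `send_to_monitor` would wrap whatever transport carries messages to the monitor.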

A reason for using simulations in Paper A was the need to limit link capacities and to have complete control of the network topologies. When you do experiments on a testbed like PlanetLab, you do not have to worry about the accuracy of the network model, as the physical network you use is the Internet. To create a heterogeneous environment on PlanetLab, the bandwidth of some of the participating nodes needs to be limited. Unfortunately, that functionality is not currently offered by PlanetLab. That need prompted us to design the Dtour tool. Dtour is a connectivity emulation system residing in user space, implemented as a shared library. When an evaluated application uses system calls to send and receive traffic, the system calls are redirected to Dtour. There the traffic is either dropped or forwarded after being filtered. The bandwidth limitation is implemented using a token bucket that the packets need to pass before going into the network stack. The design is very lightweight, with no need to modify the operating system.
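To make the token bucket idea concrete, here is a minimal Python sketch of a bucket that decides whether a packet may pass. It is an illustration under assumptions, not Dtour's implementation: the class name `TokenBucket`, the byte-based accounting, and the choice to reject (rather than queue) packets that find too few tokens are all assumed here.

```python
class TokenBucket:
    """Token-bucket limiter: `rate` in bytes per second, `capacity` in bytes."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = now         # time of the last refill

    def allow(self, size, now):
        """Return True if a packet of `size` bytes may pass at time `now`."""
        # Refill for the elapsed time, never exceeding the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        # Assumption of this sketch: packets without tokens are rejected.
        return False
```

The capacity bounds the burst size and the rate bounds the long-term throughput, which is what makes the mechanism suitable for emulating a weak access link.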

1.2 Overlay networks

An overlay network is a logical network built on top of an existing network infrastructure such as the Internet. The strength of overlay networks is that you do not need to modify anything in the existing Internet to deploy them, as you only use the Internet as a transport service between nodes in the logical network. Overlay networks are often used to provide functionality that is missing in the existing Internet, for example to support mobility[23], media services[12] or virtual private networks (VPNs). It is also a fast way to deploy new services compared to incorporating them in the Internet design.

In networking terms, an overlay network uses the Internet as a globally distributed link layer, as a "link" between two nodes. When an overlay node sends traffic to another overlay node, it looks as if the traffic is sent directly to the other node, although it may pass through many physical machines. Many overlay networks offer a lookup service to their users. A lookup consists of a query from a user that gets a response from the network.

1.2.1 Peer to peer networks

Peer to peer (p2p) networks are overlay networks where all participating nodes have the same initial functionality. In a p2p system, all nodes are peers in the sense of having equivalent roles and responsibilities. The p2p model is in contrast to the classic client-server model used for example in web services, where different nodes have clear roles in the system. All decisions made within a pure p2p network are distributed, since there is no central decision point.

Some p2p networks do, however, select supernodes, which are nodes with high performance and stable network connections[8]. Such nodes are typically used to perform network management tasks. Even though supernodes might seem a contradiction to the p2p philosophy, all nodes still have the possibility (or risk) of being chosen to become a supernode.

1.2.2 Unstructured overlay networks

Overlay networks can be classified as either structured or unstructured. An unstructured overlay network builds a random graph between the participating nodes and uses algorithms like random walk[29] or flooding to distribute queries through the network. If the flooding is not complete, there is no guarantee that an answer will be found. Unstructured overlay networks have good scalability properties because a node only needs to know a few other nodes in the network. For that reason, unstructured p2p networks are often used for file sharing[11], where a user wants to find a certain file, not all copies of it.

Figure 1.2: The dotted lines show a leafset in a ring-based DHT; a leafset is a set of pointers to neighboring nodes in the key space. The dashed lines show a routing table.

1.2.3 Structured overlay networks

Structured overlays assign keys to data items and have a mapping function that maps a key onto a node in the overlay. Having such a mapping function makes efficient lookups of the data possible, as every node in the network knows where to forward requests. It also makes it possible for a node to insert certain data into the overlay, and for another node to be sure to retrieve it at a later time. There are of course times when a structured overlay cannot return values previously inserted, but that is an error state, unlike in unstructured overlays, where it can occur because of incomplete flooding. To decrease the risk of failed lookups due to nodes leaving the network, many structured overlay networks replicate data among multiple nodes.

1.2.4 Distributed Hash Tables

Structured overlays are mainly implemented as DHTs. A DHT offers a storage service to its users in which data can be inserted and later retrieved from anywhere in the network. A DHT handles key-value pairs where the key often is a hash of the value. A key-value pair might be a name and a telephone number, and in that case the telephone number could be retrieved by using the name. Such a service is a useful building block when designing distributed systems. DHTs are used, for example, in Azureus[1] to find torrent files, as building blocks in systems supporting mobility[23], and in grid computing[3].

Figure 1.3: The PUT-GET semantic of DHTs

Four different DHT proposals were published in 2001: Chord[24], Pastry[21], Tapestry[28], and CAN[16]. Their algorithmic background is consistent hashing[13], which has the property that when bins are added to or removed from a hash table, only a limited number of keys need to be moved between bins. If the number of bins in an ordinary hash table is changed, a majority of the keys need to change bin. The hash function is expected to distribute keys evenly over the key space. Consistent hashing was initially used to do load balancing among web servers but it is also a good way to partition the key space.
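The difference between an ordinary hash table and consistent hashing can be checked with a small Python experiment. The sketch below is illustrative only: the bin names, the use of a single ring position per bin, and the helper names are assumptions, and the modulo scheme stands in for an "ordinary" hash table.

```python
import hashlib
from bisect import bisect_left


def h(text):
    """SHA-1 hash of a string, interpreted as an integer."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")


def modulo_bin(key, bins):
    # Ordinary hash table: the bin index depends on the number of bins.
    return bins[h(key) % len(bins)]


def ring_bin(key, bins):
    # Consistent hashing: bins and keys share one circular key space and
    # a key belongs to the first bin at or after its own position.
    points = sorted((h(b), b) for b in bins)
    ids = [p for p, _ in points]
    return points[bisect_left(ids, h(key)) % len(points)][1]


def moved_fraction(assign, keys, before, after):
    """Fraction of keys whose bin changes when the bin set changes."""
    moved = sum(assign(k, before) != assign(k, after) for k in keys)
    return moved / len(keys)


keys = [f"key-{i}" for i in range(2000)]
before = [f"bin-{i}" for i in range(10)]
after = before + ["bin-10"]  # add one bin
```

Comparing `moved_fraction(modulo_bin, keys, before, after)` with `moved_fraction(ring_bin, keys, before, after)` shows the effect: with the ring, the only keys that move are the ones taken over by the new bin.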

Chord, Pastry, Tapestry, and Bamboo[18] organize keys and nodes in a circular key space using SHA-1[7] as the hash function. CAN uses a more complex key space, an n-dimensional Cartesian coordinate space on a multi-torus. The 2-dimensional CAN key space is presented in figure 1.5.

The keys in a DHT are flat identifiers, meaning that they do not hold any hierarchical information. This is in contrast to an IPv4 address that is tightly coupled to a physical location in the Internet, where the location can be derived from the hierarchy of the address. Because of this difference, DHTs are useful building blocks in systems that want to differentiate between location and user identity. Such systems can enable transparent mobility because a user can keep her ID as she moves around in the physical network.

There are also approaches where keys are given a hierarchical meaning, for example in a global geonotes system where geographical location is part of the key[25]. The DHT is then made aware of the hierarchy to increase efficiency. Others have built data structures on top of DHTs[5], which goes well with the idea of having a DHT as a service[19, 2] for nodes not participating in the overlay.

The separation between physical location and logical position in the overlay network makes DHTs robust against network disturbances. It is highly unlikely that two adjacent overlay nodes are located close to each other in the underlying physical network, thereby risking being affected by the same local network outages. However, if the underlying network gets partitioned, a DHT might be divided into two different networks. Therefore most DHTs have network merging functionality.

Data insertion and retrieval

When data items are inserted into a DHT, they are given an identifier value in the network by the hash function. The hash function is applied to the data or meta-data and the result is called a key. The set of all possible keys is called the key space and is dependent on the hash function used. Because of the properties of hash functions, every data item has one unique place in the key space, and that place can be located by any member of the network. The most common hash function used in DHTs today is SHA-1, which distributes data among $2^{160}$ bins. A one-dimensional key space can be thought of as a ring into which values are put, which in the case of SHA-1 are all values between 0 and $2^{160} - 1$ (figure 1.2). This thesis concentrates on ring-based DHTs.

When values are inserted into the key space, you need to decide which node should be responsible for which values. To do that, nodes are also put into the key space, often by applying the same hash function to the port and IP address. We will use the term node ID to indicate the place where a node is put into the key space. In Chord, for instance, a node is responsible for all values between its node ID and the node ID of the next node in key space, while in some other DHTs the node with the numerically closest node ID to the key is responsible for attached data[21, 18].
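As a hedged illustration of the "numerically closest node ID" rule, the Python sketch below derives node IDs with SHA-1 and picks the responsible node on the ring. The `"ip:port"` input format, the tie-breaking rule, and the function names are assumptions made for this example, not details taken from Pastry or Bamboo.

```python
import hashlib

KEY_BITS = 160          # SHA-1 output size
RING = 2 ** KEY_BITS    # number of positions in the circular key space


def sha1_id(text):
    """Map a string to a position in the key space."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")


def node_id(ip, port):
    # Assumed input format for the hash; real systems define their own.
    return sha1_id(f"{ip}:{port}")


def ring_distance(a, b):
    """Shortest distance between two positions, going either way round."""
    d = abs(a - b) % RING
    return min(d, RING - d)


def responsible_node(key, node_ids):
    """Node whose ID is numerically closest to the key on the ring."""
    # Ties are broken by the smaller node ID (an arbitrary choice here).
    return min(node_ids, key=lambda n: (ring_distance(n, key), n))
```

Because distance is measured around the ring, a key just above 0 can belong to a node whose ID is just below $2^{160}$.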

The purpose of a DHT is to offer a service for handling key-value pairs. To insert data is called a PUT and to later retrieve it is called a GET (figure 1.3). When a user inserts data into, or requests data from, a DHT, messages need to be forwarded between the DHT nodes for the PUT or GET to reach the responsible node. Finding the responsible node is called a lookup, which can be done in two different ways. The first way is that the initiating node asks a node for a pointer to the next suitable node in order to forward the lookup. When an answer arrives, the initiating node issues a new request to the node pointed to by the previous node, and the process continues until the right node is found. The second way is that an initiating node sends a request to another node to do a lookup, and if the receiving node is not the responsible node, the receiving node forwards the request through the DHT (figure 1.4). The first approach, iterative routing, gives the initiating node control of the lookup, which circumvents the potential problem of a malicious node dropping requests. The second approach, recursive routing, on the other hand performs lookups with lower latency and also tackles the problem of non-transitive connectivity[10].

Figure 1.4: Different approaches for lookups in a DHT
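The two lookup styles can be contrasted with a small Python model that counts the messages each one needs. This is a sketch under simplifying assumptions: every node knows only its two ring neighbors, the responsible node is the one numerically closest to the key, and all names (`Node`, `build_ring`, `iterative_lookup`, `recursive_lookup`) are invented for the example.

```python
def ring_distance(a, b, ring):
    """Shortest distance between two IDs on a ring of size `ring`."""
    d = abs(a - b) % ring
    return min(d, ring - d)


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.neighbors = []  # nodes this node can contact directly

    def next_hop(self, key, ring):
        """Return self if no known neighbor is closer to the key."""
        candidates = [self] + self.neighbors
        return min(candidates, key=lambda n: ring_distance(n.node_id, key, ring))


def build_ring(ids):
    nodes = [Node(i) for i in sorted(ids)]
    for i, node in enumerate(nodes):
        node.neighbors = [nodes[i - 1], nodes[(i + 1) % len(nodes)]]
    return nodes


def iterative_lookup(start, key, ring):
    """The initiator queries each node itself; returns (node, messages)."""
    current, messages = start, 0
    while True:
        nxt = current.next_hop(key, ring)
        if current is not start:
            messages += 2  # request from the initiator plus the reply
        if nxt is current:
            return current, messages
        current = nxt


def recursive_lookup(start, key, ring):
    """Nodes forward the request themselves; returns (node, messages)."""
    current, messages = start, 0
    while True:
        nxt = current.next_hop(key, ring)
        if nxt is current:
            if current is not start:
                messages += 1  # single reply back to the initiator
            return current, messages
        messages += 1  # request forwarded one overlay hop
        current = nxt
```

For eight nodes with IDs 0, 10, ..., 70 on a ring of size 80, a lookup of key 42 started at node 0 reaches node 40 in both cases, using 8 messages iteratively and 5 recursively in this model.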

Management traffic

As DHTs should work in dynamic network environments, nodes need to communicate with each other in order to know which nodes are still members of the network. The nodes a certain node communicates with directly are called neighbors. The status of neighbor nodes is often tested using some kind of echo-reply communication. The interval with which a neighbor is contacted affects how rapid network dynamics the system can handle. This is a relevant setting in our scenario, where mobile users are expected to join and leave at a high rate.

A design decision when creating a DHT is how a node should select neighbors. The crude approach is to have all nodes communicating with all other nodes in the DHT. Such an approach is feasible for small networks, and will give good lookup performance as all values can be reached with only one request, or with complexity O(1). However, such an approach becomes inefficient when the number of nodes participating in the network grows large, and it is therefore common to use more advanced methods when selecting neighbors.

In DHTs that use a circular key space, it is common to let nodes keep track of a certain number of nodes before and after their node ID. The set of such nodes is sometimes referred to as a leafset (figure 1.2). The leafset ensures that messages can be passed between any two nodes in a stable network, using other nodes. Using only a leafset causes a high lookup path length of O(n), where n is the number of participating nodes. Therefore, although sufficient to ensure correct lookups, it is not efficient at handling lookups in big networks. To increase performance, nodes can keep information about other nodes far away in key space. That information is often called a routing table (figure 1.2).

Figure 1.5: A 2-dimensional CAN key space divided among eleven nodes
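The effect of a routing table on path length can be seen in a toy Python model. The sketch assumes a fully populated ring with node IDs 0 to n-1 and, purely for illustration, a routing table with entries at power-of-two distances (a Chord-like choice; Bamboo's actual table is organized differently). The function names are hypothetical.

```python
def ring_distance(a, b, n):
    """Shortest distance between two IDs on a ring of n positions."""
    d = abs(a - b) % n
    return min(d, n - d)


def hops(src, dst, n, leaf, use_table):
    """Greedy hop count from src to dst on a ring with node IDs 0..n-1.

    leaf: number of leafset entries kept on each side of a node.
    use_table: also keep entries at power-of-two distances (assumed layout).
    """
    offsets = list(range(1, leaf + 1))
    if use_table:
        step = 1
        while step < n:
            offsets.append(step)
            step *= 2
    count, current = 0, src
    while current != dst:
        # Forward to the known node that is closest to the destination.
        candidates = [(current + sign * off) % n
                      for off in offsets for sign in (1, -1)]
        current = min(candidates, key=lambda c: ring_distance(c, dst, n))
        count += 1
    return count
```

With 64 nodes and two leafset entries per side, `hops(0, 32, 64, 2, False)` needs 16 hops, while `hops(0, 32, 64, 2, True)` needs a single hop. The leafset-only path grows linearly with the network size, whereas the table entries let each hop cover a large part of the remaining distance.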

1.2.5 Metrics

There are certain metrics commonly used when evaluating the efficiency of DHTs. An obvious metric is how quickly the DHT services requests, often called lookup latency. Lookup latency and the number of successful lookups are two metrics with which to evaluate DHT services. However, considering only lookups overlooks the internal processes causing delays and failures.

To see how efficiently an overlay uses network resources, you often measure the ratio between physical network hops and overlay hops, the overlay stretch. A high overlay stretch increases the risk of something going wrong during the lookup and can therefore lead to a decrease in successful lookups.

The price of providing an overlay service can also be evaluated. It mainly consists of the traffic that needs to be sent between nodes regardless of whether requests are served or not.
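As a small worked example, the lookup metrics above could be computed from per-lookup records as follows. The record fields (`ok`, `latency_s`, `physical_hops`, `overlay_hops`) are hypothetical, and the stretch is computed literally as physical hops divided by overlay hops, which is one reading of the definition given here.

```python
from statistics import mean


def lookup_metrics(records):
    """Summarize lookup records; each record is a dict (assumed format).

    Latency and stretch are averaged over successful lookups only, so at
    least one successful lookup is required.
    """
    ok = [r for r in records if r["ok"]]
    return {
        "success_ratio": len(ok) / len(records),
        "mean_latency_s": mean(r["latency_s"] for r in ok),
        "mean_stretch": mean(r["physical_hops"] / r["overlay_hops"] for r in ok),
    }


# Made-up sample data, only to show the expected input shape.
sample = [
    {"ok": True, "latency_s": 0.2, "physical_hops": 12, "overlay_hops": 4},
    {"ok": True, "latency_s": 0.4, "physical_hops": 10, "overlay_hops": 5},
    {"ok": False},
]
```

Management traffic, the cost mentioned in the last paragraph, would need separate accounting, since it is sent whether or not any lookups are served.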


1.3 Evaluation methods

With overlay services becoming more widely deployed, the need to evaluate the performance of such systems has increased. Experimental evaluation tools include simulators, emulators, and testbeds, while theoretical methods involve statistics and formal methods. The theoretical methods are beyond the scope of this thesis, so only the experimental methods will be discussed here.

1.3.1 Simulation

The most used method to evaluate overlay systems is simulation. Simulations can be very useful, as you have complete control of the environment in which you evaluate an application. You often need to implement an application or a protocol specifically for a simulator to evaluate it. That might be good, as you can make simplifications that make the model less complex. Such a simplified implementation might make it possible to simulate a bigger network or longer scenarios. Simulation does not typically run in real time, so it is also feasible to study quite long scenarios in a short time if the complexity is kept low.

When a scenario is designed, there are a multitude of configurable parameters which need to be assigned. Parameters might control network topology, network dynamics, and timers in the evaluated application or protocol implementations. There are numerous tools for creating network topologies according to Internet models, but you still always need to estimate the relevance of the created topologies for your experiments.

The simulation environment makes it possible to quickly change the topologies and even model configurations which are rare in real life. This also means that the evaluation results are directly dependent on the accuracy of the models, e.g. of the network topology and the application behavior. This can make it hard to draw general conclusions outside the assumptions of the models and the parameters used.

Simulators

The most common network simulator within the academic research community is NS-2[9], which is an event-driven packet level simulator. By simulating every packet's path through a network topology, links and queues can be simulated. Such a detailed network model is needed when evaluating transport protocols. However, such high detail in simulations is computationally expensive, which makes it time consuming to evaluate large scale network configurations.

Another approach to simulating big networks is to have a network model that only models delays. It is fairly simple to implement an event driven simulator with such a network model, so many researchers implement their own. Such simulators are often used to verify functionality rather than to evaluate performance. This is because they cannot model bandwidth or packet loss caused by full network queues. The practice of developers and researchers of overlay systems implementing their own simulators unfortunately makes it hard to compare different results, implementations, and algorithms.
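To show why such delay-only simulators are simple to write, here is a minimal event-driven core in Python. It is a generic sketch rather than any particular research simulator; the class name `DelaySimulator` and the millisecond time unit are assumptions.

```python
import heapq


class DelaySimulator:
    """Event-driven simulator whose network model consists only of delays."""

    def __init__(self):
        self.now = 0       # current simulated time in milliseconds
        self._queue = []   # heap of (time, sequence, callback, args)
        self._seq = 0      # tie-breaker that keeps insertion order

    def schedule(self, delay, callback, *args):
        """Run callback(*args) `delay` milliseconds from now."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback, args))
        self._seq += 1

    def send(self, link_delay, receiver, message):
        # A message is just an event delivered after the link delay.
        # Nothing here models bandwidth, queues, or packet loss.
        self.schedule(link_delay, receiver, message)

    def run(self):
        """Process events in time order until none remain."""
        while self._queue:
            self.now, _, callback, args = heapq.heappop(self._queue)
            callback(*args)
```

Node logic is written as callbacks that may call `send` again, so a whole overlay protocol can be exercised without any packet-level detail, with exactly the limitations noted above.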

Our choice to implement Bamboo in NS-2 was based on the need to set link parameters to model wireless access links. If we had only wanted to model high network dynamics caused by mobile users, a simpler simulator could have been used.

1.3.2 Network emulation

In emulation, parts of the real system are combined with models. The modeled part is used for different reasons. It could replace a complicated part which is difficult to provide, such as a large network. It could be used to provide a repeatable environment in place of, for example, a radio network, which otherwise has an unpredictable component. In both examples, parameters for the model could be systematically changed during an experiment. By a network emulator we mean that the actual application is used and that the emulator provides the same interfaces as the real network. Still, a designer of a network emulator needs to design a communication scenario for the emulator.

Emulation has been used a lot within wireless network research due to the ability to model mobility as connectivity changes[27]. For research on systems in wired networks there are hardware network emulators available, as well as publicly available emulation testbeds like Emulab[26]. Emulab supports both wireless and wired experiments as well as mobility.

Recently there has been work that aims at using measured network phenomena in an emulated network, where the properties of the emulated network are affected by measurements from a real network[20].

1.3.3 Real experiments

To evaluate how a networked application would behave in a real deployment, real experiments are the most valuable method, as it is common that an application behaves unexpectedly when it is exposed to real network dynamics.

Real experiments are often hard to perform due to coordination problems, time synchronization, and hardware. Unlike in simulation and some emulation, you do not have a global clock to time your measurements with. There are tools available to support real life experiments that typically help in choreographing node behavior, synchronizing the experiment start, and later gathering logfiles and other data[15].

A fundamental property of real experiments is the varying environment between experiments. It might be that the background noise varies over time when doing wireless measurements, or cross traffic in the Internet when doing overlay experiments. On one hand such variations reflect what a real deployed system would have to cope with, so results gathered under such circumstances are highly relevant. On the other hand, if the variations are high, they can cause results to be hard to compare or reproduce.

In overlay network research, where you typically want to evaluate systems with many nodes spread over the whole Internet, real experiments are costly. Some companies have testbeds that can be used to perform experiments[5], but the most commonly used testbed is PlanetLab[6]. PlanetLab is a cooperation between mainly research institutions that provides a global testbed. The testbed currently consists of 777 machines at 378 physical locations around the world. The users of the testbed can get shell accounts on the machines. Having access to that many machines distributed over the world enables many interesting experiments, but it also creates new problems, for example clock skew, machine crashes, and other experiments competing for resources.

Running experiments on PlanetLab involves problems that you do not have in simulation or emulation. First you need to distribute software and possibly different configuration files to all the participating nodes, and unlike in simulation, this is a time consuming task. When the nodes have the right software installed, the experiment needs to start simultaneously on all nodes, which is hard to achieve on PlanetLab. While the experiment is running, it is useful to be able to monitor how it proceeds, but it is often hard to get a good picture of the experiment by looking at logfiles at different nodes.

Currently the PlanetLab testbed offers no way to control network specifics like intermittent connectivity or limited bandwidth. Being able to control such properties of the nodes in an experiment can be valuable when evaluating overlay systems that should function in other network environments than the fixed Internet.

To evaluate a DHT running on the Internet with low bandwidth nodes participating we designed Dtour. It is a lightweight connectivity emulation library which allows us to emulate weak access links on PlanetLab.


1.4 Summary of papers

This thesis consists of the following papers.

Paper A: A bandwidth study of a DHT in a heterogeneous environment

Olof Rensfelt and Lars-Åke Larzon

Uppsala University Technical report no: 2007-017

This technical report documents the work of implementing a version of the Bamboo DHT in NS-2. It describes how NS-2 was modified to better handle node churn, as well as how the heterogeneous scenario was modeled. It also presents simulation results indicating that mobile phones might actually work as full members of a DHT. The choice to use NS-2 might in retrospect be questioned, as it turned out to be extremely time consuming to reimplement the system. However, the experience about what scenarios to study and how to model them has proved very valuable when doing PlanetLab experiments.

Paper B: Vendetta - A Tool for Flexible Monitoring and Management of Distributed Testbeds

Olof Rensfelt, Lars-Åke Larzon and Sven Westergren

In the Proceedings of TridentCom 2007, May, Orlando

In this paper, the Vendetta monitoring and management tool is described. Vendetta is a tool used both to interactively control experiments and to visualize events that occur in an overlay network. The system consists of two parts: first, a small piece of software, called the client, running on every node in a testbed, and second, a monitor where an experiment can be set up, monitored, and controlled. A main contribution is the framework to handle logfile parsing during experiments, which in combination with the generic event queue allows a great amount of flexibility when controlling experiments.

Paper C: Evaluating a DHT in a heterogeneous environment

Olof Rensfelt, Sven Westergren and Lars-Åke Larzon

Submitted for publication

Using both the NS-2 implementation of a DHT and real experiments on PlanetLab, the impact of weak nodes on a DHT was evaluated. To model heterogeneity on PlanetLab, a lightweight emulation library called Dtour was designed. Dtour decides whether a packet should be forwarded or dropped. It is implemented by intercepting system calls like send() and sendto() and passing the packets through a filter mechanism. The packets are sent through a token bucket filter to limit bandwidth. The results show that there is a good match between simulation results and PlanetLab measurements, and that a DHT like Bamboo can actually cope quite well with bandwidth limited nodes and high churn rates. However, the DHT enters an oscillating state which needs to be addressed. The oscillation does not occur when nodes without bandwidth limitation churn in the same pattern as bandwidth limited nodes in the oscillating experiment. Neither does it occur when bandwidth limited nodes participate in the network without churning, so the combination of churn and bandwidth limitations seems problematic.
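The forward-or-drop decision of a token bucket filter can be illustrated with a minimal sketch. This is not the Dtour code; the class name, the parameters, and the choice of a 1500 byte burst are assumptions made for illustration only. The rate of 8000 bytes/s corresponds to a 64 kb/s link.

```python
class TokenBucket:
    """Illustrative token bucket: tokens accrue at `rate` bytes/s, capped at `burst` bytes."""

    def __init__(self, rate, burst):
        self.rate = float(rate)      # refill rate in bytes per second
        self.burst = float(burst)    # bucket depth in bytes
        self.tokens = float(burst)   # start with a full bucket
        self.last = 0.0              # time of the previous decision

    def allow(self, size, now):
        """Return True if a packet of `size` bytes may be forwarded at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill according to elapsed time, never beyond the bucket depth.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False                 # not enough tokens: the packet is dropped


# Hypothetical weak node limited to 64 kb/s (8000 bytes/s).
bucket = TokenBucket(rate=8000, burst=1500)
decisions = [bucket.allow(1000, t) for t in (0.0, 0.0, 0.1, 0.2)]
```

In this sketch the second packet is dropped because it arrives back to back with the first, while later packets pass once enough tokens have accumulated.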

1.5 Ongoing work

Since PlanetLab allowed us to evaluate long scenarios compared to simulations, we were able to observe strange behavior that did not appear in simulation.

1.5.1 Experimental setup

The experiments running on PlanetLab match the simulations from paper A. The only difference is that there is no extra delay on the access links of weak nodes on PlanetLab. Nodes are either weak or strong, where weak nodes model mobile terminals and strong nodes model nodes with a broadband connection. The weak nodes are bandwidth limited according to measurements from a commercially available 3G service where the downlink was measured to 384 kb/s and the uplink to 64 kb/s[14]. While the strong nodes stay connected to the network for the duration of the experiment, weak nodes join and leave with short intervals. The network size is kept fixed by letting a new node join as soon as a node leaves. Node joins and leaves are modeled by a Poisson process, which gives exponentially distributed connection times with a mean of 3 minutes.
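The churn model can be sketched as follows. The function name and the idea of treating each weak node as a "slot" that is immediately refilled are assumptions for illustration; only the exponential connection times with a 3 minute mean and the fixed network size come from the setup described above.

```python
import random


def churn_schedule(duration_s, mean_session_s, rng):
    """Return (join, leave) times for one weak-node slot.

    A new node joins as soon as the previous one leaves, so the
    network size stays fixed. Session lengths are exponentially
    distributed with mean `mean_session_s`.
    """
    t = 0.0
    sessions = []
    while t < duration_s:
        length = rng.expovariate(1.0 / mean_session_s)
        leave = min(t + length, duration_s)   # cut the last session at experiment end
        sessions.append((t, leave))
        t = leave
    return sessions


rng = random.Random(1)                        # fixed seed for repeatable schedules
sessions = churn_schedule(3600.0, 180.0, rng) # one hour, 3 minute mean
```

Generating one such schedule per weak node before the experiment starts makes a run repeatable, which helps when comparing simulation and testbed results.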

1.5.2 Investigating network instability

In figure 1.6 we present performance over time for the DHT. All experiments with weak nodes show significant variations in mean lookup latency over the time of the experiment. In figure 1.6(a) it is clear that the latency slowly oscillates with a period of about five hours.

Figure 1.6: Performance and cost over time for Bamboo, 30% weak nodes. (a) Mean lookup latency over time, latency (s) against time (h). (b) Success ratio over time (h). (c) Used bandwidth over time, Tx and Rx (kB/s) against time (h).

Because the addition of bandwidth limitations led to the oscillation, we expected to see an increase in dropped packets during the latency peaks. However, when we studied the drop rate we found that it rather decreases, indicating that nodes decrease their sending rates. From figure 1.6(c), we can also see that the mean bandwidth used is below 4 kB/s (32 kb/s) for combined received and sent data, which is about half the upstream limit of 64 kb/s.

The impact of churn on a DHT can be substantial [18]. It does not only cause failed lookups due to nodes leaving while forwarding a lookup, but it also causes routing tables to be non optimal. Non optimal routing tables will cause higher lookup latencies in the DHT. Churn can also create an increase in management traffic when newly joined nodes need to synchronize neighbor information and stored data. Such traffic might congest links between nodes.

To investigate the impact of churn, we ran experiments without any bandwidth limitation, but let 30% of the nodes churn like weak nodes. Except for the lack of bandwidth limitation, the setup was identical to previous experiments. In the measurements from this experiment, we observed some variations in latency in the first few hours, but the latency then stabilized.

The results from the experiment indicate that the churn is not solely to blame. We also ran an experiment with 30% weak nodes that did not churn and obtained similar results. This experiment performed at a stable level throughout the entire experiment, without any significant variations in latency or success ratio. This result also reduces the risk that a programming error in for example Dtour is causing the oscillations.

Since neither churn nor bandwidth limitation on its own caused the instability, we are led to believe that the cause must be the combination of the two. Our current hypothesis is that the congestion mechanism implemented on top of UDP is causing the oscillation. The main reason for this suspicion is the decrease in total traffic sent and received during the latency peaks seen in figure 1.6(c). We find it interesting that added dynamics with a 3 minute mean interval can cause dynamics on time scales of five hours or more. Because the congestion mechanism reacts to dropped packets, we would like to find out which packets are dropped, and when, during the experiment. Unfortunately it seems hard to make Dtour aware of which packets are dropped, because what is sent between the Java machines is not a well defined protocol but serialized objects. We do not currently know how to solve that problem.

1.6 Conclusions and future work

As we have worked on evaluating a DHT in heterogeneous networks we have found a need to improve the available tools, since both simulation and testbeds have limitations. We have extended existing tools during our work, both by modifying NS-2 and by implementing an emulation library which can be used on PlanetLab.

Our results from both simulation and PlanetLab experiments indicate that a DHT could work with a high percentage of bandwidth limited nodes, even if the time they are attached to the network is short. The performance obviously suffers, but the system is able to satisfy requests even under extreme conditions.

On a longer time scale there are two main directions that seem interesting to pursue. First it would be very interesting to see if Dtour and Vendetta could be used to evaluate other networks than overlay networks. It seems likely that Vendetta could also be very useful for managing sensor network testbeds. A tool like Dtour might also be useful when evaluating DTN solutions. Extending Dtour with functionality to delay traffic would also enable other interesting uses.


Bibliography

[1] BitTorrent client. Online: http://azureus.sourceforge.net/, 2001.

[2] Hari Balakrishnan, Scott Shenker, and Michael Walfish. Peering Peer-to-Peer Providers. In 4th International Workshop on Peer-to-Peer Systems (IPTPS ’05), Ithaca, NY, February 2005.

[3] Sujata Banerjee, Sujoy Basu, Shishir Garg, Sukesh Garg, Sung-Ju Lee, Pramila Mullan, and Puneet Sharma. Scalable grid service discovery based on UDDI. In MGC ’05: Proceedings of the 3rd international workshop on Middleware for grid computing, pages 1–6, 2005.

[4] Fredrik Bjurefors, Lars Åke Larzon, and Richard Gold. Performance of Pastry in a heterogeneous system. In Proceedings of the fourth IEEE International Conference on Peer-to-Peer Computing, 2004.

[5] Yatin Chawathe, Sriram Ramabhadran, Sylvia Ratnasamy, Anthony LaMarca, Scott Shenker, and Joseph Hellerstein. A case study in building layered DHT applications. In SIGCOMM ’05: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, pages 97–108, New York, NY, USA, 2005.

[6] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. PlanetLab: An Overlay Testbed for Broad-Coverage Services. ACM SIGCOMM Computer Communication Review, 33(3):00–00, July 2003.

[7] Donald Eastlake 3rd and Paul E. Jones. US Secure Hash Algorithm 1 (SHA1). RFC 3174 (Informational), September 2001.

[8] Open Source Community. Fasttrack. Online: http://www.fasttrack.nu/, 2001.

[9] Sally Floyd and Steve McCanne. ns network simulator. Online: http://www.isi.edu/nsnam/ns, 2003.


[10] Michael J. Freedman, Karthik Lakshminarayanan, Sean Rhea, and Ion Stoica. Non-transitive connectivity and DHTs. In Proc. 2nd Workshop on Real, Large, Distributed Systems (WORLDS 05), San Francisco, CA, December 2005.

[11] Wireless Network Topology Emulator. Online: http://sourceforge.net/projects/wnte/, 2001.

[12] Saikat Guha, Neil Daswani, and Ravi Jain. An experimental study of the skype peer-to-peer voip system, 2006.

[13] David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, and Rina Panigrahy. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In ACM Symposium on Theory of Computing, pages 654–663, May 1997.

[14] Daniel Lanner. Comparison of tcp-performance in wireless 3g- and ad hoc-networks. Master’s thesis, Uppsala University, 2006.

[15] Erik Nordstrom, Per Gunningberg, and Henrik Lundgren. A testbed and methodology for experimental evaluation of wireless mobile ad hoc networks. In TRIDENTCOM ’05: Proceedings of the First International Conference on Testbeds and Research Infrastructures for the DEvelopment of NeTworks and COMmunities (TRIDENTCOM’05), pages 100–109, 2005.

[16] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In Proceedings of the 2001 ACM SIGCOMM conference on applications, technologies, architectures, and protocols for computer communications, pages 161–172, 2001.

[17] Olof Rensfelt and Lars Åke Larzon. A bandwidth study of a DHT in a heterogeneous environment. Technical Report 2007-017, Uppsala University, May 2007.

[18] Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz. Handling churn in a DHT. In Proceedings of the 2004 USENIX Annual Technical Conference (USENIX ’04), Boston, Massachusetts, June 2004.

[19] Sean Rhea, Brighten Godfrey, Brad Karp, John Kubiatowicz, Sylvia Ratnasamy, Scott Shenker, Ion Stoica, and Harlan Yu. OpenDHT: a public DHT service and its uses. SIGCOMM Comput. Commun. Rev., 35(4):73–84, 2005.


[20] Robert Ricci, Jonathon Duerig, Pramod Sanaga, Daniel Gebhardt, Mike Hibler, Kevin Atkinson, Junxing Zhang, Sneha Kasera, and Jay Lepreau. The Flexlab approach to realistic evaluation of networked systems. In Proc. of the Fourth Symposium on Networked Systems Design and Implementation (NSDI 2007), Cambridge, MA, April 2007.

[21] Antony Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. Lecture Notes in Computer Science, 2218, 2001.

[22] Sean Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz. Handling churn in a DHT. Technical Report UCB/CSD-03-1299, University of California, Berkeley, December 2003.

[23] Ion Stoica, Daniel Adkins, Shelley Zhuang, Scott Shenker, and Sonesh Surana. Internet indirection infrastructure, 2002.

[24] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 ACM SIGCOMM conference on applications, technologies, architectures, and protocols for computer communications, pages 149–160, 2001.

[25] Sven Westergren. Notenet - range queries in a DHT. Master’s thesis, Uppsala University, 2007.

[26] Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, Shashi Guruprasad, Mac Newbold, Mike Hibler, Chad Barb, and Abhijeet Joglekar. An integrated experimental environment for distributed systems and networks. In Proc. of the Fifth Symposium on Operating Systems Design and Implementation, pages 255–270, Boston, MA, December 2002.

[27] Open Source Community. Gnutella. Online: http://www.gnutella.com, 2001.

[28] Ben Y. Zhao, John D. Kubiatowicz, and Anthony D. Joseph. Tapestry: An infrastructure for fault-tolerant wide-area location and routing. Technical Report UCB/CSD-01-1141, UC Berkeley, April 2001.

[29] Ming Zhong and Kai Shen. Random walk based node sampling in self-organizing networks. SIGOPS Oper. Syst. Rev., 40(3):49–55, 2006.


Chapter 2

Paper A: A bandwidth study of a DHT in a heterogeneous environment


Abstract

We present an NS-2 implementation of a distributed hash table (DHT) modeled after Bamboo. NS-2 is used to evaluate the bandwidth costs involved in using a DHT in heterogeneous environments. Networks are modeled as mixed networks of desktop machines and 3G cellphones. We also document the modifications of NS-2 that were needed to simulate churn in large networks.


2.1 Introduction

In the design of distributed applications, there has been a strong trend during the last decade to use the Internet mainly for connectivity and build an overlay network with its own node identifier space on top of IP. This effectively deals with problems that could otherwise occur due to dynamics in IP address allocations. By not using the IP address as an identifier for the service itself, the service can continue to function as long as Internet connectivity is maintained even if IP addresses change over time.

A common approach to introduce a new identifier space is to use distributed hash tables (DHTs). A DHT is a distributed data structure that functions much like an ordinary hash table, except that the key space is distributed over several nodes rather than kept together at one single node. When querying a value in a DHT, the query is routed to the node that maintains the corresponding part of the key space. Nodes continually exchange data to keep track of how the responsibility is divided among them. Most DHTs include some degree of replication to deal with nodes that may disappear without prior notice.

Most existing evaluations of DHTs are done using simulators written for that specific purpose to enable proper simulation of the data structure itself with a focus on how queries are carried out. Modeling of the communication between nodes that collaborate in a DHT tends to be simplistic at best; sometimes it is assumed that all messages sent are instantaneously received by the receiver. While this assumption can be argued to be a reasonable simplification in an environment with fast computers communicating over a fixed Internet connection with high bandwidth, it does not hold for more heterogeneous network environments.

In this report, we document the NS-2 implementation of a Bamboo-like DHT [11] and present simulation results on how it behaves under such conditions.

The main contribution is that the implementation allows a more detailed networking model, compared to simpler simulators.

2.2 Overlays

A recent trend is to use overlays to deploy functionality that the existing network infrastructure does not provide. An overlay network is a logical network built on top of the existing network with its own addressing scheme, using the Internet as a link layer. Building an overlay network makes it possible to deploy functionality not available in the current Internet architecture, or to create services that are provided by the users of the service. One example of a user provided service is BitTorrent, where a group of users cooperate to provide efficient mass-distribution of large files. In some sense the users pay for the service by participating in the overlay. The common use of overlays is as the communication module of an application, sometimes referred to as BYOI (Bring Your Own Infrastructure). BYOI has proved very successful in file-sharing applications and in some sense in VoIP with Skype. Parts of the research community are however suggesting that overlays be offered by companies as services for multiple applications to share [1]. The benefit would be that the cost of management overhead can be shared among more participants. Also, such a service could be assumed to be more stable than a service where only the users collaborate to provide the overlay. Another benefit, or drawback, depending on conviction, is that a payment system needs to be added to the system because users no longer pay for the service by participating in it.

2.3 Distributed Hash Tables

A common service provided by overlay networks is a lookup service handling flat identifiers with an ordinary query-response semantic. Such a service is often implemented using DHTs (Distributed Hash Tables) [9, 16, 13, 18, 11]. A DHT allows you to insert values connected to keys, much like ordinary hash tables. A key is typically a hash of the value stored or alternatively a hash of some meta data of the value. When a key and its value are inserted, they are routed through the overlay network until they reach the node that is responsible for storing the key. The key can later be used to retrieve the value from the DHT.
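The insert and retrieve semantics can be illustrated with a toy sketch. It is not Bamboo: there is no routing or replication, the responsible node is found with global knowledge, and all names (`ToyDHT`, `key_of`, `ring_distance`) are hypothetical. We assume 160 bit SHA-1 keys and that the node numerically closest to a key on the circular key space is responsible for it.

```python
import hashlib

BITS = 160
SPACE = 1 << BITS                     # size of the circular key space


def key_of(data: bytes) -> int:
    """Key = SHA-1 hash of the data, interpreted as an integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two identifiers on the ring."""
    d = abs(a - b)
    return min(d, SPACE - d)


class ToyDHT:
    def __init__(self, node_names):
        # Each node gets an identifier in the key space and a local store.
        self.store = {key_of(name.encode()): {} for name in node_names}

    def responsible(self, key: int) -> int:
        """Identifier of the node numerically closest to `key`."""
        return min(self.store, key=lambda nid: ring_distance(nid, key))

    def put(self, value: bytes) -> int:
        key = key_of(value)
        self.store[self.responsible(key)][key] = value
        return key

    def get(self, key: int):
        return self.store[self.responsible(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
k = dht.put(b"hello")
value = dht.get(k)                    # looked up at the same responsible node
```

In a real DHT no node has the global view used by `responsible`; the overlay routing described in the following sections replaces it.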

The flat address structure often used in overlays, and especially DHTs, is appealing for cases when you want addressing differentiated from your physical location in the network. Such a differentiation can for instance be a building block in systems supporting mobile nodes [15] where identifiers should remain the same regardless of the location of the node.

Despite the flat address space structure on the DHT level, it is still possible to add some form of hierarchy in the application. For example, in [17] we embed the geographic location of information in the key itself. Others have also built hierarchy on top of a DHT [3].

2.3.1 Bamboo

Bamboo is a DHT implementation first presented in [11]. It is referred to as a third generation DHT, where lessons learned from previous systems have been incorporated in the design. The Bamboo implementation has proved stable when used in OpenDHT [12], where it serves a system with good uptime.

Figure 2.1: The routing table. The white nodes are the middle white node’s leafset if the leafset size is configured according to l=3. The dotted arcs show the routing table entries.

To continue the earlier studies[2] of how an overlay network behaves in a heterogeneous environment, we chose to implement a DHT in NS-2[5]. We believe that many of the problems seen in [2] are addressed by Bamboo. For example, the problems encountered with Pastry in heterogeneous networks were mainly caused by management traffic congesting nodes, and a new approach to management traffic was presented in [11].

Network structure

Bamboo uses the routing logic of Pastry but has more developed mechanisms for maintaining the network structure in a dynamic environment. A big part of network dynamics is that nodes leave and new nodes join the network, which is called churn. Bamboo maintains two sets of neighbor information in each node (figure 2.1). The leafset consists of the successors and predecessors that are numerically closest in key space. When routing a query, it is forwarded to a node which has the key in its leafset. Using the leafset is enough to ensure correct lookups. However, if only the leafset were used when doing lookups, the number of hops would grow linearly with the number of nodes n. To improve the lookup complexity to the order of log(n), a routing table is used. The routing table is populated with nodes that share a common prefix, and routing table lookups are ordinary longest prefix matching.
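The interplay between leafset and routing table can be sketched as a next hop decision. This is a simplified illustration and not the Bamboo or Bamboo-NS2 code: identifiers are short hexadecimal strings of equal length, wraparound of the key space is ignored, and the function names and the `(row, digit)` layout of the routing table are assumptions.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two identifiers have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def numeric_distance(a: str, b: str) -> int:
    return abs(int(a, 16) - int(b, 16))


def next_hop(self_id, key, leafset, routing_table):
    """Choose the next hop for `key`.

    leafset: identifiers numerically close to self_id.
    routing_table: dict mapping (row, digit) to a node identifier, where
    row is the length of the prefix that node shares with self_id.
    """
    candidates = leafset + [self_id]
    # Equal-length lowercase hex strings compare like the numbers they encode.
    if min(candidates) <= key <= max(candidates):
        # Key is covered by the leafset: deliver to the closest node.
        return min(candidates, key=lambda n: numeric_distance(n, key))
    row = shared_prefix_len(self_id, key)
    entry = routing_table.get((row, key[row]))
    if entry is not None:
        return entry                  # one more matching digit than self_id
    # No matching entry: fall back to the closest node we know of.
    known = candidates + list(routing_table.values())
    return min(known, key=lambda n: numeric_distance(n, key))


leaf = ["a0f0", "a180"]
table = {(0, "3"): "3b00", (1, "7"): "a7c2"}
hop = next_hop("a100", "a7ff", leaf, table)
```

Each routing table hop extends the matched prefix by at least one digit, which is what gives the logarithmic number of hops.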

The major difference between Pastry and Bamboo is how they handle management traffic. In Pastry, management is initiated when a network change is detected, while in Bamboo all management is periodic regardless of network status. The approach of using periodic updates has been shown to be beneficial during churn [11] since it does not cause management traffic bursts during congestion. Such traffic bursts can further increase network disturbances.

The Bamboo system has been evaluated both in simulation and as a deployed system on PlanetLab[4]. However, the evaluations have not taken bandwidth or other node specifics into account, only network delay. This is not a major problem if you want to evaluate scalability and lookup delays in non-congested networks. The nodes in PlanetLab are typically very strong machines on academic or other types of very stable, high bandwidth networks, and therefore they are not suited for studying the scenario we are investigating.

2.3.2 Management traffic

In order for a DHT to be able to serve requests and maintain a consistent network view among its nodes, it needs to perform network maintenance. This maintenance consists of network messages sent between nodes. In this section we will describe the different types of maintenance performed by Bamboo. Periodic management traffic occurs in all layers of the Bamboo system (figure 2.2). In the data transfer layer, ping messages are used to measure RTTs (Round Trip Times) to peers. Routing table and leafset information are exchanged and databases are synchronized. We have used [11, 10] as design documents as well as the Java source code from [6].

Neighbor ping

The most basic management traffic type is to make sure that you can still reach your one-hop neighbors in the overlay. This is normally done with an echo/reply type of communication. In Pastry these messages are called probes, and other systems have the same function with different names. The messages sent are not ICMP pings but UDP echo and reply packets. The major design decisions regarding neighbor pings are the interval which is used to ping and the number of unanswered pings that should cause a node to treat a neighbor as unreachable or, as in Bamboo, as possibly down. In Bamboo the neighbor pings are also used to maintain an RTT estimate used for retransmission timeout calculations.
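How ping replies can feed a retransmission timeout may be sketched as below. The text does not state which estimator Bamboo uses, so this sketch assumes the smoothed estimator known from TCP (RFC 6298) with gains 1/8 and 1/4; the class name and the threshold of three missed pings are likewise assumptions.

```python
class NeighborState:
    """Per-neighbor ping bookkeeping (illustrative, TCP-style estimator assumed)."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation

    def __init__(self, max_missed=3):
        self.srtt = None            # smoothed round trip time, seconds
        self.rttvar = None          # round trip time variation, seconds
        self.missed = 0             # consecutive unanswered pings
        self.max_missed = max_missed

    def on_reply(self, rtt):
        """Update the estimate with a measured round trip time."""
        self.missed = 0
        if self.srtt is None:
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    def on_timeout(self):
        """Record a ping that was not answered in time."""
        self.missed += 1

    @property
    def possibly_down(self):
        return self.missed >= self.max_missed

    def rto(self):
        """Retransmission timeout derived from the current estimate."""
        return self.srtt + 4 * self.rttvar
```

The two tunable quantities in the sketch, the ping interval (left to the caller) and `max_missed`, correspond to the design decisions named above.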


The reason why UDP is the preferred transport protocol in Bamboo is that the benefits of reliable transfer do not justify the overhead of connection oriented communication. A DHT also has a non-symmetric nature regarding neighbor knowledge between nodes, meaning that the fact that node A has node B in its neighbor set does not necessarily mean that node B’s neighbor set includes node A. Because of this asymmetry the number of nodes that know a certain node will increase with the network size. If TCP is used as the transport protocol, the state that a node needs to keep increases significantly as TCP needs both the receiving and the sending nodes to keep state information. A DHT could benefit from using a transport protocol with properties like DCCP [7] as mentioned in [11]. DCCP offers a UDP-like, non-reliable datagram transfer with congestion control.

Leafset updates

Changes in node leafsets are propagated using an epidemic approach. Every node periodically chooses a random node from its leafset and performs a leafset push, followed by a leafset pull in response. Both messages involve sending the complete leafset to the synchronizing node, where the information is incorporated. It is important to both push and pull leafsets. Otherwise situations might arise where nodes are missing from the leafsets of their neighbors [10].
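One push and pull round can be sketched as follows. The sketch uses small integer identifiers, ignores wraparound of the key space, and keeps `l` nodes on each side; the function names and the dictionary representation are assumptions for illustration.

```python
import random


def merge_leafset(own_id, leafset, received, l):
    """Keep the l closest predecessors and l closest successors of own_id."""
    known = (set(leafset) | set(received)) - {own_id}
    preds = sorted((n for n in known if n < own_id), reverse=True)[:l]
    succs = sorted(n for n in known if n > own_id)[:l]
    return sorted(preds + succs)


def push_pull(nodes, a, rng, l=2):
    """Node `a` pushes its leafset to a random leafset member, which replies with its own.

    nodes: dict mapping node identifier to its current leafset (list).
    Returns the identifier of the chosen partner.
    """
    b = rng.choice(nodes[a])
    pushed = nodes[a] + [a]          # push: a's complete leafset plus a itself
    pulled = nodes[b] + [b]          # pull: b's complete leafset plus b itself
    nodes[b] = merge_leafset(b, nodes[b], pushed, l)
    nodes[a] = merge_leafset(a, nodes[a], pulled, l)
    return b


# Node 10 only knows 20; after one round it also learns about 30.
nodes = {10: [20], 20: [10, 30], 30: [20]}
partner = push_pull(nodes, 10, random.Random(0))
```

Repeating such rounds at every node spreads membership changes through the leafsets without any node having to detect the change first.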

Local routing table updates

When a node has another node in its routing table, those two nodes by definition share one level. The local routing table updates are used to exchange the node information in that level. If a node gets information about other nodes that fit into the routing table, it probes the nodes to test reachability and to get an RTT estimate. If a node is reachable and fits into an empty field in the routing table, it gets added. If the matching routing table entry is occupied, the node with the lowest latency is chosen. Other optimization schemes could be considered, such as optimizing for uptime, but optimizing for latency is the most common approach used. Having an optimized routing table does not influence lookup correctness, only lookup latency.
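The replacement rule for a routing table entry can be written down in a few lines. The function name, the `(row, digit)` slot key, and the tuple stored per slot are hypothetical; the rule itself (fill empty slots, otherwise keep the lower latency node) follows the description above.

```python
def consider_candidate(routing_table, slot, candidate, rtt, reachable=True):
    """Offer a probed node for a routing table slot.

    routing_table: dict mapping slot to (node_id, rtt).
    Returns True if the candidate was installed.
    """
    if not reachable:
        return False                              # probe failed: ignore the node
    current = routing_table.get(slot)
    if current is None or rtt < current[1]:
        routing_table[slot] = (candidate, rtt)    # empty slot or lower latency
        return True
    return False


table = {}
consider_candidate(table, (1, "7"), "a7c2", 0.080)   # fills the empty slot
consider_candidate(table, (1, "7"), "a700", 0.120)   # slower: rejected
consider_candidate(table, (1, "7"), "a7ff", 0.030)   # faster: replaces a7c2
```

Optimizing for another metric, such as uptime, would only change the comparison in the `if` statement.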

Global routing table updates

Local routing table updates can only improve routing table levels that are not empty. To fill empty levels, you need to exchange routing table information with nodes that you do not yet know of. To find such nodes the routing functionality of Bamboo is used. To optimize a certain routing table entry, a lookup is made for a key which shares a prefix with that entry. If a suitable node exists in the network the request will be routed to it, and that node is a candidate for the routing entry. Unlike local updates, global updates can be used to optimize a specific routing table entry.
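Constructing the lookup key for a given routing table entry can be sketched as below, assuming hexadecimal identifiers of fixed length. The function name and the random choice of the remaining digits are assumptions; the text only requires that the key shares its prefix with the entry.

```python
import random

HEX_DIGITS = "0123456789abcdef"


def probe_key(self_id, row, digit, rng):
    """Build a key that would belong in routing table slot (row, digit).

    The key shares the first `row` digits with self_id, has `digit`
    at position `row`, and random digits after that.
    """
    tail_len = len(self_id) - row - 1
    tail = "".join(rng.choice(HEX_DIGITS) for _ in range(tail_len))
    return self_id[:row] + digit + tail


rng = random.Random(0)
key = probe_key("a100", 1, "7", rng)   # some key of the form "a7??"
```

A lookup for such a key ends at a node whose identifier starts with the wanted prefix, if one exists, and that node can then be offered as a candidate for the slot.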

Data storage updates

When data is stored in the DHT using the PUT command, the data is routed through the DHT to the node primarily responsible for storing the data. When the responsible node gets the data, it caches it within its leafset at ’desired replicas’ neighbors in each direction. The caching does not occur immediately, but is performed by the periodic replication functionality described below. The value ’desired replicas’ is a configuration parameter, and with the default settings there are 7 copies of the data within the system. When nodes disappear or join, the subset of nodes that should store a certain value changes. Therefore there is a need for a mechanism that tries to restore the distributed storage to the wanted state. The default setting of ’desired replicas’, and the resulting 7 copies of each data unit within the system, increases the demand for storage space. If all nodes have equal amounts of keys to store, every node needs to store seven times that amount.

The first maintenance operation is that a node periodically picks a random node in its leafset and synchronizes the stored keys with it. A synchronization operation starts with the initiating node picking a node to synchronize with and requesting a synchronization. The other node calculates the set among its stored keys that it believes should also be stored at the initiating node, and sends those keys and the hash values of the data. The initiating node receives the keys and hash values, and matches them to what it has stored. If a certain data unit received is not already stored, it requests that data unit from the other node.
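One synchronization round can be sketched with two dictionaries standing in for the local stores. The function names and the `wanted_by_initiator` predicate, which models the other node's belief about what the initiating node should store, are assumptions; SHA-1 is used for the hash values only as an example.

```python
import hashlib


def digest(value: bytes) -> str:
    return hashlib.sha1(value).hexdigest()


def synchronize(initiator, other, wanted_by_initiator):
    """Run one synchronization round and return the keys that were fetched.

    initiator, other: dicts mapping key to stored bytes.
    wanted_by_initiator: predicate on a key, evaluated by the other node.
    """
    # The other node offers keys and hash values, not the data itself.
    offer = {k: digest(v) for k, v in other.items() if wanted_by_initiator(k)}
    # The initiator compares the offer with its own store.
    missing = [k for k, h in offer.items()
               if k not in initiator or digest(initiator[k]) != h]
    # Only the missing data units are requested and transferred.
    for k in missing:
        initiator[k] = other[k]
    return missing


a = {1: b"x"}
b = {1: b"x", 2: b"y", 9: b"z"}
fetched = synchronize(a, b, lambda k: k < 5)   # key 9 is outside a's range
```

Exchanging hash values first keeps the periodic traffic small when the two stores already agree.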

The second maintenance operation performed by the data storage layer is to move values that are no longer within a node’s storage range. If a node has such a value stored, it performs a new PUT to the place where it should be stored before deleting it.

2.4 Implementation

We have implemented a DHT in NS-2 [5] and, in what we believe to be the relevant properties, made it as similar to Bamboo as we could. However, since we did not run the Java code in simulation, differences might exist that we are not fully aware of. We will state the known differences when describing the different parts of the system. During the implementation work we have used the technical report [14] as a reference as well as the source code, and later the doctoral thesis[10] when it became available. In the following text we will refer to our implementation as Bamboo-NS2, and the original implementation as Bamboo.

Figure 2.2: Block diagram of the Bamboo-NS2 implementation

The NS-2 implementation consists of multiple modules that are constructed to fit the design of NS-2, rather than the design of Bamboo (figure 2.2). There are however many similarities between which modules Bamboo and Bamboo-NS2 are divided into.

2.4.1 NS-2 specifics

To be able to simulate big networks, we needed to make some simplifications. One simulation specific possibility is to build the overlay network before the actual simulation starts. We will refer to this as building the network offline. In section 2.5 we will further discuss how this influences the evaluation.

As previously mentioned, to save memory we have not implemented storage of real data, and instead of a faked hash value we use a globally unique id for every data item that exists in the DHT. Since we have control of all data that is inserted into the DHT, we believe this to be a valid approach.

When we started to simulate churn, we ran into problems with memory leaks when trying to free NS-2 objects. This led us to reuse the same NS-agents for multiple overlay nodes. First we tried to have multiple NS-nodes for each overlay node, so that when an overlay node went down and a ’new’ overlay node came up, it came up on a different NS-node. The reason that we did not simply use the same NS-node for the new overlay node is that node information about the old node is still in the system. This would cause a new node to receive traffic meant for an old node, which would take up link bandwidth. We call this kind of traffic ’stale traffic’. We did not want to filter out traffic to no longer active nodes at the sending node, because in a real life deployment there is no way of knowing whether a node is active or not. The approach with multiple NS-nodes meant that we needed to simulate much bigger networks, since many more physical nodes than overlay nodes were needed. Even when we used three physical nodes per overlay node, stale traffic still turned up at newly joined nodes. Therefore we needed to find another method of getting rid of stale traffic.

The second method involved giving every overlay node another globally unique id (GID), apart from its overlay address, and introducing a directly indexed lookup table with connection status. We then modified the NS-2 routing function to compare the next-hop IP from the routing logic with the end destination IP of the packet. If they are equal, it makes a status lookup to see whether the destination overlay node is active. If the node is not active, the packet is logged as stale traffic and then dropped, and therefore does not stress the last-hop link of a new node.
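The check described above can be illustrated with a minimal Python sketch. It is not the Bamboo-NS2 code (which lives inside the NS-2 routing function); the names Packet, StaleFilter, register, set_active and forward are hypothetical, and we assume that each packet carries the GID of the overlay node it is meant for.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dst_ip: str   # end destination IP of the packet
    dst_gid: int  # assumed: GID of the overlay node the packet is meant for


@dataclass
class StaleFilter:
    active: list = field(default_factory=list)     # directly indexed by GID
    stale_log: list = field(default_factory=list)  # packets logged as stale

    def register(self) -> int:
        """Allocate the next GID and mark the overlay node as active."""
        self.active.append(True)
        return len(self.active) - 1

    def set_active(self, gid: int, up: bool) -> None:
        self.active[gid] = up

    def forward(self, packet: Packet, next_hop_ip: str) -> bool:
        """Return True if the packet may be sent on to next_hop_ip."""
        # Only the last hop is checked: next hop equals the end destination.
        if next_hop_ip == packet.dst_ip and not self.active[packet.dst_gid]:
            self.stale_log.append(packet)  # log first, then drop
            return False
        return True
```

In this sketch a packet for a departed overlay node still travels through the network and is only dropped before the last-hop link, which mirrors the decision not to filter at the sender.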

2.4.2 Packet handler

The packet handler at a Bamboo-NS2 node consists of a list of known neighbors. Bamboo implements reliable transfer on top of UDP, using acknowledgments which are also used for RTT measurements. If traffic is not flowing between nodes, periodic probes are sent to keep the estimated RTT accurate.

In Bamboo-NS2 we use the NS-2 Agent class, which we connect between nodes. The closest counterpart to an agent in Bamboo is a UDP socket. To keep the memory usage low, we connect agents dynamically when needed. We encountered problems when we tried to free memory once the agents were no longer needed. A workaround was to implement an agent pool, from which we could request agents in order to reuse them. An agent pair is only used to send data one way, because there were implementation benefits from having all traffic to a node go through one agent. We call the sender-side agents bamboo send agent, because they are of a different class compared to the receiving-side type described in 2.4.4.
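The agent pool idea can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the NS-2 C++ implementation: SendAgent and AgentPool are hypothetical names, and an agent is reduced to an object that remembers which peer it is currently connected to.

```python
class SendAgent:
    """Stand-in for a sender-side agent; only tracks its current peer."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.peer = None


class AgentPool:
    """Hands out agents and takes them back instead of freeing them."""

    def __init__(self):
        self._free = []    # agents that are currently not connected
        self.created = 0   # number of agents ever allocated

    def acquire(self, peer):
        """Return an agent connected to peer, reusing a free one if possible."""
        if self._free:
            agent = self._free.pop()
        else:
            agent = SendAgent(self.created)
            self.created += 1
        agent.peer = peer
        return agent

    def release(self, agent):
        """Disconnect the agent and keep it for later reuse."""
        agent.peer = None
        self._free.append(agent)
```

Because released agents are never freed, the number of allocated agents is bounded by the peak number of simultaneous connections rather than by the total number of connections made during a simulation.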

We did not use cumulative acknowledgments since we did not want to keep state at the receiver for every node that communicates with us. We do however need to keep a bamboo send agent for each node we communicate with, so the benefit of not using cumulative acknowledgments is limited. In a real deployment, the approach would be more beneficial.

2.4.3 Router

The Bamboo-NS2 router consists of three modules: the routing table, the leafset, and the routing logic. The routing table consists of information about nodes spread over the key space, as well as functions to maintain and look up node information. When we use the term node information, we refer to a structure which, apart from a key value, also holds information about the network connection point of the node.

The leafset consists of ordered node information about the numerically closest nodes in key space, which are the white nodes in figure 2.1, and functions to insert and remove nodes from the list. As previously mentioned, the routing table works like in Pastry. The routing table and leafset are used by the routing logic to look up the next hop node when a key is looked up.

When a routing request for a key is made to the routing logic, it first checks whether that key falls within the leafset. If the key is within the leafset, the numerically closest node is found, and that node's information is returned as the next hop. If the looked-up key is not within the leafset, a request to the routing table is made, which returns the closest node outside the leafset. If no such node exists, the next hop is whichever of the two leafset nodes furthest away is numerically closest to the key, and the routing logic returns that node's information.
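The three cases of the routing logic can be summarized in a short Python sketch. It is an illustration under our own assumptions rather than the Bamboo-NS2 code: the key space size, the NodeInfo structure, and the function names are hypothetical, the leafset is passed as two lists ordered from nearest to furthest, and the routing table is represented by a callable that returns a node or None. The sketch also assumes that the leafset does not wrap around the whole ring.

```python
from collections import namedtuple

KEY_SPACE = 2 ** 16  # assumed size of the circular key space

# Node information: a key plus the network connection point of the node.
NodeInfo = namedtuple("NodeInfo", ["key", "address"])


def clockwise(a, b):
    """Distance from key a to key b when walking clockwise on the ring."""
    return (b - a) % KEY_SPACE


def ring_distance(a, b):
    """Shortest distance between two keys on the ring."""
    return min(clockwise(a, b), clockwise(b, a))


def next_hop(target, own, preds, succs, table_lookup):
    """Pick the next hop for target.

    preds and succs are the leafset halves, ordered nearest first;
    table_lookup(target) returns a NodeInfo outside the leafset or None.
    """
    if not preds and not succs:
        return own  # no known neighbors: this node is responsible
    leftmost = preds[-1] if preds else own
    rightmost = succs[-1] if succs else own

    def closest(nodes):
        return min(nodes, key=lambda n: ring_distance(n.key, target))

    # Case 1: the key falls within the arc covered by the leafset.
    if clockwise(leftmost.key, target) <= clockwise(leftmost.key, rightmost.key):
        return closest(preds + [own] + succs)
    # Case 2: ask the routing table for a node outside the leafset.
    candidate = table_lookup(target)
    if candidate is not None:
        return candidate
    # Case 3: fall back to the closer of the two outermost leafset nodes.
    return closest([leftmost, rightmost])
```

Note that the result can be the node itself, which corresponds to the case where the local node is responsible for the key.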

2.4.4 Agent

The Bamboo-NS2 Agent is both the listening agent in NS-2 and the interface to the TCL scripts used to run simulations. It is the connection details of the listening agent that are spread through the network for other nodes to connect to.

From the TCL script that defines the simulation, the behavior of the Bamboo-NS2 node can be controlled. You can set the word and key length, make PUTs and GETs, connect and disconnect, etc. All incoming traffic to a node enters through the listening agent's recv() function. If a new packet is an acknowledgment, the packet handler is called to remove the acknowledged packet from its buffer, as well as to calculate an RTT estimate.

If the packet is not an acknowledgment, the packet handler acknowledges the packet and checks whether it is a new packet or not. If it is an old packet or a PING, the only action taken is the acknowledgment. If it is a new packet, it is sent to the router to calculate the next hop and generate a new packet to send. If the next hop returned by the router is not null and not the node itself, the agent sends the new packet to the next hop node with the help of the packet handler module.
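The receive path described above can be sketched in Python as follows. This is a simplified illustration, not the NS-2 agent itself: the Packet fields, the PacketHandler and Agent classes, and the explicit now argument are our assumptions, the router is reduced to a callable, and the RTT estimate is represented by the latest raw sample only, since the exact estimator is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    seq: int
    kind: str     # "ACK", "PING" or "DATA" (assumed packet types)
    key: int = 0


class PacketHandler:
    def __init__(self):
        self.unacked = {}      # (dst, seq) -> send time of buffered packets
        self.seen = set()      # (src, seq) of packets already received
        self.rtt_samples = {}  # peer -> latest RTT sample
        self.sent = []         # (dst, packet) log standing in for the network

    def send(self, dst, pkt, now):
        if pkt.kind != "ACK":  # acknowledgments are not themselves buffered
            self.unacked[(dst, pkt.seq)] = now
        self.sent.append((dst, pkt))

    def on_ack(self, ack, now):
        """Remove the acknowledged packet and record an RTT sample."""
        sent_at = self.unacked.pop((ack.src, ack.seq), None)
        if sent_at is not None:
            self.rtt_samples[ack.src] = now - sent_at

    def is_new(self, pkt):
        ident = (pkt.src, pkt.seq)
        if ident in self.seen:
            return False
        self.seen.add(ident)
        return True


class Agent:
    def __init__(self, node_id, route):
        self.node_id = node_id
        self.route = route     # callable: key -> next hop id or None
        self.handler = PacketHandler()
        self.next_seq = 0

    def recv(self, pkt, now):
        if pkt.kind == "ACK":
            self.handler.on_ack(pkt, now)
            return
        # Everything that is not an acknowledgment is acknowledged.
        self.handler.send(pkt.src, Packet(self.node_id, pkt.seq, "ACK"), now)
        if pkt.kind == "PING" or not self.handler.is_new(pkt):
            return             # old packet or PING: nothing more to do
        hop = self.route(pkt.key)
        if hop is not None and hop != self.node_id:
            out = Packet(self.node_id, self.next_seq, "DATA", pkt.key)
            self.next_seq += 1
            self.handler.send(hop, out, now)
```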

When a Bamboo-NS2 node is connected to an NS-2 network node, and it has joined the overlay network using the join command to the agent, PUTs and GETs can be issued to the agent from the TCL script. A PUT takes a key, an id, and the data size as arguments. The key is where the value is stored, the id is used instead of a hash of the data, and the size is how big the data is. No actual data is put into the system, but the size field is used to set the correct size of network packets during simulation, and the id is used to distinguish between different values. The GET command takes the requested key value and records the time. If a GET matches multiple values in the DHT, only one is returned. This is not how Bamboo behaves; Bamboo would return values together with a pointer. The pointer can be used to retrieve the remaining values that match the GET with repeated GETs.
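The PUT and GET semantics at the node that ends up storing a key can be illustrated with a small Python sketch. It only models the local store, not the routing of the request or the timing of the GET, and the class and method names are hypothetical. Which of several matching values a GET returns is not specified above; the sketch returns the one that was stored first.

```python
from collections import defaultdict


class LocalStore:
    """Stores item ids and sizes per key; no actual data is kept."""

    def __init__(self):
        self.items = defaultdict(dict)  # key -> {item id: size in bytes}

    def put(self, key, item_id, size):
        # The id replaces a hash of the data; size only shapes packet sizes.
        self.items[key][item_id] = size

    def get(self, key):
        """Return a single (item id, size) pair for key, or None."""
        values = self.items.get(key)
        if not values:
            return None
        item_id = next(iter(values))  # assumed: the first stored value
        return item_id, values[item_id]
```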

To support different measurements of the system, two different GET behaviors are implemented. The first is the one resembling Bamboo, with keys stored and cached, as described later in the section on data storing. The second is a special GET where you look up exact nodes in the network, to evaluate the pure routing functionality of the system without the noise of key management.

2.4.5 Data storing

The data storing module in our system does not implement all the functionality present in Bamboo. The synchronization between nodes is initiated by a node when it sends a list of its keys to another node. The receiving node builds a list of the keys in the received message that it does not have, and sends that list back to request those keys. Keys in the system have a TTL, but that is a function we do not use during our tests. A good study of the storage problem is [10].
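The list-based synchronization can be sketched in a few lines of Python. This is an illustration under our own assumptions: each node's store is modeled as a dictionary from key to value, the function names are hypothetical, and the three messages (key list, request, transfer) are collapsed into direct function calls.

```python
def missing_keys(local_store, advertised_keys):
    """Keys from the received list that the local node does not have."""
    return [k for k in advertised_keys if k not in local_store]


def synchronize(sender_store, receiver_store):
    """One synchronization round initiated by the sender.

    Returns the list of keys the receiver requested.
    """
    advertised = list(sender_store)                      # sender: list of its keys
    request = missing_keys(receiver_store, advertised)   # receiver: missing keys
    for k in request:                                    # sender: transfer values
        receiver_store[k] = sender_store[k]
    return request
```

The key list is sent in full even when the two nodes already hold the same keys, which is the cost that a hash-based comparison tries to avoid.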

In Bamboo an improved synchronization method is used. It is based on Merkle trees [8] and involves building a tree of hash values over the stored key values. The best case for this method is when the nodes are completely synchronized, in which case a single hash value needs to be exchanged to determine that. According to [10], the worst case of the Merkle tree approach is only O(n), where n is the number of keys. However, there is no evaluation of the time aspect of synchronization.
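The best case can be illustrated with a generic Merkle tree sketch in Python. It is not Bamboo's actual tree layout, which is not described here; the choice of SHA-256, the handling of odd levels by duplicating the last hash, and the function names are all our assumptions.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(keys):
    """Root hash of a binary hash tree built over the sorted keys."""
    level = [_h(str(k).encode()) for k in sorted(keys)]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:            # odd level: duplicate the last hash
            level.append(level[-1])
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def in_sync(keys_a, keys_b):
    """Best case: comparing one hash value decides whether the sets match."""
    return merkle_root(keys_a) == merkle_root(keys_b)
```

When the roots differ, the nodes would continue by comparing hashes further down the tree to locate the differing keys; that descent is left out of this sketch.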
