Performance Evaluation of DHTs for Mobile Environment


MEE10:13


Monirul Islam Bhuiya Rakib Mohammad Ahsan

This thesis is presented as part of Degree of Master of Science in Electrical Engineering

Blekinge Institute of Technology December 2009

Blekinge Institute of Technology School of Engineering

Department of Telecommunication

Supervisors: Alexandru Popescu & Karel De Vogeleer

Examiner: Professor Adrian Popescu


ABSTRACT

Distributed Hash Table (DHT) systems are an important part of peer-to-peer routing infrastructures. They enable scalable wide-area storage and retrieval of information, and support the rapid development of a wide variety of Internet-scale applications ranging from naming systems and file systems to application-layer multicast.

Much of today's research on peer-to-peer systems focuses on designing better structured peer-to-peer overlay networks, or Distributed Hash Tables (DHTs). To our knowledge, however, few papers have systematically examined how well four existing DHTs, namely Content Addressable Network (CAN), Chord, Pastry and Tapestry, adapt to mobile environments. This thesis attempts to evaluate the performance of these DHTs, including their adaptability to mobile environments. To this end, we survey these DHTs together with existing solutions and draw our own conclusions from that survey.

ACKNOWLEDGEMENT

First of all, we are grateful to God, the most gracious and merciful, who gave us the energy and knowledge that helped us to perform well in difficult situations during the thesis work. We would also like to express our heartfelt gratitude to our parents, who supported us throughout this period.

We would like to thank our thesis supervisors, Alexandru Popescu and Karel De Vogeleer of the School of Engineering, Blekinge Institute of Technology (BTH), Sweden, for their help, guidance, many inspiring ideas, steady encouragement and collaboration throughout our thesis work. We also thank BTH and all its employees, who helped us and pointed us in the right direction in many difficult situations.

Finally, we are obliged to our many friends for their invaluable help, continuous encouragement and trust in us. We would like to dedicate our thesis work to all of our friends and to our great parents.


TABLE OF CONTENT

ABSTRACT

ACKNOWLEDGEMENT

TABLE OF CONTENT

LIST OF FIGURE

LIST OF TABLE

CHAPTER ONE

INTRODUCTION

1.1. INTRODUCTION

1.2. OBJECTIVE

1.3. MOTIVATION

1.4. CONTRIBUTION

1.5. OUTLINE OF THE THESIS

CHAPTER TWO

BACKGROUND

2.1. PEER TO PEER NETWORK

2.1.1 INTRODUCTION:

2.1.2 P2P ARCHITECTURE:

2.2. OVERLAY NETWORK

2.2.2 WORKING MODEL OF OVERLAY NETWORKS [3]:

2.2.3 COSTS:

2.2.4 SOME BENEFITS:

2.2.5 SOME PROBLEMS:

2.3. DISTRIBUTED HASH TABLE

2.3.2 ARCHITECTURE OF DHT:

A. Client:

B. Service:

C. API:

D. DDS Library:

E. Brick:

2.3.3 STRUCTURE OF DHT:

CHAPTER THREE

OVERVIEW OF EXISTING DHT’S

3.1 CONTENT ADDRESSABLE NETWORK (CAN)

3.1.2 WORKING MODEL:

A. Node Joining:

B. Bootstrapping Nodes:

C. Finding own zone:

D. Neighbors’ Node:

E. Node Leaving:

F. Node Failure:

G. Routing:

H. Routing Geometry:

3.1.2 LOAD BALANCE:

3.2 CHORD

3.2.2 WORKING MODEL:

A. Chord Simple Look up Algorithm:

B. Consistent hashing:

C. Key to Nodes Mapping:

D. Finger Table [6]:

E. Node Joining:

Initialize fingers and predecessor:

Update fingers of the existing nodes:

Transfer the key:

F. Node Failure:

G. Routing:

3.2.3 Routing Geometry:

3.2.4 LOAD BALANCE:

3.3 PASTRY

3.3.1 INTRODUCTION

3.3.2 WORKING MODEL OF PASTRY

3.3.3 ROUTING TABLE

3.3.4 ROUTING

3.3.5 NODE JOINING & LEAVING ALGORITHM

A. Node Joining

B. Node Leaving

3.3.6 LOCALITY

A. Short Routes

B. Route convergence

3.4 TAPESTRY

3.4.2. WORKING MODEL:

3.4.3 ROUTING:

3.4.4 NODE ALGORITHMS:

A. Node Insertion:

B. Node Deletion

Voluntary Node Deletion:

Involuntary Node Deletion:

CHAPTER FOUR

FUNCTIONALITY COMPARISON OF DHT’S

4.1 INTRODUCTION:

4.2 COMPARISON OF PROTOCOLS:

CHAPTER FIVE

EVALUATION OF COMPARED DHTS

5.1 INTRODUCTION

5.2 ROUTING PERFORMANCE

5.2.1 COMPARISON OF ROUTING PERFORMANCE:

5.2.2 SUMMARY OF THE COMPARISON:

5.3 STATIC RESILIENCE

5.3.1 OVERVIEW:

5.3.2 SUMMARY OF THE COMPARISON:

CHAPTER SIX

ADAPTABILITY WITH MOBILE ENVIRONMENTS

6.1 ISSUES TO BE CONCERN:

6.2 ANALYSIS THE DHTS UNDER HIGH CHURN:

6.3 SUMMARY OF THE ANALYSIS:

CHAPTER SEVEN

CONCLUSION

7.1 CONCLUSION:

7.2 FUTURE WORK:

REFERENCES:

APPENDIX:

ALGORITHM-1 FOR CAN CONSTRUCTION

ALGORITHM-2 FOR CAN ROUTING ALGORITHM 1

THEOREM:

PSEUDOCODE FOR THE NODE JOIN OPERATION


LIST OF FIGURE

FIGURE 2.1: LEVELS OF P2P NETWORKS [1]

FIGURE 2.2: OVERLAY NETWORK ARCHITECTURE OF P2P

FIGURE 2.3: APPLICATION INTERFACE FOR STRUCTURED DHT BASED P2P OVERLAY SYSTEMS [2].

FIGURE 2.4: OVERLAY NETWORK

FIGURE 2.5: SAMPLE OVERLAY NETWORK AND QUERY [3].

FIGURE 2.6: ARCHITECTURE OF DDS DISTRIBUTED HASH TABLE [4].

FIGURE 3.1: TWO DIMENSIONAL SPACE (X, Y), A KEY IS MAPPED TO A POINT.

FIGURE 3.2: PARTITIONS SCHEME FOR NODE JOINING.

FIGURE 3.3: EXAMPLE 2D SPACE, BEFORE AND AFTER JOINING [5].

FIGURE 3.4: DISTRIBUTION OF ZONE VOLUME WITH AND WITHOUT 1-HOP VOLUME CHECK.

FIGURE 3.5: PERFORMANCE OF 1-HOP CHECKING AFTER INCREASING DIMENSION

FIGURE 3.6: BASIC LOOKUP

FIGURE 3.7: KEY TO NODE MAP.

FIGURE 3.8: FINGER INTERVALS FOR NODE 1 [6].

FIGURE 3.9: FINGER TABLES AND KEY LOCATIONS FOR A NET WITH NODES 0, 1, AND 3, AND KEYS 1, 2, AND 6 [6].

FIGURE 3.10: (A) THE NUMBER OF KEYS STORED PER NODE IN A NODE NETWORK [10].

FIGURE 3.11: EXAMPLE OF A PASTRY NODE WITH NODEID 3102 WHERE B = 2, L = 4 WITH BASE 4.

FIGURE 3.12: ROUTING TABLE OF NODEID 65A1X WHERE B = 4. HERE, BASE = 16 AND X = ARBITRARY SUFFIX. [38]

FIGURE 3.13: THE STATE OF 103220 PASTRY NODE STATE IN THE CASE OF 12 BIT IDENTIFIER SPACE AND 4 BASE [42].

FIGURE 3.14: EXAMPLE OF A TAPESTRY NODE 0642 [48]

FIGURE 3.15: TAPESTRY ROUTING MESH WITH LINKS WHICH FORMS THE LOCAL ROUTING TABLE [38].

FIGURE 3.16: MESSAGE ON TAPESTRY, PATH TAKEN BY A MESSAGE INITIALIZES FROM NODE 5230 INTENDED FOR NODE 42AD IN A TAPESTRY MESH [37].

FIGURE 3.17: TAPESTRY PSEUDO CODE FOR NEXTHOP(.).

FIGURE 5.1: PERFORMANCE ANALYSIS OF VARIOUS ROUTING GEOMETRY [27]

FIGURE 6.1: NODE JOIN/LEAVE WITH INTERVAL = 10S (LEFT N = 100, RIGHT N = 1000) [36]

FIGURE 6.2: NODE JOIN/LEAVE WITH INTERVAL = 120S (LEFT N = 100, RIGHT N = 1000) [36]

FIGURE 6.3: PASTRY UNDER CHURN [29].

FIGURE 6.4: COST VERSUS PERFORMANCE UNDER CHURN IN TAPESTRY [30].


LIST OF TABLE

TABLE 3.1: ROUTING VARIABLES OF CAN

TABLE 3.2: DEFINITION OF CHORD VARIABLES [6]

TABLE 3.3: ROUTING VARIABLES OF CHORD


CHAPTER ONE

INTRODUCTION

1.1. Introduction

Today there is a large variety of Peer-to-Peer (P2P) systems, although in general they are rarely 100% pure P2P, meaning that all peers act as equals. In a pure P2P system the roles of client and server are merged, which removes the need for a central administration entity that manages the overlay system. Pure P2P networks, however, raise practical problems, which hybrid P2P systems address. These systems use a centralized service for bootstrapping or maintaining the network while making use of P2P mechanisms for data exchange.

Furthermore, there are two main classes of P2P networks with respect to routing substrate or geometry: unstructured P2P systems and structured P2P systems. By far the most common type of structured P2P system is the Distributed Hash Table (DHT), which offers advantages such as decentralization, scalability and fault tolerance, together with efficient content allocation and content redundancy. To achieve this, a structured overlay topology is imposed and maintained by a globally employed protocol. This ensures efficient content discovery through the use of a specific routing protocol to search the virtually imposed structure.

1.2. Objective

The aim of this thesis is to compare existing DHTs in order to find out how DHTs will work in mobile environments. To do this, we use the defining articles and documentation of each DHT as its specification.


1.3. Motivation

Peer-to-peer file sharing systems are now among the most popular Internet applications and have become a major source of Internet traffic. Much of the related research concerns structured peer-to-peer overlay networks, or Distributed Hash Tables (DHTs). It is therefore extremely important that these systems can efficiently locate the node that stores the desired data in a large system. Nodes must be able to join and leave the system frequently without affecting its robustness or efficiency, and load must be balanced across the available nodes.

The scope of the thesis is to provide a survey of currently popular DHTs, covering at least CAN, Chord, Pastry and Tapestry, in order to provide organized information for future work on DHTs. The DHTs are compared according to their functionality, and their strengths and weaknesses are investigated and highlighted. Furthermore, their adaptability to mobile environments is evaluated on a per-DHT basis, exposing the suitability of each.

1.4. Contribution

The main contribution of this thesis is the identification of the most important parameters of DHTs, followed by a comparison of these mechanisms under extreme conditions with a view to better performance in mobile environments.

The information gathered in the comprised survey will then be used for the proposal and implementation of a novel mechanism that can extend any DHT. This mechanism will be referred to as XDHT.


1.5. Outline of the Thesis

This thesis consists of seven chapters. The first chapter introduces P2P networks and DHTs, explains what motivated us to work on this topic, and describes our contribution.

Chapter two gives an overview of P2P networks, overlay networks and DHTs. Chapter three then discusses the working models of some existing DHTs.

Chapter four compares the functionality of the existing DHTs.

Chapter five summarizes the performance evaluation of the compared DHTs.

Chapter six analyzes and summarizes the adaptability of DHTs to mobile environments.

Finally, chapter seven concludes the thesis and gives some ideas for future work related to DHTs.


CHAPTER TWO

BACKGROUND

2.1. PEER TO PEER NETWORK

2.1.1 INTRODUCTION:

Peer-to-peer (P2P) is a network in which participating nodes share information as equals, communicating with each other without necessarily needing central coordination. It is the opposite of the client/server concept.

Attractive features of P2P networks, such as improved scalability, decentralized coordination, lower cost of ownership and fault tolerance, have made them popular in the networking arena. Napster, for example, is one of the file sharing systems that popularized peer-to-peer networking.

2.1.2 P2P ARCHITECTURE:

P2P infrastructure follows the three-level model [1] presented below:

Level 3: P2P communities

Level 2: P2P applications

Level 1: P2P infrastructures

Figure 2.1: Levels of P2P networks [1]

Level 1, P2P infrastructures: The foundation of all levels, which performs communication, integration and translation between IT components. Finding nodes, sharing volume and exchanging resources are mainly done at this level.

Level 2, P2P applications: These use the services provided by level 1. They enable communication and collaboration between entities in the absence of central control.

Level 3, P2P communities: Cooperative interaction between communities of similar interest, and the dynamics within them.

A P2P overlay network model can be described as a hierarchical framework distributed over several communication layers. At the top of the hierarchy is the Application level layer, which deals with the tools, applications and services that run on the underlying P2P overlay structure. The Service specific layer provides parallel scheduling and computational tasks for the underlying network. Security management, fault tolerance and resource sharing are the concern of the Feature management layer. Routing and lookup are handled by the Overlay Nodes Management layer. The Network Communications layer describes the network characteristics of the nodes connected through the overlay network.


P2P overlay networks come in two forms, structured and unstructured.

Structured P2P refers to a network in which content is placed at specific locations rather than at random peers. The network topology is tightly controlled so that queries are more efficient. Examples are Distributed Hash Table (DHT) protocols such as CAN, Chord and Pastry. These structured P2P protocols provide a self-organizing substrate for large scale peer-to-peer applications and a variety of decentralized services, including network storage, content distribution and application level multicast. A system that uses structured P2P is scalable, fault tolerant and load balanced.

In a structured P2P overlay network, node IDs are assigned randomly to the set of peers from a large space of identifiers, and the unique identifiers of data objects are called keys. A peer retrieves a (key, value) pair by supplying the key associated with that pair. The (key, value) pairs are stored and retrieved on the overlay network, as illustrated in Figure 2.3.

Figure 2.2: Overlay network architecture of P2P

Each peer in the network keeps a routing table consisting of the node IDs and IP addresses of its neighboring peers. Lookup queries are forwarded progressively to peers whose node IDs are closer to the key in the identifier space. Different DHT protocols use different schemes for lookup queries, routing and load balancing.
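The progressive forwarding described above can be illustrated with a minimal sketch. It assumes integer node IDs and uses the absolute numeric difference as the distance to the key; both are simplifications chosen for illustration, since each real DHT defines its own identifier space and distance metric. The function name `route` and the sample routing tables are hypothetical.

```python
def route(start, key, routing_tables):
    """Greedily forward a lookup toward the node whose ID is closest to `key`.

    routing_tables maps a node ID to the list of neighbor IDs it knows.
    Distance is |node_id - key|, an assumption made for this sketch only.
    """
    path = [start]
    current = start
    while True:
        # The current node is listed first so that it wins ties and the loop stops.
        candidates = [current] + routing_tables[current]
        best = min(candidates, key=lambda node_id: abs(node_id - key))
        if best == current:
            return path  # no known neighbor is closer to the key
        path.append(best)
        current = best


# Hypothetical four-node overlay in which each node knows its adjacent nodes.
tables = {10: [20], 20: [10, 30], 30: [20, 40], 40: [30]}
print(route(10, 38, tables))
```

Each hop strictly reduces the distance to the key, which is why the lookup terminates at the node closest to the key.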

Although structured P2P networks can efficiently find rare items because key based routing is used, they incur significantly higher overheads than unstructured P2P networks.

In contrast, an unstructured P2P system is a collection of peers that join the overlay network under loose rules, without any prior knowledge of the topology. A flooding mechanism sends queries across the overlay with a limited scope. When a peer receives a query, it tries to match the content of the query and then sends a list of all matching content to the originating peer.

Flooding is useful for locating highly replicated objects and is resilient to peers joining and leaving the system, but it is poorly suited to locating rare items. The approach is also not scalable, as the load on each peer grows linearly with the total number of queries and the system size.
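A small sketch may help to show why flooding struggles with rare items. It assumes that the scope of a query is limited by a time-to-live (TTL) hop counter, which is one common way to bound flooding; the graph layout, the item names and the function `flood_query` are hypothetical.

```python
from collections import deque


def flood_query(graph, start, wanted, ttl):
    """Flood a query from `start` and return (peers holding `wanted`, messages sent).

    graph maps a peer name to {"neighbors": [...], "items": set(...)}.
    A peer forwards the query only while its remaining TTL is above zero.
    """
    visited = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        peer, remaining = queue.popleft()
        if wanted in graph[peer]["items"]:
            hits.append(peer)
        if remaining == 0:
            continue  # scope exhausted, do not forward further
        for neighbor in graph[peer]["neighbors"]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical chain A - B - C - D in which only D holds the rare item.
graph = {
    "A": {"neighbors": ["B"], "items": set()},
    "B": {"neighbors": ["A", "C"], "items": set()},
    "C": {"neighbors": ["B", "D"], "items": set()},
    "D": {"neighbors": ["C"], "items": {"rare"}},
}
print(flood_query(graph, "A", "rare", ttl=2))
print(flood_query(graph, "A", "rare", ttl=3))
```

With a TTL of 2 the query never reaches peer D, so the rare item is missed; raising the TTL finds it at the cost of more messages.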

Figure 2.3: Application Interface for Structured DHT based P2P Overlay Systems [2].


2.2. OVERLAY NETWORK

An overlay network is a virtual network of nodes connected by virtual links. It exists on top of one or more underlying networks. Each link between connected nodes corresponds to a logical path through the physical links of the underlying network. For example, a peer-to-peer network acts as an overlay network because the existing underlying network cannot meet the requirements of various applications.

Overlay networks were introduced to solve the problem that node addresses are not known in advance. Routing messages are sent to the logical address of a node. For example, a DHT sends routing messages to nodes by their logical addresses, as the IP addresses are not known in advance.

Figure 2.4: Overlay Network

2.2.2 WORKING MODEL OF OVERLAY NETWORKS [3]:

• Guaranteed data retrieval

• Low lookup time, typically O(log N), where N is the number of nodes

• Automatic load balancing

• Self-organization

Since overlay networks identify neighbor nodes by the content they store, they can turn the standard graph traversal problem into a localized iterative process. The iterative process decreases the overall network load and makes the query process deterministic, as each hop brings the query closer to its target. The next hop is calculated according to the overlay's mathematical function.

Specifically, an overlay network behaves like a distributed hash table, supporting insertion, deletion and querying of keys. Keys are hashed consistently, typically with a hash function such as the Secure Hash Algorithm (SHA-1).
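As an illustration of how a key can be turned into an identifier, the following sketch hashes a key with SHA-1 from the Python standard library. The function name `key_to_id` and the optional truncation to a smaller identifier space are assumptions made for this example.

```python
import hashlib


def key_to_id(key: str, bits: int = 160) -> int:
    """Map a key to an integer identifier in [0, 2**bits) using SHA-1.

    SHA-1 produces 160 bits; for smaller identifier spaces the most
    significant `bits` bits are kept (1 <= bits <= 160).
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


# The same key always yields the same identifier, on every node.
print(key_to_id("song.mp3", bits=16))
print(key_to_id("song.mp3", bits=16) == key_to_id("song.mp3", bits=16))
```

Because every node computes the same identifier for the same key, any node can decide locally in which direction a query for that key has to be forwarded.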

The node joining scheme of overlay networks differs from TTL-based algorithms in that it is structured and characteristically symmetrical. The scheme is based on one or more algorithms that decide how a node connects. Lookup times also depend on the network's architecture. Moreover, the lookup algorithm includes recovery functions for node failures, in order to recreate or maintain an appropriate network structure.

To join the network, a node sends a request, such as a DNS query or a broadcast, to find a node that has already joined the network. Such a node is called a bootstrapping node. The bootstrapping node provides initial information about the network, such as the node IDs or IP addresses of neighbors.

Figure 2.5: Sample overlay network and query [3].

One important difference between overlay networks and unstructured P2P networks is that overlays do not support keyword based searching. They look up data on the basis of identifiers derived from the content.

2.2.3 COSTS:

• Adds overhead and an extra layer to the networking stack.

• Extra packet headers and processing.

• Remove Ethernet addresses from Ethernet header and assume IP header.

• Increases complexity through the use of layering. Layering does not reduce complexity, it only manages it. More layers of functionality mean more possible unintended interactions between layers.

2.2.4 SOME BENEFITS:

The demand for overlay networks is increasing day by day because they offer several attractive features. Developing entirely new networking hardware and software is expensive.

With an overlay there is no need to set up new equipment, only to modify existing software and protocols. New software is deployed on top of existing software; for example, running IP on top of Ethernet does not require modifying the Ethernet protocol or driver. The overlay also does not need to be implemented at every node.

Another feature is bootstrapping, which provides a newly joined node with the information it needs.

All networks built after the telephone network can be regarded as overlay networks.

Not every node is required to want the overlay network service all the time.

2.2.5 SOME PROBLEMS:

Sometimes a node carries too much load, e.g. in memory or bandwidth, and the overlay network may become inconsistent for such nodes.

An overlay network may have uncertain security properties; for example, it may be used for Denial of Service (DoS) attacks.

An overlay network may not scale, and sometimes may need n^2 state.

Group membership changes randomly as nodes join and leave dynamically.

The network environment changes dynamically, and so the topology changes randomly.

Because of network congestion and changes in routing, the delay between members may vary over time.

Information about network conditions is member specific, as each member must determine network conditions for itself.


2.3. DISTRIBUTED HASH TABLE

DHTs were first introduced to the research community in 2001. A Distributed Hash Table (DHT) provides essentially the same function as a hash table. It is a decentralized indexing system that offers scalable, fault tolerant data storage and lookup services. It is used for peer-to-peer communication, without the notion of a server and client model.

To understand how a DHT works, we first have to understand the lookup service:

- Allocate IDs to nodes

- Map hash values to the node with the closest ID

- The leaf set consists of successors and predecessors; this is all that is required for accuracy

- The routing table matches successively longer prefixes; this allows efficient lookups

A DHT allows decentralized, scattered peers to maintain a mapping from keys to values without any fixed structure. Using a DHT we can store a value under a key and later look up the value using that key. The DHT distributes the work of storing and looking up across several machines. A DHT is expected to deliver acceptable performance even while network membership changes continuously. The basic operations of a DHT are very simple: 1) send a query, e.g. a ping, to update routing tables; 2) look up the nodes responsible for a key; 3) get values from the nodes found by the lookup; 4) store values on those nodes.

2.3.2 ARCHITECTURE OF DHT:

The architecture and implementation of a distributed hash table built as a Distributed Data Structure (DDS) are described below [4]:

Figure 2.6: Architecture of DDS distributed hash table [4].

A. Client:

- A client is service-specific software running on a client machine.

- It communicates over the Wide Area Network (WAN) with one of several service instances running in the cluster.

- The mechanism by which the client selects a service instance is outside the scope of this work.

- It usually involves round robin DNS [123].

B. Service:

- A service is nothing but a set of software processes.

- A service communicates with clients over the wide area and executes some application-level functions.

- Services may hold soft state, but rely on the hash table to manage all persistent state.

C. API:

- The boundary between a service and the DDS library, which provides the put(), get(), remove(), create() and destroy() operations on hash tables.

- Each operation is atomic.

- All services see the same consistent image of all existing hash tables through this API.

- Hash table names are strings, and hash table keys are 64-bit integers.

- Hash table values are opaque byte arrays.

- Operations affect hash table values in their entirety.
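The shape of this API can be sketched as follows. This is a single-process, in-memory stand-in written for illustration only; it is not the DDS library of [4], which is a Java class library backed by distributed bricks. The class name `HashTableService` is hypothetical, and the sketch mirrors only what is stated above: string table names, 64-bit integer keys and byte-array values that are replaced in their entirety.

```python
class HashTableService:
    """In-memory stand-in for the put/get/remove/create/destroy API."""

    def __init__(self):
        self._tables = {}  # table name (str) -> {64-bit int key -> bytes}

    def create(self, name: str) -> None:
        if name in self._tables:
            raise KeyError(f"hash table {name!r} already exists")
        self._tables[name] = {}

    def destroy(self, name: str) -> None:
        del self._tables[name]

    def put(self, name: str, key: int, value: bytes) -> None:
        if not 0 <= key < 2 ** 64:
            raise ValueError("keys must fit in 64 bits")
        # The whole value is replaced; there are no partial updates.
        self._tables[name][key] = bytes(value)

    def get(self, name: str, key: int) -> bytes:
        return self._tables[name][key]

    def remove(self, name: str, key: int) -> None:
        del self._tables[name][key]


service = HashTableService()
service.create("sessions")
service.put("sessions", 42, b"opaque payload")
print(service.get("sessions", 42))
```

The sketch does not attempt to reproduce atomicity across nodes or persistence; in the architecture described here those properties come from the DDS library and the bricks.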

D. DDS Library:

- A Java class library that presents the hash table API to services.

- Accepts hash table operations and cooperates with the "bricks" to carry them out.

- Contains only soft state, including metadata.

- Acts as the two-phase commit coordinator for state-changing operations on the distributed hash tables.

E. Brick:

- Bricks are the system modules that manage durable data.

- Each brick manages a set of network-accessible single-node hash tables.

- A brick consists of a buffer cache, a lock manager, a persistent chained hash table implementation, and network stubs and skeletons for remote communication.

- In general, one brick runs per CPU in the cluster.

- Bricks can run on dedicated nodes, or can share nodes with other components.

2.3.3 STRUCTURE OF DHT:

Several components make up the structure of a DHT. The main ones are:

1) Keyspace

2) Keyspace partitioning

3) Overlay network construction

4) Routing table

Keyspace: The foundation of a DHT. In hashing, the keyspace is the set of all possible keys.

Keyspace Partitioning: Ownership of the keyspace is distributed among the participating nodes according to a keyspace partitioning scheme. For example, in a DHT every participating node stores some (key, value) pairs (k, v). If the keyspace is K = [0, 2410) and the identifier id of each node belongs to K, then a pair (k, v) is stored at the node whose id is nearest to k.
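The nearest-identifier rule can be tried out on a toy keyspace. The sketch below assumes a small integer keyspace and measures nearness as the absolute numeric difference, breaking ties in favor of the lower node ID; the function names and the sample IDs are hypothetical and chosen only to make the partitioning visible.

```python
def owner(key, node_ids):
    """Return the node ID nearest to `key` (ties go to the lower node ID)."""
    return min(node_ids, key=lambda node_id: (abs(node_id - key), node_id))


def partition(keys, node_ids):
    """Group keys by the node that is responsible for storing them."""
    table = {node_id: [] for node_id in node_ids}
    for key in keys:
        table[owner(key, node_ids)].append(key)
    return table


# Three nodes share a toy keyspace [0, 64).
print(partition([3, 18, 40, 61], [5, 20, 50]))
```

When a node joins or leaves, only the keys whose nearest node changes have to move, which is what keeps the partitioning scheme cheap to maintain.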

Build Overlay network: Every node maintains a set of links to other nodes. Together these links form the overlay structure. The links a node maintains are links to its neighbors, i.e. the nodes adjacent to it in the keyspace.

Routing: Key based routing.

CHAPTER THREE

OVERVIEW OF EXISTING DHT’S

3.1 CONTENT ADDRESSABLE NETWORK (CAN)

Content Addressable Network (CAN) is one of the original DHTs introduced for peer-to-peer file sharing systems. It provides an indexing system for large scale P2P storage applications on the Internet.

For example, CAN could be used for large scale storage management systems such as OceanStore, FarSite and Publius, because all these systems need a scalable management system with efficient insertion and retrieval, which CAN provides. Indeed, the OceanStore system has already implemented it in its core design [5]. CAN could also provide a wide area name resolution service similar to DNS [5].

Like other DHTs, CAN is a distributed system that stores (key, value) pairs and retrieves the value associated with a given key. Its design is based on a d-dimensional Cartesian coordinate space. This coordinate space is a virtual, logical address space that is independent of how the nodes are physically connected. The virtual coordinate space stores the (key, value) pairs, e.g. (key1, value1). The entire coordinate space is divided among all participating nodes so that each node owns at least one distinct zone. This distinct zone is known as a “chunk”.

The basic operations of CAN are insertion, lookup and deletion. Each node keeps information about its neighbor nodes in its routing table. In figure 2 the neighbors of node 4 are {3, 2, 1}.

When a request arrives, the initial node forwards it towards the node that holds the key.

3.1.2 WORKING MODEL:

A. Node Joining:

Because the entire coordinate space is divided among all the nodes, when a node joins, an existing zone is split into two portions, one of which is assigned to the newly joined node.

For example, in Figure 3.1 the two-dimensional coordinate space contains 4 nodes.

Figure 3.1: Two dimensional space (x, y); a key is mapped to a point.

Figure 3.2: Partition scheme for node joining.

An important concept here is the Virtual Identifier (VID), a binary string that encodes the path from the root of the partition tree to the leaf node corresponding to a zone [5]. The VID represents the position of a node.


To grow a CAN, each node must have a unique VID. A new node joins as follows:

- The new node has to find a bootstrapping node.

- The new node has to find its own zone.

- The new node has to learn the VIDs and IP addresses of its neighbors.

B. Bootstrapping Nodes:

A bootstrapping node provides the initial information, such as IP addresses, that a new node needs to join the CAN successfully. The new node may know the bootstrapping node in advance through a statically assigned address, or it may find the bootstrapping node through the Domain Name System.

C. Finding own zone:

To find its own zone, the new node first selects a random point and sends a join request destined for that point. The participating nodes route the request until it reaches the node that owns the point. The owner then splits its zone and shares it with the new node. The following points should be considered:

- The owner node does not split its zone immediately.

- Instead, it compares the volume of its zone with the volumes of its immediate neighbors in the coordinate space.

- The zone that is split for the newly joined node is the one with the largest volume.

Figure 3.3: Example 2D space, before and after joining [5].

The joining of a new node affects only a small number of existing nodes, and only slightly, in the coordinate space. The number of neighbors a node has depends only on the dimensionality of the coordinate space and is independent of the total number of nodes.

Thus the joining of a node affects only O(number of dimensions) existing nodes.
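The zone split at the heart of a join can be sketched as follows. The sketch assumes axis-aligned zones that are halved along a chosen dimension, with the old owner keeping the lower half; which dimension is split, and which half goes to the new node, are assumptions of this example rather than rules taken from [5]. The class name `Zone` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    lo: tuple  # lower corner, one coordinate per dimension
    hi: tuple  # upper corner, one coordinate per dimension

    def contains(self, point):
        return all(l <= x < h for l, x, h in zip(self.lo, point, self.hi))

    def volume(self):
        result = 1.0
        for l, h in zip(self.lo, self.hi):
            result *= h - l
        return result

    def split(self, dim):
        """Halve the zone along `dim`; return (lower half, upper half)."""
        mid = (self.lo[dim] + self.hi[dim]) / 2
        lower_hi = self.hi[:dim] + (mid,) + self.hi[dim + 1:]
        upper_lo = self.lo[:dim] + (mid,) + self.lo[dim + 1:]
        return Zone(self.lo, lower_hi), Zone(upper_lo, self.hi)


# One node owns the whole unit square; a joining node picked point (0.75, 0.5).
whole = Zone((0.0, 0.0), (1.0, 1.0))
kept, given_away = whole.split(dim=0)
print(kept, given_away, given_away.contains((0.75, 0.5)))
```

Only the node whose zone is split and its immediate neighbors need to update their state, which is consistent with the O(number of dimensions) cost noted above.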

D. Neighbors’ Node:

Two nodes are said to be neighbors if their coordinate spans overlap along d-1 dimensions and abut along one dimension [5]. For example, in Figure 3.3, node 2 is a neighbor of node 6 because their zones satisfy this condition.
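The neighbor condition translates directly into a small test. In this sketch a zone is given as a list of (lo, hi) intervals, one per dimension; the wrap-around of the coordinate space at its edges is ignored for simplicity, and the function name `are_neighbors` is hypothetical.

```python
def are_neighbors(zone_a, zone_b):
    """True if the zones overlap along d-1 dimensions and abut along exactly one.

    Each zone is a sequence of (lo, hi) intervals, one per dimension.
    Wrap-around at the edges of the coordinate space is not modeled.
    """
    abutting = 0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(zone_a, zone_b):
        if a_hi == b_lo or b_hi == a_lo:
            abutting += 1  # the intervals touch end to end
        elif not (a_lo < b_hi and b_lo < a_hi):
            return False  # neither touching nor overlapping in this dimension
    return abutting == 1


left = [(0.0, 0.5), (0.0, 1.0)]
right_bottom = [(0.5, 1.0), (0.0, 0.5)]
bottom_left = [(0.0, 0.5), (0.0, 0.5)]
top_right = [(0.5, 1.0), (0.5, 1.0)]
print(are_neighbors(left, right_bottom))      # share an edge
print(are_neighbors(bottom_left, top_right))  # touch only at a corner
```

Zones that touch only at a corner abut along two dimensions, so they are not neighbors under this definition.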

E. Node Leaving:

When a node leaves the CAN, it is necessary to ensure that its zone is taken over by the remaining nodes. For this:

- If the zone of one of the neighbors can be merged with the departing node's zone to form a single valid zone, then this is done.

- If not, the neighbor whose zone is smallest takes over the zone and temporarily handles both zones.

F. Node Failure:

Another important issue is node failure. When one or more nodes become unreachable, an immediate takeover algorithm is run to ensure that the failed node's zone is taken over by one of its neighbors. Normally, every node sends periodic update messages containing its zone coordinates and its list of neighbors. When updates from a node are missing, the node is assumed to be dead.

After a node has been declared dead, the immediate takeover algorithm is run. Each neighbor of the dead node starts a timer in proportion to the volume of its own zone. When its timer expires, a neighbor sends a takeover message announcing its own zone volume to the other neighbors of the dead node.

On receiving a takeover message, a node cancels its own timer if the zone volume in the message is smaller than its own zone volume. Otherwise it replies with its own takeover message. In this way a live neighbor with a small zone volume is chosen to take over the failed node's zone.
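The effect of the volume-proportional timers can be shown with a toy calculation. The sketch does not exchange real messages; it only computes which neighbor's timer would expire first, assuming that all neighbors start their timers at the same moment. The scale factor, the tie-break by name and the function name `elect_takeover` are hypothetical.

```python
def elect_takeover(neighbor_volumes, timer_scale=1.0):
    """Return (winner, timers) for the neighbors of a failed node.

    neighbor_volumes maps a neighbor name to its zone volume. Each timer is
    proportional to the zone volume, so the smallest zone fires first.
    """
    timers = {name: timer_scale * volume for name, volume in neighbor_volumes.items()}
    winner = min(timers, key=lambda name: (timers[name], name))
    return winner, timers


winner, timers = elect_takeover({"n1": 0.25, "n2": 0.125, "n3": 0.5})
print(winner, timers)
```

Because the timer of the neighbor with the smallest zone expires first, its takeover message reaches the others before their own timers fire, and they cancel them.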

With this procedure, however, a node may detect a failure when less than half of the failed node's neighbors are still reachable, and a takeover under these circumstances could leave the CAN in an inconsistent state. In this case it is better to perform an expanding ring search before triggering the repair mechanism, because the search rebuilds sufficient neighbor state to initiate a takeover safely.

Another issue to consider is that a node may hold more than one zone after the normal leaving procedure or the takeover algorithm. A background zone reassignment algorithm [see Appendix] is run to avoid further fragmentation and to ensure that the CAN tends back towards one zone per node. [1]

G. Routing:

In CAN, routing proceeds from source to destination by following the straight line path through the Cartesian space. A source node routes packets using the coordinates of its neighbors by simple greedy forwarding [5].

In a d-dimensional coordinate space each node has 2d neighbors, and the average routing path length is $(d/4)\,n^{1/d}$ hops. We can therefore increase the number of nodes and zones without increasing the per-node state, while the path length grows as $O(n^{1/d})$ [5].

When CAN sets $d = (\log_2 n)/2$, it is possible to achieve $O(\log n)$ hops. In a usual CAN configuration, however, this trade-off between path length and node degree is difficult to exploit, because the number of nodes is not known in advance. Enhancements to the basic design of CAN are therefore beneficial [6].
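A short numeric check makes the trade-off concrete. The sketch evaluates the average path length (d/4)·n^(1/d) and the neighbor count 2d for a network of n = 2^20 nodes, which is an arbitrary size chosen for illustration. With d = (log2 n)/2 = 10 the term n^(1/d) equals 4, so the path length reduces to d itself, i.e. (log2 n)/2 hops.

```python
import math


def avg_path_length(n, d):
    """Average CAN routing path length in hops: (d / 4) * n ** (1 / d)."""
    return (d / 4) * n ** (1 / d)


n = 2 ** 20
for d in (2, 4, int(math.log2(n) / 2)):
    print(f"d={d:2d}  neighbors={2 * d:2d}  avg hops={avg_path_length(n, d):.1f}")
```

For this n, raising d from 2 to 10 increases the per-node state from 4 to 20 neighbors while the average path shrinks from about 512 hops to about 10.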

When a new node has taken its zone, it updates its routing information, for example the IP addresses of its neighbors. Similarly, the previous occupant nodes update their routing tables to reflect the newly joined node. Every node in the space also sends an update message at regular intervals. These updates keep all nodes informed about the status of their neighbors.

H. Routing Geometry:

Table 3.1: Routing Variables of CAN

Routing variable      Value

Lookup                O(d·N^(1/d))

Neighbors             2d

Routing state         O(d)

Optimal path          O(log n)

Neighbor selection    1

Average latency       High

Node congestion       O(d·n^(1/d-1))

Network diameter      O(d·n^(1/d))

3.1.2 LOAD BALANCE:

For proper load balance the partitioning does not have to be perfect, but non-uniform partitioning should be avoided [1]. When a node joins, it picks a random point in the coordinate space and finds the current occupant of that point. The occupant then performs a 1-hop volume check: it compares the volume of its own zone with those of its immediate neighbors, and the largest of these zones is split with the new node.
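The 1-hop volume check amounts to a single comparison among the occupant and its immediate neighbors. The sketch below assumes that the occupant already knows its neighbors' zone volumes and that ties are resolved in favor of the occupant; the zone names and the function name `choose_zone_to_split` are hypothetical.

```python
def choose_zone_to_split(occupant, volumes, neighbors):
    """Pick the zone to split for a joining node using a 1-hop volume check.

    volumes maps a zone name to its volume; neighbors maps a zone name to the
    names of its immediate neighbors. The occupant is listed first so that it
    is chosen when volumes are equal.
    """
    candidates = [occupant] + neighbors[occupant]
    return max(candidates, key=lambda zone: volumes[zone])


volumes = {"a": 0.25, "b": 0.5, "c": 0.125, "d": 0.125}
neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(choose_zone_to_split("a", volumes, neighbors))
```

Here the random point fell into zone a, but the larger neighboring zone b is split instead, which evens out the zone volumes over time.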

Following the load balance example in [1] (Sylvia Ratnasamy, 2.2.3), let the total volume of the coordinate space be $V_t$ and the total number of nodes be n. We define $V = V_t/n$, which is the volume each node would hold if the space were divided equally.

Figure 3.4 below shows that, without the 1-hop volume check, only 40% of the nodes are assigned to zones with volume V. In contrast, with the 1-hop volume check 82% of the nodes are assigned to zones with volume V.

Figure 3.4: Distribution of zone volume with and without 1-hop volume check.

In Figure 3.4, all the zone volumes lie between V/2 and 2V when the dimension is increased from 2 to 3 and higher.

Figure 3.5: Performance of 1-hop checking after increasing dimension.

We therefore agree that the 1-hop volume check is very helpful in achieving an almost perfect partitioning.

3.2 CHORD

Chord is another peer-to-peer lookup algorithm for DHTs. A fundamental problem in peer-to-peer systems is to efficiently locate the node that stores a specific data item. Chord was introduced in 2001 to address this problem. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing.

Chord differs from many other peer-to-peer lookup protocols in its simplicity, provable correctness, and provable performance [6]. In Chord, a lookup for a key is routed through a sequence of O(log N) other nodes from source to destination, and each node requires information about only O(log N) other nodes for efficient routing. Performance degrades gracefully when this information is out of date, which matters because nodes join and leave arbitrarily and consistency of even O(log N) state is hard to maintain. Chord has a simple algorithm for maintaining this information in a dynamic environment.

Existing name and location services, such as DNS, provide a direct mapping between keys and values. In contrast, Chord maps keys onto nodes, and each key or value is stored at the node to which the key maps.

DNS resolves host names into IP addresses [4]. Chord can also provide a mapping from host names to IP addresses, but unlike DNS it does not depend on a set of root servers for queries. DNS requires structured names, while Chord requires no naming structure. DNS is used to find named hosts or services, while Chord can also be used to locate data objects that are spread over several distributed machines [6].

The basic operation of Chord is simple: given a key, it maps the key onto a node. That node may be responsible for storing a value associated with the key.

Chord uses a variant of consistent hashing [3] to assign keys to Chord nodes. Consistent hashing tends to balance load, since each node receives about the same number of keys, and it involves relatively little movement of keys when nodes join and leave the system [6]. Chord differs from traditional consistent hashing in that the routing table is distributed, so each node needs to keep information about only a few other nodes. In an N-node system, each node keeps information about O(log N) other nodes, and maintaining the routing information when a node joins or leaves requires no more than $O(\log^2 N)$ messages [6].

3.2.2 WORKING MODEL:

A. Chord Simple Lookup Algorithm:

Lookup(my-id, key-id)
    n = my successor
    if my-id < n < key-id
        call Lookup(key-id) on node n    // next hop
    else
        return my successor              // done

Figure 3.6: Basic Lookup
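A runnable version of the basic lookup can be sketched as below. It is our own illustration and makes two assumptions that the pseudocode leaves implicit: the comparison wraps around the identifier circle, and the ring is given as a plain dictionary that maps every node to its successor.

```python
def strictly_between(x, a, b):
    """True if x lies strictly between a and b when walking clockwise from a."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past identifier 0


def simple_lookup(start, key, successor):
    """Follow successor pointers from `start`; return (responsible node, hops taken)."""
    node, hops = start, 0
    while True:
        nxt = successor[node]
        if strictly_between(nxt, node, key):
            node = nxt          # the successor still precedes the key: next hop
            hops += 1
        else:
            return nxt, hops    # the successor is responsible for the key: done


if __name__ == "__main__":
    ring = {0: 1, 1: 3, 3: 0}   # example ring with nodes 0, 1 and 3
    for key in (1, 2, 6):
        print(key, simple_lookup(0, key, ring))
```

Because only successor pointers are used, the number of hops grows linearly with the number of nodes, which is what the finger table is designed to avoid.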

B. Consistent hashing:

- Key identifier = SHA-1(key)

- Node identifier = SHA-1(IP address)

- SHA-1 distributes both uniformly.

- Nodes and keys are arranged on a circle.

- The circle cannot have more than $2^m$ nodes.

- The circle can have ids/keys ranging from 0 to $2^m - 1$.
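As a small sketch of how such identifiers can be derived, the following uses Python's standard `hashlib` module. Reducing the 160-bit SHA-1 digest modulo $2^m$ is our own simplification for small example rings; the helper name `chord_id` and the sample inputs are hypothetical.

```python
import hashlib


def chord_id(value, m):
    """Map a string (a key or an IP address) to an identifier in [0, 2^m - 1]."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


if __name__ == "__main__":
    m = 8  # example identifier length in bits
    print(chord_id("192.0.2.10", m))    # node identifier from an example IP address
    print(chord_id("some-file.txt", m))  # key identifier from an example key
```

With m = 160 the identifier is simply the full SHA-1 digest interpreted as an integer.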

C. Key to Nodes Mapping:

Figure 3.7: Key to node map.

Figure 3.7 shows an identifier circle with nodes 0, 1, 2, 3, 4 and 5. The successor of identifier 1 is node 1, so key 1 is located at node 1. Similarly, key 2 is located at node 2, and key 6 at node 0. The purpose of consistent hashing is to let nodes join and leave the network with minimal disruption. To this end, when a node n joins the network, certain keys previously assigned to n's successor become assigned to n. When node n leaves the network, all of its assigned keys are reassigned to n's successor. In the example above, if a node were to join with identifier 7, it would capture the key with identifier 6 from the node with identifier 0. These properties are stated in Theorem 1 [11, 13], given later in the appendix of this thesis.
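The key-to-node mapping can be sketched with a sorted list of node identifiers. This is our own minimal illustration, assuming a 3-bit identifier circle (m = 3) and a newly joined node with identifier 7 chosen for the example; the function name `successor` follows the text, but the implementation is not taken from [6].

```python
import bisect


def successor(key, nodes, m):
    """First node whose identifier is equal to or follows `key` on the circle."""
    ring = sorted(nodes)
    index = bisect.bisect_left(ring, key % (2 ** m))
    return ring[index % len(ring)]  # wrap around to the smallest identifier


if __name__ == "__main__":
    m = 3
    nodes = [0, 1, 2, 3, 4, 5]
    print([successor(k, nodes, m) for k in (1, 2, 6)])
    # A node joining with identifier 7 takes over key 6 from node 0.
    print(successor(6, nodes + [7], m))
```

Only the keys between the new node's predecessor and the new node itself change owner, which is why joins and leaves cause little key movement.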

D. Finger Table [6]:

Table 3.2: Definition of Chord variables [6]

Notation            Definition
finger[k].start     $(n + 2^{k-1}) \bmod 2^m$, $1 \le k \le m$
      .interval     [finger[k].start, finger[k+1].start)
      .node         first node $\ge$ n.finger[k].start
successor           the next node on the identifier circle; finger[1].node
predecessor         the previous node on the identifier circle

- For faster lookups, Chord maintains additional routing information.

- This additional information is not required for correctness, which is guaranteed as long as each node knows its correct successor.

- Each node n maintains a routing table with up to m entries (where m is the number of bits in the identifiers), called the finger table.

- The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least $2^{i-1}$ on the identifier circle.

- That is, $s = successor(n + 2^{i-1})$.

- s is called the i-th finger of node n, denoted by n.finger(i) [5].

Figure 3.8: Finger Intervals for node 1 [6].

- A finger table entry contains both the Chord identifier and the IP address of the relevant node.

- The first finger of n is the immediate successor of n on the circle.

Because each node has finger entries at power-of-two intervals around the identifier circle, each node can forward a query at least halfway along the remaining distance between itself and the target identifier. From this intuition follows Theorem 2, given later in the appendix of this thesis [5].
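The finger table definition can be made concrete with a short sketch. It assumes a 3-bit identifier circle with nodes 0, 1 and 3, and it computes each finger from global knowledge of the ring, which a real Chord node does not have; the helper names are our own.

```python
import bisect


def successor(identifier, ring):
    """First node in the sorted list `ring` that is equal to or follows `identifier`."""
    index = bisect.bisect_left(ring, identifier)
    return ring[index % len(ring)]


def finger_table(n, nodes, m):
    """Return a list of (finger[k].start, finger[k].node) pairs for k = 1..m."""
    ring = sorted(nodes)
    table = []
    for k in range(1, m + 1):
        start = (n + 2 ** (k - 1)) % (2 ** m)
        table.append((start, successor(start, ring)))
    return table


if __name__ == "__main__":
    for node in (0, 1, 3):
        print(node, finger_table(node, [0, 1, 3], m=3))
```

For node 1 the finger starts are 2, 3 and 5, and the corresponding finger nodes are 3, 3 and 0.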

Figure 3.9: Finger tables and key locations for a net with nodes 0, 1, and 3, and keys 1, 2, and 6 [6].

E. Node Joining:

One of the challenges for Chord in a dynamic network is that nodes can join and leave at any time. Despite this, Chord must remain able to locate every key in the network. Every node in Chord maintains a predecessor pointer, which contains both the Chord identifier and the IP address of the immediate predecessor of that node. It can be used to walk counter-clockwise around the identifier circle.

To locate every key, Chord needs the following:

- Each node's successor is correctly maintained.

- For every key k, node successor(k) is responsible for k.

- For faster lookups, the finger tables should also be correct.

When a node n joins the network, the following tasks are performed:

Initialize fingers and predecessor:

When a node n joins, it learns its predecessor and fingers by asking an existing node n´.

The function init_finger_table from the pseudocode [see Appendix] is used. Node n then finds its successors using the pseudocode [see Appendix]. To avoid a separate lookup for every entry, n checks whether the i-th finger is also the correct (i+1)-th finger. This change decreases the expected (and high probability) number of finger entries that must be looked up to O(log N), which reduces the overall time to $O(\log^2 N)$ [6].

As a practical optimization, the newly joined node can copy the finger table and predecessor of an immediate neighbor and use them to correct its own finger table.

Update fingers of the existing nodes:

When a node joins the network, the number of existing nodes that need to be updated is O(log N) with high probability. Finding and updating these nodes takes $O(\log^2 N)$ time [21].

The function update_finger_table from the pseudocode [see Appendix] updates the finger tables.

Node n will become the i-th finger of a node p if and only if (1) p precedes n by at least $2^{i-1}$, and (2) the i-th finger of node p succeeds n.

Thus, for a given n, the algorithm starts with the i-th finger of node n, and then continues to walk in the counter-clockwise direction on the identifier circle until it encounters a node whose i-th finger precedes n.

Transfer the key:

Finally, responsibility for all the keys for which node n is now the successor must be moved to n. Normally this involves moving the data associated with each key to the new node. Node n can become the successor only for keys that were previously the responsibility of the node immediately following n. Therefore, n only needs to contact that one node to transfer responsibility for all relevant keys.

F. Node Failure:

If a node n fails:

- The nodes whose finger tables include n must find the successor of n.

- The function find_predecessor from the pseudocode [Appendix 2] is run.

- Every node in the Chord ring maintains a successor list.

- The function find_successor from the pseudocode [Appendix 2] is run.

- It finds the closest living successor to the query key [Theorem 7].

- The expected time to execute find_successor is O(log N).

G. Routing:

- Circular key space (ring).

- Each node maintains two sets of neighbors.

- Each node maintains a successor list of k nodes.

- Routing correctness is maintained by the successor list.

- Routing efficiency is achieved with a finger list of O(log N) entries.

- Routing consists of forwarding to the closest node.

- Path lengths are O(log n) hops.
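The routing points above can be combined into a sketch of finger-based lookup. It is our own simplified model: the whole ring lives in one process, finger tables are built from global knowledge, and failures and successor lists are ignored. The names `closest_preceding_finger` and `find_successor` mirror the Chord pseudocode, but the bodies are an illustration rather than the original algorithm text.

```python
import bisect


def in_interval(x, a, b, inclusive_right=False):
    """True if x is in (a, b) clockwise on the ring, or in (a, b] if inclusive_right."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past identifier 0


def build_fingers(nodes, m):
    """Finger table of every node: fingers[n][k-1] = successor(n + 2^(k-1))."""
    ring = sorted(nodes)
    fingers = {}
    for n in ring:
        starts = [(n + 2 ** (k - 1)) % (2 ** m) for k in range(1, m + 1)]
        fingers[n] = [ring[bisect.bisect_left(ring, s) % len(ring)] for s in starts]
    return fingers


def closest_preceding_finger(n, key, fingers):
    """Farthest finger of n that still lies strictly between n and the key."""
    for f in reversed(fingers[n]):
        if in_interval(f, n, key):
            return f
    return n


def find_successor(start, key, fingers):
    """Return (node responsible for key, number of forwarding hops)."""
    node, hops = start, 0
    # Stop once the key lies between the current node and its immediate successor.
    while not in_interval(key, node, fingers[node][0], inclusive_right=True):
        nxt = closest_preceding_finger(node, key, fingers)
        if nxt == node:  # only possible with inconsistent finger tables
            break
        node, hops = nxt, hops + 1
    return fingers[node][0], hops


if __name__ == "__main__":
    fingers = build_fingers([0, 1, 3], m=3)
    for key in (1, 2, 6):
        print(key, find_successor(3, key, fingers))
```

Each hop moves to a node that strictly precedes the key, so the lookup always terminates; the fingers at power-of-two distances are what keep the number of hops small.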

3.2.3 Routing Geometry:

Table 3.3: Routing Variables of Chord

Variable             Value
Lookup               $O(\log N)$
Neighbors            $n^{(\log n)/2}$
Routing state        $\log N$
Optimal path         $O(\log n)$
Neighbor selection   $n^{(\log n)/2}$
Average latency      Low
Node congestion      $O((\log n)/n)$
Network diameter     $O(\log n)$

3.2.4 LOAD BALANCE:

- Distributed hash function.

- Spreads keys evenly over the nodes.

- Provides a degree of natural load balance.

According to [Ion Stoica], the number of keys per node shows large variations that increase linearly with the number of keys. Figure 3.10 (a) plots the mean and the 1st and 99th percentiles of the number of keys per node. In all cases, some nodes store no keys. To make this clearer, Figure 3.10 (b) plots the probability density function (PDF) of the number of keys per node when there are keys stored in the network.

Figure 3.10: (a) The number of keys stored per node in a node network [10].

(b) The probability density function (PDF) of the number of keys per node [6].

One reason for these disparities is that node identifiers do not uniformly cover the whole identifier space. If the identifier space is divided into N equal-sized bins, where N is the number of nodes, we might expect to see one node in each bin. In reality, the probability that a particular bin does not contain any node is $(1 - 1/N)^N$, which approaches $e^{-1} \approx 0.368$ for large N.
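The probability above is easy to evaluate numerically. The following sketch, with a function name of our own choosing, prints it for a few example network sizes and compares it with the limit $e^{-1}$.

```python
import math


def empty_bin_probability(n):
    """Probability that one of n equal-sized bins receives none of n random node identifiers."""
    return (1 - 1 / n) ** n


if __name__ == "__main__":
    limit = math.exp(-1)
    for n in (2, 10, 1000, 10 ** 6):
        print(n, empty_bin_probability(n), limit)
```

Even for small networks roughly a third of the bins are expected to be empty, so some nodes necessarily cover a much larger share of the identifier space than others.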

This problem is addressed by associating keys with virtual nodes, and mapping multiple virtual nodes (with unrelated identifiers) to each real node.

3.3 PASTRY

3.3.1 INTRODUCTION

Pastry is another peer-to-peer algorithm, proposed by Rowstron and Druschel in 2001. The major problems in peer-to-peer networks are scalability and routing efficiency. Pastry addresses them with enhanced application-level routing and object location, which differentiate it from other existing DHTs.

3.3.2 WORKING MODEL OF PASTRY

The working model of Pastry is, in brief:

- A Pastry node has a 128-bit node identifier (nodeId), which indicates the position of the node in a circular nodeId space.

- The nodeId space ranges from 0 to $2^{128} - 1$.

- The nodeId is assigned randomly when a node joins the network.

- A cryptographic hash function may be used to generate the nodeId.

- Given a message and a key, Pastry routes the message to the node whose nodeId is numerically closest to the key among all live Pastry nodes.

- The routing table has only $(2^b - 1) \cdot \lceil \log_{2^b} N \rceil + l$ entries.

- When a node joins or leaves, the routing tables can be updated by exchanging $O(\log_{2^b} N)$ messages.

For example, in a Pastry network consisting of M nodes, a message is routed in fewer than $\lceil \log_{2^b} M \rceil$ steps on average.
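To give a feeling for these quantities, the sketch below evaluates them for example parameters. The helper names and the chosen values (one million nodes, b = 4, l = 32) are our own assumptions for illustration. The logarithm is computed with integer arithmetic to avoid floating-point rounding at exact powers of the base.

```python
def ceil_log(n, base):
    """Smallest integer r such that base**r >= n, computed without floating point."""
    rows, reach = 0, 1
    while reach < n:
        reach *= base
        rows += 1
    return rows


def pastry_routing_rows(n_nodes, b):
    """Number of populated routing table rows, which also bounds the routing steps."""
    return ceil_log(n_nodes, 2 ** b)


def pastry_table_entries(n_nodes, b, l):
    """Routing state size: (2^b - 1) entries per populated row, plus l further entries."""
    return (2 ** b - 1) * pastry_routing_rows(n_nodes, b) + l


if __name__ == "__main__":
    print(pastry_routing_rows(10 ** 6, b=4))
    print(pastry_table_entries(10 ** 6, b=4, l=32))
```

A larger b reduces the number of rows, and therefore the number of routing steps, at the cost of more entries per row.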

3.3.3 ROUTING TABLE

- The routing table of a node consists of $\lceil \log_{2^b} N \rceil$ rows with $2^b - 1$ entries each.

- Each of the $2^b - 1$ entries in row n refers to a node whose nodeId shares the first n digits with the present node's nodeId, but whose (n+1)th digit has one of the $2^b - 1$ possible values other than the (n+1)th digit of the present node's nodeId.

- The uniform distribution of nodeIds ensures an even population of the nodeId space.

- As a result, only $\lceil \log_{2^b} N \rceil$ levels of the routing table are populated.

- Each entry includes the IP address of a node whose nodeId has the appropriate prefix [40].

Figure 3.11: Example of a Pastry node with nodeId 3102 where b = 2, L = 4 with base 4.

The choice of b involves a trade-off between the size of the populated portion of the routing table (about $(2^b - 1) \cdot \lceil \log_{2^b} N \rceil$ entries) and the maximum number of hops required to route between any pair of nodes ($\lceil \log_{2^b} N \rceil$) [40].

The neighborhood set S holds the nodeIds and the IP addresses of the |S| nodes that are closest to the local node. The neighborhood set is not normally used in routing messages; it is used for maintaining locality properties [40].

The leaf set L contains the |L|/2 nodes with the numerically closest larger nodeIds and the |L|/2 nodes with the numerically closest smaller nodeIds, relative to the present node's nodeId. The leaf set is used when routing messages. Typical values for |L| and |S| are $2^b$ or $2 \times 2^b$ [40].

Figure 3.12: Routing table of nodeId 65a1x where b=4. Here, base=16 and x=arbitrary suffix. [38]

3.3.4 ROUTING

Based on the Pastry routing algorithm, routing proceeds as follows when a message with a given key arrives at a node:

First, the node checks whether the key falls within the range of nodeIds covered by its leaf set. If so, the message is forwarded directly to the destination node, namely the node in the leaf set whose nodeId is closest to the key. If that node is the present node itself, the routing procedure is complete.

If the key is not covered by the leaf set:

- The routing table is used to forward the message.

- The message is forwarded to a node that shares a common prefix with the key.

- This prefix must be at least one digit longer than the prefix the key shares with the present node's nodeId, unless the appropriate entry in the routing table is empty or the associated node is unreachable.

- In that case, the message is forwarded to a node that shares a prefix with the key at least as long as the local node does, and that is numerically closer to the key than the present node's nodeId.

Figure 3.13: State of Pastry node 103220 with a 12-bit identifier space and base 4 [42].

In Figure 3.14, a request for key 103200 is sent to node 103210. Key 102022 is numerically closer to node 101203, but the request is sent to node 102303 because it shares the prefix 102 with the key. For key 103000, the routing table has no entry that shares a longer prefix with the key than the present node does. The request is therefore forwarded to node 103112, since this node shares the prefix 103. Note that this node is numerically closer to the key than the present node [42].
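The routing decision described above can be sketched as follows, using the identifiers from the example. This is our own simplified illustration: nodeIds are base-4 digit strings, wraparound in the circular nodeId space is ignored, the routing table is modelled as a dictionary keyed by (row, digit), and the leaf set members 103123, 103302 and 103330 as well as the table layout are hypothetical values chosen for the sketch.

```python
def shared_prefix_len(a, b):
    """Number of leading digits that two equal-length nodeId strings have in common."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def next_hop(local, key, leaf_set, routing_table, base=4):
    """Choose the next hop for `key` at node `local`."""
    def value(node_id):
        return int(node_id, base)

    def distance(node_id):
        return abs(value(node_id) - value(key))

    # Case 1: the key is covered by the leaf set, so deliver to the closest member.
    candidates = leaf_set + [local]
    values = [value(n) for n in candidates]
    if min(values) <= value(key) <= max(values):
        return min(candidates, key=distance)

    # Case 2: use the routing table entry that matches one more digit of the key.
    row = shared_prefix_len(local, key)
    entry = routing_table.get((row, key[row]))
    if entry is not None:
        return entry

    # Case 3: fall back to any known node with an equally long prefix that is closer.
    known = leaf_set + list(routing_table.values())
    better = [n for n in known
              if shared_prefix_len(n, key) >= row and distance(n) < distance(local)]
    return min(better, key=distance) if better else local


if __name__ == "__main__":
    local = "103220"
    leaf_set = ["103123", "103210", "103302", "103330"]          # hypothetical
    routing_table = {(2, "1"): "101203", (2, "2"): "102303",     # hypothetical layout
                     (3, "1"): "103112"}
    for key in ("103200", "102022", "103000"):
        print(key, next_hop(local, key, leaf_set, routing_table))
```

The three keys exercise the three cases in turn: delivery through the leaf set, a routing table hit with a longer prefix, and the fallback to a numerically closer node when the matching table entry is empty.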

3.3.5 NODE JOINING & LEAVING ALGORITHM

Node joining and node leaving are among the most important processes in a Pastry network. In this section, we explain how nodes join and leave the Pastry network, starting with a new node that joins the system.

A. Node Joining

- When a node joins the Pastry network, its nodeId is computed, for example with a hash function such as SHA-1.

- The new node also knows a nearby neighbor node according to the proximity metric.

- The new node asks this neighbor to route a join message whose key is equal to the new node's nodeId.

- The message is routed to the destination node whose nodeId is numerically closest to that of the new node.

- In response to the join message, the nodes encountered along the route, from the neighbor to the destination node, send their state tables to the new node.

- The new node uses this information to initialize its own state tables.

- Finally, the new node informs the nodes that need to be aware of its arrival.

B. Node Leaving

Nodes in a Pastry network commonly fail or depart without any notice. In brief:

- When its immediate neighbors can no longer communicate with a node, the node is assumed to be dead or unavailable.

- To replace a departed node in the leaf set, its neighbor in the nodeId space contacts an available live node and requests that node's leaf table.

- The failure of a node that appears in the routing table of another node is detected when that node tries to contact the failed node and receives no response.

- This does not normally delay the routing of a message, since the message can be forwarded to another live node. However, a replacement entry must be found to preserve the integrity of the routing table.

- To replace the departed node's entry in the routing table, a node first contacts the node referred to by another entry of the same row and asks for that node's entry for the failed position.

- If none of the entries in the row points to a live node with the appropriate prefix, the node next contacts an entry $Z_{L+1}^{i}$, $i \neq c$, thereby casting a wider net. This procedure is highly likely to eventually find an appropriate node if one exists.

- Although the neighborhood set is not used for routing messages, it must be kept up to date because it plays a key role in exchanging information about nearby nodes. For this reason, a node periodically contacts each member of its neighborhood set to check whether it is still alive.

- If a member does not respond, the node asks the other members for their neighborhood tables, checks the distance of each newly discovered node, and updates its own neighborhood set accordingly.

Pastry uses an optimistic approach to controlling concurrent node arrivals and departures. Since the arrival or departure of a node affects only a small number of existing nodes in the system, contention is rare and an optimistic approach is appropriate [40-50].

3.3.6 LOCALITY

Another important property of Pastry is locality, which is defined with respect to the proximity metric. In this section, we discuss the properties of Pastry's routes with respect to the proximity metric.

The proximity metric is assumed to be a scalar value, for example the round-trip time between two nodes. The aim is to keep the distance traveled between nodes low. Here we discuss two of Pastry's locality properties that are relevant to routing performance [40].

A. Short Routes

As mentioned above, each entry in a node's routing table is chosen to refer to the nearest node, according to the proximity metric, with the appropriate nodeId prefix. As a consequence:

- At each step, a message is routed to the nearest node with a longer prefix match.

- The simulation results in [39] show that the average distance traveled by a message is short, between 1.59 and 2.2 times the distance between the source and the destination in the underlying network.

B. Route convergence

- Route convergence concerns the distance traveled by two messages sent to the same key before their routes converge.

- The simulations in [39] show that the average distance traveled by each of the two messages before their routes converge is approximately equal to the distance between their respective source nodes.