Dionysius: a Peer-to-peer Database Management System

(1)

Master Thesis

Software Engineering Thesis no: MSE-2009:05 April 2009

Dionysius

a Peer-to-peer Database Management System

Luca Guadagnini

School of Engineering

Blekinge Institute of Technology Box 520

SE 372 25 Ronneby Sweden

(2)

This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in partial ful-

llment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 40 person-weeks of full time studies.

Contact Information:

Author:

Luca Guadagnini

E-mail: luca.guadagnini@gmail.com University advisor:

Dr. Mikael Roos School of Engineering University co-advisor:

Dr. Robert Feldt School of Engineering School of Engineering

Blekinge Institute of Technology Box 520

SE 372 25 Ronneby Sweden

Internet Phone Fax

: www.bth.se : +46 457 38 50 00 : + 46 457 271 25

(3)

Abstract

With the introduction of the peer-to-peer paradigm in the world of software, a lot of applications have been created in order to such architecture. Most of them are developed for providing a data sharing service among users connected to a network and programs such as Napster, Gnutella, eMule and BitTorrent have became the so called killer-applications. However some eorts have been spent in order to develop other solutions with the usage of peer-to-peer paradigm. In the case of databases some projects are started with the general purpose of sharing data sets with other databases. Generally they push on the idea of providing the data contained in their database schemes with other peers in the network showing concepts such schema matching, mapping tables and others which are necessary to establish connections and data sending.

The thesis analyzes some of such projects in order to see which of them is the most dened and well-supported by concepts and denitions. Hyperion Project of the University of Torono in collaboration with the University of Trento is the most promising and it aims to be one of the rst Peer-to-Peer Database Management Systems. However the common idea of considering the peer-to-peer paradigm equal to data sharing - in the way presented by applications such as Napster or others - leads to a lot diculties, it is hard to handle the data sets, some operations must be done manually and there can be some cases where the peer-to-peer paradigm is not applied at all. For this reason the goal is to dene and show the concept of peer-to-peer database built from the scratch with a suitable DBMS for it.

A real denition of peer-to-peer database has not been ever made and here for the rst time we tried to give one according to our vision.

The denition depends on some precise concepts such global schema - which is the original design of the database -, sub-schema - a well logical dened sub-set of entities of the original schema - and binding tables - necessary to allow the creation of constraints and relations among the entities. Then to show the validity of such concepts and how a management system for peer-to-peer databases can be developed and used, a prototype (named Dionysius) has been realized by modifying HSQLDB - an ordinary DBMS developed in Java - and adding the peer-to-peer platform by using the JXTA libray set.

(4)

List of Figures

2.1 Database entity representation . . . 12

2.2 Structure of a DBMS . . . 12

2.3 Multiple user access . . . 13

2.4 er diagram . . . 15

2.5 Relationship Classication . . . 16

3.1 A simple ER diagram for a Book Shop Database . . . 23

3.2 The Book Shop Database . . . 24

3.3 Architecture of a Peer-to-peer Database Management System . . . 26

4.1 Architecture of HSQLDB . . . 39

5.1 The ER diagram and the relational model of our simple database Library . . . 48

5.2 Dionysius has just been started . . . 49

5.3 Poseidon sent a hi-there message ans Zeus replies with welcome message . . . 49

5.4 The DatabaseManager of HSQLDB is connected to our Library Database . . . 50

5.5 The result of the creation of writers in peer zeus . . . 51

5.6 The response of the query of Test 2.2 in peer zeus and poseidon . . . 53

5.7 The result of the select. . . 54

5.8 The result of the inner join . . . 55

5.9 The ER diagram of the second version of database Library . . . 56

5.10 The relational diagram of the second version of database Library . . . 57

5.11 The results of the creation of table writers . . . 58

5.12 The result of the query with the INNER JOIN . . . 59

5.13 The result of the creation of the table expositions in the peer zeus and poseidon . . . 60 5

(9)

6 LIST OF FIGURES

5.14 The result of the query of Test 4.4. . . 60

5.15 The messages among the peers of the network. . . 61

5.16 The results of the simple select before the detection of the external peers . . . 62

5.17 The results of the simple select after the detection of the external peers . . . 62

(10)

Chapter 1

Introduction

In this chapter we are going to introduce the main objectives of our thesis, the goals and the solutions which we want to present and show.

1.1 Background

During the course of Advanced Software Engineering Project we presented a research concerning the concept of peer-to-peer database and the necessary relational model for implementing it. In the research we shown in which way it is possible to design a peer-to-peer database by using a particular database management system (hereafter DBMS) named Hyperion.

Hyperion is a project developed by the University of Toronto with the collaboration of the Faculty of Science of the University of Trento and leads to present a valid solution for enriching the data set among dierent databases by using a peer-to-peer network. To do that some rules must be followed otherwise it is not possible to establish a connection amount these databases and most of all share their data. For starters the database can be structurally dierent but they must concern the same conceptual idea, in other words if a database has been designed for storing books of a library or time ights of an airport, the other databases have to follow the same purpose. This is due to a special feature implemented in Hyperion, the mapping tables. The mapping tables are common database tables which have the goal of linking two columns of two structurally dierent tables (but conceptually the same) by mapping the equal values so that it is possible to avoid the redundancy of data among the databases. Such mapping tables are not automatically created by Hyperion, but they need the intervention of a human being for the creation, to establish which are the values to be mapped and for a constant control in the case some modications are performed into the linked tables since these edits also involve the mapping tables themselves.

7

(11)

8 CHAPTER 1. INTRODUCTION Hence after this study we realized that Hyperion introduces some interesting concepts about peer-to- peer database, but at the same time a lot of diculties in order to maintain such ideas xed together, due to, most of all, the purpose of taking the data from existing databases and not creating a new one from the scratch. Instead we found the idea to build a database from the bottom more attractive, not only for presenting in a better way a clear vision of what a peer-to-peer database is, but even for showing another kind of approach against the continuous increasing size of the large databases. The introduction of systems such as parallel database and distributed database aim to reduce the problem of large databases and unfortunately some issues bound to database concepts such as data integrity and atomicity are introduced as well. Thanks to the new concepts that we are going to dene, it is possible not only to prevent such issues, but even a simplication of the database management.

The thesis thus aims to dene some basic concepts about peer-to-peer database such as sub-schema and binding tables, and then shows how we can apply such denitions to a real DBMS. To do that we considered the possibility of developing a prototype of a hypothetical peer-to-peer database management system (hereafter P2PDBMS), by modifying a DBMS implemented in Java called Hypersonic and by using the JXTA libraries for developing the peer-to-peer system inside of it. After that we are going to show some performance tests against the ones of an ordinary DBMS, in order to proof the eectiveness of the project.

1.2 Objective

The main purpose of the thesis is to provide at the rst time some new concepts bound to what a peer- to-peer is. The overall idea which denes that the peer-to-peer paradigm is a synonym of data sharing is some way totally wrong, since it can help to nd other potential applications. In the eld of the databases most of the projects which have involved the peer-to-peer paradigm are focused on the goal of sharing data sets among dierent databases connected through a peer-to-peer network and all such projects are trying to present their point of view of what a peer-to-peer database management system is. One of the most mature and promising projects is called Hyperion and the main objective is to enrich the data set of a database with data of another database, structurally dierent, but with the same purpose (i.e. storage of the ights of an airport or the books of a library, etc.). To do that Hyperion introduces an interesting feature which allows such kind of sharing: the mapping tables.

After a rst study of the concepts behind the Hyperion Project, in order to have some basics where we can start from, we want to present our idea of peer-to-peer database. Such concepts such as sub-schema and binding tables are necessary to develop a peer-to-peer database directly from the scratch and how the peer-to-peer architecture can be used for it. Finally we shall present a prototype not only to apply

(12)

1.3. RESEARCH METHODOLOGY 9 and show the denitions that we provided, but also to show how a management system for peer-to-peer databases should look like. In our opinion a peer-to-peer database management system should not be too dierent from the ordinary one and instead it should simplify the work of the management of a peer-to-peer database like it was an ordinary one.

Most of the validation using the prototype is provided by the implementation of the binding tables, how they can help with the query solving and how they are created. By providing some test cases we can show how it is possible to handle a DBMS for peer-to-peer databases without any human intervention in the critical phases. Even in the case when our P2PDBMS catches the presence of other peers the users should not feel the need to intervene, because everything should be managed automatically like an ordinary peer-to-peer application.

To be sure to not lose our way, we dene some questions which establish the overall research:

Which are the main database projects which apply the peer-to-peer paradigm?

Why is Hyperion Project the most suitable project for our purpose?

How can we create a denition from the scratch of peer-to-peer database?

Which are the main concepts behind the idea of peer-to-peer database?

Can we show a possible architecture for peer-to-peer database management system?

Can we show a validation of our concepts by developing a prototype of P2PDBMS?

The rst two questions are used to have the necessary background where we can start from, then the middle ones we try to provide a set of concepts behind the idea of a peer-to-peer database and the last two for applying the concepts by prototyping.

1.3 Research Methodology

Our research is to dene something not yet shown by providing a new point of view and concepts of a solution, in order to nd a better and simplier way to reach the purpose. To show such denitions we apply the prototyping method, in this way we are not only going to show the validity of our ideas, but also the goodness of the project itself.

(13)

10 CHAPTER 1. INTRODUCTION

(14)

Chapter 2

Database Management Systems and Related Work

For this chapter we aim to give the base in order to be able to understand the study we accomplished.

For starters we just only give some basic notions about what the databases are and little by little we will get close to the concepts of what a distributed database and parallel database are. This wants to be a clarication for getting better the dierences with Hyperion Project and peer-to-peer database (hereafter PDB).

2.1 Database Management Systems

By denition a database management system (hereafter DBMS) is a collection of interrelated data and a set of programs to access such information. The collection of data is ocially called as the database and contains relevant information usually concerning an enterprise. The DBMS's are designed to handle the database information in the best way in terms of eciency and convenience, by dening the structure for the storage and providing the necessary operations for manipulating the data[1, 2].

More than that the DBMS must provide and ensure safety for the data and for all the operations committed, by avoiding system crashes, malicious attacks or unauthorized access. In such way it is always possible to provide or reply correct results, because in this eld the data comes rst every time and the system must always focus upon this principle.

11

(15)

12 CHAPTER 2. DATABASE MANAGEMENT SYSTEMS AND RELATED WORK

2.1.1 Database

As we said before, a database is an organization of data. Such organization must be well-structured in order to get the necessary data not only in the most meaningful way, but also for allowing an access to them in dierent logical orders. Usually a database entity is represented with the following gure:

Figure 2.1: Database entity representation

2.1.2 Structure and Objectives of DBMS

In practical terms a DBMS is a software that allows the translation from design model of the database to the physical one and vice versa. The design model is usually referenced as the conceptual schema and it represents the logical idea of the organization of the data. Such schema is converted by using an appropriate interface language understandable by the DBMS which converts the sent commands in a physical database.

Figure 2.2: Structure of a DBMS

In order to maintain the data always acceptable for the user, the DBMS must satisfy some properties:

data availability: all the users who access to the DBMS must always have the data in the meaningful way as they requested and ordered;

data integrity: all the information of the data must be correct, otherwise it is considered as unreliable data;

data security: the DBMS must provide a security system (such as login and password) in order to avoid malicious operation on the reserved data;

data independence: the system allows the manipulation of the data in every manner the user request, but does not permit to see how the data are saved and handled.

(16)

2.1. DATABASE MANAGEMENT SYSTEMS 13 2.1.3 Advantages of using a DBMS

Some advantages of the DBMS are listed below:

multiple user access: more than one user can access to the DBMS for interacting with the stored data (g.2.3)

Figure 2.3: Multiple user access

centralized data management: all data are handled by only one system and this kind of approach leads to obtain a system more ecient and with less data redundancy;

data independence: the DBMS permits to provide a physical data isolation from the logical of an application that use them, that is, whether a modication is performed upon the physical data, the

(17)

14 CHAPTER 2. DATABASE MANAGEMENT SYSTEMS AND RELATED WORK application does not care about that, because logically the data remain the same; however in other case whether the modication also involves the logical organization of the data, the application must be rewritten;

data consistency: the system avoid the chance to have dierent copies of the same data but dierent values, otherwise this situation leads to unpleasant results.

However one of the most dicult task that a DBMS must face, is the threat of the multiple access to the same data because it can corrupt the data integrity property.

2.2 Data Models of DBMS

As we mentioned before, in order to dene the structure of a database it is necessary to design a model which represents the organization of the data. Such data model usually referred as the conceptual schema, can be represented by using several diagrams which are able to describe all the properties concerning the data such as the relationships, semantics and constraints. In the following sections the main diagrams are described.

2.2.1 Entity Relationship Model

The basic idea of the entity-relationship model (for short er) is to describe the world around as a set of objects (or entities) and relationships among those objects. In this way is easier for the people what a database represents, because they are able to take such described objects and put them in a real context.

However the main goal of the er diagram is more than that and aims to provide a high-level of abstraction of the database entity, in order to let the designer to not worry about the physical structure and care about the modeling of the involved enterprise only.

There are several shapes which compose the model:

entity: represents the object involved in the enterprise, it can be an employee, a book, a status and so on, and by denition it is an object that can be uniquely identied[25]; the usual shame for it is a rectangle;

attribute: represents a quality or a property of the entity and it usually represented as a ellipse;

relationship: when an entity is related to another a relationship between them is established; the relationship is shaped as a diamond;

(18)

2.2. DATA MODELS OF DBMS 15 all such model objects are connected through lines which describe a well-dened association among them (g. 2.4).

Figure 2.4: er diagram

2.2.1.1 Classication of the Relationships

There exist dierent kind of relationships among the entities, one-to-one relationship, one-to-many relationship and many-to-many relationship.

one-to-one relationship: an example which represents this kind of association can be between Major and its City, a person can be major for only one city and the city can have only one major per time;

one-to-many relationship: instead for this case we have that, for instance, a Mother can have more than one Son, but the Son can not have more than one (natural) Mother;

many-to-many relationship: the last class of relationship says that, for example, a Student has to do more than one Project and the Project can be done by more than one Student.

Sometimes when it is necessary on the (many-to-many) relationship can be xed some attributes in order to enrich the information about the association between two entities. In this case we will use a particular entity which expresses the relationship named associative entity. For instance in the case of

Student/Project, if we want to know when a project has been started and nished, we can add the attributes start date and end date on the relative relationship and convert it in an associative entity.

2.2.2 Relational Model

The structure of the relational model consists in a set of tables, where every row in them usually express the relationships among the data. The headers of the tables or relation are called attributes and in order to establish the relationships some of them can be used as the primary, candidate, super or foreign key. The primary, candidate and super key have the same purpose that is they have to identify uniquely the row

(19)

16 CHAPTER 2. DATABASE MANAGEMENT SYSTEMS AND RELATED WORK

(a). one-to-one relationship

(b). one-to-many relationship

(c). one-to-many relationship

(d). associative entity

Figure 2.5: Relationship Classication

or tuple of that particular relation, with one attribute or a minimum set of attributes or all the attributes respectively. Such keys are usually recalled by the foreign key of another (or the same) relation in order to refer to that tuple.

When we want to dene the conceptual schema of our relational database we dene the set of relation schemes which are the denition of all relations as they were variables (consider the example showed in

g. 2.5.d):

student = (student_number, name, surname, age, country) detail = (student_number, project_number, start_date, end_date, notes)

project = (project_number, name, specications)

all the relations have a primary key which is enlightened by the underlined attributes, a particular case can be mentioned observing the relation detail which actually has the primary key and the foreign keys on the same attributes.

2.3 Distributed and Parallel Database Management Systems

The modern architecture of a database management system is based upon the server-client paradigm.

This kind of architecture usually provides a server system where the database is installed and a client

(20)

2.4. PEER-TO-PEER DATABASE SYSTEMS 17 system which connects to the server in order to satisfy its own requests. However such way of handling the databases has became obsolete due to the huge number of requests from the network and the continuous increasing size of databases and these issues can lead to heavy ineciency access to the data. For those reasons other solutions have been found such as parallel database management systems and distributed database management systems.

2.3.1 Parallel Database Management Systems

The parallel database management system (hereafter PDBMS) is designed to improve the access to the data by using the parallel paradigm. The main dierence upon the ordinary DBMS is that instead of managing the incoming requests serially (as they were a sort of queue), the PDBMS handles all of them parallely, in this way it is able to serve them simultaneously and in shorter time than usual. This is possible due to usage of several CPU's and hard disks in one machine, allowing ecient access and creation of very large databases.

Such systems are usually used in distributed systems.

2.3.2 Distributed Database Management Systems

Instead to use only one system the distributed database management system (hereafter DDBMS) allows to use more than one computer system connected by a network. The main database in this case is no more installed in one system only, but it is replicated in more servers. This kind of approach permits to have several advantages such as ecient access to the data, the chance to avoid dangerous implications due to damages (any kind) at the system and so on. To increase the eciency of such systems is possible to combine them by using PDBMS's.

However one of the most problematic issue of the DDBMS is the integrity of the data, because of the replication of the database itself, since there is certainly that the data are always kept updated or handled in the same manner.

2.4 Peer-to-Peer Database Systems

The general purpose of the peer-to-peer database systems is to share data of their heterogeneous or non- heterogeneous databases by using a peer-to-peer network. Most of them share the same basic idea and only dier in some technical things in order to simplify the management of the system by providing coordination rules and minimizing the human intervention. The most important and the only one that we are going to

(21)

18 CHAPTER 2. DATABASE MANAGEMENT SYSTEMS AND RELATED WORK consider during our research and development is Hyperion, as it is also the most developed, however after its section we are going to talk a little about the other solutions.

2.4.1 Hyperion Project

The Hyperion Project of the University of Toronto aims to obtain a system which is able to share the data among databases with dierent schemes, but designed for the same domain. The project is strongly based on one of the main concepts of these days: peer-to-peer paradigm. The area of the peer-to-peer is one of the most developed in this last decade and it is dened on the idea that every computer system connected to a network is independent and can leave and join to it whenever it wants, because there is no control on the nodes. Such kind of network can be used for many reasons and the most common is the data-sharing.

In the case of Hyperion, the project wants to reach the goal of achieving a DBMS applying such peer- to-peer paradigm with data-sharing among dierent databases. The only constraint needed by Hyperion is all the present databases in such network must concern the same problematic domain. In a few words all the databases, for instance, must designed to create a database for a library. Then by using the peer-to-peer network, such databases can enrich their data set requesting information to the connected DBMS's.

2.4.1.1 Mapping Tables

Hyperion to reach its goal has to link somehow the tables which have the same conceptual correspondence in the other databases. The link is established by mapping the values of the columns of such tables, in order to understand which are the values that the DBMS has to consider equal and which are the new ones. This mapping is stored in a special table called mapping table. To get a better idea of what exactly those tables are, we will make an example.

Let us suppose we have two connected databases (DB-A and DB-B) and both are designed for a library, but not in the same way. However we want to join their data sets by using Hyperion. For instance in both we have a table concerning the book storage:

DB-A · books

isbn title synopsis

123A Moby Dick The story of the White Whale 234B Alice in Wonderland A trip in a wonderful land 345C The Old Man and the Sea A man with the shes

···

(22)

2.4. PEER-TO-PEER DATABASE SYSTEMS 19 DB-B · tbl_bks

book_code name release date

ASD1 The Treasure Island 1883

QWE2 Strange Case of Dr.Jekyll and Mr.Hyde 1886

MNB3 Through the Looking-Glass 1871

what we need to link such tables are two mapping tables, one for the primary keys and another one for

title and name since both columns refer to the title of the book:

DB-A books2tbl_bks_0 = (isbn, book_code) books2tbl_bks_1 = (title, name) DB-B tbl_bks2books_0 = (book_code, isbn)

tbl_bks2books_1 = (name, title)

For columns such as synopsis and release date that have no correspondence in the other table, there is no need to dene a mapping table, so they will be considered as new data.

When the mapping tables are established, every time a select command has been committed the only date that are shown by the Hyperion come from the local table and those are considered new. For instance if we send a simple query like this select * from books to the database DB-A, we should obtain such result:

books

isbn title synopsis release date

123A Moby Dick . . . nil

234B Alice in Wonderland . . . nil

345C The Old Man and the Sea . . . nil

ASD1 The Treasure Island nil 1883

QWE2 Strange Case of Dr.Jekyll. . . nil 1886 MNB3 Through the Looking-Glass nil 1871

2.4.2 Otheer Peer-to-peer Database Systems

Other researches have been done in order to provide good approaches for the idea of a peer-to-peer database systems.

(23)

20 CHAPTER 2. DATABASE MANAGEMENT SYSTEMS AND RELATED WORK 2.4.2.1 XPeer

XPeer[28] a system based on the denition of XML data structures and they are shared by using a peer- to-peer network. Essentially there is no database behind the architecture, just only the concept of data sharing through a peer-to-peer networks where the owners of the peers are able to generate their data structures by using XML and start to share their data with other peers. The queries are solved by using XQuery which is the system that XPeer is based on.

2.4.2.2 coDB

coDB[27] it is a system quite similar to Hyperion, but it focuses on nding a good solution in order to solve the queries among dierent heterogeneous databases.

2.4.2.3 Dìgame

Dìgame[25] is another peer-to-peer database system which looks like coDB or Hyperion, but it adds an additional feature. When we actually demands for data through the network, the data that we obtain, we can store them locally in order to replay to other similar requests when the original source is not available.

2.4.2.4 Mapster

Mapster[26] instead provides the same basic idea, but adding the concept of schema matching. In other words when the user decides to share its own database, the schema of the database must be match to other database schemes in the peer-to-peer network in order to be able to share/request data among them.

As we can see all these projects are not based upon the idea of creating a pure peer-to-peer database, but only sharing the data of our own local database with other databases by using a peer-to-peer network as communication means. Instead our idea is more focused on using the peer-to-peer concept as interconnec- tion branch among the sub-schemes (which belong to a bigger original schema as we shall see in the next chapter) distributed through a network.

(24)

Chapter 3

Dionysius, an introduction

3.1 Peer-to-peer Database Management System

The idea of peer-to-peer database management system that we want to present in this chapter is quite dierent from which presented by Hyperion Project. As we explained before Hyperion aims to get a set of conceptually similar databases and interconnect them in order to enrich the content of the local DBMS.

The project is shown as a sort of interface for the local management system, adding a peer-to-peer interface for handling the connections among the databases and a mapping table management in order to maintain the associations among the tables. As we can see Hyperion can not be classied as an ordinary DBMS, not only because of the application itself, but also because of its goal that is using more dierent databases as they were part of a global - distributed - database. In other words if we think about a conceptual schema of a common database, such schema is blandly replicated to other computer systems over a network.

However the most dicult task of Hyperion is to create and maintain the mapping tables, just because of the goal of the project itself. Instead it could lead to more interesting results if we design a peer-to-peer database from the scratch, that is, we actually aim to obtain a conceptual schema which is possible to split in dierent sub-schemes and put them in dierent computer systems.

3.1.1 Peer-to-peer Database

By denition[25] a database schema is referred to be the overall idea of our organized data set. It can be represented as a collection of objects, best known as the tables or relations whether we refer to relational model, which are connected in a meaningful way. Such schema can be layered in three dierent parts:

physical schema: is the database described at the lowest level;

21

(25)

22 CHAPTER 3. DIONYSIUS, AN INTRODUCTION

logical schema: is the conceptual schema of the database;

sub-schema: that is dierent views of the database which do not necessarily involve all the objects of the design.

Therefore if we want to apply the peer-to-peer paradigm in such concept, the only layer we have to handle in a dierent way is the physical schema. Since the physical schema involves the physical structure of the database, we can focus to nd out how a peer-to-peer database can be conceptually made.

For starters we consider the er diagram shown in g. 3.1, the conceptual design of a database for a Book Shop. Suppose that from the specications is declared that the building of the Book Shop has three

oors and each of them has been organized by sections: the rst oor has the new-released books and the cash for payment, the second oor has the sections regarding novels, history, books for children and the third one has art, computer science and school. All the oors have one computer and the last two also have several terminals connected to the local system, in order to allow the costumer querying the database and looking for the book he wants.

Hence the idea is to provide every computer with the book-database, however a problem arise. Each computer of the oor is not so recent and it is quite hard to nd a way to guarantee an ecient access to the data in the case of mass of costumers in the shop. A possible solution is to split the database design in sub-schemes and put in a meaningful way every sub-schema in one computer only, in order to reduce the data set and the workload of the single machine as well.

To get better the idea of what we want to do, we can turn the er diagram of g. 3.1 in relational schemes:

conceptual schema book = (ISBN, Title, Synopsis) detail = (ISBN_Book, Shelf, Copies)

writer = (ISBN, Name_Birthdate)

publication = (ISBN, Name_Publisher, Release_Date) author = (Name, Birthdate, Biography)

publisher = (Name, Telephone, Address) section = (Name, Floor)

and see which are the right relations for the systems present in the shop. What we know is that the oors of the shop are organized by sections, therefore we can organize the database in a such way to put only the necessary information for that oor only. For instance the last two oors are strongly based on the sections and the books, we can thus dene this kind of sub-schema for those oors only:

(26)

3.1. PEER-TO-PEER DATABASE MANAGEMENT SYSTEM 23

Figure 3.1: A simple ER diagram for a Book Shop Database sub-schema-1

book = (ISBN, Title, Synopsis) detail = (ISBN_Book, Shelf, Copies)

writer = (ISBN, Name_Birthdate)

(27)

24 CHAPTER 3. DIONYSIUS, AN INTRODUCTION This sub-schema has been dened just thinking about which are the main keywords usually typed by the costumer: the title of the book and the writer. In the same way we can dene the sub-schema for the

rst oor which actually have the remaining relations:

sub-schema-2

author = (Name, Birthdate, Biography)

publication = (ISBN, Name_Publisher, Release_Date) publisher = (Name, Telephone, Address)

section = (Name, Floor)

simply because in the rst oor there is not any terminal for the costumers. Hence what we will have is a set of databases which are connected among them and sharing their own data in order to respect the specications and most of all the original schema (g. 3.2).

Figure 3.2: The Book Shop Database

Now as we have an idea how a data set of a peer-to-peer database is organized, we may compose its denition. By denition an ordinary database is dened as follow:

A database is a well-organized collection of data that are related in a meaningful way, which can be accessed in dierent logical orders[1]

(28)

3.1. PEER-TO-PEER DATABASE MANAGEMENT SYSTEM 25 hence in a similar way we can obtain the denition of peer-to-peer database:

A peer-to-peer database is a collection of data sets organized in dierent logical orders known as sub-schemes by applying the peer-to-peer paradigm, which can be accessed as an ordinary database.

In other words a peer-to-peer database is the implemented conceptual schema, known as the global schema, divided in two (the minimum composition number) or more sub-schemes. Such sub-schemes are distributed over a precise number of peers which can be the same of the sub-schemes or greater. In fact we can actually replicate the structure of a sub-schema in more peers instead of only one, however in order to not compromise the declaration of the global schema, the cardinality of the set of the current sub-schemes in the network should not be less than the number of the divisions of the global schema and each sub-schema must be dierent from the other.

A situation not shown in the example of above can be presented by the replication of one table in more sub-schemes. Actually when a sub-schemes is dened with its own tables it does not mean we can not use those tables in other sub-schemes. Over and above that a sub-schema may be composed by only one table to all tables of the original schema, hence we can actually have a lot of compositions which match the global schema: for instance if a conceptual schema is composed by eight tables we can nd from one to thirty-six combinations, and generally the maximum number of combinations for any global schema is simply dened by

Xn i=1

iwith n the number of tables.

3.1.2 Structure of a P2P-DBMS

The layered structure of a peer-to-peer database management system would not be too dierent from the ordinary one, with the only exception that we have to add a layer concerning the peer-to-peer management.

From the gure 3.3 we see the peer-to-peer system is contained in the same module of the query parser, since the parser must do two particular things: rst as a common parser has to check if the commands sent by the user are valid or not, second it has to see which are the tables involved in the script, but with some devices. Since the requests involve a peer-to-peer database, the tables declared in the script are not only present locally, but also externally. For this reason the parser must check which are the local tables and the external ones and this is possible because every local database knows how the global schema is organized. For instance if we have to solve a query like this:

select books.title, publishers.name from books inner join (

(29)

Figure 3.3: Architecture of a Peer-to-peer Database Management System

publication inner join publishers

on publication.publisher_name = publishers.name ) on books.isbn = publication.isbn;

the system already knows that the table books and publishers are not in the same DBMS, therefore it

(30)

3.2. PEER-TO-PEER DATA MODEL 27 has to split in one or more sub-queries, depending on how many sub-schemes are involved in the design.

If we consider the example shown in the above section, the original query is going to be split in this way:

query for sub-schema-1:

select books.title from books;

query for sub-schema-2:

select publishers.name

from publication inner join publishers

ON publication.publisher_name = publishers.name.

After that, the queries are sent by using the peer-to-peer system which is not actually only for sending sub-queries, but also for receiving them.

Unfortunately reached this point we discover a relevant issue concerning the join operations, since breaking the original query in half a join is lost. As the sub-schema-1 has not any local table named

publication a join between this table and books is not possible. To solve this problem we have to introduce the concept of binding table.

3.2 Peer-to-peer Data Model

In the paper Design of peer-to-peer database with Hyperion Project[22] has been explained how to realize a data model for a peer-to-peer database when we want to use Hyperion. The model does not dier too much from the ordinary ones, there are only some devices in order to identify the sub-schemes and how to dene the mapping tables. Such kind of design is also possible to use in our case, but with some well-dened dierences, since we are working with a dierent system from Hyperion.

3.2.1 Concept of Binding Table

In Hyperion Project has been presented the concept of mapping table, that is a particular table which maps the values of two columns of two dierent tables in order to establish which are the equal values between two structurally dierent tables, but coming from the same conceptual idea. A side eect of such tables is that they unintentionally create a bind between a table and another one, however such bind is very dicult to handle because of the design of the project itself. One of the most tough task of Hyperion

(31)

28 CHAPTER 3. DIONYSIUS, AN INTRODUCTION is to be able to synchronize the mapping, since when a structure of a table changes, the values of the mapping tables should change as well to avoid mapping inconsistencies.

Such issues come out since Hyperion has to deal with a lot of dierent data models, however some started projects have been presented such as Tomas[17] and Clio[18] in order to solve them. Clio is a semi-automatic tool for the creation of the mapping tables, by using dened (by a human being) data schemes of the databases on which we want to establish the mapping[19, 20] and Tomas is to handle them when the structure of the databases changes, by checking the semantic and the syntax of the mapping schema[21]. Both goals of the projects led us to the solution we were looking for.

Due to unintentional eect of the mapping tables we have realized we can get our solution starting from that and most of all from the conceptual design of Tomas and Clio. In fact if we focus on the idea that for dening a bind from a database to another we have to know the conceptual schemes, we could also nd a way not only to create such bind automatically, but we can also handle it an easier way. Let us see an example to understand better the concept by comparing the mapping table with our idea by using the relation model.

Consider the example shown in section Peer-to-peer database where a conceptual schema has been split in two sub-schemes. Conceptually the tables book and publisher have a many-to-many relationship dened with the help of the table publication, but it is compromised by the division. However it might be possible to restore such relationship replacing the table publication with two mapping tables for each sub-schema[22]:

book2publisher_0 = (isbn, name) for sub-schema-1

publisher2book_0 = (name, isbn) for sub-schema-2.

This kind of approach though exceeds from the real purpose of mapping, because even if the mapping can be read as an association of records, for the system such rows correspond to information which are conceptually the same. Therefore instead to call them mapping tables we should dene them in another way, not forgetting the idea of establishing connections between two databases. Since we want to declare a bind between two dierent tables we simply rename them as the binding tables.

3.2.2 Binding Management

Once we have established the binding table between a table and another, the hard task is the management of it. As we explained before the bind is dened with the association of values of two dierent columns of the tables, but the problem is that every time such values are changed, because of a committed update or whatever, the values stored in the binding tables may change as well. The example in tab. 3.1 shows

(32)

3.2. PEER-TO-PEER DATA MODEL 29 a possible situation when we use a binding table to connect the table books to table publishers.

Since the bind is established by using the primary keys for the associations, we have to copy the values of the keys in the binding table.

books

isbn title

123234A Moby Dick 234345B Dracula

publishers

publisher_name telephone North Ed. 555-1234 Tea Books 555-2345

books2publishers isbn publisher_name 123234A North Ed.

234345B Tea Books Table 3.1: Binding table example

However if for instance we have to change the name of the publisher, because of a typing error or the publisher has decided to change its name, we have to modify the value in the binding table too. The worst scenario can be the case when we have to add another column to the structure of the primary key, since, for instance, the code isbn in books does not guarantee the univocity of the record anymore.

Because of that we will have to modify a lot of things: the denition of the primary key in the involved table, the structure of the associated binding table and last, but not least for sure, all the values inside the binding table. Such operation is too huge to handle, most of all in the situation of large sets of data.

To avoid such issue it might be necessary to dene a convention. As we have seen the association between a table and a binding table is established by using the primary keys, in our case it could be much better if we dene a primary key which is not related to the entity described by the record of the table.

Every time we dene an entity we use to dene as the primary key one of the properties of such entity which is able to identify the object univocally. However it is always possible to meet some situations where such property can not always guarantee the univocity. For instance if we want to identify a person by his name and his birth date, we can always meet a case where such identication is not enough, because even if it is rare, we might nd another person with the same name and the same birth date. This is a simple example, but it is enough to get the idea of the problem.

For this reason we could say that every time we want to declare a primary key, we should identify the record itself with an information which does not recall any property of the entity. In the tables of tab. 3.2 we are showing the example of above modied adding the new dierent primary keys according the convention that we have established. As we can see now all the primary keys are dened numeri- cally, since the modern DBMS give to the developer the chance of dening a primary key by using an auto-incremental integer number, in other words every time a row is added to the table, the column is incremented automatically by the system itself.

Such manner to declare a primary keys guarantees several things:

(33)

30 CHAPTER 3. DIONYSIUS, AN INTRODUCTION books

IDbook isbn title

789 123234A Moby Dick 456 234345B Dracula

publishers

IDpublisher publisher_name

345 North Ed.

890 Tea Books

books2publishers IDbook IDpublisher

789 345

456 890

Table 3.2: Binding table example with new primary keys

every modication we perform on the values of the columns do not aect the binding tables;

the number univocally identies the row of that table only;

the information regarding the entity only concerns its own table, the values of the columns should not be present in any other table;

the identication of the entity is univocal for sure, since its number is only decided when the entity is added to table;

the number can not be changed because it is only system concern.

As we said before, the implementations of such kind of tables occur automatically and in particular when a table has foreign key which references to an external table. How the binding tables are implemented will be explained further.

3.2.3 ER and Relational Diagram for a P2PDB

For the design phase of a P2PDB it is not necessary to develop a its own data model, the ordinary ones are enough. Data models such as ER diagram or relational diagram are sucient to establish how our peer-to-peer database should be and we do not have to worry about the binding tables, since such tables are created automatically when it is necessary by our system. The only things we should dene are the sub-schemes for the peers of our network, however a guide to do that it is already shown in the paper

Design of a peer-to-peer database with Hyperion [22].

3.3 Implementation of a P2PDB

Since we are working with a P2PDBMS we need to consider a dierent approach in the management of the data. If we focus on the example shown in the rst section of this chapter, we should theoretically implement every sub-schema (sub-schema-1 and sub-schema-2) in its own peer of the network as they have designed, but it is not what we want. What we want is to implement a peer-to-peer database as it

(34)

3.3. IMPLEMENTATION OF A P2PDB 31 was an ordinary one, hence we should enrich the SQL interpretation with some devices which establish how to congure the database without moving ourselves to the involved peers.

Note: since we have developed a prototype by modifying HSQLDB, the SQL dialect we are going to use in such sections is the one shown in the guide of the project Hypersonic [3].

3.3.1 The keyword IN

A good management of a peer-to-peer DBMS is very important and it is important to interact with the other DBMS's of the same network if we do not want to lose a complete vision of the system. For this reason it has been necessary to give the possibility to interact directly with the other DBMS's, in other words the chance for an administrator to send SQL commands to the interested peers in order to have a vision on the status of that particular data set.

Since we did not want to complicate the ordinary management, we looked for a simple way to send SQL commands to other peers in the network, just adding one more commands in our script or query.

The keyword IN is used in the WHERE clause to compose some kind of queries, but in our case we could use it for another purpose, that is we can use it to specify the peers where our requests must be solved in. For instance if we wanted to send a query to a specic peer, we should start our request with the IN keyword, then the name of the peer and nally the ordinary query: IN <peer name> SELECT * FROM <table>;.

Once we sent the such commands, it is concern of our DBMS to establish if the IN statement contains a valid peer name and send the query to the right destination. Over and above that for the end user it could be possible to list not only one peer, but more than one. However such situation is allowed only when the tables specied in the FROM statement are present in all the peers listed in the IN statement, since once we received the results of all of them, we have to merge them together. This is due to fact we want to give the semblance of operating with an ordinary DBMS.

So the complete syntax of the keyword IN is the following:

[IN <peer name> [, ...]] {CREATE | INSERT | ALTER | DROP | SELECT}.

3.3.2 Denition of a Sub-Schema

The denition of a sub-schema should be done automatically. Every time we dene a table in our local DBMS or in another one by using the IN statement, we actually dene a sub-schema in that particular machine. Let us say we are handling the machine of the DBMS where the sub-schema-1 should be dened, to do that we just create the tables as we used to do, without any other commands and automatically the DBMS knows it has to dene a sub-schema for its own machine. The denition of the sub-schemes is

(35)

32 CHAPTER 3. DIONYSIUS, AN INTRODUCTION not important for us, but it is important for the system, since the sub-schema is a sort of identity card of the peers. In fact every time a new peer connects with our network, its own sub-schema is going to be published through the whole network, where every linked DBMS will accept it and store it in a repository.

The sub-schema gives the information upon the tables stored in that particular DBMS (every peer corresponds to a single DBMS). There is a little dierence from the sub-schema represented during the design phase and the sub-schema stored by the DBMS. The important information for the DBMS in order to interact with the other peers are the keys, the primary ones and the foreign ones, for this reason is used to represent a table with only its name, primary key and eventually its foreign keys.

3.3.3 Foreign Keys and Binding tables

The binding tables are the master key of the whole system, without them it would be impossible to manage everything. The highest problem in a peer-to-peer database is how to handle the constraints which dene a foreign key which references to a table which is not dened locally. In the ordinary DBMS's is not obviously possible to create a reference to an external table, but for a P2PDBMS is necessary. Therefore every time we dene a foreign key to a table which is actually in another machine, the DBMS has to create two binding tables - one for the local DBMS and another for the external one - for supporting the outgoing references. For instance when we dene a table like this:

in our local DBMS

CREATE TABLE writers (

IDwriter int identity primary key, id_author int not null,

id_book int not null,

CONSTRAINT writers_fk1 FOREIGN KEY (id_author) REFERENCES authors(IDauthor), CONSTRAINT writers_fk2 FOREIGN KEY (id_book) REFERENCES books(IDbook)

);

the only valid dened constraint is writers_fk1, because the reference aims to the local table authors.

However when the DBMS meets during the denition a reference to an external table, it has to x the constraint by replacing it adding two binding tables building them by using the information in the constraint denition. Since the system knows the sub-schemes of the other peers, it checks if the table books is present in the other DBMS and starts to build the binding tables. How such binding tables are created depends on how the user denes the table itself. Generally we can distinguish two dierent kind of approach for dening a binding table:

(36)

3.3. IMPLEMENTATION OF A P2PDB 33 1. when a table only contains foreign keys, usually when we have to dene a simple many-to-many relationship, such table is replaced by a binding table, since their work is the same, that is to contain the values which establish the relation between a record and another;

2. when a table has other columns beyonds the foreign keys, we have to create a binding table between this table and the external table referenced by the constraint of the foreign keys.

For instance the table of above does not contain any other column beyonds the keys, therefore the table is replaced by two binding tables:

in our local DBMS IN external_peer

CREATE TABLE authors2books(

IDauthor int not null, IDbook int not null );

CREATE TABLE books2authors(

IDbook int not null, IDauthor int not null);

When instead we dene a table like this:

in our local DBMS

CREATE TABLE books (

IDbook int identity primary key, title varchar(100) not null, id_genre int not null,

CONSTRAINT books_fk1 FOREIGN KEY (id_genre) REFERENCES genres(IDgenre) );

and we suppose that the table genre is not local, we will obtain the following binding tables:

in our local DBMS IN external_peer

CREATE TABLE books2genres(

IDgenre int not null );

CREATE TABLE genres2books(

IDgenre int not null, IDbook int not null);

There are some precise dierences between the denitions of the rst binding tables and the second ones. In the rst case the binding tables must replace the original dened tables, in this case writers, since they exactly do the same task and if the original one remained there would be a redundancy of data - the same key values will be stored in both tables. Instead in the second case as we can see the table

(37)

34 CHAPTER 3. DIONYSIUS, AN INTRODUCTION books has a value which is bound to the entity represented by the record, the title of the book. In this situation we can obviously replace the table, but we add a binding table with one column only for the local DBMS and another binding table from the referenced table to the dened one for the external.

Notice that not every binding table created by the system contains two columns for storing key values, in the last case for example, the rst binding table has only one column which is actually the eld where the values of the genres primary key will be stored. When we operate with tables which establish one- to-many relationships with other tables, only a value will be used in the table, that is the value of primary key of the other tables. Because of that there is no reason to use in books2genres another column.

However the same speech can not be done for the opposite binding table, genres2books, because the external DBMS needs the values of the primary keys in order to solve the queries.

3.3.3.1 Binding Queries

The so called binding queries are simply sub-queries solved by using the binding tables. They are actually created when in a query occurs a join operation, since the operation often involves the keys of the tables.

For instance what if we send a query which involves local and external tables?

SELECT *

FROM authors INNER JOIN (

writers INNER JOIN books ON writers.id_book = books.IDbook) ON authors.IDauthor = writers.id_author;

In this case we have to parse all the FROM statement and see the tables involved in the query. Every time we use an external table or a table, such as writers, replaced by a binding table the query will be recomposed in sub-queries. Since we have authors locally and the table writers is replaced by a binding table the sub-queries generated by system will be:

in our local DBMS

SELECT *

FROM authors, authors2books

WHERE authors.IDauthor = authors2books.IDauthor;

IN external_peer SELECT *

FROM books, books2authors

WHERE books.IDbook = books2authors.IDbook;

(38)

3.3. IMPLEMENTATION OF A P2PDB 35 When the DBMS's will solve each sub-query it will be possible to merge the results in order to obtain the result for the original query. Let us see another example just to have a complete vision of the working:

SELECT *

FROM books INNER JOIN genres ON books.id_genre = genres.IDgenre;

and as we saw before, the table books is not replaced by the binding tables, but it has been put beside to them. Hence we are going to have two queries like the following:

in our local DBMS

SELECT *

FROM books, books2genres

WHERE books.id_genre = books2genres.IDgenre;

IN external_peer SELECT *

FROM genres, genres2books

WHERE genres.IDgenre = genres2books.IDgenre;

How such binding queries are generated it will be explained in the next chapter.

(39)

(40)

Chapter 4

Development of the Prototype

4.1 Hypersonic and JXTA platform

In order to show whether a system like ours works, it has been necessary to develop a prototype of it.

Since a system like that it is quite hard to develop it from the scratch, we have chosen the option of modifying one of the possible open-source DBMS's that we can nd in the Internet. To be able to show something in relative short time we also decided to develop our DBMS in Java, also because it is one of the most used programming languages and it is also perfect to realize demo and prototypes, because of its speed of development. For this reason the choice of considering Hypersonic as the base of our DBMS was almost mandatory, since it is one of the most mature open-source DBMS developed in Java. Then it was automatic to chose the JXTA platform for the implementation of the peer-to-peer system for our DBMS.

4.1.1 Hypersonic, a DBMS developed in Java

For starters we had to give some information about one of the most interesting projects developed in Java, Hypersonic. Since we wanted to show how good our idea is, we had to provide a prototype, however since implementing a DBMS from the scratch is not one of the easiest task of the world, we decided to modify an existing one and Hypersonic it was the choice.

Hypersonic, best known as HSQLDB which stands for Hyper-threaded Structured Query Language Database, is the rst database management system totally developed in Java. It based on the relational database model and supports several SQL dialects such SQL-92, SQL-99 and SQL-2003.

It is possible to run HSQLDB in several modalities:

in-memory: it is only used when the persistence of the data is not necessary, since all the information 37

(41)

38 CHAPTER 4. DEVELOPMENT OF THE PROTOTYPE

are not stored in the local disk;

server: it uses its own communication protocol in order to connect to it and it is the fastest mode among the modalities;

stand-alone: when we need a database for our application we can run HSQLDB as stand-alone and everything is handled in the same virtual machine, however the database it will be provided locally only, and there is no chance to connect to it from the network;

web-server: such modality should be only used if necessary, because it uses the HTTP protocol and the restrictions on this kind of communication channel can be only provided by a rewall.

To store the data HSQLDB has two modalities:

memory: is the default one and save all data changes in a simple text le where all the SQL commands are written, in this way every time the server has been started it just executes the saved commands;

cached: is to provide the chance to store huge sets of data (Gbytes) loading the data partially synchronizing them to the disk on transaction commits.

4.1.2 JXTA platform, peer-to-peer paradigm in Java

Regarding the development of a peer-to-peer network the only choice we had was JXTA (Juxtapose). JXTA is a Java open-source project started by Sun Microsystem© and it is usually referred as a technology or a platform which allows to develop applications with the implementation of the peer-to-peer paradigm.

It creates a virtual network over the ordinary one permitting to the nodes, or rather, peers to commu- nicate among them by several protocols dened by sets of XML messages. Every peer of the network is recognized by an univocal and permanent URN (Uniform Resource Name) and it never changes neither when the peer leaves the network.

In JXTA uses dierent resources in order to dene the peer-to-peer network:

peer: is the single node of the network, it is autonomous and it can disconnect itself every time it wants;

peer-group: is a set of peers which have a common purpose, by default every peer is part of the NetPeerGroup, but then it can join to dierent groups in the same time;

(42)

4.1. HYPERSONIC AND JXTA PLATFORM 39

Figure 4.1: Architecture of HSQLDB

pipe: is the communication channel used by the peers to send messages among them;

advertisement: is a XML message which describes the necessary information regarding one of the previous resources.

It is one of most mature project regarding peer-to-peer paradigm development and it can be also used on mobile devices such as cellular phones and PDA's.

(43)

40 CHAPTER 4. DEVELOPMENT OF THE PROTOTYPE

4.2 Modications of HSQLDB

Thanks to a presentation of the HSQLDB architecture[23], it was possible to get a better idea where to put the necessary modications.

In the g. 4.1 is shown part of the architecture of HSQLDB, essentially the objects where we accomplished our modications, however some of them are going to be better represented during the next sections when they will be involved in the explanations of their alterations.

4.2.1 The Main System

HSQLDB system used to be started by command line and to instantiate the starting phase we have to specify one database that we want to use for our purpose, since the system does not allow the management of two or more database at the same time. During the starting phase all the necessary information of the database are loaded from the disk, or rather, in a text le has been saved all the SQL commands in order to restore the archive just like it was before the last time we used it. After the replication in the memory of the database a server thread loops forever to catch and solve all the requests sent by a client.

Such phase it was quite simple to analyze, because of its linearity, and it was easy to get where it was possible to add the needed modications such as the instantiation of the JXTA platform. Since the basic idea of P2PDBMS is to interact with other DBMS by using a peer-to-peer network, the instantiation of JXTA is pretty important and if it fails, the starting phase of HSQLDB fails as well. JXTA allows HSQLDB to get other necessary information for answering to request sent by a client, whether the platform is not ready the request can never be replied. The management of JXTA has been simplied to a set of classes which allows to execute the basic features of the platform, in this way it was possible to get immediately the problem when something wrong happened. Such set is contained by a wrapper class called BacchusSystem.

BACCHUS and HSQLDB run independently or rather parallely, since the rst has to handle all the JXTA messages and the second the ordinary connections sent by a client software. Initially when HSQLDB is ready, BACCHUS instantiates the local peer by using the name of the host as the name of the peer itself, and the name of the used database as the name of the peer-group, after that it sends a message through the network in order to get all the extra data it can not know from the scratch such as the connected peers and their sub-schemes. The sent message it is a sort of hand-shake approach for knowing each other, the new peer sends a hi-there message with its name and its sub-schema, then it waits for a welcome reply which contains the sender-peer and the sender's sub-schema. Such data are not stored on the disk, since the peers come and go and the related information are useless if the computers are not connected to the network.

(44)

4.2. MODIFICATIONS OF HSQLDB 41 Finally the whole system is ready to work as an ordinary DBMS and during the execution JXTA and HSQLDB interact with each other for replying to the requests.

4.2.2 Binding Parser

Every time we send a SQL script it is necessary for the system to identify the kind of the commands inside the script, since it could include the ones necessary to interact with the external DBMS. The most dicult task during the modication of HSQLDB was to understand the ordinary parser of DBMS and see if it was possible to edit it in order to meet our purpose. We spent a lot of time because of the parser and at the end we realized it was necessary to create our own parser, since the ordinary one was too bound to the architecture of HSQLDB, any modication leaded too many errors during run-time. We gave up and we decided to face a dierent approach in order to simplify the management of our P2PDBMS.

When a connection between the server and the client problem has been established, the end user can send every request he wants. Such request arrives in a String format, so it is quite easy to parse it and check if there is something wrong. Before the parsing of the ordinary system we take the control of the String object and we give it to our Binding Parser. Such parser has the task to identify the usage of the IN statement, the creation of tables and query with join operations.

4.2.2.1 Parsing of IN statement

The parsing phase of the IN statement is nothing special. We just check if the request starts with the keyword IN and if the statement lists a sequence of right peers. After that we send the rest of the request to the peers listed before, without any control on the correctness of the request syntax. The forwarding is handled by our JXTA wrapper class by setting the source, the destination and the content.

4.2.2.2 Parsing of CREATE TABLE statement

The parsing of the CREATE TABLE statement is necessary in order to avoid problems with the ordinary parser of HSQLDB. In fact if want to create a table like this:

Dionysius: a Peer-to-peer Database Management System

Dionysius

a Peer-to-peer Database Management System

Luca Guadagnini

Contents

List of Figures

Chapter 1

Introduction

1.1 Background

1.2 Objective

1.3 Research Methodology

Chapter 2

Database Management Systems and Related Work

2.1 Database Management Systems

2.2 Data Models of DBMS

2.3 Distributed and Parallel Database Management Systems

2.4 Peer-to-Peer Database Systems

Chapter 3

Dionysius, an introduction

3.1 Peer-to-peer Database Management System

3.2 Peer-to-peer Data Model

3.3 Implementation of a P2PDB

Chapter 4

Development of the Prototype

4.1 Hypersonic and JXTA platform

4.2 Modications of HSQLDB

4.2 Modications of HSQLDB