A graph database management system for a logistics-related service


DEGREE PROJECT IN TECHNOLOGY,

FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2016


MARCUS WALLDÉN

AYLIN ÖZKAN

KTH


Marcus Walldén & Aylin Özkan

Bachelor of Information Technology Thesis

Information Technology

School of Information and Communication Technology KTH Royal Institute of Technology

Stockholm, Sweden 15 September 2016


Abstract

Higher demands on database systems have led to an increased popularity of certain database system types in some niche areas. One such niche area is graph networks, such as social networks or logistics networks. Analyses of such networks often focus on complex relational patterns that sometimes cannot be handled efficiently by traditional relational databases, which has led to the introduction of some specialized non-relational database systems. Some of the database systems that have seen a surge in popularity in this area are graph database systems.

This thesis presents a prototype of a logistics network-related service using a graph database management system called Neo4j, which is currently the most popular graph database management system in use. The logistics network covered by the service is based on existing data from PostNord, Sweden’s biggest provider of logistics solutions, and primarily focuses on customer support and business to business.

By creating a prototype of the service, this thesis strives to indicate some of the positive and negative aspects of a graph database system, as well as to give an indication of how a service using a graph database system could be created.

The results indicate that Neo4j is very intuitive and easy to use, which would make it optimal for prototyping and smaller systems, but due to the evaluation method used, more research in this area would need to be carried out in order to confirm these conclusions.

Keywords. Graph database, Relational database, Prototype, Logistics, Graph analysis, NoSQL


Abstract

Higher demands on database systems have led to an increased popularity of certain database system types in some niche areas. One such niche area is graph networks, such as social networks or logistics networks. Analyses of graph networks often focus on complex relational patterns that sometimes cannot be handled efficiently by traditional relational database systems, which has led to some specialized non-relational database systems becoming popular alternatives. Many of the popular database systems in this area are graph database systems.

This work presents a prototype of a logistics network-related service that uses a graph database management system called Neo4j, which is the most widely used graph database management system. The logistics network covered by the service is based on existing data from PostNord, Sweden’s leading provider of logistics solutions, and primarily focuses on customer support and business-related analysis.

By creating a prototype of the service, this work strives to show some of the positive and negative aspects of a graph database management system, and to show how a service can be created using a graph database management system. The results indicate that Neo4j is very intuitive and easy to use, which would make it optimal for prototyping and smaller systems, but because of the evaluation method used, more research in this area needs to be carried out before these conclusions can be confirmed.

Keywords. Graph database, Relational database, Prototype, Logistics, Graph analysis, NoSQL


Acknowledgements

We would like to thank our supervisor Thomas Sjöland and our examiner Anne Håkansson, who helped us with a wide variety of issues during the course of this project. We also appreciate the assistance of Petter Edlund and Torbjörn Sjögren, acting as supervisors at PostNord, who gave us great insight into the company and the various systems that we used.

Stockholm, September 27, 2016
Marcus Walldén & Aylin Özkan


List of Figures

2.1 A visual illustration of a graph with two nodes and one edge.

2.2 A graph data model with nodes containing properties.

2.3 A query written in Cypher.

4.1 A graphical representation of the waterfall model.

4.2 A graphical representation of the six stages of software prototyping.

4.3 A visual representation of the spiral model, including the four reoccurring steps in a clockwise spiral.

5.1 A visual representation of the development model.

5.2 The iterative phases of creating the graph data model.

5.3 The model for data migration.

5.4 The model used to create queries.

6.1 The basic components implemented for the graph model.

6.2 The advanced graph model.

6.3 The complete graph model that will be used to create the graph DBMS.

7.1 A Shipment node with its relations and properties.

7.2 All Shipments sent to a Consignee.

7.3 Two Shipments sent from a Consignor to a Consignee that has a Party.

7.4 Top 10 Delivery Points by delivered Shipments for a Consignee.

7.5 The top 10 Organisations that an Organisation has sent Shipments to, ordered by the amount of Shipments.

7.6 Organisations that received the highest amount of Shipments in an Area.

7.7 The Consignees of an Organisation’s Consignees, ordered by the amount of Shipments.


List of Tables

2.1 Two visual examples of representation in an RDBMS.

5.1 Terminology used for the specification of requirements.

5.2 Specification of Requirements - Customer Support.

5.3 Specification of Requirements - Business To Business.

5.4 Hardware setup used for Neo4j environment.

6.1 Visual example of two files containing a list of Person or Pet with corresponding properties.

6.2 Visual representation of how a relationship between a Person and a Pet might look.

6.3 All CSV files with all the migrated data in nodes and relationships.

7.1 Node types and their corresponding properties.


List of Acronyms and Abbreviations

This thesis requires readers to be familiar with multiple terms related to databases, logistics and other areas. As such the most important terms have been specified prior to their use in the coming chapters.

DBMS   Database Management System
RDBMS  Relational Database Management System
SQL    Structured Query Language
JSON   JavaScript Object Notation
CSV    Comma Separated Values
B2B    Business To Business


Chapter 1 Introduction

1.1 Background

Digitally stored data can be stored and accessed in many different ways, resulting in many different types of database management systems (DBMSs). DBMSs can manage and store data, update it and perform analyses [1]. This kind of data storage has been popular ever since its inception, allowing companies and other groups to store customer information, product specifications and many other things.

Many DBMS types exist, but they are generally split into two groups: relational and non-relational [2]. Relational database management systems, RDBMSs, have been in use for decades and are the most popular type of DBMS [2]. An RDBMS is a specific type of DBMS, whereas the term non-relational database management system refers to all types of DBMSs that are not relational, i.e. not an RDBMS [2]. As such, this term refers to many different kinds of DBMSs, all of which have their own underlying principles and uses. Many non-relational DBMSs specialize in a specific niche market [2], potentially allowing such systems to offer more customized solutions in a specific area.

1.2 Problem definition

When DBMSs were first put into use, the amount of data and the analysis complexity tended to be quite low. Furthermore, the stored data was often static. As time went on, systems grew more complex as more features and functionality were added, resulting in higher amounts of data, higher analysis complexity and a big increase in data ingestion [3].

Relational DBMSs are commonly used in many situations, even though they have multiple limitations, such as being quite static and poor at handling large amounts of unstructured data [3]. As some systems evolved and became larger and more complex, some of these limitations became more apparent. As such, there was a need to find alternatives to RDBMSs in some niche areas where RDBMSs did not perform adequately.

Graph networks, i.e. connected sets of nodes and edges, are one such area, as is evident from their generally complex data analysis and relational structures. As graph networks grow, with added functionality and features, they generally contain more complex relational properties and larger quantities of data. This in turn creates a problem for certain DBMSs such as RDBMSs. Due to the static structure of an RDBMS and other factors, it is hard to create an RDBMS that scales well with certain graph network systems.

As such, there is a need to find an alternative DBMS that can solve the problems related to graph networks, such as logistics networks. One type of non-relational DBMS that has specialized in the area of graph networks is the graph database management system, which aims to represent graph networks in an effective way. Graph DBMSs have become popular alternatives to relational databases with regard to graph networks. Such DBMSs are currently in use by many companies, especially for logistics-related purposes, including Fortune 500 companies such as eBay [4]. Because graph DBMSs are fairly new, more information is needed in order to show that they are valid alternatives to RDBMSs in the field of graph networks, and what positive and negative properties they have. Furthermore, more information on how a graph DBMS could be created and used is needed in order to raise awareness of and interest in graph DBMSs.

1.3 Purpose


The main objective of the thesis is to pinpoint how such a service using a graph database management system can be developed, as well as to explore some of the advantages and disadvantages of the system.

By developing such a system and showcasing its creation and advantages, it is possible that entities facing similar graph network-related problems, or other types of problems for that matter, will become interested in exploring non-relational DBMSs. More interest in this area could potentially result in more specialized solutions being found, and thus solve some of the existing problems many entities face with RDBMSs today.

1.4 Goals

The goal of this thesis and the degree project is divided into the following sub-goals:

1. Develop a prototype of a logistics-related service using a graph DBMS

2. Showcase how such a prototype could be created

3. Analyse the end product and identify some of the potential strengths and weaknesses of the graph DBMS

To summarize, the goal is to portray the aspects of both the development process and the end product. By focusing on both aspects it is possible to better display the potential uses of a graph DBMS.

1.5 Benefits, Ethics and Sustainability

This thesis strives to benefit all types of entities that use DBMSs by showcasing how a graph DBMS can be created and used. The degree project builds on existing systems at PostNord and is built upon a specification of requirements that they have created. As such, PostNord directly benefits from the results and conclusions reached by this thesis.


The results and conclusions of this thesis are based on the current versions of a variety of software systems. Some systems are currently under development, meaning that future versions of the software systems potentially could provide results that differ from that of this thesis. The results and conclusions offer an insight into the current state of graph DBMSs, and could provide different entities with information that could improve the sustainability and performance of their database systems.

1.6 Methodology

A research methodology includes concepts such as paradigms and theoretical methods and models. It can be viewed as a framework for the research process, i.e. all the steps and phases included in a project.

There are two research methodologies: Qualitative and Quantitative [6]. Qualitative and quantitative research methodologies differ in the way they collect, analyse and validate data, in addition to how the research strategy is designed. The quantitative research methodology often consists of measuring variables, where the results must be evaluable and measurable [6]. Qualitative research methodology, on the other hand, focuses on behaviours and perceptions, and can include results that are not measurable, but rather based on opinions.

The goal of the thesis is to display how a prototype of a graph DBMS is created and used, meaning that the results are largely based on opinions and perceptions. The project also does not use any analysis of measurable statistical data in order to reach the results, meaning that a qualitative research methodology is used.

1.7 Stakeholders


1.8 Delimitations

Due to the scale of the degree project certain components were overlooked, such as:

• Cost Analysis

The costs of the products and services used in this degree project have not been considered and are not a part of this thesis. Furthermore, the cost of the necessary development work has not been estimated and will not be taken into consideration.

• Comparison Analysis

Due to the time limitation of the degree project no comparisons to other types of DBMSs were made. The aim of the thesis is primarily to present the development process and uses of a graph database system and as such no comparison is made.

• Ethical Analysis

The prototype created in this degree project handles potentially sensitive and private information, meaning that using the data in an actual service could be unethical, depending on its uses. The ethical aspect is not taken into consideration in this thesis, but the data used throughout the project is modified so as to avoid any privacy concerns.

• Security Analysis

The handling of private and sensitive information potentially requires certain security measures with regard to the created service and the database system itself. Because this is only a prototype of the service, such aspects have been overlooked.

1.9 Structure of the thesis

Chapter 2 presents background information about database systems and introduces all important systems that are used throughout the degree project. Chapter


Chapter 2 Database Systems

This chapter provides the theoretical background of this thesis, specifically with regard to the different terms and products related to database systems that are used. Different database management systems are explained and related works are discussed.

2.1 Database Terms

Many different database-related terms exist, and this section provides information about the different database-related terms used throughout the thesis.

2.1.1 Database

A database simply stores a collection of data in a structured format [1]. The stored data can be images, text, numbers, and so forth. Worth noting is that there is no requirement that the data is stored on a computer.

2.1.2 Database System

A database system consists of one or more databases as well as software to access and process the stored data [7]. In contrast to a database, a database system could be used on a large scale to store data, thanks to the streamlined access and processing methods.

2.1.3 Database Management System

A database management system, DBMS, is a software application that creates and manages databases [8]. It offers the same usability as a database system, i.e. accessing and processing stored data, but it also provides other features. Generally a database management system offers tools to manipulate and analyse stored data, in addition to administrative tools such as logging and backup services.

A DBMS also consists of a database model, which defines the database’s logical structure [8]. The logical structure defines how data is stored in the databases of the DBMS. There exist many different database models, many of which function in different ways. This in turn means that there exist different DBMS types, since the logical structure of the stored data will change how a DBMS functions as a whole.

2.2 DBMS Types

This section provides basic information about some DBMS types. Many different types of DBMSs exist, but due to the scale of the thesis only relevant types are addressed in this section.

2.2.1 Relational Database Management System

The website db-engines.com keeps a ranking system of the biggest DBMSs in the world, based on popularity. Of the top 10, seven are relational database management systems [9]. This illustrates the massive market share RDBMSs have in today’s world.

Table 2.1: Two visual examples of representation in an RDBMS.

ID   Name     Age  Pet ID
522  John     31   5
523  Charlie  25   6

ID  Type
5   Cat
6   Dog
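The two tables in Table 2.1 can be combined with a lookup on the Pet ID column. The following Python sketch only illustrates that idea and does not use an actual RDBMS. It assumes that Pet ID refers to the ID column of the second table, which the table suggests but does not state, and the function name join_person_pet is hypothetical.

```python
# Rows of the two tables from Table 2.1, stored as plain dictionaries.
persons = [
    {"id": 522, "name": "John", "age": 31, "pet_id": 5},
    {"id": 523, "name": "Charlie", "age": 25, "pet_id": 6},
]
pets = [
    {"id": 5, "type": "Cat"},
    {"id": 6, "type": "Dog"},
]


def join_person_pet(persons, pets):
    """Pair each person's name with the type of the pet that pet_id refers to."""
    # Index the pet rows on their ID (assumed to be the key of that table).
    pets_by_id = {pet["id"]: pet for pet in pets}
    return [
        (person["name"], pets_by_id[person["pet_id"]]["type"])
        for person in persons
        if person["pet_id"] in pets_by_id  # skip persons without a matching pet
    ]


pairs = join_person_pet(persons, pets)
print(pairs)
```

The relation between a person and a pet is never stored directly here: it has to be recomputed from the two tables each time it is needed.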


2.2.2 Graph Database Management System

Graph DBMSs are fundamentally different from RDBMSs and many other types of DBMSs in how they store data. While an RDBMS generally groups entities of the same type (e.g. person, dog, cat) together, thus creating a simple way of accessing and analyzing entities of the same type, graph DBMSs store data as nodes, edges and properties. A graph is a set of vertices connected by a set of edges (also known as lines) [11]. Throughout the thesis vertices are also referred to as nodes or points.

Figure 2.1: A visual illustration of a graph with two nodes and one edge.

Nodes generally represent entities (e.g. Person, Dog, Cat), edges usually represent the relations between nodes (e.g. Person HAS Pet) and the properties represent the attributes of the nodes and edges. As such, nodes are directly connected to their related entities, rather than being grouped with entities of the same type, which is highly common for relational DBMSs. In Figure 2.1, the relation between a Person and a Pet is visualized using a graph.
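The three building blocks described above (nodes, edges and properties) can be sketched as plain Python data structures. This is a minimal illustration and not how any particular graph DBMS stores its data. The class names Node and Edge and the example property values are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, e.g. a Person or a Pet."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed relation between two nodes, e.g. Person HAS Pet."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Two nodes and one edge, mirroring the graph in Figure 2.1.
john = Node("Person", {"name": "John", "age": 31})
cat = Node("Pet", {"type": "Cat"})
has = Edge(john, "HAS", cat)

description = f"{has.start.label} {has.rel_type} {has.end.label}"
print(description)
```

The edge holds direct references to both nodes, so the pet of a person is reached by following the edge rather than by looking up a key in another table.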

2.3 Graph DBMS Products

This section presents two of the most popular graph DBMS alternatives that are currently available: Neo4j and Titan. Due to the scale of the thesis, graph DBMS solutions other than Neo4j are not discussed in detail, but basic information about Titan is presented to provide a perspective on other possible solutions.

2.3.1 Neo4j


widely in use, including by many Fortune 500 companies such as eBay [4], which uses Neo4j for some services related to a logistics network. In this thesis, a logistics network is defined as a system of operations that work together to deliver a product to the market. It includes the process of collecting shipments, transporting shipments and delivering shipments to the end customer.

The company behind Neo4j claims that it is "wicked fast" [13], which is indeed backed up by some third-party benchmarks [14], albeit not all [15]. The efficiency of the DBMS is largely dependent on the database’s graph data model, which is fairly unique compared to other types of DBMSs. Continuing the analogy of a person and their pet, Figure 2.2 illustrates how such a graph data model could look. A Person, who has a name, gender and age, has a Pet, which has a name, age and type (such as cat or dog).

Figure 2.2: A graph data model with nodes containing properties

The graph data model serves three main purposes:

1. Illustrate how a system works

The example graph data model presented in Figure 2.2 covers a very basic relationship between two types of nodes, but one could imagine a very complex system, containing hundreds of types of nodes and relationships. Such systems would be hard to understand or explain to others. The graph data model offers an intuitive way of understanding the inner workings of a system, even more complex ones, and could be a valuable component in product development.

2. Define how nodes are connected in the DBMS


in the graph data model should as such not exist in the database, meaning that it is clear which relations could exist in the database and which could not.

3. Define how the data of nodes and edges are stored

The graph DBMS Neo4j stores data as nodes and edges, like most other graph DBMSs. This means that the structure of the stored data is at large defined by the structure of the graph data model.
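The idea that the graph data model decides which relations may exist in the database can be sketched as a simple check. The sketch below is plain Python and does not use any Neo4j feature. The names ALLOWED_RELATIONS and is_allowed are hypothetical, and the single allowed relation is taken from the Person and Pet example in Figure 2.2.

```python
# The graph data model, written as a set of allowed
# (start label, relationship type, end label) triples.
ALLOWED_RELATIONS = {
    ("Person", "HAS", "Pet"),
}


def is_allowed(start_label, rel_type, end_label, model=ALLOWED_RELATIONS):
    """Return True if the graph data model contains this kind of relation."""
    return (start_label, rel_type, end_label) in model


# A relation that is part of the model, and one that is not.
print(is_allowed("Person", "HAS", "Pet"))
print(is_allowed("Pet", "HAS", "Person"))
```

In this sketch the check is the responsibility of the application that writes to the database. Whether and how a given graph DBMS enforces such a rule is outside the scope of the example.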

Neo4j uses a declarative graph query language called Cypher in order to perform analysis on its database. Cypher borrows its structure from SQL [16], which is a standardized query language used by many relational databases. Two of the most important aspects of Cypher are its ease of use and intuitiveness, which can be illustrated by a simple example, shown in Figure 2.3.

MATCH (p:Person)-[:HAS]->(c:Pet)
WHERE c.type = "Cat"
RETURN p.name

Figure 2.3: A query written in Cypher

The example query shown in Figure 2.3 tries to find a person, possibly more than one, who has a cat as a pet. It then returns the names of all people who have a cat. It contains the three most fundamental pillars of Cypher:

• MATCH

The MATCH defines the pattern of the query. In the example a person, p, has a pet, c, and the query will look for all patterns in the graph database that match this description.

• WHERE

The WHERE filters the result in some way. In the example it is defined that only pets of the type Cat are of interest, so all other previous matches are removed from the result.

• RETURN

The RETURN defines what the output should be. In the example the names of the people who have cats are returned, but many other things could also be outputted from the result, such as the cat’s name, the person’s age, and so forth.
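To make the three steps concrete, the query in Figure 2.3 can be imitated in plain Python on a small in-memory graph. This is only a sketch of the match, filter and return steps and says nothing about how Neo4j evaluates a query internally. The example data and the function name people_with_pet_type are assumptions.

```python
# A tiny in-memory graph: nodes keyed by an ID, edges as (start, type, end).
nodes = {
    1: {"label": "Person", "name": "John"},
    2: {"label": "Person", "name": "Charlie"},
    3: {"label": "Pet", "type": "Cat"},
    4: {"label": "Pet", "type": "Dog"},
}
edges = [(1, "HAS", 3), (2, "HAS", 4)]


def people_with_pet_type(nodes, edges, pet_type):
    """Imitate the MATCH, WHERE and RETURN steps of the query in Figure 2.3."""
    # MATCH: every (Person)-[:HAS]->(Pet) pattern in the graph.
    matches = [
        (nodes[start], nodes[end])
        for start, rel_type, end in edges
        if rel_type == "HAS"
        and nodes[start]["label"] == "Person"
        and nodes[end]["label"] == "Pet"
    ]
    # WHERE: keep only the matches whose pet has the requested type.
    filtered = [(p, c) for p, c in matches if c.get("type") == pet_type]
    # RETURN: output the name of each remaining person.
    return [p["name"] for p, _ in filtered]


cat_owners = people_with_pet_type(nodes, edges, "Cat")
print(cat_owners)
```

The Cypher version expresses the same three steps in three lines, without stating how the graph should be searched.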

Cypher has many more features, but due to the scope of the thesis only the fundamental aspects are covered. More information can be found in Neo4j’s Cypher manual [17].

2.3.2 Titan

Titan is the third most popular graph DBMS in the world [18] and has gained a large following since its initial release in 2012 [19]. It is open source and free to use [20].

Titan has a big focus on scalability [20] and is optimized for distributed machine clusters [20].

2.4 Related work

Because Neo4j is the most popular graph database in the world, many studies have been carried out that focus on different aspects of the product. This section identifies other studies that contain information, methods or ideas that may be usable for this thesis.

2.4.1 A Comparison of a Graph Database and a Relational Database

A study from 2010 focused on a comparison between the graph DBMS Neo4j and the RDBMS MySQL [21]. The study touches upon many interesting subjects that are relevant for this thesis.

It contains a short discussion about Cypher, Neo4j’s query language, and points out several advantages, such as ease of use. It specifically points out advantages with regard to the examined graph traversals, i.e. the act of traversing between nodes in a graph, where Neo4j generally outperformed the RDBMS it was compared to.
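A graph traversal can be sketched in a few lines of Python. The example below uses a breadth-first strategy, which is only one of several possible traversal strategies and is not necessarily the one examined in [21]. The node names and the function name traverse are hypothetical.

```python
from collections import deque


def traverse(adjacency, start):
    """Return the nodes reachable from start, in breadth-first order."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Follow every outgoing edge of the current node.
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


# A small directed graph given as an adjacency list.
adjacency = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
order = traverse(adjacency, "A")
print(order)
```

Each step only follows the edges of the current node, so the work depends on the part of the graph that is actually visited.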

2.4.2 Using Neo4j for mining protein graphs


this several methods and ideas could still be seen as interesting.

In contrast to the study discussed in Section 2.4.1, this study mainly focuses on subgraph matching, the process of trying to match a certain pattern to a part of a graph, rather than graph traversals. In this area, for the queries handled in the study, Neo4j performed poorly, indicating that such queries will not perform as well as queries about graph traversals. Finally the study also questions Neo4j’s performance on larger graph networks, but also points out that it is competitive in smaller graphs.
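Subgraph matching can be illustrated with a deliberately simple case: finding all chains of nodes whose labels follow a given sequence. Real subgraph matching handles arbitrary patterns and is considerably harder, so the sketch below should be read as a special case only. The function name match_chain, the example graph and the Vet label are hypothetical.

```python
def match_chain(labels, edges, pattern):
    """Find node sequences whose labels follow pattern along directed edges."""
    # Start with every node that carries the first label of the pattern.
    partial = [[node] for node, label in labels.items() if label == pattern[0]]
    for wanted in pattern[1:]:
        # Extend each partial match by one edge leading to a node
        # with the next wanted label, never reusing a node.
        partial = [
            path + [end]
            for path in partial
            for start, end in edges
            if start == path[-1] and labels[end] == wanted and end not in path
        ]
    return partial


labels = {1: "Person", 2: "Pet", 3: "Person", 4: "Pet", 5: "Vet"}
edges = [(1, 2), (3, 4), (2, 5)]
found = match_chain(labels, edges, ["Person", "Pet", "Vet"])
print(found)
```

Unlike a traversal from one known start node, the pattern has to be tried against every candidate node in the graph.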

2.5 Summary

For this thesis Neo4j will be used. It is the most used graph DBMS in the world [18] and provides many interesting components, such as the graph data model and the declarative graph query language Cypher. Many other graph DBMSs, such as Titan, exist, but due to the limited time frame of this thesis these will not be discussed in great detail.


Chapter 3 Methodology

This chapter provides a detailed overview of the methods used to carry out research, data collection and quality assurance in this thesis as well as the underlying degree project.

3.1 Research Paradigm

The research paradigm sets the point of view of the project, meaning that it defines the principles that guide how the project as well as the results are viewed. As such it is an essential part of the project.

There are four core paradigms that could be used: Positivism, Realism, Interpretivism and Criticalism [23, 24]. Depending on which paradigm is used, the project and its results are evaluated in different ways. Interpretivism is often used in projects focused on opinions and perspectives, and is often used in software development [6]. It is often inductive in nature, which is highly applicable to this project. In addition, the results of the project are largely interpreted based on opinions and perceptions, and as such interpretivism has been used for this project.

3.2 Research Approach

The research approach is used to draw conclusions and decide whether something is true or false [6]. There are two main research approaches: inductive and deductive [6]. The inductive research approach is often used in combination with the qualitative research methodology. In contrast to the deductive research approach, results are often based on opinions and perspectives. Furthermore, the deductive research approach is generally used to try to validate hypotheses or theories, meaning that the inductive research approach is more suitable for the project, and it is therefore used.

3.3 Research Design

The research design provides the guidelines for how research should be conducted through the project. This includes aspects such as planning, organizing and conducting the actual research [6]. There are three possible research designs that could be matched with the current research approach, which are Action Research, Exploratory Research and Grounded theory [6].

Action Research is generally used to find solutions and improvements to existing problems or concerns. It is a cyclic method of taking a certain action and then observing and analyzing the outcome [6].

Grounded theory strives to create a theory that is grounded in data [6]. Grounded theory is dependent on a systematic collection and analysis of data to create the underlying theory [6].

Finally, Exploratory Research aims to identify issues and different variables of a problem, using a qualitative data collection [6]. It rarely strives to find firm solutions to a problem, rather it aims to provide information about the problem’s different aspects.

This thesis is largely based on qualitative data such as interviews. Furthermore, the main goal is to identify different aspects of a system, rather than to solve some problem. As such, the exploratory research design, which best fits this description, is used for the thesis.

3.4 Data Collection


asking open-ended unstructured questions to a group of participants.

3.5 Data Analysis

There are two possible data analysis methods that can be used with the data collection method used in this thesis: analytic induction and grounded theory. Both are iterative in nature, and collect and analyse data until a hypothesis has been reached that cannot be dismissed by the existing data [6]. The main difference between the two methods is that analytic induction ends when a hypothesis has been reached, while grounded theory ends when a validated theory has been reached [6].

Once again, one of the main goals of the thesis is to identify the positive and negative aspects of a system, which means that a hypothesis needs to be reached after analyzing the existing data. This conforms well with the analytic induction method, which is why it is used throughout the thesis.

3.6 Quality Assurance


Chapter 4 System Development Methodology

A system development methodology is a framework that structures and controls the development process of a system or service [25]. More than one development methodology can be used for a project. This chapter first introduces some system development methodologies that are relevant for the degree project, after which the methodology to use is selected.

4.1 Waterfall Model

The waterfall model offers a linear framework with multiple sequential phases. The main emphasis is on planning, target dates and implementing the whole product at one time. There is also a focus on written documentation as well as formal reviews [25]. The model is divided into five main steps, as illustrated in Figure 4.1. In the first step, Requirements, requirements for the software are gathered. In Design, the second step, a design is created that dictates the process of implementation [25]. In the third step, Implementation, the implementation of the software is carried out as specified by the design. In the fourth step, Verification, the system is verified and tested so that it conforms to the requirements. In the fifth step, Maintenance, the system is deployed and supported [25].


Figure 4.1: A graphical representation of the waterfall model.

The waterfall model offers many positive aspects. Due to the clear definition of the different stages, the progress of the development is measurable. The extensive documentation and clear development process make it easier to add new developers to a project.

Due to the linear progression it is hard to use any iteration, which might be needed for some types of products, especially prototypes. Furthermore, the model makes it hard to adapt to changes of the product or the requirements, due to limited backtracking abilities. Because of the lack of iterations and the fact that all the implementation is done before the verification stage, it might also be hard to solve certain issues or bugs in the software [25].

4.2 Software Prototyping


In the first step, Requirements Gathering, requirements are gathered, followed by the second step, Design, in which a design is created for how the process of implementation should be carried out. In step three, Prototyping, a prototype is created that fulfills some of the requirements of the system. In step four, Evaluation, the created prototype is evaluated by customers in order to provide feedback [25]. In step five the prototype is assessed. If it is seen as adequate, i.e. it fulfills all of the requirements, the process continues to step six, Deliver System, where the system is deployed. If it is seen as inadequate, the process returns to step two [25].

Figure 4.2: A graphical representation of the six stages of software prototyping.

Due to the methodology’s iterative nature, it is easy to identify errors and bugs during the implementation phase. It is also easy to confirm that all parties have a clear understanding of the requirements of the system, due to the continuous evaluations that occur in each iteration [25]. Changes to the requirements or the product itself can also be accommodated easily, since each iteration offers an opportunity to adjust.

The many iterations could lead to longer development times and higher costs. Furthermore, documentation may be neglected, which can lead to poor design choices and also limit a product’s future potential [25]. Lastly, the methodology can lead to a lack of quality, as mock-up code and design choices might persist throughout the many iterations, leading to an inferior final product.

4.3 Spiral Model

the service that is complete [25]. Figure 4.3 illustrates how the spiral model works. In the first step, Analysis, objectives of the phase are determined and requirements are identified. In the second step, Evaluation, risks and other elements are identified and resolved. In the third step, Development, the implementation is carried out and in step four, Planning, the next phase is planned, after which step one commences again with the next phase.

Figure 4.3: A visual representation of the spiral model, including the four recurring steps in a clockwise spiral.

Due to the structure of the model, it is easy to utilize other system development methodologies, such as waterfall or prototyping, for some phases where it might be seen as appropriate. The model also improves the control of risks [25].

How the model is used varies depending on the project, which makes it quite complex with limited reusability. The complex nature of the model might also lead to higher development costs and longer development times.

4.4 Summary

nature. In certain phases of the spiral model, other methodologies could be used in order to simplify the development process.

Section 2.3.1 explained the main aspects of Neo4j, such as the graph data model and the queries. In addition, there is also a need to create the actual database in Neo4j. All of these aspects that need to be developed are separate in nature, meaning that there is no need to specifically use a methodology such as the waterfall model. Instead, an iterative methodology could be used. Software prototyping is seldom used by itself as a system development methodology, which leaves the spiral model.

Chapter 5 Modelling of Software Development Processes

This chapter details the development model of the prototype service and defines the specification of requirements provided by PostNord. Individual development processes of the different aspects of the prototype service are also introduced, followed by hardware and software used throughout the development phase. Lastly, the evaluation framework of the degree project is presented in detail.

5.1 Development Model

In this thesis, a development model is defined as a description of how the software development should be carried out, using the system development methodology chosen in Chapter 4.

As explained in Section 2.3.1, a graph data model needs to be created. Furthermore, queries need to be created using Cypher. In addition, the data needs to be collected, parsed and imported into Neo4j in order to create the database. These are the aspects that need to be developed, i.e. the three phases that will exist when using the spiral model, as illustrated in Figure 5.1.

Figure 5.1: A visual representation of the development model.

There are three phases: the graph data model phase, the data migration phase and the analysis phase. The analysis phase, in which the queries are created, needs access to the complete database in Neo4j before it can be finished, meaning that the data migration phase needs to be completed before the analysis phase. Likewise, in order to create the database in Neo4j the graph data model needs to be complete. As such the first phase is to create the graph data model, the second to migrate the data into Neo4j and the third to create the queries.

Both the graph data model and the query creation processes are iterative and exploratory in nature, meaning that the prototyping method is well suited to those two phases. As such the prototyping method is loosely used for both phase one and phase three, as described in detail in Sections 5.3 and 5.5.

5.2 Consumer Profiling System

To accomplish the goals set by this thesis a prototype of a system would need to be created. The system would need to be an interesting and realistic real world scenario in order to truly showcase the uses of a graph DBMS. As such PostNord has provided a specification of requirements of a mock-up consumer profiling system.

profiling system, basic information about the most relevant terms has been specified in Table 5.1.

Table 5.1: Terminology used for the specification of requirements.

Term                              Meaning
Consignor                         Sender of a shipment
Consignee                         Receiver of a shipment
Delivery Point                    Delivery point of a shipment
Party                             Contact information of a consignee
Organisation                      Company information of a consignee/consignor
Trade                             Category/sector of an organisation
Zip Area                          Country code + zip code
Estimated Time of Arrival (ETA)   The estimated time of arrival
Original ETA (oETA)               First estimated time of arrival of a shipment
Actual Time of Arrival (ATA)      Actual time of arrival of a shipment
volume                            The volume of a shipment
weightunit                        kg, g, etc.
service                           Service of a shipment
lon                               Longitude coordinates
lat                               Latitude coordinates
Kolli-id                          The ID of a shipment

Table 5.2: Specification of Requirements - Customer Support.

Query                                                  Description
Find Shipment by Kolli-id                              Use a shipment’s kolli-id to find information about it
Find Shipments for a Consignee                         Find all shipments for a consignee by using identifiers such as name, phone number and email
Find Shipments for a Consignee from a Consignor        Find all shipments for a consignee from a consignor by using identifiers such as name, phone number and email
Find Top 10 Suitable Delivery Points for a Consignee   Find the 10 most suitable delivery points for a consignee, ordered by the consignee’s favourite delivery points

Table 5.3: Specification of Requirements - Business To Business.

Query                                      Description
Find all Shipments Sent to Organisations   Finds all shipments sent from one organisation to other organisations
Find Biggest Organisations in an Area      Finds the organisations that send the highest amount of shipments to an area
Find Consignees’ Consignees                Find all consignees of an organisation’s consignees

Tables 5.2 and 5.3 present the query names and also a description of what the queries should be able to accomplish. As such, all created queries need to fulfill the definitions specified in this section.

5.3 Graph Data Model

Figure 5.2: The iterative phases of creating the graph data model.

As specified in Figure 5.2, the process was divided into six steps. These steps are explained below, starting with the first step, “Read Related Documentation”, and ending with the last step, “Complete”.

The first step, Read Related Documentation, includes accessing relevant documentation of the system that is to be created as well as information about the source systems from which the raw data will be acquired.

The second step, Build Model, then handles the creation of an iteration of the model, which is represented by a graph containing the needed data identified in step one.

Once an iteration of the model has been completed, the third step, Consult System Administrators, is reached, where the model is presented to key personnel with a deep knowledge of the source systems or the system to be created.

The fourth step, Correct Model?, determines whether the created model is correct, upon which either the fifth or the sixth step is reached, depending on the feedback of the system administrators in step three.

The fifth step, Identify Errors, is reached if the created model does not correctly represent the new system. The possible faults in the model are then identified by either consulting the system administrators or accessing relevant documentation. Once the faults have been identified step two is once again reached, and a new iteration of the model can be created.

5.4 Data Migration

This section presents the process of how the data needed for the database will be migrated into Neo4j. The raw data input could potentially come from many different sources, but given the scope and time restrictions of the thesis, only the most basic case is handled. Most often the data can be found in existing DBMSs that are in use. Four steps have been identified in order to migrate the required data from other DBMSs to Neo4j. All the steps, and the relations between them, are shown in Figure 5.3.

Figure 5.3: The model for data migration.

The first step involves accessing and retrieving all relevant raw data in some structured format, such as JSON or CSV. Either format could require some parsing and cleaning, which is step two, in order to convert the data to the input structure needed by Neo4j.

Once the raw data has been converted to the correct format it can be imported into Neo4j, which is step three, where it can then be analysed using the graph query language Cypher, which is step four.
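The first two migration steps can be sketched in a few lines. The thesis used Java for this work; the Python sketch below is only an illustration, and the field names, the sample records and the `KEEP` list are assumptions rather than PostNord's actual schema.

```python
import csv
import io
import json

# Hypothetical raw exports; field names are illustrative only.
RAW_CSV = "id,name,internal_note\nc1,Acme AB,ignore me\n"
RAW_JSON = '[{"id": "c2", "name": "Nordic Freight", "internal_note": "x"}]'

KEEP = ("id", "name")  # assumed set of fields that the graph model needs


def read_records(csv_text, json_text):
    """Step one: retrieve raw records from both structured formats."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    records.extend(json.loads(json_text))
    return records


def clean(records):
    """Step two: drop fields that the graph data model does not use."""
    return [{key: rec[key] for key in KEEP} for rec in records]


def to_import_csv(records):
    """Produce the CSV text that would be handed to the import in step three."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=KEEP, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


consignors_csv = to_import_csv(clean(read_records(RAW_CSV, RAW_JSON)))
```

Both sources end up in one uniform CSV, which is the property the import step relies on.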

5.5 Analysis

Figure 5.4: The model used to create queries.

The first step, Read Query Requirements, includes accessing information about the query to be implemented, as well as understanding the underlying structure in the logistics network.

The second step, Construct, involves the process of creating a query to meet the requirements specified in the specification of requirements, using the information from step one.

Then the query is run in the third step, Run Query, and interesting examples of the query result can be found. This could include binding a query to a specific company or person, which leads to real world scenarios that can be verified more easily. This is further explained in Section 6.3.

The fourth step, Expected Results?, involves checking whether or not the query functions according to the specification of requirements. This could be done in many ways depending on the scenario. As mentioned in the third step, some queries could be bound to a specific person or company, for example, in which case it would be possible to manually calculate the result of a specific query. It could also involve consulting people familiar with the logistics system. If the results appear to be correct the query is complete, in which case the sixth step is reached. If the results seem to be incorrect, the fifth step is reached.

The fifth step, Identify Errors, identifies the possible errors in the query. Once an error has been found the second step is once again reached, where a new query will be created that does not contain the error.

5.6 Experimental Design

This section presents the test environment as well as various software programs and hardware unrelated to DBMSs that are used throughout the development phase of the underlying degree project.

5.6.1 Test Environment

All tests showcased in this thesis are run on a laptop using Neo4j’s standard browser, which is able to visualize the results in the form of a graph, in addition to providing much information such as the run times of queries [26].

5.6.2 Hardware Setup

Table 5.4 provides information about the hardware used throughout the whole thesis project, including the test phase.

Table 5.4: Hardware setup used for Neo4j environment.

Name               Microsoft Surface Pro 3
Processor          Intel Core i5 4300U
GPU                Intel HD Graphics 4400
Internal Storage   128GB SSD
RAM                4GB DDR3
Operating System   Windows 10 Pro 64-bit

5.6.3 Java

Java is an object-oriented programming language [27] and is used in the underlying thesis project to parse raw data into the input format that Neo4j requires. Java was chosen because existing internal libraries at PostNord could decrease the development time needed to create a parser for the Neo4j input data.

5.6.4 Trello

Many other task management products exist, and Trello is not specifically needed for a project of this nature.

5.7 Assessing Reliability and Validity

Being able to assess the reliability and validity of a created product is hugely important in order to be able to confirm results and conclusions. This section details the process of how the results are validated and of how they are assessed to be reliable.

5.7.1 Reliability

There are two main aspects of the results where the reliability needs to be assessed. These are the graph data model and the query results.

The graph data model will be assessed in interviews by asking key PostNord employees if the graph data model is reliable, i.e. if it covers every possible scenario.

The query results in Neo4j need to be consistent and give a specific output given a specific input. This is assessed by the employees at PostNord in interviews, by confirming that the outputs are the same throughout multiple test runs.
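The requirement that a given input always gives the same output can also be checked mechanically. The sketch below is not part of the interview-based assessment used in the thesis. It is a minimal Python illustration in which `is_repeatable`, `fake_query` and `flaky_query` are hypothetical names, and the query functions are stand-ins for executing a Cypher query against Neo4j.

```python
from collections import Counter
from itertools import count


def is_repeatable(run_query, runs=5):
    """Return True if every run yields the same rows, ignoring row order."""
    baseline = Counter(run_query())
    return all(Counter(run_query()) == baseline for _ in range(runs - 1))


def fake_query():
    # Stand-in for a deterministic query; rows are returned as tuples.
    return [("Kiosk A", 3), ("Store B", 2)]


_calls = count()


def flaky_query():
    # Stand-in for a query whose result changes between runs.
    return [("row", next(_calls))]


stable = is_repeatable(fake_query)
unstable = is_repeatable(flaky_query)
```

Rows are compared as multisets because a query without ORDER BY gives no guarantee about row order.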

5.7.2 Validity

There are two aspects of the degree project that need to be validated. The first aspect is the graph data model of the logistics network in use by PostNord and the second is the results of the queries in Neo4j.

The model is valid if it can fulfill two conditions. The first condition is that the model does not stray from underlying documentation of the logistics network and the second condition is that no administrator of the network at PostNord claims that it is incorrect. These conditions are checked by conducting interviews with key PostNord employees and asking about the model’s correctness.

5.8 Evaluation Framework

The usability, strengths and weaknesses of Neo4j are not clearly defined. They depend highly on the use case and the people who use the system. In this case PostNord has provided the specification of requirements used by the thesis project, and as such they are in the best position to provide feedback on the end result. Evaluation data on the graph data model, the queries (both results and implementation) and the prototype as a whole will be gathered both throughout the development process and from the final product.

As explained in Section 3.4, the evaluation of the prototype is based on interviews. There exist many different ways to conduct interviews, and this thesis will use unstructured and informal interviews in order to collect evaluation data. This means that there will be no structured interview questions and no specific person will be bound to a specific response. Structured interviews are not used in this thesis in order to capture the true feelings of the interviewed employees at PostNord. Much of the feedback is gathered throughout the development process, which means that the interview questions would have to be modified multiple times. Furthermore, since the project is very exploratory with no firm grasp of what the results would be, limiting the interviewed employees to answering predefined questions might limit the findings of the thesis.

Chapter 6 Implementation

This chapter explains the implementation process of the prototype and includes the various steps and challenges that arose throughout the development process. The implementation phases closely follow the development model and development processes defined in Chapter 5.

6.1 Model

The iterative model illustrated in Figure 5.2 was followed, and a large focus during the first stage (Read Related Documentation) was to identify various flows in the model, i.e. parts of the model that were connected in a clear way. These flows could then easily be converted into a part of the model. At first the three most basic components were identified. These were the Consignor, Shipment and Consignee, as well as the relations between the three nodes, illustrated in Figure 6.1.

Figure 6.2: The advanced graph model.

The graph data model shown in Figure 6.2 contains most of the information needed by the queries related to customer support, but it does not cover any business aspects, meaning that it does not cover the queries related to B2B. Both a Consignor and a Consignee can be a part of an Organisation, and an Organisation can be a part of a Trade. Adding these pieces to the model illustrated in Figure

Figure 6.3: The complete graph model that will be used to create the graph DBMS.

6.2 Data Migration

6.2.1 Data Retrieval

This section presents how data needed for the prototype service was collected. The data required by the consumer profiling system in order to create the queries specified in the specification of requirements could be retrieved from some of PostNord’s existing RDBMSs. Although the majority of the retrieved data was in a CSV format, some was in a JSON format, meaning multiple parsers had to be used in order to convert the input to the structure required by Neo4j. In order to limit the amount of data used by the prototype, only data from a short period of time was used. This resulted in around 9 million shipments, 3.5 million consignees, 90,000 consignors and 60,000 zip areas, all of which were retrieved from PostNord’s existing DBMSs.

6.2.2 Parsing

Prior to passing collected data into the Neo4j database, some data needed to be modified. This was done by using a parser, which is explained in this section.

In order to create a versatile and modular parsing system Java was used. As mentioned in Section 6.2.1, two parsers were needed in order to convert data from both JSON format and CSV format to the input structure needed by Neo4j. In addition, a substantial amount of the retrieved data was not needed by the consumer profiling system, meaning that such data needed to be removed before creating the database in Neo4j.

A lot of the retrieved data was also based on user input, i.e. information that the customer had entered without any guidelines. As such the parsing process also needed to take into account many different aspects such as names with or without the use of capital letters, different phone number formats, and so forth.
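The kind of clean-up that free-form user input calls for can be illustrated as follows. The actual rules used by the parser are not published, so everything in this Python sketch is an assumption: the function names, the capitalisation rule and the choice of +46 as the default country code.

```python
import re


def normalize_name(raw):
    """Collapse whitespace and unify capitalisation of each name part."""
    return " ".join(part.capitalize() for part in raw.split())


def normalize_phone(raw, default_country_code="+46"):
    """Reduce a phone number to a compact '+<digits>' form where possible.

    Assumptions: '00' is an international prefix, and a single leading 0
    marks a national number that receives the default country code.
    """
    compact = re.sub(r"[^\d+]", "", raw)
    if compact.startswith("00"):
        return "+" + compact[2:]
    if compact.startswith("0"):
        return default_country_code + compact[1:]
    return compact


name = normalize_name("  sVEN   svensson ")
numbers = {normalize_phone(n) for n in ("070-123 45 67", "+46 70 123 45 67", "0046701234567")}
```

Three spellings of the same number collapse to one value, which is what allows a query to match a consignee by phone number regardless of how the number was typed.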

Table 6.1: Visual example of two files containing a list of Person or Pet with corresponding properties.

ID        :LABEL   name
person1   Person   NSven

ID     :LABEL   name    type
pet1   Pet      NSnow   TCat

As illustrated in Table 6.1, Sven has the identifier N (Name) prior to his name to signify that the value is a name. Snow also has an N in front of its name, and Cat has a T (Type) in front of it. These identifiers serve no specific purpose to Neo4j, but they help the user to differentiate between the different hash values created by the parser.

Table 6.2: Visual representation of how a relationship between a Person and a Pet might look.

:START_ID(Person)   :END_ID(Pet)
person1             pet1

In Table 6.2 the start ID is set to person1, Sven’s ID, and the end ID to pet1, Snow’s ID. This creates an edge from Sven to Snow.
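Files of the shape shown in Tables 6.1 and 6.2 could be produced along the following lines. This is a Python sketch under stated assumptions: the thesis does not name the hash function, so SHA-256 truncated to eight characters is used purely for illustration, and `tagged_hash` and `write_csv` are hypothetical helper names. The column headers mirror the tables.

```python
import csv
import hashlib
import io


def tagged_hash(tag, value, length=8):
    """Return the tag followed by a shortened SHA-256 hex digest of the value."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return tag + digest[:length]


def write_csv(header, rows):
    """Serialise a header and rows to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# One node file per node type, one relationship file per relationship type.
persons = write_csv(["ID", ":LABEL", "name"],
                    [["person1", "Person", tagged_hash("N", "Sven")]])
pets = write_csv(["ID", ":LABEL", "name", "type"],
                 [["pet1", "Pet", tagged_hash("N", "Snow"), tagged_hash("T", "Cat")]])
owns = write_csv([":START_ID(Person)", ":END_ID(Pet)"], [["person1", "pet1"]])
```

The leading tag keeps a hashed name distinguishable from a hashed type even though both are opaque strings.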

Due to privacy concerns the exact parsing process or the parsed data cannot be included in the thesis, but one CSV file was created for every type of node and edge in the final graph data model shown in Figure 6.3. As such the files shown in Table 6.3 were created.

Table 6.3: All CSV files with all the migrated data in nodes and relationships.

Node                 Relationship
zipAreas.csv         to.csv
consignors.csv       sent.csv
consignees.csv       belongs_to.csv
shipments.csv        has.csv
deliveryPoints.csv   in.csv
party.csv            located_at.csv
organisation.csv     picked_up_at.csv

6.2.3 Database Creation

Once the data files had been created, they could be imported into Neo4j. This was done using the Windows command prompt and a specified command, which is shown in Appendix A. In that command, relationships:X x signifies that X is what the relationships in the file x should be labeled as in Neo4j. Neo4j then proceeds to create the database, after which the analysis can begin.
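The structure of such an import command can be sketched as follows. The exact command is the one in Appendix A and is not reproduced here. The Python sketch below assumes the `neo4j-import` tool and its `--into`, `--nodes` and `--relationships:TYPE` options as shipped with Neo4j releases of that period; the target directory `graph.db` and the helper name `build_import_args` are hypothetical.

```python
def build_import_args(into, node_files, relationship_files):
    """Assemble an argument list for the neo4j-import tool.

    relationship_files maps a relationship type (the X in relationships:X x)
    to the CSV file (x) that holds the relationships of that type.
    """
    args = ["neo4j-import", "--into", into]
    for path in node_files:
        args += ["--nodes", path]
    for rel_type, path in relationship_files.items():
        args += ["--relationships:" + rel_type, path]
    return args


args = build_import_args(
    "graph.db",  # hypothetical target directory
    ["consignors.csv", "shipments.csv", "consignees.csv"],
    {"SENT": "sent.csv", "TO": "to.csv"},
)
command_line = " ".join(args)
```

Only a subset of the files from the data migration is listed, to keep the example short.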

6.3 Analysis

This section describes the implementation of the queries defined in the specification of requirements. The process is divided into two subsections, which describe the queries related to customer support and B2B, respectively.

6.3.1 Customer Support

Four queries were provided in the specification of requirements that are related to customer support. These are individually implemented in this section.

6.3.1.1 Find Shipment by Kolli-id

The goal of this query is to use a shipment’s kolli-id to find information about it. Obviously a shipment has to be sent by a consignor and sent to a consignee, so these are the first pieces that can be identified to be a part of the query.

MATCH (cr:Consignor)-[:SENT]->(s:Shipment)-[:TO]->(ce:Consignee)

The consignee then has a party, which also could be included in order to add some more information about the consignee to the result.

MATCH (cr:Consignor)-[:SENT]->(s:Shipment)-[:TO]->(ce:Consignee)-[:HAS]->(p:Party)

Then the query needs to be matched to the specific kolli-id, after which the results can be returned.

WHERE s.id="X"
RETURN cr, s, ce, p

6.3.1.2 Find Shipments for a Consignee

In this query the goal is to find all shipments for a consignee by using identifiers such as name, phone number and email. The shipments, the consignee and the party will therefore be a part of the query. Which organisation the shipments come from is of no interest, so it does not need to be included in the query.

MATCH (s:Shipment)-[:TO]->(ce:Consignee)-[:HAS]->(p:Party)

The current version of the query will find all shipments that are sent to a consignee, so there is a need to bind it to a specific consignee. As mentioned, this can be done by binding it to a consignee’s email, phone number or name. The result can then be returned.

MATCH (s:Shipment)-[:TO]->(ce:Consignee)-[:HAS]->(p:Party)
WHERE (p.email="X" OR p.mobile="Y") AND p.name="Z"
RETURN s, ce, p

In the final version of the query, X represents the email, Y the phone number and Z the name. The query then returns information about the shipment, the consignee, and the consignee’s party.
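The grouping of the conditions matters here: the email or the phone number may match, but the name must always match. The Python sketch below mirrors that predicate on a few invented party records; the records and the helper name `matches` are hypothetical. It also shows what happens when the parentheses are dropped, since `and` binds more tightly than `or`.

```python
def matches(party, email, mobile, name):
    """Mirror of: (p.email = X OR p.mobile = Y) AND p.name = Z."""
    return (party["email"] == email or party["mobile"] == mobile) and party["name"] == name


parties = [
    {"name": "Sven", "email": "sven@example.com", "mobile": "+46700000001"},
    {"name": "Sven", "email": "other@example.com", "mobile": "+46700000002"},
    {"name": "Anna", "email": "sven@example.com", "mobile": "+46700000003"},
]

hits = [p for p in parties if matches(p, "sven@example.com", "+46700000009", "Sven")]

# Without parentheses the expression reads as: email OR (mobile AND name).
loose = [
    p for p in parties
    if p["email"] == "sven@example.com"
    or p["mobile"] == "+46700000009" and p["name"] == "Sven"
]
```

With the parentheses only the first record matches. Without them the third record also matches, because an email match alone is then sufficient.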

6.3.1.3 Find Shipments for a Consignee from a Consignor

The goal of this query is to find all shipments for a consignee from a consignor by using identifiers such as name, phone number and email. This is highly similar to the query handled in Section 6.3.1.2, except that it is bound to a specific consignor. As such the first iteration of the query will look almost the same, except that the consignors that sent the shipments are included.

Once again there is a need to bind the query to a specific consignee, by email, phone number, name and so forth. The consignor also needs to be defined, which can be done by providing the name of the consignor.

WHERE (p.email="X" OR p.mobile="Y") AND p.name="Z" AND cr.name="W"

In the final version of the query, X represents the email, Y the phone number, Z the name of the consignee and W the name of the consignor. Information about the consignor, shipments, consignee and the consignee’s party is then returned.

6.3.1.4 Find Top 10 Suitable Delivery Points for a Consignee

The goal of this query is to find the 10 most suitable delivery points for a consignee, ordered by the consignee’s favourite delivery points, i.e. the ones that are most in use by the consignee.

To start things off, the goal is to find where all shipments are delivered, so the shipments, the consignee and the delivery points need to be included in the query.

MATCH (s:Shipment)-[:TO]->(m:Consignee),
  (s:Shipment)-[:PICKED_UP_AT]->(n:DeliveryPoint)

The current query will run for all consignees, so there is a need to bind it to a specific consignee. Once it is bound, the results can be returned.

MATCH (s:Shipment)-[:TO]->(ce:Consignee),
  (s:Shipment)-[:PICKED_UP_AT]->(dp:DeliveryPoint)
WHERE ce.name="X"
RETURN dp.name, count(s)
ORDER BY count(s) DESC
LIMIT 10

In the final version of the query, X represents the name of the consignee. The 10 most used delivery points, counted by the amount of shipments, are then returned, coupled with the number of shipments that have been delivered to the locations.
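The counting, ordering and limiting that the query performs can be reproduced by hand on a small data set, which is one way to calculate an expected result manually. The Python sketch below uses invented shipments, and `top_delivery_points` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (consignee name, delivery point name) pairs, one per shipment.
shipments = [
    ("X", "Kiosk A"), ("X", "Kiosk A"), ("X", "Store B"),
    ("Y", "Store B"), ("X", "Kiosk A"), ("X", "Depot C"), ("X", "Store B"),
]


def top_delivery_points(shipments, consignee, limit=10):
    """Count shipments per delivery point for one consignee, most used first."""
    counts = Counter(dp for name, dp in shipments if name == consignee)
    return counts.most_common(limit)


result = top_delivery_points(shipments, "X")
```

The shipment addressed to consignee Y is ignored, just as the WHERE clause restricts the query to a single consignee name.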

6.3.2 Business To Business

In this section all three queries provided in the specification of requirements that are related to B2B are implemented.

6.3.2.1 Find all Shipments Sent to Organisations

The goal of this query is to find all shipments sent from one organisation to other organisations.

MATCH (o:Organisation)<-[:BELONGS_TO]-(:Consignor)
  -[:SENT]->(s:Shipment)-[:TO]->(:Consignee)
  -[:BELONGS_TO]->(o2:Organisation)

MATCH (o:Organisation)<-[:BELONGS_TO]-(:Consignor)
  -[:SENT]->(s:Shipment)-[:TO]->(:Consignee)
  -[:BELONGS_TO]->(o2:Organisation)
WHERE NOT o.name=o2.name AND o.name="X"
RETURN count(s) AS sent, o2.name AS organisation
ORDER BY sent DESC

In the final version of the query, X represents the name of the sending company. The organisations that receive the shipments and the number of shipments they receive are then returned, ordered by the number of shipments.

6.3.2.2 Find Biggest Organisations in an Area

This query finds the organisations that send the highest amount of shipments to a specific area. A consignor belongs to an organisation and sends a shipment that is picked up at a delivery point, which in turn is located at a location, namely a ZipArea.

MATCH (o:Organisation)<-[:BELONGS_TO]-(cr:Consignor)
  -[:SENT]->(s:Shipment)-[:PICKED_UP_AT]->(d:DeliveryPoint)
  -[:LOCATED_AT]->(z:ZipArea)

The query then needs to be bound to a specific area. The found organisations also need to be sorted by the amount of shipments that they send.

MATCH (o:Organisation)<-[:BELONGS_TO]-(cr:Consignor)
  -[:SENT]->(s:Shipment)-[:PICKED_UP_AT]->(d:DeliveryPoint)
  -[:LOCATED_AT]->(z:ZipArea)
WHERE z.id="X"
RETURN o.name, count(s) AS sum
ORDER BY sum DESC

In the final version of the query, X represents the ZipArea’s id. The names of the organisations are returned, coupled with the amount of shipments they send to the specific area. The organisations are ordered by the amount of shipments they send in a descending order.

6.3.2.3 Find Consignees’ Consignees

MATCH (o1:Organisation)<-[:BELONGS_TO]-(c1:Consignor)
  -[:SENT]->(s1:Shipment)-[:TO]->(ce1:Consignee)
  -[:BELONGS_TO]->(o:Organisation)
WHERE o1.name="X" AND NOT o.name="X"

Once all receiving organisations have been identified, their consignees can in turn be found by adding to the query. Once those consignees have been identified, they can be returned in a descending order, ordered by the amount of received shipments.

MATCH (o1:Organisation)<-[:BELONGS_TO]-(c1:Consignor)
  -[:SENT]->(s1:Shipment)-[:TO]->(ce1:Consignee)
  -[:BELONGS_TO]->(o:Organisation)
WHERE o1.name="X" AND NOT o.name="X"
WITH o, o1
MATCH (o:Organisation)<-[:BELONGS_TO]-(c:Consignor)
  -[:SENT]->(s:Shipment)-[:TO]-(ce:Consignee)
  -[:BELONGS_TO]->(o2:Organisation)
WHERE NOT o2.name=o.name
RETURN o1.name AS sendingorganisation, count(s) AS sum, o2.name AS receivingConsignee
ORDER BY sum DESC
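The two-hop logic can be checked by hand on a small data set. The Python sketch below uses invented shipments between organisations, and `consignees_of_consignees` is a hypothetical helper name. It collects the intermediate organisations as a set, so each one is visited once. As far as the printed query goes, `WITH o, o1` keeps one row per first-hop shipment, so a `WITH DISTINCT o, o1` would be needed for the Cypher counts to agree with this sketch.

```python
from collections import Counter

# Hypothetical shipments as (sending organisation, receiving organisation).
shipments = [
    ("X", "A"), ("X", "A"), ("X", "B"),
    ("A", "C"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "B"),
]


def consignees_of_consignees(shipments, sender):
    """Count shipments from the sender's receiving organisations to others."""
    # First hop: organisations that receive shipments from the sender.
    first_hop = {dst for src, dst in shipments if src == sender and dst != sender}
    # Second hop: shipments sent onwards by those organisations, excluding
    # shipments that stay within the same organisation.
    counts = Counter(
        dst for src, dst in shipments if src in first_hop and dst != src
    )
    return counts.most_common()


result = consignees_of_consignees(shipments, "X")
```

Organisation C receives two shipments from A and one from B, which gives a total of three, while the shipment that B sends to itself is excluded.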

Chapter 7 Graph Database Management System for a Logistics-related Service

In this chapter the results of the thesis are presented. The chapter is divided into five separate parts, which present and discuss the results of the finished prototype service, present the feedback on the different aspects of the prototype, provide an analysis of validity and reliability, and finally discuss the results presented in this chapter.

7.1 Prototype Results

This section presents the final version of the prototype and provides a detailed insight into every major aspect of the prototype service. Example outputs are provided in order to give a greater perspective of how the implemented queries might work in a normal situation.

7.1.1 Model

In this section the final version of the model is presented. The final version is shown in Figure 6.3. It represents the different aspects of the system as well as their relations to each other. The nodes’ names have previously been explained in Section 5.2, whereas the relations’ labels describe the relations’ types. Table 7.1 details the different properties of the nodes specified in Figure 6.3. The more difficult terms are explained in Table 5.1.

Table 7.1: Node types and their corresponding properties.

Trade Organisation Consignor Shipment

ID ID ID ID

name name weight source weightunit volume service oETA ATA Consignee Party DeliveryPoint ZipArea

ID ID ID ID

name name name lon email zip lat number type

country

7.1.2 Analysis

This section provides the finished queries as well as example query results. Note that names and other important parameters have been edited in order to protect private information.

7.1.2.1 Find Shipment by Kolli-id

Figure 7.1: A Shipment node with its relations and properties.

7.1.2.2 Find Shipments for a Consignee

Appendix B.2 contains the final version of the query. Figure 7.2 shows an example output of the query: the Consignee, its Party, and all its Shipments. As a time reference, the example query took around 8 seconds to run.

7.1.2.3 Find Shipments for a Consignee from a Consignor

Appendix B.3 contains the final version of the query. Figure 7.3 shows an example output of the query: all shipments between a Consignor and a Consignee, together with the Consignee’s Party. As a time reference, the example query took around 0.5 seconds to run.

Figure 7.3: Two Shipments sent from a Consignor to a Consignee that has a Party.

7.1.2.4 Find Top 10 Suitable Delivery Points for a Consignee

Figure 7.4: Top 10 Delivery Points by delivered Shipments for a Consignee.

7.1.2.5 Find all Shipments Sent to Organisations

Figure 7.5: The top 10 Organisations that an Organisation has sent Shipments to, ordered by the amount of Shipments.

7.1.2.6 Find Biggest Organisations in an Area
