
DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

Proposal and Evaluation of a Database Data Model Decision Method

Förslag och utvärdering av en beslutsmetod för databasmodeller

EMIL LINDHOLM BRANDT
SABINA HAUZENBERGER

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Abstract

A common problem when choosing a data model for a database is that there are many aspects to take into consideration, making the decision difficult and time-consuming. Therefore, this work aims to create a decision method that enhances the decision by making it more suitable for the use-case at hand as well as making it quicker.

First, the Analytical Hierarchy Process, a multi-criteria decision method, was identified as a suitable framework that the created decision method was based on. It was developed iteratively and later validated through a survey at Omegapoint. The survey had 27 respondents, but 14 answers were discarded due to being too unreliable, which led to a total of 13 utilized responses. The decision method was implemented in a web application to simplify the survey process: the respondents used the web application and then answered some follow-up questions about the web application's result and process.

It was found that it is possible to create a decision method which makes the choice of a data model quicker and better suited for the use-case. The method is reliable among a subsample of the respondents in the survey as 11 out of 13 respondents found the decision method’s result to be reasonable.

However, the small sample size makes it impossible to draw any statistical conclusions about the reliability of the decision method. Additionally, the decision method helps to make the decision quicker, but this is only shown among the respondents in the survey.

Based on the results, we conclude that it is possible to create a decision method which makes the decision quicker and better suited for the use-case.

However, this is only shown among the survey respondents, and future work could aim to repeat the validation in order to statistically validate the reliability of the decision method.

Keywords

Database, data model, Analytical Hierarchy Process, decision method


Sammanfattning

A common problem when choosing a data model for a database is that there are many aspects to take into account, which makes the choice difficult and time-consuming.

This work therefore attempts to create a decision method that can improve the decision by making it quicker and better suited to the use-case.

First, the Analytical Hierarchy Process, a multi-criteria decision method, was chosen as the foundation for the developed decision method. The decision method was developed iteratively and was then validated through a survey at Omegapoint. The survey had 27 respondents, but 14 answers were removed for being too inconsistent, which meant that 13 answers were used in the end. In the survey, the participants used a web application based on the decision method and then answered some questions and gave feedback about the artifact's result and process.

The results showed that it is possible to create a decision method that makes the choice of data model quicker and better suited to the use-case.

The method is considered reliable among the participants in the survey, where 11 out of 13 considered the result reasonable. However, the work cannot draw any statistical conclusions about how reliable the method is in general, due to the low number of participants in the survey. In addition to good reliability, the method contributes to a quicker decision, but this can only be shown for the group of participants in the survey.

Given the results, we can conclude that it is possible to create a decision method that makes the choice of data model quicker and better suited to the use-case. However, this can only be shown for the group of participants in the survey, and it is therefore suggested that future work could repeat the validation with a larger group of participants in order to statistically establish the reliability of the method.

Keywords

Database, data model, Analytical Hierarchy Process, decision model


Acknowledgements

This work was done at Omegapoint, an IT consulting firm in Stockholm, and we would like to thank them for giving us the opportunity and resources to do this work at their office. We would especially like to thank Lasse, our supervisor at Omegapoint, who helped us a lot, listened to our ideas and gave us feedback almost daily. We would also like to thank Leif Lindbäck for listening to our ideas and helping us form them. Lastly, we would like to thank Johan Montelius and Anders Sjögren, our supervisor and examiner, for helping us with this work.


Contents

1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
  1.5 Method
  1.6 Delimitations
  1.7 Risks, Consequences and Ethics
  1.8 Outline

2 Background
  2.1 Database models
  2.2 Analytical Hierarchy Process
  2.3 Related work

3 Method
  3.1 Research method
  3.2 Case study
  3.3 Literature study
  3.4 Project method
  3.5 Technical method

4 Designing the decision method
  4.1 Analytical Hierarchy Process
  4.2 Identification of criteria
  4.3 Creation of the decision method hierarchy
  4.4 Evaluation of criteria in relation to data models
  4.5 Summary of the decision method

5 Evaluation of the decision method
  5.1 Implementation of the decision method
  5.2 Case study
  5.3 Reliability of the decision method

6 Summary
  6.1 Method discussion
  6.2 Results discussion
  6.3 Conclusion
  6.4 Future works

Appendix A  The assessments of the data models


Chapter 1 Introduction

1.1 Background

When developing almost any application, a database is required in order to store persistent information. Not only is a database required, but the database is often also expected to be performant, have certain features and be able to handle specific data. The first decision one has to make when selecting a database is choosing which data model to use. The data model is the conceptual choice of how the data is structured and handled in the database.

Choosing a data model that is suitable for the application may be a complex and difficult choice.

1.2 Problem

When planning the development of a new use-case, developers have to decide which data model to use for the database in the use-case. This is an important choice, since the wrong decision may lead to terrible performance and unnecessarily complex and slow development, among many other pitfalls. One has to consider how well the database has to perform, which usage patterns will be common and which features it needs. This is not a straightforward choice in most scenarios and can be difficult and time-consuming to make. There is a risk that developers may only seriously consider data models they already have experience with or are comfortable with, and not what is best suited for the use-case at hand. Moreover, it is entirely possible that a developer does not know enough about the most common data models to make a well-informed choice.

This problem is recognized at Omegapoint, where consultants think that it can be difficult to choose a suitable data model, both because not everyone has knowledge about every data model and because of the complexity of the subject itself. The same problem can be observed on internet forums and in blog posts, where never-ending arguments about which data model is best suited for different scenarios go on. This therefore seems to be a general problem within the field of backend development and databases.

1.3 Purpose

The project seeks to enhance the process of choosing a data model for a database by making it quicker, and to make the outcome of the decision better suited for the use-case at hand. The consultants at Omegapoint agree that the choice of data model could be better suited for the use-cases and made less out of habit.

1.4 Goals

This work aims to create a decision method that enhances the decision when choosing a data model for a use-case within a project. Such a method should make the choice of data model quicker and more suitable for the use-case.

Quicker development is beneficial for everyone as long as it does not compromise the quality of the product. The decision method should produce suitable recommendations that may be used for use-cases in real scenarios.

1.5 Method

Since it is difficult to make the choice of which data model to use, this work investigates whether it is possible to create a method that helps enhance the choice of a data model for a given use-case within a project.

Therefore the main research question is:

Can a decision method be created to help enhance the choice of data model given a use-case's technical requirements?

The enhancement should be in terms of making the choice quicker and better suited for the use-case at hand.

In order to answer the question, an attempt will be made to choose, create and validate the reliability of a decision method that helps with the choice of a data model. Research and literature study will be conducted in order to identify a suitable decision method, then further research and work will be carried out to iteratively create and refine the method. To validate and assess the method's reliability, an implementation of the decision method will be created. The implementation will then be evaluated and validated in a case study at Omegapoint, utilizing its consultants' expertise and experience.

1.6 Delimitations

This project will only consider the following data models: relational, document, wide-column, key-value and graph. The reason for this is that they are the most common data models [1].

Only one decision method will be created due to constraints in time.

Furthermore, only the final version will be evaluated and proposed as a solution to the problem.

The decision method will only consider single use-cases, such as a login system, or an order system. An attempt to create an all-round recommendation for a data model that handles lots of different use-cases will not be made. Instead, the idea of polyglot persistence, using different data models or databases for different use-cases within the same project, is encouraged in these cases.

The work will not look at the non-technical aspects of choosing a suitable data model. These aspects include the pricing of vendor implementations, customer support, licenses, available technical expertise, inherited legacy system choices, and miscellaneous tools surrounding database products to name a few.

1.7 Risks, Consequences and Ethics

As with almost any technology, there is a possibility that it may be used for nefarious purposes. However, it is not apparent that this work would have any obviously bad consequences as this investigation considers databases in general in a very technical sense.

1.8 Outline

In chapter 2, Background, brief information about the different data models and also the Analytical Hierarchy Process will be covered. In chapter 3, Method, the applied method for the whole thesis will be presented. Chapter 4, Designing the decision method, will cover the creation of the decision method. Chapter 5, Evaluation of the decision method, will present the implementation of the decision method and the case study. In chapter 6, Summary, the discussion regarding the results and the applied method, the conclusion, and some future works will be covered.


Chapter 2 Background

In order to fully grasp this work, one must have a basic understanding of computer science and programming. This chapter elucidates any additional required knowledge: the five considered data models are explained, as well as the Analytical Hierarchy Process, which is the method that the created decision method is based on.

2.1 Database models

There are two primary families of data models: relational and aggregate. The relational data models examined are the relational and graph data models.

The aggregate data models examined are the document, key-value and wide-column data models.

The relational data models model the world as small separate entities with relationships to each other. The relationships are used to combine the small entities into larger, often more useful, entities, also known as aggregates. Note that in order to achieve aggregates, such as the information about which courses a student attends at a university, one must examine the relationships between the entities Student and Course and combine them.

The aggregate data models store data in terms of more complete real-world entities. Instead of breaking up the information about which courses a student attends at a university and connecting them through relationships, they are stored together in a single entity.

Relational

The relational data model is often referred to as Structured Query Language, SQL, which in reality is its query language. It has been the dominating data model since the 1980s [2][3] and is still the most popular data model for data storage [1]. The relational data model consists of tables that may have multiple columns. A column is a cell that may contain simple data or be a foreign key which points to a row in another table. This foreign key forms a relationship between the tables, which is why the data model is called relational. The relational data model requires a strict schema to be followed when writing data. This limits what values or types of data may be written to a row, on a column-by-column basis. In order to change this schema, a special query must be executed to change the schema for an entire table.

In databases based on the relational model, the designer tries to normalize the data, which means breaking up the data into smaller and smaller tables until each cell holds only one simple value and there is no redundancy.

A database containing information about which courses a university student attends would have one table holding the information about each student, another table holding the course information, and yet another holding the relationships between the Course and Student tables. In the relationships table, which could be called StudentAttendsCourse, one column would be a foreign key to the Student table, and another would be a foreign key pointing at a row in the Course table. Then, to see which students attend which courses, a JOIN operation is executed which combines these three tables to form an aggregate.

Document

The document data model is best described as a tree structure, or even more accurately as a JSON object. Every entity is called a document, which lives inside a specific collection. Documents contain key-value pairs, which are often referred to as attributes. The key is a basic string, whereas the values could be primitive values (int, double, boolean, etc.), arrays or entire documents. Documents within the same collection do not need to have the same structure; they are so-called schema-less, which means that their structure and the allowed data do not have to adhere to any rules [4] [2].

However, documents in the same collection often share much of their structure by design. It would make little sense from a domain model standpoint to store completely different entities together, but having smaller variations in the stored data may be completely reasonable. Document data models are not designed to utilize relationships between collections, which means that redundancy may occur and is even encouraged if needed [2]. If relationships are required, one can create implicit relationships by storing the ID of another document, but the logic of connecting them must be completely handled by the application, which indicates that one should use a relational data model instead.

When modeling the scenario of a student attending courses at a university, there are more choices that are considered good practice than for a relational data model. One could create a collection containing Student documents and, within each Student document, store a list of courses the student attends. On the other hand, one might instead create a collection of Course documents and within each course create a list of attendees. One could even use both at the same time; this all depends on how the data is accessed.

If the only or most common use-case is to see what courses a specific student attends, then the first scenario of creating Student documents with the courses nested within might be the best.
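To make the first option concrete, here is a minimal sketch of what such a nested Student document could look like; the field names and types are illustrative assumptions and are not taken from any particular document database.

```typescript
// A hypothetical Student document with its courses nested inside it.
// Field names and structure are illustrative assumptions only.
interface Course {
  code: string;
  name: string;
  credits: number;
}

interface StudentDocument {
  _id: string;           // document key within the "students" collection
  name: string;
  enrolledYear: number;
  courses: Course[];     // the one-to-many relationship stored as a nested array
}

const student: StudentDocument = {
  _id: "student-42",
  name: "Ada Lovelace",
  enrolledYear: 2019,
  courses: [
    { code: "DB101", name: "Databases", credits: 7.5 },
    { code: "PR201", name: "Programming II", credits: 7.5 },
  ],
};
```

Reading which courses the student attends is then a single document fetch, with no relationship traversal needed.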

Key-value

The key-value data model is perhaps the simplest data model as it works much like a distributed hash-map [2]. An aspect that is important to the performance, aside from its very simplistic operations, is that there are no real updates in key-value databases, as values are overwritten if they need to be updated. The key-value data model can not have any relationships at all. The biggest difference from the other data models is that the values are completely opaque in key-value databases. The downside to this is that one can not query or filter on attributes of the data inside the database itself. On the other hand, it has the upside that the values have no restrictions as to what can be written (other than a maximum size), and therefore anything can be saved. Both standard and custom data types and any structures may be saved in a key-value database as it treats the value as a generic BLOB (Binary Large OBject). The implication of this is that the parsing from binary data to the data type or structure that the client needs has to be done on the client-side.
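As a minimal sketch of this idea, the snippet below assumes a hypothetical key-value client whose values are opaque byte buffers, so all serialization and parsing happens on the client side.

```typescript
// Hypothetical key-value client: the store only sees opaque bytes.
interface KeyValueStore {
  put(key: string, value: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | null>;
}

// The client is responsible for turning its own structures into bytes and back.
async function saveSession(store: KeyValueStore, sessionId: string, session: object): Promise<void> {
  const bytes = new TextEncoder().encode(JSON.stringify(session));
  await store.put(`session:${sessionId}`, bytes); // overwrites any previous value
}

async function loadSession(store: KeyValueStore, sessionId: string): Promise<object | null> {
  const bytes = await store.get(`session:${sessionId}`);
  if (bytes === null) return null;
  return JSON.parse(new TextDecoder().decode(bytes)); // parsing happens client-side
}
```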

Wide-column

The wide-column data model is at first a rather confusing data model. It is conceptually a relational data model where each row in a table may have a variable number of columns, at no memory cost [2]. It also lacks relationships and therefore can not utilize any JOIN operations, which combine data from multiple tables into one aggregated piece of data. However, technically it is similar to a simplified document data model, or a more complex key-value data model without opaque values. Compared to a document database, wide-column databases store "documents" in different column-families. They are accessed only through the index key and any secondary indices. These "documents" also have a maximum nesting depth of two. The most notable aspect of wide-column databases is that they are write-optimized.

Graph

A graph data model is represented as a graph data structure where each node is an entry in the database and the edges between the nodes are the relationships [2]. A node is almost equivalent to a row in a relational database or a document in a document database. Nodes within the same graph do not have to contain the same information and have no schema on write. Since graph databases hold the relationships as a native component of the data structure, querying relationships is fast and does not have to perform any expensive JOIN operations.

To model the student attending courses at a university in a graph data model, every student and course would be an individual node and the edges are the relationships between nodes. All students attending a course would have a relationship with this specific course. This could be represented as Student->attends->Course, where attends is the relationship the student node has to the course node, see figure 2.1.

Figure 2.1: How graph data models model the relationships.
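To make the node-and-edge structure concrete, here is a minimal sketch of how the Student->attends->Course example could be represented; the types and the traversal helper are our own illustration, not the storage format of any particular graph database.

```typescript
// Nodes and edges as first-class records; the edge carries the relationship type.
interface Node {
  id: string;
  label: "Student" | "Course";
  properties: Record<string, unknown>; // schema-less: any fields allowed
}

interface Edge {
  from: string;     // node id
  to: string;       // node id
  type: "attends";  // relationship type
}

const nodes: Node[] = [
  { id: "s1", label: "Student", properties: { name: "Ada" } },
  { id: "c1", label: "Course", properties: { name: "Databases" } },
];

const edges: Edge[] = [{ from: "s1", to: "c1", type: "attends" }];

// Traversing relationships is a direct lookup over edges, no JOIN needed.
const coursesForStudent = (studentId: string) =>
  edges
    .filter((e) => e.from === studentId && e.type === "attends")
    .map((e) => nodes.find((n) => n.id === e.to));
```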

2.2 Analytical Hierarchy Process

Analytical Hierarchy Process is a mathematical and psychological technique for decision-making where multiple criteria are involved [5]. The goal could be anything, such as choosing a new leader or buying a new house.

In order to utilize the method, multiple criteria that are important to the goal must be identified, along with a set of alternatives for the choice. The criteria, alternatives and goal are inserted into a hierarchy. For example, in order to make a satisfactory choice of a house, the goal is inserted at the top, the criteria in the middle and the different alternatives (the houses) at the bottom, see figure 2.2.

Figure 2.2: The hierarchy for choosing a satisfactory house [5].

When the hierarchy is created, the designer has to rate how much better one alternative is than another alternative with respect to each criterion. This is done through a series of pairwise comparisons until all alternatives have been compared to all others. The designer will, for example, rate how much better alternative A is compared to alternative B, in regards to a criterion, by answering if alternative A is equally, moderately, strongly, very strongly or extremely better than alternative B, or the inverse if B is better than A. These assessments are translated into numbers that are inserted into mathematical matrices, which are used for the final mathematical operations that yield the results.

The last step is for the decision-maker to complete a process very similar to that of the designer. Instead of rating how good the alternatives are compared to each other, the decision-maker rates how important the different criteria are for the particular choice. This will generate a second set of matrices.

When the decision-maker has completed the process of prioritizing the criteria against each other, a consistency ratio index will be calculated. This index gives the decision-maker an indication of how much their answers contradict each other. For example, if the decision-maker rates criterion A to be more important than criterion B, criterion B to be more important than criterion C, and then considers criterion C more important than criterion A, there is clearly an inconsistency. If no inconsistencies are found, the consistency ratio index will be 0, and the recommended maximum index is 0.1. Given these two sets of matrices, a series of mathematical operations are performed which results in an output vector that gives each alternative a score indicating to what degree it fits the requirements.
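To make the matrix step concrete, the sketch below shows the core calculations described above, using the common geometric-mean approximation of the priority vector rather than the full eigenvector method; the function names and the example matrix are our own.

```typescript
// A minimal sketch of the core AHP calculations, assuming a pairwise comparison
// matrix on Saaty's 1-9 scale (a[i][j] = how much more important criterion i is
// than criterion j, with a[j][i] = 1 / a[i][j]).

// Approximate the priority vector with the geometric-mean (row) method.
function priorityVector(a: number[][]): number[] {
  const geoMeans = a.map(
    (row) => Math.pow(row.reduce((p, v) => p * v, 1), 1 / row.length)
  );
  const sum = geoMeans.reduce((s, v) => s + v, 0);
  return geoMeans.map((v) => v / sum); // normalized weights, summing to 1
}

// Consistency ratio CR = CI / RI, where CI = (lambdaMax - n) / (n - 1)
// and RI is Saaty's random consistency index for matrices of size n.
function consistencyRatio(a: number[][]): number {
  const n = a.length;
  const w = priorityVector(a);
  // Estimate lambdaMax as the average of (A·w)_i / w_i.
  const aw = a.map((row) => row.reduce((s, v, j) => s + v * w[j], 0));
  const lambdaMax = aw.reduce((s, v, i) => s + v / w[i], 0) / n;
  const ci = (lambdaMax - n) / (n - 1);
  const RI = [0, 0, 0.58, 0.9, 1.12, 1.24, 1.32, 1.41][n - 1] ?? 1.45;
  return n <= 2 ? 0 : ci / RI;
}

// Example: three criteria compared pairwise.
const comparisons = [
  [1, 3, 5],
  [1 / 3, 1, 2],
  [1 / 5, 1 / 2, 1],
];
console.log(priorityVector(comparisons));   // roughly [0.65, 0.23, 0.12]
console.log(consistencyRatio(comparisons)); // well below the 0.1 threshold
```

The same kind of priority vector is computed both for the designer's assessments of the alternatives and for the decision-maker's prioritization of the criteria, and the two are then combined into the final scores.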

2.3 Related work

Multiple works comparing some of the considered data models, both in terms of performance and theoretically, were found and are presented here.

Fahd, Kaspi, and Venkatraman compared a relational database to the following NoSQL databases: document, wide-column, key-value and graph [6].

The comparison is made specifically within the field of big data. They elucidated their respective features and strengths in comparison to relational databases and discussed scenarios in which these databases may be well suited. The comparison is mostly theoretical but does refer to some previously done performance tests. They conclude that the specializations delivered by NoSQL data models make up a good complement to relational databases in new applications within big data.

Li and Manoharan compared basic CRUD (Create, Read, Update, Delete) operations in MongoDB, RavenDB, CouchDB, Cassandra, Hypertable, Couchbase, and Microsoft SQL Server Express [7]. They came to the conclusion that about half of the NoSQL databases have better performance than the SQL database. MongoDB and Couchbase were the best overall performers.

No work that tackles the same issues that this work does was found.


Chapter 3 Method

The choice of a data model for a database is important and sometimes difficult. In order to improve the suitability of the chosen data model and to make the choice quicker, a method that enhances the decision in these ways should be created. Hence, the main research question to be answered is: "Can a decision method be created to help enhance the choice of data model given a use-case's technical requirements?" Furthermore, the reliability of such a method must be evaluated in order to know whether it can produce suitable recommendations or not. To answer the main research question, three additional questions must be answered. Firstly, can a method that helps with decision-making be constructed? And is it applicable in the context of the choice of data model for a database? The strategy for answering these two questions will be explained in section 3.1 Research method. Secondly, provided that a compatible method can be created, is it reliable enough to produce suitable recommendations? The method for assessing the reliability of the decision method will be explained in section 3.2 Case study.

3.1 Research method

In order to find out if there is a decision method that makes the decision of a data model quicker and better suited to the use-case, and to ensure that the decision method is applicable, the following strategy will be applied:

1. Identify the problem

• Literature study on multi-criteria decision making methods.

• Identify data models to consider


2. Propose an implementation

• Choose a specific multi-criteria decision making method.

3. Iteratively implement and evaluate the decision method

• Identify/reconsider criteria

• Create/adjust the hierarchy of the decision method

• Evaluate the criteria in relation to the data models

• Evaluate the reliability of the decision method

4. Thoroughly evaluate and assess the reliability of the decision method

The iterative step will be repeated as long as there is time to enhance the method. After each iteration, the reliability of the decision method is evaluated using the feedback from consultants at Omegapoint and the authors' own assessments, in order to enhance the reliability. When the iterations are done, an implementation of the decision method will be created and evaluated in a case study in order to assess the reliability of the decision method.

This is a fairly straightforward method, but it is similar to Polya's research method [8] since they both share the same basic four phases.

3.2 Case study

Since the work will be carried out at Omegapoint, their consultants' expertise could be used to gather empirical data and try to draw conclusions from it. The method of choice will be a case study where the consultants at Omegapoint are the respondents. In order to efficiently evaluate the decision method, it will be implemented as a web application, hereby referred to as "the artifact". The case study will consist of a survey where the consultants will use the artifact and then answer some follow-up questions about their background as well as give feedback on the artifact's results. They will be asked whether the recommendation from the artifact is reasonable, and their answers will be categorized as negative, neutral and positive responses. The percentage of positive responses will be considered as the reliability of the decision method.

3.3 Literature study

To gather sufficient information to support the assessments of the data models and of multiple-criteria decision making methods, research was conducted.


To gain knowledge about multiple-criteria decision making methods, these sources were used:

• Articles and papers on the subject.

• Internet forums and websites.

To gain knowledge to support the assessments of the data models, these sources were used:

• Articles, papers, and books that examine the data models.

• Lectures, blog posts, and presentations, made by knowledgeable professionals, on the data models.

• Technical documentation and internet forums.

• Feedback and information from the consultants at Omegapoint.

3.4 Project method

Every project has fundamental constraints which are often referred to as the Project management triangle [9]. This is a set of three constraints (time, cost and scope) where a maximum of two may be set, and at least one should be flexible. By the nature of this bachelor thesis, the cost is set to 400 hours per person, which is not flexible. The time-frame of the project is 10 weeks and may be spread across a year at most. In this project specifically, it is set to 10 consecutive weeks, excluding the Christmas week, from October 28th to January 17th. The scope of the decision method, however, is the flexible part, which could be extended while there is time left. The scope is the number of iterations that are done when developing the decision method before the evaluation survey has to start in order to meet the deadline, which in turn affects how reliable the decision method is.

3.5 Technical method

In order to carry out the work, and in order to realize an artifact, multiple technical aids and frameworks will be used. The technical aids, frameworks, and tools used, as well as what they were used for, are explained below.

Before the artifact was created, Mathematica, a technical computing system, was used in order to manually calculate the output of the decision method. The artifact was built using NodeJS as the back-end for mathematical processing, and VueJS was used as the front-end framework for the web application. Mathematica was used to verify the output from the artifact. The survey results were stored and the graphs were made in Google Sheets.


Chapter 4

Designing the decision method

Our research found that there are many established multiple-criteria decision making methods to choose from. In this work, the Analytical Hierarchy Process was chosen as the framework which the created decision method is based on. This chapter will explain the choice of Analytical Hierarchy Process, how it works and how the decision method’s hierarchical model is designed and created.

4.1 Analytical Hierarchy Process

There are many multiple-criteria decision making methods, for example Multi-Attribute Utility Theory, Analytical Hierarchy Process and Case-Based Reasoning. Amongst them, the Analytical Hierarchy Process was chosen as the framework on which the created method is based. This was both due to how accessible information about it is, and due to its pairwise-comparison quality, which appears to be a desirable property of a decision-making method according to the authors of the paper "Tentative guidelines to help choosing an appropriate MCDA method" [10]. The Analytical Hierarchy Process is considered good when it comes to performance problems [11], which is what this work limits itself to regarding the criteria.

In order for the Analytical Hierarchy Process to work, multiple assessment matrices have been created that rank the data models in relation to each other, with regard to each criterion. Some criteria were easy to assess objectively and had clear differences in which data model is better in regards to the criterion. For others, multiple benchmarks might contradict each other and the de-facto views on which data model excels at what. In these cases, a qualified assessment was made by the authors based on multiple factors, such as benchmarks, internet forum discussions, technical documentation, discussions with consultants at Omegapoint, and reasoning. However, the assessments were based on the judgment of the authors of this report and should not be considered perfect.

4.2 Identification of criteria

In order to assess if a particular data model is suitable for a use-case in an application or system, one must establish a set of criteria that are used to compare the data models against each other. The criteria for the decision method were selected by the authors of the report. The selection of criteria is based on the books NoSQL Distilled [2] and Designing Data-Intensive Applications [3], discussions with experienced consultants at Omegapoint who are knowledgeable in the field of databases, as well as analytical reasoning. While one implementation of a data model may compare differently to another implementation of the same data model, there are characteristics that are data model dependent and thus shared amongst all (or almost all) implementations. For example, MongoDB and Couchbase are two implementations of the document data model. They share some characteristics and features and differ in others: both model one-to-many relationships well, but Couchbase does not support transactions while MongoDB does. In this case, the document data model is considered to support transactions since many implementations do, and the same logic is applied to other similar scenarios.

The most central and important criteria for the created Analytical Hierarchy Process model were identified as:

• the ability to efficiently handle many-to-many and one-to-many relationships

• how flexible the structure of the data model is

• whether transactions are supported or not

• write (insert and update) performance

• read performance

• the ability to increase the performance by scaling

When assessing some of these criteria, it is impossible to assess the abstract data model's quality. In these cases, a real implementation of that data model has been assessed to represent the data model.


The classical properties of Atomicity, Consistency, Isolation, and Durability (ACID) have not been taken into consideration, because in reality ACID's most important implication boils down to whether a data model supports transactions or not, which is already considered. The trade-off between Consistency, Availability and Partition tolerance (CAP) has not been taken into account since almost all data models can support these. No data model can support all three fully but, to simplify, an implementation must choose two out of three. The most important real-world implication of this is how well a data model supports horizontal scaling for better performance, that is, how well it scales over multiple servers.

Performance criterion

One of the most central aspects of choosing a database is the performance.

This study identified three main characteristics regarding performance: write (insert and update) speed, read speed and the ability to scale up the performance of the database to support more intensive loads.

Many applications have the need for both writing and reading data, but depending on the application one might be more important than the other.

An example where reading is more important than writing would be a blog where few people post updates that thousands read. Hence, these applications have a greater need to be able to read data efficiently since most actions are reads. There are also systems which primarily collect data, in which case writes are much more common than the occasional read. Examples of this could be systems that collect data from sensors or security systems that collect huge amounts of data all the time and only rarely access data on demand.

The last identified criterion is performance scaling. Some applications will have a large number of users that interact with them or a huge number of sensors that send data to them. This poses its own challenge since vertical scalability (upgrading or adding hardware to a server) has a lower ceiling, where either it can not support higher loads or it becomes too expensive to support a high load [2]. When assessing scaling, having better support for creating and scaling clusters (called horizontal scalability) is considered better. This is because it is cheaper and can be done on demand without downtime.

Data model criterion

The data model is the fundamental way in which the data is structured and interacted with, in the database. This study identified three main data model characteristics: how they handle many-to-many relationships, one-to-many relationships and the flexibility of the data model's structure.

Depending on what the application is designed to do, the data model needs to handle different types of relationships. This is why many-to-many and one-to-many relationships were chosen as two of the criteria.

The flexibility of structure is the last subcriterion of the data model criterion. An application might not need any flexibility regarding its structure because of the nature of the project, and following a strict schema when writing data is then a non-issue. However, in some instances one might want more freedom with the structure of the data, either because the project itself has volatile requirements and the functionality needs to change often, or because of the very nature of the project if it needs to store heterogeneous data. If this is the case, a schema-less data model is beneficial. Schema-less means that the data does not need to abide by a certain structure at the moment of writing. In almost all cases, however, the data will still have an implicit schema when the data is read, because one expects certain data to be read when data is fetched from the database.

Transactions criterion

Transactions were chosen since some applications have a critical need to have absolute consistency. A good example of this is banking applications where it is unacceptable to let a user spend the entirety of their account balance twice because of a race condition. Such a race condition could let two threads read the same balance, and then add it to two other accounts before finally subtracting it from the user’s account. The data models will be rated in a binary way: whether they support transactions or not.
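As a minimal sketch of how a transaction guards against the race condition described above, the snippet below wraps the read-check-write sequence so it either commits as a whole or not at all; the database client API (beginTransaction, commit, rollback) is hypothetical.

```typescript
// Hypothetical client interfaces, only to make the sketch self-contained.
interface Tx {
  readBalance(accountId: string): Promise<number>;
  writeBalance(accountId: string, balance: number): Promise<void>;
  commit(): Promise<void>;
  rollback(): Promise<void>;
}
interface Db {
  beginTransaction(): Promise<Tx>;
}

// The read-check-write runs atomically: two concurrent withdrawals cannot
// both read the same balance and both succeed.
async function withdraw(db: Db, accountId: string, amount: number): Promise<void> {
  const tx = await db.beginTransaction();
  try {
    const balance = await tx.readBalance(accountId);
    if (balance < amount) {
      throw new Error("insufficient funds");
    }
    await tx.writeBalance(accountId, balance - amount);
    await tx.commit(); // the changes become visible together, or not at all
  } catch (err) {
    await tx.rollback();
    throw err;
  }
}
```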

4.3 Creation of the decision method hierarchy

Analytical Hierarchy Process requires a certain hierarchical structure to be created in order to work. This structure consists of criteria that are reasonably comparable to each other; these are inserted at the same level in the hierarchy. Criteria that share the same parent are compared to each other during the assessments.

The goal, level 0, of the decision was to choose a data model that suits a specific use-case. The next level, level 1, contained the criteria Data model, Transactions and Performance. Level 2 contained the subcriteria for Data model and Performance. The subcriteria for Data model are Many-to-many, Flexibility of structure, and One-to-many. The subcriteria for Performance are Read, Write and Scalability. The alternatives, at the bottom of the hierarchy, are the five data models. These are connected to the subcriteria on level 2 and to Transactions, see figure 4.1.

Figure 4.1: A view of the constructed hierarchy.
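For reference, the constructed hierarchy can also be written down as a small data structure; a minimal sketch matching figure 4.1 follows, with type names that are our own and not part of the Analytical Hierarchy Process itself.

```typescript
// The decision hierarchy: goal at level 0, criteria at level 1,
// subcriteria at level 2, and the data model alternatives at the bottom.
interface Criterion {
  name: string;
  subcriteria?: Criterion[];
}

const goal = "Choose a data model that suits a specific use-case";

const criteria: Criterion[] = [
  {
    name: "Data model",
    subcriteria: [
      { name: "Many-to-many" },
      { name: "One-to-many" },
      { name: "Flexibility of structure" },
    ],
  },
  { name: "Transactions" },
  {
    name: "Performance",
    subcriteria: [{ name: "Read" }, { name: "Write" }, { name: "Scalability" }],
  },
];

const alternatives = ["Relational", "Document", "Key-value", "Wide-column", "Graph"];
```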

4.4 Evaluation of criteria in relation to data models

For the Analytical Hierarchy Process to work, there must be a foundation of assessments where each alternative is rated in relation to the other alternatives in regards to each criterion. Multiple mathematical matrices of these assessments were created, one for each criterion, so that the mathematical operations may be carried out at the final stage. For a simpler concatenated version of this, see table 4.1; for the full matrices, see appendix A, tables A1-A7.

The assessments for each data model, in relation to the other data models, will be explained in their own section below. Transactions, however, is a binary criterion which a data model either supports or does not support, hence it will not be explained further for each data model.

Table 4.1: Matrix of the evaluation of the data models.

Data model     One-to-many    Many-to-many   Flexibility   Transactions   Write    Read     Scalability
Document       High           Low            High          High           High     High     High
Relational     Medium-High    Medium         Low-Medium    High           Medium   High     Low
Key-value      Low-Medium     Low            High          High           High     High     High
Wide-column    Low            Low            High          -              High     Medium   High
Graph          High           High           High          High           Low      Medium   Medium-High

Relational

The relational data model is adequate at modeling one-to-many relationships. It can, however, require an additional table for establishing relationships between the entities depending on the domain model. This creation of extra tables is a drawback to the relational way of modeling such relationships. The relational data model can also query and aggregate data based on these relationships. However, performing too many JOIN operations may slow down queries and these queries are not as easy to write as with other data models that model one-to-many relationships more naturally.

The relational data model is able to model many-to-many relationships, but as with one-to-many relationships, it will always need an additional table for keeping track of the relationships, which makes many-to-many relationships slightly more complex. It can also query the data based on this, but not as easily or efficiently as the graph data model when the queries become more complex.

The relational data model is not that flexible when it comes to the data structure [2]. It has a strict schema that must be obeyed when writing data, making it abide by a schema on write. In order to make changes in the data structure, special queries must be executed, transforming the structure for all entries in affected tables. For some relational database implementations, the entire database is copied to a new database with the desired structure leading to performance hits while restructuring it.

Databases that are built on the relational data model performed mediocrely when writing data and among the best when reading data [7].

Relational databases are designed to run on one server, which means that they are not designed to be horizontally scalable. This is because relational databases have to maintain their relationships and consistency, which is difficult to preserve when the database is on different servers [2].


Document

The document data model is stellar when it comes to modeling one-to-many relationships. In the case of modeling entity-properties, it is very good, and in the case of modeling entity-entities, the document data model is still good but might require duplication of data. The document data model can also query and aggregate data based on these relationships.

The document data model is realistically unable to model many-to-many relationships.

The document data model is very flexible in its structure since it does not have schema on write [2]. One can save any field and value (though limited by the supported values) in a document. Furthermore, documents in the same collection do not have to have the same structure.

Document databases perform very well in regards to both read and write [7].

Document databases are very scalable since they support horizontal scalability. In a document data cluster, different collections can reside on different servers. Each of these servers may read and write data, and additional read-slaves (servers that copy data from the main server, and only assist with read requests) can be added to each "main" server in the cluster, further enhancing read performance [2].

Wide-column

The wide-column data model is good at modeling some kinds of one-to-many relationships, as it is able to model "one entity has many properties" scenarios. However, the greatest weakness is its inability to model one-to-many when the "many" are many of the same, such as an array or when it is an entire entity. The wide-column data model can use these relationships to perform queries but is still limited by its inability to model all kinds of one-to-many relationships.

The wide-column data model can not model many-to-many relationships.

Wide-column is another flexible data model when it comes to the structure, where the rows do not have to contain the same fields, even though they are of the same column family [2].

Wide-column databases performed well in one benchmark [7], mediocrely in another [12] in regards to write performance and are considered very good at writing [13]. In regards to read speed, wide-column databases are not among the fastest but neither among the slowest [7].

Wide-column databases have great scalability since they support horizontal scalability, both because of the simplicity of adding extra nodes and because of the linear scaling that comes with it [2]. Each server may also read and write data independently of the other nodes, adding to the scalability.

Key-value

There are no limits as to what can be modeled by the key-value data model.

Since the data may be stored as a blob (which could be a serialized object, a JSON document, an XML document or anything at all) and parsed on the client, the key-value data model is excellent for modeling one-to-many relationships. It is, however, unable to know about any relationships, which means that it can not be queried with respect to these relationships; that has to be done manually on the client.

Key-value based databases might be able to store many-to-many relationships if a complex data structure modeling it is serialized and saved as a blob. It makes little sense to try to save large or complex data structures this way since you can not manipulate them without parsing them on the client.

This leads to even more manual work to be able to use these relationships on the client than one-to-many relationships do.

The key-value data model is the most flexible of all data models since it does not even have to adhere to any standard data-types or structure when saving data. If changes are made and one wants to update the content of a specific key, the current value is overwritten with any value, independently of the current or new structure or data type.

Much because of how simple key-value databases are, they are quick in regards to both reading and writing. They do not need to take any relationships into consideration and have no secondary indices to update [14].

The values in a key-value data model are blobs of data, which means that the database itself does not know what the content is. Because of this lack of relationships to track, key-value databases are great in terms of scalability.

The data is often partitioned by the first character of the hash of the key, which leads to equal distribution of data amongst the servers [15].
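A minimal sketch of this kind of hash-based partitioning is shown below; the hash function, the bucketing rule and the server names are arbitrary illustrations rather than how any specific key-value database does it.

```typescript
import { createHash } from "crypto";

// Pick a server for a key by hashing the key and bucketing on the first hex
// character, which spreads keys roughly evenly across the servers.
function serverForKey(key: string, servers: string[]): string {
  const firstHexChar = createHash("sha1").update(key).digest("hex")[0];
  const bucket = parseInt(firstHexChar, 16) % servers.length;
  return servers[bucket];
}

const servers = ["node-a", "node-b", "node-c", "node-d"];
console.log(serverForKey("session:42", servers)); // deterministic server choice
```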

Graph

There is one problem with graph databases in regard to assessing them in relation to the other data models. The graph data model is very different compared to the other data models and is typically used in completely different use-cases. Because of this, it is difficult to find any work comparing the graph data model to other data models, especially in regards to performance.

The graph data model’s greatest strengths are the ability to model all kinds of relationships, including one-to-many. It has a rich query language

(29)

4.5. SUMMARY OF THE DECISION METHOD 23

and is able to use the relationships to traverse the graph and find what is sought.

The modeling of complex many-to-many relationships might be the biggest single reason to choose a graph database. It handles many-to-many relationships best of all data models and has a query language that makes it simple to query. Realistically it is the only data model that can efficiently find analytical patterns in highly interconnected data [16] [2].

The graph data model is schema-less which means that it is very flexible when it comes to the structure [2]. The edges and nodes can contain any fields and any amount of fields.

Regarding write performance, graph databases perform worse than relational databases and are slightly slower than relational databases when it comes to reading [17].

Graph databases are able to be partitioned to achieve some scalability, but each server has to contain the entire database in memory, which means that it is difficult to scale large graph databases. Additional nodes may be added to the cluster, but they only enhance read performance since writing is still done by a master server [18].

4.5 Summary of the decision method

After several iterations of improving the method, by changing which criteria were present, changing how the hierarchy was structured, and reassessing the evaluations of the data models in regards to each criterion, a final version of the decision method emerged. This decision method allows for simple usage by the decision-maker, since the data model assessments are already made. See figure 4.2 for the process of developing the final decision method.


Figure 4.2: Flow chart of the decision method usage


Chapter 5

Evaluation of the decision method

In order to evaluate the reliability of the decision method, it was realized as an artifact in the form of a web application. This was done because it simplifies the logistics of the case study, and because the mathematical operations required are quickly and automatically done by computers.

5.1 Implementation of the decision method

The web application had a simple back-end which only performed the mathematical calculations and returned the output to the front-end. The front-end was kept simple and only displayed the two criteria to compare and a slider to determine how much more important one criterion is than another, see figure 5.1. Sliding the slider to the left means that the user prefers the left quality over the right, and vice versa. When the user is done with the assessments of the criteria, the result is presented as in figure 5.2.

The source code for the application is available at https://github.com/TIDAB-thesis/ahp-app.
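How the slider positions are mapped to pairwise-comparison values is not described in detail in this report, so the sketch below is an assumption based on Saaty's 1-9 scale; it only illustrates the kind of mapping the back-end needs before it can build the comparison matrices.

```typescript
// Map a slider position in [-4, 4] to an AHP pairwise comparison value.
// 0 means the two criteria are equally important; -4/+4 mean the left/right
// criterion is extremely more important. Intermediate positions map to 3, 5, 7.
function sliderToComparison(position: number): { left: number; right: number } {
  const saaty = [1, 3, 5, 7, 9];
  const magnitude = saaty[Math.min(Math.abs(Math.round(position)), 4)];
  if (position < 0) {
    return { left: magnitude, right: 1 / magnitude }; // left criterion preferred
  }
  return { left: 1 / magnitude, right: magnitude };   // right criterion preferred
}

// Example: slider two steps to the left -> left criterion strongly preferred.
console.log(sliderToComparison(-2)); // { left: 5, right: 0.2 }
```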

Figure 5.1: A picture of the web application.


Figure 5.2: A picture of how the web application presents the result.

5.2 Case study

The case study was done in the form of a survey where Omegapoint's consultants used the artifact and were asked questions about it. The requirements for being a part of the survey were that the respondent must have been a part of a project that had used a database, know what type of database was used and know about the technical requirements for a use-case within the project. During the survey, the respondents used the web application, which prompted them to prioritize the identified criteria for the use-case against each other. Upon completion of the web application, the results were saved and the respondents were asked a few questions. These questions were:

• What type of database (which data model) was used in the actual use-case?

• Do you know why this model was chosen?

• How familiar are you with (each of these had the possible responses of good/mediocre/bad):

– Relational databases
– Document databases
– Key-value databases
– Wide-column databases
– Graph databases


• Do you consider the results reasonable?

The questions were asked in Swedish, which means these questions are not literal translations or representations of the actual questions.

The majority of the surveys were done at Omegapoint's monthly competence day and the remaining few at other times at the office. All survey respondents were invited to participate at random and participation was voluntary. A total of 27 participants completed the survey.

The consistency ratio index for the answers was noted. This index gives an indication of how much the respondents contradict themselves with their answers. A higher index means that they contradict themselves to a larger extent, and the recommended maximum value of this index is 0.1 [19]. Answers with a consistency ratio index over 0.3 were discarded, even though the recommended maximum value is 0.1. The reason for this is that none of the answers were under the 0.1 threshold. The value 0.3 was identified as a breaking point where many results were kept, while still discarding the most inconsistent answers. Eight answers were discarded due to them having a consistency ratio index higher than 0.3, hence all data that follows will only include the 19 answers with a consistency ratio index lower than 0.3. One could have repeated each survey until an acceptable consistency ratio index under 0.1 was achieved. However, it was deemed infeasible to do so, due to respondents potentially losing interest as well as the amount of time it would take.

During the case study, the respondents gave feedback about the process.

Multiple respondents thought that the process was helpful because they were forced to think carefully about what is important for their use-case. The respondents also said that they wished they had gone through a similar process before they had decided on the data model for a current or past use-case. A common critique was that some respondents thought that some criteria were difficult to compare to each other.

5.3 Reliability of the decision method

In order to learn how suitable the decision method's recommendations are, the reliability of the method had to be determined. The determined reliability was supported by how reasonable the respondents found their results to be. This data was gathered by asking every respondent how reasonable the result was after they used the artifact.

There were quite a few "I don't know" answers, see figure 5.3, which do not provide any useful feedback.

Figure 5.3: Result of all respondents' assessment of the decision method.

It was found that when only keeping answers where the respondents had knowledge about the data model chosen for the use-case, as well as about one of the top two recommended data models from the result, half of the "I don't know" answers vanished. This led to an even larger majority of respondents who found the result to be reasonable, see figure 5.4. This means that when looking at this subset, 11 out of 13, or about 85%, had a positive assessment of the results.

A 95% confidence interval for the proportion of respondents who found the output to be reasonable was calculated using the Adjusted Wald method.

This method is considered good when one has a sample size of less than 150 [20]. The confidence interval is [0.57, 0.97], which means that with 95% certainty at least 57%, and up to a 97% majority, agrees that this artifact, and by extension the decision method, gives a reasonable result.
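As a minimal sketch, the Adjusted Wald calculation below reproduces this interval from the 11 positive answers out of 13 (z = 1.96 for 95% confidence); the function is our own illustration of the method described in [20].

```typescript
// Adjusted Wald confidence interval for a proportion:
// add z^2/2 successes and z^2 trials, then apply the normal approximation.
function adjustedWald(successes: number, n: number, z = 1.96): [number, number] {
  const nAdj = n + z * z;
  const pAdj = (successes + (z * z) / 2) / nAdj;
  const margin = z * Math.sqrt((pAdj * (1 - pAdj)) / nAdj);
  return [Math.max(0, pAdj - margin), Math.min(1, pAdj + margin)];
}

console.log(adjustedWald(11, 13)); // approximately [0.57, 0.97], as reported above
```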

When looking at what data models the artifact presented to the respondents, it seems to strongly favor the document and graph data models, see figure 5.5, with some diversity when looking at which data models were ranked second in the result.

Since the relational data model is almost never recommended, it is interesting to look at which data models were used in the real-world use-cases that the respondents had participated in. As seen in figure 5.6, a majority of the respondents reported the relational data model being used in the use-case they had been a part of.

The relational model was almost never ranked among the top data models in the artifact's results, yet the relational data model is the most common one in terms of usage. Among the cases where the respondents did not get their chosen data model ranked first or second, 90% had used the relational data model for the real-world use-case.


Figure 5.4: Respondents' assessment of the results, where the respondent has knowledge about the data model chosen for the project and about one of the top two ranked data models.

Figure 5.5: The frequency of each data model per rank.


Figure 5.6: The data models which the respondents used in their chosen use-case.


Chapter 6 Summary

6.1 Method discussion

Research method

The research method is generic, and there are many more rigid research methods that would probably have worked just as well. The chosen research method is similar to Polya's research method, and there is nothing in the nature of the problem that suggests that a specialized research method is required.

Hence, we think that our research method worked for proposing and deciding upon a multiple-criteria decision making method.

After doing our research and choosing a multiple-criteria decision making method, we are confident that our method works for assisting in the decision between multiple alternatives. However, if other investigators were to conduct the same research we did, they might come up with another conclusion regarding which method to use to answer the same question, because there are many alternatives to choose from.

The decision method was iteratively improved by repeatedly testing its reliability and adjusting the criteria, hierarchy and data model evaluations, before the final, more extensive, evaluation. A risk with this is that the method could have ended up taking all five data models into account but having unreliable recommendations, which is not very useful. One way of improving the iterative method of refining the decision method could have been to start with fewer data models, and only add new data models once the reliability is considered good. In this way, if the time runs out sooner than expected, the resulting method would still be reliable when choosing between the considered data models.



Case study

Given what we had access to, a case study was a good method of validating the decision method. It was easy to conduct, and while we had to discard some answers, a case study can make use of the empirical data we had access to. A questionable aspect is that we rely on the developers’ assessments and opinions of the artifact’s result. Bias, pride, misunderstanding, and ignorance may affect their assessments and how they perceive the result. A better approach might have been to ask database experts, who also have good insight into some use-cases’ technical requirements, to evaluate the decision method. This could have eliminated most of the ignorance aspect, but the other weaknesses of a subjective assessment would still persist.

However, one drawback of our execution of the case study as a validation method is that almost 30% of the answers were discarded due to being too inconsistent–even though we kept answers that are more inconsistent than the Analytical Hierarchy Process recommends. This is a natural drawback of assessing many criteria in relation to each other, but for the most part it is a constraint of how the case study was conducted. The case study had its limitations, as we did not have the time to ask the respondents to re-try the artifact in order to achieve a good enough consistency ratio. Had we used another format for the case study, allocating more time for the respondents to use, and perhaps re-do, the artifact, there is a chance that more answers could have been kept.
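For context, the consistency screening that caused answers to be discarded follows Saaty’s standard procedure: the principal eigenvalue of a pairwise comparison matrix yields a consistency index, which is divided by a random index to give the consistency ratio. The sketch below shows this conventional calculation in Python; it is an assumption that the artifact computes the ratio in exactly this way, and the example matrix is made up.

```python
import numpy as np

# Saaty's random index (RI) for matrix sizes 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def consistency_ratio(pairwise: np.ndarray) -> float:
    """Consistency ratio (CR) of a reciprocal pairwise comparison matrix."""
    n = pairwise.shape[0]
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # lambda_max is the principal (largest real) eigenvalue of the matrix.
    lambda_max = max(np.linalg.eigvals(pairwise).real)
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]

# Hypothetical judgments for three criteria on Saaty's 1-9 scale.
judgments = np.array([[1.0, 3.0, 5.0],
                      [1/3, 1.0, 2.0],
                      [1/5, 1/2, 1.0]])
print(f"CR = {consistency_ratio(judgments):.3f}")  # well below the usual 0.1 threshold
```

With the commonly used threshold of 0.1, a respondent whose matrix produces a higher ratio would ideally be asked to revisit their judgments; since the case study did not allow time for such a re-run, the inconsistent answers had to be discarded instead.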

Due to the nature of case studies, the reliability is not very strong, especially not with a sample size as small as in this study. If another person were to repeat this survey with the same number of respondents, they might get different answers and a different perceived reliability among respondents. A larger sample size would be required in order to draw any general statistical conclusions about the reliability of the decision method, and it would also make the study itself more reliable.

In hindsight, there are two additional questions that the participants should have been asked. One is how well they knew the project requirements, to see if more knowledge of the requirements yields more reliable answers. They should also have been asked whether they thought that the use-case currently uses a fitting data model, in addition to the question about what they thought of the results. This could have helped the assessment of the decision method’s reliability.



6.2 Results discussion

Designing the decision method

The Analytical Hierarchy Process was chosen as the methodical foundation for the decision method. It is a conventional multiple-criteria decision making method, but there is nothing to support that it is the best method specifically for choosing a data model for a database. We can think of multiple approaches that could be applicable, such as other multiple-criteria decision making methods, AI-driven algorithms, some type of “summary” cheat-sheet where developers could educate themselves about the data models in a shorter amount of time and make their own decisions, flow-charts, etc.
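To make the comparison with these alternatives concrete, the core of the AHP-based approach can be summarized in a few lines: criteria weights are derived from pairwise judgments and then combined with per-criterion scores for the five data models into a single ranking. The sketch below uses the geometric-mean approximation of the priority vector and entirely made-up numbers; the criteria judgments and all scores are illustrative assumptions, not the actual hierarchy or assessments used in the decision method.

```python
import numpy as np

def priority_vector(pairwise: np.ndarray) -> np.ndarray:
    """Approximate AHP priorities using the geometric-mean method."""
    geometric_means = np.prod(pairwise, axis=1) ** (1.0 / pairwise.shape[0])
    return geometric_means / geometric_means.sum()

# Hypothetical pairwise judgments between three criteria.
criteria_weights = priority_vector(np.array([[1.0, 2.0, 4.0],
                                             [0.5, 1.0, 2.0],
                                             [0.25, 0.5, 1.0]]))

# Hypothetical per-criterion scores (rows: data models, columns: criteria),
# where each column is normalized to sum to 1.
data_models = ["relational", "document", "key-value", "column-family", "graph"]
scores = np.array([[0.10, 0.35, 0.30],
                   [0.25, 0.20, 0.25],
                   [0.30, 0.10, 0.05],
                   [0.25, 0.15, 0.10],
                   [0.10, 0.20, 0.30]])

# Synthesize: weight each data model's scores by the criteria weights and rank.
ranking = sorted(zip(data_models, scores @ criteria_weights),
                 key=lambda pair: pair[1], reverse=True)
for model, weight in ranking:
    print(f"{model:15s} {weight:.3f}")
```

The highest-ranked data models in such an output are the ones a developer would then research further, which is the mechanism the decision method relies on to speed up the choice.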

The reliability of the decision method’s structure is not very high. Other investigators may come up with different hierarchical structures and criteria to define the same choice. The chosen criteria are, while chosen on qualified grounds, subjective, since there is no set of criteria that is universally deemed good for this specific choice. Other investigators may also assess the data models differently than we did, which may affect the outcome of the work.

Evaluation of the decision method

When looking at which data model was chosen in the use-cases, the relational data model dominates with 52%. However, it is almost never among the top three in the artifact’s result. Moreover, in 90% of the cases where the used data model was not ranked first or second, the used data model was relational. This suggests that the decision method is probably biased against the relational data model. The decision method does not consider “variable queries”, nor does it take into account the other kind of flexibility one might need: flexibility of design. An aggregate database is not flexible in terms of design, and its performance and complexity may be abysmal if the data is not modeled to match the access patterns. This is a problem that relational databases do not have, since their normalized modeling allows data to be accessed from all angles without any performance hit. To address this, we found that Ad-hoc queries and Flexibility of design could be added to the decision method. These criteria would have given the relational data model an advantage over the NoSQL models, which in turn could eliminate the bias against the relational data model.

A problem that was encountered was that it was not easy to compare the data models to each other purely in theory. Hence, this study partly relies on comparisons made with popular vendor implementations of the data models. This was pragmatic, since a developer tasked with choosing a data model for a database will most likely choose an existing product and, by extension, a popular alternative. However, this is not always true; sometimes a less popular database will be selected, and such databases were not represented to the same extent in this study.

Due to the small sample size, it was not possible to draw a general statistical conclusion about the decision method’s reliability. The confidence interval [0.57, 0.97] was too wide to conclude anything about the certainty of the measured reliability. What can be said, however, is that among the consultants who participated, the perceived reliability was good: 11 out of 13 respondents, about 85%, found it reliable. If, for example, 200 usable survey responses had been acquired and 160 of the respondents (80%) had found the reliability good, the 95% confidence interval would have been [0.74, 0.85], which is a smaller interval and statistically more certain.

Since the sample size is too small to draw statistical conclusions, we conclude that it is currently unknown exactly how suitable the recommendations from the decision method are. However, among our respondents, it seems to produce suitable recommendations. The decision method should also speed up the decision process, since it allows the developers to focus their research on the top two or three data models and skip the lowest-rated ones–which should be considerably faster than researching all five. This work only shows that the decision could be made quicker for the respondents in the case study, since the reliability of the method is only asserted for this group.

If the decision method were shown to be statistically reliable, the decision could be said to be quicker in general as well.

Even though the decision method’s reliability has not been statistically established, multiple respondents gave positive feedback on the process of completing the artifact.

They reported gaining further insight into their use-case and its technical requirements since they were forced to weigh different criteria against each other and think about it more carefully than they had previously done. Even though the results might not be entirely reliable, this is a good side-effect and was appreciated by everyone who brought it up.

The most common negative feedback received was that some criteria were hard to weigh against each other, as they did not seem directly comparable. This is most likely due to the hierarchical design of the criteria and could probably be avoided. However, using the criteria we did identify, we think that they are placed as well as they can be in the hierarchy. In order to make each choice fully comparable, we would have to remove or change some criteria that we think are important to the choice–which is why we opted to keep them at the cost of having somewhat incompatible comparisons at times.

6.3 Conclusion

To answer the main research question–Can a decision method be created to help enhance the choice of data model given a use-case’s technical requirements?–we found that it is possible to create a decision method that enhances the decision of which data model to use for a specific use-case. The decision method makes the choice quicker by eliminating the need for thorough research on all five data models, since research can be focused on the top presented data models. The suitability of the recommendations cannot be statistically proven to be reliable due to the small sample size of survey respondents–but among the limited case study group the decision method is perceived to be reliable, as 11 out of 13 consultants found the results reasonable.

A benefit of the decision method is that many respondents said that they found the process of prioritizing the criteria against each other enlightening and valuable. They thought that the compulsory prioritization made them think carefully about the criteria, and that going through such a process before deciding on a data model for a database would be beneficial for their use-case. Even if it were shown that the developed decision method could not provide reliable suggestions, the process itself, ignoring the results, may be a valuable tool for developers when deciding on a data model.

In total, 27 respondents participated in the case study to validate the decision method, but 14 answers were discarded–some because the answers were too inconsistent, and some because the respondents lacked knowledge about what they were assessing. We think that a better choice of validation method could have been made to circumvent the high number of discarded answers.

6.4 Future works

Since we were not able to draw any general statistical conclusions regarding the reliability of the method, due to the small sample size in our case study, a future work could be to repeat the survey with more respondents and see if the reliability of the decision method can be statistically asserted.

Because of the difficulty of assessing the performance criteria, due to a lack of articles or papers where the authors had performed performance tests on all five data models, a future work could be to conduct such performance tests with all five data models and see if any performance assessments should be adjusted. If the assessments proved reliable, a follow-up work could be to evaluate the decision method with the adjusted assessments in mind.

As mentioned in the results discussion, we think that the decision method is biased against relational databases, and that adding the criteria Ad-hoc queries and Flexibility of design to the decision method’s hierarchy would probably make it more reliable. A future work could be to add these criteria, or other criteria, to the method and evaluate it again to see if the reliability is enhanced or the bias against the relational data model is eliminated.


References

[1] Solid IT. DB-Engines Ranking. url: https://db-engines.com/en/ranking. (accessed: 2019-11-11).

[2] Pramod J. Sadalage and Martin Fowler. NoSQL Distilled. Pearson Education Inc, 2012. isbn: 0321826620.

[3] Martin Kleppmann. Designing Data-Intensive Applications. O’Reilly Media, Inc., 2017. isbn: 9781449373320.

[4] MongoDB. Data Modeling Introduction. url: https://docs.mongodb.com/manual/core/data-modeling-introduction/. (accessed: 2019-11-12).

[5] Thomas L. Saaty. “How to Make a Decision: The Analytic Hierarchy Process”. In: INFORMS Journal on Applied Analytics 24.6 (1994), pp. 19–43. doi: 10.1287/inte.24.6.19. url: https://doi.org/10.1287/inte.24.6.19.

[6] Konrad Fraczek and Malgorzata Plechawska-Wojcik. “SQL Versus NoSQL Movement with Big Data Analytics”. In: International Journal of Information Technology and Computer Science (IJITCS) 8 (2016), pp. 59–66. doi: 10.5815/ijitcs.2016.12.07.

[7] Yishan Li and Sathiamoorthy Manoharan. “A performance comparison of SQL and NoSQL databases”. In: 2013 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) (2013), pp. 15–19. doi: 10.1109/PACRIM.2013.6625441.

[8] Polya’s Problem Solving Techniques. 2019. url: https://math.berkeley.edu/~gmelvin/polya.pdf. (accessed: 2019-01-08).

[9] The Triple Constraint in Project Management: Time, Scope & Cost. 2018. url: https://www.projectmanager.com/blog/triple-constraint-project-management-time-scope-cost. (accessed: 2020-02-04).

