
PERFORMANCE COMPARISON

Between GraphQL, REST & SOAP

Bachelor Degree Project in Informatics, Level ECTS
Spring term 2020

Pontus Erlandsson
Joakim Remes

Supervisor: Niclas Ståhl
Examiner: Yacine Atif


Abstract

Modern applications commonly make use of several subsystems, usually a frontend and a backend. The communication link between these subsystems is commonly an API. Different APIs such as REST and SOAP have been around for a long time, but with the increasing use of the internet, newer techniques such as GraphQL have been developed to address the shortcomings of the older ones. The aim of this thesis is to measure the performance of GraphQL, how it compares to SOAP and REST, and how the overhead reduction of GraphQL affects performance. The main method used to evaluate the performance differences between GraphQL, REST and SOAP is an experiment. The results show that GraphQL has the worst performance in all test cases. GraphQL has the lowest packet size of all three APIs when only a few fields are fetched; however, the packet size increases rapidly when multiple fields are requested.

Keywords: API, GraphQL, REST, SOAP, Performance, Packet size


Acknowledgement

We would like to thank Bricknode AB and Untie Group AB for lending us their facilities and knowledge to support the creation of this study. We would also like to thank our supervisor, Niclas Ståhl, at the University of Skövde for his guidance and support during the completion of this thesis.


Table of Contents

1 Introduction

2 Background
  2.1 JSON
  2.2 XML
  2.3 API
    2.3.1 REST
    2.3.2 SOAP
    2.3.3 GraphQL
  2.4 SQL-database
    2.4.1 MySQL/MariaDB
  2.5 NoSQL-database
    2.5.1 MongoDB

3 Related work

4 Problem definition
  4.1 Motivation
  4.2 Aim
  4.3 Research questions
  4.4 Hypotheses
  4.5 Objectives
  4.6 Delimitation
  4.7 Method
    4.7.1 Ethics
  4.8 Threats to validity
    4.8.1 Low statistical power
    4.8.2 Reliability of measures
    4.8.3 Interaction of selection and treatment
    4.8.4 Interaction of setting and treatment

5 Implementation
  5.1 Setting up databases
    5.1.1 Database schema
    5.1.2 MySQL
    5.1.3 MongoDB
  5.2 Generating data
  5.3 Implementing APIs
    5.3.1 SOAP
    5.3.2 REST
    5.3.3 GraphQL
  5.4 Implementing backend
  5.5 Test Environment
    5.5.1 Server
    5.5.2 Client
  5.6 Test Design
  5.7 Analysis
    5.7.1 Result analysis
    5.7.2 Statistical analysis

6 Evaluation
  6.1 Results
  6.2 Analysis
    6.2.1 Performance analysis
    6.2.2 Packet size analysis
    6.2.3 Database analysis
    6.2.4 Statistical analysis

7 Concluding discussion
  7.1 Conclusions
    7.1.1 RQ1: To what extent do the characteristics of data affect the performance of GraphQL, SOAP and REST APIs?
    7.1.2 RQ2: To what extent does GraphQL's overhead reduction affect its performance compared to SOAP and REST?
    7.1.3 RQ3: To what extent does a relational vs. non-relational database affect the performance of GraphQL, SOAP and REST APIs?
  7.2 Discussion
    7.2.1 Related works
  7.3 Future work

Bibliography


1 Introduction

Modern applications today commonly consist of several subsystems that together constitute the complete system. Each subsystem can run on a different machine, and the subsystems therefore need an efficient way to communicate with each other over networks. The concept of the API (Application Programming Interface) was introduced to provide a communication link between the different parts of a system.

Many different API techniques have been introduced over the years, but the most used are the REST (Representational State Transfer) architecture and the SOAP (Simple Object Access Protocol) protocol, which are comparatively old (Santos 2017). SOAP was the most used API solution for requesting resources over the internet before REST gained popularity in 2005¹.

For a long time, REST has had a firm foothold as the API solution that many companies and developers use, but in recent years more modern approaches to requesting data have been developed. GraphQL is a query language developed by Facebook in 2012 and released as open source in 2015. GraphQL provides a flexible way for clients to request exactly the data they need. The ability to choose which data is returned is not possible in REST and SOAP APIs, because the data they provide is determined in advance (Freeman 2019).

The aim of this thesis is to provide insight into how the performance of GraphQL compares to that of the other two techniques, and how the overhead reduction of GraphQL affects performance.

For this thesis, an experiment has been conducted. The experiment setup consists of clients requesting data from the three different APIs. The response time is measured from when a request is sent to when a response is received. The packet sizes of both the response and the request content are measured in order to calculate the total packet size sent over the network. The gathered data is then analysed and presented.

In order to perform the experiment, the data used by the APIs is first generated. This data is then inserted into the databases. Next, the backend is implemented on top of the databases and the data models are created. The three APIs, SOAP, REST and GraphQL, are all built on the same backend to ensure fair and equal treatment. Lastly, a testing client capable of calling the APIs and logging the measured data is created. The gathered data is analysed using the average value of each characteristic from each test iteration.

¹ https://blog.restcase.com/the-rise-of-rest-api/ [2020-05-08]


2 Background

This chapter contains relevant background information for this study. The first two sections provide a brief description of the data formats JSON and XML that are used by the APIs in this study. The following sections cover APIs in general and the three APIs used in this study. The last sections contain information regarding the databases used in the experiment, the SQL database MySQL and the NoSQL database MongoDB.

2.1 JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is derived from the ECMAScript Programming Language Standard (RFC 2017) and was first specified and popularized by Douglas Crockford (MDN contributors 2019). The JSON format was designed with human readability in mind and is also easy for machines to generate and parse. One reason JSON is popular and often used by the community is that it is independent of programming language and its conventions are well known to programmers (Introducing JSON) and (Patni 2017).

Freeman (2019) states that JSON has, over the years, become the format of choice when developing web services or native mobile applications. This is because it is widely supported by many programming languages, and the format is well suited to serializing structured data and transmitting it over networks (Patni 2017) and (Goncalves 2013).

JSON mainly represents data using four primitive data types: strings, numbers, Booleans, and null. JSON also uses two structured types: objects and arrays (RFC 2017).

JSON encloses variables and values in curly braces to indicate that the value is an object. Inside the object, properties are declared as pairs of attribute names and values, separated by commas. The value of each property can be accessed by referring to the name of the property (Goncalves 2013).

An example of a JSON object can be seen below in Figure 1. The top object is a company with an attribute name. The company has one nested object, a product, and a list containing two employee objects. Each of these nested objects has a name attribute of its own. In this way, data can be structured to represent most entities.

{
  "company": {
    "name": "foo",
    "product": {
      "name": "bar"
    },
    "employee": [
      { "name": "John Doe" },
      { "name": "Mr. Smith" }
    ]
  }
}

Figure 1 JSON


2.2 XML

XML (eXtensible Markup Language) is a human-readable, text-based markup language. It is based on SGML (Standard Generalized Markup Language), developed in 1986, and was created as a solution to the complexity of SGML (Nurseitov, Paulson, Reynolds & Izurieta 2009). In 1996 the W3C (World Wide Web Consortium) started development of XML, and in 1998 it became a W3C recommendation (Goncalves 2013).

XML has become a standard for data interchange on the web. Data is identified using tags, which are surrounded by angle brackets. Tags help define the data structure and mark the beginning and end of the structure. Tags in a group or collection can be referred to as markup. There are no special or predefined tags; all tags can be created and named freely.

XML also uses attributes. Attributes are additional information belonging to a tag and are placed within its brackets. Usually attributes are used for peripheral information or to help applications with processing. For an XML document to work it must be well-formed. There are a few rules for this, but the major one is that every opening tag needs a matching closing tag. For example, <open> must be closed with the closing tag </open>. Tags can be nested in as many layers as the creator of the document wants, but for the document to be well-formed they must be completely nested: a tag cannot be nested in another tag and have the parent tag closed first. There are also XML schemas that can be used to enforce rules on the structure of an XML document. With the help of a schema, the document can be forced to contain certain elements, such as requiring all XML files using the schema to have a <company> tag with an attribute attached to it, like <company name="foo">. There are several different types of schemas, but the idea is that an XML parser will take the rules and format defined in the schema and enforce them on the document (Patni 2017).
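As an illustration of the idea (a minimal, hypothetical schema written for this description, not taken from the thesis), an XSD that forces documents to have a company element with a required name attribute could look like this:

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- every valid document must have a root <company> element -->
    <xs:element name="company">
      <xs:complexType>
        <!-- the name attribute is mandatory, as in <company name="foo"> -->
        <xs:attribute name="name" type="xs:string" use="required"/>
      </xs:complexType>
    </xs:element>
  </xs:schema>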

There are multiple reasons why XML is as widely used as it is. Every major programming language has parsers available for XML. The way data is identified with tags, and the nested hierarchy, allow flexible development and representation of data. Because of its robustness and verifiable file format, it is used both for data storage and for data transmission, on and off the web (Patni 2017).

Below, in Figure 2, the example structure from Figure 1 is represented using XML. The top element is a company with an attribute name. The company has three nested elements: one product and two employees. Each of the nested elements has a nested element of its own, name. In this way, data can be structured to represent most entities.

<company name="foo">
  <product>
    <name>bar</name>
  </product>
  <employee>
    <name>John Doe</name>
  </employee>
  <employee>
    <name>Mr. Smith</name>
  </employee>
</company>

Figure 2 XML


2.3 API

Modern applications today commonly exist as segmented systems with two or more subsystems, typically a frontend and a backend, where the frontend for example displays data to the user and the backend retrieves and stores the data. Frontend and backend commonly run on different machines and need an efficient way to communicate with each other over a network. To manage the communication between services, the concept of the API was introduced. Using an API, the frontend can communicate with the backend and perform write, update, and read operations; the backend then handles the requested operation (Biehl 2018).

The resources available through an API are accessed via URLs (Uniform Resource Locators). A URL contains the internet address of the server and a unique identifier for the resource, also referred to as an endpoint. The possible resource interactions depend upon the HTTP (Hypertext Transfer Protocol) methods, also referred to as HTTP verbs, e.g., GET, POST, or DELETE (Bermbach & Wittern 2016). These methods indicate the action to be performed on the specific resource (MDN contributors 2020). APIs let services communicate with each other without insight into the underlying implementation (Bermbach & Wittern 2016). Figure 3 below illustrates a GET request to an API endpoint.

Figure 3 GET request to an API endpoint through Postman
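In plain text, such a request is a small HTTP message (the endpoint below is hypothetical, not one of the routes used in this study):

  GET /api/company/1 HTTP/1.1
  Host: localhost
  Accept: application/json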

With the increasing use of web APIs exposing and sending data over the internet, security has become an important topic. To access an API's endpoints and their resources, some kind of client authentication is usually needed. Many different kinds of authentication methods exist today; the most common ones are OAuth, API keys, and HTTP Basic Authentication with username and password (Sandoval 2018) and (Bermbach & Wittern 2016). Two of the most common ways of setting up an API are REST and SOAP (Santos 2017).

2.3.1 REST

REST is a set of architectural principles for transmitting data over the internet. There are no constraints or enforced response types for REST, unlike SOAP (further described in 2.3.2), which is limited to XML; JSON and XML are, however, the most common (Indrasiri 2016).

Because REST is an architectural style and not a standard, there are no predefined rules to follow when implementing or using a REST API (Patni 2017). This has led to many different interpretations and implementations of REST, where some closely follow the architectural principles while others stray further away. Although not specified in the style, HTTP has become the de facto standard as the primary way to access REST (Renzel, Schlebusch & Klamma 2012).

The term REST was coined when Roy Fielding (2000) presented his doctoral dissertation in the year 2000. Fielding and his colleagues were working on standardizing the web when he was asked to defend and explain everything from abstract web interactions to fine HTTP details. In doing so, he refined his model into a core set of principles, properties, and constraints that then became REST (Readme 2016).

In a REST API, any information can become a resource if it can be named. A resource can have a varying value or a set value that never changes. It is possible for two resources to have the same value but still be identified independently. REST relies on a resource identifier, specified by the developers, to identify a resource (Fielding 2000). If an application is driven by hypertext it can be REST, regardless of protocol. The API should adhere to the communication protocol and only change things when more specification is required (Fielding 2008).

In HTTP, the four most common ways to interact with a REST API are the CRUD methods: Create, Retrieve, Update and Delete. To retrieve information from a REST API over HTTP, the GET method should be used; POST is used to create new information, PUT to update it, and DELETE to perform deletions. Using REST with HTTP, there are no limitations on the format of the response. Most commonly it is JSON or XML, but images such as JPG or PNG can also be returned; it is up to the resource to decide the available formats (Patni 2017).

There are six main guiding principles, or constraints, in REST. The first is client-server: there must be a separation of data storage and the user, which increases portability and scalability. The second is that the server should be stateless: each request received by the server must be handled independently of any other request. The third is cacheable: data in a response needs to be labelled as cacheable or not cacheable to let the client know whether it is safe to reuse the data later. The fourth is a uniform interface, which means keeping the interface general through software engineering principles. The fifth is a layered system, where each component can only see the layer it is interacting with. The last principle, code on demand, is optional and more of a feature: the response can be code, scripts, applets, or other executables, which can reduce the implementation required on the client (restfulapi).

REST is by far the most popular API type registered on ProgrammableWeb, and out of the 23 060 APIs registered with a type classification, the majority are for the web/internet (Santos 2017). One of the reasons REST is so popular is that many of its strengths are related to the web. REST follows web philosophies; has a clear separation of client and server implementations; allows many return formats; is relatively easy to implement and maintain; and lets the client know whether information is safe to cache and reuse (Patni 2017).

In Figure 4 below, a GET request using HTTP can be seen. The request sends two key-value pairs: the key company has the value foo and the key product has the value bar. The response to this request comes back with an HTTP header and a body. The content of the body can vary and be in multiple formats; one example of such a response can be seen below in Figure 5. In this case the response is JSON and contains a company object with the same value as in the request, together with the company's product with the same value as the product in the request.

localhost/REST-example/get?company=foo&product=bar

Figure 4 GET request to a REST endpoint over HTTP


{
  "company": {
    "name": "foo",
    "product": {
      "name": "bar"
    }
  }
}

Figure 5 Example of a JSON response

2.3.2 SOAP

SOAP is a lightweight protocol designed to exchange structured and typed information over networks such as distributed environments or the web. SOAP communicates using XML over HTTP and was designed to be platform independent. Because of this, SOAP APIs can be used by applications developed in different programming languages (Gudgin et al. 2007). SOAP was initially widely adopted as the standard way to implement web services, but due to the inherent complexity of the protocol (Indrasiri 2016), other API architectures such as REST were developed.

SOAP, as a protocol, brings a bit more complexity for developers, since it requires more insight into the underlying technologies, but it adds more security when data is exchanged. This is one reason why companies still consider SOAP APIs when developing applications that need a higher level of security (Wodehouse 2017). The higher level of security that SOAP provides also means more overhead in the requests (Tekli et al. 2011), resulting in larger request sizes and a higher workload.

A SOAP message is an XML document that consists of three main components: envelope, header, and body. Below, in Figure 6, an example of the SOAP message structure can be seen.

Figure 6 SOAP Message structure


Guru99² describes the SOAP message structure as follows. The Envelope encapsulates all the other components and the details of the SOAP message; the envelope element also indicates that the XML document is a SOAP message. The Header contains information regarding authentication or other references that the application needs to handle the request. By default, SOAP messages support standard types such as strings and numbers, but they can also describe and send more complex types, such as objects; if a SOAP message contains objects, a definition of each object needs to be provided in the header as well. The Body of the SOAP message contains request- and response-related information; here, the actual data which needs to be sent to the various applications can be found.
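As a minimal sketch of this structure (the GetCompany operation is a hypothetical example, not an endpoint from this study), a SOAP 1.1 request message could look like this:

  <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Header>
      <!-- authentication information or type definitions go here -->
    </soap:Header>
    <soap:Body>
      <GetCompany>
        <id>1</id>
      </GetCompany>
    </soap:Body>
  </soap:Envelope>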

WSDL

WSDL (Web Services Description Language) is an XML-based file format that informs the client of what the web service can do, without giving access to the whole implementation. WSDL provides an abstract interface between a SOAP API and a client, with guidelines on how to communicate with the web service, by providing network addresses, ports, and bindings (Goncalves 2013). The WSDL document contains definitions of endpoints and presents all the operations that can be performed by the application. WSDL documents are not mandatory and not all SOAP APIs use WSDL, but when SOAP APIs were frequently developed, WSDLs became popular because they help to quickly establish a communication link with the web service (AlexSoft 2019).

2.3.3 GraphQL

GraphQL is a query language developed by Facebook in 2012 and standardized and released as open source to the community in 2015. Based on the client-server paradigm for API development, GraphQL uses a data-centric approach that provides a flexible way for clients to request exactly the data they need (Wittern, Cha & Laredo 2018) and (Byron 2015). GraphQL also provides a way to describe the capabilities and requirements of data models (Byron 2015).

Freeman (2019) states that the downside of REST APIs, and of API solutions that preceded REST, is that the data they provide is determined in advance: the response clients receive when requesting data from an endpoint is fixed. Freeman also states that this can become an issue if or when client applications change. Clients often need to perform multiple requests to different API endpoints to collect the required data. This increases traffic to the server and can become inflexible as the needs of clients evolve and the number of applications using the service increases. It also requires the client to have knowledge about the relationships between the objects returned from the different requests.

GraphQL as a query language addresses this problem by giving the client more control over the requested data and the format of the response. The client can request specific fields of a resource instead of entire resources, which means that a GraphQL solution moves the responsibility for selecting data to the clients instead of the API developers (Byron 2015) and (Freeman 2019). Clients can change their queries to suit their needs, rather than depending on new endpoints being added to the API. This also allows API providers to add new capabilities to the data model without having to worry about breaking client-side applications (Wittern et al. 2018). A GraphQL API consists of only one endpoint handling all queries, instead of several endpoints as in a REST API implementation (Taskula 2019). GraphQL, like the other API services, works over HTTP, but since GraphQL only uses one endpoint to handle POST and GET requests, it is not possible to fully utilize all available HTTP methods (Serving over HTTP). To compensate for this, GraphQL introduces operations called query, mutation and subscription as high-level protocols that operate on top of HTTP (Vogel, Weber & Zirpins 2017).

² SOAP Web Services Tutorial: Simple Object Access Protocol EXAMPLE. https://www.guru99.com/soap-simple-object-access-protocol.html [2020-02-19]
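Concretely, a GraphQL query is typically sent to the single endpoint as an HTTP POST whose JSON body carries the query string. A sketch, assuming the conventional /graphql path (the endpoint actually used in this study is described in chapter 5):

  POST /graphql HTTP/1.1
  Host: localhost
  Content-Type: application/json

  { "query": "{ Company(id: 1) { id name } }" }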

According to Brito, Mombach & Valente (2019), some major organizations are adopting GraphQL as a solution to utilize its advantages in performance and usability. One major reason is that GraphQL queries can be defined as a request payload in a JSON-like format containing one or more operations (Farré, Varga & Almar 2019). This enables developers to fetch as much data as needed in a single request, reducing over- and under-fetching (Vogel et al. 2017). This in turn leads to fewer server round trips and smaller response packet sizes, reducing the workload on the server.

GraphQL schema

GraphQL uses a type system to describe the objects that exist and what clients can fetch from the service. A type structure defines an object and the properties and relations that belong to it. As shown in Table 1 below, the type structure uses a JSON-like format to describe the objects.

type Company {
  id: ID!
  name: String!
  product: Product
  employee: [Employee]
}

type Product {
  id: ID!
  name: String!
  description: String
}

type Employee {
  id: ID!
  name: String!
  age: Int
}

Table 1 GraphQL types

Each type has a name, with the associated properties inside curly brackets. Like the JSON format, the GraphQL schema language uses colon-separated name and value pairs to describe property names and their corresponding data types. The data identifiers are called scalar types, and GraphQL introduces a set of default types, seen below in Table 2.


ID       Unique identifier
String   Text values
Int      Signed integer values
Float    Decimal values
Boolean  True or False

Table 2 Scalar types

The scalar types resemble the commonly known data types used in other programming languages such as C# (Microsoft 2016). The scalar type ID is not one of the common data types; it is introduced to indicate a unique identifier value, such as a key. With the possibility to describe a schema specifying the supported types and operations, a GraphQL server provides an extra layer of request correctness control (Vogel et al. 2017).

GraphQL query

Queries in GraphQL are sent to the server in a JSON-like format that declares the requested data. The server then executes the query and returns the requested fields in JSON format. Below, in Table 3, an example of a GraphQL request (left column) and response (right column) can be seen. The example illustrates a request for the Company with id equal to 1 and the corresponding fields id and name.

Request:

{
  Company(id: 1) {
    id,
    name
  }
}

Response:

{
  "Company": {
    "id": 1,
    "name": "foo"
  }
}

Table 3 GraphQL request with arguments and response

As seen in the example above, GraphQL provides a simple way for clients to specify exactly the data needed and receive exactly that. GraphQL also makes it possible to specify arguments for nested fields or objects. This feature reduces the need for multiple API fetches, decreases the complexity of client code, and avoids unnecessary overhead (Introduction to GraphQL) and (Wittern et al. 2018). Below, in Table 4, a request with nested arguments can be seen.

Request:

{
  Company(id: 1) {
    id,
    name,
    Product(id: 2) {
      id,
      name
    }
  }
}

Response:

{
  "Company": {
    "id": 1,
    "name": "foo",
    "Product": {
      "id": 2,
      "name": "bar"
    }
  }
}

Table 4 GraphQL request with nested arguments and response

As shown in the tables above, GraphQL provides an efficient way to query data as well as a dynamic way to modify requests and add multiple arguments to the requested fields, but it may not be suitable for every situation. Taskula (2019) states that it can be hard to handle various edge cases. File uploading is one example for which GraphQL provides no native support, although this has lately become supported through libraries developed by the community (Au-Yeung 2019). Due to its relatively young age, GraphQL is also not as widely known and understood as other API styles such as REST or SOAP. Freeman (2019) writes that this makes it difficult to find experienced developers, testing tools and useful libraries to support a GraphQL solution. Despite these problems, Freeman states that GraphQL is a valid option that should be considered when choosing between API solutions, especially when an application is likely to need continued development after release.

2.4 SQL-database

SQL databases, or relational databases, have been around for over 30 years. They have not only a long history but also a sound theoretical foundation (Lawrence 2014). The main strength of a relational database is how it enforces data integrity and how normalization prevents orphaned data records. The downside, however, is that normalization requires the creation of multiple tables, each with its own set of primary and secondary keys. This leads to large overhead costs, and as the database grows, performance will suffer (Chickerur, Goudar & Kinnerkar 2015). According to Chand³, the four most popular relational databases are Oracle, MySQL, SQL Server, and PostgreSQL.
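For example, in a normalized schema (hypothetical tables, for illustration only), fetching an employee together with its company name already requires a join over primary and foreign keys:

  -- company.id is a primary key; employee.company_id is a foreign key
  SELECT e.name AS employee, c.name AS company
  FROM employee e
  JOIN company c ON c.id = e.company_id;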

2.4.1 MySQL/MariaDB

MySQL was created in 1995 by the Swedish company MySQL AB and became publicly available in 1996 (Foster & Godbole 2016). Over the following decade, MySQL rose in popularity and was adopted by large and small companies and institutions alike. In 2008 the company was bought by Sun Microsystems, and the following year Sun Microsystems was purchased by Oracle (Kromann 2018). Many of the original developers of MySQL left and created a fork named MariaDB, which follows their original philosophies for the product. The products are still compatible, and one database can be merged into the other, but in the future they may start to deviate (O'Leary 2015). Hereinafter MySQL and MariaDB, due to their similarities, will be referred to as MySQL.

³ What are the Most Popular Relational Databases. https://www.c-sharpcorner.com/article/what-are-the-most-popular-relational-databases/ [2020-02-07]

MySQL is a relational database that was built with scalability and speed as its main principles. When MySQL was first released, it lacked many features that its competitors had, such as triggers and stored procedures, but these have since been added in subsequent versions (Kromann 2018).

MySQL's widespread adoption is due not only to its scalability and performance compared to other relational databases, but also to its ease of use and flexibility. Because MySQL is open source, it has become popular in academic communities, which creates and fosters knowledge of and training in MySQL (Foster & Godbole 2016). Part of the flexibility is how platform independent it has become: official binaries are available for 14 platforms, and beyond those there are community-made binaries and the possibility of compiling the source code to make it runnable on just about any platform. There is also support for most of the popular programming languages, which makes it even easier to use and adopt (Kromann 2018).

2.5 NoSQL-database

The term NoSQL was first used in 1998 to mean non-relational databases. It was not until 2009 that the term became more popular and received a new, less strict meaning: "Not Only SQL". NoSQL is better at dealing with unstructured data than a relational database. Instead of storing data in tables, NoSQL databases use an identification system with keys to retrieve the stored data. There are four strategies for storing data this way: key-value, which is conceptually a distributed dictionary; column, where cells of data are grouped in columns and columns are grouped into larger column families; graph-oriented, which provides good support for complex data queries; and document (the most popular strategy), where writing to and reading from documents is done using keys (Győrödi et al. 2015).

2.5.1 MongoDB

MongoDB is a document-type NoSQL database that was first created in 2007 as part of a cloud-based application platform. The platform never took off, and in 2009 MongoDB was taken out of it and open sourced (Zammetti 2013). MongoDB was built because its developers saw a deficit in existing databases and how they did not fit cloud computing. This is reflected in its scalability and in how easy administration is (Edward & Sabharwal 2015).

Storage is done using documents. The documents use a format called BSON, developed by the MongoDB team, which is very similar to JSON. BSON stands for Binary JSON and adds a few new features: it is faster for a computer to process than regular JSON, it has some types that do not exist in JSON, and it supports type handling. This means that using and working with the data is almost the same as with JSON (Hows, Membrey & Plugge 2015). Documents can have complex structures with mixtures of lists, fields, and even other documents. At the most basic level, the only requirement is that each document has an id field to be used as a primary key. This means that what would be separate, related tables in an SQL database can be nested in the same document; thus, a document can be thought of as a row in an SQL database (Győrödi et al. 2015). If documents can be viewed as rows, then collections in MongoDB can be viewed as SQL database tables: documents belong to a collection just as rows belong to a table.

By default, a collection only requires that each document has a unique id field named _id; if no such field exists, a unique id is generated automatically. A schema can be enforced on a collection to make sure all the documents it contains have the same fields and types. It is also possible to perform the schema validation in the application using MongoDB (Subramanian 2019).

MongoDB queries are done using methods, unlike SQL databases, where a query language close to English is used to query data. The methods in MongoDB perform operations on collections. The most common operations are CRUD, but other methods such as search and aggregation are also possible. Queries are performed on collections and use JavaScript objects as arguments to specify the operation (Subramanian 2019).
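As a sketch of what this method-based querying looks like from C# with the official MongoDB.Driver package (the database and collection names are made up, not those used in the experiment):

  using MongoDB.Bson;
  using MongoDB.Driver;

  var client = new MongoClient("mongodb://localhost:27017");
  var db = client.GetDatabase("test");
  var companies = db.GetCollection<BsonDocument>("companies");

  // Find takes a filter object instead of an SQL string
  var filter = Builders<BsonDocument>.Filter.Eq("name", "foo");
  var company = companies.Find(filter).FirstOrDefault();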

MongoDB was designed with the mindset that it should not try to do everything for everyone. Instead, it is purposely designed to be very fast at working with documents and easy to scale, at the expense of other features. Applications that have large and complex data that needs to be accessed fast, or that require analytics, will find MongoDB very usable, whereas applications that need features like transaction support will not see the same benefits. Combining a traditional relational database with MongoDB can then bring the benefits of both (Hows et al. 2015).


3 Related work

GraphQL has started to gain a foothold and has become a valid choice for organizations that want to utilize the advantages that come with the technology (Brito et al. 2019). Although the technology has been available to the community for more than four years, only some research has been done comparing GraphQL with the traditional way of building APIs, namely REST, focusing on performance in data fetching and packet size.

Taskula (2019) performed a study where a service using the REST architecture, called Bakery Service, was compared with a GraphQL implementation. A case study was performed to gather insight into the Bakery Service and its clients. The service uses a complex and deep hierarchy for its data, which results in large and complex data structures inside the system. An experiment was then conducted, focusing on data-fetching performance for one of the pages in the Bakery Service system over two different network mediums, WiFi and 3G. The results of the study show that GraphQL performed better than REST in all test cases. GraphQL was on average quicker at fetching less data, which makes GraphQL more suitable when fetching data with a complex hierarchical structure. Taskula performed his experiment in a more realistic setting, using different network mediums instead of the more stable and replicable approach of a fast local network. The API servers were hosted on the Google Cloud platform, which adds further realism but makes the experiment less predictable and replicable, as the network path from the client to the cloud platform may vary. The REST endpoints were implemented in a reusable way, which means that for the client to perform a request equivalent to the GraphQL one, it had to perform multiple requests over the network, heavily favouring GraphQL.

In a study by Hartina, Lawi & Panggabean (2018), an information system using both GraphQL and REST was built. They then ran 20 repeated tests against GraphQL and REST, measuring the execution and transfer time of data to calculate the response time. The throughput was calculated from the maximum number of requests handled. The results of their study show that the REST web service has close to two times faster response time than the GraphQL web service. The throughput results show that REST can handle more requests per time unit, which seems reasonable given the faster response time.

Another study comparing performance between APIs was done by Kumari and Rath (2015). Their work is a case study on the Loanbroker example application, in which the response time and throughput of SOAP and REST are evaluated. Their results show that REST has higher throughput than SOAP for all the file sizes tested and all numbers of clients (from 1 to 28). SOAP was outperformed by REST on all four metrics used, but would occasionally, for certain file sizes or numbers of clients, perform on par with REST.

A study by Guo, Deng & Yang (2018) examines response time and packet size when performing complex queries against a GraphQL and a REST API. The results of their study show that GraphQL is 8.9 % faster than a REST service in response time when handling complex query structures. The results also show that when data is selected more precisely, the packet size sent between client and server is reduced, which reduces load times for clients. When Guo et al. conducted their experiment, they performed it all on a single machine, meaning that the client, server and database were running at the same time. When only a single machine is used, there is a chance that the results are affected by other processes, tasks and possible context switching in the background. The experiment used several endpoints for REST, which means that REST needed to perform three separate requests to fetch the same data as GraphQL. This way of implementing endpoints is highly reusable, because the client can choose more freely what data is needed, but the increased reusability causes increased response time, since several requests have to be performed. This approach increases the overall response time for REST, not only because the server needs to process several requests, but also because the client needs to execute more logic to prepare and send each request.


4 Problem definition

In this chapter the problem definition is presented, starting with the motivation, followed by the aim. After the aim, the research questions that will help fulfil the aim are presented, followed by hypotheses for each research question. Then the objectives are presented, describing the steps necessary to answer the research questions and fulfil the aim. The delimitation section describes aspects that are not examined or not taken into consideration in this study. Finally, this chapter contains a brief explanation of the chosen methodology and the validity threats relevant to the study.

4.1 Motivation

A vast number of web services today rely on APIs. The most used API style today is the REST architecture, and the second most common is SOAP (Santos 2017; Tihomirovs & Grabis 2016). However, a newer type of API called GraphQL has started to gain attention. GraphQL was developed by Facebook in 2012 and made open source in 2015. GraphQL (2018) is a data-centric API that allows the client to request a specific set of fields rather than an entire resource, as with SOAP and REST (Taskula 2019). Hartina et al. (2018) suggest that the doubt people have about this new API, compared to those that have been around for more than 20 years, is due to the absence of sufficient testing. However, developers at Facebook and people within the development community claim that GraphQL is preferable over the older APIs, due to its simpler usage and reduced server load.

Several studies in the field have compared REST against GraphQL when it comes to performance. The same has been done for REST and SOAP, and the results show that REST outperforms SOAP in most cases. But, to the best of our knowledge, no studies have compared all three APIs against each other, meaning that it is not truly known how GraphQL performs compared to SOAP.

4.2 Aim

The aim of this study is to provide researchers and companies with knowledge about the performance of GraphQL and how it compares to SOAP and REST: how the overhead reduction of GraphQL affects performance when fetching data with different characteristics compared to the other two, and whether the underlying database choice, SQL or NoSQL, affects performance for any of the APIs. We expect this study to be of great utility to companies as a basis for evaluating GraphQL as an API solution.

4.3 Research questions

1. To what extent do the characteristics of data affect the performance of GraphQL, SOAP and REST APIs?

2. To what extent does GraphQL's overhead reduction affect its performance compared to SOAP and REST?

3. To what extent does a relational vs. non-relational database affect the performance of GraphQL, SOAP and REST APIs?


4.4 Hypotheses

In this section each hypothesis will refer to a corresponding research question.

RQ 1

1. REST will have the best performance when all the fetched data is needed.

2. GraphQL will only have better performance when there is a large amount of data and there is data to be omitted.

3. SOAP will have the worst performance regardless of test case.

RQ 2

4. Overhead reduction will be a significant factor in GraphQL's performance.

5. GraphQL's total packet size will be larger than the other APIs' when fetching every field from the database.

6. SOAP will have the largest packet size of all the APIs.

RQ 3

7. GraphQL and REST will have increased performance when using the NoSQL database, due to the JSON format of MongoDB.

8. SOAP will perform better when using MySQL rather than MongoDB.

4.5 Objectives

1. Research the field and related work
2. Research the APIs
3. Create MySQL and MongoDB databases with generated data
4. Implement APIs & backend
   a. GraphQL
   b. REST
   c. SOAP
   d. Backend/data repository
5. Create client application
6. Evaluate and analyse experiment results

4.6 Delimitation

This section outlines some of the subjects and methods within the domain of the work that will not be studied or examined.

When evaluating performance, this study only looks at performance when fetching data. For GraphQL this means no insertions, mutations or subscriptions are used or tested. For REST, only GET requests are performed using the HTTP protocol. The same goes for SOAP: only the methods for fetching data are tested.

The NoSQL versus SQL comparison only tests MySQL and MongoDB. MongoDB, being among the most common NoSQL databases, should still provide good generalizability. MySQL is widely used, works similarly to the most popular SQL databases, and is easy to use, being open source and easy to set up both on a remote server and locally.


For GraphQL, the endpoint used is defined as a separate controller in the REST API solution. This is done to lower development time and is very unlikely to affect performance. GraphQL uses POST for every request, including data fetches, and since the evaluation of REST only uses GET, no collisions or ambiguity should interfere.

4.7 Method

A literature study was performed to gain insight into the advantages and disadvantages of the different techniques, but also to gain knowledge of the subject and to be able to create test cases with as little bias as possible.

The main method used in this work to evaluate the performance differences between GraphQL, REST and SOAP is an experiment. The experiment is designed to measure data-fetching performance for these three techniques, which means that insert, delete and other operations are not included. Similar studies measuring response time have been performed by Guo et al. (2018), Taskula (2019) and Hartina et al. (2018), but all of these focused on REST and GraphQL; the study in this thesis also includes measurements on SOAP. The performance is measured from when a client sends a request to the server until the client receives the response. The request and response packet sizes are also measured, to provide an understanding of how the packet size differs between the techniques. This also reduces the potential bias toward GraphQL that would occur if only the response packet were measured.
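A minimal sketch of how such a client-side measurement can be taken in C# (illustrative only; the endpoint is hypothetical and the actual test client is described in 5.5):

  using System;
  using System.Diagnostics;
  using System.Net.Http;

  var http = new HttpClient();
  var stopwatch = Stopwatch.StartNew();                    // start just before the request is sent
  var response = await http.GetAsync("http://server/api/company/1");
  var body = await response.Content.ReadAsStringAsync();
  stopwatch.Stop();                                        // stop once the full response is received

  long? responseSize = response.Content.Headers.ContentLength; // response payload size, if reported
  Console.WriteLine($"{stopwatch.ElapsedMilliseconds} ms, {responseSize} bytes");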

This experiment focuses on the performance of the backend implementation of each technique. This means that any data visualization and data processing that may or should occur on the frontend is not included in the performance measurements. Two versions of each API technique are implemented with two different backends, connected to a MySQL and a MongoDB database. The reason for this is to increase fairness by presenting results for two different kinds of databases: if one technique gains benefits from a specific database, the result will not be skewed, because it can be compared with the results from the other test cases. Because of the differences between relational and non-relational databases, design decisions were made to ensure that the same data exists in both databases. No real relationships exist between the data in the databases; relationships are completely simulated when needed. The same backend solution is also used for all techniques, so every API executes the same backend code, which provides a further level of fairness to the experiment.

The test cases in the experiment differ in what the clients request. The requested data is of different sizes and complexity, to provide insight into how the performance differs across a variety of requests. Multiple iterations of each test case are performed to increase the statistical power of the results.

The experiment is performed in a controlled environment where two computers (server and client) are connected to each other through a local network. A response-time measurement between the client computer and the server was performed using ping, to provide insight into the network latency. The measurement showed that the average response time was less than 1 ms (millisecond); the potential network-related noise is therefore estimated to be low. This study aims to provide a clear indication of how these three techniques perform against each other in terms of performance, and of how the total packet size differs.

To support the experiment results, a Friedman test (Friedman 1937) is also conducted, to show statistically whether differences exist in the measured data. The Friedman test was chosen because this thesis performs an experiment with several different test cases, and the results cannot be expected to be normally distributed. The results of the experiment and the Friedman test are later analysed and presented.
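For reference (the standard form of the test, not reproduced from the thesis), the Friedman statistic for N measurement blocks and k treatments, where R_j is the rank sum of treatment j, is

  \chi_F^2 = \frac{12}{N\,k(k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1)

Under the null hypothesis that the treatments do not differ, the statistic approximately follows a chi-squared distribution with k - 1 degrees of freedom.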

An alternative method for this study could have been a case study, where an existing system using REST and SOAP would be complemented with a GraphQL implementation. Performance could then be measured before and after the GraphQL solution was integrated into the system, and data could be extracted and analysed to gain insight into the performance differences. In this work, a case study would not be the optimal method, due to the lack of control over test cases and the lack of reproducibility. To achieve the goals of this study, a fully controlled environment is needed; measurements from a system with real users cannot be relied upon. There is also a risk of being unethical when performing a study with real data and real users without their knowledge and consent.

Another possible method could have been a literature survey, where other studies, books and articles would be examined for insights from the community and the work done in the field. Many related studies have compared REST against GraphQL and SOAP against REST, but to the best of our knowledge, no work has compared GraphQL against SOAP in terms of performance and packet size. The goal of this study is to compare all three APIs, and a literature survey would have been hard to perform due to the lack of related studies and content. Comparing GraphQL to REST and REST to SOAP via a literature study gives no guarantee that the results would show how GraphQL compares to SOAP.

4.7.1 Ethics

The data used in this experiment is inspired by a company's databases, but the data itself is generated by a program created by the authors of this thesis. This provides total control over the data used in the study, which means that no sensitive or potentially harmful data is exposed.

To avoid favouring one of the techniques, the test cases are created to be as general and varied as possible. A potential bias in measurement would be to measure only the size of the response packet, since this would favour one of the techniques. To be fairer, the request packet size is also measured and compared, to get the complete picture of the packet sizes sent over the network by each technique.

Threats against validity exist and need to be considered, to prevent wrong conclusions from being drawn. One of the benefits of statistics is that it is possible to draw conclusions based on the analysis of patterns in the resulting data. It is, however, still possible to analyse the results incorrectly, which could lead to wrongly drawn conclusions.


4.8 Threats to validity

This section presents the potential threats to this study, together with a discussion of whether and how each threat is handled.

4.8.1 Low statistical power

To prevent low statistical power, the experiment generates 324 000 result records: each combination of test case, API and database is executed 3000 times (108 combinations in total). Since this study is interested in average response time and packet size, the large amount of data will reduce noise and provide reliable values that are not skewed.

4.8.2 Reliability of measures

The experiment is designed so that each API uses a common backend implementation, which means that each API uses the same data-fetching logic. The workload to fetch data is therefore the same for all APIs, making it possible to measure the performance difference between them.

The experiment is executed using one client computer and one server. These machines are not changed between or during the experiment, which means that the results are not affected by varying hardware. This configuration ensures that the experiment is repeatable by others: attempts at reproducing this study should give results close to those of this experiment, and the hardware or location chosen in such attempts should not affect the outcome to a significant extent.

The results of this study will not be fully accurate with respect to the real world, since the experiment is performed on a small scale in a controlled environment.

4.8.3 Interaction of selection and treatment

When creating the test data, the authors took inspiration from a database structure used by a company. From this structure, test data is generated by a program created by the authors. This may, however, result in test data that is not generalizable to the real world, because the data structure may not be generalizable and the generated data may then not be representative of real data.

4.8.4 Interaction of setting and treatment

When implementing the APIs for this study, the authors looked at implementations from the internet, articles, and the software development community for inspiration on how an API should look and be used. It is, however, possible that the API implementations in this study do not reflect how APIs are developed in the industry today.


5 Implementation

5.1 Setting up databases

5.1.1 Database schema

A database schema was provided by a company. The schema included over 20 tables, out of which five were picked for use. Some table and column names were renamed to avoid exposing the company's schema and to make the database easier to understand for the experiment. Three of the five tables have fewer than 10 columns; the largest has over 80 columns and the second largest over 40. The data types included are strings/texts, integers, floats, Booleans and date-time values.

In the experiment only three tables were used (see 5.6).

Neither the SQL nor the NoSQL database has relations defined in its schema. This is unlikely to make a performance difference: for tables that end up needing to be joined, an index on the field is used in place of a primary key. Schema-defined relations mostly affect how inserts, deletions and updates are done, and none of these are evaluated in this study. The databases get their data from a Java program, developed by the authors, that inserts data into the MySQL database. The data is then exported from MySQL and converted to MongoDB, to ensure that both databases contain the same data.

5.1.2 MySQL

Using the schema provided by the company, a modified version of it was implemented in MySQL⁴. The provided schema was in XML, so the XML was modified to remove and rename several elements, to better fit the experiment and to avoid giving away the company's database schema, and then a converter script was used. The script⁵, written by the authors, converts each element and sub-element into an SQL query, which was then manually altered to add some of the missing features, such as primary keys.

5.1.3 MongoDB

MongoDB⁶ does not require a schema, which makes importing data easier. When data had been generated and inserted into the MySQL database (see 5.2), it was exported from MySQL as JSON. The JSON export from MySQL is not fully compatible with how MongoDB imports it: the export has every row as an element in an array, while MongoDB requires the data to be read as one JSON object at a time, without any separating character such as a comma. To make the JSON compatible, the opening array brackets were removed manually, and a search-and-replace was used to replace the commas separating objects with blank spaces.

⁴ Empty, dev version and final version of data available on GitHub at https://github.com/Remes92/BachelorProject/tree/master/MySqlDb_data [2020-03-13]

⁵ The script is available on GitHub at https://github.com/Oldalf/BachelorHelperApps/blob/master/xmlToSql.php [2020-03-13]

⁶ JSON in an importable format available on GitHub at https://github.com/Remes92/BachelorProject/tree/master/MongoDb_data [2020-03-13]


5.2 Generating data

In order to generate a large amount of data fitting the database schemas, a Java application was developed. The application7 reads the description of a table and its columns and creates objects out of them. It then generates data in a format appropriate to each column: text and strings are produced by randomly selecting words from a text file containing words from an English dictionary8, while numbers and Boolean values are created using the Java random library. To create fake relationships in the data, a comment containing "relation" is added to integer values that are supposed to represent a foreign key. These values are randomly generated between 1 and the number of rows being generated, and the table they belong to gets four times as many rows as any other table; this way, with auto-increment IDs, each object will on average be referenced by four rows in a foreign table. GUIDs (UUIDs) are handled in a similar way using comments: when a column with a GUID comment is encountered, a Java UUID is generated instead of random dictionary text. After a row of data has been generated it is converted into an SQL insert query. The query is added to an SQL batch job which runs once all rows for the current table have been generated; this is done for every table in the database. The database has five tables, three of which are supposed to have relationships and two of which have none. The application is set to generate 2500 rows, which yields two tables with 2500 rows and three tables with 10000 rows of data, more than enough to carry out the experiment.
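The generator itself is a Java application (see footnote 7), but the row-generation loop is simple to sketch. Below is a condensed, hypothetical illustration written in C# for consistency with the rest of the code in this document; the table, column names and word list are invented.

using System;
using System.Collections.Generic;

class RowGenerator
{
    static readonly Random Rng = new Random();
    // Stand-in for the dictionary word list read from file.
    static readonly string[] Words = { "ledger", "asset", "broker", "equity" };

    static void Main()
    {
        int rows = 2500;
        var batch = new List<string>();
        for (int i = 1; i <= rows; i++)
        {
            // Hypothetical columns: random text, a random number and a fake
            // foreign key drawn from 1..rows, mirroring the "relation" comment.
            string name = Words[Rng.Next(Words.Length)];
            int price = Rng.Next(1, 1000);
            int productTypeId = Rng.Next(1, rows + 1);
            batch.Add("INSERT INTO product (name, price, productTypeId) " +
                      $"VALUES ('{name}', {price}, {productTypeId});");
        }
        // The real application executes the whole batch as one SQL job per table.
        Console.WriteLine(batch.Count + " insert statements generated.");
    }
}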

5.3 Implementing APIs

The APIs in this experiment are developed using frameworks from .NET and programmed in the C# language. The reason for this is that .NET provides frameworks that support all three APIs, and both .NET and C# are commonly used in industry. Each API is then connected to a common backend implementation (described in 5.4). The data flow, from client request to database fetch and API response, can be seen in Figure 7.

Figure 7 Data flow

5.3.1 SOAP

Because SOAP APIs are not frequently developed today, SOAP does not have the native support in .NET Core that GraphQL and REST have. There exists a NuGet package that makes it possible to develop SOAP APIs in .NET Core, but it comes with inherent drawbacks, such as a lack of security features and no WSDL utilisation. Because of these drawbacks (Rousos 2016), the authors made the decision to implement the SOAP API solution in .NET Framework, with the motivation that if a company or an organisation were to develop a SOAP solution, it would be preferable to do so in .NET Framework instead of .NET Core.

7 Available on GitHub at https://github.com/Oldalf/dataBaseSamplePopulator [2020-03-13]

8 Alpha word list from https://github.com/dwyl/english-words [2020-03-13]



The SOAP API9 in this study is developed in .NET Framework version 4.7.2 and created as an ASMX web service. The SOAP web service is the interface outwards and handles incoming requests from the clients. The service contains one endpoint for each test case, which means it exposes 36 unique endpoints. An example of how the endpoints are implemented can be seen in Appendix B.
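The actual endpoints are listed in Appendix B; as a rough illustration, one ASMX endpoint could look like the sketch below, where the service name, method name and returned data are hypothetical stand-ins rather than the authors' code.

using System.Web.Services;

public class Product
{
    public int Id;
    public string Name;
}

[WebService(Namespace = "http://example.org/")]
public class ExperimentService : WebService
{
    // One WebMethod per test case (36 in total); this hypothetical endpoint
    // returns a single product, joined with its product type in the backend.
    [WebMethod]
    public Product GetProductWithTypeMySql(int id)
    {
        // Stub standing in for the shared backend call described in 5.4.
        return new Product { Id = id, Name = "example" };
    }
}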

5.3.2 REST

It is possible to implement REST APIs in both .NET Framework and .NET Core. The REST API10 in this study is developed in .NET Core version 3.1 and created as an ASP.NET Core Web Application. The reason for using .NET Core instead of .NET Framework is that Core is the newer of the two, and a company implementing a REST API today would most likely choose Core over Framework. The REST API uses the integrated controller logic that comes with an ASP.NET Core Web Application to handle all incoming requests from the clients. Two different controllers were implemented to improve code readability and give a clearer indication of which database each endpoint targets. An example of a REST controller can be seen in Appendix C.
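The real controllers are shown in Appendix C; the sketch below illustrates the general shape of one such ASP.NET Core controller, with hypothetical route, model and backend names.

using Microsoft.AspNetCore.Mvc;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Stub standing in for the shared backend described in 5.4.
public static class Backend
{
    public static Product GetProductById(int id) =>
        id > 0 ? new Product { Id = id, Name = "example" } : null;
}

[ApiController]
[Route("api/mysql")]
public class MySqlController : ControllerBase
{
    // One action per test case; parameters travel in the URL, as REST prescribes.
    [HttpGet("product/{id}")]
    public ActionResult<Product> GetProduct(int id)
    {
        var product = Backend.GetProductById(id);
        if (product == null)
            return NotFound();
        return Ok(product);
    }
}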

5.3.3 GraphQL

GraphQL has no native support in either .NET Core or .NET Framework, as REST and SOAP have. It is possible to either implement it on your own or download a NuGet package developed by the .NET community. The GraphQL implementation10 in this study uses the GraphQL NuGet package11, which provides an implementation of Facebook's GraphQL for .NET. This package was chosen because it is well documented and widely used by the community; the project's time limit is another reason a finished solution was preferred.

When a GraphQL client requests data from the API, the client needs to provide more information about the operation and the data it wants to receive. To pass all of that information to the API, a GraphQLQuery object was created containing everything the API needs to resolve the query. The GraphQLQuery object is sent in the body of the HTTP request from the client to the server API. An example of the GraphQLQuery object can be seen in Figure 8 below.

9 Available on GitHub at https://github.com/Remes92/BachelorProjectSOAP [2020-03-13]

10 Available on GitHub at https://github.com/Remes92/BachelorProject [2020-03-13]

11 GraphQL NuGet package https://www.nuget.org/packages/GraphQL/ [2020-03-17]


Figure 8 GraphQL query object
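Figure 8 shows the object as implemented; a plausible minimal shape, following common GraphQL for .NET examples, is sketched below. The property names are assumptions, not copied from the implementation.

using Newtonsoft.Json.Linq;

// Carries the raw query string plus optional operation name and variables,
// deserialized from the JSON body of the HTTP POST.
public class GraphQLQuery
{
    public string OperationName { get; set; }
    public string Query { get; set; }
    public JObject Variables { get; set; }
}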

The GraphQLQuery object enables the query to be changed by the client, which means that only one endpoint is strictly needed in a GraphQL API (for further information regarding GraphQL see 2.3.3). Nevertheless, two GraphQL endpoints are implemented, one for each database. The reason is the same as for REST: it improves readability and makes it clearer which database a specific endpoint targets.

To resolve the requested data, the GraphQL API controller needs to process the request, which makes it more complex than its SOAP and REST counterparts. The implemented GraphQL controller can be seen in Appendix D. The controller extracts the information passed in the body of the request, provides it to the GraphQL functions and maps it to the GraphQL schema (for more information regarding the GraphQL schema see 2.3.3). The schema defines what data the client can request and resolves the requested fields against the fields specified in the API. Fields are specified to represent what type of data is accessible through the API; in this implementation all attributes from the database are accessible and are therefore all defined as fields. Models are used as containers to store data from the database. A model is then parsed to an ObjectGraphType, which the API can resolve against the requested fields. An example of one model and the corresponding ObjectGraphType from this implementation can be seen in Appendix G.
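Appendix G shows the actual model and graph type; with the GraphQL NuGet package the pattern generally looks like the sketch below, here with a hypothetical Product model rather than the real one.

using GraphQL.Types;

// Plain model acting as a container for one database row.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int ProductTypeId { get; set; }
}

// Maps the model to a GraphQL type; each Field() call exposes one column,
// so the client can select any subset of them in its query.
public class ProductGraphType : ObjectGraphType<Product>
{
    public ProductGraphType()
    {
        Field(x => x.Id);
        Field(x => x.Name, nullable: true);
        Field(x => x.ProductTypeId);
    }
}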

5.4 Implementing backend

Because SOAP was developed in .NET Framework while REST and GraphQL were developed in .NET Core, the backend had to be available for both .NET environments. To accomplish this, the backend was written in the .NET Core project12 and, once completed, the code was copied over as a new .NET Framework project in the SOAP solution. The backend includes five model classes, each representing one table from the database(s). The models that have relations also include the model of the object to which they relate. For every model there is a static factory class. Every factory has two functions: one returns the model when the argument is a MySQL result, and one when the argument is the result of a MongoDB query. The two functions work in a similar way, although the syntax differs slightly to accommodate the input container.
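As an illustration of this factory pattern, a sketch for a hypothetical Product model might look as follows, assuming the MySql.Data and MongoDB.Bson packages; the class, column and field names are assumptions, not the authors' code.

using System;
using MongoDB.Bson;
using MySql.Data.MySqlClient;

public static class ProductFactory
{
    // Builds a Product from the current row of a MySQL data reader.
    public static Product FromMySql(MySqlDataReader reader)
    {
        return new Product
        {
            Id = Convert.ToInt32(reader["id"]),
            Name = reader["name"].ToString(),
            ProductTypeId = Convert.ToInt32(reader["productTypeId"])
        };
    }

    // Builds the same Product from a MongoDB BSON document.
    public static Product FromMongo(BsonDocument doc)
    {
        return new Product
        {
            Id = doc["id"].AsInt32,
            Name = doc["name"].AsString,
            ProductTypeId = doc["productTypeId"].AsInt32
        };
    }
}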

There are also two database handler classes, one for MongoDB and one for MySQL. In each handler a connection is made, and there are some basic queries for fetching data from each table by id.

12 Available on GitHub at https://github.com/Remes92/BachelorProject [2020-03-13]


In addition, there are five functions (six counting one of the basic queries mentioned earlier) that are used in the experiment. Three of these functions return a list and three return a single object. Four of the functions also join data: in MySQL this is done with the SQL keyword JOIN, while in MongoDB it is done by creating an aggregation and performing a lookup on the aggregation pipeline.
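The two join styles can be sketched as follows, under the assumption of the MySql.Data package and the official MongoDB C# driver; the table, collection and column names are hypothetical.

using MongoDB.Bson;
using MongoDB.Driver;
using MySql.Data.MySqlClient;

public class JoinExamples
{
    // MySQL: a plain SQL JOIN between product and productType.
    public static MySqlCommand ProductWithTypeMySql(MySqlConnection conn, int id)
    {
        var cmd = new MySqlCommand(
            "SELECT p.*, t.* FROM product p " +
            "JOIN productType t ON p.productTypeId = t.id " +
            "WHERE p.id = @id;", conn);
        cmd.Parameters.AddWithValue("@id", id);
        return cmd;
    }

    // MongoDB: the equivalent join via a $lookup stage on the aggregation pipeline.
    public static BsonDocument ProductWithTypeMongo(IMongoDatabase db, int id)
    {
        return db.GetCollection<BsonDocument>("product")
            .Aggregate()
            .Match(new BsonDocument("id", id))
            .Lookup("productType", "productTypeId", "id", "productType")
            .FirstOrDefault();
    }
}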

5.5 Test Environment

The test environment used in this study consists of one client computer and one server machine connected through a local network via a switch.

5.5.1 Server

To host the APIs on the server, Windows IIS (Internet Information Services) was used. Hosting REST and GraphQL with their backend required a proxy solution, because IIS was not able to execute the dll file. The dll file was therefore executed manually, listening on a specific port on localhost. A reverse proxy was then configured in IIS to forward every request, with its HTTP header and body, to the localhost port where the API was listening. Hosting the SOAP service only required moving the published files to the correct directory and binding a port in the IIS service manager to get it hosted on the assigned port.

The same server that hosts the APIs also hosts the databases. This means the backend accesses MongoDB or MySQL through localhost and does not have to reach a separate database server remotely.

No configurations or settings on the server, other than those related to the APIs, were changed; everything else was left as it was when the server was provided.

Specifications

The server has the following hardware: an Intel Core i3-3220 central processing unit (CPU) running at 3.30 GHz and 20 GB of DDR3 random access memory (RAM) running at 665 MHz. The operating system is Windows Server 2012 64-bit.

5.5.2 Client

To send requests to the API server, a client application was developed that simulates multiple clients performing multiple requests against the server. It is inspired by the work of Kumari & Rath (2015), who used multithreading concepts to simulate concurrent users. Each thread runs the same test suite, and the program waits for all threads to finish before the next test suite is executed. An example of the client application code can be seen in Figure 9.


Figure 9 Part of client application threading logic
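The figure shows part of the actual threading logic; a minimal sketch of the same idea, with a hypothetical RunTestSuite method standing in for the real test code, could look like this:

using System.Threading;

public class LoadClient
{
    public static void RunConcurrently(int clientCount)
    {
        var threads = new Thread[clientCount];
        for (int i = 0; i < clientCount; i++)
        {
            // Every simulated client runs the same test suite.
            threads[i] = new Thread(RunTestSuite);
            threads[i].Start();
        }
        // Wait for all simulated clients before starting the next suite.
        foreach (var t in threads)
            t.Join();
    }

    static void RunTestSuite() { /* issue the requests for all test cases */ }
}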

Each test suite contains all combinations of test cases, which are further discussed in 5.6. Caching is discussed in section 5.3; in the experiment, each API indicates to the clients that the response should not be cached. Some threads will eventually request the same data, although not at the same time. No measures have been taken to disable database caching.

Request

Each request sent to the API server was created by the authors using the HttpRequestMessage class provided by the System.Net.Http namespace. The reason for this is to be able to measure the content packet size of the request sent to the server. In this experiment only the content size of the request message is measured: measurements and tests made by the authors indicate that the headers of the HTTP message are constant and do not vary in size. The HTTP header therefore does not affect the packet size comparison and can be excluded from the measurements.
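A minimal sketch of how the content size can be measured with HttpRequestMessage, assuming a JSON body; the URL and payload below are placeholders, not the actual test requests.

using System;
using System.Net.Http;
using System.Text;

class RequestSizeExample
{
    static void Main()
    {
        var request = new HttpRequestMessage(HttpMethod.Post, "http://server/api/graphql")
        {
            Content = new StringContent("{\"query\":\"{ product { id } }\"}",
                                        Encoding.UTF8, "application/json")
        };

        // Only the body is measured; the headers were found to be constant in size.
        long contentBytes = request.Content.ReadAsByteArrayAsync().Result.Length;
        Console.WriteLine($"Content size: {contentBytes} bytes");
    }
}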

Each API has a unique way of requesting data: REST passes parameters in the URL, while SOAP and GraphQL send parameters in the HTTP body (see section 2.3 for more information). Because of this, the request bodies for SOAP and GraphQL messages had to be created in order to be measured. To avoid adding any extra overhead or unnecessary content, each request message was created with as little content as possible, so that no extra load affects the measurements of the total packet size. Examples of request bodies for GraphQL and SOAP can be seen in Appendix A.

Specifications

The client application was executed on a laptop with the following hardware: an Intel Core i5-7300HQ central processing unit (CPU) running at 2.50 GHz and 8 GB of random access memory (RAM) running at 2400 MHz. The operating system is Windows 10 Home 64-bit.

5.6 Test Design

To decide what tests to run, an input space domain was used. Three classes were chosen for the input space: number of columns, number of rows and number of joins to perform. The column class has three parameters: 6, 20 and 100. The row class has two parameters: 1 and 100. The join class has three parameters: 0, 1 and 2. Combining all parameters of all classes yields the 18 test case requirements shown in Table 5 below.


Test case   Columns   Rows   Number of joins
1           6         1      1
2           6         1      0
3           6         1      2
4           6         100    1
5           6         100    0
6           6         100    2
7           20        1      1
8           20        1      0
9           20        1      2
10          20        100    1
11          20        100    0
12          20        100    2
13          All       1      1
14          All       1      0
15          All       1      2
16          All       100    1
17          All       100    0
18          All       100    2

Table 5 All combinations of parameters

To keep all the tests as comparable to each other as possible, the same database tables were used whenever possible when converting a test requirement into a test case. The database tables used are product, company and productType. The product table has over 40 columns, while productType and company have fewer than 10 but at least 6. When a requirement includes joins, the test will as far as possible fetch the same number of columns from the joined table; if the required number of columns exceeds what the joined table offers, all of its columns are used. REST and SOAP always return every column, since they cannot select fewer, unlike GraphQL where it is possible to select which columns to fetch through the same endpoint.

Below some of the test cases will be explained.

Test case 1

Get a product and, where the API allows it, select only 6 columns from product; join the productType table and, where possible, get 6 columns from productType.
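For GraphQL this requirement maps onto a field selection: six fields from product plus a nested selection of six fields from the joined productType. A hypothetical query string for test case 1 (the field names are invented, not taken from the schema) could look as follows in the client code.

// Hypothetical GraphQL body for test case 1; REST and SOAP would instead
// receive every column of both tables for the same test case.
string testCase1Query = @"{
  product(id: 1) {
    id name price weight stock created
    productType { id name code group created active }
  }
}";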

References
