Bachelor Degree Project

Are APIs with Poor Design Subject to Poor Lexicon?


Author: Ahmad Sadia

Author: Osama Zarraa

Supervisor: Francis Palma

Semester: VT 2020

Subject: Computer Science


- A Google Perspective


Abstract

REST (Representational State Transfer) is an architectural style for distributed hypermedia systems. The simplicity of REST allows straightforward communication between HTTP clients and servers using URIs (Uniform Resource Identifiers) and HTTP methods, e.g., GET, POST, PUT, and DELETE. For clients and servers to communicate effectively, a set of best design practices (design and linguistic patterns) should be followed, and a set of poor design practices (design and linguistic antipatterns) should be avoided. This study aims to determine whether there is a relationship between design and linguistic quality in Google RESTful APIs.

To find this relationship, a tool is developed to detect patterns and antipatterns in REST APIs in terms of both design and linguistic quality. The input of this tool is qualitative data (Google APIs), and its output is quantitative data. Using this quantitative data, a statistical study is then performed to detect the relationship. The tests conducted to obtain the final results are the Chi-squared test and the Phi Coefficient test. The result of the Chi-squared test, which considers all the groups of patterns and antipatterns, shows that there is a statistically significant relationship between design and linguistic quality. However, when we assess individual pairs of patterns and antipatterns, our Phi Coefficient tests show that, in most cases, there is no or only a negligible relationship between linguistic and design patterns and antipatterns.

Keywords: Design patterns, Antipatterns, RESTful APIs, URIs, Uniform Resource Identifiers, Detection, Design quality, Google.


Preface

We would like to thank our supervisor, Dr. Francis Palma, who guided us step by step to finish this thesis project. Without his support and valuable knowledge in this field, we would not have been able to complete this study.


List of Figures

1.1 Breaking Self-descriptiveness. Source: [1]

1.2 Contextless Resource Names. Source: [1]

2.1 Research Methodology

3.1 REST-Ling (the detection tool)

4.1 Google Photos APIs Detection of Individual Pattern and Antipattern

4.2 Google Drive APIs Detection of Individual Pattern and Antipattern

4.3 Google Classroom APIs Detection of Individual Pattern and Antipattern

4.4 Google Blogger APIs Detection of Individual Pattern and Antipattern

4.5 Google Calendar APIs Detection of Individual Pattern and Antipattern

4.6 Google Gmail APIs Detection of Individual Pattern and Antipattern

4.7 YouTube APIs Detection of Individual Pattern and Antipattern

4.8 Google Sheets APIs Detection of Individual Pattern and Antipattern

4.9 Google Photos APIs Detection of Total Pattern and Antipattern

4.10 Google Drive APIs Detection of Total Pattern and Antipattern

4.11 Google Classroom APIs Detection of Total Pattern and Antipattern

4.12 Google Blogger APIs Detection of Total Pattern and Antipattern

4.13 Google Calendar APIs Detection of Total Pattern and Antipattern

4.14 Google Gmail APIs Detection of Total Pattern and Antipattern

4.15 YouTube APIs Detection of Total Pattern and Antipattern

4.16 Google Sheets APIs Detection of Total Pattern and Antipattern

1.1 Forgetting Hypermedia Detection Algorithm (Appendix A). Source: [1]

1.2 Ignoring Caching Detection Algorithm (Appendix A). Source: [1]

1.3 Ignoring MIME Types Detection Algorithm (Appendix A). Source: [1]

1.4 Ignoring Status Code Detection Algorithm (Appendix A). Source: [1]

1.5 Misusing Cookies Detection Algorithm (Appendix A). Source: [1]

1.6 Amorphous URIs Detection Algorithm (Appendix A). Source: [1]

1.7 Contextless Resource Names Detection Algorithm (Appendix A). Source: [1]

1.8 Non-hierarchical Nodes Detection Algorithm (Appendix A). Source: [1]

1.9 CRUDy URIs Detection Algorithm (Appendix A). Source: [1]

1.10 Singularised vs. Pluralised Nodes Detection Algorithm (Appendix A). Source: [1]

1.11 Content Negotiation Detection Algorithm (Appendix A). Source: [1]

1.12 Entity Linking Detection Algorithm (Appendix A). Source: [1]

1.13 Response Caching Detection Algorithm (Appendix A). Source: [1]


1 Introduction

Service-Oriented Architecture (SOA) is an architectural choice that has become dominant within the industry as it presented new ways of developing, deploying, and consuming a software system [2]. There are two major web services standards one can use to build SOAs and make web applications interoperable: Simple Object Access Protocol (SOAP) and REpresentational State Transfer, known as REST. However, REST has taken the lead and has become a standard architectural style adopted by many software organizations [3].

The REST style is based on the client-server pattern, and REST applications employ four primary operations to exchange data: Create, Read, Update, and Delete (CRUD). In REST terminology, these correspond to the POST, GET, PUT, and DELETE operations, respectively. These four operations, along with a single addressing scheme based on URIs (Uniform Resource Identifiers), form the architectural structure of REST [4]. REST is simpler and more effective than SOAP in terms of publishing and consuming services [5]. Its simplicity allows HTTP clients to communicate with HTTP servers through a single URI using its operations [4]. For an application programming interface (API) to be called a REST API, it should conform to the constraints defined by REST [3].

A well-designed REST API can attract client developers to use the service and put the service provider ahead of the competition. Another important aspect that attracts and benefits developers is the linguistic design quality of the REST API. For example, a URI that can be reused and understood easily helps the client developers while designing and developing their web-based services using that REST API [6]. However, poor design choices may be introduced while developing the web service, which forces the REST API to change and could lead to bad design decisions to solve existing problems or challenges (design antipatterns) [7]. These antipatterns may also propagate to include poor linguistic design decisions, which might affect the overall RESTfulness of the web service and drive away potential clients [6].

Nevertheless, there has been no evidence that a poorly designed REST API also has a poor linguistic quality. Hence, this paper aims to investigate if there is a relationship between REST design quality and linguistic quality in RESTful APIs. For this, we aim to analyze eight Google APIs.

1.1 Background

Google APIs are designed for communication between Google servers and client applications. Good design is important for meaningful communication. To ensure high-quality design, the APIs need to be of high quality both in their linguistic aspects and in their adherence to the design principles of REST. Linguistic design quality ensures that URIs are designed using well-understood identifiers that are easy for client developers to use and understand. Moreover, the REST design principles impose a few constraints, e.g., statelessness and cacheability, that must be enforced while designing RESTful APIs. Thus, there might be a potential relationship between REST and linguistic design quality.

1.1.1 REST APIs

In the SOA paradigm, web services are servers that are built to support the needs of either a site or any other application. For client programs to communicate with these web services, application programming interfaces (APIs) are used. In other words, the main task of an API is to allow computer programs to exchange information with each other. REST is one of the leading architectural styles adopted by many large organizations to develop APIs for the various web services they offer; such APIs are known as REST APIs [8] [9].

1.1.2 REST Design Patterns and Antipatterns

Systems evolve to meet new requirements or add new features. These changes might affect the underlying technology and propagate to affect the REST API itself, forcing it to change. All these alterations may reflect negatively on the REST API design, and this might bring the developer to implement common poor solutions to solve recurring design problems (antipatterns). On the other hand, design patterns can be defined as good solutions to design problems that might arise while designing and developing a system [7] [10]. In this study, the following design antipatterns are used:

1. Breaking Self-descriptiveness: an antipattern that occurs when REST developers ignore the standardized headers, formats, or protocols. It reduces the efficiency, reusability, and adaptability of REST resources [7, 11]. The following figure shows the detection method of breaking self-descriptiveness antipattern.

Breaking Self-descriptiveness anti-pattern

    procedure BREAKING-SELF-DESCRIPTIVENESS(request-header, response-header)
        std-request-headers[] ← {"Content-Type", "Proxy-Authorization", "Host", ...}
        std-response-headers[] ← {"Set-Cookie", "Last-Modified", "Location", ...}
        for each h_req ∈ request-header.getKeys() and h_res ∈ response-header.getKeys() do
            if (h_req ∉ std-request-headers[]) or (h_res ∉ std-response-headers[]) then
                print "Breaking Self-descriptiveness detected"
            end if
        end for
    end procedure

Figure 1.1: Breaking Self-descriptiveness. Source: [1]

2. Forgetting Hypermedia: occurs due to the lack of proper entity linking and hinders state transitions in REST applications. It restricts the communication between clients and servers by preventing the client from following links [1, 11].

3. Ignoring Caching: because caching is complex to implement, many developers ignore the caching capability on both the client and server sides. A developer can easily break this principle by setting the Cache-Control header to no-cache or no-store and omitting the ETag from the response header, which hinders any caching [1, 11].

4. Ignoring MIME Types: occurs when the server fails to represent resources in various formats. It may prevent clients from consuming the services flexibly and limits the reusability and accessibility of the resources [1, 11].

5. Ignoring Status Code: this practice negatively affects the semantic communication between clients and servers; it occurs when REST developers avoid using the defined set of application-level status codes [1, 11].


6. Misusing Cookies: an antipattern that concerns the security and privacy of the service; it occurs when a Set-cookie or a Cookie header contains keys or other tokens that are supposed to be sent by other, more standardized, means [1, 11].
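As a rough illustration of how header-based checks of this kind can be written, the following Python sketch tests two of the antipatterns listed above, Breaking Self-descriptiveness and Ignoring Caching. The function names, the shortened sets of standard headers, and the representation of headers as plain dictionaries are assumptions made for this example; they are not taken from the tool developed in this study.

```python
# Illustrative sketch only: the header sets are truncated and chosen for the example.
STD_REQUEST_HEADERS = {"accept", "authorization", "content-type", "host", "proxy-authorization"}
STD_RESPONSE_HEADERS = {"cache-control", "content-type", "etag", "last-modified", "location", "set-cookie"}


def breaks_self_descriptiveness(request_headers, response_headers):
    """Return True if any request or response header is outside the standard sets."""
    non_std_req = [h for h in request_headers if h.lower() not in STD_REQUEST_HEADERS]
    non_std_res = [h for h in response_headers if h.lower() not in STD_RESPONSE_HEADERS]
    return bool(non_std_req or non_std_res)


def ignores_caching(response_headers):
    """Return True if Cache-Control disables caching and the response carries no ETag."""
    headers = {name.lower(): value for name, value in response_headers.items()}
    directives = {d.strip() for d in headers.get("cache-control", "").lower().split(",")}
    caching_disabled = "no-cache" in directives or "no-store" in directives
    return caching_disabled and "etag" not in headers


# Hypothetical response used only to exercise the two checks.
response = {"Content-Type": "application/json", "Cache-Control": "no-store", "X-Custom-Trace": "abc"}
print(breaks_self_descriptiveness({"Host": "example.org"}, response))
print(ignores_caching(response))
```

Header names are compared case-insensitively because HTTP header field names are case-insensitive.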

As for the design patterns, the following three are used:

1. Content Negotiation: a design pattern that allows the service to support alternative resource representations depending on the metadata provided by the consumer. As a result, service consumption becomes more flexible and highly reusable. If not appropriately applied, this pattern turns into Ignoring MIME Types antipattern [1, 9].

2. Entity Linking: if applied, this pattern enables the runtime communication of entity relationships by providing links to the service consumers in response messages. If not appropriately applied, this pattern turns into Forgetting Hypermedia antipattern [1, 9].

3. Response Caching: a pattern that caches all response messages in the local client machine to avoid sending duplicate requests or responses. If not appropriately applied, this turns into Ignoring Caching antipattern [1, 9].
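To make the idea of a design pattern check more concrete, the sketch below shows one possible test for Content Negotiation: the media type served in the response is compared with the media types the client listed in its Accept header. The function names and the matching rule (exact match only, wildcards such as */* ignored) are assumptions for illustration, and the header dictionaries are assumed to use the canonical key spellings Accept and Content-Type. The actual detection algorithm used in the study may differ.

```python
def media_types(header_value):
    """Split an Accept or Content-Type value into bare, lower-cased media types."""
    types = []
    for part in header_value.split(","):
        media_type = part.split(";")[0].strip().lower()  # drop parameters such as q=0.8 or charset
        if media_type:
            types.append(media_type)
    return types


def content_negotiation_detected(request_headers, response_headers):
    """Hypothetical check: the served Content-Type is one of the explicitly accepted types."""
    accepted = media_types(request_headers.get("Accept", ""))
    served = media_types(response_headers.get("Content-Type", ""))
    return any(media_type in accepted for media_type in served)


# Hypothetical request/response pair.
request = {"Accept": "application/xml, application/json;q=0.9"}
response = {"Content-Type": "application/json; charset=UTF-8"}
print(content_negotiation_detected(request, response))
```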

1.1.3 REST Linguistics Patterns and Antipatterns

REST APIs depend heavily on a clear linguistic relationship between the services and their resources. However, bad practices, such as poor naming, may reflect negatively on the REST API design in terms of usability. Hence, these bad practices are categorized as antipatterns [6]. In this study, the following linguistic patterns, and their corresponding antipatterns, are used:

1. Contextualised vs. Contextless Resource Names: applying this pattern ensures that each URI is contextual. In other words, all nodes in a URI belong to a semantically related context. When not appropriately applied, its corresponding antipattern occurs and leads to decreased understandability of the service [6]. The following figure shows the pseudocode of the detection method of the Contextless Resource Names antipattern.

Contextless Resource Names

    procedure CONTEXTLESS-RESOURCE-NAMES(Request-URI)
        URINodes ← EXTRACT-URI-NODES(Request-URI)
        for each index = 1 to LENGTH(URINodes)
            Set1 ← CAPTURE-CONTEXT-BY-SYNSETS(URINodes[index])
            Set2 ← CAPTURE-CONTEXT-BY-SYNSETS(URINodes[index+1])
            if Set1 ∩ Set2 = ∅
                print "Contextless Resource Names detected"
            end if
        end for
    end procedure

Figure 1.2: Contextless Resource Names. Source: [1]


2. Hierarchical vs. Non-hierarchical Nodes: a pattern to ensure that each node in a URI is hierarchically related to its adjacent nodes. If it is not applied, the corresponding Non-hierarchical Nodes antipattern occurs, confusing users about the purpose of the API and thus hindering the understandability and usability of the API [1].

3. Tidy vs. Amorphous URIs: a URI is considered tidy when it has appropriate lower-case resource naming without any extensions, underscores, or trailing slashes. A URI that does not adhere to this pattern is considered an Amorphous URI; this antipattern may mislead users and decrease readability [1].

4. Verbless vs. CRUDy URIs: a verbless URI contains no verbs and relies on the HTTP methods, such as GET, POST, PUT, or DELETE, to express the action. A CRUDy URI, on the other hand, contains terms such as create, read, update, or delete; this can confuse the API clients and overload the HTTP methods. Moreover, introducing a CRUDy antipattern may trigger another REST design antipattern, Tunneling Through GET/POST [1].

5. Singularised vs. Pluralised Nodes: singular/plural nouns should be used consistently across the API. The last node of a PUT/DELETE request URI should be singular, whereas the last node of a POST request URI should be plural. If this pattern is not applied correctly, the Pluralised Nodes antipattern occurs, causing unexpected server responses [1].

6. Non-versioned URIs: As web services change, many alterations may occur and propagate across the service. API versioning is recommended to manage the complexity of these changes. URI versioning is one of the most common versioning methods in REST. A Non-versioned URI antipattern may lead to users’ confusion regarding the API version in use and, in worst scenarios, may break existing consumers [12].
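Several of the linguistic checks above are purely syntactic and can be approximated with a few string tests. The following Python sketch illustrates this for the Amorphous, CRUDy, and Non-versioned URI antipatterns. The helper names, the version-node convention (v1, v3, v2.1), and the word-splitting heuristic are assumptions made for this example; the semantic checks, such as Contextless Resource Names, require a lexical database and are not covered here.

```python
import re
from urllib.parse import urlparse

CRUDY_TERMS = ("create", "read", "update", "delete")  # terms named in the list above
VERSION_NODE = re.compile(r"^v\d+(\.\d+)*$")           # assumed convention: v1, v3, v2.1


def uri_nodes(uri):
    """Return the non-empty path segments (nodes) of a URI."""
    return [node for node in urlparse(uri).path.split("/") if node]


def is_amorphous(uri):
    """True if the path has upper-case letters, underscores, a trailing slash, or an extension."""
    path = urlparse(uri).path
    nodes = uri_nodes(uri)
    has_extension = bool(nodes) and "." in nodes[-1] and not VERSION_NODE.match(nodes[-1])
    return (
        path != path.lower()
        or "_" in path
        or (len(path) > 1 and path.endswith("/"))
        or has_extension
    )


def is_crudy(uri):
    """True if a node contains a CRUD term as a whole word (camelCase and separators are split)."""
    for node in uri_nodes(uri):
        words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+", node)
        if any(word.lower() in CRUDY_TERMS for word in words):
            return True
    return False


def is_versioned(uri):
    """True if some node looks like a version marker such as v1 or v3."""
    return any(VERSION_NODE.match(node.lower()) for node in uri_nodes(uri))


# Hypothetical URIs used only to exercise the checks.
for uri in ("https://example.org/v1/albums", "https://example.org/albums/createAlbum/"):
    print(uri, is_amorphous(uri), is_crudy(uri), is_versioned(uri))
```

Splitting nodes into words before matching avoids flagging a node such as readers merely because it begins with read, at the cost of missing verbs that are fused into a single lower-case word.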

1.2 Related Work

This section highlights the efforts in [7] and [6] regarding REST design patterns and antipatterns, as well as the detection of REST linguistic patterns and antipatterns.

In [7], the authors focus on REST design patterns and antipatterns. The paper proposes a heuristics-based approach, SODA-R (Service Oriented Detection for Antipatterns in REST), to detect poor solutions to design problems, i.e., antipatterns. The article defines eight REST antipatterns and five REST patterns and runs their detection on 12 well-known REST APIs. The results indicate that the approach tends to be accurate and that it detects poor design in REST APIs such as Twitter and Dropbox.

The authors in [6] present 12 linguistic patterns and antipatterns and propose the SARA approach (Semantic Analysis of RESTful APIs), which applies both semantic and syntactic analysis of REST APIs to detect the linguistic patterns and antipatterns. The detection is performed on 18 well-known RESTful APIs, including Facebook and Dropbox. The results suggest that most of the APIs exhibit both patterns and antipatterns, especially when it comes to APIs with poor documentation. Furthermore, the detection results confirm the accuracy of the SARA approach compared to the state-of-the-art DOLAR approach (Detection of Linguistic Antipatterns in REST) [6].


1.3 Problem Formulation

REST APIs have been adopted by many large software organizations for their web services [6]. Thus, ensuring that a REST API is flexible, reusable, and understandable is essential for API providers to attract more client developers in a competitive market [8]; this also means that design quality and linguistic quality are equally important factors for REST developers to consider. Previous research has examined the design and linguistic patterns and antipatterns in REST APIs and the importance of detecting them [7] [6].

This study, in contrast, aims to investigate the relationship between REST design quality and linguistic design quality in Google APIs. More specifically, the study aims to investigate whether the Google APIs that violate REST design patterns are also subject to poor linguistic quality, and vice versa.

1.4 Motivation

REST design has a set of best practices (patterns) that is always growing as more research is conducted on the topic; this set includes design and linguistic patterns that, if fully adhered to, ensure the RESTfulness of an API. However, if poor practices (antipatterns) are introduced in terms of design quality, what are the chances these antipatterns will propagate to affect the linguistic design quality or vice versa? The answers to such a question might be beneficial to REST developers as it gives them a valuable insight into the consequential effect of using antipatterns on the RESTfulness of the API.

1.5 Objectives

This section presents the objectives of this study and the research questions that shall be answered to achieve them.

O1 Investigating the relation between design quality vs. linguistic quality in Google APIs.

O2 Investigating the relation between design antipattern vs. linguistic antipattern in Google APIs.

O3 Investigating the relation between design pattern vs. linguistic pattern in Google APIs.

O4 Investigating the relation between design antipattern vs. linguistic pattern in Google APIs.

O5 Investigating the relation between design pattern vs. linguistic antipattern in Google APIs.

O6 Statistically studying the relationship between the results obtained in the previous steps, i.e., seeing if Google APIs with poor linguistic design quality also significantly violate REST design principles and vice versa.

Table 1.1: Objectives of the Study

To achieve the above objectives, the following research question and its sub-questions are proposed:

• RQ1: What is the relationship between design quality and linguistic quality in Google APIs?


RQ1 aims to investigate whether Google APIs that have design antipatterns (or patterns) are also prone to linguistic antipatterns (or patterns).

• RQ1.1: What is the relationship between design antipatterns and linguistic antipatterns in Google APIs?

RQ1.1 aims to investigate whether Google APIs that have design antipatterns are also prone to linguistic antipatterns.

• RQ1.2: What is the relationship between design antipatterns and linguistic patterns in Google APIs?

RQ1.2 aims to investigate whether Google APIs that have design antipatterns are also prone to linguistic patterns.

• RQ1.3: What is the relationship between design patterns and linguistic patterns in Google APIs?

RQ1.3 aims to investigate whether Google APIs that have design patterns are also prone to linguistic patterns.

• RQ1.4: What is the relationship between design patterns and linguistic antipatterns in Google APIs?

RQ1.4 aims to investigate whether Google APIs that have design patterns are also prone to linguistic antipatterns.

It is expected that the results show a relationship between design and linguistic quality in the analyzed Google APIs. However, the significance of the correlation might vary when investigating each pair of patterns/antipatterns for the research sub-questions.

1.6 Scope/Limitation

Due to the large number of Google APIs, this research project analyzes only eight of them. Table 1.2 shows the Google APIs chosen to be analyzed against design and linguistic patterns and antipatterns, together with the version and the number of instances (URIs tested) for each API.

REST APIs                Online Documentation               Version   URIs Tested
Google Photos APIs       developers.google.com/photos       v1        17
Google Drive APIs        developers.google.com/drive        v3        42
YouTube APIs             developers.google.com/youtube      v3        52
Google Classroom APIs    developers.google.com/classroom    v1        56
Gmail APIs               developers.google.com/gmail/api    v1        74
Google Calendar APIs     developers.google.com/calendar     v3        32
Google Sheets APIs       developers.google.com/sheets/api   v4        17
Google Blogger APIs      developers.google.com/blogger      v3        27

Table 1.2: Google APIs and their Versions and Number of Instances

Moreover, this project detects only the following design and linguistic antipatterns. Table 1.3 presents the six design and linguistic antipatterns that are checked against the Google APIs included in this study.

Design antipatterns              Linguistic antipatterns
Breaking Self-descriptiveness    Contextless Resource Names
Forgetting Hypermedia            Non-hierarchical Nodes
Ignoring Caching                 Tidy vs. Amorphous URIs
Ignoring MIME Types              CRUDy URIs
Ignoring Status Code             Singularised vs. Pluralised Nodes
Misusing Cookies                 Non-versioned URIs

Table 1.3: Design antipatterns and Linguistic antipatterns

Finally, the study detects three design patterns and six linguistic patterns, as shown in Table 1.4.

Design Patterns        Linguistic Patterns
Content Negotiation    Tidy URIs
Entity Linking         Verbless URI
Response Caching       Versioned URIs
-                      Contextualised Resource Names
-                      Hierarchical Nodes
-                      Singularised vs. Pluralised Nodes

Table 1.4: Design Patterns and Linguistic Patterns

1.7 Target Group

The study contributes a tool that is developed to detect a set of patterns and antipatterns in RESTful APIs; this tool might be of interest to academics aiming to perform a replication or do further research on design quality, linguistic quality, or both in the context of REST APIs. Practitioners can also use the tool to test the RESTfulness of their APIs at different stages of development. The findings of the study might reveal useful identifiers that could help developers predict any design or linguistic antipattern proneness.

1.8 Outline

The rest of the paper is structured as follows. Section 2 presents the methodology as well as the reliability and validity of the study. Section 3 explains the implementation of the tool developed to detect the design and linguistic patterns and antipatterns. Section 4 presents the results of the tool. Sections 5 and 6 analyze and discuss the obtained results, respectively. Finally, Section 7 concludes this paper and introduces a direction for future work.


2 Method

In this section, the methodology followed will be presented. The study is explanatory and done using both qualitative and quantitative data; it aims at answering the research questions listed in Section 1.5.

2.1 Data Collection and Processing

The research conducts a case study on a total of eight Google APIs (see Table 1.2). The APIs are chosen mainly because of their extensive usage, public availability, and the number of instances each API has, which suits the purpose of this study and its time limitation. Qualitative data are first collected manually from the chosen Google APIs. A tool is developed to detect the REST and linguistic design patterns and antipatterns. The tool also quantifies the qualitative data collected from the APIs; the data are then analyzed statistically to provide evidence of whether a correlation between design and linguistic quality exists. The remainder of this section describes the steps followed:

• Step 1: Extracting Google APIs: In this step, the required Google APIs are extracted from https://developers.google.com and stored in a JSON file. Additionally, the required parameters and data form of each URI are filled into the JSON file. This process is done manually due to the variation between the APIs and URIs.

• Step 2: Implementing the Detection Algorithms: This manual step involves developing a tool to detect patterns and antipatterns of both linguistic and REST design practices. The detection algorithms used in the tool are taken from previous studies [7] and [6]. The development of the tool also makes use of the SARA (Semantic Analysis of RESTful APIs) approach from [6] to implement some of the linguistic antipattern detection algorithms. For detecting the design antipatterns, the tool follows the same heuristics-based approach as [7]. Finally, the implementation includes a user interface.

• Step 3: Detecting Linguistic Patterns and Antipatterns: In this step, the tool automatically counts the linguistic antipatterns detected in the APIs and collects their types.

• Step 4: Detecting Design Antipatterns: Similar to step 3, the tool counts the design antipatterns detected in the same APIs as in step 3 and collects their types.

• Step 5: Analysing the Data: In this last step, the data obtained from steps 3 and 4 are analyzed. The analysis is done by conducting two types of tests, the Chi-Square test of independence¹ and the Phi Coefficient test². The Chi-Square test of independence determines whether there is a significant relationship between two nominal (categorical) variables [13, 14]. Therefore, this test is used on the two groups of patterns/antipatterns. The Phi Coefficient test, on the other hand, determines the strength of the relationship between two binary (dichotomous) variables [14, 15]. Therefore, it is used when assessing individual pairs of patterns/antipatterns in this study.
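The two tests in Step 5 can be illustrated on a single 2x2 contingency table. The sketch below computes the Pearson Chi-Square statistic with its p-value for one degree of freedom, and the Phi coefficient, using only the Python standard library. The counts are hypothetical and are not results from this study, and no continuity correction is applied, so the numbers may differ from those produced by a statistics package that applies one by default.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-square survival function for df = 1
    return statistic, p_value


def phi_coefficient(a, b, c, d):
    """Phi coefficient for two binary variables cross-tabulated as [[a, b], [c, d]]."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical counts, not results from this study:
# rows = design antipattern present / absent, columns = linguistic antipattern present / absent.
a, b, c, d = 30, 10, 20, 40
statistic, p_value = chi_square_2x2(a, b, c, d)
phi = phi_coefficient(a, b, c, d)
print(f"chi-square = {statistic:.3f}, p = {p_value:.5f}, phi = {phi:.3f}")
```

For a 2x2 table without continuity correction, the two quantities are linked by phi squared = chi-square / n. This is why a large sample can give a statistically significant Chi-Square result even when the Phi coefficient indicates only a weak relationship.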

1 https://rdocumentation.org/packages/stats/versions/3.6.2/topics/chisq.test

2 https://rdocumentation.org/packages/psych/versions/1.0-17/topics/phi

[Figure: flowchart of Steps 1 to 5 with a legend distinguishing manual and automatic steps. Elements shown: Google APIs; Extract APIs; Store URIs in JSON file; RESTful APIs; Detection algorithms implementation; User interface implementation; Detection Algorithms; Linguistic patterns and anti-patterns detection; Design patterns and anti-patterns detection; Detected patterns & anti-patterns; Statistical study of the result; Result.]

Figure 2.1: Research Methodology

Figure 2.1 highlights the steps this research study adopts to achieve the required objectives.

2.2 Reliability and Validity

This section discusses the reliability and validity of this study.

2.2.1 Reliability

The data from the Google API sites are collected manually in this study. Due to the variation in the HTML structure of the Google websites from which the needed APIs are fetched, no web scraping tool is used to scrape the data (URIs). Instead, the data are collected and stored in a JSON file manually. In other words, the initial qualitative data are extracted manually, which introduces the possibility of human error.

A couple of practices are adopted to reduce human error, such as reducing the amount of data entered by copying a base URI for each API. Moreover, presenting the URIs in the tool in the same way as they are presented on the corresponding Google websites helps to spot potential errors when validating the qualitative data. Furthermore, the aforementioned data are manually checked by both study authors.
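One way to picture the base-URI practice is a JSON layout in which each API records its base URI once and every endpoint stores only a relative path. The sketch below shows such a layout and a small loader that rejects obviously malformed entries. The field names, the example.com host, and the validation rules are hypothetical; the actual file format used by the tool is not specified here.

```python
import json

# Hypothetical input layout: one base URI per API, relative paths per endpoint.
SAMPLE = """
{
  "apis": [
    {
      "name": "Google Drive APIs",
      "version": "v3",
      "base_uri": "https://api.example.com/drive/v3",
      "endpoints": [
        {"method": "GET", "path": "/files"},
        {"method": "DELETE", "path": "/files/{fileId}"}
      ]
    }
  ]
}
"""

ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE"}


def load_requests(text):
    """Expand every endpoint into a (method, full URI) pair and reject malformed entries."""
    requests = []
    for api in json.loads(text)["apis"]:
        base = api["base_uri"].rstrip("/")
        for endpoint in api["endpoints"]:
            method = endpoint["method"].upper()
            if method not in ALLOWED_METHODS:
                raise ValueError(f"unknown HTTP method {method!r} in {api['name']}")
            if not endpoint["path"].startswith("/"):
                raise ValueError(f"path must start with '/': {endpoint['path']!r}")
            requests.append((method, base + endpoint["path"]))
    return requests


for method, uri in load_requests(SAMPLE):
    print(method, uri)
```

Because the base URI is typed once per API, a typing mistake in it affects every URI of that API in the same way, which tends to make it easier to notice during manual validation.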

Finally, one more aspect that might affect the reliability of the results is that every Google application examined in the study has multiple API versions. For example, the version of the Google Photos APIs analyzed in the study is version 1, whereas another API the study inspects, Google Drive, uses version 3 as of the date of this study. Therefore, the final results are based on the set of APIs listed in Table 1.2 and might differ for other versions of the Google APIs.

2.2.2 Validity

In this section, threats to validity are discussed according to the guidelines set by [16].

Construct validity ensures that the method followed matches what the study aims to measure. To minimize threats to Construct validity, all the rules (detection algorithms) are defined and identified according to existing literature and previous research on the design and linguistic patterns and antipatterns in RESTful APIs (see [7] and [6]).

When it comes to External validity, the scope of the study is limited to Google APIs. Consequently, the findings cannot be generalized to every existing REST API, but only to Google APIs. Furthermore, selecting only APIs that belong to Google introduces a selection bias, which adds another threat to the study. However, studying APIs outside Google is beyond the scope of this thesis.

The Internal validity threat present in this paper is the accuracy of the tool developed to detect violations of REST API design principles and poor linguistic design. To minimize this threat, well-known and trusted libraries, such as Stanford CoreNLP and WordNet, are used in the tool to detect REST and linguistic antipatterns. In addition, well-tested approaches are used for design and linguistic detection, namely the heuristics-based approach and the SARA approach, respectively (see [7] and [6]).

When it comes to Reliability validity, the study can be replicated because the method followed is described step by step. Furthermore, the tool developed and the raw data used to perform the statistical analysis are all available in this paper (see section 3).

When it comes to validating the tool's detection of design patterns/antipatterns, the majority of the detected URIs in the APIs have been validated using a feature provided by the Google websites. A user can make a request using a simple form provided for each URI³ and get a response for that request; the validation is done manually by comparing the output of the tool with the response fetched through the Google form. The linguistic patterns/antipatterns results, on the other hand, are validated manually by comparing the tool's findings with the actual nodes of each URI.

2.3 Ethical considerations

To construct a REST API or any API, a developer should have many considerations in mind. Users of a specific API mostly seek a reliable application. Therefore, the API must be trusted, reliable, and fulfill its purpose; this mostly requires a good, well-structured API design and good linguistic quality. Moreover, an API provided for public usage has to be free of ethical and legal issues. However, this study does not involve any ethical or legal considerations, as there are no human interactions related to the experiment. The aim of the research is solely to detect whether there is a relationship between design and linguistic quality in public Google APIs.

3 https://developers.google.com/photos/library/reference/rest/v1/albums/addEnrichment

3 Implementation

Firstly, the implemented tool checks for design patterns and antipatterns separately, using their algorithms listed in Appendix A, because not every absence of an antipattern automatically produces a pattern. For example, one of the conditions of the Response Caching pattern is that the status-code has to be 304. However, there is no such condition in its relevant pattern (Ignore caching); refer to 1.2 and 1.13 to see the detailed detection algorithms.

Besides, for a given URI, there can be cases where neither a pattern nor an antipattern is detected. For example, the Forgetting Hypermedia/Entity Linking antipattern/pattern algorithms check only the POST and GET methods, and do not check other methods such as DELETE, PUT, and PATCH. Refer to 1.1 and 1.12 to see their detection algorithms.

Therefore, some results show that there are undetected cases (left) in the figures of section 4.3.2.
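The three possible outcomes for a design pattern/antipattern pair can be illustrated with a minimal Python sketch. It is not the algorithm from Appendix A: the function name, the `has_links` flag, and the outcome labels are assumptions made for illustration, and only the restriction to GET and POST comes from the text above.

```python
from enum import Enum


class Outcome(Enum):
    PATTERN = "Entity Linking"
    ANTIPATTERN = "Forgetting Hypermedia"
    UNDETECTED = "left"


# Only these methods are inspected, as stated in the text; all others are skipped.
CHECKED_METHODS = {"GET", "POST"}


def classify_hypermedia(method: str, has_links: bool) -> Outcome:
    """Classify one request/response pair.

    `has_links` is a hypothetical flag saying whether the response exposes
    links to related resources; how the real tool derives it is described
    in Appendix A and is not reproduced here.
    """
    if method.upper() not in CHECKED_METHODS:
        return Outcome.UNDETECTED
    return Outcome.PATTERN if has_links else Outcome.ANTIPATTERN
```

A DELETE request therefore falls into neither category, which is how undetected instances can arise for this pair.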

On the other hand, when it comes to the detection of linguistic patterns/antipatterns, the study follows the logic that the lack of a linguistic pattern means the presence of a linguistic antipattern, because of the nature of the two. For example, the Amorphous URIs vs. Tidy URIs linguistic antipattern/pattern detection checks whether a URI node contains, e.g., an underscore (_). Therefore, it is possible to conclude that the lack of a Tidy URI pattern in this case automatically means the existence of an Amorphous URI antipattern. Refer to 1.6 to see the detection algorithm of the Amorphous URIs antipattern.
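The complementary nature of a linguistic pattern/antipattern pair can be sketched as follows. This is a deliberately simplified illustration that checks only the underscore condition mentioned above; the actual algorithm (1.6) may test further conditions, and the function names and example URIs are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit


def uri_nodes(uri: str) -> List[str]:
    """Split the path of a URI into its non-empty nodes."""
    return [node for node in urlsplit(uri).path.split("/") if node]


def classify_tidiness(uri: str) -> str:
    """Return 'Amorphous URI' if any node contains an underscore, else 'Tidy URI'."""
    if any("_" in node for node in uri_nodes(uri)):
        return "Amorphous URI"
    return "Tidy URI"
```

Because the function always returns one of the two labels, every URI is classified and no undetected case can occur for this pair.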

The rest of this section describes the development process of the tool that is used to detect REST and linguistic antipatterns in Google APIs; it also explains how the tool works and mentions the approaches, framework, and libraries utilized to implement all of the patterns (1.4) and antipatterns (1.3) detection algorithms. The tool can be accessed here⁴; use admin as the username and the password.

3.1 Functional Overview of the REST-Ling

REST-Ling is our proposed tool, a web application that aims to automate the detection of patterns and antipatterns in Google APIs. The following are some of the tool's features:

• Addition of APIs and URIs: REST-Ling allows the user to add one or more URIs manually, or even through uploading a JSON file. The JSON file can contain multiple APIs with multiple URIs.

• Selection of Patterns and Antipatterns: REST-Ling permits the user to select the desired patterns and antipatterns (both design and linguistic) to be analyzed. Moreover, the analysis is done asynchronously for all the patterns and antipatterns.

• Customizability: It allows the user to provide the headers and query parameters for both the whole set of the API and for each URI individually. It also permits the user to add a path parameter (URI parameter). The user can add them through the same JSON file where the APIs are attached or directly through the website.

• Generalizability: The detection mechanism is not restricted to Google APIs. The tool is capable of detecting the design and linguistic patterns and antipatterns of any API, as long as the headers and parameters are set correctly.

⁴ https://rest-ling.com


• Security: The tool has an authentication and authorization system, which lets users test their APIs. It also has an administrative user who has the privileges to create, read, update, and delete users.

• Detail-oriented: The tool provides answers to what, why, and where the design and linguistic antipatterns occur; this gives the user better insight into the API quality by showing what types of antipatterns an API has.

• Graphical Representation: The tool provides a graphical representation of the detection results of the patterns, antipatterns, and unchecked URIs. It provides charts like pie and bar charts.

• Support for Linguistic Analysis: For example, when it comes to detecting the Contextless Resource Names linguistic antipattern, the tool provides functionality to import and add acronyms and stop words, which are used to create the topic model used by the SARA approach.
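The asynchronous analysis mentioned in the feature list can be pictured with the following sketch. The tool itself is written for Node.js; Python's `asyncio` is used here only as a stand-in, and both detectors, their names, and the example URI are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Detector = Callable[[str], Awaitable[bool]]


async def detect_underscore(uri: str) -> bool:
    """Hypothetical detector: flags URIs containing an underscore."""
    await asyncio.sleep(0)  # stands in for real I/O such as an HTTP call
    return "_" in uri


async def detect_trailing_slash(uri: str) -> bool:
    """Hypothetical detector: flags URIs ending with a slash."""
    await asyncio.sleep(0)
    return uri.endswith("/")


async def analyze(uri: str, detectors: Dict[str, Detector]) -> Dict[str, bool]:
    """Run all selected detectors concurrently and collect their verdicts."""
    names = list(detectors)
    results = await asyncio.gather(*(detectors[name](uri) for name in names))
    return dict(zip(names, results))
```

Running the selected detectors concurrently means that one slow check, for example a request to a remote server, does not hold up the others.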

Figure 3.1: REST-Ling (the detection tool)

3.2 REST Antipatterns Detection

This section describes the main software libraries and frameworks that are used in the development of the tool to detect REST antipatterns.


3.2.1 Node.js

Node.js is a server-side JavaScript environment built on Chrome’s V8 JavaScript engine [17]. Node and V8 are implemented mostly in C and C++, since performance and resource consumption are essential. Node uses an asynchronous, event-driven I/O model to achieve concurrency, whereas most other modern environments rely on multi-threading. A Node process can be described as a single-threaded daemon that makes use of V8 [18]. Node.js is the chosen environment for the proposed tool due to its flexibility and the large number of packages available through the Node Package Manager.

3.2.2 Express.js

Express.js is a web framework for Node.js that is built on the core Node.js HTTP module. It provides solutions for problems that developers commonly face when implementing REST applications, such as parsing request bodies and cookies, managing sessions, organizing routes, and determining the right response headers based on data types [19].

Express.js is used as the back-end web framework for the proposed tool due to its popularity in the Node environment [20].

3.2.3 Node-fetch

In order to test Google APIs for REST design antipatterns, requests must be sent from the client side. The tool makes use of the Node-fetch package, which is available through the Node Package Manager. Node-fetch is a light-weight module that brings the JavaScript function window.fetch into Node [21].

3.3 Linguistic Antipatterns Detection

The developed tool includes a web service to detect linguistic antipatterns. The web application makes API calls to the web service to check whether a specific URI has a linguistic antipattern or not. This section describes how the web service is developed.

3.3.1 SARA Approach

The approach used in this study to detect linguistic antipatterns is the same as the one followed in previous work [6]. The Semantic Analysis of RESTful APIs (SARA) detection approach is able to perform improved semantic analysis of RESTful APIs [6]. Additionally, the English lexical database WordNet⁵ and Stanford’s CoreNLP⁶ are used.

3.3.2 Spring Boot

Spring Boot is a Java framework used to create microservices and stand-alone, production-ready Spring applications [22]. A Spring Boot application is chosen as the framework for the web service in which the detection of specific linguistic antipatterns is performed.

⁵ https://wordnet.princeton.edu

⁶ https://stanfordnlp.github.io/CoreNLP


4 Results

This section shows the results obtained by running the developed tool. It also shows the results of the Chi-squared test and the Phi Coefficient test.

4.1 Relationship between Design Quality and Linguistic Quality

The following table is a contingency table of the design patterns and antipatterns and the linguistic patterns and antipatterns detected in eight different Google APIs. The aim of extracting these data is to use them to answer RQ1: What is the relationship between design quality and linguistic quality in Google APIs?

To answer this question, the following null hypothesis shall be tested.

H₀₁: There is no statistically significant relationship between design quality and linguistic quality in Google APIs.

We perform the Chi-squared test⁷ to investigate RQ1 and, thus, to accept or reject the null hypothesis.

The column headers of the table represent the linguistic patterns and antipatterns, and the rows represent the design patterns and antipatterns.
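One way to read such a table is as a count of co-occurrences: each cell holds the number of URIs on which both the row label and the column label were detected. The sketch below builds a table in that way. Whether the tool computes its table exactly like this is an assumption, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Set, Tuple


def build_contingency(
    detections: Iterable[Tuple[Set[str], Set[str]]],
) -> Dict[Tuple[str, str], int]:
    """Count, for every (design, linguistic) label pair, the URIs on which both were detected.

    Each element of `detections` describes one URI as a pair
    (design labels detected, linguistic labels detected).
    """
    counts: Counter = Counter()
    for design_labels, linguistic_labels in detections:
        # Every design label co-occurs with every linguistic label on this URI.
        counts.update(product(design_labels, linguistic_labels))
    return dict(counts)
```

Pairs that never co-occur are simply absent from the result and correspond to the zero cells of the table.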

       AMO  CRD   NV  CRN  NHN  SPN  TDY  VBL   VS  CZRN   HN   SP
BSD    227    4    0   41    0   64   88  311  315   274  275   42
FH     129    1    0   29    0   53   49  177  178   149  154   35
IMT      0    0    0    0    0    0    0    0    0     0    0    0
IC       0    0    0    0    0    0    0    0    0     0    0    0
MC       0    0    0    0    0    0    0    0    0     0    0    0
ISC     19    0    0    2    0    1    4   23   23    21   23    0
EL      37    3    0    3    0   12   16   50   53    50   47    7
CN     228    4    0   41    0   65   89  313  317   276  277   42
RC       0    0    0    0    0    0    0    0    0     0    0    0

Design antipatterns (rows): BSD = Breaking Self-descriptiveness, FH = Forgetting Hypermedia, IMT = Ignoring MIME Types, IC = Ignoring Caching, MC = Misusing Cookies, ISC = Ignoring Status Code

Design patterns (rows): EL = Entity Linking, CN = Content Negotiation, RC = Response Caching

Linguistic antipatterns (columns): AMO = Amorphous URIs, CRD = CRUDy URIs, NV = Non-versioned URIs, CRN = Contextless Resource Names, NHN = Non-Hierarchical Nodes, SPN = Singularised vs. Pluralised Nodes

Linguistic patterns (columns): TDY = Tidy URIs, VBL = Verbless URI, VS = Versioned URIs, CZRN = Contextualised Resource Names, HN = Hierarchical Nodes, SP = Singularised vs. Pluralised Nodes

Table 4.1: Contingency Table of Design and Linguistic Patterns and Antipatterns

After constructing Table 4.1, a Chi-squared test is performed on it. The following table shows the result of the Chi-squared test of independence between design and linguistic patterns and antipatterns.

Test Type                                                                                 p-value
χ²(ContingencyTable(DesignPatternAndAntiPatterns, LinguisticPatternsAndAntiPatterns))     < 2.2e-16

Table 4.2: The χ² Test of Independence between Design and Linguistic Patterns and Antipatterns
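For readers unfamiliar with the test, the statistic behind a χ² test of independence can be sketched in a few lines. The study itself uses R's chisq.test (see footnote 7); the Python function below is only an illustration. Dropping all-zero rows and columns before computing the statistic is an assumption made here so that no expected count is zero, and the text does not say how such rows were handled.

```python
from typing import Sequence, Tuple


def chi_squared(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return (statistic, degrees of freedom) for a contingency table.

    Rows and columns whose total is zero are dropped first, because their
    expected counts would be zero and the statistic would be undefined.
    The table must contain at least one non-zero cell.
    """
    rows = [list(r) for r in table if sum(r) > 0]
    keep = [j for j in range(len(rows[0])) if sum(r[j] for r in rows) > 0]
    rows = [[r[j] for j in keep] for r in rows]

    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(keep) - 1)
    return statistic, dof
```

The p-value is then read from the χ² distribution with the returned degrees of freedom, which is the step a statistics package such as R performs.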

⁷ https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/chisq.test


4.2 Relationship between Each Pair of Pattern and Antipattern

This section presents the results of the Phi Coefficient test⁸ performed on the combinations of design and linguistic patterns and antipatterns.

4.2.1 Design Antipatterns vs. Linguistic Antipatterns

The following table shows the Phi Coefficient for each pair of design antipattern and linguistic antipattern. The aim of this table is to answer RQ1.1: What is the relationship between design antipatterns and linguistic antipatterns in Google APIs?

To answer this question the following null hypothesis shall be tested.

H_01.1: There is no statistically significant relationship between the design antipatterns and linguistic antipatterns in Google APIs.

⁸ https://www.rdocumentation.org/packages/psych/versions/1.0-17/topics/phi


Pairs of design antipatterns and linguistic antipatterns          Phi Coefficient
Breaking Self-descriptiveness vs. Amorphous URIs                   0.03887
Breaking Self-descriptiveness vs. CRUDy URIs                       0.00900
Breaking Self-descriptiveness vs. Non-versioned URIs               -
Breaking Self-descriptiveness vs. Contextless Resource Names       0.03071
Breaking Self-descriptiveness vs. Non-hierarchical Nodes           -
Breaking Self-descriptiveness vs. Singularised/Pluralised Nodes   -0.05821
Forgetting Hypermedia vs. Amorphous URIs                           0.01379
Forgetting Hypermedia vs. CRUDy URIs                              -0.07097
Forgetting Hypermedia vs. Non-versioned URIs                       -
Forgetting Hypermedia vs. Contextless Resource Names               0.11325
Forgetting Hypermedia vs. Non-hierarchical Nodes                   -
Forgetting Hypermedia vs. Singularised/Pluralised Nodes            0.25984
Ignoring MIME Types vs. Amorphous URIs                             -
Ignoring MIME Types vs. CRUDy URIs                                 -
Ignoring MIME Types vs. Non-versioned URIs                         -
Ignoring MIME Types vs. Contextless Resource Names                 -
Ignoring MIME Types vs. Non-hierarchical Nodes                     -
Ignoring MIME Types vs. Singularised/Pluralised Nodes              -
Ignoring Caching vs. Amorphous URIs                                -
Ignoring Caching vs. CRUDy URIs                                    -
Ignoring Caching vs. Non-versioned URIs                            -
Ignoring Caching vs. Contextless Resource Names                    -
Ignoring Caching vs. Non-hierarchical Nodes                        -
Ignoring Caching vs. Singularised/Pluralised Nodes                 -
Misusing Cookies vs. Amorphous URIs                                -
Misusing Cookies vs. CRUDy URIs                                    -
Misusing Cookies vs. Non-versioned URIs                            -
Misusing Cookies vs. Contextless Resource Names                    -
Misusing Cookies vs. Non-hierarchical Nodes                        -
Misusing Cookies vs. Singularised/Pluralised Nodes                 -
Ignoring Status Code vs. Amorphous URIs                            0.06650
Ignoring Status Code vs. CRUDy URIs                               -0.03161
Ignoring Status Code vs. Non-versioned URIs                        -
Ignoring Status Code vs. Contextless Resource Names               -0.03532
Ignoring Status Code vs. Non-hierarchical Nodes                    -
Ignoring Status Code vs. Singularised/Pluralised Nodes            -0.11193

Table 4.3: Relation between Design and Linguistic Antipatterns (Phi Coefficient)

Note that the "-" symbol in table 4.3 means Phi Coefficient is not applicable.
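For reference, the Phi coefficient of a 2x2 table can be computed as in the sketch below. The study uses the phi function of the R psych package (see footnote 8); this Python version is only illustrative, and its parameter names are assumptions. It also shows one plausible reason for a "-" entry, which the text does not state explicitly: when one of the two labels is detected on none (or all) of the URIs, a marginal total is zero and the coefficient is undefined.

```python
from math import sqrt
from typing import Optional


def phi_coefficient(n11: int, n10: int, n01: int, n00: int) -> Optional[float]:
    """Phi coefficient of a 2x2 table.

    n11: URIs where both labels were detected
    n10: only the first label, n01: only the second label
    n00: neither label
    Returns None when a marginal total is zero, i.e. when one of the two
    variables never varies and the coefficient is undefined.
    """
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denominator = sqrt(row1 * row0 * col1 * col0)
    if denominator == 0:
        return None
    return (n11 * n00 - n10 * n01) / denominator
```

The result lies between -1 and +1, with values near zero indicating little or no association between the two labels.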


4.2.2 Design Antipatterns vs. Linguistic Patterns

The following table shows the Phi Coefficient for each pair of design antipattern and linguistic pattern. It is meant to answer RQ1.2: What is the relationship between design antipatterns and linguistic patterns in Google APIs?

H_01.2: There is no statistically significant relationship between the design antipatterns and linguistic patterns.

Pairs of design antipatterns and linguistic patterns              Phi Coefficient
Breaking Self-descriptiveness vs. Tidy URI                        -0.03887
Breaking Self-descriptiveness vs. Verbless URI                     0.00900
Breaking Self-descriptiveness vs. Versioned URIs                   -
Breaking Self-descriptiveness vs. Contextualised Resource Names   -0.03071
Breaking Self-descriptiveness vs. Hierarchical Nodes              -0.03027
Breaking Self-descriptiveness vs. Singularised/Pluralised Nodes    0.03113
Forgetting Hypermedia vs. Tidy URI                                -0.01379
Forgetting Hypermedia vs. Verbless URI                             0.07097
Forgetting Hypermedia vs. Versioned URIs                           -
Forgetting Hypermedia vs. Contextualised Resource Names           -0.11325
Forgetting Hypermedia vs. Hierarchical Nodes                      -0.02947
Forgetting Hypermedia vs. Singularised/Pluralised Nodes            0.21408
Ignoring MIME Types vs. Tidy URI                                   -
Ignoring MIME Types vs. Verbless URI                               -
Ignoring MIME Types vs. Versioned URIs                             -
Ignoring MIME Types vs. Contextualised Resource Names              -
Ignoring MIME Types vs. Hierarchical Nodes                         -
Ignoring MIME Types vs. Singularised/Pluralised Nodes              -
Ignoring Caching vs. Tidy URI                                      -
Ignoring Caching vs. Verbless URI                                  -
Ignoring Caching vs. Versioned URIs                                -
Ignoring Caching vs. Contextualised Resource Names                 -
Ignoring Caching vs. Hierarchical Nodes                            -
Ignoring Caching vs. Singularised/Pluralised Nodes                 -
Misusing Cookies vs. Tidy URI                                      -
Misusing Cookies vs. Verbless URI                                  -
Misusing Cookies vs. Versioned URIs                                -
Misusing Cookies vs. Contextualised Resource Names                 -
Misusing Cookies vs. Hierarchical Nodes                            -
Misusing Cookies vs. Singularised/Pluralised Nodes                 -
Ignoring Status Code vs. Tidy URI                                 -0.06650
Ignoring Status Code vs. Verbless URI                              0.03161
Ignoring Status Code vs. Versioned URIs                            -
Ignoring Status Code vs. Contextualised Resource Names             0.03532
Ignoring Status Code vs. Hierarchical Nodes                        0.10628
Ignoring Status Code vs. Singularised/Pluralised Nodes            -0.10930

Table 4.4: Relation between Design Antipattern and Linguistic Pattern (Phi Coefficient)

Note that the "-" symbol in Table 4.4 means Phi Coefficient is not applicable.


4.2.3 Design Patterns vs. Linguistic Patterns

The following table shows the Phi Coefficient for each pair of design pattern and linguistic pattern. The purpose of this table is to answer RQ1.3: What is the relationship between design patterns and linguistic patterns in Google APIs?

H_01.3: There is no statistically significant relationship between the design patterns and linguistic patterns in Google APIs.

Pairs of design patterns and linguistic patterns        Phi Coefficient
Entity Linking vs. Tidy URI                              0.02106
Entity Linking vs. Verbless URI                         -0.17656
Entity Linking vs. Versioned URIs                        -
Entity Linking vs. Contextualised Resource Names         0.09711
Entity Linking vs. Hierarchical Nodes                    0.01750
Entity Linking vs. Singularised/Pluralised Nodes        -0.00055
Content Negotiation vs. Tidy URI                         -
Content Negotiation vs. Verbless URI                     -
Content Negotiation vs. Versioned URIs                   -
Content Negotiation vs. Contextualised Resource Names    -
Content Negotiation vs. Hierarchical Nodes               -
Content Negotiation vs. Singularised/Pluralised Nodes    -
Response Caching vs. Tidy URI                            -
Response Caching vs. Verbless URI                        -
Response Caching vs. Versioned URIs                      -
Response Caching vs. Contextualised Resource Names       -
Response Caching vs. Hierarchical Nodes                  -
Response Caching vs. Singularised/Pluralised Nodes       -

Table 4.5: Relation between Design Pattern and Linguistic Pattern (Phi Coefficient)

Note that the "-" symbol in Table 4.5 means Phi Coefficient is not applicable.

4.2.4 Design Patterns vs. Linguistic Antipatterns

The following table shows the Phi Coefficient for each pair of design pattern and linguistic antipattern. The purpose of this table is to answer RQ1.4: Is there a relation between design patterns and linguistic antipatterns in Google APIs?

H_01.4: There is no statistically significant relationship between the design patterns and linguistic antipatterns in Google APIs.


Pairs of design patterns and linguistic antipatterns    Phi Coefficient
Entity Linking vs. Amorphous URIs                       -0.02106
Entity Linking vs. CRUDy URIs                            0.17656
Entity Linking vs. Non-versioned URIs                    -
Entity Linking vs. Contextless Resource Names           -0.09711
Entity Linking vs. Non-Hierarchical Nodes                -
Entity Linking vs. Singularised/Pluralised Nodes         0.02371
Content Negotiation vs. Amorphous URIs                   -
Content Negotiation vs. CRUDy URIs                       -
Content Negotiation vs. Non-versioned URIs               -
Content Negotiation vs. Contextless Resource Names       -
Content Negotiation vs. Non-Hierarchical Nodes           -
Content Negotiation vs. Singularised/Pluralised Nodes    -
Response Caching vs. Amorphous URIs                      -
Response Caching vs. CRUDy URIs                          -
Response Caching vs. Non-versioned URIs                  -
Response Caching vs. Contextless Resource Names          -
Response Caching vs. Non-Hierarchical Nodes              -
Response Caching vs. Singularised/Pluralised Nodes       -

Table 4.6: Relation between Design Pattern and Linguistic Antipattern (Phi Coefficient)

Note that the "-" symbol in Table 4.6 means Phi Coefficient is not applicable.

4.3 Google APIs Detection of Individual Pattern and Antipattern

This section shows a graphical presentation of the results of the detection of patterns, antipatterns, and not-applicable instances in Google APIs.

4.3.1 Presentation of the Individual Patterns and Antipatterns of Google APIs

This section presents the individual design and linguistic patterns and antipatterns that are found in the detected Google APIs through a set of bar charts. The patterns are shown in green color, antipatterns in red color, and the not-applicable instances in light grey color.

Table 4.7 presents the meanings of the antipattern abbreviations used in the bar charts; it also shows the corresponding patterns.

Abbreviation  Antipattern                        Pattern
BSD           Breaking Self-descriptiveness      -
FH            Forgetting Hypermedia              Entity Linking
IMT           Ignoring MIME Types                Content Negotiation
IC            Ignoring Caching                   Response Caching
MC            Misusing Cookies                   -
ISC           Ignoring Status Code               -
AMO           Amorphous URIs                     Tidy URI
CRD           CRUDy URIs                         Verbless URI
NV            Non-versioned URIs                 Versioned URIs
CRN           Contextless Resource Names         Contextualised Resource Names
NHN           Non-Hierarchical Nodes             Hierarchical Nodes
SPN           Singularised vs. Pluralised Nodes  Singularised vs. Pluralised Nodes

Table 4.7: Antipatterns Abbreviations


The following figures present the detected patterns in green color, antipatterns in red color, and the skipped instances in Google APIs.

Figure 4.1: Google Photos APIs Detection of Individual Pattern and Antipattern

Figure 4.2: Google Drive APIs Detection of Individual Pattern and Antipattern

Figure 4.3: Google Classroom APIs Detection of Individual Pattern and Antipattern


Figure 4.4: Google Blogger APIs Detection of Individual Pattern and Antipattern

Figure 4.5: Google Calendar APIs Detection of Individual Pattern and Antipattern

Figure 4.6: Google Gmail APIs Detection of Individual Pattern and Antipattern


Figure 4.7: YouTube APIs Detection of Individual Pattern and Antipattern

Figure 4.8: Google Sheets APIs Detection of Individual Pattern and Antipattern

4.3.2 Presentation of the Total Patterns and Antipatterns of Google APIs

Through the following pie charts, this section presents the total design and linguistic patterns and antipatterns that are found in the detected Google APIs. The patterns are presented in green color, antipatterns in red color, and the not-applicable (left) instances in blue color.

Figure 4.9: Google Photos APIs Detection of Total Pattern and Antipattern


Figure 4.10: Google Drive APIs Detection of Total Pattern and Antipattern

Figure 4.11: Google Classroom APIs Detection of Total Pattern and Antipattern

Figure 4.12: Google Blogger APIs Detection of Total Pattern and Antipattern

Figure 4.13: Google Calendar APIs Detection of Total Pattern and Antipattern


Figure 4.14: Google Gmail APIs Detection of Total Pattern and Antipattern

Figure 4.15: YouTube APIs Detection of Total Pattern and Antipattern

Figure 4.16: Google Sheets APIs Detection of Total Pattern and Antipattern


5 Analysis

After performing the detection of design and linguistic patterns and antipatterns on eight different Google APIs, this chapter analyzes the results obtained in the previous section. It also presents a set of subsections that determine whether to accept or reject the null hypotheses listed in Section 1.5.

5.1 Design Quality and Linguistic Quality

In order to examine whether there is a relationship between design and linguistic quality, a Chi-squared test is performed on all four groups of design and linguistic patterns and antipatterns. The test yielded a statistically significant overall p-value (< 2.2e-16, see Table 4.2), which is well below 0.05. Therefore, the null hypothesis H₀₁ is rejected, and it is concluded that there is a likely relationship between design quality and linguistic quality in Google APIs.

5.2 Design Antipatterns and Linguistic Antipatterns

When studying the relationship between design and linguistic antipatterns, the Phi coefficient is used. All the results, except one, range from -0.19 to +0.19, which indicates no or a negligible relationship. The interpretation of the Phi coefficient results is based on [23]. The results fail to reject the null hypothesis H_01.1 due to insufficient statistical evidence. Therefore, it is concluded that there is no or negligible relation between design antipatterns and linguistic antipatterns.
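The reading of the coefficients can be checked with a short sketch that applies the ±0.19 bound quoted above to the values reported in Table 4.3. The constant and function names are assumptions for illustration, and the bound is taken from the text rather than from a full reproduction of the scale in [23].

```python
# Phi coefficients reported in Table 4.3 (pairs marked "-" are omitted).
TABLE_4_3 = [
    0.03887, 0.00900, 0.03071, -0.05821,   # Breaking Self-descriptiveness
    0.01379, -0.07097, 0.11325, 0.25984,   # Forgetting Hypermedia
    0.06650, -0.03161, -0.03532, -0.11193, # Ignoring Status Code
]

NEGLIGIBLE_LIMIT = 0.19  # bound quoted in the text


def is_negligible(phi: float, limit: float = NEGLIGIBLE_LIMIT) -> bool:
    """True when the coefficient falls inside the negligible range."""
    return abs(phi) <= limit


# Coefficients that fall outside the negligible range.
outliers = [phi for phi in TABLE_4_3 if not is_negligible(phi)]
```

Only the Forgetting Hypermedia vs. Singularised/Pluralised Nodes pair ends up in `outliers`, which matches the single exception mentioned above.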

5.3 Design Antipatterns and Linguistic Patterns

Similar to H_01.1, the Phi coefficient is used to test H_01.2. The results exhibit no or a negligible relationship, as they range from -0.19 to +0.19, except for one value of 0.21, which shows a weak positive relationship. The results also fail to reject the null hypothesis H_01.2 due to insufficient statistical evidence. Thus, it is concluded that there is no or negligible relation between design antipatterns and linguistic patterns.

5.4 Design Patterns and Linguistic Patterns

Like the two previous analyses, testing the relationship between design and linguistic patterns yielded Phi coefficients within the range from -0.19 to +0.19. Thus, there is no or a negligible relationship between the two measured sets of patterns. The results fail to reject the null hypothesis H_01.3 due to insufficient statistical evidence. Therefore, it is concluded that there is no or negligible relation between design and linguistic patterns.

5.5 Design Patterns and Linguistic Antipatterns

As expected at this point, when design patterns are compared with linguistic antipatterns using the Phi coefficient, the results exhibit no or a negligible relation between the two, with values ranging between -0.19 and +0.19. The results fail to reject the null hypothesis H_01.4 due to insufficient statistical evidence. Hence, the conclusion is that there is no or negligible relationship between design patterns and linguistic antipatterns.


RQ      Summary

RQ1     With the Chi-squared test performed over all groups, there is a relation between design quality and linguistic quality in Google APIs. The statistical evidence is significant.

RQ1.1   With the Phi-coefficient test performed for each pair of design antipattern and linguistic antipattern, there is no or negligible relation between design antipatterns and linguistic antipatterns.

RQ1.2   With the Phi-coefficient test performed for each pair of design antipattern and linguistic pattern, there is no or negligible relation between design antipatterns and linguistic patterns.

RQ1.3   With the Phi-coefficient test performed for each pair of design pattern and linguistic pattern, there is no or negligible relation between design and linguistic patterns.

RQ1.4   With the Phi-coefficient test performed for each pair of design pattern and linguistic antipattern, there is no or negligible relation between design patterns and linguistic antipatterns.

Table 5.1: Summary of the Findings

Table 5.1 presents a summary of the answers for the research questions.


6 Discussion

This section discusses the study results and the implementation of the tool; it then compares the findings of this paper with the findings of previous research.

The results show that most analyzed URIs are not CRUDy and are free of the Non-Versioned URIs, Contextless Resource Names, and Non-Hierarchical Nodes antipatterns. However, a considerable number of Amorphous URIs and Singularised vs. Pluralised Nodes antipatterns are detected. This shows that Google APIs are mostly well designed in terms of linguistic quality, with room for slight improvement before they could be called 100% free of linguistic antipatterns.

On the other hand, the detection of the REST design patterns/antipatterns shows that 100% of the URIs contain the Breaking Self-descriptiveness antipattern. This is mostly because Google uses some non-standard request or response headers. Moreover, the Forgetting Hypermedia and Ignoring Status Code antipatterns are present in the majority of the URIs. Also, the majority of design patterns are absent, except for Content Negotiation (refer to 4.3.1), because Google servers can present resources in various formats to the clients. This shows that, when it comes to REST design, Google APIs do not display the same high quality as in their linguistic design, as the APIs tend to have an almost equal share of patterns and antipatterns. This observation might be affected by the fact that fewer design patterns are analyzed than linguistic patterns. See the figures in Section 4 (Results) for a better overview.

Furthermore, a relation between some of the design antipatterns and linguistic patterns is visible in the bar charts listed in Section 4.3.1. For example, the Breaking Self-descriptiveness design antipattern is detected in all of the URIs, and so is the Versioned URIs pattern; this co-occurrence is present throughout all the Google APIs analyzed. This observation matches the finding of the Chi-squared test, which yields a statistically significant relationship between design and linguistic patterns/antipatterns.

The implementation of the tool is mainly divided into two parts: the design part and the linguistic part. The detection algorithms provided in [1] make the detection of the design patterns/antipatterns easy to implement. Furthermore, it is noteworthy that the Google API documentation and its organized presentation of the URIs make it easy to collect the qualitative data.

On the other hand, for the detection of linguistic antipatterns, the tool makes use of an older project that article [6] uses to detect linguistic antipatterns. We rebuilt this project and integrated it into our tool. Having this project allowed us to skip implementing many complicated steps of the linguistic antipattern detection.

Finally, a previous study (see 1.2) performed design pattern and antipattern detection on the same versions of the YouTube APIs that our study analyzes. In contrast to our paper, [7] detects only REST design patterns and antipatterns. That paper analyzes four more patterns/antipatterns than our study; however, the number of URIs our study analyzes is much larger than in [7]. Regarding the antipatterns common to both studies, both found that 100% of the URIs have the Breaking Self-descriptiveness antipattern, and both found zero instances of the Ignoring Caching, Ignoring Status Code, Ignoring MIME Types, and Misusing Cookies antipatterns. On the other hand, some differences are found for the Forgetting Hypermedia antipattern; it is hard to estimate the real difference here, since [7] found only two antipatterns in three URIs while our study found 37 antipatterns in 38 URIs. More differences are found between our study and [7] in all three of the common REST patterns. Our study found that 100% of the URIs have the Content Negotiation pattern, while [7] found no Content Negotiation pattern. In contrast, when it comes to the Entity Linking and Response Caching patterns, our study found no patterns, whereas [7] found patterns in all of the URIs.

Article [7] was published in 2014. Thus, the time difference between our study and [7] is six years. During this time, many changes may have occurred to the implementation of YouTube web servers due to the growth of the application. These changes might explain the few differences between the results of the detection in both studies, but this is an aspect that needs further investigation and deeper analysis.


7 Conclusion

Nowadays, REST is one of the most sought-after architectural choices for building web services. A well-designed REST API that is also easy to understand attracts more REST client developers. Therefore, linguistic and design quality are two equally essential factors every API provider should consider. This empirical study was performed to determine if there is a relationship between design and linguistic quality in RESTful APIs. Using a tool developed especially for this study, eight Google APIs were analyzed, and several design and linguistic patterns and antipatterns were extracted. Two types of statistical tests were performed on the data obtained in order to answer the research questions and sub-questions.

The results of the study are summarized as follows:

• A Chi-squared test performed in a group shows there is a statistically significant relationship between design and linguistic quality in Google APIs (RQ1).

• A Phi-coefficient test performed for each pair of design antipattern and linguistic antipattern shows there is no or negligible relation between design and linguistic antipatterns (RQ1.1).

• A Phi-coefficient test performed for each pair of design antipatterns and linguistic patterns shows there is no or negligible relation between design antipatterns and linguistic patterns (RQ1.2).

• A Phi-coefficient test performed for each pair of design patterns and linguistic patterns shows there is no or negligible relation between design and linguistic patterns (RQ1.3).

• A Phi-coefficient test performed for each pair of design patterns and linguistic antipatterns shows there is no or negligible relation between design patterns and linguistic antipatterns (RQ1.4).

7.1 Future work

Future work could involve replicating the study on other Google APIs and combining the datasets obtained; testing the hypotheses again on the larger dataset may provide results with better validity when performing the Phi coefficient test on each pair of patterns/antipatterns. Also, expanding the scope of the study to include more popular REST APIs beyond Google's may give better insight into the emphasis software organizations put on design and linguistic quality. Finally, the tool developed for this study can be improved to include the detection of more REST design and linguistic patterns/antipatterns.


References

[1] Sofa, service oriented framework for analysis. Available: http://sofa.uqam.ca [Ac- cessed 28 May 2020].

[2] T. Erl, P. Merson, and R. Stoffers, Service-oriented Architecture: Analysis and Design for Services and Microservices, ser. Prentice Hall service technology series from Thomas Erl. Prentice Hall, Service Tech Press, 2016. [Online]. Available:

https://books.google.se/books?id=yNmlnQAACAAJ

[3] R. T. Fielding and R. N. Taylor, “Architectural styles and the design of network- based software architectures,” Ph.D. dissertation, 2000, aAI9980887.

[4] L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice, 3rd ed.

Addison-Wesley Professional, 2012.

[5] S. Vinoski, “Serendipitous reuse,” IEEE Internet Computing, vol. 12, no. 1, pp. 84–

87, 2008.

[6] F. Palma, J. Dubois, N. Moha, and Y. Guéhéneuc, “Semantic analysis of restful apis for the detection of linguistic patterns and antipatterns,” International Journal of Cooperative Information Systems, vol. 26, no. 20, p. 37, May 2017. Available Online: https://www.worldscientific.com/doi/abs/10.1142/S0218843017420011.

[7] F. Palma, J. Dubois, N. Moha, and Y.-G. Guéhéneuc, “Detection of rest patterns and antipatterns: A heuristics-based approach,” pp. 230–244, 11 2014.

[8] M. Masse, REST API Design Rulebook, ser. Oreilly and Associate Series. O’Reilly Media, 2011. [Online]. Available: https://books.google.se/books?id=4lZcsRwXo6MC

[9] T. Erl, SOA Design Patterns, 1st ed. USA: Prentice Hall PTR, 2009.

[10] W. H. Brown, R. C. Malveau, H. W. McCormick, and T. J. Mowbray, AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis, 1st ed. USA: John Wiley & Sons, Inc., 1998.

[11] Restful design: Intro, patterns, anti-patterns. Available: https://www.infoq.com/articles/rest-anti-patterns [Accessed 22 May 2020].

[12] B. Varanasi and S. Belida, Spring REST. Apress, 2015. [Online]. Available: https://books.google.se/books?id=2GInCgAAQBAJ

[13] D. Moore, W. Notz, and M. Fligner, The Basic Practice of Statistics. W.H. Freeman and Company, 2013. [Online]. Available: https://books.google.se/books?id=aw61ygAACAAJ

[14] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, 4th ed. Chapman & Hall/CRC, 2007.

[15] M. Allen, The SAGE Encyclopedia of Communication Research Methods. SAGE Publications, 2017. [Online]. Available: https://books.google.se/books?id=4GFCDgAAQBAJ

[16] R. Yin, Case Study Research and Applications: Design and Methods. SAGE Publications, 2017. [Online]. Available: https://books.google.se/books?id=fHE3DwAAQBAJ

[17] Nodejs, official site. Available: https://nodejs.org/en [Accessed 22 May 2020].

[18] S. Tilkov and S. Vinoski, “Node.js: Using javascript to build high-performance network programs,” IEEE Internet Computing, vol. 14, no. 6, pp. 80–83, 2010.

[19] A. Mardan, Express.js Guide: The Comprehensive Book on Express.js. Createspace Independent Pub, 2014. [Online]. Available: https://books.google.se/books?id=5eGRAwAAQBAJ

[20] Express, node introduction. Available: https://developer.mozilla.org/en-US/docs/Learn/Server-side/Express_Nodejs/Introduction [Accessed 22 May 2020].

[21] Node-fetch. Available: https://www.npmjs.com/package/node-fetch [Accessed 22 May 2020].

[22] Spring boot. Available: https://spring.io/projects/spring-boot#overview [Accessed 22 May 2020].

[23] G. Yule, “On the methods of measuring association between two attributes,” Journal of the Royal Statistical Society, vol. 75, no. 6, p. 579, 1912.
