DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Evaluation of the t-wise Approach for Testing REST APIs

DIBA VOSTA

KTH ROYAL INSTITUTE OF TECHNOLOGY

Master in Computer Science
Date: October 21, 2020
Supervisor: Karl Meinke
Examiner: Joakim Gustafson
School of Electrical Engineering and Computer Science
Host company: TietoEVRY

Abstract

A combinatorial explosion can occur when all possible combinations of all input parameters of a system are tested. When the number of input parameters and their possible values increase, the number of tests needed to cover each new case increases exponentially. Combinatorial interaction testing (CIT) is a black-box testing technique used to avoid a combinatorial explosion. CIT finds errors that are triggered by the interactions between parameters. One of the so-called combination strategies that can be used for CIT is t-wise testing. T-wise testing requires at least one test case for each combination of any t parameter values, where t is the chosen strength - the number of parameters amongst which the interactions are tested. In this report, CIT with t-wise testing is applied to the testing of REST APIs.


Sammanfattning

A combinatorial explosion can occur when all possible combinations of the input parameters of a system are tested. The number of test cases needed to cover all combinations of input parameters grows exponentially as the number of parameters and their possible values increases. To avoid a combinatorial explosion, combinatorial interaction testing, a type of black-box testing, is used. The purpose of combinatorial interaction testing is to find the faults that arise due to interactions between parameters. Combinatorial interaction testing has a number of so-called combination strategies, and the combination strategy used in this report is t-wise testing. The requirement of t-wise testing is to create at least one test case for each combination of t parameter values, where t is the strength measured in the number of parameters whose mutual interactions are tested. This report evaluates the effects of t-wise testing as a testing method for REST APIs.


Acknowledgments

I want to thank the host company TietoEVRY for giving me the opportunity to carry out this thesis work. A special thanks to Magnus Bjuvensjö, Anna Brink and the rest of the CSP team for helping me throughout this project and making sure I had everything that I needed.

I would like to thank my KTH supervisor, Karl Meinke, for agreeing to supervise this thesis and for providing valuable insights. I also want to thank Joakim Gustafson for examining it.

I would also like to thank my fellow classmates that I have gotten to know during my KTH journey with a special shout-out to my lab partner throughout the years, Veronica Hage.


Abbreviations

CIT    Combinatorial interaction testing
API    Application programming interface
REST   Representational state transfer
ECP    Equivalence class partitioning
BVA    Boundary value analysis
SUT    System under test
CPH    Competent programmer hypothesis
IPM    Input parameter model
IPOG   In-parameter-order-general
ADA    Add annotation
ADAT   Add attribute
RMA    Remove annotation
RMAT   Remove attribute
CHODR  Change order
RPA    Replace annotation
RPAT   Replace attribute
RPAV   Replace attribute value
SWTG   Switch target
ACTS   Automated combinatorial testing for software


Contents

1 Introduction
    1.1 Project Description
    1.2 Objective
        1.2.1 Research Question
    1.3 Methodology
    1.4 Limitations
    1.5 Contribution
    1.6 Thesis Outline

2 Background
    2.1 Software Testing Foundations
    2.2 White-box Testing
    2.3 Black-box Testing
        2.3.1 Equivalence Class Partitioning
        2.3.2 Boundary Value Analysis
    2.4 Combinatorial Interaction Testing
        2.4.1 Modeling Phase
        2.4.2 Sampling Phase
        2.4.3 T-wise Testing
        2.4.4 Coverage Criteria
    2.5 In-Parameter-Order-General
        2.5.1 IPOG Testing Strategy
    2.6 Mutation Testing
        2.6.1 Fundamental Hypotheses
        2.6.2 Mutation Operators
        2.6.3 Mutation Testing in Practice
        2.6.4 Mutation Score
        2.6.5 Mutation Operators for Code Annotations
    2.7 REST APIs
        2.7.1 Definition of REST
        2.7.2 Testing of REST APIs

3 Related Work

4 Method
    4.1 Constructing an IPM
    4.2 Applying the Combination Strategy
    4.3 Generating Test Cases
    4.4 Mutating the Source Code
    4.5 Running the Tests
    4.6 Evaluation of Performance
    4.7 Evaluation of Fault Detection Abilities

5 Results
    5.1 Combinations Generated by ACTS
    5.2 Performance
    5.3 Fault Detection Abilities

6 Discussion
    6.1 Input Validation Method in Code
    6.2 Mutation Operators
        6.2.1 The Fundamental Hypotheses of Mutation Testing
        6.2.2 Mutants Used
    6.3 Effects of the IPM
    6.4 Testing Scope
    6.5 Results
        6.5.1 Fault Detection Abilities
        6.5.2 Performance
    6.6 Choice of Method
    6.7 Sustainability and Ethics

7 Conclusions
    7.1 Future Work

Bibliography

Chapter 1

Introduction

Today, people's lives are partly dictated by software. Societies are strongly interlinked with software of various kinds - vehicles, kitchen appliances, garage doors and the Web to mention a few. These software systems may not have much in common in terms of usage or functionality, but one way for them to communicate is through Application Programming Interfaces (APIs) [1]. When one system wants to communicate with another, an API request is sent to the receiving system and a response is sent back. There are various types of APIs that can be used; for example, when communicating through the Web, HTTP requests and responses are sent between the systems. Through Web APIs, a client program can send API requests to communicate with other web services and make use of their data and functions [2].

The most common architectural style used when designing Web APIs is REST, which was developed in the late '90s to improve the Web's implementation and hence continue to expand it [1][2]. With people becoming more dependent on using various types of software, the need for properly testing REST APIs has increased. This is done through software testing, which is a process of making certain that a piece of software does what it is intended to do, but also that it refrains from doing anything unexpected [3].

As REST APIs communicate through sending HTTP requests, the relevant parameters to test when testing REST APIs are the HTTP request parameters that specify the request. This could be the request body of a POST request or the URL query of a GET request. Thus, the input parameters of the API request are the relevant parameters to test. One widely studied technique of testing input parameter values is to apply combinatorial interaction testing (CIT). CIT is a means of software testing which focuses on exploiting errors that are triggered by the interaction between input parameters. This project examines the use of CIT to test REST APIs and aims to evaluate the resulting test suites both in terms of fault detection abilities and run-time performance.

1.1 Project Description

The digital services and software company TietoEVRY was interested in exploring how they could further automate their testing process when it came to testing their REST APIs. They were intrigued to find out whether it was possible to generate runnable test cases derived from their OpenAPI specifications of the APIs and how sufficient these would be. An OpenAPI specification is a JSON file which contains information about the endpoints of an API. This information includes the request type, the possible parameters, parameter values, constraints as well as the types and contents of responses. This area was of interest to them as well-written automated tests are a requirement for continuous delivery. Apart from that, being able to generate test cases directly from an OpenAPI specification could potentially reduce the resources needed to manually write test cases.

1.2 Objective

The objective of this project is to evaluate how the t-wise approach performs when applied to the testing of REST APIs by comparing test suites of different strengths. In 1-wise testing, each value of each parameter must be covered by at least one test case. On the other hand, 2-wise and 3-wise interaction testing concern the interaction between 2 and 3 parameters respectively.

1.2.1 Research Question

• How do 1-wise, 2-wise and 3-wise testing of REST APIs compare regarding the fault-detection abilities of the test suites?
• How efficient are the different t-wise combinations in terms of the run-time of the test suites?

1.3 Methodology

To answer the research questions, a tool will be built to generate test suites for 1-wise, 2-wise and 3-wise CIT. An existing CIT tool will be utilized to generate the combinations based on input parameters derived from an OpenAPI specification. To evaluate the fault detection abilities of the different t-wise test suites, mutation testing will be used. Thus, simple faults will be injected into the source code for the test suites to discover. These faults are injected into the code that validates input parameters, which is done through code annotations in the affected APIs. To estimate the efficiency of the different types of combination testing, the run-time of each test suite will be taken.

1.4 Limitations

The test cases generated in this work will only test the input parameters of the affected API endpoints and will not evaluate any potential integration with other APIs. The specific algorithm behind the test generation tool will not be brought up in detail as it is merely used as a way to extract the structure of the input parameters from the OpenAPI specification and generate combinatorial test suites.

1.5 Contribution

This work contributes insight into how 1-wise, 2-wise and 3-wise combinatorial testing applies to testing REST APIs in particular. It also contributes by experimenting with a relatively new way of performing mutation testing on code annotations.

1.6 Thesis Outline

Chapter 2 presents the background of the testing techniques used. Chapter 3 covers related work. Chapter 4 describes the method, chapter 5 presents the results, chapter 6 discusses them and chapter 7 concludes the thesis.

Chapter 2

Background

2.1 Software Testing Foundations

The universal truth when it comes to software testing is that it cannot express the absence of failures, but can only show their presence [4]. Thus, no testing technique can be used to say that a program is failure free. Even if a program has accompanying tests, there is no assurance that these will detect faults.

There are three definitions of importance when discussing software testing that need to be understood.

• Software fault: A fault is a static mistake made in the source code [4]

• Software error: An error is the expression of the fault in terms of a flawed internal state of the system [4]

• Software failure: A failure is the external expression of the fault which manifests an incorrect output with respect to the expected output [4]

Software testing can be divided into five separate levels of testing - unit testing, module testing, integration testing, system testing and acceptance testing [4]. Each level of testing is intended to be run against its associated development level. This connection between the development activities and testing levels can be visualized in figure 2.1, which is denoted as the "V model" [4].


Figure 2.1: V model. Illustrates development activities and corresponding testing levels

• Unit tests test individual units produced during the implementation stage
• Module tests test individual modules of the program
• Integration tests evaluate whether the interfaces between modules behave as expected - that they are consistent and communicate correctly
• System tests test the system as a whole under the assumption that the individual parts of the program are working. It focuses on locating issues related to the design or specification of the program
• Acceptance tests are written to assure that the user requirements are met


2.2 White-box Testing

White-box testing indicates that the developer has access to the source code [5]. Hence, the tests can be derived directly from the code and focus mainly on the control and data flow of a system. One approach of white-box testing is to create a graph representation of the program and create tests that cover the paths in the graph [6][7]. Another approach is called branch testing, in which the true and false options of all control statements are tested [7]. White-box testing is mainly implemented at the levels of unit, integration and system testing [7].

2.3 Black-box Testing

Black-box testing entails that the tester has no knowledge of the internals of the system under test (SUT) [7]. Instead, the tester has access to the system's architecture from which test cases are derived. A well-specified architecture presents possible input values and their corresponding output values. This data should suffice to construct tests in accordance with functional testing, which validates tests against the functional requirements/specifications. Three black-box testing techniques brought up in this report are equivalence class partitioning, boundary value analysis and CIT. These are explained under sections 2.3.1, 2.3.2 and 2.4 respectively.

2.3.1 Equivalence Class Partitioning

Equivalence class partitioning (ECP) divides the input space into equivalence classes, where all values in a class are assumed to be treated in the same way by the SUT. It is then sufficient to test one representative value from each class, which reduces the number of test cases needed while still exercising each distinct behaviour.

2.3.2 Boundary Value Analysis

Boundary value analysis (BVA) can be seen as a continuation of ECP since the first step of BVA is to partition the input space into equivalence classes [10]. BVA then focuses on the boundaries between the created partitions. Test cases that are designed to follow BVA typically consist of three input values for each partition boundary [10]. For each identified boundary, values both on and next to it are taken. The values that are next to the boundary are only one incremental distance from it, which is the minimal distance from the boundary for any defined data type. To illustrate this, figure 2.2 shows what the boundary values of the partitions from the example program in algorithm 1 would be.

Algorithm 1 Program that assigns integers

    procedure categorizeInput(inputArray)      ▷ inputArray: array of integers
        for input in inputArray do
            if input > 0 and input ≤ 5 then
                "Valid"
            else
                "Invalid"

Figure 2.2: The boundary values needed to be tested for the program in algorithm 1
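To make the boundaries concrete, the following minimal Java sketch (added here, not part of the original thesis) re-implements algorithm 1 for a single value and exercises the values on and next to the boundaries of the valid partition (0, 5]:

    public class BoundaryValueExample {

        // Re-implementation of algorithm 1 for a single input value.
        static String categorize(int input) {
            return (input > 0 && input <= 5) ? "Valid" : "Invalid";
        }

        public static void main(String[] args) {
            int[] boundaryValues = {0, 1, 5, 6};   // on and next to each boundary
            String[] expected = {"Invalid", "Valid", "Valid", "Invalid"};
            for (int i = 0; i < boundaryValues.length; i++) {
                System.out.printf("%d -> %s (expected %s)%n",
                        boundaryValues[i], categorize(boundaryValues[i]), expected[i]);
            }
        }
    }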

An example of faults targeted by BVA is those that may arise when using mathematical expressions such as ≤, < and >.

Both ECP and BVA examine the valid and invalid input space.

2.4 Combinatorial Interaction Testing

When designing test cases for an SUT that is expressed as a black-box, the input space is derived from the n input parameters with their own corresponding d parameter values [11]. If one wants to test all combinations of these values, it would result in d^n combinations. As the number of input parameters and possible parameter values increase, the resulting number of combinations becomes infeasible for testing and leads to a so-called combinatorial explosion [12]. To avoid a combinatorial explosion, combinatorial interaction testing (CIT) can be used. CIT samples a part of the input space, thus only testing a representative instance of the SUT [13]. CIT defines this representative instance by finding errors that are triggered by the interactions between input parameters [14].

CIT can typically be broken down into four phases [13]. The first phase is modeling, which covers what aspects of the system to include in the model. The next phase, the sampling phase, is about deciding an algorithm of how the system should be tested using the information existing in the model. This is when the combinations of inputs to be tested are generated. Phase three concerns the testing of the generated combinations and phase four is about analyzing the test results. Phases one and two are further investigated in sections 2.4.1 and 2.4.2.

2.4.1 Modeling Phase

The model created in this phase is referred to as an input parameter model (IPM) since CIT focuses on input parameters. Following are some approaches of how to construct an IPM of an SUT.

• Choosing parameters: In this case, the parameters of interest are the input parameters. In other cases, it could be configuration parameters, user inputs, GUIs etc. [14][15].
• Values of parameters: Parameter values should be chosen with great care as the behaviour of the SUT is determined by them [14]. ECP or BVA can be applied to parameters that take continuous values as input.
• Existing interactions: Parameter interactions may be derived through examining the system documentation [14].
• Existing constraints: Constraints can be identified through examining the system documentation [14]. These could be that a parameter must have certain values in order for the system to run or that the value of one parameter may dictate other parameter values.

2.4.2 Sampling Phase

The created input space is sampled to design a collection of input combinations to be tested [13]. This collection is represented through a combinatorial covering array. Each row of the covering array represents input values for a test case where each column represents an input parameter [14]. The structure of the covering array coincides with the chosen sampling algorithm, also known as the combination strategy [13][12]. The combination strategy chosen in this report is t-wise testing, presented in the section below.

2.4.3 T-wise Testing

T-wise testing requires at least one test case for each combination of values of any t parameters, where t is the chosen strength. The number of test cases needed for t-wise testing grows proportionally to v^t log(n) where n is the number of parameters with v possible values [16]. The example below demonstrates how a covering array is constructed by t-wise testing.

Example Case

The example program used consists of 4 input parameters that each can take on 3 different values - 0, 1 or 2 as defined in table 2.1.

    p1   p2   p3   p4
    0    0    0    0
    1    1    1    1
    2    2    2    2

Table 2.1: Input parameters with corresponding values for the test example

Exhaustive testing of these input parameters would result in 3^4 = 81 different test cases. However, when using t-wise testing with the strength t=2 (2-wise testing), table 2.2 could be one instance of the resulting covering array.

    p1   p2   p3   p4
    0    0    0    0
    0    1    1    1
    0    2    2    2
    1    0    1    2
    1    1    2    0
    1    2    0    1
    2    0    2    1
    2    1    0    2
    2    2    1    0

Table 2.2: Resulting covering array of 2-wise testing
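The 2-wise coverage of table 2.2 can be checked mechanically. The following Java sketch (added here for illustration, not part of the thesis) enumerates the value pairs that the covering array contains; since the array covers all 6 × 9 = 54 parameter-value pairs, it should report full coverage:

    import java.util.HashSet;
    import java.util.Set;

    public class PairwiseCoverageCheck {

        public static void main(String[] args) {
            int[][] array = {
                {0, 0, 0, 0}, {0, 1, 1, 1}, {0, 2, 2, 2},
                {1, 0, 1, 2}, {1, 1, 2, 0}, {1, 2, 0, 1},
                {2, 0, 2, 1}, {2, 1, 0, 2}, {2, 2, 1, 0}
            };

            // Collect every (parameter pair, value pair) that appears in some row.
            Set<String> covered = new HashSet<>();
            for (int[] row : array)
                for (int i = 0; i < 4; i++)
                    for (int j = i + 1; j < 4; j++)
                        covered.add(i + "," + j + ":" + row[i] + "," + row[j]);

            // 6 parameter pairs * 3 * 3 value combinations = 54 pairs to cover.
            System.out.println("Covered pairs: " + covered.size() + " of 54");
        }
    }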


2.4.4 Coverage Criteria

Each combination strategy comes with a coverage criterion which states the rule of how to select combinations of values [12]. These criteria are applied through a strategy - a procedure that actually selects the relevant combinations.

1-wise Testing

When implementing 1-wise testing, the criterion is that each value of each parameter should be tested at least once [12]. Two strategies of 1-wise testing are:

• Each Choice: Test cases are extracted by continuously testing untested variables until all values are present in at least one test case [12]. If some parameters have more values than others, the last test cases may have repeating values of those parameters with fewer values.
• Base Choice: First off, a base test case is determined by a criterion, for example values that are most appealing to end-users [12]. Henceforth, new test cases are derived by changing one value of one parameter at a time, keeping the other parameter values unchanged.
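As an illustration of the base choice strategy, the following Java sketch (added here as an assumption, not the thesis tool) derives 1-wise test cases from a base test case by changing one parameter value at a time:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class BaseChoice {

        public static List<String[]> generate(String[] base, String[][] values) {
            List<String[]> tests = new ArrayList<>();
            tests.add(base.clone());                    // the base test case itself
            for (int p = 0; p < values.length; p++) {   // vary one parameter at a time
                for (String v : values[p]) {
                    if (v.equals(base[p])) continue;    // base value is already covered
                    String[] t = base.clone();
                    t[p] = v;
                    tests.add(t);
                }
            }
            return tests;
        }

        public static void main(String[] args) {
            String[][] values = {{"0", "1", "2"}, {"0", "1", "2"}};
            for (String[] t : generate(new String[]{"0", "0"}, values))
                System.out.println(Arrays.toString(t));
        }
    }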

2-wise & 3-wise Testing

There are several existing strategies that can be applied for multi-way testing techniques. The strategy that was used in this project was In-Parameter-Order-General (IPOG), presented in section 2.5. This strategy was chosen because of its low order of complexity, which contributes to it performing better in terms of time and space [17]. IPOG was also chosen because it is a deterministic algorithm [11].

2.5 In-Parameter-Order-General

IPOG is a generalization of the In-Parameter-Order (IPO) strategy. IPO can only be applied to 2-wise testing whereas IPOG presents a more general approach.

The IPO strategy starts out by building a test set using pairwise testing for two parameters [11]. Once this has been fulfilled, it adds a third parameter and extends the test cases to include the third parameter. It continues this way until all parameters are included in the test cases. IPO is a deterministic strategy, meaning that it always produces the same test cases given the same input.

2.5.1 IPOG Testing Strategy

To create a t-wise test suite, IPOG starts out by creating a t-wise test suite for the first t parameters [11]. It then continues, in a similar manner as IPO, to include the first t+1 parameters and so on until all parameters are included in the t-wise test suite. When extending the test suite, IPOG computes a coverage set π which consists of all combinations needed to cover the parameter that is being added. IPOG applies the following actions to include the combinations in π:

• Horizontal Growth: Extends each test case by adding a value of the (t+1)th parameter. These are added in a greedy manner such that the value which results in covering the most combinations for each test case is added.
• Vertical Growth: Adds a new test case if needed based on the result of the horizontal growth.

Figure 2.3: Illustration of the IPOG algorithm. (a) The 3-wise combination of the first 3 parameters. (b) The horizontal growth when adding parameter 4. (c) The vertical growth when adding parameter 4.

In part (a) of figure 2.3, the 3-wise test suite is represented for the first three parameters. In part (b), the horizontal growth is implemented and P4 is added to the test cases. When examining this figure, it is evident that adding P4's value 0 to test case 4 in (a) results in more combinations of parameters than if P4's value had been 1. To add P4 as 0, the three combinations (P1.0, P2.1, P4.0), (P1.0, P3.1, P4.0) and (P2.1, P3.1, P4.0) of the computed π are covered. In part (c), the vertical growth is performed: to cover the remaining combinations, new test cases are added. The parameters that are not included in the combinations in π will be denoted with a "-" in the new test cases. An example of this is to add the combination (P1.1, P2.0, P4.0) as it is not covered in (b). This test case would be added with the value P3.-. However, the combination (P2.0, P3.1, P4.0) is also not included after part (b). Hence, the first added test case in (c) - the ninth test case - will take the shape of (P1.1, P2.0, P3.1, P4.0) and include two combinations from π in one test case.
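The following simplified Java sketch (an illustration under assumptions, not the ACTS implementation) shows the greedy horizontal growth step for t = 2, assuming all parameters take the same number of values; the vertical growth step is omitted:

    import java.util.HashSet;
    import java.util.Set;

    public class HorizontalGrowth {

        // Extends each existing test with the value of the new parameter
        // that covers the most still-uncovered pairs (greedy choice).
        public static int[][] extend(int[][] tests, int numValues) {
            int oldWidth = tests[0].length;
            Set<String> uncovered = new HashSet<>();
            for (int p = 0; p < oldWidth; p++)          // all (old value, new value) pairs
                for (int a = 0; a < numValues; a++)
                    for (int b = 0; b < numValues; b++)
                        uncovered.add(p + ":" + a + "," + b);

            int[][] extended = new int[tests.length][oldWidth + 1];
            for (int i = 0; i < tests.length; i++) {
                int bestValue = 0, bestGain = -1;
                for (int v = 0; v < numValues; v++) {   // greedy choice of the new value
                    int gain = 0;
                    for (int p = 0; p < oldWidth; p++)
                        if (uncovered.contains(p + ":" + tests[i][p] + "," + v)) gain++;
                    if (gain > bestGain) { bestGain = gain; bestValue = v; }
                }
                for (int p = 0; p < oldWidth; p++) {    // mark newly covered pairs
                    uncovered.remove(p + ":" + tests[i][p] + "," + bestValue);
                    extended[i][p] = tests[i][p];
                }
                extended[i][oldWidth] = bestValue;
            }
            // Pairs left in 'uncovered' would be handled by vertical growth
            // (adding new test cases), which is left out of this sketch.
            return extended;
        }

        public static void main(String[] args) {
            int[][] tests = {{0,0},{0,1},{0,2},{1,0},{1,1},{1,2},{2,0},{2,1},{2,2}};
            for (int[] t : extend(tests, 3))
                System.out.println(java.util.Arrays.toString(t));
        }
    }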

2.6 Mutation Testing

If a program passes all its tests, it does not necessarily mean that the program is fault-free. The only conclusion one can draw is that the test data may not be detailed enough to find all the faults - meaning that the test set is not adequate [18]. This is where mutation testing comes in as a tool to help analyze the test sets and their adequacy.

Mutation testing is a fault-based testing technique that is used to recognize test input data that is capable of detecting faults [19]. It does this by purposely injecting faults into the source code for the tests to find. Since the potential fault-space of an SUT easily can become excessive, mutation testing focuses on certain areas of faults. These areas of faults contain faults that are very close to a correct instance of the program. The faults are determined based on two fundamental hypotheses within software testing - the competent programmer hypothesis (CPH) and the coupling effect.

2.6.1 Fundamental Hypotheses

Competent Programmer Hypothesis

The competent programmer hypothesis states that programmers are competent and tend to write programs that are close to being correct [19]. Thus, the faults that a competent programmer introduces are assumed to be small deviations from the correct program. To portray faults that are made by competent programmers, the types of faults used in mutation testing are simple syntactical faults [19]. An example of what such simple faults may look like can be seen in figure 2.4.

Figure 2.4: Example of what simple faults in accordance with the CPH may look like [21]

Coupling Effect

The coupling effect states that a test data set which discovers simple faults is sensitive enough to implicitly discover complex faults as well [4][18][20]. Thus, complex faults are coupled to simple faults. When applied to mutation testing, the mutation coupling effect hypothesis as stated by Offutt becomes:

    Complex mutants are coupled to simple mutants in such a way that a test data set that detects all simple mutants in a program will detect a large percentage of the complex mutants [20].

In this hypothesis, a simple mutant (also known as a first-order mutant) is defined as a mutant that only consists of one simple syntactical change to the program [19][20][22]. Thereby, a complex mutant (higher-order mutant) is a mutant that consists of more than one change to the program. However, not all complex faults in a program can be portrayed through complex mutants, making complex mutants a subset of all complex faults [20].

In Offutt's experiments, test sets that killed all first-order mutants also killed almost all second-order mutants [20]. This was also true for 3-order mutants. These results show that mutation testing can focus on first-order mutants and ignore higher-order mutants. Nevertheless, since higher-order mutants only make up a subset of the complex faults, the results do not show that first-order mutants will find all complex faults. However, the important practical aspect of the result is that when testing software through focusing on a small restricted class of simple faults, more complicated faults are expected to be found.

2.6.2 Mutation Operators

When discussing mutation testing, a source code file containing an injected fault is referred to as a mutant. To determine the adequacy of a test set, the tests are run against the mutants. There are different types of so-called mutation operators that can be used when implementing mutation testing [20]. A mutation operator is designed to find a certain type of fault by describing a potential syntactical change that would be made. The mutation operators can explicitly require to meet branch and statement coverage or other factors such as covering extreme values. An example of a mutant is one that changes a relational operator such as < to ≤ [23].
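As a minimal illustration of such a relational operator mutant (an invented example, not from the thesis):

    public class RelationalMutantExample {

        // Original condition.
        static boolean isWithinLimit(int size, int limit) {
            return size < limit;
        }

        // Mutant: the relational operator < is replaced by <=.
        // A test with size == limit kills this mutant, since the two
        // versions then return different results.
        static boolean isWithinLimitMutant(int size, int limit) {
            return size <= limit;
        }

        public static void main(String[] args) {
            System.out.println(isWithinLimit(5, 5) + " vs " + isWithinLimitMutant(5, 5));
        }
    }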

2.6.3 Mutation Testing in Practice

During mutation testing, the first thing that happens is that mutants are created [18]. These mutants are derived from a set of chosen mutation operators. Then, the test set is tested for its adequacy by being run against the mutants. The possible outcome of this is twofold:

1. The test set results in a different outcome when run against the mutant compared to the original program

2. The test set results in the same outcome for both the mutant and original program

In the first case, the mutant is detected by the test set and is said to be killed. In the second case, the mutant may be semantically equal to the original program, meaning that it is impossible for the test case to detect it. A mutant that coincides with the second case is referred to as an equivalent mutant. An equivalent mutant is one which is syntactically different from the original program, though semantically identical to it [23]. An example of this is when the line x = y + y has a mutant x = y * 2. These programs output the same result regardless of the value of y, meaning that the mutant cannot be killed. Equivalent mutants are taken into consideration when evaluating the adequacy of the test set [20].
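A minimal Java sketch of the equivalent mutant above (illustrative only; run with assertions enabled):

    public class EquivalentMutantExample {

        static int original(int y) { return y + y; }

        // Equivalent mutant: syntactically different but semantically
        // identical, so no test case can ever kill it.
        static int mutant(int y) { return y * 2; }

        public static void main(String[] args) {
            for (int y = -3; y <= 3; y++)
                assert original(y) == mutant(y);   // holds for every input
            System.out.println("No input distinguishes the mutant.");
        }
    }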

2.6.4 Mutation Score

A mutation score is calculated to get an estimation of the adequacy of a test set [24]. This score is the percentage of non-equivalent mutants that were successfully killed by the test set. It is calculated using the following formula:

    MS(P, T) = D / (M − E) × 100

where:

    P = program
    T = test suite
    D = number of killed mutants
    M = number of generated mutants
    E = number of equivalent mutants
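As a worked example with hypothetical counts: if a test suite kills D = 90 of M = 100 generated mutants and E = 5 of the mutants are equivalent, the mutation score becomes MS = 90 / (100 − 5) × 100 ≈ 94.7%.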

A mutation score of 100% stems from a relatively adequate or mutation adequate test set [20][24]. Thus, all non-equivalent mutants need to be killed for a test set to be adequate. Such a high precision is needed because the goal of mutation testing is to kill all mutants [24]. A mutation score of 100% gives an indication that the test data will grant a strong test set for the original program.

2.6.5 Mutation Operators for Code Annotations

Some programs validate their input through code annotations rather than through explicit validation logic. Examples of these code annotations are @NotNull, @Size(min=x, max=y) and @Pattern(regexp = "[0-9]"). Hence, traditional mutation operators are not applicable to these validations as they are more applicable to arithmetical and relational operators to mention a few. To address this issue, Pinheiro et al. developed nine mutation operators for code annotations that can be seen below [25].

• Add annotation, ADA: Adds a new annotation to a valid target
• Add attribute, ADAT: Adds a valid attribute to an existing annotation
• Remove annotation, RMA: Removes an annotation from a target
• Remove attribute, RMAT: Removes an attribute from an existing annotation
• Change order, CHODR: Changes the order of existing annotations of a target
• Replace annotation, RPA: Replaces one valid annotation by another
• Replace attribute, RPAT: Replaces an annotation attribute by another
• Replace attribute value, RPAV: Replaces one annotation attribute value by another
• Switch target, SWTG: Switches the location of an annotation from one target to another
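To make the operators concrete, the sketch below shows a hypothetical annotated request class (the class and field names are invented here; the javax.validation dependency is assumed) together with, as comments, mutants that some of the operators would produce:

    import javax.validation.constraints.NotNull;
    import javax.validation.constraints.Pattern;
    import javax.validation.constraints.Size;

    public class AuditLogRequest {

        // Original declarations:
        @NotNull
        @Size(min = 2, max = 10)
        private String userId;

        @Pattern(regexp = "[0-9]{4}")
        private String eventCode;

        // Examples of mutants the operators would produce:
        //   RPAV: @Size(min = 2, max = 10)  ->  @Size(min = 2, max = 9)
        //   RMA:  the @NotNull annotation is removed, so null is accepted
        //   RPA:  @Pattern(regexp = "[0-9]{4}")  ->  @Size(max = 4)
    }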

Two years after the original report, Pinheiro et al. developed a mutation engine based on their findings [26]. To use the mutation engine successfully for each individual project, the user needs to construct a configuration file where relevant information is given so as to decrease the risks of generating a massive amount of equivalent mutants. This information could include annotations to add or replace current annotations by. Both reports that were published evaluated the mutation operators against a set of faults known to be caused by code annotations. The results of the first study showed that 95% of the 100 examined faults were discovered using the developed operators [25]. The second study showed that the majority of the faults were discovered by the operators ADA, RMA and RPAV whilst RPAT was not able to simulate any of the faults. The remaining mutation operators ADAT, CHODR, RMAT, RPA and SWTG only simulated a total of 12% of the 200 faults [26]. Furthermore, five of the 100 faults in the first study and ten of the 200 faults in the second study were not simulated through the operators [25][26]. This was because those faults needed higher-order mutation operators.

2.7 REST APIs

2.7.1 Definition of REST

Roy Fielding introduced and coined REST as an architectural style for Web APIs in the year 2000 [27]. Fielding developed this architectural style after having realized that the Web at that time had a scalability problem due to a set of key constraints [2].


2.7.2 Testing of REST APIs

Chapter 3

Related Work

In [33] it is stated that 2-wise testing successfully identifies between 50 and 97 percent of existing faults in a program based on previous empirical studies. In the article, t-wise testing was evaluated for four different applications with t varying from 1 to 6. The results can be viewed in figure 3.1.

Figure 3.1: Cumulative error detection of applications using 1-wise to 6-wise testing

These numbers indicate that many faults were identified using lower-strength t-wise testing and that fewer new faults were identified when progressively going toward 6-wise testing [33]. The Web server application that was investigated showed that 40% of the failures were caught by 1-wise testing. The following 30% were caught with 2-wise testing and a total of 90% of the faults were discovered using 3-wise and lower strength testing. Upon discussing the same investigation in another article [34], the authors state that the results for the NASA distributed database follow the same pattern - 67% of the faults were identified using 1-wise testing, 93% were identified using 2-wise testing and 98% were identified using 3-wise testing. While the results of the Web server application and NASA distributed database are not considered conclusive, they are used to show that CIT methods are able to achieve high levels of thoroughness when testing.

In a more recently published article, it is noted that individual programs cannot be considered conclusive because their individual percentages of failure detection for different strengths of t-wise testing are substantially different [35]. Upon comparing six programs concerning areas such as medical devices, browsers and TCP/IP, it showed that the fault detection rate of 2-wise testing can vary between 47% and 97%. The same numbers for 1-wise testing vary between 9% and 66% whilst the corresponding percentages for 3-wise testing vary between 75% and 99%.

Chapter 4

Method

In order to investigate how the different interaction levels of CIT compare when testing REST APIs, three endpoints were examined. These three endpoints were part of microservices called security auditlog, customer information and security signing. Two of these, security auditlog and customer information, each had ten input parameters whilst the security signing endpoint had five.

The six steps defined below were applied to each endpoint. The steps are explained more thoroughly in their corresponding section.

1. Constructing an IPM
2. Applying the combination strategy
3. Generating test cases
4. Mutating the source code of the endpoint
5. Running the tests on the mutated code
6. Evaluating the results

4.1 Constructing an IPM

The data for the IPM was retrieved through parsing the OpenAPI specification of each endpoint. The OpenAPI specification included information such as the type of request, the input parameters with corresponding constraints and examples, as well as possible valid and invalid response messages and codes. An example of what such an OpenAPI specification may look like can be seen in appendix A. The OpenAPI specifications that TietoEVRY provided contained input parameters that in most cases had to coincide with a certain pattern or be included in a list of possible values. Apart from this, all parameters were also stated to be either required or not and would in some cases have minimum or maximum length constraints connected to them. These details were taken into consideration when parsing the OpenAPI specification. The resulting input parameters included information about the details below:

• Type: Data type of the input parameter
• Example: Example value of the parameter
• Default value: A relevant default value of the parameter
• Required: Boolean stating if the parameter is required or not
• Pattern: A regular expression pattern that the parameter had to match
• MinLength/MaxLength: Defined a size interval of the input parameter. If not present, this information was retrieved from length constraints of a regular expression pattern if one was supplied
• MinNum/MaxNum: Defined the minimum and maximum values an input parameter could have. This information was retrieved from a regular expression pattern if one was supplied
• Enum array: An array of valid input values retrieved from an existing pattern in the form of "value1|value2|value3"

The information retrieved from the OpenAPI specification allowed for a straightforward implementation of BVA to generate both valid and invalid values of the input parameters. Apart from the values generated through BVA, the default value of each parameter was also added to the valid values so as to implement ECP.
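The following Java sketch (an assumption about the approach, not the actual thesis parser) illustrates how valid and invalid boundary values could be derived from a parameter's minLength/maxLength constraints:

    import java.util.ArrayList;
    import java.util.List;

    public class LengthBoundaryValues {

        static String ofLength(int n) {
            return "a".repeat(n);   // String.repeat requires Java 11+
        }

        public static List<String> validValues(int minLength, int maxLength) {
            List<String> values = new ArrayList<>();
            values.add(ofLength(minLength));   // shortest valid string
            values.add(ofLength(maxLength));   // longest valid string
            return values;
        }

        public static List<String> invalidValues(int minLength, int maxLength) {
            List<String> values = new ArrayList<>();
            if (minLength > 0) values.add(ofLength(minLength - 1));   // just too short
            values.add(ofLength(maxLength + 1));                      // just too long
            return values;
        }

        public static void main(String[] args) {
            System.out.println(validValues(2, 5) + " / " + invalidValues(2, 5));
        }
    }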

Some of the OpenAPI specifications also defined dependencies among parameters in terms of inter-parameter constraints. These were also parsed and included in the model. When it came to the three endpoints used in this examination, only the security signing endpoint included a constraint.

4.2 Applying the Combination Strategy

ACTS stands for Advanced Combinatorial Testing System and is a tool which generates t-wise combinatorial test cases [37]. This tool follows the IPOG strategy for multi-way testing and has implemented base-choice testing for 1-wise testing [17].

After having defined the IPM, ACTS was used to create covering arrays for each endpoint through calling its API [37]. Each covering array was created through its corresponding input parameters and their valid and invalid values. This was possible as ACTS allows for negative testing. The constraints that had been defined in the previous step were also implemented at this time.

The output of the program - all possible 1-wise, 2-wise and 3-wise interactions - was printed out and saved to two files per degree of interaction testing. One of the files indicated the actual values that the parameters should take on (covering array) whilst the other one was used to aid in recognizing which parameter had a negative value in the negative tests so as to determine the relevant error that should be displayed. These output files were obtained through ACTS.

4.3 Generating Test Cases

Runnable test cases were then generated from the covering arrays produced by ACTS together with the parameter information stored in the IPM.

REST Assured was used for testing the REST API as this allowed the tests to be written with a given-when-then pattern. In order to generate numerous test cases of the same kind but with different inputs, the JUnit DataProvider method was utilized. DataProvider made it possible to define numerous inputs to the same test function and would run one test case at a time.
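As an illustration, a generated test case could look like the sketch below (the endpoint path, parameter names and expected status codes are hypothetical; REST Assured and the TNG junit-dataprovider library are assumed):

    import static io.restassured.RestAssured.given;

    import com.tngtech.java.junit.dataprovider.DataProvider;
    import com.tngtech.java.junit.dataprovider.DataProviderRunner;
    import com.tngtech.java.junit.dataprovider.UseDataProvider;
    import org.junit.Test;
    import org.junit.runner.RunWith;

    @RunWith(DataProviderRunner.class)
    public class AuditLogApiTest {

        // One entry per covering-array row; the last column is the
        // expected HTTP status (400 for rows containing a negative value).
        @DataProvider
        public static Object[][] coveringArrayRows() {
            return new Object[][] {
                {"LOGIN", "2020-01-01", 200},   // valid combination
                {"LOGIN", "not-a-date", 400},   // negative test: invalid date
            };
        }

        @Test
        @UseDataProvider("coveringArrayRows")
        public void queryParametersAreValidated(String event, String date, int expectedStatus) {
            given()
                .queryParam("event", event)
                .queryParam("date", date)
            .when()
                .get("/auditlog")
            .then()
                .statusCode(expectedStatus);
        }
    }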

4.4 Mutating the Source Code

The source files that validated the input parameter values did this through code annotations. To accommodate this, a mutation engine was built which was roughly based on the mutation operators and engine developed by Pinheiro et al. [26]. The mutation operators that were used in the mutation engine were ADA, RPA, RPAV, RMAT, RMA and CHODR. To generate the mutants, the engine based the operators on a configuration file that had to be constructed for each endpoint, similar to the configuration file that Pinheiro et al. required for their mutation engine. The configuration file included endpoint-specific information so as to make the resulting mutants relevant and non-excessive. The thought process behind how the configuration file was constructed for each type of mutation operator can be seen below. In the endpoints that were examined, all code annotations except for a few belonged to the javax.validation.constraints package. Those that did not belong to this package were a part of the parent package or were developed specifically for the project. Hence, the javax.validation.constraints package was the one that the new or altered code annotations were derived from. The most common code annotations that were used throughout the endpoints were @Size, @Pattern and @NotNull.

• ADA: If a field did not have a @NotNull annotation, the @NotNull and @Null annotations were added. If a field only had a @NotNull annotation, either @Size or @Pattern was added with relevant min/max or regular expression parameters derived from an existing example value.

• RPA: Replaced @Pattern annotations with @Size annotations and vice versa. These replaced annotations had min/max and regular expression values that tested the boundaries, i.e. @Pattern(regexp="[0-9]{2}") would be replaced by @Size(max=1) and @Size(max=2) alternatively.
• RPAV: Altered the parameter values of an existing code annotation and tested the boundaries in a similar way to RPA. For example, @Pattern(regexp="A|B|C") would give the mutants @Pattern(regexp="A|B"), @Pattern(regexp="[A-Z]{1}") and @Pattern(regexp="[A-Z]{2}").
• RMAT: Removed any parameters of code annotations that were not required and would potentially influence the program. For example, the min/max parameters would alternatively be removed from a @Size annotation.
• RMA: Did not need any configuration. It would remove one code annotation from the original code per mutant until there were mutants where each original annotation had been removed.
• CHODR: Did not need any configuration. If a field had more than one code annotation, it created one mutant per permutation of the code annotations, i.e. @NotNull, @Size(min=1) would create the mutant @Size(min=1), @NotNull.

When designing the qualities of RPA and RPAV, the more traditional relational operator mutants were used in the sense that boundaries were tested. This was not something that was evident from the way Pinheiro et al. designed their mutation operators [25]. This design choice was made so as to guarantee that the mutants changed the behaviour of the source code even though it may not conform to the traditional conventions of mutation testing with the CPH and coupling effect.

4.5 Running the Tests

The test suites were run manually against one mutant at a time. The results that were taken into account were how many tests had potentially failed, and if none had failed, the total run-time for all test cases was noted. Once this had been done for one mutant, the next mutant was selected and the process was repeated.

4.6 Evaluation of Performance

In order to calculate the performance of running a particular t-wise test suite, the JUnit Rules class Stopwatch was used. The stopwatch records the run-time of each test case whether it succeeded, failed or was skipped. The combined total time for running all test cases of a t-wise test suite was presented once all test cases had been run. This time was noted for each mutant which was not killed - meaning all cases when the tests ran successfully. The reason for only timing those instances was to get a better understanding of the real-world practicality of the performance of 1-wise, 2-wise and 3-wise testing.
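A sketch of how such timing could be set up with the JUnit 4 Stopwatch rule is shown below (the accumulation and reporting details are assumptions, not the thesis code):

    import java.util.concurrent.TimeUnit;
    import org.junit.AfterClass;
    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.rules.Stopwatch;
    import org.junit.runner.Description;

    public class TimedSuite {

        private static long totalNanos = 0;

        @Rule
        public final Stopwatch stopwatch = new Stopwatch() {
            @Override
            protected void finished(long nanos, Description description) {
                totalNanos += nanos;   // called for every test, pass or fail
            }
        };

        @Test
        public void sampleGeneratedTest() {
            // generated REST Assured test cases would go here
        }

        @AfterClass
        public static void reportTotalRunTime() {
            System.out.printf("Total run-time: %d ms%n",
                    TimeUnit.NANOSECONDS.toMillis(totalNanos));
        }
    }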

4.7 Evaluation of Fault Detection Abilities

The fault detection abilities were evaluated by comparing which of the generated mutants each t-wise test suite managed to kill.

Chapter 5

Results

The results chapter consists of three parts: the resulting combinations generated by ACTS, the fault detection abilities and the run-time performance of each t-wise test suite.

5.1 Combinations Generated by ACTS

As mentioned in the methods chapter, two of the three endpoints had ten input parameters. As can be seen in table 5.1, the security auditlog and customer information endpoints had varying amounts of possible input values, which makes 5.8 and 3.7 values per input parameter on average, respectively. The third endpoint, security signing, had five input parameters with four of them having four possible values and the last one having three, as can be seen in table 5.1. In all three endpoints, the number of possible input values also included negative test values.

                            Input parameter
    Endpoint                1   2   3   4   5   6   7   8   9   10
    Security Auditlog       7   2   7   4   7   7   7   2   7   8
    Customer Information    2   8   2   2   4   8   2   2   2   5
    Security Signing        4   4   4   3   4

Table 5.1: Total amount of possible values per input parameter for each endpoint


The resulting number of t-wise interactions for each endpoint can be seen in the logarithmic scale presented in figure 5.1.

Figure 5.1: Total number of interactions generated per t-wise combination and endpoint

Using the t-wise interactions that were produced by ACTS, the number of test cases generated for each t-wise test suite and endpoint can be seen in figure 5.2 in a logarithmic scale.

Figure 5.2: Number of test cases generated per t-wise test suite and endpoint

The amount of test cases that were generated through CIT is considered reasonable based on the amount of input parameters and possible values that were present in the three endpoints.

5.2 Performance

When timing an action, the local variables and configurations of the machine have an impact on the results. Thus, figure 5.3 illustrates a relational representation of the time taken to run the test suites. The figure displays the run-time of 1-wise and 2-wise testing in relation to the run-time of 3-wise testing for each endpoint, shown in percentage. To get a more accurate relational representation, the results in figure 5.3 were based on the average run-time for each interaction type of each endpoint.

Figure 5.3: Relational representation of the average run-time per interaction type and endpoint

    Endpoint                1-wise   2-wise   3-wise
    Security Auditlog       11.4     21.4     49.9
    Customer Information    11.8     20.0     49.4
    Security Signing        9.6      10.8     11.8

Table 5.2: Average run-time in seconds of each t-wise test suite for each endpoint

5.3 Fault Detection Abilities

Running the mutants on all test suites for each endpoint showed that if one of the t-wise test suites managed to kill a mutant, all did. Thus, for this particular application of CIT of REST APIs using code annotations for validation, the fault detection abilities of all t-wise combination types were the same.


Chapter 6

Discussion

6.1 Input Validation Method in Code

The API endpoints under examination all used code annotations for validating input data. This in itself meant that it was not possible to execute mutation testing in the traditional sense that abides by the CPH and the coupling effect. Despite the fact that the mutation operators were based on those developed by Pinheiro et al. [25][26], mutation operators for code annotations are a very new topic which in fact was introduced with the work of Pinheiro et al. This means that not a lot of research has been done in the area of code annotation mutation operators, which in turn could have affected the results in this study.

6.2 Mutation Operators

When evaluating a test suite for its effectiveness in terms of fault detection abilities from a mutation testing standpoint, the mutants in question have an effect on the outcome. This stems from the fact that the mutation score is calculated based on the number of mutants that were killed, which in turn depends on the design of the mutation operators. In this case, the design of the mutation operators was applied directly to the study. This was because there was no established mutation engine that was applicable to code annotations and could be configured to the APIs under examination. This in itself contributed to the risk of designing biased mutation operators, which could be an explanation of the results.

If mutation operators are biased to the test cases in question, it is more probable that the mutants will be killed, resulting in skewed outcomes.

6.2.1 The Fundamental Hypotheses of Mutation Testing

When designing mutation operators, the CPH and the coupling effect should be kept in mind to create mutants that can simulate simple faults. However, since the code that validated the input parameters did this through code annotations, the hypotheses could not be followed in a traditional way.

For example, when applying the CPH to relational operators, it is clear that a potential mutant could change a ≤ to <. This is not as evident when it comes to introducing simple faults in code annotations. That is why Pinheiro et al. defined their own two categories for types of faults regarding code annotations [25][26]. Furthermore, as described in section 4.4, the mutation operators used in this study did not completely follow the design of those developed by Pinheiro et al. The mutation operators in this study were designed in a way to comply with traditional mutation operators, but applied to code annotations. This can be seen through the examples presented under section 4.4, where the attribute values or the attributes of the code annotations themselves were altered. These values were changed in a way to satisfy mostly relational mutation operators applied directly to the values. For example, changing @Pattern(regexp="A|B|C") to @Pattern(regexp="[A-Z]{2}") could be compared to altering input.size() == 1 to input.size() > 1 using a relational mutation operator.

It is however unclear whether these mutants represent the simple faults that a competent programmer would introduce into a program as stated in the CPH. With the CPH not being implemented in a traditional sense for the remaining mutation operators, it follows that the coupling effect may not have been utilized in its intended way. This is both because the faults that were injected may not have been simple faults complying with the CPH, but also because it is unclear whether simple faults due to code annotation characteristics would couple to complex faults.

As the fundamental hypotheses behind the creation of mutation operators may not have been applied in their intended way for this study, it could be an indication of why the results showed that all types of t-wise testing killed the same mutants.

6.2.2 Mutants Used

The security signing, customer information and security auditlog endpoints were run against 60, 61 and 100 mutants respectively. Once the mutants had been run against the security signing and customer information endpoints, the results showed that neither one of their test suites managed to kill the CHODR mutants. Thus, the decision was made to exclude the CHODR mutants when testing the security auditlog endpoint. This was mainly because more than 800 mutants were generated for that particular endpoint, of which all but 100 were CHODR mutants. The CHODR mutants were thereby disregarded to alleviate the manual testing process of running the mutants against the endpoint. However, given that the test suites of the security signing and customer information endpoints were not able to kill the CHODR mutants, it also gave an indication that this particular mutation operator may only produce equivalent mutants. Nevertheless, there is a chance that the CHODR mutants would have made a difference with the security auditlog endpoint.

6.3 Effects of the IPM

The information that was parsed from the OpenAPI specifications made it possible to extract many boundary values, which also made an impact on the test cases. If another method besides BVA had been used when constructing the IPM, the results might have looked different.

6.4 Testing Scope

In this study, only three API endpoints were tested. This was both due to the time it took to manually run all test suites against the generated mutants and because of issues with accessing and running endpoints that were applicable to this project. The original testing scope was thought to be bigger so as to get a better representation of t-wise testing, but the issues with running the endpoints locally and restricting the endpoints to those with at least three input parameters cut some of the endpoints out. The size of the testing scope could have an effect on the results in the sense that other endpoints may have shown that the types of t-wise testing resulted in different outcomes. The size of the testing scope also limits any statistically significant conclusions from being drawn as there is not enough data to back it up.

6.5 Results

6.5.1 Fault Detection Abilities

The results of the mutation testing were somewhat surprising since previous work states that the amount of faults detected should typically increase when using higher strength t-wise testing. However, the sections above provide some reasoning as to why the results show that all three t-wise test suites detected the same amount of faults. At the very least, the test suites were able to kill a clear majority of the mutants, which speaks for their fault detection abilities to some degree.

Some mutants may also have survived for a number of reasons on the program level.

6.5.2 Performance

The performance results were not so surprising when analyzing the number of test cases each combination type had generated. It is an accurate representation of the run-time for running the number of tests that were generated. This also coincides with the general idea about t-wise testing and why higher degree testing is not as common to use.

An interesting aspect when it comes to the performance results in figure 5.3 is that the run-times for the security auditlog and customer information endpoints follow a more expected exponential curve, whereas this is not the case for the security signing endpoint. It may be explained by the fact that the security signing endpoint was the only one that had a constraint introduced in its IPM. This would back up the claim made in [36], which stated that introducing constraints in the IPM would decrease its size and thus decrease the run-time. Another reason for this may be that the security signing endpoint did not have as many input parameters and corresponding values as the other endpoints.

6.6 Choice of Method

Evaluating the test suites through branch or statement coverage would not have been as strict as using mutation testing and evaluating whether the test suite discovers injected faults. Choosing to evaluate the fault detection abilities through mutation testing would hopefully give the result with the most sustenance.

Despite knowing that the APIs used code annotations for input validation, mutation testing was still chosen as the way to evaluate the test suites. Mutation testing in a non-traditional form was still considered more beneficial than branch or statement coverage. However, when examining the results of the study, it appears as if it may have been helpful to have had another metric of effectiveness as well. This could have indicated the reachability of the test cases and may also have helped explain how all t-wise test suites managed to kill the same mutants.

6.7 Sustainability and Ethics

This thesis does not discuss any ethical or societal consequences in regard to the project as they are deemed irrelevant. The same goes for any connection to social sustainable development.

The outcome of this work could potentially favour the ecological and economical dimensions of sustainability to a minor degree. By choosing the t-wise combination type with the better performance in terms of run-time, the power used by a computer could be minimized, and with it the electricity consumed. This may also contribute to the economical dimension of sustainability in the sense of reduced energy costs.


Chapter 7

Conclusions

The aim of this project was to evaluate how combinatorial interaction testing performs when applied to REST APIs. The areas that were of interest to examine were how 1-wise, 2-wise and 3-wise combinatorial testing compare in regard to detecting injected faults in the source code. This was done through mutation testing using project-specific mutation operators for code annotations. The other aspect that was of interest was how efficient the different types of CIT are from a run-time perspective. Based on the results that were presented in chapter 5, the conclusion can be drawn that all three types of CIT that were examined are able to detect the same faults when testing REST APIs. This gives an indication that the choice of strength for CIT can be presumed insignificant when it comes to testing REST APIs. As for the efficiency in terms of run-time for the three types of CIT, it is concluded that the run-time increases as the strength of the test suite increases. However, the results show that if a test suite has constraints among its parameters, the increase in run-time is not exponential as it is when there are no constraints.

It must be emphasized that these results are strongly connected to this particular study, which consisted of three endpoints that validated the input values using code annotations. The conclusions drawn cannot be applied to all cases of combinatorial interaction testing of REST APIs as it would need to be further investigated before that. Some ways of improving this study in the future are listed in section 7.1.


7.1 Future Work

Based on the aspects brought up in chapter 6, the following is a list of future work that could be done.

• Performing the same study with CIT and mutation testing, but on APIs that do not use code annotations for input validation. This would mean that established mutation engines could be used for generating the mutants.
• The study could be expanded by investigating branch and statement coverage as well as mutation testing.
• In order to further investigate the run-time effectiveness of CIT of REST APIs, endpoints with constraints can be examined. This potential study could also experiment with higher strength test suites, potentially all the way up to 6-wise testing.



Appendix A

Swagger Excerpt
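Below is an excerpt of a Swagger 2.0 specification; portions omitted from the excerpt are marked with an ellipsis ("...").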

{
  "swagger": "2.0",
  "info": {
    "description": "This is a sample server Petstore server. You can find out more about Swagger at [http://swagger.io](http://swagger.io) or on [irc.freenode.net, #swagger](http://swagger.io/irc/).",
    "version": "1.0.0",
    "title": "Swagger Petstore"
  },
  "host": "petstore.swagger.io",
  "basePath": "/v2",
  "tags": [
    {
      "name": "pet",
      "description": "Everything about your Pets",
      "externalDocs": {
        ...
  ],
  "paths": {
    "/pet": {
      "post": {
        "tags": [
          "pet"
        ],
        "summary": "Add a new pet to the store",
        "description": "",
        "operationId": "addPet",
        "consumes": [
          "application/json",
          "application/xml"
        ],
        "produces": [
          "application/xml",
          "application/json"
        ],
        "parameters": [
          {
            "in": "body",
            "name": "body",
            "description": "Pet object that needs to be added to the store",
            "required": true,
            "schema": {
              "$ref": "#/definitions/Pet"
            }
          }
        ],
        "responses": {
          "405": {
            "description": "Invalid input"
            ...
        "tags": [
          "pet"
        ],
        "summary": "Finds Pets by status",
        "description": "Multiple status values can be provided with comma separated strings",
        "operationId": "findPetsByStatus",
        "produces": [
          "application/xml",
          "application/json"
        ],
        "parameters": [
          {
            "name": "status",
            "in": "query",
            "description": "Status values that need to be considered for filter",
            "required": true,
            "type": "array",
            "items": {
              "type": "string",
              "enum": [
                "available",
                "pending",
                "sold"
              ],
              "default": "available"
            },
            "collectionFormat": "multi"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Pet"
              }
            }
          },
          "400": {
            "description": "Invalid status value"
            ...
        },
        "name": {
          "type": "string",
          "example": "doggie"
        },
        "age": {
          "type": "integer",
          "example": 1,
          "pattern": "[0-9]+",
          "minLength": 1
        },
        "status": {
          "type": "string",
          "example": "available",
          ...


TRITA-EECS-EX-2020:795
