
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Computer Science

2019 | LIU-IDA/LITH-EX-A--19/105--SE

Systematically uncovering mutants in testing safety critical software

Using symbolic execution on surviving mutants from mutation testing

Systematiskt upptäckande av mutanter i testning av säkerhetskritisk mjukvara

Användning av symbolisk exekvering på överlevande mutanter från mutationstestning

Nils Petersson

Niklas Pettersson

Supervisor: Jonas Wallgren
Examiner: Ahmed Rezine



Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Nils Petersson & Niklas Pettersson


Abstract

This thesis investigates how symbolic execution and constraint solving can be used for detecting equivalent and non-equivalent mutants in mutation testing. The presented method proposes a driver implementation, realized as a plugin for Dextool, that differentiates original and mutated code from each other by executing both versions symbolically using KLEE. The method was tried on a number of code examples, successfully uncovering equivalent and non-equivalent mutants. It was concluded that detection of equivalent and non-equivalent mutants is possible and relevant for mutation testing, but with limitations in terms of scalability and applicability due to an increasing amount of side effects, path explosion, and certain functions not being suitable for symbolic execution.


Acknowledgments

Firstly, we would like to thank our examiner Ahmed Rezine for guiding us through this project and providing his knowledge in the research area. Secondly, we would like to thank our supervisor Jonas Wallgren for providing excellent feedback on the written work in this thesis.

Furthermore, we would like to thank our external supervisors at Saab, Christoffer Nylén and Joakim Brännström, for their utmost brilliant help and feedback during weekly meetings and sporadic hectic Skype conversations. Finally, Niklas would like to thank his friend and neighbor John Tinnerholm for his moral support and input along with all the late night beers shared during this period.

Sincerely, thank you! Nils & Niklas


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
   1.1 Background
   1.2 Motivation
   1.3 Aim
   1.4 Research questions
   1.5 Delimitations

2 Related work
   2.1 Mutation testing at Saab
   2.2 Automatic Test Case Generation via Dynamic Symbolic Execution
   2.3 The impact of equivalent mutants
   2.4 (Un-)Covering Equivalent Mutants
   2.5 Using KLEE for high coverage tests
   2.6 Symbolic Execution of Java Bytecode
   2.7 Detecting equivalent mutants and the feasible path problem

3 Theory
   3.1 Mutation testing
   3.2 Equivalent mutation
   3.3 Symbolic Execution and constraint solving
   3.4 LLVM
   3.5 KLEE LLVM Execution Engine
   3.6 Dextool
   3.7 Path explosion
   3.8 Pure functions

4 Method
   4.1 Setup used
   4.2 Examples used
   4.3 Running Dextool Mutate on examples
   4.4 Generate files prepared for symbolic execution
   4.5 Running KLEE on the surviving mutants
   4.6 Presenting result
   4.7 Void functions
   4.8 Faulty execution runs and unknown cases
   4.9 Tests for small code examples
   4.10 Automating the process

5 Results
   5.1 Mutation testing with Dextool Mutate
   5.2 Symbolic execution with KLEE
   5.3 decrement_example.cpp
   5.4 enum_example.cpp
   5.5 fibonacci_rec_example.cpp
   5.6 void_example.cpp
   5.7 lottery_example.cpp
   5.8 triangle_example
   5.9 Calculating mutation score
   5.10 Data from faulty executions on small code examples
   5.11 Safety critical code

6 Discussion
   6.1 Results
   6.2 Method
   6.3 The work in a wider context

7 Conclusions
   7.1 Symbolic execution and constraint solving
   7.2 Our contribution
   7.3 Future work

Bibliography

A Appendix A - Small code examples
B Appendix B - Equivalent mutants
C Appendix C - Non-equivalent mutants
D Appendix D - Mutation schemas


List of Figures

3.1 Function to perform mutation testing on
3.2 Function to perform mutation testing on with included mutation
3.3 Example of code containing several paths
3.4 Path constraints for the paths in Figure 3.3
3.5 Input values to different paths provided by the constraint solver
3.6 Example of a pure function
4.1 Overview of how the generated files are set up
4.2 Source file generated for symbolic execution
4.3 Mutated file generated for symbolic execution
4.4 Structure for the generated file prepared for symbolic execution
4.5 File generated for symbolic execution with check on input
4.6 Option flags when running KLEE on the generated files
4.7 Mutated switch statement
4.8 Original switch statement
4.9 Errors and warnings from compilation
4.10 Overview of how the plugin operates
4.11 Example of semantic identifiers for an arbitrary function
5.1 Example of a time out mutant for decrement_example.cpp
5.2 Equivalent mutant generated for decrement_example.cpp
5.3 Non-equivalent mutant generated for decrement_example.cpp
5.4 Equivalent mutant generated for enum_example.cpp
5.5 Non-equivalent mutant generated for enum_example.cpp
5.6 Non-equivalent mutant generated for fibonacci_rec_example.cpp
5.7 Equivalent mutant generated for void_example.cpp
5.8 Non-equivalent mutant generated for void_example.cpp
5.9 Equivalent mutant generated for lottery_example.cpp
5.10 Non-equivalent mutant generated for lottery_example.cpp
6.1 Example of klee_assume() utilization for limiting the symbolic interval
6.2 Example of equivalent mutant for fibonacci_rec_example.cpp
6.3 Example of non-functional code in an arbitrary method
6.4 Equivalent mutant for non-functional code from Figure 6.3
A.1 decrement_example.cpp
A.2 fibonacci_rec_example.cpp
A.3 enum_example.cpp
A.4 void_example.cpp
A.5 lottery_example.cpp
A.6 fibonacci_iter_example.cpp
A.7 triangle_example.cpp
B.1 decrement_example.cpp ROR
B.2 decrement_example.cpp ABS
B.3 decrement_example.cpp UOI
B.4 void_example.cpp ROR
B.5 void_example.cpp ABS
B.6 void_example.cpp UOI
B.7 enum_example.cpp ROR
B.8 enum_example.cpp DCR
B.9 lottery_example.cpp ABS
B.10 lottery_example.cpp UOI
B.11 fibonacci_rec_example.cpp ROR
B.12 fibonacci_rec_example.cpp ABS
B.13 triangle_example.cpp ROR
C.1 void_example.cpp ROR
C.2 void_example.cpp DCC
C.3 void_example.cpp ABS
C.4 lottery_example.cpp ABS
C.5 lottery_example.cpp ROR
C.6 triangle_example.cpp ROR


List of Tables

3.1 List of mutation kinds with some examples
4.1 Small code examples used
4.2 Safety critical code examples used
4.3 Mutation schema for an ROR-mutation (boolean types)
4.4 Example of a table used to present result from running Dextool Mutate
4.5 Example of a table used to present result from running symbolic execution
4.6 Example of a table used when calculating mutation score before and after symbolic execution
4.7 Example of time out data for example_program.cpp
4.8 Test values for decrement_example.cpp
4.9 Test values for fibonacci_example.cpp
4.10 Test values for enum_example.cpp
4.11 Test values for void_example.cpp
4.12 Test values for lottery_example.cpp
5.1 Running Dextool Mutate on small examples
5.2 Running symbolic execution on small examples
5.3 Mutation score for small examples before and after symbolic execution
5.4 Results from evaluating ROR mutants on fibonacci_rec_example.cpp
5.5 Results from evaluating ROR mutants on fibonacci_iter_example.cpp
5.6 Running symbolic execution on safety critical code


1

Introduction

The purpose of this chapter is to give a short description of the work conducted in this master's thesis, beginning with a short background to the thesis, followed by a motivation of why the studied topic is important. After that comes a section describing the aim of the thesis, where the expected results are brought up. Then the research questions of the study are presented. Finally, a short section covers the delimitations of the thesis.

1.1

Background

This thesis was conducted as a part of the Vinnova project Testomat at Saab in Linköping. Saab is a global defence and security company with products, services, and solutions for both military and civil safety. Saab consists of several business areas; Aeronautics, the one this thesis was conducted at, focuses on the development of both military and civil aviation technologies.

Saab has earlier conducted work in collaboration with students in the area of mutation testing (see section 3.1), and theses published in 2014, 2015 and 2017 bring up several issues with the use of mutation testing [10, 16, 24]. Mutation testing is a technique which has been around since the early 70s and is now increasingly being integrated into company workflows.

1.2

Motivation

In 1979, when the book "The Art of Software Testing" [15] was published, the general rule of thumb for programming projects was that 50 percent of the total time and 50 percent of the total cost of a project were spent on testing the program or system being developed. That rule of thumb still holds true, even though some would argue that portions of that time could be classified as other project activities rather than testing (e.g. training, reviews and debugging) [22].

Software testing is a process to assess the quality of software by trying to discover defects in it, and it is therefore a heavily researched topic. One issue with testing is that it is unclear when the testing phase can be considered finished and adequate for large systems. This element of uncertainty makes it difficult to find a suitable exit criterion for testing. Can developers be sure that the software is correct just because all tests have passed? One way to improve confidence in a test suite is to analyze it. This can be done in several ways; one example is mutation testing, where small changes are made to the code base in order to mimic human errors. The test suite is then executed on the altered code base with the expectation that it will find the introduced errors by failing one or more test cases. The test suite is then ranked by how many of these introduced errors it can find, which results in a mutation score (see section 3.1).

However, this is a very time consuming method, where the testers often need to run the entire test suite to test one small alteration of the software. Even when the entire process is automated, time consumption problems occur, especially when a code alteration does not change the behaviour of the code. Such cases are called equivalent mutants and are impossible to detect using test cases, because all inputs result in the same output for the two versions of the program. Usually, this is solved by developers manually inspecting the code and marking certain mutants as equivalent in order to raise the mutation score by subtracting these from the total number of mutants (see the formula in section 3.1). Developers can often taint these results as well, marking mutants as equivalent incorrectly in 20% of the cases [1]. Therefore, the mutation score is based on the number of detectable mutants the suite was able to detect.

1.3

Aim

As previously mentioned, the high cost mostly comes from the manual part of the testing, since it can result in an unclear or even unreasonable mutation score (see section 3.1). If the manual part, for example detecting equivalent mutants, could be reduced, then the attractiveness of mutation testing would increase. In the best case, equivalent mutants could be detected and eliminated completely automatically. However, this might be too hard when very complex systems are under test. In those cases the tester would still benefit from a partial solution where some information regarding the mutants is given, for example that the amount of dependencies in the code is overwhelming for the program. By utilizing symbolic execution and constraint solving (section 3.3), this thesis aims to automatically detect equivalent and non-equivalent mutants (section 3.2) after mutation testing has been conducted. This is achieved by setting up paths for symbolic execution to analyze that are infeasible exactly when a mutant is equivalent.

Detecting equivalent mutants is, however, only one of the benefits. Informing the user that a certain mutant cannot be killed, because it is equivalent to the original code, will increase the final mutation score in the testing phase. The other side is when the program is able to detect that a mutant is not equivalent. That indicates that the test cases currently used are not of adequate quality, since there actually exists a test case, for example an input to the function, that would terminate the mutant. By using symbolic execution to find paths in the program whose execution would effectively kill the mutant, one can specify exactly what input caused this code to execute. Research that investigated similar problems (see sections 2.2, 2.5, 2.6 and 2.7) has been successful when utilizing symbolic execution and constraint solving. For example, Offutt and Pan investigated how equivalent mutants could be discovered using mathematical constraints, which are part of the underlying functionality of symbolic execution and constraint solving [18].

The secondary objective of this thesis, given that the first one is to utilize symbolic execution and constraint solving for detecting equivalent and non-equivalent mutants, is to implement a plugin that can be used as the last part of mutation testing: a plugin that uses all of the previously mentioned techniques and functions on a variety of code examples in order to provide the user with valuable information regarding their test cases and why certain mutants did or did not die. The final goal is to be able to use this plugin on industrial code, avionic industrial code to be specific. Therefore, the plugin needs to take into account how the code is built and what type of system it is. For example, code with little to no output could prove very hard to fully test. The tool for symbolic execution and constraint solving therefore needs to be used in different ways depending on the system under test (SUT). The secondary objective is thus both theoretical and practical, since the aim is to investigate how the plugin can be implemented so that it provides valuable information to the user, whether or not it actually was able to run.

1.4

Research questions

This thesis examines the ability and efficiency of using symbolic execution and constraint solving when conducting mutation testing. Specifically, the thesis tries to answer the following questions:

1. How can detection of equivalent and non-equivalent mutants profit from using symbolic execution and constraint solving?

2. To what extent is detection of equivalent and non-equivalent mutants relevant for mutation testing?

3. To what extent is symbolic execution and constraint solving a scalable and applicable approach for detection of equivalent and non-equivalent mutants in safety critical software?

1.5

Delimitations

This master's thesis is carried out by two students during a 20-week period, which means that not all problems can be addressed within the scope of the project. Some code examples might need more time to investigate, which could lead to them being excluded from the final implementation. This does not limit the demonstration of the research area itself, and will therefore not adversely affect the results of the thesis, but it might limit the applicability of the method used in the program.

Furthermore, this thesis will not contain examples of programs where a floating point variable is the symbolic element (the variable considered symbolic, see section 3.3) of a function. Given how symbolic execution operates on different types, floating point variables are hard to handle, since constraint solvers cannot differentiate very small values from each other and are thereby unable to decide whether a path is feasible. There is research in this area showing good results in terms of constraint solvers handling floating points, but breakthroughs are still required before this is efficient enough for scalable analysis [14]. Such solvers are available as libraries that extend symbolic execution engines and constraint solvers to support floating points to some degree, but they are not a native part of the tools and will therefore not be covered in this thesis.


2

Related work

This chapter presents studies conducted in the areas of mutation testing and symbolic execution that relate to this thesis. It starts with a description of the findings of an earlier bachelor's thesis conducted at Saab, which is the foundation for this master's thesis and its core research area. That is followed by research papers covering both symbolic execution and equivalent mutations.

2.1

Mutation testing at Saab

In the spring of 2017, Johnsson and Svensson conducted a bachelor's thesis at Saab in Linköping where code on different criticality levels was investigated with regard to mutation testing [10]. They investigated the mutation scores the code on the different levels received when generating various amounts of mutations, and showed that code on higher criticality levels did not receive better scores, which was the hypothesis. These criticality levels are defined in the DO-178C standard published by the Radio Technical Commission for Aeronautics (RTCA), which is the standard used at Saab.

In the discussion part of the thesis, Johnsson and Svensson acknowledged the possible existence of equivalent mutations and their impact on the final mutation score. They showed that among all the generated mutations, the ROR mutation (see Table 3.1) was the most common one, making up 67 percent of them all. They argue that these mutations generate equivalent mutants more often than other mutation operators. However, the number of specific mutants is highly dependent on the mutation testing tool used and the type of application being tested, the latter being recognized in their thesis.

Johnsson and Svensson stated that further research on the different criticality levels was required, and that detection of equivalent mutants was needed in order to enhance the mutation score even further. They also stated that conducting the same research with more variants of mutations, and on more applications at the different levels, may change the result.


2.2

Automatic Test Case Generation via Dynamic Symbolic Execution

Generating test cases automatically is one great way to increase the efficiency of software testing. Papadakis and Malevris demonstrated how this could be done using symbolic execution [19]. The test cases are generated based on mutations acquired during mutation testing.

The aim of the test case generation was to kill the introduced mutants that were non-equivalent. Papadakis and Malevris also mention problems with equivalent mutants, as test cases are generated for them as well, which is a waste of time given the nature of equivalent mutants.

2.3

The impact of equivalent mutants

In a paper written by Grün et al., the impact of equivalent mutants was investigated by running JAVALANCHE on the JAXEN XPATH query engine [8]. They found that equivalent mutants were surprisingly common, making up 40 percent of their selected sample of surviving mutants. They also found that manually determining whether a selected mutant was in fact equivalent took around 15 minutes per mutant. The conclusion was that equivalent mutations made it impossible to automatically produce results in terms of test suite quality when conducting mutation testing.

The results presented by Grün et al. indicate a vast problem in terms of equivalent mutations, but they also investigated the relation between a mutant's impact on code coverage and non-equivalence (impact in terms of the code coverage of a test suite). Higher impact on code coverage indicated a higher probability of the mutant being non-equivalent. Because of this, they stated that to prioritize mutants in order to find more non-equivalent ones, one should rank the mutants by their impact on code coverage.

The work of Grün et al. can however be criticized for the low number of mutants they selected from the surviving ones. Since they show that ranking mutations by impact on code coverage changes the distribution of equivalent mutations in their selected set, increasing the sample size would be advisable before drawing further conclusions from their work.

2.4

(Un-)Covering Equivalent Mutants

In 2010 Schuler and Zeller wrote a paper inspired by Grün et al. [8] where they analyzed the number of equivalent mutants present in a set of Java programs as well as how different properties of a generated mutant affect its likelihood of being equivalent [23].

After 20 mutations in 7 programs were analyzed, they came to the conclusion, much in line with what Grün et al. presented, that around 45% of all uncaught mutants are equivalent. Another interesting point is that as the test suite improves, this number will continue to rise, since more of the non-equivalent mutants are detected. Similar to the results presented by Grün et al., the time needed to manually determine whether a mutant was equivalent was 15 minutes.

They also show that if a mutant causes a change in code coverage compared to the original version, there is a 75% chance that the mutant is non-equivalent. This can give a user trying to find equivalent mutants during mutation testing a priority list of which mutants to analyze first, since analyzing mutants is time-consuming, as mentioned earlier.

Much like the earlier paper by Grün et al. [8] (see section 2.3), the number of analyzed mutants is quite small: 140, compared to over 55 thousand generated and covered mutants. However, the result is compared with similar studies, which strengthens the conclusion.


2.5

Using KLEE for high coverage tests

Cadar et al. described in 2008 how symbolic execution, realized in their tool KLEE (section 3.5), could be used to detect a great number of bugs in a variety of heavily tested programs [6]. One of the test objects was GNU COREUTILS, where 10 serious bugs were uncovered, three of which had been unknown for over 15 years.

This was achieved by combining symbolic execution and constraint solving with analysis of every potentially dangerous operation, for example pointer dereference. When KLEE reached a dangerous operation, it checked whether any value satisfying the current path constraints could cause an error, and generated a concrete test case for this specific error path.

It was also shown that the test suite generated automatically by KLEE reached higher code coverage than the manually written test suite designed by the developers. KLEE could, in most cases, run directly on code taken "out of the box" without any modifications. When testing COREUTILS, only one part of the code had to be altered: a large buffer in sort, which gave the constraint solver a hard time. Some of these programs were also very large (80k lines of library code and 61k lines of actual utilities), which had previously been one of the biggest drawbacks of symbolic execution (not being able to run on larger programs, see section 3.7). This paper therefore also demonstrates the potential use of KLEE on other, bigger systems.

2.6

Symbolic Execution of Java Bytecode

P˘as˘areanu and Rungta presented in a paper published 2010[20] a symbolic execution tool for automatic test case generation and error detection in java programs. The tool, called Symbolic Pathfinder (SPF), combined symbolic execution with model checking and constraint solving and was able to detect deadlocks and race-conditions.

Păsăreanu and Rungta showed that SPF's resource consumption was high and that it suffered from scalability issues, because of the exhaustive nature of the analysis it performs and the constraint solving involved. To improve this, they continued work on parallelizing SPF.

The paper emphasizes that symbolic execution is a resource-heavy method for generating test cases, but that the results shown are good.

2.7

Detecting equivalent mutants and the feasible path problem

Offutt and Pan were able to detect equivalent mutants by utilizing a constraint-based technique presented in a paper from 1996 [18]. They presented a partial solution that detected a significant percentage of the equivalent mutants for most programs: over 60 percent for 7 out of 11 programs, with an average of over 45 percent across all 11. The number of mutants in the programs ranged from about 180 to 3000.

Offutt and Pan also showed that this technique was better at solving the feasible path problem than at detecting equivalent mutants. However, they stated that the programs tested were artificially created, with limitations and potential bias, so the results should not be generalized without further research.


3

Theory

This chapter presents the theory behind the thesis with regard to mutation testing and symbolic execution. After that, the problems with equivalent mutants are described in detail along with examples. Furthermore, the tools used in the thesis are presented, together with the potential problems that might occur when executing them.

3.1

Mutation testing

Mutation testing is a white-box technique in software testing used to evaluate and improve the quality of an existing test suite. Small changes are introduced into the code by modifying certain expressions and statements; a given test suite detects a mutant if the mutant causes a change in behavior compared to the original program (no longer passing the tests) [17]. A test being able to spot this difference in behavior results in the mutant being marked as killed, while a mutant that is not detected is marked as alive. The test suite is then given a mutation score based on how many of the mutants were killed [9]. The mutation score for a test set T, designed to test a program P, is calculated using the formula

Ms(P, T) = Mk / (Mt − Mq)

where Ms is the final score, Mk the number of mutants killed, Mt the total number of mutants, and Mq the number of equivalent mutants [2]. The intended purpose of mutation testing is to mimic common programming errors, such as mistakenly using the wrong operator, in order to help the developer create new, effective tests [21]. Mutation testing can also provide valuable information to the developer regarding sections of code that were not covered or accessed during test suite execution, thereby giving the developer new tests to consider.


Below are two figures showing the result of a simple mutation applied to a trivial code example. Figure 3.1 shows the source code for a simple for-loop executing a subtraction each iteration. This specific example will iterate 10 times, subtracting a total of 10 from the parameter x given to the function, and return the final value of x.

    int source(int x) {
        for (int i = 0; i < 10; i++) {
            x = x - 1;
        }
        return x;
    }

Figure 3.1: Function to perform mutation testing on.

Figure 3.2 shows the same function but with a slight modification (mutation) of the con-dition for continue looping in the for-loop. This mutant is called a relational operator re-placement (ROR) mutation and is one of many possible mutations that could be generated automatically when conducting mutation testing.

int mutation(int x) {
    for (int i = 0; i != 10; i++) {
        x = x - 1;
    }
    return x;
}

Figure 3.2: Function to perform mutation testing on with included mutation.

The mutant shown in Figure 3.2 is, as stated earlier, an ROR mutation. There are, however, various kinds of mutations that can be generated with a mutation testing tool [24]. Examples of different mutations are shown in Table 3.1.

• Arithmetic Operator Replacement (AOR): f = 3 * x; ⇒ f = 3 + x;
• Relational Operator Replacement (ROR): if (3 == x) ⇒ if (3 < x)
• Unary Operator Insertion (UOI): f = 3 * x ⇒ f = 3 * &x
• Conditional Operator Replacement (COR): if (x && y) ⇒ if (x || y)
• Bomb Statement Replacement (BSR): f = 3 * x; ⇒ halt();
• Statement Deletion (SDL): if (x) { return y; } ⇒ if (x) { //return y; }
• Absolute Value Insertion (ABS): a = d * 3.14; ⇒ a = abs(d * 3.14);
• Decision/Condition Replacement (DCR): if (x == y) ⇒ if (true)

Table 3.1: List of mutation kinds with some examples

Some of these mutants are implemented differently in mutation testing software; for example, Dextool introduces template code when generating ABS mutants.

The method of mutation testing is based on two hypotheses; the competent programmer, stating that most of the software faults are due to small syntactic errors introduced by the



programmer; and the coupling effect, asserting that simple faults can escalate or couple into bigger and more complex faults [7].

3.2 Equivalent mutation

In regards to mutation testing, there is a type of mutant called an equivalent mutant. An equivalent mutant can be described as a syntactical alteration of code that does not change the behaviour of the program. Given the input x to the functions f_orig and f_mut, the observed behaviour (return values, modifications of global variables and other side effects) is the same for all x, where f_orig is the original function and f_mut is the mutated version [17].

As shown in Figure 3.1 and Figure 3.2, the difference in the code is the condition the for-loop iterates on. Together, the two figures show an equivalent mutant, since the two programs iterate the same number of times, executing the statements in the loop body equally often. As a result, a developer conducting mutation testing is not able to kill the mutant, since there is no input (test case) that will cause a different output from the original code.

When conducting mutation testing, the mutation score is determined by the formula given in section 3.1. As shown in that formula, the number of equivalent mutants is subtracted from the total number of mutants in the final score. This is because equivalent mutants cannot be killed by a test case, since the mutant behaves exactly the same as the unaltered code, as described earlier. Mutation testing can therefore prove to be very time consuming when it comes to handling equivalent mutants, since detecting them usually requires some form of manual intervention [1].

3.3 Symbolic Execution and constraint solving

Symbolic execution is a way to analyze programs, e.g. to determine which inputs cause different paths in the program to execute. This is done by using symbolic values for variables when following a path in the program, yielding the constraints on the different symbolic variables once a full path has been explored.

When symbolic execution was first proposed in the 70s it was not usable on real world examples [11]. Only after numerous advancements in constraint solving, and scalable approaches that combine symbolic execution with concrete values, did the technique regain interest [5]. The combination of these advancements has made the technology attractive outside of research projects, and it is now being used in areas ranging from automatic test generation to automatic detection of security defects in programs.

Symbolic element

In order to evaluate functions symbolically, the variables need to be extended to hold expressions instead of concrete values. Traditionally, values are calculated as they are used; symbolic variables, however, are evaluated later, when sent to the constraint solver. For example, a four-byte integer can take any of 2^32 values, depending on the hardware architecture. Symbolically, it can hold any of these values before conditions are taken into account, meaning that the variable can take any value in that range when the code is executed.

Symbolic execution in practice

Symbolic execution is useful because of its path finding capabilities. Instead of testing all possible values, it uses symbolic values to determine the constraints for each possible path [5]. Upon termination, these constraints are passed on to a constraint solver, which will



calculate concrete values for those specific constraints. After one path has been exercised, one condition is flipped so that another path is executed instead, and concrete values for that path are obtained as well. When all paths are exercised, the resulting values can be used for many different purposes, some more obvious than others. One example is when outputs from symbolic execution are used as inputs for a test suite.

In order to demonstrate how symbolic execution works, the steps will be explained using a small code example with a limited number of paths. The code in Figure 3.3 is used as an example, where two of the variables are symbolic. The program contains three different paths, all of which are satisfiable for some values.

// Symbolic values
int a = α;
int b = β;
// Concrete values
int x = 1;
int y = 2;

if (a < 2) {
    if (b < 5) {
        return 1;
    } else {
        return 2;
    }
} else {
    return 3;
}

Figure 3.3: Example of code containing several paths

After symbolic execution has been performed, the paths are formulated as path constraints, one for each path present in the code example. As mentioned earlier there are three different paths in the code: the first path evaluates both if-conditions to true and returns the value 1; the second path evaluates the second if-condition to false and returns 2; the third and last path evaluates the first if-condition to false and returns the value 3. These path constraints are put together in Figure 3.4.

PC1 = a < 2, b < 5
PC2 = a < 2, b >= 5
PC3 = a >= 2

Figure 3.4: Path constraints for the paths in Figure 3.3

After the path constraints have been obtained, the paths are evaluated using concrete values. The path constraints are used as input to a constraint solver, which tries to find values for the variables that satisfy all the given constraints. If the solver succeeds, it returns values that can be used as input for said path. The result obtained from the constraint solver, for the example in Figure 3.3, can be viewed in Figure 3.5. The results differ between solvers; the only guarantee is that one set of concrete values is returned for each path. For the third path there are no constraints on b, so any value will work for that specific path.



a = 1, b = 1
a = 1, b = 5
a = 2, b = 5

Figure 3.5: Input values to different paths provided by the constraint solver

3.4 LLVM

LLVM1 is an umbrella name for different toolchains, compilers and libraries that are developed under the University of Illinois/NCSA Open Source License. The initialism LLVM has officially been decoupled from its previous expansion in order to avoid confusion regarding what LLVM really is. Originally, LLVM was created as a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs. It did this by providing high-level information to compiler transformations at compile time, link time, run time and idle time [13]. It has since evolved into several other projects and tools, generally called the LLVM umbrella project.

3.5 KLEE LLVM Execution Engine

KLEE LLVM Execution Engine (KLEE)2 is a tool built to automatically generate test input that achieves high coverage by using symbolic execution [6]. It is an open source tool available on GitHub with continuous improvements and bug fixes. KLEE itself is a symbolic virtual machine built on the LLVM compiler infrastructure; its current stable release targets LLVM 3.4, but it has continuous development and support for newer versions through the open source community.

KLEE supports execution of programs written in C and C++, with limited support for the C++ standard libraries. The tool is also unable to handle certain types of values, for example floating point values, due to the nature of how computers handle floating point numbers in memory (see section 1.5). However, some limitations can be circumvented using libraries (for example KLEE uclibc3) and functionality available as open source additions to KLEE.

3.6 Dextool

Dextool is an open-source framework for writing plugins using libclang4. The framework is written in the D programming language and consists of tools for testing and analyzing C/C++ code. Some of the available plugins are Mutate, a plugin for conducting mutation testing; C TestDouble/C++ TestDouble, plugins for generating test doubles; and Analyze, a plugin for generating complexity numbers [3]. Some of the Dextool plugins are production ready and used in production environments. Others, such as GraphML (a plugin for generating a GraphML representation), are currently at beta level.

1 LLVM (visited 2019-05-04) - https://llvm.org/
2 KLEE (visited 2019-05-04) - http://klee.github.io/
3 KLEE uclibc (visited 2019-05-04) - https://github.com/klee/klee-uclibc
4 Libclang (visited 2019-06-23) - https://clang.llvm.org/docs/Tooling.html



As described earlier in section 3.1, a mutant in mutation testing is marked depending on the outcome of the execution of the test suite. In Dextool Mutate, this is extended to the following markings or statuses:

• Time out - When running the test suite on a specific mutant takes too long to finish, the run is interrupted and the mutant is given the status time out. These mutants are treated as killed, since there is an observable difference between them and the original code when running the test suite, in terms of execution time. How long the mutant will be tested is determined by Dextool Mutate. The Dextool documentation contains more details regarding the motivation for the time out status and the algorithm used [3].

• Killed - The test suite failed when run on the mutant, which means that the mutant was uncovered by the tests. A failed test suite is caused by one or several tests not passing.

• Alive - The test suite terminated without errors when run on the mutant, which means that the mutant survived, i.e. was not detected by the tests. A test suite terminating without errors means that all tests passed.

• KilledByCompiler - When an introduced mutation changes the program in such a way that it no longer compiles, the status KilledByCompiler is given. These mutants are discarded before calculating the mutation score, since they are considered stillborn mutants.

3.7 Path explosion

One downside still present after numerous advancements in symbolic execution is path explosion. Since symbolic execution explores all paths in a program, this problem becomes increasingly serious for larger programs, given that the number of paths (in the general case) increases exponentially as a program grows in size [12].

There are some ways to reduce the impact of this problem, for example by reducing execution time for each individual path or by merging similar paths. Another way is to limit the amount of time KLEE is allowed to run on a single program using option flags5. This affects the result negatively, since KLEE cannot then give a definite answer regarding the status of the program (execution is simply halted when time runs out), but it makes it possible to utilize symbolic execution on real examples and discard those that cause problems.

5 KLEE options (visited 2019-05-04) - https://klee.github.io/docs/options/



3.8 Pure functions

There are different kinds of functions in programming, both defined and behaving completely differently from each other in terms of fundamental properties. A pure function6 is a function that does the following:

1. Returns the exact same result every time it is called, given that the arguments are the same. It is a function without state that cannot access external state, essentially having no information about the world outside its own scope.

2. Has no side effects, meaning that discarding the result will ultimately only use processor time and power in order to accomplish nothing.

Figure 3.6 is an example of a pure function. The function takes two integers, compares them, and returns the biggest of them. In accordance with the definition of a pure function, every call will give the same answer given the same input.

int max(int a, int b) {
    if (a >= b) {
        return a;
    } else {
        return b;
    }
}

Figure 3.6: Example of a pure function

6 Pure functions - https://www.schoolofhaskell.com/school/starting-with-haskell/


4 Method

This chapter gives a detailed description of the method in this thesis by demonstrating how the setup, workflow and execution were conducted. First, we describe how Dextool Mutate was run in order to generate mutants to be further investigated based on their status. After that, the generation of files prepared for symbolic execution and constraint solving is described, where one file is used to simultaneously analyze the output from the source and the mutated code when given the same input. This is followed by a section on how failed runs, path explosion and other edge cases were handled. Last in this chapter, the automation of the whole process as a plugin for Dextool is explained.

4.1 Setup used

All experiments were conducted using a Dell Latitude E6430 with the specifications listed below.

• CPU: 2.5 GHz Intel Core i7-3510M
• Operating System: Ubuntu 18.04.1 LTS
• RAM: 8 GB
• Hard Drive Size: 128 GB
• Hard Drive Type: SSD
• Graphics Card: Intel HD Graphics 4000
• Video Memory: 64 MB

The software used when conducting mutation testing was Dextool, taken from the master branch Jun 21, 2018 [3]. The software used for symbolic execution in this thesis was KLEE, version 1.4.0.



4.2 Examples used

Table 4.1 lists the small code examples used in this thesis along with their Lines of Code (LoC). These examples were artificially created, with inspiration from real mutation findings in safety critical code. They were also intended to test different functionality in the method, such as void functions, simple loops, recursion, and programs with more paths. The code for these examples can be seen in Appendix A. One example, triangle_example.cpp, was taken from the examples available on the Dextool GitHub [3] and was modified slightly.

Table 4.1: Small code examples used.

Program                  LoC
decrement_example.cpp      7
fibonacci_example.cpp     13
enum_example.cpp          37
void_example.cpp          12
lottery_example.cpp       19
triangle_example.cpp      39

Table 4.2 lists the examples taken from real safety critical code, with slight modifications, along with the lines of code of each example. The intention of these examples was to test code taken directly from real safety critical applications (code developed by Saab Aeronautics). The names of the functions, along with the code itself, are redacted due to confidentiality. Our modifications were made in order to execute the code outside of its original operational environment and context (functional behavior unchanged).

Table 4.2: Safety critical code examples used.

Program              LoC
func1_example.cpp      5
func2_example.cpp      9
func3_example.cpp     40
func4_example.cpp     13

4.3 Running Dextool Mutate on examples

In order to conduct mutation testing (see section 3.1) and to obtain the surviving mutants that are of interest, Dextool Mutate is used. A desire from the external supervisors of this thesis was to utilize Dextool for conducting mutation testing; other tools for mutation testing are therefore not investigated further. First, the test suite is run in order to establish a baseline for its execution time. The program then takes a set of files as input and generates all the possible mutation points in the files. A mutation point marks a certain spot in the code where one or several mutation kinds can be applied (see Table 3.1 for examples of mutation kinds). One mutation point can result in several different mutants, since Dextool Mutate follows specific mutation schemas (see Appendix D for more schemas). Table 4.3 demonstrates one mutation schema for an ROR mutation where both the left and right hand side of a condition are of boolean type. These schemas are determined by Dextool, and the reasoning behind them can be found on the Dextool GitHub [3].

The number of mutants generated for a specific file will depend on the number of mutation points Dextool Mutate finds. These mutation points can then vary in terms of which mutations can be applied to the specific point. For example, one mutation point inside an if-statement could specify that both the ROR mutation and the DCC mutation could be



Table 4.3: Mutation schema for an ROR mutation (boolean types)

Original expression    Mutant 1    Mutant 2
x == y                 x != y      false
x != y                 x == y      true

applied in that specific point. The ROR mutation could then generate several mutants depending on which mutation schema is used, which in turn depends on the types of the variables in the statement and on the statement itself. Dextool only generates first order mutations, meaning that only one mutation is applied to one mutation point at a time. The total number of mutants for a program is the sum of the mutants each mutation generates over all mutation points. The mutants generated from the different mutations can be seen in Appendix D.

Dextool Mutate alters the code base, using these mutation points, by inserting specific mutants according to the different schemas, and reruns the test suite. Certain mutation schemas, for example ABS, also insert template code into the code base; an example of this can be seen in Figure B.2. After the test suite has been run, the status can be flagged accordingly. This information is then saved in a database that is used to fetch the surviving mutants to be analyzed using symbolic execution and constraint solving in later steps.

Alongside a set of files, the plugin can also take different commands as input to change which mutations the user wants the plugin to execute. The different mutations are explained in Table 3.1. Exactly how these different options are changed is explained thoroughly in the Dextool documentation [3].

4.4 Generate files prepared for symbolic execution

In order to use symbolic execution for detection of mutants, the mutants surviving the mutation testing phase are prepared as input. This is done by setting up a scenario according to Figure 4.1, where the KLEE file, the input file to KLEE, acts as a driver of the two other files.



Source file        Mutant file
        \            /
      KLEE file (driver)

Figure 4.1: Overview of how the generated files are set up

The goal is to conclude whether the behaviour is the same for the mutant and the source. The function is therefore called in both the mutant file and the source file in order to determine whether they differ in behaviour when given the same input. The files illustrated in Figure 4.1 have the following content:

• Source file - Contains the original code without any alterations. An example can be seen in Figure 4.2.

• Mutant file - Contains the code obtained after inserting a mutation into the source file. An example can be seen in Figure 4.3, where the mutation is marked.

• KLEE file - Contains the code needed to call the function where the mutation occurred, in both the Source file and the Mutant file, and determine whether they differ. An example can be seen in Figure 4.4.

int decrement_example(int x) {
    for (int i = 0; i < 10; i++) {
        x = x - 1;
    }
    return x;
}

Figure 4.2: Source file generated for symbolic execution.

The mutated version will, however, contain variables and functions with the same names as in the source file, which violates the One Definition Rule1. This can be remedied by changing the names of all the classes, functions and variables (all the declarations in the file) in the mutated version. This also changes the names of all local variables, which is completely redundant but does not affect the functionality of the code.

The KLEE file, which has the task of comparing the function in the two other files, is presented in Figure 4.4. In order to decide whether the two files behave differently, the KLEE file imports the two other files and calls the same function where the mutation has taken place. The function parameters are then made symbolic by utilizing the KLEE library; this is done on lines 9 and 10 in Figure 4.4. After that, in order to make sure that the variables hold the same symbolic value, they are assumed to be equal on line 11 in Figure 4.4 using klee_assume (another function available in the KLEE library, imported on line 1 in the same figure). The function calls are made inside an if-statement (line 13 in Figure 4.4), making

1 One Definition Rule (ODR) (visited 2019-05-04) - https://en.cppreference.com/w/cpp/language/definition



int m_decrement_example(int m_x) {
    for (int m_i = 0; m_i != 10; m_i++) {
        m_x = m_x - 1;
    }
    return m_x;
}

Figure 4.3: Mutated file generated for symbolic execution

KLEE evaluate paths where the statement is true and where it is not. This is done by creating constraints for the respective paths (see section 3.3). When all paths in the program have been evaluated, the mutant can be determined to be equivalent if the assertion was never violated, and not equivalent if it was.

 1  #include <klee/klee.h>
 2  #include <assert.h>
 3  #include "sourcefile.cpp"
 4  #include "mutantfile.cpp"
 5
 6  int main() {
 7      type input;
 8      type m_input;
 9      klee_make_symbolic(&input, sizeof(type), "input");
10      klee_make_symbolic(&m_input, sizeof(type), "m_input");
11      klee_assume(input == m_input);
12
13      if (decrement_example(input) == m_decrement_example(m_input)) {
14          return 1;
15      }
16      else {
17          klee_assert(0);
18          return 0;
19      }
20  }

Figure 4.4: Structure for the generated file prepared for symbolic execution.

4.5 Running KLEE on the surviving mutants

After mutation testing has been conducted according to section 4.3, the surviving mutants are used as input for the next step, which is to decide whether said mutants are equivalent or not. The previous mutation testing also serves as a baseline when analyzing how big an impact the detection of equivalent mutants has on the mutation score.

The detection of equivalent mutants is done by utilizing symbolic execution and constraint solving; to perform this, a set of files is generated as described in section 4.4. After the files have been generated, they are compiled to LLVM bitcode in order to use them as input for KLEE. Along with the LLVM bitcode, KLEE can take different configuration parameters as input, for example time constraints limiting how long symbolic execution may run (see Figure 4.6).



4.6 Presenting result

After KLEE has successfully analyzed all the surviving mutants, the obtained results are put together in three different tables, one for each of the three steps.

Table 4.4: Example of a table used to present results from running Dextool Mutate.

Program             Mutants    Time out    Killed    Alive
program_name.cpp         20           2        10        8

Table 4.4 is the template table for presenting the results received from the mutation testing done with Dextool Mutate. (The numbers in Tables 4.4, 4.5 and 4.6 are arbitrarily chosen in order to show how the results from the real code examples in chapter 5 will be presented.) An example of how the table is used can be seen in section 5.1. The template table consists of the total number of mutants, the number of killed mutants and the number of alive mutants. The table also contains information about how many mutants timed out during testing, which can happen when a mutant takes considerably more time to test than the original code. This can occur when, for example, an exit condition in a loop is mutated so that the loop becomes infinite, making the test suite never finish, which means that the mutant can be marked neither alive nor killed. When this occurs the mutant is marked timed out and considered killed.

Table 4.5: Example of a table used to present results from running symbolic execution.

Program             Alive    Unknown    Time out    Not-equivalent    Equivalent
program_name.cpp        8          2           1                 3             2

Table 4.5 is a template table containing the information obtained from analyzing the surviving mutants using symbolic execution and constraint solving. In this case, the time out column is not caused by mutants taking longer than the original code; it is instead connected to the time constraint, set when executing KLEE, being violated and the execution therefore being halted. The column marked unknown is used for the cases where KLEE fails to run, for example if errors occur during compilation of the LLVM bitcode. This can happen if a compiler other than Clang 3.4 is used when conducting the mutation testing. Given that these two compilations occur at different points in the process, one when conducting mutation testing and the other when compiling LLVM bitcode, the two compilers might generate different warnings and errors. An example of this is presented in section 4.8.

Table 4.6: Example of a table used when calculating mutation score before and after symbolic execution.

Program             Score     Adjusted score    Change in pp
program_name.cpp    50.00%            55.56%        +5.56 pp

Table 4.6 is a template table demonstrating how the mutation score is calculated, before and after the equivalence check, according to the formula presented in section 3.1. The first column contains the score when all alive mutants are assumed to be non-equivalent, as their status is unknown to the user at that point in time. The second column contains the score after the equivalence check, where the number of equivalent mutants has been subtracted from the total. The last column is the difference between the first calculated score and the score after the KLEE execution.



4.7 Void functions

Figure 4.4 showed the generated file that KLEE uses in order to differentiate the two functions symbolically. This is only possible if the functions actually return a value that KLEE can compare, i.e. if the function is pure.

In order to verify that mutants are equivalent, we can compare the parameters after execution instead of the return value shown in Figure 4.4. The comparison can be seen in Figure 4.5 on line 16, after the functions have been called on lines 13 and 14. Typically, this is done on pointers when functions modify a value, for example clearing a buffer or similar behaviours that can be checked after execution. Given that the function is pure (see section 3.8), or that the parameters can be checked, it is guaranteed that KLEE can be used to differentiate the two functions in the same fashion as before. By utilizing KLEE and its workflow, we can add a constraint which affects the query KLEE makes to the constraint solver; this can be seen on line 11 in Figure 4.5. By letting KLEE execute the two functions symbolically as before, but with the constraint that the parameters to the functions are assumed to be equal, we can let KLEE traverse all paths again and check, in the same if-statement as before, whether the parameters were altered. If KLEE manages to find a difference between them, we know for sure that the mutant behaved differently and is thus not equivalent. The same behaviour can be checked for functions that return values as well as have references as parameters.

 1  #include <klee/klee.h>
 2  #include <assert.h>
 3  #include "sourcefile.cpp"
 4  #include "mutantfile.cpp"
 5
 6  int main() {
 7      type input;
 8      type m_input;
 9      klee_make_symbolic(&input, sizeof(type), "input");
10      klee_make_symbolic(&m_input, sizeof(type), "m_input");
11      klee_assume(input == m_input);
12
13      void_example(input);
14      m_void_example(m_input);
15
16      if (input == m_input) {
17          return 1;
18      }
19      else {
20          klee_assert(0);
21          return 0;
22      }
23  }

Figure 4.5: Structure for the generated file prepared for symbolic execution of a void function.



4.8 Faulty execution runs and unknown cases

When either the phase of compiling the generated files or the execution of KLEE fails, there is still important information that can be extracted regarding the specific mutants. This section describes the method of handling such examples and the reasons behind their occurrence in the first place.

Functions with side-effects

As described in section 3.2, a mutant is equivalent if the behaviour is exactly the same before and after the mutation. This means that a mutant is guaranteed to be equivalent only if all side-effects of the tested function are taken into consideration. If a mutation occurred in a function with many dependencies outside the scope of the function, it was hard to guarantee that all side-effects were checked when preparing the files used by KLEE.

Since it was too difficult to decide whether all side-effects were detected, it was impossible to flag a mutant as equivalent in these cases, as the mutant could differ from the source at a level where it was not detected (for example by printing differently).

This could happen if we only checked return values and input parameters while the function also alters the values of global variables. The only case in which we could guarantee that the marking of a mutant was correct was non-equivalence: only one difference had to be found in order for KLEE to detect non-equivalence when comparing the side-effects of a function (usually global variables).

Path explosion

Path explosion is, as mentioned in section 3.7, a non-trivial problem when using symbolic execution. In order to avoid executions becoming unusable because of the resources they consume (mostly time), a mitigation can be applied. By using options in KLEE, we can make sure that execution is halted when certain time limits are met. This limits the impact of path explosion significantly, so that the testing can continue with the next mutant instead of being stuck on the current one, investigating every path among too many. Figure 4.6 shows this option being used, where KLEE is instructed to halt execution after x seconds have passed.

klee -max-time=x file_to_run_symbolic_execution_on

Figure 4.6: Option flags when running KLEE on the generated files.

As shown earlier in Figure 4.4, the entry point from which KLEE starts symbolic execution is the function or method where the mutation has occurred. Calling only the function or method of interest reduces the number of paths KLEE needs to investigate in order to check whether the mutant is equivalent. This can be compared with the methods presented earlier in chapter 2, where symbolic execution is often applied to entire programs in order to explore all existing paths, rather than only those of interest to us.

Timed out or halted executions

When KLEE reports a time-out or halts an execution, it still presents relevant information about what it did before halting: the number of paths completed, the number of tests generated and the total number of instructions carried out during the execution. This information can be presented for mutants marked as timed out in order to provide more details about the execution run, for example in programs where the total number of paths is known, or where the number of paths KLEE traversed is of interest.

Table 4.7: Example of time out data for example_program.cpp

Time per mutant (s)   # Paths   # Instructions   Number of mutants tested
1                     10        25 000           7
10                    50        250 000          7
60                    250       2 500 000        7
300                   1 000     250 000 000      7

Table 4.7 is an example of what the data gathered for a timed-out mutant could look like. The first column is the amount of time KLEE was given before execution was halted, the second is the number of paths KLEE completed, and the third is the number of instructions KLEE executed. The fourth column is the number of mutants tested to obtain the data; each value is the average over those mutants. For example, the table illustrates a made-up case where 7 mutants were iterated over, completing an average of 10 paths and 25 000 instructions when KLEE was allowed to execute for 1 second before halting. The data aims to answer whether the approach is scalable by analysing, for a set of time constraints, trends in how many paths and instructions KLEE explores. Only averages are investigated, in order to smooth out flaky results and expose trends between the different time constraints.

Compilation errors and unexpected crashes

There were scenarios where KLEE was unable to conduct symbolic execution on the generated files due to compilation errors. As a result, a mutant could be marked alive in one phase and unknown in another, and therefore needed to be handled differently.

Compiling the code with a different compiler could cause problems when conducting symbolic execution. Since KLEE was released as a stable version for Clang 3.4, the code needed to be compiled into LLVM bitcode using that specific compiler. Certain expressions generate errors when compiled with one compiler but only warnings with another (Figure 4.7 and Figure 4.8 illustrate such a scenario).

This could cause mutants marked as alive in the mutation testing phase not to be investigated in the symbolic execution phase. In such cases, no result could be obtained regarding the equivalence status of the alive mutant. The same holds for cases where KLEE crashed or halted execution due to errors. These mutants were all marked as unknown, since the reason for their failure was uncertain.

Another case where KLEE failed to provide concrete information was when it had to make calls outside the compiled LLVM bitcode provided in the execution run. KLEE operates on LLVM bitcode (see section 3.5) in order to perform symbolic execution, and calls made outside the compiled bitcode were reported as external. There were cases where KLEE could continue the execution after such a call was made, and cases where it could not. The results provided by KLEE when performing external calls were therefore discarded as faulty or unknown, as it cannot be ensured that the equivalence status is correct.


// mutated version
switch (!x) {
    case 0:
        module = Off;
        break;
    case 1:
        module = Ready;
        break;
    case 2:
        module = Steady;
        break;
    case 3:
        module = Go;
        break;
    default:
        break;
}

Figure 4.7: Mutated switch statement

When compiling the code in Figure 4.7 using Clang 3.4, compilation fails with the errors and warnings shown in Figure 4.9. When compiling the exact same file with a newer version, Clang 6.0, the file compiles without errors; however, the warnings are still present. This makes it possible for a mutant of this kind to pass mutation testing while failing to compile when prepared for KLEE.

// original version
switch (x) {
    case 0:
        module = Off;
        break;
    case 1:
        module = Ready;
        break;
    case 2:
        module = Steady;
        break;
    case 3:
        module = Go;
        break;
    default:
        break;
}

Figure 4.8: Original switch statement


warning: switch condition has boolean value
    switch (!x) {
            ^~~~
warning: overflow converting case value to switch condition type
    (3 to 1) [-Wswitch]
    case 3:
    ^
warning: overflow converting case value to switch condition type
    (2 to 0) [-Wswitch]
    case 2:
    ^
error: duplicate case value '0'
note: previous case defined here
    case 0:
    ^
error: duplicate case value '1'
    case 3:
    ^
note: previous case defined here
    case 1:

Figure 4.9: Errors and warnings from compilation

4.9 Tests for small code examples

When conducting mutation testing, as described earlier in section 4.3, a test suite is needed to check the status of each mutation (see section 3.1 for details). The code used in the mutation testing phase can be seen in Appendix A; it consists of simple examples where the received results could easily be verified manually if needed. The tests used for mutation testing can be seen in Tables 4.8-4.12. The test cases were derived manually and do not fulfil any strict requirements, since they are only used to provide a test suite that kills the most trivial mutants generated during mutation testing.

Table 4.8: Test values for decrement_example.cpp

ID   Input   Expected result
1    5       -5
2    0       -10
3    10      0

In accordance with the work conducted by Schuler and Zeller (see section 2.4), a test suite that detects a high percentage of non-equivalent mutants causes a higher proportion of the remaining live mutants to be equivalent [23]. This is not the intended situation in this thesis, since detection of non-equivalent mutants is also of interest. The test suites in Tables 4.8-4.12 were therefore created with the intention of not reaching 100 percent accuracy in detecting non-equivalent mutants: the aim was a test suite good enough to detect the most trivial mutants, yet still unable to detect some of the edge cases.

As shown earlier by Papadakis and Malevris (see section 2.2), symbolic execution can be used to automatically generate test cases for mutants [19]. Since the aim of this thesis is to improve the mutation score of a test suite, as described earlier in section 1.3, non-equivalent mutants surviving the mutation testing phase also serve a secondary purpose.
