Efficient Generation of Mutants for Testing Execution Time

Academic year: 2021
School of Innovation, Design and Engineering (IDT) Division of Software Engineering

Efficient Generation of Mutants for Testing Execution Time

Software Engineering

Master Thesis

by

Mohammed Z. B. Abuayyash

Mohammed M. Z. Abusamaan

mah18005@student.mdh.se

man18071@student.mdh.se

Division of Software Engineering

Mälardalen University, SE-721 23 Västerås, Sweden


Project name: Efficient Generation of Mutants for Testing Execution Time
Date: Jun-2020
Author(s): Mohammed Z. B. Abuayyash, Mohammed M. Z. Abusamaan
Supervisor(s): Mehrdad Saadatmand, Pasqualina Potena
Reviewer: -
Examiner: Björn Lisper
Comprising: 15 ECTS credits


Acknowledgements

We want to thank everyone who played a role in our thesis, especially our committee members, each of whom provided gentle encouragement and guidance throughout the thesis process. Thank you all for your steadfast support.


Abstract

In this thesis, we specifically focus on testing non-functional properties. We target the Worst-Case Execution Time (WCET), which is very important for real-time tasks. This thesis applies the concept of targeted mutation, where the mutations are applied to the parts of the code that are most likely to significantly affect the execution time. Moreover, program slicing is used to direct the mutations to the code parts that are likely to have the strongest influence on execution time. The main contribution of the thesis is to implement the method for the experimental evaluation of targeted mutation testing.

This thesis confirms the relevance of the concept of targeted mutation by showing that targeted mutation enables the design of effective test suites for WCET estimation and improves the efficiency of this process by reducing the number of mutants. To evaluate the method experimentally, we used two C benchmarks from the Mälardalen University benchmark suite. The first benchmark, janne_complex, consists of 20 lines of code and 10 test cases. It contains two loops, where the maximum number of iterations of the inner loop depends on the current iteration of the outer loop; the results correspond to what the Janne flow analysis should produce. The second benchmark, ludcmp, consists of 50 lines of code and 9 test cases. It solves simultaneous linear equations by LU decomposition; it contains two arrays for input and one array for output, and the number of equations is determined by the variable n.

The LOC of both benchmarks decreased after slicing, and each test case was applied ten times against each mutant to eliminate the effects of caching and of other currently running applications, since our experiments depend on estimation of the WCET. The thesis confirms the relevance of the concept of targeted mutation by showing that targeted mutation (i) encourages the design of effective test suites for WCET estimation and (ii) improves the efficiency of this process by reducing the number of mutants.


Table of Contents

Acknowledgements

Abstract

List of Figures

List of Tables

List of Abbreviations

1 Introduction
  1.1 Motivation
  1.2 Problem Formulation
  1.3 Related Work
    1.3.1 WCET
    1.3.2 Mutation Testing
2 Methodology
  2.1 The methodology of slicing
  2.2 The methodology of generating mutants
  2.3 Targeted Mutation Testing
3 Experimental Study
  3.1 Test-bed & Instrumentation
    3.1.1 SWEET
    3.1.2 PROTEUM
  3.2 MDH Benchmarks
  3.3 Experimental Design
  3.4 Statistical Analysis
  3.5 Limitations
4 Results and Discussion
  4.1 Results
    4.1.1 Results for RQ
  4.2 Discussion
5 Validity Threats
  5.1 Internal Validity
  5.2 External Validity
6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work
7 Appendix
  7.1 Benchmarks’ Source Code
    7.1.1 Janne_complex
  7.2 Benchmarks’ Results


List of Figures

2.1 The original program
2.2 Sliced program
2.3 Data dependence
2.4 Control dependence
2.5 C Program
2.6 C Program control flow graph
2.7 C Program (Sliced)
2.8 Targeted mutation workflow diagram
3.1 Mutant-A ETs in round 1
3.2 Mutant-A ETs in round 2
7.1 The number of mutants in Traditional & Targeted Mutation Testing for janne_complex.c
7.2 The number of mutants in Traditional & Targeted Mutation Testing for ludcmp.c


List of Tables

3.1 HP ZBook 15 G4 categories
3.2 The number of mutants and TCs for the MDH benchmarks
4.1 ns.c statistics before slicing
4.2 janne_complex.c statistics after slicing
4.3 ns.c statistics before slicing


List of Abbreviations

CFG: Control Flow Graph
Err: Error
ET: Execution Time
GUI: Graphical User Interface
I/O: Input/Output
Inf: Infinity
LOC: Line Of Code
LU: Lower-Upper decomposition
MDH: Mälardalens Högskola
MININT: Minimum Windows NT (New Technology; Microsoft)
OOS: Out-Of-Slicing
R,U,S: Real, User, System
Ref: Reference
TCs: Test Cases
UNIX: Uniplexed Information and Computer Systems
WCET: Worst-Case Execution Time


Chapter 1

Introduction

In the last few years, the evolution in software development and information technology has proliferated, increasing the dependency on real-time and software systems in most aspects of our lives. Many embedded systems are real-time systems, and it is essential to have proper testing practices for these systems (comprising essential and critical systems involving people, organizations, or any other discipline).

Embedded systems are now the basis of many information environments in our daily life. The Non-Functional Properties (NFP) of such systems are becoming very important. NFPs include security, performance, reliability, availability, robustness, efficiency, scalability, and fault-tolerance. Good methods are required for verifying these properties. In particular, it is important to have good testing practices for this purpose. Moreover, efficient testing for NFPs relies on the existence of effective test suites that target these properties [1].

1.1 Motivation

Mutation testing is one way to create effective test suites [1]. Mutation analysis was introduced by DeMillo et al. [2]. In this thesis, we specifically focus on testing non-functional properties. We target the Worst-Case Execution Time (WCET), which is considered an important property for real-time tasks.

This thesis is based on previous work [1] that introduces the concept of targeted mutation, where the mutations are applied to the parts of the code that are most likely to significantly affect the execution time. Program slicing is used to direct the mutations to these parts of the code. One of the main issues associated with traditional mutation testing is high consumption of resources (like energy, memory, or labour).


Mutation analysis, although a strong technique for testing [3], can also be computationally expensive.

Mutation testing injects changes (“mutations”) by modifying the code of the system under test. The effectiveness of the test cases (their ability to “kill the mutants”) can then be checked by running them against the changed programs (“mutants”). The mutants should be distinguishable from the original program with respect to the property under investigation (e.g., for the execution time, a mutant with the same execution time as the original program will be useless for this purpose).

Since resource properties like execution time (or energy or memory) are often more or less strongly associated with different parts of the code (e.g., memory allocation statements for memory consumption), the work in [1] (the targeted mutation concept) proposes to mutate the parts of the code where a change will likely change the property under study. The aim is to reduce the number of indistinguishable (or “equivalent”) mutants, thereby creating a set of mutants that is more adequate for creating effective test suites. The work in [1] specifically targets the WCET and proposes a form of targeted mutation analysis where the mutations are applied to the parts of the code that are most likely to significantly affect the execution time. Moreover, [1] also outlines a method for experimental evaluation of the technique, where the targeted mutations are used to optimize test suites for WCET estimation.

This thesis developed and extended the experimental method and prototype framework of [1]. The framework mainly comprises (i) the static analysis tool SWEET [4], for identifying the parts of the code with a strong influence on the execution time, and (ii) the mutation tool Proteum [5], for injecting changes in the identified parts of the code. To experiment with the method, we used C benchmarks of varying code size and functionality.

The thesis confirms the relevance of the concept of targeted mutation by showing that targeted mutation (i) encourages the design of effective test suites for WCET estimation and (ii) improves the efficiency of this process by reducing the number of mutants.

The thesis is organized as follows: In Chapter 1 we describe the experimental evaluation of the targeted mutation concept and discuss the related work; in Chapter 2 we define the methodology of combining static analysis and mutation testing; in Chapter 3 we describe the application of the method to the benchmarks; in Chapter 4 we discuss the achieved results; Chapter 5 discusses threats to validity; in Chapter 6 conclusions and future work are presented; finally, Chapter 7 contains the appendix.


1.2 Problem Formulation

Mutation testing is a strong technique, but it can also be computationally expensive and extremely time and resource consuming (e.g., energy, memory, and labour). The thesis’s main contribution is to implement the method for the experimental evaluation of the targeted mutation testing presented in [1]. The work in [1] explicitly targets the WCET and proposes a form of targeted mutation analysis where the mutations are applied to the parts of the code that are most likely to significantly affect the execution time.

The experimental procedure aims to confirm or refute the hypothesis that targeted mutation (i) encourages the design of effective test suites for WCET estimation and (ii) improves the efficiency of this process by reducing the number of mutants.

The Research Question of the experimental procedure is:

RQ: How efficient is targeted mutation for reducing the number of useless mutants?

To answer the RQ, we compare traditional mutation analysis to targeted mutation analysis with respect to the number of mutants (generated, stillborn, and equivalent) for the WCET estimation. We measure the efficiency of targeted mutation by the number of useless mutants produced during the WCET estimation.

The experimental procedure consists of the following high-level steps:

1. Generation of test cases for the original program (covering 100% of code).

2. Annotation of the test case execution times, to compare the results with and without slicing.

3. Application of Static Backwards Program Slicing [6].

4. Generation of mutants for the original program.

5. Identification of mutants related to the sliced code.

6. Estimation of the execution time of the mutants.


1.3 Related Work

1.3.1 WCET

In real-time systems, knowledge about the maximum execution time of programs is of utmost importance [7]. In [7], problems in the calculation of the maximum execution time (MAXT, MAximum Execution Time) are discussed.

The WCET estimation depends both on the program flow (like loop iterations and function calls) and on hardware factors like caches and pipelines [8]. The work in [8] presents a method for representing program flow information and shows that execution time estimates can be made tighter by using flow information.

Other papers address further challenges for WCET estimation. For example, the paper [9] proposes how to address the main challenges related to execution-time modelling of the hardware and the path problem, which prevents capturing the WCET by end-to-end measurements due to limits in computational complexity.

An extensive overview of WCET estimation methods and tools can be found in [10]. The exploitation of knowledge from other domains has also been investigated. For example, the paper [11] investigates the application of Extreme Value Theory (EVT), historically used in domains such as finance to model worst-case events (e.g., major stock market incidents), to timing analysis.

WCET and BCET are clarified by Wilhelm et al. [10]. In this article, they mainly concentrate on determining the upper bound of the ET for real-time systems. As the execution time is affected by path and control flow, different approaches are suggested for determining the WCET, such as data-dependent control flow and context-dependent execution times. They classify the approaches into two main collections: static methods and measurement-based methods. The major difference between the two classifications is whether or not a program is executed for the WCET estimation. Besides, they study and discuss a set of practical tools used for estimating the WCET of real-time systems, such as timing-analysis tools.

1.3.2 Mutation Testing

DeMillo et al. [2] introduced the concept of Mutation Analysis. More specifically, it was observed that systematically pursuing test data which distinguishes errors from a given class of errors also yields "advice" to be used in generating test data for similar programs.


In [12], performance and effectiveness problems of mutation testing techniques are discussed. In fact, mutation analysis, although a strong technique for testing [1], can also be computationally expensive. The main reason is that many variants of the test program (“mutants”) must be repeatedly executed. Selective mutation is a way to approximate mutation testing, saving execution time by reducing the number of mutants that must be executed [13].

The testing of real-time applications has the problem that the response time depends on the execution order of concurrent tasks [14]. Model-based mutation testing has been investigated to determine test case execution orders. In [14], a method using heuristic-driven simulation has been suggested to automatically generate such test cases for dynamic real-time systems.

Approaches for reducing the number of mutants (to be executed) have been introduced (e.g., [15]). The approaches usually follow one of three strategies: do fewer, do smarter, or do faster. The do fewer approaches aim to run fewer mutants without incurring intolerable information loss, while the do smarter approaches seek to distribute the computational expense over executions by retaining state information between runs. Finally, the do faster approaches aim at generating and running each mutant as quickly as possible [15]. In [15], a do fewer approach, using a single mutagenic operator, has been introduced.

Several other approaches have been introduced, e.g., for (i) deleting statements [16]; (ii) mutation-based real-time test case generation based on symbolic bounded model checking techniques and incremental solving [17]; (iii) experimenting with the cost and effectiveness of C mutation operators [18]; and (iv) deriving rules to avoid the generation of useless mutants [19].

To reduce the number of equivalent mutants produced, program slicing has also been used in [6]. The slicing was achieved by applying the Static Backward Slicing methodology. Although mutation testing is applied to detect program faults, human effort is still needed to detect the equivalent mutants; slicing helps detect the equivalent mutants in order to reduce their number.

Later, Lisper et al. (2017) [1] targeted the Worst-Case Execution Time (WCET) by slicing a program to identify the main parts of the code that have a significant influence on the execution time. Afterwards, only these targeted parts are injected with mutants. The targeted mutation method has been shown to optimise test suites for WCET estimation. In contrast to existing approaches, the primary novelty of this approach is that the mutators are applied only to certain slices of the software, which are relevant to the non-functional property of interest that we aim to test.


Chapter 2

Methodology

This thesis aims to investigate and apply the concept of targeted mutation testing as introduced by Lisper et al. (2017) [1]. As mutation testing generates a massive number of mutants, the mutations are only applied to the parts of the code that are most likely to significantly affect the execution time. Therefore, the cost and expenses of testing, such as time, effort, and energy resources, will be reduced. Furthermore, program slicing is utilised to direct the mutations to the parts of the code that are anticipated to have the strongest influence on execution time.

2.1 The methodology of slicing

Slicing is the process of simplifying a program by focusing on specific aspects of the source code; the parts of the code that do not affect those aspects are deleted [20]. There are several techniques for slicing (e.g., Dynamic Slicing, Static Forward Slicing, Static Backward Slicing).

The most important property of Static Backwards Program Slicing is that it preserves the effect of the original program with respect to the variables that have been chosen as slicing points. In order to preserve this effect, we have to take the data and control dependence into consideration. Figure 2.1 shows a simple C program before executing the static backwards program slicing process, while Figure 2.2 shows the same program after executing Static Backwards Program Slicing.


#include <stdio.h>
int main(void)
{
    int x = 45;
    int y = 3;
    int z = x - 2;
    int v = x - z;
    z = x / y;
    printf("%d", z);
}

FIGURE 2.1: The original program

#include <stdio.h>
int main(void)
{
    int x = 45;
    int y = 3;
    int z = x / y;
    printf("%d", z);
}

FIGURE 2.2: Sliced program

The Data Dependence represents the dependence between two statements in the source code of the program and determines the data flow. Figure 2.3 shows a piece of C code that calculates the average of a set of numbers stored in an array. In this example, the primary dependence exists between S5 and S6; to ensure that the sliced program preserves the same effect as the original program, both statements should be in the same slice. From this perspective, the importance of Static Backwards Program Slicing appears.

The data dependence guarantees that the results of the sliced program will be the same as those of the original program with respect to the data flow, since all variables that affect the program results will not be affected by slicing.

The Control Dependence represents the dependence between a statement and a predicate; the predicate value determines whether the statements under its control execute. Figure 2.4 shows the control dependence in simple C code: the execution of statement S7 depends on the result of the predicate at statement S6. If the predicate at S6 is True, then statement S7 will be executed; if it is False, statement S7 will not be executed.


#include <stdio.h>

int main()
{
    int age[5] = {22, 25, 48, 32, 35};        /* S1 */
    int sum = 0;                              /* S2 */
    int avg = 1;                              /* S3 */
    int i = 0;                                /* S4 */
    for (i = 0; i <= 4; i++)
    {
        sum = sum + age[i];                   /* S5 */
    }
    avg = sum / 5;                            /* S6 */
    printf("the average ages is: %d", avg);   /* S7 */
    return 0;
}

FIGURE 2.3: Data dependence

#include <stdio.h>
#include <stdlib.h>
int main() {
    int grades[5] = {50, 84, 30, 62, 34};     /* S1 */
    int i = 0;                                /* S2 */
    int sum = 0;                              /* S3 */
    int avg = 1;                              /* S4 */
    while (i < 5) {                           /* S5 */
        if (grades[i] < 45) {                 /* S6 */
            grades[i] = 45;                   /* S7 */
        }
        sum += grades[i];                     /* S8 */
        i++;
    }
    avg = sum / 5;                            /* S9 */
    printf("%d", sum);                        /* S10 */
    return 0;
}

FIGURE 2.4: Control dependence

A slice consists of all the "statements, conditions, and inputs to the program that can possibly affect the values of the variables in the slicing criterion in the respective program points" [21], concerning the data and control dependence. The static slicing criterion consists of a pair (i, S), where i represents a node in the CFG and S is a subset of the program’s variables.

The process of Static Backward Program Slicing starts with building a Control Flow Graph for the program under test. This step is followed directly by setting the relevant variables of the criterion statements to the subset of variables S. Finally, the relevant variables of all other statements are set to the empty set.


Figure 2.5 shows a simple C program before applying Static Backward Program Slicing. The initial step of slicing is to build a control flow graph of the program, without applying any operation to the code. The control flow graph is shown in Figure 2.6.

In order to find the directly relevant variables, the following steps are applied for each edge i-S in the control flow graph:

1. If i does not define a relevant variable of S, add that variable to i’s relevant variables;
2. If i does define a relevant variable of S, add the variables referenced by i to i’s relevant variables;
3. In the presence of loops, iterate until no change occurs.

After the previous steps are done, the indirectly relevant variables process starts; for each branch statement b in the control flow graph:

A. If i exists in the slice and i is control dependent on branch statement b, add b to the slice;
B. Find the directly relevant variables of the criterion (b, Ref(b)), where Ref(b) is the set of variables referenced by the branch statement;
C. Repeat the previous steps for the next iteration.

#include <stdio.h>
#include <stdlib.h>
int main() {
    int mod = 0;                /* S1 */
    int sum = 0;                /* S2 */
    int count = 0;              /* S3 */
    while (count <= 5) {        /* S4 */
        sum += count;           /* S5 */
        mod = sum % i;          /* S6 */
        count++;                /* S7 */
    }
    printf("%d", sum);          /* S8 */
    return 0;
}

FIGURE 2.5: C Program


FIGURE 2.6: C Program control flow graph

All irrelevant variables (directly or indirectly) are removed from the sliced source code; the resultant program (the new slice) will have the same output as the original program. The sliced source code is shown in Figure 2.7.

#include <stdio.h>
#include <stdlib.h>
int main() {
    int sum = 0;                /* S2 */
    int count = 0;              /* S3 */
    while (count <= 5) {        /* S4 */
        sum += count;           /* S5 */
        count++;                /* S7 */
    }
    return 0;
}

FIGURE 2.7: C Program (Sliced)


The slicing process generates a new version of the original program that behaves the same as the original program, so the execution time, memory consumption, and other resources needed to execute it will be less. The methodologies used during slicing (Static Backward Program Slicing, respecting data dependence and control dependence) preserve the behaviour of the sliced program as in the original program. In addition, the number of mutants generated for the sliced program will be less than the number generated for the original program without slicing, which reduces the time needed for the mutation testing process.

2.2 The methodology of generating mutants

The process of applying changes and modifying the behaviour of the program under testing is called mutation testing. Each change applied to the code is known as a mutant. During the testing process, if a TC detects the change as a fault, the mutant is called killed; otherwise, it is called alive. If any alive mutants are detected, developers must improve the TCs to kill them. The purposes of the mutation testing methodology are: (1) to help testers develop an effective testing methodology; (2) to find the weaknesses in the test data used to test the program; (3) to find the parts of the program that are rarely or never executed when executing the original program.

2.3 Targeted Mutation Testing

Targeted mutation testing is a combination of slicing and mutation testing. In other words, to achieve the goals of targeted mutation, we use the results generated by Static Backward Program Slicing as input to the mutation testing process, generating mutants only for the sliced parts of the code, as described in Figure 2.8.

The methodology of targeted mutation testing uses slicing to reduce the number of equivalent mutants, where the slicing criterion is formed from the property that the test suite is supposed to test. The goal of slicing is that the sliced code contains all the parts that affect the property under testing; thus, it suffices to mutate only the statements in the slice. For WCET estimation, we believe that the program flow is the chief factor in execution time variation. Therefore, the slicing criterion targets the conditions and control flow in the program to find the parts that affect the ET.

In order to perform targeted mutation testing, we follow these steps for each benchmark:


1. Prepare a set of test cases that cover 100% of the original code.
2. Generate mutants to be injected into the code.
3. Slice the original code using static backwards program slicing.
4. Elect the mutants that fall in the sliced code.
5. Apply the test cases to the mutants.
6. Classify the mutants with regard to the property under testing, the ET.


Chapter 3

Experimental Study

In this thesis, we apply the concept of targeted mutation to two C benchmarks. The first benchmark is the janne_complex code, which contains two loops, where the maximum number of iterations of the inner loop depends on the current iteration of the outer loop; the results correspond to what the Janne flow analysis should produce. The second one is ludcmp. This benchmark solves simultaneous linear equations by LU decomposition; it contains two arrays for input and one array for output, and the number of equations is determined by the variable n.

In this chapter, we provide an overview of the SWEET tool (for slicing) and Proteum (for generating mutants), which we used to apply the concept of targeted mutation.

3.1 Test-bed & Instrumentation

Our test bed is based on a computer with an Intel Core i7 processor. Table 3.1 shows the computer’s specification, including the processor and its cache levels.

3.1.1 SWEET

Primarily, we leverage the ability of SWEET to perform slicing, namely Static Backwards Program Slicing. As such, it slices programs with respect to their conditionals, which fulfils our need of supporting TM in detecting the parts of the program that influence the ET. SWEET analyses programs in ALF format. ALF refers to the Artist Flow Analysis language: "it is a general intermediate program language format, specially developed for flow analysis. ALF code can be generated from different sources, like C code and assembler code, and a number of translators are available" [22]. Moreover, SWEET performs flow analysis and generates flow facts, which are essential to finding a safe and tight WCET in the program under testing [4].

The SWEET tool has some restrictions that the user should be aware of:
1) Recursion is not supported;
2) Dynamic allocation of memory (such as malloc in C) is not supported;
3) Some libc functions, like printf, are not supported.

TABLE 3.1: HP ZBook 15 G4 categories

Computer Name: HP ZBook 15u G4
Operating system: Ubuntu 16.04
Processor: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
RAM: 8 Gigabytes
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K

3.1.2 PROTEUM

Proteum/IM 2.0 is a framework used to generate mutants; it provides mechanisms to estimate the adequacy of test case sets at the level of units and the interactions among the units of the program under testing. If the program under test, running against this adequate test set, behaves according to the specification, the confidence in the program’s reliability increases. Proteum provides several essential operations, such as test case handling, mutant handling, and adequacy analysis. Test case handling allows users to execute, include (or exclude), and enable (or disable) test cases; mutant creation, selection, execution, and analysis are mutant handling operations; and adequacy analysis is concerned with the mutation score and generating reports [5].

As mentioned in the previous paragraph, Proteum is a suitable framework for traditional and targeted mutation testing in terms of generating mutants automatically, but it also supports manual intervention by testers if needed. There are two mechanisms to generate mutants and test case sets: through the graphical user interface (GUI) or through command scripts. The GUI is mainly used to explore and learn the concepts of mutation testing and how to use Proteum/IM 2.0 in a controlled manner, whereas using scripts is more efficient for generating mutants and test cases, which leads advanced users to prefer it.

In this thesis, we used Proteum for generating mutants for the programs under testing, so the resultant number of mutants will be massive. To make the process of generating mutants more efficient, we executed the Proteum scripts twice for each program: once for generating mutants for the benchmark without slicing, and once for generating mutants for the same benchmark after slicing. The next section gives a detailed description of our experimental design and how we used SWEET and Proteum.

3.2 MDH Benchmarks

Before starting the targeted mutation experimentation, we added TCs to each chosen MDH benchmark, since the MDH benchmarks do not provide them. The TCs were generated randomly with regard to code coverage and variety. For each benchmark, we provide ten TCs to run through the benchmark’s main function. The main function runs the proposed functionality that the benchmark is designed for. There are two MDH benchmarks in our experiment:

1) janne_complex.c; 2) ludcmp.c.

The benchmarks are depicted and described in Appendix — section 7.1.

Benchmark        Mutants  Test-Cases  Total Mutants  LOC  Test Function
janne_complex.c  658      10          6580           20   complex(int a, int b)
ludcmp.c         2788     9           25092          50   ludcmp(int n, double eps)

TABLE 3.2: The number of mutants and TCs for the MDH benchmarks

As shown in Table 3.2, there are test functions that carry out the related benchmark’s TCs. For each benchmark, the main function acts as a test harness that runs the benchmark’s test function with the given TCs. As each benchmark has a specific number of mutants, the total number of mutant runs equals the number of mutants multiplied by the number of TCs. For instance, the benchmark ludcmp.c has 2788 mutants, which, multiplied by 9 (the number of TCs), gives 25092 mutant runs in total. Janne_complex and Ludcmp have different code sizes: Janne_complex has 20 LOC and Ludcmp has 50 LOC. The LOC number is reduced by the SWEET tool’s elimination of the undesired portions of the code that do not affect the loops or the ET.


3.3

Experimental Design

For each benchmark, some preliminary preparations were applied. We removed the main function before running the SWEET and Proteum scripts; this removal ensures that the Proteum scripts do not produce any changes in the main function, since it contains the TCs. Besides, we removed all included libraries, since SWEET has limitations with them. After the SWEET and Proteum scripts had run, we re-added the TCs and the removed libraries (include <stdio.h>) to each benchmark and mutant. At that point, each benchmark and its mutants were ready to be executed to derive their ETs.

All computer programs were shut down except the terminal, where we ran the MDH benchmarks and their mutants. Otherwise, the ET measurements would not be accurate, due to the instability of the CPU occupancy caused by other running processes. We executed each original benchmark and its mutants with all TCs 100 times. Afterward, we calculated the median ET to obtain the estimated ET for each benchmark and each mutant.

To obtain the ET results, we used the Time command (UNIX) method [23]. The Time command is used as a prefix to the command line, "%time Program", and offers three ET measures:

1) Real: the elapsed (wall-clock) time spent executing the program, including I/O and interruptions by other processes;

2) User: the CPU time spent executing the program, excluding I/O and interruptions by other processes;

3) System: the CPU time spent executing the program in kernel mode.

In the results, we report each of these three measures to express the ET.

In order to achieve consistency, the experiment was split into two stages. In the first stage, we extracted the ETs from the benchmarks and their mutants while applying the traditional mutation testing. In the second stage, we reused the ETs of the test runs for the same mutants, measured in the first stage, to apply the targeted mutation testing.

3.4

Statistical Analysis

We applied some statistical restrictions during the observation. We extracted the ET measurements and defined an interval to distinguish a killed mutant from an alive one. A mutant is considered killed if its ET is less than the original benchmark's ET minus 20% of it, or greater than the original benchmark's ET plus 20% of it. Otherwise, the mutant is considered alive (equivalent).


The individual ET measurements differ from one another, which traces back to the status of the CPU threads. Each CPU thread has a varying occupancy, which makes the ET measures irregular. We handle this problem by taking the median over 100 ETs for each mutant or program, to obtain a stable ET measure. We chose the median over the average because the median is a measure of central tendency on which outliers, i.e., skewed ETs, have less influence.

In Figures 3.1 & 3.2, we executed a mutant called "A" 100 times over two rounds, to check the stability of the measures when they are measured again. We then applied the median to the 100 ETs to estimate the Real, User, and System measures. We observed that the medians of the Real, User, and System measures in rounds 1 & 2 are extremely close. The estimated ETs for round 1 are (Real = 0.0755s, User = 0.054s, System = 0.02s) and for round 2 (Real = 0.0755s, User = 0.053s, System = 0.02s). We investigated many mutants in the same manner to confirm that the estimated ETs are accurate and trustworthy.

FIGURE 3.1: Mutant-A ETs in round 1


3.5

Limitations

Our experimental evaluation makes the following significant assumptions:

Foremost, we carried out only two benchmarks in our experiment, due to:

1) Lack of time: software testing in general is time-consuming, as it requires many executions of each program;

2) The experimentation time was confined within the thesis deadlines.

Second, the experiment is conducted on specific benchmarks from Mälardalen University. The benchmarks have low ET measures, which might affect the outcomes: it was hard to judge whether a change in a mutant's ET was caused by the CPU occupancy or by the actual running time. Consequently, some equivalent mutants might hide among the killed mutants. For future work, we highly recommend using benchmarks with long execution times.

Third, concerning targeted mutation, the parts of the code that might affect the ET are found where loops exist. However, due to the SWEET tool's limitations, the loop constructs that could be used in the experimentation are for and while loops; it is worth mentioning that SWEET handles unstructured loops formed with GOTO and similar constructs. Thus, the conclusions of this experiment are confined to these loop types and exclude other forms of iteration, such as recursion.


Chapter 4

Results and Discussion

4.1

Results

Here we compare the results of the proposed method with those of the traditional method. The following tables present statistical data from our experiment describing the numbers of mutants under traditional and targeted mutation testing. The mutants are categorized in the tables as Total, Infinity & Error, Out-Of-Slicing (OOS), Valid (mutants that do not crash when running), Killed, and Alive (Equivalent). Tables 4.1 & 4.2 present the results for the janne_complex benchmark, whereas Tables 4.3 & 4.4 present the results for the ludcmp benchmark; a more detailed statistical distribution of the mutants is listed in the appendix, Section 7.2.

TABLE 4.1: janne_complex.c mutant statistics under Traditional MT

Total            6580
Inf and Err      1800
Valid            4780
Killed (R,U,S)   140, 140, 182
Alive (R,U,S)    4640, 4640, 4598

TABLE 4.2: janne_complex.c mutant statistics under Targeted MT

Total               6580
Inf, Err and OOS    1880
Valid               4700
Killed (R,U,S)      140, 140, 182
Alive (R,U,S)       4560, 4560, 4560

The benchmark janne_complex has a total of 6580 generated mutants. Infinity, error, and out-of-slicing mutants are excluded from the count of valid mutants. After applying the targeted mutation, the total number of valid mutants decreased from 4780 to 4700. We noticed that the number of killed mutants did not change after the targeted mutation, while the number of alive mutants decreased due to the SWEET tool's elimination.

TABLE 4.3: ludcmp.c mutant statistics under Traditional MT

Total            25092
Inf and Err      242
Valid            24850
Killed (R,U,S)   1062, 1062, 1283
Alive (R,U,S)    23788, 23788, 23567

TABLE 4.4: ludcmp.c mutant statistics under Targeted MT

Total               25092
Inf, Err and OOS    6864
Valid               18228
Killed (R,U,S)      774, 774, 996
Alive (R,U,S)       17454, 17454, 17232

On the other hand, the benchmark ludcmp has a total of 25092 generated mutants. After applying the targeted mutation, the total number of valid mutants decreased from 24850 to 18228. We noticed that the numbers of both killed and alive mutants decreased significantly after the SWEET tool's elimination.

4.1.1

Results for RQ

How efficient is targeted mutation for reducing the number of useless mutants?

The efficiency of targeted mutation is measured by the number of mutants needed for the WCET estimation. When we applied traditional mutation testing to the benchmarks, we observed a massive number of generated mutants; in contrast, when we applied targeted mutation testing, the number of generated mutants was smaller. This meets our aim of reducing the number of alive mutants.

The number of generated mutants is decreased by targeted mutation testing, while the WCET mutants are still detected as in traditional mutation testing. We can therefore say that targeted mutation testing preserves the WCET mutants needed for estimating the execution time.

4.2

Discussion

The results of the experiment clearly support the efficiency of targeted mutation over traditional mutation. The primary goal of targeted mutation testing is to scale down the number of equivalent mutants without reducing the number of killed mutants, among which we expect to find the WCET. With regard to this goal, the benchmark Janne_complex endorses the efficiency of the approach: the number of killed mutants was preserved, while the number of alive (equivalent) mutants decreased.


It is noted that the benchmark Ludcmp (Tables 4.3 and 4.4) shows a large decrease in the number of alive mutants before and after slicing, and the same holds for the killed mutants. The ratio of killed to alive mutants before and after slicing is more or less even, as demonstrated below. In other words, the killed mutants decreased at the same rate as the alive mutants after the SWEET tool's elimination, which makes the results unacceptable with regard to the goal of targeted mutation testing.

Before slicing: killed / (alive + killed) = 1062 / (23788 + 1062)
The ratio of killed to alive mutants before slicing ≈ 0.043

After slicing: killed / (alive + killed) = 774 / (17454 + 774)
The ratio of killed to alive mutants after slicing ≈ 0.042

Hence the question: why did the killed mutants decrease so strongly? During the mutation testing process, we noticed that in some mutants, array indices had been replaced with a constant called MININT, the smallest negative integer. The C compiler does not flag a negative array index, because an index merely offsets a pointer; a negative index, however, points to a memory location outside the array bounds. Since the compiler cannot trace this defect, such mutants run "normally", which caused them to be classified as killed. Thus, to give a fair account of targeted mutation testing, we excluded those skewed killed mutants and counted them as error/defect mutants. With this correction, the number of killed mutants for the benchmark Ludcmp is unchanged by the SWEET tool's elimination. Mathematically, the ratio of killed to alive mutants after slicing is now greater than the ratio before, which indicates the stability of the killed mutants.

Before slicing: killed / (alive + killed) = 774 / (23788 + 774)
The ratio of killed to alive mutants before slicing ≈ 0.032

After slicing: killed / (alive + killed) = 774 / (17454 + 774)
The ratio of killed to alive mutants after slicing ≈ 0.043

For both benchmarks, the TCs discovered an equal number of killed mutants, which means the TCs are equally efficient for tracing the WCET. However, looking at the system-level measures in Section 7.2, we can clearly notice a slight difference in the total number of killed mutants per TC. The reason is probably the low execution-time measures at the system (kernel) level. In any case, some TCs killed more mutants than others; those can be considered good TCs, unlike the ones that killed similar, lower numbers of mutants. Hence, the tester can make a verdict on whether a test case is useful or not.

It is worth mentioning that Janne_complex keeps a similar code size after slicing, unlike Ludcmp: Janne_complex stays at 20 LOC, whereas Ludcmp decreased to approximately 40 LOC. The number of alive OOS mutants for Ludcmp is larger than for Janne_complex: 80 OOS mutants for Janne_complex versus 6622 for Ludcmp. Thus, the reduction of useless (OOS) mutants is apparently bigger for Ludcmp than for Janne_complex. The reasons are, first, the size of the code, where we expect more generated mutants to be OOS, and second, the code structure, where there is a higher chance of excluding mutants that do not fall into loops or time-related patterns.

In spite of the limitations, we have shown that the targeted mutation testing approach is efficient. Targeted mutation decreased the number of equivalent mutants in which the change happened in parts of the code that do not affect the execution time. As the number of mutants decreases, the time needed for testing is lower compared to the traditional method. As an outcome, we produce an efficient test-case suite for testing the worst-case execution time. We call a TC efficient to the extent that it can discover, i.e., kill, more mutants.


Chapter 5

Validity Threats

5.1

Internal Validity

The experimental evaluation is affected by different parameters (such as acceptance thresholds, cache, and CPU) that can change a program's behavior and its results, and thereby increase or decrease the measured execution time.

The acceptance thresholds matter for ascertaining whether we reached the desired outcome of our experimentation. After we had obtained the mutants' ETs, we set up acceptance limits: values outside these limits are considered killed mutants, and values inside them equivalent mutants. These limits, in turn, helped mask CPU and cache effects among the different mutants by confining the two types of mutants, killed and equivalent, to their intervals.

To limit cache and CPU effects on the execution time, we executed all mutants many times. As described in Chapter 3, we set up run-time conditions to obtain better measurements of the mutants' ETs. We shut down all running applications during the measurement of the ET to avoid cache and CPU effects. We also repeated the measurement of each mutant 100 times to gain an accurate measure of each mutant's ET. By using the median, we prevented ET outliers from skewing the result, as they would skew an average.

The evaluation of a high number of mutants can also affect the results. As we encountered in the experiment, some mutants are in fact defective without the compiler noticing it, which made them appear in the results. To handle this, we removed those mutants from the outcomes to obtain sane results. In later work, we should be aware of similar issues, because they might bear on the outcomes.

Executing the mutants was fatiguing, since we had a monumental number of them. Some of these mutants contain infinite loops, segmentation faults, and other execution errors, so before we started executing mutants and calculating execution times, we removed all infinite loops. Executing each test case against the mutants takes around 28 hours, and the execution process must be monitored. The mutants must therefore be executed continuously without interruptions, since executing the mutants and calculating the execution times takes a long time. To avoid fatigue in this experiment, the execution of the infinite-loop mutants was ruled out, so as to avoid interruptions that might affect the process.

5.2

External Validity

As we set up an interval to distinguish killed from equivalent mutants, the interval might need to change depending on the context. Using the same interval as in our experiment can be exaggerated in some cases: a 20 percent interval around the original ET is large for long-running programs whose ET may be hours or days. In our experiment, a 20 percent interval was appropriate, since the execution times of our programs are short. For programs with long execution times, the interval percentage should be smaller, in order to set appropriate bounds for the interval.

Moreover, the number of executed programs might be considered a threat to validity with respect to generalization. In spite of the small number of benchmarks carried out in the experiment, targeted mutation showed its efficiency compared to traditional mutation. The programming language used in this experiment is C, and the tools used during the experimentation support C programs. We consider this an external validity threat, since we cannot generalize our method to different computing environments. Nevertheless, the targeted mutation methodology could support other operating system environments and languages, provided that tools similar to the ones we used can be found there.


Chapter 6

Conclusion and Future Work

6.1

Conclusion

In this thesis, we have targeted the WCET, which is very important for real-time tasks. We have evaluated the concept of targeted mutation, where the mutations are applied to the parts of the code that are most likely to significantly affect the execution time. Moreover, we have used program slicing to direct the mutations to the parts of the code that are likely to have the strongest influence on execution time.

We have experimented with the method by implementing a prototype based on (i) the static analysis tool SWEET, for identifying the parts of the code with a strong influence on the execution time, and (ii) the mutation tool Proteum.

We have also presented our research questions and the goals behind the experiment. We walked through our approach, which is embodied in targeted mutation testing, and performed targeted mutation in different experiments that illustrate its effectiveness. As demonstrated by the results in Chapters 4 and 7, targeted mutation testing reduced the number of mutants that are not associated with the parts of the code that affect the execution time, unlike traditional mutation testing. Hence, the effectiveness of targeted mutation testing has been demonstrated.

Our results showed several comparisons between traditional and targeted mutation testing. The results demonstrated a substantial effect on scaling down the number of equivalent mutants left after slicing. For example, the benchmark Ludcmp (40 LOC) showed a much larger difference in the number of mutants left after slicing than the benchmark Janne_complex (20 LOC). Targeted mutation decreased the number of equivalent mutants in which the change happened in parts of the code that do not affect the execution time. As the number of mutants decreases, the time needed for testing is lower compared to the traditional method. As an outcome, we produced an efficient test-case suite for testing the worst-case execution time.

6.2

Future Work

Various adaptations, tests, and experiments have been left for future work due to lack of time; the experiments were time-consuming even for a single run. Possible directions include a more in-depth analysis of particular approaches, or new proposals that try different mechanisms. Some particular goals emerged from the experimentation work. This thesis has focused on conducting targeted mutation and showing its effectiveness. We therefore propose to study the efficiency of targeted mutation with respect to, for example, effort, labour, time, energy, and memory consumption.

Since the SWEET tool has limitations, we propose investigations and experiments to break these limits, such as the impossibility of handling recursion and dynamic memory allocation. Although we used simple benchmarks, for lack of time, we look forward to investigating a wider variety of benchmarks, and to studying how targeted mutation behaves in real-time and intricate systems that require a lot of recursion and memory use.

We propose the following future extensions to our work:

• Further experimentation with the method by considering large-scale applications.
• Evaluation of the effectiveness of other tools and methods for WCET estimation and targeted mutation.
• Evaluation of other important factors for mutation testing (energy, etc.).
• Further investigation of SWEET tool improvements.

Notwithstanding, the area of technology, and the software engineering area in particular, is continuously developing. Many challenges and newly proposed studies might come to build on our thesis study and carry it into other perspectives.


Chapter 7

Appendix

7.1

Benchmarks’ Source Code

In this section, we report the source code of the MDH benchmarks used for testing and experimentation.

7.1.1

Janne_complex

This benchmark contains two loops, where the inner loop's maximum number of iterations depends on the outer loop's current iteration. The results correspond to something Janne's flow-analysis should produce.

/*----------------------------------------------------------------------
 * WCET Benchmark created by Andreas Ermedahl, Uppsala university,
 * May 2000.
 *
 * The purpose of this benchmark is to have two loops where the inner
 * loop's max number of iterations depends on the outer loop's current
 * iteration. The results correspond to something Jannes flow-analysis
 * should produce.
 *
 * The example appeared for the first time in:
 *
 * @InProceedings{Ermedahl:Annotations,
 *   author    = "A. Ermedahl and J. Gustafsson",
 *   title     = "Deriving Annotations for Tight Calculation of Execution Time",
 *   year      = 1997,
 *   booktitle = EUROPAR97,
 *   publisher = "Springer Verlag",
 *   pages     = "1298-1307"
 * }
 *
 * The result of Jannes tool is something like:
 * outer loop:     1 2 3 4 5 6 7 8 9 10 11
 * inner loop max: 5 9 8 7 4 2 1 1 1  1  1
 *----------------------------------------------------------------------*/
#include <stdio.h>
#include <time.h>

int complex(int a, int b)
{
  while (a < 30) {
    while (b < a) {
      if (b > 5)
        b = b * 3;
      else
        b = b + 2;
      if (b >= 10 && b <= 12)
        a = a + 10;
      else
        a = a + 1;
    }
    a = a + 2;
    b = b - 10;
  }
  return 1;
}

int main()
{
  clock_t begin = clock();
  /* a = [1..30]  b = [1..30] */
  int a = 1, b = 1, answer = 0;
  /* if (answer)
       { a = 1; b = 1; }
     else
       { a = 30; b = 30; } */
  answer = complex(a, b);
  clock_t end = clock();
  double time_spent = (double)(end - begin) / (CLOCKS_PER_SEC / 1000);
  printf("\n%f", time_spent);
  return answer;
}

7.1.2

Ludcmp.c

This benchmark solves simultaneous linear equations by LU decomposition. It contains two arrays for input and one array for output. The number of equations is determined by the variable n.

/* MDH WCET BENCHMARK SUITE. File version $Id: ludcmp.c,v 1.2 2006/01/27 13:15:28 jgn Exp $ */

/*-----------------------------------------------------------------------
 * SNU-RT Benchmark Suite for Worst Case Timing Analysis
 * =====================================================
 * Collected and Modified by S.-S. Lim
 *     sslim@archi.snu.ac.kr
 *     Real-Time Research Group
 *     Seoul National University
 *
 * <Features> - restrictions for our experimental environment
 *
 * 1. Completely structured.
 *    - There are no unconditional jumps.
 *    - There are no exits from loop bodies.
 *      (There are no 'break' or 'return' in loop bodies.)
 * 2. No 'switch' statements.
 * 3. No 'do..while' statements.
 * 4. Expressions are restricted.
 *    - There are no multiple expressions joined by 'or',
 *      'and' operations.
 * 5. No library calls.
 *    - All the functions needed are implemented in the source file.
 *
 * FILE: ludcmp.c
 * SOURCE: Turbo C Programming for Engineering
 *
 * DESCRIPTION:
 *   Simultaneous linear equations by LU decomposition.
 *   The arrays a[][] and b[] are input and the array x[] is the output
 *   row vector. The variable n is the number of equations.
 *   The input arrays are initialized in function main.
 *
 * Changes:
 *   JG 2005/12/12: Indented program. Removed unused variable nmax.
 *
 * Benchmark Suite for Real-Time Applications, by Sung-Soo Lim
 * III-4. ludcmp.c: Simultaneous Linear Equations by LU Decomposition
 *        (from the book C Programming for EEs by Hyun Soon Ahn)
 *---------------------------------------------------------------------*/

#include <stdio.h>
#include <time.h>

double a[50][50], b[50], x[50];

int ludcmp(/* int nmax, */ int n, double eps);

static double
fabs(double n)
{
  double f;

  if (n >= 0)
    f = n;
  else
    f = -n;
  return f;
}

int
main(void)
{
  clock_t begin = clock();
  int i, j /*, nmax = 50 */, n = 5, chkerr;
  double eps, w;

  eps = 1.0e-6;

  for (i = 0; i <= n; i++) {
    w = 0.0;
    for (j = 0; j <= n; j++) {
      a[i][j] = (i + 1) + (j + 1);
      if (i == j)
        a[i][j] *= 10.0;
      w += a[i][j];
    }
    b[i] = w;
  }

  chkerr = ludcmp(/* nmax, */ n, eps);

  clock_t end = clock();
  double time_spent = (double)(end - begin) / (CLOCKS_PER_SEC / 1000);
  printf("\n%f", time_spent);
  return 0;
}

int
ludcmp(/* int nmax, */ int n, double eps)
{
  int i, j, k;
  double w, y[100];

  if (n > 99 || eps <= 0.0)
    return (999);
  for (i = 0; i < n; i++) {
    if (fabs(a[i][i]) <= eps)
      return (1);
    for (j = i + 1; j <= n; j++) {
      w = a[j][i];
      if (i != 0)
        for (k = 0; k < i; k++)
          w -= a[j][k] * a[k][i];
      a[j][i] = w / a[i][i];
    }
    for (j = i + 1; j <= n; j++) {
      w = a[i + 1][j];
      for (k = 0; k <= i; k++)
        w -= a[i + 1][k] * a[k][j];
      a[i + 1][j] = w;
    }
  }
  y[0] = b[0];
  for (i = 1; i <= n; i++) {
    w = b[i];
    for (j = 0; j < i; j++)
      w -= a[i][j] * y[j];
    y[i] = w;
  }
  x[n] = y[n] / a[n][n];
  for (i = n - 1; i >= 0; i--) {
    w = y[i];
    for (j = i + 1; j <= n; j++)
      w -= a[i][j] * x[j];
    x[i] = w / a[i][i];
  }
  return (0);
}

7.2

Benchmarks’ Results

In this section, we report the comparison of the numbers of killed and alive mutants between traditional and targeted mutation. The comparison is split into three parts according to the method used to measure the ET.


FIGURE 7.1: The number of mutants in Traditional & Targeted Mutation Testing for janne_complex.c


FIGURE 7.2: The number of mutants in Traditional & Targeted Mutation Testing for ludcmp.c


References

[1] Björn Lisper, Birgitta Lindström, Pasqualina Potena, Mehrdad Saadatmand, and Markus Bohlin. Targeted mutation: Efficient mutation analysis for testing non-functional properties. In 2017 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), pages 65–68. IEEE, 2017.

[2] Richard A. DeMillo, Richard J. Lipton, and Frederick G. Sayward. Hints on test data selection: Help for the practicing programmer. Computer, 11(4):34–41, 1978.

[3] Paul Ammann and Jeff Offutt. Introduction to Software Testing. Cambridge University Press, 2016.

[4] Björn Lisper. SWEET – a tool for WCET flow analysis. In International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, pages 482–485. Springer, 2014.

[5] Márcio Eduardo Delamaro, José Carlos Maldonado, and Auri Marcelo Rizzo Vincenzi. Proteum/IM 2.0: An integrated mutation testing environment. In Mutation Testing for the New Century, pages 91–101. Springer, 2001.

[6] Rob Hierons, Mark Harman, and Sebastian Danicic. Using program slicing to assist in the detection of equivalent mutants. Software Testing, Verification and Reliability, 9(4):233–262, 1999.

[7] Peter Puschner and Ch. Koza. Calculating the maximum execution time of real-time programs. Real-Time Systems, 1(2):159–176, 1989.

[8] Jakob Engblom and Andreas Ermedahl. Modeling complex flows for worst-case execution time analysis. In Proceedings 21st IEEE Real-Time Systems Symposium, pages 163–174. IEEE, 2000.

[9] Ingomar Wenzel, Raimund Kirner, Bernhard Rieder, and Peter Puschner. Measurement-based worst-case execution time analysis. In Third IEEE Workshop on Software Technologies for Future Embedded and Ubiquitous Systems (SEUS'05), pages 7–10. IEEE, 2005.

[10] Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, et al. The worst-case execution-time problem—overview of methods and survey of tools. ACM Transactions on Embedded Computing Systems (TECS), 7(3):1–53, 2008.

[11] Jaume Abella, Maria Padilla, Joan Del Castillo, and Francisco J. Cazorla. Measurement-based worst-case execution time estimation using the coefficient of variation. ACM Transactions on Design Automation of Electronic Systems (TODAES), 22(4):72, 2017.

[12] Aditya P. Mathur. Performance, effectiveness, and reliability issues in software testing. In Proceedings of the Fifteenth Annual International Computer Software & Applications Conference, pages 604–605. IEEE, 1991.

[13] A. Jefferson Offutt, Gregg Rothermel, and Christian Zapf. An experimental evaluation of selective mutation. In Proceedings of the 15th International Conference on Software Engineering, pages 100–107. IEEE, 1993.

[14] Robert Nilsson, Jeff Offutt, and Jonas Mellin. Test case generation for mutation-based testing of timeliness. Electronic Notes in Theoretical Computer Science, 164(4):97–114, 2006.

[15] Roland H. Untch. On reduced neighborhood mutation analysis using a single mutagenic operator. In Proceedings of the 47th Annual Southeast Regional Conference, page 71. ACM, 2009.

[16] Lin Deng, Jeff Offutt, and Nan Li. Empirical evaluation of the statement deletion mutation operator. In 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, pages 84–93. IEEE, 2013.

[17] Bernhard K. Aichernig, Florian Lorber, and Dejan Ničković. Time for mutants—model-based mutation testing with timed automata. In International Conference on Tests and Proofs, pages 20–38. Springer, 2013.

[18] Marcio Eduardo Delamaro, Lin Deng, Vinicius Humberto Serapilha Durelli, Nan Li, and Jeff Offutt. Experimental evaluation of SDL and one-op mutation for C. In 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation, pages 203–212. IEEE, 2014.

[19] Leonardo Fernandes, Márcio Ribeiro, Luiz Carvalho, Rohit Gheyi, Melina Mongiovi, André Santos, Ana Cavalcanti, Fabiano Ferrari, and José Carlos Maldonado. Avoiding useless mutants. SIGPLAN Not., 52(12):187–198, October 2017.

[20] Mark Harman and Robert Hierons. An overview of program slicing. Software Focus, 2(3):85–92, 2001.

[21] Husni Khanfar, Björn Lisper, and Abu Naser Masud. Static backward program slicing for safety-critical systems. In Juan Antonio de la Puente and Tullio Vardanega, editors, Reliable Software Technologies – Ada-Europe 2015, pages 50–65, Cham, 2015. Springer International Publishing.

[22] Swedish Execution Time analysis tool – SWEET. http://www.mrtc.mdh.se/projects/wcet/sweet/manual/html/ar01s01.html#idp33693136. Accessed: 2019-03-06.

[23] David B. Stewart. Measuring execution time and real-time performance. In Embedded Systems Conference (ESC), volume 141, 2001.

