
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2016

Learning-based testing of automotive ECUs

SOPHIA BÄCKSTRÖM

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Learning-based testing of automotive ECUs

SOPHIA BÄCKSTRÖM

Master’s Thesis at CSC
Supervisor: Karl Meinke
Examiner: Johan Håstad
Project provider: Scania AB
Supervisor at Scania: Christopher Lidbäck

Stockholm, Sweden December 2016


Abstract

LBTest is a learning-based testing tool for black box testing, developed by the software reliability group at KTH. Learning-based testing combines model checking with a learning algorithm that incrementally learns a model of the system under test, which allows for a high degree of automation.

This thesis examines the possibilities of using LBTest for testing of electronic control units (ECUs) at Scania. Through two case studies, the possibilities of formalising ECU requirements and of modelling ECU applications for LBTest are evaluated. The case studies are followed up with benchmarking against test cases currently in use at Scania.

The results of the case studies show that most of the functional requirements can, after reformulation, be formalised for LBTest and that LBTest can find previously undetected defects in ECU software.

The benchmarking also shows a high error detection rate for LBTest.

Finally, the thesis presents guidelines for requirement formulation and suggests improvements to LBTest.


Referat

Learning-based testing of ECUs

LBTest is a learning-based tool for black box testing that has been developed by the software reliability group at KTH. Learning-based testing combines model checking with a learning algorithm that incrementally builds a learned model of the system under test, which enables a high degree of automation.

This thesis examines the possibility of using LBTest to test electronic control units (ECUs) at Scania. Through two case studies, the possibilities of formalising requirements on ECUs and of modelling ECU applications for LBTest are evaluated. The case studies are followed up with benchmarking against existing test cases at Scania.

The results of the case studies show that the majority of the functional requirements can be formalised for LBTest after reformulation and that LBTest can find previously undetected defects in the software. The benchmarking shows a high rate of error detection for LBTest. The thesis also proposes guidelines for requirement formulation and possible improvements of LBTest.


Acknowledgements

I would like to express my gratitude to my supervisors at CSC and to the Scania employees who have helped me along the way.

Karl Meinke, Christopher Lidbäck, Andreas Rasmusson and Hojat Khosrowjerdi – thank you for your patience and support.


Contents

1 Introduction
   1.1 Objective
   1.2 Methodology
   1.3 Delimitations
   1.4 Contributions
   1.5 Thesis outline

2 Background
   2.1 Software testing
      2.1.1 Black box and white box testing
      2.1.2 Mutation testing
   2.2 Model-based testing
      2.2.1 Model checking
      2.2.2 Linear temporal logic
   2.3 Learning-based testing
      2.3.1 LBTest
      2.3.2 Previous case studies
   2.4 Testing of ECUs
      2.4.1 System testing of ECUs at Scania
      2.4.2 Formalisation of ECU requirements

3 The case studies
   3.1 Case study 1: Low fuel level-warning
      3.1.1 Requirement formalisation
      3.1.2 Modelling and partitioning
      3.1.3 Warnings and detected errors
   3.2 Case study 2: Dual-circuit steering
      3.2.1 Requirement formalisation
      3.2.2 Modelling and partitioning
      3.2.3 Warnings and detected errors
   3.3 Benchmarking

4 Results
   4.1 Requirement formalisation
   4.2 Source of detected errors
   4.3 Benchmarking

5 Discussion
   5.1 The case studies in retrospect
   5.2 The results
      5.2.1 The formalisation
      5.2.2 Detected errors
      5.2.3 The benchmarking
   5.3 Issues and usability of LBTest

6 Conclusions and future work
   6.1 Conclusions
   6.2 Recommendations
   6.3 Future work

7 Bibliography


Chapter 1

Introduction

Software testing examines a program’s behaviour given specific settings and inputs in order to make a judgement about its quality or to detect defects (Jorgensen 2002). The results, the outputs of the program, are compared with the expected behaviour and evaluated. The evaluation of the result is known as the test oracle, and can be performed by comparing the result with a statement in the test case or by using a separate algorithm. These three steps (generating test data, injecting the input into the program, and the oracle step) constitute the basis of software testing. The steps can be executed manually, with some degree of automation, or fully automatically.

This thesis describes an evaluation of a fully automated testing tool, LBTest, for testing of electronic control units (ECUs) for heavy trucks. LBTest is developed by the software reliability group at KTH and implements learning-based testing.

This testing strategy combines model checking with learning algorithms, which incrementally learn a model of the system under test, using test cases as queries.

New test cases are generated by checking the learned model against the system’s functional requirements, formulated in Linear Temporal Logic (LTL).

Scania is one of the world’s leading manufacturers of heavy trucks and buses. The company was founded in 1891 in Södertälje, where the head office is still located. Today Scania has over 44 000 employees, 3 500 of whom work at Research & Development. The case study took place during the spring of 2016 at Research & Development, with the team responsible for system testing of chassis ECUs. Scania is interested in how LBTest can be used to approximate models of their software and test requirements on the ECUs. Being able to test directly from requirements could reduce the resources needed to manually write and maintain test code, as well as allow a larger number of combinations of states and scenarios to be executed during testing.

Previous studies (Feng, Lundmark, Meinke, Niu, Sindhu & Wong 2013, Nycander 2015) have shown that LBTest can model a system under test, find new bugs and detect injected errors in the code. However, several questions remain regarding the practical applicability of the tool, such as scalability, usability and efficiency with regard to the resources needed.

1.1 Objective

The goal of the study is to evaluate how suitable LBTest is for ECU testing. It should also result in added knowledge of how real-life industrial problems can be modelled for LBTest and of the degree to which the tool matches the demands of the industry. The study will examine whether the conditions for using LBTest are met given the current requirements documents, different approaches to requirement modelling for LBTest, and the effects of using the tool to test ECU software. This will be achieved by analysing the following questions:

• To what degree are the ECU requirements formalisable and expressible in LTL?

• Can ECU applications be modelled for LBTest?

• Can LBTest find undiscovered bugs in the ECU software?

• How does LBTest compare to the existing testing framework in regard to detection of injected errors?

1.2 Methodology

The research questions will be answered through comparative case studies based on ECU requirements. Since the aim is to study LBTest in a specific real-life context, the case study method is well suited. The case studies will include both qualitative and quantitative elements, with the aim of both analysing potential obstacles to using the tool in this setting and evaluating it against the existing framework. Due to the large variation among the Scania requirements, it is preferable to study more than one ECU function. The requirements documents to be analysed will be selected by testers at Scania. Data will be gathered by examining the number of formalisable requirements, the verdicts from LBTest when testing the requirements, and the number of injected errors detected by LBTest and by the existing testing framework.

1.3 Delimitations

The objective of the project is not to examine the full consequences of a complete shift to LBTest at Scania, but to evaluate the tool based on a subset of requirements. The focus will be on the practical use of the tool, mainly on its industrial applicability. The specific algorithms for learning and model checking will not be discussed in detail.

This thesis does not discuss ethical or environmental aspects of software testing.

However, sustainability issues and the effect improved software testing techniques would have on the environment could be relevant to investigate in future studies.


1.4 Contributions

To date, no industrial benchmarking has been conducted with LBTest against an already implemented testing framework, examining how the tool measures up to existing test methods in the industry. This thesis will contribute by adding knowledge of how LBTest can be used in the automotive industry and how it compares to one of the current testing strategies.

1.5 Thesis outline

Chapter 2 will provide an introduction to software testing and the relevant theory for learning-based testing and testing of ECUs. Chapter 3 describes the two case studies of ECU requirements and the benchmarking against the current testing framework, piTest. Chapter 4 presents the results of the case studies and the benchmarking.

Chapter 5 will provide a more detailed discussion of the results and the experience of working with LBTest. Finally, Chapter 6 summarises the conclusions of the project, provides recommendations for requirement formulation and suggests future work.


Chapter 2

Background

This chapter presents an introduction to software testing in general and to learning-based testing and testing of ECUs in particular. Section 2.1 describes the role of software testing in different development models and basic testing strategies. It also covers mutation testing, which is used to evaluate LBTest in the benchmarking described in Chapter 3. Section 2.2 provides the background necessary to understand the concept of learning-based testing through an introduction to model-based testing, model checking and linear temporal logic. Learning-based testing and LBTest are covered in Section 2.3, and Section 2.4 focuses on testing of ECUs and the formalisation of automotive requirements.

2.1 Software testing

Software testing alone cannot prove a program’s correctness, only display its defects.

Or as phrased by Dijkstra (1970, p.7):

“Program testing can be used to show the presence of bugs, but never to show their absence”

Software testing tends to be a labour intensive and costly activity, requiring up to 50% of software development costs (Ammann & Offutt 2008). One of the most important aspects of keeping costs down is early detection of defects, as the cost of correcting defects increases exponentially throughout the development cycle. The relative cost can be 15 times higher if a defect is detected during testing compared to detection during system design, and 100 times higher if it is not found before system maintenance (Crispin & Gregory 2009). This has motivated early testing and testing throughout the development cycle.

Figure 2.1. The V-model, based on (Jorgensen 2002) and (Mathur & Malik 2010)

Traditionally, testing activities have been categorised into stages, each linked to a specific phase in the development cycle and its level of abstraction of the system under test (SUT). The V-model (figure 2.1) describes the objectives of testing and the source material used to derive test cases at each phase. The V-model stems from sequential, top-down development models such as the waterfall model, in which each development phase relies on information produced by the previous phase. Each phase must be completely closed before the next can start, locking the specifications at the previous level. Unit testing evaluates each component’s functionality separately as a unit with respect to detailed design specifications and code coverage. Integration testing uses the architectural design to verify the integration of the components of the system, whilst system testing examines the assembled system and its functionality based on software and design specifications. At this point detailed knowledge of the actual implementation should not be necessary.

Acceptance testing finally validates the software with respect to user requirements, and is usually conducted with representatives of the end users or persons with good domain knowledge (Mathur & Malik 2010). The purpose of the model is to make sure that testing is conducted throughout the development cycle and not left until the end of the project. Basic implementation errors in the source code should be detected during unit testing, so that integration and system testing can focus on broader questions regarding the design, the specification and the communication between the different parts of the SUT.

As a response to the inflexibility of sequential development models, agile development is based on short iterations with continuous improvements, emphasising early delivery, customer collaboration and cross-functional teams. An iteration can be as short as a single week, so testing activities must start as early as possible, in parallel with development. A way to describe the testing activities in an agile setting is the testing quadrants (figure 2.2). The testing quadrants, as presented by Crispin and Gregory (2009), describe how the different testing activities should be conducted, rather than when. The model is based on four different aspects of software testing: business-facing tests are contrasted with technology-facing tests, and team-supporting tests with product-evaluating tests. These categories are intended to serve as a basis for decisions on which tests should be automated, when specific tools should be considered and when manual testing is beneficial.


Figure 2.2. The four testing quadrants, as described by Crispin & Gregory (2009)

The team-supporting activities in Q1 and Q2 focus on the product development. The first quadrant contains unit tests and component tests, which have the purpose of verifying the code. These tests can be fully automated. The test activities in Q2 view the product from a business perspective that is comprehensible to other stakeholders, such as product owners and business analysts. These activities should include a high degree of automation as well, but on a different abstraction level. However, some of the activities in Q2 cannot be automated, for example prototyping and design validation.

The activities in Q3 and Q4 evaluate the product according to its functional and non-functional requirements. Q3 contains tests that should not be automated, such as acceptance testing, exploratory testing and usability testing, validating that the product meets the functional requirements. These tests usually involve the end-users of the product. The testing activities in Q4 evaluate the product from a technical perspective and explore non-functional aspects of the SUT, such as security, performance and load testing. These tests can usually be aided with specialised tools.

The main reason for the focus on test automation in agile development is the need to present a product with basic functionality at each release, keeping the iteration time to a minimum. Test automation frees resources to focus on tests that cannot be automated, and the fast execution allows for wider coverage. Building the product step by step makes regression testing, verification of the SUT after it has been updated, especially important. With automated regression tests the functionality of the SUT can be verified each time changes are made to the code.

2.1.1 Black box and white box testing

Two main approaches to software testing are black box testing and white box, or structural, testing. These testing strategies concern the information used in writing test cases and correlate to some degree to the current phase of testing.

White box testing uses information from the actual implementation to create test cases and estimate coverage, which makes most sense for test activities that aim at verifying the source code. Black box testing focuses on the functionality of the SUT, ignoring the internal structure of the implementation. The program is viewed as an unknown function, transforming the program’s input data to output according to the requirements. Black box testing strategies tend to be based on the requirements themselves, for example testing from use cases, or on analysis of the input data, such as equivalence classes or boundary value analysis. Equivalence class testing identifies input data that are equivalent in regard to how they would affect the SUT and groups them into disjoint partitions, covering the entire domain.

Test cases are created by letting one sample represent each partition, reducing redundant test cases while still covering the entire input domain. Boundary value analysis identifies the boundaries of the input space to construct test cases, based on the assumption that errors often occur near the extreme values of an input variable (Ammann & Offutt 2008, Jorgensen 2002). Downsides to black box testing are the risk of redundant test cases due to overlapping functions, untested code and difficulties defining the coverage of a test suite (Jorgensen 2002).
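As an illustration of these two black box strategies, the following Python sketch derives one representative value per equivalence class and a set of boundary values for a hypothetical function whose valid input domain is 0 to 100; the domain, partitions and boundaries are invented for the example and are not taken from the thesis.

# Illustrative only: derive black box test inputs for a function that accepts
# an integer percentage in the range 0-100. The partitions are invented.

PARTITIONS = {
    "below_range": range(-1000, 0),   # invalid: negative values
    "in_range": range(0, 101),        # valid values
    "above_range": range(101, 1000),  # invalid: too large
}

def equivalence_class_inputs():
    """One representative value per partition."""
    return [values[len(values) // 2] for values in PARTITIONS.values()]

def boundary_value_inputs():
    """Values at and around the edges of the valid partition."""
    return [-1, 0, 1, 99, 100, 101]

print(equivalence_class_inputs())   # [-500, 50, 550]
print(boundary_value_inputs())      # [-1, 0, 1, 99, 100, 101]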

Coverage in white box testing is measured using the structure of the source code of the SUT. The program is modelled as a flow graph with statements as nodes and data flow as edges. A coverage criterion for a test suite can then be expressed using elements of the graph, such as coverage of all nodes or all paths (Ammann & Offutt 2008). Although white box testing easily describes coverage of the SUT, its main challenge is scalability. Each conditional branch in a graph doubles the number of possible paths, causing a test suite explosion as the program grows. But regardless of testing strategy, complete testing is usually not a realistic option. The possible combinations of input for most programs are effectively infinite (Ammann & Offutt 2008).

2.1.2 Mutation testing

Mutation testing is a strategy to evaluate the quality of the test suite by injecting errors into the source code. A mutant is a small, syntactically valid modification of the program that in some way changes its behaviour. Examples of mutants are replacing || with && or x > 1 with x ≥ 1. A mutant that changes the code but not the behaviour of the program, a so-called equivalent mutant, can never be detected by a failed test case. Avoiding the generation of equivalent mutants is, however, impossible, since the problem of program equivalence is undecidable (Jia & Harman 2011). Each test suite is given a mutation score: the ratio between the number of discovered mutants and the number of non-equivalent mutants in the code. This score can be seen as a type of coverage, examining the proportion of injected errors caught instead of code or requirements covered.
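As a toy illustration of these concepts, the sketch below shows an invented function, a single mutant of it and the resulting mutation score of a small test suite; none of this is taken from the thesis or from an actual mutation tool.

# Illustrative only: an original function, one mutant of it, and the mutation
# score of a small test suite.

def warning_needed(level):          # original: warn strictly below 10
    return level < 10

def warning_needed_mutant(level):   # mutant: "<" replaced with "<="
    return level <= 10

tests = [(5, True), (50, False)]    # (input, expected output)

def kills(mutant):
    """A test suite kills a mutant if at least one test case fails on it."""
    return any(mutant(x) != expected for x, expected in tests)

killed = sum(kills(m) for m in [warning_needed_mutant])
mutation_score = killed / 1         # killed mutants / non-equivalent mutants
print(mutation_score)               # 0.0: neither test exercises the boundary at 10

Adding a boundary test case, (10, False), would kill this mutant and raise the score to 1.0.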

Mutation testing is based on two assumptions:

- The competent programmer hypothesis. The hypothesis claims that it is not necessary to consider all conceivable errors that could be made by a programmer in order to evaluate the test suite. On the contrary, one can assume that the programmer is of such competence that the SUT is very close to the bug-free program that is aimed for. Since the actual errors in the code will differ very little from the correct version of the program, they can be recreated by a few syntactic changes (Jia & Harman 2011).

- The coupling effect. The coupling effect states that the simple syntactic errors that the mutators form are related to the complex errors that one would wish a test suite to detect. The assumed relationship is that complex errors are made up of a number of syntactic changes, which can be simulated by mutants. Complex errors are seen as groups of simpler errors, not as a different category of errors. Therefore a mutant can be made to represent more complex errors as well (Jia & Harman 2011).

Some of the advantages of mutation testing are that it may reveal issues with the test suite that are not picked up in manual reviews and that it offers a consistent measure of the quality of a test suite (Baker & Habli 2013). But mutation testing is not unproblematic. Considering all possible mutations while leaving out equivalent mutants can demand large computational resources. In addition, a reliable test oracle must be constructed. Empirical studies on the subject have given mixed results. Some studies find that the majority of detected and corrected errors in software projects do involve small code segments (Purushothaman & Perry 2005), while other results (Gopinath, Jensen & Groce 2014) indicate that the errors found are too large for the competent programmer hypothesis to hold. Additionally, the coupling between the mutants and actual errors seems to differ between languages, which is why language-specific mutators should be considered.

2.2 Model-based testing

Model-based testing uses an abstract model of the SUT to generate relevant test cases. A strategy to automatically generate test cases in model-based testing is to let a model checker verify the model of the SUT given formally expressed requirements, and use any counterexamples given by the model checker as test cases.

A drawback of model-based testing is the required model of the SUT. Manual model construction is a complicated task, and given an agile development style the model would need recurrent updates. Model-based testing using incomplete models has been suggested by (Groce, Peled & Yannakakis 2002) and (Groce, Fern, Pinto, Bauer, Alipour, Erwig & Lopez 2012) to enable updates of a model according to changes in the software. The basic idea, shared with the learning-based testing approach, is to use the generated counterexamples both for testing the SUT and for improving the model of the SUT. A counterexample from the model checker that does not cause a fail verdict when executed on the SUT shows a discrepancy between the SUT and the model, and is used to improve the model of the SUT.

2.2.1 Model checking

A model checker takes a property of the SUT expressed in temporal logic and a transition system, such as a Kripke structure, as input and explores the entire state space to determine if the model violates the given property (Fraser, Wotawa & Ammann 2007). A Kripke structure K is a tuple K = (S, S0, T, L) expressing the behaviour of the program as a finite state machine. It contains a set of states S, an initial state S0, a total transition relation T, which connects every state to at least one other, and a labeling function L that maps each state to a set of atomic propositions (Fraser et al. 2007). This means that the system must be describable by a finite set of states and inputs, where the behaviour of the system only depends on the current state and the input. If a violation is found whilst checking the formal statement against the Kripke structure a counterexample, where the negation of the property holds, will be given. If no such discrepancy is found, the property holds in all possible states.
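As an illustration of the definition above, the following sketch represents a Kripke structure K = (S, S0, T, L) for a hypothetical two-state warning function; the states, transitions and labels are invented for the example.

# Illustrative representation of a Kripke structure K = (S, S0, T, L) for a
# hypothetical warning function; states and labels are invented.

S = {"idle", "warning"}
S0 = "idle"
T = {                              # total transition relation: every state has a successor
    "idle": {"idle", "warning"},
    "warning": {"warning", "idle"},
}
L = {                              # labeling with atomic propositions
    "idle": {"warning_off"},
    "warning": {"warning_on"},
}

def reachable(start=S0):
    """All states reachable from the initial state."""
    seen, frontier = set(), [start]
    while frontier:
        state = frontier.pop()
        if state not in seen:
            seen.add(state)
            frontier.extend(T[state])
    return seen

print(reachable())                 # {'idle', 'warning'}

A model checker exhaustively explores such a structure, rather than a single execution, when checking a temporal-logic property against it.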

Model-based testing with model checkers enables both automatic generation of relevant test cases and an automated test oracle. The oracle can compare the SUT output with the counterexample; a match in behaviour gives a fail verdict, otherwise the test case passes. To achieve coverage through counterexamples, so-called trap properties are used: negated properties for the items to be covered, such as nodes, edges or states. While trap properties for state coverage are just safety properties, other types of coverage measurements can demand more complex statements (Fraser et al. 2007).

2.2.2 Linear temporal logic

Linear temporal logic (LTL) expresses statements that can be true or false given a specific point in time. It extends classical logic, such as propositional or predicate logic, where statements are statically true or false. LTL allows for statements about possible states in the future, that will be the case at some point (F φ), in the next time step (Xφ), globally (Gφ) or until some other state (φ U ψ) (see table 2.1).

The possibility to discriminate between different points in time makes LTL useful for modelling and expressing qualities of reactive systems, such as embedded software. Some particularly interesting qualities that are expressible in LTL are safety properties and liveness properties. Liveness is an assurance that something good will eventually happen in the future, for example that if the program is started it will eventually terminate (φ → F (ψ)). Safety properties assert that something bad will never happen, that is G(!φ) (Fisher 2011).

An extension of LTL includes past operators, expressing properties that, for example, held in one (O), all (H) or the last (Y) of the previous states. Strictly speaking, the past operators do not add expressive power: all statements expressible with past operators can be rewritten using only future operators. But they do affect the usability of the language. From the perspective of a user formalising requirements in LTL, the past operators can offer statements that are easier to grasp and closer to the initial formulation. This improvement in usability does not increase the complexity of model checking (Pradella, San Pietro, Spoletini & Morzenti 2003).

Future operators
X(φ)      Next: φ holds in the next state
G(φ)      Global: φ holds in all future states
F(φ)      Finally: φ holds in a future state
(φ U ψ)   Until: φ holds until ψ holds
(φ V ψ)   Releases: ψ holds until φ holds, or ψ holds globally

Past operators
Y(φ)      Previous: φ holds in the previous state
H(φ)      Historically: φ holds in all past states
O(φ)      Once: φ holds in at least one past state
(φ S ψ)   Since: φ holds in all states since ψ
(φ T ψ)   Triggered: ψ holds in all states since φ, or ψ holds historically

Table 2.1. The past and future operators in LTL, in NuSMV syntax (Cavada, Cimatti, Jochim, Keighren, Olivetti, Pistore, Roveri & Tchaltsev 2010)
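To make the semantics of a few of these operators concrete, the sketch below evaluates them over a finite recorded trace of symbolic states. This is only an approximation for illustration: LTL semantics are defined over infinite traces, and a model checker works on a state machine rather than a single trace. The trace contents and proposition names are invented.

# Finite-trace illustration of a few LTL operators (approximation only).

def holds_next(prop, trace, i):
    """X(prop): prop holds in the state following position i."""
    return i + 1 < len(trace) and prop(trace[i + 1])

def holds_globally(prop, trace, i=0):
    """G(prop): prop holds in every state from position i onwards."""
    return all(prop(state) for state in trace[i:])

def holds_finally(prop, trace, i=0):
    """F(prop): prop holds in some state from position i onwards."""
    return any(prop(state) for state in trace[i:])

# An invented trace of symbolic outputs from a warning function.
trace = [{"warning": "off"}, {"warning": "off"}, {"warning": "on"}]
warning_on = lambda s: s["warning"] == "on"

print(holds_finally(warning_on, trace))    # True: the warning eventually turns on
print(holds_globally(warning_on, trace))   # False: it does not hold in every state
print(holds_next(warning_on, trace, 1))    # True: it holds in the state after position 1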

2.3 Learning-based testing

In the paper Automated black-box testing of functional correctness using function approximation (Meinke 2004), black box testing is described as a constraint solving problem, a search for counterexamples to program correctness, which is solved by learning the system under test. If S is a system whose functional correctness can be modelled by pre- and postconditions in first order logic, then a successful black box test case, one that finds such a counterexample, is an assignment of values to input variables that satisfies pre and for which S terminates with output variables that satisfy ¬post. For this search for successful test cases, an application of function approximation is suggested: representing an unknown underlying function with an approximation based on the observed input and output. The underlying function in this case is the system S, which is approximated by a model that maps the input and output space of S. For each unsuccessful test case the model of S is incrementally refined and new, improved test cases are generated.


A generalisation of this strategy is learning-based testing (LBT). In LBT the approximated model of the SUT is given by a machine learning algorithm, which together with a model checker creates an iterative feedback loop, using the test cases as queries. At each iteration a test case is created either by the learning algorithm, to generate a membership query, from a counterexample generated by the model checker when checking the current model of the SUT against the requirements, or from a random test case generator (Sindhu 2013). LBT is a heuristic approach to finding bugs in the state space of the SUT, possibly without having to explore the whole state space through complete learning. Instead, LBT makes a best guess for where a bug can be found through generalisation of the current information about the SUT. If the model checker finds a counterexample to the requirements in the current model of the SUT, this will serve as the next test case. If the behaviour of the SUT matches the counterexample, a bug has been found. If not, the model of the SUT is improved by adding the information from the input/output pair of the counterexample. This method allows for automation of test case generation, test execution and the test oracle. The concept has been developed by the software reliability group at KTH for both procedural and reactive programs, and several learning algorithms have been evaluated, such as Algebraic parameter estimation, IKL (an incremental learning algorithm for Kripke structures), L* Mealy and minsplit (Meinke, Niu & Sindhu 2012, Sindhu 2013).
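A highly simplified sketch of this feedback loop is given below. The functions learn_hypothesis, model_check and run_sut are placeholders to be supplied by the caller, the hypothesis object is assumed to expose a predict method, and the code is not the LBTest implementation or its actual algorithms.

# Simplified learning-based testing loop (illustrative placeholders only).

import random

def lbt_loop(run_sut, learn_hypothesis, model_check, requirement, alphabet,
             max_iterations=100):
    observations = {}                                   # input sequence -> observed output
    for _ in range(max_iterations):
        hypothesis = learn_hypothesis(observations)     # model of the SUT so far
        counterexample = model_check(hypothesis, requirement)
        if counterexample is None:
            test_case = tuple(random.choices(alphabet, k=5))   # random query instead
        else:
            test_case = counterexample                  # predicted requirement violation
        output = run_sut(test_case)
        if counterexample is not None and output == hypothesis.predict(test_case):
            return "fail", test_case                    # SUT reproduces the violation: a bug
        observations[test_case] = output                # otherwise refine the model
    return "pass", None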

2.3.1 LBTest

LBTest is a tool for functional black box testing that implements the learning-based testing paradigm. Two prerequisites for using the tool are that it must be possible to model the SUT as a finite state machine and that the functional requirements for the SUT can be expressed in LTL. The additional resources needed to execute LBTest are a configuration file, a wrapper file and an executable file of the SUT (Meinke 2015). The wrapper functions as a test harness and acts as the communicator between LBTest and the SUT through the system standard input and output. LBTest does not have any direct contact with the SUT, so the accuracy of the results relies on the wrapper distributing correct information between the programs. The oracle step is handled by comparing the output from the SUT to the counterexample given by the model checker.

LBTest generates test cases from either the machine learning algorithm, the model checker or a random input generator (figure 2.3). These values are translated by the wrapper into data that can be injected into the SUT. The wrapper then reads the next state of the SUT, converts it into symbolic names that are comprehensible to LBTest and sends the information on the output stream. LBTest needs all data to be partitioned into a finite set of equivalence classes, since the model checker cannot make direct use of integer values or other data types, such as graphs or trees. The wrapper must extract the necessary data for each defined type and translate it into a predefined value with a symbolic name for LBTest.
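The sketch below illustrates the role of such a wrapper for a hypothetical fuel level SUT. The SUT interface (set_fuel_level, read_fuel_warning), the symbolic names and the equivalence classes are invented; the actual LBTest wrapper protocol, file formats and the iTest module are not reproduced here.

# Minimal wrapper sketch: map symbolic inputs to concrete SUT values and
# partition the SUT output back into symbolic names (hypothetical interface).

import sys

INPUT_CLASSES = {          # symbolic input from LBTest -> concrete value for the SUT
    "empty": 0.0,
    "low": 7.0,
    "medium": 40.0,
    "full": 95.0,
}

def classify_output(warning_signal):
    """Partition the SUT output into a finite set of symbolic names."""
    return {0: "off", 1: "on", 2: "error", 3: "notavailable"}.get(warning_signal, "error")

def run(sut):
    for line in sys.stdin:                             # one symbolic input per line
        symbol = line.strip()
        sut.set_fuel_level(INPUT_CLASSES[symbol])      # inject the concrete value
        observed = sut.read_fuel_warning()             # read the next state of the SUT
        print(classify_output(observed), flush=True)   # report symbolic output back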

Figure 2.3. LBTest, as described by Meinke & Sindhu (2013)

The configuration file defines the setup for the test session. It contains the different input and output types of the SUT, the requirements to be tested, the locations of external resources and the stopping criteria. Examples of stopping criteria are limits on execution time, the number of hypotheses to be generated or the number of equivalence checks to be made before termination. In LBTest, convergence is considered to be reached when no difference is found between the hypothesis and the SUT for a specified number of random sample queries. The configuration file also provides the possibility to optimise testing by defining other keywords, such as the learning algorithm and model checker to be used. The verdict given by LBTest is either pass, fail or warning, where a warning is a detected counterexample that includes a loop. After a normal termination LBTest produces a dot file containing the state machine of the last hypothesis.

LBTest claims to be well suited for agile development and continuous development since it supports a very high degree of test automation (Meinke 2015). Due to the black box abstraction level, the wrapper and configuration files do not have to be altered between sprints, which allows for alterations and refactoring of the implementation.

2.3.2 Previous case studies

To date, two industrial case studies have been conducted by the software reliability group at KTH and an additional four by thesis workers. These studies have focused on showing that LBTest can model the SUTs and on examining whether LBTest finds undetected defects in the SUT. The first two studies were conducted on a Brake-by-Wire system by Volvo and an access server by Fredhopper. More detailed descriptions of these studies are found in (Feng, Lundmark, Meinke, Niu, Sindhu & Wong 2013).

Brake-by-Wire is a distributed system of five ECUs and a connecting network bus, with one ECU connected to the brake and gas pedals and the other four to one wheel each. The two pedals provided the input to the system and the output was measured by vehicle speed, rotational speeds of the wheels and torque values. Out of the three requirements that were tested with LBTest, two passed and one was given a fail verdict. The counterexample for the failed requirement turned out to show an error in the SUT.

In the case study of the Fredhopper Access Server, eleven informal requirements were formalised and translated to LTL, and nine of them passed. Two, expressing liveness properties, were given warnings since counterexamples in the form of loops were found. A loop where the desired state p is not reached breaks the property F(p) and therefore results in a warning from LBTest. It turned out that this behaviour was due to errors in the requirements (a strong until U should have been regarded as a weak until W – the property holds either until a specified state becomes true, or it holds forever) as well as an error in the SUT.

Two of the thesis projects were conducted at TriOptima, one using a Django web application (Lundmark 2013) and the other a microservice architecture (Nycander 2015). Both thesis workers expressed difficulties finding a suitable abstraction level at which to model the SUT. Deciding whether a certain signal should be seen as an input, transforming the system, or as an output, an indicator of the state of the system, turned out to be a non-trivial problem. Lundmark used the strategy of taking verbs that described actions that could be performed on the system as inputs, and the results of performing these actions as output data types. Five requirements were translated to LTL in his project and all of them were given a pass verdict by LBTest. Lundmark then continued to experiment with injected errors in the code. These errors were detected by LBTest.

In Nycander’s project, different abstraction levels of the SUT were considered. First a black box wrapper was implemented, only utilising the interface from a user perspective. Limiting the model of the SUT to this interaction, leaving out the implementation of the system, resulted in a model with only two states – calculating and idle. Therefore, to achieve a more reactive system, a grey box wrapper communicating directly with the internal messaging system was constructed as well.

Seven requirements were tested with the grey box wrapper and all of them passed.

In addition, a fault-injecting wrapper was implemented, injecting faults at runtime by triggering a restart of the SUT. This was done to examine the system’s error handling and recovery. Twelve requirements were tested with this wrapper, with the result that a bug in the SUT was discovered.

Both studies emphasised the difficulty of verifying wrapper functionality and showed the challenge of finding the root cause of a warning concerning a requirement. In both projects a separate log file for the wrapper was implemented to give a better understanding of the communication between LBTest, the SUT and the wrapper.


2.4 Testing of ECUs

2.4.1 System testing of ECUs at Scania

An ECU is a real-time system that consists of both hardware and software and is specialised to control or monitor parts of the vehicle’s functionality. This is done by continuously reading inputs in the form of digital and analogue input signals, such as switches and sensors. The output of the ECU is communicated over a Controller Area Network (CAN) that links the ECUs together and transports diagnostic messages and operational parameters. Each CAN bus forms a sub-net, which is linked to other sub-nets by a coordinator, an ECU that also distributes information about actions made by the driver.

Scania uses a version of the V-model for testing of ECUs. The team responsible for system testing of chassis ECUs is also involved in module integration testing and testing at part-system level. The test cases are designed to be executed on either a Hardware-in-the-loop (HIL) rig or a software emulator. The two platforms enable testing of specific functionality of the ECU by mocking the behaviour of its surrounding systems. Testing on the rig and on the emulator platform are complementary to each other, and the current testing framework is compatible with both platforms.

The emulator is developed by Scania and works by stubbing the ECU application code wherever it polls the hardware for information. By doing this, the emulator can feed the ECU application with e.g. a voltage where the ECU would normally read from an A/D converter. The hardware is completely replaced by a software library, so the test cases can be executed on local computers. Software testing can also start before the ECU hardware is finalised. The emulator executes with discrete time steps, about 20 times faster than real time, which provides both fast execution of slow events and the possibility to track fast events by pausing at specific time steps. Internal variables and signals can be accessed directly through the ECU’s memory area.

HIL testing covers both the hardware and the software of the ECU, while the behaviour of most of the surrounding vehicle is simulated. The HIL rig communicates with the ECU through a hardware controller and provides input from I/O and CAN traffic in real time. Besides manipulation of input, the rig enables interrupts and hardware fault injections at run time, for example to evaluate error handling in case of electronic failures. Internal signals of the ECU must be requested via a communication protocol (Keyword Protocol 2000) and cannot be read directly, which limits the number of signals that can be accessed at once. The main drawbacks of HIL testing are the limited availability of the rigs and the real-time execution of test cases, compared to the fast execution of the emulator.

The current testing framework used for system testing of chassis ECUs at Scania is piTest, an acronym for Python interface to emulated software test. The base for piTest is the Python unit testing framework, the Python version of JUnit. The basic configuration contains the name of the ECU to be tested, the platform type and the directory for the communication signals between piTest and the platform. For testing in an emulated environment, the emulator interface module iTest is used to read from and write to the ECU software.
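Purely as an illustration of what a piTest-style test case could look like, the sketch below uses Python's standard unittest module together with a hypothetical iTest-like interface (connect, write_signal, step, read_signal); the module, function names and signal values are assumptions for the example and not the actual Scania API.

# Illustrative sketch only; the itest module and its calls are hypothetical.

import unittest

class LowFuelLevelWarningTest(unittest.TestCase):
    def setUp(self):
        from itest import connect                      # assumed import, illustration only
        self.ecu = connect(ecu="chassis_ecu", platform="emulator")

    def test_warning_set_when_fuel_below_threshold(self):
        self.ecu.write_signal("totalFuelLevel", 5)     # percent, below the warning threshold
        self.ecu.step(seconds=2)                       # let the application react
        self.assertEqual(self.ecu.read_signal("lowFuelLevelWarning"), "on")

if __name__ == "__main__":
    unittest.main()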

The main focus for the test cases written by the test team is specification-based black box testing, to verify the functional requirements. Requirement coverage is the only coverage model currently in use, with at least one test case per requirement.

Although some general guidelines exist, the testing strategies and degree of coverage are also influenced by the individual testers, which is why a variation between the test suites for different ECUs can be expected. Examples of testing strategies in use are boundary value analysis, combinatorial testing, experience-based testing and in-vehicle testing. White box techniques are usually not considered, since structural coverage is not among the system testers’ responsibilities.

2.4.2 Formalisation of ECU requirements

The ISO standard 26262 for functional safety of road vehicles (ISO 2011) has been a motivator for studies on formalisation of ECU requirements. The current version of the standard targets the possible hazards caused by malfunctioning behaviour of electronic systems in passenger cars weighing under 3500 kg. However, an adaptation of the standard for heavy vehicles is expected as well, which has led manufacturers to investigate what impact compliance with the standard would have.

ISO 26262 proposes a top-down approach, where safety requirements are mapped to architectural elements and traced throughout the development life cycle. The standard states that safety requirements should be specified by a combination of natural language and formal, semi-formal or informal notations, depending on their safety integrity level.

A case study at Bosch (Post, Menzel & Podelski 2011) evaluated to what degree informal behavioural automotive requirements were formalisable, by examining their expressibility in a specification pattern system represented in restricted English grammar, a formalised natural language automatically transformable to LTL, computation tree logic (CTL) or other logics. The reason for using this grammar was to maintain readability for stakeholders while still allowing automatic consistency checking of the requirements through formal analysis. A sample of 245 informal functional requirements from five projects in the automotive domain was randomly selected for the study. Out of these requirements, 39 turned out not to be translatable without loss of meaning. For 25 of the non-expressible requirements a branching time concept was needed, since they concerned possible rather than actual behaviour of the SUT. Other reasons for untranslatability to the restricted English grammar were statements about properties in several ECUs, not expressible at the given abstraction level, and requirements that did not describe functional behaviour but concerned the appearance of the product. Another common reason for untranslatability was vagueness in the requirements, to the degree that the authors were not able to recover the properties that the requirements were intended to capture.

A similar case study was conducted at Scania (Filipovikj, Nyberg & Rodriguez-Navas 2014), exploring the possibility to formalise their automotive requirements using specification patterns based on restricted English grammar. Out of 100 gathered requirements, 30 % could not be expressed in restricted English grammar. The most common obstacle was that the requirements did not concern system behaviour. After excluding the non-functional requirements, about 8 % of the remaining requirements were still not formalisable, mainly due to ambiguous expressions and omitted information. Among the formalisable requirements, difficulties in grasping the intent and determining the scope of the requirements were also encountered, requiring assistance from Scania engineers for accurate formalisation.


Chapter 3

The case studies

The third chapter describes the two case studies that were conducted to evaluate the possibility to formalise Scania’s automotive requirements in LTL, how ECU applications can be modelled for LBTest and whether LBTest can find undiscovered defects in the software. It also contains a description of the benchmarking conducted to evaluate LBTest against test cases currently in use at Scania.

To evaluate LBTest, two system requirements documents were formalised, translated to LTL and tested with LBTest. The first specified the low fuel level-warning (Scania 2015a) and the second dual-circuit steering (Scania 2015b). Both documents included requirement specifications in natural language and semi-formal notation, often expressed as pseudo code. Several iterations of requirement translation and testing were conducted to detect both incomplete and vague requirements and errors in the implementation. Each translated requirement was first tested against the current implementation to make potential ambiguities in the requirements visible and to find possible deviations in the implementation. In cases where warnings or failures were given by LBTest due to incomplete or vague requirements, a reformulation was considered in order to be able to move forward with the case study. Warnings due to discrepancies between the requirements and the implementation were followed up by the test team.

A wrapper with basic functionality for communication with the emulator (using the emulator interface module iTest) and with LBTest was already in place when the project started. Case-specific communication code was added and later reviewed by the test team to avoid warnings due to malfunctioning test code. As suggested in previous case studies (Lundmark 2013, Nycander 2015), a wrapper log was implemented as well to keep track of the communication between the wrapper and LBTest.

The case studies were followed up with benchmarking against test cases currently in use, using a mutation testing strategy in which small errors were injected into the source code.


Figure 3.1. An illustration of the low fuel level-warning from a black box perspective

3.1 Case study 1: Low fuel level-warning

The low fuel level-warning provides the driver with an additional indication of when a refill is necessary, without having to monitor the fuel level estimation on the instrument cluster. Information about the current fuel level is given by the internal signal total fuel level, which is calculated by another function, the fuel level estimation (figure 3.1). The output of the system is the low fuel level-warning signal. The basic functionality of the low fuel level-warning is to trigger the warning once the estimated fuel level has decreased below a threshold, and to turn it off only if the estimated fuel level has increased substantially, above a specified level.

3.1.1 Requirement formalisation

The requirements document for the low fuel level-warning included seven requirements and specified one internal input signal, one output signal, and parameter settings for tank sizes and for enabling the functionality. The initial requirement formalisation only used the information given in each requirement and the general instructions for the requirements for the LTL translation. One of the seven requirements was a specification of the parameter setting for a subset of the other requirements. Since this requirement by itself was not a functional mapping between input and output it could not be tested separately, but the information regarding the parameter setting was added to the requirements concerned. The other six covered the functionality of the low fuel level-warning in three cases – specifications of general behaviour, behaviour at start up and behaviour given an error on the internal input signal. One of these requirements specified a value that was not within the given range for the signal, which made the requirement untestable. The remaining five were translated to LTL, resulting in eight formalised requirements to cover the basic parameter settings. The requirements specified different boundary values depending on tank size and type, which were set by input parameters and sensor type.

An additional parameter was used to switch the low fuel level-warning on or off.

The requirements could mainly be expressed in LTL as liveness statements G(φ → X(ψ)) – given input φ, ψ will hold in the next time step. For example:


If totalFuelLevel has status Error or NotAvailable output signal lowFuelLevelWarning shall be set to NotAvailable.

Could be expressed as

G((totalFuelLevel = error | totalFuelLevel = NotAvailable) → X(lowFuelLevelWarning = NotAvailable))

Halfway through the project new LTL operators were added to LBTest, allowing for expressions about past events. In this case study these became useful to express constant qualities, such as parameter settings for the fuel level indicator and tank sizes, by stating H(φ) & G(φ) for each of these variables.

The first strategy for requirement formalisation was based on the information given for each requirement, without adding any additional assumptions for when the requirements would or would not hold. This approach resulted in a number of warnings from LBTest, due to the structure of the original requirements. The separation between the main scenario and additional requirements for abnormal situations included an implicit assumption that these abnormal situations would not occur during the main scenario. But since this information was not explicitly stated, it was not included in the LTL translation. A simplified example of this is one general requirement, expressed as G(φ → X(ψ)), and an additional requirement for error handling, expressed as G(error → X(χ)). The intention of the first requirement is to express G(φ & !error → X(ψ)), with an exception in case of an input error. Without this clarification LBTest produces a counterexample to the requirement, stating that the requirement would not be valid in case of an input error. After discussing the requirements with the test team it became clear that the requirement should be expressed as G(φ & !error → X(ψ)). Another ambiguity, which did not have an obvious answer, was how to handle an overlap between the different variation scenarios specified. For example, the requirements specified one initial output value, during start up, and another output value in case of an error, but it was not clear which of these values should apply in case of an error during start up.

3.1.2 Modelling and partitioning

To be able to use the same wrapper to evaluate different parameter settings, these were set during the start up of each test case. Two of the three parameter settings were merged into one input variable for LBTest, due to a very specific mapping of the two values determining the tank type and the sensor to be used for the test cases to be valid. The output for LBTest, indicating the state of the system, was the low fuel level-warning signal. The output signal was partitioned based on its four discrete values, and the parameter settings for the tank were divided into three basic cases – large tank, small tank or gas tank. The input signal, total fuel level, was partitioned into four equivalence classes.
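The following sketch illustrates the kind of partitioning described above, mapping the continuous estimated fuel level to a small set of symbolic classes; the class names and boundaries are invented for the example, whereas the actual values come from the requirements document.

# Illustrative partitioning of a continuous signal into symbolic classes.

def classify_total_fuel_level(level_percent):
    """Map the estimated fuel level (in percent) to one of four symbolic classes."""
    if level_percent < 0 or level_percent > 100:
        return "invalid"
    if level_percent <= 10:
        return "low"                  # below the warning-on threshold
    if level_percent <= 14:
        return "hysteresis"           # between the warning-on and warning-off thresholds
    return "normal"

print([classify_total_fuel_level(v) for v in (5, 12, 50, 120)])
# ['low', 'hysteresis', 'normal', 'invalid']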

A difficulty in the modelling process was that the internal input, total fuel level, was an estimation of the external fuel level, and could only be adjusted by manipulating the external signal. The external fuel level was set by adjusting the voltage of an analogue input pin, whilst the total fuel level was estimated by using a low pass filter and a filter algorithm based on the value of the external fuel level.

Due to the filtering process the difference between the external fuel level and the estimated fuel level could be substantial. Especially smaller changes in fuel level, less than would be expected during a refill, were difficult to detect. The requirements did cover cases of small changes in the estimated fuel level, which made the function difficult to test.
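As a toy illustration (not Scania's actual filter algorithm) of why the estimate can lag behind the externally injected value, a first-order low-pass filter needs many time steps before a step change in the input is fully reflected in its output; the smoothing factor and values below are invented.

# Toy first-order low-pass filter to illustrate the lag between the injected
# external value and the filtered estimate.

def low_pass(previous_estimate, measurement, alpha=0.05):
    """One discrete filter step; alpha is an invented smoothing factor."""
    return previous_estimate + alpha * (measurement - previous_estimate)

estimate = 50.0                 # estimated fuel level in percent
for step in range(20):          # the external level suddenly drops to 5 percent
    estimate = low_pass(estimate, 5.0)

print(round(estimate, 1))       # about 21 after 20 steps: still far from the injected 5 %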

Testing the low fuel level-warning with this configuration resulted in a model with 26 states, which took 2 hours and 39 minutes to generate with an estimated convergence of 98%, based on 1000 random samples. However, the gaps between the external and estimated fuel level caused unreliable verdicts. In an attempt to work around this issue, more time was added to each test case, to give the filtering process time to adjust the estimated total fuel level to the value of the external fuel level. Nonetheless, the discrepancy between the actual input and the estimated value would occasionally be so large that fail verdicts were given for adequate behaviour of the SUT.

3.1.3 Warnings and detected errors

LBTest gave several warnings for the requirements given the current implementation. Some were due to implicit assumptions and ambiguities in the requirements, others to difficulties in modelling the functionality of the SUT for LBTest without causing accidental errors. The warnings due to requirement ambiguities were handled by adding the implicit assumptions that were lacking, in order to be able to continue with further testing of the SUT. The warnings from LBTest caused by modelling issues proved harder to work around. The attempt to add more time to each test case resulted in a substantially increased runtime, without completely avoiding false negatives from LBTest. This made it difficult to find actual bugs or injected errors in the SUT. The case study was therefore not followed up with benchmarking.

3.2 Case study 2: Dual-circuit steering

The dual-circuit steering functionality is implemented to ensure adequate steering ability in the presence of singular faults or when the engine is not running. The requirements for the function describe when the second hydraulic system, powered by an electric motor, should be activated. Other outputs affected are two CAN signals that communicate the status of the two hydraulic systems, one internal output signal and eight trouble codes. The input to the function consists of four CAN signals, the ignition and two sensors. In addition, a parameter setting specifies whether a dual-circuit steering system is connected (figure 3.2).


Figure 3.2. An illustration of the dual-circuit steering function from a black box perspective.

3.2.1 Requirement formalisation

The requirements document consisted of 32 requirements. The majority of these did not form a mapping between the specified input and output variables. Instead, the requirements included specifications of so-called model variables and their relation to input variables, output variables and each other. The model variables described qualities of the current state of the function, such as secondary circuit handles steering, primary circuit hydraulic malfunction or vehicle is moving. In total nine model variables were used in the document. Some of the variables matched internal signals that could be accessed by reading from memory. These variables could be seen as a form of output, but not on the current, black box, abstraction level. Another interpretation was to view the model variables as internal variables, keeping track of the last registered value of some of the inputs to the function. For example, the variable vehicle is moving was specified to use the last registered value of vehicle speed. These variables turned out to be used in a similar way in the actual implementation. Writing test cases for internal variables in the implementation would be a form of white box testing, evaluating the implemented code rather than the functionality.

To keep the testing at a black box abstraction level the requirements containing model variables were reformulated to only concern the relationship between the specified input and output variables. This was achieved by tracking when the model variables were set and what effect they had on the output of the function. Some of these variables were set to true if and only if a specific diagnostic trouble code was turned on, which made it possible to replace the variable itself with the trouble code. Others turned out to be dependent on several conditions on input, output and other model variables. An additional complication was the naming of the variables, which was not consistent throughout the requirements document.

An example of this process is the formalisation of Req 1 below. To fully understand it, seven other requirements (Req 2 – Req 8) had to be taken into consideration and partially merged. The variable names in the example have been replaced with token names and irrelevant information has been cut out. The input variables are underlined, output variables are bold and the model variables are in italics.

Req 1
  While variable3 == true
    If input4 == not set
      output2 = on

Req 2
  While input1 == off and input2 < 10
    if input3 == off for more than 1 second
      then variable2 = true and troublecode2 = on
    if input3 == on
      then variable2 = false and troublecode2 = off
    (...)

Req 3
  If variable1 == true or troublecode3 = on, then variable3 = true

Req 4
  If variable2 == true then variable3 = true

Req 5
  If input3 == off and the vehicle is moving, then variable4 = true
  (...)

Req 6
  If variable4 == true then variable3 = true

Req 7
  While input1 == on and input2 > 400
    if input3 = off for more than 1 second
      then variable1 = true and troublecode1 = on
    if input3 = on
      then variable1 = false and troublecode1 = off
    (...)

Req 8
  If speed > moving limit
    then vehicle is moving = true
  If speed < stationary limit
    vehicle is moving = false

The resulting LTL requirement that captured the meaning of Req 1, after partitioning the vehicle speed, became:


G( ( (X(troublecode1=on | troublecode3=on | troublecode2=on) | (input3=off & ((speed=medium | speed=high) | (Y(speed=medium | speed=high) & speed=low)))) & input4=notset) → X(output2=on))

Nine of the original requirements were excluded from the formalisation for different reasons. Four requirements turned out to be non-functional at system test level, describing how data should be stored and where to access internal signals.

Three requirements only described qualities of model variables, without affecting the actual output of the function. In addition, two requirements concerned electronic failures, such as the electric motor being short-circuited to battery or overloaded, which had to be tested on the HIL-rig. They were therefore not considered for this case study. The remaining 23 requirements were reformulated, formalised and translated to LTL.

In the reformulation process several requirements were merged to map input variables directly to output variables. Other requirements, containing disjunctions, were separated into two or more LTL requirements. The 23 original requirements that could be formalised resulted in 30 LTL requirements. Only one requirement, which demanded the input parameter to be set to "off", explicitly stated the value of the parameter. For the remaining 29, where the value was assumed to be "on", the setting was not mentioned, which caused obvious counterexamples from LBTest.
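As an illustration of the latter, a reformulated requirement of the form "if input1 is off or troublecode1 is on, then output1 shall be on" (token names hypothetical, in the style of the example above) can be split into two LTL requirements that LBTest checks separately, since a disjunction in the antecedent distributes over the implication:

G( input1=off → X(output1=on) )
G( troublecode1=on → X(output1=on) )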

A majority of the requirements followed the pattern G(φ → X(ψ)), as in the previous case study. Some requirements specified more complex statements by describing the relationship between past and future events, which made the addition of the past operators to LBTest very helpful. For example, some requirements specified events that should occur if a self test had been performed. Self test was neither an input nor an output variable from a black box perspective, but could be regarded as performed if the electric motor had been on while the second sensor had a flow or no flow since the last engine restart. This quality could be expressed by using past operators as:

(O(emotor = on & (sensor2 = flow | sensor2 = noflow) S (ignition = restart)) | (O(emotor = on & (sensor2 = flow | sensor2 = noflow) & H(ignition = on))))

3.2.2 Modelling and partitioning

The configuration of input and output types for LBTest mainly followed the specified input and output variables as stated in the requirements document. One exception was the internal output signal that did not have an effect on system test level and could only be detected by reading from the emulator memory. After consulting with the testers at Scania the requirements concerning this signal were excluded from the case study.

Five of the input and output variables were discrete and could take two to six values, which were specified for LBTest. The continuous variables, engine and vehicle speed, were partitioned based on the boundaries specified in the requirements, for example the speed at which the vehicle should be regarded as moving or the speed at which the electric motor should be turned on.
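As a rough sketch of what such a partitioning amounts to, the test harness around the SUT can map each concrete signal value to one of the abstract values used in the learned model. The fragment below is an illustrative example in C; the boundary values and names are hypothetical and not taken from the Scania requirements.

    /* Hypothetical partitioning of the continuous vehicle speed signal into
     * the abstract values used by LBTest. Boundaries are illustrative only. */
    typedef enum { SPEED_LOW, SPEED_MEDIUM, SPEED_HIGH } speed_class_t;

    speed_class_t partition_speed(double speed_kmh)
    {
        if (speed_kmh < 5.0)      /* e.g. vehicle regarded as stationary */
            return SPEED_LOW;
        if (speed_kmh < 30.0)     /* e.g. below an assumed activation threshold */
            return SPEED_MEDIUM;
        return SPEED_HIGH;
    }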

This modelling strategy resulted in a model with over 60 states. The stopping criterion given to LBTest was 300 random checks, meaning that no difference could be found between the current hypothesis of the SUT and the actual behaviour of the SUT after executing 300 random input values. This level of convergence was reached after 7 hours and 24 minutes. The final measure of convergence found 30 differences out of 1000 random samples, an estimated convergence of 97%.
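The reported figure is presumably the estimated agreement between the learned hypothesis and the SUT over the final random sample:

    estimated convergence = 1 - 30/1000 = 0.97, i.e. approximately 97%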

3.2.3 Warnings and detected errors

The first warning given by LBTest was due to the missing information about the setting of the parameter value. Although only one requirement was tested with this setting, it would have caused warnings for the majority of the other 28 requirements, where the parameter setting was unspecified as well.

Five LTL requirements received a fail verdict from LBTest due to discrepancies between the requirements and the implementation under test. Two of the LTL requirements stemmed from the same original requirement, which specified when the diagnostic trouble codes should be deactivated after a previous activation. LBTest found counterexamples for two different trouble codes that were deactivated by an error on an input signal and activated again after the error was discontinued; a behaviour that the requirement stated should not occur. Two warnings also concerned requirements describing how errors on input signals should affect output signals and trouble codes. The final warning given by LBTest was due to a violated time limit for when a trouble code should be activated. These five failed requirements were discussed with testing engineers at Scania and proved to be real faults in the SUT, although they were not considered safety critical.

3.3 Benchmarking

The benchmarking was conducted using a mutation testing strategy by injecting 10 errors, one by one, into the source code of the dual-circuit steering function. Each version was then tested with both LBTest and the test cases currently in use at the department. The errors injected were changed boundary values, mixed-up input or output variables and altered Boolean values; small, syntactically valid changes to the source code. The faults were picked randomly without checking for equivalent mutants. The fault-injected code was tested with both LBTest and piTest, using the emulator platform. To avoid getting alerts for ambiguities or defects that had already been detected during the previous case study, each affected LTL requirement was either updated with exceptions for these instances or removed.
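The following fragment illustrates the kind of mutation that was injected. It is a made-up example in C, in the spirit of the changed boundary values described above; the names, values and logic are hypothetical and not taken from the dual-circuit steering implementation.

    #include <stdbool.h>

    /* Illustrative only: a hypothetical activation condition for a trouble code. */
    #define FLOW_TIMEOUT_MS 1000   /* original boundary value                     */
    /* #define FLOW_TIMEOUT_MS 100    mutant: boundary value changed              */

    bool troublecode2_should_activate(bool flow_detected, int time_without_flow_ms)
    {
        /* Activate the trouble code when no flow has been detected for longer
         * than the timeout; the mutated timeout changes when this becomes true. */
        return !flow_detected && time_without_flow_ms > FLOW_TIMEOUT_MS;
    }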

A known issue with the current version of LBTest that affected the benchmarking process was that the produced hypothesis of the SUT was deleted after each tested requirement. For each requirement LBTest had to re-learn the SUT and build a new model, instead of reusing the old one. This made testing a large number of requirements a tedious task. The initial plan for the benchmarking was therefore to conjoin all 30 LTL requirements into one, to run against each fault-injected version of the source code, fault by fault. Even though this had worked in previous case studies of LBTest, each attempt to apply the strategy in this project resulted in a premature termination of LBTest. The root of this problem seemed to be the model checker, which could need up to 20 minutes to verify the conjoined requirements and likely caused a timeout in the communication with LBTest.
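Conjoining the requirements amounts to checking a single LTL formula of the form below, where the φi and ψi are placeholders for the antecedents and consequents of the individual requirements, so that the model checker receives one large property instead of 30 small ones:

    G(φ1 → X(ψ1)) & G(φ2 → X(ψ2)) & ... & G(φ30 → X(ψ30))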

The configurations for LBTest were based on recommendations from the software reliability group at KTH. However, during the case studies it became apparent that the recommended SAT-based bounded model checker (BMC) had difficulties handling the size of the models that LBTest produced after about 50 iterations.

Aborting testing this early could lead to an unfair comparison between the test methods. On the other hand, BMC tended to detect defects faster than the model checker based on binary decision diagrams (BDD), so a complete switch of model checker would result in a substantially longer runtime, which could potentially delay the project. Therefore a compromise was used. For each injected error, the source code was tested with piTest and with LBTest using the BMC model checker. If the error was detected using the BMC model checker, no further testing was conducted for that error. If LBTest was not able to find the error after 50 iterations using the BMC model checker, additional testing was conducted using the BDD model checker to make sure that the verdict was not caused by the limitations of the model checker.


Chapter 4

Results

This chapter presents the results of the two case studies and the benchmarking described in Chapter 3. The results of the formalisation process, the warnings given by LBTest and the benchmarking between LBTest and piTest are displayed.

4.1 Requirement formalisation

Analysing the requirements from the two case studies one by one led to the exclusion of 11 requirements from a total of 39 original requirements (table 4.1). These were not tested during the case studies.

The non-functional requirements described qualities of the SUT, such as data storage and accessibility of signals, instead of expected behaviour given input and output values. The other non-formalisable requirements described functionality at white box level that did not have an effect on the function's output variables, or specified values that did not match the possible values of the specific variable. In addition, two requirements were excluded during the second case study since they could only be tested on a HIL-rig. Out of the eleven requirements that were not formalised and tested with LBTest, five were not tested in the current framework for emulator tests either.

                 Original        Non-functional   Other non-      Other not
                 requirements                     formalisable    testable
Case study 1     7               1                1               0
Case study 2     32              4                3               2
Total            39              5                4               2

Table 4.1. The unformalisable requirements from the two case studies.


4.2 Source of detected errors

                 Ambiguous       Wrapper code     Deviations from
                 requirements    or modelling     the requirements
Case study 1     2               1                0
Case study 2     1               0                5
Total            3               1                5

Table 4.2. The root cause for the warnings and fail verdicts given by LBTest.

Five of the fail verdicts and warnings given by LBTest were due to actual discrepancies between the requirements and the implementation that had not previously been detected (table 4.2). These were found during the second case study.

4.3 Benchmarking

Fault    piTest            LBTest – BMC    LBTest – BDD
1        Not terminated    Detected        -
2        Undetected        Detected        -
3        Detected          Undetected      Undetected
4        Not terminated    Detected        -
5        Undetected        Detected        -
6        Not terminated    Undetected      Undetected
7        Detected          Detected        -
8        Undetected        Detected        -
9        Not terminated    Detected        -
10       Not terminated    Detected        -

Table 4.3. The detection of injected faults by piTest and LBTest.

LBTest gave a pass verdict for two instances of fault-injected code and piTest for three. LBTest gave a fail verdict for eight instances and piTest for two. The remaining five errors caused severe problems for piTest, since the execution of the test cases relied on certain initial values being reached during the set-up phase. piTest could not terminate properly when testing the altered code and no final verdict was given.

The BDD model checker was only used when the BMC model checker could not detect the injected error. No new detections were made by using BDD.
