DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2016
Formal Requirement Models for Automotive Embedded Systems
JOHN ERIKSSON
KTH ROYAL INSTITUTE OF TECHNOLOGY
Swedish title: Formella kravmodeller för inbäddade fordonssystem
Master’s Thesis at CSC
Supervisor: Dilian Gurov
Examiner: Mads Dam
Abstract
Embedded systems are a crucial part of modern vehicles and are widely used in the automotive industry to control safety-critical functions. To verify that the software works correctly, formal verification can be used to prove that the code always behaves correctly with respect to a specification. This report investigates how to formulate the specification so that it is easy to use, consistent, and can be used efficiently for code verification. Two different requirement models are studied and applied to real automotive embedded code, and conclusions are drawn about the strengths and weaknesses of each model.
Swedish abstract (translated): Embedded systems are an important part of modern motor vehicles today, and are used across large parts of the vehicle industry to control safety-critical functions. To verify that the software works correctly, formal verification can be used to prove that the code always behaves correctly according to a specification. This report studies how best to formulate such a specification so that it is easy to use, consistent and can be used efficiently for code verification. Two different models are used in the report and applied to a real code module from the vehicle industry. From this, conclusions are drawn about the different models.
Contents
1 Introduction
1.1 Background
1.2 Problem
1.3 Outline
1.4 Collaboration
1.5 Contribution
1.6 Delimitations
1.7 Ethical and societal aspects
2 Theoretical background
2.1 Hoare logic
2.1.1 Weakest precondition
2.1.2 C verification tools
2.2 Embedded systems
2.3 Functions and functional decomposition
2.4 Related work
3 Requirement models
3.1 The studied embedded system
3.2 Semantics
3.3 Relation to Hoare logic
3.4 Studied models
3.4.1 Conditional assignment model
3.4.2 Functional model
3.4.3 Conversion between conditional assignment and functional form
3.5 Removing model variables from specifications
3.6 Translation to code annotations
3.7 Adding annotations to the rest of the functions in the module
3.7.1 Input and output functions
3.7.2 Complicated functions
3.8 Temporal requirements
4 Method
5.1.1 Sufficiency of models
5.1.2 Contradictions in specifications
5.1.3 Implicit assumptions
5.1.4 Variables with multiple complicated conditions
5.1.5 Problematic requirements
5.2 Rewriting requirements to functional form
6 Discussion
Bibliography
Chapter 1
Introduction
Embedded systems such as automotive control systems are often safety-critical, and it is imperative that they behave correctly in all possible situations. Although well-written code can usually be unit-tested to check that it behaves correctly in different situations, there is usually no guarantee that the test suite covers all possible situations or events. With static analysis, the code is instead analyzed logically to prove that it behaves correctly in all possible situations, where correct behaviour is described by a specification. Although formal verification has been a subject of research for a long time, adoption of formal verification methods has been slow.
1.1 Background
In this study, the embedded system code that we work with is part of an existing code base written in C, a low-level language with support for features such as pointer arithmetic. The code is part of the embedded software in Scania trucks and is used to control the steering system. Scania wants to be able to formally verify the code, and there are some challenges that must be solved to accomplish this. Verification requires a specification that precisely describes the correct behaviour of the steering system. The specification usually does not contain implementation details such as internal variables and function declarations, but should nonetheless describe the intended behaviour as precisely as possible. Some semi-formal specifications for the code module already exist at Scania.
The specification must then be added to the code so that an automated verifier
program can prove that the code is correct. This includes adding annotations to
each function in the code that tell the verifier what the function is
supposed to do. This step requires understanding of both the specification and the
existing implementation. Verifying C code is also difficult in itself, due to its weak
typing and pointer arithmetic [24][7, p. 57], although the verifier programs can
handle these issues to a large extent.
1.2 Problem
This study focuses on how to formulate specifications that describe how the embedded systems must behave. The research question is: "What model should be used to state the formal specification requirements for an embedded program operating on inputs and outputs?"
The model(s) must fulfill certain criteria:
• The model must be expressive enough to describe the requirements that we need in order to ensure that the systems behave as desired in all possible situations.
This includes both existing and future requirements for the embedded systems that are studied in this report.
• It must be reasonably simple to formulate new requirements in a precise way.
Preferably, embedded systems programmers without detailed knowledge of logic and theoretical computer science should be able to understand the model and write new requirements.
• The end goal is to be able to use automated tools to formally verify that the code complies with the requirements. The written requirements must therefore be clear and formal enough to be verified automatically with as little manual intervention as possible.
1.3 Outline
This report first gives some background on Hoare logic (the foundation of formal verification of code), on C code verification and tools, and on embedded systems.
The report then describes the studied embedded system, how it works and what assumptions can be made about it. Two alternative requirement models are then formulated: one that is similar to current practice at Scania (essentially a formalisation of their previous specification format) and one that differs more from it. The main work of the study is to apply both models by writing specifications for the studied code module. From this application, conclusions are drawn about the strengths and weaknesses of the models in realistic scenarios, and some recommendations are made for future specification writing.
1.4 Collaboration
The project was done in collaboration with Christian Lidström. His report [20]
mostly focuses on the more practical aspects of the code verification process, such
as code annotation.
1.5 Contribution
The main contribution of the report is the formalisation of two different models that can be used to write requirements for a formal specification of embedded systems, one of which differs significantly from the existing semi-formal requirement documents at Scania. The report also compares the two models and analyses their advantages and drawbacks by applying both in a real embedded systems project.
1.6 Delimitations
The report focuses on writing specifications for I/O-based systems, where a function reads from and writes to external I/O resources that control hardware functions, and where the correct outputs can be defined as functions of the inputs, with few or no other factors affecting the execution of the function.
The report does not cover the practical process of verification in detail. More information on the practical process can be found in Lidström’s report [20].
1.7 Ethical and societal aspects
The software in automotive embedded systems controls important parts of the vehicle, some of which are highly safety-critical. Malfunctions in these systems could lead to material damage, injuries, and possibly even fatalities. It is therefore very important for manufacturers to ensure correct functionality in their products to avoid these consequences, and not making a reasonable effort to ensure this would be unethical. Formal verification could be a very powerful approach to protecting vehicles against severe software malfunctions.
Higher vehicle reliability leads to fewer service interruptions and fewer accidents, which benefits all parts of society that depend on the transport of people and goods in some way.
Better software may also lead to longer vehicle lifespans, which is good from a sustainability point of view. Apart from that, formal verification of embedded automotive software has no particular impact on sustainability.
The study did not feature any interviews, private data collection or experiments
on humans or animals.
Chapter 2
Theoretical background
2.1 Hoare logic
The basics of Hoare logic are explained in this section, as it is the basis for the static C code verification tools used in this project.
One of the most important methods for static analysis of programs is Hoare logic [14]. The central idea of Hoare logic is the Hoare triple {P} S {Q}, where S is a program consisting of one or more statements, and P and Q are Boolean formulas. P, called the precondition, is the condition that holds in the state before S is executed. Q, called the postcondition, is the condition that holds directly after S has been executed. The Hoare triple thus describes the relation between the state before and the state after the execution of the program, and forms a specification for the program.
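As a small illustration, the C function below instruments the triple {0 ≤ x < INT_MAX} y := x + 1 {y > 0} with run-time assertions. The function name `increment_checked` and the chosen triple are hypothetical examples, not part of the studied code. The bound x < INT_MAX is needed in C because signed overflow is undefined behaviour.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical example of the triple {0 <= x < INT_MAX} y := x + 1 {y > 0}.
 * The asserts check P before S and Q after S, for this call only. */
int increment_checked(int x) {
    assert(0 <= x && x < INT_MAX);   /* precondition P; also rules out overflow */
    int y = x + 1;                   /* program S */
    assert(y > 0);                   /* postcondition Q */
    return y;
}
```

A run-time check only tests the inputs that are actually executed, whereas a proof in Hoare logic establishes Q for every state satisfying P.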
To prove that a Hoare triple holds, inference rules are used to decompose and analyse the statements of the program. Some of the most basic ones are:
• Rule of consequence: If $\{P\}\,S\,\{Q\}$, $P' \rightarrow P$ and $Q \rightarrow Q'$, then $\{P'\}\,S\,\{Q'\}$ holds.
• Rule of composition: If a program $S$ consists of two subprograms $S_1$ and $S_2$ executed in sequence, then $\{P\}\,S_1; S_2\,\{R\}$ is valid if and only if there is an intermediate condition $Q$ such that $\{P\}\,S_1\,\{Q\}$ and $\{Q\}\,S_2\,\{R\}$, that is, the postcondition of $S_1$ must be a valid precondition for $S_2$. This is used to decompose a program consisting of multiple statements into smaller parts, until the statements can be analysed separately.
The remaining rules relate to the different kinds of statements of the target language, and depend on its syntax and semantics. In Hoare logic, partial correctness and total correctness can be proven separately. Partial correctness means that {P} S {R} is proven to hold if the program terminates; it does not, however, prove that the program terminates. Total correctness additionally proves that the program terminates.
The original paper by Hoare uses the simple While language. It has variables,
variable assignments, if statements and while statements, but no arrays or pointers.
{b = Y ∧ a = X}t:=a{b = Y ∧ t = X}
{b = Y ∧ t = X}a:=b{a = Y ∧ t = X}
{a = Y ∧ t = X}b:=t{a = Y ∧ b = X}
{a = X ∧ b = Y }t:=a;a:=b;b:=t{a = Y ∧ b = X}
Figure 2.1. Hoare logic example
Notably, each variable has only one unique name and no aliases, an important restriction that does not hold for more advanced languages.
The other While statement rules are:
• Rule of assignment: X:=E is an assignment statement assigning the value of the expression E (which can refer to variables) to the variable X. Then {P[E/X]} X:=E {P} holds, where P[E/X] denotes P with every occurrence of X replaced by E. This rule does not work if variables can have aliases.
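• Rule of if: $\texttt{if } b \texttt{ then } S_1 \texttt{ else } S_2$ is an if statement with Boolean condition $b$ and subprograms $S_1$ and $S_2$. $\{P\}\,\texttt{if } b \texttt{ then } S_1 \texttt{ else } S_2\,\{Q\}$ is valid if and only if $\{P \land b\}\,S_1\,\{Q\}$ and $\{P \land \lnot b\}\,S_2\,\{Q\}$.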
• Rule of partial-while: while b do S is a while statement with Boolean condition b and loop body S. {P} while b do S {P ∧ ¬b} can be derived if {P ∧ b} S {P}. P here is the loop invariant that must hold before the loop is entered, after each loop iteration, and directly after the loop.
Once the loop is completed, the invariant P will still be valid, and the loop condition b will be false. Note that this rule only proves partial correctness.
To prove termination of the while loop as well, a loop variant can be defined: an arithmetic expression whose value strictly decreases in each iteration. The additional condition is that whenever the loop body is executed, the variant must be larger than a certain bound, often 0 when the variant is an integer expression. Since an integer value that is bounded from below cannot decrease forever, the loop must terminate.
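The sketch below shows an invariant and a variant for a simple summation loop. The annotations use ACSL syntax (the Frama-C language described in Section 2.1.2), which an ordinary C compiler treats as comments; the asserts check the same properties at run time. The function `sum_below` and the bound 1000 are hypothetical.

```c
#include <assert.h>

/* Returns 0 + 1 + ... + (n - 1). The ACSL annotations are comments to an
 * ordinary C compiler; the asserts check the same properties at run time,
 * but only for the inputs that are actually executed. */
/*@ requires 0 <= n <= 1000;
    ensures \result == n * (n - 1) / 2;
*/
int sum_below(int n) {
    assert(0 <= n && n <= 1000);            /* precondition */
    int i = 0;
    int s = 0;
    /*@ loop invariant 0 <= i <= n;
        loop invariant s == i * (i - 1) / 2;
        loop assigns i, s;
        loop variant n - i;
    */
    while (i < n) {
        int variant_before = n - i;
        assert(variant_before > 0);         /* variant bounded below by 0 */
        s += i;
        i++;
        assert(n - i < variant_before);     /* variant strictly decreases */
        assert(0 <= i && i <= n && s == i * (i - 1) / 2);  /* invariant preserved */
    }
    /* Invariant plus negated loop condition (i >= n) give i == n,
     * which yields the postcondition. */
    assert(i == n && s == n * (n - 1) / 2);
    return s;
}
```

The invariant together with ¬b is exactly what the partial-while rule provides after the loop; the variant n − i adds the termination argument needed for total correctness.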
2.1.1 Weakest precondition
Predicate transformer semantics can also be used to reason about programs. A predicate transformer is a function that maps a program and a predicate to another predicate. In particular, the weakest precondition function wp(S, R) takes a program S and a postcondition R and returns the weakest precondition that guarantees that S terminates in a state satisfying R. In the other direction, the strongest postcondition function sp(S, P) takes a program S and a precondition P, and returns the strongest postcondition that holds after S is executed from a state satisfying P.
Much like Hoare logic, these predicate transformers are defined inductively for each kind of statement in the language [10, 13]. Some simple definitions can be given for the simple While language, which again does not have arrays or any kind of variable or memory aliasing:

$$
\begin{aligned}
wp(\texttt{skip}, R) &= R \\
wp(X := E, R) &= R[E/X] \\
wp(S_1; S_2, R) &= wp(S_1, wp(S_2, R)) \\
wp(\texttt{if } b \texttt{ then } S_1 \texttt{ else } S_2, R) &= (b \land wp(S_1, R)) \lor (\lnot b \land wp(S_2, R)) \\
wlp(\texttt{while } b \texttt{ do } S, R) &= (b \land wlp(S, wlp(\texttt{while } b \texttt{ do } S, R))) \lor (\lnot b \land R)
\end{aligned}
$$
Figure 2.2. Weakest precondition rules

The rule for while loops is given for the weakest liberal precondition wlp, which only ensures partial correctness. The weakest precondition is stronger and ensures total correctness.
Apart from the while rule, which is recursive and requires some extra calculations to resolve [13], calculating the weakest precondition for a program and a postcondition can be done mechanically, and is therefore used as a method in several static analysers, including VCC and Frama-C [8, 5, 1, 17].
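The mechanical computation can be carried out by hand for the swap program of Figure 2.1 by applying wp(X:=E, R) = R[E/X] backwards through the three assignments. The C sketch below then checks semantically, for every state in a small range, that the computed precondition holds exactly in those states from which running the program establishes R. It relies on the fact that wp(S, R) holds in a state σ if and only if R holds after running S from σ, which is true for deterministic, terminating programs; the bounded check is a test, not a proof. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* State of the swap program from Figure 2.1. */
typedef struct { int a, b, t; } State;

/* S = t:=a; a:=b; b:=t, executed on a copy of the state. */
State run_swap(State s) {
    s.t = s.a;
    s.a = s.b;
    s.b = s.t;
    return s;
}

/* Postcondition R: a = Y and b = X, with X, Y logical constants. */
bool post(State s, int X, int Y) { return s.a == Y && s.b == X; }

/* wp(S, R), computed backwards with wp(X:=E, R) = R[E/X]:
 *   R                 a = Y && b = X
 *   wp(b:=t, R)       a = Y && t = X
 *   wp(a:=b, ...)     b = Y && t = X
 *   wp(t:=a, ...)     b = Y && a = X                                  */
bool wp_swap(State s, int X, int Y) { return s.b == Y && s.a == X; }

/* Counts states in [lo, hi] where wp(S, R) disagrees with "R holds after S". */
int count_mismatches(int lo, int hi) {
    assert(lo <= hi);
    int mismatches = 0;
    for (int X = lo; X <= hi; X++)
        for (int Y = lo; Y <= hi; Y++)
            for (int a = lo; a <= hi; a++)
                for (int b = lo; b <= hi; b++)
                    for (int t = lo; t <= hi; t++) {
                        State s = { a, b, t };
                        if (wp_swap(s, X, Y) != post(run_swap(s), X, Y))
                            mismatches++;
                    }
    return mismatches;
}
```

The computed precondition b = Y ∧ a = X is the one used in the last line of Figure 2.1.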
Predicate transformers can be used to achieve symbolic execution of programs [13, pp. IX]. The main idea is to apply the predicate transformer stepwise through the program to obtain the predicate formulas at each step of the execution. At branches (such as if and while statements), the execution is split into several paths that are evaluated separately. In practice, this may result in a combinatorial explosion of paths that need to be evaluated, which can be handled by merging paths in some way when they join [18].
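A toy C sketch of path splitting follows. Each if statement of a hypothetical straight-line program splits the current path in two, recording the branch condition in the path condition and the effect on a single variable x as a symbolic expression string. With k sequential if statements the number of paths is 2^k. The sketch performs no path merging and does not check whether path conditions are satisfiable; the `Branch` type and `explore` function are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_TEXT 256

/* One if statement of a toy program acting on a single variable x:
 *   if (cond) x = then_pre x then_post; else x = else_pre x else_post;
 * Empty pre/post strings mean that the branch leaves x unchanged. */
typedef struct {
    const char *cond;
    const char *then_pre, *then_post;
    const char *else_pre, *else_post;
} Branch;

/* Symbolically executes branches[i..n-1] starting from path condition pc
 * and symbolic value x. Every branch splits the path in two; complete
 * paths are printed. Returns the number of complete paths. */
int explore(const Branch *branches, int i, int n, const char *pc, const char *x) {
    assert(0 <= i && i <= n);
    if (i == n) {
        printf("path: %s  =>  x = %s\n", pc, x);
        return 1;
    }
    const Branch *br = &branches[i];
    char pc_then[MAX_TEXT], pc_else[MAX_TEXT];
    char x_then[MAX_TEXT], x_else[MAX_TEXT];
    /* Texts longer than MAX_TEXT are truncated; acceptable for a sketch. */
    snprintf(pc_then, sizeof pc_then, "%s && (%s)", pc, br->cond);
    snprintf(pc_else, sizeof pc_else, "%s && !(%s)", pc, br->cond);
    snprintf(x_then, sizeof x_then, "%s%s%s", br->then_pre, x, br->then_post);
    snprintf(x_else, sizeof x_else, "%s%s%s", br->else_pre, x, br->else_post);
    return explore(branches, i + 1, n, pc_then, x_then)
         + explore(branches, i + 1, n, pc_else, x_else);
}
```

For the program `if (a > 0) x = x + 1; else x = x - 1; if (b > 0) x = 2 * x;`, encoded as two `Branch` entries and explored from path condition `true` and initial value `x0`, four paths are produced; the first one printed is `path: true && (a > 0) && (b > 0)  =>  x = 2 * (x0 + 1)`.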
2.1.2 C verification tools
This section describes the two C verification tools that were considered. In the end, VCC was chosen as the verification tool for this study.
VCC
VCC is a tool for C code verification developed by Microsoft Research [3, 8]. It is designed to verify both sequential and concurrent C code. VCC extends C with annotations (see the example in Figure 2.3) that define properties of functions, such as preconditions, postconditions, and the set of pointers and global variables that a function writes to. Annotations can also be used to define invariants on data structures that must hold during execution of the program. Ghost code, that is, code that is seen by the verifier but not by the final compiler, can be written to define more complicated conditions. The verification step is fully automatic and requires no manual interaction. For the verification logic, VCC uses Boogie, a general program verifier that is not tied to any particular language but uses its own language, BoogiePL, to encode any kind of program semantics.

#include <vcc.h>

void swap(int *a, int *b)
  _(writes a, b)
  _(ensures *a == \old(*b) && *b == \old(*a))
{
  int tmp = *a;
  *a = *b;
  *b = tmp;
}
Figure 2.3. Simple swap function with VCC annotations

Boogie then uses weakest-precondition calculus to generate verification conditions that are fed into Z3, an automated theorem prover that supports satisfiability modulo theories (SMT) [1]. Boogie has previously been used to verify code written in Spec#, an extension of C#.
The first step in VCC verification is to parse the C code and perform the usual compile-time checks. If this succeeds, VCC generates all the necessary proof obligations and C semantics definitions and outputs them as Boogie source code that encodes the program and its obligations. Boogie then performs the verification.
Some tools are provided to debug the process if VCC fails to verify a program.
VCC Model Viewer can be used to inspect the generated model and the conditions that caused the verification failure. The Z3 inspector can be used if Z3 crashes or takes too long to verify conditions.
Microsoft Research has developed and implemented a specialised typed memory model for VCC. This model views the memory as a collection of typed objects.
The typed model is proven to be equivalent to the simple untyped heap-as-array memory model in the sense that both models give the same final state and the same errors. VCC previously used an untyped memory model, but this led to poor verification performance [9].
VCC is integrated into Visual Studio, and the verifier itself and its debugging tools can be accessed directly from it. Any verification errors are highlighted in Visual Studio’s code editor. For practical purposes, VCC was chosen as the verification tool for this study, as it fits well within the Windows-based development environment at Scania, and as we were able to achieve some very basic verification early in the project without too much difficulty.
Frama-C
Frama-C is a tool for C code verification that is developed by the two French research institutes CEA and INRIA [2]. Frama-C is a general framework for C analysis, designed with a plugin architecture to allow for development of new analysers [17].
Frama-C comes with several included plugins.
/*@
assigns *a, *b;
ensures *a==\old(*b) && *b==\old(*a);
*/
void swap(int *a, int *b) {
int tmp = *a;
*a = *b;
*b = tmp;
}
Figure 2.4. Simple swap function with ACSL annotations
WP is the Frama-C plugin that is used to statically verify program properties. Properties of functions, such as preconditions, postconditions, and the set of pointers and global variables that a function writes to, are written as annotations in comments, using a standardised syntax known as ACSL [6]. An example of ACSL is given in Figure 2.4. WP uses weakest precondition calculus to compute the necessary verification conditions. Once this has been done, it uses an internal tool, Qed, to perform some basic simplifications. If these are not enough to prove a property, an external theorem prover is used to check the rest. Frama-C supports multiple external theorem provers: the standard one is the automatic SMT theorem prover Alt-Ergo, but the interactive proof assistant Coq can also be used [17].
Frama-C supports multiple separate memory models that can be chosen depending on the type of code. The Hoare model assumes that there are no pointers, and does not support reading from or writing to pointers. The Typed model represents the memory as three separate arrays of integers, floats and pointers respectively, and supports pointer operations [5].
Frama-C has no official Windows support, and we failed to get the latest version to function in our Windows environment. Once we got it running in a virtual Linux environment, we also had some issues with verifying some basic memory safety properties of the code early on in the project. This study will therefore not use Frama-C.
2.2 Embedded systems
Embedded systems are information processing systems that are embedded into an enclosing product. They are typically not programmed for general purposes, but to perform a specific task within some hardware, and often have real-time requirements as well. Embedded computers are used today to control important functions in many application areas, such as automotive applications, avionics, railways and telecommunications [22, p. xii-xiii, 1]. This report focuses on the automotive application.
A newer term used for embedded systems is cyber-physical systems, which can be defined as “integrations of computation and physical processes”, where software and hardware processes interact closely with each other. The term thus includes the hardware itself as well as the software and the embedded computers [22, p. xiii][16].
According to Marwedel [22, p. 1-], cyber-physical systems are expected to be:
• Dependable. Cyber-physical systems are often safety-critical, as they are directly connected to their physical environment and affect it.
• As efficient as possible, to avoid consuming more resources than necessary for their specific task.
• Connected to the physical environment through sensors that collect information and actuators that control functions.
• Reactive, that is, they are constantly interacting with their environment. In a vehicle, for example, the system must constantly react to what happens in the engine, steering, etc.
• Frequently required to meet real-time constraints. For example, if some critical event occurs in an engine, the computers must respond in some way within a small time frame.
Westman and Nyberg [25] presented a way of specifying and structuring requirements for cyber-physical systems. Their approach establishes a theoretical framework to model all parts of a cyber-physical system. Each part (which can be software, hardware or some other physical entity) is modelled as an element. Each element consists of a non-empty set of port variables, which form the interface of the element, and an assertion, which describes the behaviour of the element. Elements are then connected to model a CPS; this model is called an architecture. The interaction between the different elements is modelled by sharing port variables between the element interfaces. Westman and Nyberg also show that this modelling approach can be used for sequential C programs, where individual functions and modules in the program are modelled as elements. This, together with their mathematical model for reasoning about architectures, enables compositional verification of complex programs by verifying the individual components of the C program separately. Under certain conditions, this is sufficient to verify the entire program [26].
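The following C sketch gives one possible reading of this setup on a toy example: two hypothetical elements, a speed monitor E1 and a warning lamp E2, connected through a shared port `warn`. It assumes, as is common in assertion-based frameworks of this kind, that the behaviour of an architecture is the conjunction of its elements' assertions. The port names, the threshold 80 and the exhaustive check are illustrative and not taken from [25].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port variables of a two-element architecture. */
typedef struct {
    int  speed;   /* port of E1 only */
    bool warn;    /* port shared by E1 and E2: this is the connection */
    bool lamp;    /* port of E2 only */
} Ports;

/* Assertion of element E1 over its ports {speed, warn}. */
bool e1_assertion(const Ports *p) { return p->warn == (p->speed > 80); }

/* Assertion of element E2 over its ports {warn, lamp}. */
bool e2_assertion(const Ports *p) { return p->lamp == p->warn; }

/* Assumed composition rule: the architecture's behaviour is the
 * conjunction of the element assertions over all ports. */
bool architecture_assertion(const Ports *p) {
    return e1_assertion(p) && e2_assertion(p);
}

/* Checks a system-level property by enumeration: in every port assignment
 * allowed by the architecture, the lamp is lit iff speed > 80. */
bool lamp_property_holds(int min_speed, int max_speed) {
    assert(min_speed <= max_speed);
    for (int s = min_speed; s <= max_speed; s++)
        for (int w = 0; w <= 1; w++)
            for (int l = 0; l <= 1; l++) {
                Ports p = { s, w, l };
                if (architecture_assertion(&p) && p.lamp != (s > 80))
                    return false;
            }
    return true;
}
```

The exhaustive check is only a stand-in for compositional reasoning, which would derive the property from the two element assertions without enumerating port values.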
2.3 Functions and functional decomposition
One of the models used later in this report relies on some theory about functions, which is summarised in this section.
Functions with multiple arguments may be useful for modelling complex systems.
For example, an embedded system with 5 input sensors and one output can be modelled as a function $f(a, b, c, d, e)$ of type $A \times B \times C \times D \times E \rightarrow O$, where $A$, $B$, $C$, $D$ and $E$ are the value sets of the input sensors and $O$ is the value set of the output.

Figure 2.5. The variable dependencies of the example, as a directed acyclic graph (nodes a, b, c, d, e, h1, h2, j and g).
Reasoning about complex multi-argument functions can be difficult. Functional decomposition [15] can be used to handle this complexity. The main idea is that the function has some kind of internal structure that can be modelled, to turn the function into several smaller functions. For example, the function f (a, b, c, d, e) may have an internal structure that allows it to be modelled as three smaller functions, f (a, b, c, d, e) = g(h(a, b), h(c, d), j(a, c, e)). This representation divides the function into smaller, less complex functions. It also enables better understanding of the system itself, and how different parts of it relate to each other.
In a functional decomposition, each function will depend on some variables and/or some other values derived from other functions. These dependencies can be modelled as a directed acyclic graph, where each function and variable is a node, and edges go from a node to all other nodes which depend on this node. A figure of the graph of the earlier example can be seen in figure 2.5.
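To make the example concrete, the C sketch below picks arbitrary Boolean definitions for h, j and g (the text only fixes their arguments, so these bodies are hypothetical) and checks exhaustively that the decomposed form g(h(a, b), h(c, d), j(a, c, e)) agrees with a flat definition of f. The names `f_flat` and `f_decomposed` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical component functions; only their arguments come from the text. */
bool h(bool x, bool y)         { return x && y; }
bool j(bool x, bool y, bool z) { return (x || y) && !z; }
bool g(bool p, bool q, bool r) { return (p || q) && r; }

/* Monolithic version of f, written without any internal structure. */
bool f_flat(bool a, bool b, bool c, bool d, bool e) {
    return ((a && b) || (c && d)) && (a || c) && !e;
}

/* Decomposed version: f(a, b, c, d, e) = g(h(a, b), h(c, d), j(a, c, e)). */
bool f_decomposed(bool a, bool b, bool c, bool d, bool e) {
    return g(h(a, b), h(c, d), j(a, c, e));
}

/* Compares both versions on all 2^5 = 32 input combinations and
 * returns the number of inputs on which they disagree. */
int count_disagreements(void) {
    int diff = 0;
    for (int bits = 0; bits < 32; bits++) {
        bool a = bits & 1, b = bits & 2, c = bits & 4, d = bits & 8, e = bits & 16;
        if (f_flat(a, b, c, d, e) != f_decomposed(a, b, c, d, e))
            diff++;
    }
    return diff;
}
```

In this example, the truth tables of h, j and g have 4, 8 and 8 rows respectively, compared with 32 rows for f, which illustrates how a decomposition can reduce the complexity of reasoning about the function.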
Methods for functional decomposition have been studied extensively, in particular for Boolean functions [21, 12]. Disjoint tree-like decompositions, that is, decompositions where each variable is used once and the dependency graph forms a tree, were studied by Ashenhurst in 1959 [4], and many other cases have been studied since [21, 12].
2.4 Related work
Dordowsky [11] has presented a case study of Frama-C for verification of embedded avionic software. The code in the case study was part of the control software for a sensor used in a military helicopter assistance system. The code had to be developed to be compliant with DO-178C, an avionics software verification standard.
The standard defines low-level requirements as requirements from which the program can be developed directly, without further information, and the case study explored expressing these low-level requirements in ACSL, the main formal specification language used in Frama-C. Some problems occurred in the project. ACSL is not capable of expressing behaviour across multiple invocations of a function.
It can only specify the requirements for single invocations of functions. The case study mentioned that the Aoraï plugin can be used to model state automata, but this was not explored further. The WP plugin’s support for mathematical expressions in specifications was also poor. If a verification condition fails, it is difficult to see what causes the failure without knowledge of automated prover internals. The study recommended using Frama-C only in highly safety-critical applications, with support from an experienced consultant.
In a case study by Microsoft Research [19], the authors used VCC to verify the Hyper-V Hypervisor, a virtualization system for x86-64 systems. The Hyper-V Hypervisor turns one x86-64 machine into multiple virtual machines, with some extra Hypervisor-specific instructions that the virtual machines can use to create and manage other virtual machines on the same system. The Hypervisor was written in about 100 000 lines of C code. The project used VCC’s ghost code and data capabilities to achieve the verification. Although the entire verification was not completed at the time, several hundred functions were successfully verified, and the authors of the study were confident that the entire code could be verified.
Sikora et al. [23] conducted a study on requirements engineering for embedded systems and on how requirements engineering is done in real companies. Among other things, it studies the use of natural language specifications versus formal requirements models. The study was carried out in 2009 and involved seven different companies that use embedded systems. It included two different questionnaires that were answered by employees of the companies, as well as some in-depth interviews.
The study showed that in practice, most of the companies in the survey wrote their specifications as text documents in natural language, but that some of the participants were not satisfied with this. Most participants used formal models only sometimes or rarely. When asked for their formal model preferences, the participants suggested a wide range of models, including state machines, Simulink and UML.
Many of the participants also believed that the existing requirements engineering methods were sufficient to handle the complexity of the systems that they developed.
However, most of the participants agreed that their current methods were not always sufficient
to assure compliance with quality standards such as ISO 26262, and that models would significantly
improve requirements validation.
Chapter 3
Requirement models
In this chapter, the problem is explained in more detail. The requirements of the specification must be formalized and presented in some way. Two separate models are presented: the conditional assignment model and the functional model.
3.1 The studied embedded system
The embedded system investigated in the study is used to control the steering system in Scania trucks. The trucks have two main steering systems: a hydraulic primary system that should be used whenever possible, and a secondary system that is used in certain situations where the primary system cannot be used.
The embedded system hardware is a computer which runs code that is compiled from C code. It is the behaviour of this C code that we want to study and verify;
no analysis is done on the compiled binary code and it is assumed that the compiler works correctly. The code is divided into several modules, each of which controls some separate function of the vehicle. In this project, we focused on the steering module that is used to control the steering functions mentioned earlier.
The computer is connected to a set of inputs and outputs that our code interacts with. These inputs and outputs are connected to various pieces of hardware in the truck, such as sensors, electric motors and warning lights, and are used to read sensor input and control hardware functions in the truck. Inputs and outputs may also be connected to other modules that are executed in parallel. Each module performs its task by executing its main function periodically. During each execution, the code can read the inputs and set the outputs before completing.
Some additional assumptions are made. First, it is assumed that the function to
be analyzed executes quickly enough that execution time is not an issue
for correctness, and that its execution time is much smaller than the period between executions
of the periodic function. Second, only the final output values that have been set
when the function completes matter; the precise operations during execution do
not make a difference. Third, the function depends only on the values of the
inputs at the beginning of the execution; the relevant input values do not change
during the execution of the function.
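The three assumptions suggest a simple structure for a module's main function, sketched below in C: all inputs are read once into a local snapshot, the outputs are computed from that snapshot only, and the final output values are written at the end. The I/O variable names, the `Inputs`/`Outputs` structs and the threshold rule are hypothetical placeholders, not taken from the Scania steering module.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical I/O variables standing in for the module's real interface. */
volatile int  io_sensor_value;   /* an input  */
volatile bool io_warning_lamp;   /* an output */

typedef struct { int sensor_value; } Inputs;
typedef struct { bool warning_lamp; } Outputs;

/* Pure step function: the outputs depend only on the input snapshot.
 * The threshold 100 is an illustrative placeholder requirement. */
Outputs compute_outputs(Inputs in) {
    Outputs out;
    out.warning_lamp = in.sensor_value > 100;
    return out;
}

/* One periodic invocation: read every input once at the start,
 * compute, then write the final output values at the end. */
void module_main(void) {
    Inputs in = { io_sensor_value };       /* snapshot at start (third assumption) */
    Outputs out = compute_outputs(in);
    io_warning_lamp = out.warning_lamp;    /* only final values matter (second assumption) */
}
```

Separating the pure `compute_outputs` step from the I/O makes the module's behaviour a function of the input snapshot, which is the view taken in the semantics below.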
3.2 Semantics
With these assumptions, the requirements can now be formulated as a relation between input and output that defines the correct final output for each initial input. We define a module to have $m$ inputs $i_1, \dots, i_m$, where each input $i_x$ can have a value in the set $I_x$, and $n$ outputs $o_1, \dots, o_n$, where each output $o_x$ can have a value in the set $O_x$. The desired behaviour of the module can then be stated as a multi-output function

$$f : I_1 \times I_2 \times \dots \times I_m \rightarrow O_1 \times O_2 \times \dots \times O_n,$$

which returns the correct output for each possible input. Like any function, $f$ can also be formulated as a relation, namely a set of tuples in $I_1 \times I_2 \times \dots \times I_m \times O_1 \times O_2 \times \dots \times O_n$ with the condition that for each possible input, there is exactly one tuple having these input values. Alternatively, the multi-output function can be divided into $n$ separate single-output functions $f_k : I_1 \times \dots \times I_m \rightarrow O_k$, one for each output variable. This is equivalent, but more convenient to work with, as we will see later.
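As an illustration of these definitions, the C sketch below encodes a hypothetical module with m = 2 inputs and n = 2 outputs over small finite value sets. It gives the multi-output function f, splits it into one single-output function per output, and counts the tuples of the relation form of f. The value sets, names and behaviour are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical module with m = 2 inputs and n = 2 outputs:
 *   I1 = {0, 1, 2} (mode request),  I2 = {false, true} (fault flag)
 *   O1 = {false, true} (warning),   O2 = {0, 1, 2} (selected mode)   */
typedef struct { bool warning; int mode; } Out;

/* Multi-output specification f : I1 x I2 -> O1 x O2. */
Out f(int i1, bool i2) {
    Out o = { i2, i2 ? 0 : i1 };
    return o;
}

/* The same specification split into single-output functions f_1 and f_2. */
bool f_1(int i1, bool i2) { (void)i1; return i2; }
int  f_2(int i1, bool i2) { return i2 ? 0 : i1; }

/* True if the single-output functions agree with f on every input. */
bool split_matches(void) {
    for (int i1 = 0; i1 <= 2; i1++)
        for (int i2 = 0; i2 <= 1; i2++) {
            Out o = f(i1, i2);
            if (o.warning != f_1(i1, i2) || o.mode != f_2(i1, i2))
                return false;
        }
    return true;
}

/* Size of the relation form of f: tuples (i1, i2, o1, o2) of the full
 * product set whose outputs are exactly the ones f returns. */
int relation_size(void) {
    int count = 0;
    for (int i1 = 0; i1 <= 2; i1++)
        for (int i2 = 0; i2 <= 1; i2++)
            for (int o1 = 0; o1 <= 1; o1++)
                for (int o2 = 0; o2 <= 2; o2++) {
                    Out o = f(i1, i2);
                    if (o.warning == o1 && o.mode == o2)
                        count++;
                }
    return count;
}
```

With |I1| = 3 and |I2| = 2, the relation contains exactly 3 · 2 = 6 tuples, one per possible input, as required of a function.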
The inputs and outputs do not necessarily have to be entirely separate, as there can be cases where a common resource is accessed both through an input and an output. For example, we might want to be able to both read the current on/off status of some hardware, and be able to change this status within the module.
3.3 Relation to Hoare logic
The main function must be verified in order to show that the module conforms to the specification, that is, that it sets the correct output values for each possible input. With the specification formulated as a function $f$, this is straightforward to express. The required postcondition for the module is that if the input values are $i_1, i_2, \dots, i_m$ when execution begins, then the output values when execution completes must be equal to $f(i_1, i_2, \dots, i_m)$.
When verifying our C code with external verifiers such as VCC, annotations will then have to be added to describe this postcondition to the verifier.
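As a rough runtime analogue of this postcondition (not a substitute for a VCC proof), the sketch below records the input values when execution begins, runs a hypothetical implementation `module_main` that operates on assumed global input/output variables, and compares the outputs with $f$ applied to the recorded inputs. The specification used here ($o_1 = i_1 > 200 \land \neg i_2$, $o_2 = i_2 \lor o_1$) and all names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global input/output variables of the module under test. */
static uint8_t in_i1;
static bool    in_i2;
static bool    out_o1;
static bool    out_o2;

/* Specification f, split into one function per output. */
static bool f_o1(uint8_t i1, bool i2) { return i1 > 200 && !i2; }
static bool f_o2(uint8_t i1, bool i2) { return i2 || f_o1(i1, i2); }

/* Hypothetical implementation of the module's main function. */
static void module_main(void) {
    bool high = in_i1 > 200;
    out_o1 = high && !in_i2;
    out_o2 = in_i2 || high;   /* written differently, but equivalent to f_o2 */
}

/* Runtime check of: outputs at the end == f(inputs at the beginning). */
static bool postcondition_holds(uint8_t i1, bool i2) {
    in_i1 = i1;
    in_i2 = i2;
    const uint8_t start_i1 = in_i1;   /* inputs when execution begins */
    const bool    start_i2 = in_i2;
    module_main();
    return out_o1 == f_o1(start_i1, start_i2) &&
           out_o2 == f_o2(start_i1, start_i2);
}

/* Exhaustive over the (small) input space of this toy module. */
static bool holds_for_all_inputs(void) {
    for (int i1 = 0; i1 <= 255; i1++)
        for (int b = 0; b <= 1; b++)
            if (!postcondition_holds((uint8_t)i1, b))
                return false;
    return true;
}
```

A verifier such as VCC establishes the same property statically for every execution, instead of by enumerating inputs as done here.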
3.4 Studied models
For complex systems, the specification function will be very complicated. Such systems may have many inputs and outputs, and the conditions for the output variable values will be complex as well. Nonetheless, engineers must be able to describe the correct input/output behaviour concisely and correctly, and the resulting specification must be easy to convert into annotations that describe the corresponding input/output function.
In the following section, two different models for describing these functions will be described. The first, the conditional assignment model, is similar to, and essentially a formalisation of, existing semi-formal specifications. The other one, the functional model, is more directly similar to the semantics described earlier. Although the two models are syntactically different, both are semantically equivalent to the input/output function as long as the specification is free of errors, and conversion between the two models can be done, as will be shown later.
3.4.1 Conditional assignment model
The present requirements at Scania are written in a semi-formal form that is very similar to this model. This model will be referred to in this report as the conditional assignment model. In this model, requirements are written as conditional assignments, which can be written as $(c \Rightarrow v = x)$. Here $c$ is a condition, which can be any Boolean expression over variables, $v$ is a variable and $x$ is a particular value that this variable can have. Each conditional assignment specifies that if the condition $c$ holds before the function is called, then the variable $v$ must have the value $x$ when execution has completed. More formally, a conditional assignment $(c \Rightarrow o_k = x)$ on an output variable $o_k$ requires that for all possible inputs $i_1, i_2, \dots, i_m$ for which the condition $c$ is true, the output tuple $o = f(i_1, i_2, \dots, i_m)$ satisfies $o_k = x$.
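A single conditional assignment can thus be read as the implication $c \Rightarrow (o_k = x)$, with $c$ evaluated on the initial inputs and $o_k$ on the final outputs. The following C sketch makes that reading explicit for Boolean variables; the `cond_assign_t` representation, the 0-based array indices and the example requirement are assumptions for illustration, not the format used in the specification documents.

```c
#include <assert.h>
#include <stdbool.h>

/* One conditional assignment (c => o_k = x) over Boolean variables.
   The condition is a predicate over the initial input vector. */
typedef struct {
    bool (*cond)(const bool *in);  /* c, evaluated on the initial inputs */
    int  target;                   /* index k of the output o_k (0-based) */
    bool value;                    /* required value x */
} cond_assign_t;

/* The requirement holds for one execution if c is false on the inputs,
   or the final output o_k equals x: the implication c => (o_k == x). */
static bool cond_assign_holds(const cond_assign_t *r,
                              const bool *in, const bool *out) {
    return !r->cond(in) || out[r->target] == r->value;
}

/* Hypothetical example: (i1 => o1 = True), stored at array index 0. */
static bool c_in0(const bool *in) { return in[0]; }
static const cond_assign_t example = { c_in0, 0, true };
```

Note that the requirement is trivially satisfied whenever its condition is false: `cond_assign_holds(&example, (bool[]){false}, (bool[]){false})` returns `true`.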
The existing specification documents use a different, more text-like syntax for each conditional assignment, (If c : v = x), usually with indentation and new lines.
Each conditional assignment is usually given a unique name so that it can be referred to easily in documents. Frequently, conditional assignments written in this syntax assign to multiple variables under a single condition. These cases can easily be handled by treating such a conditional assignment as multiple conditional assignments, one for each variable it assigns to. This syntax will be used for the conditional assignment requirements for most of the report.
There is one additional feature that is used in the existing specifications, and which is also included in the conditional assignment model, called model variables.
These variables represent neither inputs nor outputs. Instead, their values are entirely defined by conditional assignments, as if they were output variables, and their values can then be used in the conditions of other conditional assignments.
Model variables may or may not exist in the actual code; they are merely a useful abstraction.
Model variables are useful when writing conditional assignment requirements.
If model variables were not introduced, then all assignments would be directly to output variables and depend on input variables. These conditional expressions may be complicated and thus difficult to formulate correctly. Some of the conditional expressions could also be redundant, if several output variables depend on similar input vector cases. Instead, model variables can be used to define important cases and conditions that can then be used in other conditionals.
Model variables add an aspect of ordering, as conditional assignments may depend on model variables defined by other conditional assignments. The existing semi-formal specifications do not specify this explicitly, so the usage and ordering of variables must be inferred manually. Circular dependencies between model variables cannot be resolved and must be avoided.
An example of a set of conditional assignments which together specify a certain requirement is given in figures 3.1 and 3.2.
• $(i_2 \Rightarrow o_2 := \mathit{True})$
• $(a_1 \Rightarrow o_2 := \mathit{True})$
• $(\neg i_2 \land \neg a_1 \Rightarrow o_2 := \mathit{False})$
Figure 3.1. An example of three separate conditional assignments. Together, these conditional assignment requirements specify the requirement “Output variable $o_2$ must be equal to $i_2 \lor a_1$”.
Requirement 1:
If i2:
    o2 = True
Requirement 2:
If a1:
    o1 = True
    o2 = True
Requirement 3:
If not(a1) and not(i2):
    o2 = False
Figure 3.2. The formal requirements in figure 3.1 in text form. Note that in this example, the second conditional assignment also assigns to o1. This can be seen as adding the additional requirement $(a_1 \Rightarrow o_1 := \mathit{True})$.
This conditional form has some drawbacks. It is easy to end up with a specification that is flawed in some way. Firstly, the conditionals may not be complete for a variable $v$, that is, the set of conditional assignments is not sufficient to completely define the value of the variable for all possible input vectors. It is usually not clear what to do if this is the case. If the variable $v$ is both an input and an output variable, then these cases can be interpreted as “keep the current value and do nothing”. It could also be interpreted as “value is undefined and does not matter”, or it could be an oversight in the specification. Secondly, two or more conditional assignments may contradict each other. This happens when the conditionals of the conditional assignments can be true at the same time, and their assignments define different values for at least one variable. In this case, it is unclear which value is actually correct. Some care must be taken when writing the conditionals in this form to avoid these issues.
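Because the conditions in figure 3.1 range over only two Boolean variables, both problems can be detected by brute force. The C sketch below is a hypothetical checker, not an existing tool: it enumerates every input vector and reports whether the conditional assignments for one variable are incomplete (no condition holds) or contradictory (two conditions that hold assign different values). For realistic specifications with large input domains, a SAT or SMT solver would be a more practical way to answer the same questions.

```c
#include <assert.h>
#include <stdbool.h>

/* Conditional assignments to one Boolean variable, with conditions over
   two Boolean variables (here i2 and a1, as in figure 3.1). */
typedef struct {
    bool (*cond)(bool i2, bool a1);
    bool value;
} case_t;

typedef enum { SPEC_OK, SPEC_INCOMPLETE, SPEC_CONTRADICTORY } spec_status_t;

/* For every input vector, at least one condition must hold (completeness)
   and all conditions that hold must assign the same value (consistency). */
static spec_status_t check_cases(const case_t *cases, int n) {
    for (int m = 0; m < 4; m++) {
        bool i2 = m & 1, a1 = (m >> 1) & 1;
        int matches = 0;
        bool first = false;
        for (int k = 0; k < n; k++) {
            if (!cases[k].cond(i2, a1))
                continue;
            if (matches > 0 && cases[k].value != first)
                return SPEC_CONTRADICTORY;
            first = cases[k].value;
            matches++;
        }
        if (matches == 0)
            return SPEC_INCOMPLETE;
    }
    return SPEC_OK;
}

static bool c1(bool i2, bool a1) { (void)a1; return i2; }
static bool c2(bool i2, bool a1) { (void)i2; return a1; }
static bool c3(bool i2, bool a1) { return !i2 && !a1; }

/* Figure 3.1: complete and consistent. */
static const case_t fig31[] = { {c1, true}, {c2, true}, {c3, false} };
/* Flawed variant: requirement 3 dropped, so (!i2 && !a1) is not covered. */
static const case_t missing3[] = { {c1, true}, {c2, true} };
/* Flawed variant: (a1 => o2 = False) clashes with (i2 => o2 = True). */
static const case_t clash[] = { {c1, true}, {c2, false}, {c3, false} };
```

Overlapping conditions are accepted as long as they agree: in figure 3.1, $i_2$ and $a_1$ can both be true, but both assign True, so `check_cases(fig31, 3)` reports `SPEC_OK`.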
For verification purposes, the conditional assignments must be related to Hoare logic. Each conditional assignment is simply a condition and a set of assignments that must have happened if the condition was true, that is, the condition c implies that some equalities will hold after the program or function that we want to verify has executed. We can write each such implication (c ⇒ e) as a postcondition.
Implications are also easy to write as verification annotations.
In some cases, it may be necessary to distinguish between the value of a variable before the function is executed and its value after. Certain variables are mapped to both an input and an output. In this case, programs will typically read the value, perform some logic, and possibly write a new value. Requirements that use the input should refer to the value of the variable before execution, as the value may have been changed by the time execution completes.
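The distinction matters for variables that are both read and written. In VCC, postconditions can refer to the pre-state with `\old(...)`; the runtime sketch below imitates this by saving the value before calling a hypothetical implementation. The lamp resource, the toggle requirement and all names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared resource: a lamp whose on/off status can both be
   read (input) and written (output) by the module. */
static bool lamp_on;

/* Hypothetical implementation: toggle the lamp when requested. */
static void module_main(bool toggle_request) {
    if (toggle_request)
        lamp_on = !lamp_on;
}

/* Requirements: (toggle_request => lamp_on_after = !lamp_on_before) and
   (!toggle_request => lamp_on_after = lamp_on_before). The value from
   before execution is saved first, similar to \old(lamp_on) in VCC. */
static bool requirement_holds(bool initial, bool toggle_request) {
    lamp_on = initial;
    const bool old_lamp_on = lamp_on;   /* value before execution */
    module_main(toggle_request);
    return toggle_request ? lamp_on == !old_lamp_on
                          : lamp_on == old_lamp_on;
}
```

Had the check used the post-execution value on both sides, the toggle case would read `lamp_on == !lamp_on`, which can never hold.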
3.4.2 Functional model
We will now present a different requirement model, which will be referred to as the functional model. In this form, the specification is formulated as a collection of functions, each of which defines the correct value of a variable for a given input. It is essentially very similar to the input/output function semantics, except that the multi-output function is split into several functions, one for each output variable. For example:

$$o_1 = f_1(i_1, i_2, \dots, i_m)$$
$$o_2 = f_2(i_1, i_2, \dots, i_m)$$
$$o_3 = f_3(i_1, i_2, \dots, i_m)$$
The most basic formalization defines the correct value of output variables as a direct function of input variables. Typically, most functions will not depend on every single input variable, in which case they can be simplified into a function taking fewer input variable arguments. Even with this simplification, it may be difficult to define the value of each output variable directly based on the input variables, for the reasons that were given in the section on the conditional assignment model (chapter 3.4.1).
We can add model variables that work similarly to the model variables in the conditional assignment model. They are neither input nor output variables, and only exist within the requirement model. Their value can be defined by input variables or other model variables, and once they have been defined, they can be used as arguments to subsequent functions much like input variables. An example of how model variables can be used, where the new model variables are denoted $a_i$:

$$a_1 = f_1(i_1, i_2) \qquad o_1 = f_2(a_1) \qquad o_2 = f_3(a_1, i_3)$$
$$a_2 = f_4(a_1, i_4) \qquad o_3 = f_5(a_2)$$

Figure 3.3. The variable dependencies of the example, as a directed acyclic graph (inputs $i_1, \dots, i_4$; model variables $a_1, a_2$; outputs $o_1, o_2, o_3$).
The addition of model variables requires more function definitions. However, these function definitions will in many cases be significantly simpler. Redundancy will also be reduced, as a model variable can represent a specific condition, which can then be used in many functions directly. These functions may also be more natural to define for the developers, as they can define the main parts of the requirement model one step at a time, rather than as monolithic input-to-output functions. The model variables can be seen as a type of function decomposition, where the functions are divided into several smaller functions.
Model variables thus act as both input and output variables: they are the results of some functions and the arguments of others. Each variable must, however, only depend on input variables or other model variables, and there can be no cycles in the dependencies between variables. The dependencies between input variables, model variables and output variables can be illustrated as a directed acyclic graph, with input variables on one side, output variables on the other side and the model variables in the middle. The earlier example is shown as a directed acyclic graph in figure 3.3.
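Finding an order in which the functions can be evaluated, and detecting forbidden cycles, amounts to a topological sort of this graph. The C sketch below applies a simple variant of Kahn's algorithm (repeatedly pick a variable whose arguments have all been evaluated) to the dependencies of the example; the adjacency-matrix encoding and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Variables of the example: inputs, model variables, outputs. */
enum { I1, I2, I3, I4, A1, A2, O1, O2, O3, NVARS };

/* dep[v][w] is true if the function defining v takes w as an argument,
   i.e. there is an edge w -> v in the dependency graph. */
static bool dep[NVARS][NVARS];
static int eval_order[NVARS];

static void add_dep(int v, int w) { dep[v][w] = true; }

static void build_example(void) {
    for (int v = 0; v < NVARS; v++)
        for (int w = 0; w < NVARS; w++)
            dep[v][w] = false;
    add_dep(A1, I1); add_dep(A1, I2);   /* a1 = f1(i1, i2) */
    add_dep(O1, A1);                    /* o1 = f2(a1)     */
    add_dep(O2, A1); add_dep(O2, I3);   /* o2 = f3(a1, i3) */
    add_dep(A2, A1); add_dep(A2, I4);   /* a2 = f4(a1, i4) */
    add_dep(O3, A2);                    /* o3 = f5(a2)     */
}

/* Kahn-style topological sort. Returns false if at some point every
   remaining variable still waits for an argument, i.e. there is a cycle. */
static bool evaluation_order(int order[NVARS]) {
    int missing[NVARS];                 /* arguments not yet evaluated */
    bool done[NVARS] = { false };
    for (int v = 0; v < NVARS; v++) {
        missing[v] = 0;
        for (int w = 0; w < NVARS; w++)
            missing[v] += dep[v][w];
    }
    for (int n = 0; n < NVARS; n++) {
        int next = -1;
        for (int v = 0; v < NVARS && next < 0; v++)
            if (!done[v] && missing[v] == 0)
                next = v;
        if (next < 0)
            return false;
        done[next] = true;
        order[n] = next;
        for (int v = 0; v < NVARS; v++)
            if (dep[v][next])
                missing[v]--;
    }
    return true;
}

/* Builds the example (optionally with a1 also depending on a2, which
   creates the cycle a1 -> a2 -> a1) and computes an evaluation order. */
static bool order_for_example(bool with_cycle) {
    build_example();
    if (with_cycle)
        add_dep(A1, A2);
    return evaluation_order(eval_order);
}
```

For the example, the computed order is $i_1, i_2, i_3, i_4, a_1, a_2, o_1, o_2, o_3$; adding the edge $a_2 \to a_1$ makes the graph cyclic and `order_for_example(true)` returns `false`.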
The functional model has some benefits compared to the conditional assignment model. If the function $f_v$ defining the value of a variable $v$ is total, then the value of $v$ will also be completely defined. Since each value is only defined by a single function rather than by multiple conditional assignments, it is easy to see that there will be no contradictions as long as the function is correctly defined with one value for each input.
One particular issue is whether the functions have to be total or not. It is possible that the specification only cares about the value of a certain variable in some cases, but not in others. In this case, the function only needs to be partial, so that the variable is allowed to be undefined (its value is not relevant) for some input vectors. If a model variable is left undefined for some input vectors, then all other variables depending on this model variable will also be undefined for the same input vectors, as it is not known whether a condition involving the model variable is true or not. In other cases, where a resource has both an input variable and an output variable, the output variable should keep its old value in certain cases. This should be written explicitly in the function definitions to distinguish those cases from the undefined ones.
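One possible encoding of partial functions is a three-valued result, which keeps “undefined” separate from an explicit “keep the old value” case. The C sketch below is only an assumption about how this could look; the sensor-validity condition, the status variable and all names are hypothetical, and undefinedness is propagated conservatively, following the convention described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Three-valued result: a partial function may leave a variable undefined
   ("value not relevant") for some input vectors. */
typedef enum { V_FALSE, V_TRUE, V_UNDEF } tri_t;

/* Hypothetical partial model variable: only defined when the sensor
   reading is valid. */
static tri_t a_model(bool sensor_valid, int reading) {
    if (!sensor_valid)
        return V_UNDEF;
    return reading > 200 ? V_TRUE : V_FALSE;
}

/* A variable depending on an undefined model variable is undefined too. */
static tri_t o_depends(tri_t a, bool other) {
    if (a == V_UNDEF)
        return V_UNDEF;
    return (a == V_TRUE && other) ? V_TRUE : V_FALSE;
}

/* A resource mapped to both an input and an output: "keep the old value"
   is an explicit case, which is different from leaving it undefined. */
static bool status_next(bool old_status, bool set, bool clear) {
    if (set)
        return true;
    if (clear)
        return false;
    return old_status;   /* explicit case: keep the current value */
}
```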
The functions themselves can be formulated in any way, but a simple way is to state them in case notation. If several separate functions depend on identical or similar conditions, then these conditions can be stated as a model variable, which can then be used by other functions.
As an example, some requirements for the example given earlier in this section will be stated:
• $i_1$ is an integer input, representing some kind of sensor measurement. Its value will be between 0 and 255. $i_2$, $i_3$ and $i_4$ are all Boolean and represent some other measurements.
• $a_1 = \begin{cases} t & \text{if } i_1 > 200 \land i_2 = f \\ f & \text{otherwise} \end{cases}$
• $o_1 = a_1$
• $o_2 = \begin{cases} t & \text{if } i_2 \lor a_1 \\ f & \text{otherwise} \end{cases}$
• $a_2 = \begin{cases} t & \text{if } a_1 = t \text{ for at least one second} \\ f & \text{otherwise} \end{cases}$
This is a special case, as it is a temporal requirement which must be handled in some way. We will show how this can be done in chapter 3.8.
• $o_3 = \begin{cases} t & \text{if } a_2 \land i_4 \\ f & \text{otherwise} \end{cases}$
This variable depends on the previous temporal requirement.
We can note that the model variable $a_1$ is used in three separate places. Representing this condition as a model variable avoids redundancy in the functions. It also makes it easier for the developers to write the functions, as they can define $a_1$ as an important system condition first, and then write the requirements that depend on this condition.
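The case definitions above translate almost directly into executable C, which could serve as a reference model. In this sketch (all names assumed), $a_1$ is computed once and reused, mirroring the decomposition; the temporal variable $a_2$ is not modelled and is simply passed in as a parameter, since its handling is deferred to chapter 3.8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Direct transcription of the example's case definitions.
   uint8_t restricts i1 to the stated range 0..255. */
static bool spec_a1(uint8_t i1, bool i2) { return i1 > 200 && !i2; }
static bool spec_o1(bool a1)             { return a1; }
static bool spec_o2(bool i2, bool a1)    { return i2 || a1; }
static bool spec_o3(bool a2, bool i4)    { return a2 && i4; }

typedef struct { bool a1, o1, o2, o3; } example_vals_t;

/* Evaluate in dependency order: a1 is computed once and reused by o1 and
   o2 (and, through a2, by o3), instead of repeating its condition.
   a2 is an input here because its temporal definition is not modelled. */
static example_vals_t eval_example(uint8_t i1, bool i2, bool i4, bool a2) {
    example_vals_t v;
    v.a1 = spec_a1(i1, i2);
    v.o1 = spec_o1(v.a1);
    v.o2 = spec_o2(i2, v.a1);
    v.o3 = spec_o3(a2, i4);
    return v;
}
```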
3.4.3 Conversion between conditional assignment and functional form
As mentioned before, the two models are semantically equivalent. It is therefore relatively easy to translate back and forth between the conditional assignment and functional form. This section will go through these conversions.
From conditional assignment form to functional form
We have a set of conditional assignment requirements, each of the form $c_i \Rightarrow v_i = x_i$ as described earlier. It can be assumed without loss of generality that each conditional assignment contains a single assignment, as any conditional assignment assigning to two or more variables can be split into multiple conditional assignments sharing the same condition. We now want to define, for each output and model variable $v$, a function $f_v$ that defines its value and is equivalent to the conditional assignments.
To do this, we first take all conditional assignments assigning to the variable $v$. These assignments may have different conditions and different values. Each conditional assignment defines a case where the variable must have a certain value. The final function is then obtained by combining all these cases into a function.
These cases together must define the value of $v$ completely and without contradictions. The conditions will depend on input and model variables. If all conditions of the assignments are disjoint (there are no input vectors such that two or more conditions are true at the same time), then there are no possible contradictions and $f_v$ will be well-defined.
If the function is not complete, that is, the function is not defined for at least one input vector, then this must be handled in some way. If the input can’t occur in practice, then this can be ignored. If it can occur, then the specification is not complete and must be extended.
If the conditions are not disjoint, that is, there is at least one input vector for which two or more conditions are true, and these conditions assign different values to $v$, then this must be handled. If the input cannot occur in practice, then this can be ignored. If it can occur, then the specification is flawed and must be changed.
As an example, we can take the three conditional assignments in figure 3.1 to generate a function for the output variable $o_2$. There are three separate requirements that assign to the variable. These cases can be combined into a total function which fully defines the value of $o_2$ without contradictions:

$$o_2 = \begin{cases} \mathit{True} & \text{if } i_2 \lor a_1 \\ \mathit{False} & \text{if } \neg i_2 \land \neg a_1 \end{cases}$$

From functional form to conditional assignment form
This conversion depends on how the functions are stated. If they are stated as cases, as suggested before, then it is easy to convert to conditional assignments. In each function $f_v$, convert each case $c_i \Rightarrow x_i$ into a conditional assignment assigning $x_i$ to $v$ if $c_i$ is true.
For example, we can take a function for some output variable:
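$$o = \begin{cases} A & \text{if } \neg x_1 \\ B & \text{if } x_1 \land x_2 < 200 \\ C & \text{if } x_1 \land x_2 \geq 200 \end{cases}$$

We can see that, depending on which of the three cases applies, the variable should be assigned one of three different values. These cases can each be written as a separate conditional assignment:
• $\neg x_1 \Rightarrow o = A$
• $x_1 \land x_2 < 200 \Rightarrow o = B$
• $x_1 \land x_2 \geq 200 \Rightarrow o = C$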
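That the two forms agree can be checked mechanically for this example. The C sketch below treats $x_1$ as Boolean and $x_2$ as an integer (the text does not state the domains, so this is an assumption) and verifies, over a finite range of $x_2$, that exactly one of the three conditional assignments applies to each input and that it assigns the same value as the case function. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OUT_A, OUT_B, OUT_C } out_t;

/* Functional form of the example. */
static out_t f_o(bool x1, int x2) {
    if (!x1)
        return OUT_A;
    if (x2 < 200)
        return OUT_B;
    return OUT_C;                  /* remaining case: x1 && x2 >= 200 */
}

/* The same requirement as three conditional assignments, one per case. */
typedef struct { bool (*cond)(bool, int); out_t value; } cassign_t;
static bool case_a(bool x1, int x2) { (void)x2; return !x1; }
static bool case_b(bool x1, int x2) { return x1 && x2 < 200; }
static bool case_c(bool x1, int x2) { return x1 && x2 >= 200; }
static const cassign_t reqs[] = {
    {case_a, OUT_A}, {case_b, OUT_B}, {case_c, OUT_C}
};

/* Since the cases are disjoint and cover all inputs, exactly one
   conditional assignment must fire per input, and it must agree with f_o. */
static bool equivalent_on(int x2_min, int x2_max) {
    for (int b = 0; b <= 1; b++)
        for (int x2 = x2_min; x2 <= x2_max; x2++) {
            int fired = 0;
            for (int k = 0; k < 3; k++)
                if (reqs[k].cond(b, x2)) {
                    fired++;
                    if (reqs[k].value != f_o(b, x2))
                        return false;
                }
            if (fired != 1)
                return false;
        }
    return true;
}
```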
3.5 Removing model variables from specifications
It may be desirable to convert the requirements from a form with model variables to a form without model variables. This can be done by replacing each model variable with the requirements that define it.
Converting in the other direction (from requirements without model variables to requirements with model variables) is a general function decomposition problem, and thus a more complicated process. It requires applying function decomposition to decompose complex conditions into new model variables. This report will therefore not suggest any particular method to do this conversion, although existing methods of functional decomposition can be used to achieve this. This report instead recommends writing the specification with model variables from the beginning, as is already done at Scania.
Substitution in functional form
Assume that we have a function $f_v$ which defines the value of an output variable $v$.
Some cases will refer to model variables, and only be true if a model variable has a certain value. The translation should remove all model variables from all functions.
This is done by function substitution: each model variable $w$ is replaced with its function $f_w$, which depends on some other variables and is itself defined as a number of cases. The next step is to try to simplify the expression so that nested case expressions are avoided. The result may still contain model variables if $f_w$ depended on other model variables, so the procedure is applied recursively until all model variables are replaced. If the functional model is valid, then there are no cycles in the variable dependencies and the process will terminate.
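For $o_2$ in the running example, the substitution step can be sanity-checked by comparing the definition that uses $a_1$ with the substituted one over the whole stated input range ($0 \le i_1 \le 255$). The sketch below does this in C with assumed names; it also checks the further simplification $i_2 \lor (i_1 > 200 \land \neg i_2) \equiv i_2 \lor i_1 > 200$, which follows from a standard Boolean identity but is not used in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Model variable a1 of the running example: t if i1 > 200 and i2 = f. */
static bool f_a1(int i1, bool i2) { return i1 > 200 && !i2; }

/* o2 as originally stated, in terms of the model variable a1. */
static bool f_o2_with_model(int i1, bool i2) { return i2 || f_a1(i1, i2); }

/* o2 after substituting f_a1 for a1 and removing the nested case. */
static bool f_o2_substituted(int i1, bool i2) {
    return i2 || (i1 > 200 && !i2);
}

/* A further simplification: !i2 is redundant next to "i2 ||". */
static bool f_o2_simplified(int i1, bool i2) { return i2 || i1 > 200; }

/* Substitution and simplification must not change the meaning, so all
   three definitions must agree on every input with 0 <= i1 <= 255. */
static bool substitution_preserves_o2(void) {
    for (int i1 = 0; i1 <= 255; i1++)
        for (int b = 0; b <= 1; b++) {
            bool reference = f_o2_with_model(i1, b);
            if (f_o2_substituted(i1, b) != reference ||
                f_o2_simplified(i1, b) != reference)
                return false;
        }
    return true;
}
```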
Substitution in conditional assignment form
Assume that we have a conditional assignment that assigns to an output variable but contains a model variable in its condition. We may want to translate it such that some or all model variables are removed from the requirement. One approach is again to replace the model variable $w$ with its function $f_w$, and then simplify the condition. This requires calculating the function $f_w$, so it may be necessary to convert the entire model, or at least all model variables, to functional form first.
3.6 Translation to code annotations
Whether we use the conditional assignment or the functional model, we want to annotate the main function to ensure that it fulfills the specification. The code to be verified will operate on the input and output variables in some way (in our case, through special input and output functions). The model variables, however, will not exist in the hardware itself, and may not exist as explicit variables in the code.
Two different approaches for handling the model variables are suggested here.
The first approach introduces ghost variables that represent the model variables into the annotations. Ghost variables are declared with code annotations and can then be accessed within annotations only. These annotations include pre- and postconditions and ghost code, that is, code that exists only for verification. Ghost code and data can access other ghost code and data, and can also read regular variables.
However, ghost code cannot write to regular variables, and regular code cannot access ghost data. Ghost data and ghost code will thus have no effect on the execution of the actual code. Both VCC and Frama-C have extensive support for ghost variables and code, and an example of VCC ghost code syntax is given in figure 3.4.
The second approach is to first translate the requirements to a form that does not have model variables using function substitution (see section 3.5). The resulting requirements will only contain input and output variables, and can then be translated to annotations.
The ghost variable approach results in annotations that are more similar to the original requirements. This makes them easier to read and understand. However, the developer has to add ghost variable declarations in the code file, and ghost code within the functions themselves. This has some drawbacks. Adding all these annotations requires a significant effort. It also makes the verification process more dependent on the implementation code, as annotations must be added within the functions in the code. This dependency makes it more difficult to maintain code as well, as rewriting old code may require rewriting parts of the verification ghost code.
The function substitution method, on the other hand, will give complicated annotation expressions that are similar to how the specification would look if model variables were not introduced. These annotations may be difficult to understand and maintain. However, it requires significantly fewer annotations in the implementation code and little or no ghost code or variables. This makes the verification more independent of the implementation, so that changes can be made to the implementation more easily.
#include <vcc.h>

_(ghost int modelVariable1);

void function(int a)
  _(ensures modelVariable1 == a % 3)
{
  int local;
  // some code here...
  _(ghost modelVariable1 = local);
  // The program will verify correctly if VCC
  // can prove that local will always be equal to a % 3
}
Figure 3.4. Illustration of ghost code syntax in VCC
#include <vcc.h>

// i1, i2 (inputs) and o1, o2 (outputs) are assumed to be declared elsewhere
_(ghost \bool a1);
_(ghost \bool a2);

void main_function()
  _(requires i1>=0 && i1<=255)
  _(ensures (i1>200 && !i2) ==> a1)
  _(ensures (i1<=200 || i2) ==> !a1)
  _(ensures o1==a1)
  _(ensures o2==(i2 || a1))
{
  ...
}
Figure 3.5. Example code annotations with model variables (VCC syntax)
Once this is done, the requirements can be translated to postcondition annota- tions as described in chapter 3.3.
Annotation example
Some possible code annotations for our earlier example are shown in figure 3.5. The different cases for each function are written as post-condition annotations. The temporal condition is still not handled, but will be discussed in chapter 3.8.
Alternatively, the second approach, removing the model variables, can be used. In the example, two output variables are used,

$$o_1 = a_1 \qquad o_2 = \begin{cases} t & \text{if } i_2 \lor a_1 \\ f & \text{otherwise} \end{cases}$$

and one model variable,

$$a_1 = \begin{cases} t & \text{if } i_1 > 200 \land i_2 = f \\ f & \text{otherwise} \end{cases}.$$

If we use the function substitution approach for the output variables, we will get

$$o_1 = \begin{cases} t & \text{if } i_1 > 200 \land i_2 = f \\ f & \text{otherwise} \end{cases}$$
$$o_2 = \begin{cases} t & \text{if } i_2 \lor \begin{cases} t & \text{if } i_1 > 200 \land i_2 = f \\ f & \text{otherwise} \end{cases} \\ f & \text{otherwise} \end{cases}$$

$o_1$ is simple enough and does not need to be simplified further. The latter can be simplified to

$$o_2 = \begin{cases} t & \text{if } i_2 \lor (i_1 > 200 \land i_2 = f) \\ f & \text{otherwise} \end{cases}$$

These functions can now be translated to postcondition annotations without introducing ghost code or data. The annotations that can be generated from the example are given in figure 3.6.
void main_function()
  _(requires i1>=0 && i1<=255)
  _(ensures (i1>200 && !i2) ==> o1)
  _(ensures (i1<=200 || i2) ==> !o1)
  _(ensures i2 ==> o2)
  _(ensures (i1>200 && !i2) ==> o2)
  _(ensures (!i2 && !(i1>200 && !i2)) ==> !o2)
{
  ...
}
Figure 3.6. Example code annotations without model variables (VCC syntax)
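Before running VCC, an annotation set of this kind can be sanity-checked against a candidate implementation at runtime. The C sketch below encodes `==>` as a hypothetical `IMPLIES` macro and evaluates postconditions for $o_1$ and $o_2$ derived from the substituted functions above, for every input allowed by the precondition; `main_function_impl` is an invented implementation, and such testing only complements, rather than replaces, the static proof.

```c
#include <assert.h>
#include <stdbool.h>

/* Runtime stand-in for the ==> operator of VCC annotations. */
#define IMPLIES(p, q) (!(p) || (q))

/* Hypothetical implementation of the two outputs of main_function. */
static void main_function_impl(int i1, bool i2, bool *o1, bool *o2) {
    *o1 = i1 > 200 && !i2;
    *o2 = i2 || i1 > 200;
}

/* Postconditions derived from the substituted functions for o1 and o2,
   checked for one input vector. */
static bool postconditions_hold(int i1, bool i2) {
    bool o1, o2;
    main_function_impl(i1, i2, &o1, &o2);
    return IMPLIES(i1 > 200 && !i2, o1)
        && IMPLIES(i1 <= 200 || i2, !o1)
        && IMPLIES(i2, o2)
        && IMPLIES(i1 > 200 && !i2, o2)
        && IMPLIES(!i2 && !(i1 > 200 && !i2), !o2);
}

/* All inputs satisfying the precondition 0 <= i1 <= 255. */
static bool postconditions_hold_for_all(void) {
    for (int i1 = 0; i1 <= 255; i1++)
        for (int b = 0; b <= 1; b++)
            if (!postconditions_hold(i1, b))
                return false;
    return true;
}
```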
3.7 Adding annotations to the rest of the functions in the module
The previous chapters explain how to specify the correct behaviour for the module
as a whole, and generate annotations for the main system function. However, well-
written non-trivial programs are usually not written as a single big function, but are
divided into several functions that call each other. These functions will have to be
annotated as well, so that the main function and thus the module can be verified.