Computing component specifications from global system requirements


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2017

Computing component specifications from global system requirements

CARL BJÖRKMAN

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Computing component specifications from global system requirements

CARL BJÖRKMAN

Master in Computer Science
Date: June 15, 2017

Supervisor: Dilian Gurov
Examiner: Mads Dam

Swedish title: Beräkning av komponentspecifikationer från globala systemkrav
School of Computer Science and Communication


Abstract

If we have a program with strict control flow security requirements and want to ensure system requirements by verifying properties of that program, but part of the code base is in the form of a plug-in or third-party library that is not available at the time of verification, the procedure presented in this thesis can be used to generate the requirements that those plug-ins or third-party libraries must fulfil in order for the final product to satisfy the given system requirements.

This thesis builds upon a transformation procedure that turns control flow properties of a behavioural form into a structural form. The control flow properties focus purely on control flow in the sense that they abstract away any kind of program data and target only call and return events. By behavioural properties we refer to properties regarding execution behaviour and by structural properties to properties regarding sequences of instructions in the source code or object code.

The result presented in this thesis takes this transformation procedure one step further and assumes that some methods (or functions or procedures, depending on the programming language) are given in the form of models called flow graphs, while the remaining methods are left unspecified. The output then becomes a set of structural constraints for the unspecified methods, which they must adhere to in order for any completion of the partial flow graph to satisfy the behavioural formula.


Sammanfattning

Om vi har ett program med strikta kontrollflödeskrav och vill garantera att vissa systemkrav uppfylls genom att verifiera formella egenskaper av detta program, samtidigt som en del av kodbasen är i form av ett plug-in eller tredjeparts-bibliotek som vi inte har tillgång till vid verifieringen, så kan proceduren som presenteras i detta examensarbete användas för att generera de systemkrav som de plug-in eller tredjeparts-bibliotek behöver uppfylla för att slutprodukten ska passera de givna systemkraven.

Detta examensarbete bygger på en transformationsprocedur som omvandlar kontrollflödesegenskaper på en beteendemässig form till en strukturell form. Kontrollflödesegenskaperna fokuserar uteslutande på kontrollflöden i den meningen att de abstraherar bort all form av programdata och berör enbart anrop- och retur-händelser. Med beteendemässiga egenskaper syftar vi på egenskaper som berör exekverings-beteende och med strukturella egenskaper syftar vi på egenskaper som berör ordningen på instruktionerna i källkoden eller objektkoden.

Resultatet i detta examensarbete tar denna transformationsprocedur ett steg längre och antar att vissa metoder (eller funktioner eller procedurer beroende på programmeringsspråk) är redan givna i formen av modeller som kallas flödesgrafer, medan resten av metoderna fortfarande är ospecificerade. Utdata blir då en mängd av strukturella restriktioner för de ospecificerade metoderna, som de måste följa för att en fulländning av den partiella flödesgrafen ska satisfiera den beteendemässiga formeln.


Contents

1 Introduction
  1.1 Background
  1.2 Problem formulation
  1.3 Approach
  1.4 Delimitation
  1.5 Contribution
  1.6 Report structure

2 General Context and Background of the Area
  2.1 The area of static analysis
  2.2 The area of formal methods
    2.2.1 Deductive verification
    2.2.2 Temporal model checking
  2.3 The area of control flow analysis

3 Technical Background
  3.1 Informal description of control flow properties
  3.2 Property specification language
    3.2.1 Box modalities
    3.2.2 Fixed-point formulas
  3.3 Modeling control flow
    3.3.1 Flow graphs
    3.3.2 Flow graph interfaces
  3.4 Instantiations of the specification language
    3.4.1 Structural formulas
    3.4.2 Behavioural formulas
  3.5 Global specifications
  3.6 A framework for compositional verification of procedural programs
    3.6.1 Local specification
    3.6.2 Description of the verification procedure
  3.7 Problem definition: Generating local specifications
    3.7.1 Formal problem specification

4 Transforming Behavioural Properties to Structural Properties
  4.1 Introduction
  4.2 The transformation algorithm tableau system
  4.3 Post-processing the leaves
    4.3.1 Example
    4.3.2 Example variations
  4.4 Repeat conditions
    4.4.1 Pseudo-repeat
    4.4.2 Internal repeat
    4.4.3 Call repeat
    4.4.4 Return repeat
  4.5 The generating of structural constraints

5 The Local Specification Generator Algorithm
  5.1 Extending the property transformation algorithm
    5.1.1 The need for two modes
    5.1.2 Introduction to model checking mode
    5.1.3 Keeping track of history and the control point stack
    5.1.4 Formalising the model checking mode
    5.1.5 Switching between the modes
  5.2 Additional technical details
    5.2.1 Handling contradictions
    5.2.2 Modification of repeat conditions
  5.3 Formal definition of the local specification generator algorithm

6 Analysis and Discussion
  6.1 Time complexity analysis
  6.2 Soundness and completeness
  6.3 Possible end-user applications
  6.4 Sustainability aspects

7 Future work
  7.1 Proof of correctness
  7.2 Implementation
    7.2.1 Pruning of redundant tableau branches
  7.3 Extension of result to include data in model

Bibliography

A Additional example


Chapter 1

Introduction

1.1 Background

When working with safety-critical systems such as electronic wallets, traffic signals or health care equipment, developers may want to ensure that certain properties of these systems hold. This can be done using formal methods, which offer a fairly broad range of techniques for this purpose.

The two major branches used in practice today are Hoare-logic and temporal logic. In the case of Hoare-logic, one establishes certainty about code by setting up pre-conditions and post-conditions around the code segment to be verified. In the case of temporal logic, one can establish properties in terms of time, such as "property P will always hold", "eventually event E will happen" or "in a sequence of events A and B, event E must never happen". Focusing on temporal logic, we have an additional spectrum of properties that can be examined.

One of the classes of properties that can be examined by temporal logic is control flow properties (other examples of classes include memory consumption, type safety and static analysis). When examining control flow, we can make sure that method calls occur only in safe orders and contexts. Examples of this include "method A must always make a call to method B before returning", "method A must never be called recursively", "method A can only be called from method B or C" and so on.

In the paper "Compositional verification of sequential programs with procedures" [1] by Gurov and Huisman, the authors present a framework where a control flow safety property regarding a body of code, called a global specification, can be verified even though not all of the code is available at the time of verification. This is achieved by letting the end user specify a "promise", called a local specification, regarding the non-available segments of the code, which discloses what will and will not happen in those code segments once they are available. See figure 1.1 for a graphical illustration of this scenario.

1.2 Problem formulation

Using the same setting and models as in the paper "Compositional verification of sequential programs with procedures" [1], this thesis examines whether it is possible to generate a local specification from the global specification and the available code. This would eliminate the need for an end user, presumably a developer, to have to create one by


Figure 1.1: Conceptually, this is what is happening: we want to verify a global specification ψ for the whole code, but there is a segment that is not available (illustrated by the blank box). However, we have a local specification φ that is promised to hold for whatever code will fill the blank box, and thus we can still verify ψ with respect to the available code and the assumption that φ will hold. It should be noted that the actual technique deals with something called flow graphs and not lines of code. The concept of flow graphs is described in section 3.3.1.

Figure 1.2: A conceptual illustration of the kind of property transformation that is made by the algorithm presented in [2]. In this example, the behavioural property "a call from method a to method b must not result in an immediate return from method b" is transformed into the structural constraint "either method a must not call method b at all, or method b must not start with (and therefore consist solely of) a return".


manual analysis. The algorithm that is capable of achieving this is termed the local specification generator algorithm.

1.3 Approach

For technical reasons, the approach to constructing the local specification generator algorithm builds upon an existing algorithm presented in the paper "Reducing behavioural to structural properties of programs with procedures" [2] by Gurov and Huisman. We refer to the algorithm in this paper as the transformation algorithm. The transformation algorithm is capable of reducing control flow properties regarding runtime behaviour (the way global specifications are written) into constraints of a structural form (the way local specifications are written). See figure 1.2 for a conceptual illustration of a property transformation.

1.4 Delimitation

This thesis focuses only on constructing the local specification generator algorithm. The scope includes neither implementing it nor proving its correctness. However, there exists an implementation of the transformation algorithm which is believed to be extendable to a local specification generator algorithm, and the possibility of proving correctness is discussed in section 7.1.

In both papers [1] and [2], all program data is abstracted away, meaning that the control flow properties focus on allowed sequences of calls and returns only. This delimitation is also present here.

1.5 Contribution

A local specification generator algorithm is presented in the form of a tableau system, preceded by an explanation of how it is extended from the transformation algorithm. The local specification generator algorithm is also analysed in terms of theoretical efficiency.

1.6 Report structure

The layout of the thesis is as follows:

Chapter 2 gives a high-level summary of the area of the thesis.

Chapter 3 gives all the necessary background needed to understand the problem definition on a technical level.

Chapter 4 gives a fairly in-depth explanation of the property transformation algorithm presented in [2]. The transformation algorithm is used as a basis for the local specification generator algorithm that is the main result of the thesis.

Chapter 5 describes, step by step, how the extension of the transformation algorithm is made, leading up to a formal definition of the main result of the thesis: an algorithm that generates local specifications from a global specification in combination with whatever available code there is.

Chapter 6 outlines an analysis of the time complexity of the algorithm as well as some examples of cases where the algorithm will have poor performance.


Chapter 7 discusses how the algorithm could be extended to handle more complex and sophisticated models, as well as how the algorithm could be proven to be sound and complete.


Chapter 2

General Context and Background of the Area

In this chapter, we present a high-level summary of the area of the thesis. Although the thesis almost exclusively focuses on two specific papers1 and our extension of them, a basic understanding of the area in general is beneficial for understanding the context of the work. We start by examining the most general area and gradually progress to the specific area in which the work of this thesis finds itself.

2.1 The area of static analysis

The most general area of the thesis subject within computer science is static analysis.

Static analysis, in the context of software development, is the automated analysis of source code or object code2 without actually executing it. It has been around since the dawn of computer science: one of the first static analyses3 was performed in 1949 by Alan Turing, when he proved the correctness of a program routine that computed factorials by repeated additions. [3]

Static analysis has grown from an interesting but often time-costly and technically involved research side note to something that is integrated into almost all modern software IDEs and pipelines. Today it is used not only for syntax checking but also in a diverse set of areas such as security vulnerability detection, architecture visualization, clone detection and race condition detection. [4]

The technology standards group Object Management Group recommends using static analysis on several levels in order to deliver high-quality and secure software. These levels include analysis at the software component level, program level, system level and business level. [5]

The methods encompassed by the area are also widely appreciated for their ability to provide analysis where it is deemed too risky to let the analysis be done by hand. For instance, the U.S. Food and Drug Administration recommends using static analysis when developing medical equipment software [6], the U.K. Office for Nuclear Regulation recommends using it when developing software used in nuclear power plants

1 Which are [1] and [2].

2 Also known as "intermediate code" or "bytecode".

3 As applied to software. The term "static analysis" is older and can refer to the analysis of any system governed by rules.


[7] and the multinational Certification Authorities Software Team recommends using it when developing air traffic software[8].

2.2 The area of formal methods

Formal methods consist of a collection of techniques that use mathematics to rigorously specify, design and verify computer systems. [9]

Even before the rise of computer systems, there was extensive use of mathematical models and of testing properties in those models, often models representing some phenomenon in natural science.

The area of formal methods was first explored in the same instance as the 1949 Turing paper [3] and was largely founded during the 1960s. [10] The area gained a lot of popularity in 1994 when the infamous "Pentium FDIV bug" was discovered.4 Intel spent $475M to make sure that such a bug would never appear again in subsequent products, and formal methods became one of the most prominent bug-prevention approaches Intel started using. [11]

Since then the area has shown a steady increase in popularity, mainly in hardware design. Today, most (presumably all) leading hardware companies use formal methods. [12] Formal methods are used for software as well, although this is usually a bit more difficult to achieve because of the subtle interactions of components in software, whereas in hardware, components often have a natural cohesiveness.5

There are two main approaches to formal verification of software: deductive verification and temporal model checking.

2.2.1 Deductive verification

Deductive verification involves making a mathematical statement for each segment of the code that rigorously describes what that segment does, and then having an automated theorem prover prove the correctness of the input statement based on the descriptions written in the code. [13] There are three architectures used in practice: [14]

• Using a highly expressive logical framework based on higher-order logics with inductive definitions. The downside of this approach is that it usually requires a fair amount of user interaction. However, the expressiveness of the language provides a great deal of freedom for the end user. Modern proof assistants such as HOL6 or Isabelle7 can even handle non-trivial portions of target languages such as Java and C.

• Having a target language that is designed to be embedded into the specification language8. Examples of this include ACL2,9 VeriFun10, KeY11 and KIV12.

4 A floating-point computation bug where the processor could give incorrect decimal results.

5 Hardware components have very strict roles and interfaces, making their interactions much more predictable in general.

6 https://hol-theorem-prover.org/

7 http://isabelle.in.tum.de/

8 The specification language is usually first-order logic.

9 http://www.cs.utexas.edu/users/moore/acl2/

10 https://verifun.jimdo.com/

11 https://www.key-project.org/

12 https://www.informatik.uni-augsburg.de/lehrstuehle/swt/se/kiv/


• Using a Hoare-logic based language. In Hoare-logic we declare contracts in the form of a pre-condition and a post-condition for selected procedures of the source code. One of the more popular Hoare-logic based approaches is the verification condition generator architecture, where a specification is given in the form of a post-condition for some procedure. The specification is then pushed upwards through each segment of code, transforming into the weakest precondition for each segment. This results in generated pre-conditions for each procedure that can reach the procedure that was given the initial post-condition. Examples of realisations of the verification condition generator architecture include Dafny13 and Why14.
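The backward propagation of a post-condition can be sketched for a toy language of straight-line assignments; the representation and helper names below are our own illustration and not part of any of the tools mentioned:

```python
# Toy weakest-precondition calculator for straight-line code.
# A statement is ("assign", var, expr); a condition is a string over
# program variables. Illustrative sketch only, not a real VCG: the
# naive string substitution would misfire on overlapping variable names.

def substitute(cond, var, expr):
    """wp(x := e, Q) = Q[e/x]: replace var by (expr) in the condition."""
    return cond.replace(var, f"({expr})")

def wp(statements, postcondition):
    """Push the postcondition backwards through the statements."""
    cond = postcondition
    for stmt in reversed(statements):
        kind, var, expr = stmt
        assert kind == "assign"
        cond = substitute(cond, var, expr)
    return cond

# wp(x := x + 1; y := x * 2,  y > 10)  =  ((x + 1) * 2) > 10
program = [("assign", "x", "x + 1"), ("assign", "y", "x * 2")]
print(wp(program, "y > 10"))  # ((x + 1) * 2) > 10
```

The generated formula is what would be handed to a theorem prover: proving it valid shows the program establishes the post-condition.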

2.2.2 Temporal model checking

Temporal model checking is the other main approach to formal verification of software and is the one we use in this thesis. It involves creating a model of the program and checking the validity of a propositional statement qualified in terms of time15, such as "property P will always hold" or "eventually event E will happen".

The initial development of the field is largely attributed to Amir Pnueli. [15] The field gained further attention in 1981 when Clarke and Emerson released a pioneering paper describing the use of an implemented model checker to see if a given source code meets a temporal logic specification. [16]

More technically speaking, in model checking we have some finite transition system, typically a finite automaton-like structure, that serves as a model of the program, and we want to verify a given property expressed in some temporal logic. The verification is achieved by exhaustively exploring all states of the model that are related to the property, to make sure that it holds in all places where it must hold. [17]

Initially, the model checking approach seemed too slow, as the approaches at the time involved exhaustively exploring all states of the model one at a time. As the states were often infeasibly large in number, using this verification method was simply impractical.

However, subsequent symbolic techniques such as binary decision diagrams, Büchi automata and symmetry reduction, among others, were used to explore many states in one atomic step. This greatly increased the usability and popularity of the field. [14]

Models

There are three popular approaches to modelling a program: Kripke structures, labelled transition systems and Kripke transition systems. All of them are directed-graph-based models, and in all three cases a node represents a distinct possible execution state of the program and an edge represents a possible transition between two states. [17]

A Kripke structure is a graph where each node has a set of atoms that describe the properties of the state, such as "radio button 4 checked", "user has sufficient funds" or "x = 4".

A labelled transition system is a graph where the edges have labels that describe what action happens when execution passes that edge. Examples would be "coin inserted", "x < y", "z = 5", "receipt printed" etc.

13 https://www.microsoft.com/en-us/research/project/dafny-a-language-and-program-verifier-for-functional-correctness/

14 http://why.lri.fr/

15 Hence the word "temporal".


Figure 2.1: A Kripke structure on the left, a labelled transition system in the middle and a Kripke transition system on the right.

A Kripke transition system is, as the name suggests, a combination of the previous structures. I.e., it has labels for both the nodes and the edges. A small example to distinguish the three kinds of models is shown in figure 2.1.

The kind of model we will use in this thesis is based on Kripke transition systems and will be presented in section 3.3.1.
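The three kinds of models can be sketched as plain data structures; the encodings and state names below are our own illustrative choices, not a standard API:

```python
# Minimal encodings of the three model kinds (illustrative only).

# Kripke structure: nodes carry sets of atomic propositions.
kripke = {
    "s0": {"atoms": {"user has sufficient funds"}, "next": ["s1"]},
    "s1": {"atoms": {"x = 4"}, "next": []},
}

# Labelled transition system: edges carry action labels instead.
lts = {
    "s0": [("coin inserted", "s1")],
    "s1": [("receipt printed", "s0")],
}

# Kripke transition system: both node atoms and edge labels.
kts = {
    "s0": {"atoms": {"idle"}, "edges": [("coin inserted", "s1")]},
    "s1": {"atoms": {"paid"}, "edges": [("receipt printed", "s0")]},
}
```

The flow graphs used later in the thesis are, as noted above, based on the third kind.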

Logics

There are two kinds of logics used for temporal model checking: linear-time logic and branching-time logic.[17]

Linear-time logics (LTL) are interpreted over paths. As with most logics, verification can be made at different levels. A path holds if the atoms of all states it passes are coherent with the formula. A state holds if all paths emerging from that state hold. And finally, a system holds if the initial state of the model holds. The main example of a linear-time logic is propositional linear-time logic (PLTL), though variations exist.

Branching-time logics are, as the name suggests, concerned with how the model branches. The main question here is "does the model branch in such a way that the property will still hold?". A property expressed in a branching-time logic could be "after any occurrence of action A, it is still possible to perform action B". Such a property could not be expressed in a linear-time logic since, intuitively speaking, a linear-time logic simply does not have that kind of "if-then" reasoning embedded into it. Examples of branching-time logics are Hennessy-Milner logic (HML), Computation Tree Logic (CTL) and the modal µ-calculus.

The main difference between the abilities of linear-time logics and branching-time logics can be characterised by the following example: consider the two systems illustrated in figure 2.2. The system on the right works in a slightly unexpected way and decides the options that the user will have when the user enters the pin code. The left system leaves both options available to the user each time. Both systems, however, produce exactly the same set of paths, namely:

{⟨enter name, validate pin code, withdraw funds⟩, ⟨enter name, validate pin code, check balance⟩}

Linear-time logic could not differentiate between the two systems since it is path-oriented, whereas branching-time logic could make the distinction by checking the validity of the branching property "after any 'validate pin code' event has happened, both options 'withdraw funds' and 'check balance' are available".

The logic we use in this thesis is a subset of modal µ-calculus and is presented in section 3.2.
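The distinction can be made concrete by encoding the two ATM systems of figure 2.2 as labelled transition systems and comparing their trace sets; the state names and the encoding are our own illustration:

```python
# The two ATM systems from figure 2.2 as labelled transition systems:
# state -> list of (action label, successor state).

atm_left = {
    "s0": [("enter name", "s1")],
    "s1": [("validate pin code", "s2")],
    "s2": [("withdraw funds", "s3"), ("check balance", "s3")],
    "s3": [],
}

# The right-hand system decides the single available option up front.
atm_right = {
    "t0": [("enter name", "t1"), ("enter name", "t2")],
    "t1": [("validate pin code", "t3")],
    "t2": [("validate pin code", "t4")],
    "t3": [("withdraw funds", "t5")],
    "t4": [("check balance", "t5")],
    "t5": [],
}

def traces(system, state):
    """All complete action sequences starting from `state`."""
    if not system[state]:
        return {()}
    return {(label,) + rest
            for label, succ in system[state]
            for rest in traces(system, succ)}

# Identical trace sets: a linear-time logic cannot tell them apart.
assert traces(atm_left, "s0") == traces(atm_right, "t0")

# But after 'validate pin code', the left system offers both options
# while the right system offers only one -- a branching-time property
# distinguishes them.
print({label for label, _ in atm_left["s2"]})
print({label for label, _ in atm_right["t3"]})
```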


Figure 2.2: Two ATM programs represented as labelled transition systems.

Local vs. global model checking

The model-checking problem can be specified in two distinct ways:

• "Given a model M, a formula φ and a state s ∈ M, determine if s satisfies φ."

• "Given a model M and a formula φ, compute the set of states that satisfy φ."

The former is called the local model checking problem while the latter is called the global model checking problem.[17]

Local model checking is usually preferred when exploring all states would be computationally infeasible and the main interest lies in the validity of a few initial states.

Global model checking is preferred when it is crucial to know the validity of every possible use of the system such as in data flow analysis.

The way we verify our models in this thesis is by local model checking.
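As a sketch of the global model checking problem, the set of states satisfying the reachability property "an atom p can eventually hold" (EF p in CTL) can be computed by a simple least-fixed-point iteration; the encoding below is our own illustration:

```python
# Global model checking sketch: compute the set of states satisfying
# "a state labelled p is reachable" (EF p) by least-fixed-point iteration.

def states_satisfying_ef(transitions, labels, atom):
    """transitions: state -> list of successors; labels: state -> set of
    atoms. Returns every state from which `atom` is reachable."""
    satisfying = {s for s, atoms in labels.items() if atom in atoms}
    changed = True
    while changed:
        changed = False
        for s, succs in transitions.items():
            if s not in satisfying and any(t in satisfying for t in succs):
                satisfying.add(s)
                changed = True
    return satisfying

transitions = {"a": ["b"], "b": ["c"], "c": [], "d": []}
labels = {"a": set(), "b": set(), "c": {"p"}, "d": set()}
print(sorted(states_satisfying_ef(transitions, labels, "p")))  # ['a', 'b', 'c']
```

A local model checker would instead answer the question for one given state only, without computing the full set.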

Model checking approaches

There are several ways to carry out the actual model checking. Three of the most prominent are the semantic approach, the automata-theoretic approach and the tableau approach. [17]

The semantic approach, as the name suggests, involves iteratively working the semantics of the formula against the model. The approach is quite technically involved and is known to have an exponential worst-case running time16. However, it stands out as the only one of the three approaches that can do global model checking on branching-time logics in one single execution.

The automata-theoretic approach involves constructing two automata: a formula automaton A¬φ that accepts exactly the paths violating the formula, and a model automaton AM that accepts the paths of the model. A product automaton A¬φ × AM is then constructed, and if the product automaton accepts no path, the model satisfies the formula.

The drawback of this approach is that it is mainly applicable only to linear-time logics.

Finally, the tableau approach involves constructing a tableau, which is a tree-like structure where each node is a rule from the logic that makes a syntactic change to

16 Although this might seem like a bad approach, it should also be noted that model checking of some branching-time logics such as the µ-calculus is expected to be hard, since it is proven to be in the intersection of NP and co-NP.


                             Branching-time   Linear-time   Global   Local
  Semantic methods                 X                           X
  Automata-theoretic methods                        X          X        X
  Tableau methods                  X                X                   X

Figure 2.3: The approximate abilities of the different model checking approaches.

the formula. When finished, the tableau acts as a proof tree that witnesses the validity of the formula. The drawback of the tableau approach is that each tableau can only prove validity for finite-state systems, and usually only for one state per tableau. Thus the approach is mostly preferred when solving the local model checking problem.

A summary of the properties of the approaches can be found in figure 2.3.

The approach we use in the thesis is the tableau approach17.

2.3 The area of control flow analysis

Control flow analysis (CFA) is the static analysis of the possible orders of execution of sequences of instructions; it involves making a control flow graph out of the source code or object code and analysing that control flow graph. The applications of control flow analysis include optimisation, security, automatic parallelisation and program verification. [18]

The area originated in the early 1970s with the pioneering work of Frances E. Allen18 [20] and initially revolved mostly around performance optimisations for low-level languages, built into compilers [21]. Later on, more application areas such as the ones mentioned above were discovered.

An interesting fact regarding control flow analysis is the dramatic difference between performing the analysis on object-oriented languages and on functional languages. There exist several polynomial-time algorithms for object-oriented languages, whereas control flow analysis for functional languages is proven to be EXPTIME-complete19. [22]

To give an example of an optimisation made possible by control flow analysis, consider the source code and its control flow graph presented in figure 2.4. As we can see, when looking at all paths from the start to the dashed block, once the variable s gets its value, it is always read in the expression s+4 without subsequent assignments to s. Thus, the value of s+4 can be cached to improve performance.

In the example we just gave, we performed the analysis manually, which in practice one of course would not do. One of the main approaches used by CFA algorithms is to over-approximate how the calls can be made in the program. This is achieved by a technique called abstract interpretation. An abstract interpreter works similarly to an ordinary interpreter, with three major differences: [18]

• It will use abstract values. So for example, instead of 75 or −2, it might use positive or negative respectively. Pointers might all point to the same address.

17 Although the tableau approach is not usually preferred in conjunction with global model checking, the average use case for our purposes will not involve too many states to verify in order to get global coverage. Thus one tableau can be constructed for each state without making the approach impractical.

18 For this work, she later received the Turing award, thereby becoming the first female Turing award recipient. [19]

19 This high problem complexity is related to the fact that in higher-level languages the target of a function call can be unspecified and left to be resolved at runtime.


Figure 2.4: A piece of Java code on the left and its control flow graph on the right.

• It will use non-determinism. I.e., if there is an if-else or a switch statement, the execution will explore all possible execution paths.

• It will usually operate over a finite state space to ensure termination.

It is not too hard to see that the above techniques in combination can result in false positives. For instance, if there is a statement that says if (x > 350) and x is simply set to positive, we will explore that if-block. However, the code might have been written in such a way that x would never be larger than 350 when execution reaches that particular line of code, resulting in a false positive.
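The false-positive scenario can be sketched with a minimal sign abstraction; the function and its outcome sets below are our own illustration, not part of any CFA tool:

```python
# Sign-abstraction sketch: when x is known only as "positive", the
# analysis must assume the branch (x > 350) can be taken, even if
# concretely x never exceeds 350 -- a false positive.

def abstract_compare_gt(sign, constant):
    """Which outcomes of (value > constant) are possible when the value
    is known only by its abstract sign?"""
    if sign == "positive":
        return {True, False} if constant > 0 else {True}
    if sign == "negative":
        return {False} if constant >= 0 else {True, False}
    if sign == "zero":
        return {constant < 0}
    raise ValueError(f"unknown abstract value: {sign}")

# x abstracted to "positive": both branches must be explored.
print(abstract_compare_gt("positive", 350))
```

Because both True and False are possible outcomes, the abstract interpreter non-deterministically explores the if-block, whether or not any concrete run could reach it.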

One of the most popular CFA implementations is k-CFA. Here k is an integer parameter set when running the analysis: the higher the value of k, the more precise the analysis becomes and, conversely, the longer the analysis takes to complete. [18]

The way we use control flow analysis in this thesis is in the context of program verification, specifically the verification of safety properties. We have specifications that put restrictions on which sequences of calls and returns between methods are allowed. As mentioned in section 1.4, all program data is abstracted away. The specifications we use do not mention program data, and the control flow graphs we have as models are stripped of their program data. More on this in section 3.3.1.


Chapter 3

Technical Background

In this chapter, we describe everything needed in order to understand the problem on a formal level.

First, an informal description of control flow properties is given, to convey the nature of the properties we will examine. After that, we describe the specification language used to formulate control flow properties in its uninstantiated form. This is followed by a description of our program models and an explanation of how our specification language is instantiated to produce control flow properties. Finally, the research context and a specification of the problem definition conclude the chapter.

3.1 Informal description of control flow properties

There are two distinct kinds of control flow properties one can study: structural properties and behavioural properties.

Structural properties are properties regarding textual sequences of instructions and are always stated per method. I.e., a structural property states which lines of code can come after which in the implementation of some particular method.

Recall that, as part of our modelling of control flow, we abstract away all data and focus purely on the actual control flow. A structural property therefore always concerns lines of code in the implementation that consist of a call or a return instruction.

A few informal examples of structural properties would be:

• In the implementation of method a, there must be no call instruction to method b.

• The implementation of method b must not start with a return instruction.

• A call instruction to method c in the implementation of method a must not be followed by an immediate call instruction to method d.

• In the implementation of method b, a call instruction must precede any return instruction.
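As an illustration, the first and third properties above can be checked directly on a small method graph given as labelled edges. The representation below is our own, invented only to sketch the idea:

```python
# A method graph for a hypothetical method a, as labelled edges plus
# return-node markers.  Labels are callee names for call instructions,
# or "eps" for silent actions.
method_a = {
    "entry": "s1",
    "edges": [("s1", "c", "s2"), ("s2", "d", "s5"),
              ("s2", "eps", "s3"), ("s3", "d", "s4")],
    "returns": {"s4"},
}

def no_call_to(graph, callee):
    # "there must be no call instruction to method `callee`"
    return all(lbl != callee for (_, lbl, _) in graph["edges"])

def no_immediate_call_after(graph, first, second):
    # "a call instruction to `first` must not be followed by an
    #  immediate call instruction to `second`"
    targets_of_first = {dst for (_, lbl, dst) in graph["edges"] if lbl == first}
    return all(src not in targets_of_first
               for (src, lbl, _) in graph["edges"] if lbl == second)

ok_no_b = no_call_to(method_a, "b")                    # a never calls b
ok_cd = no_immediate_call_after(method_a, "c", "d")    # violated: s1-c->s2-d->s5
```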

Behavioural properties are properties regarding what is allowed to happen during execution of the source code. I.e., behavioural properties state what kind of program induced behaviour is acceptable, regardless of how it is implemented.

The implication of data abstraction in this case is that a behavioural property always concerns which calls and returns can happen during execution.

A few informal examples of behavioural properties would be:


• Method a may never call method b.

• Method a may never call itself.

• A call from method a to method b must result in an immediate return.

• If there is a call from method a to method b, there may not be a call to method a directly afterwards.

• In an arbitrary sequence of alternating calls to a and b, a return must not happen immediately after a call to b.

• After the first call from method a to method b, there must be another call before b returns.

In summary, structural properties concern the syntax of a program, while behavioural properties concern its semantics.

Something that is appropriate to mention here is that we can only express properties regarding what the implementation or control flow behaviour cannot do, i.e., safety properties of control flow. We would not, for instance, be able to express a structural property that says "method a must call method b before returning"1, except if it is stated by way of exclusion. I.e., if we only have methods a, b and c, then we could make a structural property equivalent to the previous statement by formulating it as "method a may not return as long as it is only doing silent actions, calls to method c or calls to itself".

3.2 Property specification language

The specification language we use to create structural and behavioural control flow formulas is the safety fragment of the modal µ-calculus[23]. In [1], the authors call this fragment simulation logic for technical reasons, and we too use this term due to the strong connection between this work and [1] (which is made apparent in section 3.6).

Simulation logic is powerful enough to express safety properties of sequential programs with procedures[23], which are the kinds of systems we focus on.

In this section, we give an overview of simulation logic in its pure form, unrelated to control flow properties. Later, in section 3.4 we describe how simulation logic is instantiated to express structural and behavioural properties respectively. For a formal exposition of simulation logic we refer to [24].

The models used in simulation logic are Kripke transition systems2. The definition of simulation logic syntax is given in Backus-Naur form as follows:

φ ::= ff | tt | p | ¬p | X | φ1 ∧ φ2 | φ1 ∨ φ2 | [a]φ | νX.φ   (3.1)

where p and ¬p are propositions of states in the Kripke transition system, X is a propositional variable3, ff and tt are shorthand constants for constructing a false and a true formula respectively, a is a label of a transition in the flow graph, ∧ is the conjunction operator and ∨ is the disjunction operator. We also use p → ψ as a shorthand for ¬p ∨ ψ.

1Such a property is also known as a liveness property.

2See section 2.2.2.

3The usage of propositional variables is made apparent in section 3.2.2.
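The grammar (3.1) can be mirrored with one constructor per production. The Python representation below is our own, chosen only for illustration:

```python
from dataclasses import dataclass

# One constructor per production of grammar (3.1).  FF/TT are the constants,
# Prop/NegProp the (negated) state propositions, Var a propositional variable,
# And/Or the connectives, Box the modality [a]phi, Nu the fixed point nuX.phi.
@dataclass(frozen=True)
class FF: pass
@dataclass(frozen=True)
class TT: pass
@dataclass(frozen=True)
class Prop: name: str
@dataclass(frozen=True)
class NegProp: name: str
@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class And: left: object; right: object
@dataclass(frozen=True)
class Or: left: object; right: object
@dataclass(frozen=True)
class Box: label: str; body: object
@dataclass(frozen=True)
class Nu: var: str; body: object

# nuX.(p /\ [a]X), i.e. formula (3.2) later in this chapter:
phi = Nu("X", And(Prop("p"), Box("a", Var("X"))))
```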


Figure 3.1: The sequent of the figure holds since, starting from s1 and following the only a-edge we have, we arrive in a state where both u and v hold true.

Figure 3.2: The sequent does not hold since there is an a-edge to s2 where v does not hold.

Finally, [a]φ and νX.φ are called box modalities and fixed-point formulas, respectively, and require more in-depth explanation, which we give in the following two subsections.

3.2.1 Box modalities

[a]φ is the box modality from the modal µ-calculus and can be described as a logical proposition that is true for a state iff φ is true for all states that we can reach by following edges with label a in the flow graph at this point.

Since we are performing local model checking, the formula φ is evaluated with respect to a given state s, which we will start in. After that, φ can be verified by simulating all paths that originate from s and respect the boxes of the formula to see whether φ holds in each node it passes along these paths4.

We demonstrate the box modality operator on a few examples, illustrated in figures 3.1 - 3.5. The sequents s ⊨ φ and s ⊭ φ should be interpreted as saying that φ holds and does not hold, respectively, with respect to s. We stress that the models shown in these figures are general Kripke transition systems and are not meant to represent source code or anything else in particular. The models we use to study control flow are described later in section 3.3.1.
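A sketch of how such sequents can be checked on a finite Kripke transition system. The model and the formula encoding below are invented for illustration, and the checker covers only the fixed-point-free fragment, with [a]φ evaluated over all a-successors:

```python
# A Kripke transition system: labelled edges plus an atom valuation per state.
edges = {("s1", "a", "s2"), ("s1", "a", "s3"), ("s2", "b", "s4")}
atoms = {"s1": {"u"}, "s2": {"u", "v"}, "s3": {"u", "v"}, "s4": set()}

def succ(state, label):
    return [dst for (src, lbl, dst) in edges if src == state and lbl == label]

def holds(state, phi):
    """phi is ("prop", p) | ("not", p) | ("and", f, g) | ("box", a, f)."""
    kind = phi[0]
    if kind == "prop":
        return phi[1] in atoms[state]
    if kind == "not":
        return phi[1] not in atoms[state]
    if kind == "and":
        return holds(state, phi[1]) and holds(state, phi[2])
    if kind == "box":
        # [a]f: f must hold in *every* a-successor (vacuously true if none).
        return all(holds(t, phi[2]) for t in succ(state, phi[1]))
    raise ValueError(kind)

# s1 |= [a](u /\ v): both a-successors s2 and s3 satisfy u and v.
ex1 = holds("s1", ("box", "a", ("and", ("prop", "u"), ("prop", "v"))))
# s1 |= [a][b]v fails: s2 has a b-edge to s4, where v does not hold.
ex2 = holds("s1", ("box", "a", ("box", "b", ("prop", "v"))))
```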

3.2.2 Fixed-point formulas

νX.φ is a so called fixed-point formula. We demonstrate its usage by example. If we have a formula

4It should also be noted that the paths can be infinite in length.

Figure 3.3: This sequent holds since u and v hold in all states where we can end up by following a-edges.


Figure 3.4: This sequent holds since u, v and w hold in s4, which is the only node reachable under the formula. The path to s4 is s1 → s2 → s4.

Figure 3.5: This sequent does not hold since, although x holds when following the sole a-edge to s2, there is at least one b-edge that can be followed from then on into a state where v does not hold, namely from s2 to s4.

νX.(p ∧ [a]X)   (3.2)

then this formula is equivalent to the conjunction of

p ∧ [a]tt   (3.3)

and

p ∧ [a](p ∧ [a]tt)   (3.4)

and

p ∧ [a](p ∧ [a](p ∧ [a]tt))   (3.5)

and so on. I.e., the end result (after simplification) would be a finite formula with prefix

p ∧ (p ∧ [a]p) ∧ (p ∧ [a](p ∧ [a]p)) ∧ . . .   (3.6)

which is also logically equivalent to

p ∧ [a](p ∧ [a](p ∧ [a](. . .)))   (3.7)

where . . . abbreviates some finite number of terms. The intuitive meaning of this formula is "p holds along all a-paths".

Equations (3.3), (3.4) and (3.5) are termed the first, second and third approximants of (3.2), and any fixed-point formula has an infinite number of them. Intuitively, the effect of the fixed point operator can be interpreted as an arbitrary number of iterations of syntactic substitution of the variable with its formula.

Figure 3.6: This sequent does not hold since it is possible to follow a-edges along the path to s5 (s1 → s2 → s3 → s5) and the whole formula has to hold at all nodes along this path. It does not do so since proposition v is true at s5 and the sequent requires v to be false.

Figure 3.7: This sequent holds since u holds in all nodes reachable by a-edges (s1, s2 and s4). Note that it does not matter that there is an a-edge between s3 and s5, since s3 is not reachable by a-edges. A fixed point formula only has to be valid for the paths its box modalities (possibly in combination with fixed points) can generate.

In the context of simulation logic, fixed-point formulas are used to specify properties that will hold recursively after following some box modality. For instance, consider the formula

φ = ¬u   (3.8)

Evaluating the formula above with respect to a state s says that in s, the proposition u must be false.

φ = νX.(¬u ∧ [a]X)   (3.9)

Equation (3.9) expresses the same property as (3.8), with the addition that proposition u must now be false after any consecutive sequence of a actions from s. So as long as we are performing a actions and nothing else, this property must still hold. However, if we at some point perform a b action, for instance, the promise of having proposition u false can be broken without breaking formula (3.9). To see that this is so after at least two a actions, we can expand (3.9) to its second approximant:

φ = ¬u ∧ [a](¬u ∧ [a]¬u) ≡ ¬u ∧ [a]¬u ∧ [a][a]¬u   (3.10)

where ≡ denotes semantic equivalence. It should be relatively straightforward to see that the semantic equivalence holds for any number i of a actions by expanding (3.9) to its (i + 1)-th approximant.
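On a finite model, the approximants suggest computing νX.φ as a greatest fixed point by iteration: start from the full state set and repeatedly discard states that violate the body of the formula. A Python sketch for formula (3.9), on a model invented for illustration:

```python
# Greatest fixed point nuX.(~u /\ [a]X), formula (3.9), by iteration.
edges = {("s1", "a", "s2"), ("s2", "a", "s3"), ("s2", "b", "s4"), ("s4", "a", "s5")}
atoms = {"s1": set(), "s2": set(), "s3": set(), "s4": set(), "s5": {"u"}}
states = set(atoms)

def gfp_not_u_box_a():
    X = set(states)                       # zeroth approximant: all states
    while True:
        new_X = {s for s in X
                 if "u" not in atoms[s]                     # ~u holds
                 and all(t in X for (src, lbl, t) in edges  # [a]X holds
                         if src == s and lbl == "a")}
        if new_X == X:
            return X                      # fixed point reached
        X = new_X

sat = gfp_not_u_box_a()
# s4 drops out because its a-edge leads to s5, where u holds; but s2 stays
# in, because its b-edge to s4 is not constrained by the [a] modality --
# exactly the "escape via a b action" discussed above.
```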

We give a few examples illustrated in figures 3.6 - 3.10. It is also possible to use more than one fixed point operator in a formula as illustrated in figures 3.9 and 3.10.


Figure 3.8: This sequent does not hold since now the formula has to hold for all nodes reachable by a- and b-edges, and u does not hold in s3 or s5.

Figure 3.9: For this sequent to hold, as long as we are following a-edges the whole formula of the sequent has to hold. If we then follow a b-edge, u must be false and must remain false if we follow any number of a-edges at this point. The sequent in this example does not hold, since we can follow a-edges to s2, where the sub-formula [b]νY.([a]Y ∧ ¬u) has to hold. We can at this point follow a b-edge to s3, where νY.([a]Y ∧ ¬u) has to hold. But from s3 we can follow a-edges to s7, where u is true. Thus the formula does not hold. Alternatively, the same argument can also be made regarding the path s1 → s2 → s4 → s5 → s7.


Figure 3.10: The formula of this sequent is slightly different from the formula in the previous example, and the model is slightly different as well. Although we can take the path s1 → s2 → s4 → s5 so that the sub-formula νY.[a](Y ∧ ¬u) has to hold, the requirement of u being false will only come into effect if we follow an a-edge, of which there are none. We can also get to the same sub-formula in s3 by s1 → s2 → s3 and follow the a-edge to s5, but the resulting sub-formula then becomes Y, which is unfolded to νY.[a](Y ∧ ¬u), and so we would still need an a-edge to get into s7 and thus a failing state. However, the edge connecting s5 and s7 is a b-edge. Thus the sequent holds.

If the properties expressed by fixed-point formulas are not too clear at this point, they will hopefully become more apparent when put in a concrete context, as we will do in the following sections.

3.3 Modeling control flow

In this section we describe how we model control flow. This is achieved with two components: the model itself, which is called a flow graph, and a flow graph interface that specifies which methods are available and which are not.

3.3.1 Flow graphs

A well-known formalism to represent and analyse the behaviour of a program is its control flow graph, as mentioned in section 2.3. A control flow graph is conceptually a graph representation of a program where each node represents a control point of the program and each edge is labelled with some condition which must hold for the program to transfer control between the control points which the edge connects.

The model we use was introduced in [1] and is a variation of a control flow graph. In a later paper [25] by the same authors, this model was termed flow graph, which is the term we use as well.

A fundamental difference between our model and a standard control flow graph is that in our model, the data of the program is abstracted away, and in a standard control flow graph it is usually kept as information stored in the nodes, as illustrated in figure 2.4.


To quickly give an idea of what a flow graph (our model) looks like, we start by giving an example. Consider the following source code consisting of two methods that prints a ticket if the user has sufficient funds:

aquire_funds() {
    if (funds > 100) {
        print_ticket()
    } else {
        print("Insufficient funds.")
    }
}

print_ticket() {
    print("Your ticket: " + secret_code)
}

Any valid piece of code can be turned into a flow graph. The graphical representation of the flow graph generated from the ticket machine source code above is shown in figure 3.11 and captures the structure (as opposed to the behaviour) of the source code.

Figure 3.11: The flow graph for the ticket machine example

A flow graph should be interpreted as follows: as in CFGs, nodes indicate states and edges indicate transitions between control points. Any non-ε label on an edge indicates a call to a method5 with the name of that label. An ε label indicates a so called silent action, which means that some action is happening that causes the execution to transition between two control points, but whatever that action is, it is not a method call. An r label next to a node indicates that this node is a return node, i.e., if execution reaches this node, the call stack will pop to the caller method.

Formally, the underlying structure of the model is a tuple M = (S, L, →, A, P, λA, λP) where S is a set of control points, L is a set of labels, → is a set of labelled transitions between states (using labels from the set L), A is a set of atomic propositions, P is a set of state assertions, λA maps each state to a set of atomic propositions and λP maps each state to a state assertion. This structure, together with a set of entry points E ⊆ S, makes up a method graph. A flow graph is then a set of method graphs, one for each method we want to observe.

5Or function or procedure, depending on whatever programming language is used.


The flow graph whose graphical representation was shown in figure 3.11 is then formally defined as follows:

S = {s1, s2, s3, s4, s5, s6}
E = {s1, s5}
L = {ε, aquire_funds, print_ticket}
→ = {(s1, ε, s2), (s2, print_ticket, s3), (s2, ε, s4), (s5, ε, s6)}
A = {aquire_funds, print_ticket, r}
P = {tt}
λA = {(s1 → {aquire_funds}), (s2 → {aquire_funds}), (s3 → {aquire_funds}),
      (s4 → {aquire_funds, r}), (s5 → {print_ticket}), (s6 → {print_ticket, r})}
λP = {(s → tt) | s ∈ S}

Observe that each node has an atom label with the name of the method the node belongs to. These method name atoms are used for identification purposes and are usually omitted in the graphical representation.
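The tuple above transcribes directly into data. The Python representation below is our own, for illustration only:

```python
# The ticket-machine flow graph, transcribed from the tuple above.
S = {"s1", "s2", "s3", "s4", "s5", "s6"}
E = {"s1", "s5"}                                # entry points
transitions = {("s1", "eps", "s2"), ("s2", "print_ticket", "s3"),
               ("s2", "eps", "s4"), ("s5", "eps", "s6")}
lam_A = {"s1": {"aquire_funds"}, "s2": {"aquire_funds"},
         "s3": {"aquire_funds"}, "s4": {"aquire_funds", "r"},
         "s5": {"print_ticket"}, "s6": {"print_ticket", "r"}}

def method_of(state):
    """Each node carries exactly one method-name atom (besides r)."""
    (m,) = lam_A[state] - {"r"}
    return m

def is_return(state):
    return "r" in lam_A[state]

# Sanity checks: every entry point belongs to one of the two methods,
# and the return nodes are exactly s4 and s6.
entries_have_methods = all(method_of(s) in {"aquire_funds", "print_ticket"}
                           for s in E)
return_nodes = {s for s in S if is_return(s)}
```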

Flow graph behaviour

A flow graph induces executions, which constitute the behaviour of the flow graph. We describe this behaviour informally by example; for a formal definition, we refer to [2].

Using the model from figure 3.11 and entry node s1, an example of flow graph behaviour would be

(s1, ) --τ--> (s2, ) --aquire_funds call print_ticket--> (s5, s3) --τ--> (s6, s3) --print_ticket ret aquire_funds--> (s3, )

where the second component of each configuration is the call stack of return points. τ (tau) is the behavioural version of the structural silent action notation ε. The main reason for the different silent action notations is ease of distinguishing between structural and behavioural formulas.
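The behaviour above can be reproduced by a small interpreter that keeps the current node and an explicit stack of return points. The representation and the deterministic resolution of the branch at s2 are our own:

```python
# Reproduce the behaviour trace of the ticket-machine flow graph.  A
# configuration is (current node, stack of return points): a method-name
# edge pushes its target and jumps to that method's entry node; reaching
# a return node pops back to the caller.
entry = {"aquire_funds": "s1", "print_ticket": "s5"}
return_nodes = {"s4", "s6"}
edges = {"s1": [("eps", "s2")],
         "s2": [("print_ticket", "s3"), ("eps", "s4")],
         "s5": [("eps", "s6")]}

def run(node, stack, take_call=True):
    """Follow one maximal execution; take_call=True resolves the
    non-deterministic choice at s2 towards the print_ticket branch."""
    trace = [(node, tuple(stack))]
    while True:
        if node in return_nodes and stack:
            node = stack.pop()                  # ret: pop to the caller
        elif node in edges:
            label, target = edges[node][0] if take_call else edges[node][-1]
            if label == "eps":
                node = target                   # silent step (tau)
            else:
                stack.append(target)            # call: remember return point
                node = entry[label]             # jump to the callee's entry
        else:
            return trace                        # no move possible: stop
        trace.append((node, tuple(stack)))

trace = run("s1", [])
```

Running it yields exactly the configurations of the trace above, ending in (s3, ).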

3.3.2 Flow graph interfaces

In addition to the model, we also specify which methods are available and unavailable via a flow graph interface. A flow graph interface is a pair I = (I+, I−) where I+ and I− are sets with the names of the available and unavailable methods, respectively. For instance, if the source code we want to verify has an implementation for the methods enter_pin_code and print_receipt, but the implementation for transfer_funds will be provided at some later point, the flow graph interface would be I+ = {enter_pin_code, print_receipt}, I− = {transfer_funds}.

The authors of [2] used flow graph interfaces in a slightly different way, and we overload the notation of I+ and I− somewhat for convenience. In [2], the methods of I+ and I− are termed provided and required, respectively.
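The example interface transcribes into a pair of disjoint name sets; a minimal sketch:

```python
# The interface from the example above: two disjoint sets of method names.
I_plus = {"enter_pin_code", "print_receipt"}    # implementations available
I_minus = {"transfer_funds"}                    # to be provided later

# A well-formed interface classifies no method as both available and
# unavailable at once.
well_formed = I_plus.isdisjoint(I_minus)
interface = (I_plus, I_minus)
```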

3.4 Instantiations of the specification language

With our current understanding of simulation logic formulas described in section 3.2 and how we model control flow described in section 3.3, it is not too difficult to show how simulation logic can state structural and behavioural properties regarding the structure and behaviour of a flow graph, respectively.


3.4.1 Structural formulas

For the structural case, the labels of the box modalities are either a method name or the silent action ε. In practice, the only atoms used in formulas are r, indicating that the node is a return node, and ff, indicating that reaching this state implies that the formula unconditionally does not hold for that flow graph.

Take, for instance, the structural formula

a → [b]ff   (3.11)

(3.11) states that the method graph for a must not begin with a call to method b.

I.e., there may be no b-edges from the start node in the method graph for a.6

a → [b]r   (3.12)

(3.12) states that all b-edges from the start node of the method graph for a must connect to a return node.

As we can see, using box modalities only is not too powerful: it can only state properties regarding some bounded-depth sub-graph of the method graph originating from the start node. Therefore, it is usually preferred to use fixed points in practice. For properties of some method a that should not be sensitive to silent actions, it is especially useful to have structural formulas of the form a → νX.(φ ∧ [ε]X), since φ then has to hold regardless of any silent actions.

Let us now revisit the verbal examples of structural formulas from section 3.1 and build actual simulation logic formulas for each one. We are not given flow graph interfaces, so we have to make assumptions about what they are. Depending on how we make these assumptions, we get slightly different results, as illustrated in the first example.

• "In the implementation of method a, there must be no call instruction to method b.": We assume flow graph interface I+ = {a, b}, I− = ∅.7 The term [b]ff says that a call to method b at this point in the flow graph would imply that the formula does not hold for the flow graph. However, we also want our statement to hold regardless of any silent actions. Thus we embed [b]ff inside a νX.(. . . ∧ [ε]X) fixed point "wrapper", resulting in

a → νX.([b]ff ∧ [ε]X)   (3.13)

This formula almost captures the verbal statement. However, we also want the formula to hold if method a makes a self-call. The formula then becomes

a → νX.([b]ff ∧ [a]X ∧ [ε]X)   (3.14)

If we had more method names available and we wanted the statement to hold regardless of any calls to them, we would have had to add them as part of the fixed point formula as well. For instance, if our flow graph interface was I+ = {a, b, c}, I− = {d}, the resulting formula would have been

a → νX.([b]ff ∧ [ε]X ∧ [a]X ∧ [c]X ∧ [d]X)   (3.15)

6Here, we use the fact that each node has the method name as an atom, as mentioned at the end of section 3.3.1. Thus the formula a → [b]ff in this example only has to hold if we execute method a, since only then will the left-hand side expression a of the formula be true.

7I.e., methods a and b are the only ones that can be called. See section 3.3.2.

• "The implementation of method b must not start with a return instruction": We assume flow graph interface I+ = {a, b}, I− = ∅. This one only has to apply to the very beginning of method b and consists solely of a ¬r, indicating "return instruction not allowed", resulting in

b → ¬r   (3.16)

• "A call instruction to method c in the implementation of method a must not be followed by an immediate call instruction to method d": We assume flow graph interface I+ = {a, b, c, d}, I− = ∅. Here we stack two modality boxes to a failing state, resulting in the term [c][d]ff. However, we also must "wrap" this in a fixed point formula, making the term applicable regardless of silent actions and calls to other methods. The formula then becomes

a → νX.([c]([d]ff ∧ X) ∧ [a]X ∧ [b]X ∧ [d]X ∧ [ε]X)   (3.17)

Note that the term [c]X is added8. If it was not, an implementation of a could make a call to c without calling d directly afterwards, for instance by doing a silent action after the call; after that, the formula would trivially hold, and the implementation could make a c-d call sequence without failing the formula, since we would have escaped the formula by performing an action which the formula's patterns do not catch.

• "In the implementation of method b, a method call must be made before returning": We assume flow graph interface I+ = {a, b}, I− = ∅. Here, we simply disallow a return and use our fixed point silent action "wrapper". This causes the formula to trivially hold for a path in the flow graph as soon as a method call is made, while failing if a return node is reached in the path any time before that. I.e., as soon as a method call is made, the formula is "escaped" and a return can be made without failing it. The resulting formula becomes

b → νX.(¬r ∧ [ε]X)   (3.18)
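To tie the structural instantiation back to model checking: formula (3.18) can be evaluated on a candidate method graph by a straightforward greatest-fixed-point iteration restricted to ε-edges. A Python sketch on two method graphs for b invented for illustration, one that calls a before returning and one that reaches a return node through silent steps only:

```python
# Evaluate formula (3.18), b -> nuX.(~r /\ [eps]X), on a method graph for b.
def satisfies_3_18(edges, return_nodes, entry):
    """States satisfying nuX.(~r /\ [eps]X), by gfp iteration; then check
    whether the entry node of b is among them."""
    states = {s for e in edges for s in (e[0], e[2])} | return_nodes | {entry}
    X = set(states)
    while True:
        new_X = {s for s in X
                 if s not in return_nodes                   # ~r holds
                 and all(t in X for (src, lbl, t) in edges  # [eps]X holds
                         if src == s and lbl == "eps")}
        if new_X == X:
            break
        X = new_X
    return entry in X

# b calls a before its return node: the call edge escapes the formula.
good_b = ({("s1", "eps", "s2"), ("s2", "a", "s3")}, {"s3"}, "s1")
# b reaches its return node via silent steps only: violates (3.18).
bad_b = ({("s1", "eps", "s2"), ("s2", "eps", "s3")}, {"s3"}, "s1")

ok = satisfies_3_18(*good_b)
bad = satisfies_3_18(*bad_b)
```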

3.4.2 Behavioural formulas

In the behavioural case, labels of the box modalities are of the form a call b, a ret b or τ , indicating a call from method a to method b, a return from method a to method b and a silent action, respectively. The atoms are still r and ff as in the structural case.

When following a path in a flow graph in the behavioural context, crossing an edge with a method name label will make the path proceed in the start node of that method, whereas in the structural case, everything is scoped to the method the path started in.

8Implicitly in the sub-component [c]([d]ff ∧ X).


We return to the verbal behavioural property examples given in section 3.1 and build concrete formulas for them:

• "Method a may never call method b": We assume flow graph interface I+ = {a, b}, I− = ∅. The formula

νX.([a call b]ff ∧ [τ]X)   (3.19)

almost captures the verbal statement. But, again, since method a can make self-calls, a formula closer to the statement would be

νX.([a call b]ff ∧ [a call a]X ∧ [a ret a]X ∧ [τ]X)   (3.20)

However, it is also possible for the execution to start in method b9, thus the final formula becomes

νX.([a call b]ff ∧ [a call a]X ∧ [a ret a]X ∧ [b call a]X ∧ [a ret b]X ∧ [b call b]X ∧ [b ret b]X ∧ [τ]X)   (3.21)

• "Method a may never call itself": We assume flow graph interface I+ = {a}, I− = ∅. The formula becomes

νX.([a call a]ff ∧ [τ]X)   (3.22)

However, if we assume flow graph interface I+ = {a, b}, I− = ∅, the formula would become

νX.([a call a]ff ∧ [a call b]X ∧ [b ret a]X ∧ [b call a]X ∧ [a ret b]X ∧ [b call b]X ∧ [b ret b]X ∧ [τ]X)   (3.23)

since we want the formula to hold regardless of calls between method a and method b or self-calls to method b.

For simplicity, we shall disallow self-calls and only allow execution to start in method a for the rest of the examples in this section. We do so to avoid tedious [a call a]X ∧ [a ret a]X ∧ [b call b]X ∧ [b ret b]X ∧ . . . terms that would obscure the actual point of each example.

• "A call from method a to method b must result in an immediate return": We assume flow graph interface I+ = {a, b}, I− = ∅. Although the statement initially might seem to be νX.([a call b]r ∧ [τ]X), as it is stated one would assume it to hold after the first return from b to a as well. The resulting formula thus becomes

νX.([a call b]r ∧ [b ret a]X ∧ [τ]X)   (3.24)

9Note that we do not add a → prefixes to these formulas; thus they have to hold for every possible starting method. It should also be noted that we could add an a → prefix to a behavioural formula if we wanted to relax the restrictions and only require the formula to hold with method a as the starting method.


• "If there is a call from method a to method b, there may not be a call to method a directly afterwards.": We assume flow graph interface I+ = {a, b}, I− = ∅. We simply stack the relevant box modalities to a failing state and add the relevant fixed point boxes

νX.([a call b][b call a]ff ∧ [a call b]X ∧ [b call a]X ∧ [a ret b]X ∧ [b ret a]X ∧ [τ]X)   (3.25)

• "In an arbitrary sequence of alternating calls to a and b, a return must not happen immediately after a call to b": Here we can use the fact that we want to state something about an a call b, b call a, a call b, . . . pattern, like so

νX.([a call b](¬r ∧ X) ∧ [b call a]X ∧ [a ret b]X ∧ [b ret a]X ∧ [τ]X)   (3.26)

• "Along any sequence of silent actions10, after the first call from method a to method b, there must be another call before b returns": Here we use a nested fixed point formula by introducing another fixed point operator.

νX.([a call b]νY.(¬r ∧ [τ]Y ) ∧ [τ]X)   (3.27)

If the formula seems hard to read, review examples 3.9 and 3.10.

3.5 Global specifications

The way we establish control flow properties of a system is by specifying one or several properties in a so called global specification and then checking that global specification against the flow graph generated from the code, for instance by the tableau model checking approach mentioned in section 2.2.2 or by the ProMoVer tool described in [26]. Global specifications could be either behavioural or structural properties. However, in this thesis we only focus on the case where global specifications are given in behavioural form, since an end user is usually more interested in what the code does rather than how it is implemented.

3.6 A framework for compositional verification of procedural programs

A certain shortcoming with the usual approach to verifying temporal properties of procedural programs is that all of the code to be verified has to be available at the time of verification, and once it is verified, if any line of code is changed, the whole verification has to be redone from scratch. It is today often the case that a system is not built as a single monolithic piece, but rather as a collection of components. Examples of this are electronic equipment with plug-ins or usage of third party libraries. In these cases it is desirable to be able to verify the parts of the code that are available and to be able to somehow ensure that whenever we compose this code with a component11, the global property still holds.

10Although the original statement does not contain this phrase, we take the liberty of interpreting the original statement as expected to hold regardless of silent actions before the first method call. Hence the addition of this phrase.

There is a verification paradigm for such scenarios called compositional verification. This procedure originated in the context where several parallel finite-state processes are composed. However, procedural programs with recursion potentially have an infinite number of states.

In [1], compositional verification is extended to handle infinite state spaces in the context of sequential programs, and that paper will be the basis for our problem definition. This framework uses the data abstraction we inherit in this work, i.e., all data is abstracted away from the specifications and models; the focus is purely on control flow.

In a later paper [25], the procedure is extended to include program data in the specifications; more on this in section 7.3.

3.6.1 Local specification

A central concept in the framework of [1] is that of a local specification. A local specification is a set of control flow properties of structural form that are given for each unavailable method. Intuitively, a local specification can be thought of as a "promise" of what will and will not happen in the unavailable methods once they become available.

3.6.2 Description of the verification procedure

The compositional verification rule in [1] is outlined in the following manner:

Let A, B be components, X be a component variable, ψ be a global specification and φ be a local specification. Then

⊢ A : φ        X : φ ⊢ X ⊗ B : ψ
──────────────────────────────────   (3.28)
⊢ A ⊗ B : ψ

where ⊗ denotes composition of components. What this rule says is that if we have proved that for any component X, local specification φ entails that X composed with B satisfies ψ (the second premise), and we have that local specification φ holds for component A (the first premise), then A composed with B satisfies ψ. This is the procedure illustrated in figure 1.1.

The verification procedure described in [1] consists of three steps to verify that A composed with B satisfies ψ, where the implementation of A is not yet available:

1. Find a suitable local specification φ for component A, preferably as weak as possible.

2. Prove correctness of the decomposition in (3.28) (second premise).

3. When component A becomes available, verify that the local specification φ holds for A.

Step 3 can be done by means of model checking. Step 2 is described in detail in [1]; it involves replacing component X with a so called maximal model for φ, which is a representation of all possible models that satisfy φ.

11Code that is dynamic or still unavailable.


For step 1, the end user (presumably a developer) has to come up with a suitable local specification, a fact that brings us to the problem definition of this thesis.

3.7 Problem definition: Generating local specifications

This process of finding a suitable local specification could be technically very challenging and time consuming for the end user, especially considering that the local specification should be as weak as possible while still maintaining soundness of the decomposition. A weaker specification relaxes the constraints on the component and would in practice minimise the number of components that are unnecessarily rejected despite not violating the global specification.

The question then arises whether it is possible to somehow algorithmically generate the local specifications for the unavailable methods based on the global specification and the available methods.

3.7.1 Formal problem specification

Formally, the local specification generator algorithm has the following specification:

Input: A global specification ψ in the form of a behavioural formula, a flow graph interface I = (I+, I−) and a flow graph G where each method mentioned in I+ has a method graph in G.

Output: A set Π(G, ψ) of local specifications such that

G ∪ GC |=b ψ ⇐⇒ ∃χ ∈ Π(G, ψ) . GC |=s χ

where |=b and |=s indicate behavioural and structural satisfaction, respectively, and GC is a completion of G such that G ∪ GC is a closed flow graph12. Additionally, we define

UNSATISFIABLE ⇐⇒ Π(G, ψ) = ∅

12I.e., there exists a method graph for every method in I.

Chapter 4

Transforming Behavioural Properties to Structural Properties

We now direct our attention to the paper "Reducing behavioural to structural properties of programs with procedures" [2]. In it, the authors present an algorithm that can take any behavioural formula and transform it into a set of structural constraints1. We will refer to this algorithm as the transformation algorithm.

In this chapter, we give a detailed overview of how the transformation algorithm works. The reason for this is that the local specification generator algorithm relies heavily on the transformation algorithm, and a "how to use" understanding of the transformation algorithm is required in order to understand its extension into the local specification generator algorithm presented in chapter 5.

The transformation algorithm can be divided into three major components, which are described in this chapter: the transformation algorithm tableau system, a complementing post-processing procedure and a set of repeat conditions used in the tableau system.

4.1 Introduction

We start by an example. Take for instance the behavioural formula

νX.([a call b] ff ∧ [τ] X)    (4.1)

which expresses ”Along any sequence of silent actions, method a may not call method b”² by having a box modality take any a-to-b call into a failing state. It is not too difficult to see that it could be expressed as the structural constraint

a → νX.([b] ff ∧ [ε] X)    (4.2)

which simply says that ”in the implementation of a, along any sequence of silent-action instructions, there may not be a ’call b’ instruction”.
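To make the structural reading of (4.2) concrete, the following sketch checks a toy method graph for a: it searches from the entry nodes along ε-labelled (silent) edges and reports whether a ’call b’ instruction is reachable. The graph encoding (a dictionary of labelled edges) and the function name are assumptions of this sketch, not the thesis's formal flow graph definition:

```python
from collections import deque

def violates_no_call_b(entry_nodes, edges):
    """Return True if a 'call b' edge is reachable from an entry node
    along epsilon (silent/internal) edges only.

    edges maps a node to a list of (label, target) pairs, where the
    label is either "eps" (silent instruction) or a callee method name.
    """
    seen, queue = set(entry_nodes), deque(entry_nodes)
    while queue:
        node = queue.popleft()
        for label, target in edges.get(node, []):
            if label == "b":          # a 'call b' instruction: violation
                return True
            if label == "eps" and target not in seen:
                seen.add(target)
                queue.append(target)
    return False

# a's method graph: v0 --eps--> v1 --b--> v2 violates the constraint,
# since a 'call b' is reachable via silent instructions alone.
edges = {"v0": [("eps", "v1")], "v1": [("b", "v2")]}
print(violates_no_call_b(["v0"], edges))  # True
```

On a finite graph, the greatest fixed point νX.([b] ff ∧ [ε] X) holds at exactly those nodes from which no such search succeeds, which is why a simple reachability check suffices here.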

¹ That is, within the restrictions that they are expressed in the property language that we describe in section 3.4.2. I.e., they specify matters only regarding calls and returns. Program data is abstracted away, as mentioned in section 1.4.

² This is assuming a prohibition of self-calls and, for instance, a flow graph interface with I⁺ = {b}, I⁻ = {a}. See section 3.4.2 for details.

