
Institutionen för datavetenskap

Department of Computer and Information Science

Final thesis

Generation of dynamic control-dependence graphs for binary programs

by

Jakob Pogulis

LIU-IDA/LITH-EX-A--14/041--SE

2014-06-06

Linköpings universitet
SE-581 83 Linköping, Sweden


Linköping University

Department of Computer and Information Science

Final Thesis

Generation of dynamic control-dependence graphs for binary programs

by

Jakob Pogulis

LIU-IDA/LITH-EX-A--14/041--SE

2014-06-06

Supervisor: Ulf Kargén

Examiner: Nahid Shahmehri


Generation of dynamic control-dependence graphs for binary programs

Jakob Pogulis

jakob@pogulis.se

Division for Database and Information Techniques

Department of Computer and Information Science

Linköping University, Sweden


Abstract

Dynamic analysis of binary files is an area of computer science that has many purposes. It is useful when it comes to debugging software in a development environment and the developer needs to know which statements affected the value of a specific variable. But it is also useful when analyzing software for potential vulnerabilities, where data controlled by a malicious user could potentially result in the software executing adverse commands or malicious code.

In this thesis a tool has been developed to perform dynamic analysis of x86 binaries in order to generate dynamic control-dependence graphs over the execution. These graphs can be used to determine which conditional statements resulted in a certain outcome. The tool has been developed for x86 Linux systems using the dynamic binary instrumentation framework PIN, developed and maintained by Intel®.

Techniques for utilizing the additional information about a program's control flow that becomes available during dynamic analysis, in order to improve the control flow graph, have been implemented and tested. The basic theory of dynamic analysis as well as dynamic slicing is discussed, and a basic overview of the implementation of a dynamic analysis tool is presented. The impact of these control flow graph improvement techniques on the performance of the dynamic analysis tool is significant, but approaches to improving the performance are discussed.


Contents

1 Introduction
   1.1 Background
   1.2 Purpose
   1.3 Methodology
   1.4 Outline

2 Theory
   2.1 Directed graph
   2.2 Control Flow Graph
   2.3 Dominator tree
       2.3.1 Algorithm for domination tree construction
   2.4 Control Dependence Graphs
       2.4.1 Static Control Dependence Graphs
       2.4.2 Dynamic Control Dependence Graphs
   2.5 Assembly Language
   2.6 Disassembly and decompilation

3 Approach
   3.1 Static control flow graph
       3.1.1 DynInst
       3.1.2 Binary Analysis Platform
       3.1.3 Diablo
       3.1.4 DynamoRIO
       3.1.5 IDA Pro
       3.1.6 Udis86
   3.2 Instrumentation
   3.3 Improvements of the control flow graph
       3.3.1 Splitting basic blocks
       3.3.2 Explanations of the effect of improvements
       3.3.3 Example of improvements

4 System design
   4.1 Overview
       4.1.1 Hydra library
       4.1.2 Hydra static library
       4.1.3 DyCDG PIN-tool
   4.2 Core components
       4.2.1 Basic blocks
       4.2.2 Control flow graph
       4.2.3 Dynamic Improvement
   4.3 Instrumentation with Intel PIN
       4.3.1 Mirroring the code cache
   4.4 Storing the dynamic control dependence graph
       4.4.1 File format

5 Evaluation
   5.1 Overview
   5.2 Improvement of the control flow graph
       5.2.1 Cases of dynamic improvement
       5.2.2 Dynamic connections
       5.2.3 Approaches to the control dependence stack

6 Conclusion
   6.1 Slowdown
   6.2 Dynamic Improvements
   6.3 Dynamic connections

List of Figures

2.1 An example of a control flow graph with classified basic blocks.
2.2 Illustration of dominator tree and post-dominator tree for Figure 2.1.
2.3 The static control-dependence graph for Figure 2.1.
2.4 The dynamic control-dependence graph for Figure 2.1, assuming the function was executed with n = 5.
3.1 An illustrative, and simplified, example of how the PIN instrumentation process can be thought of.
3.2 An illustration of control flow graph improvement in the LT/LT case.
3.3 An illustration of control flow graph improvement in the LT/EQ case.
3.4 An illustration of control flow graph improvement in the LT/GT case.
3.5 An illustration of control flow graph improvement in the EQ/LT case.
3.6 An illustration of control flow graph improvement in the EQ/EQ case.
3.7 An illustration of control flow graph improvement in the EQ/GT case.
3.8 An illustration of control flow graph improvement in the GT/LT case.
3.9 An illustration of control flow graph improvement in the GT/EQ case.
3.10 An illustration of control flow graph improvement in the GT/GT case.
3.11 The static control flow graph discussed in Section 3.3.3.
3.12 The control flow graph discussed in Section 3.3.3, illustrating the changes made for a GT/EQ case when the relationship $C_0 \xrightarrow{\text{direct}} B_2$ is discovered.
3.13 The control flow graph discussed in Section 3.3.3, illustrating the changes made for a GT/LT case when the relationship $C_0 \xrightarrow{\text{direct}} D_2$ is discovered.
4.1 UML class diagram over Hydra Static Library.
4.2 An overview of the file format used to store the dynamic control-dependence graph.
5.1 The cumulative occurrences of the different cases of dynamic improvement for the ls binary.
5.2 The cumulative occurrences of the different cases of dynamic improvement for the ls binary.
5.3 The dynamic connections mapped during an execution of the gcc binary.
5.4 The cumulative mapping of dynamic connections during an execution of the find binary.
6.1 The simple moving average (n = 2 500) of dynamic connections

List of Tables

4.1 Complexity for the red-black tree data structure.
5.1 The binaries used to evaluate and test DyCDG-pintool.
5.2 Distribution of the different dynamic improvement cases presented in absolute numbers.
5.3 Distribution of the different dynamic improvement cases presented in percentages (%).
5.4 Dynamic connections made for each binary.
5.5 Number of instructions executed per dynamic connection for each binary.
5.6 Measurement of execution time for the binaries used in the evaluation.
5.7 Slowdown caused by DyCDG-pintool for the binaries used in the evaluation.
5.8 Slowdown caused by an "empty" PIN tool for the binaries used in the evaluation.
5.9 Slowdown caused by the DynInst static analysis executed through an "empty" PIN tool, for the binaries used in the evaluation.
5.10 The slowdown of monitored executions of the binaries used in the evaluation without PIN overhead or static analysis.
5.11 Dynamic control dependence graph size of the binaries used in the evaluation, measured in megabytes.
5.12 Number (median) of basic blocks in a dynamic slice of the binaries used in the evaluation, based on 1000 random slices.

Listings

2.1 A simple and structured Python function.
2.2 Initialization function for finding dominators using the Lengauer & Tarjan algorithm.
2.3 The Lengauer & Tarjan algorithm for computing dominators in a flow graph.
2.4 Initialization function for computing the Static Control Dependence Graph.
2.5 Algorithm for computing the Static Control Dependence Graph.
2.6 Branching operation for creating a dynamic control dependence graph.
2.7 Merging operation for creating a dynamic control dependence graph.
3.1 Example of BAP Intermediate Language representation of the assembler instruction add.
3.2 An example of machine code disassembled into INTEL syntax by Udis86.
3.3 Splitting a basic block in a control flow graph.
4.1 The algorithm for dynamic improvement of control flow graphs.
4.2 The process of identifying the type for a basic block.

Preface

This is the final thesis for my education in computer science at Linköping University. I've had the opportunity to work part-time at the division for Database and Information Techniques (ADIT) while completing my education, something which has given me a great appreciation for the work and research that is conducted on a daily basis at Linköping University. I would like to thank Professor Nahid Shahmehri for giving me this opportunity, and I would also like to thank the rest of ADIT for the past year.

I would like to especially thank my supervisor Ulf Kargén for the time he has spent on discussions, arguments and explanations regarding both his work and my own, as well as the area of binary analysis as a whole. It is something which has taught me more about a single topic within computer science than I thought there was to know only a few years ago.

Finally I would like to thank Professor Ulf Nilsson, without whose inspiration I would never have decided to continue my education with a focus on the more theoretical aspects of computer science. I will always be grateful for the opportunity to be part of the board of admission, and the way it changed my view on both education and research.


Chapter 1

Introduction

¹From the web-comic xkcd, issue 292. https://xkcd.com/292/


1.1 Background

Program slicing was first described by Mark Weiser in 1981 as a method for automatically decomposing programs by analyzing their data flow and control flow [1]. He described a static approach where the program itself was modeled by a flow graph with a single entry-point and a single exit-point, restricting the analysis to structured programs. By starting from a specific point in a program it is possible to get information about the previous statements, predicates or data sources that have caused the control flow, or data flow, to take the specific path through the program. The sequence of instructions that would need to be executed when moving from a point $I_p$ to a point $I_{p+n}$ in a program's control flow graph is called a slice. Since it is generally impossible to tell how every predicate will be evaluated and which branches will be followed with only the static information at hand, a slice obtained by using static program slicing will consist of the set of all statements that might affect the program flow, rather than the smaller set of all statements that actually affect the program flow during a concrete execution.

Dynamic Program Slicing was first described by Bogdan Korel and Janusz Laski in 1988 as a method for program slicing that finds the set of all statements in a program that really affected the result of an execution, and the value and state of a set of variables, given a specific set of input values [2]. The big difference between program slicing and dynamic program slicing is that dynamic program slicing is performed over an execution of the program. Because of that, the resulting slice depends entirely on the input provided during the execution. The dependencies between statements are modeled using a dynamic dependence graph consisting of the executed statements and their dependencies on one another. By monitoring a program during execution, the set of statements that were actually executed between two points in time can easily be extracted.

In order to perform an analysis of the execution of a binary program, including dynamic program slicing, information regarding which instructions were executed needs to be conveyed to a program, or entity, capable of interpreting the information. This can be achieved in a number of different ways, such as a special purpose processor designed with such a feature, a virtual machine, a hypervisor, or a similar technology that can pass the information to a third party. More commonly, however, the binary file itself or the program's representation in memory is augmented with additional instructions that record which of the original program's instructions were executed. This process is known as instrumentation. When a binary is instrumented as part of the process to perform dynamic program slicing of some sort, all of the executed dependencies need to be stored for later inspection if one wishes to use the results of a single execution for multiple slices. Since a modern CPU is capable of processing hundreds of millions of instructions per second, the cost of storing all the information about a program's execution rapidly grows with respect to both the time of the execution and the amount of information stored per instruction executed.

To alleviate the problem of storing the information about a program's execution, different techniques and methods have been proposed. By using a representation of the program's execution where as much information as possible about the execution itself can be inferred rather than explicitly read, the size of the dynamic dependence graph itself will be minimized. For the control flow of a concrete execution of a program, this graph is called a dynamic control-dependence graph. The dynamic control-dependence graph is likely to be significantly smaller than a complete record of the executed instructions, yet it contains all the information necessary. By combining the control flow graph and the dynamic control-dependence graph for a program, the complete record of the instructions executed during a concrete execution can be recreated.

Current research within the division for Database and Information Techniques at Linköping University by Ulf Kargén and Professor Nahid Shahmehri focuses on creating dynamic data-dependence graphs for unmodified x86 binary programs and storing these to disk in an efficient manner [3]. This master thesis project is meant to complement their research by adding the ability to reason about control-dependencies as well.

It is easy to reason about both control flow and data flow, assuming one has access to the source code in a high level language such as C/C++, Java or similar. Once the source code has been compiled, a lot of information regarding both control flow and data flow is lost, since it is not required in order to execute the program. Both static and dynamic approaches to program analysis suffer from the loss of information during the compilation step of a program's lifecycle, even if they suffer in different ways. Static analysis is associated with problems concerning what memory a pointer points to and in which context a set of instructions is executed; dynamic analysis, on the other hand, can only utilize information from a concrete execution of the program, and thus can never get an overview of the entire program. It is worth noting that the partiality described for dynamic analysis is a problem that exists with all kinds of dynamic analysis, and not just binary dynamic analysis.

1.2 Purpose

The purpose of this master thesis project has been to analyze which tools and methods are suitable for generating dynamic control dependence graphs for binary programs. Furthermore, to examine how the information available during a dynamic analysis can be leveraged in order to improve the dynamic control-dependence graphs, and whether such changes could have any effect on the size of the resulting dynamic control-dependence graph.

In order to draw conclusions about real programs as part of the analysis, a tool was developed to construct a static control flow graph and maintain a dynamic control dependence graph during the execution of an unmodified x86 binary file.


Research questions. The research questions settled on before the work started were intentionally a little vague, with the exception of the overall goal of generating dynamic control dependence graphs for binary programs. The reason for this was to further refine the research questions as the steps required to fulfill the overall goal became clearer over time.

1. What tools, methods or algorithms are best suited for generating dynamic control dependence graphs for binary programs?

2. To what extent does the quality of the static control flow graph affect the quality and usability of the control dependence graph; is there a difference in the size of the final dynamic dependence graph that can be attributed to the quality of the static control flow graph?

3. Can a static control flow graph of initially poor quality be improved during the execution, and monitoring, of an unmodified x86 binary program by leveraging the additional information provided as a result of the instrumentation, and to what extent can such an improvement contribute to the size of the final dynamic dependence graph?

4. How much, and how frequently, does the static control flow graph need to change if the additional information provided as a result of the instrumentation were to be used during runtime to improve upon the static control flow graph?

Requirements on implementation. Any tool or library developed as part of the work should be written in C/C++. The implementation for generating dynamic control dependence graphs should be well structured and documented, so that people with at least the same competence in software design and development can continue the work.

1.3 Methodology

The work with this thesis has been conducted as an iterative process of literature studies, feasibility analysis, system design and evaluation. Throughout the process there has been regular contact and discussion with the thesis supervisor about which problems, ideas and solutions have been worth pursuing. The methods used during the development have been similar to those used in the bachelor's thesis A framework for testing Distributed Systems [4].

The work started with a thorough literature study of the areas of program slicing, binary analysis, compiler theory and taint analysis, to get the information needed in order to really understand what problems exist within this domain. More recent publications were chosen in order to understand which tools exist today and what their benefits and drawbacks are. A broader selection of publications was chosen in order to understand the techniques, based in part on the number of citations and in part on their perceived relevance to the technique.

As for the feasibility of different techniques and approaches, the fact that we are working with unstructured code, with the possibility of additional code being added during runtime, has been a big factor in determining whether or not a specific technique can be used. There is a constant trade-off between soundness and complexity, where dynamic analysis has to be fast enough, and at the same time reliable enough, to be considered a viable option for program analysis. These decisions have been made on a per-case basis together with the thesis supervisor.

The system has been designed with modularity in mind, in an attempt to provide a system that can actually be used for research purposes other than this master thesis, where components can be replaced with better alternatives in the future. The main priority has, however, been to design a system that is fast enough to be part of a dynamic binary analysis at basic block granularity. This means that the complexity of the different components has taken precedence over modularity.

The evaluation of the developed tool has been conducted on a small set of binaries that are commonly found in a default installation of the Linux operating system, Ubuntu specifically.

1.4 Outline

The rest of the thesis is organized as follows. Chapter 2 gives descriptions and short introductions to several different theories and methods that are used and discussed throughout the thesis. Chapter 3 contains different approaches and an overview of tools that could be used for the static analysis required by this project. Chapter 4 gives a brief but complete description of the system that was developed in order to generate dynamic control-dependence graphs. Chapter 5 presents both quantitative and qualitative results obtained during the evaluation of the developed system and the analysis of the problem as a whole. Finally, Chapter 6 presents conclusions regarding control flow analysis of unmodified x86 binary files together with suggestions for future work.


Chapter 2

Theory

¹From the web-comic xkcd, issue 978. https://xkcd.com/978/


2.1 Directed graph

A directed graph is a set of vertices, or nodes, connected by edges. The edges themselves have a direction, which means that for every pair of connected vertices one of them is the source of the edge while the other is the target of the edge. The set of vertices $w_0 \dots w_n$ connected to the vertex $v$ such that there is an edge originating from $v$ (in other words, where $v$ is the source of the edge) to $w_i$ (in other words, where $w_i$ is the target of the edge) is called the direct successors of $v$, denoted by $v \xrightarrow{\text{direct}} w_i$. The set of vertices $w_0 \dots w_n$ connected to the vertex $v$ such that there is an edge originating from $w_i$ (in other words, where $w_i$ is the source of the edge) to $v$ (in other words, where $v$ is the target of the edge) is called the direct predecessors of $v$, denoted by $w_i \xrightarrow{\text{direct}} v$. The successors of $v$ are the set of direct successors of $v$ together with the successors of each direct successor of $v$, denoted by $v \rightarrow w_i$, and the predecessors of $v$ are the set of direct predecessors of $v$ together with the predecessors of each direct predecessor of $v$, denoted by $w_i \rightarrow v$.

Unlike an undirected graph (sometimes called an ordinary graph), when traversing a directed graph from start to finish only the set of vertices that constitutes a vertex's direct successors are considered candidates to move to, and when traversing backwards only the set of vertices that constitutes a vertex's direct predecessors are considered. A directed graph can contain cycles, where a set of vertices ($|v_0 \dots v_n| > 1$) are all each other's predecessors as well as successors, such that $v_i \xrightarrow{\text{direct}} v_{i+1}$ and $v_n \xrightarrow{\text{direct}} v_0$. A directed graph can also contain loops, where a vertex is its own direct successor as well as direct predecessor, such that $v \xrightarrow{\text{direct}} v$.

Definition 1 A vertex $q$ is said to be a successor of a vertex $p$ if there exists some path $P = (v_1, \dots, v_n)$ where $v_1 = p$ and $v_n = q$.

Definition 2 A vertex $q$ is said to be the direct successor of a vertex $p$ if there is a directed edge from $p$ to $q$.

Definition 3 A vertex $q$ is said to be a predecessor of a vertex $p$ if there exists some path $P = (v_1, \dots, v_n)$ where $v_1 = q$ and $v_n = p$.

Definition 4 A vertex $q$ is said to be the direct predecessor of a vertex $p$ if there is a directed edge from $q$ to $p$.
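To make these definitions concrete, the following small sketch (my illustration; the names and the edge list are not from the thesis) derives the direct successor and direct predecessor sets from an edge list, and computes the full successor set of a vertex by following edges transitively.

edges = [("A", "B"), ("B", "C"), ("C", "B"), ("B", "D")]

direct_successors = {}
direct_predecessors = {}
for source, target in edges:
    direct_successors.setdefault(source, set()).add(target)
    direct_predecessors.setdefault(target, set()).add(source)

def successors(v, seen=None):
    # All vertices reachable from v; the cycle B -> C -> B is handled by
    # remembering which vertices have already been visited.
    seen = set() if seen is None else seen
    for w in direct_successors.get(v, ()):
        if w not in seen:
            seen.add(w)
            successors(w, seen)
    return seen

print(successors("A"))  # {'B', 'C', 'D'}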


2.2 Control Flow Graph

A control flow graph (CFG) is a directed, connected graph that is used to represent all possible paths of execution that can occur within a program. Each vertex in the control flow graph represents a basic block, a linear sequence of instructions, and each edge between two vertices represents a possible control flow path [5]. A control flow graph has two artificial vertices, $v_{START}$ and $v_{END}$. The reason for these two artificial vertices is to ensure a connected graph where every actual vertex without a predecessor is connected with $v_{START}$ and every actual vertex without a successor is connected with $v_{END}$. Because of this addition to the graph there is a path $v_{START} \dots v \dots v_{END}$ for every vertex $v$ within the graph.

For the purpose of explaining a control flow graph, instructions can be classified either as a Transfer Instruction (TI) or as a Non Transfer Instruction (NTI). Transfer instructions are the set of instructions that might transfer control flow to a part of the program different from the address of the next instruction. Unconditional jumps, where the control flow is transferred to the target jump address; conditional jumps, where the control flow might be transferred to the target jump address; and subroutine calls, where the control flow is transferred to the invoked subroutine, are all examples of transfer instructions. Non transfer instructions are the set of instructions that will always transfer the control flow to the next instruction in sequence [6].

A basic block is a sequence of instructions that has a single point of entry and a single point of exit. A basic block consists either of zero or more non transfer instructions ending with a single transfer instruction, or of one or more non transfer instructions.
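To illustrate how such blocks can be carved out of a linear instruction sequence, the sketch below applies the classic leader rule: the first instruction, every branch target and every instruction following a transfer instruction starts a new basic block. The (address, is_transfer, target) triples are an assumed representation for the example, not the thesis' data structure.

def basic_blocks(instructions):
    # instructions: list of (address, is_transfer, branch_target_or_None).
    leaders = {instructions[0][0]}
    for i, (addr, is_transfer, target) in enumerate(instructions):
        if is_transfer:
            if target is not None:
                leaders.add(target)                  # a branch target starts a block
            if i + 1 < len(instructions):
                leaders.add(instructions[i + 1][0])  # so does the fall-through
    blocks, current = [], []
    for addr, is_transfer, target in instructions:
        if addr in leaders and current:
            blocks.append(current)
            current = []
        current.append(addr)
    if current:
        blocks.append(current)
    return blocks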

For the purpose of control flow graph generation, each basic block can be classified as one of six different types, depending on the last instruction of the basic block [6].

• 1-way basic block, where the last instruction in the basic block is an unconditional jump. This class of basic blocks will always have one outgoing edge.

• 2-way basic block, where the last instruction in the basic block is a conditional jump. This class of basic blocks will always have two outgoing edges.

• n-way basic block, where the last instruction in the basic block is an indexed jump. This class of basic blocks will have as many outgoing edges as the table used for the indexed jump has entries.

• call basic block, where the last instruction in the basic block is a call to a subroutine. This class of basic blocks will always have two outgoing edges, one edge to the entry-point for the subroutine, one edge to the code that should execute once the subroutine has returned. Unless the subroutine causes the program to halt, both paths will be traversed.

• return basic block, where the last instruction in the basic block is an instruction to return from a subroutine or end the program. This class of basic blocks will not have any outgoing edges. This is because of context sensitivity; the jump in control flow when encountering a return basic block will be directly dependent on the context from which the initial call to the subroutine was made.

• fall basic block, where the next instruction is the target address of a branching instruction. This class of basic blocks will always have one outgoing edge, unconditionally falling through to the next basic block.

An edge from a vertex $v_i$ to a vertex $v_j$ indicates that $v_i$ might transfer the control flow to $v_j$. Such a transfer of control flow is guaranteed if $v_i$ is either a 1-way basic block or a fall basic block. If $v_i$ is a call basic block the transfer of control can not be guaranteed without additional information about the subroutine, although in most cases it can be assumed that the control flow will be passed first to the subroutine and then to the next basic block in sequence. If $v_i$ is a 2-way basic block or an n-way basic block, however, the transfer of control flow depends on the predicate that determines which branch to follow from $v_i$.

In addition to the classifications made by Cristina Cifuentes in [6], call basic blocks that have an indirect address reference (calculated at runtime) will be treated as if they had no outgoing edges. The reason for this addition to the classifications is to preserve context sensitivity when performing dynamic analysis. Context sensitivity in binary analysis is the concept that a function's behavior, and impact on the program, is relative to the calling context, or in other words, relative to the part of the code that called the function. Since functions are only control-dependent on a specific call instruction as long as that instruction is part of the code that is currently being executed, adding edges to a function from each call would result in erroneous control-dependencies.
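Taken together, the classification amounts to a dispatch on the last instruction of each block. The sketch below is my summary of the rules above, with hypothetical helper predicates (is_unconditional_jump() and friends are not an actual API), including the indirect-call special case just described.

def classify(block):
    # Classify a basic block by its last instruction.
    last = block.instructions[-1]
    if is_unconditional_jump(last):
        return "1-way"
    if is_conditional_jump(last):
        return "2-way"
    if is_indexed_jump(last):
        return "n-way"
    if is_call(last):
        # Indirect calls are treated as having no outgoing edges, in order
        # to preserve context sensitivity during dynamic analysis.
        return "call (indirect, no outgoing edges)" if is_indirect(last) else "call"
    if is_return(last):
        return "return"
    return "fall"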

Figure 2.1: An example of a control flow graph with classified basic blocks.

Listing 2.1 shows the source code for a fairly simple, and structured, function implemented in Python. The control flow graph for this function is illustrated in Figure 2.1. Lines 2 and 3 correspond to basic block A, line 4 corresponds to basic block B, line 5 corresponds to basic block C, line 6 corresponds to basic block D, line 8 corresponds to basic block E, line 9 corresponds to basic block F and, finally, line 10 corresponds to basic block G.

 1 def func(n):
 2     iter = 0                # (A)
 3     num = n                 # (A)
 4     while num % 81 != 0:    # (B)
 5         if num % 6 == 0:    # (C)
 6             num += 19       # (D)
 7         else:
 8             num = num * 3   # (E)
 9         iter += 1           # (F)
10     return iter             # (G)

Listing 2.1: A simple and structured Python function.

2.3 Dominator tree

A dominator tree describes one specific aspect of a directed graph, namely which vertices dominate other vertices. Domination in this case means that in order to reach a vertex that is dominated by another, the dominating vertex will always have to be visited first. It is useful to have information regarding domination when reasoning about whether or not a specific vertex will be part of the vertices that need to be visited when going from one vertex to another, no matter which path is taken. Since the dominator tree only represents information regarding domination and nothing else, it requires fewer edges than a control flow graph and has neither cycles nor loops; thus it is both easier and faster to traverse.

More formally, a dominator tree is a directed tree where the successors of a vertex $v$ are the vertices that are directly dominated by $v$, or in other words, $v$ is the immediate dominator of its successors. A vertex $v_i$ is said to dominate a vertex $v_j$ if every path from the entry point of the graph ($v_{START}$ in a control flow graph) to the vertex $v_j$ also contains the vertex $v_i$. A vertex $v_i$ is said to be the immediate dominator of a vertex $v_j$ if (a) $v_i$ dominates $v_j$, (b) $v_i$ is not the same vertex as $v_j$ and (c) there is no vertex $v_k$ such that $v_i$ dominates $v_k$ and $v_k$ dominates $v_j$.

Definition 5 A vertex $v_q$ dominates a vertex $v_p$ if there exists some path $P = (v_{START}, \dots, v_q, \dots, v_p)$ and there exists no path $P = (v_{START}, \dots, v_p)$ where $P \cap v_q = \emptyset$.

Definition 6 A vertex $v_q$ strictly dominates a vertex $v_p$ if $v_q$ dominates $v_p$ and $v_q \neq v_p$.

Definition 7 A vertex $v_q$ is the immediate dominator of a vertex $v_p$ if $v_q$ strictly dominates $v_p$ but doesn't strictly dominate any other vertex that strictly dominates $v_p$.
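As a concrete complement to Definition 5, the sketch below turns the definition directly into a simple fixed-point computation of dominator sets. It is my illustration under assumed inputs, not the approach used in the thesis; Section 2.3.1 presents the faster algorithm. The graph is assumed to be given as a map from each vertex to its set of direct predecessors.

def dominators(preds, start):
    # dom(v) = {v} | intersection of dom(p) over all direct predecessors p,
    # iterated until nothing changes (Definition 5 as a data-flow equation).
    vertices = set(preds)
    dom = {v: set(vertices) for v in vertices}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for v in vertices - {start}:
            new = {v}
            if preds[v]:
                new |= set.intersection(*(dom[p] for p in preds[v]))
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom

# Post-dominators can be computed the same way by reversing the edges and
# starting from the exit vertex instead of the entry vertex.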

Post-domination is another aspect of domination, where a vertex $v_i$ is said to post-dominate a vertex $v_j$ if every path from $v_j$ to the exit point of the graph ($v_{END}$ in a control flow graph) also contains the vertex $v_i$. Since a control flow graph has a single point of entry and a single point of exit, a post-dominance tree can be calculated the same way as a dominance tree by changing the direction of the edges between vertices.

Definition 8 A vertex $v_q$ post-dominates a vertex $v_p$ if there exists some path $P = (v_{END}, \dots, v_q, \dots, v_p)$ and there exists no path $P = (v_{END}, \dots, v_p)$ where $P \cap v_q = \emptyset$.

Definition 9 A vertex $v_q$ strictly post-dominates a vertex $v_p$ if $v_q$ post-dominates $v_p$ and $v_q \neq v_p$.

Definition 10 A vertex $v_q$ is the immediate post-dominator of a vertex $v_p$ if $v_q$ strictly post-dominates $v_p$ but doesn't strictly post-dominate any other vertex that strictly post-dominates $v_p$.

For more information about the domination property of directed graphs, the reader is directed to Thomas Lengauer’s and Robert Tarjan’s work regarding dominators [7].

Building upon the previous example of a control flow graph in Section 2.2, the dominator tree for Figure 2.1 can be seen in Figure 2.2(a), and the post-dominator tree for Figure 2.1 can be seen in Figure 2.2(b).

Figure 2.2: Illustration of the dominator tree (a) and the post-dominator tree (b) for Figure 2.1.


2.3.1 Algorithm for domination tree construction

Thomas Lengauer and Robert Tarjan have developed an algorithm to find dominators based on depth-first search, and a depth-first spanning tree, of the control flow graph [7]. A brief description of the algorithm, based on lecture notes from the University of Cambridge by Martin Richards [8], is presented below; a more descriptive explanation of the algorithm can be found in the report Computing Dominators and Dominance Frontiers that is part of the Massively Scalar Compiler Project [9].

def initialize(vertex):
    global gindex
    index = gindex
    gindex += 1

    order[index] = vertex
    semidom[index] = index
    idom[index] = 0
    ancestor[index] = 0

    # cfg_successors() stands in for the successor lookup of the original
    # listing, renamed here so it doesn't shadow the successors list.
    for successor in cfg_successors(vertex):
        if successor not in order:
            sindex = initialize(successor)
            parent[sindex] = index
        else:
            sindex = order.index(successor)
        successors[index].append(sindex)
        # The original listing pushed to predecessors[index]; the successor's
        # predecessor list is the intended target.
        predecessors[sindex].append(index)

    return index

Listing 2.2: Initialization function for finding dominators using the Lengauer & Tarjan algorithm.

The algorithm operates on a control flow graph and traverses that graph in reverse depth-first search order. Five lists holding integer values representing indexes into other lists are required (parent, semidom, idom, ancestor & best), as well as three lists holding lists of integer values representing indexes into other lists (successors, predecessors & bucket). Before computing the actual dominators these lists need to be initialized, and the set of vertices in the control flow graph needs to be ordered in the same order they would have been traversed during a depth-first search of the graph. Listing 2.2 describes a working implementation in Python where the ordering of vertices and the initialization of the lists happen at the same time.

The algorithm itself, as seen in Listing 2.3, processes the vertices in reverse depth-first search order, which allows the immediate dominator information to be cached and reused for vertices further away from their immediate dominator. The algorithm runs in O(m log n) where m is the number of edges and n is the number of vertices in the control flow graph.

def compute():
    # Process the vertices in reverse depth-first search order. The original
    # listing iterated range(len(order), 2, -1), which starts one step out of
    # bounds; with 0-based indexing the root is order[0] and every other
    # vertex is processed.
    for w in range(len(order) - 1, 0, -1):
        p = parent[w]
        for v in predecessors[w]:
            u = EVAL(v)
            if semidom[w] > semidom[u]:
                semidom[w] = semidom[u]

        bucket[semidom[w]].append(w)
        LINK(p, w)

        for v in bucket[p]:
            u = EVAL(v)
            if semidom[u] < semidom[v]:
                idom[v] = u
            else:
                idom[v] = parent[w]

        bucket[p] = []

def LINK(v, w):
    ancestor[w] = v

def EVAL(v):
    a = ancestor[v]
    while ancestor[a]:
        if semidom[v] > semidom[a]:
            v = a
        a = ancestor[a]
    return v

Listing 2.3: The Lengauer & Tarjan algorithm for computing dominators in a flow graph.
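For orientation, the two listings would be tied together roughly as follows. This driver is an assumed usage sketch; the set-up and the helpers number_of_vertices() and entry_vertex() are mine, not from the thesis.

n = number_of_vertices(cfg)   # assumed helper returning the vertex count
gindex = 0
order = [None] * n
parent, semidom, idom, ancestor = ([0] * n for _ in range(4))
successors, predecessors, bucket = ([[] for _ in range(n)] for _ in range(3))

initialize(entry_vertex(cfg))  # depth-first numbering, Listing 2.2
compute()                      # fills semidom[] and idom[], Listing 2.3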

2.4 Control Dependence Graphs

A control dependence graph (CDG) is a partially ordered, directed, acyclic graph where the vertices of the graph represent basic blocks and the edges between two vertices represent the control conditions on which the execution of the operations depends [10]. A control dependence between two vertices in a control flow graph exists if there is a conditional branch at one of the vertices (the source of the dependence) that determines whether or not the other vertex (the sink of the dependence) is to be executed.

A vertex $v_q$ is control dependent on a vertex $v_p$ if (a) $v_q \neq v_p$, (b) $v_q$ does not post-dominate $v_p$ and (c) there exists a path from $v_p$ to $v_q$ such that $v_q$ post-dominates every vertex on the path except for $v_p$. A vertex $v_q$ can be control dependent on itself if there exists a path from $v_q$ to $v_q$ such that $v_q$ post-dominates every vertex on the path except for $v_q$ itself.


2.4.1 Static Control Dependence Graphs

A static control dependence graph can contain fewer vertices than the initial control flow graph, since there is no need to keep vertices in the static control dependence graph if they (a) depend on the same vertex as their immediate dominator and (b) don't have any vertices depending on them. That being said, a static control dependence graph can still be larger than the initial control flow graph in terms of edges, since a single vertex can be control dependent on hundreds or even thousands of vertices.

To continue with the initial example of a control flow graph in Section 2.2, the static control-dependence graph for Figure 2.1 can be seen in Figure 2.3. It is worth noting that vertex B is control dependent on itself since it is a loop header, and also that the number of vertices is smaller in this static control dependence graph than in the control flow graph – a property that is far from guaranteed when working with larger control flow graphs.


Figure 2.3: The static control-dependence graph for Figure 2.1.

Algorithm for static control dependence graph construction

In order to construct a static control dependence graph, information regarding the control flow graph as well as the post-dominator tree is required. Ron Cytron, Jeanne Ferrante, Barry K. Rosen and Mark N. Wegman have developed an algorithm to compute the static control dependence graph based on dominance frontiers [11]. A description of the algorithm is presented below; for more detailed information regarding the algorithm the reader is directed to either the article itself, Efficiently Computing Static Single Assignment Form and the Control Dependence Graph, by Cytron et al. [11], or the book Optimizing Compilers for Modern Architectures: A Dependence-Based Approach by Randy Allen and Ken Kennedy [12].

The algorithm operates on a control flow graph, a post-dominator tree and the control dependence graph that it constructs. The vertices in the post-dominator tree are ordered so that if x post-dominates y, then x comes after y, as described in Listing 2.4.

def initialize():
    rindex = len(postdominatortree) - 1
    windex = len(postdominatortree) - 1
    order[windex] = rootvertex(postdominatortree)
    windex -= 1
    while rindex > 0:
        v = order[rindex]
        rindex -= 1
        if v is not None:
            # The original listing read predecessors(p, ...); the loop
            # variable v is the intended argument.
            for p in predecessors(v, postdominatortree):
                order[windex] = p
                windex -= 1

Listing 2.4: Initialization function for computing the Static Control Dependence Graph.

The algorithm itself, as seen in Listing 2.5, first checks for control dependencies in the control flow graph and then moves on to check the parts of the static control dependence graph that have already been constructed with information from the post-dominator tree. Not counting the construction of post-dominators, the algorithm runs in O(max(N + E, |C|)) where N is the number of vertices in the control flow graph, E is the number of edges in the control flow graph and C is the set of control dependencies in the control-dependence graph. In other words, the complexity of the algorithm is the maximum of the size of the input and the output; thus, there can be no algorithm that is asymptotically better in performance.

def compute():
    for x in order:
        for y in predecessors(x, cfg):
            if x != ipostdom(y):
                connect(x, y, cdg)

        for z in predecessors(x, postdominatortree):
            # The original listing dropped the loop variable here;
            # y is the intended iteration variable.
            for y in predecessors(z, cdg):
                if x != ipostdom(y):
                    if not connected(x, y, cdg):
                        connect(x, y, cdg)

Listing 2.5: Algorithm for computing the Static Control Dependence Graph.

2.4.2 Dynamic Control Dependence Graphs

A dynamic control dependence graph, like its static counterpart, can contain fewer vertices than the initial control flow graph, although every vertex that controls whether or not another executed vertex should be executed, and every executed vertex that is control dependent on another vertex, will be part of the dynamic control dependence graph. The maximum number of edges in a dynamic control dependence graph is unbounded, since almost every transition of control flow from one vertex to another will yield a new edge from the executed vertex to the vertex that it is control dependent upon.

Figure 2.4: The dynamic control-dependence graph for Figure 2.1, assuming the function was executed with n = 5.

Assuming we would execute the function described in Figure 2.1 with n = 5, the function func would execute in the following sequence: A1, B1, C1, E1, F1, B2, C2, E2, F2, B3, C3, E3, F3, B4, C4, E4, F4, B5, G1. This execution pattern would result in the dynamic control dependence graph that is illustrated in Figure 2.4. Since the vertices START, A, G and END will be executed no matter what, they are neither statically nor dynamically control dependent on any other vertex, and thus they are not part of the graph. The vertex B is not control dependent on any other vertex either, but since B controls whether or not the vertices C, D, E and F should be executed at all, it is part of the graph.

There are two things to note about the notation used in Figure 2.4. First, an edge points to the vertex on which the source vertex is control dependent. Secondly, every edge is annotated with a tuple where the first element indicates which instance of the source vertex is referred to, and the second element indicates which instance of the target vertex is referred to. In the example illustrated in Figure 2.4 the source and target instance is the same for every control dependency. This is because the same path was repeatedly taken through the program throughout the entire execution. In a more complex program, instance numbers are however likely to differ at times.

Algorithm for dynamic control dependence graph construction

In order to construct a dynamic control dependence graph, information regarding the control flow graph as well as the post-dominator tree is required. Bin Xin and Xiangyu Zhang have developed an algorithm for online detection (online meaning during execution) which they present in their paper Efficient Online Detection of Dynamic Control Dependence [13]. The algorithm requires a framework or tool capable of monitoring a program during execution, but is otherwise fairly straightforward. A description of the algorithm is presented below; for more detailed information regarding the algorithm itself the reader is directed to the article [13].

The algorithm operates on a stack, called the Control Dependence Stack (CDS). Every basic block that is either a 2-way basic block, an n-way basic block or a call basic block (see Section 2.2) is annotated as a branching statement. Every basic block that is a post-dominator (see Section 2.3) is annotated as a merging statement. Every time a branching statement is encountered that basic block is pushed to the CDS, and every time a merging statement that post-dominates the top-most basic block on the CDS is executed, that basic block is popped off of the CDS.

def branching(basicblock, postdominator, CDS):
    if CDS.top()[1] == postdominator:
        CDS.top()[0] = basicblock
    else:
        CDS.push([basicblock, postdominator])

Listing 2.6: Branching operation for creating a dynamic control dependence graph.

The two listings, Listing 2.6 and Listing 2.7, show part of the code that each basic block is instrumented with during a monitored execution. This means that the CDS is continuously updated as the execution progresses and more basic blocks are executed.

As a monitored program runs, and code is being executed, every new basic block that has two or more successors in the control flow graph is instrumented as a branching statement, with the code seen in Listing 2.6, and every new basic block that post-dominates another basic block is instrumented as a merging statement, with the code seen in Listing 2.7. When these basic blocks are later executed the instrumentation code is executed as well, which alters the CDS and allows for a fast online analysis of dynamic control dependence.

Every basic block that is being executed is said to be control dependent on the top-most basic block on the CDS. Even though it is not illustrated with a listing in this section, it is easy to imagine the code required for peeking at the top-most basic block on the CDS and then adding a control dependence from the current basic block to that basic block.

def merging(basicblock, CDS):
    if CDS.top()[1] == basicblock:
        CDS.pop()

Listing 2.7: Merging operation for creating a dynamic control dependence graph.
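To see the stack in action, the following sketch (my illustration, using a minimal list-based stack; not part of the thesis) replays the start of the n = 5 execution of Listing 2.1 through the branching() and merging() operations from Listings 2.6 and 2.7. The loop header B and the if block C are branching statements whose post-dominators are G and F respectively.

class CDS(list):
    # Minimal control dependence stack; entries are [basicblock, postdominator].
    def top(self):
        return self[-1]
    def push(self, entry):
        self.append(entry)

cds = CDS()
branching("B1", "G", cds)   # B executes: push [B1, G]
branching("C1", "F", cds)   # C executes: push [C1, F]
# E1 executes next and is control dependent on the top entry, C1.
merging("F", cds)           # F post-dominates C: pop [C1, F]; F1 depends on B1
branching("B2", "G", cds)   # top entry has the same post-dominator G,
                            # so [B1, G] is replaced by [B2, G]
print(cds)                  # [['B2', 'G']]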

2.5 Assembly Language

An assembly language is a programming language that is very close to the actual machine code that is interpreted by a processor. Instead of ones and zeroes, however, an assembly language uses mnemonics, or abbreviations, for the instructions available for the computer architecture.

An important aspect of the assembly language in the context of disassembly is that there is a one-to-one mapping between assembly language instructions and instructions in machine code. This means there can be no ambiguities when translating instructions from the assembly language to machine code or from machine code back to the assembly language, although the latter requires an understanding of the control flow structure of the binary program.


There are a few different dialects of the x86 assembly language, most notably the Intel syntax and the AT&T syntax. The most important differences between these two dialects are (a) the parameter order, where Intel places the destination before the source and AT&T places the source before the destination, (b) that mnemonics in the AT&T syntax are suffixed with a letter to indicate the size of the operation, whereas in the Intel syntax this information is derived from the register that is used, and (c) the general syntax for effective addresses. An example of the same instruction in both dialects is shown below.
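The following illustrative snippet (my example, not taken from the thesis) shows one and the same instruction – loading a 32-bit value from the memory address ebx + 4 into eax – written in both dialects, exhibiting all three differences listed above.

; Intel syntax: destination first, size implied by the 32-bit register eax
mov eax, [ebx+4]

# AT&T syntax: source first, explicit 'l' suffix for a 32-bit operation
movl 4(%ebx), %eax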

Additional, and more detailed, information regarding the assembly language can be found in James T. Streib's book Guide to Assembly Language – A Concise Introduction [14].

2.6 Disassembly and decompilation

Disassembly is the process of transforming a set of machine code instructions, a binary file, into a set of assembly instructions. It is the exact opposite of the assembler process, which is usually the last step when compiling a program from source code [15, 16]. Disassembly can either be performed as a static process, where the set of machine code instructions being disassembled is never executed, or as a dynamic process, where the set of assembly instructions that constitutes the machine code is extracted during execution [16]. Static disassembly has the advantage of processing an entire set of machine code instructions at once, while dynamic disassembly can only process the subset of machine code instructions that are actually executed. Dynamic disassembly, on the other hand, has the advantage of extracting a sequence of instructions that is guaranteed to be correct for the execution during which it was extracted.

Decompilation, or reverse compilation, is the process of transforming a set of machine code instructions, or a set of assembly instructions, into a representation in a high-level language [17]. The purpose is to reconstruct the source code of the binary program in a high-level language that can easily be read and understood by a human, in order to audit the code or make changes to the functions of the program. Decompilation is usually performed after disassembly and applies additional transformations to the assembly instructions produced by the disassembly process; thus it should not be seen as a process conflicting with disassembly.

In order to decompile a program correctly, the program's control flow paths need to be known, or discovered, during the process. Dynamically, this information can be obtained for the subset of instructions that is actually executed while the program is being monitored. Statically, the entire program will be processed, but there will be ambiguities in the result since not all information regarding the control flow can be known in a static context.

The ambiguities that arise when statically reconstructing the control flow of a binary file stem from the fact that there is more information present in the original source code. This additional information, which primarily concerns control flow, is removed when the program is compiled into an executable binary, since the information is not needed in order for the program to execute correctly. Fully structured code, where every control structure has a single point of entry and a single point of exit, should have considerably fewer ambiguities. This is because once a point of entry is found, the point of exit can be known immediately.


Finally it is important to note that data and instructions are indistinguishable in the Von Neumann architecture. This means that data can be placed between instructions in no particular order, which makes it harder to both disassemble and decompile a program [17].


Chapter 3

Approach

¹From the web-comic xkcd, issue 722. https://xkcd.com/722/


3.1 Static control flow graph

Since the control flow graph needs to be known in order to generate a dynamic control dependence graph, the first step when analyzing a binary is to extract the static control flow graph. It should be noted, though, that it is an undecidable problem (in the general case) to correctly calculate the minimum set of basic blocks that constitutes all possible control paths through a program. There are several different tools and applications available for this purpose, and a few of them, all of which were commonly referred to by previous research in dynamic analysis, have been evaluated for this thesis. They all construct control flow graphs that differ slightly from one another even when the same binary is used. This is assumed to be because different heuristics are applied in order to find, or guess about, points-to information.

The DynInst library was chosen, in part for providing a fully documented API for extracting the static control flow graph, in part for being recommended, and previously known, by researchers at Linköping University. The extraction of a static control flow graph has been separated from the generation of dynamic control dependence graphs in order to allow other libraries to be integrated into the project. More information about the implementation can be found in Chapter 4.

3.1.1 DynInst

DynInst [18] is a library with an Application Programmable Interface (API) that is freely available under the GNU Lesser General Public License² and copyrighted to Barton P. Miller. The DynInst library is developed by a group of researchers from the University of Maryland and the University of Wisconsin-Madison. [19]

The DynInst library was created in order to provide a machine independent interface to permit the creation of tools and applications that use runtime code patching [19], but it also provides features for static analysis of binary programs. The support for static analysis within the library focuses on intra-procedural analysis, and therefore the program's control flow graph is represented by the set of the program's functions' control flow graphs.

It is important to note that DynInst will not always be able to generate a correct control flow graph if there are indirect jumps in the code. In the case of indirect jumps the targets of these jumps are found by searching for instruction patterns, known as peep-holes. Since the instruction patterns are compiler specific, the ability to identify them correctly is limited to the compilers that were tested while DynInst was developed. [20]

3.1.2 Binary Analysis Platform

The Binary Analysis Platform (BAP) is a software package that is freely available under the GNU General Public License³. BAP is developed by a group of researchers from Carnegie Mellon University funded by CyLab and DARPA. [21]

The core of BAP is a disassembler that is able to parse x86 binaries and reproduce the assembly instructions that constitute the source code of the binary. BAP uses an architecture-independent intermediate language (IL) to represent the assembly instructions. The IL explicitly represents all the indirect effects a certain assembly instruction has, such as which register flags are set or unset by each instruction, but also decodes indirect jumps to the extent possible. Listing 3.1 illustrates how the assembly instruction add rax, rbx is represented in the IL. [21, 22]

²http://www.gnu.org/licenses/lgpl.html
³http://www.gnu.org/licenses/gpl.html

BAP is the successor to the binary analysis techniques developed for Vine as part of the research group's previous work on a toolset called BitBlaze. There are several current and previous projects that rely on Vine, BitBlaze or BAP in order to perform security analysis or decompilation of x86 binaries. [21]

addr 0x0 @asm "add %rax,%rbx"
label pc_0x0
T_t1:u64 = R_RBX:u64
T_t2:u64 = R_RAX:u64
R_RBX:u64 = R_RBX:u64 + T_t2:u64
R_CF:bool = R_RBX:u64 < T_t1:u64
R_OF:bool = high:bool((T_t1:u64 ^ ~T_t2:u64) & (T_t1:u64 ^ R_RBX:u64))
R_AF:bool = 0x10:u64 == (0x10:u64 & (R_RBX:u64 ^ T_t1:u64 ^ T_t2:u64))
R_PF:bool =
  ~low:bool(let T_acc:u64 := R_RBX:u64 >> 4:u64 ^ R_RBX:u64 in
            let T_acc:u64 := T_acc:u64 >> 2:u64 ^ T_acc:u64 in
            T_acc:u64 >> 1:u64 ^ T_acc:u64)
R_SF:bool = high:bool(R_RBX:u64)
R_ZF:bool = 0:u64 == R_RBX:u64

Listing 3.1: Example of BAP Intermediate Language representation of the assembler instruction add.

3.1.3 Diablo

Diablo is a software framework that is freely available under the GNU General Public License⁴. Diablo is developed by a group of researchers from Ghent University. [23]

Diablo is a retargetable link-time binary rewriting framework that functions as a linker during the compilation step of a program, meaning that its input consists of the object files and libraries from which the program is built, and not the program binary itself. By using the object files Diablo gets extra information about the program and its properties, which makes it possible to correctly interpret the complete binary – something that is not possible using just the binary itself. It is also worth mentioning that Diablo only works on statically linked programs, and not dynamically linked programs.

Diablo has been used in a variety of different projects related to compilation, including compression, optimization and obfuscation.


3.1.4 DynamoRIO

DynamoRIO is a software package that is freely available under the Berkeley Software Distribution License⁵. DynamoRIO was originally developed as a collaboration between the Massachusetts Institute of Technology and Hewlett-Packard; VMware then acquired the project in 2007. [24]

DynamoRIO is a dynamic instrumentation tool that can be used to make changes to, or monitor, the execution of a binary file. DynamoRIO provides an interface for building tools that require dynamic instrumentation, and has the ability to parse binary files compiled for the IA-32 and AMD64 instruction sets. [24]

3.1.5 IDA Pro

IDA Pro is an application that is available freely for non-commercial use and costs €1799 for a commercial license capable of decompilation [25]. IDA Pro is a commercial product developed by the company Hex-Rays S.A. [26]

IDA Pro contains functionality for disassembly, decompilation and debug-ging of x86 binaries and is able to reproduce the assembly instructions that constitutes the source code of the binary. IDA Pro can also recreate the static control flow of a program to some extent and is considered the industry standard when it comes to both disassembly and decompilation. [27]

IDA Pro contains a complete development environment with an SDK avail-able so that the application can be extended, although this feature, and the SDK as a whole, is only available with a commercial license. [27]

3.1.6 Udis86

Udis86 is a software library that is freely available under the Free Berkeley Software Distribution License6. The Udis86 library is an open source project hosted on GitHub and maintained by Vivek Thampi. [28]

Udis86 is a disassembler library that is able to parse machine code for the x86 architecture, with support also for the IA-32e and AMD64 architectures. After parsing, the library can reproduce the assembly instructions from the stream of machine code that was provided. Udis86 can represent the assembly instructions in both INTEL and AT&T style assembly language syntax. An example of disassembly by Udis86 and what the output might look like can be seen in Listing 3.2. [28]

0x80000800 656789877665 mov [gs:bx+0x6576], eax
0x80000806 54           push esp
0x80000807 56           push esi
0x80000808 7889         js 0x80000793
0x8000080a 0900         or [eax], eax
0x8000080c 90           nop

Listing 3.2: An example of machine code disassembled into INTEL syntax by Udis86.
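To make the library's usage concrete, the following is a minimal sketch of how the bytes from Listing 3.2 could be disassembled through the Udis86 C API. The buffer contents and load address are taken from the listing; the rest is an illustrative assumption about how a caller would typically drive the library, not code from the thesis tool.

#include <cstdio>
#include <cstdint>
#include <udis86.h>

int main() {
    // Machine code bytes from Listing 3.2.
    const uint8_t code[] = { 0x65, 0x67, 0x89, 0x87, 0x76, 0x65,  // mov
                             0x54,                                 // push esp
                             0x56,                                 // push esi
                             0x78, 0x89,                           // js
                             0x09, 0x00,                           // or
                             0x90 };                               // nop

    ud_t u;
    ud_init(&u);
    ud_set_input_buffer(&u, code, sizeof(code));
    ud_set_mode(&u, 32);              // decode as 32-bit x86
    ud_set_pc(&u, 0x80000800);        // virtual address of the first byte
    ud_set_syntax(&u, UD_SYN_INTEL);  // INTEL style output

    while (ud_disassemble(&u)) {
        std::printf("0x%08llx %-14s %s\n",
                    (unsigned long long)ud_insn_off(&u),
                    ud_insn_hex(&u),
                    ud_insn_asm(&u));
    }
    return 0;
}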

5 http://www.xfree86.org/3.3.6/COPYRIGHT2.html#5


3.2 Instrumentation

The term instrumentation refers to an ability to monitor or measure the level of a product’s performance and to diagnose errors [29]. In this thesis we refer to instrumentation as the process of inserting code into a program in order to collect run-time information.

There are several different methods of instrumentation, each of which comes with a slightly different set of benefits and drawbacks. The methods for instrumentation can be divided into three main categories: (a) source code instrumentation, (b) static binary instrumentation and (c) dynamic binary instrumentation [30].

The tool that has been developed to generate dynamic control-dependence graphs as part of this thesis uses the dynamic binary instrumentation framework PIN [31], which is developed and maintained by Intel.

PIN is a dynamic binary instrumentation framework for the IA-32 and x86-64 instruction-set architectures that enables creation of dynamic program analysis tools. The instrumentation is performed at run-time on compiled binary files; it requires no recompilation of source code and can therefore be used in applications where the source code is not available. PIN was originally created as a tool for computer architecture analysis, but has since been extended, and by now a diverse set of tools for security, emulation and parallel program analysis has been created. PIN stands for 'Pin Is Not an acronym' [31]. Tools developed using PIN are commonly referred to as PIN tools and are dynamic libraries that are executed through PIN.

Figure 3.1: An illustrative, and simplified, example of how the PIN instrumentation process can be thought of.

There is a natural overhead associated with dynamic binary instrumentation. The overhead can be categorized into two different parts, namely (a) instrumentation and (b) analysis. The overhead from instrumentation stems from the fact that PIN needs to detect when code that hasn't previously been instrumented is encountered and execute procedures to determine how the new code should be instrumented. The overhead from analysis stems from the fact that actual instructions are added to the code belonging to the binary that is being instrumented, so there are more instructions for the processor to process. The overhead introduced by instrumentation is often fairly small compared to the overhead from analysis, since the instrumentation procedures are only executed once per unit of code7.

Throughout this thesis the terms instrumentation code and analysis code will be used. Instrumentation code refers to the code that is executed when PIN encounters a new set of instructions; this code is used to insert analysis code into the original code. Analysis code, on the other hand, is the code that is inserted into the program's original code; it is used to monitor and/or change the behavior of the program.

When PIN encounters a set of instructions that has not already been instrumented, these instructions are bundled together as a trace8. The trace is passed on to the PIN tool, which can augment the trace itself, basic blocks within the trace, or single instructions with instrumentation code. PIN then recompiles the set of instructions together with the additional instrumentation code and places it into a code cache. Finally, one or several exit stubs are appended to the instrumented trace in order to connect it to the rest of the code. These exit stubs can be seen as the next pointers commonly found in implementations of linked lists, or the left/right pointers in a binary tree implementation.

Figure 3.1 illustrates how (1) the set of instructions that constitutes the next trace is identified by PIN; (2) PIN augments the set of instructions with instrumentation code as well as an exit stub; and (3) the basic block is placed in the code cache and linked together with other, previously instrumented, traces. The code cache is a software-based representation of the program code. It contains the recompiled machine code together with the structures linking all the different traces together in the same way they would be connected if the program weren't executed through PIN. Only the code that resides within the code cache is executed; the original code is always recompiled and thus never executed in itself.

Because of this implementation, where a code cache is used and code is recompiled on the fly, PIN requires the analysis code to be simple and free of branching statements in order to compile it as part of the original code. As soon as code contains branching statements and, more importantly, indirect jumps, it is very hard, and sometimes impossible, to determine through static analysis how the code will affect the control flow of a program. Therefore, if the analysis code contains any branch or call instructions, PIN won't include the analysis code in the code cache but will instead provide a call reference to the original analysis code. When PIN references the analysis code instead of including it, PIN has to leave the code cache and execute the function from another part of memory, a process which is relatively costly and should be avoided if possible.
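As an illustration of this division of labor, the following is a minimal sketch of a PIN tool that counts executed basic blocks. The API calls follow PIN's public interface; the counting itself is a hypothetical example, not part of the thesis tool. Note that the analysis routine is branch-free, so PIN can inline it into the code cache, while the instrumentation routine runs only when a new trace is encountered.

#include "pin.H"
#include <iostream>

static UINT64 bblCount = 0;

// Analysis code: branch-free, so PIN can inline it into the code
// cache next to the application's own recompiled instructions.
VOID CountBbl() { bblCount++; }

// Instrumentation code: executed once per trace, when PIN first
// encounters the instructions, to decide what analysis code to insert.
VOID InstrumentTrace(TRACE trace, VOID* v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)CountBbl, IARG_END);
    }
}

VOID Fini(INT32 code, VOID* v) {
    std::cerr << "basic blocks executed: " << bblCount << std::endl;
}

int main(int argc, char* argv[]) {
    if (PIN_Init(argc, argv)) return 1;
    TRACE_AddInstrumentFunction(InstrumentTrace, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // never returns
    return 0;
}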

7 In practice the same unit of code can be instrumented several times, something which is required when improving the control flow graph with dynamic information. Additional information regarding implementation-specific details for this aspect of the thesis can be found in Section 4.3.1.

8 A trace in PIN is what is called a super block in compiler theory: a straight-line sequence of code with a single entry point but possibly several exit points.


3.3 Improvements of the control flow graph

As mentioned in Section 2.6, and further explained in Section 3.1.1, the static control flow graph is likely to be partially incorrect and contain fewer edges than it is supposed to. These edges are missing because of indirect jumps, where the target of the jump instruction is calculated based on information that is only present during runtime.

By improving the static control flow graph it should be possible to create a more accurate dynamic control-dependence graph, at least for the execution, or set of executions, that is being monitored. Since a dynamic control-dependence graph requires information about both branching statements and merging statements (see Section 2.4.2), every indirect jump that is resolved to a concrete address should lead to a more accurate dynamic control-dependence graph.

During the process of monitoring the execution of an application, parts of this information become available as these indirect jumps are actually performed. By keeping track of the previous basic block that was executed, it is possible to improve the control flow graph and include edges that couldn't be found during the static analysis. Every time the previously executed basic block is not a direct predecessor of the current basic block in the control flow graph, an edge from the previous basic block to the current basic block should be added to the control flow graph.
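A sketch of this bookkeeping is shown below. The data structures and names are hypothetical; in the actual tool the check would be performed by analysis code inserted at the start of every basic block.

#include <cstdint>
#include <map>
#include <set>

// Hypothetical CFG edge table: block start address -> successor start addresses.
static std::map<uint64_t, std::set<uint64_t>> successors;
static uint64_t prevBlock = 0;

// Conceptually invoked at the entry of every executed basic block.
void OnBasicBlockEntry(uint64_t curBlock) {
    if (prevBlock != 0) {
        // Adds the edge prev -> cur if the static analysis missed it;
        // std::set::insert is a no-op when the edge is already known.
        successors[prevBlock].insert(curBlock);
    }
    prevBlock = curBlock;
}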

When following an indirect jump, there is a chance that the target address of that jump exists within what was previously thought of as a single basic block. When this happens, existing basic blocks, and the control flow graph, need to be restructured to accurately represent the new information. The changes introduced by this behavior will require that what was previously thought of as a single basic block be split into (at least) two new basic blocks; these changes are represented by the GT/-- category of control flow graph improvements (see below).

When information about a new basic block B is introduced there are ten (10) different possibilities as to how the control flow graph should be changed. Nine (9) of these are variations where the new basic block overlaps an existing basic block A, and one (1) is the case where the new basic block occupies a previously empty address space.

In the cases where there is overlap, the start address of B can relate to the start address of A in three (3) different ways; it can be (a) less than (LT), (b) equal to (EQ) or (c) greater than (GT). The last instruction of B can relate to the last instruction of A in the same three (3) ways. The combinations of these possibilities constitute the nine different cases of overlap, and each case has been named with regard to the type of overlap that exists for the start address as well as for the last instruction. As an example, the case where the start address of the new basic block B is less than that of the existing basic block A, while the last instruction address of B is equal to that of A, is referred to as the LT/EQ case, illustrated in Figure 3.3.

Furthermore, when the new basic block B overlaps multiple existing basic blocks, any one of these basic blocks could be chosen as A. The first overlapping basic block A that is encountered is chosen and alterations within the address space of A are performed. The part(s) of B that exceed the address space of A are then passed into the algorithm again in a recursive manner until the entire address space required by B has been processed and the control flow graph has been altered accordingly.

Because of the approach described above, the relationship between the start addresses of A and B is more significant when introducing new vertices in the graph, and the cases have therefore been annotated as follows: the annotation LT/-- is used to indicate one of the three cases where the start address of B is less than the start address of A; the annotation EQ/-- is used to indicate one of the three cases where the start addresses of A and B are equal; and the annotation GT/-- is used to indicate one of the three cases where the start address of B is greater than the start address of A. Finally, the case where the new basic block occupies a previously empty address space has been named NA/NA, since there is no way to define the boundaries exclusively and they are therefore not applicable.
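Encoded directly, the classification described above amounts to two three-way comparisons. The sketch below, with hypothetical names, illustrates the idea:

#include <cstdint>
#include <string>

enum Rel { LT, EQ, GT };

struct Block { uint64_t start, lastInsn; };

static Rel rel(uint64_t a, uint64_t b) {
    return a < b ? LT : (a == b ? EQ : GT);
}

// Classify how a new block B overlaps an existing block A, e.g. "LT/EQ"
// when B starts before A but their last instructions coincide.
std::string classifyOverlap(const Block& B, const Block& A) {
    static const char* name[] = { "LT", "EQ", "GT" };
    return std::string(name[rel(B.start, A.start)]) + "/" +
           name[rel(B.lastInsn, A.lastInsn)];
}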

Some programs use unconventional ways of loading additional code, in the form of dynamic libraries, shell code or similar. When this happens the static control flow graph doesn't contain any information about the code that was introduced to the system during runtime. Under these circumstances several new basic blocks need to be created, and the changes could fall into any of the LT/--, EQ/--, GT/-- or NA/NA categories.

Even though self-modifying code is uncommon, and outside the scope of this thesis, it is worth noting that if modifications to the code of the monitored program occur, the control flow graph will need to be restructured. The possible changes to the control flow graph introduced by self-modifying code should be covered in full by the LT/--, EQ/-- and GT/-- categories.

3.3.1 Splitting basic blocks

Splitting a basic block A that covers an address space from 0 to n at a position p should result in two new basic blocks: one which covers the address space A_0 ... A_(p-1) and one which covers the address space A_p ... A_n. Since a split will occur when an indirect jump is resolved to a concrete address, the way of splitting at a position p that is described here results in a new basic block B1, which starts at the same address as the original basic block, and a new basic block B2, which starts at the target position of the indirect jump.

The process required for splitting a basic block is fairly straightforward and is described in Listing 3.3. After creating two new basic blocks to replace the original one, all of the predecessors that previously belonged to A need to be moved to B1 and all of the successors that previously belonged to A need to be moved to B2. If A contained a loop back to itself, however, a new edge from B2 to B1 needs to be added in order to represent the same relationship. Finally, an edge from B1 to B2 needs to be added in order to represent the fall-through between the two new blocks.
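Listing 3.3 itself is not reproduced here; as an illustration of the procedure just described, a simplified sketch might look as follows. The Block structure is a hypothetical one, and the handling of neighboring blocks' edge sets is omitted for brevity.

#include <cstdint>
#include <set>

struct Block {
    uint64_t start, end;            // address range covered by the block
    std::set<Block*> preds, succs;  // edges in the control flow graph
};

// Split A at address p into B1 = [A.start, p) and B2 = [p, A.end).
// In a full implementation, the edge sets of A's neighbors would
// also have to be retargeted from A to B1/B2.
void split(Block* A, uint64_t p, Block* B1, Block* B2) {
    B1->start = A->start;  B1->end = p;
    B2->start = p;         B2->end = A->end;

    // Predecessors of A now lead into B1; successors of A leave from B2.
    B1->preds = A->preds;
    B2->succs = A->succs;

    // A self-loop on A becomes an edge from B2 back to B1.
    if (A->succs.count(A) != 0) {
        B2->succs.erase(A);
        B2->succs.insert(B1);
        B1->preds.erase(A);
        B1->preds.insert(B2);
    }

    // The fall-through from B1 to B2.
    B1->succs.insert(B2);
    B2->preds.insert(B1);
}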
