
Signal Dependency Analysis and Status Propagation Tracking


Signal Dependency Analysis and Status Propagation Tracking

PENG SU

KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science

Degree Project in Electrical Engineering, Second Level, 30 Credits
Stockholm, Sweden 2020


Signal Dependency Analysis and

Status Propagation Tracking

Peng Su

2020-04-29

Master’s Thesis

Examiner

Zhonghai Lu

Academic adviser

Yuan Yao

Industrial adviser

Hasan Derhamy

KTH Royal Institute of Technology

School of Electrical Engineering and Computer Science (EECS) Department of EECS

SE-100 44 Stockholm, Sweden


Abstract

In software engineering, analyzing the dependencies of software modules and signals is a common method of verifying and testing software behavior. Through dependency analysis, users can improve the quality and operating efficiency of the code. Analyzing dependencies can also reflect the working status of software modules. Software signal dependence is thus very important for software verification. However, how to perform the dependency analysis is an open question. Code review is a text-based analysis method: when faced with many dependencies, its readability is significantly reduced, and it is difficult for a code reviewer to track all dependencies on a single signal. By contrast, dependency visualization is a relatively intuitive analysis method, which can express the dependencies of signals visually.

This thesis deals with the analysis and visualization of signal dependencies in the Engine Management System (EMS), which is an essential and complex software module in vehicles. There are usually hundreds of function modules in the EMS. Understanding their dependencies can help engineers diagnose and test the system accordingly. This topic has the following difficulties: (1) how to summarize the dependence of all elements from the source code; (2) how to express dependence; (3) how to visualize dependencies; (4) what tools are needed to achieve visualization of dependencies.

To solve the above problems, we need to establish a corresponding toolchain. First, we use static analysis to extract dependencies from the source code. Static analysis here refers to using scripts to automatically analyze dependencies in the source code. The script includes setting up a parser to collect data. The purpose of the parser is to parse the pre-processed code and generate a corresponding intermediate file, which needs to indicate signal dependencies and other basic information. Then, we evaluate the analysis results and choose an appropriate visualization tool to represent the signal dependency. The results show that signal dependencies can be tracked and that the visualization can be implemented using our designed toolchain. The results are intuitive and concise, and the approach has strong application prospects.

Keywords

Dependency tracking, dependency visualization, code analysis.


Sammanfattning

In software engineering, analyzing the dependencies of software modules and signals is a common method for verifying and testing software behavior. Through dependency analysis, users can improve the quality and operating efficiency of the code. Analyzing dependencies can also reflect the working status of software modules. Software signal dependence is thus very important for software verification. How to perform the dependency analysis is, however, an open question. Code review is a text-based analysis method.

With many dependencies, readability decreases considerably. It is also difficult for a code reviewer to track all dependencies on a single signal. By contrast, dependency visualization is a relatively intuitive analysis method, which can express signal dependencies visually.

This thesis deals with the analysis and visualization of signal dependencies in the Engine Management System, which is an essential and complex software module in vehicles. There are usually hundreds of function modules in the EMS. Understanding their dependencies can help engineers diagnose and test the system accordingly. This topic has the following difficulties: (1) how to summarize the dependencies of all elements from the source code; (2) how to express dependence; (3) how to visualize dependencies; (4) which tools are needed to achieve visualization of dependencies.

To solve the above problems, we must create a corresponding toolchain. First, we use static analysis to extract dependencies from the source code. Static analysis here refers to using scripts to automatically analyze dependencies in the source code, which includes setting up a parser to collect data from the source code. The purpose of the parser is to parse the pre-processed code and generate corresponding data, which must indicate signal dependencies and other basic information. We then evaluate the analysis results and choose a suitable visualization tool to represent the signal dependency. The results we obtained show that the signal dependency can be tracked and that the visualization can be implemented with our designed toolchain.

The results are intuitive and concise, and the approach has strong application prospects.

Nyckelord

Dependency tracking, dependency visualization, code analysis.


Acknowledgments

I would like to give special thanks to Hasan Derhamy for teaching me and for the many forms of help he gave me during my thesis work at Scania. I thank him for his support and encouragement whenever I was in trouble. Without his patient teaching and help, I might not have been able to complete this challenging topic.

Stockholm, September 12, 2019
Peng Su


Table of contents

Abstract ... iii

Keywords ... iii

Sammanfattning ... iv

Nyckelord ... iv

Acknowledgments ... v

Table of contents ... vi

List of Tables ... viii

List of Figures ... ix

List of Acronyms and Abbreviations ... x

1 Introduction ... 1

1.1 Background ... 1

1.2 Problem Statement ... 1

1.3 Purpose ... 1

1.4 Goals ... 1

1.5 Research Methodology ... 2

1.6 Delimitations ... 2

1.6.1 Code analysis ... 2

1.6.2 Dependency visualization ... 3

1.7 Structure of the Thesis ... 3

1.8 Ethics and Sustainability ... 3

2 Background ... 4

2.1 Engine Management System ... 4

2.2 Introduction of Graph Theory ... 5

2.2.1 Overview ... 5

2.2.2 Semantic of graphs ... 6

2.3 Introduction of Design Tool ... 8

2.3.1 Introduction of Pycparser ... 9

2.3.2 Introduction of Argparse... 9

2.3.3 Introduction of Neo4j ... 9

2.4 Previous Work ... 12

2.4.1 Static Analysis, Code Parsing and Abstract Syntax Tree ... 12

2.4.2 Dependency Analysis ... 13

2.4.3 Visualization ... 14

2.5 Summary ... 17

3 Methodology ... 18

3.1 Research Process ... 18

3.2 Defining Software Architecture ... 19

3.2.1 Variables analysis ... 19

3.2.2 Function analysis ... 19

3.2.3 Module analysis ... 21

3.2.4 Constants analysis ... 21

3.2.5 Dependency analysis ... 21

3.3 Build Meta Graph ... 22


3.3.1 Basic rules for building meta graph ... 22

3.3.2 Build meta graph ... 24

3.4 Creating Architectural Description ... 29

3.4.1 Basic description of static analysis ... 29

3.4.2 Description of JSON format specification ... 29

3.5 Analysis of Neo4j ... 31

3.5.1 Neo4j readable file format ... 31

3.5.2 Neo4j method of creating relationships ... 32

3.5.3 Cypher query for tracking variables ... 34

3.5.4 Advantages of using Neo4j ... 34

3.6 Summary ... 35

4 Implementation ... 36

4.1 Toolchain Overview ... 36

4.2 Stage I: Code Parsing, AST Processing and Extract information from Source Code ... 38

4.3 Stage II: Extract Dependencies and Visualization ... 41

4.4 Summary ... 46

5 Results and Analysis ... 47

5.1 Result of Static Analysis ... 47

5.2 Result of Signal Dependencies Visualization ... 47

5.2.1 Major results ... 47

5.2.2 Write& Reference dependencies ... 49

5.2.3 Out argument Dependencies ... 49

5.2.4 Function call dependencies ... 51

5.2.5 Return dependencies ... 51

5.3 Reliability Analysis... 51

5.4 Summary ... 52

6 Conclusions and Future work ... 53

6.1 Conclusions ... 53

6.2 Limitations ... 53

6.3 Future work ... 54

References ... 55


List of Tables

2.1 Differences between graphs. . . .8

2.2 A brief query used in Neo4j. . . .10

2.3 An example query to create relationships in Neo4j. . . 11

2.4 An example to add relationships in Neo4j . . . . . . .11

3.1 Conditional statement pseudo code . . . .20

3.2 Function call dependencies pseudo code. . . .21

3.3 An example to show dependencies. . . .23

3.4 Multiple function call dependencies pseudo code. . . 24

3.5 Pseudo code for meta graph. . . 25

3.6 A template for generating file . . . 29

3.7 An example to load data into Neo4j. . . .32

3.8 A template to load data into Neo4j. . . .33

3.9 A template to connect nodes. . . 34

3.10 A query to connect data in Neo4j. . . .37

4.1 Pseudo code for multiple dependencies. . . .39

4.2 Nested JSON file example. . . .40

4.3 An example to show redundancy. . . .42

4.4 An example output of static analysis. . . .43

4.5 A template of relationships file. . . .43

4.6 Using CQL to load JSON file. . . .43

4.7 Using CQL to connect data. . . 44

4.8 Format of header file. . . 44

4.9 Format of body file. . . 44

4.10 An example for using import tools. . . 45

4.11 Matching query for tracking dependencies. . . .46


List of Figures

1.1 Overview of the methodology. . . .2

2.1 The software architecture of EMS. . . .5

2.2 A simple example of node graph. . . .6

2.3 A simple example of study relationships. . . . . . .6

2.4 An example of property graph. . . 7

2.5 An example of hypergraph. . . .7

2.6 An example of RDF triple stores . . . 8

2.7 The architecture of Neo4j. . . .10

2.8 A graph generated by Neo4j. . . 11

2.9 Main ways to create data in Neo4j . . . .12

2.10 A simple control flow graph. . . 15

2.11 Dependencies graph generated by the yEd tool . . . .16

2.12 Dependencies across the modules. . . 17

2.13 Overall dependencies generated by the yEd tool . . . .17

3.1 A wrong demonstration to express dependencies. . . 23

3.2 Split function to show dependencies. . . 24

3.3 Part of meta graph (1/3). . . .26

3.4 Part of meta graph (2/3). . . .27

3.5 Part of meta graph (3/3). . . .27

3.6 Overview of meta graph. . . .28

3.7 An example for redundancy connection. . . 35

4.1 Overview of toolchain. . . .37

4.2 Flow graph of static analysis. . . 38

4.3 Nested structure hierarchy. . . 40

4.4 Files loaded into Neo4j . . . 45

5.1 Result of visualizing signal dependence. . . .48

5.2 Result of writing dependencies. . . 49

5.3 Result of out argument dependencies. . . .50

5.4 Result of function call dependencies. . . 51


List of Acronyms and Abbreviations

JSON JavaScript Object Notation

MISRA Motor Industry Software Reliability Association

RTDB Real-time Database

EMS Engine Management System

ECU Electronic Control Unit

COMP Common Platform

CQL Cypher Query Language

AST Abstract Syntax Tree

XML Extensible Markup Language

LPG Labeled Property Graph

RDF Resource Description Framework

UML Unified Modeling Language


1 Introduction

In this chapter, we will generally introduce the research problem, purpose, and goals of the thesis. Section 1.1 gives a brief introduction to the background of the thesis. Section 1.2 summarizes the main problems of the thesis. Section 1.3 defines the purpose of the thesis. Section 1.4 gives a blueprint for the goals. Sections 1.5 and 1.6 briefly introduce the methodology and delimitations. Section 1.7 introduces the structure of this report. Section 1.8 introduces ethics and sustainability.

1.1 Background

Dependency analysis is mainly used for design verification and diagnosis. The most common method for analysing dependencies is code review, of which there are two common forms. One approach is to display changes or function calls in the source code through specific software such as Perforce, Git, or Eclipse. The other is to read the source code and analyse the dependencies manually. Both methods have certain limitations. First, they only partially analyse the dependencies in a specified source code module. Second, they cannot robustly analyse all the dependencies in a software project.

Moreover, this method cannot show the dependency results intuitively. Based on the above background, we need to explore a more feasible and robust way or tool to analyse dependencies. In this thesis, we select the Engine Management System (EMS) as the analysis object. We can verify and diagnose the system by analysing the dependence of signals in EMS.

Engine management systems involve complex software architectures with highly configurable modules.

Module dependencies are created through signals that are used to communicate between functions. These signals have a few properties, including a status indicating the quality of the signal. A degraded signal has an impact on the quality of downstream signals. Therefore, understanding the signal dependencies is critical to understanding the overall software behaviour. Furthermore, dependency analysis can be used to verify and diagnose software performance.

1.2 Problem Statement

Before starting the thesis, we need to understand the problems we may encounter. By decomposing the thesis's tasks, we find that the thesis needs to complete several important milestones: extracting and analyzing the source code, analyzing the function dependencies, and visualizing the dependencies. Our main problems are concentrated in these milestones; the specific issues can be summarized as follows.

1) How to analyze the signal dependency from the source code?

2) How to define and visualize the dependence of the signal?

3) What kind of tools can be used to visualize signal dependency?

1.3 Purpose

The purpose is to help engineers understand and diagnose the EMS by tracking signal dependencies.

1.4 Goals

The goal of this project is to visualize signal dependency. This has been divided into the following sub-goals:

1) Define the types of signal dependency
2) Extract a summary from the source code
3) Extract the dependencies
4) Build a meta graph to show the signal dependency
5) Build a concrete graph to show the signal dependency
6) Find a suitable tool to generate a graph
7) Generate a corresponding file according to the requirements of the graphical tool
8) Verification and validation
9) Optimization

1.5 Research Methodology

In the thesis, we first need to qualitatively analyze the dependence of all elements in the source code structure. Then we unify descriptions of the analysis result. When a function is affected by a conditional statement, we focus on representing the relationship between these functions and the elements in the statement. After determining the descriptions of all dependencies, we need to build a meta graph to verify the above analysis. The meta graph is also a blueprint for the subsequent visualization. The next step after creating the meta graph is to verify the previous result.

After the above process is confirmed, we need to specify the format of the extracted dependencies. The retrieved results need to represent the dependencies across all of the source code, so the format needs to be able to express every kind of dependency. After that, we need to reprocess the extraction results to obtain a properly streamlined result.
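As a purely illustrative sketch of such a format (the field names below are our own invention, not the schema actually specified later in the thesis), one extracted dependency record could be serialized as JSON like this:

```python
import json

# Hypothetical intermediate record for one extracted dependency.
# Field names and signal names are illustrative only.
record = {
    "module": "FuelManager",                  # module (folder) the function lives in
    "function": "calcFuelRate",               # function that creates the dependency
    "writes": ["fuelRate"],                   # signals (structs) written by the function
    "reads": ["engineSpeed", "throttlePos"],  # signals read by the function
    "kind": "write-reference",                # dependency type
}

text = json.dumps(record, indent=2)
print(text)

# Round trip: the visualization stage can reload the same record.
assert json.loads(text)["function"] == "calcFuelRate"
```

The point of such a record is that it is both machine-readable for the visualization stage and compact enough to streamline before loading.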

Then we need to consider the tools to achieve visualization, further streamline and compress the above data according to the chosen visualization tool, and input the final data into the tool to get the result. After obtaining visible results, we analyse them and summarize the advantages and disadvantages. If possible, we will try to optimize the results. An overview of the methodology applied in the project is shown in Figure 1.1.

Figure 1.1 Overview of the methodology

1.6 Delimitations

In this section, we mainly introduce some limitations to this thesis. We mostly divide the delimitations into two parts: code analysis and visualization.

1.6.1 Code analysis

In the source code, the variables which are read and written by the function are defined in structs, which can be further refined into data items, such as signal status and signal values. When tracking data items, the amount of data will become extraordinarily large, which affects the readability and operability of the visualization. So, we only track the variables (structs) in this project.

Since arguments facilitate reading and writing variables, we must consider arguments in the process of tracking dependencies. However, because argument names defined in different functions may be duplicated, we must rename each argument to eliminate redundancy and errors. This significantly slows down dependency tracking and weakens the readability of the dependency graph.
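One simple renaming scheme (our illustration, not necessarily the exact rule used in the toolchain) is to qualify every argument with the name of its enclosing function, so that duplicated argument names become unique:

```python
def qualify_arguments(functions):
    """Prefix each argument with its function name so that duplicated
    argument names from different functions become unique."""
    renamed = {}
    for func, args in functions.items():
        for arg in args:
            renamed[(func, arg)] = f"{func}::{arg}"
    return renamed

# Hypothetical example: two functions both declare an argument "status".
funcs = {"readSignal": ["status"], "writeSignal": ["status", "value"]}
table = qualify_arguments(funcs)
print(table[("readSignal", "status")])   # readSignal::status
print(table[("writeSignal", "status")])  # writeSignal::status
```

Qualified names like these make every node in the dependency graph unambiguous, at the cost of longer labels and a larger data volume.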


1.6.2 Dependency visualization

The visualization process still takes a long time to retrieve signals. Although we have tried to optimize the retrieval method and data format, we have not yet found the best way to reduce this time.

In addition to the above delimitations, we deliberately divide a function into sub-functions and parent functions to track the accuracy of signal visualization. Although this can indicate the dependency of the function, it invisibly increases the amount of data, slows down data retrieval, and weakens readability. We have not yet found a better way to improve this performance.

1.7 Structure of the Thesis

Chapter 2 presents relevant background about the Engine Management System and its software architecture. The tools and related work are also introduced in this chapter, along with some features and the organization of the source code. We analyze the characteristics, usage, and construction features of the source code. After giving the basics of the source code architecture, we introduce the basics of the tools used in the thesis, presenting their basic principles and usage. Since we also need to implement visualization, we describe some basic knowledge of graph theory.

Chapter 3 presents the methodology. In this chapter, we detail the methods used in the thesis. We not only explore ways of analyzing the code and summarizing the results but also introduce the process of building the dependency descriptions. After that, we add the basic methods for extracting, streamlining, and compressing data. Then, we discuss how to implement the visualization, including building meta graphs and analyzing visualization tools.

Chapter 4 reports the details of the implementation. In this chapter, we discuss the problems encountered in the implementation process. Neo4j, the visualization tool used in the thesis, is explained in detail in this chapter.

Chapter 5 will analyze the result of the thesis. We will measure whether the primary analysis results meet our expectations and plans; we investigate whether the methods we use meet the requirements of the thesis. At the same time, we measure the reliability of the final results and the methods used.

Chapter 6 will discuss the advantages, disadvantages, limitations, and achievements.

In this chapter, we mainly analyze the shortcomings of the results, as well as possible ways to improve and provide a reference for future researchers.

1.8 Ethics and Sustainability

This thesis does not violate any ethical rules, and the development of the project is beneficial to the testing of vehicle systems. We verify the software architecture of the vehicle system through a computer to improve the reliability of the car.


2 Background

This chapter provides the necessary background information about the software architecture we analyze for dependencies. This knowledge will make it easier for readers to understand how the software architecture is organized in the EMS. Additionally, this chapter describes Neo4j, which is used to visualize signal dependencies. Furthermore, we discuss previous work on dependency analysis. Section 2.1 gives an overview of the Engine Management System and its software architecture. Section 2.2 introduces graph theory, which is essential knowledge for building dependency graphs. Section 2.3 introduces the tools used to extract information from source code. Section 2.4 introduces related work on signal dependency analysis and visualization.

2.1 Engine Management System

The Engine Management System (EMS) is responsible for controlling the amount of fuel being injected and for adjusting the ignition timing [1]. It controls parameters such as temperature, moisture, pressure, fuel/air ratio, system adaptations, equipment behavior, and smoke gas components. The EMS aims to provide an operational system that functions by monitoring NOx, SO2, CO2, and other emissions. The EMS automatically corrects the emissions to readjust the (industrial) process so that it continues to operate within specifications. The levels of harmful emissions released into the environment can thus be monitored and controlled [2].

Most of the software system for a vehicle is written in MISRA C, which is a set of software development guidelines for the C programming language developed by the Motor Industry Software Reliability Association [3]. It defines a safer subset of C language. MISRA C reduces unpredictable possibilities and implementation-dependent behavior. System specifications determine the functions of the subsystem.

C programming rules for the vehicle are organized under different topics. Most of the rules are identical to rules in MISRA-C:2004 [3], which avoids most of the dangers inherent in the C standard [1]. This standardized programming approach is a prerequisite for robust dependency analysis.

These software systems use top-down design [4]. The software is organized into layers: high- and low-level application layers and platform layers, which represent the topmost software packages. The high-level application layer (HLLP) is responsible for monitoring aftertreatment, combustion, engine temperature, the fuel system, and gas exchange. These functions are all specific modules of in-vehicle systems. The low-level application layer (LLAP) is responsible for converting incoming signals into engineering quantities; it also controls communication with other ECUs. The platform layer contains a standard package of common functionality used by many ECUs. These packages consist of an underlying software architecture, which diagnoses behavior in the event of fault symptoms and the subsequently triggered actions [5][6]. The general diagnosis architecture for the platform layer is partitioned into a low and a high application layer.

The variables exchanged between the platform and application layers are defined as structure type variables in the source code. The purpose of these structure type variables is both to keep a diagnostic test's states in memory between executions and to report results [7]. The quality of upstream signals impacts the variables that come from these reports.
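The propagation idea behind this, that a degraded upstream signal degrades the quality of everything downstream, can be sketched in a few lines (the Signal class and status values below are our own illustration, not the actual types used in the EMS):

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    """Toy model of a signal carrying a value-quality status."""
    name: str
    status: str = "OK"              # e.g. "OK" or "DEGRADED"
    upstream: list = field(default_factory=list)

    def effective_status(self):
        """A signal is degraded if it, or any upstream signal, is degraded."""
        if self.status != "OK":
            return "DEGRADED"
        for s in self.upstream:
            if s.effective_status() != "OK":
                return "DEGRADED"
        return "OK"

# Hypothetical chain: a raw sensor feeds a filter, which feeds a controller.
raw = Signal("rawTemp", status="DEGRADED")
filtered = Signal("filteredTemp", upstream=[raw])
control = Signal("coolantControl", upstream=[filtered])
print(control.effective_status())  # DEGRADED: the fault propagated downstream
```

Tracking exactly these upstream/downstream links is what the dependency graph built later in the thesis makes visible.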

Managers are subsets of layers. The concept of managers aims to logically group modules that share a common functionality. This is a consequence of the module definition, which leads to an extended set of modules that each have a particular task. The folder structure of the source code represents layers and managers.

To read and write data more securely, the functions contained in the managers and layers transfer their signals via the real-time database (RTDB), which is a container for variables and arguments.

RTDB provides mechanisms to create types at the manager level, which can be of type enum, union, or struct. No particular array type is provided, but arrays can be created as elements contained in a struct [3][6]. Read, write, force (start override), and release (end override) functionality is provided for each data type in the form of function calls. RTDB provides pointers to access the data (Figure 2.1). Thus, the external interface contains pointers which can only be used for reading (pointers to constant data). In contrast, the internal interface contains both pointers for reading and pointers for writing the data.

RTDB is very flexible with regards to the data that can be stored in it, as it supports several COMP data types as well as user-defined types.

Figure 2.1 Software architecture of EMS
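The read/write/force/release interface described above can be modeled very roughly as follows; this is a sketch of the concept only, not Scania's actual RTDB API:

```python
class MiniRTDB:
    """Toy model of an RTDB-style variable container:
    read/write plus force (start override) and release (end override)."""

    def __init__(self):
        self._values = {}
        self._forced = {}

    def write(self, name, value):
        self._values[name] = value

    def read(self, name):
        # A forced (overridden) value shadows the normally written one.
        if name in self._forced:
            return self._forced[name]
        return self._values[name]

    def force(self, name, value):   # start override
        self._forced[name] = value

    def release(self, name):        # end override
        self._forced.pop(name, None)

db = MiniRTDB()
db.write("engineSpeed", 1800)
db.force("engineSpeed", 0)      # override, e.g. for a diagnostic test
print(db.read("engineSpeed"))   # 0
db.release("engineSpeed")
print(db.read("engineSpeed"))   # 1800
```

Because every read and write goes through calls like these, the static analysis can recognize signal accesses purely from the function calls in the source code.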

2.2 Introduction of Graph Theory

In this section, we cover the basic knowledge of graph theory needed to build meta graphs. This section also introduces how to create a graph that contains the required amount of information; at the same time, we compare the advantages and disadvantages of different graph semantics in order to select the semantics we need most.

2.2.1 Overview

Graph theory is the study of graphs, which are mathematical structures used to model pairwise relations between objects [8]. A graph consists of vertices and edges. Vertices are also called nodes or points.

A node is an object that contains any number of properties. Edges can be defined as the relationships among nodes, and there can be one or more connections between two nodes. When nodes are connected by directed edges, the graph is called a directed graph [9]. If the edges are undirected, the graph is called an undirected graph. The most straightforward relation graph is shown below:

Figure 2.2 A simple example of graph
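As a minimal illustration of these definitions, a directed graph can be represented as an adjacency mapping from each node to the nodes its outgoing edges point to:

```python
# A small directed graph stored as an adjacency mapping.
graph = {
    "A": ["B", "C"],   # edges A -> B and A -> C
    "B": ["C"],        # edge B -> C
    "C": [],           # C has no outgoing edges
}

def successors(node):
    """Nodes reachable from `node` along one outgoing edge."""
    return graph.get(node, [])

def has_edge(u, v):
    """True if a directed edge u -> v exists."""
    return v in graph.get(u, [])

print(successors("A"))     # ['B', 'C']
print(has_edge("C", "A"))  # False: edges are directed
```

An undirected graph would simply store each edge in both directions.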


A graph database [10] views the data as an arbitrary set of objects connected by one or more kinds of relationships. To make the graph database clearer and more readable, labels are introduced to group nodes. By grouping the nodes, it is easy to make the graph more organized and logical, and the graph can be simplified by classifying the nodes. Node labels are an excellent way to reinforce the information the graph can convey. An example of a labeled node can be found in Figure 2.3.

Figure 2.3 A simple example of study relationships

The orange node has a label called “Student.” It contains properties including the student’s name, age, and ID. The grey node represents another kind of node, “School.” It also contains properties, including the departments in the school and the ID of the school. By using different labels, the graph can avoid chaos and increase readability. With the properties of nodes and edges, readers can easily find the information and relationships they need.

2.2.2 Semantic of graphs

Before we try to organize graphs, it is necessary to understand graph technologies and graph semantics. Typically, the most common graph semantics include property graphs [7], hypergraphs [11], and RDF triples [12].

An example of a property graph is shown below. The main characteristics of property graphs are as follows [9]:

1. Property graphs contain nodes and relationships.

2. Nodes contain properties.

3. Relationships must have names, features, and directions.

The property graph is the simplest model in graph theory, but it is also an essential one. The other models are based on the property graph and still retain its characteristics [10].

Figure 2.4 An example of property graph


An example of a hypergraph is shown below. A hypergraph is defined as a graph in which a relationship (also called a hyperedge) connects any number of given nodes [8]. The difference between property graphs and hypergraphs is that property graphs only permit one start node to have a relationship with one end node, whereas hypergraphs allow multiple connections between a given node and other nodes. In Figure 2.5, the driver has three relationships with vehicles.

The advantage of hypergraphs is that they produce accurate, information-rich data models and can show multiple relationships between data. The disadvantage is that it is easy to miss details when modeling with hypergraphs. For example, the relationships in Figure 2.5 cannot show which driver is the primary driver of each car.

Figure 2.5 An example of hypergraph

RDF triple stores are modeled around the Resource Description Framework (RDF) [12]. This kind of semantics comes from the World Wide Web Consortium (W3C). An example of RDF is shown in Figure 2.6.

Data processed by triple stores tend to be logically linked. The difference between RDF and property graphs is that RDF is more about data exchange, while property graphs are purely about storing and querying data. Although RDF possesses advantages of both property graphs and hypergraphs, it still has some disadvantages that cannot be ignored. RDF triple stores are not native graph databases because they do not support index-free adjacency, nor are their storage engines optimized for storing property graphs.


Compared with property graphs, RDF does not have any internal structure in the nodes or edges (Figure 2.6).

Figure 2.6 An example of RDF triple stores
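The triple model is easy to illustrate: every fact is a (subject, predicate, object) triple, and queries are pattern matches over the set of triples. A minimal sketch, with invented signal names:

```python
# Facts stored as (subject, predicate, object) triples.
triples = {
    ("calcFuelRate", "reads", "engineSpeed"),
    ("calcFuelRate", "writes", "fuelRate"),
    ("injControl", "reads", "fuelRate"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which functions read fuelRate?
print(match(p="reads", o="fuelRate"))  # [('injControl', 'reads', 'fuelRate')]
```

Note how the triples themselves carry no internal structure: any extra detail about a node or edge must become yet another triple, which is the limitation discussed above.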

Labeled Property Graphs (LPG) were developed by a group of Swedish engineers who were building an Engine Control Management System (ECM) in which they decided to model and store data as a graph.

The motivation behind LPG is not exchanging or publishing data; its developers were more interested in efficient storage that would allow fast querying and fast traversals across connected data. LPG represents data in a way that is close to the logical model drawn on a whiteboard when engineers build prototypes or meta graphs. Compared with traditional property graphs, LPG can connect multiple relationships among nodes. In other words, it has all the advantages of the graphs mentioned above. A disadvantage is that, since LPG contains more details than its fellows, it has more fundamental properties in the nodes, and these properties are not natural for users to handle or to write queries against.

A brief table comparing the differences between these graphs is shown below.

                Nodes   Relationships   Multiple directions   Internal structure   Revive information
Property Graph    √           √                  ×                     √                    √
Hypergraph        √           √                  √                     √                    ×
RDF               √           √                  √                     ×                    √
LPG               √           √                  √                     √                    √

Table 2.1 Differences between graphs

2.3 Introduction of Design Tool

In this section, we introduce some of the toolkits needed to build the toolchain. Pycparser is mainly used for static analysis of the source code, Argparse is primarily used for human-computer interaction, and Neo4j is used for visualization of dependencies.


2.3.1 Introduction of Pycparser

2.3.1.1 Overview

Pycparser is a parser for the C language, written in Python. It is designed to be easily integrated into applications that need to parse C source code. It supports Python 2.7–3.6 and works on both Linux and Windows. Pycparser is listed in the Python Package Index (PyPI), so users can install it with the “pip” command. Pycparser is designed to support the C99 language and can act as a static code analyzer.

2.3.1.2 Usage of Pycparser

Pycparser is used to parse the declarations of C functions and types [13]. Its usage is very similar to Lex and Yacc: it takes a specification of syntax as input and produces as output a procedure for recognizing the C language, so it can be treated as a compiler-compiler. Pycparser defines the C language’s tokens and macros and automatically assigns numbers to them. Pycparser then outputs a file that contains each function’s information, definition, and the other elements.
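Pycparser exposes a NodeVisitor class modeled on the visitor in Python’s own ast module. As a self-contained illustration of that visitor pattern (using the stdlib ast module rather than pycparser itself, so the example needs no third-party install), the following collects the names each function reads:

```python
import ast

# Illustrative visitor in the same style as pycparser's NodeVisitor:
# subclass, define visit_<NodeType> methods, and call generic_visit
# to keep walking the tree.
class ReadCollector(ast.NodeVisitor):
    def __init__(self):
        self.reads = {}
        self._current = None

    def visit_FunctionDef(self, node):
        self._current = node.name
        self.reads[node.name] = []
        self.generic_visit(node)

    def visit_Name(self, node):
        # ast.Load marks a read; ast.Store would mark a write.
        if self._current and isinstance(node.ctx, ast.Load):
            self.reads[self._current].append(node.id)

src = """
def calc_rate(speed, throttle):
    return speed * throttle
"""
collector = ReadCollector()
collector.visit(ast.parse(src))
print(collector.reads)  # {'calc_rate': ['speed', 'throttle']}
```

With pycparser the structure is the same, except that the visitor methods carry C node names such as visit_FuncDef, and the input is preprocessed C code.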

2.3.2 Introduction of Argparse

2.3.2.1 Overview

The argparse module [14] makes it easy to write user-friendly command-line interfaces. The program defines what arguments it requires, and argparse figures out how to parse those out of sys.argv. The argparse module also automatically generates help and usage messages and issues errors when users give the program invalid arguments.

2.3.2.2 Usage of Argparse

Since the script we wrote contains several different functional modules, we use this module to make running and calling the script more convenient. At the same time, it makes interaction easier for users and avoids the errors and problems that can occur when using the script.
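A hedged sketch of such a front end is shown below; the subcommand and option names are illustrative inventions, not the thesis' actual interface.

```python
import argparse

# Hypothetical command-line front end for the analysis toolchain:
# one subcommand per functional module of the script.
parser = argparse.ArgumentParser(prog="sigdep")
sub = parser.add_subparsers(dest="command", required=True)

p_parse = sub.add_parser("parse", help="run static analysis on a source file")
p_parse.add_argument("source", help="path to the C source file")

p_vis = sub.add_parser("visualize", help="export dependencies for Neo4j")
p_vis.add_argument("--signal", help="restrict the graph to one signal")

# Simulate an invocation; normally parse_args() reads sys.argv.
args = parser.parse_args(["parse", "engine.c"])
print(args.command, args.source)
```

Invalid subcommands or missing arguments make argparse print a usage message and exit, which is exactly the error handling the paragraph above refers to.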

2.3.3 Introduction of Neo4j

2.3.3.1 Overview

The Neo4j platform is a graph database management system developed by Neo4j, Inc. It is a high-performance, NoSQL graphical database that stores structured data in a network (mathematically, a graph) rather than in tables [8]. It is an embedded, disk-based, fully transactional Java persistence engine. Neo4j can also be seen as a high-performance graph engine with all the features of a mature database. Programmers work with an object-oriented, flexible network structure rather than strict, static tables, while still enjoying all the benefits of a fully transactional, enterprise-class database.

It has the following advantages [10]:

 It is easy to represent connected data.

 Retrieving/traversing/navigating connected data is very easy and fast.

 It is very easy to represent semi-structured data.

 Neo4j CQL query language commands are in a user-friendly, readable format and are very easy to learn.

 It uses a simple and powerful data model.

 It does not require complex joins to retrieve connected/related data, because neighbor and relationship details can be retrieved without a join or index.

2.3.3.2 Cypher Query

Neo4j supports multiple APIs [10] to load data and create relationships. The logical view of the user-facing APIs is shown in Figure 2.7.

Figure 2.7 Architecture of Neo4j

Cypher is the most common language for Neo4j. It is the declarative query language used for data manipulation in Neo4j, similar in many ways to how a relational database relies on Structured Query Language (SQL) to perform data operations [12]. However, Cypher is not yet a standard graph database language that can interact with other graph database platforms. Also, the expressive and relatively simple nature of Cypher allows it to be used beyond the requirements and limitations of an organization's technology-centered groups. A sample query is shown below.

Table 2.2 A brief query used in Neo4j

create (a:Person{label:"Student",name:"Bob",age:"20"})
create (b:School{label:"School",name:"Computer Science",Department:"xxxx"})
create q= (a)-[:Study_in]->(b)
return q

If the users extend the query, for example by adding a professor who teaches at the school, the query can be written as below.

Table 2.3 An example query to create relationships in Neo4j

create (a:Person{label:"Student",name:"Bob",age:"20"})
create (b:School{label:"School",name:"Computer Science",Department:"xxxx"})
create (c:Person{label:"Professor",name:"Alice",age:"45"})
create q= (a)-[:Study_in]->(b)<-[:Work_in]-(c)
return q


And the graph will show in Figure 2.8:

Figure 2.8 The result from Neo4j

To further illustrate how Neo4j builds relationships, a more complicated relationship will be presented.

We assume that Alice teaches a lecture to Bob. If the users want to show this relationship, the straightforward query shown below can be added to Table 2.3.

Table 2.4 An example to add relationships in Neo4j

create r =(c)-[:Teach]->(a)
return r

However, a problem arises with this approach. If there is massive data in the database, it is impossible to assign the relationships node by node. It is crucial to find a suitable way to connect the nodes. For existing properties and nodes, redefining the query can achieve this aim.

Nevertheless, whichever query the users write, when they want to connect nodes of the same type (in the case above, Bob and Alice carry different labels but both belong to Person nodes), a new query still has to be written to create the relationships. So, keeping queries simple becomes a pressing problem.

A reasonable method to solve this problem is the following: when establishing relationships between nodes, distinguish the nodes by a property that is guaranteed to be unique in the database [11], not by labels or anything else. This reduces the cumbersome, repetitive work of writing queries. However, the data recording the relationships between nodes then needs to be loaded into Neo4j in advance, which means loading data becomes slightly different than before. A comparison of this method with the previous process can be found in Figure 2.9. The specific format of the data loaded into Neo4j and the loading steps will be discussed in Chapter 4.
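Following the unique-property idea, a hedged sketch of such a query (assuming the `name` property is unique and reusing the labels from the earlier example) might look like:

```cypher
// Match existing nodes by their unique property, then MERGE the
// relationship so that re-running the query creates no duplicates.
MATCH (p:Person {name: "Alice"})
MATCH (s:School {name: "Computer Science"})
MERGE (p)-[:Work_in]->(s)
RETURN p, s
```

MERGE, unlike CREATE, only adds the relationship if it does not already exist, which is what makes bulk-loading by unique property repeatable.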


Figure 2.9 Main ways to create data in Neo4j

2.4 Previous Work

2.4.1 Static Analysis, Code Parsing and Abstract Syntax Tree

Source code analysis methods are divided into two types: static analysis and dynamic analysis. Each approach has its advantages and disadvantages. Static analysis refers to scanning program code through lexical analysis, syntax analysis, control flow and data flow analysis, and other techniques without running the source code. It is a code analysis technique for verifying whether the system meets specifications for security, reliability, and maintainability [15].

Implementations of static analysis differ. Generally speaking, there are two types: one analyzes intermediate files produced after the source code is compiled, and the other analyzes the source files directly. Analyzing source files generally takes the project's source files (including header files, dependent libraries, and other components) as input. Static analysis tools then generate intermediate data (including abstract syntax trees, call graphs, control flow graphs, etc.) by parsing, compiling, and linking the code [16]. After this step is completed, complex inspection rules can be used to match and track the intermediate data, find and locate defects, and finally generate results [17].

Dynamic analysis is defined relative to static analysis techniques. Static analysis technology means that users employ disassembly tools to translate a binary executable file into assembly code and analyze that code [15]. Dynamic analysis implies that the user uses a debugger to track the operation of the software and examines the paths the source code takes at run time.

Dynamic analysis [18] generally observes the state of the program during operation, such as register contents, function execution results, and memory usage; it analyzes which functions call which other functions, clarifies the logic of the code, and digs for possible vulnerabilities. In other words, dynamic analysis observes the running efficiency of the source code and the state of the data in the system by tracing a large number of examples.

Since dynamic analysis requires running source code snippets one by one, it cannot be applied to a complex system. Performing dynamic analysis on a complex system would mean writing a script complete enough to cover all variables that run or compile in the system, which is unrealistic. Static analysis, by contrast, is independent of running the program; we only need to plug a specific preprocessor and syntax compiler into the source code backend [14]. Such a review method is more convenient and faster, and better suited to complex systems.

At the same time, static analysis can also enhance the scalability of code analysis. For example, we can first learn the hierarchy of software architecture through static analysis and then analyze code fragments one by one through dynamic analysis.

In principle, the range of data covered by static analysis is smaller than that of dynamic analysis [16], which makes the information obtained by static analysis less precise: it can only describe knowledge that exists in the source code, whereas dynamic analysis can also examine the memory call mechanism, code compilation efficiency, and so on. On the other hand, static analysis needs to combine the context of the source code for review, which requires the source code to be written in an extremely standardized way. Compared to dynamic analysis, it provides a more manageable collection of information that can be directly related to how the analysis is defined and the types of structures covered by the review.

The critical steps of extracting source code through static analysis can be divided into the following steps:

1) Lexical analysis is performed by a scanner [19]. It reads the source code and merges characters into individual tokens according to predetermined rules, removing whitespace, comments, etc. Finally, the entire code is split into a list of tokens (a one-dimensional array). The scanner reads the code letter by letter, which is why it is vividly called a scanner; when it encounters spaces, operators, or special symbols, it decides that a word has been completed.

2) Parsing is performed by a parser [19]. It transforms the token array into a tree representation, verifying the syntax at the same time and throwing a syntax error if there is one. When building the tree, the parser deletes some unnecessary tokens (such as incomplete parentheses), so the AST does not match the source code 100%, but it already lets users know how to deal with the source code. There are multiple ways to represent elements of the source code in the parser; such representations are called the syntax. The syntax generally specifies how to analyze non-terminal symbols in the source code and how to link these symbols. Besides, the syntax involves a critical aspect of representing code: how to display the analyzed data structurally. There are two commonly used hierarchical display methods, top-down and bottom-up.

3) Semantic analysis links the nodes of the syntax tree to entities. This is where the obtained data is prepared for further processing. The generated syntax tree has a variety of representations; the most widely used and universal format is an XML file.
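Step 1) above can be sketched as a minimal scanner; the token classes and regular expression below are our own illustration, not Pycparser's actual lexer.

```python
import re

# A toy scanner: splits a C-like statement into (kind, text) tokens,
# skipping whitespace, as described in step 1 above.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(==|[=+\-*/(){};]))")

def tokenize(code):
    tokens, pos = [], 0
    while pos < len(code):
        m = TOKEN_RE.match(code, pos)
        if not m:
            break  # unrecognized character: stop (a real lexer would error)
        number, name, op = m.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif name:
            tokens.append(("NAME", name))
        else:
            tokens.append(("OP", op))
        pos = m.end()
    return tokens

print(tokenize("y = x + 1;"))
```

The resulting flat token list is exactly the "one-dimensional array" that the parser in step 2) then turns into a tree.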

2.4.2 Dependency Analysis

There are several basic flow diagrams commonly used to analyze data through syntax trees: data flow diagrams and control flow diagrams. These graphs can be generated by specific visualization tools or by the same tool as the parser; for example, Pycparser can directly generate data flow graphs. In static analysis, data flow analysis refers to analyzing the data obtained through control flow analysis to collect information about the variables flowing through the program. Control flow determines the order in which each computation is performed. The C language statements that can change the regular execution order fall into three main categories: decision statements, loop statements, and jump statements [19].

The primary purpose of the above flow graphs is to show the data flow of a single function or a small amount of code. A typical flow diagram [18] is shown in Figure 2.10. Such a flow graph shows every detail of the code, but in fact we do not need to go that deep into the details. Although these details can show dependencies to a certain extent, such a flow diagram contains too much information that we do not need, and it loses readability. For example, we do not care how many times a variable in a loop is executed; we only care about which variables are affected by it. On the other hand, such a flow diagram cannot reasonably display high-level code abstraction.

If we use a top-down analysis method, we should focus on the architecture first, then on the dependencies inside the code.

Figure 2.10 A simple control flow graph

We need to customize the representation of dependencies according to existing references. The dependency graph needs to include the features of the above graphs, but we must remove their redundancy and other shortcomings. On the other hand, the dependency graph also needs to show the abstract, high-level software architecture.

2.4.3 Visualization

In previous work, the visualization software commonly used for static analysis was yEd or a similar XML visualization tool. The methods and standards for constructing graphs were not uniform. However, the purpose of using visualization software is clear: to generate, in an automatic and robust way, a multi-purpose, highly readable, and highly scalable dependency graph [15]. Such software can very quickly convert data flow graphs and control flow graphs into a top-down dependency graph.


Furthermore, adopting such software does not require us to design complex interactions [16]. But this type of software has many limitations.

 When generating dependency graphs for large and complex software systems, this type of software can only show the dependencies of part of the code. For complex systems, such dependencies do not adequately explain the results we need; in Figure 2.11 [20], the dependencies only show the relationships inside the functions.

Figure 2.11 Dependencies generated by the yEd tool

 yEd takes a lot of time when allocating graph modules, which significantly reduces usability [17].

 The results are satisfactory when generating high-level abstract dependency graphs. However, there are still many errors and redundancies when rendering code details [16]. The user must manually compile and link dependencies to obtain a satisfactory graph, which violates the original intention.

 The generated graph can only show that the signal is passed to a function (shown in Figure 2.11) and cannot show the dependencies of the signal [16][17][18]. We pay attention not only to which functions and modules the signal is passed to but also to which factors (such as variables and arguments) are affected by the signal.


Figure 2.12 Dependencies across the modules

 The generated graph requires many manual operations [21]. For example, when we need to convert a high-level software abstraction graph into a signal dependency graph, we must repeatedly generate new files.

 The readability of the graph needs to be improved [15][16][17][18][20] (shown in Figure 2.13).

Figure 2.13 Overall dependencies generated by the yEd tool

 It is not possible to quickly search for the dependencies of a specified signal [15][16][17][18][20].

 Versatility is not strong enough [18]. In the design of the visualization, if the user tries to generate different levels of dependencies (such as variable-function or module-function), a different semantics needs to be designed for each. This approach creates several problems: the design process is complex; readability is reduced and the signal flow is not coherent; and each semantics is generated at a different speed, greatly extending the running time.

All in all, the limitations of UML visualization software such as yEd are severe. The use of this type of software in previous work caused a series of problems that even outweighed the benefits of tracking dependencies. Besides, yEd has a complex set of APIs, which makes it still harder for users. On the other hand, the scalability of this type of software is not strong enough, and it is not easy to dig further potential out of software of this type.

2.5 Summary

The essential background of the thesis has been introduced in this chapter, along with the necessary knowledge and related work. In previous work, the readability of graphs and the detailed dependency relationships are the most pressing problems to be solved. On the other hand, previous work almost always uses similar tools and ideas to realize visualization. If this thesis follows the same process to visualize dependencies, the tools must be considered and measured carefully. Meanwhile, a solution will be proposed to fix the problems that arise when analyzing the dependencies of arguments and structs mentioned in previous work.


3 Methodology

The purpose of this chapter is to provide an overview of the research method. Section 3.1 describes the research process. Section 3.2 presents the classification of signal and function dependencies. After clarifying the concepts of all data and dependencies involved in signal tracking, a meta graph will be built to show the overall dependencies between signals and functions. Then a concrete graph based on pseudo-code is used to verify whether the meta graph is reasonable. In addition, the description used to record the dependency specification will be introduced in this chapter. Finally, Neo4j is compared with other visualization tools, and we present the reasons for choosing Neo4j as the visualization tool.

3.1 Research Process

Firstly, the research problems should be identified. In this thesis, they can be divided into the following parts: 1) how to analyze signal dependencies; 2) which kinds of tools can support this task; 3) how to visualize signal dependencies; and 4) which tools should be used for visualization.

For problem 1), we use static analysis to analyze dependencies. A typical static dependency method is to analyze the Abstract Syntax Tree (AST). The advantage is that when analyzing the AST, we do not need to consider the compilation language or programming style. Moreover, since the source code build has already been set up as a CMake [21] project, we do not need to think much about how to construct the preprocessor [19][21] or generate preprocessed code from source files. However, it should be noted that because the AST is produced by analysis tools, when the code becomes complicated the generated AST file also becomes complicated. If we use this file directly for dependency analysis and visualization, the readability of the result suffers significantly.

For question 2), since the real-time database and compilation tools used in this thesis are already pre-compiled, these tools can directly generate XML files based on preprocessed code. However, the data only shows the structure of the source code; the XML file does not define the dependencies accurately. So, we need an analysis tool such as Pycparser to refine the AST [13].

During static analysis, we need to redesign the structure, or description, of the analysis file. A common way is to generate a new XML file based on the AST file and redefine the hierarchy of dependencies in it. The hierarchy of dependencies needs to meet the following conditions: 1) avoid duplication and redundancy; 2) completely retain the dependencies.

Avoiding duplication means that, for a dependency, we try to express it with as little information as possible. For example, suppose the value of signal A is determined by signal B in function Fna. When defining the dependency, we declare that the value of A depends on B in the dependency relationship of function Fna; in the descriptions of variables A and B themselves, we make no redundant declarations of the dependency. Moreover, when signal A is read by function Fnb, we do not declare A's dependency in function Fnb again. In other words, a dependency relationship is only recorded at the place where the variable is changed.

Completely retaining dependencies means that when a variable is changed, whether due to arguments, global variables (RTDB variables), local variables (non-RTDB variables), function return values, or constants, we need to collect all of these factors.
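As an illustration of recording the dependency only where the variable is changed, the Fna example above could be stored as a record like the following; the element and attribute names here are our own sketch, not the final format defined in Chapter 4.

```xml
<!-- Dependency recorded only in Fna, where A is changed;
     Fnb, which merely reads A, carries no copy of this entry. -->
<function name="Fna">
  <write variable="A">
    <depends_on>B</depends_on>
  </write>
</function>
```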

For questions 3) and 4), the key issue is how to convert a text file into a more readable graph. Two things need to be considered: first, how to express dependencies so that the graph is clearer and more accurate; second, which tools allow robust and universal visualization, and what advantages such a tool has compared to previous work.

In this chapter, we focus on the above four problems and discuss how to solve these problems in detail.


3.2 Defining Software Architecture

Before designing the toolchain, we need a method for recovering the software architecture [22]. In other words, the information extracted from the source code needs to be arranged according to specific rules. We first need to analyze the data types and dependency types in the source code.

3.2.1 Variables analysis

The variables can be categorized as Real-Time Database (RTDB) variables and non-RTDB variables. Non-RTDB variables, also called reference variables, only run through functions within the same layers or modules and are not written into the RTDB. These two kinds of variables are introduced separately to facilitate the description. In this section, we analyze the variables and functions in the source code and classify the dependencies through this analysis.

1. RTDB variables

RTDB variables are global variables that act on the system. They are responsible for the adjustment and calibration of specific functions or modules. They are called RTDB variables because they are placed in the real-time database. These variables are our biggest concern, because their state directly reflects the working status of the system, which is also our purpose in verifying the software architecture.

2. Reference variables

Reference variables, also called non-RTDB variables, are not tracked for their own sake in this thesis. They are intermediate variables generated while functions run. They only need to be considered when a function reads them and then writes some RTDB variable. In other words, when reference variables affect any RTDB variable, we need to consider the dependencies.

3.2.2 Function analysis

Functions are responsible for writing and reading variables; the relationship can be described as a read-write relationship. Besides, there are also some complex structures in function bodies, such as conditional statements. We analyze all the types of conditional structures in the source code, including [19]:

1. If statements:

The function writes the output value, which depends on the condition statement.

2. for loops:

Initializers do not have any dependencies, but the output of a for-loop depends on the conditional statement.

3. while loops:

The block statements depend on the condition.

4. do-while loops:

The result of a do-while loop depends on the conditional statement.

5. switch statements:


The output depends on every case of the switch statement.

If we expressed the full control flow in the dependency, the result would show very specific dependency relationships. Nevertheless, these conditions would become very complicated, and readability would decrease. Furthermore, the conditional statements of each function are already intuitive in the source; it does not make much sense for them to appear in the dependency relationships. We only need to show the read and write conditions of the function to express the dependency. In other words, we need to accurately extract the dependency relationship from the above complex conditional statements while retaining only the critical information.

* @param arg1 in argument
* @param arg2 in argument
* @param arg3 out argument
constant Const1

Function Fb(arg1, arg2, arg3)
{
    if (arg1 == Const1)
    {
        arg3 = arg2
    }
    else
        break
}

Table 3.1 Conditional statement pseudo code

For example, in the above pseudo code for if-statements, the value of arg3 depends on three parts: arg1, arg2, and Const1. When any condition changes (for example, arg1 is not equal to Const1), arg3 cannot be assigned. So, we can express the if-statement as: Fb reads arg1, arg2, and Const1, and writes arg3.
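The read-write summary extracted from Table 3.1 can be written down as a plain record; the field names below are our own sketch, not the thesis' final format.

```python
# Read/write summary of function Fb from Table 3.1 (field names are
# illustrative): the if-condition's operands count as reads.
dependency = {
    "function": "Fb",
    "reads": ["arg1", "arg2", "Const1"],
    "writes": ["arg3"],
}
print(dependency["writes"])
```

Note that the control-flow structure itself is gone: only the read and write sets survive, which is exactly the abstraction argued for above.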

Function calls are a relatively complicated situation. When a function is called, it can either be passed parameters or return a value, and the dependencies differ between the two. Passing parameters means that this function has a dependency on another function. A return value may be related to the called function itself, or it may be related to other functions.

Summarizing the forms a function call can take, there are the following cases [19]:

1. One or more parameters (variables, arguments) are called in the function, and there is no return value.

2. A function is called without any value or constant being passed in, but the value of some variable is returned.

3. One or more parameters (variables, arguments) are called in the function, and there is a return value.

For functions with no return value, we only need to consider the parameters that are called. For those that have a return value but take no value in, the return value itself does not have a name, which causes trouble when analyzing dependencies. Here we assign a designation to the return value to avoid misunderstanding.

* @param arg0 in argument
* @param arg1 in argument
* @Variable x Rtdb Variable
* @Variable y Rtdb Variable
constant Const1

Variable x, y

Function fb()
{
    y = x
    z = Fc(arg0, arg1)
}

Function Fc(inarg, outarg)
{
    if (inarg == Const1)
    {
        outarg =+ 1
    }
    else
    {
        outarg = 0
    }
    Return outarg
}

Table 3.2 Function call dependencies pseudo code

For example, in the above pseudo code, the call relationships belong to case 3, so the value of z depends on arg0, arg1, and Const1.

Through the above analysis, we have abstracted function call dependencies from the complex C language syntax model. The purpose is to avoid complex and barely readable semantics such as control flow graphs and data flow graphs. We only need to pay attention to the read-write relationships hidden under these complex grammars. By revealing the read-write connections, the dependencies of a signal can be expressed.

3.2.3 Module analysis

Basic knowledge of modules was introduced in Chapter 2. The module is a hierarchical level introduced to facilitate the management of functions. Modules could be classified further based on their functions, but no such classification is needed in this thesis; the hierarchy follows the software architecture introduced in Chapter 2. It should be noted that a module only describes the ownership of functions, so there is no direct logical relationship between a module and a variable or argument. In the visualization, we only need to implement the hierarchical relationship between modules and functions.

3.2.4 Constants analysis

Constants cannot be written by any function. When analyzing dependencies, we only need to be concerned with which functions read them.

3.2.5 Dependency analysis

According to the previous analysis, several kinds of dependencies are now known. However, the relationships above are scattered and have not yet been summarized; even with them, it is hard to build complex visualizations. Therefore, the relationships are outlined in detail in this section, and terms are defined for the above dependencies. In this way, the dependencies in the C language can be reasonably classified; a careful and organized categorization facilitates the visual implementation.

1) Write dependencies

Write dependencies refer to a function body itself (not a callee) writing an RTDB variable. A function may write multiple variables, and the written variables may depend on different values.


2) Reference dependencies

Reference dependencies are similar to write dependencies; the difference is that reference dependencies refer to non-RTDB variables.

3) Out argument dependencies

Out argument dependencies refer to a function writing to the arguments it defines. In this case, the argument is treated similarly to a write dependency, which requires splitting the function. However, while variable names are unique, argument names may be repeated, so we need to mark each argument appropriately to give it a unique name.

4) Function call dependencies

Function call dependencies refer to the dependencies that exist when a function is called. In this case, its variables or arguments depend on the called function. The read-write relationship between the arguments or variables and the called function needs to be pointed out.

5) Return dependencies

Return dependencies mean that a function defines a return value inside its body and is called by other functions; the return value then affects the calling function. On the other hand, the return value itself does not have a name, so for convenience, when return dependencies occur, the return value is labeled as "return value."
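The five categories can be captured as an enumeration; the sketch below is our own illustration of how the toolchain might tag dependency edges, not the thesis' actual data format.

```python
from enum import Enum

# Illustrative tags mirroring the five dependency categories
# of Section 3.2.5.
class DepKind(Enum):
    WRITE = "write"
    REFERENCE = "reference"
    OUT_ARGUMENT = "out_argument"
    FUNCTION_CALL = "function_call"
    RETURN = "return"

# A return dependency edge: the unnamed return value is labeled
# "return value", as specified above.
edge = ("Fc", DepKind.RETURN, "return value")
print(edge[1].value)
```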

3.3 Build Meta Graph

After all the dependencies have been clarified, we attempt to construct a meta graph [24] based on them. When building a meta graph, we first write a piece of pseudo code containing all the dependencies. Then we draw the graph according to the pseudo code. After verifying that the graph expresses the same meaning as the pseudo code, we check its readability. Before the final meta graph is obtained, proper optimization is required. This meta graph will serve as a template for the graph created by Neo4j.

The pseudo code needs to consist of multiple functions and follow the same programming style as the source code we analyze. The functions in such code have function call dependencies among them. Write dependencies and reference dependencies are essentially the same kind of dependency, so the pseudo code only needs to cover one of them.

In the pseudo code, we assume that the functions belong to different modules. We also represent the relationship between modules and functions in the meta graph.

3.3.1 Basic rules for building meta graph

Before establishing a meta graph, we first need to formulate some rules. Their primary purpose is that meta graphs must convey dependency information accurately. To make the read-write relationship directional, we need to clearly express which variables are affected by the parameters a function reads; this is also our most significant improvement over previous work. Previous work could only display the parameter names read and written by a function, but could not further indicate the relationship between the parameters read and the parameters written.

* @Variable a Rtdb Variable
* @Variable b Rtdb Variable
* @Variable x Rtdb Variable
* @Variable y Rtdb Variable

Variable a
Variable b

function Fa(a, b)
{
    x = a
    y = b
}

Table 3.3 An example to show dependencies

For the above pseudo code, if users represent the read-write dependency relationships of function Fa directly, the graph is shown in Figure 3.1. But this graph is misleading: it gives the reader the illusion that variables x and y both depend on variables a and b, whereas x depends only on a and y depends only on b. The solution is to split functions that contain multiple dependencies. The functions are divided into parent functions and child functions (shown in Figure 3.2); with this method, each child function is responsible for showing only a single dependency relationship.

Figure 3.1 A wrong demonstration to show dependencies

Figure 3.2 Split function to show dependencies
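The effect of this split can be sketched with edge lists (an illustrative Python sketch, not part of the thesis tooling; the child-function names Fa_0 and Fa_1 are assumptions introduced here for the example):

```python
# Edges are (source, label, target) triples, following the arrow
# directions used in the meta graph.

# Without splitting, Fa reads a and b and writes x and y, which
# wrongly suggests that both x and y depend on both a and b.
unsplit_edges = [
    ("Fa", "read", "a"),
    ("Fa", "read", "b"),
    ("x", "written_by", "Fa"),
    ("y", "written_by", "Fa"),
]

# After splitting Fa into child functions Fa_0 (x = a) and
# Fa_1 (y = b), each child carries exactly one dependency.
split_edges = [
    ("Fa", "parent_to", "Fa_0"),
    ("Fa", "parent_to", "Fa_1"),
    ("Fa_0", "read", "a"),
    ("x", "written_by", "Fa_0"),
    ("Fa_1", "read", "b"),
    ("y", "written_by", "Fa_1"),
]

def writers_affecting(var, edges):
    """Trace which variables a read of `var` can influence:
    find the functions that read `var`, then the variables
    written by those functions."""
    funcs = {s for (s, lbl, t) in edges if lbl == "read" and t == var}
    return {s for (s, lbl, t) in edges if lbl == "written_by" and t in funcs}

# In the unsplit graph, a appears to influence both x and y;
# after splitting, it influences only x.
print(sorted(writers_affecting("a", unsplit_edges)))  # ['x', 'y']
print(sorted(writers_affecting("a", split_edges)))    # ['x']
```

The misleading graph of Figure 3.1 and the corrected graph of Figure 3.2 differ only in the intermediate child-function nodes, yet the traceable dependencies change substantially.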


* @param arg1 in argument
* @param arg2 in argument
Fnd(arg1, arg2, outarg)
{
    if (arg1 > Fne)
    {
        outarg = Constant A
    }
    arg2 = Constant B
}

Fne()
{
    if (signal z == constant B)
    {
        return constant C
    }
}

Table 3.4 Multiple function call dependencies pseudo code

The same problem occurs with function calls. As shown in the pseudo code above, the read and write of arg2 in Fnd has nothing to do with the called function Fne; to express the dependencies accurately, we also need to split Fnd. Fnd has two child functions, Fnd_0 and Fnd_1. Fnd_0 indicates the read and write of arg1, and this child function calls Fne.

Fnd_1 indicates the read and write of arg2 and has no dependency on Fne.

If Fne also had child functions Fne_0 and Fne_1, we stipulate that a call edge always points from a child function to the parent function of the callee. That is, Fnd_0 always points to Fne and never to Fne_1. The purpose of this is to unify the directionality of the graph.

To summarize, the rules we have formulated are the following:

• For any function body that contains multiple unrelated read-write relationships, the function must be split. (Rule 3.1)

• When a function is called, the calling child function always points to the called parent function; the edge is labeled "call". (Rule 3.2)

• When a function reads a parameter, the edge is labeled "read"; the arrow points from the function to the parameter. (Rule 3.3)

• When a function writes a variable, the edge is labeled "written_by"; the arrow points from the parameter to the function. (Rule 3.4)

• The relationship between a parent function and its child function is labeled "parent_to"; the arrow points from the parent function to the child function. (Rule 3.5)

The purpose of these rules is that, when users view the dependency graph, they can trace the start or end of a dependency along or against the direction of the arrows. In this way, all dependencies of a signal can be tracked in a single graph.
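As an illustration only (not from the thesis implementation), Rules 3.2-3.5 can be encoded as small helper functions that each emit one labeled, directed edge; the (source, label, target) triple representation is an assumption made here:

```python
# Each rule emits one directed, labeled edge (source, label, target).

def call_edge(caller_child, callee_parent):
    # Rule 3.2: a call edge points from the calling child function
    # to the parent of the called function.
    return (caller_child, "call", callee_parent)

def read_edge(fn, param):
    # Rule 3.3: reading a parameter points from the function to it.
    return (fn, "read", param)

def write_edge(fn, var):
    # Rule 3.4: writing a variable points from the variable
    # to the function ("written_by").
    return (var, "written_by", fn)

def parent_edge(parent_fn, child_fn):
    # Rule 3.5: a parent function points to its child ("parent_to").
    return (parent_fn, "parent_to", child_fn)

# The split of Fnd described above (Rule 3.1) then yields:
edges = [
    parent_edge("Fnd", "Fnd_0"),
    parent_edge("Fnd", "Fnd_1"),
    read_edge("Fnd_0", "arg1"),
    write_edge("Fnd_0", "outarg"),
    call_edge("Fnd_0", "Fne"),
    write_edge("Fnd_1", "arg2"),
]
```

Because every edge carries both a direction and a label, a viewer can follow "written_by" edges against the arrows and "read" edges along them, exactly as the rules intend.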

3.3.2 Build meta graph

After clarifying the basic rules, we can build a meta graph. We first write a piece of pseudo code containing dependencies and then create the meta graph piece by piece based on it.

The pseudo code is shown below.


Fna()
{
    arg0 = signal y
    Fnb(arg0)
}

Fnb(inarg)
{
    arg1 = inarg
    arg2 = Var_a
    Fnc(arg1, arg2, arg3)
    signal x = arg3
}

Fnc(inarg, inarg, outarg)
{
    arg4 = (inarg, inarg)
    Fnd(arg4, arg5)
    outarg = arg5
}

Fnd(inarg, outarg)
{
    if (inarg > Fne)
    {
        outarg = Constant A
    }
}

Fne()
{
    if (signal z == constant B)
    {
        return constant C
    }
}

Table 3.5 Pseudo code for meta graph

Fna calls Fnb and passes it the argument arg0, which is written by Fna. That means Fna has argument dependencies, and the relationship between Fna and Fnb is a function call dependency. Fnb writes arg1 and arg2, which are transferred to Fnc, while arg3 is read by Fnb and written by Fnc. It follows that Fnb and Fnc depend on each other. The out argument of Fnc depends on Fnd. Fnd contains a conditional statement and calls Fne before performing it. Therefore, any conditional/switch statement is regarded as a function read or call operation.
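Under the rules above, these dependencies could be recorded as an edge set, and a simple backward traversal then finds everything a signal transitively depends on. The following Python sketch is illustrative only; the edge set is a manual transcription of part of Table 3.5, not output of the thesis tool:

```python
from collections import deque

# Labeled edges (source, label, target) for the Fna/Fnb/Fnc chain:
edges = [
    ("Fna", "read", "signal y"),
    ("arg0", "written_by", "Fna"),
    ("Fna", "call", "Fnb"),
    ("Fnb", "read", "arg0"),
    ("Fnb", "read", "Var_a"),
    ("arg1", "written_by", "Fnb"),
    ("arg2", "written_by", "Fnb"),
    ("Fnb", "call", "Fnc"),
    ("Fnc", "read", "arg1"),
    ("Fnc", "read", "arg2"),
    ("arg3", "written_by", "Fnc"),
    ("Fnb", "read", "arg3"),
    ("signal x", "written_by", "Fnb"),
]

def depends_on(var, edges):
    """All variables `var` transitively depends on, found by
    following written_by edges to the writing function and then
    that function's read edges to its inputs."""
    writers, reads = {}, {}
    for s, lbl, t in edges:
        if lbl == "written_by":
            writers.setdefault(s, set()).add(t)
        elif lbl == "read":
            reads.setdefault(s, set()).add(t)
    seen, frontier = set(), deque([var])
    while frontier:
        v = frontier.popleft()
        for f in writers.get(v, ()):
            for inp in reads.get(f, ()):
                if inp not in seen:
                    seen.add(inp)
                    frontier.append(inp)
    return seen

# signal x traces back through arg3, arg1/arg2 and arg0 to signal y.
print(sorted(depends_on("signal x", edges)))
```

This backward reachability is exactly what the directed, labeled edges are meant to support: the traversal alternates between "written_by" and "read" edges without ever guessing which reads feed which writes.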


The meta graph is built step by step. We first create the dependencies inside the body of a function and then, based on the situation inside that function, analyze the dependencies of the other functions it involves. For example, the dependencies of Fna are built first; after creating the write dependencies of Fna (these dependencies are internal to Fna), Fnb, which is involved in Fna, is considered. The establishment of the other function dependencies follows the same procedure: first consider one function, then consider the other functions it requires. The following graph is an example of establishing a dependency graph.

Figure 3.3 A part of meta graph (1/3)

Now the dependencies of Fna have been established. When Fna calls Fnb, the dependencies between Fna and Fnb have not yet been established; they are built next, as shown in Figure 3.3. For Fnc, the establishment process is the same as for Fnb.

Figure 3.4 A part of meta graph (2/3)
