Efficient IR for the OpenModelica Compiler


Linköpings universitet SE–581 83 Linköping Master thesis, 30 ECTS | Datateknik 2018 | LIU-IDA/LITH-EX-A--18/014--SE


Effektiv IR för OpenModelica-kompilatorn

Patrik Andersson Simon Eriksson

Supervisor: Martin Sjölund
Examiner: Peter Fritzson



Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

©

Patrik Andersson Simon Eriksson


Abstract

The OpenModelica compiler currently generates code directly from a syntax tree representation, which leads to inefficient code in several cases. This thesis work introduces a lower-level intermediate representation for the compiler which aims to simplify the compiler back end and enable more optimizations. The resulting design of the representation features flat primitive operations and control flow using basic blocks and terminators. Variables are mutable, unlike in SSA-based representations. Introducing the IR did not significantly change the runtime performance of the test programs. The number of lines of code was reduced to a quarter compared to the old back end. This reduction and the simpler representation will help future work on optimization passes and on implementing an LLVM-based back end.

Contents

Abstract
Contents
List of Figures
List of Tables

1 Introduction
    1.1 Background
    1.2 Motivation
    1.3 Aim
    1.4 Research questions
    1.5 Delimitations

2 Theory
    2.1 Compilers
    2.2 The Modelica language
    2.3 The OpenModelica environment

3 Method
    3.1 Design
    3.2 Implementation
    3.3 Performance evaluation
    3.4 Code complexity measurements

4 Results
    4.1 Overview of the MidCode design

5 Discussion
    5.1 Performance results
    5.2 Design of MidCode
    5.3 Code Complexity of MidCode
    5.4 Related Work
    5.5 The work in a wider context

6 Conclusion
    6.1 Future work

A Performance test functions
    A.1 Fibonacci
    A.2 Mandelbrot
    A.3 Quicksort
    A.4 Takeuchi function

List of Figures

2.1 Compilation of Haskell in GHC
2.2 Overview of translation phases in the OpenModelica compiler
2.3 Overview of the OpenModelica components
4.1 Overview of MidCode phases

List of Tables

2.1 Available LVALUE types in MIR
2.2 Available RVALUE types in MIR
2.3 Available terminators in MIR
4.1 Lines of code for corresponding parts of old back end
4.2 Lines of code for new back end

1 Introduction

1.1 Background

OpenModelica is an open-source modeling and simulation environment developed mainly by the non-profit Open Source Modelica Consortium (OSMC) that implements the open standard Modelica modeling language. Modelica is a declarative equation-based language designed for describing complex and dynamic systems and can be used for simulating, for example, mechanical, electrical, hydraulic and process-oriented systems. OpenModelica is mainly targeted towards industrial and academic use.

1.2 Motivation

Currently, the C code generated by the OpenModelica compiler is inefficient in many cases, causing significant performance issues, especially since the data inputs are often large. One major reason is that the code is generated directly from a high-level syntax tree-based representation of the Modelica code [2].

A better solution would be to convert this representation to a lower-level intermediate representation (IR) more suitable for optimization and code generation before actually generating the code. Later on, it would then be feasible to implement a stage converting the new lower-level IR to a more common representation such as LLVM.


1.3 Aim

The aim of this thesis is to design and implement a new efficient and maintainable IR solution for OpenModelica. The new IR stage should be able to compile various test programs with roughly equal run-time performance to the old code generation while simplifying the back-end code generation and enabling various useful lower-level transformations in the future. This new code generation should be easy to extend with more lower-level optimizations and new back ends. A future LLVM-based back end would be especially interesting in the long term.

1.4 Research questions

1. How can a new IR help the implementation of optimizations and other code transformations?

2. Which IR design choices (of some common alternatives) are most suitable for OpenModelica?

3. How can this new IR be implemented in the OpenModelica compiler?

4. How much of the back end can be moved to a shared portable format, leaving target-specific implementations simpler?

5. How will the new IR affect the run-time performance of OpenModelica?

1.5 Delimitations

This project will focus on evaluating some common IR approaches and on implementing the new IR and its corresponding C code generator. The IR approaches to evaluate should be low-level but platform-independent and well suited for transforming to LLVM. Alternative special-purpose C code generator variants (for example for parallelization or embedded devices) will not be considered.

While Modelica has many language features specific to simulation, such as equation-based models and model connections, the project will focus on implementing Modelica functions, their algorithm feature subset, and the MetaModelica extension, which provides various features common in functional programming but not included in the Modelica standard, such as pattern matching, tagged unions and linked lists. This part of Modelica is closer to general-purpose programming and is therefore less complex and easier to compare to other languages and their corresponding IR solutions. The Modelica support for multi-dimensional arrays was deemed too complex and time-consuming, and was therefore skipped.

While performance improvements at this stage are of course desirable, the main focus is on creating a solution that can later be extended with new optimizations and back ends.

2 Theory

2.1 Compilers

Modern compilers are commonly structured as a pipeline of several phases, each taking a structure, transforming it, and sending it to the next phase. This pipeline can be divided into a front end, parsing and analyzing the source code, and a back end, taking the structure produced by the front end and converting it into an executable program [1].

Some common operations of the front end are lexical analysis (converting the source text to more easily parsable tokens), syntax analysis (checking if the syntax is correct and converting the token stream into an abstract syntax tree), and semantic analysis (e.g. checking that the types are correct), while some common tasks of the back end are optimization and code generation [1].

2.1.1 Optimization

In order to improve the performance, size and/or power consumption of a generated program, compilers may attempt to optimize the generated code rather than producing the most obvious conversion of the source code. An optimization must give the same results as the unoptimized version and a sufficiently large performance improvement, while being fast enough to still give acceptable compilation times [1].


2.1.2 Intermediate representations

Intermediate representations are internal forms of a computer program created and used by a compiler in order to aid the compilation process. An intermediate representation lies somewhere between the original source and the compiled target and can be at different levels depending on its area of use; it can be high-level (close to the original source code), low-level (close to the target language) or something in between. The data structures used can also vary; they can for example be a graph, a tree or a linear list. Compilers often have multiple intermediate representations in their pipeline, with each IR serving a different purpose and each phase converting the program to a lower IR form [19].

One major advantage of having intermediate representations is that the compiler can more easily be retargeted to new source languages or new target platforms while reusing independent components. Instead of having to write one compiler for every source/target combination, developers can just add a single front end in order to support one source language, and a single back end in order to support one target platform [19][1].

Common IR designs

One common linear representation of operations is three-address code, where each operation is binary and represented by two source variables, one destination variable and an operation type. These operations can be stored in memory as records called quadruples, storing the three variables and the operation type. A variable can be a named variable, a constant or a compiler-generated temporary variable. Unary operations may be defined as having just a single source variable. Expression trees are flattened by storing intermediate results in temporary variables, which are then used as source variables in later operations. Temporary variables are usually given unique names and are not shared between different intermediate results. Special call, jump and conditional operations can be implemented for representing control flow [1].
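The flattening of an expression tree into quadruples can be illustrated with a minimal Python sketch. The tuple encoding of expression trees, the `Quad` record and the `flatten` helper are assumptions made for this illustration only; they are not taken from any particular compiler.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Union

# Assumed encoding for this sketch: an expression is a variable name,
# a numeric constant, or a tuple (operator, left, right).
Operand = Union[str, int, float]
Expr = Union[str, int, float, tuple]


@dataclass(frozen=True)
class Quad:
    """One three-address instruction: dest := src1 op src2."""
    op: str
    src1: Operand
    src2: Operand
    dest: str

    def __str__(self) -> str:
        return f"{self.dest} := {self.src1} {self.op} {self.src2}"


def flatten(expr: Expr, code: List[Quad], temps) -> Operand:
    """Append quadruples for expr to code; return the operand holding its value."""
    if not isinstance(expr, tuple):
        return expr  # named variable or constant: usable directly
    op, left, right = expr
    src1 = flatten(left, code, temps)
    src2 = flatten(right, code, temps)
    dest = f"t{next(temps)}"  # fresh temporary, never shared
    code.append(Quad(op, src1, src2, dest))
    return dest


# The expression (a + b) * (c - 2)
tree = ("*", ("+", "a", "b"), ("-", "c", 2))
code: List[Quad] = []
result = flatten(tree, code, count(1))
for quad in code:
    print(quad)
```

Each nested subexpression ends up in its own temporary, so every emitted instruction has at most two sources and one destination.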

One control flow representation that is frequently used is the control flow graph (CFG). The instructions are partitioned into basic blocks, which are instruction sequences where control flow can only enter through the first instruction and exit through the last, where one of various jump constructions can be chosen. The basic blocks are then represented as nodes in the graph and execution paths as directed edges between the blocks. Each basic block should preferably contain as many instructions as possible without violating these rules. The instructions used in basic blocks are primitive and in three-address form. This representation simplifies many analyses and is therefore useful for performing many optimizations [1].
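As a rough sketch of how a linear instruction list can be partitioned into basic blocks and connected into a CFG, consider the following Python example. The instruction encoding (tuples tagged `label`, `assign`, `jump`, `branch` and `return`) and the function names are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Tuple

# Assumed instruction forms for this sketch:
#   ("label", name), ("assign", text), ("jump", label),
#   ("branch", condition, label_if_true, label_if_false), ("return", value)
Instr = Tuple
TERMINATORS = {"jump", "branch", "return"}


def split_into_blocks(instrs: List[Instr]) -> List[List[Instr]]:
    """Partition a linear instruction list into maximal basic blocks."""
    blocks: List[List[Instr]] = []
    current: List[Instr] = []
    for instr in instrs:
        # A label may be a jump target, so it has to start a new block.
        if instr[0] == "label" and current:
            blocks.append(current)
            current = []
        current.append(instr)
        # A jump, branch or return has to be the last instruction of its block.
        if instr[0] in TERMINATORS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def successors(blocks: List[List[Instr]]) -> Dict[int, List[int]]:
    """Compute the directed CFG edges as block index -> successor indices."""
    label_to_block = {b[0][1]: i for i, b in enumerate(blocks) if b[0][0] == "label"}
    edges: Dict[int, List[int]] = {}
    for i, block in enumerate(blocks):
        last = block[-1]
        if last[0] == "jump":
            edges[i] = [label_to_block[last[1]]]
        elif last[0] == "branch":
            edges[i] = [label_to_block[last[2]], label_to_block[last[3]]]
        elif last[0] == "return":
            edges[i] = []
        else:  # no explicit jump: execution falls through to the next block
            edges[i] = [i + 1] if i + 1 < len(blocks) else []
    return edges


# A loop summing 0..n-1, written as linear three-address style code.
program = [
    ("assign", "i := 0"),
    ("assign", "s := 0"),
    ("label", "loop"),
    ("assign", "c := i < n"),
    ("branch", "c", "body", "done"),
    ("label", "body"),
    ("assign", "s := s + i"),
    ("assign", "i := i + 1"),
    ("jump", "loop"),
    ("label", "done"),
    ("return", "s"),
]
blocks = split_into_blocks(program)
edges = successors(blocks)
print(len(blocks), edges)
```

The back edge from the loop body to the loop header shows up as an ordinary edge in the graph, which is what makes loop analyses on a CFG straightforward.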

Another, complementary representation for tracking data is static single assignment (SSA), which differs from plain three-address code in that each variable cannot be assigned more than once in a function, which simplifies data flow analysis significantly. This gives SSA the important property of referential transparency, meaning that a reference can be replaced with its definition and therefore that the variable values are independent of the order in which their statements are listed. Referential transparency also allows a computation to be replaced by its result, which enables well-known transformations like common subexpression elimination. As such, compilers that perform data flow analysis can do a conversion pass over the non-SSA representation and produce an SSA-based representation that is easier to reason about. SSA is used together with a basic block structure and uses a special φ (phi) function when execution paths merge, which takes a list of source variables and assigns one of them to a new variable depending on the previously executed block. The transformations can in general be made with other methods, but SSA has the advantage of being both intuitive and efficient, making more optimizations easier to implement while also enabling fast compilation times, often fast enough for use in just-in-time compilers [17].
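The single-assignment property can be illustrated for the simplest case, straight-line code without branches, where no φ functions are needed. The sketch below is an assumption-laden illustration: the instruction tuples, the `to_ssa` name and the `name_N` versioning scheme are invented here, and a real conversion would also have to place φ functions where execution paths merge.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed instruction form: (dest, op, src1, src2), where sources are
# variable names (str) or integer constants.
Instr = Tuple[str, str, object, object]


def to_ssa(code: List[Instr]) -> List[Instr]:
    """Rename variables in straight-line code so each name is assigned once."""
    version: Dict[str, int] = defaultdict(int)

    def use(operand):
        # Constants and inputs that were never assigned stay unchanged.
        if isinstance(operand, str) and version[operand] > 0:
            return f"{operand}_{version[operand]}"
        return operand

    result: List[Instr] = []
    for dest, op, src1, src2 in code:
        new_src1, new_src2 = use(src1), use(src2)  # read the old versions first
        version[dest] += 1                         # then create a new version
        result.append((f"{dest}_{version[dest]}", op, new_src1, new_src2))
    return result


# x is assigned twice in the input, but x_1 and x_2 are each assigned once.
original = [
    ("x", "+", "a", 1),
    ("x", "*", "x", 2),
    ("y", "+", "x", "a"),
]
ssa = to_ssa(original)
for instr in ssa:
    print(instr)
```

After renaming, every use refers to exactly one definition, which is the property that data flow analyses exploit.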

Example: GCC

One example of IR usage is GCC, which has two intermediate representations called GENERIC and GIMPLE. GENERIC represents a function and its statements as a tree structure, while GIMPLE is a subset of GENERIC reduced by a process called gimplification and used in the optimization stage. These representations are both independent of the programming language used [12].

Example: LLVM

LLVM is an IR that is, among other things, used in the Clang compiler for C. LLVM is also used as a back end by other compilers, for example those for Rust, Swift and Haskell (GHC). These compilers use LLVM for optimizations and code generation. LLVM also aims to be a portable format by supporting several targets for code generation. LLVM uses a basic block and terminator model for control flow. The instructions are in three-address form with static single assignment variables.


Example: Rust and MIR

The official compiler for the Rust programming language has added a new IR named MIR (Mid-level Intermediate Representation) between its high-level AST and its low-level LLVM code generation, whereas previously the LLVM code was generated directly from the AST. The design of MIR is based on primitive three-operand statements and basic blocks with terminators. This design makes translations to LLVM, which also uses primitive operations and control flow representations, relatively simple to do [10].

Some of the main goals of MIR in Rust are improving compilation time by having more efficient data structures, enabling more Rust-specific optimizations, reducing redundancy in the code base and making optimizations and other transformations easier to work and reason with in general [10].

One notable difference from LLVM is that it is not SSA-based, i.e. it allows multiple assignments to the same variable, and named variables are kept as-is. However, generated temporaries are still typically single-assignment. As more advanced optimizations relying on SSA representations are typically done by LLVM, an lvalue-based representation rather than SSA is considered sufficient for this purpose [10][11].

The complete Rust language, including its various syntactic sugar constructions, is reduced to a small subset that is easier to work with: redundant variations of a single low-level feature, which all had to be handled separately, are now represented by fewer variants, meaning the analyses have fewer cases to handle. Where control flow analyses were previously done on separate control-flow graphs that had to be generated from the AST, they can now be done directly on the MIR representation. The Rust safety analyses are also more accurate since the lower-level nature of MIR makes the difference between the analyzed structure and the final code smaller. Rust-specific optimizations can be done directly as a separate stage, whereas they previously were often done during conversion to LLVM, adding unwanted complexity to this conversion phase. Apart from simplifying LLVM generation, MIR also adds potential for adding other low-level back ends in the future [11].

The MIR data structure describes the workings of a single function and contains a control-flow graph stored as a list of basic blocks, a list of compiler-generated temporary variables, and a list of user-declared variables. A single basic block contains a list of statements and a terminator, which describes the control-flow action that occurs at the end of the basic block execution. A statement can either be a variable assignment or a drop (deallocation) of a variable, which is described explicitly unlike in the source language. An assignment statement contains an rvalue for the right-hand side and an lvalue for the left-hand side.
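The shape of such a function-level structure can be sketched with Python dataclasses. This is only an illustration of the description above, not the actual Rust compiler data types: all class and field names are assumptions, and lvalues and rvalues are simplified to plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Assign:
    lvalue: str  # left-hand side, simplified here to a variable name
    rvalue: str  # right-hand side, simplified here to expression text


@dataclass
class Drop:
    lvalue: str  # variable that is explicitly deallocated


Statement = Union[Assign, Drop]


@dataclass
class Terminator:
    kind: str                                         # e.g. "GOTO", "IF", "RETURN"
    targets: List[int] = field(default_factory=list)  # indices of successor blocks
    operand: Optional[str] = None                     # condition variable for "IF"


@dataclass
class BasicBlock:
    statements: List[Statement]
    terminator: Terminator


@dataclass
class Function:
    name: str
    user_vars: List[str]      # user-declared variables
    temps: List[str]          # compiler-generated temporaries
    blocks: List[BasicBlock]  # control-flow graph stored as a list


def targets_exist(fn: Function) -> bool:
    """Check that every terminator only refers to blocks that exist."""
    return all(
        0 <= target < len(fn.blocks)
        for block in fn.blocks
        for target in block.terminator.targets
    )


# Hypothetical function that replaces x with its absolute value.
example = Function(
    name="make_positive",
    user_vars=["x"],
    temps=["tmp0"],
    blocks=[
        BasicBlock([Assign("tmp0", "x < 0")], Terminator("IF", [1, 2], "tmp0")),
        BasicBlock([Assign("x", "-x")], Terminator("GOTO", [2])),
        BasicBlock([Drop("tmp0")], Terminator("RETURN")),
    ],
)
print(targets_exist(example))
```

Storing successor blocks as indices into the block list keeps the graph a flat, easily traversable structure.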

An lvalue can be a variable of different kinds, such as a named, temporary, argument or return variable, a field in a struct or tuple, a pointer dereference, an array index, or an enum downcast [11], see table 2.1.

LVALUE form            Description
B                      User-declared variable binding
TEMP                   Compiler-generated temporary
ARG                    Function argument
RETURN                 Return value
LVALUE.f               Struct or tuple field
*LVALUE                Pointer dereference
LVALUE[LVALUE]         Array index
(LVALUE as VARIANT)    Enum downcast

Table 2.1: Available LVALUE types in MIR

An rvalue symbolizes an expression and can be the use of an lvalue, a mutable or immutable reference, a cast, a constant, a literal of a struct or built-in container type, the length of an object, or a common simple binary or unary operation [11], see table 2.2. As the table shows, most rvalue operations only take lvalues as arguments, meaning that constants and data structure literals can only be used in operations through temporary variables. The special BOX value represents the memory allocation function taking the struct constructor method as its sole argument, and is used in the MIR call representation just like other functions [11].

Terminators in MIR can jump to another basic block with or without stack unwinding, jump to one of two specified basic blocks depending on the truth value of a variable, jump to one basic block from a list depending on the value of a variable, call a function and afterwards jump to one of two basic blocks depending on whether the function succeeded or failed, or simply return from the function with or without stack unwinding [11], see table 2.3.

RVALUE form                    Description
Use(LVALUE)                    Value of LVALUE
[LVALUE; LVALUE]               Array literal of specified size with the same defined value for all cells
&'REGION LVALUE                Reference to LVALUE
&'REGION mut LVALUE            Mutable reference to LVALUE
LVALUE as TYPE                 Cast
LVALUE <BINOP> LVALUE          Binary operation
<UNOP> LVALUE                  Unary operation
Struct { f: LVALUE0, ... }     Struct literal
(LVALUE...LVALUE)              Tuple literal
[LVALUE...LVALUE]              Array literal
CONSTANT                       Constant
LEN(LVALUE)                    Length of LVALUE
BOX                            Memory allocation function for box operator

Table 2.2: Available RVALUE types in MIR

Example: Swift and SIL

Similar to the Rust compiler, the official Swift compiler has also added a new mid-level IR between the AST and the generated LLVM code, named SIL (Swift Intermediate Language). Unlike MIR, SIL is SSA-based, but replaces the phi node concept with basic block arguments that are set by the terminators jumping to that block. Within the block, the argument variables work like typical source variables. As with MIR, literals have to be saved in temporaries before they can be used in operations. Calls are implemented differently in SIL: while MIR implements calls as terminators, SIL implements them as regular statements. Operators are also implemented as calls to built-in functions rather than as special rvalue constructions as in MIR. More low-level memory operations are represented as explicit constructions than in MIR, including heap and stack allocations, memory accesses and reference counting [20].


Terminator                                       Description
GOTO(BB)                                         Jump unconditionally to basic block BB
PANIC(BB)                                        Start stack unwinding and jump to basic block BB for cleanup
IF(LVALUE, BB0, BB1)                             Jump to BB0 if LVALUE is true, otherwise jump to BB1
SWITCH(LVALUE, BB...)                            Jump to one of the listed basic blocks depending on value of enum LVALUE
CALL(LVALUE0 = LVALUE1(LVALUE2...), BB0, BB1)    Call function referenced in LVALUE1 with arguments in LVALUE2 onwards, store return value in LVALUE0, jump to BB0 if call succeeded or BB1 if it panicked
DIVERGE                                          Return and unwind stack
RETURN                                           Return

Table 2.3: Available terminators in MIR
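To illustrate how statements with mutable variables and terminators such as GOTO, IF and RETURN drive execution, the following Python sketch interprets a small list of basic blocks. The tuple encodings, the `run` function and the convention that the result is read from a variable named `ret` are assumptions made for this example; unwinding, SWITCH and CALL are left out.

```python
import operator
from typing import Dict, List, Tuple

# Assumed encodings for this sketch:
#   statement:  (dest, op, src1, src2), sources are variable names or constants
#   terminator: ("GOTO", block), ("IF", variable, block_if_true, block_if_false)
#               or ("RETURN",), which returns the value of the variable "ret"
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "<": operator.lt}
Block = Tuple[List[tuple], tuple]


def run(blocks: List[Block], arguments: Dict[str, int]):
    """Execute basic blocks starting at block 0 until a RETURN terminator."""
    env = dict(arguments)  # mutable variables, not SSA

    def value(operand):
        return env[operand] if isinstance(operand, str) else operand

    current = 0
    while True:
        statements, terminator = blocks[current]
        for dest, op, src1, src2 in statements:
            env[dest] = OPS[op](value(src1), value(src2))
        kind = terminator[0]
        if kind == "GOTO":
            current = terminator[1]
        elif kind == "IF":
            _, condition, if_true, if_false = terminator
            current = if_true if env[condition] else if_false
        elif kind == "RETURN":
            return env["ret"]
        else:
            raise ValueError(f"unknown terminator: {kind}")


# Sum of 0..n-1: block 1 is the loop header, block 2 the loop body.
sum_below = [
    ([("i", "+", 0, 0), ("ret", "+", 0, 0)], ("GOTO", 1)),
    ([("c", "<", "i", "n")], ("IF", "c", 2, 3)),
    ([("ret", "+", "ret", "i"), ("i", "+", "i", 1)], ("GOTO", 1)),
    ([], ("RETURN",)),
]
total = run(sum_below, {"n": 5})
print(total)
```

Because every block ends in exactly one terminator, the interpreter loop never has to inspect the statements to decide where execution continues.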

Example: Glasgow Haskell Compiler

One major compiler for the functional language Haskell is the Glasgow Haskell Compiler, GHC, which is an open source project. One of GHC's IRs is Core. While being an IR, Core corresponds well to a simple source-level language which eliminates superfluous ways to express the same language construct [9]. For example, a list comprehension needs to be changed from a native Haskell construct into an expression based on variable bindings and functions in Core.

In Core, case-expressions are also restricted as they cannot match nested constructors of a value [7]. They are used to see which member of a union a value contains as well as for accessing the attributes of the record. Core also flattens expressions by restricting their usage. An argument to a function must be a literal or variable (called an atom in the paper), resulting in the dependence between function calls being explicitly ordered by variable bindings.

GHC has another, lower-level IR, the Spineless Tagless Graph Reduction Machine, or STG [8]. The difference between STG and Core is that Core is meant to simplify expressions in a functional setting while STG is meant to help simplifications targeted at modern processors. As such it specifies operational semantics, unlike Core. In addition, all type information is lost in transforming Core to STG.

[Figure: pipeline diagram with the stages Parse Tree, Core, STG, Cmm, LLVM, C and Assembly, and the transitions desugar, STGify, CodeGen, LLVM compiler, NCG and C compiler.]

Figure 2.1: Compilation of Haskell in GHC [9]

The operational semantics include a stack for arguments, returns, and the implementation of the lazy calling convention. Arguments are pushed when a function application is evaluated and popped when entering closures with arguments. The return entries in the stack are actually not for function returns, since the only evaluation is from pattern matching, so each entry is for the result of a pattern match. The lazy calling convention is implemented with a stack entry that causes a suspended computation in memory to be overwritten with the computed value.

STG also has a heap which contains all allocated values until they are deallocated by garbage collection. An important feature for long-running computations in a lazy language is black holes. When a computation is entered, it is replaced by a black hole, which does not keep any of the computation's references alive, although the references used while evaluating the computation are still kept alive. This means that if garbage collection is performed while evaluating the black-holed value, more things can be collected. For example, in code for finding the last value of a long linked list, earlier elements can be collected even if garbage collection happens in the middle of evaluation. Additionally, if evaluation tries to evaluate a black hole that it has itself created, an infinite loop has been detected, so an exception can be thrown.

Further down in the compilation pipeline, we find Cmm, which is a processor-portable intermediate language reminiscent of LLVM. Cmm consists of simple control flow between blocks, basic types that reflect machine representation and stack-backed unlimited variables [21]. Cmm contains no type information except for machine-level representations like 32-bit signed integers. It also explicitly represents the heap and stack and writing to byte addresses. As can be seen from figure 2.1 [9], there are several back ends that start from Cmm and then generate assembly.
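Returning to the black holes used in STG, the update and loop-detection behaviour can be imitated in a few lines of Python. This is a loose analogy rather than a description of how GHC implements it: the `Thunk` class, the `BlackHoleError` exception and the sentinel object are invented for this sketch, and Python's own memory management differs from the GHC garbage collector.

```python
class BlackHoleError(RuntimeError):
    """Raised when a suspended computation needs its own result."""


_BLACK_HOLE = object()  # sentinel marking a thunk that is being evaluated


class Thunk:
    """A suspended computation that is evaluated at most once."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._evaluated = False

    def force(self):
        if self._evaluated:
            return self._value
        if self._compute is _BLACK_HOLE:
            # Evaluation re-entered a thunk it is already evaluating.
            raise BlackHoleError("computation depends on its own value")
        compute = self._compute
        # The thunk object no longer refers to the closure while it runs;
        # only the running evaluation itself keeps it alive.
        self._compute = _BLACK_HOLE
        self._value = compute()
        self._evaluated = True
        return self._value


calls = []


def expensive():
    calls.append("called")
    return 6 * 7


answer = Thunk(expensive)
first = answer.force()
second = answer.force()  # served from the stored value

# A thunk whose value is defined in terms of itself.
loop = Thunk(lambda: loop.force() + 1)
try:
    loop.force()
    detected = False
except BlackHoleError:
    detected = True
print(first, second, len(calls), detected)
```

The second `force` call does not run the computation again, and the self-referential thunk is reported as an error instead of recursing without end.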

2.2 The Modelica language

Modelica is a declarative and object-oriented language developed for equation-based modeling of complex and dynamic physical systems. It can be used for simulating, for example, mechanical, electrical, hydraulic and process-oriented systems [5]. The Modelica standard exists in multiple implementations and is governed by the international non-profit Modelica Association [16]. Systems can be separated into smaller components, which can then be connected to other components and distributed in model libraries. This enables equation systems to be reused and combined to make larger systems. Many common standard components are distributed by the Modelica Association in their Modelica Standard Library [16].

2.2.1 Primitive types and arrays

The primitive types supported are integers, reals (floating-point), booleans, strings, enumerations and a special clock type used for synchronous systems. Multi-dimensional arrays are also supported, and can have dimension sizes that are unspecified at compile time. In addition, a data type for complex numbers is implemented in the Modelica standard library [4].


Some of the primitive operations supported in expressions are scalar arithmetic operations (such as addition, subtraction, division, multiplication and exponentiation), elementwise arithmetic operations on arrays, comparisons, logical operations, and if-expressions [4].

2.2.2 Models and equations

Modelica model classes describe the system to be modelled as a system of variables with optional initial values and differential, algebraic and discrete equations, which can then be compiled and solved by the Modelica implementation for a given time slice. The class defined at the top of the program is automatically instantiated, and other classes can be instantiated by declaring them as variables in the top class [4].

Each equation consists of two expressions, one on each side of an equality (=) operator. The equations are not affected by the order in which they are listed and are acausal, meaning they do not have a fixed data flow direction. In order to support variation over time, variables can be wrapped in the der() time derivative operator, and the time variable can also be accessed directly as time. For-loops can also be used to declare repetitive series of equations in a shorter way [16][4].

Variables can optionally have defined initial values, and models also support additional variable kinds such as named constants and parameters. Unlike named constants, parameters can be set before simulation without recompiling [4].

For example, a pendulum can be modelled as in the following example taken from page 21 in Principles of Object-Oriented Modeling and Simulation with Modelica 3 [4]. This model contains both differential equations and algebraic equations, and is therefore an example of a differential algebraic equation system (DAE). This system can be simulated by calling the simulate function, for example by writing simulate(Pendulum,stopTime=6) [4], and then plotted by calling the plot function with the variable to be plotted as its argument [4].

model Pendulum
  parameter Real m=1, g=9.81, L=0.5; // mass, gravity, length of pendulum
  Real F; // force
  output Real x(start=0.5), y(start=0); // x and y position with set start values
  output Real vx, vy; // x and y velocity
equation
  m * der(vx) = -(x / L) * F;
  m * der(vy) = -(y / L) * F - m * g;
  der(x) = vx;
  der(y) = vy;
  x^2 + y^2 = L^2;
end Pendulum;

2.2.3 Model inheritance

Models can extend other models, and thereby provide more specialization while reusing code, similar to hierarchical class inheritance in typical object-oriented languages. By inheriting equations, data variables and class members from a base class, a subclass can reuse part of its behaviour while modifying and extending it with additional equations and variables [4].

Model classes can be partial, meaning that their equation systems are underspecified and can only be made solvable by extending them with subclasses providing additional equations. This can be seen as analogous to abstract classes in object-oriented languages. Variables of an instance are accessed through dot syntax, though they can be shielded from outside access by putting them in the protected section, which blocks direct access from outside but still makes them available in submodels [16][4].

Classes can also contain variables with type declarations that are replaceable by subclasses, similar to generics in other languages. A field with a replaceable type is simply prefixed by the replaceable keyword. To make a new class based on a class with replaceable types, a new type definition specifying the types is made, which can then be instantiated like a regular class [4].

2.2.4 Connections

Model instances can be connected to each other through special connect-equations in order to create larger systems. The interfaces for these connections are specified by connector classes, which contain a list of the variables that are carried by the signals. Variables in a connector can optionally be configured as flow variables, indicating that the values of all connected signals will sum to zero instead of being equal[4].
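The difference between ordinary and flow variables in a connection can be made concrete with a small Python sketch that prints the equations implied by one set of connected connectors. The function name and the string-based output are assumptions for illustration, and the sketch deliberately ignores details of the Modelica specification such as the sign convention that distinguishes inside from outside connectors.

```python
from typing import List


def connection_equations(
    connected: List[str],
    potential_vars: List[str],
    flow_vars: List[str],
) -> List[str]:
    """Generate the equations implied by one set of connected connectors."""
    equations: List[str] = []
    first = connected[0]
    # Ordinary (non-flow) variables are set equal across all connectors.
    for var in potential_vars:
        for other in connected[1:]:
            equations.append(f"{first}.{var} = {other}.{var}")
    # Flow variables of all connectors sum to zero.
    for var in flow_vars:
        terms = " + ".join(f"{connector}.{var}" for connector in connected)
        equations.append(f"{terms} = 0")
    return equations


# Three connectors joined at one node, each assumed to carry a
# voltage v and a flow variable i for the current.
node = connection_equations(
    connected=["resistor.n", "capacitor.p", "load.p"],
    potential_vars=["v"],
    flow_vars=["i"],
)
for equation in node:
    print(equation)
```

For an electrical connector this corresponds to equal voltages at the node and currents that sum to zero, as in Kirchhoff's current law.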


Connections are generally acausal, meaning that, like equations, they lack a specified data direction, but they can also be specified as input or output connections, meaning that they can only receive from or send to a component, respectively [4].

To connect one variable in a component to many subcomponents without having to write a large number of connect-equations explicitly, the connection can be made implicitly by prefixing the shared variable in the top component with the inner keyword and declaring a reference variable with the same name in the subcomponents prefixed by the outer keyword [4].

Discrete events

Discrete instantaneous events can be modelled by using the when-statement, which only activates its subequations at the exact moment in time when one or more of its condition expressions transition to true. Discrete and continuous components can be freely combined to create hybrid systems. A when-statement can contain a special reinit equation that resets a variable to a new value on the event. In a reinit equation, the previous value of the variable can be accessed through the pre operator. Apart from when-statements, simple if-expressions and if-statements in normal equations may also be used to model discrete changes [4].
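The interplay between continuous equations and a when-statement with reinit can be imitated with a fixed-step simulation of a bouncing ball in Python. This is only a sketch under simplifying assumptions: the function name, the parameter values and the explicit Euler integration are chosen for illustration, whereas a Modelica tool would locate the event time more precisely.

```python
def simulate_bouncing_ball(h0=1.0, e=0.8, g=9.81, dt=0.001, stop_time=2.0):
    """Simulate a ball dropped from height h0 that bounces on the ground.

    Continuous part: der(h) = v and der(v) = -g.
    Discrete part:   when h <= 0 then reinit(v, -e * pre(v)).
    """
    h, v = h0, 0.0
    was_true = False  # value of the condition h <= 0 at the previous step
    bounce_times = []
    for step in range(int(round(stop_time / dt))):
        # Continuous equations, integrated with explicit Euler steps.
        h += v * dt
        v -= g * dt
        condition = h <= 0.0
        # The event fires only when the condition changes from false to true.
        if condition and not was_true:
            v = -e * v  # new velocity computed from the pre-event velocity
            bounce_times.append((step + 1) * dt)
        was_true = condition
    return bounce_times, h, v


bounce_times, final_height, final_velocity = simulate_bouncing_ball()
print(bounce_times)
```

Tracking the previous value of the condition is what keeps the velocity from being reversed again on the following steps while the ball is still at or below ground level.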

Basic electronics example

In listing 1, we take examples from Principles of Object-Oriented Modeling and Simulation with Modelica 3 to give a taste of Modelica. The listing defines electrical components in Modelica by defining variables, equations, connectors and using inheritance so that shared equations can be defined in a single partial superclass [4].

Packages

In order to avoid name conflicts and simplify sharing code, libraries can be distributed as packages, which give all content in the library its own hierarchical namespace. Packages can then be imported into another package with the import keyword, which optionally allows importing names directly at the top level within the package. Within a package, an imported namespace can also be given a custom name so that typing can be reduced without risking name conflicts as with top-level imports.

type Voltage = Real(unit="V");
type Current = Real(unit="A");
type Resistance = Real(unit="Ohm");
type Capacitance = Real(unit="F");

connector Pin "Electrical pin"
  Voltage v;
  flow Current i;
  // the flow keyword indicates that any connected
  // variables should sum to zero
end Pin;

partial model TwoPin "Electrical component with two pins"
  // partial since it does not have enough equations
  // to be fully defined
  Pin p, n;
  Voltage v;
  Current i;
equation
  v = p.v - n.v;
  0 = p.i + n.i;
  i = p.i;
end TwoPin;

model Resistor
  extends TwoPin;
  // include all variables and equations from TwoPin
  parameter Resistance R;
equation
  R*i = v;
end Resistor;

model Capacitor
  extends TwoPin;
  parameter Capacitance C;
equation
  C*der(v) = i;
end Capacitor;

model Ground
  Pin p;
equation
  0 = p.v;
end Ground;

model LowPass
  Pin in, out;
  parameter Resistance R;
  parameter Capacitance C;
  Resistor resistor(R=R);
  Capacitor capacitor(C=C);
  Ground ground;
equation
  connect(in, resistor.p);
  connect(resistor.n, out);
  connect(out, capacitor.p);
  connect(capacitor.n, ground.p);
end LowPass;

2.2.5 Functions and algorithms

More traditional imperative code can be written in Modelica inside algorithm sections. Unlike in normal equation sections, variables are assigned values directly with the := assignment operator, and they can be assigned multiple times within a single section. Both recursion and common imperative control flow statements such as if-then-else, for and while are supported. Algorithm sections in Modelica are pure, i.e. without side effects and global state, in order to support safe usage inside equation systems[4].

The special function class type can be used for implementing named mathematical functions using algorithm sections. Functions can have multiple input variables and, unlike in many other languages, multiple output variables as well. Functions can also declare local variables inside protected sections for use in the algorithm section[4].

Two examples of implementations for the factorial function are provided below:

function factorial_recursive
  input Integer i;
  output Integer o;
algorithm
  if i > 1 then
    o := i * factorial_recursive(i-1);
  else
    o := 1;
  end if;
end factorial_recursive;

function factorial_imperative
  input Integer i;
  output Integer o;
protected
  Integer acc;
algorithm
  acc := 1;
  for x in 2:i loop
    acc := x*acc;
  end loop;
  o := acc;
end factorial_imperative;

2.2.6 MetaModelica

MetaModelica is an extended version of Modelica designed for modeling programming languages. It complements the algorithm support in Modelica with various features common to functional programming, such as tagged unions with support for recursion, linked lists, tuples, and pattern matching. It also adds support for exception handling and generics[6].

Parameterized types

Parameterized types enable types to be specialized by another type as a parameter, and are similar to generics in other programming languages. Most of the new built-in types in MetaModelica support type parameters[6].

Lists

Lists contain an arbitrary number of objects of a single type. Lists are implemented as immutable linked lists, as in many functional languages, which enables parts of lists to be shared between different lists. New lists can be created in constant time by inserting new values before existing lists with the :: (cons) operator[6]. However, some operations, such as appending, getting a value at a specific index, and calculating the list length, have linear time complexity. Lists can be created either with the cons operator or with brace-surrounded list literals listing all values in the list; the literal syntax is also used to represent the empty list {}[13].

In addition, pattern matching can be used for extracting values from or comparing lists[6]. MetaModelica also has several built-in methods for performing various operations on linked lists[13]:

listAppend — Returns a copy of a list concatenated with another list

listDelete — Returns a copy of a list with a specific index-specified object skipped

listEmpty — Returns a boolean indicating if a list is empty (has length 0)

listHead — Returns the first object in a list

listMember — Returns a boolean indicating if a list contains a specific value

listLength — Returns the length of a list

listRest — Returns the tail of the linked list (every object except the first)

listReverse — Returns a reversed copy of a list

List<Integer> l, l2, l3; //variable declaration
l := {3, 4, 5}; //list literal
l2 := 2 :: l; //creating a new list {2, 3, 4, 5} with the cons operator
i := listGet(l, 2); //accessing the second value in the list (4)
len := listLength(l); //getting the list length (3)
l3 := listReverse(l); //getting a reversed list ({5, 4, 3})
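The sharing and cost properties of immutable linked lists described above can be illustrated with a minimal sketch. It is written in Python rather than MetaModelica, and the names Cons, cons, from_values, list_length and list_get are hypothetical stand-ins chosen for this illustration, not part of the MetaModelica runtime.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Cons:
    """One immutable list node; None plays the role of the empty list {}."""
    head: Any
    tail: Optional["Cons"]


def cons(value, lst):
    """Prepend in constant time; the existing list is reused as the tail."""
    return Cons(value, lst)


def from_values(*values):
    """Build a list from a literal-like sequence of values."""
    lst = None
    for value in reversed(values):
        lst = cons(value, lst)
    return lst


def list_length(lst):
    """Linear time: every node has to be visited."""
    n = 0
    while lst is not None:
        n += 1
        lst = lst.tail
    return n


def list_get(lst, index):
    """1-indexed access; walks index - 1 links, so it is linear as well."""
    for _ in range(index - 1):
        lst = lst.tail
    return lst.head


l = from_values(3, 4, 5)
l2 = cons(2, l)  # l2 shares all of l instead of copying it
```

Because nodes are never mutated, l2 can safely point at l as its tail, which is what makes prepending cheap while indexing and length stay linear.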

Tuples

Tuples contain an arbitrary number of objects of mixed types, and can be seen as a way to create simple records without having to write record declarations. Values in the tuple can be accessed either through pattern matching or by dot notation, denoted by following the tuple with a dot and the index of the object (1-indexed)[6].

Tuple<Integer, String, List<Real>> t; //variable declaration
t := (12, "hello", {1.0, 2.0, 3.0}); //tuple literal
i := t.2; //accessing the second value through dot notation

Union types

Union type objects store record data with a type-safe constructor describing its variant, and are similar to algebraic data types in functional programming. One or more record types can be defined for a single union type. Union type instances are also immutable, i.e. their fields cannot be modified after the instance has been created. Union types are recursive, meaning that they can have fields of their own type, and are therefore useful for describing tree structures, such as abstract syntax trees. Pattern matching can be used for checking and extracting field values[6].

uniontype Number
  record INT
    Integer int;
  end INT;
  record RATIONAL
    Integer int1;
    Integer int2;
  end RATIONAL;
  record REAL
    Real re;
  end REAL;
  record COMPLEX
    Real re;
    Real im;
  end COMPLEX;
end Number;

Number a; //variable declaration
a := RATIONAL(8, 13); //literal with RATIONAL constructor
a := REAL(1.618033); //literal with REAL constructor

Option types

Option type values either carry a single field of a specific type or none at all, and are generally used for cases where objects are optionally defined. They are implemented as a built-in parameterized union type with the constructors NONE() and SOME(x), where x is an object of the parameter type. The constructor can be checked with the isSome and isNone functions, and option type values can also be unpacked with pattern matching like other union types[6].

Option<String> o; //variable declaration
o := NONE(); //none literal
o := SOME("hej"); //some literal
if isNone(o) then
...

Pattern matching

One of the most important features in MetaModelica is its pattern matching support, which is similar to pattern matching in many functional languages. This can be used for more advanced control flow and enables simple and powerful handling of structural data[6].

Each case is tested in the order in which it is listed and contains a pattern, the body to be executed, and a case return expression that is calculated and returned by the match expression after the body has finished. The unit value () can be returned if an actual return value is not desired. The return value can also be a tuple, allowing multiple values to be returned. The return values of all cases in a single match expression are required to be of the same type. The body of each case can be either an algorithm section or an equation section; equation sections are, however, not allowed to contain differential equations. A match expression can have its own set of local variables, which can also be used for pattern binding[6].

Patterns that can be matched in a case include scalar constants such as integers and strings, record constructors with named or positional arguments, tuples, lists made with literal syntax, lists made with the cons (::) operator, and the _ wildcard, which accepts and ignores all values. These patterns can also be nested. Variables placed in a pattern will be bound, i.e. assigned the actual value, if the case match succeeds. In addition, the whole pattern itself can be bound to a variable with the special as binding operator. The __ pattern as the single argument to a record constructor can be used to bind all fields without having to explicitly name them. Apart from the pattern expression itself, a pattern can also include a guard expression, which must be true for the matching to succeed; the guard expression can include variables from the pattern expression[6].

Pattern matching expressions come in two variants with different behaviour when an exception is raised in the case body: match, which makes the whole match expression fail as expected, and matchcontinue, which instead rewinds the state and tries the following patterns, failing the whole match expression only when all patterns have been exhausted[6].
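The difference between the two variants can be sketched as follows. This is a Python model and not MetaModelica: cases are simplified to (predicate, body) pairs, the program state is modelled as a plain dictionary, and MatchFailure, run_match and run_matchcontinue are hypothetical names introduced only for this illustration.

```python
import copy


class MatchFailure(Exception):
    """Stands in for an exception raised inside a case body."""


def run_match(cases, value, state):
    """match: the first case whose pattern accepts the value decides the outcome."""
    for accepts, body in cases:
        if accepts(value):
            return body(value, state)  # a failure here propagates to the caller
    raise MatchFailure("no case matched")


def run_matchcontinue(cases, value, state):
    """matchcontinue: a failing body restores the state and the next case is tried."""
    for accepts, body in cases:
        if not accepts(value):
            continue
        snapshot = copy.deepcopy(state)
        try:
            return body(value, state)
        except MatchFailure:
            state.clear()
            state.update(snapshot)  # rewind before trying the next case
    raise MatchFailure("all cases exhausted")


def failing_body(value, state):
    state["visited"] = True  # side effect that has to be undone on failure
    raise MatchFailure("body failed")


cases = [
    (lambda v: v > 0, failing_body),
    (lambda v: True, lambda v, s: "fallback"),
]
```

With these cases, run_matchcontinue(cases, 1, {}) falls through to the second case and leaves the state untouched, while run_match(cases, 1, {}) fails as soon as the first body fails.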

Comprehensions

List and array comprehensions allow the user to write concise mapping and filtering on collections using some syntactic sugar. They take a map expression and one or more collections with a named iterator variable for each collection, and can optionally take guards filtering the values. There are also “threaded” comprehensions, which work like a zip between any number of lists[6].

list<Integer> l0 := list(1+x for x guard 0<=x in otherList);

list<Integer> l1 := list(a+b threaded for a in 1:2, b in 3:4);

// {1+3,2+4}

list<Integer> l2 := list(a+b for a in 1:2, b in 3:4);

// {4,5,5,6}
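For readers more familiar with other languages, the three forms above correspond roughly to the following Python comprehensions. This is an analogy only; other_list is a hypothetical input chosen for the illustration.

```python
other_list = [-2, 0, 3]  # hypothetical input data

# Guarded comprehension: only values passing the guard are mapped.
l0 = [1 + x for x in other_list if 0 <= x]

# "Threaded" comprehension: the iterators advance in lockstep, like a zip.
l1 = [a + b for a, b in zip(range(1, 3), range(3, 5))]

# Ordinary comprehension with two iterators: every combination is visited.
l2 = [a + b for a in range(1, 3) for b in range(3, 5)]
```

The threaded form produces one result per position, while the non-threaded form produces one result per pair of values.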

Exception handling and asserts

Exceptions such as out-of-bounds accesses and divisions by zero can be tested by putting the expression or statement inside a failure call, which will succeed if the test statement causes an exception and throw an exception if the test statement succeeds. If an unhandled exception occurs inside a matchcontinue case, the program will rewind the state and try the following cases rather than making the entire match expression fail. Exceptions can also be generated explicitly with the fail function, or by assertions using the assert function, which takes an assertion condition, a message string and optionally an assertion severity level[6].
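The inverted success condition of a failure call can be modelled in a few lines. The sketch is in Python, and the Failure exception and the thunk-taking failure helper are assumptions made for this illustration rather than the MetaModelica built-in itself.

```python
class Failure(Exception):
    """Raised when the tested expression unexpectedly succeeds."""


def failure(thunk):
    """Succeed (return None) if thunk raises, and raise Failure if it succeeds."""
    try:
        thunk()
    except Exception:
        return None  # the test statement failed, so failure(...) succeeds
    raise Failure("tested expression succeeded")
```

For example, failure(lambda: 1 / 0) returns normally because the division raises, whereas failure(lambda: 1 + 1) raises Failure.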

2.3 The OpenModelica environment

OpenModelica is an open-source Modelica-based simulation and modeling environment. Some of its main purposes are to provide efficient, easy-to-use and well-visualized Modelica-based simulations while also serving as a teaching and research tool and as a reference implementation that is itself written largely in Modelica[5]. Most of the development of OpenModelica is done at Linköping University in Sweden.

2.3.1 Compiler structure

The OpenModelica compiler takes Modelica code and translates it to C code which can be compiled by a standard compiler. The subsystem also provides an interpreter so that code can be tested interactively[3].

Most parts of the OpenModelica compiler are written in MetaModelica. The OpenModelica compiler can compile MetaModelica code, including bootstrapping itself[18].

[Figure: Modelica source -> Translator -> DAE with flattened models -> Analyzer -> DAE with sorted equations -> Optimizer -> DAE with optimized sorted equations -> Code Generator -> C source code -> C Compiler -> Executable program -> Simulation]

Figure 2.2: Overview of translation phases in the OpenModelica compiler

The OpenModelica Compiler is organized, like most other compilers, as a pipeline of these phases[4][3] as seen in figure 2.2:

Translator — parses the source code into the initial Absyn-format AST, converts it into the simplified SCode-format intermediate AST, and reduces the object-oriented structures to a single flat equation system in the DAE-format AST. Type checking and other static analyses are also performed here.

Analyzer — performs transformations on the equation system so that it can be efficiently solved, including dependency sorting the equations and converting them to imperative assignments.

Code Generator — generates compilable C code from the DAE. This code is then passed to a C compiler.

[Figure 2.3. Modules: Parse, SCode/explode, Inst, BackendDAECreate, Symbolic operations (BackEnd), SimCode, Code generator, together with Lookup, Static and Ceval. Representations: Modelica code, Absyn, SCode, DAE, Backend DAE, Sorted and optimized DAE, SimCode, C code.]

A more detailed overview of some of the most relevant modules used in the code generation is shown in figure 2.3.

2.3.2 Susan as a Code Generator

Susan is a template language used by the OpenModelica Compiler. Its purpose is to allow easy-to-use text generation from MetaModelica structures.

A Susan file consists of several templates that accept some MetaModelica data type and return text. Templates can also use so-called buffers to fill in holes left in the returned text. Templates may be used solely for their effects on buffers and not for the text they return.

See listing 2 for an example of a Susan template. The listing contains a buffer auxFunction and a match on var. The cases of the match return the final result of the entire template. The VARIABLE case has a nested template contextCref to which it passes the auxFunction buffer.

template funArgBoxedDefinition(Variable var)
 "A definition for a boxed variable is always of type modelica_metatype,
  unless it's a function pointer"
::=
  let &auxFunction = buffer ""
  match var
  case VARIABLE(__) then
    'modelica_metatype <% contextCref(name,contextFunction,&auxFunction) %>'
  case FUNCTION_PTR(__) then
    'modelica_fnptr _<%name%>'
end funArgBoxedDefinition;

Listing 2: A snippet in the Susan template language

2.3.3 The DAE representation

The DAE representation is an AST representation that, unlike the previous representation stages, has the object-oriented structures such as class instances and connections simplified and flattened into a single equation system. This flattening is done from the SCode representation by the Inst module. However, MetaModelica data structures are still preserved and constructed at run time. Like the other representations in OpenModelica, it is implemented using MetaModelica data structures such as union types, optionals and lists[3].

A function in DAE can contain various different elements, such as algorithms, equations of different kinds, variables, reinit statements, calls and asserts[15]. This overview will focus on the part implementing the algorithm subset, which is the subset most relevant to the IR implemented in this thesis.

Element and Algorithm union types

A function contains elements of various types, such as algorithm sections, equations of different forms, and variables. These are represented by the Element union type[15]. Described below are the element types most important to this thesis. Although all element types contain a source field of the ElementSource union type containing metadata such as source code line numbers and the classes and instances it belongs to, this field is skipped in these descriptions for brevity.

VAR - This element type represents variables and contains many fields related to names, types, equation flow and connections. The most important ones for this thesis are the component reference and the type field.

ALGORITHM - This element type represents algorithm sections and contains a field of the Algorithm union type, which simply contains a list of statements.

ComponentRef union type

Component references represent hierarchical path names and are typically used for describing variables[15].

CREF_IDENT — This record type represents a non-hierarchical or bottom-level identifier, and contains the name as a string, its type and a list of optional subscripts.

CREF_ITER — This record type is used for iterators, and contains an index used

CREF_QUAL — This record type represents a higher level in a hierarchical path, and contains a component reference to the level below in addition to the data in CREF_IDENT.

Absyn.Path union type

While Absyn.Path is strictly part of the Absyn representation definitions, it is frequently used in DAE for externally accessible objects such as functions or union types, and so it is mentioned here.

IDENT — This record type represents a non-hierarchical or bottom-level identifier, and contains the name as a string.

QUALIFIED — This path type represents a higher level in a hierarchical path, and contains the path to the level below in addition to the name string of its level.
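The nesting of the two path variants can be illustrated with a small Python model. The dataclasses below mirror only the fields mentioned above; their field names and the path_string helper are hypothetical and are not taken from the OpenModelica sources.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IDENT:
    """Bottom-level identifier."""
    name: str


@dataclass(frozen=True)
class QUALIFIED:
    """Higher level in the hierarchy, holding the path to the level below."""
    name: str
    path: "Path"


Path = Union[IDENT, QUALIFIED]


def path_string(path, separator="."):
    """Render a path by walking from the outermost level to the identifier."""
    parts = []
    while isinstance(path, QUALIFIED):
        parts.append(path.name)
        path = path.path
    parts.append(path.name)
    return separator.join(parts)


p = QUALIFIED("Modelica", QUALIFIED("Math", IDENT("sin")))
```

Here path_string(p) yields the dotted name Modelica.Math.sin, with each QUALIFIED node contributing one level and the final IDENT contributing the last name.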

Statement union type

The statement record types available are assignments of various types and control flow statements such as calls, if statements, loop statements like for and while, when statements, and simple skipping statements like break, continue and return[15]. Described below are the statement types most important to this thesis. Although all statement types, like the element types, contain an ElementSource source field containing metadata, this field is skipped in these descriptions for brevity as well.

STMT_ASSIGN — This statement type describes an assignment and contains the type of the assignment and the expressions of the left and right hand side.

STMT_IF and the Else union type — This statement type describes an if statement and contains the conditional expression, a list of statements to be executed when the condition is true, and a value of the Else union type to describe the behaviour when the condition is false. The value in the Else union type field can be either NOELSE, signifying that nothing is done, ELSEIF, performing another conditional step and having the same fields as a STMT_IF, or ELSE, which simply contains a list of statements to be executed on a false condition.

STMT_FOR — This statement type describes a for(each) statement and contains the type of the iterator, the name of the iterator variable, the range expression to be iterated over and a list of statements executed in the loop body. It also contains a few additional code generation-aiding variables which did not have to be considered in the development of this thesis.

STMT_WHILE — This statement type describes a while statement and contains a conditional expression and a list of statements executed in the loop body.

STMT_NORETCALL — This statement type describes a call not having or storing any return values, and the only field it contains is an expression of the call type described further down.

STMT_BREAK, STMT_CONTINUE and STMT_RETURN — These statement types simply describe break, continue and return statements and do not contain any additional data. Note that value returns in Modelica are done by assignments to designated output variables rather than by return statements; therefore STMT_RETURN does not contain any return values, but simply exits the function.
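The chained structure of STMT_IF and the Else union type can be made concrete with a small Python model. This is a simplification for illustration only: conditions and statements are represented as plain strings, the field names are hypothetical, and the source field is left out, as in the descriptions above.

```python
from dataclasses import dataclass


@dataclass
class NOELSE:
    """Nothing is done when the condition is false."""


@dataclass
class ELSE:
    statements: list


@dataclass
class ELSEIF:
    condition: str
    statements: list
    else_: object  # NOELSE, ELSEIF or ELSE


@dataclass
class STMT_IF:
    condition: str
    statements: list
    else_: object  # NOELSE, ELSEIF or ELSE


def branches(stmt):
    """Flatten an if/elseif/else chain into (condition, statements) pairs.

    A condition of None marks the final unconditional else branch.
    """
    result = [(stmt.condition, stmt.statements)]
    node = stmt.else_
    while isinstance(node, ELSEIF):
        result.append((node.condition, node.statements))
        node = node.else_
    if isinstance(node, ELSE):
        result.append((None, node.statements))
    return result


stmt = STMT_IF("x > 1", ["a"], ELSEIF("x > 0", ["b"], ELSE(["c"])))
```

A consumer such as a code generator would typically walk this chain in the same way, following ELSEIF links until it reaches either ELSE or NOELSE.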

Type union type

This union type represents the data types used in DAE[15].

T_INTEGER, T_REAL, T_STRING and T_BOOL — These types simply represent the basic data types in Modelica, i.e. integers, reals, strings and booleans.

T_NORETCALL — This type represents the return value of a call without output variables.

T_TUPLE — This type represents tuples as returned from functions with multiple output values, and contains a list of types indicating the type of each tuple element and an optional list of tuple field names as strings.

T_METALIST — This type represents MetaModelica lists and contains a type field indicating the type of its elements.

T_METATUPLE — This type represents MetaModelica tuples and contains a list

T_METAOPTION — This type represents MetaModelica optionals, indicating the type of the element when it contains a value.

T_METAUNIONTYPE — This type represents MetaModelica union types.

T_METARECORD — This type represents MetaModelica records, and contains an Absyn.Path to the union type, an Absyn.Path to the record, a list containing the type of each field, the constructor ID for the record, a list of the Var components of each field, and a boolean indicating if the record type is a singleton.

T_METAARRAY — This type represents MetaModelica arrays and contains a type field indicating the type of its elements.

T_METABOXED — This type represents MetaModelica boxed values.

Exp union type

This union type represents the expression types that can be used in DAE such as literals, operators, variable references and calls[15].

ICONST, RCONST, SCONST and BCONST — These expression types simply represent constants of the basic Modelica data types, i.e. integers, reals, strings and booleans. Their sole field is the constant value.

CREF — This expression type represents a variable reference and contains a component reference field and the type of the variable.

BINARY and UNARY — These expression types represent binary or unary arithmetic operations and contain one or two subexpressions and an Operator value denoting the operation to be performed.

LBINARY and LUNARY — These expression types represent binary or unary logical operations such as and, not, and or. Similar to the arithmetic operations, they contain one or two subexpressions and an Operator value denoting the operation to be performed.

RELATION — This expression type represents comparisons. Apart from having two subexpressions and an Operator value like other binary operations, it has some additional fields for model simulation handling which are not considered here.

IFEXP — This expression type represents an if expression and contains three subexpressions: one for the condition, and one each for the true and false case.

CALL — This expression type represents a call and contains the name of the function, a list of subexpressions denoting the arguments and a special CallAttributes field storing various additional data about the call. Some of the data stored in CallAttributes are the type of the return value, whether the function call returns multiple values as a tuple, whether the call is to a built-in function, and whether the call is inline or a tail call.

RANGE — This expression type represents numeric ranges, is typically used in for statements, and contains the type of the numeric values, the start value, the end value and optionally the step between each value, which is 1 if not specified.

CAST — This expression type represents a type cast and contains the type the value is cast to and a subexpression representing the value being cast.

TSUB — This expression type represents tuple subscripts and contains the subexpression to be subscripted, the integer index, and the type of the returned value.

ASUB — This expression type represents array subscripts and contains the subexpression to be subscripted and a list of integer indexes with each value representing a different array dimension.

RSUB — This expression type represents record value accesses and contains the subexpression of the record, the integer offset of the field, the name of the field, and the type of the returned value.

LIST — This expression type represents a MetaModelica list literal or a nil node and contains a list of subexpressions denoting each element stored in the list.

CONS — This expression type represents a MetaModelica list node and contains two subexpressions denoting the head and tail of the list node.

META_TUPLE — This expression type represents a MetaModelica tuple node and contains a list of subexpressions denoting each element stored in the tuple.

META_OPTION — This expression type represents a MetaModelica optional and

METARECORDCALL — This expression type represents a MetaModelica record constructor and contains the path to the record, the arguments as a list of subexpressions, a list of field names, the record variant number, and a list of types for each field.

MATCHEXPRESSION — This expression type represents match expressions and contains a field of the MatchType union type that can be MATCHCONTINUE or MATCH, a list of subexpressions for the expressions to be matched, a list of local declarations as Element values, a list of cases as MatchCase values, and the type of the match expression. The MatchCase union type is described in more detail below.

BOX — This expression type represents a MetaModelica boxed value and contains a subexpression for the value to be boxed.

UNBOX — This expression type represents the unboxing of a MetaModelica boxed value and contains a subexpression for the value to be unboxed and a type field indicating the type of the unboxed value.

PATTERN — This expression type represents various patterns as used in match statements. Its sole value is of the Pattern union type described in more detail below.

MatchCase union type

This union type represents a single case in a match expression and contains a single variant record type, CASE. It contains a list of patterns of the Pattern union type, an optional guard subexpression, a list of local declarations as elements, a case body as a list of statements, an optional case return subexpression, and some source-code related metadata[15].

Pattern union type

This union type represents patterns used in match expressions, and can also be recursive like expressions[15].

PAT_WILD — This pattern type represents a wildcard that accepts all values

PAT_CONSTANT — This pattern type matches various literals like numerals, strings, empty list, and NONE. The record contains the expression and optionally a type used for unboxing the value.

PAT_AS — This pattern type allows binding the entire value to a name while continuing to match on its contents, such as listVar as _::tailVar, and contains an identifier, an optional type for unboxing, some attributes of the identifier, and the pattern that will be matched.

PAT_META_TUPLE — This pattern type matches the content of a tuple and contains a list of patterns, one for each element.

PAT_CONS — This pattern type represents a linked list node and contains two subpatterns representing the head and tail of the list.

PAT_CALL — This pattern type matches a union type constructor and contains a name, the index of the matched record within its union type, the patterns for each record attribute, a list of variables for each attribute, a list of types, and a boolean indicating if the union type is known to be a singleton.

PAT_SOME — This pattern type represents an optional with a SOME value and contains a subpattern for the actual value.
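How such recursive patterns could be evaluated against a value is sketched below in Python. The sketch is not the OpenModelica implementation: it covers only a subset of the pattern types (PAT_CALL is omitted), drops the type and attribute fields, and uses hypothetical field names together with hypothetical Cons and Some classes for list nodes and SOME values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cons:
    """List node value; None stands for the empty list."""
    head: object
    tail: object


@dataclass(frozen=True)
class Some:
    """SOME(x) value; None doubles as NONE() in this sketch."""
    value: object


@dataclass(frozen=True)
class PAT_WILD:
    pass


@dataclass(frozen=True)
class PAT_CONSTANT:
    exp: object


@dataclass(frozen=True)
class PAT_AS:
    id: str
    pat: object


@dataclass(frozen=True)
class PAT_META_TUPLE:
    patterns: tuple


@dataclass(frozen=True)
class PAT_CONS:
    head: object
    tail: object


@dataclass(frozen=True)
class PAT_SOME:
    pat: object


def match_pattern(pat, value, bindings):
    """Return True if value matches pat, recording name bindings on the way.

    Bindings made before a later subpattern fails are not undone here.
    """
    if isinstance(pat, PAT_WILD):
        return True
    if isinstance(pat, PAT_CONSTANT):
        return value == pat.exp
    if isinstance(pat, PAT_AS):
        if match_pattern(pat.pat, value, bindings):
            bindings[pat.id] = value
            return True
        return False
    if isinstance(pat, PAT_META_TUPLE):
        return (isinstance(value, tuple)
                and len(value) == len(pat.patterns)
                and all(match_pattern(p, v, bindings)
                        for p, v in zip(pat.patterns, value)))
    if isinstance(pat, PAT_CONS):
        return (isinstance(value, Cons)
                and match_pattern(pat.head, value.head, bindings)
                and match_pattern(pat.tail, value.tail, bindings))
    if isinstance(pat, PAT_SOME):
        return isinstance(value, Some) and match_pattern(pat.pat, value.value, bindings)
    raise TypeError("unsupported pattern")


# The pattern "listVar as _::tailVar" from the PAT_AS description:
pattern = PAT_AS("listVar", PAT_CONS(PAT_WILD(), PAT_AS("tailVar", PAT_WILD())))
value = Cons(1, Cons(2, None))
bindings = {}
matched = match_pattern(pattern, value, bindings)
```

After the call, bindings maps listVar to the whole list and tailVar to its tail, which mirrors how the as operator binds a value while matching continues on its contents.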


Method

3.1 Design

During the design phase, different IR designs and existing IR solutions of notable compilers were evaluated and compared in order to create an initial IR design. The evaluation focused on extensibility, the ability to implement optimizations, and ease of implementation with regard to conversions from the AST and to the back-end code, with special focus on easy conversion to SSA-based back ends such as LLVM.

The code base of the OpenModelica compiler and its corresponding documentation were also investigated in order to make good design decisions.

3.2 Implementation

The implementation roughly consists of three parts: one phase converting the DAE representation to the new IR, one optimization phase where the generated IR is improved in some respect, and another one converting the new IR to compilable C code. MetaModelica was used as the programming language for the implementation, since this language is used by the rest of the compiler.

3.3 Performance evaluation

During the evaluation phase, the code quality and performance of the new code generator were compared to the results for the old code generator. These results were then analyzed in order to see how large the differences are between the new representation and its optimizations, and whether the new generator gives an improvement.

The time was measured with the execStat timing module that is built into the OpenModelica compiler. As the execution time of compiled code wasn’t previously measured, this had to be implemented separately with a core change outside the MidCode code base. The test cases were executed multiple times in order to guard against anomalous results, and a result representing the median case was then picked. The input data and exact number of executions were chosen so that the total time would be large enough to be accurately measured while not taking too long to run. The computer used to run the measurements was a laptop with an Intel i7 2630QM (Sandy Bridge) processor.
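The measurement procedure (repeat the workload, time each run, report the median) can be sketched as follows. This is a generic Python illustration and does not use or reproduce the execStat module; median_runtime and its parameters are hypothetical names.

```python
import statistics
import time


def median_runtime(func, repetitions, runs=5):
    """Time `runs` batches of `repetitions` calls and return the median batch time.

    Taking the median rather than the mean limits the influence of
    occasional anomalous runs, e.g. caused by other system activity.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        for _ in range(repetitions):
            func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


elapsed = median_runtime(lambda: sum(range(1000)), repetitions=100, runs=3)
```

The number of repetitions plays the same role as the per-benchmark execution counts: it is chosen large enough that each sample is well above the timer resolution.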

The following benchmark functions were made, which can be seen in appendix A:

fibonacci – Recursive Fibonacci F30 without memoization, executed 100 times

mandelbrot – ASCII Mandelbrot with 1000 iterations returning a linked list of characters, executed 200 times

tak – Takeuchi function tak(18, 12, 6), executed 10000 times

qsort – Quick-sort of a random array of 20000 elements, executed 100 times

The C compiler used for compiling the generated code was GCC 7.2. The optimization setting for the C compilation was changed to -O2 rather than the usual -O0, since it was noticed that the low-level style of the MidCode-generated C code was poorly suited for unoptimized compilation. It was also noted that the partition function in the Quicksort test was tail-call optimized by the original generator, something that has not been implemented in the current MidCode generator.

3.4 Code complexity measurements

The complexity of the different code generators was also measured using the number of lines of code (LOC). This is measured because simpler code generators mean that there is less work in porting the language to another code generator.

According to Nguyen et al., LOC is used widely within industry and literature while being an essential component of several more advanced software complexity measurements[14]. Specifically, we use the number of lines in the file including empty lines, comments, etc. A discussion of how appropriate and relevant this metric is can be found in section 5.3.
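The LOC variant described here, where every physical line counts, amounts to the following minimal Python sketch. The function name count_loc and the sample text are hypothetical; this is not the tool used for the thesis measurements.

```python
def count_loc(text):
    """Count every physical line, including empty lines and comments."""
    return len(text.splitlines())


sample = "template foo()\n\n// a comment\nend foo;\n"
loc = count_loc(sample)
```

Since nothing is filtered out, the result is comparable to a plain line count of the file and depends on formatting style as much as on the amount of logic.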

Both of the target-specific implementations are in the Susan template language. We compare to CodegenCFunctions.tpl, which is closest in functionality. Unfortunately, this is not a precise comparison, since the file chosen for comparison implements more features than our implementation. The old back end also has more template files, such as CodegenC.tpl (see table 4.1), but these mostly implement features that are outside the scope of this thesis.

Chapter 4

Results

4.1 Overview of the MidCode design

MidCode, the resulting IR, represents the control flow of a procedure by the common approach of basic blocks. Each basic block has a terminator which declares what control flow action happens at the end of the block; this may include operations returning values, such as calls. The data flow of the procedure is represented by named variables, compiler-created temporaries and simple unary or binary operations. Unlike in SSA form, named variables can be assigned more than once.

The MidCode related code paths are divided into three phases: “From Modelica to MidCode”, “MidCode Transformations”, and “From MidCode to C”.

4.1.1 IR design details

This part describes the uniontypes and records defined for MidCode and the fields contained within these.

Program

A program is represented by the Program type. This type contains a name and a list of functions.

[Figure: DAE/SimCode representation -> DAEToMid -> MidCode -> MidCode transformations -> MidCode -> MidToC -> C code]

Figure 4.1: Overview of MidCode phases

Function

Functions are represented by the Function type. Each function contains a name as an Absyn.Path, several lists of local, input and output variables, a body represented as a list of basic blocks, and ID references to the special entry and exit basic blocks.

Block

Basic blocks are represented by the Block type. They contain a block ID number, a list of statements and a terminator.

Stmt

Statements are represented by the Stmt type and can be either a NOP or an ASSIGN, which simply assigns the value of an RValue to a Var. A statement always has linear control flow, meaning that execution continues with the next statement, but it may otherwise have various effects.

Var

Variables are represented by the Var type, and are used to represent both variables used by the Modelica code and variables introduced during the translation process. Vars have a name and a data type.
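To make the nesting of these types concrete, the following is a minimal sketch in Python rather than MetaModelica. The field names, the use of plain strings for the data type and the Absyn.Path, the tuple stand-ins for RValue and Terminator, and the `block` lookup helper are all assumptions made for illustration; they are not the compiler's actual definitions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Var:
    name: str
    ty: str  # data type, simplified to a string in this sketch


@dataclass(frozen=True)
class Nop:
    pass


@dataclass(frozen=True)
class Assign:
    dest: Var
    rvalue: object  # stand-in for the RValue type


Stmt = Union[Nop, Assign]


@dataclass
class Block:
    id: int
    stmts: List[Stmt]
    terminator: object  # stand-in for the Terminator type


@dataclass
class Function:
    name: str  # an Absyn.Path in the compiler; a dotted string here
    locals: List[Var]
    inputs: List[Var]
    outputs: List[Var]
    body: List[Block]
    entry_id: int
    exit_id: int

    def block(self, block_id: int) -> Block:
        """Look up a basic block by its ID (hypothetical helper)."""
        return next(b for b in self.body if b.id == block_id)


@dataclass
class Program:
    name: str
    functions: List[Function]


# A function that copies its input to its output and then returns.
x, y = Var("x", "Integer"), Var("y", "Integer")
f = Function(
    name="Example.copy",
    locals=[], inputs=[x], outputs=[y],
    body=[Block(0, [Assign(y, ("MOVE", x))], ("GOTO", 1)),
          Block(1, [Nop()], ("RETURN",))],
    entry_id=0, exit_id=1,
)
program = Program("Example", [f])
```

Referring to the entry and exit blocks by ID rather than by object keeps the body a flat list, which is the shape described above.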


OutVar

Since output variables can be thrown away by the caller, lists of output variables in call statements contain the OutVar type rather than the plain Var type. Instances of this type can be either an OUT_VAR containing an actual Var instance, or OUT_WILD, indicating that the caller will not save the value.
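As an illustration of the distinction, the sketch below binds the results of a call to a list of output slots. The tuple encoding and the names `out_var` and `bind_outputs` are hypothetical and exist only for this example; it roughly corresponds to a Modelica call of the form `(q, _) := f(...)`, where the second result is discarded.

```python
OUT_WILD = ("OUT_WILD",)  # the caller does not save this output


def out_var(name):
    """Build an OUT_VAR slot that stores the result in the named variable."""
    return ("OUT_VAR", name)


def bind_outputs(out_vars, results, env):
    """Store each call result in env unless its slot is OUT_WILD."""
    for out, value in zip(out_vars, results):
        if out[0] == "OUT_VAR":
            env[out[1]] = value
    return env


# Two results are produced, but only the first is kept.
env = bind_outputs([out_var("q"), OUT_WILD], (3, 1), {})
```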

RValue

An RValue is a value that can be placed on the right side of an assignment. The RValue type in MidCode contains a few expressions, such as the addition of two Vars and the negation of a Var. They appear in MidCode as part of assign statements. RValues do not have other RValues as operands; instead, temporary variables are created during the translation process and are then used as operands.

UNARYOP — A UNARYOP is a constructor of the RValue union representing operations with a single operand, i.e. a single Var. UNARYOP has variants for copying the unchanged value, negating, logically inverting, boxing and unboxing a variable. The operation to choose is determined by an enumeration value.

BINARYOP — A BINARYOP is a constructor of the RValue union representing operations with two operands, i.e. two Vars. BINARYOP has several variants representing common operations like addition, subtraction, division, multiplication, logical or/and, and comparisons. The operation to choose is determined by an enumeration value.

Literal value constructors — A group of constructors of the RValue union representing literal values. The LITERALINTEGER constructor represents integer literals, the LITERALREAL constructor represents real (floating-point) literals, the LITERALBOOLEAN constructor represents boolean values, and the LITERALSTRING constructor represents string literals. The more complex meta object literals used for records, linked lists, optionals and tuples are represented by the LITERALMETATYPE constructor.

Meta object data accessors — A group of constructors that are used for accessing data about meta objects. The METAFIELD constructor returns a value from a meta object slot and is used for accessing record and tuple fields. There are also three constructors specifically made for pattern matching: UNIONTYPEVARIANT, returning the value of the record variant for uniontypes; ISCONS, for checking if a linked list node is cons or nil; and ISSOME, for checking if an optional has a value.
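Because RValues only take Vars as operands, a nested source expression has to be flattened into a sequence of assignments to temporaries. The following minimal sketch shows one way this can be done; the tuple encoding of expressions, the `_tmp` naming scheme and the `lower` function are assumptions for illustration and do not reproduce the DAEToMid implementation.

```python
from itertools import count


def lower(expr, stmts, fresh):
    """Flatten a nested expression into ASSIGN-like (dest, rvalue) pairs.

    Returns the name of the variable that holds the value of expr.
    """
    if isinstance(expr, str):  # already a variable, nothing to emit
        return expr
    if isinstance(expr, (bool, int, float)):  # literal: bind it to a temporary
        if isinstance(expr, bool):  # bool must be tested before int
            kind = "LITERALBOOLEAN"
        elif isinstance(expr, int):
            kind = "LITERALINTEGER"
        else:
            kind = "LITERALREAL"
        tmp = f"_tmp{next(fresh)}"
        stmts.append((tmp, (kind, expr)))
        return tmp
    op, *args = expr
    # Operands are lowered first, so every operand is a plain variable name.
    operands = [lower(arg, stmts, fresh) for arg in args]
    kind = "UNARYOP" if len(operands) == 1 else "BINARYOP"
    tmp = f"_tmp{next(fresh)}"
    stmts.append((tmp, (kind, op, *operands)))
    return tmp


# a + b * (-c)
stmts = []
result = lower(("ADD", "a", ("MUL", "b", ("NEG", "c"))), stmts, count(1))
for dest, rvalue in stmts:
    print(dest, ":=", rvalue)
```

In this sketch the expression `a + b * (-c)` becomes three assignments, one per operation, and no emitted RValue contains another RValue.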


Terminator

Each basic block has a terminator controlling the control flow following the block, which is represented by the Terminator type. Terminators have effects and can cause branching and/or exceptional control flow.

GOTO — The GOTO terminator simply jumps to a given block.

RETURN — The RETURN terminator simply exits the procedure.

BRANCH — The BRANCH terminator jumps to one of two given blocks depending on whether the given condition variable is true or false, and is used by several terminator types.

SWITCH — The SWITCH terminator jumps to one of multiple given blocks in a dictionary depending on the value of the given condition variable. This is used when generating code for match statements.

CALL — The CALL terminator is a function call to another Modelica function. Since it can cause control flow via exceptions (for example through the fail function), it is defined as a terminator rather than a statement.

LONGJMP, PUSHJMP and POPJMP — The LONGJMP terminator causes a control flow transfer to the active PUSHJMP call site, even across function boundaries. The PUSHJMP terminator is used to add a new active location for LONGJMP, while the POPJMP terminator is used to deactivate the corresponding active PUSHJMP and cause the previously called one to become active.

ASSERT and TERMINATE — The ASSERT terminator aborts the program with an error message if a Var containing a condition result has a false value. The TERMINATE terminator simply unconditionally aborts with an error message. The error message for both terminators is given by a Var.
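The interplay of the terminators can be illustrated with a small interpreter over basic blocks. This is a minimal sketch covering only GOTO, BRANCH, RETURN, PUSHJMP, POPJMP and LONGJMP within a single function. The tuple layouts, the operand order of PUSHJMP (handler block, then next block), and the choice that LONGJMP leaves the handler active until a POPJMP is reached are assumptions for illustration, not the layout used in the compiler.

```python
def run(blocks, entry, env):
    """Interpret blocks (id -> (stmts, terminator)) until RETURN is reached."""
    handlers = []  # stack of handler block IDs registered by PUSHJMP
    current = entry
    while True:
        stmts, term = blocks[current]
        for dest, compute in stmts:  # every statement here is an ASSIGN
            env[dest] = compute(env)
        kind = term[0]
        if kind == "GOTO":
            current = term[1]
        elif kind == "BRANCH":
            _, cond, on_true, on_false = term
            current = on_true if env[cond] else on_false
        elif kind == "PUSHJMP":
            _, handler, next_block = term
            handlers.append(handler)  # handler becomes the active location
            current = next_block
        elif kind == "POPJMP":
            handlers.pop()  # the previously pushed handler is active again
            current = term[1]
        elif kind == "LONGJMP":
            if not handlers:
                raise RuntimeError("LONGJMP without an active PUSHJMP")
            current = handlers[-1]
        elif kind == "RETURN":
            return env
        else:
            raise ValueError(f"unknown terminator {kind}")


# x := n div d, falling back to x := -1 when d is zero.
blocks = {
    0: ([], ("PUSHJMP", 2, 1)),
    1: ([("is_zero", lambda e: e["d"] == 0)], ("BRANCH", "is_zero", 3, 4)),
    2: ([("x", lambda e: -1)], ("POPJMP", 5)),  # handler block
    3: ([], ("LONGJMP",)),
    4: ([("x", lambda e: e["n"] // e["d"])], ("POPJMP", 5)),
    5: ([], ("RETURN",)),
}
ok = run(blocks, 0, {"n": 10, "d": 2})
failed = run(blocks, 0, {"n": 10, "d": 0})
```

Both the normal path and the handler path end with POPJMP, so the handler stack is balanced when block 5 is reached.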

4.1.2 From Modelica to MidCode

MidCode is designed to represent interesting low-level properties uniformly, which means that we need to lower several high-level Modelica representations into a composition of MidCode constructs. The DAEToMid phase takes Modelica functions as given from the SimCode module and converts them to MidCode. The most important fields in a SimCode function object are its name as an Absyn.Path, its variable definitions, and its list of DAE statements.