Linköpings universitet SE–581 83 Linköping Master thesis, 30 ECTS | Datateknik 2018 | LIU-IDA/LITH-EX-A--18/014--SE
Efficient IR for the
OpenModelica Compiler
Effektiv IR för OpenModelica-kompilatorn
Patrik Andersson Simon Eriksson
Supervisor : Martin Sjölund Examiner : Peter Fritzson
Copyright
The publishers will keep this document online on the Internet (or its possible replacement) for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.
©
Patrik Andersson Simon Eriksson
Abstract
The OpenModelica compiler currently generates code directly from a syntax tree representation, which leads to inefficient code in several cases. This thesis work introduces a lower-level intermediate representation for the compiler which aims to simplify the compiler back end and enable more optimizations. The resulting design of the representation features flat primitive operations and control flow using basic blocks and terminators. Variables are mutable, unlike in SSA-based representations. Introducing the IR did not significantly change the runtime performance of the test programs. The number of lines of code was reduced to a quarter compared to the old back end. This reduction, together with the simpler representation, will help future work on optimization passes and on implementing an LLVM-based back end.
Contents
Abstract
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Aim
  1.4 Research questions
  1.5 Delimitations
2 Theory
  2.1 Compilers
  2.2 The Modelica language
  2.3 The OpenModelica environment
3 Method
  3.1 Design
  3.2 Implementation
  3.3 Performance evaluation
  3.4 Code complexity measurements
4 Results
  4.1 Overview of the MidCode design
5 Discussion
  5.1 Performance results
  5.2 Design of MidCode
  5.3 Code Complexity of MidCode
  5.4 Related Work
  5.5 The work in a wider context
6 Conclusion
  6.1 Future work
A Performance test functions
  A.1 Fibonacci
  A.2 Mandelbrot
  A.3 Quicksort
  A.4 Takeuchi function
List of Figures
2.1 Compilation of Haskell in GHC
2.2 Overview of translation phases in the OpenModelica compiler
2.3 Overview of the OpenModelica components
4.1 Overview of MidCode phases
List of Tables
2.1 Available LVALUE types in MIR
2.2 Available RVALUE types in MIR
2.3 Available terminators in MIR
4.1 Lines of code for corresponding parts of old back end
4.2 Lines of code for new back end
1 Introduction
1.1 Background
OpenModelica is an open-source modeling and simulation environment developed mainly by the non-profit Open Source Modelica Consortium (OSMC) that implements the open standard Modelica modeling language. Modelica is a declarative equation-based language designed for describing various complex and dynamic systems and can be used for simulating, for example, mechanical, electrical, hydraulic and process-oriented systems. OpenModelica is mainly targeted towards industrial and academic purposes.
1.2 Motivation
Currently, the C code generated by the OpenModelica compiler is inefficient in many cases, which causes significant performance issues, especially since the data inputs are often large. One major reason is that the code is generated directly from a high-level, syntax tree-based representation of the Modelica code[2].
A better solution would be to convert this representation to a lower-level intermediate representation (IR) more suitable for optimization and code generation before actually generating the code. Later on, implementing a stage that converts the new lower-level IR to a more common representation, for example LLVM, would be feasible.
1.3 Aim
The aim of this thesis is to design and implement a new efficient and maintainable IR solution for OpenModelica. The new IR stage should be able to compile various test programs with roughly equal run-time performance to the old code generation while simplifying the back-end code generation and enabling various useful lower-level transformations in the future. The new code generation should be easy to extend with more lower-level optimizations and new back ends. In particular, a future LLVM-based back end would be interesting in the long term.
1.4 Research questions
1. How can a new IR help the implementation of optimizations and other code transformations?
2. Which IR design choices (of some common alternatives) are most suitable for OpenModelica?
3. How can this new IR be implemented in the OpenModelica compiler?
4. How much of the back end can be moved to a shared portable format, leaving target-specific implementations simpler?
5. How will the new IR affect the run-time performance of OpenModelica?
1.5 Delimitations
This project will focus on evaluating some common IR approaches and on implementing the new IR and its corresponding C code generator. The IR approaches to evaluate should be low-level but platform-independent and well suited for transforming to LLVM. Alternative special-purpose C code generator variants (for example for parallelization or embedded devices) will not be considered.
While Modelica has many language features specific to simulation, such as equation-based models and model connections, the project will focus on implementing Modelica functions, their algorithm feature subset, and the MetaModelica extension, which provides various features common in functional programming but not included in the Modelica standard, such as pattern matching, tagged unions and linked lists. This part of Modelica is closer to general-purpose programming and therefore easier to compare to other languages and their corresponding IR solutions, as well as less complex. The Modelica support for multi-dimensional arrays was deemed too complex and time-consuming, and was therefore skipped.
While performance improvements at this stage are of course desirable, the main focus is on creating a solution that can later be extended with new optimizations and back ends.
2 Theory
2.1 Compilers
Modern compilers are commonly structured as a pipeline of several phases, each taking a structure, transforming it, and sending it to the next phase. This pipeline can be divided into a front end, which parses and analyzes the source code, and a back end, which takes the structure produced by the front end and converts it into an executable program[1].
Some common operations of the front end are lexical analysis (converting the source text to more easily parsable tokens), syntax analysis (checking if the syntax is correct and converting the token stream into an abstract syntax tree), and semantic analysis (e.g. checking that the types are correct), while some common tasks of the back end are optimization and code generation[1].
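As a minimal illustration of the first front-end phase, the following Python sketch converts source text into tokens. The token classes and the `tokenize` function are hypothetical and cover only a tiny expression language; they are not the lexical grammar of Modelica or of any particular compiler.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Convert source text into a list of (token class, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

A syntax analyzer would then consume this token list and build the abstract syntax tree.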
2.1.1 Optimization
In order to improve the performance, size and/or power consumption of a generated program, compilers may attempt to optimize the generated code rather than producing the most obvious conversion of the source code. Optimizations must give the same results as the unoptimized version and provide sufficiently high performance improvements, while being fast enough to still give acceptable compilation times[1].
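One simple optimization that satisfies the requirement of preserving results is constant folding. The sketch below is an illustrative assumption rather than code from any real compiler: expressions are nested tuples `(op, left, right)`, and the names `fold` and `evaluate` are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, env):
    """Reference semantics: evaluate an expression given variable values."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(expr, str):  # variable name
        return env[expr]
    return expr  # integer constant


def fold(expr):
    """Replace subexpressions whose operands are all constants by their value."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)
```

For example, `fold(("+", ("*", 2, 3), ("*", "x", ("-", 10, 4))))` gives `("+", 6, ("*", "x", 6))`, which evaluates to the same value as the original expression for every value of `x`.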
2.1.2 Intermediate representations
Intermediate representations are internal forms of a computer program created and used by a compiler in order to aid the compilation process. An intermediate representation lies somewhere between the original source and the compiled target and can be at different levels depending on its area of use; it can be high-level (close to the original source code), low-level (close to the target language) or something in between. The data structures used can also vary; they can for example be a graph, a tree or a linear list. Compilers often have multiple intermediate representations in their pipelines, with each IR serving a different purpose and each phase converting the program to a lower IR form[19].
One major advantage of having intermediate representations is that the compiler can more easily be retargeted to new source languages or new target platforms and reuse independent components for those. Instead of having to write one compiler for every source/target combination, developers can just add a single front end in order to support one source language, and a single back end in order to support one target platform[19][1].
Common IR designs
One common linear representation of operations is three-address code, where each operation is binary and represented by two source variables, one destination variable and an operation type. These operations can be stored in memory as records called quadruples, storing the three variables and the operation type. A variable can be either a named variable, a constant or a compiler-generated temporary variable. Unary operations may be defined as having just a single source variable. Expression trees are flattened by storing intermediate operations in temporary variables which are then used as source variables in later expressions. Temporary variables are usually given unique names and not shared between different intermediate operation results. Special call, jump and conditional operations can be implemented for representing control flow[1].
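The flattening of an expression tree into quadruples can be sketched as follows. This is a minimal Python illustration under assumed conventions: expressions are nested tuples `(op, left, right)`, quadruples are `(op, src1, src2, dest)`, and the temporary naming scheme `t1`, `t2`, ... is hypothetical.

```python
import itertools


def to_three_address(expr):
    """Flatten a nested (op, left, right) tuple into a list of quadruples.

    Returns the quadruples and the name (or constant) holding the final result.
    """
    quads = []
    counter = itertools.count(1)

    def visit(node):
        if not isinstance(node, tuple):
            return node  # named variable or constant: usable directly as a source
        op, left, right = node
        src1 = visit(left)
        src2 = visit(right)
        dest = f"t{next(counter)}"  # fresh temporary, never reused
        quads.append((op, src1, src2, dest))
        return dest

    result = visit(expr)
    return quads, result
```

For the expression `a * b + c * 2`, written as `("+", ("*", "a", "b"), ("*", "c", 2))`, this produces the three quadruples `("*", "a", "b", "t1")`, `("*", "c", 2, "t2")` and `("+", "t1", "t2", "t3")`.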
One control flow representation that is frequently used is the control flow graph (CFG). The instructions are partitioned into basic blocks, which are instruction sequences where the control flow can only enter through the first instruction and exit through the last, where various jumping constructions can be chosen. The basic blocks are then represented as nodes in the graph and execution paths as directed edges between the blocks. Each basic block should preferably contain as many instructions as possible without violating these rules. The instructions used in basic blocks are primitive and in three-address form. This representation simplifies many analyses and is therefore useful for performing many optimizations[1].
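The partitioning into basic blocks can be sketched in a few lines of Python. The instruction encoding used here (tuples starting with `"label"`, `"assign"`, `"jump"`, `"branch"` or `"return"`) and the function names are assumptions made for this illustration only.

```python
TERMINATORS = {"jump", "branch", "return"}


def split_into_blocks(instructions):
    """Start a new block at every label and after every control-flow instruction."""
    blocks, current = [], []
    for instr in instructions:
        if instr[0] == "label" and current:
            blocks.append(current)  # a label can be a jump target: close the block
            current = []
        current.append(instr)
        if instr[0] in TERMINATORS:
            blocks.append(current)  # control leaves the block here
            current = []
    if current:
        blocks.append(current)
    return blocks


def successors(blocks):
    """Compute the CFG edges as a mapping from block index to successor indices."""
    index_of = {b[0][1]: i for i, b in enumerate(blocks) if b[0][0] == "label"}
    edges = {}
    for i, block in enumerate(blocks):
        last = block[-1]
        if last[0] == "jump":
            edges[i] = [index_of[last[1]]]
        elif last[0] == "branch":
            edges[i] = [index_of[last[2]], index_of[last[3]]]
        elif last[0] == "return":
            edges[i] = []
        else:  # no explicit terminator: fall through to the next block
            edges[i] = [i + 1] if i + 1 < len(blocks) else []
    return edges


# A loop computing a product, similar to an imperative factorial.
program = [
    ("assign", "acc", 1),
    ("assign", "x", 2),
    ("label", "loop"),
    ("branch", "x <= i", "body", "done"),
    ("label", "body"),
    ("assign", "acc", "acc * x"),
    ("assign", "x", "x + 1"),
    ("jump", "loop"),
    ("label", "done"),
    ("return",),
]
```

Here `split_into_blocks(program)` yields four blocks, and `successors` shows the loop as an edge from the body block back to the block that tests the loop condition.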
Another complementary representation for tracking data is static single assignment (SSA), which differs from three-address code in that each variable cannot be assigned more than once in a function, which simplifies data flow analysis significantly. This gives SSA the important property of referential transparency, meaning that a reference can be replaced with its definition and therefore that the variable values are independent of the order in which their statements are listed. Referential transparency also allows a computation to be replaced by its result, which enables well-known transformations like common subexpression elimination. As such, compilers that perform data flow analysis can do a conversion pass over the non-SSA representation and produce an SSA-based representation that is easier to reason about. SSA is used together with a basic block structure and uses a special φ (phi) function when execution paths merge, which takes a list of source variables and assigns one of them to a new variable depending on the previously executed block. The transformations can in general be made with other methods, but SSA has the advantage of being both intuitive and efficient, allowing more optimizations to be implemented more easily while also enabling fast compilation times, often fast enough that it can be used in just-in-time compilers[17].
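For a single basic block, conversion to SSA amounts to giving each assignment a fresh version of its destination and rewriting later uses accordingly. The sketch below covers only this straight-line case, with a hypothetical `name_version` naming scheme; placing φ functions where execution paths merge requires further analysis that is not shown. The `phi` helper only illustrates the selection semantics.

```python
def to_ssa(statements):
    """Rename variables in one basic block so each name is assigned exactly once.

    statements is a list of (dest, op, src1, src2) tuples.
    """
    version = {}

    def use(name):
        if isinstance(name, str) and name in version:
            return f"{name}_{version[name]}"
        return name  # constant, or a variable defined outside this block

    renamed = []
    for dest, op, src1, src2 in statements:
        src1, src2 = use(src1), use(src2)  # sources refer to the previous version
        version[dest] = version.get(dest, 0) + 1
        renamed.append((f"{dest}_{version[dest]}", op, src1, src2))
    return renamed


def phi(previous_block, incoming):
    """Select the incoming value that belongs to the previously executed block."""
    return incoming[previous_block]
```

For example, the statements `x := a + 1; x := x * 2; y := x + x` become `x_1 := a + 1; x_2 := x_1 * 2; y_1 := x_2 + x_2`.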
Example: GCC
One example of IR usage is GCC, which has two intermediate representations called GENERIC and GIMPLE. GENERIC represents a function and its statements as a tree structure, while GIMPLE is a subset of GENERIC reduced by a process called gimplification and used in the optimization stage. These representations are both independent of the programming language used[12].
Example: LLVM
LLVM is an IR that is, among other things, used in the Clang compiler for C. LLVM is also used as a back end by other compilers, for example those for Rust, Swift and Haskell (GHC). These compilers use LLVM for optimizations and code generation. LLVM also aims to be a portable format by supporting several targets for code generation. LLVM uses a basic block and terminator model for control flow. The instructions are in three-address form with static single assignment variables.
Example: Rust and MIR
The official compiler for the Rust programming language has added a new IR named MIR (Mid-level Intermediate Representation) between its high-level AST and its low-level LLVM code generation, whereas previously the LLVM code was generated directly from the AST. The design of MIR is based on primitive three-operand statements and basic blocks with terminators. This design makes translation to LLVM, which also uses primitive operations and control flow representations, relatively simple[10].
Some of the main goals of MIR in Rust are improving compilation time by having more efficient data structures, enabling more Rust-specific optimizations, reducing redundancy in the code base and making optimizations and other transformations easier to work and reason with in general[10].
One notable difference from LLVM is that MIR is not SSA-based, i.e. it allows multiple assignments to the same variable, and named variables are kept as-is. However, generated temporaries are still typically single-assignment. As more advanced optimizations relying on SSA representations are typically done by LLVM, an lvalue-based representation rather than SSA is considered sufficient for this purpose[10][11].
The complete Rust language, including its various syntactic sugar constructions, is reduced to a small subset that is easier to work with: redundant variations of a single low-level feature, which previously all had to be handled separately, are now represented by fewer variants, meaning that the analyses have fewer cases to handle. Where control flow analyses were previously done on separate control-flow graphs that had to be generated from the AST, they can now be done directly on the MIR representation. The Rust safety analyses are also more accurate, since the lower-level nature of MIR makes the difference between the analyzed structure and the final code smaller. Rust-specific optimizations can be done directly as a separate stage, whereas they were previously often done during conversion to LLVM, adding unwanted complexity to this conversion phase. Apart from simplifying LLVM generation, MIR also adds potential for adding other low-level back ends in the future[11].
The MIR data structure describes the workings of a single function and contains a control-flow graph stored as a list of basic blocks, a list of compiler-generated temporary variables, and a list of user-declared variables. A single basic block contains a list of statements and a terminator, which describes the control-flow action that occurs at the end of the basic block execution. A statement can either be a variable assignment or a drop (deallocation) of a variable, which is described explicitly, unlike in the source language. An assignment statement contains an rvalue for the right-hand side and an lvalue for the left-hand side.
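The structure described above can be modelled with a few record types. The following Python sketch is only an illustration of the shape of the data: the class and field names are hypothetical and do not correspond to the actual types in the Rust compiler, and lvalues, rvalues and terminators are simplified to strings and tuples.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Assign:
    lvalue: str  # left-hand side
    rvalue: str  # right-hand side, kept as text in this sketch


@dataclass
class Drop:
    lvalue: str  # explicit deallocation of a variable


@dataclass
class BasicBlock:
    statements: List[Union[Assign, Drop]]
    terminator: tuple  # e.g. ("GOTO", 3), ("IF", "tmp0", 1, 2) or ("RETURN",)


@dataclass
class MirFunction:
    basic_blocks: List[BasicBlock]  # the control-flow graph
    temporaries: List[str] = field(default_factory=list)
    variables: List[str] = field(default_factory=list)


def successors(block):
    """Indices of the basic blocks that the terminator can transfer control to."""
    kind = block.terminator[0]
    if kind == "GOTO":
        return [block.terminator[1]]
    if kind == "IF":
        return list(block.terminator[2:])
    return []  # RETURN


# An absolute-value function: ret = if x < 0 then -x else x.
absolute = MirFunction(
    basic_blocks=[
        BasicBlock([Assign("tmp0", "x < 0")], ("IF", "tmp0", 1, 2)),
        BasicBlock([Assign("ret", "-x")], ("GOTO", 3)),
        BasicBlock([Assign("ret", "x")], ("GOTO", 3)),
        BasicBlock([], ("RETURN",)),
    ],
    temporaries=["tmp0"],
    variables=["x", "ret"],
)
```

Note that `ret` is assigned in two different blocks, which is allowed because the representation is not SSA-based.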
An lvalue can be a variable of different kinds, such as a named, temporary, argument or return variable, a field in a struct or tuple, a pointer dereference, an array index, or an enum downcast[11], see table 2.1.
LVALUE form            Description
B                      User-declared variable binding
TEMP                   Compiler-generated temporary
ARG                    Function argument
RETURN                 Return value
LVALUE.f               Struct or tuple field
*LVALUE                Pointer dereference
LVALUE[LVALUE]         Array index
(LVALUE as VARIANT)    Enum downcast
Table 2.1: Available LVALUE types in MIR
An rvalue symbolizes an expression and can be the use of an lvalue, a mutable or immutable reference, a cast, a constant, a literal of a struct or built-in container type, the length of an object, or a common simple binary or unary operation[11], see table 2.2. Most rvalue operations only take lvalues as arguments, meaning that constants and data structure literals can only be used through temporary variables. The special BOX value represents the memory allocation function taking the struct constructor method as its sole argument, and is used in the MIR call representation just like other functions[11].
Terminators in MIR can jump to another basic block with or without stack unwinding, jump to one of two specified basic blocks depending on the truth value of a variable, jump to one basic block from a list depending on the value of a variable, call a function and afterwards jump to one of two basic blocks depending on whether the function succeeded or failed, or simply return from the function with or without stack unwinding[11], see table 2.3.
RVALUE form                  Description
Use(LVALUE)                  Value of LVALUE
[LVALUE; LVALUE]             Array literal of specified size with the same defined value for all cells
&'REGION LVALUE              Reference to LVALUE
&'REGION mut LVALUE          Mutable reference to LVALUE
LVALUE as TYPE               Cast
LVALUE <BINOP> LVALUE        Binary operation
<UNOP> LVALUE                Unary operation
Struct { f: LVALUE0, ... }   Struct literal
(LVALUE...LVALUE)            Tuple literal
[LVALUE...LVALUE]            Array literal
CONSTANT                     Constant
LEN(LVALUE)                  Length of LVALUE
BOX                          Memory allocation function for box operator
Table 2.2: Available RVALUE types in MIR
Example: Swift and SIL
Similar to the Rust compiler, the official Swift compiler has also added a new mid-level IR between the AST and the generated LLVM code, named SIL (Swift Intermediate Language). Unlike MIR, SIL is SSA-based, but replaces the phi node concept with arguments to basic blocks that are set by the terminators jumping to that block. Within the block, the argument variables work like typical source variables. As with MIR, literals have to be saved in temporaries before they can be used in operations. Calls are implemented differently in SIL: while MIR implements calls as terminators, SIL instead implements them as regular statements. Operators are also implemented as calls to built-in functions rather than as special rvalue constructions like in MIR. More low-level memory operations are represented as explicit constructions than in MIR, including heap and stack allocations, memory accesses and reference counting[20].
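The idea of basic block arguments taking the place of phi nodes can be illustrated with a small interpreter. The encoding below is a hypothetical Python sketch and not SIL syntax: a block is a tuple of parameter names, instructions `(dest, builtin, operands)` and a terminator, and operators are calls to built-in functions.

```python
import operator

# Operators are modelled as calls to built-in functions.
BUILTINS = {"sub": operator.sub, "lt": operator.lt, "neg": operator.neg}


def run(blocks, entry, args):
    """Execute a function given as a mapping from block label to block."""
    env = {}
    label, values = entry, list(args)
    while True:
        params, instructions, terminator = blocks[label]
        env.update(zip(params, values))  # bind block arguments (instead of phi)
        for dest, fn, operands in instructions:
            env[dest] = BUILTINS[fn](*(env[name] for name in operands))
        kind = terminator[0]
        if kind == "return":
            return env[terminator[1]]
        if kind == "br":
            _, label, outgoing = terminator
        else:  # "cond_br": each edge carries its own argument list
            _, condition, then_edge, else_edge = terminator
            label, outgoing = then_edge if env[condition] else else_edge
        values = [env[name] for name in outgoing]


# Absolute difference |a - b|; every name is assigned exactly once.
abs_diff = {
    "entry": (["a", "b"],
              [("c", "lt", ["a", "b"]), ("d", "sub", ["a", "b"])],
              ("cond_br", "c", ("negate", ["d"]), ("done", ["d"]))),
    "negate": (["x"], [("n", "neg", ["x"])], ("br", "done", ["n"])),
    "done": (["result"], [], ("return", "result")),
}
```

The block `done` is reached from two predecessors; instead of a phi node choosing between `d` and `n`, each jump passes the value to the block parameter `result`.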
Terminator                                              Description
GOTO(BB)                                                Jump unconditionally to basic block BB
PANIC(BB)                                               Start stack unwinding and jump to basic block BB for cleanup
IF(LVALUE, BB0, BB1)                                    Jump to BB0 if LVALUE is true, otherwise jump to BB1
SWITCH(LVALUE, BB...)                                   Jump to one of the listed basic blocks depending on value of enum LVALUE
CALL(LVALUE0 = LVALUE1(LVALUE2...), BB0, BB1)           Call function referenced in LVALUE1 with arguments in LVALUE2 onwards, store return value in LVALUE0, jump to BB0 if call succeeded or BB1 if it panicked
DIVERGE                                                 Return and unwind stack
RETURN                                                  Return
Table 2.3: Available terminators in MIR
Example: Glasgow Haskell Compiler
One major compiler for the functional language Haskell is the Glasgow Haskell Compiler, GHC, which is an open source project. One of GHC's IRs is Core. While being an IR, Core corresponds well to a simple source-level language which eliminates superfluous ways to express the same language construct[9]. For example, a list comprehension needs to be changed from a native Haskell construct into an expression based on variable bindings and functions in Core.
In Core, case-expressions are also restricted, as they cannot match nested constructors of a value[7]. A case-expression is used to see which member of a union a value contains as well as to access the attributes of the record. Core also flattens expressions by restricting their usage. An argument to a function must be a literal or variable (called atom in the paper), resulting in the dependence of function calls being explicitly ordered by variable bindings.
GHC has another lower-level IR, the Spineless Tagless Graph Reduction Machine, or STG[8]. The difference between STG and Core is that Core is meant to simplify expressions in a functional setting while STG is meant to help simplifications targeted at modern processors. As such it specifies operational semantics, unlike Core. In addition, all type information is lost when transforming Core to STG.
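[Figure: pipeline Parse Tree -> (desugar) -> Core -> (STGify) -> STG -> (CodeGen) -> Cmm, followed by three routes to Assembly: directly through the NCG, through LLVM and the LLVM compiler, or through C and a C compiler]
Figure 2.1: Compilation of Haskell in GHC [9]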
The operational semantics include a stack for arguments, returns, and the implementation of the lazy calling convention. Arguments are pushed when a function application is evaluated and popped when entering closures with arguments. The return entries in the stack are actually not for function returns: since the only evaluation comes from pattern matching, such an entry is for the result of a pattern match. The lazy calling convention is implemented with a stack entry that causes a suspended computation in memory to be overwritten with the value that was computed.
STG also has a heap which contains all allocated values until they are deallocated by garbage collection. An important feature for long-running computations in a lazy language is black holes. When a computation is entered, it is replaced by a black hole, which does not keep any of the computation's references alive, although the ones used while evaluating the computation are still kept alive. This means that if garbage collection is performed while evaluating the black-holed value, more things can be collected. For example, in code for finding the last value of a long linked list, earlier elements can be collected even if garbage collection happens in the middle of evaluation. Additionally, if evaluation tries to evaluate a black hole that it has itself created, an infinite loop has been detected, so an exception can be thrown.
Further down in the compilation pipeline, we find Cmm, which is a processor-portable intermediate language reminiscent of LLVM. Cmm consists of simple control flow between blocks, basic types that reflect machine representation, and an unlimited number of stack-backed variables[21]. Cmm contains no type information except for machine-level representations like 32-bit signed integers. It also explicitly represents the heap and stack and writing to byte addresses. As can be seen from figure 2.1 [9], there are several back ends that start from Cmm and then generate assembly.
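The update and black-hole behaviour of suspended computations in STG, described earlier in this section, can be imitated in a few lines of Python. This is only a conceptual sketch with hypothetical names (`Thunk`, `InfiniteLoop`); GHC implements the mechanism at the level of heap objects, not with classes like these.

```python
class InfiniteLoop(Exception):
    """Raised when a computation depends on its own unfinished result."""


_BLACK_HOLE = object()


class Thunk:
    """A suspended computation that is evaluated at most once."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._evaluated = False

    def force(self):
        if self._evaluated:
            return self._value
        if self._compute is _BLACK_HOLE:
            raise InfiniteLoop("thunk entered while it is being evaluated")
        # Replace the computation by a black hole: the thunk no longer keeps the
        # closure alive; only the running evaluation (the local name) does.
        compute, self._compute = self._compute, _BLACK_HOLE
        self._value = compute()  # update the thunk with the computed value
        self._evaluated = True
        return self._value
```

Forcing a thunk twice runs its computation only once, and a thunk defined in terms of itself, such as `loop = Thunk(lambda: loop.force() + 1)`, raises `InfiniteLoop` when forced instead of recursing forever.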
2.2 The Modelica language
Modelica is a declarative and object-oriented language developed for equation-based modeling of complex and dynamic physical systems. It can be used for simulating, for example, mechanical, electrical, hydraulic and process-oriented systems[5]. The Modelica standard exists in multiple implementations and is governed by the international non-profit Modelica Association[16]. Systems can be separated into smaller components which can then connect to other components and be distributed in model libraries. This enables equation systems to be reused and combined to make larger systems. Many common standard components are distributed by the Modelica Association in their Modelica Standard Library[16].
2.2.1 Primitive types and arrays
The primitive types supported are integers, reals (floating-point), booleans, strings, enumerations and a special clock type used for synchronous systems. Multi-dimensional arrays are also supported, and can have dimension sizes that are unspecified at compile time. In addition, a data type for complex numbers is implemented in the Modelica standard library[4].
Some of the primitive operations supported in expressions are scalar arithmetic operations (such as addition, subtraction, division, multiplication and exponentiation), elementwise arithmetic operations on arrays, comparisons, logical operations, and if-expressions[4].
2.2.2 Models and equations
Modelica model classes describe the system to be modelled as a system of variables with optional initial values and differential, algebraic and discrete equations, which can then be compiled and solved by the Modelica implementation for a given time slice. The class defined at the top of the program is automatically instantiated, and other classes can be instantiated by declaring them as variables in the top class[4].
Each equation consists of two expressions, one on each side of an equality (=) operator. The equations are not affected by the order in which they are listed and are acausal, meaning they do not have a fixed data flow direction. In order to support variation over time, variables can be surrounded by the der() time derivative operator, and the time variable can also be accessed directly as time. For-loops can also be used to declare repetitive equation series in a shorter way[16][4].
Variables can optionally have defined initial values, and models also support additional variable types such as named constants and parameters, which unlike normal named constants can be set before simulation without recompiling[4].
For example, a pendulum can be modelled as in the following example, taken from page 21 of Principles of Object-Oriented Modeling and Simulation with Modelica 3[4]. This model contains both differential and algebraic equations, and is therefore an example of a differential algebraic equation system (DAE). The system can be simulated by calling the simulate function, for example by writing simulate(Pendulum, stopTime=6), and then plotted by calling the plot function with the variable to be plotted as its argument[4].
model Pendulum
  parameter Real m=1, g=9.81, L=0.5; // mass, gravity, length of pendulum
  Real F; // force
  output Real x(start=0.5), y(start=0); // x and y position with set start values
  output Real vx, vy; // x and y velocity
equation
  m * der(vx) = -(x / L) * F;
  m * der(vy) = -(y / L) * F - m * g;
  der(x) = vx;
  der(y) = vy;
  x^2 + y^2 = L^2;
end Pendulum;
2.2.3 Model inheritance
Models can extend other models, and therefore provide more specialization while reusing code, similar to hierarchical class inheritance in typical object-oriented languages. By inheriting equations, data variables and class members from a base class, a subclass can inherit part of its behaviour while modifying and extending it with additional equations and variables[4].
Model classes can be partial, meaning that their equation systems are under-specified and can only be made solvable by extending them with subclasses providing additional equations. This can be seen as an analog to abstract classes in object-oriented languages. Variables of an instance are accessed through dot syntax, though they can be protected from outside access by putting them in the protected section, which will block direct access from outside but still make them available in submodels[16][4].
Classes can also contain variables with type declarations that are replaceable by subclasses, similar to generics in other languages. A field with a replaceable type is simply prefixed by the replaceable keyword. To make a new class based on a class with replaceable types, a new type definition specifying the types is made, which can then be instantiated like a regular class[4].
2.2.4 Connections
Model instances can be connected to each other through special connect-equations in order to create larger systems. The interfaces for these connections are specified by connector classes, which contain a list of the variables that are carried by the signals. Variables in a connector can optionally be configured as flow variables, indicating that the values of all connected signals will sum to zero instead of being equal[4].
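The difference between ordinary and flow variables can be made concrete by generating the equations for a set of connected connectors. The Python sketch below is a simplified illustration: the function name is hypothetical, every connector is assumed to have one ordinary variable `v` and one flow variable `i` (as in an electrical pin), and the sign conventions that a Modelica tool applies to connectors on the outside of a component are ignored.

```python
def connection_equations(connectors):
    """Equations for a set of connectors that are all connected to each other."""
    first = connectors[0]
    # Ordinary variables: all connected values are equal.
    equations = [f"{first}.v = {other}.v" for other in connectors[1:]]
    # Flow variables: all connected values sum to zero.
    equations.append(" + ".join(f"{c}.i" for c in connectors) + " = 0")
    return equations
```

For three connected pins, `connection_equations(["resistor.n", "capacitor.p", "load.p"])` gives two equality equations for the voltages and the single equation `resistor.n.i + capacitor.p.i + load.p.i = 0` for the currents.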
Connections are generally acausal, meaning that they, like equations, lack a specified data direction, but they can also be specified as input or output connections, meaning that they can only receive from or send to a component, respectively[4].
One variable in a component can be connected to many subcomponents without making a large number of explicit connect-equations. This is done implicitly by prefixing the shared variable in the top component with the inner keyword and declaring a reference variable with the same name in the subcomponents, prefixed by the outer keyword[4].
Discrete events
Discrete instantaneous events can be modelled by using the when-statement, which only activates its subequations at the exact moment in time when one or more of its condition expressions transition to true. Discrete and continuous components can be freely combined to create hybrid systems. A when-statement can contain a special reinit equation that resets a variable to a new value on the event. In a reinit equation, the previous value of the variable can be accessed through the pre operator. Apart from when-statements, simple if-expressions and if-statements in normal equations may also be used to model discrete changes[4].
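The semantics of a when-statement with reinit can be imitated in a simple fixed-step simulation. The Python sketch below models a bouncing ball and is an assumption-laden illustration: the function and parameter names are hypothetical, and a real Modelica tool locates the event time far more accurately than this fixed-step loop does.

```python
def simulate_bouncing_ball(h0=1.0, e=0.8, g=9.81, dt=1e-3, stop_time=2.0):
    """Simulate der(h) = v, der(v) = -g with a bounce event when h <= 0."""
    h, v, t = h0, 0.0, 0.0
    condition_previous = False
    bounces = 0
    trace = []
    while t < stop_time:
        # Continuous part, integrated with a small fixed step.
        v -= g * dt
        h += v * dt
        t += dt
        # Discrete part: act only when the condition becomes true,
        # not for as long as it stays true (like "when h <= 0 then").
        condition = h <= 0.0
        if condition and not condition_previous:
            v = -e * v  # like reinit(v, -e * pre(v))
            h = 0.0
            bounces += 1
        condition_previous = condition
        trace.append((t, h, v))
    return trace, bounces
```

With the default arguments the ball bounces a few times within the simulated two seconds, and the velocity changes sign at each event while the height evolves continuously in between.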
Basic electronics example
In listing 1, we take examples from Principles of Object-Oriented Modeling and Simulation with Modelica 3 to give a taste of Modelica. The listing defines electrical components in Modelica by defining variables, equations, connectors and using inheritance so that shared equations can be defined in a single partial superclass [4].
Packages
In order to avoid name conflicts and simplify sharing code, libraries can be distributed as packages, which give all content in the library its own hierarchical namespace. Other packages can then be imported into a package with the import keyword, which optionally allows importing names directly at the top level within the package. Within a package, an imported namespace can also be given a custom name so that typing can be reduced without risking name conflicts as with top-level imports.
type Voltage = Real(unit="V");
type Current = Real(unit="A");
type Resistance = Real(unit="Ohm");
type Capacitance = Real(unit="F");

connector Pin "Electrical pin"
  Voltage v;
  flow Current i;
  // the flow keyword indicates that any connected
  // variables should sum to zero
end Pin;

partial model TwoPin "Electrical component with two pins"
  // partial since it does not have enough equations
  // to be fully defined
  Pin p, n;
  Voltage v;
  Current i;
equation
  v = p.v - n.v;
  0 = p.i + n.i;
  i = p.i;
end TwoPin;

model Resistor
  extends TwoPin;
  // include all variables and equations from TwoPin
  parameter Resistance R;
equation
  R*i = v;
end Resistor;

model Capacitor
  extends TwoPin;
  parameter Capacitance C;
equation
  C*der(v) = i;
end Capacitor;

model Ground
  Pin p;
equation
  0 = p.v;
end Ground;

model LowPass
  // named pin_in/pin_out because "in" is a reserved word in Modelica
  Pin pin_in, pin_out;
  parameter Resistance R;
  parameter Capacitance C;
  Resistor resistor(R=R);
  Capacitor capacitor(C=C);
  Ground ground;
equation
  connect(pin_in, resistor.p);
  connect(resistor.n, pin_out);
  connect(pin_out, capacitor.p);
  connect(capacitor.n, ground.p);
end LowPass;
Listing 1
2.2.5 Functions and algorithms
More traditional imperative code can be written in Modelica inside algorithm sections. Unlike in normal equation sections, variables are assigned values directly with the := assignment operator, and they can be assigned multiple times within a single section. Both recursion and common imperative control flow statements such as if-then-else, for and while are supported. Algorithm sections in Modelica are pure, i.e. without side effects and global state, in order to support safe usage inside equation systems. [4]
The special function class type can be used for implementing named mathematical functions using algorithm sections. Functions can have multiple input variables and, unlike in many other languages, multiple output variables as well. Functions can also declare local variables inside protected sections for use in the algorithm section. [4]
Two examples of implementations for the factorial function are provided below:
function factorial_recursive
  input Integer i;
  output Integer o;
algorithm
  if i > 1 then
    o := i * factorial_recursive(i-1);
  else
    o := 1;
  end if;
end factorial_recursive;

function factorial_imperative
  input Integer i;
  output Integer o;
protected
  Integer acc;
algorithm
  acc := 1;
  for x in 2:i loop
    acc := x*acc;
  end loop;
  o := acc;
end factorial_imperative;
2.2.6 MetaModelica
MetaModelica is an extended version of Modelica designed for modeling programming languages. It complements the algorithm support in Modelica with various features common to functional programming, such as tagged unions with support for recursion, linked lists, tuples, and pattern matching. It also adds support for exception handling and generics [6].
Parameterized types
Parameterized types enable types to be specialized by another type as a parameter, and are similar to generics in other programming languages. Most of the new built-in types in MetaModelica support type parameters [6].
Lists
Lists contain an arbitrary number of objects of a single type. Lists are implemented as immutable linked lists, as in many functional languages, which enables parts of lists to be shared between different lists. New lists can be created in constant time by inserting new values before existing lists with the :: (cons) operator[6]. However, some operations, such as appending, getting a value at a specific index, and calculating the list length, have linear time complexity. Lists can be created either with the cons operator or with brace-surrounded list literals listing all values in the list; the literal syntax is also used to represent the empty list {}[13].
In addition, pattern matching can be used for extracting values from or comparing lists[6]. MetaModelica also has several built-in methods for performing various operations on linked lists[13]:
listAppend — Returns a copy of a list concatenated with another list
listDelete — Returns a copy of a list with a specific index-specified object skipped
listEmpty — Returns a boolean indicating if a list is empty (has length 0)
listHead — Returns the first object in a list
listMember — Returns a boolean indicating if a list contains a specific value
listLength — Returns the length of a list
listRest — Returns the tail of the linked list (every object except the first)
listReverse — Returns a reversed copy of a list
list<Integer> l, l2, l3; //variable declaration
l := {3, 4, 5}; //list literal
l2 := 2 :: l; //creating a new list {2, 3, 4, 5} with the cons operator
i := listGet(l, 2); //accessing the second value in the list (4)
len := listLength(l); //getting the list length (3)
l3 := listReverse(l); //getting a reversed list ({5, 4, 3})
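The sharing that immutability makes possible can be shown with a small model of cons cells. The Python sketch below is only an illustration of the idea, not the MetaModelica runtime representation; the class Cons and the helper functions are hypothetical names, and Python's None stands in for the empty list {}.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Cons:
    """One immutable linked-list node; None plays the role of {}."""
    head: Any
    tail: Optional["Cons"]


def from_values(values):
    """Build a linked list from a Python sequence, last element first."""
    lst = None
    for value in reversed(values):
        lst = Cons(value, lst)
    return lst


def length(lst):
    """Linear time: every node has to be visited."""
    n = 0
    while lst is not None:
        n += 1
        lst = lst.tail
    return n


def to_values(lst):
    """Copy the elements into a Python list for inspection."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


l = from_values([3, 4, 5])
l2 = Cons(2, l)   # constant time: one new node, the tail is shared with l
```

Here l2 reuses the three nodes of l instead of copying them, which is safe only because no node can be modified after creation.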
Tuples
Tuples contain an arbitrary number of objects of mixed types, and can be seen as a way to create simple records without having to write record declarations. Values in the tuple can be accessed either through pattern matching or by dot notation, denoted by following the tuple with a dot and the index of the object (1-indexed)[6].
tuple<Integer, String, list<Real>> t; //variable declaration
t := (12, "hello", {1.0, 2.0, 3.0}); //tuple literal
i := t.2; //accessing the second value through dot notation
Union types
Union type objects store record data with a type-safe constructor describing its variant, and are similar to algebraic data types in functional programming. One or more record types can be defined for a single union type. Union type instances are also immutable, i.e. their fields cannot be modified after the instance has been created.
Union types are recursive, meaning that they can have fields of their own type, and are therefore useful for describing tree structures, such as abstract syntax trees. Pattern matching can be used for checking and extracting field values[6].
uniontype Number
  record INT
    Integer int;
  end INT;
  record RATIONAL
    Integer int1;
    Integer int2;
  end RATIONAL;
  record REAL
    Real re;
  end REAL;
  record COMPLEX
    Real re;
    Real im;
  end COMPLEX;
end Number;
Number a; //variable declaration
a := RATIONAL(8, 13); //literal with RATIONAL constructor
a := REAL(1.618033); //literal with REAL constructor
Option types
Option type values either carry a single field of a specific type or none at all, and are generally used for cases where objects are optionally defined. They are implemented as a built-in parameterized union type with the constructors NONE() or SOME(x), where x is an object of the parameter type. The constructor can be checked with the isSome and isNone functions, and option type values can also be unpacked with pattern matching like other union types[6].
Option<String> o; //variable declaration
o := NONE(); //none literal
o := SOME("hej"); //some literal
if isNone(o) then
...
Pattern matching
One of the most important features in MetaModelica is its pattern matching support, which is similar to pattern matching in many functional languages. This can be used for more advanced control flow and enables simple and powerful handling of structural data[6].
The cases are tested in the order they are listed. Each case contains a pattern, a body to be executed, and a case return expression that is calculated and returned by the match expression after the body has finished. The unit value () can be returned if an actual return value is not desired. The return value can also be a tuple, allowing multiple values to be returned. The return values of all cases in a single match statement are required to be of the same type. The body of each case can be either an algorithm section or an equation section; equation sections are, however, not allowed to contain differential equations. A match statement can have its own set of local variables, which can also be used for pattern binding[6].
Patterns that can be matched in a case include scalar constants such as integers and strings, record constructors with named or positional arguments, tuples, lists made with literal syntax, lists made with the cons (::) operator, and the _ wildcard, which matches and ignores any value. These patterns can also be nested. Variables placed in a pattern will be bound, i.e. assigned the actual value, if the case match succeeds. In addition, the whole pattern itself can be bound to a variable with the special as binding operator. The __ pattern as the single argument to a record constructor can be used to bind all fields without having to explicitly name them. Apart from the pattern expression itself, a pattern can also include a guard expression which must be true for the matching to succeed; the guard expression can include variables from the pattern expression[6].
Pattern matching expressions come in two variants with different behaviour when an exception is raised in the case body: match, which makes the whole match statement fail as expected, and matchcontinue, which instead rewinds the state and tries the following patterns, failing the whole match expression only when all patterns have been exhausted[6].
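The difference between the two variants can be sketched outside MetaModelica. The Python model below is a simplification under stated assumptions: cases are (predicate, body) pairs, a failure is modelled by the hypothetical MatchFailure exception, and the rewinding of state performed by matchcontinue is not modelled at all.

```python
class MatchFailure(Exception):
    """Stands in for a MetaModelica failure (hypothetical name)."""


def run_match(cases, value, continue_on_failure):
    """Try the cases in order and return the result of the first that succeeds.

    continue_on_failure=False models match: a failing body fails the match.
    continue_on_failure=True models matchcontinue: a failing body makes the
    search continue with the following cases.
    """
    for pattern_ok, body in cases:
        if not pattern_ok(value):
            continue
        try:
            return body(value)
        except MatchFailure:
            if not continue_on_failure:
                raise
    raise MatchFailure("no case matched")


def failing_body(value):
    raise MatchFailure("body failed")


cases = [
    (lambda v: v > 0, failing_body),          # pattern matches, body fails
    (lambda v: True, lambda v: "fallback"),   # wildcard case
]
```

With these cases, run_match(cases, 1, True) falls through to the wildcard case, while run_match(cases, 1, False) propagates the failure of the first body.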
Comprehensions
List and array comprehensions allow the user to write concise mapping and filtering on collections using some syntactic sugar. They take a map expression and one or more collections with a named iterator variable for each collection, and can optionally take guards filtering the values. There are also “threaded” comprehensions which work like a zip between any number of lists[6].
list<Integer> l0 := list(1+x for x guard 0<=x in otherList);
list<Integer> l1 := list(a+b threaded for a in 1:2, b in 3:4);
// {1+3,2+4}
list<Integer> l2 := list(a+b for a in 1:2, b in 3:4);
// {4,5,5,6}
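For readers more familiar with other languages, the three comprehensions above correspond roughly to the following Python expressions. This is an analogy rather than a statement about how MetaModelica evaluates them, and the contents of other_list are an assumption made for the example.

```python
other_list = [-1, 0, 2]  # assumed example input

# Guarded comprehension: keep only the values with 0 <= x, then map 1 + x.
l0 = [1 + x for x in other_list if 0 <= x]

# Threaded comprehension: the iterators advance together, like zip.
l1 = [a + b for a, b in zip(range(1, 3), range(3, 5))]

# Ordinary comprehension: every combination of a and b is visited.
l2 = [a + b for a in range(1, 3) for b in range(3, 5)]
```

Note that Python ranges exclude their upper bound, so range(1, 3) corresponds to the Modelica range 1:2.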
Exception handling and asserts
Exceptions such as out-of-bounds accesses and divisions by zero can be tested by putting the expression or statement inside a failure call, which will succeed if the test statement causes an exception and throw an exception if the test statement succeeds. If an unhandled exception occurs inside a matchcontinue case, the program will rewind the state and try the following cases rather than making the entire match statement fail. Exceptions can also be generated explicitly with the fail function, or by assertions using the assert function, which takes an assertion condition, a message string and optionally an assertion severity level[6].
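The inverted success condition of failure can be made concrete with a short model. The following Python sketch only mimics the described behaviour; the names failure and Failure are chosen for the example, and the wrapped statement is passed as a zero-argument function, which is an assumption of this sketch rather than MetaModelica syntax.

```python
class Failure(Exception):
    """Raised when the wrapped computation unexpectedly succeeds."""


def failure(thunk):
    """Succeed (return None) if thunk raises, fail if thunk succeeds."""
    try:
        thunk()
    except Exception:
        return None  # the wrapped computation failed, so failure(...) succeeds
    raise Failure("the wrapped computation succeeded")


failure(lambda: 1 // 0)           # division by zero: failure(...) succeeds
failure(lambda: [1, 2, 3][10])    # out-of-bounds access: failure(...) succeeds
```

Wrapping a computation that succeeds, such as failure(lambda: 1 + 1), raises Failure instead.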
2.3 The OpenModelica environment
OpenModelica is an open-source Modelica-based simulation and modeling environment. Its main purposes include providing efficient, easy-to-use and well-visualized Modelica-based simulations while also serving as a teaching and research tool and as a reference implementation that is itself written largely in Modelica[5]. Most of the development of OpenModelica is done at Linköping University in Sweden.
2.3.1 Compiler structure
The OpenModelica compiler takes Modelica code and translates it to C code which can be compiled by a standard compiler. The subsystem also provides an interpreter so that code can be tested interactively[3].
Most parts of the OpenModelica compiler are written in MetaModelica. The OpenModelica compiler can compile MetaModelica code, including bootstrapping itself[18].
[Figure 2.2 diagram: Modelica source → Translator → DAE with flattened models → Analyzer → DAE with sorted equations → Optimizer → DAE with optimized sorted equations → Code Generator → C source code → C Compiler → Executable program → Simulation]
Figure 2.2: Overview of translation phases in the OpenModelica compiler
The OpenModelica Compiler is organized, like most other compilers, as a pipeline of these phases[4][3] as seen in figure 2.2:
Translator — parses the source code into the initial Absyn-format AST, converts it into the simplified SCode-format intermediate AST, and reduces the object-oriented structures to a single flat equation system in the DAE-format AST. Type checking and other static analyses are also performed here.
Analyzer — performs transformations on the equation system so that it can be efficiently solved, including dependency sorting the equations and converting to imperative assignments.
Code Generator — generates compilable C code from the DAE. This code is then passed to a C compiler.
[Figure 2.3 diagram: Modelica code → Parse → Absyn → SCode/explode → SCode → Inst → DAE → BackendDAECreate → Backend DAE → Symbolic operations (BackEnd) → Sorted and optimized DAE → SimCode → SimCode → Code generator → C code; Lookup, Static and Ceval are shown as additional modules]
A more detailed overview of some of the most relevant modules used in the code generation is shown in figure 2.3.
2.3.2 Susan as a Code Generator
Susan is a template language used by the OpenModelica Compiler. Its purpose is to allow easy to use text generation from MetaModelica structures.
A Susan file consists of several templates that accept some MetaModelica data type and return text. Templates can also use what are called buffers to fill in holes left in the returned text. Templates may be used solely for their effects on buffers and not for the text they return.
See listing 2 for an example of a Susan template. The listing contains a buffer auxFunction and a match on var. The cases of the match return the final result of the entire template. The VARIABLE case has a nested template contextCref to which it passes the auxFunction buffer.
template funArgBoxedDefinition(Variable var)
 "A definition for a boxed variable is always of type modelica_metatype,
  unless it's a function pointer"
::=
  let &auxFunction = buffer ""
  match var
  case VARIABLE(__) then
    'modelica_metatype <% contextCref(name,contextFunction,&auxFunction) %>'
  case FUNCTION_PTR(__) then
    'modelica_fnptr _<%name%>'
end funArgBoxedDefinition;
Listing 2: A snippet in the Susan template language
2.3.3 The DAE representation
The DAE representation is an AST representation that, unlike the previous representation stages, has the object-oriented structures such as class instances and connections simplified and flattened into a single equation system. This flattening is done from the SCode representation by the Inst module. However, MetaModelica data structures are still preserved and constructed at run time. Like the other representations in OpenModelica, it is implemented using MetaModelica data structures such as union types, optionals and lists[3].
A function in DAE can contain various different elements, such as algorithms, equations of different kinds, variables, reinit statements, calls and asserts[15]. This overview will focus on the part implementing the algorithm subset, which is the subset most relevant to the IR implemented in this thesis.
Element and Algorithm union types
A function contains elements of various types, such as algorithm sections, equations of different forms, and variables. These are represented by the Element union type[15]. Described below are the element types most important to this thesis. Although all element types contain a source field of the ElementSource union type containing metadata such as source code line numbers and the classes and instances they belong to, this field is skipped in these descriptions for brevity.
VAR - This element type represents variables and contains many fields related to names, types, equation flow and connections. The most important ones for this thesis are the component reference and the type field.
ALGORITHM - This element type represents algorithm sections and contains a field of the Algorithm union type, which simply contains a list of statements.
ComponentRef union type
Component references represent hierarchical path names and are typically used for describing variables[15].
CREF_IDENT — This record type represents a non-hierarchical or bottom-level identifier, and contains the name as a string, its type and a list of optional subscripts.
CREF_ITER — This record type is used for iterators, and contains an index used
CREF_QUAL — This record type represents a higher level in a hierarchical path, and contains a component reference to the level below in addition to the data in CREF_IDENT.
Absyn.Path union type
While Absyn.Path is strictly part of the Absyn representation definitions, it is frequently used in DAE for externally accessible objects such as functions or union types, and so it is mentioned here.
IDENT — This record type represents a non-hierarchical or bottom-level identifier, and contains the name as a string.
QUALIFIED — This path type represents a higher level in a hierarchical path, and contains the path to the level below in addition to the name string of its level.
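The recursive shape of such a path can be sketched as follows. The Python classes below model only the two variants described above, with field names chosen for the example; they are an illustration of the structure, not the OpenModelica definitions, and path_string is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IDENT:
    """Bottom-level identifier."""
    name: str


@dataclass(frozen=True)
class QUALIFIED:
    """One level of a hierarchical path plus the path to the level below."""
    name: str
    path: "Path"


Path = Union[IDENT, QUALIFIED]


def path_string(path, sep="."):
    """Join the names along the path into a dotted name."""
    parts = []
    while isinstance(path, QUALIFIED):
        parts.append(path.name)
        path = path.path
    parts.append(path.name)   # the remaining node is the IDENT at the bottom
    return sep.join(parts)


sin_path = QUALIFIED("Modelica", QUALIFIED("Math", IDENT("sin")))
```

Under these assumptions, path_string(sin_path) yields the dotted name Modelica.Math.sin.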
Statement union type
The statement record types available are assignments of various types and control flow statements such as calls, if statements, loop statements like for and while, when statements, and simple skipping statements like break, continue and return[15]. Described below are the statement types most important to this thesis. Although all statement types, like the element types, contain an ElementSource source field containing metadata, this field is skipped in these descriptions for brevity as well.
STMT_ASSIGN — This statement type describes an assignment and contains the type of the assignment and the expressions of the left and right hand side.
STMT_IF and the Else union type — This statement type describes an if statement and contains the conditional expression, a list of statements to be executed when the condition is true, and a value of the Else union type to describe the behaviour when the condition is false. The value in the Else field can be either NOELSE, signifying that nothing is done, ELSEIF, performing another conditional step and having the same fields as a STMT_IF, or ELSE, which simply contains a list of statements to be executed on a false condition.
STMT_FOR — This statement type describes a for(each) statement and contains the type of the iterator, the name of the iterator variable, the range expression to be iterated over and a list of statements executed in the loop body. It also contains a few additional code generation-aiding variables which did not have to be considered in the development of this thesis.
STMT_WHILE — This statement type describes a while statement and contains a conditional expression and a list of statements executed in the loop body.
STMT_NORETCALL — This statement type describes a call not having or storing any return values, and the only field it contains is an expression of the call type described further down.
STMT_BREAK, STMT_CONTINUE and STMT_RETURN — These statement types simply describe break, continue and return statements and do not contain any additional data. Note that value returns in Modelica are done by assignments to designated output variables rather than by return statements; therefore STMT_RETURN does not contain any return values, but simply exits the function.
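The nesting of STMT_IF and the Else union type described above forms a chain that a translator typically has to walk. The Python sketch below models that chain under simplifying assumptions: conditions and statements are plain strings instead of Exp and Statement values, the field names are invented for the example, and branches is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class NOELSE:
    """Nothing is done when the condition is false."""


@dataclass
class ELSE:
    statements: List[str]


@dataclass
class ELSEIF:
    condition: str
    statements: List[str]
    else_: "Else"


Else = Union[NOELSE, ELSEIF, ELSE]


@dataclass
class STMT_IF:
    condition: str
    statements: List[str]
    else_: Else


def branches(stmt):
    """Flatten the Else chain into (condition, statements) pairs.

    The final unconditional else branch, if any, is marked with None.
    """
    result = [(stmt.condition, stmt.statements)]
    node = stmt.else_
    while isinstance(node, ELSEIF):
        result.append((node.condition, node.statements))
        node = node.else_
    if isinstance(node, ELSE):
        result.append((None, node.statements))
    return result


example = STMT_IF("x > 0", ["y := 1"],
                  ELSEIF("x < 0", ["y := -1"], ELSE(["y := 0"])))
```

For the example value, branches returns three pairs, one for each of the if, elseif and else branches, in source order.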
Type union type
This union type represents the data types used in DAE[15].
T_INTEGER, T_REAL, T_STRING and T_BOOL — These types simply represent the basic data types in Modelica, i.e. integers, reals, strings and booleans.
T_NORETCALL — This type represents the return value of a call without output variables.
T_TUPLE — This type represents tuples as returned from functions with multiple output values, and contains a list of types indicating the type of each tuple element and an optional list of tuple field names as strings.
T_METALIST — This type represents MetaModelica lists and contains a type field indicating the type of its elements.
T_METATUPLE — This type represents MetaModelica tuples and contains a list
T_METAOPTION — This type represents MetaModelica optionals indicating the type of its element when it contains a value.
T_METAUNIONTYPE — This type represents MetaModelica union types.
T_METARECORD — This type represents MetaModelica records, and contains an Absyn.Path to the union type, an Absyn.Path to the record, a list containing the type of each field, the constructor ID for the record, a list of the Var components of each field, and a boolean indicating if the record type is a singleton.
T_METAARRAY — This type represents MetaModelica arrays and contains a type field indicating the type of its elements.
T_METABOXED — This type represents MetaModelica boxed values.
Exp union type
This union type represents the expression types that can be used in DAE such as literals, operators, variable references and calls[15].
ICONST, RCONST, SCONST and BCONST — These expression types simply represent constants of the basic Modelica data types, i.e. integers, reals, strings and booleans. Their sole field is the constant value.
CREF — This expression type represents a variable reference and contains a component reference field and the type of the variable.
BINARY and UNARY — These expression types represent binary or unary arithmetic operations and contain one or two subexpressions and an Operator value denoting the operation to be performed.
LBINARY and LUNARY — These expression types represent binary or unary logical operations such as and, not, and or. Similar to the arithmetic operations, they contain one or two subexpressions and an Operator value denoting the operation to be performed.
RELATION — This expression type represents comparisons. Apart from having two subexpressions and an Operator value like other binary operations, it has some additional fields for model simulation handling which are not considered here.
IFEXP — This expression type represents an if expression and contains three subexpressions: one for the condition, and one each for the true and false case.
CALL — This expression type represents a call and contains the name of the function, a list of subexpressions denoting the arguments and a special CallAttributes field storing various additional data about the call. Some of the data stored in CallAttributes are the type of the return value, if the function call returns multiple values as a tuple, if the call is to a built-in function, and if the call is inline or a tail call.
RANGE — This expression type represents numeric ranges, is typically used in for statements, and contains the type of the numeric values, the start value, the end value and optionally the step between each value, which is 1 if not specified.
CAST — This expression type represents a type cast and contains the type the value is cast to and a subexpression representing the value being cast.
TSUB — This expression type represents tuple subscripts and contains the subexpression to be subscripted, the integer index, and the type of the returned value.
ASUB — This expression type represents array subscripts and contains the subexpression to be subscripted and a list of integer indexes with each value representing a different array dimension.
RSUB — This expression type represents record value accesses and contains the subexpression of the record, the integer offset of the field, the name of the field, and the type of the returned value.
LIST — This expression type represents a MetaModelica list literal or a nil node and contains a list of subexpressions denoting each element stored in the list.
CONS — This expression type represents a MetaModelica list node and contains two subexpressions denoting the head and tail of the list node.
META_TUPLE — This expression type represents a MetaModelica tuple node and contains a list of subexpressions denoting each element stored in the tuple.
META_OPTION — This expression type represents a MetaModelica optional and
METARECORDCALL — This expression type represents a MetaModelica record constructor and contains the path to the record, the arguments as a list of subexpressions, a list of field names, the record variant number, and a list of types for each field.
MATCHEXPRESSION — This expression type represents match expressions and contains a field of the MatchType union type that can be MATCHCONTINUE or MATCH, a list of subexpressions for the expressions to be matched, a list of local declarations as Element values, a list of cases as MatchCase values, and the type of the match expression. The MatchCase union type is described in more detail below.
BOX — This expression type represents a MetaModelica boxed value and contains a subexpression for the value to be boxed.
UNBOX — This expression type represents the unboxing of a MetaModelica boxed value and contains a subexpression for the value to be unboxed and a type field indicating the type of the unboxed value.
PATTERN — This expression type represents various patterns as used in match statements. Its sole value is of the Pattern union type described in more detail below.
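To make the tree structure of expressions concrete, the sketch below models a handful of the variants listed above and evaluates them recursively. It is a Python illustration under stated assumptions: only integer constants are supported, operators are plain strings rather than Operator values, type fields are left out, and the field names and the evaluate function are invented for the example.

```python
import operator
from dataclasses import dataclass
from typing import Optional


@dataclass
class ICONST:
    value: int


@dataclass
class BINARY:
    lhs: object
    op: str
    rhs: object


@dataclass
class RELATION:
    lhs: object
    op: str
    rhs: object


@dataclass
class IFEXP:
    cond: object
    then: object
    otherwise: object


@dataclass
class RANGE:
    start: object
    stop: object
    step: Optional[object] = None   # the step is 1 if not specified


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "<": operator.lt, ">": operator.gt}


def evaluate(exp):
    """Evaluate an expression tree bottom-up."""
    if isinstance(exp, ICONST):
        return exp.value
    if isinstance(exp, (BINARY, RELATION)):
        return OPS[exp.op](evaluate(exp.lhs), evaluate(exp.rhs))
    if isinstance(exp, IFEXP):
        branch = exp.then if evaluate(exp.cond) else exp.otherwise
        return evaluate(branch)
    if isinstance(exp, RANGE):
        step = 1 if exp.step is None else evaluate(exp.step)
        # Modelica ranges include the end value, Python ranges do not.
        end = evaluate(exp.stop) + (1 if step > 0 else -1)
        return list(range(evaluate(exp.start), end, step))
    raise TypeError(f"unsupported expression: {exp!r}")


# 1 + 2 * 3 as a tree: the multiplication is a subexpression of the addition.
tree = BINARY(ICONST(1), "+", BINARY(ICONST(2), "*", ICONST(3)))
```

Operator precedence is already encoded in the shape of the tree, so the evaluator needs no precedence rules of its own.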
MatchCase union type
This union type represents a single case in a match expression and contains a single variant record type CASE. It contains a list of patterns of the Pattern union type, an optional guard subexpression, a list of local declarations as elements, a case body as a list of statements, an optional case return subexpression, and some source-code related metadata[15].
Pattern union type
This union type represents patterns used in match expressions, and can also be recursive like expressions[15].
PAT_WILD — This pattern type represents a wildcard that accepts all values
PAT_CONSTANT — This pattern type matches various literals like numerals, strings, empty list, and NONE. The record contains the expression and optionally a type used for unboxing the value.
PAT_AS — This pattern type allows binding the entire value to a name while continuing to match on its contents, such as listVar as _::tailVar, and contains an identifier, an optional type for unboxing, some attributes of the identifier, and the pattern that will be matched.
PAT_META_TUPLE — This pattern type matches the content of a tuple and contains a list of patterns, one for each element.
PAT_CONS — This pattern type represents a linked list node and contains two subpatterns representing the head and tail of the list.
PAT_CALL — This pattern type matches a union type constructor and contains a name, the index of the matched record within its union type, the patterns for each record attribute, a list of variables for each attribute, a list of types, and a boolean indicating if the union type is known to be a singleton.
PAT_SOME — This pattern type represents an optional with a SOME value and contains a subpattern for the actual value.
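How such recursive patterns can drive matching is sketched below. The Python model covers a subset of the pattern types above with simplified fields; type and attribute fields are left out, Python lists and tuples stand in for MetaModelica lists and tuples, None stands in for NONE(), and a plain variable is assumed to be expressed as a PAT_AS wrapping a wildcard. The match function and the SOME class are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PAT_WILD:
    pass


@dataclass
class PAT_CONSTANT:
    value: Any


@dataclass
class PAT_AS:
    name: str
    pattern: Any


@dataclass
class PAT_META_TUPLE:
    patterns: List[Any]


@dataclass
class PAT_CONS:
    head: Any
    tail: Any


@dataclass
class PAT_SOME:
    pattern: Any


@dataclass
class SOME:
    value: Any


def match(pattern, value, bindings=None):
    """Return a dict of variable bindings on success, or None on failure."""
    env = {} if bindings is None else bindings
    if isinstance(pattern, PAT_WILD):
        return env
    if isinstance(pattern, PAT_CONSTANT):
        return env if value == pattern.value else None
    if isinstance(pattern, PAT_AS):
        if match(pattern.pattern, value, env) is None:
            return None
        env[pattern.name] = value   # bind the whole matched value
        return env
    if isinstance(pattern, PAT_META_TUPLE):
        if not isinstance(value, tuple) or len(value) != len(pattern.patterns):
            return None
        for sub, item in zip(pattern.patterns, value):
            if match(sub, item, env) is None:
                return None
        return env
    if isinstance(pattern, PAT_CONS):
        if not isinstance(value, list) or not value:
            return None             # nil does not match a cons pattern
        if match(pattern.head, value[0], env) is None:
            return None
        return match(pattern.tail, value[1:], env)
    if isinstance(pattern, PAT_SOME):
        if not isinstance(value, SOME):
            return None
        return match(pattern.pattern, value.value, env)
    raise TypeError(f"unsupported pattern: {pattern!r}")


# listVar as _::tailVar
as_pattern = PAT_AS("listVar", PAT_CONS(PAT_WILD(), PAT_AS("tailVar", PAT_WILD())))
```

Matching as_pattern against a non-empty list binds both the whole list and its tail, while matching it against an empty list fails.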
Method
3.1 Design
During the design phase, different IR designs and existing IR solutions of notable compilers were evaluated and compared in order to create an initial IR design. The evaluation focused on extensibility, ability to implement optimizations and ease of implementation with regard to conversions from the AST and to the back-end code, with special focus on easy conversion to SSA-based back ends such as LLVM.
The code base of the OpenModelica compiler and its corresponding documentation were also investigated in order to make good design decisions.
3.2 Implementation
The implementation roughly consists of three parts: one phase converting the DAE representation to the new IR, one optimization phase where the generated IR is improved in some respect, and one phase converting the new IR to compilable C code. MetaModelica was used as the programming language for the implementation, since this language is used by the rest of the compiler.
3.3 Performance evaluation
During the evaluation phase, the code quality and performance of the new code generator were compared to the results for the old code generator. These results were then analyzed in order to see how large the differences are between the new representation and its optimizations and if the new generator gives an improvement.
The time was measured with the execStat timing module that is built into the OpenModelica compiler. As the execution time of compiled code was not previously measured, this had to be implemented separately with a core change outside the MidCode code base. The test cases were executed multiple times in order to guard against anomalous results, and a result representing the median case was then picked. The input data and exact number of executions were chosen so that the total time would be large enough to be measured accurately while not taking too long to run. The computer used to run the measurements was a laptop with an Intel i7 2630QM (Sandy Bridge) processor.
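The idea of repeating a measurement and picking the median can be sketched as follows. This Python snippet is not the execStat-based setup used in the thesis; it is a generic illustration, and the function name and the parameters repetitions and inner_runs are assumptions made for the example.

```python
import statistics
import time


def median_runtime(func, repetitions=5, inner_runs=10):
    """Time inner_runs calls of func, repeat that, and return the median.

    Running func several times per sample makes each sample long enough
    to measure, and the median discards unusually slow or fast samples.
    """
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        for _ in range(inner_runs):
            func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


elapsed = median_runtime(lambda: sum(range(1000)), repetitions=3, inner_runs=2)
```

Compared with the mean, the median is less sensitive to a single run that was disturbed by other activity on the machine.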
The following benchmark functions were made, which can be seen in appendix A:
fibonacci – Recursive Fibonacci F30 without memoization, executed 100 times
mandelbrot – ASCII Mandelbrot with 1000 iterations returning a linked list of characters, executed 200 times
tak – Takeuchi function tak(18, 12, 6), executed 10000 times
qsort – Quicksort of a random array of 20000 elements, executed 100 times
The C compiler used for compiling the generated code was GCC 7.2. The optimization setting for the C compilation was changed to -O2 rather than the usual -O0 since it was noticed that the low-level style of the MidCode-generated C code was poorly suited for unoptimized compilation. It was also noted that the partition function in the Quicksort test was tail-call optimized by the original generator, something that has not been implemented in the current MidCode generator.
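For readers unfamiliar with the two recursive benchmarks, their usual textbook definitions are sketched below in Python. The actual MetaModelica benchmark code is the one in appendix A; the definitions here, including the choice of the Takeuchi variant that returns z in the base case and the indexing F0 = 0, are assumptions and may differ from it in detail.

```python
def fibonacci(n):
    """Naive doubly recursive Fibonacci without memoization (F0 = 0, F1 = 1)."""
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def tak(x, y, z):
    """Takeuchi function in the variant that returns z in the base case."""
    if y < x:
        return tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))
    return z
```

Both functions do very little work per call, so they mainly stress function call overhead and argument passing in the generated code.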
3.4 Code complexity measurements
The complexity of the different code generators was also measured using the number of lines of code (LOC). This is measured because simpler code generators mean that there is less work involved in porting the language to another code generator.
According to Nguyen et al., LOC is used widely within industry and literature while being an essential component of several more advanced software complexity measurements[14]. Specifically, we use the number of lines in the file including empty lines, comments, etc. A discussion of how appropriate and relevant this metric is can be found in section 5.3.
Both of the target-specific implementations are written in the Susan template language. We compare to CodegenCFunctions.tpl, which is closest in functionality. Unfortunately, this is not a precise comparison since the file chosen for comparison implements more features than our implementation. The old back end also has more template files, such as CodegenC.tpl (see table 4.1), but these mostly implement features that are outside the scope of this thesis.
Chapter 4
Results
4.1 Overview of the MidCode design
MidCode, the resulting IR, represents the control flow of a procedure by the common approach of basic blocks. Each basic block has a terminator which declares what control flow action happens at the end of the block; this may include operations returning values, such as calls. The data flow of the procedure is represented by named variables, compiler-created temporaries and simple unary or binary operations. Unlike in SSA, named variables can be assigned more than once.
The MidCode-related code paths are divided into three phases: “From Modelica to MidCode”, “MidCode Transformations”, and “From MidCode to C”.
4.1.1 IR design details
This part describes the uniontypes and records defined for MidCode and the fields contained within these.
Program
A program is represented by the Program type. This type contains a name and a list of functions.
[Figure 4.1 diagram: DAE/SimCode representation → DAEToMid → MidCode → MidCode transformations → MidCode → MidToC → C code]
Figure 4.1: Overview of MidCode phases
Function
Functions are represented by the Function type. Each function contains a name as an Absyn.Path, several lists of local, input and output variables, a body represented as a list of basic blocks, and ID references to the special entry and exit basic blocks.
Block
Basic blocks are represented by the Block type. They contain a block ID number, a list of statements and a terminator.
Stmt
Statements are represented by the Stmt type and can either be a NOP or an ASSIGN, which simply assigns the value of an RValue to a Var. A statement has linear control flow but otherwise has various effects.
Var
Variables are represented by the Var type, and are used to represent both variables used by the Modelica code and variables introduced during the translation process. Vars have a name and a data type.
OutVar
Since output variables can be thrown away by the caller, lists of output variables in calls contain the OutVar type rather than the plain Var type. Instances of this type can be either an OUT_VAR containing an actual Var instance, or an OUT_WILD indicating that the caller will not save the value.
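As an illustration of the two OutVar cases, the sketch below stores the results of a call into an environment and drops those bound to a wildcard. The class names `OutVarBound` and `OutWild`, the function `store_outputs` and the dictionary environment are hypothetical; they stand in for OUT_VAR, OUT_WILD and whatever the generated code does with the returned values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    ty: str


@dataclass(frozen=True)
class OutVarBound:
    """Stands in for OUT_VAR: the caller keeps the value in `var`."""
    var: Var


@dataclass(frozen=True)
class OutWild:
    """Stands in for OUT_WILD: the caller discards the value."""


def store_outputs(out_vars, results, env):
    """Bind each call result to its output variable, skipping wildcards."""
    if len(out_vars) != len(results):
        raise ValueError("output count mismatch")
    for out, value in zip(out_vars, results):
        if isinstance(out, OutVarBound):
            env[out.var.name] = value
        # OutWild: the value is dropped on purpose
    return env


# Roughly what a caller writing `(q, _) := someFunction(...)` needs:
# the first result is kept, the second is thrown away.
env = store_outputs([OutVarBound(Var("q", "Integer")), OutWild()], [3, 1], {})
```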
RValue
An RValue is a value that can be placed on the right side of an assignment. The RValue type in MidCode contains a few expressions, such as the addition of two Vars and the negation of a Var. They appear in MidCode as part of assign statements. RValues do not have other RValues as operands; instead, temporary variables are created during the translation process and used as operands.
UNARYOP — UNARYOP is a constructor of the RValue union representing operations with a single operand, i.e. a single Var. UNARYOP has variants for copying the unchanged value, negating, logically inverting, boxing and unboxing a variable. The operation is selected by an enumeration value.
BINARYOP — BINARYOP is a constructor of the RValue union representing operations with two operands, i.e. two Vars. BINARYOP has several variants representing common operations like addition, subtraction, division, multiplication, logical or/and, and comparisons. The operation is selected by an enumeration value.
Literal value constructors — A group of constructors of the RValue union representing literal values. The LITERALINTEGER constructor represents integer literals, the LITERALREAL constructor represents real (floating-point) literals, the LITERALBOOLEAN constructor represents boolean values, and the LITERALSTRING constructor represents string literals. The more complex meta object literals used for records, linked lists, optionals and tuples are represented by the LITERALMETATYPE constructor.
Meta object data accessors — A group of constructors that are used for accessing data about meta objects. The METAFIELD constructor returns a value from a meta object slot and is used for accessing record and tuple fields. There are also three constructors specifically made for pattern matching: UNIONTYPEVARIANT, which returns the value of the record variant for uniontypes; ISCONS, which checks whether a linked list node is cons or nil; and ISSOME, which checks whether an optional has a value.
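The rule that RValues only take variables as operands means that nested expressions have to be flattened into a sequence of assignments to temporaries. The following sketch shows one way this could be done; it is not the algorithm used in the compiler. The tuple encoding of expressions and RValues, the `_tmpN` naming scheme and the operation names `neg`, `add` and `mul` are assumptions made for the example.

```python
import itertools


def flatten(expr, stmts, counter):
    """Lower a nested expression and return the name of the variable holding it.

    `expr` is a variable name (str), an integer literal, a unary operation
    (op, operand) or a binary operation (op, lhs, rhs). ASSIGN statements
    are appended to `stmts`; every RValue operand is a variable name.
    """
    if isinstance(expr, str):
        return expr  # already a variable, nothing to emit
    if isinstance(expr, int):
        rvalue = ("LITERALINTEGER", expr)
    elif len(expr) == 2:
        op, operand = expr
        rvalue = ("UNARYOP", op, flatten(operand, stmts, counter))
    else:
        op, lhs, rhs = expr
        left = flatten(lhs, stmts, counter)
        right = flatten(rhs, stmts, counter)
        rvalue = ("BINARYOP", op, left, right)
    tmp = f"_tmp{next(counter)}"  # fresh compiler-created temporary
    stmts.append(("ASSIGN", tmp, rvalue))
    return tmp


# a * (b + (-2)) becomes four assignments to temporaries.
stmts = []
result = flatten(("mul", "a", ("add", "b", ("neg", 2))), stmts, itertools.count(1))
for stmt in stmts:
    print(stmt)
```

Operands are lowered before the temporary for the enclosing operation is allocated, so temporaries are numbered in the order in which their values are computed.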
Terminator
Each basic block has a terminator controlling the control flow following the block, which is represented by the Terminator type. Terminators have effects and can cause branching and/or exceptional control flow.
GOTO — The GOTO terminator simply jumps to a given block.
RETURN — The RETURN terminator simply exits the procedure.
BRANCH — The BRANCH terminator jumps to one of two given blocks depending on whether the given condition variable is true or false, and is used by several terminator types.
SWITCH — The SWITCH terminator jumps to one of multiple given blocks in a dictionary depending on the value of the given condition variable. It is used when generating code for match statements.
CALL — The CALL terminator is a function call to another Modelica function. Since it can cause control flow via exceptions (for example through the fail function), it is defined as a terminator rather than a statement.
LONGJMP, PUSHJMP and POPJMP — The LONGJMP terminator causes a control flow transfer to the active PUSHJMP call site, even across function boundaries. The PUSHJMP terminator adds a new active location for LONGJMP, while the POPJMP terminator deactivates the corresponding active PUSHJMP and causes the previously pushed one to become active again.
ASSERT and TERMINATE — The ASSERT terminator aborts the program with an error message if a Var containing a condition result has a false value. The TERMINATE terminator unconditionally aborts with an error message. The error message for both terminators is given by a Var.
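To make the control flow of the terminators concrete, the sketch below follows terminators from block to block and records which blocks are visited. It is a simplified model and not how MidCode is executed: blocks contain only a terminator, CALL, ASSERT and TERMINATE are left out, jumps stay within one function, and the operand layout of each terminator (for example that PUSHJMP names both a handler block and a next block, and that LONGJMP leaves the active location in place until a POPJMP) is an assumption.

```python
def run(blocks, entry, env):
    """Follow terminators from `entry` until RETURN; return visited block IDs."""
    trace = []
    handlers = []  # stack of active PUSHJMP locations, innermost last
    current = entry
    while True:
        trace.append(current)
        term = blocks[current]
        kind = term[0]
        if kind == "GOTO":
            current = term[1]
        elif kind == "BRANCH":
            _, cond, on_true, on_false = term
            current = on_true if env[cond] else on_false
        elif kind == "SWITCH":
            _, cond, cases = term
            current = cases[env[cond]]  # dictionary from value to block ID
        elif kind == "PUSHJMP":
            _, handler, next_block = term
            handlers.append(handler)  # becomes the active location
            current = next_block
        elif kind == "POPJMP":
            handlers.pop()  # the previously pushed location is active again
            current = term[1]
        elif kind == "LONGJMP":
            if not handlers:
                raise RuntimeError("LONGJMP with no active PUSHJMP")
            current = handlers[-1]
        elif kind == "RETURN":
            return trace
        else:
            raise ValueError(f"unknown terminator {kind}")


# Block 0 installs block 3 as handler; block 4 jumps to it on failure.
blocks = {
    0: ("PUSHJMP", 3, 1),
    1: ("BRANCH", "ok", 2, 4),
    2: ("POPJMP", 5),  # normal path: deactivate the handler
    3: ("POPJMP", 5),  # handler path: deactivate, then continue
    4: ("LONGJMP",),
    5: ("RETURN",),
}
normal_path = run(blocks, 0, {"ok": True})
failure_path = run(blocks, 0, {"ok": False})
```

With `ok` true the run visits blocks 0, 1, 2 and 5; with `ok` false it visits 0, 1, 4, 3 and 5, reaching the handler block through LONGJMP.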
4.1.2 From Modelica to MidCode
MidCode is designed to represent interesting low-level properties uniformly, which means that we need to lower several high-level Modelica representations into a composition of MidCode constructs. The DAEToMid phase takes Modelica functions as given by the SimCode module and converts them to MidCode. The most important fields in a SimCode function object are its name as an Absyn.Path, its variable definitions, and its list of DAE statements.