Institutionen för Datavetenskap
Department of Computer and Information Science

Master's thesis

OMCCp: A MetaModelica Based Parser Generator Applied to Modelica

by
Edgar Alonso Lopez-Rojas

LIU-IDA/LITH-EX-A–11/019–SE
2011-05-31
Linköpings universitet
SE-581 83 Linköping, Sweden
Supervisors: Martin Sjölund and Mohsen Torabzadeh-Tari
Dept. of Computer and Information Science
Examiner: Prof. Peter Fritzson
Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/
© Edgar Alonso Lopez-Rojas
Abstract

The OpenModelica Compiler-Compiler parser generator (OMCCp) is an LALR(1) parser generator implemented in the MetaModelica language with parsing tables generated by the tools Flex and GNU Bison. The code generated for the parser is in the MetaModelica 2.0 language, which is the OpenModelica compiler implementation language and is an extension of the Modelica 3.2 language. OMCCp uses as input an LALR(1) grammar that specifies the Modelica language. The generated parser can be used inside the OpenModelica Compiler (OMC) as a replacement for the current parser, which is generated by the tool ANTLR from an LL(k) Modelica grammar. This report explains the design and implementation of this novel lexer and parser generator called OMCCp.

Modelica and its extension MetaModelica are both languages used in the OpenModelica environment. Modelica is an Object-Oriented Equation-Based language for Modeling and Simulation.
Acknowledgements

It is an honor for me to be able to culminate this work with the guidance of remarkable computer scientists. This thesis would not have been possible without the clear vision of my examiner, Professor Peter Fritzson. As the director of the Open Source Modelica Consortium (OSMC) he presented this great opportunity to me. Together with him, I have to thank my supervisors Martin Sjölund and Mohsen Torabzadeh-Tari. Martin has made his support and guidance available in more ways than I can count, and Mohsen has always kept track of my progress and helped me with the difficulties I encountered. I am pleased to be part of, learn from, and contribute to this great open source project called OpenModelica.

I am also grateful to IDA (Department of Computer and Information Science) for offering its premises and resources for my daily work.

I cannot forget to thank my family. My parents Jesus and Soledad for supporting me from the beginning of this project to become a Master in Computer Science. My fiancée Helena, who has been encouraging me all the time to give my best in every step of this journey. I am delighted to include my future daughter Isabella here, who has been my biggest motivation to complete this work before the day she steps into this world for the first time.

Last, but not least, my financial sponsors from Colombia: Fundacion Colfuturo (http://www.colfuturo.org/) and EAFIT University (http://www.eafit.edu.co/). They believed in my talent and provided the financial resources to achieve this goal.
Contents

1 Introduction 1
  1.1 Background 1
  1.2 Project Goal 2
  1.3 Methodology 2
  1.4 Intended Readers 3
  1.5 Thesis Outline 3
2 Theoretical Background 5
  2.1 Compilers 5
    2.1.1 Fundamentals 6
    2.1.2 Lexical Analysis 8
    2.1.3 Syntax Analysis 10
    2.1.4 Parser LALR(1) 13
  2.2 Error Handling in Syntax Analysis 15
    2.2.1 Error Recovery 16
    2.2.2 Error Messages 17
  2.3 The OpenModelica Project 17
    2.3.1 The Modelica Language 18
    2.3.2 MetaModelica extension 18
    2.3.3 Abstract Syntax Tree - AST 21
3 Existing Technologies 23
  3.1 OpenModelica Compiler (OMC) 23
    3.1.1 Architecture and Components 23
    3.1.2 ANTLR 24
    3.1.3 Current state 26
  3.2 Flex 27
    3.2.1 Input file lexer.l 27
    3.2.2 Output file lexer.c 27
  3.3 GNU Bison 28
    3.3.1 Input file parser.y 29
    3.3.2 Output file parser.c 29
4 Implementation 33
  4.1 Proposed Solution 33
  4.2 OMCCp Design 34
    4.2.1 Lexical Analyser 35
    4.2.2 Syntax Analyser 39
  4.3 OpenModelica Compiler-Compiler Parser (OMCCp) 44
    4.3.1 Lexer Generator 44
    4.3.2 Parser Generator 46
  4.4 Error handling 49
    4.4.1 Error recovery 49
    4.4.2 Error messages 50
  4.5 Integration OMC 54
5 Discussion 57
  5.1 Analysis of Results 57
    5.1.1 Lexer and Parser 57
    5.1.2 OMCCp Construction 58
    5.1.3 Implementation of a subset of Modelica and MetaModelica grammar 61
  5.2 OpenModelica Compiler 64
  5.3 Limitations 64
6 Related Work 66
  6.1 OpenModelica Development 66
  6.2 Compiler-Compiler Construction 67
7 Conclusions 69
  7.1 Accomplishments 69
  7.2 Future Work 70
Bibliography 73
Appendices 80
A OMC Compiler Commands 80
  A.1 Parameters - MetaModelica Parser Generator 80
    A.1.1 Generate compilerName 80
    A.1.2 Run compilerName, fileName 80
  A.2 OMC Commands 80
B Lexer Generator 83
  B.1 Lexer.mo 83
  B.2 LexerGenerator.mo 92
  B.3 LexerCode.tmo 100
C Parser Generator 107
  C.1 Parser.mo 107
  C.2 ParserGenerator.mo 126
  C.3 ParseCode.tmo 143
D Sample Input 146
  D.1 lexer10.l 146
  D.2 parser10.y 147
E Sample Output 152
  E.1 ParseTable10.mo 152
  E.2 ParseCode10.mo 157
  E.3 Token10.mo 168
  E.4 LexTable10.mo 168
  E.5 LexerCode10.mo 171
F Modelica Grammar 176
  F.1 lexerModelica.l 176
  F.2 parserModelica.y 180
G Additional Files 205
  G.1 SCRIPT.mos 205
  G.2 Main.mo 206
Glossary 209
Acronyms 211
List of Figures

2.1 Compiler Phases 6
2.2 Compiler Front-End 8
2.3 Parser components 12
2.4 OpenModelica Environment [Fritzson et al., 2009] 18
3.1 OMC simplified overall structure [Fritzson et al., 2009] 24
3.2 OMC Language Grammars 24
4.1 OMCCp (OpenModelica Compiler - Compiler) Lexer and Parser Generator 34
4.2 OMCCp Lexer and Parser Generator Architecture Design 36
4.3 OMC-Lexer design 37
4.4 OMC-Parser design 39
4.5 OMC-Parser LALR(1) 40
5.1 OMCCp - Time Parsing 63
List of Tables

2.1 LR(1) parsing table [Aho et al., 2006] 14
2.2 LR(1) parsing table rearranged [Aho et al., 2006] 14
2.3 LALR(1) parsing table [Aho et al., 2006] 15
5.1 OMCCp Files Implementation 60
5.2 Test Suite - Compiler 63
5.3 OMCCp - Time Parsing 63
Listings

2.1 MetaModelica uniontype 19
2.2 MetaModelica matchcontinue 20
2.3 MetaModelica list 20
3.1 ANTLR grammar file structure 25
3.2 Flex file structure 27
3.3 Bison file structure 29
4.1 Lexer.mo function scan 37
4.2 Parser.mo function parse 41
4.3 MultiTypedStack AstStack 43
4.4 ParseCode.mo case reduce action 43
4.5 ParseCode.mo function getAST 44
4.6 Modifications in the Bison Epilogue 46
4.7 Modifications in the Rules section in Bison 47
4.8 List of semantic values of tokens 48
4.9 Constants for error handling 50
4.10 Custom error messages in OMCCp 53
4.11 Error messages in OMCCp 53
4.12 program.mo with errors 54
4.13 Parser.mo original function 54
4.14 Parser.mo modified function 55
A.1 Compile Flex and Bison 81
A.2 OMCC.mos 81
A.3 OMCCP Command 81
A.4 SCRIPT.mos debug mode 82
A.5 OMCCP debug mode 82
B.1 Lexer.mo 83
B.2 LexerGenerator.mo 92
B.3 LexerCode.tmo 100
B.4 Types.mo 102
C.1 Parser.mo 107
C.2 ParserGenerator.mo 126
C.3 ParseCode.tmo 143
D.1 lexer10.l 146
D.2 parser10.y 147
E.1 ParseTable10.mo 152
E.2 ParseCode10.mo 157
E.3 Token10.mo 168
E.4 LexTable10.mo 169
E.5 LexerCode10.mo 171
F.1 lexerModelica.l 176
F.2 parserModelica.y 180
G.1 SCRIPT.mos 205
G.2 Main.mo 206
Chapter 1
Introduction
1.1 Background

The OpenModelica project develops a modeling and simulation environment based on the Modelica language [Fritzson, 2004]. The effort is supported by the Open Source Modelica Consortium (OSMC). It uses the OpenModelica Compiler (OMC) [Fritzson et al., 2009] to generate C or C++ code that runs simulations of models written in the Modelica language. OpenModelica currently makes use of the tool called Another Tool for Language Recognition (ANTLR) to generate the parser for the OpenModelica Compiler (OMC). The work presented in this master's thesis offers an alternative to the ANTLR parser. We present a novel Compiler-Compiler implemented completely in MetaModelica. MetaModelica is an extension of the Modelica language intended for modeling the semantics of languages. One large example is the modeling of the whole Modelica language together with its MetaModelica extensions in the OpenModelica bootstrapped compiler version [Sjölund et al., 2011].
The ANTLR parser generator [Parr and Quong, 1995], which has been used in the OpenModelica project for several years, has well-known disadvantages, including memory overhead, bad error handling, lack of type checking, and not generating MetaModelica code for building the Abstract Syntax Tree (AST). Since the AST nodes are initially generated in C (for later conversion into MetaModelica) without strong type checking, small errors in the semantic actions in the grammar are not detected at compilation time, and can give rise to hard-to-find bugs in the generated C code. When the semantic actions can be specified in MetaModelica and the AST builder generated in MetaModelica, this source of errors can be completely eliminated. Currently, ANTLR-generated parsers connect with OMC through an external C interface. ANTLR is also built as an integrated lexer and parser that hides behind a considerable number of libraries, which handle the complexity of the syntax analysis process in the compiler as a black box. ANTLR is only suitable for parsing LL grammars.
1.2 Project Goal

The goal of this master's thesis is to write a parser generator in the MetaModelica language that can replace the current ANTLR-based parser and generate MetaModelica code instead of C code.

The results expected from this thesis are:

• A Lexer and a Parser for the Modelica grammar, including its MetaModelica extension, that output the Abstract Syntax Tree (AST) for the language processed.

• A Lexer and Parser generator written in the MetaModelica language.

• Improvements in the error handling messages compared with ANTLR; specifically the messages concerning error correction hints for malformed syntax.
1.3 Methodology

The methodology used for the construction of the OpenModelica Compiler-Compiler parser generator (OMCCp) is based on a literature study of compiler construction techniques. There are various projects that offer lexer and parser generators, but there are none for the Modelica language. A literature review is the basis for the initiation of this project on compiler construction. Different literature from the OSMC is available, which contributes to a better understanding of the OpenModelica project. Besides the literature reviewed, we include the experience of the supervisor Martin Sjölund, who built the first bootstrapping compiler for the Modelica language [Sjölund et al., 2011]. Various papers and books from the examiner are available.
The examiner has a clear vision of the next steps in the development of the compiler due to his involvement in the project since it started several years ago.

There are exercises available for learning MetaModelica, including online courses. The exercises are important for familiarisation with the MetaModelica language. A MetaModelica guide is also provided to address the most common built-in functions and limitations of the language.

After the literature review, existing technologies that can support the project are addressed. A review of the techniques they use and their benefits is performed. This leads the architectural decisions towards the implementation of the parser and lexer generator.

Finally, the implementation of a subset of the Modelica grammar for the parser generator is addressed. This finalises the project and proves the validity of the proposed solution.
1.4 Intended Readers

The reader of this document is someone who wants to understand more about compiler construction and, more specifically, the syntax analyser phase of the OpenModelica Compiler. This document contains important information for OpenModelica developers who want to work on the OMC compiler design and construction.
1.5 Thesis Outline

This thesis gives an overview of the OpenModelica project and the architecture of the OpenModelica compiler. Chapter 2, Theoretical Background, familiarises the reader with the topic of compiler construction, more specifically lexical analysis and syntax analysis, and different basic concepts about grammars.

Chapter 3 covers existing technologies as a basis for understanding the implementation.
Finally, Chapter 5, Discussion, and Chapter 6, Related Work, explain different parts of the project, analysing the results of the implementation. The conclusions review the achievement of the goals and analyse the implemented solution. Future work provides the reader who intends to continue this work with more information about desired extensions and improvements to this project.

The appendices contain the source code of the entire project, including the sample files generated from exercise 10 of the MetaModelica exercises available in [Fritzson and Pop, 2011a,b]. A large subset of the Modelica 3.2 grammar [Modelica-Association, 2010] is also included; it was used to prove the usability of the parser generator.
Chapter 2
Theoretical Background
“The world as we know it depends on programming languages”
Aho et al. [2006]
We required strong knowledge of compiler construction theory to implement the solution for this thesis. For a better understanding of this project, the reader must be familiar with some of the fundamental terms and basic algorithms used for the construction of the lexical analyser and the parser during this project.

This chapter covers the main topics of compiler construction that are used in the implementation of the solution presented in Chapter 4. The next part of this chapter addresses an important topic for this project, which is the improvement of error handling during the compiler parsing phase. The last part presents an overview of the current OpenModelica project, including the Modelica and MetaModelica languages and the OMC.
2.1 Compilers

Aho et al. [2006] is a mandatory book for anyone who intends to understand the concepts of compilers. Most of the compiler theory covered in this section is based on this book. Other sources, such as Kakde [2002] and Terry [2000], have also been reviewed and are addressed in the different subsections. This section intends to give the reader an introduction to the compiler terms and techniques used during the design and development of this project.
2.1.1 Fundamentals

Programming languages rely strongly on compilers for their evolution and widespread use. These languages exist due to the limitations developers face when building complex systems in machine language, which only encodes sequences of binary instructions. In a more general view, a compiler is a software tool that serves as a translator from one language into another.

Figure 2.1: Compiler Phases

If we see a compiler as a process, we can identify the source language as the input and the target language as the output of this process. For example, in languages such as C, the input language is C code and the output language is machine code for a specific architecture and operating system.
There are several types of compilers in use today, and their classification depends on the different purposes of the compilers. We distinguish between native compilers, cross-compilers, interpreters, and source-to-source compilers (translators).

Native compilers are used for the generation of machine-specific code (binary code). Cross-compilers generate machine-specific code too, but they generate the code for a different machine than the one they run on.

Interpreters for languages are similar to the Java Virtual Machine (JVM). They receive two parameters as input: the source program and the input for the program. The interpreter simulates the result of the compiled source program executed directly as machine-language code. It outputs the expected result of the source program over the input used as a parameter.
In this report we address source-to-source compilers, which are commonly used for translating one high-level language, such as Modelica, into another high-level language, such as C. This technique is common due to the difficulty of generating low-level code, such as assembler or binary code, directly.
The complexity of a compiler is shown in Figure 2.1. Inside a compiler there are two main parts that can be recognised: the Analysis (Front-End) and the Synthesis (Back-End). The Analysis phase is handled by the Front-End of the compiler. The Front-End is divided into three steps: Lexical Analysis, Syntax Analysis and Semantic Analysis. These steps are presented in Figure 2.2.

The Lexical Analysis task is performed by a component called the Lexer. The main function of the Lexer is to take the source code as input and recognise different sequences of characters as units called tokens.

The Front-End is the part of the compiler that we focus on in this implementation. During the analysis phase, the source code is processed by the Lexical Analyser, Syntax Analyser and Semantic Analyser to output an intermediate representation of the input code called the Abstract Syntax Tree (AST).
Figure 2.2: Compiler Front-End (Lexer, Parser and intermediate code generator transform the input program into tokens, an abstract syntax tree, and three-address code, all sharing a Symbol Table)
2.1.2 Lexical Analysis

The Lexical Analysis, also called scanning, receives the source code as a character stream. It identifies the special tokens specified by a language, making the next phase of the compiler simpler. The programming language's tokens are often specified by the use of regular expressions.

A Lexer is a program that runs a finite automaton which recognises a valid language based on a regular language. As mentioned above, regular languages are described by the use of regular expressions.

The Lexical Analysis is the first part of the compiler. It simplifies the complexity of recognising a complete grammar by providing a simple transformation of the source code into a list of tokens. In the next step of the compiler, the syntax analysis uses only the tokens to accept or reject the source code provided.

For a better understanding of how the Lexical Analysis works, we introduce the basic concepts of Finite Automata and regular languages in the next section. We later present a description of what a Lexer specifically does.
Finite Automata and Regular Languages

Sipser [2005] presents the use of Finite Automata, also known as Finite State Machines, to recognize the regular languages. He defines a Finite Automaton as a collection of states (Q), an alphabet (Σ), a transition function (δ : Q × Σ → Q), a start state (q0) and a set of accept states.
To describe briefly how a Finite Automaton works, state diagrams are broadly used. There are two types of Finite Automata: the Deterministic Finite Automaton (DFA) and the Non-Deterministic Finite Automaton (NFA).
The Lexer

For the construction of the Lexer it is preferred to use a DFA. However, an NFA can also be converted into a DFA, and a lexer can also simulate the non-deterministic behaviour of an NFA. The main reason for using a DFA is that we want a transition function δ that allows the Lexer to decide on only one path for a specific character input in the character stream from the source code.
All the regular expressions of the set of tokens are combined during the construction of a Lexer. It often happens that one sequence of characters can be recognised as two or more different tokens. Therefore, the lexer must have extra rules that prioritise longer matches over shorter ones. The rules can also be ordered in an accepting sequence to avoid ambiguity.
The Lexical Analysis phase filters out tokens that are used only by the programmer, such as comments and the different kinds of spacing and indentation in the code. This task reduces the complexity of the code by converting all the characters into a list of tokens. If the Parser had to deal with this task, the number of terminal tokens would increase, making the rules more complex and decreasing the overall performance of the compiler.
The compiler gains performance when the Lexical Analysis is kept separated from the Syntax Analysis. This performance can be achieved by applying specialised techniques in the handling of the character stream, such as buffering to read a certain number of characters at a time.

A structure called the Symbol Table is used to store all the identifiers with their names or values. This structure avoids duplication and improves the efficiency of the code through all the phases of the compiler, as represented in Figure 2.2. A token is usually represented by a tuple, consisting of an identifier of the token and a reference to the Symbol Table, e.g. TOKEN<IDENT, x>, where IDENT is the identifier of the token and x is the value of the identifier found by the Lexer.
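The ideas above (combined token rules, longest-match priority, rule order for ties, filtered whitespace, and token tuples) can be sketched in a few lines of Python; the token names and rules here are illustrative only, not OMCCp's actual rules.

```python
# A minimal regex-based scanner sketch (hypothetical, not thesis code).
# Longest match wins; on a tie, the rule listed first wins; whitespace
# is matched but filtered out, as the text describes.
import re

TOKEN_RULES = [            # order resolves ties (keywords before identifiers)
    ("IF",     r"if"),
    ("IDENT",  r"[A-Za-z][A-Za-z0-9]*"),
    ("NUMBER", r"[0-9]+"),
    ("ASSIGN", r":="),
    ("SKIP",   r"[ \t\n]+"),
]

def tokenize(source):
    tokens, pos = [], 0
    while pos < len(source):
        best = None
        for name, pattern in TOKEN_RULES:
            m = re.match(pattern, source[pos:])
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())   # keep the longest match only
        if best is None:
            raise ValueError(f"illegal character at position {pos}")
        name, text = best
        if name != "SKIP":                 # filter whitespace tokens
            tokens.append((name, text))    # (token id, value) tuple
        pos += len(text)
    return tokens

print(tokenize("ifx := 42"))
# "ifx" becomes one IDENT token (longest match), not IF followed by "x"
```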
Flex: a Lexer Generator

There are programs that automate the labour of constructing the transition rules to identify the tokens for a Lexer. Flex is one example of a Lexer Generator; it is based on the Lexer generator Lex. It takes as input a file with the definition of the rules for the recognition of the tokens. These rules are defined using regular expressions.

Flex also allows the developer to specify the return token that matches each pattern. Some tokens, such as white spaces and line feeds, are ignored, as explained above.

In the next chapter we cover the existing technologies used for this project and explain Flex in technical detail.
2.1.3 Syntax Analysis

The Syntax Analysis phase is performed by a program called the Parser. The Parser requires a formalism more powerful than regular expressions to specify the programming language constructions. The rules are commonly expressed using Context Free Grammars (CFG). A CFG can be recognised by the use of a Pushdown Automaton.
PushDown Automata and Context Free Languages
Sipser [2005] defines a Context Free Grammar (CFG) as a 4-tuple (V, Σ, R, S),
where V is the set of variables, Σ is a set of terminals, R is a set of Rules
and S is the Start Variable.
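To make the 4-tuple concrete, the following Python sketch (illustrative, not thesis code) encodes the sample grammar used later in this chapter (S → CC, C → cC | d) as plain data and enumerates the short terminal strings it derives.

```python
# Sipser's 4-tuple (V, Sigma, R, S) as plain Python data for the grammar
# S -> C C,  C -> c C | d.  Variable and function names are illustrative.
V = {"S", "C"}
SIGMA = {"c", "d"}
R = {"S": [["C", "C"]], "C": [["c", "C"], ["d"]]}
START = "S"

def derivable(max_len):
    """Enumerate all terminal strings of length <= max_len by breadth-first
    search over leftmost derivations."""
    results, queue = set(), [[START]]
    while queue:
        form = queue.pop(0)
        # locate the leftmost variable in the sentential form
        i = next((k for k, s in enumerate(form) if s in V), None)
        if i is None:                      # no variables left: a terminal string
            results.add("".join(form))
            continue
        for rhs in R[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            # prune forms longer than max_len (safe here: no production
            # in R ever shrinks the sentential form)
            if len(new) <= max_len:
                queue.append(new)
    return sorted(results)

print(derivable(3))   # ['cdd', 'dcd', 'dd']
```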
A Pushdown Automaton (PDA) starts by reading an input list of tokens. The PDA uses the tokens and a stack to decide its next state and action; this action can be to reduce from the stack or to push a state or token onto the stack. It keeps running until it finds an accept state and then ends. Several situations can happen, including an infinite loop of the machine, which is why the grammar should be constructed in such a way that these problems are avoided.

Similar to the DFA, there are PDAs that are deterministic, and those are the ones we consider for building the Parser.
The Parser

The Parser is in charge of determining whether the source code that has been tokenised by the Lexer is constructed according to the rules of the grammar. To do this, it executes a PDA that outputs "accept" if the input belongs to a valid construction of the grammar; otherwise it outputs an error message identifying the token that does not fit the construction rules.

The work done by the Lexer in the first phase of the compiler allows the Parser to ignore tokens such as white spaces and line feeds and to consider all the tokens as terminals of the grammar that describes the language. This simplifies the rules of the Parser, making the syntax analysis process more efficient and faster.
The Parser validates the rules of the grammar against the list of tokens received from the Lexer. At the same time, a Parser can execute an additional task: the construction of a structure called the Abstract Syntax Tree (AST). The AST is a tree-shaped representation of the whole source code. It is the input for the compiler Back-End, which uses the AST for optimisation and for the generation of machine-specific code.
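As a small illustration of what AST nodes look like and how a Back-End walks them, here is a hypothetical Python sketch (the thesis itself builds its AST with MetaModelica uniontypes; these class names are invented).

```python
# An illustrative AST for arithmetic expressions. A real Back-End would
# emit code while walking the tree; here the walk simply evaluates it.
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def evaluate(node):
    """Post-order walk over the tree, the same traversal a code
    generator would perform."""
    if isinstance(node, Num):
        return node.value
    if node.op == "+":
        return evaluate(node.left) + evaluate(node.right)
    if node.op == "*":
        return evaluate(node.left) * evaluate(node.right)
    raise ValueError(f"unknown operator {node.op}")

# AST for the expression 1 + 2 * 3; operator precedence is already
# encoded in the tree shape by the parser that built it.
tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
print(evaluate(tree))   # 7
```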
A Parser is composed of a predictive table, a stack of states, a list of tokens as input, and a parsing algorithm that runs over the list of tokens. Figure 2.3 shows these components and their interactions.

A Parser uses the predictive tables, also called parsing tables, to determine the next action and the new state of the machine. The next state is queried from the parsing tables depending on the lookahead token and the current state on the stack.
Parsers are commonly classified by the algorithm used for performing the parsing operation. There are three known types: Top-Down Parsers, Bottom-Up Parsers and Universal Parsers. However, for programming languages only the first two are utilised, due to the inefficiency of the Universal Parser.

A Top-Down Parser builds the parse tree from the top to the bottom; a Bottom-Up Parser works in the opposite direction. Top-Down Parsers only work for grammars called Left-to-right, Leftmost derivation (LL) grammars. An LL(k) Parser is a top-down parser with k lookahead tokens. LL(k) Parsers utilise a predictive table to decide the next state.
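To make the top-down strategy concrete, here is a hypothetical recursive-descent LL(1) parser for the sample grammar used later in this chapter (S → CC, C → cC | d); with one token of lookahead, the choice between C → cC and C → d is immediate.

```python
# A recursive-descent (top-down, LL(1)) parser sketch, not thesis code.
# Grammar: S -> C C,  C -> c C | d.  One procedure per non-terminal.

class LL1Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input
        self.pos = 0

    def lookahead(self):
        return self.tokens[self.pos]

    def eat(self, expected):
        if self.lookahead() != expected:
            raise SyntaxError(f"expected {expected}, got {self.lookahead()}")
        self.pos += 1

    def parse_S(self):          # S -> C C
        self.parse_C()
        self.parse_C()
        self.eat("$")           # the whole input must be consumed

    def parse_C(self):          # C -> c C | d, chosen by the lookahead token
        if self.lookahead() == "c":
            self.eat("c")
            self.parse_C()
        else:
            self.eat("d")

def accepts(tokens):
    try:
        LL1Parser(tokens).parse_S()
        return True
    except SyntaxError:
        return False

print(accepts("ccdd"))   # True
print(accepts("cdc"))    # False: the second C must end with d
```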
Figure 2.3: Parser components (tokens, parsing tables, state stack, lookahead, and the output AST)
Bottom-Up Parsers work for grammars called Left-to-right, Rightmost derivation (LR) grammars. Knuth [1965] first introduced the concept of LR parsing. The most common parsers are LR(k), the Simple LR parser (SLR) and the Look-Ahead LR parser (LALR). The LALR(1) parser uses a simplification of the parsing tables used by the LR(1) parser.
In general, a Bottom-Up Parser builds the AST by performing two types of task: Shift and Reduce.

Shift pushes the terminal symbol found, together with the next state, onto the state stack while the machine goes through the list of tokens. The next state is looked up in a table called the ACTION table, which contains an entry for every terminal symbol and state; a second table, called the GOTO table, gives the next state for non-terminal symbols.

Reduce pops a certain number of values from the stack and then pushes a new value, also obtained through the GOTO table. While reducing, an LALR parser can build up the AST and push the new value onto another stack, called the Semantic Stack, which follows the same shift and reduce discipline performed by the algorithm.
Blasband [2001] made an effort in parsing grammars that do not perfectly fit into the classification of LL and LALR grammars.

In this report we only briefly look at Top-Down Parsers. We are more interested in the LALR(1) Bottom-Up Parser, which is the type of parser used in this implementation. The LALR(1) parser is explained in more detail in the next section.
2.1.4 Parser LALR(1)

The LALR(k) parsers were first introduced by DeRemer [1969]. They are the parsers most commonly used for programming languages, due to the speed and size of their parsing tables and their advantages over their predecessors, the LR(0) and SLR parsers.

Kakde [2002] and Aho et al. [2006] explain very well how the bottom-up algorithm works. We are interested here in understanding the basic principles of the LALR(1) algorithm.
Parsing Tables

There are two tables in an LALR parser: the first one is the ACTION table, the second is the GOTO table. The theoretical construction of these tables can be found in almost any compiler literature, such as Aho et al. [2006], Kakde [2002] and Terry [2000].

There are two methods for constructing LALR(1) parsing tables from the LR(1) parsing tables. The first one, the easy but space-consuming method, is presented here. The other method differs from the former by checking, in every step of the construction of the LR(1) tables, for common rules that can be merged, significantly reducing the number of states that are ever materialised.

We explain the construction of the LALR(1) parsing tables and the content of the LR(1) tables through an example. Let us take this sample grammar from Aho et al. [2006].
Simple Grammar Sample
S’ → S
S → CC
C → cC|d
From this grammar, the LR(1) parsing table 2.1 is constructed according to the algorithm presented in Aho et al. [2006], based on the canonical LR(1) collection. The symbol r in the table identifies a REDUCE operation and the symbol s identifies a SHIFT operation. The keyword acc identifies the accepting state.

From table 2.1 we can observe that about half of the entries in the table are blank. The LR(1) parsing tables have the disadvantage of
14
CHAPTER 2. THEORETICAL BACKGROUND
Table 2.1: LR(1) parsing table [Aho et al., 2006]
ACTION
GOTO
state
c
d
$
S
C
0
s3
s4
1
2
1
acc
2
s6
s7
5
3
s3
s4
8
4
r3
r3
5
r1
6
s6
s7
9
7
r3
8
r2
r2
9
r2
growing considerably large, even for small grammars, due to the redundancy
of productions for similar states with different lookahead symbol.
If we rearrange the rows as presented in parsing table 2.2, we can
notice similarities between the productions for different lookaheads:
the state pairs (3 and 6, 4 and 7, 8 and 9) share the same core
productions but have different lookahead symbols.

Table 2.2: LR(1) parsing table rearranged [Aho et al., 2006]

                ACTION            GOTO
    state     c     d     $     S     C
      0      s3    s4           1     2
      1                  acc
      2      s6    s7                 5
      3      s3    s4                 8
      6      s6    s7                 9
      4      r3    r3
      7                  r3
      5                  r1
      8      r2    r2
      9                  r2
The LALR(1) parsing table 2.3 is constructed based on the one above,
first by identifying the common core of each set of states and then
replacing those sets with their union. For a better understanding of this
construction, the reader can consult the literature [Aho et al., 2006,
Section 4.7.4].
Table 2.3: LALR(1) parsing table [Aho et al., 2006]

                ACTION            GOTO
    state     c     d     $     S     C
      0      s36   s47          1     2
      1                  acc
      2      s36   s47                5
     36      s36   s47               89
     47      r3    r3    r3
      5                  r1
     89      r2    r2    r2
LALR(1) Algorithm
Both the LR(1) and the LALR(1) parser perform the same algorithm; the only
difference is that the parsing tables used by LALR(1) contain different
states that will be shifted onto or reduced from the stack.
The parsing algorithm starts by finding the right action in the ACTION
table for a given terminal symbol a and a current state i, denoted
ACTION[i, a]. This entry can hold a REDUCE (r), SHIFT (s), ACCEPT
(acc) or error (blank) action.
The GOTO table is used to find the next state j for a given non-terminal A
and current state i, denoted GOTO[i, A] = j.
A REDUCE action takes a certain number of symbols from the parsing
stack, applies a transformation, and puts the result and the next state
back onto the stack.
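The shift/reduce loop described above can be sketched as follows, driving the merged tables of Table 2.3 for the sample grammar. This is an illustrative sketch, not the OMCCp implementation; merged state names such as 36 are kept as strings.

```python
# Table-driven LR parsing loop for the grammar
#   (1) S -> C C   (2) C -> c C   (3) C -> d
# using the LALR(1) ACTION and GOTO tables of Table 2.3.

ACTION = {
    ("0", "c"): ("s", "36"), ("0", "d"): ("s", "47"),
    ("1", "$"): ("acc",),
    ("2", "c"): ("s", "36"), ("2", "d"): ("s", "47"),
    ("36", "c"): ("s", "36"), ("36", "d"): ("s", "47"),
    ("47", "c"): ("r", 3), ("47", "d"): ("r", 3), ("47", "$"): ("r", 3),
    ("5", "$"): ("r", 1),
    ("89", "c"): ("r", 2), ("89", "d"): ("r", 2), ("89", "$"): ("r", 2),
}
GOTO = {("0", "S"): "1", ("0", "C"): "2", ("2", "C"): "5", ("36", "C"): "89"}
RULES = {1: ("S", 2), 2: ("C", 2), 3: ("C", 1)}  # rule -> (head, body length)

def parse(tokens):
    stack = ["0"]                     # stack of states
    tokens = list(tokens) + ["$"]     # $ marks the end of input
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:               # blank entry: syntax error
            return False
        if act[0] == "acc":           # valid acceptance state
            return True
        if act[0] == "s":             # SHIFT: push next state, advance input
            stack.append(act[1])
            i += 1
        else:                         # REDUCE: pop |body| states, then GOTO
            head, length = RULES[act[1]]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])
```

For example, `parse("ccdd")` and `parse("dd")` accept, while `parse("cd")` hits a blank ACTION entry and is rejected.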
When an error is detected (a blank entry in the parse table), several
correcting actions can be performed. This topic is covered in more detail in
Section 2.2.
2.2
Error Handling in Syntax Analysis
Error handling techniques in the front-end are more relevant during
the syntax analysis and semantic analysis phases than in the
lexical analysis phase.
Only a few errors can be detected by the lexical analysis, such as
non-terminated comments, invalid characters or unrecognised tokens. One
possible error-recovery strategy implemented in a lexer is to ignore invalid
characters in the input and continue the process.
Error handling techniques can be divided into two topics: error
recovery and error message display. Error recovery techniques
are concerned with how the parser can keep parsing after an error token is
found. Error message display is concerned with how to present useful hints
that help the developer correct the source code.
In this section we present these two topics for error handling
during the syntax analysis phase.
2.2.1
Error Recovery
For LALR parsers, several error recovery techniques have been developed, as
in [Burke and Fisher Jr, 1982, Bilos, 1983, Burke and Fisher, 1987, McKenzie
et al., 1995, Degano and Priami, 1998, Corchuelo et al., 2002] and more
recent research such as [Kats et al., 2009, de Jonge et al., 2010].
Error recovery techniques try to improve the quality of the parser through
different approaches such as primary recovery or secondary recovery.
The first condition to start the recovery is to access the configuration
obtained when the token preceding the error token was shifted onto the
stack. Techniques for deferring the reduce actions after a shift were
developed by Burke and Fisher Jr [1982].
Primary techniques are concerned with single-token modifications of the
list of tokens. A single modification is only possible when the error is
classified as simple. The modification can be an insertion, deletion,
substitution or merging.
Every attempt to perform a repair is known as a trial. A common
technique for searching the trials is to attempt to repair the error token by
performing one of these operations: merging, insertion, substitution, scope
recovery and finally deletion.
In the case of insertion or substitution, a set of possible candidates should
be generated, and from there a single candidate or none is
selected.
When primary recovery fails, the number of tokens under consideration
needs to be reduced. This can be done by discarding tokens that
precede, follow or surround the error token. This is known as secondary
recovery.
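As a rough illustration of single-token trials, the sketch below tries deletion, insertion and substitution at an error position. The predicate `parses` stands in for a full parser check (here it merely accepts balanced parentheses), the trial order is simplified, and all names are illustrative, not taken from the thesis implementation.

```python
# Sketch of primary (single-token) error recovery trials.
def parses(tokens):
    # Stand-in "grammar": balanced parentheses.
    depth = 0
    for t in tokens:
        if t == "(":
            depth += 1
        elif t == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

VOCAB = ["(", ")"]   # candidate tokens for insertion/substitution

def repair(tokens, pos):
    """Try deleting, inserting and substituting a token at the error
    position; return the first trial whose result parses."""
    trials = [("delete", tokens[:pos] + tokens[pos + 1:])]
    for t in VOCAB:
        trials.append(("insert " + t, tokens[:pos] + [t] + tokens[pos:]))
        trials.append(("substitute " + t, tokens[:pos] + [t] + tokens[pos + 1:]))
    for name, candidate in trials:
        if parses(candidate):
            return name, candidate
    return None       # primary recovery failed; fall back to secondary
```

For instance, `repair(["(", "(", ")"], 3)` finds that inserting ")" repairs the input, while `repair(["(", ")", ")"], 2)` succeeds by deletion.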
2.2.2
Error Messages
In simple recovery the error messages are classified into five different types:
merging, misspelling, insertion, deletion and substitution.
In secondary recovery the error messages are classified into two types. Type
1 error messages are displayed when the discarded tokens are present in
a single line. Type 2 errors are displayed when multiple lines need to be
discarded.
In addition there are three other types. The first refers to different candidates
for a recovery. The second type is displayed when the end of file is reached
but not expected. The third is used when all error recovery routines fail;
the parser then displays a generic unrecoverable syntax error message.
2.3
The OpenModelica Project
OpenModelica[2] is an open source project led by the Open Source
Modelica Consortium (OSMC)[3]. At the time of writing this report,
OpenModelica is at version 1.7.0, released in April 2011.
OpenModelica contains different tools that support the design
and construction of simulation projects. These tools are
classified into compiler tools, graphic interface tools and Eclipse-based
environment tools.
The OpenModelica environment consists of several tools such as
OMEditor, UML-Modelica, OMShell, OMNotebook, DrControl under
OMNotebook and Modelica Development Tooling (MDT). There are some other
resources, such as documentation, OMDev (tools for building the compiler)
and auxiliary tools for the OpenModelica developer, that have been used
during the development of this project. Figure 2.4 shows the architecture
of the OpenModelica environment.
[2] OpenModelica: http://www.openmodelica.org
[Figure 2.4 is a block diagram of the OpenModelica environment, with
components including the OMOptim optimization subsystem, the Modelica
compiler, the interactive session handler, execution, the graphical model
editor/browser, the textual model editor, the Modelica debugger, the
DrModelica notebook model editor and the Eclipse plugin editor/browser.]
Figure 2.4: OpenModelica Environment [Fritzson et al., 2009]
2.3.1
The Modelica Language
The design of the Modelica language started in the fall of 1996. The
first report on the language was made available on the web in September
1997. The first publication on Modelica, by Elmqvist [1997], was made at the
Symposium on Computer-Aided Control System Design. The language has been
developed ever since as a language for multi-domain modelling and simulation,
with several research contributors, e.g. Fritzson and Bunus [2002], Pop and
Fritzson [2005, 2006], Akesson et al. [2008, 2010], Sjölund [2009],
Sjölund et al. [2011], Lundvall et al. [2009]. Modelica is an equation-based
and object-oriented language designed with the aim of defining a de facto
standard for simulation.
There have been recent efforts in writing a new Modelica compiler. The
compiler and other parts of the OpenModelica project are described in
Fritzson et al. [2009].
2.3.2
MetaModelica extension
The main source of information for the MetaModelica language is the draft
document "MetaModelica User's Guide" written by Fritzson and Pop [2011a].
This document has recently been improved by Fritzson and Pop [2011b]
towards the implementation of the specification of a new version of the
Modelica language.
MetaModelica was created in the OpenModelica project with the
intention of modelling the semantics of the Modelica language. MetaModelica is
thus the starting point for the construction of a Modelica compiler. The
MetaModelica language is part of the project to create a bootstrapped
compiler, written in MetaModelica, for the MetaModelica and Modelica languages.
MetaModelica adds new operators and types to the Modelica language.
In this report we cover the constructs uniontype, record, matchcontinue and
list.
uniontype
The uniontype construct allows MetaModelica to declare types
based on the union of two or more record types. It can be recursive and
can include other uniontypes. An example of a uniontype is presented in
listing 2.1.
Listing 2.1: MetaModelica uniontype
uniontype Exp
  record INT
    Integer integer;
  end INT;
  record IDENT
    String ident;
  end IDENT;
end Exp;
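For readers more familiar with mainstream languages, the Exp uniontype above can be approximated in Python with one dataclass per record constructor. This is an analogy for illustration only; MetaModelica's record constructors and pattern matching have no exact Python equivalent.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class INT:          # record INT ... end INT;
    integer: int

@dataclass
class IDENT:        # record IDENT ... end IDENT;
    ident: str

Exp = Union[INT, IDENT]   # uniontype Exp with two record alternatives

def show(e):
    # Dispatch on the record type, much like matching on the uniontype.
    if isinstance(e, INT):
        return str(e.integer)
    return e.ident
```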
matchcontinue
The matchcontinue construct resembles the switch statement in C, with
some additions. Unlike the switch statement, matchcontinue can return a
value; it can contain more than one conditional, and it can also return more
than one value. A section for the definition of local variables is present right
after the matchcontinue declaration. The wildcard '_' (underscore) can be
used to match all cases; alternatively, an else case can be used instead of the
wildcard.
The matchcontinue construct contains case blocks similar to those of the
common switch statement in C code. Each case can contain an equation block.
The program flow tries to execute the instructions of a specific equation
block one after another. If any instruction cannot be executed or fails,
the next case is tried, and so on, until one case block reaches its end;
a return value is then assigned to the corresponding variables. If no case
block reaches its end, no value is assigned to the variables. An example of
the syntax for matchcontinue is presented in listing 2.2.
Listing 2.2: MetaModelica matchcontinue
(token, env2) := matchcontinue (act)
    local
      Types.Token tok;
    case (1)
      equation
        tok = Types.TOKEN(tokName[act], act, buffer, info);
      then (SOME(tok), env2);
    case (_)
      then (NONE(), env2);
    else
      then (NONE(), env2);
  end matchcontinue;
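The try-next-case behaviour can be mimicked in Python as a loose analogue (not OMC code): each case body runs in a try block, any failure moves on to the next case, and a pattern of None plays the role of the wildcard '_'.

```python
# A loose Python analogue of matchcontinue semantics.
def match_continue(value, cases):
    for pattern, body in cases:
        if pattern is not None and pattern != value:
            continue                  # the pattern does not match
        try:
            return body(value)        # run the case's "equation block"
        except Exception:
            continue                  # a failing equation: try the next case
    raise ValueError("no case matched")
```

For example, a first case that matches but whose body fails falls through to the wildcard case, just as the failing equation block in listing 2.2 falls through to `case (_)`.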
list
The list type is used to create linked lists, which work as in C. Braces are
used to define the list elements. The operator '::' is used to add items to a
list or to retrieve items from it.
To illustrate how a list works in MetaModelica, we have the following
sample code in listing 2.3. In this code, the instruction on line 1 creates a
list called 'a'. Line 2 retrieves the top element '1' of the list 'a',
saves it in the variable 'i', and saves the rest of the list back in the
variable 'a'. Finally, line 3 is the inverse operation of line 2 and
adds the item 'i' back onto the list 'a'.
Listing 2.3: MetaModelica list
1 list<Integer> a = {1,2,3};
2 i::a = a;
3 a = i::a;
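The same three operations look like this in Python, where sequence unpacking plays the role of '::' for retrieval and list concatenation plays its role for adding an element (an analogy only; Python lists are arrays, not linked lists):

```python
# The three list operations, mirrored in Python.
a = [1, 2, 3]    # list<Integer> a = {1,2,3};
i, *a = a        # i::a = a;   now i == 1 and a == [2, 3]
a = [i] + a      # a = i::a;   a is [1, 2, 3] again
```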
2.3.3
Abstract Syntax Tree - AST
The AST (Abstract Syntax Tree) is a structure that abstracts away part of
the details present in the source code and represents the constructs of the
programming language unambiguously. The AST (the semantic
constructs of the language) can be declared in the MetaModelica
language thanks to the 'uniontype' construct explained in the previous section.
The construction of this tree is based on primitive operations such as
integer operations. These constructs represent the semantic constructs of
the Modelica and MetaModelica languages. The file Absyn.mo contains the
specification of the constructs.
Chapter 3
Existing Technologies
Several technologies formed the basis for the construction of this project. In
this chapter we present an introduction to some of the features built into the
OpenModelica Compiler (OMC) and the ANTLR parser
generator currently used in OMC.
The Fast Lexical Analyzer Generator (FLEX) and GNU Bison are the
technologies on which the construction of OMCCp was based.
Aaby [2003] is a good reference for those who want to learn more about
compiler construction with Flex and Bison. We use his book and the GNU
Bison manual (latest version at the time of writing, 2.4.3, by Donnelly and
Stallman [2010]) to explain some technological aspects of FLEX and GNU Bison.
This chapter is not intended to be a guide to these technologies, but it
gives the reader the concepts required to understand the rest of this thesis.
3.1
OpenModelica Compiler (OMC)
The OpenModelica Compiler (OMC) is the core tool of the OpenModelica
project. It has been developed since the beginning of the Modelica language
and is featured in Fritzson et al. [2009].
3.1.1
Architecture and Components
The architecture of OMC is presented in Figure 3.1. The main components
of this diagram represent some of the phases of a compiler, where the lexical
and syntax analysis is represented by the starting process "Parser". This
parser performs both the lexical and the syntax analysis, due to the design of
ANTLR as an integrated lexer and parser generator.
[Figure 3.1 is a block diagram whose components include Parse,
SCode/explode, Lookup, Inst, Ceval, Static, DAELow, SimCode and Dump,
with intermediate representations Absyn, SCode and DAE, and outputs such
as C code and flat Modelica.]
Figure 3.1: OMC simplified overall structure [Fritzson et al., 2009]
The OMC is used to compile both the MetaModelica grammar and the Modelica
grammars from version 1 to 3 (see Fig 3.2). Its source code is available
for download from the subversion repository[1].
[Figure 3.2 is a diagram of the OMC compiler with its four grammars:
MetaModelicaParser, Modelica1Parser, Modelica2Parser and Modelica3Parser.]
Figure 3.2: OMC Language Grammars
3.1.2
ANTLR
Another Tool for Language Recognition (ANTLR) is a parser generator that
integrates the lexical analysis and the syntax analysis in one single tool.
It generates parsers for LL(k) grammars.
ANTLR was created by Parr and Quong [1995] and is today at
version 3. The information presented here is extracted from the official
website[2] and the tutorial website by Mills [2005]. The reference manual
by Parr [2007] provides more complete and detailed information about ANTLR.
This project is intended to be a substitute for this tool, so we consider it
important to get an overview of ANTLR's most significant features and
characteristics.
The grammar used by ANTLR is of type LL(k), which means that the
parsers generated by ANTLR are top-down parsers, as explained in
Section 2.1.3. ANTLR uses Extended Backus-Naur Form (EBNF) notation for
defining the grammar rules. Extended Backus-Naur Form is
an extension of Backus-Naur Form (BNF); EBNF adds new
constructs to BNF, such as '+' after an item to indicate one or more
occurrences.
The grammar file used by ANTLR contains several parts as presented in
listing 3.1.
Listing 3.1: ANTLR grammar file structure
header {
  // stuff that is placed at the top of <all> generated files
}

options { options for entire grammar file }

{ optional class preamble - output to generated class file
  immediately before the definition of the class }
class YourLexerClass extends Lexer;
// definition extends from here to next class definition
// (or EOF if no more class defs)
options { YourOptions }
tokens {
  EXPR;            // Imaginary token
  THIS = "that";   // Literal definition
  INT = "int";     // Literal definition
}
lexer rules ...
myrule [args] returns [retval]
  options { defaultErrorHandler=false; }
  : // body of rule ...
  ;

{ optional class preamble - output to generated class file
  immediately before the definition of the class }
class YourParserClass extends Parser;
options { YourOptions }
tokens ...
parser rules ...
rulename [args] returns [retval]
  options { defaultErrorHandler=false; }
  { optional initiation code }
  : alternative_1
  | alternative_2 ...
  | alternative_n
  ;

{ optional class preamble - output to generated class file
  immediately before the definition of the class }
class YourTreeParserClass extends TreeParser;
options { YourOptions }
tokens ...
treeparser rules ...

// arbitrary lexers, parsers and treeparsers may be included

[2] ANTLR: http://www.antlr.org/
As we can see, the grammar file contains several sections, including a header,
a lexer (tokens and rules), a parser (tokens and rules), AST rules, and
options sections that are copied verbatim to the generated parser.
The generated parser files are in the desired target language, which is
specified when compiling the grammar file.
ANTLR allows the OpenModelica developers to specify, in a robust and
flexible way, the rules and the grammar for the combined lexer
and parser. It then generates code in the target language that outputs the
designed AST for both the Modelica and MetaModelica grammars.
3.1.3
Current state
The OMC is today (May 2011) at version 1.7.0 (r8600). It is intended
to be used by both industry and academia. Various research materials have
been produced since 1997, including Master's[3] and PhD[4] theses,
conference papers[5], journal papers[6] and books[7]. More are currently
under development or recently finished, such as this master's thesis. This
proves that the OMC is today an active research topic in the OpenModelica
project.
3.2
Flex
FLEX is based on the tool called Lexical Analyzer Generator (LEX). Its
grammar accepts regular expressions to define the tokens.
3.2.1
Input file lexer.l
The FLEX input file lexer.l contains three sections: definitions, rules and
user code.
Listing 3.2: Flex file structure
Definitions
%%
Rules
%%
User code
Definitions: Contains declarations of definitions and start conditions. It can
also contain code that is included verbatim at the top of the output as
declarations.
Rules: Contains the rules in the form of patterns written in an extended set
of regular expressions. Each rule has an action in C code that can
return a token, reject the match or change the start condition.
User code: Copied verbatim to the output file.
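The rule/action mechanism can be sketched in Python with regular expressions. Like Flex, the sketch picks the longest match and, on ties, the earliest rule; the rule names and patterns here are illustrative, not from the OMCCp lexer.

```python
import re

# Flex-style rules: each pattern is paired with a token name; a SKIP
# "action" discards the matched text instead of returning a token.
RULES = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SKIP",   r"[ \t]+"),
]

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        # Try every rule at the current position; keep the longest match
        # (max returns the first maximal item, so ties favour earlier rules).
        name, m = max(
            ((n, re.compile(p).match(text, pos)) for n, p in RULES),
            key=lambda r: r[1].end() if r[1] else pos,
        )
        if m is None or m.end() == pos:
            raise SyntaxError("invalid character %r" % text[pos])
        if name != "SKIP":
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```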
3.2.2
Output file lexer.c
The output file lexer.c generated by FLEX contains three main sections:
the declaration of variables and arrays, the algorithm that runs the DFA,
and the return-action section with the actions that have been specified for
each rule.

[3] http://www.openmodelica.org/index.php/research/master-theses
[4] http://www.openmodelica.org/index.php/research/phd-and-licentiate-theses
[5] http://www.openmodelica.org/index.php/research/conference-papers
[6] http://www.openmodelica.org/index.php/research/journal-papers
[7] http://www.openmodelica.org/index.php/research/booksproceedings
The arrays that are present in the lexer are:
yyec: matches any UTF-8 code with a start condition.
yyaccept: checks the states against the accept condition.
yyacclist: once accepted, the action for each state is found here.
yymeta: control array for the transitions.
yybase: control array for the transitions.
yydef: default transition for the states.
yynxt: determines the next transition of the states.
yychk: control array that verifies errors.
FLEX is designed to handle a large number of rules and tokens. It
simplifies the number of rules and tokens used by the parser in the next
phase of the compiler. That is why it is common to find FLEX combined
with other parser generators such as the tool called Yet Another
Compiler-Compiler (YACC) or its successor, GNU Bison.
For a complete reference on FLEX, the FLEX manual by Paxson [2002]
is a good source of information.
3.3
GNU Bison
GNU Bison is a parser generator that generates an LALR(1) parser from a
context-free grammar. The generated parser can be in one of three
languages: C, C++ or Java. Bison is based on the tool called YACC.
GNU Bison receives as input a file with the grammar rules. This
grammar file is specified using BNF. The output of the process is a parser
written in C that communicates with a lexer, commonly written in LEX or
FLEX.
In this section we explain these input and output files in detail and cover
some other details about GNU Bison that will aid understanding of the
project implementation presented in the next chapter.
3.3.1
Input file parser.y
There are four sections in a grammar file: prologue, Bison declarations,
grammar rules and epilogue, distributed as presented in Listing 3.3.
Listing 3.3: Bison file structure
%{
  Prologue
%}
Bison declarations
%%
Grammar rules
result:  rule1-components ...
       | rule2-components ...
       ...
       ;
%%
Epilogue