
Ghoul: A cache-friendly programming language



Adam Temmel

MID SWEDEN UNIVERSITY
Department of Information Systems and Technology (IST)
Type of document: Computer Engineering BA (C), Final Project
Main field of study: Computer Engineering
Credits: 15 hp
Semester, year: VT, 2020
Supervisor: Martin Kjellqvist, martin.kjellqvist@miun.se
Examiner: Patrik Österberg, patrik.osterberg@miun.se


Abstract

Performance has historically always been of importance to computing, and as such, processor developers have brought up several different methods to squeeze out more processing power from the processor. One of these concepts is the presence of a CPU cache memory, whose responsibility is to hold data the processor expects it might use soon. Utilizing the cache well means that the processor can compute data at a much higher rate, resulting in a direct impact on performance. Therefore, it follows that it is in the developer's best interest to write code capable of utilizing the cache memory to its full extent. This is not always an easy task, however, as the patterns and style of programming the developer may need to adapt to can come off as cumbersome.

This study will explore the possibilities of merging cache-friendly programming concepts with a developer-friendly syntax, resulting in a language that is both readable and writeable as well as efficient in regards to the processor cache. In order to accomplish this task, studies of memory access patterns, existing programming languages and compiler design have been performed.

The end product is a language called Ghoul which successfully implements cache-friendly concepts on a syntactic level, complete with a working compiler. Outputs from this compiler were later benchmarked to assert that the concepts introduced had a measurable impact on the performance of programs written in Ghoul, showing that the aforementioned syntactical concepts indeed directly influence the speed at which data can be processed.


Sammanfattning

Performance has historically always been of importance to the use of computers, which has led processor developers to devise several different methods to squeeze more processing power out of the processor. One of these concepts is the processor's cache memory, which is responsible for storing data the processor expects to need in the near future. If the cache memory is utilized well, the processor can process data at a much faster rate, which directly affects performance. Because of this, developers want to write code that utilizes the cache memory to its full extent. This is not always an easy task, as the programming patterns and behaviours the developer must adapt to can be considered clumsy.

This study will explore the possibilities of merging cache-friendly programming concepts with developer-friendly syntax, resulting in a programming language that is readable and writeable as well as efficient with regard to the processor's cache memory. To succeed with this task, studies of memory access patterns, existing programming languages and compiler design have been performed.

The end product is a language named Ghoul which implements cache-friendly concepts on a syntactic level, complete with a working compiler. Output from this compiler was later performance-tested to determine whether the concepts the language introduces have a noticeable impact on the performance of programs written in the language. The tests showed that the aforementioned concepts directly influence the speed at which data can be processed in the language.


Acknowledgements

I would like to thank my classmates and my supervisor for partaking in my various discussions about programming languages and their implementations, for evaluating and discussing potential syntax choices and for constantly providing me with well-nuanced food for thought.


Abstract ii
Acknowledgements iii
List of Figures vi
List of Tables vii
Abbreviations viii
1 Introduction 1
1.1 Background and problem motivation . . . 1
1.2 Overall aim . . . 1
1.3 Concrete and verifiable goals . . . 1
1.4 Scope . . . 2
1.5 Outline . . . 2
2 Theory 3
2.1 CPU Cache . . . 3
2.2 Memory layout . . . 3
2.3 Compiler . . . 4
2.3.1 Lexer . . . 4
2.3.2 Parser . . . 6
2.3.3 Abstract Syntax Tree . . . 6
2.3.4 Semantic analysis . . . 7
2.3.5 Optimization . . . 7
2.3.6 Intermediate representation . . . 7
2.3.7 Linker . . . 7
2.4 LLVM . . . 8
2.5 Language evaluation criteria . . . 8
3.1.1 Ordering of struct members . . . 12
3.1.2 Memory layout for continuous containers . . . 13
3.2 Visitor pattern . . . 13
3.3 Evaluation . . . 13
4 Design 15
5 Construction 16
5.1 The pipeline in detail . . . 16
5.1.1 Lexer . . . 16
5.1.2 Parser . . . 19
5.1.3 Semantic Analysis . . . 25
5.1.4 IR generation . . . 25
5.2 Function definitions . . . 26
5.3 Function calls . . . 28
5.4 Literals . . . 28
5.5 Binary operators . . . 29
5.6 Variables . . . 30
5.7 Assignment . . . 30
5.8 Unary operators . . . 31
5.9 Conditional statements . . . 32
5.10 Iterations . . . 33
5.11 Groupings of data . . . 34
5.12 Arrays . . . 35
5.13 Memory layout . . . 38
6 Result 39
7 Evaluation and Discussion 44
7.1 Ethical issues . . . 45

List of Figures

2 An example of the SoA memory layout. . . 4

3 A flowchart of a hypothetical compiler pipeline. . . 5

4 An example of a basic abstract syntax tree. . . 7

5 An example of how ordering of struct members can impact the total struct size. . . 12

6 An example of the visitor pattern in action. . . 14

7 An example of Tokens represented as enums. . . 17

8 An example of how to represent valid tokens as strings. . . 18

9 An example of a valid token structure. . . 19

10 An overview of the most important parts of an easily extensible lexer. . . 20

11 Another example of an abstract syntax tree. . . 21

12 An example of a base class for AST nodes. . . 21

13 Basic helper functions for the parser. . . 22

14 Basic parsing example. . . 24

15 Parse flow for a function definition. . . 27

16 Parse flow for a function call. . . 28

17 Parse flow for a binary expression. . . 29

18 Parse flow for a variable declaration. . . 30

19 Parse flow for a unary expression. . . 31

20 Parse flow for a conditional statement. . . 32

21 Parse flow for an iteration. . . 33

22 Parse flow for a structure declaration. . . 34

23 Parse flow for accessing a member. . . 35

24 Parse flow for defining an array. . . 36

25 Parse flow for querying the size of an array. . . 36

26 Parse flow for defining a push operation. . . 36

27 Parse flow for defining a pop operation. . . 36

28 Parse flow for defining a free operation. . . 36

29 Parse flow for indexing an array. . . 36

30 Parse flow for declaring an array. . . 38

31 Results of the benchmark measuring array iteration speed when only accessing a single struct member. . . 39

32 Results of the benchmark measuring array iteration speed when accessing multiple members. . . 40

33 Source code taken from the test regarding rearrangement of struct members, partnered with the generated LLVM IR. . 41

34 Code written in Ghoul . . . 42


Abbreviations

RAM Random Access Memory
CPU Central Processing Unit
AoS Array of Structs (first encountered in chapter 2.2)
SoA Struct of Arrays (first encountered in chapter 2.2)
AST Abstract Syntax Tree (first encountered in chapter 2.3.3)


1 Introduction

1.1 Background and problem motivation

Since the very advent of computing, we have used it to accomplish tasks we humans consider either too complicated, monotonous or just plain tedious for us to perform. One such task is the task of processing large amounts of data. A human could find the task tedious, whereas a computer, once instructed to process the data, is theoretically able to perform that task until it or its environment breaks.

It then follows that a very powerful tool in the programmer's arsenal is the ability to iterate over data. This concept can be expressed differently between different programming languages, but is perhaps most commonly known as an iteration (or, more informally, a loop), such as a for loop, while loop or a for-each loop. These constructs allow the programmer to efficiently implement one or several operations on a set of data. Given that these constructs are so frequent within many programs and algorithms, they are also a good target for optimization. Modern hardware is often built with this in mind, an example of which being the presence of a cache memory within the processor, see chapter 2.1. Not all written code takes advantage of said cache memory, however, leading to potentially lost performance. This is sometimes avoidable, at the cost of restructuring the code slightly, see chapter 2.2.

1.2 Overall aim

The overall aim of this project is to discuss and create the design of a language that introduces concepts that make it easier for the developer to write cache-friendly code, as well as to create a working compiler for said language implemented in C++.

1.3 Concrete and verifiable goals

Once the project has reached its end, the following goals are meant to be fulfilled:

1. Syntax for a new programming language that strikes a good balance between readability, see chapter 2.5.1, and writeability, see chapter 2.5.2.


3. The language should provide one or several methods of easily expressing cache-friendly programming concepts.

4. To verify the efficiency of these methods, they should be benchmarked to assure that they impact performance as intended.

1.4 Scope

In order to limit the scope of this thesis, the resulting compiler will not be able to perform optimizations. As such, no performance comparison will be made between the Ghoul language and an already existing language, as the comparison would not be performed on equal terms.

1.5 Outline


2 Theory

2.1 CPU Cache

The modern processor consists of several different subcomponents which all have their respective purposes within the grander scheme of things. One of these components is the cache memory, whose purpose is to act as a smaller version of the computer's actual RAM memory. The reason why modern processors are designed to store a subset of the RAM memory on the actual processor is that one of the main bottlenecks for the processor is not the rate at which it can operate on data or instructions, but rather the rate at which it can fetch them[2].

A solution to this problem is that the processor tries to "guess" which data might be next in line to get processed and stores said data within the cache memory. The next time the processor needs to operate on a specific part of the memory, it then checks to see whether this data can be found inside the cache memory, and if so, skips the long journey to the RAM memory, saving time and thereby improving performance.

2.2 Memory layout

Intel specifies two distinct methods of defining a data layout within the RAM memory in an article posted in 2012[3]; either of the two may be more or less advantageous depending on the underlying use case. These methods are as follows:

• Array of Structs (AoS): This is perhaps the more straightforward way of modelling a problem. A struct is defined, and an array is created to contain instances of the struct.

struct Color {
    char r;
    char g;
    char b;
    char a;
};

Color colors[N];

Figure 1: An example of the AoS memory layout.

• Struct of Arrays (SoA): This method may improve the cache behaviour of the resulting program[3]. The steps are here reversed, as an array for each struct member is defined inside the struct:

struct ColorArray {
    char r[N];
    char g[N];
    char b[N];
    char a[N];
};

ColorArray colorArray;

Figure 2: An example of the SoA memory layout.

One of the reasons the memory layout impacts performance is that certain modern processors implement prefetching in some way. Prefetching means that the processor studies the current data access pattern and thereby tries to predict the next memory cells that are of interest to the program. This prediction performs better the less sporadic the developer's memory accesses are. If the access is done linearly, such as via an iteration, the processor is more likely to identify the access pattern and optimize the fetching of data thereafter[4].
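To make this concrete, the following is a minimal sketch (not taken from the thesis) of the kind of linear, member-wise traversal a prefetcher predicts well, assuming the ColorArray layout from figure 2 and a hypothetical element count N:

#include <cstddef>

constexpr std::size_t N = 1024;

struct ColorArray {
    char r[N];
    char g[N];
    char b[N];
    char a[N];
};

// Summing a single member touches consecutive addresses only, so the
// processor can fetch the next cache lines ahead of time.
long sumRed(const ColorArray& colors) {
    long sum = 0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += colors.r[i];
    }
    return sum;
}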

2.3 Compiler

A compiler is a program responsible for converting one programming language into another language. Generally speaking, for this conversion to be advantageous, the origin language is of a higher level than the targeted language. It is common, but not necessary, for the targeted language to be something very close to hardware, such as an assembly language, object code, or machine code, in order to create an executable[5].

The compilation process can be viewed as a pipeline of sorts, with each step of the process feeding its outputs as inputs to the following step, see figure 3. Although the pipeline may look intimidating at first, it is definitely possible to understand given enough study. As with most software, the majority of the core components of a compiler can be derived from breaking down what operations need to be done, and in which order. Besides this, there are a few concepts mostly specific to compiler development that vastly improve the quality of the resulting pipeline.

2.3.1 Lexer

[Figure 3 shows a flowchart of the pipeline: Source Code → Lexer → Parser → Semantic analysis → Optimization (optional) → Intermediate representation generator → Target code generator → Linker → Executable, with the artifacts passed between the stages being file(s) to read, tokens, abstract syntax tree, symbol table, intermediate representation and object file(s).]

Figure 3: A flowchart of a hypothetical compiler pipeline.


value = 2 + function();

By considering the entire string to be an input to the lexer, the lexer is then supposed to extract some sort of context from it by generating a sequence of tokens. For the C language, that would be something akin to table 1.

String     Token
value      identifier
=          equals sign
2          integer literal
+          plus sign
function   identifier
(          opening parens
)          closing parens
;          semicolon

Table 1: Table exemplifying some valid tokens in the C language.

It is also worth noting that if a lexer were given a string it cannot convert into a valid token, the lexer needs to report this error somehow, thus aborting the compilation process. As stated in 2.3, the compilation process can be viewed as a pipeline, and if a pipeline breaks down in the middle, there is arguably not much use in trying to extract output from it.

2.3.2 Parser

A parser is responsible for attempting to parse the series of tokens into something even more usable, thereby also partially verifying the validity of the written code. Imagine if we tried to write some more C code, but instead ended up with this:

value = 2 + ;

Although this is a perfectly valid series of tokens on their own, they do not necessarily make perfect sense in the grander scheme of things. In the C language, + is considered either a unary prefix operator or a binary operator, and placing it at the end of an expression like this does not satisfy either of these conditions. (Citation needed) Errors like these are caught in the parsing process, and if no errors were caught, the parser is free to produce an abstract syntax tree as an output.

2.3.3 Abstract Syntax Tree

The abstract syntax tree for the earlier example, value = 2 + function(), is as follows:

=
├── value
└── +
    ├── 2
    └── function()

Figure 4: An example of a basic abstract syntax tree.

Partnered with some programming patterns, this structure can simplify analyzing and performing operations on the given code. This particular construct will be further discussed in chapter 5.1.2.

2.3.4 Semantic analysis

Semantic analysis refers to the process of studying whether a successfully parsed program has some sort of meaning or not. Examples of semantic analysis checks include checking whether a variable has already been declared or not, checking to see if a function call matches a function signature, and so forth. As a byproduct of this process, a table of the program's datatypes, function definitions and variable definitions (sometimes referred to as a symbol table) can be extracted.

2.3.5 Optimization

Optimization in this context refers to the ability to output well-optimized code as a part of the compilation process. It is, however, considered a bit out of scope for the purposes of this thesis, so it will only be mentioned as an optional step within the compilation process.

2.3.6 Intermediate representation

An intermediate representation (sometimes abbreviated as IR) is essentially code that closely resembles some sort of assembly. The intermediate representation is later translated into actual machine code for the target architecture and ultimately further processed into some sort of binary object file[5].

2.3.7 Linker

This thesis will not discuss the construction of an entire linker, but will instead piggyback on an existing linker to accomplish the same goals.

2.4 LLVM

LLVM began as a research project to investigate compilation techniques, but has since grown into an umbrella project spanning several subprojects. Today, the core project of LLVM is a set of libraries designed to assist with the design of different aspects of a compiler, such as generation of an intermediate representation (conveniently named LLVM IR) and optimization of said IR[6].

2.5 Language evaluation criteria

In order to compare languages to each other, a set of criteria to evaluate different aspects of languages is highly useful. The criteria that this thesis will discuss are as follows.

2.5.1 Readability

Readability specifies how easy a language is to read and understand. From a more historical standpoint, readability has not always been a point of interest for language developers. The first languages were in fact primarily written and designed to benefit the computer, not the developer[5]. Over time, concepts such as the software life-cycle came to fruition, meaning that the value of writing code in a language with high readability grew[5]. An often overlooked aspect of programming is that code is read far more often than it is written[7], which only makes the concept of readability even more attractive.


the static keyword has an entirely different meaning than if one were to declare a local variable with the static keyword. In terms of orthogonality, this is an issue, as parts of the language's constructs are now context dependent. This is a problem, as the reader now has to successfully recognize the context of the keyword and interpret its meaning based on said context. In order to solve this problem, one potential solution would be to introduce more keywords to describe the different effects of static in all of the available contexts.

2.5.2 Writeability

In somewhat of a contrast to readability, writeability refers to how easily a developer's thoughts, ideas, concepts and algorithms can be expressed in a given language[5]. A concept that increases a language's writeability is the ability to perform operator overloads. This feature means that the developer is allowed to define as well as redefine operations between different data types in the language. This feature is particularly useful for developers who need to perform linear algebra within their software, such as for 3D math, as it allows matrix multiplication and similar operations to be written so that they syntactically resemble their actual representation in the math world. The alternative would have been to write what risks becoming unnecessarily verbosely named functions to do the same work.
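As a brief illustration of the concept, consider the hypothetical 2x2 matrix type below (a sketch, not taken from the thesis); overloading operator* lets a matrix product read like its mathematical notation:

#include <array>

struct Mat2 {
    std::array<float, 4> m; // Row-major: m[0] m[1] / m[2] m[3]
};

// The product a * b now mirrors the mathematical notation, rather than
// a verbosely named function such as multiplyMatrices(a, b).
Mat2 operator*(const Mat2& a, const Mat2& b) {
    return Mat2{{
        a.m[0] * b.m[0] + a.m[1] * b.m[2], a.m[0] * b.m[1] + a.m[1] * b.m[3],
        a.m[2] * b.m[0] + a.m[3] * b.m[2], a.m[2] * b.m[1] + a.m[3] * b.m[3],
    }};
}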

A great risk with introducing operator overloading to a language is that it allows developers to associate operations with operators in a completely nonsensical fashion. Abusing the system like this can significantly hurt the readability of a language, meaning that these two interests often, but not necessarily always, tend to clash with each other.

2.5.3 Reliability

Reliability describes how easy it is to develop reliable, safe and error-free programs within a language[5]. A very common trait of a reliable language is the presence of type checking, either at compile time, at runtime or, in some cases, both. This allows the language to support the developer by ensuring that type usage is consistent within the entire program.

2.5.4 Cost


• The cost of training developers to use the language. A simpler language is often easier to teach, which means that the potential cost of the language decreases.

• The cost of writing programs in the language. High writeability means that programs can be written faster, thus decreasing cost.

• The cost of compiling a program. If the language has a large amount of features or a very large ruleset, the compilation time of a language may suffer. In contrast, a very simple language is most likely easier to pass through the compilation pipeline. Naturally, this aspect of the language can be improved by writing a faster compiler.

• The cost of executing a program. If a language inherently performs a lot of runtime checks, the performance of the resulting program will most likely be impacted negatively as a consequence. Compiled languages are often able to trade compilation time for execution speed by performing optimizations as a part of the compile step. By optimizing the code, the cost of execution can thus be lowered at the expense of compilation time.

• The cost of maintaining a program. This includes the cost of fixing errors in the program as well as introducing new functionality to the program. Seeing as maintenance is not always done by the original author of the program, having a language with high readability may help in this regard.

2.6 Languages

2.6.1 C

The C programming language spawned near the beginning of the 1970s as the primary language of development for the Unix operating system. C has had a large impact on how modern languages are written, as many other languages in some way, shape or form imitate various syntactical concepts of the C language. As a product of its age, C is today considered a very low-level language, meaning that it has a fairly low writeability compared to most modern languages. That said, C is still widely used today, partially due to how well the language operates with the hardware[8][9].

2.6.2 C++


variety of abstractions[10][11]. While this legion of possibilities for abstractions greatly improves the writeability of the language, the readability of the language risks suffering as a result. Likewise, the power to create abstractions can be misused by creating poor abstractions or through misuse of otherwise good abstractions. Due to this, several well-known software developers, such as Linus Torvalds, creator of the Linux kernel, have come to criticize the language for its complexity[12]. Regardless, C++ remains a prominent language in the modern software industry, as can be seen in the 2019 Stack Overflow developer survey, where over 23% of the respondents wrote code in C++[9].

2.6.3 Go


3 Method

3.1 Cache-friendly concepts

Ghoul is not meant to just reimplement an already existing language; it also attempts to bring new ideas to the table. Cache efficiency is the name of the game with Ghoul, and this field is explored in the following ways.

3.1.1 Ordering of struct members

Consider the C program shown in figure 5. Seeing as both Ex1 and Ex2 contain the same amount of data from the developer's point of view, it is fully logical to believe that the output of the program is the same number twice. This is not the case, however, as compilers will insert padding between members so as to keep data aligned and accesses fast[15]. The actual output of the program in figure 5 is 24 16: with the int64_t member requiring 8-byte alignment, Ex1 is laid out as 4 bytes (a) + 4 bytes padding + 8 bytes (b) + 4 bytes (c) + 4 bytes trailing padding = 24 bytes, whereas Ex2 packs into 8 + 4 + 4 = 16 bytes. The lesson to take from this is that the member ordering in Ex2 is to be preferred, as it makes the struct smaller.

#include <stdint.h>
#include <stdio.h>

struct Ex1 {
    int32_t a;
    int64_t b;
    int32_t c;
};

struct Ex2 {
    int64_t b;
    int32_t a;
    int32_t c;
};

int main() {
    printf("%zu %zu\n", /* %zu, as sizeof yields a size_t */
        sizeof(struct Ex1),
        sizeof(struct Ex2));
}

Figure 5: An example of how ordering of struct members can impact the total struct size.


3.1.2 Memory layout for continuous containers

Furthermore, Ghoul wishes to implement the concepts discussed in chapter 2.2 in a more writeable format. The process of converting parts of a codebase from an AoS layout to a SoA layout is usually far from effortless, and even then, the resulting code does not necessarily perform better after doing so. A core idea behind Ghoul is to be able to define the layout of an array upon definition, but to let the developer have an identical interface to the array regardless of its layout. By doing so, the only changes that need to be made in the codebase are type-related, as all iterations, indexations, and so forth are syntactically identical, as sketched below.
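A rough C++ analogue of the idea (a sketch under assumed names, not Ghoul's actual implementation) is a container that stores its members as SoA internally, yet exposes the same indexing interface an AoS array would:

#include <cstddef>

template <std::size_t N>
struct ColorsSoA {
    char r[N], g[N], b[N], a[N]; // One array per member (SoA)

    struct Ref { char &r, &g, &b, &a; }; // Proxy acting like one element
    Ref operator[](std::size_t i) {
        return Ref{r[i], g[i], b[i], a[i]};
    }
};

// colors[i].r reads the same whether the backing layout is AoS or SoA,
// so switching layouts only requires changing the container's type.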

3.2 Visitor pattern

The visitor programming pattern is a way to add operations to a class without modifying the actual class, and is commonly used to define a uniform interface to a node-like structure where each node shares a common base type. An abstract visitor class is first defined with one pure virtual visit function for each node type it may visit. This class is then inherited from for each operation (or visitor) the developer wishes to design. An example of this pattern is presented in figure 6.

3.3 Evaluation


#include <iostream>

struct NodeA; // Forward declarations, so that the visitor
struct NodeB; // can name the concrete node types

struct Visitor { // Abstract base visitor class
    virtual ~Visitor() = default;
    virtual void visit(NodeA& node) = 0;
    virtual void visit(NodeB& node) = 0;
};

struct Node { // Abstract base node class
    virtual ~Node() = default;
    virtual void accept(Visitor& visitor) = 0;
};

struct NodeA : public Node {
    void accept(Visitor& visitor) override {
        visitor.visit(*this);
    }
};

struct NodeB : public Node {
    void accept(Visitor& visitor) override {
        visitor.visit(*this);
    }
};

struct LogVisitor : public Visitor {
    void visit(NodeA& node) override {
        std::cout << "Visited node of type A!\n";
    }
    void visit(NodeB& node) override {
        std::cout << "Visited node of type B!\n";
    }
};

Figure 6: An example of the visitor pattern in action.
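A short usage sketch (hypothetical, not part of the original figure) shows how the same visitor walks both node types through the common Node interface:

#include <memory>
#include <vector>

int main() {
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.push_back(std::make_unique<NodeA>());
    nodes.push_back(std::make_unique<NodeB>());

    LogVisitor logger;
    for (auto& node : nodes) {
        node->accept(logger); // Dispatches to the matching visit overload
    }
}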


4 Design

The Ghoul programming language is inspired primarily by the languages in chapter 2.6. It aspires to be low-level in most regards, akin to C, but also to provide efficient abstractions for common concepts. Syntactically, it is intended to mesh the feeling of familiarity that comes from the C family together with the expressiveness and lack of verbosity that can be found in Go. Where C++ prides itself on providing several clever ways to write code, such as by having templates for type parametrization, Ghoul should ideally obviate the need for these constructions by implementing advanced structures, such as a dynamic array, as a part of the base language.


5 Construction

5.1 The pipeline in detail

In order to present the implementation of the compiler properly, the thesis will first give a further overview of the pipeline's different steps and how they are implemented, before later discussing the implementation of specific features in the language. Most features require some sort of handling of edge cases specific to that feature, but all features that are to be implemented require the same base knowledge of the pipeline. Thus, approaching this chapter in this manner should ideally present the information in a way that is comprehensible for the reader.

5.1.1 Lexer

Lexing, as previously stated in 2.3.1, is the process of breaking up a string into a series of tokens. In order to do this, the first order of operations is to define what is a token, and thereby, what is not a token. Tokens come in several different categories, such as: operators, keywords (sometimes referred to as reserved words), literals and identifiers. Operators include whatever operators the language may possess, such as arithmetical operators (+, -, *, /), logical operators (==, !=, &&, ||) and potentially also special operators unique to the language. The important distinction to make is that these tokens (generally) do not consist of alphanumeric characters. Keywords, on the contrary, are a set of strings that have separate meaning in the language. Good examples of this include the if statement or the while statement, but also things such as import and include. These tokens also have a valid distinction to them, namely that they generally only consist of letters. Two of the most basic literals are integer literals, referring to a series of digits, and string literals, which consist of a series of characters enclosed in double quotation marks. Lastly, identifiers work as names that the developer has chosen for different pieces of the code, such as when declaring a variable or a function. Identifiers are allowed to be any form of string of non-whitespace characters that does not conflict with an already existing token. That last part is important, because a compiler may become very unpredictable if one were able to declare a variable or function with the name "if", while "if" also appears as a keyword in the language.


One construct within the C++ language to assist with this mapping is enums. Using enums, it is easy to define a constant integer value for each valid type of token.

enum struct TokenType {
    // Literals and language-specific constructs
    StringLiteral,  // "Hello, World"
    IntLiteral,     // 42
    Terminator,     // Newline (\n)
    Identifier,
    // ...

    // Operators
    Add,            // +
    Subtract,       // -
    Multiply,       // *
    Divide,         // /
    Assign,         // =
    // ...

    // Keywords
    Function,       // fn
    If,             // if
    While,          // while
    For,            // for
    Struct,         // struct
    // ...

    // Keep this one last
    NTokenTypes     // We will discuss the importance
                    // of this enum later
};

Figure 7: An example of Tokens represented as enums.

But this is only one half of the equation. This is a perfectly valid way of representing tokens as integers, but there also needs to be a respective representation for the string side of the mapping. An easy way to do so is to declare a constant array with all the valid strings.

There are a few things to digest within the example in figure 8. Firstly, this construct uses array over vector because the dimensions of the construct are not meant to be changed dynamically at runtime, so by keeping it as a static array it will perform better. The size of the static array is evaluated from the last enum value in TokenType (NTokenTypes), meaning that if one were to append a new token type without adding a matching string, the two sizes would differ by 1, thusly emitting a compile-time error when trying to assign the array with an improperly sized initialization list.


#include <array>
#include <string_view> // Requires C++17

constexpr std::array<std::string_view,
    static_cast<size_t>(TokenType::NTokenTypes)>
tokenStrings {
    "",     // String literal
    "",     // Int literal
    "\n",   // Statement termination
    "",     // Identifier
    // ...

    "+",
    "-",
    "*",
    "/",
    "=",
    // ...

    "fn",
    "if",
    "while",
    "for",
    "struct",
    // ...
};

Figure 8: An example of how to represent valid tokens as strings.

Furthermore, the strings within the array are also static, and will not change during runtime, meaning they are a good candidate for being implemented as a string_view. By committing to marking all candidates for static memory as constexpr, the intention is clearer to the reader, and the code may also perform slightly better.


indices stay intact, and empty strings do not conflict with the lexing algorithm.

A prior description of a token described it as an integer with metadata. Let us discuss the implications of this statement. What kind of metadata would be convenient for a token? Further down the pipeline, the column and row of the token will be of great use when generating error messages. Certain tokens may wish to represent additional data besides their type, such as literals and identifiers. Literals will want to primarily consist of their type, but also their actual literal. The same goes for identifiers. A convenient way to represent literals and identifiers is to keep them in a string.

#include <cstddef> // size_t
#include <string>

struct Token {
    TokenType type;
    std::string value;
    size_t row;
    size_t col;
};

Figure 9: An example of a valid token structure.

To then parse strings into tokens, we employ the algorithm described in figure 10. It is not particularly efficient, but it is flexible and extensible. Adding basic tokens is as easy as adding another enum and a matching string in the array. Note that this implementation lacks certain features, such as the parsing of literals. We will discuss the specific implementation of these tokens later in an attempt to focus on the general algorithm employed.

5.1.2 Parser

The parser is the next step of the compiler pipeline. Akin to how chapter 5.1.1 started by explaining what a token was, this chapter starts off by further acquainting the reader with the concept of an abstract syntax tree. While the basics were already touched upon in chapter 2.3.3, this chapter aims to expand upon the reader's knowledge of the concept.


Data: string of characters, array of valid tokens
Result: vector of tokens

while there are characters to be read do
    create a new token T and a new string S
    while current character is a space or tab do
        skip to next character
    end
    if current character is alphanumeric then
        while current character is alphanumeric do
            seek until a nonalphanumeric character is found
        end
        append this string into S
    else
        append the current character into S
        if S is found in the array of valid tokens then
            while the string S is found in the array of valid tokens do
                T.type is the index at which S was found
                append the next character into S
            end
            append T into the vector of tokens
            go to beginning of outer loop
        else
            the token is not valid
        end
    end
    if S is found in the array of valid tokens then
        T.type is the index at which S was found
        append T into the vector of tokens
        go to beginning of outer loop
    end
    T.type is identifier
    append S into T.value
    append T into the vector of tokens
end

Figure 10: An overview of the most important parts of an easily extensible lexer.


likely prefer all nodes to share some sort of common interface.

value = 2 + function(4);

=
├── value
└── +
    ├── 2
    └── function()
        └── 4

Figure 11: Another example of an abstract syntax tree.

A very convenient way to implement this behaviour in object-oriented languages is to model it with what will most likely end up being a fairly flat class hierarchy, all rooted in an abstract base class.

#include <memory>  // std::unique_ptr
#include <utility> // std::move
#include <vector>

struct AstNode {
    using Child = std::unique_ptr<AstNode>;

    virtual ~AstNode() = default;
    void addChild(Child&& child) {
        children.push_back(std::move(child));
    }

    std::vector<Child> children;
    Token* token = nullptr;
};

Figure 12: An example of a base class for AST nodes.

By storing the child nodes in a dynamic container like vector, the amount of child nodes per parent node can easily be made dynamic, satisfying the first goal of the structure. The interface is provided by letting this functionality rest in the abstract base class, meaning that each class that inherits from this class will inherit its interface. As a final touch, the class is given a pointer to the respective token that it represents. This will prove useful later for debugging the rest of the compiler and reporting semantic errors.


building a syntax tree based upon the order of tokens found. Before we discuss the specifics of this implementation, it is a good idea to construct a few helper functions to assist in parsing the series of tokens.

#include <iostream> // std::cerr

class AstParser {
public:
    using Tokens = std::vector<Token>;

    // C-tor taking ownership of tokens
    AstParser(Tokens&& _tokens) :
        tokens(std::move(_tokens)), it(tokens.begin()) {}
private:
    Token* getIf(TokenType type) {
        if(it == tokens.end() || it->type != type) {
            return nullptr;
        }
        return &*(it++);
    }

#ifndef NDEBUG // If not not debug = if debug
    // Debug panic
    AstNode::Child panic(const char* file, int row) {
        std::cerr << "Unexpected token at " << it->row
            << ':' << it->col << '\n' << "Panic spawned in "
            << file << " at row " << row << '\n';
        return nullptr;
    }
#define unexpected() panic(__FILE__, __LINE__)
#else
    // Normal panic
    AstNode::Child panic() {
        std::cerr << "Unexpected token at "
            << it->row << ':' << it->col << '\n';
        return nullptr;
    }
#define unexpected() panic()
#endif

    Tokens tokens;
    Tokens::iterator it;
};

Figure 13: Basic helper functions for the parser.

There are a lot of small details to digest in figure 13. The constructor is fairly straightforward, initializing the class by claiming ownership of a vector of already lexed tokens. Closely following is a function bluntly named getIf.


and test the type of said token. If the token type and the expected type match, it returns a valid pointer to the token and advances to look at the following token. Otherwise, a null value is returned. This seems like a very basic function, but it will prove very useful later, as it provides an easy way to both check and access the next token, solving two problems at once.

Thereafter, the first steps of error handling inside the parser are taken. Consider the scenario of a very basic parser attempting to parse a single binary expression, such as the operation of adding two positive integers together. This expression would consist of three tokens: one integer literal, one binary operator and lastly another integer literal. Should the parser not find the last integer literal, either by running out of tokens or by finding an entirely different token altogether, the binary expression is incomplete and thereby also invalid. This means that the code written by the user is faulty. A very general description of errors like these is that the parser found an unexpected token. Seeing as this can happen at several steps of the parsing process, it is ideal to generalize the error reporting process in some way. A valid example of this can be found in the bottom half of figure 13. There, two versions of the function panic are defined. One is described as the normal version, which simply prints a message about the current token and returns a null value. The second does the same, but thanks to preprocessor macros also prints the source location where it was called from. This can be very useful when debugging the parser, but is most likely information that is wasteful for the end user. Thus, this function is contained to debug builds only. The distinction between debug builds and release builds is here made by checking the validity of the NDEBUG preprocessor definition.

Now that all the core pieces of the parser are in place, it is perhaps time to start looking at the bigger picture. To exemplify how to implement a recursive descent parser, we scale down its capabilities to just being able to build a tree of binary expressions. Although this is a massive step down from a full-scale compiler in terms of complexity, there is still enough of it to construct the recursive pieces of the parser, which arguably is the most important concept to grasp in this design.

After studying figure 14, the first thing one may notice is that the parser does not have any direct loop construction present. This is only a partial truth: although the example parser lacks instances of a for or while keyword, the function buildExpr calls buildBinExpr, which in turn calls buildExpr, recursively descending the tree. Note that returning null values is not seen as an error, but rather as material to branch upon during the parsing process. Errors are instead reported by calling unexpected.

AstNode::Child AstParser::buildExpr() {
    auto node = buildPrimaryExpr();
    if(!node) {
        return nullptr;
    }
    auto parent = buildBinExpr(node);
    if(parent) {
        return parent;
    }
    return node;
}

AstNode::Child AstParser::buildPrimaryExpr() {
    Token* token = getIf(TokenType::IntLiteral);
    if(!token) {
        return nullptr;
    }
    return std::make_unique<IntLiteralNode>(token);
}

AstNode::Child AstParser::buildBinExpr(AstNode::Child& lhs) {
    auto op = buildBinOp();
    if(!op) {
        return nullptr;
    }
    auto rhs = buildExpr();
    if(!rhs) {
        // Could not build a complete binary expression
        return unexpected();
    }
    op->addChild(std::move(lhs));
    op->addChild(std::move(rhs));
    return op;
}

AstNode::Child AstParser::buildBinOp() {
    Token* token = getIf(TokenType::Add);
    if(!token) {
        return nullptr;
    }
    return std::make_unique<BinExprNode>(token);
}

Figure 14: Basic parsing example.


5.1.3 Semantic Analysis

Semantic analysis, as mentioned in 2.3.4, is the process of analysing whether a program "makes sense" or not. This is done by traversing the abstract syntax tree (as a visitor, see chapter 3.2), building a table of which symbols appear and controlling the order that they appear in. Generally, depending on which rules and operations a language may define, the implementation of its semantic analysis will differ.

In this case, the semantic analysis was performed by "collecting" the visited types while traversing the tree. At different times during the analysis this collection of types is inspected, and if the types do not adhere to the set of rules that the language enforces, a compile-time error is emitted. Common examples include indexing a variable with a non-integer variable or literal, or mismatching the types at the left and right side of a binary expression.

The tables created during this process can easily be defined as several mappings from strings to types. If the analysis is presented with a declaration of some sort, then that declaration is inserted into the respective table. This process will also have to check if said declarations already exist in some context. Redeclaring a type or local variable as something that differs from its original declaration in either scope or type is defined as another compile-time error. A sketch of such a table follows.
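The sketch below uses hypothetical names; the thesis does not show its exact implementation. Scopes are pushed and popped as blocks are entered and left, and redeclarations within the current scope are rejected:

#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

enum class Type { Int, Bool, StringPtr }; // Assumed set of types

class SymbolTable {
public:
    SymbolTable() { enterScope(); } // Global scope
    void enterScope() { scopes.emplace_back(); }
    void leaveScope() { scopes.pop_back(); }

    // Returns false on redeclaration within the current scope
    bool declare(const std::string& name, Type type) {
        return scopes.back().emplace(name, type).second;
    }

    // Walks the scopes from innermost to outermost
    std::optional<Type> lookup(const std::string& name) const {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) {
                return found->second;
            }
        }
        return std::nullopt;
    }
private:
    std::vector<std::unordered_map<std::string, Type>> scopes;
};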

5.1.4 IR generation

LLVM supplies the user with a powerful, albeit at times overwhelming, interface to emit object files. There are three 'base' datatypes that are used to generate, manipulate and convert LLVM IR. These are:

• llvm::LLVMContext is the core object of the library and is responsible for managing most of the library's core infrastructure. If one were to use LLVM within a multithreaded context, one would also have to take care to lock this context wherever applicable[16].

• llvm::Module can be viewed as a block of data that LLVM IR will be inserted into. The module is later emitted as an object file once the IR is fully generated. This class is used to look up information that has been inserted into it at a prior point, such as function or type definitions[17].

• llvm::IRBuilder is a helper class to assist in generating (alternatively, "building") LLVM IR and inserting it into blocks within the associated module[18].


base classes that the developer should be aware of and be able to distinguish between. Note that LLVM is built using a substantial amount of inheritance, so several of these types are inherited from and made into their own separate thing as such.

• llvm::Type is a general interface for working with data types in the LLVM API[19].

• llvm::Value represents a value of some sort within the LLVM API. This could be a constant value or the current value of data allocated at runtime[20].

• llvm::GetElementPtrInst (commonly abbreviated GEP) is a multipurpose instruction used for pointer arithmetic when accessing array or struct elements. It is a very powerful tool within the LLVM API as it is quintessential to performing memory indexing of any sort[21][22].

There are several more classes within the LLVM library, some of which we will touch upon wherever relevant. For a more exhaustive list of the classes that LLVM presents, the documentation[23] is generally very thorough, although not always very descriptive. A small end-to-end sketch of how the pieces fit together follows.
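The sketch below is not the Ghoul compiler's actual code; it simply shows the three base datatypes cooperating to emit the IR of a function returning the integer 42:

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

int main() {
    llvm::LLVMContext context;              // Core library state
    llvm::Module module("ghoul", context);  // IR is inserted into this
    llvm::IRBuilder<> builder(context);     // Builds the instructions

    // Roughly equivalent to a Ghoul-like: fn main() int { return 42 }
    auto* fnType = llvm::FunctionType::get(builder.getInt32Ty(), false);
    auto* fn = llvm::Function::Create(
        fnType, llvm::Function::ExternalLinkage, "main", module);
    auto* entry = llvm::BasicBlock::Create(context, "entry", fn);
    builder.SetInsertPoint(entry);          // Generate IR into this block
    builder.CreateRet(builder.getInt32(42));

    llvm::verifyFunction(*fn);              // Sanity-check the emitted IR
    module.print(llvm::outs(), nullptr);    // Dump textual LLVM IR
}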

As an additional note, LLVM in itself is unable to convert the emitted object file into an executable. This may seem like a fairly large problem, but it can easily be solved by invoking a C compiler with the emitted file as an argument. The C compiler will then in turn translate a few function names and invoke the system linker, which then produces an executable.

5.2 Function definitions

The definition of a function is one of the more cumbersome constructs to parse. There are two optional scenarios, as is shown in figure 15: the extern keyword at the beginning and the return value type. If extern is specified, the function definition may not include a block with statements, and is instead assumed to be linked with at a later stage. By allowing this functionality, the language may interface with both the C standard library as well as other potential C libraries. The second scenario is if the user were to omit the return type, at which point it is assumed that the function will return void. If the function is not marked as extern, it should also contain a block, but the block in itself may either be empty or contain a list of statements. From a semantic point of view, the most important parts of analyzing a function definition are to ensure that it:


[Figure 15 is a syntax diagram: an optional extern keyword, the fn keyword, an identifier, a parenthesized and comma-separated list of type-identifier pairs, an optional return type, and an optional brace-enclosed, newline-separated list of statements.]

Figure 15: Parse flow for a function definition.

• Does not use undefined types as a part of its signature.

In order to construct a clever compiler, the first step in generating LLVM IR for a function definition is to visit all existing function nodes. By visiting each function definition and inserting them into the working llvm::Module, the function calls may appear in an order-independent fashion, meaning that the order in which the user declares functions does not impact the functionality of the program. In order to insert them into a module, a llvm::FunctionType is created from a return type and, optionally, the set of argument types. Using the llvm::FunctionType object and a unique string used as an identifier for the function, a llvm::Function object is created and inserted into the module.

In the case of a non-extern function definition, a function also consists of a set of statements. These statements are contained in what LLVM refers to as a llvm::BasicBlock. A block bears heavy resemblance to how a code block behaves in most programming languages. Once a block has been created, the llvm::IRBuilder needs to be reconfigured to generate IR into this block by calling any of the overloads to the llvm::IRBuilder::SetInsertPoint function.

5.3 Function calls

Parsing a function call, as one perhaps could assume, bears heavy resemblance to the process of parsing a function definition, as can be seen in figure 16. The notable part of this process is that a call may consist of none, one, or several arguments, which has to be accounted for.

[Figure 16 is a syntax diagram: an identifier followed by a parenthesized, comma-separated list of expressions.]

Figure 16: Parse flow for a function call.

When constructing the abstract syntax tree, each argument can be viewed as a child of the call node. By doing so, the semantic analysis can easily be performed by visiting each argument and "collecting" the type of the argument. If the list of types does not align with any known function definition, or if the function identifier cannot be found within the function table, the call is not valid and a compile-time error needs to be emitted. Akin to how the semantic analyzer "collects" types by visiting each parameter of a call, the IR generator "collects" llvm::Value pointers by visiting each node. This means that all expression nodes are responsible for producing values for the context that they are used in. Once the values have been collected, the function signature is extracted from the llvm::Module and is used together with the values to call one of the overloads to llvm::IRBuilder::CreateCall. A final note is that function calls are expressions by themselves, so llvm::IRBuilder::CreateCall will also produce an llvm::Value. This value needs to be "collected", just like any other value, so that calls can be used to their full extent.
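As a condensed sketch of this collection process (a hypothetical helper, assuming the callee was inserted into the module at a prior point):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

// Emits a call to an already-declared function and returns the produced
// llvm::Value so that it can be "collected" like any other expression.
llvm::Value* emitCall(llvm::IRBuilder<>& builder, llvm::Module& module,
                      llvm::StringRef name,
                      llvm::ArrayRef<llvm::Value*> args) {
    llvm::Function* callee = module.getFunction(name); // Signature lookup
    return builder.CreateCall(callee, args);
}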

5.4 Literals


These nodes are not semantically analyzed by themselves, but instead help in verifying the integrity of the other nodes. When the semantic analyzer visits these nodes, it creates an instance of the type that said literal represents, either an integer or a pointer to a char. These are in turn used to verify the validity of the surrounding expression that they appear in. Generating IR for both string and integer literals is an almost surprisingly straightforward process with LLVM. To create an integer literal, the compiler first needs to get the respective type with the appropriate length from the llvm::LLVMContext using llvm::IntegerType::getInt32Ty. Thereafter, a value can be created using llvm::ConstantInt::get so that it may later be appended to the list of collected values. By changing the type from a 32-bit integer to a 1-bit integer by instead using the llvm::IntegerType::getInt1Ty function, boolean constants can be generated in the very same fashion. String literals are a bit more costly to embed into an object file, so a good idea here is to cache all generated string literals in a table. By doing so, the compiler is prevented from generating two identical string constants. As for actually building the constant, this is easily done by calling the llvm::IRBuilder::CreateGlobalStringPtr function.
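The calls named above can be condensed into two small helpers (hypothetical names, assuming the context and builder from the earlier sketch):

#include <string>
#include <unordered_map>
#include "llvm/IR/IRBuilder.h"

// A 32-bit integer constant, later appended to the collected values.
llvm::Value* makeIntLiteral(llvm::LLVMContext& context, int value) {
    return llvm::ConstantInt::get(
        llvm::IntegerType::getInt32Ty(context), value);
}

// String literals are cached so that two identical constants are never
// embedded twice.
llvm::Value* makeStringLiteral(
        llvm::IRBuilder<>& builder,
        std::unordered_map<std::string, llvm::Value*>& cache,
        const std::string& text) {
    llvm::Value*& entry = cache[text];
    if (!entry) {
        entry = builder.CreateGlobalStringPtr(text);
    }
    return entry;
}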

5.5 Binary operators

Parsing a binary expression means that the compiler needs to identify two separate expressions separated by a binary operator.

[Figure 17 is a syntax diagram: an expression, a binary operator, and another expression.]

Figure 17: Parse flow for a binary expression.

Semantically, this can fail depending on what types these expressions have. While adding two integer expressions together poses no problem, adding an integer and a bool can come off as a bit confusing in regards to what the developer's intent is. If the expressions' types do not match each other, the semantic analysis is aborted and an error is reported. Binary operators translate well to LLVM, as the llvm::IRBuilder class has a set of functions available for this very purpose. Examples include llvm::IRBuilder::CreateAdd (addition) and llvm::IRBuilder::CreateICmpEQ (integer equality comparison).
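In a sketch, assuming lhs and rhs are previously collected llvm::Value pointers:

llvm::Value* sum = builder.CreateAdd(lhs, rhs);      // lhs + rhs
llvm::Value* equal = builder.CreateICmpEQ(lhs, rhs); // lhs == rhs, yields an i1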


5.6 Variables

Implementing variables means that two new concepts need to be parsed. First, the parser needs to understand a variable declaration, and secondly, it needs to understand when a variable is referenced. The latter is as easy as finding a single identifier token, whereas the former is a bit more complicated. Declaring a variable involves parsing a type, an identifier and, optionally, a valid expression to assign to the variable, as can be seen in figure 18.

[Figure 18 is a syntax diagram: a type and an identifier, optionally followed by an equals sign and an expression.]

Figure 18: Parse flow for a variable declaration.

In the case of declaring a variable, the compiler has to make sure that an identical identifier does not already exist in the current scope. If that is not the case, the semantic analyzer is free to insert the variable into the given table. When the analyzer encounters a variable in the context of partaking in an expression, it performs a table lookup to identify which type the variable has been declared to represent. This type is then appended to the list of types that the analyzer "collects" to verify the integrity of more complex expressions.

A variable in LLVM IR is best represented as an llvm::AllocaInst. These are created by getting the currently active IR block with llvm::IRBuilder::GetInsertBlock and then calling the appropriate llvm::AllocaInst constructor. In order to not allocate a variable multiple times, these allocation instructions should be kept in a table. Actually manipulating instances of llvm::AllocaInst is a little bit more complicated, depending on how complex said manipulation ends up being. In the base case of using the instance of a variable as an expression, a load instruction needs to be generated by calling an overload of the llvm::IRBuilder::CreateLoad function. If the use of the variable ends up being a bit more complicated, such as by accessing a member or by assigning to it, more steps will need to be taken, see 5.7 and 5.11.
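A sketch of these steps (with a hypothetical allocaTable, assuming a 32-bit integer variable and the LLVM version contemporary with the thesis):

llvm::BasicBlock* block = builder.GetInsertBlock();
llvm::AllocaInst* xAlloca = new llvm::AllocaInst(
    builder.getInt32Ty(), 0, "x", block); // 0 = default address space
allocaTable["x"] = xAlloca;               // Avoid allocating "x" twice

// Using the variable as an expression requires a load:
llvm::Value* xValue = builder.CreateLoad(builder.getInt32Ty(), xAlloca);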

5.7 Assignment


statement encounters a constant expression (such as a literal), any following expressions within the statement may not consist of any assignment operations. The rationale behind this is that, from a mathematical standpoint, there is not much sense to be found in an expression akin to 4 = 2.

When generating IR for an assignment operation, it closely follows the steps taken in generating any other binary operation, but with a twist. As stated in 5.6, when the compiler visits a variable, it loads the variable into a value and collects it. As LLVM does not support the operation of assigning to a llvm::Value, the compiler also needs to collect the unloaded version of the variable. Thus, whenever a variable is visited, before the llvm::AllocaInst is loaded, the allocation instruction in itself is also collected, preferably in an entirely different container altogether. This container is later accessed in order to perform the assignment, as an assignment is a perfectly valid operation between a llvm::AllocaInst as a left-side operand and a llvm::Value as a right-side operand. This operation is generated by calling the llvm::IRBuilder::CreateStore function.
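Continuing the variable sketch above, generating the assignment x = value stores into the collected allocation instruction rather than into the loaded value:

builder.CreateStore(value, allocaTable["x"]); // value is a llvm::Value*,
                                              // the target a llvm::AllocaInst*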

5.8 Unary operators

Unary operators are very straightforward to parse, as can be seen in figure 19. The two primary unary operators that require specific semantic control are the address-of operator (&) and the dereference operator (*).

[Figure 19 is a syntax diagram: a unary operator followed by an expression.]

Figure 19: Parse flow for a unary expression.

When encountering the former, whichever type has been collected will be promoted into a pointer to the given type. Conversely, when encountering the latter, whichever type has been collected will instead be demoted into the underlying type that it points to. Before doing this conversion, however, a check needs to be made to see if said type actually is a pointer. You cannot dereference something that is not a pointer, after all, as it has already been dereferenced to its full extent.

Introducing these operators to the IR generation also introduces two new scenarios when visiting a variable. If a dereference operator is encountered, the value loaded from visiting the variable node needs to be loaded once more. This is performed by first calling an overload of the llvm::GetElementPtrInst::CreateInBounds function in order to index the address properly, before loading the resulting address. Encountering the address-of operator means that the exact opposite behaviour needs to be modeled. The IR generator then needs to understand whether the parent node of a variable node is an address-of operator, and if so, not perform the innermost call to the llvm::IRBuilder::CreateLoad function, so as to instead collect the underlying address that the llvm::AllocaInst represents.
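Both scenarios might be sketched as follows. This version uses the IRBuilder helper CreateInBoundsGEP in place of the raw llvm::GetElementPtrInst::CreateInBounds call named above, and the function name visitUnary is illustrative.

#include "llvm/IR/IRBuilder.h"

// Sketch of the two new scenarios when visiting a variable under a
// unary operator.
llvm::Value* visitUnary(llvm::IRBuilder<>& builder,
                        llvm::AllocaInst* var,
                        llvm::Type* pointeeTy,
                        bool isAddressOf, bool isDeref) {
    if (isAddressOf) {
        // Skip the innermost load: the allocation instruction itself is
        // the underlying address of the variable.
        return var;
    }
    // Normal case: load the variable's current value.
    llvm::Value* loaded =
        builder.CreateLoad(var->getAllocatedType(), var, "load");
    if (isDeref) {
        // The loaded value is itself a pointer; index it in bounds and
        // load once more to reach the pointed-to value.
        llvm::Value* addr = builder.CreateInBoundsGEP(
            pointeeTy, loaded, builder.getInt32(0), "addr");
        loaded = builder.CreateLoad(pointeeTy, addr, "deref");
    }
    return loaded;
}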

5.9 Conditional statements

The Ghoul programming language supports two kinds of conditional statements, the if-then statement and the if statement. As can be seen in figure 20, a regular if statement can contain any amount of statements in the block, whereas an if-then statement may contain one and only one statement.

Figure 20: Parse flow for a conditional statement (if expression { statement ... } or if expression then statement).

Besides confirming the validity of the expression in a conditional statement, the second responsibility of the semantic analyzer in regards to conditionals is to check the resulting type of the statement. By the very nature of the construct, the expression in a conditional statement needs to be evaluated to a boolean value in some way. An easy way to implement this is to only allow expressions that result in boolean values as valid expressions for a conditional statement, but Ghoul also allows integer expressions. In the case of an integer expression, the expression is modified to be a comparison between the prior expression and the value 0. In doing so, integer expressions not equal to 0 are evaluated to the boolean value true within this context, whereas integer expressions that equal 0 instead evaluate to the boolean value false. Modifying the abstract syntax tree after it has already been built like this is a convenient way to design additional functionality within the compiler.
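At the IR level, the effect of this rewrite is a comparison against zero, as sketched below; the helper name ensureBool is an assumption made for the example.

#include "llvm/IR/IRBuilder.h"

// Sketch: coerce a condition value to a boolean (i1). Integer
// expressions are compared against 0, mirroring the AST rewrite
// described above.
llvm::Value* ensureBool(llvm::IRBuilder<>& builder, llvm::Value* cond) {
    if (cond->getType()->isIntegerTy(1))
        return cond; // already a boolean
    // expr != 0 evaluates to true, expr == 0 evaluates to false.
    llvm::Value* zero = llvm::ConstantInt::get(cond->getType(), 0);
    return builder.CreateICmpNE(cond, zero, "tobool");
}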

Generating IR for conditionals involves the creation of additional instances of the llvm::BasicBlock class. After creating two new blocks by calling llvm::BasicBlock::Create, one for the start and one for the end of the conditional, a conditional branch instruction is generated via llvm::IRBuilder::CreateCondBr, which directs the program, depending on the value of the llvm::Value provided in the call, to either the first or the second block. After generating all statements that the conditional node contains, the branches need to be joined again, which is performed by generating a final branch instruction via a call to llvm::IRBuilder::CreateBr, forcing the conditional branch to jump to the ending block. Once the execution flow has converged, the conditional statement is complete.
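Putting the pieces together, the block plumbing might look like the following sketch, where emitBody stands in for the compiler's statement generation and emitIf is an illustrative name.

#include <functional>
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"

// Sketch of the block layout for `if cond { ... }`.
void emitIf(llvm::IRBuilder<>& builder, llvm::Value* cond,
            const std::function<void()>& emitBody) {
    llvm::Function* fn = builder.GetInsertBlock()->getParent();
    llvm::LLVMContext& ctx = builder.getContext();

    // One block for the start of the conditional, one for the end.
    llvm::BasicBlock* thenBB = llvm::BasicBlock::Create(ctx, "then", fn);
    llvm::BasicBlock* endBB  = llvm::BasicBlock::Create(ctx, "endif", fn);

    // Branch into the body or past it depending on the condition value.
    builder.CreateCondBr(cond, thenBB, endBB);

    // Generate the statements the conditional node contains.
    builder.SetInsertPoint(thenBB);
    emitBody();
    // Join the branches again: force a jump to the ending block.
    builder.CreateBr(endBB);

    builder.SetInsertPoint(endBB);
}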

5.10 Iterations

Just like with conditionals, iterations in the Ghoul language may appear in two forms. The first form is a more conventional while-loop, and the second is the slightly more involved for-loop. Figure 21 shows that the only actual difference between the two from a syntactical standpoint is that a for-loop allows for an optional variable declaration as well as an optional end-of-loop operation to occur, whereas the standard while-loop only allows for the conditional expression.

Figure 21: Parse flow for an iteration (while expression or for declaration ; expression ; expression, followed by { statement ... }).

As in chapter 5.9, the only additional step for the semantic analyzer regarding iterations is to verify that the conditional expression can be evaluated to a boolean value.

Generating IR for an iteration involves the creation of three new blocks: one conditional block, to evaluate the conditional expression and branch out depending on the value of the expression, one block for the body of the iteration and one final block for the two branches to converge on. These blocks are constructed and connected just like in chapter 5.9. The distinction here is that at the end of the iteration body block, it branches back to the block responsible for evaluating the loop condition. When constructing an iteration with a declaration, the declaration is generated before the construction of the iteration blocks. If the iteration has an end-of-loop operation, this operation is performed in a separate block squeezed between the convergence block and the iteration body block. The iteration body block is instead redirected to branch into the end-of-loop block, which in turn branches back to the evaluation block.
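The three-block layout for the simpler while-loop case might be sketched as follows; emitCond and emitBody are placeholders for the compiler's expression and statement generation, and emitWhile is an illustrative name.

#include <functional>
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"

// Sketch of the three-block layout for a while-loop.
void emitWhile(llvm::IRBuilder<>& builder,
               const std::function<llvm::Value*()>& emitCond,
               const std::function<void()>& emitBody) {
    llvm::Function* fn = builder.GetInsertBlock()->getParent();
    llvm::LLVMContext& ctx = builder.getContext();

    llvm::BasicBlock* condBB = llvm::BasicBlock::Create(ctx, "loop.cond", fn);
    llvm::BasicBlock* bodyBB = llvm::BasicBlock::Create(ctx, "loop.body", fn);
    llvm::BasicBlock* endBB  = llvm::BasicBlock::Create(ctx, "loop.end", fn);

    // Fall into the conditional block, which evaluates the expression
    // and branches out depending on its value.
    builder.CreateBr(condBB);
    builder.SetInsertPoint(condBB);
    builder.CreateCondBr(emitCond(), bodyBB, endBB);

    // The body block branches back to the evaluation block.
    builder.SetInsertPoint(bodyBB);
    emitBody();
    builder.CreateBr(condBB);

    builder.SetInsertPoint(endBB);
}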

5.11 Groupings of data

Figure 22 shows that there is only one special case to account for when parsing a struct definition, which is whether the struct is being defined as volatile or not. Each structure member is then parsed as a pair of types and identifiers, much like variable declarations, until the end of the struct body. Accessing the member of a struct variable can prove to be a little bit more complex, as the variable parsing may need to be untangled slightly to account for the case that it tries to access a member, and potentially, a member's member.

Figure 22: Parse flow for a structure declaration (optional volatile, then struct identifier { type identifier ... }).

During semantic analysis, each struct definition is inserted into a datatype table that contains all of the available datatypes. Once all available datatypes have been inserted into the table, each datatype not marked as volatile can be optimized by ordering the members based on their size, all according to the concepts presented in chapter 3.1.1.
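A sketch of this reordering, under the assumption that the members of a struct are held as a vector of LLVM types, is given below; orderMembersBySize is an illustrative name.

#include <algorithm>
#include <vector>
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"

// Sketch: order the members of a non-volatile struct from largest to
// smallest allocation size before the LLVM struct body is set, in the
// spirit of chapter 3.1.1.
void orderMembersBySize(std::vector<llvm::Type*>& members,
                        const llvm::DataLayout& layout) {
    std::stable_sort(members.begin(), members.end(),
        [&](llvm::Type* a, llvm::Type* b) {
            return layout.getTypeAllocSize(a) > layout.getTypeAllocSize(b);
        });
}

Since member accesses are later generated as GEP indices, the compiler must also remember the resulting permutation so that each member name still maps to the correct index after reordering.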

Figure 23: Parse flow for accessing a member (identifier . identifier).

When later visiting an expression that accesses a member, the datatype table is consulted to access its member table. If the member the expression specifies does not exist, a compile-time error is reported, otherwise the member’s type is collected by the analyzer.

Once the semantic analysis is complete, it is time to generate valid IR for this construct. Custom datatypes in LLVM are created and inserted into their appropriate llvm::Module by calling llvm::StructType::create, whereas their struct bodies are defined by later calling llvm::StructType::setBody. Accessing a member of a struct can be seen as indexing memory, which in turn is performed by constructing GEP instructions that extract the specific memory cell that the member represents, before loading said cell once more.
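Both steps might look like the sketch below. The member layout (a char pointer followed by an int) mirrors the Object example in chapter 6; the function name structExample is illustrative.

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"

// Sketch: create a named struct type, define its body, and load one
// member of a stack-allocated instance through a GEP instruction.
llvm::Value* structExample(llvm::IRBuilder<>& builder,
                           llvm::LLVMContext& ctx) {
    // Create the type and define its body.
    llvm::StructType* objectTy = llvm::StructType::create(ctx, "Object");
    objectTy->setBody({builder.getInt8PtrTy(),  // char pointer, 64 bits
                       builder.getInt32Ty()});  // int, 32 bits

    llvm::AllocaInst* obj = builder.CreateAlloca(objectTy, nullptr, "obj");

    // Index the memory cell of member 1, then load said cell.
    llvm::Value* memberAddr =
        builder.CreateStructGEP(objectTy, obj, 1, "member.addr");
    return builder.CreateLoad(builder.getInt32Ty(), memberAddr, "member");
}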

5.12 Arrays

Parsing arrays means that the compiler needs to parse a variety of different constructs. The developer is meant to be able to create an array, free an array, push elements to the back of the array, pop elements from the back of the array, as well as query the current size of the array.

A convenient aspect of this is that, since the compiler is already able to parse a lot of different constructs, each array-related construct can be classified as an expression that has already been parsed in a previous chapter. The size check in figure 25, the pop operation in figure 27, the free operation in figure 28 and the index operation in figure 29 are effectively all unary expressions, whereas the push operation in figure 26 is a binary expression.

Figure 24: Parse flow for defining an array ([ expression ] type).

Figure 25: Parse flow for querying the size of an array (expression ?).

Figure 26: Parse flow for defining a push operation (expression <- expression).

Figure 27: Parse flow for defining a pop operation (expression ->).

Figure 28: Parse flow for defining a free operation (~ expression).

Figure 29: Parse flow for indexing an array (expression [ expression ]).

The main responsibility of the semantic analyzer here is to verify that the expression used to define or index an array evaluates to an integer expression, or that too ends up being an ill-defined operation. Ghoul defines a regular array as a struct consisting of a pointer to the data stored, an integer that contains the current size of the array, as well as an integer that contains the current capacity of the array. The array definition is done by calling malloc from the C standard library. The argument to malloc is computed by performing a multiplication instruction between the integer expression and the size of the type. Said size can be accessed via the data layout, which in turn is accessed with llvm::Module::getDataLayout. Once the data layout is acquired, llvm::DataLayout::getTypeAllocSize can be used to query the size of a type. In the case that the integer expression is omitted from the array definition, the size and capacity are both set to 0, whereas the pointer is set to null by assigning it to a value generated from a call to llvm::ConstantPointerNull::get. Querying the size of an array is as easy as indexing into the size member of the array struct and then loading it as a value, much like what was done in chapter 5.11. Popping an array is almost as easy, as the size first needs to be loaded so that it can be decremented by one before getting stored as the new array size. To index an array means that the address first needs to be loaded, before a GEP can be performed to pinpoint the specific memory address to index. The address generated from the GEP is then loaded as a value.
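The malloc sizing described above might be sketched as follows; emitArrayAlloc is an illustrative name, and the element type and count are assumed to come from earlier stages of the compiler.

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

// Sketch: compute the argument to malloc for `[n] type` and emit the call.
llvm::Value* emitArrayAlloc(llvm::IRBuilder<>& builder,
                            llvm::Module& module,
                            llvm::Type* elemTy,
                            llvm::Value* count /* integer expression */) {
    // Query the size of the element type through the data layout.
    const llvm::DataLayout& layout = module.getDataLayout();
    uint64_t elemSize = layout.getTypeAllocSize(elemTy);

    // Widen the count to i64 so it matches malloc's parameter.
    count = builder.CreateZExtOrTrunc(count, builder.getInt64Ty(), "count64");

    // bytes = count * sizeof(element)
    llvm::Value* bytes = builder.CreateMul(
        count, llvm::ConstantInt::get(count->getType(), elemSize), "bytes");

    // Declare (or reuse) malloc: i8* malloc(i64).
    llvm::FunctionCallee mallocFn = module.getOrInsertFunction(
        "malloc", builder.getInt8PtrTy(), builder.getInt64Ty());
    return builder.CreateCall(mallocFn, bytes, "data");
}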

The most complex operation within the array construct is most likely the operation of pushing an element into the array. There are three different outcomes to this operation:

• The array is not allocated. Memory needs to be allocated before any element can be stored.

• The array is allocated, and the current capacity is larger than the current size. The element can be safely stored at the suggested position.

• The array is allocated, and the current capacity is equal to the current size. The array needs to be reallocated in order to store another element.

To differentiate between these three scenarios, LLVM blocks are spawned in the same way as in chapter 5.9. The first block represents the case of the array being unallocated. Here, a call to malloc is done to get the initial chunk of memory. In the case of the array needing to be reallocated, a new capacity is calculated by doubling the old one, which is done by calling the llvm::IRBuilder::CreateShl function. Once the new size has been calculated, an easy way to reallocate the memory is to simply generate a call to the realloc function from C's standard library. Once the memory has been reallocated and the capacity has been updated, the size can be incremented by one and all three new values can be inserted into their respective indices in the array struct.
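The doubling-and-realloc step might be sketched like this, with emitGrow as an illustrative name and the element size assumed to come from the data layout query shown earlier.

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Module.h"

// Sketch of the reallocation case of a push: double the capacity with a
// left shift, then grow the buffer through C's realloc.
llvm::Value* emitGrow(llvm::IRBuilder<>& builder, llvm::Module& module,
                      llvm::Value* data,      // i8* to the old buffer
                      llvm::Value* capacity,  // current capacity, i64
                      uint64_t elemSize) {
    // newCapacity = capacity << 1, i.e. capacity * 2.
    llvm::Value* newCap = builder.CreateShl(
        capacity, llvm::ConstantInt::get(capacity->getType(), 1), "cap2");

    // bytes = newCapacity * sizeof(element)
    llvm::Value* bytes = builder.CreateMul(
        newCap, llvm::ConstantInt::get(newCap->getType(), elemSize), "bytes");

    // Declare (or reuse) realloc: i8* realloc(i8*, i64).
    llvm::FunctionCallee reallocFn = module.getOrInsertFunction(
        "realloc", builder.getInt8PtrTy(),
        builder.getInt8PtrTy(), builder.getInt64Ty());
    return builder.CreateCall(reallocFn, {data, bytes}, "data2");
}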

5.13 Memory layout

As presented in chapter 3.1.2, Ghoul wishes to introduce two kinds of arrays that both share the same interface towards the developer. The second array type is declared the same way as the first one, just with an '@' character between the brackets and the type.

Figure 30: Parse flow for declaring an array ([] followed by an optional @, then type identifier).

6 Result

To conclude the project, a benchmark was designed and performed to see if the cache-friendly concepts the language provides impact performance. The benchmark is fairly simple in its design, measuring the time it takes for a full iteration of a Ghoul array to be performed. The size of the data iterated upon is gradually increased, so as to get a feeling for what effect it might have on the performance. In figure 31, this iteration is performed while only accessing a single data member, whereas figure 32 shows the iteration performed while accessing several data members.

Figure 31: Results of the benchmark measuring array iteration speed when only accessing a single struct member.

Furthermore, the rearrangement of struct members mentioned in chapter 3.1.1 was successfully implemented. Figure 33 shows both the source code from the test of this feature, as well as the resulting IR. As can be seen in the IR, the two types defined on row 2 and 7 in optstruct.gh both get translated into different IR representations in rows 4-5, even though their struct bodies are identical. In the case of Object, a char pointer is recognized to be 64 bits in size, whereas an int is 32 bits. The compiler recognizes this, and swaps the two members around to optimize the size of the struct. VolatileObject looks the same, but as it was declared volatile, its members are left in their original order.

Figure 32: Results of the benchmark measuring array iteration speed when accessing multiple members.
