
Retargeting a C Compiler for a DSP Processor

Master thesis performed in electronics systems by

Henrik Antelius

LiTH-ISY-EX-3595-2004
Linköping 2004


Retargeting a C Compiler for a DSP Processor

Master thesis in electronics systems at Linköping Institute of Technology

by Henrik Antelius

LiTH-ISY-EX-3595-2004

Supervisors: Thomas Johansson, Ulrik Lindblad, Patrik Thalin


Institutionen för systemteknik, 581 83 Linköping, 2004-10-05

Language: English
Report category: Examensarbete (Master thesis)
ISRN: LITH-ISY-EX-3595-2004
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2004/3595/

Title: Retargeting a C Compiler for a DSP Processor
Swedish title: Anpassning av en C-kompilator för kodgenerering till en DSP-processor
Author: Henrik Antelius


Abstract

The purpose of this thesis is to retarget a C compiler for a DSP processor.

Developing a new compiler from scratch is a major task. Instead, modifying an existing compiler so that it generates code for another target is a common way to develop compilers for new processors. This is called retargeting.

This thesis describes how this was done with the LCC C compiler for the Motorola DSP56002 processor.


Table of contents

1 Introduction
  1.1 Background
  1.2 Purpose and goal
  1.3 The reader
  1.4 Reading guidelines

2 DSP
  2.1 Introduction
  2.2 Motorola DSP56002
    2.2.1 Data buses
    2.2.2 Address buses
    2.2.3 Data ALU
    2.2.4 Address generation unit
    2.2.5 Program control unit
  2.3 Instruction set
  2.4 Assembly

3 Compilers
  3.1 Introduction
  3.2 The analysis-synthesis model
  3.3 Phases
  3.4 Analysis
    3.4.1 Lexical analysis
    3.4.2 Syntax analysis
    3.4.3 Semantic analysis
  3.5 Synthesis
    3.5.1 Intermediate code generation
    3.5.2 Code optimization
    3.5.3 Code generation
  3.6 Symbol table
  3.7 Error handler
  3.8 Front and back end
  3.9 Environment
    3.9.1 Preprocessor
    3.9.2 Assembler
    3.9.3 Linker and loader
  3.10 Compiler tools

4 LCC
  4.1 Introduction
  4.2 C
  4.3 The compiler
    4.3.1 Lexical analysis
    4.3.2 Syntax analysis
    4.3.3 Semantic analysis
    4.3.4 Intermediate code generation
    4.3.5 Back end

5 Implementation
  5.1 Introduction
  5.2 The compiler
    5.2.1 Data types and sizes
    5.2.2 Register usage
    5.2.3 Memory usage
    5.2.4 Frame layout
    5.2.5 Calling convention
    5.2.6 Naming convention
  5.3 Retargeting
    5.3.1 Configuration
    5.3.2 Declarations
    5.3.3 Rules
    5.3.4 C code
  5.4 Special features
  5.5 Other changes to LCC
  5.6 The environment
  5.7 crt0
  5.8 Problems
    5.8.1 Register targeting
    5.8.2 48-bit registers
    5.8.3 Address registers
  5.9 Improvements

6 Conclusions
  6.1 Retargeting
  6.2 Future work

References

Appendix A: Instructions
  A.1 Arithmetic instructions
  A.2 Logical instructions
  A.3 Bit manipulation instructions
  A.4 Loop instructions
  A.5 Move instructions
  A.6 Program control instructions

Appendix B: Sample code
  B.1 sample.c
  B.2 sample.asm

Appendix C: dsp56k.md


1 Introduction

1.1 Background

The division of Electronics Systems (ES) at the Department of Electrical Engineering (ISY) at Linköping University (LiU) is currently running a project aiming at developing a DSP processor. The goal of this project is to make a DSP with a scalable structure that is instruction-level compatible with the Motorola DSP56002 processor. The scalability refers to variable data word length and the addition or removal of memories and instructions. The goal of scalability is to reduce power consumption.

Currently this project is nearly finished. In order to increase the usability of the DSP a C compiler is needed.

It was decided that the best way to create a C compiler was to retarget an existing C compiler. Creating a compiler from scratch is a big undertaking that requires a lot of work. Retargeting a compiler is a relatively easy task compared to developing an entire compiler.

1.2 Purpose and goal

The purpose of this thesis is to retarget a C compiler to the Motorola DSP56002 processor. The resulting compiler should, from one or more C source files, produce an executable file that can execute on the DSP.


The compiler must work correctly and function as intended; there are no requirements on the performance or the size of the generated code.

The compiler should also be compatible with Motorola’s C compiler and tools for the DSP56002. This makes it possible to mix generated code from the two compilers. It also means that the tools from Motorola can be used for the new compiler.

1.3 The reader

It is assumed that the reader of this thesis has basic knowledge of the C programming language and some knowledge of assembly language. It is also assumed that the reader has a general knowledge of how processors work and what function a compiler has.

1.4 Reading guidelines

This is a brief description of the chapters:

• Chapter 1 contains an introduction and states the purpose of the thesis.

• Chapter 2 describes how the DSP56002 processor works and how it can be used.

• Chapter 3 contains general compiler theory that is needed to understand how a compiler works.

• Chapter 4 describes the compiler LCC that was used in this thesis.

• Chapter 5 describes the implementation and modifications that were done to LCC.

• Chapter 6 lists the conclusions that were made and suggests further work.


2 DSP

This chapter contains a description of how the Motorola DSP56002 processor works. This information is collected from [4].

2.1 Introduction

Digital signal processing is, as the term suggests, the processing of signals by digital means. The signal is normally an electrical signal carried on a wire, but it can represent almost any kind of information and it can be processed in a wide variety of ways. Examples of digital signal processing include the following:

• Filtering of signals.
• Convolution, which is the mixing of two signals.
• Correlation, which is the comparison of two signals.
• Rectification, amplification and transformation of a signal.

All of these tasks were earlier performed using analog circuits. Nowadays integrated circuits have enough processing power to perform these and many other functions. The devices performing these tasks are called digital signal processors, or DSPs. They are specialised microprocessors with architectures designed specifically for the types of operations required in digital signal processing. Like general-purpose microprocessors, DSPs are programmable devices.


DSPs can today be found in almost all electronic areas, such as mobile phones, personal computers, digital television decoders, surround receivers, and so on. The advantages of using a DSP instead of analog circuits are many. Generally, fewer components are needed, DSPs have higher noise immunity, it is easy to change the behaviour of a filter, filters with closer tolerances can be built, and so on. Also, since the DSP is a microcomputer, the same hardware design can be used in many different areas by simply changing the software for the DSP.

2.2 Motorola DSP56002

The Motorola DSP56002 is a general-purpose DSP processor with a triple-bus Harvard architecture. This architecture can access multiple memories at the same time. It uses fixed-point arithmetic and has three function units: the data arithmetic and logic unit (data ALU), the address generation unit (AGU) and the program control unit (PCU). It also has three memories, two for data (X and Y) and one for the program (P). A block diagram of the DSP56002 can be seen in Figure 2.1.


This architecture, with multiple memories and buses, makes it possible to perform one computation in the data ALU while accessing the X and Y memories, all during one instruction cycle.

2.2.1 Data buses

The data buses consist of four 24-bit wide buses: the Y data bus (y_dbus), the X data bus (x_dbus), the program data bus (p_dbus) and the global data bus (g_dbus). They are used for moving data between the function units and the memories. Data transfers between the data ALU and the X and Y memories occur over the X and Y data buses, respectively. All other data movements occur over the global data bus, and instruction fetches occur over the program data bus.

2.2.2 Address buses

Addresses for the X data memory and the Y data memory are specified over the X address bus (x_abus) and the Y address bus (y_abus). Addresses for the program memory are specified over the P address bus (p_abus). All address buses are 16 bits wide.

2.2.3 Data ALU

The data ALU performs all of the arithmetic and logical operations on the data. It uses a register set that consists of four 24-bit input registers, two 48-bit accumulator registers and two 8-bit accumulator extension registers.

The input registers are called X0, X1, Y0 and Y1. They can also be combined into two 48-bit registers called X and Y. The two accumulators are called A and B and are 56 bits wide. Each consists of three concatenated registers, A2:A1:A0 and B2:B1:B0. A2 and B2 are the 8-bit accumulator extension registers, and they are used when more than 48-bit accuracy is needed.

The input registers are used for operands to the instructions and the accumulator registers are used for both operands and the result from instructions.

2.2.4 Address generation unit

The AGU performs all of the address storage and address calculations necessary to access the data in the memories. The AGU is divided into two identical halves, each of which has an address arithmetic unit. The register set consists of the address registers R0 – R7, the offset registers N0 – N7 and the modifier registers M0 – M7. The R-registers are used for storing addresses that are used to address the memories. The N- and M-registers are used to update the R-registers in various ways. The registers are connected: for example, only N1 and M1 can be used to update R1.

2.2.5 Program control unit

The PCU performs instruction prefetch, instruction decoding, hardware loop control and interrupt processing. It contains a 15-level system stack that is 32 bits wide and the following six registers: program counter (PC), loop address (LA), loop counter (LC), status register (SR), operating mode register (OMR) and stack pointer (SP).

2.3 Instruction set

The instruction set can be seen in Appendix A. About half of the available instructions allow the use of parallel data moves.

2.4 Assembly

The instruction syntax is organized in four columns: opcode, operands and two parallel move fields. An example of a typical assembly instruction can be seen here:

Opcode   Operands   XDB          YDB
MAC      X0,Y0,A    X:(R0)+,X0   Y:(R4)+,Y0

The opcode column specifies the operation that should be performed. The operands column specifies which operands the opcode should use. The XDB and YDB columns specify optional data transfers over the X data bus and the Y data bus. The address space qualifiers X: and Y: indicate which memory is being referenced.

This is an example of a small assembly program:

        ORG  Y:
var_a   dc   42
var_b   dc   48

        ORG  P:$40
        MOVE Y:var_a,X0
        MOVE Y:var_b,A
        ADD  X0,A
        MOVE A,Y:var_a

This program simply adds the variables var_a and var_b and stores the result in var_a.
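For reference, a C fragment with the same effect might look like the following. This is an illustration only; the function name add_vars is hypothetical, and the assembly above was written by hand rather than generated from this code.

int var_a = 42;
int var_b = 48;

void add_vars(void)
{
    /* corresponds to the ADD X0,A and the MOVE back to Y:var_a */
    var_a = var_a + var_b;
}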

This is a list of some of the features of the assembler that is used in this thesis:

• Labels: If the first character on a line is not a space or a tab it is a label. Labels are used for variables and jump destinations. A colon is often used to end the label to increase readability of the assembly.

• ORG: The ORG directive is used to indicate which memory the following statements belong to. It is also used for other memory-related purposes.

• OPT: The OPT directive is used to assign options to the assembler.

• Variables: Variables are declared with a label and the DC directive, which defines a constant.

• GLOBAL: The GLOBAL keyword is used to instruct the assembler that a variable is global.

• Comments: A semicolon is used as the comment specifier. All characters to the right of the semicolon are ignored.


3 Compilers

This chapter contains general compiler theory. Most of the information is collected from [1].

3.1 Introduction

A compiler is a program that reads a program written in one language and translates it into an equivalent program in another language. An important part of this process is to report the presence of errors in the source program to the user.

There exist thousands of compilers for different source languages and target languages, and there are also many different types of compilers. However, the basic principles of how compilers work are the same. This chapter will discuss these basic principles.

3.2 The analysis-synthesis model

There are two parts to compilation: analysis and synthesis. The analysis part breaks up the source program into its constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation.


During analysis the operations stated in the source program are determined and recorded in a hierarchical structure called a tree. Often a special kind of tree called a syntax tree is used.

In the synthesis part of the compilation the output is generated from the contents of the syntax tree. This part often also includes some sort of optimization of the generated code.

3.3 Phases

A compiler operates in phases, each of which transforms the source program from one representation to another. A typical decomposition of a compiler is shown in Figure 3.1. The following sections will discuss the different phases and how they are connected.


3.4 Analysis

The analysis consists of three phases: lexical analysis, syntax analysis and semantic analysis.

3.4.1 Lexical analysis

Lexical analysis, sometimes called scanning, is where the stream of characters that make up the source program is scanned left-to-right and transformed into groups of characters called tokens. For example, the characters in the statement

result = start + rate * 60

would be transformed into the following tokens:

1. The identifier result.
2. The assignment symbol =.
3. The identifier start.
4. The plus sign.
5. The identifier rate.
6. The multiplication sign.
7. The number 60.

The white space is normally eliminated during lexical analysis.

3.4.2 Syntax analysis

Syntax analysis, or parsing, is where the tokens of the source program are grouped into grammatical phrases. Usually the phrases of the source program are represented by a parse tree. An example of a parse tree can be seen in Figure 3.2.


Figure 3.2: Parse tree for the statement result=start+rate*60

The phrase rate*60 will be grouped together because the rules of arithmetic expressions state that multiplication is performed before addition.

Context free grammars

The rules for the syntax analysis are often expressed by a context-free grammar. The grammar gives a precise and easy-to-understand specification of the syntax of the programming language. It is also possible to construct a parser from a grammar by using automated tools. For example, an if-else statement in C has the form:

if ( expression ) statement else statement

The statement is the concatenation of the keyword if, an opening parenthesis, an expression, a closing parenthesis, a statement, the keyword else, and another statement. Using the variable expr for expression and stmt for statement, this rule can be expressed as:

stmt → if ( expr ) stmt else stmt

The arrow may be read as "can have the form". This kind of rule is called a production. In a production, lexical elements like the keyword if and the parentheses are called tokens. Variables like expr and stmt represent sequences of tokens and are called nonterminals. A context-free grammar has four components:

1. A set of tokens, known as terminal symbols.
2. A set of nonterminals.
3. A set of productions, where each production consists of a nonterminal, an arrow, and a sequence of tokens and/or nonterminals.
4. A designation of one of the nonterminals as the start symbol.

The following is an example of a simple grammar that can parse the right hand side of the assignment statement in Figure 3.2:

expr → identifier
expr → number
expr → expr + expr | expr * expr

The symbol | is used to separate multiple productions on one line and can be read as “or”. By using expr as the start symbol the derivation of the right hand side of the assignment statement could look like this:

expr → expr + expr
     → identifier + expr
     → identifier + expr * expr
     → identifier + identifier * expr
     → identifier + identifier * number

A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the right side of a production for that nonterminal. The set of token strings that can be derived from the start symbol forms the language defined by the grammar.

Syntax tree

A more common internal representation of the syntactic structure is the syntax tree. It is a compressed representation of the parse tree where the operators appear as the nodes, and the operands of an operator are the children of that node. An example of a syntax tree is seen in Figure 3.3.


3.4.3 Semantic analysis

The semantic analysis phase checks the source program for semantic errors and gathers type information for the code generation phase. It uses the hierarchical structure generated in the syntax analysis phase to identify the operators and operands of expressions and statements. This checking ensures that certain kinds of programming errors will be detected and reported. Examples of semantic checks are:

• Type checks: The compiler should report an error if an operator is applied to an incompatible operand, for example if an integer variable is added to a function. It can also check that parameters to functions are correct in type and number.

• Flow-of-control checks: Statements that cause the flow of control to leave a construct must have some place to which to transfer the flow of control. For example, a break statement in C causes the flow of control to leave the enclosing while, for or switch statement. If break is used outside one of those, an error is generated.

• Uniqueness checks: Sometimes an object may only be defined once. For example, the case labels in a switch statement in C must be unique, and variables with the same name in the same scope are not permitted.

• Name-related checks: Sometimes the same name must appear at multiple locations. For example, in Ada a loop or block may have a name that appears at the beginning and at the end of the construct.

There are many more types of checks that may need to be performed, depending on the language. In C, for example, functions and variables must be declared before they are used, something that is not necessary in some languages.

The type checking does not always have to result in an error. For example, a type mismatch can sometimes be resolved by converting the operand. If a in the statement

a = a * 2;

is a floating point number, the integer 2 must be converted to a floating point number before the multiplication can take place. This is accomplished by inserting into the syntax tree a new node that explicitly converts the integer to a floating point number.
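In C terms, the effect of the inserted convert node can be pictured with an explicit cast. This is a sketch; the cast is added by the compiler, not written by the programmer.

void scale(void)
{
    float a = 1.5f;
    /* source as written: the integer constant 2 has the wrong type */
    a = a * 2;
    /* what the compiler effectively compiles after inserting the convert node */
    a = a * (float)2;
}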

Since programming languages are so different, and the semantic checks needed by the languages are so different, there is no systematic way to perform the semantic checks. They are usually done by traversing the tree and examining the nodes, or during the syntax analysis phase.

3.5 Synthesis

The synthesis consists of three phases: intermediate code generation, code optimization and code generation. It is responsible for transforming the source, which is now in the form of a syntax tree, into the output language.

3.5.1 Intermediate code generation

After the syntax and semantic analysis some compilers generate a machine independent intermediate form of the source program. Although the source program can be translated directly to the target language from the syntax tree, there are some benefits of using an intermediate form:

• A machine independent optimizer can be used on the intermediate representation.

• Retargeting is made easier. Creating a compiler for a different machine can be done by replacing a smaller part of the compiler than would have otherwise been necessary.

The intermediate representation should have two important properties: it should be easy to generate and it should be easy to transform into the target program. A common way to achieve this is to use a so-called three-address code. It is very similar to an assembly language where each memory location can be used as a register. The code consists of a sequence of instructions, each of which can have at most three operands. For example, the assignment statement from Figure 3.2 might look like this:

temp1 = rate * 60
temp2 = temp1 + start
result = temp2

There are also statements for conditional and unconditional jumps, procedure calls, return statements, indexed assignment to be used on arrays, and address and pointer assignments.
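A minimal C sketch of how three-address instructions might be represented; the type and field names are illustrative and not taken from any particular compiler.

enum Op { OP_ADD, OP_MUL, OP_ASSIGN, OP_JUMP, OP_CALL /* ... */ };

struct TacInstr {
    enum Op op;          /* the operation to perform       */
    const char *result;  /* destination operand            */
    const char *arg1;    /* first source operand           */
    const char *arg2;    /* second source operand, or NULL */
};

/* The statement temp1 = rate * 60 could then be encoded as: */
struct TacInstr i1 = { OP_MUL, "temp1", "rate", "60" };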

An intermediate form with a small instruction set is easier to implement and retarget. However, if it is too small, the intermediate code generator can be forced to generate long sequences of statements for some source language operations. It will then be more difficult for the optimizer and the code generator to produce good code.

3.5.2 Code optimization

The code optimizer will attempt to improve the intermediate code so that faster running machine code will be generated. It can sometimes also be of interest to make the code smaller. For DSP processors code with lower power consumption is sometimes preferred.

There are two types of optimizations that can be done: machine independent and machine dependent. Machine independent optimizations are typically done using the intermediate form as the base and do not consider any details of the target architecture when making optimization decisions. They are often very general in nature. Machine dependent optimizations can be done both on the intermediate form and on the generated code. These optimizations consider the target architecture specifically and use special instructions such as hardware loops and so on.

There are a number of common optimization techniques.

Constant propagation

Constant propagation is simply the replacement of variable references with constants when possible. For example, the statements

a = 3;

function_call(a + 42);

becomes

function_call(3 + 42);

Constant folding

Expressions with constant operands can be calculated at compile time. The example above would be transformed to

function_call(45);

Programmers usually do not write expressions such as 3+42 directly, but these expressions are quite common after macro expansion and other optimizations such as constant propagation.


Common subexpression elimination

A common subexpression, or CSE, is created when two or more expressions compute the same value. The expression is calculated once into a temporary variable, which is then used instead of the CSE. For example, the statement

array1[i + 1] = array2[i + 1];

will be transformed to

temp1 = i + 1;

array1[temp1] = array2[temp1];

Dead code elimination

Code that is never reached or that does not affect the program can be eliminated. For example, this code fragment

int global;

void foo(void)
{
    int k = 1;   /* dead: k is never used */
    global = 1;  /* dead: overwritten before being read */
    global = 2;
}

will be transformed into the following:

int global;

void foo(void)
{
    global = 2;
}

Expression simplification

Some expressions can be simplified by replacing them with a more efficient expression. For example, i+0 will be replaced by i, and i*0 and i-i by 0, and so on.

Code motion

Expressions in a loop that give the same result each time the loop is iterated can be moved outside the loop and calculated only once, before entering the loop.
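A sketch of the transformation on a hypothetical C fragment, where x and y do not change inside the loop:

/* before: the loop-invariant product x * y is recomputed on every iteration */
void scale(int *a, int n, int x, int y)
{
    for (int i = 0; i < n; i++)
        a[i] = x * y + i;
}

/* after code motion: the invariant expression is computed once */
void scale_moved(int *a, int n, int x, int y)
{
    int t = x * y;
    for (int i = 0; i < n; i++)
        a[i] = t + i;
}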


Strength reduction

Strength reduction replaces expensive instructions with less expensive instructions. For instance, a popular strength reduction is to replace a multiplication by a constant power of two with a left shift.
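For example (a sketch; the second function shows what an optimizer would conceptually produce from the first):

unsigned times_eight(unsigned i)
{
    return i * 8;    /* multiplication by a constant power of two */
}

unsigned times_eight_reduced(unsigned i)
{
    return i << 3;   /* after strength reduction: a left shift */
}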

3.5.3 Code generation

The final phase of the compiler is the generation of target code. The target code is usually relocatable machine code or assembly code. Memory locations are selected for each of the variables used in the source program, and the intermediate instructions are translated into one or more assembly level instructions that perform the same task. A vital part of code generation is the assignment of registers to variables, since that can greatly affect the performance of the generated code. Using the example from the previous sections, the generated code might look like this:

MOVE rate, R1
MUL  #60, R1
MOVE start, R2
ADD  R1, R2
MOVE R2, result

3.6 Symbol table

An essential part of the compiler is to keep track of the identifiers used in the source program and to collect information about various attributes of each identifier. These attributes contain information about the name and type of the identifier, its size, its scope and so on. For functions and procedures they also contain the number and types of the arguments and the return type. It works in a similar way for more complex data types like arrays and structures.

The symbol table is a data structure that contains a record for each identifier and fields for the attributes of the identifier. The data structure makes it possible to search for identifiers, to add or retrieve the attributes and to add new identifiers.

When an identifier is found in the lexical analysis its name is added to the symbol table if it is not already there. The index in the symbol table is then passed along in the token, and that index is used to refer to the identifier from there on. In the later phases of compilation, information about the type and other attributes is added and used in various ways.
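A minimal C sketch of such a data structure, using a hash table of records; all names here are illustrative:

#include <string.h>

struct Symbol {
    const char *name;     /* identifier as written in the source  */
    int type;             /* encoded type information             */
    int size;             /* size of the object                   */
    int scope;            /* nesting level where it was declared  */
    struct Symbol *next;  /* next record in the same hash bucket  */
};

#define NBUCKETS 211
static struct Symbol *buckets[NBUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static struct Symbol *lookup(const char *name)
{
    for (struct Symbol *p = buckets[hash(name)]; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;   /* not found: the caller may add a new record */
}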

3.7 Error handler

It is important that the compiler can detect errors and deal with them in a reasonable way. When an error is encountered the compiler emits an error message containing the location of the error in the source program and a message stating the type of error, and it then tries to continue with the compilation. It can sometimes be difficult for the compiler to know what to do to continue when an error has been detected. One way is, for example, to skip all input until the next semicolon to get to the next statement.

As soon as the error count is greater than zero a flag is set and the compiler will stop execution after the semantic analysis phase. There is no point in generating the target program when there are errors in the source program.

The compiler can also detect minor errors that will not stop the compilation, and emit warnings about these errors instead.

3.8 Front and back end

Often the phases are collected into a front end and a back end. The front end consists of the phases that depend on the source language and are largely independent of the target machine. These normally include lexical analysis, syntax analysis, semantic analysis and the generation of intermediate code. The machine independent optimizations can also be done in the front end. The creation of the symbol table and most of the error handling are also done in the front end.

The back end includes the parts of the compiler that are dependent on the target machine, and these parts usually do not depend on the source language, only on the intermediate code. The back end therefore consists of the code optimizer and the code generator. It also uses the symbol table and the error handler.

This division of the design makes it easy to take the front end of a compiler and combine it with a new back end to produce a compiler for the same source language on a different target machine. This is the basis for retargeting.

3.9 Environment

In addition to the compiler, several other programs are required if an executable program is to be created. See Figure 3.4.

Figure 3.4: A compiler system

3.9.1 Preprocessor

The preprocessor produces the input to the compiler. It often performs different kinds of text processing, for example macro processing and file inclusion. In C, for example, every line beginning with a # is an instruction to the preprocessor. #define FAIL -1 causes all occurrences of FAIL to be replaced by -1, and #include <file.h> will include the file file.h in the source program. In C the preprocessor also removes the comments from the source program.
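A small before-and-after illustration (a sketch; the second function shows the first as the compiler proper would see it after preprocessing):

/* before preprocessing */
#define FAIL -1

int check(int x)      /* returns FAIL on error */
{
    if (x < 0)
        return FAIL;
    return 0;
}

/* after preprocessing: macros expanded, comments removed */
int check_preprocessed(int x)
{
    if (x < 0)
        return -1;
    return 0;
}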

3.9.2 Assembler

Some compilers produce assembly code, which must be passed to an assembler for further processing. The assembler works much like a compiler and translates the assembly source into relocatable machine code.


3.9.3 Linker and loader

The linker makes it possible to combine several relocatable machine code files into a single program. The different machine code files can be the result from several compilations, and some may be library files. The linker resolves external references in the input files so that data and functions from the different files can be used by each other.

When all the external references are resolved, the loader takes the relocatable machine code, alters all relocatable addresses to real addresses, places the code and the data in their proper locations and creates the output file.

3.10 Compiler tools

Since most compilers use the same structure and function in the same way, specialized tools have been developed that help implement the various components of a compiler. These tools use specialized languages for specifying and implementing the components, and many use algorithms that are quite sophisticated. The following is a list of some compiler construction tools:

• Scanner generators: These automatically generate lexical analyzers, usually from a specification based on regular expressions. Examples include flex and lex.

• Parser generators: These produce syntax analyzers from specifications that are normally based on a context-free grammar. Before parser generators appeared, the parser was the most time consuming part to implement. Now it is considered one of the easiest parts to implement, thanks to the parser generators. Examples of parser generators are yacc and bison.

• Syntax-directed translation engines: These produce routines that walk the parse tree and generate intermediate code.

• Automatic code generators: These tools generate routines that translate the intermediate language into the machine language of the target machine with the help of a collection of rules. The basic technique is template matching: the intermediate code statements are replaced by templates that represent sequences of machine instructions.


4 LCC

This chapter describes how the compiler LCC works. Most of this information is collected from [2] and [3].

4.1 Introduction

LCC is a free ANSI C compiler that is designed to be retargetable. The source code is available for download from the internet [6] under a license [7] that imposes almost no restrictions at all.

This compiler was chosen because it is very small and simple, and it is designed so that it is easy to retarget to generate code for other processors. There is also excellent documentation of LCC in the form of a book, "A Retargetable C Compiler: Design and Implementation", that describes every detail of the implementation of the entire compiler. It was used extensively during this thesis; the thesis could probably not have been completed without it.

Another compiler candidate was GCC, the GNU C Compiler from the GNU Compiler Collection, which is an open source C compiler. GCC would probably have generated better and faster code, but it was not chosen because it is much bigger and more complex than LCC. Also, the same kind of documentation that was available for LCC was not available for GCC.


4.2 C

C is a general purpose programming language that was developed during the 1970s at Bell Labs by Dennis Ritchie, and it is still widely used today. It is a relatively low level language where the basic data types correspond to real data types found in the hardware. The language provides no operations to deal directly with composite data types such as strings, arrays and lists. There are no input/output facilities and no file access facilities. All these higher level operations must be provided by library functions. This, and several other limitations, has some advantages: it makes the language small and relatively easy to learn, and it also means that compilers for the language will be smaller and easier to construct.

C has become very popular and there exist compilers for many different processors and operating systems. Although it is far from an ideal language for DSP processors, it is still extensively used for them. That is probably because it is such a simple and low level language, which makes it easier to construct a compiler that generates efficient code for DSP processors.

Over the years the C programming language has evolved and been standardized a couple of times. The first version, called K&R C (after Kernighan and Ritchie), is derived from the reference manual in the first edition of the book "The C Programming Language" by Brian Kernighan and Dennis Ritchie. In 1989 ANSI standardized the language; this version is commonly referred to as ANSI C or C89. ISO has since released two standards for C, called ISO C90 and ISO C99.

4.3 The compiler

The following sections will describe the different phases of the compiler and how they work.

4.3.1 Lexical analysis

The lexical analyzer reads source text and produces tokens. For each token the lexical analyzer returns its token code and zero or more associated values. The token codes for single character tokens, for example = and +, are the characters themselves. For tokens that can consist of more than one character, for example identifiers and constants, defined constants are used. For example, the expression ptr = 42 produces the following tokens and associated values:

ID    "ptr"   symbol table entry for ptr
'='
ICON  "42"    symbol table entry for 42

The token code for the operator = is the numeric value of =, and it does not have any associated values. The token code for the identifier ptr is the value of the constant ID, and the associated values are the identifier string itself and a pointer to the symbol table entry for the identifier. The integer constant 42 returns the token ICON and the associated values "42" and a pointer to the symbol table.

Keywords, such as for and switch, have their own token codes to distinguish them from identifiers.

The lexical analyzer also tracks the source coordinates for each token. These coordinates contain the file name, line number and position on the line of the first character of the token. The coordinates are used to locate errors when they are found.

Recognizing tokens

The lexical analyzer in LCC is written by hand; it is not generated by a tool. This is due to the fact that the lexical structure of C is simple and that generated analyzers tend to be large and slow.

The lexical analyzer is used by calling the function gettok(), which returns the next token. gettok() recognizes a token by using a switch statement on the first character of the token to classify it. It then consumes the following characters that make up the token. The following is a small sample of the code:

...
switch (*rcp++) {
...
case '<':
    if (*rcp == '=') return cp++, LEQ;
    if (*rcp == '<') return cp++, LSHIFT;
    return '<';
...

rcp and cp are pointers to the next character in the input file. The code for identifying most of the tokens looks very similar to the example, but identifying numbers, strings and identifiers is a bit harder. However, it works in the same way, by looking ahead in the input.
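A sketch of how a longer token, such as an identifier, can be consumed in the same style; this is illustrative code, not LCC's actual implementation:

#include <ctype.h>

/* Consume an identifier starting at *pp, copy it into buf and
   advance the input pointer past it. */
static int scan_identifier(const char **pp, char *buf, int buflen)
{
    const char *p = *pp;
    int n = 0;
    while ((isalnum((unsigned char)*p) || *p == '_') && n < buflen - 1)
        buf[n++] = *p++;
    buf[n] = '\0';
    *pp = p;          /* the next call to the scanner starts here */
    return n;         /* length of the identifier */
}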


4.3.2 Syntax analysis

The syntax analyzer, or parser, uses the stream of tokens from the lexical analyzer and confirms that it follows the syntax of the language. It also builds an internal representation of the input that is used by the rest of the compiler.

The parser for LCC is also written by hand. The reason for this is the same as for the lexical analyzer: C is a simple language, and the code generated by tools is slow and big.

Grammar

LCC uses a context free grammar written in EBNF form to define the rules for the parser. The parser is constructed by writing a parsing function for each nonterminal. The idea is to write a function X() for each nonterminal X, using the productions for X as a guide to writing the code for X(). For example, the parsing function for the production

expr → term { + term }

will look like:

void expr(void)
{
    term();
    while (t == '+') {
        t = gettok();
        term();
    }
}

The { and } in the production are an EBNF feature that means "zero or more".

Abstract syntax tree

When parsing the program the compiler also generates an intermediate representation of the program. This is done in the form of abstract syntax trees, or simply trees. Abstract syntax trees are parse trees without the nodes for nonterminals and nodes for useless terminals. For example, the tree for the expression (a+b)*c can be seen in Figure 4.1.


Figure 4.1: Tree for the expression (a+b)*c

There are no nodes for the nonterminals used when parsing this expression, and there are no nodes for the tokens ( and ). The tokens + and * are contained in the nodes ADD+I and MUL+I. The nodes with the operator ADDRG+P compute the address of the operand, and INDIR+I fetches the integer at the address given by its operand.

The names of the nodes are constructed from an operator and a type suffix that denotes the type that the operator operates on. For example, the node ADD+I states that the node uses integer addition. Table 4.1 lists the different type suffixes available.

The trees can contain operators that do not appear in the source program. For example, the INDIR+I node fetches integers at an address, but there is no fetch operator in C. A list of operators that can appear in the trees is seen in Table 4.2. In addition to these, there are six more operators used in trees, listed in Table 4.3.

Type suffix   Meaning
F             Floating point
I             Integer
U             Unsigned
P             Pointer
V             Void
B             Structure

Table 4.1: Type suffixes


Operator Type suffix Operation

ADDRF ...P.. Address of a parameter

ADDRG ...P.. Address of a global

ADDRL ...P.. Address of a local

CNST FIUP.. Constant

BCOM .IU... Bitwise complement

CVF FI.... Convert from float

CVI FIU... Convert from signed integer

CVP ..U... Convert from pointer

CVU .IUP.. Convert from unsigned integer

INDIR FIUP.B Fetch

NEG FI.... Negation

ADD FIUP.. Addition

BAND .IU... Bitwise AND

BOR .IU... Bitwise inclusive OR

BXOR .IU... Bitwise exclusive OR

DIV FIU... Division

LSH .IU... Left shift

MOD .IU... Modulus

MUL FIU... Multiplication

RSH .IU... Right shift

SUB FIUP.. Subtraction

ASGN FIUP.B Assignment

EQ FIU... Jump if equal

GE FIU... Jump if greater than or equal

GT FIU... Jump if greater than

LE FIU... Jump if less than or equal

LT FIU... Jump if less than

NE FIU... Jump if not equal

ARG FIUP.B Argument

CALL FIUPVB Function call

RET FIUPV. Function return

JUMP ....V. Unconditional jump

LABEL ....V. Label definition


4.3.3 Semantic analysis

The semantic analysis of the source program is done while the parser recognizes the input, so there is no explicit phase in the compilation where this is done. Each parsing function detects and handles semantic errors according to the semantics of each construct. When, for example, a type conversion is needed, an extra convert node is inserted in the abstract syntax tree, and the expression x = 6 generates an error if x is not defined. Many other semantic checks are also performed.

4.3.4 Intermediate code generation

During this stage the compiler produces directed acyclic graphs, or dags, from the trees. The compiler also eliminates common subexpressions. For example, in the expression (a+b)+b*(a+b) the value of a+b is computed twice. The dag for this expression can be seen in Figure 4.2. The multiplication node (MULI4) uses the already computed values for a+b and b instead of computing them again.

The names of the nodes in dags are made up of a generic operator, a type suffix and a size indicator. The + is omitted to distinguish dags from trees. For example, ADDI4 denotes a 4-byte (32-bit) integer addition.

Operator   Operation
AND        Logical AND
OR         Logical OR
NOT        Logical NOT
COND       Conditional expression
RIGHT      Composition
FIELD      Bit-field access

Table 4.3: Operators that appear only in trees


Trees contain operators that are not allowed in dags. The available operators for dags are seen in Table 4.2. When the dags are constructed, the operators that are not allowed are replaced by other operators instead. For example, the operator AND is replaced by comparisons, jumps and labels.

Before the dags are passed to the back end they may be converted to trees again. Some back ends want trees and some want dags. All back ends that are included in the LCC distribution want trees. When the conversion is done, nodes that are referenced multiple times because of the common subexpression optimization are changed: the result of the common subexpression is stored in a temporary variable that is used instead. The resulting tree still uses the same data structures and representation as the dags, though.
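Expressed in C, the rewrite corresponds to the following; the temporary t1 is illustrative, as in LCC it would be a compiler-generated temporary:

int cse_example(int a, int b)
{
    /* original: (a+b) + b*(a+b) computes a+b twice      */
    /* after the dag construction, a+b is computed once: */
    int t1 = a + b;
    return t1 + b * t1;
}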

4.3.5 Back end

LCC's back end is divided into a machine independent part and a machine dependent part. The front end communicates with the back end by calling a number of interface functions.

In a C program, all program code is contained in functions. To generate code for a function the front end calls the interface function function(). function() uses two functions to generate code: gencode() and emitcode(). gencode() selects and orders instructions and allocates registers. emitcode() emits the assembler code for the function and also removes unnecessary register-to-register copies. These copies are left over from earlier optimizations, and it is easier to remove them here.

Selecting instructions

The instruction selection is done in the function gencode(). The instruction selectors used by LCC are generated automatically from a specification by a program called lburg. lburg is a code generator generator, and it emits a tree parser written in C.

The core of an lburg specification is a tree grammar, which is a list of rules where each rule has a nonterminal on the left and a pattern of terminals and nonterminals on the right. For example, the rule

addr: ADDI4(reg, con)

matches a tree at an ADDI4 node if the node's first child recursively matches the nonterminal reg and its second child recursively matches the nonterminal con. In Figure 4.3 the tree with the selected rules for the statement i = c + 2 can be seen.

Figure 4.3: Tree with rules

Tree grammars are usually ambiguous, which means that there can be more than one selection of instructions that do the same thing. For example, increasing a register by one can be done by adding one to the register directly or by loading one into another register and adding the two registers. The cheapest implementation is preferred, so a cost is assigned to each rule and the parse tree with the lowest total cost is selected.

Specifications

lburg specifications use the following format:

%{
configuration
%}
declarations
%%
rules
%%
C code

The configuration part is C code and is optional. It is copied directly into the generated file. The same applies to the C code part. The declarations part contains the start symbol and a list of all the terminals. The rules part contains the tree patterns. Each rule has an assembler code template, which is a quoted string that specifies what to emit when the rule is used. Rules end with an optional cost. The following is a small example:

%start stmt
%term ADDI4=309 ADDRLP1=295 ASGNI4=53
%term CNSTI4=21 INDIRI4=67
%%
con:  CNSTI4              "1"
addr: ADDRLP1             "2"
addr: ADDI4(reg, con)     "3"
rc:   con                 "4"
rc:   reg                 "5"
reg:  ADDI4(reg, rc)      "6"  1
reg:  addr                "7"  1
stmt: ASGNI4(addr, reg)   "8"  1

In this example the assembler code templates are simply rule numbers. Rule 1 states that con matches constants. Rules 2 and 3 state that addr matches trees that can be computed by address calculations, like an ADDRLP1 or the sum of a register and a constant. rc matches a constant or a reg, and reg matches any tree that can be computed into a register. Rule 6 describes an add instruction: the first operand must be in a register, the second operand can be a register or a constant, and the result is stored in a register. Rule 7 describes an instruction that loads an address into a register. Rule 8 describes an instruction that stores a register at an address.

The emitter

The emitter in the function emitcode() is what outputs the assembler code from the assembler templates. Each rule has one assembler template. If the template ends with a newline character, lburg assumes that it is an instruction; otherwise it is assumed to be a piece of an instruction.

When the emitter emits the template it treats some characters differently. %digit tells the emitter to emit the digit-th nonterminal from the pattern. %c emits the nonterminal on the left side of the production. For example, the rule

areg: ADDI4(reg, rc) "add %c,%0,%1"

might be emitted as

add a1,r1,#60

If the template begins with #, emit2() is called to emit the instruction instead.

5 Implementation

5.1 Introduction

The main goal of this thesis was the design and implementation of a new back end for the LCC compiler targeting the DSP56002 processor. Another goal was to maintain compatibility with Motorola's C compiler, so that the generated code behaves in the same way. This means that the two compilers use the registers in the same way, use the same memory layout, use the same calling convention, and so on. By doing this, code generated by Motorola's compiler can use code compiled by this compiler, libraries for example, and vice versa.

LCC is designed so that retargeting should be as easy as possible, and the included back ends consist of only about 1000 lines of code each. This chapter describes how the back end was constructed and why it looks and behaves as it does.

5.2 The compiler

The DSP56002 digital signal processor is designed to execute DSP oriented calculations as fast as possible. As a consequence, it has an architecture that is somewhat unconventional for the C language. Because of this, there are characteristics of the compiler and of the generated code that are a bit unusual, and they are documented here.

5.2.1 Data types and sizes

Because of the word orientation of the DSP56002, all data types are aligned on word boundaries. One word is 24 bits wide.

Integer data types

The sizes and ranges of the integer data types are defined in Table 5.1.

Floating point types

The C data types float and double are implemented as fractional numbers, as used by the DSP56002 processor. The precision and range can be seen in Table 5.2.

This is not consistent with the Motorola compiler, which uses single precision floating point arithmetic for both float and double. However, since the DSP56002 cannot do floating point arithmetic in hardware, all operations are performed by calls to an external library.

The choice to implement floating point numbers as fractional numbers was made because that is what the hardware supports. Emulating "real" floating point numbers defeats the purpose of using a DSP processor in the first place, since it would then be slower than a normal processor. It was also easy to implement, since the only difference between integers and fractional numbers is the way multiplication and division are handled. There is one problem though: since the hardware does not support 48-bit multiplication and division, the compiler uses the integer version of those operations for the double data type. This pretty much makes double useless, but it is implemented anyway.
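To illustrate the difference, a host-side C model of a 24-bit fractional (Q23) multiply might look as follows. This is a sketch for explanation only, assuming a and b hold sign-extended 24-bit values; it is not code produced by the compiler.

#include <stdint.h>

/* Model of a DSP56002-style fractional multiply: Q23 * Q23 -> Q23. */
static int32_t fmul24(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;  /* Q23 * Q23 gives a Q46 product     */
    p <<= 1;                     /* the DSP shifts left one bit (Q47) */
    return (int32_t)(p >> 24);   /* keep the 24 most significant bits */
}

/* Example: 0.5 * 0.5 = 0.25, i.e. fmul24(0x400000, 0x400000) == 0x200000. */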

Data type            Size (words)   Min value           Max value
char                 1              -8388608            8388607
unsigned char        1              0                   0xFFFFFF
short                1              -8388608            8388607
unsigned short       1              0                   0xFFFFFF
int                  1              -8388608            8388607
unsigned int         1              0                   0xFFFFFF
long                 2              -140737488355328    140737488355327
unsigned long        2              0                   0xFFFFFFFFFFFF
long long            2              -140737488355328    140737488355327
unsigned long long   2              0                   0xFFFFFFFFFFFF

Table 5.1: Integer data type sizes and ranges


Pointer types

All pointers are 16 bits wide. When computing addresses with integer arithmetic, only the least significant 16 bits are relevant. See Table 5.3.

5.2.2 Register usage

The compiler uses all of the registers in the DSP56002 processor except the M-registers. The register usage can be seen in Table 5.4.

Data type   Precision (bits)   Range
float       24                 -1.0 ≤ x < 1.0
double      48                 -1.0 ≤ x < 1.0

Table 5.2: Floating point data type precision and ranges

Data type   Size (words)   Min value   Max value
pointers    1              0           0xFFFF

Table 5.3: Pointer size and range

Register         Usage
R0               Frame pointer (16-bit)
R6               Stack pointer (16-bit)
R1 – R5, R7      Address registers used for pointers and structures (16-bit)
N0 – N7          Compiler temporaries, used when updating the R-registers (16-bit)
M0 – M7          Unused, must be kept as 0xFFFF (16-bit)
A                48-bit general purpose register and 48-bit function return value
A1               24-bit general purpose register and 24-bit and 16-bit function return value
B                48-bit general purpose register
B1               24-bit general purpose register
X, Y             48-bit general purpose registers
X0, X1, Y0, Y1   24-bit general purpose registers

Table 5.4: Register usage



5.2.3 Memory usage

Due to the architecture of the DSP56002, program and data memory are separate. The program resides in the P memory and all data is stored in the Y memory. Figure 5.1 illustrates the default program and data memory layout.

Figure 5.1: Program and data memory layout

The bottom of the program memory contains the interrupt table, and it is filled with jumps to the subroutine Fabort, except at the first position, which contains a jump to the subroutine F__start. This is because the DSP processor starts execution here. The F__start function takes care of initialization and calls the Fmain subroutine, which is the compiled main() function. The rest of the program memory is used to store the compiled functions.

The data memory is split in three parts. The bottom part contains data defined in the crt0 file. The next part is used for global and static data, and the remaining part is used for the stack.



5.2.4 Frame layout

An activation record, or frame, holds all the state information needed for one invocation of a function. This includes local and temporary variables, saved registers and the return address. The stack stores one frame for each active function. When a function is called a new frame is pushed onto the stack, and when the function returns the frame is removed. The frame pointer points into the currently active frame and is used to access data inside the frame. The stack pointer points to the first free space on the stack, and the stack grows upwards. Figure 5.2 shows the layout of a frame.

Figure 5.2: Frame layout

An example of what the stack looks like during a function call can be seen in Figure 5.3. The code that is executed looks like this:

void main(void)
{
    func();
}

When func() is called, its frame is pushed onto the stack and the frame pointer is updated. When the function returns, its frame is removed from the stack and the frame pointer is restored.

Figure 5.3: Frame example

5.2.5 Calling convention

Whenever a function is called, a strict calling convention is followed. The calling sequence is divided into three parts: the caller sequence, the callee sequence and the return sequence.

Caller sequence

The caller part of the calling sequence consists of:

1. Pushing the arguments onto the stack in reverse order.
2. Calling the function.
3. Adjusting the stack pointer.

Callee sequence

During the initial part of the calling sequence, the called function is responsible for:

1. Saving the return address and the old frame pointer.
2. Updating the frame and stack pointers.
3. Saving the following registers if they are used by the function: B0, B1, X0, X1, Y0, Y1, R1 – R5 and R7.

Return sequence

During the final part of the calling sequence, the called function is responsible for:


2. Testing the return value. This feature is not used by this compiler, but it is needed to maintain compatibility with the calling convention used by Motorola.

5.2.6 Naming convention

The compiler uses a special naming format when generating assembly code. This can be seen in Table 5.5.

5.3 Retargeting

The back end in LCC is split into two parts, a machine independent and a machine dependent part. All back ends use the same machine independent part. This simplifies the construction of new back ends, since it does not require as much new code.

The different back ends in LCC are stored in .md files (md stands for machine description). They contain everything that is needed for the back ends. To create a new back end, a new file is created. When LCC is being compiled, the files for the back ends are fed through the program lburg. lburg generates C code from the .md files, which is then compiled together with the rest of the source code for LCC to create the compiler.

The format of the .md file is the same as that of the specification described in section 4.3.5. The complete file for the new back end is called dsp56k.md and can be seen in Appendix C.

The following sections will give a more detailed description of the file.

Label              Purpose
L#                 Local labels, used for targets of jumps. # is a unique number.
F<identifier>      Global variables and functions. <identifier> is the variable or function name.
F__<identifier>#   Variables static to a function.
<filename_c>       Section names. The contents of each assembly file generated by the compiler are contained in a unique section. <filename_c> is the name of the file being compiled, with the '.' replaced by '_'.

Table 5.5: Naming convention


5.3.1 Configuration

This part contains declarations and function definitions. They are copied to the top of the generated C file and are almost identical for all back ends. Only the global variables differ somewhat.

The arrays ireg[32], ireg2[32] etc. are used to hold information about the registers. The variables iregw, iregw2 etc. are used to hold an entire register set, also called a wildcard. cseg holds the current segment, and retstruct is a flag that is set if the function being compiled returns a structure.

5.3.2 Declarations

This part contains all the terminals that are used for the rules. They are generated by a program called ops that is included in the LCC distribution. The command line given to the program was the following:

ops c=1 s=1 i=1 l=2 h=2 f=1 d=2 x=2 p=1

The letter c stands for char, s for short, and so on, covering all the data types in C; x means long double and p means pointers. The number indicates how many bytes (or words) each data type uses. The output from the program is a list of all the terminals that can appear with the given data type sizes, and it is copied directly into the .md file.

5.3.3 Rules

This is the core of the back end, and the rules were the most time consuming part to write. They required a lot of testing and fine tuning. The rules were constructed incrementally: the work started with a few basic rules that were only able to compile empty programs. Then new rules were gradually added to handle more operations, but only for one data type (1-word integers). When all operators were covered, additional rules were added to take care of all data types. While writing the rules, the function emit2() was also constructed to take care of the cases where the instruction templates were not enough. This function is documented further down.

The other back ends were used as inspiration when writing this part, but much of it differs because of the nature of the DSP56002.
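To give a flavour of the notation, each rule pairs a nonterminal, a tree pattern, an instruction template and a cost; in the templates %c stands for the node's result register and %0, %1 for the code of the first and second operand. The two rules below are a made-up sketch in that notation, not lines taken from dsp56k.md, and the mnemonics and costs are illustrative only.

    reg:  ADDI1(reg,reg)    "add %1,%c\n"   1
    stmt: ASGNI1(addr,reg)  "move %1,%0\n"  1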

5.3.4 C code

This is the actual interface to the back end that the front end uses. It is made up of a number of interface functions and a structure called the interface record that contains configuration data. The front end calls the functions to inform the back end of events during the compilation. It also calls the functions to let the back end emit the actual assembly code.

This part was created in the same way and at the same time as the rules. At first only a few functions were needed to compile an empty program. As more and more rules were created, the functions needed more and more features.

The following is a description of the functions and the interface record in the .md file. They are listed in the same order as they appear in the file.

progbeg()

The front end calls progbeg() during initialization to set up variables, initialize data and emit the boilerplate at the beginning of the generated assembly file. It is also responsible for checking and taking care of the command line arguments passed to the back end.
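A sketch of the general shape of such a function, assuming LCC's parseflags() helper and print() output routine are available through the generated C file's includes; the option name and the banner line are made up.

    static void progbeg(int argc, char *argv[])
    {
        int i;

        parseflags(argc, argv);            /* let the front end see its flags */
        for (i = 0; i < argc; i++)
            if (strcmp(argv[i], "-myoption") == 0)
                ;                          /* handle a hypothetical option */
        print("; generated by lcc\n");     /* boilerplate comment line */
    }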

progend()

When the compilation ends the front end calls progend() to give the back end a chance to clean up and finalize its output.

rmap()

This function is used to tell the front end which register class the operators should use. P and B are pointers and structures; they use the R registers. I, U and F are integers, unsigned integers and floating point numbers. They use the same registers: X and Y if they are 48-bit (2 words) and X0, X1, Y0 and Y1 if they are 24-bit (1 word).
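A sketch of how such an rmap() could look, using LCC's optype() and opsize() macros; the wildcard names rregw, xyregw and iregw below are assumptions, not necessarily the names used in dsp56k.md.

    static Symbol rmap(int opk)
    {
        switch (optype(opk)) {
        case P: case B:                    /* pointers and structures */
            return rregw;                  /* the R registers */
        case I: case U: case F:            /* integers and floats */
            return opsize(opk) == 2
                ? xyregw                   /* 48-bit: X and Y */
                : iregw;                   /* 24-bit: X0, X1, Y0, Y1 */
        default:
            return 0;
        }
    }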

segment()

The front end tells the back end which segment it should use: CODE, BSS, DATA or LIT. CODE is used for code, BSS for uninitialized variables, DATA for initialized variables and LIT for constants. In this implementation CODE uses the P memory and all the data uses the Y memory.
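A sketch of such a segment(), assuming the Motorola assembler's org directive is what switches between P and Y memory (the exact directives emitted by dsp56k.md may differ); cseg is the variable mentioned in section 5.3.1.

    static void segment(int n)
    {
        if (n == cseg)                     /* already in the right segment */
            return;
        cseg = n;
        if (n == CODE)
            print("\torg\tp:\n");          /* code goes in P memory */
        else                               /* BSS, DATA and LIT */
            print("\torg\ty:\n");          /* all data goes in Y memory */
    }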

target()

This function is used because some instructions must have their operands in certain registers or they leave the result in a specific register. In this back end most registers must be targeted because of the hardware. Most of the instructions in the DSP56002 use the X and Y registers as input registers and the accumulator registers A and B as output registers. Therefore almost all instructions must be forced to use specific registers.

The function rtarget() is used to specify a register for an operand and setreg() is used to set the register that will contain the result of an instruction.
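A sketch of the mechanism, borrowing the style of the target() functions in the other back ends; the choice of MUL and the register symbols xreg0, yreg0 and areg are invented for illustration.

    static void target(Node p)
    {
        switch (specific(p->op)) {
        case MUL+I:                    /* e.g. a multiply: operands must  */
            rtarget(p, 0, xreg0);      /* be in X0 and Y0, and the result */
            rtarget(p, 1, yreg0);      /* appears in accumulator A        */
            setreg(p, areg);
            break;
        }
    }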

clobber()

Some instructions destroy the value in a register. This function tells the front end to insert instructions to save and restore the register before and after an instruction that destroys it.

emit2()

Some instructions are too complicated to be emitted using the instruction templates. They are emitted by this function instead. There are many different reasons why an instruction must be emitted by emit2(). For example, reg: CNSTI2 loads a constant into a register, but the hardware cannot move a 48-bit constant to a register using one instruction, so the constant must be split in two parts and moved to the register using two move instructions. This cannot be done with the templates, so emit2() emits it instead.

doarg()

doarg() is called for each argument before a function call. It is used to compute the register or stack cell assigned to the next argument. In this implementation all function arguments are put on the stack.
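A sketch along the lines of the doarg() functions in the other back ends, assuming LCC's mkactual() helper and word-sized stack cells:

    static void doarg(Node p)
    {
        assert(p && p->syms[0]);           /* syms[0] holds the argument size */
        mkactual(1, p->syms[0]->u.c.v.i);  /* assign the next stack cell      */
    }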

blkfetch(), blkstore() and blkloop()

These functions can be used to emit code that copies blocks of data. This can, for example, be used by structure assignments. If they are left empty the compiler will generate other code instead.
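In a back end that leaves them empty, they can simply be defined as stubs; the parameter lists below follow LCC's interface:

    static void blkfetch(int size, int off, int reg, int tmp) {}
    static void blkstore(int size, int off, int reg, int tmp) {}
    static void blkloop(int dreg, int doff, int sreg, int soff,
                        int size, int tmps[]) {}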

local()

This function is used by the front end to announce local variables to the back end. It is used to set the stack offset for the variables.
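A sketch of a typical local(), following the pattern shared by the other back ends: first try to place the variable in a register of the right class, otherwise allocate a stack cell with LCC's mkauto().

    static void local(Symbol p)
    {
        /* try to give the variable a register; fall back to a
           slot in the frame */
        if (askregvar(p, rmap(ttob(p->type))) == 0)
            mkauto(p);
    }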

function()

function() is used by the front end to generate and emit code for a function. It is usually divided into three parts. In the first part some initialization is done, and gencode() is then called to generate the code for the function. In the second part the size of the frame and the registers that need saving are known. The function prologue is emitted to save the old frame pointer, set up a new stack pointer and save the necessary registers. After that emitcode() is called to emit the actual assembly code for the function. In the third part the function epilogue is emitted to restore registers and the frame and stack pointers.
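A skeleton of the three parts, using the front end's gencode() and emitcode(); the prologue and epilogue bodies are omitted here.

    static void function(Symbol f, Symbol caller[], Symbol callee[], int ncalls)
    {
        /* part 1: reset frame offsets and bind parameters, then
           generate (but do not yet emit) the body */
        gencode(caller, callee);

        /* part 2: the frame size and used registers are now known,
           so the prologue can be emitted here, followed by the body */
        emitcode();

        /* part 3: emit the epilogue that restores the registers and
           the frame and stack pointers */
    }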

defsymbol()

defsymbol() is called by the front end whenever a new symbol is defined. A symbol is an internal data type in the compiler that represents variables, constants, labels and types. This function sets up the name that the back end uses for the symbols.
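A sketch of a defsymbol() that would produce the names of Table 5.5, using LCC's stringf() and genlabel() helpers; the exact conditions in dsp56k.md may differ.

    static void defsymbol(Symbol p)
    {
        if (p->scope >= LOCAL && p->sclass == STATIC)
            p->x.name = stringf("F__%s%d", p->name, genlabel(1));
        else if (p->generated)
            p->x.name = stringf("L%s", p->name);   /* local labels: L#  */
        else if (p->scope == GLOBAL || p->sclass == EXTERN)
            p->x.name = stringf("F%s", p->name);   /* globals: F<name>  */
        else
            p->x.name = p->name;
    }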

address()

This function is used to initialize a symbol that represents an address of the form x+n, where x is a symbol name and n is a number. This is used so that addresses can be calculated by the assembler instead of at run time.
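A sketch in the style of the other back ends: for variables with assembler-visible names the address is formed textually, so the assembler does the arithmetic, while stack variables just get their offset adjusted.

    static void address(Symbol q, Symbol p, long n)
    {
        if (p->scope == GLOBAL || p->sclass == STATIC || p->sclass == EXTERN)
            q->x.name = stringf("%s%s%D", p->x.name,
                                n >= 0 ? "+" : "", n);  /* x+n for the assembler */
        else {
            q->x.offset = p->x.offset + n;              /* frame offset */
            q->x.name = stringd(q->x.offset);
        }
    }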

defconst()

This function is used to emit assembly code for constants.

defaddress()

This function emits assembly for pointer constants.

defstring()

This function is used to emit assembly code to initialize a string. There is a special case that needs to be taken care of here. Internally the compiler treats all arrays with the data type “1-byte integer” as strings. This causes a problem since char, short and int are all of this data type in this implementation. Therefore the compiler believes that all arrays of these data types are strings and tries to emit them as strings. A change in the front end was made so that if the variable n (the length of the string) is -1, the string pointer *str contains the actual value instead of pointing to a string that should be emitted.

export() and import()

These functions are called by the front end when a symbol needs to be visible to other modules or is defined in another module. They emit the assembly directives that export and import the symbol.


global()

The front end calls this function to emit assembly to make a variable global.

space()

This function is used to emit code that creates a block of words set to zero.

Interface record

The interface record is used to configure the back end. It consists of a structure that is assigned to the global variable IR. This is done when the compiler is initialized, and IR is set to the structure of the back end that the compiler chooses. The front end can then call the interface functions in the back end in the form (*IR->progbeg)(arg1,arg2) and access the interface variables in the form IR->wants_dag.

The first part of the interface record sets up the size and alignment of the data types in C. After that come some flags that set up specific features. The rest is used to assign the functions documented above.

5.4 Special features

There are some special features in the compiler that need to be mentioned.

The C standard specifies that floating point constants default to double if they do not end with an f. For example, 1.0 is a double and 1.0f is a float. But since the double data type is not fully implemented, this was changed so that floating point numbers default to float, and double is used if the number ends with a d. This violates the C standard, but it makes the compiler more usable.
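In practice this means the following declarations, which would be wrong under standard C, are how the two types are selected here; the d suffix is the non-standard extension described above.

    float  f = 0.5;      /* no suffix: defaults to float in this compiler */
    float  g = 0.5f;     /* f suffix still means float                    */
    double d = 0.5d;     /* non-standard d suffix selects double          */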

1.0 is not a valid number for float and double since the ranges for these data types are −1.0 ≤ x < 1.0. But it is still possible to use it, and it will be converted to the largest number allowed for that data type. This avoids statements like

float x = 0.999999999999;

just to get the largest possible number.
