Modelling and Evaluating the StreamBits language


Technical report, IDE0708, January 2007


Master’s thesis in Computer Systems Engineering

Jonathan Andersson

School of Information Science, Computer and Electrical Engineering Halmstad University


Box 823, S-301 18 Halmstad, Sweden


Abstract

This thesis concludes the evaluation of a new high-level programming language for stream applications, StreamBits. The goal of the project is to evaluate the programmability of StreamBits, with the focus on expressing machine-independent parallelism and bit-level computations. As of now, the programming language is prototyped in a Java framework. This project also involves improvement and expansion of this framework.

An examination of the framework was conducted. The conclusions of this examination were the foundation of the changes implemented in the framework during the improvement and expansion part of this project. Evaluation experiments were done using the improved version of the framework. The evaluation was based on a comparison of programs implemented in StreamBits and in another programming language typically used by industry for this kind of application. The focus of the evaluation was on how well the new data-types and stream constructs of StreamBits can be used and expressed compared to other languages.

The results are partly the improvements and expansion of the framework, and partly the results of the tests conducted during the evaluation. The results show that the new data-types and stream constructs of StreamBits are valuable additions to a stream programming language. The data-types and stream constructs assist the programmer in writing source code that is not closely bound to a specific architecture.


Preface

This master’s thesis has been accomplished as the final stage of the Master programme in Computer Systems Engineering at the school of Information Science, Computer and Electrical Engineering at Halmstad University.

I would like to thank my supervisor Jerker Bengtsson, without whose guidance the thesis would not have been possible. I would also like to thank Veronica Gaspes for her valuable suggestions. I am thankful to Christopher Allen for proofreading and giving suggestions about the English language of the thesis report.

Jonathan Andersson


List of Figures

2.1 3G baseband functions

2.2 Example of filter declaration

2.3 Pipeline usage in StreamIt

2.4 Split-Join usage in StreamIt

2.5 Feedback-loop usage in StreamIt

2.6 The convolution coding scheme

3.1 Structure of stream components

3.2 Structure of data-types

4.1 Parallel stream constructs, switches, in StreamBits

4.2 Stream graph of matrix multiplication

4.3 Comparison of matrix multiplication


Listings

4.1 Example of adding new components to a pipeline

4.2 Pipeline declaration of matrix multiplication in StreamBits

4.3 MatrixSplit declaration

4.4 Vector to vector multiplication filter declaration

4.5 Summarization of vector filter declaration

4.6 Core of StreamBits convolution encoding implementation


Contents

2.3 Functions
2.3.1 Matrix multiplication
2.3.2 Convolution encoding
3 Approach
3.1 Examination of framework
3.1.1 Framework structure
4 Results
4.1 Expansion of framework
4.1.1 Switches
4.2 Improvement of framework
4.2.1 Stream constructs
4.2.2 parAdd
4.2.3 Parallel simulation
4.2.4 Data-types
4.3 Evaluation
4.3.1 Matrix multiplication
4.3.2 Convolution encoding
4.3.3 Framework usage
4.3.4 Data-types
4.3.5 Messaging
5 Conclusion
6 Future work
References
A Convolution source code
A.1 StreamBits version


1 Introduction

Evaluating StreamBits is a master's thesis project carried out at Halmstad University. The aim of the project is to experiment with, evaluate and improve the StreamBits programming language prototype, which was previously developed within a research project at the same university.

StreamBits is a domain-specific language targeting so-called stream applications. It is intended to enable high-level programming of such applications and their compilation to parallel processor architectures. These kinds of applications generally require processing of data under hard real-time requirements.

1.1 Motivation and goal

While research on parallel architectures has made great progress, advances in compilers and programming languages have lagged behind. The purpose of the StreamBits programming language is to reduce this gap, which consists of the difficulties that arise when high-level programs are to be compiled efficiently to a parallel architecture.

The goal of the StreamBits language is to allow programmers to more easily and efficiently develop correct programs that make use of parallel architectures. The introduction of new language constructs, data-types and operators provides the compiler with information that can be analyzed to map a program to a parallel processor architecture. New operators that can perform more robust checks on bit-level and data-parallel types before run-time are a big improvement compared to what is common practice in traditional languages such as C.

The goal of the work in this master's thesis is to evaluate the StreamBits prototype by investigating whether the ideas stated in [1] are applicable. One approach to do this is to implement some specific functions that are typical for the domain of applications that StreamBits is developed for. Specific aspects that will be evaluated are the expressibility and programmability of StreamBits, compared to other languages that are typically used today. The project should also identify strengths and weaknesses in the language and implement improvements to the Java-based prototype framework.

1.2 Scope of thesis

The task is to evaluate and improve the prototype framework and thereby get a good estimate of how well suited the implemented version of the programming language will be for the domain. The focus lies on the evaluation, comparing programs developed in the prototype framework with programs developed in another programming language. The evaluation does not take performance-related issues into account. It is purely focused on language functionality, which includes stream constructs and data-types and how these can be expressed and used.


2 Background

The work started with a survey of related work. The goals of the survey were to get a deeper understanding of the computation model, to get an overview of stream languages, and to get an understanding of how programming is done in such domain-specific languages.

2.1 Stream applications

A stream application in general has specific properties that characterize it well [2]. Common properties are listed below.

1. Large streams of data. The application operates on a large amount of data, i.e. a data stream that typically can be infinite.

2. Independent stream filters. The data stream is transformed, or modified, when processed in the stream program. The computational elements in a stream program are called filters or stream transformers. A filter is normally a standalone unit, which takes input data from the input stream, computes, and passes the data forward onto an output stream.

3. A stable computation pattern. A finite set of filters is statically applied on the entire stream.

4. Occasional modification of stream structure. The ability to change filter execution during run-time.

5. High performance expectations. The application often has real-time requirements; therefore throughput and latency are important concerns.

2.1.1 3G

The third generation mobile network (3G) is a service that contains stream applications. A typical environment for StreamBits would be the radio baseband station (RBS) in the transmission chain of the network. The baseband consists of a certain set of pipelined functions. The functions for downlink processing are shown in Figure 2.1.


[Figure: block diagram. Blocks, in extraction order: CRC attachment, Concatenation and Segmentation, Channel coding, Rate matching, First DTX Insertion, First interleaving, Radio frame Segmentation, TrCH Multiplexing, Second DTX Insertion, Physical channel Segmentation, Second Interleaving, Physical channel mapping. Labels: n TrCH streams, CCTrCH, m physical streams.]

Figure 2.1: 3G baseband functions.

The baseband takes n transmission channels as input. The number of transmission channels depends on the service that is processed. The transmission channels are processed in parallel and later multiplexed into a single transmission channel, the coded composite transmission channel (CCTrCH), that contains one user's data. The CCTrCH stream is demultiplexed to be mapped onto the physical channels and transmitted, see Figure 2.1.

AMR

AMR, adaptive multi-rate, is a coding technique used on voice data in 3G mobile networks [1]. The AMR data stream is a possible source of data for the experiments used in the evaluation part of the project. There are eight different transmission rates from 4.75 kbits/s to 12.2 kbits/s, which can be dynamically adapted.

AMR encodes the voice data into three bit streams of different priority levels, A, B and C, where A is the most prioritized [3].

2.2 Stream programming languages

This section presents a short overview of StreamBits and other related stream programming languages. To be able to evaluate and improve StreamBits, a good, general understanding of stream programming is essential. The section concentrates on StreamIt and StreamBits due to the similarities between them, but other languages are also discussed briefly to give an overview of the area.


2.2.1 StreamIt

StreamIt is a stream programming language that is developed to simplify the coding of signal-processing and other computations with similar characteristics [4]. The process of building a program starts with the creation of a stream graph, which is a description of the specific parts of a program and of how the program should be built. A StreamIt program takes a single input stream and produces a single output stream as a result. A stream graph is built of a few structural components, so-called stream types. These components are Pipeline, Filter, Split-Join and Feedback-loop. Every stream type has an input and an output data-type, and all are declared in a similar way:

dataType->dataType streamType ID

A stream type declaration starts with a data-type, which is the data-type that will always be received from the input stream. The next token is an arrow (->) that illustrates the process, followed by another data-type. This data-type represents the type of data that will be pushed onto the output stream by the stream construct. The next keyword is the name of the stream type. Lastly, the name of the stream component is specified.

The compiler reconstructs the stream graph built by the programmer and performs optimizations. The optimizations can combine adjacent filters that are not computationally expensive, or split other, more computationally demanding, filters into several parts. Because a filter is an atomic block of code, it is possible to analyse what can and cannot be executed in parallel. It is also possible to make more complex filter transformations.

Stream types

To specify the structure of a stream program a number of stream types have been defined.

Filter

All computations are made in the filter objects. A filter can optionally include an initiation sequence that is executed once, on object creation. The filter has a work sequence that executes repeatedly during run-time.

A filter can only have one work function. When a filter is programmed, the peek, push and pop rates must be declared. These specify how much data the filter will take from and put onto the streams. The pop rate specifies how many elements the filter will consume from its input. The push rate specifies how many elements the filter will push onto the output stream. An example of a filter declaration is shown in Figure 2.2.


Figure 2.2: Example of filter declaration.

The syntax of a filter declaration is similar to every other stream-type declaration. The declaration starts with a type, which is the type that will be retrieved from the input tape. Next token is an arrow (->), representing the filter, and then another type, which is the type that will be pushed onto the output stream by the filter. Next keyword is filter since the example in this case is a declaration of a filter. Finally, the name of the filter is given.

Init function. The init function is executed once, on filter initiation. It cannot access the filter's tapes, but can initiate variables that have been declared.

Work function. As seen in Figure 2.2, the work function has a specific notation that specifies how much it will peek, pop and push on the streams connected to it. The peek argument is optional, and if it is not specified the pop rate is adopted as the peek rate. During each execution of the work function the streams will be modified according to the rates specified. The filter's task during run-time is to execute the work function repeatedly.

Helper functions. A number of helper functions can be declared in a filter. Every function can call any other helper function.

Filter state. To keep track of a filter’s state between executions, variables can be declared at the top of each filter declaration. Any function inside the filter can access these variables.
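The filter model described above can be illustrated with a small Java sketch. This is not StreamIt code: the class name `MovingAverageFilter`, the use of `ArrayDeque` as a tape, and the chosen rates (peek 3, pop 1, push 1) are assumptions made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical model of a StreamIt-style filter: peek 3, pop 1, push 1.
class MovingAverageFilter {
    static final int PEEK = 3, POP = 1, PUSH = 1;

    private final Deque<Integer> in;   // input tape
    private final Deque<Integer> out;  // output tape
    private int executions;            // filter state kept between work calls

    MovingAverageFilter(Deque<Integer> in, Deque<Integer> out) {
        this.in = in;
        this.out = out;
        init();
    }

    // Runs once on creation; it does not touch the tapes.
    private void init() {
        executions = 0;
    }

    // One execution of the work function; returns false if too little input.
    boolean work() {
        if (in.size() < PEEK) {
            return false;
        }
        int sum = 0;
        Iterator<Integer> it = in.iterator();
        for (int i = 0; i < PEEK; i++) {
            sum += it.next();          // peek: read without removing
        }
        in.removeFirst();              // pop 1 element
        out.addLast(sum / PEEK);       // push 1 element
        executions++;
        return true;
    }

    int executions() {
        return executions;
    }
}
```

Because the peek rate is larger than the pop rate, consecutive executions see overlapping windows of the input, which is the situation the optional peek argument is meant for.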

Pipeline

A pipeline is composed of a number of components connected in series. The idea is to ensure that each stream type component in the series has only one stream connected to its input and produces only one stream as output. An example of a pipeline declaration is shown in Figure 2.3.


(a) Pipeline declaration.

(b) Pipeline structure.

Figure 2.3: Pipeline usage in StreamIt.

The pipeline declaration starts like every other stream type declaration, with the data-types of the input and output streams. The pipeline keyword is followed by an identifier. The first component must have the same input type as the pipeline input, and the last component must have the same output type as the pipeline output.

Split-Join

A split-join is a stream component for parallel streams. The component splits the input stream and distributes the split stream to multiple filters. After the filters, a joiner merges the streams into a single stream again. Figure 2.4 shows a typical Split-Join declaration.

(a) Example of Split-Join declaration.

(b) Split-Join structure.

Figure 2.4: Split-Join usage in StreamIt.

The body of the Split-Join starts with the split keyword, followed by a splitting technique. StreamIt operates with a predefined set of split and join techniques. In the example the stream is duplicated for two different filters, but it could also be split up in a round-robin order [2]. The filters are added with the add operator and, finally, the filter streams are joined into a single stream in a round-robin order.
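The behaviour of a duplicating splitter combined with a round-robin joiner can be sketched in Java as follows. The class `DuplicateSplitJoin` and its `run` method are hypothetical and only model the data flow; they are not part of StreamIt or of the StreamBits framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical model of a split-join: duplicate splitter, round-robin joiner.
class DuplicateSplitJoin {
    static List<Integer> run(List<Integer> input, List<IntUnaryOperator> branches) {
        List<Integer> output = new ArrayList<>();
        for (int x : input) {
            // Duplicate split: every branch receives the same element.
            // Round-robin join: one result is taken from each branch in turn.
            for (IntUnaryOperator branch : branches) {
                output.add(branch.applyAsInt(x));
            }
        }
        return output;
    }
}
```

With two branches, every input element yields two output elements, one per branch, in the order in which the branches were added.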

Feedback-loop

A Feedback-loop takes an input stream and joins it with the previously pushed output stream. The joined stream is passed through an optional body function and is then split. One part of the split stream is passed straight through the feedback-loop and the other part is fed back to the input of the feedback-loop, see Figure 2.5.


(a) Example of Feedback-loop declaration.

(b) Feedback-loop structure.

Figure 2.5: Feedback-loop usage in StreamIt.

Messaging

There is a method to distribute configuration messages in StreamIt. The method is called portal and makes it possible to change filter parameters during run-time. A portal is a container that contains filters. By sending a configure message to a portal the message will be delivered to all of the portal’s components.

2.2.2 StreamBits

StreamBits is a stream programming language that is under development. There is a thesis project in which the first part of the compiler, the parser, is implemented. Currently only a model of the language is implemented, as a Java framework [5]. The framework is composed of components for data-types and stream types. Since the framework is designed using different design patterns, components can easily be modified and new ones can be created.

The idea behind the programming language is that programs should be portable to multiple architectures by using high-level types to express parallelism and operations. Another vital part of the language is that bit-level operators and data-parallel constructs provide information that can be used for type-checking and partitioning at compile time.

StreamBits is to a large extent based on the StreamIt programming language [1], but there are some fundamental differences. StreamBits has several data-types for fine-grained parallelism and type-safe bit-level operations, which StreamIt has not [4]. There are also specific stream constructs to handle distribution of configuration parameters throughout a stream program. StreamBits has dual tapes (a tape being the StreamBits representation of a stream) between components, in order to separate data from configuration parameters. The separation of streams, together with separate configuration and work modes in filters, makes changes in each part independent of the other. This is to aid the programmer in writing clear and understandable source code. StreamBits is also designed so that bit-field and data-parallel operations can be expressed more easily and efficiently without considering machine-specific details.

Stream constructs[5]

The basic stream constructs in StreamBits are similar to the ones in StreamIt. They both build on stream components, which are called Pipeline, Filter and Tape. The Split-Join and Feedback-loop that are available in StreamIt have not been implemented. The StreamBits prototype framework includes generic classes for structural components and data-types.

StreamComponent is the generic structural stream component and any stream component must implement this interface to be executable in a stream program.

A pipeline is a StreamComponent which contains other StreamComponents; thus, pipelines are hierarchical. The pipeline initiates four tapes as input and output streams, as specified in the StreamComponent interface. Two of the tapes are assigned as the input of the pipeline and two are assigned as the output.

The StreamBits filter type is different from the filter type in StreamIt. In StreamBits there is a configuration mode in addition to the work mode. There are also four, instead of two, tapes assigned to it. The configure mode is added to StreamBits because of the new messaging system that StreamBits implements. The configure mode is executed once before every execution of the work mode. The configure mode is used for the same task as the messaging system in StreamIt: changing filter parameters during run-time. To be able to uniquely address the four streams, the set of stream access operators found in StreamIt has not been adopted in StreamBits. Instead, two new sets have been defined for StreamBits. popD, peekD and pushD are used for data-stream access, and popC, peekC and pushC are used for configuration-stream access.
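The interplay between the two modes and the four tapes can be sketched in Java as below. The operator names `popD`, `pushD`, `popC` and `pushC` are taken from the text, but the class `GainFilter`, its helper methods, and the choice to forward a consumed parameter downstream are assumptions for illustration and do not reproduce the framework's actual classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical StreamBits-style filter with separate data and
// configuration tapes; not the prototype framework's own API.
class GainFilter {
    private final Deque<Integer> dataIn = new ArrayDeque<>();
    private final Deque<Integer> dataOut = new ArrayDeque<>();
    private final Deque<Integer> confIn = new ArrayDeque<>();
    private final Deque<Integer> confOut = new ArrayDeque<>();
    private int gain = 1;  // parameter that can change during run-time

    void feedData(int v) { dataIn.addLast(v); }
    void feedConfig(int v) { confIn.addLast(v); }
    Integer takeData() { return dataOut.pollFirst(); }

    private int popD() { return dataIn.removeFirst(); }
    private void pushD(int v) { dataOut.addLast(v); }
    private int popC() { return confIn.removeFirst(); }
    private void pushC(int v) { confOut.addLast(v); }

    // Configure mode: consume a pending parameter, if any, and
    // forward it downstream (the forwarding is an assumption).
    private void configure() {
        if (!confIn.isEmpty()) {
            gain = popC();
            pushC(gain);
        }
    }

    // Work mode: scale one data element by the current gain.
    private void work() {
        pushD(popD() * gain);
    }

    // The configure mode runs once before every execution of the work mode.
    void step() {
        configure();
        work();
    }
}
```

Keeping the configuration tape separate means the work function never has to distinguish parameters from data elements, which is the separation of concerns the dual tapes aim for.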

Data-types[5]

StreamBits has data-parallel and bit-field types. These data-types, and the operators that belong to them, are introduced to reduce errors and to improve the readability and understanding of code in ways that are not possible using conventional languages and their types. These new data-types also make it possible to implement more portable source code that can be executed on different architectures. Alongside these new data-types, more traditional data-types have also been implemented in the prototype framework.

bitvecST This is a type for computations on bit-fields. bitvecST is a data-type with no fixed bit-field length. When instantiating a bitvecST the programmer must specify how many bits the bitvecST should contain. The data-type is supposed to make bit-level operations easier by using a type with an arbitrary length instead of being dependent on a machine register of a certain word length. By introducing a type for bit-fields, type-checking can be performed at compile time. This can help the programmer to reduce the number of errors before run-time.
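The idea of a bit-field whose length is chosen at instantiation can be sketched in Java as follows. The class `BitField` is hypothetical and is not the framework's bitvecST; it uses `java.math.BigInteger` only to stay independent of a machine word, and it checks lengths at run-time, whereas the text envisions such checks at compile time.

```java
import java.math.BigInteger;

// Hypothetical fixed-length bit-field; not the framework's bitvecST class.
final class BitField {
    private final int length;
    private final BigInteger bits;

    BitField(int length, BigInteger value) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        this.length = length;
        // Mask the value to the declared number of bits.
        BigInteger mask = BigInteger.ONE.shiftLeft(length).subtract(BigInteger.ONE);
        this.bits = value.and(mask);
    }

    // Bitwise XOR, allowed only between fields of the same length.
    BitField xor(BitField other) {
        if (other.length != length) {
            throw new IllegalArgumentException(
                "bit-field lengths differ: " + length + " vs " + other.length);
        }
        return new BitField(length, bits.xor(other.bits));
    }

    // Left shift within the field; bits shifted past the top are dropped.
    BitField shiftLeft(int n) {
        return new BitField(length, bits.shiftLeft(n));
    }

    int length() { return length; }
    BigInteger value() { return bits; }
}
```

A 40-bit field, for example, keeps a bit shifted to position 39 but discards one shifted to position 40, regardless of the word length of the machine running the code.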

vecST A type for fine-grained SIMD parallelism that typically would be implemented as sequential expressions in conventional, imperative programming languages. When this type is used in a program, the parallelism can be extracted by the compiler instead of by the programmer. A vecST contains a number of elements of a certain type. Parallel operations on these elements can be expressed using the vector.

intST The intST type is a typical integer type commonly found in other programming languages. The intST length is dependent on the specific machine's word length.


byteST The byteST type is a byte value type commonly found in other programming languages. A byteST value is eight bits long.

floatST The floatST type is a floating point type commonly found in other programming languages. A floatST has an architecture-specific length.

voidST A type that represents a null value. voidST can, for example, be used when a stream component does not produce or retrieve any elements to or from a stream.

2.2.3 Other related programming languages

A very general overview of three other related programming languages is presented in this sub-section. The purpose is to obtain a wider view of different approaches to the research area. StreamIt has been examined more thoroughly than these because it seems most promising in terms of being able to fulfill the criteria which have been set up.

StreamC/KernelC [6]

This language is divided into two languages called StreamC and KernelC. The language is not as architecture-independent as StreamBits and StreamIt. A stream program is created from a set of kernels and a set of streams that connect the kernels. The StreamC language provides the interface for how to create, transfer and destroy the streams of a stream program. The KernelC language is used to program the kernels. A kernel is very similar to a filter: it repeatedly computes on the input data elements that are provided through the streams.

Brook [7]

Brook has evolved out of the StreamC/KernelC language. Brook has been developed for general-purpose programming on graphical processing units (GPUs). Data-parallel constructs have been introduced to allow programmers to express parallelism. Brook kernels do not retain state between executions. Global parameters for kernel configuration can be instantiated upon kernel creation, but cannot be changed during run-time [8]. Stream constructs do not exist in Brook or StreamC/KernelC, as they do in StreamBits and StreamIt. Instead, a communication pattern can be constructed more freely than in StreamBits and StreamIt [8].

Sequoia [9]

Sequoia is not a programming language that can be categorized as a stream programming language, since it does not build on the stream model. Sequoia is interesting in how data-parallel operations can be expressed by dividing them into tasks, an approach which can be adopted in a similar way in stream languages.

A task is an entity, free from side-effects, returning a call-by-value-result. Calling tasks is the only way to describe data transfers in Sequoia. Each task has an isolated address space and cannot communicate directly with other tasks, except by calling a subtask or returning a value to its caller.

2.3 Functions

This section introduces the functions that have been implemented for evaluation purposes. These functions were selected to test and evaluate the expressibility of parallelism and bit-field computations. The functions, Forward Error Correction (FEC) encoding and matrix multiplication, are both realistic functions performed in domain-typical applications.

2.3.1 Matrix multiplication

Matrix multiplication is a fundamental operation required in many signal processing applications. It is also highly parallelizable. A matrix multiplication can be partitioned into several parts, where multiple processing elements can each be assigned to calculate a smaller part of the product matrix. When all smaller parts are calculated, the sub-results are merged together to form the product matrix. A mathematical model of this is shown in the formula below, where A is an n × m matrix and B is an m × p matrix.

\[
AB =
\begin{pmatrix}
\sum_{i=1}^{m} a_{1i} b_{i1} & \cdots & \sum_{i=1}^{m} a_{1i} b_{ip} \\
\vdots & \ddots & \vdots \\
\sum_{i=1}^{m} a_{ni} b_{i1} & \cdots & \sum_{i=1}^{m} a_{ni} b_{ip}
\end{pmatrix}
\]

One approach to parallelizing a matrix multiplication is to distribute one of the matrices to the processing elements, one row for each element. The result is obtained by sending every column of the second matrix to all the processing elements, each of which computes a vector multiplication. The sums of the resulting vectors are then merged into a matrix as matrix elements.
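The row-wise partitioning described above can be sketched in plain Java. This is not the StreamBits implementation evaluated later in the thesis; the class `RowParallelMatMul` is hypothetical, and a parallel `IntStream` stands in for the set of processing elements, with one logical element per row of the first matrix.

```java
import java.util.stream.IntStream;

// Sketch: each row of A is handled by one (logical) processing element.
class RowParallelMatMul {
    static int[][] multiply(int[][] a, int[][] b) {
        int n = a.length;      // rows of A
        int m = b.length;      // rows of B, must equal the columns of A
        int p = b[0].length;   // columns of B
        int[][] c = new int[n][p];
        IntStream.range(0, n).parallel().forEach(i -> {
            if (a[i].length != m) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            // Every column j of B is sent to element i, which computes
            // one vector multiplication and sums the products.
            for (int j = 0; j < p; j++) {
                int sum = 0;
                for (int k = 0; k < m; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;  // merged into the product matrix
            }
        });
        return c;
    }
}
```

Each row of the result is written by exactly one task, so the elements need no synchronization with each other while computing.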

2.3.2 Convolution encoding

By coding the original bit sequence and adding redundant bits, it is possible to decode and detect the transmitted bits in the receiver at a higher bit error ratio than for uncoded transmissions. In UMTS (Universal Mobile Telecommunications System), the two coding techniques used are convolution coding and turbo coding. Two coding rates can be applied in the convolution encoder. One produces one redundant bit per information bit (1/2 rate) and the other produces two redundant bits per information bit (1/3 rate) [3]. In this project, convolution coding with 1/2 rate is used.

Convolution coding at 1/2 rate generates a bit sequence with twice the number of bits of the original bit sequence. The operations required to produce this sequence are presented here. The encoder corresponds to an 8-bit shift register, together with a number of XOR operations, see Figure 2.6.

Figure 2.6: The convolution coding scheme.

The information bits are shifted through the shift register. For each shift, two code bits are calculated and the register content is shifted one step ahead. The code produced by the encoder, shown in Figure 2.6, is dependent on several information bits and can, because of that, be recreated when a fraction of the bit sequence is distorted.
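A rate-1/2 encoder of this kind can be sketched in Java as below. This is not the StreamBits implementation from the appendix. The class `ConvEncoder` is hypothetical, and the generator polynomials 561 and 753 (octal) are an assumption: they are the values commonly cited for the UMTS rate-1/2 code, but the text above does not list the tap positions. Tail bits for flushing the register are left out.

```java
// Sketch of a rate-1/2 convolution encoder built on an 8-bit shift register.
class ConvEncoder {
    // ASSUMPTION: generator polynomials (octal) commonly cited for the
    // UMTS rate-1/2 code; the surrounding text does not specify them.
    static final int G0 = 0561;
    static final int G1 = 0753;

    // The 8 previous information bits, newest in bit 7.
    private int state = 0;

    // Shifts one information bit in and returns the two code bits.
    int[] encodeBit(int bit) {
        int reg = ((bit & 1) << 8) | state;        // current bit plus register
        int c0 = Integer.bitCount(reg & G0) & 1;   // XOR of the taps of G0
        int c1 = Integer.bitCount(reg & G1) & 1;   // XOR of the taps of G1
        state = reg >> 1;                          // oldest bit falls out
        return new int[] {c0, c1};
    }

    // Encodes a whole sequence; the output has twice as many bits as the input.
    static int[] encode(int[] bits) {
        ConvEncoder enc = new ConvEncoder();
        int[] out = new int[2 * bits.length];
        for (int i = 0; i < bits.length; i++) {
            int[] pair = enc.encodeBit(bits[i]);
            out[2 * i] = pair[0];
            out[2 * i + 1] = pair[1];
        }
        return out;
    }
}
```

Feeding a single 1 followed by eight 0s through the encoder reproduces the bits of the two polynomials on the two outputs, which shows how each information bit influences nine consecutive pairs of code bits.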

AMR. All three transport channels are convolution encoded. The class A bit-stream is encoded using the 1/3 rate while the other two are encoded using the 1/2 rate. The transport channels are encoded independently, and therefore all three data streams can be encoded in parallel.


3 Approach

The work presented in this master's thesis project comprises three work packages:

• Examine, expand and improve the Java model of the prototype language.

• Implement a specific set of functions in the language.

• Evaluate the language by comparing expressibility in StreamBits to other languages.

The first work package included examination, expansion and improvement of the prototype framework. By developing simple applications, a more thorough knowledge of the language was obtained. Fully understanding the structure of the framework, and how its components interact with each other, was an important and time-consuming part of the work. This work proved to be helpful at a later stage. The examination also gave information on what modelling restrictions were present in the prototype framework. The next sub-task in the first part was to improve and expand the framework based on the limitations found during the examination. The expansion is presented in Section 4.1 and the improvements in Section 4.2.

The second work package of the project was to implement specific functions, which comprise parallelism and bit-level computations, to test the expressibility of StreamBits. The functions were selected to show typical operations and tasks that a stream programming language would most likely be used to implement.

The third and last work package in the project was to evaluate the expressibility of the language. The evaluation was based on the implementations from the second work package, where different stream programs were implemented. By comparing StreamBits with both StreamIt and C, a more traditional language for implementing stream applications, it is possible to reason about expressibility, lines of code and machine abstraction when using StreamBits. All language-specific operations and stream components received a more thorough evaluation, examining whether they were useful in a larger context.

3.1 Examination of framework

The given project description included the assignment of examining the framework, trying to identify strengths and weaknesses, and later implementing improvements to the modelled language based on these findings. A number of changes to the framework were necessary in order to make it more complete and to be able to perform experiments and evaluate it in the continuation of the project work.


3.1.1 Framework structure

The framework has been organized into packages. Stream constructs are in one package and data-types in another, while exceptions, error handling and auxiliary functions are grouped into packages of their own.

Stream constructs

As stated before, the StreamBits prototype framework did not contain constructs to express parallelism on a coarse-grained level. Introducing new stream constructs for this kind of operation is vital to be able to do a proper evaluation of the framework. A stream programming language is subjected to applications and algorithms that could, and should, be expressed as parallel operations. An overview of how this is implemented in the framework is given in the next chapter of this thesis.

The language constructs are all located in the same package, streamcomponent. A programmer must use these to build a stream program. Every stream component implements an interface which specifies what basic behavior must be defined, see Figure 3.1.

[Figure: class diagram with the labels Pipeline and StreamComponent.]

Figure 3.1: Structure of stream components.

This structure enables the user to add new components by showing what they are expected to do. As long as a new component does what is required by the interface, the component can be used in a stream program. The structure does not limit new components; additional component-specific features can be added without intruding on the rules set by the interface.

Data-types

The framework contains several different data-types, most of them traditional data-types like integer, byte, float and void. Two additional language-specific data-types, bitvec and vec, are also present. They were described in the background chapter, Section 2.2.2. The examination of the data-types has shown that they are all treated the same by the framework; they all implement an interface called StreamType, see Figure 3.2.

[Figure: class diagram with the labels intST, StreamType and TypeAdapter.]

Figure 3.2: Structure of data-types.

The way the data-types are currently generalized is not ideal. A separation of data-types is necessary because of the differences between them; not all data-types implement, or should implement, all the methods stated in StreamType. Data-types should not all be treated the same way by the framework. A differentiation between data-types that can be added to a vecST and those that should not is necessary to ensure type safety in the framework.

vecST

vecST is a vector type that supports the expression of single instruction multiple data (SIMD) operations at an abstract level. The data-type contains methods for adding, subtracting, multiplying and dividing with other vecSTs. The conditions are that the vectors have the same number of elements and the same inner data-type. vecST also has methods to perform bit-level operations on its elements [5].
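The element-wise behavior and the size condition described above can be illustrated with a small Java sketch. IntVecSketch is a hypothetical, int-only simplification; the method names mul and add are assumptions and do not reproduce the real vecST class.

```java
import java.util.Arrays;

// Hypothetical, simplified vector of ints; not the framework's vecST.
class IntVecSketch {
    private final int[] elements;

    IntVecSketch(int... elements) { this.elements = elements.clone(); }

    int size() { return elements.length; }

    // Element-wise multiplication, written by the caller as one operation.
    IntVecSketch mul(IntVecSketch other) {
        requireSameSize(other);
        int[] result = new int[elements.length];
        for (int i = 0; i < elements.length; i++) {
            result[i] = elements[i] * other.elements[i];
        }
        return new IntVecSketch(result);
    }

    // Element-wise addition under the same size condition.
    IntVecSketch add(IntVecSketch other) {
        requireSameSize(other);
        int[] result = new int[elements.length];
        for (int i = 0; i < elements.length; i++) {
            result[i] = elements[i] + other.elements[i];
        }
        return new IntVecSketch(result);
    }

    private void requireSameSize(IntVecSketch other) {
        if (other.elements.length != elements.length) {
            throw new IllegalArgumentException("vectors must have the same number of elements");
        }
    }

    @Override public String toString() { return Arrays.toString(elements); }

    public static void main(String[] args) {
        System.out.println(new IntVecSketch(1, 2, 3).mul(new IntVecSketch(4, 5, 6)));
    }
}
```

The loop is hidden inside the type, so the calling code states a single vector operation and leaves the per-element work to the implementation.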

Testing showed that the class was not sufficiently implemented. A vecST could only contain elements of the data-type intST, and the vector only supported a maximum of four elements. This was not how the data-type was supposed to be implemented, so an improvement was necessary. The changes made to the data-type are presented under Results on page 20.

bitvecST

The background survey of the StreamBits language showed that bitvecST is a bit-length independent data-type: a user can specify the bit-length of the data-type at every instantiation. The implementation supports this in theory, but not in practice, because of the nature of a Java-based framework. The length of the bitvecST is specified, but its value is represented as a Java integer, which limits a bitvecST to a maximum length of 32 bits. This limitation is unfortunate but cannot be improved; developers must take it into consideration when using the framework.
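A minimal sketch of why an int-backed representation caps the length at 32 bits is given below. BitVecSketch and its constructor are hypothetical; the real bitvecST is constructed from strings such as "3:0" and is not reproduced here.

```java
// Hypothetical sketch: a bit vector whose value is stored in a Java int.
class BitVecSketch {
    private final int length;  // number of valid bits, chosen per instance
    private final int value;   // backing store: a Java int holds at most 32 bits

    BitVecSketch(int length, int value) {
        if (length < 1 || length > Integer.SIZE) {
            throw new IllegalArgumentException("length must be between 1 and " + Integer.SIZE);
        }
        this.length = length;
        // Keep only the lowest 'length' bits; -1 >>> (32 - length) builds the mask.
        this.value = value & (-1 >>> (Integer.SIZE - length));
    }

    int length() { return length; }
    int value() { return value; }

    public static void main(String[] args) {
        BitVecSketch nibble = new BitVecSketch(4, 0xFF);
        System.out.println(nibble.length() + " bits, value " + nibble.value());
    }
}
```

Any length up to 32 can be chosen per instance, but a longer vector cannot be stored without changing the backing type.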


CHAPTER 4. RESULTS

4 Results

The results are presented in this chapter. The chapter discusses the expansion and improvements made to the framework during the first part of the master project. An evaluation of the prototype language is presented, based on a comparison between StreamBits and other languages. The evaluation uses implementations of baseband processing functions. The framework was expanded to make experiments with parallel streams possible.

4.1 Expansion of framework

New features added to the framework are presented in this section. The expansion is implemented so that the framework simulates, as exactly as possible, the behavior of a compiled and executed StreamBits program. The prototype language is implemented in Java. StreamBits-specific operators must therefore be implemented as Java functions to simulate their functionality. Types are handled as Java objects in the framework.

4.1.1 Switches

Switches are new stream constructs for expressing parallel streams in StreamBits. There are two kinds of switches, Split and Merge. A Split has one input stream, which is distributed to multiple output streams according to its policy. The policy is a switch's specification of how data is distributed between input and output streams. A Merge takes several input streams and produces a single output stream according to its policy, see Figure 4.1. When a stream is mentioned in association with StreamBits in this section, the combination of the two separate tapes for data and configuration is meant. The two tapes are represented as one bold arrow in figures, while a single tape is illustrated as a thin arrow.

(a) Split. (b) Merge.

Figure 4.1: Parallel stream constructs, switches, in StreamBits.

Switches in StreamBits are implemented as atomic components. Implementing them as filter-independent components has several advantages compared to how StreamIt implements its parallel stream constructs. An evaluation of StreamIt shows that the parallel structure Split-Join has limitations, or requires unnecessary instantiation of components, when a stream should contain multiple sources [10]. As StreamIt is designed now, a complete Split-Join must be instantiated even though only a Joiner is needed. The authors of that evaluation suggest a new stream construct that can take several inputs and merge them into one. The StreamBits Merge is one example of how this can be done.

The StreamIt Split-Join stream component is constructed to encapsulate a static set of filters. If new filters must be added, a completely new Split-Join has to be constructed. The StreamBits approach, with independent switches, allows a dynamic set of filters to be added to the switches' streams without reconstructing the switch component; filters are detached from, and abstract to, the switch specification. This makes the switches more reusable, since they can be applied to any set of filters.

The Split and Merge switches are examples of how generic templates can be created that contain all the functionality required to add and connect them to other stream components. The policy definition in these templates is kept abstract, so the programmer can implement stream splitters and mergers with different behavior. It also enables the user to specify generic switches with a policy that can be applied to many situations with similar properties.

Two generic switches are implemented as of now. The Split:

CloneSplit<type1,type2,type3,type4>(n)

is a switch that clones the input stream and pushes a copy to all output streams. The typeX parameters give the type of each stream: type1 specifies the data-input stream and type2 the data-output stream, while type3 and type4 specify the types of the config-input and config-output streams. The parameter n is the number of outputs from the split. This is identical to the split duplicate(n) functionality found in StreamIt. The other generic switch that is implemented is a Merge:

SerialMerge<type1,type2,type3,type4>(n)

The Merge retrieves as many elements as the input config tape and data tape hold. When the Merge has retrieved the correct number of elements from an input, it pushes them onto its output tape and moves on to the next input in an ordered manner. This can be compared to the join roundrobin functionality in StreamIt.
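The ordered visiting of inputs can be sketched as follows. This is a simplified illustration under stated assumptions: only the data tape is modelled, one element is taken per input and turn, and the class and method names are hypothetical rather than taken from the framework.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of an ordered (round-robin) merge over several inputs.
class SerialMergeSketch {
    // Visits the inputs in order and moves one element per visit to the output,
    // until every input is empty.
    static List<Integer> merge(List<Deque<Integer>> inputs) {
        List<Integer> output = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (int i = 0; i < inputs.size(); i++) {  // i plays the role of popD(i)
                Integer element = inputs.get(i).pollFirst();
                if (element != null) {
                    output.add(element);
                    progressed = true;
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Deque<Integer>> inputs = new ArrayList<>();
        inputs.add(new ArrayDeque<>(Arrays.asList(1, 3, 5)));
        inputs.add(new ArrayDeque<>(Arrays.asList(2, 4, 6)));
        System.out.println(merge(inputs));
    }
}
```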

The advantage of user-defined policies is that new policies can easily be developed. StreamIt has static methods for splits and joins, which limits the user to those that are available in the language. The user-defined policies, together with the generic and parameterized switches, make development freer and easier for the programmer, who is not limited to a certain set of policies.

To handle multiple input and output streams to and from the switches, new stream access operators have been added. The access operators popC(int i) and popD(int i) are used by a merger to get an element from the input stream at position i. Together with these, pushC(element, int i) and pushD(element, int i) have been added, which a splitter uses to push elements to the output stream at position i.
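A generic switch template with an abstract policy and an indexed push operator could look roughly like the Java sketch below. Only the data tape is modelled, the policy receives the element as a parameter instead of calling popD(), and all class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical split template: the wiring is fixed, the policy is left abstract.
abstract class SplitSketch<T> {
    private final List<List<T>> outputs = new ArrayList<>();

    SplitSketch(int n) {
        for (int i = 0; i < n; i++) outputs.add(new ArrayList<>());
    }

    int getNbrOfPipelines() { return outputs.size(); }

    // Indexed push: place an element on the output stream at position i.
    protected void pushD(T element, int i) { outputs.get(i).add(element); }

    List<T> output(int i) { return outputs.get(i); }

    // User-defined distribution of one incoming element.
    abstract void policy(T element);
}

// Policy 1: copy every element to all outputs.
class CloneSplitSketch<T> extends SplitSketch<T> {
    CloneSplitSketch(int n) { super(n); }

    @Override void policy(T element) {
        for (int i = 0; i < getNbrOfPipelines(); i++) pushD(element, i);
    }
}

// Policy 2: hand elements to the outputs in turn.
class RoundRobinSplitSketch<T> extends SplitSketch<T> {
    private int next = 0;

    RoundRobinSplitSketch(int n) { super(n); }

    @Override void policy(T element) {
        pushD(element, next);
        next = (next + 1) % getNbrOfPipelines();
    }
}

class SplitDemo {
    public static void main(String[] args) {
        SplitSketch<Integer> split = new CloneSplitSketch<>(2);
        split.policy(7);
        split.policy(8);
        System.out.println(split.output(0) + " " + split.output(1));
    }
}
```

Both policies reuse the same template, which is the point of keeping the policy abstract: a new distribution scheme needs only a new policy method.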


4.2 Improvement of framework

The existing framework was not fully developed with all the functionality of the language, which made its usage very limited. The modifications made to the existing components are presented here. The improvements target the data-type structure of the framework as well as existing stream constructs that had to change because of the new expansion.

4.2.1 Stream constructs

The framework is expanded with stream constructs to enable parallel streams. This added functionality needs to be handled by the other stream constructs, so the stream constructs are improved to enable interaction between components in a stream program. Parallel streams can be added to switches in a pipeline. A new notation, parAdd, is implemented in Pipeline to distinguish the new feature of adding parallel components from the pre-existing feature of adding single components. Components between switches are added using this new notation, see Listing 4.1.

Listing 4.1: Example of adding new components to a pipeline.

    pipeline StreamProgram {
        add(new Generator<voidST,intST,voidST,bitvecST>());
        add(new CloneSplit<intST,intST,bitvecST,bitvecST>(2));
        // add new adder to pipeline 0
        parAdd(new IntSTAdder<intST,intST,bitvecST,bitvecST>(), 0);
        // add new adder to pipeline 1
        parAdd(new IntSTAdder<intST,intST,bitvecST,bitvecST>(), 1);
        add(new SerialMerge<intST,intST,bitvecST,bitvecST>(2));
        add(new IntSTPrinter<intST,voidST,bitvecST,voidST>());
    }

4.2.2 parAdd

Listing 4.1 shows an example of how a programmer could build a parallel stream program. The stream program splits a stream of integers, adds each integer to itself, merges the results into one stream and prints them. The parAdd(StreamComponent s, int n) operator is used when the programmer wants to build stream graphs with parallel components. The stream component is added to the n:th output stream of the splitter. The parAdd operator can only be used between a Split and a Merge.
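The rule that parAdd is only valid between a Split and a Merge can be illustrated with a small Java sketch. PipelineSketch and ComponentSketch are hypothetical; how the real Pipeline class tracks the open parallel section is not described in the text and is assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component descriptor used only for this illustration.
class ComponentSketch {
    enum Kind { FILTER, SPLIT, MERGE }

    final String name;
    final Kind kind;
    final int branches;  // number of parallel streams; only meaningful for switches

    ComponentSketch(String name, Kind kind, int branches) {
        this.name = name;
        this.kind = kind;
        this.branches = branches;
    }
}

// Hypothetical pipeline that records the graph and validates parAdd calls.
class PipelineSketch {
    private final List<String> graph = new ArrayList<>();
    private int openBranches = 0;  // greater than 0 only between a split and a merge

    void add(ComponentSketch c) {
        if (c.kind == ComponentSketch.Kind.SPLIT) openBranches = c.branches;
        if (c.kind == ComponentSketch.Kind.MERGE) openBranches = 0;
        graph.add(c.name);
    }

    // Adds a component to the n:th output stream of the preceding split.
    void parAdd(ComponentSketch c, int n) {
        if (openBranches == 0) {
            throw new IllegalStateException("parAdd is only valid between a split and a merge");
        }
        if (n < 0 || n >= openBranches) {
            throw new IndexOutOfBoundsException("no output stream " + n);
        }
        graph.add(c.name + "[" + n + "]");
    }

    List<String> graph() { return graph; }

    public static void main(String[] args) {
        PipelineSketch p = new PipelineSketch();
        p.add(new ComponentSketch("CloneSplit", ComponentSketch.Kind.SPLIT, 2));
        p.parAdd(new ComponentSketch("Adder", ComponentSketch.Kind.FILTER, 1), 0);
        p.parAdd(new ComponentSketch("Adder", ComponentSketch.Kind.FILTER, 1), 1);
        p.add(new ComponentSketch("SerialMerge", ComponentSketch.Kind.MERGE, 2));
        System.out.println(p.graph());
    }
}
```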

4.2.3 Parallel simulation

The initial framework simulated programs sequentially, executing one task at a time for a number of iterations. When a stream graph was executed, the first filter finished its execution before the next filter could start. The first filter filled its output stream with elements, so when the next filter started executing, all elements were available on its input stream. This has now been changed. The whole stream program is now executed thread-parallel, by starting a new thread for every stream component that performs some kind of work. Working components are filters and switches. The motivation for making the framework threaded is to mimic a compiled and executed program as closely as possible. When a working component does not have any elements available on its input stream it halts, just as a compiled implementation of the language would. The framework might not be running on a parallel system, but individually running components are more representative of how a compiler-based implementation would behave.
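A minimal sketch of this execution model, assuming that tapes are modelled as blocking queues and that the end of the stream is signalled by a marker value, is shown below. Neither assumption is taken from the framework; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one thread per working component, tapes as blocking queues.
class ThreadedStreamSketch {
    // Assumed end-of-stream marker; the framework's termination scheme is not described.
    private static final int END = Integer.MIN_VALUE;

    // Blocks until an element is available, like a component halting on an empty tape.
    private static int pop(BlockingQueue<Integer> tape) {
        try {
            return tape.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return END;
        }
    }

    static List<Integer> run(int count) {
        BlockingQueue<Integer> generatorToAdder = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> adderToPrinter = new LinkedBlockingQueue<>();
        List<Integer> printed = new ArrayList<>();

        Thread generator = new Thread(() -> {
            for (int i = 1; i <= count; i++) generatorToAdder.add(i);
            generatorToAdder.add(END);
        });
        Thread adder = new Thread(() -> {
            for (int v = pop(generatorToAdder); v != END; v = pop(generatorToAdder)) {
                adderToPrinter.add(v + v);  // add each element to itself
            }
            adderToPrinter.add(END);
        });
        Thread printer = new Thread(() -> {
            for (int v = pop(adderToPrinter); v != END; v = pop(adderToPrinter)) {
                printed.add(v);
            }
        });

        // Start consumers first to show that they simply wait for input.
        Thread[] workers = {printer, adder, generator};
        for (Thread t : workers) t.start();
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(run(3));
    }
}
```

Each component runs as soon as data is available on its input, regardless of the order in which the threads are started.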

4.2.4 Data-types

Several changes were made to the existing data-types. All data-types are now grouped more thoroughly, specifying which data-types can perform which kinds of operations and which types can be vectorized into a vecST.

The changes to the vecST type were made to enable full vector support for the data-type. The data-type can now handle a dynamic range of vector sizes, as well as a number of different data-types. All methods have been converted to handle this. A vecST performs the same operation on all of its elements using a single operation.

vecST received a more thorough definition of which data-types can be vectorized. The examination showed that no distinction was made between the types that could be added to a vecST. The data-types that should be able to be added to a vector were grouped, specifying which operations they should be able to perform. Types that are added to vectors should be able to handle arithmetic, logical and shift operations, since those are the operations a vecST can perform.

4.3 Evaluation

The evaluation of the framework is performed to see how well the structures implemented in StreamBits compare to how the same things are expressed in other languages currently used in the domain. The framework is evaluated using specific functions that are common in the domain. The functions are selected to show how parallelism and bit-level operations are expressed in different languages. The main intention of the evaluation is to demonstrate how machine-specific parameters can be abstracted when using StreamBits. The functions have been implemented using the StreamBits model and another programming language for programming stream applications. The functions tested here are matrix multiplication and forward error correction encoding. These functions were presented on page 11, in Section 2.

4.3.1 Matrix multiplication

As stated earlier, a matrix multiplication is highly parallelizable. Matrix multiplication is a calculation commonly used in signal processing. The matrix multiplication is compared using StreamBits and C, a language that is often used in industry for the kind of applications that StreamBits is developed for.

The comparison between the languages focuses on the number of lines of code needed to implement these functions, and on the abstraction level. The goal of the experiment is to evaluate how well StreamBits can be used to express parallelism at different levels, compared to how it can be done in C.

A matrix multiplication consists of a number of row-to-column multiplications, each followed by an addition of the resulting elements. Doing these operations in a purely sequential manner requires

$2NMP$

additions and multiplications in total, where N, M and P are determined by the sizes of the matrices (an N x M matrix multiplied by an M x P matrix). The number of instructions can be reduced when these operations are expressed as SIMD-parallel operations.
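The operation count can be checked with a small Java sketch that counts the scalar operations of the straightforward triple loop. The class name and counters are hypothetical; the count assumes that each result element is accumulated starting from zero, which gives NMP multiplications and NMP additions.

```java
// Hypothetical helper that counts scalar operations in a sequential matrix product.
class MatMulOpCountSketch {
    long multiplications;
    long additions;

    // Multiplies an N x M matrix a by an M x P matrix b.
    int[][] multiply(int[][] a, int[][] b) {
        int n = a.length;
        int m = b.length;
        int p = b[0].length;
        int[][] c = new int[n][p];  // elements start at zero
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < m; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                    multiplications++;
                    additions++;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2, 3}, {4, 5, 6}};                       // N = 2, M = 3
        int[][] b = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}};   // M = 3, P = 4
        MatMulOpCountSketch counter = new MatMulOpCountSketch();
        counter.multiply(a, b);
        System.out.println(counter.multiplications + counter.additions);
    }
}
```

For N = 2, M = 3 and P = 4 the total is 2 * 2 * 3 * 4 = 48 scalar operations.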

Figure 4.2 shows how a matrix multiplication could be implemented in StreamBits.

[Stream graph of the MatMul pipeline: MatrixSplit, two parallel branches of VecVecMul followed by VecAdd, SerialMerge.]

Figure 4.2: Stream graph of matrix multiplication.

Listing 4.2: Pipeline declaration of matrix multiplication in StreamBits.

    pipeline MatMul(intST n) {
        add(new MatrixSplit(n));
        for (intST i=0; i<n; i++) {
            parAdd(new VecMul(), i);
            parAdd(new VecAdd(), i);
        }
        add(new SerialMerge<intST,intST,bitvecST,bitvecST>(n));
    }

The first step when constructing a stream program is to partition the computations and organize them into a stream graph. Conventional languages, like C, do not have any notation for expressing this kind of coarse-grained parallelism. The notation for StreamBits, shown in Figure 4.2, specifies parts of a stream program that can be executed in parallel. A compiler does not have to analyze the program to find this parallelism, as a compiler that compiles C to a parallel architecture has to. Figure 4.2 shows a stream graph example of a matrix multiplication, how it is constructed and the data flow in the program. The first stream component that is added to the pipeline is a split.

The split's task is to distribute the incoming elements of the streams according to its policy. The fundamental idea of the split is to give the compiler the information that the section can be executed in parallel. Given that information, a compiler can divide the work load between multiple processing elements, if such exist. In Listing 4.2, n instances of the vector-to-vector multiplication filter are added to separate output streams of the switch, each followed by a vector summation filter. The parallel streams are serialized at the end by the Merge. Figure 4.2 illustrates how an A (2xM) times B (MxP) matrix multiplication can be implemented in StreamBits. When A matrices with more than two rows are to be multiplied, additional parallel streams must be added to the stream program; this is achieved by changing the n parameter.

Listing 4.2 shows that a stream program can be constructed in StreamBits at an abstract level, without concern for communication and synchronization between functions. Normally the programmer has to take care of this, using specific message or communication functions. In StreamBits, this is left to the compiler.

The MatrixSplit has two modes declared in its policy, one mode for distribution of rows and one for cloning of columns. The mode in which the split operates is determined by the information gathered from the configuration stream, see Listing 4.3.

Listing 4.3: MatrixSplit declaration.

    split MatrixSplit<vecST<intST>,vecST<intST>,bitvecST,bitvecST> {
        bitvecST conf_mode, in_conf;
        vecST<intST> in_data;
        intST conf_distributor;

        init {
            conf_mode = new bitvecST("3:0", "0010");
            conf_distributor = 0;
        }

        policy {
            in_conf = popC();
            in_data = popD();
            bitvecST current_mode = in_conf & conf_mode;

            if (conf_mode == current_mode) {
                // conf mode, distribute matrix to filters, one row per filter.
                pushC(in_conf, conf_distributor);
                pushD(in_data, conf_distributor++);
                if (conf_distributor == getNbrOfPipelines())
                    conf_distributor = 0;
            } else {
                // work mode, clone incoming data to all sub-pipelines.
                for (intST i=0; i<getNbrOfPipelines(); i++) {
                    pushC(in_conf, i);
                    pushD(in_data, i);
                }
            }
        }
    }

Listing 4.3 shows how a user-specified policy can be declared. The policy is in this case not generic, but performs different tasks specific to the matrix multiplication. The ability to define specific policies is one advantage of the StreamBits model compared to other languages, for example StreamIt.
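The two-mode policy can be mirrored in plain Java to make the control flow explicit. This is a simplified sketch: configuration elements are modelled as ints, only the data tape is forwarded, and the class name and the output(i) accessor are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java mirror of the two-mode MatrixSplit policy.
class MatrixSplitSketch {
    static final int CONF_MODE = 0b0010;  // same bit pattern as "0010" in Listing 4.3

    private final List<List<int[]>> outputs = new ArrayList<>();
    private int distributor = 0;

    MatrixSplitSketch(int n) {
        for (int i = 0; i < n; i++) outputs.add(new ArrayList<>());
    }

    List<int[]> output(int i) { return outputs.get(i); }

    // conf is the configuration element, data the vector that arrived with it.
    void policy(int conf, int[] data) {
        if ((conf & CONF_MODE) == CONF_MODE) {
            // conf mode: distribute rows, one row per sub-pipeline, in turn
            outputs.get(distributor).add(data);
            distributor = (distributor + 1) % outputs.size();
        } else {
            // work mode: clone the incoming column to all sub-pipelines
            for (List<int[]> out : outputs) out.add(data);
        }
    }

    public static void main(String[] args) {
        MatrixSplitSketch split = new MatrixSplitSketch(2);
        split.policy(CONF_MODE, new int[] {1, 2});  // row 0 goes to output 0
        split.policy(CONF_MODE, new int[] {3, 4});  // row 1 goes to output 1
        split.policy(0, new int[] {5, 6});          // column is cloned to both
        System.out.println(split.output(0).size() + " " + split.output(1).size());
    }
}
```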

The next stream components after the matrix splitter are the vector-to-vector multipliers. The filter declaration for the vector-to-vector multiplication is shown in Listing 4.4. The filters are identical and have two modes, just like the splitter. One mode is executed when the first matrix is distributed. In this mode the configuration element holds information that the next incoming data element is a row of the first matrix, and the filter should store the row for later use. The second mode takes the incoming data element and multiplies it with the previously stored row from the first matrix. The difference between the two modes is that in the second, the configuration element does not fulfill the configure criteria. The result of the vector-to-vector multiplication is pushed onto the filter's output data stream.

Listing 4.4: Vector to vector multiplication filter declaration.

    filter VecVecMul<vecST<intST>,vecST<intST>,bitvecST,bitvecST> {
        vecST<intST> static_row;
        boolean conf_mode;
        bitvecST conf_val;

        init {
            conf_mode = false;
            conf_val = new bitvecST("3:0", "0010");
        }

        work {
            if (conf_mode) {
                // set static row of filter.
                static_row = popD();
                conf_mode = false;
                // push dummy data to keep stream rate
                pushD(new vecST());
            } else {
                // vector to vector multiplication
                pushD(static_row * popD());
            }
        }

        configure {
            bitvecST b = popC();
            pushC(b);
            bitvecST and = b & conf_val;
            if (and == conf_val) {
                conf_mode = true;
            }
        }
    }

The vector-to-vector multiplication filter makes use of the vector type for expressing SIMD operations. The multiplication is expressed as a single operation instead of a loop through the vectors that multiplies element by element, as would be done with conventional data-types.

The task of the VecAdd filter is to loop through the incoming vector, which is the result of the vector-to-vector multiplication, and sum its elements. The output of this filter is an element of the product matrix. See Listing 4.5 for a description of the filter execution.

Listing 4.5: Summarization of vector filter declaration.

    filter VecAdd<vecST<intST>,intST,bitvecST,bitvecST> {

        work {
            vecST<intST> vec_mul_res = popD();
            intST result = 0;
            for (intST i=0; i<vec_mul_res.size(); i++) {
                result = result + vec_mul_res.getElement(i);
            }
            // push single element of product matrix.
            pushD(result);
        }

        configure {
            // no reconfiguration needed
            pushC(popC());
        }
    }

Vector sums are expressed as they would be in conventional languages such as C, by looping through the vector and adding the elements to each other. This could be improved. Instead of looping through the vector, the vector summation could be expressed with a single operator. An extension of vecST's operators is therefore proposed. This functionality would improve the readability of the code and the expressibility of this kind of fine-grained parallelism in StreamBits.
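What such a reduction operator would buy can be sketched in Java on plain int arrays. The method name sum is hypothetical; the thesis proposes the operator without naming it.

```java
// Hypothetical sketch of a reduction operator on plain int arrays.
class VecSumSketch {
    // Element-wise product, the role played by the vecST multiplication.
    static int[] mul(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same number of elements");
        }
        int[] result = new int[a.length];
        for (int i = 0; i < a.length; i++) result[i] = a[i] * b[i];
        return result;
    }

    // Proposed-style reduction: the loop lives inside the operator, not in the filter.
    static int sum(int[] v) {
        int total = 0;
        for (int x : v) total += x;
        return total;
    }

    // One element of the product matrix, written without an explicit loop.
    static int dot(int[] row, int[] column) {
        return sum(mul(row, column));
    }

    public static void main(String[] args) {
        System.out.println(dot(new int[] {1, 2, 3}, new int[] {4, 5, 6}));
    }
}
```

With both operations available, the work of the VecVecMul and VecAdd filters reduces to two operator applications, and no loop remains in the filter code for a compiler to analyze.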

Figure 4.3 compares the operations needed to express a matrix multiplication in C and in StreamBits. The comparison does not contain the full functional programs; only the core of the matrix multiplication is shown.

    for (i=0; i<(N/4); i++) {
        for (j=0; j<(N/4); j++) {
            c[i][j] = 0;
            for (k=0; k<N; k++) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }

(a) Matrix multiplication in C.

    vMulRes = vRow * popD();
    for (i=0; i<vMulRes.size(); i++) {
        iRes = iRes + vMulRes.e(i);
    }
    c[N][P] = iRes;
    .
    .
    .

(b) Matrix multiplication in StreamBits.

Figure 4.3: Comparison of matrix multiplication.

Figure 4.3 shows that the task can be reduced to one loop when using StreamBits, compared to the nested loops needed in C. The use of vecST operations removes one loop from the StreamBits version of the task. If the proposed operator for vector summation were implemented, even fewer lines would be needed in the StreamBits version, since the last loop could be removed and replaced with the operator. The use of these operators, instead of loops, provides a more informative description to the compiler, and no complex analysis of loops to find parallelism is necessary. The operators are interpreted as parallel operations by the compiler.

Summary

The matrix multiplication example has focused on how parallelism can be expressed using StreamBits. The example demonstrates how parallelism can be expressed at a coarse-grained level, using switches, and at a fine-grained level, using vecSTs. This can be done without concern for synchronization of processing elements, which normally must be carefully managed by the programmer. In StreamBits, synchronization is managed via the tapes, which supply the stream components with data or halt them if no data is available. The new structures express, at a high level and without machine-specific parameters, that they should be interpreted as parallel structures, which makes them portable. The compiler does not have to perform complex analysis of loop statements to find parallelism; it is already expressed by the programmer using the different structures.

4.3.2 Convolution encoding

Convolution encoding is a baseband processing function that is used to add bit redundancy. This is done to improve the bit detection capability of the receiver. The convolution encoding function was implemented to evaluate the new functionality for bit-level operations in StreamBits. The StreamBits program was compared to an existing implementation of convolution encoding written in Java.

The comparison focuses on the number of lines of code in the two implementations, and on the abstraction level at which bit-level operations can be expressed in the two programming languages. In addition, the number of operations needed to perform the same task is compared between the languages. A test implementation of the convolution encoder was available in Java, and it was used for the comparison even though Java is not used by industry for this kind of application. This is still a realistic comparison, since an implementation in C, which is typically used by industry, would not be much different because of the similarities between Java and C in bit-level operations. The Java implementation could be optimized, but the goal of this evaluation was not to compare performance. The StreamBits encoder function was implemented to be computationally identical to the Java implementation. Only the bit-level operations have been expressed differently, by using StreamBits-specific data-types and operations.

Implementation

The convolution filter is implemented by first retrieving seven configuration parameters. These elements describe how the next data element should be processed. Not all elements are used to reconfigure the convolution filter; some are just forwarded to subsequent filters further down the stream. The important configuration parameters are which type of forward error correction should be used, convolution or turbo encoding, and at which coding rate the bit sequence should be encoded. The only configuration mode that has been implemented for this test is convolution encoding using a 1/2 code rate. Other configurations would typically express the bit-level operations in similar ways, which makes one implemented configuration state sufficient to compare and evaluate the bit-level operations of StreamBits. For a full review of the convolution encoder implementations, see Appendix A.

Figure 4.4 illustrates how convolution encoding for an AMR stream could be expressed. The AMR stream is mapped onto three transport channels, which should be encoded using different encoding modes depending on bit importance. The split distributes the different packets based on their classification, and the convolution filters are configured accordingly. The filters process the packets according to the classification set. At the end of the section, the parallel streams are merged and sent down the stream program for further processing. The code rate that has been implemented is the 1/2 rate. This rate is used on two of the three parallel streams shown in Figure 4.4. The 1/3 code rate, used in the third parallel stream, would be implemented in a similar way to the 1/2 rate.

[Stream graph: a Split feeding three parallel Convolution filters, followed by a Merge.]

Figure 4.4: Convolution encoding applied on an AMR stream with three classes.

The convolution encoding shift register is implemented using a look-up table that is used to get the output bit sequence, instead of processing the bits on the fly. The look-up table is pre-calculated and contains all possible output combinations, indexed by a specific sequence of input bits and the current state of the shift register.

Listing 4.6 shows how the convolution encoding function is implemented using merge and bit-slice operations in StreamBits.

Listing 4.6: Core of StreamBits convolution encoding implementation.

    /* Convolute 16 input bits to 32 output bits per iteration */
    for (intST i = 0; i < (n_words + 1); i++) {
        /** r_tmp holds first input */
        r_tmp = popD();

        /** First nibble (4 bits) */
        result = lut_ptr[r_tmp.bitslicePack("3:0")][shift_reg];

        /** Second nibble */
        shift_reg = shift_reg.lmerge("3:0", r_tmp.bitslicePack("3:0"));
        result = lut_ptr[r_tmp.bitslicePack("7:4")][shift_reg].lmerge("7:0", result);

        /** Third nibble */
        shift_reg = shift_reg.lmerge("3:0", r_tmp.bitslicePack("7:4"));
        result = lut_ptr[r_tmp.bitslicePack("11:8")][shift_reg].lmerge("7:0", result);

        /** Fourth nibble */
        shift_reg = shift_reg.lmerge("3:0", r_tmp.bitslicePack("11:8"));
        result = lut_ptr[r_tmp.bitslicePack("15:12")][shift_reg].lmerge("7:0", result);

        shift_reg = shift_reg.lmerge("3:0", r_tmp.bitslicePack("15:12"));

        pushD(result);
    }

Because StreamBits is simulated in Java, functionality that will be expressed with operators in the compilable language is expressed as Java methods in Listing 4.6, hence the long expressions. See Table 4.1 for the semantics of the bitvecST Java methods.

The work process of the convolution filter is as follows. The filter receives an element from the input data stream. The element is a bitvecST that contains 16 bits; when these are processed, a bit sequence of twice the size is produced. The look-up table is set up to process 4 bits at a time, and it returns an 8-bit sequence based on the 4-bit value and the current state of the shift register. When the first 4 bits have been processed, the shift register is updated and the next four bits are processed. The result of every lookup is merged with the other lookups, and when the whole 16-bit sequence has been processed, the 32-bit result is pushed down the stream program.

Table 4.1: StreamBits expressions compared with C.

    bitvecST operator     Corresponding C expression
    bitslice(m:n)         (t & w_{m:0})
    bitsliceL(m:n)        (t & w_{m:0}) << (w - m)
    bitsliceR(m:n)        (t & w_{m:0}) >> n
    bitslicePack(m:n)     N/A
    lmerge(k:l, m:n)      if l <= (m - n): ((t & w_{k:l}) << C1) | ((s & w_{m:n}) >> n)
                          if l >  (m - n): ((t & w_{k:l}) >> C2) | ((s & w_{m:n}) >> n)

The bitslicePack("m:n") operator has no counterpart in C. The operator returns a bitvecST with the specified bits, packed together so that the length of the bitvecST corresponds to the number of bits selected.
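The bitsliceR row of Table 4.1 and the packing behavior described above can be tried out on plain Java ints. This is only a sketch: the helper names are hypothetical, and reading bitslicePack as "the right-aligned slice plus a new length of m - n + 1" is an assumption based on the description, not on the framework's source.

```java
// Hypothetical int-based helpers following Table 4.1.
class BitSliceSketch {
    // Mask with bits hi..0 set, written w_{hi:0} in Table 4.1.
    static int maskDownToZero(int hi) {
        return hi >= Integer.SIZE - 1 ? -1 : (1 << (hi + 1)) - 1;
    }

    // bitsliceR(m:n) from Table 4.1: (t & w_{m:0}) >> n
    static int bitsliceR(int t, int m, int n) {
        return (t & maskDownToZero(m)) >>> n;
    }

    // Assumed reading of bitslicePack(m:n): the same bits as bitsliceR, returned
    // together with the length of the new bit vector, as {length, value}.
    static int[] bitslicePack(int t, int m, int n) {
        return new int[] {m - n + 1, bitsliceR(t, m, n)};
    }

    public static void main(String[] args) {
        int word = 0xABCD;
        // Select each nibble by position, as Listing 4.6 does with "3:0", "7:4", ...
        for (int n = 0; n < 16; n += 4) {
            System.out.println(Integer.toHexString(bitsliceR(word, n + 3, n)));
        }
    }
}
```

Selecting by position means the calling code contains no shift distances or masks that depend on the width of the underlying word.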

Listing 4.7 shows how the convolution encoding is implemented using Java.

Listing 4.7: Core of Java convolution encoding implementation.

    /** Convolute 16 input bits to 32 output bits per iteration */
    for (int i = 0; i < (n_words + 1); i++) {
        /** r_tmp holds first input */
        r_tmp = input.pop();

        /** First nibble (4 bits) */
        result = lut_ptr[(r_tmp & 0xF0000000) >>> 28][shift_reg];

        /** Second nibble */
        shift_reg = ((r_tmp & 0xF0000000) >>> 24) | (shift_reg >>> 4);
        r_tmp = r_tmp << 4;
        result = (result << 8) | (lut_ptr[(r_tmp & 0xF0000000) >>> 28][shift_reg]);

        /** Third nibble */
        shift_reg = ((r_tmp & 0xF0000000) >>> 24) | (shift_reg >>> 4);
        r_tmp = r_tmp << 4;
        result = (result << 16) | (lut_ptr[(r_tmp & 0xF0000000) >>> 28][shift_reg]);

        /** Fourth nibble */
        shift_reg = ((r_tmp & 0xF0000000) >>> 24) | (shift_reg >>> 4);
        r_tmp = r_tmp << 4;
        result = (result << 24) | (lut_ptr[(r_tmp & 0xF0000000) >>> 28][shift_reg]);

        shift_reg = ((r_tmp & 0xF0000000) >>> 24) | (shift_reg >>> 4);
        r_tmp = r_tmp << 4;

        output.push(result);
    }

The Listings 4.6 and 4.7 show how convolution encoding can be done with StreamBits and Java. The listings show that the same operation can be expressed with less rows using StreamBits, than when using C, see Table 4.2. The reason for this is the use of bitvecST operators in the StreamBits implementation. The two operators used in the StreamBits implementation are lmerge and bitslicePack which are described in Table 4.1.

The computation of the bit stream can be partitioned into sections of small bit fields, in this case a four bit nibble. See Listings 4.6 and 4.7. A section’s operations can be divided into two parts. The first part is to update the shift register to its current state. The second part is to receive the next four bits that should be processed and look up the corresponding result based on the four bits and the current state of the shift register. When comparing the implementations with each other, one can see that a nibble in the StreamBits implementation is expressed with one less line than in the Java implementation, see Table 4.2. This is achieved by using bitvecST operators. The line that is omitted in the StreamBits implementation is when the input bit sequence is shifted left. This operation is performed to align the next four bits to the left in the bit sequence so that by shifting the bit sequence 28 positions to the right, the value of the next four bits can be read and be used in the look-up table. This lengthy process of receiving the next four bits, have been reduced to one operation when using the StreamBits bitvecST type. When using the bitSlicePack operator, the bits are singled out by specifying which positions they are at in the bit sequence, see Listing 4.6.

The bitvecST operators not only reduce the number of lines needed to write the convolution filter, they also reduce the number of expressions performed in one line. A nibble in the Java implementation consists of eleven expressions, whereas a nibble in the StreamBits implementation consists of six. This reduction is also due to the use of the bitvecST operators.

Table 4.2: StreamBits and C code comparison.

Language      Number of expressions in a nibble    Number of lines
StreamBits    6                                    8
C             11                                   12

The shift register, in StreamBits, is updated by merging four bits from the shift register with the previously processed bits. In the Java implementation, this is done by shifting both the input bit sequence and the shift register, and then performing a bitwise OR between the results of those two operations.
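The Java-style update can be written out as a small, self-contained sketch. It assumes an eight-bit shift register that holds the two most recent nibbles, which is what the shift lengths 24 and 4 in Listing 4.7 suggest; the class and method names are hypothetical.

```java
// Hypothetical sketch of the shift-and-OR update; assumes an 8-bit shift register.
final class ShiftRegisterUpdate {

    // input:    32-bit word whose top four bits are the nibble being processed
    // shiftReg: current 8-bit register state
    static int update(int shiftReg, int input) {
        int incoming = (input & 0xF0000000) >>> 24; // top nibble moved to bits 7..4
        int retained = shiftReg >>> 4;              // old bits 7..4 moved to bits 3..0
        return incoming | retained;                 // merge the two halves
    }

    public static void main(String[] args) {
        int state = update(0xAB, 0xC0000000);
        System.out.println(Integer.toHexString(state)); // prints "ca"
    }
}
```

The two shifts and the OR together perform what the text describes as a single merge in StreamBits.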

The result, in StreamBits, is calculated by merging the previous look-up results with the new eight-bit value gathered from the look-up table. The Java implementation has to mask out the four-bit value that should be processed, get the value from the look-up table and perform a bitwise OR with a shifted version of the previous result.

Another issue that is addressed by a type for bit-field manipulations is machine abstraction. In the Java implementation, Listing 4.7, specific shift lengths are used to retrieve a part of the bit sequence. These shift lengths are based on the bit representation of an integer, which in Java is 32 bits long. By introducing a type for bit-field manipulations, the StreamBits bitvecST, bit operations can be expressed using this type instead of being tied to a bit length that may differ between architectures.
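The dependence on the word length can be made visible with a short Java sketch. It is not how bitvecST is implemented; it only shows, under the assumption of a hypothetical helper topNibble, that the shift length 28 is really "word length minus four" and changes as soon as the word length does.

```java
// Hypothetical sketch: the shift length is derived from the word length
// instead of being hard-coded for a 32-bit integer.
final class WidthIndependentNibble {

    // Returns the four most significant bits of a word that is wordBits wide.
    // The word is passed in a long so that both 32- and 64-bit words fit.
    static int topNibble(long word, int wordBits) {
        return (int) ((word >>> (wordBits - 4)) & 0xFL);
    }

    public static void main(String[] args) {
        // 32-bit word: the shift length becomes 28, as in the Java listing.
        System.out.println(topNibble(0xA0000000L, Integer.SIZE));
        // 64-bit word: the same expression now shifts by 60.
        System.out.println(topNibble(0xA000000000000000L, Long.SIZE));
    }
}
```

With a hard-coded shift of 28, the second call would read the wrong bits; a dedicated bit-field type moves this bookkeeping out of the program text.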

Summary

The StreamBits type for bit-field manipulations, and its operators, provide an easy and efficient way to express bit-field operations. Using these operators can reduce the number of operations that have to be expressed, compared to expressing identical tasks using conventional types and operators.

4.3.3 Framework usage

The framework introduces a problem that will not be an issue once the language is implemented with a compiler. The framework is not limited to the constructs that the compiled language will offer; it also exposes parts of Java that should not be present in the final version of the language. The problem arises when Java-specific operations, data-types and statements that will not be available in the StreamBits syntax are used. The programmer should always strive to use the StreamBits-specific data-types, operations and statements when using the framework; otherwise the functionality of the stream components cannot be guaranteed in the compiler-implemented version.

The framework is developed using an object-oriented approach that allows for easy development and addition of components. Gathering similar classes into groups and separating the groups from each other is a good approach. It gives a good overview of the framework and gives programmers who are new to it a better understanding of which components are available. This structured approach also helps the programmer find other components that could perform the desired tasks.

4.3.4 Data-types

The prototype language does not have a boolean type, that is, a data-type that can hold the value true or false. When the test programs were developed, boolean flags were used in filters and switches. This was possible because the language is currently implemented as a Java framework, but it will not be possible when the language is implemented as a compiled language. A boolean type is often used to distinguish between the states of a filter. Introducing a boolean type is therefore an important addition to the language and will help programmers feel comfortable using it.

Ideas for further additions to the language are trigonometric functions and other mathematical functions such as absolute value, square root and exponentiation. These functions are common language additions, and they are helpful to the programmer when such operations are to be performed.

4.3.5 Messaging

StreamBits has separate streams for data and configuration parameters between stream components. The idea is to make programming easier, since the programmer does not have to deal with different kinds of data in the same stream. The separation is done by connecting two tapes to the input of a stream component and two to its output. To distinguish between the streams, new stream modifiers have been introduced, which are described in the background section on page 9.

Because four streams have to be handled instead of the two used in StreamIt, the separation of data and configuration introduces problems that are not present when only two streams have to be dealt with. One complication is that the stream modifiers are very similar, so a simple mistype is possible: the only difference between the two sets is the last letter of the method call. When a programmer writes the wrong stream modifier, the data and configuration parameters will be skewed and data will be processed with the wrong configuration.

StreamIt uses a two-stream approach with out-of-stream messaging through portals. This approach has its advantages; for example, messages can be sent upstream, a feature that StreamBits lacks.

On the positive side, filter configuration can be done separately from data processing. Code for reconfiguring the filter is placed in a separate function, configure, and when the configuration stream rate is modified, the only change that has to be made is in the configuration method. This makes it easier to write readable and correct code.
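The separation of configuration from data processing can be illustrated with a minimal Java sketch. It does not use the StreamBits framework classes: GainFilter, its three tapes and the method work are hypothetical names chosen for this example, and the tapes are modelled with plain java.util.ArrayDeque queues.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical filter with separate data and configuration tapes.
// This is an illustration only, not the StreamBits framework API.
final class GainFilter {
    final Deque<Integer> dataIn = new ArrayDeque<>();
    final Deque<Integer> configIn = new ArrayDeque<>();
    final Deque<Integer> dataOut = new ArrayDeque<>();

    private int gain = 1;

    // Reconfiguration only: consumes one parameter from the configuration tape, if any.
    void configure() {
        if (!configIn.isEmpty()) {
            gain = configIn.removeFirst();
        }
    }

    // Data processing only: consumes one item and produces one item.
    void work() {
        dataOut.addLast(gain * dataIn.removeFirst());
    }

    public static void main(String[] args) {
        GainFilter filter = new GainFilter();
        filter.dataIn.addLast(3);
        filter.work();                 // processed with the initial gain of 1
        filter.configIn.addLast(5);
        filter.configure();            // the gain becomes 5
        filter.dataIn.addLast(3);
        filter.work();                 // processed with the new gain
        System.out.println(filter.dataOut); // prints [3, 15]
    }
}
```

If the rate of the configuration stream changes, only configure has to be adapted; work never touches the configuration tape. The sketch also hints at the hazard noted above: pushing a value onto the wrong tape compiles without complaint but pairs data with the wrong configuration.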
