VHDL Implementation of a Fast Adder Tree


Chen Dacheng

LiTH-ISY-EX--05/3760--SE Linköping 2005

TEKNISKA HÖGSKOLAN

LINKÖPINGS UNIVERSITET

Department of Electrical Engineering Linköping University

S-581 83 Linköping, Sweden

Linköpings tekniska högskola Institutionen för systemteknik 581 83 Linköping


VHDL Implementation of a Fast Adder Tree

Master thesis in Electronics Systems

at Department of Electrical Engineering,

Linköping University

by

Dacheng Chen

LITH-ISY-EX--05/3760--SE

Supervisor: Henrik Ohlsson

Examiner: Lars Wanhammar

Linköping, 1 June 2005.


This thesis discusses the design and implementation of a VHDL generator for Wallace trees built from (3:2) counter modules and (2:2) counter modules, with the aim of solving the fast addition problem.

The basic research was carried out in the MATLAB programming environment, and the VHDL file is generated automatically from the results obtained in MATLAB. MODELSIM was used for compilation and simulation of the VHDL file.


I would like to thank my examiner, Professor Lars Wanhammar, for offering me the opportunity to do this thesis. I also appreciate the sincere help and advice that my supervisor, Henrik Ohlsson, gave me during the thesis work.

Thanks also to my classmate Kangmin Chen for his motivating advice on my work, and to everyone who gave me help and support.

Last but not least, thanks to my family, Gaoyang Chen and Yan Wang, for all their encouragement and support for my studies here in Sweden.


INDEX OF FIGURES

Figure 1 Ripple-carry adder

Figure 2 Carry out of Carry Lookahead Adder

Figure 3 Sum of Carry Lookahead Adder

Figure 4 Function of Carry-save adder

Figure 5 CSA used in n bit numbers

Figure 6 CSA computation

Figure 7 First step of Signed-digit Addition

Figure 8 Second step of Sign-digit Addition

Figure 9 Sign-digit Addition

Figure 10 A [p:2] adder

Figure 11 Reduction by rows

Figure 12 FA and HA as (3:2) counter and (2:2) counter

Figure 13 Example of reduction by columns

Figure 14 System requirement

Figure 15 Component figure

Figure 16 Explain of representation

Figure 17 Example of first level's structure of Wallace tree

Figure 18 Design flow of program

Figure 19 Counter block

Figure 20 FA position in adder tree

Figure 21 HA and BP position in adder tree

Figure 22 Input Block

Figure 23 FA inputs state

Figure 24 HA and BP inputs state

Figure 25 Output Block

Figure 26 FA's sum state

Figure 27 HA's sum state

Figure 28 BP's output state

Figure 29 FA's carry out state

Figure 30 HA's carry out state

Figure 31 MATLAB code to VHDL code

Figure 32 MODELSIM operation


INDEX OF TABLES

Table 1 CSA Computation

Table 2 Computation procedure


Chapter 1 Introduction

1.1 Motivation

Computation operations such as fast parallel multiplication using adder trees are present in many parts of a digital system or digital computer, especially in signal processing, high-speed circuits, graphics and scientific computation. Examples are graphics processors, digital signal processors, communication and code compression. Speeding up addition is therefore very important for computation.

There are many tree structures, such as the Wallace adder tree [1], the CSA tree and the overturned-stairs tree [2]; some other kinds of adder trees are mentioned in [3]-[7]. Here the Wallace tree is used as the tree structure because it is suitable for implementation.

1.2 Thesis Target

The programs are written in MATLAB. The first part of the program is formed by blocks, where each block contains some cell arrays. The second part of the program generates a VHDL file; all the information it needs is stored in the cell arrays. MODELSIM is then used to compile and simulate the VHDL file created by MATLAB.

1.3 Reading guide

This thesis is organized in five chapters.

Chapter 2 mainly discusses different adders, multi-operand addition and fast addition trees.

Chapter 5 gives the conclusion of this work and the future work that remains to be done.

The appendices show the MATLAB code used to generate the VHDL code.


Chapter 2 Adder Structures

2.1 Adder Structures

Adders are used in many contexts [11], [12]. It is generally recognized that most of the time required by an adder is due to carry propagation, so reducing the propagation time is the focus of today’s techniques. Different binary adder schemes have their own characteristics, such as area and energy dissipation. No single adder scheme is the best for every condition, so it is important to choose a scheme for the specific context, with its specific requirements and constraints. Because this thesis does not focus on analysing the delay of different adders, only the function of some commonly used adders is given here.

2.1.1 Two’s Complement Representation

Two’s complement representation uses the most significant bit as a sign bit, making it easy to test whether an integer is positive or negative. The range of the two’s complement representation is from $-2^{n-1}$ to $2^{n-1}-1$. Consider an $n$-bit integer $A$ in two’s complement representation. If $A$ is positive, then the sign bit $a_{n-1}$ is zero. The remaining bits represent the magnitude of the number, in the same fashion as for sign magnitude:

$$A = \sum_{i=0}^{n-2} 2^{i} a_i \quad \text{for } A \ge 0$$

The number zero is identified as positive and therefore has a 0 sign bit and a magnitude of all 0s. The range of positive integers that may be represented is therefore from 0 to $2^{n-1}-1$. Any larger number would require more bits.
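To make the range and the sign-bit test concrete, here is a minimal Python sketch (Python is used only for illustration; the thesis programs themselves are written in MATLAB). The function name `twos_complement_value` is hypothetical, and the negative weight $-2^{n-1}$ given to the sign bit is the standard two’s complement definition rather than something stated above.

```python
def twos_complement_value(bits):
    """Interpret bits (most significant bit first) as a two's complement integer."""
    n = len(bits)
    # Magnitude part: bits a_{n-2} ... a_0 with weights 2^(n-2) ... 2^0.
    magnitude = sum(b << (n - 2 - i) for i, b in enumerate(bits[1:]))
    # Sign bit a_{n-1} carries the weight -2^(n-1) (standard definition, assumed here).
    return -bits[0] * (1 << (n - 1)) + magnitude


largest = twos_complement_value([0, 1, 1, 1])    # 2^(n-1) - 1 for n = 4
smallest = twos_complement_value([1, 0, 0, 0])   # -2^(n-1) for n = 4
```

For $n = 4$ the two extreme patterns evaluate to 7 and -8, the ends of the range given above.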


2.1.3 Variable Time Type

In contrast to the fixed-time adder scheme, variable-time adders have a completion signal, so that the result of the addition can be used as soon as the completion signal is asserted.

2.1.4 Carry-Propagate Adder

Carry-propagate adders (CPA) produce the result in the conventional number system, also called the fixed-radix system. The property of a fixed-radix system is that every number has a unique representation, so that no two sequences have the same numerical value. The digit set ranges from 0 to $r-1$, where $r$ is the radix.

2.1.4.1 Ripple-Carry Adder

An $n$-bit adder used to add two $n$-bit binary numbers can be built by connecting $n$ full adders in series. Each full adder represents a bit position $i$ (from 0 to $n-1$). Each carry out from a full adder at position $i$ is connected to the carry in of the full adder at the higher position $i+1$.

The sum output of a full adder at position $i$, as shown in Figure 1, is given by:

$$S_i = X_i \oplus Y_i \oplus C_i$$

The carry output of each FA, as shown in Figure 1, is given by:

$$C_{i+1} = X_i Y_i + X_i C_i + Y_i C_i$$

Figure 1 Ripple-carry adder

In the expression of the sum, $C_i$ must be generated by the full adder at the lower position $i-1$. Let $t_c$ be the delay from the input of the full adder to the carry output and $t_s$ the delay from the input to the sum output. The worst-case delay is given by

$$T_{CRA} = (n-1)\,t_c + \max(t_c, t_s)$$

This adder is slow for large $n$. Its main advantage is the simplicity of its cells and of the connections among them.
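The two equations and the series connection can be sketched in Python as follows. This is only an illustrative model, not the thesis code; the names `full_adder` and `ripple_carry_add` are hypothetical, and bits are assumed to be given least significant first.

```python
def full_adder(x, y, c):
    """One full adder: S_i = X_i xor Y_i xor C_i and C_{i+1} = X_i Y_i + X_i C_i + Y_i C_i."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c_in=0):
    """Add two equal-length bit lists (least significant bit first)."""
    assert len(x_bits) == len(y_bits)
    carry = c_in
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        # The carry out of position i feeds the carry in of position i + 1.
        s, carry = full_adder(x, y, carry)
        sum_bits.append(s)
    return sum_bits, carry


# 5 + 6 with three-bit operands, written least significant bit first.
example_sum, example_carry = ripple_carry_add([1, 0, 1], [0, 1, 1])
```

The loop makes the sequential dependency visible: position $i$ cannot finish before position $i-1$ has produced its carry, which is where the $(n-1)\,t_c$ term of the delay comes from.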

2.1.4.2 Carry-Lookahead Adder

The basic idea of the carry-lookahead adder is to compute the carries simultaneously, i.e. in this type of adder all the carries in the same group are computed at the same time.

The carry-lookahead adder has two functions, first is to compute all the carries then


$$P_i = X_i \oplus Y_i$$

The carry bit $C_{i+1}$ generated when adding two bits $X_i$ and $Y_i$ is '1' when the function $G_i$ is '1', or when $C_i$ is '1' and the function $P_i$ is '1' simultaneously. In the first case, the carry bit is activated by the local conditions (the values of $X_i$ and $Y_i$). In the second, the carry bit is received from the less significant elementary addition and is propagated further to the more significant elementary addition depending on the function $P_i$. Therefore, the carry-out bit corresponding to a pair of bits $X_i$ and $Y_i$ is computed according to the equation:

$$C_{i+1} = G_i + P_i C_i$$

Hence, the carry signal can be computed from the carry in and the generate and propagate signals.

For example, consider a four-bit adder:

$$C_1 = G_0 + P_0 C_{in}$$

$$C_2 = G_1 + P_1 G_0 + P_1 P_0 C_{in}$$

$$C_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{in}$$

$$C_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{in}$$


Figure 2 can help us understand the carry out signal computation procedure more

clearly.

Figure 2 Carry out of Carry Lookahead Adder
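The expanded carry equations can be checked with a short Python sketch. It assumes the usual definition $G_i = X_i Y_i$ for the generate signal (its definition does not appear in the text above) together with $P_i = X_i \oplus Y_i$; the function name `lookahead_carries` is hypothetical and bits are taken least significant first.

```python
def lookahead_carries(x_bits, y_bits, c_in=0):
    """Return [C_1, ..., C_n] computed only from G_i, P_i and C_in."""
    g = [x & y for x, y in zip(x_bits, y_bits)]   # generate, assumed G_i = X_i and Y_i
    p = [x ^ y for x, y in zip(x_bits, y_bits)]   # propagate, P_i = X_i xor Y_i
    carries = []
    for i in range(len(g)):
        # C_{i+1} = G_i + P_i G_{i-1} + ... + P_i ... P_1 G_0 + P_i ... P_0 C_in
        c = 0
        prefix = 1                 # running product P_i P_{i-1} ... P_{j+1}
        for j in range(i, -1, -1):
            c |= prefix & g[j]
            prefix &= p[j]
        c |= prefix & c_in
        carries.append(c)
    return carries


# 11 + 5 with four-bit operands, least significant bit first.
example_carries = lookahead_carries([1, 1, 0, 1], [1, 0, 1, 0])
```

Each carry depends only on the inputs and on $C_{in}$, never on another carry, which is what allows the hardware to evaluate all carries of a group at the same time.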

The sum output of each column is given in Figure 3.

$$\text{sum\_out}_i = X_i \oplus Y_i \oplus \text{carry\_in}$$

Figure 3 Sum of Carry Lookahead Adder

The advantage of the carry-lookahead adder shows when the input vector of $n$ bits is divided into groups of $m$ bits and the groups are connected like a ripple-carry adder. The worst-case delay is then:

$$T_{CLA} = \frac{n}{m}\, t_{groups} + t_s$$


The characteristic of redundant adders is that no carry propagation is required; in other words, the addition time is independent of the number of bits of the adder. The operands are represented using a redundant digit set. The main purpose of a redundant adder is to reduce the addition time. However, this kind of adder has some disadvantages. The first is the increase in the number of bits needed to represent a number, which depends on the degree of redundancy. Another disadvantage is that some operations, such as magnitude comparison or sign detection, cannot be performed directly on redundant numbers.

2.1.5.1 Carry-Save Adder

A carry-save adder (CSA) has the same circuit as a full adder, as shown in Figure 4.

Figure 4 Function of Carry-save adder

The carry-in signal is considered an input of the CSA, and the carry-out signal is considered an output of the CSA. Figure 5 shows how $n$ carry-save adders are


Figure 5 CSA used in $n$-bit numbers

In Figure 5, note that all full adders are independent.

Figure 6 shows the CSA computation flow and Table 1 shows how the CSA works (based on binary numbers).


Table 1 CSA Computation

The computation can be divided into two steps: first we compute S and C using a CSA, then we use a CPA to compute the total sum. From this example, we can see that the carry signal and the sum signal can be computed independently, giving only two $n$-bit numbers. A CPA is used for the last step, and carry propagation occurs only in that step.
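The two steps can be illustrated with a small Python sketch that works on whole integers, so that each bit position behaves like one independent full adder. The function name `carry_save_add` and the operand values are hypothetical and chosen only for illustration.

```python
def carry_save_add(x, y, z):
    """Reduce three non-negative integers to a sum word S and a carry word C."""
    s = x ^ y ^ z                              # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1     # per-bit carry, moved one position up
    return s, c


s_word, c_word = carry_save_add(5, 6, 7)   # step 1: carry-save addition
total = s_word + c_word                    # step 2: one carry-propagate addition
```

Only the final `s_word + c_word` involves carry propagation; the first step is a purely bitwise operation whose delay does not depend on the word length.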

2.1.5.2 Signed-Digit Adder (SDA)

Signed-digit (SD) number representation systems have been defined for any radix $r$ with digit values ranging over the set $\{-\alpha, \ldots, -1, 0, 1, \ldots, \alpha\}$, where $\alpha$ is an arbitrary integer in the range $\frac{r-1}{2} \le \alpha \le r-1$. Such number representation systems possess sufficient redundancy to allow carry or borrow chains to be cut, and hence result in fast propagation-free addition and subtraction. The result of the addition uses the signed-digit representation, that is, a fixed-radix representation with digit values from a signed-integer set:

$$x = \sum_{i=0}^{n-1} x_i r^{i}$$

with the digit set $\{-\alpha, \ldots, -1, 0, 1, \ldots, \alpha\}$.

The addition algorithm is not described in detail here.

The objective of the SDA is to eliminate carry propagation. A signed-digit addition is performed in two steps.


Step 1: compute the sum ($w$) and the transfer ($t$); the transfer plays a role similar to that of the carries in a CPA.

$$x + y = w + t$$

At the digit level this corresponds to

$$x_i + y_i = w_i + r\,t_{i+1}$$

Figure 7 shows the addition of the first two bits of $n$-bit numbers.

Figure 7 First step of Signed-digit Addition

Step 2: compute

$$s = w + t$$

At the digit level

$$s_i = w_i + t_i$$

We can compute $s_i$ without producing a carry, as shown in Figure 8.


Figure 8 Second step of Sign-digit Addition

Finally, the complete SDA structure is shown in Figure 9.

Figure 9 Sign-digit Addition

A. Avizienis [13] proposed a redundant binary number (a radix-2 signed-digit number). With this type of number, the propagation of carries is absorbed into its redundancy, so the addition process is independent of the number of digits and can be executed in only two steps. More detail on how to compute $t_i$ and the representation of


2.1.6 Multi-operand Addition

A common structure for adding several operands is an adder tree, such as a Wallace tree, a Dadda tree or a carry-save adder tree. In this thesis, the carry-save adder tree structure and the Wallace tree are used. The primitive operation performed on the input bit array is reduction, which yields an output bit array with a smaller number of bits. Two methods are used: reduction by rows and reduction by columns. The carry-save adder tree belongs to the first method and the Wallace tree belongs to the second. Modules that reduce the rows are called adders, and modules that reduce the columns are called counters.

2.1.6.1 Carry Save Adder Tree

The carry-save adder tree can be used to add three operands in two’s complement representation, producing a result as the sum of two vectors. A 3-to-2 reduction is called a [3:2] adder, and using this tree we can build a [$p$:2] adder that reduces $p$ bit-vectors to 2 bit-vectors using CSAs.

Figure 10 A [$p$:2] adder

From Figure 10, each column has $k$ bits, and there are $p$ levels. We can use


Figure 11 Reduction by rows

From Figure 11, the number of input vectors is reduced by rows. Finally, the number of levels of the CSA tree can be estimated as

$$\text{level} \approx \frac{\log\left(\frac{k}{2}\right)}{\log\left(\frac{3}{2}\right)}$$

where $k$ is the number of input operands.
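The estimate can be compared with an exact count in a few lines of Python. The sketch assumes that, at each level, every complete group of three rows is replaced by two rows and that leftover rows pass through unchanged; the function names are hypothetical.

```python
import math


def csa_tree_levels(k):
    """Count the levels of [3:2] adders needed to reduce k operands to 2."""
    levels = 0
    while k > 2:
        k = 2 * (k // 3) + k % 3   # each group of three rows becomes two rows
        levels += 1
    return levels


def estimated_levels(k):
    """Estimate from the formula: log(k / 2) / log(3 / 2)."""
    return math.log(k / 2) / math.log(3 / 2)


exact_for_9 = csa_tree_levels(9)        # 9 -> 6 -> 4 -> 3 -> 2
estimate_for_9 = estimated_levels(9)
```

For nine operands the exact count is four levels, while the formula gives a value a little below four, so rounding the estimate up recovers the level count in this case.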

2.1.6.2 Wallace tree

Wallace tree structures are widely used for additions with several operands. Reduction by columns is similar to reduction by rows if the number of bits in each column of the array is the same. However, this is not always the case. For example, in the partial products of a multiplier, the least-significant column cannot receive bits from other columns. Therefore reduction by columns is introduced.

The basic concept is to reduce the number of bits in each column at each level. Full adders and half adders are used as (3:2) counters and (2:2) counters.


Figure 12 FA and HA as (3:2) counter and (2:2) counter

In Figure 12, the three nodes inside the pane represent the FA’s three inputs and the two nodes outside represent the FA’s carry out and sum. The half adder has two inputs, one sum and one carry out. Here is an example of the representation used in this thesis.

Example: $a = [2\ 3]$ means 2 bits with weight $2^1$ and 3 bits with weight $2^0$. We can use a Wallace tree as shown in Figure 13 to achieve fast addition. The basic modules in the Wallace tree are the (3:2) counter and the (2:2) counter.


Chapter 3 Design Flow

3.1 System specification

This program mainly uses the Wallace tree structure, with (3:2) counter modules and (2:2) counter modules in the adder tree, to solve the fast addition problem. The program runs in the MATLAB environment and generates VHDL code.

Figure 14 System requirement

As shown in Figure 14, the input of the system is an integer vector (in MATLAB an integer vector can represent a bit array) that gives the number of bits in each column, and the output of the program is VHDL code for the adder tree.


Fortran.

3.2.1 Basic MATLAB program language

Here the key syntax used in my program is introduced. First, if a is a vector, length(a) gives the vector length. Second, matrix addition and subtraction are written much as in the C language. Third, control flow such as “if ... end” and “for” loops is used: in my program the number of tree levels depends on the input vector, and control flow is needed to determine the levels and the detailed information of each level.

3.2.2 Multidimensional cell array

In the program, the inputs, outputs and component names of each level are stored. For example, as shown in Figure 15, the component name full_1_1_1 means the first full adder in column one in the first level, and in_data_1_1_1(0) means input data in column one in the first level. All these names are variable character strings, because the level, column and bit number are all variables. This information must be recorded for the next level, so an efficient method of storing it is needed. MATLAB provides a good structure for this problem: the cell array. A multidimensional cell array is powerful: it can store the variable character strings and all the information needed for each level. Two- and three-dimensional cell arrays [9] are used in my program.


Figure 15 Component figure

Figure 16 Explain of representation

Consider Figure 16(a), a full adder’s name: Full_1_1_1

First create a three-dimensional cell array named cell($n$, $m$, $p$). Then we assign the information to the cell array: ‘$p$’ means how many levels there are in the total system, ‘$m$’ means which column in the given level, and ‘$n$’ means which full adder is used in the given column.

Consider Figure 16(b), an input data name: in_data_1_1(0)

First create a three-dimensional cell array named cell($n$, $m$, $p$). Then we assign the information to the cell array: ‘$p$’ means how many levels there are in the total system, ‘$m$’ means which column in the given level, and ‘$n$’ means which bit in the given column.
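As a rough analogue of such a three-dimensional cell array, the Python sketch below stores name strings in a dictionary keyed by the index triple (n, m, p). It is only an illustration in a different language: the helper `component_name` and the variable `cell` are hypothetical, and the order of the indices inside the name is assumed from the example full_1_1_1 (adder number, column, level).

```python
def component_name(kind, n, m, p):
    """Build a name such as 'full_1_1_1' from adder number n, column m and level p."""
    return f"{kind}_{n}_{m}_{p}"


# The dictionary plays the role of the MATLAB cell array cell(n, m, p).
cell = {}
cell[(1, 1, 1)] = component_name("full", 1, 1, 1)   # first FA, column 1, level 1
cell[(2, 1, 1)] = component_name("full", 2, 1, 1)   # second FA, column 1, level 1

# Looking up by (n, m, p) returns the stored character string.
first_adder = cell[(1, 1, 1)]
```

Because the key carries the column and the level, any component can be found again later without searching, which is the property the text relies on when the next level is built.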


The design flow of the program is shown in Figure 18. The first step is to compute the total number of levels in the adder tree. The second step is to compute each level’s integer vector through the Wallace tree. The third step is to compute how many columns each level has, using the result of the second step. The fourth step is to compute the total number of counters and bypasses in each column. There are three cases: the column contains only full adders; the column contains full adders and a half adder; or the column contains full adders and a bypass. After the fourth step we know how many counters and bypasses there are in each column at each level. The fifth step uses the numbers of full adders, half adders and bypasses in each column at each level to obtain the input and output positions of the counters and bypasses. The sixth step is to store these states in cell arrays, which are used for describing the hardware connections of the Wallace tree.
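The first four steps can be sketched in Python as follows. This is not the MATLAB program of the thesis; it is a minimal model under stated assumptions: one full adder per complete group of three bits, a half adder when two bits are left over, a bypass when one bit is left over, and reduction stops once no column holds more than two bits. The function names `plan_level` and `count_levels` are hypothetical.

```python
def plan_level(cols):
    """Plan one reduction level.

    cols[j] is the number of bits in column j, most significant column first.
    Returns the FA, HA and bypass counts per column and the next level's vector.
    """
    fa = [c // 3 for c in cols]                   # one full adder per three bits
    ha = [1 if c % 3 == 2 else 0 for c in cols]   # two bits left over: half adder
    bp = [1 if c % 3 == 1 else 0 for c in cols]   # one bit left over: bypass
    n = len(cols)
    nxt = []
    for j in range(n):
        kept = fa[j] + ha[j] + bp[j]              # sums and bypassed bits stay in column j
        # Carries produced by the less significant neighbour arrive in column j.
        carried_in = fa[j + 1] + ha[j + 1] if j + 1 < n else 0
        nxt.append(kept + carried_in)
    if fa[0] + ha[0] > 0:                         # carries out of the top column open a new column
        nxt.insert(0, fa[0] + ha[0])
    return fa, ha, bp, nxt


def count_levels(cols):
    """Reduce until no column holds more than two bits (assumed stop rule)."""
    levels = 0
    while max(cols) > 2:
        cols = plan_level(cols)[3]
        levels += 1
    return levels, cols


fa1, ha1, bp1, second_level = plan_level([6, 4, 5, 6])
total_levels, final_vector = count_levels([6, 4, 5, 6])
```

For the vector [6 4 5 6] used as the example in this chapter, the sketch places two FAs in column 1, one FA and a bypass in column 2, one FA and a half adder in column 3 and two FAs in column 4, and it needs three levels in total.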

Here is an example to help us understand the program procedure.

Example: Assume the input integer vector is a = [6 4 5 6].


Figure 17 shows the first level of the tree structure. As mentioned above, we discuss this example starting from the third step.

The third step confirms that this level has 4 columns.

The fourth step computes that column 1 has two full adders, column 2 has one full adder and a bypass, column 3 has a half adder and a full adder, and column 4 has two full adders. This step therefore fixes the positions of the full adders and half adders. FA_4_1 means column 4 and level 1, and so on. FA_4_1 is then stored in the full adder position cell array, and so on.

The fifth step uses the full adder, half adder and bypass positions to obtain the input and output positions. For example, if we know FA_4_1, its inputs should be In_data_4(0), In_data_4(1) and In_data_4(2), and its outputs should be Out_data_4(0) and Out_data_3(0), and so on.

The sixth step is:

Store FA_4_1 in the full adder’s cell array; In_data_4(0), In_data_4(1) and In_data_4(2) in the full adder’s input cell array; Out_data_4(0) in the full adder’s sum output cell array; and Out_data_3(0) in the full adder’s carry out cell array.

Store HA_3 in the half adder’s cell array; In_data_3(3) and In_data_3(4) (which belong to the half adder inputs in Figure 18) in the half adder’s input cell array; Out_data_3(3) in the half adder’s sum output cell array; and Out_data_2(1) in the half adder’s carry out cell array.

Treat the bypass as a component like a full adder. Store In_data_2(3) in the bypass’s input cell array and Out_data_2(3) in the bypass’s output cell array.

So there are 4 cell arrays for the full adder, 4 cell arrays for the half adder and 3 cell arrays for the bypass, because the bypass does not have a carry out.

All the information of this level is now stored in different cell arrays. When we want to use it, we can find it in the cell arrays according to the column and level.
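The fifth step can be illustrated with a Python sketch that derives the port names of one level from the bit counts. The indexing rule is an assumption inferred from the names quoted in the example: inputs are numbered in order within a column, and the outputs of a column list the carries arriving from the less significant column first, followed by the sums and the bypassed bit. The function `level_names`, the dictionary keys and the label Out_data_0 for carries leaving column 1 are hypothetical.

```python
def level_names(cols):
    """Name the ports of every FA, HA and bypass in one level.

    cols[j - 1] is the bit count of column j (column 1 is the most significant).
    """
    n = len(cols)
    fa = [c // 3 for c in cols]
    ha = [1 if c % 3 == 2 else 0 for c in cols]
    bp = [1 if c % 3 == 1 else 0 for c in cols]
    comps = []
    for j in range(1, n + 1):
        f = fa[j - 1]
        # Outputs of column j start after the carries arriving from column j + 1.
        offset = fa[j] + ha[j] if j < n else 0
        for k in range(f):
            comps.append({
                "kind": "FA", "column": j,
                "inputs": [f"In_data_{j}({3 * k + i})" for i in range(3)],
                "sum": f"Out_data_{j}({offset + k})",
                "carry": f"Out_data_{j - 1}({k})",    # column 0 label is an assumption
            })
        if ha[j - 1]:
            comps.append({
                "kind": "HA", "column": j,
                "inputs": [f"In_data_{j}({3 * f})", f"In_data_{j}({3 * f + 1})"],
                "sum": f"Out_data_{j}({offset + f})",
                "carry": f"Out_data_{j - 1}({f})",
            })
        if bp[j - 1]:
            comps.append({
                "kind": "BP", "column": j,
                "inputs": [f"In_data_{j}({3 * f})"],
                "sum": f"Out_data_{j}({offset + f})",  # the bypassed bit, no carry
                "carry": None,
            })
    return comps


components = level_names([6, 4, 5, 6])
```

With the input [6 4 5 6], this rule yields the same names as the example above for the first full adder of column 4, the half adder of column 3 and the bypass of column 2; whether the MATLAB program uses exactly this rule for every column is not confirmed by the text.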


3.4 Description of cell arrays

In Figure 18, three blocks are defined: the counter block, the input block and the output block.

The counter block holds all the cell arrays that store the positions of the full adders, half adders and bypasses (a bypass is treated as a component). This block contains the full adder position cell array, the half adder position cell array and the bypass position cell array.

The input block holds all the cell arrays that store input information. It contains the full adder input cell array, the half adder input cell array and the bypass input cell array.

The output block holds the cell arrays that store output information. It contains the full adder sum cell array, the full adder carry out cell array, the half adder sum cell array, the half adder carry out cell array and the bypass output cell array.

3.4.1 Counter Block

The counter block is divided into three cell arrays: the (3:2) counter (FA) cell array, the (2:2) counter (HA) cell array and the bypass cell array. The FA cell array stores each FA’s position in each column of each level. The HA cell array stores the HA position in each column of each level, and the bypass cell array does the same for bypasses. Figure 19 shows the counter block.


Example: If the input bit array is a = [6 4 5 6], the FA cell array shown in Figure 20 tells us that there are three levels and that, in the first level, column one needs two FAs.

Figure 20 FA position in adder tree

Column two needs one FA, column three needs one FA, column four needs two FAs, and so on for the other two levels. When we want to use the FA information, we simply look it up in the cell array.

The next steps are the HA and bypass cell arrays.

The principle is the same as for the FA cell array; the only difference is the dimension size of the cell array. Each column may have only one HA or bypass, so the cell arrays should be HA(1, column, level) and Bypass(1, column, level). The bypass (BP) is treated as a component because this makes it easy to follow its signal flow.


Example: For the same input bit array a = [6 4 5 6], the HA and bypass cell arrays give the result shown in Figure 21.

Figure 21 HA and BP position in adder tree

From Figure 21 we know that in level one, column two has a bypass and column three has an HA. The same concept applies to the last two levels. Because two dimensions of the cell array (column, level) are the same as in the FA cell array, it is easy to get all the component position information when we want to generate the VHDL code.

3.4.2 Input block

When the positions of the counters are defined, we should add input signals to each FA, HA and BP.

First consider the inputs of the FA cell array. Create a cell array FAinput(max, column, level). Column and level are the same as mentioned above. Max is the largest number of FAs in any column multiplied by three, because each FA has three inputs. Figure 22 shows the input block.


Figure 22 Input Block

Example: For the same input bit array a = [6 4 5 6], the input state is shown in Figure 23.

Figure 23 FA inputs state

Because column one in the first level has two FAs, as mentioned above, column one should have six inputs. The same applies to the other columns and levels.

We then consider the inputs of the HA and BP. The principle is the same as for the FA’s inputs; the difference is that an HA has two inputs and a BP has only one input. So the cell arrays can be created as HAinput(2, column, level) and BPinput(1, column, level).

Example: For the same input bit array a = [6 4 5 6], the input state obtained from the HAinput and BPinput cell arrays is shown in Figure 24.

Figure 24 HA and BP inputs state

3.4.3 Output block

The program has already stored the counter and input information; the last step is to store the output state. The output block contains the FA sum cell array, the FA carry out cell array, the HA sum cell array, the HA carry out cell array and the BP output cell array.

Figure 25 Output Block


cell array, HA and BP each have only one output in each column, so one dimension of the cell array should be one.

Example: Same input bit array a =[6 4 5 6]; we can get FA’s sum, HA’s sum and

BP’s sum from Figure 26.

Figure 26 FA's sum state

The cell array named fos means full adder output sum. We already know that column one has two FAs in level one, so there are two corresponding outputs. The same applies to the other columns and levels.


Figure 27 HA's sum state

The cell array hos, as shown in Figure 27, means the HA’s output sum. From the counter block we know that in the first level there is only one HA, in column three, and this output corresponds to that HA position. The same applies to the other columns and levels.

Figure 28 BP's output state

The cell array bpos, as shown in Figure 28, means the BP’s output. From the counter block we know that in the first level there is only one BP, in column two, so the output corresponds to that BP position. The same applies to the other columns and levels.

Carry-in bits affect the positions of the next column’s outputs, so I create a cell array to store the number of carry bits passed to the next column. When storing the current sum output, we should use this cell array to get the correct position for each


Figure 29 FA's carry out state

From Figure 29, for example, in the first level the carry out of column two is ‘out_data_1_1(0)’. Although this bit belongs in column one because it is a carry out, it is produced by column two’s FA, so it is stored under column two. This makes it easy to get all the information of a counter by column and level. The same applies to the HA’s carry out cell array, as shown in Figure 30.


With all these cell arrays, the program describes all the information about the hardware connections of the adder tree, and the fast addition target is met by the Wallace tree.


4.1 MATLAB program to generate VHDL code

An RTL description of the adder tree is suitable for describing the component structure. A MATLAB program is used to generate the RTL VHDL code.

The main method for generating VHDL code is file input and output in the MATLAB language. Figure 31 explains how to create a VHDL file and the generation procedure.

Figure 31 MATLAB code to VHDL code

The first step in Figure 31 is to create a VHDL file; ‘w’ means that the file is opened for writing, and ‘fid’ identifies which file is used. Here eachlevel.vhdl is used as the filename. The next step is writing to the file. MATLAB’s syntax ‘fprintf(fid, 'format','cmd')’ writes the string using the format specified by format. Format is a C language conversion specification. Conversion specifications involve the % character and the conversion characters d, i, o, u, x, X, f, e, E, g, G, c, and s. In this thesis, ‘%s’ is used because all information is stored in the cell arrays in character string format.
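The same open-then-write pattern can be sketched in Python, whose `%s` string conversion mirrors the C-style conversion described above. This is an illustrative analogue, not the MATLAB program from Figure 31; the helper names `entity_lines` and `write_vhdl`, the entity name and the port names are all hypothetical.

```python
def entity_lines(name, inputs, outputs):
    """Return the lines of a VHDL entity declaration with std_logic ports."""
    ports = [f"    {p} : in  std_logic" for p in inputs]
    ports += [f"    {p} : out std_logic" for p in outputs]
    return (
        ["library ieee;", "use ieee.std_logic_1164.all;", "",
         f"entity {name} is", "  port ("]
        + [";\n".join(ports)]          # ports are separated by ';', none after the last
        + ["  );", f"end {name};"]
    )


def write_vhdl(path, lines):
    """Write the lines to a file, using one '%s' conversion per line."""
    with open(path, "w") as fid:       # 'w' opens the file for writing
        for line in lines:
            fid.write("%s\n" % line)


header = entity_lines("level_1", ["in_data_1", "in_data_2"], ["out_data_1"])
```

Calling `write_vhdl("eachlevel.vhdl", header)` would then produce a file that starts with the library clause and the package reference, followed by the entity declaration. Building the port list from stored strings is the reason a plain string conversion is sufficient.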


The second statement in Figure 31 declares a library, and the third statement makes the std_logic_1164 package of that library visible. The generated code must conform to VHDL grammar. The information stored in the cell arrays is used to represent the adder tree structure.

4.2 VHDL code Description

Because structural VHDL is used, the Wallace adder tree representation is divided into the individual levels and a top level. Each level records its current state, such as the number of counters and their positions, and also the input and output states of each counter. The top level integrates all these levels to obtain the final result of the fast adder tree.

4.2.1 Related MODELSIM and VHDL language

MODELSIM is a quick and handy VHDL/Verilog simulator. The VHDL code must be compiled into a VHDL library before it is simulated. The simulator itself cannot read VHDL source code. The procedure is shown in Figure 32:


4.2.3 VHDL structural RTL description

A structural description [10] of a piece of hardware is a description of what its

subcomponents are and how the subcomponents are connected to each other.

Structural description is more concrete than behavioral description; that is the

correspondence between a given portion of a structural description and a portion of

the hardware is easier to see than for a behavioral description.

4.2.3.1 Building Blocks

If we want to make the design more understandable and maintainable, a system

design should be decomposed into several blocks. These blocks are connected

together to form a complete design. Every part of a VHDL design is considered as a

block. A VHDL design may be completely described in a single block, or it may be

decomposed into several blocks. Each block in VHDL is analogous to an

off-the-shelf part and is called an entity. The entity describes the interface to that

block and a separate part associated with the entity describes how that block operates.

The interface description is like a pin description in a data book, specifying the

inputs and outputs of the block. The description of the operation of the block is like a

schematic.


4.2.3.2 Connect Block

Once we have defined the basic building blocks of our design using entities and their associated architectures, we can combine them to form the system. In my work, the top level is formed by connecting the levels. Each level is a basic block, and the connect block integrates those blocks to form the top level structure.

4.2.4 VHDL code of each level

The number of levels depends on the input bit array. Once the levels are determined, a structural VHDL description is used to describe the structure of the adder tree. The VHDL code is generated automatically by the MATLAB program. Each level of the structure is described in structural VHDL; the ports and their connections are the central matter of the structural description. Each level’s port information can be obtained from the counter, input and output cell array blocks.

Example: This is a VHDL file generated by the MATLAB code. Assume the bit array a = [2 3].


4.2.5 VHDL code for top level

All the levels are integrated using the connect block of structural VHDL. The first level’s inputs are the top level’s inputs and the last level’s outputs are the top level’s outputs; the connections between the intermediate levels are internal signals. The outputs from one level are used as the next level’s inputs.


Figure 34 VHDL code description for top level

4.3 Simulation result

We now have four types of VHDL files: the structural VHDL of each level, the structural VHDL of the top level, and the dataflow VHDL files of the (3:2) counter and the (2:2) counter. MODELSIM is used to compile and simulate the files, which gives the final result of the fast adder tree.

From MODELSIM, we can see the adder tree’s hardware connection more clearly, as

shown in Figure 35.

Figure 35 Adder tree structure described by MODELSIM


Figure 36 Computation result

From Figure 35, this Wallace adder tree has five levels. Let us consider the

simulation result. The sum value at the output must be equal to the sum value at the

input.

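Table 2 Computation procedure

Input bit array:

Weight   Bits              Number of bits   Ones
2^4      111111110011      12               10
2^3      11110011111       11               9
2^2      11010110101111    14               10
2^1      111110000         9                5
2^0      11010             5                3

Sum = $2^4 \cdot 10 + 2^3 \cdot 9 + 2^2 \cdot 10 + 2^1 \cdot 5 + 2^0 \cdot 3 = 285$

Output bit array:

Weight   Bits   Number of bits   Ones
2^7      11     2                2
2^6      00     2                0
2^5      00     2                0
2^4      10     2                1
2^3      1      1                1
2^2      1      1                1
2^1      0      1                0
2^0      1      1                1

Sum = $2^7 \cdot 2 + 2^6 \cdot 0 + 2^5 \cdot 0 + 2^4 \cdot 1 + 2^3 \cdot 1 + 2^2 \cdot 1 + 2^1 \cdot 0 + 2^0 \cdot 1 = 285$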


Table 2 shows that the sums at the input and at the output are the same, and
that no bit vector in the output bit array is wider than two bits, so carry
propagation is required only at the output.
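The equality of the two sums can be checked with a small Python sketch. It is an illustration only, using the bit vectors of the simulation example, where each one in a column contributes the weight of that column; the function name weighted_sum is hypothetical.

```python
def weighted_sum(bit_vectors):
    """bit_vectors[0] is the most significant column; each entry is a
    string holding the bits of that column."""
    n = len(bit_vectors)
    # Every '1' in column i contributes 2 ** (n - 1 - i).
    return sum(v.count("1") << (n - 1 - i) for i, v in enumerate(bit_vectors))


inputs = ["111111110011", "11110011111", "11010110101111", "111110000", "11010"]
outputs = ["11", "00", "00", "10", "1", "1", "0", "1"]

print(weighted_sum(inputs), weighted_sum(outputs))  # 285 285
print(max(len(v) for v in outputs))                 # 2
```

Both arrays evaluate to 285, and the widest output column holds two bits, so a single carry-propagate addition remains.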


5.1 Conclusion

In this work, the implementation of a fast adder tree in VHDL was considered. A
Wallace tree was used to add inputs of large word length quickly, and VHDL
files describing the hardware connections of the Wallace adder tree were
generated.

5.2 Future work

Three programs were written in MATLAB: one stores the current state of each
level, and the other two use that program to automatically generate the VHDL
files for each level and for the top level.

Future work is to add pipelining to each level and to take the delay of each
level into account. Furthermore, the Wallace adder tree may be replaced by
another structure because of its irregular routing and large wiring area.


References

[1] C.S. Wallace, “A suggestion for a fast multiplier,” IEEE Trans. Electron. Comput.,

pp. 14–17, Feb. 1964.

[2] Z.-J. Mou, “'Overturned-Stairs' Adder Trees and Multiplier Design,” F. IEEE

Computer Society, http://csdl.computer.org/comp/trans/tc/1992/08/t0940abs.htm

[3] A. Weinberger and J. L. Smith, “A logic for high-speed addition,” National

Bureau of Standards Circular 591, pp. 3–12, 1958.

[4] T. Lynch and E. E. Swartzlander, Jr., “The redundant cell adder,” in Proc. 10th

Symp. Comput. Arithmetic, 1991, pp. 165–170.

[5] V. Kantabutra, “Designing one-level carry-skip adders,” IEEE Trans. Comput.,

vol. 42, no. 6, pp. 759–764, June 1993.

[6] A. Weinberger and J. L. Smith, “A logic for high-speed addition,” National

Bureau of Standards Circular 591, pp. 3–12, 1958.

[7] Y. Harata et al., “A high-speed multiplier using a redundant binary adder tree,”

IEEE J. Solid-State Circuits, vol. SC-22, pp. 28–34, Feb. 1987.

[8] http://www.mathworks.com/access/helpdesk/help/techdoc/matlab_prog/

[9] E. Herniter, “Programming in MATLAB,” Northern Arizona University, 2001.

[10] R. Lipsett, “VHDL: Hardware Description and Design,” Intermetrics Inc., 1993.

[11] Peter Kornerup, Southern Danish University IEEE Computer Society

http://csdl.computer.org/comp/proceedings/asap/2002/1712/00/17120218abs.htm

[12] F. Ancona, S. Rovetta, R. Zumino , “High performance in tree-based parallel


Appendices

Appendix 1 MATLAB program for each level

clear

a = [2 3]; % input bit array: number of bits in each column (any bit array can be used)

% Full adder outputs are divided into two cell arrays: one stores the
% names of the sum outputs (fos), the other stores the names of the
% carry outputs (foc).
% Half adder outputs are divided into two cell arrays: one stores the
% names of the sum outputs (hos), the other stores the names of the
% carry outputs (hoc).
% Another cell array stores the names of the BP (bypass) outputs (bpos).
% (cncolumn) is a cell array showing the total number of carry_out values
% passed to the next column in each level.

% define total levels
final=a;
model=a;
result=a;
page=0;
while max(result)>2
    a=result;
    result = zeros(1, length(a));
    carry_out = 0;
    for k=length(a):-1:1 % k columns in vector
        b=a(k); % get the number of bits in each column
        rem(b,3); % get rem of b/3
        c=fix(b./3); % # of full adders
        d=b-3.*fix(b./3); % remainder
        carry_in=carry_out;
        if d==0
            carry_out_HA=0; sum=0;
        elseif d==1
            carry_out_HA=0; sum=1;
        else
            carry_out_HA=1; sum=1;
        end
        carry_out = carry_out_HA + c;
        if k==length(a)
            totalsum=sum+c;
        else
            totalsum=c+carry_in+sum;
        end
        result(k)=totalsum;
    end
    if carry_out~=0
        result = [ carry_out, result ];
    end
    page=page+1;
    disp(result)
    a1=result;
end
level=page;
rowcon=cell(1,level+1);
a=model;
% define a vector
result1 = zeros(page, length( result))
result = a;
page = 1;
while max(result)>2
    a=result;
    for k=length(a):-1:1 % k columns in vector
        b=a(k); % get the number of bits in each column
        rem(b,3);
        c=fix(b./3);
        d=b-3.*fix(b./3);
        carry_in=carry_out;
        if d==0
            carry_out_HA=0; sum=0;
        elseif d==1
            carry_out_HA=0; sum=1;
        else
            carry_out_HA=1; sum=1;
        end
        carry_out = carry_out_HA + c;
        if k==length(a)
            totalsum=sum+c;
        else
            totalsum=c+carry_in+sum;
        end
        result(k)=totalsum;
        % result= [result, totalsum]
    end
    if carry_out~=0
        result = [ carry_out, result ];
    end
    for intI = length(result):-1:1
        result1 ( page, intI ) = result(intI)
    end
    page = page + 1;
end
Fresult=result1;
page1=level;
page1=page1+1;
lcolumn=cell(1,page1);
for i=1:1:page1
    if i==1
        a=model;


if b==0 a(:,k)=[]; else matrix=a; end end vcolumn=length(a); end lcolumn(1,i)={vcolumn}; end page=level; page=page+1; tcolumns=0; carfull=cell(1,tcolumns, page); carhalf=cell(1,tcolumns, page); carbp =cell(1,tcolumns, page); for i=1:1:page if i==1 a=model; tcolumns=lcolumn{1,1}; % carfull=cell(1,tcolumns, 1); tnFA=0; tnHA=0; tnBP=0; for k1=tcolumns:-1:1


rem (b,3);% get rem of b/3 c=fix(b./3);% # of full adder d=b-3.*fix(b./3);% remainder if d==0 nFA=c; nHA=0; nBP=0; temptnHA=nHA; temptnFA=nFA; temptnBP=nBP; elseif d==1 nFA=c; nHA=0; nBP=1; temptnHA=nHA; temptnFA=nFA; temptnBP=nBP; else nFA=c; nHA=1; nBP=0; temptnHA=nHA; temptnFA=nFA; temptnBP=nBP; end carfull(1,k1,1)={temptnFA}; carhalf(1,k1,1)={temptnHA}; carbp(1,k1,1)={temptnBP}; end else a=Fresult(i-1,:); tcolumns=lcolumn{1,i}; tnFA=0; tnHA=0; tnBP=0; for k1=1:1:tcolumns


nBP=0; temptnHA=nHA; temptnFA=nFA; temptnBP=nBP; elseif d==1 nFA=c; nHA=0; nBP=1; temptnHA=nHA; temptnFA=nFA; temptnBP=nBP; else nFA=c; nHA=1; nBP=0; temptnHA=nHA; temptnFA=nFA; temptnBP=nBP; end carfull(1,k1,i)={temptnFA}; carhalf(1,k1,i)={temptnHA}; carbp(1,k1,i)={temptnBP}; end end end

% define maxf, used when showing the full_adder names
page=level;
maxf=cell(1,page);
for i=1:1:page
    column=lcolumn{1,i};
    for k=1:1:column
        b=carfull{1,k,i};
        comp(1,k)=b;
        maxn=max(comp);
        maxf(1,i)={maxn};
    end
end
page=level;
b=[];
maxg=cell(1,page);
for i=1:1:page
    column=lcolumn{1,i};
    for k=1:1:column
        b=carfull{1,k,i};
        comp(1,k)=b;
        maxn=max(comp);
        maxg(1,i)={3*maxn};
    end
end

% show all full adder names
page=level;
max=0;
tcolumnf=0;
c=1;
funame=cell(max,tcolumnf,page);
for i=1:1:page
    tcolumnf=lcolumn{1,i};
    max= maxf{1,i};
    for k1=1:1:tcolumnf
        fn=carfull{1,k1,i};
        c=1;
        for q=1:1:fn


end

% show all half adder names
page=level;
tcolumnh=0;
haname=cell(1,tcolumnh,page);
for i=1:1:page
    tcolumnh=lcolumn{1,i};
    for k=1:1:tcolumnh
        hn=carhalf{1,k,i};
        if hn==1
            haname(1,k,i)={ strcat( 'half_adder_', num2str(i), '_', num2str(k) ) };
        else
            haname(1,k,i)={[] };
        end
    end
end

% show the BP names; although BP is not a component, naming it makes it
% easier to look up its input and output
page=level;
tcolumnbp=0;
bpname=cell(1,tcolumnbp,page);
for i=1:1:page
    tcolumnbp=lcolumn{1,i};
    for k=1:1:tcolumnbp
        bpn=carbp{1,k,i};
        if bpn==1
            bpname(1,k,i)={ strcat( 'bp_', num2str(i), '_', num2str(k) ) };
        else
            bpname(1,k,i)={[] };
        end
    end
end

% store the full adder input names of each level
page=level;
tcolumnfd=0;
maxz=0;
c=1;
fuinput=cell(maxz, tcolumnfd, page);
for i=1:1:page
    tcolumnfd=lcolumn{1,i};
    maxz= maxg{1,i};
    for k1=1:1:tcolumnfd
        fn=carfull{1,k1,i};
        c=1;
        for q=0:1:3*fn-1
            for p=c:1:3*fn
                fuinput(p,k1,i)={ strcat( 'in_data_', num2str(i), '_', num2str(k1),'(',num2str(q), ')') };
            end
            c=c+1;
        end
    end
end

%store the half adder's input names of each level

page=level; tcolumnhd=0;


fn=carfull{1,k,i}; hn=carhalf{1,k,i}; if fn==0 if hn~=0

hainput(1,k,i)={ strcat( 'in_data_', num2str(i), '_', num2str(k),'(',num2str(0), ')') };

hainput(2,k,i)={ strcat( 'in_data_', num2str(i), '_', num2str(k),'(',num2str(1), ')') }; else hainput(1,k,i)={[]}; hainput(2,k,i)={[]}; end else if hn~=0 b=3*fn; c=3*fn+1;

hainput(1,k,i)={ strcat( 'in_data_', num2str(i), '_', num2str(k),'(',num2str(b), ')') };

hainput(2,k,i)={ strcat( 'in_data_', num2str(i), '_', num2str(k),'(',num2str(c), ')') }; else hainput(1,k,i)={[]}; hainput(2,k,i)={[]}; end


end end end

% show BP's input's names in each level although BP is not a component, we % look it as a component

page=level; tcolumnbd=0;

bpinput=cell(1, tcolumnbd, page)

for i=1:1:page tcolumnbd=lcolumn{1,i}; for k=1:1:tcolumnbd fn=carfull{1,k,i}; bpn=carbp{1,k,i}; if fn==0 if bpn~=0

bpinput(1,k,i)={ strcat( 'in_data_', num2str(i), '_', num2str(k),'(',num2str(0), ')') }; else bpinput(1,k,i)={[]}; end else if bpn~=0 b=3*fn;

bpinput(1,k,i)={ strcat( 'in_data_', num2str(i), '_', num2str(k),'(',num2str(b), ')') };

else


end page=level; max=0; foscolumn=0; hoscolumn=0; bposcolumn=0; foccolumn=0; hoccolumn=0; cncolumn=0;

fos=cell(max, foscolumn ,page ); hos=cell(1,hoscolumn,page); bpos=cell(1,bposcolumn, page); foc=cell(max,foccolumn,page); hoc=cell(1,hoccolumn,page); carrynum=cell(1,cncolumn,page); for i=1:1:page foscolumn=lcolumn{1,i}; hoscolumn=lcolumn{1,i}; bposcolumn=lcolumn{1,i}; foccolumn=lcolumn{1,i}; hoccolumn=lcolumn{1,i}; cncolumn=lcolumn(1,i); scolumn=foscolumn;


max=maxf{1,i}; for k=scolumn:-1:1 m=scolumn; if k==m fn=carfull{1,k,i}; hn=carhalf{1,k,i}; bpn=carbp{1,k,i}; if (fn==0) & (hn==0) & (bpn~=0)

bpos(1,k,i)= { strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(0), ')') };

carrynum(1,k,i)={0};

elseif (fn==0) & (hn~=0) & (bpn==0)

hos(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(0), ')') };

hoc(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k-1),'(',num2str(0), ')') };

elseif (fn~=0) & (hn==0) & (bpn~=0)

c=1; % show fos(1,k,i) and foc(1,k,i)

for q=0:1:fn-1 for p=c:1:fn

fos(p,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(q), ')') };

foc(p,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k-1),'(',num2str(q), ')') };


elseif (fn~=0) & (hn==0) & (bpn==0)

foc(p,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k-1),'(',num2str(q), ')') }; end c=c+1; end carrynum(1,k,i)={fn};

else % remaining case: (fn~=0) & (hn~=0) & (bpn==0)

end c=c+1; end


hos(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(fn), ')') };

hoc(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k-1),'(',num2str(fn), ')') }; carrynum(1,k,i)={fn+1}; end else fn=carfull{1,k,i}; hn=carhalf{1,k,i}; bpn=carbp{1,k,i}; if (fn==0) & (hn~=0)

hoc(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k-1),'(',num2str(0), ')') };

elseif (fn~=0) & (hn==0)

foc(p,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k-1),'(',num2str(q), ')') }; end c=c+1; end carrynum(1,k,i)={fn}; elseif (fn==0)& (hn==0) carrynum(1,k,i)={0};


end c=c+1; end

hoc(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k-1),'(',num2str(fn), ')') };

carrynum(1,k,i)={fn+1}; end

% second, show fos, hos and bpos
for k=scolumn-1:-1:1
    fn=carfull{1,k,i};
    hn=carhalf{1,k,i};
    bpn=carbp{1,k,i};
    cn=carrynum{1,k+1,i};
    if (fn==0) & (hn==0) & (bpn~=0)

bpos(1,k,i)= { strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(cn), ')') };


hos(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(cn), ')') };

elseif (fn~=0) & (hn==0) & (bpn~=0) c=1; for q=0:1:fn-1 for p=c:1:fn

fos(p,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(q+cn), ')') }; end c=c+1; end

bpos(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(fn+cn), ')') };

elseif (fn~=0) & (hn==0) & (bpn==0)

c=1;

fos(p,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(q+cn), ')') };


c=1;

fos(p,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(q+cn), ')') }; end c=c+1; end

hos(1,k,i)={ strcat( 'out_data_', num2str(i), '_', num2str(k),'(',num2str(fn+cn), ')') }; end end end end disp(funame) disp(haname) disp(bpname) disp(fuinput) disp(hainput) disp(bpinput) disp(fos); disp(hos);


disp(bpos); disp(foc); disp(hoc);

% In total, 11 databases (cell arrays) are needed to store the inputs,
% outputs and component information.
% full_adder names are stored in the cell array ======> funame.
% A full_adder name such as full_adder_4_3_2 means: level 4, column 3,
% second full_adder in this column.
% half_adder names are stored in the cell array ======> haname.
% A half_adder name such as half_adder_4_3 means: level 4, column 3.
% Bypass is not a component, but for indexing it is treated as one and
% stored in the cell array ======> bpname; bp_1_3 means level 1, column 3.
% full_adder input data: in_data_1_1(1) means level 1, column 1, second input ======> fuinput
% half_adder input data: in_data_1_1(1) means level 1, column 1, second input ======> hainput
% Bypass input data: in_data_1_1(1) means level 1, column 1, second input ======> bpinput
% The output data are divided into five cell arrays:
% full_adder sum information ======> fos, full_adder carry_out information ======> foc
% half_adder sum information ======> hos, half_adder carry_out information ======> hoc
% Bypass output information ======> bpos.

page=level; a=final; fid = fopen('eachlevel.vhdl', 'w'); for i=1:1:page if i==1 fprintf(fid,'library ieee; \n'); fprintf(fid,' \n');


for k=length(a):-1:1 % level 1 input

fprintf(fid, 'in_data_%d_%d : in std_logic_vector(%d downto 0);\n',i, k, a(k)-1);

end

for k =length(a):-1:1 % k columnes in vector b=a(k);% get the number of each column rem (b,3);% get rem of b/3

c=fix(b./3);% # of full adder d=b-3.*fix(b./3);% remainder carry_in=carry_out; if d==0 carry_out_HA=0; sum=0; elseif d==1 carry_out_HA=0; sum=1; else carry_out_HA=1; sum=1; end carry_out = carry_out_HA + c; if k==length(a) totalsum=sum+c; else totalsum=c+carry_in+sum;


end

result(k)=totalsum;

% result= [result, totalsum] end

if carry_out~=0

result = [ carry_out, result ]; end c=lcolumn{1,1}; d=lcolumn{1,2}; % level 1 output if d==c for m=length(result):-1:1 if m==1

fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0) \n',i, m, result(m)-1);

else

fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0);\n',i, m, result(m)-1); end end else for m=length(result)-1:-1:0 % level output if m==0

fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0) \n',i, m, result(m+1)-1);

else

fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0);\n',i, m, result(m+1)-1);

end end end


fprintf(fid,'sout,cout : out std_logic ); \n'); fprintf(fid,'end component; \n');

fprintf(fid,' \n');

fprintf(fid,'component half_adder \n'); fprintf(fid,'port( \n');

fprintf(fid,'ain, bin : in std_logic; \n'); fprintf(fid,'sout,cout: out std_logic ); \n'); fprintf(fid,'end component; \n'); fprintf(fid,' \n'); fprintf(fid,'begin \n'); % define the relation between adder and it's in out put

column=lcolumn{1,i}; % how many columns in level 1 for k=column:-1:1 fn=carfull{1,k,i}; hn=carhalf{1,k,i}; bpn=carbp{1,k,i}; if fn~=0 for m=1:1:fn

fprintf(fid,'%s : full_adder port map ( %s,%s,%s,%s,%s); \n', funame{m,k,i}, fuinput{3*m-2,k,i},fuinput{3*m-1,k,i},fuinput{3*m,k,i},fos{m,k,i},foc{m,k,i} ); end else fprintf(fid,' \n'); end


if hn~=0;

fprintf(fid,'%s : half_adder port map ( %s,%s,%s,%s); \n', haname{1,k,i}, hainput{1,k,i},hainput{2,k,i},hos{1,k,i},hoc{1,k,i} ); else fprintf(fid,' \n'); end if bpn~=0; fprintf(fid,'%s <= %s ; \n', bpos{1,k,i},bpinput{1,k,i}); else fprintf(fid,' \n'); end end fprintf(fid,'end; \n'); else fprintf(fid,'library ieee; \n'); fprintf(fid,'use ieee.std_logic_1164.all; \n'); fprintf(fid,'\n');

fprintf(fid, 'entity tree_level_%d is \n',i); fprintf(fid, 'port( \n'); a=Fresult(i-1,:); matrix=a; for k=length(matrix):-1:1 b=matrix(k); if b==0 matrix(:,k)=[]; a=matrix;


for k1=length(a):-1:1
    fprintf(fid, 'in_data_%d_%d : in std_logic_vector(%d downto 0);\n',i, k1, a(k1)-1);
end
a=Fresult(i,:);
matrix=a;
for k=length(matrix):-1:1
    b=matrix(k);
    if b==0
        matrix(:,k)=[];
        a=matrix;
    else
        a=matrix;
    end
end
e=i+1;
d=lcolumn{1,e};
c=lcolumn{1,e-1};
if d==c
    for m=length(a):-1:1 % level output
        if m==1
            fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0) \n',i, m, a(m)-1);
        else
            fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0);\n',i, m, a(m)-1);
        end
    end
else
    for m=length(a)-1:-1:0 % level output
        if m==0
            fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0) \n',i, m, a(m+1)-1);
        else
            fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0);\n',i, m, a(m+1)-1);
        end
    end
end
fprintf(fid,'); \n');

fprintf(fid,'end tree_level_%d; \n',i); fprintf(fid,'\n');

%% show architecture

fprintf(fid,'architecture tree_level_%d of tree_level_%d is \n',i,i); fprintf(fid,'\n');

fprintf(fid,'component full_adder \n'); fprintf(fid,'port( \n');

fprintf(fid,'ain, bin, cin : in std_logic; \n'); fprintf(fid,'sout, cout : out std_logic ); \n'); fprintf(fid,'end component; \n');

fprintf(fid,'\n');

fprintf(fid,'component half_adder \n'); fprintf(fid,'port( \n');


fn=carfull{1,k,i}; hn=carhalf{1,k,i}; bpn=carbp{1,k,i}; if fn~=0

for m=1:1:fn

fprintf(fid,'%s : full_adder port map ( %s,%s,%s,%s,%s); \n', funame{m,k,i}, fuinput{3*m-2,k,i},fuinput{3*m-1,k,i},fuinput{3*m,k,i},fos{m,k,i},foc{m,k,i} ); end else fprintf(fid,' \n'); end if hn~=0;

fprintf(fid,'%s : half_adder port map ( %s,%s,%s,%s); \n', haname{1,k,i}, hainput{1,k,i},hainput{2,k,i},hos{1,k,i},hoc{1,k,i} ); else fprintf(fid,' \n'); end if bpn~=0; fprintf(fid,'%s <= %s ; \n', bpos{1,k,i},bpinput{1,k,i}); else fprintf(fid,' \n'); end end fprintf(fid,'end; \n'); end end fclose(fid)


Appendix 2 MATLAB program for top level

page=level;
fid = fopen('toplevel.vhdl', 'w');
fprintf(fid,'library ieee; \n');
fprintf(fid,' \n');
fprintf(fid,'use ieee.std_logic_1164.all; \n');
fprintf(fid,' \n');
fprintf(fid, 'entity top_level is \n');
fprintf(fid, 'port( \n');
for k=length(a):-1:1 % level 1 input
    fprintf(fid, 'in_data_%d_%d : in std_logic_vector(%d downto 0);\n',1, k, a(k)-1);
end

% last level output
i=page;
a=Fresult(i,:);
matrix=a;
for k=length(matrix):-1:1
    b=matrix(k);
    if b==0
        matrix(:,k)=[];
        a=matrix;
    else
        a=matrix;
    end
end


% level output

if m==1

fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0) \n',i, m, a(m)-1);

else

fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0);\n',i, m, a(m)-1); end end else for m=length(a)-1:-1:0 % level output if m==0

fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0) \n',i, m, a(m+1)-1);

else

fprintf(fid, 'out_data_%d_%d : out std_logic_vector(%d downto 0);\n',i, m, a(m+1)-1); end end end fprintf(fid,'); \n'); fprintf(fid,'end top_level; \n'); fprintf(fid,' \n');

fprintf(fid,'architecture top_level of top_level is \n'); fprintf(fid,' \n');

fprintf(fid,'component full_adder \n'); fprintf(fid,'port( \n');


Chapter 1

Introduction

1.1 Motivation

Computation operations such as fast parallel multiplication using adder trees
are present in many parts of a digital system or digital computer, especially
in signal processing, high-speed circuits, graphics, and scientific
computation. Examples are graphics processors, digital signal processors,
communication, and code compression. Speeding up addition is therefore a very
important part of computation.

There are many tree structures, such as the Wallace adder tree [1], the CSA
tree, and the overturned-stairs tree [2]; other kinds of adder trees are
mentioned in [3]-[7]. Here the Wallace tree is used as the tree structure
because it is suitable for implementation.

1.2 Thesis Target

MATLAB is used to write the programs. The first part of the program consists of
blocks, each of which contains some cell arrays. The second part of the program
generates a VHDL file; all the information it needs is stored in the cell
arrays. MODELSIM is then used to compile and simulate the VHDL file created by
MATLAB.

1.3 Reading guide

This thesis is organized in five chapters.

Chapter 2 mainly discusses different adders, multi-operand addition, and fast
addition trees.

Chapter 5 gives the conclusion of this work and the future work that remains to
be done.

The appendices show the MATLAB code used to generate the VHDL code.

Chapter 2

Adder Structures

2.1 Adder Structures

Adders are used in many applications [11], [12]. It is generally recognized
that most of the time required by an adder is due to carry propagation, so
reducing the propagation time is the focus of current techniques. Different
binary adder schemes have their own characteristics, such as area and energy
dissipation. No single adder scheme is best in every situation, so it is
important to choose one according to the specific requirements and constraints
of the context. Because this thesis does not focus on analyzing the delay of
different adders, only the function of some commonly used adders is given here.

2.1.1 Two’s Complement Representation

Two’s complement representation uses the most significant bit as a sign bit,
making it easy to test whether an integer is positive or negative. The range of
the two’s complement representation is from

−

2

to

2

−