### Institute of Technology

### Department of Electrical Engineering

### Examensarbete

**MIMO Multiplierless FIR System**

Master thesis performed in Division of Electronic System

**MUHAMMAD IMRAN**
**KHURSHEED KHURSHEED**

Report number LiTH-ISY-EX--09/4322--SE

**2009-06-03**

### INSTITUTE OF TECHNOLOGY

**LINKÖPING UNIVERSITET**

Department of Electrical Engineering Linköping University

SE-581 83 Linköping, Sweden

Linköpings tekniska högskola Institutionen för systemteknik 581 83 Linköping

**MIMO Multiplierless FIR System**

Master thesis in Division of Electronic System
at Linköping Institute of Technology
by
**MUHAMMAD IMRAN**
**KHURSHEED KHURSHEED**
...
LiTH-ISY-EX--09/4322--SE
Supervisors
**Dr. Oscar Gustafsson**
**Examiner: Dr. Oscar Gustafsson**

**ABSTRACT**

The main objective of this thesis is to minimize the number of operations, and the energy consumption per operation, in the arithmetic part of DSP circuits such as Finite Impulse Response (FIR) filters, the Discrete Cosine Transform (DCT), and the Discrete Fourier Transform (DFT). More specifically, the focus is on eliminating the most frequent common sub-expression (CSE) in the binary, Canonic Sign Digit (CSD), Twos Complement, or Sign Digit representation of the coefficients of a non-recursive multiple input multiple output (MIMO) FIR system, which can be realized using shift-and-add operations only. The possibilities for reducing the complexity, i.e. the chip area and the energy consumption, have been investigated.

We propose an algorithm that finds the most common sub-expression in the binary/CSD/Twos Complement/Sign Digit representation of the coefficients of non-recursive MIMO multiplierless FIR systems. We have implemented the algorithm in MATLAB. We also propose different tie-breakers for selecting the most frequent common sub-expression, which affect the complexity (area and power consumption) of the overall system. One tie-breaker, used when several patterns are tied for most frequent, selects the pattern that results in the minimum number of delay elements, which reduces the area of the overall system. Another tie-breaker chooses the pattern that results in the minimum adder depth (the number of cascaded adders). Minimum adder depth results in the fewest glitches, which are the main contributor to power consumption in MIMO multiplierless FIR systems. The reason is the switching activity, defined as "the average number of transitions between low and high logic levels per clock cycle". Switching activity increases when glitches propagate to subsequent adders, which occurs when the adder depth is high. Since power consumption is proportional to the switching activity, we use the sub-expression that results in the lowest adder depth for the overall system.

**Acknowledgements**

We would like to thank our supervisor, Dr. Oscar Gustafsson, for giving us the opportunity to do our master thesis under his supervision. He always took a keen interest in discussing new ideas and guided us patiently whenever we needed it.

We thank all the faculty members, PhD students, and friends in the Division of Electronic System and the Division of Electronic Devices for their motivation and guidance. Our special thanks go to Mr. Timmy Sundström at the Division of Electronic Devices, our supervisor in the Chip Design Course, for guiding us at various stages of our master studies. It has been an enjoyable time learning new ideas, skills, and concepts at Linköping University.

Above all we thank our parents for their support and for having confidence in us during our years of study.

Finally, we would like to thank the Higher Education Commission (HEC), Pakistan, for financial support and the Swedish Institute (SI) for supporting us during our stay at Linköping University, Sweden.

**Preface**

The thesis consists of five chapters, which cover an introduction to non-recursive Multiple Input Multiple Output (MIMO) multiplierless Finite Impulse Response (FIR) filters, Multiple Constant Multiplication, previous work in this area, the concept of CSE, the concept of the Overlapping Digit Pattern (ODP), and the implementation strategy.

**Chapter 1: Gives an overview.**

**Chapter 2: Gives an introduction to previous related work in this area.**

**Chapter 3: Is about the MIMO system using MCM.**

**Chapter 4: Gives an introduction to ODP and its extension to the MCM problem.**

**Chapter 5: Presents the algorithm and examples.**

**Table of Contents**

- 1 Overview
  - 1.1 Single constant Multiplication
  - 1.2 FIR Filter
  - 1.3 Multiple Constant Multiplication
  - 1.4 References
- 2 Related Work
  - 2.1.1 Pattern Search Method
  - 2.1.2 Cost Function Based search method
  - 2.1.3 Evolutionary Method
  - 2.1.4 Direct Recoding Method
  - 2.1.5 Mixed Radix Representation Method
  - 2.2 Number Representation System
    - 2.2.1 Conventional number system
    - 2.2.2 Integer Fixed Point Arithmetic
    - 2.2.3 Fractional Fixed Point Arithmetic
  - 2.3 Redundant Number System
    - 2.3.1 Canonic Sign Digit (CSD) number representation system
    - 2.3.2 Binary Number Representation
  - 2.4 References
- 3 MIMO System Using MCM
  - 3.1 Common Sub Expression Identification
- 4 Overlapping Digit Pattern (ODP)
  - 4.1 Motivation for searching Non Standard Patterns
  - 4.2 Properties of ODP
    - 4.2.1 Three Generalized Classes of ODP for Single Constant Multiplication
  - 4.3 References
- 5 Algorithms and Examples
  - 5.1 Algorithm
  - 5.2 Detail discussion of Algorithm
    - 5.2.1 Select different options
    - 5.2.2 Searching of Multiple patterns in the coefficient set
    - 5.2.3 Detect possible collision between the patterns
    - 5.2.4 Select the option: Find the frequency of regular patterns or combined with ODP
    - 5.2.5 Select Tie breaker
    - 5.2.6 Display graph table and output
  - 5.3 Implementation Examples
  - 5.4 Results
  - 5.5 References
- 6 Conclusion and Future work
  - 6.1 Conclusion
  - 6.2 Future Work

## CHAPTER 1

### Overview

Modern portable equipment such as cellular phones and MP3 players contains DSP circuits. These circuits perform a large number of multiplications of variables by several constants, i.e. Multiple Constant Multiplication (MCM), which leads to large area, delay, and energy consumption in a hardware implementation. Multiplication by constants can be replaced with additions/subtractions and shift operations only. Since shifts can be hard-wired and therefore add no implementation cost, our focus in this thesis is to reduce the number of addition and subtraction operations. Proper optimization of the computations using common sub-expression elimination techniques leads to significant improvements in key design metrics such as throughput, area, and energy consumption. Until now, attention has been paid to other forms of the MCM problem: the one-dimensional case (MCM belonging to a single output only), Single Constant Multiplication, and matrix multiplication problems. In this thesis a different form of the MCM problem is addressed, namely MCM belonging to multiple outputs. The number of adders is reduced by searching for a common pair in one cube, for multiple inputs (variables) and multiple constants, and also in the subsequent cubes if the same pattern exists there. Each cube has several inputs (variables), several constants (shifted in time), and one output. The algorithm is general, can handle a variable number of inputs and constants, and is based on an iterative pairwise matching heuristic.

By using dedicated logic operators, one can optimize the area, speed, and power consumption of an electronic chip. Multiplication by a constant is a good example: multiplying a variable by a fixed constant can be achieved with shift and addition/subtraction operations instead of a complete multiplier. Digital signal processing, image processing, and communication are typical applications in which variables must be multiplied by several constants. In particular, Finite Impulse Response (FIR) filters, the Discrete Fourier Transform (DFT), and the Discrete Cosine Transform (DCT) are central operations that involve a large number of multiplications of values by different constants. Multiplication by constants has been studied for a long time from different perspectives, and many different solutions exist. The earliest and simplest problem, multiplying a single variable by a single constant, was solved using shift and add operations only. Next, the problem of multiplying many variables by many different constants (the multiple constant multiplication problem, MCM) was solved by recoding the constants and sharing common patterns. Several very effective solutions were developed for this problem, for systems with many inputs but a single output. The same ideas can be applied to a system with multiple inputs and multiple outputs: one can find common patterns in constants that belong to the same output as well as to different outputs. We have implemented an algorithm that converts multiple constants, which may belong to the computation of the same or different outputs, to their CSD/binary representation and finds common patterns in that representation.
Our main focus is to reduce the total number of operations, and we achieve this by selecting the most common sub-expression, i.e. the one with the highest frequency. There is some freedom in the selection of sub-expressions, and different choices have different effects on the implementation. We therefore obtain different solutions to the same problem, which trade off against each other. If sub-expressions are selected based on frequency alone, the computational complexity may be reduced, but the power consumption may increase because more operations are cascaded, which increases the number of glitches. Similar trade-offs exist for the other tie-breakers as well.

**1.1 Single constant Multiplication**


The structure of a general multiplier performing single constant multiplication is given in Fig 1-1. The input data x is multiplied by a constant coefficient c to give the result Y.

Fig 1-1 Concept of constant Multiplication

The binary representation of coefficient 85 is 1010101 and is realized as follows

Fig 1-2 Realization of 85 using Add and shifts

A left shift corresponds to multiplication by a power of two. Our focus is to reduce the number of additions and subtractions; since the two have the same complexity, we refer to both as additions from here on. The shift operation can be hard-wired, so it has no hardware cost. A different realization of the structure above requires fewer adders, as shown in Fig 1-3.

Fig 1-3 Realization of 85 using Add and shifts

Fig 1-3 shows that a different realization reduces the number of adders from three to two.
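The saving can be illustrated with a minimal Python sketch. The figures are not reproduced here, so the two-adder factorization below (85 = 5 · 17, reusing the partial result 5x that corresponds to the repeated bit pattern 101) is an assumption about one possible structure, and the function names are hypothetical.

```python
def times85_direct(x: int) -> int:
    """85 = 2^6 + 2^4 + 2^2 + 2^0: one term per nonzero bit, three adders."""
    return (x << 6) + (x << 4) + (x << 2) + x


def times85_shared(x: int) -> int:
    """Reuse the partial result 5x (pattern 101): two adders."""
    t = (x << 2) + x      # adder 1: 5x
    return (t << 4) + t   # adder 2: 5x * 16 + 5x = 85x
```

Both functions return 85·x for any integer x; only the number of additions differs, since the shifts are treated as free wiring.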

**1.2 FIR Filter**


When the impulse response becomes zero after a finite number of samples, the filter is called a Finite Impulse Response (FIR) filter. FIR filters are common in many DSP circuits. For the same specification, the order of an FIR filter is higher than that of an IIR (Infinite Impulse Response) filter, but FIR filters are stable and can have a linear phase response, i.e. all frequency components are delayed equally, which corresponds to a constant group delay [5].

The transfer function of an Nth-order FIR filter can be written as

$$H(z) = \sum_{k=0}^{N} a_k z^{-k} \tag{1.1}$$

In this thesis an efficient algorithm for solving the Multiple Constant Multiplication problem is presented. Previously, Common Subexpression Elimination (CSE) was used to tackle the MCM problem, mainly focusing on optimizing the area of FIR filters by reducing the multiplier block logic, as discussed in [1]-[3]. In [1] many other applications where MCM can be used were identified. In our work we present an MCM algorithm for different applications, with options for different tie-breakers, number representations, and overlapping digit patterns, which makes it more general than the previous algorithms.

Fig 1-4 Direct form FIR Filter

Fig 1-5 Transpose direct form of FIR filter

A filter with three taps is realized in Fig 1-4 (the order = 3 using equation 1.1, where $h_k = a_k$). The filter structure in Fig 1-4 is referred to as the direct form FIR filter, where the sum-of-products computation is marked with a dashed box. Fig 1-5 shows the transposed direct form FIR filter, obtained by transposing the signal flow graph of Fig 1-4; there the multiple constant multiplication block is marked with a dashed box. MCM techniques are also effective for the sum-of-products computation [4]. Both the multiple constant multiplication and the sum-of-products computation are referred to as the MCM block [4]. In both cases the part outside the dashed box is called the delay section, and the adders in Fig 1-5 are called structural adders.
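The relationship between the two structures can be sketched in Python. This is a behavioral illustration only, assuming integer coefficients and zero initial state; the function names are hypothetical and ordinary multiplications stand in for the MCM block.

```python
from typing import List


def fir_direct(h: List[int], x: List[int]) -> List[int]:
    """Direct form: delay line on the input, then a sum of products."""
    delay = [0] * len(h)                  # delay[k] holds x(n - k)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(hk * d for hk, d in zip(h, delay)))
    return y


def fir_transposed(h: List[int], x: List[int]) -> List[int]:
    """Transposed direct form: one input times all coefficients (MCM block),
    followed by structural adders and delays on the output side."""
    state = [0] * (len(h) - 1)            # registers between structural adders
    y = []
    for sample in x:
        products = [hk * sample for hk in h]          # MCM block
        y.append(products[0] + (state[0] if state else 0))
        # Each register takes its product plus the old value of the next register.
        state = [
            products[k + 1] + (state[k + 1] if k + 1 < len(state) else 0)
            for k in range(len(state))
        ]
    return y
```

For h = [3, 5, 7] and x = [1, 2, 3, 4], both functions produce [3, 11, 26, 41]. In the transposed form every coefficient multiplies the same input sample, which is what makes sharing partial results across coefficients possible.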

**1.3 Multiple Constant Multiplication**

We have seen in the transposed direct form FIR filter that a single data value is multiplied by all constant coefficients, as shown in Fig 1-5 (the multiplier block is inside the dashed box). If the multiplier block is implemented with shift and add operations, using structures that remove redundant partial results among the coefficients, the number of adders is greatly reduced. Shift-and-add multiplier realization without general multipliers for FIR filter implementation has received considerable attention during the past few decades.

Fig 1-6 The principle of Multiple Constant Multiplication (MCM)

The MCM technique is applied both to the transposed direct form FIR filter, where we have the MCM block, and to the direct form FIR filter, where we have the sum-of-products part.

**1.4 References**

[1] M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan, “Multiple constant multiplications: Efficient and versatile framework and algorithms for exploring common subexpression elimination,” IEEE Trans. Computer-Aided Design, vol. 15, pp. 151-165, Feb. 1996.

[2] M. Mehendale, S. D. Sherlekar, and G. Venkatesh, “Synthesis of multiplierless FIR filters with minimum number of additions,” in Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design. Los Alamitos, CA: IEEE Computer Society Press, 1995, pp. 668-671.

[3] R. I. Hartley, “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE Trans. Circuits Syst. II, vol. 43, pp. 677-688, Oct. 1996.

[4] O. Gustafsson, K. Johansson, H. Johansson, and L. Wanhammar, “Implementation of Polyphase Decomposed FIR Filters for Interpolation and Decimation Using Multiple Constant Multiplication Techniques.”

## CHAPTER 2

### Related Work

In the literature there are various methods for solving the Multiple Constant Multiplication problem. We discuss the most popular ones.

**2.1.1 Pattern Search Method**

Many algorithms are based on the pattern search method. An exhaustive search is used to find the most common pattern, and once selected the pattern is stored in a set of basic constants. Different algorithms use different matching approaches to find the most common pattern, and the way a basic constant (i.e. the value of the most frequent pattern) is shared also differs between algorithms. The MCM algorithm in [1] uses a tree-based method to select the matched parts of the Sign Digit representation of the constants. This paper explains the problem clearly and is the most cited one.

The solution presented in [2] uses a graph-based approach to show the matched patterns. Some transformations have been explored that produce a specific form of FIR filter with a minimum number of operators and delay elements.

In [3] an algebraic formulation of the most frequent pattern matches between multiple constants is given. The author does not list the constants used, so it is difficult to compare this method with other solutions.

In [4] a new approach is presented, based on sharing digits in the CSD representation of multiple constants. It searches for common patterns both vertically and horizontally. This approach gives a more efficient solution in terms of the number of operators.

**2.1.2 Cost Function Based search method**

In paper [5] a different, tree-based algorithm is proposed. It uses three kinds of operations:

$$T_{i+1} = T_i \ll k \tag{2.1}$$

$$T_{i+1} = T_i \pm x \tag{2.2}$$

$$T_{i+1} = (T_i \ll k) \pm T_i \tag{2.3}$$

A cost can be defined for each operation according to the target technology. The cost function used to guide the exploration is the sum of the costs of all operations involved. For the example p = 111463·x, this algorithm gives a 5-addition solution:

$$T_1 = (((x \ll 3) - x) \ll 2) - x \tag{2.4}$$

$$T_2 = (T_1 \ll 7) + T_1 \tag{2.5}$$

$$P = (((T_2 \ll 2) + x) \ll 3) - x \tag{2.6}$$

In paper [6] a simulated annealing based algorithm was proposed. This technique was used to produce multiplication by a small set of constants.
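The five-addition decomposition of 111463 in equations 2.4 to 2.6 can be checked numerically. The following Python sketch is only a verification aid; the function name is hypothetical. Note that in Python `<<` binds more loosely than `+`, so the shift in equation 2.5 needs explicit parentheses.

```python
def times_111463(x: int) -> int:
    """Multiply by 111463 using shifts and five additions/subtractions."""
    t1 = (((x << 3) - x) << 2) - x      # 27x    (two add/sub operations)
    t2 = (t1 << 7) + t1                 # 3483x  (one addition)
    return (((t2 << 2) + x) << 3) - x   # 111463x (two add/sub operations)
```

The intermediate values follow from 7·4 − 1 = 27, 27·129 = 3483, and (3483·4 + 1)·8 − 1 = 111463.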

**2.1.3 Evolutionary Method**

An evolutionary graph generation method was proposed in paper [7]. Evolutionary methods are based on genetic algorithms. This method was developed to generate arithmetic circuits, especially for multiple constant multiplication. The algorithm in [7] produced better results than previous evolutionary methods. However, it seems that these methods are limited to the multiple constant multiplication problem and cannot be applied to more complex circuits.


**2.1.4 Direct Recoding Method**

Recoding means changing the representation of the constants/coefficients of the FIR filter. A lot of work has been done in this area. The well-known Booth recoding is given in [8]. Booth proposed a method in which long strings of ones are replaced by strings containing more zeros, which was a major improvement at the time. The modified Booth algorithm is given in [9] and is used in variable multipliers. However, Booth recoding does not give the minimal number of nonzero digits, so it is generally not used for the multiple constant multiplication problem.

**2.1.5 Mixed Radix Representation Method**

Some work has also been done on higher-radix recoding. A radix-8 representation has been used in [10] to implement some FIR filters. The digit set {0, ±1, ±2, ±4} was used to represent the constants. This representation is called a punctured representation because some digits are missing, so some accuracy is sacrificed, although other measures can be used to recover accuracy. Sum of powers of two (SOPOT) is another method used for recoding constants. In this method each coefficient is expressed as a sum of a small number of terms that are all powers of two, such as 4, 8, and 64. This method is often used in filters for signal processing applications. Interested readers are referred to [11] for more details. Multiple-radix representations of constants have also been used in the literature; details can be found in [12]. In that paper some FIR filters used in a multirate converter were recoded, with very good results. In [13] the authors used a multiple-radix representation of constants based on the double-base number system, which uses radices 2 and 3 simultaneously. Any constant is then represented as given below.

$$x = \sum_{i,j} d_{i,j}\, 2^{i}\, 3^{j}, \quad \text{where } d_{i,j} \in \{0, 1\} \tag{2.7}$$

Such multiple-radix representations are very useful in some analog circuits, but are not very useful in our case, i.e. multiple constant multiplication.

**2.2 Number Representation System**


**2.2.1 Conventional number system**

A number system can be defined as "the set of values that a digit can assume, together with a rule for mapping a sequence of digits to its numerical value". Conventional number systems are non-redundant, weighted, and positional. In a conventional number system any number X is represented as a sequence of digits $(X_0, X_1, X_2, \ldots, X_{W_d-1})$ and has a unique representation. Mathematically, the value of the number X can be written as

$$X = \sum_{i=0}^{W_d-1} W_i X_i \tag{2.8}$$

where $W_d$ is the word length and $W_i$ is the weight associated with digit $X_i$. In a positional number system the weight $W_i$ depends only on the position of the digit $X_i$. In conventional number systems the weight $W_i$ is the $i$th power of a fixed integer $r$, i.e. $W_i = r^i$. Such number systems are called fixed-radix systems. The digits must satisfy $0 \le X_i \le r-1$. In general a number can have a fractional part as well as an integer part. The notation is simplified by letting the leftmost $k$ digits represent the integer part and the remaining $W_d - k$ digits represent the fractional part. For example, the binary representation of an arbitrary number Y with $k = 2$ can be written as $Y_0Y_1.Y_2Y_3\ldots Y_{W_d-1}$. This corresponds to the weights $W_i = r^{k-i-1}$. The weight of the least significant digit is then $r^{k-W_d}$, and the position of the radix point is only understood. Numbers represented in this way are called fixed-point numbers. In fixed-point representation the most significant bit is generally at the far left and the least significant bit at the far right. There are two further types of fixed-point arithmetic: integer and fractional.

**2.2.2 Integer Fixed Point Arithmetic**

In integer arithmetic the right-hand part of the result of an arithmetic operation is considered the most important. For example, multiplying two 4-bit binary numbers gives an 8-bit product. Hardware has a finite word length, so if the system can store only 4 bits, the result must be reduced to 4 bits. In integer arithmetic the four least significant bits are stored and the rest are discarded. This can cause a large error, which is a limitation of integer arithmetic.

**2.2.3 Fractional Fixed Point Arithmetic**

Numbers in fractional fixed-point representation lie in the range [-1, 1), where the parenthesis denotes an open end, i.e. 1 is not included in the number system. This means that the product of two such numbers stays within essentially the same range. The word length is reduced by dropping the right-hand part and keeping only the left-hand part. The result may then be inexact, but the error is small compared to its counterpart, integer fixed-point representation. A clear advantage of fractional fixed-point arithmetic is that parasitic oscillations are more easily suppressed than in floating-point arithmetic. In addition, the chip area required for floating-point arithmetic is larger than that required for fixed-point arithmetic, and fixed-point circuits are faster than floating-point circuits. Hence in most VLSI circuits designed for DSP applications, calculations are performed in fixed-point arithmetic; floating-point arithmetic is used where high dynamic range cannot be sacrificed.
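The difference between the two word-length reductions can be illustrated with a small Python sketch. For simplicity it assumes unsigned 4-bit operands, so the fractional interpretation covers [0, 1) rather than the signed range discussed in the text; the function name is hypothetical.

```python
from typing import Tuple


def truncate_product(a: int, b: int, bits: int = 4) -> Tuple[int, int, int]:
    """Multiply two unsigned `bits`-wide words and reduce the 2*bits-wide product.

    Returns (full product, right-hand part, left-hand part).
    """
    product = a * b
    low = product & ((1 << bits) - 1)   # integer arithmetic keeps the right-hand part
    high = product >> bits              # fractional arithmetic keeps the left-hand part
    return product, low, high


# 1011 x 0110 = 0100 0010 (11 x 6 = 66)
product, low, high = truncate_product(0b1011, 0b0110)
```

Here the integer view stores 0010 (2 instead of 66), a large error. In the fractional view the operands are 11/16 and 6/16, the exact product is 66/256 ≈ 0.258, and the stored left-hand part 0100 represents 4/16 = 0.25, so the error is below one least significant bit.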

The discussion above makes clear that fixed-point arithmetic is preferable in dedicated DSP applications, so we discuss it in more detail. There are different representations for fixed-point arithmetic. Some of these are:

- Signed magnitude representation
- One’s complement representation
- Two’s complement representation
- Binary offset representation

Since our implementation uses the binary, Twos Complement, Sign Digit, and Canonic Sign Digit representations, we discuss these in more detail.

**2.3 Redundant Number System**


The plain binary number system can represent positive numbers only, whereas the coefficients of digital filters are both negative and positive. Another number representation is therefore required to represent negative and positive numbers. Redundant number systems come to the rescue. With a signed-digit system such as Canonic Sign Digit (CSD) we can represent positive as well as negative numbers. Redundant number systems also make it possible to speed up certain arithmetic operations; in particular, addition and subtraction can be performed without long carry paths. However, this increases the complexity of some other arithmetic and non-arithmetic operations and results in larger registers. The Sign Digit (SD) number system is one such redundant representation. In SD the digits are allowed to take negative values, i.e. $X_i \in \{-1, 0, 1\}$. The range provided by the SD number system is $[-2^n + 1, 2^n - 1]$. One of the main advantages of this representation is that the number of addition/subtraction cycles can be reduced in some multiplication algorithms. An example of SD representation is given below.

$(7/16)_{10} = (0.0111)_{2} = (0.1\,{-1}\,1\,1)_{2} = (0.1\,0\,{-1}\,1)_{2}$

**2.3.1 Canonic Sign Digit (CSD) number representation system**

Canonic Sign Digit (CSD) representation ensures that the number of nonzero digits in the recoded value is as small as possible. In radix-2 Sign Digit representation, the digits belong to the set {-1, 0, 1}. A number is in CSD format if no two adjacent digits are both nonzero, so the CSD representation of a constant cannot contain adjacent digit pairs such as 11, -1-1, -11, or 1-1. In [10] it is stated that, with a minimal recoding such as CSD applied to an n-bit unsigned value, the number of nonzero digits is bounded by (n+1)/2 and tends asymptotically to an average of n/3 + 1/9.

CSD representation is a special case of SD representation. It is not redundant: each number has a unique representation. It has some special properties that make it useful in constant multiplication. A number X in the range $[-4/3 + Q, 4/3 - Q]$, where $Q = 2^{-W}$, with $W = W_d - 1$ for odd $W_d$ and $W = W_d - 2$ for even $W_d$, is given in CSD representation by

$$X = \sum_{i=0}^{W_d-1} X_i 2^{-i} \tag{2.9}$$

where $X_i \in \{-1, 0, 1\}$.

In CSD representation two consecutive digits cannot both be nonzero, so at least one of $X_i$ and $X_{i+1}$ must be zero. In other words,

$$X_i X_{i+1} = 0 \quad \text{for } 0 \le i \le W_d - 2$$

CSD representation also has the minimum number of nonzero digits. It can be shown that the average number of nonzero digits in a CSD number is

$$\frac{W_d}{3} + \frac{1}{9} + O\left(2^{-W_d}\right) \tag{2.10}$$
Hence for arbitrarily large $W_d$ the average number of nonzero digits is approximately $W_d/3$, which is smaller than the average of $W_d/2$ for the binary representation. Some illustrative examples of CSD representation are given below:

$(7/16)_{10} = (0.0111)_{2} = (0.1\,0\,0\,{-1})_{\mathrm{CSD}}$

$(15/16)_{10} = (0.1111)_{2} = (1.0\,0\,0\,{-1})_{\mathrm{CSD}}$

$(15/32)_{10} = (0.01111)_{2} = (0.1\,0\,0\,0\,{-1})_{\mathrm{CSD}}$
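The recoding can be sketched in Python for integer constants (the fractional examples above correspond to the same digit strings scaled by a power of two). This is a standard textbook conversion rather than the thesis's MATLAB implementation, and the function names are hypothetical.

```python
from typing import List


def to_csd(n: int) -> List[int]:
    """Return the CSD digits of n, least significant first, each in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            # Pick the digit that makes the remainder divisible by 4,
            # which forces the next digit to be zero.
            d = 2 - (n % 4)          # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            digits.append(d)
            n -= d
        n //= 2
    return digits


def from_digits(digits: List[int]) -> int:
    """Evaluate a signed-digit string given least significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))
```

For example, `to_csd(7)` gives `[-1, 0, 0, 1]`, i.e. 8 − 1, which matches the digit string 100-1 used for 7/16 above, and `to_csd(15)` gives `[-1, 0, 0, 0, 1]`, i.e. 16 − 1.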

**2.3.2 Binary Number Representation**

In general, a number can be represented as a string of bits from the set {0, 1}. The binary point is only understood and can be placed anywhere by shifting. Only non-negative numbers can be represented in the plain binary number system. A positive integer N requires $n = \lfloor \log_2 N \rfloor + 1$ digits in the binary number system. The integer N is thus represented using n digits $X_i$ with corresponding weights $2^i$. The digit sequence for a positive number N is $X_{n-1}X_{n-2}\ldots X_2X_1X_0$, where $X_i \in \{0, 1\}$, and this corresponds to the following equation:

$$N = \sum_{i=0}^{n-1} X_i 2^{i}$$

**2.4 References**


[1] M. Potkonjak, M.B. Srivastava, and A.P. Chandrakasan, “Multiple Constant Multiplications: Efficient and Versatile Framework and Algorithms for Exploring Common Subexpression Elimination,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 2, pp. 151-165, Feb. 1996.

[2] H.-J. Kang and I.-C. Park, “FIR Filter Synthesis Algorithms for Minimizing the Delay and the Number of Adders,” IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 8, pp. 770-777, Aug. 2001.

[3] M. Martínez-Peiró, E.I. Boemo, and L. Wanhammar, “Design of High-Speed Multiplierless Filters Using a Nonrecursive Signed Common Subexpression Algorithm,” IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 49, no. 3, pp. 196-203, Mar. 2002.

[4] A. Vinod, E.-K. Lai, A. Premkumar, and C. Lau, “FIR Filter Implementation by Efficient Sharing of Horizontal and Vertical Common Subexpressions,” Electronics Letters, vol. 39, no. 2, pp. 251-253, Jan. 2003.

[5] P. Briggs and T. Harvey, “Multiplication by Integer Constants,” technical report, Rice Univ., 1994.

[6] M.F. Mellal and J.-M. Delosme, “Multiplier Optimization for Small Sets Of Coefficients,” Proc. Int’l Workshop Logic and Architecture Synthesis, pp. 13-22, Dec. 1997.

[7] N. Homma, T. Aoki, and T. Higuchi, “Evolutionary Graph Generation System with Transmigration Capability and Its Application to Arithmetic Circuit Synthesis,” IEE Proc., vol. 149, no. 2, pp. 97-104, Apr. 2002.

[8] A.D. Booth, “A Signed Binary Multiplication Technique,” Quarterly J. Mechanics and Applied Math., vol. IV, no. 2, pp. 236-240, 1951.

[9] M.J. Flynn and S.F. Oberman, Advanced Computer Arithmetic Design. Wiley-Interscience, 2001.

[10] R.I. Hartley, “Subexpression Sharing in Filters Using Canonic Signed Digit Multipliers,” IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 43, no. 10, pp. 677-688, Oct. 1996.

[11] P. Boonyanant and S. Tantaratana, “FIR Filters with Punctured Radix-8 Symmetric Coefficients: Design and Multiplier-Free Realizations,” Circuits Systems Signal Processing, vol. 21, no. 4, pp. 345-367, 2002.

[12] C.K.S. Pun, S.C. Chan, K.S. Yeung, and K.L. Ho, “On the Design and Implementation of FIR and IIR Digital Filters with Variable Frequency Characteristics,” IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 49, no. 11, pp. 689-703, Nov. 2002.

[13] V.S. Dimitrov, G.A. Jullien, and W.C. Miller, “Theory and Applications of the Double-Base Number System,” IEEE Trans. Computers, vol. 48, no. 10, pp. 1098-1106, Oct. 1999.

## CHAPTER 3

### MIMO System Using MCM

**3.1 Common Sub Expression Identification**

Identifying a suitable common sub-expression and eliminating it results in significant hardware reductions. Multiplication of variables by constant coefficients can be achieved using shift and add/subtract operations only. Since a shift can be realized by wiring alone, it is free of hardware cost.

The Multiple Constant Multiplication (MCM) problem is a part of our algorithm. Anyone who wants to use our algorithm for MCM needs to understand how the information is entered into the program. In the MCM problem, multiple variables are to be multiplied by different constants at different times.

The cube shown in Fig 3-1 is the main building block used for the different types of computation in our algorithm. The cube organizes the non-zero digits according to shift in time, shift in power of 2, and the input to which a pattern belongs. One cube gives information about only one output, but for many inputs. If we want to share a common pattern among multiple outputs, we need to analyze multiple cubes; the common pattern is then searched for in multiple cubes at a time.

For example, to implement a non-recursive multiple-input multiple-output FIR system using shift and addition operations only, we need to study multiple cubes like the one shown in Fig 3-1. The coefficients belonging to output 1 of FIR1 are placed in the first plane of the cube, the coefficients belonging to output 2 of FIR2 are placed in the second plane, and so on. The binary/CSD representation of the coefficient with no shift in time is placed in the topmost row of the plane. The coefficient with a single shift in time (that is, $z^{-1}$) is placed in the second row of the plane, and so on. For the power of 2, the first element (from the left) in any row is multiplied by $2^0$, the second element by $2^1$, and so on. Similarly, the binary/CSD representation of the coefficients of output 2 is placed in the second cube, and so on.

For common sub-expression elimination we initially applied an exhaustive search to find the most frequent pattern. Every pattern is composed of two non-zero values. The two values may belong to the same row or to different rows of a plane, and to the same plane or to different planes of a cube. Most importantly, however, both non-zero elements of a pattern must belong to one cube only. If a pattern is common to two or more cubes, it can be shared. In short, both non-zero values of an occurrence must lie in the same cube, but the occurrences of the most common pattern may lie in different cubes.


Fig. 3-1 : General View of a Non Recursive MIMO FIR System

Our algorithm is based on three kinds of information provided in Fig 3-1. The first is the power of 2: we compute the difference in power of 2 between the two non-zero values in the binary/CSD representation of the corresponding coefficients and store it in one array. The second is the shift in time, which tells whether the coefficient is multiplied by a power of $z$ (that is, whether it is shifted in time). The third is the input: the inputs to which the two non-zero values of the pattern belong, whose difference is stored in another array. These three pieces of information (shift in time, shift in power of 2, difference in input) are stored for each pattern.

Next we calculate the frequency of each pattern from these arrays. If all three values match for two occurrences, the frequency of the pattern is 2; if they match for three occurrences, its frequency is 3, and so on. The frequencies are stored in another array, and an exhaustive search over this frequency array finds the most frequent pattern. After finding the most common pattern, a second procedure finds the location in the cube of each of its occurrences, and all these occurrences are replaced with a symbol. This process is repeated for all patterns with a frequency greater than or equal to 2. Once all such patterns have been replaced with different symbols, the patterns with a frequency of one are also replaced, so that the same algorithm can be used to create the output graph table.

Finally, we determine how many adders are needed to implement the non-recursive MIMO FIR system. A counter variable is incremented whenever a pattern is replaced by a symbol. Its final value, the number of eliminated common sub-expressions, is the total number of adders required for the system.
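The frequency calculation described above can be sketched in a few lines of Python. This is only an illustration, not the thesis's MATLAB program: the function name `pattern_frequencies` and the sample digits are hypothetical, the sign of each digit is ignored, only one cube is considered, and collisions between occurrences are not handled.

```python
from collections import Counter
from itertools import combinations

def pattern_frequencies(elements):
    """Count two-digit patterns by their (zdif, p2dif, idif) signature.

    `elements` holds the non-zero digits of one cube as (dz, d2, di) tuples:
    shift in time, power of 2 and input index.
    """
    freq = Counter()
    # Sorting gives every pair the same orientation, so equal offsets
    # always produce the same signature.
    for first, second in combinations(sorted(elements), 2):
        signature = tuple(b - a for a, b in zip(first, second))
        freq[signature] += 1
    return freq

# Hypothetical cube: two inputs with three non-zero digits each.
elements = [(1, 1, 1), (1, 2, 1), (2, 2, 1),
            (1, 1, 2), (1, 2, 2), (2, 2, 2)]
freq = pattern_frequencies(elements)
most_frequent, count = freq.most_common(1)[0]
print(most_frequent, count)  # (0, 0, 1) 3
```

Here the winning signature (0, 0, 1) means "same delay, same power of 2, neighbouring inputs", i.e. a sub-expression of the form X1 + X2.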

Fig 3-2: General MIMO System

Example 3.1

Two Inputs Two Outputs Non Recursive FIR (Three tap) Systems

Plane 1 Plane 2

1 -1 1 2 2 2 2

1 1 -1 1 2 2

1 -1 1 2 -2 -2 -2

Table 3-1: CSD Representation of coefficients belonging to Y1

Plane 1 Plane 2

98 1 -1 1 75 2 2 -2 -2

27 1 -1 -1 93 2 -2 -2 2

47 1 -1 -1 76 2 2 -2

Table 3-2: CSD Representation of coefficients belonging to Y2

3.1

$CSE_3 = X_2 + X_1 z^{-1}$

Plane 1 Plane 2

1 -1 1 2

3 3 -1 3 2

3 -1 1 2 -2 -2 -2

Table 3-3: Y1 after elimination of first Common Sub expression

Plane 1 Plane 2

1 -1 2 2

1 -3 -3 2 -2 -2 2

1 -1 -1 2 2 -2

Table 3-4: Y2 After elimination of first Common Sub expression

3.2 Plane 1 Plane 2

4 1 2

3 3 **-1** 3 2

3 -1 1 2 -2 -2 -2

Table 3-5: Y1 after elimination of 2nd Common Sub expression

Plane 1 Plane 2

4 1 2 2

1 -3 -3 2 -2 -2 2

4 -1 2 2 -2

Table 3-6: Y2 After elimination of 2nd Common Sub expression

3.3

Plane 1 Plane 2

4 1 2

3 3 -1 3 2

3 5 1 2 -2 -2

Table 3-7: Y1 after elimination of 3rd Common Sub expression

$CSE_4 = -3X_1$

$CSE_5 = -4X_1 - X_2$

Plane 1 Plane 2

4 1 2 2

-5 -3 -3 -2 -2 2

4 5 2 2

Table 3-8: Y2 After elimination of 3rd Common Sub expression

3.4 Plane 1 Plane 2

4 1 2

3 3 -1 3 2

3 5 1 2 6

Table 3-9: Y1 after elimination of 4th Common Sub expression

Plane 1 Plane 2

4 1 -6

-5 -3 -3 -2 -2 2

4 5 -6

Table 3-10: Y2 After elimination of 4th Common Sub expression

3.5 Plane 1 Plane 2

7 2

3 3 -1 3 2

3 5 1 2 6

Table 3-11: Y1 after elimination of 5th Common Sub expression

Plane 1 Plane 2

7 -6

-5 -3 -3 -2 -2 2

4 5 -6

Table 3-12: Y2 After elimination of 5th Common Sub expression

$CSE_6 = -5X_2$

$CSE_7 = CSE_4 + 64X_1$

3.6

Plane 1 Plane 2

7 2

8 -1 3 2

3 5 1 2 6

Table 3-13: Y1 after elimination of 6th Common Sub expression

Plane 1 Plane 2

7 -6

-5 -8 -2 -2 2

4 5 -6

Table 3-14: Y2 After elimination of 6th Common Sub expression

3.7

Plane 1 Plane 2

7 2

8 -1 3

3 5 1 2 9

Table 3-15: Y1 after elimination of 7th Common Sub expression

Plane 1 Plane 2

7 -6

-5 -8 -2 2

4 5 -9

Table 3-16: Y2 After elimination of 7th Common Sub expression

$CSE_8 = 5\,CSE_3 = 5z^{-1}X_1 + 5X_2$

$CSE_9 = 2X_2 + CSE_6 z^{-1} = 2X_2 - 5z^{-1}X_2$

Fig 3-3: Two Inputs Two Outputs System (Output1)

The computations to the left of the line in Fig 3-4 are the results of common sub-expressions that are computed only once, in Fig 3-3. They are shown again only for clarity; they are not computed a second time but reused from the values already calculated.


Fig 3-4: Two Inputs Two Outputs System (Output2)

All of the above sub-expressions are derived from a table like the one given in the following generalized example.

## CHAPTER 4

### Overlapping Digit Pattern (ODP)

Many researchers have so far addressed the single and multiple constant multiplication problems. They all limited their efforts to finding standard patterns in the representation of the constants. In [1], the authors explored a new concept in this research area.

**4.1 Motivation for searching Non Standard Patterns**

By considering non-standard patterns one can find better solutions to the single constant multiplication and MCM problems. Consider the constant 101000010100011001. Every pattern other than 101 has a frequency of 1, so 101 is the most frequent pattern. By searching for the standard pattern 101 only, we can save at most one adder, but if we also search for a non-standard pattern, called an Overlapping Digit Pattern (ODP), we can save one more adder. The reason for this saving is the non-standard pattern 11001 in the above representation of the constant: 11001 = 101 + (101 << 2).
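The saving claimed for this constant can be checked numerically. The following is a minimal sketch, assuming each shared two-digit pattern costs one adder and that summing n terms costs n - 1 adders; the helper names `shift_add` and `adder_count` are hypothetical.

```python
def shift_add(pattern, shifts):
    """Sum of `pattern` shifted left by each amount in `shifts`."""
    return sum(pattern << s for s in shifts)

constant = int("101000010100011001", 2)
P = 0b101                        # standard pattern x + (x << 2): one adder
odp = shift_add(P, [0, 2])       # overlapping digit pattern built from P
assert odp == 0b11001

# Sharing only the standard pattern: P<<15, P<<8 and three single bits.
standard_only = shift_add(P, [15, 8]) + shift_add(1, [4, 3, 0])
# Also using the ODP: every term is a shifted copy of P.
with_odp = shift_add(P, [15, 8, 2, 0])

def adder_count(n_terms, n_shared_patterns=1):
    """Adders to sum n_terms, plus one adder per shared two-digit pattern."""
    return (n_terms - 1) + n_shared_patterns

print(standard_only == constant, with_odp == constant)      # True True
print(adder_count(7, 0), adder_count(5), adder_count(4))    # 6 5 4
```

Under these assumptions the seven non-zero bits need six adders without sharing, five when only 101 is shared, and four when the ODP 11001 is also expressed through 101.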

**4.2 Properties of ODP**

Any standard pattern is composed of two non-zero digits. One of the most important properties of an ODP is that the standard pattern is present at some locations in the constant even though it does not exist at the ODP location itself. In the example of Section 4.1, the pattern 101 occurs twice in the binary representation of the constant but not at the ODP location 11001, and still we are able to express 11001 in terms of the standard pattern 101. If the ODP and the standard pattern are both present at the same location in the representation of the constant, the standard pattern is replaced and the ODP is not.

**4.2.1 Three Generalized Classes of ODP for Single Constant Multiplication**

Let P represent a pattern of the form P = (S << i) ± S, where i is a positive integer and S is either the input or an arbitrary intermediate signal. P may also represent the negative of a pattern, obtained by inverting the sign of every digit of the pattern. Normally two occurrences of a pattern P, combined as (P << n) ± P, produce four non-zero digits of S, in which case an ODP is not needed. In some specific cases, however, two occurrences of P produce only three non-zero digits of S. These three digits of S are called an ODP of the standard pattern P. There are three main ways to align two occurrences of P with respect to each other to obtain an ODP, and these lead to three general classes of ODP. In general class 1 ODPs, the left digit of P aligns with the right digit of P << n, producing a zero in that position and a non-zero digit one position higher. In general class 2 and general class 3, the left digit of P and the right digit of P << n are adjacent and of opposite sign, giving $S\bar{S}$ or $\bar{S}S$, which are actually $0S$ and $0\bar{S}$ respectively. The definition of the three general classes of ODPs is given in Table 4-1, and examples are provided afterwards.

| Pattern | | General Class I | General Class II | General Class III |
|---|---|---|---|---|
| If $P = S\cdot 2^{i} + S$ | Search for | $\pm[S + S\cdot 2^{i+1} + S\cdot 2^{2i}]$ | $\pm[-S + S\cdot 2^{i} + S\cdot 2^{2i+1}]$ | $\pm[-S - S\cdot 2^{i-1} + S\cdot 2^{2i-1}]$ |
| | Replace it with | $\pm(P\cdot 2^{i} + P)$ | $\pm(P\cdot 2^{i+1} - P)$ | $\pm(P\cdot 2^{i-1} - P)$ |
| If $P = S\cdot 2^{i} - S$ | Search for | $\pm[S - S\cdot 2^{i+1} + S\cdot 2^{2i}]$ | $\pm[-S - S\cdot 2^{i} + S\cdot 2^{2i+1}]$ | $\pm[-S + S\cdot 2^{i-1} + S\cdot 2^{2i-1}]$ |
| | Replace it with | $\pm(P\cdot 2^{i} - P)$ | $\pm(P\cdot 2^{i+1} + P)$ | $\pm(P\cdot 2^{i-1} + P)$ |

Table 4-1 : Three General Classes of ODP for Single Constant Multiplication

Example 4.1:

Example 4.2:

Limitations Removed From Representation of the Constant Due to the Three General Classes of ODPs

Generally, any representation of a constant places some limitations on the types of pattern that can exist in it. Consider B = (P << n) + P where P = (S << i) + S, or B = (P << n) – P where P = (S << i) – S. If n = i, then B is actually made up of three digits of S, so by using general class 1 we can find and replace these patterns effectively. If n ≠ i, then B is made up of four digits of S and hence there is no advantage in replacing such a pattern.

Similarly, a different set of restrictions is removed from the representation of the constants by general class 2 and general class 3. Consider B = (P << n) + P where P = (S << i) – S, or B = (P << n) – P where P = (S << i) + S. If n = i + 1 or n = i – 1, then B is actually made up of three digits of S, so by using general class 2 and general class 3 we can find and replace these patterns effectively. Without these general classes we would not be able to detect and replace such patterns. The basic concept of ODP described above, together with some additional non-general classes, is discussed in much more detail in [1]. The main purpose of considering these general classes in our thesis is to extend them to the Multiple Constant Multiplication problem. In addition to these three general classes of ODP, we have developed and implemented four general classes in our MATLAB program.
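The three-digit property of the general classes can be verified with plain integer arithmetic. The sketch below assumes the definition P = (S << i) ± S given in the text and pairs each three-digit sum of S with the two-term expression in P that reproduces it; the function name `odp_identities` and the class labels are hypothetical.

```python
def odp_identities(S, i):
    """Map each class to (three-digit sum of S, equivalent expression in P).

    Pp is the pattern (S << i) + S and Pm is the pattern (S << i) - S.
    Requires i >= 1 so that the shift by i - 1 is defined.
    """
    Pp = (S << i) + S
    Pm = (S << i) - S
    return {
        "class 1, P+": (S + (S << (i + 1)) + (S << (2 * i)), (Pp << i) + Pp),
        "class 2, P+": (-S + (S << i) + (S << (2 * i + 1)), (Pp << (i + 1)) - Pp),
        "class 3, P+": (-S - (S << (i - 1)) + (S << (2 * i - 1)), (Pp << (i - 1)) - Pp),
        "class 1, P-": (S - (S << (i + 1)) + (S << (2 * i)), (Pm << i) - Pm),
        "class 2, P-": (-S - (S << i) + (S << (2 * i + 1)), (Pm << (i + 1)) + Pm),
        "class 3, P-": (-S + (S << (i - 1)) + (S << (2 * i - 1)), (Pm << (i - 1)) + Pm),
    }

# Check every identity for a range of shifts and a few values of S.
for S in (1, 3, 7):
    for i in range(2, 10):
        for name, (digits, replacement) in odp_identities(S, i).items():
            assert digits == replacement, (name, S, i)
print("all identities hold")
```

Each left-hand side contains only three multiples of S, which is why one adder operating on two copies of P suffices where three digits of S would otherwise need two adders.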

The formal definition of the four generalized classes of ODP for the Multiple Constant Multiplication problem is given in the following table, and examples are given afterwards to make the concept clear.

| Pattern | | General Class I | General Class II |
|---|---|---|---|
| If $P = S + Sz^{-i}$ | Search for | $\pm[S + 2Sz^{-i} + Sz^{-2i}]$ | $\pm[S + 2^{j+1}z^{-i}S + 2^{2j}z^{-2i}S]$ |
| | Replace it with | $P + Pz^{-i}$ | $P + 2^{j}z^{-i}P$ |
| If $P = S - S2^{j}z^{-i}$ | Search for | $\pm[S - 2Sz^{-i} + Sz^{-2i}]$ | ODP does not exist |
| | Replace it with | $P + Pz^{-i}$ | |

| Pattern | | General Class III | General Class IV |
|---|---|---|---|
| If $P = S + 2^{j}z^{-i}S$ | Search for | $\pm[S - 2^{j}z^{-i}S + 2^{2j+1}z^{-2i}S]$ | $\pm[S - 2^{j-1}z^{-i}S + 2^{2j-1}z^{-2i}S]$ |
| | Replace it with | $P - 2^{j+1}z^{-i}P$ | $P - 2^{j-1}z^{-i}P$ |
| If $P = S - 2^{j}z^{-i}S$ | Search for | $\pm[S + 2^{j}z^{-i}S + 2^{2j+1}z^{-2i}S]$ | $\pm[S + 2^{j-1}z^{-i}S + 2^{2j-1}z^{-2i}S]$ |
| | Replace it with | $P + 2^{j+1}z^{-i}P$ | $P + 2^{j-1}z^{-i}P$ |

Table 4-2 : Four General Classes of ODP for Multiple Constant Multiplication


Example 4.3:

**4.3 References**

[1] Jason Thong and Nicola Nicolici, “Time Efficient Single Constant Multiplication Based on Overlapping Digit Patterns.”

## CHAPTER 5

### Algorithm and Examples

**5.1 Algorithm**

(1) Input the coefficients in decimal format.

(2) Select the options: number representation, ODP, tie breaker, maximum delays, display, and collision on/off.

(3) Search for all patterns in the coefficient set.

(4) Store the patterns in an array.

(5) Detect and remove collisions between the patterns for the frequency calculation, or ignore collisions and calculate the frequency.

(7) Find the frequencies of

I. regular patterns with collision,

II. regular patterns without collision,

III. ODPs, counted separately.

(8) Combine the frequency of ODPs with the regular frequency (optional).

(9) If two or more patterns share the maximum frequency, the selected tie breaker decides which of them is replaced.

(10) Replace the occurrences of the pattern having maximum frequency with a symbol.

(11) Repeat from step 3 until no common pattern exists.

(12) Display the graph table for the replaced patterns.

(13) End.

**5.2 Detailed discussion of the Algorithm**

**5.2.1 Select different options.**

Each coefficient can be represented in any of several formats: binary, CSD, two's complement, or signed digit. The use of Overlapping Digit Patterns is optional. There are four choices of tie breaker: random, minimum delays (for less area), minimum shifts, and minimum adder depth (for low power consumption). An option is available to set a constraint on the maximum number of delays. The display has three modes: the first gives only the final result, the second gives the complete flow showing all replaced symbols in the matrix, and the third is interactive. Collision handling can be turned on or off for the frequency calculation.

**5.2.2 Searching of Multiple patterns in the coefficient set.**

An exhaustive search is performed to detect common patterns in multiple cubes, where one cube represents one output. Within one cube there are three critical pieces of information for the pattern search: power of 2, power of z, and input. Once a suitable pattern is identified, its occurrences are removed and the pattern is computed only once. Table 5.1 gives a three-tap FIR filter. Looking at the coefficients, we see that 1 0 1 is present three times, so an optimized structure can be implemented instead of the original one. The sub-expression 1 0 1 is computed only once and reused for the realization of the other multipliers, so the other two occurrences in the example are removed.


**5.2.3 Detect possible collision between the patterns.**

The algorithm is based on an exhaustive search, so all possible patterns must be examined. In some cases a collision can occur between pattern occurrences: a collision arises when two occurrences share at least one non-zero bit. Such collisions have to be detected and removed during replacement. The frequency can be calculated either with collisions taken into account or without handling them; the two options result in different algorithmic complexity, but while replacing the pattern, collisions must always be handled. For frequency calculation with collision detection, colliding occurrences are counted once. A detailed discussion is given in Example 5.8.

Here a simple example of collision detection is discussed. The constant $4681_{10}$ has the binary representation $1001001001001_2$. The pattern 1001 is present four times. If collisions are taken into account, the frequency of 1001 is two; if collisions are not considered, the frequency of 1001 is four. In both cases the result of replacing the pattern is A00000A000001, where A = 1001. At one of the bit positions of the pattern 1001 the symbol A is inserted, while the other bit position is replaced by zero.
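The two ways of counting can be illustrated for this constant. The sketch below treats a pattern as two ones a fixed distance apart in a single binary string and resolves collisions greedily from the most significant bit; both the helper names and the greedy rule are assumptions made for illustration, not the thesis's MATLAB implementation.

```python
def pattern_occurrences(bits, distance):
    """Index pairs (i, i + distance) where both characters of `bits` are '1'."""
    ones = {i for i, b in enumerate(bits) if b == "1"}
    return [(i, i + distance) for i in sorted(ones) if i + distance in ones]

def count_without_collisions(occurrences):
    """Count occurrences so that no non-zero bit is used more than once."""
    used, count = set(), 0
    for a, b in occurrences:
        if a not in used and b not in used:
            used.update((a, b))
            count += 1
    return count

bits = format(4681, "b")                  # '1001001001001'
occ = pattern_occurrences(bits, 3)        # pattern 1001: ones three places apart
print(len(occ), count_without_collisions(occ))   # 4 2
```

The four overlapping occurrences share their inner ones, so only two of them can actually be replaced.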

**5.2.4 Select the option: Find the frequency of regular patterns or combined with ODP.**

The frequency calculation offers three options. In the first option, collisions are ignored while calculating the frequency; in the second, collisions are considered; in the third, the frequency is calculated for both regular patterns and ODPs, and the replacement decision is based on the combined frequency. The constant 1365, whose binary representation is 10101010101, has a frequency of 5 for the pattern 101 when collisions are not considered, and a frequency of 3 when they are. The pattern with the maximum frequency is selected and all its occurrences are replaced: 101 is calculated once and represented by the symbol A. One of the bit positions of 101 is replaced by A, while the other is replaced by 0. The output is A000A000A00. The algorithm searches for common patterns again and again until no common pattern exists.
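The replacement step for a single constant can be sketched as follows. The function `replace_pattern` is a hypothetical helper that scans from the most significant bit, writes the symbol at the first bit of each non-colliding occurrence and a zero at the second; it handles one binary string and one pattern only.

```python
def replace_pattern(bits, distance, symbol="A"):
    """Replace non-colliding pairs of ones `distance` apart by symbol and 0."""
    digits = list(bits)
    for i in range(len(digits) - distance):
        if digits[i] == "1" and digits[i + distance] == "1":
            digits[i] = symbol            # the symbol takes one bit position
            digits[i + distance] = "0"    # the other bit position is cleared
    return "".join(digits)

print(replace_pattern("10101010101", 2))     # A000A000A00
print(replace_pattern("1001001001001", 3))   # A00000A000001
```

Because a cleared bit can no longer start or end another occurrence, collisions are resolved as a side effect of the in-place update.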

**5.2.5 Select Tie breaker**

The pattern with the highest frequency is selected for replacement, which reduces the computational complexity. On the other hand, power consumption increases when more operands are cascaded, because cascading results in glitches. Once the pattern with the maximum frequency is identified, there are four choices for resolving a tie: minimum adder depth, minimum delay, minimum shift, or random selection.

| Coefficients | Binary Representation |
|---|---|
| 37 | 1 0 0 1 0 1 |
| 41 | 1 0 1 0 0 1 |
| 43 | 1 0 1 0 1 1 |

Different Tie Breakers are discussed with examples.
** Example 5.2 **

In this example a method is described to achieve minimum adder depth. The number of cascaded adders (adder depth) is the main cause of power consumption, because switching activity increases when glitches propagate through adders. In Fig 5-1 there is a tie for CSE8 between the operations X3 + CSE4 and CSE6 + CSE7. If CSE6 + CSE7 is performed, the number of cascaded adders is three, whereas if X3 + CSE4 is performed, the number of cascaded adders is two. With the minimum adder depth tie breaker, the operation X3 + CSE4 is selected for replacement, which results in the minimum number of cascaded adders.

Fig 5-1 Selection of minimum adder depth

The minimum adder depth is found by storing the depth of every sub-expression and then taking the minimum. The table below lists the depths for Fig 5-1.

| Symbol | X1 | X2 | X3 | CSE4 | CSE6 | CSE7 | CSE8 | CSE9 |
|---|---|---|---|---|---|---|---|---|
| Depth | 0 | 0 | 0 | 1 | 1 | 2 | 3 | 2 |

Table 5-2 Depth of each Symbol

From the above table, maximum(X3, CSE4) gives 1 and maximum(CSE7, CSE6) gives 2, which means that X3 + CSE4 builds on an adder depth of 1 while CSE6 + CSE7 builds on an adder depth of 2. Selecting the pattern with the minimum adder depth therefore means selecting X3 + CSE4.
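The tie breaker can be expressed compactly. In the sketch below, the depth of a new sub-expression is taken as one more than the larger depth of its two operands, which matches the two versus three cascaded adders quoted for Fig 5-1; the function names are hypothetical and the depth values are those of Table 5-2.

```python
# Depths of the symbols that already exist before CSE8 is formed (Table 5-2).
depth = {"X1": 0, "X2": 0, "X3": 0, "CSE4": 1, "CSE6": 1, "CSE7": 2}

def adder_depth(operands, depth):
    """Depth of the adder that combines the two operands."""
    a, b = operands
    return max(depth[a], depth[b]) + 1

def pick_min_depth(candidates, depth):
    """Among tied candidates, choose the one with the smallest adder depth."""
    return min(candidates, key=lambda ops: adder_depth(ops, depth))

candidates = [("CSE6", "CSE7"), ("X3", "CSE4")]
print([adder_depth(c, depth) for c in candidates])   # [3, 2]
print(pick_min_depth(candidates, depth))             # ('X3', 'CSE4')
```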


**Example 5.3 **

In this example a method is described to achieve minimum delays, which results in minimum area. The first sub-expression is $CSE_5 = X_1 + X_1 z^{-4}$, shown in Fig 5-2(a), which allocates 4 delays for X1. Next there is a tie between $CSE_7 = X_1 + X_1 z^{-3}$, shown in Fig 5-2(b), and $CSE_6 = X_2 + X_2 z^{-2}$, shown in Fig 5-2(c). It seems that CSE7 costs 3 delays compared with 2 delays for CSE6, but since X1 already has 4 delays, CSE7 costs no additional delays, as shown in Fig 5-2(d). This tie breaker results in minimum chip area, since delay elements contribute a large part of the area.
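The delay bookkeeping behind this tie breaker can be sketched as follows. The code assumes that every signal has one shared delay line, so a candidate only pays for delays beyond those already allocated; the dictionary layout and function names are hypothetical.

```python
def extra_delays(signal, needed, allocated):
    """New delay elements required to delay `signal` by `needed` samples."""
    return max(0, needed - allocated.get(signal, 0))

def pick_min_delay(candidates, allocated):
    """Among tied candidates, choose the one needing the fewest new delays."""
    return min(candidates,
               key=lambda name: extra_delays(*candidates[name], allocated))

allocated = {"X1": 4}                     # CSE5 = X1 + X1 z^-4 is already built
candidates = {"CSE6": ("X2", 2),          # X2 + X2 z^-2
              "CSE7": ("X1", 3)}          # X1 + X1 z^-3
costs = {name: extra_delays(*c, allocated) for name, c in candidates.items()}
print(costs)                                   # {'CSE6': 2, 'CSE7': 0}
print(pick_min_delay(candidates, allocated))   # CSE7
```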

Fig 5-2 (Selection of minimum delays) a, b, c, d shows the computations of CSE7, CSE5, CSE6, CSE10, CSE8

Table 5-3 shows the original delays and the delays as updated after an operation has been computed.

Symbol: $x_1$ $x_2$ CSE5 CSE6 CSE7 CSE8

Delays: 0 0 0 0 0 0

Updated delays: 4 2 2 2

Table 5-3 Delays of each Symbol

The fourth choice, minimum shift, results in different design metrics. When there is a tie between patterns, for example $CSE_6 = X_2 + X_2 z^{-2}$ and $CSE_7 = X_1 + 12X_1 z^{-3}$, CSE6 should be selected as it has the minimum shift.

**5.2.6 Display graph table and output**

The graph table, which is displayed for each replacement, records which pattern was selected for replacement. Example 5.5 shows the details of the output.

**5.3 Implementation Examples**

** Example 5.5 **

In this example it is shown how a pattern is recognized when it is replaced. The graph table in Table 5-4 shows that the pattern $16x_1 + z^{-1}x_2$, represented by bold text, is selected from two constants of two FIR filters. A two-input system has CSE3 as its first symbol when a pattern is detected. The common sub-expression in the following coefficients is $16x_1 + z^{-1}x_2$.

0 0 0 2 0 0 0

**2 0 2 0 0 0 2**

**0 0 0 0 1 0 1**

0 0 0 0 0 0 1

Fig 5-3 Two FIRs with two constants each, in binary representation (upper rows X2, lower rows X1).

| Symbol | Input1 | Shifts | Delays | Input2 | Shifts | Delays |
|---|---|---|---|---|---|---|
| CSE3 | 1 | 16 | 0 | 2 | 0 | 1 |

Table 5-4 Graph Table

The graph table shows that the first bit of the pattern $16x_1 + z^{-1}x_2$ belongs to input 1 and is shifted by 16 with zero delays, while the second bit belongs to input 2, with zero shift and 1 delay. These three pieces of information tell us which pattern is replaced and by which symbol. In this case the pattern $16x_1 + z^{-1}x_2$ is replaced by CSE3.

The output is represented as a matrix that gives the shifts and delays in the pattern: the first column corresponds to input 1, the second column to input 2, and so on, and the rows give the delay of each bit.

$$\begin{bmatrix} 16 & 0 \\ 0 & 1 \end{bmatrix} \;\Rightarrow\; 16x_1 + z^{-1}x_2$$

In the second iteration the pattern $16x_1 + z^{-1}x_2$ is replaced by symbol 3. Fig 5-4 shows that one of the bits is replaced by symbol 3 while the other bit position is replaced by zero. Similarly, all such patterns are replaced until no common pattern exists. The common sub-expression is $16x_1 + z^{-1}x_2 + z^{-1}x_1$.

0 0 0 2 0 0

**0 0 0 2 0 2**

**0 0 1 3 0 3**

0 0 0 0 0 0

Fig 5-4 Two FIRs with two constants each with the replaced pattern, in binary representation.

| Symbol | Input1 | Shifts | Delays | Input2 | Shifts | Delays |
|---|---|---|---|---|---|---|
| CSE10 | CSE3 | 1 | 0 | 1 | 1 | 1 |

Table 5-5 Graph Table

The corresponding output will be

$$\begin{bmatrix} 16 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 16 & 0 \\ 1 & 1 \end{bmatrix} \;\Rightarrow\; 16x_1 + z^{-1}x_1 + z^{-1}x_2$$

** EXAMPLE 5.6 **

In this example it is shown how a common sub-expression is replaced by a symbol. The system has one output and two inputs. In this example the pattern X1 + X2 has the maximum frequency, 6, so it is replaced. If there were another output with two inputs that also contained 100001, such that the combined frequency of this pattern over output 1 and output 2 exceeded the frequency of X1 + X2 (6), then that pattern would have been eliminated instead.

| x1 | x2 |
|---|---|
| 1 0 0 1 0 1 | 1 0 0 1 0 1 |
| 1 0 1 0 0 1 | 1 0 1 0 0 1 |
| 1 0 0 0 1 1 | 1 0 0 0 1 1 |

Table 5-6 binary representation of six constants

The frequency of each pattern in the six constants shown in Table 5-6 is calculated in Table 5-7.

| Patterns | Frequencies |
|---|---|
| 101 | 2 |
| 100001 | 3 |
| 1001 | 2 |
| X1+X2 | 6 |

Table 5-7 frequency of each pattern

In the first iteration the most frequent pattern, X1 + X2, is replaced six times by the symbol a. At one of the positions the symbol a is inserted, as shown in Table 5-8, while the other bit position is replaced by 0, as shown in Table 5-9. The same holds for the other five occurrences.

The frequency of each pattern has now changed, as shown in Table 5-8 and Table 5-9, and new frequencies are calculated. The pattern a0000a has a frequency of 3, and X1 + X2 also has a frequency of 3.

| x1 |
|---|
| a 0 0 1 0 a |
| a 0 1 0 0 a |
| a 0 0 0 1 a |

Table 5-8 After replacing symbol

| x2 |
|---|
| 0 0 0 1 0 0 |
| 0 0 1 0 0 0 |
| 0 0 0 0 1 0 |

Table 5-9 After replacing zero

In the second iteration the pattern a0000a is replaced by the symbol b. This is shown in Table 5-10 and Table 5-11.

| x1 |
|---|
| b 0 0 1 0 0 |
| b 0 1 0 0 0 |
| b 0 0 0 1 0 |

Table 5-10 After replacing symbol and zero

| x2 |
|---|
| 0 0 0 1 0 0 |
| 0 0 1 0 0 0 |
| 0 0 0 0 1 0 |

Table 5-11 After replacing symbol and zero

In the third iteration X1 + X2 is replaced by the symbol c. After that there are no more common patterns left, so the final output is as shown in Table 5-12 and Table 5-13.

| x1 |
|---|
| b 0 0 c 0 0 |
| b 0 c 0 0 0 |
| b 0 0 0 c 0 |

Table 5-12 After replacing symbol and zero

| x2 |
|---|
| 0 0 0 0 0 0 |
| 0 0 0 0 0 0 |
| 0 0 0 0 0 0 |

Table 5-13 After replacing symbol and zero

**Example 5.7 **

In this example a method is shown to calculate the common pattern, that is, the common sub-expression in a two-input matrix for one output.

| Input | Matrix (rows: power of z, columns: power of 2) |
|---|---|
| x2 | 2 2 0 / 0 2 0 |
| x1 | 1 1 0 / 0 1 0 |

Fig. 5-5 (Two Input FIR System; axes: inputs, power of 2, power of z)

We have a two-input matrix in Fig 5-5 (the LSB is taken on the left side, as in MATLAB). In the above cube the most common pattern occurs between elements of plane 1 and plane 2: elements with (dz, d2, di) values such as (1,1,1) and (1,1,2) form the most common sub-expression, as there are three such pairs that have the same (zdif, p2dif, idif).

Each non-zero element is identified by its input, power of 2, and shift in time. Dz represents the shift in time, D2 the power of 2, and Di the respective input.

| | x1 | x1 | x1 | x2 | x2 | x2 |
|---|---|---|---|---|---|---|
| Dz | 1 | 1 | 2 | 1 | 1 | 2 |
| D2 | 1 | 2 | 2 | 1 | 2 | 2 |
| Di | 1 | 1 | 1 | 2 | 2 | 2 |

Table 5-14 power of 2, Shift in time and input information of each bit

After calculating the z differences, power of 2 differences, and input differences we get the following table, which is used for finding the common sub-expression.

| Pair | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zdif | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | -1 | -1 | 0 | 0 | 1 | 1 |
| P2dif | 1 | 1 | 0 | 1 | 1 | 0 | -1 | 0 | 0 | -1 | 0 | 0 | 1 | 1 | 0 |
| idif | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |

Table 5-15 difference in power of 2, Shift in time and input of each pattern

Those pairs of non-zero elements that have the same (Zdif, P2dif, idif) are considered a common sub-expression. In the above table (Zdif, P2dif, idif) = (0,0,1) gives the most common sub-expression, with a frequency of 3. Replacing these common sub-expressions by the term a we get the following output matrix.

| Input | Matrix (rows: power of z, columns: power of 2) |
|---|---|
| x2 | 0 0 0 / 0 0 0 |
| x1 | a a 0 / 0 a 0 |

Fig. 5-6 Two Input FIR System after CSE

The output graph table for this matrix is as follows. Input1 and Input2 are the inputs to the adder.

Output Input1 Input2 Shift1 Shift2

1 1 0 0 x1 x2 [ 1 0 ] x1 x2 [0 1 ] p2 z [ 1 0 ] p2 z [ 1 0 ]

Table 5-16 output and graph table

The equation to compute the final output for one system without common sub-expression elimination would be

$$x_1 + 2x_1 + 2z^{-1}x_1 + x_2 + 2x_2 + 2z^{-1}x_2$$

as can be seen in Fig 5-7. The equation to compute the final output for one system with common sub-expression elimination would be

$$a + 2a + 2z^{-1}a$$

where a is computed as $x_1 + x_2$, as seen in Fig 5-7. The final filter after CSE will be

Fig. 5-7 Two Input FIR System after CSE
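That the shared form computes the same output as the direct form can be confirmed by expanding both expressions. In the sketch below each expression is a map from (input, delay) to gain; this representation and the helper names `term`, `add` and `scale` are assumptions made only for the check.

```python
def term(inp, gain=1, delay=0):
    """A single term gain * z^-delay * inp as a {(input, delay): gain} map."""
    return {(inp, delay): gain}

def add(*exprs):
    """Sum of several expressions, merging equal (input, delay) keys."""
    total = {}
    for expr in exprs:
        for key, gain in expr.items():
            total[key] = total.get(key, 0) + gain
    return total

def scale(expr, gain=1, delay=0):
    """Multiply an expression by gain * z^-delay."""
    return {(inp, d + delay): g * gain for (inp, d), g in expr.items()}

# Without CSE: six terms, five adders.
direct = add(term("x1"), term("x1", 2), term("x1", 2, 1),
             term("x2"), term("x2", 2), term("x2", 2, 1))

# With CSE: a = x1 + x2 is computed once, then a + 2a + 2 z^-1 a.
a = add(term("x1"), term("x2"))
shared = add(a, scale(a, 2), scale(a, 2, delay=1))

print(direct == shared)   # True
```

The shared form needs one adder for a and two more to combine its three shifted copies, three adders instead of five.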

**Example 5.8**

The following is an example of a two-input matrix in which plane 1 has three coefficients and plane 2 has no coefficients. There are collisions between different pattern occurrences, as at least one non-zero digit is shared. Since the algorithm is implemented in MATLAB, where indexing starts at 1, the position of the first element in (power of z, power of 2, input) is (1, 1, 1). All non-zero digits are identified using this information.

X2 X1

Fig. 5-8 FIR System

The patterns $X_1 + 2X_1$ and $X_1 + X_1 z^{-1}$ have frequencies of 3 and 4 respectively, but after eliminating the collisions, the frequency of both $X_1 + X_1 z^{-1}$ and $X_1 + 2X_1$ becomes 2. The position of each element with respect to deltaz, delta2, and input is given in the following table. The elements (2,2,1) and (2,3,1) show that there is a collision; based on these positions, collisions are detected and eliminated in the proposed algorithm. In the proposed algorithm, if two pattern occurrences share a non-zero element, the pattern is counted once.

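| Element | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| dz | **2** | **3** | 1 | **2** | 1 | 2 | 3 |
| d2 | **1** | **1** | 2 | **2** | 3 | 3 | 3 |
| di | **1** | **1** | 1 | **1** | 1 | 1 | 1 |

Table 5-17 Z, Power of 2 and input info of each non zero bit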

In Fig 5-8 the element at position (2, 2, 1), shown in bold text in Table 5-17, is shared with the elements at positions (2, 1, 1) and (2, 3, 1), while (2, 3, 1) is shared with (1, 3, 1) and (3, 3, 1), where the position of an element is given by deltaz, delta2, and input, that is, (dz, d2, di). When there is a collision, it is eliminated and the new common sub-expression is then computed.

** Table 5-9 differences in Z, Power of 2 and input of each non zero**

zdi f 1 -1 0 -1 0 1 -2 -1 -2 -1 0 1 0 1 2 -1 0 1 1 2 1 P2 dif 0 1 1 2 2 2 1 1 2 2 2 0 1 1 1 1 1 1 0 0 0 idi f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1