
Design and implementation of a decimation filter using a multi-precision multiply and accumulate unit for an audio range delta sigma analog to digital converter


Master's thesis (Examensarbete)

Erik Lindahl

Department of Electrical Engineering, Linköpings Universitet

LiTH-ISY-EX--08/4075--SE

Master's thesis: 30 hp. Level: D

Supervisor: Oscar Gustafsson, Department of Electrical Engineering, Linköpings Universitet

Examiner: Oscar Gustafsson, Department of Electrical Engineering, Linköpings Universitet

Linköping, February 2008

Institutionen för Systemteknik, 581 83 Linköping, Sweden


Abstract

This work presents the design and implementation of a decimation filter for a three-bit sigma delta analog to digital converter. The input is audio with an oversampling ratio of 32. Filter optimization and the trade-offs in the design are described. The filter is a multistage filter consisting of two cascaded FIR filters. The arithmetic unit is a multi-precision unit that can handle 3-bit or 24-bit MAC operations. The designed decimation filter is synthesized using standard cells from a 0.13 µm CMOS library.

Keywords: decimation, digital filter, FIR, hardware implementation, multi precision, delta sigma


Acknowledgements

I would like to thank my supervisor Oscar Gustafsson, my opponent Johannes Lindblom, Hanna Svensson, Oskar Matteusson and Krister Berglund.


Nomenclature

Most of the recurring abbreviations and symbols are described here.

Symbols

h(n)    impulse response
H(z)    transfer function
M       decimation rate
L       number of sub filters
N       filter order
R(ωT)   noise power spectral density
b       word length
c       computational load
ωcT     passband edge
ωsT     stopband edge
δ       passband ripple

Abbreviations

A/D     analog to digital converter
SNR     signal to noise ratio
NTF     noise transfer function
PE      processing element
acc     accumulator
RTL     register transfer level
DC      Design Compiler
VHDL    hardware description language
MAC     multiply and accumulate
∆Σ      delta sigma

Contents

1 Introduction
  1.1 The task
  1.2 Method of solving
  1.3 Report outline
2 Theory
  2.1 Delta-Sigma A/D converter
  2.2 FIR and IIR filters
  2.3 Decimation filter
    2.3.1 Downsampling
    2.3.2 Lowpass filter
  2.4 Polyphase decomposition
3 Matlab model
  3.1 Filter specification
    3.1.1 Phase response
  3.2 Multistage decimation
    3.2.1 Computational load
    3.2.2 Half band filter
    3.2.3 Simulation results and computational complexity
  3.3 The architecture
    3.3.1 Dataflow
    3.3.2 Data memory
    3.3.3 Clock frequency
  3.4 Filter optimization
  3.5 Coefficient word length
    3.5.1 Constant zero bits in coefficients
  3.6 Schedule
  3.7 DC level
  3.8 A test case
4 Hardware implementation
  4.1 VHDL model
  4.2 Processing element
    4.2.1 What is it doing?
    4.2.2 The main idea
    4.2.3 Sign extension
    4.2.4 Internal wordlength
  4.3 Synthesis
    4.3.1 Synthesis results
    4.3.2 Validation
5 Future work
A Filter coefficients

Chapter 1

Introduction

To realize high-resolution analog to digital converters without high-precision analog components, one can use oversampled delta sigma converters. In these converters, digital decimation filters are essential parts. This work considers the design and implementation of such a digital decimation filter.

1.1 The task

The task was to implement a synthesizable decimation filter for a given delta sigma A/D converter (∆Σ). The system in Fig. 1.1 shall fulfill the specifications in Table 1.1. The decimation filter shall attenuate the noise created in the ∆Σ (see Fig. 1.3), reduce the sample rate by a factor of 32 and increase the precision from three bits to 16 bits. Fig. 1.2 shows the desired filter and the passband ripple requirements.

| Quantity | Symbol | Value |
|---|---|---|
| Passband frequency | ωc | 0.4895 (normalized) |
| Stopband frequency | ωs | 0.5688 (normalized) |
| Passband ripple | δc | < 0.035 dB |
| Signal to noise ratio | SNR | 91 dB |
| Decimation rate | M | 32 |
| Data sampling rate | fs | 44.1 kHz |
| Phase response | | linear |

Table 1.1: System specifications.

The given ∆Σ adds quantization noise to the signal. The spectral shaping of this noise is described by the noise transfer function (NTF), see Fig. 1.3. The filter is designed to match this NTF as closely as possible.

1.2 Method of solving

The starting point of this work is the filter specification. Matlab was then used to design a model of the filter. The Matlab model and the hardware architecture were designed simultaneously, because some architectural decisions affect the Matlab model and vice versa. The filter model created in Matlab was implemented in VHDL using HDL Designer. Finally, the VHDL model was synthesized to a gate-level design in Design Compiler. A flow graph of the design process can be seen in Fig. 1.4.

Figure 1.1: Overview of the system. This work considers the design and implementation of the decimation filter H(z). [Block diagram: analog input x(t), ∆Σ with 3-bit output x(n) at 1.41 MHz, decimation filter H(z), 16-bit output x(32n) at 44.1 kHz.]

Figure 1.2: Filter specification. Left: Passband ripple requirement. The dashed lines indicate the maximum passband ripple of ±0.035 dB. Right: The desired ideal lowpass filter with cutoff at π/32.

Figure 1.3: [Plot of |NTF| in dB against ωT for the given ∆Σ; the caption was not recovered.]

Figure 1.4: Flow graph of the design process, from filter specification to gate-level design. [Steps: filter specification, Matlab model, architecture, VHDL model, synthesis to gate-level design.]

1.3 Report outline

The report follows the design flow in Fig. 1.4. A theoretical background is given in chapter 2. Chapter 3 contains the design of a Matlab model.

The hardware architecture of the Matlab model is described in chapter 4. Some ideas for improvements in future work can be found in chapter 5.

Chapter 2

Theory

This chapter contains some background theory for this work. It begins with a brief introduction to delta sigma conversion, continues with decimation filtering, and ends with polyphase decomposition.

2.1 Delta-Sigma A/D converter

The delta sigma technique makes it possible to realize high-resolution analog to digital conversion without high-precision analog components [1]. Instead of sampling at the Nyquist frequency fN, a delta sigma converter oversamples the analog signal by an oversampling ratio M, which means that the sampling frequency is much larger than the Nyquist frequency. In an A/D converter that samples at the Nyquist frequency, the quantization noise is uniformly distributed over the frequency band 0 to fs/2. An A/D converter that is oversampled by a factor M spreads the noise spectrum over a bandwidth that is M times larger. In addition, the delta sigma A/D converter moves the quantization noise so that most of it lands outside the band 0 to fs/(2M); this is referred to as noise shaping. See Fig. 2.1 [7].

Figure 2.1: a) Quantization noise spectrum with sampling at the Nyquist rate. b) The quantization noise spectrum when oversampled by a factor M. c) The quantization noise spectrum when oversampled by a factor M and noise shaped in a ∆Σ. [In b) and c) the noise above fs/(2M) is removed by the digital filter.]

Figure 2.2: The decimation filter, consisting of a lowpass filter H(z) followed by downsampling by M. ωT1 and ωT2 indicate the different sample rates at the input and output, where T2 = M T1. [Block diagram: x1(n), H(z), x2(n), ↓M, x3(n).]

2.2 FIR and IIR filters

Digital filters are usually divided into FIR (finite-length impulse response) and IIR (infinite-length impulse response) filters. IIR filters can only be realized with recursive algorithms, whereas FIR filters can be realized with recursive or non-recursive algorithms. Recursive FIR filters are seldom used, as they suffer from stability problems [4].

The advantages of FIR over IIR filters are that they can have a linear phase response, they are always stable, and they are easy to implement with polyphase decomposition. On the other hand, FIR filters require much higher filter orders and introduce a large group delay [4]. In this case the filter must have a linear phase response and it operates at several data rates. Because of the linear phase property, and because FIR filters are easy to implement in a polyphase decomposition, only FIR filters are considered in the rest of this work.

The length of the impulse response of an FIR filter of order N is N + 1. If the impulse response is symmetric or antisymmetric around n = N/2, the filter has a linear phase response. The transfer function of an Nth-order FIR filter with impulse response h(n) can be written as

$$H(z) = \sum_{n=0}^{N} h(n) z^{-n} \qquad (2.1)$$
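The linear phase property can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up symmetric impulse response (the coefficients are illustrative and not taken from this work). It evaluates Eq. 2.1 on the unit circle and removes the delay term exp(-jωN/2); for a symmetric h(n) the remainder should be purely real.

```python
import numpy as np

def frequency_response(h, w):
    """Evaluate H(e^{jw}) = sum_n h(n) e^{-jwn}, i.e. Eq. 2.1 on the unit circle."""
    n = np.arange(len(h))
    return np.array([np.sum(h * np.exp(-1j * wk * n)) for wk in w])

# Hypothetical symmetric impulse response of order N = 4 (illustration only).
h = np.array([1.0, 3.0, 5.0, 3.0, 1.0])
N = len(h) - 1
w = np.linspace(0.0, np.pi, 64)
H = frequency_response(h, w)

# Removing the linear phase term e^{-jwN/2} leaves a purely real function
# when h(n) is symmetric around n = N/2.
zero_phase = H * np.exp(1j * w * N / 2)
print("largest imaginary part:", np.max(np.abs(zero_phase.imag)))
```

The same check with a non-symmetric h(n) leaves a non-zero imaginary part, which is one quick way to spot a coefficient that was entered incorrectly.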

2.3 Decimation filter

The ∆Σ performs the A/D conversion at low precision and a high sample rate. The task of the decimation filter is to reduce the sample rate by a factor M and increase the precision. The decimation is done in two steps: a filter followed by downsampling. The filter is a lowpass filter that prevents aliasing. The downsampling reduces the sampling rate by a factor M. Fig. 2.2 gives an overview of the decimation filter and Fig. 2.3 presents an example of how a signal is decimated.

2.3.1 Downsampling

Downsampling a signal x2(n) by a factor M means that a new signal x3(n) is created by extracting every M:th sample of x2(n); the other samples are discarded:

$$x_3(n) = x_2(nM) \qquad (2.2)$$

Figure 2.3: a) Magnitude response of the signal X1(ωT1), with bandwidth ωm. b) Ideal lowpass filter with cutoff frequency π/M. c) Magnitude response of X2(ωT1) = X1(ωT1)H(ωT1). d) Downsampled version of X2(ωT1), where T2 = M T1.

Figure 2.4: Lowpass filter with cutoff frequency π/M.

This operation has effects in the Fourier domain. The Fourier transform of x3(n) can be derived as

$$X_3(\omega T_2) = \frac{1}{M} \sum_{k=0}^{M-1} X_2\!\left(\omega T_1 - \frac{2\pi k}{M}\right) \qquad (2.3)$$

This means that the spectrum X2(ωT1) is repeated M times with a distance of 2π/M between the copies, scaled by a factor 1/M [2]. Fig. 2.3 gives an example of the effects of decimation.
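The spectral repetition can be illustrated with a single tone. This is a small sketch, assuming NumPy and an arbitrary example tone at ωT1 = 0.9π with M = 4 (both values are chosen for illustration only). Since the tone lies above π/M, downsampling without a preceding lowpass filter makes it indistinguishable from a tone at 0.4π on the output frequency axis.

```python
import numpy as np

def downsample(x, M):
    """Keep every M:th sample of x, starting with the first one."""
    return x[::M]

M = 4
n = np.arange(256)
x2 = np.cos(0.9 * np.pi * n)      # tone above pi/M, not bandlimited
x3 = downsample(x2, M)

# At the low rate the phase advance per sample is 3.6*pi, which is the
# same as -0.4*pi modulo 2*pi, so the tone shows up at 0.4*pi.
m = np.arange(len(x3))
alias = np.cos(0.4 * np.pi * m)
print("max difference to the alias:", np.max(np.abs(x3 - alias)))
```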

2.3.2 Lowpass filter

Before downsampling, the signal has to be bandlimited to π/M to avoid aliasing. The downsampling step therefore has to be preceded by a lowpass filter. The filter attenuates the frequency components in the region π/M to π. See Fig. 2.4.

Figure 2.5: A direct-form FIR polyphase decomposition of a filter of order N. [Block diagram: input x(n) at rate fsample, coefficients h0 to hN grouped into branches, output y(n) at rate fsample/M.]

2.4 Polyphase decomposition

The straightforward realization of a decimation filter is a lowpass filter followed by a downsampler that keeps only every M:th sample. Most of the output samples from the filter are thus discarded. This can be exploited to reduce the computational workload. The main idea of polyphase decomposition is to calculate only those samples that are not discarded. This can be realized by dividing the filter H(z) into M sub filters Hi(z), where the impulse response of each sub filter is

$$h_i(n) = h(nM + i), \quad i = 0, 1, \ldots, M - 1 \qquad (2.4)$$
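The decomposition in Eq. 2.4 can be checked against the direct approach. The sketch below assumes NumPy; the function names, the test signal and the filter are hypothetical and only serve the comparison. Each branch i convolves the sub filter h_i(n) with the input phase x(mM - i) at the low rate, and the branch outputs are summed.

```python
import numpy as np

def decimate_direct(x, h, M):
    """Filter at the high rate, then keep every M:th output sample."""
    return np.convolve(x, h)[::M]

def decimate_polyphase(x, h, M):
    """Compute only the retained outputs using the sub filters h_i(n) = h(nM + i)."""
    n_out = -(-(len(x) + len(h) - 1) // M)   # ceil of the full length over M
    y = np.zeros(n_out)
    for i in range(M):
        h_i = h[i::M]
        if h_i.size == 0:
            continue
        if i == 0:
            u_i = x[0::M]                     # u_0(m) = x(mM)
        else:
            # u_i(m) = x(mM - i); the sample for m = 0 lies before the
            # start of the signal and is taken as zero.
            u_i = np.concatenate(([0.0], x[M - i::M]))
        branch = np.convolve(u_i, h_i)
        y[:branch.size] += branch
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
h = np.hamming(13) / np.sum(np.hamming(13))   # hypothetical lowpass filter
M = 4
y_direct = decimate_direct(x, h, M)
y_poly = decimate_polyphase(x, h, M)
print("max difference:", np.max(np.abs(y_direct - y_poly)))
```

Both functions return the same samples, but the polyphase version performs all multiplications at the output rate, which is the saving the text describes.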


Chapter 3

Matlab model

In this chapter the design of the Matlab model is presented. The starting point is the filter specification.

3.1 Filter specification

The main task of the filter is to attenuate the noise created in the delta sigma A/D converter and to reduce the sample rate by a factor M = 32. To handle this, the noise transfer function NTF(ωT) (see Fig. 1.3) of the delta sigma was used to derive a filter specification Hspec(ωT).

The quantization noise power spectral density Rx(ωT) and the noise power spectral density after filtering Ry(ωT) are found as [1]

$$R_x(\omega T) = \frac{Q^2}{12} |NTF(\omega T)|^2, \quad Q = 2^{-b_{in}+1} \qquad (3.1)$$

$$R_y(\omega T) = |H_{spec}(\omega T)|^2 R_x(\omega T) \qquad (3.2)$$

where Q is the quantization step in the A/D conversion and bin is the number of bits that represent the input signal, in this case bin = 3. Ry(ωT) is assumed to be lower than a constant ε. The noise power Pnoise is then found as

$$P_{noise} = \frac{1}{\pi} \int_0^{\pi} \varepsilon \, d\omega T \;\Rightarrow\; P_{noise} = \varepsilon \ge R_y(\omega T) \qquad (3.3)$$

$$R_x(\omega T) |H_{spec}(\omega T)|^2 \le \sigma_{noise}^2 \;\Rightarrow\; |H_{spec}(\omega T)| \le \sqrt{\frac{P_{noise}}{R_x(\omega T)}} \qquad (3.4)$$

$$SNR = 10 \log_{10}\left(\frac{P_{signal}}{P_{noise}}\right) \;\Rightarrow\; \sigma_{noise}^2 = \frac{P_{signal}}{10^{SNR/10}} \qquad (3.5)$$

Here σ²noise denotes the noise power Pnoise. The signal is a sinusoid whose power is Psignal = 1/2. By combining equations 3.3, 3.4 and 3.5, the filter specification can be derived as

$$|H_{spec}(\omega T)| = \sqrt{\frac{24}{10^{SNR/10} \, |NTF(\omega T)|^2}} \qquad (3.6)$$

See Fig. 3.1 for a plot of the derived filter specification.
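The chain from Eq. 3.1 to the bound in Eq. 3.4 can be written as a few lines of code. This is only a sketch, assuming NumPy: the actual NTF of the given ∆Σ is not reproduced in this text, so a third-order differentiator (1 - z^-1)^3 is used as a stand-in, and the function name and arguments are hypothetical.

```python
import numpy as np

def filter_spec(ntf_mag, snr_db, b_in, p_signal):
    """Largest allowed |H_spec| for a given |NTF|, following Eq. 3.1, 3.4 and 3.5."""
    q = 2.0 ** (-b_in + 1)                         # quantization step, Eq. 3.1
    r_x = q ** 2 / 12.0 * ntf_mag ** 2             # noise PSD before filtering
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)   # allowed noise power, Eq. 3.5
    return np.sqrt(p_noise / r_x)                  # bound from Eq. 3.4

# Stand-in NTF (assumption, not the NTF of the given converter).
# The grid starts above zero because this NTF has a zero at w = 0.
w = np.linspace(0.01, np.pi, 512)
ntf_mag = np.abs(1.0 - np.exp(-1j * w)) ** 3

h_spec = filter_spec(ntf_mag, snr_db=91.0, b_in=3, p_signal=0.5)
print("allowed gain at the highest frequency [dB]:", 20 * np.log10(h_spec[-1]))
```

Where the NTF is small the bound is loose, and where the shaped noise is strong the filter has to attenuate heavily, which is the shape Fig. 3.1 describes.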

Figure 3.1: A plot of the filter specification, |Hspec| in dB against ωT in rad. The dashed line indicates the transition band.

3.1.1 Phase response

The filter specification derived above only constrains the magnitude function. The phase response has to be linear, which is achieved by using a linear-phase FIR filter.

3.2 Multistage decimation

If the overall sampling rate conversion ratio can be factored into the product

$$\prod_{i=1}^{L} M_i = M \qquad (3.7)$$

where each Mi is an integer, the decimation filter can be implemented using L cascaded sub filters. In this section the optimum number of sub filters and their corresponding decimation rates are determined.

All sub filters work at different data rates; the last sub filters have a lower data rate than the first ones. A low data rate results in a low computational workload, since the number of computations per output sample depends linearly on the data rate.

One also has to take into account the wordlength at each sub filter. The wordlength at the input of the first sub filter is only three bits; it is much higher at the other sub filters, due to multiplications and additions. To make use of this fact, downsampling by a large factor in the first sub filter is preferred. For example, the filter structure with downsampling factors 16 and 2 is examined, but not the opposite case with downsampling factors 2 and then 16.

Seven different filter structures have been investigated. These structures are presented in Fig. 3.3. To find the optimum among them, the Matlab function firpm has been used. Given the filter order and the stopband and passband edges, firpm returns an impulse response that is optimized with the McClellan-Parks-Rabiner algorithm [9].

Figure 3.2: Multistage decimation. [A single filter H(z) followed by ↓M is equivalent to the cascade H1(z), ↓M1, H2(z), ↓M2, ..., Hk(z), ↓Mk.]

Figure 3.3: The seven filter structures that were taken into consideration. [Decimation factors per stage: structure 1: 2, 2, 2, 2, 2; structure 2: 4, 2, 2, 2; structure 3: 4, 4, 2; structure 4: 8, 2, 2; structure 5: 8, 4; structure 6: 16, 2; structure 7: 32.]

To find the required filter order for each sub filter, the filter order was iteratively increased until the filter met the filter specification derived in section 3.1. A schematic of the iteration is presented in Fig. 3.4. The results are presented in Table 3.1.
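The order search described above can be sketched as a loop. Note the substitution: the thesis uses Matlab's firpm for each design step, whereas this sketch uses a Hamming-windowed sinc design so that it depends on NumPy only. The band edges and tolerances are made-up example values, not the specification from section 3.1, and all function names are hypothetical.

```python
import numpy as np

def windowed_sinc_lowpass(order, cutoff):
    """Linear-phase FIR lowpass of the given order; cutoff in rad/sample."""
    n = np.arange(order + 1) - order / 2.0
    ideal = (cutoff / np.pi) * np.sinc(cutoff * n / np.pi)
    return ideal * np.hamming(order + 1)

def meets_spec(h, w_pass, w_stop, ripple_db, atten_db, n_fft=8192):
    """Check passband ripple and stopband attenuation on a dense frequency grid."""
    w = np.linspace(0.0, np.pi, n_fft // 2 + 1)
    mag_db = 20.0 * np.log10(np.maximum(np.abs(np.fft.rfft(h, n_fft)), 1e-12))
    pass_ok = np.max(np.abs(mag_db[w <= w_pass])) <= ripple_db
    stop_ok = np.max(mag_db[w >= w_stop]) <= -atten_db
    return bool(pass_ok and stop_ok)

def minimum_order(w_pass, w_stop, ripple_db, atten_db, max_order=500):
    """Increase the order from N = 1 until the filter fulfills the specification."""
    cutoff = 0.5 * (w_pass + w_stop)
    for order in range(1, max_order + 1):
        h = windowed_sinc_lowpass(order, cutoff)
        if meets_spec(h, w_pass, w_stop, ripple_db, atten_db):
            return order, h
    return None

# Example specification (assumed values for illustration).
spec = dict(w_pass=0.2 * np.pi, w_stop=0.3 * np.pi, ripple_db=0.1, atten_db=40.0)
result = minimum_order(**spec)
print("smallest order that meets the example specification:", result[0])
```

An equiripple design such as firpm generally needs a lower order than a windowed design for the same specification, so the orders found this way are not comparable to those in Table 3.1.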

3.2.1 Computational load

Since the sub filters work at different sample rates, a sub filter running at a high rate needs more calculations per output sample than a filter stage running at a low rate. The computational load (c) for each filter structure is estimated by calculating the number of multiplications per output sample.

In addition, one has to take into account that the filter stages use different word lengths. The word length at the first stage is 3, and it is assumed that the word length at the inputs of all other stages is 24, i.e. the computational load of the first filter stage is a factor 3/24 = 1/8 lower.

$$c = \sum_{i=1}^{L} \left( \frac{N_i + 1}{f_{sf}} \prod_{j=i+1}^{L} M_j \right) \qquad (3.8)$$

$$f_{sf} = \begin{cases} 8 & \text{if } i = 1 \\ 1 & \text{otherwise} \end{cases} \qquad (3.9)$$

Example: Filter structure 4 has three sub filters (L = 3). The decimation factors and filter orders are M1 = 8, M2 = 2, M3 = 2, N1 = 43, N2 = 16 and N3 = 59. c is calculated with equation 3.8:

$$c = \frac{43 + 1}{8} \cdot 2 \cdot 2 + (16 + 1) \cdot 2 + (59 + 1) = 22 + 34 + 60 = 116 \qquad (3.10)$$
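Equations 3.8 and 3.9 translate directly into a short function. The sketch below uses only the Python standard library; the function name and argument layout are hypothetical. The filter lengths N + 1 and decimation factors are those listed for structures 1 and 4 in Table 3.1.

```python
import math

def computational_load(taps, decimations, first_stage_factor=8):
    """Estimate multiplications per output sample (Eq. 3.8 and 3.9).

    taps[i] is N_i + 1 and decimations[i] is M_i for stage i (first stage at index 0).
    """
    c = 0.0
    for i, n_taps in enumerate(taps):
        f_sf = first_stage_factor if i == 0 else 1
        # Each output of stage i is needed prod(M_j for later stages j)
        # times per final output sample.
        rate = math.prod(decimations[i + 1:])
        c += n_taps / f_sf * rate
    return c

c_structure_4 = computational_load([44, 17, 60], [8, 2, 2])
c_structure_1 = computational_load([6, 6, 10, 16, 55], [2, 2, 2, 2, 2])
print(c_structure_4, c_structure_1)
```

The two results agree with the totals for c listed for structures 4 and 1 in Table 3.1.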

Figure 3.4: The loop used to derive a sub filter that fulfills the filter specification, in order to investigate the optimum number of filter stages. [Flow chart: start with N = 1; derive the filter; if the filter fulfills the specification, stop; otherwise set N = N + 1 and repeat.]

3.2.2 Half band filter

To implement the antialiasing filters one can use half band filters, which can reduce the computational load. A half band filter with the impulse response h(n) has the property that

$$h(2p) = 0 \quad \text{for } p \ne 0 \qquad (3.11)$$

In other words, every second filter tap in the impulse response is zero, except for the tap at n = 0. This means that the number of multiplications required for a half band filter of order N is [5]:

$$\text{multiplications} = \begin{cases} \dfrac{N}{2} & \text{if } N \text{ even} \\[4pt] \dfrac{N - 1}{2} & \text{if } N \text{ odd} \end{cases} \qquad (3.12)$$

The drawback of half band filters is that the magnitude function must be symmetric with respect to π/2. This also means that the stopband and passband ripples must be equal. This limitation results in higher filter orders and perhaps also more multiplications.
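The zero-tap property in Eq. 3.11 is easy to see on a simple example. The sketch below assumes NumPy and builds a half band filter as a Hamming-windowed sinc with cutoff π/2; this is a stand-in design chosen for brevity, not the firpm-based design used in this work, and the order is an arbitrary example value.

```python
import numpy as np

def halfband_windowed_sinc(order):
    """Half band lowpass (cutoff pi/2), indexed so that the centre tap is n = 0."""
    k = order // 2
    n = np.arange(-k, k + 1)
    h = 0.5 * np.sinc(n / 2.0) * np.hamming(2 * k + 1)
    return n, h

n, h = halfband_windowed_sinc(order=10)   # hypothetical order
even_taps = h[(n % 2 == 0) & (n != 0)]
print("even-indexed taps away from the centre:", even_taps)
print("centre tap:", h[n == 0][0])
```

The even-indexed taps are zero up to rounding error, so no multiplier is needed for them.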

3.2.3 Simulation results and computational complexity

The results in Table 3.1 were obtained with the algorithm in Fig. 3.4, and equations 3.8 and 3.9 were used to estimate the computational complexity of each filter structure. With the aid of these results, the filter structure to implement was chosen.

Of main interest in Table 3.1 are the estimated computational complexities c and chb for each filter structure, where chb is the computational complexity when half band filters are used. As the table shows, the lowest value of c or chb is obtained for filter structure 4 with half band filters. Structure 4 gives the lowest complexity even if half band filters are not used. Forcing a filter to be a half band filter puts constraints on the filter that result in higher filter orders. In Fig. 3.5, filters 4 and 6 are compared to the filter specification. In both cases the filters meet the specification with a large margin. Such a margin is not necessary, so the filter orders can probably be lower; this is discussed further in section 3.4.

Figure 3.5: Two filters compared to the filter specification. Left: structure 6. Right: structure 4. [Axes: |H6| and |H4| in dB against ωT in rad.]

A mistake

A mistake was first made when estimating the filter orders. These (wrong) results are presented in Table 3.2, where the lowest complexity was achieved for filter structure 6. From here on, filter structure 6 is considered.

3.3 The architecture

Before continuing with the design of the Matlab model, the hardware architecture has to be considered.

The main idea of the architecture is to use one memory for storing data, one coefficient memory and one processing element that performs the convolutions. A buffer at the input stores l input samples. The processing element works in different modes depending on the data wordlength.

3.3.1 Dataflow

This section describes how data goes from input to output. The input is three bits wide, and l inputs are buffered and stored in the data memory. When the first sub filter is evaluated, data is read from the data memory to the processing element, where eight data samples are processed in parallel. The results are written back to the data memory. Finally, the second sub filter is evaluated by reading data from memory to the processing element and then updating the output.

3.3.2 Data memory

The data memory stores input data and results from the first sub filter. l inputs are stored in one memory word. To make reads and writes as simple as possible, l is a power of two (l = 2^v), and the wordlength in memory is given by equation 3.13.

| Structure | Sub filter | Decimation | N + 1 | c | chb |
|---|---|---|---|---|---|
| 1 | H11 | 2 | 6 | 12 | 6 |
| 1 | H12 | 2 | 6 | 48 | 24 |
| 1 | H13 | 2 | 10 | 40 | 20 |
| 1 | H14 | 2 | 16 | 32 | 16 |
| 1 | H15 | 2 | 55 | 55 | 55 |
| 1 | total | | | 187 | 121 |
| 2 | H21 | 4 | 17 | 17 | 17 |
| 2 | H22 | 2 | 10 | 40 | 20 |
| 2 | H23 | 2 | 14 | 28 | 14 |
| 2 | H24 | 2 | 63 | 63 | 63 |
| 2 | total | | | 148 | 114 |
| 3 | H31 | 4 | 17 | 17 | 17 |
| 3 | H32 | 4 | 34 | 68 | 68 |
| 3 | H33 | 2 | 94 | 94 | 94 |
| 3 | total | | | 179 | 173 |
| 4 | H41 | 8 | 44 | 22 | 22 |
| 4 | H42 | 2 | 17 | 34 | 17 |
| 4 | H43 | 2 | 60 | 60 | 60 |
| 4 | total | | | 116 | 99 |
| 5 | H51 | 8 | 44 | 22 | 22 |
| 5 | H52 | 4 | 140 | 139 | 139 |
| 5 | total | | | 161 | 161 |
| 6 | H61 | 16 | 177 | 44 | 44 |
| 6 | H62 | 2 | 89 | 89 | 89 |
| 6 | total | | | 133 | 133 |
| 7 | H7 | 32 | 1521 | 190 | 190 |
| 7 | total | | | 190 | 190 |

Table 3.1: Required filter order and computational complexity for each sub filter. N is the filter order, c is the computational complexity when half band filters are not used, and chb is the computational complexity when half band filters are used.

| Filter structure | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| c | 232 | 157 | 177 | 136 | 200 | 121 | 190 |

Table 3.2: Computational complexity. Note that these numbers are wrong.

Figure 3.6: Main idea of the architecture. bin is the input word length, b1 the data word length between the sub filters and bc the coefficient word length. PE is the processing element. [Block diagram: input buffer (bin = 3), data memory (b1), coefficient ROM, PE, output (bout = 16).]

$$b_1 = 3 \cdot 2^{v} \qquad (3.13)$$

b1 is also the number of bits that represents the result from the first filter stage. b1 = 6 and b1 = 12 are too few bits and b1 = 48 is too many, so the best choice is b1 = 24. Consequently l is set to 24/3 = 8. The data is stored in memory as in Fig. 3.7.

The number of words needed in the data memory is determined by the lengths of the impulse responses of the filters. Because l inputs are stored in one memory word, the number of words needed for the first sub filter is reduced by a factor l. The total number of words in the memory is

$$\left\lceil \frac{N_1 + 1}{l} \right\rceil + N_2 + 1 \qquad (3.14)$$

Figure 3.7: Data formats. a) Eight 3-bit inputs are stored in one 24-bit word in the data memory. b) The result from the first sub filter is 24 bits wide; one word is stored on each line in the data memory.

3.3.3 Clock frequency

To determine the clock frequency fclk, one has to know the number of clock cycles used to produce one output. If the output has the frequency fsample, then fclk is

$$f_{clk} = K \cdot f_{sample} \qquad (3.15)$$

where K is the number of cycles needed to produce one output. To keep the design simple, K should be a power of two:

$$K = 2^{x} \qquad (3.16)$$

A bottleneck in the design is the data memory, which can perform only one read or one write each clock cycle. To produce one output, 32 input samples have to be taken in. Because eight samples are written to a single line in memory, 32/8 = 4 cycles are needed to store the input values in memory. The decimation rate of the first filter stage is 16, so the results from this filter stage have to be stored 32/16 = 2 times per output sample. Together with the reads for the two evaluations of the first sub filter and the single evaluation of the second sub filter, this gives

$$2 \left\lceil \frac{N_1 + 1}{8} \right\rceil + (N_2 + 1) + 4 + 2 = 2^{x} \qquad (3.17)$$

In section 3.4, x will be determined.
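The cycle budget in Eq. 3.17, the memory size in Eq. 3.14 and the clock frequency in Eq. 3.15 can be evaluated together. The sketch below uses only the Python standard library; the function names are hypothetical, and rounding the first term up to whole memory words is an assumption. The example orders N1 = 63 and N2 = 41 are the ones the text arrives at in section 3.4.

```python
import math

def cycles_per_output(n1, n2, samples_per_word=8, m1=16, m_total=32):
    """Memory accesses per output sample, following Eq. 3.17."""
    evaluations_h1 = m_total // m1                        # sub filter one runs twice
    reads_h1 = evaluations_h1 * math.ceil((n1 + 1) / samples_per_word)
    reads_h2 = n2 + 1
    input_writes = m_total // samples_per_word            # 32 / 8 = 4
    result_writes = m_total // m1                         # 32 / 16 = 2
    return reads_h1 + reads_h2 + input_writes + result_writes

def memory_words(n1, n2, samples_per_word=8):
    """Number of words in the data memory, following Eq. 3.14."""
    return math.ceil((n1 + 1) / samples_per_word) + n2 + 1

k = cycles_per_output(63, 41)
f_clk_hz = k * 44_100                                     # Eq. 3.15
print(k, "cycles per output,", f_clk_hz / 1e6, "MHz,", memory_words(63, 41), "words")
```

With these orders the budget comes out at exactly 64 cycles, a power of two as required by Eq. 3.16.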

3.4 Filter optimization

The filters derived in section 3.2 fulfill the specification with a large margin, and the requirement in the specification is only a question of SNR. This suggests that the filter orders can be lower.

In order to get an optimal filter, the filter coefficients given by the Parks-McClellan algorithm were optimized using the fminimax function in Matlab. The optimization maximizes the SNR subject to the passband ripple requirement in Table 1.1. The optimization problem is formulated as:

$$\text{maximize} \quad SNR \qquad (3.18)$$

$$\text{subject to} \quad 1 - \delta_c \le |H(\omega_{pb})| \le 1 + \delta_c, \quad \omega_{pb} \in [0, \omega_c] \qquad (3.19)$$

In order to determine 2^x in equation 3.17, four different values of 2^x have been examined. For each 2^x, N1 and N2 have been chosen so that they fulfill equation 3.17 and N1 ≈ 2N2. An SNR value has been derived with the optimization technique for each pair N1, N2. See Table 3.3.

| 2^x | N1 | N2 | SNR [dB] |
|---|---|---|---|
| 32 | 47 | 19 | 58.4 |
| 64 | 79 | 37 | 102.8 |
| 128 | 207 | 95 | 104.4 |
| 256 | 399 | 199 | 105.0 |

Table 3.3: SNR for different numbers of cycles per output. 2^x = 64 is chosen.

According to the results in Table 3.3, 2^x = 64 is chosen, because 2^x = 32 results in a too low SNR and 2^x = 128 and 2^x = 256 yield only a small improvement. Compared to the results given in section 3.2, these filter orders differ by more than a factor of two. Equation 3.17 can now be rewritten as:

$$2 \left\lceil \frac{N_1 + 1}{8} \right\rceil + (N_2 + 1) + 4 + 2 = 64 \;\Rightarrow\; N_2 = 57 - 2 \left\lceil \frac{N_1 + 1}{8} \right\rceil \qquad (3.20)$$

Figure 3.8: SNR for different values of N1 and N2. The best trade-off is reached for N1 = 63 and N2 = 41 with SNR = 103.3 dB. [Axes: SNR in dB against N1.]

When the number of cycles is set to 64, one has to decide how to divide these cycles between the two filter stages. 24 filter structures with different values of N1 and N2 were optimized to find the best trade-off between N1 and N2. The resulting SNR values are plotted in Fig. 3.8. The best SNR is reached for N1 = 63 and consequently N2 = 41, which gives SNR = 103.3 dB.

The resulting filter after optimization has a much higher stopband attenuation than the filter derived with firpm. The filter coefficients are presented in appendix A.

3.5 Coefficient word length

A Matlab model can have (almost) infinite precision in the filter coefficients. In hardware, the filter coefficients are represented with a finite number of bits. This affects both the passband ripple and the SNR. Tables 3.4 and 3.5 show how the SNR and the passband ripple are affected by the word length.

According to the results in Table 3.4, the SNR depends only on the coefficient wordlength of the first sub filter, bH1. Hence this table was used to determine bH1. If the wordlength is chosen as bH1 = 16 or bH1 = 17, the SNR almost reaches its ideal value. Hence bH1 is set to 16.

To decide the wordlength for the second sub filter (bH2), Table 3.5 was used. According to the filter specification (see Table 1.1), the passband ripple has to be less than 0.035 dB. If bH2 is set to 13, the passband ripple is 0.0232 dB, which gives a design margin of 0.0118 dB at a low cost.

A better filter could be obtained if the coefficient wordlength were included as a constraint in the optimization. Such a constraint is hard to formulate, however, and the optimization would take a lot of time for a small improvement in the filter design.
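The effect of a finite coefficient word length can be explored with a simple rounding model. This is a sketch under stated assumptions: NumPy, round-to-nearest onto a two's complement fractional grid with one sign bit, and a made-up windowed-sinc filter in place of the optimized coefficients from appendix A. The function name is hypothetical.

```python
import numpy as np

def quantize(h, bits):
    """Round coefficients to a grid with step 2^-(bits - 1), one bit being the sign."""
    step = 2.0 ** -(bits - 1)
    return np.round(h / step) * step

# Hypothetical lowpass filter (illustration only).
n = np.arange(-20, 21)
h = 0.25 * np.sinc(0.25 * n) * np.hamming(41)

w = np.linspace(0.0, np.pi, 512)
E = np.exp(-1j * np.outer(w, np.arange(len(h))))   # DTFT evaluation matrix

for bits in (11, 13, 16):
    h_q = quantize(h, bits)
    coeff_error = np.max(np.abs(h - h_q))
    response_error = np.max(np.abs(E @ h - E @ h_q))
    print(bits, "bits: coefficient error", coeff_error, "response error", response_error)
```

Each coefficient moves by at most half a quantization step, so the frequency response can change by at most the number of taps times 2^-bits. This bound is what ties the word length to the achievable stopband attenuation and passband ripple.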

Figure 3.9: Magnitude response of the optimized filter compared with the filter derived with the Matlab function firpm. Note that the attenuation is much higher for the optimized filter. The two lower plots are the magnitude responses of the two sub filters H1 and H2. [Panels: filter before optimization, optimized filter, sub filter 1 after optimization, sub filter 2 after optimization; axes: |H| in dB against ωT in rad.]

Figure 3.10: [Plot of the passband of the optimized filter together with the filter requirements, |H| in dB against ωT1 in rad; the caption was not recovered.]

| bH1 \ bH2 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|
| 14 | 92 | 92 | 92 | 92 |
| 15 | 97 | 97 | 97 | 97 |
| 16 | 102 | 102 | 102 | 102 |
| 17 | 103 | 103 | 103 | 103 |

Table 3.4: SNR in dB for different coefficient wordlengths. bH1 and bH2 are the coefficient wordlengths for filter stages one and two respectively.

| bH1 \ bH2 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|
| 14 | 0.0311 | 0.0281 | 0.0241 | 0.0212 |
| 15 | 0.0324 | 0.0293 | 0.0234 | 0.0218 |
| 16 | 0.0332 | 0.0301 | 0.0232 | 0.0226 |
| 17 | 0.0329 | 0.0298 | 0.0232 | 0.0223 |

Table 3.5: Passband ripple in dB for different coefficient wordlengths. bH1 and bH2 are the coefficient wordlengths for filter stages one and two respectively.

3.5.1 Constant zero bits in coefficients

If the most significant bits are zero for all filter coefficients in an impulse response, it is unnecessary to store these zeros and even more unnecessary to multiply with bits that are constantly zero. To find out how many bits are constantly zero, the following expression was used, where h is the impulse response and bzeros is the number of bits that are constantly zero.

$$\max(|h(n)|) \cdot 2^{b_{zeros}} \le 1 \;\Rightarrow\; b_{zeros} \le \log_2\left(\frac{1}{\max(|h(n)|)}\right) \qquad (3.21)$$

Consequently, the number of active bits bactive is

$$b_{active} = b_H - b_{zeros} \qquad (3.22)$$

Table 3.6 lists bzeros and bactive, derived from equations 3.21 and 3.22, for both sub filters. The number of active bits is equal for the two sub filters, which makes it easy to design the hardware in an efficient way. The impulse responses are multiplied by two constants. To compensate for this, the output is divided by the same constants, see the post processing block in section 4.1.

$$h_{1new}(n) = 2^{4} h_1(n) \qquad (3.23)$$

$$h_{2new}(n) = 2^{1} h_2(n) \qquad (3.24)$$
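Equations 3.21 and 3.22 amount to a floor of a base-2 logarithm. The sketch below assumes NumPy; the function names are hypothetical, and the two example impulse responses are invented so that their peak values give the same bzeros as Table 3.6. They are not the coefficients from appendix A.

```python
import math
import numpy as np

def constant_zero_bits(h):
    """Largest integer b_zeros with max|h(n)| * 2^b_zeros <= 1 (Eq. 3.21)."""
    return math.floor(math.log2(1.0 / np.max(np.abs(h))))

def active_bits(h, b_h):
    """Number of coefficient bits that actually carry information (Eq. 3.22)."""
    return b_h - constant_zero_bits(h)

# Invented impulse responses with peak values 0.05 and 0.4 (illustration only).
h1_example = np.array([0.01, -0.03, 0.05, -0.03, 0.01])
h2_example = np.array([-0.1, 0.4, -0.1])

print(constant_zero_bits(h1_example), active_bits(h1_example, b_h=16))
print(constant_zero_bits(h2_example), active_bits(h2_example, b_h=13))

# Scaling by 2^b_zeros removes the constant zero bits and keeps |h(n)| <= 1.
h1_scaled = h1_example * 2 ** constant_zero_bits(h1_example)
```

With the wordlengths bH1 = 16 and bH2 = 13 chosen in section 3.5, both examples end up with 12 active bits, the same balance that Table 3.6 reports for the real filters.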

| Sub filter | bzeros | bactive |
|---|---|---|
| H1 | 4 | 12 |
| H2 | 1 | 12 |

Table 3.6: The number of bits equal to zero (bzeros) and the number of active bits (bactive). Note that the number of active bits is equal for both filters.

Figure 3.11: Schedule. [Timeline of the cycles for one output. Legend: W1 = write result from first sub filter to memory; Wi = write input data to memory; the remaining cycles are marked as compute sub filter one (first time), compute sub filter one (second time) and compute sub filter two.]

3.6

Schedule

There are 64 cycles available to produce one output. To decide how these cycles should be divided between the two sub filters, a schedule was made. A bottleneck in the design is the data memory: only one word can be written or read each cycle. Input data has to be written to the memory four times and results from the first sub filter have to be written twice. When the convolutions for the two sub filters are computed, data is read from the data memory. The resulting schedule is shown in Fig 3.11.
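The cycle budget can be checked with simple arithmetic. The sketch below is a plausibility check, not the schedule itself: the write counts come from the text, while the read counts are an assumption based on the tap counts in Appendix A (64 and 42) and on the processing element handling eight 3-bit inputs or one 24-bit input per cycle.

```python
# Cycle budget for one output sample; the data memory allows one access per cycle.
CYCLES_PER_OUTPUT = 64

# Memory writes stated in the text.
input_writes = 4          # Wi: input words written per output
subfilter1_writes = 2     # W1: first sub filter results written per output

# Assumed read counts (not stated in this section): tap counts from
# Appendix A, eight taps per read for sub filter one, one tap per read
# for sub filter two.
TAPS_H1, TAPS_H2 = 64, 42
reads_h1_once = TAPS_H1 // 8          # 8 cycles, run twice per output
reads_h2 = TAPS_H2                    # 42 cycles, run once per output

used = input_writes + subfilter1_writes + 2 * reads_h1_once + reads_h2
print(used, CYCLES_PER_OUTPUT - used)
```

With these assumptions the writes and reads add up to exactly the 64 available cycles, which would explain why the data memory is described as the bottleneck.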

3.7

DC level

The output DC level shall be zero. Because the input is in the range $[0 : in_{max}]$, the DC level of the filtered signal will not be zero.

$$in_{max} = \sum_{i=1}^{3} 2^{-i} = 0.875 \tag{3.25}$$

This causes a constant DC offset. To compensate for this offset, a constant $c_{DC}$ is added after the filter. In the equation below '*' is the convolution operator.

$$x(n) = \frac{in_{max}}{2} \tag{3.26}$$

Figure 3.12: Magnitude spectrum of a test signal, $|X|$ [dB] versus $\omega T_1$ [rad]. The dashed line indicates the transition band.

Figure 3.13: The test signal filtered by the Matlab model, $|Y|$ [dB]. The left plot (versus $\omega T_1$ [rad]) shows the signal filtered but not downsampled; the dashed line indicates the transition band. The right plot (versus $\omega T_2$ [rad]) shows the filtered and downsampled signal.

$c_{DC}$ is found to be −0.4366.
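As a rough cross-check of this value, the sketch below multiplies the mid-scale input level by the DC gain of the two sub filters. It relies on assumptions that are not stated in this section: that Appendix A holds $h_1 \cdot 2^{16}$ and $h_2 \cdot 2^{13}$, and that the DC gain of each sub filter is the sum of its coefficients. It is one reading that is consistent with the stated constant, not necessarily the derivation used in the thesis.

```python
# in_max from equation 3.25: the largest 3-bit input value.
in_max = sum(2.0 ** -i for i in range(1, 4))      # 0.875
x_dc = in_max / 2                                  # equation 3.26

# Assumptions: Appendix A holds h1 * 2**16 and h2 * 2**13, and the
# coefficient sums below (added up from Tables A.1 and A.2) give the
# DC gain of each sub filter.
SUM_H1, SUM_H2 = 87112, 6150
dc_gain = (SUM_H1 / 2 ** 16) * (SUM_H2 / 2 ** 13)

# The compensation constant cancels the filtered DC level.
c_dc = -x_dc * dc_gain
print(round(c_dc, 4))
```

Rounded to four decimals, the result agrees with the value of $c_{DC}$ given above.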

3.8

A test case

To test the Matlab model of the filter, a test signal was derived from the ∆Σ modulator that precedes the filter, see section 1.1. The test signal is a sine wave with a frequency of π/2, oversampled by a factor of 32. Noise from the ∆Σ modulator is added to this signal. The frequency spectrum of the test signal is plotted in Fig 3.12, and the filtered test signal is plotted in Fig 3.13. Note that the peak in the resulting output is located at π/2, that the inband signal is intact, and that the noise in the stopband is attenuated.


Chapter 4

Hardware implementation

This chapter describes how the Matlab model was implemented in hardware. First a VHDL model was created; its processing element is described in detail. Secondly I explain how the VHDL model was synthesized to a gate level design.

4.1

VHDL model

To implement the VHDL model, HDL Designer was used. To start with, a detailed architecture was created, see Fig 4.2. A brief description of each block is given below. The processing element is discussed in more detail in section 4.2.

control The control block keeps track of the schedule (see section 3.6 and Fig 3.11). It sends control signals to the other blocks.

in buffer The in buffer is a serial to parallel block. It buffers eight inputs. The output is 8 ∗ 3 = 24 bits wide.

Coefficient memory The coefficient memory stores the filter coefficients. Eight coefficients can be read from the memory in parallel.

Data memory The Data memory stores input data and results from the first sub filter. There are 50 words in memory, each word has a wordlength of 24 bits. See Fig 4.1. The given standard cell library provides a register file that was used.

Memory pointer The Memory pointer is a pointer to the data memory. The wordlength needed is:

$$b_{pointer} = \lceil \log_2(50) \rceil = 6 \tag{4.1}$$

Post process The post process block handles the DC level (see section 3.7). It also compensates for the constant gain discussed in section 3.5.1. Here the output is saturated and truncated to match the 16-bit output format [6].
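Two of the blocks above are easy to illustrate in software. The sketch below packs eight 3-bit samples into one 24-bit word, as the in buffer does, and evaluates the pointer wordlength of equation 4.1. The function names and the bit ordering (first sample in the least significant bits) are assumptions made for illustration; the text does not specify the ordering.

```python
import math

def pack_inputs(samples):
    """Pack eight 3-bit samples into one 24-bit word (serial to parallel)."""
    if len(samples) != 8 or any(not 0 <= s <= 7 for s in samples):
        raise ValueError("expected eight samples in the range 0..7")
    word = 0
    for i, s in enumerate(samples):
        word |= s << (3 * i)      # assumed order: first sample in the low bits
    return word

def pointer_width(words):
    """Bits needed to address `words` memory locations (equation 4.1)."""
    return math.ceil(math.log2(words))

word = pack_inputs([1, 2, 3, 4, 5, 6, 7, 0])
print(f"{word:o}", pointer_width(50))   # octal shows one digit per sample
```

Printing the word in octal is convenient here because each octal digit corresponds to one 3-bit sample.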

Figure 4.1: The data memory. Eight words are needed to store inputs to the first sub filter. 42 words are needed to store inputs to the second sub filter.

Figure 4.2: Architecture block diagram with the blocks In buffer, Coefficient memory, Data memory, Memory pointer, Processing element, Post process and control, between the 3-bit input (in) and the 16-bit output (out). Bus widths marked in the diagram: 3, 24, 16, 8x13 and 6.

4.2

Processing element

This section describes the Processing element.

4.2.1

What is it doing?

The task of the processing element is to calculate the convolutions for the two sub filters. The convolution operation consists of multiplication and accumulation, see Fig 4.3. There are two inputs: input data and coefficient data.

Figure 4.3: Multiply and accumulate (MAC) operation for calculating convolutions. d and c are input data and coefficient data respectively. acc is the accumulator output.

4.2.2

The main idea

The main idea is to use the same multiplier for calculating the convolution for both the first and second sub filter, even though the wordlength is 3 and 24 bits respectively.

When the input data is 24 bits wide, the data is split into eight parts, each three bits wide. Each part of the input data is multiplied by the coefficient to produce eight partial products. These are shifted left and added so that the product $p_2 = c \cdot d$ is evaluated, see Fig 4.4.

$$p_2 = \sum_{i=0}^{7} c\, d_i\, 2^{3i} \tag{4.2}$$

When the convolution for the first sub filter is to be evaluated, the input is three bits wide and eight words are grouped together, see Fig 3.7. Now no shifts are performed. When these partial products are added, the result is eight MAC operations each clock cycle, see Fig 4.4.

$$p_1 = \sum_{i=0}^{7} c\, d_i \tag{4.3}$$

The architecture for implementing this can be seen in Fig 4.4. It enables the processing element to calculate either eight MAC operations per cycle if the input wordlength is three bits, or one MAC operation per cycle if the input wordlength is 24 bits.
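The two modes of equations 4.2 and 4.3 can be modelled in a few lines. The sketch below is a behavioural illustration only, not the VHDL implementation. It assumes two's complement 24-bit data in which the top 3-bit part carries the sign, and it allows one coefficient per lane in the 3-bit mode, since the coefficient memory delivers eight coefficients in parallel. All function names are hypothetical.

```python
def split_3bit(d):
    """Split a 24-bit two's complement word into eight 3-bit parts d_0..d_7.

    Parts d_0..d_6 are unsigned; the top part d_7 carries the sign.
    """
    bits = d & 0xFFFFFF
    parts = [(bits >> (3 * i)) & 0x7 for i in range(8)]
    if parts[7] >= 4:
        parts[7] -= 8
    return parts

def product_24bit(c, d):
    """One 24-bit product from eight shifted partial products (equation 4.2)."""
    return sum((c * d_i) << (3 * i) for i, d_i in enumerate(split_3bit(d)))

def products_3bit(coeffs, samples):
    """Eight unshifted 3-bit products added together (equation 4.3).

    Assumption: one coefficient per lane; passing the same c eight
    times gives the form written in equation 4.3.
    """
    return sum(c * d_i for c, d_i in zip(coeffs, samples))

print(product_24bit(3635, -123456) == 3635 * -123456)
print(products_3bit([1, 2, 3, 4, 5, 6, 7, 8], [7] * 8))
```

The same eight small multipliers serve both modes; only the shifts in front of the adder tree differ, which is the point of the shared architecture.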

Figure 4.4: The processing element can perform convolution with an input wordlength of 3 or 24 bits. Each $d_i$ is 3 bits wide. The diagram shows eight multipliers ($c \cdot d_0$ to $c \cdot d_7$), left shifts of 3, 6, 9, 12, 15, 18 and 21 bits, an adder tree producing $p_{1,2}$, and an accumulator (acc) with wordlength $b_{internal}$.

4.2.3

Sign extension

If we want to increase the wordlength of a number, we have to copy the sign bit. This is referred to as sign extension, see the example below:

x0 x1 x2 = x0 x0 x0 x1 x2

In the multiplexers in Fig 4.4, sign extension causes a significant load on the sign bit. This can be avoided by inverting the sign bit and adding a compensation vector, see Fig 4.5. This is similar to the Baugh-Wooley multiplier [6]. The technique reduces the load on the sign bits at the cost of one extra addition in the adder tree.
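The equivalence between copying the sign bit and inverting it with a compensation vector can be verified exhaustively for small wordlengths. The sketch below is an illustration with hypothetical function names; words are modelled as non-negative integers holding two's complement bit patterns, and the wordlengths 7, 4 and 10 are chosen to match the example in Fig 4.5.

```python
def sign_extend(x, w, total):
    """Reference: copy the sign bit of a w-bit word into the upper bits."""
    sign = (x >> (w - 1)) & 1
    upper = ((1 << (total - w)) - 1) << w if sign else 0
    return x | upper

def compensation(w, total):
    """Ones from the sign position upwards: -2**(w-1) modulo 2**total."""
    return (-(1 << (w - 1))) & ((1 << total) - 1)

def extend_by_inversion(x, w, total):
    """Invert the sign bit, then add the compensation vector."""
    flipped = x ^ (1 << (w - 1))
    return (flipped + compensation(w, total)) & ((1 << total) - 1)

# Both methods agree for every 7-bit word extended to 10 bits.
print(all(extend_by_inversion(x, 7, 10) == sign_extend(x, 7, 10)
          for x in range(1 << 7)))

# When several numbers are added, the vectors can be summed in advance.
combined = (compensation(7, 10) + compensation(4, 10)) & 0x3FF
print(f"{combined:010b}")
```

Because the compensation vectors are constants, their sum can be precomputed, so only one extra operand enters the adder tree regardless of how many numbers are added.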

4.2.4

Internal wordlength

The internal wordlength ($b_{internal}$) in the processing element has to be long enough to prevent overflow. The worst case input needs the following number of bits to represent the output of the first sub filter:

$$b_{internal1} = \left\lceil \log_2\left(7 \sum_{n=0}^{N_1} |h_1(n)|\right) \right\rceil = 20 \tag{4.4}$$

The factor 7 derives from the input range, whose maximum is 7. The result from the first sub filter is only 20 bits while the memory is 24 bits wide, so 4 bits in each memory word are unused when storing results from the first sub filter. This also means that one of the multipliers (the leftmost in Fig 4.4) will not be used for the second sub filter.

The number of bits needed to represent the result from the second sub filter is:

$$b_{internal} = \left\lceil \log_2\left(7 \sum_{n=0}^{N_1} |h_1(n)| \cdot \sum_{n=1}^{N_2+1} |h_2(n)|\right) \right\rceil = 34 \tag{4.5}$$
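Equations 4.4 and 4.5 can be evaluated directly from the integer coefficients listed in Appendix A. The sketch below does this; the only liberty taken is that each impulse response is built from its first half, using the symmetry visible in Tables A.1 and A.2. The helper name `bits_needed` is hypothetical.

```python
import math

# First halves of the symmetric impulse responses in Appendix A.
H1_HALF = [-1, -1, -1, 2, 9, 21, 40, 68, 108, 160, 227, 309, 409, 527, 664,
           819, 994, 1186, 1393, 1612, 1838, 2069, 2299, 2525, 2741, 2943,
           3127, 3288, 3422, 3526, 3598, 3635]
H2_HALF = [7, -58, 112, -61, -84, 135, 25, -181, 48, 217, -163, -210, 296,
           161, -452, -66, 680, -111, -1155, 482, 3453]
h1 = H1_HALF + H1_HALF[::-1]     # 64 coefficients
h2 = H2_HALF + H2_HALF[::-1]     # 42 coefficients

IN_MAX = 7                       # largest 3-bit input value

def bits_needed(worst_case):
    """Bits needed to represent a worst case magnitude."""
    return math.ceil(math.log2(worst_case))

worst1 = IN_MAX * sum(abs(c) for c in h1)     # argument of equation 4.4
worst2 = worst1 * sum(abs(c) for c in h2)     # argument of equation 4.5
print(bits_needed(worst1), bits_needed(worst2))
```

With the Appendix A coefficients this gives 20 and 34 bits, matching equations 4.4 and 4.5.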

Sign extension of x:
  0  0  0  x̄0 x1 x2 x3 x4 x5 x6 ...
+ 1  1  1  1  0  0  0  0  0  0  ...
= x0 x0 x0 x0 x1 x2 x3 x4 x5 x6 ...

Sign extension of y:
  0  0  0  0  0  0  ȳ0 y1 y2 y3 ...
+ 1  1  1  1  1  1  1  0  0  0  ...
= y0 y0 y0 y0 y0 y0 y0 y1 y2 y3 ...

x + y =
  0  0  0  x̄0 x1 x2 x3 x4 x5 x6 ...
+ 0  0  0  0  0  0  ȳ0 y1 y2 y3 ...
+ 1  1  1  1  0  0  0  0  0  0  ...
+ 1  1  1  1  1  1  1  0  0  0  ...
=
  0  0  0  x̄0 x1 x2 x3 x4 x5 x6 ...
+ 0  0  0  0  0  0  ȳ0 y1 y2 y3 ...
+ 1  1  1  0  1  1  1  0  0  0  ...

Figure 4.5: Sign extension and addition of the binary numbers x and y. Sign extension is done by inverting the sign bit and adding a compensation vector. When adding several numbers, the compensation vectors can be summed and precomputed. Therefore the load on the most significant bit is reduced at the cost of one extra addition.

x33 x32 x31 x30 x29 . . . x18 x17 x16 x15 x14 . . . x1 x0

Figure 4.6: 16 bits to the output selected from the accumulator in the processing element.

$L_\infty$ norm

To determine which 16 of the 34 bits shall represent the output, a measure of the size of the signal was needed. For this purpose the $L_\infty$ norm was used, which is defined as [4]:

$$\|X(\omega T)\|_\infty = \max\{|X(\omega T)|\} \tag{4.6}$$

The number of bits needed is:

$$b = \lceil \log_2(\max\{7 \cdot |H(\omega T)|\}) \rceil = 32 \tag{4.7}$$

The 16 output bits are chosen as in Fig 4.6.
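The value in equation 4.7 can be made plausible with the Appendix A coefficients, and the selection of output bits can be sketched as a shift followed by saturation. Both parts rest on assumptions that go beyond the text: the peak of $|H(\omega T)|$ is taken at DC, where it equals the product of the coefficient sums, and the window position used in `select_output` (a hypothetical helper) is an illustrative choice, not necessarily the one shown in Fig 4.6.

```python
import math

IN_MAX = 7
# Assumption: the filter is lowpass and its peak gain is at DC, where
# |H| equals the product of the coefficient sums of Tables A.1 and A.2.
SUM_H1, SUM_H2 = 87112, 6150
peak = IN_MAX * SUM_H1 * SUM_H2
b = math.ceil(math.log2(peak))       # equation 4.7

def select_output(acc, msb=31, out_bits=16):
    """Keep out_bits bits from bit msb downwards, with saturation.

    The window position is a hypothetical choice for illustration.
    """
    y = acc >> (msb + 1 - out_bits)              # truncate the low bits
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, y))                   # saturate

print(b, hex(select_output(0x12345678)), select_output(1 << 33))
```

Truncation discards the least significant accumulator bits, while saturation clamps values that would otherwise wrap around in the 16-bit output format.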

4.3

Synthesis

To translate the VHDL model to a gate level design, the synthesis tool Design Compiler (DC) was used. To synthesize, DC needs an RTL hardware description and a standard cell library. DC can then produce a gate level netlist, which is a complete description of the RTL hardware description where all components are standard cells (for example AND-gates and OR-gates) [8].

block                area     %      power [µW]   %
Processing element   23253    50.6   35.4         73.3
Post process         1777     3.9    2.27         4.7
In buffer            1185     2.6    1.65         3.4
Data memory          14059    30.6   3.31         6.9
Memory pointer       1708     3.7    1.83         3.8
Control              390      0.8    0.8          1.7
Coefficient ROM      3353     7.3    2.76         5.7
total                45975           47.95

Table 4.1: Area and power consumption for the circuit. The area has no unit.

4.3.1

Synthesis results

Timing

The longest path in the design is called the critical path, and the time to execute the critical path is denoted $T_{CP}$. If $T_{CP}$ is lower than the clock period ($T_{clk}$), the timing constraints on the circuit are fulfilled. $T_{clk}$ is found as:

$$T_{clk} = \frac{1}{f_s \cdot 32 \cdot 2} = 355\ \text{[ns]} \tag{4.8}$$

$T_{CP}$ is found with the Design Compiler timing report function. The critical path starts in the control block, goes through the coefficient memory and the processing element, and ends in the post processing block.

$$T_{CP} = 28.84\ \text{[ns]} \tag{4.9}$$

Since $T_{CP}$ is lower than $T_{clk}$, the timing constraint is fulfilled.
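The size of the timing margin follows from the two numbers above. The sketch below only rearranges equations 4.8 and 4.9; the derived sample rate is what equation 4.8 implies for a 355 ns clock period and is not a value quoted from the text.

```python
T_CLK_NS = 355.0     # clock period, equation 4.8
T_CP_NS = 28.84      # critical path, equation 4.9

slack_ns = T_CLK_NS - T_CP_NS
margin = T_CLK_NS / T_CP_NS
f_max_mhz = 1e3 / T_CP_NS            # highest clock rate the path allows

# Sample rate implied by equation 4.8 (64 clock cycles per output sample).
f_s_khz = 1e6 / (T_CLK_NS * 32 * 2)

print(f"slack {slack_ns:.2f} ns, margin {margin:.1f}x, "
      f"f_max {f_max_mhz:.1f} MHz, f_s {f_s_khz:.1f} kHz")
```

The critical path uses less than a tenth of the clock period, so the design has a wide margin at audio sample rates.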

Area and power consumption

The area and power consumption of the circuit were estimated by Design Compiler, using the functions report_area and report_power. Table 4.1 presents the results. The area has no unit; the numbers in the table only serve as a comparison between the blocks. The processing element and the data memory are the blocks that use the most area and power, especially the processing element, which uses half of the area and 73 % of the total power.

4.3.2

Validation

To validate the behavior of the gate level design, a testbench was created (Fig 4.7). A test vector was created, see section 3.8, and written to a file that could be read by ModelSim. ModelSim is a simulation tool that can simulate VHDL designs. The gate level design was translated to a VHDL netlist, which can also be simulated in ModelSim. The results from the simulation in ModelSim were translated back to Matlab. Finally, the ModelSim simulation and the Matlab model were compared in Matlab. This comparison resulted in a pass.

Figure 4.7: A testbench to validate the behavior of the gate level design. The test vector is fed to both the gate level design and the Matlab model, and their outputs are compared to give a pass/fail result.


Chapter 5

Future work

The filter that has been presented in this work is of course not perfect. Below are some examples of how the filter can be improved.

• To start with I think that the estimations of the filter orders in section 3.2 are not reliable, because these results differ a lot from the filter orders derived with the optimization technique. Compare 177 and 89 to 63 and 41; the difference is more than a factor of two. To improve the filter design one should examine other filter structures in detail with the optimization technique. Of main interest is structure 4 (decimation with 8, 2 and 2).

• The filter coefficients are represented with a finite number of bits. If this is written as a constraint in the optimization formulation, a better filter could be achieved.

• Moreover the processing element consumes 73 % of the total power consumption. If the implementation can be improved, area can be saved. Perhaps one can save power by implementing the multipliers using carry save adders instead of regular adders.


Bibliography

[1] Henrik Ohlsson, Behzad Mesgarzadeh, Kenny Johansson, Oscar Gustafsson, Per Löwenborg, Håkan Johansson, Atila Alvandpour, A 16 GSPS 0.18 µm CMOS Decimator for Single-Bit Σ∆-Modulation

[2] Sune Söderqvist, (2005), Tidsdiskreta Signaler och System

[3] P.P. Vaidyanathan, (1993), Multirate systems and filter banks

[4] Lars Wanhammar, Håkan Johansson, (2002), Digital filters

[5] Fred Mintzer, (1982), On Half-Band, Third-Band, and Nth-Band FIR Filters and Their Design, IEEE Transactions on Acoustics, Speech, and Signal Processing

[6] Lars Wanhammar, (1999), DSP Integrated circuits

[7] Anil K. Maini, (2007), Digital Electronic Principles, Devices and Applications

[8] James R. Armstrong, F. Gail Gray, (2000), VHDL Design Representation and Synthesis

[9] McClellan J.H., Parks T.W., Rabiner L.R., (1973), A computer program for designing optimum FIR linear phase digital filters, IEEE Transactions on Audio and Electroacoustics


Appendix A

Filter coefficients

i    h1(i)    i    h1(i)    i    h1(i)    i    h1(i)
0    -1       16   994      32   3635     48   819
1    -1       17   1186     33   3598     49   664
2    -1       18   1393     34   3526     50   527
3    2        19   1612     35   3422     51   409
4    9        20   1838     36   3288     52   309
5    21       21   2069     37   3127     53   227
6    40       22   2299     38   2943     54   160
7    68       23   2525     39   2741     55   108
8    108      24   2741     40   2525     56   68
9    160      25   2943     41   2299     57   40
10   227      26   3127     42   2069     58   21
11   309      27   3288     43   1838     59   9
12   409      28   3422     44   1612     60   2
13   527      29   3526     45   1393     61   -1
14   664      30   3598     46   1186     62   -1
15   819      31   3635     47   994      63   -1

Table A.1: Filter coefficients sub filter one.

i    h2(i)    i    h2(i)    i    h2(i)    i    h2(i)
0    7        11   -210     22   482      33   48
1    -58      12   296      23   -1155    34   -181
2    112      13   161      24   -111     35   25
3    -61      14   -452     25   680      36   135
4    -84      15   -66      26   -66      37   -84
5    135      16   680      27   -452     38   -61
6    25       17   -111     28   161      39   112
7    -181     18   -1155    29   296      40   -58
8    48       19   482      30   -210     41   7
9    217      20   3453     31   -163
10   -163     21   3453     32   217

Table A.2: Filter coefficients sub filter two.


LINKÖPING UNIVERSITY ELECTRONIC PRESS

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

Copyright (Swedish notice)

This document is kept available on the Internet - or its future replacement - for 25 years from the date of publication, provided that no extraordinary circumstances arise. Access to the document implies permission for anyone to read, download, and print out single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of the copyright cannot revoke this permission. All other use of the document requires the consent of the author. Technical and administrative solutions are in place to guarantee authenticity, security and accessibility. The author's moral rights include the right to be named as the author, to the extent required by good practice, when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character. For further information about Linköping University Electronic Press, see the publisher's home page http://www.ep.liu.se/

© 2008, Erik Lindahl
