Analysis of Twiddle Factor Memory Complexity of Radix-2^i Pipelined FFTs




Linköping University Post Print

Analysis of Twiddle Factor Memory Complexity of Radix-2^i Pipelined FFTs

Fahad Qureshi and Oscar Gustafsson

N.B.: When citing this work, cite the original article.

©2010 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Fahad Qureshi and Oscar Gustafsson, Analysis of Twiddle Factor Memory Complexity of Radix-2^i Pipelined FFTs, 2009, 43rd Asilomar Conference on Signals, Systems, and Computers, 217-220.

Postprint available at: Linköping University Electronic Press

Analysis of Twiddle Factor Memory Complexity of Radix-2^i Pipelined FFTs

Fahad Qureshi and Oscar Gustafsson

Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden

E-mail:{fahadq, oscarg}@isy.liu.se

Abstract—In this work, we analyze different approaches to

store the twiddle factor coefficients for the different stages of pipelined fast Fourier transforms (FFTs). The analysis is based on complexity comparisons of different radix-2^i algorithms when implemented on field-programmable gate arrays (FPGAs) and ASICs. The objective of this work is to identify the best approach to storing the twiddle factor coefficients for each stage of the pipelined FFT.

I. INTRODUCTION

Computation of the discrete Fourier transform (DFT) and inverse DFT is used in, e.g., orthogonal frequency-division multiplexing (OFDM) communication systems and spectrometers. An N-point DFT can be expressed as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1 \qquad (1)$$

where $W_N = e^{-j 2\pi / N}$ is the twiddle factor, the N:th primitive root of unity with its exponent being evaluated modulo N, n is the time index, and k is the frequency index. Various methods for efficiently computing (1) have been the subject of a large body of published literature. They are commonly referred to as fast Fourier transform (FFT) algorithms. Also, many different architectures for efficiently mapping the FFT algorithm to hardware have been proposed [1].
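As a reference point for (1), the following is a minimal Python sketch of a direct DFT evaluation. The helper names `twiddle` and `dft` are illustrative and not from the paper; the exponent is reduced modulo N as stated in the text.

```python
import cmath

def twiddle(N, e):
    """Return W_N^e = exp(-j*2*pi*e/N), with the exponent reduced modulo N."""
    return cmath.exp(-2j * cmath.pi * (e % N) / N)

def dft(x):
    """Direct O(N^2) evaluation of Eq. (1): X(k) = sum_n x(n) * W_N^(n*k)."""
    N = len(x)
    return [sum(x[n] * twiddle(N, n * k) for n in range(N)) for k in range(N)]

X = dft([1, 0, 0, 0])  # unit impulse: every output bin equals 1
Y = dft([1, 1, 1, 1])  # constant input: only bin 0 is non-zero
```

An FFT computes the same result with fewer operations; the twiddle factors it needs are a subset of the values returned by `twiddle`.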

A commonly used architecture for transforms of length N = b^r is the pipelined FFT. The pipeline architecture is characterized by continuous processing of input data. In addition, the pipeline architecture is highly regular, making it straightforward to automatically generate FFTs of various lengths.

Figure 1 outlines the architecture of a radix-2^i single-path delay feedback (SDF) decimation-in-frequency (DIF) pipeline FFT of length N. This architecture is generic, while the required range of each complex twiddle factor multiplier is outlined in Table I for varying values of i. For the twiddle factor multipliers with small ranges, special methods have been proposed. In particular, for a W_4 multiplier the possible coefficients are {±1, ±j}; hence, the multiplication can be implemented by optionally interchanging the real and imaginary parts and possibly negating them (or replacing the addition with a subtraction in the subsequent stage). For larger ranges (W_8, W_16, and W_32), approaches have been proposed in [4], [6]–[8].
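The trivial W_4 multiplication described above can be sketched in a few lines of Python. The function name `mul_w4` and the tuple-based interface are assumptions made for illustration; the point is that only swaps and negations are needed, never a real multiplier.

```python
def mul_w4(re, im, k):
    """Multiply (re + j*im) by W_4^k = (-j)^k using only swaps and negations."""
    k %= 4
    if k == 0:
        return re, im      # multiply by 1
    if k == 1:
        return im, -re     # multiply by -j: (re + j*im) * (-j) = im - j*re
    if k == 2:
        return -re, -im    # multiply by -1
    return -im, re         # multiply by +j: (re + j*im) * j = -im + j*re
```

For example, `mul_w4(3, 4, 1)` returns `(4, -3)`, which matches `(3 + 4j) * (-1j)`.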

In this work we instead focus on using standard complex multipliers, for which the twiddle factors are calculated in advance, stored in memories, and retrieved for multiplication whenever necessary. The size of the twiddle factor memory for each stage depends on the arithmetic precision, the number of FFT points, and the stage number. For a long FFT the look-up tables are usually large in comparison with the butterflies and complex multipliers. In [9], [10] methods are proposed to reduce the size of the memories by utilizing the octave symmetry of the twiddle factors, hence only storing values for angles 0 ≤ α ≤ π/4. The memory then has at most N/8 + 1 words. However, the results in [9], [10] are given for complete FFTs using the same architecture for all memories and only for radix-2^2. In this work we show that octave symmetry is not always useful due to the overhead of multiplexers and negations. Furthermore, we investigate the effect of wordlength scaling, as previous work has shown that the occupied cell area when synthesizing look-up tables does not grow linearly with the number of bits in the look-up table [11]. One could use the dedicated memory structures on FPGAs, but depending on the available resources and the size of the memories this may not be suitable. A cost model for using the dedicated memory structures is proposed in [12].

In the next section, the different architectures for implementing the twiddle factor memories are explained. In Section III, we analyze and compare the implementation results of those architectures. Finally, some conclusions are presented.

II. ARCHITECTURES FOR TWIDDLE FACTOR MEMORIES

The twiddle factor memory should provide the real and imaginary parts of the twiddle factor. Typically, in an SDF pipelined FFT architecture a counter is used to keep track of which row of the FFT is computed in each clock cycle. Hence, we here assume that the mapping is from row number to the real and imaginary parts of the twiddle factor.

TABLE I

MULTIPLICATION AT DIFFERENT STAGES FOR DIFFERENT ARCHITECTURES.

Radix      Stage 1   Stage 2   Stage 3   Stage 4   Stage 5
2          W_N       W_{N/2}   W_{N/4}   W_{N/8}   W_{N/16}
2^2 [2]    W_4       W_N       W_4       W_{N/4}   W_4
2^3 [3]    W_4       W_8       W_N       W_4       W_8
2^4 [4]    W_4       W_8       W_16      W_N       W_4
2^5 [5]    W_4       W_8       W_16      W_32      W_N

Fig. 1. The R2^i single-path delay feedback (SDF) decimation-in-frequency (DIF) pipeline FFT architecture with twiddle factor stages. [Figure omitted: five butterfly (BF) stages with feedback delays N/2, N/4, N/8, N/16, N/32, and N/64, each followed by a twiddle factor multiplier W driven by an address input.]

Fig. 2. Block diagram of single look-up table twiddle factor memory. [Figure omitted: an address input drives an N-word coefficient memory with real and imaginary outputs.]

Fig. 3. Block diagram of twiddle factor memory with address mapping. [Figure omitted: an address input passes through an address mapping block into an N-word coefficient memory with real and imaginary outputs.]
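The pattern in Table I can be captured in a short Python sketch. The rule below is inferred from the table rows rather than stated in the paper, and the function names `stage_twiddle` and `large_twiddles` are hypothetical: within each group of i stages the multipliers are W_4, W_8, and so on, and the last stage of each group carries a large multiplier whose resolution shrinks by a factor 2^i per group.

```python
def stage_twiddle(N, i, stage):
    """Resolution M of the W_M multiplier after `stage` (1-indexed) of a
    radix-2^i pipeline, following the pattern of Table I (inferred rule)."""
    group, pos = divmod(stage - 1, i)
    if pos + 1 < i:
        return 2 ** (pos + 2)        # W_4, W_8, ... inside a group of i stages
    return N // 2 ** (i * group)     # W_N, W_{N/2^i}, ... at group boundaries

def large_twiddles(N, i, min_res=64):
    """Group-boundary multipliers with resolution >= min_res."""
    result, group = [], 0
    while N // 2 ** (i * group) >= min_res:
        result.append(N // 2 ** (i * group))
        group += 1
    return result

# First five stages for radix-2^3 and N = 8192: W_4, W_8, W_N, W_4, W_8
row = [stage_twiddle(8192, 3, s) for s in range(1, 6)]
```

With N = 8192 and a minimum resolution of 64, `large_twiddles` reproduces the rows of Table II, for example `[8192, 2048, 512, 128]` for radix-2^2.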

A. Single Look-up Table

The simplest approach, as shown in Fig. 2, is to use a single large look-up table to store the twiddle factors. For a W_N multiplier, N words need to be stored. Hence, for large N one could expect this method to have a higher complexity than the reduced schemes. On the other hand, it lacks any overhead. It should also be noted that this scheme may store the same twiddle factor in several positions, as the mapping is from row to twiddle factor and, for radix-2^i algorithms with i ≥ 2, some twiddle factors appear more than once.

B. Twiddle Factor Memory with Address Mapping

A possible simplification is to use an address mapping circuit that maps the row to the corresponding angle (k in (1)) and a memory that stores each required element only once. In the general case, many, but not all, values need to be stored, so the memory still has N possible words, although many can be set to “don’t care”. Because of this, one can expect the resources used for the look-up table to be reduced compared to the previous approach, provided that the synthesis tool can benefit from the “don’t care” entries. The structure is shown in Fig. 3.
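To make the row-to-angle mapping concrete, here is a small Python sketch for the radix-2^2 case. It assumes the commonly used radix-2^2 DIF decomposition, in which the W_N multiplier applies W_N^{n3(k1 + 2 k2)}; the row ordering (k1, k2, n3) and all function names are assumptions for illustration, and the paper's actual mapping circuit is the one in Fig. 5.

```python
import cmath

def r22_row_exponents(N):
    """Exponent e of W_N^e for each row, assuming rows are ordered as
    (k1, k2, n3) and the twiddle is W_N^(n3 * (k1 + 2*k2))."""
    quarter = N // 4
    exponents = []
    for row in range(N):
        block, n3 = divmod(row, quarter)   # block encodes the pair (k1, k2)
        k1, k2 = block >> 1, block & 1
        exponents.append(n3 * (k1 + 2 * k2))
    return exponents

def mapped_memory(N):
    """Address map (row -> exponent) plus an N-word memory in which
    unused words are left as None, i.e. "don't care"."""
    exponents = r22_row_exponents(N)
    memory = [None] * N
    for e in set(exponents):
        memory[e] = cmath.exp(-2j * cmath.pi * e / N)
    return exponents, memory

address_map, memory = mapped_memory(16)
used_words = sum(word is not None for word in memory)  # 7 of 16 words are needed
```

Under these assumptions a row-indexed table for N = 16 stores W_16^0 seven times, whereas the exponent-indexed memory needs only 7 distinct words and leaves the other 9 as "don't care".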

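Fig. 4. Block diagram of twiddle factor memory with address mapping and symmetry. [Figure omitted: an address input passes through an address mapping block into an (N/8 + 1)-word coefficient memory, followed by logic producing the real and imaginary outputs.]

Fig. 5. Block diagram of address mapping unit. [Figure omitted: the address is processed by a bit-flip block; the diagram is labelled with L, k, i, and M.]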

C. Twiddle Factor Memory with Address Mapping and Symmetry

Another modification, proposed in [9], [10], is to use the well-known octave symmetry to store twiddle factors only for 0 ≤ α ≤ π/4. The additional cost is an address mapping circuit, as discussed in the previous section, as well as multiplexers to interchange the real and imaginary parts and possible negations. The main benefit is that only N/8 + 1 words need to be stored. The resulting structure is shown in Fig. 4.
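The octave-symmetry idea can be sketched in floating-point Python as follows. This is an illustration of the principle, not the circuit from [9], [10]: the table layout, the `_OCTANT` control table, and the function names are assumptions. Each twiddle factor W_N^e = cos θ − j sin θ is rebuilt from one of N/8 + 1 stored (cos, sin) pairs by an optional swap and sign changes that depend on the octant.

```python
import math

def build_octant_table(N):
    """Store (cos, sin) for angles 2*pi*a/N, a = 0..N/8, i.e. 0 <= angle <= pi/4."""
    return [(math.cos(2 * math.pi * a / N), math.sin(2 * math.pi * a / N))
            for a in range(N // 8 + 1)]

# Per octant: (swap cos/sin, sign applied to cos, sign applied to sin)
_OCTANT = [(False, 1, 1), (True, 1, 1), (True, -1, 1), (False, -1, 1),
           (False, -1, -1), (True, -1, -1), (True, 1, -1), (False, 1, -1)]

def twiddle_from_octant(table, N, e):
    """Rebuild W_N^e = cos(theta) - j*sin(theta) from the first-octant table."""
    q = N // 8
    octant, r = divmod(e % N, q)
    a = r if octant % 2 == 0 else q - r   # odd octants read the table backwards
    c, s = table[a]
    swap, sign_cos, sign_sin = _OCTANT[octant]
    if swap:
        c, s = s, c
    return complex(sign_cos * c, -sign_sin * s)

table = build_octant_table(64)   # 9 words instead of 64
w = twiddle_from_octant(table, 64, 40)
```

The swap and the two sign controls correspond to the multiplexers and negations whose overhead the paper weighs against the reduced word count.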

D. Address Mapping

The address mapping for a radix-2^i FFT is done as shown in Fig. 5. Here, the total length of the FFT is 2^L points and the resolution of the twiddle factor multiplier is W_{2^k}. It is worth noting that the address mapping for a given W_N multiplier is independent of L. Clearly, i will affect the complexity of the address mapping circuitry.

III. ANALYSIS AND RESULTS

We have analyzed the complexity of twiddle factor memories with resolution ≥ 64 for the different architectures, considering radix-2^i algorithms with different values of i. The architectures of the twiddle factor memories have been coded in VHDL. These architectures were synthesized using three different synthesis tools: Mentor Graphics Precision targeting an Altera Stratix-IV FPGA, Xilinx ISE targeting a Virtex-4 FPGA, and Synopsys Design Compiler targeting 0.35 μm CMOS standard cells. The twiddle factors are represented using 16 bits each for the real and imaginary parts, in two’s complement representation. The resulting complexity for each stage is illustrated in Figs. 6, 7, and 8 for the Altera Stratix-IV FPGA, the Virtex-4 FPGA, and the 0.35 μm CMOS ASIC, respectively.
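The paper states only that 16-bit two's complement values are used; it does not give the scaling or rounding. The Python sketch below therefore assumes a fractional format with one sign bit and 15 fraction bits, round-to-nearest, and saturation (so that +1.0 maps to the largest positive code). All function names are illustrative.

```python
import math

def quantize(value, bits=16):
    """Round to a signed integer with bits-1 fraction bits, saturating at the ends
    (assumed format; the paper does not specify scaling or rounding)."""
    scale = 1 << (bits - 1)
    q = round(value * scale)
    return max(-scale, min(scale - 1, q))

def to_twos_complement(q, bits=16):
    """Encode a signed integer as an unsigned two's complement word."""
    return q & ((1 << bits) - 1)

def twiddle_words(N, bits=16):
    """(real, imaginary) memory words for W_N^e = cos(t) - j*sin(t), e = 0..N-1."""
    words = []
    for e in range(N):
        t = 2 * math.pi * e / N
        words.append((to_twos_complement(quantize(math.cos(t), bits), bits),
                      to_twos_complement(quantize(-math.sin(t), bits), bits)))
    return words

words = twiddle_words(4)  # W_4^0 .. W_4^3 as pairs of 16-bit words
```

With this assumed format, W_4^0 = 1 is stored as `(0x7FFF, 0x0000)` and W_4^1 = −j as `(0x0000, 0x8000)`.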

Figures 6, 7, and 8 show that the twiddle factor memory with address mapping and symmetry is the most advantageous architecture for large ranges. However, for small ranges, the simple look-up table approach is most beneficial. The point at which address mapping and symmetry becomes more beneficial than the simple look-up table moves towards higher twiddle factor resolutions as the value of i increases.

Fig. 6. Radix-2^i SDF pipelined FFT twiddle factor memory complexity using Mentor Graphics Precision targeting an Altera Stratix-IV FPGA. [Figure omitted: four panels (radix-2^2, radix-2^3, radix-2^4, radix-2^5); x-axis: twiddle factors W_64 to W_8192; y-axis: LUTs, logarithmic scale from 10^1 to 10^6; curves: Memory, Memory with AG, Memory with AG and symmetry.]

Fig. 7. Radix-2^i SDF pipelined FFT twiddle factor memory complexity using Xilinx ISE targeting a Virtex-4 FPGA. [Figure omitted: four panels (radix-2^2, radix-2^3, radix-2^4, radix-2^5); x-axis: twiddle factors W_64 to W_8192; y-axis: 4-input LUTs, logarithmic scale from 10^1 to 10^6; curves: Memory, Memory with AG, Memory with AG and symmetry.]

Fig. 8. Radix-2^i SDF pipelined FFT twiddle factor memory complexity using 0.35 μm CMOS standard cells. [Figure omitted: four panels (radix-2^2, radix-2^3, radix-2^4, radix-2^5); x-axis: twiddle factors W_64 to W_8192; y-axis: cell area, logarithmic scale from 10^3 to 10^8; curves: Memory, Memory with AG, Memory with AG and symmetry.]

In FPGA designs, the memory with address mapping is not a beneficial choice because the synthesis tools do not utilize the “don’t care” conditions. In the ASIC designs, however, it falls between the other two architectures, although it is never the best. To illustrate the impact of the wordlength, we synthesized a W_1024 twiddle factor memory using wordlengths varying from 10 to 18 bits for a Xilinx Virtex-4 FPGA. The results are shown in Fig. 9 and show the expected linear behaviour. However, the offset, corresponding to circuitry whose size is independent of the wordlength, such as address generation, differs between the approaches. Hence, for resolutions that gave similar complexity in Figs. 6, 7, and 8, one would have to re-evaluate the best architecture based on the wordlength used.
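As a rough companion to the wordlength discussion, the raw number of stored bits can be computed directly from the word counts given earlier (N words for a plain table, N/8 + 1 with octave symmetry). This Python sketch counts storage bits only; it is not a model of LUT usage or cell area, which the paper obtains by synthesis and which include the constant mapping overhead. The function name `storage_bits` is illustrative.

```python
def storage_bits(N, wordlength, symmetry=False):
    """Raw bits stored for a W_N memory: words * 2 components * wordlength.
    This ignores address mapping, multiplexers, and negation logic."""
    words = N // 8 + 1 if symmetry else N
    return words * 2 * wordlength

# W_1024 for the wordlengths used in the experiment (10 to 18 bits)
plain = [storage_bits(1024, w) for w in range(10, 19)]
reduced = [storage_bits(1024, w, symmetry=True) for w in range(10, 19)]
```

Both sequences grow linearly with the wordlength, with slopes of 2048 and 258 bits per wordlength bit for W_1024; the wordlength-independent offset observed in Fig. 9 comes from the surrounding logic and is not captured here.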

Figure 10 shows the complexity using the best architecture of the twiddle factor memory for radix-2^i algorithms in the different technologies. It can be seen that the complexity for the same twiddle factor resolution increases as the value of i increases.

Fig. 9. W_1024 twiddle factor memory complexity for different wordlengths. [Figure omitted: four panels (radix-2^2, radix-2^3, radix-2^4, radix-2^5); x-axis: wordlength, 8 to 20; y-axis: 4-input LUTs, 500 to 3000; curves: Memory, Memory with AG, Memory with AG and symmetry.]

Fig. 10. Best architecture of twiddle factor memory for different twiddle factors. [Figure omitted: three panels (Altera LUTs, Xilinx 4-input LUTs, ASIC cell area in units of 10^5); x-axis: twiddle factors W_64 to W_8192; curves: radix-2^2, radix-2^3, radix-2^4, radix-2^5.]

TABLE II

TWIDDLE FACTOR MEMORY COMPLEXITY OF 8192-FFT SDF PIPELINED WITH DIFFERENT ALGORITHMS.

Algorithm   Memory 1   Memory 2   Memory 3   Memory 4
2^2 [2]     W_8192     W_2048     W_512      W_128
2^3 [3]     W_8192     W_1024     W_128      -
2^4 [4]     W_8192     W_512      -          -
2^5 [5]     W_8192     W_256      -          -

TABLE III

TWIDDLE FACTOR MEMORY COMPLEXITY OF 8192-FFT SDF PIPELINED WITH DIFFERENT ALGORITHMS (ALTERA).

Algorithm   Memory 1   Memory 2   Memory 3   Memory 4   Total
2^2 [2]     2650       729        240        95         3714
2^3 [3]     2835       581        96         -          3512
2^4 [4]     3002       339        -          -          3341
2^5 [5]     3123       157        -          -          3280

TABLE IV

TWIDDLE FACTOR MEMORY COMPLEXITY OF 8192-FFT SDF PIPELINED WITH DIFFERENT ALGORITHMS (XILINX).

Algorithm   Memory 1   Memory 2   Memory 3   Memory 4   Total
2^2 [2]     1592       735        383        201        2911
2^3 [3]     1653       556        228        -          2437
2^4 [4]     1791       550        -          -          2341
2^5 [5]     1863       527        -          -          2390

TABLE V

TWIDDLE FACTOR MEMORY COMPLEXITY OF 8192-FFT SDF PIPELINED WITH DIFFERENT ALGORITHMS (ASIC).

Algorithm   Memory 1    Memory 2   Memory 3   Memory 4   Total
2^2 [2]     246500.8    89471.2    39967.2    21294.0    397233.2
2^3 [3]     260059.8    66739.4    25771.2    -          352570.4
2^4 [4]     283501.4    58167.2    -          -          341668.6
2^5 [5]     300829.6    27318.2    -          -          328147.8

Table II shows the twiddle factors with resolution ≥ 64 for an 8192-point single-path delay feedback pipelined FFT for different radix-2^i algorithms. The complexity of each complex twiddle factor memory using the best architecture is shown in Tables III, IV, and V for the three technologies, respectively. The values in italics correspond to the architecture where only a look-up table is used. This justifies the initial assumption that the same architecture is not beneficial for all twiddle factor memories. The total complexity of the twiddle factor memories is reduced as the value of i is increased, except for the Xilinx results.
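The totals in Tables III to V can be cross-checked with a few lines of Python. The values are transcribed from the tables (missing entries omitted); the dictionary layout and function names are illustrative.

```python
# Per-memory complexities from Tables III-V, keyed by i of the radix-2^i algorithm.
TABLES = {
    "Altera": {2: [2650, 729, 240, 95], 3: [2835, 581, 96],
               4: [3002, 339], 5: [3123, 157]},
    "Xilinx": {2: [1592, 735, 383, 201], 3: [1653, 556, 228],
               4: [1791, 550], 5: [1863, 527]},
    "ASIC": {2: [246500.8, 89471.2, 39967.2, 21294.0],
             3: [260059.8, 66739.4, 25771.2],
             4: [283501.4, 58167.2], 5: [300829.6, 27318.2]},
}

def totals(table):
    """Sum the memory complexities of each algorithm (rounded to one decimal)."""
    return {i: round(sum(values), 1) for i, values in table.items()}

def best_radix(table):
    """Return i for the radix-2^i algorithm with the lowest total complexity."""
    t = totals(table)
    return min(t, key=t.get)

for name, table in TABLES.items():
    print(name, totals(table), "lowest total: radix-2^%d" % best_radix(table))
```

The computed sums agree with the Total columns, and the lowest total is obtained for radix-2^5 on Altera and ASIC but for radix-2^4 on Xilinx, which is the exception noted in the text.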

IV. CONCLUSIONS

In this paper, we have analyzed the complexity of twiddle factor memories for pipelined FFTs considering different architectures. The analysis is based on complexity comparisons of different radix-2^i algorithms when implemented either on field-programmable gate arrays (FPGAs) or using standard cells. The results show that a plain look-up table is advantageous for low-resolution memories, while for higher-resolution twiddle factor memories, utilizing octave symmetry and an address generator is advantageous. The break-point up to which the plain look-up table approach is advantageous increases with increasing i.

REFERENCES

[1] L. Wanhammar, DSP Integrated Circuits, Academic Press, 1999.

[2] S. He and M. Torkelson, “A new approach to pipeline FFT processor,” in Proc. IEEE Parallel Processing Symp., 1996, pp. 766–770.

[3] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM (de)modulation,” in Proc. IEEE URSI Int. Symp. Sig. Elect., 1998, pp. 257–262.

[4] J.-E. Oh and M.-S. Lim, “New radix-2 to the 4th power pipeline FFT processor,” IEICE Trans. Electron., vol. E88-C, no. 8, pp. 694–697, Aug. 2005.

[5] A. Cortes, I. Velez, and J. F. Sevillano, “Radix r^k FFTs: matricial representation and SDC/SDF pipeline implementation,” IEEE Trans. Signal Processing, vol. 57, no. 7, pp. 2824–2839, Jul. 2009.

[6] Y.-E. Kim, K.-J. Cho, and J.-G. Chung, “Low power small area modified Booth multiplier design for predetermined coefficients,” IEICE Trans. Fund., vol. E90-A, no. 3, pp. 694–697, Mar. 2007.

[7] W. Han, T. Arslan, A. T. Erdogan, and M. Hasan, “High-performance low-power FFT cores,” ETRI Journal, vol. 30, no. 3, pp. 451–460, June 2008.

[8] F. Qureshi and O. Gustafsson, “Low-complexity reconfigurable complex constant multiplication for FFTs,” in Proc. IEEE Int. Symp. Circuits Syst., Taipei, Taiwan, May 24–27, 2009.

[9] H. Cho, M. Kim, D. Kim, and J. Kim, “R2^2SDF FFT implementation with coefficient memory reduction scheme,” in Proc. Vehicular Technology Conf., 2006.

[10] M. Hasan and T. Arslan, “Scheme for reducing size of coefficient memory in FFT processor,” IEEE Electronics Letters, vol. 38, no. 4, pp. 163–164, Feb. 2007.

[11] O. Gustafsson and K. Johansson, “An empirical study on standard cell synthesis of elementary function look-up tables,” in Proc. Asilomar Conf. Signals Syst. Comp., Pacific Grove, CA, Oct. 26–29, 2008.

[12] P. A. Milder, M. Ahmad, J. C. Hoe, and M. Püschel, “Fast and accurate resource estimation of automatically generated custom DFT IP cores,” in Proc. FPGA, 2006, pp. 211–220.
