Design and Implementation of a Real-Time FFT-core for Frequency Domain Triggering


Department of Electrical Engineering (Institutionen för systemteknik)

Master's thesis

Master's thesis carried out in Electronics Systems at the Institute of Technology at Linköping University

by

Mattias Eriksson

LiTH-ISY-EX--13/4716--SE

Linköping 2013

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet

Supervisors: Timmy Sundström, SP Devices

Patrik Thalin, SP Devices

Mario Garrido, ISY, Linköpings universitet

Examiner: Kent Palmkvist, ISY, Linköpings universitet

Division, Department: Division of Electronics Systems, Department of Electrical Engineering, SE-581 83 Linköping

Date: 2013-09-12

Language: English

Report category: Examensarbete (Master's thesis)

URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-99374

ISRN: LiTH-ISY-EX--13/4716--SE

Title (Swedish): Design och implementation av en realtids FFT-kärna för triggning i frekvensdomänen

Title: Design and Implementation of a Real-Time FFT-core for Frequency Domain Triggering

Author: Mattias Eriksson

Abstract

To efficiently capture signal events when performing analog measurements, a competent toolbox is required. In this master's thesis, a system for frequency domain triggering is designed and implemented. The implemented system provides advanced frequency domain trigger conditions in order to ease the capture of a desired signal event. A real-time 1024-point pipelined feedforward FFT-core is implemented to transform the signal from the time domain to the frequency domain. The system is designed and synthesized for a Virtex-6 FPGA (XC6VLX240T) and is integrated into SP Devices’ digitizer ADQ1600. The implemented system is able to handle a continuous stream of 1.6 GS/s at 16-bit. A small software API is developed that provides runtime configuration of the trigger conditions.


Acknowledgments

I would like to thank my supervisors at SP Devices, Timmy Sundström and Patrik Thalin, for their support and guidance throughout this thesis work. I would also like to thank my supervisor at the University, Mario Garrido, for his excellent knowledge about FFTs and for sharing his research, which this thesis is mainly based upon.

A special thanks goes out to my family and friends for supporting me throughout years of studies. I would never have managed it without you.

Linköping, May 2013 Mattias Eriksson

Contents

I Background

1 Introduction

1.1 Methodology

1.2 Prerequisites

2 Review of the DFT and FFT

2.1 Notations

2.2 Discrete Fourier Transform

2.3 Fast Fourier Transform

2.3.1 Rotation optimization

2.4 FFT Windows

2.5 Overlapping FFT

2.6 Review of FFT architectures

3 Equipment and tools

3.1 Systems with frequency domain triggering

3.2 ADQ1600

3.2.1 Triggering

3.2.2 Xilinx Virtex-6

3.3 Programmable synthesizer

3.4 Tools

3.4.1 SP Devices

3.4.2 MathWorks

3.4.3 Xilinx

II Implementation

4 Proposed design

4.1 Problem description

4.1.1 Frequency domain trigger module

4.2 Proposed solution

4.2.1 Triggering

4.2.2 System Architecture

5 Implementation of the FFT-core

5.1 Proposed Parallel Pipelined radix-$2^4$ Feedforward FFT

5.1.1 Data shufflers

5.2 Rotations

5.2.1 Trivial rotator

5.2.2 Complex multiplication

5.2.3 Reduced angle-set rotation: $W_{16}$

5.3 Parallelization

5.4 Data path sizing

5.5 FFT output order

5.6 Input arranger

6 Implementation of energy calculation, windowing and triggering

6.1 Spectral Energy Calculation

6.1.1 Implementation

6.1.2 Resolution

6.2 Windowing

6.3 Triggering

7 Software and Hardware Integration

7.1 Bus interface

7.2 Configuration registers

7.3 Software API

7.3.1 Low level interface

7.3.2 Bus level interface

7.3.3 Algorithmic level interface

7.3.4 Help functions

7.3.5 Constants

7.3.6 Example

III Results

8 Results

8.1 Testing

8.2 Resources and performance

9 Conclusions

9.1 Future work

List of Figures

2.1 Flow graph of a radix-2 butterfly element.

2.2 The flow graph of a 16-point radix-2 Decimation In Frequency (DIF) FFT. The numbers at the start nodes represent sample index, the numbers at the outputs denote the FFT-bin. A number, $\phi$, between the stages denotes a rotation by $W_{16}^{\phi}$.

2.3 DIT vs DIF FFT butterfly.

2.4 Periodic extension of a periodic signal with a fractional number of periods in the 64-sample observation window.

2.5 Energy spectrum of a signal and the windowed (using Hanning window) version of the same signal, calculated using FFT.

2.6 Hamming, Hanning, Flat-top and Blackman windows plotted in the discrete time domain.

2.7 A signal with length 2N − 1 will in all cases span at least one FFT-frame.

2.8 Visualization of signal timing effects on a windowed signal for two FFT-frames. Blue line is signal and red line is applied window (Blackman).

2.9 Visualization of signal timing effects on a windowed signal for two overlapped FFTs. Signal length is N/2. Blue line is signal and red line is applied window (Blackman).

2.10 Real-time vs non real-time FFT architecture.

2.11 Concept of delay feedback and delay commutator architectures.

3.1 Frequency mask trigger configuration of an Agilent real-time spectrum analyzer [40].

3.2 User logic placement in the FPGA.

3.3 Picture of an ADQ1600.

3.4 Triggering in ADQ1600.

3.5 The programmable synthesizer Hameg HM8134-3.

3.6 A screenshot of ADCaptureLab.

4.1 The two blocks that this thesis will develop, in relation to major product blocks.

4.2 Visualization of a range mask, energy spectrum and inside-outside result mask.

4.3 The conditions specify sets of FFT-bins.

4.4 Illustration of trigger conditions and events.

4.5 Triggering upon a deviation from expected signal spectrum. Outside Or mask is used for the highlighted area. The spurious signal marked in the figure causes a trigger.

4.6 Triggering upon a deviation from expected signal spectrum. Outside Or mask is used for the highlighted area.

4.7 Triggering upon the appearance of a specific signal. Inside Or mask is used for the highlighted area.

4.8 Triggering example.

4.9 Block diagram of the system.

5.1 16-point Decimation In Frequency (DIF) FFT.

5.2 Two possible rotation placements.

5.3 Architecture of the implemented Fast Fourier Transform (FFT).

5.4 Block diagram of a data shuffle module.

5.5 Data flow through a data shuffler.

5.6 Sequence of rotation coefficients in a stage of two parallel cores with 50% overlap. The sequence repeats twice in an FFT-frame. In this case, the two cores use the same coefficients at the same time.

5.7 Rotation coefficient ROMs.

5.8 The output order of the FFT-core. Bin index 0 is lowest frequency (DC-level), bin index 1023 is highest frequency. Output is conjugate symmetric, as explained in Section 2.2.

5.9 Figure of the input and output sample order for the input arranger. The numbers denote the sample index.

5.10 Block diagram of the input arranger.

6.1 FFT of a sinusoid. The output energy spectrum is not perfectly symmetric; the lower half has more noise compared to the upper half.

6.2 Output paths of the FFT-core used. Bin index comes in bit reversed order.

6.3 Data width scaling in the energy calculation. The numbers denote Integer.Fractional(Total) bits.

6.4 Block diagram of the window unit.

6.5 Block diagram of the triggering modules.

8.1 FFT of a sinusoid with 415 periods.

8.2 Validation setup.

List of Tables

2.1 Rotation resolutions in a radix-2 16-point FFT.

2.2 Rotation resolutions in a 16-point FFT, decomposed as two 4-point FFTs.

2.3 Rotations in a 64-point FFT, decomposed as a 16-point FFT and a 4-point FFT.

4.1 Frequency domain trigger module interface.

5.1 Coefficients used in $W_{16}$ rotator.

5.2 Rotator scalings in the FFT-core.

5.3 FFT-core SFDR for different output widths.

7.1 Input (from computer to the module) user register meanings.

7.2 Vector arrangement in data bus.

7.3 Writeable address space.

7.4 Description of the control register.

7.5 Example of a trigger mask.

7.6 Possible windows.

8.1 Simulations performed.

8.2 Hardware resource usage for different modules. Usage is stated for one instance.

8.3 Comparison of similar systems.

Notation

Sets

Notation Meaning

Z Set of integer numbers

N0 Set of natural numbers, including zero

N+ Set of natural numbers, excluding zero

Abbreviations

Abbreviation Meaning

ADC Analog to Digital Converter

CLB Configurable Logic Block

CORDIC COordinate Rotation DIgital Computer

DFT Discrete Fourier Transform

DIF Decimation In Frequency

DIT Decimation In Time

DSP Digital Signal Processing

FFT Fast Fourier Transform

FMT Frequency Mask Trigger

HDL Hardware Description Language

LSB Least Significant Bit

LUT Look Up Table

MSB Most Significant Bit

RAM Random Access Memory

ROM Read Only Memory

SDP Simple Dual Port

SFDR Spurious Free Dynamic Range

TDFT Time Discrete Fourier Transform

TDP True Dual Port

UUT Unit Under Test

Part I: Background

1 Introduction

In the field of electrical engineering it is often necessary to verify or troubleshoot electrical circuits. A new design must be verified to produce the expected result with the expected precision. When this is not the case, troubleshooting the circuit becomes necessary. In order to verify a circuit, a tool that can measure the desired properties of the circuit is required. In the case of digital circuits, this tool can be a logic analyzer that is able to capture the result produced by the circuit. The captured result can then be compared to the expected result. In the case of analog circuits that produce non-switching voltages or currents, a multimeter can be used to measure the quantities and verify correct values. In the case of circuits where the signals change over time, an oscilloscope can be used to visually verify the signals. However, because of human limitations, visual inspection is not feasible when the signal changes fast. In these situations the signal can be sampled and stored digitally in a memory, using a digitizer [1–3], for later analysis and verification. However, manual verification of long signals is cumbersome and very time consuming.

In order to simplify signal capture and measurement, triggers are used. A trigger detects a signal event or condition and starts a signal capturing session in the digitizer. In this way, triggers can be used to specify signal conditions that are of interest and should be analyzed further. If one of the conditions occurs, the signal is captured and can be analyzed. Triggers often available in digitizers include level trigger, edge trigger, window trigger and hysteresis trigger [4]. These triggers operate in the time domain and use the value of the signal to determine if a desired signal condition has occurred.

However, there are situations that require more advanced triggering possibilities than what time domain triggers can offer. When a signal contains several components with different frequencies and amplitudes, time domain triggers cannot be used to trigger on an individual component, because the components cannot be distinguished in the time domain.

When performing measurements on circuits that produce a known signal spectrum, it is of interest to verify the integrity of the spectrum in order to detect any spurious signal components. For example, in radio communication it is required that the transmitter sends within a specific frequency band. No spurious signals outside this band should appear. In such a case, the expected spectrum is known beforehand and a violation of this spectrum must be detected. A spectrum analyzer can be used to inspect the spectrum of the produced transmission signal, and a max-hold function, which for every frequency keeps the highest amplitude obtained during the signal acquisition, can be used to verify spectral integrity. However, if the spectrum is violated at several different frequencies, there is no way to know if the violations occurred at the same time or at different time instances. In addition, the max-hold spectrum cannot tell how many times the spectrum was violated or what the signal looked like during the violation. This information can be useful when troubleshooting a circuit, because it describes the state of the system when it fails.

In order to capture this information there must be a way to trigger on the events that cause the spectrum violation. This triggering can be accomplished by studying the signal in the frequency domain.

Applications where detection and capturing of signal spectrum violations can be used include measurement and analysis of RF (Radio Frequency) signals [5], in particular modern wireless applications [6], and testing of radios [7].

SP Devices has developed a range of digitizers. One of their products is called ADQ1600 and has a 14-bit A/D channel with a sample rate of 1.6GS/s [8]. The ADQ1600 uses a Virtex-6 FPGA by Xilinx that contains logic to collect, trigger on and process the sampled signal and send it to a computer. Triggering can be done by an external trigger signal or by the use of a signal level trigger, acting in the time domain. The ADQ1600 can be configured to collect a specific amount of samples before and after a trigger event, or at user request. The collected samples can then be transferred to a computer through USB, PCIe or PXIe. The trigger conditions can be configured through the computer interface.

The ADQ1600 lacks the possibility to trigger on events in the frequency domain and is, therefore, not suitable for detecting spurious signal components and verifying the spectral integrity of signals. In this thesis, a module that extends the capabilities of the ADQ1600 to include frequency domain triggering is proposed and implemented in the FPGA on the board. The implemented module is capable of verifying the spectral integrity of a signal by detecting spurious signal components and issuing a trigger signal when a predefined spectrum mask is violated. This is accomplished by using two parallel FFT-cores that calculate the spectrum of the signal. The resulting spectra are then compared to a user-configured mask and, if the mask is violated by the signal, a trigger signal is issued, which is handled by the ADQ1600. In addition to this module, a small software API is developed in this thesis. It can be used in MATLAB to configure the implemented module through a computer interface.

The first part of the thesis report explains the theory that is required to get a good understanding of the thesis work. Chapter 2 describes the theory of the Fast Fourier Transform and reviews FFT architectures. Chapter 3 reviews products that have similar functionality to the system implemented in this thesis. It also describes some of the tools and hardware used.

The second part of the thesis describes the proposed system and its implementation. Chapter 4 presents the problem addressed in this thesis and the proposed solution. Chapter 5 describes the implementation of the FFT-core. Chapter 6 describes the implementation of the remaining modules in the system. Chapter 7 explains the hardware-software interface and integration.

The third part discusses the results of the thesis. Chapter 8 describes the experimental results of the implemented system and how it compares to related products. Chapter 9 concludes the thesis.

1.1 Methodology

The implementation was approached in the following way:

1. Literature study

2. Requirement collection

3. System sketch and high level model in MATLAB. A high level MATLAB model is constructed to get a reference for comparison when the system is implemented in Verilog. The model’s task is to verify the function of the system modules and to confirm a functional signal flow that gives rise to satisfactory results and precision.

4. Incremental implementation of the system

5. Test of the whole system and verification of requirement fulfilment

1.2 Prerequisites

In order to fully understand the thesis, some prerequisites are required of the reader. The reader is assumed to have basic knowledge of digital design and, more specifically, of FPGA design. Some specific information about the FPGA used in this thesis is given, but the concepts of digital design and FPGA design are not described.

Basic mathematical understanding and knowledge about digital signal processing is assumed. A brief explanation of the Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT) is provided in Chapter 2.

2 Review of the DFT and FFT

This chapter reviews the DFT and FFT algorithms and architectures. First, Section 2.1 describes the notation used throughout the chapter. Section 2.2 introduces the DFT. Then Section 2.3 introduces the FFT. Section 2.4 describes the use of windows. Section 2.5 describes the concept of overlapping FFT calculation. Lastly, Section 2.6 reviews FFT architectures.

2.1 Notations

Signals in the discrete time domain are denoted with lowercase letters and square brackets, x[n], while signals in the discrete frequency domain are denoted with capital letters and square brackets, X[k]. Index k is used in the frequency domain while index n is used in the time domain. The imaginary part of complex numbers is represented with $j$, where $j^2 = -1$.

2.2 Discrete Fourier Transform

Discrete Fourier Transform (DFT) is a well-used and very powerful procedure in the field of digital signal processing [9, 10]. It is the discrete version of the popular continuous Fourier Transform and is defined in Definition 2.1.

2.1 Definition (Discrete Fourier Transform). The N-point DFT for a sequence x[n] is defined as

$$X[k] = \sum_{n=0}^{N-1} x[n] W_N^{nk} \qquad (2.1)$$

where

$$W_N^{nk} = e^{-j\frac{2\pi nk}{N}} \qquad (2.2)$$

and $k = 0, 1, \ldots, N-1$.

The DFT transforms the complex discrete time domain signal x[n] to a complex signal in the frequency domain, X[k]. The frequency-domain samples, the outputs of the DFT, are called “output bins” or simply “bins” [9].
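Definition 2.1 can be written out almost literally in code. The following is a minimal Python sketch, not taken from the thesis (which uses MATLAB and Verilog); the function names `twiddle` and `dft_direct` and the test inputs are assumptions chosen for illustration.

```python
import cmath

def twiddle(N, exponent):
    """Twiddle factor W_N^exponent = exp(-j*2*pi*exponent/N), Equation 2.2."""
    return cmath.exp(-2j * cmath.pi * exponent / N)

def dft_direct(x):
    """Direct N-point DFT following Equation 2.1 (one rotation per term)."""
    N = len(x)
    return [sum(x[n] * twiddle(N, n * k) for n in range(N)) for k in range(N)]

# A constant input only excites bin 0 (the DC bin).
X_dc = dft_direct([1.0] * 8)

# One full cycle of a complex exponential only excites bin 1.
x_tone = [cmath.exp(2j * cmath.pi * n / 8) for n in range(8)]
X_tone = dft_direct(x_tone)
```

Each output bin is a sum of N rotated input samples, which is why this form is only practical for small N.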

The factor $W_N^{nk}$ is a clockwise rotation by an angle of $nk$ $N$:ths of the unit circle (a rotation by $-\frac{2\pi nk}{N}$ radians) and is called a twiddle factor [11]. In this thesis, the resolution of the rotation is said to be $N$, since the smallest rotation (except no rotation) is $\pm\frac{2\pi}{N}$ radians, that is, one $N$:th part of the unit circle.

Calculating the DFT of a sequence x[n], where $n = 0, \ldots, N-1$, is the same as calculating the Time Discrete Fourier Transform (TDFT) of its time-domain periodic extension, $x_{periodic}[n] = x[n \bmod N]$ where $n \in \mathbb{Z}$ [9]. The reason why this is important is described in Section 2.4.

All physical quantities are real-valued and are in many cases sampled into real numbers, as opposed to complex numbers. Since the real numbers are a subset of the complex numbers, the DFT can be calculated for real-valued inputs. The DFT has an important property in this case: when the input sequence x[n], where $n = 0, \ldots, N-1$, is real-valued, the output of the DFT is conjugate symmetric [9]:

$$X[k] = X[N-k]^{*}, \quad k = 1, \ldots, N-1 \qquad (2.3)$$

where superscript $*$ denotes complex conjugation. The complex conjugate of $x = x_r + j x_j$ is defined as $x^{*} = x_r - j x_j$.

The symmetry property shown in Equation 2.3 means that the real part of the output sequence X[k], where k = 0, . . . , N − 1, has even symmetry and the imaginary part has odd symmetry [9]. The magnitude of the output sequence is then even symmetric, |X[k]| = |X[N − k]| for k = 1, . . . , N − 1. It is worth noticing that X[0] is independent and does not have a symmetric sibling. When N is even, output bin N/2, X[N/2], is also independent.

Because of the symmetry, only N/2 + 1 output values are independent for even N. The rest are redundant and can be calculated from the set of independent values. This means that a 1024-point FFT only gives 513 useful output samples when the input is real-valued.
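The symmetry in Equation 2.3 can be checked numerically. The sketch below is an illustration in Python rather than part of the thesis; the helper `dft_direct` and the real-valued test signal are arbitrary assumptions.

```python
import cmath
import math

def dft_direct(x):
    """Direct DFT from Equation 2.1 (helper assumed for this example)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

N = 16
# Arbitrary real-valued input, chosen only for illustration.
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) + 0.1 * n for n in range(N)]
X = dft_direct(x)

# Equation 2.3: X[k] equals the complex conjugate of X[N - k] for k = 1..N-1.
symmetric = all(abs(X[k] - X[N - k].conjugate()) < 1e-9 for k in range(1, N))

# Bins 0..N/2 are independent; the remaining bins follow from the symmetry.
independent_bins = N // 2 + 1
independent_bins_1024 = 1024 // 2 + 1
```

For the 1024-point case the last line reproduces the 513 useful output samples mentioned above.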

The DFT of a real-valued sinusoid with peak value $A$ gives rise to a magnitude (or amplitude) of $\frac{AN}{2}$ at the corresponding output bin [9]. Because of this, the output of the DFT is usually scaled by $2/N$ in order to “preserve” the spectral amplitudes. The scaling factor depends on the window that is used, as described in Section 2.4.
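A short numerical check of the $AN/2$ magnitude and the $2/N$ scaling, sketched in Python. The frame length, peak value and number of cycles are assumptions picked for illustration; an integer number of cycles is used so that no spectral leakage occurs.

```python
import cmath
import math

N = 64        # DFT length (assumed for illustration)
A = 1.5       # peak value of the sinusoid
cycles = 5    # integer number of periods in the frame, so no leakage

x = [A * math.sin(2 * math.pi * cycles * n / N) for n in range(N)]

def dft_bin(x, k):
    """A single DFT output bin X[k], computed from Equation 2.1."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))

magnitude = abs(dft_bin(x, cycles))   # expected to be A * N / 2
amplitude = magnitude * 2 / N         # scaled back to the peak value A
```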

Direct calculation of the DFT requires $O(N^2)$ complex rotations. This can be seen from Equation 2.1: for each of the N output bins a sum of N rotations has to be calculated, so $N^2$ rotations have to be performed. This rotation complexity is very high and limits the usage of the DFT to problems with a small number of input samples, N.

2.3 Fast Fourier Transform

A Fast Fourier Transform [10] is an algorithm that calculates the Discrete Fourier Transform or its inverse more efficiently than direct calculation from the DFT definition. With the introduction of FFT algorithms, the DFT has become practical for large problem sizes, where direct calculation of the DFT is too time consuming.

The Cooley-Tukey algorithm [12] describes a way to decompose an $N = N_1 N_2$-point DFT into separate $N_1$-point and $N_2$-point DFTs. The radix-2 FFT divides an N-point DFT into two interleaved N/2-point DFTs, which in turn are divided further until a set of the most basic 2-point DFTs is left. This 2-point DFT is the atomic part of the radix-2 FFT and is called a butterfly operation. Using this decomposition, the FFT is calculated in a series of $n = \log_2 N$ stages, as depicted in Figure 2.2.

A radix-2 butterfly operation takes two input samples and calculates two output samples. The flow graph is shown in Figure 2.1.

Figure 2.1: Flow graph of a radix-2 butterfly element.

The equation that describes the butterfly is the definition of the DFT unfolded for two input samples:

$$X[k] = \sum_{n=0}^{1} x[n] W_2^{nk} = \begin{cases} X[0] = \sum_{n=0}^{1} x[n] W_2^{0} = x[0] + x[1] \\ X[1] = \sum_{n=0}^{1} x[n] W_2^{n} = x[0] - x[1] \end{cases} \qquad (2.4)$$

where the twiddle factor $W_N^{nk}$ is defined in Equation 2.2 and k = 0, 1. Higher radix butterflies can be obtained by unfolding the DFT equation for more points.

Decimation In Frequency (DIF) and Decimation In Time (DIT) are the two most common ways to decompose the Cooley-Tukey FFT [10]. DIT separates the input sequence, x[n], into odd and even samples according to Equation 2.5.

Figure 2.2: The flow graph of a 16-point radix-2 Decimation In Frequency (DIF) FFT. The numbers at the start nodes represent sample index, the numbers at the outputs denote the FFT-bin. A number, $\phi$, between the stages denotes a rotation by $W_{16}^{\phi}$.

$$X[k] = \sum_{n=0}^{N/2-1} x[2n] W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/2-1} x[2n+1] W_{N/2}^{nk} = \mathrm{DFT}_{N/2}(x[0], x[2], \ldots, x[N-2]) + W_N^{k} \cdot \mathrm{DFT}_{N/2}(x[1], x[3], \ldots, x[N-1]) \qquad (2.5)$$
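Equation 2.5 maps directly onto a recursive routine. The following Python sketch is an illustration only (the thesis implements a pipelined hardware architecture, not a recursive one); `fft_dit` and `dft_direct` are hypothetical names, and the input length is assumed to be a power of two.

```python
import cmath

def dft_direct(x):
    """Reference DFT from Equation 2.1, used here only for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft_dit(x):
    """Recursive radix-2 DIT FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_dit(x[0::2])   # DFT_{N/2} of x[0], x[2], ..., x[N-2]
    odd = fft_dit(x[1::2])    # DFT_{N/2} of x[1], x[3], ..., x[N-1]
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # W_N^k times odd part
        X[k] = even[k] + t
        # W_N^(k + N/2) = -W_N^k, and the N/2-point DFTs are periodic in k.
        X[k + N // 2] = even[k] - t
    return X

x = [complex(n % 5, -(n % 3)) for n in range(16)]
max_error = max(abs(a - b) for a, b in zip(fft_dit(x), dft_direct(x)))
```

The rotation is applied to the odd branch before the add and subtract, which corresponds to the DIT placement of the rotation before the butterfly.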

The DIF decomposition separates the output into odd and even frequencies, according to Equation 2.6 and Equation 2.7.

$$X[2k] = \sum_{n=0}^{N/2-1} (x[n] + x[n+N/2]) W_{N/2}^{nk} = \mathrm{DFT}_{N/2}(x[n] + x[n+N/2]) \qquad (2.6)$$

$$X[2k+1] = \sum_{n=0}^{N/2-1} \left((x[n] - x[n+N/2]) W_N^{n}\right) W_{N/2}^{nk} = \mathrm{DFT}_{N/2}\left((x[n] - x[n+N/2]) W_N^{n}\right) \qquad (2.7)$$
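Equations 2.6 and 2.7 can be sketched in the same recursive style. This Python example is an assumption-laden illustration, not the thesis implementation: `fft_dif` is a hypothetical name, the length must be a power of two, and the recursion interleaves the even and odd bins so that the output appears in natural order, whereas a flow graph such as Figure 2.2 delivers the bins in bit-reversed order.

```python
import cmath

def dft_direct(x):
    """Reference DFT from Equation 2.1, used here only for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft_dif(x):
    """Recursive radix-2 DIF FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # Butterfly first: sum and difference of samples N/2 apart.
    sums = [x[n] + x[n + half] for n in range(half)]
    # Rotation after the butterfly, on the difference branch only (W_N^n).
    diffs = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
             for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_dif(sums)    # Equation 2.6: even-numbered bins
    X[1::2] = fft_dif(diffs)   # Equation 2.7: odd-numbered bins
    return X

x = [complex(n % 7, n % 4) for n in range(32)]
max_error = max(abs(a - b) for a, b in zip(fft_dif(x), dft_direct(x)))
```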

The two different decompositions result in different placement of rotations in relation to the butterfly elements. For DIT, rotations are placed before the butterfly element. For DIF, rotations are placed after the butterfly. This is shown in Figure 2.3.

Figure 2.3: DIT vs DIF FFT butterfly. (a) DIT FFT butterfly. (b) DIF FFT butterfly.

The FFT requires $O(N \log N)$ complex rotations compared to $O(N^2)$ for the DFT [9]. The derivation of this result is not shown here; the reader can find out more about the Cooley-Tukey decomposition and FFT complexity in [9, 12]. It is important to note that the FFT and the DFT calculate the same transform but use different techniques. They give the same result and possess the same properties (for example output symmetry).
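As a rough illustration of the difference, consider the 1024-point transform used in this thesis. Ignoring the constant factors hidden by the O-notation (an assumption made here purely for illustration), the two rotation counts compare as

$$N^2 = 1024^2 = 1\,048\,576, \qquad N \log_2 N = 1024 \cdot 10 = 10\,240, \qquad \frac{N^2}{N \log_2 N} = \frac{1024}{10} = 102.4$$

so at this size the FFT needs on the order of a hundred times fewer rotations than direct calculation.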

2.3.1 Rotation optimization

A notation that will be used in this text is that a rotation angle $\phi$ is specified in $N$:ths of the circumference, so that $\phi$ denotes a rotation by $-\frac{2\pi}{N}\phi$ radians [11].

Consider an N-point radix-2 DIF FFT. It has complex rotations after each butterfly stage except the last. Alternatively, it can be seen as having complex multiplications before each butterfly stage except the first. Let us assume that the rotations of the two inputs to a butterfly, at stage $s > 1$, are $\phi_A$ and $\phi_B$. This can also be expressed as $\phi_A = \phi_0$ and $\phi_B = \phi_0 + \Delta\phi$, where $\Delta\phi$ is the difference in the rotation and $\phi_0$ the common part of the rotation. For all stages in a radix-2 DIF FFT, the two inputs to a butterfly have a rotation difference of either 0 or N/4 [11]. The common part of the rotation can therefore be moved to after the butterfly:

$$A e^{-j\frac{2\pi}{N}\phi_0} \pm B e^{-j\frac{2\pi}{N}(\phi_0 + \Delta\phi)} = \left(A \pm B e^{-j\frac{2\pi}{N}\Delta\phi}\right) \cdot e^{-j\frac{2\pi}{N}\phi_0} \qquad (2.8)$$

where A and B denote the input data of the butterfly. By using Equation 2.8 and moving the rotations accordingly, radix-$2^2$ is obtained from the flow graph of radix-2 [11]. The left side of Equation 2.8 represents the computations using radix-2 and the right side those using radix-$2^2$. By pushing the common rotation angle, $\phi_0$, to after the butterfly, this rotation can be combined with the rotations in the next stage. The use of radix-$2^2$ can significantly decrease the number and complexity of the rotators required in the FFT-core with respect to radix-2.
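The identity in Equation 2.8 can be verified numerically. The Python sketch below uses arbitrary illustrative values for the butterfly inputs and angles; the helper name `rot` is an assumption and follows the notation in which an angle is given in $N$:ths of the circle.

```python
import cmath

def rot(N, phi):
    """Rotation by phi N:ths of the circle: exp(-j*2*pi*phi/N)."""
    return cmath.exp(-2j * cmath.pi * phi / N)

N = 16
A, B = complex(0.7, -1.2), complex(-0.4, 0.9)   # arbitrary butterfly inputs
phi0, dphi = 3, N // 4                           # common part and difference

# Left side of Equation 2.8: rotate both inputs, then add/subtract (radix-2).
lhs_sum = A * rot(N, phi0) + B * rot(N, phi0 + dphi)
lhs_diff = A * rot(N, phi0) - B * rot(N, phi0 + dphi)

# Right side: only the difference rotation before the butterfly,
# with the common rotation applied after it.
rhs_sum = (A + B * rot(N, dphi)) * rot(N, phi0)
rhs_diff = (A - B * rot(N, dphi)) * rot(N, phi0)

# A rotation difference of N/4 is a multiplication by -j.
quarter_turn = rot(N, N // 4)
```

Since the only rotation left inside the butterfly is a multiplication by 1 or by $-j$, it needs no general complex multiplier, which is the source of the savings described above.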

Consider a DIF FFT with the structure shown in Figure 2.2. The twiddle factors in each stage are shown in Table 2.1. By pushing the rotations in the first stage as described in Equation 2.8, radix-$2^2$ is obtained, with the twiddle factors shown in Table 2.2. The resulting FFT can be thought of as a decomposition of a 16-point FFT into two 4-point FFTs (radix-$2^2$), according to the Cooley-Tukey algorithm.

If rotations are pushed according to the structure described above, it is possible to obtain a higher radix (radix-$2^k$) FFT from the flow graph of a radix-2 FFT. Using radix-$2^4$, it is possible to make the rotations in every other stage $W_4$ rotations, those in every fourth stage $W_{16}$ rotations and the rest general rotations ($W_N$) [11]. A 64-point FFT can be obtained by extending the 16-point FFT (radix-$2^4$) with a 4-point FFT (radix-$2^2$), mixing the radices of the elements. This is shown in Table 2.3.

| Stage | 1 | 2 | 3 |
|---|---|---|---|
| Rotations | $W_{16}$ | $W_8$ | $W_4$ |

Table 2.1: Rotation resolutions in a radix-2 16-point FFT.

| Stage | 1 | 2 | 3 |
|---|---|---|---|
| Rotations | $W_4$ | $W_{16}$ | $W_4$ |
| Element | Radix-$2^2$ | | Radix-$2^2$ |

Table 2.2: Rotation resolutions in a 16-point FFT, decomposed as two 4-point FFTs.

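| Stage | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Rotations | $W_4$ | $W_{16}$ | $W_4$ | $W_{64}$ | $W_4$ |
| Element | Radix-$2^4$ | Radix-$2^4$ | Radix-$2^4$ | | Radix-$2^2$ |

Table 2.3: Rotations in a 64-point FFT, decomposed as a 16-point FFT and a 4-point FFT.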

Figure 2.4: Periodic extension of a periodic signal with a fractional number of periods in the 64-sample observation window. (a) Periodic signal, with the observation window marked. (b) Periodic extension, with discontinuities at the boundaries of the observation window. Axes: sample index (horizontal), amplitude (vertical).

2.4 FFT Windows

The DFT works well for signals with an integer number of cycles in the discrete time signal. However, that is usually not the case when digitizing real physical signals. When the input to the DFT contains signals with a non-integer number of cycles in the observed time period, spectral leakage occurs. Spectral leakage is a result of processing finite-duration records [13]. This is because the periodic extension of the observed signal has discontinuities at the boundaries of the observation [13], as shown in Figure 2.4. This spectral leakage distorts the result of the DFT and, therefore, must be handled.

The described problem of spectral leakage is handled by the use of windows. A window is a function that is weighted in a way that spectral leakage is minimized when the window is applied to the samples that are transformed. Windows are applied by multiplication of the window function and the input data, according to

$$x_W[n] = x[n]\,w[n] \qquad (2.9)$$

where w[n] denotes the window function.

Windows are constructed so that the window function is smoothly brought to zero near the boundaries, in order to achieve a periodic extension that is continuous in many derivatives [13]. Figure 2.5 shows the energy spectrum (calculated with a 1024-point FFT) of a signal, $x[n] = \sin\left(n \cdot \frac{2.17\pi}{16}\right)$, and a windowed version of the same signal. The Hanning window (defined in Definition 2.2) was used.

Figure 2.5: Energy spectrum of a signal and the windowed (using Hanning window) version of the same signal, calculated using FFT.
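The effect of the window on leakage can be sketched for the same signal and frame length. The Python example below is an illustration, not the thesis code: only two bins are evaluated to keep it short, and the choice of bin 300 as a bin "far from the signal" is an arbitrary assumption.

```python
import cmath
import math

N = 1024
x = [math.sin(n * 2.17 * math.pi / 16) for n in range(N)]
hanning = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
x_windowed = [xn * wn for xn, wn in zip(x, hanning)]   # Equation 2.9

def bin_energy(x, k):
    """Energy |X[k]|^2 of a single DFT bin."""
    N = len(x)
    X_k = sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
    return abs(X_k) ** 2

# The signal lies at bin 2.17 * N / 32 = 69.44, between bins 69 and 70.
peak_bin = round(2.17 * N / 32)
far_bin = 300   # a bin far away from the signal (arbitrary choice)

# Leakage measured as energy in the far bin relative to the peak bin.
leak_rect = bin_energy(x, far_bin) / bin_energy(x, peak_bin)
leak_hann = bin_energy(x_windowed, far_bin) / bin_energy(x_windowed, peak_bin)
```

With the non-integer number of cycles used here, the relative energy that leaks into the distant bin is several orders of magnitude lower for the windowed signal than for the unwindowed one.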

2.2 Definition (Hanning window). The Hanning window is defined as

$$w^{N}_{\mathrm{Hanning}} = 0.5 - 0.5\cos\left(\frac{2\pi n}{N}\right) \qquad (2.10)$$

where N is the window length [9].

2.3 Definition (Rectangular window). The Rectangular window is defined as

$$w^{N}_{\mathrm{Rectangular}} = 1 \qquad (2.11)$$

where N is the window length [9]. This window is coherent with the calculation of the DFT.

The usage of windows affects the signal, and particularly the spectrum of the signal, in many ways. All windows possess a coherent gain, which manifests as a known bias on the spectral amplitudes [13]. The spectral amplitudes are proportional to the sum of the window weights, which is the DC-bias of the window [13]. Calculating the DFT without a window is equal to using the rectangular window defined in Definition 2.3, whose gain is N. The coherent gain is usually normalized by division by N, so that the coherent gain of a rectangular window is considered to be unity [14]. When absolute measurements are performed in the spectrum, the spectrum must be corrected to compensate for the gain of the window used.
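The coherent gain described above is simply the sum of the window weights. A minimal Python sketch, assuming the window definitions from Definitions 2.2 and 2.3 and a hypothetical helper name `coherent_gain`:

```python
import math

N = 1024
rectangular = [1.0] * N
hanning = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def coherent_gain(window):
    """Sum of the window weights (the DC gain of the window)."""
    return sum(window)

gain_rect = coherent_gain(rectangular)         # N for the rectangular window
gain_hann = coherent_gain(hanning)             # N / 2 for this Hanning definition
normalized_hann = gain_hann / len(hanning)     # 0.5 after division by N
```

Under this definition, a spectrum calculated with the Hanning window would have to be scaled by a further factor of 2 compared to the rectangular case to recover absolute amplitudes.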

When the FFT is calculated repeatedly in a non-overlapping fashion on a sequence of sampled and windowed data, a large part of the signal is ignored because the windows have small values near the boundaries. This loss of data could cause an event to be missed if the transform is used to detect short signals and the signal appears near the boundaries [13]. To avoid this scenario and increase the chance of detecting such short-time signals, FFTs are calculated with overlap. However, overlap in the time domain means that the resulting spectra are correlated [13]. If an FFT is calculated with 50% overlap in the time domain using a rectangular window, the resulting spectra will be 50% correlated with each other. The correlation is highly dependent on the window used.

Figure 2.6: Hamming, Hanning, Flat-top and Blackman windows plotted in the discrete time domain.

The FFT can be thought of as a matched filter bank. Each FFT-bin is the output of the filter matched to the frequency corresponding to that FFT-bin. The frequency characteristics of the filters are the frequency characteristics of the window function used. Therefore, it is of high importance to carefully choose the window function to achieve the desired performance of the FFT. The performance measures that are usually considered are: highest sidelobe level, sidelobe falloff, Equivalent Noise Bandwidth (ENBW), 3-dB Bandwidth (BW) and scalloping loss. These measures are described in [13].

Four common window functions are shown in Figure 2.6: Hamming, Hanning, Flat-top and Blackman. A large list of windows can be found in [13].

2.5 Overlapping FFT

When an FFT is calculated, the resulting spectrum is the spectrum of the signal during a specific time period of N samples (in this thesis referred to as an FFT-frame), rather than the spectrum of the signal at a specific time instant. When FFTs are calculated one after another on a continuous stream of samples, signals may appear at any time and may therefore not be aligned to an FFT-frame. A short-time signal that is not aligned with an FFT-frame will have a lower energy peak in the calculated spectrum than the signal actually has. For example, a signal of length L that appears half of the time in frame a_i and half of the time in frame a_{i+1} will have its energy spread equally over the two frames.

Figure 2.7: A signal with length 2N − 1 will in all cases span at least one FFT-frame. Each panel shows the signal against three consecutive N-sample FFT-frames. (a) Signal spans first FFT. (b) Signal spans second FFT. (c) Signal spans second FFT. (d) Signal spans second FFT.

Since a window function is used in practical spectrum analyzer applications, the signal may not be visible at all because of the attenuation near the edges of the FFT-frame. This is not acceptable when short-time signals need to be analyzed correctly. To be sure that the full energy of the signal is captured, the signal needs to have a duration of at least 2N − 1 samples, which guarantees that it spans at least one FFT-frame. This can be understood by studying Figure 2.7. In this thesis, the minimum signal duration refers to the minimum duration a signal must have to span at least one FFT-frame.

When a signal is not long enough to be guaranteed to span an FFT-frame, the timing of the signal relative to an FFT calculation determines the resulting spectrum. Figure 2.8 demonstrates the best and worst case timing of a signal with length N/2, which is less than the minimum signal duration of 2N − 1. In the case where the signal appears in the middle of an FFT-frame, Figure 2.8b, the calculated amplitude of the signal is much higher than in the case where the signal appears between two FFT-frames, Figure 2.8c. The difference in the spectral peak between Figure 2.8b and Figure 2.8c is approximately 23.3 dB. When analysis of short-time signals is desired, it is not acceptable that the timing of short signals affects the detectability of the signal.

To relax the minimum duration requirement of a signal, several FFTs can be calculated in parallel, where the calculations are made on frames that overlap in time. For example, two N-point FFT-cores can operate with an overlap of N/2 samples. The minimum signal duration requirement is then reduced to 1.5 · N − 1, since a signal of that length is guaranteed to span at least one FFT-frame. This can be generalized to R FFTs calculated with uniform overlap, where the minimum signal duration requirement is (1 + 1/R) · N − 1. In addition to relaxing the minimum signal duration requirement, overlapping FFT calculation also decreases the effects of the timing of short-time signals. This is because the overlapping FFTs complement each other. Where the windowing of one FFT attenuates the signal strongly, the windowing of an overlapping FFT attenuates it only slightly, so all parts of the signal are considered almost equally. In other words, the gap between two successive FFT-frames (windows) is filled by an overlapped FFT.

Figure 2.8: Visualization of signal timing effects on a windowed signal for two FFT-frames. Blue line is signal and red line is applied window (Blackman). Axes: sample index versus amplitude. (a) Short-time signal. (b) Best case timing. (c) Worst case timing.

Figure 2.9: Visualization of signal timing effects on a windowed signal for two overlapped FFTs. Signal length is N/2. Blue line is signal and red line is applied window (Blackman). Axes: sample index versus amplitude. (a) Worst-case timing of short-time signal as seen from the first FFT. (b) Worst-case timing of short-time signal as seen from the second FFT (overlapped).
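The minimum signal duration for R uniformly overlapped FFTs can be checked with a small brute-force sketch. It assumes that R divides N, so that frames start every N/R samples, and it searches for the shortest signal that contains a complete FFT-frame for every possible start time. All function names are chosen for this illustration.

```python
def min_signal_duration(n, r):
    """Closed form (1 + 1/R) * N - 1, assuming R divides N."""
    return n + n // r - 1


def spans_a_frame(start, length, n, hop):
    """True if the signal [start, start + length) contains a whole frame.

    Frames are assumed to start at multiples of 'hop' and to be n samples long.
    """
    first_frame = -(-start // hop) * hop  # first frame start at or after 'start'
    return first_frame + n <= start + length


def brute_force_min_duration(n, r):
    """Shortest length that spans a frame for every start offset."""
    hop = n // r
    length = n
    # Start offsets repeat with period 'hop', so one period is sufficient.
    while not all(spans_a_frame(s, length, n, hop) for s in range(hop)):
        length += 1
    return length


for r in (1, 2, 4):
    print(r, min_signal_duration(1024, r), brute_force_min_duration(1024, r))
```

For R = 1 the sketch reproduces the 2N − 1 requirement of non-overlapped FFTs, and for R = 2 the 1.5 · N − 1 requirement.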

Figure 2.9 demonstrates this for two FFTs with 50% overlap. The figure shows the worst-case timing for a signal of length N/2 for two overlapping FFTs. If the signal appeared a little earlier, the first FFT would calculate a higher amplitude peak than the second FFT. If the signal appeared later, the second FFT would calculate a higher amplitude peak. The difference in the spectral peak between the best-case timing, Figure 2.8b, and the worst-case timing for two parallel FFTs, Figure 2.9a, is approximately 4.9 dB. The variation of the spectral peak (determined by the timing) is much lower for two overlapped FFTs than for non-overlapped FFTs. For the example provided, the variation is decreased from 23.3 dB to 4.9 dB. The variation is further decreased for higher overlap.
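The two quoted figures can be approximated with a simplified model. The sketch below assumes a constant-amplitude tone burst of length N/2 whose frequency lies on an FFT bin, so that the spectral peak is proportional to the sum of the window weights over the samples that the burst covers. It also assumes a Blackman window with the common coefficients 0.42, 0.5 and 0.08, and takes the midpoint between two frame centres as the worst-case timing. These are assumptions of this illustration, not necessarily the exact setup behind Figures 2.8 and 2.9.

```python
import math

N = 1024          # FFT-frame length
BURST = N // 2    # length of the short-time signal


def blackman(n, length):
    """Blackman weight with the common 0.42 / 0.5 / 0.08 coefficients."""
    return (0.42 - 0.5 * math.cos(2.0 * math.pi * n / length)
            + 0.08 * math.cos(4.0 * math.pi * n / length))


def peak(frame_start, burst_start):
    """Window weight summed over the part of the frame covered by the burst."""
    total = 0.0
    for n in range(N):
        t = frame_start + n
        if burst_start <= t < burst_start + BURST:
            total += blackman(n, N)
    return total


def timing_variation_db(hop):
    """Best-case peak over worst-case peak for frames starting every 'hop'."""
    best = peak(0, N // 2 - BURST // 2)       # burst centred in a frame
    centre = N // 2 + hop // 2                # midway between frame centres
    burst_start = centre - BURST // 2
    worst = max(peak(0, burst_start), peak(hop, burst_start))
    return 20.0 * math.log10(best / worst)


print(f"no overlap:  {timing_variation_db(N):.1f} dB")
print(f"50% overlap: {timing_variation_db(N // 2):.1f} dB")
```

Under these assumptions the results should land close to the 23.3 dB and 4.9 dB stated in the text.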

2.6 Review of FFT architectures

Several different FFT architectures have been proposed and implemented in digital systems. They can be divided into real-time and non-real-time architectures. Real-time architectures, as the name suggests, calculate the FFT in real time on a continuous stream of data, having a processing time that is shorter than or equal to the data acquisition time. Non-real-time FFT architectures have a processing time that is longer than the data acquisition time and therefore cannot accept a continuous stream of samples. The difference is depicted in Figure 2.10.

Figure 2.10: Real-time vs non-real-time FFT architecture. (a) Real-time FFT. (b) Non-real-time FFT.

Two common types of FFT architectures are the delay feedback [15–28] and delay commutator architectures (also known as feedforward architectures) [11, 18–20, 29–33]. These architectures come in several variants: single-path delay feedback (SDF) [15–20], multi-path delay feedback (MDF) [21–28], single-path delay commutator [18, 20] and multi-path delay commutator (MDC) architectures [11, 18, 19, 29–34]. Multi-path architectures have several data propagation paths, while in single-path architectures the data propagate in one path. The FFT architectures are divided into stages, where each stage performs butterfly operations, rotations and reordering of the sample data.

The different architectures are illustrated in Figure 2.11. PE denotes Processing Element and is a butterfly operation of some radix r. DC denotes Delay Commutator. The delay commutator is the element that arranges the samples in the correct order for the processing element. It contains delay elements. In the Multi-path Delay Feedback (MDF) architecture, several Single-path Delay Feedback (SDF) architectures are parallelized and their outputs are combined by calculating butterfly operations and rotations on the outputs.

All these architectures can process a continuous stream of data and be heavily pipelined to achieve high throughput. Multi-path architectures can simultaneously process P parallel data inputs, while single-path architectures process a single data input at a time. In modern FPGAs a clock rate of a few hundred MHz is typically achievable. Any throughput higher than this requires several samples to be processed each clock cycle.

[11] recently proposed parallel pipelined radix-2^k feedforward FFT architectures that require fewer resources compared to parallel feedback architectures. These architectures process several parallel samples each clock cycle. [11] illustrates architectures with two, four and eight parallel samples, but the parallelism can be increased even further. The architectures have a simple structure, with only three different simple building blocks (butterflies, rotators and delay commutators).

Figure 2.11: (a) Single-path delay feedback architecture. (b) Single-path delay commutator architecture. (c) Multi-path delay feedback architecture. (d) Multi-path delay commutator architecture.

Other designs of multi-path delay commutator architectures and multi-path delay feedback architectures have been proposed as well [11, 27, 32, 35, 36]. The pipelined MDC FFT architectures proposed by [32, 35] process four parallel samples. These architectures are designed for ASICs rather than FPGAs, and have more complex building blocks compared to the architectures proposed by [11]. The MDF FFT architectures proposed by [27, 36] process four parallel samples. The architecture proposed by [27] implements a radix-2^4 FFT and contains three different types of butterfly modules and three different types of rotation modules. The architecture proposed by [36] implements a radix-8 FFT and consists of three large modules with complex structures. It is designed for ASICs rather than FPGAs.

A special case of the FFT is when the input samples are real-valued. In this case the output of the FFT is conjugate symmetric, as explained in Section 2.2. Because of this property, several architectures have been proposed that are optimized and designed to handle only real-valued inputs [37–39]. [37] has proposed an architecture for the calculation of the real-valued FFT that requires less hardware than a standard complex FFT. For a 4-parallel FFT-core the number of adders is halved, and the number of complex rotators is reduced by one third compared to similar architectures [37].

3 Equipment and tools

This chapter presents the digitizer used in this thesis, other products that provide frequency domain triggering and the tools used during the thesis. Section 3.1 describes products with frequency domain triggering. Section 3.2 presents the digitizer by SP Devices used in this thesis. Section 3.3 briefly presents a product used for testing. Lastly, Section 3.4 contains a brief introduction to the tools and applications used in this thesis.

3.1 Systems with frequency domain triggering

Frequency domain triggering functionality exists in other products today. Advanced real-time spectrum or signal analyzers usually provide a frequency domain trigger to detect and trigger on spurious signal components and spectral integrity violations of a signal. A common way to implement the frequency trigger is to let the user configure a spectrum mask. A trigger is then issued when a signal component violates the masked region. This triggering technique is called Frequency Mask Trigger (FMT). Tektronix Spectrum Analyzer RSA6000 Series and Agilent X-Series Signal Analyzers provide FMT [40, 41]. FMT is achieved by the use of one or several real-time FFT-cores that transform the measured signal to the frequency domain. The resulting spectrum is then compared to the user-defined mask. If the mask is violated, a trigger is issued.

Agilent describes FMT in the following way [42].

“When looking for a specific signal, a powerful approach is to compare the fast stream of spectrum data to a user-defined spectrum mask and then generate a trigger when the mask is exceeded or when the signal enters the mask region. Further enhancements include [...]tional triggering on actions such as a signal exiting or re-entering the mask and various combinations thereof. This is FMT.”

Figure 3.1: Frequency mask trigger configuration of an Agilent real-time spectrum analyzer [40].

The configuration of a frequency trigger in an Agilent signal analyzer can be seen in Figure 3.1.

Tektronix has a short video that explains and demonstrates their FMT [43], which works in a similar fashion as the one used by Agilent.

The typical frequency range for a signal or spectrum analyzer is from a few kHz to tens of GHz [40, 41]. However, the analyzer observes a signal in a limited bandwidth within this frequency range. This bandwidth is called the acquisition bandwidth. The maximum acquisition bandwidth is 110 MHz for the Tektronix RSA6000 and 160 MHz for the Agilent X-Series Signal Analyzer [40, 41]. Frequency triggering can only be achieved within the acquisition bandwidth.

3.2 ADQ1600

A digitizer is a signal acquisition tool that samples an analog signal into digital values and stores them in a large memory. ADQ1600 is the name of the digitizer by SP Devices used in this thesis. It has a single 14-bit A/D channel that can sample data at 1600 MSps [8]. The −3 dB bandwidth of the ADQ1600 is 600 MHz, but an optional equalizer can extend the bandwidth to the Nyquist frequency (800 MHz). In contrast to signal or spectrum analyzers, the bandwidth of a digitizer is determined by the sampling frequency (and electrical components) and cannot be moved within a frequency range.

Figure 3.2: User logic placement in the FPGA.

The ADQ1600 uses a Virtex-6 FPGA by Xilinx that contains the logic necessary to trigger, collect and process the sampled signal and send it to a computer. The ADQ1600 is shown in Figure 3.3. The only connection on the ADQ1600 used in this thesis is the signal input. This signal is sampled and digitized by an ADC. The signal is then fed into the FPGA and the user logic. The user logic is a space inside the FPGA reserved for custom implementations by the customers of the ADQ1600. Figure 3.2 describes the placement of the user logic in the FPGA, in relation to the signal flow. The module implemented in this thesis is placed inside the user logic space, which provides an interface to the product. The user logic interface includes data in/out, trigger in/out and user register in/out. The user logic interface is described in more detail in Section 4.1.1. The Virtex-6 FPGA used on the ADQ1600 is described in Section 3.2.2.

3.2.1 Triggering

The signal acquisition of a digitizer is controlled by the use of triggers. As standard, there are three different trigger options inside the ADQ1600 digitizer: external trigger, software trigger and level trigger [8]. The external trigger is used when another device determines when data acquisition should take place. This device then produces a trigger signal that is connected to a dedicated external trigger input on the ADQ1600. External triggers can be used to synchronize several digitizers when they should capture data simultaneously. A software trigger is issued when a user or program signals the digitizer to trigger through a computer interface. The level trigger is based on the level of the captured signal and is used to trigger when the analog signal level is above or below a certain threshold. The captured signal is compared to a threshold in the ADQ1600 and a trigger signal is issued internally. These triggering techniques are not unique to SP Devices’ ADQ1600. The same or similar trigger functionality is usually found in digitizers [4, 44, 45].

Figure 3.3: Picture of an ADQ1600.

A data acquisition session is initialized by configuring the digitizer. The digitizer is then armed and waits for a trigger. Meanwhile, the digitizer samples the input signal and stores the data in a record. Records act as FIFOs: each new sample value pushes the oldest one out, as shown in Figure 3.4a. When a trigger is received or detected (Figure 3.4b), a user-configurable number of additional samples are captured and stored in the record. In this way, samples from both before and after the trigger are stored in the record, as seen in Figure 3.4c. Samples before the trigger are called pre-trigger samples and samples after the trigger are called post-trigger samples. The number of samples captured before and after a trigger (the record length) is user configurable, and the two can have any ratio.
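The record behaviour can be modelled in software as a bounded FIFO. The following is a minimal sketch, not the ADQ1600 implementation: the function name, the trigger predicate and the convention that the sample causing the trigger counts as the last pre-trigger sample are all assumptions made for this illustration.

```python
from collections import deque


def capture_record(stream, is_trigger, pre_len, post_len):
    """Return one record of pre-trigger and post-trigger samples.

    The record is a FIFO of pre_len + post_len samples. Once a trigger is
    detected, post_len further samples are stored and the record is returned.
    Returns None if the stream ends before the record is complete.
    """
    record = deque(maxlen=pre_len + post_len)  # oldest samples drop out
    remaining = None                           # None means armed, no trigger yet
    for sample in stream:
        record.append(sample)
        if remaining is None:
            if is_trigger(sample):
                remaining = post_len
        else:
            remaining -= 1
        if remaining == 0:
            return list(record)
    return None


# Hypothetical level trigger at 50 on a ramp signal.
print(capture_record(range(100), lambda s: s >= 50, pre_len=4, post_len=3))
```

With the ramp input the record holds the four samples up to and including the trigger point followed by the three samples after it.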

3.2.2 Xilinx Virtex-6

The ADQ1600 contains a Virtex-6 FPGA from Xilinx, more precisely the model XC6VLX240T. This model contains 37680 slices, 768 DSP slices and 416 36Kb block RAMs. Some details about the basic elements of the Virtex-6 FPGA used in this thesis are described here. Specific information about the Virtex-6 FPGA can be found in [46].

Figure 3.4: Triggering in ADQ1600. (a) Samples are stored in a record. (b) A trigger event occurs. (c) The record consists of pre-trigger samples and post-trigger samples.

Configurable Logic Block (CLB)

A CLB is the top building block in Virtex-6 FPGAs. A CLB consists of two slices. A slice is a block that holds four Look-Up Tables (LUTs), eight flip-flops, multiplexers and carry logic [47]. A LUT can be configured as a single 6-input LUT with one output or as two 5-input LUTs with two independent outputs but common inputs. The LUT outputs can be registered by the flip-flops.

A slice can contain up to four 4:1 multiplexers, two 8:1 multiplexers or one 16:1 multiplexer. A slice also contains logic for fast carry propagation used in addition and subtraction [47]. Two independent carry chains (one for each slice) exist in a CLB. The carry chains are connected between several CLBs providing wide additions/subtractions.

In the FPGA used in this thesis 14600 (≈ 39%) slices can be used as distributed RAMs (LUTRAM) or shift registers. A more detailed explanation of the Virtex-6 CLB can be found in [47].

Block RAM

The Virtex-6 contains 36Kb (Kilobit) dual-port block RAMs. Dual-port means that two ports can simultaneously and independently access (read or write) the RAM, sharing nothing but the RAM data. The block RAMs are synchronous in their operation, for both read and write accesses. This gives one cycle of latency for all operations.

The RAMs have configurable data widths ranging in discrete steps from 32K × 1 bit to 512 × 72 bits (depth × width). A 36Kb block RAM can be divided into two completely independent 18Kb block RAMs. Each 18Kb block RAM has a configurable width ranging from 16K × 1 bit to 512 × 36 bits.


The block RAMs (both the 36Kb and 18Kb versions) have configurable port operation. They can be configured as Single Port (SP), Simple Dual Port (SDP) or True Dual Port (TDP). In SP operation only one port exists, and this port can be used for both read and write operations. In SDP operation two separate ports exist that have independent clock, address, data and control signal inputs and outputs. One of the ports can only write data while the other can only read. In TDP operation two independent ports exist, just as in SDP operation. However, in TDP operation both ports can be used for reading and writing.

The width, depth and port functionality are highly configurable, but each configuration affects the others. As described, an 18Kb block RAM can be at most 36 bits wide. If a wider RAM is desired, a 36Kb block RAM (two 18Kb block RAMs) is required. This is the case for SP and SDP operation. In TDP operation the width can be at most 18 bits for an 18Kb block RAM. If a wider TDP RAM is desired, a 36Kb block RAM is required.

A more detailed description of the Block RAM can be found in [48].

DSP slices

A DSP slice, or more precisely a DSP48E1 slice, contains a multiplier, an arithmetic unit and logic functions. It provides extensive functionality for mathematical operations, including add/subtract, multiply and multiply-accumulate, as well as 10 different logic functions [49].

In this thesis it was mainly the multipliers of the DSP slices that were used. The DSP slice contains a 25x18 bit two’s complement multiplier that can be extensively pipelined to achieve high throughput [49]. If a wider multiplier is desired, several DSP48E1 slices can be used together.

3.3 Programmable synthesizer

A programmable synthesizer (Hameg HM8134-3) was used to generate test signals for manual testing of the implemented system during development. The synthesizer is capable of generating a sinusoid at frequencies from 1 Hz to 1.2 GHz in 1 Hz steps. It has an output level range of −127 to 13 dBm [50]. These specifications are more than enough to test the whole spectrum of the implemented system. A picture of the HM8134-3 is shown in Figure 3.5.

3.4 Tools

The applications and tools used in this thesis, and what each tool was used for, are listed below under the heading of the respective vendor. The tools and applications used are the ones provided and used by SP Devices. A comparison of different tools and vendors was, therefore, not conducted.

Figure 3.5: The programmable synthesizer Hameg HM8134-3.

3.4.1 SP Devices

SP Devices has developed a set of tools to use with their digitizer products. ADCaptureLab is a graphical application that allows the user to acquire data from the digitizer and plot the signal in both the time domain and the frequency domain, the latter by calculation of the FFT. The application can acquire data and save it to file for import by other applications, for example MATLAB. A screenshot of the application can be found in Figure 3.6. ADCaptureLab was used to view the signal that was used for testing the system implemented in this thesis. ADQUpdater is the tool used for updating the firmware of the digitizer products. In this thesis it was used for reprogramming the FPGA in the ADQ1600.

SP Devices also provides a MATLAB API for use with their digitizers. The API enables MATLAB scripts to configure the products as well as to acquire data from them. This API was used for communication with the implemented system. SP Devices has developed a framework that enables customers to implement their own functionality inside the digitizer by programming HDL. This customer functionality is referred to as User Logic. The User Logic is placed in the middle of the signal path, providing the ability to alter the sampled data. A development kit (DevKit) provides scripts used for setting up a project with the Xilinx Design Suite, building the project and generating a bit-file that can be uploaded to the FPGA using ADQUpdater.

3.4.2 MathWorks

The high-level model of the system was implemented in MATLAB, by MathWorks. MATLAB was chosen because it provides many mathematical functions and types (such as matrices) and contains a simple yet powerful script language [51]. MATLAB was also used for testing (creating test vectors), plotting, proof-of-concept design and ordinary mathematical calculation. The FFT function in MATLAB was used as an ideal reference for the implemented FFT-core.

3.4.3 Xilinx

Xilinx provides a set of tools for designing systems in their FPGA products. ISE Design Suite 12.4 [52] was used for coding, synthesis, simulation, design analysis, etc. The design suite contains a project manager and a source code editor that was used for coding. ISim is the simulation tool included in the suite. It is used for digital circuit simulation using HDL. It provides functionality similar to that of most HDL simulators.

Figure 3.6: A screenshot of ADCaptureLab.

Xilinx provides a library of IP cores that implement common structures such as complex multipliers and RAM/ROM, but also larger designs such as FFT and Ethernet Endpoint [53]. The IP cores are generated using a wizard, in which they can be customized to suit the designer. IP cores generated by the core generator are called LogiCORE IP.

Part II

4 Proposed design

This chapter analyzes the problem that has been faced in this thesis and presents the proposed solution. Section 4.1 describes the problem. Section 4.2.1 proposes the trigger solution. Section 4.2.2 analyzes the problem, extracts the requirements and presents the proposed design of the system.

4.1 Problem description

It is desirable to have a system that can capture a signal when the spectrum of the signal differs from an expected spectrum, since this can be used to analyze and verify electrical signals. In this way, spurious signal components, as well as intermittent signals, can be detected and captured. There is a need for a module inside the FPGA of the ADQ1600 that can analyze the sampled signal and generate a trigger if the signal spectrum deviates from an expected spectrum. The logic already present in the FPGA of the ADQ1600 is then able to handle the generated trigger signal and perform appropriate actions, like capturing the signal for a specified time. The first aim of this thesis is to implement a frequency domain trigger module inside the user logic space, described in Section 4.1.1.

To be able to use the frequency domain trigger module, there must be a way to configure which spectral conditions should cause a trigger. The ADQ1600 provides a computer interface with an API (called ADQAPI) that can be used to transfer data between the board and a computer. In order to provide configurable trigger conditions, a small software API that provides functions to configure the system has to be developed. These functions have to utilize the computer interface and the existing ADQAPI. Therefore, the second aim of this thesis is to develop this API.

Figure 4.1: The two blocks that this thesis will develop, in relation to major product blocks.

The two blocks that should be developed in this thesis are shown in Figure 4.1. By implementation of a frequency domain trigger module inside the FPGA of the ADQ1600 and a small software API, the ADQ1600 is able to act as a tool to capture events that take place in the frequency domain.

The aim of the frequency domain trigger module is to be able to trigger when the signal spectrum deviates from an expected spectrum or when the signal spectrum fulfils an expected spectrum. What is considered to be an expected signal spectrum is determined by the user of the system. The expected signal spectrum determines which range the amplitude is expected to have for all frequencies, in the whole or in parts of the bandwidth. If the signal spectrum is within the expected spectrum range, it fulfils the expectation. It may also be partly inside the expected spectrum range, or not within it at all. These three cases, and the transitions between them, should be triggerable, since they can be used to capture the appearance of spurious signal components as well as to capture events with a specific signal spectrum.
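The three cases and the transitions between them can be expressed compactly in software. The sketch below is illustrative only: the representation of the expected spectrum as per-bin lower and upper limits, and the names `classify` and `transitions`, are assumptions of this example and not the interface of the implemented module.

```python
def classify(spectrum, lower, upper):
    """Classify a spectrum against a per-bin expected range.

    Returns "inside" if every bin is within its range, "outside" if no bin
    is within its range, and "partly" otherwise.
    """
    inside = [lo <= value <= hi
              for value, lo, hi in zip(spectrum, lower, upper)]
    if all(inside):
        return "inside"
    if any(inside):
        return "partly"
    return "outside"


def transitions(states):
    """List the (previous, current) pairs where the classification changes."""
    return [(a, b) for a, b in zip(states, states[1:]) if a != b]


# Hypothetical four-bin example with limits in arbitrary energy units.
lower = [0.0, 0.0, 10.0, 0.0]
upper = [5.0, 5.0, 20.0, 5.0]
frames = [
    [1.0, 2.0, 15.0, 1.0],   # every bin within its range
    [1.0, 9.0, 15.0, 1.0],   # a spurious component appears in bin 1
    [9.0, 9.0, 0.0, 9.0],    # no bin within its range
]
states = [classify(frame, lower, upper) for frame in frames]
print(states)
print(transitions(states))
```

A trigger condition could then be tied either to a state or to one of the transitions, for example the change from "inside" to "partly" when a spurious component appears.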

The expected signal spectrum must be configurable in the whole bandwidth of the ADQ1600 (800MHz). The range in which the expected spectrum should be configurable in, should be at least the dynamic range of the input. The input to the frequency domain trigger module is 16 bit, as explained in Section 4.1.1. This gives a dynamic range of:

20 \cdot \log_{10}(2^{16}) \approx 96.3\ \mathrm{dB} \qquad (4.1)
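The same calculation can be repeated for other word lengths, for example the 14 bits of the A/D converter. The helper name below is chosen for this illustration.

```python
import math


def dynamic_range_db(bits):
    """Dynamic range of a 'bits'-wide word, computed as 20 * log10(2 ** bits)."""
    return 20.0 * math.log10(2 ** bits)


print(f"16 bit: {dynamic_range_db(16):.1f} dB")
print(f"14 bit: {dynamic_range_db(14):.1f} dB")
```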

The signal should be analyzed in real-time (Figure 2.10a), not leaving gaps in the signal acquisition (Figure 2.10b), in order to be able to detect short-time signals.

4.1.1 Frequency domain trigger module

The placement of the frequency domain trigger module in the FPGA, in relation to the signal flow, can be seen in Figure 4.1. The frequency domain trigger module is placed inside the user logic space of the FPGA firmware, described in Section 3.2. The user logic interface includes data in/out, trigger in/out, user register in/out and other signals not relevant to this thesis. The signals routed to the user logic space are the only signals available to the frequency domain trigger module. A specification of the relevant part of the interface is found in Table 4.1.

Direction   Width   Name                Description
input       1       clk_1_8             1:8 sample clock ratio
input       1       clk50               50MHz clock
input       1       rst_i               Reset
input       16      ext_trig_vector_i   External trigger input
output      16      ext_trig_vector_o   External trigger output
input       1       host_trig_i         Host trigger input
output      1       host_trig_o         Host trigger output
input       16      data_a0_i           Channel A data 0
input       1       data_valid_i        Data valid
output      16      data_a0_o           Channel A data 0
output      1       data_valid_o        Data valid
input       16x32   user_register_i     16 User registers at 32-bit
output      16x32   user_register_o     16 User registers at 32-bit

Table 4.1: Frequency domain trigger module interface.

The external trigger vector input represents the external trigger signal that is connected to the ADQ1600 card. The external trigger input is sampled at twice the data sample rate, achieving sub-sample trigger precision. Because of this, the external trigger vector is 16 bits wide to accommodate 16 trigger samples (each bit represents a trigger sample). The timing of the external trigger is fixed in relation to the sampled input data.

There is also a trigger called “host trigger”. This trigger can be asserted by the user or by a script at the host computer (software trigger). Unlike the timing of the external trigger, the timing of the host trigger is not fixed or guaranteed in any way.

The user_register_i inputs (16 registers of 32 bits) hold values that can be changed by a program or a user through the computer interface (USB, PXIe or PCIe). Similarly, the user_register_o outputs can be read by a program or user through the computer interface. These registers can be used for simple parameter configuration, status or data transfer from and to a computer.

As can be understood from the interface, eight samples are handled each clock cycle: the interface routes eight parallel samples to the user logic. The main clock input (clk_1_8) frequency is one eighth of the sample frequency, that is 200 MHz at the maximum sample frequency (1600 MHz). Data 0 holds the value of the oldest sample and Data 7 the value of the most recent sample. Likewise, the external trigger vector holds the oldest trigger value in the Least Significant Bit (LSB) and the most recent value in the Most Significant Bit (MSB). The input samples are 16-bit even though the A/D converter of the product is 14-bit. This is because signal processing is performed before the signal arrives at the user logic interface, and this processing extends the data width from 14 to 16 bits. The extra bits are kept in order not to increase the roundoff noise from the processing.
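The ordering conventions can be summarized in a short software model. This is a sketch under stated assumptions: the function names are hypothetical, and the mapping of two consecutive trigger bits to one data sample is inferred from the statement that the external trigger is sampled at twice the data sample rate.

```python
SAMPLES_PER_CYCLE = 8   # parallel data samples per clk_1_8 cycle
TRIGGER_BITS = 16       # width of ext_trig_vector_i


def user_logic_clock_hz(sample_rate_hz):
    """Frequency of clk_1_8 for a given sample rate."""
    return sample_rate_hz / SAMPLES_PER_CYCLE


def trigger_samples_oldest_first(vector):
    """Unpack the trigger vector; the LSB holds the oldest trigger sample."""
    return [(vector >> bit) & 1 for bit in range(TRIGGER_BITS)]


def first_trigger(vector):
    """Locate the oldest asserted trigger sample in one clock cycle.

    Returns (trigger_sample_index, data_sample_index), or None when no bit
    is set. Two trigger samples are assumed to correspond to one data sample.
    """
    for index, value in enumerate(trigger_samples_oldest_first(vector)):
        if value:
            return index, index // 2
    return None


print(user_logic_clock_hz(1600e6))
print(first_trigger(0b0000000000001000))
```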

The data valid input is high for each clock cycle in which the data inputs hold valid sample values; in all other cases it is low. When data is collected at 1600 MS/s, it is high for every clock cycle. Similarly, the data valid output must be set high when valid data is output from the user logic, and low in all other cases.

An empty user space implementation should simply forward the signal data, data valid and trigger inputs, via a register, to the corresponding outputs. In this way, the samples and triggers are unaltered and the system can operate normally. However, the user logic is allowed to delay and alter the data as long as data_valid_o and the other outputs are handled correctly. Trigger signals can be hijacked and internally asserted instead of forwarded.
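The forwarding behavior of an empty user space can be modeled in a few lines of Python. This is a behavioral sketch only, not HDL from the design: the class name `PassThroughUserLogic`, the `clock` method, and the reset values are assumptions made for the example.

```python
class PassThroughUserLogic:
    """One-register-stage forwarder for data, data valid and trigger (hypothetical model)."""

    def __init__(self):
        # Assumed register contents after reset: no valid data, no trigger.
        self._reg = ([0] * 8, 0, 0)

    def clock(self, data_i, data_valid_i, trigger_i):
        """Advance one clock cycle and return (data_o, data_valid_o, trigger_o)."""
        out = self._reg  # outputs show the inputs of the previous cycle
        self._reg = (list(data_i), data_valid_i, trigger_i)
        return out
```

Each call to `clock` represents one cycle of clk_1_8. The outputs lag the inputs by exactly one cycle, and data, data valid and trigger are delayed together, so their relative timing is preserved.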

4.2 Proposed solution

4.2.1 Triggering

The trigger events described in the problem description can be found by analyzing the energy spectrum of the sampled signal: the energy spectrum is calculated and compared to a user-defined expected spectrum.

To acquire the energy spectrum, the sampled signal has to be transformed from the time domain to the frequency domain. This is done by calculating the FFT of the sampled signal. The energy spectrum is then obtained by calculating the energy in each FFT bin, and triggering is possible by comparing the energy spectrum to the specified trigger conditions.
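The chain of FFT, per-bin energy and comparison can be sketched in software as follows. This is a minimal illustration and not the implemented hardware: it uses a recursive radix-2 FFT rather than a pipelined feedforward architecture, and the function names and the simple "any bin above its threshold" condition are assumptions made for the example.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def energy_spectrum(x):
    """Energy per FFT bin, taken as the squared magnitude of each bin."""
    return [abs(v) ** 2 for v in fft(x)]


def trigger(energy, threshold):
    """Assumed condition: trigger if any bin exceeds its per-bin threshold."""
    return any(e > t for e, t in zip(energy, threshold))


# A 16-point cosine centered on bin 3 concentrates its energy in bins 3 and 13.
signal = [math.cos(2 * math.pi * 3 * n / 16) for n in range(16)]
energy = energy_spectrum(signal)
fired = trigger(energy, [32.0] * 16)
```

For a unit-amplitude cosine of length N placed exactly on a bin, the two mirrored bins each have magnitude N/2, so with N = 16 their energy is 64 and the flat threshold of 32 is exceeded.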

In this section, a trigger solution is proposed. The proposed trigger functionality lets the user configure energy spectrum conditions that should cause a trigger in the system.
