Department of Electrical Engineering (Institutionen för systemteknik)
Master's thesis
Study of Interferer Canceling Systems in a Software Defined Radio Receiver
Master's thesis in Radio Electronics at the Institute of Technology, Linköping University
by
Oskar Holstensson
LiTH-ISY-EX--12/4650--SE
Linköping 2013
Supervisor: Nicolas Regimbal, Atlantic Innovation Electronic Solutions
Examiner: Ted Johansson, ISY, Linköping University
Linköping, 22 May 2013
Division, Department: Electronic Devices, Department of Electrical Engineering, SE-581 83 Linköping
Date: 2013-05-22
URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-92757
ISRN: LiTH-ISY-EX--12/4650--SE
Title (Swedish): Studie av Störsignalsneutraliserande System i en Mjukvarudefinierad Radiomottagare
Title: Study of Interferer Canceling Systems in a Software Defined Radio Receiver
Author: Oskar Holstensson
Abstract
This thesis describes the work related to an interferer rejection system employing frequency analysis and cancellation through phase-opposed signal injection. The first device in the frequency analysis chain, an analog fast Fourier transform application-specific integrated circuit (ASIC), was improved upon. The second device, a chained fast Fourier transform followed by a frequency analysis module employing cross-correlation for signal detection, was specified, designed and implemented in VHDL.
Acknowledgments
I want to express my gratitude to my supervisor Nicolas Regimbal for his helpful guidance during the thesis.
Many thanks go to my examiner Ted Johansson for his support and never-ending source of wisdom and patience.
I would also like to express my gratitude to my friends at the university, most notably my very good friend Christoffer Peters.
Finally I thank my loving family for always supporting me in whatever endeavor I submit to at the time.
Contents
Notation
I Background
1 Introduction
  1.1 Leon and the interferer rejection system
  1.2 Goals
  1.3 Document outline
2 Software radio
  2.1 Traditional radio receivers
  2.2 Software defined radio
  2.3 Cognitive radio
  2.4 This work
II Results
3 The SASP
  3.1 Introduction
  3.2 Analog processing modules
    3.2.1 Analog adders
    3.2.2 Delay line
    3.2.3 Sample and hold
    3.2.4 Weighting unit
    3.2.5 Matrix unit
    3.2.6 Sample selector
  3.3 Control modules
    3.3.1 Flip-flop
    3.3.2 Address generator
    3.3.3 Coefficient control
  3.4 Results
  3.5 Conclusions
    3.5.1 Future work
4 Frequency analysis and the DSP
  4.1 Theory
  4.2 Feasibility test
    4.2.1 Algorithm
    4.2.2 Results
  4.3 RTL implementation
    4.3.1 Buffer
    4.3.2 FFT
    4.3.3 Frequency analyzer
    4.3.4 Compensator
    4.3.5 Results
  4.4 Conclusions
    4.4.1 Future work
    4.4.2 Multiple signal detection
    4.4.3 Solving ambiguities
5 Summary
  5.1 Conclusions
  5.2 Future work
  5.3 Final words
Bibliography
A Signal detection with cross-correlation
B Fast Fourier transform
  B.1 Decimation-in-time radix-4 FFT
Notation
Abbreviations
Abbreviation  Meaning
ADC           Analog to Digital Converter
AFT           Analog Fourier Transform
ASIC          Application-Specific Integrated Circuit
CMOS          Complementary Metal Oxide Semiconductor
DAC           Digital to Analog Converter
DFT           Discrete Fourier Transform
DSP           Digital Signal Processor
FFT           Fast Fourier Transform
FPGA          Field Programmable Gate Array
FSM           Finite State Machine
MAC           Multiply And Accumulate
ROM           Read-Only Memory
RTL           Register-Transfer Level
SASP          Sampled Analog Signal Processor
SDR           Software Defined Radio
SR            Software Radio
VHDL          VHSIC Hardware Description Language
VHF           Very High Frequency
VHSIC         Very-High-Speed Integrated Circuit
Part I
1
Introduction
This thesis was carried out at Atlantic Innovation Electronic Solutions in Bordeaux with the aim of studying and implementing the interferer cancellation system proposed in the Leon project, discussed later in this chapter. It was done in the scope of a master's thesis project of 20 weeks in the spring of 2012.
The present work has been led in the framework of the ITP SIMCLAIRS competed program. France, United Kingdom and Sweden have mandated the European Defence Agency (EDA) to contract the Project with a Consortium composed of THALES SYSTEMES AEROPORTES France, acting as the Consortium Leader, SELEX Galileo Ltd, THALES UK Ltd and SAAB AB.
1.1
Leon and the interferer rejection system
Leon is a project supervised by Atlantic Innovation Electronic Solutions. The project aims at creating an interferer rejection system using the sampled analog signal processor, or SASP, described below. The goal is to be able to cancel out any wideband signal from the VHF to Ku bands, or 30 MHz to 18 GHz.
The system makes use of frequency analysis of the input signal to distinguish powerful interferers, and superimposes a phase-opposed signal on top of the input signal, delayed through a delay line. An overview of the system is depicted in figure 1.1.
For frequency analysis, the input signal is processed using an analog Fourier transform. The algorithm is the popular radix-4 fast Fourier transform (FFT) algorithm by Cooley and Tukey (1965).
To enhance the frequency resolution, the DSP itself performs another FFT. A digital signal processing unit then processes the output of the second transform to determine the frequency, amplitude and phase of any interfering signals. If the frequency resolution following the analog Fourier transform is sufficient, then the cascaded transform can be omitted, but then the AFT must supply a spectrum excerpt for analysis.
Figure 1.1: Proposed system
The purpose of the analog delay line is to delay the signal while its spectral contents are being determined, and phase-opposed rejection signals are being created. When the signal exits the delay line, the rejection signals are added to cancel out the interferers.
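As an illustration of why the phase and amplitude estimates must be accurate, the following minimal Python sketch computes how much of an interferer remains after a phase-opposed copy is added. It is not taken from the Leon implementation: the function names and example values are assumptions made for this illustration, and the interferer is modelled as a single ideal sinusoid represented by its phasor.

```python
import math


def residual_amplitude(amplitude, est_amplitude, phase_error):
    """Peak amplitude left after adding a phase-opposed estimate.

    The interferer is amplitude*cos(wt); the injected rejection signal is
    -est_amplitude*cos(wt + phase_error). The sum is a sinusoid whose
    amplitude equals the magnitude of the phasor difference.
    """
    interferer = complex(amplitude, 0.0)
    estimate = est_amplitude * complex(math.cos(phase_error),
                                       math.sin(phase_error))
    return abs(interferer - estimate)


def suppression_db(amplitude, est_amplitude, phase_error):
    """Interferer suppression in dB (infinite for a perfect estimate)."""
    residual = residual_amplitude(amplitude, est_amplitude, phase_error)
    if residual == 0.0:
        return math.inf
    return 20.0 * math.log10(amplitude / residual)


if __name__ == "__main__":
    # Hypothetical phase errors in radians, with a perfect amplitude estimate.
    for phase_error in (0.01, 0.1, 0.5):
        print(phase_error, round(suppression_db(1.0, 1.0, phase_error), 1))
```

With matched amplitudes, a phase error of $\delta$ leaves a residual of $2A\,|\sin(\delta/2)|$, so the residual grows roughly linearly with small phase errors.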
The sampled analog signal processor, or SASP, was created at lab IMS in Bordeaux by Dr. Rivet (2009) for his doctoral thesis. The signal processor performs a Fourier transform on the incoming radio signal, allowing for isolation of individual frequency components, and simplifying subsequent signal processing. A proof-of-concept SASP was created, capable of continuously sustaining 64-point Fourier transforms at a sampling frequency of 640 MHz.
Analyzing the incoming spectrum with the help of the sasp, interfering signals are detected and have their phase-opposed equivalents superimposed over them. With accurate enough analysis of the incoming spectrum, this will effectively cancel the interferers. However, the limitation to 64-point transforms proves problematic. The possibility of performing a chained Fourier transform after that of the sasp has been examined, with a proof-of-concept Matlab model. This improves the precision of the analysis, but also comes with a new set of issues.
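To make the limitation concrete, the bin spacing of a single transform follows directly from the demonstrator figures quoted above. The symbols $f_s$ (sampling frequency), $N$ (transform length) and $\Delta f$ (bin spacing) are introduced here only for this worked example.

```latex
\Delta f = \frac{f_s}{N} = \frac{640\ \text{MHz}}{64} = 10\ \text{MHz}
```

Each output bin of the 64-point transform therefore covers a span on the order of 10 MHz, which bounds how precisely the first transform alone can locate an interferer.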
1.2
Goals
This work is clearly divided into two tasks, and consequently its goals come in two parts.
• First, the requirements of the digital signal processing will be analyzed and implemented in vhdl.
• Second, the work on the sasp will be resumed and advanced towards its completion.
The first goal is open for different approaches and architectures. The second goal, in that it involves a project resumed at a late stage of development, is more restricted and has the following requirements.
• Maximum chip area: 1.44 mm²
• Maximum power consumption: 100 mW
• Minimum speed: 2 GHz
The two tasks stated above are in the chronological order in which the goals were identified and performed. However, for the sake of narrative, this report reorders them in the order in which they appear in the system.
1.3
Document outline
Chapter 2 discusses modern radio receiver challenges and presents the concept of software defined radio.
In chapter 3, the work on improving the sasp is presented. The different building blocks, analog and digital, are presented and the work done on them during this thesis is highlighted.
In chapter 4, the subject of chaining Fourier transforms is discussed. The issues with inaccuracy involved in this method of spectral analysis are presented, and solutions to these problems are discussed. The chapter presents a comprehensive workflow from mathematical concept to a Matlab model and ending in a synthesizable VHDL model.
2
Software radio
This chapter briefly brings up the topic of traditional radio and moves towards the concept of the software radio, setting the background of the work.
2.1
Traditional radio receivers
Traditional radio receivers work by tuning in to a certain channel in the wanted band. The radio signal from the antenna is amplified through a low-noise amplifier. Signals in other channels and even other bands need to be filtered out, which is often done at an intermediate frequency.
However, an interferer of sufficient power risks saturating the low-noise amplifier and might even damage it. Employing a tuned antenna or RF filters attenuates interferers out of band, and in-band interferers are already assumed to behave according to the corresponding standard.
These steps to minimize the damage caused by interferers greatly reduce the tunability of the circuit. Highly configurable RF filters of sufficient quality are very difficult to construct. To keep the cost and power consumption at a minimum, radio receivers are highly tuned and the entire signal path is optimized for the target specification.
2.2
Software defined radio
The high specialization of radio receiver circuits prohibits them from sharing more than a fraction of their signal paths. This leads to device complexity growing with the number of communication standards accommodated.
Any change in an already existing standard, such as allocating a new band, calls for a redesign of the radio hardware. To be able to accommodate an additional standard, a device needs to be upgraded with an entirely new transceiver.
The aim of software defined radio (SDR) is to create a transceiver architecture able to accommodate any number of wireless standards simultaneously, while maintaining a low power consumption. When existing standards are modified or new standards are introduced, the SDR unit is compliant right after reprogramming. The concept of the software radio was proposed by Mitola (1995).
The ideal software defined radio involves digitizing the incoming radio signal at the antenna. With a sufficiently fast and accurate analog-to-digital converter (ADC) followed by a powerful enough digital signal processing unit, any wireless standard can be accommodated. Such a device, practically only consisting of digital components, is sometimes referred to as a software radio (SR). However, the requirements this puts on the speed and accuracy of the ADC push the power consumption to impractical levels using modern technology. The concept of the all-digital software radio remains a utopian one.
More practical SDR architectures make use of both analog and digital components, sometimes with multiple signal paths to accommodate a wide frequency range. Deval (2010) discusses the problems, advantages and disadvantages of software radio compared to multi-radio approaches, and presents practical design solutions and circuit examples.
2.3
Cognitive radio
A natural extension of the software defined radio is the cognitive radio. The term was first proposed by Mitola and Maguire (1999).
Traditionally, the frequency spectrum is divided into bands and is licensed per geographical area. This regulation of the frequency spectrum is necessary to avoid overlapping bands, and consequently avoids interference between services. However, the spectrum is fully utilized only when all channels in all bands are allocated. More likely is the situation where one band is overutilized while another band on a different service is underutilized. This is a common situation with cellular communication (overutilization) versus television broadcasting (underutilization).
A cognitive radio detects free, unused channels and adapts its transmission and reception parameters to better utilize the wireless spectrum. With accurate enough spectrum sensing the cognitive radio can use the full potential of the radio environment without causing interference to other devices. A cognitive radio is basically composed of a software defined radio with spectrum sensing capabilities.
The field of cognitive radio is an active research topic. Razavi (2010) introduces a low-noise amplifier for a cognitive radio receiver for the range of 50 MHz to 10 GHz. Kitsunezuka et al. (2012) present a cognitive radio receiver capable of receiving signals between 30 MHz and 2.4 GHz. It is also able to sense spectral energy to determine band availability.
2.4
This work
The ultimate goal in the Leon project is to cancel powerful interferer signals from the VHF to Ku bands, or 30 MHz to 18 GHz. At this bandwidth any single filter is not feasible, and accommodating all possible blocker profiles is even less so. The Leon project is designed to accommodate any radio receiver operation, and targets no specific application or radio standard. The project is in other words effectively a flexible filter, directly appropriate for use in an SDR or cognitive radio receiver.
Part II
3
The SASP
3.1
Introduction
The sampled analog signal processor, or the SASP, is a device that is capable of performing a Fourier transform on analog samples, an operation referred to as the analog Fourier transform (AFT). It was created at lab IMS in Bordeaux by Dr. Rivet (2009) for his doctoral thesis.
In Leon, the sasp performs the first of two cascaded Fourier transforms. It is located at the input (figure 3.1).
Figure 3.1: The AFT in the proposed system
The SASP performs a sample-and-hold operation on the input signal, and then utilizes a series of delay lines and analog arithmetic units to perform the decimation-in-time radix-4 fast Fourier transform by Cooley and Tukey (1965), derived in section B.1. One frequency bin of the operation is selected and its complex analog value is output each time the transform has completed.
The chosen architecture has the advantage of being able to operate continuously. It inputs one sample and outputs one frequency bin per clock cycle.
A structural overview of the sasp is presented in figure 3.2.
Figure 3.2: Overview of the SASP
The sasp was previously realized in a demonstrator chip by Dr. Rivet in the 65 nm cmos technology from ST Microelectronics. The demonstrator operates at frequencies up to 640 MHz with 64 samples. The power consumption of the demonstrator is 450 mW.
The improvement work aims at elevating the operating frequency of the sasp to at least 2 GHz at a power consumption of less than 100 mW.
3.2
Analog processing modules
The principal function of the sasp is to process sampled analog signals, true to its name. In this section the different elements to achieve this function are described.
3.2.1
Analog adders
The analog adders perform addition with differential analog voltage samples. This is accomplished by adding currents; the inputs are connected to transistors that act as voltage controlled current sources (figure 3.3). The current through the common resistor exhibits an increase proportional to the sum of the input voltages, and the sum can be sensed as the increase in voltage across it.
3.2.2
Delay line
The delay lines of the SASP provide temporary storage for the samples, as each stage of the FFT requires the samples to arrive in a specific order. The first butterfly of the first stage requires the samples with indices 0, 16, 32 and 48; the stage thus needs to store samples 0-47 before the first butterfly can be processed.
Figure 3.3: Analog two-input adder
Furthermore, the delay lines play a role in the deserialization and serialization of the samples. At the input of each stage, one sample arrives per clock cycle, but the matrix unit processes four samples at a time. The output delay line then serializes the samples so that they are again sent to the next stage at a rate of one sample per clock cycle.
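The sample ordering described in this section can be sketched in a few lines of Python. The sketch assumes that the pattern stated for the first butterfly (indices 0, 16, 32 and 48) continues for the remaining butterflies of the first stage, as in a standard 64-point decimation-in-time radix-4 transform; the function names are hypothetical.

```python
def first_stage_butterflies(n_points=64):
    """Sample indices combined by each first-stage radix-4 butterfly.

    Assumption: butterfly b combines samples b, b + N/4, b + N/2 and
    b + 3N/4, extending the pattern given for butterfly 0.
    """
    quarter = n_points // 4
    return [tuple(b + i * quarter for i in range(4)) for b in range(quarter)]


def samples_buffered_before_first_butterfly(n_points=64):
    """Number of samples that must be stored before butterfly 0 can run.

    The last sample butterfly 0 needs has index 3N/4, so every sample
    with a lower index has to be held in the delay line first.
    """
    return first_stage_butterflies(n_points)[0][-1]


if __name__ == "__main__":
    print(first_stage_butterflies()[0])
    print(samples_buffered_before_first_butterfly())
```

Because each group is spread over three quarters of the input block, the input delay line has to hold most of a block before any arithmetic can start, which is why the delay line doubles as the deserializer.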
3.2.3
Sample and hold
At the input of the SASP, the sample and hold circuit converts the continuous-time input signal to a discrete-time one suitable for processing.
3.2.4
Weighting unit
Both windowing and the FFT algorithm require the input samples to be multiplied with certain coefficients. For the SASP, this is accomplished in the weighting unit. It is based on the work by Abiven (2011). The device effectively multiplies an analog sample with a digital value.
The architecture was improved in this thesis. The previous architecture utilized a base-10 approach, providing 100 possible digital values with eight control lines. This approach is called binary coded decimal and was chosen as it is clear and intuitive when programming by hand.
As the coefficients were to be provided by a read-only memory (ROM) structure that can be programmed automatically, a pure binary approach was chosen instead. This increased the number of values to 256.
The multiplication is accomplished by scaling the input by factors of $2^{-n}$, $n = 0, 1, 2, \ldots, 7$, and then adding a subset of these together. The subset is determined by the bits $b_n$ of the digital factor, giving the coefficient

\[ c = \sum_{n=0}^{7} b_n 2^{-n} \tag{3.1} \]

The largest possible coefficient is $2 - 2^{-7}$, and its use is considered as scaling the input by unity.
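The relation between the digital factor and the resulting gain can be illustrated with a short Python sketch. The bit ordering (entry n of the list enables the $2^{-n}$ branch) and the function names are assumptions for this example; the actual control-line assignment of the weighting unit is not specified here.

```python
def weighting_coefficient(bits):
    """Sum of the enabled scaling branches.

    bits is a sequence of eight 0/1 values; bits[n] enables the branch
    that scales the input by 2**-n (assumed ordering).
    """
    if len(bits) != 8:
        raise ValueError("expected eight control bits")
    return sum(bit * 2.0 ** -n for n, bit in enumerate(bits))


# Largest coefficient, obtained with all branches enabled: 2 - 2**-7.
FULL_SCALE = weighting_coefficient([1] * 8)


def normalized_gain(bits):
    """Gain relative to the largest coefficient, which is treated as unity."""
    return weighting_coefficient(bits) / FULL_SCALE


if __name__ == "__main__":
    print(FULL_SCALE)
    print(normalized_gain([1, 0, 0, 0, 0, 0, 0, 0]))
```

Under this normalization the 256 codes map to evenly spaced gains between 0 and 1, in steps of 1/255.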
Multiplying complex numbers is accomplished by using four real-valued weighting units and two two-input analog adders as shown in equation 3.2.

\[
\begin{aligned}
\Re\{\mathrm{out}\} &= \Re\{a\}\,\Re\{b\} - \Im\{a\}\,\Im\{b\} \\
\Im\{\mathrm{out}\} &= \Im\{a\}\,\Re\{b\} + \Re\{a\}\,\Im\{b\}
\end{aligned}
\tag{3.2}
\]
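Equation 3.2 maps directly onto four real multiplications and two additions. The following Python sketch mirrors that structure; the function name is hypothetical and the example operands are arbitrary.

```python
def complex_multiply(a_re, a_im, b_re, b_im):
    """Complex product built from four real products and two sums.

    Each product corresponds to one real-valued weighting unit and each
    sum or difference to one two-input adder.
    """
    out_re = a_re * b_re - a_im * b_im
    out_im = a_im * b_re + a_re * b_im
    return out_re, out_im


if __name__ == "__main__":
    # (1 + 2j) * (3 + 4j)
    print(complex_multiply(1, 2, 3, 4))
```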
3.2.5
Matrix unit
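The matrix unit implements the addition matrix derived in section B.1. Equation B.6 is included here for clarity.

\[
\begin{bmatrix} X(k) \\ X(k + \tfrac{N}{4}) \\ X(k + \tfrac{N}{2}) \\ X(k + \tfrac{3N}{4}) \end{bmatrix}
=
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix}
\begin{bmatrix} 1 & & & \\ & W_N^{k} & & \\ & & W_N^{2k} & \\ & & & W_N^{3k} \end{bmatrix}
\begin{bmatrix} F_0(k) \\ F_1(k) \\ F_2(k) \\ F_3(k) \end{bmatrix}
\]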
The trivial multiplications by factors −j, −1 and j are performed by simply rewiring the differential analog signals at the input of the analog adders.
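A software model of one radix-4 butterfly shows how the addition matrix and the trivial multiplications fit together. This is a minimal sketch, assuming the common twiddle factor convention $W_N = e^{-j2\pi/N}$ (the convention of appendix B is not restated in this chapter); the helper names are hypothetical. Swapping the real and imaginary parts with one sign change plays the role of the rewiring of the differential signals.

```python
import cmath


def mul_neg_j(z):
    """Multiply by -j: swap real and imaginary parts, negate the new imaginary part."""
    return complex(z.imag, -z.real)


def mul_pos_j(z):
    """Multiply by +j: swap real and imaginary parts, negate the new real part."""
    return complex(-z.imag, z.real)


def radix4_butterfly(f0, f1, f2, f3, k, n_points):
    """Return X(k), X(k+N/4), X(k+N/2), X(k+3N/4) from the four sub-transforms."""
    w = cmath.exp(-2j * cmath.pi * k / n_points)  # assumed W_N^k
    t0, t1, t2, t3 = f0, w * f1, w ** 2 * f2, w ** 3 * f3
    return (
        t0 + t1 + t2 + t3,
        t0 + mul_neg_j(t1) - t2 + mul_pos_j(t3),
        t0 - t1 + t2 - t3,
        t0 + mul_pos_j(t1) - t2 + mul_neg_j(t3),
    )


def dft(x):
    """Direct DFT, used only as a reference for checking the butterfly."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


if __name__ == "__main__":
    x = [complex(n, 1 - n) for n in range(4)]
    # For N = 4 the sub-transforms are the samples themselves.
    print(radix4_butterfly(x[0], x[1], x[2], x[3], 0, 4))
    print(dft(x))
```

For a 4-point input the butterfly alone reproduces the DFT; for longer inputs the four sub-transforms are taken over every fourth sample.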
3.2.6
Sample selector
The sample selector waits at the end of the pipeline, and grabs one specific frequency bin every time it appears at the output. The frequency bin to be selected is programmable by specifying the corresponding binary number at 6 input pins.
After the sample selector is a set of buffers to drive the output pins of the chip. These buffers have an extended output swing to facilitate chip measurement.
3.3
Control modules
To control the workflow and provide coefficients for the analog weighting units, a set of control modules is required. This section presents the principal modules and their function.
The modules described here were all designed and implemented during this thesis.
3.3.1
Flip-flop
Digital flip-flops are used to store the digital values used for controlling the SASP. The selected architecture is that of Yuan and Svensson (1989). It was chosen for its simplicity; it does not require complementary clock phases.
The architecture is dynamic and will lose its data unless a minimum clock speed is maintained. This is the digital equivalent of the delay line cell. The schematic and layout of the flip-flop are depicted in figure 3.4 and figure 3.5 respectively.
Figure 3.4: Flip-flop schematic
Figure 3.5: Flip-flop layout
3.3.2
Address generator
To control the sasp, an address generator unit is used. It contains a 6-bit counter to provide a global state followed by adders to compensate for phase differences in the different stages of processing.
Since the contents of the rom modules are easily manipulated, their contents can be shifted to obtain a virtual phase shift. Moreover, the phase difference of the sample selector can be compensated off-chip. By utilizing these techniques, the Hamming window unit, stage 1 and the sample selector all run on the base address of the address generator, saving space and power. Stages 2 and 3 run on addresses with their own phase adjustments.
The counter and adder architectures are those of Kogge-Stone adders (Kogge and Stone, 1973). Stages 2 and 3 have hard-coded offsets to provide the required phase difference. The Kogge-Stone architecture was selected because of the speed requirements; the adders need to operate reliably at 2 GHz. Simple carry-chain adders proved to be too slow for the application, even for as few as 6 bits.
Adding two binary numbers the pen-and-paper way, one begins the addition at the rightmost bits. If both bits are 1, a value of one carries over to the left. The algorithm then proceeds to the next bit position, adding the bits at that position together with any carried bit. This is repeated for all bits.
The problem with this algorithm is that it forms a chain of carried bits, and the final outcome will have to wait until this long chain is fully traversed. To speed up this process, carry-lookahead is performed.
For each pair of bits of the inputs, $A_n$ and $B_n$, $n = 0, 1, \ldots, N - 1$, two properties are derived.
• If $A_n$ and $B_n$ are both 1, then a bit will be carried to the left regardless of whether a carry arrives from the right. This property is called generate, or $G_n$.
• If only one of $A_n$ or $B_n$ is 1, then a bit will be carried if and only if a carry arrives from the right. This property is called propagate, or $P_n$.
These two properties are calculated as a first step. Secondly, these properties are calculated for all pairs of two consecutive positions of the inputs. For positions $n$ and $n - 1$, this group of bits is set to generate if bit $n$ already generates ($G_n = 1$), or if bit $n - 1$ is set to generate while bit $n$ is set to propagate that bit. The entire group of bits is set to propagate if both bits of the sequence propagate.
Extending these definitions for any group of bits running from $n$ to $m$, their cumulative generate and propagate properties are called $G_{n \leftrightarrow m}$ and $P_{n \leftrightarrow m}$ respectively.
The cumulative properties can be extended, always including one bit to see if the extended group is set to generate or propagate. However, a lot of redundant processing is avoided by instead taking the generate and propagate properties of a group, and combining it with the largest adjoining group that has already been resolved.
Ultimately this will determine $G_{n \leftrightarrow 0}$ and $P_{n \leftrightarrow 0}$ for all bit positions $n$. Since each processing step can effectively double the group length, a total of $\log_2 N$ processing steps are required.
When a group has the propagate property, it will propagate to the left any carried bit from the right. However, when the cumulative properties are known all the way to the least significant bit (bit 0), there is no possibility of a carry bit arriving from the right. The generate property then unambiguously determines whether the group has generated a carry bit or not. Each complete group thus sends a carry bit to the left if and only if it has the generate property, that is, $C_n = G_{n-1 \leftrightarrow 0}$.
To arrive at the sum, the three bits $A_n$, $B_n$ and $C_n$ are added to form the sum bit $S_n$. This calculation can reuse the $P_n$ property; since $P_n = A_n \oplus B_n$ it is possible to reduce the sum calculation to $S_n = P_n \oplus C_n$.
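The carry-lookahead procedure described above can be modelled bit by bit in software. The following Python function is a behavioural sketch of a Kogge-Stone adder, not a description of the laid-out circuit; the function name and the default width of 6 bits are choices made for this example.

```python
def kogge_stone_add(a, b, width=6):
    """Add two unsigned integers modulo 2**width with a Kogge-Stone prefix network."""
    a_bits = [(a >> n) & 1 for n in range(width)]
    b_bits = [(b >> n) & 1 for n in range(width)]

    # Step 1: per-bit generate (both inputs 1) and propagate (exactly one input 1).
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    p_bit = p[:]  # per-bit propagate, kept for the final sum

    # Step 2: combine each group with the adjoining group `distance` bits to
    # the right; the distance doubles every step, so about log2(width) steps
    # are needed.
    distance = 1
    while distance < width:
        g_next, p_next = g[:], p[:]
        for n in range(distance, width):
            g_next[n] = g[n] | (p[n] & g[n - distance])
            p_next[n] = p[n] & p[n - distance]
        g, p = g_next, p_next
        distance *= 2

    # g[n] now holds the cumulative generate from bit n down to bit 0.
    # The carry into bit n is the cumulative generate of bits n-1 .. 0;
    # nothing is carried into bit 0.
    carry = [0] + g[:-1]
    sum_bits = [p_bit[n] ^ carry[n] for n in range(width)]
    return sum(bit << n for n, bit in enumerate(sum_bits))


if __name__ == "__main__":
    print(kogge_stone_add(37, 45))
```

In this model the 6-bit counter corresponds to repeatedly evaluating `kogge_stone_add(state, 1)`, and a phase-compensated stage address to `kogge_stone_add(state, offset)`, where `offset` stands for the hard-coded value of that stage (the actual offsets are not given here).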
The phase-compensated addresses are converted to base-4, decoding each pair of binary bits into four lines. This goes well with the radix-4 design of the FFT implementation, and also reduces the complexity of the coefficient ROM architecture.
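The base-4 decoding can be sketched as follows. The sketch assumes that the four lines of each digit form a one-hot code and that digits are listed starting from the least significant bit pair; both choices, like the function name, are assumptions of this example.

```python
def decode_base4(address, digits=3):
    """Split an address into base-4 digits and one-hot encode each digit.

    Returns a list of (digit, lines) pairs, least significant digit first.
    `lines` is a tuple of four 0/1 values of which exactly one is set.
    """
    decoded = []
    for i in range(digits):
        digit = (address >> (2 * i)) & 0b11
        lines = tuple(1 if digit == value else 0 for value in range(4))
        decoded.append((digit, lines))
    return decoded


if __name__ == "__main__":
    for digit, lines in decode_base4(0b100111):
        print(digit, lines)
```

With one-hot digits, a word line decoder only has to combine one line per digit, three in total for a 6-bit address, instead of comparing all six bits and their complements.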
Calculate P and calculate G circuits were created and laid out. The schematic and layout for calculate P are shown in figure 3.6 and figure 3.7 respectively. The schematic and layout for calculate G are shown in figure 3.8 and figure 3.9 respectively.
Figure 3.6: Calculate P schematic
Figure 3.7: Calculate P layout
Figure 3.8: Calculate G schematic
Figure 3.9: Calculate G layout
Using the above structures, a matrix performing all of the reductions was created and laid out. The final layout for the structure calculating the cumulative generate property is shown in figure 3.10. The input to the circuit is at the top terminals, and the output is routed from the bottom.
3.3.3
Coefficient control
The Hamming window unit, stage 2 and stage 3 all require digital coefficients for their weighting units. These coefficients are supplied by use of a NOR rom structure, as seen in figure 3.11.
Figure 3.11: ROM principle
Each word line is controlled by a small logic unit. It activates during one half of the clock cycle when the right address is supplied. During the other half of the clock cycle the bit lines are charged to $V_{DD}$ by pull-up transistors.
The contents of the ROM are indicated by the presence or absence of a pull-down transistor. When a word line is activated, the presence of a transistor at the junction between said word line and a bit line will pull the corresponding bit line towards ground, signifying a logical zero at this address. The bit lines without transistors in the active junction will remain at $V_{DD}$, signifying a logical one. The contents of the 3x3 example ROM depicted in figure 3.11 are 010, 101 and 111 at addresses 0, 1 and 2 respectively.
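The read mechanism of the NOR ROM can be mimicked with a small behavioural model. The transistor placement below is inferred from the contents stated for the 3x3 example, assuming that bit line 0 corresponds to the leftmost digit of each word; the names are hypothetical.

```python
# True marks a pull-down transistor at the junction of a word line (row)
# and a bit line (column). Placement inferred from the stated contents.
EXAMPLE_PULL_DOWNS = [
    [True, False, True],    # address 0 reads 010
    [False, True, False],   # address 1 reads 101
    [False, False, False],  # address 2 reads 111
]


def read_word(pull_downs, address):
    """Model one precharge and evaluate cycle of a NOR ROM."""
    word_line = pull_downs[address]
    bit_lines = [1] * len(word_line)      # precharge: all bit lines at VDD
    for bit, has_transistor in enumerate(word_line):
        if has_transistor:
            bit_lines[bit] = 0            # evaluate: transistor pulls the line low
    return "".join(str(level) for level in bit_lines)


if __name__ == "__main__":
    for address in range(3):
        print(address, read_word(EXAMPLE_PULL_DOWNS, address))
```

A stored zero costs one transistor and a stored one costs none, so the stored data is simply the complement of the transistor map.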
The bit lines are heavily exposed to parasitic capacitance, and are therefore very slow. To accurately read the value of the bit lines at each cycle a clocked sensor approach is used.
When a bit line voltage drops below a reference voltage, nominally 200 mV below $V_{DD}$, an internal node is quickly discharged. This signifies a logical zero. If the bit line voltage is kept high, the internal node is kept undischarged and a logical one is implied. At the end of the discharge phase, the logical value is forwarded to the output of the sensor.
The schematic and layout of the sensor are depicted in figure 3.12 and figure 3.13 respectively.
The layout of the coefficient ROM for the last stage of the SASP is depicted in figure 3.14. The address is input from the right and the data is output to the left.
Figure 3.12: ROM sensor schematic
3.4
Results
The address generator was validated by post-layout simulation at 3 GHz to guarantee robustness at the nominal frequency of operation, 2 GHz.
The post-layout simulation puts the average current of the address generator at 1.43 mA.
Figure 3.15 shows the state of the internal six-bit address counter.
Figure 3.16 shows the base-4 address of the Hamming window unit and stage 1. It is a delayed version of the internal address with each pair of bits decoded into the equivalent base-4 digit.
Figure 3.17 shows the base-4 address of stage 2. This address enjoys a phase offset in addition to being delayed and decoded into three base-4 digits.
Figure 3.15: Address counter (clock and bits 1 to 6 versus time, 0 to 20 ns)
All of the ROM structures were verified by post-layout simulation. The largest ROM, that of the last stage and the one depicted in figure 3.14, has an average current consumption of 4.25 mA.
The results show that the address generator is able to reliably provide addresses to all the blocks without discrepancies at up to 3 GHz.
Figure 3.16: Hamming window unit and stage 1 address (base-4 digits 1 to 3 versus time, 0 to 20 ns)
Figure 3.17: Stage 2 address (base-4 digits 1 to 3 versus time, 0 to 20 ns)
To test the finished modules, the entire Hamming window unit with coefficient ROM was tested along with the address generator. The simulation includes the effects of resistive and capacitive parasitics after layout. Figure 3.18 depicts the output of the Hamming window unit running at 3 GHz. The input to the weighting unit in this simulation is a constant voltage signal.
The output shows that the weighting unit is able to faithfully reconstruct the raised cosine that is the Hamming window. The address generator and the coefficient ROM supply the required control signals for it to work.
Figure 3.18: Output of the Hamming window unit at 3 GHz (potential in mV versus time in ns)
3.5
Conclusions
At the end of the thesis, the work on the improved SASP had come a long way. The blocks that were worked on in this thesis (the address generator, the coefficient ROM blocks and the weighting unit) were finished. However, much work is still required to arrive at a completed circuit.
The constructed digital units include the rom architecture and the address gen-erator. These perform well under post-layout simulations up to 3 GHz, while the target speed is 2 GHz. This speed margin leaves room for some amount of parasitic capacitance when routing the address signals across the chip, as well as some margin for fabrication. These blocks comprise the high-level control of all the stages.
The weighting unit, arguably one of the most critical blocks in the system, works reliably in simulations. It enjoys an improved architecture that paves the way for more linear behavior of the circuit.
As mentioned in section 1.2 in the introduction, the chip has power and area requirements in addition to the 2 GHz speed requirement. Since the SASP was not fully finished, there was not yet any substantial area estimation, but an approximation put the occupied area well below the limit. The power consumed by the created blocks was moderate enough not to count substantially towards the maximum.
3.5.1
Future work
The delay lines of all the stages had their analog parts laid out even before the work of the thesis. However, still missing is the decoding circuitry, using the incoming address to input or output samples in the correct order. The tight coupling between the delay lines and their control circuitry will inevitably lead to some amount of redesign of the analog architecture.
The input sample block as well as the sample selector (output) block need to be realized and verified up to 2 GHz.
The area and power requirements need to be properly assessed; as the chip nears completion a power and area budget needs to be drawn up and followed through.
As a final task, the system will need to be assembled and verified by post-layout simulation including the entire chip die to guarantee satisfactory performance up to the desired operating frequency.
4
Frequency analysis and the DSP
In the Leon topology, the digital signal processor (dsp) takes care of the frequency analysis and forwards detected interferers to the signal generator (figure 4.1). The purpose is to increase the fidelity of the signal detection by further processing the sasp output to produce a more fine-grained spectrum followed by an improved signal detection algorithm.
This chapter looks into the theory and implementation of the frequency analysis and further highlights the improvements in the signal detection algorithm.
Figure 4.1: The dsp in the proposed system
4.1
Theory
The sasp performs a discrete Fourier transform (dft) using the fft algorithm for efficient computation.
Before performing the dft, windowing is required to minimize leakage. The sasp uses the Hamming window (figure 4.2) for this purpose.
Figure 4.2: Hamming window (amplitude versus normalized time)
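As a minimal sketch, the window shape in figure 4.2 can be generated as follows. The coefficients 0.54 and 0.46 are the textbook Hamming values and are an assumption here; the exact values stored in the sasp are not stated in this chapter. The function name `hamming` is hypothetical.

```python
import math


def hamming(num_points: int) -> list[float]:
    """Symmetric Hamming window with the textbook 0.54/0.46 coefficients (assumed)."""
    if num_points < 2:
        return [1.0] * num_points
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_points - 1))
        for n in range(num_points)
    ]


# A 64-point window, matching the transform size used in the feasibility test.
window = hamming(64)
```

The window tapers towards both ends of the sample block, which is what suppresses leakage into distant frequency bins at the cost of a wider main lobe.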
The highest side lobes of the Hamming window are at -46 dB, and leakage beyond a distance of two frequency bins is attenuated at least by this amount (figure 4.3). However, the leakage to neighboring frequency bins is significant. This leads to ambiguity in the actual frequency of the blockers.
In addition to attenuation, the windowing exhibits phase distortion (figure 4.4) when the sinusoidal source is of a frequency that is not a multiple of the sampling frequency.
The dsp performs a cascaded fft with the same windowing procedure. The spectral leakage introduced by the two windowing functions is a significant source of error. For instance, the phase error can be as large as π/2, which prevents the information from being useful, as the phase error must be kept small. A signal detection algorithm to counter the effects of the two instances of windowing is derived in appendix A. This algorithm was used in the subsequent feasibility test and vhdl implementation.
4.2
Feasibility test
Figure 4.3: Magnitude of the Hamming window (normalized), in dB versus frequency bin
4.2.1
Algorithm
For an aft of N points and a succeeding dft of M points:
1. Provide NM input samples
2. Perform M Fourier transforms of input samples 0 . . . N − 1, N . . . 2N − 1, · · · , N(M − 1) . . . NM − 1
3. Select one bin and gather its corresponding samples from the successive Fourier transforms
4. Perform a second Fourier transform on these M samples
5. Sweep the spectrum, and for each peak above a certain threshold found, obtain its precise frequency, amplitude and phase information using the algorithm derived in appendix A
6. Generate phase-opposed signals and add them to the input
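Steps 1 to 4 above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Matlab model used in the thesis: a direct O(n²) transform stands in for the aft and the fft, the Hamming coefficients 0.54/0.46 are the textbook values, and the test tone at 53.75 MHz is hypothetical. The sampling frequency, transform sizes and bin index are those quoted in section 4.2.2.

```python
import cmath
import math


def dft(samples):
    """Direct discrete Fourier transform (stands in for the aft and the fft)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def hamming(n):
    """Textbook Hamming window (coefficients are an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def cascaded_spectrum(signal, n_points, m_points, bin_index):
    """M windowed N-point transforms, then an M-point transform of one bin."""
    assert len(signal) == n_points * m_points              # step 1
    w1, w2 = hamming(n_points), hamming(m_points)
    bin_track = []
    for k in range(m_points):                               # step 2
        block = signal[k * n_points:(k + 1) * n_points]
        spectrum = dft([w * x for w, x in zip(w1, block)])
        bin_track.append(spectrum[bin_index])               # step 3
    return dft([w * x for w, x in zip(w2, bin_track)])      # step 4


fs, n_pts, m_pts = 640e6, 64, 64
f_in = 53.75e6                                              # hypothetical test tone
signal = [math.cos(2 * math.pi * f_in * t / fs) for t in range(n_pts * m_pts)]
fine = cascaded_spectrum(signal, n_pts, m_pts, bin_index=5)
peak = max(range(m_pts), key=lambda m: abs(fine[m]))
# Coarse bin centre plus the fine bin offset within that bin.
f_detected = 5 * fs / n_pts + peak * fs / (n_pts * m_pts)
```

Steps 5 and 6 (fine peak matching and generation of the phase-opposed signal) are left out here; the sketch only shows how the second transform refines the frequency resolution inside one bin of the first.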
4.2.2
Results
The test consists of a sweep of one input signal from 50 MHz to 60 MHz, with an amplitude of one and a phase of zero. The sampling frequency of the first 64-point aft is 640 MHz, and the frequency appears in bin 5. The successive output samples of this bin are fed to an fft of 64 points and then analyzed using coarse detection and correlated detection.
Figure 4.4: Phase of the Hamming window (radians versus frequency bin)
Figure 4.5: Detected amplitude versus frequency (MHz) for standard and correlated detection
The Matlab model shows improved precision of the detected amplitude when the frequency is not a multiple of the sampling frequency (figure 4.5).
Without correlation, reading the phase directly from the frequency bin with the greatest magnitude gives a phase error from −π/2 to π/2, which is not useful when the goal is to generate a phase-opposed signal (figure 4.6). This method of peak detection is here called standard.
Figure 4.6: Detected phase (radians) versus frequency (MHz) for standard and correlated detection
Selecting the frequency bin with the greatest magnitude naturally leads to quantization of the detected frequency. Using correlation, the frequency can be determined with greater fidelity (figure 4.7).
In both cases, for input frequencies very close to 60 MHz, the algorithms detect a frequency around 50 MHz. This is due to the ambiguity introduced by sampling; for this frequency bin, 50 MHz is equivalent to DC, and 60 MHz is equivalent to the sampling frequency.
Figure 4.7 (panels: Standard, Correlated; axes: frequency error in percent versus frequency in MHz)
After detection, a phase-opposed signal is generated and superimposed over the input signal. The attenuation is then measured as the total spectral power after compensation as compared to the total spectral power before compensation. When using correlation, the attenuation shows a more regular behavior (figure 4.8). This is partially due to the sensitivity to phase error when generating the phase-opposed signal. When the phase error is large, as is the case when not correlating, the compensation signal will not fully cancel the interferer.
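The sensitivity to phase error can be made concrete with a small worked example. For a single sinusoid, the residual after adding a phase-opposed estimate is the phasor 1 − (1 + ε)·exp(jδ), where ε is the relative amplitude error and δ the phase error. The sketch below assumes a single interferer and perfect frequency estimation; the function name `attenuation_db` is hypothetical.

```python
import cmath
import math


def attenuation_db(amplitude_error: float, phase_error: float) -> float:
    """Attenuation of a unit sinusoid after adding a phase-opposed estimate.

    The estimate has relative amplitude (1 + amplitude_error) and is off by
    phase_error radians, so the residual phasor is 1 - (1 + e) * exp(j * d).
    """
    residual = abs(1.0 - (1.0 + amplitude_error) * cmath.exp(1j * phase_error))
    if residual == 0.0:
        return math.inf
    return -20.0 * math.log10(residual)


small_error = attenuation_db(0.0, 0.0316)        # phase error of about 1.8 degrees
worst_case = attenuation_db(0.0, math.pi / 2)    # phase error of pi/2
```

Under these assumptions a phase error of roughly 0.03 rad already limits the attenuation to about 30 dB, while a phase error of π/2 yields a negative attenuation, meaning that the compensation signal increases the interferer power instead of reducing it.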
Figure 4.8: Attenuation (dB) versus frequency (MHz) for standard and correlated detection
4.3
RTL implementation
A vhdl register-transfer level (rtl) implementation was created, chiefly consisting of an fft unit followed by a frequency analyzer unit (figure 4.9). The frequency analyzer unit scans the spectrum coming from the fft, and when it detects peaks using simple threshold calculations, a peak matcher unit is dispatched to find the exact frequency of the peak.
Due to the inelasticity of the input timing, a buffer unit precedes the fft unit so that samples are saved while the fft is performed.
A compensator unit is placed after the frequency analyzer unit to compensate for the effects of windowing and cross-correlation, discussed above.
The number of points for the rtl implementation is adjustable by a single parameter.
Figure 4.9: rtl implementation layout (blocks: input buffer, radix-4 FFT, frequency analyzer, peak matcher, compensator)
4.3.1
Buffer
Before the fft a small buffer is placed that stores samples when the fft is performing its calculations. The buffer unit places incoming samples in a small queue and provides them to the fft unit when it is ready to accept them. The length of the buffer unit depends on the size of the fft and the width of the peak detection. Resizing of the buffer might be needed if these parameters change.
In implementations where the input sampling is governed by another clock domain, the buffer unit will also serve to transfer the data into the clock domain of the fft and the rest of the system.
4.3.2
FFT
The fft unit consists of one radix-4 butterfly performing the transform in-place, using four complex sample memories for intermediate sample storage. The implementation uses a decimation-in-frequency decomposition, as derived in section B.2. The fft unit is depicted in figure 4.10.
Since the data is manipulated in-place and kept in four separate memories, the algorithm is careful to always place the samples so that the four samples for each butterfly operation reside on separate memories. This is accomplished by shifting the samples a certain number of steps clockwise when reading from and writing to the memories.
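One well-known way to obtain such a conflict-free placement is to store each sample in the memory whose index is the sum of the base-4 digits of the sample address, modulo 4. The sketch below illustrates that scheme; it is an assumption for illustration and not necessarily the rotation used in the vhdl implementation. All function names are hypothetical.

```python
def base4_digits(address: int, num_digits: int) -> list[int]:
    """Base-4 digits of an address, least significant digit first."""
    return [(address >> (2 * d)) & 3 for d in range(num_digits)]


def bank_of(address: int, num_digits: int) -> int:
    """Memory bank (0..3) for a sample address: digit sum modulo 4."""
    return sum(base4_digits(address, num_digits)) % 4


def butterfly_addresses(stage: int, index: int, num_digits: int) -> list[int]:
    """The four addresses of butterfly `index` in `stage`.

    They are identical except for base-4 digit number `stage`,
    which takes the values 0, 1, 2 and 3.
    """
    low = index & ((1 << (2 * stage)) - 1)       # digits below the varying digit
    high = index >> (2 * stage)                  # digits above the varying digit
    base = (high << (2 * (stage + 1))) | low
    return [base | (d << (2 * stage)) for d in range(4)]


# For a 64-point transform (three base-4 digits) every butterfly in every
# stage touches each of the four memories exactly once.
conflict_free = all(
    sorted(bank_of(a, 3) for a in butterfly_addresses(stage, index, 3)) == [0, 1, 2, 3]
    for stage in range(3)
    for index in range(16)
)
```

Since the four addresses of a butterfly differ in exactly one base-4 digit, their digit sums are four consecutive values modulo 4, so the four samples always land in different memories.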
Inevitably, data hazards would occur as one stage of the fft ends and the next one begins. Wait states are inserted to avoid this.
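Equation B.6 is included here for clarity. The butterfly computes the complex operation, for $n = 0, 1, 2, \ldots, N/4 - 1$:
\[
\begin{bmatrix} y_0(n) \\ y_1(n) \\ y_2(n) \\ y_3(n) \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & W_N^{n} & 0 & 0 \\ 0 & 0 & W_N^{2n} & 0 \\ 0 & 0 & 0 & W_N^{3n} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -j \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & j \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} x(n) \\ x(n+N/4) \\ x(n+N/2) \\ x(n+3N/4) \end{bmatrix}
\]
This amounts to three non-trivial complex multiplications and eight complex additions per operation. In the implementation the calculation is pipelined in three stages; two for the additions and one for the complex rotation. The twiddle factors $W_N^m$, $m = 0, 1, 2, \ldots, N - 1$, are pre-calculated and stored in a look-up table.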
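The butterfly can be checked numerically with a short Python sketch. It follows the same three-step structure (two adder stages followed by the rotation), wrapped in a recursive decimation-in-frequency transform. The recursion and the helper names are assumptions made for illustration; the vhdl implementation is iterative and in-place. The twiddle factor is taken as W_N = exp(−2πj/N).

```python
import cmath
import math


def butterfly(x0, x1, x2, x3, n, size):
    """Radix-4 decimation-in-frequency butterfly: 8 additions, then 3 rotations."""
    a0, a1, a2, a3 = x0 + x2, x0 - x2, x1 + x3, x1 - x3             # first adder stage
    b0, b1, b2, b3 = a0 + a2, a1 - 1j * a3, a0 - a2, a1 + 1j * a3   # second adder stage
    w = cmath.exp(-2j * math.pi * n / size)                          # W_N^n
    return b0, b1 * w, b2 * w ** 2, b3 * w ** 3


def fft_radix4(x):
    """Recursive radix-4 DIF transform; len(x) must be a power of four."""
    size = len(x)
    if size == 1:
        return list(x)
    quarter = size // 4
    y = [[0j] * quarter for _ in range(4)]
    for n in range(quarter):
        outs = butterfly(x[n], x[n + quarter], x[n + 2 * quarter], x[n + 3 * quarter], n, size)
        for i in range(4):
            y[i][n] = outs[i]
    result = [0j] * size
    for i in range(4):
        sub = fft_radix4(y[i])            # sub-transform i yields outputs X(4k + i)
        for k in range(quarter):
            result[4 * k + i] = sub[k]
    return result


def dft_direct(x):
    """Reference transform straight from the definition."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]


samples = [complex(math.sin(0.7 * t), math.cos(1.3 * t)) for t in range(16)]
max_error = max(abs(a - b) for a, b in zip(fft_radix4(samples), dft_direct(samples)))
```

The multiplications by ±j in the second adder stage only swap and negate real and imaginary parts, which is why they are not counted as multiplications.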
When inputting samples, the pipeline is shorted right before the first butterfly multiplier, and it is used to apply the Hamming window. The Hamming window values are stored in a separate look-up table.
To minimize latency, the last stage of the transform calculates samples in the order that the frequency analyzer unit expects them and outputs them as they become available.
Figure 4.10: fft (blocks: pipeline control, butterfly, coefficient table, sample shifters, sample memory read and write ports)
4.3.3
Frequency analyzer
The frequency analyzer is a higher order unit that effectively detects and matches the peaks that appear in the input spectrum. It consists of a peak detector, a peak matcher arbiter and one or more peak matchers. The principal flow of the frequency analyzer is shown in figure 4.11.
Figure 4.11 (blocks: peak detector, peak matcher arbiter, peak matcher)
Peak detector
The peak detector serves to detect energy peaks in the spectrum, signalling to the peak matcher arbiter when a falling edge from a sample of sufficient magnitude is detected. It takes data directly from the fft and delays it so as to be able to supply the peak matcher with a full spectrum excerpt. The peak detector is depicted in figure 4.12.
After detecting a peak, the frequency analyzer temporarily inhibits the detection so as not to trigger multiple times on the same peak. This limits the minimum distance between two adjacent signals. Subsequent signals within this minimum distance will be ignored.
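The detection rule described above can be sketched behaviourally as follows. This is a software illustration, not the vhdl unit: the threshold, the inhibit length and the function name `detect_peaks` are assumptions, and the squared magnitude is used so that no square root is needed.

```python
def detect_peaks(spectrum, threshold_sq, inhibit_bins):
    """Indices where a falling edge follows a bin of sufficient power."""
    peaks = []
    inhibit = 0
    previous_power = 0.0
    for index, value in enumerate(spectrum):
        power = value.real ** 2 + value.imag ** 2      # |z|^2, no square root
        if inhibit > 0:
            inhibit -= 1                               # detection temporarily disabled
        elif previous_power >= threshold_sq and power < previous_power:
            peaks.append(index - 1)                    # the previous bin was the local peak
            inhibit = inhibit_bins
        previous_power = power
    return peaks


# Hypothetical magnitudes; a threshold of 16 corresponds to |z| >= 4.
example = [complex(v) for v in [0, 1, 5, 9, 4, 1, 0, 0, 8, 3, 0]]
found = detect_peaks(example, threshold_sq=16.0, inhibit_bins=2)
too_close = detect_peaks([complex(v) for v in [0, 9, 0, 9, 0]], 16.0, 2)
```

In the second call the two peaks are only two bins apart, so the second one falls inside the inhibit interval and is ignored, which is the minimum-distance limitation mentioned above.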
Figure 4.12 (blocks: Abssq, compare, inhibit, control)
Peak matcher arbiter
The peak matcher arbiter’s main role is to distribute the workload among the free peak matchers. The principal schematic is depicted in figure 4.13.
The peak matchers that follow the arbiter can only receive one spectrum excerpt at a time, which prevents peaks with overlapping spectra from being detected by the same peak matcher. The peak matcher arbiter serves the spectrum excerpt only to a peak matcher that is free to receive more input.
Each peak matcher comes with a multiply-and-accumulate pipeline, and increasing the number of peak matchers improves performance. This effectively decreases the latency of the algorithm when encountering multiple peaks.
Figure 4.13: Peak matcher arbiter
Peak matcher
Each peak matcher unit consists of a complex multiply-and-accumulate (mac) pipeline that computes the correlation between the samples and the Hamming window in the frequency domain at a certain offset. The peak matcher is depicted in figure 4.14.
At the end of the mac pipeline is a magnitude unit that calculates the square of the magnitude, $|z|^2 = \Re\{z\}^2 + \Im\{z\}^2$. In this implementation the square of the magnitude is used instead of just the magnitude since taking the square root is expensive in hardware, and not needed since the maximum will be found just as well using this metric.
The pipeline is controlled by a number of state machines implementing heuristics to maximize the magnitude squared $|z|^2$, i.e. finding the peak.
To maximize pipeline utilization the unit can process more than one peak simultaneously, time-sharing the mac unit between them.
Figure 4.14 (blocks: peak arbiter, correlator search, dispatcher, mac, sample memory, window table, record keeper, magnitude, sample shifter, frequency calculation)
4.3.4
Compensator
Following the peak matcher is the compensator unit that compensates for the effects of the first windowing and the autocorrelation.
It contains two look-up tables containing the normalized reciprocals of the two effects, and one complex multiplier taking its factor from one of the tables. Every peak passes through its pipeline twice to compensate for both effects.
Since the two tables are normalized, the compensation also scales the peak signature by a constant coefficient that must be corrected later. This scaling is assumed to be cheaper to perform in a later stage, where the magnitude has been obtained and a complex multiplication is no longer required.
4.3.5
Results
The vhdl model was simulated in ModelSim, sweeping frequencies over one frequency bin. The results are shown in figure 4.15, with the two plots depicting the detected amplitude and detected phase respectively.
Figure 4.15: Detected amplitude and phase (versus frequency in MHz)
The output shows that the amplitude is determined accurately, but more so at lower frequencies. It shows similar characteristics to the results of the Matlab model in figure 4.5.
The phase is detected accurately as well, similar to the phase detected with the Matlab model in figure 4.6.
The vhdl model achieves satisfactory precision in detecting the important characteristics of the interferer.
4.4
Conclusions
Using cross-correlation for the signal detection provides higher frequency, amplitude and phase precision. By using a pipelined architecture, multiple signals can be detected with low latency.
With the acquired results, the dsp fulfills the goals of this work block. Being able to reliably detect interferers, even those not a multiple of the sampling frequency, is a key feature of the Leon interferer cancellation loop.
This method in tandem with the cascaded fft architecture proves effective in detecting close-in interferers. Using a delay line, these interferers can theoretically be attenuated by up to 30 dB.
4.4.1
Future work
Since the twiddle factor look-up table and the window look-up table never operate simultaneously, they can be merged into one, saving space or complexity depending on the target hardware.
The radix-4 fft uses a simple finite state machine (fsm) for control, and delays are manually inserted between the fft stages to prevent data hazards. A thorough investigation of the possibilities of reordering the butterfly operations could minimize the required delays between the stages of the fft.
Peak matching is currently done assuming an odd number of samples N, and that the window function in the frequency domain is truncated outside of (N − 1)/2 frequency bins from the center. This makes sense for N = 5, where only the spectral contents of the main lobe are considered, but for N < 5 precision could be increased by not truncating the main lobe contents. This is particularly severe for N = 3, where one of the three samples is currently ignored, losing valuable information.
4.4.2
Multiple signal detection
Since the signal power outside the main lobe of the Hamming window is low, multiple signals can be distinguished if their main lobes do not overlap. In this case, simply discarding the spectral contents of the window outside the main lobe when correlating still yields good results, and signals can be distinguished if they are at least five frequency bins apart.
4.4.3
Solving ambiguities
Because of the leakage in the first dft, a blocker detected at a frequency offset in a specific bin can originate at any frequency $f_{\mathrm{blocker}} = f_{\mathrm{offset}} + n f_s$, $n \in \mathbb{Z}$, but then with altered amplitude and phase.
By introducing diversity, either by observing the frequency contents in a different bin or by using a different sampling frequency, a system of linear congruences appears. With enough observations with orthogonal parameters, all blockers can be distinguished.
Traditional algorithms for solving systems of linear congruences will not suffice since the detected values are not well-known integers. Instead, a system following the fuzzy math discipline is more likely to succeed, implementing an algorithm for solving a system of fuzzy linear congruences. This is beyond the scope of this document.
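To illustrate the idea only, the sketch below resolves the ambiguity for a single blocker by brute force: it enumerates the candidate frequencies of each observation and keeps those that agree within a tolerance. This is a deliberately naive stand-in, not the fuzzy congruence solver suggested above, and every number in the example (offsets, sampling frequencies, search range and tolerance, all in MHz) is hypothetical.

```python
def candidate_frequencies(offset, sampling_freq, f_max):
    """All f = offset + n * sampling_freq with n >= 0 and f <= f_max."""
    candidates = []
    f = offset
    while f <= f_max:
        candidates.append(f)
        f += sampling_freq
    return candidates


def resolve(observations, f_max, tolerance):
    """Frequencies consistent with every (offset, sampling_freq) observation."""
    first, *rest = observations
    matches = []
    for f in candidate_frequencies(first[0], first[1], f_max):
        # Closest candidate of each remaining observation.
        closest = [
            min(candidate_frequencies(o, fs, f_max), key=lambda c: abs(c - f))
            for o, fs in rest
        ]
        if all(abs(c - f) <= tolerance for c in closest):
            matches.append((f + sum(closest)) / (1 + len(closest)))
    return matches


# A blocker near 53 MHz seen with two different sampling frequencies,
# with slightly noisy detected offsets.
resolved = resolve([(3.02, 10.0), (3.97, 7.0)], f_max=100.0, tolerance=0.1)
```

With a single observation every multiple of the sampling frequency is a valid candidate; the second observation leaves only one frequency in the search range that both agree on. The approach does not scale to many simultaneous blockers, which is where a more principled method would be needed.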
5
Summary
The task of this thesis was divided into two parts. The first task was to investigate the signal processing aspects of the Leon loop, develop a Matlab model and finally write an rtl model in vhdl. The second task was to continue the development of the sasp and advance it towards its tape-out.
Development of the sasp yielded new control and data structures, increasing the reliability and speed of the circuit. The weighting unit, responsible for performing the multiplications in the fft algorithm, was improved and its linearity issues were alleviated. The gains were verified via simulations with extracted parasitics.
The dsp was modeled in Matlab and an algorithm for fine peak detection was developed. This method, using correlation, was implemented in vhdl along with a radix-4 fft implementation.
5.1
Conclusions
More detailed conclusions can be found in the respective sections of the two work items; section 3.5 for the sasp and section 4.4 for the dsp.
At the end of the thesis, the work on the sasp had come a long way. Improving the linearity, power consumption and speed has been a priority, as it is essential for the overall functionality of the feedback system proposed in the Leon project. The blocks that were worked on, including the address generator, rom structures and weighting unit, are shown to work well in simulations and reach their stipulated design goals.
The work on the dsp ended with a full vhdl model, including a full fft implementation constructed with synthesis on an fpga in mind. It uses cross-correlation to increase the precision of the signal detection, a method that works well in simulations.
The original premise of the Leon project, a cascaded fft configuration, together with the improved signal detection yields good results, and interferers can theoretically be attenuated by up to 30 dB.
5.2
Future work
More detailed discussions on future work can be found in the respective sections of the two work items; section 3.5.1 for the sasp and section 4.4.1 for the dsp.
There still remained work on the sasp at the end of this thesis. The work is to be resumed and finished and the chip will then finally be sent to fabrication.
As for the dsp, it needs to be implemented in a specific fpga architecture. Some modern logic synthesizers have the capability of inferring hardware blocks such as memories and multipliers, but some degree of architecture-specific optimization is inevitable. After porting, the true performance of the dsp will show.
The sasp and the dsp will need to be tested together to see the open-loop performance of the interferer detection algorithm with real signals.
Finally, the entire closed loop of project Leon needs to be simulated. The signal generator, the delay line and the signal combiner need to be present at this stage. When this is done the true capabilities of the project will show.
5.3
Final words
The work on the sasp and the dsp carried widely different requirements and approaches.
The sasp had a clear goal from the start and its iterative process had already begun when the work was resumed. The work done in this thesis on the sasp moved it towards its completion.
The work on the dsp was more open and encouraged innovation. This allowed time for reflection and research, and led to the idea of using cross-correlation. This method overcame the principal limitation of the dsp, namely the inability to accurately identify interferers at frequencies other than multiples of the sampling frequency.
The dsp finally had a viable rtl model for synthesis on an fpga. Simulations show promising results.
This thesis has enabled me to explore two widely different domains of engineering, and I have learned a great deal about how to solve engineering problems in the two.
I am confident that project Leon will play a significant role in the future of software radio as it elegantly solves one of its fundamental problems with interferers.
A
Signal detection with cross-correlation
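To properly identify the frequency, magnitude and phase of sinusoidal blockers, the distortion introduced by windowing needs to be countered.
A real sinusoidal input signal can be rewritten as two complex ones.
\[
\begin{aligned}
S_{\mathrm{in}}(t) &= A_{\mathrm{in}} \cos(\varphi_{\mathrm{in}} + 2\pi f_{\mathrm{in}} t) \\
&= A_{\mathrm{in}} \frac{\exp(i(\varphi_{\mathrm{in}} + 2\pi f_{\mathrm{in}} t)) + \exp(-i(\varphi_{\mathrm{in}} + 2\pi f_{\mathrm{in}} t))}{2}
\end{aligned}
\tag{A.1}
\]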
Observing only the positive frequency of the real signal, it can be rewritten as
\[
S_{\mathrm{in,pos}}(t) = \frac{A_{\mathrm{in}}}{2} \exp(i(\varphi_{\mathrm{in}} + 2\pi f_{\mathrm{in}} t)) \tag{A.2}
\]
The windowing of the input signal yields a convolution with said window in the frequency domain. Since the main lobe of the Hamming window extends over several frequency bins, leakage into adjacent frequency bins is significant. For this reason, when observing spectral components in a bin, the actual frequency of the originating signal is ambiguous.
Furthermore, the Hamming window contains frequency components extending towards infinity, and this in combination with sampling leads to aliasing. This is usually not a problem, unless the signal frequency is in the first or last frequency bins, where leakage of the main lobe of the signal's negative frequency counterpart creates an alias of equal magnitude in the very same bin.
Treating the effects of aliasing separately, the contribution from $S_{\mathrm{in,pos}}$ to a frequency bin $n$ in a dft started at a time $kT_{\mathrm{DFT}}$, $k \in \mathbb{Z}$, can be regarded as
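\[
B_{\mathrm{in,pos}}[n] = A_{\mathrm{Hamming}}(f_{\mathrm{in}} - n f_s) \frac{A_{\mathrm{in}}}{2} \exp(i(\varphi_{\mathrm{in}} + 2\pi f_{\mathrm{in}} k T_{\mathrm{DFT}})) \tag{A.3}
\]
Introducing the variable $f_{\mathrm{offset}} = f_{\mathrm{in}} - n f_s$, so that $f_{\mathrm{in}} = f_{\mathrm{offset}} + n f_s$, $B_{\mathrm{in,pos}}[n]$ can be rewritten as
\[
\begin{aligned}
B_{\mathrm{in,pos}}[n] &= A_{\mathrm{Hamming}}(f_{\mathrm{offset}}) \frac{A_{\mathrm{in}}}{2} \exp(i(\varphi_{\mathrm{in}} + 2\pi (f_{\mathrm{offset}} + n f_s) k T_{\mathrm{DFT}})) \\
&= A_{\mathrm{Hamming}}(f_{\mathrm{offset}}) \frac{A_{\mathrm{in}}}{2} \exp(i(\varphi_{\mathrm{in}} + 2\pi f_{\mathrm{offset}} k T_{\mathrm{DFT}})) \exp(i(2\pi n f_s k T_{\mathrm{DFT}}))
\end{aligned}
\tag{A.4}
\]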
Observing that the time between successive runs of the first dft of $N$ samples is $T_{\mathrm{DFT}} = N/f_s$, the last factor resolves to unity;
\[
\exp(i(2\pi n f_s k T_{\mathrm{DFT}})) = \exp\left(i\left(2\pi n f_s k \frac{N}{f_s}\right)\right) = \exp(i(2\pi n k N)) = 1 \tag{A.5}
\]
Sampling the output of bin $n$ over successive runs of the first dft, with $k = 0, 1, 2, 3, \ldots, K - 1$, the signal will appear as a complex sinusoid with frequency $f_{\mathrm{offset}}$. Performing a second dft on this sequence of samples will yield the contribution to frequency bin $m$;
\[
B_{\mathrm{in,pos}}[n, m] = A_{\mathrm{Hamming}}(m f_{\mathrm{DFT}} - f_{\mathrm{offset}}) \, A_{\mathrm{Hamming}}(f_{\mathrm{offset}}) \frac{A_{\mathrm{in}}}{2} \exp(i\varphi_{\mathrm{in}}) \tag{A.6}
\]
With sufficiently large $K$, the frequency offset within the bin of the first dft, $f_{\mathrm{offset}}$, can be acquired with some precision, and the effects of the first Hamming window can be countered. However, the effects of the second Hamming window still cause problems when determining the amplitude and phase of the blocker, since even a small offset in frequency will cause distortion as per figures 4.3 and 4.4.
This problem can be alleviated by cross-correlating the output of the second dft with a fine-grained Hamming window in the frequency domain. The peak of the complex cross-correlation will yield the most likely frequency of the blocker.
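\[
(A_{\mathrm{Hamming}} \star B_{\mathrm{in,pos}})(f) = \sum_m A^{*}_{\mathrm{Hamming}}(m f_{\mathrm{DFT}} + f) \, A_{\mathrm{Hamming}}(m f_{\mathrm{DFT}} - f_{\mathrm{offset}}) \, A_{\mathrm{Hamming}}(f_{\mathrm{offset}}) \frac{A_{\mathrm{in}}}{2} \exp(i\varphi_{\mathrm{in}}) \tag{A.7}
\]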
Figure A.1: Precise signal frequency implied by correlating with the window (magnitude versus frequency bin)
Relating to previous discussions, a sinusoidal signal at the input of the second dft will appear in the frequency spectrum as a Hamming window.
The peak of the cross-correlation between the dft output and the Hamming window appears where the input likely originated.
Since the Hamming window is heavily attenuated outside of the main lobe, peripheral frequencies can be ignored, trading the accuracy loss for computational efficiency. When limiting the frequency band, and also assuming that all spectral components are spaced by at least the same frequency offset, the cross-correlation can be regarded as a shifted version of the Hamming window's autocorrelation function.
This operation becomes an instance of autocorrelation. A property of autocorrelation is that its maximum value is found at a lag of zero; in this case at $f = -f_{\mathrm{offset}}$. Detecting this peak yields $f_{\mathrm{offset}}$. The value of the autocorrelation is then;
\[
\mathrm{ACorr}(f) = \sum_m A^{*}_{\mathrm{Hamming}}(m f_{\mathrm{DFT}} - f) \, A_{\mathrm{Hamming}}(m f_{\mathrm{DFT}} - f) = \sum_m |A_{\mathrm{Hamming}}(m f_{\mathrm{DFT}} - f)|^2 \tag{A.8}
\]
This function is periodic with $f_{\mathrm{DFT}}$, and its effects can be countered when $f_{\mathrm{offset}}$ is known, as is the case after the peak detection.
Having detected a peak at $f$ with $(A_{\mathrm{Hamming}} \star B_{\mathrm{in,pos}})(f)$, and removed windowing artifacts by dividing by $\mathrm{ACorr}(f)$ and $A_{\mathrm{Hamming}}(f)$, and then multiplying by two, the original signal amplitude and phase are obtained.
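The correlation step can be tried out with a small Python sketch. It covers only the second windowing: a complex tone is windowed and transformed once, the spectrum around the coarse peak is correlated with the window response at a grid of fractional bin positions, and the correlation is divided by the window energy (the role played by ACorr above). Several choices are assumptions made for illustration: the textbook Hamming coefficients, the five-bin truncation, the grid resolution, the normalized search metric and all function names. The compensation for the first window and the factor of two are left out.

```python
import cmath
import math


def hamming(n):
    """Textbook Hamming window (coefficients are an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def window_response(window, bin_offset):
    """Window spectrum at a fractional offset, in units of dft bins."""
    n = len(window)
    return sum(w * cmath.exp(-2j * math.pi * bin_offset * t / n)
               for t, w in enumerate(window))


def dft(samples):
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
            for k in range(n)]


def estimate_tone(spectrum, window, half_width=2, steps=200):
    """Return (bin position, complex amplitude) of the strongest tone."""
    n = len(spectrum)
    coarse = max(range(n), key=lambda m: abs(spectrum[m]))
    bins = [coarse + d for d in range(-half_width, half_width + 1)]   # main lobe only
    best = None
    for s in range(-steps, steps + 1):
        position = coarse + 0.5 * s / steps        # candidates within half a bin
        template = [window_response(window, m - position) for m in bins]
        corr = sum(t.conjugate() * spectrum[m % n] for t, m in zip(template, bins))
        energy = sum(abs(t) ** 2 for t in template)
        score = abs(corr) ** 2 / energy
        if best is None or score > best[0]:
            best = (score, position, corr / energy)
    return best[1], best[2]


n = 64
window = hamming(n)
true_position, true_amplitude = 10.3, cmath.exp(0.3j)      # hypothetical test tone
tone = [true_amplitude * cmath.exp(2j * math.pi * true_position * t / n) for t in range(n)]
spectrum = dft([w * x for w, x in zip(window, tone)])
position, amplitude = estimate_tone(spectrum, window)
```

Reading the amplitude and phase directly from the strongest bin would be distorted by the window response at an offset of 0.3 bins; dividing the correlation by the window energy at the detected position removes that distortion in this idealized, noise-free case.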
B
Fast Fourier transform
B.1
Decimation-in-time radix-4 FFT
The sasp implements a decimation-in-time radix-4 FFT. The dft is defined as
\[
X(k) = \sum_{n=0}^{N-1} x(n) W_N^{kn} \tag{B.1}
\]
The first step is dividing the summation into four interleaved sub-summations.
\[
\begin{aligned}
X(k) &= \sum_{n=0}^{N/4-1} x(4n) W_N^{4kn} + \sum_{n=0}^{N/4-1} x(4n+1) W_N^{k(4n+1)} + \sum_{n=0}^{N/4-1} x(4n+2) W_N^{k(4n+2)} + \sum_{n=0}^{N/4-1} x(4n+3) W_N^{k(4n+3)} \\
&= \sum_{n=0}^{N/4-1} x(4n) W_N^{4kn} + W_N^{k} \sum_{n=0}^{N/4-1} x(4n+1) W_N^{4kn} + W_N^{2k} \sum_{n=0}^{N/4-1} x(4n+2) W_N^{4kn} + W_N^{3k} \sum_{n=0}^{N/4-1} x(4n+3) W_N^{4kn}
\end{aligned}
\tag{B.2}
\]
Now the recursive nature of the decomposition starts to show, as the four summations are in themselves the very definitions of smaller dft instances. The four summations are defined as follows, for $i = 0, 1, 2, 3$.
\[
F_i(k) = \sum_{n=0}^{N/4-1} x(4n+i) W_N^{4kn} = \sum_{n=0}^{N/4-1} x(4n+i) W_{N/4}^{kn} \tag{B.3}
\]
An interesting property of this definition is that $W_N^{4kn}$ is cyclic in $k$ with a period of $N/4$. This means that $F_i(k) = F_i(k + N/4) = F_i(k + N/2) = F_i(k + 3N/4)$, i.e. four output samples share the same recursive dft. Aligning these four output samples illustrates the benefit of this.
\[
\begin{aligned}
X(k) &= F_0(k) + W_N^{k} F_1(k) + W_N^{2k} F_2(k) + W_N^{3k} F_3(k) \\
X\left(k + \tfrac{N}{4}\right) &= F_0(k + N/4) + W_N^{k+N/4} F_1(k + N/4) + W_N^{2(k+N/4)} F_2(k + N/4) + W_N^{3(k+N/4)} F_3(k + N/4) \\
X\left(k + \tfrac{N}{2}\right) &= F_0(k + N/2) + W_N^{k+N/2} F_1(k + N/2) + W_N^{2(k+N/2)} F_2(k + N/2) + W_N^{3(k+N/2)} F_3(k + N/2) \\
X\left(k + \tfrac{3N}{4}\right) &= F_0(k + 3N/4) + W_N^{k+3N/4} F_1(k + 3N/4) + W_N^{2(k+3N/4)} F_2(k + 3N/4) + W_N^{3(k+3N/4)} F_3(k + 3N/4)
\end{aligned}
\tag{B.4}
\]
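Equations B.3 and B.4 translate almost directly into a recursive routine, sketched below. The sketch assumes $W_N = \exp(-2\pi j/N)$ and an input length that is a power of four; it is a software illustration of the decomposition, not a model of the analog sasp circuit, and the function names are hypothetical.

```python
import cmath
import math


def fft_dit_radix4(x):
    """Recursive radix-4 decimation-in-time transform following B.3 and B.4."""
    size = len(x)
    if size == 1:
        return list(x)
    quarter = size // 4
    # F_i(k): transforms of the four interleaved sub-sequences x(4n + i).
    sub = [fft_dit_radix4(x[i::4]) for i in range(4)]
    result = [0j] * size
    for k in range(quarter):
        for q in range(4):                 # outputs k, k + N/4, k + N/2, k + 3N/4
            index = k + q * quarter
            # F_i is periodic with N/4, so F_i(index) equals sub[i][k].
            result[index] = sum(
                cmath.exp(-2j * math.pi * i * index / size) * sub[i][k]
                for i in range(4)
            )
    return result


def dft_direct(x):
    """Reference transform straight from definition B.1."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]


samples = [complex(math.cos(0.9 * t), math.sin(0.4 * t)) for t in range(16)]
max_error = max(abs(a - b) for a, b in zip(fft_dit_radix4(samples), dft_direct(samples)))
```

Each group of four outputs reuses the same four sub-transform values `sub[i][k]`, which is the saving that the periodicity of $F_i(k)$ provides.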