
Second cycle, 30 credits. Stockholm, Sweden, 2016.

An efficient Hardware implementation of the Peak Cancellation Crest Factor Reduction Algorithm

MATTEO BERNINI


Master’s Thesis at KTH Information and Communication Technology
Supervisor: Shafqat Ullah

Examiner: Johnny Öberg

TRITA-ICT-EX-2016:187


Abstract

An important component of the cost of a radio base station stems from the Power Amplifier driving the array of antennas. This cost can be split into Capital and Operational Expenditure, due to the high design and realization costs and the low energy efficiency of the Power Amplifier, respectively. Both cost components are related to the Crest Factor of the input signal. In order to reduce both costs it would be possible to lower the average power level of the transmitted signal, whereas in order to obtain a more efficient transmission, a more energized signal allows the receiver to better distinguish the message from noise and interference. These opposing needs motivate the research and development of solutions aiming at reducing the excursion of the signal without sacrificing its average power level. One of the algorithms addressing this problem is Peak Cancellation Crest Factor Reduction. This work documents the design of a hardware implementation of such a method, targeting a possible future ASIC for Ericsson AB. SystemVerilog is the Hardware Description Language used for both the design and the verification of the project, together with a MATLAB model used both for exploring some design choices and for validating the design against the output of the simulation. The two main goals of the design have been efficient hardware exploitation, aiming at a smaller area footprint on the integrated circuit, and the adoption of some innovative design solutions in the controlling part of the design, for example the management of the cancelling pulse coefficients and the use of a time-division multiplexing strategy to further save area on the chip. For the contexts where both solutions can compete, the proposed one shows better results in terms of area and delay compared to the current methods in use at Ericsson, and it also provides suggestions and ideas for further improvements.

Keywords: CFR, PC-CFR, PAPR Reduction, OFDM


An efficient hardware implementation of the Peak Cancellation algorithm for crest factor reduction

An important component of the cost of a radio base station is the power amplifier used to drive the antennas. The cost of the amplifier can be split into an initial cost related to the development and manufacturing of the circuit, and a running cost related to the energy efficiency of the circuit. Both costs are tied to a property of the amplifier's input signal: the ratio between the signal's maximum power and its average power, the so-called crest factor. To reduce these costs it would be possible to lower the average power of the signal, but a high average power improves the radio transmission because it is easier for the receiver to distinguish a high-energy signal from noise and interference. These two opposing requirements motivate research and development of solutions that reduce the maximum value of the signal without reducing its average power. One algorithm that can be used to reduce the crest factor of the signal is Peak Cancellation. This report presents the design and hardware implementation of Peak Cancellation, intended for possible use by Ericsson AB in future integrated circuits. The hardware description language SystemVerilog was used for both design and verification in the project. MATLAB was used to explore design alternatives and to model the algorithm and compare its output with the hardware implementation in simulations. The two main goals of the design were to use the hardware efficiently in order to reach as small a circuit area as possible, and to use a number of innovative solutions in the controlling part of the design. Examples of such design solutions are the way the coefficients of the pulses used to reduce peaks in the signal are managed, and the use of time-division multiplexing to further reduce the circuit area. In usage scenarios where both solutions can compete, the proposed solution shows better results in terms of circuit area and latency than the current solutions in use at Ericsson. Suggestions for further future improvements of the implementation are also given.

Keywords: CFR, PC-CFR, PAPR Reduction, OFDM

Abbreviations

AM Amplitude Modulation

ASIC Application Specific Integrated Circuit

ASM Algorithmic State Machine

BPSK Binary Phase Shift Keying

CAF Clipping and Filtering Technique

CapEx Capital Expenditure

CCDF Complementary Cumulative Distribution Function

CF Crest Factor

CORDIC Coordinate Rotation Digital Computer

CS Clip Stage

EVM Error Vector Magnitude

FDM Frequency Division Multiplexing

FIR Finite Impulse Response

FM Frequency Modulation

FPGA Field Programmable Gate Array

GSM Global System for Mobile communication

(H)PA (High) Power Amplifier

(I)DCT (Inverse) Discrete Cosine Transform

IFFT Inverse Fast Fourier Transform

I/Q In-phase / Quadrature signal

LTE Long Term Evolution

MSR Multi Standard Radio

NS Noise Shaping

OFDM Orthogonal Frequency Division Multiplexing

OOB Out Of Band


PA(P)R Peak to Average (Power) Ratio

PC, PC-CFR Peak Cancellation Crest Factor Reduction

PCU Peak Cancelling Unit

PDF Probability Density Function

PF Peak Filtering

PM Phase Modulation

PM Peak Manager

PTS Partial Transmit Sequence

PW Peak Windowing

QPSK Quadrature Phase-Shift Keying

RMS Root Mean Square

RTL Register Transfer Level

SLM SeLective Mapping

SV SystemVerilog

TC Turbo Clipping

TDM Time Division Multiplexing

TI Tone Injection

TR Tone Reservation

WCDMA Wideband Code Division Multiple Access


1 Introduction
1.1 Background and statement of the problem
1.2 Purpose of the design project

2 Background and related work
2.1 Background
2.1.1 Orthogonal Frequency Division Multiplexing (OFDM)
2.1.2 Definitions: CF, PAPR, EVM and ACLR
2.1.3 Overview of the main CFR methods
2.2 Related Work

3 The proposed implementation of the PC-CFR
3.1 General description of the PC-CFR algorithm
3.2 Structural description of the proposed implementation
3.2.1 The Clip Stage
3.2.2 The Peak Manager

4 Future work and suggested improvements
4.1 Programmable or dynamic CS–PCU mapping
4.2 Bypassable PC-CFR module
4.3 Clip Stages with different delay memories and cancelling pulses length
4.4 Truncation of cancelling pulses
4.5 Variable length Peak Search Window
4.6 Priority-based acceptance of peaks
4.7 Generation of multiple cancelling pulses from the same time slot

5 Results and conclusions
5.1 Comparative synthesis results
5.2 Some input and model configuration exploration
5.2.1 Observations

Bibliography

Appendices


1 Introduction

If the cost of a typical transmitting radio base station is analyzed, we discover that the Capital Expenditure (CapEx)¹ and the Operating Expenditure (OpEx)² relative to the radio cards alone cover roughly 50% of the total cost [1]. The radio cards house the Power Amplifier (PA), whose low efficiency is the main culprit for the OpEx part of the overall costs. In fact, only a small share of the power consumed by the radio cards becomes transmitted power. Similar considerations hold for the consumer electronics market: every mobile device relying on wireless communication suffers from the sub-optimal efficiency of the PA, with a substantial negative effect on battery lifetime. In many low-cost applications, this issue alone might prevent the whole system from being considered viable or even possible to design. The efficiency of the PA is a function of the characteristics of the input signal, in particular of its Peak to Average Power Ratio (PAPR, or PAR) or Crest Factor (CF), which are the ratio between the powers or the magnitudes associated with the largest and the average values of the signal, respectively.

In Figure 1.1, we can see a small segment of data in a typical scenario. The maximum values, that is the peaks (a more accurate definition of peaks will be given in Section 3.1; for now an intuitive understanding is sufficient), are responsible for the high PAPR of a given signal. It is not surprising that the industry is striving to reduce this phenomenon, and thus the costs and inefficiencies, by investigating several alternatives. The two most relevant ways to deal with the problem are: 1) introducing some changes in the signal to be transmitted (without, of course, compromising its informative content) in order to prevent the occurrence of high peaks, at the cost of an increased complexity of the transmitter and/or sacrificing some data rate for the transmission of side information needed on the receiver side for the reconstruction of the information; or 2) digitally processing the signal as it is (either in the time or frequency domain) in order to limit the occurrence and magnitude of the unavoidable peaks, at the cost of some introduced distortion.


¹ Resources invested by a company to buy or upgrade fixed, physical, non-consumable assets.

² Day-to-day costs of operation.


Figure 1.1: A segment of a typical signal amplitude showing high variability and, as a consequence, a high ratio between the maximum and average values.

This thesis work focuses on the design, modeling and verification of an algorithm belonging to the digital processing category, namely the Peak Cancellation Crest Factor Reduction (PC-CFR), targeted at an Application Specific Integrated Circuit (ASIC). The thesis project was performed at Ericsson AB in Kista, Stockholm.

1.1 Background and statement of the problem

Widely used multi-carrier signals such as Orthogonal Frequency Division Multiplexing (OFDM) show a higher PAPR than single-carrier systems. Also, several radio access technologies such as Long Term Evolution (LTE), Wideband Code Division Multiple Access (WCDMA), etc. are used in Multi Standard Radio (MSR) transmitters situated in base stations. These signals do not exhibit a constant envelope, but show instead a fluctuating envelope with a high CF (see Figure 1.2, [2]). The main reason is that the sum of multiple sub-carriers creates a compound signal whose real and imaginary parts approach a Gaussian Probability Density Function (PDF), due to the Central Limit Theorem, whereas the amplitude approaches a Rayleigh PDF. The Global System for Mobile communication (GSM), on the other hand, uses a constant-envelope Gaussian modulation.

The static input-output characteristics of a PA show a linear region bounded by a non-linear part (see Figure 1.3). The part of the PA input signal that falls outside the linear region entails significant Out Of Band (OOB) emissions, caused by the inter-modulation products in the adjacent channels.


Figure 1.2: Comparative view of PAPR for different transmission protocols (source: [2]).

Figure 1.3: Power Amplifier characteristics before PAR reduction (source: [3]).

Therefore the linear part of the PA's characteristics needs to be wide enough to contain the dynamic range of the input signal that has to be amplified and fed to the antenna(s). In order for the PAs to accommodate signals with such a high voltage swing, either they have to be dimensioned for the maximum peak value (thus increasing the CapEx), or they have to operate with more back-off³ from the most convenient operating point, which translates to a less efficient use of energy (thus increasing the OpEx).


Figure 1.4: Power Amplifier characteristics after PAR reduction. Note the increased average output voltage (thus power) available thanks to the reduction of the PAR (source: [3]).

In other words, PAs with larger linear ranges are more expensive and make worse use of electric power than those with a smaller linear input range.

What is desirable, instead, is to deal with signals with limited PAPR (or CF), because then it is possible to increase their average power level without the risk of falling into the saturation region of the PA. The increased transmitting power guarantees a higher strength of the signal with respect to the unavoidable noise and thus an overall more efficient transmission of information. In Figure 1.4, the input-output characteristics of a PA after a 6 dB reduction of PAPR are shown. Notice that it is now possible to accommodate the operating point of the signal at a higher power level thanks to the reduction of the PAPR.

1.2 Purpose of the design project

The purpose of the project described in this report is the design, verification and performance evaluation of an innovative implementation of the Peak Cancellation (PC) algorithm, which may be implemented in one of Ericsson's ASICs in the future. The design is as generic and configurable as possible, in order for the user to be able to compare different parameter options against existing solutions already implemented at Ericsson. The programmability of the PC-CFR module is another desirable characteristic of the project because, as the input signal properties change, some actions might be taken accordingly, for example a change of the length of the search window (related to the granularity of the peak detection).

³ The back-off is the deliberate reduction of the average input power to the PA.



One of the most attractive aspects of the PC algorithm, as opposed to other solutions, is the low complexity in terms of hardware, which translates to a smaller area occupancy on the ASIC and to a lower power consumption of the module. The drawback of the PC is that each peak must be treated separately by dedicating hardware resources to it for the entire duration of the corresponding cancelling pulse. When the detected peaks in the input signal exhibit a density such that the available hardware resources are insufficient in number to cancel them all, some of them pass untouched and eventually reach the PA.

The PC-CFR architecture proposed in this thesis report is new and possibly innovative in some aspects, compared to the documented existing implementations [1][3][4][5]. The aspect of the design that required most of the effort was the optimization of the hardware resources and, at the same time, the minimization of the probability of a peak leak. In order to fulfill these requirements, most of the hardware resources are not used exclusively but are shared more efficiently in a Time Division Multiplexing (TDM) configuration, thanks to the availability of a second, faster clock and several design expedients.

The Register Transfer Level (RTL) design and the testbench are written in the SystemVerilog (SV) language and simulated and synthesized via the software tools made available by Ericsson. A MATLAB golden model has been written to match both the expected behaviour of the PC-CFR algorithm and, as accurately as possible, all the data processing taking place in the target hardware implementation. This model was used to compare its output against the RTL version when driven with the same input data: the target RTL implementation is considered compliant with the model when the two outputs match sample by sample.


2 Background and related work

2.1 Background

2.1.1 Orthogonal Frequency Division Multiplexing (OFDM)

Communication systems use a physical channel to provide a reliable means to transfer information by the use of a technique called modulation: by superimposing some coded version of the information onto one or more of the characteristics of a properly chosen sinusoidal signal, called the carrier, it is possible to overcome the physical limits of the communication channel in terms of available bandwidth and maximum power. Depending on whether the modulated carrier characteristic is frequency, phase or amplitude (or a combination of them), we have several types of modulation (such as Amplitude Modulation (AM), Phase Modulation (PM), Frequency Modulation (FM), Binary Phase Shift Keying (BPSK), Quadrature Phase-Shift Keying (QPSK), etc.), each with different advantages and drawbacks. If more than one line of communication needs to be established over the same physical channel, then some means to share it must be employed, such as multiplexing (we might think of these independent paths of communication as logical channels, as well as pairs of users).

In Time Division Multiplexing (TDM) each user occupies the entire bandwidth of the channel for a given time frame in a round-robin fashion, with some silence time between two successive frames, whereas in Frequency Division Multiplexing (FDM) the whole channel bandwidth is divided into segments separated by guard intervals, and each user has at its disposal a specific bandwidth arranged around a carrier for the entire duration of the communication. The relation among the carriers can be arbitrary, the only constraint being that the frequency bands of the channels do not overlap. In OFDM there is a specific relationship among the carrier frequencies, i.e. they are all multiples of a single frequency. This simple expedient allows the relaxation of the non-overlapping requirement for the various bands, actually compacting them together in order to make better use of the channel resource. The fact that all carriers are multiples of a common frequency entails their orthogonality¹, which makes the recovery of the transmitted information on the receiver side much easier and, above all, possible even if the signals overlap in frequency.


Figure 2.1: Block diagram of the generation of an OFDM signal. The sinusoidal carriers are orthogonal (source: [6]).

OFDM (which can be considered a special case of FDM) is a so-called multi-carrier modulation technique because it makes use of several carriers at the same time, each capable of conveying information modulated according to different mappings (BPSK, QPSK, etc.).

The communication quality through channels affected by frequency-selective fading² benefits from OFDM, in the sense that the fading can be more easily compensated for at the receiver side: with OFDM, instead of compensating for the fading of the channel as a continuous function of frequency over a large range (a more involved operation), the receiver can divide the frequency range into small segments, each corresponding to a sub-carrier, and approximate the fading as a constant within each segment. The advantage is that constant fading can be counteracted more easily by using error correction and other techniques. A block-level diagram is shown in Figure 2.1 (see also [6]).

¹ Two signals are said to be orthogonal if their scalar product is zero.

² Frequency-selective fading is a radio propagation anomaly due to the partial cancellation of a signal by itself, which occurs when the signal arrives from at least two different directions and one or more of these paths is lengthening or shortening.


2.1.2 Definitions: CF, PAPR, EVM and ACLR

As already stated, the problem with non-constant envelope signals is the presence of too large a variability in the amplitude, which is harmful for the design and power efficiency of the PA. This phenomenon is closely related to the presence of groups of samples whose magnitude exceeds a certain desired value, called the threshold.

Some of the techniques proposed to mitigate this behavior are briefly listed in the following, but first more quantitative definitions of Crest Factor and Peak to Average (Power) Ratio are presented. We define the Crest Factor as the ratio between the maximum magnitude and the RMS value of a signal, observed in a certain temporal window:

\[ \mathrm{CF} = \frac{\|s(n)\|_{\max}}{s_{\mathrm{rms}}} \]

We also define the more commonly used Peak to Average (Power) Ratio, again for a given interval of time or a certain number of samples, for discrete-time contexts:

\[ \mathrm{PAPR} = \frac{\|s(n)\|^2_{\max}}{s^2_{\mathrm{rms}}}, \qquad \mathrm{PAPR}_{\mathrm{dB}} = 10\log_{10}\frac{\|s(n)\|^2_{\max}}{s^2_{\mathrm{rms}}} \]

Note that $\mathrm{PAPR} = \mathrm{CF}^2$. The desired effect of the various CF reduction techniques is to reduce the PAR of the signal without introducing too much distortion.
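As a small worked example (a sketch in the style of the MATLAB golden model mentioned later, not code taken from it), the CF and PAPR of a vector s of complex baseband samples can be computed directly from the definitions above:

% Crest Factor and PAPR of a complex baseband signal (s holds the I/Q samples).
s_mag   = abs(s);                          % instantaneous magnitude ||s(n)||
s_rms   = sqrt(mean(s_mag.^2));            % RMS value over the observation window
CF      = max(s_mag) / s_rms;              % Crest Factor
PAPR    = max(s_mag.^2) / mean(s_mag.^2);  % Peak to Average Power Ratio (= CF^2)
PAPR_dB = 10*log10(PAPR);                  % PAPR expressed in dB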

Some of the techniques will not introduce any distortion at all, at the price of a greater complexity and/or a reduction of data rate, whereas others will inject some unavoidable distortion both in-band (the bandwidth occupied by the signal being transmitted) and out of band (in the adjacent bands). Both of these side effects are of course undesirable, and in order to quantify them two parameters exist:

Error Vector Magnitude (EVM), and Adjacent Channel Leakage Ratio (ACLR).

EVM is a measurement that quantifies the global displacements of the received (output) signal compared to the expected ideal one, due to any disturbances (such as noise) and, as in our case, to the CFR intervention too. We define it as (see Figure 2.2):

\[ \mathrm{EVM}_{\mathrm{dB}} = 10\log_{10}\frac{P_{\mathrm{error}}}{P_{\mathrm{ref}}}, \qquad \mathrm{EVM}(\%) = \sqrt{\frac{P_{\mathrm{error}}}{P_{\mathrm{ref}}}} \cdot 100 \]

where $P_{\mathrm{error}}$ is the sum of all the error vector powers and $P_{\mathrm{ref}}$ is the sum of all the reference (expected) signal powers. The error vector is the vector in the I/Q plane that connects the received symbol with the ideal, expected position in the plane (the position corresponding to the exact transmitted symbol). For each received symbol, the corresponding power is computed and averaged, then divided by a properly chosen value representative of the modulation scheme. The result is a cumulative measure of how close the whole transmitter-receiver chain is to the ideal from the accuracy point of view. In an ideal transmission system, each received waveform would fall exactly on one of the possible points in the plane corresponding to the coding of the sent symbol.


Figure 2.2: I/Q plane with representations of the reference and the measured (or received, in a communication channel) vectors. The powers of the error and the reference vectors are used to compute the EVM (source: [7]).

Figure 2.3: The components at the base of the definition of the ACLR (source: [8]).

The scattering of the received waveforms around the constellation of the expected symbols is more pronounced the less ideal the communication system is. In the present case, the in-band distortion introduced by the CFR algorithm has a direct effect on the EVM which, as a consequence, is considered a measurement of the performance of the method.
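For illustration only, the EVM of a block of received symbols rx against the ideal reference symbols ref (hypothetical variable names, both complex vectors of the same length) follows the definition directly:

% EVM from received symbols rx and ideal reference symbols ref.
err     = rx - ref;                        % error vectors in the I/Q plane
P_error = sum(abs(err).^2);                % sum of the error vector powers
P_ref   = sum(abs(ref).^2);                % sum of the reference symbol powers
EVM_dB  = 10*log10(P_error / P_ref);       % EVM in dB
EVM_pct = 100*sqrt(P_error / P_ref);       % EVM in percent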

Adjacent Channel Leakage Ratio (ACLR) is the measurement concerning the out of band distortion. It is defined as the ratio of the power leaked into the adjacent channel to the power in the main carrier channel (see also Figure 2.3 and [8]):

\[ \mathrm{ACLR} = \frac{P_{\mathrm{adjacent\;channel}}}{P_{\mathrm{main\;channel}}} \]


The most important reason for keeping the ACLR at a low level is that otherwise unexpected and unwanted power will leak outside the frequency band of interest. If the adjacent frequency intervals are used as the main channels of other communication systems, it means that we are injecting interference into them. The second reason for keeping the ACLR as low as possible is simply that a high ACLR means that some energy (supposed to be in the main channel) is wasted over adjacent channels, thereby reducing the efficiency of transmission.
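A rough FFT-based estimate of the ACLR can be obtained by integrating the periodogram of the transmitted signal over the main and the adjacent channel; the sketch below assumes a sample rate fs, a channel bandwidth bw and an adjacent-channel offset spacing, none of which are values taken from this thesis:

% Rough ACLR estimate: x sampled at fs (Hz), channel bandwidth bw (Hz),
% adjacent channel centred spacing (Hz) above the main channel (main channel at 0 Hz).
x = x(:);
N = numel(x);
w = 0.5 - 0.5*cos(2*pi*(0:N-1).'/(N-1));    % Hann window written out (no toolbox needed)
P = abs(fftshift(fft(x .* w))).^2;          % periodogram; absolute scaling cancels in the ratio
f = ((0:N-1).' - floor(N/2)) * (fs/N);      % frequency axis in Hz
P_main  = sum(P(abs(f)           <= bw/2)); % power inside the main channel
P_adj   = sum(P(abs(f - spacing) <= bw/2)); % power leaked into the adjacent channel
ACLR_dB = 10*log10(P_adj / P_main);         % ACLR in dB (negative for a clean signal)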

2.1.3 Overview of the main CFR methods

Several techniques have been proposed to mitigate the PAPR problem of OFDM signals. These techniques can be roughly and partially categorized into: coding techniques, probabilistic (scrambling) techniques, adaptive pre-distortion techniques and clipping techniques. This last category will be explored further, given its importance to this thesis work.

Coding technique

The coding technique pursues PAPR reduction via an appropriate choice of the modulation codes to be transmitted on each sub-carrier. This method causes no distortion, either in-band or OOB, but it suffers from non-optimal bandwidth usage because a smaller number of data words is mapped to a greater number of code words. The complexity of the algorithm is also non-negligible because both the computational effort needed to choose the most appropriate symbol to send and the area required to store the look-up tables grow rapidly with the number of sub-carriers, up to the point of becoming computationally intractable for common useful signals.

Probabilistic (scrambling) technique

This technique entails the scrambling (meaning, in this context, the act of manipulating a signal with a well-known sequence to alter its properties, but in such a way as not to introduce distortion) of the OFDM input signal with several versions of scrambling sequences, one block of samples at a time, and successively choosing among the resulting sequences the one exhibiting the lowest PAPR. This approach cannot guarantee a desired PAPR level (it will provide the minimum among the sequences, though), yields a reduction in bandwidth utilization because of the additional information to be sent to the receiver, and its complexity rapidly increases with the number of sub-carriers. This family of solutions includes the SLM (SeLective Mapping), PTS (Partial Transmit Sequence), TI (Tone Injection) and TR (Tone Reservation) algorithms.

As an example we might very briefly consider SeLective Mapping (see Figure 2.4). This technique requires the OFDM signal to be independently multiplied by $U$ phase sequences $P_v^u = e^{j\phi_v^u}$, $u = 1, 2, \ldots, U$.


Figure 2.4: Block diagram of the selective mapping technique for PAPR reduction (source: [9]).

The $U$ resulting sequences are passed through $U$ IFFT (Inverse Fast Fourier Transform) blocks and the output sequences $x^u$ are compared in order to determine the one yielding the lowest PAPR. The side information about the selected sequence needs to be sent over the channel for the receiver to be able to reconstruct the original OFDM message. Therefore, the SLM algorithm requires $U$ IFFT blocks, the sending of the side information and a block to properly choose, through a suitable measurement and comparison, the version of the OFDM signal with the smallest PAPR.
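As a toy illustration of the idea (not the scheme of [9] or [11] verbatim), the fragment below applies U phase sequences to one block of frequency-domain symbols X, takes the IFFT of every candidate and keeps the time-domain block with the lowest PAPR; the index of the chosen sequence is the side information:

% Selective mapping sketch: X is one block of N frequency-domain symbols, U candidates.
N = numel(X);  U = 8;
bestPAPR = inf;
for u = 1:U
    phi  = 2*pi*rand(N, 1);                 % one phase sequence P^u (in practice taken from
                                            % a table known to transmitter and receiver)
    xu   = ifft(X(:) .* exp(1j*phi));       % candidate time-domain OFDM block
    papr = max(abs(xu).^2) / mean(abs(xu).^2);
    if papr < bestPAPR                      % keep the candidate with the lowest PAPR
        bestPAPR = papr;  x_tx = xu;  side_info = u;
    end
end
% x_tx is transmitted; side_info tells the receiver which phase sequence to undo.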

Adaptive pre-distortion

The idea behind adaptive pre-distortion is to distort the signal according to a non-linear function in order to compensate for the subsequent, well-known, non-linear characteristics of the PA. Some solutions are capable of dealing with time-varying characteristics of the PA by dynamically and efficiently changing the input constellation.

Clipping technique

This technique has the advantage of being the simplest to implement, but incurs in-band distortion, out-of-band interference, and the disruption of the orthogonality of the sub-carriers. The method requires some sort of digital processing in the time and/or frequency domain. Among others, this technique includes: the Clipping and Filtering Technique (CAF), the block-scaling technique, the Peak Windowing technique (PW), the Peak Cancellation technique (PC), and the Fourier projection technique.


In order to introduce the scope of this work, a brief description of some of the algorithms belonging to this category follows. The algorithms have been chosen because of their conceptual and practical affinities with the approach proposed in this work. For all the following algorithms (Peak Filtering, Peak Cancellation and Peak Windowing), the concept of threshold is of utmost importance. The threshold is the desired maximum value for the magnitude of the input signal. It can be either hardwired inside the algorithm or programmed during its operating life. In any case, by setting a certain value for the threshold, we also inherently program a desired PAPR, because the magnitude of the signal is monotonically related to the power. The three described algorithms differ in the way they reduce the maximum magnitude of the signal (and thus the PAPR) to the desired level, but all of them digitally process the signal, thus introducing some distortion, which they try to minimize.

Peak Filtering (PF)

The Peak Filtering algorithm, sometimes referred to as Noise Shaping (NS) consists of extracting the part of the input signal whose magnitude exceeds the threshold, called the clip error sequence, then filtering it and finally subtracting it from a properly delayed version of the original signal itself. The purpose of the delay is to compensate for all the latencies generated during the detection and extraction of the clip error and filtering. The clip error generation consists first of the generation of a clipped version of the signal, B(n), according to the formula (note that the clipped signal retains its complex nature, see also Figure 2.5):

\[ B(n) = \begin{cases} x(n) & \text{if } \|x(n)\| \le \mathrm{threshold} \\[4pt] x(n) \cdot \dfrac{\mathrm{threshold}}{\|x(n)\|} & \text{otherwise} \end{cases} \]

and second, of the successive subtraction of such a generated signal from the original one:

e(n) = x(n) − B(n)

where x(n) is the original signal and e(n) is the clip error (see Figure 2.6). The clip error signal e(n) is then filtered by a filter whose coefficients are computed off-line and stored in a memory. The filter design is tailored to the specific type of signal the algorithm will work with (i.e. the number and bandwidth of the carriers). After each iteration of the algorithm, it is possible that some peaks will be created by the filtering operation itself (the so-called peak regrowth phenomenon), so successive applications of the algorithm might be necessary; this is accomplished by cascading several stages of PF.
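A direct transcription of the clip-error generation and of one PF iteration could look as follows (a sketch, not the thesis model; the FIR coefficients b are assumed to be designed off-line for the given carrier configuration, with odd length and linear phase):

% One Peak Filtering iteration: clip error generation, filtering, subtraction.
% x: complex input signal, thr: magnitude threshold, b: FIR filter coefficients.
x    = x(:);
mag  = abs(x);
B    = x;
over = mag > thr;
B(over) = x(over) .* (thr ./ mag(over));    % clip the magnitude to thr, keep the phase
e    = x - B;                               % clip error sequence e(n) = x(n) - B(n)
e_f  = filter(b, 1, e);                     % shape the clip error with the FIR b
gd   = (numel(b) - 1) / 2;                  % group delay of the linear-phase FIR
x_d  = [zeros(gd, 1); x(1:end-gd)];         % delay the input to line up with the filtered error
y    = x_d - e_f;                           % output of this PF stage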

Another reason justifying the cascading of several PF stages is the fact that a discrete-time signal does not necessarily exhibit the maxima of the true analog signal of which it constitutes the sampling and that will eventually reach the Power Amplifier [9].


Figure 2.5: Reduction of a complex sample to a version with the same phase and magnitude equal to a set threshold.

It is indeed possible for two successive elements of the discrete-time signal to both have a lower amplitude than the analog signal they are samples of, because of the very nature of the discrete-time representation of a continuous-time signal. In order to expose these hidden peaks, fractional-delay filters are often interposed between successive stages of the PF. The effect of these filters is equivalent to a conversion from digital to analog followed by a slightly time-shifted sampling process at the same sample rate as the original.


Figure 2.6: The generation of the clip error from the original signal.

Peak Cancellation (PC)

Contrary to the PF, the Peak Cancellation algorithm (see Figure 2.7) does not filter the clip error sequence, but explicitly isolates a single input sample among those identified within a certain Peak Search Window interval (a more formal definition will be given when the algorithm is described in more depth).


Figure 2.7: A very simplified top-level architecture of the Peak Cancellation algorithm.

Each time the algorithm detects these elements, called peaks, it cancels them individually by subtracting a properly shaped cancelling pulse from the signal, one for each peak. The major advantage of the PC is the reduced complexity of the algorithm compared to the PF, because of the lack of actual filtering of a clip error. In Figure 2.7, the Peak Extractor is the block that detects the samples whose magnitude is greater than the threshold, and it is basically the same in PF and in PW, whereas the Peak Detector, present only in the PC algorithm, isolates the maximum of those samples, which as said is defined as the peak. The reduction of the PAPR via the PC algorithm is achieved by the cancellation of these detected peaks via cancelling pulses that are generated only when the peaks are detected. The stored pulse is, similarly to the PF filter, a combined impulse response of all the input carrier filters modulated to the correct frequencies within the multi-carrier frequency band. Such a cancelling pulse can be generated in advance (off-line) and depends only on the carrier configuration of the input signal. For each peak, a pulse with the correct amplitude and phase is generated and subtracted. Some peak regrowth can occur as a consequence of the subtraction of the cancelling pulses from the input signal, therefore the algorithm has to be run several times. For example, in Figure 2.8 it can be seen that, because of the application of the cancelling pulse (in red), the two minima surrounding the peak add in phase with the pulse itself, thus generating two more peaks.

Peak Windowing (PW)

The peak windowing algorithm (see Figure 2.9) is based on multiplying the signal by an attenuating window W(k) rather than adding a correction to the signal.

When a peak is detected in the input signal, a set of coefficients (a window, see Figure 2.10) is either generated at run-time or read from a memory where it is stored, pre-computed off-line.


Figure 2.8: The effect of the cancelling pulse on the adjacent samples of the targeted peak. Note the regrowth of the peaks as a consequence.


Figure 2.9: Top-level architecture of the Peak Windowing algorithm.

Before the window is applied to the signal, the coefficients are scaled by a real number C, chosen in such a way that the peaks will be attenuated to the desired level (threshold). The signal around the maximum peak sample $n_p$ is multiplied by the attenuating window according to:

\[ y(n) = x(n) \cdot \bigl(1 - C \cdot W(n - n_p + K/2)\bigr) \]

where K is the number of window coefficients. The input signal is delayed to compensate for the delay of the peak-search part of the algorithm and to make the peak sample correspond to the maximum of the window. The windowing operation corresponds to subtracting, from the original signal, a windowed part of itself, whereas in the frequency domain it corresponds to the convolution of the signal with the Fourier transform of the window.


Figure 2.10: Window to be multiplied with the signal in order to reduce the magnitude of the peaks.
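In the formula above the scaling factor C is chosen so that the detected peak lands exactly on the threshold; with a window whose maximum coefficient equals 1 at its centre, one windowing step could be sketched as follows (illustrative variable names, 1-based MATLAB indexing):

% One Peak Windowing step around the peak sample index np.
% x: complex input, thr: threshold, W: window of length K with maximum value 1 at its centre.
x = x(:);  W = W(:);  K = numel(W);
C  = 1 - thr/abs(x(np));                    % attenuates the peak exactly to the threshold
n  = ((np - floor(K/2)) : (np - floor(K/2) + K - 1)).';   % samples covered by the window
ok = n >= 1 & n <= numel(x);                % ignore the part of the window outside the signal
x(n(ok)) = x(n(ok)) .* (1 - C * W(ok));     % y(n) = x(n) * (1 - C * W(n - np + K/2))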

Among the advantages of the algorithm is the fact that if the window amplitude changes smoothly, not much OOB emission is expected to appear; on the other hand, the lack of knowledge about the exact frequency characteristics of the attenuating window (because it is tailored to the particular input segment around the peak) makes it harder to guarantee a required or specified OOB performance. It would be desirable to minimize both the EVM and the OOB emission, but a trade-off must be chosen for the length of the window because, as will be better clarified further on, the longer the window is, the worse the impact on the in-band distortion (thus the EVM) and the better the effect on the adjacent channel (thus the OOB emission), and vice-versa. Furthermore, if closely spaced peaks are detected, the algorithm tends to overcompensate, and this again has a negative effect on the EVM.

Figure 2.11 shows the effect of the windowing on a segment of the input signal.

Successive processing of the signal in this way, when successive windows overlap, has the unfortunate effect of reducing the overall average power instead of the PAPR. This can be partially mitigated by introducing some more complexity into the algorithm, such as coefficients that take into account the presence of earlier windows, or the detection of closely spaced peaks followed by the generation of a single window, etc. The best way to reduce the risk of an excessive attenuation is the cascading of several PW stages, each attenuating the peaks to a lesser degree. This of course introduces a longer delay as well. The PW is the least complex of the presented algorithms, but also the one having the worst (and least predictable) performance in terms of in-band and out-of-band emissions.


Figure 2.11: Effect of the application of the window on a segment of the signal containing peaks.

2.2 Related Work

In Xilinx application note 1033 (XAPP1033 [1]), the company proposes a PC-CFR algorithm, together with an implementation for their Virtex-4 and Virtex-5 families of FPGAs, based on a simple architecture featuring a peak detector and four cancelling pulse generators. The coefficients of the unscaled cancelling pulse are generated off-line by superposing as many prototype filter masks, properly shifted in frequency, as the number of carriers the input signal is made of. The algorithm is compared against a Peak Windowing CFR (PW-CFR) and a Noise Shaping CFR (NS-CFR). With the frequency and number of coefficients chosen by the authors of the application note for the comparison, the PC-CFR outperforms both the NS-CFR and the PW-CFR solutions in terms of ACLR and EVM.

In [5], Song and Ochiai propose a Field Programmable Gate Array (FPGA) implementation of the PC-CFR. The added value of their solution is a workaround for the problem of overlapping cancelling pulses caused by the cancellation of too closely spaced detected peaks. When the detected peaks are too closely spaced (in terms of number of samples), the corresponding generated cancelling pulses might overlap and add in-phase, thus both reducing the effect of peak reduction and generating peak regrowth. The authors propose, when the measured distance between successive peaks falls under a certain value, the generation of a truncated version of the cancelling pulses in order to avoid the overlap. Such a truncation introduces discontinuities in the signal and, as a consequence, OOB emission. The authors state that the use of a simple moving-average filter is good enough to take care of these emissions and satisfy the ACLR requirements. Results show that the proposed solution is satisfactory in terms of both EVM and ACLR, although the hardware complexity is higher than that of the plain PC-CFR solution, because of the added circuitry needed to take care of the detection and truncation of the pulses.



In [10], Schmidt and Schlee propose a PC method that generates a cancelling pulse shaped only on the carrier that, at the moment of the peak detection, contributes the most to the aggregated signal. By doing so, the algorithm should minimize both the in-band and the OOB emissions. The knowledge about which sub-carrier is responsible for the largest part of the peak should be available from measurements at the time the peak is detected. The cancelling pulses are also dynamically conditioned by a set of weights that may change according to several scenarios that might occur (e.g. if a carrier is idle for a certain amount of time, the corresponding spectral range could be "occupied" without any risk of introducing distortion).

In [11], Bauml et al. use the term selected mapping for the first time. The selected mapping algorithm can be used to mitigate the PAPR of signals consisting of an arbitrary number of carriers and any signal constellation. This method provides significant advantages at the cost of a moderate additional complexity.

In [12], Wang et al. described the first nonlinear companding³ transform (NCT) for PAPR reduction, based on the µ-law algorithm used in speech processing. It showed better performance than the clipping algorithm.

In [13], Jean Armstrong transforms the OFDM signal into the time domain via an oversized IDFT, giving rise to trigonometric interpolation. The signal is then clipped and filtered via a forward and inverse DFT in order to remove OOB emissions. These results are further improved by the same author (see [14]) by repeatedly clipping and filtering. In particular, the author claims that this method causes no increase in OOB emissions.

In [15], unlike the µ-law companding scheme which reduces the PAPR by enlarging only the small portions of the signal, Jiang et al. propose a solution based on the exponential companding technique, which adjusts small and large signal samples alike, keeping the average power unchanged but transforming the power density distribution from Rayleigh to uniform and generating fewer spectral side-lobes too. A similar approach is pursued in [16] by Al-Azzo et al., where this time the distribution density is transformed from Rayleigh to Gaussian and, as a consequence, peak and average values are changed so that the overall PAPR is reduced. Improvements are shown in the in-band distortion too.

In 2008, Carole et al. [17] presented a method that exploits the unused carriers in OFDM systems in order to decrease the PAPR of the signal without introducing significant OOB and in-band distortion (compared to clipping and windowing techniques), because no interference with the proper data channels exists.

In 2013, Sroy et al. [18] proposed a version of the Iterative Clipping and Filtering (ICF) algorithm for the PAPR reduction of OFDM-type signals using the (Inverse) Discrete Cosine Transform (IDCT/DCT), showing better results than the regular DFT/IDFT-based approach in [14].

³ From the combination of the words compressing and expanding.


3 The proposed implementation of the PC-CFR

3.1 General description of the PC-CFR algorithm

A detailed description of the implementation of the PC-CFR algorithm is given in the following section of this chapter, but first a more in-depth discussion of the algorithm from a general point of view is necessary in order to better understand the design choices that have been made.

The PC-CFR module is usually placed after the aggregator (which combines all the signals coming from the different channels) and before the Digital Pre-Distorter (DPD), when present (see Figure 3.1).


Figure 3.1: Typical positioning of the CFR inside the communication chain


The input of the system is a fixed-point signal made of two parts (in-phase and quadrature¹). It is the result of the sum of all the components relative to the various carriers, which yields a high-PAPR discrete-time signal. The output of the PC-CFR is a delayed, lower-PAPR signal of the same format. The purpose of the algorithm is to reduce the PAPR of the input signal to a desired value, and this is achieved by properly monitoring and, when necessary, reducing the values of the samples exceeding a certain threshold. The value of this threshold is directly related to the final desired PAPR.

The PC-CFR performs time-domain signal processing on limited, selected portions of the input signal. Such portions are selected according to the presence of peaks, which can be defined as follows: given the interval of samples of the input signal starting from the first one having magnitude greater than the threshold and finishing after a fixed number of samples, the peak is the element having the maximum magnitude inside this interval. Because the detection of the peaks is made on the basis of the magnitude of the input samples, a conversion from rectangular to polar form, or some other means to expose the magnitude of the input samples, is needed as one of the first steps of the algorithm. For each detected peak, a cancelling pulse is generated and subtracted from the input signal in order to reduce the value of the peak to the value of the threshold. The complex coefficients of the cancelling pulse are stored in a memory; these coefficients are the same for each peak being cancelled, but in order for the cancelling pulse to be shaped accurately after the peak it is expected to cancel, they are multiplied by the peak characteristics prior to being subtracted from the input signal. More precisely, the cancelling pulse, used to cancel the peak from the input complex signal by subtraction, is generated by a simple complex multiplication between each of the coefficients of the stored unscaled cancelling pulse and a single complex number coming from the peak detection part of the algorithm, this operation being performed for each peak independently.
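Following this definition, a golden-model-style peak search could be sketched as below, assuming a fixed Peak Search Window of psw samples (illustrative names, not the thesis code):

% Peak detection with a fixed-length Peak Search Window (PSW).
% x: complex input, thr: threshold, psw: window length in samples.
x = x(:);  N = numel(x);  mag = abs(x);
peak_pos = [];  peak_val = [];
n = 1;
while n <= N
    if mag(n) > thr                          % first sample above the threshold opens a PSW
        win = n : min(n + psw - 1, N);       % fixed-length search window
        [~, k] = max(mag(win));              % the peak is the largest sample in the window
        peak_pos(end+1, 1) = win(k);         % position of the detected peak
        peak_val(end+1, 1) = x(win(k));      % complex value of the detected peak
        n = win(end) + 1;                    % resume the search after the window
    else
        n = n + 1;
    end
end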

The characteristics of the peak p that are needed for the generation of the cancelling pulse are the difference between the magnitude of the sample selected as the peak ($s_k$, for some k) and the threshold, and the phase of that sample:

\[ p = \rho_P \cdot e^{\,i\theta_P}, \qquad \rho_P = \|s_k\| - \mathrm{threshold} \]

The cancelling pulse elements c[n], are generated according to this formula:

\[ c[n] = \rho_P \cdot \rho[n] \cdot e^{\,i(\theta_P + \theta[n])} \]

where $\rho[n] \cdot e^{\,i\theta[n]}$ are the coefficients of the unscaled cancelling pulse, for all values of n. It should be noted that this operation is much less computationally intensive (i.e. it requires a much smaller amount of hardware resources) than other filtering-based CFR signal processing algorithms.

¹ The in-phase/quadrature component format can formally be considered a complex signal, with the real and imaginary parts corresponding to the in-phase and quadrature components respectively. In the rest of the text the two formalisms (complex and I/Q) will be used interchangeably.


At the output of the multiplier, the complex data is converted back to rectangular form², ready to be subtracted from the input signal, thus finally cancelling the peaks. Of course, it may happen that more than one cancelling pulse needs to be generated at the same time, so that portions of their intervals overlap. In order to provide the cumulative effect of all the cancelling pulses, all the coefficients of the active pulses must be added together and then subtracted from the signal at each sample of interest. Another observation is that the cancelling pulse effectively cancels the peak element and that element only: the central element of the unscaled cancelling pulse is the one that, when multiplied by the peak characteristics and subtracted from the signal, will yield an element having magnitude matching exactly the threshold value. It follows that the value of this element must be real and equal to one. In Figure 2.5 the effect of the subtraction and the consequent reduction of the peak to a magnitude matching the threshold is shown on the complex plane. All the neighbouring input samples will be modified, as already explained, in such a way that their magnitude will generally be reduced too, but it should be noted that the algorithm has no accurate control over these elements, therefore some undesirable phenomena are unavoidable, as will be illustrated shortly.
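Combining the two formulas, the scaling and subtraction of a single cancelling pulse could be modelled as below; rho_cp and theta_cp stand for the stored magnitude and phase of the unscaled pulse (whose central coefficient is assumed to be 1 with phase 0), and peak_pos/peak_val are the position and complex value of one detected peak. This is a sketch of the data flow, not the RTL:

% Scale the unscaled cancelling pulse for one detected peak and subtract it from x.
x = x(:);  rho_cp = rho_cp(:);  theta_cp = theta_cp(:);
L       = numel(rho_cp);
rho_p   = abs(peak_val) - thr;              % amount by which the peak exceeds the threshold
theta_p = angle(peak_val);                  % phase of the peak sample
c       = rho_p .* rho_cp .* exp(1j*(theta_p + theta_cp));   % scaled cancelling pulse c[n]
n       = ((peak_pos - floor(L/2)) : (peak_pos - floor(L/2) + L - 1)).';
ok      = n >= 1 & n <= numel(x);           % clip the pulse at the signal edges
x(n(ok)) = x(n(ok)) - c(ok);                % the peak sample is brought down to the threshold
% Overlapping pulses would simply be accumulated before this subtraction,
% which is what the adder in the Peak Manager does in hardware.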

The algorithm is usually applied more than once to the signal, and this is done by letting the output of the algorithm, processed by one module or stage, become the input of the next one, in a cascade-like structure (see Figure 3.2). The reasons for which this is usually done are the following:

• Peak Leak. If an implementation of the PC-CFR algorithm poses an upper limit on the number of simultaneous cancelling pulses that can be generated by a single stage, then, when such a limit is reached and a new peak is detected, the peak will simply pass through the stage uncancelled and, in the case of the last stage, it will reach the Power Amplifier, which is precisely the event we strive to avoid. By cascading several stages, the probability of such an event obviously decreases. The depicted scenario should not be considered unlikely, because peaks may come in bursts separated by relatively long periods of inactivity, so the utilization of the resources of the module is not uniform over time, alternating between high-intensity and long idling periods. It is crucial to understand that what is to be considered a peak, and thus the presence, density and magnitude of peaks, is relative to the parameter values the Clip Stage is configured with. So, for example, if for a certain value of the threshold no peaks are detected, it may be possible that for a lower threshold the same set of input values exhibits one or more peaks.

The number of closely spaced detected peaks also depends on the Peak Search Window length (i.e. how many samples are observed in the search for a peak): the same set of input elements could give rise to a larger or smaller number of detected peaks according to the length of that interval (the longer the interval, the fewer the detected peaks, because larger groups of samples will be associated with single peaks).

² The rectangular form of complex numbers is much more suitable than the polar form for performing additions and subtractions.

• Peak Regrowth. It can be observed (Figure 2.8) that, because the subtraction of the cancelling pulses from the signal affects a much larger number of samples than the peak alone, some of the samples that were smaller than the threshold before the cancellation of a peak may rise above it because of the constructive summation of the cancelling pulses, thus becoming peaks themselves although they were not in the beginning, creating the so-called peak regrowth phenomenon. It can be observed that the magnitude of the regrown peaks is correlated with the height of the original cancelled peak, in the sense that the greater a peak is, the more likely and the higher the regrown peaks appear after its cancellation. By cascading several stages, the regrown peaks can be taken care of as well.

• Gradual peak reduction. It may happen that in an interval with several cancelling pulses operating simultaneously, one or more peaks are not cancelled efficiently (not completely, or too much) because of the reciprocal interactions among cancelling pulses. This is an unavoidable phenomenon which is more likely to happen, and whose effects are more severe, the greater the peak to cancel is. In order to mitigate this and the peak regrowth phenomenon, a smart strategy consists of gradually reducing the magnitude of the peaks by applying progressively decreasing thresholds to successive stages of the PC-CFR, instead of trying to completely cancel them in one pass. This is easily achieved with a cascading architecture because each iteration of the PC-CFR may be independently configured with a different set of parameters, such as the threshold.

The implementation of numerous clip stages not only requires a larger area (and thus higher power consumption) on the chip, but also introduces a higher delay on the signal, which in general is an undesirable effect especially for the more recent communication protocols. The delay in the signal data path is purposely introduced in order for all the computations constituting the algorithm to have the needed time to execute. The largest portion of the delay is by far the group delay of the cancelling pulse itself, which obviously cannot start before the actual detection of the peak.

As previously stated, this algorithm involves some signal processing which in turn will modify the characteristics of the input signal, thus introducing both in-band and out of band distortion. In order to reduce this undesirable consequence, the unscaled cancelling pulse is chosen so that its frequency spectrum matches that of the input signal as closely as possible. The spectrum of the input signal depends on the number, bandwidth and relative positions of the carriers and is either known or estimable. Hence, a trade-off must be chosen because the longer the cancelling pulse is (which translates to: the more coefficients it is made of), the more severe is the effect on the input signal when the cancelling pulse is subtracted from it, because the operation will affect a larger number of elements, impacting negatively on the EVM.


Figure 3.2: Simplified block-level view of the architecture of the PC-CFR module, with two cascaded Clip Stages as an example.

Also, longer cancelling pulses require larger memories for their storage and impose longer delays. On the other hand, the steeper the frequency response of the cancelling pulse³ is, the more accurately we can intervene on the signal spectrum while at the same time reducing the consequences over the frequency intervals that do not belong to the input signal, yielding lower OOB emissions. This is a desirable behaviour because the total frequency bandwidth is a resource that is shared among several users, thus its integrity must be preserved.

One notable limitation of the PC-CFR algorithm over other types of signal processing algorithms for CFR is the fact that, every time a cancelling pulse is being generated, it requires the exclusive use of some hardware resources, which of course amount to a finite quantity. Other algorithms, based essentially on the filtering of the signal or portions of it, do not suffer this limitation but, on the other hand, the complexity of the filters (that can be translated to higher area occupancy and in general more power consumed by the ASIC) limits their attractiveness.

On the other hand, a notable advantage of the PC-CFR over Turbo Clipping (TC) and other filter-based algorithms is its inherent flexibility with respect to changes of the input signal characteristics. For the PC-CFR algorithm, in fact, adapting to a completely different configuration of the input signal carriers is just a matter of changing the coefficients of the unscaled cancelling pulse, via a re-configuration of the pulse memory, thus enhancing the usefulness of the module in several contexts.

³ According to the theory of digital signal processing, longer sequences in the discrete-time domain correspond to steeper profiles in the frequency domain.


The TC algorithm, instead, operates on each carrier independently via a properly designed branch consisting of one or more decimators, Finite Impulse Response (FIR) filters and interpolators. It follows that the entire hardware architecture of the TC is shaped around a particular configuration of the input signal carriers, and it cannot be reconfigured as easily. On the other hand, the per-carrier filtering of the TC allows a more accurate, and thus more effective, intervention on the input signal, whereas the cancelling pulse in the PC-CFR is generally obtained from the cumulative characteristics of the entire input signal carrier configuration and is therefore sub-optimal with respect to each carrier.

3.2 Structural description of the proposed implementation

The proposed architecture is made up of a parameterizable number of cascaded Clip Stages (CSs), each of them communicating with a centralized controlling module called the Peak Manager (PM) (see Figure 3.2 for an example scenario with two Clip Stages). The cascaded set of CSs constitutes the data path of the signal and allows the iteration of the algorithm the desired number of times, but not necessarily with the same set of configuration values (every CS can be configured with a local threshold and Peak Search Window length, for example). In each CS, the following operations are performed: the conversion of the input signal from rectangular to polar form, the peak detection, the delaying of the input signal and the subtraction of the cancelling pulse from it.

The PM is responsible for dispatching the detected peaks coming from the various Clip Stages to the available Peak Cancelling Units (PCUs)⁴ by implementing a dispatching policy. The generation of a cancelling pulse requires the availability of a PCU for the entire duration of the pulse itself; such a PCU will appear busy, and therefore unavailable for the generation of other cancelling pulses, for the entire period.

There is a finite number of PCUs in the Peak Manager. The PM receives the notifications about (and the characteristics of) the detected peaks from all the connected CSs, and then generates and dispatches the cancelling pulses to them (again, provided at least one PCU is available). The PM is made of several components: one memory to store the coefficients of the unscaled cancelling pulse, a complex multiplier, a Coordinate Rotation Digital Computer (CORDIC) unit dedicated to the conversion of the data from the polar back to the rectangular form, and an adder to combine all the cancelling pulses together before sending them to the various CSs for the final cancellations. A controlling unit and a pulse generator are responsible for the overall management of the whole subsystem.

In Figure 3.3, the top-level diagram of the entire PC-CFR is presented with the names of the input/output ports and the principal configurable parameters as they appear in the SV code. The following is a list describing each of these signals; a sketch of how they could map onto a module header is given after the list.

4What is referred to here as a PCU is the set of hardware and physical resources (a time slot in the Time-Division Multiplexing rotation is a physical resource) needed for the generation of a cancelling pulse.

Figure 3.3: Top-level view of the input/output signals and parameters of the PC-CFR module

• i_data_real. Data, input. In-phase component of the input data.

• i_data_imag. Data, input. Quadrature component of the input data.

• i_dtg. Configuration register, input. Input data toggle. At each toggle of this signal the module processes one data sample.

• i_thr_lvs. Configuration register, input. This input provides the mapping between the peak scale values and the length of the cancelling pulses, as explained in the report.

• i_cmd_strst. Configuration register, input. Synchronous reset of the peak statistics.

• clk. Clock. Main clock of the module. Its frequency is 250 MHz.

• clk_1G. Clock. Secondary, faster clock of the module used for time-division multiplexing. Its frequency is clk × 4 = 1 GHz.

• rst_n. Reset. Active low, asynchronous reset.

• cr_thr_c. Configuration register, input. Values of the thresholds for the Clip Stages.

• cr_psw_length_c. Configuration register, input. Values of the PSW length for the Clip Stages.

• o_data_real. Data, output. In-phase component of the output data.

• o_data_imag. Data, output. Quadrature component of the output data.

• o_data_stats. Status register, output. Statistics about the peak height distribution.
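As mentioned above, the listed signals could map onto a SystemVerilog module header as in the sketch below. All parameterized widths are placeholders chosen for illustration, since the actual bit widths are not specified in this section.

module pc_cfr #(
  parameter int NUM_CS = 2,    // number of cascaded Clip Stages (assumption)
  parameter int W      = 16,   // I/Q data width (assumption)
  parameter int THR_W  = 16,   // threshold width (assumption)
  parameter int PSW_W  = 8,    // PSW length width (assumption)
  parameter int LVS_W  = 32,   // width of the peak-scale/pulse-length mapping (assumption)
  parameter int STAT_W = 32    // width of the statistics word (assumption)
) (
  input  logic              clk,          // main clock, 250 MHz
  input  logic              clk_1G,       // faster clock for time-division multiplexing, clk x 4
  input  logic              rst_n,        // asynchronous, active-low reset
  input  logic              i_dtg,        // input data toggle
  input  logic              i_cmd_strst,  // synchronous reset of the peak statistics
  input  logic [W-1:0]      i_data_real,  // in-phase input component
  input  logic [W-1:0]      i_data_imag,  // quadrature input component
  input  logic [LVS_W-1:0]  i_thr_lvs,    // peak scale to pulse length mapping
  input  logic [THR_W-1:0]  cr_thr_c        [NUM_CS],  // per-CS threshold
  input  logic [PSW_W-1:0]  cr_psw_length_c [NUM_CS],  // per-CS PSW length
  output logic [W-1:0]      o_data_real,  // in-phase output component
  output logic [W-1:0]      o_data_imag,  // quadrature output component
  output logic [STAT_W-1:0] o_data_stats  // peak height statistics
);
  // Internal structure (cascaded Clip Stages and Peak Manager) omitted in this sketch.
endmodule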

Figure 3.4: Block level diagram of the Clip Stage. The Peak Detector isolates the peaks in the input signal and collects statistics on them.

3.2.1 The Clip Stage

Each Clip Stage (Figure 3.4) receives the data to be processed from the previous CS (or from the previous module in the processing chain, in the case of the first Clip Stage), in the form of an I/Q fixed-point signal. The inputs of the CS are the clock, the active-low reset, the input signal (real and imaginary parts), the input data toggle command, the synchronous reset of the peak statistics and the cancelling pulse(s) coming from the PM. The output is the registered difference between the (delayed) input signal and the cancelling pulse(s).

CORDIC

The first module encountered by the signal inside the CS is the CORDIC. The CORDIC is a flexible iterative algorithm capable of computing several approximated transcendental functions without the need for multipliers, so it is commonly used in hardware design to minimize area. Inside the CS, it is used to convert the complex input signal from rectangular to polar form, so that the magnitude of the input signal samples is exposed and the peaks can be detected.

The implemented CORDIC can be configured to be synthesized in a pipelined or a non-pipelined version. The latter performs all the iterations combinationally in a single clock cycle, thus offering a significantly lower delay, but might not be synthesizable at the higher clock frequencies. With the aim of keeping the PC-CFR as configurable as possible, both choices are available.
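The following sketch illustrates the vectoring-mode CORDIC idea in its non-pipelined (fully combinational) form, using only shifts, additions and a small arctangent table. The fixed-point formats, the eight iterations and the restriction to inputs in the right half-plane (a pre-rotation stage, not shown, would cover the remaining quadrants) are assumptions for the example; the magnitude is left scaled by the CORDIC gain of about 1.647.

module cordic_vectoring #(
  parameter int W = 16            // data width (assumption)
) (
  input  logic signed [W-1:0] i_x,   // in-phase component, assumed >= 0
  input  logic signed [W-1:0] i_y,   // quadrature component
  output logic        [W-1:0] o_mag, // magnitude scaled by ~1.647 (gain not compensated here)
  output logic signed [W-1:0] o_pha  // phase in radians, Q2.14
);
  // atan(2^-i), i = 0..7, expressed in Q2.14 radians
  localparam logic signed [W-1:0] ATAN [0:7] = '{
    16'sd12868, 16'sd7596, 16'sd4014, 16'sd2037,
    16'sd1023,  16'sd512,  16'sd256,  16'sd128
  };

  logic signed [W+1:0] x, y, xp;   // guard bits against intermediate growth
  logic signed [W-1:0] z;

  always_comb begin
    x = i_x;
    y = i_y;
    z = '0;
    for (int i = 0; i < 8; i++) begin
      xp = x;
      if (y >= 0) begin            // rotate the vector clockwise toward the x axis
        x = x + (y  >>> i);
        y = y - (xp >>> i);
        z = z + ATAN[i];
      end else begin               // rotate counter-clockwise
        x = x - (y  >>> i);
        y = y + (xp >>> i);
        z = z - ATAN[i];
      end
    end
    o_mag = x[W-1:0];              // residual |y| is negligible after 8 iterations
    o_pha = z;
  end
endmodule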


Peak Detector

In the following, the term Peak Detector refers to what, with reference to Figure 2.7, corresponds to the combined functions of the Peak Detector and the Peak Extractor. The goals of the Peak Detector are therefore:

• To identify the peaks in the input signal. Every time a new peak is detected, the module sends the peak characteristics and a notification pulse to the Peak Manager (PM).

• To collect information on the height of the detected peaks. This data is collected either for statistical purposes or with the prospect of using it to adjust the threshold and the Peak Search Window (PSW) length (not yet possible in the present implementation).

The Peak Detector can be configured with two values: the threshold and the PSW length (cr_thr_c and cr_psw_length_c in the SV code, respectively), which can be set independently for each CS. The module is implemented as a two-state Finite State Machine (FSM) (see Figure 3.5 for the Algorithmic State Machine (ASM) chart of the Peak Detector, with pseudo-code or plain English in place of the actual SV statements and variable identifiers, in order to favor clarity over formality):

in the IDLE state, the input samples pass through unaffected and no action is taken until a sample exceeds the programmed threshold. The state machine then moves to the PEAK_SEARCH state during which, for the fixed number of samples dictated by the PSW length register, successive input samples are compared against the running maximum in order to find the largest sample within the entire interval (the definition of a peak). This is performed simply by comparing the magnitude of each new input sample with the current maximum, which is stored in a register together with the corresponding phase. At the end of the interval, the value of the threshold parameter is subtracted from the maximum sample found, defining what will be referred to as the peak scale in the rest of the report. The peak scale, the corresponding phase and a trigger signal are sent to the PM, and the peak statistics are updated with the new arrival.

A fundamental aspect of the peak detection process has been neglected so far: within the PSW interval, the sample that will be elected as the peak can be found at any position (i.e. it could be the first, the second or the last sample in the interval), and this positional information is necessary for the proper alignment between the cancelling pulse that will be generated by the PM and the input signal.

The Peak Detector keeps track of this displacement of the peak inside the PSW interval via a counter (reported as displacement in Figure 3.5), and this is the last piece of information sent by the Peak Detector to the PM when a new peak is detected.
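A minimal SystemVerilog sketch of the two-state FSM described above, including the displacement counter, is given below. The port names, the widths and the handshake toward the Peak Manager are assumptions made for illustration; the update of the peak statistics is omitted and the accounting of the window length is simplified.

module peak_detector #(
  parameter int MAG_W = 16,   // magnitude width (assumption)
  parameter int PHA_W = 16,   // phase width (assumption)
  parameter int CNT_W = 8     // PSW counter width (assumption)
) (
  input  logic             clk,
  input  logic             rst_n,
  input  logic             i_dtg,            // one new sample per assertion
  input  logic [MAG_W-1:0] i_data_mag,
  input  logic [PHA_W-1:0] i_data_pha,
  input  logic [MAG_W-1:0] cr_thr_c,         // threshold
  input  logic [CNT_W-1:0] cr_psw_length_c,  // Peak Search Window length
  output logic             o_peak_valid,     // notification pulse toward the Peak Manager
  output logic [MAG_W-1:0] o_peak_scale,     // max magnitude minus threshold
  output logic [PHA_W-1:0] o_peak_pha,
  output logic [CNT_W-1:0] o_displacement    // position of the peak inside the PSW
);
  typedef enum logic {IDLE, PEAK_SEARCH} state_t;
  state_t state;

  logic [MAG_W-1:0] max_mag;
  logic [PHA_W-1:0] max_pha;
  logic [CNT_W-1:0] psw_cnt, max_pos;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      state        <= IDLE;
      o_peak_valid <= 1'b0;
    end else begin
      o_peak_valid <= 1'b0;                        // one-cycle notification pulse
      if (i_dtg) begin                             // advance only on a new sample
        unique case (state)
          IDLE: begin
            if (i_data_mag > cr_thr_c) begin       // threshold crossed: open a PSW
              max_mag <= i_data_mag;
              max_pha <= i_data_pha;
              max_pos <= '0;
              psw_cnt <= '0;
              state   <= PEAK_SEARCH;
            end
          end
          PEAK_SEARCH: begin
            if (i_data_mag > max_mag) begin        // track the running maximum
              max_mag <= i_data_mag;
              max_pha <= i_data_pha;
              max_pos <= psw_cnt + 1'b1;
            end
            if (psw_cnt == cr_psw_length_c - 1) begin
              o_peak_scale   <= max_mag - cr_thr_c;  // peak scale = max - threshold
              o_peak_pha     <= max_pha;
              o_displacement <= max_pos;
              o_peak_valid   <= 1'b1;                // notify the Peak Manager
              state          <= IDLE;
            end else begin
              psw_cnt <= psw_cnt + 1'b1;
            end
          end
        endcase
      end
    end
  end
endmodule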

Delay Memory

The input signal to the CS is sent to both the CORDIC and a delay memory whose purpose is to compensate for the delays due to the various aforementioned processing steps applied to the signal in the CS and in the PM.

Figure 3.5: ASM of the Peak Detector. Please note that "thr." and "End of PSW" correspond to the cr_thr_c and cr_psw_length_c parameters respectively.


The delay can be split into two components, which gives a clearer view of their origins and relative magnitudes (expressed in terms of data rate periods). The smaller component compensates for the CORDIC (one period if it has been configured as non-pipelined, eleven otherwise5), the Peak Detector (a number of periods equal to the PSW length) and the whole chain of elaboration performed by the PM. By far the largest component is the group delay associated with the cancelling pulse generation, which amounts to approximately half of the number of coefficients of the pulse.
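As a behavioural illustration only, the delay memory can be sketched as a parameterizable shift register whose depth is set to the sum of the contributions listed above (CORDIC latency, PSW length, the PM processing chain and the pulse group delay). In an ASIC this would more likely be realized as a circular buffer in a RAM; the names, widths and DEPTH value below are assumptions.

module delay_line #(
  parameter int W     = 16,
  parameter int DEPTH = 64   // sum of the delay contributions (assumption)
) (
  input  logic         clk,
  input  logic         i_en,     // advances once per input data sample
  input  logic [W-1:0] i_data,
  output logic [W-1:0] o_data
);
  logic [W-1:0] mem [DEPTH];

  always_ff @(posedge clk) begin
    if (i_en) begin
      mem[0] <= i_data;
      for (int i = 1; i < DEPTH; i++) begin
        mem[i] <= mem[i-1];     // shift the whole line by one sample
      end
    end
  end

  assign o_data = mem[DEPTH-1]; // output delayed by DEPTH data periods
endmodule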

Final registered subtractor

The output of the CS is generated as the registered difference between the delayed signal and the sum of the cancelling pulses coming from the PM. This costs another data period of delay per Clip Stage, which is not compensated by the delay memory because the final registered subtraction is the very last operation applied to the signal.
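A minimal sketch of this final operation is shown below (names and widths are illustrative assumptions); the single register in this path is what introduces the extra data period of delay mentioned above.

module cs_output_sub #(
  parameter int W = 16   // I/Q data width (assumption)
) (
  input  logic                clk,
  input  logic                i_dtg,            // advance once per data sample
  input  logic signed [W-1:0] i_delayed_real,   // delayed input signal, in-phase
  input  logic signed [W-1:0] i_delayed_imag,   // delayed input signal, quadrature
  input  logic signed [W-1:0] i_pulse_real,     // sum of cancelling pulses from the PM, in-phase
  input  logic signed [W-1:0] i_pulse_imag,     // sum of cancelling pulses from the PM, quadrature
  output logic signed [W-1:0] o_data_real,
  output logic signed [W-1:0] o_data_imag
);
  // Registered subtraction: the very last operation applied to the signal in the CS.
  always_ff @(posedge clk) begin
    if (i_dtg) begin
      o_data_real <= i_delayed_real - i_pulse_real;
      o_data_imag <= i_delayed_imag - i_pulse_imag;
    end
  end
endmodule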

3.2.2 The Peak Manager

The Peak Manager (see Figure 3.6) is the centralized unit that receives the notifications and the characteristics of the detected peaks from all the Clip Stages, generates the cancelling pulses accordingly and sends them back to the appropriate Clip Stage, where they finally cancel the peaks. One of the most crucial tasks of the PM is the management of the Peak Cancelling Units (PCUs), whose optimal utilization has been the main effort of this design. In the most naive way of tackling the problem, the availability of N PCUs would require N replicas of all the resources needed for the generation of a single cancelling pulse; this in turn would mean N memories for storing the cancelling pulse coefficients, N complex multipliers, N CORDICs for the conversion from polar back to rectangular form and N accesses to an adder to combine all the cancelling pulses. In order to minimize the area occupancy of the PC-CFR module, as anticipated in the introduction, the present implementation makes use of a time-division multiplexing approach for a more efficient exploitation of the described hardware resources. To make this possible, a second, faster clock is used, and the ratio between the faster and the slower clock frequencies is set by the parameter num_ts_c (number of time slots, see Figure 3.7). As in every time-division multiplexing scenario, a single resource is shared among several users in different intervals or slots of time forming a partition (that is, without any overlapping) of a longer interval of time, which repeats periodically. In the present implementation the shared resource is the mentioned set of hardware blocks (coefficient memory, multiplier, etc.), the slot of time is the period of the faster clock and the longer interval is the period of the slower, data rate clock.
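The sketch below illustrates only the time-slot rotation at the heart of this scheme: a counter running on the faster clock selects, in each fast-clock cycle, which PCU context drives the shared resources (coefficient memory, complex multiplier, CORDIC). The NUM_TS parameter mirrors num_ts_c, while the per-PCU context and the shared datapath itself are omitted; all names are assumptions for the example.

module pcu_tdm #(
  parameter int NUM_TS = 4,   // fast-clock cycles per data period (mirrors num_ts_c)
  parameter int W      = 16
) (
  input  logic                      clk_1G,               // faster clock
  input  logic                      rst_n,
  input  logic [W-1:0]              i_pcu_scale [NUM_TS], // per-PCU context (here: peak scale only)
  output logic [$clog2(NUM_TS)-1:0] o_slot,               // PCU served in the current fast cycle
  output logic [W-1:0]              o_scale               // operand presented to the shared datapath
);
  logic [$clog2(NUM_TS)-1:0] slot;

  // One full rotation over all time slots per period of the slower (data rate) clock.
  always_ff @(posedge clk_1G or negedge rst_n) begin
    if (!rst_n)                slot <= '0;
    else if (slot == NUM_TS-1) slot <= '0;
    else                       slot <= slot + 1'b1;
  end

  assign o_slot  = slot;
  assign o_scale = i_pcu_scale[slot];   // the shared resources see one PCU per time slot
endmodule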

5The number eleven comes from the precision of the data processed by the CORDIC: the number of iterations of the algorithm is roughly the same as the number of bits used to represent the data.
