A clock driver with reduced EMI


Institutionen för systemteknik

Department of Electrical Engineering

Master's thesis

A clock driver with reduced EMI

Master's thesis carried out at Tekniska högskolan vid Linköpings universitet (Institute of Technology, Linköping University) by

Mikael Bengtsson LiTH-ISY-EX--14/4750--SE Linköping and Skänninge 2014

Department of Electrical Engineering Linköpings tekniska högskola


Supervisor: Behzad Mesgarzadeh, ISY, Linköping University
Examiner: Atila Alvandpour


Division, Department: Electronic devices, Department of Electrical Engineering, SE-581 83 Linköping

Date: 2014-03-18

Language: English

Report category: Master's thesis (Examensarbete)

URL for electronic version

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-XXXXX

ISBN: none
ISRN: LiTH-ISY-EX--14/4750--SE
Title of series, numbering / ISSN: none

Title: A clock driver with reduced EMI
Swedish title: En klockdrivkrets med reducerad EM-strålning

Author: Mikael Bengtsson



Summary

A clock driver that works on the principle of charging and discharging a clock network in a VLSI circuit in two steps is investigated in a few different configurations. The goal of the circuit design is twofold:

• to reduce the power consumption

• to reduce the third harmonic of the clock waveform, and thereby reduce the electromagnetic interference (EMI) emitted by the clock network.

The first point should be achievable by charging the clock network with a potential difference that, in each step, is half as large as normal. The second point can be achieved by carefully adjusting when each (half) rise and fall occurs, so as to cancel the third Fourier coefficient.

The driver is loaded by eight 16-bit adders. The drivers' power consumption, and the spectrum of the clock signal, are investigated for a few different clock frequencies, supply voltages, and driver architectures. The results are then compared with a conventional square wave clock.

The results show that while the third harmonic is attenuated in all the investigated cases compared with the conventional square wave, more power is lost in the extra logic required than is saved by driving the clock network with smaller potential steps. On the other hand, the loss is largest at higher clock frequencies, and the power of the new driver is comparable to that of the square wave circuit when the clock frequency drops below approximately 100 MHz.

Some suggestions for further investigation of additional architectures and clock signal waveforms are given.


Abstract

A clock driver that works on the principle of charging and discharging the clock network in a VLSI circuit in two steps is investigated in a few different configurations. The aim of the design is twofold:

• to reduce the power consumption

• to reduce the third harmonic of the clock signal, and thereby the EMI (electromagnetic interference) emitted by the clock network.

The first aim should be achievable because the clock interconnect network is charged through only half the supply voltage in each step, and the second by carefully timing the rising and falling half-transitions so that the third Fourier coefficient of the resulting waveform cancels.

The drivers are loaded by eight 16-bit adders. The drivers' power consumption and the spectrum of the output signal are investigated for varying clock frequencies, power supply voltages, and driver architectures. The results are compared to a conventional square wave clock.

The results show that while the third harmonic of the output improves over the square wave clock in all the investigated cases, at higher clock frequencies the power savings are more than cancelled by the extra power needed in the logic stage that controls these drivers. On the other hand, the power consumption of the new driver appears to drop below that of the conventional driver when the clock frequency drops below approximately 100 MHz.

A few suggestions for further investigations of new designs and clock waveforms are given.


Acknowledgments

I would like to express my gratitude to my supervisor, Prof. Behzad Mesgarzadeh, at Linköping University, for his help and patience during this work.

I would also like to thank my examiner, Prof. Atila Alvandpour, for being so inspirational.

Without the support of my family, I wouldn’t have started on this journey in the first place. Thank you.

And last, but certainly not least, a large thank you to Betsy, my fiancée, for being so amazingly patient while I’ve been working on this, and helpful with double-checking my English.

Linköping, March 2014 Mikael Bengtsson


Notation

Architecture

SQ: Square wave driver
FP, FNP: 2-step driver with faster rise and fall times. Each half-transition is approximately 5% of the clock period; with and without an extra pass gate, respectively (see figure 4.5).
SP, SNP: 2-step driver with slower rise and fall times. Each half-transition is approximately 10% of the clock period; with and without an extra pass gate, respectively (see figure 4.5).

Time

$t_{rr}$, $t_{rf}$, $t_{fr}$, $t_{ff}$: Time interval between two consecutive half-transitions. Compare figure 3.4.
$\tau_{rl}$, $\tau_{fl}$, $\tau_{rh}$, $\tau_{fh}$: Rise and fall times for a half-transition (h for "high level" and l for "low level", respectively).


1 Introduction

A few alternatives to the conventional clock signals for CMOS circuitry have been proposed in the literature. These are intended for special purposes such as reduced power, lower electromagnetic interference (EMI), or suppression of specific frequencies in the EMI. The conventional, (ideally) square wave clock signal has the benefit of minimal rise and fall times, which increases the maximum achievable frequency and reduces short-circuit currents, since at almost any moment either the pull-up or the pull-down network is turned off.

On the other hand, it draws a dynamic power $P = f_{clk} C_S V_{dd} V_{swing}$, where $C_S$ is the capacitance of the clock network, and $V_{swing}$ typically equals the power supply voltage $V_{dd}$. This can be a significant portion of the total power consumption of a digital circuit.
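As a quick numerical illustration of this expression, the minimal sketch below evaluates $f_{clk} C_S V_{dd} V_{swing}$ at the three clock frequencies investigated later in this work. The function name is illustrative, and the clock-network capacitance of 10 pF is a hypothetical value chosen only for the example, not a figure from this thesis.

```python
from typing import Optional


def clock_dynamic_power(f_clk: float, c_s: float, v_dd: float,
                        v_swing: Optional[float] = None) -> float:
    """Dynamic power f_clk * C_S * V_dd * V_swing of a clock network, in watts.

    V_swing defaults to V_dd, the usual full-swing case.
    """
    if v_swing is None:
        v_swing = v_dd
    return f_clk * c_s * v_dd * v_swing


C_S = 10e-12  # hypothetical clock-network capacitance (10 pF), illustration only
for f in (100e6, 250e6, 500e6):
    p = clock_dynamic_power(f, C_S, 0.8)
    print(f"{f / 1e6:.0f} MHz: {p * 1e3:.2f} mW")  # 0.64, 1.60 and 3.20 mW
```

The power scales linearly with both frequency and capacitance, and quadratically with $V_{dd}$ when the swing is the full supply.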

Besides the pure power aspects of having to charge the clock network every cycle, these large currents occur at very regular intervals. This means that the whole circuit will cause electromagnetic interference (EMI) at certain frequencies. For the conventional clocking scheme, which (ideally) uses square waves, these frequencies are the odd multiples of the clock frequency, and the power in each harmonic decreases as the square of its order. In practice, the power at high frequencies drops off faster than that due to nonzero rise and fall times.

This work investigates an alternative to conventional square wave clocking. By using a clock driver that charges and discharges the clock network in two steps, less power is used. Ideally this amounts to 50% of the conventional energy consumption, if each step charges or discharges the network by a voltage that is half the conventional power supply voltage. By choosing the length of each step just right, every harmonic whose order is a multiple of three cancels out. The drawback of this clocking scheme is a reduction in the maximum possible clock frequency, as well as a cost in power due to the extra logic used in this driver. This cost may or may not be larger than the savings.

1.1 Outline

The thesis is organized as follows. Section 2 recalls some previous work on special clock drivers. Section 3 presents some theoretical results for two-step clocks, while section 4 describes the design of the investigated drivers and their load. Section 5 analyzes the results obtained from simulations of the tested circuits.


2 Previous works

Depending on what is important to improve over the conventional clock signal, several modifications have been proposed, such as:

1. Spread spectrum clock generation (Hardin et al. [1994], Kim et al. [2005])
2. Resonant clocking (Chan et al. [2004, 2003, 2005], Hansson et al. [2006]), optionally using distributed capacitances (Drake et al. [2004])
3. Rotary traveling waves, and standing wave oscillators (Wood et al. [2001], O'Mahony et al. [2003])
4. Relaxed rise and fall times (Veendrick [1984])
5. Multi-segment clocking (Mesgarzadeh et al. [2011])
6. Multi-level, multi-segment clocking (Mesgarzadeh et al. [2011])

2.1 Spread spectrum clock (SSC) techniques

A conventional square wave clock signal causes a large number of transistors to (ideally) switch at the same time, and at constant time intervals. This concentrates the EMI in narrow bands at the clock frequency and its (odd) multiples. Introducing (intentional) clock jitter spreads the EMI over a wider frequency band around the nominal clock frequency (and its harmonics), thereby reducing the emitted power at each given frequency. Kim et al. [2005] numerically compared four modulation signals as sources for the jitter, and determined that a triangular signal attenuates the EMI the most. They further simulate and measure the effect of this triangular modulation signal, which is achieved by letting the clock signal from a PLL pass through a delay cell array (DCA), where each active delay cell contributes its own delay to the output. The measurement in their setup gave a reduction in EMI from 74 dBµV/m to 65 dBµV/m at a 390 MHz clock frequency and 50 kHz modulation. In Hardin et al. [1994], up to 13 dB attenuation is achieved at high clock frequencies.

Note, however, that this method only affects the EMI, and not necessarily the power consumption.
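The spreading effect can be illustrated with a toy model. The sketch below is not the delay-cell implementation of Kim et al. [2005]. It is a minimal illustration that rests on several assumptions: the clock is represented only by its fundamental (a cosine), its frequency is swept by a hypothetical ±5% with a triangular profile (two modulation periods per analysis window), and a direct DFT over 4096 samples is used so that only the standard library is needed. The sketch compares the largest spectral line near the fundamental with and without modulation. All function names are illustrative.

```python
import cmath
import math


def triangle(x: float) -> float:
    """Triangle wave with period 1 and range [-1, 1]."""
    x %= 1.0
    return 4 * x - 1 if x < 0.5 else 3 - 4 * x


def fm_clock(n: int, cycles: int, deviation: float, mod_periods: int) -> list[float]:
    """Fundamental of a clock with `cycles` periods in n samples.

    The instantaneous frequency is swept by +/- `deviation` (relative) with a
    triangular profile; deviation = 0 gives an unmodulated clock.
    """
    f0 = cycles / n  # cycles per sample
    phase, out = 0.0, []
    for k in range(n):
        out.append(math.cos(phase))
        phase += 2 * math.pi * f0 * (1 + deviation * triangle(mod_periods * k / n))
    return out


def dft_magnitude(x: list[float], k: int) -> float:
    """|X[k]| of the direct DFT of x."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)))


def peak_near(x: list[float], centre: int, span: int) -> float:
    """Largest DFT magnitude within +/- span bins of `centre`."""
    return max(dft_magnitude(x, k) for k in range(centre - span, centre + span + 1))


N, CYCLES = 4096, 128
peak_plain = peak_near(fm_clock(N, CYCLES, 0.0, 2), CYCLES, 20)
peak_ssc = peak_near(fm_clock(N, CYCLES, 0.05, 2), CYCLES, 20)
reduction_db = 20 * math.log10(peak_ssc / peak_plain)
print(f"Peak reduction near the fundamental: {reduction_db:.1f} dB")
```

The total power is unchanged and is only redistributed over more spectral lines. How large the peak reduction becomes depends on the modulation depth and rate.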

2.2 Resonant clocking/charge-recovery

In order not to waste the energy fed into the clock network as it is charged, it is proposed that the clock network be part of an LC oscillator, where magnetic fields store energy as the clock network is being discharged. The expected power consumption of a resonant clock, compared to a conventional buffer-driven network with a stage gain of $\lambda$, is given by Drake et al. [2004] as

$$\frac{P_{resonant}}{P_{conv}} = \frac{3\pi(\lambda - 1)}{4 Q_{tank} \lambda} \qquad (2.1)$$

where $Q_{tank}$ is the quality factor of the tank.
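Equation (2.1) is easy to tabulate. The sketch below is illustrative only. The stage gain λ = 3 (chosen to match the tapering factor of the reference driver in section 4.2) and the $Q_{tank}$ values are assumptions, not figures from Drake et al. [2004]. The function name is hypothetical.

```python
import math


def resonant_power_ratio(stage_gain: float, q_tank: float) -> float:
    """P_resonant / P_conv from eq. (2.1): 3*pi*(lambda - 1) / (4*Q_tank*lambda)."""
    if q_tank <= 0:
        raise ValueError("Q_tank must be positive")
    return 3 * math.pi * (stage_gain - 1) / (4 * q_tank * stage_gain)


# Hypothetical example: stage gain 3 and a few tank quality factors.
for q in (5, 10, 20):
    print(f"Q_tank = {q:2d}: P_resonant/P_conv = {resonant_power_ratio(3, q):.3f}")
```

For λ = 3 the ratio reduces to π/(2Q_tank), so doubling the tank quality factor halves the expected clock power.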

A few different principles for the design can be identified:

• In Chan et al. [2003, 2005] it is suggested that the clock network be shaped as an H-tree (cf. Rabaey et al. [2003], p. 509), and that certain nodes of this tree be attached to inductors, so as to form LC oscillators distributed over the chip, all driven by a fairly small driver. The inductors are placed between the clock net and designated capacitors. From a power consumption point of view it is then beneficial not to use large drivers, since these would add higher-frequency components, pushing the clock signal away from a pure sine wave and towards a square wave. In addition, a signal without higher harmonics will cause the chip to emit less EMI. In the cited work, a simulated clock power saving of 80% is achieved at $f_{clk}$ = 1.1 GHz, while the jitter is smaller than in similar non-resonant circuits.

• If the designated capacitors from Chan et al. [2003, 2005] are removed, so that the clock network itself provides the full capacitance of the oscillator, as in Drake et al. [2004], the capacitance of the LC oscillator becomes somewhat data dependent, but the effect is not very large: it is estimated to cause a shift in clock frequency of up to 1.25%, and practical measurements showed a shift closer to 0.68% for worst-case data. Results from Hansson et al. [2006] indicate that the total chip power consumption of such a design can decrease by 15 to 30%, and the clock power consumption by 57%, without even optimizing the flip-flops or latches for a sine-wave clock signal: the "off-the-shelf" flip-flops use about 34% more power with this clock signal than with a square wave signal.
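The quoted frequency shifts can be translated into an equivalent change of tank capacitance using the standard resonance relation $f = 1/(2\pi\sqrt{LC})$. This relation is not stated in the cited works. The minimal sketch below applies it to an ideal LC tank with fixed inductance, which is an assumption. The helper names are hypothetical.

```python
import math


def lc_frequency(inductance: float, capacitance: float) -> float:
    """Resonance frequency of an ideal LC tank, 1 / (2*pi*sqrt(L*C))."""
    return 1 / (2 * math.pi * math.sqrt(inductance * capacitance))


def capacitance_change_for_shift(relative_shift: float) -> float:
    """Relative capacitance change that produces a given relative frequency shift.

    With L fixed, f is proportional to C**-0.5, so C_new / C = (f / f_new)**2.
    A negative shift (lower frequency) corresponds to a larger capacitance.
    """
    return (1 / (1 + relative_shift)) ** 2 - 1


for shift in (-0.0125, -0.0068):
    dc = capacitance_change_for_shift(shift)
    print(f"{shift:+.2%} frequency shift <-> {dc:+.2%} capacitance change")
```

Under these assumptions, a 1.25% frequency shift corresponds to roughly a 2.5% change in the effective clock-network capacitance.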


2.3 Rotary traveling waves, Standing wave oscillators

Both of these are based on the transmission line properties of the clock net. For rotary traveling waves, the idea is to divide the chip into a number of square regions, each of which is surrounded by a cross-coupled double loop ("Möbius loop"). Further, a (large) number of inverters connecting the outer and inner sections of the loop provide a capacitance which slows down the propagation of the signal. This allows the signals to be locked at a 180° phase difference between the inner and outer loop. Note that by choosing the point of the loop from which the clock signal is tapped, one can adjust the phase for the given region.

The distribution across the chip is achieved by letting two neighboring regions share sections of the loop. See further Wood et al. [2001].

The standing wave oscillator, on the other hand, uses pairs of wires which are grounded at both ends. A number of distributed transconductors drive the standing wave on this wire pair. Cf. [O'Mahony et al., 2003, figure 4]. A clock signal may be tapped from the center of each wire, but should be buffered before it is used in digital logic. However, in the cited paper, no clock buffers were implemented, as the target frequency (10 GHz) was higher than the available clock buffers could handle.

2.4 Relaxed rise and fall times (trapezoidal clock)

In the case where EMI is of primary concern, a possible solution is to increase the rise and fall times of the clock. This is because fast transitions are correlated with large harmonics, and while the size of the harmonics cannot be used to determine the EMI directly, it allows a qualitative estimate of the EMI. See Pandini and Repetto [2006]. When the rise and fall times are significant, making the clock signal look more trapezoidal than square, one can derive the size of the harmonics as

$$|c_n| = V_{dd}\, p \left|\frac{\sin(\pi n p)}{\pi n p}\right| \left|\frac{\sin(\pi n \tau_r f_{clk})}{\pi n \tau_r f_{clk}}\right| \qquad (2.2)$$

where $p$ is the 50% pulse width as a fraction of the clock period, $f_{clk}$ is the clock frequency, and $\tau_r = \tau_f$ is the common rise and fall time. As long as the rise and fall time is small compared to the clock period, the power $P_n \propto |c_n|^2$ in the $n$th harmonic decreases as $n^{-2}$ for small $n$, and as $n^{-4}$ for larger $n$. Note that as long as $p = 1/2$, all even harmonics cancel out.
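Equation (2.2) can be evaluated directly. This is a minimal sketch with normalized $V_{dd}$ = 1. The 10% rise time is a hypothetical example value, and the function names are illustrative.

```python
import math


def sinc(x: float) -> float:
    """Normalised sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def trapezoid_harmonic(n: int, p: float, tau_f: float, v_dd: float = 1.0) -> float:
    """|c_n| of a trapezoidal clock, eq. (2.2).

    p     -- 50 % pulse width as a fraction of the clock period
    tau_f -- rise (= fall) time times the clock frequency, tau_r * f_clk
    """
    return abs(v_dd * p * sinc(n * p) * sinc(n * tau_f))


# Illustrative case: p = 1/2 and rise/fall time equal to 10 % of the period.
for n in range(1, 8):
    print(n, f"{trapezoid_harmonic(n, 0.5, 0.1):.4f}")
```

With $p = 1/2$ the even harmonics come out as zero (up to rounding). With $\tau_r f_{clk} = 0.1$, the rise-time factor additionally vanishes at $n = 10$, which shows how the second factor shapes the envelope.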

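The increased rise time comes at a cost in short-circuit power, as the NMOS and PMOS transistors conduct simultaneously for longer periods of time. Assuming that the supply voltage is high enough that $V_{dd} > V_{tn} + |V_{tp}|$, Veendrick [1984] calculates the short-circuit power to be

$$P_{sc} = \frac{\beta}{12}\left(V_{dd} - 2V_T\right)^3 \frac{\tau_r}{T} \qquad (2.3)$$

where again $\tau_r = \tau_f$ is the rise and fall time, $T$ is the clock period, and the inverter is assumed symmetric, with $V_{tn} = |V_{tp}| = V_T$ and gain factor $\beta$.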
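Equation (2.3) can be evaluated as a function of supply voltage. This is a minimal sketch. The gain factor β = 10⁻⁴ A/V² is hypothetical. The threshold $V_T$ = 0.4 V and the 250 MHz clock are borrowed from section 4 for illustration only. Because the formula describes strong-inversion conduction, it gives zero when $V_{dd} \le 2V_T$, where the subthreshold currents of section 3.2 take over.

```python
def short_circuit_power(beta: float, v_dd: float, v_t: float,
                        tau: float, period: float) -> float:
    """Veendrick's estimate, eq. (2.3): (beta/12) * (V_dd - 2 V_T)**3 * tau / T.

    Assumes a symmetric inverter (V_tn = |V_tp| = V_T). For V_dd <= 2 V_T the two
    transistors are never both in strong inversion and the estimate is zero.
    """
    overdrive = v_dd - 2 * v_t
    if overdrive <= 0:
        return 0.0
    return beta / 12 * overdrive**3 * tau / period


PERIOD = 1 / 250e6      # 250 MHz clock
TAU = 0.1 * PERIOD      # rise/fall time of 10 % of the period
BETA = 1e-4             # hypothetical transistor gain factor, A/V^2
for v_dd in (0.7, 0.8, 0.9, 1.2):
    p_sc = short_circuit_power(BETA, v_dd, 0.4, TAU, PERIOD)
    print(f"V_dd = {v_dd} V: P_sc = {p_sc:.3e} W")
```

The cubic dependence on $V_{dd} - 2V_T$ is what makes low supply voltages combined with high-threshold clocked transistors attractive for slow clock edges.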

2.5 Multi-segment clocking

While relaxing the rise and fall times does improve dynamic power and EMI compared to the square wave case, it also increases the short-circuit power dissipated while the clock transitions. A way to reduce the latter effect is to use a "multi-segment clock"; that is, a clock signal whose rising (falling) edge consists of multiple segments with different slopes, chosen so that the clock passes quickly through the voltage range where both the pull-up and pull-down networks are active, thereby reducing the short-circuit power. Additionally, the slopes can be chosen so as to decrease the higher harmonics. Mesgarzadeh et al. [2011] report a simulation of a digital circuit with this clock signal. It shows that the multi-segment clock lies between the very relaxed trapezoidal clock and the conventional clock both in terms of EMI (better than the conventional clock, not as good as the trapezoidal one) and in terms of power (better than the trapezoidal clock). It also offers better timing properties than the trapezoidal clock, as the voltage rises faster to acceptable levels.

Figure 2.1: Multi-segment clock, from Mesgarzadeh et al. [2011]

2.6 Multi-level or Multi-step clocking

Another clocking mode that promises both reduced power and reduced EMI compared to the conventional square clock, at the cost of a lower maximum usable clock frequency, is a clock that transitions between 0 and $V_{dd}$ in several steps. Each transition between two levels is intended to be relatively short, while the clock spends a significant time on each level. One could of course consider multi-level clocks with any number of levels, but this work only deals with transitions that happen in two steps: from 0 to $V_{dd}/2$, and from $V_{dd}/2$ to $V_{dd}$ (and vice versa when falling), as shown in figure 2.2. This particular case will hereafter be referred to as a two-step clock. This signal shape has previously been investigated in Fritzin et al. [2012a] in the context of switched amplifiers, where examples of driving circuits are also given. Simulations of this amplifier were made at 1 GHz; the third harmonic was suppressed by 23 dB compared to an inverter Class-D stage.

Figure 2.2: Two-step clock

The most important trade-off of this clock scheme concerns the short-circuit power, which can be very large unless low supply voltages are used together with high-threshold-voltage transistors on all clock inputs. This is one reason why the maximum frequency is restricted with this clock scheme. The second reason turns out to be related to the low overdrive voltage


3 Some theory of the two-step clock

This section looks into some of the theoretical underpinnings of the statement that a two-step clock scheme could reduce power and EMI (harmonics). From now on, we assume that the clock (ideally) looks like the one in figure 2.2, described by

$$V_{clk}(t) = \begin{cases} 0 & 0 < t \le T/6 \\ V_{dd}/2 & T/6 < t \le 2T/6 \\ V_{dd} & 2T/6 < t \le 4T/6 \\ V_{dd}/2 & 4T/6 < t \le 5T/6 \\ 0 & 5T/6 < t \le T \end{cases} \qquad (3.1)$$

and that each rising and falling half-transition, i.e. each transition of size $V_{dd}/2$, takes a time $\tau$.

3.1 Harmonics

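With a non-zero rise and fall time $\tau$, the Fourier coefficients are given by

$$|c_{n>0}| = \frac{V_{dd}}{2}\left|\frac{\sin(\frac{1}{2}\pi n)}{\frac{1}{2}\pi n}\right|\left|\frac{\sin(\pi n \tau f_{clk})}{\pi n \tau f_{clk}}\right|\left|\cos\frac{n\pi}{6}\right| \qquad (3.2)$$

which can be derived similarly to equation (2.2). To obtain this, one can use the observation that the two-step wave can be interpreted as a sum of two square waves of half the amplitude, with a phase difference of one sixth of a period. The phase difference multiplies the coefficient of a single half-amplitude square wave by $1 + e^{-j\pi n/3}$, whose magnitude is $2|\cos(n\pi/6)|$.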
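The sum-of-two-square-waves observation can be checked numerically. The sketch below uses illustrative function names, a normalized period and $V_{dd}$ = 1. It evaluates the Fourier coefficients of the ideal waveform (3.1) by a midpoint sum, and does the same for two half-amplitude square waves offset by $T/6$. With zero rise time, the magnitudes should agree with each other and with the zero-rise-time limit of (3.2), including a vanishing third harmonic.

```python
import cmath
import math


def two_step(t: float, period: float = 1.0, v_dd: float = 1.0) -> float:
    """Ideal two-step clock of eq. (3.1)."""
    x = (t % period) / period
    if x <= 1 / 6 or x > 5 / 6:
        return 0.0
    if x <= 2 / 6 or x > 4 / 6:
        return v_dd / 2
    return v_dd


def square(t: float, start: float, period: float = 1.0, amp: float = 1.0) -> float:
    """Square wave of amplitude `amp`, high for half a period starting at `start`."""
    return amp if (t - start) % period < period / 2 else 0.0


def pair(t: float) -> float:
    """Two half-amplitude square waves, one sixth of a period apart."""
    return square(t, 1 / 6, amp=0.5) + square(t, 2 / 6, amp=0.5)


def fourier_coeff(f, n: int, samples: int = 6000, period: float = 1.0) -> complex:
    """c_n = (1/T) * integral of f(t) * exp(-j 2 pi n t / T), midpoint rule."""
    dt = period / samples
    return sum(f((k + 0.5) * dt) * cmath.exp(-2j * math.pi * n * (k + 0.5) * dt / period)
               for k in range(samples)) * dt / period


for n in range(1, 8):
    c_direct, c_pair = fourier_coeff(two_step, n), fourier_coeff(pair, n)
    print(n, f"{abs(c_direct):.4f}", f"{abs(c_pair):.4f}")
```

The sample count is a multiple of six, so no sample straddles a level change and the two sums are computed from identical sample values.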

Figure 3.1: Calculated spectrum envelope of a two-step curve with rise and fall time (half-step) equal to 1% of the clock period, shown as normalized spectrum power versus the frequency of the harmonics, expressed as multiples of the clock frequency (logarithmic axes). The lines correspond to the $1/n^2$ and $1/n^4$ asymptotes.

In the limit of zero rise and fall times, this simplifies to

$$|c_n| = \frac{V_{dd}}{2}\left|\frac{\sin(\frac{1}{2}\pi n)}{\frac{1}{2}\pi n}\right|\left|\cos\frac{n\pi}{6}\right| \qquad (3.3)$$

As with the square wave, this cancels out for all $n$ that are multiples of 2. In addition, it also vanishes for all multiples of 3, which is the main improvement over the square wave with respect to the size of the harmonics, and thereby the EMI.

The asymptotic behavior is the same as in the square wave case, with the power of the harmonics, $|c_n|^2$, proportional to $n^{-2}$ for small $n$ and to $n^{-4}$ for large $n$. The $n^{-4}$ behavior sets in above the breakpoint frequency $1/(\pi\tau)$. See e.g. Mesgarzadeh and Alvandpour [2010].
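The closed forms (2.2) and (3.2) can be compared harmonic by harmonic. The sketch below assumes, for illustration, that the square wave's full transition and each half-transition of the two-step wave take the same time τ (here 1% of the period, as in figure 3.1). Under that assumption, the ratio between the two reduces to |cos(nπ/6)|. The sketch also expresses the 1/(πτ) breakpoint as a harmonic number. Function names are illustrative.

```python
import math


def sinc(x: float) -> float:
    """Normalised sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def c_square(n: int, tau_f: float) -> float:
    """|c_n| / V_dd of a 50 % square wave, eq. (2.2) with p = 1/2."""
    return abs(0.5 * sinc(n / 2) * sinc(n * tau_f))


def c_two_step(n: int, tau_f: float) -> float:
    """|c_n| / V_dd of the two-step clock, eq. (3.2); tau_f = tau * f_clk."""
    return abs(0.5 * sinc(n / 2) * sinc(n * tau_f) * math.cos(n * math.pi / 6))


def breakpoint_harmonic(tau_f: float) -> float:
    """Harmonic number at the 1/(pi*tau) breakpoint: 1 / (pi * tau * f_clk)."""
    return 1 / (math.pi * tau_f)


TAU_F = 0.01  # half-transition time of 1 % of the period, as in figure 3.1
for n in (1, 3, 5, 7, 9, 11):
    sq, ts = c_square(n, TAU_F), c_two_step(n, TAU_F)
    rel = "-inf" if ts < 1e-12 else f"{20 * math.log10(ts / sq):.2f}"
    print(f"n = {n:2d}: square {sq:.4f}, two-step {ts:.4f}, relative {rel} dB")
print(f"breakpoint near n = {breakpoint_harmonic(TAU_F):.1f}")
```

The remaining odd harmonics (1, 5, 7, 11, ...) are only about 1.25 dB below the square wave under this assumption. The gain of the two-step clock lies in removing the third harmonic and its odd multiples.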

3.2 Short-circuit power

Since $V_{clk} = V_{dd}/2$ for a significant part (1/3) of each clock period, short-circuit power dissipation could potentially be a major problem. However, if speed is not a concern, there is the option to let the clock distribution network drive only high-threshold MOS transistors, and to choose the supply voltage $V_{dd} < V_{tn} + |V_{tp}|$. Then there will not be any moment in time when both the pull-up and pull-down networks are conducting. Rather, to the degree there is any short-circuit current, it will be in the form of subthreshold currents.

In addition, if the rise and fall times of the clock signal are significant compared to the time spent at the $V_{dd}/2$ level, one may predict an even larger reduction in short-circuit power, due to the limited time spent at half voltage, where the subthreshold current peaks.

A simple way of understanding this is to consider a voltage $V_{in}$ close enough to $V_{dd}/2$ that both transistors are in the subthreshold region. Assume further that the inverter is loaded with a capacitance $C_L$. Let the currents through the PMOS and NMOS be $I_p = I_0 \exp(\beta(V_{dd} - V_{in}))$ and $I_n = I_0 \exp(\beta V_{in})$, respectively, where we assume that the widths are such that $I_0$ is equal for the NMOS and the PMOS, and $\beta$ is a constant that among other things depends on the temperature. Note that if $I_p > I_n$, the load capacitance is being charged, and the current $I_n$ is basically wasted. A similar argument for $I_p < I_n$ shows that the short-circuit current can be approximated by a "wasted" current

$$I_{scc} = \min(I_n, I_p) = I_0 \exp\left(\beta \min(V_{in}, V_{dd} - V_{in})\right)$$

Averaging this over one period gives an estimate of the short-circuit current. Assume that $V_{in}$ changes linearly during each half-transition, that each rise and fall time equals $\tau$, and that the two $V_{dd}/2$ plateaus together last $T/3 - 2\tau$. Each of the four half-transitions then contributes $I_0 \frac{2\tau}{\beta V_{dd}}\left(e^{\beta V_{dd}/2} - 1\right)$ to the integral over the period, and, neglecting the leakage while the clock rests at 0 or $V_{dd}$, one gets

$$I_{scc,avg} \approx I_0\left[\frac{8\tau}{T\beta V_{dd}}\left(e^{\beta V_{dd}/2} - 1\right) + \left(\frac{1}{3} - \frac{2\tau}{T}\right)e^{\beta V_{dd}/2}\right]$$

This is still under the assumption that, at all times, at least one of the NMOS and PMOS transistors is in the subthreshold region. As noted in section 2.6, this is very much desirable in order to keep down the short-circuit current (and thus the static power consumption). In practice, this can be achieved by reducing the supply voltage, and by only connecting the clock signal to high-threshold transistors.
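The averaged short-circuit current can be cross-checked by brute-force integration. This is a minimal check under assumptions that are not all stated in the text. The half-transitions are linear and centred on the nominal switching instants of (3.1). The period is normalized to T = 1, $V_{dd}$ = 0.8 V, and β = 25 V⁻¹ is hypothetical (of the order of q/(nkT) at room temperature). The closed form below follows from the stated assumptions (four ramps of duration τ, plateaus totalling T/3 − 2τ). The numerical average also contains the $I_0$ floor while the clock rests at 0 or $V_{dd}$, so it is compared with the closed form plus that rail term. Function names are illustrative.

```python
import math


def two_step_pwl(t: float, period: float, tau: float, v_dd: float) -> float:
    """Two-step clock (3.1) with linear half-transitions of duration tau.

    Assumption: each half-transition is centred on its nominal switching
    instant (T/6, 2T/6, 4T/6, 5T/6); requires 0 < tau < T/6.
    """
    h, T = tau / 2, period
    pts = [(0.0, 0.0), (T / 6 - h, 0.0), (T / 6 + h, v_dd / 2),
           (2 * T / 6 - h, v_dd / 2), (2 * T / 6 + h, v_dd), (4 * T / 6 - h, v_dd),
           (4 * T / 6 + h, v_dd / 2), (5 * T / 6 - h, v_dd / 2), (5 * T / 6 + h, 0.0),
           (T, 0.0)]
    t %= T
    for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return 0.0


def i_scc_numeric(period, tau, v_dd, beta, i0=1.0, samples=60_000):
    """Average of I0*exp(beta*min(V, V_dd - V)) over one period (midpoint rule)."""
    dt = period / samples
    total = 0.0
    for k in range(samples):
        v = two_step_pwl((k + 0.5) * dt, period, tau, v_dd)
        total += math.exp(beta * min(v, v_dd - v))
    return i0 * total / samples


def i_scc_closed_form(period, tau, v_dd, beta, i0=1.0, include_rails=False):
    """Ramp and V_dd/2 plateau terms; optionally add the I0 floor at the rails."""
    e = math.exp(beta * v_dd / 2)
    avg = i0 * (8 * tau / (period * beta * v_dd) * (e - 1) + (1 / 3 - 2 * tau / period) * e)
    if include_rails:
        avg += i0 * (2 / 3 - 2 * tau / period)
    return avg


num = i_scc_numeric(1.0, 0.05, 0.8, 25.0)
print(num, i_scc_closed_form(1.0, 0.05, 0.8, 25.0, include_rails=True))
for tau in (0.01, 0.05, 0.1):  # longer half-transitions shorten the V_dd/2 plateaus
    print(tau, i_scc_closed_form(1.0, tau, 0.8, 25.0))
```

The second loop illustrates the point made above. As the half-transitions grow, less time is spent exactly at $V_{dd}/2$, and the average short-circuit current drops.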

3.3 Dynamic power

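Charging a capacitive load $C_S$ with the input (3.1) dissipates, over the two half-steps of a rising edge, the energy

$$E_{2S} = \frac{1}{2}C_S\left(\frac{V_{dd}}{2}\right)^2 + \frac{1}{2}C_S\left(V_{dd} - \frac{V_{dd}}{2}\right)^2 = \frac{C_S V_{dd}^2}{4} \qquad (3.4)$$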

Compare this to the corresponding energy when the load is charged by a square wave in a single step:

$$E_{SQ} = \frac{1}{2}C_S V_{dd}^2 \qquad (3.5)$$


In both cases, the same amount of energy is dissipated again in the resistances of the circuit when the load is discharged. The total powers will thus be

$$P_{2S} = \frac{f C_S V_{dd}^2}{2}, \qquad P_{SQ} = f C_S V_{dd}^2 \qquad (3.6)$$

That is, the two-step clock should in theory halve the dynamic power used to charge and discharge the capacitance, compared to a square wave signal.
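The bookkeeping behind (3.4) to (3.6) can be written as a sum of ½C_S ΔV² over the voltage steps of one edge. This minimal sketch assumes that each step settles completely and that the falling edge dissipates as much as the rising edge. The values of $C_S$, $V_{dd}$ and $f$ are hypothetical, and the function names are illustrative.

```python
def dissipated_energy(c_s: float, steps: list[float]) -> float:
    """Energy dissipated in the charging resistance for one edge made of the
    given voltage steps, each step settling fully: sum of C_S * dV**2 / 2."""
    return sum(0.5 * c_s * dv**2 for dv in steps)


def clock_power(f: float, c_s: float, steps: list[float]) -> float:
    """Total dynamic power; rising and falling edges dissipate the same energy."""
    return 2 * f * dissipated_energy(c_s, steps)


C_S, V_DD, F = 10e-12, 0.8, 100e6  # hypothetical values for illustration
p_sq = clock_power(F, C_S, [V_DD])                # P_SQ = f C_S V_dd^2
p_2s = clock_power(F, C_S, [V_DD / 2, V_DD / 2])  # P_2S = f C_S V_dd^2 / 2
print(f"square: {p_sq * 1e3:.3f} mW, two-step: {p_2s * 1e3:.3f} mW, "
      f"ratio {p_2s / p_sq:.2f}")
```

Splitting one step of size $V_{dd}$ into two steps of size $V_{dd}/2$ halves the sum of squared steps, which is where the ideal 50% saving comes from.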

As for non-zero rise and fall times: consider, as an illustration, on the one hand a trapezoidal wave with rise and fall time 2τ, and on the other hand a two-step wave where each rise and fall time equals τ, both passed through a simple RC link.

Figure 3.2: A rising edge of the two input signals: a single ramp lasting $2\tau$, and a two-step edge whose half-transitions last $\tau$ each

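Consider the input

$$V_{in}(t) = \begin{cases} 0 & t < 0 \\ \frac{t}{2\tau} V_{dd} & 0 \le t < 2\tau \\ V_{dd} & 2\tau \le t \end{cases} \qquad (3.7)$$

The relationship between the voltage over the capacitance and the input voltage is given by

$$\frac{dV_C}{dt} = \frac{1}{RC}\left(V_{in}(t) - V_C\right) \qquad (3.8)$$

which for the given $V_{in}$ has the explicit solution

$$V_C(t) = \begin{cases} 0 & t < 0 \\ \frac{V_{dd}}{2\tau}\left(t + RC\left(e^{-t/RC} - 1\right)\right) & 0 \le t < 2\tau \\ V_{dd}\left(1 - \frac{RC}{2\tau}\, e^{-t/RC}\left(e^{2\tau/RC} - 1\right)\right) & 2\tau \le t \end{cases} \qquad (3.9)$$

The energy dissipated in the resistance $R$ during charging can then be calculated as

$$E_1 = \int_0^\infty \frac{1}{R}\left(V_{in}(t) - V_C(t)\right)^2 dt = C V_{dd}^2 \left(\frac{RC}{2\tau}\right)^2 \left(e^{-2\tau/RC} - 1 + \frac{2\tau}{RC}\right) \qquad (3.10)$$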

When $\tau$ is small, this turns into the well-known $C V_{dd}^2/2$, which can be seen by a Taylor expansion of the exponential. A similar calculation for the two-step clock, where we assume that the time between the two half-rises is long enough that $V_C \approx V_{dd}/2$ by the time the second step starts, gives the energy dissipated in the resistance as

$$E_2 = \frac{C V_{dd}^2}{2}\left(\frac{RC}{\tau}\right)^2\left(e^{-\tau/RC} - 1 + \frac{\tau}{RC}\right) \qquad (3.11)$$

For small $\tau$, this is close to $C V_{dd}^2/4$, as expected.

Figure 3.3: Energy dissipated during a rising edge, $E/(C V_{dd}^2)$, versus $\tau/(RC)$ for the square wave and the two-step wave. Relaxed rising and falling transitions could be predicted to reduce the potential savings in the two-step clock.

From this, we conclude that while the savings in dynamic power can be expected to approach 50% as $\tau \to 0$, the savings will likely be smaller when the rise and fall times become comparable to the time constant $RC$ of the charging/discharging circuit. The reason is that the dynamic power consumption of the square wave decreases faster with increasing rise and fall times than that of the two-step wave.
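Equations (3.9) to (3.11) can be checked against a direct numerical integration of the power dissipated in R. The sketch works in normalized units (R = C = $V_{dd}$ = 1), and its function names are illustrative. The two-step energy is checked by treating each half-step as a ramp of amplitude $V_{dd}/2$. As in the text, this assumes that $V_C$ settles at $V_{dd}/2$ before the second step.

```python
import math


def vin_ramp(t, tau, v_dd):
    """Input of eq. (3.7): a 0 -> V_dd ramp lasting 2*tau."""
    if t < 0:
        return 0.0
    return v_dd * t / (2 * tau) if t < 2 * tau else v_dd


def vc_ramp(t, tau, rc, v_dd):
    """Capacitor voltage of eq. (3.9)."""
    if t < 0:
        return 0.0
    if t < 2 * tau:
        return v_dd / (2 * tau) * (t + rc * math.expm1(-t / rc))
    return v_dd * (1 - rc / (2 * tau) * math.exp(-t / rc) * math.expm1(2 * tau / rc))


def e_square(tau, rc, c, v_dd):
    """Eq. (3.10); expm1 limits rounding error when 2*tau/RC is small."""
    x = 2 * tau / rc
    return c * v_dd**2 * (math.expm1(-x) + x) / x**2


def e_two_step(tau, rc, c, v_dd):
    """Eq. (3.11): two half-steps of duration tau, settled in between."""
    x = tau / rc
    return c * v_dd**2 / 2 * (math.expm1(-x) + x) / x**2


def e_numeric(tau, rc, c, v_dd, samples=50_000):
    """Midpoint-rule integral of (V_in - V_C)**2 / R, with R = rc / c."""
    r = rc / c
    t_end = 2 * tau + 30 * rc  # the tail beyond 30 RC is negligible
    dt = t_end / samples
    return sum((vin_ramp(t, tau, v_dd) - vc_ramp(t, tau, rc, v_dd)) ** 2 / r * dt
               for t in ((k + 0.5) * dt for k in range(samples)))


# Normalised units (R = C = V_dd = 1, so RC = 1); compare with figure 3.3.
for tau in (0.01, 0.5, 4.0):
    e1, e2 = e_square(tau, 1.0, 1.0, 1.0), e_two_step(tau, 1.0, 1.0, 1.0)
    print(f"tau/RC = {tau}: E1 = {e1:.4f}, E2 = {e2:.4f}, E2/E1 = {e2 / e1:.3f}")
```

The ratio $E_2/E_1$ starts near 0.5 for fast edges and grows towards 1 as $\tau/RC$ increases, in line with the conclusion above.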

3.4 Characterizing the two-step wave form

While a square wave can be characterized fairly well by the three numbers "rise time", "fall time", and "pulse width", the two-step waveform requires additional information for a complete description. Figure 3.4 displays how such numbers

Figure 3.4: Some timing data that can be used to characterize a two-step waveform: the intervals $t_{fr}$, $t_{ff}$, $t_{rf}$, $t_{rr}$, the plateau voltages $V_f$ and $V_r$, and the half-transition times $\tau_{fl}$, $\tau_{rh}$, $\tau_{rl}$, $\tau_{fh}$, with reference levels $0.25V_{dd}$ and $0.75V_{dd}$. $V_f$ is measured in the middle of the $t_{ff}$ interval, and $V_r$ in the middle of the $t_{rr}$ interval. These "plateau voltages" are measured in the middle of the intervals in order to handle the possibility that the voltage does not stabilize at $V_{dd}/2$. The rise and fall times are defined


4 Description of tested circuits

In order to evaluate the power savings and EMI of the proposed clock driver, two different two-step driver architectures are constructed and simulated at the schematic level, using a 65 nm technology with transistors of two different threshold voltages. The higher threshold voltage is about $V_{Th0}$ = 0.4 V, and is used for every transistor that is driven by the clock signal. Together with a supply voltage of 0.8 V, this should keep the static power consumption down.

Each driver is sized in three main variants, aimed at three different clock frequencies (100 MHz, 250 MHz, and 500 MHz), and two sub-variants intended to investigate performance when the rise and fall times change. Further, for each frequency, all five designs (four two-step and one square wave) are simulated at three different supply voltages (0.7 V, 0.8 V, and 0.9 V). Finally, the devices are tested with the supply voltage connected to an RCL link with parameters R = 2 Ω, C = 2 pF, and L = 2 nH, in order to evaluate the sensitivity to fluctuations in the power supply voltage.
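As a rough orientation for the supply test, the natural frequency of the quoted inductance and capacitance can be computed. This sketch assumes that L and C form a second-order network in which the resistance only adds damping. The exact topology of the RCL link is not specified here, so only $1/(2\pi\sqrt{LC})$ and $\sqrt{L/C}$ are evaluated and related to the three clock frequencies.

```python
import math

R, C, L = 2.0, 2e-12, 2e-9  # RCL-link values from the text (ohm, farad, henry)

f_natural = 1 / (2 * math.pi * math.sqrt(L * C))  # ideal L and C, R neglected
z_char = math.sqrt(L / C)                         # characteristic impedance
print(f"natural frequency ~ {f_natural / 1e9:.2f} GHz, sqrt(L/C) ~ {z_char:.1f} ohm")
for f_clk in (100e6, 250e6, 500e6):
    print(f"{f_clk / 1e6:.0f} MHz clock: natural frequency near harmonic "
          f"{f_natural / f_clk:.1f}")
```

Under this assumption, the natural frequency of about 2.5 GHz lies near the fifth harmonic of the 500 MHz clock, and well above the harmonics that dominate the spectrum at 100 MHz.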

These drivers are compared to a conventional square wave driver. To provide a more realistic load than a pure capacitor or RC link, the load is constructed from a set of eight parallel 16-bit adders, each pipelined in four stages. Each adder incorporates a total of 80 flip-flops, for a total of 640 flip-flops.

Measurements are made of the total power consumption for each driver plus load, as the loading adders are provided with semi-random inputs over 60 clock cycles. Further, the clock signal’s spectrum is analyzed to see how well the drivers manage to reduce the third harmonic.

Figure 4.1: Structure of test benches. (a) Two-step driver: blocks data gen., logic stage, driving stage, and adders; signals $V_1$, $V_2$, $\varphi$, and $\bar{\varphi}$. (b) Square wave driver: blocks data gen., buffer stage, driving stage, and adders; signals $V_1$, $\varphi$, and $\bar{\varphi}$.

into two stages, primarily for the benefit of the two-step driver, which, in the given architecture, requires a certain amount of logic to determine when the clock signal is to be at the half-level. This division was done for the practical purpose of making it easier to measure the power consumption for the individual stages. A similar division of the square wave driver simply places the last inverters in its “driving stage”.

The inputs $V_1$ and $V_2$ are square waves with 50 ps rise and fall times, where $V_2$ follows $V_1$ with a phase difference of 60°.

4.1 Adders

The adders are 16-bit pipelined adders built from 4-bit ripple-carry adders. The full adders use a 28-transistor CMOS design, shown in figure 4.2.

Figure 4.2: 28-transistor full adder (inputs a, b, $c_{in}$; outputs s, $c_{out}$; annotated NMOS widths $W_n$ = 0.17 µm and $W_n$ = 0.2 µm).

As far as possible, the sizing uses minimum-size transistors (NMOS: 0.135 µm), with the pull-up network 2.5 times as wide as the pull-down network. Sizes are increased only for the carry-in/carry-out critical path (NMOS: 0.170 µm) and for the transistors that generate the carry out (NMOS: 0.200 µm). Cf. figure 4.2.

The registers for the pipeline are transmission-gate-based D flip-flops with C2MOS keepers, a structure also known as TGFF-PPC after its use in the PowerPC 603 (Rebaud et al. [2008]). As previously indicated, this is where the high-threshold-voltage transistors need to be used, in the transmission gates as well as in the clocked keeper inverters. The sizing is done with minimal NMOS transistors (0.135 µm) and an NMOS/PMOS ratio of 2. The design assumes that the clock driver outputs a differential signal, so that both the clock $\varphi$ and its inverse $\bar{\varphi}$ are available to the adders directly.

Figure 4.3: Transmission gate DFF with high-threshold-voltage transistors at each $\varphi$ and $\bar{\varphi}$ input.

4.2 Clock drivers

The two-step drivers are evaluated against a reference conventional square wave driver with differential output and a tapering factor of 3. The conventional driver is sized to achieve rise and fall times of approximately 10% of the clock period. For the tested design, four two-step drivers are created. These are of two slightly different architectures (shown in figure 4.5), each sized in two versions with different rise and fall times. The faster of these (sizing "F") aims for each of the two rise times (to $V_{dd}/2$ and to $V_{dd}$, respectively) to equal 5% of the clock period, while the slower (sizing "S") settles for 10% of the clock period per half-rise.

All clock drivers are assumed to be driven by input signals that are close to ideal square waves (though with 50 ps rise and fall times), and always use minimum-sized transistors at the inputs. In the case of the conventional driver, only one such input is needed, but in the case of the two-step driver, two inputs ($V_1$ and $V_2$)

4.2.1 Design of two-step clock driver

Several different architectures for a two-step driver can be considered. Two simple options are presented in figure 4.5, and a third option, not evaluated in this work, in figure 4.6. The benefit of the extra transmission gate in the first design is that it allows the transistors connected to Vdd/2 to be significantly smaller while still keeping the rise and fall times down. The potential difference across the pass gate is typically twice as large as the potential difference between the output and the Vdd/2 power supply, which lets the pass gate conduct better than the transistors M1–M4 in figure 4.5b do.
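Under the idealized assumption that the two outputs move symmetrically (one rising from 0 toward Vdd/2 while the other falls from Vdd toward Vdd/2), the factor of two holds at every point of the half-transition, since |Vdd − 2v| = 2|Vdd/2 − v|. The sketch below checks this numerically; it only compares voltages, not currents, which also depend on sizing and threshold voltages.

```python
def pass_gate_voltage(v_out: float, vdd: float) -> float:
    """Voltage across the passgate between Vout and its complement.

    Assumes ideal symmetry: the complementary output sits at vdd - v_out.
    """
    return abs((vdd - v_out) - v_out)


def half_supply_voltage(v_out: float, vdd: float) -> float:
    """Voltage between an output and the Vdd/2 supply."""
    return abs(vdd / 2 - v_out)


if __name__ == "__main__":
    vdd = 0.8  # one of the supply voltages used in the simulations
    for v_out in (0.0, 0.1, 0.2, 0.3, 0.39):
        vpg = pass_gate_voltage(v_out, vdd)
        vhalf = half_supply_voltage(v_out, vdd)
        print(f"Vout = {v_out:.2f} V: passgate {vpg:.2f} V, "
              f"Vdd/2 path {vhalf:.2f} V, ratio {vpg / vhalf:.1f}")
```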

The control signals c1–c3 and their inverses are created from the inputs V1 and V2 through

$$c_1 = \overline{V_1}\cdot\overline{V_2}, \qquad c_2 = V_1 \oplus V_2, \qquad c_3 = V_1 \cdot V_2 \tag{4.1}$$
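A truth-table sketch of equation (4.1), assuming that c1 is the both-low term selecting the 0 level and c3 the both-high term selecting Vdd (the overbars are lost in the source, so this assignment is an assumption). Either way, exactly one control signal is active for every input combination, and stepping through one clock period shows that each half-transition changes two of c1–c3, and hence four of the six signals when the inverses are counted, as noted in section 5.5.

```python
from itertools import product

# Hypothetical mapping of the one-hot control signals to output levels:
# c1 -> 0, c2 -> Vdd/2, c3 -> Vdd.
LEVELS = ("0", "Vdd/2", "Vdd")


def control_signals(v1: int, v2: int) -> tuple[int, int, int]:
    """Control signals per equation (4.1) for logic inputs v1, v2 in {0, 1}."""
    c1 = int(not v1 and not v2)  # both inputs low
    c2 = v1 ^ v2                 # inputs differ
    c3 = int(v1 and v2)          # both inputs high
    return c1, c2, c3


def toggles(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Signals that change between two states, counting c1..c3 and their inverses."""
    return 2 * sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Exactly one control signal is active for every input combination
    for v1, v2 in product((0, 1), repeat=2):
        c = control_signals(v1, v2)
        assert sum(c) == 1
        print(v1, v2, c, LEVELS[c.index(1)])

    # One clock period with V1 leading V2: four half-transitions
    sequence = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    states = [control_signals(*s) for s in sequence]
    for prev, nxt in zip(states, states[1:]):
        print(LEVELS[prev.index(1)], "->", LEVELS[nxt.index(1)],
              "toggling signals:", toggles(prev, nxt))
```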

Since the timing is somewhat sensitive here, the single-ended to differential buffer used to obtain the inverses of the inputs uses an extra 1.5fF capacitance to delay the shorter branch. Similar buffers are used to create c̄1–c̄3, although in some cases it turned out to be easier to get good timing by placing a permanently conducting (clamped-open) pass gate in the shorter branch instead of the capacitance. Such a solution is also beneficial for power consumption, though it cannot always obtain the desired delay.
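The delay added by such a capacitance can be estimated to first order by treating the loaded node as a single-pole RC circuit, whose 50% delay is ln(2)·R·C. The sketch below uses a hypothetical 10kΩ effective drive resistance; it ignores the input slope and the nonlinear transistor behavior, so it only indicates the order of magnitude of the delay that a few femtofarads buy.

```python
import math

LN2 = math.log(2)  # 50% crossing of a single-pole RC step response


def added_delay(r_drive: float, c_extra: float) -> float:
    """First-order extra 50% delay (s) from loading a node with c_extra (F)."""
    return LN2 * r_drive * c_extra


def capacitance_for_delay(r_drive: float, delay: float) -> float:
    """Extra capacitance (F) needed to add `delay` seconds to a node driven through r_drive."""
    return delay / (LN2 * r_drive)


if __name__ == "__main__":
    r_on = 10e3  # hypothetical effective drive resistance [ohm]
    print(f"1.5 fF adds about {added_delay(r_on, 1.5e-15) * 1e12:.1f} ps")
    print(f"10 ps needs about {capacitance_for_delay(r_on, 10e-12) * 1e15:.2f} fF")
```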

A noteworthy aspect of making the driver differential, as in the given designs, is that it automatically acquires some properties of a charge-recovery system: the half-transitions to Vdd/2 are obtained by redistributing charge between the two clock distribution networks. Ideally, then, the Vdd/2 voltage source should not have to supply any significant power, though mismatches in the design may make it less straightforward to eliminate this power supply completely. Nevertheless, a solution similar to that in Fritzin et al. [2012a], where this power supply is replaced by a (large) capacitance, might be of interest.
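An idealized energy estimate illustrates the charge-recovery argument. Assuming two equal, purely capacitive clock networks of capacitance C (a hypothetical 1pF here), sharing charge between a network at 0 and one at Vdd leaves both at Vdd/2, so only the Vdd/2 → Vdd step draws charge from the Vdd supply. The estimate ignores the logic stage, short-circuit currents, leakage, and mismatch; the last one shows up as a shared voltage that is not exactly Vdd/2.

```python
def share_charge(c_a: float, v_a: float, c_b: float, v_b: float) -> float:
    """Final voltage when two capacitors are connected (charge conservation)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def supply_energy_square(c: float, vdd: float) -> float:
    """Energy per cycle drawn from Vdd by a differential square-wave driver.

    Each of the two networks is charged from 0 to vdd once per cycle: c*vdd**2 each.
    """
    return 2 * c * vdd**2


def supply_energy_two_step_ideal(c: float, vdd: float) -> float:
    """Energy per cycle drawn from Vdd by an ideal differential two-step driver.

    0 -> vdd/2 comes from charge sharing with the complementary network, so only
    the vdd/2 -> vdd step draws charge c*vdd/2 from the vdd supply, per network.
    """
    return 2 * c * (vdd / 2) * vdd


if __name__ == "__main__":
    c, vdd = 1e-12, 0.8  # hypothetical 1 pF per clock network
    print("shared voltage, equal networks:", share_charge(c, 0.0, c, vdd), "V")
    print("shared voltage, 10% mismatch:", round(share_charge(1.1 * c, 0.0, c, vdd), 3), "V")
    ratio = supply_energy_two_step_ideal(c, vdd) / supply_energy_square(c, vdd)
    print("ideal energy ratio, two-step / square wave:", ratio)
```

In this idealized picture the driving stage draws half the supply energy of the square wave driver; the results in chapter 5 show how much of this survives once the logic stage is included.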

Figure 4.4: Typical shape of a single-ended to double-ended buffer. The capacitance is tuned individually for each instance, but is typically a few femtofarads.

An alternative design

A third option, which was not investigated in any detail, is suggested in figure 4.6. It has the benefit of lacking a dedicated logic stage, as the inputs V1 and V2 (together with their inverses) are sufficient as control signals. However, it has the potential problem that the Vdd/2 level needs to pass through two passgates before it arrives at the outputs. Since the overdrive voltage in this case is already very low, there is a risk that the transmission gates would need to be made very wide in order not to increase τrl and τfh too much. These are defined as “rise time at low level” and “fall time at high level”, respectively, as in figure 3.4. Thus there might also be a requirement to buffer V1 and V2 fairly strongly. One potential way of improving the restricted current from the Vdd/2 supply might be to introduce capacitors between the nodes va and vb, as well as between vc and vd, as each of these pairs of nodes should at all times have a potential …

Figure 4.5: Two-step clock drivers, with (“P” design) and without (“NP” design) a passgate between the outputs.

Figure 4.6: Two-step clock driver, alternative design. Not evaluated in this work.

5 Results

5.1 Wave form

The clock signals obtained from simulation of the two-step drivers differ in a few respects from the ideal wave form, as seen in figure 5.2. Among other things, the plateau levels do not quite reach Vdd/2 within a reasonable time; the voltage levels out too early. Measurements on the designed drivers yielded offsets of up to 10% of Vdd (typically 2%–5%) when measuring the plateau voltage halfway between the two rising (falling) transitions. Reducing the clock frequency reduces this offset. The plateau offset also makes it harder to define the rise and fall times of the half-transitions. While a straightforward 10%–90% rise time of, e.g., the transition 0 → Vdd/2 (i.e., a 0.05Vdd → 0.45Vdd transition) is easily measured, it is not certain that the plateau ever reaches 0.45Vdd before the next half-transition begins. Thus, the rise and fall times in table A.3 are instead defined as the 10%–90% rise time to the obtained plateau voltage, as measured in the middle of the time interval trr (and similarly for the falling transitions).
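The plateau-relative rise-time definition can be sketched as follows. The window for each half-step is assumed to run from its own start to the start of the next half-transition, and the plateau voltage is sampled in the middle of that window; this reading of the interval trr is an assumption. The example waveform is synthetic: a linear ramp that levels out at 0.36V (0.45Vdd for Vdd = 0.8V) instead of at Vdd/2.

```python
import numpy as np


def first_crossing(t: np.ndarray, v: np.ndarray, level: float) -> float:
    """Time at which a rising waveform first reaches `level` (linear interpolation)."""
    above = np.nonzero(v >= level)[0]
    if above.size == 0:
        raise ValueError("level never reached")
    i = above[0]
    if i == 0:
        return float(t[0])
    return float(t[i - 1] + (level - v[i - 1]) * (t[i] - t[i - 1]) / (v[i] - v[i - 1]))


def plateau_rise_time(t, v, t_start, t_next, v_start=0.0):
    """10%-90% rise time of a half-step, relative to its measured plateau voltage.

    The plateau voltage is sampled midway between the start of this half-step
    (t_start) and the start of the next half-transition (t_next).
    Returns (rise_time, plateau_voltage).
    """
    v_plateau = float(np.interp(0.5 * (t_start + t_next), t, v))
    step = v_plateau - v_start
    window = (t >= t_start) & (t <= t_next)
    tw, vw = t[window], v[window]
    t10 = first_crossing(tw, vw, v_start + 0.1 * step)
    t90 = first_crossing(tw, vw, v_start + 0.9 * step)
    return t90 - t10, v_plateau


if __name__ == "__main__":
    t = np.linspace(0.0, 1e-9, 10001)             # 0.1 ps resolution
    v = 0.36 * np.clip(t / 100e-12, 0.0, 1.0)     # 100 ps ramp to a 0.36 V plateau
    rt, vp = plateau_rise_time(t, v, t_start=0.0, t_next=1e-9)
    print(f"plateau {vp:.3f} V, 10-90% rise time {rt * 1e12:.1f} ps")
```

A fixed 0.05Vdd → 0.45Vdd measurement would just barely reach its upper threshold for this waveform; the plateau-relative definition instead gives 80% of the 100ps ramp regardless of where the plateau settles.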

Further, depending on the precise timing relationships between the transitions of the various control signals, there can be significant feed-through to the output, visible as dips and peaks just before transitions. These can to some extent be reduced by carefully adjusting the timing between the control signals. Since the two-step clock does not have one well-defined notion of “pulse width”, figure 3.4 yields several numbers that are all, in some sense, pulse widths. In particular, one may look at the time intervals between successive half-transitions, for example measured at 25% and 75% of Vdd. According to the tables in section A.1.1, these deviations are at most on the order of 1%–4% of the clock period, and significantly lower at low clock frequency. According to Fritzin et al. [2012b], such deviations are related to the increase in the even harmonics of the spectrum, which is apparent for some of the drivers, as shown in figures A.2–A.4.

Figure 5.1: Definition of the rising plateau voltage Vr (sampled at (trl + trh)/2)

Figure 5.2: Example wave form, as simulated
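A sketch of how the intervals between successive half-transitions can be extracted from a waveform: lower half-transitions are timed at 0.25Vdd and upper ones at 0.75Vdd, and the intervals are compared with their ideal values. The ideal intervals assume a 50% duty cycle and a delay of T/6 between the two half-steps (so they alternate T/6 and T/3), and the waveform is assumed to start at 0 before its first rising half-transition. The test signal is a synthetic piecewise-linear two-step clock, not simulation data.

```python
import numpy as np


def crossings(t: np.ndarray, v: np.ndarray, level: float) -> list[float]:
    """All times where v crosses `level` (either direction), linearly interpolated."""
    above = v >= level
    idx = np.nonzero(above[:-1] != above[1:])[0]
    return [float(t[i] + (level - v[i]) * (t[i + 1] - t[i]) / (v[i + 1] - v[i])) for i in idx]


def pulse_width_deviation(t, v, vdd, period, shift):
    """Max deviation of the intervals between successive half-transitions, in % of T.

    Ideal intervals alternate `shift` and period/2 - shift (50% duty cycle assumed).
    """
    events = sorted(crossings(t, v, 0.25 * vdd) + crossings(t, v, 0.75 * vdd))
    intervals = np.diff(events)
    ideal = np.array([shift if k % 2 == 0 else period / 2 - shift
                      for k in range(len(intervals))])
    return float(np.max(np.abs(intervals - ideal)) / period * 100)


def ideal_two_step(t, vdd, period, shift, edge):
    """Synthetic piecewise-linear two-step clock with equal half-step edges."""
    h = period / 2
    tp = [0, edge, shift, shift + edge, h, h + edge, h + shift, h + shift + edge, period]
    vp = [0, vdd / 2, vdd / 2, vdd, vdd, vdd / 2, vdd / 2, 0, 0]
    return np.interp(t % period, tp, vp)


if __name__ == "__main__":
    period, vdd = 2e-9, 0.8
    shift = period / 6
    t = np.linspace(0.0, 3 * period, 60000, endpoint=False)
    v = ideal_two_step(t, vdd, period, shift, edge=20e-12)
    print(f"ideal: {pulse_width_deviation(t, v, vdd, period, shift):.4f} % of T")
    late = ideal_two_step(t, vdd, period, shift + 0.02 * period, edge=20e-12)
    print(f"second half-step 2% late: {pulse_width_deviation(t, late, vdd, period, shift):.2f} % of T")
```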

5.2 Supply noise

To evaluate the sensitivity to noise on the power supply side, the simulations were done both with and without an RCL link (figure 5.3) on each power supply connection, with R = 2Ω, L = 2nH, and C = 2pF. See table A.4.

Figure 5.3: Vout is connected to the power supply connections of each driver.
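As a rough orientation, the resonance of the RCL link can be computed from the stated values. The sketch assumes the topology suggested by figure 5.3 (R and L in series from the supply, C from Vout to ground) and ignores the current drawn by the driver, so it is a small-signal approximation only. With these values the resonance lies near 2.5GHz with Q ≈ 16; for T = 2ns, the fifth clock harmonic (2.5GHz) falls close to it.

```python
import math

R, L, C = 2.0, 2e-9, 2e-12  # RCL link values from the text (ohm, H, F)


def transfer_mag(f: float) -> float:
    """|Vout/Vin| of the RCL link at frequency f.

    Assumes series R and L from the supply, with C from Vout to ground,
    and neglects the driver's own current draw.
    """
    w = 2 * math.pi * f
    return abs(1 / complex(1 - w * w * L * C, w * R * C))


f0 = 1 / (2 * math.pi * math.sqrt(L * C))  # resonance frequency [Hz]
q = math.sqrt(L / C) / R                   # quality factor of the series RLC

if __name__ == "__main__":
    print(f"f0 = {f0 / 1e9:.2f} GHz, Q = {q:.1f}")
    for period_ns in (2, 4, 10):
        f_clk = 1e9 / period_ns
        gains = ", ".join(f"k={k}: {transfer_mag(k * f_clk):.2f}" for k in range(1, 7))
        print(f"T = {period_ns} ns -> {gains}")
```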


5.3 Harmonics

As expected, the improvements in the harmonics are concentrated at the third harmonic. For example, the architecture with the extra passgate, sized for fast transitions, decreased the power in the third harmonic by 29dB for T = 2ns, Vdd = 0.8V, and no power supply load. The versions sized for slower rise and fall times show a much more modest improvement of about 9dB. Cf. figures A.1a–A.1c.
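The goal of the timing can be illustrated with an idealized model. If the two-step wave is the average of two identical 50%-duty square waves offset by Δ, each harmonic k is scaled by (1 + e^(−j2πkΔ/T))/2, which vanishes for k = 3 when Δ = T/6. The sketch below computes harmonic amplitudes relative to the fundamental with an FFT, in the same normalization as figure A.1; an ideal square wave gives about −9.5dB for the third harmonic (amplitude ratio 1/3). Real drivers only approximate identical, ideally timed half-steps.

```python
import numpy as np

N = 1200  # samples per period; divisible by 2 and 6 so the offsets are exact


def square_wave(n: int = N) -> np.ndarray:
    """One period of an ideal 50%-duty square wave, normalized to Vdd = 1."""
    v = np.zeros(n)
    v[: n // 2] = 1.0
    return v


def two_step_wave(offset: int, n: int = N) -> np.ndarray:
    """Average of a square wave and a copy delayed by `offset` samples."""
    v1 = square_wave(n)
    return 0.5 * (v1 + np.roll(v1, offset))


def harmonic_ratio(v: np.ndarray, k: int) -> float:
    """Amplitude of harmonic k relative to the fundamental."""
    spectrum = np.abs(np.fft.rfft(v))
    return float(spectrum[k] / spectrum[1])


def to_db(ratio: float, floor: float = 1e-15) -> float:
    """Amplitude ratio in dB, clamped so an exact zero does not give -inf."""
    return float(20 * np.log10(max(ratio, floor)))


if __name__ == "__main__":
    for name, v in (("square wave", square_wave()),
                    ("two-step, offset T/6", two_step_wave(N // 6))):
        ratios = ", ".join(f"k={k}: {to_db(harmonic_ratio(v, k)):.1f} dB" for k in range(2, 7))
        print(f"{name}: {ratios}")
```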

The other harmonics (k = 2, 4, 5, 6) behave less uniformly. Certain combinations of supply voltage, supply power loading, and frequency can cause the two-step drivers to have larger even harmonics than the square wave driver. See figures A.2–A.4.
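Section 5.1 relates pulse-width deviations to increased even harmonics (Fritzin et al. [2012b]). In the same idealized model as before, a two-step wave built from two 50%-duty half-steps has no even harmonics, whereas making the second half-step's high time slightly shorter (here by a hypothetical 1% of the period) introduces them. The sketch compares the two cases.

```python
import numpy as np

N = 1200  # samples per period


def pulse(width: int, delay: int = 0, n: int = N) -> np.ndarray:
    """One period of a rectangular pulse `width` samples wide, delayed by `delay` samples."""
    v = np.zeros(n)
    v[:width] = 1.0
    return np.roll(v, delay)


def two_step(width1: int, width2: int, shift: int, n: int = N) -> np.ndarray:
    """Two-step clock from two pulses; unequal widths model a pulse-width error."""
    return 0.5 * (pulse(width1, 0, n) + pulse(width2, shift, n))


def even_harmonics(v: np.ndarray, ks=(2, 4, 6)) -> dict[int, float]:
    """Amplitudes of the even harmonics relative to the fundamental."""
    spectrum = np.abs(np.fft.rfft(v))
    return {k: float(spectrum[k] / spectrum[1]) for k in ks}


if __name__ == "__main__":
    ideal = two_step(N // 2, N // 2, N // 6)
    distorted = two_step(N // 2, N // 2 - N // 100, N // 6)  # hypothetical 1% error
    for name, v in (("ideal", ideal), ("1% pulse-width error", distorted)):
        ratios = even_harmonics(v)
        print(name, {k: round(20 * np.log10(max(r, 1e-15)), 1) for k, r in ratios.items()})
```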

5.4 Power

Increases in clock frequency come at a significant cost in power consumption. As seen in figures A.7–A.9, the logic that creates the control signals for the two-step drivers draws much more power than is used in the square wave driver, in particular when aiming for faster rising and falling transitions. On the other hand, the driving stage itself does consume less power than that of the square wave driver, at least when the clock frequency drops and the rise and fall times increase.

5.5 Conclusions and future work

For the given driver designs and load, the logic needed to control the two-step drivers is typically so large that the power savings in the driver are completely lost. Certain drivers designed for higher frequencies failed to achieve any power savings compared to the square wave driver, even before accounting for the consumption of the extra logic stage. By running at a low clock frequency and letting the rise and fall times increase, there are some possibilities for modest gains in power. However, increased rise and fall times also reduce the achievable improvement in the third harmonic.
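To make the trade-off concrete, the sketch below adds the driving-stage and logic-stage rms power from the tables in appendix A.4 at Vdd = 0.8V and divides by the square wave driver's sum. Adder power is left out, and the numbers are copied from the tables as printed, in the tables' units.

```python
# Driving-stage and logic-stage rms power from appendix A.4, Vdd = 0.8 V.
# Tuple order: T = 2 ns, 4 ns, 10 ns. Adder power is not included.
DRIVER = {
    "SQ": (0.74, 0.35, 0.14),
    "FNP": (1.71, 0.57, 0.14),
    "FP": (1.37, 0.47, 0.15),
    "SNP": (0.76, 0.28, 0.095),
    "SP": (0.71, 0.27, 0.095),
}
LOGIC = {
    "SQ": (0.14, 0.044, 0.013),
    "FNP": (2.6, 0.62, 0.12),
    "FP": (1.6, 0.45, 0.11),
    "SNP": (0.78, 0.23, 0.053),
    "SP": (0.61, 0.21, 0.054),
}
PERIODS_NS = (2, 4, 10)


def relative_to_square(design: str) -> list[float]:
    """(driver + logic) power of `design` divided by that of the square wave driver."""
    ref = [d + lg for d, lg in zip(DRIVER["SQ"], LOGIC["SQ"])]
    own = [d + lg for d, lg in zip(DRIVER[design], LOGIC[design])]
    return [o / r for o, r in zip(own, ref)]


if __name__ == "__main__":
    for name in ("FNP", "FP", "SNP", "SP"):
        ratios = relative_to_square(name)
        print(name, ", ".join(f"T={p} ns: {r:.2f}" for p, r in zip(PERIODS_NS, ratios)))
```

Only the slow sizings at T = 10ns end up slightly below 1.0 (about 0.97), while FNP at T = 2ns is close to five times the square wave driver's driver-plus-logic power.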

More work is thus needed to determine proper sizing rules for the different transistors of these drivers, in particular the relationship between the Vdd-connected transistors, the Vdd/2-connected transistors, and the passgate between the clock networks (if present), and how these affect the power/harmonics trade-off. Further, a deeper analysis is needed of the timing constraints on the control signals of the driver: what should their rise and fall times be, and what timing skew between two signals is acceptable? Each half-transition of the clock means that four of the six control signals transition at the same time, so there are quite a few parts that need to line up.

… are for relaxing the rise and fall times to the Vdd/2 level. Since the transistors involved in these transitions tended to get large when the transition times τrl and τrh were kept roughly equal, one could consider an alternative wave form in which these transition times are relaxed to a very large degree: the only restriction actually needed would be that the output is approximately Vdd/2 after the time T/6, and that the subsequent rise time to Vdd (and fall time to 0, respectively) is limited to some reasonable time. Such a waveform would in a way be the opposite of the multi-segment clock described in Mesgarzadeh et al. [2011], in that it starts slowly and then transitions fast.

This work used a fairly large, perhaps too large, load for the driver. The original intention was to also investigate drivers designed to drive a single adder, but there was not enough time to do so.

Additionally, the alternative design presented in figure 4.6 has not yet been investigated. On the one hand, the two consecutive passgates might reduce the maximum achievable speed; on the other hand, the lack of a dedicated logic stage (beyond two single-ended to dual-ended buffers) might conceivably reduce the power consumption.

The power consumption of the adders was very even between the different designs. In the worst case, it increased by 15% compared to the square wave. Some cases even gave slight reductions in the adders' power consumption, though it is not clear how that came about.

A Graphs and tables

A.1 Wave form, timing

A.1.1 Pulse width deviations and jitter

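| Design | 2ns, 0.7V | 2ns, 0.8V | 2ns, 0.9V | 4ns, 0.7V | 4ns, 0.8V | 4ns, 0.9V | 10ns, 0.7V | 10ns, 0.8V | 10ns, 0.9V |
|---|---|---|---|---|---|---|---|---|---|
| FNP | 4 | 1 | 2 | 3 | 2 | 2 | 0.2 | 0.2 | 0.2 |
| FP | 3 | 2 | 1 | 3 | 2 | 2 | 0.2 | 0.2 | 0.1 |
| SNP | 6 | 3 | 4 | 6 | 4 | 3 | 0.7 | 0.8 | 0.6 |
| SP | 7 | 4 | 4 | 6 | 4 | 3 | 0.7 | 0.7 | 0.6 |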

Table A.1: Maximum deviation from the ideal pulse widths, expressed as percentages of the clock period, and averaged over 60 clock cycles. Power supply noise is included.

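| Design | 2ns, 0.7V | 2ns, 0.8V | 2ns, 0.9V | 4ns, 0.7V | 4ns, 0.8V | 4ns, 0.9V | 10ns, 0.7V | 10ns, 0.8V | 10ns, 0.9V |
|---|---|---|---|---|---|---|---|---|---|
| FNP | 0.3 | 0.2 | 0.2 | 0.1 | 0.1 | 0.1 | 0.02 | 0.02 | 0.01 |
| FP | 0.4 | 0.2 | 0.3 | 0.1 | 0.1 | 0.1 | 0.02 | 0.01 | 0.01 |
| SNP | 1.7 | 0.3 | 0.4 | 0.4 | 0.4 | 0.4 | 0.05 | 0.03 | 0.02 |
| SP | 1.6 | 0.3 | 0.4 | 0.4 | 0.3 | 0.4 | 0.06 | 0.03 | 0.03 |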

Table A.2: Difference between the largest and smallest measured deviation in pulse width over 60 clock cycles, expressed as a percentage of the clock period. Power supply noise is included.


A.1.2 Rise and fall times

| Design | 2ns, 0.7V | 2ns, 0.8V | 2ns, 0.9V | 4ns, 0.7V | 4ns, 0.8V | 4ns, 0.9V | 10ns, 0.7V | 10ns, 0.8V | 10ns, 0.9V |
|---|---|---|---|---|---|---|---|---|---|
| SQ | 14 | 10 | 8 | 14 | 10 | 8 | 14 | 9 | 7 |
| FNP | 8+8 | 5+5 | 4+5 | 6+8 | 4+5 | 3+4 | 5+7 | 4+5 | 2+4 |
| FP | 7+9 | 5+6 | 4+4 | 6+8 | 4+5 | 3+4 | 5+7 | 4+5 | 2+4 |
| SNP | 10+16 | 7+11 | 5+9 | 8+15 | 6+10 | 5+8 | 7+15 | 6+10 | 4+8 |
| SP | 10+15 | 7+11 | 5+8 | 8+17 | 6+10 | 4+8 | 7+15 | 6+10 | 4+9 |

(a) Without power supply noise

| Design | 2ns, 0.7V | 2ns, 0.8V | 2ns, 0.9V | 4ns, 0.7V | 4ns, 0.8V | 4ns, 0.9V | 10ns, 0.7V | 10ns, 0.8V | 10ns, 0.9V |
|---|---|---|---|---|---|---|---|---|---|
| SQ | 14 | 10 | 8 | 14 | 10 | 8 | 14 | 10 | 8 |
| FNP | 7+9 | 6+10 | 7+10 | 6+7 | 5+4 | 2+3 | 5+7 | 4+5 | 2+4 |
| FP | 7+7 | 4+6 | 3+5 | 6+8 | 4+5 | 3+4 | 5+7 | 4+5 | 2+4 |
| SNP | 8+15 | 8+11 | 8+8 | 8+8 | 6+10 | 5+9 | 7+15 | 6+10 | 5+8 |
| SP | 10+15 | 9+10 | 8+7 | 8+14 | 6+10 | 4+9 | 7+15 | 6+10 | 4+8 |

(b) With power supply noise

Table A.3: Average time, over 60 clock cycles, of the transition to the Vdd/2 level versus the transition from the Vdd/2 level (entries x+y). See section 5.1 for details about the definition of rise and fall times. The larger of the rise time and the fall time is listed. All data are expressed as percentages of the clock period.

A.2 Supply noise

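| Design | 2ns, adder | 2ns, driver | 4ns, adder | 4ns, driver | 10ns, adder | 10ns, driver |
|---|---|---|---|---|---|---|
| SQ | 80 | 150 | 65 | 80 | 60 | 20 |
| FNP | 50 | 370 | 70 | 240 | 60 | 90 |
| FP | 40 | 300 | 70 | 200 | 70 | 100 |
| SNP | 25 | 200 | 65 | 130 | 45 | 45 |
| SP | 25 | 200 | 65 | 110 | 45 | 50 |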

Table A.4: Peak-to-peak supply noise for the drivers, assuming Vdd = 0.8V.


A.3 Harmonics

A.3.1 3rd harmonic

[Plot panels (a)–(c): power of the 3rd harmonic, Power [dB], versus Vdd (0.7–0.9V) for T = 2ns, 4ns, and 10ns, respectively, each with no power supply load and with RCL load.]

(d) Improvement of the 3rd harmonic [dB] relative to the square wave driver:

| Architecture | 2ns | 4ns | 10ns | 2ns, with power supply noise | 4ns, with power supply noise | 10ns, with power supply noise |
|---|---|---|---|---|---|---|
| FP | 29 | 15 | 20 | 15 | 14 | 19 |
| FNP | 21 | 16 | 17 | 27 | 14 | 17 |
| SP | 8.7 | 9.8 | 11 | 8.4 | 9.4 | 11 |
| SNP | 10 | 10 | 12 | 9.8 | 9.6 | 12 |

Figure A.1: A comparison of the 3rd harmonic for the different architectures. All numbers have been normalized with respect to the amplitude of the fundamental frequency and are compared to the output of the square wave driver. The table is valid for Vdd = 0.8V.

Legend: Circles: square wave. Plus: fast, with passgate (FP). Star: fast, no passgate (FNP). Squares: slow, with passgate (SP). Diamond: slow, no passgate (SNP).

A.3.2 Power spectrum

[Plot panels: Power [dB] versus harmonic number (1–6) for T = 2ns and Vdd = 0.7V, 0.8V, and 0.9V, each with no power supply load and with RCL load. Curves: square; fast, passgate; fast, no passgate; slow, passgate; slow, no passgate.]

Figure A.2: Power spectrum for the T = 2ns designs.

[Plot panels: Power [dB] versus harmonic number (1–6) for T = 4ns and Vdd = 0.7V, 0.8V, and 0.9V, each with no power supply load and with RCL load. Curves: square; fast, passgate; fast, no passgate; slow, passgate; slow, no passgate.]

Figure A.3: Power spectrum for the T = 4ns designs.

[Plot panels: Power [dB] versus harmonic number (1–6) for T = 10ns and Vdd = 0.7V, 0.8V, and 0.9V, each with no power supply load and with RCL load. Curves: square; fast, passgate; fast, no passgate; slow, passgate; slow, no passgate.]

Figure A.4: Power spectrum for the T = 10ns designs.


A.4 Power

A.4.1 Static power consumption

[Plot “Current through one adder”: current I [nA] (0–4000nA) versus clock voltage Vclk (0–0.9V).]

Figure A.6: Average current through one adder when the clock input is held at a few fixed voltages, for Vdd = 0.7V, 0.8V, and 0.9V.

A.4.2 Dynamic power

[Plot panels: rms power Prms [mW] (0–8mW) for the square wave driver and the two-step drivers (fast/slow, with/without passgate, PG), split into adder, driver stage, and logic stage, for Vdd = 0.7V and 0.9V, each with no power supply load and with RCL load.]

Figure A.7: Power consumption, T = 2ns

[Plot panels: rms power Prms [mW] (0–3mW) for the square wave driver and the two-step drivers, split into adder, driver stage, and logic stage, for different supply voltages and power supply loads.]

Figure A.8: Power consumption, T = 4ns

[Plot panels: rms power Prms [mW] (0–1mW) for the square wave driver and the two-step drivers, split into adder, driver stage, and logic stage, for different supply voltages and power supply loads.]

Figure A.9: Power consumption, T = 10ns

| Design | 2ns, 0.7V | 2ns, 0.8V | 2ns, 0.9V | 4ns, 0.7V | 4ns, 0.8V | 4ns, 0.9V | 10ns, 0.7V | 10ns, 0.8V | 10ns, 0.9V |
|---|---|---|---|---|---|---|---|---|---|
| SQ | 0.46 | 0.74 | 1.1 | 0.22 | 0.35 | 0.51 | 0.088 | 0.14 | 0.20 |
| FNP | 1.1 | 1.71 | 2.5 | 0.35 | 0.57 | 0.83 | 0.093 | 0.14 | 0.21 |
| FP | 0.92 | 1.37 | 1.9 | 0.29 | 0.47 | 0.68 | 0.096 | 0.15 | 0.21 |
| SNP | 0.48 | 0.76 | 1.1 | 0.18 | 0.28 | 0.40 | 0.062 | 0.095 | 0.14 |
| SP | 0.45 | 0.71 | 1.0 | 0.17 | 0.27 | 0.38 | 0.062 | 0.095 | 0.14 |

(a) Power consumption (rms) for the driving stage of the drivers.

| Design | 2ns, 0.7V | 2ns, 0.8V | 2ns, 0.9V | 4ns, 0.7V | 4ns, 0.8V | 4ns, 0.9V | 10ns, 0.7V | 10ns, 0.8V | 10ns, 0.9V |
|---|---|---|---|---|---|---|---|---|---|
| SQ | 0.081 | 0.14 | 0.21 | 0.027 | 0.044 | 0.066 | 0.0077 | 0.013 | 0.019 |
| FNP | 1.5 | 2.6 | 3.9 | 0.37 | 0.62 | 0.96 | 0.073 | 0.12 | 0.18 |
| FP | 0.96 | 1.6 | 2.5 | 0.27 | 0.45 | 0.69 | 0.064 | 0.11 | 0.16 |
| SNP | 0.47 | 0.78 | 1.2 | 0.14 | 0.23 | 0.35 | 0.033 | 0.053 | 0.079 |
| SP | 0.37 | 0.61 | 0.94 | 0.13 | 0.21 | 0.32 | 0.033 | 0.054 | 0.081 |

(b) Power consumption (rms) for the logic stage of the drivers.

(c) Power consumption (rms) for the adders.

(d) Power consumption (rms) in total.


A.5 Sizing

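| T | Design | W1 | W2 | W3 | Wc1 | Wc2 |
|---|---|---|---|---|---|---|
| 2ns | FP | 20.8 | 97 | 18.5 | 1.5 | 10.5 |
| 2ns | FNP | 27.5 | 184 | – | 5.4 | 2.0 |
| 2ns | SP | 8.0 | 40 | 4.5 | 0.60 | 4.3 |
| 2ns | SNP | 8.0 | 53 | – | 0.57 | 5.3 |
| 4ns | FP | 8.0 | 40 | 5.0 | 0.57 | 4.25 |
| 4ns | FNP | 8.8 | 60 | – | 0.60 | 6.0 |
| 4ns | SP | 3.6 | 19 | 1.5 | 0.40 | 2.0 |
| 4ns | SNP | 3.5 | 22 | – | 0.40 | 2.2 |
| 10ns | FP | 2.9 | 15.5 | 1.60 | 0.21 | 1.60 |
| 10ns | FNP | 2.8 | 18.5 | – | 0.20 | 1.85 |
| 10ns | SP | 1.35 | 8.0 | 0.27 | 0.135 | 0.81 |
| 10ns | SNP | 1.35 | 8.0 | – | 0.135 | 0.80 |

Table A.6: W1 is the width of transistors M6 and M8 (see figure 4.5b), W2 the width of transistors M2 and M4, and W3 the width of the NMOS transistor in the passgate (when applicable). $W_{c1} = W_{c3} = \tfrac{1}{3}\overline{W}_{c1} = \tfrac{1}{3}\overline{W}_{c3}$ and $W_{c2} = \tfrac{1}{3}\overline{W}_{c2}$ are …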


Bibliography

S.C. Chan, K.L. Shepard, and P.J. Restle. Design of resonant global clock distributions. In Computer Design, 2003. Proceedings. 21st International Conference on, pages 248–253, Oct. 2003. doi: 10.1109/ICCD.2003.1240902. Cited on pages 3 and 4.

S.C. Chan, P.J. Restle, K.L. Shepard, N.K. James, and R.L. Franch. A 4.6 GHz resonant global clock distribution network. In Solid-State Circuits Conference, 2004. Digest of Technical Papers. ISSCC. 2004 IEEE International, volume 1, pages 342–343, Feb. 2004. doi: 10.1109/ISSCC.2004.1332734. Cited on page 3.

S.C. Chan, K.L. Shepard, and P.J. Restle. 1.1 to 1.6 GHz distributed differential oscillator global clock network. In Solid-State Circuits Conference, 2005. Digest of Technical Papers. ISSCC. 2005 IEEE International, volume 1, pages 518–519, Feb. 2005. doi: 10.1109/ISSCC.2005.1494097. Cited on pages 3 and 4.

A.J. Drake, K.J. Nowka, T.Y. Nguyen, J.L. Burns, and R.B. Brown. Resonant clocking using distributed parasitic capacitance. Solid-State Circuits, IEEE Journal of, 39(9):1520–1528, Sept. 2004. ISSN 0018-9200. doi: 10.1109/JSSC.2004.831435. Cited on pages 3 and 4.

J. Fritzin, B. Mesgarzadeh, and A. Alvandpour. A Class-D stage with harmonic suppression and DLL-based phase generation. In Circuits and Systems (MWSCAS), 2012 IEEE 55th International Midwest Symposium on, pages 45–48, Aug. 2012a. doi: 10.1109/MWSCAS.2012.6291953. Cited on pages 6 and 18.

J. Fritzin, C. Svensson, and A. Alvandpour. Design and analysis of a class-D stage with harmonic suppression. Circuits and Systems I: Regular Papers, IEEE Transactions on, 59(6):1178–1186, 2012b. Cited on page 22.

M. Hansson, B. Mesgarzadeh, and A. Alvandpour. 1.56 GHz On-chip Resonant Clocking in 130nm CMOS. In Custom Integrated Circuits Conference, 2006. CICC ’06. IEEE, pages 241–244, Sept. 2006. doi: 10.1109/CICC.2006.320947. Cited on pages 3 and 4.

… reduction of radiated emissions. In Electromagnetic Compatibility, 1994. Symposium Record. Compatibility in the Loop., IEEE International Symposium on, pages 227–231, Aug. 1994. doi: 10.1109/ISEMC.1994.385656. Cited on pages 3 and 4.

Jonghoon Kim, Dong Gun Kam, Pil Jung Jun, and Joungho Kim. Spread spectrum clock generator with delay cell array to reduce electromagnetic interference. Electromagnetic Compatibility, IEEE Transactions on, 47(4):908–920, Nov. 2005. ISSN 0018-9375. doi: 10.1109/TEMC.2005.859063. Cited on page 3.

B. Mesgarzadeh and A. Alvandpour. EMI reduction by resonant clock distribution networks. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, pages 977–980, June 2010. doi: 10.1109/ISCAS.2010.5537380. Cited on page 10.

B. Mesgarzadeh, I.E. Zadeh, and A. Alvandpour. A multi-segment clocking scheme to reduce on-chip EMI. In SOC Conference (SOCC), 2011 IEEE International, pages 251–255, Sept. 2011. doi: 10.1109/SOCC.2011.6085110. Cited on pages 3, 6, and 24.

F. O’Mahony, C.P. Yue, M.A. Horowitz, and S.S. Wong. A 10-GHz global clock distribution using coupled standing-wave oscillators. Solid-State Circuits, IEEE Journal of, 38(11):1813–1820, Nov. 2003. ISSN 0018-9200. doi: 10.1109/JSSC.2003.818299. Cited on pages 3 and 5.

D. Pandini and G. A. Repetto. Spectral analysis of the on-chip waveforms to generate guidelines for EMC-aware design. In Johan Vounckx, Nadine Azemard, and Philippe Maurine, editors, Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation, volume 4148 of Lecture Notes in Computer Science, pages 532–542. Springer Berlin Heidelberg, 2006. ISBN 978-3-540-39094-7. doi: 10.1007/11847083_52. URL http://dx.doi.org/10.1007/11847083_52. Cited on page 5.

J. M. Rabaey, A. Chandrakasan, and B. Nikolic. Digital integrated circuits: a design perspective. Prentice Hall, Upper Saddle River, N.J., 2nd edition, 2003. ISBN 0131207644. Cited on page 4.

B. Rebaud, M. Belleville, C. Bernard, M. Robert, P. Maurine, and N. Azemard. A comparative study of variability impact on static flip-flop timing characteristics. In Integrated Circuit Design and Technology and Tutorial, 2008. ICICDT 2008. IEEE International Conference on, pages 167–170. IEEE, 2008. Cited on page 17.

H.J.M. Veendrick. Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits. Solid-State Circuits, IEEE Journal of, 19(4):468–473, Aug. 1984. ISSN 0018-9200. doi: 10.1109/JSSC.1984.1052168. Cited on pages 3 and 5.

… new clock technology. Solid-State Circuits, IEEE Journal of, 36(11):1654–1665, Nov. 2001. ISSN 0018-9200. doi: 10.1109/4.962285. Cited on pages 3 and 5.



Copyright

The publishers will keep this document online on the Internet (or its possible replacement) for a period of 25 years from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for his/her own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/
