A comparison of circuit implementations from a security perspective


Master Thesis

Division of Electronic Devices

Department of Electrical Engineering

Linköping University

by

Timmy Sundström

LITH-ISY-EX--05/3698--SE

Supervisor:

Atila Alvandpour

Examiner:

Atila Alvandpour


Division, Department: Institutionen för systemteknik (Department of Electrical Engineering), 581 83 LINKÖPING

Date: 2005-05-26

Language: English

Report category: Examensarbete (Master thesis)

ISRN: LITH-ISY-EX--05/3698--SE

URL for electronic version: http://www.ep.liu.se/exjobb/isy/2005/3698/

Title (Swedish): En jämförelse av logik stilar ur ett säkerhetsperspektiv

Title: A comparison of circuit implementations from a security perspective

Author: Timmy Sundström

Abstract

In the late 90's, research showed that all circuit implementations were susceptible to power analysis and that this analysis could be used to extract secret information. Further research to counteract this new threat by adding countermeasures or modifying the underlying algorithm only seemed to slow down the attack.

There was no objective analysis of how different circuit implementations leak information and by what magnitude.

This thesis presents such an objective comparison of five different logic styles. The comparison results are based on simulations performed at transistor level and show that it is possible to implement circuits in a more secure and easier way than what has previously been suggested.


1 INTRODUCTION
1.1 Background
1.2 Aim of the thesis
1.3 Layout of the report

2 CRYPTOGRAPHY
2.1 Public and private key cryptography
2.2 Asymmetric cryptography
2.3 Symmetric cryptography
2.4 The Diffie-Hellman key exchange
2.5 Examples of symmetric and asymmetric cryptography
2.5.1 RSA
2.5.2 Elliptic curve cryptography
2.5.3 AES
2.6 Cryptography for embedded systems

3 POWER ANALYSIS
3.1 Power analysis attacks
3.2 SPA Attack
3.3 DPA Attack
3.3.1 How to perform a DPA attack
3.3.2 Improving the signal to noise ratio
3.4 Existing countermeasures for power analysis
3.4.1 Algorithmic countermeasures for SPA
3.4.2 Algorithmic countermeasures for DPA
3.4.3 Noise insertion
3.4.4 A hardware countermeasure
3.4.5 Using power independent logic style

4 CIRCUIT STYLES
4.1 Static CMOS
4.1.1 Timing of static CMOS
4.1.2 Information leakage in static CMOS
4.2 Dynamic CMOS
4.2.1 Timing in dynamic CMOS
4.2.2 Information leakage in dynamic CMOS
4.3 Differential Domino
4.4 Information leakage in differential domino circuits
4.5 CRSABL
4.6 DyCML

5 IMPLEMENTATION
5.1 Circuit comparison
5.2 Assumption
5.3 Test setup
5.4 Implementations
5.5 Static CMOS
5.6 Dynamic CMOS
5.7 Differential Domino
5.8 CRSABL
5.9 DyCML

6 COMPARISON
6.1 The standard deviation
6.2 Power consumption and delay

7 CONCLUSIONS
7.1 The different logic styles
7.2 What can be done to counteract the leakage
7.3 Security a trade-off?


1

INTRODUCTION

This master thesis presents a comparison of circuit technologies from a security perspective. We will see that security is a trade-off against performance. A discussion follows about the roles of different circuit styles in a system. Because some systems, if not properly implemented, leak information that could lead to the extraction of secure data, we want to minimize this leakage, and this thesis shows how this can be done.

1.1 Background

In 1998, Kocher et al. [1,2] presented an attack on secure systems based on statistical analysis. This provided a way to learn secret information stored in systems such as smart cards. The analysis is possible because information can be retrieved by studying the power consumption of a system. This thesis presents the information leakage of different circuit styles and what one can do to prevent it.

1.2 Aim of the thesis

The aim of this thesis is to provide a comparative analysis of circuits from a security perspective: how different logic styles leak information and how insecure they are in relation to each other. Finally we will see what we can do to minimize this leakage in an implementation.

1.3 Layout of the report

In this report we will first see how cryptography is used in different systems, so as to provide a basic understanding of how sensitive information is kept safe and of the strengths and weaknesses of this when implemented in a real system. After this the different power analysis attacks are presented, and we will see how leaked information is used to extract useful data, together with some existing countermeasures for these attacks. This is followed by a presentation of the logic styles used in the comparison and some comments about them. After this the test setup and the implementation of the circuits are shown together with the results and a discussion. In the end we will see the conclusions and what one can do to minimize this information leakage.


2

CRYPTOGRAPHY

Cryptography is the mathematical approach to keeping information secure. Encryption and decryption of a plaintext are done using a secret key, and the strength of the crypto lies in maintaining this key's secrecy. There exist various cryptographic schemes; we will start by classifying them and then give some examples of schemes in use today.

2.1 Public and private key cryptography

Encryption of a text can be done with either public or private key cryptography, also known as asymmetric and symmetric cryptography respectively. In a symmetric scheme the encryption and decryption are done with the same key, and privacy can only be guaranteed if this key remains secret. In asymmetric cryptography one key is used for encryption and another for decryption; therefore the encryption key can be made publicly available, and only the person with the secret decryption key is able to read the message.

2.2 Asymmetric cryptography

The strength of asymmetric cryptography is that the key used for encryption does not have to remain secret. Everyone who wishes to send information can do so without the risk of a third party picking up the secret message. The problem with asymmetric cryptos is that they often require complex mathematics, and encrypting a long message in an embedded system often takes a long time.

2.3 Symmetric cryptography

As opposed to asymmetric cryptos, implementing symmetric ones is often easy, and encrypting a message takes virtually no time compared to asymmetric encryption. The downside is that when both sender and receiver must use the same key for encryption and decryption, they have the problem of deciding which key to use, since the same key should not be used in more than one transaction. In order to prevent a third party from picking up this information exchange and learning the secret key, there has to exist a method for selecting a key and letting both parties share this information. Either the two parties must meet to decide upon the key, or they must use a key exchange scheme when communicating over a non-secure line. One such method for deciding upon a secret key is the Diffie-Hellman key exchange protocol. It uses asymmetric cryptography to decide upon the secret key, which can later be used as the private key for symmetric encryption.

2.4 The Diffie-Hellman key exchange

We assume that the two parties A and B are using the same asymmetric crypto function F that, given a plaintext P and a secret key k, returns the ciphertext C.

$$F(k, P) = C$$

A and B now each decide on a secret key, k_a and k_b, preferably random for each new transaction, and given a plaintext P each calculates its ciphertext C_x (C_a or C_b) and transmits it to the other party.

$$F(k_a, P) = C_a, \qquad F(k_b, P) = C_b$$

A then uses C_b and B uses C_a as a plaintext and encrypts it one more time, resulting in X. This value X is now the secret key which can be used for symmetric cryptography. A third party trying to listen to this exchange will only learn P and the intermediate values C_a and C_b. Given these it is computationally hard to derive the keys k_a and k_b and the final key X.

C_a ⇔ C_b

Figure 1: The Diffie-Hellman key exchange

This works under the assumption that

$$F(k_a, C_b) = X, \qquad F(k_b, C_a) = X$$

that is, that both parties arrive at the same value X. If this is not the case, we must have another representation of the exchange. This is not the definition of the Diffie-Hellman key exchange but an example of how it can be used in practice.

2.5 Examples of symmetric and asymmetric cryptography

2.5.1 RSA

One scheme used for public key cryptography is RSA. It is named after its inventors Rivest, Shamir, and Adleman. It is based on modular exponentiation, and its strength lies in the fact that it is hard to calculate the prime factors of a large integer. In order to remain reasonably safe it is recommended to use a key length of 1024 bits.

2.5.2 Elliptic curve cryptography

Another public key system is elliptic curve cryptography (ECC), proposed in [3] by N. Koblitz in 1987. It is more appropriate for use in embedded systems such as smart cards because of the increased security for a given key length compared to RSA. For ECC to remain as safe as RSA at 1024 bits, a key length of only 163 bits is required.

2.5.3 AES

In 2001 the Rijndael algorithm was chosen as the new Advanced Encryption Standard (AES) by NIST [4]. The AES crypto is a symmetric crypto that uses a variable key length of 128, 192 or 256 bits. It is a block cipher that encrypts 128 bits of plaintext at a time and uses several rounds of encryption. This could easily be implemented in an embedded system and is the standard cryptographic scheme used today.

2.6 Cryptography for embedded systems

In order to implement a secure cryptographic scheme on embedded systems such as smart cards we need to make sure that the system is capable of handling these ciphers. Using the ECC asymmetric crypto in the Diffie-Hellman key exchange and AES for symmetric encryption is a viable option for implementation in a smart card.
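The key exchange of section 2.4 can be illustrated with a small sketch. It assumes modular exponentiation as the commutative function F, which is the classical choice and not the elliptic curve variant suggested above; the modulus `p`, the public value `P` and the helper `random_key` are illustrative names only, and the parameters are not meant to be secure.

```python
import secrets

# Toy parameters, chosen only for illustration (not a secure configuration).
p = 2**127 - 1   # a Mersenne prime used as the public modulus
P = 5            # public "plaintext" known to everyone, including a listener


def F(k: int, m: int) -> int:
    """Assumed commutative crypto function: F(k, m) = m^k mod p."""
    return pow(m, k, p)


def random_key() -> int:
    """Fresh secret key for every transaction, in the range [2, p - 2]."""
    return 2 + secrets.randbelow(p - 3)


k_a, k_b = random_key(), random_key()   # secret keys of A and B
C_a, C_b = F(k_a, P), F(k_b, P)         # values sent over the open channel
X_a, X_b = F(k_a, C_b), F(k_b, C_a)     # computed privately by A and by B

# Both parties end up with the same X because (P^k_b)^k_a = (P^k_a)^k_b mod p.
assert X_a == X_b
```

A listener sees only `P`, `C_a` and `C_b`; in this sketch recovering `X` from those values would require solving a discrete logarithm.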


3

POWER ANALYSIS

A crypto system is ideally considered a black box that, given a plaintext and a secret key, outputs a ciphertext. The strength of the system lies in the fact that extracting the key from the output should not be possible within a reasonable amount of time. Unfortunately no implementation of a crypto system is ideal, and additional information from side-channel leakage will be available. Without careful design, a system whose strength should lie in keeping the key secret could easily be broken using this leakage. In [1,2] Kocher et al. present side-channel attacks (SCAs) based on simple and differential power analysis, SPA and DPA. Countermeasures to power analysis have been proposed on different levels of the design, and although these techniques have provided resistance against DPA, improved versions of the attack can still be used to break these crypto systems.

3.1 Power analysis attacks

This section provides an introduction to side-channel power analysis attacks and examples of countermeasures used for resistance against them. Power analysis attacks are based on the fact that the instantaneous power consumption of a system is correlated to its internal state. This information leakage could, for example, reveal the Hamming weight of a word being processed.

3.2 SPA Attack

The SPA attack is performed by directly observing the power consumption of a system. We take a simple crypto system as an example: in each round of the encryption, the action performed depends on a specific bit k_i of the key. If k_i is zero an addition is executed, and if the bit is one a multiplication is performed instead. Since a multiplication consumes more power than an addition, examining the power consumption of the system will give us information about this bit. All conditional execution that depends on the secret key can be extracted using SPA.
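The example system above can be mimicked with a few lines of code. This is only a sketch under stated assumptions: the relative power figures `ADD_POWER` and `MUL_POWER`, the bounded noise model and the function names are hypothetical and do not come from any measurement.

```python
import random

ADD_POWER = 1.0   # assumed relative power of an addition (hypothetical)
MUL_POWER = 3.0   # assumed relative power of a multiplication (hypothetical)


def power_trace(key_bits, noise=0.5, seed=0):
    """One power sample per round: addition if k_i = 0, multiplication if k_i = 1."""
    rng = random.Random(seed)
    return [(MUL_POWER if bit else ADD_POWER) + rng.uniform(-noise, noise)
            for bit in key_bits]


def spa_recover(trace):
    """Read each key bit directly from the trace using a midpoint threshold."""
    threshold = (ADD_POWER + MUL_POWER) / 2
    return [1 if sample > threshold else 0 for sample in trace]


key = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = spa_recover(power_trace(key))
assert recovered == key   # the key is read from a single trace
```

Because the noise is bounded below half the gap between the two operations, a single trace is sufficient here; no statistics are needed, which is what separates SPA from DPA.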

3.3 DPA Attack

In a normal hardware implementation the power consumption of a logic gate depends on its inputs. This small difference will not be directly visible in a power trace due to the interfering noise coming from other parts of the system running simultaneously. When looking at the difference averaged over a large number of traces, so that the uncorrelated noise is suppressed, the difference can be seen. This statistical approach is the basis of the differential power analysis attack.

3.3.1 How to perform a DPA attack

In order to successfully perform a DPA attack we need to sample the power consumption for N encryptions. Each run gives a power trace S_i[j], where j is the time of the sample and i identifies the power trace, ranging from 1 to N. A partitioning function D is then used to divide the power traces into two sets S₀ and S₁.

$$S_0 = \{ S_i[j] \mid D = 0 \}, \qquad S_1 = \{ S_i[j] \mid D = 1 \}$$

The function D should depend on the secret key as well as on known variables such as the plaintext or the ciphertext, depending on which is available. We now calculate the average of these two sets and the difference of the averages.

$$A_0[j] = \frac{1}{|S_0|} \sum_{S_i[j] \in S_0} S_i[j], \qquad A_1[j] = \frac{1}{|S_1|} \sum_{S_i[j] \in S_1} S_i[j]$$

$$T[j] = A_0[j] - A_1[j]$$

Since D is a function of the secret key, the difference T[j] will depend on this as well. If our guess of the secret key is correct the function T[j] will be characterized by spikes, while if the guess is incorrect the partitioning will be done at random and T[j] will not show anything but noise. The spikes in T[j] are the difference in power dissipation in the logic gates mentioned earlier.

3.3.2 Improving the signal to noise ratio

Depending on the system and which countermeasures are used, we might need to improve the DPA attack to successfully identify the correct key. In [5] Messerges et al. give several methods for improving the signal to noise ratio of the DPA attack.

3.4 Existing countermeasures for power analysis

Various countermeasures exist that try to prevent DPA attacks. Most can still be subjected to a successful DPA attack if the method of the attack is adapted to the specific countermeasure. The most promising countermeasure so far is the use of logic gates that try to eliminate the source that makes power analysis possible, by making the power dissipation independent of signal value and sequence.

3.4.1 Algorithmic countermeasures for SPA

Implementing a countermeasure against the SPA attack on the algorithmic level is easily done, usually at the cost of increased execution time and power consumption. Making the instructions executed and their order independent of the secret key will provide security against SPA. The only data dependence in the power consumption that then remains is that of the individual gates, and this small variation will drown in the noise caused by other parts of the system.

3.4.2 Algorithmic countermeasures for DPA

The reason DPA works is that the power dissipation is correlated to the secret key. A countermeasure on the algorithmic level would be to mask the secret key so that the same inputs give rise to different states within the system but still give the same output. This would randomize the correlation between a specific intermediate bit and the secret key, reducing the success rate of a DPA attack. Since the strength of a crypto system should lie in the secret key and not in keeping the algorithms secret, we should assume that the attacker has information about the countermeasures used and could choose a different partitioning function to sidestep the masking procedure and similar protections.
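The idea of masking can be sketched for the simplest possible operation, an XOR of plaintext and key. This is an illustration under assumptions: real masking schemes must also handle nonlinear operations, and the function name `masked_xor` is hypothetical.

```python
import secrets


def masked_xor(plaintext: int, key: int, bits: int = 8):
    """Compute plaintext XOR key while the datapath only sees a masked key."""
    mask = secrets.randbits(bits)            # fresh random mask per execution
    masked_key = key ^ mask                  # form of the key used internally
    intermediate = plaintext ^ masked_key    # internal state, differs per run
    result = intermediate ^ mask             # mask is removed at the output
    return result, intermediate


# Same inputs, same output, but the internal state changes from run to run.
for _ in range(5):
    result, intermediate = masked_xor(0x3C, 0xA5)
    assert result == 0x3C ^ 0xA5
```

An attacker who partitions the traces on a bit of `intermediate` now sees a randomized value, which is the effect described in the paragraph above.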

3.4.3 Noise insertion

Adding white noise to the power source will corrupt the power traces. Since the averaging method of the DPA attack eliminates this noise, this countermeasure will only increase the number of traces required for a successful attack.
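How quickly averaging removes white noise can be checked numerically. The sketch below assumes Gaussian noise with standard deviation `sigma`; the function name and the default parameters are illustrative. The residual noise after averaging N traces is expected to shrink roughly as sigma divided by the square root of N.

```python
import random
import statistics


def averaged_noise_std(n_traces: int, sigma: float = 1.0,
                       trials: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the mean of n_traces white-noise samples."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n_traces))
             for _ in range(trials)]
    return statistics.pstdev(means)


single = averaged_noise_std(1)      # close to sigma
hundred = averaged_noise_std(100)   # close to sigma / 10
```

Under this model, inserting noise that is ten times stronger would be compensated by collecting about one hundred times as many traces.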

3.4.4 A hardware countermeasure

In [6] Clavier et al. show that the hardware countermeasure known as random process interrupts (RPIs), which inserts random dummy instructions, is susceptible to an improved DPA attack called Sliding Window DPA. The RPI countermeasure is used to randomly spread out the spikes that should occur in the difference function T[j]. This is attacked by integrating the power trace over a window centered on the mean shift caused by the RPI, thereby reconstructing a difference signal. Using the RPI countermeasure will only increase the number of power traces N needed to successfully perform DPA.

3.4.5 Using power independent logic style

In [7,8,9] Tiri et al. introduce a new logic style designed to eliminate the source of the power consumption difference by making the power dissipation of the gates independent of the input value and sequence. If such a gate could be perfectly realized, the source of the DPA attacks would be removed, since there would not be any difference in power consumption to target.

3.5 An example of how to perform a power analysis

To perform a power analysis one needs to acquire power traces to have enough material for analysis. This is done by measuring the current flowing into the device, for example by connecting an oscilloscope, which samples the data at specific times, in series with the device and the power supply. Here are two such power traces, acquired by simulating the current of a half adder implemented in differential domino. The first trace is when one input is high and the other low, causing the sum to be high and the carry bit to be low, while in the other trace both inputs are high, which means the sum is low and the carry is high.


Figure 2: Two power traces of a half adder

When looking at these two traces it is hard to see any differences between them. In the beginning of the cycle the two peaks look a little bit different, but that is all. Power analysis extracts information from these small differences. If we look at a plot displaying the difference between the two power traces, these small variations become visible.

Figure 3: The difference between the two power traces

We see that the difference has distinct peaks, which means that the two power traces really are different. If one wants to extract information from a complete circuit, a specific gate must be targeted. When sampling the power one cannot make distinctions such as where the power was consumed. This is what the DPA attack takes care of. If we see this half adder as a part of a big system, the power consumption of this small part will be drowned by everything else going on at the same time. The two traces will be overlaid by the power consumed by all other parts. The reason the DPA attack still works is that the power consumption of all other parts is independent of the half adder. This means that when taking enough power traces into consideration and looking at the difference between the two sets, we will still have something that looks similar to the difference plot. That is, the distinct peaks will still be there. The hard part is to successfully divide all the power traces into two sets in which the difference has any meaning. This selection is usually based on the given outputs at the time of sampling and a guess of a secret key. If this guess is incorrect the selection of the two sets will be random and the difference plot will be flat. If on the other hand the guess is correct we will see a trace with distinct peaks, signifying that our guess was correct.

[Figures 2 and 3 plot current (mA) against time (ns) over a window of 0 to 10 ns.]
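The partitioning and averaging procedure described above can be sketched on simulated data. Everything in this example is an assumption made for illustration: the traces are synthetic Gaussian noise with one weakly data-dependent sample, the 4-bit S-box is borrowed from the PRESENT cipher only to have a nonlinear target, and the names `D`, `simulate_traces` and `dpa_attack` are hypothetical.

```python
import random

# 4-bit S-box from the PRESENT cipher, used here only as a convenient
# nonlinear function so that wrong key guesses give a poorly correlated split.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SAMPLES = 10   # samples per trace (index j)
LEAK_AT = 4    # sample index where the targeted gate switches (assumed)


def D(plaintext: int, key_guess: int) -> int:
    """Partitioning function: most significant bit of the S-box output."""
    return (SBOX[plaintext ^ key_guess] >> 3) & 1


def simulate_traces(secret_key: int, n: int, rng: random.Random):
    """Noisy traces in which one sample depends weakly on the target bit."""
    plaintexts, traces = [], []
    for _ in range(n):
        p = rng.randrange(16)
        trace = [rng.gauss(0.0, 2.0) for _ in range(SAMPLES)]  # rest of system
        trace[LEAK_AT] += 1.0 * D(p, secret_key)                # targeted gate
        plaintexts.append(p)
        traces.append(trace)
    return plaintexts, traces


def mean_trace(traces):
    return [sum(column) / len(traces) for column in zip(*traces)]


def difference_trace(plaintexts, traces, key_guess):
    """T[j] = A0[j] - A1[j] for the two sets selected by the key guess."""
    s0 = [t for p, t in zip(plaintexts, traces) if D(p, key_guess) == 0]
    s1 = [t for p, t in zip(plaintexts, traces) if D(p, key_guess) == 1]
    return [a0 - a1 for a0, a1 in zip(mean_trace(s0), mean_trace(s1))]


def dpa_attack(plaintexts, traces):
    """Return the key guess whose difference trace has the largest peak."""
    peaks = {k: max(abs(v) for v in difference_trace(plaintexts, traces, k))
             for k in range(16)}
    return max(peaks, key=peaks.get), peaks


rng = random.Random(2005)
plaintexts, traces = simulate_traces(secret_key=0xB, n=4000, rng=rng)
best_guess, peaks = dpa_attack(plaintexts, traces)
```

In this sketch the leaking sample has an amplitude of half the noise standard deviation, so it is not visible in a single trace, yet the correct guess stands out after averaging. For this particular S-box bit a wrong guess is expected to give at most about half the peak of the correct one.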


4

CIRCUIT STYLES

This chapter introduces the different logic styles that are used in the comparison. We present how they work and give an example of a simple circuit in each style. A short discussion of the pros and cons of each style follows the initial presentation.

4.1 Static CMOS

Static CMOS is the most basic circuit style used when designing circuits. It consists of a pull-up and a pull-down network (PUN and PDN respectively), of which one and only one conducts for any combination of input signals. These mutually exclusive networks provide a path to one rail in steady state.

Figure 4: Static CMOS structure (a PUN and a PDN driven by the inputs In_i, with output F(In_i))

If one should build a simple inverter circuit in static CMOS, it would look like this

Figure 5: Inverter in static CMOS

The advantage of static CMOS logic is its robustness and ease of use. It is reasonably good in speed and area compared to a general circuit style.

4.1.1 Timing of static CMOS

In static CMOS the timing of signals follows a simple scheme: at the beginning of the clock cycle the data changes value, and the change then ripples through the stages without the need of a clock to enable them.

Figure 6: Timing of static CMOS

4.1.2 Information leakage in static CMOS

Depending on the input values and sequence, the power consumption of a static CMOS gate varies. If there is no transition, nothing happens in the gate and there is no power consumption. But if there is a transition on the inputs such that the output changes value, the capacitance at the output node must either charge or discharge, causing a slight variation in power consumption between the two cases, and we can retrieve that information from the power profile.


Figure 7: Power consumption of different transitions (a 0-0 transition and a 0-1 transition)

The only time a static CMOS gate draws power is during a change of state, namely the 0-1 and 1-0 transitions on the output. During a transition the two rails are shorted for a short time and current flows from V_dd to ground. This current varies depending on which transition we have: if we are to charge the output node, more current must come from V_dd, and if we must discharge it, less current has to flow from V_dd. It is this small change that we can observe in the power consumption.
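The dependence on the transition can be captured in a first-order model. This sketch deliberately ignores the short-circuit current mentioned above and any leakage, and the load capacitance `C_LOAD` is a hypothetical value; only the supply voltage is taken from the process used in this thesis.

```python
VDD = 3.3          # supply voltage in volts
C_LOAD = 10e-15    # output node capacitance in farads (hypothetical value)


def supply_energy(prev_out: int, next_out: int) -> float:
    """Energy drawn from V_dd for one output transition (first-order model)."""
    if prev_out == 0 and next_out == 1:
        return C_LOAD * VDD ** 2   # the output node is charged from V_dd
    # A 1-0 transition discharges the node to ground, and 0-0 and 1-1 leave
    # it unchanged; in this model none of them draws charge from the supply.
    return 0.0


energies = {(a, b): supply_energy(a, b) for a in (0, 1) for b in (0, 1)}
```

In this model only the 0-1 transition is visible on the supply line, so an observer of the supply current can tell a charging event apart from every other case.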

4.2 Dynamic CMOS

In dynamic logic, one of the networks is replaced by a clocked precharge transistor, and an evaluation transistor is connected in series with the remaining network. This decreases the area and the evaluation time of the circuit compared to static CMOS. If we replace the pull-up network we get an n-type gate, and when replacing the pull-down network we get a p-type gate. There are two ways to cascade dynamic gates, using either Domino rules or NP rules. With Domino rules we use only n-type networks with a static inverter in between, while in the NP case one alternates between n-type and p-type gates. In this thesis both types will be used.

Figure 8: Dynamic CMOS structure

An example is the simple inverter, which in dynamic CMOS looks like this

Figure 9: Inverter in dynamic CMOS

The advantage of dynamic CMOS is that it is a very fast implementation style, but it is more sensitive to noise than static CMOS. One way to improve the robustness is to add a keeper, which provides a path to one of the rails at all times so that the charge does not dissipate or leak away.

Figure 10: Dynamic inverter with an added keeper

4.2.1 Timing in dynamic CMOS

When using dynamic CMOS the input signals are only allowed to make one transition during the evaluation phase. In an N-block the transition is a conditional 0-1 and in a P-block it is a 1-0 transition. This causes problems when we implement our ripple carry adder, since the inputs available might change when the carry becomes steady. This forces us to add a restoring circuit, so that this style does not follow the standard domino or NP rules.

4.2.2 Information leakage in dynamic CMOS

A dynamic gate works differently from a static CMOS gate. By adding a clock we get two intervals, precharge and evaluation. In precharge the gate output is set high (one) independent of the previous value. During evaluation the gate conditionally discharges depending on the input. If the output is low and the gate enters precharge, charge is transferred to the output capacitance from V_dd. The other case is when the gate enters evaluation and the output is going low; then the output node is discharged to ground. This movement of charge will be seen in a power analysis, and hence the internal state of the gate will be known.
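The precharge behaviour can be modelled in the same first-order way. This is a sketch for an n-type gate under assumptions: the capacitance value is hypothetical, and clock power and short-circuit currents are ignored.

```python
VDD = 3.3          # supply voltage in volts
C_LOAD = 10e-15    # output node capacitance in farads (hypothetical value)


def precharge_energy(evaluated_output: int) -> float:
    """Supply energy needed to precharge the output after an evaluation."""
    # Only a node that was discharged during evaluation has to be recharged.
    return C_LOAD * VDD ** 2 if evaluated_output == 0 else 0.0


def cycle_energies(evaluated_outputs):
    """Energy of each precharge phase that follows the given evaluations."""
    return [precharge_energy(out) for out in evaluated_outputs]


trace = cycle_energies([1, 0, 0, 1])
```

The energy drawn in each precharge phase therefore reveals the output of the preceding evaluation, which is the leakage described above.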

Figure 11: Power consumption at transition in a dynamic gate (a 1-1 transition at precharge and a 0-1 transition at precharge)

4.3 Differential Domino

Differential domino consists of two differential dynamic gates that are cross coupled to provide a stable output. The two pull-down networks are mutually exclusive, so that during evaluation one output node is always pulled down. When this happens the cross coupled inverter pair kicks in, so that when one node is pulled down the other is kept high, removing the need for keepers as in dynamic logic. For differential domino to work, both the input signals and their complements must be available. Since the output of a differential domino gate is both the logic function and its complement, this is no problem when cascading differential gates.

Figure 12: Structure of a differential domino gate

The differential domino logic is close to dynamic logic in speed. It is more robust but has increased area compared to domino implementations.

4.4 Information leakage in differential domino circuits

Since the dual ended logic can be seen as containing a normal gate and its complement, the internal transitions are independent of the input values and sequence. This is unfortunately only true in the ideal case: in order for the transitions to be identical, the two parts must be electrically identical. That is, the pull-down paths must have the same resistance and the same internal load capacitance at all nodes. Due to process variations and the fact that some functions are hard to implement identically to their complement, the differential domino circuit will also leak some information, although less than the previous logic styles.
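The effect of an imbalance between the two halves can be sketched with the same first-order energy model. The capacitance values and the five percent mismatch are hypothetical and only serve to show the principle.

```python
VDD = 3.3   # supply voltage in volts


def differential_cycle_energy(output: int, c_out: float, c_out_bar: float) -> float:
    """Energy to recharge whichever node was discharged during evaluation."""
    # output == 1 means that the complementary node was pulled down.
    discharged = c_out_bar if output == 1 else c_out
    return discharged * VDD ** 2


# Ideal gate: identical capacitances, so the energy does not depend on data.
matched = [differential_cycle_energy(o, 10e-15, 10e-15) for o in (0, 1)]

# Realistic gate: a small mismatch makes the energy data dependent.
mismatched = [differential_cycle_energy(o, 10e-15, 10.5e-15) for o in (0, 1)]
leak = abs(mismatched[0] - mismatched[1])
```

In the matched case the two energies are equal and nothing can be learned from them, while in the mismatched case the difference `leak` is what a power analysis would target.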

4.5 CRSABL

CRSABL (Charge Recycling Sense Amplifying Based Logic) is a circuit style developed by Tiri et al. in [9]. It is based on differential domino logic with certain modifications intended to lower the data dependency as well as the mean value of the power consumption. CRSABL utilizes a charge recycling scheme which uses the charge already stored at the output nodes to equalize them during precharge. The cross coupled PMOS pull-up pair will charge the nodes to a value of V_dd-V_th, which is less than V_dd, therefore using less power in precharge than differential domino. Certain restrictions also apply to the design of the differential pull-down network. All the internal nodes in the differential pull-down network must be connected to one output node for all input combinations. This guarantees that during a switching event the load capacitance has a constant value, making the two sides closer to electrically identical. When it comes to information leakage, the same applies to CRSABL as to differential domino: ideally the gates would consume the same power independent of input, but process variations and implementation cause the two networks to be slightly imbalanced, allowing information to be available through power analysis.

Figure 13: Structure of a CRSABL gate and the level restoring circuit

When cascading CRSABL gates we use the approach of differential domino and invert the outputs. But since the precharge phase does not fully charge the nodes to V_dd, we cannot use static inverters, since we would have static power consumption in those. This is solved by a level restoring logic on each output.

4.6 DyCML

DyCML, which stands for dynamic current mode logic, is a reduced swing logic that is based upon a combination of MOS current mode logic (MCML) and dynamic logic. It was developed by Allam et al. [10] and was never intended as a solution to the information leakage problem. Since it has a differential network that never provides a direct path between the two rails, it is still interesting to examine whether the power consumption is data dependent. By removing the resistive loads in an MCML gate, replacing them with clocked PMOS transistors and adding a cross coupled pair, we reach a setup which looks very much like differential domino. Instead of discharging the output nodes to ground, the DyCML logic has a virtual ground which is made up of an NMOS transistor connected as a capacitance. This virtual ground is discharged during precharge, and during evaluation one of the output nodes is connected to the virtual ground, equalizing the voltage at both nodes. Since this voltage will never be zero, the DyCML logic also has reduced swing.

Figure 14: Structure of a DyCML gate

The problem with using DyCML logic is twofold. First, it is very sensitive to noise: the low output node is floating during evaluation and the charge there may be disturbed by cross-talk or other noise sources. Second, cascading DyCML gates causes problems. Since we have reduced swing, the NMOS transistors in the pull-down network will be conducting all the time, either fully or partially. This will cause both output nodes to discharge to the virtual ground, causing the gate to malfunction. Due to this one is forced to use a complex clocking scheme in which the evaluation phase of the next gate starts after the previous one has stable outputs.

Figure 15: Three clocks in DyCML shown in relation to each other

5

IMPLEMENTATION

In order to compare how the different circuits leak information, we built one 8-bit ripple carry adder in a 0.35 µm, 3.3 V CMOS process for each of the chosen circuit styles. The general design was to build one half adder cell and one full adder cell and to connect them as follows.

Figure 16: Structure of an 8-bit ripple carry adder

5.1 Circuit comparison

The power consumption was studied for 1000 random input data combinations, and each power trace was sampled at 1000 regular points. With a clock period of 10 ns, this gives us a sampling period of 10 ps, or a sampling rate of 100 GHz. A normal implementation on a smart card usually runs at a speed of 10 MHz [7], and a state of the art oscilloscope is capable of 40Gb/s, which would give us a lower resolution than the one simulated at this sampling rate. The characteristics that were gathered for each logic style were the worst case delay, the average power consumption and the standard deviation of the power consumption.

5.2 Assumption

The main objective of the comparison is to measure the standard deviation of the power consumption. If the power consumption is data dependent, this will show as a larger standard deviation. If the deviation were zero along the entire power trace, this would mean that the power consumption is totally independent of data.

µm

(30)

The assumption that this comparison is based upon is that the standard deviation of the power consumption is a measure of how much or how easy information can be extracted. That is, any data dependency can be used to extract information and all information leak-age must be considered negative.
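As an illustration of this measure, the following sketch computes the mean trace and the standard deviation per sample point over a set of power traces. It is a minimal sketch and not the script used for the thesis: the function name, the toy data and the choice of the population standard deviation are assumptions made here.

```python
from statistics import mean, pstdev

def trace_statistics(traces):
    """Return (mean_trace, std_trace), computed per sample point.

    traces: list of power traces, one per input pattern, all of equal length.
    """
    if not traces:
        raise ValueError("at least one trace is required")
    length = len(traces[0])
    if any(len(trace) != length for trace in traces):
        raise ValueError("all traces must have the same number of samples")
    columns = list(zip(*traces))                    # one tuple per sample point
    mean_trace = [mean(col) for col in columns]
    std_trace = [pstdev(col) for col in columns]    # population std (assumption)
    return mean_trace, std_trace

# Toy data (made up): three traces of four samples each.
# Only the second sample point depends on the "data".
traces = [
    [1.0, 2.0, 5.0, 0.0],
    [1.0, 4.0, 5.0, 0.0],
    [1.0, 6.0, 5.0, 0.0],
]
mean_trace, std_trace = trace_statistics(traces)
print(mean_trace)
print(std_trace)
```

In the toy data the deviation is zero everywhere except at the second sample point, which is the kind of data dependent peak the comparison looks for.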

5.3 Test setup

When measuring the power we take the following parts of the test bench into account: the input drivers, the clock driver and the entire 8-bit ripple carry adder. To make the simulation as realistic as possible, the input signals come from gates of similar structure, and the outputs of the adder are connected to similar gates to provide realistic loads.

Figure 17: The test setup

5.4 Implementations

The following passages present the implementation of each logic style: how the ripple carry adder was designed, why the transistors are sized the way they are, and what problems the specific implementation had. Each implementation is followed by the simulations run on the circuit and their results. In all logic styles the structure of the ripple carry adder was the same: the first stage is a half adder, which is followed by seven full adders for a total of eight bits. A half adder takes two inputs A and B and outputs the sum and the carry, while a full adder also has a carry in as an input. The adders have the following truth table.

Table 1: Truth table for full adder (the first four rows equal a half adder)

A   B   Carry in   Sum   Carry out
0   0   0          0     0
1   0   0          1     0
0   1   0          1     0
1   1   0          0     1
0   0   1          1     0
1   0   1          0     1
0   1   1          0     1
1   1   1          1     1
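The truth table and the adder structure described above can be checked with a small behavioural model. This is only a logic-level sketch with function names chosen here; it says nothing about the transistor-level implementations that follow.

```python
from itertools import product

def half_adder(a, b):
    """Return (sum, carry_out) for two input bits."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for two input bits and a carry in."""
    total = a + b + carry_in
    return total & 1, total >> 1

def ripple_carry_add(a, b, width=8):
    """Add two unsigned `width`-bit integers stage by stage.

    Stage 0 is a half adder and the remaining stages are full adders,
    mirroring the structure described in the text. The result holds
    `width` sum bits plus the carry out of the last stage.
    """
    result = 0
    carry = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        if i == 0:
            s, carry = half_adder(bit_a, bit_b)
        else:
            s, carry = full_adder(bit_a, bit_b, carry)
        result |= s << i
    return result | (carry << width)

# Print the full adder truth table in the column order A, B, Carry in, Sum, Carry out.
for carry_in, b, a in product((0, 1), repeat=3):
    s, carry_out = full_adder(a, b, carry_in)
    print(a, b, carry_in, s, carry_out)
```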

5.5 Static CMOS

Implementation in static CMOS follows specific design rules which are easy to use. The design of the pull-up and pull-down networks comes from basic techniques such as rewriting the desired output function. In a ripple carry adder the carry chain is the vital part, and this is the part that has to be optimized for speed. In all logic styles the sizing is chosen to give the same fall and rise times as an inverter with the following sizes.

Figure 18: The minimum sized inverter

This is not the minimum size supported by the design rules, since this is a 0.35 µm process; the minimum sizes chosen are just a reference for all logic styles.

This is how the half adder was designed in static CMOS, with the transistor sizes shown next to the transistors. The size is the transistor width in µm, so a width of 1.2 corresponds to a W/L ratio of 1.2/0.35.

Figure 19: Half adder implemented in static CMOS

The first part of the half adder is the carry generation; the carry is then propagated to the next stage while at the same time being used to generate the sum. As a rule, all sizing should be done so as to get equal fall and rise times on the internal nodes as well as on the output. Looking at the last part before the sum output inverter, the pull-down and pull-up networks both satisfy the 1.2/3.6 minimum size, but the carry generation has to be sized up to compensate for a higher load in the next stage. The carry inverter is triple the size of the sum inverter, so the first networks were sized accordingly to drive it. The full adder has a similar structure, only with three inputs.

Figure 20: Full adder implementation in static CMOS

The same size was used for the two output inverters as in the half adder case, and all the networks in the circuit have to be sized accordingly.

Implementation in static CMOS was straightforward and no problems came up that needed special attention. Each adder was simulated separately and then connected according to the figure in the beginning of the chapter, with the carry output of one stage being the carry input of the next. This forms the 8-bit ripple carry adder that was simulated and whose properties were measured. Because we want to include the power needed to drive the inputs in the power consumption, the inverters that drive the 8-bit adder are included in the adder block. The test bench has 16 inputs, each passed through an inverter to make the signal more realistic, which are fed to the adder block. The outputs are also passed through inverters to simulate a realistic load.

Figure 21: The test bench setup

The 16 input bits are the eight bits each for inputs A and B, and the nine output bits are the eight sum bits and the carry output of the last stage. A typical sum output signal of one of the more significant bits could look like this.

Figure 22: An output of static CMOS under one clock cycle

At the beginning of the phase the inputs change value, causing the carry to ripple through the chain. Soon after the phase starts the output goes low, but once the correct carry input is available, that is, when it has rippled through to this stage, the output goes high again and remains steady. Not all output signals behave this way, only the ones that take on a false value before the correct carry is available as an input. Other output signals may make only the first transition and remain there, while others will not change at all. To extract the information used for the comparison with the other logic styles, the adder was simulated for 1000 cycles with the inputs randomly generated using Matlab. The current drawn from the power source was sampled at 1000 points at regular intervals for each of ...

... differences between this mean power trace and a random power trace, which can be seen in the standard deviation as a larger value.

Figure 24: The mean current and the standard deviation (dashed) of static CMOS

The maximum delay was measured using the worst case scenario, where the carry has to ripple through all stages. The power consumption is calculated by integrating the mean current multiplied by the supply voltage. The total transistor area is used as a comparison between the circuits; it is not the actual area the implementation would take on chip, but a guide as to what the ratio between them would probably be. The transistor area is calculated by adding together all transistor widths (one half adder and seven full adders) and multiplying by the transistor length. The resulting data for the static CMOS circuit style are given in Table 2.
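The two calculations described above can be sketched as follows. This is a hedged illustration rather than the original post-processing: the function names are invented here, the integral is approximated with the trapezoidal rule, and dividing by the trace duration to obtain an average power is an assumption about how the reported mW figures were derived.

```python
VDD_V = 3.3              # supply voltage (from the text)
GATE_LENGTH_UM = 0.35    # transistor length of the process (from the text)

def average_power_w(mean_current_a, sample_period_s, vdd_v=VDD_V):
    """Average power over the trace: Vdd times the time-averaged current.

    The current is integrated with the trapezoidal rule and divided by
    the trace duration (an assumption about how the integral was averaged).
    """
    if len(mean_current_a) < 2:
        raise ValueError("need at least two current samples")
    charge_c = sum(
        0.5 * (i0 + i1) * sample_period_s
        for i0, i1 in zip(mean_current_a, mean_current_a[1:])
    )
    duration_s = sample_period_s * (len(mean_current_a) - 1)
    return vdd_v * charge_c / duration_s

def transistor_area_um2(widths_um, length_um=GATE_LENGTH_UM):
    """Sum of all transistor widths multiplied by the transistor length."""
    return sum(widths_um) * length_um

# Made-up example: a constant 1 mA mean current and a single minimum inverter.
flat_trace_a = [1e-3] * 1001
print(average_power_w(flat_trace_a, 10e-12))   # in watts
print(transistor_area_um2([1.2, 3.6]))         # in square micrometres
```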

Table 2: Static CMOS properties

Delay (ns)   Power consumption (mW)   Transistor area (µm²)
1.81         1.51                     504

5.6 Dynamic CMOS

Implementing the adder in dynamic CMOS was a bit more troublesome than using standard static CMOS. First, a mixture of different dynamic design techniques was used: the carry chain can be seen as domino connected logic, while the carry to sum stages are connected using NP rules. Because the output of a dynamic gate can make only one transition during a clock cycle, the case where the carry arrives too late must be taken into consideration. Imagine a stage late in the chain where the carry input is precharged low. Both inputs A and B are low, which causes the sum to be low. This implementation precharges the sum high, so when the circuit starts to evaluate it will drive the sum low to the correct value. When the correct carry then ripples to this stage and is high, the output has already made its transition and cannot be pulled back high unless the dynamic structure is modified with restoring logic.

Figure 25: Implementation of a half adder in dynamic CMOS

Figure 26: The carry generation stage of a full adder in dynamic CMOS

Figure 27: Sum generation stage of a full adder in dynamic CMOS

In all three circuits (the latter two form the full adder) we see the special keeper structure within a dashed box. This feedback keeps the charge from leaking away: the node is dynamic, and without the keeper nothing guarantees that the charge is maintained. This transistor is sized smaller than the minimum size used elsewhere, because it must not hinder the signal from making a transition, only maintain the precharged signal level. In the sum generation of the full adder we also see the special restoring structure, which is added to prevent the race condition described earlier; this structure was designed by C-J Fang et al. in [11]. If the carry input arrives late and is high, the restoring structure pulls down the internal node and returns the output to high, which is the correct value. Sizing each pull-down or pull-up network follows the same basic principle as in static CMOS. The test bench has a structure similar to the static case, only with the clocks added. Both the clock and its inverse must be available for the circuit to function, and these clock drivers are also included in the calculation of total power, together with the input drivers.

Figure 28: The test bench setup of dynamic CMOS

Figure 29: Output and clock (dashed) under one cycle in dynamic CMOS

This is an example of how one sum output bit of the dynamic CMOS logic could change during one cycle. It is the special case discussed above, where the signal incorrectly pulls low and later, when the carry has rippled through, goes high. During the first half of the cycle, when the clock is low, the circuit is in precharge and all sum outputs are high. When the clock goes high the circuit enters evaluation, and signals may make a transition to low. The sum outputs of the ripple carry adder are outputs from a P-network, which makes them internally precharged low, but when the signal passes through an inverter to the outside it is precharged high, as seen in the figure. The delay of the circuit is the worst case time it takes for all outputs to become stable after the clock goes high.

Figure 31: The mean current and the standard deviation (dashed) of dynamic CMOS

The standard deviation of dynamic CMOS looks about the same as in the static case during evaluation. The deviation during precharge is higher and more data dependent. The properties of dynamic CMOS were gathered in the same way as for static CMOS and all the other logic styles.

5.7 Differential Domino

The following three logic styles are the most interesting ones when it comes to the comparison of leaked information. Since each style now has a built-in duality, the power consumption would ideally be independent of input values and sequence, but as discussed earlier this is not the case. The same principle is used for the half and full adder as in the previous logic styles: the carry generated in one stage is used both in the next stage and for the sum generation in the same stage. Both adders are made up of two separate gates, one for carry generation and one for sum generation. All of these are designed in the same general way, with only the pull-down network differing between them. The output inverter is sized to match the next stage, and the entire pull-down network is sized as the NMOS device in the minimum inverter. In the half adder another special structure is added to the circuit; for example, in the carry gate the inverse of input B is connected to a lone NMOS transistor which functions as a capacitance. This makes the circuit more independent of input data. The pull-down paths of the network must have the same resistance independent of the input, while at the same time the load on a signal and its inverse must be the same, which forces the use of such a structure.

Table 3: Dynamic CMOS properties

Delay (ns)   Power consumption (mW)   Transistor area (µm²)
1.59         3.19                     469

Figure 32: Carry generation of a half adder in differential domino

Figure 33: Sum generation stage of a half adder in differential domino

Figure 34: Carry generation stage of a full adder in differential domino

Figure 35: Sum generation stage of a full adder in differential domino

Due to the dual nature of differential domino, both an input and its inverse must be available, which increases the number of input and output bits of the adder.

Figure 36: The test bench of differential domino

Figure 37: Output and clock (dotted) under one cycle in differential domino

This is an example of the output of one of the later stages. As in dynamic CMOS, the precharge phase is when the clock is low, which precharges the output low (internally high). When the clock goes high and the circuit enters evaluation, we see the output taking on its value. Because the pull-down network consists of NMOS devices and both an output and its inverse are precharged low, the circuit cannot begin evaluation until one of the two is high. As opposed to dynamic and static CMOS, where an incorrect value could appear before the carry has rippled through, differential domino logic does not start to evaluate until the carry has rippled through to this stage. This means that there is a lag between the clock and the later sum bits, which is the delay of the circuit.

Figure 39: The mean current and the standard deviation (dashed) of differential domino

One can see the low standard deviation in the precharge and data changing peaks, while the deviation at evaluation is of about the same magnitude as for static and dynamic CMOS.

Table 4: Differential domino properties

Delay (ns)   Power consumption (mW)   Transistor area (µm²)
1.68         3.03                     410

5.8 CRSABL

Charge Recycling Sense Amplifier Based Logic (CRSABL) was designed to minimize the information leakage of logic gates. It looks a bit like differential domino but has some added structure to lower the power consumption and further decrease the leakage. One design rule that has to be followed when designing the pull-down networks of CRSABL is that all internal nodes in the network must, during evaluation, be connected to one of the output paths or to ground. This is not a big limitation for the half adder, because it has only two inputs, but in the full adder the networks become a lot more complex than the differential domino ones. To have more duality in the pull-down networks of the sum generation, they were modified so as not to include the carry output. The sizing of the transistors in the pull-down networks follows the same principle as before: they must have the same pull-down strength as a transistor with a width of 1.2 µm. The sum generation of the full adder is a special case where only five transistors conduct in series at the same time. This gives them a size of 1.2 µm times 5, which equals 6.0 µm. The pull-up paths were sized with the load of the next stage in mind. The carry in is connected to a total transistor width of 7.2 µm + 4.8 µm + 6.0 µm + 6.0 µm = 24.0 µm. A minimum sized inverter has a size of 1.2 µm + 3.6 µm = 4.8 µm and can drive three gates of the same kind, so the total width a minimum inverter drives is 14.4 µm. The ratio 24.0/14.4 = 1.67 is the factor by which the paths must be scaled to have the same fall and rise times. This gives two transistors in series with a width of 7.2 µm × 1.67 = 12.0 µm each, with the pull-down transistor of the external node sized accordingly. An NMOS transistor connects the two pull-down paths, and the signal V connected to its gate is generated with the following circuit. For a carry generation circuit, the inputs would be the carry output and its inverse instead of the sum.
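The sizing arithmetic in this passage can be retraced step by step. The sketch below only repeats the numbers given in the text; the variable names are chosen here for readability.

```python
MIN_NMOS_UM = 1.2   # NMOS width of the minimum inverter (from the text)
MIN_PMOS_UM = 3.6   # PMOS width of the minimum inverter (from the text)

# Five transistors conducting in series must match one 1.2 um device.
series_width_um = 5 * MIN_NMOS_UM

# Total transistor width connected to the carry in of the next stage.
carry_in_load_um = 7.2 + 4.8 + 6.0 + 6.0

# A minimum inverter (1.2 + 3.6 um) can drive three gates of its own size.
min_inverter_um = MIN_NMOS_UM + MIN_PMOS_UM
reference_load_um = 3 * min_inverter_um

# Scale factor for the pull-up path and the resulting transistor width.
scale = carry_in_load_um / reference_load_um
pull_up_width_um = 7.2 * scale

print(f"series pull-down width: {series_width_um:.1f} um")
print(f"carry in load:          {carry_in_load_um:.1f} um")
print(f"reference load:         {reference_load_um:.1f} um")
print(f"scale factor:           {scale:.2f}")
print(f"pull-up width:          {pull_up_width_um:.1f} um")
```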

Figure 40: Generation of voltage V

Figure 41: Carry generation of a half adder in CRSABL

Figure 42: Sum generation of a half adder in CRSABL

Figure 43: Carry generation stage of a full adder in CRSABL

Figure 44: The two pull-down networks of the carry generation stage

The same kinds of peaks as seen in differential domino appear here as well: the evaluation, precharge and data change peaks. The precharge and data change peaks look the same, being compact and not very data dependent. Because the evaluation time is longer for CRSABL than for differential domino, the evaluation peak is more spread out: the first stage follows the same behavior, but the following stages start to evaluate only when data is ready, which varies in time and spreads out the peak.

Figure 49: The mean current and the standard deviation (dashed) of CRSABL

Table 5: CRSABL properties

Delay (ns)   Power consumption (mW)   Transistor area (µm²)
3.28         6.88                     1069

5.9 DyCML

Dynamic Current Mode Logic was never intended to minimize information leakage, but because of the clocking scheme that has to be used it is still interesting in this comparison as guidance toward a better solution. The DyCML pull-down networks should look the same as those of differential domino, but since more advanced clocking is needed, in which one gate must start to evaluate after the previous one, some changes were made. As in CRSABL, the pull-down networks of the sum generation were changed so as not to use the generated carry signal of the current stage. The sizing of all transistors except the virtual ground capacitance was done using the same principle as before. The virtual ground capacitance was sized to give a voltage swing of 0.66 V on the output, which is 20% of V_dd. According to [10], the formula for sizing the transistor is

$$W \times L = \frac{V_{swing} \times C_L}{C_{ox} \times (V_{dd} - V_{swing})}$$

where C_L is the load capacitance, which in this case is the next stage, C_ox is the gate oxide capacitance per unit area, and W and L are the transistor width and length that we seek. Simulations showed that changing the transistor width did not affect the voltage swing much and that a size of 2.5 µm gave good simulation results.
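To make the sizing formula concrete, the sketch below evaluates it for the swing stated in the text. The load capacitance and oxide capacitance values are hypothetical placeholders, not figures from the thesis or from the process data, so the resulting area is for illustration only.

```python
def virtual_ground_gate_area(v_swing, c_load, c_ox, v_dd):
    """Gate area W*L of the virtual ground transistor.

    Implements W*L = (V_swing * C_L) / (C_ox * (V_dd - V_swing)).
    With c_load in farads and c_ox in farads per square metre,
    the result is in square metres.
    """
    if not 0.0 < v_swing < v_dd:
        raise ValueError("the swing must lie between 0 and V_dd")
    return (v_swing * c_load) / (c_ox * (v_dd - v_swing))

V_DD = 3.3                 # supply voltage in volts (from the text)
V_SWING = 0.2 * V_DD       # 20% of V_dd, i.e. 0.66 V (from the text)

# Hypothetical values, chosen only to illustrate the calculation.
C_LOAD_F = 20e-15          # assumed load of the next stage: 20 fF
C_OX_F_PER_M2 = 4.5e-3     # assumed oxide capacitance: 4.5 fF per square micrometre

area_m2 = virtual_ground_gate_area(V_SWING, C_LOAD_F, C_OX_F_PER_M2, V_DD)
area_um2 = area_m2 * 1e12
print(f"V_swing = {V_SWING:.2f} V, W*L = {area_um2:.2f} um^2")
```

Since the swing is fixed at 20% of V_dd, the ratio V_swing/(V_dd - V_swing) is 0.25, so in this form the required gate area depends only on the ratio C_L/C_ox.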

Figure 50: Carry generation of a half adder in DyCML

Figure 51: Sum generation of a half adder in DyCML

Figure 52: Carry generation of a full adder in DyCML

Figure 53: Sum generation of a full adder in DyCML

The test bench of the DyCML simulation looks the same as in the other differential cases, except that the inverters here are not static inverters but inverters implemented in DyCML, to give realistic signals and loads. The different clocks used internally were generated inside the adder block and were included in the power consumption.

Figure 54: The test bench of DyCML

Figure 55: Output and clock (dotted) under one cycle in DyCML

In this typical output signal one can clearly see the big difference between DyCML and the other differential logic styles. Since the output has a reduced swing, we use the complex clocking scheme to enable the evaluation of the next stage. We also see a difference in the clock: unlike in the other clocked logic styles, it does not have a 50% duty cycle, so as to ensure that the output is stable for a longer time at the end of the evaluation. This does not affect the functionality; it only allows the circuits to evaluate for a longer time.

Figure 57: The mean current and the standard deviation (dashed) of DyCML

We clearly see that the deviation is close to zero at all times, making it very hard to extract information from this logic style. There are of course still small variations in the power consumption, and these variations still make it possible to perform a DPA attack on the system to extract information; it only takes much longer because the variations are so small.

Table 6: DyCML properties

Delay (ns)   Power consumption (mW)   Transistor area (µm²)
3.10         2.86                     417

6

COMPARISON

The power profile of each logic style means little by itself. The profiles are useful only in comparison with each other, in order to find a logic style that has low information leakage but is still practical to implement.

6.1 The standard deviation

Using the standard deviation as a measure of information leakage gives only a relative reference between the compared logic styles. There is no way to say whether a circuit is good or bad when it comes to leakage, only whether it is better or worse than the others. Looking at the standard deviations of all the logic styles together shows how big the difference between them is.

Figure 58: The standard deviation of the logic styles (normalized standard deviation versus time in ns)

It is obvious that static CMOS and dynamic CMOS are the two logic styles which leak the most information. This is expected, because the gates handle different input data differently, causing this large variation.

Figure 59: The standard deviation of the differential logic styles (normalized standard deviation versus time in ns)

Looking more closely at the three differential styles, one sees that differential domino and CRSABL are approximately the same; CRSABL has a slightly higher and wider peak, making it leak more information. DyCML is nowhere near the others during evaluation and precharge, but the time when the inputs change value is when the power consumption is most data dependent.

6.2 Power consumption and delay

This section presents the delay and power consumption of all circuits. The transistor area is included as a reference for area. It is not the actual area needed to implement the circuit, since the interconnect is not taken into consideration, but the total area that the transistors would use. The second table shows the numbers on a normalized scale, using static CMOS as the reference.

Table 7: The delay and power consumption

                      Delay (ns)   Power consumption (mW)   Transistor area (µm²)
Static CMOS           1.81         1.51                     504
Dynamic CMOS          1.59         3.19                     469
Differential domino   1.68         3.03                     410
CRSABL                3.28         6.88                     1069
DyCML                 3.10         2.86                     417

Table 8: The delay and power consumption (normalized)

                      Delay   Power consumption   Transistor area
Static CMOS           1.00    1.00                1.00
Dynamic CMOS          0.88    2.11                0.93
Differential domino   0.93    2.01                0.81
CRSABL                1.81    4.56                2.12
DyCML                 1.71    1.89                0.83
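The normalized figures can be reproduced from the absolute ones by dividing each column by the static CMOS value. The sketch below does this for the data in Table 7; the dictionary layout and function name are choices made here, not part of the original work.

```python
# Absolute values from Table 7: (delay in ns, power in mW, transistor area in um^2).
TABLE_7 = {
    "Static CMOS": (1.81, 1.51, 504),
    "Dynamic CMOS": (1.59, 3.19, 469),
    "Differential domino": (1.68, 3.03, 410),
    "CRSABL": (3.28, 6.88, 1069),
    "DyCML": (3.10, 2.86, 417),
}

def normalize(table, reference="Static CMOS"):
    """Divide every column by the reference style and round to two decimals."""
    ref = table[reference]
    return {
        name: tuple(round(value / ref_value, 2) for value, ref_value in zip(values, ref))
        for name, values in table.items()
    }

normalized = normalize(TABLE_7)
for name, (delay, power, area) in normalized.items():
    print(f"{name:<20} {delay:.2f}  {power:.2f}  {area:.2f}")
```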

7

CONCLUSIONS

7.1 The different logic styles

Each logic style has a different data dependency in its power consumption, with static and dynamic CMOS both leaking a lot of information due to the imbalance in data processing. The imbalance causes different input values and combinations to affect the circuit differently. The three latter circuit styles, differential domino, CRSABL and DyCML, all have a built-in duality which causes data to be handled the same way, and ideally all three should have a data independent power consumption. What happens internally in a gate, however, depends on when the input signals arrive, whether they arrive at the same time, and their characteristics. Different rise and fall times of the signals, as well as a small skew in their arrival times, cause the circuit to behave a bit differently for each input pattern. It is this small difference which can be extracted using DPA.

7.2 What can be done to counteract the leakage

Since the delay of each gate is somewhat dependent on its inputs, the following stages will be affected by this as well, so the total evaluation time will depend on the input value and sequence. A modified DPA would be able to extract information such as the Hamming weight of the input word, which would lead to an information loss. Since all leaked information must be considered negative, this too must be taken into consideration. The main reason why DyCML has such a flat power profile compared to the other dual logic styles is the clocking scheme. Delaying the start of the evaluation of the next stage by a fixed time will stop this information leakage.

7.3 Security a trade-off?

So what do we have to sacrifice to implement a system in a more secure way? Using a more standardized logic style such as differential domino, with slight modifications such as a complex clocking scheme, we do not have to sacrifice much. The data dependency can never be fully removed, but there are many countermeasures that can be used to lower the information loss to an acceptable level.


8

REFERENCES

[1] P. Kocher, J. Jaffe, B. Jun, “Introduction to Differential Power Analysis and Related Attacks”, 1998.

[2] P. Kocher, J. Jaffe, B. Jun, “Differential Power Analysis”, Proc. Advances in Cryptology (CRYPTO ’99), pp. 388-397, 1999.

[3] N. Koblitz, “Elliptic Curve Cryptosystems”, Mathematics of Computation, vol. 48, pp. 203-209, 1987.

[4] N. Pramstaller, F. K. Gürkaynak, S. Haene, H. Kaeslin, N. Felber, W. Fichtner, “Towards an AES Crypto-chip Resistant to Differential Power Analysis”, Proc. of the 30th European Solid-State Circuits Conference (ESSCIRC 2004), 21-23 Sept. 2004, pp. 307-310.

[5] T. Messerges, A. Dabbish, R. Sloan, “Examining Smart-Card Security under the Threat of Power Analysis Attacks”, IEEE Transactions on Computers, vol. 51, no. 5, May 2002.

[6] C. Clavier, J.-S. Coron, N. Dabbous, “Differential Power Analysis in the Presence of Hardware Countermeasures”, Cryptographic Hardware and Embedded Systems - CHES 2000, Springer-Verlag Lecture Notes in Computer Science, vol. 1965, pp. 252-263, 2000.

[7] K. Tiri, M. Akmal, I. Verbauwhede, “A Dynamic and Differential CMOS Logic with Signal Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards”, Proc. of the 28th European Solid-State Circuits Conference (ESSCIRC 2002), pp. 403-406, 2002.

[8] K. Tiri, I. Verbauwhede, “Securing Encryption Algorithms against DPA at the Logic Level: Next Generation Smart Card Technology”, Proc. of CHES 2003, LNCS 2779, pp. 125-136.

[9] K. Tiri, I. Verbauwhede, “Charge recycling sense amplifier based logic: securing low power security ICs against DPA”, Proc. of the 30th European Solid-State Circuits Conference (ESSCIRC 2004), 21-23 Sept. 2004, pp. 179-182.

[10] M. W. Allam, M. I. Elmasry, “Dynamic current mode logic (DyCML): a new low-power high-performance logic style”, IEEE Journal of Solid-State Circuits, vol. 36, no. 3, March 2001, pp. 550-558.

[11] C-J Fang, C-H Huang, J-S Wang, C-W Yeh, “Fast and compact dynamic ripple carry adder design”, Proc. 2002 IEEE Asia-Pacific Conference on ASIC, 6-8 Aug. 2002, pp. 25-28.


TABLE OF CONTENTS

1 INTRODUCTION
1.1 Background
1.2 Aim of the thesis
1.3 Layout of the report

2 CRYPTOGRAPHY
2.1 Public and private key cryptography
2.2 Asymmetric cryptography
2.3 Symmetric cryptography
2.4 The Diffie-Hellman key exchange
2.5 Examples of symmetric and asymmetric cryptography
2.5.1 RSA
2.5.2 Elliptic curve cryptography
2.5.3 AES
2.6 Cryptography for embedded systems

3 POWER ANALYSIS
3.1 Power analysis attacks
3.2 SPA Attack
3.3 DPA Attack
3.3.1 How to perform a DPA attack
3.3.2 Improving the signal to noise ratio
3.4 Existing countermeasures for power analysis
3.4.1 Algorithmic countermeasures for SPA
3.4.2 Algorithmic countermeasures for DPA
3.4.3 Noise insertion
3.4.4 A hardware countermeasure
3.4.5 Using power independent logic style

4 CIRCUIT STYLES
4.1 Static CMOS
4.1.1 Timing of static CMOS
4.1.2 Information leakage in static CMOS
4.2 Dynamic CMOS
4.2.1 Timing in dynamic CMOS
4.2.2 Information leakage in dynamic CMOS
4.3 Differential Domino
4.4 Information leakage in differential domino circuits
4.5 CRSABL
4.6 DyCML

5 IMPLEMENTATION
5.1 Circuit comparison
5.2 Assumption
5.3 Test setup
5.4 Implementations
5.5 Static CMOS
5.6 Dynamic CMOS
5.7 Differential Domino
5.8 CRSABL
5.9 DyCML

6 COMPARISON
6.1 The standard deviation
6.2 Power consumption and delay

7 CONCLUSIONS
7.1 The different logic styles
7.2 What can be done to counteract the leakage
7.3 Security a trade-off?
