Power Analysis of the Advanced Encryption Standard : Attacks and Countermeasures for 8-bit Microcontrollers

Department of Electrical Engineering (Institutionen för systemteknik)

Degree project

Degree project carried out in Information Coding at the Institute of Technology, Linköpings universitet

by

Mattias Fransson

LiTH-ISY-EX--15/4907--SE

Linköping 2015

Department of Electrical Engineering, Linköpings tekniska högskola, Linköpings universitet


Supervisors: Jonathan Jogenfors, ISY, Linköpings universitet

Christian Vestlund, Sectra Communications AB

Examiner: Jan-Åke Larsson, ISY, Linköpings universitet

Division, Department: Information Coding, Department of Electrical Engineering, SE-581 83 Linköping

Date: 2015-11-06

URL for electronic version: http://urn.kb.se/resolve?urn:nbn:se:liu:diva-122718

ISRN: LiTH-ISY-EX--15/4907--SE

Title: Effektanalys av Advanced Encryption Standard (Power Analysis of the Advanced Encryption Standard)

Author: Mattias Fransson


Abstract

The Advanced Encryption Standard is one of the most common encryption algorithms. It is highly resistant to mathematical and statistical attacks; however, this security is based on the assumption that an adversary cannot access the algorithm’s internal state during encryption or decryption. Power analysis is a type of side-channel analysis that exploits information leakage through the power consumption of physical realisations of cryptographic systems. Power analysis attacks capture intermediate results during AES execution, which combined with knowledge of the plaintext or the ciphertext can reveal key material. This thesis studies and compares simple power analysis, differential power analysis and template attacks using a cheap consumer oscilloscope against AES-128 implemented on an 8-bit microcontroller. Additionally, the shuffling and masking countermeasures are evaluated in terms of security and performance. The thesis also presents a practical approach to template building and device characterisation. The results show that attacking a naive implementation with differential power analysis requires little effort, both in preparation and computation time. Template attacks require the fewest measurements but require significant preparation. Simple power analysis by itself cannot break the key but proves helpful in simplifying the other attacks. It is found that shuffling significantly increases the number of traces required to break the key, while masking forces the attacker to use higher-order techniques.


Acknowledgments

First, I would like to thank the people over at Sectra Communications for giving me the opportunity to work with a topic that is both exciting and highly relevant in today’s society. Special thanks go to my supervisor Christian Vestlund, who was always ready to help and has provided many good thoughts and suggestions throughout the thesis work.

I would also like to thank my examiner Jan-Åke Larsson and my supervisor at the university Jonathan Jogenfors for their comments and helpful advice.

Thanks to my friends and family who have supported me throughout the years and many, many thanks to my mother for her valuable input and for always being there for me.

Finally, a special thought goes to my father who awoke my interest in all things technical—I would not be here if not for him and I so wish he was still with us.

Linköping, November 2015 Mattias Fransson

Contents

2.3 Brief History
2.3.1 Substitution Ciphers
2.4 Random Numbers
2.5 Asymmetric Cryptography
2.6 Secret Sharing
2.6.1 Secret Splitting
2.7 Cryptanalysis
2.7.1 Side-Channel Analysis

3 Symmetric-Key Cryptography
3.1 Stream Ciphers
3.1.1 One-Time Pad
3.2 Block Ciphers
3.2.1 Design Criteria
3.2.2 Modes of Operation
3.3 The Advanced Encryption Standard
3.3.1 Notation
3.3.2 Algorithm Structure
3.3.3 Key Schedule
3.3.4 Decryption

4 Power Consumption
4.1 The Inverter
4.1.1 Dynamic Power
4.1.2 Static Power
4.1.3 Short Circuit Power
4.2 The Microcontroller
4.2.1 Structure
4.2.2 Operation
4.3 Measuring Power Consumption
4.3.1 Shunt
4.3.2 Probing the Electromagnetic Field
4.4 Modelling Power Consumption
4.4.1 Binary Models
4.4.2 Hamming Weight Model
4.4.3 Hamming Distance Model
4.5 Power Consumption Components
4.5.1 Noise
4.6 Signal-to-Noise Ratio
4.6.1 Calculating the Signal-to-Noise Ratio

5 Power Analysis
5.1 Power Traces
5.1.1 Number of Sample Points
5.2 Simple Power Analysis
5.2.1 Attacking RSA
5.2.2 Attacking AES
5.3 Differential Power Analysis
5.3.1 General Approach
5.3.2 Difference of Means
5.3.3 Distance of Means
5.3.4 Correlation Coefficient
5.3.5 Number of Traces
5.3.6 Notes on Key Length
5.4 Template Attacks
5.4.1 Multivariate Gaussian Model
5.4.2 Template Building Phase
5.4.3 Attack Phase
5.4.4 Points of Interest

6 Countermeasures
6.1 Hiding
6.1.1 Amplitude Hiding
6.1.2 Time Dimension Hiding
6.1.3 Random Delays
6.1.4 Shuffling
6.2 Masking
6.2.1 Masking the S-box
6.2.2 Masking Scheme
6.2.3 Masking the Key Schedule
6.3 Higher-Order Differential Power Analysis
6.3.1 Second-Order Differential Power Analysis Example

7 Method
7.1 Environment Setup
7.1.1 Target
7.1.2 Oscilloscope
7.1.3 Computer
7.2 AES Implementations
7.2.1 Naive
7.2.2 Shuffling
7.2.3 Masking
7.2.4 Random Number Generation
7.4 Device Characterisation
7.4.1 Measurement Configuration
7.4.2 Viability of the Hamming Weight Model
7.4.3 Signal-to-Noise Ratio
7.5.1 Attack on Shuffling
7.5.2 Second-Order Attack
7.6 Template Attack
7.6.1 Points of Interest

8 Results
8.1 AES Implementations
8.3 Device Characterisation
8.3.1 Measurement Configuration
8.3.2 Viability of Hamming Weight Model
8.3.3 Signal-to-Noise Ratio
8.4.1 Attack on Shuffling
8.4.2 Second-Order Attack
8.5 Template Attack

9 Discussion
9.1 Results
9.1.1 Practical Issues
9.1.2 Device Characterisation
9.1.3 Attacks
9.1.4 Countermeasures
9.1.5 Number of Traces
9.2 Method
9.3 Power Analysis in a Broader Context

10 Conclusions
10.1 Further Research

A Mathematical Prerequisites
A.1 Statistics
A.1.1 Parameter Estimation
A.1.2 Differentiating Two Distributions
A.1.3 Fisher z-transformation
A.1.4 Multivariate Normal Distribution

Notation

General

A          A matrix.
A_i•       Row i of matrix A.
A_•i       Column i of matrix A.
a_i,j      Element of the i-th row and the j-th column of matrix A.
M_{M×N}    The set of all matrices with M rows and N columns.
|A|        The determinant of A.
v          A vector.
v_i        The i-th element of vector v.
⊕          Exclusive or.

Power Analysis

T          Matrix of power consumption values.
Z          Matrix of hypothetical intermediate values.
H          Matrix of hypothetical power consumption values.
R          Matrix with the result of a DPA attack.
N          Number of power traces and plaintexts.
S          Number of sample points per power trace.
k          AES cipher key.
w          AES expanded key.
r_n        Round key of the n-th AES round.
k_i        The i-th byte (or sub-key) of k.
ρ_max      Highest achievable correlation coefficient for a correct key guess.

Acronyms

AES     Advanced Encryption Standard.
ALU     arithmetic logic unit.
ASIC    application-specific integrated circuit.
CBC     Cipher Block Chaining.
CFB     Cipher Feedback.
CMOS    complementary metal-oxide-semiconductor.
CPU     central processing unit.
CTR     Counter.
DES     Data Encryption Standard.
DPA     differential power analysis.
ECB     Electronic Codebook.
HODPA   higher-order differential power analysis.
IV      initialization vector.
LNA     low-noise amplifier.
LSB     least significant bit.
NIST    the U.S. National Institute of Standards and Technology.
OFB     Output Feedback.
PCB     printed circuit board.
PRNG    pseudo-random number generator.
RAM     random-access memory.
ROM     read-only memory.
SCA     side-channel analysis.
SNR     signal-to-noise ratio.
SPA     simple power analysis.

Glossary

NMOS N-channel metal-oxide-semiconductor field-effect transistor.

NOP An assembler instruction that stalls the processor for one clock cycle.

PMOS P-channel metal-oxide-semiconductor field-effect transistor.

RSA Widely used public-key cryptosystem invented by Rivest, Shamir and Adleman.

S-BOX Substitution box. A component of block ciphers that provides confusion.

SMA A coaxial radio frequency connector.

XOR Logic exclusive or. Returns true if and only if the two inputs differ.

1 Introduction

This chapter introduces the topic of the thesis, motivates why it is of interest and states its goals. The thesis outline is given at the end of the chapter.

1.1 Background

The history of cryptography is almost as long as that of human communication itself. People have always sought new and more efficient ways to communicate with each other, from hand gestures to symbols, from speech to writing, from smoke signals to telegraphs, from telephones to the internet. At the same time, there has always been a desire to keep these communication channels safe from our enemies’ eyes and ears. As technology has advanced, so has the need for better and more sophisticated encryption. There is a constant struggle between the designers of secure systems and the people who are trying to break them.

Modern society is highly dependent on electronic communication, and a wide range of tasks such as money transfer and personal identification are performed digitally using devices called smart-cards. Smart-cards are embedded with integrated circuits of various degrees of complexity. Some examples of smart-cards are credit cards and SIM cards. Due to the private nature of many smart-card applications, the ability to securely transfer data without sacrificing convenience is important. Many smart-cards therefore contain microcontrollers programmed to perform data encryption and decryption with a secret key. Today, there is a growing interest in having everything from kitchen appliances to thermostats connected to the internet in a so-called Internet of Things. These systems provide convenience and automation to end-users but at the same time they introduce new avenues of attack for malicious parties. The security of any cryptographic system hinges upon keeping the key secret. Often this key is fixed and shipped with the device. The goal of an attack is to find the key, at which point the device is compromised.

Classically, the study of breaking cryptographic systems involves trying to find mathematical weaknesses in the cryptographic algorithm or to detect usage patterns that may reveal sensitive information. Side-channel analysis (SCA) is a separate class of cryptographic analysis that provides insight into the implementation of an algorithm by studying the physical characteristics of the system it is running on. Most algorithms are not designed with this in mind and provide little resistance against these attacks. One of the more potent types of side-channel analysis is power analysis. A power analysis attack reveals the secret key by exploiting variations in the power consumption of the cryptographic device.

1.2 Sectra AB

Sectra is a Swedish company founded in the late 1970s by researchers at the Linköping Institute of Technology. Today it is a multinational company with offices in twelve countries. Sectra focuses on two specific areas: medical systems and secure communication. The secure communication department focuses, among other things, on protecting regular phone calls against eavesdropping.

1.3 Purpose

The thesis’ purpose is to study the strength and applicability of power analysis attacks against the Advanced Encryption Standard (AES) implemented on a common 8-bit microcontroller. The goal is to provide a reasonably realistic attack setting using cheap and readily available equipment. Additionally, different options for software-based countermeasures are considered and their impact on performance and security is analysed.

1.4 Problem Formulation

The following questions constitute the thesis’ problem formulation:

1. What is power analysis and how can it be used to retrieve the AES encryption key from an 8-bit software implementation?

(a) What makes AES sensitive to power analysis?

(b) What different methods of power analysis exist and how can they be compared?

(a) Are there ways to make power analysis harder by modifying the software?

(b) What is the performance cost of these countermeasures?

1.5 Delimitations

Power analysis is interesting not only from a software perspective, but also from a hardware point of view. Many cryptographic systems are implemented on application-specific integrated circuits (ASICs) that may be susceptible to side-channel attacks. However, ASICs typically run at much higher frequencies than microcontrollers, which increases the requirements on the measuring equipment. Attacks and countermeasures on hardware implementations are therefore not covered in detail, but much of the theory is still applicable as it is independent of the physical implementation.

1.6 Thesis Outline

Chapters 2 and 3 present an overview of historical and modern cryptography, ending with a full description of AES. Understanding the different transformations and the overall structure of AES is important as they are directly related to the attacks presented later in the thesis. Chapter 4 presents the power consumption of integrated circuits and shows why and how it is possible to connect power measurements to the data processed by a microcontroller. This chapter also introduces various ways to model the power consumption, which is a prerequisite for the attacks. Chapter 5 presents a selection of different power analysis attacks and covers the theory and methodology behind simple and differential power analysis. Template attacks, which rely on a detailed profiling of the microcontroller’s power consumption before the actual attack, are also presented. Chapter 6 gives a number of countermeasures and focuses both on how to implement them and on how to attack them. In chapter 7 the measurement setup is presented, followed by a description of the method used to test and evaluate the attacks and countermeasures. Chapter 8 lists the results, which are further discussed in chapter 9. Conclusions and final thoughts are given in chapter 10. Mathematical prerequisites, mainly in statistics, are detailed in appendix A.

2 Cryptographic Concepts

Cryptography is a vast field of study focused on providing methods for securing communication channels against the threat of so-called adversaries. Attacking and trying to find weaknesses in cryptographic systems is called cryptanalysis. This chapter introduces modern cryptographic concepts and definitions and describes two of the most famous historical ciphers.

2.1 A Cryptographic System

In cryptographic literature one often refers to two entities, Alice and Bob, who are trying to communicate with each other. Communication can take place across a distance, in time or both. An example of communication across a distance is a telephone call while storage on a hard disk drive is an example of communication in time. The medium over which the communication takes place, e.g. a wireless network, is called the channel. A third party called Eve (as in eavesdropper) represents the adversary. She is attempting to listen in on the channel with the goal of revealing the message Alice is sending to Bob. To foil Eve’s plans, Alice and Bob use a system to encrypt their messages. This situation is presented in figure 2.1. Alice is transmitting the message p, called the plaintext, to Bob. She encrypts the message using an encryption key, e. The encryption function E(p,e) returns a ciphertext, denoted c, which she sends over the channel. Bob then uses a decryption key d and applies a decryption function, D(c,d), to the ciphertext to regain the original message. A formal mathematical definition of an encryption scheme is presented below.

Definition 2.1 (Encryption scheme). Let P be the set of all plaintexts, C be the set of all ciphertexts and K be the set of all keys. These sets are called the plaintext space, ciphertext space and key space, respectively. An encryption scheme is defined as the pair of functions E and D, where E : P × K → C and D : C × K → P, if for every e ∈ K there is a d ∈ K such that D(E(p,e),d) = p for all p ∈ P.

Figure 2.1: A typical communication scenario with an adversary.

An encryption scheme is more commonly referred to as a cipher. Note that defi-nition 2.1 includes two keys: one for encryption and one for decryption but they may very well be the same. This introduces two types of systems: asymmetric-key systems where the keys are different and symmetric-key systems where the same key is used for encryption and decryption.
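The condition in definition 2.1 can be checked exhaustively for a very small cipher. The following is a minimal sketch, not taken from the thesis: it assumes that the plaintext, ciphertext and key spaces are all the byte values 0 to 255 and that both E and D are a plain XOR with the key. The function name `is_encryption_scheme` is hypothetical.

```python
# Toy check of definition 2.1: for every e in K there must be a d in K
# such that D(E(p, e), d) = p for all p in P.
# Assumption: P = C = K = {0, ..., 255}, and E and D are a byte-wise XOR.

def E(p, e):
    """Encrypt one byte p with key byte e."""
    return p ^ e

def D(c, d):
    """Decrypt one byte c with key byte d."""
    return c ^ d

def is_encryption_scheme(enc, dec, space):
    """Return True if every encryption key has a matching decryption key."""
    for e in space:
        # Search for at least one d that inverts e on every plaintext.
        if not any(all(dec(enc(p, e), d) == p for p in space) for d in space):
            return False
    return True

if __name__ == "__main__":
    print(is_encryption_scheme(E, D, range(256)))
```

In this toy case the only decryption key that works for a given e is d = e, so the sketch is also an example of a symmetric-key system.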

2.2 The Objectives of Cryptography

Encryption provides message confidentiality and plays a very important part in secure communications. However, cryptography incorporates many other concepts such as making sure that the message received actually is the message that was sent. In literature, four main objectives of cryptography are generally presented: confidentiality, integrity, authentication and non-repudiation [1].

• Confidentiality: No unauthorized party should be able to read the contents of the message.

• Integrity: No alteration to the message should be possible, either from transmission errors or from malicious intent, without Bob detecting this.

• Authentication: Bob should be able to verify that Alice is the sender of the message.

• Non-repudiation: There should be no way for Alice to deny that she sent the message.

These additional objectives are important. Simply encrypting a message does not guarantee that the communication is secure. First of all, encryption does not provide any error detection. Maybe Eve can find a way to change the message in transit so that Bob receives bad information. Another problem is transmission errors due to noisy channels. Non-repudiation is similar to, but not the same as, authentication. How does Bob prove to anyone else that a message he received actually came from Alice? He could have sent the message himself. Similarly, Alice should not be able to deny having sent the message. What if she signs a contract and then denies having done so?

2.3 Brief History

The history of cryptography is long and many examples of attempts at hiding messages can be found. Generally, the strongest driving force was protecting state secrets and military strategies. During the world wars coded messages and cryptanalysis played a big role in the final outcome. Examples include the Native American code talkers and the German Enigma machine [2]. Modern ciphers are very different from their predecessors but some of the basic building blocks derive from the same concepts: substitution and permutation. Substitution refers to exchanging plaintext symbols for others in a manner depending on the key, e.g. by replacing all occurrences of the letter “A” with the letter “Q”. Permutation, or transposition, is based on reordering the symbols in the message rather than substituting them. This means that all plaintext symbols are present in the ciphertext. An example is rewriting ANALYSIS as NALASYSI.

2.3.1 Substitution Ciphers

A substitution cipher codes a plaintext by substituting every letter in the plaintext by its corresponding letter from a substitution alphabet. The easiest way to visualise this is by writing out all letters in the plaintext alphabet on top of the substitution alphabet. The following example illustrates one of the most famous substitution ciphers: the Caesar cipher [1].

Plaintext alphabet:    ABCDEFGHIJKLMNOPQRSTUVWXYZ
Substitution alphabet: CDEFGHIJKLMNOPQRSTUVWXYZAB

If the text “Grumpy wizards make toxic brew” is encrypted with the above alphabet the ciphertext becomes:

Plaintext:  GRUMPYWIZARDSMAKETOXICBREW
Ciphertext: ITWORAYKBCTFUOCMGVQZKEDTGY

Note that in this case the spaces have been removed as they do not exist in the plaintext alphabet. Removing spaces and punctuation marks further obfuscates the message by hiding the location of word boundaries. The Caesar cipher above rotates every letter two steps forward in the alphabet. The key is the letter C. Another way to express this generally for any key k is by labelling the letters in the plaintext alphabet from 0 to 25. Encryption is then written as E(m,k) = (m + k) (mod 26) and decryption as D(c,k) = (c − k) (mod 26).

The Caesar cipher is not particularly secure as there are only 26 possible keys and it is trivial to test all of them. Instead of rotating the letters in the plaintext alphabet, the substitution alphabet can be chosen as a random permutation of all available letters. In this case there are 26! possible keys, which makes brute-forcing a lot harder. Another problem with substitution ciphers is that the same letters are always substituted in the same way, and thus it is possible to use frequency analysis to break the substitution [1]. Other substitution ciphers can be constructed by, for instance, choosing a word as the key and then writing the remaining letters of the alphabet in order without repeating any letters. If the key is “HELLO” the substitution alphabet would become: HELOABCDFGIJKMNPQRSTUVWXYZ.
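The Caesar cipher and the keyword-based substitution alphabet described above can be sketched in a few lines of Python. This is an illustration only; the function names are hypothetical, and the sketch assumes the 26-letter alphabet A to Z with all other characters dropped before encryption.

```python
import string

ALPHABET = string.ascii_uppercase  # plaintext alphabet, letters labelled 0..25

def caesar_encrypt(message, k):
    """E(m, k) = (m + k) mod 26, letter by letter; non-letters are dropped."""
    letters = [ch for ch in message.upper() if ch in ALPHABET]
    return "".join(ALPHABET[(ALPHABET.index(ch) + k) % 26] for ch in letters)

def caesar_decrypt(ciphertext, k):
    """D(c, k) = (c - k) mod 26, letter by letter."""
    return "".join(ALPHABET[(ALPHABET.index(ch) - k) % 26] for ch in ciphertext)

def brute_force(ciphertext):
    """Try all 26 keys; the attacker picks the candidate that reads as text."""
    return [caesar_decrypt(ciphertext, k) for k in range(26)]

def keyword_alphabet(keyword):
    """Keyword letters first (repeats skipped), then the remaining letters in order."""
    result = []
    for ch in keyword.upper() + ALPHABET:
        if ch in ALPHABET and ch not in result:
            result.append(ch)
    return "".join(result)

if __name__ == "__main__":
    c = caesar_encrypt("Grumpy wizards make toxic brew", 2)  # key C corresponds to k = 2
    print(c)
    print(caesar_decrypt(c, 2))
    print(keyword_alphabet("HELLO"))
```

With the key C (k = 2) the sketch reproduces the ciphertext of the example above, and `brute_force` shows why 26 keys offer no real protection: the correct plaintext is always among the 26 candidates.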

The Vigenère Cipher

Some of the problems with the simple substitution ciphers can be solved by using a repeating key sequence. A keyword is chosen and repeated to match the length of the plaintext. Every letter in the plaintext is then substituted by applying the Caesar cipher corresponding to the current letter in the key. The following example encrypts a message with the keyword “KEY”.

Plaintext:  GRUMPYWIZARDSMAKETOXICBREW
Key:        KEYKEYKEYKEYKEYKEYKEYKEYKE
Ciphertext: QVSWTWGMXKVBCQYUIRYBGMFPOA

This is commonly called the Vigenère cipher [1]. As the example shows, the same letter may be encrypted into different letters. This makes frequency analysis somewhat harder as the adversary must first determine the length of the key.
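The repeating-key substitution can be sketched as follows. The sketch is illustrative and assumes that both the text and the keyword consist only of the uppercase letters A to Z; the function name `vigenere` is hypothetical.

```python
from itertools import cycle

A = ord("A")

def vigenere(text, keyword, decrypt=False):
    """Apply the Caesar shift given by each key letter in turn.

    Assumes text and keyword contain only the uppercase letters A-Z.
    """
    sign = -1 if decrypt else 1
    out = []
    for ch, key_ch in zip(text, cycle(keyword)):
        shift = ord(key_ch) - A  # A -> 0, B -> 1, ..., Z -> 25
        out.append(chr((ord(ch) - A + sign * shift) % 26 + A))
    return "".join(out)

if __name__ == "__main__":
    c = vigenere("GRUMPYWIZARDSMAKETOXICBREW", "KEY")
    print(c)
    print(vigenere(c, "KEY", decrypt=True))
```

Running the sketch with the keyword “KEY” reproduces the ciphertext of the example above.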

2.4 Random Numbers

Random numbers occur in many cryptographic applications and they play an important role in power analysis countermeasures. Generating random numbers is a difficult topic, especially when it comes to computers. There are many sources of true randomness found in nature, e.g. radioactive decay. Flipping a fair coin and rolling an unbiased die are also sources of random numbers. While a series of coin tosses may produce good random numbers, it is far too slow for any practical application. In practice, a pseudo-random number generator (PRNG) is used. A PRNG is a deterministic algorithm that produces a sequence of seemingly random numbers. If the same input is given, the same output sequence is returned. It is therefore common to use a source of true randomness as the input, called the seed. If the seed is truly random and a good PRNG is used, the output sequence will be unpredictable. A property of all PRNGs is that at some point the output sequence will repeat, i.e. it has a period. Cryptographically secure generators should have long periods.
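Determinism and periodicity can be observed with a deliberately tiny generator. The sketch below uses a linear congruential generator, which is an assumption made here purely for brevity; it is not cryptographically secure and is not claimed to be the generator used in the thesis. The parameters a = 5, c = 3 and m = 16 are hypothetical toy values.

```python
def lcg(seed, a=5, c=3, m=16):
    """Toy linear congruential generator: x -> (a*x + c) mod m. Not secure."""
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x

def take(gen, n):
    """Collect the next n outputs of a generator."""
    return [next(gen) for _ in range(n)]

def period(seed, **params):
    """Count the steps until the first output reappears.

    Assumes a is coprime with m, so the update is invertible and the
    sequence is purely periodic (otherwise this loop might not terminate).
    """
    gen = lcg(seed, **params)
    first = next(gen)
    steps = 1
    while next(gen) != first:
        steps += 1
    return steps

if __name__ == "__main__":
    print(take(lcg(7), 5))  # same seed ...
    print(take(lcg(7), 5))  # ... same output sequence
    print(period(7))
```

With a modulus of 16 the state can take at most 16 values, so the output necessarily repeats after at most 16 steps. A practical generator needs a state space large enough that the period is never reached in use.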

2.5 Asymmetric Cryptography

Modern encryption schemes can be divided into two groups depending on whether the same key is used for both encryption and decryption or not. Those that do use the same key are symmetric and those that do not are asymmetric. Another term for asymmetric cryptography is public-key cryptography. One of the major challenges in symmetric cryptography is how to securely share the key with all involved parties. The obvious way is to meet in person and agree on the key before sending the data. This is, however, rarely practical. Public-key cryptography attempts to solve this by splitting a person’s key into two parts: a public key, made available to everyone and intended for encryption, and a secret key intended for decryption. Public-key encryption schemes are generally slow compared to their symmetric counterparts, which makes them impractical for transmitting large messages. A common use case is to apply symmetric encryption to the actual message and public-key encryption to securely distribute the symmetric key [3].

Asymmetric encryption schemes are based on trapdoor one-way functions. A one-way function f has the properties that it is easy to calculate f(x) given x, but hard to calculate x given f(x). A trapdoor one-way function additionally satisfies that it is easy to calculate x from f(x) given some additional knowledge, the trapdoor. Unfortunately, there is no proof that trapdoor one-way functions exist and no known general way of constructing them. There are, however, some functions that are thought to be trapdoor one-way functions. While a detailed description of asymmetric cryptography is outside the scope of this thesis, it is of interest in the context of power analysis. In chapter 5 a short example is given where modular exponentiation is attacked. Modular exponentiation is believed to be a trapdoor one-way function and is commonly seen in public-key cryptography.
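The asymmetry between the easy and the hard direction can be illustrated with modular exponentiation on textbook-sized numbers. The sketch below is an assumption-laden toy: the modulus N = 61 · 53, the public exponent 17 and the private exponent 2753 are small illustrative values chosen here, not parameters from the thesis, and all function names are hypothetical. At this size the "hard" direction is still trivial; the point is only the difference in how the work grows.

```python
# Toy trapdoor one-way function based on modular exponentiation.
# Assumed illustrative values: N = 61 * 53 = 3233, and
# E_PUB * D_PRIV = 17 * 2753 = 46801 = 15 * 3120 + 1, where 3120 = 60 * 52.
N, E_PUB, D_PRIV = 3233, 17, 2753

def f(x):
    """Easy direction: one modular exponentiation."""
    return pow(x, E_PUB, N)

def invert_with_trapdoor(y):
    """Easy inversion for whoever knows the trapdoor D_PRIV."""
    return pow(y, D_PRIV, N)

def invert_by_search(y):
    """Inversion without the trapdoor: try every candidate x in turn."""
    for x in range(N):
        if f(x) == y:
            return x
    return None

if __name__ == "__main__":
    y = f(65)
    print(y, invert_with_trapdoor(y), invert_by_search(y))
```

The exhaustive search takes a number of steps proportional to N, whereas the two exponentiations take a number of steps proportional to the number of bits in the exponent. For moduli of realistic size the search becomes infeasible while the exponentiations remain cheap.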

2.6 Secret Sharing

The following section describes a way to share a secret between different parties so that no single individual is capable of recovering the secret without help from the others. This can be likened to a safe that requires multiple keys to open where the keys are distributed to different people. Using a unique physical lock for each key is clumsy and adding or removing keys quickly becomes inconvenient. Consider instead a combination lock. How should this combination be divided among the involved parties? Secret sharing refers to a set of methods for splitting a secret into a number of shares. In order to reconstruct the secret multiple (or all) shares must be combined. This topic is closely related to one of the more popular power analysis countermeasures.

2.6.1 Secret Splitting

Suppose you want to send a message, e.g. the combination to a safe, to Alice and Bob. They should only be able to read the message if they combine their knowledge. Represent the message m as an integer and generate a random number r. Give r to Alice and m − r to Bob. The message is reconstructed by adding the shares back together. It is important that all possible values of r are equally likely. However, this cannot be the case if r is drawn from all integers, as there are infinitely many of them [1]. To make sure all r are equally likely, each with probability 1/N, r is chosen as a random integer modulo N. To make sure that m can be recreated, N must be larger than all possible messages. Secret splitting is generalized to n people by generating n − 1 random numbers r_1, r_2, ..., r_{n−1} (mod N) and distributing them as shares. The final share is calculated as r_n = m − r_1 − r_2 − ··· − r_{n−1} (mod N).
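The splitting and reconstruction steps translate directly into code. The following is a minimal sketch with hypothetical function names; it assumes the message is an integer with 0 ≤ m < N and draws the random shares with Python's standard `secrets` module.

```python
import secrets

def split(m, n, N):
    """Split the integer m (0 <= m < N) into n additive shares modulo N."""
    if not 0 <= m < N:
        raise ValueError("N must be larger than every possible message")
    if n < 2:
        raise ValueError("at least two shares are needed")
    shares = [secrets.randbelow(N) for _ in range(n - 1)]  # r_1 ... r_{n-1}
    shares.append((m - sum(shares)) % N)                   # final share r_n
    return shares

def combine(shares, N):
    """Reconstruct m by adding all shares modulo N."""
    return sum(shares) % N

if __name__ == "__main__":
    N = 10**6
    shares = split(271828, 3, N)
    print(shares, combine(shares, N))
```

Any n − 1 of the shares are uniformly distributed and independent of m, so only the combination of all n shares reveals the message.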

2.7 Cryptanalysis

Attacking cryptographic systems is known as cryptanalysis. The methods employed usually depend on both the amount and the type of information an adversary has. A fundamental idea in cryptography is that the adversary knows the system, i.e. an encryption scheme must be secure even if all details about it are known except the secret key [4]. This is called Kerckhoffs’s principle after the Dutch cryptographer Auguste Kerckhoffs. The following list describes the attacks available to Eve, based on the information she has access to [3]:

• Ciphertext-only: A set of ciphertexts is available to Eve.

• Known plaintext: Eve has access to a set of plaintext-ciphertext pairs.

• Chosen plaintext: Eve can choose the plaintexts and acquire the corresponding ciphertexts through encryption.

• Chosen ciphertext: Eve can choose the ciphertexts and acquire the corresponding plaintexts through decryption.

2.7.1 Side-Channel Analysis

Side-channel analysis is another type of cryptanalysis, but instead of employing the previously mentioned methods and working mathematically towards the key, an additional source of information is used: the physical dimension. While a cryptographic algorithm may be mathematically secure under the assumption that an adversary can only determine the input and output, it is not necessarily secure if intermediate results can be extracted. As figure 2.2 shows, leakage of sensitive information can be observed from many different sources. The cryptographic system consumes power and the movement of charge carriers gives rise to electromagnetic fields. Another way to infer intermediate values is by carefully examining variations in the delay between input and output. Paul Kocher is one of the pioneers in side-channel analysis and introduced both timing and power analysis, where he and his co-authors demonstrated attacks on asymmetric and symmetric ciphers [5, 6]. More recently, the acoustic side-channel has been used to extract RSA encryption keys [7]. Acoustic attacks exploit sound caused by small vibrations in electrical components.

One of the major advantages of side-channel attacks is that in many situations (differential power analysis in particular) the required knowledge of the attacked system is minimal. Often it is enough to know the algorithm while the device itself is treated as a black box. One makes a distinction between invasive and non-invasive side-channel analysis. In the first case the system is modified to allow an adversary to record some property, e.g. by adding current sensing circuitry. Conversely, a non-invasive side-channel attack does not require modification of the target system. Placing a microphone in the vicinity of the target is non-invasive.

Figure 2.2: Intermediate results leak through physical side-channels and can be exploited to extract the secret key. [Diagram: Alice encrypts p to c with E; timing information, power dissipation, electromagnetic radiation etc. leak to Eve.]

A closely related area is fault attacks (or fault injection). By causing faults in the internal logic of a processor it is possible to make it behave in a way beneficial to an adversary or even produce key material. Some examples of fault injection include introducing variations in the power supply voltage or glitches in the clock signal causing the processor to skip instructions or misinterpret data [8]. Fault injection attacks are active as the adversary chooses the input and controls the device behaviour. In contrast, side-channel analysis attacks are generally passive.


3

Symmetric-Key Cryptography

In this chapter, an overview of symmetric-key cryptography is presented. Symmetric algorithms are often fast and easy to implement in both hardware and software and are therefore used in a wide range of applications. Stream ciphers are symmetric-key encryption schemes that operate on plaintexts of arbitrary length. This is in contrast to block ciphers, which operate on plaintexts of fixed length. Block cipher primitives are therefore often used in combination with different modes of operation.

3.1 Stream Ciphers

As the name implies stream ciphers operate on streams of plaintexts and keys and produce streams of ciphertexts.

Definition 3.1 (Stream cipher). Let $p = p_1 p_2 \ldots$ where $p_i \in P$ and $k = k_1 k_2 \ldots$ where $k_i \in K$. A stream cipher is defined as an encryption scheme such that $E(p, k) = c_1 c_2 \ldots = c$ where $c_i \in C$ and $D(c, k) = p_1 p_2 \ldots = p$.

Often the key is used as a seed to a PRNG that produces a pseudo-random bit sequence, which in turn is XORed with the plaintext to produce the ciphertext.

3.1.1 One-Time Pad

The one-time pad is a stream cipher where $P = C = K = \{0, 1\}$. Encryption and decryption are defined as the logical exclusive-or (XOR):

\[ E(p, k) = p \oplus k = c \]

\[ D(c, k) = c \oplus k = p \]


The security is based on the key stream being completely random. Every bit in the key is either 1 or 0 with probability one half. The one-time pad is special because it provides perfect secrecy [1]. This means that the ciphertext gives no information on the plaintext, i.e. every plaintext is equally likely given only the ciphertext. There are of course some major drawbacks with the one-time pad. First of all, there must be as many bits in the key stream as there are in the plaintext. Managing the secret key quickly becomes troublesome as the message length increases. The second issue is that the key may only be used once, hence the name one-time pad. Suppose Eve intercepts two ciphertexts $c_1$ and $c_2$ encrypted with the same key $k$. If she XORs the ciphertexts she can effectively eliminate the key since $c_1 \oplus c_2 = m_1 \oplus k \oplus m_2 \oplus k = m_1 \oplus m_2$. Since the messages are unlikely to be random, $m_1 \oplus m_2$ may provide a lot of information on the individual messages. For these reasons the one-time pad is rarely used in practice, but many stream ciphers draw inspiration from it.
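The key-reuse problem can be illustrated with a minimal Python sketch. The two messages and the helper name `xor_bytes` are hypothetical and chosen only for illustration; the key stream is drawn from the operating system's random source.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two hypothetical messages of equal length (illustrative only).
m1 = b"attack at dawn"
m2 = b"retreat at six"

# One key stream, as long as the message, wrongly used twice.
k = secrets.token_bytes(len(m1))

c1 = xor_bytes(m1, k)  # E(m1, k)
c2 = xor_bytes(m2, k)  # E(m2, k)

# Decryption with the correct key recovers the plaintext.
recovered = xor_bytes(c1, k)

# XORing the two ciphertexts cancels the key: c1 ^ c2 == m1 ^ m2.
leak = xor_bytes(c1, c2)
```

Here `leak` no longer depends on `k` at all, so any structure shared by the two messages is exposed to Eve.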

3.2 Block Ciphers

A block cipher is a symmetric-key encryption scheme that operates on fixed-size blocks of data. Today's most popular block ciphers are the Advanced Encryption Standard and its predecessor the Data Encryption Standard.

Definition 3.2 (Block cipher). A block cipher is defined as an encryption scheme with the plaintext and ciphertext spaces $P = C = \{0, 1\}^m$ and the key space $K = \{0, 1\}^n$, where $m$ is the block size and $n$ is the key length.

Essentially, a block cipher can be seen as a substitution cipher but instead of mapping single symbols entire blocks of symbols are substituted. Ideally, the perfect block cipher would be able to output all possible permutations of $C$. There are $2^m!$ elements in the set of all permutations of $\{0, 1\}^m$. In order to be able to generate every element the key must be $\log_2(2^m!) \approx (m - 1.44) 2^m$ bits long [3]. This number is huge and there is no way to use keys of that size in practice. Instead, block cipher designers try to approximate the ideal behaviour.
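The approximation follows from Stirling's formula, $\log_2(N!) \approx N \log_2 N - N \log_2 e$ with $N = 2^m$ and $\log_2 e \approx 1.44$. The following sketch compares it numerically with $\log_2(2^m!)$ evaluated through the log-gamma function; the function names are hypothetical and the block sizes are small illustrative values.

```python
import math


def ideal_key_bits(m: int) -> float:
    """log2((2**m)!) evaluated numerically via the log-gamma function."""
    n = 2 ** m
    return math.lgamma(n + 1) / math.log(2)


def approx_key_bits(m: int) -> float:
    """The approximation (m - 1.44) * 2**m quoted in the text."""
    return (m - 1.44) * 2 ** m


for m in (4, 8, 16):
    print(m, round(ideal_key_bits(m)), round(approx_key_bits(m)))
```

The relative error shrinks as $m$ grows, and already for a toy block size of 16 bits the ideal key would be close to a million bits long.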

3.2.1 Design Criteria

There are two important design criteria for block ciphers, namely diffusion and confusion. Diffusion means that a small change in the plaintext should cause the ciphertext to change significantly. This is sometimes referred to as the avalanche effect. Another way to define diffusion is through the strict avalanche criterion. Diffusion is required to force an adversary to use full block statistics rather than single letter statistics.

Definition 3.3 (Strict avalanche criterion). If a single bit in the plaintext is flipped, every bit in the ciphertext should flip with the probability $\frac{1}{2}$.

Confusion refers to the property that every bit in the ciphertext should depend on multiple bits of the key. The goal is to make the relation between ciphertext and key as complex as possible, which can be accomplished by using non-linear transformations.

3.2.2 Modes of Operation

By design stream ciphers can handle arbitrarily sized data but block cipher primitives are limited to messages with a length equal to the block size. To deal with this limitation a so-called mode of operation is implemented on top of the block cipher. The mode of operation specifies how to securely reuse the same block cipher with the same key over multiple blocks of data. While providing confidentiality is the primary objective, some modes of operation are designed to incorporate message authentication as well. This thesis' objective is mainly to study attacks against the AES primitive, so this section will only briefly cover some of the most common modes of operation. Two modes, Electronic Codebook (ECB) and Cipher Block Chaining (CBC), are presented in the following sections to illustrate the impact of choosing a suitable mode. Other examples are Cipher Feedback (CFB), Output Feedback (OFB) and Counter (CTR). These other modes are effectively stream ciphers that use the block cipher as a PRNG.

Electronic Codebook

The ECB mode of operation constitutes the most straightforward method to encrypt messages of any size. The message is divided into chunks of the same size as the block size of the underlying block cipher and each chunk is then encrypted individually. Decryption is performed in the same way. ECB has a serious weakness in its inability to hide data patterns. Identical plaintext blocks will always encrypt to the same ciphertext block, which enables Eve to construct a codebook by observing messages sent between Alice and Bob. Since most messages contain some structure Eve may be able to determine the message's context or even modify the message. The ECB mode of operation should generally not be used for arbitrary data.

Cipher Block Chaining

CBC is intended to reduce some of the problems encountered in ECB by making the next ciphertext output depend on the previous one. Similarly to ECB the message is divided into blocks, but before a block is encrypted it is XORed with the previous ciphertext block. This removes the data pattern problem since two blocks with the same data will result in two different ciphertexts. As there is no previous ciphertext for the first block something called an initialization vector (IV) is supplied. The IV is a random number sent in the clear with the encrypted message. It is important that the IV is random, otherwise the adversary can detect when two identical messages are encrypted.
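The difference between the two modes can be sketched in a few lines of Python. The block function below is a toy four-round Feistel construction built on SHA-256; it is a hypothetical stand-in for a real block cipher, not AES and not secure. The key, message and fixed IV are likewise illustrative assumptions, and padding is left out.

```python
import hashlib

BLOCK = 16  # block size in bytes (assumed; the same as AES)


def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Toy 4-round Feistel permutation. NOT a secure cipher and NOT AES."""
    half = BLOCK // 2
    left, right = block[:half], block[half:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:half]
        left, right = right, bytes(a ^ b for a, b in zip(left, f))
    return left + right


def split_blocks(data: bytes) -> list:
    """Split data into whole blocks; the sketch does no padding."""
    if len(data) % BLOCK:
        raise ValueError("sketch assumes whole blocks (no padding)")
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(data: bytes, key: bytes) -> list:
    """Encrypt every block independently."""
    return [toy_encrypt_block(b, key) for b in split_blocks(data)]


def cbc_encrypt(data: bytes, key: bytes, iv: bytes) -> list:
    """XOR every block with the previous ciphertext block before encrypting."""
    out, prev = [], iv
    for b in split_blocks(data):
        prev = toy_encrypt_block(bytes(x ^ y for x, y in zip(b, prev)), key)
        out.append(prev)
    return out


key = b"hypothetical key"
message = b"SAME 16B BLOCK!!" * 2  # two identical plaintext blocks
iv = bytes(range(BLOCK))  # fixed for reproducibility; must be random in practice

ecb = ecb_encrypt(message, key)
cbc = cbc_encrypt(message, key, iv)
```

With ECB the two ciphertext blocks are identical, which reveals the repetition in the plaintext; with CBC they differ because the second block is mixed with the first ciphertext block before encryption.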

3.3 The Advanced Encryption Standard

The Advanced Encryption Standard (sometimes known as Rijndael after its creators Vincent Rijmen and Joan Daemen) is an encryption standard ratified by the U.S. National Institute of Standards and Technology (NIST) in 2001 [9]. In 1997 NIST announced that they were looking to replace the old Data Encryption Standard (DES) and invited the cryptologic community to take part in the process. In addition to analysing the security aspects of the algorithm, participants were asked to take the implementation costs in both software and hardware into consideration. Fifteen algorithms were evaluated and out of five finalists Rijndael was chosen as the winner.

3.3.1 Notation

Some notation must be introduced before describing the algorithm. In AES the smallest unit is the byte. Without going into details, every byte corresponds to an element in a finite field (or Galois field) denoted as $\mathbb{F}_{2^8}$ and all arithmetic is performed in this finite field. A group of four bytes is called a word. A word $x$ consisting of the bytes $a$, $b$, $c$ and $d$ is written as $x = \{a, b, c, d\}$. Let $N_b$ denote the block size in words, $N_r$ denote the number of rounds and $N_k$ denote the key length, also in words. Every round, $n$, is associated with a 16-byte round key denoted $r^n$. The current progress of the algorithm is stored in an array of 16 bytes called the state. The state array can be viewed as a 4×4 column-major matrix and it is denoted by $S$. Finally, whenever the state is updated this is indicated by a prime, e.g. $S'$. The chapters on power analysis often refer to something called a sub-key. The $i$th sub-key of $k$ is written as $k_i$ and corresponds to the $i$th byte of $k$. A 16-byte key therefore consists of 16 sub-keys.

3.3.2 Algorithm Structure

The AES algorithm is a block cipher working on blocks of 128 bits. Rijndael supports different block sizes but in the AES specification it is fixed to four words, i.e. $N_b = 4$. Supported key lengths are 128 bits, 192 bits and 256 bits and the number of rounds, $N_r$, is 10, 12 and 14, respectively, for each key length. Every round consists of a number of transformations operating on the state. Figure 3.1 presents the algorithm as a block diagram. At the start of the algorithm, the input is copied into the state. Let $b$ hold the input bytes $b_0, b_1, \ldots, b_{15}$. The state is written as:

\[
S = \begin{pmatrix}
b_0 & b_4 & b_8 & b_{12} \\
b_1 & b_5 & b_9 & b_{13} \\
b_2 & b_6 & b_{10} & b_{14} \\
b_3 & b_7 & b_{11} & b_{15}
\end{pmatrix}
= \begin{pmatrix}
s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\
s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\
s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\
s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3}
\end{pmatrix}
\]

AddRoundKey

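In AddRoundKey the state is modified by XORing it bytewise with the current round key $r^n$, so that state byte $s_{i,j}$ is combined with round key byte $r^n_{i+4j}$. How the round keys are derived from the secret key is explained in section 3.3.3.

\[
S' = \begin{pmatrix}
s_{0,0} \oplus r^n_0 & s_{0,1} \oplus r^n_4 & s_{0,2} \oplus r^n_8 & s_{0,3} \oplus r^n_{12} \\
s_{1,0} \oplus r^n_1 & s_{1,1} \oplus r^n_5 & s_{1,2} \oplus r^n_9 & s_{1,3} \oplus r^n_{13} \\
s_{2,0} \oplus r^n_2 & s_{2,1} \oplus r^n_6 & s_{2,2} \oplus r^n_{10} & s_{2,3} \oplus r^n_{14} \\
s_{3,0} \oplus r^n_3 & s_{3,1} \oplus r^n_7 & s_{3,2} \oplus r^n_{11} & s_{3,3} \oplus r^n_{15}
\end{pmatrix}
\]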

Figure 3.1: Structure of the AES algorithm. $N_r$ is the number of rounds. [Block diagram: In → AddRoundKey (key whitening) → $N_r - 1$ rounds of SubBytes, ShiftRows, MixColumns, AddRoundKey → final round of SubBytes, ShiftRows, AddRoundKey → Out.]

SubBytes

SubBytes substitutes the state by applying a non-linear transformation independently on every byte. The substitution function $S$, called the S-box, is constructed by combining two invertible functions, $g$ and $h$, so that $S(x) = h(g(x))$. The two functions are defined as

\[ g : \mathbb{F}_{2^8} \to \mathbb{F}_{2^8}, \quad x \mapsto x^{-1} \]

\[ h : \mathbb{F}_{2^8} \to \mathbb{F}_{2^8}, \quad x \mapsto Ax + b \]

where $h$ is an affine transformation operating on the bits of $x$ and $A$ and $b$ are

\[
A = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{pmatrix},
\quad
b = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}.
\]
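As a minimal sketch, the S-box can be tabulated directly from $g$ and $h$. The sketch assumes the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$ and the bit ordering of the AES specification (the first row and column of $A$ belong to the least significant bit), neither of which is stated above, and it uses the convention $g(0) = 0$. All function and variable names are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the AES polynomial (assumed, see lead-in)
        b >>= 1
    return result


def g(x: int) -> int:
    """Multiplicative inverse in GF(2^8), with zero mapped to zero."""
    if x == 0:
        return 0
    # x^254 equals x^(-1) because the multiplicative group has order 255.
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


# Rows of A; column j multiplies bit j of x (bit 0 = least significant bit).
A = [
    [1, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1, 1],
]
B_VEC = [1, 1, 0, 0, 0, 1, 1, 0]


def h(x: int) -> int:
    """Affine transformation Ax + b over the bits of x."""
    bits = [(x >> j) & 1 for j in range(8)]
    out = 0
    for i in range(8):
        bit = B_VEC[i]
        for j in range(8):
            bit ^= A[i][j] & bits[j]
        out |= bit << i
    return out


# The complete S-box as a 256-entry lookup table.
SBOX = [h(g(x)) for x in range(256)]
```

Computing the inverse by repeated multiplication is slow but keeps the sketch short; the table only has to be built once.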

Zero has no inverse in $\mathbb{F}_{2^8}$. This is solved by mapping zero to itself, i.e. $g(0) = 0$. Finally, the updated state becomes:

\[
S' = \begin{pmatrix}
S(s_{0,0}) & S(s_{0,1}) & S(s_{0,2}) & S(s_{0,3}) \\
S(s_{1,0}) & S(s_{1,1}) & S(s_{1,2}) & S(s_{1,3}) \\
S(s_{2,0}) & S(s_{2,1}) & S(s_{2,2}) & S(s_{2,3}) \\
S(s_{3,0}) & S(s_{3,1}) & S(s_{3,2}) & S(s_{3,3})
\end{pmatrix}
\]

SubBytes is generally efficiently implemented as a lookup table as there are only 256 elements in $\mathbb{F}_{2^8}$.

ShiftRows

The state is modified by rotating each row a fixed number of bytes to the left. The $i$th row of $S$ is rotated $i$ bytes, which means that row 0, $S_{0\bullet}$, is left unchanged.

\[
S = \begin{pmatrix}
s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\
s_{1,0} & s_{1,1} & s_{1,2} & s_{1,3} \\
s_{2,0} & s_{2,1} & s_{2,2} & s_{2,3} \\
s_{3,0} & s_{3,1} & s_{3,2} & s_{3,3}
\end{pmatrix}
\Rightarrow
S' = \begin{pmatrix}
s_{0,0} & s_{0,1} & s_{0,2} & s_{0,3} \\
s_{1,1} & s_{1,2} & s_{1,3} & s_{1,0} \\
s_{2,2} & s_{2,3} & s_{2,0} & s_{2,1} \\
s_{3,3} & s_{3,0} & s_{3,1} & s_{3,2}
\end{pmatrix}
\]

MixColumns

MixColumns transforms the state by treating every column in $S$ as a polynomial over $\mathbb{F}_{2^8}$. The polynomials are multiplied by the fixed polynomial $a(x) = 3x^3 + x^2 + x + 2$ modulo $x^4 + 1$. This can be seen as a matrix multiplication so that the new state becomes:

\[
\begin{pmatrix} s'_{0,j} \\ s'_{1,j} \\ s'_{2,j} \\ s'_{3,j} \end{pmatrix}
= \begin{pmatrix}
2 & 3 & 1 & 1 \\
1 & 2 & 3 & 1 \\
1 & 1 & 2 & 3 \\
3 & 1 & 1 & 2
\end{pmatrix}
\begin{pmatrix} s_{0,j} \\ s_{1,j} \\ s_{2,j} \\ s_{3,j} \end{pmatrix}
\]

Similarly to the S-box, multiplication by two and three in $\mathbb{F}_{2^8}$ can be precomputed for all possible byte values and implemented as lookup tables.
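The lookup-table approach to MixColumns can be sketched as follows. The sketch assumes the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$ for the field arithmetic; the table and function names are illustrative.

```python
def xtime(a: int) -> int:
    """Multiply by 2 (i.e. by x) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


# Precomputed lookup tables for multiplication by two and by three.
MUL2 = [xtime(v) for v in range(256)]
MUL3 = [xtime(v) ^ v for v in range(256)]  # 3*v = 2*v + v in GF(2^8)


def mix_column(col: list) -> list:
    """Apply the MixColumns matrix to one four-byte column of the state."""
    s0, s1, s2, s3 = col
    return [
        MUL2[s0] ^ MUL3[s1] ^ s2 ^ s3,
        s0 ^ MUL2[s1] ^ MUL3[s2] ^ s3,
        s0 ^ s1 ^ MUL2[s2] ^ MUL3[s3],
        MUL3[s0] ^ s1 ^ s2 ^ MUL2[s3],
    ]


example = mix_column([0xDB, 0x13, 0x53, 0x45])
```

On an 8-bit microcontroller the two tables cost 512 bytes of memory; computing `xtime` on the fly instead trades memory for a few extra instructions per byte.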

3.3.3 Key Schedule

The key is mixed with the plaintext in the form of so-called round keys. A new round key is required for every invocation of AddRoundKey, which means that a total of $N_r + 1$ round keys must be generated. The AES key schedule generates a string of $N_b \times (N_r + 1)$ words called the expanded key, denoted $w$. This can be written as

\[ w = r^0 r^1 \ldots r^{N_r} \]

where $r^i$ is the $i$th round key. Let $w_i$, $0 \le i < N_b \times (N_r + 1)$, denote the $i$th word of the expanded key. Two additional functions are defined for the key schedule: SubWord and RotWord. These transformations take words as input. SubWord simply applies the S-box to every byte in the given word while RotWord performs a circular shift one byte to the left. Additionally, a word array of round constants often called Rcon, denoted here as $c$, is defined such that $c_i = \{2^{i-1}, 0, 0, 0\}$ for $i \ge 1$. As with all arithmetic in the algorithm, $2^{i-1}$ is a power of 2 in $\mathbb{F}_{2^8}$.

The first $N_k$ words of $w$ are set to the secret key. The rest of the words are calculated so that $w_i$ depends on $w_{i-1}$ and $w_{i-N_k}$. Algorithm 1 presents the entire key schedule.

Algorithm 1: Generate expanded key

Input: Secret key k (4 × Nk bytes)
Output: Expanded key w (Nb × (Nr + 1) words)

for i ← 0 to Nk − 1 do
    w_i ← {k_4i, k_4i+1, k_4i+2, k_4i+3}
end
for i ← Nk to Nb × (Nr + 1) − 1 do
    t ← w_(i−1)
    if i mod Nk = 0 then
        t ← SubWord(RotWord(t)) ⊕ c_(i/Nk)
    else if Nk > 6 and i mod Nk = 4 then
        t ← SubWord(t)
    end
    w_i ← w_(i−Nk) ⊕ t
end
return w

In a resource-constrained environment the presented algorithm might not be ideal as the entire expanded key occupies a lot of memory. Any round key can be derived from the previous round key and a common approach is to perform the key schedule on the fly.
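A direct Python transcription of Algorithm 1 is sketched below. It assumes the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$ and builds the S-box with the rotation form of the affine transformation, which is an equivalent formulation taken from common AES implementations rather than from this text. The function names are illustrative, and the key used in the example is the AES-128 example key from Appendix A.1 of the AES specification (FIPS 197).

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (assumed polynomial)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value n positions to the left."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """S(x) = h(g(x)); the affine map h is written with rotations."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 = x^(-1) in GF(2^8)
                inv = gf_mul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def sub_word(word: list) -> list:
    """Apply the S-box to every byte of a word."""
    return [SBOX[b] for b in word]


def rot_word(word: list) -> list:
    """Circular shift one byte to the left."""
    return word[1:] + word[:1]


def expand_key(key: bytes, nb: int = 4) -> list:
    """Return the expanded key as a list of Nb * (Nr + 1) four-byte words."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 16, 24 or 32 bytes")
    nk = len(key) // 4
    nr = nk + 6  # 10, 12 or 14 rounds
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1  # first byte of c_1, i.e. 2^0
    for i in range(nk, nb * (nr + 1)):
        t = list(w[i - 1])
        if i % nk == 0:
            t = sub_word(rot_word(t))
            t[0] ^= rcon  # XOR with c_(i/Nk) = {2^(i/Nk - 1), 0, 0, 0}
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            t = sub_word(t)
        w.append([a ^ b for a, b in zip(w[i - nk], t)])
    return w


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
w = expand_key(key)
first_derived_word = bytes(w[4]).hex()
```

Round key $r^n$ corresponds to the words `w[4 * n]` to `w[4 * n + 3]`. An on-the-fly variant would keep only the most recent $N_k$ words and the current round constant instead of the whole list.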

3.3.4 Decryption

A straightforward method of decryption is to just invert all the transformations and perform them in reverse order. With a slight modification to the key schedule it is possible to avoid changing the algorithm structure; it is then sufficient to replace the transformations with their inverses: InvSubBytes, InvShiftRows and InvMixColumns. AddRoundKey is its own inverse and can be reused directly. InvShiftRows is simply ShiftRows with the rotations being to the right instead of the left. In InvSubBytes the same operations are performed but with the inverse of the S-box: $S^{-1}(x) = g^{-1}(h^{-1}(x)) = g(h^{-1}(x))$. Similarly, InvMixColumns uses the inverse of the polynomial $a(x)$, i.e. $a^{-1}(x) = 11x^3 + 13x^2 + 9x + 14$, but is otherwise unchanged.

The key schedule is modified by applying InvMixColumns to all round keys except the first and the last ones. The round keys are then applied in the opposite order; the first round of decryption uses the last round key of the modified key schedule.
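That $a^{-1}(x)$ really inverts $a(x)$ can be checked by multiplying the two polynomials modulo $x^4 + 1$. The sketch below does this with coefficients in $\mathbb{F}_{2^8}$, assuming the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$; the function names are illustrative.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (assumed polynomial)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return result


def poly_mul_mod(p: list, q: list) -> list:
    """Multiply p(x) * q(x) modulo x^4 + 1.

    Coefficient lists are ordered from the constant term upwards.
    """
    out = [0, 0, 0, 0]
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            # x^4 = 1 modulo x^4 + 1, so exponents wrap around modulo 4.
            out[(i + j) % 4] ^= gf_mul(p_i, q_j)
    return out


a = [2, 1, 1, 3]         # a(x) = 3x^3 + x^2 + x + 2
a_inv = [14, 9, 13, 11]  # a^-1(x) = 11x^3 + 13x^2 + 9x + 14

product = poly_mul_mod(a, a_inv)
```

The product is the constant polynomial 1, so applying InvMixColumns after MixColumns returns every column to its original value.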


4

Power Consumption

In this chapter the power consumption of a microcontroller is discussed, along with how it can be used in power analysis. The chapter begins with a quick overview of an inverter and its power consumption followed by a structural description of a microcontroller. In a power analysis attack the adversary must model the power consumption and this is further discussed in sections 4.4 and 4.5.

4.1 The Inverter

Since the 1970s the semiconductor industry has been dominated by complementary metal-oxide-semiconductor (CMOS) technology. Ease of scaling, a high noise resistance and low static power consumption are some of the reasons why CMOS is so popular. A basic understanding of the power characteristics of a CMOS device is therefore of interest. The power dissipation of a CMOS circuit can be divided into three parts: the dynamic power consumption, the static power consumption and the power dissipation caused by short circuits [10].

4.1.1 Dynamic Power

Figure 4.1a shows a typical CMOS inverter where $C_{load}$ is the load capacitance on the inverter's output. Dynamic power consumption is due to charging and discharging $C_{load}$. When $V_{in}$ is high, the PMOS transistor is turned off while the NMOS transistor is conducting, effectively shorting $V_{out}$ to ground and causing $C_{load}$ to discharge. When $V_{in}$ goes low the opposite occurs and the load capacitance is charged from zero to $V_{dd}$. This means that power is only drawn from the supply when the input goes from high to low. The latter scenario is illustrated in figure 4.1b and equation (4.1) derives the energy consumed during the charging of $C_{load}$.

\[
E = \int_0^\infty V_{dd} \, i(t) \, dt
= \int_0^\infty V_{dd} C_{load} \frac{dv_{out}}{dt} \, dt
= \int_0^{V_{dd}} C_{load} V_{dd} \, dv_{out}
= C_{load} V_{dd}^2
\tag{4.1}
\]

Figure 4.1: Illustration of an inverter's power consumption. (a) A CMOS inverter. (b) Charging of load capacitance when the input goes from high to low.

In order to calculate the dynamic power consumption it is necessary to determine how often the inverter toggles, or rather, how often the output goes from low to high.

Definition 4.1 (Switch activity). The switch activity is the probability that the output of a node goes from logic zero to logic one during one clock period.

Equation (4.2) presents the dynamic power consumption where $\alpha$ is the switch activity of the CMOS inverter and $f$ is the circuit's clock frequency.

\[ P_d = \alpha f E = \alpha f C_{load} V_{dd}^2 \tag{4.2} \]

A direct consequence of equation (4.2) is that the dynamic power consumption is dependent on the input of the circuit as that is what determines the switch activity. Thus, data variations can be related to variations in the power consumption, which is the focus of power analysis.
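Equations (4.1) and (4.2) can be checked numerically with a small sketch. It assumes that the load is charged through a fixed on-resistance, so that the current follows the usual RC charging curve; this resistance and all component values below are hypothetical and only serve as an illustration.

```python
import math


def charging_energy(c_load: float, v_dd: float, r_on: float,
                    steps: int = 20_000) -> float:
    """Numerically integrate V_dd * i(t) while C_load charges through r_on."""
    tau = r_on * c_load
    t_end = 20 * tau  # long enough for the current to die out
    dt = t_end / steps
    energy = 0.0
    for n in range(steps):
        t = (n + 0.5) * dt  # midpoint rule
        i = (v_dd / r_on) * math.exp(-t / tau)  # RC charging current
        energy += v_dd * i * dt
    return energy


def dynamic_power(alpha: float, f: float, c_load: float, v_dd: float) -> float:
    """Equation (4.2): P_d = alpha * f * C_load * V_dd^2."""
    return alpha * f * c_load * v_dd ** 2


C_LOAD = 10e-15  # 10 fF (hypothetical)
V_DD = 5.0       # volts (hypothetical)

e_1k = charging_energy(C_LOAD, V_DD, r_on=1e3)
e_10k = charging_energy(C_LOAD, V_DD, r_on=1e4)
p_d = dynamic_power(alpha=0.5, f=8e6, c_load=C_LOAD, v_dd=V_DD)
```

Both integrals come out at $C_{load} V_{dd}^2$ regardless of the resistance, which is why only the capacitance, the supply voltage and the switch activity appear in equation (4.2).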

4.1.2 Static Power

One of the main benefits of CMOS technology is the low static power consumption. Between the triggering clock edges, the inverter's input remains constant and one of the transistors will be off and thus, no current can flow from the power supply to ground. Unfortunately, the transistors are not ideal, and there will always be some leakage current. The static power consumption is independent of the input data and generally very small compared to the dynamic power consumption. Therefore, it is of little interest in the context of power analysis.


4.1.3 Short Circuit Power

Short circuit power dissipation is caused by the rise and fall times of the digital input signal. At some point, both transistors will conduct at the same time and there will be a direct path between the power supply and ground. The resulting current is called the short circuit current. Similarly to the static case, the short circuit power consumption is generally a lot smaller than the dynamic power consumption and is not further regarded in this thesis.

4.2 The Microcontroller

A microcontroller is an integrated circuit that contains a processor, some memory and often peripherals such as serial communication buses, data converters etc. The processing power of a microcontroller is usually significantly lower than that of a microprocessor used in a personal computer but microcontrollers are a lot cheaper, consume less power and produce less heat. This, coupled with the previously mentioned integration of memory and peripherals, makes microcontrollers suitable for embedded applications.

4.2.1 Structure

Arguably, the most important component in a microcontroller is the central processing unit (CPU). Its main responsibility lies in fetching instructions from memory, decoding them and finally executing them. The CPU itself consists of a control unit, an arithmetic logic unit (ALU) and a number of registers. The CPU is connected with the microcontroller's main memory and peripherals through a bus as depicted in figure 4.2. In an 8-bit microcontroller the bus is eight bits wide.

Figure 4.2: Block structure of a microcontroller. [Block diagram: CPU, memory and peripherals connected through a bus.]

When reading from (or writing to) memory the eight bus wires are charged to one or discharged to zero depending on the bit that is sent. Generally, bus wires are long and have high capacitive loads. Thus, the power consumption of transferring data is high and constitutes a large part of the overall power consumption.

4.2.2 Operation

While a thorough discussion of the internals of the targeted microcontroller would prove helpful in some power analysis attacks, particularly if reverse engineering the firmware is of interest, the attacks discussed in this thesis do not directly benefit from this knowledge.


4.3 Measuring Power Consumption

There are numerous ways of measuring power. One of the simplest methods is to measure the voltage over a resistor connected in series between the power supply and the microcontroller. Alternatively, a current probe can be used instead. In some situations inserting a resistor is not possible as any modifications would make it apparent that the device has been tampered with. Additionally, some microcontrollers may be powered internally from small batteries making it difficult to access the power wires. Alterations to the device can be avoided by measuring the electromagnetic field around the microcontroller. In power analysis the attacker is generally not interested in exact measurements of the power consumption but rather values proportional to it. This section shows two methods that can achieve this.

4.3.1 Shunt

The microcontroller's power consumption can be determined by measuring the voltage drop over a small resistor, known as a shunt, inserted between the $V_{dd}$-pin on the microcontroller and the power supply. This is called high-side sensing and is a technique for sensing currents. Figure 4.3 illustrates this scenario where the microcontroller is represented by a generic load with a resistance $R_{load}$. By measuring the voltage $V_{load}$ the power consumption can be calculated. The instantaneous power consumption of the microcontroller is given by equation (4.3).

\[ P = R_{load} I_{load}^2 \tag{4.3} \]

The current $I_{load}$ can be calculated by applying Ohm's law at the shunt resistor, as specified in equation (4.4).

\[ I_{load} = \frac{V_{shunt}}{R} = \frac{V_0 - V_{load}}{R} \tag{4.4} \]

The power consumption is proportional to the square of the current and $V_{load}$ is always lower than $V_0$, therefore $P$ is a strictly decreasing function with regards to $V_{load}$. That is, the lower the voltage is at $V_{load}$, the higher the power consumption is. It is similarly possible to insert a resistor between ground and the microcontroller's $V_{ss}$-pin, known as low-side sensing. The difference is that a high measured voltage would indicate a high power consumption. In a power analysis attack, the value of interest is $V_{load}$ while $P$ and $I_{load}$ are never calculated. Thus, $R_{load}$ does not have to be known.

Figure 4.3: High-side sensing with shunt. [Circuit: supply $V_0$, shunt $R$ with voltage $V_{shunt}$, current $I_{load}$, load $R_{load}$ with voltage $V_{load}$.] $V_{load}$ is the instantaneous voltage

A problem with high-side sensing is that it introduces a high common-mode voltage as, generally, $V_{load}$ is very close to $V_0$. This puts additional demands on the measurement circuitry. Specifically, the voltage range must be wide enough to include the entire power consumption signal while still maintaining a high enough resolution to provide acceptable results. Low-side sensing on the other hand removes the direct path to ground, which might cause behavioural issues. However, in the case of a microcontroller this should not be a problem. By using a differential probe instead of a single-ended one, both of these issues can be alleviated.

4.3.2 Probing the Electromagnetic Field

Another observable quantity is the electromagnetic radiation emitted by the microcontroller. The movement of charge carriers generates a magnetic field that varies in a data-dependent manner. An H-field probe is a conducting wire (such as a coaxial cable) with a coil at the end. The probe is placed in the vicinity of the target so that the magnetic field induces a voltage in it. This voltage can then be measured using, for example, an oscilloscope.

Equation (4.5) describes how the magnetic field $\mathbf{B}$ is produced by a static electric current $I$ in a path $C$, where $\mu_0$ is the magnetic constant and $\mathbf{r}$ is a vector from the wire element $d\mathbf{l}$ to the point in space where the magnetic field is measured. This is known as the Biot-Savart law [11].

\[ \mathbf{B} = \frac{\mu_0 I}{4\pi} \int_C \frac{d\mathbf{l} \times \mathbf{r}}{|\mathbf{r}|^3} \tag{4.5} \]

Variations in the magnetic field will induce an electromotive force (EMF, i.e. a voltage) in the probe according to Faraday's law of induction as shown in (4.6), where $S$ is the surface area of the probe loop [12].

\[ \mathrm{emf} = -\frac{d}{dt} \int_S \mathbf{B} \cdot d\mathbf{s} \tag{4.6} \]

Again, the actual power consumption is never calculated. For the purpose of power analysis it is enough to know that the voltage measured in the H-probe is related to the microcontroller’s current.

4.4 Modelling Power Consumption

To execute a power analysis attack it is necessary to model the power consumption of the device under attack. More specifically, assumptions have to be made that relate the information leakage to the power consumption. One of the more common models is to assume that a device leaks the Hamming weight of the data it manipulates, which is often the case for microcontrollers [13]. While it is possible to use full-scale circuit simulations it is often not realistic as an attacker rarely has enough knowledge about the device's physical implementation to make such a model. Even with the aforesaid detailed knowledge the computational complexity of circuit simulations (through tools such as SPICE) would make an attack too time-consuming.

4.4.1 Binary Models

Binary models work on the assumption that if some predicate is true, the device consumes more power than when the predicate is false. For example, if the least significant bit (LSB) of some byte is set the modelled power consumption is one, otherwise zero. This model may seem crude as a single bit's contribution to the overall power consumption is fairly small at first glance, but in an 8-bit microcontroller every bit corresponds to one eighth of the data-dependent power consumption. However, and this is important to note, the actual contribution is not important as long as the difference can be detected. This holds true even for devices with wider words, such as 64-bit processors. The difference is harder to detect, but as long as it is possible the model serves its purpose.

4.4.2 Hamming Weight Model

In the Hamming weight model the power consumption is modelled to be proportional (or inversely proportional) to the Hamming weight of a binary number.

Definition 4.2 (Hamming weight). The Hamming weight of an $n$-bit binary number $B = b_{n-1} \ldots b_1 b_0$ is defined as

\[ \mathrm{HW}(B) = \sum_{i=0}^{n-1} b_i. \]

The Hamming weight effectively corresponds to the number of ones within the binary number. The model is motivated by the observation that power is only consumed in a CMOS device during switching, i.e. when data changes. A large part of the total power consumption is due to bus activity, such as when data is transferred between the microcontroller's memory and its registers. A basic assumption when applying the Hamming weight model is that all data transfers are pre-charged. This means that before the data is put on the bus, the bus wires are set to some predetermined value such as logic zero or one. If they are set to zero prior to the data transfer the number of transitions will be exactly the same as the number of ones within the data. Thus, in the Hamming weight model the instantaneous power consumption of the 8-bit microcontroller is modelled as a value between zero and eight.
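Both the binary (LSB) model and the Hamming weight model are one-line functions of the processed byte, as the following sketch shows. The function names are illustrative; the histogram at the end counts how many of the 256 possible byte values fall into each of the nine Hamming weight classes.

```python
from collections import Counter


def hamming_weight(value: int) -> int:
    """Number of ones in the binary representation of value."""
    return bin(value).count("1")


def lsb_model(value: int) -> int:
    """Binary model: modelled consumption is 1 if the LSB is set, else 0."""
    return value & 1


# How many byte values share each Hamming weight (0 to 8).
hw_histogram = Counter(hamming_weight(v) for v in range(256))
```

The classes are far from equally populated: only one byte value has weight zero while 70 values have weight four, so a Hamming weight observation narrows the candidates down without identifying the byte uniquely.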

4.4.3 Hamming Distance Model

An extension to the Hamming weight model is to use the Hamming distance to improve on some shortcomings of the previous model.

Definition 4.3 (Hamming distance). The Hamming distance between two $n$-bit binary numbers $A = a_{n-1} \ldots a_0$ and $B = b_{n-1} \ldots b_0$ is defined as

\[ \mathrm{HD}(A, B) = \sum_{i=0}^{n-1} |a_i - b_i| = \mathrm{HW}(A \oplus B). \]

The Hamming distance effectively measures the number of bits where two binary numbers differ and is particularly suitable for CMOS devices. Consider a generic CMOS device where some value is read from a register, processed and then stored in the same register. The number of bits that toggle is equivalent to the Hamming distance. In the microcontroller's case, suppose that the bus wires are not pre-charged to either ones or zeros but to some other, unknown constant. In this case the Hamming weight model does not correctly predict the power consumption. If it is possible to determine this constant, the Hamming distance model can be utilized instead.
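The two expressions in Definition 4.3 and the pre-charge scenario can be sketched as follows. The pre-charge constant `PRECHARGE` is a hypothetical value chosen for illustration, as are the function names.

```python
def hamming_weight(value: int) -> int:
    """Number of ones in the binary representation of value."""
    return bin(value).count("1")


def hamming_distance(a: int, b: int) -> int:
    """HD(A, B) = HW(A xor B)."""
    return hamming_weight(a ^ b)


def hamming_distance_bitwise(a: int, b: int, n: int = 8) -> int:
    """HD(A, B) as the sum of |a_i - b_i| over the n bit positions."""
    return sum(abs(((a >> i) & 1) - ((b >> i) & 1)) for i in range(n))


PRECHARGE = 0x5A  # hypothetical constant the bus is set to before a transfer


def bus_leakage(data: int, precharge: int = PRECHARGE) -> int:
    """Modelled number of bus wires that toggle when data is transferred."""
    return hamming_distance(precharge, data)
```

With a pre-charge value of zero `bus_leakage` reduces to the Hamming weight model, which shows that the Hamming weight is a special case of the Hamming distance.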

4.5 Power Consumption Components

Studying the instantaneous power consumption is of great interest in the context of power analysis. The power consumption is divided into two components:

• The leakage signal: The part of the power consumption that provides exploitable information.

• The noise signal: Everything that cannot be exploited.

The leakage signal is denoted $P_S$ while the noise signal is written as $P_N$. The total power consumption is presented in (4.7).

\[ P = P_S + P_N \tag{4.7} \]

$P$, $P_S$ and $P_N$ are modelled as random variables. The components may refer to very different things depending on what is being exploited. In general, $P_S$ depends on what the microcontroller is doing and the data that is being manipulated. Furthermore, it will depend on what power model is being used. In a binary power model the leakage signal consists of the contributions of one bit, while the other bits belong to the noise signal.

4.5.1 Noise

The noise component contains all contributions to the power consumption that are not of interest during power analysis. This section lists some sources of noise that are independent of the choice of power model.

Electronic Noise

In this thesis, electronic noise is used to refer to all noise components deriving from electronic components. This includes thermal noise, inherent to all electronic devices, and other device-specific sources such as shot noise. Electronic