Beyond Cryptography: A Multi-layer Approach to Communication Privacy

Academic year: 2021


Beyond Cryptography:

A Multi-layer Approach to

Communication Privacy

by

Nathaniel Lloyd Gross B.S., Dordt College, 2008

A thesis submitted to the Graduate Faculty of the University of Colorado Colorado Springs

in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering

Department of Electrical and Computer Engineering


This thesis for the Master of Science in Electrical Engineering by Nathaniel Lloyd Gross

has been approved for the

Department of Electrical and Computer Engineering by

Willie Harrison, Chair

Mark Wickert


Nathaniel Lloyd Gross

Master of Science in Electrical Engineering

Beyond Cryptography: A Multi-layer Approach to Communication Privacy

Thesis directed by Professor Willie Harrison

Recent developments in the field of physical-layer security are beginning to challenge commonly held assumptions about how security can be obtained in a communication system. Conventional security practices almost always rely on cryptographic protocols to ensure message confidentiality. Physical-layer security, however, seeks to complement or even replace higher-level cryptography with techniques that leverage low-level nonidealities such as channel noise in order to keep messages secure. The primary contributions of this work are twofold. First, we present an analysis of how errors introduced into ciphertext observed over a noisy wiretap channel degrade the performance of cipher attack algorithms, thereby granting an additional margin of security to the system; this analysis is performed using the simple substitution cipher. Second, we present proofs that rigorous secrecy metrics are applicable to a system in which a coset coding scheme is used over a multi-hop network.


Acknowledgements

There are several individuals who have contributed, whether directly or indirectly, to the accomplishment of this body of work, and it is important that they be recognized. I would first like to express my gratitude to my parents, who have long supported and encouraged me in my academic endeavors. I owe particular thanks to my father, who showed me even at an early age just how fascinating the field of engineering can be. In addition, I would not have the skill and tools to succeed in this field were it not for the instruction of my engineering professors at Dordt College. Each of them, but particularly my advisor Dr. DeBoer, taught me what it means to be a competent engineer who applies his knowledge ethically to find solutions to this world’s technical problems.

Of course, this work would only be a jumble of disconnected ideas without the involvement of my advisor at UCCS, Dr. Harrison. I would like to express my thanks to him for inviting me to work with him on some neat research projects.

And finally I would like to thank my wife, who has been my biggest support and encouragement throughout my time in graduate school. She has put up with my hectic schedule and many late nights of writing without complaint.


Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction

2 Background
2.1 Modern Communications Overview
2.2 Notational Conventions
2.3 Information Theory
2.4 Communication Channels
2.4.1 Binary Erasure Channel
2.4.2 Binary Symmetric Channel
2.5 Communication Privacy
2.5.1 Computational Security
2.5.2 Information-theoretic Security

3 Physical-Layer Effects On Cryptographic Attack Algorithms
3.1 Introduction
3.2 Background
3.2.1 Simple Substitution Cipher
3.2.2 Discrete Memoryless Symmetric Channels
3.2.3 Channel Noise Measure
3.2.4 Hidden Markov Models
3.2.5 Families of Attacks on Simple Substitution
3.3 HMM-Based Attack
3.3.1 Algorithm Description
3.3.2 Performance Measures
3.3.2.1 Percent Letter Resolution
3.3.2.2 Recovered Keypairs
3.4 Results
3.5 Conclusion

4 The Secrecy Of Coset Codes In A Multi-hop Network
4.1 Introduction
4.2 Background
4.2.1 The Multi-hop Network
4.2.2 Linear Block Codes
4.2.3 Coset Coding
4.3 Coding Technique for Physical-layer Security over Multi-hop Networks
4.4 Security Evaluation
4.5 Simulations
4.6 Conclusion

5 Conclusion
5.1 Future Work

Bibliography


List of Tables

3.1 Attack family performance comparison.


List of Figures

1.1 System model for conventional cryptography.

1.2 System model for wiretap channel coding.

2.1 General communication system model.

2.2 Binary erasure channel model.

2.3 Binary symmetric channel model.

3.1 General communication system model with relevant blocks for cryptography in the presence of noise highlighted.

3.2 System model depicting three players in a network. Alice encrypts and transmits a secret message to Bob using a key known only to the two legitimate users. Eve passively eavesdrops on the transmitted data, albeit through the error-prone wiretap channel Q, and then employs an attack on the error-prone ciphertext.

3.3 Depiction of a hidden Markov model.

3.4 Modified HMM for modeling plaintext states, ciphertext states, and noisy ciphertext states.

3.5 Attack performance on noiseless ciphertext in terms of the %LR of the guessed message plotted vs. the length of the ciphertext. The average of the data is plotted as well.

3.6 Attack performance on noiseless ciphertext in terms of the number of correctly guessed keypairs plotted vs. the length of the ciphertext. The average of the data (rounded up) is plotted as well.

3.7 Attack performance on a 1000-symbol cipher with a third-order curve fit and a 3σ upper bound.

3.8 Attack performance on a 1000-symbol cipher with a linear fit and a 3σ upper bound.

3.9 Attack performance on a 70-symbol cipher in terms of the number of correctly guessed keypairs plotted vs. the channel mutual information. The maximum number of recovered keypairs at I_Q(X; Z) = 2 has dropped to Alice's desired threshold of ρ ≤ 8.

4.1 General communication system model with important blocks for secrecy coding design highlighted.

4.2 Notional multi-hop network depiction with source/destination nodes, relay nodes, and eavesdropper nodes.

4.3 Multi-hop system model with codewords propagating over a series of main channels to Bob and over several wiretap channels to Eve.

4.4 A (4, 2) code and its cosets.

4.5 Illustration of various consistent coset possibilities.

4.6 Simulation results supporting the conjecture. Note that |S_i||S_j|/2^k ≤ |S_i ∩ S_j| holds for all cases simulated.

4.7 Simulation comparing the performance of the node relaying scheme to the node encoding scheme in terms of the average leaked information.

4.8 Simulated security performance of a (7, 3) coset code over a multi-hop network for various node counts.

4.9 Simulated upper bound on the performance of a (1023, 1013) LDPC secrecy code over a multi-hop network.


Chapter 1

Introduction

The theory and practice of secure communication has a history spanning many centuries, and continues to be an active area of research and innovation in the digital age of today. The study and implementation of techniques for secure communication, broadly labeled cryptography, has historical roots in ancient Egypt, where scribes would occasionally use nonstandard or unusual hieroglyphs in a block of writing in order to lend it a sense of mystery through obfuscation [1]. More structured methods for creating secret messages were invented by the Greeks and Romans, often for use in war. Historians record that Julius Caesar used a cryptographic method, or cipher, to communicate secret messages to his generals. This method involved shifting each letter in a message up by a certain number of characters in the alphabet, and it bears the Roman ruler’s name to this day: the Caesar cipher. More recent advances in cryptography include the electro-mechanical cipher machines used by Axis powers in World War II and the electronically-implemented symmetric-key and public-key algorithms that secure our digital communications in the present day.
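For concreteness, the shift scheme just described can be sketched in a few lines of code. Python is used here purely for illustration (the thesis text specifies no implementation); the shift of 3 echoes the traditional account of Caesar's usage.

```python
def caesar_encrypt(plaintext, key):
    """Shift each letter up by `key` positions in the 26-letter alphabet."""
    return "".join(
        chr((ord(c) - ord('a') + key) % 26 + ord('a')) if c.isalpha() else c
        for c in plaintext.lower()
    )

def caesar_decrypt(ciphertext, key):
    """Decryption is simply encryption with the negated key."""
    return caesar_encrypt(ciphertext, -key)

cipher = caesar_encrypt("attack at dawn", 3)
print(cipher)                      # dwwdfn dw gdzq
print(caesar_decrypt(cipher, 3))   # attack at dawn
```

Note that the entire secret here is the shift value, which previews the role of the key discussed next.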

Though cryptographic methods vary in complexity and sophistication, common to each is the transformation of ordinary message information called plaintext into obfuscated information called ciphertext. This transformation is called encryption, and the reverse operation from ciphertext to plaintext is called decryption. Successful encryption and decryption requires not only that the parties wishing to communicate securely agree upon the cryptographic method to use, but also that they each have the proper secret information called the key. Keys are used in the encryption and decryption processes to “lock” and “unlock” the secret message. In the case of the Caesar cipher mentioned above, the key is simply the specific number of letters that the message is shifted by in the alphabet. Of course, the security of a cryptosystem is contingent upon the secret keys remaining secret and inaccessible to any illegitimate parties. Tightly coupled with

(12)

the field of cryptography is the adversarial field of cryptanalysis, which seeks to exploit weaknesses either in cryptographic methods themselves or in their implementation in order to obtain hidden information (i.e., the plaintext, the key, or both). Many classic ciphers, for example, retain statistical information about the plaintext within the ciphertext. When the technique of frequency analysis was developed by the Arabs in the 9th century [1], they quickly realized that this technique could be leveraged to break

some of these classic ciphers. Advances in cryptography and cryptanalysis are often reciprocal in that if an accepted cipher is broken due to a new observation or technique in cryptanalysis, this will usually spur the development of new and improved ciphers and security techniques.
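Frequency analysis amounts to tabulating letter statistics. The following sketch (illustrative Python; the sample text is an arbitrary stand-in for a longer intercept) computes the statistic that such attacks compare against typical language frequencies:

```python
from collections import Counter

def letter_frequencies(text):
    """Relative frequency of each letter, the statistic at the heart
    of frequency analysis."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    return {c: n / total for c, n in Counter(letters).items()}

# A classic substitution cipher merely relabels letters, so the shape of
# this distribution survives encryption; matching ciphertext frequencies
# against typical plaintext frequencies suggests likely key assignments.
freqs = letter_frequencies("the quick brown fox jumps over the lazy dog the end")
most_common = max(freqs, key=freqs.get)
```

In real attacks the intercepted ciphertext would be far longer, making the frequency estimates correspondingly more reliable.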

The historical approach to secure communication has relied almost exclusively on conventional cryptography as outlined above. One underlying assumption in the cryptographic framework for secure communication is that all parties (both legitimate and illegitimate) have access to the complete, unaltered ciphertext. Thus this framework for secure communication can be cast in terms of the system model depicted in Figure 1.1. Two legitimate parties, Alice and Bob, wish to communicate securely in the presence of an illegitimate eavesdropper Eve. Alice encrypts a secret message M into a cryptogram X using a predetermined cipher and key K. The cryptogram is delivered by some means to its intended recipient Bob, but is also obtained by Eve. Bob uses his key to decrypt X and thereby obtain M, while Eve may use some cryptographic attack in an effort to obtain an estimate of the message ˆM and an estimate of the key ˆK. Again, this model assumes that the ciphertext is equally available to legitimate and illegitimate parties alike. However, with the relatively recent advent of wireless communications (which often involve noisy or lossy channels) this assumption is no longer universally applicable. Cryptography over wireless communication systems now introduces the possibility of error-prone ciphertext, which functioning systems must account for in one way or another. The accepted method for employing cryptography in (noisy) wireless environments is to combine cryptography with error-correcting codes. That is, error-correcting codes operating in a low layer of the protocol stack remove or mitigate the effects of physical-layer noise, so that the cryptographic algorithms operating in a high layer of the stack work as expected.

Some recent observations in secure communications have demonstrated that cryptography is not the only way to obtain security. In 1975 Aaron Wyner pioneered a revolutionary new framework for secure communications, one that does not rely on secret keys and ciphers but instead relies on the randomness present in noisy communication channels. His framework has evolved into the wiretap channel model depicted in Figure 1.2. In this model two parties, Alice and Bob, communicate over a main channel while a passive eavesdropper Eve observes codewords over a wiretap channel. Alice encodes a message


Figure 1.1: System model for conventional cryptography.

M into a codeword X using a coding scheme (as opposed to a cipher scheme), and transmits it over the main channel to Bob. The output of the main channel is the (potentially error-prone) vector Y, which Bob decodes to obtain an estimate of the message ˜M. The wiretap channel is typically a noisier version of the main channel, over which Eve obtains an error-prone vector Z. For this system, Csiszár and Körner proved that codes that achieve both reliability and secrecy can be constructed, provided that the wiretap channel is more noisy than the main channel [2]. This observation provides the cornerstone for a recent field of study called physical-layer security. Physical-layer security refers to any security methods or technologies that are implemented at the physical layer of the protocol stack [3].


Figure 1.2: System model for wiretap channel coding.

Though the field of physical-layer security is advancing at a rapid pace, it is still a relatively new field and there remain many problems to solve and facets to be investigated. This work offers two distinct contributions in the realm of physical-layer security, and builds upon some previously investigated problems. We provide the following overview as a guide through the material in this work. A system-level overview of modern secure communications is given in Chapter 2, along with notational conventions and some background information on the topics of information theory, communication channels, cryptography, and physical-layer security. Chapter 3 presents the first novel contribution of this work, which is a cryptanalysis of a cipher system in the presence of noise. In particular, we show using the simple substitution cipher how error-prone ciphertext makes

automated cryptanalysis more difficult, thereby augmenting the privacy level in a practical sense. Chapter 4 turns the focus from cryptography to secrecy coding, and details how secrecy codes can be made more secure over a multi-hop network by employing a simple node encoding scheme. Several proofs are given to establish information-theoretic security levels for this scenario. Finally, Chapter 5 draws some conclusions and offers ideas for future study.


Chapter 2

Background

Though the main thrust of this work is in the area of cryptography and physical-layer security, it is important to see where these topics fit within the broader context of modern secure communications. Thus we provide a brief system-level overview of modern secure communications and identify which aspects of the system are of interest to us. We then introduce the notation to be used going forward and offer the reader some necessary background information regarding information theory, communication channels, cryptography, and physical-layer security. Some explanation is then given on the comparative strengths and weaknesses of cryptography and physical-layer security.

2.1 Modern Communications Overview

Modern secure communication can generally be modeled by the system depicted in Figure 2.1 [4]. A (typically digital) message originates from an information Source and might pass through a Source Encoder, where the message data is compressed to remove redundancy. It is then processed by a Cryptographic Encoder, which hides the message by encryption with a given cryptographic algorithm. The enciphered message is then passed through a Channel Encoder, which prepares the data for transmission over some communication channel. Typically the channel encoder adds structured redundancy to the data so that any errors introduced by the channel may be corrected by a corresponding channel decoder. However, the channel encoder might alternatively or additionally introduce a structure to the data that, when combined with channel errors, serves to hide the true cipher from an eavesdropper. The Modulator is a device or scheme that maps the symbols output from the channel encoder to a physical representation appropriate for the communication channel, such as a time-varying voltage or radio-frequency waveform. The Channel itself can be any physical medium through which information


Figure 2.1: General communication system model.

can travel. Channels are rarely perfect; information passing through most real-world channels is subject to error of some form or degree. The Demodulator receives the message sent over the channel in a manner complementary to the modulator. The Channel Decoder acts to correct any errors introduced by the channel, and/or remove the secrecy structure that the channel encoder may have employed. The received cipher is then decrypted by the Cryptographic Decoder using a previously agreed-upon key. As a final step the decrypted message is decompressed using the Source Decoder, after which it arrives at its destination: the Sink. This model is intended to introduce and delineate most of the key components in a modern secure communication system; it may be the case that specific implementations have more or fewer components than the generic system described here.

2.2 Notational Conventions

In this work the following notational conventions will be used throughout.

x A scalar.

x^n A length-n row vector.

X_{m×n} An m × n matrix.

X A random variable.

𝒳 An alphabet of discrete symbols.

p_X(x) The probability mass function (pmf) of discrete random variable X.

H(X) The Shannon entropy of a discrete random variable X.


2.3 Information Theory

Many of the security metrics and results in this work are information-theoretic in nature, and therefore we provide a brief overview of some elements of information theory. Fundamentally, information theory provides the tools necessary to quantify information, and thus is very useful when dealing with information secured (cryptography) and information transferred (communications). A more detailed treatment of information theory concepts may be found in [5]. An information source can be mathematically modeled by a random variable, which is a variable defined over some set and whose realizations are governed by an underlying probability mass function (pmf). In the following definitions, X and Y are discrete random variables with pmfs p_X(x) and p_Y(y) defined over 𝒳 and 𝒴, respectively.

Definition 2.1. The entropy of X is defined to be

$$H(X) = -\sum_{x \in \mathcal{X}} p_X(x) \log_2 p_X(x). \tag{2.1}$$

Entropy is a measure of the uncertainty of a random variable, and equivalently tells us the amount of information gained upon realizing that random variable. We adhere to the common convention in information theory that 0 · log 0 = 0. Though entropy may be defined with the log to any base, in this work it is always to the base 2 and thus is always expressed in units of bits.
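Definition 2.1 is straightforward to evaluate numerically. The following sketch (illustrative Python, not part of the original text; the example pmfs are arbitrary) computes entropy in bits for two simple distributions:

```python
from math import log2

def entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as a list of probabilities,
    using the convention 0 * log 0 = 0 noted above."""
    return -sum(p * log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))    # a fair coin: exactly 1 bit of uncertainty
print(entropy([0.99, 0.01]))  # a heavily biased coin: far less uncertainty
```

A uniform distribution maximizes entropy; any bias toward one outcome reduces the uncertainty resolved by observing the source.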

Definition 2.2. The conditional entropy of X given Y is defined to be

$$\begin{aligned} H(X|Y) &= -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} p_{X,Y}(x,y) \log_2 p_{X|Y}(x|y) && (2.2) \\ &= \sum_{y \in \mathcal{Y}} p_Y(y)\, H(X|Y = y). && (2.3) \end{aligned}$$

Conditional entropy is a measure of the remaining uncertainty in a random variable X after a different random variable Y has been realized.

Definition 2.3. The mutual information between X and Y is defined to be

$$\begin{aligned} I(X;Y) &= \sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} p_{X,Y}(x,y) \log_2 \frac{p_{X|Y}(x|y)}{p_X(x)} && (2.4) \\ &= H(X) - H(X|Y). && (2.5) \end{aligned}$$

Mutual information is a measure of how much the uncertainty of X is reduced due to the knowledge of Y , or equivalently how much information either random variable tells

about the other. X and Y are said to be independent if and only if I(X; Y) = 0; in that case neither tells any information about the other. An ensemble of random variables is said to be independent and identically distributed (i.i.d.) if all the random variables are mutually independent and each has associated with it the same probability distribution.
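The quantities in Definitions 2.1 through 2.3 can be checked numerically. The sketch below (an illustrative Python fragment; the joint pmfs are contrived examples) computes I(X;Y) directly from a joint pmf, using p(x|y)/p(x) = p(x,y)/(p(x)p(y)):

```python
from math import log2

def mutual_information(joint):
    """I(X;Y) in bits, computed from a joint pmf {(x, y): p} per (2.4)."""
    px, py = {}, {}
    for (x, y), p in joint.items():       # marginalize to get p_X and p_Y
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent fair bits tell nothing about each other: I(X;Y) = 0.
indep = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# A fully dependent pair (Y = X) shares all of H(X) = 1 bit.
dep = {(0, 0): 0.5, (1, 1): 0.5}
```

The two contrived pmfs bracket the possible behaviors: zero mutual information for independence, and I(X;Y) = H(X) when one variable determines the other.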

2.4 Communication Channels

The results of this work rely heavily on certain assumptions made about the communication channel over which messages are transmitted. Thus it is important to be familiar with several basic channel models that are ubiquitous in studies on physical-layer security. These channel models are all specific varieties of a general type of channel called a discrete memoryless channel.

Definition 2.4. A discrete memoryless channel (DMC) is a channel that can be described by an input discrete random variable X, an output discrete random variable Y, and a set of conditional probabilities p(y|x) that describe the likelihood of transitions from each x ∈ 𝒳 to each y ∈ 𝒴. The statement that such a channel is memoryless means that the ith output is a function only of the ith input.

Some of the most relevant types of DMCs are presented below.

2.4.1 Binary Erasure Channel

A binary erasure channel (BEC) is a channel model in which a binary alphabet 𝒳 = {0, 1} on the input side of the channel is mapped to a ternary alphabet 𝒴 = {0, 1, ?} on the output side, as depicted in Figure 2.2. The ‘?’ symbol indicates that the output is indeterminate, and is called an erasure. Erasures occur with probability ε and are independent of one another. Conversely, with probability 1 − ε the output symbol is equal to the input symbol.

Figure 2.2: Binary erasure channel model.


Though certainly a very simplistic model, the BEC is valuable as a starting point for designing codes. In fact, codes developed for the BEC tend to work well for real-world channels also [6].
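The BEC is also trivial to simulate. The following sketch (illustrative Python; the erasure probability and bit pattern are arbitrary) erases each bit independently with probability ε:

```python
import random

def bec(bits, eps, rng=random):
    """Binary erasure channel: each input bit is independently replaced
    by the erasure symbol '?' with probability eps, else passed through."""
    return ['?' if rng.random() < eps else b for b in bits]

rng = random.Random(0)                              # fixed seed for repeatability
received = bec([0, 1] * 5000, eps=0.1, rng=rng)
erasure_rate = received.count('?') / len(received)  # concentrates near eps
```

Because non-erased symbols arrive intact, the receiver always knows exactly which positions were lost, which is what makes code design for the BEC comparatively tractable.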

2.4.2 Binary Symmetric Channel

A binary symmetric channel (BSC) is a channel model that maps binary input symbols to binary output symbols, with some probability p of an input bit being flipped. With probability 1 − p bits pass through the channel unaltered, as shown in Figure 2.3. If a bit is flipped while passing through the channel then an error has occurred. Like the BEC, the BSC also provides a good starting point in the design of codes for complex real-world channels.

Figure 2.3: Binary symmetric channel model.
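The BSC admits an equally short simulation (again an illustrative Python sketch; the crossover probability and bit pattern are arbitrary):

```python
import random

def bsc(bits, p, rng=random):
    """Binary symmetric channel: each bit is independently flipped
    with crossover probability p."""
    return [b ^ 1 if rng.random() < p else b for b in bits]

rng = random.Random(0)                    # fixed seed for repeatability
sent = [0, 1] * 5000
received = bsc(sent, p=0.2, rng=rng)
error_rate = sum(s != r for s, r in zip(sent, received)) / len(sent)
```

Unlike the BEC, the receiver cannot tell which positions were corrupted; the measured error rate simply concentrates near p as the block grows.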

2.5 Communication Privacy

It is imperative going forward to have an understanding of what is meant generally by communication privacy as well as some of the specific methods by which it is achieved. We use the term communication privacy to refer to the ability of multiple entities to selectively transfer information to one another. This definition is broad by choice, and is intended to encompass not only architected techniques for hiding information, but also secondary effects that may assist in keeping information hidden. By way of example, one might use a lockbox and a key to secure important physical documents (a security technique), and happen to store the key in a place where nobody is likely to find it (a privacy-enhancing side-effect). Sometimes this is referred to as “security-through-obscurity,” but here we reserve the term security for any privacy that is obtained via conscious system design practices.

In cases where a security technique (cryptography, physical-layer security, etc.) is used in a system, it is natural to desire some concrete metric for measuring just how secure the system is. As we will discover, there exist several different metrics for analyzing security.


2.5.1 Computational Security

Computational security is any security obtained using computational hardness assumptions. That is, a system is deemed computationally secure if an attack on the system is intractable using state-of-the-art computing resources. Nearly all modern cryptographic methods base their security on this assumption. An attack on a modern cipher might require solving an integer factorization problem [7] or a lattice optimization problem [8], which may be theoretically possible given enough time or computing power, but practically out of reach due to limitations in current technology. Typically, the metrics used for evaluating computationally secure systems are the computing power or time (or both) required to successfully attack the cipher. Of course, computational security is not a truly rigorous statement of security because it provides no theoretical guarantees. There exists a different type of security metric, however, that does provide such guarantees.

2.5.2 Information-theoretic Security

In his 1949 paper [9], Claude Shannon introduced the notion of perfect secrecy and offered a definition in terms of information theory. His requirement for perfect secrecy was that for a cipher encrypting a message M into a cryptogram X, the conditional entropy of the message given the cryptogram H(M|X) be equal to the message entropy H(M). This definition is formally expressed in terms of mutual information as follows.

Definition 2.5. If a message M is encoded to form a codeword X, then perfect secrecy is achieved if

$$I(M; X) = 0. \tag{2.6}$$

If a system is perfectly secure then an attacker can obtain no useful knowledge about the message from observing the ciphertext, and even infinite time and computing resources will not allow him to do so. Shannon also showed that the one-time pad cipher achieves perfect secrecy. The one-time pad enciphers a message by combining (XORing in the case of binary messages) it with a random key that is at least the same size as the message. Though provably perfectly secure, the one-time pad is unrealistic for practical use because keys the same size as messages become too difficult to manage.
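The one-time pad itself takes only a few lines. The sketch below (illustrative Python; the `secrets` module supplies the random key, and the message is arbitrary) shows that XORing with the same message-sized key both encrypts and decrypts:

```python
import secrets

def one_time_pad(data, key):
    """XOR data with a key at least as long as the data; applying the
    same operation with the same key inverts it."""
    assert len(key) >= len(data), "key must be at least message-sized"
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))     # truly random, message-sized key
ciphertext = one_time_pad(message, key)     # what Eve might observe
recovered = one_time_pad(ciphertext, key)   # Bob's decryption
```

Because the key is uniformly random and used once, every possible plaintext of the same length is equally consistent with the observed ciphertext, which is exactly the condition (2.6) expresses. The practical cost, as noted above, is managing a fresh key as long as every message.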

Some other information-theoretic secrecy metrics exist and are applicable when security is obtained via the use of a secrecy code in a wiretap scenario as in Figure 1.2. Let a message M of length k be encoded into a codeword X^n of length n, and let Z^n be an

eavesdropper's observation of X^n at the output of the wiretap channel. Then the system is strongly secure if the total information leaked over the wiretap channel approaches zero as the block length n of the code approaches infinity. This is formalized in the following definition.

Definition 2.6. Strong secrecy is achieved in a wiretap system if the system satisfies the property

$$\lim_{n \to \infty} I(M; Z^n) = 0. \tag{2.7}$$

A similar metric to strong secrecy is weak secrecy, which instead requires that the rate of information (or information per coded bit) leaked over the wiretap channel approach zero as n becomes large. This is expressed in the following definition.

Definition 2.7. Weak secrecy is achieved in a wiretap system if the system satisfies the property

$$\lim_{n \to \infty} \frac{1}{n} I(M; Z^n) = 0. \tag{2.8}$$

Weak secrecy is indeed a weaker metric than strong secrecy, for its definition does allow for some bits of M to be recovered from Z^n [3, 10]. The trade-off is that designing codes

for weak secrecy is usually much easier than designing codes for strong secrecy. Note that both metrics are defined asymptotically as the block length increases; in the same way that the performance of error-correcting codes improves with block length, so too the secrecy of weak and strong secrecy codes improves with block length.

So we see that one advantage of secrecy codes over cryptography is that they can be designed to satisfy rigorous information-theoretic secrecy metrics, and do not rely on computational hardness assumptions. Another advantage is simply that secrecy codes do not require the use of secret keys as in cryptography. The management and distribution of keys is no trivial task, and secrecy codes avoid all of the intricacies associated with it. In addition, the mathematical operations required to perform secrecy coding are usually linear and lend themselves well to implementation on low-power processors. The operations used in modern ciphers, in contrast, tend to be more complicated. Thus secrecy codes are well suited to provide lightweight security in low-power systems where computational resources are limited and a full cryptographic solution is not needed. Because the security of secrecy codes is defined information-theoretically and does not rely on computational hardness assumptions as in cryptography, one might wonder whether secrecy codes will eventually replace cryptography as the standard method for obtaining security. However, secrecy codes have their own limiting assumptions that may not make them appropriate for standalone use. In particular, the wiretap model assumption that the wiretap channel is noisier than the main channel may not hold in practice. Because of the strengths and weaknesses present in each method, it is anticipated that

the best security solutions going forward will be ones that employ both application-layer cryptography and physical-layer security.


Chapter 3

Physical-Layer Effects On

Cryptographic Attack Algorithms

3.1 Introduction

The simple substitution cipher, though long abandoned as a practical cryptographic method, can still be used as a framework from which new observations in cryptography can be made [11]. Due to its simplicity, it is well suited to the analysis of problems involving multilayer security designs, providing a well-understood cryptographic backdrop that can be analyzed with and without other security layers in place. This chapter addresses a multilayer security setup with the simple substitution cipher at the application layer, and randomly occurring errors at the physical layer. We concern ourselves primarily with the interrelations between the cryptographic encoder, channel effects, and cryptographic decoder as depicted in Figure 3.1. It is of note that error-correcting secrecy codes can further exploit physical-layer characteristics for security [2, 3, 12]. However, in this chapter we assume only that the physical layer provides error-prone ciphertext to a passive eavesdropper through a discrete memoryless symmetric channel. Thus, the physical layer provides an enhancement to cryptographic security by adding confusion about the true nature of the ciphertext. This approach has been applied to the case where an eavesdropper's ciphertext is riddled with erasures in [11]. Here, we extend the utility of these recent results to the discrete memoryless symmetric channel. A portion of this work was published by the author in [13], but is discussed here in more detail.

The cryptanalysis of the simple substitution cipher has led to the development of different automated ciphertext-only attacks [14–18], which can reliably break the cipher


Figure 3.1: General communication system model with relevant blocks for cryptog-raphy in the presence of noise highlighted.


Figure 3.2: System model depicting three players in a network. Alice encrypts and transmits a secret message to Bob using a key known only to the two legitimate users. Eve passively eavesdrops on the transmitted data, albeit through the error-prone

wire-tap channel Q, and then employs an attack on the error-prone ciphertext.

by recovering the original plaintext or key. Generally these attacks use either statistical analysis or dictionary look-ups to arrive at a most likely solution. Most existing attacks assume that the ciphertext is error-free. Intuitively, we expect that the presence of errors in the ciphertext adversely affects the performance of these algorithms. If the performance of an attack degrades in some predictable manner as errors in the ciphertext increase, then we can specify a certain error rate necessary to achieve some desired level of security with respect to the attack being used.

Throughout this chapter we will assume the system model shown in Figure 3.2. This model is similar to modern versions of Wyner's wiretap channel model [12]. Alice encrypts a message M with a secret key K to obtain the ciphertext X. In this chapter the encryption algorithm is always the simple substitution cipher. Alice then broadcasts X over a noise-free main channel to her intended recipient, Bob. He then decrypts X using his a-priori knowledge of K, and recovers the original message. Eve is able to listen in, but only over a discrete memoryless symmetric wiretap channel modeled by the channel transition matrix Q. Eve then obtains Z over the channel, and applies a substitution cipher attack algorithm to form an estimate of the message ˆM and the key ˆK.

Note that the wiretap channel is assumed to operate directly on the cipher symbols (as opposed to binary codewords representing the symbols); however, if we consider the case where physical-layer security codes [10] are being implemented in tandem with the substitution cryptosystem, then codes may provide confusion to the eavesdropper in a manner consistent with the symmetric discrete memoryless channel model in Figure 3.2. Also note the absence of any error-correcting codes. This model is consistent with an assumption often made in physical-layer security scenarios, namely, that Eve has a distinct disadvantage in channel quality [2]. This property is what sets this model apart from the one commonly assumed by substitution cipher attacks. Most attacks assume either a noiseless wiretap channel such that Z = X, or that all errors are corrected via an error-correcting code.

The wiretap channel is a discrete memoryless symmetric channel that constitutes a probabilistic mapping from the cipher symbols of X to the cipher symbols of Z. It can be modeled by the channel transition matrix Q, the entries of which are the conditional probabilities pZ|X(z|x): the probability of a certain output symbol z given a certain input symbol x. Since the alphabets of X and Z are identical, Q is a square matrix with dimensions N × N, where N is the number of symbols in the chosen alphabet.

The rest of this chapter proceeds as follows: in Section 3.2 we introduce several concepts important to understanding the work as a whole. Section 3.3 discusses the cryptographic attack algorithm to be used on noisy ciphertext, while Sections 3.4 and 3.5 give the primary results and draw conclusions, respectively.

3.2 Background

This section introduces the simple substitution cipher, the discrete memoryless symmetric channel, a metric for quantifying channel noise, hidden Markov models, and varieties of substitution cipher attack algorithms.

3.2.1 Simple Substitution Cipher

In a simple substitution cipher, each symbol in a plaintext message M is replaced by a cipher symbol from the same alphabet, according to a substitution secret key. In contrast to more advanced forms of substitution ciphers, which may use one-to-many or many-to-many substitutions, the simple substitution cipher has a one-to-one mapping from plaintext symbols to ciphertext symbols. We may consider the substitution cipher encoder as an operator T, and write

X = TK{M}, (3.1)

where the subscript indicates the key chosen to perform the mapping. Then the inverse relation is also true that

M = TK−1{X}, (3.2)

where both the encoder and decoder operate on a symbol-by-symbol basis. In our case, both X and M are drawn from the same alphabet of N symbols, and the secret key is chosen at random from the pool of all N! permutations on N symbols. Although in some sense K is random, the key is fixed once chosen, and can therefore also be modeled as a static parameter in some cases.
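To make the operator notation concrete, the following sketch (a minimal Python illustration of TK and its inverse, not taken from the thesis; the function names are our own) implements a simple substitution cipher over the 27-symbol alphabet used later in the chapter:

```python
import random
import string

# 27-symbol alphabet used in this chapter: a-z plus space (N = 27).
ALPHABET = list(string.ascii_lowercase) + [" "]

def random_key(rng=random):
    """Pick one of the N! possible substitution keys uniformly at random."""
    perm = ALPHABET[:]
    rng.shuffle(perm)
    # The key maps each plaintext symbol to a unique cipher symbol (T_K).
    return dict(zip(ALPHABET, perm))

def encrypt(message, key):
    """Apply X = T_K{M} symbol by symbol."""
    return "".join(key[m] for m in message)

def decrypt(ciphertext, key):
    """Apply M = T_K^{-1}{X} by inverting the one-to-one mapping."""
    inverse = {x: m for m, x in key.items()}
    return "".join(inverse[x] for x in ciphertext)

key = random_key()
plaintext = "attack at dawn"
ciphertext = encrypt(plaintext, key)
assert decrypt(ciphertext, key) == plaintext
```

Because the mapping is one-to-one, decryption is exact whenever the key is known, which is precisely why Bob (who holds K) recovers M perfectly over the noise-free main channel.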

Information-theoretic [19, 20] and probabilistic [21] security analysis for this cipher system has been complete for several decades; however, each of these works assumes the attacker has access to the error-free version of the ciphertext. Only recently, in [11], has the question of security been coupled with the assumption that the attacker possesses an imperfect version of the ciphertext.

3.2.2 Discrete Memoryless Symmetric Channels

The channel model assumed in this chapter is a type of discrete memoryless channel from Definition 2.4, with the additional property that the channel is symmetric. As defined in [5], a symmetric discrete memoryless channel has a channel transition matrix Q such that all rows are permutations of each other, and all columns are permutations of each other. The simplest such channel is the binary symmetric channel. Eve’s channel in Figure 3.2, however, is assumed to be non-binary, with N discrete values in both the input and output symbol alphabets. For simplicity, we further assume that the diagonal elements of Q are identical, such that Q takes on the form

Q = \begin{bmatrix}
p_1 & p_N & p_{N-1} & \cdots & p_2 \\
p_2 & p_1 & p_N & \cdots & p_3 \\
p_3 & p_2 & p_1 & \cdots & p_4 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
p_N & p_{N-1} & p_{N-2} & \cdots & p_1
\end{bmatrix}. \qquad (3.3)
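A channel matrix of the form in (3.3) is circulant: each row is a cyclic shift of the first, so the whole matrix is determined by a single probability vector. The following Python sketch builds such a matrix (an illustration only; the function name and the example values 0.9 for the diagonal are our own choices, not from the thesis):

```python
import numpy as np

def circulant_channel(p):
    """Build the N x N channel transition matrix of (3.3).

    p is a length-N probability vector, where p[0] = p_1 is the
    probability of receiving the transmitted symbol unchanged.  Row i
    is a cyclic shift of row 0, so all rows and all columns are
    permutations of each other: the channel is symmetric.
    """
    p = np.asarray(p, dtype=float)
    assert np.isclose(p.sum(), 1.0)
    N = len(p)
    # Entry Q[i, j] = p_{(i - j) mod N + 1} matches the layout in (3.3):
    # the diagonal is p_1, and each row is the previous one shifted right.
    return np.array([[p[(i - j) % N] for j in range(N)] for i in range(N)])

# Example: N = 27 channel where the correct symbol arrives with
# probability 0.9 and the remaining mass is spread evenly.
N = 27
p = np.full(N, 0.1 / (N - 1))
p[0] = 0.9
Q = circulant_channel(p)
```

Each row of the resulting Q sums to one, as required of a channel transition matrix, and the identical diagonal assumed in the chapter comes for free from the circulant structure.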

3.2.3 Channel Noise Measure

In order to quantify the noise present in the wiretap channel, we make use of the quantity from information theory called mutual information. Recall from Definition 2.3 that mutual information is a measure of how much information is shared between two random variables. If we consider the input and output of a channel Q to be the random variables X and Z, respectively, then the mutual information IQ(X; Z) tells us exactly how much information is passed from the input to the output, and consequently provides an intuitive and consistent indication of channel noise or clarity. To actually calculate the mutual information for a given channel, we go back to the original definition

I_Q(X;Z) = \sum_{z}\sum_{x} p_{X,Z}(x,z)\,\log \frac{p_{X,Z}(x,z)}{p_X(x)\,p_Z(z)}, \qquad (3.4)

which we can re-express using the definition of conditional probabilities as

I_Q(X;Z) = \sum_{z}\sum_{x} p_X(x)\,p_{Z|X}(z|x)\,\log \frac{p_{Z|X}(z|x)}{p_Z(z)}. \qquad (3.5)

Recall that the elements of Q are the conditional probabilities pZ|X(z|x). The distribution pX(x) is found by permuting pM(m) according to the key K. We can express this using the substitution cipher encoder as

pX(x) = TK{pM(m)}. (3.6)

Finally, the distribution on Z can be calculated as

p_Z(z) = \sum_{x} p_{X,Z}(x,z) = \sum_{x} p_X(x)\,p_{Z|X}(z|x). \qquad (3.7)

Notice from (3.6) and (3.5) that the channel mutual information is a function of the secret key K. While one might be tempted to calculate an average mutual information over all possible keys, it should be noted that such an approach would render pZ(z) uniform over all N symbols. This is not a realistic result, however, since K is fixed for a given instance of the system model in Figure 3.2. Therefore, it is more valuable to calculate IQ(X; Z) as a function of the secret key chosen.

Now, as a function of the channel matrix Q, we can discuss the possible range of IQ(X; Z). We note [5] that the channel mutual information is lower-bounded by IQ(X; Z) = 0, which is achieved in our case when all of the entries in Q are equally likely (a completely noisy channel that yields no useful information to the eavesdropper in Z). Also, the upper bound of the mutual information is achieved when each row of Q contains a single 1 and all other row entries are 0. In this case, the mutual information between X and Z is simply the entropy in X, or alternatively, the entropy in M.



3.2.4 Hidden Markov Models

A hidden Markov model (HMM) is a statistical description of a system that consists of a set of unobserved (hidden) states X and a set of observations (outputs) Y that are dependent on the state. The traversal of the system through its states satisfies Markovity, i.e., transitions to any future states are dependent only on the current state. The state transitions are described by a state transition matrix Φ consisting of the conditional probabilities Φi,j = p(xn+1 = i | xn = j). The outputs are also described probabilistically by an emission probability matrix Θ consisting of the conditional probabilities Θi,k = p(yn = k | xn = i). Though the underlying state traversal is not directly observed, knowledge of the output as well as the parameters of the model often allow the state traversal to be estimated in some maximum-likelihood fashion [22]. A depiction of a simple hidden Markov model with three states and three outputs is given in Figure 3.3.

Figure 3.3: Depiction of a hidden Markov model with three states (x1, x2, x3) and three outputs (y1, y2, y3).
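As a concrete illustration (not drawn from the thesis; the matrices and function name are our own), the following Python sketch samples a three-state, three-output HMM such as the one in Figure 3.3, using the chapter's column-stochastic convention for Φ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hidden states and three outputs, as in Figure 3.3.
# Following the chapter's convention, Phi[i, j] = p(x_{n+1} = i | x_n = j)
# (columns sum to 1), and Theta[i, k] = p(y_n = k | x_n = i) (rows sum to 1).
Phi = np.array([[0.7, 0.2, 0.1],
                [0.2, 0.6, 0.3],
                [0.1, 0.2, 0.6]])
Theta = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])

def sample_hmm(n, Phi, Theta, rng):
    """Generate n steps of hidden states and their noisy observations."""
    num_states = Phi.shape[0]
    states, outputs = [], []
    x = rng.integers(num_states)                  # uniform initial state
    for _ in range(n):
        states.append(int(x))
        outputs.append(int(rng.choice(Theta.shape[1], p=Theta[x])))
        x = rng.choice(num_states, p=Phi[:, x])   # Markov transition
    return states, outputs

states, outputs = sample_hmm(50, Phi, Theta, rng)
```

Only `outputs` is visible to an observer; estimating `states` (or Θ itself) from the outputs is exactly the inference problem the HMM-based attack exploits.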

3.2.5 Families of Attacks on Simple Substitution

In the realm of automated attacks on the simple substitution cipher, there exist two main types. The first family of attacks is known as dictionary-based attacks. These attempt to find the key by replacing words in the ciphertext with words from a dictionary with identical letter patterns [15]. While these attacks can be successful when attacking a cryptosystem with known ciphertext, attacks of this nature quickly degrade under the assumption of noisy ciphertext.

Attack algorithms that belong to the second family of attacks make use of statistical data that describe the language in which the message was originally written. These attacks rely on the fact that letter frequencies are preserved through substitution, and are often able to identify several correct keypairs by simply assigning ciphertext symbols to their best-fit plaintext symbol according to the relative number of uses of the ciphertext character. Statistical attacks often begin with an initial key guess, which is iteratively modified until the corresponding deciphered plaintext resembles the language of interest. Meaningful ciphertext statistics are required for good performance, that is, the ciphertext must be several hundred characters long. When the length of the ciphertext is short (and the statistics virtually meaningless), dictionary attacks tend to outperform this latter family of attacks.

When considered in the case of noisy ciphertext, some statistical attacks show promise, although several other attack algorithms make limiting assumptions that restrict their usefulness when the ciphertext is error-prone. For example, the attacks in [15, 18] make reductions in possible keypairs by ruling out certain character patterns during the course of the attack algorithm. Doing this with noisy ciphertext could remove the correct key from consideration, or in fact, remove all possible keys from consideration. To illustrate the tendency for statistical attacks to perform better than dictionary attacks when the ciphertext is error-prone, we offer a comparison of the empirical performance of a dictionary attack vs. a statistical attack. Table 3.1 shows results for each type of attack in the error-free as well as the error-prone case, where the performance is quantified in terms of the percent of letters of the plaintext that were correctly recovered by the attack algorithm. We see that in the error-free case the performance is nearly identical for both attacks. However, the introduction of errors affects the dictionary attack to a much greater degree than the statistical attack. The statistical attack algorithm used in this performance comparison is a variety that uses the theory of hidden Markov models, and is one of the few that has been found to be robust enough to handle the assumption of noisy ciphertext. The details of this HMM-based attack are presented in the next section.

Table 3.1: Attack family performance comparison.

                         Dictionary Attack   Statistical Attack
Error-Free Ciphertext         99.7%                100%
Error-Prone Ciphertext        21.0%                90.1%

3.3 HMM-Based Attack

3.3.1 Algorithm Description

For this problem we use the substitution cipher attack developed in [23]. This attack falls in the statistical analysis family of substitution attacks and formulates substitution encipherment of a natural language as a hidden Markov model. The HMM was introduced in Section 3.2.4, but here we describe how such a model forms the basis of an effective attack against the simple substitution cipher.

Figure 3.4: Modified HMM for modeling plaintext states (m1, m2, m3), ciphertext states (x1, x2, x3), and noisy ciphertext states (z1, z2, z3).

In the case of substitution encipherment, successive characters (or groups of characters) in a string of plaintext M directly parallel the traversal of states in an HMM. In a first-order HMM, the states are individual characters and the state transition probabilities are given by the language bigram statistics. In a second-order HMM, the states are groups of two characters and the state transition probabilities are given by the language trigram statistics (and so forth). In general, substitution encipherment via a key can be modeled as the state output function and the string of ciphertext in X would then correspond to the observations of the HMM. However, in our case, only noisy ciphertext is observed. Therefore, the observations are actually given as the output symbols of the wiretap channel Z. Figure 3.4 illustrates how substitution encipherment with channel noise can be framed in terms of an HMM.

Having framed the substitution enciphering along with the wiretap channel in Figure 3.2 in terms of an HMM as described above, the HMM-based attack performs a maximum-likelihood parameter estimation of the state output function based on the observed noisy ciphertext Z and the fixed HMM parameters. Estimating the state output function is analogous to estimating the key. An HMM-based attack on a substitution cipher proceeds as follows:


1. The unigram probabilities of the language symbols are obtained to form pM(m).

2. The bigram probabilities of the language symbols are obtained to form pMi|Mi−1(mi|mi−1).

3. The wiretap channel output, or noisy ciphertext, is obtained to form the set of observations Z.

4. Based on the observations Z and the language unigram statistics, an initial probabilistic mapping of observations to states is formed: pM|X(m|x).

5. This probabilistic mapping is iteratively updated via the expectation-maximization (E-M) algorithm until convergence.

6. A deterministic key ˆK is isolated from this probabilistic mapping.

7. The guessed key ˆK is used to decrypt the ciphertext X to produce ˆM.

For the attacks presented in this chapter, the assumed language is English and the plaintext alphabet is

M = {a, b, ..., z, space}, (3.8)

indicating an alphabet size of N = 27. The ciphertext alphabet X is identical to M, as the ciphertext is generated through simple substitution.

3.3.2 Performance Measures

Because the HMM-based attack attempts to recover both the key K and the original plaintext M, we need two different metrics in order to quantify each result. Ideally, to present strong cases for secrecy, we would present metrics in terms of the key and message equivocations H(K|Z) and H(M|Z), which would provide information-theoretic arguments for the amount of uncertainty remaining about K and M after observing Z. However, these calculations are quite difficult to obtain for this problem. This is particularly true when we consider interdependence between successive symbols in a true language model, such as English. Thus, we deal with more practical performance measures when presenting the results of this research. The following results are framed in the more generic expression of a privacy-enhancing effect, as opposed to the more formal expressions of computational security and information-theoretic security as described in Chapter 2.


3.3.2.1 Percent Letter Resolution

Percent letter resolution is a metric used to quantify the results of the HMM-based attack in terms of the recovered plaintext message ˆM. It is defined as

%LR = 100 × ( ˆM_correct / |M| ), (3.9)

where ˆM_correct is the number of characters in ˆM that match the corresponding characters of M, and |M| is the total number of symbols in the message M. Since the wiretap channel is modeled as a symmetric discrete memoryless channel, | ˆM| = |M|. The attack is completely successful if %LR = 100, indicating that M = ˆM. Although the metric does not lend itself to any strong security conclusions, it is a simple, intuitive metric that allows successive runs of the attack to be compared as parameters are altered.

3.3.2.2 Recovered Keypairs

Another metric for comparing successive attacks using this algorithm is to simply count the number of correct symbol pairings in the recovered key ˆK. This allows us to quickly evaluate how much of the true key K has been successfully recovered. Again, this metric allows us to spot trends, but does not afford any strong security conclusions. Since our alphabet size is |M| = 27, the recovered keypairs measure can take on integer values from 0 to 27.
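Both performance measures are straightforward to implement. A minimal sketch (the helper names are our own, with keys represented as Python dictionaries mapping plaintext symbols to cipher symbols):

```python
def percent_letter_resolution(m, m_hat):
    """%LR from (3.9): percent of positions where the guess matches M."""
    assert len(m) == len(m_hat)           # |M_hat| = |M| for a DMC
    correct = sum(a == b for a, b in zip(m, m_hat))
    return 100.0 * correct / len(m)

def recovered_keypairs(key, key_hat):
    """Count the symbol pairings in the guessed key that match the true key."""
    return sum(1 for sym in key if key_hat.get(sym) == key[sym])

# Five of six characters match: 'attack' vs. 'attick'.
assert percent_letter_resolution("attack", "attick") == 100.0 * 5 / 6
```

With a 27-symbol alphabet, `recovered_keypairs` returns an integer between 0 and 27, matching the range described above.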

3.4 Results

For the sake of verification, we first established how the HMM-based attack performs with noise-free ciphertext as the length of the ciphertext changes. We considered ciphertext lengths from 10 to 1000 in 10-letter increments. 100 messages were randomly selected from a corpus at each of these lengths. Each of these messages was then enciphered with a randomly-chosen key, and the HMM attack was performed on the ciphertext. The results are plotted first in terms of %LR in Figure 3.5; the raw performance data appears overlayed with the average performance at each chosen length. The results are also plotted in terms of recovered keypairs in Figure 3.6. In addition to the raw data, the average, rounded up to the nearest integer, is also plotted at each chosen length. Both of these plots demonstrate expected behavior for an attack that uses statistical analysis, that is, the attack succeeds with greater probability as the cipherlength increases.


Figure 3.5: Attack performance on noiseless ciphertext in terms of the %LR of the guessed message plotted vs. the length of the ciphertext. The average of the data is plotted as well.

We also desire to characterize experimentally how the performance of the HMM-based attack changes as a function of channel noise. In order to do so, we set up a Monte Carlo simulation in which our selected attack algorithm was run many times with varying parameters. Because the attack is capable of perfect decipherment in the noiseless case for ciphertexts of length 1000, we set n = 1000 for this experiment. We defined a set of 2000 channels that varied uniformly from a completely noiseless channel to a completely noisy channel. For each of these channels we randomly selected a message string of length 1000 from a corpus, and generated a random key with which we enciphered the message. The cipher was then run through the selected channel and the HMM-based attack was applied to the output. In each case the channel mutual information, percent letter resolution, and number of recovered keypairs were recorded. Results of the simulation in terms of percent letter resolution can be seen in Figure 3.7. Results in terms of recovered keypairs are given in Figure 3.8.

Figure 3.6: Attack performance on noiseless ciphertext in terms of the number of correctly guessed keypairs plotted vs. the length of the ciphertext. The average of the data (rounded up) is plotted as well.

Notice that the simulation results seem to indicate a performance region that can be accurately predicted as a function of IQ(X; Z), which is itself only a function of the channel parameters and the random choice of K. One small note of interest here is that different keys provide varying values for IQ(X; Z), indicating that not all keys provide the same security level to the simple substitution cipher. We furthermore note that the existence of performance regions implies an upper bound on the expected performance of the attack, which is also a function of IQ(X; Z). A theoretical upper bound is not given here, but by close examination of the simulation results, an experimental upper bound may be identified. The results of Figure 3.7 seem to indicate a third-order relationship between the channel mutual information and the percent letter resolution. Fitting the simulation data to a third-order polynomial yields the following relationship:

%LR_MMSE = 100 × (−0.0213x³ + 0.1485x² − 0.0533x + 0.1842), (3.10)

where x = IQ(X; Z).

This polynomial appears overlayed on the raw data in Figure 3.7, and seems to describe the general trend of the performance data quite well. In an attempt to provide an experimental upper bound for the performance curve, we calculate the standard deviation σ of the raw data from the third-order polynomial fit, and then simply translate the third-order curve upwards by the amount 3σ. This yields the upper bound curve

%LR_ub = %LR_MMSE + 3σ, (3.11)

which is also overlayed on the plot of Figure 3.7.

Figure 3.7: Attack performance on a 1000-symbol cipher with a third-order curve fit and a 3σ upper bound.
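The curve-fit-plus-3σ procedure can be reproduced with `numpy.polyfit`. The sketch below applies it to synthetic data generated around the trend in (3.10); the data and noise level are invented for illustration, so the fitted coefficients will not match the thesis's values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the simulation data: mutual information values
# and noisy %LR scores scattered about the cubic trend of (3.10).
# (Illustration only; the real data come from the Monte Carlo runs.)
x = rng.uniform(0, 4, 500)
trend = 100 * (-0.0213 * x**3 + 0.1485 * x**2 - 0.0533 * x + 0.1842)
y = np.clip(trend + rng.normal(0, 5, x.size), 0, 100)

# Third-order least-squares fit, as used to obtain (3.10).
coeffs = np.polyfit(x, y, 3)
fit = np.polyval(coeffs, x)

# A 3-sigma offset of the residuals gives the empirical upper bound,
# expected to hold in roughly 99.7% of cases for Gaussian residuals.
sigma = np.std(y - fit)

def upper_bound(xs):
    """Empirical upper bound: fitted curve shifted up by 3 sigma."""
    return np.polyval(coeffs, xs) + 3 * sigma
```

The same recipe with `np.polyfit(x, y, 1)` yields the linear fit and bound used for the recovered-keypairs data.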

While it cannot be said that, in general, %LR ≤ %LR_ub, this function does provide a reasonable estimate of the maximum %LR attainable as a function of IQ(X; Z). Because (3.11) is defined by a 3σ offset from the curve fit, it should hold in 99.7% of cases. Similarly, a general trend and upper bound can be identified for the attack performance expressed in terms of recovered keypairs. An examination of the raw data in Figure 3.8 seems to indicate a linear relationship between channel noise and recovered keypairs. Performing a linear fit on the simulation data yields the following relationship:

keypairs_MMSE = 6.6097 · IQ(X; Z) − 0.8281. (3.12)

This line appears overlayed on the raw data in Figure 3.8, and also describes the general trend of the data quite well. We again attempt to provide an experimental upper bound for the performance curve by calculating the standard deviation σ of the raw data from the linear fit, and then translating (3.12) upwards by 3σ. This yields the upper bound

keypairs_ub = keypairs_MMSE + 3σ, (3.13)

which is also overlaid on the plot given in Figure 3.8. The rationale behind this estimate of the maximum performance characteristics of the attack is similar to that of (3.11). This upper bound will hold approximately 99.7% of the time, and is optimistic for channels that are either quite noisy or not very noisy. As we will see in an example to follow, an optimistic upper bound is acceptable in some cases.

Figure 3.8: Attack performance on a 1000-symbol cipher with a linear fit and a 3σ upper bound.

While the results shown in Figures 3.7 and 3.8 seem to fail to provide us with any concrete security conclusions, in reality these results, coupled with the results from Figures 3.5 and 3.6, allow us to compare the effects of channel noise to the benefit of observing more ciphertext. For example, it can be seen in Figure 3.7 that the expected %LR is reduced from 100% to 94% as IQ(X; Z) changes from 4.05 to 3.4. In the noise-free case, we see a similar reduction in the success of the attack as the eavesdropper receives only 90 ciphertext symbols instead of 200, as shown in Figure 3.5. Similarly, we see that the maximum number of recovered keypairs is reduced from 26 to 15 as IQ(X; Z) changes from 3.95 to 2.3 in Figure 3.8. In the noise-free case of Figure 3.6, the maximum number of recovered keypairs is reduced from 26 to 15 as the length of the ciphertext drops from 1000 to 100. Thus, the results of this section provide us with a means to compare the output of the HMM-based attack to the noiseless case, where channel noise is traded for additional ciphertext.

Knowledge of the combined effect of cipherlength and channel noise on the performance of this attack can be used to set the key agreement period. It is often desirable in cryptosystems to decrease the frequency of key agreement. Doing so, however, involves sending longer blocks of ciphertext that are encrypted with a single key. We have seen that the performance of the HMM-based attack increases with the length of the ciphertext but also decreases with the channel noise. This indicates that if Alice has knowledge of the noisy channel parameters, then she can set the cipherlength as long as possible (and the key agreement frequency as low as possible) while keeping the performance of the HMM-based attack under a desired threshold.

Example 3.1. We consider an example where Alice wishes to use the substitution cipher to encrypt messages sent to Bob. She sends ciphers of length n, and renegotiates the key with Bob before each cipher block is sent. Alice also wishes to keep the number of keypairs that Eve could conceivably recover (using the HMM attack) at or below a desired threshold, say ρ ≤ 8. Let us assume that the wiretap channel has channel mutual information IQ(X; Z) = 2 information bits. If Alice has access to simulation data as in Figure 3.6, then she knows that performing key agreement every n = 1000 characters may result in ρ = 19 keypairs being correctly obtained by Eve. She must decrease the number of obtainable keypairs by 11 or more, and knowledge of the HMM attack performance for the noiseless case as a function of cipherlength (as in Figure 3.6) gives her a good estimate of where to set the key agreement period. She sees that changing the key agreement period from n = 1000 to n = 80 drops the recovered keypairs by 11. So, setting n = 80 should decrease the performance of the HMM attack to ρ = 8 or fewer at the given channel noise. This is verified via the simulation results in Figure 3.9, which show that for n = 80 the maximum recovered keypairs at IQ(X; Z) = 2 information bits is ρ = 8.

3.5 Conclusion

In this chapter we have analyzed the utility of one of the most robust attacks against the simple substitution cipher when the ciphertext is obtained by the eavesdropper as the output of a symmetric discrete memoryless channel. The utility of the attack algorithm was presented as a function of channel mutual information in the case of noisy ciphertext, and the length of the observed ciphertext in the case of noise-free ciphertext. These two sets of results indicate an effective security gain that can be obtained if an eavesdropper can be made to observe only error-prone ciphertext, even for the simple substitution cipher. An example was presented to show how knowledge of the combined effect of cipherlength and channel noise can be used to determine the key agreement period of a substitution cipher cryptosystem. These results are likely to scale to more practical cryptosystems, and indicate a significant security gain that can be obtained through combining error-control secrecy codes with cryptography.

Figure 3.9: Attack performance on an 80-symbol cipher in terms of the number of correctly guessed keypairs plotted vs. the channel mutual information. The maximum number of recovered keypairs at IQ(X; Z) = 2 has dropped to Alice's desired threshold of ρ ≤ 8.

Related topics for future study include analyzing the secrecy of a wiretap system wherein the substitution cipher (or a more modern cipher) is used in combination with a secrecy code to secure messages.


Chapter 4

The Secrecy Of Coset Codes In A Multi-hop Network

4.1 Introduction

While the previous chapter focused on how practical levels of system security may be obtained via the combination of application-layer cryptography and physical-layer channel noise, in Chapter 4 we turn our attention to the problem of obtaining rigorous levels of security using secrecy coding techniques in combination with physical-layer channel noise. Secrecy codes are channel codes that are designed primarily to leverage channel noise in order to hide information from an eavesdropper. This is different from the design goals of error-correcting codes, which are channel codes that are architected to alleviate the effects of channel noise. In this chapter we concern ourselves only with the source, sink, channel encoder/decoder, and channel of the general communication system model, as depicted in Figure 4.1.

Figure 4.1: General communication system model with important blocks for secrecy coding design highlighted.


It is conventional for security in a communication system to be obtained through the use of cryptography residing at the application layer, which typically requires the use of a shared secret key. However, Wyner introduced a framework in which system security can be obtained without the use of any application-layer cryptography whatsoever [12]. This framework relies on the use of secrecy codes in combination with channel noise in order to obtain security. Though Wyner pioneered this framework and introduced implicit coding techniques for it in the 1970s, only in the last decade have explicit secrecy codes been designed that actually achieve secrecy [24].

The effectiveness of these new secrecy codes is a function of the error rate experienced by the eavesdropper. In other words, a particular secrecy code might guarantee message secrecy provided that enough errors are introduced by the channel into the codewords observed by the eavesdropper. There exists the possibility, too, that the eavesdropper may experience an unforeseen drop in error rate and thus compromise the security of the system. Limitations such as this are the reason why physical-layer techniques may be most effective when used in conjunction with conventional cryptography, rather than as a standalone security solution. The multi-hop network provides an example of a system in which this may occur.

We consider a system in which messages are encoded using a secrecy code at a source node and then relayed over a series of intermediate nodes to a destination node. Each intermediate node simply receives codewords and sends them unaltered to the next node. This multi-hop network model is consistent with many networking technologies in use today, such as mesh networks and mobile ad-hoc networks (MANETs). In addition to the relay nodes, there are colluding eavesdropping nodes at a further distance away that can intercept the transmitted codewords, as depicted in Figure 4.2. Though the use of a secrecy code may ensure the secrecy of a codeword transmitted between any two legitimate nodes, the fact that an eavesdropper could effectively observe multiple transmissions of the same codeword means that the system as a whole may no longer be secure. An eavesdropper may be able to combine several erroneous codewords to form an error-free codeword and thus deduce the message. The node encoding technique presented in [25] shows promise as a means to achieve secrecy over the multi-hop network even if an eavesdropper can observe multiple transmissions. In the following chapter we provide proofs that verify the secrecy of this node encoding technique provided certain conditions are met.

The rest of this chapter proceeds as follows: in Section 4.2 we introduce several concepts important to understanding the coding scheme as a whole. Section 4.4 develops proofs that demonstrate the secrecy of the node encoding technique over the multi-hop network,




Figure 4.2: Notional multi-hop network depiction with source/destination nodes, relay nodes, and eavesdropper nodes.

and Section 4.5 demonstrates the utility of these findings via some simulations. Finally, Section 4.6 summarizes the results and draws some conclusions.

4.2 Background

4.2.1 The Multi-hop Network

The multi-hop network was introduced notionally at the beginning of this chapter, but here we discuss it in more detail. We consider the multi-hop network to be a concatenation of N wiretap channel models. Alice encodes a message M using a code C into a codeword X1 and transmits it over a main channel MC1 to an intermediate legitimate node y1. The node may perform some operation on the codeword or simply retransmit the codeword to the next legitimate node in the network. This process continues until the final node yN transmits the codeword XN to Bob, the intended recipient, who decodes it to obtain the message. Meanwhile, an eavesdropper Eve has the opportunity to observe each of the transmitted codewords through a series of binary erasure channels EC1, EC2, . . . , ECN, and thus obtains a series of codewords z1, z2, . . . , zN. The erasure probabilities of the N erasure channels are denoted ε1, ε2, . . . , εN. A depiction of this system is given in Figure 4.3. Throughout, MCi denotes the ith main channel in the network and ECi denotes the ith wiretap channel.

[Figure 4.3 diagram: message M enters the encoder at Alice, propagates through main channels MC1, MC2, . . . , MCN to the decoder at Bob, who outputs the estimate M̂; Eve observes each hop through the wiretap channels EC1, EC2, . . . , ECN]

Figure 4.3: Multi-hop system model with codewords propagating over a series of main channels to Bob and over several wiretap channels to Eve.
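To make the threat concrete, the following Python sketch (our illustration, not code from the thesis) simulates Eve combining her N observations of the same codeword. A bit position remains unknown to her only if it is erased on every hop, so her effective per-bit erasure probability is the product ε1·ε2···εN, which can fall far below any single εi.

```python
import random

# Sketch: Eve sees the same n-bit codeword through N independent binary
# erasure channels and keeps, for each bit position, any unerased
# observation. A bit stays unknown only if erased in all N observations,
# so the effective per-bit erasure probability is eps_1 * ... * eps_N.
# The erasure probabilities and codeword below are assumed values.

def erase(codeword, eps, rng):
    """Pass a codeword through a BEC; None marks an erased position."""
    return [None if rng.random() < eps else bit for bit in codeword]

def combine(observations):
    """Fill each position with any unerased observation Eve holds."""
    n = len(observations[0])
    return [next((obs[j] for obs in observations if obs[j] is not None), None)
            for j in range(n)]

rng = random.Random(1)
eps_list = [0.5, 0.5, 0.5]              # assumed per-hop erasure probabilities
codeword = [1, 0, 1, 1, 0, 0, 1, 0]

trials, unknown = 10_000, 0
for _ in range(trials):
    obs = [erase(codeword, eps, rng) for eps in eps_list]
    unknown += combine(obs).count(None)

prod = 1.0
for eps in eps_list:
    prod *= eps
print(unknown / (trials * len(codeword)), "vs predicted", prod)  # both near 0.125
```

With three hops at ε = 0.5, Eve's residual uncertainty per bit drops to about 0.125, illustrating why per-hop secrecy guarantees do not automatically extend to the whole network.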



4.2.2 Linear Block Codes

The codes used in this chapter fall under the category of binary linear block codes. Thus, all message symbols and codeword symbols are taken from the binary alphabet M = X = {0, 1}. A block code C is so named because it encodes data in blocks of explicitly defined input and output sizes; this is in contrast to convolutional codes, which encode data continuously in a stream of arbitrary size. An (n, k) block code maps a block of k input symbols to a unique block of n output symbols, and the rate of the code is the ratio R = k/n. A block code is said to be linear if each of the codewords can be formed as a linear combination of the rows of a k × n generator matrix G. A binary linear block code can be shown to satisfy the requirements of a group, and thus has the property that any codeword added to any other codeword results in another codeword. Linear codes are desirable because the encoding operation becomes a straightforward matrix operation that is easily implemented in modern computer systems. If we consider a message m to be a binary row vector, then the message can be encoded to a codeword c using the generator matrix G via

c = mG. (4.1)
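As a brief illustration of (4.1), the following Python sketch encodes messages over GF(2); the particular (4, 2) generator matrix is a hypothetical choice for this example, not one taken from the thesis.

```python
# Sketch of binary linear block encoding c = mG over GF(2).
# The (4, 2) generator matrix G is an illustrative assumption.

def encode(m, G):
    """Encode message row vector m with generator matrix G, mod 2."""
    n = len(G[0])
    return [sum(m[i] * G[i][j] for i in range(len(m))) % 2 for j in range(n)]

G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]

# Every 2-bit message maps to a unique 4-bit codeword.
for m in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(m, "->", encode(m, G))
```

Note that the sum (XOR) of any two of the printed codewords is again one of the four codewords, reflecting the group property described above.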

In addition, an (n, k) linear block code has associated with it an (n − k) × n matrix H called the parity-check matrix, which satisfies the property

GHᵀ = 0. (4.2)

The parity-check matrix is a useful entity in coding theory for several reasons, one of which is its utility in calculating the syndrome s of some vector r. The syndrome is formed by

s = rHᵀ, (4.3)

and has the property that s = 0 if and only if r is a codeword of C [4]. This makes the syndrome useful for error detection: if the syndrome is nonzero for some received vector r, then we know that r is not a codeword and an error has occurred. Furthermore, for any (n, k) linear block code C there exists an (n, n − k) dual code C⊥. The generator matrix of C is the parity-check matrix of the dual code C⊥, and the generator matrix of C⊥ is the parity-check matrix of C.
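The syndrome computation in (4.3) can be sketched in the same spirit. The systematic pair G = [I | P], H = [Pᵀ | I] below is an illustrative assumption satisfying GHᵀ = 0, not a code from the thesis.

```python
# Sketch of syndrome-based error detection, s = r H^T over GF(2).
# G and H form an assumed (4, 2) systematic pair with G H^T = 0.

def mat_vec_T(r, H):
    """Compute r H^T mod 2, i.e. the dot product of r with each row of H."""
    return [sum(ri * hi for ri, hi in zip(r, row)) % 2 for row in H]

G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
H = [[1, 0, 1, 0],
     [1, 1, 0, 1]]

codeword = [1, 1, 1, 0]          # the message 11 encoded with G
corrupted = [1, 1, 1, 1]         # one bit flipped by the channel

print(mat_vec_T(codeword, H))    # zero syndrome: r is a codeword
print(mat_vec_T(corrupted, H))   # nonzero syndrome: error detected
```

The zero syndrome confirms membership in C, while the single flipped bit produces a nonzero syndrome, exactly the detection behavior described above.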


         m′
m        00      01      10      11
00       0000    0101    1010    1111    C0
01       0001    0100    1011    1110    C1
10       1000    1101    0010    0111    C2
11       1001    1100    0011    0110    C3

Figure 4.4: A (4, 2) code and its cosets.

4.2.3 Coset Coding

One secrecy coding technique that shows promise for use in the multi-hop network is called coset coding [10]. In order to understand how this coding technique works, it is first beneficial to have a working understanding of cosets. Let ⟨G, ∗⟩ be a group defined by the set G and some operator ∗ on the set, and let ⟨H, ∗⟩ be a subgroup of ⟨G, ∗⟩. For any g ∈ G, the left coset of H is the set g ∗ H = {g ∗ h | h ∈ H}. Similarly, the right coset of H is the set H ∗ g = {h ∗ g | h ∈ H} [4]. Cosets of a subgroup can be thought of as translations of that subgroup within the group. Note that if the chosen operator ∗ is commutative, then the left and right cosets are equivalent. In the context of this work it is always the case that G = ⟨{0, 1}^n, ⊕⟩; that is, the group under consideration is the set of all binary vectors of length n under the binary exclusive-or operation. This operation is commutative, and so any cosets of a subgroup H will simply be referred to as cosets, without the specification of left or right. Some useful properties of cosets are that all cosets of a subgroup have the same size, are pairwise disjoint, and together exhaust the group.

An (n, n − k) binary linear code C′ forms a subgroup of the group of all binary vectors of length n. The size of the group is 2^n and the number of codewords in C′ is 2^(n−k). Therefore there exist 2^k different cosets of C′, denoted C_0, C_1, . . . , C_{2^k − 1}, where C_0 is equivalent to C′. Coset codes use a linear block code and its cosets to do the encoding. An example of a (4, 2) code and its corresponding cosets is given in Figure 4.4.
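These coset properties can be checked directly for the code of Figure 4.4. The short Python sketch below (our illustration, not code from the thesis) builds the cosets from the leaders shown in the figure and verifies that they have equal size, are pairwise disjoint, and exhaust {0, 1}^4.

```python
from itertools import product

# Verify the coset properties for the (4, 2) code of Figure 4.4:
# C0 = {0000, 0101, 1010, 1111}; the leaders reproduce rows C1-C3.

def xor(a, b):
    """Componentwise XOR of two binary tuples."""
    return tuple(x ^ y for x, y in zip(a, b))

C0 = {(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1)}
leaders = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (1, 0, 0, 1)]
cosets = [{xor(g, c) for c in C0} for g in leaders]

# Same size, pairwise disjoint, and exhaustive in the group {0,1}^4.
assert all(len(cs) == len(C0) for cs in cosets)
assert all(cosets[i].isdisjoint(cosets[j])
           for i in range(4) for j in range(i + 1, 4))
assert set().union(*cosets) == set(product((0, 1), repeat=4))
print("coset partition verified")
```

The four cosets computed here match the four rows of Figure 4.4 exactly.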

Using such a code, a message m ∈ M = {0, 1, . . . , 2^k − 1} can be encoded by randomly selecting a codeword x from the coset C_m. The encoding process can also be described in terms of matrix operations [3] as follows:

x = [m v] [ G′ ]
          [ G  ]                    (4.4)
  = mG′ + vG,                       (4.5)
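A minimal Python sketch of the encoding rule (4.4)–(4.5) follows; the specific matrices G and G′ are our assumptions, chosen so that the resulting cosets reproduce the (4, 2) code of Figure 4.4.

```python
import random

# Sketch of coset encoding x = m G' + v G over GF(2), per (4.4)-(4.5).
# G generates the base code C0 and the rows of G' pick out coset
# leaders; v is the random vector that selects a codeword inside C_m.
# These matrices are illustrative choices matching Figure 4.4.

G  = [[0, 1, 0, 1],    # generates C0 = {0000, 0101, 1010, 1111}
      [1, 0, 1, 0]]
Gp = [[1, 0, 0, 0],    # rows pick out coset leaders 1000 and 0001
      [0, 0, 0, 1]]

def vec_mat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(vi * M[i][j] for i, vi in enumerate(v)) % 2
            for j in range(len(M[0]))]

def coset_encode(m_bits, rng):
    """Encode the 2-bit message m as a random codeword of coset C_m."""
    v = [rng.randrange(2), rng.randrange(2)]
    return [(a + b) % 2 for a, b in zip(vec_mat(m_bits, Gp), vec_mat(v, G))]

rng = random.Random(0)
x = coset_encode([1, 0], rng)   # lands somewhere in C2 = {1000,1101,0010,0111}
print(x)
```

Repeated calls with the same message return different codewords of the same coset; the decoder recovers m by identifying which coset the received word belongs to, discarding the randomness v.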
