Anonymous networks: A theoretical and practical approach


Linköping University | IDA Bachelor Thesis | Computer and Information Science

Spring 2016 | LIU-IDA/LITH-EX-G--16/078--SE


Karl Fredrik Gudjonsson

Alexander Ulander

Tutor: Marcus Bendtsen | Examiner: Nahid Shahmehri


Abstract

This thesis presents an overview of different solutions for anonymous networks. We describe the theory behind connection and data anonymity, how they can be implemented, and the theory behind commonly used cryptography. The theory behind one anonymous network is also put into practice in a proof-of-concept implementation of a mix-net online voting service, where RSA and AES are used for encryption. We compared the time taken to send different numbers of encrypted and non-encrypted votes, while varying the number of mix-servers the votes pass through and the file size of the votes. The time taken increased linearly for both encrypted and non-encrypted votes when the number of votes processed by the system increased from 1 to 200. This linear behaviour was also observed when the number of intermediate mix-servers increased from 1 to 6, and when file sizes increased from 0.9 to 7.2 Mb. The difference between encrypted and non-encrypted data grew rapidly when more votes were sent simultaneously, more mixes were used and larger files were encrypted. For example, the system takes around four times as long to process an encrypted vote with one mix as an unencrypted one, and almost 30 times as long when sending 200 votes through six mixes. Increasing the number of intermediate mixes in our system by two increases the processing time by 9.7 % per encrypted message. The processing time per Mb is also 4.4 times as long for encrypted messages as for non-encrypted ones.


Acknowledgments

We would like to thank our supervisor Marcus Bendtsen for his guidance throughout the entire work process. We would also like to thank our course mates, with special appreciation for the members of our project group this semester.


Contents

Abstract
Acknowledgments
List of Figures
1 Introduction
1.1 Aim
1.2 Related work
1.3 Limitations
2 Theory
2.1 Mixes
2.2 Onion routing
2.3 Buses
2.4 Crowds
2.5 Encryption algorithms
2.6 AES
2.7 RSA
2.8 Combining AES and RSA
3 Method
3.1 Implementation of anonymous network
3.2 Methods and settings
4 Results
5 Discussion
5.1 Method
5.2 Results
5.3 Future work and ethical considerations
6 Conclusion


List of Figures

2.1 Illustration of mix servers
2.2 Illustration of Onion Routing
2.3 Example how a bus travels between six nodes
2.4 Illustration of two paths in a crowd
2.5 Decryption combining RSA and AES
3.1 Hierarchy of voting system
3.2 Vote client
3.3 Vote receiver
4.1 Voting using one mix-server
4.2 Voting using two mix-servers
4.3 Voting using four mix-servers
4.4 Voting using six mix-servers
4.5 A comparison between different number of mix-servers (encryption)


1 Introduction

Over the past two decades the World Wide Web has presented us with an ever-increasing number of opportunities, including more websites to visit and more things to do online. For example, you do not even need to leave the comfort of your own home to shop for clothes or groceries; all you need to do is order them online. However, just like most other things in life, most technological advancements have two sides. The first side represents the already mentioned benefits that the technology brings society, whilst the other represents the unwanted consequences it brings along. Of these consequences, security risks are the most important to consider, especially given the increase in hacker attacks aimed at companies over the past couple of years. For example, a global survey showed that cyberattacks on midsize companies increased by 64% between 2013 and 2014 alone [3]. Because of this, our personal information, which is managed by many of these companies, runs an increasing risk of being leaked to third parties. This raises the question of whether we should be anonymous online to keep our personal activities and information just that: personal.

The debate regarding whether, or when, we should be anonymous online reached a major milestone when Edward Snowden revealed confidential documents which proved how various governments (mainly the United States of America through its National Security Agency (NSA)) were maintaining widespread monitoring of their citizens’ online activities. The NSA’s official explanation for this was to maintain homeland security and to prevent terrorist attacks. However, one may argue that the surveillance is stretched too far when government officials’ phones are wiretapped and 200 million text messages per day are collected and stored globally [10].

However, governments are certainly not the only third party interested in our online patterns and personal information. Large IT corporations, such as Facebook and Google, advocate a non-anonymous approach to the way we surf the Internet today, and the result of their actions to achieve this can be seen anywhere online [7]. The Google Chrome web browser, for instance, urges you to log in with your personal Google account. While this may let you load your previous bookmarks, it also logs your online activities and connects everything you do to your account. Facebook uses the same approach and either requires or urges you to log into and connect your Facebook profile when you sign up to new websites or download a new application, which often allows them to use your profile and personal information along with your contacts, photos and videos. These approaches create an online profile for you which, apart from tracing your digital footprint, will tailor your online experience. This means, for example, that all your searches through the Google search engine will be adapted to you specifically. But apart from this customized online experience, these companies will also sell information about your digital footprint to advertisers if your online profile fits their target group. This allows advertisers to target you with advertisements for products that are popular amongst your target group, and is called direct advertising [8].

There are today many different ways of avoiding this constant tracking of one’s online activities. There are, for example, various web browsers (the most famous one being the Tor Browser) that use anonymity protocols to ensure not only that the information sent is encrypted and safe, but also that information about who is communicating with whom is hidden from third parties. While these steps can be taken in order to practice immoral and/or illegal behaviour, there are still many reasons why law-abiding citizens would want to achieve anonymity as well. For example, one might be afraid of the consequences of expressing one’s opinions under the freedom of speech, as with the whistle-blowing performed by Edward Snowden, or of having one’s identity or personal information revealed as a consequence of online counselling, surveys or forums.

A downside of applying such extensive security, however, is the overhead it adds to a network, which results in longer download and upload times. This is due to the encryption and decryption steps that the nodes of such a network need to perform on sent and received messages. An encrypted message is also longer than its non-encrypted counterpart, which means it will take longer to send.

1.1 Aim

This project aims to investigate how information can be sent over a network without someone other than the sender and designated receiver being able to:

• read what information was sent.

• tell who the sender and receiver of the information is.

With this information we will decide which methods are suitable for implementing an anonymous voting system. The main goal is then to program such a system for the purpose of measuring which factors affect the overhead of a typical anonymous network, in terms of:

• number of intermediate nodes.

• number of simultaneously entered messages.

• the file size of the messages.

• if the messages are encrypted or not.

1.2 Related work

There has been work done towards implementing secure voting services [17] [18]. The focus of these implementations has mainly been integrity and confidentiality. Integrity involves ensuring that all voters are convinced that the votes are registered correctly. Any attempt to alter the integrity of an election must be detected and dealt with correctly. The confidentiality of all votes must be assured to prevent the sale of votes, protect the voters’ privacy and protect voters from coercion. Voting systems are hard to design because these requirements conflict. Integrity can easily be obtained through a simple show of hands, but this strongly conflicts with confidentiality. Confidentiality can be obtained by using secret ballots, but this alone fails to ensure integrity.

An evaluation of an electronic voting system, Civitas, was done in [12]. The paper establishes that the implementation is secure through an information-flow security analysis and through experimental results which evaluate the trade-offs between time, cost and security.

1.3 Limitations

Our system will have some limitations regarding the hardware of the intermediate nodes. These nodes will be local servers on one single computer, in contrast to a real-life implementation where the servers run on their own computers, most likely in different geographical locations.

We also only have time to implement one single system to perform tests on, in order to fulfill our aim, in contrast to implementing multiple systems and comparing them with each other.


2 Theory

When speaking about anonymity online it is often divided into two categories, data anonymity and connection anonymity. Data anonymity prevents the ability to extract data sent, for example being able to read the contents of an e-mail. This is done by using different methods for encrypting the content of the message. Connection anonymity covers up the communication patterns, preventing tracing a message through a network to find the sender and receiver [1].

In this chapter we first describe four different methods to achieve connection anonymity, followed by two commonly used algorithms for encryption.

2.1 Mixes

The first work towards hiding communication patterns within a network was done by Chaum [4] in 1981 and introduced mixes. The purpose of a mix is to hide the correspondence between a sending node and a receiving node. This is achieved by delaying, reordering and padding the traffic between the nodes in the network. Delaying and reordering are done to prevent timing attacks, in which an attacker tries to link incoming messages to outgoing messages by analyzing when they enter and leave the mix. Padding consists of extra bytes of data added to the original plain text message in order to prevent attackers from predicting commonly occurring phrases in the plain text and from learning the lengths of the encrypted messages. In other words, all of this is done to prevent traffic analysis, an attack whose goal is to determine which nodes have communicated, how much, and the lengths of the plain texts, that is, the amount of data sent between them.

In order for node S to anonymously send a message m to node R, S sends $K_M(R_1, K_R(R_0, m), B)$ to the mix M. Here $K_M(\cdot)$ denotes encryption with the public key of the mix M, $K_R(\cdot)$ denotes encryption with the public key of the receiver R, $R_0$ and $R_1$ are random values used for padding, and B denotes the address of the receiver. The mix then decrypts the message, delays it, reorders it with other messages and finally sends $K_R(R_0, m)$ to R.

The padding $R_0$ is needed to prevent attackers from guessing the messages. Assume that an attacker is able to observe all messages being sent and that no random string is used for padding, so that only $K_R(m)$ is sent. Since the key of R is public, the attacker can then encrypt a guessed message $m'$ and compare the result with the observed ciphertext, which makes commonly occurring phrases easy to confirm. Attaching a random string $R_0$ prevents this kind of attack, because even if the attacker guesses the correct message $m' = m$, the guess cannot be verified without knowing the secret value $R_0$. The padding also makes it more difficult to recover the original plain text, since the padding has to be removed as well. This random added string is commonly known as a salt.

The basic idea of this method for connection anonymity is to first send a message to a mix, which reorders its input to generate a new output. A mix complicates the analysis of traffic within a network, but does not work perfectly. An attacker can replay a sent message, causing the same input to the mix, in the hope that the same output will appear as well, which would allow the attacker to learn the destination of a specific message. Because of this, a mix must make sure that identical messages are processed only once. Using a single mix within a network is also not ideal, and a sequence of mixes is therefore typically used in practice. When a sequence of mixes is used, the message is encrypted in layers and each mix removes one layer of encryption [9].
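As a minimal sketch of the behaviour described above, the following Python class collects messages into a batch, drops replayed inputs and forwards each full batch in random order. The class name `BatchMix`, the fixed batch size and the use of SHA-256 digests for replay detection are assumptions made for illustration; decryption and padding are left out.

```python
import hashlib
import random


class BatchMix:
    """Toy mix: collects a batch, drops replays, and flushes in random order."""

    def __init__(self, batch_size, rng=None):
        self.batch_size = batch_size
        self.rng = rng or random.Random()
        self.seen = set()   # digests of every message accepted so far
        self.pool = []      # messages waiting for the next flush

    def receive(self, message: bytes):
        """Accept one message; return the flushed batch, or [] while waiting."""
        digest = hashlib.sha256(message).digest()
        if digest in self.seen:
            return []                   # replayed input: process at most once
        self.seen.add(digest)
        self.pool.append(message)
        if len(self.pool) < self.batch_size:
            return []                   # delay until the batch is full
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)         # reorder before forwarding
        return batch


mix = BatchMix(batch_size=3, rng=random.Random(1))
out = []
for msg in [b"a", b"b", b"a", b"c"]:    # the second b"a" is a replay
    out += mix.receive(msg)
```

Here nothing leaves the mix until three distinct messages have arrived, and the replayed message never produces a second output.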

Figure 2.1 shows an example of how a series of mix servers reorders sent requests. The red request arrives first at the first mix server but leaves second. Reordering happens at the second server as well, where the request is again randomly chosen to leave second, so it needs to wait for another request to arrive before being forwarded. At the last mix server the requests are reordered once again before arriving at their final destination.

Figure 2.1: Illustration of mix servers

2.2 Onion routing

An onion is a metaphor used to describe the layered data structure used for achieving anonymity. In an onion network, all messages are covered in layers of encryption. The encrypted message, onion, is sent through a number of network nodes called onion routers. Each layer of the onion contains the next hop in the route. When a router receives an onion it “peels” off one layer to identify the next destination, then passes the remainder of the onion to the next router. In this way the sender remains hidden due to the fact that each intermediary onion router only knows the previous and following node.

To create and send an onion, the sender connects to an onion proxy that selects a set of onion routers from those available in the onion network. The chosen nodes are arranged in a path along which the message will be transmitted. A layer of encryption is added for each router the message will pass through on the route. As the data is passed on between the routers, each router removes one layer of encryption so that the message finally appears in plain text at the responder. Layering the encryption like this comes with an advantage. As the data is sent between the routers it appears different to each one of them. Due to this, the connection is as strong as its strongest link: all onion routers in the connection must be compromised for the routing information to be uncovered [11].

Figure 2.2 describes an example of an onion when the message is sent from the source through three different routers. At each router one layer of encryption is removed to learn where to send it next, and then the original message arrives at the destination. The router cannot determine if the message came from the sender or just another router.

Figure 2.2: Illustration of Onion Routing
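The layering can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real onion routing implementation: `keystream_xor` is a toy stand-in for a real cipher, each router shares a hypothetical symmetric key with the sender, and a layer is encoded as JSON holding the next hop and the remaining onion.

```python
import hashlib
import json


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for a real cipher: XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(message: str, route, keys, destination: str) -> bytes:
    """Wrap the message in one layer per router, innermost layer first."""
    hops = route[1:] + [destination]    # the next hop each router will see
    payload = message.encode()
    for router, next_hop in reversed(list(zip(route, hops))):
        layer = json.dumps({"next": next_hop, "data": payload.hex()}).encode()
        payload = keystream_xor(keys[router], layer)
    return payload


def peel(router_key: bytes, onion: bytes):
    """Remove one layer; return (next hop, remaining onion)."""
    layer = json.loads(keystream_xor(router_key, onion))
    return layer["next"], bytes.fromhex(layer["data"])


route = ["R1", "R2", "R3"]
keys = {name: name.encode() * 4 for name in route}   # hypothetical per-router keys
onion = build_onion("hello", route, keys, "destination")

hop, trace = "R1", []
while hop in keys:                      # each router peels exactly one layer
    hop, onion = peel(keys[hop], onion)
    trace.append(hop)
```

Each router only learns the next hop from its own layer; after the third router the remaining payload is the original message.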

The biggest difference between an onion routing system and a mix net is the delaying and reordering done by the intermediate nodes. The fact that this is typically not done in onion routing makes it vulnerable to timing attacks. Instead, the strength of sender anonymity in an onion routing network comes from choosing a route where an attacker is unable to position itself to observe all nodes on the route. This is done by choosing an unpredictable route through the network, for example a geographically very diversified route. Thus, onion routing can resist attackers that can only observe the traffic on parts of the network. But if the attacker manages to see the entire path, onion routing loses its security completely. Mixing, on the other hand, is designed so that even if the attacker is able to observe the entire path, security is still not breached, since each mix server delays and reorders the messages on their routes. However, this comes with a drawback. A mix node must receive more than one message before sending out any, otherwise the mix only works as an onion router with a time delay. Therefore mix networks are often referred to as high-latency networks and onion routing networks as low-latency networks.

Garlic routing is a variant of onion routing used in the anonymous network called I2P. It encrypts multiple messages together in order to make it more difficult for attackers to perform traffic analysis [20].

2.3 Buses

Just like onion routing, the bus anonymity protocol is based on a metaphor: a regular bus which travels a circular route between all nodes in the network. Messages are placed in seats on the bus, wrapped in encryption. When the bus passes a node, the message’s seat is changed and the empty seat is filled with random data. The node also places its own encrypted messages on the bus to be delivered to other nodes on the bus route. Because of the encryption, only the intended receiver is able to recover the message. Because every node alters the bus, observers are unable to tell if a node inserts a new message or just replaces an already existing seat with its decryption.

As seen in Figure 2.3, the bus travels the network consisting of nodes $N_1$ to $N_6$ in an anti-clockwise manner. If $N_1$ wants to send a message to $N_4$, the path for the message will be $N_1$, $N_2$, $N_3$ and finally $N_4$. The sender $N_1$ encrypts the message m with a layer of encryption for each node the message passes along the route. In this case the encryption would look as follows: $E_{N_2}(E_{N_3}(E_{N_4}(m)))$, where $E_{N_i}$ is the encryption for node $N_i$ on the path and i ranges from 2 to 4. $N_1$ then places the encrypted message on a random seat on the bus. When $N_2$ receives the bus it decrypts the first layer, and the bus continues its route to $N_3$, which also removes one layer of encryption. Finally, when the message reaches $N_4$, all layers are decrypted and the plain text message m is revealed. The message is then removed from the bus and replaced with random bits of data or a new message, and the bus is passed along the route [2].

Figure 2.3: Example how a bus travels between six nodes.

2.4 Crowds

Another approach to anonymity is called Crowds and was proposed by Michael K. Reiter and Aviel D. Rubin in 1998 [15]. It was named after the saying "blending into a crowd": hiding one person’s actions within the actions of many others. It functions by grouping users into large and geographically diversified groups (crowds). The crowd collectively issues requests on behalf of its members. A web server is therefore unable to determine the true source of a request, since it is equally likely that the request was sent by any member of the crowd. Even the members within the crowd cannot determine whether the member they received a request from is its origin or is only forwarding it, since the sender looks identical to any member that only forwards the request.

A request from any member of the crowd is first passed on to a random member of the crowd. That member can either submit the request to the end server or forward it to another random member of the crowd. This process is repeated until the request is submitted to the server. By doing so the end server cannot determine who the original requester is. On the other hand this has a negative side, since a user might be falsely accused of sending a request.

A user connects to a crowd by starting a process called a jondo, which connects to a server called the blender to request access to a crowd. When the user is admitted, the blender replies with an acceptance and information that allows the jondo to participate in the crowd. That information can be shared keys, used to authenticate other members of the crowd. The blender then informs all other jondos about the new member and its shared key. It also informs the jondos when a member leaves the crowd. The user has to select this jondo as a proxy in the web browser; after this is done the jondo randomly selects another jondo from the crowd to forward any requests to. When a jondo receives a request it "flips a coin" with a set probability to decide whether to forward the request to another jondo or to the actual web server. Each request thus travels through a random number of members before reaching its final destination at the web server. This path is maintained for a limited period of time, after which all paths are reformed.
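The coin-flip forwarding can be simulated with a short Python sketch. The function name `crowd_path` and the parameter `p_forward` (the probability of forwarding to another jondo) are assumptions made for illustration; the blender, shared keys and path reformation are not modelled.

```python
import random


def crowd_path(members, initiator, p_forward, rng):
    """Return the list of jondos a request visits before reaching the server."""
    path = [initiator]
    # The initiator's jondo always forwards to one randomly chosen jondo first.
    path.append(rng.choice(members))
    # Every later jondo flips a biased coin: forward again or submit.
    while rng.random() < p_forward:
        path.append(rng.choice(members))
    return path


rng = random.Random(0)
members = list(range(6))
example = crowd_path(members, initiator=0, p_forward=0.75, rng=rng)
```

Under these assumptions the number of extra forwards is geometrically distributed, so a path contains on average $2 + p/(1 - p)$ jondos, where p is the forwarding probability; for p = 0.75 this gives five.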

An illustration of how two different requests within a crowd can be routed before reaching the end server is shown in Figure 2.4. Orange path: 1-4-6-2-A. Green path: 5-1-2-3-B.

Figure 2.4: Illustration of two paths in a crowd

Using crowds as a method for anonymous routing does not provide any sender anonymity against a local eavesdropper (an attacker who can observe all communication to and from one user in the crowd), since the eavesdropper can observe that a request output by the user did not come from any corresponding input. Another known vulnerability is that a number of attackers can join the same crowd and wait for the path to be reformed numerous times. Each attacker logs its predecessor after each reformation. Since the sender is more likely than any other node to appear on the path, the attackers will see more clearly who the sender is with each round of reformation.

2.5 Encryption algorithms

When talking about encryption algorithms there are two types: symmetric-key and asymmetric-key algorithms. A symmetric algorithm uses the same cryptographic key for encryption and decryption, while an asymmetric algorithm uses one key for encryption and another for decryption. We will now give an example of a symmetric and an asymmetric encryption method, how they work individually and how they can be used in combination.


2.6 AES

The symmetric cryptographic algorithm Advanced Encryption Standard (AES) is based on the cipher Rijndael, created by Vincent Rijmen and Joan Daemen, and replaced the previous standard, the Data Encryption Standard (DES), which was no longer secure. The National Institute of Standards and Technology (NIST) announced a contest to develop a new encryption standard. DES had been the standard since the 1970s, but as computing power increased the cipher became vulnerable to full-scale key search attacks and was no longer considered safe, because it only provides $2^{56}$ (about $7 \times 10^{16}$) unique keys. Three years after announcing the contest, NIST chose the cipher created by Rijmen and Daemen and released a standardized form of the algorithm as the new standard for encryption.

Implementation

AES is based on a design principle called a substitution-permutation network. It consists of a series of operations, some of which replace inputs by specific outputs (substitutions) and others which shuffle bits around (permutations).

All computations in AES are performed on bytes rather than bits: AES treats 128 bits of plain text as 16 bytes, arranged in a 4x4 matrix. The number of rounds in AES is determined by the length of the key used. For 128-bit keys AES uses 10 rounds, 192-bit keys use 12 rounds and 256-bit keys use 14 rounds. Each round consists of four steps:

1. Byte Substitution (SubBytes) - Every byte is replaced using a substitution box, the S-box, a component in cryptography designed to make any relation between the plain text and the encrypted text hard to find. The AES S-box is a fixed lookup table that maps each 8-bit input byte to an 8-bit output byte.

2. ShiftRows - The four rows in the matrix are shifted. Row r is shifted by $r - 1$ bytes to the left. So the first row is left unchanged, each byte in the second row is shifted one step to the left, the third row two steps and the fourth row three steps.

3. MixColumns - Each column of the state matrix is multiplied by a fixed 4x4 matrix, generating four new bytes per column. This creates further diffusion in the cipher. This step is skipped in the last round.

4. AddRoundKey - The matrix is bit-wise XORed with a round key.

An AES cipher text is decrypted by performing the encryption steps in reverse order, replacing each step with its inverse function [19].
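Two of the round steps are simple enough to sketch directly. The Python below shows ShiftRows and AddRoundKey on a 4x4 state stored as a list of four rows; this layout and the function names are assumptions for illustration, and SubBytes, MixColumns and the key schedule are left out, so this is not a usable AES implementation.

```python
# Number of rounds for each supported key length, as listed above.
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def shift_rows(state):
    """Rotate row r (counted from 0) of the 4x4 state left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def add_round_key(state, round_key):
    """XOR every state byte with the round-key byte at the same position."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


state = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
shifted = shift_rows(state)
# shifted == [[0, 1, 2, 3], [5, 6, 7, 4], [10, 11, 8, 9], [15, 12, 13, 14]]
```

Since XOR is its own inverse, applying `add_round_key` twice with the same round key restores the state, which is why AddRoundKey needs no separate inverse function during decryption.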

2.7 RSA

RSA is an asymmetric cryptographic algorithm, meaning that two different keys are used: a public key for encrypting a message and a private key for decrypting it. The public key can be known by everyone. The purpose of this is that a message can be encrypted without having to exchange a secret key separately with the receiver. RSA keys are calculated as follows:

• Choose p and q as distinct prime numbers.

• Calculate $n = pq$.

• Calculate $\Phi(n) = (p - 1)(q - 1)$. This value is kept secret.

• Choose e so that $1 < e < \Phi(n)$ and the greatest common divisor of e and $\Phi(n)$ is 1.

• Determine the modular multiplicative inverse $d = e^{-1} \bmod \Phi(n)$.

• Then e and n are released as the public key and d is kept as the private key.

To send a message m you encrypt it as $c = m^e \bmod n$. The receiver then decrypts it as $m = c^d \bmod n$ [16].
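The key calculation can be followed with a small worked example in Python. The primes p = 61 and q = 53, the exponent e = 17 and the message m = 65 are chosen here purely for illustration and are far too small to be secure; real RSA also pads the message before encryption. The three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
from math import gcd

# Key generation with deliberately tiny, illustrative primes.
p, q = 61, 53
n = p * q                    # 3233, part of the public key
phi = (p - 1) * (q - 1)      # 3120, kept secret
e = 17                       # public exponent with 1 < e < phi
assert gcd(e, phi) == 1
d = pow(e, -1, phi)          # private exponent: modular inverse of e, 2753

# Encryption and decryption of a message m < n.
m = 65
c = pow(m, e, n)             # c = m^e mod n
recovered = pow(c, d, n)     # m = c^d mod n
```

With these values the cipher text is 2790, and decrypting it returns the original message 65.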

The larger the encryption key, the lower the probability of successfully breaking the encryption, because the number of possible key combinations increases. Today a 128-bit long encryption key is considered safe, which has $2^{128}$ possible values ($3.4 \times 10^{38}$). To put this in context, 100 million ($10^8$) computers that each try 100 billion ($10^{11}$) key combinations per second would need about $10^{12}$ years to try every 128-bit key combination in a brute force attack. On the other hand, the bigger the keys, the more time and computing power are needed to perform the encryption.
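The brute force estimate can be checked with a few lines of Python arithmetic, using the figures from the text and assuming a year of 365 days:

```python
keys = 2 ** 128                         # number of possible 128-bit keys
rate = 10 ** 8 * 10 ** 11               # keys tried per second by all computers together
seconds_per_year = 365 * 24 * 3600
years = keys / rate / seconds_per_year  # time to try every key
```

The result is roughly $1.08 \times 10^{12}$ years, which agrees with the order of magnitude stated above.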

2.8 Combining AES and RSA

Symmetric encryption (like AES) is a lot faster than asymmetric public key encryption (like RSA) because public key encryption requires much more computational power. This also makes symmetric encryption much more suitable for processing larger messages [13]. A drawback of symmetric encryption, however, is the transfer of the symmetric key between the sender and receiver, which has to be done safely. With asymmetric encryption, on the other hand, the public key can be made available for anyone to use for encryption, since the private key belonging to that public key is needed to decrypt the message. Although this seems practical, one needs to be sure that the public key one uses for encryption actually belongs to the designated receiver and has not been altered by a third party. This is managed by trusted entities called certificate authorities, which issue digital certificates that tie public keys to their owners [6].

To avoid these disadvantages for each respective encryption method, AES and RSA are often used together. The receiver then generates RSA-keys and gives the public key to the sender. The sender then uses AES to encrypt data, and RSA to encrypt the AES-key with the receiver’s public key. The receiver then uses its private RSA-key to decrypt the AES-key and then that AES-key is used to decrypt the actual data.

Figure 2.5 illustrates this with an example of how a message is received and decrypted at nodes in a network implementing this combination of RSA and AES cryptography. First the AES-key appended to the message (which is encrypted with the mix’s public RSA-key) is decrypted with the mix’s private RSA-key. This AES-key is then used to decrypt the message and the appended salt. The salt is then removed from the message, and the message is passed on to the next node for further decryption of layers.
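The combination can be sketched in Python under loudly stated assumptions: `stream_xor` is a toy stand-in for AES, the RSA key pair is textbook RSA built from two Mersenne primes (insecure, chosen only so that the example is self-contained), and the salt has a fixed length and is prepended to the message instead of being sent as a separate encrypted field. None of these names or choices come from the implementation described in this thesis.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA key pair of the receiver (insecure, for illustration only).
P, Q = 2 ** 89 - 1, 2 ** 107 - 1
N, E = P * Q, 65537
PHI = (P - 1) * (Q - 1)
assert gcd(E, PHI) == 1
D = pow(E, -1, PHI)                     # private exponent, Python 3.8+
SALT_LEN = 8                            # assumed fixed salt length in bytes


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for AES: XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def hybrid_encrypt(message: bytes):
    """Encrypt the salted message symmetrically and wrap the key with RSA."""
    session_key = secrets.token_bytes(16)            # fresh symmetric key
    salt = secrets.token_bytes(SALT_LEN)
    body = stream_xor(session_key, salt + message)
    wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)
    return wrapped_key, body


def hybrid_decrypt(wrapped_key: int, body: bytes) -> bytes:
    """Unwrap the symmetric key with the private key, decrypt, strip the salt."""
    session_key = pow(wrapped_key, D, N).to_bytes(16, "big")
    return stream_xor(session_key, body)[SALT_LEN:]


wrapped, body = hybrid_encrypt(b"vote: yes")
plain = hybrid_decrypt(wrapped, body)
```

Only the short symmetric key passes through the slow public key operation, while the message itself is handled by the fast symmetric cipher; the fresh key and salt also make two encryptions of the same message look different.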

Figure 2.5: Decryption combining RSA and AES

3 Method

In this section we will describe the experiments we conducted in order to answer the questions posed in Section 1.1. In Section 3.1 we describe the system that we developed to run the experiments on, and in Section 3.2 we account for the methods and settings used for the experiments.

3.1 Implementation of anonymous network

The system we constructed (as seen in Figure 3.1) represents a digital vote box, where users (employees or members of an organization, for example) can express their opinions by voting anonymously in different polls. The entries shall only be viewed by an officially elected employee in charge of the elections. This person will only be able to see what each entered vote is for, not who cast it. This is due to the importance of a voter’s personal integrity, since revealing his or her identity might lead to unwanted consequences from voters with conflicting opinions. Each link in the vote system is also encrypted so that no matter where a hacker intercepts a message, he or she will not be able to read what it contains or who the sender or receiver is.

Figure 3.1: Hierarchy of voting system

The digital vote box is constructed like a mix network, but without the standard mix behaviour of delaying and reordering (i.e. collecting a number of votes before sending them out in a random order). Instead, the mixes pass along the votes as soon as they receive them, and their output follows the order of their input: first in, first out. The combination of AES and RSA encryption was used to preserve data anonymity. The network is composed of the following nodes: vote clients, a vote receiver and connecting mixes. Each mix ran on a server borrowed from Linköping University (LiU), the vote client was a Sony Xperia Z3 Compact and the vote receiver a OnePlus 2, both running Android version 5.1.1. The nodes were connected over 2.4 GHz Wi-Fi, where the mix was connected to LiU’s network Eduroam and the cellphones to a personal network at an apartment close to the university. When multiple mixes were used they were run on the same machine, but set up as independent servers with their own code, files, RSA-keys and incoming/outgoing TCP ports.

The vote clients and vote receiver ran applications that were written in Java. Since we decided to write the code on all devices in Java, we chose to use Java’s built-in encryption libraries, the Java Cryptography Architecture [14] (JCA), to handle the cryptography on all devices. The personal public and private RSA-keys for all involved nodes in the network were generated with the open source cryptography library OpenSSL [5]. The private keys were then manually put onto the node in the network to which they belonged, while all the public keys were put onto all nodes.

On a vote client a voter first has to log in with an NFC-card (a personal key card) and a password, to ensure that each voter can vote only once. The layout of the application run on vote clients after a user has logged in can be seen in Figure 3.2. Here the voter can choose how many votes to send and whether they should be encrypted. However, these options are present solely for testing the system and would not be available if the system were launched, since a voter would always want to send encrypted votes and would be allowed to send only one.

When a sender wants to encrypt a message using JCA it first needs to create an AES-key of 256 bits, which is then hashed and saved locally for future encryption. Depending on how many nodes there are in the network, the client then adds one layer of encryption for each node to the data to be sent. The innermost layer consists of the AES-encrypted message meant for the receiver (padded with a random salt), the RSA-encrypted AES-key and the AES-encrypted salt used for padding the message. Each layer has its own AES-key, so that no mix on the route to the receiver can decrypt the message any further than its own layer. If, for example, four mix servers are in the mix net along with one receiver, the vote client creates five AES-keys: one for each layer. Each further layer therefore consists of the data from the previous layer (padded with a new random salt) encrypted with the AES-key for the new layer, along with this new AES-key encrypted with the public RSA-key of the designated mix, and the new random salt, also encrypted with AES.
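The layering described above can be illustrated with a minimal JCA sketch. It is not the code used in the thesis: the class and method names (LayeredVote, addLayer, removeLayer, wrapForRoute) and the byte layout of a layer are assumptions made for illustration, and the salt-based padding is replaced here by AES in GCM mode with a random IV, which is a different technique chosen to keep the example short. RSA with OAEP padding and 2048-bit keys are likewise assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class LayeredVote {
    // Assumed transformations; the thesis does not state mode or padding.
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";
    private static final int IV_BYTES = 12;
    private static final SecureRandom RNG = new SecureRandom();

    /** Adds one layer: a fresh AES-key encrypts the payload, RSA encrypts that AES-key. */
    static byte[] addLayer(byte[] payload, PublicKey nodeKey) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256); // 256-bit AES-key, one per layer
        SecretKey aesKey = gen.generateKey();

        byte[] iv = new byte[IV_BYTES];
        RNG.nextBytes(iv); // random IV (stands in for the random salt in the text)
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, iv));
        byte[] body = aes.doFinal(payload);

        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.ENCRYPT_MODE, nodeKey);
        byte[] wrappedKey = rsa.doFinal(aesKey.getEncoded());

        // Assumed layout: [key length][RSA-encrypted AES-key][IV][AES ciphertext]
        return ByteBuffer.allocate(4 + wrappedKey.length + IV_BYTES + body.length)
                .putInt(wrappedKey.length).put(wrappedKey).put(iv).put(body).array();
    }

    /** Removes the outermost layer with the private RSA-key of the receiving node. */
    static byte[] removeLayer(byte[] layer, PrivateKey nodeKey) throws GeneralSecurityException {
        ByteBuffer buf = ByteBuffer.wrap(layer);
        byte[] wrappedKey = new byte[buf.getInt()];
        buf.get(wrappedKey);
        byte[] iv = new byte[IV_BYTES];
        buf.get(iv);
        byte[] body = new byte[buf.remaining()];
        buf.get(body);

        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.DECRYPT_MODE, nodeKey);
        SecretKey aesKey = new SecretKeySpec(rsa.doFinal(wrappedKey), "AES");

        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.DECRYPT_MODE, aesKey, new GCMParameterSpec(128, iv));
        return aes.doFinal(body);
    }

    /** Wraps a vote for a route given in travel order: first mix, ..., receiver. */
    static byte[] wrapForRoute(byte[] vote, List<PublicKey> route) throws GeneralSecurityException {
        byte[] data = vote;
        for (int i = route.size() - 1; i >= 0; i--) {
            data = addLayer(data, route.get(i)); // innermost layer belongs to the receiver
        }
        return data;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
        rsaGen.initialize(2048);
        List<KeyPair> nodes = new ArrayList<>(); // two mixes, then the receiver
        for (int i = 0; i < 3; i++) nodes.add(rsaGen.generateKeyPair());

        List<PublicKey> route = new ArrayList<>();
        for (KeyPair node : nodes) route.add(node.getPublic());

        byte[] data = wrapForRoute("candidate-A".getBytes(StandardCharsets.UTF_8), route);
        for (KeyPair node : nodes) data = removeLayer(data, node.getPrivate());
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```

With a route of two mixes and one receiver the client creates three AES-keys, one per layer, and each node can remove only the layer encrypted for its own RSA-key.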


Figure 3.2: Vote client

Each time a mix server receives a message it first starts a new thread for it; threads are the Java mechanism that allows messages to be processed in parallel, which speeds up the process. The thread then reads the message from a file and decrypts the appended AES-key with the mix’s private RSA-key. It uses this AES-key to decrypt the message and the salt. Now that the salt is known, it is removed from the message. The decrypted message is then sent to the next node on the message’s route through the network on its way to the vote receiver.
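The thread-per-message behaviour of a mix can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the class MixSketch and its methods are hypothetical, the decryption step is passed in as a function, and an in-memory queue stands in for the socket to the next node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.UnaryOperator;

final class MixSketch {
    // Hypothetical hook: removes this mix's layer of encryption from a message.
    private final UnaryOperator<byte[]> removeLayer;
    // Stand-in for the outgoing TCP connection to the next node on the route.
    private final ConcurrentLinkedQueue<byte[]> nextNode = new ConcurrentLinkedQueue<>();

    MixSketch(UnaryOperator<byte[]> removeLayer) {
        this.removeLayer = removeLayer;
    }

    /** Starts one new thread for the received message and returns it. */
    Thread receive(byte[] message) {
        Thread worker = new Thread(() -> nextNode.add(removeLayer.apply(message)));
        worker.start();
        return worker;
    }

    /** Feeds a batch of messages to the mix and waits until all have been forwarded. */
    List<byte[]> receiveAll(List<byte[]> messages) {
        List<Thread> workers = new ArrayList<>();
        for (byte[] message : messages) {
            workers.add(receive(message));
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for mix threads", e);
            }
        }
        return new ArrayList<>(nextNode);
    }
}
```

In this sketch the order in which the threads finish is decided by the scheduler, so the forwarded messages are not guaranteed to leave in arrival order.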

The vote receiver’s interface can be seen in Figure 3.3. The vote receiver works like a server which continuously listens for incoming traffic, and is started by pressing the button "START VOTE SERVER". This first notifies the last mix in the mix-net that the receiver has started and which IP address and port it is listening on. When a message arrives, the same decryption process as explained for the mixes is initiated. When all votes have arrived for that specific round of voting, the time it took is recorded and printed in a scrollable list along with how many votes were sent and whether they were encrypted or non-encrypted. At the top of the screen there is also a vote counter which counts how many votes have been received in total. It is also worth mentioning that a non-encrypted vote transferred through the voting system goes through the same route and procedures, except for the cryptography-related procedures such as encryption, decryption and handling of the various keys.


Figure 3.3: Vote receiver

3.2 Methods and settings

All the tests were performed by sending between 1 and 200 votes (each 50 bytes in size) at the same time, in steps of 25, and measuring the time taken for all the votes to travel from the vote client to the vote receiver. This was performed for mix networks of 1, 2, 4 and 6 servers. We also performed tests to measure the time taken when varying the file size of the messages sent. The file sizes were set between 0.9 and 7.2 Mb, in steps of 0.9 Mb, and only one mix was used.

When doing the vote tests we repeated each test five times and calculated the average time in ms. We did this because some variation in the time taken was detected between runs; averaging over five runs therefore gave a more accurate result.


4 Results

In this section we present the results of our measurements of the voting system. Figure 4.1 shows the processing time of the voting system when varying the number of encrypted versus non-encrypted votes sent through it from 1 to 200, in steps of 25 votes. Figures 4.2 to 4.4 also depict these measurements, but with a different number of mix servers in their respective networks. Figure 4.5 combines the results for encrypted votes with varying numbers of mixes in one graph, to highlight their differences more clearly. Figure 4.6 shows the processing time attained when varying the size of the encrypted and non-encrypted messages sent through the system between 0.9 and 7.2 Mb, in steps of 0.9 Mb. The exact measured time for each number of votes sent can be seen in the bottom two rows of the tables under each graph.


Figure 4.1: Voting using one mix-server


Figure 4.3: Voting using four mix-servers


Figure 4.5: A comparison between different number of mix-servers (encryption)


5 Discussion

This chapter reviews the methods used, and derived results, for this thesis. It also discusses the work in a wider context and future work on the subject. An ethical view on this subject is also considered.

5.1 Method

When we constructed our voting system we first wanted to make sure that the method we used would preserve the sender anonymity of the voter, due to the importance of personal integrity in voting. However, we wanted the method to provide data anonymity just as much. Even if a third party intercepts a vote without being able to see who voted, the information about who was voted for is itself worth protecting: if enough of it leaked, it could affect how people vote. This is the reason we decided to construct a network of mixes, since they are designed to maintain data anonymity just as much as sender anonymity. The data anonymity is maintained through the commonly used combination of AES and RSA encryption, which makes it computationally infeasible to decrypt the votes without the private key. The combination also makes the voting setup easy to install wherever it is to be used, since the AES-keys do not need to be manually and safely transported between voter and vote receiver. Instead, the public keys for each node are distributed among the nodes. We also use a new AES-key to encrypt each layer, so that a corrupt mix on the route cannot decrypt the message any further than its own layer. Mixes also use padding, which makes it harder for a third party to decrypt the message by guessing what the votes are. This is especially important for a voting system, since the layout of a vote is fairly simple to guess, especially if the candidates are few. The sender anonymity is preserved by only keeping destination information about the previous and next node on the route in the message, and not where the message was originally sent from.

Since we are not using delaying and reordering, our system becomes similar to a low-latency onion routing network. There is still a big difference, however: we do not use an onion network proxy server to determine the route of the vote message through the network. Instead, the route and the number of encryption layers are predetermined at the sending vote client.

Using an implementation of the bus routing protocol would not be a good idea, because the routing scales quadratically with the number of participants. The bus visits N nodes on a round trip, which is O(N), and the processing and network delays at each node are also O(N). This results in a total round trip time of O(N^2), which would limit the scalability of a system using this protocol [1].
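The scaling argument can be written out as a short derivation. The constant c, a per-node delay per participant, is an assumption introduced here for illustration and is not taken from [1].

```latex
T_{\text{node}}(N) = cN, \qquad
T_{\text{round}}(N) = \sum_{i=1}^{N} T_{\text{node}}(N) = N \cdot cN = cN^{2} = O(N^{2})
```

Under this assumption, doubling the number of participants roughly quadruples the round trip time.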

A crowd-based solution is not suitable for a voting service, because hiding one’s action among the actions of others can lead to a person being falsely accused of placing a vote that he or she did not originate. That person may simply be the end jondo in the crowd, who forwards the vote to the vote receiver.

An encoding conflict between the Java classes BufferedReader/InputStreamReader used for transferring the messages between the nodes’ sockets, and the ciphers used for encryption/decryption, resulted in an alteration of the encrypted messages (character-based readers decode the raw bytes using a character set, which does not preserve arbitrary binary data). This alteration changed the length and content of the messages, which led to us not being able to correctly decrypt what was sent to the servers. Although solving this would be preferable for the feasibility of the system, it did not change the time the system takes to process encrypted messages. This is because we still send the encrypted messages between the nodes in the network so that they are received at the sockets, hence that factor is included in the results for the processing time. The difference is that we decided to read the same encrypted messages for decryption from files that have the right encoding, rather than from the sockets. We even measured the two reading methods in milliseconds to ensure that they were equal in processing time, which they were.

The clocks on the cellphones we used (i.e. vote sender and receiver) were not synchronized. This means that the time recorded when starting an experiment on the sender, which was appended as a timestamp to the message, could not simply be subtracted from the timestamp recorded when the last decryption on the receiver was completed. We therefore had to measure the difference between the cellphones’ internal clocks, in order to take it into account when calculating the processing time at the receiver. We did this by manually pressing a button at the same time on both units, which displayed their respective current time in milliseconds. We repeated this procedure 10 times and then calculated the average time difference. This reduces the impact of the clock difference on the results to a minor error margin of a couple of milliseconds for each measurement. To avoid the problem altogether we could have implemented both the sender and the receiver on the same cellphone, so that they would share the same clock. For the app this would mean combining the two interfaces, seen in Figures 3.2 and 3.3, into one.
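The correction described above amounts to two small calculations, sketched below. The class and method names are hypothetical, and the readings used in the usage note are made-up values, not measurements from the thesis.

```java
final class ClockOffset {
    /**
     * Mean of (receiver clock - sender clock) over paired readings taken
     * at the same moment on both phones, in milliseconds.
     */
    static double meanOffsetMs(long[] senderMs, long[] receiverMs) {
        if (senderMs.length == 0 || senderMs.length != receiverMs.length) {
            throw new IllegalArgumentException("need equally many, at least one, readings");
        }
        double sum = 0;
        for (int i = 0; i < senderMs.length; i++) {
            sum += receiverMs[i] - senderMs[i];
        }
        return sum / senderMs.length;
    }

    /**
     * Processing time of a test run: the receiver's finish time is first
     * translated to the sender's clock, then the send timestamp is subtracted.
     */
    static double processingTimeMs(long sentSenderClockMs, long doneReceiverClockMs, double offsetMs) {
        return (doneReceiverClockMs - offsetMs) - sentSenderClockMs;
    }
}
```

For example, with hypothetical paired readings that differ by 40, 50 and 60 ms, the mean offset is 50 ms; a vote stamped 5000 on the sender and finished at 5450 on the receiver then has a processing time of 400 ms.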

Running all the mix servers on the same machine is not a setup that would be preferred in reality. Instead they would run on their own machines at different geographical locations. Spreading out the mix servers is a vital part of maintaining sender anonymity in mix networks, because it becomes much harder for an observing adversary to monitor the traffic between the mixes: he or she would need to control all of the mixes’ networks, rather than just the one network to which our machine running multiple servers is connected. Since the servers are on the same machine they also share a CPU, so their individual operations are slowed down when they run simultaneously. Running them on separate machines might therefore decrease the processing times presented in the results, although it is important to take into consideration that the time it takes to send a message to a server on another machine might, on the other hand, increase the processing time.


5.2 Results

As shown in Figures 4.1 to 4.4, the time taken for encrypted votes to arrive at the receiver increases significantly more with the number of votes sent than the time taken for non-encrypted votes. In Figure 4.1 we can see that when the number of encrypted votes doubles, the time also almost doubles. This indicates that the increase is linear, but to verify this we divide the time taken after each doubling by the time taken before it. The first doubling is from 25 to 50 votes, which gives a factor of 4524/1950 = 2.3, meaning that the time taken increases 2.3 times. The next doubling is from 50 to 100 votes, which results in the factor 10569/4524 = 2.3. From 75 to 150 votes we get 14553/7533 = 1.9 and finally from 100 to 200 votes 19655/10569 = 1.9. Ideally the factor would be a constant 2.0 for the increase to be perfectly linear, but since the factors fluctuate close to this value it still gives an indication of linearity. This corresponds to about 80-90 ms per encrypted vote sent for the whole test with one mix. The average gradient for the whole curve (i.e. between 1 and 200 votes), however, is the difference in time divided by the difference in sent votes, which here equals 96.7 ms per vote. When only one encrypted vote is sent it takes 404 ms.
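The doubling factors and the average gradient can be recomputed from the measurements quoted in the paragraph above. The sketch below is only a check of that arithmetic; the class and method names are hypothetical.

```java
import java.util.Locale;

final class LinearityCheck {
    /** Ratio between the time after a doubling of the load and the time before it. */
    static double doublingFactor(double timeBeforeMs, double timeAfterMs) {
        return timeAfterMs / timeBeforeMs;
    }

    /** Slope of the straight line between the first and the last measurement. */
    static double averageGradient(double votes0, double time0Ms, double votes1, double time1Ms) {
        return (time1Ms - time0Ms) / (votes1 - votes0);
    }

    public static void main(String[] args) {
        // Encrypted votes with one mix, as quoted in the text for Figure 4.1.
        double[][] doublings = {
            {1950, 4524},    // 25 -> 50 votes
            {4524, 10569},   // 50 -> 100 votes
            {7533, 14553},   // 75 -> 150 votes
            {10569, 19655},  // 100 -> 200 votes
        };
        for (double[] d : doublings) {
            System.out.println(String.format(Locale.ROOT, "factor: %.1f", doublingFactor(d[0], d[1])));
        }
        // 404 ms for 1 vote and 19655 ms for 200 votes.
        System.out.println(String.format(Locale.ROOT, "gradient: %.1f ms per vote",
                averageGradient(1, 404, 200, 19655)));
    }
}
```

A factor close to 2.0 for every doubling is what a straight line through the origin would give, which is why these ratios serve as a quick linearity check.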

Looking at the same figure for the unencrypted votes, they also seem to double in time taken as the number of votes sent doubles. However, when doing the same calculations as for the encrypted votes, we acquire the following factors: 1.3, 1.5, 1.7 and 2.1. The factors grow with the number of votes, which indicates that the rate of increase is itself increasing, suggesting that the time grows faster than linearly. This equals just a couple of ms per vote sent, in comparison to the 137 ms one single unencrypted vote took to send. The average gradient for this whole curve is calculated to 3.3. Compared to the encrypted votes’ average gradient, this means that an encrypted vote takes about 96.7/3.3 = 29.3 times as long (an increase of 96.7 − 3.3 = 93.4 ms) for the system with one mix to process than a non-encrypted vote.

When doing the same calculations for each set-up with a different number of mixes, i.e. for Figures 4.2 to 4.4 independently, we notice the same patterns for both encrypted and non-encrypted votes as for Figure 4.1.

When comparing the results for encrypted votes in Figures 4.2 to 4.4 we can see that an increase by two mixes increases the overhead of the system by an average of 2600 ms. This average is calculated by subtracting the processing time for 200 votes in Figure 4.2 from the corresponding value in Figure 4.3, adding the result of the same subtraction between Figures 4.4 and 4.3, and finally dividing the sum by two. When only one mix is added, however, we get an increase of 3002 ms when comparing the same values between Figures 4.1 and 4.2. When doing these calculations for lower numbers of sent votes, a smaller difference in processing time is derived. For example, the average increase in overhead when adding two mixes for 25 sent votes is 1173 ms. This is due to the figures’ linearity. In Figure 4.5 we present the results for varying numbers of mixes in the same plot to better illustrate this difference; its lines represent the encrypted votes from Figures 4.1 to 4.4. When estimating the average gradients of these curves, we get an average gradient of 112.3 for 2 mix servers, 125.1 for 4 mixes and 135.0 for 6 mixes. This is an 11.4 % increase of the average gradient going from 2 to 4 mixes, and an 8.0 % increase going from 4 to 6 mixes. The average of these increases is 9.7 %, meaning that adding two mix servers increases the time it takes to process encrypted messages by 9.7 % on average. We did not perform these calculations for the non-encrypted votes since they only differed by an average of around 100 ms when increasing the number of mixes.
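The relative increases can be recomputed from the rounded gradients quoted above, as sketched below with hypothetical class and method names. With the rounded values the second step comes out at about 7.9 %, slightly below the reported figure; we assume the difference comes from the gradients having been rounded before being printed.

```java
import java.util.Locale;

final class GradientIncrease {
    /** Relative increase from one average gradient to the next, in percent. */
    static double percentIncrease(double from, double to) {
        return (to - from) / from * 100.0;
    }

    public static void main(String[] args) {
        // Average gradients for 2, 4 and 6 mixes, as quoted in the text.
        double[] gradients = {112.3, 125.1, 135.0};
        double sum = 0;
        for (int i = 1; i < gradients.length; i++) {
            double increase = percentIncrease(gradients[i - 1], gradients[i]);
            sum += increase;
            System.out.println(String.format(Locale.ROOT, "step %d: %.1f %%", i, increase));
        }
        System.out.println(String.format(Locale.ROOT, "average: %.1f %%", sum / (gradients.length - 1)));
    }
}
```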

In Figure 4.6 we can distinguish a linear increase in processing time when the file size increases, for both the encrypted and the non-encrypted messages. To verify the graphs’ linearity, however, we calculate the factors in the same way as in the previous calculations for the system with one mix. For the encrypted votes, doubling the file size results in the factor 2.1 going from 0.9 to 1.8 Mb, 1.8 from 1.8 to 3.6 Mb, 1.7 from 2.7 to 5.4 Mb and 2.0 from 3.6 to 7.2 Mb. Ideally the factors should be 2.0 for perfect linearity, since they are calculated for doublings. However, since they fluctuate around 2.0 and their average is estimated to 1.9, they still indicate that the curve is linear. The average gradient of the whole curve is estimated to 3.9, meaning that each Mb takes about 3.9 seconds to encrypt. This is a satisfying estimate, since 0.9 Mb took 3.7 seconds to encrypt according to the graph.

Looking at the curve for the non-encrypted files in the same figure, doubling the file sizes in the same way as for the encrypted curve results in the following factors: 1.7, 2.2, 1.6 and 2.0. These also fluctuate around 2.0 and have an average of 1.9, indicating that this graph is also linear. The average gradient for this curve is estimated to 0.88, meaning that each Mb takes about 0.88 seconds to be processed through the system. This is also satisfying, since 0.9 Mb was processed in 0.9 seconds by the voting system. By comparing this average gradient with the one for encrypted votes, we can calculate that the system takes 4.4 times as long to process an encrypted Mb as a non-encrypted Mb.

The mixes create a new thread for each received message, as explained in Section 3.1. This allows parallel processing of the votes to speed up the process, and is why a doubling of mixes in our system does not result in a doubling of processing time. For example, doubling the mixes from 2 to 4 does not increase the processing time from 22657 ms to 22657 × 2 = 45314 ms; instead it increases by 3101 ms, from 22657 ms to 25758 ms. This usage of threads is also the reason why, for example, 1 encrypted vote with one mix takes 404 ms while 25 votes take 1950 ms rather than 404 × 25 = 10100 ms.

5.3 Future work and ethical considerations

This thesis has presented a basic implementation of an online voting service, comparing the time taken for numerous votes being sent both encrypted and in plain text. We have also compared the time taken when sending the data through several mix servers. However, all these mix servers were located on the same machine. It is important to take into consideration that the geographical location of the servers will probably have an impact on the time taken to cast votes. We therefore think it would be interesting to test what impact their geographical location has on the vote-casting time.

To improve the sender anonymity even further, we would also implement delaying and reordering. This would force the mixes to wait for other votes to arrive before reordering them for output, which would make an observing adversary uncertain whether the message that just entered the mix is the message that leaves it straight after. The observer would then not be able to follow the message any further towards its destination. Other features that could make it even more difficult for an adversary to follow a message’s route through the network and understand its traffic patterns would be to randomize the routes and to implement dummy traffic. The dummy messages could be implemented at the mixes so that when they receive a message, they send out a dummy message containing useless information to a random mix. All of these improvements would even make the system resistant to an observer observing all traffic between all nodes in the network. Although they would increase the latency of the system, the benefit of increased sender anonymity would be favored for a voting system.

A flaw in our system is the validation of the authenticity of the votes sent, meaning that we have no method of confirming that the votes received on the vote server actually come from authorized vote clients. For example, a third party might intercept a message sent along the network and discard it. It would then be able to create a new message with a different vote, encrypt it using the vote receiver’s public key, and send the new message into the network towards the vote receiver. Our public keys are also not approved by a CA, which would be needed for the system to be used in the future.

For an online voting service to be used in any major election, its security must be taken into great consideration. This is to prevent anyone from tampering with the system, since this could have severe consequences. First of all, every person’s identity and vote must be protected. Secondly, if anyone manages to change votes without anyone noticing, the outcome of a whole election can be changed. Therefore, when implementing an electronic voting system, security and stability must be guaranteed for it to work without any complications.


6 Conclusion

This thesis contains a proof of concept of an implementation of a secure online voting system. From the results we can draw the following conclusions. We can clearly see that encrypting data and sending it takes considerably longer than sending it in plain text, and the difference increases rapidly when more votes are sent simultaneously, more mixes are used and larger files are encrypted. For example, the system takes around four times as long to process one encrypted vote with one mix as an unencrypted one, and when sending 200 votes with six mixes it takes almost 30 times as long. When increasing our system’s intermediate mixes by two we attain an increase in processing time of 9.7 % per encrypted message. This processing time is also 4.4 times as long per Mb for encrypted messages as for non-encrypted ones. Our system has flaws which, if resolved, would result in even better preservation of sender anonymity, but also in longer processing time. This means that the purpose and goals of a system must be taken into consideration when deciding what methods to implement for maintaining anonymity, and whether data or receiver anonymity is prioritized. As mentioned, our system prioritized both data and sender anonymity, and high latency was not a concern. However, if a system has the goal of fast delivery of data, for example, and receiver anonymity is not as highly prioritized, then one might consider implementing onion routing to achieve low latency due to the absence of delaying and reordering. One might then also consider avoiding big files and too many mixes, due to the big increase in overhead we measured for larger encrypted files and for more intermediate mixes. An example of such a system could be one where the information processed is personal, so that absolutely no one but the requester should be able to access it, but the mere fact that the user requested it is of no great concern.


Bibliography

[1] Niklas Carlsson et al. “Performance modelling of anonymity protocols”. In: Performance Evaluation Elsevier (2012).

[2] Amos Beimel and Shlomi Dolev. “Buses for Anonymous Message Delivery”. In: Journal of Cryptology (2003).

[3] David Burg. “Cyberattacks on the rise: Are private companies doing enough to protect themselves?” In: PricewaterhouseCoopers LLP (2014).

[4] David L. Chaum. “Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms”. In: Communications of the ACM (1981).

[5] OpenSSL Software Foundation. OpenSSL. 2015. URL: https://www.openssl.org/ (visited on 05/24/2016).

[6] GlobalSign. Certificate Authorities Trust Hierarchies. 2016. URL: https://www.globalsign.com/en/ssl-information-center/what-are-certification-authorities-trust-hierarchies/ (visited on 05/24/2016).

[7] The Guardian. Online identity: is authenticity or anonymity more important? 2012. URL: https://www.theguardian.com/technology/2012/apr/19/online-identity-authenticity-anonymity (visited on 05/23/2016).

[8] Mark Hachman. “The price of free: how Apple, Facebook, Microsoft and Google sell you to advertisers”. In: PCWorld (2015).

[9] Joris Claessens, Bart Preneel and Joos Vandewalle. “Solutions for Anonymous Communication on the Internet”. In: IEEE (1999).

[10] Dave Lee. Edward Snowden: Leaks that exposed US spy programme. 2014. URL: http://www.bbc.com/news/world-us-canada-23123964 (visited on 05/22/2016).

[11] Michael G. Reed, Paul F. Syverson and David M. Goldschlag. “Anonymous Connections and Onion Routing”. In: IEEE Journal on Selected Areas in Communications (1999).

[12] Michael R. Clarkson, Stephen Chong and Andrew C. Myers. “Civitas: Towards a Secure Voting System”. In: Computing and Information Science Technical Report (2007).

[13] Microsoft. TechNet Encryption. URL: https://technet.microsoft.com/en-us/

[14] Oracle. Java Cryptography Architecture. 1993. URL: https://docs.oracle.com/javase/8/docs/technotes/guides/security/crypto/CryptoSpec.html#SimpleEncrEx (visited on 05/24/2016).

[15] Michael K. Reiter and Aviel D. Rubin. “Crowds: Anonymity for Web Transactions”. In: ACM Transactions on Information and System Security (TISSEC) (1998).

[16] Ronald L. Rivest, Adi Shamir and Leonard Adleman. “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”. In: Communications of the ACM (1981).

[17] Sergiu Bursuc, Gurchetan S. Grewal and Mark D. Ryan. “Trivitas: Voters directly verifying votes”. In: School of Computer Science, University of Birmingham (2011).

[18] Gurchetan S. Grewal, Mark D. Ryan, Sergiu Bursuc and Peter Y. A. Ryan. “Caveat Coercitor: coercion-evidence in electronic voting”. In: IEEE Symposium on Security and Privacy (2013).

[19] National Institute of Standards and Technology (NIST). “Announcing the Advanced encryption standard (AES)”. In: Processing Standards Publication 197 (2001).

[20] Bassam Zantour and Ramzi A. Haraty. “I2p Data Communication System”. In: International Conference on Networks (2011).
