Umeå universitet
Bachelor Thesis
Spring -13
Cloud computing
from a privacy perspective
Author:
Daniel Evertsson
Supervisor:
Jerry Eriksson
September 6, 2013
Abstract
The cloud could simplify the everyday life of private individuals as well as big enterprises by renting out resources. Resources such as storage capacity, computational power or cloud-based applications could be accessed without the need to invest in expensive infrastructure. Even though many enterprises could benefit from using cloud services, they hesitate, partly because they fear data leakage when storing sensitive data in the cloud environment.
The goal has been to prevent unauthorized users from accessing the users’ data by using client-side encryption. The solution must be able to support existing features. For example, many applications support multiple devices, which means that the user can access the same data from devices such as smartphones, tablets and desktop computers.
The result showed that there are two main approaches to implementing client-side encryption. The first approach bases the encryption key on random elements. It is the more secure method of the two, but it is not user-friendly: the user has to distribute the generated encryption key between all the devices, for example by moving files back and forth. The second approach bases the encryption key on a password. This reduces security but is more user-friendly.
It appears that the biggest problem related to client-side encryption isn’t the encryption itself, but the distribution of encryption keys. As the number of users increases, the key distribution problem becomes more pronounced. Key distribution is often handled by a so-called key manager, which can operate at different levels: it can be built into the application or it can be an external application. There are organizations that have published guidelines for how to design key management systems.
Acknowledgements
First of all I would like to thank Cristian Klein at the department for distributed systems for coming up with the idea for this thesis. He has also provided a lot of valuable input and support. I would also like to thank the teachers Jerry Eriksson and Pedher Johansson for valuable input to this project.
Contents
1 Introduction
1.1 Client-side encryption
1.2 Problem statement
1.3 Definitions
2 Existing solutions that offer Storage-as-a-service
2.1 CrashPlan
2.2 Mozy
2.3 TeamDrive
2.4 Wuala
2.5 Summary of common encryption techniques
2.6 Other solutions
3 Client-side encryption strategies
3.1 User supplied key
3.2 Password based key
3.2.1 Test implementation
3.3 PBKDF vs. Random based encryption key
4 Conclusion
4.1 Client-side encryption
4.2 PBKDF or Random based encryption key
4.3 Dynamic iteration
4.4 Client-side encryption drawbacks
4.5 Future work
Bibliography
Chapter 1
Introduction
In today’s society the use of different internet-connected devices has increased dramatically. We access the internet through devices such as smartphones, tablets, laptops and desktop computers. Between the years 2003 and 2010 the number of devices increased from 500 million to 12.5 billion [1]. This is a 25-fold increase in seven years. In 2010 there were almost twice as many devices as there were people in the world. Users have developed a need to store and access the same data from their different devices.
As a solution to the problem, a concept called cloud computing has been developed. The idea is to let the user access the cloud’s resources such as storage, software, platforms and infrastructure¹. As a user you get access to these resources through the internet, often by using a thin client like a web browser or a client application. You get access to the resources without having to invest in new infrastructure or develop new software. Another benefit of cloud computing is that the user only pays for the resources consumed.
Even though there are many advantages with cloud computing, many companies hesitate to use it. In 2012 Varonis Systems Inc presented research showing that 80 percent of the interviewed companies didn’t want to invest in cloud-based solutions. They didn’t even allow their employees to use existing cloud-based services [2]. The main reasons were that they feared data leakage, security breaches and compliance issues. 70 percent said that they would use cloud-based services if they were as robust as internal tools.
¹ If you want to know more about different kinds of cloud services, visit TechNet Magazine (http://technet.microsoft.com/en-us/magazine/hh509051.aspx)
Because the security is a crucial element in whether companies will start using cloud based services or not, this will be the main focus of this thesis.
This thesis will study different encryption techniques which could be used to encrypt data stored at the cloud provider.
Since users often need to access data from multiple devices, this factor should be taken into account. The users should be able to access their files from devices like desktop computers, laptops, smartphones and tablets. In order to identify the user, a single user account should be used. Since all devices involved should be able to use the encryption technique presented in this thesis, hardware limitations, like computational power, should be taken into account.
1.1 Client-side encryption
To make it more difficult for unauthorized people² to access the user’s data, it should be encrypted. One option would be to let the cloud provider encrypt all the data that is stored in the cloud. This method is called server-side encryption. The problem with this approach is that if an attacker gets access to the cloud provider’s systems, or if an employee of the cloud provider tries to access the data, they will also have access to the decryption key, which makes it very easy to decrypt the data.
To make the data less accessible a method called client-side encryption will be used to encrypt all the users’ data before it’s sent to the cloud provider.
In contrast to server-side encryption, where the encryption key is stored by the cloud provider, the client-side encryption approach only stores the encryption key locally. This will prevent the cloud provider from accessing the data since they won’t know how to decrypt it.
² Unauthorized people could be employees of the cloud provider or people who have broken into the cloud provider’s system.
1.2 Problem statement
First I will look at existing solutions that offer Storage-as-a-service. The interesting solutions are those that offer some kind of client-side encryption. Secondly, the most common client-side encryption techniques will be identified and described in more detail. Advantages and disadvantages of the different approaches will be pointed out. The goal is to decide which encryption technique offers the highest security level. Then, in order to see how the encryption affects the performance of the client application, a test should be implemented to see how the encryption of large files affects the execution time of the application. In the last part a discussion of the different encryption techniques will be presented. Hopefully this thesis will be able to identify the biggest problems related to client-side encryption.
1.3 Definitions
In this section, terms often used in this thesis will be defined.
Salt:
A salt is data, often randomly generated, that is combined with the password when an encryption key is derived. The purpose of the salt is to impede so-called rainbow attacks [3]. In a rainbow attack the hacker generates a table of encryption keys. The table is generated once and then used to test all the generated keys against a given number of users. The countermeasure is to add a salt when generating the encryption key. The salt should be generated at random, or at least be different for every user. This forces the hacker to generate a new rainbow table for every user, which is a very expensive operation. The salt is considered public information, which means that even if the salt is known to the hacker, it will still increase the resources needed to crack the encryption.
SHA:
Secure Hash Algorithm (SHA) was developed by the United States National Security Agency. Together with MD5, SHA is among the most commonly used hash functions in cryptography.
AES:
Advanced Encryption Standard (AES) is an encryption algorithm standardized by the National Institute of Standards and Technology. The algorithm is built to use encryption keys with a length of 128, 192 or 256 bits [4].
Account password:
This is a password that is used to authenticate a user when logging in to the system. The account password will be stored in the cloud and is thereby accessible to anyone who gains access to the cloud provider.
Archive password:
This is a password used to encrypt data. It is only stored locally, unlike an account password, which is stored online. It’s also worth mentioning that if the archive password is lost there will be no way to decrypt the data.
Chapter 2
Existing solutions that offer Storage-as-a-service
There are cloud providers who try to ensure the privacy of their users. People from Fraunhofer Institute for Secure Information Technology have written a report in which they compare different cloud storage providers and evaluate the applications based on different criteria [5]. One of the criteria evaluated is whether the applications support any kind of encryption technique. Out of the seven applications that are benchmarked, the four applications that support client-side encryption have been selected in order to identify common techniques used for client-side encryption. The applications that will be presented in this chapter are CrashPlan, Mozy, TeamDrive and Wuala. In the last part of this chapter other applications, which are not presented in the report written by Fraunhofer Institute for Secure Information Technology, will be studied in order to see if they have come up with any other solution to the client-side encryption problem.
2.1 CrashPlan
CrashPlan¹ offers three kinds of encryption techniques. By default the account password, which is known by CrashPlan, will be used to generate a 128-bit encryption key. Secondly, the user could choose an archive password, which is not known to CrashPlan; it will be used to encrypt the encryption key. The encrypted key will be stored in the cloud and distributed to other clients. In the third alternative the user enters an encryption key which is only stored locally.
¹ Application created by Code 42 Software
2.2 Mozy
Mozy² offers two methods for encryption. All data is encrypted on the client before being sent to the cloud provider. The first option is to use a 448-bit encryption key provided by and also known to Mozy. The user could also enter a private 256-bit encryption key which will only be stored locally.
2.3 TeamDrive
TeamDrive³ uses a concept called a space, which is similar to a folder. When created, the space can be made empty or based on an existing folder. All files that are stored in the space will be transmitted to the cloud provider. For every space a unique AES-256 key is generated, which means that every space has an individual encryption key. In order to share spaces between different devices, the encryption key for that particular space has to be distributed to the other devices. This is done by letting the user export the key to a “.pss” file. The file then has to be transferred by the user to the new device.
2.4 Wuala
Wuala⁴ uses something called convergent encryption. A hash is calculated from each file’s content, and this hash is used as the key to encrypt the file. The hash is then encrypted using the account key. The only way to access the key is to own the original file. The method has one big flaw: it is open to a so-called “confirmation of a file attack”, where the attacker knows the content of a file. If this is the case, then they can verify that a user owns a copy of that file. The attack is most efficient if the content is publicly available, for example copyrighted material. It’s also very simple to see if two users share the same file.
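The key derivation behind convergent encryption, and the reason it allows file confirmation, can be illustrated with a short Python sketch. It only derives the per-file key and does not encrypt anything. The use of SHA-256 and the function name `convergent_key` are assumptions made for illustration and are not taken from Wuala’s implementation.

```python
import hashlib


def convergent_key(file_content: bytes) -> bytes:
    """Derive a 256-bit file key from the file's own content (illustrative only)."""
    return hashlib.sha256(file_content).digest()


# Two users who store the same file derive the same key ...
alice_key = convergent_key(b"publicly available text")
bob_key = convergent_key(b"publicly available text")
# ... while a different file gives a different key.
other_key = convergent_key(b"private notes")

print(alice_key == bob_key)    # True
print(alice_key == other_key)  # False
```

Because the key depends only on the content, anyone who already holds the plaintext can derive the same key, which is what makes it possible to confirm that a user stores a particular file.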
² Application created by EMC Corporation
³ Application created by TeamDrive Systems
⁴ Application created by LaCie
2.5 Summary of common encryption techniques
Both CrashPlan and Mozy offer server-side encryption, or rather a key generated and stored by the cloud provider. These applications also let the user enter an encryption key which is only stored locally. CrashPlan also offers a third alternative where the user enters an archive password. TeamDrive, on the other hand, generates a key when a so-called space is created, and this key is only stored locally. Wuala uses convergent encryption, where the encryption key is calculated based on the content of the file being encrypted.
2.6 Other solutions
There are other cloud providers, not mentioned in the report written by Fraunhofer Institute for Secure Information Technology, that offer client-side encryption, for example Idrive⁵, Swissdisk⁶ and SpiderOak⁷. They have solved the client-side encryption by using the techniques mentioned in the previous section. To be more specific, Idrive lets the user enter a private encryption key. Swissdisk and SpiderOak use an archive password in order to generate an encryption key.
⁵ Application created by IDrive Inc
⁶ Application created by SwissDisk ICS
⁷ Application created by SpiderOak
Chapter 3
Client-side encryption strategies
By studying the existing solutions I have identified two main approaches to solving the problem of client-side encryption. In this chapter these approaches will be presented and their strengths and weaknesses will be pointed out.
3.1 User supplied key
It’s pretty common to let the user enter a generated encryption key which will only be stored locally. The key could sometimes be generated by the client application or in other cases third party programs like an online key generator could be used. In order to make it harder to crack the encryption the user should make sure that the encryption key is based on some random element.
The length of the key is also an important factor. Today the recommended length of an encryption key is 256 bits, which is the longest key that AES supports [4].
One flaw with this technique is that there can be many devices connected to the same user account. If that’s the case, then the encryption key has to be distributed between the different devices. One simple solution would be to let the user memorize the 256-bit encryption key. If the encryption key were represented using the common characters¹ used in passwords, the key would be approximately 43 characters long. It is not reasonable to expect the user to memorize a randomly generated key of this length.
¹ Common characters are defined as [0-9], [a-z] and [A-Z]
There are other ways to distribute the encryption key like the approach used by TeamDrive, where the encryption key is exported to a “.pss”-file. One thing to remember is the fact that no information about the encryption key should be stored in the cloud, for security reasons. The cloud provider can’t be involved in the key distribution for the same reasons as server-side encryption shouldn’t be used. The risk that the encryption key is hijacked by the cloud provider is too great a threat.
3.2 Password based key
Another common way to achieve client-side encryption is to let the user enter an archive password, which will be used to encrypt the data. Based on research made by a scientist from the Council for Scientific and Industrial Research in 2009, most passwords are between 6 and 9 characters long [6]. For more detailed statistics see Figure 3.1. Compared to the 43 characters that a 256-bit encryption key corresponds to, a password would most likely result in a reduced number of possible key combinations. See Table 3.1 for information on how the password length affects the number of possible combinations.
Characters   Number of combinations   Number of bits
6            5.68002 · 10^10          ~36 bits
7            3.52161 · 10^12          ~42 bits
8            2.18340 · 10^14          ~48 bits
9            1.35371 · 10^16          ~54 bits
10           8.39299 · 10^17          ~60 bits
20           7.04423 · 10^35          ~120 bits
30           5.91222 · 10^53          ~180 bits
40           4.96212 · 10^71          ~240 bits
43           1.18261 · 10^77          ~256 bits
Table 3.1: How the number of characters ([0-9][a-z][A-Z]) used in a password affects the number of possible key combinations. The last column shows how many bits are needed to represent the number of combinations.
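The figures in Table 3.1 can be recomputed with a few lines of Python. The sketch assumes an alphabet of 62 characters ([0-9], [a-z] and [A-Z]); the function names are hypothetical. It prints the exact base-2 logarithm, whereas the table rounds to roughly six bits per character.

```python
import math

ALPHABET_SIZE = 62  # [0-9], [a-z] and [A-Z]


def password_bits(length: int) -> float:
    """Bits needed to represent all passwords of the given length."""
    return length * math.log2(ALPHABET_SIZE)


def characters_needed(key_bits: int) -> int:
    """Shortest password length whose search space covers key_bits bits."""
    return math.ceil(key_bits / math.log2(ALPHABET_SIZE))


for length in (6, 8, 10, 43):
    combinations = ALPHABET_SIZE ** length
    print(f"{length:2d} characters: {combinations:.5e} combinations, "
          f"{password_bits(length):.1f} bits")

print(characters_needed(256))
```

The last line evaluates to 43, which matches the length quoted for a 256-bit key written with these characters.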
Figure 3.1: The diagram shows the percentage of 46,000 MySpace users who used a given number of characters in their passwords.
To increase security, a Password-Based Key Derivation Function (PBKDF) could be used. The purpose of a PBKDF is to take a password and, based on it, generate a more complex encryption key, thereby increasing the time needed to crack the encryption [7]. The function adds a salt to the password. The purpose of the salt is to prevent rainbow attacks; see section 1.3 for more information. To make this possible, the salt has to be different for every user. A simple solution when choosing a salt would be to use the username. This ensures that every user gets a unique salt.
Another solution could be to use a so-called "keyfile", where the salt would be based on the content of the file. The file could be any file, for example a family photo. The strategy is used by applications like TrueCrypt [5]. Like the client-generated encryption key, the keyfile has to be distributed between the clients. Since the salt is considered public information, the file could be stored in the cloud unencrypted.
To make it even harder to get access to the encrypted data, a unique randomly generated salt could be used. The salt has to be stored together with the encrypted data.
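The three ways of choosing a salt described above can be sketched in Python as follows. The helper names are hypothetical, and hashing the username or the keyfile content with SHA-256 to obtain a fixed-length salt is an assumption of this sketch, not something the cited sources prescribe.

```python
import hashlib
import os


def salt_from_username(username: str) -> bytes:
    """Unique per user, but predictable to an attacker who knows the username."""
    return hashlib.sha256(username.encode("utf-8")).digest()


def salt_from_keyfile(keyfile_content: bytes) -> bytes:
    """Derived from the content of any file the user picks, e.g. a photo."""
    return hashlib.sha256(keyfile_content).digest()


def random_salt(length: int = 16) -> bytes:
    """Unpredictable salt; it must be stored together with the encrypted data."""
    return os.urandom(length)


print(salt_from_username("alice").hex())
print(salt_from_keyfile(b"bytes of a family photo").hex())
print(random_salt().hex())
```

The first two variants can be recomputed on every device, while the random salt has to travel with the ciphertext.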
After the salt has been added, the resulting string is hashed using an approved hash function, like SHA-256, to generate a 256-bit key. In order to increase the resources needed to crack the encryption, the encryption key is hashed a given number of times. Like the salt, the number of iterations is considered public information. A report written by people from the National Institute of Standards and Technology recommends that the number of iterations should be at least 1000 [7]. This means that an attacker would have to do 1000 hash computations for every password, which increases the time needed to test a given password. This is based on the assumption that the attacker knows the hash function and the number of iterations.
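A minimal Python sketch of password-based key derivation is given below. It uses PBKDF2 with HMAC-SHA-256 from the standard library, which is one standardized PBKDF; note that it applies an HMAC construction rather than the plain repeated hashing outlined above. The password and salt values are made-up examples.

```python
import hashlib

ITERATIONS = 1000  # minimum iteration count recommended in [7]


def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 256-bit key from a password using PBKDF2-HMAC-SHA-256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


key_alice = derive_key("correct horse", b"alice")
key_bob = derive_key("correct horse", b"bob")

print(len(key_alice) * 8)    # 256
print(key_alice == key_bob)  # False: same password, different salts
```

Running the derivation again with the same password, salt and iteration count reproduces the same key, which is why only these public parameters need to be available on every device.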
From the user’s perspective the time needed to make the calculations won’t make a big difference, as long as the number of iterations is not so high that it causes a noticeable delay in the application. 1000 iterations is considered the minimum when using a PBKDF. Since an increased number of iterations amplifies the resources needed to calculate the encryption key, the higher the number the better. Since the system should be able to support different devices, the device with the least computational power should determine the number of iterations. Smartphones should probably be considered the weakest link.
According to a report written by people from Horst Görtz Institute for IT-Security, a smartphone with a 1 GHz ARM processor should be able to do 4000-10000 iterations in what they defined as a reasonable amount of time [8]. Since the number of iterations has a huge impact on the time needed to break the encryption, it is desirable to have as large a number of iterations as possible. Using 4000 iterations instead of 1000 would increase the time by a factor of four.
In their report they also suggested the use of a dynamic iteration count, where the number of iterations depends on the current computational power, for example how many iterations the system is able to do in a limited amount of time. The iteration count is then stored with the encrypted data to make sure that the data can be decrypted. With this method the number of iterations would increase over time in line with technological scaling effects.
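One possible way to pick a dynamic iteration count is to time a short benchmark on the current device and scale the result to a time budget. The sketch below is an assumption about how this could be done, not the procedure from [8]; the function name, the 0.1-second budget and the benchmark size are hypothetical.

```python
import hashlib
import time

MIN_ITERATIONS = 1000  # lower bound recommended in [7]


def calibrate_iterations(time_budget_s: float = 0.1, probe: int = 1000) -> int:
    """Estimate how many PBKDF2 iterations fit into the time budget on this device."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"benchmark", b"salt", probe)
    # Guard against a zero reading from the timer on very fast machines.
    elapsed = max(time.perf_counter() - start, 1e-9)
    estimated = int(probe * time_budget_s / elapsed)
    return max(MIN_ITERATIONS, estimated)


iterations = calibrate_iterations()
# The chosen count is stored with the encrypted data so that any device can decrypt it.
header = {"iterations": iterations, "salt": b"example-salt".hex()}
print(header)
```

A slower device that later opens the data still has to perform the stored number of iterations, so the time budget chosen on a fast desktop also determines the delay on a smartphone.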
3.2.1 Test implementation
In order to test the time needed to encrypt data, a small-scale implementation has been made. To keep it simple, a client-server application which handles notes was developed. First off, client-side encryption was implemented using Java’s Crypto library. In order to generate an encryption key, an existing password-based key derivation function was used. The function used the account username as salt and an archive password provided by the user. It hashed the salt and password combination 2000 times using SHA-1. The produced key complies with AES.
The implementation was used to test how the encryption affects the performance of the client application. To do the test, a number of files of given sizes were encrypted. The test showed that the encryption time depends linearly on the size of the file. It takes less than a second to encrypt 20 megabytes of data, which must be considered relatively fast. The test was made on a laptop with a 2.4 GHz Intel Core Duo processor and 2 GB DDR3 RAM. The operating system used was Windows 7 (32-bit).
Since users access the cloud through the internet, a comparison between the encryption speed and the upload speed of the internet connection was made.
According to a report written by people from Akamai Technologies, the average internet speed in Sweden is 7.3 Mbit/s [9]. Converting it to megabytes per second shows how fast data could be sent to the cloud provider.
\[
\frac{\text{Megabits per second}}{\text{Number of bits per byte}} = \text{Speed in megabytes per second}, \qquad \frac{7.3}{8} = 0.9125
\]
In Figure 3.2 the time needed to encrypt data is compared to the time needed to upload the data to the cloud provider. The figure shows that the time needed to upload a file is much higher than the time needed to encrypt the data. In this case the time needed to encrypt the data will be insignificant. In order to see whether a higher internet speed would be able to compete with the encryption time, I chose an internet speed of 200 Mbit/s. In this case the encryption took longer than the upload of the file, at least for files smaller than 30 megabytes. The result is presented in Figure 3.3.
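The upload side of this comparison can be reproduced with a short calculation. The sketch below only computes upload times from the two connection speeds mentioned above; it does not measure encryption, for which the test reported less than a second for 20 megabytes. The function name is hypothetical.

```python
BITS_PER_BYTE = 8


def upload_seconds(file_size_mb: float, speed_mbit_s: float) -> float:
    """Time to upload a file of the given size (megabytes) at the given speed (Mbit/s)."""
    speed_mb_s = speed_mbit_s / BITS_PER_BYTE
    return file_size_mb / speed_mb_s


for speed in (7.3, 200):
    print(f"{speed} Mbit/s: 20 MB takes {upload_seconds(20, speed):.1f} s to upload")
```

At 7.3 Mbit/s the 20-megabyte upload takes about 21.9 seconds, while at 200 Mbit/s it takes 0.8 seconds, which is in the same range as the measured encryption time.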
Figure 3.2: The time needed to encrypt data of different sizes compared with the time needed to send the data to the cloud. Based on an internet connection of 7.3 Mbit/s
Figure 3.3: The time needed to encrypt data of different sizes compared with the time needed to send the data to the cloud. Based on an internet connection of 200 Mbit/s
3.3 PBKDF vs. Random based encryption key
In order to show how much time would be needed to break an encryption key made by a PBKDF compared to a generated encryption key based on random elements, a small example will be presented. In this example it will be assumed that a computer would be able to test 10^9 passwords per second in a brute force attack.
PBKDF:
The PBKDF creates an encryption key based on a password that is 8 characters² long. The number of password combinations would then be approximately 10^14. It will be assumed that the time needed to generate a key would be
2