Using Neural Networks to Identify Infected Files for Protection against Ransomware

N/A
N/A
Protected

Academic year: 2021

Share "Using Neural Networks to Identify Infected Files for Protection against Ransomware"

Copied!
56
0
0

Loading.... (view fulltext now)

Full text

LiU-ITN-TEK-A--20/049-SE

Master's Thesis in Media Technology and Engineering, carried out in Computer Engineering (Datateknik) at the Institute of Technology, Linköping University.

Using Neural Networks to Identify Infected Files for Protection Against Ransomware
(Swedish title: Använda neurala nätverk för att identifiera infekterade filer för skydd mot ransomware)

David Eriksson
Supervisor: Matthew Cooper
Examiner: Camilla Forsell

Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Linköping University, SE-601 74 Norrköping, Sweden
Norrköping, 2020-08-26

Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication, barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/.

© David Eriksson

ACKNOWLEDGEMENTS

I would like to thank Combitech for supporting me in writing this thesis. A special thanks to Anders Staaf and Mattias Lennartsson for all the support and guidance while developing this thesis.

ABSTRACT

This master's thesis presents the development process and result of an artificial neural network model that can predict whether a file has been encrypted. It was developed as a standalone component that can be integrated into a backup system. The development process was tested to determine the best possible outcome, and the model was integrated into a rudimentary backup system. The resulting software is a command line interface that gives the user full access to the training and testing process; the backup system is also implemented in this command line interface for test purposes. The model was successful in identifying encrypted files.

CONTENTS

Acknowledgements
Abstract
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research Questions
  1.4 Delimitations

2 Theory
  2.1 Backup systems
  2.2 Ransomware
  2.3 Encryption
  2.4 Deep Learning
  2.5 Artificial Neural Networks
    2.5.1 Neurons
    2.5.2 The network
    2.5.3 Common network architectures
    2.5.4 Activation functions
    2.5.5 Optimizer & Loss function
    2.5.6 Epochs
  2.6 File signatures
  2.7 File Entropies

3 Method
  3.1 Development Tools
  3.2 Creating Training and Testing Data
  3.3 File Encryption
  3.4 Extracting Useful Data from Files
    3.4.1 Extracting the signature
    3.4.2 Extracting file type
    3.4.3 Calculating the file entropy
  3.5 Neural Network Structure
  3.6 Evaluating the Model
  3.7 Implementation in a Backup system
  3.8 Testing the model

4 Result
  4.1 Training model
    4.1.1 Architecture test
    4.1.2 Activation test
    4.1.3 Optimizer test
    4.1.4 Loss function test
    4.1.5 Optimizer and Loss function combination test
  4.2 Epoch test
  4.3 Encryption method test
  4.4 Implementation in a Backup system

5 Discussion
  5.1 Results
    5.1.1 Training model
    5.1.2 Epoch tests
    5.1.3 Encryption method test
  5.2 Method
  5.3 Work in a wider context
  5.4 Source criticism

6 Conclusion

Bibliography

LIST OF FIGURES

4.1 A plot of all tests for the architecture
4.2 A plot of the architecture test without the first outlier
4.3 A plot of the best tests
4.4 A plot of all tests in the activation function test
4.5 A plot of the activation function tests with the outliers omitted
4.6 A plot of the two best tests from the activation tests
4.7 A plot of all the different Sigmoid configurations
4.8 A plot of the best Sigmoid configurations
4.9 A plot of all tests in the optimizer test
4.10 A plot of the best performing tests in the optimizer test
4.11 A plot of all tests in the loss function test
4.12 A plot of the best performing loss functions
4.13 A plot of all tests for the optimizer and loss combination test
4.14 A plot of all tests for the optimizer and loss combination test
4.15 A plot of all tests for the optimizer and loss combination test
4.16 A plot of success rate from 10 epochs to 150 epochs
4.17 A plot of success rate from 160 epochs to 300 epochs
4.18 A plot of success rate from 310 epochs to 450 epochs
4.19 A plot of success rate from 460 epochs to 600 epochs
4.20 A plot of success rate from 10 epochs to 600 epochs

LIST OF TABLES

4.1 Test for network architecture
4.2 Test for activation methods
4.3 Test for the Sigmoid activation methods
4.4 Test for Optimizers
4.5 Test for loss functions
4.6 Test for Optimizer and loss function combinations
4.7 Testing the use of the network for untrained encryption

CHAPTER 1: INTRODUCTION

Ransomware is malicious software that encrypts a system's files and demands a ransom from the user. Upon payment of the ransom, the attacker decrypts all files, giving access back to the user. In the first three quarters of 2019, there were 151.9 million reported cases of ransomware attacks. In the third quarter, the average payout in the U.S. was estimated at 41,000 USD. Although the number of cases has been decreasing, the tactics of the attackers seem to have changed. According to a report published by SonicWall [17], attackers now go after fewer, high-value targets, whereas the previously preferred tactic appeared to be a more extensive array of low-value targets. These high-value targets are often local municipalities and hospitals. The loss of data for a hospital can have deadly consequences; the encrypted data can be patient information that may be crucial for patient survival.

The most common way for a system to be infected is a phishing scam: someone in the organization receives an email and accidentally downloads the malware masked as something else. This makes it difficult to protect against, as the user gives the malware permission to infect the system. A typical protective measure an organization can take is to back up all essential data, which makes it possible to restore the data after a system has been infected. The common problem is that the system will automatically run a backup routine, thereby overwriting clean data with the encrypted data. If the system does not run backups automatically, they are often not done frequently enough, and therefore the data in the backup is not up to date.

1.1 Motivation

This thesis proposes using machine learning as a tool to identify files that have been infected, and then either not save the data to the backup, alert the system administrator, or automatically restore data from the backup, overwriting the encrypted data. This project was developed at Combitech, an independent technical consulting company and part of the defense and security group Saab AB.

1.2 Aim

This thesis aims to investigate whether a neural network can identify files that have been encrypted as part of a ransomware attack. Given that the neural network can identify whether files are encrypted, it is also important to investigate how accurate it can be. The result should be a module that could be implemented in a backup system to automatically decide whether a system has been infected.

1.3 Research Questions

The following questions will be answered in this thesis:

• What is the optimal neural network structure for identifying an encrypted file?
• How does the number of epochs affect the training process for the model?
• Can the model successfully analyze a file encrypted with a different encryption method than the one it has been trained to analyze?
• How would a machine learning model analyzing encrypted files be implemented in a backup system?

1.4 Delimitations

The model is supposed to identify whether a file is encrypted or not. It does not take into account that a user may have encrypted a file deliberately for their own security; it assumes that no file passed through it is supposed to be encrypted.

CHAPTER 2: THEORY

This chapter discusses the relevant theory behind this thesis. It covers deep learning, neural networks, ransomware, and basic file structures. This background is necessary to grasp the broader concepts of this thesis.

2.1 Backup systems

A backup system stores essential data in a separate location to keep it safe from, for example, data corruption, software bugs, or infection by different types of malware. A backup system is most often handled by an external server, but it can also be attached to the user's system as a separate hard drive. In either case, its purpose is file recovery in the case of an emergency. The software running the backup system determines where data is stored, how many copies are needed, and how often data should be backed up; this differs from system to system. On the client side of this application, the system makes sure it knows where to restore files in the user's system [13].

2.2 Ransomware

Ransomware is a type of malware that infects a system by encrypting all files on it. It can also decrypt all the data in the system, thereby giving the victim access again. The encryption method is most commonly symmetric, so the data can be decrypted with a given encryption key. The attacker will hold the data ransom until the victim has paid a given amount, at which point the victim receives the key to decrypt all data and the system is restored to its prior state [3].

Ransomware will most commonly target specific file types to encrypt, such as images, documents, spreadsheets, and videos. Encrypting an entire system can take time, so targeting files that the user is most likely to miss speeds up this process.

Also, the attacker does not want to encrypt files that can cause permanent damage to the computer, since the system could then no longer be restored, and the ransom would not be paid, as the targets would know that there is no point [3]. As mentioned in chapter 1, there were 151.9 million reported cases of ransomware attacks in 2019 [17].

2.3 Encryption

Encryption can be thought of as locking data: it makes sure that only users with access can see the data. There are two common types of encryption methods, asymmetric and symmetric [7].

Asymmetric encryption uses two kinds of keys, one for encryption and one for decryption, often referred to as the public and private keys. The public key is used to encrypt data; it is called public because it can be given to anybody. The private key is used to decrypt the data that has been encrypted with the public key; this allows other users to encrypt data that only the private key holder can unlock. The most common asymmetric encryption method is RSA [8][9].

Symmetric encryption uses only one key, for both encryption and decryption. If some information needs to be stored or sent securely, two parties can agree on one key. The information is encrypted using a symmetric encryption algorithm that takes the key as input; after encryption, the information should give no indication of what its actual content is. When the original information needs to be accessed, a decryption algorithm is run with the same key, reversing the encryption and giving the user the original information. The only thing that keeps the information safe is the key; therefore, it must be kept secret. Nowadays, the most common symmetric encryption algorithm is AES (Advanced Encryption Standard); before that, the most common was DES. The algorithm itself is open and known to all, so there is no inherent security in the method itself; at the same time, the method is well tested and no practical way of breaking it is known [8][9].

Hash functions are not considered encryption but are worth mentioning. A hash function takes some information and scrambles it into non-coherent output. This output is not meant to be reversed back into the original information, but it can be used to verify the information. The most common hash function is SHA [8].

An asymmetric encryption method uses far more computing power than a symmetric one. There is therefore a vast difference in the time needed to encrypt a message, with asymmetric encryption taking longer [9].

2.4 Deep Learning

There are several approaches one can take to achieve artificial intelligence. One approach is to let an algorithm learn from the experience of similar tasks; this is called deep learning.

This approach is, in some ways, similar to how humans develop, although humans and computers thrive on different types of tasks. Computers are more apt to learn abstract and formal rule sets. For example, a computer can perform well beyond what humans can when it comes to games like chess, although it might struggle to identify what an image portrays. One of the problems that can occur for a computer approaching deep learning is extracting useful data from its raw form. This is where programmers help the algorithm by pre-defining what data is useful and organizing it for the algorithm to analyze [11][16].

2.5 Artificial Neural Networks

The artificial neural network is a type of machine learning model based on how neurons in the human brain work. The brain learns how to function from experience and stores information as patterns, something that computers are not as good at as brains. A computer is excellent at complex mathematical calculations, but it is difficult to program a computer to recognize complex patterns that require intuition. For recognizing patterns, artificial neural networks are a good solution. As the name suggests, it is a network of neurons that are coupled in layers. The network is then trained to predict an output from some input [1].

2.5.1 Neurons

The building blocks of a neural network are neurons. A neuron receives input, processes the given input, and generates an output [16]. The neuron takes one or more inputs, sums them together, and runs the sum through an activation function; this function is most often non-linear [1]. There are several activation functions, some of which will be discussed further in section 2.5.4. This process generates an output depending on the needs of the network; the output could be a binary response or a decimal number. The output from the neuron is an estimate of what the response should be [16].
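To make this computation concrete, the following is a minimal sketch of a single neuron in Python, the implementation language used later in this thesis. It is not taken from the thesis code; the weights, bias, and the choice of a sigmoid activation are illustrative assumptions.

```python
import math

def sigmoid(x):
    # Non-linear activation that squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, followed by the activation function.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Example: three inputs with hypothetical weights and a hypothetical bias.
print(neuron([0.5, 0.2, 0.9], [0.4, -0.6, 0.8], bias=0.1))
```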

2.5.2 The network

Using only a single neuron will not yield good pattern recognition for most problems; the real power of this model is its network. The network has three types of layers. First, the input layer: this layer contains a number of neurons, each given an input value that it does not process but sends forward to the next layer. There can only be one input layer, but many input neurons. These neurons are connected to the next layer by sending the output from all neurons to all the neurons of the next layer.

The next type of layer is called a hidden layer. A hidden layer contains a predefined number of neurons; theoretically it can contain any number of neurons, although it has to contain at least one. There can also be any number of hidden layers, limited only by computation power and time. Each hidden layer is connected to the next layer sequentially, with each neuron sending its output to all neurons in the next layer. At each step these neurons process the inputs and calculate an output, as described in section 2.5.1.

Lastly, the output layer. This layer works similarly to the hidden layers, except that its output has to be interpretable by the software as an answer to the question. Often this is the probability of a true or false statement, such as: is this file encrypted? The probability is then represented by a float value between zero and one [16].

For every connection, there is an associated weight. The weights are multiplied with the inputs to every neuron upon summation. These weights are what is tuned to predict an outcome accurately [16].

2.5.3 Common network architectures

Choosing what type of architecture a neural network needs to solve a specific problem takes a lot of trial and error. However, some common architecture types can be used as starting points. Only the architectures most relevant to this project are explained here, as there are many common architectures.

The perceptron is the simplest architecture, containing no hidden layers; it is only input neurons connected to one output neuron. This is the simplest and oldest model of a network, used because of the limited computation power that was available at the time [15].

The feed-forward network is similar to the perceptron, but with one hidden layer between the input and output layers. All neurons in the network are fully connected and pass information forward, hence the name. The advantage of this structure is that it can do more complex calculations, as there are more neurons to process the information [18].

The deep feed-forward network is very similar to the feed-forward network, but with several hidden layers. This is the basis for most common neural networks seen today, and it can solve more complicated tasks. The extra layers slow down the training process, and the architecture was therefore rarely used until the turn of the century. As with the feed-forward network, this structure allows even more complex calculations, as there are more steps to process the data [18].

An autoencoder is a variant of the deep feed-forward network that performs noise reduction of data. The dimensions of the hidden layers gradually decrease from larger to small, compressing the data from its initial input size; this is the encoder step of the process. The decoder then learns how to take the encoded data and create a representation of the original data. The advantage of this structure is that it minimizes the data, which makes the result easier to predict since there are fewer parameters to consider [2].

2.5.4 Activation functions

An activation function is used to process the sum of the inputs to a neuron and output a value. These functions are generally non-linear. The input to the activation function can be any real number, while the output generally has a range, often zero to one or negative one to one. Choosing the correct activation function can be difficult; in some cases it might be apparent what inputs should generate what output, but generally the developer does not know what inputs should generate what output.

If this were the case, a neural network would not be needed. Choosing appropriate activation functions is therefore often done by trial and error [16].

2.5.5 Optimizer & Loss function

A loss function is used to measure how well the model is performing: as the model is trained, it measures how far the model's prediction is from the truth. Using this type of function, the model can correct itself while learning, with the size of the correction depending on how big the error is; it is imperative that the loss function does not overcompensate. The optimizer's role is to use the loss function's result to update the model parameters so as to minimize the loss [4].

2.5.6 Epochs

The epoch number is the number of times the training algorithm runs through the whole data set. This allows the training algorithm to analyze the data multiple times, and the data set can therefore be allowed to be smaller [5].

2.6 File signatures

Every type of file has a certain structure, and each file generally exhibits a signature that allows software to identify what type of file it is and what to do with it. The signature usually consists of the first bits of the file; its size varies with the file type, between 4 bits and 12 bytes [21]. After the signature comes the body of the file, which contains the data specific to that file, and generally the file ends with an end signature. The file signature can be compared with the file-type extension at the end of the file name to determine whether the signature and the file ending match.
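For illustration, such a check can be sketched in Python as follows; this is not the thesis's code, and the SIGNATURES table is a small assumed subset of the full list in [21].

```python
# Known magic numbers for a few common formats; only a small illustrative
# subset of the signatures listed in [21].
SIGNATURES = {
    ".png": bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]),
    ".jpg": bytes([0xFF, 0xD8, 0xFF]),
    ".pdf": b"%PDF",
}

def signature_matches(path, extension):
    # Compare the first bytes of the file with the expected signature.
    expected = SIGNATURES.get(extension)
    if expected is None:
        return False  # unknown file type: no signature to compare against
    with open(path, "rb") as fh:
        return fh.read(len(expected)) == expected
```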

2.7 File Entropies

The entropy of a file is a measure of how much randomness appears in the data of the file. If the data is easily predictable, the entropy is low; if the data appears random and unpredictable, the entropy is higher [20]. For example, a text file will generally have a low entropy, as the characters used in the text are often reused and many of the available characters will most likely never occur. A compressed file will register a high entropy, as all available byte values are used. The file entropy can be calculated with Shannon's entropy equation [6]:

$H(X) = -\sum_{i=0}^{n-1} P_X(x_i) \log_{256}(P_X(x_i))$   (2.7.1)

In equation 2.7.1, $P_X(x_i)$ is the probability of the byte value $x_i$ occurring in the current file, $X$ is a random variable, and the $x_i$ are all values that $X$ can take. The summation runs over all $n = 256$ possible byte values.
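As a minimal sketch, equation 2.7.1 can be computed in Python as follows; the function name and structure are assumptions rather than the thesis's own implementation, but the algorithm follows the description given later in section 3.4.3.

```python
import math

def shannon_entropy(path):
    # Count how often each of the 256 possible byte values occurs.
    counts = [0] * 256
    total = 0
    with open(path, "rb") as fh:
        data = fh.read()
    for byte in data:
        counts[byte] += 1
        total += 1
    # Sum -p * log_256(p) over all byte values that actually occur;
    # values with zero occurrences are skipped, since log(0) is undefined.
    entropy = 0.0
    for count in counts:
        if count == 0:
            continue
        p = count / total
        entropy -= p * math.log(p, 256)
    return entropy
```

With the logarithm taken in base 256, the result lies between 0 (all bytes identical) and 1 (all 256 byte values equally likely), which makes it convenient as a single network input.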

CHAPTER 3: METHOD

To solve the underlying issue discussed in this thesis, a number of steps were performed. These are the steps necessary to recreate and test this experiment. The result is a model that can estimate the probability that the file in question is encrypted.

3.1 Development Tools

This project was written in Python 3.7 using the framework Tensorflow. Tensorflow is an open-source deep learning framework developed by Google, and it was chosen because it is a robust framework that has been well tested and widely used [19]. This project could be developed using another language and framework, but this is what was chosen.

3.2 Creating Training and Testing Data

While developing an artificial neural network, a sufficient amount of data must be available for training the model. For this project the training data was built up of a variety of files, covering 50 different file types. A bundle of a thousand files was divided up so that 10% were test files and 90% were training files. While dividing the data, 50% of the files were encrypted and the rest were left unencrypted, so the model could be trained on both classes. The files used in this data set were gathered from the internet and from the development computer. The files should be similar to what a user might store in their backup system; therefore, almost all files were documents, images, spreadsheets, videos, and a variety of project files for commonly used software.
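For illustration, the split described above could be sketched as follows; the function, its parameters, and the use of a seeded shuffle are assumptions, not details given in the thesis.

```python
import random

def split_dataset(paths, test_fraction=0.1, encrypted_fraction=0.5, seed=42):
    # Shuffle once so both splits contain a mix of file types.
    rng = random.Random(seed)
    paths = list(paths)
    rng.shuffle(paths)

    # 10% test files, 90% training files.
    cut = int(len(paths) * test_fraction)
    test, train = paths[:cut], paths[cut:]

    # Mark half of each split to be encrypted before training/testing.
    to_encrypt = {p for split in (train, test)
                  for p in split[: int(len(split) * encrypted_fraction)]}
    return train, test, to_encrypt
```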

3.3 File Encryption

For encrypting files, the symmetric encryption algorithm AES was used. It was chosen because it is one of the most commonly used encryption standards overall, as well as being commonly used in known ransomware [12]. Most often, the encryption key that is generated is itself encrypted using an asymmetric encryption algorithm such as RSA [12], although this step was skipped here, as ransomware is not what was being developed.

For developing the file encrypter, the library Fernet was used. Fernet is built on top of standard cryptographic primitives such as AES-128 [14]. First, a key needs to be generated for encrypting. The key is a URL-safe, base64-encoded 32-byte key that is generated from the operating-system-specific randomness source and is suitable for encryption [10].

For encrypting an entire file, the structure of the file needs to be kept for later decryption. Therefore, the file is opened and read line by line. Each line is encrypted using the generated key and appended to the encrypted output as a new line; the new line is there so that exactly the same data can be re-read upon decryption. The encrypted lines are then written back to the file, overwriting the old data.

For decrypting the file, the process is mostly reversed, except that the key is given to the function, as it has to match the generated key mentioned above. The file is read line by line, each line is decrypted and saved to an array, and the array is then looped through and written back to the original file, restoring it to its original unencrypted state.
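A minimal sketch of this line-by-line scheme, using the Fernet API from the cryptography package [14], is given below; the function names are illustrative assumptions, while generate_key, encrypt, and decrypt are the library's actual calls.

```python
from cryptography.fernet import Fernet

def encrypt_file(path, key):
    # Encrypt each line separately so the line structure survives,
    # as described above; one Fernet token per original line.
    f = Fernet(key)
    with open(path, "rb") as fh:
        lines = fh.readlines()
    with open(path, "wb") as fh:
        for line in lines:
            fh.write(f.encrypt(line) + b"\n")

def decrypt_file(path, key):
    # Reverse the process; the key must match the one used to encrypt.
    f = Fernet(key)
    with open(path, "rb") as fh:
        tokens = fh.read().splitlines()
    with open(path, "wb") as fh:
        for token in tokens:
            fh.write(f.decrypt(token))  # decrypted line keeps its newline

key = Fernet.generate_key()  # URL-safe base64-encoded 32-byte key
```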

3.4 Extracting Useful Data from Files

When running a neural network, the input layer has a fixed number of inputs. However, file sizes vary, and every file contains a different amount of data. To address the varying file size, a set of specific parameters was picked from each file as input to the network. There were ten inputs: eight of the inputs were the first eight bytes of the file, which contain the file signature and, in some cases, some extra bytes; the ninth input was the file extension; the last input was the file entropy. The file signature is always the same for a functional, unencrypted file, so it is valuable to check whether these bytes match the expected signature for the claimed file type; which bytes are expected can be derived from the file type, which is also passed as an input. The file entropy measures the randomness of the bytes, which tends to be very high for encrypted files.

3.4.1 Extracting the signature

As mentioned, the file signature ranges from four bits to twelve bytes. The assumption was made that eight bytes would be sufficient to recognize a pattern. The number of bytes was kept lower than twelve in order to keep the size of the network smaller while not flooding the network with excess random data points. The first eight bytes were extracted by opening the file and reading the first eight bytes, saving them as integers for network compatibility.

3.4.2 Extracting file type

The file type is extracted from the file name, using only the extension. When training the model, all file extensions were listed in a map with a unique id for each file type, and this id is what was passed to the model as input. This was done because it is easier for the network to interpret an integer than a string.

3.4.3 Calculating the file entropy

The file entropy was calculated using Shannon's entropy equation 2.7.1. An array of size 256, with all elements initialized to zero, was created to store the number of occurrences of each possible byte value; this is the byte distribution array. An integer variable was created to store the total number of bytes regardless of their value. The file is opened and looped through byte-wise, and the occurrences of each byte value are recorded in the distribution array. Then the byte distribution array is looped through: for each byte value, $P_X(x_i)$ from equation 2.7.1 is calculated by dividing its count by the total number of bytes, and its contribution is added to the entropy according to the equation. If the count for a particular byte value is zero, it is omitted from the entropy, as the logarithm of zero is not defined. Finally, the entropy is returned.
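Putting sections 3.4.1 through 3.4.3 together, the ten-element input vector could be assembled as sketched below; the extension map and function names are assumptions, and shannon_entropy refers to the sketch in section 2.7.

```python
import os

# Hypothetical extension-to-id map built from the training set (section 3.4.2).
EXTENSION_IDS = {".pdf": 0, ".jpg": 1, ".docx": 2, ".xlsx": 3, ".mp4": 4}

def extract_features(path):
    # First eight bytes: the file signature region (section 3.4.1),
    # stored as integers for network compatibility.
    with open(path, "rb") as fh:
        signature = list(fh.read(8))
    signature += [0] * (8 - len(signature))  # pad very small files

    # Ninth input: the file extension mapped to a unique integer id.
    ext = os.path.splitext(path)[1].lower()
    ext_id = EXTENSION_IDS.get(ext, len(EXTENSION_IDS))

    # Tenth input: the file entropy (section 3.4.3 / equation 2.7.1).
    return signature + [ext_id, shannon_entropy(path)]
```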

3.5 Neural Network Structure

Designing the network structure was a process of trial and error. There are four parameters that can be set for a network: the architecture, the activation functions, the optimizer, and the loss function.

For this task, there were three possible basic architectures: the feed-forward network, the deep feed-forward network, and a variant of the autoencoder. Starting with the basic feed-forward network, there was one hidden layer, and tests were run with different numbers of neurons. Next came the deep feed-forward network, in this case with three hidden layers with different numbers of neurons. Finally came the autoencoder variant, where the number of neurons decreases for each hidden layer. The models were evaluated to determine which was the most effective.

The activation functions, loss function, and optimizer were determined by testing all the standard methods. These were each evaluated to find the best performing given the already determined architecture. Tests were done not only to find which methods were reliable, but also which combinations would be the most effective. At this point, a structure that could determine whether a file was encrypted had been found, and it could do so with such certainty that no further tests for better performance seemed necessary. The results from determining the appropriate network structure can be seen in the results chapter, chapter 4.
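As an illustration of what such a model can look like in Tensorflow/Keras, the sketch below uses the layer sizes and activations that the tests in chapter 4 found to perform best (e.g. table 4.3, id 124); the exact code is an assumption, not the thesis's own implementation.

```python
import tensorflow as tf

def build_model():
    # Deep feed-forward network: ten inputs (section 3.4), three hidden
    # layers of 32 neurons, and one output neuron giving the probability
    # that the file is encrypted.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="hard_sigmoid", input_shape=(10,)),
        tf.keras.layers.Dense(32, activation="sigmoid"),
        tf.keras.layers.Dense(32, activation="sigmoid"),
        tf.keras.layers.Dense(1, activation="hard_sigmoid"),
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```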

3.6 Evaluating the Model

To ensure satisfactory performance, the model was evaluated after training. When testing the model, the program used testing files with known answers, so the model's prediction could be compared to the actual value. Specific data points were examined to define how well the model performed: the average difference between prediction and answer, the highest difference between prediction and answer, and how many predictions had less than 0.01 difference from the answer. These data points clarify whether the model works as hoped, as well as how accurate it is. Since the model is not binary but outputs a prediction percentage, the difference between its prediction and the actual value was useful. It is beneficial if the model is as precise as possible, so that its response can be treated as close to binary. If a prediction were lower than 99% certainty, it would be declared inconclusive; the reason for the high threshold is that false positives could be detrimental to data recovery.

3.7 Implementation in a Backup system

The model was implemented in a backup system to test how it would work in practice. The backup system was a rudimentary piece of software built for this purpose: it copied all files from one directory into another. The backup directory would preferably be on a separate hard drive, while the root directory might be a subset of directories from the user's computer. For each file, the useful data was extracted and used as input to the encryption detection model, which ran before the file was copied to the backup. If the model detected a file as encrypted, the user could choose between three types of responses: first, not backing up that file and alerting the user through a text response that the file was encrypted; second, alerting the user and stopping the backup process; third, recovering the file from the backup to the user's computer and keeping the backup process going, naturally without saving the encrypted file. (A sketch of this flow is given at the end of this chapter.)

3.8 Testing the model

Three types of tests were performed during this research: the network structure test explained in section 3.5, an encryption method test, and an epoch test. The network structure tests used three parameters to determine which model performed better: the highest difference between the prediction and the actual answer, the average difference between the prediction and the actual answer, and the percentage of correct answers. The differences are essential, as there is such a thing as more or less correct when it comes to a prediction, and it is preferable that the model be as accurate as possible.

This model was developed by analyzing only files encrypted with AES, as it is the most commonly used algorithm in ransomware. However, in the case where ransomware is implemented using another algorithm, perhaps an unknown one, it would be preferable if the model could detect these as well. To test this, the model was used to predict whether a file that had been encrypted with a different algorithm was encrypted. It was tested on 50 different files and the success rate was recorded. Furthermore, the model was retrained, with the same structure, on different encryption algorithms to see whether it could be trained to detect the other algorithms; the same test was run with these models. For a further conclusion, a test was run in which the model was trained with a mix of these algorithms to determine whether this would be preferable. The encryption algorithms that were added were DES, RSA, and SHA. These were chosen for their usability as well as for how they differ from each other. Although SHA is not an encryption method but a hash function, it can be used in a ransomware attack.

To determine how many epochs were needed to adequately train a model, the model was trained using incremental numbers of epochs while recording the highest difference between the prediction and the actual answer, the average difference between the prediction and the actual answer, and the percentage of correct answers. From these data points it can be seen at what point more epochs are no longer useful. The results of these tests are displayed in chapter 4.
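To illustrate the flow described in section 3.7, a minimal sketch of the backup loop is given below; the function, its directory handling, and the placement of the 0.99 threshold are assumptions (extract_features refers to the sketch in section 3.4, and model to a trained network).

```python
import shutil
import numpy as np
from pathlib import Path

def run_backup(model, source_dir, backup_dir, on_encrypted="skip"):
    # Copy each file to the backup unless the model flags it as encrypted.
    source = Path(source_dir)
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = Path(backup_dir) / src.relative_to(source)
        x = np.array([extract_features(str(src))], dtype=float)
        probability = float(model.predict(x)[0][0])

        if probability >= 0.99:  # treated as encrypted (section 3.6 threshold)
            print(f"Encrypted file detected: {src}")
            if on_encrypted == "stop":
                return            # response 2: stop the whole backup
            if on_encrypted == "restore" and dst.exists():
                shutil.copy2(dst, src)  # response 3: restore from the backup
            continue              # response 1: never back up the file

        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```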


CHAPTER 4: RESULT

This chapter presents results from the tests and the training explained in section 3.8.

4.1 Training model

This section displays the different tests that were performed to evaluate what type of neural network structure is the most accurate and efficient. The training data consisted of 900 files and the testing data consisted of 100 files.

4.1.1 Architecture test

Table 4.1 lists all tests concerning the structure of the network, with regard both to the number of hidden layers and to the number of neurons in each layer. All tests used the RMSprop optimizer and the binary crossentropy loss function. Plots of these tests can be seen in figures 4.1, 4.2 and 4.3.

ID | Layers | Highest difference | Average difference | % correct tests
90 | 1 hard sigmoid | 1.0 | 0.12381 | 87.62 %
91 | 128 hard sigmoid, 1 sigmoid | 0.0 | 0.0 | 100.0 %
92 | 64 hard sigmoid, 1 sigmoid | 0.0 | 0.0 | 100.0 %
93 | 64 hard sigmoid, 64 sigmoid, 64 sigmoid, 1 sigmoid | 0.0 | 0.0 | 100.0 %
94 | 32 hard sigmoid, 32 sigmoid, 32 sigmoid, 1 sigmoid | 0.0 | 0.0 | 100.0 %
95 | 16 hard sigmoid, 16 sigmoid, 16 sigmoid, 1 sigmoid | 0.0 | 0.0 | 100.0 %
96 | 8 hard sigmoid, 8 sigmoid, 8 sigmoid, 1 sigmoid | 0.00016 | 0.00012 | 100.0 %
97 | 16 hard sigmoid, 8 sigmoid, 4 sigmoid, 1 sigmoid | 0.00636 | 0.00603 | 100.0 %
98 | 32 hard sigmoid, 16 sigmoid, 8 sigmoid, 4 sigmoid, 1 sigmoid | 0.01421 | 0.00963 | 100.0 %
99 | 64 hard sigmoid, 32 sigmoid, 16 sigmoid, 8 sigmoid, 4 sigmoid, 1 sigmoid | 0.00575 | 0.00567 | 100.0 %

Table 4.1: Test for network architecture

[Figure 4.1: A plot of all tests for the architecture]

[Figure 4.2: A plot of the architecture test without the first outlier]

[Figure 4.3: A plot of the best tests]

4.1.2 Activation test

Table 4.2 shows all tests concerning the activation function, complemented by table 4.3 for testing sigmoid and hard sigmoid configurations. All tests used the RMSprop optimizer and the binary crossentropy loss function. This data is plotted in figures 4.4, 4.5 and 4.6; the data from the sigmoid tests is plotted in figures 4.7 and 4.8.

ID | Layers | Highest difference | Average difference | % correct tests
113 | 32 hard sigmoid, 32 hard sigmoid, 32 hard sigmoid, 1 hard sigmoid | 0.0 | 0.0 | 100.0 %
114 | 32 sigmoid, 32 sigmoid, 32 sigmoid, 1 sigmoid | 0.0 | 0.0 | 100.0 %
115 | 32 elu, 32 elu, 32 elu, 1 elu | 2.0 | 1.47221 | 50.48 %
116 | 32 exponential, 32 exponential, 32 exponential, 1 exponential | 0.0 | nan | 0.0 %
117 | 32 linear, 32 linear, 32 linear, 1 linear | 92.47555 | 41.04344 | 46.67 %
118 | 32 relu, 32 relu, 32 relu, 1 relu | 1.0 | 0.49524 | 50.48 %
119 | 32 selu, 32 selu, 32 selu, 1 selu | 39.91662 | 15.15567 | 73.33 %
120 | 32 softmax, 32 softmax, 32 softmax, 1 softmax | 1.0 | 0.50476 | 49.52 %
121 | 32 softplus, 32 softplus, 32 softplus, 1 softplus | 1.0 | 0.49524 | 50.48 %
122 | 32 softsign, 32 softsign, 32 softsign, 1 softsign | 0.98372 | 0.50331 | 99.05 %
123 | 32 tanh, 32 tanh, 32 tanh, 1 tanh | 1.9579 | 1.30281 | 50.48 %

Table 4.2: Test for activation methods

[Figure 4.4: A plot of all tests in the activation function test]

[Figure 4.5: A plot of the activation function tests with the outliers omitted]

[Figure 4.6: A plot of the two best tests from the activation tests]

ID | Layers | Highest difference | Average difference | % correct tests
124 | 32 hard sigmoid, 32 sigmoid, 32 sigmoid, 1 hard sigmoid | 0.0 | 0.0 | 100.0 %
125 | 32 sigmoid, 32 hard sigmoid, 32 hard sigmoid, 1 sigmoid | 0.0 | 0.0 | 100.0 %
126 | 32 hard sigmoid, 32 sigmoid, 32 sigmoid, 1 sigmoid | 0.0 | 0.0 | 100.0 %
127 | 32 sigmoid, 32 sigmoid, 32 sigmoid, 1 hard sigmoid | 0.0 | 0.0 | 100.0 %

Table 4.3: Test for the Sigmoid activation methods (all with RMSprop and binary crossentropy)

[Figure 4.7: A plot of all the different Sigmoid configurations]

[Figure 4.8: A plot of the best Sigmoid configurations]

4.1.3 Optimizer test

Table 4.4 shows all tests concerning the optimizer; this data is plotted in figures 4.9 and 4.10. All tests used the binary crossentropy loss function and the layer configuration 32 hard sigmoid, 32 sigmoid, 32 sigmoid, 1 hard sigmoid.

ID | Optimizer | Highest difference | Average difference | % correct tests
128 | Adadelta | 0.53555 | 0.4974 | 0.0 %
129 | Adagrad | 0.49909 | 0.24386 | 2.86 %
130 | Adam | 0.0 | 0.0 | 100.0 %
131 | Adamax | 0.0 | 0.0 | 100.0 %
132 | Ftrl | 0.50764 | 0.50007 | 0.0 %
133 | Nadam | 0.0 | 0.0 | 100.0 %
134 | SGD | 0.53321 | 0.46207 | 0.0 %
135 | RMSprop | 0.0 | 0.0 | 100.0 %

Table 4.4: Test for Optimizers

[Figure 4.9: A plot of all tests in the optimizer test]

[Figure 4.10: A plot of the best performing tests in the optimizer test]

4.1.4 Loss function test

Table 4.5 shows all tests concerning the loss function; this data is plotted in figure 4.11. All tests used the RMSprop optimizer and the layer configuration 32 hard sigmoid, 32 sigmoid, 32 sigmoid, 1 hard sigmoid.

ID | Loss function | Highest difference | Average difference | % correct tests
142 | binary crossentropy | 0.0 | 0.0 | 100.0 %
143 | categorical crossentropy | 0.56077 | 0.49972 | 0.0 %
144 | categorical hinge | 0.0 | 0.0 | 100.0 %
145 | cosine similarity | 0.67871 | 0.50163 | 0.0 %
146 | hinge | 0.0 | 0.0 | 100.0 %
147 | huber loss | 0.0 | 0.0 | 100.0 %
148 | kullback leibler divergence | 1.0 | 0.50476 | 49.52 %
149 | logcosh | 0.0 | 0.0 | 100.0 %
150 | mean absolute error | 0.00513 | 5e-05 | 100.0 %
151 | mean absolute percentage error | 1.0 | 0.49524 | 50.48 %
152 | mean squared error | 0.0 | 0.0 | 100.0 %
153 | mean squared logarithmic error | 0.0 | 0.0 | 100.0 %
154 | poisson | 0.0 | 0.0 | 100.0 %

Table 4.5: Test for loss functions

[Figure 4.11: A plot of all tests in the loss function test]

[Figure 4.12: A plot of the best performing loss functions]

4.1.5 Optimizer and Loss function combination test

Table 4.6 shows all tests concerning the loss function combined with the optimizer; this data is plotted in figures 4.13, 4.14 and 4.15. All tests used the layer configuration 32 hard sigmoid, 32 sigmoid, 32 sigmoid, 1 hard sigmoid.

ID | Optimizer | Loss function | Highest difference | Average difference | % correct tests
345 | Adam | binary crossentropy | 0.0 | 0.0 | 100.0 %
346 | Adam | categorical hinge | 0.0 | 0.0 | 100.0 %
347 | Adam | hinge | 0.0 | 0.0 | 100.0 %
348 | Adam | huber loss | 2e-05 | 0.0 | 100.0 %
349 | Adam | logcosh | 2e-05 | 1e-05 | 100.0 %
350 | Adam | mean squared error | 0.0 | 0.0 | 100.0 %
351 | Adam | mean squared logarithmic error | 0.0 | 0.0 | 100.0 %
352 | Adam | poisson | 0.0 | 0.0 | 100.0 %
353 | Adamax | binary crossentropy | 0.0 | 0.0 | 100.0 %
354 | Adamax | categorical hinge | 0.0 | 0.0 | 100.0 %
355 | Adamax | hinge | 0.16024 | 0.00157 | 99.02 %
356 | Adamax | huber loss | 0.08652 | 0.00092 | 99.02 %
357 | Adamax | logcosh | 0.00028 | 0.00016 | 100.0 %
358 | Adamax | mean squared error | 0.00011 | 2e-05 | 100.0 %
359 | Adamax | mean squared logarithmic error | 2e-05 | 1e-05 | 100.0 %
360 | Adamax | poisson | 0.0 | 0.0 | 100.0 %
361 | Nadam | binary crossentropy | 0.0 | 0.0 | 100.0 %
362 | Nadam | categorical hinge | 0.03629 | 0.00036 | 100.0 %
363 | Nadam | hinge | 0.0 | 0.0 | 100.0 %
364 | Nadam | huber loss | 0.02619 | 0.00026 | 100.0 %
365 | Nadam | logcosh | 0.0002 | 9e-05 | 100.0 %
366 | Nadam | mean squared error | 0.0 | 0.0 | 100.0 %
367 | Nadam | mean squared logarithmic error | 0.0 | 0.0 | 100.0 %
368 | Nadam | poisson | 0.0 | 0.0 | 100.0 %
369 | RMSprop | binary crossentropy | 0.0 | 0.0 | 100.0 %
370 | RMSprop | categorical hinge | 0.0 | 0.0 | 100.0 %
371 | RMSprop | hinge | 0.11559 | 0.00113 | 99.02 %
372 | RMSprop | huber loss | 0.0 | 0.0 | 100.0 %
373 | RMSprop | logcosh | 0.0 | 0.0 | 100.0 %
374 | RMSprop | mean squared error | 0.0 | 0.0 | 100.0 %
375 | RMSprop | mean squared logarithmic error | 0.0 | 0.0 | 100.0 %
376 | RMSprop | poisson | 0.0 | 0.0 | 100.0 %

Table 4.6: Test for Optimizer and loss function combinations

[Figure 4.13: A plot of all tests for the optimizer and loss combination test]

[Figure 4.14: A plot of all tests for the optimizer and loss combination test]

[Figure 4.15: A plot of all tests for the optimizer and loss combination test]

4.2 Epoch test

Figures 4.16, 4.17, 4.18 and 4.19 plot the average prediction difference. These plots have been split into four so that the information is more easily readable; a combined plot can be found in figure 4.20.

[Figure 4.16: A plot of success rate from 10 epochs to 150 epochs]

[Figure 4.17: A plot of success rate from 160 epochs to 300 epochs]

[Figure 4.18: A plot of success rate from 310 epochs to 450 epochs]

[Figure 4.19: A plot of success rate from 460 epochs to 600 epochs]

[Figure 4.20: A plot of success rate from 10 epochs to 600 epochs]

4.3 Encryption method test

The encryption method test was created to determine whether the model could be trained with data encrypted with one method and still evaluate data encrypted with other methods. In table 4.7, the first column shows which encryption method was used for testing, the second column shows which method the model was trained with, and the last column shows the percentage of correct predictions (rounded to two decimals). The training data consists of 900 files and the testing data consists of 38 files.

Tested with | Trained for | Success rate (%)
AES | AES | 100
DES | AES | 60.53
RSA | AES | 50
SHA | AES | 57.89
AES | DES | 100
DES | DES | 100
RSA | DES | 100
SHA | DES | 97.37
AES | RSA | 97.37
DES | RSA | 92.11
RSA | RSA | 97.37
SHA | RSA | 100
AES | SHA | 100
DES | SHA | 89.47
RSA | SHA | 81.58
SHA | SHA | 100
AES | ALL | 100
DES | ALL | 94.74
RSA | ALL | 92.11
SHA | ALL | 97.37

Table 4.7: Testing the use of the network for untrained encryption

4.4 Implementation in a Backup system

A backup system was successfully implemented using the model from section 4.1. All files that were encrypted on the user's computer were identified and handled according to the user's chosen response.

CHAPTER 5: DISCUSSION

In this chapter the results from the tests are discussed, as well as the work in a wider context and some source criticism.

5.1 Results

5.1.1 Training model

Training the model to identify a successful and efficient configuration was an iterative process: first the architecture, activation functions, loss function, and optimizer were tested separately, and lastly different loss functions were tried in combination with different optimizers. The issue with this approach is that the model will not function without any one of these parts; therefore, the functions had to be guessed first in order to test the architecture. In a perfect scenario, all combinations of architectures and functions would be tested, but this would be too time-consuming. Regardless, a successful and efficient model was found; it is not strictly necessary to find the single best solution.

Three typical structures were tested: the feed-forward network, the deep feed-forward network, and the variation of the autoencoder mentioned in section 2.5.3. As can be seen in figure 4.2, the autoencoder method performs the worst. Figure 4.3 shows that the best performing architecture is the one with id 94, a deep feed-forward network with three hidden layers and 32 neurons in each layer.

As can be seen in figure 4.4, some of the results were inconclusive; these were removed in figure 4.5. In figure 4.6, we can see that the sigmoid and hard sigmoid activation functions outperform the other activation functions. Therefore these functions were tested in combination with each other. As can be seen in figures 4.7 and 4.8, all combinations perform well, but the test with id 126 performs worse than the others.

In the optimizer test, as can be seen in figures 4.9 and 4.10, there were four optimizers that performed equally well: Adam, Adamax, Nadam, and RMSprop. As can be seen in figures 4.11 and 4.12, there are eight equally well-performing loss functions. Since the loss function and the optimizer are heavily related in creating the model, these were tested in combination, as seen in table 4.6. As figure 4.15 shows, several of these combinations can be used.

It can be concluded from these tests that a deep feed-forward network with three hidden layers of 32 neurons and a combination of sigmoid and hard sigmoid activation layers is the configuration with the best performance. There are also several optimizer and loss function combinations that give good performance. In the end, a model was achieved with zero difference between the prediction and the actual value.

5.1.2 Epoch tests

The number of epochs used while training a model increases the training time linearly. Keeping a low number of epochs is therefore beneficial, but the model's accuracy should not suffer from the lower number of epochs. This test was designed to find the point where more epochs are no longer beneficial for the model's accuracy. From figure 4.18, using more than 380 epochs is no longer beneficial; it only wastes processing time.

5.1.3 Encryption method test

From the encryption test, we can see that training a model for one encryption method will not allow it to reliably predict encryption by other methods, except in some particular instances. We can also see that when the model is trained for all four encryption methods, the amount of data is insufficient to train the model.

5.2 Method

The method that was chosen was methodical and practical for finding a well-performing model, although it could have been more successful if the computer used had had more computing power, making it possible to use more files to train the model. The parameters used to compare different models could be more extensive, to test the model on a deeper level, if a similar test were performed again.

5.3 Work in a wider context

Machine learning has many applications in security. This project could be implemented in a backup system, but multiple other modules could be developed with the similar task of identifying malware: for example, a machine learning algorithm helping users to identify phishing emails, or software using machine learning to identify whether a file has been compromised.

5.4 Source criticism

The source [21] is a Wikipedia page, which is generally frowned upon in research. In this case, though, a list of common file headers was needed that I could not find anywhere else. The page is a list compiled from numerous sources, each of which describes some but not all file types; the list is simply a collection of them. These sources have been checked and are reliable, and I therefore rely upon this Wikipedia page. A sketch of how such a list of file signatures can be used is given below.
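To illustrate why such a list is useful, the sketch below checks a file's leading bytes against a few well-known signatures of the kind collected in [21]; the helper name and the tiny signature table are my own choices for the example. A file encrypted by ransomware will typically no longer begin with any valid signature, which is what makes the header a useful indicator.

    # A minimal sketch of matching a file's leading bytes ("magic numbers")
    # against a few well-known signatures. A real module would use the full
    # list from [21]; this four-entry table is only for illustration.
    KNOWN_SIGNATURES = {
        b"\x89PNG\r\n\x1a\n": "png",
        b"%PDF": "pdf",
        b"PK\x03\x04": "zip",
        b"\xff\xd8\xff": "jpeg",
    }

    def identify_header(path: str) -> str:
        """Return the matching file type, or 'unknown' if no signature matches."""
        with open(path, "rb") as f:
            head = f.read(16)  # the longest signature above is 8 bytes
        for signature, filetype in KNOWN_SIGNATURES.items():
            if head.startswith(signature):
                return filetype
        return "unknown"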


CHAPTER 6 CONCLUSION

The optimal network structure can be difficult to determine, but a network structure was found that successfully predicts whether a file is encrypted. Overall the network structure was successful, although it is unclear whether it is the most optimal structure.

It could be seen that a higher number of epochs improved the model up until a certain point, where the benefit of more epochs subsided. After 380 epochs there was no apparent benefit in raising the number of epochs.

The model cannot reliably predict files encrypted with an encryption method other than the one it was trained on, although it could be trained for multiple encryption methods. The encryption method most often used in ransomware is AES, and the model is successful in predicting whether a file has been encrypted with this method.

The model was successfully implemented in a rudimentary backup system, and since it works as a standalone module it can easily be implemented in any backup system. How the module is used is up to the discretion of the developer integrating it; one possible integration is sketched below.
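As one possible integration, the sketch below screens each file before it is allowed to replace the previously backed-up copy, so that a ransomware-encrypted version never silently overwrites a clean one. It assumes a per-file query such as the illustrative is_likely_encrypted() from section 5.3; all names are hypothetical.

    # A sketch of one way a backup system could use the module: skip files the
    # model flags as encrypted instead of letting them overwrite clean backups.
    # is_likely_encrypted() is the illustrative query sketched in section 5.3.
    import shutil
    from pathlib import Path

    def backup_tree(src_dir: str, dst_dir: str) -> None:
        for src in Path(src_dir).rglob("*"):
            if not src.is_file():
                continue
            if is_likely_encrypted(str(src)):
                print(f"skipping suspected encrypted file: {src}")
                continue
            dst = Path(dst_dir) / src.relative_to(src_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)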


BIBLIOGRAPHY

[1] Dave Anderson and George McNeill. "Artificial neural networks technology". In: Kaman Sciences Corporation 258.6 (1992), pp. 1–83.
[2] Will Badr. Auto-Encoder: What Is It? And What Is It Used For? url: https://towardsdatascience.com/auto-encoder-what-is-it-and-what-is-it-used-for-part-1-3e5c6f017726 (visited on ).
[3] binance. Ransomware Explained. url: https://academy.binance.com/security/ransomware-explained (visited on ).
[4] Léon Bottou, Frank E. Curtis, and Jorge Nocedal. Optimization Methods for Large-Scale Machine Learning. 2016. arXiv: 1606.04838 [stat.ML].
[5] Jason Brownlee. Difference Between a Batch and an Epoch in a Neural Network. url: https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/ (visited on ).
[6] Ricky Chen. A Brief Introduction on Shannon's Information Theory. Jan. 2016. doi: 10.13140/RG.2.1.2912.3604.
[7] Cloudflare. What is Encryption? url: https://www.cloudflare.com/learning/ssl/what-is-encryption/ (visited on ).
[8] H. Delfs and H. Knebl. Introduction to Cryptography: Principles and Applications. 2007.
[9] Edney and William A. Arbaugh. Real 802.11 Security: Wi-Fi Protected Access and 802.11i. USA: Addison-Wesley Longman Publishing Co., Inc., 2003. isbn: 0321136209.
[10] Python Software Foundation. Miscellaneous operating system interfaces. url: https://docs.python.org/3/library/os.html#os.urandom (visited on ).
[11] D. Goularas and S. Kamis. "Evaluation of Deep Learning Techniques in Sentiment Analysis from Twitter Data". In: (2019), pp. 12–17.
[12] Yimi Hu. A Brief Summary of Encryption Method Used in Widespread Ransomware. url: https://resources.infosecinstitute.com/a-brief-summary-of-encryption-method-used-in-widespread-ransomware/#gref (visited on ).
[13] S. Nelson. Pro Data Backup and Recovery. Apresspod Series. Apress, 2011. isbn: 9781430226628. url: https://books.google.se/books?id=r4uEEsq3CJYC.
[14] pyca. Fernet (symmetric encryption). [Library]. 2020. url: https://cryptography.io/en/latest/fernet/.
[15] Frank Rosenblatt. "The perceptron: a probabilistic model for information storage and organization in the brain." In: Psychological Review 65.6 (1958), p. 386.
[16] D. Shiffman, S. Fry, and Z. Marsh. The Nature of Code. D. Shiffman, 2012. isbn: 9780985930806. url: https://books.google.se/books?id=hoK6lgEACAAJ.
[17] SonicWall. "2019 SonicWall Cyber Threat Report". In: (2019).
[18] Andrew Tch. The mostly complete chart of neural networks explained. url: https://towardsdatascience.com/the-mostly-complete-chart-of-neural-networks-explained-3fb6f2367464 (visited on ).
[19] Tensorflow. Why Tensorflow. url: https://www.tensorflow.org/about.
[20] Sriram Vajapeyam. Understanding Shannon's Entropy metric for Information. 2014. arXiv: 1405.2061 [cs.IT].
[21] Wikipedia. List of file signatures. url: https://en.wikipedia.org/wiki/List_of_file_signatures (visited on ).
