Adversarial Example Transferability to Quantized Models

LiU-ITN-TEK-A--21/049-SE

Adversarial Example Transferability to Quantized Models

Thesis work carried out in Medieteknik at Tekniska högskolan at Linköpings universitet

Ludvig Kratzert
Norrköping, 2021-06-21

Department of Science and Technology, Linköping University, SE-601 74 Norrköping, Sweden
Institutionen för teknik och naturvetenskap, Linköpings universitet, 601 74 Norrköping

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/.

© Ludvig Kratzert

Linköping University | Department of Science and Technology
Master's thesis, 30 ECTS | Datateknik
2021 | LiU-ITN-TEK-G--21/047-SE

Adversarial Example Transferability to Quantized Models
Överförbarhet av motstridiga exempel till kvantiserade modeller

Ludvig Kratzert

Supervisor: Ehsan Miandji
Examiner: Saghi Hajisharif

Linköpings universitet, SE-581 83 Linköping, +46 13 28 10 00, www.liu.se


Abstract

Deep learning has proven to be a major leap in machine learning, allowing completely new problems to be solved. While flexible and powerful, neural networks have the disadvantage of being large and demanding high performance from the devices on which they are run. In order to deploy neural networks on more, and simpler, devices, techniques such as quantization, sparsification and tensor decomposition have been developed. These techniques have shown promising results, but their effects on model robustness against attacks remain largely unexplored. In this thesis, Universal Adversarial Perturbations (UAP) and the Fast Gradient Sign Method (FGSM) are tested against VGG-19 as well as versions of it compressed using 8-bit quantization, TensorFlow's float16 quantization, and 8-bit and 4-bit single layer quantization as introduced in this thesis. The results show that UAP transfers well to all quantized models, while the transferability of FGSM is high to the float16 quantized model, lower to the 8-bit models, and high to the 4-bit SLQ model. We suggest that this disparity arises from the universal adversarial perturbation having been trained on multiple examples rather than just one, which has previously been shown to increase transferability [43]. The results also show that quantizing a single layer, the first layer in this case, can have a disproportionate impact on transferability.

Acknowledgments

I would like to thank Cybercom for supporting my thesis and helping guide me through these past few months. I thank my supervisor, Ehsan Miandji, and examiner, Saghi Hajisharif, for their advice and helpful ideas. I also thank Caleb Robinson for his helpful articles and GitHub repositories for running validation using ImageNet. Running my experiments on ImageNet would not have been possible if not for Academic Torrents, who host the validation set [7] [24].

Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Research questions

2 Theory
  2.1 Deep Learning
  2.2 Convolutional Neural Networks
  2.3 Compression of Neural Networks
  2.4 Quantization Approaches
  2.5 Attacks Against Neural Networks
  2.6 Practical Examples of Attacks
  2.7 ImageNet
  2.8 VGG-19

3 Method
  3.1 Previous Work
  3.2 Tools
  3.3 Image Preprocessing
  3.4 Validation
  3.5 VGG-19
  3.6 Universal Adversarial Perturbations
  3.7 Fast Gradient Sign Method
  3.8 Float16
  3.9 8-bit Quantization
  3.10 4-bit Quantization
  3.11 Single Layer Quantization

4 Results
  4.1 Universal Adversarial Perturbations
  4.2 Fast Gradient Sign Method

5 Discussion
  5.1 Results
  5.2 Method
  5.3 The Work in a Wider Context

6 Conclusion
  6.1 To what extent are some common adversarial attacks against a given neural network transferable to quantized versions of the same network?
  6.2 What effects do some common quantizations have on the accuracy of a neural network?
  6.3 Future work

Bibliography

List of Figures

2.1 Comparison of the architectures of LeNet [21] and AlexNet [6] to illustrate typical CNN architecture.
2.2 A 3-layer 1D CNN feed-forward diagram with kernel size of 3 and stride of 1 [41].
2.3 A universal adversarial perturbation generated for the neural network VGG-19. The perturbation has been scaled for visibility.
3.1 Example from ImageNet together with the top-5 classes suggested by AlexNet [20]. The correct class label is mushroom.
4.1 Example with UAP applied at various intensities.
4.2 Top-1 and top-5 accuracy of models with UAP applied with different levels of perturbation magnitude and different quantizations.
4.3 Example with FGSM applied at various intensities. Note that the intensities are not the same as for UAP; ε = 10 is shown for visibility.
4.4 Top-1 and top-5 accuracy of models with FGSM applied with different levels of perturbation magnitude and different quantizations.

List of Tables

3.1 Table outlining the different structures tested by Simonyan et al. in their original paper on VGG-19 [39]. VGG-19 is represented in column E.
3.2 Sample weights before and after 8-bit quantization.
4.1 Top-1 and top-5 accuracy of models with UAP applied with different levels of perturbation magnitude and different quantizations.
4.2 Top-1 and top-5 accuracy of models with FGSM applied with different levels of perturbation magnitude and different quantizations.

1 Introduction

1.1 Motivation

Neural networks have risen to prominence as one of the foremost techniques in machine learning and have been used to solve problems that older techniques such as support vector machines and naïve Bayes could not. Along with this greater power and flexibility, neural networks do have their shortcomings: they take longer to train, they require more resources to run and they take up more space both in memory and in storage compared to simpler techniques. Being large and resource-intensive, they are currently limited to deployment on devices with sufficient performance, meaning that small devices, such as those on the edge in IoT, cannot be used to run unaltered neural networks. To remedy this, the field of Tiny IoT has emerged and diverse techniques have been developed to decrease the requirements of neural networks, allowing them to be deployed even on simple, low-cost devices. Deng et al. [9] identify four main areas: compact models, tensor decomposition, data quantization and network sparsification. The approach relevant to this work is quantization, here referring to the quantization of the numbers that make up the neural network, including the weights, activations, gradients, and errors. Any or all of these mathematically continuous properties might be quantized, most commonly the weights.

It has been shown that neural networks in the field of image classification are susceptible to adversarial perturbations (e.g. Moosavi-Dezfooli et al. [29]): small changes to the image that, while being unnoticeable to the human eye, cause the model to misclassify the image. This technique is a main threat against neural networks from a security standpoint. How quantization affects a neural network's robustness against such attacks remains an open question. Previous research in the area has shown both that quantization might aid in protecting against adversarial attacks [10] and that quantization increases susceptibility to attacks [2].

It has been shown that attacks against neural networks transfer, that is, an attack tailored for one neural network may also be effective against similar neural networks; this property is known as transferability. How well attacks transfer from unquantized models to quantized models remains largely unexplored. Transferability between networks and quantized versions of the same network has been shown to be low, meaning that quantization may offer some protection [10].

On the other hand, Bernhard et al. [2] argue that this apparent protection may in fact be the result of gradient masking, and that quantized networks may indeed be vulnerable to attacks.

As machine learning continues to proliferate and finds use in ever more critical applications, a high-profile case to study is that of self-driving cars. Being able to trust the vehicle you are in, as well as surrounding vehicles, is paramount when on the road. The machine learning models used in self-driving cars are susceptible to adversarial attacks [19]. As quantization becomes standard and is applied to this case as well, the question of whether attacks transfer to quantized models becomes important.

1.2 Aim

This work aims to apply a few common quantization techniques to a neural network and then employ a set of transfer attacks, trained on the unquantized network, on the quantized versions. This will allow us to glean insights into how quantization affects the transferability of adversarial attacks. Quantization can be applied to any neural network, but the scope of this thesis is restricted to networks developed for image classification. Image classification is the task most commonly studied in previous works in the field of quantization, which allows this thesis to tie in better with the existing literature.

The sections following this introduction are organized as follows: a theory chapter, detailing necessary background knowledge; a method chapter, where previous work is discussed and the steps taken to achieve the results are outlined; a chapter detailing and explaining the results; a discussion chapter, where the results are explained in the context of previous work; and a conclusion, where the main points of the thesis are outlined.

1.3 Research questions

This thesis tries to answer the following research questions:

1. To what extent are some common adversarial attacks against a given neural network transferable to quantized versions of the same network?

2. What effects do some common quantizations have on the accuracy of a neural network?

2 Theory

This thesis is concerned with the compression of, and attacks against, deep convolutional neural networks. In this chapter, core concepts and background material about deep learning, convolutional neural networks, adversarial attacks and quantization of convolutional neural networks are covered.

2.1 Deep Learning

One of the most common objectives in machine learning is the categorization of objects into classes, known as classification; when the objects are images, this is image classification [14]. These objects might be anything from handwritten numbers to human faces or natural images. Traditional machine learning methods approach the problem of classification by defining a number of features that form a feature space in which boundaries can be drawn between the different object classes. These features are also known as a representation. In this traditional framework, the number 0 might be separated from the number 7 by the fact that the zero scores high in the property of roundness while the seven scores lower. In this sense, the traditional approach is explicit and transparent to the humans involved in the machine learning [12].

The traditional approach is limited by the features which can be defined by people. It has therefore been desirable to allow machines not only to learn how to apply man-made representations but also to learn their own representations. This is known as representation learning [12]. To allow this, deep learning has been developed. Deep learning uses artificial neural networks, inspired by the structure of organic brains, to learn features as well as how to apply them in order to classify objects, among other applications [5].

Neural networks are a collection of linear operators, followed by nonlinear functions, arranged in several layers. The consecutive layers encode progressively complex features. While the input consists of a set of pixels, the following layers might learn features such as edges, which form shapes, which in turn form faces or animals that can be recognized and successfully classified by the neural network. Each of these layers has weights and biases which encode the function that the network is approximating. In order to train the network, an error function is defined. By computing the gradient of this error function, the weights and biases can be altered so that the error is minimized. Over multiple iterations, the error converges to a (possibly local) minimum.
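To make this training procedure concrete, the following is a minimal sketch in TensorFlow (the library used later in this thesis) of one gradient-descent step. The small dense model, the SGD optimizer and the cross-entropy loss are illustrative assumptions, not the networks studied in this thesis.

    import tensorflow as tf

    # Illustrative model: any Keras classifier would do; VGG-19 is used later in the thesis.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

    def train_step(images, labels):
        # Compute the error function and its gradient with respect to the weights and
        # biases, then move the parameters a small step in the direction that lowers it.
        with tf.GradientTape() as tape:
            logits = model(images, training=True)
            loss = loss_fn(labels, logits)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

Repeating this step over many batches corresponds to the iterative error minimization described above.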

2.2 Convolutional Neural Networks

In deep learning, a convolutional neural network, or CNN, is a type of deep neural network most commonly applied to analyzing visual imagery. A CNN consists of an input layer, hidden layers, and an output layer. The layer structures of two example models, LeNet [21] and AlexNet [6], are shown in figure 2.1. Characteristically for convolutional neural networks, the linear operation between layers is a convolution. The input layer matches the dimension of the input, usually an image, and the output layer usually matches the number of classes. The hidden layers process the input and refine the information into increasingly abstract features, culminating in the assignment of a class in the output layer.

Figure 2.1: Comparison of the architectures of LeNet [21] and AlexNet [6] to illustrate typical CNN architecture.

The way the pixels are grouped and interpreted into features, which in turn are grouped and interpreted into more complex features, relies on a pyramid-like structure of neurons and kernels. Kernels pass over small sections of the previous layer. During learning, the kernel weights are altered, which results in features being learned [12]. The convolutional operation used between layers allows the network to learn kernels that pick up on specific features in the input and feed the location of these features forward to the next layer. These responses are commonly known as activations [32]. As a kernel passes over the preceding layer and activates, it produces an activation map. The size of these activation maps is a key consideration for keeping the size of a model manageable.

In standard artificial neural networks, neurons are fully connected to the preceding layers. For image processing, this results in models that are too large to handle. For this reason, neurons are only connected to a small region of the preceding layer. In addition to not fully connecting the neurons, hyperparameters such as stride can be used to further reduce the dimensionality of the activation maps. With a stride of one, the kernel only moves one step over at each iteration, meaning that there is a high degree of overlap between applications of the kernel. This produces a very large activation map with a high degree of redundancy.

This level of spatial accuracy can be foregone with a greater stride, producing less overlap between applications of the kernel, resulting in a smaller activation map and an increased receptive field [22]. Figure 2.2 shows how a kernel size of three and a stride of one feeds information forward from layer to layer. Each neuron receives input from three neurons in the layer below, and the activations overlap since the stride is smaller than the kernel size.

Figure 2.2: A 3-layer 1D CNN feed-forward diagram with kernel size of 3 and stride of 1 [41].

The hyperparameters of a convolutional layer are depth, stride, and zero-padding. The depth of the output produced by a convolutional layer depends on the degree to which the neurons in the layer are connected to the next layer. If there is a high degree of connectedness, the depth increases. Increasing the depth allows for better pattern recognition, but increases the complexity of the network. Stride determines to what degree kernels overlap as they pass over the input of the previous layer. Increasing the stride decreases the complexity of the network, but too large a stride may hamper the network. Zero-padding is the process of padding the border of the input with zeros, which can be used to control the dimensionality of the output volumes. If an output volume of a certain size is desired, and the input does not fit this size, then the input can be zero-padded to fit [32].

2.3 Compression of Neural Networks

While being very powerful and flexible, neural networks have considerable disadvantages in terms of storage size, execution time, and general demands on the platforms on which they are deployed. To remedy this, the field of Tiny IoT has emerged and diverse techniques have been developed to decrease the requirements of neural networks, allowing them to be deployed even on simple, low-cost devices. Deng et al. [9] identify four main areas: compact models, tensor decomposition, data quantization, and network sparsification.

Compact Models

Before neural networks are trained, they are constructed by connecting the layers that form them. Given the vast space of all possible designs, networks are commonly formed based on previous models which have shown promise. The field of compact models in deep learning aims to create models which by design use fewer and/or smaller layers, lowering both storage size and computational costs. Examples of compact models include SqueezeNet [18] and MobileNets [16].

SqueezeNet is an architecture that was developed to achieve accuracy competing with the foremost image classifiers while maintaining a small size. In order to achieve this, the authors relied on three main strategies. First, they replaced most 3x3 filters in the network with 1x1 filters. This results in a large decrease in parameters, since a 1x1 filter has a ninth as many parameters as a 3x3 filter. Second, the number of input channels to the 3x3 filters that were retained was decreased using squeeze layers. Squeeze layers are layers consisting entirely of 1x1 filters. Third, downsampling was delayed so that the convolution layers had large activation maps. The authors argue that, all else being equal, large activation maps improve accuracy.

MobileNets aimed to develop a general network structure with tuneable hyperparameters, allowing MobileNets to be used in many applications. The main innovation of MobileNets is that depthwise separable convolutions are used. Depthwise separable convolutions factorize a standard convolution into a depthwise convolution and a 1x1, pointwise, convolution. The depthwise convolution applies a single filter to each input channel, and the pointwise convolution then applies a 1x1 convolution to combine the outputs of the depthwise convolution. This splits the functions of a standard convolution into two layers, drastically reducing computation and model size.

Matrix Decomposition

Matrix operations abound in neural networks. It would therefore be desirable to compress these matrices to improve both the storage and the speed of the network. Mathematically, a matrix can be decomposed into a set of smaller matrices which, depending on the properties of the matrix being decomposed, might be smaller than the original matrix. Allowing for approximation here allows further compression, similar to lossy compression of images. Common approaches include low-rank matrix decomposition and tensor decomposition [9].

Low-rank matrix decomposition is a set of methods that aim to split a large matrix into a set of smaller matrices. Full-rank decomposition, as defined by Deng et al. [9], decomposes a matrix into two smaller matrices. Assuming that the original matrix is of rank r, the spatial complexity, i.e. the amount of data needed to store the matrix as a function of its dimensions m and n, can be reduced from O(mn) to O(r(m + n)) [9]. Singular value decomposition is another approach, where the matrix is split into two orthogonal matrices and a diagonal matrix with only singular values from the original matrix along the diagonal [44]. With singular value decomposition, the spatial complexity can be reduced to O(r(m + n + 1)). While low-rank matrix decomposition can reduce both spatial and computational complexity, its potential for compression is limited by its plane view of the neural network. From a tensor point of view, a 2-D matrix is a second-order tensor, and by grouping several layers of the network together, tensor compression techniques can be used to achieve higher compression ratios [40].

Data Quantization

The approach relevant to this work is quantization, which refers to the quantization of the numbers that make up the neural network, including the weights, activations, gradients and errors. Any or all of these mathematically continuous properties can be quantized, especially the weights, which are the most commonly quantized.
Data quantization can lead to a reduction in the number of bits used in the network, which results in a compression of the original model.

Network Sparsification

Rather than reducing the storage cost of each weight, e.g. using quantization, network sparsification aims to reduce the number of weights involved in the execution of the neural network. For example, weights very close to zero might all be set to zero and then disregarded in subsequent computations.

Neuron sparsification [9] is an alternative approach which seeks to eliminate entire neurons instead of individual weights. Both approaches have shown promising results, see e.g. Network Pruning [17] and Learning Both Weights and Connections for Efficient Neural Networks [15].

2.4 Quantization Approaches

In what follows, a more in-depth discussion of each of the quantization techniques studied in this thesis is presented.

The H.264 Advanced Video Compression Standard

Quantization is not a novel technique developed for neural networks but occurs in many fields of computer science, one of them being video compression. The H.264 video standard is one of the most common standards for video compression, and it uses quantization. The desirable property of quantization is that it approximates values in some range using a smaller range of values, allowing those values to be described with fewer bits, achieving compression. Quantization may be applied at several points in the encoding process, but is commonly used after the discrete cosine and wavelet transforms in order to remove insignificant values such as the near-zero coefficients resulting from these transforms. A generic scalar quantizer (which quantizes a single value) may be described as follows:

FQ = round(X / QP)
Y = FQ * QP

where X is the input value, Y is the output value and QP is the quantization step size. For example, the step size to quantize values to integers is one, while a step size of two quantizes values to multiples of two, and so on. This results in a uniform quantization, since the step size is the same everywhere. Specifically, for the forward transform of the H.264 standard, the coefficients resulting from the discrete cosine transform are multiplied by the constant 2^15 before quantization (to achieve a greater range of values) and then quantized with a non-specified step size. The values are then divided by 2^15 before being transmitted. When the signal is received, the values are re-scaled to reverse these steps and recreate an approximation of the original values [33].

Incremental Network Quantization

Incremental Network Quantization, or INQ, is a quantization method developed by Zhou et al. [46]. Its innovation is that it combines quantization with retraining in an iterative manner. First, each layer is partitioned into two groups. The weights in the first group are quantized. Then, the weights in the second group are retrained. This allows them to compensate for the error introduced by the quantization of the first group. After this first iteration, the weights in the second group are similarly partitioned, quantized and retrained. The process continues for a set number of iterations, after which all remaining weights are quantized.

TensorFlow Lite Float16 Quantization

TensorFlow is a platform for building and deploying machine learning models. Given the developments in the field of TinyML, TensorFlow has developed TensorFlow Lite, a set of tools for optimizing and compressing models. Relevant to this thesis, TensorFlow Lite includes tools for post-training quantization. One of these methods is a quantization of the model's weights to 16-bit floating point values, as compared to the standard 32-bit floating point values. This offers a halving in size. The precise method involved is not documented but presumably amounts to a simple rounding function [26].
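As a small illustration of the generic scalar quantizer introduced at the start of this section (and of the kind of rounding presumably underlying such compression), the forward quantization and reconstruction can be sketched in Python; this is an assumed example, not code from the H.264 reference implementation.

    import numpy as np

    def uniform_quantize(x, qp):
        """Uniform scalar quantization with step size qp: FQ = round(X / QP), Y = FQ * QP."""
        fq = np.round(x / qp)   # forward quantization
        return fq * qp          # reconstruction

    x = np.array([0.83, -0.31, 0.04, 1.97])
    print(uniform_quantize(x, 0.5))   # values snapped to multiples of 0.5: [ 1.  -0.5  0.   2. ]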

2.5 Attacks Against Neural Networks

Adversarial attacks against neural networks involve altering the input to the neural network in such a way that the network misclassifies the input. This might be done digitally, by altering the input image, or in the real world. For example, the neural networks identifying road signs in autonomous vehicles have been shown to be fooled by graffiti on signs. Developing networks that are robust against attacks is a key security concern as networks become more prevalent throughout society and start to see integration in crucial infrastructure.

Adversarial Examples

As defined by Wiyatno et al. [42], an adversarial example is an "input to a machine learning model that is intentionally designed to cause a model to make a mistake in its predictions despite resembling a valid input to a human". Furthermore, an adversarial perturbation is defined as "[the] difference between a non-adversarial example and its adversarial counterpart", i.e. an image vector which, added to a non-adversarial example, causes the example to be misclassified. The development of adversarial examples has been a key method of attacking neural networks.

Transfer Attacks

A property of adversarial examples, and attacks against neural networks in general, is that they transfer between different networks. This property is known as transferability. Depending on the model that the attack was developed on and the target model, transferability may vary greatly [8]. An attack that relies on transferability is known as a transfer attack. In a transfer attack, an attack is developed for a proxy model and then applied to the target model. This makes transfer attacks a subset of black-box attacks, where the attacker does not have access to the target model, as opposed to white-box attacks, where the attacker has full access to the target model.

Norms

As stated above, a requirement of a (good) adversarial example is that it should look like a valid input to a human. If the only requirement were to cause images to be misclassified, a good approach would be to simply replace the image with noise. To ensure that perturbations are not noticeable to human observers, the norm of the perturbation vector is usually used as a measure of how noticeable the perturbation will be, and a sufficiently small norm is sought after in attacks. The most familiar norm is the Euclidean norm, defined as

||x||_2 = (x_1^2 + x_2^2 + ... + x_n^2)^{1/2}.

The Euclidean norm is a special case of the general concept of p-norms. The p-norm of a vector x is defined as

||x||_p = (|x_1|^p + |x_2|^p + ... + |x_n|^p)^{1/p}.

Of particular interest is the ℓ_∞-norm, or maximum norm, which is the limit as p approaches infinity. The ℓ_∞-norm can be defined as follows:

||x||_∞ = max{|x_1|, |x_2|, ..., |x_n|}.

The maximum norm is commonly used in machine learning as an alternative to the Euclidean norm. This norm, when used to constrain adversarial perturbations, uniquely considers only the most perturbed pixels.

The ℓ_1, ℓ_2, and ℓ_∞ norms are all used as proxies for noticeability when developing adversarial perturbations. A differentiable distance measure that captures perceptual similarity is still an open research problem [35]. The same problem exists for image quality metrics. Numerical image quality metrics such as MSE and PSNR do not always correspond to the image quality as perceived by the human visual system (HVS). This is why image quality metrics based on the HVS, such as SSIM [31], have been very successful. More recently, deep learning has been used for developing blind quality metrics based on human visual perception, see e.g. LPIPS [45].

Universal Adversarial Perturbations

Universal adversarial perturbations are a class of adversarial perturbations which can achieve a high fool rate, i.e. cause a large percentage of images to be misclassified, with only a single perturbation which is effective for any example. The approach is novel in identifying a single, universal perturbation which works for any input image; previous works have predominantly developed adversarial perturbations on an image-by-image basis. The algorithm works by looping over a subset of the training data for the classifier and computing the smallest displacement vector which will cause each image to be misclassified. These displacement vectors are then iteratively added together [28]. Finding the smallest displacement vector which will push the input across the decision boundary amounts to an optimization problem:

Δv_i ← arg min_r ||r||_2   s.t.   k̂(x_i + v + r) ≠ k̂(x_i),

where k̂ is the classification function, x_i is the input image, v is the universal perturbation being constructed and r is the new addition for the particular input image in question. The technique used to minimize this function is the same as that in DeepFool [29]. Once this additional displacement vector is added, the entire perturbation is projected onto the ball of the desired ℓ_p-norm to ensure the proper magnitude.

Figure 2.3: A universal adversarial perturbation generated for the neural network VGG-19. The perturbation has been scaled for visibility.
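The projection step at the end of the algorithm above can be sketched as follows; this is an assumed implementation for the ℓ_2 and ℓ_∞ cases only, with xi denoting the desired perturbation magnitude, and is not taken from the original UAP code.

    import numpy as np

    def project_lp(v, xi, p):
        """Project the perturbation v onto the l_p ball of radius xi (p = 2 or np.inf)."""
        if p == 2:
            norm = np.linalg.norm(v.ravel())
            if norm > xi:
                v = v * (xi / norm)      # rescale so that ||v||_2 = xi
        elif p == np.inf:
            v = np.clip(v, -xi, xi)      # clamp every pixel to [-xi, xi]
        else:
            raise ValueError("only p = 2 and p = inf are handled in this sketch")
        return v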

Fast Gradient Sign Method

The fast gradient sign method, or FGSM, is a white-box attack that uses the gradient of the loss to compute an effective perturbation:

x̂ = x + ε · sign(∇_x J(θ, x, y)),

where x̂ is the adversarial image, x is the original input image, y is the original input label, ε is the scale multiplier for the perturbation, θ denotes the model parameters and J is the loss. That is, the gradient of the loss function is used to determine the best direction in which to perturb the input in order to increase the loss, hopefully causing the image to be classified as a different class [13].

2.6 Practical Examples of Attacks

The attacks discussed above have all shown promise and achieved high fool rates. They are, however, somewhat removed from any real-world case. In order to underscore the threat of adversarial attacks in the real world, we will discuss two examples: attacks against self-driving cars, and attacks against facial recognition.

In Robust Physical-World Attacks on Deep Learning Models, Eykholt et al. [11] design physical perturbations in an attempt to fool a neural network trained for image recognition. The perturbations were shown to be robust to variations such as different view angles, different distances and different resolutions. The algorithm developed by Eykholt et al., termed RP2 for Robust Physical Perturbations, was then used to create adversarial examples to fool road sign recognition systems. The attacks achieved high fooling ratios in practical experiments where a camera connected to the sign recognition system was driven past signs. The generated attacks fell into two categories. The first type consisted of generating an image of the road sign with added perturbations, printing this image as a poster and placing it over the real sign. The second consisted of adding stickers to the real sign, forming the perturbation. These stickers were also designed to be camouflaged as graffiti or regular stickers, commonly found on street signs in cities. The fact that these attacks are not only easy to perform (since they require only a printer) but can also be camouflaged means that they are a considerable threat to self-driving cars, which use road sign recognition systems. In light of these results, it should be mentioned that Lu et al. [25] previously performed a study on the threat of adversarial examples in object detection in autonomous vehicles and concluded that such attacks are not a cause for concern. However, according to Akhtar et al. [1], the attacks used in that study were not as sophisticated as the RP2 attacks.

Face attributes are a biometric used in security systems for facial recognition. These attributes may include aspects such as gender, age and features like lipstick [1]. Several attacks to fool neural networks with respect to facial attributes have been developed. For instance, Rozsa et al. [37] [36] generated adversarial examples to fool facial attribute recognition systems with their technique, Fast Flipping Attribute. They found that the tested networks' robustness against their adversarial attacks varied greatly between different facial attributes. Some attacks were found to be highly effective in changing the label of attributes. Another technique, developed by Mirjalili and Ross [27], modifies images of faces so that the identified gender is changed. The success of these attacks means that facial recognition based on face attributes can be fooled with adversarial perturbations.
2.7 ImageNet

In order to run validation and measure the effectiveness of adversarial attacks, a trial dataset has to be used. ImageNet has been the gold standard for image classification for years, and since image classification is one of the most widely researched applications for CNNs, this is a robust choice.

ImageNet is also the dataset used in many papers on the subject of network quantization, which will allow the results herein to be compared to previous results. Related previous research referenced in this thesis which uses ImageNet includes INQ [46], DeepFool [29] and UAP [28]. ImageNet consists of a training set of 1.2 million photographs, collected from Flickr and other search engines, hand-labeled with the presence or absence of 1000 object categories, and a similar validation set of 150 000 images [38]. The sheer scale of ImageNet makes it uniquely challenging compared to other common datasets such as CIFAR-10 and MNIST.

2.8 VGG-19

VGG-19 is a neural network developed in [39] for the ImageNet Challenge 2014. The network pushed network depth to 16-19 layers and utilizes very small convolution filters. These advancements led it to perform very well in the ImageNet Challenge, and it has since become a staple model available in TensorFlow and Matlab as well as being frequently used in papers on model compression and security. These factors all make it suitable for this thesis.
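For reference, the pre-trained model can be obtained directly from TensorFlow's model zoo; the snippet below is a minimal sketch and assumes a standard TensorFlow 2 installation.

    import tensorflow as tf

    # Load VGG-19 with ImageNet weights (16 convolutional + 3 fully connected weight layers).
    model = tf.keras.applications.VGG19(weights="imagenet")

    # Inputs are expected to be 224 x 224 RGB images preprocessed with the matching helper.
    preprocess = tf.keras.applications.vgg19.preprocess_input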

3 Method

In this chapter, the methods applied to achieve the results of this thesis are presented. Previous work is outlined and discussed to motivate methodological choices. An account of the tools and resources used to run the neural networks in this thesis, including Microsoft Azure and the ImageNet dataset, is presented. The pipeline for compressing VGG-19, as well as running validation of and attacks against it, is explained.

3.1 Previous Work

In this section, previous research in the field of quantization and adversarial attacks is outlined.

Impact of Low-bitwidth Quantization on the Adversarial Robustness for Embedded Neural Networks

Through experiments on the CIFAR-10 and SVHN datasets using both gradient-based and gradient-free attacks, Bernhard et al. [2] demonstrate that quantization in itself does not offer any substantial protection against adversarial attacks in a white-box setting. They observe that activation quantization can mask the gradient, hampering some gradient-based attacks. Transferability between quantized and unquantized models, as well as between quantized models of different bitwidths, is shown to be poor. They argue that quantization offers little protection in a white-box setting, but that the poor transferability may offer an advantage in a black-box setting where transfer attacks are necessary.

Relative Robustness of Quantized Neural Networks Against Adversarial Attacks

Duncan et al. [10] studied how quantization affects robustness against adversarial examples and specifically how transferability is affected by quantization. They found, through experiments on the MNIST dataset of handwritten numbers, that quantized neural networks are only 0.03% less accurate than the corresponding full-precision networks. They further show that quantization can reduce the transfer of adversarial examples to 52.05%. They argue that quantization is not only an effective method of compressing and accelerating neural networks but can also offer the benefit of increased robustness.

Defensive Quantization: When Efficiency Meets Robustness

Using the CIFAR-10 dataset and the Fast Gradient Sign Method of generating adversarial perturbations, Lin et al. [23] show that quantized models are significantly more vulnerable to adversarial attacks than full-precision models at high enough perturbation magnitudes. They argue that this is because low-bit representations, while helping denoise small perturbations, fail once the perturbations are large enough to push the values into different quantization buckets. At that point, the error is amplified at each layer, and the error worsens the deeper the network is.

Discussion of previous research

There is disagreement between previous works regarding the effects of quantization on resilience against adversarial examples. Furthermore, the different authors use a variety of techniques and datasets, making any direct comparison of their results difficult. However, a few points of agreement stand out:

1. Quantization can denoise small perturbations, helping defend against adversarial examples.

2. There is some decrease in transferability from full-precision models to quantized versions.

In this work, we will investigate these points by performing two adversarial attacks against networks with different levels of quantization, and varying the perturbation intensity.

3.2 Tools

TensorFlow

TensorFlow is a Python library that provides features to train and deploy machine learning models. Its model zoo offers pre-trained models, and several papers used in this report provided implementations in TensorFlow. Because of this, it was the natural library of choice for running our tests.

Microsoft Azure

In order to deploy the models and implement the different compression techniques, Python notebooks were used. Since machine learning is a demanding task, moving the execution onto an Azure server provided higher performance than running it on a personal computer.

ImageNet

The ImageNet dataset consists of a publicly available training set and a validation set which was kept secret for the integrity of the competition. However, old validation sets are available, and since the 2014 version is still in use, and is the one referenced by many of the reports mentioned here, using the validation set from ImageNet 2014 was a good option. This dataset is available at academictorrents.com [7] [24].

3.3 Image Preprocessing

The VGG-19 model is trained to run on images with a resolution of 224 × 224. The images in the ImageNet dataset are of diverse resolutions and aspect ratios, so they need to be resized before they can be used for validation. The images were resized and pre-processed as follows:

1. Resize the smallest side of the image to 256 pixels using bicubic interpolation over a 4x4 pixel neighborhood. The larger side should be resized so that the original aspect ratio of the image is preserved.

2. Crop the central 224 × 224 neighborhood of the image.

In order to circumvent the cost of reading the images into numpy arrays before each validation, they were converted into numpy arrays in the processing step. These numpy arrays were saved and uploaded to Azure, where they could easily be loaded for validation. To make storage more manageable, the image arrays were stored in groups of 100 images per file.

3.4 Validation

The metric used for validation in most related works is top-k accuracy, specifically top-1 and top-5 accuracy. In order to allow comparison with previous works, this metric was used. The top-k accuracy of a single image is binary, either 1 or 0, and is determined by whether the correct class label is found among the top k labels suggested by the neural network. For example, figure 3.1 shows an image from the ImageNet dataset and the top-5 class labels suggested by AlexNet.

Figure 3.1: Example from ImageNet together with the top-5 classes suggested by AlexNet [20]. The correct class label is mushroom. (Top-5 suggestions: 1. agaric, 2. mushroom, 3. jelly fungus, 4. gill fungus, 5. dead-man's-fingers.)

The correct image label is mushroom, which is AlexNet's second-highest guess. This means that the top-1 accuracy is 0, but the top-5 accuracy is 1, since the correct label is among the top-5 suggestions. When determining the overall top-k accuracy of a network, this process is applied to each example image that has been classified, and the mean is calculated.

Measuring top-k accuracy, or any accuracy, requires a ground truth that states the correct class label for each image. Just like the ImageNet validation set, this ground truth had to be sourced from a helpful GitHub repository [34] rather than from official sources. The ground truth was encoded as a numpy array in which the correct class index for each image was saved. When running classification on a model trained on ImageNet, an array is returned where a weight is assigned to each class. The class with the highest weight is the one that the model has identified as the most likely correct label. TensorFlow's argsort function can be used to find the indices of the highest values in an array. If the correct class label is among the k top predictions, then the image has been correctly classified. The percentage of correctly classified images is returned as the top-k accuracy.

3.5 VGG-19

VGG-19 trained on ImageNet was available for download from the TensorFlow model zoo. Tests were performed to validate that the model achieved the same results as stated in the original report. The structure of VGG-19 is shown in table 3.1, column E.
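Before moving on to the attacks, the preprocessing and validation steps described in sections 3.3 and 3.4 can be summarized in the following sketch; the exact code used in the thesis is not published, so this is an assumed implementation with hypothetical function names.

    import numpy as np
    import tensorflow as tf

    def preprocess(image):
        """Resize the shortest side to 256 pixels (bicubic) and crop the central 224 x 224 patch."""
        h = tf.cast(tf.shape(image)[0], tf.float32)
        w = tf.cast(tf.shape(image)[1], tf.float32)
        scale = 256.0 / tf.minimum(h, w)
        new_h = tf.cast(tf.round(h * scale), tf.int32)
        new_w = tf.cast(tf.round(w * scale), tf.int32)
        image = tf.image.resize(image, (new_h, new_w), method="bicubic")
        top = (new_h - 224) // 2
        left = (new_w - 224) // 2
        return tf.image.crop_to_bounding_box(image, top, left, 224, 224)

    def top_k_accuracy(predictions, ground_truth, k=5):
        """Fraction of images whose correct class index is among the k highest-scoring classes."""
        # Sort class indices by descending score and keep the k best for each image.
        top_k = np.argsort(predictions, axis=1)[:, ::-1][:, :k]
        hits = [label in top_k[i] for i, label in enumerate(ground_truth)]
        return float(np.mean(hits))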

Table 3.1: Table outlining the different structures (configurations A-E, with 11 to 19 weight layers) tested by Simonyan et al. in their original paper on VGG-19 [39]. VGG-19 is represented in column E: from a 224 × 224 RGB input, two conv-3 64 layers, maxpool, two conv-3 128 layers, maxpool, four conv-3 256 layers, maxpool, four conv-3 512 layers, maxpool, four conv-3 512 layers, maxpool, followed by FC-4096, FC-4096, FC-1000 and a soft-max output.

3.6 Universal Adversarial Perturbations

Since VGG-19 has been used in this work as a common reference model, it was possible to use the pre-computed perturbation from the original paper on universal adversarial perturbations. This perturbation image was then simply added to the input in the validation step in order to apply the attack to each of the quantized models. The perturbation was multiplied by a variable ε before being added to the image in order to study its impact at different magnitudes.

3.7 Fast Gradient Sign Method

The implementation of the fast gradient sign method is as follows:

1. Have the model classify the input image to determine the class.

2. Calculate the gradient of the loss function with respect to the input image, using the predicted class as the label.

3. Scale this gradient by the scale factor ε.

4. Apply the gradient to the image, forming an adversarial example.
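A minimal sketch of these four steps in TensorFlow is given below; it assumes a batched, preprocessed image tensor and a Keras model, and is an illustration rather than the exact code used for the experiments.

    import tensorflow as tf

    loss_object = tf.keras.losses.CategoricalCrossentropy()

    def fgsm(model, image, epsilon):
        image = tf.convert_to_tensor(image)
        # Step 1: classify the input image to determine the (predicted) class.
        prediction = model(image)
        label = tf.one_hot(tf.argmax(prediction, axis=1), prediction.shape[-1])
        # Step 2: gradient of the loss with respect to the input image for that class.
        with tf.GradientTape() as tape:
            tape.watch(image)
            loss = loss_object(label, model(image))
        gradient = tape.gradient(loss, image)
        # Steps 3 and 4: scale the sign of the gradient by epsilon and add it to the image.
        return image + epsilon * tf.sign(gradient)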

Table 3.2: Sample weights before and after 8-bit quantization.
Original value:  0.34119523  0.09563112  0.0177449   -0.11436455  -0.05099866  -0.00299793
Quantized value: 0.25        0.125       0.015625    -0.125       -0.0625      -0.00390625

3.8 Float16

TensorFlow has a library dedicated to model compression called TensorFlow Lite. By converting VGG-19 to the TensorFlow Lite format, its built-in float16 compression could be applied. The conversion to TensorFlow Lite was done as in the following code snippet:

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]
    tflite_model = converter.convert()

where model is the Keras model to be converted. The float16 precision is specified on the third line.

3.9 8-bit Quantization

For the 8-bit quantization, the basic quantization scheme from INQ was used. This quantization did not, however, involve the incremental quantization aspect. Under this scheme, all weights are quantized to be a power of two, belonging to the set P_l:

P_l = {±2^{n_1}, ..., ±2^{n_2}, 0},

where n_1 and n_2 are integers and n_2 ≤ n_1. These integers define an upper bound for the weights as well as a lower bound, meaning that all weights smaller than 2^{n_2} are pruned away. n_1 and n_2 are calculated as follows:

n_1 = floor(log_2(4s/3)),   s = max(abs(W)),

where W is the set of all the weights in the network. Once n_1 is calculated, n_2 is found using the formula

n_2 = n_1 + 1 − 2^{b−1}/2,

where b is the bit width of the quantized model, which is 8 in this case. We then iterate over the set of possible weights P_l and find the appropriate quantized weight:

ŵ = β sgn(w)   if   (α + β)/2 ≤ abs(w) < 3β/2,

where w is the original weight, ŵ is the quantized weight and α and β are two adjacent values in P_l. If the end of the list is reached without the condition being satisfied, the weight is set to zero [46].

In order to iterate over all weights in the network, the flat iterator was used. This creates an iterator which behaves as if a multi-dimensional array has been flattened into a single dimension, making it easy to loop over all values. This is known as vectorization. By looping over the weights of each layer and applying the quantization function, the network was quantized.
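A sketch of this weight quantization, following the formulas above, is given below; the loop over the flat iterator mirrors the description in this section, and the code is an assumed reconstruction rather than the exact script used.

    import numpy as np

    def quantize_weights(weights, bits=8):
        """Quantize a weight array to the power-of-two set P_l described above."""
        s = np.max(np.abs(weights))
        n1 = int(np.floor(np.log2(4.0 * s / 3.0)))
        n2 = int(n1 + 1 - (2 ** (bits - 1)) / 2)
        levels = [0.0] + [2.0 ** n for n in range(n2, n1 + 1)]   # candidate magnitudes in P_l
        quantized = np.zeros_like(weights)
        for i, w in enumerate(weights.flat):                     # flat iterator over all weights
            a = abs(w)
            for alpha, beta in zip(levels, levels[1:]):          # adjacent values alpha < beta
                if (alpha + beta) / 2 <= a < 3 * beta / 2:
                    quantized.flat[i] = np.sign(w) * beta
                    break
            # if no interval matches, the weight stays zero (it is pruned)
        return quantized

Note that this plain Python loop is only meant to show the logic; for the full VGG-19 it would be slow, and a vectorized implementation would be preferable.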

3.10 4-bit Quantization

As shown by Lin et al. [23], quantization to bitwidths lower than 8 can result in a loss of robustness. It would therefore be interesting to study a lower bitwidth such as 4 bits. In order to do so, the quantization described above was used to quantize VGG-19 to four bits. This resulted in a tremendous loss of accuracy. Similar results were found by Nayak et al. [30] and Choukroun et al. [4] when studying low-bitwidth quantizations. They both noticed that accuracy collapses around 4-5 bits. Choukroun et al. conclude that 4 bits, meaning 16 possible values, is not enough to cover the range of values in the entire network. Instead, they perform 4-bit quantization kernel by kernel, with a different range of values for each kernel, allowing the quantization to adapt to the different distributions of weights in different parts of the network. This resulted in a 4-bit accuracy comparable to higher bitwidths. These findings show that low-bitwidth quantization can be used without significant accuracy loss when it is applied to a small part of the network.

A common approach to defending against adversarial attacks is to preprocess the input to denoise perturbations [3], and it has been argued that quantization defends against perturbations in a similar way, for example by Duncan et al. [10]. Combining these two ideas, we could apply low-bitwidth quantization to a small part of the network and possibly achieve a protective denoising of perturbations.

3.11 Single Layer Quantization

In order to study the effects of quantization as a protective measure, quantizing a single layer might achieve results similar to quantizing the entire network. We put forward this hypothesis since the denoising resulting from quantization is similar to a preprocessing measure, and since Lin et al. [23] argue that the effects of quantization are especially pronounced in the early layers and then cascade throughout the network. To test this, the quantization described above was applied to the first layer of VGG-19 with bit widths of 4 and 8.
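A sketch of the single layer quantization, reusing the quantize_weights function sketched in Section 3.9 and a Keras VGG-19 instance called model (both assumptions carried over from the earlier sketches), could look as follows:

    import tensorflow as tf

    # Quantize only the first convolutional layer, leaving the rest of the network untouched.
    first_conv = next(l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D))
    kernel, bias = first_conv.get_weights()
    first_conv.set_weights([quantize_weights(kernel, bits=4), bias])  # bits=8 for SLQ8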

4 Results

Herein, the results of the methods described in the previous chapter are presented. The results are grouped primarily by the attack that was used, and for each attack the accuracy of each of the models is presented. The results show that universal adversarial perturbations transfer well to all models, while perturbations generated by the fast gradient sign method have a diminished transferability to the 8-bit models. The fast gradient sign method is also shown to have lower apparent visibility in relation to its fool rate.

4.1 Universal Adversarial Perturbations

The application of universal adversarial perturbations resulted in a noticeable loss of accuracy, with similar results for the unquantized and quantized models. The validation was run on a random subset of 2000 images. The batch size for the float16 model was set to 1, and for all other models it was set to 10.

In figure 4.1, the universal adversarial perturbation is shown applied to an example image. Epsilon (ε) is the factor multiplied by the UAP to control its magnitude. The perturbation is clearly visible even at ε = 1, and these high intensities were chosen to increase the fooling rate.

Figure 4.1: Example with UAP applied at various intensities ((a) ε = 0, (b) ε = 1, (c) ε = 2, (d) ε = 3).

Table 4.1 shows the top-1 and top-5 accuracy of the models when the universal adversarial perturbation for VGG-19 is applied.

Table 4.1 shows the top-1 and top-5 accuracy of the models when the universal adversarial perturbation for VGG-19 is applied.

Top-1 accuracy under UAP
e            0       1       2       3
Reference    0.713   0.325   0.055   0.009
Float16      0.706   0.321   0.055   0.009
8-bit        0.657   0.325   0.051   0.005
SLQ8         0.686   0.336   0.056   0.009
SLQ4         0.689   0.334   0.056   0.010

Top-5 accuracy under UAP
e            0       1       2       3
Reference    0.902   0.493   0.120   0.018
Float16      0.893   0.490   0.119   0.018
8-bit        0.871   0.490   0.115   0.018
SLQ8         0.886   0.497   0.124   0.019
SLQ4         0.889   0.497   0.122   0.019

Table 4.1: Top-1 and top-5 accuracy of models with UAP applied with different levels of perturbation magnitude and different quantizations.

Figure 4.2: Top-1 and top-5 accuracy of models with UAP applied with different levels of perturbation magnitude and different quantizations.

The conversion to Float16 resulted in a small loss of accuracy, and the conversion to 8 bits resulted in a more pronounced loss. The single layer quantized models perform similarly to the reference model. Figure 4.2 shows the top-1 and top-5 accuracy plotted against epsilon, the multiplier of the perturbation. The fooling rate for each of the models is very similar at each epsilon. These results show that the universal adversarial perturbation developed for VGG-19 is highly transferable to its quantized versions. The overall trend in accuracy is that the weaker the compression, i.e. the wider the bitwidth and the fewer layers affected, the smaller the loss in accuracy. This is to be expected: the less the model is changed, the less its accuracy should be affected.

4.2 Fast Gradient Sign Method

The fast gradient sign method also yielded a noticeable accuracy loss. The batch sizes were, once again, 10 for all models except the float16 model, which required a batch size of 1. In Figure 4.3, an example is shown with an FGSM perturbation applied at various intensities. For the chosen intensities, the FGSM perturbation is less noticeable than the UAP perturbation.

Figure 4.3: Example with FGSM applied at various intensities ((a) e = 1, (b) e = 2, (c) e = 3, (d) e = 10). Note that the intensities are not the same as for UAP; e = 10 is shown for visibility.

Table 4.2 shows the top-1 and top-5 accuracy of the models when FGSM is applied. What stands out here is that the 8-bit quantized models, both the fully quantized model and the single layer quantized model, seem to be somewhat resilient against the FGSM transfer attack. This suggests that the low-bitwidth quantization helps denoise the perturbation, weakening its adversarial effect. The 4-bit SLQ, on the other hand, shows a heightened vulnerability at e = 1, but then performs similarly to the 8-bit SLQ at the stronger perturbation intensities.

Top-1 accuracy under FGSM
e            0       1       2       3
Reference    0.713   0.268   0.154   0.117
Float16      0.706   0.268   0.154   0.117
8-bit        0.657   0.364   0.213   0.145
SLQ8         0.686   0.343   0.185   0.132
SLQ4         0.689   0.256   0.184   0.131

Top-5 accuracy under FGSM
e            0       1       2       3
Reference    0.902   0.635   0.463   0.371
Float16      0.893   0.635   0.463   0.371
8-bit        0.871   0.708   0.529   0.427
SLQ8         0.886   0.696   0.505   0.405
SLQ4         0.889   0.624   0.504   0.406

Table 4.2: Top-1 and top-5 accuracy of models with FGSM applied with different levels of perturbation magnitude and different quantizations.

Figure 4.4: Top-1 and top-5 accuracy of models with FGSM applied with different levels of perturbation magnitude and different quantizations.

Figure 4.4 shows the top-1 and top-5 accuracy as a plot. The resilience of the 8-bit model is once again noticeable. The initial loss in accuracy at e = 1 is similar to that of UAP, but the subsequent increases in intensity do not yield a similar loss. However, the magnitudes of the FGSM perturbations are far smaller, which would explain this disparity.

5 Discussion

In this chapter, the results from the preceding chapter are discussed. The methods and approaches used in this thesis are also examined. Key points brought up include the difference in transferability between universal adversarial perturbations and the fast gradient sign method, what properties of attacks lead to transferability, and possible flaws in the method. The implications of the work in a wider context are also discussed.

5.1 Results

The results demonstrate a clear difference between the transferability of universal adversarial perturbations and that of the fast gradient sign method. UAP affects all models similarly, while FGSM leaves the 8-bit models comparatively unaffected. In the original paper on UAP [28], the authors observe that their perturbations are not only universal in the sense that a UAP developed for one model works for any example used with that specific model, but that the perturbations also generalize well across networks. The results outlined in this thesis suggest that UAPs also transfer well to quantized networks. The authors of FGSM likewise mention that their adversarial examples generalize well, but our results indicate that they do not generalize as well as UAPs. A possible explanation is provided by Xie et al. [43], who researched methods to increase the transferability of FGSM-based attacks and found that the transferability of examples can be increased by training them on a wider variety of inputs. Training adversarial examples on more than one input resulted in more general examples and higher transferability. They also noted that vanilla FGSM, which was used in this thesis, suffers from overfitting. This would help explain the disparity we see between UAP and FGSM. The adversarial perturbation for UAP is the sum of many different adversarial perturbations and should thus be more general, whereas the FGSM examples are generated from single images and will thus not transfer as well.

To reiterate, the fast gradient sign method was used to generate perturbations by first classifying the image to find its class, and then multiplying the sign of the gradient of the error function for that class with a scaling parameter e, creating a perturbation. This means that the FGSM perturbations were generated using just a single example for each perturbation, which made them specific to the example they were made for.
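The procedure just described corresponds roughly to the untargeted FGSM sketch below. It assumes a Keras VGG-19 model operating on preprocessed images and uses the model's own predicted class as the label; it is a simplified sketch, not the exact implementation used in this thesis.

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def fgsm_perturbation(model, image, epsilon):
    # image: a single preprocessed example without batch dimension.
    x = tf.convert_to_tensor(image)[None, ...]              # add batch dimension
    predicted_class = tf.argmax(model(x, training=False), axis=-1)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(predicted_class, model(x, training=False))
    gradient = tape.gradient(loss, x)
    return epsilon * tf.sign(gradient)                      # the FGSM perturbation

# Example usage (model and image defined elsewhere):
# adversarial = image[None, ...] + fgsm_perturbation(model, image, epsilon=1.0)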

The universal adversarial perturbations, on the other hand, are constructed by starting from a zero matrix and iterating over a set of examples, adding a small perturbation for each example. This perturbation is found by using optimization to find the smallest perturbation that will cause misclassification. In this way, the UAP is trained on multiple examples. Since the total perturbation is the sum of several perturbations which each work for a different image class, the perturbation becomes universal and works on a multitude of examples.

While achieving similar fool rates at e = 1, UAP and FGSM differed greatly in how noticeable the perturbations were. The UAP perturbation is clearly visible while the FGSM perturbation is not, yet they have similar effects. This might be because the FGSM perturbation is better tuned to the specific example in question. It would be reasonable to hypothesize that a general, universal perturbation needs a greater magnitude (and thus greater visibility) to achieve the same results as a tailor-made (and perhaps overfitted) perturbation. The results support this hypothesis.

The loss in accuracy from the float16 quantization when no attack is applied is approximately 1 percent. This is slightly higher than expected, since TensorFlow's documentation of its float16 quantization claims that it causes "minimal" accuracy loss. This apparent accuracy loss may, however, be an aberration caused by the somewhat small test dataset and could decrease with a more thorough validation. Assuming the accuracy loss is no more than one percent, that is an attractive trade-off considering that the model size is halved. That being said, it is far outperformed by state-of-the-art methods which achieve both greater compression and higher accuracy. The 8-bit quantization resulted in a more pronounced accuracy loss, which was to be expected since this vanilla quantization used half as many bits as the float16 method and did not make use of techniques such as retraining or heuristics to improve the accuracy. As with the float16 compression, the trade-off is not objectionable, but there are far better methods available, such as Incremental Network Quantization [46] and Defensive Quantization [23]. Testing these more robust methods would reveal more about the relationship between quantization and security. Due to limitations of resources and time, the scope of this thesis had to be restricted to simpler quantization methods, leaving methods such as INQ and Defensive Quantization for future work.

The result of the 8-bit single layer quantization is very similar to that of the complete 8-bit quantization. Looking specifically at the FGSM results, the SLQ achieves a higher baseline accuracy (with no attack) while being almost as robust against FGSM. This seems to indicate that quantizing a single layer, as a defensive measure, can be almost as effective as quantizing the entire network. However, it does not offer the same degree of compression. Therefore, it might be preferable to perform a complete 8-bit quantization to achieve a higher compression ratio while preserving the robustness of the network. Under the assumption that the effect of a quantization applied to a single layer corresponds to its effect when applied to the entire network, the results of the 4-bit single layer quantization are in line with what Lin et al. [23] found: that low-bitwidth quantization results in a loss of robustness. The 4-bit SLQ is more vulnerable to FGSM than the unquantized model.
This possible correlation between a single layer quantization and a full quantization could be explored further in future work. The fact that the 8-bit SLQ performs so similarly to the full 8-bit quantization, from a transferability standpoint, is very interesting. It seems to indicate that the majority of the impact of quantization, with regard to adversarial attacks, can be captured by quantizing a single layer. In this work, the only layer quantized was the first one; we therefore cannot compare this to the effect of quantizing other layers. Lin et al. [23] argue that the main reason why low-bitwidth quantizations can amplify the effect of adversarial perturbations is that values fall into the wrong quantization buckets early in the network and that these errors compound throughout the network. This supports the idea that quantizing the first layer is the most impactful option.
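For context, float16 and 8-bit post-training quantizations of the kind compared above are commonly produced with the TensorFlow Lite converter, roughly as in the sketch below. This is the generic TFLite recipe and may differ in detail from the conversions used in this thesis; the representative dataset generator is a placeholder.

import tensorflow as tf

model = tf.keras.applications.VGG19(weights="imagenet")

# Float16 post-training quantization: weights stored as 16-bit floats.
fp16_converter = tf.lite.TFLiteConverter.from_keras_model(model)
fp16_converter.optimizations = [tf.lite.Optimize.DEFAULT]
fp16_converter.target_spec.supported_types = [tf.float16]
fp16_model = fp16_converter.convert()

# 8-bit post-training quantization: a small representative dataset
# (placeholder generator below) calibrates the value ranges.
def representative_data_gen():
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3))]  # replace with real preprocessed images

int8_converter = tf.lite.TFLiteConverter.from_keras_model(model)
int8_converter.optimizations = [tf.lite.Optimize.DEFAULT]
int8_converter.representative_dataset = representative_data_gen
int8_model = int8_converter.convert()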

5.2 Method

The aim of this thesis is to examine the relationship between quantization and transfer attacks and to draw general conclusions. For such conclusions to be drawn, a wide range of attacks and quantizations would have to be investigated. Due to constraints on time, money, and prior knowledge, the number of attacks and quantizations had to be limited. This narrows the scope of the thesis and consequently limits the conclusions that can be drawn from it. In future work, performing similar studies on a wider range of attacks and quantizations would help glean further insights into how quantization affects the security of neural networks.

Specifically, using a greater variety of bitwidths for the full quantization might have revealed results similar to those found by Lin et al. [23], who showed that while low-bitwidth quantization may initially seem to offer increased robustness against adversarial attacks, lowering the bitwidth further results in a loss of robustness. Our results show that 8-bit quantization offers some protection against FGSM transfer attacks, which prompts the hypothesis that even lower-bitwidth quantization might offer more protection. The conclusions of Lin et al. problematize this, and it would therefore have been good to use lower bitwidths in this thesis so that this hypothesis could be tested. The results of the single layer quantizations do seem to indicate that a 4-bit quantization is more susceptible to attacks, confirming the ideas of Lin et al., but we cannot be sure that the accuracy of a single layer quantization is strongly correlated with the accuracy of the full quantization. This relationship needs to be explored further.

The magnitudes used for the attacks were chosen arbitrarily. This does not prevent drawing conclusions related to the two research questions, but it makes it difficult to compare these fool rates to those found in previous works. It also makes it harder to directly compare the effectiveness of the two attacks. It is clear that the FGSM perturbations were less visible than the UAP perturbations, but having a metric to back this up would add weight to the claim.

Source Criticism

The sources referenced in this thesis are almost exclusively peer-reviewed research papers, conference papers, and books written by researchers. The exceptions are the citations of Academic Torrents and TensorFlow, but these citations are only used to support claims about their respective services and not any scientific claims. The sources used can therefore be considered trustworthy.

5.3 The Work in a Wider Context

As neural networks become more prevalent in society at large, their security becomes as critical as that of infrastructure such as the electrical grid and the water supply. Once self-driving cars are common on our roads, attacks against the neural networks controlling them could jeopardize the lives of drivers and passengers. Once facial recognition becomes widely used, attacks against the neural networks in security cameras could cause innocent people to be indicted for crimes they did not commit. The results of this work are relevant to those and other applications, since quantization is likely to become a standard technique applied to most neural networks as a means of compressing them and speeding them up. Since most black-box attacks are some form of transfer attack (i.e. they are usually trained on a proxy model), the relationship between quantization and transfer attacks is highly relevant, and conclusions regarding that relationship have far-reaching consequences. The results of this thesis suggest that low-bitwidth quantization may offer some protection against transfer attacks.

It would therefore be desirable to use such quantization in security-sensitive applications, both for compression and as a defensive measure.

6 Conclusion

By applying transfer attacks to VGG-19 and quantized versions of it, results have been produced which illuminate the relationship between quantization and transfer attacks. In what follows, the research questions of Sec. 1.1 are answered and conclusions are drawn.

6.1 To what extent are some common adversarial attacks against a given neural network transferable to quantized versions of the same network?

Both universal adversarial perturbations and the fast gradient sign method produce adversarial examples that transfer well to models compressed using 16-bit floating point quantization. This quantization does not offer any protection against transfer attacks, nor does it introduce any additional susceptibility to them. It can thus be viewed as a compression method with no particular effect on transfer attacks. While the examples produced by the fast gradient sign method transferred well to the model quantized with 16-bit floating point values, the model quantized to 8 bits showed some resilience against this transfer attack. This resilience did not hold for universal adversarial perturbations. The difference in transferability could be explained by the fact that the universal adversarial perturbation was trained on multiple examples, which has been hypothesized to make examples generalize better [43]. The results show that low-bitwidth quantization to 8 bits decreases transferability. However, the 4-bit single layer quantization resulted in an increased transferability. Since the results of a single layer quantization seem to match the performance of the corresponding full quantization, this suggests that a full 4-bit quantization would also result in increased transferability.

6.2 What effects do some common quantizations have on the accuracy of a neural network?

When quantized to 16-bit floating point values, the top-1 and top-5 accuracy of VGG-19 decreased by approximately 1 percent. The quantization to 8-bit integers resulted in a greater accuracy loss of around 10 percent.

4-bit post-training quantization with the same quantization range globally in the network resulted in performance on par with random guessing. This is in line with the findings of Nayak et al. [30] and Choukroun et al. [4]. The effect observed in some previous works, where quantization improves accuracy, was not observed here.

6.3 Future work

In order to better understand the effects of quantization on transfer attacks, a continuation of this work could be performed in which more attacks and quantizations are used. The results of this thesis have given rise to two key questions which should be investigated: (i) does even lower-bitwidth quantization offer more protection against transfer attacks, or does it reduce robustness? (ii) what separates attacks that transfer well to quantized models from attacks that do not? To answer these questions, the same methods used in this report can be applied. Emphasis should be placed on using multiple low-bitwidth models to answer question (i). A review of the literature would help provide multiple hypotheses as to what makes attacks transfer, complementing that of Xie et al. [43], and the roster of attacks could be designed to test these hypotheses. In the case of the idea that adversarial examples transfer better if they have been trained on multiple inputs, a set of FGSM attacks trained on a variable number of inputs could be used (a rough sketch of such a multi-input FGSM variant is given below).

The single layer quantization proposed in this thesis could also be explored further, specifically the relationship between SLQ and the corresponding full quantization. If it holds true that an SLQ gives the same defense against transfer attacks as the corresponding full quantization, then SLQ is an easy way to add a degree of protection against transfer attacks with a smaller accuracy loss. Using SLQ also allows otherwise nonviable quantizations, such as simple 4-bit quantizations, to function.

Lastly, the experiments in this thesis were performed entirely on VGG-19. In order to draw conclusions about neural networks in general, a broad range of networks must be studied. To better understand how much a network can be quantized before there are drawbacks in terms of accuracy loss and loss of adversarial robustness, experiments similar to those in this thesis could be performed on a roster of networks.
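As a purely hypothetical illustration of the proposed multi-input FGSM experiment, one possible formulation is to take the sign of the gradient averaged over k inputs, as sketched below. This is only one of several reasonable choices and is not taken from this thesis or from Xie et al. [43]; all names are illustrative.

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def multi_input_fgsm(model, images, labels, epsilon):
    # images: a batch of k preprocessed examples; returns one shared perturbation.
    images = tf.convert_to_tensor(images)
    with tf.GradientTape() as tape:
        tape.watch(images)
        loss = loss_fn(labels, model(images, training=False))
    gradients = tape.gradient(loss, images)
    # Average the gradients over the k inputs before taking the sign.
    return epsilon * tf.sign(tf.reduce_mean(gradients, axis=0))

# Example usage (model, images and labels defined elsewhere):
# shared_perturbation = multi_input_fgsm(model, images, labels, epsilon=1.0)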

References
