DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2017

Comparing performance of convolutional neural network models on a novel car classification task

Swedish title: Jämförelse av djupa neurala nätverksmodeller med faltning på en ny bilklassificeringsuppgift

AMUND HANSEN VEDAL

KTH
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Comparing performance of convolutional neural network models on a novel car classification task

Amund Hansen Vedal

Department of Media Technology and Interaction Design
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
amund@kth.se

Abstract

Recent advances in neural networks have led to models that can be used for a variety of image classification tasks, useful for many of today’s media technology applications.

In this paper, I train hallmark neural network architectures on a newly collected vehicle image dataset to do both coarse- and fine-grained classification of vehicle type. The results show that the neural networks can learn to distinguish both between many very different classes and between a few very similar ones, reaching 50.8% accuracy on 28 classes and 61.5% on the 5 most challenging, despite noisy images and labels in the dataset.

Figure 1: Sample images from 24 of the classes of the PlatesMania dataset used for most experiments. The four classes "Other", "Kei car", "Caddy" and "Pickup with box" are not included, since these were dropped after initial experiments.

1 Introduction

Automating image classification and clustering tasks is a long-standing problem of computer vision. A classical problem where such systems can be applied is traffic surveillance, where the user may want to detect a vehicle in an image (object detection) or recognize the make or model of a detected vehicle (image classification). An early vehicle detection system of this kind using neural networks was proposed more than two decades ago [4].

In recent years, there have been major improvements in using neural networks for image classification tasks. Using convolutional layers, Krizhevsky et al. [18] presented a network architecture that caused a jump in accuracy compared to earlier models, inspiring several ground-breaking new model types [22, 25, 12].

From a media technology perspective, we can already see the impact of image classification techniques on our daily lives. The face recognition features of Facebook [27], geolocation of images [28], and transferring style from one image to another [8] have all become part of our daily lives. These techniques are also used in less playful contexts, such as diagnosing brain tumors from MRI images [11].

Training deep neural networks for image classification requires a dataset of labeled images to train on, using supervised learning. In this paper, I first present a method of collecting, labeling and processing images, which we used to create our new PlatesMania car image dataset, containing more than 200,000 labeled images of cars in different environments (Section 3.1). I then describe image classification experiments conducted on this dataset using three recent neural network architectures, VGG by Simonyan and Zisserman [22], GoogLeNet by Szegedy et al. [25], and ResNet by He et al. [12] (Section 4.1), and use the results to discuss differences between them (Section 8).

My results show that pre-designed neural networks can be trained to classify images with high confidence, even when the dataset surely contains label errors, and without clearing away noise from the images with object detection methods. The best-performing network reaches a 50.8% Top-1 accuracy on all 28 classes of our new PlatesMania dataset, and a 61.5% Top-1 score on a more fine-grained task of separating only the five most similar classes. I also show that the choice of neural network architecture can affect the classification differently, such as giving very different Top-5 scores even when the Top-1 accuracies of the networks are similar. To strengthen this argument, I analyze the performance of the networks using common methods such as confusion matrices and testing classification accuracy on previously unseen test data.

Contributions

This work contributes an exploration of the whole process of collecting a dataset of pictures, training neural networks for vehicle classification, and finally attempting to optimize their performance on the dataset:

• I explain how we collected useful data efficiently and prepared a dataset from it (Section 3.1), get baseline results from our neural network models (Experiment 1), and introduce the programming library and hardware used to construct and train them.

• I compare results after removing classes (Experiment 4), and from varying the speed of learning rate decay (Experiment 2).

• I compare results for datasets of variable size, and attempt to substitute data size with data augmentation.

• I attempt fine-grained classification on the 5 most similar classes of our dataset.

• I make deeper models with the same structure compete side by side on the same material (Experiment 6).

• I discuss the meaning, or lack thereof, of comparing results between these different experiments.

Note: I write "we" when referring to collaborative work with Sina Ghassemi that I used for my experiments. I use "I" for choices based on my own hypotheses.

2 Related work

For vehicle classification in particular, there are (to my knowledge) only a few labeled datasets of high quality [29, 16]. These datasets are quite clean (free from noise), as irrelevant image content was usually cropped out manually under human supervision. They also contain bounding box coordinates, should a cropped version be desired.

Vehicle classification using neural networks has already been attempted by, for example, Yang et al. [29], with very promising results.

There have also been attempts by Krause et al. [16] to convert images into 3D representations that can be classified more easily, and some student work attempting transfer learning on the same dataset [19].


3 Dataset

3.1 Collecting and labeling images

In order to focus on training our networks, we searched for sources of labeled car images to build a large dataset quickly. We wanted to build our own dataset rather than using a prepared one such as CompCars [29], but we imagined using Google Image searches would be extremely tedious and time-consuming.

As such, our goal was to create a database of at least 1,000 labeled pictures per class – the same as the basis of the ImageNet dataset described in Krizhevsky et al. [18]. To accomplish this, we first downloaded one picture for each of 7,217 car models from PlatesMania, a free database of vehicle pictures labeled with car model [1]. We then hand-sorted them into 28 distinct classes (see Figure 1): ConcreteMixer, Truck, Pickup, Van, Minibus, Campervan, Bus, Doubledecker, Veteran, Jeep, Limousine, SUV, Hatchback, Sedan, Sports, Station Wagon, Compact, American, Classic, Military, Firetruck, Ambulance, Motorcycle, Crane, Other, Kei Car, Caddy and Pickup with box. Some classes were chosen based on similar studies [29, 16], others were inspired by internet sources [2], and the remaining were extracted ad hoc to minimize the "Other" class (such as ConcreteMixer, Military and Crane). Throughout the course of our experiments, we continued downloading 296,000+ images of the vehicle models we had already classified, automatically mapping them to our classes. The result is the largest vehicle dataset we know of, containing over 207,000 unique images. The distribution of images over the classes is shown in Figure 2.
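Since every image downloaded from PlatesMania arrives tagged with its car model, the hand-sorting only had to happen once: later downloads of a known model inherit its class automatically. A minimal sketch of that mapping step in Python; the names (MODEL_TO_CLASS, label_image) and the example entries are hypothetical, not taken from the actual dataset:

```python
# Hypothetical sketch: hand-sorting one image per car model yields a lookup
# table, and every later download of a known model inherits its class.
MODEL_TO_CLASS = {
    "Volvo V70": "Station Wagon",
    "Ford Transit": "Van",
    # ... one entry for each of the 7,217 hand-sorted car models
}

def label_image(car_model: str) -> str:
    """Map a PlatesMania model tag to one of the 28 dataset classes."""
    return MODEL_TO_CLASS.get(car_model, "Other")
```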

[Figure 2 chart omitted: legend lists Sedan, Hatchback, SUV, Truck, Bus, Stationwagon, Sports, and a group of classes with < 15,000 images]

Figure 2: Distribution of images in each class of our PlatesMania dataset

3.2 About the images

The images are taken in all sorts of different environments, not always encompassing the whole vehicle, and usually from the perspective of a regular bystander. Most have a reasonable resolution (around 1200x900 pixels), and contain various objects from their surroundings, such as other vehicles, people, buildings, trees or traffic signs (see Figure 3). Sometimes watermarks are also present (Figure 4). Together, these elements form noise characteristic of our dataset, later referred to as the "PlatesMania dataset", which is part of what the network has to learn to recognize as irrelevant.

Figure 3: Challenging samples from our PlatesMania dataset: (a) Ambulance, (b) Compact, (c) SUV, (d) Truck

Figure 4: Examples of watermarked images: (a) Ambulance, (b) Compact

4 Method

4.1 Neural network models

For my experiments, I chose three well-known network models: VGG, GoogLeNet and ResNet. All three are convolutional networks, improvements on the breakthrough AlexNet [18], and recent winners of the ImageNet competition [21]. The networks have also proven successful on similar tasks of vehicle classification and detection [19, 29].

VGG VGG [22] from Oxford University is the least complicated network of the three, and as such a good starting point for my training. It has 16 weight layers and no parallel paths or residual connections like the other two. The network is the least optimized (it has the most weights), and as such takes longer to load and train than the others.

GoogLeNet GoogLeNet [25] is one of the 2014 winners of the ImageNet competition [21]. It introduced the so-called Inception module, which processes the signal with different kernels (convolutional layers) in parallel to produce a richer stack of feature maps. This is believed to help the network learn multiscale features [25]. After each Inception layer, the outputs of the parallel "paths" are concatenated and passed on deeper into the network. The specific name for the model I used is Inception-3.
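The parallel-kernel idea can be sketched as follows, in PyTorch for readability rather than the Torch7/Lua actually used in this work; the branch widths are illustrative, not GoogLeNet's exact configuration:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions plus pooling, concatenated."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1),           # 1x1 bottleneck
            nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1))

    def forward(self, x):
        # All paths see the same input and keep the spatial size, so their
        # feature maps can be stacked along the channel dimension.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)
```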

ResNet Microsoft Research won all three classes of ImageNet in 2015 with ResNets [12]. The name comes from how the layers are organized into so-called "residual blocks", where the original input signal of a block is added directly to its output. This trick helps avoid the vanishing gradient problem when networks become extremely deep [20, 12]. I used the 34-layer ResNet (except in Experiment 6).
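A minimal sketch of a basic residual block, again in PyTorch and with illustrative channel counts:

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two convolutions whose output is added to the unchanged input."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Adding the input directly gives gradients a short path backwards,
        # which mitigates vanishing gradients in very deep stacks.
        return F.relu(out + x)
```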

4.2 Tools

Torch7 Torch7 [6] is one of several programming libraries used to create neural networks. It is accessed through the Lua scripting language and has CUDA implementations, which help get the most out of the NVIDIA GPUs I used in my experiments (see Hardware). It is also open source, has good documentation and tutorials, and a large and active community, including Facebook AI, whose ResNet implementations [12] I used for my experiments. Well-known alternative frameworks include Caffe [15] and TensorFlow [3], which also have multi-GPU support. Torch was chosen because we believed it had the faster multi-GPU implementation at the time. I based my experiments on Torch implementations made by one of the core maintainers of Torch, Soumith Chintala [5].

Hardware I used NVIDIA Tesla K80 GPU cards for training my networks. This graphics card consists of two GPUs with 12 GB of GPU RAM (VRAM) each, enabling computational speeds of up to 8.74 TFLOPS (trillions of floating-point operations per second) in single-precision mode [23]. Using double precision would slow down the computations and have little to no effect on classification accuracy [10]. Training a ResNet on a single GPU required about 24 hours per 8 epochs of training.

5 Preprocessing

Standard preprocessing Initially, to have an equal number of input pixels for the first layer of the network, I first scale the images so the shortest side is 256 pixels, and then center-crop to 224x224 pixels [22, 13]. This method is based on the assumption that the vehicle is located in the middle of the picture. For each color channel (RGB), I then perform standard data normalization, subtracting the mean and dividing by the standard deviation. This, along with Batch Normalization layers, helps the networks converge faster and avoids problems with vanishing or exploding gradients [20, 14].
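A sketch of this pipeline using torchvision transforms (the thesis used Torch7; the mean/std values shown are the common ImageNet statistics, standing in for the dataset's own per-channel mean and standard deviation):

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),        # scale so the shortest side is 256 px
    transforms.CenterCrop(224),    # assumes the vehicle is centered
    transforms.ToTensor(),         # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # illustrative values;
                         std=[0.229, 0.224, 0.225]),  # use the training set's own
])
```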

Data augmentation I also perform some less common preprocessing steps, such as so-called multicrop, since feeding the network finer samplings of the image is believed to help it learn [22, 25]. Differently from Szegedy et al. [25], however, I chose to use 9 larger crops, to avoid large increases in computation time. Using the original resized image, where the shortest side is 256 pixels, I crop the image at 9 different positions using a 224x224 cropping mask. This results in 9 heavily overlapping parts distributed evenly over the image as a grid (Left–Center–Right, Top–Mid–Bottom).
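A sketch of the nine-crop grid, assuming the image has already been resized so that its shortest side is 256 pixels:

```python
from PIL import Image

def nine_crops(img: Image.Image, size: int = 224):
    """Nine heavily overlapping crops on a 3x3 grid (L-C-R x T-M-B)."""
    w, h = img.size
    xs = [0, (w - size) // 2, w - size]   # left, center, right offsets
    ys = [0, (h - size) // 2, h - size]   # top, middle, bottom offsets
    return [img.crop((x, y, x + size, y + size)) for y in ys for x in xs]
```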

Color jittering To make the features of the pictures easier to distinguish, I altered the colors of the images in two ways, both suggested in the packaged Torch implementation of ResNet from Facebook [7]. The first, well described in [18], is a PCA-based technique that changes color intensity by adding random multiples of the RGB eigenvectors and eigenvalues channel-wise before passing the picture through the network. The second technique uses a simpler approach of jittering the brightness, saturation and contrast of the image by uniformly random amounts (applied in random order), which supposedly helps the network learn invariance to these properties [13].
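A sketch of the PCA-based technique [18]; the eigenvalues and eigenvectors of the training set's RGB covariance are assumed to be supplied by the caller (the columns of eigvecs are the eigenvectors):

```python
import numpy as np

def pca_lighting(img, eigvals, eigvecs, alpha_std=0.1):
    """Shift all pixels along the RGB principal components by random
    multiples of the eigenvalues, as in Krizhevsky et al. [18].

    img: float array of shape (H, W, 3); eigvals: (3,); eigvecs: (3, 3).
    """
    alpha = np.random.normal(0.0, alpha_std, size=3)   # one draw per image
    shift = eigvecs @ (alpha * eigvals)                # single RGB offset
    return img + shift                                 # broadcast over pixels
```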

6 Experiments

Common to all experiments As the first experiments were conducted at an early stage of labeling the data, and I wanted to avoid the classical problem of imbalanced class representation, the maximum number of pictures I could choose per class was that of the least populated class: 330. 60% of the images were used for training and 40% for testing. I also chose a dropout probability of 0.5 for every weight, which has an effect similar to training several smaller networks at once and then averaging over them [24]. Unless stated otherwise, I used all the aforementioned preprocessing steps, as well as the following initial values, which are within the range seen in recent works [20, 24, 12] (a configuration sketch follows the list):

• Initial learning rate = 0.01

• Momentum = 0.9

• Learning rate decay: halved every 10 epochs (to speed up training [20])

• Weights initialized according to Xavier initialization [9]

• Batch size = 32 original images, to avoid overloading the GPU memory
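A configuration sketch with these values, using PyTorch and a torchvision ResNet-34 as a stand-in for the Torch7 models (the 0.5 dropout mentioned above sits in the VGG-style classifier and is omitted here):

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet34(num_classes=28)   # stand-in for the Torch7 model

# Xavier ("Glorot") initialization [9] for convolutional and linear weights
for m in model.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Halve the learning rate every 10 epochs (every 4th in Experiment 2);
# call scheduler.step() once per epoch.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
```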

Experiment 1 – Baselines The purpose of this experiment was to make network-specific baselines for our PlatesMania dataset. I trained the three networks VGG, GoogLeNet and ResNet from scratch, using 330 images for each of the 28 classes (9,240 original images in total for training and testing).

Experiment 2 – Faster learning rate decay To explore how variations in learning rate affect learning on our dataset, I trained VGG again with faster learning rate decay. According to Nielsen [20], the learning rate should be decreased when the learning slows down, which seems to happen at epoch 4 for VGG (see Figure 5a). As such, I halved the learning rate parameter every 4th epoch rather than every 10th.

Experiment 4 – Fewer classes This experiment is almost identical to the first, except with fewer classes. Here, I removed the very heterogeneous "Others" class, as well as the last three classes Kei Car, Caddy and Pickup with box, and trained the three networks on the remaining 24 (thus 132 images per class in the test set). I also monitored the output of the networks with confusion matrices, to better understand where they made the most mistakes (see Figure 7).
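The confusion matrix itself is straightforward to accumulate from the test-set predictions; a minimal sketch:

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """cm[i, j] counts test images of true class i predicted as class j;
    off-diagonal mass shows which classes the network confuses."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm
```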

Experiment 5 – More data, less augmentation As we had gathered a significantly larger amount of data, I chose to substitute the data augmentation methods by increasing the number of pictures per class to 13,500, to better understand the importance of data size in training. I also excluded all but classes 12–16 in this experiment, to see whether the networks would better distinguish between these very similar classes, which they had previously "confused" (Figure 7), given more examples and fewer alternatives. Note: after several failed attempts to make the networks converge, I chose to reduce the learning rate by only 20% every 10 epochs to make them converge faster [20].

Experiment 6 – Very deep networks As great success was reported for very deep ResNets [12], I also wanted to try increasing the number of layers. Starting from the ResNet-34 model used in earlier experiments, I first increased the number of convolutional layers to 50, and then to an extreme 152-layer implementation in the end. This was a challenge, as it put heavy demands on the GPUs.
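Switching depth changes only the number of residual blocks (and, from ResNet-50 upward, the block type); as a sketch, using torchvision models as stand-ins for the Torch implementations used here:

```python
from torchvision import models

# Same residual structure, increasing depth; 5 output classes as in
# Experiments 5 and 6.
resnet34 = models.resnet34(num_classes=5)
resnet50 = models.resnet50(num_classes=5)
resnet152 = models.resnet152(num_classes=5)
```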

0"

20"

40"

60"

80"

100"

0" 3" 5" 8" 10" 13" 15" 18" 20" 22" 25" 27"

Accuracy'(%)'

Epoch'

Top.1,Trainset"

Top.5,"Testset"

Top.1,"Testset"

(a) VGG

0"

20"

40"

60"

80"

100"

0" 3" 5" 8" 10" 13" 15" 18" 20" 22" 25" 27" 30" 32" 35" 37" 39"

Accuracy'(%)'

Epoch'

Top/1,Trainset"

Top/5,"Testset"

Top/1,"Testset"

(b) ResNet-34

Figure 5: Experiment 1, training curves for VGG and ResNet-34 (the GoogLeNet plot is omitted because it is very similar to VGG’s). Note how VGG stops learning after reaching 100% accuracy on the training set, while ResNet-34 keeps improving its performance on the test set.

7 Results

In this section the results are presented chronologically, either as the average accuracy over all classes (plots or a table) or per class (confusion matrices). Plots are based on records taken after each epoch, and results are discussed and explained in Section 8 and in the captions.
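The Top-1 and Top-5 scores reported below follow the usual definition: a prediction counts as correct if the true class is among the k highest-scoring outputs. A minimal PyTorch sketch:

```python
import torch

def topk_accuracy(logits, targets, k=5):
    """Fraction of samples whose true class is among the k highest scores;
    Top-1 is ordinary classification accuracy."""
    _, pred = logits.topk(k, dim=1)          # (N, k) predicted class indices
    hits = pred.eq(targets.view(-1, 1))      # compare against true labels
    return hits.any(dim=1).float().mean().item()
```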


0"

20"

40"

60"

80"

100"

0" 3" 5" 8" 10" 13" 15"

Accuracy'(%)'

Epoch'

Top-1,"Trainset"

Top-5,"Testset"

Top-1,"Testset"

Figure 6: Experiment 2: Learning rate decay every 4 epochs rather than every 10 on VGG

Figure 7: Confusion matrices for Experiment 4: (a) GoogLeNet, (b) ResNet. Each class has 132 images in total.

0"

20"

40"

60"

80"

100"

0" 5" 10" 15" 20" 25" 30" 35" 40" 45" 50" 55" 60" 65" 70" 75"

Accuracy'(%)'

Epoch'

Train,"Top31"

Test,"Top31"

Figure 8: Experiment 6: The performance of ResNet-152 on the five most difficult classes.

We can see how the network converges after a relatively large amount of iterations.

Figure 9: Results after Experiments 5 (a: GoogLeNet) and 6 (b: ResNet-50) suggest that the networks still struggle to separate the most similar classes Sedan, Hatchback and Station Wagon.


Classes   Model        Top-1   Top-5
28        VGG          50.8    82.0
28        GoogLeNet    49.5    71.1
28        ResNet-34    41.8    77.0
24        VGG          52.1    85.7
24        GoogLeNet    51.5    68.7
24        ResNet-34    46.1    82.0
5         ResNet-152   61.5    –
5         ResNet-50    61.0    –
5         GoogLeNet    60.4    –

Table 1: Accuracy (%) of the trained networks in Experiments 1, 4, 5 and 6. Top-5 is omitted for the 5-class experiments.

8 Discussion

Experiment 1 The average Top-1 score of 47.4% for the three networks was a very positive surprise for this first experiment, and far outweighs a random guesser’s 1/28 probability of choosing the right car type. Since I was using our new dataset rather than a cleaner competitor such as [29, 16], I had expected worse. This result also reveals what seems to be a difference between ResNet and the other two: it can keep learning after results on the training set have reached almost 100% accuracy. This can be seen in Figure 5.

Experiment 2 The results from this experiment are difficult to interpret. The "before" and "after" graphs follow each other very closely (in fact, both Top-1 and Top-5 converged faster before the change, see Figure 6), and there is little to no difference in the final scores. Also, since both the initial weights and the (stochastic gradient descent) learning algorithm are non-deterministic, it’s hard to know from just one experiment whether or not the fluctuations really depend on the learning rate decay, or on other factors such as weight initialization.

Experiment 4 In this experiment, removing 14% of the classes and their images only led to an increase to 49.9% correctly classified images. It is hard to analyze this result by direct comparison with Experiment 1, since both the number of training images and the number of classes were smaller this time. It could be fairer to compare the ratio between my two results with the ratio of the random probabilities of success (1/28 and 1/24): 49.9/47.4 ≈ 1.05, which is less than (1/24)/(1/28) = 28/24 ≈ 1.17. This comparison could suggest the network was actually less effective with fewer classes and less data.

The most useful insight from this experiment probably came from introducing the confusion matrices (Figure 7), as they were the first clear feedback on where the networks were making mistakes. In several cases, such as deciding between Station Wagons, Hatchbacks and Sedans, the results clearly reflect the difficulties we had classifying each vehicle by hand. Because of their similarity, the confusion between Trucks and ConcreteMixers was no surprise either. However, I was surprised by how clearly the networks separated DoubleDeckers from normal buses, and how often they mistook other vehicles for Pickups and Jeeps (see columns 4 and 10 of Figure 7).

In hindsight, I probably would have chosen to remove or merge other classes to directly improve the results. Couldn’t, for example, removing the "Others" class be detrimental to the score, seeing as it differs significantly from the remaining classes and as such is easier to classify?

Comparing the results of the 24-class experiments in Table 1 and Figure 7 also reveals what seems to be another particularity of ResNets: the large gap between Top-1 and Top-5 scores. Compared to GoogLeNet, ResNet-34 achieves a higher Top-5 score but a lower Top-1, which I interpret as sacrificing higher peaks in the probability density function (higher confidence in one class) for a more even spread over groups of classes.

Experiment 5 The last two experiments were the most fine-grained, as they only contained very similar car types. Compared to the previous experiment, where the average accuracy over classes 12–16 was about 32% for GoogLeNet (see Figure 7), seeing the network pass the 60% mark was a great positive surprise. Of course, such a comparison isn’t fair at all; this time I used about 41 times as many original training images per class and far fewer classes. Picking randomly among the five would have yielded 1/5 = 20% on average, which makes the relative difference between random guessing and my neural network smaller than in Experiment 4.

Experiment 6 After ResNet-34 achieved about 37% average accuracy on classes 12–16 in Experiment 4, and GoogLeNet increased by about 9% in Experiment 5, this last experiment displays the highest accuracies of all, around 61.5%. I unfortunately didn’t have time to run an extra experiment with 5 classes on ResNet-34, so it is hard to know how much of the increase was due to the class reduction and the amount of data, versus the increased network depth. It seems, however, that the increase from 50 to 152 convolutional layers didn’t help much, compared to the 3x increase in complexity, which I speculate could be related to the vanishing gradient problem [20, 12]. Looking at the results in Figure 9, it is clear that the largest difficulty was distinguishing between Sedans, Hatchbacks and Station Wagons. This was no surprise, as many cars are designed to be in between these three popular classes, and as such could belong to one or the other.

Reflections on my experimental method The experiments were planned as ad hoc explorations rather than methodical studies, to see how good a result I could achieve in the short timeframe of a bachelor thesis. As a consequence, the exact implication of each parameter change cannot be considered empirically proven by my studies. For example, it is still unclear how increasing the number of images per class influenced learning in Experiments 5 and 6, as I also chose fewer classes, turned off data augmentation, and used deeper versions of ResNet. This direction was, however, based on conscious choices. I believe the importance of optimizing each parameter was already well documented, and therefore I wanted to try out popular methods on a new, high-noise dataset, both to broaden my understanding of deep neural networks and, hopefully, to produce a guide to help others.

9 Future work

9.1 Further optimizing the algorithm

Conducting methodical experiments to see which parameters lead to improvements would presumably show more clearly how to further improve our results. One example could be trying to extend the depth of GoogLeNet or adding residual connections, to see if some of the positive effects from ResNets in Experiment 6 could also be seen in deeper GoogLeNets, as in Szegedy et al. [26]. It would also be very interesting to explore the trained networks by visualizing the weights of each layer with the techniques described in Zeiler and Fergus [30]. This could help explain which features are learned during training.

9.2 Cleaning up the dataset

I have pondered several solutions for improving the quality of our dataset, such as further merging or removing classes, or training a network on a less noisy dataset like CompCars [29] and using it to check the labels of PlatesMania [1]. One could also consider using object detection to find the car in each image [31], or even excluding rare vehicles, such as particular Russian minibuses or military missile carriers, which are less common but currently represented. Creating a good, easily analyzable dataset is a common challenge in research. Even though we cannot completely trust all labels from PlatesMania, we can still rely on our subjective classifications most of the time. The question is how important a perfect dataset is in this context. A recent study by Krause et al. [17] strongly questions the value of "cleaning up" a reasonably clean dataset of what it calls "cross-category noise" (mislabeled images) when the dataset is large. Deciding whether to clean up or not also depends on what the purpose of the trained classifier will be. The accuracies in my experiments are, for example, quite a bit worse than those reported on CompCars [29], because of the higher quality of their dataset, but it is unknown how accurately their classifiers would perform on our noisy dataset. In general, however, we recognize that creating a high-quality dataset is a very common problem in all research, and that creating experiments that show clear results is an art.

9.3 Determining a goal to go further

After working with these algorithms for some time, and reading several articles and results from current research, it’s easy to imagine a myriad of ways of continuing our work. However, inventing without re-inventing in such an active field seems a task as daunting as it is exciting. In the end, it depends on the problem at hand and on what dataset is available; maybe the most interesting problem lies in discovering a new field of application rather than a new optimization technique! For this reason, I believe the most important first step in creating a well-performing algorithm is identifying the purpose of the task – a goal. With a clear goal and its implicit constraints, the direction through the "forest" of variables in a project becomes clearer.

9.4 Acknowledgements

I’d like to thank Elena, my friends (especially Michele, Matteo and Simone in Torino) and family for their support while I wrote my bachelor thesis. I would also like to give a particular mention to some of the researchers and Ph.D. candidates of the Telecom Italia Joint Open Lab. First, I am very grateful to Sina Ghassemi, for discussions, tips, friendship and help using Torch for my experiments. Without him, I would still be lost trying to import images, or navigating the jungle of papers, programming library documentation and mathematical expertise that constitutes the exciting field of neural networks.

It’s been a pleasure working with him.

I would also like to thank Skjalg Lepsøy, Tomas Björklund and Pedro Gusmão, for helping me understand the concepts of neural networks and Linux, and for always doing so with a smile.

Lastly, I’d like to thank Gianluca Francini, Enrico Magli and the JOL for giving me the opportunity to write my thesis in their offices. They included me in meetings and briefings, and gave me access to their absolute high-end hardware, as if I were a regular member of their research team. It has been a wonderfully inspiring environment, making this thesis project an experience beyond all my expectations.

Grazie mille.

About Joint Open Lab The Telecom Italia Joint Open Lab is a collaboration between Politecnico di Torino and Telecom Italia. It consists of the four groups VISIBLE, CRAB, SWARM and MOBILE, which work with convolutional neural networks, robotics, IoT applications, and mobile social applications, respectively. http://jol.telecomitalia.com/jolvisible

References

[1] PlatesMania. www.platesmania.com/. [Last retrieved: 12-06-2016].

[2] Wikicars – Car body style. http://wikicars.org/en/Car_body_style. [Last retrieved: 13-06-2016].

[3] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems. 2015. URL http://tensorflow.org/. Software available from tensorflow.org.

[4] D. Bullock, J. Garrett, and C. Hendrickson. A neural network for image-based vehicle detection. Transportation Research Part C: Emerging Technologies, 1(3):235–247, 1993.

[5] S. Chintala. GitHub repository. https://github.com/soumith/imagenet-multiGPU.torch, 2013. [Last retrieved: 01-06-2016].

[6] R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. 2011.

[7] Facebook. GitHub repository. https://github.com/gcr/fb.resnet.torch-lesion-study/blob/master/datasets/transforms.lua.

[8] L. A. Gatys, A. S. Ecker, and M. Bethge. A neural algorithm of artistic style. CoRR, abs/1508.06576, 2015. URL http://arxiv.org/abs/1508.06576.

[9] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Y. W. Teh and M. Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 249–256, Chia Laguna Resort, Sardinia, Italy, 2010. PMLR. URL http://proceedings.mlr.press/v9/glorot10a.html.

[10] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan. Deep learning with limited numerical precision. CoRR, abs/1502.02551, 2015. URL http://arxiv.org/abs/1502.02551.

[11] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. C. Courville, Y. Bengio, C. Pal, P. Jodoin, and H. Larochelle. Brain tumor segmentation with deep neural networks. CoRR, abs/1505.03540, 2015. URL http://arxiv.org/abs/1505.03540.

[12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. URL http://arxiv.org/abs/1512.03385.

[13] A. G. Howard. Some improvements on deep convolutional neural network based image classification. CoRR, abs/1312.5402, 2013. URL http://dblp.uni-trier.de/db/journals/corr/corr1312.html#Howard13.

[14] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015. URL http://arxiv.org/abs/1502.03167.

[15] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.

[16] J. Krause, M. Stark, J. Deng, and L. Fei-Fei. 3D object representations for fine-grained categorization. 2013.

[17] J. Krause, B. Sapp, A. Howard, H. Zhou, A. Toshev, T. Duerig, J. Philbin, and F. Li. The unreasonable effectiveness of noisy data for fine-grained recognition. CoRR, abs/1511.06789, 2015. URL http://arxiv.org/abs/1511.06789.

[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Pages 1097–1105, 2012.

[19] D. Liu and Y. Wang. Image classification of vehicle make and model using convolutional neural networks and transfer learning. 2014. URL http://cs231.stanford.edu/reports/lediurfinal.pdf. [Unpublished paper from Stanford University course CS231n. Last retrieved: 01-06-2016].

[20] M. A. Nielsen. Neural networks and deep learning. Determination Press, 2015.

[21] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. Li. ImageNet large scale visual recognition challenge. CoRR, abs/1409.0575, 2014. URL http://arxiv.org/abs/1409.0575.

[22] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014. URL http://arxiv.org/abs/1409.1556.

[23] R. Smith. NVIDIA launches Tesla K80, GK210 GPU. www.anandtech.com/show/8729/nvidia-launches-tesla-k80-gk210-gpu, 2014. [Web article. Last retrieved: 15-05-2016].

[24] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014. URL http://dl.acm.org/citation.cfm?id=2627435.2670313.

[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014. URL http://arxiv.org/abs/1409.4842.

[26] C. Szegedy, S. Ioffe, and V. Vanhoucke. Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR, abs/1602.07261, 2016. URL http://arxiv.org/abs/1602.07261.

[27] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. Pages 1701–1708, 2014. doi: 10.1109/CVPR.2014.220. URL http://dx.doi.org/10.1109/CVPR.2014.220.

[28] T. Weyand, I. Kostrikov, and J. Philbin. PlaNet – photo geolocation with convolutional neural networks. CoRR, abs/1602.05314, 2016. URL http://arxiv.org/abs/1602.05314.

[29] L. Yang, P. Luo, C. C. Loy, and X. Tang. A large-scale car dataset for fine-grained categorization and verification. CoRR, abs/1506.08959, 2015. URL http://arxiv.org/abs/1506.08959. [Last retrieved: 01-06-2016].

[30] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. CoRR, abs/1311.2901, 2013. URL http://arxiv.org/abs/1311.2901.

[31] Y. Zhou and N. Cheung. Vehicle classification using transferable deep neural network features. CoRR, abs/1601.01145, 2016. URL http://arxiv.org/abs/1601.01145.
