
Detection of facade cracks using deep learning

Linus Eriksson

Independent degree project – first cycle
Main field of study: Electrical Engineering
Credits: 15
Semester/Year: SS/2020
Supervisor: Bin Wang
Examiner: Jan Lundgren
Course code/registration number: ET107G
Degree programme: Master of Science in Electrical Engineering (300 Credits)

(2)

Abstract

Facade cracks are a common problem in the north of Sweden, where shifting temperatures create frost in the facades that ultimately damages them, often in the form of cracks.

To repair these cracks, workers must first find them by visually inspecting the facades, which is a difficult and time-consuming task. This project explores the possibilities of creating an algorithm that can classify cracks on facades with the help of deep learning models. The idea is that in the future, an algorithm like this could be implemented on a drone that hovers around buildings, filming the facade and reporting back any damage to it. The work in this project is exploratory; the path of convolutional neural networks has been explored, as well as the possibility of simulating training data due to the lack of real-world data. The experimental work led to some interesting conclusions for further work. The relatively small amount of data used in this project points towards the possibility of using simulated data as a complement to real data, as well as the possibility of using convolutional neural networks as a means of classifying facades for crack recognition. The data and conclusions collected in this report can be used as preparatory work for a working prototype algorithm.

Keywords: Machine learning, Deep learning, Dataset simulation, Crack classification, Blender, Feature Extraction.

(3)

Acknowledgements

I thank my supervisor and Research Engineer Bin Wang for providing me with his knowledge, guidance, and support during this project.

(4)

Table of Contents

1 Introduction
1.1 Background
1.2 Overall intention
1.3 Scope Limitations
1.4 Concrete and verifiable objectives
2 Theory
2.1 Machine learning
2.2 Artificial neural network
2.3 Convolutional neural network
2.4 Textures
3 Tools
3.1 Tensorflow 2.0
3.2 Keras
3.3 Blender
3.4 Datasets
3.4.1 SDNET2018
3.4.2 Concrete Crack Images for Classification
3.4.3 Simple dataset
3.4.4 Realistic datasets
3.5 LeNet-5
3.6 ResNet-50
4 Experimental work – preparations
4.1 Experiment 1 – LeNet-5
4.2 Experiment 2 – Filtering
4.3 Experiment 3 – Feature Extraction
4.4 Experiment 4 – Simulation of training data
4.4.1 Plaster facade texture
4.4.2 Cracks
4.4.3 Simulating Objects
5 Results
5.1 Experimental Results
5.1.1 Experiment 1 – LeNet-5
5.1.2 Experiment 2 – Filtering
5.1.3 Experiment 3 – Feature Extraction
5.2 Overall Scope Results
6 Analysis of the Experimental Results
6.1 Experiment 1 – LeNet-5
6.2 Experiment 2 – Filtering
6.3 Experiment 3 – Feature Extraction
6.3.1 Testing the simulated texture
6.3.2 Testing the whole simulated dataset
6.4 More on Feature Extraction
7 Future work
7.1 YOLO
7.2 Drone application
8 Discussion
8.1 Alternative solutions
8.2 Social and ethical aspects
9 Conclusion
10 References


Terminology

Acronyms

NN: Neural Network.
CNN: Convolutional Neural Network.
TPU: Tensor Processing Unit.
GPU: Graphics Processing Unit.
BoW: Bag-of-Words.
Textons: Micro-structures in natural images.
NASNet: Neural Architecture Search Network.
ResNet: Residual Network.
ReLU: Rectified Linear Unit (activation function).
Softmax: Normalized exponential function (activation function).


1 Introduction

Machine learning is a broad term coined as early as the 1950s by Arthur Samuel, a pioneer in artificial intelligence and computer games at IBM. In the 1950s, Samuel created a computer algorithm that learned to play the game of checkers using a reward type of learning, or what we would now call reinforcement learning [1].

Since the checkers algorithm almost 70 years ago, machine learning and trained neural networks have found more and more applications, and limitations do not yet seem to exist. Today, machine learning is used for everything from finding spam in your e-mail inbox to image analysis in self-driving cars, voice synthesis, texture synthesis, and much more. Machine learning can mimic the experience of a specialist by training on a large amount of data, and in some areas the results are even far better than what specialists could ever achieve. But above all, it gives us the freedom to hand over tedious long-term assignments that people have previously had to do by hand to algorithms.

A very interesting example of how effective an algorithm can be is a study where machine learning was used to read cardiovascular magnetic resonance images with approximately the same error rate as human readers but 186 times faster: what took the cardiologist 13 minutes took the algorithm only 4 seconds [2].

And this year, Nvidia, a world leader in GPU and TPU technology, announced their development of what they call DLSS, or "Deep Learning Super Sampling". It is a deep learning algorithm trained not to actually upscale an image, but to make the image look like an upscaled image, without doing the computationally heavy task of upscaling. The method is to be widely used in new games with dynamic ray tracing, because ray tracing is a very calculation-heavy task; with the help of AI it can be performed at a lower resolution and then scaled up with DLSS, which is less calculation-heavy than the functions originally needed to do the more precise ray tracing at a higher resolution [3].

1.1 Background

Many older buildings in cities have a facade coating made of plaster. Facade plaster is a putty-like coating that often has a somewhat coarser content and is cement based. Facade plasters are known to be resistant to wind and water and can last for decades if properly maintained.

When a plaster facade starts to grow old, it can develop cracks; these cracks allow water to seep into the facade and under the plaster. This can end with the plaster eventually falling off or being damaged by the shifting frost during the colder seasons. An important part of maintaining these old facades is finding and repairing these cracks.

Finding cracks as a facade worker is done visually, and it is a difficult task when the houses are tall and many. This motivated a thesis on whether machine learning could be used to create an algorithm that recognizes cracks on these facades. Such an algorithm could in the future be implemented on a drone that could effectively fly around and film, analyse, and classify cracks on the facades.


No previous work has been done on the specific subject of facade crack classification with deep learning models, but the field of object classification has been extensively researched over the past decades. Texture recognition with deep learning is not as extensively researched as object recognition; hence, the main objective of this project is to explore whether it is possible to classify the more texture-like cracks with deep learning models. Studies such as "Plant Texture Classification Using Gabor Co-Occurrences", "Texture Classification using Convolutional Neural Networks" and "A Deep Learning Model for Automatic Image Texture Classification: Application to Vision-based Automatic Aircraft Landing" were used as inspiration to find useful methods for this project [4] [5] [6].

1.2 Overall intention

The overall intention of this work is to explore the possibilities of using deep learning models as a classifier to detect cracks in facades, and hopefully to give some insight into which machine learning methods could be used for a prototype algorithm in the future if this project is continued.

The project also explores the possibility of simulating a dataset for training deep learning models in the form of simulated images. The intention is to create a dataset that can complement real-world data to form a bigger dataset. It is also hoped that the project will show whether simulated data can, in the future, be used to add objects to the image dataset that would otherwise be hard to obtain from the real world. That would mean that simulated data could effectively be used to enhance a limited training dataset.

1.3 Scope Limitations

Originally the scope was much bigger and included parts of drone application research as the final work, but it was quickly understood that the drone application was a huge project in itself. Hence, the scope of this project was bounded to only the machine learning algorithm, or more specifically, to explore whether it is possible to classify cracks and how it might be done with deep learning algorithms.

1.4 Concrete and verifiable objectives

Because this project will in all likelihood not present a finished algorithm, the verifiable objectives are smaller goals on the path towards creating a functional algorithm.

The different objectives are identified as:

• Acquiring a dataset for testing: test sets of different difficulty levels need to be created to provide a solid testing basis for the different models that are to be tested during the project.

• Acquiring a dataset for training: because of the difficulty of finding large numbers of facade cracks, some type of dataset must either be created or used to complement the lack of real facade images.

• Exploring the possibility of creating and implementing a simulated dataset.

• Exploring methods and models.


2 Theory

In this chapter, the general concepts of machine learning, CNNs, and texture analysis are explained, as well as some core studies that formed the basis for some of the major choices made in the project.

2.1 Machine learning

Machine learning allows us to create an algorithm that trains itself from large amounts of processed data. Machine learning can use huge amounts of data to solve specific problems quickly, in some cases even much faster than an expert in the field. This is because the algorithm was created only to solve this specific problem and was trained with huge amounts of data on it. Machine learning is often categorized into four different methods or algorithms:

Supervised machine learning algorithms are used in most practical machine learning. This method is about approximating a function that can differentiate between the different classes, which is done with already classified data, often in large quantities.

It can be described with the input variables (x) and the output variables (y), where the approximation function becomes:

y = f(x)    (1)

Unsupervised machine learning algorithms, unlike supervised ones, do not use labelled training data and are mostly used to find connections in systems and hidden structures in data.

Semi-supervised machine learning algorithms fall between supervised and unsupervised machine learning and use both labelled and unlabelled data to train the approximation function. This method often leads to great learning ability.

Reinforcement machine learning algorithms use a reward method where the algorithm interacts with an environment and is trained by being rewarded for the positive things it does; the algorithm thus saves these wanted traits and continues its trial and error to find more rewards.

Supervised machine learning and Semi-supervised machine learning algorithms are the main algorithms used in this project.

2.2 Artificial neural network

A neural network is usually drawn in parallel to how the human brain processes information. An Artificial Neural Network (ANN) consists of a connected network of often massive amounts of neurons. Each neuron in an artificial neural network is a mathematical function that takes in one or more values; the values then go through an activation function and are passed on as one or more values. An artificial neuron typically has three parts: inputs, outputs, and an activation function. A mathematical model is illustrated in Figure 1.


Figure 1: Mathematical model of artificial neuron.

Image inspired by Octavian’s blog.

Then, by computing the gradient in weight space, or simply put, backpropagating through the neural network, the network can be changed to provide as correct a result as possible with respect to a chosen loss function. The parts that are effectively changed are the 'weights' of the neurons; the weights tell the neuron how much to use each input signal, and they are illustrated on the left side of the artificial neuron in Figure 1.
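As a concrete illustration of the neuron model in Figure 1, the following minimal Python sketch (not from the thesis; all values are arbitrary) computes one neuron's output:

```python
import numpy as np

def artificial_neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus a bias, through ReLU."""
    z = np.dot(w, x) + b           # each weight scales how much its input is used
    return np.maximum(0.0, z)      # ReLU activation function

# Example: three inputs, three weights, one bias (arbitrary values).
y = artificial_neuron(x=np.array([0.5, -1.2, 3.0]),
                      w=np.array([0.8, 0.1, -0.4]),
                      b=0.2)
print(y)  # 0.0 -> the weighted sum is negative, so ReLU clips it to zero
```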

2.3 Convolutional neural network

A Convolutional Neural Network (CNN) is, as the name suggests, a type of deep neural network whose purpose is often to analyse images in some form within the computer vision area.

The CNN is a multi-layered network that follows a typical structure of a convolutional layer followed by a subsampling layer, all connected at the end with dense layers into the final softmax vector. This type of design originates from the pioneering ConvNet LeNet-5, a simple CNN made for detecting handwritten numbers on a 28x28 grid. More advanced CNNs often consist of multiple blocks with a similar classical design in every block, sometimes creating models that are hundreds of layers deep.

2.4 Textures

Texture classification has long been a difficult step in machine learning, as textures differ greatly from object-based classification. A texture follows a set of statistical properties and periodically follows repetitive textons with some variation [7]. Textures can be anything from completely stochastic to completely regular patterns. Textural analysis is usually divided into four sub-problems: classification, segmentation, synthesis, and shape. In this project, only the classification problem is investigated.

Early research on texture features from the 1980s to the 1990s focused on two areas: filtering approaches and statistical modelling. Filtering approaches bring out the features of the texture with the use of convolving filters such as Gabor filters and pyramid wavelets, as well as simpler filters like differences of Gaussians. Statistical modelling characterizes textures as arising from probability distributions on random fields [8]. Since then, the area has divided into three ways of classifying textures: BoW, CNN, and attribute-based classification.


Some interesting information in this area that might be of use during this project comes from a report published at ICLR 2019. Briefly, this report suggests that CNNs are consistently biased towards texture, and shows results supporting this hypothesis [9]. In addition, the report brings up two other interesting studies suggesting that texture information alone is sufficient for many ImageNet CNNs to classify objects. In the first, Gatys et al. discovered that a linear classifier on top of a CNN's texture representation shows almost no change in loss compared to the original network [10]. In the second, Brendel & Bethge demonstrated how CNNs with very small receptive fields through all layers still reach high accuracy on ImageNet; this means the CNN does not have information about many features and shapes but still accurately classifies the object. This should not be the case unless the CNN is particularly good at recognising local patches of textures that provide sufficient information to classify the object, as the report suggests [11].

These types of properties make the CNN a remarkably interesting candidate for the crack-classification problem.


3 Tools

Some tools were used during the experimental work. In order to focus on testing different models and data collections, no major background research was done on the many complicated functions used in these tools, and they will not be explained at length in this report. This section gives a brief description of the tools used during the experimental work; readers who want more detail are referred to the respective sources.

3.1 Tensorflow 2.0

Tensorflow is an end-to-end machine learning platform. It contains a large ecosystem of tools, libraries, and community resources that make it easy to create a machine learning algorithm without having to know all the advanced and difficult functions. Tensorflow makes it quite easy to experiment with different structures and parameters, and to train a model on a GPU to greatly decrease training time [12].

3.2 Keras

Keras is a high-level neural network application programming interface that can run on top of Tensorflow, CNTK, or Theano. It was developed to make experimentation with machine learning easy and to move quickly from idea to result. Keras models are built in sequential form, and their simple coding design makes it easy to manipulate additional functions, modules, loss functions, activation functions, and more. The models are described in Python, which makes the code compact and easy to debug and develop. The Keras API also offers pre-trained models on ImageNet that can be used with or without the classification layer [13].
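For example, loading a pre-trained model with or without its classification layer is a one-liner through the Keras applications API (a minimal sketch, not code from the thesis):

```python
from tensorflow.keras.applications import ResNet50

# With the ImageNet classification layer (1000-class softmax output).
classifier = ResNet50(weights='imagenet')

# Without the top layer, for feature extraction on new classes.
feature_extractor = ResNet50(weights='imagenet', include_top=False)
```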

3.3 Blender

Blender is a free tool for 3D modelling, 3D graphics, animation, game design, effects, and rendering. It is a public project developed with the help of the Blender Foundation and used by everyone from students to effects experts [14]. Blender is used here to create a simulation of a cracked wall, in order to build a large data collection with varying environments while keeping the simulation within the scope of what is possible to classify.


3.4 Datasets

During the experimental work, some data collections were used both to train and to test models. These data collections are presented in this section. The training and testing data were easily acquired with some research on Google Images and Google Maps, and through open source datasets from different universities.

3.4.1 SDNET2018

SDNET2018 is an image data collection made for training, validation, and testing of machine learning algorithms on concrete. SDNET2018 contains over 56,000 images of cracks and non-cracks on concrete bridges, walls, and roads, with a mixture of 0.06-25 mm cracks in 256x256x3 format [15]. SDNET2018 was used during this project as a basic training data collection, because a fairly large collection of well-defined cracks with similar background and lighting could be created from it. The dataset was also augmented with simple vertical, horizontal, and diagonal flips to make it bigger. The dataset used for training from the SDNET2018 collection consists of about 5,800 images with a 50-50 class distribution. Examples are shown in Figure 2.

Figure 2: Examples of images of cracks (left side) and without cracks (right side) from SDNET2018.
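A minimal sketch of the flip augmentation described above, using tf.image; the exact augmentation code used in the thesis is not given, so this is an assumed implementation:

```python
import tensorflow as tf

def augment_with_flips(image):
    """Return the original image plus its horizontal, vertical and diagonal flips."""
    return [
        image,
        tf.image.flip_left_right(image),  # horizontal flip
        tf.image.flip_up_down(image),     # vertical flip
        tf.image.transpose(image),        # diagonal flip (mirror across the diagonal)
    ]
```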

3.4.2 Concrete Crack Images for Classification

The data collection Concrete Crack Images for Classification (CCIC) contains images from the METU campus in Turkey: 20,000 images with cracks and 20,000 images without cracks, in 227x227x3 format [16]. The images have large cracks, and the colour differences between images are relatively large compared to the SDNET2018 subset. This data collection is intended for experimental work and not for training the more advanced prototypes. The full dataset was used for training and consists of 40,000 images with a 50-50 class distribution. Examples are shown in Figure 3.

Figure 3: Examples of images of cracks (left side) and without cracks (right side) from CCIC.


3.4.3 Simple dataset

As the project involves a great deal of testing and validation of models and training data, a simple data collection was set up for basic tests of models. It consists of a subset of 40 selected images from SDNET2018, half with cracks and half without. The images were chosen to contain clear cracks, and images without major defects that could be mistaken for cracks. Some images from the set are presented in Figure 4.

Figure 4: Subset of simple test images, cracks at the top and non-cracks at the bottom.

3.4.4 Realistic datasets

To be able to compare how good a model is, one needs realistic test images resembling real facades. Two realistic test sets were used: an easier one, and one that is more difficult with a little more shadow and objects such as windows and other objects that may appear on real facades. This data should give a better overview of whether the classifier makes its assessment based on the cracks or on other objects in the image. Examples from the datasets are illustrated in Figure 5 and Figure 6.

Figure 5: Subset of easier realistic datasets.


Figure 6: Subset of more difficult realistic datasets.

3.5 LeNet-5

LeNet-5 is a remarkably simple convolutional network originally created for the MNIST data collection to classify handwritten numbers. It is a significant milestone in the early days of convolutional neural networks and originates as far back as 1989, when the original form was proposed by Yann LeCun et al. [17]. The model consists of just two convolutions with max-pooling in between; the array is then flattened, and the classification is done by two dense layers going into a softmax vector, as illustrated in Figure 7.

Figure 7: Classic structure of LeNet-5 (created with NN-SVG, a tool by Alexander Lenail - http://alexlenail.me/NN-SVG - from "Hands-On Computer Vision with TensorFlow 2" by B. Planche).

3.6 ResNet-50

Residual Neural Network 50 (ResNet-50) is a convolutional neural network that is 50 layers deep, trained on ImageNet's database of more than one million images spread over 1,000 classes [18]. The classes include many common items such as animals, furniture, and tools. ResNet-50 has about 23 million parameters, which makes it practically impossible for a commercial computer to train on its million pictures [18]. ResNet-50 is distributed as a pre-trained model with manipulable layers and weights through the Keras API. One advantage of ResNet-50 and many other major vision machine-learning models is that they are great at finding specific features; because of this, they can readily be repurposed to classify other classes through the use of feature extraction. The ResNet-50 structure is illustrated in Table 1.

Table 1: ResNet-50 Architecture.

Layer   | Output dim. | 50-layer ResNet
In      | 224x224x3   | -
Conv1   | 112x112     | 7x7, 64, stride 2
Conv2_x | 56x56       | 3x3 max pool, stride 2; [1x1, 64; 3x3, 64; 1x1, 256] x 3
Conv3_x | 28x28       | [1x1, 128; 3x3, 128; 1x1, 512] x 4
Conv4_x | 14x14       | [1x1, 256; 3x3, 256; 1x1, 1024] x 6
Conv5_x | 7x7         | [1x1, 512; 3x3, 512; 1x1, 2048] x 3
Dense   | 1x1         | average pool, 1000-d fc, softmax


4 Experimental work – preparations

Since the classification of cracks is not a particularly elementary task in machine learning, much of the work consists of experimental tests and comparisons to try to achieve the most effective algorithm possible, in the hope that it will make way for a proper prototype algorithm.

All training and use of models in this project was done in the Tensorflow environment with the Keras API. This gave easy access to libraries, models, and pre-processing functions, which made it extremely easy to build models, pre-process images, and set up pipelines for training with specific data.

The results presented from the experimental work are accuracy percentages on the test sets of different difficulty. Since this is a binary classification problem, the worst-case scenario is 50 %, as that tells us nothing about the algorithm's performance. No emphasis was put on false positives and false negatives, as this was deemed not very important for this simple exploratory project.

4.1 Experiment 1 – LeNet-5

As stated in the Textures chapter, studies show that it is possible to find and classify textures with CNNs. CNNs are also good at distinguishing features, which was considered particularly valuable in this case, since many different features exist on a facade that need to be distinguished from cracks. These advantages, plus the easy implementation of CNN models, made the CNN the most attractive candidate method for developing a facade crack-finding algorithm.

To experiment with different datasets and models, a baseline was needed. Because of its simplicity, LeNet-5 was used as a starting point. LeNet-5 is also designed from the ground up to work with 1-channel images, which is most likely enough to classify cracks.

Hence, a design similar to LeNet-5 was set up in Tensorflow using the Keras API, following the original design presented in Figure 7 with modified first and last layers, as presented in Table 2.

Table 2: Model design, sequential.

Layer          | Out dimension | Activation
In (image)     | 227, 227, 1   | -
1 Conv2D       | 223, 223, 6   | ReLU
2 MaxPooling2D | 111, 111, 6   | ReLU
3 Conv2D       | 107, 107, 16  | ReLU
4 MaxPooling2D | 53, 53, 16    | ReLU
5 Flatten      | 44 944        | ReLU
6 Dense        | 120           | ReLU
7 Dense        | 84            | ReLU
Out Dense      | 1             | Sigmoid

The model was tested with training on the data collections CCIC, SDNET2018, and both mixed.
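A sketch of how the Table 2 model can be expressed with the Keras Sequential API follows. This is a reconstruction from the table, not the thesis code; note that pooling layers have no real activation, so the ReLU entries on those rows are treated as belonging to the preceding convolutions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(6, (5, 5), activation='relu', input_shape=(227, 227, 1)),
    layers.MaxPooling2D((2, 2)),                   # 223x223x6 -> 111x111x6
    layers.Conv2D(16, (5, 5), activation='relu'),  # -> 107x107x16
    layers.MaxPooling2D((2, 2)),                   # -> 53x53x16
    layers.Flatten(),                              # -> 44 944
    layers.Dense(120, activation='relu'),
    layers.Dense(84, activation='relu'),
    layers.Dense(1, activation='sigmoid'),         # binary crack / non-crack output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```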


4.2 Experiment 2 – Filtering

A commonly used filter in the texture analysis field is the Gabor filter. The Gabor filter can find specific frequency content in a specified direction, which is especially useful when looking for texture patterns. Cracks, though, are not spread over a big area; in total they have many similar-looking textons but follow a stochastic, random path. Hence, the Gabor filter was implemented in four different directions - one vertical, one horizontal, and one on each diagonal - and the filtered images were simply added together. The resulting picture is illustrated next to an original in Figure 8. The filter was tested on the LeNet-5 inspired model, as illustrated in Table 5.

Figure 8: Example of Gabor filtering in four directions, left side original image, right side Gabor filtered image.
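A minimal OpenCV sketch of the four-direction Gabor filtering described above; the kernel parameters are assumptions, as the thesis does not state them:

```python
import cv2
import numpy as np

def gabor_four_directions(gray):
    """Filter a grayscale image with Gabor kernels at 0, 45, 90 and 135 degrees
    and sum the four responses, as described in Experiment 2."""
    summed = np.zeros(gray.shape, dtype=np.float32)
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0)
        summed += cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
    # Rescale to an 8-bit image for training/visualisation.
    return cv2.normalize(summed, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```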

4.3 Experiment 3 – Feature Extraction

Since the classification of cracks with convolutional models seems to show potential even with the simplest models, testing can begin with larger and more advanced classifiers. This can be done using feature extraction. Feature extraction is the use of a pre-trained model without the final neural network that divides the model's output into classes. This allows us to capture the many output signals from the network, which after flattening are usually referred to as the feature vector. This vector can then be used to train a smaller neural network specified for a new number of classes. The advantage of this method is that the user does not have to train the models, which often have tens of millions of parameters and over a million training images spread over a thousand classes. Hence, with the use of feature extraction, a model trained on immensely large amounts of data can be adapted to custom datasets and classes in minutes instead of weeks.


To implement the feature extraction method, a trained network is needed first. ResNet-50 was the first candidate and could easily be downloaded through the Keras API without its top layer. Without the top layer, the pre-trained ResNet-50 outputs an array of shape 7x7x2048. This array is flattened into a vector and sent through a smaller neural network. This smaller neural network can then be trained as a normal deep learning model, provided that the data fed to it has first gone through the pre-trained ResNet-50 model. The structure of the simple neural network is illustrated in Table 3.

Table 3: Simple neural network that processes the feature extraction vector.

Layer     | Out dim. | Activation
In Vector | 7*7*2048 | -
1 Dense   | 224      | ReLU
2 Dense   | 16       | ReLU
Out Dense | 2        | Softmax

This method, with the same simple neural network, was implemented with several different pre-trained models and compared using the data collections made for testing. All models were trained with SDNET2018, as it showed the best results in the LeNet-5 experiment. The models were trained with different training parameters to avoid overfitting and achieve the best possible results for each model on the test data collections.
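One way to realize this setup in Keras is sketched below (an assumed implementation of the method described, not the thesis code): the frozen ResNet-50 base produces the 7x7x2048 feature map, which is flattened and passed to the small dense network from Table 3.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # only the small dense net is trained

model = models.Sequential([
    base,
    layers.Flatten(),                       # 7x7x2048 feature map -> feature vector
    layers.Dense(224, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(2, activation='softmax'),  # crack / non-crack
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```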

4.4 Experiment 4 – Simulation of training data

If the hypothesis holds that the various objects appearing in the realistic images are a problem for the algorithm, it would mean that the feature extraction method cannot classify them as non-cracks or cracks. To fix this issue, the objects need to be introduced to the model in the form of training data. This created a new problem, as it turned out to be difficult to obtain good images of this type, so it was decided that the most effective way to get everything needed into the training data was to explore the possibilities of simulating it.

For the simulation, the 3D rendering program Blender was selected. The program enables the creation of three-dimensional objects such as windows, walls, and anything else that could be found on a facade. By using Blender's "Cycles" engine, the light can be ray traced to get as close to a realistic picture as possible. A simple overview of the work done in Blender is illustrated in Figure 9.

Figure 9: Block-scheme describing the work done in Blender.


4.4.1 Plaster facade texture

Since machine learning algorithms are particularly good at finding patterns, it is especially important that no strong patterns are introduced in the training data that could bias the algorithm towards them. Therefore, all textures were created using noise maps. This was done with Blender's shading tools. By using a noise texture map followed by a power function, the features of the noise texture map can be amplified; this can then be fed into a multiplication function to control the size of these features. The resulting texture map can be fed into a world-space texture displacement function that simulates height differences for realistic lighting.

Figure 10 illustrates how this was done using block-scheme code in Blender Shading.

Figure 10: Block scheme-coding of plaster facade in Blender.

The noise texture could then be adjusted to create a random pattern small enough to make the texture look like plaster facade, as illustrated in Figure 11.

Figure 11: Simulation of plaster facade through use of noise maps in Blender.
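The node chain in Figure 10 can also be set up through Blender's Python API; the following is a hypothetical sketch of the described noise-power-multiply chain (all node values are assumptions):

```python
import bpy

mat = bpy.data.materials.new("PlasterFacade")
mat.use_nodes = True
nodes, links = mat.node_tree.nodes, mat.node_tree.links

noise = nodes.new('ShaderNodeTexNoise')        # base noise texture map
noise.inputs['Scale'].default_value = 40.0     # assumed: small random pattern

power = nodes.new('ShaderNodeMath')            # amplifies the noise features
power.operation = 'POWER'
power.inputs[1].default_value = 2.0

size = nodes.new('ShaderNodeMath')             # controls feature size/height
size.operation = 'MULTIPLY'
size.inputs[1].default_value = 0.05

disp = nodes.new('ShaderNodeDisplacement')     # height differences for lighting

links.new(noise.outputs['Fac'], power.inputs[0])
links.new(power.outputs['Value'], size.inputs[0])
links.new(size.outputs['Value'], disp.inputs['Height'])
links.new(disp.outputs['Displacement'],
          nodes['Material Output'].inputs['Displacement'])
```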

4.4.2 Cracks

Of course, cracks may not occur in the form of regular patterns either, hence noise maps were used for the cracks as well. This time the noise texture map was fed into two subtraction functions that subtracted inversely to each other. With the right value subtracted from the bigger patches on the noise texture map, one could create something similar to valleys with a noticeable difference in displacement, as illustrated in Figure 12.

Figure 12: Result of two inverse subtractions from the texture noise map in Blender.

Then, with the help of a minimum function, the valleys are split from the rest of the noise texture map. This is followed by a multiplier, so the valley depth can be manipulated to an appropriate size by changing the multiplier value, as shown in Figure 13.

Figure 13: Close up of manipulated valley in Blender.

The crack texture could be combined with the earlier created plaster texture to form a full texture with both realistic cracks and realistic plaster. The finished plaster-crack texture is illustrated in Figure 15, and the block-scheme code for the crack texture is illustrated in Figure 14.


Figure 14: Block-scheme code of the simulated crack-texture in Blender.

Figure 15: Example plaster with cracks texture in Blender.

The block-scheme code of the facade with both cracks and plaster implemented is big and intricate; interested readers are referred to Appendix A for the full block-scheme.
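In the same spirit, a hypothetical node sketch of the crack texture in Figure 14: two inverse subtractions around a threshold, a minimum to isolate the valleys, and a multiply to set their depth (all values assumed):

```python
import bpy

mat = bpy.data.materials.new("CrackTexture")
mat.use_nodes = True
nodes, links = mat.node_tree.nodes, mat.node_tree.links

crack_noise = nodes.new('ShaderNodeTexNoise')
crack_noise.inputs['Scale'].default_value = 3.0   # large patches -> long cracks

sub_a = nodes.new('ShaderNodeMath')               # noise - threshold
sub_a.operation = 'SUBTRACT'
sub_a.inputs[1].default_value = 0.6

sub_b = nodes.new('ShaderNodeMath')               # threshold - noise (the inverse)
sub_b.operation = 'SUBTRACT'
sub_b.inputs[0].default_value = 0.6

valley = nodes.new('ShaderNodeMath')              # keeps only the narrow overlap
valley.operation = 'MINIMUM'

depth = nodes.new('ShaderNodeMath')               # manipulates the valley depth
depth.operation = 'MULTIPLY'
depth.inputs[1].default_value = 0.2

links.new(crack_noise.outputs['Fac'], sub_a.inputs[0])
links.new(crack_noise.outputs['Fac'], sub_b.inputs[1])
links.new(sub_a.outputs['Value'], valley.inputs[0])
links.new(sub_b.outputs['Value'], valley.inputs[1])
links.new(valley.outputs['Value'], depth.inputs[0])
# depth.outputs['Value'] would then feed the displacement chain of the plaster material.
```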

4.4.3 Simulating Objects

Objects such as downpipes, stucco, and ventilation were created entirely using the 3D tools available in Blender. Windows were created using boxes with window images on the front side; this gave the windows a bit of projection without the more complicated three-dimensional geometry of a real window.

After all objects were created in Blender, everything could be assembled on a wall with plaster-and-cracks texture. The wall could be rendered using a moving virtual camera to create many simulated facade images, with and without cracks, containing the various objects under realistic lighting. Examples of the objects with plaster texture are illustrated in Figure 16.


Figure 16: Examples of simulated windows, ventilation, drainpipe, and stucco in Blender.

Illustrations of how the facades were rendered, as well as some example images, can be found in Appendix B.


5 Results

To clearly separate the experimental results from the project's overall results, the results are divided into experimental results and overall scope results.

5.1 Experimental Results

This chapter presents the results of the tests conducted in the experiments explained in the chapter Experimental work – preparations.

5.1.1 Experiment 1 – LeNet-5

Table 4 shows the results achieved with the LeNet-5 inspired model fed with 227x227x1 images from the CCIC and SDNET2018 datasets.

Table 4: Results of various training data on the first LeNet-5 inspired model, tested on the test subsets.

Training data    | Simple | Easier realistic | Harder realistic
CCIC             | 60 %   | 65.4 %           | 46.1 %
SDNET2018        | 82.5 % | 73.1 %           | 42.3 %
CCIC & SDNET2018 | 60 %   | 69.2 %           | 42.3 %

5.1.2 Experiment 2 – Filtering

The Gabor filter was tested with the LeNet-5 inspired model, again using SDNET2018 as training data; as illustrated in Table 5, it was unable to make any kind of reliable classification.

Table 5: Results from LeNet-5 trained on Gabor-filtered data from SDNET2018.

Model   | Simple | Easier realistic | Harder realistic
LeNet-5 | 50 %   | 34.6 %           | 42.3 %

5.1.3 Experiment 3 – Feature Extraction

The results of the different tested models are illustrated in Table 6. The results clearly show how ResNet-50 outperforms all other models. Neither NASNetLarge nor InceptionResNetV2 could be trained above 50 % accuracy on the training and validation sets.

Table 6: The results of different models on the different test data.

Model             | Simple | Easier realistic | Harder realistic
ResNet-50         | 93.7 % | 84.6 %           | 76.9 %
NASNetMobile      | 37.5 % | 14.5 %           | 6.25 %
NASNetLarge       | -      | -                | -
DenseNet201       | 81.2 % | 50 %             | 50 %
InceptionResNetV2 | -      | -                | -


Table 7 shows the results of the analysis done on DenseNet201 when investigating whether its classification net was insufficient.

Table 7: Results of DenseNet201 with the new extended simple dense net.

Model       | Simple | Easier realistic | Harder realistic
DenseNet201 | 91.6 % | 50 %             | 50 %

5.1.3.1 Testing the texture

With the texture of cracks and non-cracks on realistic plaster, a dataset was rendered and passed through the feature extraction method with ResNet-50, as it had shown the best overall results, with much better repeatability than all other tested models. Hence ResNet-50 was the main model chosen to continue working with. The model was trained with the simulated data, and with a mixture of SDNET2018 and the simulated data. The different trained models were then tested on the same test data as in the previous Feature Extraction chapter. The results are illustrated in Table 8.

Table 8: The ResNet-50 feature extraction method with various training data, tested on the test data collections; the bottom row repeats the ResNet-50 result from Table 6.

Training data         | Simple | Easier realistic | Harder realistic
Simulated dataset     | 77.5 % | 57.7 %           | 26.9 %
Simulated & SDNET2018 | 100 %  | 80.8 %           | 57.7 %
SDNET2018             | 93.7 % | 84.6 %           | 76.9 %

5.1.3.2 Testing the whole simulated dataset

Table 9 shows the results of the fully simulated dataset with objects, as well as a mix of the simulated dataset with SDNET2018. As shown, there is a significant increase in accuracy on the easier realistic test set, about 15 percentage points, between the simulated dataset with objects and the simulated dataset without objects, both mixed with SDNET2018. Likewise, on the harder realistic test set, a jump of 19.2 percentage points was achieved between Simulated & SDNET2018 with objects and without objects.

Table 9: The ResNet-50 feature extraction method with various training data, tested on the test data collections; the bottom row repeats the ResNet-50 result from Table 6.

Training data                     | Simple | Easier realistic | Harder realistic
Simulated dataset                 | 82.5 % | 57.7 %           | 35 %
Simulated dataset (fewer batches) | 50 %   | 65 %             | 30.8 %
Simulated & SDNET2018             | 100 %  | 96.2 %           | 76.9 %
SDNET2018                         | 93.7 % | 84.6 %           | 76.9 %


5.2 Overall Scope Results

The work started with the collection of testing and training data, which was easily acquired through Google and open source datasets from different universities. The experimental work in this report started with the most simplistic convolutional neural network model, which with the right training data could classify some simple real-world cracks. The good results from the first CNN led to the exploration of deeper models such as ResNet-50 and DenseNet201. The deeper models were, with the help of feature extraction and some simple training data, able to classify cracks even better than the simplest models. For more accurate models that could distinguish a crack from multiple other objects, a simulated training dataset with facade objects was created to make the algorithm broader. Adding the simulated dataset to the original simple training dataset made the advanced models such as ResNet-50 even better, which supports the idea that simulated data might work as a complement to a real-world dataset, although the simulated dataset alone did not achieve sufficient results on the realistic test sets to serve as a training dataset on its own.


6 Analysis of the Experimental Results

This chapter describes the analysis done on the different experiments and methods, as well as an analysis of some models' intermediate layers.

6.1 Experiment 1 – LeNet-5

The results of the first LeNet-5 inspired model with SDNET2018 are quite good, considering that the model was made for 28x28x1 images but was fed 227x227x1 images in this project. Looking at the images originally intended for LeNet-5, hand-written numbers from the MNIST dataset, one might argue that the numbers' outlines are somewhat similar to cracks: random, but to some extent following a regular pattern, much like our cracks.

As we see in Table 4, the result is not particularly good considering that the model trained on images from the same dataset as the simple test did not get more than 82.5 % correct. The results in Table 4 also give some indication of the problem of interference from other objects: the more difficult data collections have more distracting objects in the images, objects the model has not encountered in training, making them difficult for the model to classify.

The results from the first model are quite mixed, and the only training set that follows our intuition - classifying the simplest test set with the highest accuracy and the hardest test set with the lowest - seems to be SDNET2018. The mix of both datasets should at least perform better on the simple test data than CCIC alone; hence, the CCIC training set might have made the model train towards features unwanted in our model.

SDNET2018 alone does not seem to be enough for precise classification on the harder realistic dataset, which contains different objects; this is a problem due to the high occurrence of these objects on real facades.

6.2 Experiment 2 – Filtering

The Gabor filter was a good candidate, since it has the ability to filter out everything but the cracks, but since it is really hard to make the Gabor filter light-invariant, it had to be abandoned for this project. Ways of making filters light-invariant exist, but they are not yet common practice functions in the popular APIs, such as Keras, used in this project.

The Gabor-filtered dataset was used to train the same LeNet-5 inspired model as previously. As the results in Table 5 show, there was no consistency in classifying anything correctly. The Gabor filter was overly sensitive, and even the slightest difference in overall lighting would black out the entire picture.

6.3 Experiment 3 – Feature Extraction

As illustrated in Table 6, ResNet-50 gives results that match the intuition, mentioned for the LeNet-5 inspired model, of how well a model should be able to classify these three data sets. NASNetMobile was for some reason trained in the wrong direction and produced very odd results: inverted, they would have been the best results achieved, ranking the test sets from best to worst accuracy. This might point towards the model having trained towards something other than the cracks in the training images.

NASNetLarge as well as InceptionResNetV2 failed to train to a functional model, never managing to get above 50 % accuracy during training. DenseNet201 did not get any particularly good results other than on the simple test images, which originate from the same dataset as the training data.

6.3.1 Testing the simulated texture

As shown in Table 8, the simulated dataset alone is enough to make proper classifications on the simple test set. This is a good result, since there are big differences between the cracks of the two datasets: the simple test set, which originates from SDNET2018, consists of similar and quite straight cracks, while the simulated dataset has cracks that mostly bend much more and are a bit longer than the cracks in SDNET2018. Hence, the result on the simple test data might indicate that the simulated dataset alone makes the model somewhat generalized towards classifying cracks. And, as predicted, the simulated dataset does not perform very well on the harder realistic dataset, as it does not contain any objects in the training data.

6.3.2 Testing the whole simulated dataset

As illustrated in Table 9, the simulated dataset with objects does not produce as accurate classifications as the simulated dataset without objects. A test was done with fewer batches per epoch to check whether the model was overfitting; the result was that the model could not classify the simple dataset, classified the easier realistic dataset better, and classified the harder realistic dataset worse. The model trained with fewer batches is most likely not trained enough to make a proper classifier, and its results should not be used to draw any significant conclusions. As with the previous feature extraction tests, the simulated data with SDNET2018 outperforms the other datasets here as well; they seem to balance each other out to our advantage when mixed together as a training dataset.

6.4 More on Feature Extraction

Looking at the simulated dataset alone, it seems to have underperformed quite a bit. There is data supporting the idea that the original SDNET2018 dataset with the simulated dataset performed better on the easier realistic images. Interestingly, the mixed dataset performed equally well on the harder realistic dataset as SDNET2018 alone, but because of the relatively small size of the realistic test sets, this might just be random chance working in favour of SDNET2018, since the mixed dataset did noticeably better on the easier realistic test set and the simple test set. The simulated set and SDNET2018 seem to be a good fit together, as they even out the algorithm into a better classifier over all test sets compared to the simulated dataset alone.

Because none of the models other than ResNet-50 and DenseNet201 gave any usable results, a deeper analysis was conducted on DenseNet201. The results were compared with ResNet-50 in the hope of understanding why DenseNet201 could not classify more than the simple test set, and why the difference between the models' results is so big. To do this, one can look at how they are designed and how the images are processed inside them. The two NASNet models are not hand-designed models; they have a quite complicated structure of cells, each cell itself weighted by reinforced machine learning. This method creates a complex structure with an incredible number of parameters. Perhaps advanced nets like NASNet and InceptionResNetV2 are too complicated to give out features specifically dividing the very simple structure of cracks from the much broader class of non-cracks, which can contain anything from windows to vines. One has to realise that these models are made for 1,000 classes, and the amount of information coming out of the feature extractor is vast, perhaps too much for classifying just two classes.

An investigation was then done inside the two best models, ResNet-50 and DenseNet201.

ResNet-50 is a deep model, but it follows a simpler design, closer to the standard convolutional network concept. The ResNet block has element-wise addition of the previous feature map: the feature map that goes through processing is added to its copy from before processing, which the creators propose increases gradient propagation [18]. This is a much less advanced approach than DenseNet's dense block, which feeds all preceding feature maps as input to the next function in the block [19].

Looking inside the two models, there are some indications that DenseNet201 does bring out the cracks sufficiently, even deep in the model. This can be seen in Figure 17 by comparing a crack image with a non-crack image after 610 layers of DenseNet201, which is quite close to the end, much like the end layers of the ResNet-50 model shown in Figure 18. This holds only under the assumption that the visible crack in the images is of value to the classifier, since the inner workings of the weights that process the feature vector in the final dense nets are unknown.

Figure 17: Subset of processed images of a crack (left) and a non-crack (right), 610 layers deep into DenseNet201.

As there is no direct visual evidence that DenseNet201 could not process the images sufficiently for later feature extraction, it was tested whether the dense net handling the feature-extracted vector was too small to process the data coming out of the net. A new test was set up with the classic structure of two same-sized dense layers followed by a smaller dense layer into a classification layer, as illustrated in Table 10. The model trained to a much higher accuracy than with the previous classifier net, but the results improved only by a fraction on the simple dataset, as illustrated in Table 7, and the model still does not generalize enough to make classifications on the realistic datasets.


Table 10: Extended simple dense net for feature extraction classification.

Layer     | Out dim. | Activation
In Vector | 7*7*2048 | -
1 Dense   | 448      | ReLU
2 Dense   | 448      | ReLU
3 Dense   | 32       | ReLU
Out Dense | 2        | Softmax

Since ResNet-50 has a design similar to a popular model called YOLOv3, it might be worth moving on to testing YOLOv3, as it has a few properties of big value in this type of classification problem; it is explained in further detail in the Future work chapter.

Figure 18: Subset of processed images of a non-crack (left) and a crack (right), 100 layers deep into ResNet-50.


7 Future work

The work in this project led to some interesting follow-up questions about how one model in particular could be implemented, theoretically with quite good results. This chapter talks a bit about this model, as well as the work that must be done before a real-world application can be made.

7.1 YOLO

The work done so far suggests that the YOLO model is the next logical step and a good candidate for this classification problem. The YOLO model is one of the most sophisticated image recognition algorithms: it can not only classify an object, it can classify multiple objects in an image as well as box them, telling the user where each object is recognised in the image. This would be an effective model for this type of problem, since many classes can be created containing the different objects with specific features found on facades. This would even open up the possibility of removing these objects before running a crack/non-crack classifier, if the objects still confuse the algorithm.

There is also the possibility of initializing YOLOv3 with weights from other models; this could be used to advantage, as it might be possible to combine it with the feature extraction method, which most likely can be trained to greater accuracies with proper training data.

7.2 Drone application

For this project to become reality, there must be an equally big project on implementing the algorithm on a drone. Fitting an algorithm of this kind on a drone raises many more questions, and a lot of things have to be taken into consideration, such as camera quality, camera stability, battery lifetime, air time, spatial awareness, embedded system hardware, and perhaps cloud computing, which makes network stability and data encryption issues as well. Much of the research in such a project will most likely concern how the algorithm is to be computed: with a small onboard GPU, or maybe with a network board that just sends the images to a local laptop or a cloud server. These questions need to be carefully considered due to the most likely short flight time, which needs to be preserved as much as possible for effective crack searching.


8 Discussion

The thesis asks, "Is it possible to create a deep learning algorithm to classify cracks on facades?" This question can take many different paths, and in this project the path of deep CNNs was explored. Within this, the subproblem of the lack of proper data became an explorative area, looking into the possibility of making 3D renders of reality to create usable training data. In this project the simulated data was not enough to produce reliable classification by itself, but with a good simulation this could become a possibility. The results of this project suggest that simulated data can at least be used as a complement to real-world data. The practice of augmenting images for a broader dataset is already commonplace, and maybe purely simulated data is next in line.

The simulation could be implemented better with the right tools and experience. The simulated dataset used in this project had some crack features that do not necessarily resemble a real crack: since the noise maps are built from noise, smaller patches can appear where they come together, which creates patches with a lot of cracks.

The models might have produced even better results if the training datasets had been captured by, for example, a drone, since a big real-world dataset that could be handpicked for the perfect training data would be the absolute best for training a model for real-world application. Still, simulating data was an interesting field to explore, since there is not much documented work on it.

8.1 Alternative solutions

The datasets used in the project are the foundation of the whole project; the concrete datasets were especially useful, and the simulated data was as well. But if a state-of-the-art algorithm were to be created, a real dataset would have to be established, not only of cracks but of every object that can be found on a facade; why this is needed is explained further in the next chapter.

The images could easily be captured by filming facades using any popular drone with a good enough camera; the data could then be processed by hand and sorted into appropriate classes. This is a very time-consuming task, but well worth it if a state-of-the-art algorithm is in the making.

The possibility of training an advanced model like ResNet, DenseNet, or even YOLO from scratch is quite small, as these networks use millions of images for training and have millions of parameters to tune; this is most often a task for a supercomputer. But there are smaller datasets that can be used to train these models, and they could be extended with datasets of cracks as well, creating a new class that can be implemented in the model. This could be an easy way of creating a broadly trained algorithm that can also classify cracks. Having a more broadly trained model is probably the way to go when lots of objects that are not cracks can appear, even though the algorithm is not trained on those objects.

8.2 Social and ethical aspects

The algorithms themselves do not raise many social and ethical questions, but such questions might arise in future work if the algorithms are implemented on a drone that captures the facade via a camera.

The social and ethical problems that may occur here are mostly the same as those now arising from the more popular and cheap drones and people's privacy. With the application filming facades, people risk being captured on camera, and those are some big problems that will have to be dealt with. A possible solution might be the method of using a YOLO model as an object remover, as discussed in the Future work chapter, where the possibility is raised of using YOLO to classify objects such as windows and, with YOLO's boxing features, removing them from the camera feed. This is not a perfect fix, though, since the camera still captures everything, and if the drone uses cloud computing, the image flow might be accessible to others not authorized for access.

As mentioned, this is already becoming a problem from both a social and an ethical standpoint. Countries are already implementing forms of drone driver's licences and identification to inform people about these issues and to highlight the importance of people's privacy, due to the easy access to private camera drones that can infringe on people's privacy.


9 Conclusion

To conclude this work: training datasets for the specific purpose of facades do not yet seem to exist, which led to the use of datasets containing only cracks. The deep learning models used during this work produced some interesting results that point towards the possibility of classifying facade cracks with the help of convolutional neural networks. The project also shows how simulated training data can help when the original training set lacks some of the important objects that a perfect training set would have. The simulated training data created in this project, mixed with the original training set, did to some extent enhance the model's accuracy on two out of three test sets. The work also highlights some good choices of methods if this work is to be continued in the future.


10 References

[1] J. McCarthy and E. Feigenbaum, "In Memoriam: Arthur Samuel, Pioneer in Machine Learning," AI Magazine, 1990.

[2] A. N. Bhuva et al., "A Multicenter, Scan-Rescan, Human and Machine Learning CMR Study to Test Generalizability and Precision in Imaging Biomarker Analysis," American Heart Association, 2019.

[3] J. Martindale, "Nvidia RTX DLSS: Everything you need to know," Digital Trends, 14 Feb. 2020.

[4] J. S. Cope, P. Remagnino, S. Barman and P. Wilkin, "Plant Texture Classification Using Gabor Co-Occurrences," in Advances in Visual Computing, Springer, 2010, pp. 669-677.

[5] F. H. C. Tivive and A. Bouzerdoum, "Texture Classification using Convolutional Neural Networks," in Faculty of Informatics - Papers, 10.1109/TENCON.2006.343944, 2006, pp. 1-4.

[6] L. K. Ping, "A Deep Learning Model for Automatic Image Texture Classification: Application to Vision-based Automatic Aircraft Landing," Queensland University of Technology, 2016.

[7] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach (2nd ed.), Pearson, USA, 2012.

[8] L. Liu, J. Chen, P. Fieguth, G. Zhao, R. Chellappa and M. Pietikäinen, "From BoW to CNN: Two Decades of Texture Representation for Texture Classification," International Journal of Computer Vision, 2019.

[9] R. Geirhos, P. Rubisch et al., "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness," ICLR, 2019.

[10] L. A. Gatys, A. S. Ecker and M. Bethge, "Texture synthesis using convolutional neural networks," in Advances in Neural Information Processing Systems, 2015, pp. 262-270.

[11] W. Brendel and M. Bethge, "Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet," in International Conference on Learning Representations, 2019.

[12] M. Abadi et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems," Google Research, http://www.tensorflow.org, 2015.

[13] F. Chollet et al., "Keras," http://keras.io, 2015.

[14] Blender Foundation, "Blender," http://www.blender.org, 1994.

[15] M. Maguire, S. Dorafshan and R. J. Thomas, "SDNET2018: A concrete crack image dataset for machine learning applications," Utah State University, http://www.digitalcommons.usu.edu/all_datasets/48, 2018.

[16] Ç. F. Özgenel and A. Gönenç Sorguç, "Performance Comparison of Pretrained Convolutional Neural Networks on Crack Detection in Buildings," ISARC, Berlin, 2018.

[17] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel, "Backpropagation Applied to Handwritten Zip Code Recognition," MIT, 1989, pp. 541-551.

[18] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," Microsoft Research, 2015.

[19] G. Huang, Z. Liu, L. van der Maaten and K. Q. Weinberger, "Densely Connected Convolutional Networks," CVPR, 2017.


Appendix A: Block-scheme, plaster with cracks


Appendix B: Blender facade

(Scene overview with labels: camera, camera path, drainpipe, ventilation, stucco.)


Appendix C: Blender Render Examples

