Multi-Task Convolutional Learning for Flame Characterization

Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Statistics and Machine Learning

2020 | LIU-IDA/STAT-A--20/031–SE

Multi-Task Convolutional Learning for Flame Characterization

Multi Task Convolutional inlärning för flammekarakterisering

Obaid Ur Rehman

Supervisor : Josef Wilzen Examiner : Krzysztof Bartoszek



Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

This thesis explores multi-task learning for combustion flame characterization, i.e. learning different characteristics of a combustion flame. We propose a multi-task convolutional neural network for two tasks, PFR (pilot fuel ratio) classification and fuel type classification, based on images of stable combustion. We utilize transfer learning and adapt VGG16 to develop a multi-task convolutional neural network that jointly learns the aforementioned tasks. We also compare the performance of individual CNN models for the two tasks with the multi-task CNN, which learns these two tasks jointly by sharing visual knowledge among the tasks. We show the effectiveness of our proposed approach on a private company's dataset. To the best of our knowledge, this is the first work on jointly learning different characteristics of a combustion flame.


Acknowledgments

First, I thank Allah for paving my paths when I saw no paths, for lifting me up when I was down, and for good health throughout this Masters Program. Secondly, I would like to thank my supervisor, Josef Wilzen, for guiding me throughout this thesis; his constructive criticism and guidance were very uplifting for me. I acknowledge and thank my industry supervisors, Rickard Magnusson and Annika Lindholm, for guidance on combustion systems and extensive sessions on industry turbines and combustion systems, which enabled me to understand the problem better. I also thank my manager at Siemens, Saeid Kharazmi, for seeing potential in me and for giving me a chance to work on this thesis. My opponent's (Pedram Kasebzadeh) and examiner's (Krzysztof Bartoszek) timely feedback on my thesis report helped me produce good quality content and I am thankful to both of them.

Lastly, my deep and sincere gratitude to my family for their continuous and unparalleled love, help and support.


Contents

Abstract iii

Acknowledgments iv

Contents v

List of Figures vi

List of Tables vii

1 Introduction 1
1.1 Siemens Turbomachinery AB . . . 1
1.2 Background . . . 1
1.3 Motivation . . . 1
1.4 Objective . . . 3
1.5 Delimitations . . . 4
2 Theory 5
2.1 Feed-Forward Neural Networks . . . 5

2.2 Convolutional Neural Networks . . . 9

2.3 Multi Task Learning . . . 12

2.4 Transfer Learning . . . 13

2.5 Related Studies . . . 13

3 Method 15
3.1 Data . . . 15

3.2 Pre-processing & Task Distributions . . . 16

3.3 Proposed Approach . . . 16

4 Results 20

5 Discussion 23

6 Conclusion 25


List of Figures

1.1 Combustion System of an industrial turbine SGT-800 having 3rd generation DLE burner. . . . 2

1.2 Combustion flames with different PFR. a, b, and c show flames with PFR of nearly 1%, 5-6%, and 30-40%, respectively. . . . 3

1.3 Combustion flames under different fuels using one 3rd generation DLE burner at an atmospheric test rig. a) and f) are natural gas flames, b) and d) are hydrogen flames enriched with natural gas, c) is a flame produced using water premixed with other fuel, and e) is an ethylene flame. . . . 3

2.1 A perceptron with three inputs . . . 5

2.2 A neural network with three layers . . . 6

2.3 Graph for ReLU activation function . . . 7

2.4 3D convolution in an image of size 6x6x3 . . . 10

2.5 Max and average pooling operation . . . 11

2.6 A common CNN architecture for classification . . . 12

2.7 Different variants of Multi-task learning . . . 13


List of Tables

3.1 Distributions for PFR and Fuel Type . . . 16

3.2 PFR Distribution . . . 16

3.3 Fuel Type Distribution . . . 16

4.1 Performance comparison for both tasks using individual CNN and multi-task CNN . . . 20
4.2 Test data confusion matrix for PFR . . . 20

4.3 Test data confusion matrix for Fuel Type . . . 21

4.4 Test data confusion matrix for PFR using MTL . . . 21


Chapter 1

Introduction

This chapter introduces the background, motivation, and aim of this thesis work. Characterization of combustion flames of gas turbines is introduced along with potential research questions to be discussed in detail.

1.1

Siemens Turbomachinery AB

This thesis work is done in collaboration with Siemens Industrial Turbomachinery AB. Siemens Industrial Turbomachinery AB, based in Finspång in northern Östergötland, supplies power plants and turbines with high efficiency and low emission levels.

1.2

Background

Gas turbines are used all over the world to generate power that drives different industries. A combustion system is an integral part of a gas turbine and is responsible for producing power by burning fuel premixed with air. The conditions under which the combustion system operates vary with the power-generating capability of the turbine, and combustion systems work under high temperature and pressure. Figure 1.1 shows a Siemens industrial gas turbine engine (SGT-800) with the 3rd generation DLE (Dry Low Emission) burner used for producing the combustion flame. The availability of these turbines is a major concern for any industry as it directly affects the customers' revenues; therefore, the maintenance of these turbines has become increasingly important for the manufacturers to ensure customer satisfaction. One important part of monitoring the combustion process is the characterization of the combustion flame. The combustion flames need to stabilize to ensure flawless combustion and hence high throughput of the turbine system. The flame stability depends on several factors, e.g. fuel distribution, temperature, pressure, and ambient factors. In this thesis, we propose an approach, using deep learning, to characterize, i.e. to learn, the flame characteristics during combustion.

1.3

Motivation

The combustion system is one of the most important and expensive parts of a gas turbine, and it is critical to monitor its health to prevent unplanned maintenance, which is usually very costly, and to ensure the turbine keeps operating normally. This thesis focuses on monitoring combustion flames by learning the characteristics of the combustion flame. We have focused on two main characteristics of the flame:

• PFR, which stands for pilot fuel ratio, is the share of fuel flowing through the pilot burners.
• Fuel type


Figure 1.1: Combustion System of an industrial turbine SGT-800 having 3rd generation DLE burner.

Unstable combustion can introduce abnormalities, which may affect the performance of other components in a gas turbine or result in higher concentrations of CO and NOx emissions. CO denotes carbon monoxide emissions and NOx the mixture of NO2 and NO emissions, which are harmful to the environment. A stable flame prevents instability in combustion, which may otherwise lead to a failure in the process or a loss of efficiency, and it is very time-consuming to open turbines to fix faulty combustors, resulting in weeks of downtime and high costs for the industrial plant. Therefore, for almost all industrial gas turbines, it is critical to monitor combustion in the combustion system so that if any irregularity occurs, necessary actions can be taken to mitigate downtime and high maintenance costs. Currently, experts monitor the combustion flames using different measures, e.g. combustion pressure oscillations and temperature and pressure sensor values. This thesis aims to present reproducible research that can confirm the viability of deep learning methodologies for monitoring combustion flames without involving the aforementioned measures. This thesis may inspire Siemens to add cameras in the turbine to monitor combustion flames using the methods that we propose.

There are several monitoring frameworks for combustors; however, they do not show promising results. Several are knowledge/rule-based methods, for example [25] and [8], which are also very complex and require a deep understanding of combustor dynamics to develop the rules. In this thesis, we propose a new approach using deep learning networks


Figure 1.2: Combustion flames with different PFR. a, b, and c show flames with PFR of nearly 1%, 5-6%, and 30-40%, respectively.

Figure 1.3: Combustion flames under different fuels using one 3rd generation DLE burner at an atmospheric test rig. a) and f) are natural gas flames, b) and d) are hydrogen flames enriched with natural gas, c) is a flame produced using water premixed with other fuel, and e) is an ethylene flame.

that can utilize the videos and images of combustion flames to learn the characteristics of a combustion flame. Our approach does not involve any rule formulation; it uses images of combustion flames gathered during testing of the combustion systems with different fuel types and at different PFRs. The results we have obtained are very promising.

1.4

Objective

Deep learning has attracted much research interest in the last decade, as deep learning models have been shown to outperform competing machine learning models in different domains, especially for image, speech, and text data. Not much research has been carried out on monitoring the combustion process using videos or images, and we therefore hope to contribute to this domain with our research work. The PFR plays an important role in stabilizing the flame inside the combustion chamber and needs to be controlled manually by the experts; sometimes, however, the PFR provided by the experts as input increases or decreases due to irregularities inside the turbine, e.g. valve malfunction, ambient variance, etc. Therefore, experts should be able to accurately classify the current PFR inside the combustion chamber. The combustion system burns the fuel, mixed with air, to produce power.


A change of fuel can induce irregularities in the turbine, as a specific turbine, when sold, is programmed to work on a particular fuel. In the past, Siemens has reported cases where turbines were connected to a fuel production unit for fuel supply, and changes in the fuel production unit produced an irregular fuel distribution which the turbines were not programmed for, resulting in malfunctions. Therefore, the classification of fuel using combustion flame videos or images is also desirable for experts.

In this thesis, we propose another approach for learning combustion flame characteristics using surveillance videos of stable combustion processes. We use multi-task convolutional neural networks for statistical inference to learn flame characteristics, i.e. the PFR and the fuel type being used, from images extracted from videos of stable combustion, and then use images of combustion flames as input to predict the PFR and the fuel type. We intend to develop a deep learning architecture that can learn from images of the stable flames produced by the combustion system under different conditions, so that when an image of a combustion flame is provided to the model as input, it is able to classify PFR and fuel type. Much research has shown convolutional neural networks (CNNs) to be very effective in image classification tasks, and several state-of-the-art models for image classification and segmentation use CNNs or their variants. [19] uses a CNN for classification of breast tumors into benign and malignant. [32] explores different strategies for event detection in videos using a CNN trained for image classification. However, it has also been shown that learning multiple related tasks jointly can further increase the efficiency of the model. [1] uses a multi-task CNN for learning multiple attributes for better classification and has goals similar to this thesis. In this thesis, we also benchmark the performance of individual CNNs for both tasks against a multi-task CNN which learns both tasks jointly. This thesis intends to answer the following research questions:

1. How do convolutional neural networks perform when classifying PFR and fuel type individually, based on images of the combustion flame?

2. How do multi-task convolutional neural networks perform when jointly classifying PFR and fuel type, based on images of the combustion flame?

1.5

Delimitations

The research reported in this thesis was done using data generated by a specific test turbine setup under specific ambient conditions. This work is an attempt to prove the viability of deep learning technologies as a good fit for learning combustion flame characteristics.


Chapter 2

Theory

In this chapter, we give a brief introduction to convolutional neural networks (CNNs), transfer learning, multi-task learning (MTL), and related studies, which are employed in our approach, followed by the methods we utilized to achieve our aims.

2.1

Feed-Forward Neural Networks

Neural networks are a set of algorithms designed to recognize numerical patterns. The build-ing blocks of a neural network are called neurons.

Figure 2.1: A perceptron with three inputs

Figure 2.1 shows a neuron with three inputs and a single output. The weights w1, w2, and w3 are introduced to weigh the effect of each input. The output is passed through an activation function, which is used to introduce non-linearity, as discussed later in this section. A neuron makes a decision by weighing up the evidence; equation 2.1 expresses the neuron output as an activated weighted sum of its inputs:

ŷ = σ(Σ_j w_j x_j) (2.1)

Neural networks are made up of such neurons arranged in layers, with each layer having a different number of neurons. Figure 2.2 shows a neural network with three hidden layers. In neural networks, after the summation of weighted inputs for a neuron, the activation is applied. +1 is the bias node that is added to the weighted sum of the neurons. The bias is critical in finding a better fit for the data, as it helps shift the activation left or right. The activation functions are explained below. These neurons, collectively, can provide accurate answers to complex problems. In this network, each neuron is connected to the neurons in the layers before and after it, except the neurons in the first layer, which are connected to the input and to the neurons in the second layer. The arrows represent the weights and are usually noted as W. In figure 2.2, the weight matrix of the first layer is 3x4 and that of the second layer is 1x4. If there are m neurons in a layer j and n neurons in layer j+1, then the weights have n x (m+1) dimensions. For a layer j, each row in the weight matrix represents the weights for each input coming from layer j−1. In figure 2.2, a_n^(L) represents the activated output of the n-th neuron in layer L.


Figure 2.2: A neural network with three layers

Additionally, in feed-forward neural networks there are no feedback connections, i.e. no cyclic connections between nodes. A general equation for computing a neuron is:

a_n^(L) = σ( Σ_m W_nm^L [ ⋯ σ( Σ_j W_kj^2 σ( Σ_i W_ji^1 x_i + b_j^1 ) + b_k^2 ) ⋯ ] + b_n^L ) (2.2)

Activation Function

An activation function is attached to each neuron in the network and determines whether the neuron should be activated ("fired") or not, based on whether the neuron's input is relevant for the model's prediction. One important use of activation functions is to introduce non-linearity in the output of a neuron, because otherwise the network would just be a linear function. Activation functions perform non-linear transformations of the input, which enables the network to learn complex tasks. Activation functions also help normalize the output of each neuron to a range between 0 and 1 or between -1 and 1, depending on the type of activation function. Different activation functions are suitable for different tasks, e.g. softmax, sigmoid, tanh, etc. The activation function in a neural network is usually represented as σ(·). Below we briefly explain the ReLU and softmax activation functions.


Rectified Linear Unit

The rectified linear unit (ReLU) is one of the most commonly used activation functions. Mathematically, it is expressed as:

R(z) =max(0, z) (2.3)

Figure 2.3 shows the graph of ReLU. Equation 2.3 shows that ReLU is linear for positive values and zero for negative values, which makes it computationally efficient and helps the network converge faster. There are a few variants of ReLU, e.g. Leaky ReLU and Parametric ReLU; however, these variants are not in the scope of this thesis.

Figure 2.3: Graph for ReLU activation function

Softmax

Softmax is a function that takes as input a vector of K real numbers and normalizes it into a probability distribution consisting of K probabilities proportional to the exponentials of the input numbers. That is, prior to applying softmax, some vector components could be negative or greater than one, and they might not sum to 1; after applying softmax, each component will be in the interval (0,1) and the components will add up to 1, so that they can be interpreted as probabilities [28]. The softmax function is expressed as:

σ(z)_i = e^(z_i) / Σ_{j=1}^{K} e^(z_j),  for i = 1, 2, ..., K (2.4)
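The two activation functions above can be sketched in a few lines of plain Python (a minimal, illustrative implementation, not code from the thesis):

```python
import math

def relu(z):
    # Equation 2.3: R(z) = max(0, z).
    return max(0.0, z)

def softmax(z):
    # Equation 2.4: sigma(z)_i = exp(z_i) / sum_j exp(z_j).
    # Subtracting max(z) first is a standard numerical-stability trick;
    # it does not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0))            # 0.0
print(relu(3.5))             # 3.5
probs = softmax([1.0, 2.0, 3.0])
print(round(sum(probs), 6))  # 1.0
```

Note how the softmax outputs preserve the ordering of the inputs while summing to one, which is what allows them to be read as class probabilities.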

Gradient Descent and Back-propagation

Gradient descent is an algorithm used to minimize an objective function; it updates the weights of a neural network so as to minimize the error function associated with the network. This error function is also known as the loss or cost function, discussed in the Loss functions subsection below.

Let Q(w) be the error function of a network with weights w. Q(w) is the sum of the individual error functions: if there are n observations with targets {x_i, t_i}, i = 1, 2, ..., n, then

Q(w) = Σ_{i=1}^{n} Q_i(w) (2.5)


where Q_i(w) is the error for the i-th observation. In stochastic gradient descent, at each iteration only a single observation is used to compute the gradient and update the weights, so there are n such updates in each training epoch.

At each iteration, gradient descent updates the weights in the negative direction of the gradient ∇Q(w). The weights are updated as follows:

w = w − η∇Q(w) (2.6)

where η is the learning rate. The learning rate is a hyper-parameter that controls how much the model changes in response to the estimated error each time the model weights are updated.
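The update rule in equation 2.6 can be illustrated on a simple one-dimensional objective. The quadratic Q(w) = (w − 3)² below is a made-up example chosen so the minimizer is known:

```python
def gradient(w):
    # For Q(w) = (w - 3)^2 the gradient is dQ/dw = 2 * (w - 3).
    return 2.0 * (w - 3.0)

def gradient_descent(w, eta, steps):
    # Equation 2.6: w = w - eta * grad Q(w), repeated for a number of steps.
    for _ in range(steps):
        w = w - eta * gradient(w)
    return w

w_final = gradient_descent(w=0.0, eta=0.1, steps=100)
print(round(w_final, 4))  # converges close to the minimizer w = 3
```

With η = 0.1 each step shrinks the distance to the minimum by a factor of 0.8; too large an η would instead make the iterates diverge, which is why the learning rate must be tuned.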

Now we look at how these gradients are calculated in neural networks. Back-propagation is the essential algorithm for training a neural network; it adjusts the weights based on the error/loss at each iteration/epoch. Earlier in this section, we discussed how the calculations are made inside a neural network: a neuron j, with weights w_ji containing one weight for each of its inputs y_i, calculates the weighted sum

x_j = Σ_i y_i w_ji (2.7)

and this output is passed through an activation function, here assumed to be the sigmoid:

y_j = 1 / (1 + e^(−x_j)) (2.8)

[20] measures the performance as the squared loss summed over all input-output pairs and output units:

E = ½ Σ_c Σ_j (y_{j,c} − d_{j,c})² (2.9)

where c is an index over input-output pairs, j is an index over output units, y is the actual output and d is the desired output. To minimize E by gradient descent, the partial derivatives of E with respect to each weight in the network are calculated. The partial derivative ∂E/∂y_j is calculated as:

∂E/∂y_j = y_j − d_j (2.10)

One important part of calculating the partial derivatives is the chain rule. Next, the partial derivative of E with respect to the input of an output unit is calculated; this derivative represents how a change in the total input affects the error. Using the derivative of the sigmoid, ∂y_j/∂x_j = y_j(1 − y_j):

∂E/∂x_j = (∂E/∂y_j)(∂y_j/∂x_j) = (∂E/∂y_j) y_j(1 − y_j) (2.11)

The input x_j is a linear function of the states of the lower-level units and the weights of the connections. How the error is affected when the weights change can be computed as:

∂E/∂w_ji = (∂E/∂x_j)(∂x_j/∂w_ji) = (∂E/∂x_j) y_i (2.12)

Taking into account all connections of unit i, the contribution to the error from unit i is computed as:

∂E/∂y_i = Σ_j (∂E/∂x_j) w_ji (2.13)

And as the next step, gradient descent uses the gradients to optimize the weights during training of the neural network.
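The chain of equations 2.7–2.13, together with the update rule 2.6, can be traced for a single sigmoid output neuron. The inputs, weights, and target below are invented purely for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One sigmoid output neuron with two inputs, squared loss E = 0.5*(y - d)^2.
inputs = [0.5, -1.0]      # y_i, outputs of the previous layer
weights = [0.8, 0.3]      # w_ji
target = 1.0              # d_j
eta = 0.5                 # learning rate

# Forward pass (equations 2.7 and 2.8).
x_j = sum(y_i * w for y_i, w in zip(inputs, weights))
y_j = sigmoid(x_j)

# Backward pass.
dE_dy = y_j - target                       # equation 2.10
dE_dx = dE_dy * y_j * (1.0 - y_j)          # equation 2.11 (sigmoid derivative)
dE_dw = [dE_dx * y_i for y_i in inputs]    # equation 2.12
dE_dinputs = [dE_dx * w for w in weights]  # equation 2.13, for one output unit

# Gradient descent update (equation 2.6).
weights = [w - eta * g for w, g in zip(weights, dE_dw)]
```

Running the forward pass again with the updated weights yields a smaller error, which is exactly what one training step of back-propagation is meant to achieve.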


Loss functions

In neural networks, as part of training, it is repeatedly required to estimate the error with the current weights of the network. This requires a function that can evaluate the network's performance and estimate the loss or error so that the weights can be updated. The loss function of a neural network usually depends on the problem, i.e. classification or regression. Several loss functions are commonly used, e.g. mean squared error, mean absolute error, hinge loss, cross-entropy loss, etc.; however, we only discuss the categorical cross-entropy loss here, as the others are not part of this thesis.

Categorical Crossentropy

The cross-entropy loss function is mostly used for multi-class classification problems. In a multi-class problem there exist multiple classes, each observation belongs to exactly one class, and the task of the neural network is to learn the mapping from input to target classes. It is expressed as:

L(y, ŷ) = −Σ_i Σ_j y_{i,j} log(ŷ_{i,j}) (2.14)

where ŷ are the predictions/estimates, y are the targets, i indexes the observations, and j indexes the classes. Maximizing the likelihood is equivalent to minimizing the negative log-likelihood, which in turn is equivalent to minimizing the cross-entropy. [34] shows that minimizing cross-entropy is equivalent to maximizing likelihood under assumptions of uniform feature and class distributions.

Suppose a probability model produces N predictions ŷ_1, ..., ŷ_N over K classes. The negative log-likelihood of these N predictions is:

−log p(ŷ_1, ..., ŷ_N) = −Σ_{i=1}^{N} log p(ŷ_i)
= −Σ_{i=1}^{N} log ŷ_{i,j_i}
= −Σ_{i=1}^{N} Σ_{j=1}^{K} y_{i,j} log ŷ_{i,j}
= Σ_{i=1}^{N} H(y_i, ŷ_i) (2.15)

where j_i is the index of the true class of observation i, which is the sum of the cross-entropy over all instances.
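Equation 2.14 can be implemented directly. The two toy prediction sets below are invented to show that confident, correct predictions incur a lower loss than uncertain ones:

```python
import math

def categorical_crossentropy(y_true, y_pred):
    # Equation 2.14: L(y, yhat) = -sum_i sum_j y_ij * log(yhat_ij).
    # y_true holds one-hot target rows, y_pred holds predicted probability rows.
    total = 0.0
    for target_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(target_row, pred_row):
            if t > 0:
                total -= t * math.log(p)
    return total

# Two observations over three classes.
y_true = [[0, 1, 0], [1, 0, 0]]
confident = [[0.1, 0.8, 0.1], [0.9, 0.05, 0.05]]
uncertain = [[0.3, 0.4, 0.3], [0.4, 0.3, 0.3]]

print(categorical_crossentropy(y_true, confident))  # lower loss
print(categorical_crossentropy(y_true, uncertain))  # higher loss
```

Because only the true class has y_ij = 1, the sum reduces to −log of the probability assigned to the correct class, exactly as in the middle line of equation 2.15.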

2.2

Convolutional Neural Networks

CNNs are a type of feed-forward neural network designed to work with images, and they have been very successful at tasks that involve learning from images. In the past decade we have seen a very large boost in the performance of CNNs, and many state-of-the-art models for image classification and segmentation are based on CNNs. There are many use cases for CNNs, but in this thesis we use CNNs only for classification. [13] has recently demonstrated impressive results on various object detection benchmarks, including face detection, by training a Faster R-CNN model on the large-scale WIDER face dataset [21]. [15] trained a large, deep CNN to classify the 1.3 million high-resolution images of the Large Scale Visual Recognition Challenge (LSVRC-2010) ImageNet training set into 1000 different classes, achieving top-1 and top-5 error rates of 39.7% and 18.9%; the top-N error rate is how often the classifier did not include the correct class among the top N probabilities. [30] adapted a CNN to learn by itself to find fire features expressed at a deep level. CNNs have been shown to outperform other algorithms in computer vision tasks [24]. CNNs are also used for detection and recognition of faces, objects, and logos [13] [15] [29]. CNNs have proved good at extracting features from images, obtained by a function that maps the input images to the output labels. [22] shows that generic descriptors extracted from convolutional neural networks are very powerful for obtaining accurate classifications and surpassed highly tuned state-of-the-art systems in classification tasks.


Convolutional Layers

An image X is a set of raw pixels represented as a 3D tensor X ∈ R^(d1×d2×d3). Here d1, d2 are the spatial coordinates of the image and d3 is the number of channels. Channels refer to the number of colours, e.g. an RGB image has three channels: red, green, and blue. The two building blocks of a CNN are the convolution layer and the pooling layer. Usually these two blocks are followed by fully connected layers to make classifications. We discuss these layers briefly below. Figure 2.4 shows the convolution operation on an image. The convolutional layer is similar to the layers in feed-forward networks; however,

Figure 2.4: 3D convolution in an image of size 6x6x3

the transformation an image goes through between layers is different. The convolutional layer uses different kernels to extract features from the images. Some literature uses the term filters instead of kernels, but it means the same thing. These kernels are matrices of smaller size than the input, and there are several of them. The kernels have real-valued entries and are convolved with the input to obtain feature maps. When the convolution occurs, i.e. when the kernel is drawn across the input from the previous layer (for the first layer, the input is just the input image), each position results in the activation of a neuron, and the output at these positions is collected in a feature map. Feature maps represent the activated regions where features specific to the kernel are detected in the input. With each learning iteration, the values of the kernels change, which indicates that the network is learning to identify the regions of significance for extracting features from the input data. F^k ∈ R^(w×w×d3) is the representation of a kernel with size w and d3 channels. Each of these kernels, convolved with the image X, results in an output called a feature map M^k ∈ R^((d1−w+1)×(d2−w+1)), where

M^k_ij = ⟨[X]_ij, F^k⟩ = Σ_{î=1}^{w} Σ_{ĵ=1}^{w} Σ_{l̂=1}^{d3} [X]_{i+î−1, j+ĵ−1, l̂} [F^k]_{î,ĵ,l̂} (2.16)

and [X]_ij ∈ R^(w×w×d3) is a small part of the image, equal to the kernel size, at location (i, j). All the kernels, after convolving, produce feature maps packed into a 3D tensor M ∈ R^((d1−w+1)×(d2−w+1)×d̂3). These feature maps are then activated using nonlinear activation functions such as sigmoid, tanh, and ReLU, as discussed in section 2.1; thus

X̂_ijk = σ(M^k_ij), ∀ i ∈ [d1−w+1], j ∈ [d2−w+1], k ∈ [d̂3] (2.17)

X̂ are the features extracted from X. The kernels F^k are shared across all locations [X]_ij; a patch gives a positive correlation if it responds strongly to the kernel, and therefore each individual kernel F^k extracts a similar feature from the entire image. Strided convolutions are also used to determine the skip step between two patches. The stride is a sliding step size which determines the step taken during the convolution, e.g. a stride of 2x2 means a slide of 2 in both the x and y directions. The dimensions of X̂ after convolution, with or without strides, are reduced, and one can use padding to maintain the original dimensions. Padded convolution defines how the borders of the image should be handled and results in output dimensions equal to the input dimensions.
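Equation 2.16 can be sketched in plain Python for the single-channel case (d3 = 1). The image and kernel below are made-up examples; as in most CNN literature, no kernel flipping is applied:

```python
def conv2d(image, kernel):
    # Valid (unpadded) 2D convolution of a single-channel image with one
    # kernel, following equation 2.16 with d3 = 1: each output entry is the
    # elementwise product of the kernel with an image patch, summed.
    h, w = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            s = 0.0
            for a in range(k):
                for b in range(k):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
    [0, 0, 0, 0],
]
# A simple horizontal edge-detector kernel (illustrative choice).
kernel = [
    [1, 1],
    [-1, -1],
]
fmap = conv2d(image, kernel)
print(len(fmap), len(fmap[0]))  # 3 3 -> (d1 - w + 1) x (d2 - w + 1)
```

The output size (d1 − w + 1) × (d2 − w + 1) matches the dimension of the feature map M^k stated above; adding padding of width (w − 1)/2 would restore the input dimensions.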


Figure 2.5: Max and average pooling operation

Pooling Layers

Pooling layers in a CNN reduce the dimensions of the input feature maps by applying an aggregate function to groups of nearby features in a feature map, producing a single feature as a replacement for each group. Examples of aggregation functions used in pooling layers are average, sum, and max. Pooling results in down-sampling, which reduces the feature size for subsequent layers and hence speeds up computations. A common pooling operation uses a 2×2 max-pooling kernel; it computes

max{ X_{i,j,k}, X_{i+1,j,k}, X_{i,j+1,k}, X_{i+1,j+1,k} } (2.18)

where i, j are the feature locations in each of the k channels of X, as illustrated in figure 2.5 (the figure shows pooling on one channel only). The intuition behind pooling layers is that they help reduce redundancy, since a small neighbourhood around a location in a feature map is likely to contain the same information. A 2×2 max-pooling kernel on X ∈ R^(d1×d2×d3) results in a feature map of size d1/2 × d2/2 × d3. Figure 2.5 shows the max and average pooling operations.
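The 2×2 max-pooling operation of equation 2.18 on a single channel can be sketched as follows (the feature map values are invented for illustration):

```python
def max_pool_2x2(fmap):
    # Equation 2.18: each output entry is the maximum over a 2x2
    # neighbourhood; the spatial dimensions are halved.
    out = []
    for i in range(0, len(fmap) - 1, 2):
        row = []
        for j in range(0, len(fmap[0]) - 1, 2):
            row.append(max(fmap[i][j], fmap[i][j + 1],
                           fmap[i + 1][j], fmap[i + 1][j + 1]))
        out.append(row)
    return out

fmap = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool_2x2(fmap)
print(pooled)  # [[6, 8], [3, 4]]
```

Each 2×2 block collapses to its maximum, so the 4×4 map becomes 2×2, matching the d1/2 × d2/2 size stated above.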

Fully Connected Layers

Fully connected layers are neural network layers in which every neuron is connected, via weighted connections, to all neurons in the previous and successive layers. As discussed in section 2.1, such a network takes X as a vector Vec(X) and computes

X̂ = σ(W ∗ Vec(X)) (2.19)

where W are the weights of the connected layers and σ is the activation function of choice. Usually, the fully connected layers are at the end of the network and use the features extracted by the convolutional layers to learn a mapping from those extracted features to the respective targets [16]. Figure 2.6 shows a very common CNN architecture for classification: the convolutional and pooling layers extract features, which are then utilized by the fully connected layers to learn the classification.
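Equation 2.19 can be sketched for a toy 2×2 feature map and two output neurons; all values below are invented for illustration, with sigmoid as the activation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fully_connected(feature_map, W):
    # Equation 2.19: flatten the feature map to a vector Vec(X), multiply
    # by the weight matrix, and apply the activation elementwise.
    vec = [v for row in feature_map for v in row]  # Vec(X)
    out = []
    for weights in W:  # one row of W per output neuron
        z = sum(w * v for w, v in zip(weights, vec))
        out.append(sigmoid(z))
    return out

feature_map = [[0.5, -0.5], [1.0, 0.0]]            # 2x2 extracted features
W = [[0.1, 0.2, 0.3, 0.4], [-0.1, 0.0, 0.1, 0.2]]  # 2 output neurons
print(fully_connected(feature_map, W))
```

In a classification head like the one in figure 2.6, the final fully connected layer would use softmax instead of sigmoid so the outputs form a probability distribution over the classes.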


Figure 2.6: A common CNN architecture for classification

2.3

Multi Task Learning

Multi-task learning (MTL) is a machine learning method to jointly learn different tasks; it utilizes the idea of jointly learning several related tasks using a shared representation [4] and has been used in various computer vision problems. The objective of MTL is to improve the performance of all tasks at the same time by sharing features among them. It has been shown to significantly improve performance compared to learning each task individually [5], and it has been shown to yield better generalization. Below we briefly explain how MTL works.

MTL can be used in various contexts, such as supervised and unsupervised settings; in this thesis we use MTL for supervised learning, where the data is labelled. Given M tasks, with m-th input features x_m ∈ R^d, m-th label y_m ∈ R and m-th weight vector w_m, MTL minimizes

argmin_W Σ_{m=1}^{M} ℓ_m(y_m, f(x_m; w_m)) + λ R(W) (2.20)

where ℓ_m(·, ·) is the task-specific loss function and f(x_m; w_m) is the task-specific estimator of the inputs with respect to the weight vector w_m. The estimator has the form

f(x_m; w_m) = x_m^T w_m (2.21)

W denotes the weight vectors concatenated together as {w_1, w_2, ..., w_M}. W can be constrained based on prior knowledge of the relationship among the tasks using the regularizer R, with λ as the regularization parameter. With a very low value of λ, the model assumes no prior knowledge about the tasks and may overfit, while high values of λ may generalize poorly [3]. There are several variants of MTL [26]; the one we use is Single Input Multiple Output (SIMO). SIMO is used when multiple related tasks need to be learnt from the same input. In simple terms, the input source is the same for all tasks; in our case, the same input serves both tasks, i.e. PFR and fuel type classification. Different variants of MTL are shown in figure 2.7.
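The objective in eq. (2.20) can be sketched for linear per-task estimators; here a squared error stands in for the task losses ℓ_m and an L2 (Frobenius) penalty for R(W) — both are illustrative choices, not prescribed by the text:

```python
import numpy as np

def mtl_objective(X, Y, W, lam):
    """Eq. (2.20) for M linear tasks sharing input dimensionality d.

    X: (M, n, d) inputs, Y: (M, n) labels, W: (M, d) stacked task weights.
    Squared error plays the role of l_m; lam * ||W||^2 plays the role of
    the regularizer lambda * R(W) encoding task relatedness.
    """
    loss = sum(np.mean((Y[m] - X[m] @ W[m]) ** 2) for m in range(len(W)))
    return loss + lam * np.sum(W ** 2)

rng = np.random.default_rng(1)
M, n, d = 3, 20, 5
X = rng.standard_normal((M, n, d))
W_true = rng.standard_normal((M, d))
Y = np.stack([X[m] @ W_true[m] for m in range(M)])  # noise-free labels

print(abs(mtl_objective(X, Y, W_true, lam=0.0)) < 1e-9)  # True: zero loss at the true weights
print(mtl_objective(X, Y, W_true, lam=0.1) > 0)          # True: the penalty alone is positive
```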


Figure 2.7: Different variants of Multi-task learning

2.4 Transfer Learning

Transfer learning is an optimization method that can improve learning by transferring knowledge from related tasks that have already been learned. Instead of learning everything from scratch, one uses the information gathered from solving other tasks to solve the current task.

Formally, [18] describes a domain mathematically as a tuple D = {X, P(X)}, where X is the feature space and P(X) is the marginal probability distribution over X = {x_1, x_2, ..., x_n}, x_i ∈ X. Given this domain, a task T can be defined as T = {y, P(y|X)} = {y, η}, where η is the objective function, denoted from a probabilistic view by P(y|X). Transfer learning is then defined as follows: let D_s, T_s be the source domain with its source task, and D_t, T_t the target domain with its target task. Transfer learning aims to learn the target conditional probability distribution P(y_t|X_t) in D_t using information gained from D_s and T_s, where D_s ≠ D_t or T_s ≠ T_t.

In computer vision, transfer learning is usually expressed through fine-tuning pre-trained models, i.e. models that have been trained on a large dataset to solve a similar problem. In this thesis, we use VGG16 [24], a 16-layer convolutional architecture trained on ImageNet, a dataset of over 14 million images belonging to 1000 classes [12], on which it achieves 92.7% top-5 test accuracy. Top-N accuracy means that the correct class must be among the top N predicted probabilities to count as correct. VGG16 was one of the famous models submitted to ILSVRC-2014 (ImageNet Large Scale Visual Recognition Challenge).
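Top-N accuracy as described here can be computed in a few lines (an illustrative NumPy sketch):

```python
import numpy as np

def top_n_accuracy(probs, labels, n=5):
    """Fraction of samples whose true class is among the n highest-probability classes."""
    top_n = np.argsort(probs, axis=1)[:, -n:]  # indices of the n largest scores per row
    hits = [label in row for row, label in zip(top_n, labels)]
    return float(np.mean(hits))

probs = np.array([[0.1, 0.2, 0.3, 0.25, 0.15],
                  [0.6, 0.1, 0.1, 0.1, 0.1]])
labels = np.array([0, 0])
print(top_n_accuracy(probs, labels, n=2))  # 0.5: class 0 is top-2 only in the second row
print(top_n_accuracy(probs, labels, n=5))  # 1.0: with n = number of classes, always a hit
```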

2.5 Related Studies

Combustion Monitoring / Flame Detection

Not much work has been done on flame characterization or combustion monitoring using flame images, but there has been recent research in the domain. [2] uses auto-encoder networks to learn excess air regions of coal combustion; the optimal network architecture found was a combination of convolutional and fully connected layers, which gave average precision and recall of 77% and 66%, respectively. [9] uses a CNN for detecting fire in frames or video and achieves a test accuracy of 97.9%. [30] studies classification of different types of flame using a Single Shot Multi-box Detector and Deconvolutional Single Shot Detection, reporting a test accuracy of 98.6%.

Multi Task Learning

There have been many studies on fire, flame and smoke detection, but few on flame characterization using machine or deep learning approaches. CNNs, however, have been shown to boost performance in classification and detection tasks, and multi-task learning has shown high performance, relative to other machine learning models, in the domain of computer vision. [33] used a multi-task cascaded convolutional network for joint face detection and alignment; as discussed in the MTL section, the paper utilizes the inherent correlation of the two tasks to boost performance, and reports a significant increase relative to the state of the art on a face detection dataset. [17] uses a multi-task R-CNN for joint object detection and viewpoint estimation. [31] explores MTL with CNNs for face recognition along with multiple side tasks which serve as regularization to disentangle PIE (pose, illumination and expression) variations; additionally, it uses loss weights for each side task to balance the losses of the different tasks, and explores energy-based weight analysis methods. [7] uses MTL with a CNN based on the ResNet architecture to learn three related tasks, reporting an accuracy of 86% for task 1 (predicting one of four broad eye disease categories) and 67% for task 2 (predicting one of 320 fine disease sub-categories).

Transfer learning

Transfer learning is the improvement of learning in a new task through the transfer of knowledge from related tasks that have already been learnt, and is being considered machine learning's next frontier. [10] uses transfer learning along with web data augmentation for image classification. [23] also utilizes transfer learning to predict nighttime lights from daytime imagery, simultaneously learning features that are useful for poverty prediction.

To the best of our knowledge, no significant work has been done to learn flame characteristics from flame videos or images. We hope this work contributes to advancements in combustion flame characterization and may spark research interest in this domain.


Chapter 3

Method

In this chapter, we present the approaches used to achieve our aims (section 1.4), along with a brief discussion of the data set and the tasks.

3.1 Data

Siemens Turbomachinery AB has an atmospheric single-burner (3rd generation DLE) test rig which is used to carry out experiments to optimize burner configurations. This test rig is equipped with a high resolution camera which records the combustion process, mainly the combustion flame. The data used in this thesis was generated using the aforementioned test rig, where different fuels, i.e. hydrogen, natural gas and their mixtures in different proportions, were used to simulate combustion. These simulated combustions took as input different PFR values during the combustion process, from the initial combustion point to the point where the combustion process had stabilized. The combustion process is stabilized when the flame is stable in accordance with the input load of the turbine, as determined by domain experts. In simple words, if the turbine efficiency reaches up to a threshold, selected by experts, with respect to the input load, then the combustion process is considered stabilized.

For this thesis, only the stabilized part of the combustion was recorded and used in analysis and modelling. One can think of the combustion process as a time series, where past values might have an effect on future values. We did not formulate this problem as a time series problem because, in practice, treating frames as independent over time is a simplification that seems to work well. We have also made the following assumptions:

• When the flame, and thus the combustion process, is stabilized, the PFR values stabilize with it. A particular recorded video of a stable flame will therefore have the same PFR and fuel type throughout the video.

• When the flame is stabilized, it still flickers, but this is due to the physical characteristics of the flame and has nothing to do with the variables of the combustion system being stable. Therefore, we assume that each frame of the video is independent of the others, meaning that the flame in each frame is independent of how the flame started initially and has no effect on the future combustion flame.

Also, if we framed this data as a time series, where X is the input videos and Y is the categorical target variable, then a total of 120 videos with 10 PFR classes and 5 fuel type classes would not be enough data for a neural network to learn accurately. In theory it would be possible to estimate such a model, but in practice it would require much more data to learn the classes and the dynamics over time, which is not feasible in this case. This approach is, however, under consideration for future work, when we have access to a large number of videos.


3.2 Pre-processing & Task Distributions

The 120 videos produced by the camera in the test rig are 30–36 seconds long with dimensions of 756 × 576 at 25 frames per second. The videos were resized to 480 × 480 to remove background noise and bring the flame into focus. They were then split into images by extracting a single frame each second, to remove the time dependencies at least to some extent, resulting in 10700 images. No data augmentation was done because the orientation of the flame is the same in all the videos, i.e. the flame is always in the same position and at the same angle.
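The one-frame-per-second extraction amounts to keeping every 25th frame index; a small sketch of that index selection (the actual video-decoding pipeline is proprietary and not shown in the thesis):

```python
def frames_to_sample(n_frames, fps=25):
    """Indices of one frame per second of video, e.g. frames 0, 25, 50, ...

    Keeping a single frame per second spaces the retained images a full
    second apart, which is how the time dependence between frames is weakened.
    """
    return list(range(0, n_frames, fps))

# A 30 s clip at 25 fps has 750 frames and contributes 30 images.
idx = frames_to_sample(750)
print(len(idx))  # 30
print(idx[:3])   # [0, 25, 50]
```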

Data Split

The data set used for evaluation is discussed in 3.2. The data was split into three parts, i.e. training, validation and testing, with 70%, 15%, and 15% of the data, respectively. For the data split, we made sure that each class is represented in all the splits by sampling from all the classes.
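The class-aware split described above can be sketched as follows (an illustrative pure-Python version; the thesis' own splitting code is proprietary):

```python
import random

def stratified_split(labels, train=0.7, val=0.15, seed=0):
    """Split sample indices 70/15/15 while sampling from every class.

    labels: one class label per image. Returns three lists of indices, so
    each class is represented in the training, validation and test splits.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    tr, va, te = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_tr = int(len(idxs) * train)
        n_va = int(len(idxs) * val)
        tr += idxs[:n_tr]
        va += idxs[n_tr:n_tr + n_va]
        te += idxs[n_tr + n_va:]
    return tr, va, te

labels = [0] * 100 + [1] * 40 + [2] * 20  # toy imbalanced class counts
tr, va, te = stratified_split(labels)
print(len(tr), len(va), len(te))  # 112 24 24
```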

This thesis studied two tasks:

• PFR learning
• Fuel type learning

These two tasks are the ones most desired by the experts at Siemens, and we believe that if our approach proves viable for learning them, it could be used to learn other related tasks as well, e.g. exhaust temperature prediction, flame oscillation prediction, etc.

Table 3.1 shows the imbalanced distribution of PFR and fuel type in the dataset with 10 PFR classes and 5 Fuel Type classes.

Table 3.1: Distributions for PFR and Fuel Type

Table 3.2: PFR Distribution

PFR            Label   Count
0 (No Pilot)     0      5776
10-12            1      1066
6-9              2      1027
4-5              3       680
1-3              4       518
30-40            5       471
20-25            6       430
13-15            7       304
60+              8       267
41-56            9       160

Table 3.3: Fuel Type Distribution

Fuel Type                       Label   Count
100% Natural Gas                  0      4979
20% Hydrogen 80% Natural Gas      1      2322
Ethylene                          2      1932
F2                                3      1051
F1                                4       415

3.3 Proposed Approach

This thesis proposes MTL combined with a CNN, sharing the convolutional layers between the two tasks discussed in 3.2. VGG16 [24] was adapted, with the dimensions of the fully connected layers modified according to the two tasks.

Figure 3.1: VGG16 model diagram

Figure 3.1 shows the model diagram of VGG16, a convolutional neural network model proposed by K. Simonyan and A. Zisserman of the University of Oxford [24]. VGG16 has 5 blocks: the first two blocks have two convolutional layers each and the last three blocks have three, with each block followed by a pooling layer. Softmax, discussed in section 2.1, is used as the activation function for the final fully connected layer. The network takes an image as input and passes it through the blocks, which extract features as explained in section 2.2. These features are then flattened, i.e. converted into vector form, and act as input to the fully connected layers, which train their weights using back-propagation and make classifications using the softmax function, as discussed in section 2.1.

Fine Tuning

Fine tuning is the process of taking a network model that has already been trained for a given task and making it perform a second, similar task. Assuming the original task is similar to the new task, using a network that has already been designed and trained allows us to take advantage of the feature extraction that happens in the front layers of the network, without developing that feature extraction network from scratch [27].

Individual CNN for PFR and Fuel Type

We have developed individual CNN models based on VGG16 for PFR and fuel type classification. In each of these two models, we used VGG16 after changing the number of nodes in the output layer to the number of distinct classes: in the PFR case the fully connected layer is connected to an output layer with 10 nodes, and in the fuel type case with 5 nodes. To this end, we fine-tuned VGG16 and trained the fully connected layers for classification. As explained in chapter 2, the convolutional layers extract high level features from the image and then pass those features to the fully connected layers to make classifications.

Given a training set T = {(I_i, y_i)}_{i=1}^N, where N is the total number of observations and I_i is an image with respective label y_i in one-hot encoding representation. One-hot encoding turns a target label into a vector of length equal to the number of classes, containing all zeros except a one at the respective class index. The convolutional layers extract the high level features from the images which, after flattening, are passed to the fully connected layers as input. These high level features x ∈ R^{D×1} are found by mapping a non-linear function from the input to the shared features:

x = f(I; k, b) (3.1)

where k and b are the kernels and biases of the convolutional layers, learnt from the parameter space θ = {k, b}. These features are passed as input to the fully connected layers to make classifications using the generalized model

y = Wx + b (3.2)

where W are the weights of the fully connected layers and x are the features extracted by the CNN. We formulate this as an inference problem where, given input images, we learn the output categorical labels by maximizing likelihood estimates. Hence, we used cross-entropy as the loss function; minimizing this loss is equivalent to maximizing the likelihood, as discussed in section 2.1. ReLU was the activation function of choice for the hidden layers and sigmoid was used as the activation function for the output layers.
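The cross-entropy loss against a one-hot target, as used here, is simply the negative log-probability of the true class; a minimal sketch:

```python
import math

def cross_entropy(probs, one_hot):
    """Cross-entropy between predicted class probabilities and a one-hot target.

    Minimising this over the training set is equivalent to maximising the
    likelihood of the observed labels, as noted in section 2.1.
    """
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)

target = [0, 0, 1, 0]  # one-hot: the true class is index 2
print(round(cross_entropy([0.1, 0.1, 0.7, 0.1], target), 4))      # 0.3567
print(round(cross_entropy([0.25, 0.25, 0.25, 0.25], target), 4))  # 1.3863
```

The second call shows the loss of a maximally uncertain prediction over four classes, which is higher than the confident (and correct) first prediction.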

Multi Task Convolutional Neural network

The convolution process is the same as in 3.3. In multi-task learning, the features extracted by the convolutions are shared among the two tasks: we use the same fine-tuning approach as in 3.3, but the convolutional blocks are shared between two sets of fully connected layers, one per task, each connected to its respective output layer. The same generalized linear model as in (3.2) is applied for each task. Let W_1^p ∈ R^{D×D_p} and b_1^p ∈ R^{D_p×1} be the weight matrix and bias vector of the fully connected layer for the PFR task, where D_p is the number of distinct PFR classes and D is the number of neurons in the hidden layer followed by the output layer. For the fuel type task we use the same generalized linear model with D_p replaced by the number of distinct fuel type classes; hence both tasks are modelled as

y^p = W_1^{p⊤} x + b_1^p (3.3)

To compute the probability of x belonging to each class in the training set, y^p is fed to a softmax layer:

softmax(y^p)_n = exp(y^p_n) / Σ_{j=1}^{D_p} exp(y^p_j) (3.4)

where y^p_j is the j-th element of y^p. The result of softmax(·) is a probability distribution over all possible classes. The estimated class is chosen as

ŷ^p = arg max softmax(y^p) (3.5)
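Equations (3.4) and (3.5) in code (an illustrative NumPy sketch, with the usual max-shift added for numerical stability):

```python
import numpy as np

def softmax(y):
    """Eq. (3.4): exponentiate the logits and normalise to a distribution."""
    e = np.exp(y - y.max())  # subtracting the max avoids overflow, result unchanged
    return e / e.sum()

def predict(y):
    """Eq. (3.5): the estimated class is the argmax of the softmax output."""
    return int(np.argmax(softmax(y)))

logits = np.array([1.0, 3.0, 0.5])  # y^p from a task's linear head
p = softmax(logits)
print(round(float(p.sum()), 6))  # 1.0: a valid probability distribution
print(predict(logits))           # 1: the largest logit wins
```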

MTL Loss

Multi-task learning aims to minimize the combined loss of all the tasks. Let W = {W^p, W^f} be the weight matrices for the PFR and fuel type tasks. Given the training set T, our multi-task CNN minimizes the combined loss of both tasks:

argmin_{θ,W} α_p Σ_{i=1}^N L(I_i, y_i^p) + α_f Σ_{i=1}^N L(I_i, y_i^f) (3.6)

where α_p and α_f are control parameters for task importance. This combined loss drives the model to learn θ and W for classification in both tasks.

Implementation Details & Hyperparameters

The proposed algorithm was developed in Python using Keras, a highly flexible API for developing, training and evaluating models [14]. The code for model development and evaluation was written by the author and is the property of Siemens Turbomachinery AB. Python scripts were written for pre-processing, data splitting, model creation, model evaluation, analysis, and a prototype application which used threading to analyze images, videos and a live stream of the combustion flame.

Our approach uses SGD with momentum, a method to boost gradient optimization introduced in Rumelhart, Hinton and Williams' paper [20]. Stochastic gradient descent with momentum remembers the weight update at each iteration and determines the next update as a linear combination of the gradient and the previous update. The momentum value was searched using grid search, and we found 0.9 to be highly effective. The hyperparameters α_p and α_f were also chosen by grid search over a range of values for both parameters, picking the combination that gave good results. The same data splits as mentioned in 3.2 were used for the grid search.
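The momentum update described above, sketched on a toy quadratic (the learning rate and test function are illustrative, not the thesis' settings):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: the next step is a linear combination
    of the new gradient and the previous update (the velocity)."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(3):                 # minimise f(w) = 0.5 * ||w||^2, so grad = w
    w, v = sgd_momentum_step(w, w, v)
print(np.abs(w).max() < 2.0)  # True: the iterates shrink toward the minimum at 0
```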

In our proposed approach, no dropout layers or regularizers were used. However, we opted for early stopping, a form of regularization used to avoid overfitting. Early stopping indicates how many iterations the model should train for before it starts to over-fit. In our case, we stopped training when the model accuracy started to degrade, i.e. after the 35th epoch.
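A simplified sketch of the stopping rule (the thesis stops once validation accuracy starts to degrade; the patience value and the accuracy trace below are illustrative):

```python
def train_with_early_stopping(val_accuracies, patience=1):
    """Return the epoch to stop at: training halts once validation accuracy
    has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, bad = 0.0, 0, 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, best_epoch, bad = acc, epoch, 0  # new best: reset the counter
        else:
            bad += 1
            if bad >= patience:
                break                              # accuracy degraded: stop
    return best_epoch

accs = [0.60, 0.72, 0.80, 0.79, 0.78]
print(train_with_early_stopping(accs))  # 3: accuracy peaks at epoch 3, then degrades
```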


Chapter 4

Results

We have evaluated the proposed approach on the two tasks mentioned in 3.2, using individual CNNs for both tasks as baseline models. Below, we present the results achieved with our proposed approach. The data set used for evaluation is described in section 3.2.

Individual CNN for PFR and Fuel Type classification

Results for the individual CNN models trained, with the above mentioned data settings, for PFR and fuel type classification are shown in table 4.1: average precision and recall of 0.78 and 0.71 for PFR, and of 0.85 and 0.87 for fuel type. The F1 score is the harmonic mean of precision and recall, and reaches its best value at 1.
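The F1 definition in code; note that a macro-averaged F1, computed per class and then averaged, need not equal the harmonic mean of the already-averaged precision and recall (the inputs below are illustrative, not the table's values):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 1.0 is a perfect score."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.5, 1.0), 2))  # 0.67: perfect recall cannot mask poor precision
print(f1_score(1.0, 1.0))            # 1.0: the best attainable score
```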

Table 4.1: Performance comparison for both tasks using individual CNN and multi-task CNN

Model            Task        Average Precision   Average Recall   F1 Score   Accuracy
Individual CNN   PFR               0.78               0.71          0.73       0.76
Individual CNN   Fuel Type         0.85               0.87          0.87       0.86
MTL CNN          PFR               0.98               0.99          0.99       0.98
MTL CNN          Fuel Type         0.98               0.97          0.98       0.98

Tables 4.2 and 4.3 show the confusion matrices for PFR and fuel type on the test data, respectively. In these confusion matrices, the first row contains the predicted classes and the first column the actual classes, and each cell shows how many times a particular actual class was predicted as each class.

Table 4.2: Test data confusion matrix for PFR

Actual \ Predicted     0   1-3  10-12  13-15  20-25  30-40  4-5  41-56  6-9  60+
0                    860    0     4      0      0      0     1     0     0    0
1-3                   20   34    13      0      0      0     1     0     7    1
10-12                  0    1   148      3      4      0     0     0     2    0
13-15                  2    0    19     12      7      2     1     0     1    0
20-25                  3    0     2      3     49      5     0     0     1    0
30-40                  0    0     0      0      3     62     0     0     0    4
4-5                   17    6     4      0      0      0    67     0     6    1
41-56                  0    0     0      0      0      5     0    18     0    0
6-9                   11    2    29      3      5      2    27     0    73    1
60+                    0    0     0      0      1      3     0     0     0   35


Table 4.3: Test data confusion matrix for Fuel Type

Actual \ Predicted   20H280NG  Ethylene   F1    F2    NG
20H280NG                344        1       2     0     0
Ethylene                  0      278       0     0    10
F1                       10        0      50     0     1
F2                        2        0       0   154     0
NG                        1        2       0     0   742

Multi-task CNN for Joint Learning

We trained a multi-task model as discussed in 3.3 to learn both tasks jointly. Table 4.1 shows the results obtained with the proposed approach: improved average precision and recall for both tasks. The average precision and recall for the PFR task reached 0.98 and 0.99, respectively. For fuel type, a significant increase was also observed, with average precision and recall reaching 0.98 and 0.97, respectively.

Tables 4.4 and 4.5 show the confusion matrices obtained on the test data using the proposed MTL approach. We observe improved classification of all classes for both tasks; far fewer errors are made compared to the confusion matrices produced by the individual CNN models.

Table 4.4: Test data confusion matrix for PFR using MTL

Actual \ Predicted     0   1-3  10-12  13-15  20-25  30-40  4-5  41-56  6-9  60+
0                    870    0     0      1      0      0     0     0     0    0
1-3                    8   41     5      0      0      0    10     0     0    0
10-12                  0    0   189      3      0      0     0     0     2    0
13-15                  0    0     3     43      0      0     0     0     0    0
20-25                  0    0     0      0     65      0     0     0     0    0
30-40                  0    0     0      0      0     51     0     1     0    0
4-5                    3    0     0      0      0      0    82     0     7    0
41-56                  0    0     0      0      0      0     0    17     0    0
6-9                    0    0     5      0      0      0     3     0   156    0
60+                    0    0     0      0      0      0     0     0     0   35

Table 4.5: Test data confusion matrix for Fuel Type using MTL

Actual \ Predicted   20H280NG  Ethylene   F1    F2    NG
20H280NG                358        0       1     0     0
Ethylene                  0      324       0     0     2
F1                        1        0      45     0     2
F2                        0        0       0   138     0
NG                        0        5       0     0   724


The results above show that MTL can perform better when learning multiple tasks than individual models learning individual tasks. The reasons why MTL works better are discussed in chapter 5.


Chapter 5

Discussion

The goal of this thesis was to analyze whether deep learning techniques are viable for learning combustion flame characteristics. To achieve this, we set two aims, stated in section 1.4. From our results, we found that the multi-task CNN performs better than the individual CNN models.

Results

In chapter 4, we saw that the performance of the multi-task CNN is superior to that of the individual CNNs for both tasks, i.e. PFR and fuel type. Increases of 0.20 and 0.28 were recorded in average precision and recall, respectively, for the PFR task, and of 0.13 and 0.10 for the fuel type task. The confusion matrices generated by the multi-task CNN also make clear how much the classification has improved, with only a few misclassifications.

We credit this performance improvement to feature sharing in the multi-task CNN. The accuracy of the individual CNN was 76% for PFR and 86% for fuel type. We believe the PFR task is harder to learn from the data than the fuel type task, and it was our initial aim to see whether both tasks could be learnt better if they shared information with each other. MTL achieves high performance by sharing knowledge between related tasks, learning multiple related tasks in parallel. During training, back-propagation allows features developed for one task to be shared with the others, which can help a task learn better than when it is trained alone, because such features would not be developed in a single task model. Also, in MTL, a task can rely more on the hidden units that are better for it, while other tasks can ignore those units.

Methods

[6] mentions several potential reasons why MTL works and, after extensive experimentation, concludes that MTL works better due to the information gained from related tasks. Several factors affect the performance of MTL:

Data Amplification

Data amplification is an effective increase in sample size due to the extra information in the training signals of related tasks. [11] uses an example of two tasks T and T̂ which use the same feature F, but for different training patterns: T = A OR F and T̂ = NOT(A) OR F. Task T uses F when A = 0 and gives no information about F when A = 1; task T̂ gives information about F only when A = 1. A network learning only T therefore gets information about F only on training patterns with A = 0, whereas a network learning both T and T̂ gets information about F on both kinds of pattern, i.e. when A = 0 and when A = 1. If the net learning both tasks recognizes that they share F, it sees a larger sample of F. [11] also conducted experiments with such tasks and found that back-propagated networks learn common features better due to this larger effective sample size.
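The two tasks in this example can be written out as truth tables to see which patterns reveal F (a small pure-Python illustration of the example above):

```python
def task_T(A, F):
    """T = A OR F: when A = 1 the output is 1 regardless of F."""
    return int(A or F)

def task_T_hat(A, F):
    """T-hat = NOT(A) OR F: when A = 0 the output is 1 regardless of F."""
    return int((not A) or F)

# T reveals F exactly when A = 0; T-hat reveals F exactly when A = 1.
print([task_T(0, F) for F in (0, 1)])      # [0, 1]: output tracks F when A = 0
print([task_T(1, F) for F in (0, 1)])      # [1, 1]: F is hidden when A = 1
print([task_T_hat(1, F) for F in (0, 1)])  # [0, 1]: output tracks F when A = 1
print([task_T_hat(0, F) for F in (0, 1)])  # [1, 1]: F is hidden when A = 0
```

A network trained on both tasks thus receives a training signal about F on every pattern, while a single-task network sees F on only half of them.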

Attribute Selection

The tasks, PFR and fuel type, share the hidden features. An individual model trying to learn PFR has difficulty with the task when the data is limited or noisy, and is unable to distinguish between inputs that are relevant or irrelevant to the hidden features, as we saw in the results in chapter 4. MTL, learning both tasks at the same time, is able to select attributes more relevant to the hidden features, because the data amplification described above provides better training signals for them.

Eavesdropping

Suppose the two tasks, PFR and fuel type, both depend strongly on the hidden layer features, and that these features are difficult to learn from the PFR task but easy to learn from the fuel type task. A model learning only fuel type will learn the hidden layer features, whereas a model learning only PFR may not learn them accurately. So, if a model learns both PFR and fuel type, the PFR task can eavesdrop on the hidden layer features learned for fuel type and thus perform better than when learned alone.

[11] also argues that MTL tasks prefer not to use hidden layer representations that other tasks prefer not to use, and its experiments on different datasets conclude that tasks share hidden units more the more related they are. We therefore believe the improvement in our results is due to the factors discussed above.

Beyond sharing knowledge between tasks, MTL has several other benefits. The weights of an MTL network are fewer than those of networks trained on the individual tasks in isolation, because the weights are shared. Regarding the gradients, MTL gradients may be flatter than those of individual models, but they move in a better direction, because early in the search the gradients move in directions that are good for all the tasks. A task in MTL is always trained in the context of the other tasks, which helps, to some degree, to avoid overfitting.


Chapter 6

Conclusion

In this thesis, we introduced an approach to jointly learn two tasks for characterizing a combustion flame, focusing on two characteristics: PFR and fuel type. We investigated the performance of individual CNN models, each trained on a single task, against a multi-task CNN learning both tasks, i.e. classification of PFR and fuel type. Our results are based on a proprietary dataset of 10,700 combustion flame images provided by Siemens Turbomachinery AB and indicate that the multi-task CNN outperforms the individual CNN models. We recorded promising results: 99% accuracy for the PFR task and 98% accuracy for the fuel type task. The same tasks, trained with individual CNN models, resulted in accuracies of 76% and 86% for PFR and fuel type, respectively. The MTL CNN performed better because it allows knowledge sharing between tasks and thus improves performance on low accuracy tasks which are hard to learn in isolation.


Bibliography

[1] Abrar H Abdulnabi, Gang Wang, Jiwen Lu, and Kui Jia. “Multi-task CNN model for attribute prediction”. In: IEEE Transactions on Multimedia 17.11 (2015), pp. 1949–1959. [2] S Abdurakipov and E Butakov. “Application of computer vision and deep learning for

flame monitoring and combustion anomaly detection”. In: Journal of Physics: Conference Series. Vol. 1421. 1. IOP Publishing. 2019, p. 012005.

[3] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. “Convex multi-task feature learning”. In: Machine learning 73.3 (2008), pp. 243–272.

[4] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. “Multi-task feature learning”. In: Advances in neural information processing systems. 2007, pp. 41–48.

[5] Shai Ben-David and Reba Schuller. “Exploiting task relatedness for multiple task learning”. In: Learning Theory and Kernel Machines. Springer, 2003, pp. 567–580.

[6] Rich Caruana. “Multitask learning”. In: Machine learning 28.1 (1997), pp. 41–75.

[7] Sahil Chelaramani, Manish Gupta, Vipul Agarwal, Prashant Gupta, and Ranya Habash. “Multi-task Learning for Fine-Grained Eye Disease Prediction”. In: Asian Conference on Pattern Recognition. Springer. 2019, pp. 734–749.

[8] Terence C Fogarty. “Rule-based optimization of combustion in multiple-burner furnaces and boiler plants”. In: Engineering Applications of Artificial Intelligence 1.3 (1988),

[9] Sebastien Frizzi, Rabeb Kaabi, Moez Bouchouicha, Jean-Marc Ginoux, Eric Moreau, and Farhat Fnaiech. “Convolutional neural network for video fire and smoke detection”. In: IECON 2016-42nd Annual Conference of the IEEE Industrial Electronics Society. IEEE. 2016, pp. 877–882.

[10] Dongmei Han, Qigang Liu, and Weiguo Fan. “A new image classification method using CNN transfer learning and web data augmentation”. In: Expert Systems with Applications 95 (2018), pp. 43–56.

[11] Lasse Holmstrom and Petri Koistinen. “Using additive noise in back-propagation training”. In: IEEE transactions on neural networks 3.1 (1992), pp. 24–38.

[12] ImageNet. ImageNet. 2020. URL: http://www.image-net.org/about-overview.
[13] Huaizu Jiang and Erik Learned-Miller. “Face detection with the faster R-CNN”. In: 2017

12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017). IEEE. 2017, pp. 650–657.

[14] Keras. Keras. 2020. URL: https://keras.io/.

[15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “Imagenet classification with deep convolutional neural networks”. In: Advances in neural information processing systems. 2012, pp. 1097–1105.

[16] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. “Gradient-based learning applied to document recognition”. In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324.


[17] Francisco Massa, Renaud Marlet, and Mathieu Aubry. “Crafting a multi-task CNN for viewpoint estimation”. In: arXiv preprint arXiv:1609.03894 (2016).

[18] Sinno Jialin Pan and Qiang Yang. “A survey on transfer learning”. In: IEEE Transactions on knowledge and data engineering 22.10 (2009), pp. 1345–1359.

[19] Rahimeh Rouhi, Mehdi Jafari, Shohreh Kasaei, and Peiman Keshavarzian. “Benign and malignant breast tumors classification based on region growing and CNN segmentation”. In: Expert Systems with Applications 42.3 (2015), pp. 990–1002.

[20] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. “Learning representa-tions by back-propagating errors”. In: nature 323.6088 (1986), pp. 533–536.

[21] Mark Senn. WIDER Face. 2015. URL: http://shuoyang1213.me/WIDERFACE/.
[22] Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. “CNN features off-the-shelf: an astounding baseline for recognition”. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2014, pp. 806–813.
[23] Hoo-Chang Shin, Holger R Roth, Mingchen Gao, Le Lu, Ziyue Xu, Isabella Nogues,

Jianhua Yao, Daniel Mollura, and Ronald M Summers. “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning”. In: IEEE transactions on medical imaging 35.5 (2016), pp. 1285–1298. [24] Karen Simonyan and Andrew Zisserman. “Very deep convolutional networks for

large-scale image recognition”. In: 2014.

[25] RN Singh, B Denby, and Tingxiang Ren. “A knowledge-based system for assessing spontaneous combustion risk in longwall mining”. In: Mining Science and Technology 11.1 (1990), pp. 45–54.

[26] Kim-Han Thung and Chong-Yaw Wee. “A brief review on multi-task learning”. In: Multimedia Tools and Applications 77.22 (2018), pp. 29705–29725.

[27] Wiki.fast. Fine Tune. 2020. URL: http://wiki.fast.ai/index.php/Fine_tuning.

[28] Wikipedia. Softmax. 2020. URL: https://en.wikipedia.org/wiki/Softmax_function.

[29] Yizhang Xia, Jing Feng, and Bailing Zhang. “Vehicle Logo Recognition and attributes prediction by multi-task learning with CNN”. In: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). IEEE. 2016, pp. 668–672.

[30] Chen Xin. “Detection and Recognition For Multiple Flames Using Deep Learning”. PhD thesis. Auckland University of Technology, 2018.

[31] Xi Yin and Xiaoming Liu. “Multi-task convolutional neural network for pose-invariant face recognition”. In: IEEE Transactions on Image Processing 27.2 (2017), pp. 964–975.

[32] Shengxin Zha, Florian Luisier, Walter Andrews, Nitish Srivastava, and Ruslan Salakhutdinov. “Exploiting image-trained CNN architectures for unconstrained video classification”. In: arXiv preprint arXiv:1503.04144 (2015).

[33] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. “Joint face detection and alignment using multitask cascaded convolutional networks”. In: IEEE Signal Processing Letters 23.10 (2016), pp. 1499–1503.

[34] Donglai Zhu, Hengshuai Yao, Bei Jiang, and Peng Yu. “Negative log likelihood ratio loss for deep neural network classification”. In: arXiv preprint arXiv:1804.10690 (2018).
