
Deep Learning to Enhance Fluorescent Signals in Live Cell Imaging

Edvin Forsgren edvin.forsgren@gmail.com


Supervisors: Rickard Sjögren, Sartorius Stedim Data Analytics AB; Christoffer Edlund, Sartorius Stedim Data Analytics AB; Linus Norenius, Department of Physics

Examiner: Eddie Wadbro, Computer Science Department

Copyright © 2020. All Rights Reserved.

Master of Science Thesis in Engineering Physics, 30 ECTS
Department of Physics

Umeå University


Abstract

A fluorophore is a chemical compound that can absorb and subsequently emit light of certain wavelengths, a property that is exploited in fluorescence microscopy. However, when taking images with a fluorescence microscope, the light spreads on its way from the specimen to the detector. As a result, the images become somewhat blurry and small details can be harder to distinguish. To address this issue, a dataset was created containing fluorescent target images and Gaussian-blurred versions of the same images. A deep learning network with a U-Net architecture was then trained to deblur the blurred images.

This resulted in new images more similar to the targets. The trained U-Net was then fed the target images themselves, and the output was a deblurred version of these images. In this way an image sharper than the original was created. Such an enhanced image can be used to monitor and evaluate cellular behaviour and reactions more easily and reliably, which in turn is a part of the creation and development of new medicines, vaccines and biopharmaceuticals.


Acknowledgements

I would like to thank, in particular, Rickard Sjögren, Christoffer Edlund, Tim Jackson and the people at Essen Bioscience. Rickard's drive to engage students with Sartorius, combined with great and genuine supervision, has made this work very enjoyable. Christoffer, acting as co-supervisor, has always been there with extra tips and has made sure that I feel at home and have everything I need. Tim, who came up with the idea for the project and helped acquire data for it, has also been a vital part of this work.


Contents

1 Introduction
 1.1 Background
 1.2 Aim of thesis
2 Theory
 2.1 Artificial Neural Networks
  2.1.1 Architecture
  2.1.2 Activation functions
 2.2 Training
  2.2.1 Loss function
  2.2.2 Optimization Algorithms
  2.2.3 Feedforward and Backpropagation
 2.3 Deep learning and Convolutional Neural Networks
  2.3.1 The U-Net architecture
  2.3.2 Convolutional layers
  2.3.3 Max pooling
  2.3.4 Transposed Convolution
 2.4 Gaussian Beam and the Point Spread Function
 2.5 Translation- and optimal rotation matrix
3 Methods
 3.1 Deconvolution of the point spread function in fluorescent images
 3.2 Out of focus to projection
4 Results
 4.1 Deblurring images
 4.2 Out of focus to optimal projection
5 Discussion
 5.1 Deblurring images
 5.2 Out of focus to projection
References


1 Introduction

1.1 Background

High-throughput microscopy has become an indispensable tool for studying biology and the effects of new treatments during early drug discovery. By studying how new drugs affect cell cultures grown in wells rather than whole organisms, cost, time and test-animal suffering can all be reduced. In comparison to molecular analysis of cell cultures, imaging is non-invasive, meaning that live cells can be pictured over time to give rich insight into biology. Additionally, gene technology is often used to place fluorescent markers on proteins of interest, making it possible to study biology in finer detail than is possible using only light microscopy. However, fluorescent imaging requires long exposure times, which impose a trade-off between visible detail and number of images due to phototoxicity. Fluorescent imaging also suffers from blur due to light diffraction and scattering. Ideally, fluorescent imaging should capture maximal biological information with minimum exposure time and blur. To achieve this aim, software-based image enhancement, including deconvolution, is commonly applied in practice.

Using computer vision in biological imaging dates back many decades [1], but is becoming ever more important to handle the output from high-throughput imaging platforms. The field of computer vision, not limited to cell imaging, has been revolutionized by deep convolutional neural networks (CNNs) in the past decade. In problem domains ranging from relatively simple image classification to more challenging tasks like real-time segmentation of objects in video data, CNNs are approaching and sometimes surpassing human performance. In live cell imaging, deep learning is increasingly used to detect and segment cells [2, 3, 4], follow cell movement over time [4], forecast cell differentiation [5] and much more. In the past few years, deep learning has been applied to fluorescence microscopy to enhance image quality, with results outperforming traditional signal processing techniques [6, 7, 8].

1.2 Aim of thesis

The first aim of the thesis is to investigate whether it is possible to minimize the effect of the point spread function (PSF) in fluorescence microscopy imaging by using a CNN. If this is possible, a better-quality image could be achieved with far less work than with traditional methods. Traditional methods, which mathematically deconvolve the images, generally require fine-tuning of parameters to work well. A model that an image can be passed through without any parameter tuning would make it much easier to obtain an image where details and important information are clearer. The image can then be used to monitor and evaluate cellular behaviour and reactions more reliably, which in turn plays a big part in the creation and development of new medicines, vaccines and biopharmaceuticals.

The second aim is to investigate whether it is possible to go from an out-of-focus image to an optimal projection image using a deep learning network. The optimal projection image is created from a stack of images; the problem with this approach is that it is time-consuming and that the phototoxicity of the fluorescent excitation light kills off the specimen. By instead taking a single out-of-focus image and passing it to a model which produces a good projection from it, this could be avoided.


2 Theory

2.1 Artificial Neural Networks

2.1.1 Architecture

An artificial neural network, most often called an ANN, is a network built from artificial neurons and weights. The neurons make up the layers of the network, which in turn are connected to each other by the weights. A simple feedforward ANN is displayed in Fig. (1).

Figure 1 – A simple ANN, consisting of an input layer, hidden layers and an output layer connected by weights.

The input is multiplied with each of the weights, and these products are summed in the next node, together with a bias if needed. This sum is then used as input to an activation function f, which produces the output of the node. This flow is visualized in Fig. (2).


Figure 2 – The operations performed by each node in an ANN. x_i and w_i are the value of element i of the input and the corresponding weight, b is the optional bias and y is the output of the node.

The activation function is necessary because most problems one wants to solve are nonlinear. Thus, the model used to solve the problem has to be nonlinear as well, which an ANN without activation functions is not, since multiplication and scalar addition are linear operations.
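To make this concrete, here is a minimal Python sketch of the node operations in Fig. (2); the input, weights, bias and the choice of ReLU activation are hypothetical example values, not taken from the thesis.

import numpy as np

def relu(z):
    # ReLU activation: max(0, z)
    return np.maximum(0.0, z)

def node_output(x, w, b):
    # Weighted sum of the inputs plus bias, passed through the activation.
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # example input
w = np.array([0.1, 0.4, -0.2])   # example weights
b = 0.05                         # optional bias
print(node_output(x, w, b))      # the node output y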

2.1.2 Activation functions

There are a number of different activation functions. The most commonly used ones are Tanh, the Sigmoid function, ReLU and Leaky ReLU. Tanh and the Sigmoid function, which are shown in Fig. (3), are quite similar, as both change most rapidly when the input value approaches zero. The difference is that Tanh ranges from -1 to 1 and the Sigmoid function from 0 to 1.


Figure 3 – The Sigmoid function and the hyperbolic tangent function tanh(x).

The Sigmoid function is often used when the output should be a probability, since a probability always ranges between 0 and 1. Tanh, on the other hand, is more often used in classification problems, such as telling whether an animal is a cat or a dog. A problem which may arise with both of these functions is the "vanishing gradient" problem. We will look more into this in section 2.2.3.
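For reference, the two functions can be written as

σ(x) = 1 / (1 + e^{−x}) and tanh(x) = (e^{x} − e^{−x}) / (e^{x} + e^{−x}).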

ReLU and Leaky ReLU (LReLU), shown in Fig. (4), are two other activation functions which are commonly used in deep learning and convolutional neural networks, which we will investigate further in section 2.3.1. These functions are less prone to the "vanishing gradient" problem and are computationally more efficient, which results in faster training of networks.


Figure 4 – The ReLU function, ReLU(x) = max(0, x), and the Leaky ReLU function, LReLU(x) = max(ax, x), with a = 0.05.

2.2 Training

So far we have discussed the structure of ANNs and how, with the help of activation functions, they can be trained to solve non-linear problems. Training an ANN requires a suitable dataset, an optimization algorithm (often called an optimizer) and a loss function. One pass over the entire dataset is called an epoch.

An ANN solves problems by, in a way, representing a function F such that the ANN yields F(x) → y, where x is the input data and y is the target output. This function consists of all the weights of the ANN, which we want to optimize.


2.2.1 Loss function

The loss function calculates the difference between the output and the target. One of the most common choices is the Mean Square Error (MSE), calculated as

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)².   (1)

Here n is the number of training examples, y_i is the i:th ground truth and ŷ_i is the i:th output. A similar way of calculating the loss is the Mean Absolute Error (MAE), given by

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|.   (2)

These two loss functions behave quite differently from each other and will result in two differently trained networks. MSE is heavily affected by outliers, since it squares the error, so the error from outliers becomes very large. For MAE the outlier errors do not peak to the same extent and therefore do not have as big an effect on the training.

Using MSE in an image-to-image network will therefore often result in a blurry output image. The reason is that a blurred image does not have any outliers which can make the MSE error peak; thus a blurry, pixel-averaged image will often result in a low MSE loss. MAE is therefore often a better option when working with image-to-image problems [9].
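As a minimal PyTorch sketch of the two losses applied to image tensors; the tensor shapes and random values are hypothetical.

import torch
import torch.nn as nn

# Hypothetical batch of predicted and target grey-scale images (N, C, H, W).
output = torch.rand(12, 1, 64, 64)
target = torch.rand(12, 1, 64, 64)

mse = nn.MSELoss()  # Eq. (1)
mae = nn.L1Loss()   # Eq. (2); MAE is called L1 loss in PyTorch

print(mse(output, target).item())
print(mae(output, target).item())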

2.2.2 Optimization Algorithms

The most basic way of optimizing a function is by using an algorithm known as Gradient Descent (GD). This is a first order optimizer which simply calculates the gradient of the loss function, L, with respect to the weights and bias as

∇L = ( ∂L/∂b, ∂L/∂w_1, ∂L/∂w_2, ..., ∂L/∂w_n )   (3)

where w are the weights and b is the bias. We then also have a learning rate α, which is called a hyper-parameter, and the weights are updated as

w_i' = w_i − α ∂L/∂w_i   (4)

where the minus sign is there because we want to minimize the loss, not maximize it. This is how the loss is calculated and the weights are updated in a single layer. GD is a very basic way of optimizing and therefore comes with some flaws. The biggest flaw is that GD requires lots of memory, since the weights are updated only after calculating the loss and the gradient over the whole dataset.

This flaw can be mitigated by updating the weights on small samples of the dataset, called batches, or on every single training sample. Calculating the gradient of one batch or sample and updating the weights accordingly results in a less accurate step towards the minimum of the loss function, but doing so repeatedly on many batches or samples gives a similar result using less memory and computing power. However, if the dataset at hand has a lot of variation, the loss can increase even after reaching a global minimum if the weights are updated for every single sample or if the batch size is small enough. These methods are called Stochastic Gradient Descent (SGD) and Mini-Batch Gradient Descent. They are not bulletproof either and have a tendency to get stuck at saddle points; one idea for fixing this problem is to introduce momentum.

Momentum has gotten its name from the physical property of the same name, since it works similarly. Momentum is added to the weight update in such a way that the prior gradients also have an effect on the update. The weight update then consists of two parts,

W_{j+1} = W_j − V_j   (5)

where W are the weights and V is given by

V_j = β V_{j−1} + α ∇L(W_j)   (6)

where β is the momentum hyper-parameter. Introducing momentum not only prevents the optimization from getting stuck at saddle points; it also reduces oscillations, makes the training less sensitive to variation in the dataset and, as a result of all this, converges faster. On the other hand, there is a risk of missing the global minimum completely if the momentum is set too high. Therefore, it is a good idea to also try to foresee what the loss looks like ahead.

This is done by changing Eq. (6) into

V_j = β V_{j−1} + α ∇L(W_j − β V_{j−1}).   (7)

The idea is that the upcoming weights are partly given by W_j − β V_{j−1}, so calculating the loss gradient at this point makes the updates slow down as we approach a minimum. This method is known as Nesterov Accelerated Gradient (NAG) [10]. What all these methods have in common is that the hyper-parameters have to be chosen beforehand and are then held constant throughout the training. This seems unlikely to produce the optimal network, since not all weights are used equally often, and a lower learning rate is usually beneficial towards the end of training, when we are getting close to the global minimum.
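A small NumPy sketch of the momentum update in Eqs. (5)–(6); the loss gradient, learning rate and momentum values are hypothetical placeholders.

import numpy as np

def grad_L(W):
    # Hypothetical loss gradient, here for the simple quadratic loss L = ||W||^2 / 2.
    return W

alpha, beta = 0.1, 0.9       # learning rate and momentum hyper-parameters
W = np.array([1.0, -2.0])    # initial weights
V = np.zeros_like(W)         # momentum term, initialized to zero

for _ in range(100):
    V = beta * V + alpha * grad_L(W)   # Eq. (6); use grad_L(W - beta * V) for NAG, Eq. (7)
    W = W - V                          # Eq. (5)

print(W)  # approaches the minimum at the origin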

An algorithm which does this is Root Mean Squared propagation (RMSprop) [11], which updates the weights as

W_{j+1} = W_j − α/(√U_j + ε) ∇L(W_j)   (8)

where U is given by

U_j = γ U_{j−1} + (1 − γ) (∇L(W_j))².   (9)

In Eq. (9), γ is a parameter which can be chosen at will but is most often set to 0.9, and ε is just a small value to avoid division by zero. In practice, these equations give every weight a separate learning rate, corresponding to how often it is updated.

Weights which occur less often and have smaller gradients get an increased learning rate. U_j also decays exponentially, which prevents the model from ceasing to learn altogether.

The Adaptive Moment Estimation (Adam) optimizer [12], which is the optimizer of choice in this thesis, is similar to RMSprop but also includes a decaying average of non-squared gradients. This is similar to momentum, hence the name. The Adam update scheme is written as

W_{j+1} = W_j − α/(√v̂_j + ε) m̂_j   (10)

where ε is 10^{−8} and m̂ and v̂ are given as

m̂_j = m_j / (1 − β_1^j) and v̂_j = v_j / (1 − β_2^j)   (11)

where β_1 and β_2 are values close to 1 and m and v are calculated as

m_j = β_1 m_{j−1} + (1 − β_1) ∇L(W_j) and v_j = β_2 v_{j−1} + (1 − β_2) (∇L(W_j))².   (12)

As we can see, v is taken directly from the RMSprop approach, while m, which is the momentum part, is new. Both m and v are initialized as zero vectors and are thus biased towards zero in the beginning; Eq. (11) is a bias correction to make up for this. Suitable values for β_1 and β_2 have been found to be 0.9 and 0.999.
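A compact NumPy sketch of the Adam updates following Eqs. (10)–(12); the gradient function and weight values are illustrative placeholders.

import numpy as np

def grad_L(W):
    # Hypothetical loss gradient, here for L = ||W||^2 / 2.
    return W

alpha, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8
W = np.array([1.0, -2.0])
m = np.zeros_like(W)  # decaying average of gradients (momentum part)
v = np.zeros_like(W)  # decaying average of squared gradients (RMSprop part)

for j in range(1, 1001):
    g = grad_L(W)
    m = beta1 * m + (1 - beta1) * g        # Eq. (12)
    v = beta2 * v + (1 - beta2) * g**2     # Eq. (12)
    m_hat = m / (1 - beta1**j)             # Eq. (11), bias correction
    v_hat = v / (1 - beta2**j)             # Eq. (11)
    W = W - alpha / (np.sqrt(v_hat) + eps) * m_hat   # Eq. (10)

print(W)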

2.2.3 Feedforward and Backpropagation

To update all the weights in a multilayer network, a method called backpropagation is used. This method propagates backwards through all the layers and updates the weights using the chain rule. In Fig. (5) we can see an example of a very simple multilayer network.


Figure 5 – A simple multilayer ANN

Starting with the feedforward part of the training and using the notation from Fig. (5), we have

h_1 = w_{11}^{(1)} x_1 + w_{21}^{(1)} x_2 + w_{31}^{(1)} and h_2 = w_{12}^{(1)} x_1 + w_{22}^{(1)} x_2 + w_{32}^{(1)}.   (13)

Let us call the activation function in h_1 and h_2 f. We then also have

h = w_{11}^{(2)} f(h_1) + w_{21}^{(2)} f(h_2)   (14)

and, with the same activation function in the h node, the output ŷ from the network will be

f(h) = ŷ.   (15)

The output is thus dependent on many different variables. Letting w^{(1)} and w^{(2)} be the weights in layer 1 and 2 respectively and x the node values from the first layer, Eqs. (13), (14) and (15) combine into

ŷ = f ∘ w^{(2)} ∘ f ∘ w^{(1)}(x).   (16)

This is the feedforward part of an ANN.

Recall Eq. (3), which we used to calculate the gradient of the loss. If this equation is applied to the first layer in the network, i.e. the last one to have its weights updated through backpropagation, the loss function depends on many more variables.

In the network shown in Fig. (5), the w^{(1)} part of the gradient would, by the chain rule, take a form such as

∂L/∂w_{11}^{(1)} = ∂L/∂ŷ · f'(h) · w_{11}^{(2)} · f'(h_1) · x_1   (17)

showing how the gradient of the first layer depends on all the subsequent layers of the network. This product of multiple derivatives, which all have small values less than one, will be tiny. Looking at Fig. (6) we can see that a multiplication of many derivatives of the Sigmoid function would end up as a vanishingly small product. This is what is known as the "vanishing gradient" problem.

Figure 6 – A plot of the Sigmoid function and its derivative.
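To get a feel for the magnitudes involved: the derivative of the Sigmoid function is at most σ'(0) = 1/4, so a product of, say, ten such factors is bounded by (1/4)^{10} ≈ 10^{−6}, and the gradient reaching the first layers of a deep network all but vanishes.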

2.3 Deep learning and Convolutional Neural Networks

We have already touched on the topic of deep learning, the multilayer network being a sort of deep network. Deep learning networks are simply ANNs with many layers, i.e. with a deep structure. A Convolutional Neural Network (CNN) is a type of deep learning network commonly used for analyzing images. The name is somewhat self-explanatory, since the definition of a CNN only requires one of the layers to be a convolutional layer, which uses convolution (more on this in section 2.3.2) instead of a matrix multiplication.


2.3.1 The U-Net architecture

A U-Net architecture, first created in 2015 [13], is a widely used architecture in the field of image analysis. The name comes from the shape of the network, which is shown in Fig. (7).

Figure 7 – The architecture of a U-Net. The number of channels grows from 1 to 64, 128, 256 and 512 down the encoder, reaches 1024 at the bottom, and shrinks back to 64 up the decoder before a final 1-channel output. Legend: conv 3x3 + ReLU, copy and crop, max pool 2x2, up-conv 2x2, conv 1x1.

The input is an image which enters from the left in the figure. A 3x3 convolution is applied to the image, in combination with a ReLU activation function. This is done twice before the image goes through a 2x2 max pooling layer. These steps are repeated three more times until we reach the bottom of the U-Net. This part of the network is called the encoding part, and the next part is called the decoding part. The decoding part consists of similar operations, but instead of max pooling layers, up-convolutional (transposed convolutional) layers are used. A sketch of one encoder step is given below.
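As a rough illustration, a PyTorch sketch of one encoder step of a U-Net (two 3x3 convolutions with ReLU followed by 2x2 max pooling); the channel counts follow Fig. (7), but this module is a hypothetical simplification (it pads the convolutions, unlike the original U-Net), not the exact network used in the thesis.

import torch
import torch.nn as nn

class EncoderStep(nn.Module):
    # One U-Net encoder step: (conv 3x3 + ReLU) x 2, then max pool 2x2.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x):
        skip = self.double_conv(x)   # kept for the "copy and crop" skip connection
        return self.pool(skip), skip

x = torch.rand(1, 1, 128, 128)       # one grey-scale input image
down, skip = EncoderStep(1, 64)(x)   # 1 -> 64 channels, spatial size halved
print(down.shape, skip.shape)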

2.3.2 Convolutional layers

The convolutional layers in a CNN are used to identify different features in an image. For example, kernels such as

0 0 0    1 0 0    0 1 0
1 1 1    0 1 0    0 1 0
0 0 0    0 0 1    0 1 0   (18)

would be used to identify horizontal, diagonal and vertical features respectively in an image. These kernels act as feature filters by striding across the height and width of the image and computing a dot product between the kernel and the corresponding pixels in the image. An example of this can be seen in Fig. (8).

Figure 8 – To the left, the input image in the form of a matrix of pixel values; in the middle, a kernel which acts as a horizontal filter; and to the right, the so-called activation map.
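A minimal sketch of such a convolution in Python using SciPy; the input matrix here is a hypothetical example, not the exact values from Fig. (8).

import numpy as np
from scipy.signal import correlate2d

# Hypothetical 4x4 image and a horizontal 3x3 filter.
image = np.array([[7, 2, 1, 3],
                  [1, 3, 3, 4],
                  [9, 3, 6, 7],
                  [1, 4, 3, 5]])
kernel = np.array([[0, 0, 0],
                   [1, 1, 1],
                   [0, 0, 0]])

# Cross-correlation, as used in CNNs; 'valid' slides the kernel only over
# positions where it fully overlaps the image, giving a 2x2 activation map.
activation_map = correlate2d(image, kernel, mode="valid")
print(activation_map)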

Each convolutional layer has a number of input channels, which is the depth of the input, and a number of output channels; both are parameters chosen by the user. The user also defines kernel size, stride, padding and dilation. Stride determines how big the steps taken by the kernel are, and padding adds a border of zeros around the edges of the image. Padding is used when the user does not want the size of the image to decrease, or at least not too much. Lastly, dilation is visualized in Fig. (9).

Figure 9 – Dilation examples: dilation 1 with kernel size 2x2, dilation 2 with kernel size 2x2, and dilation 2 with kernel size 3x3. The kernel is represented by the blue squares and is applied to the gray matrix of pixel values.


The example given above in Fig. (8) has one input channel and one output channel. If the image is an RGB image, there will be three input channels. In the U-Net we go from one input channel to 64 output channels. This means 64 different kernels are used, resulting in an activation map with 64 channels, as visualized in Fig. (10).

Figure 10 – The convolution from Fig. (8) repeated with several kernels: each kernel is applied to the input matrix of pixel values and produces its own channel of the activation map.

Each of the kernels applied to the original image is a randomly initialized filter which is then optimized during training to extract important features from the image. Further into the U-Net, the depth grows to 1024. Since three different filters can find horizontal, vertical and diagonal features in an image, one can only imagine all the different features a U-Net is able to extract from an image.

2.3.3 Max pooling

Max pooling is a way of downsampling an image. A max pooling layer is similar to a convolutional layer in the sense that both consist of a kernel which makes its way across the matrix of pixels. However, instead of multiplying the kernel with the numbers in the matrix, a max pooling kernel simply picks the maximum value of the pixels it covers. An example of max pooling with kernel size 2x2 and stride set to 2 is shown in Fig. (11).


Figure 11 – Max pooling with kernel size 2x2 and stride = 2.
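A short PyTorch sketch of the same operation; the 4x4 input matrix is a hypothetical example.

import torch
import torch.nn.functional as F

# Hypothetical 4x4 image, shaped (batch, channels, height, width).
x = torch.tensor([[7., 2., 1., 3.],
                  [1., 3., 3., 4.],
                  [9., 3., 6., 7.],
                  [1., 4., 3., 5.]]).reshape(1, 1, 4, 4)

pooled = F.max_pool2d(x, kernel_size=2, stride=2)
print(pooled.squeeze())   # each 2x2 block is reduced to its maximum value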

2.3.4 Transposed Convolution

Transposed convolution is an upsampling method. It works similarly to a convolutional layer, but with some modifications to make sure that the operation performs an upsampling instead of a downsampling. Stride and padding therefore, in a way, have the opposite effect on the output compared to regular convolution. The user-chosen stride (s) results in s − 1 zeros being inserted between all the input values in the matrix. The padding (p) and kernel size (k) together determine the padding actually applied, according to k − p − 1. The stride actually used when sliding the kernel is then always one. Tab. (1) summarizes the differences between regular and transposed convolution.

Table 1 – A comparison between regular and transposed convolution, where the user defines stride (s), padding (p) and kernel size (k). i is the input size, so the output size can be controlled by choosing suitable parameters.

Conv. type    Zero insertions   Padding     Stride   Output size
Regular       0                 p           s        (i + 2p − k)/s + 1
Transposed    s − 1             k − p − 1   1        (i − 1)s + k − 2p
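To make the formulas in Tab. (1) concrete, a small Python check that a 2x2 transposed convolution with stride 2 (as in the U-Net's up-conv) restores the spatial size halved by the corresponding regular convolution; the sizes chosen are arbitrary examples.

def conv_out(i, k, p, s):
    # Output size of a regular convolution, per Tab. (1).
    return (i + 2 * p - k) // s + 1

def tconv_out(i, k, p, s):
    # Output size of a transposed convolution, per Tab. (1).
    return (i - 1) * s + k - 2 * p

i, k, p, s = 64, 2, 0, 2
down = conv_out(i, k, p, s)      # 64 -> 32, a downsampling step
up = tconv_out(down, k, p, s)    # 32 -> 64, the up-conv 2x2 restores the size
print(down, up)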

2.4 Gaussian Beam and the Point Spread Function

When light travels through space it spreads, as we know. If the light is a Gaussian beam, as idealized monochromatic laser light is, the spreading can be described by a Gaussian function, which is shown in Fig. (12).


Figure 12 – A Gaussian beam spreads according to a Gaussian function of the radial position. W is the original width of the beam.

This phenomenon makes images taken with a fluorescence microscope appear more or less blurry. The blur can be represented by a Gaussian blur, which uses the Gaussian function

G(x, y) = (1 / (2πσ²)) e^{−(x² + y²) / (2σ²)}   (19)

where x and y are the distances from the origin along the horizontal and vertical axes and σ is the standard deviation. G(x, y) is then used to produce a so-called Gaussian kernel, which is a square array of values such as

(1/273) ×
1   4   7   4   1
4  16  26  16   4
7  26  41  26   7
4  16  26  16   4
1   4   7   4   1   (20)
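A small NumPy sketch that builds such a kernel by sampling Eq. (19) on a grid and normalizing; the kernel size and σ are example values, and a 5x5 kernel with σ = 1 gives values close to the integer-approximated kernel in Eq. (20).

import numpy as np

def gaussian_kernel(size, sigma):
    # Sample G(x, y) from Eq. (19) on a size x size grid centred at the origin.
    ax = np.arange(size) - (size - 1) / 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()   # normalize so the kernel sums to one

print(np.round(gaussian_kernel(5, 1.0), 4))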

(25)

2.5 Translation- and optimal rotation matrix

To align two images, one can select a number of matching positions in the two images and save these positions in two vectors. With these vectors, which we call P and Q, we can find the optimal rotation matrix and translation vector using the algorithm below.

% Compute the centers of mass, p and q, of the two point sets.
p = (1/n) * sum(P, 2);
q = (1/n) * sum(Q, 2);
% Shift the point sets such that their centers of mass align with the origin.
x = P - p;
y = Q - q;
% Compute the covariance matrix S.
S = x * y';
% Compute the singular value decomposition of S.
[U, ~, V] = svd(S);
% Create an identity matrix of the same size as S.
ID = eye(size(S, 1));
% Set the last element to the determinant of V*U' to avoid a reflection.
ID(end, end) = det(V * U');
% Compute the optimal rotation matrix R.
R = V * ID * U';
% Compute the optimal translation vector t.
t = q - R * p;


3 Methods

3.1 Deconvolution of the point spread function in fluorescent images

One of the most important aspects of solving this problem is having a good dataset. A good dataset can train a model to produce a deblurred output image from a blurred input image. Since we only have a blurred image to start with, we can blur this original image one step further and use that as our training image, with the original as the target. This was done by applying the right amount of Gaussian Blur (GB), as described in section 2.4 and with the help of the OpenCV [14] function GaussianBlur [15]. The right amount, found by trial and error, was a kernel size of 3 and σ = 1. If too much GB was applied, the model seemed to have trouble distinguishing important features in the images; if too little, the output from the model and the input were close to indistinguishable. A sketch of this preparation step is given below.
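A rough sketch of the dataset preparation, assuming grey-scale images on disk; the file paths are hypothetical.

import cv2

# Load an original fluorescent image (hypothetical path); it serves as the target.
target = cv2.imread("original.png", cv2.IMREAD_GRAYSCALE)

# Blur it one step further to create the network input; kernel size 3 and
# sigma = 1 are the values found by trial and error in the thesis.
blurred_input = cv2.GaussianBlur(target, ksize=(3, 3), sigmaX=1)

cv2.imwrite("blurred_input.png", blurred_input)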

The target images also included some noise, which is inherent to fluorescence microscopy images. When similar noise was added to the blurred images, it disturbed the model in such a way that it tried to remove or shift the noise rather than reduce the effect of the PSF. Noise was therefore not used for the model which performed best.

The paired images, the blurred ones and the targets, were then split into a validation and a training set, randomly cropped in the same way and fed to the network in the training loop. At the time, the images were thought to have been normalized by the torchvision function ToTensor [16], but in hindsight it was discovered that this was not the case. Thus, the network which was found to produce the best output is trained to be applied to unscaled images. In Fig. (13) an example of a target image is shown in (a) and (b) and a blurred image in (c) and (d).


Figure 13 – In (a) we can see an original (target) image, in (b) a zoomed-in tile of the original image, in (c) the blurred version of the image and in (d) the same zoomed-in tile of the blurred image.

The U-Net with the best performance was trained on a dataset containing 1792 images. These microscopy images were acquired using the Incucyte® Live-Cell Analysis System. The training process used the Adam optimizer and MAE as the loss function. The learning rate was set to 3 × 10^{−3}, the batch size to 12, and the network was trained over 50 epochs. However, the smallest validation loss was found at epoch 37, which also produced the best visual output. A sketch of such a training loop is given below.
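A condensed PyTorch sketch of a training loop with these settings; the model and data here are hypothetical placeholders, with a tiny Sequential network standing in for the U-Net of section 2.3.1.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: paired blurred inputs and sharp targets as tensors.
inputs = torch.rand(64, 1, 128, 128)
targets = torch.rand(64, 1, 128, 128)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=12, shuffle=True)

model = nn.Sequential(  # placeholder for the actual U-Net
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1)
)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)   # settings from the thesis
criterion = nn.L1Loss()                                     # MAE loss

for epoch in range(50):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()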


3.2 Out of focus to projection

The idea for this part of the thesis was to take a similar approach to the previous part. The main difference was that the dataset in this case was already prepared, with out-of-focus and target images. However, there was an issue with the provided dataset: the image pairs were not quite aligned. Since the U-Net performs a pixel-to-pixel mapping, it was not able to converge on such a dataset.

To resolve this issue, the images had to be aligned in a pixel-to-pixel manner. The alignment algorithm presented in section 2.5 calculates the translation vector and rotation matrix which perform this alignment. The alignment only works if the input vectors of positions match perfectly. By applying a GB and a threshold, a binary image of the cluster was obtained. An example can be seen in Fig. (14).


Figure 14 – In (a) we can see the original target image and in (b) its binary representation.

The two vectors of positions was found using OpenCV:s function findContours [17].

This function returns a vector of positions for the contours of the binary image which is then used as an input for the align algorithm. This model was then trained in the same manner as the previous one.


4 Results

4.1 Deblurring images

During this thesis many models were trained and different results were achieved. In Tab. (2) data from the different models are presented.

Table 2 – A comparison between different models which were trained in the same way.

Model   N. epochs   Learning rate   MAE val. loss   Min. loss at
A       50          0.0003          0.644           Epoch 37
B       45          0.0003          0.722           Epoch 32
C       50          0.0005          0.845           Epoch 33
D       45          0.0005          1.089           Epoch 32

In Fig. (15) a comparison between the outputs from the different models presented in Tab. (2) is shown. The difference between the outputs of the models is not substantial, although the outputs from models A and B are somewhat brighter than those from C and D. The brighter ones are more similar to the original image. Comparing (a) and (b), one can note that the lines in (a) are a little brighter and clearer overall.


Figure 15 – Outputs from models A, B, C and D in the corresponding subfigures (a), (b), (c) and (d).

Since model A was deemed to perform best, the following images are outputs from that model. In Fig. (16) an original image is compared to an output image from the model, with the original image as input.


Figure 16 – In (a) we can see the original image, in (b) a zoomed-in tile of the original image, in (c) an output image produced by the model with the original image as input and in (d) the same zoomed-in tile of the output image.

Here, the effect of the PSF is less prevalent in the output image, and the light is, in a way, squeezed back into its place. In Fig. (17) the image has been passed through the model twice, achieving a more aggressive deblurring and reduction of the effects of the PSF.


Figure 17 – In (a) we can see the original image, in (b) a zoomed-in tile of the original image, in (c) an output image produced after passing the original image through the model twice and in (d) the same zoomed-in tile of the output image.

The images shown so far are of the same character as those the model was trained on. In Fig. (18) we can see an entirely different type of image, a cluster of cells instead of neurons. The output image shown here has been passed through the model twice.


Figure 18 – In (a) we can see the original image, in (b) a zoomed-in tile of the original image, in (c) an output image produced by the model with the original image as input and in (d) the same zoomed-in tile of the output image.

In this figure the output becomes less blurry and the light has once again been squeezed back into its source.

4.2 Out of focus to optimal projection

In this part there was an issue with the dataset: the image pairs were not completely aligned. Fig. (19) shows the result after training a model on such a dataset.



Figure 19 – In (a) we can see the input image, in (b) the output from a trained model and in (c) the target.

The output is, as expected, more similar to the target than the input is, although not similar enough to be considered a successful result.

The attempts at creating an alignment algorithm were not quite successful; although it worked on some images in the dataset, it never got to the point where it worked on all of them, which made the training of the network unsuccessful.


5 Discussion

5.1 Deblurring images

The results show that deblurring images, and in that sense reducing the effect of the PSF, is possible with a U-Net architecture. This deblurring makes it easier to get biological insight from the details in the fluorescent images and can thus simplify and accelerate the development of new pharmaceuticals. The results also indicate that the PSF of fluorescence microscopy images can be well approximated with a standard Gaussian blur.

Despite the results looking promising, this technique is not thoroughly tested yet. The model is only trained on one dataset containing one type of images. However, the model still performs well on another type of image, different from those it was trained on, as shown in Fig. (18). This indicates that the model could work on many more fluorescent grey-scale images, but further testing is needed to confirm that.

The model could also be improved further. In its current state it produces checkerboard artefacts in the images if they are passed through the model more than once, which can be seen in the background of Fig. (18). This is a result of the upsampling part of the U-Net, which uses transposed convolution. Replacing the transposed convolution with another upsampling method, such as bilinear or nearest-neighbour upsampling followed by a regular convolution, could improve the model and get rid of these artefacts [18].

There is also a possibility that some other network architecture could perform similarly to or better than the U-Net architecture used here. Testing other architectures and comparing with the results from the U-Net would therefore be a good idea when trying to find the optimal deblurring model. The results found in this project have not been compared to traditional methods of deconvolving the PSF either. The reason this has not been investigated is that the aim of the project was to investigate whether a satisfying result can be achieved using deep learning, not to find the optimal deblurring method.

5.2 Out of focus to projection

The second part of this thesis, the out-of-focus-to-projection part, was less successful. This was partly, or mainly, due to the dataset not being suitable for the chosen network architecture. The suggested approach to resolve this problem, using an optimal rotation matrix and translation vector, was not fully investigated due to time constraints. This approach used the contour of the cell cluster, found using the Python library OpenCV's function findContours. The function is applied to the out-of-focus image and to the projection image, giving two vectors filled with positions corresponding to the contours of the two images. One issue here is that it is not guaranteed that the positions in the two vectors correspond to one another; this is likely the reason the alignment did not work sufficiently well. One idea for fixing this is to look at the distance from the center of mass of the cluster and shift the vectors such that the position with the largest distance from the center is at index 0. This could possibly resolve the issue.

Another way of approaching the problem is to change the network architecture. The U-Net is a pixel-to-pixel network which does not remap any pixels. Changing to some type of Generative Adversarial Network (GAN) [19], which is not a pixel-to-pixel network, could also likely resolve the issue.


References

[1] Kenneth Castleman et al. "Karyotype analysis by computer and its application to mutagenicity testing of environmental chemicals". In: Mutation Research 41 (Dec. 1976), pp. 153–61. doi: 10.1016/0027-5107(76)90085-3.

[2] Stephan Wienert et al. "Detection and Segmentation of Cell Nuclei in Virtual Microscopy Images: A Minimum-Model Approach". In: Scientific Reports 2 (July 2012), p. 503. doi: 10.1038/srep00503.

[3] O. Ronneberger, P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation". In: LNCS 9351 (2015), pp. 234–241. (Available on arXiv:1505.04597 [cs.CV].) url: http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a.

[4] Hsieh Fu Tsai et al. "Usiigaci: Instance-aware cell tracking in stain-free phase contrast microscopy enabled by machine learning". In: SoftwareX 9 (Jan. 2019), pp. 230–237. issn: 2352-7110. doi: 10.1016/j.softx.2019.02.007.

[5] Felix Buggenthin et al. "Prospective identification of hematopoietic lineage choice by deep learning". In: Nature Methods 14 (Feb. 2017), pp. 403–406. doi: 10.1038/nmeth.4182.

[6] Martin Weigert et al. "Isotropic reconstruction of 3D fluorescence microscopy images using convolutional neural networks". In: (Apr. 2017).

[7] Hao Zhang et al. "High-throughput, high-resolution deep learning microscopy based on registration-free generative adversarial network". In: Biomedical Optics Express 10 (Mar. 2019), p. 1044. doi: 10.1364/BOE.10.001044.

[8] Sungjun Lim et al. "Blind Deconvolution Microscopy Using Cycle Consistent CNN with Explicit PSF Layer". In: (Apr. 2019).

[9] Christopher Thomas. Deep learning image enhancement insights on loss function engineering. url: https://towardsdatascience.com/deep-learning-image-enhancement-insights-on-loss-function-engineering-f57ccbb585d7. Accessed: 05.28.2020.

[10] Y. E. Nesterov. "A method for solving the convex programming problem with convergence rate O(1/k²)". In: Dokl. Akad. Nauk SSSR 269 (1983).

[11] G. Hinton. Lecture 6e – RMSprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning. 2012.

[12] Diederik P. Kingma and Jimmy Ba. "Adam: A Method for Stochastic Optimization". Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015. url: http://arxiv.org/abs/1412.6980.

[13] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation". In: CoRR abs/1505.04597 (2015). arXiv: 1505.04597. url: http://arxiv.org/abs/1505.04597.

[14] G. Bradski. "The OpenCV Library". In: Dr. Dobb's Journal of Software Tools (2000).

[15] OpenCV. OpenCV – GaussianBlur Documentation. url: https://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html?highlight=gaussianblur#gaussianblur. Accessed: 05.28.2020.

[16] PyTorch. torchvision – ToTensor Documentation. url: https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.ToTensor. Accessed: 05.28.2020.

[17] OpenCV. OpenCV – findContours Documentation. url: https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?highlight=find%5C%20contours#cv2.findContours. Accessed: 05.28.2020.

[18] Augustus Odena, Vincent Dumoulin, and Chris Olah. "Deconvolution and Checkerboard Artifacts". In: Distill (2016). doi: 10.23915/distill.00003. url: http://distill.pub/2016/deconv-checkerboard.

[19] Jun-Yan Zhu et al. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks". In: Computer Vision (ICCV), 2017 IEEE International Conference on. 2017.
