
Evaluation of In-Silico Labeling for Live Cell Imaging



Elsa Sörman Paulsson

elsa89p@hotmail.com


Supervisors:
Rickard Sjögren, Sartorius Stedim Data Analytics AB
Christoffer Edlund, Sartorius Stedim Data Analytics AB
Lucas Hedström, Umeå University, Department of Physics

Examiner: Patrik Ryden, Umeå University, Department of Mathematics and Mathematical Statistics

Copyright © 2021. All Rights Reserved.

Master of Science Thesis in Engineering Physics, 30 ECTS. Department of Physics

Umeå University


Abstract


Acknowledgements


Contents

1 Project description
  1.1 Background
  1.2 Project Aims
2 Theory
  2.1 Machine Learning
  2.2 Artificial Neural Networks
  2.3 Training
    2.3.1 Forward pass
    2.3.2 Loss function
    2.3.3 Backpropagation
    2.3.4 Optimization algorithm
  2.4 Convolutional Neural Network (CNN)
    2.4.1 Convolutional Layer
    2.4.2 Padding
    2.4.3 Pooling layers
    2.4.4 Dropout
    2.4.5 Activation function
    2.4.6 Batch normalization
    2.4.7 U-Net
  2.5 In-Silico Labeling
  2.6 Data and Pre-processing
3 Method
  3.1 Pixel-wise regression
  3.2 Pixel-wise classification
  3.3 Single-Cell classification
4 Results
  4.1 Pixel-wise regression
  4.2 Pixel-wise classification
  4.3 Individual cell classification
5 Discussion
  5.1 Pixel-wise regression
  5.2 Pixel-wise classification
  5.3 Individual cell classification


1 Project description

1.1 Background

High-throughput microscopy is an indispensable tool for studying biology and the effects of new treatments during drug development. Studying how new drugs affect cell cultures grown in wells, instead of whole organisms, allows time, cost, and animal testing to be minimized.

Compared to molecular analysis, imaging is non-invasive, allowing us to follow live cells over time to gain a rich insight into biology. In addition to light microscopy, fluorescent probes are commonly used to study biology in finer detail by labeling objects too small to see using light microscopy. However, fluorescence microscopy requires long exposure times, which impose a trade-off between visible detail and the number of images due to phototoxicity. Phototoxicity is the damage done to imaged cells by illumination at high laser power or for prolonged periods, and it can negatively influence cell survival and data quality. The number of labeled targets that can be captured simultaneously is also limited by the number of filters and light sources that can be used, due to spectral overlap and physical space within the instrument.

Using computer vision in biological imaging dates back many decades, and it is becoming ever more important in order to handle the output from modern high-throughput instruments. The field of computer vision, not only limited to cell imaging, has in the past decade been revolutionized by deep learning using so-called convolutional neural networks (CNNs). In live-cell imaging, deep learning is increasingly used to detect and segment cells, follow cell movement over time, forecast cell differentiation, and much more. A workflow called In-Silico Labeling was recently published where CNNs were trained to predict the corresponding fluorescent label of light microscopy images.[1] This workflow effectively increases the number of targets that can be captured by using deep learning to circumvent the problems of phototoxicity and instrument limitations. Although In-Silico Labeling has attracted much attention, the original study suffered from shortcomings, including an unorthodox model design and reliance on many more focus planes than is practical. A scientific evaluation of the possibilities and limitations of In-Silico Labeling for live-cell imaging applications is missing.

1.2 Project Aims


• Develop a strategy for In-Silico labeling working on Incucyte phase-contrast images.

• Will the problem be easier to solve with single-cell classification instead of an image-to-image prediction?

• Does pixel-wise classification improve results over pixel-wise regression in image-to-image prediction? In classification the ground-truth images are binary, while for regression they are continuous.

2 Theory

This part describes the main theory behind the project, starting with the basics of machine learning and how training is performed, continuing with convolutional neural networks (CNNs), and ending with a short description of In-Silico Labeling (ISL).

2.1 Machine Learning

Machine learning is a part of artificial intelligence that combines computer vision, statistics, data mining, and optimization. Machine learning describes methods that find relations in data based on previously shown example data. Supervised learning means that you have examples of desired input-output pairs, and machine learning methods are used to find an optimal mapping between inputs and outputs.

There are two main techniques of supervised learning: classification and regression. The first predicts a discrete output, for example whether an image shows a cat or a dog; the input in that case is an image and the label a number corresponding to an animal. Regression techniques predict continuous outputs, like temperature changes or, as in our case, emitted fluorescence.

2.2 Artificial Neural Networks


Every neuron in one layer is connected to every neuron in the next layer, which is called a fully connected layer.


Figure 2: One of the artificial neurons in the previous figure, with weights w_i, outputs from the previous layer x_i, and activation function φ.

For a given neuron in the network, see figure 2, the output y will be

y = φ(∑_{i=1}^{n} w_i x_i + b)    (1)

where n is the number of inputs, x_i the i-th output from the previous layer, w_i the corresponding weight, b a bias, and φ some nonlinear function. The nonlinear function is often called an activation function; examples are the sigmoid function and the Rectified Linear Unit, described in more detail in the CNN section 2.4.
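As an illustration, equation (1) can be written as a few lines of Python/NumPy; the input values, weights, and the choice of ReLU below are arbitrary assumptions for the example:

```python
import numpy as np

def relu(z):
    # ReLU activation: element-wise max(0, z)
    return np.maximum(0.0, z)

def neuron_output(x, w, b, phi=relu):
    # Equation (1): y = phi(sum_i w_i * x_i + b)
    return phi(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 2.0])   # outputs from the previous layer
w = np.array([0.1, 0.4, -0.3])   # weights
b = 0.05                         # bias
print(neuron_output(x, w, b))
```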

2.3 Training

The part of machine learning where the network learns is often called training, and it minimizes the difference between the outputs and the expected outputs, denoted the loss. A decreasing loss is a sign that the network has learned from previous data and that its predictions are becoming increasingly similar to the expected output. You start with a network that knows nothing, i.e. the parameters are set randomly. During training these parameters are changed iteratively to make the predicted output more like the expected output.

2.3.1 Forward pass


2.3.2 Loss function

To know how well a network performs, and to make it learn and improve, a loss function is used. It compares the output with the ground truth in a way that depends on which loss function is chosen. There are several different loss functions, and some fit a given problem better than others. Three loss functions commonly used in image analysis and in this project are the Mean Absolute Error (MAE or L1) loss, the Mean Squared Error (MSE) loss, and the Dice loss, all described below. First, the L1 loss is given by:

L1-Loss = (1/N) ∑_{n=1}^{N} |ŷ_n − y_n|    (2)

where ŷ_n is each prediction and y_n its corresponding target, with a total of N elements each.[2] The MSE loss is quite similar but with the square instead of the absolute value, and is given by

MSE-Loss = (1/N) ∑_{n=1}^{N} (ŷ_n − y_n)²    (3)

As in equation 2, ŷ_n is each prediction, y_n the target, and N the total number of elements.[3] One problem with the L1 loss is that the magnitude of its gradient is the same everywhere, so the updates may step past the minimum. On the other hand, the L1 loss is more robust to outliers: if the absolute error is larger than 1, its square is larger and grows quickly, so losses like MSE give more weight to outliers.

Another loss function used in this thesis work is the Smooth L1 loss, which is basically a combination of the L1 and MSE losses.[4] This loss function addresses the problems with the L1 and MSE losses described above. Let x be the absolute difference between input and target, x = |input − target|; then the Smooth L1 loss is defined as

Loss = x²/(2β)   if x < β
       x − β/2   otherwise    (4)
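In PyTorch, which was used in this work, these three regression losses are available out of the box; a minimal sketch (the tensor shapes are arbitrary, and the beta argument of SmoothL1Loss requires a recent PyTorch version):

```python
import torch
import torch.nn as nn

prediction = torch.rand(4, 1, 64, 64)   # e.g. a batch of predicted fluorescence maps
target = torch.rand(4, 1, 64, 64)       # corresponding ground-truth images

l1 = nn.L1Loss()(prediction, target)                        # equation (2)
mse = nn.MSELoss()(prediction, target)                      # equation (3)
smooth_l1 = nn.SmoothL1Loss(beta=0.5)(prediction, target)   # equation (4)
```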

(12)

These three loss functions are used for pixel-wise regression while for pixel-wise classification the Dice Loss is used. The Dice coefficient D is

D = 2 ∑_n ŷ_n y_n / ∑_n (ŷ_n² + y_n²).    (5)

Here ŷ_n stands for the predicted probability and y_n is the ground truth. Equation 5 shows that D lies on a scale from 0 to 1, measuring how well the predictions overlap the ground truth, with 1 being a complete overlap. The Dice loss is therefore defined as 1 − D.[5]
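A minimal sketch of this Dice loss in PyTorch; the small constant eps is an assumption added here to avoid division by zero:

```python
import torch

def dice_loss(pred, target, eps=1e-7):
    # pred: predicted probabilities, target: binary ground truth, same shape
    intersection = (pred * target).sum()
    denominator = (pred ** 2).sum() + (target ** 2).sum()
    dice = 2 * intersection / (denominator + eps)   # equation (5)
    return 1 - dice                                 # Dice loss = 1 - D

print(dice_loss(torch.rand(1, 1, 64, 64), torch.randint(0, 2, (1, 1, 64, 64)).float()))
```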

For classification problems another loss function is needed; here the Negative Log Likelihood (NLL) loss is useful. The NLL loss is shown in equation 6,

Loss = −log(ŷ_t)    (6)

where ŷ_t is the predicted probability of the correct class. The output from the network is a vector with as many elements as there are classes, holding the predicted probability of each class. If the output is [0.2, 0.3, 0.7, 0.1] and the target vector is [0, 0, 1, 0], then the probability of the true class is 0.7. When the training set is unbalanced, this loss function can be weighted to fit the problem better.[6]
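PyTorch's NLLLoss expects log-probabilities and an integer class index; a minimal sketch using the example numbers above (which are purely illustrative):

```python
import torch
import torch.nn as nn

log_probs = torch.log(torch.tensor([[0.2, 0.3, 0.7, 0.1]]))  # log of the predicted probabilities
target = torch.tensor([2])                                   # index of the true class
print(nn.NLLLoss()(log_probs, target))                       # -log(0.7), about 0.357

# An unbalanced training set can be handled with per-class weights
weighted_nll = nn.NLLLoss(weight=torch.tensor([1.0, 1.0, 0.5, 2.0]))
```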

The aim is to get the loss as low as possible. To get there we need to update the parameters in the network, which takes us to the next part of the training.

2.3.3 Backpropagation

After the forward propagation and the computation of the loss, we have to do something to decrease the loss. In the backpropagation step, the loss is sent backwards through the network and the gradient is computed for each weight,

∂L/∂w_i.    (7)

Equation 7 shows the partial derivative of the loss with respect to the i-th weight in the model. Analytically it is straightforward to calculate the derivatives, but a computer calculates them numerically, which can be computationally expensive. Backpropagation reduces the computational cost by utilizing the chain rule of calculus. For functions f and g, both mapping from real numbers to real numbers, let y = g(w_i) and L = f(g(w_i)) = f(y); then the chain rule is

∂L/∂w_i = (∂L/∂y)(∂y/∂w_i),    (8)

and if x and y are vectors instead, x ∈ ℝ^m and y ∈ ℝ^n, then in vector notation this can be written

∇_x z = (∂y/∂x)^T ∇_y z    (9)

where ∂y/∂x is the Jacobian matrix of g.[7] When computing the gradients, many sub-expressions are repeated, which needs to be taken into consideration. For some problems, recomputing the expressions each time is wasteful, while in other cases, storing the computed gradients can be very memory consuming.

By iterating backwards and computing the derivatives of the last layer first, redundant calculations of the intermediate terms in the chain rule are avoided. Compared to computing the gradient with respect to each weight separately, this is much less computationally expensive.
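Frameworks such as PyTorch implement this backward pass automatically through automatic differentiation; a minimal sketch, where the model and data are arbitrary placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)                  # a single fully connected layer
x = torch.rand(8, 3)                     # a mini-batch of 8 inputs
target = torch.rand(8, 1)

loss = nn.MSELoss()(model(x), target)    # forward pass and loss
loss.backward()                          # backpropagation fills the .grad fields
print(model.weight.grad)                 # dL/dw for each weight, cf. equation (7)
```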

2.3.4 Optimization algorithm

After the gradients have been computed in the backpropagation, they are used by the optimization algorithm to update the weights and reduce the loss. The optimization algorithms used here are based on the Gradient Descent (GD) method, an iterative optimization algorithm for finding the minimum of a function. GD looks at the gradients and takes steps in the direction where the loss curve decreases until it reaches a minimum value.

GD creates a new point

x′ = x − ε ∇_x f(x)    (10)

where ε is the learning rate, which decides the size of the step along the gradient direction. When choosing the learning rate there is a balance between converging and overshooting, see figure 3. A learning rate that is too big can cause steps that miss the minimum, while one that is too small increases the learning time, and you might get stuck in a local minimum.


Figure 3: The loss function C(w), with a too large learning rate to the left and a too small one to the right.

It is not always optimal to have a constant learning rate, but manually creating hyperparameters for a learning rate schedule can be a lot of work. There are algorithms that improve on standard SGD with an adaptive learning rate using different heuristic methods.[8] The Adagrad algorithm is good for sparse data: it performs larger updates for sparse parameters and smaller ones for the others. The Adam algorithm also uses momentum, that is, it takes the previous gradient into consideration:

ν_t = γ ν_{t−1} + ε ∇_Θ f(x)
x′ = x − ν_t    (11)

This is basically the same as equation 10 but with the extra parameter γ, used as a fraction of the previous gradient; γ is usually set to around 0.9.
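A minimal sketch of how such an optimizer is used in PyTorch; the Adam optimizer and the learning rate 0.0001 match the method section, while the model and data here are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
x, target = torch.rand(8, 3), torch.rand(8, 1)

for step in range(100):
    optimizer.zero_grad()                  # reset gradients from the previous step
    loss = nn.MSELoss()(model(x), target)  # forward pass
    loss.backward()                        # backpropagation
    optimizer.step()                       # parameter update, cf. equations (10)-(11)
```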

2.4 Convolutional Neural Network (CNN)

Artificial neural network models are inspired by biological neural networks, i.e. neurons connected to each other. If a network is larger, with more layers of neurons, it is called a deep learning algorithm. One popular model is the CNN.

2.4.1 Convolutional Layer

The convolutional layer is based on the mathematical operation called convolution, defined as

(f ∗ g)(t) = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ.    (12)

The convolution produces a new function of t by multiplying and integrating the two functions f and g of τ over the whole domain. The function depending on t, g in equation 12, is reversed and shifted before the multiplication. In the network this is carried out with a kernel: a small matrix that is multiplied with and shifted over the input matrix. Different kernels detect different patterns; for example, one kernel can look like

0 1 0
0 1 0
0 1 0

and it will detect vertical edges in an image. How the kernel traverses the input depends on the stride; if the stride is 1, the kernel shifts one element at a time. In figure 4 two convolution operations can be seen.

Figure 4: Convolution operation, with the input element-wise multiplied with the kernel and summed up to the output. Stride is 1.
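A minimal sketch of this operation with the vertical-edge kernel above, using PyTorch's conv2d; the input image is a random placeholder:

```python
import torch
import torch.nn.functional as F

image = torch.rand(1, 1, 8, 8)              # batch of 1, 1 channel, 8x8 pixels
kernel = torch.tensor([[0., 1., 0.],
                       [0., 1., 0.],
                       [0., 1., 0.]]).view(1, 1, 3, 3)

output = F.conv2d(image, kernel, stride=1)  # no padding, so the output shrinks to 6x6
print(output.shape)                         # torch.Size([1, 1, 6, 6])
```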


2.4.2 Padding

As can be seen in figure 4, the size decreases as the kernel traverses the input. If one wants the output to keep the size of the input, padding is often used: zeros are added around the input so that the kernel can be placed at more positions, i.e. more output elements are produced.

2.4.3 Pooling layers

The pixels in an image often have a relation to their neighbors; one pixel by itself does not give much information, and a lot of the information in the output is redundant. A pooling layer can be used to reduce the size and computational complexity. With pooling, the size of the activation map is reduced while important features are kept. The output size is reduced by the pooling factor: for example, if the pool size is 2, a square of 2x2 pixels is compared and one value transmitted to the output. Which value is transmitted depends on the pooling operation. If max pooling is used, the highest value is put in the output. There are also min and average pooling, where the smallest value and the average are transmitted, respectively. An effect of pooling is that a kernel of a given size takes a larger portion of the original image into account after pooling. This means that the kernels describe very local features early in the network, and more and more global features after a few levels of pooling.
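Continuing the convolution example, a minimal sketch of 2x2 max pooling in PyTorch (the shapes are illustrative only):

```python
import torch
import torch.nn as nn

activation_map = torch.rand(1, 1, 6, 6)               # e.g. the output of a convolutional layer
pooled = nn.MaxPool2d(kernel_size=2)(activation_map)  # each 2x2 block is replaced by its maximum
print(pooled.shape)                                   # torch.Size([1, 1, 3, 3])
```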

2.4.4 Dropout

One important aspect to consider when training a network is so-called over-fitting. This can be described as the network learning "too much" from the training data: it picks up not only the general patterns but also the noise. This makes it hard to generalize and make predictions for new inputs.

One way to minimize this is to use dropout, i.e. ignoring the output from randomly chosen nodes, re-randomized for each input, to make the training process noisy.

2.4.5 Activation function


Figure 5: The ReLU activation function to the left, and the sigmoid to the right.

Essentially, the activation function takes the output from the previous layer and converts it in some way before passing it on as input to the next layer. As we saw in figure 2, we need an activation function to get something nonlinear. During training, the gradients are computed in the backward propagation, and for that the activation function needs to be differentiable. The activation function is evaluated after each layer, in some cases millions of times, so a function that is computationally inexpensive is a good choice. Numerical differentiation also avoids the problem that the ReLU is not differentiable at x = 0: the derivative there is instead approximated with a value from either side of this point.

2.4.6 Batch normalization

Internal covariate shift (covariate shift between layers) is believed to be a primary reason why deep networks have been slow to train.[9] One way to address this is batch normalization. Batch normalization is just another layer, like the convolutional layer, placed after the convolutional layer and usually before the activation function. The forward pass of batch normalization computes the mean and variance of an input feature over one mini-batch. Let an input to a k-dimensional layer be x = [x_1, x_2, ..., x_k]; then each dimension x_j of that layer is normalized with respect to the same dimension of the other inputs in the same mini-batch. If we have m inputs in each mini-batch and x is one of its dimensions, then

μ_b = (1/m) ∑_{i=1}^{m} x_i    (13)

σ_b² = (1/m) ∑_{i=1}^{m} (x_i − μ_b)²    (14)

are the mean (μ_b) and variance (σ_b²) over the mini-batch b = [x_1, x_2, ..., x_m]. This gives the normalization

x̂_i = (x_i − μ_b) / √(σ_b² + ε)    (15)

and the batch normalization

BN_{γ,β}(x_i) = γ x̂_i + β    (16)

gives the scaled output, where γ and β are parameters to be learned during training and updated after the backward pass. ε is a constant added to the variance for numerical stability.
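A minimal NumPy sketch of this forward pass for one feature dimension, illustrating equations (13)-(16) rather than the actual layer implementation:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x holds one feature dimension across a mini-batch of m inputs
    mu = x.mean()                           # equation (13)
    var = x.var()                           # equation (14)
    x_hat = (x - mu) / np.sqrt(var + eps)   # equation (15)
    return gamma * x_hat + beta             # equation (16)

x = np.array([0.2, 1.5, -0.3, 0.8])         # arbitrary mini-batch values
print(batch_norm_forward(x, gamma=1.0, beta=0.0))
```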

2.4.7 U-Net

The U-Net is a CNN commonly used for biomedical image segmentation, developed at the University of Freiburg in Germany by Ronneberger, Fischer and Brox [10]. Compared to other solutions at the time, it requires less training data than other CNNs and yields a more precise segmentation. It has since become a very popular architecture for many biomedical applications.


2.5 In-Silico Labeling

In-Silico Labeling (ISL) means that the labels are produced by a computer: it should predict fluorescent labels from a light microscopy image, in this case a phase-contrast microscopy image. Fluorescent labeling uses fluorescent dyes that bind to certain molecules so that they become visible in UV light or fluorescence imaging. The dyes can be used to label bio-molecules such as proteins, peptides, antibodies and bacteria, or to label living cells genetically. However, fluorescent labeling has many significant drawbacks, such as inconsistency, perturbations, phototoxicity, and limitations on the number of simultaneous labels, that can be avoided with ISL.

2.6 Data and Pre-processing

The data used in machine learning needs to be split into training data, validation data, and testing data. The training data is used to train the network, and when all of it has been used to update the network, an epoch has been completed. A network is usually trained for several epochs, and one way to get more information out of the training images is to apply augmentations: cropping, blurring, rotation, or other operations that change the input image enough for the network to see it as a new image, but not so much that the general patterns and information the network needs to learn are changed. If you have large cell images, cropping can be useful: it not only decreases the training time through smaller inputs, but one image can also be cropped into several different images, giving new information each time.

When cells are spread homogeneously over an image, there should be no loss of relevant information when cropping, and rotation in any direction, as well as a bit of blurring, can also be applied without complications. If these augmentations are applied randomly and differently in each epoch, the same image is seen differently each time by the network; this can make the network learn more and decrease the risk of over-fitting.

During validation, the forward pass is the same as for training but there is no backward propagation. The loss is computed, and the validation is used to make sure the network does not over-fit, i.e. that the training loss goes down while the validation loss does the opposite.


3 Method

When training networks it is important to have a lot of good data, structured into a dataset whose form depends on the problem. The first problem to solve in this project was an image-to-image prediction with pixel-wise regression; for that, the dataset includes input images and, for each input, its corresponding target image. In the first dataset, each phase-contrast image has two corresponding channels of fluorescent images of untreated HeLa cells with a genetically inserted FUCCI marker. FUCCI is a set of fluorescent markers that fluoresce in two different colors depending on where the cell is in its cell cycle, and it is commonly used to study cell growth and division in high detail.

The second dataset consists of phase-contrast images, each with one corresponding fluorescent marker, of THP-1 cells differentiating into macrophages, a type of immune cell. The fluorescent marker indicates how far the differentiation has progressed and can be used to study the immune system. The dataset also includes a marker for dead cells. Pixel-wise classification was also tried, to see if it was a better way to solve this problem, and individual cell classification was tested on the first dataset.

3.1 Pixel-wise regression

CNNs have shown good results on problems with cell images, especially the U-Net, and that is what has been used here. Different loss functions were tried out to get as good results as possible, but only one optimizer, the Adam optimizer. The loss functions used are the Mean Absolute Error, the Mean Squared Error, and the combination called Smooth L1 loss. The Imgaug Python library was used for the augmentations; it is a good option when both input and target are images, because orientation-dependent augmentations such as rotation and cropping are applied identically to the input and target image. First, the background of the target images was removed; the images were then normalized between 0 and 1 and converted to heatmaps.
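A minimal sketch of such paired augmentation with imgaug; the specific augmenters and parameters below are assumptions for illustration, not the exact pipeline used:

```python
import numpy as np
import imgaug.augmenters as iaa

# Example pipeline: random flips, small rotations, and random crops
seq = iaa.Sequential([
    iaa.Fliplr(0.5),
    iaa.Affine(rotate=(-15, 15)),
    iaa.CropToFixedSize(width=128, height=128),
])

phase = np.random.rand(256, 256, 1).astype(np.float32)   # phase-contrast input (placeholder)
target = np.random.rand(256, 256, 1).astype(np.float32)  # fluorescence target (placeholder)

# A deterministic copy applies the exact same random transform to input and target
seq_det = seq.to_deterministic()
phase_aug = seq_det(image=phase)
target_aug = seq_det(image=target)
```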

Two different U-Net models have been used. The first has eight hidden layers with double convolutional layers, a kernel of size 3x3, and ReLU as the activation function. Between each double convolutional layer there is a max-pooling layer in the downsampling part and an up-sampling layer in the upsampling part. The activation function used in the output layer is a sigmoid, and the learning rate used is 0.0001. The total number of layers in the U-Net is 28, with two convolutional layers each for the input and output layers.
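To illustrate the structure, a minimal sketch of this kind of double-convolution U-Net in PyTorch; the depth and channel counts are placeholders, not the 28-layer model actually used:

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = double_conv(1, 16)
        self.down2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.up1 = double_conv(32 + 16, 16)                   # skip connection is concatenated
        self.out = nn.Sequential(nn.Conv2d(16, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        d1 = self.down1(x)                                    # full-resolution features
        d2 = self.down2(self.pool(d1))                        # downsampled features
        u1 = self.up1(torch.cat([self.up(d2), d1], dim=1))
        return self.out(u1)                                   # one fluorescence channel in [0, 1]

model = TinyUNet()
print(model(torch.rand(1, 1, 64, 64)).shape)                  # torch.Size([1, 1, 64, 64])
```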

The second model is also based on the standard U-Net, but the blocks of convolutions were replaced with a more modern variant [11] (see Figure 3C in that reference) using a one-shot aggregation (OSA) module, identity mapping, and efficient squeeze-and-excitation for channel-wise attention. These modifications have been shown to enable better learning compared to standard convolutions.

For this U-Net there was a pre-trained model available, so comparisons could be made between training with and without pre-training. The pre-trained model was an existing model trained on another dataset to segment cells from the background and detect cell nuclei in phase-contrast images.

3.2 Pixel-wise classification

To investigate whether pixel-wise classification is an easier approach than regression, the target images were changed from heatmaps to segmentation maps (binary target images) by thresholding the fluorescence values. Pixels were assigned to class 1 if their fluorescence values were above the threshold, and 0 otherwise. The Dice loss was used, and the rest of the setup was the same as for pixel-wise regression.
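A minimal sketch of this thresholding step; the threshold value is a placeholder, not the one used in the thesis:

```python
import numpy as np

def to_segmentation_map(fluorescence, threshold=0.2):
    # Pixels above the threshold belong to class 1, the rest to class 0
    return (fluorescence > threshold).astype(np.float32)

heatmap = np.random.rand(256, 256)   # normalized fluorescence target (placeholder)
binary_target = to_segmentation_map(heatmap)
```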

3.3 Single-Cell classification

At first it seemed hard for the network to see the differences between the cells in the first dataset, so before moving on to the second dataset a test with classification of individual cells was set up. To build the training dataset, the images were cropped so that each crop contained only one cell. The labels were created from the target images: the pixel values inside the chosen cell were summed up, and thresholds on these sums assigned each cell to one of four classes, "cell type 1", "cell type 2", neither of the cell types, or both. It was a bit tricky to label the cells properly, and most of them ended up in either the "neither" or the "both" class. The crops were trained on a ResNet50 from the Python library Torchvision, with three fully connected layers.
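A minimal sketch of such a classifier built from Torchvision's ResNet50; the sizes of the fully connected head are assumptions, since the thesis does not specify them:

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(weights=None)  # ResNet50 backbone (pretrained=False in older versions)
model.fc = nn.Sequential(                          # replace the final layer with three FC layers
    nn.Linear(model.fc.in_features, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 4),                              # four cell classes
)

logits = model(torch.rand(2, 3, 224, 224))         # a batch of two cropped cell images
print(logits.shape)                                # torch.Size([2, 4])
```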

4 Results

The project was split into three parts, following the topics in section 3: pixel-wise regression and pixel-wise classification, both with a U-Net, and single-cell classification with a ResNet.

4.1 Pixel-wise regression

With the first dataset I tried several different loss functions, learning rates, augmentations, scaling, etc., but could not get a better result than shown below; basically, the predictions for the two different targets look the same.

When starting to work with the second dataset I also used the U-Net described last in section 3.1, because it had a pre-trained model. First I tried the pre-trained model with L1 loss, then training from scratch with L1, MSE, and Smooth L1 loss with β = 0.33, 0.5, and 1. The outputs can be seen in figure 7.

Figure 7: Input and target with different predictions

The R² score was also computed for the different outputs; R² is the coefficient of determination.

Loss function         Pre-trained   R²-score
L1                    Yes           0.731
L1                    No            0.839
MSE                   No            0.861
Smooth L1, β = 0.33   No            0.831
Smooth L1, β = 0.5    No            0.867
Smooth L1, β = 1      No            0.726

Table 1: R² score for the different outputs

It is a bit hard to see in figure 7 which output is best, but when the R²-scores were computed (table 1), Smooth L1 with β = 0.5 gave the highest score, so that setup was used when training with the first dataset.
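The R² scores could, for example, be computed with scikit-learn by flattening the images; this is an illustration of the metric, not necessarily the exact implementation used in the thesis:

```python
import numpy as np
from sklearn.metrics import r2_score

target = np.random.rand(256, 256)      # ground-truth fluorescence image (placeholder)
prediction = np.random.rand(256, 256)  # model output (placeholder)
print(r2_score(target.ravel(), prediction.ravel()))  # coefficient of determination
```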

Figure 8: Input, targets and predictions from first dataset

In figure 8 it is easy to see that the network can distinguish between the cells called "c1" and the ones called "c2". This also gave a really good R²-score of 0.906.

4.2 Pixel-wise classification

The output from the pixel-wise classification can be compared to the pixel-wise regression in figure 7. The output shown in figure 9 is without a pre-trained model; the test gave an R²-score of 0.484, and compared to the regression test this output looks worse, so I felt no need to test this approach with the dataset used in figure 8.

Figure 9: Input, target and output from second dataset with binary targets

4.3 Individual cell classification

The first step here was to crop the images and find some kind of threshold for the different cell types in order to create a new dataset.


Figure 10: Confusion Matrix without normalization

5 Discussion

After the tests with the first dataset and the individual cell classification, I thought the first dataset was not good enough to get the results I wanted.

In the image-to-image predictions only U-Nets were used, two different kinds, but maybe other CNNs could have been tried as well.

5.1 Pixel-wise regression

Using the pre-trained model did not speed up the learning, nor did it give a better result in the end.

5.2 Pixel-wise classification

It was a lot harder for the network to correctly label the cells when the targets were binary, and this did not improve with a pre-trained model either. I think the classification was harder because of the conversion to binary target images: basically, the pixels within labeled cells should be 1, and the background and the other cells 0. In the target images there were cells that were not as bright as the brightest ones, and those might be mis-classified. As can be seen in figure 11, the network trained with non-binary target images learned to see the difference between brighter and darker cells, and I think that became a problem in the classification setting.

5.3 Individual cell classification

I think the reason this did not work was the sectioning of the cells. Each cell was cut out, and the pixel values within the cell were summed up and plotted in a graph, figure 11; it was hard to find thresholds that distinguish between the cell types.

Figure 11: The sum of the pixel values within each cell mask, values in the C2 target image plotted against C1. The chosen thresholds are marked in the right figure.

In reality, ∼90% of the cells should be labeled as cell type 1 or 2, but as can be seen in the figures above, this was not the case here.

6 Conclusion

Several different approaches to ISL were evaluated. The result is a robust pixel-wise regression approach that shows promising results on two datasets. A pre-trained model did not give a better result or faster convergence on this problem. The single-cell classification did not give a good result, but not much time was spent on that approach, so maybe it could have worked with some more effort.

The approach that showed good results would probably also perform well on similar datasets, with just some small changes.

For future studies, this could be tried with more than one dataset at a time. When I moved to a different dataset I trained the model from scratch, but for a new, similar dataset it might be faster to start from one of these trained models and just do some fine-tuning.


References

[1] Eric M. Christiansen et al. "In Silico Labeling: Predicting Fluorescent Labels in Unlabeled Images". In: Cell 173.3 (2018). doi: 10.1016/j.cell.2018.03.040.

[2] Torch Contributors. L1Loss. 2019. url: https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html.

[3] Torch Contributors. MSELoss. 2019. url: https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss.

[4] Peter J. Huber. "Robust Estimation of a Location Parameter". In: Breakthroughs in Statistics: Methodology and Distribution. Ed. by Samuel Kotz and Norman L. Johnson. New York, NY: Springer New York, 1992, pp. 492–518. isbn: 978-1-4612-4380-9. doi: 10.1007/978-1-4612-4380-9_35. url: https://doi.org/10.1007/978-1-4612-4380-9_35.

[5] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. "V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation". In: CoRR abs/1606.04797 (2016). arXiv: 1606.04797. url: http://arxiv.org/abs/1606.04797.

[6] Torch Contributors. NLLLoss. 2019. url: https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html.

[7] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. http://www.deeplearningbook.org. MIT Press, 2016, pp. 200–214.

[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. http://www.deeplearningbook.org. MIT Press, 2016, pp. 302–305.

[9] Sergey Ioffe and Christian Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". In: CoRR abs/1502.03167 (2015). arXiv: 1502.03167. url: http://arxiv.org/abs/1502.03167.

[10] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. 2015. arXiv: 1505.04597 [cs.CV].
