Image Segmentation Using Deep Learning Regulated by Shape Context


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018

Image Segmentation Using Deep Learning Regulated by Shape Context

WEI WANG

KTH ROYAL INSTITUTE OF TECHNOLOGY


Image Segmentation Using Deep Learning Regulated by Shape Context

Bildsegmentering med djupt lärande reglerat med formkontext

Wei Wang

KTH Royal Institute of Technology

Supervisor: Chunliang Wang

Examiner: Haibo Li


Abstract

In recent years, image segmentation using deep neural networks has made great progress. However, reaching a good result when training with a small amount of data remains a challenge. To improve segmentation accuracy with limited datasets, we implemented a new automatic chest radiograph segmentation experiment, based on preliminary work by Chunliang, that combines a deep neural network with shape context information. In the process, the data were first fed into the original U-net. After this preliminary step, the segmented images were repaired by a second network using shape context information. In this experiment, we created a new network structure by rebuilding the U-net into a 2-input structure and refined the processing pipeline so that the data and the shape context are trained together through the new network model iteratively. The proposed method was evaluated on a public dataset of 247 posterior-anterior chest radiographs using n-fold cross-validation. The outcome shows that, compared to the original U-net, the proposed pipeline reaches higher accuracy when trained with limited datasets. Here, "limited" datasets refer to 1-20 images in the medical imaging field. Higher accuracy may be reached if the second structure is further refined and the parameters of the shape context generator are fine-tuned.

Sammanfattning


Image Segmentation Using Deep Learning Regulated by Shape Context

Wei Wang

KTH Royal Institute of Technology Stockholm, Sweden

wewang@kth.se

ABSTRACT

In recent years, image segmentation using deep neural networks has made great progress. However, reaching a good result when training with a small amount of data remains a challenge. To improve segmentation accuracy with limited datasets, we implemented a new automatic chest radiograph segmentation experiment, based on preliminary work by Chunliang [32], that combines a deep neural network with shape context information. In the process, the data were first fed into the original U-net. After this preliminary step, the segmented images were repaired by a second network using shape context information. In this experiment, we created a new network structure by rebuilding the U-net into a 2-input structure and refined the processing pipeline. In the proposed pipeline, the data and the shape context are trained together through the new network model iteratively. The proposed method was evaluated on a public dataset of 247 posterior-anterior chest radiographs, using n-fold cross-validation. The outcome shows that, compared to the original U-net, the proposed pipeline reaches higher accuracy when trained with limited datasets. Here, "limited" datasets refer to 1-20 images in the medical imaging field. Higher accuracy may be reached in the future if the second structure is further refined and the parameters of the shape context generator are fine-tuned.

KEYWORDS

U-net, Shape Context, deep learning, segmentation, chest radiograph

1 INTRODUCTION

Background

Segmentation by CNN. In recent years, convolutional neural networks (CNNs) have achieved great progress in computer vision, and plenty of outstanding architectures (AlexNet, VGG16, ResNet-101, etc.) have been proposed, making it promising to conduct image segmentation using deep learning. CNNs have been highlighted as capable of achieving state-of-the-art performance in image segmentation and pattern recognition [21]. Given a large dataset and high-performance computation, a CNN performs well at extracting context information and generating image features. It extracts local information (such as shape and outline) from shallow layers and semantic, global information from deeper layers with a wider perspective (Milletari et al. [21]). CNNs achieve better performance than classical machine learning algorithms in highly complex classification and recognition tasks. In earlier work, networks for image segmentation were divided into two parts: a CNN and a classifier [8]. The CNN extracts features from images with region proposals, and classification is often conducted by a simple classifier such as an SVM. After a series of evolutions of CNNs [7][26], FCN [18] was proposed, replacing the fully connected layers or MLP with transposed convolution layers. It reached good results and showed that deep convolutional networks would be one of the promising approaches to image segmentation in the coming years.

amount of annotated images to be trained adapting each machine. So how to improve the accuracy of a neural network with limited data is crucial.

Research Question and Goals

As there is no clear way to improve the accuracy of a deep neural network with a small amount of data, several directions can be explored. Among them, extracting image features with more local information, adding supplementary information, and rebuilding the architecture of the neural network all seem meaningful. Moreover, previous studies [32] [33] conducted by Chunliang showed that adding shape context as an additional input to the U-net increases segmentation accuracy, which gave us the fundamental inspiration for this thesis. Extracting shape context from a coarsely segmented image and using it as supplementary information for the deep neural network should help the model improve its accuracy. The test image is regulated with different weights according to the distance from the shape outline. In this thesis, we aim to find a more suitable way to add shape context information to a neural network and to modify the structure to increase segmentation accuracy. The performance of the proposed method is calculated and compared with contemporary, widely used models (U-net etc.).

Question. Can shape context be used to improve the accuracy of a neural network with limited data? To answer this question, the following goals have been set:

• Find a suitable test dataset for evaluation and determine how many images can be regarded as a "limited" dataset.

• Design a pipeline for the whole processing of the proposed method so that each step can be navigated clearly.

• Define several navigational subtasks based on the pipeline to make the experiment easier to conduct.

• Improve the primary model structure, create a new neural network, and find an optimal training procedure for the specific training data and hardware.

• Compare the original neural network and the proposed method with a certain number of training data to determine which method results in a better segmentation, and change the number of training data to record the variations of the two methods.

• Make sure the proposed method is practical and can be used in the target working situations.

• Record and summarize shortcomings and deficiencies in the experiment so that they can be improved in future experiments.

2 RELATED RESEARCH

Fully convolutional network

The fully convolutional network (FCN) was proposed by Jonathan Long et al. [18] in 2015. Building on previous research, they put forward the FCN model and achieved end-to-end pixelwise prediction under supervised training. The network excels at producing an output of corresponding size from test images of arbitrary size, and the prediction is computed through dense feedforward computation and backpropagation using subsampling and upsampling layers. Although convnets [1, 11, 14, 22, 24, 29] perform well in recognition, classification and segmentation, there are still some shortcomings. Earlier approaches [5, 6, 10, 20] label each pixel with the class of its enclosing region, and some of them use patchwise training, which can be complicated [6, 20] and inefficient. In the approach of Alessandro Giusti et al. [9], image maps are generated first, and convolutional and max-pooling layers are then forward-propagated at the patch level. In the work of Ross B. Girshick et al. [8], region proposals are generated first, and the CNN is combined with a classifier. To reduce the computational cost, Jonathan Long et al. adopted the idea of "arbitrary-size input", which was first proposed by Matan et al. [19], and extended it from vectors to two dimensions using fully convolutional computation. Drawing on experience with fully convolutional computation, which had been developed over many years [4, 20, 23, 31], they proposed the FCN, which fuses features across different layers to achieve dense prediction. The structure of FCN is shown in Figure 1.

Figure 1: Architecture of FCN

or shift-and-stitch was used in the beginning. Finally, deconvolution was chosen as the upsampling method because it is efficient and effective. The architecture shown in Figure 1 demonstrates that the classifier layer is discarded and replaced with convolutional layers, and layer pixels are interpolated using bilinear upsampling. At the front of the structure, a three-dimensional image of arbitrary size h × w × d is used as the first layer. After the first layer, a sequence of convolutional and max-pooling layers is introduced. The intermediate layers are then followed by deconvolutional layers, which make the filters nonlinearly rarefied. The local, fine appearance information in the shallow layers is extracted and kept. After the shallow layers, the local information (from pool3 and pool4) is sent forward to higher layers to combine it with global information. Using the skip architecture [2], which connects local, shallow information with global, semantic information, FCN achieves end-to-end dense prediction with high accuracy.
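The skip fusion described above can be illustrated with a toy sketch. It assumes plain Python lists as score maps and uses nearest-neighbour upsampling as a stand-in for the learned deconvolution; the function names `upsample2x` and `fuse_skip` are hypothetical and are not part of FCN or of this thesis.

```python
def upsample2x(score_map):
    """Nearest-neighbour 2x upsampling of a 2D list.

    Assumption: a simple stand-in for the learned deconvolution layer.
    """
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse_skip(coarse, fine):
    """Upsample the coarse (deep) map and add it element-wise to the fine (shallow) map."""
    up = upsample2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly twice the size of the coarse map")
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


# Toy example: a 2x2 deep score map fused with a 4x4 shallow score map.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
fine = [[0.1] * 4 for _ in range(4)]
fused = fuse_skip(coarse, fine)
```

In this sketch the deep map contributes the coarse, semantic score and the shallow map adds a local correction at full resolution, which is the intuition behind combining pool3 and pool4 information with the deeper layers.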

U-net

The outcome of convolutional networks is limited by the amount of training data and the size of the network. In medical image processing, the output should often contain information such as location and shape. Drawing on the experience of Ciresan et al. [3], Seyedhosseini et al. [30] and Hariharan et al. [12], the U-net authors built on and extended the FCN architecture with more concatenation between shallow and deep layers to avoid the overlapping redundancy [3] and the trade-off between localization accuracy and context.

Figure 2: Architecture of U-net

The architecture of U-net consists of contracting layers and expansive layers, which are symmetric and look like a "U". Like FCN, it uses only convolutional layers, but in addition it expands the feature channels by connecting every level of the contracting path to the expansive path, which means it can preserve more useful features from every layer. The results also show that U-net is one of the most practical models to date.
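To make the effect of these connections concrete, the following sketch tracks spatial size and channel count through a U-shaped network. It is shape bookkeeping only, under stated assumptions: "same" padding, 2x2 pooling, 2x upsampling, and concatenation at every level. The defaults `base_channels=64` and `depth=4` follow the commonly published U-net configuration and are not taken from this thesis; 256 matches the downscaled image size used in the experiments.

```python
def unet_feature_shapes(input_size=256, base_channels=64, depth=4):
    """Return (name, spatial size, channels) for each stage of a U-shaped network.

    Assumptions: 'same' padding, 2x2 max pooling, 2x upsampling, and a skip
    connection that concatenates encoder features at every level.
    """
    shapes = []
    skips = []
    size, channels = input_size, base_channels
    for level in range(depth):                      # contracting path
        shapes.append((f"down{level}", size, channels))
        skips.append((size, channels))
        size //= 2
        channels *= 2
    shapes.append(("bottom", size, channels))
    for level in reversed(range(depth)):            # expansive path
        size *= 2
        channels //= 2
        skip_size, skip_channels = skips[level]
        assert skip_size == size                    # sizes must match to concatenate
        # channel count right after concatenation, before convolutions reduce it
        shapes.append((f"up{level}_concat", size, channels + skip_channels))
    return shapes


for name, size, channels in unet_feature_shapes():
    print(f"{name:12s} {size:4d} x {size:<4d} channels={channels}")
```

At each level of the expansive path the concatenation doubles the channel count, which is how features from the contracting path are carried over to the decoder.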

Shape context

Shape context is a shape descriptor that describes the relationship between a given point and the remaining points of the same image [28] [16] [37] [36], and it can be used as supplementary information to regularize detected images. Early on, based on Thompson's observation that related shapes can be described by aligning transforms, several approaches to shape matching were proposed successively, but finding correspondences remained an open task. To solve the correspondence problem, shape context was proposed by Serge Belongie et al. [28], and their approach reached good results.

In recent research, Chunliang et al. [32] provide a hybrid method that integrates a shape model into a deep learning approach. The method is divided into three main steps: scout segmentation by U-net, shape context estimation, and rectifying the segmentation by U-net together with shape context. The shape context is calculated by solving a level set function into which a statistical shape model is plugged:

∂ϕ

3 METHODS

Before implementing the proposed approach, a pilot study of the preliminary research was conducted, and two tasks had to be completed before commencing: the design of the pipeline and the construction of the architecture.

Pipeline and architecture

The implementation of image segmentation started with designing the whole process and the detailed structure of the proposed network. Before conducting our approach, we designed a pipeline in which the image is trained or tested by U-net together with shape context in the order "Unet1-context-Unet2", based on [32]. As illustrated in Figure 3, the proposed pipeline can be divided into 4 steps: (1) coarse segmentation with a preliminary U-net, (2) shape context generation using the shape context generator, (3) rectifying the segmentation with shape context through the 2-input U-net, and (4) n-times iteration from step 2 to step 3. Following this sequence, the dataset is first put into the standard U-net, and after a "complete" training the first U-net generates a coarse segmentation output. The segmentation image is then put into the generator to extract the shape context. After that, the shape context together with the training data is put into the 2-input U-net to generate a finer segmentation prediction. Finally, the steps from generating shape context to training the 2-input U-net are repeated several times.

Figure 3: Proposed Pipeline

The architecture of the proposed network is based on U-net. To combine shape context information with the image data, the 2-input architecture shown in Figure 4 separates the shape context layers from the image layers. The benefit of this modification lies in the convenience of controlling the weighting proportion at different levels of layers. As illustrated in Figure 4, we can control the number of layers at each level to reach a more suitable and flexible weight proportion between image information and shape context information. Moreover, we can choose the "skip" [27] connections by connecting or breaking the links between the shape context layers and the higher deconvolutional layers. To make training faster and improve convergence, we also add batch normalization [13] after every second convolutional layer at each level.

Figure 4: Architecture of 2-input U-net
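The four pipeline steps (coarse segmentation, shape context generation, rectified segmentation, iteration) can be sketched as a small driver function. This is a minimal sketch, not the thesis implementation: `run_pipeline` and its three callables are hypothetical placeholders for the trained first U-net, the shape context generator, and the 2-input U-net, and training itself is omitted.

```python
def run_pipeline(images, coarse_unet, context_generator, two_input_unet, n_iterations=4):
    """Unet1 -> context -> Unet2, repeating the last two steps n_iterations times.

    All three callables are hypothetical stand-ins:
      coarse_unet(image)            -> coarse segmentation        (step 1)
      context_generator(seg)        -> shape context              (step 2)
      two_input_unet(image, ctx)    -> rectified segmentation     (step 3)
    """
    segmentations = [coarse_unet(img) for img in images]              # step 1
    history = [segmentations]
    for _ in range(n_iterations):                                      # step 4
        contexts = [context_generator(seg) for seg in segmentations]  # step 2
        segmentations = [two_input_unet(img, ctx)                      # step 3
                         for img, ctx in zip(images, contexts)]
        history.append(segmentations)
    return history


# Toy usage with scalar "images" and arithmetic stand-ins for the networks.
history = run_pipeline(
    images=[1.0, 2.0],
    coarse_unet=lambda img: img * 0.5,
    context_generator=lambda seg: seg + 1.0,
    two_input_unet=lambda img, ctx: (img + ctx) / 2.0,
    n_iterations=2,
)
```

Keeping the output of every round in `history` makes it easy to compare the accuracy after each iteration against the coarse result of the first network.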

Dataset

Pneumonia is a high-risk disease. About one million people are hospitalized every year and 5,000 people die from it [25]. According to WHO, X-ray is one of the effective detection methods, and chest X-ray seems to be a promising dataset. From Chunliang's previous study, we know that, compared to MRI, the changes in CT results are slightly more difficult to observe when shape context information is used. Since X-ray results were not included in the previous study, X-ray is a potential data type for testing the adaptability of the shape context method alongside CT and MRI. In order to study whether shape context generally improves results across imaging formats, and to better compare the optimization on X-ray data, we chose chest X-ray as the dataset for training and testing. In practice, annotating X-ray images takes a lot of time, and many hospitals do not have much labeled training data. So in our proposed method, 1-20 training images are called a "limited" dataset.

Training Procedure


Figure 5: Architecture of 2-input U-net

MiaLab. To generate shape context information from preliminary segmentation results quickly, a shape context generator is needed. We thank Chunliang for providing a convenient API called MiaLab. Following the basic algorithm, MiaLab takes the segmentation shape in the plane, and the plane is embedded as the zero level set of a higher-dimensional surface. The embedding function is the signed distance function. The shape model is then generated by MiaLab from the mean of the signed distance functions of each segmentation region and the n most prominent variations extracted by PCA (principal component analysis). The model is estimated by solving the level set function. At every iteration, it tries to minimize the distance of the shape; the formulation is shown in equation 1. By adjusting parameters, MiaLab controls the shape of the generated shape context. Figure 6 illustrates a generated shape context. The distance inside the segmented outline is negative and the distance outside the segmentation outline is positive. The farther an interior point is from the boundary, the smaller (more negative) its value, and vice versa.


Figure 6: (a) coarse segmentation image being processed by Mialab; (b) generated shape context from Mialab which is taken as the second input to the following neural network
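The sign convention of the signed distance function (negative inside the outline, positive outside) can be illustrated with a brute-force sketch on a tiny binary mask. This only illustrates the embedding function; it is not MiaLab's API and does not include the statistical shape model or the level set solver. The function name `signed_distance` is hypothetical, and distances are measured between pixel centres.

```python
import math


def signed_distance(mask):
    """Brute-force signed distance map for a small binary mask (list of rows of 0/1).

    Inside pixels get minus the distance to the nearest outside pixel;
    outside pixels get plus the distance to the nearest inside pixel.
    """
    inside = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    outside = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if not v]
    if not inside or not outside:
        raise ValueError("mask must contain both foreground and background pixels")
    result = []
    for r, row in enumerate(mask):
        out_row = []
        for c, v in enumerate(row):
            targets = outside if v else inside
            d = min(math.hypot(r - tr, c - tc) for tr, tc in targets)
            out_row.append(-d if v else d)
        result.append(out_row)
    return result


# A 3x3 square of foreground pixels inside a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
sdf = signed_distance(mask)
```

The centre pixel receives the most negative value and the corners the largest positive values, matching the description that interior points farther from the boundary have smaller values. The quadratic cost makes this suitable only for small illustrations.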

all zero matrices. By feeding the zero matrices, the 2-input network is forced to converge based on the information extracted from the upper image-related layers. Then the second cluster, containing a mixture of shape context and zero matrices, is fed into the 2-input U-net. After the network is fully trained, the segmentation outcome is passed to MiaLab again to produce the second round of shape context. After several iterations, the outcome is recorded and compared with the accuracy of the original U-net.
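The two-phase feeding schedule (zero matrices first, then a mixture of shape context and zero matrices) could be sketched as below. The function name `shape_context_input` is hypothetical. The default `switch_step=2000` is the value reported for the 247-fold experiment, and strict alternation is only one assumed way to realize a half/half mixture; the thesis does not specify how the mixture is ordered.

```python
def shape_context_input(step, context, switch_step=2000):
    """Return the matrix fed to the shape-context port at a given training step.

    Before switch_step the port receives an all-zero matrix. Afterwards the
    real shape context and a zero matrix alternate (assumed ordering), which
    yields a half/half mixture over time.
    """
    zeros = [[0.0] * len(row) for row in context]
    if step < switch_step:
        return zeros
    return context if (step - switch_step) % 2 == 0 else zeros


# Toy 2x2 shape context (negative inside the outline, positive outside).
ctx = [[-1.5, 0.5],
       [2.0, 3.0]]
first_phase = shape_context_input(0, ctx)
second_phase = [shape_context_input(s, ctx) for s in (2000, 2001)]
```

Feeding zeros first lets the image branch converge on its own; the later mixture introduces the shape context while still presenting samples without it.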

Test Setup

In the testing part, both 10-fold cross validation and 247-fold cross validation of the original U-net are recorded. The results of the proposed pipeline with shape context information are recorded after 4 iterations.
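Because the goal is training with limited data, the cross validation is used in an inverted form in the Results section: one fold is used for training and the remaining folds for evaluation. A minimal sketch of such a split follows. The function name `inverted_folds` is hypothetical, and placing the leftover samples in the last folds is an assumption chosen so that the first fold of the 10-fold split has 24 training and 223 evaluation images, as reported.

```python
def inverted_folds(n_samples, n_folds):
    """Yield (train_indices, eval_indices): one fold trains, all other folds evaluate."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for k in range(n_folds):
        # Assumption: the last `extra` folds receive one additional sample.
        size = base + (1 if k >= n_folds - extra else 0)
        train = list(range(start, start + size))
        evaluate = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, evaluate
        start += size


folds_10 = list(inverted_folds(247, 10))
folds_247 = list(inverted_folds(247, 247))
print(len(folds_10[0][0]), len(folds_10[0][1]))    # training / evaluation sizes, fold 1
print(len(folds_247[0][0]), len(folds_247[0][1]))
```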

4 RESULTS

The proposed approach was evaluated on a chest X-ray dataset with 247 X-ray images and 247 matching annotated labels, each image having an original size of 1024 × 1024. The images were downscaled from 1024 × 1024 to 256 × 256 before training, and 247-fold and 10-fold cross validation were used separately for better evaluation. In 247-fold cross validation, the dataset is divided equally into 247 folds, and each time 1 fold is picked for training and the remaining 246 folds for evaluation. For 10-fold cross validation, 1 fold (24 images) was used for training and the remaining 9 folds (223 images) for evaluation. Our implementation was based on the TensorFlow framework, and the Adam optimizer with a learning rate of 0.0001 was used. The segmentation accuracy was calculated using the IOU (intersection over union) function shown in formula 2, where $p_i$ belongs to the predicted segmentation pixels and $g_i$ belongs to the ground truth pixels.

$$\mathrm{IOU} = \frac{2\sum_{i}^{N} p_i g_i}{\sum_{i}^{N} p_i^2 + \sum_{i}^{N} g_i^2} \qquad (2)$$

Both forms of cross validation were run on an Omen laptop with an NVIDIA GeForce GTX 1050 Ti under Ubuntu 17.04.
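Formula 2 can be computed directly from flattened pixel lists, as in the sketch below. The function names are hypothetical, and returning 1.0 for two empty masks is an assumed convention. Note that for binary masks the expression in formula 2 coincides with what is commonly called the Dice coefficient; the classical intersection-over-union (Jaccard index) is included only for comparison.

```python
def overlap_score(pred, truth):
    """Formula 2: 2*sum(p*g) / (sum(p^2) + sum(g^2)) over flattened pixel lists."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must have the same length")
    numerator = 2.0 * sum(p * g for p, g in zip(pred, truth))
    denominator = sum(p * p for p in pred) + sum(g * g for g in truth)
    # Assumed convention: two empty masks count as a perfect match.
    return numerator / denominator if denominator else 1.0


def jaccard(pred, truth):
    """Classical intersection over union for binary masks, for comparison only."""
    intersection = sum(p * g for p, g in zip(pred, truth))
    union = sum(pred) + sum(truth) - intersection
    return intersection / union if union else 1.0


pred = [1, 1, 0, 0]
truth = [1, 0, 0, 0]
print(overlap_score(pred, truth))  # 2*1 / (2 + 1)
print(jaccard(pred, truth))        # 1 / 2
```

The two measures rank segmentations in the same order, but the formula 2 score is never lower than the Jaccard index, which matters when comparing the reported numbers with results from other papers.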

For the 247-fold cross validation of the original U-net, we set the number of training steps to 5000 (5000 epochs), since there is only 1 image for training. The learning efficiency is affected by the limited training data. To save time, we record the training and validation curves at a sample rate of 1/50. The curves of the first 5 folds are shown in graph (a) of Figure 7, and the original IOU results of U-net are listed in the last column of Table 1. From the graph, we can see that after 2000 steps the U-net has converged and changes slowly in most folds, while in some folds, such as fold 2, the validation curve fluctuates more strongly because the shape in the training image lacks commonality.

(a) The training and validation graph of 5 folds from 247-cross validation by original U-net method

(b) The training and validation graph of 5 folds from 247-cross validation by using the proposed method

Figure 7: Accuracy of 247-cross validation

For the proposed approach, we make full use of the coarse outcome from U-net and put it into MiaLab to generate shape models. The 2-input U-net was then fed with the original training image and a mixture of shape context. During the first 2000 steps, we fed the zero matrix into the shape-context port of the U-net, and during the next 2000 steps we fed the cross-context (half zero matrix, half shape context) instead of the zero matrix. The training and validation graph is shown in Figure 7 (b). After fully training the new network, we put the outcome into MiaLab again and repeated step 4 of the pipeline 3 times. The accuracy from iteration 1 to iteration 4 is listed in Table 1. Similarly, the results of the first five groups of 10-fold

Table 1: The first 4 columns show the IOU accuracy of 5 folds from 247-fold cross validation with the proposed method, and the last column shows the accuracy of the original U-net method

fold    iteration1  iteration2  iteration3  iteration4  U-net
fold1   0.9109      0.9114      0.9098      0.9111      0.8805
fold2   0.7823      0.8092      0.8094      0.8172      0.7383
fold3   0.9061      0.9017      0.9085      0.9095      0.8700
fold4   0.9324      0.9356      0.9395      0.9395      0.8669
fold5   0.9067      0.9091      0.9109      0.9105      0.8365

training step of the proposed approach is changed to 500+1000. Table 2 shows the accuracy of these two methods.

(a) The training and validation graph of 5 folds from 10-cross validation by original U-net method

(b) The training and validation graph of 5 folds from 10-cross validation by using the proposed method

Figure 8: Accuracy of 10-cross validation of U-net

Table 2: The first 4 columns show the IOU accuracy of 5 folds from 10-fold cross validation with the proposed shape context method, and the last column shows the accuracy of the original U-net method

fold    iteration1  iteration2  iteration3  iteration4  U-net
fold1   0.9739      0.9739      0.9739      0.9740      0.9712
fold2   0.9730      0.9731      0.9731      0.9732      0.9715
fold3   0.9739      0.9740      0.9740      0.9739      0.9708
fold4   0.9743      0.9743      0.9742      0.9742      0.9721
fold5   0.9738      0.9737      0.9736      0.9737      0.9716
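The values reported in Table 1 and Table 2 can be summarized with a few lines of arithmetic. The sketch below copies the table values and computes the mean accuracy per column and the gain of iteration 4 over the original U-net. The names `TABLE_1`, `TABLE_2`, `column_means` and `gain` are introduced here for illustration, and the averages cover only the five folds shown, not the full cross validation.

```python
# Columns: iteration1, iteration2, iteration3, iteration4, U-net
TABLE_1 = {  # 247-fold cross validation
    "fold1": (0.9109, 0.9114, 0.9098, 0.9111, 0.8805),
    "fold2": (0.7823, 0.8092, 0.8094, 0.8172, 0.7383),
    "fold3": (0.9061, 0.9017, 0.9085, 0.9095, 0.8700),
    "fold4": (0.9324, 0.9356, 0.9395, 0.9395, 0.8669),
    "fold5": (0.9067, 0.9091, 0.9109, 0.9105, 0.8365),
}
TABLE_2 = {  # 10-fold cross validation
    "fold1": (0.9739, 0.9739, 0.9739, 0.9740, 0.9712),
    "fold2": (0.9730, 0.9731, 0.9731, 0.9732, 0.9715),
    "fold3": (0.9739, 0.9740, 0.9740, 0.9739, 0.9708),
    "fold4": (0.9743, 0.9743, 0.9742, 0.9742, 0.9721),
    "fold5": (0.9738, 0.9737, 0.9736, 0.9737, 0.9716),
}


def column_means(table):
    """Mean accuracy of each column over the listed folds."""
    rows = list(table.values())
    return [sum(column) / len(column) for column in zip(*rows)]


def gain(table):
    """Mean accuracy after iteration 4 minus mean accuracy of the original U-net."""
    means = column_means(table)
    return means[3] - means[4]


print(f"247-fold gain: {gain(TABLE_1):.4f}")
print(f"10-fold gain:  {gain(TABLE_2):.4f}")
```

Over these five folds, the mean gain is roughly 0.059 with one training image and roughly 0.002 with 24 training images, which illustrates how the benefit shrinks as the training set grows.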

Figure 9 shows one of the segmentation outcomes from the original U-net method and the proposed method. It shows that for some images the accuracy can be increased through iterations of the proposed method. On the other hand, a few images do not improve much after iteration. The parameters and the training procedure of the iteration process should also be adjusted in the future.

5 DISCUSSION

247-folds cross validation

U-net. As can be seen from Figure 7(a), in 247-fold validation only one X-ray image was used to train the U-net. The convergence of the training curve begins to slow down after 2,000 steps, while the validation curves clearly vary a lot from fold to fold due to the limited training data. Taking the second fold as an example, since the shape in the training image is relatively uncommon, the training image itself cannot contain all the useful general information that the neural network could exploit. Even after data enhancement, the second fold still cannot achieve the same accuracy as the other folds. In Table 1, the accuracy of fold 2 is 0.7383, while the others reach an accuracy above 0.8300.

Figure 9: Graph (a) shows the segmentation outcome of the original U-net method; graphs (b)-(e) show the segmentation after iteration 1 to iteration 4 of the proposed method; graph (f) is the ground truth of this test image.

a smaller degree of variation in the accuracy of the different groups, and the accuracy rate is more stable.

10-folds cross validation

U-net. Compared to the 247-fold cross validation, 10-fold cross validation has more training data, so the neural network learns faster and the result is relatively better. After 200 steps, the validation curve of the original U-net becomes smoother. Table 2 also shows that the accuracy in the different folds is almost the same, staying around 0.971.

Proposed approach with shape context. Similar to the U-net results, the proposed approach converges faster and there is little difference in accuracy between folds. Compared to the original U-net, the results of the proposed method are slightly improved. The accuracy of the shape context method stays around 0.973, an improvement of about 0.2 percentage points. The result after the iterations differs slightly from that of the 247-fold validation. Only some of the groups improve slightly: there is a slight increase in the first and second groups, while the accuracy of the other groups fluctuates up and down. Considering that each prediction result carries a certain amount of error, the rise or fall in accuracy after these iterations can be considered negligible. In general, the proposed approach can still improve the accuracy, but as the amount of training data increases, the improvement of the final result decreases.

Training way

Another point worth discussing is that this shape context approach indicates that reorganizing the training data, or mixing specific information into the training set, can improve the convergence efficiency of the neural network. The learning tendency of the network can also be steered deliberately by controlling the training sequence. As shown in Figure 7(b), in the first 2,000 steps we put a 256×256 zero matrix into the shape context port of the 2-input U-net. In this way, the 2-input U-net is fed X-ray images and uninformative zero matrices. The network can then only learn shallow and semantic descriptors from the image-layer side, not the context-layer side, which forces the network to weaken the influence of context-layer-side information. After 2000 steps, the 2-input U-net is fed cross-context matrices instead of zero matrices. The cross-context matrices consist of half matched shape context matrices and half zero matrices. As the graph shows, when we changed the shape context data, the accuracy quickly fell to the bottom because the network was disturbed by the "extra" shape model information. After this cliff-like fall, the 2-input U-net recovered quickly by adapting the new information to its image processing. From the graph, we can see a slight jump of the curve after 2000 steps compared to the preceding plateau, because we added effective auxiliary information for the network. This phenomenon is more obvious in fold 4. Similarly, in 10-fold cross validation the validation curves also show the cliff-like fall when the shape context data is changed from zero data to cross-context data, and the curves also jump back after adapting to the new information. In this way, we can change the convergence tendency efficiently and quickly without changing the structure of the network.

6 CONCLUSIONS

We proposed and presented a new segmentation approach that uses a shape model. From the outcome reported above, a deep neural network using shape context can improve segmentation accuracy with limited training data. After evaluating the results, the following goals have been achieved:

(1) By choosing X-ray as the test dataset, it is shown that a deep neural network with a shape model can also improve accuracy when predicting X-ray images.

(2) A feasible pipeline was created by adding an iteration loop between shape context generation and segmentation prediction. The results also show that iterating can improve the accuracy, although the outcome depends on the amount of training data. When the training data is limited, iterating achieves a larger gain than when the training data is large. As the training data increases, the rate of improvement slows down.

(3) A new network architecture is proposed, which has two input ports. One port is the main image processing port, and the other takes the supplementary information, which gives the network a weight according to distance and helps it modify the segmentation.

Future work

To better understand how the shape context helps the neural network and to further improve accuracy, many aspects remain to be addressed in the future.

• Due to limited time and hardware, many potentially more suitable parameters have not yet been tried. Finding more suitable parameters for MiaLab and the network is promising and worth trying.

• The structure of the 2-input U-net still needs to be improved. Adding suitable dropout layers and changing the structure of the network for better convergence are both potential future work.

• A broader field related to classical methods needs to be explored in the future, such as semi-supervised learning using a shape model in a deep neural network.

7 ACKNOWLEDGMENTS

REFERENCES

[1] Sean Bell, C. Lawrence Zitnick, Kavita Bala, and Ross B. Girshick. 2015. Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks. CoRR abs/1512.04143 (2015). arXiv:1512.04143 http://arxiv.org/abs/1512.04143

[2] CHRISTOPHER M. BISHOP. 2016. PATTERN RECOGNITION AND MACHINE LEARNING. SPRINGER-VERLAG NEW YORK.

[3] Dan C. Ciresan, Luca M. Gambardella, Alessandro Giusti, and JÃĳr-gen Schmidhuber. 2012. Deep neural networks segment neuronal membranes in electron microscopy images. In IN NIPS. 2852–2860. [4] Jifeng Dai, Kaiming He, and Jian Sun. 2014. Convolutional Feature

Masking for Joint Object and Stuff Segmentation. CoRR abs/1412.1283 (2014). arXiv:1412.1283 http://arxiv.org/abs/1412.1283

[5] David Eigen, Christian Puhrsch, and Rob Fergus. 2014. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. CoRR abs/1406.2283 (2014). arXiv:1406.2283 http://arxiv.org/abs/1406. 2283

[6] C. Farabet, C. Couprie, L. Najman, and Y. LeCun. 2013. Learning Hierarchical Features for Scene Labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 8 (Aug 2013), 1915–1929. DOI: http://dx.doi.org/10.1109/TPAMI.2012.231

[7] Ross B. Girshick. 2015. Fast R-CNN. CoRR abs/1504.08083 (2015). arXiv:1504.08083 http://arxiv.org/abs/1504.08083

[8] Ross B. Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2013. Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR abs/1311.2524 (2013). arXiv:1311.2524 http:// arxiv.org/abs/1311.2524

[9] Alessandro Giusti, Dan C. Ciresan, Jonathan Masci, Luca Maria Gambardella, and Jürgen Schmidhuber. 2013. Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks. CoRR abs/1302.1700 (2013). arXiv:1302.1700 http://arxiv.org/abs/1302.1700 [10] Saurabh Gupta, Ross B. Girshick, Pablo Arbelaez, and Jitendra Malik.

2014. Learning Rich Features from RGB-D Images for Object Detection and Segmentation. CoRR abs/1407.5736 (2014). arXiv:1407.5736 http: //arxiv.org/abs/1407.5736

[11] Bharath Hariharan, Pablo Arbelaez, Ross B. Girshick, and Jitendra Malik. 2014. Simultaneous Detection and Segmentation. CoRR abs/1407.1808 (2014). arXiv:1407.1808 http://arxiv.org/abs/1407.1808 [12] Bharath Hariharan, Pablo Andrés Arbeláez, Ross B. Girshick, and Jitendra Malik. 2014. Hypercolumns for Object Segmentation and Fine-grained Localization. CoRR abs/1411.5752 (2014). arXiv:1411.5752 http://arxiv.org/abs/1411.5752

[13] Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accel-erating Deep Network Training by Reducing Internal Covariate Shift. CoRR abs/1502.03167 (2015). arXiv:1502.03167 http://arxiv.org/abs/ 1502.03167

[14] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. CoRR abs/1408.5093 (2014). arXiv:1408.5093 http://arxiv.org/abs/1408.5093

[15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1 (NIPS’12). Curran Associates Inc., USA, 1097–1105. http://dl.acm.org/citation.cfm?id=2999134.2999257

[16] M. E. Leventon, W. E. L. Grimson, and O. Faugeras. 2000. Statistical shape influence in geodesic active contours. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662), Vol. 1. 316–323 vol.1. DOI: http://dx.doi.org/10.1109/CVPR.2000.855835

[17] Shu Liao, Yaozong Gao, Aytekin Oto, and Dinggang Shen. 2013. Representation Learning: A Unified Deep Learning Framework for Automatic Prostate MR Segmentation. Springer Berlin Heidelberg, Berlin, Heidelberg, 254–261. DOI: http://dx.doi.org/10.1007/978-3-642-40763-5_32

[18] Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2014. Fully Convolutional Networks for Semantic Segmentation. CoRR abs/1411.4038 (2014). http://arxiv.org/abs/1411.4038

[19] Jonathan Long, Ning Zhang, and Trevor Darrell. 2014. Do Convnets Learn Correspondence? CoRR abs/1411.1091 (2014). arXiv:1411.1091 http://arxiv.org/abs/1411.1091

[20] Ofer Matan, Christopher J.C. Burges, Yann Le Cun, and John S. Denker. 1992. Multi-Digit Recognition Using A Space Displacement Neural Network. In Neural Information Processing Systems. Morgan Kaufmann, 488–495.

[21] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. 2016. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. CoRR abs/1606.04797 (2016). arXiv:1606.04797 http://arxiv.org/abs/1606.04797

[22] R. Mottaghi, X. Chen, X. Liu, N. G. Cho, S. W. Lee, S. Fidler, R. Urtasun, and A. Yuille. 2014. The Role of Context for Object Detection and Semantic Segmentation in the Wild. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. 891–898. DOI: http://dx.doi.org/10.1109/CVPR.2014.119

[23] R. Mottaghi, X. Chen, X. Liu, N. G. Cho, S. W. Lee, S. Fidler, R. Urtasun, and A. Yuille. 2014. The Role of Context for Object Detection and Semantic Segmentation in the Wild. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. 891–898. DOI: http://dx.doi.org/10.1109/CVPR.2014.119

[24] Pedro H. O. Pinheiro and Ronan Collobert. 2013. Recurrent Convolutional Neural Networks for Scene Parsing. CoRR abs/1306.2795 (2013). arXiv:1306.2795 http://arxiv.org/abs/1306.2795

[25] Pranav Rajpurkar, Jeremy Irvin, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, and Andrew Y. Ng. 2017. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. CoRR abs/1711.05225 (2017). arXiv:1711.05225 http://arxiv.org/abs/1711.05225

[26] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. CoRR abs/1506.01497 (2015). arXiv:1506.01497 http://arxiv.org/abs/1506.01497

[27] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. CoRR abs/1505.04597 (2015). http://arxiv.org/abs/1505.04597

[28] S. G. Salve and K. C. Jondhale. 2010. Shape matching and object recognition using shape contexts. In 2010 3rd International Conference on Computer Science and Information Technology, Vol. 9. 471–474. DOI: http://dx.doi.org/10.1109/ICCSIT.2010.5565098

[29] Pierre Sermanet, David Eigen, Xiang Zhang, Michaël Mathieu, Rob Fergus, and Yann LeCun. 2013. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks. CoRR abs/1312.6229 (2013). arXiv:1312.6229 http://arxiv.org/abs/1312.6229

[30] M. Seyedhosseini, M. Sajjadi, and T. Tasdizen. 2013. Image Segmentation with Cascaded Hierarchical Models and Logistic Disjunctive Normal Networks. In 2013 IEEE International Conference on Computer Vision. 2168–2175. DOI: http://dx.doi.org/10.1109/ICCV.2013.269

[31] C. Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, S. Reed, D. Anguelov,

[32] Chunliang Wang. 2017. Automatic Whole Heart Segmentation Using Deep Learning and Shape Context. (2017).

[33] Chunliang Wang. 2017. Segmentation of Multiple Structures in Chest Radiographs Using Multi-task Fully Convolutional Networks. Springer International Publishing, Cham, 282–289. DOI: http://dx.doi.org/10.1007/978-3-319-59129-2_24

[34] C. Wang, H. Frimmel, and Örjan Smedby. 2014. Fast level-set based image segmentation using coherent propagation. Medical Physics 41, 7 (2014), 073501.

[35] Chunliang Wang and Örjan Smedby. 2014. Automatic Multi-organ Segmentation in Non-enhanced CT Datasets Using Hierarchical Shape Priors. In International Conference on Pattern Recognition. 3327–3332.

[36] Chunliang Wang, Qian Wang, and Örjan Smedby. 2017. Automatic Heart and Vessel Segmentation Using Random Forests and a Local Phase Guided Level Set Method. In Reconstruction, Segmentation, and Analysis of Medical Images, Maria A. Zuluaga, Kanwal Bhatia, Bernhard Kainz, Mehdi H. Moghari, and Danielle F. Pace (Eds.). Springer International Publishing, Cham, 159–164.
