
Institutionen för medicin och hälsa

Department of Medical and Health Sciences

Examensarbete (Master's thesis)

Automated Liver Segmentation from

MR-Images Using Neural Networks

by

Shaikh Faisal Zaman

LIU-IMH/RV-A--19/002--SE

2019-11-19

Linköpings universitet


Supervisor: Markus Karlsson
Examiner: Peter Lundberg


Abstract

Liver segmentation is a cumbersome task when done manually, often consuming quality time of radiologists. Automating such clinical tasks is fundamental and the subject of much modern research. Various computer-aided methods have been applied to this task, but they have not given optimal results due to challenges such as low contrast in the images, abnormalities in the tissues, etc. At present, there has been significant progress in machine learning and artificial intelligence (AI) in the field of medical image processing, though challenges remain, such as sensitivity to the different scanners used to acquire the images and differences in the imaging methods used, to name a few. In this work, a convolutional neural network (CNN), specifically a U-net architecture, was employed for liver segmentation. Predicted masks are generated on the corresponding test data, and the Dice similarity coefficient (DSC) is used as a statistical validation metric for performance evaluation. Three datasets, from different scanners (two 1.5 T scanners and one 3.0 T scanner), have been evaluated. The U-net performs well on all three datasets, even though there was limited data for training, reaching a DSC of up to 0.93 for one of the datasets.


Acknowledgments

Throughout the tenure of this dissertation, I have received constant support and assistance. Firstly, I would like to thank my supervisor, Markus Karlsson, who helped me settle in and provided the background knowledge on MRI as well as access to different datasets of patients. He supported me throughout my master thesis and was readily available if any help was required.

I would also like to thank my examiner, Peter Lundberg, who gave his suggestions during our weekly meetings and helped with ordering the hardware, which made my work much easier.

A special thanks to Dr. Chunliang Wang from KTH, Stockholm who was present during my mid-term presentation and gave me useful tips which improved the efficiency of my algorithm thereafter.

Finally, I would like to thank CMIV for giving me access to their network and data, and for a wonderful work environment.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research Questions
1.4 Delimitations

2 Theory
2.1 Magnetic Resonance Imaging

2.1.1 1.5 T vs 3.0 T MRI Scanners
2.2 Abdominal MRI and Liver Segmentation
2.2.1 Related Works
2.2.2 Gadoxetic Acid-Enhanced MRI
2.3 Neural Networks
2.3.1 Convolutional Neural Networks
2.3.1.1 Convolution Layer
2.3.1.2 Pooling Layer
2.3.1.3 Fully Connected Layer
2.3.2 Fully Convolutional Networks

3 Method
3.1 System Incorporated
3.1.1 Hardware
3.1.2 Software
3.2 Data Preparation
3.3 Data Preprocessing
3.4 Implemented Machine Learning Method
3.4.1 U-Net: A modified FCN
3.4.2 U-Net Architecture
3.5 Implemented Performance Metric: Dice Similarity Coefficient
3.6 Hyperparameters of the Model
3.6.1 Adam Optimizer
3.6.2 Learning Rate

3.6.4 Epochs
3.7 Summary of Hyperparameters and Method Flowchart

4 Results
4.1 Data Preparation
4.2 Data Preprocessing
4.3 Evaluation of Algorithm
4.4 PSC dataset
4.4.1 Training: PSC dataset, Testing: PSC dataset
4.4.2 Training: PSC dataset, Testing: HiFi dataset
4.4.3 Training: PSC dataset, Testing: NILB dataset
4.5 HiFi dataset
4.5.1 Training: HiFi dataset, Testing: HiFi dataset
4.5.2 Training: HiFi dataset, Testing: PSC dataset
4.5.3 Training: HiFi dataset, Testing: NILB dataset
4.6 NILB dataset
4.6.1 Training: NILB dataset, Testing: NILB dataset
4.6.2 Training: NILB dataset, Testing: PSC dataset
4.6.3 Training: NILB dataset, Testing: HiFi dataset
4.7 Comparison of Identical Slices

5 Discussion
5.1 Results
5.1.1 Data Preprocessing
5.1.2 Evaluation of Algorithm: Same dataset for training and testing
5.1.3 Evaluation of Algorithm: Different datasets for training and testing
5.1.4 Evaluation of Algorithm: Comparison of Results
5.2 Method
5.3 The Work in a Wider Context: Ethical Aspects

6 Conclusion
6.1 Answers to Research Questions
6.2 Future Work


List of Figures

1.1 Mortality rate variations per 100,000 people in different regions of the world.
2.1 An example of CNN application for anatomy classification in whole body CT scans.
2.2 An example of an unpadded convolution operation with a kernel of size 3x3 and a stride of 1.
2.3 Common activation functions applied to CNN: (a) ReLU, (b) sigmoid, and (c) hyperbolic tangent (tanh).
2.4 An example of an unpadded max pooling operation. Filter size is 2x2 with a stride of 2.
2.5 An example of FCN application for semantic segmentation in medical imaging.
3.1 Splitting of datasets into training set, validation set and test set in the ratio 60:20:20.
3.2 A significant difference can be seen when applying global histogram equalization compared to applying CLAHE on images.
3.3 An example of U-net architecture. Each multi-channel feature map is shown by a blue box. The number on top of the box denotes the number of channels. The number at the lower left edge of the box denotes the x-y-size. Copied feature maps are shown by white boxes. Different operations according to the color code are denoted by arrows.
3.4 Impact of learning rate on training.
3.5 Flowchart of the incorporated method.
4.1 Examples of MRI slices in the three datasets and their respective ground truths.
4.2 Examples of MRI slices in the HiFi dataset before and after applying CLAHE.
4.3 Overview of PSC Training Data.
4.4 Overview of PSC Test Data.
4.5 Overview of PSC Predicted Masks using PSC as training data.
4.6 Top row - Left: Image number 38, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9426.
4.7 Top row - Left: Image number 46, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9703.
4.8 Top row - Left: Image number 51, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9862.
4.9 Top row - Left: Image number 116, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.8947.
4.10 Top row - Left: Image number 193, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9024.


4.11 Overview of HiFi Test Data.
4.12 Overview of HiFi predicted masks with PSC as training data.
4.13 Top row - Left: Image number 80, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9196.
4.14 Top row - Left: Image number 93, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9314.
4.15 Top row - Left: Image number 437, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.0000.
4.16 Overview of NILB Test Data.
4.17 Overview of NILB predicted masks with PSC as training data.
4.18 Left: Image number 66, Middle: Predicted mask, Right: Overlay of mask on image.
4.19 Left: Image number 158, Middle: Predicted mask, Right: Overlay of mask on image.
4.20 Left: Image number 215, Middle: Predicted mask, Right: Overlay of mask on image.
4.21 Overview of HiFi Training Data.
4.22 Overview of HiFi Test Data.
4.23 Overview of HiFi Predicted Masks using HiFi as training data.
4.24 Top row - Left: Image number 80, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9883.
4.25 Top row - Left: Image number 93, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9676.
4.26 Top row - Left: Image number 294, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9548.
4.27 Top row - Left: Image number 437, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.5280.
4.28 Top row - Left: Image number 596, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9690.
4.29 Overview of PSC Test Data.
4.30 Overview of PSC Predicted Masks using HiFi as training data.
4.31 Top row - Left: Image number 38, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.8664.
4.32 Top row - Left: Image number 51, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.8680.
4.33 Top row - Left: Image number 193, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.6371.
4.34 Overview of NILB Test Data.
4.35 Overview of NILB predicted masks with HiFi as training data.
4.36 Left: Image number 66, Middle: Predicted mask, Right: Overlay of mask on image.
4.37 Left: Image number 158, Middle: Predicted mask, Right: Overlay of mask on image.
4.38 Left: Image number 215, Middle: Predicted mask, Right: Overlay of mask on image.
4.39 Overview of NILB Training Data.
4.40 Overview of NILB Test Data.
4.41 Overview of NILB Predicted Masks using NILB as training data.


4.42 Left: Image number 66, Middle: Predicted mask, Right: Overlay of mask on image.
4.43 Left: Image number 158, Middle: Predicted mask, Right: Overlay of mask on image.
4.44 Left: Image number 215, Middle: Predicted mask, Right: Overlay of mask on image.
4.45 Left: Image number 636, Middle: Predicted mask, Right: Overlay of mask on image.
4.46 Left: Image number 1755, Middle: Predicted mask, Right: Overlay of mask on image.
4.47 Overview of PSC Test Data.
4.48 Overview of PSC Predicted Masks using NILB as training data.
4.49 Top row - Left: Image number 38, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.8864.
4.50 Top row - Left: Image number 51, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9790.
4.51 Top row - Left: Image number 193, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9007.
4.52 Overview of HiFi Test Data.
4.53 Overview of HiFi predicted masks with NILB as training data.
4.54 Top row - Left: Image number 80, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9782.
4.55 Top row - Left: Image number 93, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9600.
4.56 Top row - Left: Image number 437, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row - Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.0000.


List of Tables

2.1 A list of parameters, which are automatically optimized during training, and the corresponding hyperparameters, which are user-defined beforehand, in a CNN.
3.1 Summary of hyperparameters and their values used in this algorithm.
4.1 Preparing and dividing the datasets into training data and testing data.
4.2 DSC of results obtained with different datasets. If the same dataset is used for training as well as testing, it is emphasized in bold.
4.3 Individual DSC comparison of identical slices with different training datasets.

1 Introduction

1.1 Motivation

Hepatocellular carcinoma (HCC) is the third-most common cause of cancer-related deaths throughout the world [1]. An examination of the liver can detect anomalies, if any are present, in the tissue architecture or in the shape and texture of the liver. Figure 1.1 shows the variations in mortality rates per 100,000 people in different regions of the world, categorized by age-adjusted mortality rates. [2]

In clinical routine, manual or semi-manual segmentation techniques are used to diagnose the liver in CT and MR images. To date, most research has focused on liver segmentation in CT images; very little research is actually focused on MR images. The primary challenge is that MR images are more affected by artifacts than CT images. MR images also have an extremely low gradient response, which makes accurate liver segmentation quite difficult [3]. Liver segmentation is a cumbersome and tedious task when done manually, often consuming quality time of radiologists. Manual segmentation is also operator dependent. [1]

The automation of this task is hence of great importance and interest to researchers and radiologists alike. This thesis focuses on deep learning as a method to automate this cumbersome process and compares the results with manual segmentation of MRI data.

1.2 Aim

The objective of this project is to investigate how segmentation of the liver can be implemented from abdominal MR images, using neural networks, with images obtained from

• scanners of the same field strength, and

• scanners of different field strengths (1.5 T and 3.0 T).

1.3 Research Questions

The following research questions have been formulated on this topic:


2. How is the algorithm tweaked to give optimal results on the given datasets?

3. Are the results different when using the algorithm on different datasets? If so, why and how?

4. How does the result differ when training with one dataset and testing with another?

1.4 Delimitations

The delimitation of this research is that the predicted masks can only be as good as the ground truth masks used for training. Calculation of the Dice coefficient depends on the ground truth masks.

2 Theory

2.1 Magnetic Resonance Imaging

Magnetic resonance imaging (MRI) is an imaging technique used in radiology to view pictures of the anatomy and physiological processes of the body, in particular the soft tissues. MRI is based on the principle of nuclear magnetic resonance (NMR). NMR is a spectroscopic technique used to acquire the physical and chemical properties of different molecules. MRI relies on the emission and absorption of energy in the radiofrequency (RF) range of the electromagnetic spectrum. [4]

Images in MRI are produced based on spatial variations in the frequency and phase of the RF energy being absorbed and emitted by the object to be imaged. Biologically relevant elements such as hydrogen, oxygen-17 and fluorine-19 can be used for producing MR images. Since the human body is mainly made up of fat and water, it contains many hydrogen atoms (approximately 63% of the atoms in the whole body). Hydrogen nuclei produce NMR signals, hence clinical MRI primarily images the NMR signal from the hydrogen nuclei in the human body. The signal from a single proton is far too small to detect, but protons in general behave like small bar magnets, with a north pole and a south pole. When there is no magnetic field acting on a group of protons, their magnetic moments take random orientations. When an external magnetic field acts on them, they align in a non-random manner. This results in a net magnetic moment, in the direction of the applied external magnetic field, which can be measured. Different types of tissues can then be visualized by applying RF pulses, and they are distinguished by the differences in the signal from the hydrogen atoms in each tissue. [4]

Systems with varying magnetic field strengths can be used in medical imaging procedures: open MRI units with a field strength of 0.3 Tesla (T), extremity MR systems with field strengths up to 1.0 T and, finally, whole-body scanners, primarily in clinical use, with field strengths up to 3.0 T. MRI and CT can be used to image the same tissue or body part, and both have their pros and cons. The advantages of MRI over CT imaging are that no ionizing radiation is involved, high-resolution imaging, multi-planar imaging capabilities and, generally, a high soft-tissue contrast resolution (compared to CT). The disadvantage of MRI is that it is very time consuming, which proves to be one of its biggest challenges, especially with the advent of faster CT scanners. [4]


Differences in various soft tissues of the body can be highlighted using a number of pulse sequences. The two most common and widely used pulse sequences are T1-weighted and T2-weighted. T1-weighted sequences are ideal for anatomical structures. Fat, proteinaceous fluid, calcium (some forms), melanin, blood (methemoglobin) and gadolinium (contrast agent) are tissues which show a high signal (bright) in T1-weighted image sequences. T2-weighted sequences are ideally used for identifying pathological processes, since they are considered fluid-conspicuous pulse sequences. Fluid-containing structures (like cysts, cerebrospinal fluid, joint fluids, etc.) and inflammation sources causing a heightened level of extracellular fluid are tissues that show a high signal on T2-weighted image sequences. [4]

2.1.1 1.5 T vs 3.0 T MRI Scanners

An increase in temporal and spatial resolution leads to better detection and characterization of small liver lesions. In theory, 3.0 T MR scanners allow the acquisition of images at a higher spatial and temporal resolution than their 1.5 T counterparts, due to the two-fold increase in signal-to-noise ratio (SNR) compared to 1.5 T. 3.0 T scanners also provide a better contrast-to-noise ratio (CNR) on post-gadolinium images and the ability to suppress fat better and more quickly. [5]

Despite these advantages, 3.0 T MR scanners may not improve the image quality of all sequences in MRI applications. They have several drawbacks, including constraints on the specific absorption rate (SAR), an increase in imaging artifacts and an increase in T1 relaxation times compared with 1.5 T MR scanners. SAR is a measure of the radiofrequency energy deposition within the human body. SAR quadruples at 3.0 T compared to 1.5 T, since the field strength is doubled. Artifacts such as susceptibility artifacts and chemical shift artifacts of the first kind have been shown to double in 3.0 T scanners compared to 1.5 T scanners. The increase in T1 relaxation times at 3.0 T leads to a change in contrast resolution compared with 1.5 T. [5]

Though 3.0 T scanners do provide better image quality than 1.5 T scanners, they have limitations. 3.0 T MRI is more sensitive to vascular pulsation, respiratory motion and the dielectric effect. The issue of tissue heating is also a major concern. Chemical shift artifacts are also more pronounced at fat-to-water interfaces. [6]

2.2 Abdominal MRI and Liver Segmentation

Hepatic diseases are among the most common diseases in the world, and hepatocellular carcinoma (HCC) is one of the most common cancers. An estimated 90% of people with liver cancer are not expected to survive beyond five years of diagnosis. To improve survival rates, efficient and early detection, along with treatment response evaluation, is required for patients suffering from liver cancer. [7]

MRI is an accurate imaging modality for the diagnosis of liver diseases. A computer-aided diagnosis (CAD) tool for liver MRI is important for increasing the productivity of radiologists and their efficiency in diagnosing MRI images. Accurate liver segmentation is one of the first and foremost steps of a CAD system. Due to the indistinguishable grey-level distribution of the surrounding organs, segmentation of the liver in abdominal MRI images remains a challenge. [8]

In abdominal MRI images, it is close to impossible to apply a purely threshold-based technique for segmentation, because the grey-level intensities of the liver, muscles and kidneys are very similar. This often leads to over-segmentation of the liver. On top of that, segmentation leakage also occurs due to the vasculature of the liver. [8]


2.2.1 Related Works

A number of automated methods have been produced for liver segmentation in CT images, and a few for MRI images. Badakhshannoory and Saeedi [9] proposed a model-based validation scheme for organ segmentation in CT scan volumes. Foruzan et al. [10] made use of a knowledge-based technique for liver segmentation in CT images. Zhang et al. [11] employed a method based on a statistical shape model (SSM), integrated with an optimal surface detection strategy, for automatic liver segmentation in CT images. Zhao et al. [12] employed a thresholding method which first removed the ribs and spine in the input image; the liver region was then segmented using a fuzzy C-means clustering algorithm and morphological reconstruction filtering.

Chen et al. [13] proposed a multiple-initialization level set method (LSM) to overcome the issue of over-segmentation and leakage in MRI images intended for liver segmentation. Gloger et al. [14] employed a three-step method for liver segmentation using LDA-based probability maps for multiple-contrast MR images; this method is based on a modified region-growing approach combined with a thresholding algorithm. Yuan et al. [15] proposed a method based on fast marching and improved fuzzy clustering for automatic liver segmentation from abdominal MR images. A combination of active contours and neural networks was employed by Middleton and Damper for their MRI segmentation algorithm.

2.2.2 Gadoxetic Acid-Enhanced MRI

The detection of HCC at an early stage can prove to be very fruitful and is desirable to healthcare professionals. For this purpose, hepatocyte-specific contrast agents for MRI have been identified as an important component of the imaging process, since they not only help detect HCC at early stages, but also help in the detection of small HCCs and the characterization of focal liver lesions. Gadoxetic acid (gadolinium ethoxybenzyl diethylenetriamine pentaacetic acid; Gd-EOB-DTPA) is a widely used hepatocyte-specific contrast agent. This agent is secreted into the bile and urine at an approximate 1:1 ratio. [16]

2.3 Neural Networks

A wide class of flexible nonlinear regression and discriminant models, data reduction models and nonlinear systems can be described as neural networks (NN). As the name suggests, NNs contain a large number of "neurons", which are simply linear or nonlinear computing elements, often organized into layers and interconnected in complex ways. There are three main ways NNs are used [17]:

• as signal processors or controllers in real-time environments, implemented in hardware for applications like robotics

• as models of biological nervous systems and "intelligence"

• as methods for data analysis

2.3.1 Convolutional Neural Networks

Convolutional neural networks (CNN) are a class of NN used for processing data with a grid pattern, such as images, and are specifically designed to automatically and adaptively learn spatial hierarchies of features. The learning process proceeds from low-level to high-level patterns (Figure 2.1). A CNN is a type of deep learning model which comprises three kinds of layers - convolution, pooling and fully connected layers. These layers can be seen as the building blocks of the model. [18]


Figure 2.1: An example of CNN application for anatomy classification in whole body CT scans. [19]

2.3.1.1 Convolution Layer

Convolution layers are responsible for feature extraction. They introduce spatial correlation in the input images for the computation of each feature map by sharing the filter kernel weights. Feature extraction consists of a convolution operation (linear) and an activation function (nonlinear). [18]

Convolution - Convolution can be described as a linear operation employed for feature extraction. A small array of numbers, called a kernel, is applied across the input, which is itself an array of numbers called a tensor. An element-wise product of the kernel and the input tensor is computed at each tensor location and summed to obtain the value of the output tensor at the corresponding position. This output tensor is known as a feature map (Figure 2.2). [18]

There are two key hyperparameters that define a convolution operation - the size and the number of kernels. A typical kernel size is 3x3, whereas the number of kernels is arbitrary and determines the depth of the output feature maps. The convolution operation is also defined by the stride, the distance between two successive kernel positions. Generally, a stride of 1 is used; the stride can be increased to achieve downsampling of the feature maps. [18]
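The operation described above can be sketched in a few lines of numpy. This is an illustrative sketch, not code from this thesis; as is conventional in CNNs, the kernel is applied without flipping (strictly, a cross-correlation). With a 5x5 input, a 3x3 kernel and a stride of 1, an unpadded ("valid") convolution yields a 3x3 feature map, as in Figure 2.2.

```python
import numpy as np

def convolve2d_valid(image, kernel, stride=1):
    """Unpadded 2-D convolution: slide the kernel over the input and,
    at each position, sum the element-wise products."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1   # output height
    ow = (iw - kw) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A 5x5 input with a 3x3 kernel and stride 1 gives a 3x3 feature map.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (3, 3)
```

Increasing the stride shrinks the output: the same input with stride 2 would produce a 2x2 feature map, which is the downsampling effect mentioned above.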

Nonlinear activation function - The outputs of the linear convolution operation are passed through a nonlinear activation function. Presently, the most commonly used activation function is the rectified linear unit (ReLU), which computes the function (Figure 2.3 (a)) [18]

f(x) = max(0, x) (2.1)

Other common activation functions are the sigmoid and hyperbolic tangent (tanh) functions. These functions are smooth (Figure 2.3 (b) and (c)). [18]

The activation function applied to the last fully connected layer is generally different from the previous ones; an appropriate activation function is selected for each task. For a binary classification task, a sigmoid activation function is used in the last layer, whereas for a multiclass classification task, a softmax function is implemented in the final layer of the network. [18]
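The activation functions discussed above are only a few lines each; the following numpy sketch (an illustration, not code from this thesis) shows ReLU, sigmoid, tanh and the softmax used for multiclass outputs.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

def sigmoid(x):
    """Smooth squashing into (0, 1); used for binary classification."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Smooth squashing into (-1, 1)."""
    return np.tanh(x)

def softmax(x):
    """Normalizes scores into a probability distribution; used for
    multiclass classification. Shifting by max(x) avoids overflow."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))  # [0. 0. 3.]
```

Note that sigmoid and softmax agree in the two-class case: softmax over two scores reduces to a sigmoid of their difference.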

2.3.1.2 Pooling Layer

Pooling layers are also responsible for feature extraction. They preserve the most relevant features of the feature map while discarding the others, reducing the dimensions of each input feature map. Pooling operations have hyperparameters similar to those of convolution operations: filter size, stride and padding. [18]

Figure 2.2: An example of an unpadded convolution operation with a kernel of size 3x3 and a stride of 1. [18]

Figure 2.3: Common activation functions applied to CNN: (a) ReLU, (b) sigmoid, and (c) hyperbolic tangent (tanh). [18]

Max pooling - Max pooling is the most common form of pooling operation in CNNs. It extracts patches from the input feature map and outputs the maximum value in each patch while discarding all the other values. The size of the patch depends on the filter size of the max pooling operation; in practice, a filter size of 2x2 with a stride of 2 is used. Unlike the height and width, the depth dimension of the feature maps remains untouched (Figure 2.4). [18]

Figure 2.4: An example of an unpadded max pooling operation. Filter size is 2x2 with a stride of 2. [18]
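The max pooling operation described above can be sketched in numpy (an illustration, not code from this thesis). With a 2x2 filter and a stride of 2, a 4x4 feature map is reduced to 2x2, and each output value is the maximum of one non-overlapping patch.

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling: keep only the maximum value of each patch,
    halving the height and width when size=2 and stride=2."""
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = feature_map[i * stride:i * stride + size,
                                j * stride:j * stride + size]
            out[i, j] = patch.max()
    return out

fm = np.array([[1, 3, 2, 4],
               [5, 6, 7, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]], dtype=float)
print(max_pool2d(fm))
# [[6. 8.]
#  [9. 6.]]
```

In a real CNN this is applied independently to every channel, which is why the depth dimension is left untouched.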

2.3.1.3 Fully Connected Layer

Fully connected layers are responsible for mapping the features, extracted by the convolution layers and downsampled by the pooling layers, to the final outputs of the network. This is done with the help of a nonlinear activation function (like ReLU), with each fully connected layer being followed by an activation function. The final fully connected layer has a number of output nodes equal to the number of classes. All parameters and hyperparameters of a CNN are listed in Table 2.1. [18]
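A fully connected layer is simply a matrix-vector product plus a bias, followed by an activation. The following numpy sketch illustrates this mapping from flattened features to class probabilities; the layer sizes and random weights are arbitrary illustrations, not values from this thesis.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

def dense(x, weights, bias, activation):
    """Fully connected layer: every output node is a weighted sum of
    all inputs, passed through a nonlinear activation."""
    return activation(weights @ x + bias)

rng = np.random.default_rng(0)
features = rng.standard_normal(8)  # flattened conv/pool features (illustrative)
hidden = dense(features, rng.standard_normal((4, 8)), np.zeros(4), relu)
scores = dense(hidden, rng.standard_normal((2, 4)), np.zeros(2), softmax)
print(scores.shape)  # (2,) — one probability per class, summing to 1
```

The final softmax layer has as many output nodes as there are classes, matching the description above.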

2.3.2

Fully Convolutional Networks

CNNs pose a disadvantage: the spatial information of the images is lost when the extracted features are fed into the final fully connected layers of the network. Semantic segmentation requires spatial information, since each pixel in the image is linked with a class label. To overcome this issue, Long et al. [20] proposed the fully convolutional network (FCN). In this method, transposed convolutional layers replace the final densely connected layers of the CNN. The transposed layers apply a learned up-sampling to the low-resolution feature maps within the network, so that semantic segmentation is performed while the original spatial dimensions of the input image are recovered (Figure 2.5). [19]

Table 2.1: A list of parameters, which are automatically optimized during training, and the corresponding hyperparameters, which are user-defined beforehand, in a CNN. [18]

Layer                  | Parameters | Hyperparameters
Convolution layer      | Kernels    | Number of kernels, kernel size, padding, stride, activation function
Pooling layer          | None       | Pooling method, filter size, padding, stride
Fully connected layer  | Weights    | Number of weights, activation function
Others                 | -          | Optimizer, learning rate, loss function, model architecture, batch size, epochs, regularization, weight initializing, dataset splitting

Figure 2.5: An example of FCN application for semantic segmentation in medical imaging. [19]
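The transposed convolution that an FCN uses for learned up-sampling can be sketched as follows (a minimal NumPy illustration; in a real network the kernel weights are learned during training rather than fixed):

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Stride-2 transposed convolution: each input pixel 'stamps' a scaled
    copy of the kernel onto a larger output map, producing an upsampling."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros((stride * (h - 1) + k, stride * (w - 1) + k))
    for i in range(h):
        for j in range(w):
            out[i*stride:i*stride+k, j*stride:j*stride+k] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # low-resolution feature map
kernel = np.ones((2, 2))            # learned weights in a real FCN
up = transposed_conv2d(x, kernel)   # 2x2 -> 4x4
```

With a 2x2 kernel of ones and stride 2, each input value is simply replicated over a 2x2 block, doubling the spatial size; learned kernels replace this with a trainable interpolation.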


3 Method

3.1 System Incorporated

The data preparation and preprocessing, as well as the algorithm implementation, were done using the following hardware and software components.

3.1.1 Hardware

At first, an Acer laptop with the following specifications was employed:

• Windows 10 Pro 64-bit
• Intel Core i7-4510U 2.0 GHz
• 8 GB DDR3 RAM
• NVIDIA GeForce GTX 850M GPU with 4 GB dedicated VRAM
• 8 GB SSD + 1000 GB HDD

Then, a new system with a better CPU and GPU was acquired for this research work. Its specifications were:

• Windows 10 Education 64-bit
• Intel Core i7-7800X 3.5 GHz
• 32 GB RAM
• NVIDIA GeForce GTX 1080 GPU with 8 GB dedicated VRAM

3.1.2 Software

Python 3.5 was used to develop the algorithm, with updated NVIDIA drivers and the CUDA 9.0 and cuDNN 9.0 libraries. The following Python packages were used:

• tensorflow
• keras
• opencv
• scikit-image
• numpy
• matplotlib

The PyCharm IDE was used for writing the algorithm. PyCharm is developed by the Czech company JetBrains and is used for computer programming, especially in the Python language.

MATLAB, specifically 'imshow3D' and the 'Medical Image Processing Toolbox', was used to convert and save each slice of the abdomen MRI images and their corresponding liver masks to .tif format.

3.2 Data Preparation

Three datasets were provided, containing 3D abdomen MRI images and liver masks. The abdomen MRI images were obtained in MetaImage (.mhd) format and the masks in Visualization Toolkit (.vtk) format. The datasets were as follows:

• PSC - This dataset contains 1.5 T, T1-weighted fat-suppressed MRI images (THRIVE). It consists of 26 3D medical examinations with corresponding liver masks for each examination.

• HiFi - This dataset contains 3.0 T, T1-weighted 2-point-Dixon MRI reconstructed into water images (Philips commercial reconstruction). It consists of 45 3D medical examinations with corresponding liver masks for each examination.

• NILB - This dataset contains 1.5 T, T1-weighted 2-point-Dixon MRI reconstructed into water images (in-house reconstruction). It consists of 102 3D medical examinations with corresponding liver masks for most of the examinations. A total of 23 medical examinations did not have their respective liver masks, hence these were used as the test set while running the algorithm.

One by one, the 3D examinations and their corresponding masks from each dataset were loaded into MATLAB, and each slice of the examination and mask was saved in .tif format in its respective folder. For example, if one of the examinations had 100 slices, 100 abdomen MRI images were saved, and 100 masks of the respective abdomen MRI image slices were saved as well, in separate folders.

In any neural network, the dataset has to be split into training data, validation data and test data. The training set is what the model trains on and learns from. The validation set is used to frequently evaluate the given model; this evaluation is unbiased and used to tune the model hyperparameters, so the model never learns from this data. The test set provides the gold standard used to evaluate the model: when a model is completely trained, the test set is used to assess its performance on data the model has never seen. [21]

The training and test sets for the algorithm are split in an 80:20 ratio for all datasets. The validation set is automatically split off from the training set while running the training process (20% of the training set), so the overall split is 64:16:20, i.e. approximately 60:20:20 (Figure 3.1).
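The bookkeeping of this split can be sketched as follows (illustrative only; note that carving the validation set out of the 80% training portion makes the exact overall split 64:16:20):

```python
def split_counts(n_total, test_frac=0.20, val_frac_of_train=0.20):
    """Return (train, val, test) slice counts for an 80:20 split where the
    validation set is taken as 20% of the training portion."""
    n_test = round(n_total * test_frac)
    n_train_full = n_total - n_test
    n_val = round(n_train_full * val_frac_of_train)  # 20% of 80% = 16% of total
    n_train = n_train_full - n_val
    return n_train, n_val, n_test

train, val, test = split_counts(1000)  # -> (640, 160, 200), a 64:16:20 split
```

In practice the splitting is done per dataset on the saved slices, but the proportions follow this arithmetic.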


Figure 3.1: Splitting of datasets into training set, validation set and test set in the ratio 60:20:20. [21]

3.3 Data Preprocessing

The images obtained from the datasets were of different sizes. First, these images were resized to one uniform size, as required by the algorithm. For the PSC dataset, all training and test images were 352x352 pixels, so no change was needed. For the HiFi and NILB datasets, the images were resized to 400x400 pixels, since most of the images in the training and test data were of this size. A few of the images (in the HiFi dataset) were quite dark and hence difficult to visualize. This was regarded as a windowing issue which affected the performance of the algorithm and the predicted masks. Preprocessing of the given datasets was limited to applying contrast-limited adaptive histogram equalization (CLAHE) on each slice of abdomen MRI image (each .tif image). Performing CLAHE on the HiFi dataset improved the DSC, hence CLAHE was performed on the PSC and NILB datasets as well, which increased the DSC for these datasets too.

Histogram equalization, as the name suggests, is a method in which the histogram of an image is stretched to cover pixel values across the full intensity range rather than being confined to some specific range of values. As a result, the contrast of the image is improved. CLAHE is a concept based on adaptive histogram equalization (AHE) and is used for adaptive contrast enhancement in images (Figure 3.2).

In AHE, a histogram is measured for the contextual region of each pixel. The intensity of that pixel is then transformed to a value within the display range proportional to the rank of the pixel intensity in the local intensity histogram [22]. CLAHE is a modified version of AHE in which the calculation of the enhancement parameter is refined by subjecting the height of the local histogram to a user-specified maximum, the clip level. This, in turn, limits the maximum contrast enhancement factor. The enhancement is thus reduced in small areas of the image, preventing noise over-enhancement and reducing the edge-shadowing effect seen in unlimited AHE. The user-specified CLAHE parameters are the clip level of the histogram and the size of the pixel's contextual region. [23]
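The underlying idea can be illustrated with plain global histogram equalization in NumPy; CLAHE refines this by equalizing per contextual region, clipping each local histogram at the user-specified level, and interpolating between regions. (This sketch only illustrates the global variant that CLAHE builds on, not the CLAHE implementation actually applied to the data.)

```python
import numpy as np

def equalize_histogram(img, n_levels=256):
    """Global histogram equalization of an 8-bit grayscale image.
    CLAHE applies the same mapping per tile, with a clip limit."""
    hist = np.bincount(img.ravel(), minlength=n_levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Map gray levels so the cumulative distribution becomes ~uniform.
    lut = np.clip(np.round((cdf - cdf_min) / (img.size - cdf_min)
                           * (n_levels - 1)), 0, 255)
    return lut.astype(np.uint8)[img]

low_contrast = np.array([[100, 101], [102, 103]], dtype=np.uint8)
stretched = equalize_histogram(low_contrast)  # -> [[0, 85], [170, 255]]
```

Values confined to a narrow band (100-103) are spread over the full display range, which is exactly the contrast improvement described above.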

After CLAHE was performed on the datasets, the training and test images of the dataset to be tested were saved into their own respective NumPy arrays, making them into two single files: 'imgs_train.npy' and 'imgs_test.npy'. The training masks were stored in a NumPy array called 'imgs_mask_train.npy'. This was done to facilitate easy retrieval and loading of the training and test data while running the deep learning algorithm.

3.4 Implemented Machine Learning Method

The machine learning method implemented for this research was a 2D U-net: a deep learning method, specifically a modified FCN.

(a) Top: Original Image, Bottom: After Global Histogram Equalization

(b) After CLAHE

Figure 3.2: A significant difference can be seen when applying global histogram equalization compared to applying CLAHE on images. [24]

3.4.1 U-Net: A modified FCN

Ronneberger et al. [25] brought about a complete change in how biomedical images are segmented. A modified, elegant architecture of the FCN was implemented, called the U-net. Its advantages over the FCN are that it works with even very few training images and yields more accurate segmentation results.

Though CNNs have existed for a long time, limitations such as the size of the available training data and the size of the considered networks held back their progress. In most biomedical image processing tasks, the desired output should also include localization, i.e., each pixel is assigned a class label, in addition to the required classification. Moreover, acquiring thousands of training images is usually beyond reach in biomedical tasks. [25]

In the given task, there was a fair amount of training data for two datasets, HiFi and NILB, but the third, the PSC dataset, had comparatively less training data. Thus, the U-net variant of the FCN was applied in this research to obtain the desired segmentation output on images from all three datasets, due to its proven performance with few training images and its pixel-wise segmentation accuracy, especially for biomedical image segmentation tasks. The Keras functional API was used to implement this deep neural network.

3.4.2 U-Net Architecture

Figure 3.3 shows the architecture of the U-net [25]. As the name suggests, it consists of a contracting path (left side) and an expansive path (right side), which gives the network its characteristic "u-shaped" structure. The contracting path replicates the classic architecture of a CNN. Two 3x3 unpadded convolutions are applied repeatedly, each followed by a rectified linear unit (ReLU), after which a 2x2 max pooling operation (with stride 2) is used for downsampling. The number of feature channels is doubled with each downsampling step. In the expansive path, every step consists of an upsampling of the feature map followed by a 2x2 convolution that halves the number of feature channels; this 2x2 convolution is also called an up-convolution. This is followed by a concatenation with the correspondingly cropped feature map from the contracting path and two 3x3 convolutions, each followed by a ReLU. Cropping of the feature map is necessary due to the loss of border pixels in every convolution. A 1x1 convolution at the final layer maps each 64-component feature vector to the desired number of classes. In total, the network has 23 convolutional layers. [25]
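The size bookkeeping of the contracting path described above can be checked with a short calculation (using the 572x572 input tile of the original U-net paper as the example):

```python
def contracting_path_sizes(size, depth=4):
    """Track the feature-map size down the U-net's contracting path:
    each level applies two unpadded 3x3 convolutions (size - 4 in total)
    followed by 2x2 max pooling (size // 2)."""
    sizes = [size]
    for _ in range(depth):
        size = (size - 4) // 2   # two 3x3 convs, then 2x2 pooling
        sizes.append(size)
    # The bottleneck applies two more 3x3 convs before upsampling starts.
    return sizes[1:], sizes[-1] - 4

levels, bottleneck = contracting_path_sizes(572)
# levels -> [284, 140, 68, 32]; bottleneck -> 28, as in the original paper
```

Each unpadded 3x3 convolution shaves one pixel off every border, which is why the cropping and the overlap-tile strategy discussed below are needed.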

The U-net model implemented in this research differs slightly from the original U-net model used by Ronneberger et al. The feature map size is restricted to 512x512 for our data, and the number of feature channels at the start of the network differs (32 feature channels compared to 64 in the original U-net model). Also, in the final 1x1 convolution, a sigmoid function is used as the activation function, which ensures that the mask pixels are in the [0,1] range.

Figure 3.3: An example of U-net architecture. Each multi-channel feature map is shown by a blue box. The number on top of the box denotes number of channels. The number at the lower left edge of the box denotes the x-y-size. Copied feature maps are shown by white boxes. Different operations according to the color code are denoted by arrows. [25]

A major modification of the U-net in comparison with the FCN is the addition of a large number of feature channels in the upsampling part. These feature channels allow the network to convey context information to higher resolution layers. This makes the expansive path of the network similar to the contracting path, yielding the characteristic "u-shaped" architecture. Only the valid part of each convolution is used, and the network does not have any fully connected layers. This means that the segmentation map contains only the pixels for which the full context is available in the input image, disregarding those outside the field of the convolutions. Hence, large images can be seamlessly segmented through an overlap-tile strategy. The pixels in the border region of the image are predicted by extrapolating the missing context, done by mirroring the input image. Large arbitrary images require this tiling strategy, since otherwise the resolution of the output would be limited by the GPU memory. [25]

3.5 Implemented Performance Metric: Dice Similarity Coefficient

The accuracy of the method used in this research was assessed using the Dice similarity coefficient (DSC). The DSC is a validation metric that gauges the similarity between two samples and can be used to compare the percentage overlap of sets. In an analyzed digital image, areas with specific pixel counts can be specified as sets. [26]

Let L_auto ⊂ ℤ² be the segmented liver obtained using the U-net model and L_ground ⊂ ℤ² be the ground truth of the liver segment in the abdominal MRI images. Here |L_ground| is the number of pixels in the area of the liver segment in the ground truth, |L_auto| is the number of pixels in the area automatically isolated by the algorithm, and |L_auto ∩ L_ground| is the number of pixels in the common area between the two. Mathematically, the DSC can then be defined as:

DSC = 2|L_auto ∩ L_ground| / (|L_auto| + |L_ground|)    (3.1)

The negative of the DSC is taken as the loss function for the training, implemented in the Keras backend as a custom loss function. A smoothing factor of 1 is added to the loss function to make it smooth. Mathematically,

DSC_loss = −DSC    (3.2)
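A NumPy sketch of the metric and the loss (the thesis implements the loss with Keras backend functions; this standalone version is only for illustration, with the same smoothing factor of 1):

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """DSC of Eq. (3.1) with a smoothing factor added to numerator
    and denominator, so the value is defined even for empty masks."""
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    """Eq. (3.2): the negative DSC, minimized during training."""
    return -dice_coefficient(y_true, y_pred)

mask = np.ones((4, 4))    # toy binary liver mask
empty = np.zeros((4, 4))  # prediction with no liver pixels
perfect = dice_coefficient(mask, mask)  # identical masks -> 1.0
```

For binary masks the product in the numerator counts exactly the overlapping pixels, matching the set-based definition above.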

3.6 Hyperparameters of the Model

Several hyperparameters of the U-net could be user-defined to obtain the best results on the output data. The following hyperparameters were used in this research:

3.6.1 Adam Optimizer

An optimizer in machine learning helps to minimize the error function E(x), a mathematical function which depends on the internal learnable parameters (e.g. weights and bias values) of the model. These internal parameters are used in computing the target values (Y) from the set of predictors (X) used in the model. [27]

The name Adam is short for adaptive moment estimation. It requires only first-order gradients, has a small memory requirement, and is a method for efficient stochastic optimization. Adam combines the best of two commonly used optimizers: AdaGrad, introduced by Duchi et al. [28], and RMSProp, implemented by Tieleman and Hinton [29]. AdaGrad shows good results with sparse gradients, whereas RMSProp has proved efficient in on-line and non-stationary settings. [30]

The advantages of using Adam over other methods include: parameter update magnitudes are invariant to gradient rescaling, the step-size hyperparameter approximately bounds the step-sizes, a stationary objective is not required, sparse gradients do not cause problems, and a form of step size annealing is performed naturally. [30]

Adam has four hyperparameters [31]:

1. learning rate (α)
2. β1 from momentum (default 0.9)
3. β2 from RMSProp (default 0.999)
4. epsilon, ε (default 1e-8)

In practice, the learning rate (α) should be user-defined to obtain better efficiency of the algorithm during the training process. The other three hyperparameters work well with their default values. [31]
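A single Adam update can be sketched as follows (a minimal NumPy illustration of the moment estimates and bias correction; variable names are ours, not from any library):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and its square (v), bias-corrected, then a rescale-invariant step."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum part)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (RMSProp part)
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, np.array([10.0]), m, v, t=1)
# the first step has magnitude ~lr regardless of the gradient's scale
```

This shows the rescale-invariance mentioned above: after bias correction, m_hat/sqrt(v_hat) is close to the sign of the gradient, so the step size stays near α.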

3.6.2 Learning Rate

The learning rate, or step size, refers to the amount by which the weights are updated. It is a hyperparameter responsible for regulating how much the weights of the network are adjusted with respect to the loss gradient. Figure 3.4 illustrates the importance of choosing an optimal learning rate.

Figure 3.4: Impact of learning rate on training. [32]

Many experiments with different learning rates were performed to obtain the best DSC for this research work, and the corresponding results were compared. Finally, a learning rate of 1e-3 was chosen as the appropriate learning rate for training the algorithm. This learning rate gave the best DSC and, hence, the best predicted segmentation of the liver.

3.6.3 Batch Size

Batch size is a hyperparameter which defines the number of training samples the model works through before updating the internal model parameters. The batch size was chosen as 16 for training the algorithm.

3.6.4 Epochs

The number of epochs is a hyperparameter that sets the number of times the training algorithm works through the entire training dataset. The number of epochs was set to 50 in this algorithm.
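The resulting bookkeeping is simple: with N training images and batch size B, each epoch performs ceil(N / B) weight updates. A small sketch (the 1705 PSC training slices from Table 4.1 are used purely as an example):

```python
import math

def training_iterations(n_images, batch_size=16, epochs=50):
    """Number of weight updates: one per batch, repeated every epoch."""
    batches_per_epoch = math.ceil(n_images / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

per_epoch, total = training_iterations(1705)  # -> 107 batches/epoch, 5350 updates
```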

3.7 Summary of Hyperparameters and Method Flowchart

A summary of all the hyperparameters and their corresponding values is shown in Table 3.1. A flowchart summarizing the workflow of the method is displayed in Figure 3.5.

Table 3.1: Summary of hyperparameters and their values used in this algorithm

Hyperparameter    Value
Optimizer         Adam
Learning rate     1e-4
Batch size        16
Number of epochs  50
Loss function     -DSC

Figure 3.5: Flowchart of the method. Dataset → Data Preparation → Training Data / Test Data → Data Preprocessing → Input Hyperparameters → U-Net for Liver Segmentation → Segmented Liver → "Can DSC be better?" (Yes: Update Model and repeat; No: Final Saved Segmentations).

4 Results

4.1 Data Preparation

The data were prepared as described in Section 3.2. The resulting splits of the prepared datasets are displayed in Table 4.1. The training and test images were stored in their respective folders and always kept apart from each other. Examples of images from each dataset and their respective ground truths are depicted in Figure 4.1.

Table 4.1: Preparing and dividing the datasets into training data and testing data

Dataset  Total no. of slices  No. of training data images  No. of test data images
PSC      2185                 1705                         480
HiFi     5855                 4694                         1161
NILB     9828                 7618                         2210

4.2 Data Preprocessing

As described in Section 3.3, the training images and masks were resized to a uniform size to feed into the algorithm. For the PSC dataset, all training and test images were of size 352x352 pixels, so no resizing had to be done, whereas for the HiFi and NILB datasets the images were resized to 400x400 pixels, since the majority of the training and test images were of this size. CLAHE was then performed on all images from the three datasets. The windowing issue was resolved and the images have better contrast than before, as seen in Figure 4.2.

4.3 Evaluation of Algorithm

The results of the algorithm are summarized in Table 4.2. All datasets are divided in an 80:20 ratio for training and testing, respectively, regardless of the number of images in each dataset (when different datasets are used for training and testing). The training and test images are never mixed together for any dataset. For example, when using the HiFi dataset for training and the PSC dataset for testing, an appropriate number of training images is taken from the HiFi dataset to correspond to the total number of test images in the PSC dataset, maintaining the 80:20 ratio. Similarly, when using the HiFi dataset for training and the NILB dataset for testing, an appropriate number of test images is taken from the NILB dataset to correspond to the total number of training images in the HiFi dataset, maintaining the 80:20 ratio.

Figure 4.1: (a) From top: PSC, HiFi, NILB datasets; (b) Respective ground truths

Figure 4.2: (a) Before CLAHE is applied; (b) After CLAHE is applied

Table 4.2: DSC of results obtained with different datasets. Rows where the same dataset is used for training as well as testing are emphasized in bold.

Training dataset  Testing dataset  No. of training data images  No. of test data images  Training DSC  Validation DSC
PSC               PSC              1705                         480                      0.9882        0.8916
PSC               HiFi             1705                         480                      0.9856        0.8893
PSC               NILB             1705                         480                      0.9877        0.8829
HiFi              HiFi             4694                         1161                     0.9857        0.9074
HiFi              PSC              1705                         480                      0.9906        0.8993
HiFi              NILB             4694                         1161                     0.9907        0.8863
NILB              NILB             7618                         2210                     0.9900        0.9351
NILB              PSC              1705                         480                      0.9885        0.8570
NILB              HiFi             4694                         1161                     0.9894        0.9218

The sections below are demarcated on the basis of the training data. For calculating the DSC of individual predicted masks against the ground truth, 'DSCImageCalc', a free software for calculating similarity coefficients on segmented images, written in Visual BASIC.NET by Tom Lawton (2017) [33], was used.

4.4 PSC dataset

4.4.1 Training: PSC dataset, Testing: PSC dataset

Initially, training data and testing data were both acquired from the PSC dataset. An overview image of the training data with ground truth is shown in Figure 4.3 and an overview image of the test data is shown in Figure 4.4.

Predicted masks obtained when using the PSC dataset as training data are shown in Figure 4.5. A few cases of the predicted masks are shown overlaid onto the original test images. A comparison of the ground truth and the predictions is also visualized (Figures 4.6, 4.7, 4.8, 4.9, 4.10).

4.4.2 Training: PSC dataset, Testing: HiFi dataset

Then, training data was acquired from the PSC dataset and test data from the HiFi dataset. Overview images of the test data and the predicted masks are shown in Figures 4.11 and 4.12, respectively.

A few cases of the predicted masks are shown overlaid onto the original test images. A comparison of the ground truth and the predictions is also visualized (Figures 4.13, 4.14, 4.15). The algorithm was unable to predict the segmentation of the test data from image number 363 to image number 480.

As can be seen in Figure 4.15, there is no predicted mask and hence the algorithm fails in this instance.


Figure 4.6: Top row:- Left: Image number 38, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9426

Figure 4.7: Top row:- Left: Image number 46, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9703


Figure 4.8: Top row:- Left: Image number 51, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9862

Figure 4.9: Top row:- Left: Image number 116, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.8947


Figure 4.10: Top row:- Left: Image number 193, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9024

4.4.3 Training: PSC dataset, Testing: NILB dataset

Finally, training data was taken from the PSC dataset and data from the NILB dataset was used as test data. Overview images of the test data and the predicted masks are shown in Figures 4.16 and 4.17, respectively.

A few cases of the predicted masks are shown overlaid onto the original test images. There is no ground truth for the NILB test data, since the examinations missing ground truth were used as test data in the algorithm. However, it can be seen in the predicted masks that the liver segmentation is overemphasized, exaggerated to the point where the predicted masks extend beyond the dimensions of the actual liver (Figures 4.18, 4.19, 4.20).


Figure 4.13: Top row:- Left: Image number 80, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9196

Figure 4.14: Top row:- Left: Image number 93, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9314


Figure 4.15: Top row:- Left: Image number 437, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.0000


Figure 4.18: Left: Image number 66, Middle: Predicted mask, Right: Overlay of mask on image

Figure 4.19: Left: Image number 158, Middle: Predicted mask, Right: Overlay of mask on image

Figure 4.20: Left: Image number 215, Middle: Predicted mask, Right: Overlay of mask on image

4.5 HiFi dataset

4.5.1 Training: HiFi dataset, Testing: HiFi dataset

Initially, training data and testing data were both acquired from the HiFi dataset. An overview image of the training data with ground truth is shown in Figure 4.21 and an overview image of the test data is shown in Figure 4.22.

Predicted masks obtained when using the HiFi dataset as training data are shown in Figure 4.23. A few cases of the predicted masks are shown overlaid onto the original test images. A comparison of the ground truth and the predictions is also visualized (Figures 4.24, 4.25, 4.26, 4.27, 4.28).

4.5.2 Training: HiFi dataset, Testing: PSC dataset

Then, training data was acquired from the HiFi dataset and test data from the PSC dataset. Overview images of the test data and the predicted masks are shown in Figures 4.29 and 4.30, respectively.

A few cases of the predicted masks are shown overlaid onto the original test images. A comparison of the ground truth and the predictions is also visualized (Figures 4.31, 4.32, 4.33).

4.5.3 Training: HiFi dataset, Testing: NILB dataset

Finally, training data was taken from the HiFi dataset and data from the NILB dataset was used as test data. Overview images of the test data and the predicted masks are shown in Figures 4.34 and 4.35, respectively.

A few cases of the predicted masks are shown overlaid onto the original test images. There is no ground truth for the NILB test data, since the examinations missing ground truth were used as test data in the algorithm. However, it can be seen that the liver segmentation is overemphasized. The predictions are very inaccurate and even include pixels outside the liver; the segmentations are unpredictable (Figures 4.36, 4.37, 4.38).


Figure 4.24: Top row:- Left: Image number 80, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9883

Figure 4.25: Top row:- Left: Image number 93, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9676


Figure 4.26: Top row:- Left: Image number 294, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9548

Figure 4.27: Top row:- Left: Image number 437, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.5280


Figure 4.28: Top row:- Left: Image number 596, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9690


Figure 4.31: Top row:- Left: Image number 38, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.8664

Figure 4.32: Top row:- Left: Image number 51, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.8680


Figure 4.33: Top row:- Left: Image number 193, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.6371


Figure 4.36: Left: Image number 66, Middle: Predicted mask, Right: Overlay of mask on image

Figure 4.37: Left: Image number 158, Middle: Predicted mask, Right: Overlay of mask on image

Figure 4.38: Left: Image number 215, Middle: Predicted mask, Right: Overlay of mask on image

4.6 NILB dataset

4.6.1 Training: NILB dataset, Testing: NILB dataset

Initially, training data and testing data were both acquired from the NILB dataset. An overview image of the training data with ground truth is shown in Figure 4.39 and an overview image of the test data is shown in Figure 4.40.

Predicted masks obtained when using the NILB dataset as training data are shown in Figure 4.41. A few cases of the predicted masks are shown overlaid onto the original test images. The test images used from the NILB dataset have no ground truth, hence a comparison of the ground truth and the predictions could not be done (Figures 4.42, 4.43, 4.44, 4.45, 4.46).

4.6.2 Training: NILB dataset, Testing: PSC dataset

Then, training data was acquired from the NILB dataset and test data from the PSC dataset. Overview images of the test data and the predicted masks are shown in Figures 4.47 and 4.48, respectively.

A few cases of the predicted masks are shown overlaid onto the original test images. A comparison of the ground truth and the predictions is also visualized. Though some of the segmentations were appropriate, most were not accurate, and a few overestimated the liver pixels (Figures 4.49, 4.50, 4.51).

4.6.3 Training: NILB dataset, Testing: HiFi dataset

Then, training data was acquired from the NILB dataset and test data from the HiFi dataset. Overview images of the test data and the predicted masks are shown in Figures 4.52 and 4.53, respectively.

A few cases of the predicted masks are shown overlaid onto the original test images. A comparison of the ground truth and the predictions is also visualized (Figures 4.54, 4.55, 4.56). The algorithm was unable to predict the segmentation of the test data from image number 405 to image number 465.

Similar to Section 4.4.2, the mask of image number 437 could not be predicted as shown in Figure 4.56.

4.7 Comparison of Identical Slices

Table 4.3 compares the test DSC of identical slices when different training datasets are used.


Figure 4.42: Left: Image number 66, Middle: Predicted mask, Right: Overlay of mask on image

Figure 4.43: Left: Image number 158, Middle: Predicted mask, Right: Overlay of mask on image

Figure 4.44: Left: Image number 215, Middle: Predicted mask, Right: Overlay of mask on image

Figure 4.45: Left: Image number 636, Middle: Predicted mask, Right: Overlay of mask on image


Figure 4.46: Left: Image number 1755, Middle: Predicted mask, Right: Overlay of mask on image

Table 4.3: Individual DSC comparison of identical slices with different training datasets. Note: the NILB test data has no ground truth, hence there is no DSC for comparison.

Image number  Training-Test Dataset  DSC compared to ground truth
38            PSC-PSC                0.9426
38            HiFi-PSC               0.8664
38            NILB-PSC               0.8864
51            PSC-PSC                0.9862
51            HiFi-PSC               0.8680
51            NILB-PSC               0.9790
193           PSC-PSC                0.9024
193           HiFi-PSC               0.6371
193           NILB-PSC               0.9007
80            HiFi-HiFi              0.9883
80            PSC-HiFi               0.9196
80            NILB-HiFi              0.9782
93            HiFi-HiFi              0.9676
93            PSC-HiFi               0.9314
93            NILB-HiFi              0.9600
437           HiFi-HiFi              0.5280
437           PSC-HiFi               0.0000
437           NILB-HiFi              0.0000


Figure 4.49: Top row:- Left: Image number 38, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.8864

Figure 4.50: Top row:- Left: Image number 51, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9790

Figure 4.51: Top row:- Left: Image number 193, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9007

Figure 4.54: Top row:- Left: Image number 80, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9782

Figure 4.55: Top row:- Left: Image number 93, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.9600

Figure 4.56: Top row:- Left: Image number 437, Middle: Predicted mask, Right: Overlay of mask on image; Bottom row:- Left: Ground truth, Middle: Predicted mask, Right: Overlay of mask on ground truth; DSC=0.0000

5 Discussion

5.1 Results

The discussion of the results is divided into the following sections:

5.1.1 Data Preprocessing

Although CLAHE mitigated the windowing issue, it did not resolve it entirely. A few of the images (in the HiFi dataset) still showed the windowing problem, and this affected the segmentation result and the DSC score on those images (for example, Figure 4.27).

Taking the three datasets as a whole, however, CLAHE brought major improvements to the segmentation and gave better results, especially for the HiFi dataset.
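The contrast-limiting idea behind CLAHE can be illustrated with a simplified single-tile sketch: the histogram is clipped at a limit, the clipped excess is redistributed, and pixels are mapped through the resulting CDF. This is an illustrative sketch only, not the preprocessing code used in the thesis; full CLAHE (e.g. scikit-image's `equalize_adapthist` or OpenCV's `createCLAHE`) additionally blends neighbouring tiles, and the synthetic low-contrast array below merely stands in for an MR slice.

```python
import numpy as np

def clahe_tile(tile, clip_limit=0.01, n_bins=256):
    """Contrast-limited histogram equalization for a single tile of an
    image with intensities in [0, 1] (no inter-tile interpolation)."""
    hist, _ = np.histogram(tile, bins=n_bins, range=(0.0, 1.0))
    hist = hist.astype(float) / tile.size
    # clip the histogram and redistribute the excess mass uniformly,
    # which limits how much any single grey level can be amplified
    excess = np.maximum(hist - clip_limit, 0.0).sum()
    hist = np.minimum(hist, clip_limit) + excess / n_bins
    cdf = np.cumsum(hist)
    # map each pixel through the clipped CDF
    idx = np.clip((tile * n_bins).astype(int), 0, n_bins - 1)
    return cdf[idx]

rng = np.random.default_rng(0)
tile = rng.normal(0.5, 0.05, size=(64, 64)).clip(0, 1)  # low-contrast stand-in
out = clahe_tile(tile)
print(out.std() > tile.std())  # contrast (intensity spread) increases
```

Lowering `clip_limit` tames noise amplification at the cost of weaker contrast enhancement, which is the trade-off CLAHE navigates on the low-contrast MR slices.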

5.1.2 Evaluation of Algorithm: Same dataset for training and testing

When the same dataset was used for both training and testing, the segmentation results were good. As seen in Table 4.2, the algorithm performs well, since the validation DSC reaches a high value. The highest validation DSC was achieved by the NILB dataset. According to Section 2.1.1, 3.0 T MRI scanners give better image quality than 1.5 T scanners; they also have a higher CNR and can demarcate the liver pixels more clearly from the surrounding organs. Ideally, the HiFi dataset should therefore have given the best result. This was not the case, probably because the NILB dataset contained almost twice as much training data as the HiFi dataset, and in machine learning more training data generally leads to better results. The way the MR images were reconstructed could also have played a role, and many images in the HiFi dataset suffer from the windowing issue as well.

PSC dataset: In Figure 4.6 and Figure 4.7, the ground-truth comparisons make it easy to see where the segmentation fails. The predicted masks mainly differ from the ground truth where the ground truth contains holes or crevices; in the prediction these are either exaggerated or absent. In Figure 4.7, the hole in the predicted mask is exaggerated compared to the ground truth. In Figure 4.9 and Figure 4.10, the predicted mask is very similar to the ground truth, differing only in smaller white pixels. The segmentation mainly differs from the ground truth and slightly predicts
