Automated Measurements of Liver Fat Using Machine Learning


Tobias Grundström
LiTH-ISY-EX--18/5166--SE

Supervisors: Andreas Robinson, isy, Linköping University
             Magnus Borga, Hannes Järrendahl, AMRA Medical AB
Examiner: Maria Magnusson, isy, Linköping University

Computer Vision Laboratory
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2018 Tobias Grundström


The purpose of the thesis was to investigate the possibility of using machine learning for automation of liver fat measurements in fat-water magnetic resonance imaging (mri). The thesis presents methods for texture based liver classification and Proton Density Fat Fraction (PDFF) regression using multi-layer perceptrons utilizing 2D and 3D textural image features. The first proposed method was a data classification method with the goal to distinguish between suitable and unsuitable regions to measure PDFF in. The second proposed method was a combined classification and regression method where the classification distinguishes between liver and non-liver tissue. The goal of the regression model was to predict the difference d = pdff_mean − pdff_ROI between the manual ground truth mean and the fat fraction of the active Region of Interest (roi).

Tests were performed on varying sizes of Image Feature Regions (froi) and combinations of image features for both of the proposed methods. The tests showed that 3D measurements using image features from discrete wavelet transforms produced measurements similar to the manual fat measurements. The first method resulted in lower relative errors, while the second method had a higher method agreement compared to manual measurements.


Firstly, I would like to thank AMRA Medical AB and my supervisors Magnus Borga and Hannes Järrendahl for the opportunity to perform my thesis work there. I also want to thank everyone else at AMRA for all the help and great ideas, especially Peter Karlsson Zetterberg and the rest of the DevOps team. Secondly, I want to thank my supervisor Andreas Robinson and my examiner Maria Magnusson at the Department of Electrical Engineering. Your valuable knowledge and feedback have helped me produce a thesis work that I am really proud of.

Finally, I want to thank everyone that I have come into contact with in different associations during my years at Linköping University. All the adventures we have had together have helped motivate me in my studies, and have left me eager to keep on exploring the world.

Linköping, August 2018 Tobias Grundström


Contents

Notation

1 Introduction
1.1 Background
1.2 Problem Statement
1.3 Limitations
1.4 Thesis Outline

2 Theoretical Background
2.1 Related Work
2.2 Magnetic Resonance Imaging
2.2.1 Spin Physics
2.2.2 Excitation
2.2.3 Relaxation Times
2.2.4 Volume Construction
2.2.5 Chemical Shift
2.2.6 Dixon Imaging
2.2.7 Proton Density Fat Fraction
2.3 Neural Networks
2.3.1 Data Classification
2.3.2 Data Regression
2.3.3 Perceptron
2.3.4 Activation Functions
2.3.5 Multi-layer Perceptron
2.3.6 Back-propagation
2.4 Image Features
2.4.1 Gray Level Co-occurrence Matrix
2.4.2 Discrete Wavelet Transform
2.4.3 Histogram of Gradient Magnitude

3 Method
3.1 Development Environment
3.2 Data Acquisition
3.2.1 Liver Localization
3.2.2 Binary Masks
3.2.3 Feature Extraction
3.3 Classification Method
3.4 Classification and Regression Method
3.5 Network Training
3.6 Performance Evaluation
3.6.1 Classifier Evaluation
3.6.2 System Evaluation

4 Results
4.1 Approach
4.2 Experimental Setup
4.2.1 Classification Method
4.2.2 Classification and Regression Method
4.3 Feature Region Size
4.4 Image Features
4.5 Classification and Regression

5 Discussion
5.1 Feature Region Size
5.2 Image Features
5.3 Classification and Regression

6 Conclusion
6.1 Conclusion
6.2 Future Work


Number sets

Notation   Description
R          Set of real numbers
Z          Set of integers

Abbreviations

Abbreviation   Description
mri            Magnetic resonance imaging
roi            Region of interest
froi           Feature region of interest
pdff           Proton density fat fraction
rf             Radio frequency
mlp            Multi-layer perceptron
glcm           Gray level co-occurrence matrix
i              Intensity (image feature)
dwt            Discrete wavelet transform
hgm            Histogram of gradient magnitude
loa            Limits of agreement
cnn            Convolutional neural network


1 Introduction

The following chapter presents the background and purpose of this thesis. The limitations and the outline of the thesis are also presented.

1.1 Background

The purpose of this thesis is to investigate the possibility of using machine learning for automation of liver fat measurements in fat-water magnetic resonance imaging. Today, liver fat is measured in a number of manually positioned regions of interest (roi). The roi:s are placed manually, avoiding larger blood vessels and bile ducts that would disturb the measurements. AMRA is now interested in automating this process. The thesis project investigates machine learning techniques for selecting the placement of the liver roi:s. The aim is to have a validated prototype for automated placement of roi:s for liver fat measurements.

1.2 Problem Statement

The following section presents the primary and secondary questions that the project aims to answer. The questions are stated from the previously mentioned background to the project and cover both data acquisition and the use of machine learning as a method.

• Is it possible to generate training data by using a liver MR image and a measured ground truth of average fat percentage?

• Does the approach of Machine Learning work as a way of automating liver fat measurements?

• Is a segmentation of the liver necessary to obtain good results?


• What aspects need to be considered if further work on the subject is to be performed?

1.3 Limitations

This thesis does not include automated localization of the liver in a whole body scan. Instead, spatial information about the liver, based on mri scans centered on the liver, will be used.

1.4 Thesis Outline

Chapter 2 presents the theoretical background of this thesis, including Magnetic Resonance Imaging, Neural Networks and a number of image features. Chapter 3 describes the proposed method for data acquisition, classification and regression. Chapter 4 contains the experimental setup and the achieved experimental results. Chapter 5 discusses the results in Chapter 4 and is the foundation for the concluding remarks and suggested further research in Chapter 6.


2 Theoretical Background

In the following chapter the theoretical background to the work is presented, describing Magnetic Resonance Imaging, Neural Networks and image feature extraction methods. The level of theory is adapted to a reader studying at Master's degree level in engineering and assumes basic knowledge of physics and multivariable calculus.

2.1 Related Work

To analyse liver regions, different image features are used in previous research. The most common types of image features in Magnetic Resonance Imaging (mri) describe local texture patterns. Textural features typically consist of Gray Level Co-occurrence Matrix (glcm) measurements, gradient histograms and features calculated using the Discrete Wavelet Transform (dwt). The textural measurements are generally used in classification of tumors or other abnormalities in bodily organs. Frequently occurring machine learning techniques include Multi-Layer Perceptrons (mlp), Support Vector Machines (SVM) and Convolutional Neural Networks (cnn).

Yudong Zhang et al. [25] perform brain tumour classification using dwt image features fed to a neural network. The image features' dimensions are reduced by principal component analysis (PCA). They achieve high classification accuracy with a neural network consisting of one hidden layer and a low number of nodes. Gobert Lee et al. [7] conduct research on liver cirrhosis identification using gradient magnitude histograms and glcm texture analysis in subset roi:s of an MR image. The learning approach used unsupervised k-means clustering and achieved a sensitivity and specificity of 72% and 60%, respectively.


Li Zhenjiang et al. [10] show that texture analysis and detection of liver lesions in mri scans using glcm, dwt and gray level gradient co-occurrences are valuable techniques that achieve good results. They compare several machine learning classification methods, including Neural Networks, Support Vector Machines and k-Nearest Neighbor, that all perform well.

Ritu Punia and Shailendra Singh [18] use image features based on dwt and gray level histograms to detect liver regions. The optimal features are found using a genetic algorithm inspired by the principle of biological evolution. The optimized feature vector is fed to a neural network and they achieve a liver pixel classification accuracy of 96.2%.

Convolutional Neural Networks (CNN), used in many state-of-the-art applications as in [9], produce high accuracy classification. The disadvantage is that pre-trained medical imaging CNNs are not as widely available as CNNs trained on natural images, resulting in longer training times and a need for more data to achieve state-of-the-art results.

Multi-layer perceptrons (mlp) are deemed sufficient, based on previous work, to test the hypothesis of machine learning based pdff measurements. Also, shorter training times compared to cnn:s enable several approaches to be evaluated in the given thesis time frame. This thesis presents methods for texture based liver classification and pdff regression using mlp:s utilizing 2- and 3-dimensional textural image features.

2.2 Magnetic Resonance Imaging

Magnetic Resonance Imaging (mri) uses properties of hydrogen atoms in human tissue, when exposed to a strong magnetic field, to generate images. The Larmor frequency is associated with the magnetic moment of the proton and the magnetic field strength. A radio frequency pulse at a certain frequency ω is applied. If ω matches the Larmor frequency, the proton will precess. When the pulse is turned off, the protons return to their lowest energy state and release energy, which is measured. The following sections are based on [4] and [17] unless other sources are mentioned.

2.2.1 Spin Physics

The principle behind mri is built on the quantum mechanical property of spin. Spin can be described as the rotation of the nucleus about its own axis. Due to the positive charge of the nucleus, a magnetic field is generated. The magnetic moment of the nucleus µ is proportional to its spin S by its gyromagnetic ratio γ,

µ = γS. (2.1)

The magnetic moment also possesses the same spatial orientation as S.

In an mri scan a strong external magnetic field B0 is applied over the subject.

Protons in nuclei of mainly hydrogen atoms will precess about the direction of B0 at the Larmor frequency ωL as shown in Figure 2.1. The relation between the magnitude of the magnetic field and the precession frequency is given by the Larmor equation,

ωL = γ B0. (2.2)

Figure 2.1: A rotating nucleus in a static magnetic field B0, precessing at the Larmor frequency ωL. Image adapted from [17].
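As a small numerical illustration of Equation (2.2), the precession frequency can be computed directly. The value of γ/2π for hydrogen (≈ 42.58 MHz/T) is a standard physical constant, not a number taken from the thesis:

```python
# Larmor frequency f_L = (gamma/2pi) * B0 for hydrogen protons (Eq. 2.2).
GAMMA_BAR = 42.58e6  # Hz/T, gyromagnetic ratio of 1H divided by 2*pi

def larmor_frequency(b0_tesla):
    """Precession frequency in Hz for a given field strength."""
    return GAMMA_BAR * b0_tesla

print(larmor_frequency(1.5) / 1e6)  # ~63.9 MHz at 1.5 T
print(larmor_frequency(3.0) / 1e6)  # ~127.7 MHz at 3 T
```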

The net magnetization M of the N nuclei in a given volume V is calculated from the magnetic moments,

M = (1/V) Σ_{i=1}^{N} µ_i. (2.3)

M consists of components both parallel and perpendicular to B0. The component in the transverse plane, M⊥, is zero at the equilibrium state, due to the nuclei precessing at random phases and cancelling out one another. The longitudinal component in the z-direction on the other hand has a small net magnetization M∥ = M0, due to the nuclei tending to align along the direction of the external magnetic field. This tendency comes from the potential energy of the magnetic moments in B0 moving towards the minimum energy state. According to

E = −µ · B0, (2.4)

the energy is minimized when µ is aligned with the direction of B0.


2.2.2 Excitation

A magnetic field pulse B1 in the radio frequency (rf) domain is applied to the subject in the transverse plane. In Figure 2.2, B1 oscillates at a frequency equal to the Larmor frequency, ω = ωL, and thus resonance of the protons is achieved and the spin will be tipped away from the ωL-axis. The flip angle θ, i.e. how wide the protons precess around the axis, increases with the magnitude or the duration τ of B1 according to the relation

θ = γB1τ. (2.5)

When the pulse is turned off the protons gradually return to the equilibrium state.

Figure 2.2: A reference frame rotating at the Larmor frequency, depicting the flip angle θ of the magnetization M when a magnetic field B1 is applied. The protons precess around the axis along which B1 is applied. Image adapted from [17].

2.2.3 Relaxation Times

When an rf pulse is applied, the magnetization in the transverse plane, Mxy, increases while Mz decreases. When the pulse ends, Mxy has reached its maximum magnitude. The magnitude then gradually decreases until the protons have returned to the equilibrium state where Mz = M0. The time it takes to reach equilibrium is called the relaxation time. The relaxation time in the longitudinal direction is the time it takes for Mz to reach 63% of its maximum value. It is called T1 relaxation and is shown in Figure 2.3a. It is given by

Mz(t) = M0(1 − e^(−t/T1)). (2.6)

In the transverse direction the process is called T2 relaxation, seen in Figure 2.3b. T2 relaxation is defined as the time it takes the magnetization Mxy to decrease to 37% of its initial value. It is described as

Mxy(t) = Mxy(0)e^(−t/T2). (2.7)

(a) Increase of magnetization over time in the longitudinal direction, showing T1 relaxation. (b) Decrease of magnetization in the transverse plane, showing T2 relaxation.

Figure 2.3: Graphs showing T1 and T2 magnetization relaxation. Image source [20].
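The relaxation curves in Equations (2.6) and (2.7) can be sketched numerically. The T1 and T2 values below are illustrative assumptions, not measured tissue constants:

```python
import numpy as np

# T1/T2 relaxation curves (Eqs. 2.6-2.7); M0 and the time constants are
# illustrative example values.
def mz(t, m0=1.0, t1=0.8):
    """Longitudinal recovery: Mz(t) = M0 * (1 - exp(-t/T1))."""
    return m0 * (1.0 - np.exp(-t / t1))

def mxy(t, mxy0=1.0, t2=0.1):
    """Transverse decay: Mxy(t) = Mxy(0) * exp(-t/T2)."""
    return mxy0 * np.exp(-t / t2)

# At t = T1 the longitudinal magnetization has recovered 1 - 1/e ~ 63%.
print(round(float(mz(0.8)), 3))   # 0.632
# At t = T2 the transverse magnetization has decayed to 1/e ~ 37%.
print(round(float(mxy(0.1)), 3))  # 0.368
```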

2.2.4 Volume Construction

In order to construct an MR volume, each voxel has to be associated with a signal response; the concept of gradient coils is used to encode this spatial information.

At first, a slab, a 3D volume of the patient, is excited as described in Section 2.2.2. A gradient magnetic field is applied in every spatial dimension (x, y and z), gradually increasing in magnitude as the coordinate increases. The gradient fields (Gx, Gy, Gz) combined with the static field B0 result in a spatially varying field.


The angular frequencies of the spins can then be calculated using the relation in (2.2) as

ω(x, y, z, t) = γB(x, y, z, t) = γB0 + γ(xGx(t) + yGy(t) + zGz(t)). (2.9)

The emitted signal of the slice is contributed to by all the spins in that volume. The signal responses are stored in an array of numbers called a k-space volume, with the encoding in the x-direction along one axis, kx, the encoding in the y-direction along the second axis, ky, and the encoding in the z-direction, kz, along the third axis. When the amplitudes of the three gradients change with the time t, it is possible to measure the signal in every position of the k-space. The k-space s(kx, ky, kz) is a frequency domain representation of the MR volume, and using the inverse three-dimensional Fourier transform,

I(x, y, z) = ∫∫∫ s(kx, ky, kz) e^(2πi(kx·x + ky·y + kz·z)) dkx dky dkz = F⁻¹{s(kx, ky, kz)}, (2.10)

the volume I in the signal domain is reconstructed. A slice of the volume is shown in Figure 2.4b.

(a) A k-space representation of the volume slice in Figure 2.4b. (b) A reconstructed MR volume slice I(x, y, z = z1) after the inverse Fourier transform has been applied.

Figure 2.4: The k-space representation of the signal and the reconstructed MR volume.
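The relationship between k-space and the reconstructed volume in Equation (2.10) can be sketched with NumPy's FFT on a synthetic 2D slice (a toy phantom, not MR data):

```python
import numpy as np

# Round trip between image space and k-space (cf. Eq. 2.10) via the FFT,
# using a synthetic 2D slice in place of measured MR data.
image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0  # simple bright square "phantom"

k_space = np.fft.fftshift(np.fft.fft2(image))            # forward transform to k-space
reconstructed = np.fft.ifft2(np.fft.ifftshift(k_space))  # inverse transform back

# The reconstruction matches the original up to numerical precision.
print(np.allclose(reconstructed.real, image))  # True
```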

2.2.5 Chemical Shift

Living tissue is dominated by hydrogen atoms in water that produce an mri signal. Tissue containing fat also produces a notable signal. Fat and water are chemically shifted, and this property is utilized when separating tissue types in an MR image. A chemical shift is associated with a frequency shift of the different protons' Larmor precession frequencies due to differences in magnetic shielding in


the chemical compounds. The fat is shifted to a lower precession frequency (ωf) compared to water (ωw), and the difference is described by

Δω_fw = ωf − ωw = −σγB0, (2.11)

where σ is a dimensionless fraction of the field B0 that describes the chemical shift.

In the reference frame of the water signal w, and due to Δω_fw, the protons will alternate between precessing in phase and out of phase. This is the property that the Dixon separation method in Section 2.2.6 takes advantage of.

2.2.6 Dixon Imaging

The two point Dixon method [5] consists of acquiring two separate spin echo images with different echo times. First, an image where the fat and water signals are in phase,

I_IP = f + w, (2.12)

and secondly, one where the signals are out of phase,

I_OP = f − w. (2.13)

The image containing the separated fat signals can be extracted as half the sum of I_IP and I_OP according to

(1/2)(I_IP + I_OP) = (1/2)(f + w + f − w) = (1/2)(2f) = f, (2.14)

and the separated water signals are similarly extracted from the difference of I_IP and I_OP,

(1/2)(I_IP − I_OP) = (1/2)(f + w − (f − w)) = (1/2)(2w) = w. (2.15)

See Figure 2.5 for an illustration of the different images.
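A minimal sketch of the two-point Dixon arithmetic in Equations (2.12)-(2.15), using small synthetic fat and water maps in place of acquired echoes:

```python
import numpy as np

# Two-point Dixon separation (Eqs. 2.12-2.15) on synthetic fat/water maps.
fat = np.array([[0.1, 0.8], [0.05, 0.3]])
water = np.array([[0.9, 0.2], [0.95, 0.7]])

i_ip = fat + water   # in-phase image,      I_IP = f + w   (Eq. 2.12)
i_op = fat - water   # opposed-phase image, I_OP = f - w   (Eq. 2.13)

fat_image = 0.5 * (i_ip + i_op)    # Eq. 2.14 recovers the fat signal
water_image = 0.5 * (i_ip - i_op)  # Eq. 2.15 recovers the water signal

print(np.allclose(fat_image, fat), np.allclose(water_image, water))  # True True
```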

2.2.7 Proton Density Fat Fraction

The signals of the mri are separated into fat and water images as described in Section 2.2.6. From these images the Proton Density Fat-Fraction (pdff) is measured as the ratio of the density of mobile protons in the fat signal to the total density of protons in mobile fat and mobile water. pdff is a standardized biomarker for measuring fat concentration. It is interpreted as the actual fat fraction of MR-visible tissue. [19]

PDFF = f / (f + w). (2.16)


(a) An in phase image of a liver, I_IP. (b) An opposite phase image of a liver, I_OP. (c) A fat image created from the addition of I_IP and I_OP. (d) A water image created from the subtraction of I_IP and I_OP.

Figure 2.5: In phase and opposite phase images that create fat and water separated images.
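Applied voxel-wise to the separated images, Equation (2.16) amounts to a one-line function. The epsilon guard against signal-free background voxels is my addition, not part of the thesis definition:

```python
import numpy as np

def pdff(fat, water, eps=1e-8):
    """Voxel-wise proton density fat fraction, PDFF = f / (f + w) (Eq. 2.16)."""
    return fat / (fat + water + eps)  # eps avoids division by zero (assumption)

f = np.array([[0.1, 0.3]])
w = np.array([[0.9, 0.7]])
print(pdff(f, w))  # fat fractions of ~0.1 and ~0.3
```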

2.3 Neural Networks

An artificial neural network (ANN) is a mathematical model for data classification and regression inspired by a network of biological neurons, as in the human brain. Neural networks are common in, for example, image processing, object recognition and voice recognition. This section presents the different parts of a simple neural network called the multi-layer perceptron and how it is trained from given data. The following section is based on [12] and [2].

2.3.1 Data Classification

Data classification is the process of correctly classifying data as one of a set of possible classes from a given input. In supervised learning, where neural networks are used, a rule for classification called a discriminant function that separates the data is created. The discriminant is based on data samples where the class is known, in order to predict the corresponding unknown class for a new, previously unseen data sample. Figure 2.6 shows two classes being separated by a discriminant. New samples on one side of the discriminant will correspond to


one class, and samples on the other side to the other class.

Figure 2.6: Two data classes (green and red) separated by a linear discriminant function. The discriminant function outputs values of 1 if the input sample belongs to the green class and 0 if the sample belongs to the red class.

2.3.2 Data Regression

Data regression analysis is the task of estimating a model that fits given data samples, where the output value is known. Compare with classification, where the goal is to discriminate between samples. The goal of the regression model is to correctly predict output values based on inputs. Neural networks can also be used in regression analysis to predict continuous output values. Figure 2.7 depicts a linear regression model that aims to describe the surrounding samples.

Figure 2.7: A regression model fit to several data samples. The regression model generates a continuous value based on the input sample.


2.3.3 Perceptron

The perceptron creates linear discriminant functions and regression models in the form of a hyperplane. The perceptron in Figure 2.8 is a statistical model inspired by a neuron. For a given input, the perceptron returns a class for the sample. The decision is based on previous data samples used to optimize the parameters of the model. The equation

z_k = Σ_j w_kj x_j + w_0, (2.17)

describes a perceptron mathematically. Here x_j are the inputs, which can come from the environment or from other perceptrons, and w_kj are the weights to neuron node z_k. The weight w_0 is called the bias weight and has the purpose of providing a more general model by translating the model in the input feature space. The input x_0 is called the bias unit and is always set to a static value of +1. To create a non-linear function, two additional steps are necessary, namely activation functions and hidden layers. These are discussed in Sections 2.3.4 and 2.3.5.

Figure 2.8: A single perceptron that adds all weighted inputs and passes the value to an activation function f(z).
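Equation (2.17), followed by the sigmoid activation of Section 2.3.4, can be sketched directly. The weights below are arbitrary example values:

```python
import numpy as np

# A single perceptron (Eq. 2.17): weighted sum of the inputs plus a bias
# weight, followed by a logistic sigmoid activation (Eq. 2.19).
def perceptron(x, w, w0):
    """z = sum_j w_j * x_j + w0, then sigmoid activation."""
    z = np.dot(w, x) + w0
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
print(perceptron(x, w, w0=0.1))  # sigmoid(-0.4) ~ 0.401
```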

2.3.4 Activation Functions

In order to model non-linear functions, the first step is to use different types of activation functions f(z_k), as shown in Figure 2.8, at the output of the neurons,

y_k = f(z_k). (2.18)

An activation function both provides non-linearity to the weighted sum of the neuron and limits the output of the neuron to the output range of the activation function. A common activation function in the intermediate layers of a multi-layer perceptron is the logistic sigmoid function in Equation (2.19). The logistic sigmoid function limits the output of the neuron to between 0 and 1,

f(z_k) = 1 / (1 + exp(−z_k)). (2.19)


2.3.5 Multi-layer Perceptron

The second step of modelling non-linear functions is to add several layers of multiple perceptron nodes, a model called the multi-layer perceptron (mlp), see Figure 2.9. A multi-layer perceptron contains an input layer where data enters the network, an output layer where the resulting classification or prediction of the data is shown, and one or several intermediate layers, called hidden layers. All the layers are fully connected, meaning that all inputs are connected to all nodes by a weighted path. Only the input and expected output are known when training the network, and that is the property used in the back-propagation training method. When training the network, the purpose is to find the optimal weights that lead to an output that minimizes an error function. An example is the squared error function

ε = (1/2) Σ_k^K (d_k − y_k)², (2.20)

where d_k is the expected output, y_k is the received output from node k and K is the number of output nodes. A second example is the logarithmic loss function, also called the cross entropy function,

ε = − Σ_k^K Σ_n^N d_k log y_k, (2.21)

with K output nodes and N sample classes. The loss function simplified to two discriminating classes results in

ε = − Σ_k (d_k log y_k + (1 − d_k) log (1 − y_k)). (2.22)

The cross entropy function measures the dissimilarity between d_k and y_k. When a correct classification is performed, the value of the function is 0.
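The two-class cross entropy of Equation (2.22) can be sketched as follows; the clipping of y is a numerical-safety addition of mine, not part of the definition:

```python
import numpy as np

# Binary cross entropy (Eq. 2.22): 0 for a perfect prediction, growing as
# the predicted probability y moves away from the target d.
def cross_entropy(d, y, eps=1e-12):
    y = np.clip(y, eps, 1 - eps)  # guard against log(0); my addition
    return -np.sum(d * np.log(y) + (1 - d) * np.log(1 - y))

d = np.array([1.0, 0.0])
print(round(cross_entropy(d, np.array([1.0, 0.0])), 6))  # 0.0 (correct classification)
# A prediction further from the target gives a larger loss.
print(cross_entropy(d, np.array([0.6, 0.4])) > cross_entropy(d, np.array([0.9, 0.1])))  # True
```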

2.3.6 Back-propagation

The back-propagation method is based on minimizing the output error by using the steepest descent algorithm; the algorithm utilizes the direction of the gradient,

∇ε = ∂ε/∂w_kj, (2.23)

to incrementally move towards a minimum value. The gradient of the error function is determined by using the derivative chain-rule to work its way back to the known input of the system. The weights from node j to k are updated by

w_kj(m + 1) = w_kj(m) − η∇ε, (2.24)

Figure 2.9: An example of a multi-layer perceptron with an input layer, one hidden layer and a single node output layer.

at the m:th iteration of the algorithm, where η is the learning rate.

Given output layer nodes k ∈ (0, ..., K) and adjacent previous layer nodes j ∈ (0, ..., J), the output layer weights w_kj are updated using (2.24) and the gradient

Φ_k = (∂ε/∂y_k)·(∂y_k/∂z_k) = f′(z_k)(y_k − d_k), (2.25)

∇ε = ∂ε/∂w_kj = (∂ε/∂y_k)·(∂y_k/∂z_k)·(∂z_k/∂w_kj) = Φ_k (∂z_k/∂w_kj) = Φ_k y_j. (2.26)

The update rule can be generalized for an arbitrary layer disregarding the output layer. The generalized update rule is described by (2.27) and (2.28). For any activation function f, the update rule is extended by updating the weights from nodes i to j at layer r. The update is performed by taking the layers r − 1, r, r + 1 with nodes i, j and k into consideration,

Φ_j^r = (∂ε/∂y_j^r)·(∂y_j^r/∂z_j^r) = f′(z_j^r) Σ_k Φ_k^{r+1} w_kj^{r+1}, (2.27)

∇ε = ∂ε/∂w_ji^r = (∂ε/∂y_j^r)·(∂y_j^r/∂z_j^r)·(∂z_j^r/∂w_ji^r) = Φ_j^r (∂z_j^r/∂w_ji^r) = Φ_j^r y_i^{r−1}. (2.28)

Equation (2.27) indicates that Φ_j^r at layer r depends on all subsequent layers (r + 1, ...).


The learning rate η determines the size of the step taken in the direction of the gradient, in other words how big the updates of the weight parameter values are. To decrease the time until the function has converged, an adaptive learning rate can be used. In this case a larger value of η is chosen at the start of the training. By lowering the value of η as the learning process slows down, fine tuning of the weight parameters can be achieved.
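The full update loop of Equations (2.24)-(2.28) for a tiny one-hidden-layer network with sigmoid activations and squared error can be sketched as follows. The network size, learning rate and iteration count are illustrative choices, not the thesis configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.2, -0.4, 0.7])   # single input sample
d = np.array([1.0])              # expected output
W1 = rng.normal(size=(4, 3))     # hidden-layer weights (bias terms omitted for brevity)
W2 = rng.normal(size=(1, 4))     # output-layer weights
eta = 0.5                        # learning rate

for _ in range(1000):
    # Forward pass
    y1 = sigmoid(W1 @ x)                  # hidden activations
    y2 = sigmoid(W2 @ y1)                 # network output
    # Output layer: Phi_k = f'(z_k) * (y_k - d_k)          (Eq. 2.25)
    phi2 = y2 * (1 - y2) * (y2 - d)
    # Hidden layer: Phi_j = f'(z_j) * sum_k Phi_k * w_kj   (Eq. 2.27)
    phi1 = y1 * (1 - y1) * (W2.T @ phi2)
    # Steepest descent updates: w <- w - eta * grad        (Eq. 2.24)
    W2 -= eta * np.outer(phi2, y1)
    W1 -= eta * np.outer(phi1, x)

y_final = float(sigmoid(W2 @ sigmoid(W1 @ x)))
print(y_final > 0.9)  # True: training drives the output toward d = 1
```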

2.4 Image Features

The input to the mlp classifier consists of several image features. In the following sections the gray level co-occurrence matrix, the histogram of gradient magnitudes and discrete wavelet transforms are presented. The different image features describe textural properties of local image regions.

2.4.1 Gray Level Co-occurrence Matrix

0 1 1 2 3
0 1 1 3 2
0 0 0 1 3
1 2 2 3 3
1 2 2 3 3

Figure 2.10: A 5 × 5 pixel image region with four different gray levels.

The gray level co-occurrence matrix (glcm) [13] is used for calculating textural features in images. The textural features consist of several statistical measurements describing different relationships between pixel gray levels in a designated region.

Let the gray-scale image I(x) be an M × M matrix, i.e. x ∈ {1, ..., M}², with N gray levels. Let the gray-level co-occurrence matrix P be an N × N matrix with elements P(i, j), i, j ∈ {0, ..., N − 1}. For a given displacement d, and for every x such that x ∈ {1, ..., M}² and x + d ∈ {1, ..., M}², count the tuples (i = I(x), j = I(x + d)) and store the counts in P(i, j).

As stated above, the glcm of an image patch extracted from an image with N gray levels consists of an array P(i, j) of N × N elements. Figure 2.10 shows a 5 × 5 pixel image patch with four gray levels that will be used as an example. Several measures for different angles can be performed. For example, at angle 0 the neighboring pixel at distance d to the right of the query pixel is considered. The occurrence rates of gray value i with neighboring gray value j at distance d = 1 are counted and entered at P(i, j) in Figure 2.12a. For example, at position (3, 2)


Figure 2.11: The glcm matrix of an image region can be calculated at different angles (0, π/4, π/2, 3π/4) and different step lengths d.

the number of times a pixel with gray value 3 has a neighboring pixel to the right with gray value 2 is counted. Matrices can be calculated for the angles 0, π/4, π/2 and 3π/4 in Figure 2.11, resulting in different angular glcm:s. Due to symmetry, where (i, j) occurs at the same rate at angle 0 as (j, i) occurs at angle π, the remaining angles can be taken into account by adding P(i, j) to its transpose, see Figure 2.12b. In a similar way π/4 is combined with 5π/4 and π/2 is combined with 3π/2. To provide rotation invariance to the features, the final textural features are calculated from an average of all the angular glcm matrices. In Figure 2.12c the glcm is normalized by dividing each entry in the matrix in Figure 2.12b by the total number of occurrences.

Haralick [13] presents 14 textural measures. Since several of the measures describe similar features, not all of them will be described here. Three distinct measures will be shown, however. The contrast measure,

f_con = Σ_{i,j} P(i, j)|i − j|², (2.29)

describes the amount of local variation in the image. All diagonal values in P that have gray level i = j give no contribution to the contrast. By weighting higher gray level differences higher, these elements will contribute more to the contrast measure. The angular second-moment (ASM),

f_ASM = Σ_{i,j} P(i, j)², (2.30)

results in high values when occurrence rates are high. In other words, if a certain gray level difference occurs more frequently, there is more homogeneity in the image and the ASM measure will be higher. To measure linear dependencies in


(a) glcm matrix calculated from the image region with four gray levels in Figure 2.10, showing the occurrence rates of gray level differences at angle 0 and distance 1 (rows i = 0..3, columns j = 0..3):

2 3 0 0
0 2 3 2
0 0 2 3
0 0 1 2

(b) Symmetrical glcm matrix created by adding P(i, j) to P(i, j)^T:

4 3 0 0
3 4 3 2
0 3 4 4
0 2 4 4

(c) Normalized symmetrical glcm where all elements have been divided by the total number of occurrences:

0.1    0.075  0      0
0.075  0.1    0.075  0.05
0      0.075  0.1    0.1
0      0.05   0.1    0.1

Figure 2.12: GLCM matrices in the basic, symmetrical and normalized variants.

the image, the correlation feature,

f_corr = (Σ_{i,j} ij·P(i, j) − µx µy) / (σx σy), (2.31)

is used, where the means µx, µy and standard deviations σx, σy of px(i) = Σ_j P(i, j) and py(j) = Σ_i P(i, j) are taken into account. A high correlation indicates higher amounts of linear structure in the image.

2.4.2 Discrete Wavelet Transform

The continuous wavelet transform,

C_f(a, b) = ∫_R f(t) ψ_{a,b}(t) dt, (2.32)

is a 1D signal analysis method that acts at several resolutions simultaneously. By applying a mother wavelet,

ψ_{a,b}(t) = (1/√a) ψ((t − b)/a), with ∫_R ψ(t) dt = 0, (2.33)

to a signal f, the fluctuations of the signal can be analyzed at any scale a ∈ R+ and position b ∈ R using the coefficients C_f(a, b).


In the case of sampled signals, a discrete version of the previously mentioned transform, the discrete wavelet transform (dwt),

ψ_{j,k}(t) = (1/√(2^j)) ψ((t − k·2^j)/2^j), j, k ∈ Z, (2.34)

is derived. It can be implemented using filter banks containing high- and low-pass filters as in Figure 2.13. The dwt decomposes the signal into approximation coefficients A and detail coefficients D that are obtained using a convolution followed by down-sampling by a factor of 2. The signal is convolved with a low-pass filter for the approximation and a high-pass filter for the detail coefficients. By recursively performing the process on A, the signal can be decomposed into several levels. The approximation A_{j−1} at scale j − 1 can be reconstructed using A_j and D_j at level j according to

A_{j−1} = A_j + D_j. (2.35)

Figure 2.13: A two-level filter bank implementation of the dwt. Decomposition of the input signal into first and second level detail and approximation coefficients. The boxes denote low- and high-pass filters (LP and HP) and down-sampling by a factor of 2 (2↓).

To generalize the dwt algorithm to two-dimensional signals (images), the decomposition is performed in both the horizontal and the vertical directions. This results in detail coefficients in the horizontal (DH), vertical (DV) and diagonal (DD) directions, as seen in Figure 2.14. [15]

The image features are calculated as the mean and standard deviation of each set of coefficients at every decomposition level [21].



Figure 2.14: 2 level 2D discrete wavelet transform, approximation A, hori-zontal DH, vertical DV and diagonal coefficients DD.

2.4.3 Histogram of Gradient Magnitude

The histogram of gradient magnitude (hgm) is an image feature that describes the distribution of local shape intensities in an image region I. A gradient image is created using convolution kernels, the Sobel filters,

S_x = [ 1 0 −1 ; 2 0 −2 ; 1 0 −1 ],    (2.36)
S_y = [ 1 2 1 ; 0 0 0 ; −1 −2 −1 ],    (2.37)

resulting in G_x = S_x ∗ I and G_y = S_y ∗ I, respectively. The gradient magnitude image is a combination of the gradient images, G_m = |G| = √(G_x² + G_y²). The gradient magnitudes are normalized and distributed in a histogram with n bins, H(G_m, n), as shown in Figure 2.15, where the histogram itself is the image feature descriptor. [22]
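A minimal sketch of the descriptor, assuming the magnitudes are normalized to the maximum value before binning (the exact normalization is not spelled out above); function names are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel kernels, Equations (2.36) and (2.37)
SX = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
SY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def hgm(image, n_bins=20):
    """Histogram of gradient magnitude as an n_bins-long feature vector."""
    gx = convolve2d(image, SX, mode='same', boundary='symm')
    gy = convolve2d(image, SY, mode='same', boundary='symm')
    gm = np.hypot(gx, gy)               # gradient magnitude image G_m
    if gm.max() > 0:
        gm = gm / gm.max()              # normalize magnitudes to [0, 1]
    hist, _ = np.histogram(gm, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()            # normalized histogram descriptor
```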


Figure 2.15: A histogram of gradient magnitudes H(G_m, n) with 20 bins (n ∈ {1, 2, ..., 20}) that is used as a feature descriptor.


3 Method

In the following chapter the two proposed methods are presented. The first method is a data classification method with the goal to distinguish between suitable and unsuitable regions to measure pdff in. The second method is a combined classification and regression method where the classification distinguishes between liver and non-liver tissue. The goal of the regression model is to predict the difference d = pdff_mean − pdff_roi between the manual ground truth mean and the fat fraction of the active roi. Thus, the data sample output will be d instead of a class label as in the classification case. It is described how training data is acquired from the available data collection. This is followed by a description of the classifier implementation.

3.1 Development Environment

The system was developed in Python 3 with the external libraries Sci-kit learn [16] and Keras [6] for the implementation of the learning algorithms. For the image processing methods, Sci-kit image [24] and PyWavelets [11] were used.

3.2 Data Acquisition

The UK Biobank [1] is a health data resource in which 500,000 people aged 40-69 years have taken part in measurements and analyses of different kinds. The goal of the project was to create an important resource open for research in both academia and industry. In this thesis, full body and liver mri scans were used in the experiments. The data available from the UK Biobank has been marked with roi:s by an operator and measured using AMRA's methods. All possible roi placements within the liver mask area were tested and compared to the average


Figure 3.1: Design of the implementation of data acquisition based on whole body MRI scans and operator placed ROIs. The events within the dotted area are repeated for all positions inside the Liver Region Mask.

fat percentage that had previously been measured. This approach aimed to find all possible roi:s in the liver image, both those that match and those that do not match the average fat percentage. In Sections 3.2.1 - 3.2.3, the parts of Figure 3.1 are described in more detail.

3.2.1 Liver Localization

Acquisition of mlp training data was performed according to Figure 3.1. The inputs to the system were fat and water separated whole body scans and a liver slab scan, which is a limited measured volume centered at the liver that is not fat and water separated. The operator placed roi:s were also fed to the system. The liver z-coordinate z_liver in the whole body scan was retrieved based on the liver slab scan by transforming its z-coordinate. The z-coordinate was transformed by taking voxel sizes V_z and coordinate system offsets δ into account according to Equations (3.1) and (3.2), where wc denotes a world coordinate system. A one voxel high volume at the z-coordinate gave the 2D liver image in Figure 3.3a that was passed on in the pipeline.


z_wc = δ − V_z/2 − z·V_z    (3.1)

z_liver = (z_wc,slab − z_wc,body) / V_z,body    (3.2)
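The two transform steps can be transcribed directly into code; the function and variable names below are illustrative, not taken from the thesis implementation.

```python
def to_world_z(delta, voxel_z, z):
    """Eq. (3.1): image z-index to world z-coordinate,
    given the coordinate offset delta and voxel height voxel_z."""
    return delta - voxel_z / 2.0 - z * voxel_z

def liver_slice_index(z_wc_slab, z_wc_body, voxel_z_body):
    """Eq. (3.2): liver z-index in the whole body scan from the
    world z-coordinates of the slab and body volumes."""
    return (z_wc_slab - z_wc_body) / voxel_z_body
```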

3.2.2 Binary Masks

To generate data samples that describe regions that are suited and regions that are unsuited for roi placement, two binary masks were created. The liver region mask in Figure 3.3b was based on the positions of operator placed roi:s. The mask was created by calculating the distance d from every roi to its closest neighboring roi. By connecting all roi:s with a distance less than c × max(d) (out of all roi:s), polygons were created that were used to create the binary mask. The creation of the polygons was performed according to Algorithm 1. All pixels contained within the region spanned by the roi:s were known to contain liver tissue. The non-liver region mask in Figure 3.3c was extracted from the right-most third part of the image. Here, the probability of finding liver tissue was low, thus describing areas outside of the liver.

3.2.3 Feature Extraction

For every pixel contained within the masks, a square Feature Region of Interest (froi) was extracted. One example of size 30 × 30 is shown in Figure 3.2.

Figure 3.2: A 30 × 30 pixel froi extracted from a liver mri image.

From these regions the texture and intensity image features described in Section 2.4 were computed. In every froi within the liver region mask, a roi was placed to measure the fat fraction of the region. The roi:s were in most cases smaller than the froi:s. In the method using only roi classification, an acceptance limit


Algorithm 1: Create liver region mask
    c = Limiting parameter
    L = List of roi:s
    for m in L do                          ▷ Calculate distances between roi pairs
        for n in L do
            D(m, n) = ‖L(m) − L(n)‖
            if m == n then
                D(m, n) = ∞
            end if
        end for
    end for
    M = max(min(D))                        ▷ Find max. value of all closest neighbors
    for m in L do                          ▷ Create polygons by drawing lines between roi:s
        for n in L do
            if D(m, n) ≤ c · M then
                Draw line between L(m) and L(n)
            end if
        end for
    end for
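Algorithm 1 can be sketched as below, assuming the ROI centres are given as (row, col) pixel coordinates; the connecting lines are rasterized by linear interpolation and the resulting closed polygons filled with SciPy. This is an illustrative sketch, not the thesis implementation.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def liver_region_mask(rois, shape, c=1.0):
    """Binary liver mask from operator ROI centres (Algorithm 1 sketch)."""
    L = np.asarray(rois, dtype=float)
    # Pairwise Euclidean distances; diagonal set to infinity.
    D = np.linalg.norm(L[:, None, :] - L[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    M = D.min(axis=1).max()           # max over all closest-neighbour distances
    mask = np.zeros(shape, dtype=bool)
    for m in range(len(L)):
        for n in range(m + 1, len(L)):
            if D[m, n] <= c * M:      # connect sufficiently close ROI pairs
                steps = int(D[m, n]) + 2
                rr = np.linspace(L[m, 0], L[n, 0], steps).round().astype(int)
                cc = np.linspace(L[m, 1], L[n, 1], steps).round().astype(int)
                mask[rr, cc] = True
    return binary_fill_holes(mask)    # fill the drawn polygons
```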


parameter was set. Region fat fractions within ±0.8 percentage points (based on company experience) of the ground truth fat fraction were classified as suitable regions and given class label 1. All measurements outside the limit were classified as unsuitable regions and were given class label 0. In addition, all froi:s outside of the liver mask were classified as unsuitable regions with class label 0. In the method combining both a classifier and a regression model, all roi:s contained by the liver mask were classified as liver regions with class label 1. All regions outside the liver mask were classified as non-liver regions and were given class label 0. The difference

d = pdff_mean − pdff_roi    (3.3)

between the manual ground truth mean and the fat fraction of the active roi generated outputs used in regression model training.

(a) Slice of a whole body mri scan at the z-coordinate of the liver.

(b) Binary liver mask based on manually placed roi:s.

(c) Binary mask used to generate samples outside of the liver.

Figure 3.3: A liver image, binary liver mask and outside liver binary mask.

3.3 Classification Method

The method for data classification in Algorithm 2 resembled the data acquisition method in some aspects. The only difference was that all roi:s were treated equally; no distinction between liver and non-liver regions was made. When placing and classifying roi:s, every roi position in the liver slice was tested by moving stepwise through the whole image. For every position a froi was created and image features were computed. The features were assigned a class label through the trained classifier; if the roi was placed in a suitable region, i.e. Class = 1, it was used to calculate the mean fat fraction of the liver. When calculating the mean fat fraction, the nine roi:s with the highest probabilities were used. Nine roi:s were used since it is the same number as AMRA uses in analyses. The class probability y is the output from the mlp in Section 2.3.5:

• If y ≥ 0.5, Class = 1.
• If y < 0.5, Class = 0.

Algorithm 2: roi Classification
    Find liver slice
    for all possible roi positions do
        Extract feature region
        Compute image features
        Classify roi
        if Class == 1 then
            Save roi
            Save class probability
        end if
    end for
    Calculate mean fat fraction from the 9 saved roi:s with highest probabilities
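A sketch of the measurement loop in Algorithm 2, where `classify` is a stand-in for the trained MLP (mapping a feature region to a class-1 probability) and the ROI fat fraction is approximated by the FROI mean for brevity; the thesis measures a smaller cylindrical ROI inside the FROI.

```python
import numpy as np

def measure_mean_pdff(pdff_img, classify, froi=14, step=1, n_best=9):
    """Scan all positions, keep the n_best most probable 'suitable'
    ROIs and average their fat fractions (Algorithm 2 sketch)."""
    H, W = pdff_img.shape
    candidates = []
    for y in range(0, H - froi + 1, step):
        for x in range(0, W - froi + 1, step):
            region = pdff_img[y:y + froi, x:x + froi]
            p = classify(region)
            if p >= 0.5:                       # Class == 1: suitable region
                candidates.append((p, region.mean()))
    candidates.sort(key=lambda t: t[0], reverse=True)
    return np.mean([f for _, f in candidates[:n_best]])
```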

3.4 Classification and Regression Method

The combined method of liver classification and pdff regression in Algorithm 3 had two steps, both based on similar neural network architectures. First, a roi was classified as being in the liver or not. Second, the difference d in Equation (3.3) was predicted by the neural network to determine the expected level of agreement with ground truth values. roi:s classified as liver, i.e. Class = 1, and with small d values were used when calculating the mean fat fraction.

Algorithm 3: Liver Classification and Regression
    Find liver slice
    for all possible roi positions do
        Extract feature region
        Compute image features
        Classify roi
        if Class == 1 then
            Predict d
            Save roi
        end if
    end for


3.5 Network Training

The mlp in Figure 3.4 was implemented as a sequential Keras model, which allows changes to several aspects of the neural network layout. The mlp consisted of one hidden layer with 20 hidden nodes and a f_H(x) = tanh(x) activation function. The output layer, with one node, used the logistic activation function f_{O,c}(x) = (1 + exp(−x))^{−1} in classification and f_{O,r}(x) = x in regression. The loss function that was optimized in the regression model was the squared error function in Equation (2.20). In classification, the cross entropy function in Equation (2.22) was used.


Figure 3.4: Principle sketch of the implemented mlp with activation functions in the hidden and output layers.

Optimization of the mlp was performed using the adaptive optimization algorithm Adam [14]. The acquired training data was balanced to achieve a similar distribution between suitable and unsuitable region samples. Balancing was performed by removing redundant samples until the wanted class balance was achieved. The data was processed to have zero mean and a standard deviation of 1. This was achieved by subtracting the mean of the ensemble from each feature and dividing by the standard deviation of the ensemble. The model was fit to the data until no further notable decrease in the value of the loss function was achieved on the validation data.
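The thesis implements this network in Keras; purely to illustrate the architecture in Figure 3.4, the following is a NumPy sketch of the forward pass with already-trained weights (names are illustrative).

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2, task='classification'):
    """Forward pass of the MLP in Figure 3.4.
    x: (n_features,) standardized input; W1: (n_features, 20); W2: (20, 1)."""
    h = np.tanh(x @ W1 + b1)             # hidden layer, f_H = tanh
    y = (h @ W2 + b2).item()             # output layer pre-activation
    if task == 'classification':
        return 1.0 / (1.0 + np.exp(-y))  # logistic f_O,c -> class probability
    return y                             # identity f_O,r for regression
```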

3.6 Performance Evaluation

To evaluate the proposed method, evaluations of both the classifier and the system as a whole were performed.

3.6.1 Classifier Evaluation

Evaluation of the trained classifier is necessary because different aspects of the model are tuned to increase performance. Evaluation is also important when comparing different classifiers to one another to determine which one provides the best model of the data. The primary evaluation metric used in the experiments was the Accuracy score for binary classifiers [23], which measures the ratio of correctly classified data samples and shows the overall effectiveness of the classifier. The accuracy score is determined from the Confusion Matrix in Figure 3.5 using Equation (3.4). The accuracy score was used when comparing different training sessions to find the best performing parameter setup.

TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative

Figure 3.5: The confusion matrix for a binary classifier, containing TP, FP, FN and TN samples.

Accuracy = (TP + TN)/(TP + TN + FP + FN)    (3.4)

According to [8], the accuracy score has to be applied to all data samples, and thus a score estimation method should be applied to obtain a more statistically correct evaluation of the classifier. To achieve this, the k-fold cross validation method was used to train and validate the classifier on several subsets of the data set in different combinations, as shown in Figure 3.6 (where k = 3). A mean accuracy score was then calculated using these measurements. To keep data that the system does not train on for testing, a portion of the data is extracted from the dataset before separating into training data and validation data.
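The splitting scheme in Figure 3.6, including the held-out test portion, can be sketched as follows (a NumPy-only illustration; the thesis uses Sci-kit learn's utilities, and the function name is hypothetical).

```python
import numpy as np

def kfold_splits(n, k=3, test_frac=0.25, seed=0):
    """Hold out a test fraction, then split the rest into k folds.
    Returns k (train, val) index pairs and the held-out test indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    test_idx, rest = idx[:n_test], idx[n_test:]
    folds = np.array_split(rest, k)
    splits = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, val))
    return splits, test_idx
```

A mean accuracy score is then the average of the model's validation accuracy over the k (train, val) pairs, while `test_idx` is never trained on.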


Figure 3.6: A 3-fold cross validation scheme that performs training on different parts of the data set. A fourth part of the data is never trained on and is used for testing.


3.6.2 System Evaluation

The system was evaluated by adding the automatically generated roi:s to the liver slice under investigation and calculating the mean fat percentage. These values were compared to the fat percentage corresponding to the manually placed roi:s. To compare the two approaches, the Bland-Altman [3] method was used. The Bland-Altman graph structure in Figure 3.7 shows the difference between measurements (A − B) against their average (A + B)/2. The plot determines the mean difference of the methods, the bias, and the 95% confidence interval called the limits of agreement (loa). The interpretation of the graph is that a lower bias and narrower limits of agreement indicate more similar measurements and a higher method agreement.

Figure 3.7: Figure showing the structure of a Bland-Altman plot. The plot shows the mean difference between compared methods and the 95% confidence interval.
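Assuming the usual definition of the limits of agreement as bias ± 1.96 times the standard deviation of the paired differences, the quantities in the plot can be computed as below (function name is illustrative).

```python
import numpy as np

def bland_altman(a, b):
    """Bias and 95% limits of agreement between paired measurements a, b."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a - b                          # method difference A - B
    bias = diff.mean()                    # mean difference (bias)
    half_width = 1.96 * diff.std(ddof=1)  # 95% limits of agreement half-width
    return bias, (bias - half_width, bias + half_width)
```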


4 Results

In the following chapter, the decided approach on how to answer the questions stated in Section 1.2 is presented. This is followed by the experimental setup and experimental results.

4.1 Approach

To determine whether or not a machine learning approach is applicable for measuring liver fat, and how to generate good training data, several parts of the problem have to be investigated. To achieve this, experiments were performed with focus on the following areas:

• Comparing different image features.

• Whether the froi, where image features are extracted, and the roi, where liver fat is measured, should have different sizes. This was tested by running trials on different sizes and comparing the outcomes to one another.

• Whether the 2D approach of measuring gives a good enough result or if the measurements have to be expanded to 3D volumes. This was determined by comparing the automatic measurements to the manual measurements.

4.2 Experimental Setup

500 subjects were used in the following experiments. From these, 400 were used in data acquisition, resulting in model training and validation data. The final 100 subjects were used in method evaluation by performing Bland-Altman analysis and measuring the mean relative error (MRE),


MRE = (1/N) Σ_{n=1}^{N} |pdff_true,n − pdff_mean,n| / pdff_true,n,    (4.1)

compared to ground truth values. The ground truth fat fraction values were measured using a different type of mri scan method. This resulted in comparisons between two different scan methods and a comparison between manual and automatic measurements.
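Equation (4.1) as code, a direct transcription assuming the absolute difference in the numerator (the function name is illustrative).

```python
import numpy as np

def mean_relative_error(pdff_true, pdff_mean):
    """Mean relative error of Equation (4.1), returned as a fraction."""
    t = np.asarray(pdff_true, float)
    m = np.asarray(pdff_mean, float)
    return np.mean(np.abs(t - m) / t)
```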

Voxels in the data set had an approximate resolution of 2 × 2 × 3 mm. The roi:s were created using a cylinder with 7 mm radius and 11 mm height in the 3D case, resulting in approximately 3 pixel radius and height. In the 2D case, roi:s had a 3 pixel radius and a 1 pixel height. To reduce computation times during evaluation of the methods, froi:s were placed at 10 pixel intervals. The region sizes are discussed further in Section 4.3. The image features were calculated from both the fat and water separated images, resulting in two sets of each image feature. During calculation of glcm features, the input images were quantized to 64 gray levels to reduce computational times; no notable loss in feature quality was observed. The step lengths (1, 2, 3, 4) and angles (0°, 45°, 90°, 135°) were used. The hgm values were separated into a 10 bin histogram. In calculations of dwt features, images were decomposed into 2 levels using the mother wavelet db2 from the Daubechies family [15].
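The quantization and co-occurrence counting behind the glcm features can be sketched as below; a minimal NumPy version handling one non-negative (dy, dx) offset, whereas the thesis uses Sci-kit image and also covers the negative offsets implied by the 45° and 135° angles. Function names are illustrative.

```python
import numpy as np

def quantize(img, levels=64):
    """Quantize image intensities to the given number of gray levels."""
    img = np.asarray(img, float)
    span = img.max() - img.min()
    if span == 0:
        return np.zeros(img.shape, dtype=int)
    return ((img - img.min()) / span * (levels - 1)).astype(int)

def glcm(q, dy, dx, levels=64):
    """Normalized gray level co-occurrence matrix for offset (dy, dx) >= 0."""
    H, W = q.shape
    src = q[:H - dy, :W - dx]       # reference pixels
    dst = q[dy:, dx:]               # neighbours at the given offset
    P = np.zeros((levels, levels))
    np.add.at(P, (src.ravel(), dst.ravel()), 1)
    return P / P.sum()
```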

4.2.1 Classification Method

Training and validation data samples were extracted as three different types in order to simplify balancing. There were samples describing suitable positions inside the liver (Class = 1), unsuitable positions inside the liver (Class = 0) and samples outside the liver (Class = 2). Samples were balanced to 50%/25%/25% of suitable regions, unsuitable regions and outside liver, respectively. After balancing, all samples of class 2 were given class 0, so that class 0 described unsuitable regions both inside and outside of the liver.

4.2.2 Classification and Regression Method

In the combined method, training and validation samples were extracted as inside liver (Class = 1) and outside liver (Class = 0). Balancing of data was performed to reach a 50%/50% ratio. Regression model samples were extracted exclusively from inside the liver.


4.3 Feature Region Size

The aim of the first test on the classification method was to determine the froi size to use in further experiments. Experiments were performed on three different froi sizes, shown in Figure 4.2 to visualize the different levels of detail. The 7 × 7 froi is coarse while the 21 × 21 froi contains more detail. The widths of the froi:s in 2D were multiples of the width of the roi:s. In 3D, a region size of 5 × 5 × 5 was also tested. The results presented in Table 4.1 contain measurements of classification accuracy on validation data at the last epoch of training. The table also presents the results from Bland-Altman analysis on 100 test data sets, where the bias is the mean percentage point difference and the limits of agreement, loa, describe the 95% confidence interval. The column named Input gives the number of input features. In Table 4.1, it can be seen that a 14 × 14 froi produced the lowest bias and loa. Both an increase and a decrease in froi size resulted in higher values of the mean difference and a wider confidence interval. The method agreement measurements, bias and loa, were higher in 3D measurements. The mean relative errors (MRE) were in general lower for small regions.

Since the results from the 5 × 5 × 5 and 7 × 7 × 7 froi:s produced similar values, their respective Bland-Altman graphs are shown in Figure 4.1 for further analysis. The different froi sizes produced similar predictions.

Table 4.1: Table presenting results from the froi size experiments using the classification method. For an explanation of the measurements, see Section 4.3.

FROI (px)      Features   Input  Accuracy (%)  Bias (pp)  LoA (pp)  MRE (%)
5 × 5 × 5      i + glcm   28     84.30          0.26      ±1.63     14.10
7 × 7          i + glcm   28     82.00          0.19      ±1.76     16.74
7 × 7 × 7      i + glcm   28     83.74          0.22      ±1.64     14.28
14 × 14        i + glcm   28     81.50          0.13      ±1.52     17.02
14 × 14 × 14   i + glcm   28     82.24          0.17      ±1.82     20.14
21 × 21        i + glcm   28     80.50          0.15      ±1.98     18.97
21 × 21 × 21   i + glcm   28     81.63          0.29      ±3.82     23.12


(a) Bland-Altman analysis of the 5 × 5 × 5 froi comparison to ground truth. The difference is given in percentage points.

(b) Bland-Altman analysis of the 7 × 7 × 7 froi comparison to ground truth. The difference is given in percentage points.

Figure 4.1: Bland-Altman comparison of 5 × 5 × 5 and 7 × 7 × 7 size froi:s.

(a) 7 × 7 pixel froi placed in a liver image.

(b) 14 × 14 pixel froi placed in a liver image.

(c) 21 × 21 pixel froi placed in a liver image.

Figure 4.2: Different sizes of froi:s show different detail levels of texture.

4.4 Image Features

The second test of the classification method aimed to study different image features and their impact on method outcomes. Image features were tested exclusively and in combinations to analyze if any are complementary when describing image regions. The results are presented in Table 4.2. 3D tests were performed using a 7 × 7 × 7 froi while two 2D experiments were performed on 14 × 14 froi:s. dwt features were calculated using only one decomposition level in 3D due to the froi size being too small for a two level decomposition. Setups of image features consisted of intensity (i), intensity and histogram of gradient magnitude (i + hgm), intensity and gray level co-occurrence matrix (i + glcm), followed by intensity and discrete wavelet transform features (i + dwt). Two different three-feature combinations were also tested, consisting of i + hgm + glcm and i + hgm + dwt features. The setups containing only i + glcm and i + dwt produced the lowest bias and the narrowest loa. Due to these results, those are the only feature setups tested in volumetric measurements. In the volumetric measurements, the i + dwt combination resulted in the lowest MRE and loa out of all combinations. In Figure 4.3, roi placements in a sample liver are shown. It can be seen that the feature combinations containing glcm seemed to avoid some smaller textural structures, while all feature combinations avoided the coarser abnormalities. Figure 4.4 shows Bland-Altman graphs (left) of the tests on i + glcm and i + dwt in 2D and 3D, along with plots of the relative errors (right), from which the MRE score is calculated. What can be noted is that a few relative errors were significantly higher. This was due to roi false positives in unsuitable positions and few regions classified as suitable positions. This resulted in a smaller distribution of roi:s in liver tissue, thus giving false fat fraction values.

Table 4.2: Table presenting results from the image feature experiments using the classification method. For an explanation of the measurements, see Section 4.4.

froi (px)   Features        Input  Accuracy (%)  Bias (pp)  LoA (pp)  MRE (%)
14 × 14     i                4     80.31          0.22      ±2.31     20.43
14 × 14     i + hgm         24     80.90          0.25      ±2.28     16.97
14 × 14     i + glcm        28     81.50          0.13      ±1.52     17.02
14 × 14     i + dwt         32     80.70          0.13      ±1.54     19.14
14 × 14     i + hgm + glcm  48     81.11          0.16      ±1.69     17.47
14 × 14     i + hgm + dwt   52     81.05          0.18      ±1.82     16.86
7 × 7 × 7   i + glcm        28     82.24          0.17      ±1.82     20.14
7 × 7 × 7   i + dwt         36     80.32          0.22      ±1.37     13.62

4.5 Classification and Regression

Testing of the combined classification and regression method was performed on i + glcm and i + dwt features. Both two-dimensional and three-dimensional measurements were tested, using a 14 × 14 pixel froi and a 7 × 7 × 7 pixel froi, respectively. Table 4.3 presents the results of the experiments. The measured units are liver classification accuracy on the validation data at the final epoch of classifier training. The column named Reg. Loss shows the final mean squared error function value on the validation data. All tests resulted in bias values close to 0 and with similar widths of the limits of agreement (loa). Also, the MRE measurements were in the same range in all tests except for the 3D i + dwt features, which was lower. In Figures 4.5-4.8, examples of liver classification and decided measurement roi:s based on predicted d-values are shown. The last image in each figure shows the final roi positions (blue circles) and ground truth roi:s (green squares). On the right side in Figures 4.6a, 4.7a and 4.8a, some false positive classifications can be noted, mostly located in the spleen. Figure 4.9 shows Bland-Altman graphs (left) and relative error plots (right) of the tests on i + glcm and i + dwt in 2D and 3D. In the relative error graphs it can be seen that the large relative errors occurred at low fat fractions. In Figure 4.9h it can be noted that the majority of errors were below 100% relative error. These results were better than the results in Figures 4.9b, 4.9d and 4.9f, where a larger number of measurements had a higher relative error due to higher numbers of false fat fraction predictions and false positive classifications.

(a) i features (b) i + hgm features (c) i + glcm features (d) i + dwt features (e) i + hgm + glcm features (f) i + hgm + dwt features

Figure 4.3: Figures showing roi placement in the same subject using different combinations of image features and the classification method. Green squares show manually placed and blue circles show automatically placed roi:s.

Table 4.3: Table presenting results from the classification and regression method experiments. For an explanation of the measurements, see Section 4.5.

froi (px)   Features   Input  Accuracy (%)  Reg. Loss  Bias (pp)  LoA (pp)  MRE (%)
14 × 14     i + glcm   28     99.20         6.011       0.03      ±1.98     28.69
14 × 14     i + dwt    32     99.27         5.155      -0.03      ±1.67     26.64
7 × 7 × 7   i + glcm   28     99.06         2.651       0.02      ±2.09     29.75
7 × 7 × 7   i + dwt    36     99.31         1.550      -0.03      ±1.44     21.72

The results presented in Table 4.4 are from the same tests as in Table 4.3. The difference is that only roi:s with d predictions within a 0.8 percentage point limit were used to calculate the mean fat fraction. Figure 4.10 shows the limited version of Figure 4.9. These tests were performed in order to show how well the method performed using a threshold value for d, in a similar way as the method using solely classification. The results showed that limiting the predicted values of d did not greatly affect the method agreement scores. The similar measurements indicated that a majority of the placed roi:s already were within the 0.8 percentage point limit. The measurements using i + glcm features produced slightly wider limits of agreement in 2D, while i + dwt reduced the limits. In 3D, i + glcm was unaffected and i + dwt resulted in slightly wider limits of agreement and a higher MRE. By limiting the fat fraction difference predictions, roi:s were removed. Removing roi:s resulted in a less distributed placement and a fat fraction calculation more affected by fluctuations. This is the reason why differences in the measurements are seen.

Table 4.4: Table presenting results from the classification and regression method experiments with a 0.8 percentage point limit on the predicted d-values. For an explanation of the measurements, see Section 4.5.

froi (px)   Features   Input  Accuracy (%)  Reg. Loss  Bias (pp)  LoA (pp)  MRE (%)
14 × 14     i + glcm   28     99.20         6.011      0.02       ±2.36     29.79
14 × 14     i + dwt    32     99.27         5.155      0.03       ±1.60     24.19
7 × 7 × 7   i + glcm   28     99.06         2.651      0.04       ±2.08     29.74


(a) 2D: i + glcm features (b) 2D: i + glcm features

(c) 2D: i + dwt features (d) 2D: i + dwt features

(e) 3D: i + glcm features (f) 3D: i + glcm features

(g) 3D: i + dwt features (h) 3D: i + dwt features

Figure 4.4: Bland-Altman analyses and relative error plots from i +glcm and i +dwt in 2D and 3D using the classification method. Note the varying scales on the y-axes.



Figure 4.5: Examples of the classification and regression method. (a) Liver classification using two-dimensional i + glcm features. (b) Green regions contain the lowest predicted d-values. (c) Green squares show manually placed and blue circles show automatically placed roi:s.



Figure 4.6: Examples of the classification and regression method. (a) Liver classification using two-dimensional i +dwt features. (b) Green regions con-tain the lowest predicted d-values. (c) Green squares show manually placed and blue circles show automatically placed roi:s.



Figure 4.7: Examples of the classification and regression method. (a) Liver classification using three-dimensional i +glcm features. (b) Green regions contain the lowest predicted d-values. (c) Green squares show manually placed and blue circles show automatically placed roi:s.



Figure 4.8: Examples of the classification and regression method. (a) Liver classification using three-dimensional i +dwt features. (b) Green regions contain the lowest predicted d-values. (c) Green squares show manually placed and blue circles show automatically placed roi:s.


(a) 2D: i + glcm features (b) 2D: i + glcm features

(c) 2D: i + dwt features (d) 2D: i + dwt features

(e) 3D: i + glcm features (f) 3D: i + glcm features

(g) 3D: i + dwt features (h) 3D: i + dwt features

Figure 4.9: Bland-Altman analyses and relative error plots from i + glcm and i + dwt in 2D and 3D using the combined classification and regression method without limitations on d predictions. Note the varying scales on the y-axes.


(a) 2D: i + glcm features (b) 2D: i + glcm features

(c) 2D: i + dwt features (d) 2D: i + dwt features

(e) 3D: i + glcm features (f) 3D: i + glcm features

(g) 3D: i + dwt features (h) 3D: i + dwt features

Figure 4.10: Bland-Altman analyses and relative error plots from i + glcm and i + dwt in 2D and 3D using the combined classification and regression method with 0.8 percentage point limitations on d predictions. Note the varying scales on the y-axes.


5 Discussion

In the following chapter all experimental results from Chapter 4 are analysed and discussed.

5.1 Feature Region Size

In Table 4.1, the three trials on froi sizes showed that the 14 × 14 region excelled in most of the measurements for 2D. A smaller region (7 × 7 pixels) produced lower classification accuracy and both higher bias and loa. This might be due to the region being too small to distinguish textural properties in the image. The larger region (21 × 21 pixels), on the other hand, produced results more similar to the 14 × 14, but was overall outperformed by it. A large region describes a more averaged measure of textural properties and might not provide the level of locality needed to distinguish between roi positions. For 3D measurements, 7 × 7 × 7 was the overall best performing froi size. Since the total number of voxels was higher, a smaller region size provided more image region information with increased locality, thus achieving more accurate results, as seen in Table 4.1.

5.2 Image Features

The experiments on different image features showed similar results. The examples of qualitative results in Figure 4.3 showed similar roi placements in the same liver samples. It can be noted that the roi:s in (c) and (e) avoided a certain region containing finer texture, while the other features mostly only avoided the rough texture variations. The images in (a) and (d) indicate that in this particular case the dwt features did not contribute to the decision making, since the same placement was achieved using only signal intensity features. The intensity (i) feature produced quantitative results, seen in Table 4.2, in the same value range as the combined features. The intensity feature results indicated that the main contributors to classification accuracy were the signal mean and standard deviation in the fat and water images. Since these values directly correspond to actual fat and water content, it is reasonable to believe that they provided an indication of suitable positions for roi placement. The Bland-Altman analysis showed both a higher bias and wider limits of agreement for i features compared to the combined features, showing that a reasonable classification accuracy does not lead to the same level of method agreement. The features generating the highest method agreement score, with the lowest bias and loa, were i + glcm and i + dwt. These features were considered the best performing combinations and were used in the subsequent experiments.

5.3 Classification and Regression

The liver classification and pdff regression tests in Table 4.3 showed results of a similar order of magnitude as the prior trials. A notable difference is a better method agreement (bias and loa), with a lower bias and confidence interval for the i + dwt features. The improvement was opposed by an increase in MRE. The liver classification accuracy was high, resulting in a suitable liver tissue extractor providing a reasonable segmentation of the liver. Comparing the measurements in Tables 4.3 and 4.4 indicated that setting the acceptance limit parameter to 0.8, as in the first method using classification, did not significantly increase or decrease the measurement scores. Thus a clear advantage of the tested method was that no acceptance limit parameter had to be set.
