Automated Pulmonary Nodule Detection on Computed Tomography Images with 3D Deep Convolutional Neural Network


DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018


ANTOINE BROYELLE

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


KTH Royal Institute of Technology

School of Electrical Engineering and Computer Science

Master in Computer Science

Degree Project June 2018

Author: Antoine Broyelle, broyelle@kth.se

Principal: Optellum Ltd., Oxford Centre for Innovation, Oxford, OX1 1BY, United Kingdom

Examiner at KTH CSC: Hedvig Kjellström, hedvig@csc.kth.se

Supervisor at KTH CSC: Pawel Herman, paherman@kth.se

Supervisor at the principal: Timor Kadir, timor.kadir@optellum.com


Abstract

Object detection on natural images has become a single-stage end-to-end process thanks to recent breakthroughs in deep neural networks. By contrast, automated pulmonary nodule detection is usually a three-step method: lung segmentation, generation of nodule candidates and false positive reduction.

This project tackles the nodule detection problem with a single stage model using a deep neural network.

Pulmonary nodules have unique shapes and characteristics which are not present outside of the lungs. We expect the model to capture these characteristics and to focus only on elements inside the lungs when working on raw CT scans (without the segmentation). Nodules are small, sparsely distributed and infrequent. We show that a well-trained deep neural network can spot relevant features and keep a low number of region proposals without any extra pre-processing or post-processing.

Due to the visual nature of the task, we designed a three-dimensional convolutional neural network with residual connections. It was inspired by the region proposal network of the Faster R-CNN detection framework.

The evaluation is performed on the LUNA16 dataset. The final score is 0.826 which is the average sensitivity at 0.125, 0.25, 0.5, 1, 2, 4, and 8 false positives per scan. It can be considered as an average score compared to other submissions to the challenge. However, the solution described here was trained end-to-end and has fewer trainable parameters.

KEYWORDS: Deep Learning, Artificial Neural Networks, Lung damage, Lung cancer, CT scans, Pulmonary Nodules, Detection, Region Proposal.


Sammanfattning (Summary)

Object detection in natural images has been reduced to a single-stage process thanks to breakthroughs in deep neural networks. Automated detection of pulmonary nodules is usually a three-step problem: lung segmentation, generation of nodule candidates and reduction of false positives. This project takes on nodule detection with a single-stage model using a deep neural network.

Pulmonary nodules have unique characteristics that are not found outside the lungs. The model is expected to capture these characteristics and to focus only on elements inside the lungs when working on computed tomography images. Nodules are small and sparsely distributed. We show that a well-trained network can find relevant features and propose a low number of regions of interest without extra pre-processing or post-processing.

Because of the visual nature of this problem, we designed a three-dimensional convolutional neural network with residual connections. The project was inspired by Faster R-CNN, a network that stands out for its ability to detect regions of interest.

The network was evaluated on a dataset named LUNA16. The final network scored 0.826, which is the average sensitivity at 0.125, 0.25, 0.5, 1, 2, 4 and 8 false positives per scan. This can be considered average compared to other participants in the competition, but the solution proposed here is a single-stage solution that performs detection end-to-end and has fewer trainable parameters.


Acknowledgment

I must express my very profound gratitude to Carlos Arteta, my supervisor at the principal. He guided me in the right direction whenever I needed it. He also took time to answer all my questions regarding my research or his personal experience as a former PhD student and post-doc at Oxford University. I learnt a lot working alongside him.

Besides him, I would like to thank the Optellum team for their warm welcome. I learned a lot about entrepreneurship and early-stage companies.

I would also like to thank my supervisor at KTH, Dr. Pawel Herman, for guiding us through the requirements of the degree project.

My sincere thanks also go to Quentin Chometon. The back-and-forth between us contributed to both our projects, and all our lunchtime conversations made the British weather bearable.

I would finally like to acknowledge Gabriel Carrizo for the Swedish abstract and Lottie Woodward, who had the kindness to review and proofread a report written in a French fashion.

The authors would like to thank the LUNA16 challenge organizers for providing the dataset and evaluating results.

Last but not least, I would like to thank my family for their support throughout my education.


Glossary of terms

CAD: Computer-Aided Diagnosis

CNN: Convolutional Neural Network

CT: Computed Tomography

FDA: Food and Drug Administration, federal agency of the United States Department of Health and Human Services

FP: False Positive

FROC: Free-Response Receiver Operating Characteristic

GLOBOCAN: Project carried out by the International Agency for Research on Cancer that aims to provide estimates of the mortality, incidence and prevalence of major types of cancer worldwide

GPU: Graphics Processing Unit

HU: Hounsfield Unit

IoU: Intersection over Union

LUNA16: Lung Nodule Analysis 2016

NMS: Non-Maxima Suppression

R-CNN: Region-based Convolutional Neural Network

ReLU: Rectified Linear Unit

ROC: Receiver Operating Characteristic

TP: True Positive

TPR: True Positive Rate


Introduction

According to the GLOBOCAN series of the International Agency for Research on Cancer published in 2014[1], lung cancer is the most commonly diagnosed cancer (1.82 million cases) and the most lethal cancer, with 1.6 million deaths in 2012 across 184 countries. Cancer incidence varies considerably with the individual, gender, the level of development of the country, etc. Interestingly, lung cancer incidence is higher in developed and industrialised regions of Europe, America and Asia.

In general, cancers are due to an abnormal development of cells. Inside the lungs, the result is called a pulmonary nodule. Lesions formed by the nodules are detected with non-invasive imaging processes like computed tomography (CT) scans, where nodules result in a radiographic opacity. The malignancy is assessed from characteristics of the nodule such as size, morphology, location and multiplicity, but also from patient characteristics such as age, sex, race and medical background[2]. Some studies have shown that a screening program can reduce lung cancer mortality, as it allows early-stage detection and a better follow-up[3]. However, screening programs remain the exception and are prone to a high false positive rate, which increases patient stress and cost. Nowadays, malignant pulmonary nodules are mainly discovered as incidental findings (on liver or cardiovascular scans). The sooner nodules are detected, the more effective the treatment can be.

The challenge surrounding diagnosis is to detect the early onset of lung cancer with only a small amount of information. For the past 5 years, artificial intelligence has been able to solve many tasks, especially visual tasks. It can now be used in the medical industry as a computer-aided diagnosis (CAD) system and help radiologists, ophthalmologists and other health professionals during the decision-making process. The United States Food and Drug Administration (FDA) has just approved a first piece of decision-making software based on artificial intelligence. The system is able to detect diabetic retinopathy, a complication of diabetes, by looking at the retina[4]. It provides a screening decision without the need for a doctor to interpret the results. Another example, more research oriented, is the classification of natural images showing epidermis patches in order to prevent skin cancer[5].


Optellum Ltd, the company where the thesis has been carried out, is currently building such a system. The company aims to provide an expert-level pulmonary nodule risk assessment using computer vision techniques trained on a large set of chest CT scans. To achieve this, Optellum Ltd is carrying out research on nodule detection, nodule malignancy assessment and nodule segmentation, among others. Since the company has to turn this research into a commercial product, it is important to explore the capabilities of simple models which require few resources. Additionally, unlike models for object detection on natural images (see Section 1.3), algorithms for nodule detection seem to rely on more processing steps, which are not necessarily well justified (see Section 1.4). Finally, the segmentation of the lungs as a pre-processing operation might discard nodules located on the lung walls.

The scientific aspect of the thesis is to gauge the effectiveness of a single-stage approach based on a deep neural network for the problem of nodule detection in medical images. In other words, the project discusses the need for pre-processing (segmentation) and post-processing (false positive reduction) stages in this specific diagnostic image analysis pipeline. The hypothesis is that a well-trained CNN can have a low number of region proposals and learn to focus only inside the lung. To this end, we intend to design a CNN and compare its performance to other CNN-based models used by previous submissions to the Lung Nodule Analysis 2016 challenge.

There are some key concepts and postulates made for tackling the pulmonary nodule detection problem:

• There is more information in a volume than in a few slices extracted from the CT scans.

• The network should be isotropic to cope with the variety of inputs.

• Nodules are small and sparse. This means a lot of information will be lost if the spatial compression of the network is too high. However, keeping a large spatial representation leads to more computation.

• Nodules can be approximated as spheres. Hence, the predefined bounding boxes (anchors) are isotropic (spheres) and only vary in scale.

• The input represents a continuous volume, so there are no obfuscations or occlusions. This allows restrictive and conservative rules and strategies.

• The number of region proposals should be quite low, as the number of nodules per patient is also low.


Chapter 1 presents some background information and interesting work related to medical image analysis and computer vision algorithms for object detection. Chapter 2 describes our approach in terms of data processing and design of the model. Chapter 3 describes the results and a comparative study. Finally, the report ends with a discussion of impacts and limitations in Chapter 4 and the conclusion.


Chapter 1 Background

1.1 Artificial Neural Networks

Artificial neural networks are a graph-based class of machine learning algorithms whose nodes are called neurons. They are inspired by the way the brain works. Each neuron activates or inhibits its output potential based on a weighted sum of the inputs coming from other neurons. The goal is to find the best weights to produce a meaningful output given the input and a criterion. This search is called the training phase.

Feed-forward artificial neural networks are a particular type of neural network where the information moves in only one direction, from the input to the output, without any cycle inside the graph. In most cases, when feed-forward artificial neural networks are used, neurons are grouped to form layers and layers are stacked so that each neuron in one layer has direct oriented connections to the neurons of the subsequent layer. Deep neural networks are feed-forward artificial neural networks where many layers are stacked between the input and the output layers. The mapping from one layer to the next can vary between algorithms. CNNs are one specific type of feed-forward artificial neural network where the operations between layers are convolutions with kernels which have learnable weights and biases.

CNNs have proven useful and efficient in computer vision thanks to weight sharing and translation invariance[6][7]. They are used for different tasks such as classification[8][9], segmentation[10] and detection[7][11][12]. Medical imaging is no exception. Among many other projects, people have worked on paediatric bone age estimation with X-ray[13], pneumonia detection on X-ray[14] and skin cancer classification on natural images of the epidermis[5].

1.2 Computed Tomography Imaging

Computed tomography (CT) is a noninvasive medical technique which uses X-rays to produce several cross-sections of the body. Each cross-section represents a slice of a few millimetres of the patient. A CT scan is a set of slices usually representing a continuous part of the person being screened. For each slice, the X-ray source and detector rotate around the patient. During the rotation, several snapshots are taken and are then processed to produce an image. Each pixel of the image expresses an intensity in Hounsfield Units (HU). Table 1.1 describes the relation between the HU values and the associated substances. Radiologists have to visualise every single slice of the scan when looking for nodules. They perform the detection by spotting unexpected shapes and intensities. However, this visual inspection has been shown to be prone to failures; only 68% of nodules are found[15].

Substance      Value (HU)
Air            -1000
Lung           -500
Water          0
Blood          30 to 45
Soft tissue    100 to 300
Bone           700 to 3000

Table 1.1: Hounsfield Units meaning

1.3 Detection Framework

Object detection is a combination of object classification and object localisation. Classification is the task of choosing one label from a predefined set. Localisation is solved by placing a bounding box around the object: the tighter the bounding box, the better.

The first attempt at the detection problem using a CNN was made in 2014 by Sermanet et al. when they released OverFeat[7], a network with a small receptive field applied on the input image at different scales and positions in a sliding-window fashion. This is computationally expensive as the network is run many times.

Later in 2014, a region-based convolutional neural network (R-CNN)[11] was published. This model contains four independent components. The first step is to find regions of interest, called region proposals. The second step uses a CNN for the feature extraction of each proposal. Finally, the feature vector at the output of the CNN is fed into a support vector machine (SVM) for classification and a linear regressor for the localisation. The main issue with R-CNN is the amount of computation required. Indeed, the network is applied independently on every single candidate region even though some of them may overlap (see Figure 1.2a).


Figure 1.1: Illustration of anchors. The input is a 2D array (16x16). The feature map is drawn as a volume to represent the channels. The CNN has a subsampling ratio of 4. The dots on the image represent the mesh grid where anchors are applied. Three square anchors are used and are represented with dashed lines.

This drawback was later fixed with Fast R-CNN[16], where the CNN takes a raw image and directly generates a class prediction and a class-specific bounding box. Hence, the image evaluation is single-stage. The improvement comes from the feature extraction, which is performed directly on the full image using a CNN. The generated feature map is then cropped based on the proposed regions. The assumption is that the feature map still embeds some spatial information. This operation is performed by the region of interest pooling layer (RoIpool). The sub-patches of the feature map go into the final part of the network, which performs the classification and regression of the bounding box (see Figure 1.2b). The model still relies on an external region proposal algorithm, which is the computational bottleneck.

The first end-to-end deep learning detection algorithm was Faster R-CNN[17]. Ren et al. introduced a Region Proposal Network (RPN), which generates region proposals with a CNN. An additional improvement, aiming to reduce computations, is the sharing of parameters between the RPN and the detection network (Fast R-CNN) by using the same first convolutional layers. The regions of interest proposed by the RPN are used exactly as in Fast R-CNN, with the RoIpool (crop of the feature map). They use a key concept, later adopted by all detection frameworks, called anchors[18]. Anchors can be seen as pre-defined bounding boxes used as references. They come in different shapes, ratios and scales, so each of them captures different information. They are all applied throughout the entire image in a sliding-window fashion. For each anchor, at each location, the problem is broken into two parts: is there a relevant object (binary classification), and how should the anchor be adjusted for a better fit (regression of the bounding box). The Fast R-CNN component performs object classification and a class-specific bounding box refinement for each anchor (see Figure 1.2c).

Figure 1.2: Illustration of the pipeline for different detection frameworks: (a) R-CNN, (b) Fast R-CNN, (c) Faster R-CNN, (d) FCN. For each graph, the information flows from the bottom to the top following the arrows. Dark boxes represent a sub-task to optimise. Blue elements represent intermediate states. Blue squares represent a portion of the image and blue cubes represent feature maps.

Region-based Fully Convolutional Networks (R-FCN)[19] is a modified Faster R-CNN. Compared to Faster R-CNN, the feature map is cropped later, at the very last convolution before the classification and regression of the bounding box (see Figure 1.2d). Other systems remove the need for an RPN, as they perform the object classification and class-specific bounding box regression directly (see Single Shot MultiBox Detector (SSD)[20] and You Only Look Once (YOLO)[12]).

At the end of each of these algorithms, a post-processing operation is conducted: the non-maxima suppression (NMS)[21]. Each anchor proposes a bounding box with a different confidence score. The problem is that an object often accumulates several region proposals. The NMS is a local search for the best proposal for each object. It aims to reduce duplication between the region proposals by only keeping the one with the highest confidence score compared to its neighbours.

1.4 Nodule Detection

In 2016, a dataset and a challenge were released for pulmonary nodule detection (see Section 2.1 below). Relevant work has been published in the context of the competition. Unfortunately, some of the reports lack the details required for reproducibility, for example due to intellectual property constraints.

The detection is often tackled with an information pipeline: segmentation of the lungs, nodule detection and false positive reduction. The motivation for segmenting the inner part of the lungs is to only present areas of interest to the network. The nodule detection works on the segmented lungs and often returns a high number of candidate regions (high sensitivity but also high false positive rate). This motivates the need for the third component which filters the interesting region proposals from the background samples.

Ding et al.[22] (team name qfpxfd) presented a solution using a 2D Faster R-CNN[17] (RPN + Fast R-CNN) with a feature extraction based on VGG-16[23] pre-trained on ImageNet[24]. The second part is a 3D CNN with 6 convolutional layers and 3 fully connected layers. The interesting component of their solution is the presence of a deconvolution layer as the last operation of the feature extraction, after the VGG-16 network. This reduces the down-sampling ratio between the input and the feature map. They used 6 anchors of sizes ranging from 4mm to 32mm, leading to high computational cost. As the network only sees three consecutive slices at a time, screening a full CT scan is expensive and time-consuming.

Berens et al.[25] (team name ZNET) made a unique interpretation of the labels. They used the annotations to generate a mask and tackled the detection problem by segmenting the nodules. Moving from the generated mask to nodule candidates required some post-processing based on heuristics. This was the nodule detection part, and the neural network architecture used was a U-Net[26]. They performed the false positive reduction with a 2D ResNet[27] containing 18 convolutions.

The team PAtech[28] and Zhu et al., who introduced DeepLung[29], had the same approach. The detection network is based on the RPN from the Faster R-CNN framework and the feature extraction is made with a network inspired by the U-Net encoder-decoder[26]. Thus, they decoupled the field of view and the sampling ratio. Also, they both performed the selection of lungs with some hand-crafted rules based on Hounsfield units and geometrical considerations. For DeepLung, the false positive reduction was made with a dual path network[30] while, for the other solution, the same network architecture is used but trained with a different objective function.


Chapter 2 Methods

2.1 Clinical Dataset: Lung Nodule Analysis 2016 (LUNA16)

In 2011, Armato et al. published the LIDC-IDRI dataset (Lung Image Database Consortium and Image Database Resource Initiative). This publicly available dataset contains the annotations of four radiologists. The LUNA16 challenge[31] uses a subset of LIDC-IDRI but provides a single ground truth for each nodule. LUNA16 does not contain CT scans with a slice thickness greater than 3mm, CT scans with missing slices or CT scans with inconsistent spacing. Nodules smaller than 3mm or not marked by at least three radiologists are not kept. After these exclusions, the dataset is a collection of 888 CT scans. The localisation is provided for 1186 nodules spread among 601 CT scans. The challenge has two tracks: nodule detection and false positive reduction. This thesis is an attempt to solve the nodule detection problem. Figure 2.1 presents the distribution of nodules.

Figure 2.1: Distribution of nodules in LUNA16. (a) Nodule diameter distribution. (b) Spatial distribution, in a direct orthonormal coordinate system expressed in millimetres.


The organisers also provide a lung segmentation. However, the mask was generated automatically with algorithms[32]. Figure 2.2 illustrates one slice of the generated mask. The output is not suitable for a segmentation study, as advised by the organisers.

Figure 2.2: Lung mask. The white represents the trachea and main bronchus. The light grey and the dark grey are the inner volume of the left lung and right lung respectively.

2.2 Preparation of the data

CT scans come in various sizes and resolutions. The first step is to resample them to an isotropic resolution of 1mm between the centres of two consecutive voxels in the axial, coronal and sagittal directions. Trilinear interpolation is used to determine the final voxel values. Subpatches of 128 × 128 × 128 voxels containing at least one nodule are then extracted. Patches are strictly included in the CT volume. These patches define the working samples.
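The geometry of these two steps can be illustrated with a minimal sketch. It only computes the size of the resampled grid and the range of patch origins that keep a patch inside the volume while containing a nodule centre; the interpolation itself is not shown. The function names, the example scan shape and the spacing are hypothetical and are not taken from the thesis code.

```python
import random


def resampled_shape(shape, spacing_mm, target_mm=1.0):
    """Number of voxels per axis after resampling to an isotropic grid."""
    return tuple(int(round(n * s / target_mm)) for n, s in zip(shape, spacing_mm))


def patch_origin_range(nodule_voxel, volume_shape, patch=128):
    """Per-axis (lo, hi) bounds of patch origins such that the patch contains
    the nodule centre and lies entirely inside the volume."""
    bounds = []
    for c, n in zip(nodule_voxel, volume_shape):
        lo = max(0, c - patch + 1)  # the patch must still reach the nodule
        hi = min(c, n - patch)      # the patch must not leave the volume
        if hi < lo:
            raise ValueError("volume is smaller than the patch along one axis")
        bounds.append((lo, hi))
    return bounds


def sample_patch_origin(nodule_voxel, volume_shape, patch=128, rng=random):
    """Draw one valid patch origin uniformly within the bounds."""
    bounds = patch_origin_range(nodule_voxel, volume_shape, patch)
    return tuple(rng.randint(lo, hi) for lo, hi in bounds)


# Hypothetical scan: 130 slices of 512 x 512 pixels, 2.5 mm x 0.7 mm x 0.7 mm.
shape = resampled_shape((130, 512, 512), (2.5, 0.7, 0.7))
bounds = patch_origin_range((10, 200, 350), shape)
origin = sample_patch_origin((10, 200, 350), shape)
```

A nodule close to the border of the scan simply has a narrower range of valid origins, which is one way to honour the constraint that patches are strictly included in the CT volume.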

Values are clipped between -1000 and 400 HU. Hounsfield units (HU) lower than -1000 do not have any semantic meaning and are used for padding. Figure 2.3 shows the HU distribution on the LUNA16 dataset. Values higher than 400 do not bring any information to the task; they represent bones or foreign bodies like pacemakers (see Table 1.1).

Data augmentation is used during training to reduce the risk of overfitting. A random crop of 96 × 96 × 96 voxels is extracted from the bigger patches, all three axes are randomly flipped and 3D 90-degree rotations are randomly applied. Values are standardised so that, on average, samples have zero mean and unit variance.
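The intensity part of this preparation (clipping and standardisation) can be sketched on a flat list of voxel values. This is a minimal illustration: the sample values are made up, and computing the mean and standard deviation on the clipped sample itself is an assumption, since the text does not state over which set the statistics are estimated.

```python
import statistics


def clip_hu(values, lo=-1000.0, hi=400.0):
    """Clip Hounsfield values to the [lo, hi] window used in the text."""
    return [min(max(float(v), lo), hi) for v in values]


def standardise(values, mean, std):
    """Shift and scale values to zero mean and unit variance."""
    return [(v - mean) / std for v in values]


# Made-up voxel values: padding (-3024), air, lung, water, blood, bone.
raw = [-3024, -1000, -500, 0, 40, 400, 1800]
clipped = clip_hu(raw)

# Assumption: statistics are computed on the data being standardised.
mean = statistics.fmean(clipped)
std = statistics.pstdev(clipped)
normalised = standardise(clipped, mean, std)
```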


Figure 2.3: HU values distribution on LUNA16

2.3 Pipeline

The nodule detection is based on the RPN of the Faster R-CNN[17]. As the problem is binary (is there a nodule or not?), the RPN is similar to single-shot networks like SSD[20] or YOLO[12]. The network uses 3D convolutions and requires a volume as input. The first layers are used for the extraction of a high-dimensional, low-resolution feature map. The number of anchors is set to three per spatial position of the output feature map. Due to the generally spherical shape of nodules, anchors have the same ratio and only come in different scales.

Unlike 2D detection on natural images (e.g. photos), where the objects are roughly centred and use most of the space, nodules are small and sparse. The average diameter is 7.3mm (see Figure 2.1a), which makes the object of interest 4 million times smaller than the input volume on average. Thus, the anchor diameters and the spatial compression of the network need to be set carefully. All anchors are applied in the input space at the centre of each voxel cluster contributing to one element of the feature map. It is important to understand that anchors are defined in the input space but are evenly positioned based on the shape of the feature map. Figure 2.5 describes the impact of both the scale ratio and the selection of anchors on the amount of information captured. Networks with the smallest ratio are the most interesting even though they are more computationally heavy. For an input of size W × H × D, K anchors and an isotropic network with a scale ratio of S, the final feature map of the network contains 5 × K × W × H × D / S³ values. The factor 5 corresponds to the 3D position (x, y and z), the diameter, and the confidence score.

As described below, in our case we have 3 anchors (K = 3), a scale ratio of 4 (S = 4) and an input of 96 × 96 × 96 (W = H = D = 96).
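The output size formula can be checked numerically for these settings. The sketch below also lists where anchor centres would fall along one axis, assuming that cell i of the feature map covers input voxels i·S to (i+1)·S − 1 and that the anchor sits at the midpoint of that range; this indexing convention is an assumption for illustration.

```python
def output_size(w, h, d, k, s):
    """Number of output values: 5 per anchor (x, y, z, diameter, score)."""
    return 5 * k * (w // s) * (h // s) * (d // s)


def anchor_centres(n, s):
    """Anchor centres along one axis, in input voxel coordinates.

    Assumption: cell i covers voxels [i * s, (i + 1) * s - 1] and the
    anchor is placed at the midpoint of that range.
    """
    return [i * s + (s - 1) / 2 for i in range(n // s)]


n_values = output_size(96, 96, 96, k=3, s=4)
centres = anchor_centres(96, 4)

print(n_values)      # 207360
print(centres[:3])   # [1.5, 5.5, 9.5]
```

With K = 3 and S = 4, the 96 × 96 × 96 input yields a 24 × 24 × 24 grid, that is 41,472 anchors and 207,360 output values per sample.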


2.4 Network Architecture

The feature extraction is inspired by ResNet[27], with 2D convolutions replaced by 3D convolutions. A first convolution with kernels of size 7 × 7 × 7 and a stride of 2 in each direction is applied. As we try to keep a low ratio between the input and the output feature map, only 3 ResNet blocks made out of 2 residual connections are used and only one of them has a feature stride greater than 1. Two convolutions with kernels of 1 × 1 × 1 are then applied for the classification and the regression of the bounding box.

Figure 2.4 presents the architecture of the CNN used for this experiment.

The input is set to 96 × 96 × 96 due to memory limitations during training. This architecture was designed to obtain a spatial scale factor of 4 between the input and the output space. Figure 2.5 depicts the need for a small scale factor between the input space and the output space in order to capture as much information as possible.

Figure 2.4: Convolutional Neural Network. Tensors are described with their shape (K, D × H × W), where K is the number of channels, D the depth, H the height and W the width. These values correspond to a sample at training time. C^x_s denotes a 3D convolution with a kernel of size x and a stride of s, followed by a batch normalisation and a rectified linear unit (ReLU) activation. R_s denotes a ResNet[27] block with 2 residual connections using convolutions with a kernel size of 3. For each block, the stride of the first convolution and the first residual connection is set to s.
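The scale factor of 4 can be verified by tracing the spatial size through the strided layers. In this sketch the padding values (3 for the 7 × 7 × 7 convolution, 1 for the 3 × 3 × 3 ones) and the choice of the second block as the strided one are assumptions; the text only states that a single block has a stride greater than 1.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the first convolution and of the first
# convolution of each ResNet block. The other convolutions have stride 1
# and preserve the size. Padding and block order are assumptions.
layers = [(7, 2, 3), (3, 1, 1), (3, 2, 1), (3, 1, 1)]

sizes = [96]
for kernel, stride, padding in layers:
    sizes.append(conv_out(sizes[-1], kernel, stride, padding))

scale_factor = sizes[0] // sizes[-1]
```

The trace gives 96, 48, 48, 24, 24 along each axis, so the two strided layers together account for the factor of 4.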


2.5 Training

The multitask loss function for the anchor i is defined as:

\[ \mathrm{Loss}_i = L_{cls}(p_i, p_i^*) + 2\, p_i^* \, L_{reg}(t_i, t_i^*) \tag{2.1} \]

where * denotes the ground truth, p is the probability of being a positive anchor and t is the representation of the bounding box defined as a relative offset vector:

\[ t = \left( \frac{x - x_a}{d_a},\ \frac{y - y_a}{d_a},\ \frac{z - z_a}{d_a},\ \log\frac{d}{d_a} \right) \tag{2.2} \]

where (x, y, z, d) represents the position and the diameter, and (x_a, y_a, z_a, d_a) represents the anchor position and scale. In Equation 2.1, the factor two comes from the fact that the regression is only performed for positive anchors. The binary cross-entropy is used to compute the classification loss L_cls. The regression loss L_reg is the smooth-L1[16].
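Equation 2.2 and its inverse can be written in a few lines, together with the smooth-L1 function as defined for Fast R-CNN[16]. This is a minimal sketch on plain tuples; the function names and the example values are illustrative only.

```python
import math


def encode(box, anchor):
    """Offset vector t of Equation 2.2 for a box (x, y, z, d) and an anchor."""
    x, y, z, d = box
    xa, ya, za, da = anchor
    return ((x - xa) / da, (y - ya) / da, (z - za) / da, math.log(d / da))


def decode(t, anchor):
    """Inverse of encode: recover (x, y, z, d) from the offsets."""
    tx, ty, tz, td = t
    xa, ya, za, da = anchor
    return (xa + tx * da, ya + ty * da, za + tz * da, da * math.exp(td))


def smooth_l1(pred, target):
    """Smooth-L1: quadratic below 1, linear above, summed over coordinates."""
    total = 0.0
    for p, q in zip(pred, target):
        diff = abs(p - q)
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total


anchor = (48.0, 48.0, 48.0, 10.0)   # illustrative anchor, diameter 10 mm
nodule = (50.0, 47.0, 48.0, 7.3)    # illustrative nodule
t = encode(nodule, anchor)
recovered = decode(t, anchor)
```

Dividing the position offsets by the anchor diameter and taking the logarithm of the diameter ratio keeps the regression targets on a similar scale for small and large anchors.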

An anchor is considered positive if it has the highest intersection over union (IoU) with a ground truth bounding box or if its IoU is higher than 0.5. The IoU between 2 elements is defined as the volume of overlap divided by the volume of union. The motivation is that the closest anchor, the one requiring the smallest transformation, should be the anchor capturing the information of the nodule. Anchors with an IoU lower than 0.02 are considered negative. In any other case, anchors are not considered relevant and their contributions are not taken into account in the final loss. The learning should not take complex cases into account, to prevent the system from being confused. Figure 2.5 validates the need for two rules to consider an anchor as positive. The rule of being the best match is mainly used by small anchors. The other rule is mainly used by larger anchors.
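The labelling rules above can be sketched as follows. For simplicity, the IoU is computed here between axis-aligned cubes whose side equals the diameter; this is an assumption, as the text does not specify whether the overlap is computed on cubes or on spheres. Labels are 1 (positive), 0 (negative) and -1 (ignored).

```python
def iou_cubes(a, b):
    """IoU of two axis-aligned cubes given as (x, y, z, side)."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i] - a[3] / 2, b[i] - b[3] / 2)
        hi = min(a[i] + a[3] / 2, b[i] + b[3] / 2)
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    union = a[3] ** 3 + b[3] ** 3 - inter
    return inter / union


def label_anchors(anchors, nodules, pos_thr=0.5, neg_thr=0.02):
    """Return one label per anchor: 1 positive, 0 negative, -1 ignored."""
    best = [max((iou_cubes(a, n) for n in nodules), default=0.0) for a in anchors]
    labels = [1 if v > pos_thr else 0 if v < neg_thr else -1 for v in best]
    # Best-match rule: the anchor with the highest IoU for each nodule is
    # positive even when that IoU is below the threshold.
    for n in nodules:
        ious = [iou_cubes(a, n) for a in anchors]
        top = max(range(len(anchors)), key=ious.__getitem__)
        if ious[top] > 0.0:
            labels[top] = 1
    return labels


anchors = [(10, 10, 10, 5), (10, 10, 10, 10), (40, 40, 40, 5), (12, 10, 10, 20)]
labels_large = label_anchors(anchors, [(10, 10, 10, 6)])
labels_small = label_anchors([(10, 10, 10, 5), (40, 40, 40, 5)], [(10, 10, 10, 3)])
```

In the second call, the 3 mm nodule only reaches an IoU of about 0.22 with the 5 mm anchor, so it is the best-match rule that makes this anchor positive.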

Hard negative mining is applied to improve generalisation. The final loss is a weighted sum of the anchor losses (Equation 2.1). For a positive anchor, the weight is set to 1; for a negative anchor, the weight is the predicted probability of being a nodule. Thus, hard examples contribute more during the learning. To deal with the high imbalance between classes, weights are normalised for each class.
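A possible reading of this weighting scheme is sketched below. The normalisation is assumed to make the weights of each class sum to 1, which is one interpretation of "normalised for each class"; the labels follow the convention 1 (positive), 0 (negative), -1 (ignored), and all names are illustrative.

```python
def anchor_weights(labels, probs):
    """Per-anchor loss weights.

    labels: 1 positive, 0 negative, -1 ignored.
    probs:  predicted probability of being a nodule, one per anchor.
    Assumption: the weights of each class are normalised to sum to 1.
    """
    raw = []
    for label, p in zip(labels, probs):
        if label == 1:
            raw.append(1.0)   # positives all count equally
        elif label == 0:
            raw.append(p)     # confident mistakes weigh more
        else:
            raw.append(0.0)   # ignored anchors do not contribute
    weights = [0.0] * len(labels)
    for cls in (0, 1):
        total = sum(w for w, label in zip(raw, labels) if label == cls)
        if total > 0:
            for i, label in enumerate(labels):
                if label == cls:
                    weights[i] = raw[i] / total
    return weights


def total_loss(losses, labels, probs):
    """Weighted sum of the per-anchor losses of Equation 2.1."""
    return sum(w * l for w, l in zip(anchor_weights(labels, probs), losses))


labels = [1, 0, 0, -1]
probs = [0.3, 0.9, 0.1, 0.5]
weights = anchor_weights(labels, probs)
```

In an actual training loop the weights would typically be treated as constants, so that no gradient flows through the predicted probabilities used for weighting.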

RMSprop is the optimiser used for the back-propagation, with α = 0.99, ε = 1e−08 and a momentum of 0.9. The initial learning rate is set to 0.001. A learning rate decay strategy is used: the learning rate is divided by 2 every 30 epochs. The system is trained for 150 epochs and the best model is the network with the lowest loss on the validation set. The batch size is 32. For computations, the graphics processing units (GPUs) used were Nvidia GeForce GTX 1080 Ti.
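The decay schedule is simple enough to express as a function of the epoch index. The sketch assumes that epochs are counted from 0, so that the first halving happens when epoch 30 starts.

```python
def learning_rate(epoch, initial=0.001, factor=0.5, step=30):
    """Learning rate at a given epoch (0-indexed, an assumption):
    halved every `step` epochs."""
    return initial * factor ** (epoch // step)


schedule = [learning_rate(e) for e in range(150)]
```

Over the 150 training epochs the learning rate goes through five plateaus, from 0.001 down to 0.0000625.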


2.6 Evaluation

As recommended and provided by the LUNA16 organisers, a 10-fold cross-validation is used. One fold was used for testing, two for validation and seven for training. At test time, the full CT scans are sent as the input (as opposed to 96 × 96 × 96 patches during training).

The non-maxima suppression (NMS) is applied as a post-processing filtering operation. The neighbourhood notion is defined using the IoU and the value to optimise is the probability of being a nodule. If candidate bounding boxes intersect with the bounding box with the highest confidence score, and if the IoU between them is greater than a threshold, then these region proposals are discarded. Due to the absence of obfuscation, a low threshold is used (p_t = 0.1).
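A greedy version of this filtering step can be sketched as follows. As an assumption made for simplicity, proposals are treated as axis-aligned cubes whose side equals the predicted diameter when computing the IoU; the candidate list is illustrative.

```python
def iou_cubes(a, b):
    """IoU of two axis-aligned cubes given as (x, y, z, side)."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i] - a[3] / 2, b[i] - b[3] / 2)
        hi = min(a[i] + a[3] / 2, b[i] + b[3] / 2)
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    return inter / (a[3] ** 3 + b[3] ** 3 - inter)


def nms(candidates, iou_threshold=0.1):
    """Greedy non-maxima suppression.

    candidates: list of (score, (x, y, z, d)).
    A candidate is kept only if its IoU with every higher-scoring kept
    candidate does not exceed the threshold.
    """
    kept = []
    for score, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        if all(iou_cubes(box, k[1]) <= iou_threshold for k in kept):
            kept.append((score, box))
    return kept


candidates = [
    (0.8, (11, 10, 10, 6)),   # near-duplicate of the best proposal
    (0.9, (10, 10, 10, 6)),
    (0.7, (40, 40, 40, 8)),   # a separate object
]
kept = nms(candidates)
```

With a threshold as low as 0.1, almost any overlap with a stronger proposal removes a candidate, which is acceptable here because nodules do not occlude each other.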

If a bounding box proposal is located within a certain distance of the centre of a nodule defined by the ground truth, it is considered a positive match. This distance is set to the radius of the nodule defined by the ground truth. Other candidates are considered false positives. Note that only the central location of the region proposals is evaluated with this metric; the size of the bounding box is not taken into account. The LUNA16 challenge uses the free-response receiver operating characteristic (FROC) analysis. This can be seen as an adaptation of the receiver operating characteristic (ROC). A point of the FROC curve is generated by computing the average number of false positive (FP) region proposals per scan and the sensitivity at a given score threshold. The sensitivity, also called true positive rate (TPR), is defined as:

TPR = TP / P = TP / (TP + FN)

where TP is the number of true positives, P the number of positives and FN the number of false negatives.

The final metric is the average of the sensitivity at 0.125, 0.25, 0.5, 1, 2, 4 and 8 FPs per scan. The purpose is to take into account the scarcity of nodules and therefore to expect only a few candidates. Under this setting, the worst model gets a score of 0 and the best model a score of 1.
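The matching rule and the final score can be sketched as follows. This is a simplified illustration: the names `is_hit` and `froc_score` are hypothetical, ties between scores are ignored, and the sensitivity at each FP rate is taken as the best value reached without exceeding that rate, whereas the official LUNA16 evaluation script may interpolate the curve differently.

```python
import math

FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def is_hit(candidate_centre, nodule_centre, nodule_diameter):
    """A candidate matches a nodule if its centre lies within the nodule radius."""
    return math.dist(candidate_centre, nodule_centre) <= nodule_diameter / 2


def froc_score(candidates, n_nodules, n_scans, fp_rates=FP_RATES):
    """Average sensitivity over the given FP-per-scan rates.

    candidates: list of (score, nodule_id), with nodule_id None for a false positive.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    found, fps = set(), 0
    points = [(0.0, 0.0)]  # (FPs per scan, sensitivity) as the threshold is lowered
    for _score, nodule_id in ranked:
        if nodule_id is None:
            fps += 1
        else:
            found.add(nodule_id)  # a nodule counts once, however often it is hit
        points.append((fps / n_scans, len(found) / n_nodules))
    sensitivities = [max(s for fp, s in points if fp <= rate) for rate in fp_rates]
    return sum(sensitivities) / len(sensitivities)
```

A model that returns no candidates scores 0, and a model that ranks every nodule above every false positive scores 1, in line with the bounds stated above.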


Figure 2.5: Distribution of the maximum intersection over union for each nodule (IoU with the best anchor). Nodules are described with their diameter. The scale ratio is 4 for the first row (2.5a and 2.5b), 8 for the second row (2.5c and 2.5d), and 16 for the last row (2.5e and 2.5f). On the left side, anchors have a diameter of 5, 10 and 20 mm; on the right the diameters are 8, 16 and 32 mm.

(a) IoU mean 0.40318, median 0.38023
(b) IoU mean 0.36657, median 0.36165
(c) IoU mean 0.23376, median 0.19764
(d) IoU mean 0.232133, median 0.193360
(e) IoU mean 0.098513, median 0.053458
(f) IoU mean 0.098380, median 0.053943


Chapter 3 Results

3.1 Anchors Selection

As described in Section 2.3, anchors are predefined bounding boxes. Therefore, the greater the number of anchors, the better the characteristics of the nodules are captured. Nonetheless, it is computationally costly to have many anchors. In our experiments, having more than 3 anchors did not provide a significant improvement in the results. Consequently, the number of anchors was set to 3.

It is important that the anchor sizes match the size distribution of the nodules in the data. A k-means algorithm using Euclidean distance with three clusters returns centroids at 5.7, 9.9 and 18.7 mm. These values were rounded to 5, 10 and 20 mm. Figure 2.1 represents the distribution of nodules.
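The clustering step can be sketched with a one-dimensional k-means (Lloyd's algorithm). The diameters below are made-up values for illustration only, not LUNA16 annotations, and the quantile-based initialisation is an assumption chosen to keep the example deterministic.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Plain k-means on scalar values; returns the k centroids in increasing order."""
    values = sorted(values)
    # Assumed initialisation: centroids at evenly spaced quantiles of the data.
    centroids = [values[(2 * i + 1) * len(values) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster (keep it if the cluster is empty).
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical nodule diameters in millimetres.
diameters_mm = [4.1, 4.8, 5.2, 5.9, 6.3, 9.0, 9.7, 10.4, 11.2, 17.5, 19.0, 20.3]
centroids = kmeans_1d(diameters_mm)
```

Rounding the resulting centroids to convenient values, as done above for 5, 10 and 20 mm, gives the anchor sizes.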

Figure 2.5 presents the amount of information captured depending on the distribution of anchors for several models with different scale ratios. The captured information is measured with the intersection over union; the higher, the better. As expected, the networks with a low scale ratio can more easily model the distribution of the data and minimise the shifts between the anchors and the bounding boxes. An even distribution of anchors over the data distribution also helps to minimise this shift. Our model corresponds to the setting of Figure 2.5a.
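The effect of the scale ratio on the best achievable IoU can be illustrated with a small sketch. It assumes cubical anchors centred on a regular grid with one position every `stride` millimetres (the scale ratio at 1 mm resolution) and a grid origin at 0; both the grid layout and the function names are assumptions for this example.

```python
def cube_iou(centre_a, side_a, centre_b, side_b):
    """IoU of two axis-aligned cubes given by centre and side length."""
    inter = 1.0
    for ca, cb in zip(centre_a, centre_b):
        overlap = min(ca + side_a / 2, cb + side_b / 2) - max(ca - side_a / 2, cb - side_b / 2)
        if overlap <= 0:
            return 0.0
        inter *= overlap
    return inter / (side_a ** 3 + side_b ** 3 - inter)


def best_anchor_iou(centre, diameter, stride, anchor_sizes=(5, 10, 20)):
    """Highest IoU between a nodule box and any anchor on the grid."""
    # For axis-aligned cubes the best anchor position is the nearest grid point.
    nearest = tuple(round(c / stride) * stride for c in centre)
    return max(cube_iou(centre, diameter, nearest, size) for size in anchor_sizes)


fine = best_anchor_iou((9, 9, 9), diameter=10, stride=4)
coarse = best_anchor_iou((9, 9, 9), diameter=10, stride=16)
```

For this hypothetical 10 mm nodule the fine grid leaves an offset of 1 mm per axis, while the coarse grid leaves 7 mm, so `fine` is much larger than `coarse`. This is the same trend as the drop in mean IoU between the rows of Figure 2.5.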

3.2 Evaluation

We designed a single-stage deep learning architecture based on convolutions and residual connections to tackle the pulmonary nodule detection task. All hyperparameters were carefully set to achieve interesting results. The most important ones were the number of anchors, the sizes of anchors and the compression rate of the network. The model was trained on small patches but can run on full CT scans. The preparation of the data consisted of resampling the CT scans to a resolution of 1 mm × 1 mm × 1 mm and normalising the values between -1 and 1.


Figure 3.1: FROC curve. The continuous line is generated on the submissions over the 10 folds. The dotted lines represent the 95% confidence interval using bootstrapping with 1000 re-samplings with replacement.

The results of the proposed CAD system are reported in Figure 3.1. To get a better insight into the global performance, a bootstrapping technique was used (random resampling with replacement). We achieved a final score of 0.826. At test time, the final computations over an entire CT scan take three seconds on average on an Nvidia GeForce GTX 1080 Ti.
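A percentile bootstrap of the kind used for the confidence interval can be sketched as below. This is a simplification: the statistic here is the mean of a per-scan value, whereas the evaluation resamples scans and recomputes the whole FROC curve each time. The function name and the fixed seed are assumptions for the example.

```python
import random


def bootstrap_ci(per_scan_values, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval of the mean."""
    rng = random.Random(seed)
    n = len(per_scan_values)
    # Resample the scans with replacement and recompute the statistic each time.
    means = sorted(
        sum(rng.choices(per_scan_values, k=n)) / n for _ in range(n_resamples)
    )
    lower = means[int((1 - level) / 2 * n_resamples)]
    upper = means[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


# Hypothetical per-scan outcomes (1 = nodule detected, 0 = missed).
lower, upper = bootstrap_ci([0, 1] * 50)
```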

Appendix A gives a visual interpretation of the quality of the detection. As expected, the detection worked well across different sizes, intensities and shapes, as presented in Figure A.1. Failure modes were categorized and are presented in Figure A.2 for false negatives and Figure A.3 for false positives.

Most false negatives were related to nodules with low intensity or soft boundaries. Sometimes the localization was the failure point. In this case, the post-processing was often to blame; keeping only the bounding box with the highest confidence score is sometimes not the best strategy. The main source of false positives was vessels. Other failure cases came from potential regions of interest that were not included in the annotations due to the inclusion criteria. This includes nodules smaller than 3 mm or regions annotated by too few radiologists.


3.3 Localization

Figure 3.2 reports the proportion of bounding box proposals lying outside of the lungs. This proportion was computed for several classification confidence score thresholds. A nodule was considered as part of the inner volume of the lungs if a given number of pixels contained by its bounding box overlapped with the lung masks provided by the LUNA16 challenge organisers. Otherwise, the nodule belonged to the outer space.

This experiment did not aim at assessing the correctness of the regression concerning the bounding box position. Also, we wanted to make sure that small nodules on the edges are considered as part of the inside of the lungs. Therefore, a nodule was considered as part of the lung volume if a single pixel of the bounding box overlapped with the lung mask.
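The overlap rule can be sketched with NumPy. The box representation (cube centre and side length in voxel coordinates), the boolean mask layout and the function name are assumptions made for this illustration.

```python
import numpy as np


def box_inside_lungs(lung_mask, centre, size, min_voxels=1):
    """True if at least `min_voxels` voxels of the cubical box overlap the lung mask."""
    centre = np.asarray(centre, dtype=float)
    shape = np.asarray(lung_mask.shape)
    # Clip the box to the volume so that boxes partly outside the scan are handled.
    lo = np.clip(np.round(centre - size / 2), 0, shape).astype(int)
    hi = np.clip(np.round(centre + size / 2), 0, shape).astype(int)
    region = lung_mask[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    return bool(np.count_nonzero(region) >= min_voxels)


# Toy volume with a small cubical "lung" region.
mask = np.zeros((20, 20, 20), dtype=bool)
mask[5:10, 5:10, 5:10] = True
```

With `min_voxels=1`, a box that touches the mask with a single voxel is counted as inside, which is the permissive rule described above.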

For any threshold, the proportion of nodules outside of the lungs represents less than 4% of the bounding boxes generated by the model.

Figure 3.2: Proportion of bounding box proposals outside of the lungs over different classification confidence score thresholds.

3.4 Comparison

By mid-February 2018, 29 submissions had been made to the LUNA16 challenge. The median score was 0.845, the average was 0.82, and the standard deviation was 0.09. Unfortunately, only a few submissions included a proper description. Our final result can be considered as average.


Figure 3.3 compares the performance of the models against their number of parameters. Interestingly, the final result appeared to be uncorrelated with the number of trainable parameters. Due to the two-stage process (nodule detection and false positive reduction), most of the models have a large number of parameters compared to the model built for this project.

Unfortunately, a lot of submissions to the LUNA16 challenge do not have a complete report. Information is not shared for the protection of the intellectual property or the process is not fully described, making it impossible to reproduce. This makes a more detailed comparison difficult to perform.

Figure 3.3: LUNA16 submission comparison. The top plot presents the evolution of the sensitivity for several submissions. The bottom one shows the final score against the number of trainable weights for the same models.


Chapter 4 Discussion

4.1 Main findings

Several machine learning approaches could have been used for this problem. A Deep CNN was chosen based on the outstanding results reported in visual tasks and, specifically, obtained in the LUNA16 challenge[33][28][29]. However, for pulmonary nodule detection the CNNs have been used only as a component in a larger information pipeline.

This degree project describes the pulmonary nodule detection task and proposes a single stage model using deep convolutional neural networks. As a result, our pipeline is simple and similar to the ones used for object detection on natural images.

We identified two major elements which significantly influenced our experiments:

• The sizes of anchors and their quantity. The anchors need to represent the data and follow the same distribution in order to maximise the information captured. In our case, cubical anchors were a good fit due to the spherical shape of the nodules.

• The subsampling ratio between the input and the last feature map of the CNN. Small ratios allow the CNN to deal with small features.

Some detection algorithms on natural images are criticised for their performance on small objects. This is, for example, the case for YOLOv2[12]. We have shown that CNNs can learn to identify small and sparse features. These capabilities are related to the distribution of anchors, but also to the criteria by which anchors are considered positives or negatives. Under the YOLOv2 framework, only the anchor at the centre of the ground-truth bounding box is considered as positive. In our case, we combine two rules and as a result we might have several positive anchors representing the same element.

In addition, the rarity of nodules is tackled with a good training environment based on hard example mining and a good weighting strategy.


Figure 3.2 shows that the model found some nodules outside of the lungs. However, the proportion of bounding box proposals outside of the lungs is small. Removing them by using segmentation would improve results. However, we believe that this proportion is small enough to question the use of the segmentation. In addition, clinicians can easily spot these mistakes.

4.2 Impacts

Nowadays, image processing is a key component of the decisions taken by clinicians. CADs represent the next step down the road by providing some insights and key elements to the clinicians. Our detection system, combined with other models, could deliver an expert diagnosis on lung cancer, making healthcare more efficient and cost-effective. In order to do so, the detection model will have to work with, for example, a nodule classifier (is it a malignant nodule? what is the stage of development?), a nodule segmentation tool (what is the radius and the volume?) but also with patient information such as the sex, the age and the patient family history.

For our model, an interesting operating point would be a classification confidence score threshold of 0.9, leading to an average of 7 false positives per scan and a sensitivity of 93.6%. Also, some false positives are really easy to spot even for non-trained eyes. Indeed, some bounding boxes represent vessels or are outside of the lungs.

Our observation is that CNN architectures developed for object detection on natural images can easily be tuned and applied on different types of images. A single stage model based on CNN leads to an end-to-end training. This end-to-end paradigm is an interesting approach as it provides a robust and implicit solution for complex problems.

4.3 Limitations

The confidence scores associated with generated bounding boxes do not have any semantic or physiological meaning. This raises two issues. The first is finding a suitable operating point for a given application. The second is the potential impact this information could have on the clinician: will two bounding boxes with two different confidence scores receive the same consideration? More work has to be done by the scientific community on the interpretation of the decisions made by CNNs.

Even though one can develop an impressive model and achieve a high score, this work on its own has a small impact. As in any medical project, the research is only a small part of the project. Indeed, in order to have a meaningful impact on society, the model has to be evaluated with clinical trials and obtain approval from health authorities around the world. Finally, clinicians have to be trained to use the tools.

Since we compared our model to the mean performance of only a few other submissions, we are not able to support the comparison with statistically convincing evidence (null hypothesis testing).

Another limitation of this project is related to the size of the dataset. LUNA16 contains 888 scans which cannot represent the diversity of morphologies, devices and protocols present around the world. More data will have to be curated and annotated in order to build a more robust CAD. Optellum Ltd is currently working on this point.

A more technical limitation is related to the use of CNNs. In general, CNNs are trained with supervised learning, which means all the data have to be annotated, often manually and in a large quantity. Also, the model performs a lot of operations computed on large GPUs which are expensive and not eco-friendly because they consume a lot of electricity. It might not be possible to deploy the solution on a computer like the ones used in hospitals due to the absence of GPUs. The company is now looking into requirements for a cloud-based solution.

4.4 Ethics and Sustainability

From an ethical point of view, on the one hand, one can be worried about the data which will be gathered and stored by Optellum Ltd. Indeed, such a system will store medical records, which are highly personal, and all technological devices are prone to security issues. On the other hand, such a CAD gives an expert-level decision to anyone at almost no cost, which supports gender, ethnic and financial equality. One might fear that a CAD will replace radiologists; on the contrary, such a tool will boost their productivity by taking obvious decisions and providing useful insights on tricky cases.

As described before, CNN architectures require components that use a lot of energy compared to other hardware components. Consequently, the impact on the environment depends on how and where the electricity is generated. In general, CADs do not need to be real time. As a result, old GPUs can be used and their life-cycle can be extended. All in all, this project has the same impact as any project relying on GPUs. From an ethical point of view, the project is promising, but no doubt more questions will arise as the system becomes more widely used.


Chapter 5 Conclusion

In this study, a novel pulmonary nodule detection CAD system has been developed which uses deep convolutional neural networks. The detection is inspired by the region proposal network of the Faster R-CNN framework.

The evaluation is made using the free-response receiver operating characteristic (FROC). The proposed architecture yields a score of 0.826, which corresponds to an average result. The final network relies on fewer convolutions and fewer trainable parameters compared to previous submissions[33][28][29].

This thesis questioned the necessity of segmentation and false positive reduction as elements of the information pipeline. Our result, although not competing with the current state of the art which used multi-stage processing, shows that a well-designed single-stage model based on deep learning can achieve interesting results. This is possible due to a low number of region proposals and a network mainly focusing on features inside the lung.

The final model is simple and closer to the pipeline used for object detection on natural images. The results are encouraging and this work could lead to many other experiments.

5.1 Future work

From this promising work, a lot of new experiments can be derived. We believe that the architecture of the CNN has a minor impact on the overall performance as long as the ratio is kept low. However, this has to be tested. New architectures will probably present trade-offs between speed and accuracy as we see for object detection in natural images[34].

Single-shot detectors have been shown to be faster and simpler, but have lower accuracy than two-stage detectors because of the extreme class imbalance between background and object in the final loss[34]. Focal loss[35] is a modified version of the cross-entropy loss used as the classification loss for single-shot detectors. It down-weights the loss assigned to well-classified examples and therefore reduces the contribution of easy background examples. As our binary RPN could be seen as a single-shot detector, an interesting experiment would be to replace the hard negative mining with the focal loss. It could also be interesting to add the second part of the Faster R-CNN on top of the RPN and to evaluate the performance gained.
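For reference, the binary focal loss of Lin et al.[35] is FL(p_t) = −α_t (1 − p_t)^γ log(p_t), where p_t is the probability assigned to the true class. A minimal sketch for a single prediction is given below; the default values γ = 2 and α = 0.25 are those suggested in the focal loss paper and were not tuned for nodule detection. The function name is illustrative.

```python
import math


def focal_loss(p, target, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class (nodule), strictly between 0 and 1.
    target: 1 for a positive anchor, 0 for background.
    """
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_background = focal_loss(0.01, target=0)  # confidently correct
hard_background = focal_loss(0.90, target=0)  # confidently wrong
```

With γ = 0 and α_t = 1 the expression reduces to the usual cross-entropy, so the focusing parameter γ is what replaces the explicit selection of hard negatives.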

Collecting CT scans of organs surrounding the lungs could also improve performance. These samples, if properly integrated during the training, could help reduce the number of false positive detections outside of the lungs.

Another interesting direction would be to learn the non-maximum suppression with supervised end-to-end learning, as described by Hosang et al.[36]. Further work should explore more complex data augmentation. We believe the pyramidal feature hierarchy would be quite appropriate[37]. Under this framework, predictions are made on several intermediate feature maps and not only on the last one.

Another set of experiments concerns the preparation of the data. We did not test any other resampling resolution or anisotropic ones. Also, to improve their productivity, some radiologists use maximum intensity projection over several contiguous slices. This results in vessels displayed as lines and nodules displayed as circles. However, this technique tends to drop small nodules on the edges. Inspired by this use case, it could be interesting to try different projections (minimum, average, maximum, etc.) over different thicknesses (3 mm, 5 mm, 10 mm, 20 mm, etc.). This could lead to faster detection as the deep neural network will have fewer operations to compute due to a smaller input.
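The slab projections mentioned above can be sketched with NumPy. The example assumes a volume resampled to 1 mm slices (so a thickness in slices equals a thickness in millimetres), projects along the first axis, and uses non-overlapping slabs; radiology viewers often use sliding slabs instead. The function name is hypothetical.

```python
import numpy as np


def slab_projection(volume, thickness, reduce=np.max):
    """Project non-overlapping slabs of `thickness` slices along axis 0.

    `reduce` can be np.max (maximum intensity projection), np.min or np.mean.
    Trailing slices that do not fill a complete slab are dropped.
    """
    n_slabs = volume.shape[0] // thickness
    trimmed = volume[: n_slabs * thickness]
    slabs = trimmed.reshape(n_slabs, thickness, *volume.shape[1:])
    return reduce(slabs, axis=1)


# Toy volume: 6 slices of 2 x 2 voxels.
volume = np.arange(24).reshape(6, 2, 2)
mip = slab_projection(volume, thickness=3)
avg = slab_projection(volume, thickness=3, reduce=np.mean)
```

A slab thickness of t reduces the number of slices by a factor of t, which is where the expected speed-up would come from.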

Multi-task training has been shown to improve performance on each sub-task. This has been demonstrated by Sermanet et al. when working on OverFeat[7] and more recently by He et al. with Mask R-CNN[38]. The latter uses a single network for object detection, instance segmentation and key point identification. In our case, combining nodule detection with lung segmentation or lobe segmentation could be interesting.


Bibliography

[1] J. Ferlay, I. Soerjomataram, R. Dikshit, S. Eser, C. Mathers, M. Rebelo, D. M. Parkin, D. Forman, and F. Bray, “Cancer incidence and mortality worldwide: sources, methods and major patterns in globocan 2012,” International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.

[2] H. MacMahon, J. H. Austin, G. Gamsu, C. J. Herold, J. R. Jett, D. P. Naidich, E. F. Patz Jr, and S. J. Swensen, “Guidelines for management of small pulmonary nodules detected on ct scans: a statement from the fleischner society,” Radiological Society of North America, vol. 237, no. 2, pp. 395–400, 2005.

[3] J. Abraham, “Reduced lung cancer mortality with low-dose computed tomographic screening,” Community Oncology, vol. 8, no. 10, pp. 441–442, 2011.

[4] A. A. van der Heijden, M. D. Abramoff, F. Verbraak, M. V. Hecke, A. Liem, and G. Nijpels, “Validation of automated screening for referable diabetic retinopathy with the idx-dr device in the hoorn diabetes care system,” Acta ophthalmologica, vol. 96, no. 1, pp. 63–68, 2018.

[5] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 542, no. 7639, p. 115, 2017.

[6] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Association for the Advancement of Artificial Intelligence (AAAI), 2017, pp. 4278–4284.

[7] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, “Overfeat: Integrated recognition, localization and detection using convolutional networks,” in International Conference on Learning Representations (ICLR), 2014.


[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[9] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein et al., “Imagenet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.

[10] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.

[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587.

[12] J. Redmon and A. Farhadi, “Yolo9000: Better, faster, stronger,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525.

[13] V. Iglovikov, A. Rakhlin, A. Kalinin, and A. Shvets, “Pediatric bone age assessment using deep convolutional neural networks,” arXiv preprint arXiv:1712.05053, 2017.

[14] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya et al., “Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning,” arXiv preprint arXiv:1711.05225, 2017.

[15] R. Heelan, B. Flehinger, M. Melamed, M. Zaman, W. Perchick, J. Caravelli, and N. Martini, “Non-small-cell lung cancer: results of the new york screening program,” Radiological Society of North America, vol. 151, no. 2, pp. 289–293, 1984.

[16] R. Girshick, “Fast r-cnn,” in IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.

[17] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.


[18] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627–1645, 2010.

[19] J. Dai, Y. Li, K. He, and J. Sun, “R-fcn: Object detection via region-based fully convolutional networks,” in Advances in Neural Information Processing Systems (NIPS), 2016, pp. 379–387.

[20] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European Conference on Computer Vision (ECCV), 2016, pp. 21–37.

[21] A. Neubeck and L. Van Gool, “Efficient non-maximum suppression,” in International Conference Pattern Recognition (ICPR), vol. 3, 2006, pp. 850–855.

[22] J. Ding, A. Li, Z. Hu, and L. Wang, “Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks,” arXiv preprint arXiv:1706.04303, 2017.

[23] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR), 2015.

[24] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in IEEE Computer Vision and Pattern Recognition (CVPR), 2009, pp. 248–255.

[25] B. Moira, v. d. G. Robbert, d. K. Michael, M. Jeroen, and Z. Guido, “Dual path networks,” in Advances in Neural Information Processing Systems, 2016, pp. 4470–4478.

[26] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds., Cham, 2015, pp. 234–241.

[27] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.

[28] “3dcnn for lung nodule detection and false positive reduction,” 2 January 2018.


[29] W. Zhu, C. Liu, W. Fan, and X. Xie, “Deeplung: 3d deep convolutional nets for automated pulmonary nodule detection and classification,” arXiv preprint arXiv:1709.05538, 2017.

[30] Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, and J. Feng, “Dual path networks,” in Advances in Neural Information Processing Systems (NIPS), 2017, pp. 4470–4478.

[31] A. A. A. Setio, A. Traverso, T. De Bel, M. S. Berens, C. van den Bogaard, P. Cerello, H. Chen, Q. Dou, M. E. Fantacci, B. Geurts et al., “Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge,” Medical Image Analysis, vol. 42, pp. 1–13, 2017.

[32] E. M. van Rikxoort, B. de Hoop, M. A. Viergever, M. Prokop, and B. van Ginneken, “Automatic lung segmentation from thoracic computed tomography scans using a hybrid approach with error detection,” Medical physics, vol. 36, no. 7, pp. 2934–2947, 2009.

[33] J. Ding, A. Li, Z. Hu, and L. Wang, “Accurate pulmonary nodule detection in computed tomography images using deep convolutional neural networks,” 2017.

[34] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama et al., “Speed/accuracy trade-offs for modern convolutional object detectors,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[35] T. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” CoRR, vol. abs/1708.02002, 2017.

[36] J. Hosang, R. Benenson, and B. Schiele, “Learning non-maximum suppression,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[37] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, no. 2, 2017, p. 4.

[38] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988.


Appendix A Visual Evaluation and Interpretation

Figure A.1: Illustrations of the quality of the detection on a variety of nodules: true positives. Bounding box proposals are filtered; only those with a confidence score greater than 0.9 are returned. The blue boxes represent the ground truths and the red boxes represent the region proposals. The system works over several characteristics (size, intensity and shape). Rows, from top to bottom: various sizes; various intensities; various shapes.


Figure A.2: Illustrations of the quality of the detection on a variety of nodules: false negatives. Bounding box proposals are filtered; only those with a confidence score greater than 0.9 are returned. The blue boxes represent the ground truths and the red boxes represent the region proposals. Major failures are due to nodules with poor intensity or a wrong localisation of the bounding boxes. Rows, from top to bottom: poor intensity or resolution; wrong location.


Figure A.3: Illustrations of the quality of the detection on a variety of nodules: false positives. Bounding box proposals are filtered; only those with a confidence score greater than 0.9 are returned. The blue boxes represent the ground truths and the red boxes represent the region proposals. Most of the false positives correspond to damaged areas or tricky cases. Rows, from top to bottom: vessels; potential regions of interest; corners or lobe boundary; outside lungs.
