Applications of Deep Learning in Medical Image Analysis: Grading of Prostate Cancer and Detection of Coronary Artery Disease


LUND UNIVERSITY PO Box 117 221 00 Lund +46 46-222 00 00


Arvidsson, Ida

2021

Document Version:

Publisher's PDF, also known as Version of record


Citation for published version (APA):

Arvidsson, I. (2021). Applications of Deep Learning in Medical Image Analysis: Grading of Prostate Cancer and Detection of Coronary Artery Disease. Lund University / Centre for Mathematical Sciences /LTH.

Total number of authors: 1

General rights

Unless other specific re-use rights are stated the following general rights apply:

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain

• You may freely distribute the URL identifying the publication in the public portal

Read more about Creative commons licenses: https://creativecommons.org/licenses/

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.


Applications of Deep Learning in Medical Image Analysis

Grading of Prostate Cancer and Detection of Coronary Artery Disease

by Ida Arvidsson

ACADEMIC THESIS

Thesis advisors: Prof. Anders Heyden, Prof. Kalle Åström, Assoc. Prof. Niels Christian Overgaard

Faculty opponent: Prof. Carolina Wählby, Uppsala University, Sweden

which, by due permission of the Faculty of Engineering at Lund University, will be publicly defended on Friday 28th of May, 2021, at 13:15 in lecture hall MH:Gårding, Centre for Mathematical Sciences, Sölvegatan 18, Lund, for the


Cover illustration front: Gleason grading performed by AI and two pathologists.

(Credits: Felicia Marginean, Athanasios Simoulis and Ida Arvidsson).

Cover illustration back: Polar map from myocardial perfusion imaging.

Mathematics

Centre for Mathematical Sciences Lund University

Box 118 SE-221 00 Lund Sweden

www.maths.lu.se

Doctoral Theses in Mathematical Sciences 2021:4 issn: 1404-0034

isbn: 978-91-7895-798-9 (print) isbn: 978-91-7895-797-2 (pdf) lutfma-1072-2021


Abstract

A wide range of medical examinations rely on the analysis of images from different types of equipment. Using artificial intelligence, these assessments could be done automatically. This can benefit healthcare in several ways: reducing the workload for medical doctors, decreasing variations in diagnoses, cutting waiting times for patients and improving diagnostic performance. The aim of this thesis has been to develop such solutions for two common diseases: prostate cancer and coronary artery disease. The methods used are mainly based on deep learning, where the model teaches itself by training on large datasets.

Prostate cancer is one of the most common cancer diagnoses among men. The diagnosis is most commonly determined by visual assessment of prostate biopsies in a light microscope according to the Gleason scale. Deep learning methods to automatically detect and grade the cancer areas are presented in this thesis. The methods have been adapted to improve the generalisation performance on images from different hospitals, which inevitably vary in, e.g., stain appearance. The methods include digital stain normalisation, training with extensive augmentation, and models such as a domain-adversarial neural network. One Gleason grading algorithm was evaluated on a small cohort with biopsies annotated in detail by two pathologists, to compare its performance with the pathologists’ inter-observer variability. Another cancer detection algorithm was evaluated on a large active surveillance cohort, containing patients with small areas of low-grade cancer. The results are promising towards a future tool to facilitate grading of prostate cancer.

Cardiovascular disease is the leading cause of death worldwide, and coronary artery disease is one of its most common forms. One way to diagnose coronary artery disease is myocardial perfusion imaging, with which disease in the three main arteries supplying the heart with blood can be detected. Methods based on deep learning to perform the detection automatically are presented in this thesis. Furthermore, an algorithm has been developed to predict, from myocardial perfusion imaging, the degree of coronary artery stenosis as measured by quantitative coronary angiography. This assessment is normally done using invasive coronary angiography. Making the prediction automatically from myocardial perfusion imaging could spare patients suffering and free resources within the healthcare system.


Popular Science Summary

Prostate cancer is the most common cancer diagnosis among men in several countries, including Sweden. At the same time, cardiovascular disease is the most common cause of death in the world, and coronary artery disease is the most frequent form of it. There is thus much to gain from improving the diagnostics of these diseases. In recent years, the concept of artificial intelligence (AI) has found its way into almost every part of society, including medicine. With the help of AI, some tasks in healthcare could be facilitated or fully automated, which would free up time for healthcare staff to spend on the more complicated cases.

The success of AI is largely due to computers having become more powerful, so that more computations can be done in a short time. Neural networks are a kind of AI based on combining many simple computations in large networks to find advanced features and relationships. What is special about these networks is that they are not programmed in detail, but instead develop themselves by training on large datasets with many examples. They thus learn by trial and error and, through feedback from comparison with already known cases, become better at their task. This kind of algorithm has been used in this thesis to assess medical images.

To diagnose prostate cancer, small tissue samples are taken, stained and analysed in a microscope. There, cancer can be seen as glands that have lost their shape and started to grow uncontrollably. The diagnosis is made using a grading, the so-called Gleason scale, which gives a prognosis for the disease and is used to decide on appropriate treatment. There is large variation in the assessments made, both between pathologists and by the same pathologist. AI can help with the grading, for example as an additional assessor to ensure the correct diagnosis, or by sorting out healthy cases and thereby easing the work of the decreasing number of pathologists. Examples of different assessments of the same tissue sample are shown in the figure below.

(Figure: panels A, B, C and D; colour legend for cancer severity: low, intermediate, high.)

Variation in the assessment of prostate cancer in the same tissue sample. A - AI, B - physician 1, year 1, C - physician 1, year 2, D - physician 2.

Coronary artery disease can be diagnosed in several ways, of which myocardial perfusion scintigraphy is one of the most common. In myocardial perfusion scintigraphy, a small amount of a radioactive substance is injected into the blood, after which images of the heart are taken with a gamma camera. The images are analysed to detect possible narrowings of the vessels that supply the heart muscle with blood. With the help of AI, the images from the increasing use of myocardial perfusion scintigraphy can be analysed automatically. We have also investigated whether AI can extract more information than the human eye can see.

The projects in the thesis show several possibilities for AI and image analysis in medicine, with promising results, but also discuss the difficulties that exist. Just because an algorithm works well at one hospital, it is not certain that it does so at another. At the same time, AI might be used to make more reliable assessments than before.


List of Publications

This thesis is based on the following publications, referred to by their Roman numerals. They are included in this thesis with the permission of the publishers. My contribution to each paper is listed below.

i A. Gummeson, I. Arvidsson, M. Ohlsson, N. C. Overgaard, A. Krzyzanowska, A. Heyden, A. Bjartell, K. Åström, “Automatic Gleason Grading of H&E Stained Microscopic Prostate Images using Deep Convolutional Neural Networks”, In Proceedings of the International Society for Optics and Photonics, Medical Imaging: Digital Pathology, volume 10140, page 101400S, 2017.

This paper is based on the master’s thesis by AG, who did the implementations. KÅ and AB had the original idea for the project. The paper was written by me, and revised by the other authors.

ii J. Isaksson, I. Arvidsson, K. Åström, A. Heyden, “Semantic Segmentation of Microscopic Images of H&E Stained Prostatic Tissue using CNN”, In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), pages 1252–

AH came up with the idea and JI did the implementations in his master’s thesis. I wrote the paper based on the master’s thesis, with input from all authors.

iii I. Arvidsson, N. C. Overgaard, F. E. Marginean, A. Krzyzanowska, A. Bjartell, K. Åström, A. Heyden, “Generalization of Prostate Cancer Classification for Multiple Sites using Deep Learning”, In Proceedings of the IEEE 15th International

All authors contributed to the idea. I developed the method, did the implementations and wrote the paper, with input from the other authors.

iv K. Tall, I. Arvidsson, N. C. Overgaard, K. Åström, A. Heyden, “Automatic Detection of Small Areas of Gleason Grade 5 in Prostate Tissue using CNN”, In Proceedings of the International Society for Optics and Photonics, Medical Imaging: Digital Pathology, volume 10956, page 109560E, 2019.

AH, NCO, KÅ and I came up with the idea. KT developed the method and did the implementations in his master’s thesis, supervised by me and AH. I wrote the paper.


v I. Arvidsson, N. C. Overgaard, K. Åström, A. Heyden, “Comparison of Different Augmentation Techniques for Improved Generalization Performance for Gleason Grading”, In Proceedings of the IEEE 16th International Symposium on Biomedical

All authors contributed to the idea. I developed the ideas, did the implementations and wrote the paper, which was revised by all authors.

vi I. Arvidsson, N. C. Overgaard, A. Krzyzanowska, F. E. Marginean, A. Simoulis, A. Bjartell, K. Åström, A. Heyden, “Domain-Adversarial Neural Network for Improved Generalization Performance of Gleason Grade Classification”, In Proceedings of the International Society for Optics and Photonics, Medical Imaging: Digital Pathology, volume 11320, page 1132016, 2020.

KÅ and I came up with the idea. I developed the idea, did the implementations and wrote the paper, with input from the other authors.

vii F. E. Marginean, I. Arvidsson, A. Simoulis, N. C. Overgaard, K. Åström, A. Heyden, A. Bjartell, A. Krzyzanowska, “An Artificial Intelligence-based Support Tool for Automation and Standardisation of Gleason Grading in Prostate Biopsies”, In European Urology Focus, 2020.

The study was designed by AK, AB, FEM, KÅ, NCO, AH and me. The data was collected and annotated by FEM, AS, AK. I developed and implemented the algorithm. The results were analysed by FEM, AS, AK and me. The paper was mainly written by FEM, AK and me, and revised by all authors.

viii I. Arvidsson, N. C. Overgaard, A. Davidsson, J. Frias-Rose, M. Ochoa-Figueroa, K. Åström, A. Heyden, “Prediction of Obstructive Coronary Artery Disease from Myocardial Perfusion Scintigraphy using Deep Neural Networks”, In Proceedings of the IEEE 25th International Conference on Pattern Recognition, pages 4442–

MOF and AH came up with the idea for the project. I developed the idea for the method with input from the other authors. I did the implementations. The paper was written by me and revised by all authors.


ix I. Arvidsson, N. C. Overgaard, A. Davidsson, J. Frias-Rose, K. Åström, M. Ochoa-Figueroa, A. Heyden, “Detection of Left Bundle Branch Block and Obstructive Coronary Artery Disease from Myocardial Perfusion Scintigraphy using Deep Neural Networks”, In Proceedings of the International Society for Optics and Photonics, Medical Imaging: Computer-Aided Diagnosis, volume 11597, page 115970N, 2021.

This paper continued the work in Paper VIII, but with additional ideas from MOF. I designed the method, did the implementations and wrote the paper. All authors revised the paper.

x I. Arvidsson, A. Davidsson, N. C. Overgaard, C. Pagonis, K. Åström, E. Good, J. Frias-Rose, A. Heyden, M. Ochoa-Figueroa, “Deep Learning Prediction of Quantitative Coronary Angiography using Myocardial Perfusion Images with a Cardiac CZT Camera”, Manuscript.

MOF came up with the idea for the study. I developed the method and did the implementations. The paper was mainly written by MOF and me, with input from the other authors.


Acknowledgements

This thesis would not have been possible without the help and support of colleagues and friends. I especially want to thank the following:

My supervisors. Anders Heyden, Kalle Åström and Niels Christian Overgaard, for your guidance, support and encouragement. For the many opportunities you have given me, your interesting ideas and endless enthusiasm. It has been five stimulating and developing years working with you, and your combined supervision has been very valuable to me.

All medical collaborators. For collecting data, making annotations and, not least, explaining your professions. In particular: Agnieszka Krzyzanowska, for great collaboration, your excellent interpretations between the pathologists and me, and your persistence with the prostate project. Felicia Marginean and Athanasios Simoulis, for all the time you spent on drawing annotations and your patience with the not always so successful algorithm. Anders Bjartell, for your guidance and sharing of expertise. Miguel Ochoa-Figueroa, for your great ideas and enthusiasm. Anette Davidsson, for your input and persistent work.

My colleagues at the Centre for Mathematical Sciences. For good company, valuable input and insightful discussions. Especially: Gabrielle Flood, for sharing the last five years with me and for the invaluable teamwork, friendship and support. Anna Gummeson, for giving me the best possible start to my PhD and for making the second half of it more fun. Linn Öström, for your helpfulness and for being the best office mate possible.

All other collaborators. In particular: Krister Tham, Linus Mårtensson and your colleagues at Katam and Eigenvision, for challenging tasks and new perspectives. Claes Lundström and Erik Sjöblom at Sectra, for support with the prostate project. Giuseppe Lippolis, for sharing your images. The master’s thesis students working on the prostate project: Johan Isaksson, Joel Ekelund, Kasper Tall, Elin Olofsson, for your devotion and accomplishments. All other co-authors: Mattias Ohlsson, Jeronimo Frias-Rose, Christos Pagonis, Elin Good, for valuable input on our research.

Family and friends. For support, friendship, caring and inspiration. Especially: Mamma och Far, for helping out with everything at home and your unconditional support. Lovisa, for explaining all complicated (and not so complicated) medical terms to a novice in the field. Johannes, for encouraging me to pursue a PhD. Elna, for your generous hospitality and support. Erik, for your patience, advice and thoughtfulness.


Funding

The studies and work in this thesis were funded by the Vinnova-Swelife and Vinnova-Medtech4Life programs (DOGS: Digital Pathology for Optimized Gleason Score in Prostate Cancer, and DOGS-2: Digital Pathology for Optimized Gleason Score-2; grant no. 2015–04740 and 2018–02271), the Vinnova-Medtech4Life program (AIDA: Analytic Imaging Diagnostics Arena; grant no. 2017–02447) and the strategic research projects eSSENCE and ELLIIT.


Abbreviations

The following table describes the abbreviations and acronyms used throughout the thesis. The page on which each one is first used is also given.

Abbreviation Meaning Page

AHA American Heart Association 11

AI Artificial Intelligence 1

ANN Artificial Neural Network 3

AUC Area Under the ROC Curve 57

BMI Body Mass Index 26

CAD Coronary Artery Disease 1

CNN Convolutional Neural Network 3

CZT Cadmium-Zinc-Telluride 12

DANN Domain-Adversarial Neural Network 47

DL Deep Learning 1

ECG ElectroCardioGram 14

ESC European Society of Cardiology 11

FN False Negative 56

FP False Positive 56

G3 Gleason grade 3 7

GAN Generative Adversarial Network 46

GG Gleason grade Group 7

GS Gleason Score 7

H&E Haematoxylin & Eosin 8

HDI Human Development Index 6

HSV Hue Saturation Value 51

ICA Invasive Coronary Angiography 14

ICC Intraclass Correlation Coefficient 58

IoU Intersection-over-Union 58

ISUP International Society of Urological Pathology 7

LAD Left Anterior Descending artery 10

LBBB Left Bundle Branch Block 14

LCx Left Circumflex artery 10

ML Machine Learning 2

MPI Myocardial Perfusion Imaging 1

PCa Prostate Cancer 1

PRIAS Prostate Cancer Research International Active Surveillance 23

PSA Prostate-Specific Antigen 6

QCA Quantitative Coronary Angiography 14

QPS Quantitative Perfusion SPECT 24

RANSAC RANdom SAmple Consensus 55

RCA Right Coronary Artery 10

ReLU Rectified Linear Unit 34

RGB Red Green Blue 29

ROC Receiver Operating Characteristic 57

SIFT Scale Invariant Feature Transform 54

SPECT Single-Photon Emission Computed Tomography 12

TN True Negative 56

TP True Positive 56


Chapter 1 Introduction

The overall subject of this thesis is the automatic assessment of medical images using image analysis, in particular deep learning (DL). Setting medical diagnoses by visual assessment is not a novel practice. However, the technology used for medical imaging has evolved and today includes the creation of digital images from different types of equipment, in huge quantities. This enables the use of artificial intelligence (AI) and image analysis for automatic assessments, which can benefit healthcare in several ways: reducing the workload for medical doctors, decreasing variations in diagnoses, cutting waiting times for patients and improving the sensitivity and specificity of diagnoses.

The applications studied in this thesis are limited to two fields: grading of prostate cancer (PCa) from microscopy images of stained tissue from prostate biopsies, and detection of coronary artery disease (CAD) using images from myocardial perfusion imaging (MPI). The methods, possibilities and limitations are, however, largely the same for many other applications, in particular within medicine, and the work presented can hopefully serve as a guide for new applications. An overview of the components of the thesis follows.

1.1 Thesis Overview

The thesis consists of four parts. The first part (Chapters 1–3) introduces the subjects, datasets and methods used. In the second part (Chapters 4–6), methods for PCa grading are presented and evaluated. The third part (Chapter 7) concerns detection of CAD. Finally, the fourth part (Chapter 8) discusses future applications and presents the conclusions.

Chapter 1 The key concepts of artificial intelligence are introduced. Furthermore, some background information on PCa and CAD is presented.


Chapter 2 The different datasets used are described. These are a key component for the studies and results presented in this thesis.

Chapter 3 The methods used are introduced. These are mainly deep learning methods, but also related aspects such as colour normalisation of microscopy images of stained tissue and registration of images.

Chapter 4 The methods developed for Gleason grading of PCa are described and corresponding results are presented. The focus is to find learning-based methods that generalise well to unseen cases, although the datasets used for training are limited in size and variation.

Chapter 5 Grading of PCa is not a trivial task. In this chapter, two approaches that could facilitate the grading are introduced: semantic segmentation of relevant tissue components and detection of small cancer areas.

Chapter 6 Thorough evaluations of automatic PCa detection and Gleason scoring algorithms are presented. Evaluations are done both by comparing to multiple pathologists and by evaluating on a larger dataset.

Chapter 7 Algorithms for automatic detection of CAD using MPI are described. Furthermore, an algorithm for automatic prediction of the degree of coronary artery stenosis is presented.

Chapter 8 General discussions of the topics in, and related to, this thesis. The conclusions which can be drawn from the presented results and suggestions of topics to be further explored are stated.

1.2 Deep Learning

The term AI seems to be more popular than ever. AI is a very broad term with a vague definition. Essentially anything a machine can do that can be considered “smart” is often included in AI. A sub-field of AI is machine learning (ML). The goal of ML is that the machine should teach itself from experience, by training on a dataset or receiving some other sort of feedback. The ML algorithm optimises itself to be as good as possible at the specific task, based on the information in the dataset or the feedback received. Ideally, it will also perform well on new data later on.
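As a minimal sketch of what it means for an algorithm to optimise itself on a dataset, the example below fits a straight line y = w x + b to a handful of points by gradient descent on the mean squared error. The function name `fit_line`, the toy data, the learning rate and the number of epochs are illustrative assumptions. A DL model is trained on the same principle, but with millions of parameters and with the gradients computed by backpropagation.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error (illustrative sketch)."""
    w, b = 0.0, 0.0  # initial parameter values
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on the training data
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(residual^2) with respect to w and b
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        # Step against the gradient to reduce the training error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training set generated from y = 2x + 1 (an assumption for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")  # close to w = 2, b = 1
```

Nothing about the line is programmed explicitly; the parameters are found only through repeated feedback from the training data, which is the sense in which the model "teaches itself".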

DL is a way of designing ML algorithms in which simple concepts are built on top of each other to form a deep structure. It replaces the classical approach of designing and extracting hand-crafted features for classification with a substantially different strategy: letting the computer itself decide which features are important by training on a dataset. While DL is not a novel concept, the availability of large datasets and increased computational power have made DL very successful and popular in recent years. Today DL has outperformed previous state-of-the-art algorithms in several visual recognition tasks, and the performance has improved rapidly. The ImageNet Large Scale Visual Recognition Challenge [132] was the largest contest in object recognition and was held annually from 2010 to 2017. The first time the winner was a deep neural network was in 2012, when Krizhevsky et al. [91] improved the performance remarkably by lowering the state-of-the-art top-5 error rate from 26.1% to 15.3%. The winners in all following years were also DL algorithms.

Figure 1.1: An Euler diagram showing the relation between AI, ML, ANN and DL.
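The top-5 error rate counts a prediction as wrong only if the true class is missing from the five classes with the highest scores. A minimal sketch of the computation is given below; the function name `top_k_error` and the toy scores are assumptions for illustration, not the official ImageNet evaluation code.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    errors = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the k best
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            errors += 1
    return errors / len(labels)


# Toy example with 6 classes and 3 samples (hypothetical scores)
scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],  # true class 0 has the lowest score: wrong at top-5
    [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # true class 1 is second best: right at top-5, wrong at top-1
    [0.3, 0.1, 0.2, 0.6, 0.5, 0.4],  # true class 1 has the lowest score: wrong at top-5
]
labels = [0, 1, 1]
print(top_k_error(scores, labels, k=5))  # 2 of 3 samples are errors
print(top_k_error(scores, labels, k=1))  # 1.0: all three are wrong at top-1
```

Allowing five guesses makes the metric tolerant of ambiguous images, which is why it was the main ranking criterion in the challenge.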

A DL algorithm consists of a deep artificial neural network (ANN). An ANN contains many simple processing units that are combined to form a more complex architecture. When many of these units are grouped in layers and stacked on top of each other, the network is considered a DL model, if it is deep enough. The relation between the terms AI, ML, ANN and DL is illustrated in Figure 1.1. A deep ANN contains a very large number of parameters, whose values are optimised by training on a large dataset. The design of the ANN is inspired by the human brain. The simple units in an ANN correspond to nerve cells, where input signals are combined and forwarded if strong enough, see Figure 1.2. In the artificial variant, these are replaced by a weighted summation and an activation function. Many such units are combined to form more complex systems, such as the human brain or a deep neural network.
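The artificial unit described above can be sketched in a few lines: a weighted sum of the inputs plus a bias, passed through an activation function. The sketch assumes the rectified linear unit (ReLU) as activation, and the names `neuron` and `dense_layer` as well as the example weights are hypothetical; in a trained network the weights and biases would come from training rather than being set by hand.

```python
def relu(z: float) -> float:
    """Rectified linear unit: forwards positive signals, blocks negative ones."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation=relu):
    """One artificial unit: weighted summation followed by an activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation=relu):
    """A layer is many units reading the same inputs, each with its own weights."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


x = [1.0, 2.0]
print(neuron(x, [0.5, -0.25], 0.1))                              # 0.1 (signal passes)
print(dense_layer(x, [[0.5, -0.25], [-1.0, -1.0]], [0.1, 0.5]))  # [0.1, 0.0]
```

Feeding the output list of one layer as the input of the next, several times over, gives the stacked, deep structure described above.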

Figure 1.2: Illustration of the similarities between a computational unit in an ANN and a neuron in the human brain. Image

For image analysis, the convolutional neural network (CNN) has turned out to be a successful approach. CNNs are sparsely connected neural networks with shared weights, which results in fewer parameters and makes them easier to train. They are designed to efficiently extract spatial information from images. The goal of image analysis is typically classification or segmentation.
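To make the parameter saving concrete, the sketch below counts the parameters of a convolutional layer and of a fully connected layer producing an output of the same size, and implements a plain 2D convolution (computed as cross-correlation, the convention used in most DL libraries) to show that one small kernel is reused at every image position. The 64 × 64 RGB input, the 16 filters of size 3 × 3, the use of same padding, and the helper names are assumptions chosen for illustration.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2D convolutional layer with square kernels."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2D cross-correlation: the same kernel slides over every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# 64 x 64 RGB image mapped to 16 feature maps of size 64 x 64 (same padding assumed)
h, w, c_in, c_out = 64, 64, 3, 16
print(conv_params(c_in, c_out, 3))                # 448
print(dense_params(h * w * c_in, h * w * c_out))  # 805371904

print(conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, -1]]))
# [[-4, -4], [-4, -4]]
```

Under these assumptions the convolutional layer needs 448 parameters, while a fully connected layer with the same input and output sizes needs over 800 million, which illustrates why sparse connectivity and weight sharing make CNNs practical for images.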

DL is often described as a black box, where decisions are made in a way that is hard to understand. The initialisation of the neural network and the order of the samples during training are often random. However, once training is over there is nothing random about a neural network. It consists of well-defined calculations, but its complex and deep structure typically makes it incomprehensible to humans. Therefore, to explain a network’s behaviour, one usually does not refer to the inner workings of the trained model but rather to the reasons why it performs as it does. Typically, the trained network can be understood through features of the dataset used for training. A limitation in the dataset will likely create a model that is limited in the same way. For examples outside these boundaries, the network can appear to behave randomly, simply because it has not been taught how to handle such examples.

1.3 Medical Image Analysis

DL has gained popularity and success in many fields, not least within medicine. Large amounts of data are processed in the healthcare system, making it both a field suitable for the development of successful DL algorithms and, at the same time, a field that could gain capability from DL. While different types of data are used for medical assessments, images are a very common type, which makes medical image analysis potentially very useful. Medical images are created in many different forms using different equipment, for example ultrasound, X-ray, computed tomography, magnetic resonance imaging, microscopes and scintigraphy. All modalities create very different images, but all have the potential to be analysed automatically using DL. The applications are many, including the two considered in this thesis: detection of cancer and of cardiovascular disease.

While the potential for DL within medicine is high, it also comes with some difficulties. For example, images from similar machines from different manufacturers, or from different hospitals, can vary in appearance, and the variations can be hard to forecast. Such variations can cause serious problems for DL algorithms and are thus important to be aware of. It is therefore important to evaluate a DL algorithm carefully, e.g. with images from different machines where relevant, to make sure that it generalises to new data as expected. How these types of automatic tools should be used in healthcare is not yet clear. While they might be used for making the final diagnosis in the future, it is more likely that they will initially be used as tools supporting the medical doctors. For example, potentially severe or urgent cases could be highlighted so that they are prioritised. Excluding the obviously healthy cases would also reduce the workload for the medical doctors and save time that could instead be spent on the more difficult cases [129]. AI could of course also be used in other parts of the healthcare system, for example to facilitate the workflow and simplify administrative tasks [158].
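One simple way to evaluate on images from different sources is to report performance separately for each site or machine instead of a single pooled figure, since a good pooled accuracy can hide a site where the algorithm fails. The sketch below assumes predictions are available as hypothetical `(site, prediction, label)` tuples; the function name, site names and values are made up for illustration.

```python
from collections import defaultdict


def accuracy_per_site(records):
    """Accuracy computed separately for each site from (site, prediction, label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for site, prediction, label in records:
        total[site] += 1
        correct[site] += int(prediction == label)
    return {site: correct[site] / total[site] for site in total}


# Hypothetical binary predictions from two hospitals
records = [
    ("hospital_A", 1, 1), ("hospital_A", 0, 0), ("hospital_A", 1, 0), ("hospital_A", 0, 0),
    ("hospital_B", 1, 0), ("hospital_B", 0, 1), ("hospital_B", 1, 1), ("hospital_B", 0, 0),
]
per_site = accuracy_per_site(records)
pooled = sum(int(p == y) for _, p, y in records) / len(records)
worst_site = min(per_site, key=per_site.get)
print(per_site)            # {'hospital_A': 0.75, 'hospital_B': 0.5}
print(pooled, worst_site)  # 0.625 hospital_B
```

The pooled accuracy of 0.625 looks moderate, while the per-site breakdown shows that the algorithm is no better than chance at the second hospital.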

Setting medical diagnoses is a hard task, requiring many years of study and experience. Assessment of an image is only part of the process of setting a diagnosis, and replacing that part with an AI system is a much smaller step than letting the AI set the final diagnosis. In a study by Levenson et al., “Pigeons (Columba livia) as trainable observers of pathology and radiology breast cancer images” from 2015 [95], pigeons were trained with differential food reinforcement to spot cancer in images of breast tissue. As individuals they performed worse than professional pathologists, but as a group they performed equally well and, at that point in time, far better than AI.
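The group effect has a simple statistical intuition: if several observers make independent errors, a majority vote is right more often than each observer alone. The calculation below is an idealised sketch assuming an odd number of observers, each correct with the same probability p and independently of the others. These are simplifying assumptions, not details of the study; real observers' errors are usually correlated, which reduces the gain.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent observers, each correct with
    probability p, gives the correct answer (n odd to avoid ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of observers to avoid ties")
    needed = n // 2 + 1
    # Sum the binomial probabilities of at least `needed` observers being correct
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


for n in (1, 5, 15):
    print(n, round(majority_vote_accuracy(0.7, n), 3))  # n = 5 gives about 0.837
```

With individual accuracy 0.7, a group of five reaches about 0.84 under these assumptions, and larger groups do better still.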

1.4 Prostate Cancer

1.4.1 The Prostate

The prostate is about the size of a walnut and is located below the bladder, with the urethra passing through it, cf. [97] and the references therein. It has an important role in male reproduction, producing part of the fluid component of the semen. The microanatomy of the prostate tissue consists of glandular structures surrounded by stroma. Each gland contains a lumen, which is surrounded by an inner layer of epithelial cells, an outer layer of basal cells and a basement membrane enclosing the whole gland and separating it from the stroma. There are also some less common cell types: neuroendocrine cells and stem cells. An illustration of the glandular structure is given in Figure 1.3.

Figure 1.3: Illustration of the normal prostatic glandular structure. The lumen (white) is enclosed by one layer of epithelial cells, one layer of basal cells and a basement membrane. Stroma surrounds the gland. Image from Lippolis [97], used with permission.


Figure 1.4: Schematic illustration of the transformation from normal gland to cancer area. The absence of basal cells is the first indication of cancer. Image from Lippolis [97], used with permission.

The shape of a normal gland is irregular, but it has the structure described above. The absence of basal cells is a clear indication of cancer. The growing cancer will destroy the glandular structure, and cancer cells will appear scattered or in solid clusters. This is illustrated in Figure 1.4.

1.4.2 Prostate Cancer

According to the global cancer statistics for 2020 [144], PCa was the most frequently diagnosed cancer among men in many countries, for example in Northern and Western Europe (including Sweden). Worldwide among men, it was the second most common cancer diagnosis (14.1%, 1.4 million new cases) after lung cancer (14.3%) and the fifth leading cause of cancer death (6.8%, 375 000 deaths). The incidence of PCa is three times higher in high and very high Human Development Index (HDI) countries than in lower HDI countries, but the mortality rate is approximately the same. In Sweden it is the leading cause of cancer death among men. The varying incidence rates worldwide are likely caused by differences in diagnostic practices, where prostate-specific antigen (PSA) testing has affected the incidence rates over time in many countries. The established risk factors for PCa are few, but include advancing age, a family history of PCa and certain genetic mutations.

While examinations such as the PSA blood test can give an indication of PCa, a biopsy of the prostatic tissue is the best way to get a conclusive diagnosis [135]. In recent years, multiparametric magnetic resonance imaging has become more important for the detection of prostate cancer, enabling targeted biopsies [85] and preliminary automatic prediction of cancer grade using DL [25]. For correct treatment, an accurate classification of the severity of the cancer is necessary. In the standard procedure, the diagnosis is determined by pathologists through visual inspection of prostate biopsies, classifying them according to severity and Gleason score, see the next subsection. Typically 8–12 biopsy cores are used to set the diagnosis. There is high pressure on the pathologists to always be both efficient and meticulous.


There are several different approaches to treating PCa, depending on the age and general health of the patient, but also on how severe the cancer is. For patients with tumours that are not expected to grow for several years, one option is active surveillance, under which regular tests are conducted until signs of progression are observed. Curative treatments include radical prostatectomy, i.e. surgical removal of the whole prostate, and radiotherapy. For patients with recurring or advanced cancer there are few options except chemotherapy or hormone treatment, all with side effects. There are, however, potential new treatments, such as interstitial photodynamic therapy [148].

1.4.3 Gleason Grading

Gleason grading is named after Donald Gleason, who developed a method for PCa classification in the 1960s. It is still the standard method today, although it has been updated in various ways since then [48, 50]. The Gleason grade ranks cancer by severity based on the growth pattern of the cancer cells. The original scale went from 1 to 5, see Figure 1.5, where 1 is the lowest and 5 the most severe grade of cancer. However, Gleason grades 1 and 2 are no longer used in practice for grading prostate biopsies [50, 135]. The Gleason grades used are thus Gleason grade 3 (G3), Gleason grade 4 (G4) and Gleason grade 5 (G5).

The Gleason grade is a label assigned to individual regions of the prostate tissue, and different regions may have different Gleason grades. To describe the whole prostate biopsy, the Gleason score (GS) is used. The GS is constructed from two Gleason grades: the grade covering the largest area of the biopsy, and the highest grade occurring on the biopsy that differs from the first, e.g. GS 3 + 4 = 7. If only one grade is present, it is counted twice, e.g. GS 3 + 3 = 6. To distinguish between, e.g., GS 3 + 4 and GS 4 + 3, which both sum to 7, the Gleason grade group (GG) was proposed in [50]. There is also another scale, the International Society of Urological Pathology (ISUP) grade [46], which is very similar to the GG [49]. In this thesis the GG will be used. The link between the GS and the GG can be seen in Table 1.1.

G3 and G4 are mainly identified by larger-scale patterns, whereas in G5 single cancer cells can sometimes be seen. Since the more malignant tumours can split up into scattered locations, single cells of G5 can occur intermingled with benign tissue. It is therefore of great importance to find even very small areas of the highest grade. A single cancer cell can be detected by its differently shaped nucleus, but also by its nuclear texture and size [26, 135].

Table 1.1: The Gleason scores and corresponding grade groups from [50].

| Gleason score (GS) | 3+3 or lower | 3+4 | 4+3 | 4+4, 3+5, 5+3 | 4+5, 5+4, 5+5 |
|---|---|---|---|---|---|
| Grade group (GG) | 1 | 2 | 3 | 4 | 5 |
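The rule for combining region grades into a GS, together with the mapping in Table 1.1, can be sketched in Python. This is a minimal illustration, not the grading pipeline developed in this thesis; the input format (a list of (grade, area) pairs for cancerous regions) and the tie-breaking rule (equal areas resolved towards the higher grade) are assumptions.

```python
from collections import defaultdict


def gleason_score(regions):
    """Combine annotated cancer regions into a biopsy-level Gleason score.

    regions: iterable of (grade, area) pairs, with grade in {3, 4, 5}.
    Returns (primary, secondary): the grade covering the largest area,
    then the highest other grade present on the biopsy.
    """
    area_per_grade = defaultdict(float)
    for grade, area in regions:
        if grade not in (3, 4, 5):
            raise ValueError(f"unexpected Gleason grade: {grade}")
        area_per_grade[grade] += area
    if not area_per_grade:
        raise ValueError("no cancerous regions given")
    # Largest total area wins; ties go to the higher grade (an assumption).
    primary = max(area_per_grade, key=lambda g: (area_per_grade[g], g))
    others = [g for g in area_per_grade if g != primary]
    # With only one grade present, it is counted twice (e.g. 3 + 3).
    secondary = max(others) if others else primary
    return primary, secondary


def grade_group(primary, secondary):
    """Map a Gleason score to a grade group according to Table 1.1."""
    total = primary + secondary
    if total <= 6:
        return 1
    if total == 7:
        return 2 if primary == 3 else 3
    if total == 8:
        return 4
    return 5


# G3 covers most of the biopsy, but a small G5 focus sets the secondary grade.
score = gleason_score([(3, 60.0), (4, 30.0), (5, 10.0)])
print(score, grade_group(*score))  # (3, 5) 4
```

Note how the secondary grade is the highest remaining grade rather than the second largest area, which is why a small G5 focus moves this biopsy to GG 4.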


Figure 1.5: Schematic drawing of the Gleason grading system, illustrating how the cancer cells are spread in the tissue for the different grades. Image from Lippolis [97], used with permission.

1.4.4 Staining

Before the pathologists examine the prostatic tissue, it is prepared in a few steps. The tissue is embedded in paraffin, sliced into thin sections and mounted onto glass slides. The slides are then stained to highlight different features. The examination is done using a light microscope, either a traditional or a digital one.

There are many different stains with different properties. Immunohistochemical stains, such as AMACR and p63, highlight cancerous and benign glands, respectively. Prognostic markers, such as Ki-67, are associated with aggressive features of PCa [97]. The most common staining for cancer diagnosis is, however, haematoxylin and eosin (H&E), which are the stains used in the studies in this thesis.


Figure 1.6: Examples of tissue stained with H&E from Skåne University Hospital, Malmö, Sweden, classified as Benign (top left), G3 (top right), G4 (bottom left) and G5 (bottom right).

Haematoxylin is a basic dye, which colours the nuclei purple. Other parts of the tissue are stained in different shades of pink by the acidic dye eosin, brighter for stroma and darker for epithelial cytoplasm [97]. The stains absorb light, and the parts that have not been stained therefore appear white. Examples of H&E stained tissue with different Gleason grades can be seen in Figure 1.6. H&E staining is an old technique and also the most commonly used one for Gleason grading, although immunohistochemical staining is sometimes used to confirm the diagnosis. According to [57], diagnosis will probably continue to rely on H&E staining for at least a few more decades.
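The observation that unstained parts appear white can be made concrete with the Beer-Lambert law, under which the optical density OD = -log10(I / I0) of each colour channel grows with the amount of absorbing stain. The Python sketch below assumes 8-bit RGB pixels with incident intensity I0 = 255 and uses a hypothetical background threshold of 0.15 (not a value taken from this thesis); it converts pixels to optical density and flags near-white, unstained background.

```python
import math

I0 = 255.0  # assumed incident light intensity for 8-bit images


def optical_density(rgb):
    """Convert an (R, G, B) pixel to per-channel optical density (Beer-Lambert)."""
    # Clamp to 1 so that fully black pixels do not cause log10(0).
    return tuple(-math.log10(max(c, 1) / I0) for c in rgb)


def is_background(rgb, threshold=0.15):
    """Treat a pixel as unstained background if every channel absorbs little light."""
    return all(od < threshold for od in optical_density(rgb))


print(is_background((250, 248, 252)))  # near-white glass: True
print(is_background((120, 60, 160)))   # purple nucleus: False
```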

Differences in the preparation of the histology slides, such as stain concentration, staining duration and tissue thickness, result in differences in staining appearance from laboratory to laboratory and over time [67]. Storage and handling of the slides also affect the appearance, since the stains can fade when exposed to light [106]. These variations are hard to avoid; they are typically larger between different labs, although variations also occur within the same lab. While this is usually not a problem for humans using a traditional procedure, it can be problematic for an AI algorithm trained on a dataset with little variation. This problem will be further investigated and discussed in this thesis.
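A common way to make a network less sensitive to such variations is colour augmentation during training. The Python sketch below shows one simple, assumed variant, not the approach evaluated in this thesis: it rescales the optical density of each RGB channel by a random factor, which under the Beer-Lambert law mimics a change in stain amount. More faithful stain augmentation usually operates on separated haematoxylin and eosin channels obtained by colour deconvolution, which is omitted here; the function name `jitter_stain` and its `strength` parameter are hypothetical.

```python
import math
import random

I0 = 255.0  # assumed incident light intensity for 8-bit images


def jitter_stain(pixels, strength=0.1, rng=None):
    """Randomly scale per-channel optical density of a list of RGB pixels.

    One factor per channel is drawn for the whole image, so all pixels are
    perturbed consistently, much as a different staining run would do.
    """
    if rng is None:
        rng = random.Random()
    factors = [rng.uniform(1 - strength, 1 + strength) for _ in range(3)]
    augmented = []
    for pixel in pixels:
        new_pixel = []
        for value, factor in zip(pixel, factors):
            od = -math.log10(max(value, 1) / I0)
            new_value = I0 * 10 ** (-od * factor)
            # Round back to valid 8-bit intensities.
            new_pixel.append(min(255, max(0, round(new_value))))
        augmented.append(tuple(new_pixel))
    return augmented


image = [(250, 248, 252), (120, 60, 160), (200, 120, 180)]
print(jitter_stain(image, strength=0.2, rng=random.Random(0)))
```

Because white pixels have zero optical density, unstained background stays white while stained regions become lighter or darker, which is the behaviour expected from varying stain concentration.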


1.4.5 Digital Pathology

Since the mid-1990s, microscope imaging has improved and now allows whole slides to be digitised. The pathologist's light microscope can thus be replaced with a computer, which enables digital pathology. Multiple studies have compared the use of light microscopes and digital pathology, see [77] and the references therein, and found that the diagnostic performance is approximately the same with both approaches. Which approach is the most efficient is not clear, but the digital approach has other benefits: reduced risk of patient and slide misidentification, reduced risk of tissue loss or damage, improved telepathology consultation (i.e. studying slides remotely), as well as simplified annotation drawing and measurements using different software.

Another great benefit is the possibility of using automatic image analysis to assess the tissue, which would help in several ways. The pathologists' workload is heavy, since the high incidence of PCa results in large volumes of histological material. There is also a shortage of pathologists, resulting in long queues for cancer patients and delays in diagnosis. A system that, for example, automatically recognises all benign cases would drastically reduce the workload, leaving only the hard cases for the pathologists to examine. Moreover, multiple studies have shown a high variability in grading between different pathologists [2, 47, 110, 123], which can have a considerable impact on which diagnosis and treatment the patient receives. With a second opinion from an automated image analysis algorithm, diagnoses might come closer to consensus, decreasing over- and undertreatment. Hence there is good reason to try to automate the pathological analysis process.

1.5 Coronary Artery Disease

1.5.1 Cardiovascular Physiology

The heart and circulatory system have a central role in the functioning of the body. Oxygen and nutrients needed by the body, as well as waste products such as carbon dioxide, are transported by the blood via the circulatory system. The heart is a muscular organ that functions as a pump for the blood. The coronary arteries are part of the circulatory system and supply the heart muscle itself with blood. The three main coronary arteries are the left anterior descending artery (LAD), the left circumflex artery (LCx) and the right coronary artery (RCA), see Figure 1.7.


Figure 1.7: The three main coronary arteries supplying the heart muscle with blood: the right coronary artery (RCA), the left circumflex artery (LCx) and the left anterior descending artery (LAD); the apex and base of the heart are also marked. Image modified and used with permission from Servier Medical Art - Creative Commons Attribution 3.0 Unported License.

1.5.2 Coronary Artery Disease

According to the World Health Organization [162], cardiovascular disease is the leading cause of death worldwide, estimated to represent 31% of all global deaths in 2016. Of these deaths, CAD and stroke are estimated to account for 82%. These two diseases typically manifest as acute events, in which blood is prevented from flowing to the heart or brain. The blockage that prevents the blood flow is usually caused by a build-up of fatty deposits on the inner walls of the blood vessels. There are multiple behavioural risk factors for CAD, such as tobacco use, unhealthy diet, physical inactivity and harmful use of alcohol.

Multiple methods exist to diagnose CAD. Two common methods are described in the following subsections: non-invasive MPI and invasive coronary angiography. The American College of Cardiology Foundation and American Heart Association (AHA) guidelines advocate the use of pre-test probability estimates, derived from the Diamond and Forrester model and the Coronary Artery Surgery Study, to guide diagnostic testing for patients with suspected stable CAD. The motivation is that these estimates are simple to use and can be applied at the physician's first encounter with the patient [27, 39, 54]. Additionally, guidelines for the diagnosis and management of chronic coronary syndromes from the European Society of Cardiology (ESC) are also used by referring physicians. The ESC pre-test probability estimates from 2013 have been used in this thesis [109]. These tools are of great value not only to the referring physicians (i.e. the physicians who establish medical necessity) but also to nuclear medicine specialists reading MPI images, who thereby get a clinical picture of the patient before the test is performed. Both the ESC and the AHA pre-test probabilities are determined based on age, gender and nature of symptoms, as described in [3] and [61], respectively. The different pre-test probabilities are given in Table 1.2 (ESC) and Table 1.3 (AHA).


Table 1.2: Pre-test probabilities according to the ESC scale from 2013.

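| Age | Typical, men | Typical, women | Atypical, men | Atypical, women | Non-anginal, men | Non-anginal, women |
|---|---|---|---|---|---|---|
| 30-39 | 59% | 28% | 29% | 10% | 18% | 5% |
| 40-49 | 69% | 37% | 38% | 14% | 25% | 8% |
| 50-59 | 77% | 47% | 49% | 20% | 34% | 12% |
| 60-69 | 84% | 58% | 59% | 28% | 44% | 17% |
| 70-79 | 89% | 68% | 69% | 37% | 54% | 24% |
| 80+ | 93% | 76% | 78% | 47% | 65% | 32% |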
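Reading a value from Table 1.2 is a table lookup on age band, sex and type of chest pain. The Python sketch below encodes the table as printed; the function name and string labels are illustrative choices, and ages are assumed to be whole years of at least 30.

```python
# Table 1.2 (ESC 2013): pre-test probability of CAD in per cent.
COLUMNS = [
    ("typical", "man"), ("typical", "woman"),
    ("atypical", "man"), ("atypical", "woman"),
    ("non-anginal", "man"), ("non-anginal", "woman"),
]
ROWS = [  # (lowest age, highest age or None if open-ended, value per column)
    (30, 39, (59, 28, 29, 10, 18, 5)),
    (40, 49, (69, 37, 38, 14, 25, 8)),
    (50, 59, (77, 47, 49, 20, 34, 12)),
    (60, 69, (84, 58, 59, 28, 44, 17)),
    (70, 79, (89, 68, 69, 37, 54, 24)),
    (80, None, (93, 76, 78, 47, 65, 32)),
]


def esc_pretest_probability(age, sex, chest_pain):
    """Return the ESC 2013 pre-test probability (%) for an integer age >= 30."""
    column = COLUMNS.index((chest_pain, sex))  # ValueError for unknown labels
    for low, high, values in ROWS:
        if age >= low and (high is None or age <= high):
            return values[column]
    raise ValueError(f"age {age} is outside the table (30 years and older)")


print(esc_pretest_probability(55, "woman", "atypical"))  # 20
```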

Table 1.3: Pre-test probabilities according to the AHA scale. Interm. is short for intermediate.

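| Age | Typical, men | Typical, women | Atypical, men | Atypical, women | Non-anginal, men | Non-anginal, women | Dyspnoea, men | Dyspnoea, women |
|---|---|---|---|---|---|---|---|---|
| 30-39 | Interm. | Interm. | Interm. | Very low | Low | Very low | Very low | Very low |
| 40-49 | High | Interm. | Interm. | Low | Interm. | Very low | Low | Very low |
| 50-59 | High | Interm. | Interm. | Interm. | Interm. | Low | Low | Very low |
| 60-69 | High | High | Interm. | Interm. | Interm. | Interm. | Low | Low |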

1.5.3 Myocardial Perfusion Imaging

MPI is one of the most common cardiological examinations performed today for diagnosis and risk assessment in patients with suspected CAD. It is a nuclear medicine method, where radioisotopes attached to drugs are injected into the bloodstream and transported to specific organs or tissues. The emitted gamma radiation is captured by external detectors to form 2-dimensional images (scintigraphy) or 3-dimensional images (single-photon emission computed tomography, SPECT). MPI provides valuable information on, for example, ischaemia, myocardial injuries and left ventricular ejection fraction [31, 45, 52, 122]. The technique has improved in recent years with the introduction of cadmium-zinc-telluride (CZT) technology, which allows the scan to be performed in shorter times [44] and with low-dose radiotracer protocols [51, 122], while still achieving high diagnostic performance [119].

From the SPECT images, a 2-dimensional polar map is constructed to simplify the analysis. One example is shown in Figure 1.8. Analyses are currently performed visually by physicians based on the AHA standard 17-segment model of the polar maps, see Figure 1.9. The centre represents the cardiac apex, and the three surrounding rings consist of the apical, mid and basal areas, compare with Figure 1.7.
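The 17-segment model maps naturally onto a simple data structure: one score per segment and a mapping from segments to coronary territories. The Python sketch below assumes the commonly used default AHA assignment of segments to the LAD, RCA and LCx territories, which may differ in detail from the regions drawn in Figure 1.9, and sums the 4-point uptake scores (0-3, described in Section 2.2) over each territory. The names are hypothetical.

```python
# Commonly used default assignment of the AHA 17 segments to coronary
# territories (an assumption here; the regions in Figure 1.9 may differ).
TERRITORIES = {
    "LAD": (1, 2, 7, 8, 13, 14, 17),
    "RCA": (3, 4, 9, 10, 15),
    "LCx": (5, 6, 11, 12, 16),
}


def territory_scores(segment_scores):
    """Sum per-segment uptake scores (0-3) over each coronary territory.

    segment_scores: dict mapping segment number (1-17) to its score.
    """
    if set(segment_scores) != set(range(1, 18)):
        raise ValueError("expected scores for exactly segments 1-17")
    for segment, score in segment_scores.items():
        if score not in (0, 1, 2, 3):
            raise ValueError(f"invalid score {score} for segment {segment}")
    return {name: sum(segment_scores[s] for s in segments)
            for name, segments in TERRITORIES.items()}


# Reduced uptake in the basal and mid inferior wall (segments 4 and 10).
scores = {segment: 0 for segment in range(1, 18)}
scores[4], scores[10] = 3, 2
print(territory_scores(scores))  # {'LAD': 0, 'RCA': 5, 'LCx': 0}
```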

Analysing MPI studies is very time consuming, and the number of nuclear medicine specialists with experience in MPI is limited and steadily decreasing. At the same time, the number of MPI examinations increases each year, for example since there are new recommendations for healthcare to increase the use of medical imaging such as MPI [88]. Furthermore, the qualitative approach is subjective and suffers from inter-observer variability between nuclear medicine specialists [151].

Figure 1.8: The image shows part of a DICOM image, excluding e.g. patient information. It illustrates an example of constructed polar maps, with artificial colouring of the intensities, from a patient with obstructive CAD in the RCA.

Figure 1.9: The three coloured regions correspond to the regions used in this work: LAD, LCx and RCA. The division is done according to the AHA standard 17-segment model, with segments 1. basal anterior, 2. basal anteroseptal, 3. basal inferoseptal, 4. basal inferior, 5. basal inferolateral, 6. basal anterolateral, 7. mid anterior, 8. mid anteroseptal, 9. mid inferoseptal, 10. mid inferior, 11. mid inferolateral, 12. mid anterolateral, 13. apical anterior, 14. apical septal, 15. apical inferior, 16. apical lateral and 17. apex.

Apart from the major observer-dependent problems, the MPI technique itself is affected by artefacts, which can originate from several sources. First, it is important to carefully position the patient's heart when performing MPI with the CZT camera to avoid image artefacts [70]. Furthermore, slight movements of the patient during the imaging session, the patient's body habitus (especially in obesity [53]) and the inherent limits of the technique can all result in inconclusive studies. This prompts further studies and more radiation delivered to the patient, decreasing the quality of life for the individual and increasing the costs for the healthcare system. Artefacts can be caused by the breast or diaphragm, but also by left bundle branch block (LBBB). These artefacts can be a source of false positive results in CAD detection. LBBB causes delayed activation of the left ventricle, making it contract later than the right ventricle. It is normally detected using an electrocardiogram (ECG) [145].

1.5.4 Quantitative Coronary Angiography

Patients who are thought to have significant CAD based on the MPI study are further examined by means of invasive coronary angiography (ICA). The coronary angiogram is acquired using X-ray imaging and an injected contrast dye to visualise the vessels and possible blockages (Figure 1.10). To avoid inter-observer variability and achieve reproducibility, the degree of coronary artery stenosis can be evaluated by means of quantitative coronary angiography (QCA) [60, 146]. QCA has been used since the late 1980s. It measures the diameter of a coronary artery stenosis on the angiogram and expresses it as a percentage, see Figure 1.11. The edges of the artery are automatically or semi-automatically detected to provide quantitative estimates from the coronary angiograms.

Figure 1.10: An obstruction, see the arrow, confirmed in the ICA with a QCA value of 92%.


$$\text{Diameter stenosis (\%)} = \left(1 - \frac{B}{A}\right) \cdot 100$$

Figure 1.11: Illustration of how the diameter stenosis is determined using QCA, by measuring the minimal lumen diameter B compared with the reference diameter A.
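The diameter stenosis formula translates directly into code. The Python sketch below takes the minimal lumen diameter B and the reference diameter A, as defined in Figure 1.11, in any consistent unit; the function name and the input checks are illustrative assumptions.

```python
def diameter_stenosis(min_lumen_diameter, reference_diameter):
    """Per cent diameter stenosis: (1 - B / A) * 100."""
    if reference_diameter <= 0:
        raise ValueError("reference diameter must be positive")
    if not 0 <= min_lumen_diameter <= reference_diameter:
        raise ValueError("minimal lumen diameter must lie in [0, reference]")
    return (1 - min_lumen_diameter / reference_diameter) * 100


# Hypothetical measurements in mm: a 0.24 mm lumen in a 3.0 mm vessel.
print(round(diameter_stenosis(0.24, 3.0), 1))  # 92.0
```

An open vessel (B = A) gives 0% and a fully occluded one (B = 0) gives 100%, matching the convention that higher QCA values mean more severe obstruction.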

1.5.5 DL Applied to CAD

In the field of cardiovascular image analysis, DL has been applied to most examinations, including MPI [100]. An image analysis algorithm could reduce some of the problems listed above, such as subjectivity and time consumption [62, 151]. An automatic, or even semi-automatic, system could improve the healthcare system by reducing subjectivity and potentially extracting more information than the human eye can see, such as predicting QCA from MPI.


Chapter 2 Data

With the introduction of ML, in particular DL, the demand for and interest in large datasets have increased rapidly. Apart from the design of suitable neural networks, attention should be given to the construction and usage of the dataset for an algorithm to be successful. For example, a small or skewed dataset will be a limiting factor. Both the training and the evaluation of ML algorithms are often based on the same dataset, making it even more crucial that the dataset is handled correctly to avoid biases and incorrect conclusions. How this is typically done is described below. This is followed by two sections introducing the datasets used in this thesis for PCa grading and for CAD detection, respectively.

For ML algorithms, the dataset is typically split into three parts: training, validation and test. The training dataset is used to train the model, i.e. to tune the weights in the neural network. During training, the validation dataset is used to measure the performance and to compare different settings, such as network architectures or hyperparameters, in order to choose the optimal design. Finally, the test dataset is used to evaluate the performance on unseen data, to ensure reliable results.
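A three-way split can be sketched in a few lines of Python. One assumption is added that the text above does not state: samples are grouped by patient before splitting, so that biopsies from the same patient never end up in two different parts (several of the datasets described below contain multiple biopsies per patient). The fractions, seed and function name are hypothetical.

```python
import random


def split_by_patient(samples, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (sample_id, patient_id) pairs into train/validation/test parts.

    All samples from one patient end up in the same part. Patient ids must
    be sortable so that the shuffle is reproducible.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    patients = sorted({patient for _, patient in samples})
    random.Random(seed).shuffle(patients)  # reproducible shuffle
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    part_of = {}
    for i, patient in enumerate(patients):
        if i < n_train:
            part_of[patient] = "train"
        elif i < n_train + n_val:
            part_of[patient] = "validation"
        else:
            part_of[patient] = "test"  # the remaining patients
    splits = {"train": [], "validation": [], "test": []}
    for sample, patient in samples:
        splits[part_of[patient]].append(sample)
    return splits


# 20 hypothetical patients with three biopsies each.
samples = [(f"biopsy{i}", f"patient{i // 3}") for i in range(60)]
splits = split_by_patient(samples, seed=1)
print({part: len(ids) for part, ids in splits.items()})
# {'train': 42, 'validation': 9, 'test': 9}
```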

The test dataset should represent the expected future use cases as well as possible. For some medical applications it might be desirable to include samples from different hospitals or machines, while this is irrelevant for other applications. For example, when grading microscopy images of PCa, it is desirable to include images of different origins. For these images, there are inevitable variations from, e.g., different staining procedures and different microscopes at different hospitals, which also typically vary over time. To claim that an algorithm is useful at more than one hospital, it is therefore important that it is tested on images from multiple different hospitals, a requirement that is sometimes hard to fulfil due to the limited availability of datasets. However, for algorithms developed to be used with, e.g., images from a single specific machine, the requirements on the dataset could be lower.


Figure 2.1: The dataset is typically split into three parts: training to train the network, validation to choose e.g. network design and hyperparameters, and test to measure the performance on new data. Cross-validation can be used to utilise all data for both training and validation. The test dataset is sometimes omitted. The figure illustrates 3-fold cross-validation with an external test dataset.

A common problem is that the available datasets are too small, or smaller than desired. The test dataset is then sometimes omitted and only the validation dataset is used for evaluation. To reduce the risk of incorrect conclusions, the dataset can be divided into training and validation parts multiple times, so that all data is used for both training and validation; this is called cross-validation. The data is split into n parts, where n − 1 parts are used for training and the remaining one for validation of the result. This is repeated n times, using a different part for validation each time. An external test dataset is sometimes, but not always, used. The procedure is illustrated in Figure 2.1.
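The n-fold procedure can be sketched as a Python generator. This minimal illustration assumes the items have already been shuffled; folds are assigned round-robin, and the function name is hypothetical.

```python
def k_fold_splits(items, n):
    """Yield (training, validation) lists for n-fold cross-validation.

    Each item is used for validation exactly once and for training n - 1 times.
    """
    if not 2 <= n <= len(items):
        raise ValueError("need 2 <= n <= number of items")
    folds = [items[i::n] for i in range(n)]  # round-robin assignment to folds
    for k in range(n):
        validation = folds[k]
        training = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield training, validation


for training, validation in k_fold_splits(list(range(10)), 3):
    print(len(training), validation)
```

To keep all biopsies from one patient in the same fold, the list of patient identifiers can be passed as `items`, and each fold can then be expanded to that patient's slides.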

2.1 Prostate Cancer Dataset

Several different datasets have been used for the development and evaluation of the Gleason grading algorithms presented in this thesis. Each of them is introduced below, with details on their sizes, origins and annotations. All of them consist of H&E stained tissue, but with different types of annotations. The tissue referred to as benign consists of all tissue considered to be non-cancerous. Most of the datasets have been collected in parallel with the development of the methods in this thesis. As a result, the prerequisites for the algorithm development, as well as the level of detail of the annotations, have changed over time. The sizes of the datasets range from 213 small images, used for both training and validation, to more than 2000 whole slide images, used for testing only. The sizes of the different datasets used in the different papers are illustrated in Figure 2.2.

2.1.1 Dataset 0

The smallest dataset, hereafter referred to as Dataset 0, was used in Paper I, presented in Section 4.1. It has previously been used in [83, 97]. The dataset has two sources: PathXL in Belfast, and Beaumont Hospital in Dublin in conjunction with the Prostate Cancer Research Consortium. The tissue was scanned at 40X magnification on two different scanners: a Leica SCN400 Digital Slide Scanner and an Aperio Scanscope CS scanner. It was unknown which images originated from which of the two sources. There are some small differences in hue and saturation between the samples, see Figure 2.3, possibly due to their different origins. In total there are 213 images, cropped from whole slide images such that each image is considered to consist of only one Gleason grade or benign tissue. The number of images in each class was 52 benign, 52 G3, 52 G4 and 57 G5.

Figure 2.2: Number of (a) annotated regions and (b) available slides in some of the datasets, used in different papers. Dataset A refers to both the original and the extended version. Eval. 1 and 2 mean Evaluation Dataset 1 and 2, respectively. Draft refers to the study presented in Section 6.2.

Due to uncertainty in the annotations of these images, regarding whether the whole image or only the majority of the image consisted of tissue with one Gleason grade, the dataset was not used in the subsequent studies. Instead of having a second pathologist annotate these images, more data was annotated for the datasets presented in the next subsection.

Figure 2.3: Examples of images from Dataset 0 classified as Benign (top left), G3 (top right), G4 (bottom left) and G5 (bottom right).

2.1.2 Dataset A, B, C and D

For the remaining included papers dealing with PCa, data from Datasets A, B, C and D was used, in part or in full. The largest dataset, Dataset A, was used for training and validation in different partitions. The remaining three datasets were only used for testing, except in the study presented in Section 6.2. Details of the datasets are given below.

Dataset A originates from Skåne University Hospital in Malmö, Sweden, collected between 2014 and 2018. Pathology reports were available for all cases. The H&E stained slides had been used for standard pathological diagnostics and the clinical diagnosis was available for each patient. The slides were scanned on an Aperio CS2 scanner (Leica, Newcastle, UK) at a resolution of 0.247 µm/pixel (40X magnification).

Dataset B was collected at Helsingborgs lasarett, Sweden, but scanned in Malmö on the same scanner as Dataset A. The conditions for this dataset were very similar to those for Dataset A, with available pathology reports. The studies which used Dataset A and Dataset B were approved by the Regional Ethics Committee in Lund, Sweden (no. 2005/494 and no. 2018/11).

Dataset C was obtained from Linköping University Hospital, Sweden, and approved by the Regional Ethics Committee in Linköping, Sweden (no. 2013/195-31). It was scanned on an Aperio AT Turbo scanner (Leica, Newcastle, UK) at a resolution of 0.5 µm/pixel (20X magnification). No clinical information was available.

Dataset D originates from Erasmus University Medical Center in Rotterdam, the Netherlands. The slides were scanned on a Hamamatsu HT 2.0 scanner (Hamamatsu Photonics K.K, Tokyo, Japan) with a resolution of 0.228 µm/pixel (40X magnification). No clinical or patient information was available, and the slides were considered remnant material.

The digital images from all four datasets were uploaded to the Sectra IDS7 software (Sectra AB, Linköping, Sweden). Two senior consultant pathologists (P1 and P2, working at the same institute but in different departments at Skåne University Hospital in Malmö, Sweden) annotated the regions with Gleason patterns G3, G4 and G5, or benign tissue, within the IDS7 platform. Examples of annotations are given in Figure 2.4. No distinction was made between the different morphologies, such as the cribriform pattern, within the three Gleason patterns. Where Gleason patterns were intermixed, the most representative pattern was assigned. Consecutive biopsy slides, double stained immunohistochemically against p63 and AMACR [21, 126], as well as the original pathology reports, were available for most cases in Dataset A and Dataset B and could be used in the event of ambiguity during annotation.

Figure 2.4: Annotations of cancer areas drawn by pathologists in the Sectra IDS7 software. The pen marks next to the tissue in (b) were drawn by the pathologist setting the clinical diagnosis, to highlight the cancer region.
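Because the datasets were scanned at different resolutions (0.247 µm/pixel for Dataset A, 0.5 µm/pixel for Dataset C and 0.228 µm/pixel for Dataset D), one common approach is to resample images to a common pixel size before feeding them to a network. The Python sketch below only computes the resulting image size; the common target resolution is a hypothetical choice, and Dataset B is assumed to share Dataset A's resolution because it was scanned on the same scanner.

```python
# Pixel sizes in micrometres per pixel, as reported for each dataset.
PIXEL_SIZE_UM = {
    "A": 0.247,
    "B": 0.247,  # assumed: scanned on the same scanner as Dataset A
    "C": 0.5,
    "D": 0.228,
}


def resampled_size(width_px, height_px, source_um, target_um):
    """Size in pixels after resampling from source_um to target_um per pixel.

    The physical extent (pixels * um/pixel) is kept constant, so the
    scale factor is source_um / target_um.
    """
    factor = source_um / target_um
    return round(width_px * factor), round(height_px * factor)


# A 1000 x 2000 pixel region from each dataset, resampled to a
# hypothetical common resolution of 0.5 um/pixel (about 20X).
for name, pixel_size in PIXEL_SIZE_UM.items():
    print(name, resampled_size(1000, 2000, pixel_size, 0.5))
```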

Slides with folded and fragmented tissue were included in the study, as long as the sections were deemed of good enough quality for pathologists' routine use. The annotations performed by the two pathologists were pooled, and the total numbers of slides and annotated regions can be seen in Table 2.1. The numbers in the table correspond to the full datasets.

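Table 2.1: Number of annotated slides and regions in Datasets A, B, C and D.

| Dataset | Nbr of slides | Annotated regions, Benign | Annotated regions, G3 | Annotated regions, G4 | Annotated regions, G5 |
|---|---|---|---|---|---|
| A | 109 | 2535 | 726 | 849 | 357 |
| B | 55 | 117 | 48 | 52 | 9 |
| C | 16 | 54 | 9 | 189 | 17 |
| D | 50 | 29 | 304 | 209 | 8 |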


Figure 2.5: Examples of H&E stained tissue with different grades (left to right: Benign, G3, G4, G5) from the different datasets (top to bottom: Dataset A, B, C and D).

Note that in some of the publications only parts of the datasets were used. Example images from the different datasets can be seen in Figure 2.5.

2.1.3 Extended Dataset A

Dataset A was extended in Paper VII, under the same ethical permit, and the result is denoted Extended Dataset A. The extension consists of additional annotations drawn from scratch, but also of annotations made by correcting preliminary annotations from a developed AI algorithm. Details of the algorithm used are given in Section 6.1. These two parts were denoted “Train 1”, consisting of 476 annotated biopsy slides from 119 patients, and “Train 2”, an additional subset of 222 slides from a further 55 patients, see Table 2.2. The slide selection and annotation drawing were done in the same way as for the smaller version.

Table 2.2: Numbers of biopsy slides in Extended Dataset A (Train 1 and Train 2) and in Evaluation Dataset 1 (Test). The clinical diagnosis given to each biopsy was extracted from clinical records.

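| | Benign | 3+3 | 3+4 | 4+3 | 3+5 | 4+4 | 4+5 | 5+4 | 5+5 | Biopsies | Patients |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Train 1 | 221 | 99 | 40 | 33 | 8 | 19 | 37 | 10 | 8 | 476 | 119 |
| Train 2 | 146 | 34 | 16 | 10 | 1 | 9 | 6 | 0 | 0 | 222 | 55 |
| Total to train | 367 | 133 | 56 | 43 | 9 | 28 | 43 | 10 | 8 | 698 | 174 |
| Test | 13 | 9 | 6 | 3 | 3 | 0 | 3 | 0 | 0 | 37 | 21 |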


2.1.4 Evaluation Dataset 1

For testing, a separate cohort of 37 biopsy scans from 21 patients, denoted Evaluation Dataset 1 (“Test” in Table 2.2), with the same origin as Dataset A, was collected and used in Paper VII, presented in Section 6.1. The study using this cohort was covered by the same ethical permit as Dataset A and Dataset B. The slides were scanned with the Aperio CS2 (same as Dataset A), and 36 of the slides were also scanned 24 months later with a Hamamatsu S60 scanner (Hamamatsu Photonics K.K, Tokyo, Japan; Scanner 2) at a resolution of 0.220 µm/pixel.

With the same procedure and conditions as above, e.g. including slides with folded and fragmented tissue, the slides were annotated by the same pathologists (P1 and P2). This time, however, all slides were annotated by both pathologists. Furthermore, P1 repeated the annotations on the same samples after one year (the first evaluation was named P1-1 and the second P1-2). This allowed assessment of intra- and inter-observer differences. The GS given at the time of diagnosis for each case was obtained from electronic medical records.
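Intra- and inter-observer agreement on ordinal labels, such as grade groups assigned by P1-1, P1-2 and P2, is often quantified with Cohen's kappa, optionally with quadratic weights so that disagreements between distant grades count more. The Python sketch below is a generic implementation of that statistic for illustration; it is not necessarily the exact agreement measure used in Paper VII, and the function name and `weighting` parameter are hypothetical.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weighting=None):
    """Cohen's kappa between two raters; weighting is None or "quadratic".

    Computed as 1 - sum(w * observed) / sum(w * expected), where w is a
    disagreement weight that is 0 on the diagonal.
    """
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("need two equally long, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]  # marginals of rater A
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rater B

    def weight(i, j):
        if weighting == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    disagreement = sum(weight(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - disagreement / expected


# Hypothetical grade groups for six biopsies, graded twice.
first = [1, 2, 2, 3, 5, 4]
second = [1, 2, 3, 3, 4, 4]
print(round(cohen_kappa(first, second, [1, 2, 3, 4, 5], "quadratic"), 3))
```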

2.1.5 Evaluation Dataset 2

A second test cohort has been collected, with patients included in accordance with the Prostate Cancer Research International Active Surveillance (PRIAS) protocol [20, 147]. Active surveillance is becoming an increasingly accepted standard clinical approach to low-risk PCa. The patients included in this cohort all have a GS of 3+4 or less, with ≤ 10% cancer per biopsy and ≤ 2 biopsy cores with cancer. The patients are followed up regularly with, for example, PSA blood tests and new prostate biopsies. The latter are scheduled 1, 4, 7 and 10 years after diagnosis and thereafter every fifth year. Due to the composition of the cohort, most biopsies will only contain low-grade cancer or benign tissue.
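The cohort criteria and biopsy schedule stated above can be encoded directly. The Python sketch below covers only the criteria listed in this paragraph; the full PRIAS protocol contains further criteria (see [20, 147]), and the function names are hypothetical.

```python
def meets_stated_criteria(primary, secondary, max_cancer_percent, positive_cores):
    """Check the criteria listed above: GS 3+4 or less, at most 10% cancer
    in any biopsy and at most 2 biopsy cores with cancer."""
    low_grade = primary + secondary <= 6 or (primary, secondary) == (3, 4)
    return low_grade and max_cancer_percent <= 10 and positive_cores <= 2


def biopsy_years(horizon):
    """Years after diagnosis with scheduled biopsies, up to `horizon` years."""
    years = [y for y in (1, 4, 7, 10) if y <= horizon]
    year = 15  # thereafter every fifth year
    while year <= horizon:
        years.append(year)
        year += 5
    return years


print(meets_stated_criteria(3, 4, 8, 2))  # True
print(meets_stated_criteria(4, 3, 5, 1))  # False: 4+3 exceeds 3+4
print(biopsy_years(20))                   # [1, 4, 7, 10, 15, 20]
```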

The cohort originates from three hospitals in Sweden: Skåne University Hospital in Malmö and in Lund, as well as Centralsjukhuset Kristianstad. The clinical diagnosis was available for each patient, but no detailed annotations of cancer regions. The biopsies, collected between 2007 and 2018, were retrospectively scanned between 2018 and 2021 on an Aperio CS2 scanner (same as Dataset A). In total, data from 180 patients and more than 5000 biopsies were available. However, the slides from 2010 and earlier were faded and considered to be of too poor quality to include. Furthermore, slides from only 88 patients have been scanned and analysed so far. The dataset used for testing consisted of 2263 slides, of which 260 contained cancer. The cancer length was reported for 255 of the cancer slides, of which 108 had ≤ 1 mm cancer and 149 had ≤ 2 mm cancer. The average cancer length was 2.9 mm. The dataset is denoted Evaluation Dataset 2 in this thesis and is used in Section 6.2. Ethical approval was provided by the Regional Ethical Committee in Lund (no. 2008/708).


2.2 Coronary Artery Disease Dataset

One dataset has been used for the development and evaluation of automatic detection of CAD in this thesis. However, only part of the dataset was available for Paper VIII, and only part of the dataset had QCA values, which were used in Paper X. The dataset as a whole is described in this section. The studies were approved by the Regional Ethics Review Board in Region Östergötland, Sweden (approval number 2019/00097).

Adult subjects referred to MPI at Linköping University Hospital between 1 June 2014 and 30 October 2019 (n=3658) were considered. The referral for stress testing was at the clinical discretion of the referring cardiologist. In total, 759 patients were selected from the database, with MPI studies conducted in a dedicated CZT cardio camera [52, 119] (D-SPECT, Spectrum Dynamics). The patients met the following requirements: they had either undergone ICA at most 6 months after MPI, or had a low or very low pre-test probability of ischaemia according to the AHA. For all patients, information about CAD in the RCA, LAD and LCx was available, as well as information about artefacts in the MPI images from breast, diaphragm or LBBB. Details can be seen in Table 2.3. For 275 patients, the QCA value was also available. Details of this subset are given in Paper X, Section 7.4.
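The selection rule above can be sketched as a small predicate. In this Python sketch, "at most 6 months after MPI" is interpreted as up to the same calendar day six months later (clamped to the end of the month), the AHA category is passed in as a string, and the study period is checked against the MPI date; all of these are interpretive assumptions, and the function names are hypothetical.

```python
import calendar
from datetime import date

STUDY_START, STUDY_END = date(2014, 6, 1), date(2019, 10, 30)


def add_months(day, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = day.month - 1 + months
    year, month = day.year + month_index // 12, month_index % 12 + 1
    return date(year, month, min(day.day, calendar.monthrange(year, month)[1]))


def is_included(mpi_date, ica_date, aha_pretest):
    """MPI within the study period, and either ICA at most 6 months after
    MPI or a low/very low AHA pre-test probability."""
    if not STUDY_START <= mpi_date <= STUDY_END:
        return False
    timely_ica = (ica_date is not None
                  and mpi_date <= ica_date <= add_months(mpi_date, 6))
    return timely_ica or aha_pretest in ("low", "very low")


print(is_included(date(2016, 3, 10), date(2016, 7, 1), "high"))  # True
print(is_included(date(2016, 3, 10), None, "intermediate"))      # False
```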

MPI was performed according to the European Association of Nuclear Medicine guidelines [76]. After a stress test, MPI was done on the CZT camera in both upright and supine positions with at least 1 million myocardial counts. All patients performed either a physical bicycle ergometer stress test, a pharmacological stress test with regadenoson, or a combination of both, at the discretion of the nuclear medicine physician. All subjects received prior routine instructions, sent to their home, to avoid potential regadenoson antagonists for at least 24 h before MPI (e.g., coffee, tea, cola drinks, chocolate and cacao). In the case of a pharmacological stress test, 400 µg (5 ml) of regadenoson was administered intravenously. In the case of a combined protocol, a bicycle ergometer stress test at 30-50 watts was used, with regadenoson given after 2 minutes of cycling.

Left ventricular myocardial contours were computed using the standard Cedars-Sinai Medical Center Quantitative Perfusion SPECT (QPS) software, version 2012. The contours were defined by a technologist with more than 15 years of experience in nuclear cardiology who was blinded to angiographic and clinical findings. When needed, the technologist corrected the gross initial left ventricular localisation, the left ventricular mask, and the valve plane position. As a result, the reconstructed polar maps from the upright and supine positions are registered to each other, apart from some minor noise, and the three arteries are depicted at the same position in each image.

Table 2.3: The number of cases with artefacts and with CAD in the different arteries. Note that each case can have a positive label for multiple arteries and artefacts.

                   CAD                 Artefacts
Total   Normal     LAD   RCA   LCx     LBBB   Breast   Diaphragm
759     387        154   141   123     66     55       80
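Because a case can be positive for several arteries and artefacts at once, the labels behind Table 2.3 are naturally encoded as a multi-hot vector rather than a single class. The sketch below assumes that the six non-Normal columns are the targets and that a case counted as Normal has zeros in the three artery positions; the names `LABELS` and `multi_hot` are hypothetical.

```python
# Label order for a hypothetical multi-label target vector.
LABELS = ["LAD", "RCA", "LCx", "LBBB", "Breast", "Diaphragm"]

def multi_hot(positive: set) -> list:
    """Encode the set of positive labels of one case as a 0/1 vector."""
    unknown = positive - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in positive else 0 for name in LABELS]

# Counts from Table 2.3.
TOTAL, NORMAL = 759, 387
COUNTS = {"LAD": 154, "RCA": 141, "LCx": 123,
          "LBBB": 66, "Breast": 55, "Diaphragm": 80}

prevalence = {name: n / TOTAL for name, n in COUNTS.items()}
artery_positives = COUNTS["LAD"] + COUNTS["RCA"] + COUNTS["LCx"]  # 418
non_normal = TOTAL - NORMAL                                        # 372
```

For example, `multi_hot({"LAD", "Breast"})` gives `[1, 0, 0, 0, 1, 0]`. Under the stated assumption, the counts also act as a consistency check: the three artery columns sum to 418 positives while only 759 - 387 = 372 cases are non-normal, so some patients must have CAD in more than one artery.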

MPI was routinely assessed visually by nuclear medicine consultants, each with at least 10 years of experience in MPI, using the 17-segment model of the left ventricle (see Figure 1.9 on page 13) and a conventional 4-point grading system: 0 (normal uptake), 1 (equivocal uptake), 2 (moderate reduction of uptake) and 3 (severe reduction of uptake), according to current guidelines [113, 139]. All MPI images were retrospectively re-evaluated by an experienced nuclear medicine physician. Only segments with an uptake score ≥ 2 at stress were considered to have a definite uptake reduction. Of these, (i) segments with reversible defects (ischaemic defects) were defined as those with at least a 1-point decrease in uptake score on the rest acquisition, and (ii) the remaining segments were considered to have a fixed defect (myocardial infarction segments), except for those with definitely normal contractility on gated SPECT and a final diagnosis of attenuation artefact. Segments with attenuation artefact were excluded from the final definition of the stress defect area. The ground truth for LBBB was obtained from ECG.
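The per-segment rules above form a small decision procedure. The following is a minimal sketch of it, assuming each of the 17 segments comes with a stress score, an optional rest score (rest imaging was not always acquired), a flag for definitely normal contractility on gated SPECT and a flag for a final diagnosis of attenuation artefact; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import List, Optional

class SegmentClass(Enum):
    NO_DEFINITE_DEFECT = "no definite defect"
    ISCHAEMIC = "reversible (ischaemic) defect"
    INFARCT = "fixed defect (myocardial infarction)"
    ATTENUATION = "attenuation artefact"

def classify_segment(stress_score: int, rest_score: Optional[int],
                     normal_contractility: bool,
                     final_dx_attenuation: bool) -> SegmentClass:
    for s in (stress_score, rest_score):
        if s is not None and not 0 <= s <= 3:
            raise ValueError("uptake scores use the 0-3 scale")
    # Only a stress score >= 2 counts as a definite uptake reduction.
    if stress_score < 2:
        return SegmentClass.NO_DEFINITE_DEFECT
    # (i) Reversible: uptake score decreases by at least 1 point at rest.
    if rest_score is not None and stress_score - rest_score >= 1:
        return SegmentClass.ISCHAEMIC
    # (ii) Otherwise fixed, unless gated SPECT shows normal contractility
    # and the final diagnosis was attenuation artefact.
    if normal_contractility and final_dx_attenuation:
        return SegmentClass.ATTENUATION
    return SegmentClass.INFARCT

def stress_defect_segments(classes: List[SegmentClass]) -> List[int]:
    """1-based segment numbers in the stress defect area (artefacts excluded)."""
    return [i + 1 for i, c in enumerate(classes)
            if c in (SegmentClass.ISCHAEMIC, SegmentClass.INFARCT)]
```

The order of the checks mirrors the text: the reversibility test is applied first, and the attenuation exception only applies to segments that would otherwise be called fixed defects.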

The ground truth for pathological MPI images was obtained by means of ICA, conducted routinely according to standard techniques. Coronary angiograms were analysed by a blinded, experienced observer. Per cent lumen area reductions due to intracoronary atheromatous plaques were first determined visually on end-diastolic frames; for stenoses that an experienced angiographer physician visually assessed to be around the 50% threshold, quantitative angiography software (General Electric Advantage Workstation, Cardiac X-Ray Applications, Stenosis Analysis v1.6) was also used. Where applicable, two separate measurements in orthogonal views of the same stenotic segment were obtained and averaged to give an approximate measure of the per cent vessel area stenosis. Any stenosis ≥ 50% was considered significant and regarded as a positive QCA test. Total coronary vessel occlusions were marked as 100% lumen area stenosis. A marginally patent vessel (with other than normal flow) with no visible stenotic lumen on angiography was also regarded as a total occlusion. For patients with “normal” MPI images, a follow-up period of at least 6 months was carried out.
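The stenosis rules translate directly into a per-artery label. This sketch assumes the per cent area stenosis is already measured for each lesion; the function names and the representation of occlusions as boolean flags are hypothetical, while the averaging of orthogonal views, the 100% convention for occlusions and the 50% threshold follow the text.

```python
from typing import Sequence

def averaged_stenosis(views: Sequence[float]) -> float:
    """Mean per cent area stenosis over one or two (orthogonal) views of a lesion."""
    if not views:
        raise ValueError("need at least one measurement")
    return sum(views) / len(views)

def lesion_stenosis(views: Sequence[float], total_occlusion: bool = False,
                    no_visible_lumen: bool = False) -> float:
    # Total occlusions, and marginally patent vessels with no visible lumen,
    # are both recorded as 100% lumen area stenosis.
    if total_occlusion or no_visible_lumen:
        return 100.0
    return averaged_stenosis(views)

def artery_positive(lesion_percents: Sequence[float], threshold: float = 50.0) -> int:
    """1 if any lesion in the artery reaches the significance threshold, else 0."""
    return int(any(p >= threshold for p in lesion_percents))

# Two orthogonal views of one LAD lesion, plus a mild second lesion.
lad = [lesion_stenosis([48.0, 54.0]), lesion_stenosis([30.0])]
# lad == [51.0, 30.0]; artery_positive(lad) == 1
```

Note that averaging can move a lesion across the threshold: neither view alone is decisive here (48% and 54%), but the averaged 51% makes the artery positive.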

All patients had stress data acquired in the upright and supine positions. Some patients also had rest images, but not all of them, mainly because no rest examination was carried out on patients who were considered healthy in the stress examination. To prevent missing data from affecting the algorithm, the rest images were not used. The polar maps were cropped at full resolution, which resulted in images of either 296 × 296 pixels or 336 × 336 pixels. Since the majority of the crops had the smaller size, all were resized to 296 × 296 pixels. The polar maps are usually shown to medical doctors in artificial colour. However, since the colouring does not add any information beyond the greyscale intensity image, the original greyscale data were used with the developed DL systems instead. A conversion between colour and greyscale was developed
