
DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

Image Distance Learning for Probabilistic Dose–Volume Histogram and Spatial Dose

Prediction in Radiation Therapy Treatment Planning

IVAR ERIKSSON



Degree Projects in Mathematical Statistics (30 ECTS credits)
Degree Programme in Applied and Computational Mathematics


TRITA-SCI-GRU 2020:061 MAT-E 2020:024

Royal Institute of Technology
School of Engineering Sciences (KTH SCI)
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci


Abstract

Construction of radiotherapy treatments for cancer is a laborious and time-consuming task. At the same time, when presented with a treatment plan, an oncologist can quickly judge whether or not it is suitable. This means that the problem of constructing these treatment plans is well suited for automation.

This thesis investigates a novel way of automatic treatment planning.

The treatment planning system this pipeline is constructed for provides dose mimicking functionality that takes probability density functions of dose–volume histograms (DVHs) and spatial dose as inputs; these distributions are therefore the output of the pipeline. The inputs are scans, segmentations and spatial doses of historically treated patients.

The approach involves three modules which are individually replaceable with little to no impact on the remaining two modules. The modules are:

an autoencoder as a feature extractor to concretise important features of a patient segmentation, a distance optimisation step to learn a distance in the previously constructed feature space and, finally, a probabilistic spatial dose estimation module using sparse pseudo-input Gaussian processes trained on voxel features.

Although performance evaluation in terms of clinical plan quality was beyond the scope of this thesis, numerical results show that the proposed pipeline is successful in capturing salient features of patient geometry as well as predicting reasonable probability distributions for DVH and spatial dose.

Its loosely connected nature also gives hope that some parts of the pipeline can be utilised in future work.

Keywords—radiation therapy, automated planning, machine learning, autoencoder, distance optimisation, sparse pseudo-input Gaussian process, kernel density estimation, dose mimicking, dose–volume histogram


Sammanfattning

Image distance learning for probabilistic dose–volume histogram and dose prediction in radiation therapy

Creating radiation therapy plans for cancer is a time-consuming task. At the same time, an oncologist can quickly decide whether or not a given plan is acceptable. This means that the task of creating treatment plans is well suited for automation.

This thesis investigates a new method for automatically generating radiation therapy plans. The planning system this method has been developed for contains dose mimicking functionality that accepts probability distributions of dose–volume histograms (DVHs) and dose as input; these distributions are therefore the output of the constructed method.

The method is built up of three parts that are individually replaceable with little or no impact on the remaining parts. The parts are: an autoencoder that constructs a feature vector from a patient segmentation, a distance optimisation that learns a distance in the previously constructed feature space and, finally, an estimation of probability distributions using Gaussian processes trained on voxel features.

Although evaluating performance in terms of clinical plan quality was beyond the scope of this project, positive results were achieved: the estimated probability distributions exhibit good characteristics for both DVHs and doses. The loosely coupled structure of the method furthermore makes it possible for parts of the project to be used in future work.

Keywords—radiation therapy, automated treatment planning, machine learning, autoencoder, distance optimisation, sparse Gaussian processes, probability density estimation, dose mimicking, dose–volume histogram


Acknowledgements

First, I wish to thank my supervisor at RaySearch Laboratories, Tianfang Zhang, for excellent guidance in completing this thesis. I would like to give him praise for a masterful introduction to the vast world of radiation therapy and automatic dose planning.

Thanks also to Albin Fredriksson for enlightening discussions and Jimmy Olsson, my supervisor at KTH, for his guidance. Furthermore, I thank Björn Andersson for technical assistance and, lastly, I say thank you to Hanna Gruselius and Viktor Nilsson for talks concerning autoencoders.

Stockholm, May 19, 2020
Ivar Cashin Eriksson


Contents

1 Introduction
  1.1 Radiation Therapy
    1.1.1 Treatment Planning System
  1.2 About this Paper
    1.2.1 Problem Formulation
    1.2.2 Structure
    1.2.3 Data
    1.2.4 Previous Work
    1.2.5 Goals and Limitations

2 Background
  2.1 Problem Setup
  2.2 Dose Statistics
    2.2.1 Volume-at-Dose and Dose-at-Volume
    2.2.2 Dose–Volume Histogram
      2.2.2.1 DVH Distances
  2.3 Dose Mimicking
    2.3.1 Traditional Approach

3 Pipeline
  3.1 Stochasticity of d
  3.2 Dose Mimicking Revisited
  3.3 Gaussian Mixture
  3.4 Pipeline Architecture
  3.5 AEs as a Feature Extraction Method
    3.5.1 Neural Networks
      3.5.1.1 Convolutional Layers
      3.5.1.2 Transposed Convolutional Layers
      3.5.1.3 Leaky Rectified Linear Unit
      3.5.1.4 Binary Cross Entropy
    3.5.2 Autoencoder
  3.6 Distance Optimisation
    3.6.1 Distance
      Probability Distribution of y
      Weights
      Cholesky Decomposition
      3.6.1.2 Gradient
  3.7 GP for Spatial Dose Estimation
    3.7.1 Kernel Density Estimation vs. GP Spatial Dose Estimation
    3.7.2 Stochastic Processes
    3.7.3 Gaussian Processes
      3.7.3.1 GP Regression
      3.7.3.2 Choice of Covariance Function
      3.7.3.3 Voxel Features
      3.7.3.4 Difficulties of the GP Approach
    3.7.4 Massively Scalable GP

4 Results
  4.1 Data
    4.1.1 Preprocessing
  4.2 Autoencoder
    4.2.1 Network Architecture
      4.2.1.1 Encoder
      4.2.1.2 Decoder
    4.2.2 Training
    4.2.3 Results
  4.3 Distance Optimisation
    4.3.1 Parameter Choices
    4.3.2 Results
  4.4 Gaussian Process
    4.4.1 Selecting Voxel Features
    4.4.2 Further Decreasing the Problem Size
    4.4.3 Results

5 Conclusion
  5.1 Comments on Results
  5.2 Further Work

Bibliography

A Calculations
  A.1 Gradient Calculations

B Figures
  B.1 Data Example
  B.2 Pipeline
  B.3 Methodology and Results

C Animations


Chapter 1

Introduction

Over 18 million people were diagnosed with cancer in 2018 alone, and the yearly number of cancer deaths amounts to more than half of that figure. Some types of cancer have become increasingly treatable; still, some remain highly lethal. The variation in recovery rates between cancer types often depends on the types of treatments applicable to different cancers. There are many different approaches to treating cancer, such as surgery, chemotherapy or radiation therapy, and major strides have been taken in increasing the efficiency of these treatments [1, 2].

In radiation therapy these include improvements in patient immobilisation techniques, beam shaping technologies, patient modelling, among others. In the computer era, with the introduction of advanced treatment planning systems, radiation therapy has substantially increased in complexity [3]. With further advancement of radiation therapy and its delivery devices, more degrees of freedom are going to be introduced. This brings with it the potential for more customised treatment plans at the cost of increasing the complexity of constructing a treatment. Today, no radiation therapy plan is produced without the aid of treatment planning software.

Whilst the more complex delivery devices necessitate the use of highly specialised software to construct treatments, they also lend themselves well to applications of mathematical optimisation. This opens up the possibility of fully automated treatment planning in the future.

1.1 Radiation Therapy

Radiation therapy is a technique used to treat cancer. Ionising radiation is used to damage the DNA of malignant cancer cells, hindering their ability to multiply. If successful, this can bring the tumour into a state of remission in which it will shrink and potentially rid the patient completely of the cancerous cells. Radiation therapy can also be used to provide palliative relief or to enable other treatments such as surgery or chemotherapy [4].

All forms of radiation therapy make use of ionising radiation to break down the DNA of the cancer but vary in their delivery methods. One often distinguishes between internal (brachytherapy) and external treatments. In brachytherapy, small radioactive sources are placed inside the patient's body close to the cancer cells in order to localise the radiation dose. When administering radiation externally, an accelerator is instead used to generate directed ionising radiation. The accelerator is placed on a gantry which can be repositioned, often while active, to deliver a highly customisable dose. Different types of radiation can be used, such as protons or electrons, but most commonly high-frequency photons deliver the dose.

The radiation will enter the body and damage the DNA of the patient's cells, cancerous or otherwise, making the efficient delivery of a correct dose vital. It is important both that the cancer receives the specified dose and that risk organs receive as small a dose as possible. Without both parts in place, the patient runs the risk of either not being rid of the cancer or suffering severe complications after the treatment.

The total treatment is often divided into several visits during which a small part of the total dose is administered. This is because healthy tissue has a higher ability than cancerous tissue to regenerate and repair its DNA, so dividing up the total dose allows the healthy tissue to repair itself between treatments, minimising the risk of future complications.

Because of the many factors that need to be taken into consideration and poor correspondence between clinical goals and optimisation objectives, a treatment planner can spend many hours with a treatment planning system designing a treatment.

Figure 1.1. A prostate patient treated with RayStation. In the background (to the left) the plan's dose–volume histograms (see Section 2.2.2) and in the foreground (to the right) a cross section of the CT scan and calculated dose [5].


1.1.1 Treatment Planning System

To create a deliverable and acceptable dose, many parameters need to be decided. Many different delivery devices exist with varying functionality, ranging from dynamic beam shaping, beam intensity modulation and multiple or continuously varying beams to adjustable gantry angles. In turn, these parameters have restrictions on them, for example maximum allowed angles or maximum achievable speeds or accelerations. All in all, it is highly impractical to set these machine parameters manually in a way that achieves a clinically acceptable dose. Instead, a treatment planner focuses on achieving an acceptable dose through a treatment planning system (TPS) which in turn calculates deliverable machine parameters. One such TPS is RayStation, developed by RaySearch Laboratories AB, Stockholm, Sweden; see Figure 1.1.

The TPS will, from clinical parameters, select optimised machine parameters well suited for the selected delivery device. The first step of creating a treatment is obtaining a patient scan. This can be done via a computerised tomography (CT) scan or similar. The CT scan is divided into slices which in turn are rasterised into volume pixels (voxels). The voxels together make up the entire patient geometry. The scan then needs to be segmented to specify which voxels belong to certain regions of interest (ROIs). The ROIs can consist of organs at risk (OARs) such as the left and right femurs, the bowel or the bladder, the planning target volume (PTV), the tumour, or external tissue. Many of these may intersect; in particular, if the tumour is growing on an OAR, the PTV and that OAR can overlap almost completely. This segmentation can be done manually by an oncologist or automatically via machine learning methods built into some TPSs.

When the patient is segmented, the oncologist can begin work on designing clinical goals for the treatment. When the oncologist is satisfied with the clinical goals set up, a treatment planner will attempt to construct a dose fulfilling the clinical goals as well as possible. The treatment planner will input the ideal dose to be delivered to each ROI and a penalty for deviating from this dose.

Often the clinical goals provided by the oncologist are conflicting in the sense that a tumour located inside of a patient body cannot be irradiated without also irradiating surrounding tissue. Therefore, the treatment planner will select weights to specify which clinical goals are to be considered more important.

The treatment is then found via a TPS minimising quadratic loss functions based on what the treatment planner has specified. The dose delivered to organs with low weightings is sacrificed first to attain deliverable machine parameters.

Unfortunately, the loss functions are not necessarily well adapted for conveying the “idea” behind the clinical goals. This means that the solution arrived at might not correspond well to the wanted treatment. The reason is that delivery devices make use of many different methods to shape the dose. These may include varying the beam shape by inserting metal leaves in front of the beam with a multi-leaf collimator (MLC) (see Figure 1.2), irradiating the tumour from different gantry angles and adapting the movement speed of any components of the machine. Altogether this makes for a high-dimensional optimisation problem which is not easy to get an intuitive grasp of.

The high complexity of the problem may result in the optimal solution to the clinical goals not being as intended, forcing the treatment planner to tweak their problem specification. The TPS then calculates a new set of machine parameters, which still might not fully satisfy the oncologist, and in this manner the treatment planner iterates.

Figure 1.2. A 3D-CRT beam set in RayStation. The beam shaping MLC can be seen in brown [5].

1.2 About this Paper

This paper concerns streamlining the process of treatment planning and developing an automated dose prediction pipeline. Previous attempts have been made to automate this process, and there are systems in place that allow the treating oncologist to select between automatically generated plans or offload the task to a machine learning algorithm. As an oncologist can quickly decide whether a plan presented to them is adequate, whilst creating the plan can be rather tedious, the problem lends itself well to automation.

RayStation provides functionality to perform what is known as dose mimicking. Dose mimicking will, from a non-deliverable dose, calculate the deliverable machine parameters that best reproduce the given dose. Many previous attempts have made use of this functionality to construct automatic treatments. They have, however, had problems with a loose basis in mathematically rigorous theory and long training times.

We will attempt to further the work done in this area by materialising a novel approach to automatic dose prediction. Our approach will construct a probability distribution over DVHs and spatial dose to be fed to the dose mimicking machinery. We will also attempt to improve on previous work by making use of ideas that are easier to motivate mathematically. Finally, the automatic treatment pipeline constructed will require a significantly shorter training time and training will be well suited for parallelisation.

1.2.1 Problem Formulation

The problems posed in this paper can be formulated as follows:

Is it possible to, from given historical data (patient scans and delivered dose), create a probabilistic dose prediction pipeline to streamline the workflow of oncologists? In particular, is it possible to create a distance between patient segmentations to be used in future automated cancer treatment?

1.2.2 Structure

This thesis will first, in Chapter 2, explain the basic concepts of radiation therapy needed to understand the construction of the pipeline.

Later, in Chapter 3, we will explore the pipeline architecture and the specifics of its three main modules. These are: an autoencoder used to extract valuable features from patient scans, a method for learning a distance in the resulting feature space¹ and, finally, a probability density estimator obtained by training sparse pseudo-input Gaussian processes to predict a probability distribution of the spatial dose.

Finally, in Chapters 4 and 5, the implementation details will be covered in brief and the results will be presented and discussed.

The appendix contains some auxiliary calculations along with a flowchart of the pipeline architecture and the entire pipeline applied to an example test patient.

1.2.3 Data

The data used for this project is provided by Iridium Kankernetwerk in Antwerp, Belgium [6,7]. The data is made up of 94 prostate patient cases all with a clinical prescription dose of 7000 cGy in the PTV. Each patient has an accompanying CT scan, a segmentation split into 11 ROIs and the final dose delivered to the patient. We will be splitting this data into 80 training patients and 14 patients for testing.

¹The distance learning also leads to a DVH prediction method.


1.2.4 Previous Work

The task of automatic dose prediction has been attempted many times before using various methods, see [8,9,10,11,12]. Some previous approaches have also made use of autoencoders to construct feature spaces as a way to analyse patient segmentations. They have also been used to investigate the possibility of distances in feature space as a proxy for a direct patient distance [13].

Where this thesis will vary is in its approach to finding a suitable distance between patients via minimisation of deviations in DVH criteria. It also provides a way of constructing a fully automatic treatment pipeline producing probabilistic instead of single predictions for DVHs and doses. Finally, the structure of the pipeline, as made up of three individually replaceable modules, will make it well suited as a foundation for future work.

1.2.5 Goals and Limitations

The author of this thesis realises that the task at hand is indeed not a simple one. Instead of hoping to be able to revolutionise radiation therapy overnight, the main focus of the paper will be to explore this novel method of building an automatic treatment pipeline².

The goal is that if not all of the pipeline, then parts of it will produce positive results. There is also hope that the loosely connected nature of the components of this approach means that some modules of it can be improved upon independently or replaced in the future. Therefore this work acts as a good stepping stone towards creating a complete automatic dose prediction pipeline.

²A brief summary of the entire pipeline architecture can be found in Figures B.2–B.7; see Chapter 3 for the full description.


Chapter 2

Background

This chapter is designed to guide the reader through radiation therapy specifics that are not commonly known but are required to understand the following chapters of this thesis. We begin by explaining the traditional problem setup of radiation therapy, dose statistics and how to compare them. Then, finally, we will account for the traditional method of dose mimicking.

2.1 Problem Setup

The problem of treatment planning is often handled in the following manner.

A patient with a known or suspected cancer tumour will get a CT scan of the problem area. From the scan, an oncologist will decide how best to irradiate any present tumour so as to spare surrounding tissue whilst simultaneously treating the cancer. This is formulated as a set of “clinical goals”. When the clinical goals are set up, the treatment planner will, with the help of a TPS, translate them into machine parameters, and the patient can be treated.

To be able to both set up the clinical goals and to calculate the optimal machine parameters, we first need to discretise the patient volume. This is done by dividing the patient scan into a set of 3-dimensional pixels, or volume pixels (voxels), each with an index i ∈ P , where P is the full patient volume.

Next, the dose can be discretised according to the same voxels. Each voxel, i, is assumed to be given a uniform dose $d_i$. We will call both the set of voxel doses, $\{d_i\}_{i \in P}$, and the true continuous dose the “dose” and use d to refer to them interchangeably. The distinction is often not necessary, as there exist discrete and continuous analogues of all equations in this and the following chapters.

When the patient is discretised, an oncologist will segment the patient volume into ROIs. Each ROI consists of a set of indices, R ⊆ P, of voxels.¹ The treating oncologist can then set up clinical goals for each ROI.

¹Usually R also carries information about each voxel's relative volume; our data does not.


The clinical goals for each ROI are the targets for how that particular ROI is to be irradiated. After the clinical goals are set up, a target dose, $d_R$, and a loss function, $L_R$, penalising deviations from it are specified. Often something similar to the loss

$$L_R(d, d_R) = \sum_{i \in R} l_R(d_i, d_R) \qquad (2.1)$$

is chosen, where $l_R$ is given by

$$l_R(x, x') = \begin{cases} (x - x')^2 \cdot \mathbf{1}_{\{x > x'\}}, & \text{if } R \text{ is an OAR}, \\ (x - x')^2, & \text{otherwise.} \end{cases} \qquad (2.2)$$

Often dR is non-zero even for OARs to specify losses that better correspond to deliverable treatments. When this is done, each ROI is given a weight, wR, to specify the importance that the corresponding loss be kept small.

After each ROI is given a weight and a loss function, the total loss function, $L_{\text{total}}$, is constructed as a weighted sum of the individual ROI losses, see (2.3) below. The TPS can then compute the machine parameters that minimise $L_{\text{total}}$.

$$L_{\text{total}}(d) = \sum_{R \subseteq P} w_R L_R(d, d_R) \qquad (2.3)$$

With the computed dose and its discretisation, we can calculate dose statistics for each of the relevant ROIs. The dose statistics are used by the oncologist to evaluate the dose.
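To make the optimisation objective concrete, the following is a minimal numpy sketch of how the losses (2.1)–(2.3) could be evaluated for a given voxel dose vector. The array layout, the `rois` dictionary and all function names are illustrative assumptions and are not taken from the thesis or from any TPS.

```python
import numpy as np

def roi_loss(d, roi_idx, d_ref, is_oar):
    """Quadratic ROI loss, eqs. (2.1)-(2.2): one-sided for OARs, two-sided otherwise."""
    diff = d[roi_idx] - d_ref
    if is_oar:
        diff = np.maximum(diff, 0.0)  # only overdosage is penalised in an OAR
    return np.sum(diff ** 2)

def total_loss(d, rois):
    """Weighted sum of ROI losses, eq. (2.3). `rois` is a list of dicts (illustrative layout)."""
    return sum(r["weight"] * roi_loss(d, r["idx"], r["d_ref"], r["is_oar"]) for r in rois)

# toy 6-voxel patient: a 3-voxel PTV and a 3-voxel OAR
d = np.array([6900.0, 7050.0, 7010.0, 1200.0, 300.0, 150.0])
rois = [
    {"idx": np.array([0, 1, 2]), "d_ref": 7000.0, "is_oar": False, "weight": 1.0},
    {"idx": np.array([3, 4, 5]), "d_ref": 1000.0, "is_oar": True, "weight": 0.2},
]
print(total_loss(d, rois))
```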

2.2 Dose Statistics

Dose statistics are used to more easily quantify certain characteristics of a dose. They can be adapted to compare different doses and, by extension, to measure the “goodness” of a dose. This section is mainly a reiteration of the corresponding sections of [4]; see that paper for a more in-depth look at some commonly used dose statistics.

2.2.1 Volume-at-Dose and Dose-at-Volume

The volume-at-dose (VaD) and dose-at-volume (DaV) are two dose statistics that can be used to derive many more dose statistics. We shall present their definitions here as they will be useful later.

First we define the random variable $D_R$ as uniformly distributed over the set $\{d_i\}_{i \in R}$. With this definition, we can easily define $\mathrm{VaD}_x(d)$ in ROI $R$ w.r.t. $x \in \mathbb{R}$; this is the complementary cumulative distribution function (cdf) of $D_R$,

$$\mathrm{VaD}_x(d) = 1 - P(D_R \le x) = P(D_R > x) = \frac{|\{d_i \mid d_i > x,\ i \in R\}|}{|R|}, \qquad (2.4)$$


where $|A|$ denotes the cardinality of a set $A$. The $\mathrm{DaV}_v(d)$ in a ROI $R$ w.r.t. $v \in [0, 1]$ is the generalised $(1 - v)$-quantile function of $D_R$ [14],

$$\mathrm{DaV}_v(d) = \inf\{x \in \mathbb{R} \mid \mathrm{VaD}_x(d) \le v\}. \qquad (2.5)$$

Note that $\mathrm{VaD}_x(d)$ is a decreasing, though not necessarily strictly decreasing, function of $x$. Therefore, the infimum in the definition of $\mathrm{DaV}_v(d)$ is required to guarantee uniqueness.

Also note that for a strictly decreasing $\mathrm{VaD}_x(d)$, $\mathrm{DaV}_v(d)$ is its inverse, and for a not strictly decreasing $\mathrm{VaD}_x(d)$, $\mathrm{DaV}_v(d)$ is a possible generalisation of the inverse.
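As a minimal illustration of (2.4) and (2.5), the sketch below computes VaD and DaV for a small array of voxel doses. Restricting the infimum search to the voxel dose values is a simplification that suffices for the empirical distribution of $D_R$; the function names are our own.

```python
import numpy as np

def vad(d_roi, x):
    """Volume-at-dose, eq. (2.4): fraction of ROI voxels receiving a dose above x."""
    return np.mean(d_roi > x)

def dav(d_roi, v):
    """Dose-at-volume, eq. (2.5). Since VaD only changes at voxel dose values,
    searching those values is enough to realise the infimum."""
    for x in np.sort(d_roi):
        if vad(d_roi, x) <= v:
            return x
    return np.max(d_roi)

d_roi = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
print(vad(d_roi, 250.0))  # 0.6: three of five voxels exceed 250
print(dav(d_roi, 0.5))    # 300.0: smallest dose level with VaD <= 0.5
```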

2.2.2 Dose–Volume Histogram

One commonly used way of plotting the distribution of $D_R$ is through the dose–volume histogram (DVH). It is simply a plot of $\mathrm{VaD}_x(d)$ against $x$. The DVH for a ROI, $R$, with a given dose, $d$, describes what volume ratio of $R$ gets a dose higher than $x$ for any given dose level $x$.

The DVH is used for the clear overview it gives of the administered dose.

Multiple DVHs for different ROIs can be plotted together to give a view of the entire overarching dose. Typical DVHs for PTVs and OARs can be seen in Figure 2.1.

[Figure 2.1: DVH curves for a PTV and an OAR; dose [cGy] on the horizontal axis, volume proportion on the vertical axis.]

Figure 2.1. Typical DVHs for a patient whose prescription dose in the PTV was 7000 cGy. The TPS will attempt to create a dose which shifts the DVHs in the directions indicated by the arrows.


2.2.2.1 DVH Distances

To be able to compare doses, we need a measure of similarity between them.

It is already difficult to quantify differences in quality between doses given to the same patient. If we also wish to say something about the quality of doses given to different patients, the task becomes exceedingly challenging. A possible stand-in for a direct measure between doses is a measure in DVH-space.

Transforming a dose and patient geometry to DVHs enables us to make use of any measure in the space of functions. Which measure to use is still an active research area and there exist many viable options.

One possible distance measure between DVHs would be a simple squared difference,

$$\mathrm{dist}(\mathrm{DVH}_{R,d}, \mathrm{DVH}_{R,d'}) = \int_0^1 \left(\mathrm{DaV}_v(d) - \mathrm{DaV}_v(d')\right)^2 \mathrm{d}v, \qquad (2.6)$$

or, in the discrete case,

$$\mathrm{dist}(\mathrm{DVH}_{R,d}, \mathrm{DVH}_{R,d'}) = |\mathbf{DVH} - \mathbf{DVH}'|^2, \qquad (2.7)$$

where $\mathbf{DVH}$ and $\mathbf{DVH}'$ are vectors of the dose values corresponding to a set of volume ratios and $|x|$ denotes the Euclidean norm of a vector $x$. Note how both of these definitions measure the squared distance in dose and not in volume (corresponding to the horizontal axis in Figure 2.1).

The major selling point of the squared distance is of course simplicity in calculations. We will be using the squared distance in future calculations exactly for this reason. Later, it will become clear that any distance can be used given that we are willing to relax some other constraints.
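A small sketch of the discrete squared DVH distance (2.7), built by sampling DaV on a grid of volume fractions; the grid size and helper names are illustrative choices and not taken from the thesis.

```python
import numpy as np

def dav(d_roi, v):
    """Dose-at-volume (eq. 2.5), restricted to voxel dose values for simplicity."""
    for x in np.sort(d_roi):
        if np.mean(d_roi > x) <= v:
            return x
    return np.max(d_roi)

def dvh_vector(d_roi, volume_fractions):
    """The DVH sampled as a vector of DaV_v values on a grid of volume fractions."""
    return np.array([dav(d_roi, v) for v in volume_fractions])

def dvh_sq_distance(d_roi, d_roi_prime, n_points=101):
    """Discrete squared DVH distance, eq. (2.7)."""
    vs = np.linspace(0.0, 1.0, n_points)
    diff = dvh_vector(d_roi, vs) - dvh_vector(d_roi_prime, vs)
    return np.sum(diff ** 2)

d1 = np.array([6900.0, 7000.0, 7100.0, 6950.0])
d2 = np.array([6800.0, 7050.0, 7200.0, 6900.0])
print(dvh_sq_distance(d1, d2))
```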

2.3 Dose Mimicking

The traditional optimisation problem of the TPS has its basis in clinical goals. The problem of dose mimicking is slightly different however. When dose mimicking, the TPS will calculate machine parameters to best replicate a given dose. This can be useful for many reasons but is often used to obtain machine parameters for algorithms that generally produce doses that are not deliverable.

RayStation provides functionality to perform dose mimicking. Mainly, there are two ways of approaching dose mimicking in RayStation. The first, and more traditional, method utilises a given reference dose and calculates machine parameters to best recreate the reference dose. The second approach utilises uncertainties in the dose and creates a minimisation problem based on these. For more information on the second approach to dose mimicking see Section 3.2.


2.3.1 Traditional Approach

When dose mimicking in the traditional manner, RayStation will optimise machine parameters to minimise variations of (2.1) and (2.6). Therefore, these are sometimes referred to as dose mimicking functions [15]. The problem RayStation solves is made up of what is known as reference dose and reference DVH functions. The reference functions are defined as follows

$$\mathrm{ref}^{\mathrm{dose}}_R(d, d') = \sum_{i \in R} l_R(d_i, d'_i) \qquad (2.8)$$

$$\mathrm{ref}^{\mathrm{DVH}}_R(d, d') = \int_0^1 l_R\!\left(\mathrm{DaV}_v(d), \mathrm{DaV}_v(d')\right) \mathrm{d}v \qquad (2.9)$$

where $d'$ is some reference dose we wish to mimic. The optimisation problem set up to mimic $d'$ can then be formulated as follows:

$$\begin{aligned} \underset{\eta}{\text{minimise}} \quad & \sum_{R \subseteq P} w^{\mathrm{dose}}_R \,\mathrm{ref}^{\mathrm{dose}}_R(d_\eta, d') + \sum_{R \subseteq P} w^{\mathrm{DVH}}_R \,\mathrm{ref}^{\mathrm{DVH}}_R(d_\eta, d') \\ \text{subject to} \quad & \eta \in \mathcal{E} \end{aligned} \qquad (2.10)$$

where $\mathcal{E}$ is the set of deliverable machine parameters and $d_\eta$ is the dose delivered by the machine parameters $\eta$. The weights $w^{\mathrm{dose}}_R$ and $w^{\mathrm{DVH}}_R$ are non-negative constants summing to unity over $P$ ($\sum_{R \subseteq P} w^{\mathrm{dose}}_R + \sum_{R \subseteq P} w^{\mathrm{DVH}}_R = 1$) that provide a weighting between ROIs and between dose and DVH discrepancies.
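For intuition, the sketch below evaluates the objective of (2.10) for a fixed candidate dose; the actual optimisation over machine parameters η ∈ E is performed inside the TPS and is not modelled here. All function and field names are assumptions for illustration, not RayStation's API.

```python
import numpy as np

def l_roi(x, x_ref, is_oar):
    """Per-voxel penalty l_R from eq. (2.2), applied elementwise."""
    diff = np.maximum(x - x_ref, 0.0) if is_oar else x - x_ref
    return diff ** 2

def dav_curve(d_roi, vs):
    """DaV_v sampled on a grid of volume fractions (cf. eq. 2.5)."""
    xs = np.sort(d_roi)
    return np.array([next((x for x in xs if np.mean(d_roi > x) <= v), xs[-1]) for v in vs])

def ref_dose(d, d_ref, roi_idx, is_oar):
    """Reference dose function, eq. (2.8)."""
    return np.sum(l_roi(d[roi_idx], d_ref[roi_idx], is_oar))

def ref_dvh(d, d_ref, roi_idx, is_oar, n_points=101):
    """Reference DVH function, eq. (2.9), approximated by averaging over v in [0, 1]."""
    vs = np.linspace(0.0, 1.0, n_points)
    return np.mean(l_roi(dav_curve(d[roi_idx], vs), dav_curve(d_ref[roi_idx], vs), is_oar))

def mimicking_objective(d, d_ref, rois):
    """Objective of eq. (2.10) for a fixed candidate dose d (the machine-parameter
    constraint is handled by the TPS and omitted here)."""
    return sum(r["w_dose"] * ref_dose(d, d_ref, r["idx"], r["is_oar"])
               + r["w_dvh"] * ref_dvh(d, d_ref, r["idx"], r["is_oar"])
               for r in rois)

d_ref = np.array([7000.0, 7000.0, 7000.0, 800.0, 900.0, 700.0])   # dose to mimic
d = np.array([6950.0, 7020.0, 6980.0, 1100.0, 850.0, 750.0])      # candidate dose
rois = [
    {"idx": np.array([0, 1, 2]), "is_oar": False, "w_dose": 0.3, "w_dvh": 0.3},
    {"idx": np.array([3, 4, 5]), "is_oar": True, "w_dose": 0.2, "w_dvh": 0.2},
]
print(mimicking_objective(d, d_ref, rois))
```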


Chapter 3

Prediction Pipeline

This chapter will describe the ideas and mathematics that motivate the chosen pipeline architecture. Each of the three modules that make up the pipeline is presented in a mostly self-contained section below.

The pipeline will be trained to predict probability distributions of both DVHs and spatial dose from historically treated patients. It will follow the general form of the flowchart in Figure 3.1. See also Figures B.2–B.7 for a more pictorial explanation of the pipeline architecture in its entirety.

Before explaining the architecture, however, we need to discuss an improvement that has been made to the dose mimicking machinery in RayStation, namely dose mimicking with uncertainty. This improvement allows the dose mimicking functionality to accept probability distributions of both DVHs and spatial dose to construct a more sophisticated minimisation problem.

3.1 Stochasticity of d

In the traditional setting of radiation therapy, the dose, d, is seen as a variable over which we search for an optimal solution to the objective function constructed by the TPS. When the optimisation is complete, d is viewed as the “one” solution to the problem of treating the particular patient, P. This view is well suited for the optimisation problem of the TPS. For our purposes, of probabilistic DVH and spatial dose prediction, a slightly different viewpoint will be useful instead.

Every time a treatment planner makes a treatment which tries to satisfy the clinical goals for P, it turns out slightly differently. There are two main reasons for this.

1. Cancer treatment is often a matter of opinion and there does not always exist a true “best” treatment for P. Instead, treatment planners can be categorised according to a number of traits, such as being “careful” or “aggressive” for instance. Treatment planners with different traits can, when presented with conflicting clinical goals, produce widely varying treatments.


2. Any given treatment planner will have some inherent variability in how their treatments turn out. This depends on many different factors but can be thought of as simple randomness in the process of constructing the treatment.

[Figure 3.1: flowchart of the automated dose prediction pipeline. Nodes include the historical data (x, y, DVHs), the autoencoder (AE), distances with transform and weights, the learned distance K, the Gaussian process (GP), DVH estimation, spatial dose estimation and RayStation's dose mimicking.]

Figure 3.1. The pipeline architecture. The three modules in the pipeline aim to learn the three unknown (blue) units: the feature extractor (AE), the distance in feature space (K), and the method of making probabilistic spatial dose predictions (GP).

This means that the administered treatment, d, can be viewed as the outcome of a random variable, D (separate from the D of Section 2.2). The outcome, d, is then viewed as the optimal solution to an optimisation problem defined by a set of parameters, which themselves are inherently random. We prefer this view because of new dose mimicking functionality available in RayStation.


3.2 Probabilistic Approach to Dose Mimicking

The dose mimicking discussed in Section 2.3 treats the reference dose, d′, as a ground-truth dose we want to recreate. This can be the wanted behaviour in certain circumstances, but it is not well suited for our view of d′ as a draw from D′. We have chosen this view due to new functionality recently made available in RayStation.

RayStation now also includes dose mimicking functionality that, instead of a reference dose d′, will accept a probability density function (pdf) of a random variable, D′. This helps the mimicking function better represent what a treatment planner would have produced.

Again, the optimisation problem is made up of a weighted sum of reference functions just as in (2.10); however, the reference functions have a slightly different form. Denoting the cdf of a random variable $X$ by $F_X$, the reference functions can be written as

$$\mathrm{ref}^{\mathrm{dose}}_R(d, F_{D'}) = \sum_{i \in R} \ell^{\mathrm{dose}}_R\!\left(F_{D'_i}(d_i)\right), \qquad (3.1)$$

$$\mathrm{ref}^{\mathrm{DVH}}_R(d, F_{\mathrm{DaV}(D')}) = \int_0^1 \ell^{\mathrm{DVH}}_R\!\left(F_{\mathrm{DaV}_v(D')}(\mathrm{DaV}_v(d)), v\right) \mathrm{d}v, \qquad (3.2)$$

where $\ell^{\mathrm{dose}}_R$ and $\ell^{\mathrm{DVH}}_R$ are given by

$$\ell^{\mathrm{dose}}_R(x) = \begin{cases} \log(x), & \text{if } R \text{ is an OAR}, \\ \log(1 - x), & \text{otherwise,} \end{cases} \qquad (3.3)$$

and

$$\ell^{\mathrm{DVH}}_R(x, v) = \begin{cases} \log(x), & \text{if } v \le \tfrac{1}{2} \text{ or } R \text{ is an OAR}, \\ \log(1 - x), & \text{otherwise.} \end{cases} \qquad (3.4)$$

Finally, we combine these in a sensible manner to arrive at something similar to (2.10). It is important that the weights of the DVH reference functions are non-zero when R is a target region, or the optimisation will attempt to increase the dose in the targets without considering the homogeneity of the dose. This can lead to a surge in dose in one area of the target, well beyond the clinical goal, causing further damage to the patient.
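A hedged sketch of how the probabilistic reference dose function (3.1) with penalty (3.3) might be evaluated, assuming the per-voxel cdfs are available as callables. The Gaussian cdf in the toy example and all names are purely illustrative and not how RayStation represents D′.

```python
import numpy as np
from math import erf, sqrt

def ell_dose(u, is_oar):
    """Penalty (3.3) applied to a cdf value u = F_{D'_i}(d_i); clipped to avoid log(0)."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return np.log(u) if is_oar else np.log(1.0 - u)

def ref_dose_prob(d_roi, cdfs, is_oar):
    """Probabilistic reference dose function, eq. (3.1).
    cdfs[i] is a callable returning F_{D'_i}(x) for voxel i of the ROI."""
    return sum(ell_dose(F(di), is_oar) for F, di in zip(cdfs, d_roi))

def gaussian_cdf(mu, sigma):
    # toy stand-in for the per-voxel dose cdf; purely illustrative
    return lambda x: 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

d_roi = np.array([950.0, 1020.0, 870.0])
cdfs = [gaussian_cdf(1000.0, 100.0) for _ in d_roi]
print(ref_dose_prob(d_roi, cdfs, is_oar=True))
```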

3.3 Gaussian Mixture

With our new view of d it is a sensible question to ask how it is distributed.

Random variables with separate subpopulations, each exhibiting a certain behaviour, can be modelled with mixture models.

A random variable, $D_E$, with $N$ subpopulations can be constructed via $N + 1$ auxiliary random variables $\{E, \{D_e\}_{e=1}^N\}$. Here $E$ is an index variable selecting between the different subpopulations, and $D_e$ captures the behaviour of subpopulation $e$.

A draw from $D_E$ can be made by only two draws from the set of auxiliary random variables.

1. Perform a draw from the index variable E, call the draw e.

2. Draw from the random variable $D_e$, call it $d_e$.

3. The draw, $d_e$, of $D_e$ is now a draw from $D_E$.

We will often simply use $D$ to mean $D_E$. To obtain a Gaussian mixture, we restrict each $D_e$ to be a Gaussian distribution with mean and covariance given by $\mu_e$ and $\Sigma_e$ respectively.

The pdf of the Gaussian mixture is the weighted sum of the pdfs of the individual Gaussian random variables,

$$p_D(d) = \sum_{e=1}^{N} w_e \frac{1}{(2\pi)^{k/2} |\Sigma_e|^{1/2}} \exp\!\left(-\frac{1}{2}(d - \mu_e)^\mathsf{T} \Sigma_e^{-1} (d - \mu_e)\right), \qquad (3.5)$$

where $k$ is the dimension of the Gaussian random variables $\{D_e\}_{e=1}^N$ and $w_e = P(E = e)$.

Looking at the probability density function of the dose, D, we often see something akin to a Gaussian mixture. The reason for this is the existence of different classes of treatment planners and their inherent variability. See Figure 3.2 for an example of (3.5) and [16] for further information on mixture models.
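The following sketch evaluates the mixture density (3.5) and performs the two-step draw described above; the component means, covariances and weights are toy values chosen only for illustration.

```python
import numpy as np

def gaussian_pdf(d, mu, cov):
    """Multivariate normal density used inside eq. (3.5)."""
    k = mu.shape[0]
    diff = d - mu
    quad = diff @ np.linalg.solve(cov, diff)
    norm = (2.0 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def mixture_pdf(d, weights, mus, covs):
    """Gaussian mixture density p_D(d), eq. (3.5)."""
    return sum(w * gaussian_pdf(d, mu, cov) for w, mu, cov in zip(weights, mus, covs))

def sample_mixture(rng, weights, mus, covs):
    """Draw from D_E: first draw the expert index E, then draw from D_e."""
    e = rng.choice(len(weights), p=weights)
    return rng.multivariate_normal(mus[e], covs[e])

rng = np.random.default_rng(0)
weights = [0.5, 0.3, 0.2]
mus = [np.array([6000.0]), np.array([6600.0]), np.array([7200.0])]
covs = [np.array([[200.0 ** 2]])] * 3
d = sample_mixture(rng, weights, mus, covs)
print(d, mixture_pdf(d, weights, mus, covs))
```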

The different classes that go into a Gaussian mixture can, as explained above, be attributed to patients being treated by different treatment planners or “experts”, each with a different view on how to treat the cancer. The variations can, however, also be caused by different clinics with different guidelines. In reality, many different factors can contribute to creating separate classes of treatments; we will from here on refer to these classes as different experts.

3.4 Pipeline Architecture

The pipeline will, given a segmented patient, determine a predictive probability distribution over DVHs and spatial dose. This will require scans of previously treated patients, $\{P_i\}_{i=1}^n$, with accompanying segmentations, $\{x_i\}_{i=1}^n$, and delivered doses, $\{y_i\}_{i=1}^n$. Each dose, $y_i$, is viewed as a realisation of a random variable, $Y_i$, with an unknown distribution.¹ We will also use $y$ to mean the dose realised as a DVH. This is possible as the DVH of patient $i$ is uniquely defined by $(x_i, y_i)$, and we will always be carrying $x_i$ with us in the back of our minds through the following calculations.

¹Unfortunately, we need to change the notation of the dose, d, and its corresponding random variable, D, here.


[Figure 3.2: a multimodal probability density over dose [cGy].]

Figure 3.2. The probability distribution of the delivered dose at a certain point of the patient volume might look like the blue line above. We assume there exist three groups of treatment planners with different ways of treating this patient.

For a new patient with a segmented scan, x, we wish to find the predictive distribution

$$p_{Y \mid X, \{(X_i, Y_i)\}_{i=1}^n}\!\left(y \mid x, \{(x_i, y_i)\}_{i=1}^n\right). \qquad (3.6)$$

To preserve the idea of different experts and to obtain a predictive distribution in the form of a Gaussian mixture, one could use the following interpretation.

Consider our previously treated patients with their respective doses. We will view each of these patients as an individual expert, E. Each expert comes with a treatment they would “like” to prescribe, $y_E$, to the new patient and some uncertainty, $\varepsilon_E$, in how this treatment will turn out. The treatment they wish to administer is the treatment previously administered to the corresponding historical patient, and the uncertainty is a Gaussian random variable with zero mean and some known variance, $\sigma_E^2$. For simplicity we will also assume this variance to be constant over all experts, $\sigma_E^2 = \sigma^2$, and we will assume the experts' uncertainties to be independent, so that $\varepsilon_E \sim \mathcal{N}(0, \sigma^2)$.

Then the problem becomes a regression problem of finding the weights for each expert. The mean prediction of the model is given by

$$\mathbb{E}[Y] = \mathbb{E}[y_E + \varepsilon_E] = \mathbb{E}[y_E] + \mathbb{E}[\varepsilon_E] \qquad (3.7)$$
$$= \mathbb{E}[y_E] + \mathbb{E}\!\left[\mathbb{E}[\varepsilon_E \mid E]\right] = \sum_{i=1}^{n} P(E = i)\, y_i + \mathbb{E}[0] \qquad (3.8)$$
$$= \sum_{i=1}^{n} w_i y_i, \qquad (3.9)$$

where $w_i = P(E = i)$. With the weights, we can also calculate a probability distribution over DVHs for the new patient according to our Gaussian mixture.
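A minimal sketch of how the expert weights could be turned into a DVH prediction: the mean (3.9) is a convex combination of historical DVH vectors, and the pointwise predictive density is the corresponding Gaussian mixture with the assumed constant noise variance σ². The weights would come from the distance module described later; the function names, shapes and toy numbers here are ours.

```python
import numpy as np

def predicted_dvh_mean(weights, historical_dvhs):
    """Mean DVH prediction, eq. (3.9): a convex combination of historical DVH vectors."""
    return weights @ historical_dvhs            # (n,) @ (n, m) -> (m,)

def dvh_point_density(y, weights, expert_values, sigma):
    """Gaussian-mixture density of the DVH value at one grid point, each historical
    patient acting as an expert with noise N(0, sigma^2)."""
    z = (y - expert_values) / sigma
    return np.sum(weights * np.exp(-0.5 * z ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

# three historical patients, DVH sampled at four volume fractions (toy numbers)
historical_dvhs = np.array([[7100.0, 7000.0, 6900.0, 6500.0],
                            [7200.0, 7050.0, 6800.0, 6300.0],
                            [7150.0, 7020.0, 6850.0, 6400.0]])
weights = np.array([0.5, 0.3, 0.2])             # w_i = P(E = i), from the distance module
print(predicted_dvh_mean(weights, historical_dvhs))
print(dvh_point_density(6900.0, weights, historical_dvhs[:, 2], sigma=100.0))
```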

One glaring oversight of this approach, however, is that there exists no obvious way of constructing the weights. For the training data we could, of course, simply calculate the weights that minimise some measure of loss directly.

For our method to generalise to a new patient, P, with corresponding segmentation, x, however, we need the weights to be a function of x. To do this we will rely on a transformed distance in a latent space produced by an autoencoder.

This will give us a way of weighting a new patient against historically treated ones to produce probabilistic DVH predictions. To create the entire dose we will use these weights, but instead of a simple Gaussian kernel density estimation as for the DVH, we will make better use of the structure of the data.

For each patient we will fit a Gaussian process from a space of voxel features to the dose. In this manner we get a probability distribution over the dose for each treated patient. These distributions can later be combined to create the final probability dose distribution.
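As a rough sketch of this last module, the code below performs exact Gaussian-process regression of dose on a one-dimensional voxel feature. The thesis instead uses sparse pseudo-input GPs (Section 3.7) and richer voxel features, so the kernel, hyperparameters and feature choice here are placeholder assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of voxel feature vectors."""
    sq = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def gp_posterior(X_train, y_train, X_test, noise=1e-2, **kern):
    """Exact GP regression posterior (mean and variance) of dose given voxel features;
    the thesis replaces this with a sparse pseudo-input GP for scalability."""
    K = rbf_kernel(X_train, X_train, **kern) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test, **kern)
    Kss = rbf_kernel(X_test, X_test, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mean, var

# toy example: 1-D voxel feature (e.g. a signed distance to the PTV), dose as target
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(50, 1))
y = 7000.0 * np.exp(-0.5 * X[:, 0] ** 2) + rng.normal(0, 50, 50)
Xs = np.linspace(-3, 3, 5)[:, None]
mean, var = gp_posterior(X, y, Xs, noise=50.0 ** 2, lengthscale=1.0, variance=3000.0 ** 2)
print(mean, np.sqrt(var))
```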

3.5 Autoencoders as a Feature Extraction Method

To measure distance between patients we first need to decide in what space to measure distance. It is of course possible to simply take the difference between two patient scans and use a sum of squares as the distance. This is, however, impractical, as it does not capture what we mean by “similarity” between patients. When we say that two patients are similar, we mean that their organs have similar sizes and geometries. We might even want to capture that some patients have organs that overlap whilst others do not. This idea would not be captured well by the suggestion above.

Instead, we could try to manually construct a mapping from segmentation to a feature space. This mapping could consist of measuring the volume of each organ, their distances to one another and the volume in which they overlap. Unfortunately, this is likely not feasible, as it is difficult to say which features would be useful for our purposes. Instead we will attempt to train a feature extraction method with the help of a neural network.


3.5.1 Neural Networks

Neural networks (NNs) are highly customisable non-linear functions that can be adapted to solve a wide variety of tasks. A neural network traditionally consists of layers which in turn are made up of nodes, the idea being to create a simple model of a biological brain. Even though neural networks have only recently started to be used on a large scale in practical applications, they have been investigated extensively and their basic concepts have been explained many times before. Therefore, we will only quickly discuss some of the less common concepts used in this project. Further information on NNs can be found in [13, 17, 18].

3.5.1.1 Convolutional Layers

We mainly cover convolutional layers to facilitate the explanation of the transposed convolutional layer. The convolutional layer itself is explained in more detail in various online resources, including [19, 20].

The convolutional layer was developed partially to be able to downsample input data whilst preserving its inherent structure, but also to decrease network sizes. By now it is a well-known layer type used in many of the best-performing networks in a wide range of tasks, such as image classification [21]. The layer builds on a simple idea: instead of looking at the entire image at once, look at a small local portion of the data and treat it before moving on to the next portion, preserving the location of each portion in the resulting output. This is done via a convolution of the input image with a smaller convolution matrix, sometimes referred to as the “window” or “kernel”.

We shall explain the idea with a simple example. Imagine a 3 × 3 input image, I, and a 2 × 2 convolution matrix, W , given by

$$I = \begin{pmatrix} i_{0,0} & i_{0,1} & i_{0,2} \\ i_{1,0} & i_{1,1} & i_{1,2} \\ i_{2,0} & i_{2,1} & i_{2,2} \end{pmatrix}, \qquad W = \begin{pmatrix} w_{0,0} & w_{0,1} \\ w_{1,0} & w_{1,1} \end{pmatrix}. \qquad (3.10)$$

The convolution can then be taken by doing the following (a minimal code sketch follows the list):

1. For each of the four 2 × 2 corner submatrices of $I$ ($\begin{pmatrix} i_{0,0} & i_{0,1} \\ i_{1,0} & i_{1,1} \end{pmatrix}$ and so forth),
   (a) take the element-wise product of $W$ and the current corner matrix,
   (b) sum the resulting entries of the element-wise product, and
   (c) place the sum in the element of a 2 × 2 output matrix, $O$, corresponding to the position of the current corner matrix.

2. $O$ is now the resulting convolution.
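The sketch below implements exactly this sliding-window computation (as is common in deep learning, without flipping the kernel, matching the element-wise description above); the function name and toy matrices are our own.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution as described above: slide the window over the image,
    take the element-wise product with each corner matrix and sum the entries."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

I = np.arange(9, dtype=float).reshape(3, 3)   # stands in for the i_{m,n} entries
W = np.array([[1.0, 0.0], [0.0, -1.0]])       # stands in for the w_{m,n} entries
print(conv2d_valid(I, W))                      # the 2 x 2 output matrix O
```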

This way of constructing the convolution can easily be expanded to larger matrices, to higher dimensional inputs and to include other concepts such as stride lengths [19].
