DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

Direct optimization of dose-volume histogram metrics in intensity-modulated radiation therapy treatment planning

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

Direct optimization of dose-volume histogram metrics in intensity-modulated radiation therapy treatment planning
(Swedish title: Direkt optimering av dos-volym-histogram-mått i intensitetsmodulerad strålterapiplanering)

TIANFANG ZHANG

Bachelor's thesis
Division of Optimization and Systems Theory, Department of Mathematics
KTH Royal Institute of Technology

Direct optimization of dose-volume histogram metrics in intensity-modulated radiation therapy treatment planning

Tianfang Zhang
May 21, 2018


Abstract

In optimization of intensity-modulated radiation therapy treatment plans, dose-volume histogram (DVH) functions are often used as objective functions to minimize the violation of dose-volume criteria. Neither DVH functions nor dose-volume criteria, however, are ideal for gradient-based optimization as the former are not continuously differentiable and the latter are discontinuous functions of dose, apart from both being nonconvex. In particular, DVH functions often work poorly when used in constraints due to their being identically zero when feasible and having vanishing gradients on the boundary of feasibility.

In this work, we present a general mathematical framework allowing for direct optimization on all DVH-based metrics. By regarding voxel doses as sample realizations of an auxiliary random variable and using kernel density estimation to obtain explicit formulas, one arrives at formulations of volume-at-dose and dose-at-volume which are infinitely differentiable functions of dose. This is extended to DVH functions and so called volume-based DVH functions, as well as to min/max-dose functions and mean-tail-dose functions. Explicit expressions for evaluation of function values and corresponding gradients are presented. The proposed framework has the advantages of depending on only one smoothness parameter, of approximation errors to conventional counterparts being negligible for practical purposes, and of a general consistency between derived functions.

Numerical tests, which were performed for illustrative purposes, show that smooth dose-at-volume works better than quadratic penalties when used in constraints and that smooth DVH functions in certain cases have a significant advantage over their conventional counterparts. The results of this work have been successfully applied to lexicographic optimization in a fluence map optimization setting.

Keywords: Optimization, intensity-modulated radiation therapy, DVH functions, dose-at-volume, volume-at-dose, value-at-risk, smooth approximation, kernel density estimation, fluence map optimization.


Sammanfattning (Swedish abstract)

In optimization of treatment plans in intensity-modulated radiation therapy, dose-volume histogram (DVH) functions are often used as objective functions to minimize the distance to dose-volume criteria. Neither DVH functions nor dose-volume criteria, however, are ideal for gradient-based optimization, since the former are not continuously differentiable and the latter are discontinuous functions of dose, while both are also nonconvex. In particular, DVH functions often perform poorly in constraints, as they are identically zero in feasible regions and have vanishing gradients on the boundary of feasibility.

In this work, a general mathematical framework is presented which enables direct optimization on all DVH-based metrics. By regarding voxel doses as sample outcomes of an auxiliary random variable and using nonparametric density estimation to obtain explicit formulas, the metrics volume-at-dose and dose-at-volume can be formulated as infinitely differentiable functions of dose. This is extended to DVH functions and so-called volume-based DVH functions, as well as to min-dose and max-dose functions and mean-tail-dose functions. Explicit expressions for evaluating function values and associated gradients are presented. The proposed framework has the advantages of depending on only one smoothness parameter, of approximation errors relative to conventional counterparts being negligible in practical settings, and of a general consistency between derived functions.

Numerical tests performed for illustrative purposes show that smooth dose-at-volume works better than quadratic penalties in constraints and that smooth DVH functions in certain cases have a significant advantage over conventional ones. The results of this work have been successfully applied to lexicographic optimization within fluence map optimization.

Nyckelord (keywords in Swedish): Optimering, intensitetsmodulerad strålterapi, DVH-funktioner, dose-at-volume,


Acknowledgments

This Bachelor's thesis was carried out as a collaboration between the Division of Optimization and Systems Theory at KTH Royal Institute of Technology and RaySearch Laboratories AB. I would like to express my particular gratitude to Rasmus Bokrantz, my supervisor at RaySearch, for providing insightful general guidance as well as quick replies to miscellaneous specific matters, and to Anders Forsgren, my supervisor at KTH, for sharing enlightening advice and experience. I am grateful to Kjell Eriksson, chief science officer at RaySearch, for providing me with flexibility and encouragement in my work, to Johan Löf, CEO at RaySearch, for giving me the opportunity to gather invaluable work experience, and to my colleagues at the Research department for contributing to such a positive and stimulating environment to work in.

Stockholm, May 2018 Tianfang Zhang


Contents

1 Introduction
  1.1 Radiation therapy
    1.1.1 Radiation therapy and IMRT
    1.1.2 Treatment planning
  1.2 Aims and objectives
2 Theoretical framework
  2.1 Discretization and dose calculation
  2.2 Plan evaluation criteria
    2.2.1 Volume-at-dose and dose-at-volume
    2.2.2 Dose-volume histograms and dose-volume criteria
  2.3 Optimization setup
    2.3.1 Objective functions
    2.3.2 Optimization problem formulation
  2.4 Optimization algorithm
3 Construction of smooth objectives
  3.1 An auxiliary random variable
    3.1.1 Reformulation of conventional metrics
    3.1.2 Further assumptions
  3.2 Smooth equivalents to volume-at-dose and dose-at-volume
    3.2.1 Volume-at-dose
    3.2.2 Dose-at-volume
    3.2.3 General properties of q
  3.3 Other linear-dose functions
    3.3.1 Min-dose and max-dose criteria
    3.3.2 Mean-tail-dose functions
  3.4 Quadratic penalty functions
    3.4.1 DVH functions
    3.4.2 Min-dose and max-dose functions
    3.4.3 Volume-based DVH functions
  3.5 Discrete Fourier transform approximation
    3.5.1 Construction
    3.5.2 Derivation of additional formulas
4 Numerical results
  4.1 FMO with prostate patient
    4.1.1 Problem formulation
    4.1.2 Unconstrained optimization
    4.1.3 Constrained optimization
5 Discussion
  5.1 Numerical results for smooth objectives
  5.2 Conclusion and further research
6 Appendix
  6.1 Proofs
    6.1.1 Smoothness of q
    6.1.2 Error estimate
    6.1.3 Concaveness of smooth lower mean-tail-dose
    6.1.4 Formulas for smooth DVH functions
    6.1.5 Fourier integral lemma
    6.1.6 DFT approximation
    6.1.7 Error bound for DFT approximation
    6.1.8 Formula for general-exponent smooth DVH functions
    6.1.9 Formula for smooth vDVH functions


Chapter 1

Introduction

More than every third person will develop cancer during their lifetime. Of the 15 million new cases diagnosed worldwide each year, it is estimated that radiation therapy can improve the rates of cure in about one-fourth of the cases and provide palliative relief in an additional one-fourth. Since the pre-computer era, when treatment planners manually estimated beam doses and drew on two-dimensional x-ray images by hand, radiation therapy treatment planning has today evolved into a highly complex process involving vast amounts of advanced technology. Important historical advances include the use of linear accelerators for treatment, the CT and the MRI for imaging, and the multileaf collimator for beam shaping. The advent of intensity-modulated radiation therapy in the 1990s, enabling wide beams to be divided into narrow beamlets with controllable intensities, opened the door for the application of mathematical optimization to handle the canonical tradeoff in radiation therapy, namely that between delivering prescribed dose to the tumor and sparing healthy tissue and risk organs. For the past few decades, mathematical and computational methods concerning optimization of treatment plans have been, and certainly still are, a highly active research area [5].

1.1 Radiation therapy

1.1.1 Radiation therapy and IMRT

Radiation therapy is a type of medical treatment where the patient is irradiated with ionizing radiation, in most cases as part of a cancer treatment to control or kill malignant cells. Radiation is prescribed by a physician with the intent of curing or preventing recurrence of malignancies or improving the patient's quality of life when cure is not possible. One distinguishes between external beam therapy, where radiation fields are applied externally by a linear accelerator, and brachytherapy, where radioactive sources are placed inside or next to the treatment target. The ionizing radiation works by damaging cellular DNA, whose repair mechanisms fail with higher probability for fast-proliferating tissue such as tumors than for normal tissue. It is therefore desirable to deliver sufficient dose to every tumor cell while minimizing the damage to normal tissue, especially risk organs. The treatment is usually split into fractions, which are typically delivered in daily intervals in order to let healthy cells recover in between [7].

In intensity-modulated radiation therapy (IMRT), the treatment consists of a set of photon beams whose fluences, or time-accumulated intensities, are variable. Each beam field is shaped by a multi-leaf collimator (MLC), where the positions of adjustable metal leaves create a beam profile. The resulting treatment plan is given by the superposition of all beam profiles. Intensity modulation can be achieved through two main types of MLC movements: segmented or step-and-shoot MLC (SMLC) and dynamic MLC (DMLC). In the former, each beam is partitioned into separately delivered segments, each of which consists of a unique MLC configuration, with the beam switched off between each segment; in the latter, the MLC leaves move continuously during the delivery of each beam. An extension of these techniques is volumetric-modulated arc therapy (VMAT), where the radiation source is moved continuously in a circular arc around the patient and where the rotation speed, the dose rate as well as the MLC leaves are variable [4].

Figure 1.1: Delivery of an IMRT treatment with six coplanar beams. The intensity profile of each beam is achieved by shooting several times from the same position with different MLC leaf positions, and the total dose distribution is given by the superposition of all delivered beams. Reproduced with permission [4].

1.1.2 Treatment planning

A treatment plan is a complete setup of instructions given to the treatment machine. For simple delivery techniques, it is possible for the treatment planner to find suitable machine parameters for a patient by a trial-and-error search, which is called forward planning and is commonly done in a treatment planning software (TPS). This is, however, not possible in general for more advanced delivery techniques. Instead, in so-called inverse planning, the treatment planner specifies a desired dose distribution to the treatment planning software, which then seeks to find a deliverable plan whose dose distribution minimizes some distance metric to the desired dose distribution. While the ideal dose distribution is often a certain uniform dose in the target volume and zero dose everywhere else, one can in many cases specify higher tolerances in surrounding tissue and risk organs when appropriate. The patient volume is divided into regions of interest (ROIs), which can either be target volumes, organs at risk or external tissue.


There are different approaches to handling the tradeoff between target dose coverage and exposure minimization to healthy tissue. In the simplest form, one can let the treatment planner specify weights for each ROI reflecting the importance that the requirements in the respective ROIs are met. In automated planning, one generates a large number of plans from a specified set of clinical goals and lets the treatment planner choose the most suitable one. Alternatively, in multicriteria optimization (MCO), the treatment planner can interpolate in real time between a set of automatically generated Pareto optimal plans.

Figure 1.2: Planning of a prostate case using a modern treatment planning system, here RayStation. The treatment planner obtains an optimized plan and is able to visualize the contours of all regions of interest as well as the total dose distribution. Courtesy of RaySearch Laboratories.

1.2 Aims and objectives

This thesis focuses on the mathematical aspects of the IMRT treatment plan optimization process. In particular, the purpose is to improve the mathematical properties of certain objective functions used in the optimization as well as the dose statistics used for plan evaluation, which is done by introducing a general framework in which one can derive smooth equivalents to otherwise discontinuous or nondifferentiable functions. The contribution of this work is mainly theoretical, with brief numerical results for demonstrative purposes.


Chapter 2

Theoretical framework

We present here the basics of the standard mathematical framework used in the optimization of IMRT treatment plans. For the purposes of this paper, some notation has been modified and some definitions are given as equivalent reformulations. For a more detailed exposition of existing IMRT literature, see e.g. [4].

2.1 Discretization and dose calculation

All dose quantities are measured in Gray (Gy), where 1 Gy is defined as 1 Joule of absorbed radiation per kilogram of tissue. The patient volume is discretized into volume elements called voxels, typically amounting to about 10^6–10^7 voxels per patient. The total dose distribution is described by a dose vector d = (d_i), where d_i is the dose received in voxel i. Every ROI can be described by an index set R ∈ \mathcal{R}, where \mathcal{R} is the collection of such index sets constituting the patient. The relative volume vector r = (r_i) describes for each voxel i its volume normalized by the total volume of the ROI to which it belongs, that is, \sum_{i ∈ R} r_i = 1 for each R ∈ \mathcal{R}.

The dose distribution is formed by a set of beams, where each beam profile is discretized into two-dimensional beam elements called bixels. A plan, which is determined by the fluence in each bixel collected over all beams, is described by the fluence vector y = (y_j), where y_j is the fluence in bixel j. Since the dose contribution of a bixel is proportional to its fluence, d is linear in y. In particular, if P = (P_{ij}) is the dose deposition matrix, where P_{ij} is the dose contribution to voxel i per unit fluence in bixel j, we can write

d = P y.

Methods for dose calculation include the singular value decomposition (SVD) algorithm [6] and the collapsed cone (CC) algorithm [1], where the former is usually used for approximate and the latter for more accurate calculations.

Note that due to physical limitations, all entries in d, P and y must be nonnegative. Furthermore, not all combinations of bixel fluences are possible, as the fluence is only indirectly controlled by the MLC leaf positions and the irradiation time. There are two main approaches to handling this restriction: fluence map optimization (FMO) and direct machine parameter optimization (DMPO). In the former, one first performs the necessary optimizations regarding each y_j as a free nonnegative parameter and then converts the solution to a deliverable plan; in the latter, one writes y as a function of the machine parameters η according to y = ϕ(η) and optimizes on η instead. While FMO leads to cleaner problem formulations by preserving linearity and convexity, DMPO removes the additional step of converting to a deliverable plan.

Figure 2.1: Visualization of the discretization of a beam. The MLC configurations determine the fluence in each bixel, which in turn determines the dose contribution to each voxel. In FMO, each bixel fluence is regarded as a degree of freedom. [13]

2.2 Plan evaluation criteria

2.2.1 Volume-at-dose and dose-at-volume

When evaluating radiation therapy treatment plans, one mainly uses different types of statistics on the dose distribution. We define the volume-at-dose and dose-at-volume functions as follows:

Definition 1. The volume-at-dose VaD : R → [0, 1] with respect to dose x is the volume fraction of a given ROI which receives a dose of x or higher. More precisely,

VaD(x) = \sum_{i : d_i ≥ x} r_i.

Definition 2. The dose-at-volume DaV : [0, 1] → R with respect to volume v is the lowest dose x such that at most a volume fraction v receives a dose of x or higher. More precisely,

DaV(v) = inf{x ∈ R : VaD(x) ≤ v}.

The volume-at-dose function VaD(x) is piecewise constant, nonincreasing and right-continuous, ranging from 1 when x = −∞ to 0 when x = ∞. Defining the Heaviside step function θ(x) as 1 when x ≥ 0 and 0 otherwise, we can also write

VaD(x) = \sum_i r_i θ(d_i − x).

One can think of the dose-at-volume function DaV(v) as the inverse of VaD(x), where the infimum is to ensure that the function is well-defined at discontinuity points.
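For concreteness, the following sketch evaluates the two conventional metrics for a discrete dose vector. It is an illustrative Python/NumPy implementation (the function names are ours; the thesis's own numerical experiments were carried out in MATLAB), exploiting the fact that VaD only changes value at the voxel doses, so that DaV(v) can be found by accumulating relative volumes from the hottest voxel downwards.

```python
import numpy as np

def volume_at_dose(d, r, x):
    """Conventional VaD(x): volume fraction of the ROI receiving dose >= x."""
    return float(r[d >= x].sum())

def dose_at_volume(d, r, v):
    """Conventional DaV(v) = inf{x : VaD(x) <= v}, assuming 0 <= v < 1.

    Sorting the doses from hot to cold, DaV(v) is the first dose at which the
    accumulated relative volume strictly exceeds v.
    """
    order = np.argsort(d)[::-1]
    cumulative = np.cumsum(r[order])
    return float(d[order][np.argmax(cumulative > v)])

# Toy example with three voxels of equal relative volume
d = np.array([10.0, 20.0, 30.0])
r = np.array([1.0, 1.0, 1.0]) / 3
print(volume_at_dose(d, r, 15.0))   # 2/3 of the volume receives at least 15 Gy
print(dose_at_volume(d, r, 0.5))    # 20.0, since VaD(x) <= 0.5 first holds just above 20 Gy
```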

2.2.2 Dose-volume histograms and dose-volume criteria

Dose distributions are most commonly visualized in a dose-volume histogram (DVH), where the volume VaD(x) on the vertical axis is plotted against dose x on the horizontal axis. DVHs are standard in all treatment planning software for plan quality assessment, as they allow for simultaneous visualization of dose distributions in multiple ROIs when all DVH curves are plotted in the same graph.

Dose-volume criteria are requirements on the values of volume-at-dose and dose-at-volume. Min-DVH criteria (max-DVH criteria) are stated in the form

• at least (most) a fraction \hat{v} of the volume should receive a dose of \hat{d} or higher.

Here, \hat{v} is the reference volume and \hat{d} the reference dose. One can formulate this in terms of inequalities on either VaD or DaV; for example, a min-DVH criterion can be stated as VaD(\hat{d}) ≥ \hat{v} or, equivalently, as DaV(\hat{v}) ≥ \hat{d} (for max-DVH, the inequality signs are flipped). In the DVH graph, min-DVH and max-DVH criteria correspond to the requirements that the DVH curve lie above or below, respectively, the point (\hat{d}, \hat{v}).


Figure 2.2: Display of a typical dose-volume histogram. The volume-at-dose VaD(x) is plotted against dose x, which enables a simultaneous visualization of the dose distributions in all ROIs. A dose-volume criterion can be thought of as the requirement that the DVH curve lie either above or below some point (\hat{d}, \hat{v}).

2.3 Optimization setup

2.3.1 Objective functions

In an optimization framework, one arrives at a treatment plan by minimizing some objective function ψ measuring the deviation from the desired ideal dose distribution. Traditionally, the objective functions used in the optimization are distinct from the actual plan evaluation criteria for various reasons. Apart from historical ones, one reason is the fact that dose-volume criteria are not well-suited for gradient-based optimization, as both VaD and DaV are discontinuous functions of d. Instead, one usually uses quadratic penalties on the deviation from a reference dose \hat{d}, which can be defined in a variety of ways.

We let ψ denote a general objective, which can be of any type. All objectives are functions of d, but we shall omit the explicit dependence for brevity.

2.3.1.1 Uniform-dose, min-dose and max-dose functions

One of the simplest objectives is the uniform-dose function, which is defined for a given ROI R as

ψ = \sum_i r_i (d_i − \hat{d})^2,

where the sum is over i ∈ R. This corresponds to being indifferent between dose deviations below and above the reference dose.

In many cases, however, deviations on one side are more acceptable than on the other side; for instance, it is common to penalize a deficit of dose in the target volume much more than a surplus of equal magnitude. This is resolved by the introduction of min-dose and max-dose functions, which are defined as

ψ = \sum_i r_i Θ(d_i − \hat{d})^2,

where Θ(x) = min{x, 0} and Θ(x) = max{x, 0} for min-dose and max-dose functions, respectively. These objectives penalize deviations on one side in the same way as uniform-dose functions, but ignore deviations on the other side. It is common to combine these for target volumes.

Whereas uniform-dose functions are infinitely differentiable, min-dose and max-dose functions are once continuously differentiable, as the derivative of the ramp function Θ(x) is discontinuous at x = 0. Furthermore, all three functions are convex in d.

2.3.1.2 EUD functions

An equivalent uniform-dose function (EUD function) is given by the weighted power mean

EUD_a = \left( \sum_i r_i d_i^a \right)^{1/a},

where a ∈ R is a parameter. EUD_a is a continuous, nondecreasing function of a, ranging between min d when a = −∞ and max d when a = ∞, convex when a ≥ 1 and concave when a ≤ 1. Special cases include the arithmetic mean when a = 1, the geometric mean when a = 0 and the harmonic mean when a = −1. Used in combination with uniform-dose and max-dose/min-dose functions, each d_i is replaced by EUD_a in their respective formulas. These objectives are called uniform-EUD, min-EUD and max-EUD functions, respectively.

2.3.1.3 DVH functions

In order to optimize on objectives which relate more closely to a given dose-volume criterion (\hat{d}, \hat{v}), we introduce min-DVH and max-DVH functions defined by

ψ = \sum_{i ∈ A} r_i Θ(d_i − \hat{d})^2,

where A is the so called active voxel set given by

A = {i ∈ R : d_i ≥ DaV(\hat{v})} for min-DVH,
A = {i ∈ R : d_i ≤ DaV(\hat{v})} for max-DVH,

and where the relative volumes in {r_i : d_i = DaV(\hat{v})} are uniformly scaled so that the equalities \sum_{i ∈ A} r_i = \hat{v} and \sum_{i ∈ A} r_i = 1 − \hat{v} are satisfied, respectively. This can also be formulated as

ψ = \int Θ(DaV(v) − \hat{d})^2 dv,

where the integral is over v ∈ [0, \hat{v}] for min-DVH and v ∈ [\hat{v}, 1] for max-DVH. One can replace the square in the expression for ψ by an arbitrary integer a, in which case the function is called a general-exponent DVH function. Due to the active voxel set, DVH functions are nonconvex and not continuously differentiable. An illustration in code is given below.
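The role of the active voxel set can be made concrete with a short sketch (again hypothetical Python/NumPy code; for simplicity it skips the uniform rescaling of the relative volumes of voxels lying exactly at DaV(\hat{v})). Because the set A changes abruptly as the dose distribution changes, the resulting penalty is nonconvex and not continuously differentiable, which is precisely the drawback discussed above.

```python
import numpy as np

def min_dvh_penalty(d, r, d_ref, v_ref):
    """Conventional min-DVH function: quadratic penalty on active voxels that
    fall below the reference dose d_ref.

    The active set A consists of the hottest v_ref fraction of the ROI, i.e.
    voxels with d_i >= DaV(v_ref); the rescaling of boundary voxels is omitted.
    """
    order = np.argsort(d)[::-1]
    cumulative = np.cumsum(r[order])
    dav = d[order][np.argmax(cumulative > v_ref)]   # conventional DaV(v_ref), 0 <= v_ref < 1
    active = d >= dav                               # active voxel set A (min-DVH)
    deficit = np.minimum(d[active] - d_ref, 0.0)    # ramp function of d_i - d_ref
    return float(np.sum(r[active] * deficit**2))
```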

2.3.1.4 Mean-tail-dose functions

One way of avoiding the nonconvex nature of DVH functions is to instead optimize on lower and upper mean-tail-dose functions, which we denote by MTD^−(\hat{v}) and MTD^+(\hat{v}), respectively. These are defined to be the average dose in the lower and upper \hat{v}-tails of the dose distribution, that is,

MTD^−(\hat{v}) = \frac{1}{1 − \hat{v}} \sum_{i ∈ A} r_i d_i,    MTD^+(\hat{v}) = \frac{1}{\hat{v}} \sum_{i ∈ A} r_i d_i,

where

A = {i ∈ R : d_i ≤ DaV(\hat{v})} for lower mean-tail-dose,
A = {i ∈ R : d_i ≥ DaV(\hat{v})} for upper mean-tail-dose,

and where {r_i : d_i = DaV(\hat{v})} are uniformly scaled so that the relative volumes of all active voxels sum to 1 − \hat{v} and \hat{v}, respectively.

It has been shown that lower (upper) mean-tail-dose functions are continuous and concave (convex) in d [16]. Moreover, as MTD^−(\hat{v}) ≤ DaV(\hat{v}) ≤ MTD^+(\hat{v}), each dose-volume criterion is satisfied whenever the corresponding mean-tail-dose criterion is satisfied. For details on the use of mean-tail-dose functions in IMRT treatment planning, see e.g. [9] and [17].

2.3.2 Optimization problem formulation

Having constructed one or several objective functions for each ROI, the resulting optimization problem can be stated generally as a multicriteria optimization (MCO) problem

minimize_{y ∈ Y}   ψ_1(d(y)), ψ_2(d(y)), ..., ψ_n(d(y))
subject to   b(d(y)) = 0,   c(d(y)) ≤ 0,

where

• Y is the set of all deliverable fluence vectors, which is equal to the set of all nonnegative vectors for FMO and to the range of the fluence map ϕ for DMPO,
• ψ_1, ψ_2, ..., ψ_n are the objective functions,
• b(d(y)) is a vector containing all equality constraints, which must be linear, and
• c(d(y)) is a vector containing all inequality constraints.

Methods based on an MCO formulation are studied in e.g. [4].

In order to construct a scalar-valued total objective function, one can use a composite objective function on the form of a linear combination \sum_{i=1}^{n} w_i ψ_i of all objectives, where each weight w_i ≥ 0 reflects the relative importance of its corresponding objective ψ_i. It is also common to include a normalization by \hat{d}^2 in the weight for quadratic objectives so as to penalize relative rather than absolute deviation. The optimization problem is then given by

minimize_{y ∈ Y}   \sum_{i=1}^{n} w_i ψ_i(d(y))
subject to   b(d(y)) = 0,   c(d(y)) ≤ 0.    (2.1)

The optimization problem (2.1) is nonlinear and, in general, nonconvex. However, if all of the inequality constraints and all of the ψ_i are convex, the resulting optimization problem becomes convex (e.g. when the problem contains only min-dose and max-dose functions and no DVH functions). This implies that each local optimum must also be a global optimum.

2.4 Optimization algorithm

For the purposes of this paper, we shall use a sequential quadratic programming (SQP) algorithm to find solutions to (2.1). The optimization solver SNOPT [11], which is used to obtain the numerical results of this paper, is based on an SQP method, as is the native optimization solver in RaySearch Laboratories' commercial software RayStation.

We shall briefly outline the idea of SQP here; for a detailed description, see e.g. [14]. Let ψ denote the composite objective function in (2.1). The Lagrangian L = L(y, µ, λ) is given by

L = ψ + µ^T b + λ^T c,

where µ and λ ≥ 0 are multiplier vectors. Necessary first-order optimality conditions are given by the Karush-Kuhn-Tucker (KKT) conditions, which state that a local optimum (y^*, µ^*, λ^*), in addition to being feasible, must satisfy

∇L(y^*, µ^*, λ^*) = 0,    diag(λ^*) c(y^*) = 0,

where ∇ is taken with respect to y. The KKT conditions are sufficient for local optimality under additional second-order conditions.

An SQP method finds successively better approximations {y_k} to the KKT conditions. At each iterate y_k, one solves a quadratic programming (QP) subproblem in which the objective is a modified second-order Taylor expansion of ψ around y_k and the constraints are linearized around y_k, written as

minimize_z   \frac{1}{2} z^T ∇∇^T L(y_k, µ_k, λ_k) z + (∇ψ(y_k))^T z
subject to   b(y_k) + (∇b(y_k)^T)^T z = 0,
             c(y_k) + (∇c(y_k)^T)^T z ≤ 0.

Solving such a subproblem is called a major iteration, and the solution z_k is called a search direction. The next iterate y_{k+1} is then given by y_{k+1} = y_k + α_k z_k, corresponding to a step in the search direction. The step length α_k, in turn, is obtained by approximately solving the minimization problem

minimize_{α > 0}   M(α),

where M is a so-called merit function. The value of α_k is set to an approximate solution found by line search, the iterations of which are called minor iterations. In SNOPT as well as in RayStation's solver, M has the form of an augmented Lagrangian.

One often uses a quasi-Newton approximation to ∇∇^T L in the QP subproblem, as it is often computationally expensive to calculate. This is done in principle by successively updating a low-memory Hessian approximation at each iteration using first-order information. In RayStation, the Hessian approximation is initialized with the diagonal matrix obtained by setting the off-diagonal elements in ∇∇^T L to zero, and is updated in each iteration with vector outer products in a Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [15].


Chapter 3

Construction of smooth objectives

The discrepancy between the dose statistics used for plan evaluation and the actual objective functions used for optimization has been addressed multiple times in the literature. Discussed approaches include an automation of the process of iteratively adjusting planning requirements [12] and knowledge-based methods to predict reasonable dose-volume criteria for a given patient [3]. Engberg et al [9] use an MCO formulation with mean-tail-dose functions, first introduced in IMRT treatment planning by Romeijn et al [17], as a continuous and convex alternative to dose-at-volume. Andersson [2] approximates volume-at-dose by logistic sigmoid functions and dose-at-volume by constructing two active voxel sets and using a smooth approximation to the maximum operator, enabling optimization on plan quality measures.

We present here a general framework which allows for direct optimization on all DVH-based metrics, including volume-at-dose and dose-at-volume functions, min-dose and max-dose functions, DVH functions and mean-tail-dose functions. The resulting functions are continuously differentiable with respect to dose, with easily computable values and gradients.

3.1 An auxiliary random variable

3.1.1 Reformulation of conventional metrics

It is often stated that the dose-at-volume and mean-tail-dose metrics are analogous to their financial risk measure counterparts value-at-risk (VaR) and conditional value-at-risk (CVaR), respectively [9, 17]. For clarity, we shall give the assumptions necessary to claim that the concepts are in fact equal. Making this connection will allow us to proceed to our main idea.

We introduce an auxiliary random variable D, defined according to the following:

Definition 3. The random variable D is the dose given to a point picked uniformly at random across the volume of a given ROI.

If we assume that the dose in each voxel is exactly uniform, D becomes a discrete random variable with possible outcomes {d_i}, each d_i occurring with probability r_i. Denoting the cumulative distribution function of D by F(x), it is easy to see that its complementary cumulative distribution function G(x) = 1 − F(x) is equal to the volume-at-dose VaD(x) at x, that is,

G(x) = VaD(x) = \sum_i r_i θ(d_i − x).

As the value-at-risk VaR_α(X) of a random variable X at confidence level α ∈ (0, 1) is defined as [8]

VaR_α(X) = inf{x ∈ R : P[X ≤ x] ≥ α},

we have the identity

DaV(\hat{v}) = VaR_{1−\hat{v}}(D).

Similarly, as the conditional value-at-risk measure CVaR_α(X) is defined by [8]

CVaR_α(X) = \frac{1}{1 − α} \int_α^1 VaR_{α'}(X) dα',

one can see that

MTD^+(\hat{v}) = CVaR_{1−\hat{v}}(D).

Hence, by setting up D, we have formally made the connection between the conventionally defined plan evaluation criteria and the statistics framework in which VaR and CVaR are defined. In principle, each DVH-based function can be expressed in terms of D and its distribution.

3.1.2 Further assumptions

In order to better suit our purposes, we make the following alterations of the underlying assumptions:

• D is a continuous, instead of discrete, random variable (that is, its cumulative probability distribution function F(x) is continuous at each x ∈ R), and

• the voxel doses d_i are independent and identically distributed sample realizations of D, each with relative frequency r_i, instead of being the only possible outcomes.

Since the physical dose distribution in the patient volume is a smooth distribution in reality and the discretized model is only an idealization, we argue that the above assumptions are fair. A consequence of this is that we now have a bounded probability density function f(x) to work with, which will be central in all of the following.

In order to get specific formulas for f and G, we are faced with the task of finding a nonparametric estimate of the probability density function of D based on the sample d. For the purposes of this paper, we shall use the method of kernel density estimation (KDE) with the Gaussian kernel

k(x) = \frac{1}{ε\sqrt{2π}} e^{−x^2/(2ε^2)},

where ε is a parameter. The density estimate f(x, d) is then given by

f(x, d) = \sum_i r_i k(x − d_i).

Denoting the antiderivative of k by K, which is given by

K(x) = \int_{−∞}^{x} k(x') dx' = \frac{1}{2} \left( 1 + erf\left( \frac{x}{ε\sqrt{2}} \right) \right),

we get that G can be written as

G(x, d) = \sum_i r_i K(d_i − x).

For convenience, we shall not distinguish between the actual density f(x) and complementary cumulative distribution function G(x) and their corresponding kernel density estimates f(x, d) and G(x, d). The latter are henceforth written as f(x) and G(x). Note that since the functions k and K are infinitely differentiable, the above forms of f and G are infinitely differentiable in x as well as in d. Moreover, f has the properties of being strictly positive everywhere and integrating to unity, which in particular implies that it satisfies the requirements of a proper probability density function.
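A minimal sketch of these kernel density estimates in Python/NumPy (illustrative code, not the thesis implementation; epsilon denotes the smoothness parameter ε):

```python
import numpy as np
from scipy.special import erf

def f_kde(x, d, r, epsilon):
    """Density estimate f(x) = sum_i r_i k(x - d_i) with Gaussian kernel k."""
    u = x - d
    k = np.exp(-u**2 / (2 * epsilon**2)) / (epsilon * np.sqrt(2 * np.pi))
    return float(np.sum(r * k))

def G_kde(x, d, r, epsilon):
    """Smooth complementary CDF G(x) = sum_i r_i K(d_i - x),
    where K is the antiderivative of k expressed through the error function."""
    K = 0.5 * (1.0 + erf((d - x) / (epsilon * np.sqrt(2.0))))
    return float(np.sum(r * K))
```

Evaluating G directly at a reference dose gives the smooth volume-at-dose p introduced in the next section.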


Figure 3.1: The Gaussian kernel function k approaches the Dirac delta function when ε → 0, in which case f would be the probability density function of the discrete version of D. Loosely speaking, the shape of k determines the "radius of influence" of each voxel dose, and we shall later see that the gradient of dose-at-volume becomes significant in more voxels as one increases ε.

3.2 Smooth equivalents to volume-at-dose and dose-at-volume

We are now set to introduce the functions p and q as the smooth equivalents to volume-at-dose and dose-at-volume, respectively, whose dependencies on d are omitted. The definitions are essentially analogous to the conventional ones, but with the KDE estimate G(x) of the complementary cumulative distribution function of D acting as a replacement for the DVH curve VaD(x).

3.2.1 Volume-at-dose

Definition 4. For a given reference dose \hat{d}, the smooth equivalent p to VaD(\hat{d}) is given by

p = G(\hat{d}).

Our definition of p essentially coincides with the formulation for volume-at-dose given by Scherrer et al [18] and Andersson [2]. Andersson uses logistic sigmoid functions as smooth replacements for the Heaviside step functions in the formula for volume-at-dose, whereas Scherrer et al specify a general class of such replacements and derive several mathematical properties that such a definition of volume-at-dose satisfies. Scherrer et al prove furthermore that the sublevel sets of the volume-at-dose function are path-connected, which has the consequence that each optimization problem consisting of a single volume-at-dose goal has a connected global minimum to which a gradient-based optimization algorithm is guaranteed to converge. One can use these results to see that the same holds for our smooth formulation of volume-at-dose.

We shall use the notation ∂_i for the componentwise gradient ∂/∂d_i with respect to dose. In terms of k and K, we have the following formulas:

Proposition 1. The smooth equivalent p to volume-at-dose is given by

p = \sum_i r_i K(d_i − \hat{d}),

and its gradient is given componentwise by

∂_i p = r_i k(\hat{d} − d_i).

Proof. This follows directly from the formula for G.

3.2.2 Dose-at-volume

Definition 5. For a given reference volume \hat{v}, the smooth equivalent q to DaV(\hat{v}) is given by

q = G^{−1}(\hat{v}).

Remark. The inverse G^{−1} of G is well-defined everywhere since f = −∂_x G is nonzero.

This corresponds to defining q as the kernel density estimate of the value-at-risk of D at confidence level 1 − \hat{v}. Instead of being discontinuous, as in the case when D was a discrete random variable, it is easy to see that q now depends continuously on d. Moreover, due to the existence of a well-defined probability density function f, it becomes possible to evaluate its gradient with respect to dose at each point where its function value is known. This is formulated in the following:

Proposition 2. The smooth equivalent q to dose-at-volume is uniquely determined by the equation

\sum_i r_i K(d_i − q) = \hat{v},

and its gradient is given componentwise by the nonlinear differential equation

∂_i q = \frac{r_i k(q − d_i)}{\sum_{i'} r_{i'} k(q − d_{i'})}.

Proof. By definition, q is the unique value satisfying

G(q, d) = \sum_i r_i K(d_i − q) = \hat{v},

where we have put in the explicit dependence on d. Furthermore, as ∂_x G(x, d) = −f(x, d) ≠ 0, we have by the implicit function theorem that q is an everywhere defined, continuously differentiable function of d. Differentiating, we have by the chain rule that

0 = ∂_i (G(q, d)) = ∂_q G(q, d) ∂_i q + ∂_i G(q, d) = −f(q, d) ∂_i q + ∂_i G(q, d),

or

∂_i q = \frac{∂_i G(q, d)}{f(q, d)} = \frac{r_i k(q − d_i)}{\sum_{i'} r_{i'} k(q − d_{i'})}.

Remark. Note the difference between ∂_i(G(q, d)) and ∂_i G(q, d), which denote partial differentiation of the composite function d ↦ G(q(d), d) and of the function (x, d) ↦ G(x, d) with respect to d_i, evaluated at (q(d), d), respectively.

Gradients of value-at-risk measures have been studied by e.g. Uryasev [19]. Note that unlike p, one cannot write q as a closed-form expression using k and K since it is defined as an inverse value of G. In practice, however, one can evaluate q numerically for each d by a root-finding algorithm, which is easy since G(x) is strictly decreasing in x. The gradient is then evaluated by the nonlinear differential equation in Proposition 2. Evaluating G(x) multiple times with the same d can be made more computationally efficient.
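A sketch of this evaluation procedure (hypothetical Python code using SciPy's Brent root finder; the bracketing interval is an assumption based on the Gaussian tails decaying within a few ε of the dose range):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import erf

def smooth_dose_at_volume(d, r, v_ref, epsilon):
    """Smooth dose-at-volume q = G^{-1}(v_ref) and its gradient (Proposition 2),
    assuming 0 < v_ref < 1."""
    def G(x):
        return np.sum(r * 0.5 * (1.0 + erf((d - x) / (epsilon * np.sqrt(2.0)))))

    # G is strictly decreasing, so the root of G(x) - v_ref can be bracketed by
    # the dose range padded by a few standard deviations of the kernel.
    lo, hi = d.min() - 10 * epsilon, d.max() + 10 * epsilon
    q = brentq(lambda x: G(x) - v_ref, lo, hi)

    # Gradient entries r_i k(q - d_i) / sum_j r_j k(q - d_j): positive, summing to one.
    k = r * np.exp(-(q - d)**2 / (2 * epsilon**2)) / (epsilon * np.sqrt(2 * np.pi))
    return q, k / k.sum()
```

As ε → 0, q approaches the conventional DaV(\hat{v}), while larger ε spreads the gradient over more voxels.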


Figure 3.2: Visualization of the relation between p and q and their corresponding dose-volume criterion (\hat{d}, \hat{v}).

3.2.3 General properties of q

One can say that the somewhat strange form of ∂_i q is a consequence of the fact that q can only be expressed implicitly. From the formula in Proposition 2, it is easy to make the observation that the entries in its gradient are all positive and sum to unity. An interpretation of this is that the size of each ∂_i q reflects the contribution to the change in q caused by an infinitesimal change in d_i, measured relative to the contributions from all other voxels. Thus, if q is far away from all d_i, all gradient entries will be roughly equal in size, whereas if q is in the middle of the d_i, the gradient will be significant only with respect to the voxel doses lying close to q. This is in contrast to the gradient of p, which is near zero when the voxel doses are far away from \hat{d}. Hence, despite having no closed formula for its value, there is much to suggest that q is a more well-behaved metric than p when formulating dose-volume criteria.

Since f is nonzero, the implicit function theorem guarantees that q is continuously differentiable everywhere with respect to d. In fact, we can say much more:

Proposition 3. q is an infinitely differentiable function of d. Proof. See appendix.

One can also show that q satisfies the same property which Scherrer et al [18] proved to hold for both conventional volume-at-dose functions and p, namely path-connectedness of all sublevel sets, which has the consequence that a gradient-based optimization algorithm is guaranteed to converge to an optimal solution in a problem with only one dose-at-volume goal. This property fails to hold for conventional dose-at-volume functions. We give the proof for the path-connectedness of all sublevel sets of q and refer to Scherrer et al for the other subsequently implicated properties.

Proposition 4. Let S_q(x) denote the x-sublevel set of q, defined as

S_q(x) = {d : q(d) ≤ x}.

Then, for all x, S_q(x) is path-connected.

Proof. First, since ∂_i q > 0 for all i by Proposition 2, q satisfies the monotonicity property

d ≤ d' ⟹ q(d) ≤ q(d'),

where the inequality is considered componentwise. Consider now d, d' ∈ S_q(x) and let min{d, d'} be the vector whose ith component is min{d_i, d'_i} for all i. Let Γ be the line segment connecting d and min{d, d'}, and let Γ' be the line segment connecting d' and min{d, d'}. Note that for each point δ ∈ Γ and δ' ∈ Γ', we have δ ≤ d and δ' ≤ d', respectively. Hence, for each δ ∈ Γ ∪ Γ', we have

q(δ) ≤ max{q(d), q(d')} ≤ x,

so Γ ∪ Γ' ⊆ S_q(x). This proves the path-connectedness of S_q(x) for each x.

One can think of ε in k and K as a smoothness parameter controlling the well-behavedness of p and q, where their conventional definitions are obtained in the limit ε → 0 (see Figure 3.1). Since ε is essentially the standard deviation in the Gaussian kernel k, a higher value corresponds to each voxel "influencing" a larger neighborhood around its value. For example, in the formula for the gradient of q, the higher ε, the more components will have a non-negligible value and contribute to the change in q. There is, however, a tradeoff between the smoothness of q and the agreement between q and the actual value of dose-at-volume, where a larger ε in general also implies a larger discrepancy. The following provides a crude estimate of |q − DaV(\hat{v})|, showing that it is roughly proportional to ε.

Proposition 5. For each reference volume \hat{v}, we have the approximate inequality

|q − DaV(\hat{v})| < ε.

Proof. See appendix.

Remark. Due to the use of approximations in multiple steps in the proof, the resulting inequality can only be seen as a loose estimate on the error. Establishing rigorous bounds which are also "realistic" will in general require further assumptions about the dose distribution. Experiments suggest that this estimate holds for reasonable parameter values and dose distributions.

3.3 Other linear-dose functions

Having constructed p and q along with their respective gradients, we now have a method of optimizing on volume-at-dose and dose-at-volume criteria directly. Here, we give examples of other criteria which can be optimized on directly based on the same formulation.

3.3.1 Min-dose and max-dose criteria

Not to be confused with min-dose and max-dose functions, which are quadratic in dose, by min-dose and max-dose criteria we mean here the requirement that all voxel doses in some ROI be larger than or smaller than some reference dose \hat{d}. We thus need a way to express the minimum and the maximum doses as smooth functions of d. This has traditionally been done by functions such as the log-sum-exp (LSE) function [10] and the weighted power mean (WPM) function [2], but we shall here use a special case of q. In short, the minimum and maximum doses are approximated by setting the reference volume to a number near 1 and 0, respectively. We introduce the notion of a reference relative volume \hat{r}, which is usually set to a typical value of r_i in the ROI. The smooth equivalents q_min and q_max to min d and max d, respectively, are then defined by the following:

Definition 6. Given a reference relative volume \hat{r}, q_min and q_max are given by

q_min = G^{−1}(1 − \hat{r}),    q_max = G^{−1}(\hat{r}).

This corresponds to q_min and q_max being the values of q with the reference volume set to 1 − \hat{r} and \hat{r}, respectively. In practice, a suitable value of \hat{r} depends on the situation, where there is a tradeoff between agreement with the actual value for relatively large \hat{r} and guarantee of criterion satisfaction for relatively small \hat{r}. The latter side of this tradeoff is formulated in the following:

Lemma 1. With the choice \hat{r} = min r / 2, we have that q_min ≤ min d and q_max ≥ max d.

Proof. Letting i' be such that d_{i'} = min d, we have

G(min d) = \sum_i r_i K(d_i − min d) ≤ \sum_{i ≠ i'} r_i + \frac{r_{i'}}{2} = 1 − \frac{r_{i'}}{2} ≤ 1 − \hat{r} = G(q_min),

where we have used the fact that K(0) = 1/2. Since G is strictly decreasing, q_min ≤ min d. The other inequality is proven analogously.


3.3.2 Mean-tail-dose functions

Mean-tail-dose functions have been studied in several papers due to their favorable mathematical properties, such as being continuous and convex. Several methods for optimizing on mean-tail-dose functions have been proposed, including rewriting the function value as a global minimum of a certain function and reformulating the original optimization problem on linear programming form [9, 17]. This might, however, lead to computationally expensive formulations depending on the chosen optimization method. In our framework, it turns out that mean-tail-dose functions and their gradients can be evaluated directly and have particularly neat forms. Denote by q^− and q^+ the smooth equivalents to MTD^− and MTD^+, respectively. We then have the following for q^− (formulas for q^+ are analogous):

Proposition 6. Using the shorthand notation k_i(x) = k(x − d_i) and K_i(x) = K(x − d_i), the smooth equivalent q^− to lower mean-tail-dose is given by

q^− = \frac{1}{1 − \hat{v}} \sum_i r_i (d_i K_i(q) − ε^2 k_i(q)),

and its gradient by

∂_i q^− = \frac{1}{1 − \hat{v}} r_i K_i(q).

Proof. Recall that for a given reference volume \hat{v}, MTD^−(\hat{v}) is equivalent to the mean dose in the lower \hat{v}-tail of the dose distribution. In terms of our random variable D in the discrete case, we can formulate this as the conditional expectation

MTD^−(\hat{v}) = E[D | D ≤ DaV(\hat{v})].

In the continuous case, we thus have

q^− = \frac{1}{1 − \hat{v}} \int_{−∞}^{q} x f(x) dx = \frac{1}{1 − \hat{v}} \sum_i r_i \int_{−∞}^{q} x k_i(x) dx = \frac{1}{1 − \hat{v}} \sum_i r_i (d_i K_i(q) − ε^2 k_i(q)),

where we have integrated by parts and used the fact that ∂_x(x K(x) + ε^2 k(x)) = K(x). For the gradient, we have

∂_i q^− = \frac{1}{1 − \hat{v}} ∂_i \int_{−∞}^{q} x f(x) dx
        = \frac{1}{1 − \hat{v}} \left( \int_{−∞}^{q} x ∂_i f(x) dx + q f(q) ∂_i q \right)
        = \frac{1}{1 − \hat{v}} \left( x ∂_i F(x) \Big|_{−∞}^{q} − \int_{−∞}^{q} ∂_i F(x) dx + q f(q) ∂_i q \right)
        = \frac{1}{1 − \hat{v}} \left( −r_i q k_i(q) + \int_{−∞}^{q} r_i k_i(x) dx + r_i q k_i(q) \right)
        = \frac{1}{1 − \hat{v}} r_i K_i(q),

where we have used Leibniz's integral rule and the fact that f(q) ∂_i q = r_i k_i(q) from Proposition 2.

Since our density estimate f(x) satisfies the requirements for a probability density function, the well-known fact that q^− and q^+ are concave and convex, respectively, follows from [16]. It is easy to verify this directly in our case by computing the Hessian of q^−, which is done in the following:

Lemma 2. q^− as defined in Proposition 6 is concave.

Proof. See appendix.
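In code, the evaluation is equally direct. The sketch below (illustrative Python; q is taken to be the smooth dose-at-volume at the reference volume, computed e.g. as in the root-finding sketch of Section 3.2.2) implements the value and gradient of the smooth lower mean-tail-dose from Proposition 6.

```python
import numpy as np
from scipy.special import erf

def smooth_lower_mean_tail_dose(d, r, v_ref, epsilon, q):
    """Smooth lower mean-tail-dose q^- and its gradient (Proposition 6).

    q is the smooth dose-at-volume at reference volume v_ref; K_i and k_i are
    the Gaussian CDF and PDF shifted to the voxel doses d_i.
    """
    Ki = 0.5 * (1.0 + erf((q - d) / (epsilon * np.sqrt(2.0))))                    # K_i(q)
    ki = np.exp(-(q - d)**2 / (2 * epsilon**2)) / (epsilon * np.sqrt(2 * np.pi))  # k_i(q)
    value = np.sum(r * (d * Ki - epsilon**2 * ki)) / (1.0 - v_ref)
    grad = r * Ki / (1.0 - v_ref)
    return float(value), grad
```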

3.4 Quadratic penalty functions

Although p and q work well for optimizing on dose-volume criteria, there are situations where it is more desirable to use DVH functions as objectives. A key difference between optimizing on particular dose-volume criteria and on DVH-type quadratic penalties is that the former only controls a single point on the DVH curve, whereas the latter penalizes all voxels lying above or below some critical value. Thus, by only optimizing on criteria, there is a risk of obtaining a DVH curve with "sharp corners". We therefore complete the above with derivations of smooth equivalents to DVH functions, with min-dose and max-dose functions as special cases. For brevity, detailed formulas are only presented for the min-DVH case. We also introduce a new type of quadratic penalty intended to work as a volume-based counterpart to DVH functions, which we call volume-based DVH functions.

3.4.1 DVH functions

We start by rewriting the conventional min-DVH function on the form

ψ = \sum_i r_i θ(d_i − DaV(\hat{v})) Θ(d_i − \hat{d})^2 = \sum_i r_i θ(\hat{v} − VaD(d_i)) Θ(d_i − \hat{d})^2,

where the Heaviside step functions work as indicator functions for the active voxel set. As

VaD(x) = \sum_i r_i θ(d_i − x),

we can write ψ as a Riemann-Stieltjes integral

ψ = −\int_{−∞}^{∞} θ(\hat{v} − VaD(x)) Θ(x − \hat{d})^2 dVaD(x).

Replacing VaD(x) with its smooth equivalent G(x), we get instead

ψ = −\int_{−∞}^{∞} θ(\hat{v} − G(x)) Θ(x − \hat{d})^2 dG(x) = \int_{min{q, \hat{d}}}^{\hat{d}} (\hat{d} − x)^2 f(x) dx,

where we have used the fact that G(q) = \hat{v}. We shall use this as our definition of smooth DVH functions.

Definition 7. The smooth equivalent ψ to a DVH function with respect to the dose-volume criterion (\hat{d}, \hat{v}) is given by

ψ = \int_{min{q, \hat{d}}}^{\hat{d}} (\hat{d} − x)^2 f(x) dx   for min-DVH,

ψ = \int_{\hat{d}}^{max{q, \hat{d}}} (\hat{d} − x)^2 f(x) dx   for max-DVH.

Remark. For general-exponent DVH functions, the square in the expression is replaced by an arbitrary integer a.

Before arriving at explicit formulas for ψ and ∂_i ψ, we first prove a useful lemma. Apart from being used in the proof of Proposition 7, this lemma suggests that the gradient of a smooth DVH function does not explicitly depend on the gradient of q.

Lemma 3. For a smooth min-DVH function ψ, we have

∂_i ψ = 2 \int_{min{q, \hat{d}}}^{\hat{d}} (\hat{d} − x) ∂_i F(x) dx.

Proof. It suffices to consider the case q ≤ \hat{d}, since otherwise ψ = 0. Recall that q as well as f(x) depend on d. Replacing min{q, \hat{d}} with q, we have by Leibniz's integral rule that

∂_i ψ = ∂_i \int_q^{\hat{d}} (\hat{d} − x)^2 f(x) dx
      = \int_q^{\hat{d}} (\hat{d} − x)^2 ∂_i f(x) dx − (\hat{d} − q)^2 f(q) ∂_i q
      = −(\hat{d} − q)^2 ∂_i F(q) + 2 \int_q^{\hat{d}} (\hat{d} − x) ∂_i F(x) dx − (\hat{d} − q)^2 ∂_i G(q)
      = 2 \int_q^{\hat{d}} (\hat{d} − x) ∂_i F(x) dx,

where the third equality follows from integration by parts, and where we have used the fact that ∂_i q = ∂_i G(q)/f(q) = −∂_i F(q)/f(q) from Proposition 2.

Explicit formulas for ψ, its gradient and its Hessian, which happens to have a particularly neat expression, are given in the following. In particular, as the Hessian can be written as a sum of a diagonal matrix and a vector outer product, one can store the full Hessian using limited amounts of memory.

Proposition 7. Let ψ be the smooth min-DVH function with respect to a dose-volume criterion (\hat{d}, \hat{v}). Using notation from Proposition 6, we have that whenever q < \hat{d}, its value is given by

ψ = \sum_i r_i ((\hat{d} − d_i)^2 + ε^2)(K_i(\hat{d}) − K_i(q)) + ε^2 \sum_i r_i ((\hat{d} − d_i) k_i(\hat{d}) − (2\hat{d} − q − d_i) k_i(q)),

its gradient by

∂_i ψ = −2 r_i ((\hat{d} − d_i)(K_i(\hat{d}) − K_i(q)) + ε^2 (k_i(\hat{d}) − k_i(q))),

and its Hessian by

∂_i ∂_{i'} ψ = \frac{2(\hat{d} − q)}{f(q)} (r_i r_{i'} k_i(q) k_{i'}(q) − δ_{ii'} f(q) r_i k_i(q)) + 2 δ_{ii'} r_i (K_i(\hat{d}) − K_i(q)).

When q ≥ \hat{d}, ψ is identically zero.

Proof. See appendix.

Remark. Note that the above implies that ψ is at least twice continuously differentiable, since ∂_i ψ and ∂_i ∂_{i'} ψ both vanish at q = \hat{d}, which is the only possible place where discontinuities could occur.
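The closed-form expressions translate directly into code. The sketch below (illustrative Python; q is the smooth dose-at-volume at the reference volume, computed as in Section 3.2.2) evaluates the value and gradient of the smooth min-DVH function; the low-rank structure of the Hessian is not needed by first-order solvers and is omitted here.

```python
import numpy as np
from scipy.special import erf

def smooth_min_dvh(d, r, d_ref, epsilon, q):
    """Smooth min-DVH function and its gradient (Proposition 7).

    Identically zero when q >= d_ref; otherwise evaluated from the Gaussian
    CDF K and PDF k shifted to the voxel doses.
    """
    if q >= d_ref:
        return 0.0, np.zeros_like(d)

    def K(x):
        return 0.5 * (1.0 + erf(x / (epsilon * np.sqrt(2.0))))

    def k(x):
        return np.exp(-x**2 / (2 * epsilon**2)) / (epsilon * np.sqrt(2 * np.pi))

    Kd, Kq = K(d_ref - d), K(q - d)     # K_i(d_ref) and K_i(q)
    kd, kq = k(d_ref - d), k(q - d)     # k_i(d_ref) and k_i(q)

    value = np.sum(r * ((d_ref - d)**2 + epsilon**2) * (Kd - Kq)) \
          + epsilon**2 * np.sum(r * ((d_ref - d) * kd - (2 * d_ref - q - d) * kq))
    grad = -2.0 * r * ((d_ref - d) * (Kd - Kq) + epsilon**2 * (kd - kq))
    return float(value), grad
```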

3.4.2 Min-dose and max-dose functions

For min-dose and max-dose functions, note that they can be seen as special cases of conventional min-DVH and max-DVH functions when the reference volume is set to 1 and 0, respectively. This is equivalent to always including all voxels in the active voxel set. Since G(x) approaches 1 and 0 when x → −∞ and x → ∞, respectively, we have q = −∞ for min-dose and q = ∞ for max-dose. For a smooth equivalent ψ to a conventional min-dose function (formulas for max-dose functions are analogous), we then have the following:

Proposition 8. Let ψ be a smooth min-dose function with respect to a dose-volume criterion (\hat{d}, \hat{v}). Using notation from Proposition 7, its value is given by

ψ = \sum_i r_i ((\hat{d} − d_i)^2 + ε^2) K_i(\hat{d}) + ε^2 \sum_i r_i (\hat{d} − d_i) k_i(\hat{d}),

its gradient by

∂_i ψ = −2 r_i ((\hat{d} − d_i) K_i(\hat{d}) + ε^2 k_i(\hat{d})),

and its Hessian by

∂_i ∂_{i'} ψ = 2 δ_{ii'} r_i K_i(\hat{d}).

Proof. This follows from Proposition 7 by substituting in q = ±∞.

3.4.3 Volume-based DVH functions

DVH functions can be seen as a quadratic penalty on the volume-accumulated deviation in dose from the actual to the desired value. For each volume level, the dose deviation is penalized quadratically, so the resulting objective function is quadratic in dose and linear in volume. As a volume-based rather than dose-based objective, we introduce the concept of a volume-based DVH function (vDVH function), which is instead quadratic in volume and linear in dose. We distinguish between min-vDVH and max-vDVH functions, defined more precisely in the following:

Definition 8. A smooth vDVH function ψ with respect to the dose-volume criterion (\hat{d}, \hat{v}) is given by

ψ = \int_{min{q, \hat{d}}}^{\hat{d}} (\hat{v} − G(x))^2 dx   for min-vDVH,

ψ = \int_{\hat{d}}^{max{q, \hat{d}}} (\hat{v} − G(x))^2 dx   for max-vDVH.

We shall refrain from giving explicit formulas for smooth vDVH functions at this stage. Although possible to express in terms of k and K, due to the square of G occurring in the integrand, the resulting expression contains a non-separable double sum over the voxels, which for most practical purposes is computationally infeasible. Manageable formulas are instead given in Section 3.5.2.2.

3.5 Discrete Fourier transform approximation

The fact that it was not possible to derive a computationally efficient formula for smooth vDVH functions suggests a slight weakness in our framework. In particular, the expressions for f(x) and G(x) do not behave well under translation in x, as one needs to reevaluate the ks and Ks for each different value of x. Even an approximate evaluation of smooth vDVH functions by e.g. Riemann sums would need G(x) to be evaluated at a large number of points, which may be of significant computational expense. Moreover, note that we avoided a derivation of an explicit formula for smooth general-exponent DVH functions; this is due to the fact that k and K are relatively cumbersome to work with, especially in integral expressions.

Thus, in order to improve the properties of f and G under these circumstances, it is natural to instead work in the Fourier domain, motivated by the ambition to facilitate translation operations. The reasoning below shows that it is straightforward and under some circumstances more efficient to evaluate certain functions with f and G written as Fourier integrals. Applications include formulas for smooth general-exponent DVH functions and vDVH functions and a potential speed-up in the root-finding process when evaluating q.

3.5.1 Construction

We start by writing f and G as Fourier transforms in the following way:

Lemma 4. The functions f and G are given by

f(x) = \frac{1}{2π} \sum_i r_i \int_{−∞}^{∞} e^{−ε^2 ξ^2/2} e^{iξ(d_i − x)} dξ,

G(x) = \frac{1}{2} + \frac{1}{2π} \sum_i r_i \int_{−∞}^{∞} e^{−ε^2 ξ^2/2} \frac{e^{iξ(d_i − x)}}{iξ} dξ,

where i is the imaginary unit.

Proof. See appendix.

Remark. Note that the latter integral is convergent since sin(ξ(d_i − x))/ξ is finite at ξ = 0.

Having established Lemma 4, we shall approximate the integral expressions with finite sums, which in practice amounts to having performed a discrete Fourier transform where the integrand is sampled at finite intervals. This is motivated by the observation that f, as a sum of Gaussians, decreases rapidly beyond the range of voxel doses and can be said to have compact support to a good approximation. By the Nyquist-Shannon sampling theorem, f can thus be reconstructed by an infinite series for sufficiently high sampling rates. Furthermore, since the integrand itself is rapidly decreasing, appropriate truncation of the series will not affect numerical accuracy. All of this is formalized in Propositions 9 and 10.

Note that on series form, it becomes possible to perform the summation with respect to i in advance, so that the resulting function of x is only a sum over the integral discretization index. For practical purposes, the number of discretization points can be made many orders of magnitude smaller than the number of voxels. We thus arrive at the following:

Proposition 9. Using the discretization points ξ ∈ {j∆ξ}_{j=−j_max}^{j_max} with suitably chosen parameters, f and G are approximately given by

f(x) ≈ \frac{∆ξ}{2π} + \frac{∆ξ}{π} Re \sum_{j=1}^{j_max} b_j e^{−ij∆ξ x},

G(x) ≈ \frac{1}{2} + \frac{∆ξ}{2π} (\bar{d} − x) + \frac{1}{π} Im \sum_{j=1}^{j_max} \frac{b_j}{j} e^{−ij∆ξ x},

where \bar{d} = \sum_i r_i d_i is the mean dose and

b_j = e^{−ε^2 j^2 ∆ξ^2/2} \sum_i r_i e^{ij∆ξ d_i},   j = 0, 1, ..., j_max.

Proof. See appendix.
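A sketch of the construction (illustrative Python/NumPy; dxi and jmax correspond to ∆ξ and j_max and are assumed to be chosen as in Proposition 10). The coefficients b_j are computed once per dose vector, after which f and G can be evaluated at any x at a cost independent of the number of voxels, which is what makes repeated evaluations, e.g. in the root finding for q, cheaper.

```python
import numpy as np

def dft_coefficients(d, r, epsilon, dxi, jmax):
    """Coefficients b_j = exp(-epsilon^2 j^2 dxi^2 / 2) * sum_i r_i exp(i j dxi d_i),
    for j = 1, ..., jmax (Proposition 9)."""
    j = np.arange(1, jmax + 1)
    phases = np.exp(1j * dxi * np.outer(j, d))          # shape (jmax, number of voxels)
    return np.exp(-0.5 * (epsilon * j * dxi)**2) * (phases @ r)

def f_series(x, b, dxi):
    """Series approximation of the density f at the point x."""
    j = np.arange(1, b.size + 1)
    return dxi / (2 * np.pi) + dxi / np.pi * np.real(np.sum(b * np.exp(-1j * j * dxi * x)))

def G_series(x, b, dxi, d_mean):
    """Series approximation of the complementary CDF G at the point x;
    d_mean is the mean dose sum_i r_i d_i."""
    j = np.arange(1, b.size + 1)
    return 0.5 + dxi / (2 * np.pi) * (d_mean - x) \
               + np.sum(np.imag(b * np.exp(-1j * j * dxi * x)) / j) / np.pi
```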

When choosing appropriate parameter values, one would like the coefficients b_j to approach zero as j approaches j_max. This is controlled by a parameter n_ε, which loosely can be thought of as the number of standard deviations we are taking into account in the Gaussian factor in b_j. The following gives suggestions on parameter value choices and bounds the approximation error in terms of n_ε.

Proposition 10. Given n_ε, with the parameter choices

∆ξ = \frac{2π}{max_i |d_i − x| + n_ε ε},    j_max = \frac{n_ε}{ε ∆ξ},

the error in the Fourier series approximation is strictly bounded by

\frac{1}{ε} \sqrt{\frac{2}{π}} \left( 1 + \frac{1}{n_ε \sqrt{2π}} + \frac{1}{n_ε^2} \right) e^{−n_ε^2/2}.

Proof. See appendix.

Remark. In practice, when evaluating f and G, the "maximum frequency" max_i |d_i − x| can be significantly reduced by first removing terms not affecting the total sum. For example, if numerically precise values are needed only in some interval (as is the case when evaluating q), removing terms with d_i sufficiently far away from the interval reduces the size of max_i |d_i − x| and thus the number of required discretization points.

3.5.2 Derivation of additional formulas

3.5.2.1 General-exponent DVH functions

Recall that a smooth general-exponent min-DVH function is given by

ψ = \int_{min{q, \hat{d}}}^{\hat{d}} (\hat{d} − x)^a f(x) dx,

where a is a positive integer. One can think of a as a parameter where a greater value corresponds to more emphasis on the area around \hat{v} in the objective function, and vice versa. Note that ψ^{1/a} converges to \hat{d} − q as a → ∞, so taking the power 1/a and adjusting a can be thought of as interpolating between dose-at-volume and the square root of the corresponding DVH function. This might be of use considering their difference in behavior during optimization; see Chapter 4 for details.

It is hard (though perhaps not impossible) to find a manageable expression for ψ using only k and K. With the series form in Proposition 9, however, this becomes straightforward.

Proposition 11. The a-exponent smooth min-DVH function ψ is given by

ψ = \frac{∆ξ}{2π} \frac{(\hat{d} − q)^{a+1}}{a + 1} + \frac{a! ∆ξ}{π} Re \sum_{j=1}^{j_max} b_j \left( e^{ij∆ξ\hat{d}} (ij∆ξ)^{−a−1} − e^{ij∆ξ q} \sum_{ℓ=0}^{a} \frac{(\hat{d} − q)^ℓ}{ℓ!} (ij∆ξ)^{ℓ−a−1} \right).

Proof. See appendix.
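A small MATLAB sketch of evaluating this expression, assuming the series form as written above; it reuses b, dxi and jmax from the earlier sketch, while dhat (for d̂), q and the exponent a are assumed inputs with q ≤ d̂. The loop over ℓ keeps the code close to the formula.

```matlab
% Sketch: evaluate the a-exponent smooth min-DVH function from the series
% coefficients b_j (assuming the series form above).
jj = (1:jmax)';
T  = dhat - q;                                   % assume q <= dhat
s  = -1i * jj * dxi;                             % corresponds to -i*j*dxi
bracket = exp(-1i*jj*dxi*dhat) .* s.^(-a-1);     % first term of the bracket
for l = 0:a
    bracket = bracket - exp(-1i*jj*dxi*q) .* (T^l / factorial(l)) .* s.^(l-a-1);
end
psi = dxi/(2*pi) * T^(a+1)/(a+1) ...
      + factorial(a) * dxi/pi * real(sum(b .* bracket));
```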

3.5.2.2 Volume-based DVH functions

Following Definition 8, we now give the series form of a formula for smooth vDVH functions. Being able to optimize on quadratic penalties corresponding to volume-at-dose criteria is likely to be of use, considering the results in Chapter 4. Since j_max is relatively small, the double sum in the following expression does not lead to computational issues.

Proposition 12. The smooth min-vDVH function ψ is given by

$$
\psi = \Big(\hat v - \tfrac{1}{2}\Big)^{2}(\hat d - q)
- \Big(\hat v - \tfrac{1}{2}\Big)\frac{1}{\pi\Delta\xi}\sum_{\substack{j=-j_{\max}\\ j\neq 0}}^{j_{\max}} \frac{b_j}{j^{2} i}\Big(e^{-ij\Delta\xi \hat d} - e^{-ij\Delta\xi q}\Big)
+ \frac{1}{2\Delta\xi}\sum_{\substack{j,\,j'=-j_{\max}\\ j,\,j'\neq 0,\ j+j'\neq 0}}^{j_{\max}} \frac{b_j b_{j'}}{j j' (j+j')}\Big(e^{-i(j+j')\Delta\xi \hat d} - e^{-i(j+j')\Delta\xi q}\Big).
$$

Chapter 4

Numerical results

We present here a selection of numerical results for objective functions based on the smooth metrics. Although these are not sufficiently thorough for drawing definitive conclusions, they illustrate some of the strengths and weaknesses of the involved objective functions. We restricted the study to testing the performance of smooth dose-at-volume and smooth DVH function objectives against conventional DVH function objectives.

The following tests were performed in MATLAB R2017b, with SNOPT 7.0 [11] as optimization solver. The dose deposition matrix P was exported from RayStation and loaded into MATLAB along with relative volume vectors, so that the dose d could be computed from each bixel vector y by ordinary matrix multiplication. The objective functions and their gradients were implemented and supplied to SNOPT's MATLAB interface. The computations were run on a laptop with an Intel Core i7-7820HQ 2.90 GHz CPU and 32.0 GB of RAM.
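The wiring between bixel weights, dose and gradients can be sketched as follows. This is schematic rather than the thesis implementation: smoothMetric is a hypothetical helper returning a metric value together with its gradient with respect to dose, and a handle to a function of this kind is what was supplied to the solver interface.

```matlab
% Schematic composite objective: dose is obtained from the bixel vector y
% through the dose deposition matrix P, and gradients are mapped back to
% bixel space by the chain rule. smoothMetric is a hypothetical helper.
function [val, grad] = compositeObjective(y, P, weights, metrics)
    d = P * y;                                   % dose in all voxels
    val = 0;
    dval_dd = zeros(size(d));
    for k = 1:numel(metrics)
        [psi, g] = smoothMetric(metrics{k}, d);  % value and gradient w.r.t. dose
        val = val + weights(k) * psi;
        dval_dd = dval_dd + weights(k) * g;
    end
    grad = P.' * dval_dd;                        % gradient with respect to y
end
```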

4.1 FMO with prostate patient

4.1.1 Problem formulation

All tests were performed on a prostate patient discretized into 1 006 236 voxels of resolution 3 mm × 3 mm × 3 mm. The tests concerned fluence map optimization of an SMLC plan with 5 coplanar beams, discretized into 2 316 bixels. For the purposes of these numerical tests, planning objectives were formulated as in Table 4.1.


Table 4.1: Problem formulation for the FMO tests. The objectives specified above were included in a composite objective function of the form (2.1).

ROI Func type Ref dose Ref volume Weight

PTV min-DVH 6 700 cGy 0.98 2.0

PTV max-DVH 7 200 cGy 0.02 1.0

Bladder max-DVH 1 800 cGy 0.3 0.1

Rectum max-DVH 2 300 cGy 0.3 0.1

Femoral head (L) max-DVH 1 200 cGy 0.02 0.1

Femoral head (R) max-DVH 1 200 cGy 0.02 0.1

4.1.2 Unconstrained optimization

We first tested the unconstrained version of the problem (2.1), with the weights of the respective objectives given in Table 4.1 and additionally weighted by 10^8/d̂². For conventional and smooth DVH functions, the objective ψ for each ROI was given by the expressions defined in Sections 2.3.1.3 and 3.4.1, respectively, and for direct optimization on the smooth dose-at-volume metric, we used

ψ = Θ(q − d̂)².
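In other words, restating the setup above, the function minimized in the unconstrained tests is the weighted sum

$$
\sum_{k} w_k\, \frac{10^8}{\hat d_k^{\,2}}\, \psi_k\big(d(y)\big),
$$

where k runs over the rows of Table 4.1, w_k are the listed weights and d̂_k the corresponding reference doses.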

All optimizations were initialized with the uniform bixel vector giving a mean dose of 6 950 cGy in the PTV. The major iteration limit was set to 500 and the minor iteration limit to 10 000. For the smooth functions the parameter ϵ was set to 5 cGy.

4.1.2.1 Conventional DVH functions

The optimization ran for 121 major iterations before terminating after numerical difficulties (SNOPT exit flag 41). The final objective function value was 9.0767 and the 2-norm of the dose deviations in the resulting plan was 89.94 cGy. Optimized values are shown in Table 4.2.

Table 4.2: Test results for unconstrained optimization with conventional DVH functions.

ROI Func type Dose-at-volume Ref dose Satisfied

PTV min-DVH 6 639 cGy 6 700 cGy No

PTV max-DVH 7 235 cGy 7 200 cGy No

Bladder max-DVH 1 835 cGy 1 800 cGy No

Rectum max-DVH 2 336 cGy 2 300 cGy No

Femoral head (L) max-DVH 1 225 cGy 1 200 cGy No


4.1.2.2 Smooth DVH functions

The optimization ran for the full 500 major iterations (SNOPT exit flag 32). The final objective function value was 0.49698 and the 2-norm of the dose deviations in the resulting plan was 10.70 cGy. Optimized values are shown in Table 4.3.

Table 4.3: Test results for unconstrained optimization with smooth DVH functions.

ROI Func type Dose-at-volume Ref dose Satisfied

PTV min-DVH 6 697 cGy 6 700 cGy No

PTV max-DVH 7 207 cGy 7 200 cGy No

Bladder max-DVH 1 807 cGy 1 800 cGy No

Rectum max-DVH 2 300 cGy 2 300 cGy Yes

Femoral head (L) max-DVH 1 202 cGy 1 200 cGy No

Femoral head (R) max-DVH 1 198 cGy 1 200 cGy Yes

4.1.2.3 Smooth dose-at-volume

The optimization ran for the full 500 major iterations (SNOPT exit flag 32). The final objective function value was 77 621 and the 2-norm of the dose deviations in the resulting plan was 157.2 cGy. Optimized values are shown in Table 4.4.

Table 4.4: Test results for unconstrained optimization with smooth dose-at-volume.

ROI Func type Dose-at-volume Ref dose Satisfied

PTV min-DVH 6 621 cGy 6 700 cGy No

PTV max-DVH 7 306 cGy 7 200 cGy No

Bladder max-DVH 1 873 cGy 1 800 cGy No

Rectum max-DVH 2 340 cGy 2 300 cGy No

Femoral head (L) max-DVH 1 212 cGy 1 200 cGy No

Femoral head (R) max-DVH 1 208 cGy 1 200 cGy No

4.1.2.4 DVH comparison


Figure 4.1: Comparison of DVHs for the three different objectives in the unconstrained case. The solid, dashed and dotted-dashed lines represent results from conventional DVH functions, smooth DVH functions and smooth dose-at-volume, respectively.

4.1.3 Constrained optimization

Instead of optimizing on the requirements in all ROIs at once, we here tested optimizing on the OARs subject to the constraint that the goals for the PTV were satisfied. This was done by first optimizing on the goals for the PTV and then warm-starting the optimization of the other goals. The objectives for each ROI were the same as in the unconstrained case, and the constraints were formulated as ψ ≤ 0, where ψ was the usual objective when using conventional and smooth DVH functions and ψ = ±(q − d̂) when using smooth dose-at-volume. Note that ψ can be negative only in the latter case. For smooth dose-at-volume, since the units of the constraints and objectives are not the same, the constraints were all scaled by 0.1. The major iteration limit was set to 300 and the minor iteration limit to 10 000.
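For the smooth dose-at-volume case, the PTV constraints can be sketched as follows. This is schematic and not the thesis code: doseAtVolume is a hypothetical helper returning the smooth dose-at-volume q and its gradient with respect to dose for a given volume level, and the 0.1 scaling is the one described above.

```matlab
% Schematic PTV constraints c(y) <= 0 for the smooth dose-at-volume case,
% scaled by 0.1. doseAtVolume is a hypothetical helper returning smooth
% dose-at-volume and its gradient with respect to dose.
function [c, gradc] = ptvConstraints(y, P, ptvVoxels)
    d = P * y;
    [qLow,  gLow ] = doseAtVolume(d(ptvVoxels), 0.98);  % for the min-DVH goal
    [qHigh, gHigh] = doseAtVolume(d(ptvVoxels), 0.02);  % for the max-DVH goal
    c = 0.1 * [6700 - qLow;                             % q at 98% should reach 6700 cGy
               qHigh - 7200];                           % q at 2% should stay below 7200 cGy
    Pptv  = P(ptvVoxels, :);
    gradc = 0.1 * [-(Pptv.' * gLow), Pptv.' * gHigh];   % one column per constraint
end
```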

4.1.3.1 Conventional DVH functions

A solution satisfying the goals for the PTV was found in the first major iteration. The constrained optimization ran for the full 300 major iterations (SNOPT exit flag 32). The final objective function value was 11 274 and the 2-norm of the dose deviations in the resulting plan was 718.8 cGy. The constraint infeasibilities were 6 and 4 cGy, respectively. Optimized values are shown in Table 4.5.

References
