A Partially Observable Markov Decision Process for Breast Cancer Screening

Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master thesis, 30 ECTS | Statistics and Data Mining

2019 | LIU-IDA/STAT-A--19/003--SE

A Partially Observable Markov Decision Process for Breast Cancer Screening

Joshua Hudson

Supervisor: José Peña

Examiner: Anders Nordgaard


Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.



Abstract

In the US, breast cancer is one of the most common forms of cancer and the most lethal. There are many decisions that must be made by the doctor and/or the patient when dealing with a potential breast cancer. Many of these decisions are made under uncertainty, whether it is the uncertainty related to the progression of the patient’s health, or that related to the accuracy of the doctor’s tests. Each possible action under consideration can have positive effects, such as a surgery successfully removing a tumour, and negative effects: a post-surgery infection for example. The human mind simply cannot take into account all the variables involved and possible outcomes when making these decisions. In this report, a detailed Partially Observable Markov Decision Process (POMDP) for breast cancer screening decisions is presented. It includes 151 states, covering 144 different cancer states, and 2 competing screening methods. The necessary parameters were first set up using relevant medical literature and a patient history simulator. Then the POMDP was solved optimally for an infinite horizon, using the Perseus algorithm. The resulting policy provided several recommendations for breast cancer screening. The results indicated that clinical breast examinations are important for screening younger women. Regarding the decision to operate on a woman with breast cancer, the policy showed that invasive cancers with either a tumour size above 1.5 cm or which are in metastasis, should be surgically removed as soon as possible. However, the policy also recommended that patients who are certain to be healthy should have a breast biopsy. The cause of this error was explored further and the conclusion was reached that a finite horizon may be more appropriate for this application.


Acknowledgments

I would like to thank my supervisor José for providing vital feedback all throughout the project and particularly helping me to make the difficult decisions in the lead up to the initial Spring deadline. Thank you for agreeing to stay on as my supervisor in the Autumn semester. I would also like to thank Jennifer and Pedro for their tremendous help over the last year; without their expertise and patience this thesis would not have been possible. I am eternally grateful for all the time you put aside for me.

A big thank you to Combine Control Systems for taking me and Carles on for the project. A special thanks to Lia Silva-Lopez, who along with Jennifer and Pedro, kick-started this whole thesis and remained so positive despite the setbacks. Thank you for being our first point of contact and for organising the trip to Örebro.

Finally I would like to thank Carles for his invaluable support and companionship, all those hours working alongside me at Combine’s Linköping office.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables

1 Introduction
   1.1 Background
   1.2 Literature Review
   1.3 Aim
   1.4 Research questions
   1.5 Delimitations

2 Theory
   2.1 Markov Decision Processes
   2.2 Partially-Observable Markov Decision Processes
   2.3 Solving a POMDP

3 Method
   3.1 Considerations and assumptions regarding breast cancer
   3.2 POMDP formulation
   3.3 Probability models
   3.4 Simulation model
   3.5 Solving the POMDP

4 Results
   4.1 Simulation results
   4.2 POMDP solution

5 Discussion
   5.1 Results
   5.2 Method and Wider Context

6 Conclusion


List of Figures

3.1 Screenshot showing the top-left part of the .POMDP input file
3.2 Screenshot showing the top-left part of the .alpha output file
4.1 Age distribution for in situ cancer incidence: simulated vs. SEER data
4.2 Age distribution for invasive cancer incidence: simulated vs. SEER data
4.3 Plot of 10 simulated patients’ state trajectories over time
4.4 Heatmap showing the natural transition probabilities


List of Tables

3.1 Table showing the CISNET cancer stage definitions
3.2 Observation probability model for "CBE"
3.3 Observation probability model for "mammography"
3.4 Table of the factors affecting quality of life and the sources of information used
3.5 Table showing each factor’s effect on quality of life and duration of the effect


1 Introduction

1.1 Background

According to the World Cancer Research Fund, breast cancer is the most common form of cancer occurring in women worldwide [1]. In the US, roughly 1 in 8 women will develop invasive breast cancer over the course of their lifetime [2]. In 2018, it was estimated that around 40,920 American women would die of breast cancer [2]. However, death rates have been decreasing since 1989, particularly among women under 50, with the progress being attributed not only to treatment improvement, but also to better awareness and early detection rates [2]. The importance of early detection can be summarised by the fact that 90% of women diagnosed with early-stage breast cancer live at least 5 years after the diagnosis, compared to 15% of women diagnosed with the most advanced stage [3]. The problem is that, in many cases, breast cancers are symptom-less and go undetected, all the while the cancer grows and the risk of death increases. In addition, it has been shown that the costs associated with treating breast cancer are significantly lower for patients diagnosed at an earlier stage as opposed to those diagnosed at a late stage [4].

Since early detection has such an impact on outcome, screening for breast cancer is clearly of utmost importance. Screening in the medical sense means testing for a disease in members of the population who have not been diagnosed with, and may not even show symptoms of, the said disease. A mammography, a low-dose x-ray of the breast, is the most common method of screening for breast cancer and the only method proven to detect breast cancer before any symptoms express themselves [5]. Despite its popularity, mammography is far from perfect. It can give false-negative results, leading to a patient’s cancer going undetected, as well as false-positive results, leading to over-diagnosis [6]. There are 2 other main types of screening: breast self-examination (BSE) and clinical breast exam (CBE). Both are manual examinations, performed by the woman herself in the former case and by a doctor in the latter. However, the value of both tests has been a topic of much debate, as their accuracy is much lower than that of a mammography, particularly at early stages of the cancer when the tumour is small [7]. Despite this, these tests have the advantage of coming at low or no cost and requiring minimal effort from the patient.

There are many treatment options for breast cancer. In most cases, once a woman has been diagnosed with breast cancer, she undergoes surgery to remove the whole or part of the breast. Following this primary therapy, the patient can be given adjuvant therapy, such as chemotherapy or anti-hormone drugs. These procedures come at a cost, not only monetary but also in the form of side-effects like hair-loss and fatigue in the case of chemotherapy.

Breast cancer clearly involves a multitude of test and treatment decisions, all made under a high level of uncertainty about the health state of the patient and the accuracy of the tests. Therefore, a probabilistic model would seem appropriate. A Partially Observable Markov Decision Process (POMDP) is a framework for modelling a sequence of decisions, where some or all of the variables involved are not fully observable. It is an extension of a Markov Decision Process (MDP), where all states are assumed known with certainty.

1.2 Literature Review

This kind of model has been applied to health care as early as 1971 [8]. Steimle and Brenton [6] provide a useful guide for building an MDP or a POMDP for medical planning. They give several examples of MDPs/POMDPs applied to chronic disease therapy, where the aim is to decide, at each time epoch, whether to take an action or wait. In these examples, the action is either a treatment action or a screening action. The former aims to have a positive impact on the health of the patient, while the latter aims to improve the belief or certainty of the doctor that the patient does indeed have the disease in question. Death is modelled as an absorbing state, i.e. one that the patient cannot transition out of. Rewards are measured in life years or quality-adjusted life years (QALYs); QALYs take into account the quality of life of the patient’s remaining years. The paper also notes that these examples assume the best treatment or testing procedure is known beforehand, so it is just a case of performing the said procedure or waiting. A more realistic model is put forward by Hauskrecht & Fraser [9], for planning ischemic heart disease therapy. The POMDP here takes into account several competing treatment and testing actions, both of which can have serious consequences. For example, an angiogram, one of the most effective tests to determine the true state of the disease, is invasive and can even lead to the death of the patient. One issue with this framework is that both treatment and "investigative" actions are rewarded in the same way, despite having very different goals. Ayer et al. [5] formulate a finite-horizon POMDP for modelling the breast cancer detection problem, with the aim of determining the optimal mammography screening schedule for women. They consider 2 actions, "Wait" and "Mammography", and assume that a woman will check for breast cancer either by breast self-examination (BSE) or clinical breast examination (CBE). This self-checking is assumed to be done by default every 6 months, the time-step used by the authors. Another assumption used is that a positive mammogram is always followed by a biopsy. A biopsy involves removing tissue from the patient and checking for cancerous cells. It can be assumed to be a perfect test according to Ayer et al., as the sensitivity is close to 1. This means that a patient wrongly diagnosed with breast cancer will not be treated and the costs incurred are therefore only those related to undergoing the biopsy, which is an invasive procedure. A positive biopsy means that the patient moves into a "post-cancer" state, ending the POMDP, and a lump-sum reward (in QALYs) is given based on the expected life of the patient at the time of diagnosis. This was done because of the lack of data regarding women with recurrent breast cancer. Treatment options are not part of the model. Immediate rewards are also given every time-step. The state space is quite simple, with only 6 states: death, no cancer, 2 cancer states (in situ and invasive) and a post-cancer state for each of the 2 cancer states. They use the sensitivity and specificity measures for mammography and BSE/CBE to build the observation model, the observation states being "positive test" and "negative test" for each screening method. The transitions were estimated using a validated microsimulation model of breast cancer epidemiology developed by the Cancer Intervention and Surveillance Modeling Network (CISNET).



1.3 Aim

The goal of the project is to use a POMDP to model the breast cancer diagnosis problem in further detail. This will entail expanding the state space to better reflect the health state of the patient, incorporating different treatment options and dropping the assumption that a woman will check herself automatically and regularly. The POMDP can then be solved to find the optimal policy for screening and commencing the treatment process.

1.4 Research questions

This report will try to answer the following research questions:

1. Can the parameters for a detailed POMDP be set up for breast cancer diagnosis decision-making?

2. Can the detailed POMDP be used to produce a coherent decision policy for breast cancer diagnosis?

3. Is CBE an important factor in breast cancer screening?

1.5 Delimitations

Only breast cancer in women between the ages of 35 and 90 will be considered here.

Primary prevention will not be covered in this paper. In medical terminology, primary prevention focuses on stopping the onset of the disease, while secondary prevention attempts to detect the disease as early as possible. Tertiary prevention involves dealing with the disease and its consequences. Primary prevention is therefore concerned with risk factors that make a woman more susceptible to contracting the disease in the first place. Risk factors for breast cancer include lifestyle factors such as smoking, but also physical characteristics such as the presence of the BRCA1 and BRCA2 genes in the woman’s genetic make-up. Age will be the only risk factor included in the model.

This work will be limited to the diagnosis of the initial tumour in the breast and will not look into the post-surgery phase in detail. Therefore, cancer recurrence is not covered. This is because there is very little data regarding secondary cancers according to Ayer et al. [5], who state that due to this lack of data, parameters cannot be estimated accurately.

Breast cancer, once diagnosed, is almost always treated first by surgery to remove the tumour. This is because removing the breast or part of the breast carries very little risk for the patient, unlike cancers found in other places. At this point, the surgeon is able to measure the size of the tumour directly and count the number of lymph nodes involved. The implication of this is that not only would it be inappropriate to consider surgery as an action that can be chosen or not, but additionally the subsequent decision to select an adjuvant treatment is made using fully-observable information, with regards to the cancer stage at least. Originally the plan was to include the choice of adjuvant therapy in the POMDP, but since this decision is made using complete information and could only be modelled as a one-off decision, it was decided to restrict the POMDP to the screening period, i.e. the time until a patient is correctly diagnosed with cancer and operated upon. The "decision agent" referred to in this report represents the patient and the medical staff involved combined: the distinction between who makes which decision is not explored here, for simplicity. The decision agent is also assumed to be risk-neutral: the distinction between reward and utility is not dealt with.


2 Theory

2.1 Markov Decision Processes

The theoretical background presented in this section is based on the book Decision Making Under Uncertainty by Kochenderfer et al. [10]. A Markov Decision Process or MDP is a framework for modelling a sequence of decisions based on the state of a variable or group of variables. The state, denoted S, evolves stochastically over time. At each time-step t, an action a is selected, resulting in a reward being given, its value depending on the current state and the chosen action. These actions also have an impact on how the state evolves. The Markov Assumption is used, meaning that the state at the next time-step depends only on the state and action taken at the current time-step. The transition model, denoted T, is the name given to the probability model that describes how the state changes over time. It specifies the conditional probability of moving into a state at the next time step, given the current state and the action taken. The formula below shows the transition probability for any pair of state values s_i and s_j:

T(s_j | s_i, a) = Pr(S_{t+1} = s_j | S_t = s_i, A_t = a)

Note that upper case S_t denotes the state at time t, while lower case s_i denotes one of the possible values the state can take. The state variable can be continuous or discrete. In the latter case, the transition model is often summarised in matrix form, with the probability of moving from s_i to s_j appearing in the i-th row, j-th column. The horizon h of an MDP defines how many time-steps the decision sequence contains. It can have a finite value or the MDP can be "infinite-horizon". The infinite horizon is frequently used as it means that the transition and reward models can be considered stationary and so do not change over time. This thesis only considered the infinite-horizon case, so little more will be said regarding finite-horizon cases. The reward model R(s, a) defines what reward is given for taking an action when the state is a certain value. The reward at time t is therefore a function of the state and action at time t. The reward for a decision sequence r_{0:h} is simply the sum of the rewards r_t at each time-step up to the horizon. For infinite-horizon MDPs, where there is an infinite number of decisions in the sequence, a discount factor γ is introduced to make the accumulated reward


finite, as shown below:

r_{0:∞} = Σ_{t=0}^{∞} γ^t r_t
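As a quick illustration of this discounted sum, the geometric weighting can be sketched in a few lines of Python (the function name and numbers are mine, not from the thesis):

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of gamma^t * r_t over a reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

# A constant reward of 1 per step approaches 1 / (1 - gamma) = 20:
print(discounted_return([1.0] * 1000, gamma=0.95))  # ~20.0
```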

The aim of such an MDP is to determine a strategy to get the largest reward over the whole sequence. The concept of strategy is formalised as a policy. A policy π_t(s) selects an action to take at time t based only on the current state s, thanks to the Markov Assumption. For an infinite horizon, the transition and reward models are stationary (not a function of time) so the policies considered will also be stationary: the policy π(s) will be the same for all time-steps. Another important concept in MDPs is the value function. The value function is defined as the expected utility U of executing the policy π when the state is s. For an infinite horizon, the value U^π(s) of executing the policy π given the current state s is calculated using

the following formula:

U^π(s) = R(s, π(s)) + γ Σ_{s'} T(s' | s, π(s)) U^π(s')

where:

• R(s, π(s)) is the immediate reward gained by taking the action π(s) (the action recommended by the policy π given the current state s).

• Σ_{s'} T(s' | s, π(s)) U^π(s') is the expected reward to be gained at the next time-step by executing the policy. The expectation is taken with respect to the transition probabilities given the current state.

The term utility is used here (and not reward) as the future rewards are scaled by the discount factor, meaning that the rewards are regarded with decreasing importance the further into the future they are obtained.
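The fixed point of the equation above can be reached by simply iterating it. A minimal policy-evaluation sketch on a toy 2-state MDP (the transition and reward numbers are invented for illustration):

```python
# Iterative evaluation of a fixed policy pi on a toy 2-state MDP.
import numpy as np

gamma = 0.9
# T_pi[i, j] = T(s_j | s_i, pi(s_i)): transitions under the policy's actions
T_pi = np.array([[0.8, 0.2],
                 [0.1, 0.9]])
# R_pi[i] = R(s_i, pi(s_i)): immediate reward under the policy
R_pi = np.array([1.0, 0.0])

U = np.zeros(2)
for _ in range(1000):
    # One application of the value equation: immediate + discounted expected
    U_new = R_pi + gamma * T_pi @ U
    if np.max(np.abs(U_new - U)) < 1e-10:
        break
    U = U_new
print(U)
```

Because γ < 1 the update is a contraction, so the iteration converges to the unique solution of the linear system U = R_π + γ T_π U.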

The aim of an MDP is to find an optimal policy π*, i.e. a policy that maximises the value function:

π* = argmax_π U^π(s)

The optimal policy is usually obtained using dynamic programming. One of the most common algorithms used is value iteration. Value iteration aims to find the optimal value function and then extract the corresponding optimal policy. The optimal value function U* satisfies the Bellman equation:

U*(s) = max_a [ R(s, a) + γ Σ_{s'} T(s' | s, a) U*(s') ]

Value iteration starts with an estimate of U* and updates it iteratively through the equation above until convergence is reached.
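A minimal value-iteration sketch on an invented 2-state, 2-action MDP (all numbers are illustrative, not parameters from the thesis):

```python
# Value iteration: repeated Bellman updates until the estimate stabilises.
import numpy as np

gamma = 0.9
# T[a, i, j] = T(s_j | s_i, a); R[i, a] = R(s_i, a)
T = np.array([[[0.9, 0.1],
               [0.4, 0.6]],
              [[0.2, 0.8],
               [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [2.0, 0.5]])

U = np.zeros(2)
for _ in range(1000):
    # Q[i, a]: immediate reward plus discounted expected future value
    Q = R + gamma * np.stack([T[a] @ U for a in range(2)], axis=1)
    U_new = Q.max(axis=1)       # Bellman update: best action per state
    if np.max(np.abs(U_new - U)) < 1e-10:
        break
    U = U_new
policy = Q.argmax(axis=1)       # extract a greedy policy from U*
print(U_new, policy)
```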

2.2 Partially-Observable Markov Decision Processes

A Partially-Observable Markov Decision Process is an MDP with an extra layer of uncertainty: here the value of the state is not observed directly. Instead, another variable or set of variables are observed and provide some information regarding the value of the state. These variables form the observation or "observation state", denoted Ω, as opposed to the state or "true state" S. The observation model, denoted O, is the name given to the probability model describing the relationship between the observation state and the true state. It specifies the conditional probability of making a particular observation o, given the true state and action taken (all in the same time-step), as shown in the formula below [10]:

O(o | s', a) = Pr(Ω_t = o | S_t = s', A_t = a)

A POMDP is formally defined by the following components:

• S, the state space, a set of distinct states, which can be discrete or continuous.

• A, the action space, the set of all possible actions the decision agent can execute.

• T : S × A → [0, 1], the transition function, describing the conditional probabilities of reaching a certain state given the action taken and the previous state: Pr(S_{t+1} | S_t, A_t).

• R : S × A → ℝ, the rewards for each state/action combination.

• Ω, the observation space, the set of all possible observations, which the agent perceives after executing an action.

• O : S × A → [0, 1], the observation function, describing the conditional probabilities of receiving a certain observation after taking a certain action, given the current state: Pr(Ω_t | S_t, A_t).

• h, the horizon, which is the number of steps ahead the agent must plan for. Can be infinite or finite.

• γ ∈ [0, 1], a discount factor which scales down the rewards in future time-steps when considering the expected reward of a sequence of actions.
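For illustration, the components above can be collected in a small container class; the field names and array layout are my own convention, not from the thesis or any particular solver:

```python
# A minimal container for the discrete POMDP tuple defined above.
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    T: np.ndarray        # T[a, s, s'] = Pr(S_{t+1} = s' | S_t = s, A_t = a)
    R: np.ndarray        # R[s, a], the immediate reward
    O: np.ndarray        # O[a, s', o] = Pr(Omega_t = o | S_t = s', A_t = a)
    gamma: float = 0.95  # discount factor in [0, 1]

    def validate(self):
        # Each row of T and O must be a probability distribution.
        assert np.allclose(self.T.sum(axis=2), 1.0)
        assert np.allclose(self.O.sum(axis=2), 1.0)
        assert 0.0 <= self.gamma <= 1.0

# Tiny sanity check: 2 states, 1 action, 2 observations, all uniform.
toy = POMDP(T=np.full((1, 2, 2), 0.5),
            R=np.zeros((2, 1)),
            O=np.full((1, 2, 2), 0.5))
toy.validate()
```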

The concept of "belief states" is very important in POMDPs. A belief state, as the name suggests, represents the decision agent’s belief on the state’s value. It is a probability distribution over the state space, giving the current probability of being in each state value. This concept allows a discrete-state POMDP to be viewed as a continuous-state MDP, where the continuous belief state is treated as the state variable. The process of belief updating effectively becomes the (continuous) transition model for the belief state, using Bayes’ rule. The updated belief b_a^o(s') (the belief of the state taking value s' after taking action a and observing o) is calculated as follows:

b_a^o(s') = [ O(o | s', a) / Pr(o | a, b) ] · Σ_{s∈S} T(s' | s, a) b(s)

where:

• Pr(o | a, b) serves as a normalising constant [12]:

  Pr(o | a, b) = Σ_{s'∈S} O(o | s', a) Σ_{s∈S} T(s' | s, a) b(s)

• b(s) is the belief of the state taking value s.

Therefore, for POMDPs, a policy defines what action to take based on the current belief state. The optimal value function U* (corresponding to an optimal policy) can be defined as the solution of the following Bellman equation, for all beliefs b in the belief space [12]:

U*(b) = max_a [ R(b, a) + γ Σ_{o∈Ω} Pr(o | a, b) U*(b_a^o) ]

where:

R(b, a) = Σ_{s∈S} R(s, a) b(s)
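The belief update can be sketched directly from its definition; the toy transition and observation numbers below are invented (loosely evoking a noisy screening test), not the thesis’s parameters:

```python
# Bayes-rule belief update for a discrete POMDP.
import numpy as np

def update_belief(b, a, o, T, O):
    """b: belief over states; T[a, s, s']; O[a, s', o]."""
    predicted = b @ T[a]              # sum_s T(s' | s, a) b(s)
    unnorm = O[a, :, o] * predicted   # times O(o | s', a)
    norm = unnorm.sum()               # Pr(o | a, b), normalising constant
    return unnorm / norm

# 2 states ("healthy", "cancer"), 1 action, a noisy test with 2 outcomes.
T = np.array([[[0.97, 0.03],
               [0.00, 1.00]]])
O = np.array([[[0.9, 0.1],     # healthy: mostly negative results
               [0.2, 0.8]]])   # cancer: mostly positive results
b = np.array([0.95, 0.05])
print(update_belief(b, a=0, o=1, T=T, O=O))  # a positive test raises P(cancer)
```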



2.3 Solving a POMDP

Smallwood and Sondik [13] showed that for finite-horizon POMDPs, U*(b) is a piecewise linear and convex (or PWLC) function, while Sondik [14] later proved that for infinite-horizon POMDPs, it can be suitably approximated by a PWLC function. This means that the belief space can be split up into a finite number of "pieces" or regions, over which the value function will be linear, i.e. hyperplanes in the belief space. The coefficient vector for each hyperplane is called an alpha vector. Each alpha vector has an action associated with it. Value iteration for POMDPs involves approximating U*(b) recursively, each iteration looking one time-step further ahead and finding the alpha vector which maximises the value of b, calculated by taking the dot product of the alpha vector and the belief state as a vector [10]:

U(b) = max_a (α_a^T · b)

The Perseus algorithm is a randomised point-based value iteration algorithm developed by Spaan & Vlassis [12]. Point-based value iteration is a type of approximate algorithm that involves selecting a limited number of points in the belief space and performing back-up stages, each stage finding the optimal alpha vectors associated with each belief point. The value function for any belief (not just the selected belief points) can then be approximated from these alpha vectors [10]. In Perseus, these belief points are initially obtained by simulating the decision agent’s path through the decision process (defined by the POMDP). The key innovation in Perseus compared to previous work is that an even smaller subset of the belief points is then chosen at random for performing the back-up stages, reducing the solving time considerably. Spaan & Vlassis [12] demonstrated in their paper that Perseus performed very well against other solvers in terms of speed and accuracy, even for larger-scale POMDPs.
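The dot-product evaluation of a PWLC value function can be sketched as follows, with invented alpha vectors and action labels:

```python
# Evaluating a PWLC value function from a set of alpha vectors:
# U(b) = max over alpha vectors of alpha . b, each alpha carrying an action.
import numpy as np

alpha_vectors = np.array([[10.0, 0.0],   # e.g. associated with "do nothing"
                          [4.0, 6.0]])   # e.g. associated with "screen"
actions = ["do nothing", "screen"]

def value_and_action(b):
    scores = alpha_vectors @ b          # dot product of each alpha with b
    best = int(np.argmax(scores))       # maximising hyperplane
    return scores[best], actions[best]

print(value_and_action(np.array([0.9, 0.1])))  # belief mostly in state 0
print(value_and_action(np.array([0.2, 0.8])))  # belief mostly in state 1
```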


3 Method

3.1 Considerations and assumptions regarding breast cancer

Quantifying cancer progression

Cancer is an extremely complex disease, whose onset and growth would ideally be modelled at the cellular level. The Cancer Intervention and Surveillance Modeling Network (or CISNET) Breast Working Group [15] quantify the state of a woman’s breast cancer by the size of the primary tumour and the number of lymph nodes that become cancerous. They model the onset as a 0.2 cm tumour appearing, which then grows with time, along with the number of lymph nodes involved.

Cancers are frequently talked about in terms of cancer stages. In this paper a 4-stage classification will be used, determined solely by the tumour size and the number of lymph nodes affected. In reality, defining the cancer stage is more complicated, with more factors and combinations involved, but this simplified classification was used by the CISNET Breast Working Group [15] and so was deemed sufficient. Table 3.1 describes this classification:

Table 3.1: Table showing the CISNET cancer stage definitions

Stage            | in situ | localised | regional | distant
Tumour size (cm) | ≤ 0.95  | > 0.95    | any      | any
# lymph nodes    | 0       | 0         | 1–4      | 5+
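The stage rule in Table 3.1 can be sketched as a small function (the function name is mine):

```python
# Stage classification from tumour size and lymph node count (Table 3.1).
def cancer_stage(tumour_size_cm, n_lymph_nodes):
    if n_lymph_nodes >= 5:
        return "distant"
    if 1 <= n_lymph_nodes <= 4:
        return "regional"
    # No lymph nodes involved: split on the 0.95 cm size threshold.
    return "in situ" if tumour_size_cm <= 0.95 else "localised"

print(cancer_stage(0.5, 0))   # in situ
print(cancer_stage(2.0, 0))   # localised
print(cancer_stage(0.5, 3))   # regional
print(cancer_stage(2.0, 7))   # distant
```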

3.2 POMDP formulation

Actions

There are 4 actions in the action set:

• Do nothing
• Clinical Breast Examination (CBE)
• Mammography
• Surgery



The "CBE" action is a routine examination performed by a medical professional, with the aim of detecting breast cancer. Although the 4th action is denoted "surgery", the action represents the act of diagnosing the patient with cancer. It encompasses the sequence of actions taken by medical staff when the patient is diagnosed with cancer. The assumption made by Ayer et al. [5] was used. They assumed that before beginning surgery, a biopsy is done. This biopsy is assumed to be a perfect test, with 100% accuracy. Therefore if a patient is wrongly diagnosed with cancer, the consequences will relate to the invasive nature of the biopsy and not an unnecessary surgery.

One could argue that in real life, a positive CBE result would automatically lead to the patient going in for a mammography and similarly that a positive mammography result would lead to the patient going in for a biopsy. However, one of the aims in this paper is to see if the system is able to make these decisions on its own, so this progression was not enforced through the POMDP formulation.

States

In total there are 151 states:

• 4 healthy states
• 144 cancer states
• 1 post-cancer state
• 2 death states

All the states can occur naturally, without any action being taken, except for the "post-cancer" state, which is reached when the action "surgery" is taken and the patient was correctly diagnosed with cancer. As stated by Ayer et al. [5], modelling cancer recurrence is very difficult due to the lack of data and the complexity of this phenomenon. Therefore, the period from surgery until death is summarised in a single, absorbing "post-cancer" state.

Each naturally occurring state is a combination of between 1 and 6 variables (also called state factors). The variables are described in the subsections below. The states were numbered from 0 to 150, roughly corresponding to the order of the states attained through the progression of the cancer. Although the stage of the cancer is not explicitly a state factor, it can be obtained using 2 of the state factors present: tumour size and number of lymph nodes involved, following the classification in Table 3.1. For the full state specification, refer to Figure 4.5 in Section 4.2.
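The state count is consistent with a cross-product of the factor levels described below; note that the exact factorisation in this sketch (cancer states as age × tumour size × lymph nodes × ER status, and one healthy state per age category) is my reading of the text, not spelled out explicitly by the author:

```python
# Cross-checking the 151-state count against the factor levels in this
# section: age (4), tumour size (6), lymph node category (3), ER status (2).
from itertools import product

age_levels, size_levels, node_levels, er_levels = 4, 6, 3, 2
cancer_states = list(product(range(age_levels), range(size_levels),
                             range(node_levels), range(er_levels)))
healthy_states = age_levels          # assumed: one healthy state per age band
total = healthy_states + len(cancer_states) + 1 + 2  # + post-cancer + deaths
print(len(cancer_states), total)     # 144 cancer states, 151 in total
```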

"Dead or Alive" variable

This variable can take 3 values: "alive", "dead from breast cancer" and "dead from all causes". The latter two values are of course absorbing states: once the patient reaches one of these states, they are stuck there. The state "dead from all causes" is important to consider for slowly-progressing, chronic diseases such as breast cancer as it may not be worth it to screen a woman above a certain age, if the probability of her dying of natural causes vastly outweighs the probability of her dying from breast cancer [6]. This variable is fully-observable throughout the whole process.

code | value
0    | alive
1    | dead from breast cancer
2    | dead from all causes



Age

Age was discretised into 4 categories. This was done in order to get the most out of the cure probabilities for each treatment option provided by the CISNET Breast Working Group [15]. These probabilities differ depending on the age of the patient, and the values were used to form the transition model. Age is fully-observable throughout the whole process and of course evolves over time.

code | value
0    | age ≤ 50
1    | 50 < age ≤ 60
2    | 60 < age ≤ 70
3    | age > 70

Cancer status

This variable is simply a binary indicating whether or not the patient has breast cancer. It is a hidden variable during the screening phase and changes over time.

code | value
0    | patient does not have breast cancer
1    | patient has breast cancer

Tumour Size

The tumour size variable, although generated as a continuous variable, was discretised into 6 intervals. Initially this was done to match the intervals in the mammography sensitivity table provided by CISNET [15]; one of the bounds was then adjusted to match the threshold separating the "in situ" and "localised" stages of breast cancer used by CISNET, mentioned in Section 3.1. This variable is hidden during the screening phase and evolves over time.

code  value
0     0.2 ≤ TS ≤ 0.5
1     0.5 < TS ≤ 0.95
2     0.95 < TS ≤ 1.5
3     1.5 < TS ≤ 2
4     2 < TS ≤ 5
5     5 < TS ≤ 8

Number of lymph nodes involved

The number of lymph nodes affected by the cancer is an important factor in determining the progression of a cancer: lymph node involvement means that the cancer has metastasised and spread from the initial tumour. This variable was generated as a positive integer, theoretically without an upper bound, but discretised into 3 categories in order to differentiate between all 4 stages of breast cancer, as shown in Table 3.1. This variable is hidden and evolves with time.

code  value
0     no lymph nodes involved
1     between 1 and 4 lymph nodes involved
2     5 or more lymph nodes involved


3.3. Probability models

ER status

The ER status is a binary variable indicating whether the patient is ER-positive or ER-negative. Estrogen receptors (abbreviated ER) are proteins present in normal breast cells and can also be found in cancer cells. A woman is classed as ER-positive if she has cancer cells with these receptors. This has a negative impact on prognosis, as estrogen encourages cancer cell growth [16]. The ER status is also hidden but not tested for in the screening phase, as it applies only to patients with cancer. Once a patient is diagnosed with cancer, the treatment begins: a biopsy is done and the ER status determined with high accuracy. Therefore, this variable is only used to determine the expected life after surgery. It does not evolve over time.

code  value
0     ER-negative
1     ER-positive
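As an illustration of the factored state space, the sketch below enumerates one plausible combination of the state factors described above: age and cancer status for every living state, with the tumour size, lymph node and ER factors applying only to cancer states, plus 3 special states. This is a hypothetical reconstruction that happens to yield 151 states; the exact enumeration used in the thesis is the one shown in Figure 4.5.

```python
# Hypothetical factorisation of the 151-state space (states 0-150).
age_cats = range(4)                     # 0: <=50, 1: (50,60], 2: (60,70], 3: >70
healthy = [(a, 0) for a in age_cats]    # no cancer: only the age factor applies
cancer = [(a, 1, ts, ln, er)
          for a in age_cats
          for ts in range(6)            # 6 tumour-size classes
          for ln in range(3)            # 3 lymph-node classes
          for er in range(2)]           # ER-negative / ER-positive
special = ["post-surgery", "dead from cancer", "dead from all causes"]

states = healthy + cancer + special
print(len(states))                      # 4 + 4*6*3*2 + 3 = 151
```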

Observations

In the screening phase of breast cancer, the aim of the tests (CBE and mammography) is solely to determine whether or not the patient has cancer. There is therefore a single observed variable, taking two values: "positive result" and "negative result". A third observation, "dead", was added to give the 2 death states a corresponding observation in the observation matrix: the probability of each death state producing this observation is of course 1.

3.3 Probability models

Time series data representing patients' histories is usually required to learn the input models of the POMDP, i.e. the initial, transition and observation probability distributions. Ma & Tan [17] suggest a method of estimating these probabilities using the Expectation-Maximisation (EM) algorithm, based on a time series of observations and a time series of actions. However, not even observational data was readily available, so it seemed appropriate to simulate the relevant data instead.

Transition model

Natural progression transitions

Since using EM to obtain all the parameters was not possible, the next best option was to learn each input model individually from data. Steimle and Brenton [6] recommend starting by building the transition model for the natural progression of the disease, before looking into the treatment effects. However, this again requires time series data containing the state of each patient at each point in time. As mentioned previously, it was not possible to access such data, if it exists.

Therefore simulation was required to generate the relevant time-dependent data. Fortunately, CISNET has developed time-dependent function models describing how the tumour grows and how likely lymph nodes are to become affected by the cancer, i.e. the natural progression of the cancer [15]. These could be used to generate the relevant time series data. The details of the simulation model are described in Section 3.4.

This simulated data was then used to estimate the natural progression transitions. Maximum likelihood estimates (MLEs) were used for the transition probabilities. The MLE for the transition probability from state i to j, t̂_ij, is obtained as follows:



2. count the number of times a patient moves from state i to any state, i.e. the number of times the patient is in state i

3. divide the first count by the second, giving the estimate for the transition probability, as shown in the equation below:

t̂_ij = c_ij / Σ_j c_ij
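The counting procedure behind this estimator can be sketched in a few lines. The trajectories below are toy data, not the thesis' simulated cohort.

```python
import numpy as np

# MLE for natural-progression transition probabilities: count transitions
# i -> j in the trajectories, then normalise each row of the count matrix.
n_states = 3
trajectories = [[0, 0, 1, 2, 2], [0, 1, 1, 2, 2]]  # toy state sequences

counts = np.zeros((n_states, n_states))
for traj in trajectories:
    for i, j in zip(traj[:-1], traj[1:]):
        counts[i, j] += 1

# t_hat[i, j] = c_ij / sum_j c_ij; rows with no visits are left as zero
row_sums = counts.sum(axis=1, keepdims=True)
t_hat = np.divide(counts, row_sums, out=np.zeros_like(counts),
                  where=row_sums > 0)
print(t_hat)
```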

The natural progression transition model is used for actions "do nothing", "CBE" and "mammography" as these actions have no effect on the true state of the patient.

Transition model under action "surgery"

When the action "surgery" is executed and the patient is in one of the cancer states, the patient moves to the "post-surgery" state with probability 1. If the patient is in one of the healthy states, the action "surgery" has no impact on the state. This follows from the assumption that a definitive biopsy is performed before surgery: if the patient was misdiagnosed, she will not have surgery and will remain in the screening phase. She could of course develop breast cancer later in life.

Observation model

For the observation probabilities, the sensitivity and specificity measures available in the literature were used. To demonstrate how these measures can be used to set up an observation model, consider the following example with 2 states {cancer, no cancer} and 2 observations, also {cancer, no cancer}. Now consider a screening test for detecting cancer. The probability of observing "cancer" when the true state is also "cancer" is the true positive rate, i.e. the sensitivity of the screening test. The probability of observing "no cancer" when the true state is "no cancer" is the true negative rate, i.e. the specificity of the screening test. The remaining 2 probabilities are simply the complements of the first 2. To summarise:

Pr(O = Cancer | S = Cancer) = sensitivity
Pr(O = No Cancer | S = Cancer) = 1 − sensitivity
Pr(O = No Cancer | S = No Cancer) = specificity
Pr(O = Cancer | S = No Cancer) = 1 − specificity
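The 2-state example above can be written as a small observation matrix; the sensitivity and specificity values below are placeholders, not the values used in the thesis.

```python
import numpy as np

def observation_matrix(sensitivity, specificity):
    # Rows are true states [cancer, no cancer];
    # columns are observations [obs cancer, obs no cancer].
    return np.array([
        [sensitivity, 1 - sensitivity],
        [1 - specificity, specificity],
    ])

O = observation_matrix(sensitivity=0.85, specificity=0.90)
print(O)  # each row sums to 1
```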

Observation model under action "CBE"

The CISNET paper provides a sensitivity table for what they call "clinical surfacing". This refers to the basic screening tests done by patients themselves or by doctors, such as breast self-examination (BSE) and clinical breast examination (CBE). It contains sensitivities for different sizes of tumour, stating that any point between 2 given values can be linearly interpolated. Since the tumour size variable consists of intervals, the sensitivity value at the midpoint of each interval was interpolated and used as representative of that interval. Such detailed sensitivity values for CBE alone were not found, so these values were used as a substitute for the CBE sensitivities.

To match the "clinical surfacing" sensitivities, the specificity for breast self-examination, sourced from [18], was used. This follows from the logic that a woman who performs a breast self-examination and gets a negative result will not follow up with a doctor's visit for a clinical breast examination.



Table 3.2: Observation probability model for "CBE"

Tumour Size (cm)  Obs=Cancer  Obs=No Cancer
0                 0.126       0.874
0.2-0.5           0.009       0.991
0.5-0.95          0.036       0.964
0.95-1.5          0.063       0.937
1.5-2             0.185       0.815
2-5               0.600       0.400
5-8               0.900       0.100

This means that each state gets the observation probabilities corresponding to the value of its tumour size state factor.

Observation model under action "mammography"

The CISNET paper also provides a sensitivity table for a mammography, containing different values for different sizes of tumours and for women both below and above the age of 50. The authors state that any point between 2 given values can be linearly interpolated [15]. Again, since the tumour size variable consists of intervals, the sensitivity values of the midpoint of each interval were interpolated and taken as the representative value.
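The midpoint interpolation used for both the CBE and mammography sensitivities can be sketched with `np.interp`. The reference grid and sensitivity values below are hypothetical placeholders, not the CISNET table values.

```python
import numpy as np

# Hypothetical reference points: sensitivity at a few tumour diameters (cm).
ref_sizes = np.array([0.2, 0.5, 1.0, 2.0, 5.0, 8.0])
ref_sens = np.array([0.15, 0.30, 0.55, 0.80, 0.95, 1.0])

# The tumour-size intervals of the state space; each interval is represented
# by the sensitivity linearly interpolated at its midpoint.
intervals = [(0.2, 0.5), (0.5, 0.95), (0.95, 1.5), (1.5, 2), (2, 5), (5, 8)]
midpoints = np.array([(a + b) / 2 for a, b in intervals])
interval_sens = np.interp(midpoints, ref_sizes, ref_sens)
print(dict(zip(intervals, np.round(interval_sens, 3))))
```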

The specificity for a mammography was sourced from [19].

Using this information, the observation probabilities were obtained; they are displayed in Table 3.3.

Table 3.3: Observation probability model for "mammography"

                  Age ≤ 50                    Age > 50
Tumour Size (cm)  Obs=Cancer  Obs=No Cancer   Obs=Cancer  Obs=No Cancer
0                 0.012       0.988           0.012       0.988
0.2-0.5           0.200       0.800           0.300       0.700
0.5-0.95          0.407       0.593           0.495       0.505
0.95-1.5          0.632       0.368           0.745       0.255
1.5-2             0.750       0.250           0.850       0.150
2-5               0.920       0.080           0.945       0.055
5-8               0.995       0.005           0.995       0.005

This means that each state gets the observation probabilities corresponding to the values of its tumour size and age state factors, with the four age categories merged into two classes.

Observation model under action "surgery"

When the patient undergoes surgery, the tumour is removed, allowing it to be measured directly. The number of lymph nodes involved is also visible to the surgeon. In the case where the patient was wrongly diagnosed with breast cancer, the pre-surgery biopsy is assumed to be 100% accurate. Therefore the observations under the "surgery" action are assumed to be perfect, whatever the state of the patient.

Initial model

The initial probability model represents the probability of being in each state at the start of a woman's life with respect to breast cancer, i.e. at age 35. Since a woman has a negligible probability of getting breast cancer before this age [20], the initial probability model sets the patient in state 0, corresponding to a healthy woman aged between 35 and 50, with probability 1.



Rewards

Quality-adjusted life years

The basis of the reward model is very simple: as the time-step is 6 months, the patient gains 0.5 life years if she is alive and 0 if she is dead. However, the quality of the half-year lived must be accounted for too: a 70-year-old woman with distant stage cancer will not experience the same quality of life as a 40-year-old healthy woman. Health-related quality of life (HRQoL) is a score from 0 to 100 used to express this difference. It allows life years to be converted into quality-adjusted life years (QALYs), by using the score to scale the life year quantity. These scores are measured through questionnaires which evaluate various aspects of a woman's health. There are several questionnaires used in breast cancer HRQoL evaluation studies, the most common being the EORTC QLQ-C30 [21]. However, even when using the same questionnaires, the results vary greatly from one study to the next. This is because the symptoms and side-effects experienced, as well as how one deals with them, can change dramatically from one woman to the next, so HRQoL values are by nature extremely subjective. They must therefore be treated with caution. In addition, the average woman does not score 100% on the HRQoL scale, so scaling is key. The HRQoL values were also sourced from different studies, making it vital to compare a baseline score in each study to keep the resulting reward model as consistent as possible. Table 3.4 shows all the factors used to calculate the HRQoL score for each state/action combination and where the values were sourced from.

Table 3.4: Table of the factors affecting quality of life and the sources of information used

Factor                 Values                       Source
Age                    ≤50, (50:60], (60:70], >70   Hinz et al. [22]
Cancer                 With vs Without              Karlsen et al. [23]
Cancer Stage           0-II vs III-IV               Ivanauskiene et al. [24]
Mammography            Done vs Not done             Bonomi et al. [25]
Biopsy                 Done vs Not done             Bonomi et al. [25]
Surgery (+ radiation)  Done vs Not done             Bonomi et al. [25]
Chemotherapy           Done vs Not done             Bonomi et al. [25]
Tamoxifen therapy      Done vs Not done             Day et al. [26]

For each factor, a baseline value was chosen from each study and the percentage change with respect to this baseline was used as representative of the effect of this factor. These percentages could then be combined incrementally to calculate the HRQoL scores. Karlsen et al. [23] provide the percentage decrease in quality of life (caused by a cancer diagnosis) in their paper, so this was used directly.

State-based

First, the reward for each state needed to be set up, before looking at the effects of each action. The rewards here depend only on the state of the patient. Before looking at the effects related to cancer, HRQoL values for the healthy states needed to be found, to use as a baseline. Hinz et al. [22] provide such baseline values for European women using the EORTC QLQ-C30 framework. They show that the HRQoL score decreases with age.

Next, the effect of the different stages of cancer was needed. Ivanauskiene et al. [24] compare HRQoL scores for early-stage against late-stage breast cancer patients, but their study does not include any women without breast cancer for comparison. Therefore the impact of a cancer diagnosis on HRQoL was taken from Karlsen et al. [23]; this was used as the HRQoL change between healthy states and early-stage cancer states. The percentage change between early-stage and late-stage cancer states could then be found using Ivanauskiene et al. [24].

Multiplying the final HRQoL scores by the life years accrued between time-steps gave the rewards for each state. The post-cancer and 2 death states have reward 0.



Action-based

The state-based rewards represent the rewards for taking the action "nothing". Next, the impact of the 3 other actions needed to be accounted for. The method for calculating these was a little different, as the duration of the actions needed to be considered. Bonomi et al. [25] provide HRQoL scores for various procedures related to breast cancer screening, diagnosis and treatment. The duration over which each procedure impacts the patient's quality of life was also obtained from this paper.

According to Bonomi et al. [25], a screening mammography (with unknown results) has an impact on HRQoL for 3 weeks, as that is the time it takes to get the results back. The impact on HRQoL was gauged using the HRQoL score for a true negative mammography result as the baseline.

The impact of a clinical breast examination on HRQoL was not covered by Bonomi et al., but given that the reasons for loss of quality of life are the same as for a mammography (stress over having the check-up, discomfort during the examination, and so on), it was decided to use the same HRQoL loss as for mammography, but with a duration of 1 day instead of 3 weeks, to represent the visit to the doctor's and the immediate result.

For the "surgery" action, the reward model was a little more complicated. It is assumed that before a breast cancer surgery, a patient will undergo a biopsy to make sure the patient was diagnosed correctly. The biopsy is assumed to be 100% accurate. If the diagnosis is confirmed, the patient then undergoes surgery followed by potential adjuvant therapy: she moves into the absorbing post-cancer state with probability 1. If not, the patient does not actually have surgery and returns to the general population pool, albeit with certainty that she does not have breast cancer at that point in time: she remains in whatever healthy state she was in. The reward for the action "surgery" therefore depends not only on the starting state, but also the end state.

Bonomi et al. [25] give the HRQoL score for 2 weeks of diagnostic tests, including a needle biopsy. However, a study on breast cancer waiting times by Selove et al. [27] estimated that the mean waiting time between a positive screening and a biopsy was about a month, so this was included in the duration of this process, extending the duration of quality of life loss to 6 weeks. This value was used to calculate the HRQoL loss incurred from taking the action "surgery" when the patient is in one of the four healthy states. She remains in the same state she was in.

If the action "surgery" is taken when the patient is in one of the cancer states, meaning she was correctly diagnosed, the patient moves into the "post-cancer" state. An immediate reward associated with the 2 weeks of diagnostic tests and the 4 weeks between diagnosis and the consequent biopsy is given. Since the patient does have cancer in this case, the waiting time between biopsy and surgery must also be accounted for. Selove et al. [27] estimate an average waiting time of a month for this, so the duration over which quality of life is affected by pre-surgery tests and waiting was extended from 6 weeks to 10.

Since this state is absorbing, the reward for being in this state is 0, as for the 2 death states. However, to represent the patient's length and quality of life after surgery, an immediate reward is given for moving from any cancer state to the post-cancer state. This reward is the expected accumulated reward from surgery until death. It depends not only on the state at the time of diagnosis, but also on the adjuvant treatment chosen (if any) after the patient has undergone surgery. Because the state of the patient becomes fully observable during surgery, the decision on choosing an adjuvant therapy is made under complete information. The only uncertainty remaining is whether or not the treatment (surgery with adjuvant therapy) will cure the patient of cancer for good. The CISNET Breast Working Group [15] estimated the cure probabilities for several adjuvant therapies, including the option where no adjuvant therapy is done after surgery. They based these values on the 10-year survival rates of real patients who underwent each type of treatment. They provide different values



for each treatment depending on the age of the patient, the cancer stage and ER status. The adjuvant therapy options considered by CISNET are the following:

• No adjuvant therapy
• Chemotherapy
• 2 years tamoxifen (ER hormone-suppressing drug)
• 5 years tamoxifen
• Chemotherapy + 2 years tamoxifen
• Chemotherapy + 5 years tamoxifen

These values were used to determine the best adjuvant therapy given the state of the patient, selecting the therapy that maximises the expected reward. After undergoing one of these adjuvant therapies, the cancer patient is assumed to either:

• move into the corresponding healthy state (given the age category) with probability p_cure, or
• stay in the same cancer state with probability (1 − p_cure)

The expected reward (in QALYs) accumulated from the point of surgery until death was estimated using the simulation model. First, the state-based reward model was applied to the simulated patients. Then for each state, the mean accumulated reward was calculated. The expected reward for taking adjuvant therapy "chemotherapy" was then obtained using the following equation (chemotherapy is used as an example):

ER(S = cancer, A = chemo) = p_cure(chemo) · ER(S = healthy) + (1 − p_cure(chemo)) · ER(S = cancer) − cost(chemo)

In the end, chemotherapy was the only adjuvant therapy deemed to have a HRQoL cost, as Day et al. [26] state in their study that tamoxifen's impact on quality of life is negligible. For this reason, the tamoxifen therapy HRQoL score found in Bonomi's paper [25] was used as the baseline for estimating the impact of chemotherapy on HRQoL. The QALY cost of chemotherapy was this value multiplied by the duration of 4 months.

It was assumed that the doctors are aware of this best choice of therapy and select it. The reward for moving from each cancer state to the post-cancer state was the expected reward based on selecting this best adjuvant therapy. A discount factor was applied to the accumulated rewards obtained through simulation to match the discount factor of the POMDP model, to keep the rewards consistent. The selection of the discount factor is detailed in Section 3.5.
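The therapy-selection step can be sketched as follows. All numbers below (cure probabilities, expected rewards, therapy costs) are illustrative placeholders, not the CISNET estimates.

```python
# Select the adjuvant therapy that maximises the expected post-surgery reward,
# following the equation ER = p_cure*ER(healthy) + (1-p_cure)*ER(cancer) - cost.
er_healthy = 14.0   # hypothetical expected QALYs if cured
er_cancer = 6.0     # hypothetical expected QALYs if not cured

therapies = {
    # name: (p_cure, QALY cost of the therapy itself) -- placeholder values
    "none": (0.40, 0.0),
    "chemo": (0.55, 0.079),
    "tamoxifen 5y": (0.50, 0.0),
    "chemo + tamoxifen 5y": (0.60, 0.079),
}

def expected_reward(p_cure, cost):
    return p_cure * er_healthy + (1 - p_cure) * er_cancer - cost

best = max(therapies, key=lambda t: expected_reward(*therapies[t]))
print(best)  # the therapy with the highest expected reward
```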

To summarise, Table 3.5 shows each factor, its effect on quality of life, the baseline HRQoL used and the duration of the effect. Note that state-based HRQoL effects do not have a duration, only the action-based ones.


3.4. Simulation model

Table 3.5: Table showing each factor’s effect on quality of life and duration of the effect

Factor Type   Factor                  Baseline           HRQoL effect  Duration
State-based   Age ≤ 50                Age ≤ 50           0%            N/A
              50 < Age ≤ 60           Age ≤ 50           -3.4%         N/A
              60 < Age ≤ 70           Age ≤ 50           -6.2%         N/A
              Age > 70                Age ≤ 50           -13.6%        N/A
              Cancer                  Healthy            -2.1%         N/A
              Cancer Stage 0-II       Cancer Stage 0-II  0%            N/A
              Cancer Stage III-IV     Cancer Stage 0-II  -20.7%        N/A
Action-based  CBE                     TN mammography*    -9.76%        1 day
              Mammography             TN mammography     -9.76%        3 weeks
              Biopsy                  TN mammography     -37.9%        2 weeks
              Diagnosis->biopsy wait  TN mammography     -37.9%        4 weeks
              Biopsy->surgery wait    TN mammography     -37.9%        4 weeks
              Surgery (+radiation)    Tamoxifen therapy  -44.8%        6 months
              Chemotherapy            Tamoxifen therapy  -23.7%        4 months

*TN mammography refers to the quality of life score associated with receiving a true negative result from a mammography.

The full reward model can be found in Appendix I.

3.4 Simulation model

The simulation model presented here is based in large part on the models published by CISNET [15]. The model was implemented as a Python object. One key difference is that the CISNET models do not account for women who never get cancer: as the CISNET group were interested in modelling how breast cancer evolves with time, all their models start from the point at which a woman gets breast cancer. Because the aim of the POMDP is to determine whether or not a patient has cancer, i.e. to diagnose the patient, the model presented here does take into account the probability of a woman getting cancer at all, and includes women who never get cancer.

The Variables

Time component:

Time is measured in months and ranges from 35 years (420 months) to 90 years (1080 months) of age.

Single value

• ER status: positive or negative (if the patient gets cancer)
• Age of cancer onset (if the patient gets cancer)
• Age of death

Time series

• Cancer state:
  0 = no cancer
  1 = in situ
  2 = localised
  3 = regional
  4 = distant
  5 = dead from breast cancer
  6 = dead from all other causes

The process

In the following sections, each aspect of the simulator will be described one by one.

Cancer initiation

Every 12 months, a Bernoulli test is performed using as success probability the age-dependent probability of getting breast cancer over the next year. These probabilities were sourced from cancer.gov, using their Breast Cancer Risk Tool [20]. The age at which the patient gets cancer (if she does) is saved.
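This yearly test can be sketched as below. The onset probabilities are simple placeholders, not the Breast Cancer Risk Tool values used in the thesis.

```python
import random

def annual_onset_prob(age):
    # Hypothetical age-dependent probability of cancer onset in the next year.
    return 0.0005 if age < 50 else 0.002

def simulate_onset_age(rng, start=35, end=90):
    # Every 12 months, a Bernoulli draw decides whether cancer starts.
    for age in range(start, end):
        if rng.random() < annual_onset_prob(age):
            return age  # age at cancer onset
    return None         # the woman never gets breast cancer

rng = random.Random(0)
onsets = [simulate_onset_age(rng) for _ in range(10000)]
incidence = sum(o is not None for o in onsets) / len(onsets)
print(round(incidence, 3))
```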

Tumour growth

At the time of cancer onset, the tumour size is initialised at 0.2 cm. An initial growth rate α is drawn from the gamma distribution:

α ∼ Γ(10, 0.1)

From this point onward, the tumour diameter in cm evolves following a Gompertz growth function with formula:

d(t) = d0 · exp( ln(dmax / d0) · (1 − e^(−αt)) )

where:
• d(t) is the diameter of the tumour at time t
• d0 is the initial diameter (0.2 cm)
• dmax is the maximum diameter for a tumour (set to 8 cm by CISNET)

Using this function, a time series of tumour sizes is obtained.

Note that the tumour is assumed to be a perfect sphere, so an equivalent Gompertz growth function can be written in terms of the volume instead:

V(t) = V0 · exp( ln(Vmax / V0) · (1 − e^(−αt)) )
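The diameter curve can be evaluated directly. In the thesis α is drawn from Γ(10, 0.1); a fixed illustrative value is used here.

```python
import math

d0, d_max = 0.2, 8.0   # cm, per CISNET
alpha = 1.0            # illustrative growth rate, not a drawn value

def diameter(t):
    # d(t) = d0 * exp(ln(d_max / d0) * (1 - exp(-alpha * t)))
    return d0 * math.exp(math.log(d_max / d0) * (1 - math.exp(-alpha * t)))

sizes = [diameter(t / 12) for t in range(121)]  # 10 years, monthly steps
print(round(sizes[0], 3), round(sizes[-1], 3))  # starts at 0.2, approaches 8
```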

Tumour growth rate

By differentiating the tumour volume's Gompertz formula, the following formula for the growth rate is obtained:

V′(t) = e^(−αt) · V0 · α · (Vmax / V0)^(1 − e^(−αt)) · ln(Vmax / V0)
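As a sanity check of this derivative, the analytic V′(t) can be compared against a central finite difference of the volume curve; α is again fixed for illustration.

```python
import math

V0 = (4 / 3) * math.pi * 0.1 ** 3    # volume of a sphere of diameter 0.2 cm
Vmax = (4 / 3) * math.pi * 4.0 ** 3  # volume of a sphere of diameter 8 cm
alpha = 1.0                          # illustrative growth rate

def V(t):
    return V0 * math.exp(math.log(Vmax / V0) * (1 - math.exp(-alpha * t)))

def V_prime(t):
    # V'(t) = exp(-at) * V0 * a * (Vmax/V0)^(1 - exp(-at)) * ln(Vmax/V0)
    return (math.exp(-alpha * t) * V0 * alpha
            * (Vmax / V0) ** (1 - math.exp(-alpha * t))
            * math.log(Vmax / V0))

t, h = 2.0, 1e-6
numeric = (V(t + h) - V(t - h)) / (2 * h)  # central finite difference
print(abs(numeric - V_prime(t)))           # should be tiny
```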


3.5. Solving the POMDP

Limited malignant potential tumours

According to CISNET, around 42% of incident tumours show limited growth and are not dangerous for the patient. These are named limited malignant potential (LMP) tumours and are assumed to be 100% curable. To model this phenomenon, 42% of all tumours stop growing once their diameter reaches 1 cm and disappear 2 years later, representing their regression over time.

Additional lymph nodes

At the time of cancer onset, the number of lymph nodes involved is set to 0. From then on, every year a Bernoulli test is performed to determine whether or not an additional node becomes affected by the cancer. The probability used in this test is a function of the current tumour volume and the current tumour growth rate:

n(t) = 0.0058 + 0.0052 · V(t) + 0.0002 · V′(t)

Using this method, a time series of the number of lymph nodes involved at each time-step was obtained.
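The yearly node-accrual test can be sketched as follows; the volume and growth-rate series below are toy stand-ins for the Gompertz curve and its derivative.

```python
import random

def node_prob(V_t, Vp_t):
    # n(t) = 0.0058 + 0.0052*V(t) + 0.0002*V'(t), capped at 1
    return min(1.0, 0.0058 + 0.0052 * V_t + 0.0002 * Vp_t)

def simulate_nodes(volumes, growth_rates, rng):
    # One Bernoulli draw per year; at most one new involved node per draw.
    nodes, history = 0, []
    for V_t, Vp_t in zip(volumes, growth_rates):
        if rng.random() < node_prob(V_t, Vp_t):
            nodes += 1
        history.append(nodes)
    return history

# Toy yearly volume (cm^3) and growth-rate series for a growing tumour.
volumes = [0.1, 1.0, 5.0, 20.0, 60.0, 120.0]
growth_rates = [0.5, 2.0, 10.0, 30.0, 60.0, 50.0]
history = simulate_nodes(volumes, growth_rates, random.Random(1))
print(history)
```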

Hyper-aggressive tumours

A small proportion of all breast cancers are assumed to spread much more quickly than usual. These are referred to as hyper-aggressive tumours. To model these, 1% of non-LMP tumours are initialised with 4 involved lymph nodes instead of 0, and 2% of non-LMP tumours are initialised with 5.

Dying from breast cancer

To simulate the patient dying from breast cancer, the simplified approach used by CISNET was followed: the patient is assumed to die a certain time period after reaching stage 4, the distant stage of cancer. The number of years after which a stage 4 cancer patient dies is drawn using the inverse CDF method and the empirical CDF formed by CISNET.
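The inverse-CDF draw can be sketched as below. The CDF values are made up; CISNET's empirical CDF would be substituted in practice.

```python
import bisect
import random

years = [1, 2, 3, 4, 5, 10]                  # possible survival durations
cdf = [0.30, 0.55, 0.70, 0.80, 0.90, 1.00]   # hypothetical cumulative probs

def draw_survival(rng):
    u = rng.random()                           # uniform draw on [0, 1)
    return years[bisect.bisect_left(cdf, u)]   # smallest duration with CDF >= u

rng = random.Random(2)
samples = [draw_survival(rng) for _ in range(10000)]
p_one = samples.count(1) / len(samples)
print(round(p_one, 3))  # close to the first CDF value, 0.30
```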

All cause death

To simulate a patient dying from causes other than breast cancer, a Bernoulli test is performed every year using age-dependent probabilities of dying over the next year. However, it must be noted that these probabilities cover all causes of death, including breast cancer, as probability values for all causes except breast cancer were not readily available. The probabilities were sourced from the US Social Security Administration website [28]. If the patient is still alive at the end of the simulation, she is assumed to die at the very next time-step after reaching age 90.

3.5 Solving the POMDP

Choice of horizon and discount factor

The horizon of a POMDP is the number of decision steps to be considered when solving the POMDP. It can be a finite number of time-steps, or the POMDP can be "infinite horizon". Using an infinite horizon is usually simpler, as the policy can then be assumed stationary [29]. Initially, it would seem to make sense to use a finite horizon, seeing as the decisions may change depending on how close the patient is to death. Ayer et al. use a finite horizon in their POMDP for breast cancer screening [5], to represent the ageing of the patient. Since the process considered here runs from age 35 to age 90 with a time-step of 6 months, a horizon of 110 would be appropriate. However, the POMDP presented here has a much larger state space than the one used by Ayer et al. [5]. Cassandra's software [30] includes the option of setting a finite horizon, but all the implemented algorithms are exact methods. Solving the POMDP with a finite horizon was attempted using this software, but this proved to take too long for a problem of this size. The approximate solvers available did not offer the option of a finite horizon. In addition, the age of the patient is already represented in the state of the POMDP presented here, meaning that the horizon is not necessary for "ending" the decision process. For these reasons, the horizon was set to infinite.

Since an infinite horizon was used, choosing an appropriate discount factor was necessary to make the summed reward finite. The discount factor γ, a value between 0 and 1, is introduced for this very purpose in infinite-horizon POMDPs. Its value is very important, as it determines how important long-term rewards are compared to short-term ones: the reward gained t steps ahead is scaled by the discount factor to the power of t, γ^t. However, the literature is not very specific about determining the best discount factor for a given problem: most applications use a generic value of 0.95 or 0.99. The larger of the two values, 0.99, was chosen, as using a discount of 0.95 would reduce the later rewards to insignificant values.
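The difference between the two candidate values is easy to see numerically: over the 110 half-year steps that the screening process spans, 0.95 shrinks the final rewards to a negligible weight while 0.99 keeps them meaningful.

```python
# Weight of a reward received 110 time-steps (55 years) ahead.
horizon = 110
for gamma in (0.95, 0.99):
    print(gamma, gamma ** horizon)
```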

Solving the POMDP

The POMDP was solved using Erwin Walraven's implementation of the Perseus algorithm in Java [31]. The software requires as input a .POMDP file, a text file in Cassandra's POMDP format. This format, used by Cassandra in his solver implementation [30], allows the user to specify all the POMDP parameters except for the horizon, which is a command-line option in Cassandra's software but is not an available parameter in Walraven's implementation. Figure 3.1 shows part of the final POMDP file as an example.



The output is a .alpha file, a text file in Cassandra's value function file format [30]. It contains the final alpha vectors representing the optimal value function obtained by Perseus. Figure 3.2 shows part of the output file obtained as an example. As displayed in the figure, each alpha vector contains 151 values, one for each possible state, with the associated action appearing above the vector. As mentioned in Section 2.3, the value for a given belief state is calculated by taking the dot product of the belief vector with each alpha vector and finding the maximum. The action recommended by the policy is the action which maximises the value for this belief state.
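Evaluating such a value function can be sketched as below. The 3-state vectors are toy values, not the 151-dimensional vectors from the actual .alpha file.

```python
import numpy as np

# Each alpha vector carries an associated action; the value of a belief is the
# maximum dot product over the vectors, and the policy returns that action.
alpha_vectors = np.array([
    [10.0, 2.0, 0.0],   # hypothetical vector for action "do nothing"
    [6.0, 6.0, 0.0],    # hypothetical vector for action "mammography"
])
actions = ["do nothing", "mammography"]

belief = np.array([0.3, 0.7, 0.0])  # current belief over the 3 toy states
values = alpha_vectors @ belief     # one dot product per alpha vector
best = int(np.argmax(values))
print(values.max(), actions[best])
```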

Figure 3.2: Screen-shot showing the top-left part of the .alpha output file

Executing the optimal policy

Once the value function associated with the optimal policy was obtained using Perseus, the final step was to see how the policy chooses actions for patients. Since real patient data was not available, a subset of simulated patients was used. For each patient’s true state time series, the policy was executed as follows:

• Set the current belief to the initial probabilities.
• At each time step:
  1. Select the optimal action, i.e. the one which maximises the value function for the current belief.
  2. Simulate an observation based on the observation model under the selected action and the current true state.
  3. Save the reward obtained, based on the action taken and the current true state.
  4. Update the belief using Bayes' rule.
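Step 4, the Bayes belief update, can be sketched as follows: after taking an action and seeing observation o, the new belief is b'(j) ∝ O[j, o] · Σ_i b(i) · T[i, j]. The 2-state matrices below are illustrative, not the thesis' models.

```python
import numpy as np

T = np.array([[0.95, 0.05],   # toy transition model under the chosen action
              [0.00, 1.00]])
O = np.array([[0.10, 0.90],   # toy observation model: rows states, cols obs
              [0.85, 0.15]])

def update_belief(b, obs):
    predicted = b @ T               # predict the next-state distribution
    unnorm = O[:, obs] * predicted  # weight by the observation likelihood
    return unnorm / unnorm.sum()    # normalise to a probability vector

b = np.array([0.8, 0.2])
b_new = update_belief(b, obs=0)     # e.g. a positive test result
print(b_new)
```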


4 Results

4.1 Simulation results

In this section, the results from the simulator are shown. Although comparing the simulated patient histories with equivalent real data was impossible, several key statistics were extracted from the simulated cohort and compared to real-life values. The CISNET Working Group [15] used a SEER Cancer Statistics Report [32], containing data for the US from 1975 to 2000, to validate their models. The SEER report contains age-specific incidence rates (per 100,000 women) for in situ and invasive breast cancers, so this data was used for comparison. In the following section, a subset of 50,000 patients (out of the 1 million simulated) was used as representative of the simulator results.

In situ cancer incidence

The SEER Cancer Statistics Report [32] provides in situ cancer incidence rates for women, broken down into 5-year age intervals between 35 and 85 years old. The incidence rates for women above 85 were put into a single "85+" bin. From the simulated women's histories, the age of cancer onset was extracted for each woman who got breast cancer. The resulting data was put into 5-year bins to match the SEER data. Both sets of data were normalised to have matching scales. Figure 4.1 shows the histograms for both the SEER data and the simulation results.


Figure 4.1: Age distribution for in situ cancer incidence: simulated vs. SEER data

The figure shows that the simulated in situ incidence age distribution is close to the SEER data, as shown by the overlap between the two histograms. However, the tails of the distributions differ a little. For lower ages, particularly between 35 and 45, the simulated incidence rate is much higher than the empirical incidence rate. Conversely, the simulated incidence rate is somewhat lower than the empirical incidence rate at the upper tail of the distributions, for ages 70 and above. Upon closer inspection of these tails, it becomes apparent that the distributions are almost identical in shape, except that the simulated data is one age bin, i.e. 5 years, ahead of the SEER data. For example, this is visible at the lower age end of the figure: the simulated in situ incidence for age group 35-40 is almost identical to the SEER incidence for age group 40-45.

Invasive cancer incidence

As stated in Section 1, roughly 1 in 8 women, or 12.5%, will develop invasive breast cancer over the course of their lifetime [2]. This statistic was estimated from the simulated cohort of 50,000 patient histories: in the simulation, roughly 15.4% of women developed invasive breast cancer.

The SEER Cancer Statistics Report [32] also provides invasive cancer incidence rates for women in 5-year age bins from 35 to 85+. Since the age at which a cancer becomes invasive was not stored directly in the simulation output, the simulated stage history was used to extract the age at which each woman who developed invasive cancer reached the invasive stage for the first time. The resulting ages were then binned to match the SEER data, and both data sets were normalised in order to compare the distributions. Figure 4.2 shows the histograms for the simulated and SEER data.
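The extraction of the first invasive age from a stage history can be sketched as below. The data representation (a list of (age, stage) pairs with the label "invasive") is an assumption for illustration, not the thesis's actual data structure.

```python
def first_invasive_age(stage_history):
    """Return the age at which the patient first enters an invasive stage,
    or None if she never does. `stage_history` is a hypothetical list of
    (age, stage-label) pairs in chronological order."""
    for age, stage in stage_history:
        if stage == "invasive":
            return age
    return None

# Toy stage history for one simulated patient
history = [(40, "healthy"), (55, "in situ"), (58, "invasive"), (63, "invasive")]
first_invasive_age(history)  # -> 58
```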

Figure 4.2: Age distribution for invasive cancer incidence: simulated vs. SEER data

The figure shows that the simulated data for invasive cancer incidence is not as close to the equivalent SEER data as the in situ incidence data was: the overlap between the two histograms is much smaller than in Figure 4.1. Despite this, the histograms in Figure 4.2 resemble each other more than a first impression suggests. On closer inspection, both follow a similar distribution, except that the simulated data is again shifted to the left of the SEER data. Here, however, the shift appears larger than for in situ incidence and less regular: shifting the whole simulation histogram by one age bin would still not account for most of the differences with the SEER data. The simulated invasive incidence distribution is denser than the SEER distribution up to the 70-year mark, after which it is less dense than the SEER data.
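The visual notion of "overlap" between two normalised histograms can be quantified as the sum of the bin-wise minima (the overlap coefficient), which is 1 for identical distributions and 0 for disjoint ones. The bin values below are illustrative, not the actual SEER or simulated proportions.

```python
import numpy as np

# Two normalised histograms over the same bins (illustrative values only)
seer = np.array([0.02, 0.05, 0.10, 0.14, 0.13, 0.11, 0.08, 0.05])
sim = np.array([0.05, 0.10, 0.14, 0.13, 0.11, 0.08, 0.05, 0.02])
seer = seer / seer.sum()
sim = sim / sim.sum()

# Overlap coefficient: the shared probability mass of the two histograms
overlap = np.minimum(seer, sim).sum()
```

Under this measure, a one-bin shift such as the one observed here lowers the overlap even when the two shapes are otherwise identical.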

Patient trajectories

Another way of checking that the simulator was performing as expected was to look at how each patient's state history evolved over time. Since the state numbers correspond roughly to the order in which states are reached during the progression of the cancer, it was relatively straightforward to check for anomalies. As an example, the state time series of the first 10 patients were plotted, as shown in Figure 4.3.

Figure 4.3: Plot of 10 simulated patients’ state trajectories over time.

The figure shows that each of the 10 patients has a unique trajectory, but roughly three trends can be observed among them. The first trend is shown by patients 2, 3, 6 and 8: the patient remains in the healthy states (states 0-3) before transitioning suddenly to state 150, representing all-cause death. The second trend is shown by patients 0, 4 and 7: the patient moves into one of the cancer states (states 4-148) before returning to a healthy state after a short period of time, and then dies an all-cause death at an older age, in the same way as the first-trend patients. Patients 1, 5 and 9 exhibit the third trend: they move progressively through different cancer states, sometimes spending extended periods in a single cancer state, but always moving to a higher cancer state at some point, until they die. Patient 1, however, differs slightly from the other two members of this trend in that she transitions into the all-cause death state (state 150), while the other two transition into the cancer death state (state 149).
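A rough classifier for these three trends can be sketched as follows, using the state numbering from the thesis (0-3 healthy, 4-148 cancer, 149 cancer death, 150 all-cause death). The function is an illustrative heuristic, not part of the thesis's code; in particular it labels any cancer history ending in all-cause death as trend 2, which, as patient 1 shows, is not always accurate.

```python
CANCER_DEATH, ALL_CAUSE_DEATH = 149, 150

def classify_trajectory(states):
    """Heuristically assign a simulated state trajectory to one of the
    three observed trends. `states` is a chronological list of integers."""
    had_cancer = any(4 <= s <= 148 for s in states)
    if not had_cancer:
        return "trend 1: healthy until all-cause death"
    if states[-1] == CANCER_DEATH:
        return "trend 3: progressive cancer ending in cancer death"
    # Cancer occurred but the patient died of another cause
    return "trend 2: cancer, then recovery, then all-cause death"

classify_trajectory([0, 1, 2, 150])       # trend 1
classify_trajectory([0, 4, 10, 2, 150])   # trend 2
classify_trajectory([0, 4, 30, 90, 149])  # trend 3
```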

It must be noted that these 10 patients' respective behaviours are not an accurate representation of the proportions in which these behaviours occur in the whole simulated data set. They do, however, provide a good summary of the general types of behaviour that exist, which is why these patients were selected for plotting.

Transition matrix

From simulating 1 million women's breast cancer histories, the maximum likelihood transition probabilities were obtained for each (state_t, state_{t+1}) combination. The resulting 151-by-151 matrix representing the natural transition model was plotted as a heatmap, shown in Figure 4.4. As displayed in the colour bar legend, the higher the probability, the "hotter" the colour: the lowest probabilities are shown in dark red (black for zero values), and the highest in orange, yellow and white (for probabilities of 1). The full matrix can be found in Appendix II.
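The maximum likelihood estimate amounts to counting observed state_t to state_{t+1} transitions and normalising each row. A minimal sketch of this computation (the function name and toy input are illustrative, not from the thesis):

```python
import numpy as np

N_STATES = 151

def mle_transition_matrix(histories, n_states=N_STATES):
    """Maximum-likelihood transition probabilities: count each observed
    state_t -> state_{t+1} transition, then normalise every row to sum to 1."""
    counts = np.zeros((n_states, n_states))
    for states in histories:
        for s, s_next in zip(states[:-1], states[1:]):
            counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave never-visited states as all-zero rows instead of dividing by zero
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

# Toy input; the real input is 1 million simulated state sequences
P = mle_transition_matrix([[0, 0, 1, 150], [0, 1, 4, 149]])
```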

Figure 4.4: Heatmap showing the natural transition probabilities

The heatmap shows that the matrix is almost upper triangular, except for a few cases where the patient transitions from a cancer state (states 4 to 148) back to one of the healthy states (states 0 to 3). The highest probabilities generally lie along the diagonal of the matrix, where the patient remains in the same state. The probabilities then grow weaker the further they are from the diagonal, becoming essentially zero beyond a certain distance from it. However, most starting states have a small probability of moving into one of the death states (states 149 and 150), as shown by the vertical dashed line at the far right of the figure.
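The "almost upper triangular" observation can be quantified by measuring how much probability mass lies strictly below the diagonal, i.e. how often the chain moves to a lower-numbered state (such as a cancer state returning to a healthy one). A small sketch on a toy 4-state matrix (illustrative, not the thesis's 151-state model):

```python
import numpy as np

def below_diagonal_mass(P):
    """Average per-row probability mass strictly below the diagonal of a
    row-stochastic matrix P; 0 means P is exactly upper triangular."""
    lower = np.tril(P, k=-1).sum(axis=1)
    return lower.mean()

# Toy matrix with a single 'recovery' transition (1 -> 0) below the diagonal
P = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 0.0, 1.0],
])
below_diagonal_mass(P)  # -> 0.025
```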

4.2 POMDP solution

Perseus solved the POMDP in 1297 time steps, with an expected total reward of 42.62 QALYs. The POMDP solution output by Erwin Walraven's solver program is a value function file in the Cassandra alpha-file format. According to Cassandra's website [30], the value function gives an alpha vector for each action in the policy. There can be several alpha vectors with the same action, to be executed under different belief conditions. Each alpha vector contains the coefficients of a hyperplane equation in the belief space, with one coefficient for each state in the state space. The optimal action given a belief state vector is whichever action's alpha vector gives the highest value when its dot product with the belief state is taken. The selection of the optimal action at each point in the process is therefore entirely dependent
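The dot-product action selection described above can be sketched as follows. The alpha vectors and the 3-state belief space are illustrative; they are not taken from the actual solver output.

```python
import numpy as np

# Illustrative alpha-vector value function: (action, coefficients) pairs,
# with one coefficient per state. Several vectors may share an action.
alpha_vectors = [
    (0, np.array([10.0, 2.0, 0.0])),
    (1, np.array([4.0, 6.0, 1.0])),
    (1, np.array([0.0, 3.0, 9.0])),
]

def optimal_action(belief):
    """Return the action whose alpha vector maximises alpha . belief."""
    values = [alpha @ belief for _, alpha in alpha_vectors]
    action, _ = alpha_vectors[int(np.argmax(values))]
    return action

optimal_action(np.array([0.8, 0.15, 0.05]))  # belief mostly on state 0 -> action 0
```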
