
U.U.D.M. Project Report 2017:6

Degree project in mathematics (Examensarbete i matematik), 30 credits

Supervisor: Rolf Larsson

Examiner: Denis Gaidashev

May 2017

Department of Mathematics

Uppsala University

Diagnosing Metastatic Prostate Cancer Using PSA:

A Register-Based Cohort Study with Missing Data


Diagnosing Metastatic Prostate Cancer Using PSA:

A Register-Based Cohort Study with Missing Data

Marcus Westerberg

Uppsala University, Sweden


Abstract

The serum level of prostate specific antigen (PSA) is a cornerstone of the assessment of the prostate cancer (PCa) risk category, which, in turn, is central for treatment decisions and is a strong determinant of the outcome of the PCa. Metastatic PCa is currently diagnosed with the use of a bone scan, indicated by the variable M stage, or with a high PSA serum level (≥ 100 ng/ml). PSA has traditionally been considered a useful indicator of poorer PCa prognosis and of metastatic disease, although recent studies have indicated that the usefulness of PSA for predicting metastatic disease is questionable. The aim is to assess how well PSA testing can be used to predict distant metastatic PCa, to evaluate how the current threshold performs, and to potentially suggest an improved and more detailed threshold by stratifying according to Gleason Grade Group (GGG).

In order to do this, the M stage must be known, but far from all men receive a bone scan, since the harms of false positives often outweigh the benefits; therefore there are men with a potentially metastatic PCa who never underwent imaging. Missing data in clinical registers are common and there are several ways of handling this, including complete case analysis and various imputation methods. Men with unknown M stage are a major obstacle that needs to be tackled in order to answer the main question.

The data have been collected by several relevant institutions, including NPCR, and stored in a database called PCBaSe. It contains comprehensive data on all men diagnosed with PCa between 2000 and 2012, including clinical data such as PSA, GGG, and TNM stages, socio-economic data such as educational level and civil status, and other relevant data.

To address the missing data issue, multiple imputation by chained equations was applied and the results were compared with a complete case analysis. In addition, a sensitivity analysis was conducted to explore the effect of a nonignorable missing data mechanism on the results. The performance of PSA testing was assessed using Receiver Operating Characteristic (ROC) curves and associated quantities like true and false positive fractions and likelihood ratios. Mortality stratified according to Gleason Grade Group and intervals of PSA was analysed to compare the PCa prognosis between the strata, and also to evaluate the imputation model.

The results showed that men with and without metastatic PCa had considerably overlapping distributions of PSA in all Gleason Grade Groups, which was reflected by poor test performance in terms of positive fractions. For example, for false positive fractions below 10%, true positive fractions were below 50%. Survival curves showed fairly high agreement between imputed M stage and complete case M stage, except for men with low PSA (0-50 ng/ml), where there were indications of an overestimation of the prevalence of metastases. The nonignorable imputation model partly accounted for this and managed to lower the prevalence of metastases, especially in this region of the data.

The main conclusion was that no PSA cut-off manages to distinguish between metastatic and non-metastatic PCa, in any GGG. The conclusions were consistent under various model assumptions governed by a parameter k, apart from the prevalence numbers of metastatic disease, which were sensitive to the choice of k. Therefore, all men with high PSA should be imaged, and the PSA threshold of 100 ng/ml as an indicator of metastatic disease should not be used.

Popular Science Summary

When assessing how serious a prostate cancer is, a risk categorisation is used. Assessing the risk category is important because it forms the basis for treatment decisions and is strongly linked to the outcome of the prostate cancer. The risk category is determined by, among other things, taking a blood sample in which the amount of prostate specific antigen (PSA) is measured, determining the Gleason Grade Group using a biopsy of the prostate, and determining the T stage through a rectal examination. The PSA test has traditionally been considered useful for detecting more severe prostate cancers, and also metastatic prostate cancer. The latter form is today diagnosed using bone scintigraphy, which determines the M stage of the prostate cancer, where M0/M1 means no metastases detected/metastases detected, or using the PSA test, where PSA values above 100 ng/ml indicate metastases. In recent years, however, studies have indicated that the usefulness of the PSA test for this purpose is questionable. The aim of this thesis is therefore to determine how well the PSA test can predict metastatic prostate cancer, and whether the current threshold of 100 ng/ml is optimal or can be adjusted and improved by also using the Gleason Grade Group.

To find this out, the M stage must be known, but far from all men with prostate cancer undergo bone scintigraphy, since the harms that can follow from a false positive diagnosis often outweigh the benefits; therefore there are men with a potentially metastatic prostate cancer who have never undergone bone scintigraphy. This results in missing data, which is common in clinical registers and causes certain problems in statistical analyses. There are several ways of handling this; for example, all men with missing data can be excluded from the analysis, or some type of imputation method can be used to simulate plausible values where data are missing. Men with unknown M stage are a major obstacle that must be dealt with in order to answer the main question.

The data have been collected by several relevant institutions, including the National Prostate Cancer Register of Sweden (NPCR), and are stored in a database called PCBaSe. It contains comprehensive data on all men diagnosed with prostate cancer between 2000 and 2012, including clinical data such as PSA, Gleason Grade Group, and TNM stages, and socio-economic data such as educational level and marital status.

The missing data were imputed using an algorithm called MICE. This algorithm generates several completed datasets, which are then analysed with standard statistical methods. The results from the completed datasets were combined into a single result, which was then compared with an analysis in which all men with missing data had been excluded. A sensitivity analysis was performed to examine the effect of a mechanism that affects how likely it is that a man with unknown M stage has metastases compared with a man with known M stage. The ability of the PSA test to distinguish men with M0 from men with M1 was assessed using Receiver Operating Characteristic (ROC) curves and associated quantities such as sensitivity and specificity. These quantities describe the risk of a false negative and a false positive diagnosis, where larger values mean lower risk. Prostate cancer mortality stratified by Gleason Grade Group and PSA was analysed in order to compare the prostate cancer prognosis between men with M0 and M1.

The results showed that men with M1 and M0 had considerably overlapping PSA distributions in all Gleason Grade Groups, which was reflected in the low sensitivity and specificity. For example, when specificity was above 90%, sensitivity was below 50%. The survival curves showed fairly high agreement between imputed M stage and non-imputed M stage, except for men with low PSA values (0-50 ng/ml), where there were indications of an overestimation of the prevalence of metastases. The sensitivity analysis showed that some imputation models partly managed to compensate for this. This meant that it was difficult to use the PSA test to correctly diagnose metastatic prostate cancer.

The conclusion was that no PSA threshold managed to distinguish sufficiently well between men with metastatic and non-metastatic prostate cancer in any of the Gleason Grade Groups. This was seen in all imputation models, even though the prevalence of metastatic prostate cancer differed between models. This means that all men with high PSA should undergo a bone scan and that the PSA threshold of 100 ng/ml as an indicator of metastatic prostate cancer should not be used.

Foreword and Acknowledgements

Special acknowledgements to my supervisors: Rolf Larsson, Uppsala University, for guidance and advice, and Hans Garmo, Regional Cancer Centre Uppsala/Örebro, for leadership and for making this thesis possible. Additional thanks to Frederik B. Thomsen, Copenhagen Prostate Cancer Center, and Pär


Contents

1 Introduction 7

1.1 Purpose . . . 7

1.2 Hypotheses . . . 7

1.3 Expectations and Challenges . . . 7

1.4 Delimitations . . . 7

1.5 Method . . . 7

1.6 Conclusions . . . 8

2 Background 9

2.1 Medical Procedures . . . 9

2.1.1 Measures Used for Risk Categorisation . . . 9

2.1.2 Other Variables . . . 10

2.2 Data Collection . . . 11

2.2.1 PCBaSe and NPCR . . . 11

2.2.2 Data Retrieved from PCBaSe . . . 11

2.2.3 Costs . . . 12

2.3 Previous Research . . . 12

3 Theory 13

3.1 Basic Inference . . . 13

3.2 Survival Analysis . . . 14

3.2.1 Definitions . . . 14

3.2.2 Asymptotic Distribution and Confidence Intervals . . . 15

3.2.3 The Cox Proportional Hazards Model . . . 17

3.2.4 Restricted Mean Survival . . . 18

3.3 Evaluation of Tests for Classification . . . 18

3.3.1 Notation and Definitions . . . 18

3.3.2 Estimation . . . 21

3.4 Tests for Normality . . . 22

3.5 Bayesian Inference for Missing Data . . . 22

3.5.1 Notation and Definitions . . . 23

3.5.2 Multiple Imputations from the Posterior of the Missing Values . . . 26

3.5.3 Multiple Imputations Drawn from a Finite Number of Completed Datasets . . . . 27

3.5.4 Rubin’s Rules . . . 31

3.6 Multiple Imputation using Chained Equations . . . 31

3.6.1 Fully Conditional Specification and the MICE Algorithm . . . 31

3.6.2 The Choice of Predictors and the Visit Sequence . . . 33

3.6.3 Univariate Imputation Methods . . . 33

3.6.4 Nonignorable Missing Data . . . 36

3.6.5 Derived Variables . . . 37

3.6.6 Parameter Settings and Convergence Diagnostics . . . 38

4 Statistical Analysis 39

4.1 Imputation . . . 39

4.1.1 Model Form and Derived Variables . . . 39

4.2 Analysis of Imputed Data . . . 40


5 Results 42

5.1 Baseline Characteristics . . . 42

5.2 Imputation Model Convergence Diagnostics . . . 42

5.3 Study of Parameter Transformations . . . 44

5.4 Analysis of Imputed Datasets . . . 44

5.5 Sensitivity Analysis . . . 49

6 Discussion and Conclusions 52

6.1 The Imputation . . . 52

6.2 Diagnosing Metastatic Prostate Cancer using PSA . . . 53

6.2.1 Defining a Good Test . . . 53

6.2.2 Test Performance . . . 54

6.3 Weaknesses and Strengths of the Study . . . 54

6.4 Conclusions . . . 54

7 Recommendations and Future Work 55

8 Appendix 56

8.1 Baseline Characteristics . . . 56

8.2 Cox Proportional Hazards: Assessment of Model Assumption . . . 57

8.3 Parameter Transformations . . . 58

8.4 Sensitivity Analysis on Prevalence of M1 Disease . . . 60

9 References 61

List of Figures

1 Relating pre-test and post-test probability through a likelihood ratio. . . 20

2 Visual representation of true positive and false negative fractions. . . 20

3 Example of a ROC curve. . . 20

4 Characteristics before and after imputation. . . 43

5 Convergence diagnostics for all variables with missing data. . . 43

6 Distribution of prostate specific antigen (PSA). . . 45

7 Receiver Operating Characteristic (ROC) curves. . . 45

8 Kaplan-Meier survival curves of Imputed and Complete cases. . . 47

9 Survival curves from Cox proportional hazards model . . . 49

10 Sensitivity Analysis on Survival Curves. . . 51

11 Evaluation of proportional hazards . . . 57

12 Comparison of CI for positive fractions . . . 58

13 Comparison of CI at survival times . . . 59

List of Tables

1 Definition of GGG . . . 9

2 Notation for cross-classified test results and disease status for a fixed threshold. . . 21

3 Proportion of complete data . . . 39

4 Predicator Matrix . . . 40

5 Extract from baseline characteristics . . . 42

6 Estimates and confidence intervals for positive fractions, predictive values and likelihood ratios . . . 46

7 Confidence intervals for proportion of death at 10 years from diagnosis . . . 48


9 Sensitivity Analysis on Positive Fractions . . . 50

10 Baseline characteristics . . . 57

11 Tests for normality of positive fractions . . . 58

12 Tests for normality of the proportion of death at 1, 5 and 10 years from diagnosis . . . 59

13 Sensitivity Analysis on Prevalence of M1 Disease . . . 60

Nomenclature

Medical Terms

CCI Charlson Comorbidity Index

GGG Gleason Grade Group

LISA Longitudinal Integration Database for Health Insurance and Labour Market Studies

NPCR National Prostate Cancer Register of Sweden

PCa Prostate Cancer

PCBaSe Prostate Cancer Data Base Sweden

PSA Prostate-Specific Antigen

RP Radical Prostatectomy

RT Radio Therapy

Mathematical Notation

$\Pr(A)$ Probability of an event $A$

$P(X)$ (Joint) density or probability mass function of a scalar (or vector) random variable $X$

$X_i$ Explanatory variable

$Y_i$ Response variable

CI Confidence Interval

$\xrightarrow{d}$ Convergence in distribution

$\xrightarrow{P}$ Convergence in probability

TPF, FPF True and False Positive Fractions

PPV, NPV Positive and Negative Predictive Values

LHR+/- Positive and Negative Likelihood Ratios


1 Introduction

The serum level of prostate specific antigen (PSA) is a cornerstone of the assessment of the prostate cancer (PCa) risk category, which, in turn, is central for treatment decisions and is a strong determinant of the outcome of the PCa. Metastatic PCa is diagnosed with the use of a bone scan or with a high PSA serum level (≥ 100 ng/ml). PSA has traditionally been considered a useful indicator of poorer PCa prognosis and of metastatic disease (M stage), although recent studies have indicated that the usefulness of PSA for predicting metastatic disease is questionable.

During the diagnostic procedure, far from all men undergo imaging (bone scan), since the harms of false positives often outweigh the benefits; therefore there are men with a potentially metastatic PCa who never underwent imaging. Missing data in clinical registers are common, and there are several ways of handling this, including complete case analysis and various imputation methods.

1.1 Purpose

The aim is to assess the diagnostic test performance of classifying distant metastatic PCa using PSA and Gleason Grade Group (GGG), to evaluate how the current threshold performs, and to potentially suggest an improved and more detailed threshold using GGG. In order to do this, the unknown M stages in the data register need to be addressed. This issue will be approached with the use of multiple imputation, implemented using the MICE algorithm, and the results will be compared with a complete case analysis in order to assess potential bias, flaws, and discrepancies in results between the methods. In the end, this will justify the choice of methods for further analysis and increase the external validity.

1.2 Hypotheses

The null hypothesis is that the current PSA threshold meets the requirements of a good test for predicting metastatic disease, and that it is reasonable for all GGG. The alternative hypothesis is the negation: that the current threshold is not satisfactory across all GGG.

1.3 Expectations and Challenges

In light of recent studies, it is expected that the current threshold of PSA ≥ 100 ng/ml for classification of metastatic disease will not be uniformly optimal for all Gleason Grade Groups and that there could be room for improvement. It can potentially prove difficult to settle on new threshold recommendations if the test accuracy is poor. Also, the definition of a good test must first be discussed and may prove to be a challenge.

Concerning the missing data issue, it is expected that the MICE algorithm will account for most of the missing information and generate reliable parameter estimates, if implemented carefully, based on the flexibility of the algorithm and since it is known to perform well in various situations. One challenge will be to assess the validity of the imputation model, including its model assumptions, in order to argue that it generates parameter estimates for the analysis that are likely to be close to the unobserved "true" values.

1.4 Delimitations

Men missing more than one of PSA, GGG and T stage were not considered in this study.

1.5 Method


The performance of PSA testing will be assessed using Receiver Operating Characteristic (ROC) curves and associated quantities like true and false positive fractions and likelihood ratios. Survival curves stratified according to Gleason Grade Group and intervals of PSA will be used to compare metastatic and nonmetastatic PCa mortality, and also to evaluate the imputation model. Restricted mean survival and years of life lost will also be estimated for relevant strata to assess the consequences of a false positive misclassification of metastatic disease. A Cox proportional hazards model will be fitted to explore this in more detail.

The data have been collected by several relevant institutions and stored in a database called PCBaSe. The level of statistical significance is set to 0.05 and the statistical computer software used was R version 3.3.2.
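To make the impute/analyse/pool workflow concrete, the following is a minimal sketch (not the thesis code) using the mice package in R; it uses the small nhanes dataset shipped with mice, and the model formula is illustrative only.

```r
# A minimal sketch of multiple imputation with 'mice': impute, analyse each
# completed dataset, and pool the estimates with Rubin's rules.
library(mice)

data(nhanes)                            # small example dataset shipped with mice
imp  <- mice(nhanes, m = 5, seed = 1)   # 5 completed datasets via chained equations
fits <- with(imp, lm(chl ~ age + bmi))  # fit the same model in each completed dataset
summary(pool(fits))                     # combine the 5 sets of estimates
```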

1.6 Conclusions

No PSA cutoff could separate men with M0 and M1 disease, and neither positive fractions nor likelihood ratios were satisfactory for any cutoff, meaning all men with high PSA should be imaged. Metastatic disease status for a large proportion of men was based on imputations rather than imaging, which was suboptimal, but the imputations were performed using extensive data from NPCR combined with the MICE algorithm, which has been shown to perform well in various scenarios. Although the results varied when comparing the complete case analysis with the various nonignorable imputation models, the conclusions were consistent: PSA should not be used as an indicator of metastatic disease, not even in combination with GGG, meaning that the threshold of 100 ng/ml should be removed from current guidelines.


2 Background

The risk stage classification of prostate cancer (PCa) is central for treatment decisions and is a strong determinant of the outcome of the PCa. The serum level of prostate specific antigen (PSA) is a cornerstone of the assessment of the risk stage, according to the Annual NPCR report 2014 (http://npcr.se/in-english/). High PSA is considered an indicator of poorer PCa prognosis, Cooperberg, Broering and Carroll (2009), and is recognized as a useful predictor of bone metastases, Lorente et al (1996). The definition of the most severe risk category, level 5, is based on the indicator M stage of present distant metastatic PCa and a PSA threshold of 100 ng/ml. Formally, one is classified into the level 5 risk category (distant metastatic disease) if M stage is M1 and/or PSA is at least 100 ng/ml. Other variables used for risk stage classification are Gleason Grade Group (GGG), T stage and N stage.

Recently, the PSA threshold of 100 ng/ml has been questioned, Stattin et al (2015), and therefore it is essential to thoroughly assess the diagnostic test performance of classifying distant metastatic PCa using PSA, and to evaluate how the current threshold performs.

In order to do this, the presence of distant metastases needs to be assessed, which is formally done by imaging using scintigraphy, Adami et al (2006), but far from all men receive imaging, since the harms of false positives often outweigh the benefits. Inappropriate PCa imaging has decreased since the year 2000, with only a slight decrease in appropriate imaging in high-risk patients, Makarov et al (2013), and therefore there are men with a possibly metastatic PCa who never underwent imaging. At the same time, missing data in clinical registers are common and there are several ways of treating missing data, including complete case analysis (omitting all observations with at least one variable with missing data) and various imputation methods. Each approach has its own benefits, limitations and assumptions, Newgard and Lewis (2015). This issue will be approached with the use of multiple imputation using the MICE algorithm, and the results will be compared with a complete case analysis in order to assess potential bias, flaws and discrepancies in results.

2.1 Medical Procedures

2.1.1 Measures Used for Risk Categorisation

Table 1: Definition of Gleason Grade Group.

GGG   G1+G2
1     2-6
2     3+4
3     4+3
4     8
5     9-10

The Gleason Grade Group (GGG) is obtained from a needle core biopsy taken from the prostate. The procedure is performed by entering the rectum with a tool equipped with 6-12 hollow needles, and then inserting the needles into the prostate to extract a tissue sample containing cells from the prostate. If the patient has prostate cancer, then the sample will hopefully contain cells from a part of the prostate that contains cancer. The pathologist then evaluates the level of cell mutations in the cancer cells and uses this to assess the severity of the cancer, which is measured in two Gleason grades on the scale 1-5. GGG is then determined from the most common Gleason grade (G1) and the highest Gleason grade (G2) according to table 1. The procedure might be painful and there is a risk of side effects, of which the most common severe side effect is infection, Sandin and Wigertz (2013).
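As an illustration of the table 1 mapping, the following is a minimal sketch in R (not from the thesis); gleason_to_ggg is a hypothetical helper name, and rare grade combinations outside table 1 are not handled.

```r
# Map the most common (g1) and highest (g2) Gleason grades to GGG per Table 1.
gleason_to_ggg <- function(g1, g2) {
  s <- g1 + g2                     # Gleason sum
  if (s <= 6) 1L
  else if (g1 == 3 && g2 == 4) 2L  # 3 + 4
  else if (g1 == 4 && g2 == 3) 3L  # 4 + 3
  else if (s == 8) 4L
  else 5L                          # Gleason sum 9-10
}

gleason_to_ggg(3, 4)  # 2
gleason_to_ggg(4, 3)  # 3
```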

To obtain the serum level of prostate specific antigen (PSA), measured in ng/ml, an ordinary blood sample is taken.

The three stages T, N and M constitute the TNM classification of malignant tumours. T stage is the primary stage used in the risk categorisation of prostate cancer and is assessed by the urologist after a rectal examination of the prostate. Tumours found growing outside of the prostate and into the vas deferens are classified as T4. If the tumour is outside the capsule it is classified as T3, if there are lumps that seem to be inside the capsule it is classified as T2, and if no lumps are found it is classified as T1. T1a-b tumours are discovered by transurethral resection of the prostate (TURP), which essentially is a reduction of the prostate by shaving to ease micturition, Sandin and Wigertz (2013), while T1c tumours are discovered by elevated PSA prior to a biopsy.

The condition of the regional lymph nodes is classified by N stage. N0 means that there were no signs of regional lymph node metastases during surgery and N1 corresponds to signs of regional lymph node metastases during surgery. If the relevant surgery was not performed then the N stage is classified as NX. In some cases the N stage is unknown and there is no record of whether relevant surgery has been considered or not, in which case the N stage is considered as missing.

Lastly, distant metastases are described by the M stage, an indicator where M0/M1 means no signs/signs of distant metastases, and MX indicates that the assessment was not performed (only possible before 2011), Stattin et al. (2013). The skeleton is the dominant locale for distant metastases of prostate cancer. Therefore, the primary investigation is limited to the skeleton, unless symptoms indicate metastasis to other organs. Bone imaging, mainly scintigraphy, is the standard method for investigating bone metastases of prostate cancer. In case of unclear scintigraphy, magnetic resonance imaging (MRI), positron emission tomography (PET) or computed tomography (CT) is carried out if necessary.

2.1.2 Other Variables

There are several treatments for prostate cancer, and the two main curative treatments are radical prostatectomy (RP) and radiotherapy (RT). RP is a surgery applied to any patient with clinically localized prostate cancer that can be completely excised surgically, who has a life expectancy of at least 10 years, and has no serious comorbid conditions that would contraindicate an elective operation, Mohler et al (2010). It is a common treatment for patients having clinically localized cancer, and the primary goal of RP is cancer extirpation, where the prostate gland and seminal vesicles are removed, while the secondary goals are preservation of urinary continence and potency. Urinary incontinence is one of the most common postoperative complications following RP, along with erectile dysfunction, Tewari et al. (2013). The incontinence is usually mild, but the risk of impotence is about 50%, Adami et al (2006). Radiotherapy can be applied in several ways. Brachytherapy is a kind of RT that involves placing radioactive sources into the prostate tissue. External radiotherapy uses external beams and is one of the principal treatment options for clinically localized prostate cancer, Mohler et al. (2010). Modern RT offers effective, durable treatment for all stages of prostate cancer with low levels of clinically significant toxicity, Tewari et al. (2013). Side effects are mild forms of incontinence, gastrointestinal symptoms and impotence, Adami et al (2006). There are no randomized studies showing that men with metastatic prostate cancer who receive curative treatment by RP or RT are helped in terms of life expectancy and quality of life, Nationellt vårdprogram för prostatacancer (2015).

Noncurative treatment includes hormonal treatments, or androgen deprivation therapy (ADT), which may include antiandrogens (AA) or gonadotropin-releasing hormone (GnRH). These are used to restrain an incurable PCa, Mohler et al (2010). For example, ADT is the standard treatment for bone metastases, and it has minimal benefit on survival, meaning it is mainly palliative. Mild side effects of ADT include hot flushes and changes in libido, mood and cognition. Severe side effects include increased cardiovascular risk and osteoporosis: increased risk of diabetes, coronary heart disease, myocardial infarction, sudden cardiac death, and loss of bone mineral density leading to an increased risk of clinical fracture, Tewari et al (2013).

Conservative treatment (CT) means either active surveillance or watchful waiting. Active surveillance involves actively monitoring the course of disease with the expectation of intervening with curative treatment if the cancer progresses, with a risk of missing the opportunity for cure and getting progression and/or metastases. Similarly, watchful waiting is used with the intention of palliative (ADT) treatment at progression in men with prostate cancer, Mohler et al (2010).

Mode of detection indicates the principal way in which the prostate cancer was revealed. Symptomatic mode of detection includes lower urinary tract symptoms (LUTS), other symptoms such as hematuria, or remote symptoms such as back pain when having distant metastases or other cancer. Non-symptomatic indicates health assessment including PSA testing and without urinary tract symptoms, Stattin et al (2013).


The Charlson Comorbidity Index (CCI) is a measure of comorbidity which predicts the ten-year mortality for a patient who may have a range of comorbid conditions. The CCI score is calculated by assigning weights to 17 medical conditions, including diabetes and hypertension. Each condition is assigned a score of 1, 2, 3, or 6, and the final CCI score is given as the sum of these scores. CCI categories are then formed for final scores of 0, 1, 2, or 3+. The prostate cancer itself is not included, and one does not obtain a CCI score of 6 for metastatic PCa. The CCI categories indicate 0 = no comorbidity, 1 = mild comorbidity, 2 = medium comorbidity, and 3+ = severe comorbidity, Charlson et al (1987).
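The scoring rule can be sketched as follows in R (not from the thesis); only three of the 17 condition weights are shown, purely for illustration.

```r
# Sum the weights of the conditions present and group the total as 0/1/2/3+.
cci_category <- function(conditions) {
  weights <- c(diabetes = 1, hemiplegia = 2, metastatic_solid_tumour = 6)  # subset only
  score <- sum(weights[conditions])
  cut(score, breaks = c(-Inf, 0, 1, 2, Inf), labels = c("0", "1", "2", "3+"))
}

cci_category(c("diabetes", "hemiplegia"))  # score 3 -> category "3+"
cci_category(character(0))                 # score 0 -> category "0"
```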

2.2 Data Collection

2.2.1 PCBaSe and NPCR

The source of data, Prostate Cancer Data Base Sweden (PCBaSe), Hemelrijck et al (2013), is a nationwide population-based research database of men diagnosed with prostate cancer. For this study, records between 2000 and 2012, a total of 112 013 men, were retrieved. PCBaSe was created by record linkage between the National Prostate Cancer Register of Sweden (NPCR) and several other high-quality national registers, for example the Prescribed Drug Register, the In-Patient Register, the Cause of Death Register, the National Population Register and the Longitudinal Integration Database for Health Insurance and Labour Market Studies (LISA).

In particular, NPCR is a tool for documentation and assessment of the quality of health care for men with prostate cancer. It is also used for research and to improve the treatment of the disease. The register contains data including waiting times, treatment and its outcome, spread and type of cancer, and diagnostic procedure. Participation in the register is voluntary and it is designed such that it is not possible to track or identify specific individuals in the compiled material. Recent studies show that NPCR has a high capture rate, completeness and accuracy, Tomic et al (2015a, b). For an informative and concise overview of NPCR, see Stattin et al (2017).

2.2.2 Data Retrieved from PCBaSe

The covariates age and calendar year at the time of diagnosis, highest educational level (low: compulsory school, ≤ 9 years; middle: upper secondary school, 10-12 years; high: college and university, ≥ 13 years), marital status (married/partnership, unmarried, widowed, divorced/separated), mode of detection, survival times and censoring indication were retrieved from PCBaSe.

In addition, clinical data were retrieved on serum levels of PSA, tumour differentiation by Gleason Grade Group, cancer stage according to the TNM classification, and primary treatment. The risk category is a composite variable based on these variables. The definition of the risk categories is similar to the National Comprehensive Cancer Network (NCCN) guidelines, Mohler et al (2010), but altered to distinguish between regionally metastatic and distant metastatic disease.

- Low-risk localized prostate cancer was defined as clinical local stage T1/T2, GGG 1 and PSA less than 10 ng/ml

- Intermediate-risk localized prostate cancer was defined as stage T1/T2, GGG 2-3 and/or PSA level of 10 - 20 ng/ml

- High-risk localized prostate cancer was defined as stage T3 and/or GGG 4-5 and/or PSA levels of 20 - 50 ng/ml

- Regionally metastatic or locally advanced prostate cancer was defined as stage T4 and/or N1 disease and/or PSA levels of 50 - 100 ng/ml without distant metastasis (M0 or MX disease)

- Distant metastatic disease was defined as M1 disease and/or PSA at least 100 ng/ml
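A minimal sketch of these definitions in R (not the thesis code; risk_category is a hypothetical helper, and the categories are evaluated from the most severe downwards so that the "and/or" conditions resolve unambiguously):

```r
risk_category <- function(t, n, m, ggg, psa) {
  if (m == "M1" || psa >= 100)                  "Distant metastatic"
  else if (t == "T4" || n == "N1" || psa >= 50) "Regionally metastatic or locally advanced"
  else if (t == "T3" || ggg >= 4 || psa >= 20)  "High-risk localized"
  else if (ggg >= 2 || psa >= 10)               "Intermediate-risk localized"
  else                                          "Low-risk localized"
}

risk_category(t = "T2", n = "N0", m = "M0", ggg = 1, psa = 5)    # low risk
risk_category(t = "T1", n = "N0", m = "M0", ggg = 2, psa = 120)  # distant metastatic
```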



The comorbidity burden was assessed using data from the National Patient Register and classified according to the Charlson comorbidity index (CCI). Data on treatment modalities included radical prostatectomy, radiotherapy, and hormonal therapy. Information on alternative treatment approaches like active surveillance and watchful waiting was also obtained. Data on the region where the diagnosis was made were also obtained, where Sweden is divided into six regions: North, Stockholm-Gotland, South, South-East, Uppsala-Örebro and West.

2.2.3 Costs

When evaluating a diagnostic test it is relevant to compare possible outcome scenarios and their impact on the patient and on society. The cost of certain treatments, their side effects and potential years of life lost due to suboptimal treatment are some factors that need to be considered.

Estimates of costs have been obtained from Socialstyrelsen (2014) and are specified in Swedish kronor as of 2013. The estimated cost of a full RP treatment lies between 85 000 and 120 000 kr. The estimated total treatment cost for antiandrogens is around 19 000 - 23 000 kr, for GnRH about 17 000 kr, and palliative treatment with Docetaxel and Prednisone costs around 193 000 kr, while with Cabazitaxel it costs around 255 000 kr. Skeletal scintigraphy of the whole body costs around 2 600 kr while PET/CT costs around 12 000 - 16 000 kr.

2.3 Previous Research

The association between PSA ≥ 100 ng/ml and metastatic PCa was assessed in a population-based registry study between 1998 and 2009, Stattin et al (2015). A total of 15 635 men in NPCR with PCa were compared in three groups: men with PSA ≥ 100 ng/ml and M0, men with PSA ≥ 100 ng/ml and M1, and men with PSA < 100 ng/ml and M1. Results showed that a fourth of the men who underwent imaging with PSA ≥ 100 ng/ml had no distant metastases and had two to three times higher 5-year survival than men with M1. The conclusion was that the threshold of PSA ≥ 100 ng/ml as an indicator of metastatic PCa should be reconsidered.

Moreover, the use of imaging to identify metastatic prostate cancer has changed over the years. In a retrospective cohort study performed on data from NPCR from 1998-2009, including 99 879 men diagnosed with prostate cancer, Makarov et al (2013) reported that the use of imaging decreased over time and that the amount of inappropriate imaging decreased from 45% to 3% among low-risk patients, while the amount of appropriate imaging in high-risk patients decreased from 63% to 47%. This was associated with the dissemination of imaging usage data and the latest imaging guidelines to urologists in Sweden, led by NPCR. A total of 36% underwent imaging within 6 months of diagnosis, and these men had generally higher GGG, PSA levels and clinical stages than men who did not receive imaging. This agrees with findings made by Tomic et al (2016), in which missing data for risk classification were assessed in NPCR between 1998 and 2012. In that study, it was concluded that the amount of missing information needed to perform a risk classification is generally low and that men with unknown risk classification most likely had a low-risk prostate cancer.

The mortality of men in NPCR treated with noncurative intent was assessed in a retrospective cohort study of 76 437 cases between 1991 and 2009, Rider et al (2012). The prostate cancer mortality rates varied 10-fold according to risk category, and Figure 1 in that article shows that there was a clear difference between high-risk patients without distant metastases and men with distant metastases, where the latter group had a substantially worse 15-year prognosis.


3 Theory

3.1 Basic Inference

The Law of Large Numbers and the Central Limit Theorem are two fundamental theorems which, together with the delta method and a few theorems from the second chapter of van der Vaart (1998), will play an important role when deriving asymptotic confidence intervals. This subsection is therefore devoted to summarizing vital material, mainly from van der Vaart (1998), to be used later on. The first important tool is the law of large numbers, which is neatly formulated in Grimmett and Stirzaker (2001).

Theorem 1 (The (Weak) Law of Large Numbers). Let $X_i$, $i = 1, \dots, n$, be independent and identically distributed (i.i.d.) random variables with $EX_i = \mu$, $|\mu| < \infty$. Then $\sum_{i=1}^n X_i/n \xrightarrow{P} \mu$.

This theorem will be used when deriving asymptotic properties for estimators later on. An equally important tool for this is the central limit theorem.

Theorem 2 (The Central Limit Theorem). Let $X_i$, $i = 1, \dots, n$, be i.i.d. mean zero, unit variance random variables. Then $\sqrt{n}\,\sum_{i=1}^n X_i/n \xrightarrow{d} X$ with $X \sim N(0, 1)$.

The continuous mapping theorem is a useful and simple tool and motivates the use of the plug-in principle. The stochastic convergence remains when applying a continuous function to an estimator with a priori known stochastic convergence.

Theorem 3 (Continuous Mapping). Let $f : \mathbb{R}^k \to \mathbb{R}^l$ be continuous on $C \subset \mathbb{R}^k$ such that $\Pr(X \in C) = 1$. If $X_n \xrightarrow{d} X$ then $f(X_n) \xrightarrow{d} f(X)$, and if $X_n \xrightarrow{P} X$ then $f(X_n) \xrightarrow{P} f(X)$.

From the continuous mapping theorem it is easy to prove Slutsky's lemma, which is a useful tool for deriving asymptotic results for standardized random variables where the variance estimator is consistent, since $Y_n \xrightarrow{d} c$ if and only if $Y_n \xrightarrow{P} c$ for a constant $c$ (van der Vaart, theorem 2.7 (iii)).

Lemma 1 (Slutsky). Let $X_n$, $X$ and $Y_n$ be random variables such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c \in \mathbb{R}$, a constant. Then $X_n + Y_n \xrightarrow{d} X + c$, $Y_n X_n \xrightarrow{d} cX$ and $X_n/Y_n \xrightarrow{d} X/c$, as long as $c \ne 0$.

The delta method is a basic but important tool used to derive asymptotic results for functions of sequences of random variables. van der Vaart (1998) discusses a general version of the delta method, and below is a simplified version of it for the one dimensional case.

Proposition 1 (The Delta Method). Let $\phi : D \subset \mathbb{R} \to \mathbb{R}$ be differentiable at $\theta$, and let $X_n$ be a random variable taking values in $D$. Assume that there exists a sequence $(r_n) \subset \mathbb{R}$ such that

$$r_n(X_n - \theta) \xrightarrow{d} X \quad \text{as } r_n \to \infty.$$

Then

$$r_n(\phi(X_n) - \phi(\theta)) \xrightarrow{d} \phi'_\theta(X) \quad \text{as } r_n \to \infty,$$

and

$$r_n(\phi(X_n) - \phi(\theta)) - \phi'_\theta(r_n(X_n - \theta)) = o_P(1),$$

where $\phi'_\theta(x)$ is a linear map identified with its matrix form $\phi'_\theta(x) = \phi'_\theta x = \phi'(\theta)x$.

Remark 1. A special case of this, which will be used later, is when we have an asymptotic normal distribution

$$\sqrt{n}\,X_n \xrightarrow{d} X \sim N(\mu, \sigma^2) \quad \text{as } n \to \infty.$$


Example 1. Let $X_n$ be a positive sequence of random variables such that $\sqrt{n}\,X_n \xrightarrow{d} X \sim N(\mu, \sigma^2)$.

(a) If $\phi(x) = \log(x)$ on $D = \mathbb{R}^+$ with $\phi'(x) = 1/x$, then $\mathrm{Var}(\log(X_n)) = \frac{\sigma^2}{\mu^2} + o_P(1)$.

(b) $\mathrm{Var}(\exp(X_n)) = \sigma^2\exp(\mu)^2 + o_P(1)$.

(c) If $X_n$ attains values in $(0, 1)$ then $\mathrm{Var}(\log(-\log(X_n))) = \frac{\sigma^2}{\log(\mu)^2\mu^2} + o_P(1)$.

(d) If $X_n$ attains values in $(0, 1)$ then $\mathrm{Var}\left(\log\left(\frac{X_n}{1 - X_n}\right)\right) = \frac{\sigma^2}{\mu^2(1 - \mu)^2} + o_P(1)$.

Example 2. Following van der Vaart (1998), let $X_n$ be a sequence of estimators of $\mu$ such that $\sqrt{n}(X_n - \mu) \xrightarrow{d} X \sim N(0, \sigma^2)$ and $S_n^2$ a sequence of estimators of $\sigma^2$ such that $S_n^2 \xrightarrow{P} \sigma^2 > 0$. Then, since convergence in probability implies convergence in distribution, we have by Slutsky's lemma

$$\sqrt{n}(X_n - \mu)/S_n \xrightarrow{d} X/\sigma \sim N(0, 1),$$

which can be used to construct an asymptotic confidence interval for $\mu$: $X_n \pm z_{\alpha/2}S_n/\sqrt{n}$, with asymptotic confidence level $1 - \alpha$.

Example 3. A special case of example 2 is when the $X_i$ are independent $\mathrm{Ber}(p)$, so $EX_i = p$, $\mathrm{Var}\,X_i = p(1-p)$. Then a natural consistent estimator of $p$ is $X_n = \hat{p}_n = \sum_{i=1}^n X_i/n$. By the law of large numbers $\hat{p}_n \xrightarrow{P} p$, so by continuous mapping $\hat{p}_n(1 - \hat{p}_n) \xrightarrow{P} p(1-p)$. Thus an asymptotically consistent variance estimator is $S_n^2 = \hat{p}_n(1 - \hat{p}_n)$, and an asymptotic confidence interval for $p$ is $\hat{p}_n \pm z_{\alpha/2}\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}$ with asymptotic confidence level $1 - \alpha$.
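Example 3 is easy to check numerically; the following is a minimal simulation sketch (not from the thesis) of the coverage of this interval.

```r
# Empirical coverage of the asymptotic Bernoulli confidence interval.
set.seed(1)
p <- 0.3; n <- 500; n_rep <- 2000; alpha <- 0.05
z <- qnorm(1 - alpha / 2)

covered <- replicate(n_rep, {
  x <- rbinom(n, size = 1, prob = p)    # n i.i.d. Ber(p) trials
  p_hat <- mean(x)                      # consistent estimator of p
  se <- sqrt(p_hat * (1 - p_hat) / n)   # plug-in standard error
  abs(p_hat - p) <= z * se              # does the interval cover the true p?
})
mean(covered)  # should be close to 1 - alpha = 0.95
```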

3.2 Survival Analysis

3.2.1 Definitions

The relation between survival time and cause of death can be described by survival curves. These show, quite intuitively, how members of the groups survived over time and may reveal a lot of information. Essentially, the survival curves estimate the probability of survival over the time period considered. The following is based on material from Moeschberger et al (1997).

From here on, let $\Pr(A)$ denote the probability of an event $A$, and $P(X)$ denote the (joint) density or probability mass function of a scalar (or vector) random variable $X$.

Definition 1. The function $S(t) = \Pr(T > t)$ describes the probability of an individual surviving beyond the time $t$ and is called the survivor function. When $T$ is discrete, assuming it takes values $t_j$, $j = 1, 2, \dots$, with probability mass function $p(t_j) = \Pr(T = t_j)$, where $t_1 < t_2 < \dots$, the survival function is defined as

$$S(t) = \Pr(T > t) = \sum_{t_j > t} p(t_j).$$

Definition 2. The function

$$h(t) = \lim_{\epsilon \to 0} \frac{\Pr(t \le T < t + \epsilon \mid T \ge t)}{\epsilon}$$

is called the hazard rate. It is the conditional probability that an individual who survives just prior to time $t$ experiences the event at time $t$. When $T$ is discrete the hazard function is given by

$$h(t_j) = \Pr(T = t_j \mid T \ge t_j) = \frac{p(t_j)}{S(t_{j-1})}.$$

Remark 2. For discrete $T$ the survival function may be written as a product of conditional survival probabilities,

$$S(t) = \prod_{j:\, t_j \le t} \frac{S(t_j)}{S(t_{j-1})},$$

and since $p(t_j) = S(t_{j-1}) - S(t_j)$ the survival function is related to the hazard function by the following equality:

$$S(t) = \prod_{j:\, t_j \le t} (1 - h(t_j)).$$

Definition 3 (Kaplan-Meier Estimator). Assume the events occur at $n$ distinct times $t_1 < \dots < t_n$, and that at time $t_f$ there are $m_f$ events, or deaths, and $n_f$ individuals at risk. The estimator

$$\hat{S}(t) = \begin{cases} 1 & \text{if } t_1 > t \\ \prod_{f:\, t_f \le t}\left(1 - \frac{m_f}{n_f}\right) & \text{if } t_1 \le t \end{cases}$$

is called the Kaplan-Meier or product-limit estimator. It is a step function with jumps at the observed event times. The quantity $m_f/n_f$ provides an estimate of the hazard rate.

Remark 3. Hosmer, Lemeshow and May (2008) describe censoring mechanisms and how the Kaplan-Meier estimator handles this. Right censoring is dealt with in the following way: if time r is right censored then the corresponding individual has been counted towards the numbers at risk up until time r but will not contribute to the numbers at risk after that, hence the number of individuals at risk after time r will be reduced. The censoring will not contribute to the number of events or deaths. Had the individual experienced death then the overall survival probability would have been smaller and had the individual lived the overall survival probability would have been larger, compared to the censored case.
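A minimal sketch (not the thesis code) of the product-limit estimator with the survival package in R; the toy follow-up times and censoring indicators are made up.

```r
library(survival)

d <- data.frame(
  time   = c(2, 5, 5, 7, 9, 12, 14, 20),  # follow-up times
  status = c(1, 1, 0, 1, 0, 1, 1, 0)      # 1 = event (death), 0 = right-censored
)

fit <- survfit(Surv(time, status) ~ 1, data = d)  # Kaplan-Meier estimator
summary(fit)  # survival estimates with Greenwood standard errors
plot(fit)     # step function with jumps at the observed event times
```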

3.2.2 Asymptotic Distribution and Confidence Intervals

In order to construct a confidence interval for the Kaplan-Meier estimator at a specific time, we need an asymptotic distribution. For a fixed time interval $[0, T]$, Kalbfleisch and Prentice (2002) show, using counting process theory, that under mild assumptions about the processes involved, the Kaplan-Meier estimator is uniformly consistent and, when scaled by the variance estimator, converges weakly to a normal distribution.

Theorem 4. Let $t_{(f)}$, $f \in \{1, \dots, n\}$, be the ordered failure times, where $n$ is the total number of failure times, let $m_f$ denote the number of failures at $t_{(f)}$ and $n_f$ the number at risk at time $t_{(f)}$. Assume that the observations of survival among the $n_f$ subjects at risk at time $t_{(f)}$ are independent $\mathrm{Ber}(p_f)$ distributed. Then Greenwood's formula for a variance estimator is given by

$$\widehat{\mathrm{Var}}[\hat{S}_n(t)] = \hat{S}_n(t)^2 \sum_{f:\, t_{(f)} \le t} \frac{m_f}{n_f(n_f - m_f)}.$$

For a sample of $n$ observations, this estimator is asymptotically consistent.

Proof: We derive the variance estimator. By assumption, the observations of survival among the $n_f$ subjects at risk at time $t_{(f)}$ are independent $\mathrm{Ber}(p_f)$ distributed, so the natural estimator is $\hat{p}_f = 1 - \frac{m_f}{n_f}$ for $t_1 \le t$, with variance estimator $\frac{\hat{p}_f(1 - \hat{p}_f)}{n_f}$. The strategy is to first find an estimator for the variance of $\log(\hat{S}(t)) = \sum_{f:\, t_f \le t} \log(\hat{p}_f)$, a sum instead of a product, using the delta method, and then to back-transform this estimator with another use of the delta method. Starting with the logarithm,

$$\mathrm{Var}(\log(\hat{p}_f)) = \frac{1}{\hat{p}_f^2} \cdot \frac{\hat{p}_f(1 - \hat{p}_f)}{n_f} = \frac{m_f}{n_f(n_f - m_f)}$$

by example 1 part (a), and by independence we get

$$\mathrm{Var}(\log(\hat{S}(t))) = \sum_{f:\, t_f \le t} \frac{m_f}{n_f(n_f - m_f)}.$$

Another application of the delta method, applied to the exponential function with $X = \log(\hat{S}(t))$ as in example 1 part (b), renders

$$\mathrm{Var}(\hat{S}(t)) = \mathrm{Var}(\exp(\log(\hat{S}(t)))) = \hat{S}(t)^2\,\mathrm{Var}(\log(\hat{S}(t))) = \hat{S}(t)^2 \sum_{f:\, t_f \le t} \frac{m_f}{n_f(n_f - m_f)}.$$

Remark 4. It is possible to show that the Kaplan-Meier estimator is uniformly asymptotically consistent on bounded intervals, i.e.

$$\sup_{t \in [0, T]} |\hat{S}_n(t) - S(t)| \xrightarrow{P} 0 \quad \text{as } n \to \infty,$$

and, by the consistency of Greenwood's estimator, that

$$\frac{\hat{S}_n(t) - S(t)}{\sqrt{\widehat{\mathrm{Var}}(\hat{S}_n(t))}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty.$$

The last display can be used to construct an approximate $100(1 - \alpha)\%$ confidence interval; compare with example 2.

Remark 5. Since a normally distributed random variable can assume any real value and $\hat{S}(t) \in [0, 1]$, then, for $\hat{S}(t)$ close to the endpoints, these intervals might stretch outside $[0, 1]$, which is not desirable. One way to deal with this problem is to stretch $[0, 1]$ to the whole real line by log-log-transforming $\hat{S}(t)$ and calculating an approximate confidence interval which is then back-transformed into $[0, 1]$. The variances of the log-log and logit transformations are given in the following proposition.

Proposition 2. Under the same assumptions as in theorem 4:

i) If $\hat{L}(t) = \log(-\log(\hat{S}(t)))$, then

$$\mathrm{Var}(\hat{L}(t)) = \frac{1}{\log(\hat{S}(t))^2} \sum_{f:\, t_{(f)} \le t} \frac{m_f}{n_f(n_f - m_f)}.$$

ii) If $\hat{L}(t) = \log\left(\frac{\hat{S}(t)}{1 - \hat{S}(t)}\right)$, then

$$\mathrm{Var}(\hat{L}(t)) = \frac{1}{(1 - \hat{S}(t))^2} \sum_{f:\, t_{(f)} \le t} \frac{m_f}{n_f(n_f - m_f)}.$$

Proof: (i) The variance $\mathrm{Var}(\hat{L}(t))$ follows by yet another application of the delta method, as in example 1 part (c):

$$\mathrm{Var}(\hat{L}(t)) = \frac{\mathrm{Var}(\hat{S}(t))}{\log(\hat{S}(t))^2\,\hat{S}(t)^2} = \frac{1}{\log(\hat{S}(t))^2} \sum_{f:\, t_{(f)} \le t} \frac{m_f}{n_f(n_f - m_f)}.$$

(ii) As in (i), $\mathrm{Var}(\hat{L}(t))$ follows by the delta method, as in example 1 part (d):

$$\mathrm{Var}(\hat{L}(t)) = \frac{\mathrm{Var}(\hat{S}(t))}{\hat{S}(t)^2(1 - \hat{S}(t))^2} = \frac{1}{(1 - \hat{S}(t))^2} \sum_{f:\, t_{(f)} \le t} \frac{m_f}{n_f(n_f - m_f)}.$$
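In R's survival package the log-log transformed interval of remark 5 and proposition 2(i) corresponds to conf.type = "log-log"; a minimal sketch (not from the thesis, toy data as before):

```r
library(survival)

d <- data.frame(
  time   = c(2, 5, 5, 7, 9, 12, 14, 20),
  status = c(1, 1, 0, 1, 0, 1, 1, 0)
)

# The interval is computed on the log(-log S) scale and back-transformed,
# so its limits always stay inside [0, 1].
fit <- survfit(Surv(time, status) ~ 1, data = d, conf.type = "log-log")
summary(fit)
```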


3.2.3 The Cox Proportional Hazards Model

A brief introduction to the relevant concepts of the Cox proportional hazards model is given below and is based on material from Hosmer, Lemeshow and May (2008). The key point of using a Cox proportional hazards model is to incorporate explanatory variables in a survival analysis setting and obtain hazard ratios and adjusted survival curves.

Definition 4. Let $t$ denote time, $X = (X_1, \dots, X_p)$ be a vector of explanatory variables that do not involve $t$, and $\beta = (\beta_1, \dots, \beta_p)$. The function $h(t, X)$ is called the hazard function and it is the hazard rate conditioned on $X$. The Cox proportional hazards model assumes that the hazard function can be written in the form

$$h(t, X) = h_0(t)\exp(X^T\beta),$$

where $h_0(t)$ is called the baseline hazard function. Furthermore, we call $H(t, X) := \int_0^t h(u, X)\,du$ the cumulative hazard function and $H_0(t) := \int_0^t h_0(u)\,du$ the cumulative baseline hazard function.

The reason why $h_0(t)$ is called the baseline hazard function is that when $X = 0$, meaning no explanatory variables are in the model, then $h(t, X) = h_0(t)$. The model is semiparametric because the baseline hazard is unspecified. If $X_1$ is a binary variable then $e^{\beta_1}$ is the effect of exposure adjusted for the other variables. The hazard function can be estimated from the model by maximum likelihood through a partial likelihood function in an iterative manner. The solution is in general not explicit, and the likelihood is called partial since it only considers probabilities for those who fail and not for the observations that are censored. From this, the estimates $\hat{h}_0(t)$ and $\hat{\beta}$ are obtained. An asymptotic confidence interval for $\beta_i$ is based on the approximation $(\hat{\beta}_i - \beta_i)/\mathrm{sd}(\hat{\beta}_i) \sim N(0, 1)$.

Definition 5. The hazard ratio for two different individuals with explanatory variables $X^*$ and $X$ is defined as

$$HR(X^*, X, t) := \frac{h(t, X^*)}{h(t, X)}.$$

It is important to note that the model relies on the assumption that the hazard ratio is constant over time. The hazard ratio is estimated by plugging in the corresponding estimates from the model, $\widehat{HR}(X^*, X, t) = \hat{h}(t, X^*)/\hat{h}(t, X)$, which simplifies to $\exp((X^* - X)^T\hat{\beta})$.

Definition 6. We call $S(t, X) := \exp(-H(t, X))$ the adjusted survival function and $S_0(t) := \exp(-H_0(t))$ the baseline survival function.

Note that the cumulative hazard function can be written as $\int_0^t h_0(u)e^{X^T\beta}\,du = e^{X^T\beta}H_0(t)$, which implies that

$$S(t, X) = e^{-\exp(X^T\beta)H_0(t)} = \left[e^{-H_0(t)}\right]^{\exp(X^T\beta)} = \left[S_0(t)\right]^{\exp(X^T\beta)}.$$

The function $S_0(t)$ is called the baseline survival function since, if there are no explanatory variables ($X = 0$), then $S(t, X) = S_0(t)$ and we obtain the usual survival function. If one plots the survival function over time one obtains the adjusted survival curve, meaning it has been adjusted for the explanatory variables. It has the estimate $\hat{S}(t, X) = (\hat{S}_0(t))^{\exp(X^T\hat{\beta})}$, which is obtained during the estimation process mentioned above.

One may evaluate the proportional hazards assumption in several ways, and perhaps the most intuitive way is to do it graphically. Note that if we log-log transform the adjusted survival function we obtain

$$\log(-\log(S(t, X))) = X^T\beta + \log(-\log(S_0(t))),$$

and for two individuals with explanatory variables $X^*$ and $X$ we have that

$$\log(-\log(S(t, X^*))) - \log(-\log(S(t, X))) = (X^* - X)^T\beta,$$

which does not depend on $t$; hence the log-log transformed survival curves should be approximately parallel under the model.
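A minimal sketch (not the thesis code) of fitting a Cox model and checking the proportional hazards assumption in R; group and age are hypothetical covariates on toy data.

```r
library(survival)

d <- data.frame(
  time   = c(2, 5, 5, 7, 9, 12, 14, 20),
  status = c(1, 1, 0, 1, 0, 1, 1, 0),
  group  = factor(c("A", "A", "B", "B", "A", "B", "A", "B")),
  age    = c(70, 65, 72, 80, 60, 75, 68, 77)
)

fit <- coxph(Surv(time, status) ~ group + age, data = d)
exp(coef(fit))  # hazard ratios exp(beta)
cox.zph(fit)    # score test of the proportional hazards assumption

# Graphical check: the log(-log S) curves per group should be roughly parallel.
km <- survfit(Surv(time, status) ~ group, data = d)
plot(km, fun = "cloglog", col = 1:2)
```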


3.2.4 Restricted Mean Survival

We conclude by discussing the mean survival, which is defined below. The material of this section is from Sheldon (2013). The mean survival is a useful statistic to summarise survival data and can be used to describe life years gained or lost. While the estimator is biased and underestimates the mean, it is easy to interpret in terms of expected survival time.

Definition 7. We call

$$\mu := ET = \int_0^\infty S(t)\,dt$$

the mean survival, and a natural estimator is

$$\hat{\mu} := \int_0^\infty \hat{S}(t)\,dt = \sum_{f=1}^n \hat{S}(t_{f-1})(t_f - t_{f-1}), \quad t_0 = 0.$$

The problem with this definition is that it may not be useful in practice, since the last observation might be censored, in which case the integral is not defined since $\hat{S}$ never reaches zero. One way of solving this is to consider the restricted mean survival, meaning that we restrict the integral to the interval $[0, t^*]$, $t^* < t_n$, so that all later observations are disregarded.

Definition 8. We call

$$\mu(t^*) := E\min(T, t^*) = \int_0^{t^*} S(t)\,dt$$

the restricted mean survival; a natural estimator is

$$\hat{\mu}(t^*) := \int_0^{t^*} \hat{S}(t)\,dt = \sum_{f \ge 1:\, t_f \le t^*} \hat{S}(t_{f-1})(t_f - t_{f-1}),$$

with variance estimator

$$\mathrm{Var}(\hat{\mu}(t^*)) := \hat{\mu}(t^*)^2 \sum_{f=1}^n \frac{m_f}{n_f(n_f - m_f)}.$$

If the time is specified in years, the restricted mean can be thought of as a $t^*$-year life expectancy, or in other words, the expected number of life years left within $t^*$ years.

To compare two groups, for example to compare treatment effects on survival through expected years of life lost, Karrison (1997) suggests that one may use the following asymptotically standard normal quantity:

$$\frac{\hat{\mu}_1(t^*) - \hat{\mu}_2(t^*)}{\sqrt{\mathrm{Var}(\hat{\mu}_1(t^*)) + \mathrm{Var}(\hat{\mu}_2(t^*))}} \sim N(0, 1).$$
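The restricted mean is available directly from a fitted Kaplan-Meier curve in R; a minimal sketch (not from the thesis, toy data as before) with $t^* = 10$:

```r
library(survival)

d <- data.frame(
  time   = c(2, 5, 5, 7, 9, 12, 14, 20),
  status = c(1, 1, 0, 1, 0, 1, 1, 0)
)

fit <- survfit(Surv(time, status) ~ 1, data = d)
# print.rmean = TRUE reports the restricted mean survival and its standard
# error; rmean sets the upper limit t*.
print(fit, print.rmean = TRUE, rmean = 10)
```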

3.3 Evaluation of Tests for Classification

3.3.1 Notation and Definitions

One may evaluate medical tests with a continuous test result for classification of a binary variable by using the Receiver Operating Characteristic curve and associated measures of accuracy. The purpose of the ROC curve is to visualize the discrepancy between two population densities for various thresholds of the test result, and is useful when looking for the optimal choice of threshold to separate the populations, comparing thresholds and to assess overall test performance.


Definition 9. Let $D$ denote true disease status, a binary variable such that

$$D = \begin{cases} 1 & \text{for disease} \\ 0 & \text{for non-disease.} \end{cases}$$

Let $Y$ be a continuous test result and $c \in \mathbb{R}$ a threshold such that a binary test can be constructed:

$$T = \begin{cases} \text{positive} & \text{if } Y \ge c \\ \text{negative} & \text{if } Y < c. \end{cases}$$

Then the true and false positive fractions and the positive and negative predictive values at threshold $c$ are defined as

$$\mathrm{TPF}(c) = \Pr(Y \ge c \mid D = 1), \qquad \mathrm{FPF}(c) = \Pr(Y \ge c \mid D = 0),$$

$$\mathrm{PPV}(c) = \Pr(D = 1 \mid Y \ge c), \qquad \mathrm{NPV}(c) = \Pr(D = 0 \mid Y < c).$$

In biomedical research TPF is commonly referred to as sensitivity and 1 − FPF as specificity. The false positive fraction describes how likely it is to get a positive test result when one is in fact free from disease, and 1 − TPF is equal to the false negative fraction, which describes how common it is to fail to detect disease (by a negative test result) when disease is present. An ideal test at threshold c has FPF(c) = 0 and TPF(c) = 1 while a useless test has FPF(c) = TPF(c).

The prevalence $p$ of disease describes how common disease is in the population and is defined as $p = \Pr(D = 1)$. The positive and negative predictive values describe the usefulness of the test, are more relevant to the patient and caregiver, and depend on the prevalence. A perfect test has $\mathrm{PPV}(c) = 1 = \mathrm{NPV}(c)$, and a useless test (which does not contribute any information about the true disease status) has

$$\mathrm{PPV}(c) = \Pr(D = 1 \mid Y \ge c) = \Pr(D = 1) = p, \qquad \mathrm{NPV}(c) = \Pr(D = 0 \mid Y < c) = \Pr(D = 0) = 1 - p.$$

Another way of describing the prognostic or diagnostic value of a test are the likelihood ratios. In a Bayesian testing context they correspond to the Bayes factor which, multiplied by the prior odds, gives the posterior odds, compare with Robert (2007).

Definition 10. Using the same notation as above, the likelihood ratios are defined as

$$\mathrm{LR+}(c) := \frac{\mathrm{TPF}(c)}{\mathrm{FPF}(c)} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}, \qquad \mathrm{LR-}(c) := \frac{1 - \mathrm{TPF}(c)}{1 - \mathrm{FPF}(c)} = \frac{1 - \text{Sensitivity}}{\text{Specificity}}.$$

The likelihood ratios attain values in $(0, \infty)$ and describe how the pre-test odds $\Pr(D = 1)/\Pr(D = 0) = p/(1-p)$ are altered by the test result to obtain the post-test odds $\Pr(D = 1 \mid Y)/(1 - \Pr(D = 1 \mid Y))$.

Proposition 3. The pre-test and post-test odds can be related through the likelihood ratios in the following way:

(i) $\dfrac{\Pr(D = 1 \mid Y \ge c)}{1 - \Pr(D = 1 \mid Y \ge c)} = \mathrm{LR+}(c)\,\dfrac{\Pr(D = 1)}{1 - \Pr(D = 1)}$,

(ii) $\dfrac{\Pr(D = 1 \mid Y < c)}{1 - \Pr(D = 1 \mid Y < c)} = \mathrm{LR-}(c)\,\dfrac{\Pr(D = 1)}{1 - \Pr(D = 1)}$.


[Figure 1: Relating pre-test and post-test probability through a likelihood ratio of size d; curves shown for d = 0.01, 0.1, 0.5, 1, 2, 10, 100, with pre-test probability on the horizontal axis and post-test probability on the vertical axis.]

The likelihood ratios describe the usefulness of a test: a useless test has likelihood ratios equal to 1, while a perfect test has $\mathrm{LR+}(c) = \infty$ and $\mathrm{LR-}(c) = 0$. The further away from 1 in the respective direction, the more useful the test is. The relation between odds and probability is one-to-one, meaning that we may derive the post-test probability from the pre-test probability $p$ through $\frac{dp}{1 - p + dp}$, where $d$ is one of the likelihood ratios for a fixed threshold $c$. This relation for various sizes of a likelihood ratio can be seen in figure 1. A large positive likelihood ratio means that the test is good at ruling in disease, and vice versa. Jaeschke et al (1994) suggest rules of thumb for the likelihood ratios: LR+ ≥ 10 and LR- ≤ 0.1 indicate a conclusive change from pre-test to post-test probability, 5-10 and 0.1-0.2 a moderate change, and 2-5 and 0.2-0.5 a small change.
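The odds-to-probability relation above is a one-liner; a minimal sketch (not from the thesis):

```r
# Post-test probability from pre-test probability p and likelihood ratio d,
# via p -> d p / (1 - p + d p).
post_test_prob <- function(pre, d) d * pre / (1 - pre + d * pre)

post_test_prob(pre = 0.10, d = 10)   # conclusive positive LR: ~0.53
post_test_prob(pre = 0.10, d = 0.1)  # conclusive negative LR: ~0.011
```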

The positive fractions and the likelihood ratios can easily be visualized for each threshold using the Receiver Operating Characteristic (ROC) curve.

Definition 11. Using the same notation as above, we let the ROC curve be the curve created when the true and false positive fractions are calculated for all thresholds $c$. Formally,

$$\mathrm{ROC} = \{(\mathrm{FPF}(c), \mathrm{TPF}(c)) : c \in (-\infty, \infty)\}.$$

[Figure 2: Visual representation of true positive and false negative fractions.]

[Figure 3: Example of a ROC curve and the concepts of likelihood ratios; axes: false positive fraction (1 − specificity) versus true positive fraction (sensitivity), with the useless test along the diagonal.]


The positive likelihood ratio at a point $x$ on the ROC curve is the slope of the line between the origin and $x$, since $\mathrm{TPF}(x)/\mathrm{FPF}(x) = \mathrm{LR+}(x)$, and the negative likelihood ratio is the slope of the line between $x$ and $(1, 1)$, since $\Delta\mathrm{TPF}/\Delta\mathrm{FPF} = (1 - \mathrm{TPF}(x))/(1 - \mathrm{FPF}(x)) = \mathrm{LR-}(x)$; compare with fig. 3, Johnson (2004).

3.3.2 Estimation

Let $N$ be the total number of observations and define $n_{D=1}$ to be the number of observations with disease ($D = 1$) and $n_{D=0}$ the number of observations that are truly non-diseased ($D = 0$). Let $n^+$ and $n^-$, correspondingly, be the numbers of observations with positive and negative test results. Then the data can be displayed according to the table below.

Table 2: Notation for cross-classified test results and disease status for a fixed threshold.

         D=0            D=1            Total
Y=0      $n^-_{D=0}$    $n^-_{D=1}$    $n^-$
Y=1      $n^+_{D=0}$    $n^+_{D=1}$    $n^+$
Total    $n_{D=0}$      $n_{D=1}$      $N$

The prevalence $p$ is the parameter of the binomial distribution governing the disease status, $n_{D=1} \sim \mathrm{Bin}(p, N)$, and is hence estimated simply as $\hat{p} = n_{D=1}/N$. The positive fractions and predictive values are just marginal binomial distributions, so conditionally

$$n^+_{D=1} \sim \mathrm{Bin}(\mathrm{TPF}, n_{D=1}), \qquad n^+_{D=0} \sim \mathrm{Bin}(\mathrm{FPF}, n_{D=0}),$$

$$n^+_{D=1} \sim \mathrm{Bin}(\mathrm{PPV}, n^+), \qquad n^-_{D=0} \sim \mathrm{Bin}(\mathrm{NPV}, n^-),$$

and the natural estimators are

$$\widehat{\mathrm{TPF}} = n^+_{D=1}/n_{D=1}, \qquad \widehat{\mathrm{FPF}} = n^+_{D=0}/n_{D=0},$$

$$\widehat{\mathrm{PPV}} = n^+_{D=1}/n^+, \qquad \widehat{\mathrm{NPV}} = n^-_{D=0}/n^-.$$

Since the predictive values and positive fractions attain values in $(0, 1)$, we may use the continuous mapping theorem to obtain estimates of the likelihood ratios.

Remark 7. Using table 2 it is easy to see that the predictive values depend on the prevalence. Fix N and change the proportion of diseased and non-diseased by changing the prevalence in the following way: multiply the quantities of the left column by $v \in (0, 1)$ and the right column by $r = (N - v n_{D=0})/n_{D=1}$. The prevalence is now pr, and the predictive values are different, but the positive fractions are unchanged:

$$\widehat{\text{TPF}} = \frac{r n^+_{D=1}}{r n_{D=1}}, \qquad \widehat{\text{FPF}} = \frac{v n^+_{D=0}}{v n_{D=0}},$$
$$\widehat{\text{PPV}} = \frac{r n^+_{D=1}}{v n^+_{D=0} + r n^+_{D=1}}, \qquad \widehat{\text{NPV}} = \frac{v n^-_{D=0}}{v n^-_{D=0} + r n^-_{D=1}}.$$

Since the positive fractions do not depend on the prevalence, the same applies to the likelihood ratios, as they are functions of the positive fractions.
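The invariance can also be checked numerically (same made-up counts as in the sketch above; v and r as in the remark):

```python
# Numerical check of Remark 7: rescale the D=0 column by v and the D=1
# column by r = (N - v*n_d0)/n_d1. The positive fractions (and hence the
# likelihood ratios) are unchanged; the predictive values are not.
n_pos_d1, n_neg_d1, n_pos_d0, n_neg_d0 = 80, 20, 30, 170
n_d1, n_d0 = n_pos_d1 + n_neg_d1, n_pos_d0 + n_neg_d0
N = n_d1 + n_d0

v = 0.5
r = (N - v * n_d0) / n_d1

tpf_old, tpf_new = n_pos_d1 / n_d1, (r * n_pos_d1) / (r * n_d1)
ppv_old = n_pos_d1 / (n_pos_d1 + n_pos_d0)
ppv_new = (r * n_pos_d1) / (r * n_pos_d1 + v * n_pos_d0)
print(tpf_old == tpf_new, ppv_old, ppv_new)   # True 0.727... 0.914...
```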

Remark 8. By interpreting the binomial distributions as sums of independent Bernoulli trials we may use example 3 to derive asymptotic confidence intervals, with p being one of TPF, FPF, PPV or NPV together with its corresponding estimator. By assumption, the positive fractions are independent (since we condition on $n_{D=1}$) and thus exact confidence rectangles can be computed. More precisely, given α, a $1 - \alpha$ confidence rectangle for (TPF, FPF) can be constructed by finding the corresponding $1 - \tilde{\alpha} = \sqrt{1 - \alpha}$ confidence intervals for TPF and FPF separately; by independence the rectangle then has coverage $(1 - \tilde{\alpha})^2 = 1 - \alpha$.
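A sketch of such a rectangle, using Wald-type intervals (the choice of interval is mine, and the counts are made up):

```python
from scipy.stats import norm

# Confidence rectangle for (TPF, FPF) as in Remark 8: two independent
# marginal intervals at level 1 - alpha_tilde = sqrt(1 - alpha) give joint
# coverage (1 - alpha_tilde)^2 = 1 - alpha.
n_pos_d1, n_d1 = 80, 100     # diseased: positives / total
n_pos_d0, n_d0 = 30, 200     # non-diseased: positives / total

alpha = 0.05
alpha_tilde = 1 - (1 - alpha) ** 0.5
z = norm.ppf(1 - alpha_tilde / 2)

def wald_ci(p_hat, n):
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

print(wald_ci(n_pos_d1 / n_d1, n_d1))   # interval for TPF
print(wald_ci(n_pos_d0 / n_d0, n_d0))   # interval for FPF
```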


3.4 Tests for Normality

Tests for normality test the null hypothesis H0: the data is normally distributed, against the alternative H1: the data is not normally distributed. In the context of parameter pooling based on multiple imputations, these tests will be useful to assess whether a certain transformation yields a better or satisfactory normal approximation. The Shapiro-Wilk, Shapiro-Francia, Anderson-Darling and Cramer-von Mises tests are four examples of tests for normality. This section is a brief summary of material from Thode Jr (2002).

The first test is the Shapiro-Wilk test for normality, based on the test statistic

$$W = \frac{b^2}{(n-1)s^2},$$

where $s^2$ is the usual variance estimator, n is the sample size and b is, up to a normalizing constant, a generalized least squares estimate of the slope from a linear regression of the sample order statistics on their expected values. Shapiro and Wilk (1965) showed that the maximum value of W is 1 and that, under H0, the numerator and denominator are both, up to a constant, estimating the variance. Thus one would expect W to be close to 1 under normality and smaller otherwise. Therefore, one rejects H0 if W ≤ c for some appropriate constant c, which means that a smaller W generally gives a smaller p-value.

The Shapiro-Francia test for normality approximates W under the assumption that, for large sample sizes, the order statistics can be treated as independent. It is basically the squared correlation between the approximated ordered quantiles from the standard normal distribution and the ordered sample values.

The Anderson-Darling and Cramer-von Mises tests for normality are goodness of fit tests based on the empirical distribution function $F_n$. They measure the weighted distance between the true distribution function F and $F_n$ through

$$n \int_{-\infty}^{\infty} (F_n(x) - F(x))^2 w(x)\, dF(x),$$

for a weight function w.

The Cramer-von Mises test is based on this distance with w(x) = 1 and uses the test statistic

$$W^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left(p_i - \frac{2i-1}{2n}\right)^2, \qquad p_i = \Phi\!\left(\frac{x_{(i)} - \bar{x}}{s}\right),$$

where Φ is the standard normal cumulative distribution function and $x_{(i)}$ is the i:th order statistic. Under normality this distance should be small, and therefore one rejects H0 if $W^2 \geq c$ for some appropriate constant c.

The Anderson-Darling test is based on this distance with $w(x) = (F(x)(1 - F(x)))^{-1}$, thus putting more weight on the tails of the distribution compared to the Cramer-von Mises test. It uses the test statistic

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\left(\log(p_i) + \log(1 - p_{n-i+1})\right),$$

and one rejects H0 if $A^2 \geq c$ for some appropriate constant c.
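For reference, three of these tests can be run with scipy (a sketch of my own; scipy does not ship Shapiro-Francia, and its Cramer-von Mises test assumes a fully specified distribution, so plugging in estimated parameters makes that p-value only approximate):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=200)          # data to test; replace with the sample of interest

# Shapiro-Wilk: reject normality for small W (small p-value)
W, p_sw = stats.shapiro(x)

# Anderson-Darling: reject for large A^2; scipy reports critical values
# for fixed significance levels instead of a p-value
ad = stats.anderson(x, dist="norm")

# Cramer-von Mises with estimated mean and sd plugged in (approximate)
res_cvm = stats.cramervonmises(x, "norm", args=(x.mean(), x.std(ddof=1)))

print(W, p_sw, ad.statistic, ad.critical_values, res_cvm.statistic)
```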

3.5 Bayesian Inference for Missing Data

Missing data can pose a challenge when performing statistical analyses and there are several ways to handle the problem. This section is based on material from Rubin (2004) unless otherwise stated, and I have tried to fill in the gaps where Rubin chooses to be sketchy. The purpose of this section is to describe the concepts involved when modelling missing data. The idea is to create, in some clever way, "reasonable guesses", or imputations, for all the missing values in the data set using an appropriate method. The process is not deterministic given the data, since each imputed value is a draw from its posterior distribution. The term multiple imputation means that this procedure is repeated multiple times with different seeds sent to the random number generator, very much like how bootstrap replications are generated. This idea will be implemented using the MICE algorithm, and the following section will serve as a theoretical motivation for it.
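The procedure can be sketched in code. The snippet below uses scikit-learn's IterativeImputer as a stand-in for the chained-equations idea (an illustration of the concept only, not the imputation model used in this thesis; the data and parameters are made up):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy data: bivariate normal with 30% of the second column made missing.
rng = np.random.default_rng(0)
Y = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=500)
Y[rng.random(500) < 0.3, 1] = np.nan

# sample_posterior=True draws each imputed value from its predictive
# distribution; m different seeds give m completed data sets, analogous
# to multiple imputation.
m = 5
completed = [
    IterativeImputer(sample_posterior=True, random_state=k).fit_transform(Y)
    for k in range(m)
]
# The analysis of interest is then run on every completed data set and
# the m results are pooled.
```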


Theorem 5 (Bayes Theorem). Let A, B be events such that Pr(B) > 0. Then

$$Pr(A|B) = \frac{Pr(B|A)\,Pr(A)}{Pr(B)}.$$

Furthermore, if X, Y are two random variables with conditional density P(X|Y) and marginal P(Y), then the conditional density of Y|X is

$$P(Y|X) = \frac{P(X|Y)P(Y)}{\int P(X|Y)P(Y)\,dY}.$$

Remark 9. When Bayes theorem is applied one sometimes uses the proportionality sign ∝ to relate the posterior distribution P(Y|X) to the prior P(Y) and the conditional distribution P(X|Y) through

$$P(Y|X) \propto P(X|Y)P(Y).$$

This notation means that the right hand side is equal to the left hand side up to the normalising constant $P(X) = \int P(X|Y)P(Y)\,dY$. The proportionality sign will be used in more general settings where densities or probability mass functions are equal up to a normalising constant, except in the case of improper prior distributions, which do not integrate.
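For a concrete instance of the notation (a standard textbook example, not taken from Rubin), let $x_1, \ldots, x_n \mid \theta$ be i.i.d. Bernoulli(θ) with a uniform prior θ ~ U(0, 1). Then

$$P(\theta \mid x) \propto P(x \mid \theta)P(\theta) = \theta^{s}(1-\theta)^{n-s} \cdot 1, \qquad s = \sum_{i=1}^{n} x_i,$$

which is the kernel of a Beta(s + 1, n − s + 1) density; the suppressed normalising constant is $P(x) = \int_0^1 \theta^{s}(1-\theta)^{n-s}\,d\theta = B(s+1, n-s+1)$.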

3.5.1 Notation and Definitions

Definition 12. Let Y be an n × p matrix containing data with missing values on p variables and n observations, and let X be an n × q matrix of fully observed variables. Furthermore, let the response indicator $R = \{r_{i,j}\}$ be an n × p 0-1 matrix indicating the elements of Y that are missing:

$$r_{i,j} = \begin{cases} 1 & \text{if } y_{i,j} \text{ is observed,} \\ 0 & \text{if } y_{i,j} \text{ is missing.} \end{cases}$$

This means that X and R are completely observed. Denote the observed data by $Y_{obs}$, with obs = $\{(i, j) : r_{i,j} = 1\}$, and the missing data by $Y_{mis}$, with mis = $\{(i, j) : r_{i,j} = 0\}$, respectively, and write $Y = (Y_{obs}, Y_{mis})$ for simplicity, being aware of the abuse of notation.
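In code, the response indicator is immediate to construct (a toy illustration of my own, with missing entries coded as NaN):

```python
import numpy as np

# Response indicator R of Definition 12 from a data matrix with NaNs.
Y = np.array([[1.0, 2.0],
              [np.nan, 0.5],
              [3.0, np.nan]])
R = (~np.isnan(Y)).astype(int)   # r_ij = 1 if y_ij observed, 0 if missing
print(R)                         # [[1 1], [0 1], [1 0]]
```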

Definition 13. Note that R is random and its distribution may depend on $Y = (Y_{obs}, Y_{mis})$. This dependency is described by the missing data model

$$P(R|X, Y_{obs}, Y_{mis}, \phi),$$

where φ contains the parameters of the missing data model. The parameters φ are used only for modelling, have no intrinsic scientific value, and are generally unknown.

Remark 10. In Rubin (2004) there is also a sampling process I, often involved in surveys, which indicates the observations from the whole population that are included in the study. In this thesis the sampling process will not be considered; the indicator matrix I is constant and consists only of 1's, as everyone is considered to be included.

Let us discuss some important concepts related to the missing data model.

Definition 14. The probability of an observation being missing is considered to be dictated by a process called the missing data mechanism or response mechanism P (R|X, Y ), and it can be described using a missing data model.

If it can be assumed that the cause of missing data is unrelated to the data, then the data is said to be MCAR. In this case, many of the complexities that occur with missing data can be ignored, apart from the loss of information. This is in general an unrealistic assumption but may apply in some special situations. If the probability of missing data depends only on the observed data, then the data is said to be MAR, and if the probability of missing data depends on observed and unobserved data, i.e. on unknown reasons, such that neither MCAR nor MAR holds, then the missing data is said to be MNAR. In this latter case, special adjustments and additional assumptions are needed in order to impute the missing data.

Example 4. Buuren (2012) gives a good example of these concepts. Consider a weighing scale that runs out of batteries during a weighing procedure. This results in missing data, and the data would be MCAR if the order of the items to be weighed is random. If the scale is placed on a soft surface for some observations and on a hard surface for others, so that it generates missing values differently on the two surfaces, and we note the surface, then we have an example of MAR. If, on the other hand, the weighing scale wears out and produces more missing data over time, we fail to note this, and we measure heavier objects at later times, then the distribution of measurements will be distorted. This case is an example of MNAR.

We can restate the missing data patterns mentioned above as a definition using the missing data model.

Definition 15. Let

- MCAR: $P(R|X, Y_{obs}, Y_{mis}, \phi) = P(R|\phi)$,
- MAR: $P(R|X, Y_{obs}, Y_{mis}, \phi) = P(R|X, Y_{obs}, \phi)$,
- MNAR: $P(R|X, Y_{obs}, Y_{mis}, \phi)$ (no general simplification possible).

Example 5. A simple example from Buuren (2012) illustrates the difference between MCAR, MAR and MNAR and gives a concrete example of what the parameter φ may describe. Let Z = (X, Y) be drawn from $N_2(\mu, \Sigma)$ with correlation coefficient ρ = 1/2, and generate missing data in Y according to

$$Pr(R = 0|Z, \phi) = \phi_0 + \frac{e^X}{1 + e^X}\,\phi_1 + \frac{e^Y}{1 + e^Y}\,\phi_2$$

with $\phi = (\phi_0, \phi_1, \phi_2)$. Then, in order, the different missing data mechanisms correspond to

$$\phi_{MCAR} = (1/2, 0, 0), \qquad \phi_{MAR} = (0, 1, 0), \qquad \phi_{MNAR} = (0, 0, 1),$$

and give different distributions of the response:

$$Pr(R = 0|Z, \phi_{MCAR}) = \frac{1}{2}, \qquad \text{logit}(Pr(R = 0|Z, \phi_{MAR})) = X, \qquad \text{logit}(Pr(R = 0|Z, \phi_{MNAR})) = Y.$$
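A small simulation of this example (my own sketch; the sample size and seed are arbitrary) shows how the mean of the observed Y's is distorted under MAR and MNAR but not under MCAR:

```python
import numpy as np

# Simulating the three response mechanisms of Example 5.
rng = np.random.default_rng(42)
n = 10_000
cov = [[1.0, 0.5], [0.5, 1.0]]                  # rho = 1/2
X, Y = rng.multivariate_normal([0, 0], cov, size=n).T

def pr_missing(phi):
    phi0, phi1, phi2 = phi
    expit = lambda t: 1 / (1 + np.exp(-t))      # e^t / (1 + e^t)
    return phi0 + expit(X) * phi1 + expit(Y) * phi2

for name, phi in [("MCAR", (0.5, 0, 0)), ("MAR", (0, 1, 0)), ("MNAR", (0, 0, 1))]:
    miss = rng.random(n) < pr_missing(phi)      # R = 0 (Y missing)
    print(name, Y[~miss].mean())                # mean of the observed Y's
```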

These concepts are closely related to the notion of an ignorable or nonignorable missing data mechanism.

Definition 16. The missing data mechanism P(R|X, Y) is ignorable if $P(R|X, Y) = P(R|X, Y_{obs})$.

Lemma 2. The missing data mechanism is ignorable if and only if $P(Y_{mis}|X, Y_{obs}, R) = P(Y_{mis}|X, Y_{obs})$.


Example 6. A simple example of a nonignorable missing data mechanism is given by Rubin (2004). Let $Y_i > 0$, for $i = 1, \ldots, n$, be some i.i.d. measured quantity, let $R = (R_1, \ldots, R_n)$, and let

$$P(R|X, Y) = \prod_{i=1}^{n} \left(\frac{Y_i}{1 + Y_i}\right)^{R_i} \left(\frac{1}{1 + Y_i}\right)^{1 - R_i}$$

be a Bernoulli-distributed response mechanism that increases the probability of response with increasing $Y_i$, meaning that the equality in lemma 2 does not hold.
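Numerically (an illustrative sketch of my own, with an arbitrary lognormal choice for the distribution of the $Y_i$), this mechanism over-represents large values among the observed:

```python
import numpy as np

# Example 6: response probability Y_i / (1 + Y_i) grows with Y_i,
# so the observed values are biased upwards.
rng = np.random.default_rng(7)
y = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)   # Y_i > 0
observed = rng.random(y.size) < y / (1 + y)
print(y.mean(), y[observed].mean())   # observed mean exceeds the true mean
```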

Lemma 3. Let f(X, Y|θ) be a model for the data given a parameter θ, for example a normal distribution with mean vector µ and covariance matrix Σ, such that θ = (µ, Σ). Furthermore, let g(R|X, Y, φ) be a missing data model, with φ being a parameter for the model, compare with example 5. Then the missing data mechanism is ignorable if the MAR assumption holds and the parameters θ and φ are a priori independent.

Proof: start by rewriting

$$P(Y_{mis}|X, Y_{obs}) = \frac{P(X, Y_{obs}, Y_{mis})}{\int P(X, Y_{obs}, Y_{mis})\,dY_{mis}} = \frac{\int P(X, Y_{obs}, Y_{mis}|\theta)P(\theta)\,d\theta}{\int\!\int P(X, Y_{obs}, Y_{mis}|\theta)P(\theta)\,d\theta\,dY_{mis}} \qquad (1)$$

and similarly

$$P(Y_{mis}|X, Y_{obs}, R) = \frac{\int\!\int P(X, Y_{obs}, Y_{mis}, R, \theta, \phi)\,d\theta\,d\phi}{\int\!\int\!\int P(X, Y_{obs}, Y_{mis}, R, \theta, \phi)\,d\theta\,d\phi\,dY_{mis}}. \qquad (2)$$

Observe that the integrand in eq. (2) can be written as

$$P(R|X, Y_{obs}, Y_{mis}, \theta, \phi)\,P(X, Y_{obs}, Y_{mis}, \theta, \phi) = g(R|X, Y, \phi)\,P(X, Y_{obs}, Y_{mis}|\theta, \phi)\,P(\theta, \phi)$$
$$= g(R|X, Y, \phi)\,f(X, Y|\theta)\,P(\theta, \phi) = g(R|X, Y, \phi)\,f(X, Y|\theta)\,P(\theta)\,P(\phi), \qquad (3)$$

using $Y = (Y_{obs}, Y_{mis})$, where the last equality follows from the a priori independence of θ and φ. Now, combining eq. (2) and eq. (3) and using the MAR assumption, which implies that $g(R|X, Y, \phi) = g(R|X, Y_{obs}, \phi)$, we get

$$P(Y_{mis}|X, Y_{obs}, R) = \frac{\int\!\int g(R|X, Y, \phi)f(X, Y|\theta)P(\theta)P(\phi)\,d\theta\,d\phi}{\int\!\int\!\int g(R|X, Y, \phi)f(X, Y|\theta)P(\theta)P(\phi)\,d\theta\,d\phi\,dY_{mis}}$$
$$= \frac{\int g(R|X, Y_{obs}, \phi)P(\phi)\,d\phi \int f(X, Y|\theta)P(\theta)\,d\theta}{\int g(R|X, Y_{obs}, \phi)P(\phi)\,d\phi \int\!\int f(X, Y|\theta)P(\theta)\,d\theta\,dY_{mis}}$$
$$= \frac{\int f(X, Y|\theta)P(\theta)\,d\theta}{\int\!\int f(X, Y|\theta)P(\theta)\,d\theta\,dY_{mis}} = P(Y_{mis}|X, Y_{obs}) \qquad (4)$$

by eq. (1), which means that the response mechanism is ignorable, by lemma 2. □

Remark 11. Given a joint distribution P(X, Y), Rubin (2012) argues that we may choose a parameter θ such that

$$P(X, Y) = \int P(X, Y, \theta)\,d\theta = \int f(X, Y|\theta)P(\theta)\,d\theta,$$

and then we may choose a distribution for the response, $g(R|X, Y, \phi)$, where φ is independent of θ and such that the MAR assumption holds. He stresses that the MAR requirement is generally the more important condition for an ignorable missing data mechanism and that the condition on the parameters is intuitively reasonable, meaning that in practice, the MAR assumption and the ignorability assumption are effectively the same.
