• No results found

Analyzing and modeling the relative survival rate of patients diagnosed with malignant melanoma

N/A
N/A
Protected

Academic year: 2021

Share "Analyzing and modeling the relative survival rate of patients diagnosed with malignant melanoma"

Copied!
45
0
0

Loading.... (view fulltext now)

Full text

(1)

U.U.D.M. Project Report 2008:11

Examensarbete i matematisk statistik, 30 hp Handledare och examinator: Hans Garmo Juni 2008

Analyzing and modeling the relative survival rate of patients diagnosed with malignant melanoma

Fredrik Sandin

Department of Mathematics

(2)
(3)

Analyzing and modeling the relative survival rate of patients diagnosed with malignant melanoma

Fredrik Sandin

Master of Science Project in Mathematical Statistics

Supervisor and examiner: Hans Garmo, Ph.D.

(4)
(5)

Abstract

In cause-specific survival analysis, patients dying from other causes than the disease/condition of interest are censored, which suggests that the method re- quires accurate knowledge of the specific cause of death. In population-based cancer studies, it is not uncommon for this information to be either unavailable or unreliable, and a preferable alternative is to study the relative survival rate by comparing the observed survival rate of the patients to the expected survival rate of the background population. In this paper, methods of estimating and modeling the relative survival rate are described and then applied to data from the regional malignant melanoma register of central Sweden. Previous analyses of this data material have shown that melanoma patients with tumour regres- sion induced by the body’s own immune system have an improved survival rate compared to patients without tumour regression. The prognostic significance of spontaneous regression is a controversial issue, and further analysis of the effect of this regression phenomenon is therefore performed in the framework of relative survival analysis.

(6)

Acknowledgement

I would first and foremost like to thank Mats Lambe for giving me the oppor- tunity to write my thesis at ROC. Many thanks also to my supervisor Hans Garmo for providing support and guidance through the entire project. I would also like to thank Gunnar Wagenius for providing highly valuable opinions, and for showing such great interest in the study. Finally, I would like to thank the entire staff of ROC for providing a friendly working environment.

(7)

Contents

1 Introduction 2

1.1 Aim of the thesis . . . 2

1.2 What is malignant melanoma . . . 2

1.2.1 Treatment . . . 3

1.2.2 Classification . . . 4

2 Methods 6 2.1 The Cancer Register . . . 6

2.1.1 Data included in the study . . . 6

2.2 Right censoring . . . 7

2.3 Cause-specific survival . . . 7

2.3.1 Estimating the survival function . . . 9

2.3.2 The Cox proportional hazard model . . . 9

2.4 Competing risks . . . 10

2.5 Age-standardized incidence . . . 11

2.6 Relative survival . . . 12

2.6.1 Calculating the observed survival . . . 13

2.6.2 Estimating the expected survival . . . 14

2.6.3 The standard error of the relative survival rate . . . 16

2.7 Modeling excess hazard . . . 17

2.7.1 The Est`eve et al. full likelihood approach . . . 17

2.7.2 The Dickman et al. approach . . . 18

2.7.3 The Hakulinen-Tenkanen approach . . . 19

2.8 Statistical software . . . 20

3 Results 21 3.1 Descriptive analysis . . . 21

3.1.1 Age-standardized incidence . . . 26

3.1.2 Cumulative incidence analysis on a subset of the data . . 27

3.1.3 Relative survival . . . 28

3.2 Further analysis of the regression phenomenon . . . 31

4 Discussion 35

References 37

Appendix A Variable descriptions 39

(8)

1 Introduction

1.1 Aim of the thesis

The aim of the thesis is to study the regional malignant melanoma register at the Regional Oncologic Center (ROC) in Uppsala, an institution that gathers and analyses data on patients diagnosed with cancer in the Uppsala/ ¨Orebro region of Sweden. The main interest lies in studying incidence, distribution of patient and tumour characteristics, and survival after the time of diagnosis.

Cause-specific survival analysis has traditionally been the method of choice for studying cancer patient survival, but information on the specific cause of death for patients in population-based studies is often either unavailable or unreliable.

It is preferable, in this situation, to study the relative survival rate, which is estimated by comparing the observed survival rate of cancer patients to the expected survival rate of the background population. Approaches to modeling proportional excess hazard, using the framework of Generalized Linear Models, have been proposed [4] [5], which enables adjusting the relative survival rate for various patient and tumour characteristics.

In some cases, at the histopathological analysis of a malignant melanoma, tumour regression induced by the body’s own immune system can be observed.

Previous analyses of the malignant melanoma register at ROC have shown that patients with this regression phenomenon have an improved survival rate com- pared to patients without the regression phenomenon. The prognostic signifi- cance of the phenomenon is a controversial issue. Results from some [6][7], but not all studies [8][9][10] suggest that spontaneous regression is associated with a poorer survival, and another aim of the thesis is therefore to further investigate the effect of the regression phenomenon.

1.2 What is malignant melanoma

Malignant melanoma is the most serious form of skin cancer, and it is also one of the cancer diseases which have excibited the largest increase in occurence in Sweden during recent decades [1]. Much of the increase has been suggested to be caused by the increase in excessive exposure to sunlight by the population, since it is well known that ultraviolet radiation increases the risk of develop- ing melanoma. Another contributing factor to the risk of developing melanoma is the skin type. While Caucasians, freckled individuals, and people who are susceptible to red sunburn have the highest risk of developing melanoma, the disease is quite unusual in northern Africa, Asia, and the Middle-East.

(9)

The most important tumour characteristic affecting the prognosis of patients with localized melanoma of the skin is the tumour depth, which is measured ac- cording to Breslow at the place where the tumour is thickest. Another important characteristic is the level of invasion according to Clark, which is a measure of how far down into the skin the tumour has invaded.

Clark level of invasion

I. Melanomas confined to the the epidermis.

Also known as ”melanoma in situ”.

II. Penetration into the second layer of the skin, the dermis.

III. Penetration to the junction of the papillary and reticular dermis.

IV. Penetration into the reticular dermis.

V. Penetration into the third layer of the skin, the subcutis.

The presence of metastases in nearby lymph nodes (regional metastases) or other parts of the body (remote metastases) obviously has an impact on the prognosis, and certain patient characteristics, such as age and gender, are also important factors to consider.

1.2.1 Treatment

While the occurence of malignant melanoma has increased during recent decades, the prognosis for those affected has improved. This is most likely due to earlier detection of the tumour, which enables patients to be cured through surgical treatment. The patient is usually alarmed by changes in a nevus on the skin, and if the contacted physician even slightly suspects cancer, primary surgery is performed, during which the nevus is removed along with a margin of the surrounding skin. If the histopathological investigation confirms malignant melanoma, the patient is called back for extended surgery, during which an even larger margin around the site of the tumour is excised. How wide a mar- gin of disease-free tissue that is removed is dependent upon the depth of the tumour. Melanomas with a Breslow depth of less than 1.0 mm have a recom- mended total excision margin of 1.0 cm, while tumours with depth larger than 1.0 mm should be excised with a total margin of 2.0 cm. The recommended margin for large melanomas was wider a few years back, but a Swedish study [3]

showed that there was no significant change in long-term survival for patients with melanomas thicker than 0.8 mm and thinner than 2.0 mm, when they were treated with a 2.0 cm resection margin instead of a 5.0 cm resection margin.

Results from a recent Nordic study also showed no significant change in survival for patients with melanomas thicker than 2.0 mm, when they were treated with a 2.0 cm resection margin instead of a 4.0 cm resection margin.

(10)

If a malignant tumour is not detected in an early stage, it could grow large enough to invade through the skin, allowing cancer cells to spread through the blood stream to nearby lymph nodes. The cells could form daughter tumours, so-called regional metastases, which have to be surgically removed. If the dis- ease is allowed to progress for a longer period of time, remote metastases could form in other parts of the body. Unlike localized melanomas, metastases are difficult to completely remove through surgical measures, and other forms of treatments are often necessary. These include chemotherapy and radiation, but research is also being performed on other options, such as immunotherapy.

1.2.2 Classification

An excised tumour is usually classified according to the TNM-system (TNM being an acronym for Tumour, Nodes and Metastases), created by the Interna- tional Union Against Cancer (UICC). This is a classification system based upon such variables as tumour depth, presence of ulceration, Clark level of invasion, presence of regional lymph node metastases, and presence of remote metastases.

The stage of the cancer is determined from these classifications, with melanoma in situ being stage zero.

T-classification (tumour depth, ulceration, Clark level) TX Invation non-assesable

T0 Primary tumour unknown Tis In situ or LM (or Clark I)

Tumour depth Ulceration

T1 ≤ 1.0 mm T1a Absence of ulceration, or Clark level II-III T1b Presence of ulceration, or Clark level IV-V T2 1.01-2.0 mm T2a Absence of ulceration

T2b Presence of ulceration T3 2.01-4.0 mm T3a Absence of ulceration

T3b Presence of ulceration T4 > 4.0 mm T4a Absence of ulceration

T4b Presence of ulceration

(11)

N-classification (lymph node metastases) NX Lymph nodes not investigated N0 No lymph node metastases

No. of pathological lgl Lymph node status

N1 1 lgl N1a Micrometastasis

N1b Macrometastasis

N2 2-3 lgl N2a Micrometastasis

N2b Macrometastasis

N3a Satellite/in-transit metastasis without regional lymph node metastases

N3 ≥ 4 lgl, or conglomerate of lymph nodes, or satellite/in-transit metastasis with regional lymph node metastases

M-classification (remote metastases) MX Not investigated

M0 No remote metastases Location

M1 M1a Remote metastasis in skin, subcutan or lymph node metastasis with normal LD-test

M1b Lung metastasis with normal LD-value

M1c All other visceral metastasis, or remote metastasis with heightened LD-test from two measures, independent of metastasis location

Cancer stage (based on TNM-classification)

M0 T1a T1b T2a T2b T3a T3b T4a T4b M1

N0 IA IB IB IIA IIA IIB IIB IIC IV

N1a IIIA IIIB IIIA IIIB IIIA IIIB IIIA IIIB IV N1b IIIB IIIC IIIB IIIC IIIB IIIC IIIB IIIC IV N2a IIIA IIIB IIIA IIIB IIIA IIIB IIIA IIIB IV N2b IIIB IIIC IIIB IIIC IIIB IIIC IIIB IIIC IV N2c IIIB IIIB IIIB IIIB IIIB IIIB IIIB IIIB IV N3 IIIC IIIC IIIC IIIC IIIC IIIC IIIC IIIC IV

M1 IV IV IV IV IV IV IV IV

(12)

2 Methods

2.1 The Cancer Register

The Regional Oncologic Center in Uppsala gathers data on cancer patients diagnosed in the Uppsala/ ¨Orebro region of Sweden. This region includes seven counties; Uppsala, S¨odermanland, V¨armland, ¨Orebro, V¨astmanland, Dalarna and G¨avleborg, with a total population of roughly two million. Physicians at hospitals and other health care centers are obligated by law to report diagnosed cancer cases to the Swedish Cancer Registry, but there is no law that requires them to provide records to the register at ROC. As a consequence, a number of variables are missing for some patients in the data set.

2.1.1 Data included in the study

The malignant melanoma register at ROC contained, at the initiation of the study, 5480 patients whose first diagnosis of malignant melanoma occurred after Dec 31, 1993. To attain acceptable coverage (above 95%) of the cases in the Swedish Cancer Registry, only patients diagnosed between the years 1996 and 2006 were included.

Total entries in the melanoma database:

5480

Missing date of diagnosis: 1 5480-1=5479

Date of diagnosis earlier than Jan 1, 1996 or later than Dec 31, 2006: 527 5479-527=4952

A total of 4952 patients, each associated with 41 variables (complete list of variables available in the appendix), were included in the study. Patient records are sometimes incomplete, which prohibits the use of the entire data set in every aspect of the analysis. The geographical distribution of all patients can be found in Table I.

(13)

Table I. Number of cases in the melanoma register of central Sweden, according to initial registration.

County 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 Total

Uppsala 29 44 34 36 48 55 60 56 76 69 57 564

odermanland 56 49 44 50 50 55 64 64 76 79 94 681

armland 37 57 55 52 61 68 67 74 86 100 106 763

Orebro¨ 63 79 78 54 54 76 90 91 112 74 114 885

astmanland 44 53 64 57 47 69 73 77 48 66 81 679

Dalarna 52 54 58 59 47 70 78 76 63 90 87 734

avleborg 60 32 55 43 46 57 70 59 72 81 71 646

Total 341 368 388 351 353 450 502 497 533 559 610 4952

The first part of the descriptive analysis in this paper serves to describe the regional melanoma register, and patients were therefore included to the largest extent possible. When studying patient survival after the time of diagnosis, it is obvious that patients with metastases are not in a sensible way comparable to patients without metastases. Therefore, only patients with N-classification N0 and M-classification M0 were included in the survival analysis.

2.2 Right censoring

In survival analysis, we are interested in studying the time to a certain event of interest. In cancer studies, the event is usually death and time is counted since diagnosis. Right censoring occurs when we do not know the specific time of the event, only that it would have occurred after a specific time point. The reason for this could be death from something other than the disease of interest, migra- tion, or simply survival until the end of the study. Besides right censoring, left censoring and interval censoring can occur when all we know is that the event has occurred before a specific time point, or between two specific time points.

In the data set concerning malignant melanoma, the only kind of censoring present is right censoring. We have chosen to include only patients with a confirmed date of diagnosis between the years 1996 and 2006, and patients that were alive at the end of 2006 were automatically censored, and their last date of follow-up was set to December 31, 2006.

2.3 Cause-specific survival

Four functions are used in cause-specific survival analysis to characterize the distribution of the time, T , to the event of interest. These are the suvival func- tion, the hazard rate, the probability density function, and the the cumulative distribution function. If we know one of these measures, the other three can be uniquely determined.

(14)

The survival function, S(t), is the probability of experiencing the event of interest after the time point t,

S(t) = P (T > t). (1)

The cumulative distribution function is the complement of the survival function,

F (t) = 1 − S(t) = P (T ≤ t), (2)

and if T is a continuous random variable, the survival function is the integral of the probability density function,

S(t) = P (T > t) = Z

t

f (u)du ⇔ f (t) = −dS(t)

dt . (3)

The survival function is a monotone, nonincreasing function equal to one at time zero and zero at infinity. Another important quantity is the hazard rate, h(t), also known as the hazard function, which is a nonnegative function defined as

h(t) = lim

∆t→0

P (t ≤ T < t + ∆t|T ≥ t)

∆t = f (t)

S(t) = −d ln[S(t)]

dt . (4)

The cumulative hazard function, H(t), is defined as

H(t) = Z t

0

h(u)du = − ln[S(t)]. (5)

The hazard rate can be interpreted as the risk of experiencing the event of in- terest immediately after time t, given survival up until time t. Also, h(t)∆t may be viewed as the approximate probability of experiencing the event in the next instant.

If T is a discrete random variable, taking on values t1 < t2 < ... < tn, the survival function is defined as

S(t) = P (T > t) = X

tj>t

p(tj), (6)

where p(tj) = P (T = tj), the probability mass function. The hazard function is given by

h(tj) = P (T = tj|T ≥ tj) = p(tj)

S(tj−1). (7)

(15)

Since T is discrete,

p(tj) = S(tj−1) − S(tj) ⇒ h(tj) = 1 − S(tj)

S(tj−1), (8) and because S(t0) = 1, the survival function can be written as the product of conditional survival probabilities,

S(x) = Y

tj≤t

S(tj)

S(tj−1) = Y

tj≤t

[1 − h(tj)]. (9)

2.3.1 Estimating the survival function

Let dibe the number of events at time ti, and let Yibe the number of individuals at risk at time ti (i.e. the number of individuals with a time on study larger than or equal to ti). When estimating the survival function from data consisting of n distinct event times, t1< t2< ... < tn, we use the fact that di/Yi provides an estimate of the conditional probability of experiencing the event at time ti, given survival to just prior to time ti. Since S(0) = 1, P (T > ti|T ≥ ti) = S(ti)/S(ti−1), and ˆP (T > ti|T ≥ ti) = 1 − di/Yi, we conclude from (9) that a suitable estimator of ˆS(t) is

S(t) =ˆ

(1 , if t < t1, Q

ti≤t[1 −dYi

i] , if t1≤ t . (10)

This is known as the Kaplan-Meier estimator (sometimes referred to as the Product-Limit estimator) of the survival function at time t. It is a step function with jumps at the distinct event times, and its variance is usually estimated by Greenwood’s formula:

V [ ˆˆ S(t)] = ˆS(t)2X

ti≤t

di

Yi(Yi− di). (11) Pointwise confidence intervals for the survival function can be formed in the usual manner by assuming approximate normality, and by using the square- root of (11) as estimated standard error.

2.3.2 The Cox proportional hazard model

By modeling the time after diagnosis, it is possible to adjust for certain explana- tory variables, so-called covariates, such as age, gender, or other characteristics that may affect the survival. For every observation i, i = 1, 2, ..., n, we have the data (Ti, δi, Zi), where Ti is the time on study for observation i, δi is an indi- cator of whether observation i experienced the event or was censored, and Zi

(16)

is a vector of covariates for observation i. We denote the hazard rate at time t for an observation with covariate vector Z by h(t|Z). The most commonly used regression model in cause-specific survival analysis is the proportional hazard model, proposed by Cox:

h(t|Z) = h0(t) exp(βTZ). (12)

h0(t) is a baseline hazard rate that is common for all observations, and β is a vector of parameters to be estimated. According to the model, the hazard rates for two observations with distinct covariate vectors, Z1and Z2, are proportional:

h(t|Z1)

h(t|Z2) = h0(t) exp(βTZ1)

h0(t) exp(βTZ1) =exp(βTZ1)

exp(βTZ2). (13) The parameters and their standard errors are estimated by maximizing the par- tial likelihood, using an iterative numeric procedure such as Newton-Raphson.

2.4 Competing risks

In cause-specific survival analysis, we consider a hypothetical world where the disease of interest is the only possible cause of death. Instead of censoring for deaths from other causes, we now consider them competing risks acting upon the patient. The occurrence of one of the events would then preclude us from observing the others. In general, if we consider K competing risks, we observe the vector (Ti, δi), where Ti is the time until an event, and δi ∈ (0, 1, 2, ..., K) indicates which event that has occurred (with δi=0 meaning censoring). The probability of death from cause k before or at time t is expressed by the cause- specific subdistribution function, Fk(t), also known as the cumulative incidence function:

Fk(t) = P (T ≤ t, δ = k). (14)

Fk(t) is not a true distribution function since Fk(∞) = P (δ = k) < 1. We now let t1 < t2 < ... < tn be distinct event times, and Yi be the number at risk at time ti. We denote by ri the number of patients experiencing the event of interest (e.g. death from the cancer of interest) at time ti, and we denote by di the number of patients experiencing any other event (i.e. any of the other competing risks) at time ti. The cumulative incidence function at time t is then defined as

CI(t) =(0 , if t < t1,

P

ti≤t

hQi−1

j=1[1 −djY+rj

j ]ir

i

Yi , if t1≤ t . (15)

(17)

For t ≥ t1, we have

CI(t) = X

ti≤t

S(tˆ i−)ri

Yi

, (16)

where ˆS(ti−) is the Kaplan-Meier estimator of the survival function, when treat- ing any of the competing risks as an event, evaluated just before time ti. The cumulative incidence function estimates the probability of the event of interest occurring before or at time t, and occurring before any of the other competing risks. The variance of (15) can be estimated by

V [CI(t)] =X

ti≤t

S(tˆ i)2



[CI(t) − CI(ti)]2ri+ di

Yi2 + [1 − 2(CI(t) − CI(ti))]ri

Yi2

 , (17) and pointwise confidence intervals can be formed in the usual manner by as- suming approximate normality, and by using the square-root of (17) as standard error.

2.5 Age-standardized incidence

Incidence is a measure of the risk of developing a disease/condition within a specified period of time. The number of new cases is in it self a fairly incom- plete measure, since we have to consider the size of the population at risk.

The incidence is often calculated by comparing the occurrence of the disease in different age categories to the population at risk in the same age categories.

Comparing this measure over time is unwise, since the age-composition of the underlying population changes. Therefore, it is preferable to study the age- standardized incidence, which is the age-specific incidence, standardized to a pre-specified age-composition.

The incidence proportion is calculated as the number of new cases of the disease of interest divided by the size of the underlying population, for a spec- ified period of time. Since cancer is usually more common in groups of elderly individuals, it is of interest to consider the age-specific incidence, which for age category i, and time period j, is calculated as

Ii,j= Xi,j

Bi,j· Nj

. (18)

Here, Xi,j is the number of new cases in age category i during the time period of interest, Bi,j is the mean population of the area in age category i during the time period of interest, and Nj is the number of years in the time period of interest. We now consider the age-standardized incidence, Rj, which for time

(18)

period j is calculated as

Rj=

k

X

i=1

wi· Ii,j. (19)

Here, k is the number of age categories considered, and wi are suitable weights.

The measure of interest is usually the age-standardized incidence per 100 000 person-years, which we acquire by multiplying Rj by 100 000.

In this study, the population of the seven counties of interest is used as the background population, and the weights in (19) are chosen from the age- composition for the year 2000.

2.6 Relative survival

Cause-specific survival analysis provides us with an estimate of the survival probability of a cancer patient in a hypothetical world where the cancer of in- terest is the only possible cause of death, as death from any other cause leads to the patient being censored. The specific cause of death for a patient can sometimes be hard to determine, and questions may arise around how to handle deaths from treatment complications, or deaths at a time where the cancer was not known to be present.

In relative survival analysis, knowledge of the specific cause of death is not required. The relative survival rate, r(t), is defined as the observed survival probability, S(t), of the patients included in the study, divided by the expected survival probability, S(t), of a comparable group of the background population,

r(t) = S(t)

S(t). (20)

The cause of death is not needed, since we compare the mortality from all causes in the patient group to the mortality of the population at large. The mortality of the background population is estimated from life tables, which are stratified by age, gender, and sometimes race. Life tables are calculated for populations that include individuals diagnosed with the cancer of interest, but it has been shown [14] that this does not significantly affect the estimates in practice. Since the general mortality has decreased significantly during the past century, it is of interest to take into account the change in mortality over calender time. For this purpose, a cohort life table is used, which includes the mortality of the population for different calender periods. To be able to compare each patient to the correct subgroup of the background population, it is in practice common to split each observation for pre-specified time intervals, and to increase the

(19)

current year and age accordingly for each new part. If we were interested in yearly relative survival rates, a patient dying in her tenth year after the time of diagnosis would be split into ten parts, with the first nine parts being censored after exactly one year, and the last part having a time on study equal to the time spent alive by the patient during the tenth year.

2.6.1 Calculating the observed survival

Two methods are commonly used for calculating the observed survival propor- tion; the actuarial approach, and the method of transforming the hazard. The first uses only information on in which interval patients die or are censored, and provides the following estimate for the interval-specific survival proportion for interval i, i.e. the probability of surviving interval i, given survival up until the start of the interval:

pi,1= 1 − di

liW2i. (21)

Here, diis the number of deaths in interval i, Wiis the number of censored indi- viduals (”true” censoring, and not censoring due to interval splitting) in interval i, and li is the number at risk in interval i. l0i = liW2i is called the effective number at risk, and is calculated by assuming that the censored patients are at risk for an average of half the interval.

If we know the exact time on study for each individual, we can estimate the interval-specific survival proportion for interval i by transforming the estimated hazard:

pi,1= exp(−di Yi

ki). (22)

Yiis the total person-time at risk for interval i, and ki is the length of the inter- val in years. We assume that the hazard rate is constant in each interval. The two methods provide very similar estimates if no left-truncation is present, but since exact survival times are available in the data set considered, the method of transforming the hazard is used in this study.

The cumulative observed survival proportion up until the end of interval i is calculated by taking the product of the interval-specific survival proportions:

pi=Y

j≤i

pj,1. (23)

The variance of the observed interval-specific survival proportion is usually es-

(20)

timated by using the effective number at risk in Greenwood’s formula (11):

V [pi,1] = p2i,1

 di

l0i(l0i− di)



, (24)

The variance of the cumulative observed survival proportion is estimated by

V [pi] = p2i

 X

j≤i

dj l0j(l0j− dj)

. (25)

2.6.2 Estimating the expected survival

Three methods of estimating the expected survival proportion are discussed here; Ederer I, Ederer II, and the Hakulinen method. The methods differ in how long a patient’s counterpart in the background population is considered to be at risk, but in practice the three methods often produce very similar esti- mates for follow-up times up to 10 years.

The Ederer I method [16] is not used in this study, but the details are fairly simple. The expected probability of patient j surviving until the end of interval i, pi(j), is calculated as the product of the expected interval-specific survival proportions, pk,1(j), gathered from the life table for corresponding age, gender and year:

pi(j) =Y

k≤i

pk,1(j). (26)

The expected cumulative survival rate is then estimated by

pi = Pl1

j=1pi(j)

l1 , (27)

where l1 is the number at risk at the start of the first interval. In other words;

the matched individuals in the background population are considered to be at risk indefinitely, even if the patients die or are censored.

The Ederer II method [17] is considered more reasonable, since it allows for different length of follow-up times. Matched individuals in the background population of patients that die or are censored are no longer considered to be at risk. This is an effect of the estimate of the expected interval-specific survival for interval k being based only upon those patients at risk at the start of the interval:

pk,1=

lk

X

j=1

pk,1(j)

lk . (28)

(21)

The expected cumulative survival rate is then estimated by

pi =Y

k≤i

pk,1. (29)

While this method is more appealing than the previous method, by allowing for heterogeneous follow-up times, they both lead to biased estimates of the relative survival rate. Ederer I usually overestimates the relative survival rate, while Ederer II usually underestimates it [18].

In the Hakulinen method, a censored patient’s counterpart in the background population is also censored, while the counterpart of a patient that dies remains at risk. The objective is to create an unbiased estimate of the relative sur- vival rate by canceling the bias of the observed survival with a created bias of the estimated expected survival. For follow-up times longer than 10 years, the Hakulinen method is usually preferred, but since we in this study only consider follow-up times up to 7 years, the Ederer II method was deemed to be sufficient.

The main advantage of the Ederer II method is its simplicity, which makes it easy to use in practice. The Hakulinen method is somewhat more complicated, and the full details can be found in Timo Hakulinen’s original article [18].

When using Hakulinen’s method in practice, we have to know the potential follow-up times for all patients. The follow-up time is divided into intervals of pre-specified lengths (usually one year), and we denote the expected interval- specific survival probability for interval k, for a patient having a potential follow- up time stretching past the end of interval k − 1, by pk,1(j). The expected probability of patient j surviving up until the end of interval i is estimated by

pi(j) =

i

Y

k=1

pk,1(j). (30)

We now organize the li patients having a potential follow-up time past the end of interval i − 1, so that κi,a is the set of all patients have a potential follow- up time stretching past the end of interval i, and κi,b is the set of all patients withdrawing during interval i. We let

li= (Pli

j=1pi−1(j) , i ≥ 2

l1 , i = 1 (31)

be the expected number of patients alive and under observation at the beginning

(22)

of interval i, and we let

wi=

 P

j∈κi,bpi−1(j)pi,1(j) 12

, i ≥ 2 P

j∈κi,bpi,1(j) 12

, i = 1

(32)

be the expected number of patients withdrawing alive during interval i.

δi =

 P

j∈κi,bpi−1(j)[1 −pi,1(j) 12

] , i ≥ 2 P

j∈κi,b[1 −pi,1(j) 12

] , i = 1

(33)

is the expected number of patients dying in interval i, among the patients with- drawing in interval i, and

di =

 nP

j∈κi,api−1(j)[1 − pi,1(j)]o

+ δi , i ≥ 2 nP

j∈κi,a[1 − pi,1(j)]o

+ δ1 , i = 1

(34)

is the expected total number of patients dying in interval i. The expected interval-specific survival proportion can then be calculated by using the actuarial approach;

pi,1= 1 − di

liw2i . (35)

The expected cumulative survival proportion, up until the end of interval i, is calculated as the product of the expected interval-specific survival proportions,

pi =

i

Y

k=1

pk,1. (36)

2.6.3 The standard error of the relative survival rate

Since the variance of the expected survival proportion is very small compared to the variance of the observed survival proportion, it is assumed that the expected survival proportion is a fixed constant. Using the familiar formula V [cX] = c2V [X], for c constant, we receive the following expression for the variance of the relative survival rate, r:

V [r] = V  p p



= 1

p∗2V [p]. (37)

The expression holds for the interval-specific relative survival rate as well as the cumulative relative survival rate. Greenwood’s formula from section 2.6.1 is commonly used as an estimate of the variance of the observed survival pro- portion. Pointwise confidence intervals for the relative survival rate can be

(23)

constructed in the usual manner by assuming approximate normality, and by using the square-root of (37) as estimated standard error.

2.7 Modeling excess hazard

Merely estimating the relative survival rate gives us no convenient way of ad- justing for interesting covariates. At best, we can estimate the relative survival rate for different values of a covariate and compare the differences in the ac- quired curves. To perform further analyses would require a way of modeling the relative survival rate, similar to that of Cox regression for cause-specific survival analysis.

The proposed method for modeling the relative survival rate is to assume that the hazard rate at time t, of a patient with covariate vector z, is the sum of an expected hazard rate, h(t, z), and an excess hazard rate, ν(t, z). The ex- pected hazard rate is estimated from life tables for the background population, which are stratified by a subvector of z (typically age, gender, and year). The hazard rate is also assumed to be constant within pre-specified intervals (typi- cally one year) of the follow-up time. In addition to the covariates of interest, z, we will include indicator variables for each interval, and we let x be the vector of these variables in addition to z. Similar to the Cox regression model, the excess hazard is assumed to be of the form

ν(t, z) = exp(xβ), (38)

where β is the vector of parameters to be estimated. The proposed model is therefore written as

h(t, x) = h(t, x) + exp(xβ). (39) The excess hazard ratio between a group of patients with covariate vector x1, and a group of patients with covariate vector x2, is constant;

h(t, x1) − h(t, x1)

h(t, x2) − h(t, x2) = exp(x1β)

exp(x2β). (40)

This can be interpreted as relative excess risk, i.e. an excess hazard ratio of 1.2 indicates that the excess hazard, due to the cancer, is 20% higher for the group with covariate vector x1, than for the group with covariate vector x2.

2.7.1 The Est`eve et al. full likelihood approach

Data involving right censoring can be represented by the random variable pairs (Ti, δi), i = 1, 2, ..., n, where Tiis the time on study and δiis a variable indicating

(24)

if the patient experienced the event (δi= 1) or was censored (δi= 0). We know from section 2.3 that f (ti) = h(ti)S(ti), so the likelihood function can be written as

L =

n

Y

i=1

[f (ti)]δi[S(ti)]1−δi =

n

Y

i=1

[h(ti)]δiexp



− Z ti

0

h(s)ds



. (41)

By taking the logarithm of (41) and using (39) as the hazard rate, we arrive at the following expression for the log-likelihood function:

l(β) =

n

X

i=1

δiln [h(ti, xi) + exp(xiβ)] −

n

X

i=1

Z ti 0

h(s, xi) −

n

X

i=1

Z ti 0

exp(xiβ)ds.

(42) The second term does not contain β and can therefore be omitted. Since the hazard rate is assumed to be constant within the pre-specified intervals, we can simplify things by splitting patient observations for each interval, which results in a total of J so-called subject-band observations. The integral in the third term of (42) can then be calculated as the value of the expression within the integral, which is constant, multiplied by the time at risk, yj, for subject-band observation j;

l(β) =

J

X

j=1

jln [h(tj, xj) + exp(xjβ)] − yjexp(xjβ)] . (43)

The parameter vector β, along with its standard errors, can be estimated by maximizing the log-likelihood function above. This does not, however, provide means towards assessing goodness-of-fit.

2.7.2 The Dickman et al. approach

In an effort towards gaining access to suitable methods of model diagnostics, an approach to modeling the excess hazard using Poisson regression has been proposed [4]. Since we assume that the hazard rate is constant within pre- specified intervals, a Poisson process for the number of deaths is suitable. If we use subject-band observations and assume that δj∼ P o (µj) = P o (h(tj, xj)yj), the log-likelihood function is

l(β) = PJ

j=1δjln [h(tj, xj) + exp(xjβ)] +PJ

j=1δjln(yj)−

− PJ

j=1[h(tj, xj) + exp(xjβ)] yj−PJ

j=1ln(δj!). (44) After the removal of irrelevant terms, i.e. terms not containing the parameter vector β, we are left with

l(β) =PJ

δ ln [h(t , x ) + exp(x β)] −PJ

exp(x β)y , (45)

(25)

which is exactly the same as (43). So fitting this model for subject-band ob- servations would result in the exact same parameter estimates as with the full likelihood approach, since we are maximizing the same likelihood function. By estimating the expected hazard for subject-band observation j by dj/yj, with dj being the estimated expected number of deaths from causes other than the cancer of interest, we write

µj= h(tj, xj)yj ⇔ µj yj

= h(tj, xj) + exp(xjβ) =dj yj

+ exp(xjβ)

ln(µj− dj) = xjβ + ln(yj). (46) This implies a generalized linear model with outcome dj, Poisson error struc- ture, link ln(µj− dj), and offset ln(yj). By using the framework of generalized linear models, we gain access to methods of model diagnostics, and are therefore better able to assess the goodness-of-fit.

A convenient way of organizing the data is in the form of collapsed data, i.e.

we group together observations with the same covariate pattern and sum over the number of deaths, the expected number of deaths, and the person-time at risk. If we, for example, were interested in the covariates gender and cancer stage (I-IV) over 5 yearly intervals, we would collapse the data into 40 observations (2 genders, 4 stages, 5 intervals, 2·4·5 = 40). The parameter estimates for collapsed data differ slightly from the parameter estimates for subject-band observations, since dj in (46) varies within each covariate pattern. This difference is often very small though, and studying residuals or other goodness-of-fit statistics is not appropriate when estimating the model from subject band observations.

2.7.3 The Hakulinen-Tenkanen approach

Hakulinen and Tenkanen [5] have proposed an approach to modeling propor- tional excess hazard using a binomial error structure. The method uses grouped data, which only contains information on how many patients that die or are cen- sored in each interval for the different covariate pattern. We let hik denote the hazard contributed by group k in follow-up interval (year) i, which is assumed to be of the form

hik= hik+ exp(xikβ). (47) Here, hik is the expected hazard, xik is the covariate vector for group k in interval i, and β is a vector of parameters to be estimated. We consider yearly

(26)

intervals, and it follows from (47) and (5) that

− ln(pik

pik) = exp(xikβ) ⇔ ln



− ln(pik

pik)



= xikβ, (48)

where pik is the observed interval-specific survival probability for group k in interval i, and pikis the expected interval-specific survival probability for group k in interval i. So, the Hakulinen-Tenkanen approach uses a generalized linear model with binomial error structure, and a complementary log-log link function with a division by pik. This model produces similar estimates to those of the Poisson model in section 2.7.2, as the likelihood functions for the two models are similar.

2.8 Statistical software

Most of the statistical analysis in this study was performed with the R Statistical Software Package1. Functions for cause-specific survival analysis and estimation of the cumulative incidence function are integrated in existing packages, while functions for age-standardized incidence, relative survival, and the modeling of excess hazard were written by the author. The program for estimating the relative survival rate uses essentially the same procedure as Paul Dickman’s program [13] for estimating the relative survival rate in SAS2, and the program for modeling excess hazard uses code from the relsurv package, written by Maja Pohar [15]. The data was split for yearly intervals, allowing the patient’s age and current year to be increased accordingly. This ensured that, in each interval, the patient’s expected mortality was estimated from the correct background population. The Ederer II method was used to estimate the expected survival rate, using life tables for Sweden attained from the Human Mortality Database3. A program using the Hakulinen method to estimate the expected survival rate was written by the author, and the results from this program were very similar to the results from the program using the Ederer II method. An advantage of the Ederer II method is the possibility of comparing the results from the R program to the results from Paul Dickman’s SAS program. As a precaution, and to ensure the accuracy of the results, some analyses were performed in both R and SAS.

1http://www.r-project.org

2http://www.sas.com/technologies/analytics/statistics/stat/index.html

(27)

3 Results

In this section, the results from the analysis of the regional malignant melanoma register are presented. Descriptive analyses of the register, including the dis- tribution of various patient and tumour characteristics, age-standardized inci- dence, cumulative incidence, and relative survival, are presented in the first sections. In section 3.2, the excess hazard model in section 2.7.2 is used to investigate the effect of the regression phenomenon on the relative survival rate.

3.1 Descriptive analysis

The geographical distribution over time, for the 4952 patients included in the study, has been presented in Table I. Table II shows the distribution of patient and tumour characteristics, for all patients, by gender and time period. In later sections, analyses will primarily be based on patients with invasive melanomas.

Melanomas in situ are fairly harmless, and the survival rate for those affected is about the same as the survival rate of the background population.

Bar plots for most of the categories in Table II, for patients with invasive tumours, are included in Figure 1 and 2. The gender distribution is fairly even, but the distribution of location differs greatly between men and women. For men, the most common location of malignant melanomas is the trunk, which could be explained by the fact that men expose their trunks to sunlight to a larger extent than women. The most common location for females is the lower extremities, and this could be explained with similar logic. The median age at diagnosis, for patients with invasive melanomas, is 63.2 years, but it ranges from 10.8 years to 107.3 years. SSM (Superficial Spreading Melanoma) is the most common type of malignant melanoma in the register, and NM (Nodular Melanoma) is the second most common type. Together, melanomas in situ and LM (Lentigo Maligna, which is also a change in situ) represent around 20% of the cases, while LMM and ALM are fairly uncommon. Worth mentioning is that only a very small portion of the patients with invasive malignant melanomas have metastases. Table III shows the distribution of metastases, for patients with invasive melanomas, by gender and time period. Table IV shows the dis- tribution of treatment at primary surgery and extended surgery, for all patients, by gender and time period.

(28)

Table II. Distribution of patient and tumour characteristics, for all patients, by period and gender.

1996-2000 2001-2006 Total

Male Female Male Female

No. of cases 899 902 1508 1643 4952

Gender (%)

Male 899 (100.0) - - 1508 (100.0) - - 2407 (48.6)

Female - - 902 (100.0) - - 1643 (100.0) 2545 (51.4)

Age

Median (Min-Max) 64.7 (17.8-94.8) 61.9 (16.8-107.3) 65.6 (19.0-97.0) 61.3 (10.8-99.9) 63.5 (10.8-107.3)

Location (%)

Head-Neck 145 (16.1) 161 (17.8) 271 (18.0) 284 (17.3) 861 (17.4)

Top extremity 136 (15.1) 193 (21.4) 245 (16.2) 345 (21.0) 919 (18.6)

Lower extremity 101 (11.2) 273 (30.3) 158 (10.5) 524 (31.9) 1056 (21.3)

Trunk 478 (53.2) 244 (27.1) 813 (53.9) 472 (28.7) 2007 (40.5)

Palm, foot, subungual 16 (1.8) 22 (2.4) 17 (1.1) 17 (1.0) 72 (1.5)

Data unavailable 23 (2.6) 9 (1.0) 4 (0.3) 1 (0.1) 37 (0.7)

Type (%)

Invasive 685 (76.2) 669 (74.2) 1091 (72.3) 1116 (67.9) 3561 (71.9)

In situ 124 (13.8) 152 (16.9) 240 (15.9) 312 (19.0) 828 (16.7)

LM 5 (0.6) 4 (0.4) 93 (6.2) 122 (7.4) 224 (4.5)

Data unavailable 85 (9.5) 77 (8.5) 84 (5.6) 93 (5.7) 339 (6.8)

Type of invasive (%)

SSM 375 (54.7) 396 (59.2) 709 (65.0) 747 (66.9) 2227 (62.5)

LMM 37 (5.4) 35 (5.2) 34 (3.1) 52 (4.7) 158 (4.4)

NM 208 (30.4) 164 (24.5) 277 (25.4) 238 (21.3) 887 (24.9)

ALM 7 (1.0) 18 (2.7) 10 (0.9) 15 (1.3) 50 (1.4)

Other 33 (4.8) 44 (6.6) 56 (5.1) 52 (4.7) 185 (5.2)

Data unavailable 25 (3.6) 12 (1.8) 5 (0.5) 12 (1.1) 54 (1.5)

Size, largest diameter (mm) (%)

≤5 52 (5.8) 110 (12.2) 162 (10.7) 238 (14.5) 562 (11.3)

5.01-6.00 47 (5.2) 67 (7.4) 68 (4.5) 105 (6.4) 287 (5.8)

6.01-7.00 50 (5.6) 73 (8.1) 63 (4.2) 90 (5.5) 276 (5.6)

7.01-8.00 72 (8.0) 65 (7.2) 107 (7.1) 107 (6.5) 351 (7.1)

>8 419 (46.6) 327 (36.3) 876 (58.1) 827 (50.3) 2449 (49.5)

Data unavailable 259 (28.8) 260 (28.8) 232 (15.4) 276 (16.8) 1027 (20.7)

Ulceration (%)

Absent 538 (59.8) 588 (65.2) 891 (59.1) 994 (60.5) 3011 (60.8)

Present 195 (21.7) 162 (18.0) 280 (18.6) 231 (14.1) 868 (17.5)

Non-assessable 0 (0.0) 0 (0.0) 29 (1.9) 27 (1.6) 56 (1.1)

Data unavailable 166 (18.5) 152 (16.9) 308 (20.4) 391 (23.8) 1017 (20.5)

T category (%)

T0 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

Tis 129 (14.3) 156 (17.3) 333 (22.1) 434 (26.4) 1052 (21.2)

T1 281 (31.3) 348 (38.6) 482 (32.0) 576 (35.1) 1687 (34.1)

T2 136 (15.1) 140 (15.5) 222 (14.7) 216 (13.1) 714 (14.4)

T3 125 (13.9) 82 (9.1) 165 (10.9) 153 (9.3) 525 (10.6)

T4 100 (11.1) 73 (8.1) 177 (11.7) 132 (8.0) 482 (9.7)

TX/Data unavailable 128 (14.2) 103 (11.4) 129 (8.6) 132 (8.0) 492 (9.9)

N category (%)

N0 754 (83.9) 791 (87.7) 1315 (87.2) 1475 (89.8) 4335 (87.5)

N1 33 (3.7) 19 (2.1) 38 (2.5) 18 (1.1) 108 (2.2)

N2 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

N3 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0) 0 (0.0)

NX/Data unavailable 112 (12.5) 92 (10.2) 155 (10.3) 150 (9.1) 509 (10.3)

M category (%)

M0 754 (83.9) 791 (87.7) 1315 (87.2) 1475 (89.8) 4335 (87.5)

M1a 0 (0.0) 0 (0.0) 1 (0.1) 1 (0.1) 2 (0.0)

M1b 2 (0.2) 1 (0.1) 6 (0.4) 1 (0.1) 10 (0.2)

M1c 9 (1.0) 4 (0.4) 4 (0.3) 5 (0.3) 22 (0.4)

MX/Data unavailable 134 (14.9) 106 (11.8) 182 (12.1) 161 (9.8) 583 (11.8)

Clark level of invasion (%)

I 129 (14.3) 156 (17.3) 333 (22.1) 434 (26.4) 1052 (21.2)

II 157 (17.5) 213 (23.6) 309 (20.5) 327 (19.9) 1006 (20.3)

III 213 (23.7) 205 (22.7) 344 (22.8) 374 (22.8) 1136 (22.9)

IV 231 (25.7) 190 (21.1) 338 (22.4) 293 (17.8) 1052 (21.2)

V 34 (3.8) 35 (3.9) 63 (4.2) 61 (3.7) 193 (3.9)

Non-assessable 2 (0.2) 0 (0.0) 22 (1.5) 37 (2.3) 61 (1.2)

Data unavailable 133 (14.8) 103 (11.4) 99 (6.6) 117 (7.1) 452 (9.1)

Breslow depth (%)

≤1.0 306 (34.0) 373 (41.4) 495 (32.8) 591 (36.0) 1765 (35.6)

1.01-2.00 136 (15.1) 140 (15.5) 222 (14.7) 216 (13.1) 714 (14.4)

2.01-4.00 125 (13.9) 82 (9.1) 165 (10.9) 153 (9.3) 525 (10.6)

>4.0 100 (11.1) 73 (8.1) 177 (11.7) 132 (8.0) 482 (9.7)

Non-assessable 0 (0.0) 0 (0.0) 88 (5.8) 82 (5.0) 170 (3.4)

Data unavailable 232 (25.8) 234 (25.9) 361 (23.9) 469 (28.5) 1296 (26.2)

Stage (%)

0 129 (14.3) 156 (17.3) 333 (22.1) 434 (26.4) 1052 (21.2)

I 369 (41.0) 439 (48.7) 610 (40.5) 708 (43.1) 2126 (42.9)

II 239 (26.6) 177 (19.6) 353 (23.4) 317 (19.3) 1086 (21.9)

References

Related documents

• Utbildningsnivåerna i Sveriges FA-regioner varierar kraftigt. I Stockholm har 46 procent av de sysselsatta eftergymnasial utbildning, medan samma andel i Dorotea endast

In this study we used program MARK to compute annual survival estimates for adults (age-specific survival), check difference between juvenile and adult survival, and

Om dendritceller presenterar tumör- antigen för T celler under immunstimulerande förhållanden kan dessa sedan utvecklas till effektor celler, som har kapacitet att

3.2 Agonistic anti-CD40 antibody administrated in combination with adoptive T cell transfer from pmel-1 TCR transgenic mice leads to tumour regression and delay in tumour growth

2,3 As in other studies, some breeds had a low risk for developing DM (eg, Golden Retriever, Boxer, Papillon, Table 4. Kaplan-Meier survival estimates after 1st insurance claim for

While reverse causality is a principal concern in retrospective studies, the current study aimed to provide a comprehensive evaluation of pre-diagnostic circulating 25(OH)D and risk

In [4] a trigonometric regression is also used for infectious disease data but with an additional autocorrelation effect (a “parameter- and data-driven” model). In [13], the

To identify biomarkers that accurately predict clinical outcome in GIST patients, global gene expression profiling was performed based on KIT mutations associated