
ESTIMATING CAUSAL EFFECTS OF RELAPSE TREATMENT ON THE RISK FOR ACUTE MYOCARDIAL INFARCTION AMONG PATIENTS WITH DIFFUSE LARGE B-CELL LYMPHOMA

Submitted by Jakob Börsum

A thesis submitted to the Department of Statistics in partial fulfillment of the requirements for a two-year Master of Arts degree in Statistics in the Faculty of Social Sciences

Supervisor: Ingeborg Waernbaum

Spring, 2021


Abstract

This empirical register study estimates average causal effects of relapse treatment on the risk for acute myocardial infarction (AMI) among patients with Diffuse Large B-Cell Lymphoma (DLBCL) within the potential outcome framework. The report includes a brief introduction to causal inference and survival analysis and presents the specific causal parameters of interest that will be estimated. A cohort of 2887 Swedish DLBCL patients between 2007 and 2014 was included in the study, of whom 560 patients suffered a relapse.

The relapse treatment is hypothesised to be cardiotoxic and to induce an increased risk of heart disease. The identifiability assumptions needed to estimate average causal effects are assessed in this report.

The patient cohort is weighted using inverse probability of treatment and censoring weights, and potential marginal survival curves are estimated from marginal structural Cox models. The resulting point estimate indicates a protective causal effect of relapse treatment on AMI, but estimated bootstrap confidence intervals suggest no significant effect at the 5% significance level.

Keywords: Causal Inference; Survival Analysis; Marginal Structural Model


Contents

1 Introduction
2 Causal Inference
  2.1 Potential Outcomes Framework
  2.2 Identifiability Assumptions
3 Causal Survival Analysis
  3.1 Survival Analysis
  3.2 Causal Parameters
4 Estimation
  4.1 Inverse Probability Weights
  4.2 Marginal Structural Cox Models
  4.3 Survival Function Estimands
5 Empirical Application
  5.1 Swedish Lymphoma Register
  5.2 National Patient Register
6 Results
  6.1 Propensity Score Modeling
  6.2 Causal Effects
  6.3 Model Diagnostics
7 Conclusion
8 Discussion


1 Introduction

Diffuse Large B-Cell Lymphoma (DLBCL) is the most common subtype of Non-Hodgkin lymphoma in Sweden, Europe and the United States [Swedish Lymphoma Group, 2019] [Maurer et al., 2014]. The lymphoma is particularly aggressive and, if the patient is untreated, the expected survival is less than 1 year. However, curative treatment exists in the form of chemotherapy and the standard-of-care regimen is the anthracycline-based R-CHOP [Harrysson et al., 2021]. The majority of patients respond to the treatment but about a third fail to achieve remission or have a relapse shortly thereafter. Most relapses occur within 24 months from the remission date. Among Swedish DLBCL patients given curative first-line treatment between 2007 and 2018, about 20% have relapsed [Harrysson et al., 2021]. While the first-line treatment with curative intent has shown an increased risk of heart disease, DLBCL relapse may require additional, even more intensive, treatment which might increase cardiotoxicity even further. This worsens the patient's condition and might increase the risk of suffering from myocardial infarction or heart failure [Schmittlutz and Marks, 2021]. The relapse-induced risk increase of heart-related diagnoses has not been investigated within a potential outcome framework, since relapse data has not previously been available. However, if there is a causal difference between treatment groups (the group of patients that received relapse treatment compared to the group of patients that did not relapse), it can be an indication that preventive action is necessary. Feasible remedies might be additional prescriptions of heart medication alongside anthracycline-based chemotherapy or an alteration of compounds in the treatment itself.

There is a general understanding among clinicians that anthracycline-based regimens are cardiotoxic, and several observational studies have shown a statistical association between R-CHOP and myocardial infarction or heart failure [Limat et al., 2014]. Statistical modeling techniques such as survival analysis (with extensions thereof) or flexible parametric models are widely used and applied in other areas in epidemiology and oncology. However, no observational study has analyzed the effect of R-CHOP on cardiotoxicity in a causal framework. Briefly, causality implies that there is a cause and an effect, where the cause influences the effect. In the setting of a randomized controlled trial (RCT), or clinical trial, which is the gold standard for evaluating the effect of a drug, the effect is interpreted causally [Parra et al., 2020]. This is because (amongst other things) the assignment mechanism is random; the individual's age, comorbidities or any other measurement will not impact the probability of treatment assignment. In the real world, where information is gathered from patient registers retroactively, the assigned treatment of diseases is rarely random [Hernán and Robins, 2020]. Especially in cancer treatment, the patient profile is important to evaluate before starting treatment in the form of chemotherapy or palliative care. Frail patients (for example the elderly or patients with comorbidities) are unlikely to tolerate the full dose of chemotherapy and may receive palliative treatment instead [Schmittlutz and Marks, 2021]. In short, the patient characteristics differ between treatment groups and need to be taken into consideration when estimating causal parameters. Consequently, a clinical trial would be lethal for the patients that are not treated and is therefore not applicable for ethical reasons. Observational studies on real-world evidence are applied instead but require adjustment for differences between treatment groups to obtain statistical relationships with a causal interpretation. Common medical practice addresses causal relationships, carried out by clinicians based on their acquired subject-matter knowledge [Hernán and Robins, 2020]. This report uses a formal methodology for estimation of causal effects in observational studies and describes the assumptions underlying a causal analysis.

This report is structured as follows. Section 2 is an introduction to causal inference within the potential outcome framework, where the identifiability assumptions are explained. Section 3 describes how to draw causal inference within the survival analysis framework, defining causal estimands as well as hazard and survival functions. Section 4 defines the propensity score model and the marginal structural Cox model used to estimate causal parameters. Section 5 introduces the empirical application of DLBCL patients and the data collected from patient registers. Further, the identifiability assumptions are assessed in the Results section together with the estimated causal effects. Lastly, a discussion section describes the limitations of the study and proposes directions for further research.


2 Causal Inference

This section will use the DLBCL study as a running example when introducing the potential outcome framework.

2.1 Potential Outcomes Framework

A key objective in medical applications is to estimate a causal effect of a treatment variable, A, on an outcome variable of interest, Y. For simplicity, consider A and Y to be dichotomous (the variables can only take value 0 or 1) [Rubin, 1974]. We use the potential outcome framework to define a causal effect of a treatment. Consider a patient that was diagnosed with DLBCL, treated with chemotherapy with curative intent and later in remission. After the remission date, the patient might suffer a relapse and receive relapse treatment, often in higher doses than the first treatment. The relapse treatment is additional chemotherapy and is particularly aggressive [Schmittlutz and Marks, 2021]. The relapse treatment is the treatment variable of interest in this study and we set A = 1. Some time after the relapse treatment, the patient might suffer from an acute myocardial infarction (AMI), Y = 1, and this event can be summarized in notation as Y^{a=1} = 1. Now consider the same patient that was in remission from DLBCL but, in a theoretical world, never relapsed and never suffered from AMI, or in notation form: Y^{a=0} = 0. Rephrased, the potential outcome if the patient was not treated for relapse is Y^{a=0} = 0. The causal effect would then be Y^{a=1} − Y^{a=0} = 1. However, the real value of Y^{a=0} will forever be unknown since the patient actually was treated; the patient can never be untreated with chemotherapy after the treatment. This is known as the fundamental problem of causal inference, and the non-observable potential outcome is therefore a counterfactual [Holland, 1986]. Hence, a causal effect is defined as a contrast between a factual and a counterfactual outcome. The statistical solution to the fundamental problem analyses population averages as in Equation 1. Instead of using the non-observable counterfactuals to estimate the causal effect, we use population averages, which are possible to estimate. Furthermore, we are interested in drawing causal inference for the entire population of patients that are in remission from DLBCL, not causal effects for individual patients.

ATE = E(Y^{a=1}) − E(Y^{a=0})    (1)

Equation 1 holds under the assumptions of conditional exchangeability and consistency, which are discussed later. The first term in Equation 1 is the expected AMI incidence if all patients were to suffer a relapse, while the second term is the expected AMI incidence if no patients were to relapse. This effect is often captured in clinical trials, or randomized controlled trials, where treatment and placebo are randomly assigned to patients. The treatment is therefore independent of the potential outcome: Y^a ⊥⊥ A.

However, if treatment is not randomly assigned, the independence no longer holds and we can only discern associations from Equation 1. Consider an observational study where past patient information is gathered from patient registers, rather than a clinical trial. Patients that are more likely to experience a relapse are older patients and patients with comorbidities. In other words, the distribution of the variables, or covariates, differs between treatment groups. When the treatment and the outcome are dependent on a set of covariates, the covariates are considered to be confounders. In some studies, the difference between treatment groups can be so large that it is not valid to compare them. Instead, the contrast in potential outcomes only within the treated group can be studied, which is referred to as the average treatment effect on the treated, ATT. The ATT can be interpreted as the effect of the treatment on the population receiving the treatment and may be of more interest for medical researchers. The research question is then what effect the treatment will have on the subpopulation that the treatment is applicable for. The ATT is defined in Equation 2.

ATT = E(Y^{a=1} | A = 1) − E(Y^{a=0} | A = 1)    (2)

In a clinical trial, where the treatment is randomized, we get P(Y | A = 0) = P(Y^{a=0} | A = 0) = P(Y^{a=0}), where the first equality follows from the consistency assumption and the second from the randomization.

In observational studies, there is no reason to assume the aforementioned equality [Hernán and Robins, 2020].

This report will study both the ATE and the ATT and interpret the results accordingly.

Denoting possible confounders L, the average causal effect can be estimated within each level of the confounders, for example among patients younger than 60 years old. This removes the distributional differences between treatment groups and we can state that the potential outcome Y^a is independent of treatment A given the confounders L, or Y^a ⊥⊥ A | L. The average causal effect can then be identified from the observed data since

E(Y^1 − Y^0) = E[E(Y^1 − Y^0 | L = l)]    (3)

= E[E(Y^1 | A = 1, L = l)] − E[E(Y^0 | A = 0, L = l)]    (4)

= E[E(Y | A = 1, L = l)] − E[E(Y | A = 0, L = l)]    (5)

where Equation 4 follows from conditional exchangeability and Equation 5 from the consistency assumption. However, Equation 5 will only yield valid average causal effects if L includes all confounders, so that the expression Y^a ⊥⊥ A | L holds. In an observational study about cancer treatment, the only available patient data is the data kept in clinical lymphoma registers. Naturally, it is possible that other variables, not recorded in clinical or other national health registers, differ between treatment groups and have a causal influence on the outcome.
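The standardization identity in Equations 3–5 can be illustrated with a small simulation. This is an illustrative sketch only; the cohort, the confounder "older patient" and all probabilities below are hypothetical and not taken from the thesis.

```python
import random

random.seed(1)

# Hypothetical cohort: a binary confounder L (e.g. "older patient") raises
# both the probability of relapse treatment A and the risk of the outcome Y.
n = 200_000
data = []
for _ in range(n):
    L = random.random() < 0.4
    A = random.random() < (0.6 if L else 0.2)
    Y = random.random() < (0.05 + 0.10 * L + 0.02 * A)  # true causal effect: +0.02
    data.append((L, A, Y))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Naive contrast E(Y | A = 1) - E(Y | A = 0): biased by confounding through L
naive = (mean(y for l, a, y in data if a)
         - mean(y for l, a, y in data if not a))

# Standardization (Equations 3-5): average the L-specific contrasts over Pr(L = l)
ate = 0.0
for level in (False, True):
    p_l = mean(li == level for li, _, _ in data)
    m1 = mean(y for li, a, y in data if li == level and a)
    m0 = mean(y for li, a, y in data if li == level and not a)
    ate += p_l * (m1 - m0)

print(f"naive contrast: {naive:.3f}, standardized ATE: {ate:.3f}")
```

The naive contrast is inflated because the high-risk stratum is over-represented among the treated, while the standardized estimate recovers the simulated causal effect.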


2.2 Identifiability Assumptions

To draw causal conclusions in an observational study, some assumptions need to hold; these are referred to as identifiability assumptions. The identifiability assumptions are exchangeability, positivity and consistency [Hernán and Robins, 2020]. The assumption of exchangeability is valid if the treated group would have experienced the same potential outcome, that is the event of acute myocardial infarction, independent of the assigned treatment. Conversely, if the untreated group was treated, the same average outcome is expected as the average outcome in the treated group. However, for the DLBCL patients, the two treatment groups differ in the distribution of some patient characteristics, such as age, comorbidities and cancer stage, which are all variables associated with relapse status. A patient in poor condition, for example with a progressed cancer or a history of cardiovascular disease (CVD), would expect a worse average outcome (a higher risk of myocardial infarction) than a patient in better condition, if the patient were to be treated, and the treatment groups are therefore not exchangeable. This yields a selection bias and needs to be controlled for. However, within levels of these patient characteristics, or outcome predictors L (for example L = 1), we assume that the patients are exchangeable and the statement of conditional exchangeability holds as in Equation 6

Y^a ⊥⊥ A | L    (6)

for a = 0, 1 [Hernán and Robins, 2020]. The formula holds if the covariates L control for all relevant confounding: variables that differ in distribution between treatment groups and also affect the outcome. One method to control for the confounders parametrically is to implement a propensity score model, which predicts the probability of receiving the treatment given a set of confounders.

If unmeasured confounding is present, the potential outcomes will depend on the assignment of treatment and the assumption of exchangeability will be violated [Hernán and Robins, 2020]. In the real world, the assumption of conditional exchangeability is not testable and can only be argued for by subject experts. Another assumption that needs to hold for causal inference is positivity: the probability for the patients in the population to encounter a relapse needs to satisfy 0 < Pr(A = a | L = l) < 1 for all levels of L. Some patient profiles may include clinical characteristics such that relapse is very likely, with a probability close to 1. To assess the viability of the positivity assumption, a propensity score model is fit to the data and the propensity score, the individual probability of being assigned treatment, is plotted. Furthermore, the assumption of consistency means that the observed risk of myocardial infarction for every treated patient equals the outcome if the patient was treated; for all untreated patients, the observed risk of myocardial infarction should equal the outcome if the patient was not treated. Generally, the observed outcome under treatment level a should be the same as the potential outcome under treatment level a, Y^a = Y. Lastly, all causal parameters assume no interference between units, which is referred to as the stable unit treatment value assumption (SUTVA): the potential outcome of one individual does not depend on the treatment value of another individual [Hernán and Robins, 2020].
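The positivity check described above can be sketched in a few lines. The strata names and scores below are hypothetical, not from the thesis; the idea is simply to flag strata whose estimated propensity score is (nearly) deterministic.

```python
# Hypothetical positivity check: given estimated propensity scores per
# covariate stratum, flag strata where treatment assignment is nearly
# deterministic, i.e. outside the interval (eps, 1 - eps).
def positivity_violations(propensity_by_stratum, eps=0.01):
    return {stratum: score
            for stratum, score in propensity_by_stratum.items()
            if not eps < score < 1 - eps}

scores = {"age<60, no CVD": 0.12,
          "age>=60, no CVD": 0.35,
          "age>=60, CVD": 0.995}
print(positivity_violations(scores))  # flags the near-deterministic stratum
```

In practice one would inspect the full distribution of estimated scores per treatment group (as the thesis does graphically) rather than a single threshold, but the threshold makes the assumption concrete.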

3 Causal Survival Analysis

In this section, the potential outcome framework is implemented within survival analysis: the framework of survival analysis is briefly introduced, the typical data structure is visualized, and the hazard function as well as the survival function are defined. Lastly, causal estimands of potential survival times are considered for both the average treatment effect and the average treatment effect on the treated.

3.1 Survival Analysis

A survival analysis is conducted when we measure elapsed time until an event. The event does not necessarily need to be death, as the name implies; it can be any event of interest: an earthquake, a disease diagnosis or anything else that can happen in time. A common question is whether the survival length differs between two groups of individuals, for example patients that were treated with a specific drug and patients that were not. Survival analysis is currently used in many research fields but originates from, and is widely used in, clinical and epidemiological research where the event often is death, hence the name. The event is often binary, which means it can take two values: either the event occurs or it does not. In a study where survival analysis is applied, information is collected from participating individuals during a specific time frame. This time frame has a prespecified starting point and an end point, and the interval is referred to as the follow-up window [Kleinbaum and Klein, 2012]. In observational studies, where information about individuals is collected from healthcare registers, the patients are included in the study after the observed remission date. A key concept in survival analysis is the time scale, where different time scales will yield different interpretations, depending on the study design. If the event death after disease diagnosis is of interest, the suggested time scale is elapsed time since diagnosis [Westreich et al., 2010]. Consider a hypothetical study of a new treatment for cancer patients. The study period window is between 2000 and 2005 and during that time information about the patients' health condition is collected from the National Cancer Register. A patient is considered to be at risk of the event, death, after the cancer diagnosis and assigned treatment, i.e. the start of the follow-up window [Flynn, 2012]. Consider the visual example below of a simulated data structure.
The left plot in Figure 1 visualizes the time point where the patient undergoes treatment and the follow-up window starts at the green points. The cross symbol indicates the event occurring and the square symbol indicates withdrawal from study, or death (if death is not the event of interest), which is referred to as censoring and will be discussed in detail in later sections. The right plot in Figure 1 visualizes the elapsed time in the study for each patient until either the event or censoring occurs.

In this simulated example, 7 out of 10 patients are dead within the first 500 days after diagnosis. One of the most commonly used techniques in clinical trials and epidemiological research is to estimate a survival curve using a Kaplan-Meier estimator. The Kaplan-Meier estimator is a nonparametric estimator of the survival function, often visualized in a graph of the survival curve [Kleinbaum and Klein, 2012].

Figure 1: Visualized simulated data structure

Figure 2: Kaplan-Meier plot of simulated data


From Figure 2 we can tell that the survival curve is monotonically decreasing. The survival function is steeper when more patients die, for every death there is a bump downwards in the survival curve. For the 10 simulated patients, 20% are still alive after 1000 days.
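A curve like the one in Figure 2 can be computed with a short product-limit sketch. This is an illustrative implementation, not the one used in the thesis (which presumably relied on standard statistical software); it assumes (time, event) pairs where event = 1 is an observed event and event = 0 a censored record.

```python
# Minimal Kaplan-Meier estimator: at each distinct event time t, multiply the
# running survival by (1 - deaths / at_risk).
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv, curve = 1.0, []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = removed = 0
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk   # product-limit step
            curve.append((t, surv))
        at_risk -= removed                 # events and censorings both leave
    return curve

times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
print(kaplan_meier(times, events))
```

Censored observations (events = 0) contribute to the risk set up to their censoring time but never produce a downward step, which is why the curve in Figure 2 only drops at death times.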

We introduce the concept of survival analysis in mathematical notation and consider the survival time T as a random variable that can take any positive value. The survival function is the probability of surviving longer than a prespecified time point t and is defined in Equation 7

S(t) = Pr(T > t) = ∫_t^∞ f(u) du = 1 − F(t)    (7)

where F(t) = Pr(T ≤ t) is the cumulative distribution function. One example of an estimated survival function, later referred to as the survival curve, is the red line in Figure 2. Another important part of survival analysis is the hazard function h(t). The hazard function gives the instantaneous risk per time point for the event to occur, given that the individual has survived until t, and is defined in Equation 8.

h(t) = lim_{Δt→0} Pr(t ≤ T < t + Δt | T ≥ t) / Δt    (8)

The discrete hazard function is interpreted as a rate per discrete time unit rather than a probability. Interestingly enough, if the hazard is a known constant λ, the survival function is S(t) = e^{−λt}, since

h(t) = −[dS(t)/dt] / S(t)  ⇔  S(t) = exp(−∫_0^t h(u) du).

That is, if either S(t) or h(t) is known, we can find the other. The survival function and the hazard function are key components of a survival analysis and there are multiple ways of estimating them, which will be addressed in later sections. Furthermore, applications of survival analysis within the causal inference framework are mainly interested in potential survival times. So far, however, we have considered causal inference for completely observed outcomes. That is, if we consider relapse status as treatment and AMI as the outcome, we will always be able to observe whether the individual suffered from AMI or not (Y = 1 or Y = 0) and may draw causal conclusions on the effect of the treatment, under the identifiability assumptions described in Section 2. However, in survival analysis, the survival times, T^1 or T^0, are not always observed, which restricts us from using certain, otherwise handy, causal inference methodologies [Hernán and Robins, 2020].
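The constant-hazard special case of the hazard-survival relation can be checked numerically; this is a quick sanity-check sketch, not part of the thesis.

```python
import math

# Check S(t) = exp(-integral_0^t h(u) du) for a constant hazard h(u) = lam,
# where the closed form is S(t) = exp(-lam * t).
lam, t, steps = 0.3, 5.0, 100_000
dt = t / steps
cum_hazard = sum(lam * dt for _ in range(steps))  # Riemann sum of the integral
s_numeric = math.exp(-cum_hazard)
s_closed = math.exp(-lam * t)
print(abs(s_numeric - s_closed))  # agrees up to floating-point error
```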

When a patient’s outcome is not observed, it is due to censoring and this concept is specific to survival analysis [Kleinbaum and Klein, 2012].

If a patient does not experience the event or is in some way withdrawn prior to the end of the study, the patient is censored. One type of censoring is administrative censoring, which occurs when the patient has not yet experienced the event and the patient recording stops. In comparison to a clinical trial, there is no way for a patient to withdraw from an observational study. However, if the patient dies during follow-up it is not possible to continue to record patient data for later time points, and the patient is then censored. The reason for censoring patients is vital for later model assumptions. Censoring is informative if the reason for censoring is related to the risk for the outcome, for example if a certain exposure is so harmful that patients withdraw from a clinical trial more often than in other exposure groups, or if one exposure yields an increase in mortality during the follow-up window. The presence of informative censoring will lead to biased estimates and needs to be considered when, for example, estimating weights to create an unconfounded pseudopopulation [Kleinbaum and Klein, 2012].

Censoring in survival data needs to be included in the framework of counterfactuals in order to estimate causal parameters. We are interested in the potential outcome if the patient received relapse treatment and was never censored during follow-up, compared to the potential outcome if the patient was not in relapse and was never censored during follow-up. Consider the variable C that takes value 1 if the patient was censored at time t and 0 otherwise. The censoring history of the patient, c̄, is a vector of 0's and 1's. The causal contrast is then Y^{a=1, c̄=0} − Y^{a=0, c̄=0}. All counterfactuals that are part of causal parameters in later sections include the assumption of no censoring, but the censoring index is omitted for notational simplicity [Hernán and Robins, 2020].

Previously, potential outcomes were denoted Y^1 or Y^0 for a dichotomous event variable. However, in survival analysis we are typically interested in survival time and consequently in the potential survival time T^a. Consider two potential survival times, T^{a=1} when assigned relapse treatment, A = 1, and T^{a=0} when not, A = 0. The average treatment effect (ATE) is defined in Equation 9.

ATE = E(T^{a=1}) − E(T^{a=0})    (9)

The average treatment effect on the treated (ATT) in the context of survival times is defined in Equation 10.

ATT = E(T^{a=1} | A = 1) − E(T^{a=0} | A = 1)    (10)

3.2 Causal Parameters

This section introduces causal parameters in the context of survival analysis. The potential survival functions for the two treatment levels are denoted S_1(t) and S_0(t). The (observed) survival function S(t) is most commonly estimated using the Kaplan-Meier method. However, to estimate standardised survival curves and differences between treatment groups, we are interested in the potential survival curves standardised over the covariates L. The objective is to calculate contrasts between the potential survival curves and draw causal conclusions such as the average causal effect. A general estimator is found in Equation 11, where Ŝ_1(t)


and Ŝ_0(t) are the estimated potential survival curves,

Δ̂_ACE = ∫_0^∞ Ŝ_1(t) dt − ∫_0^∞ Ŝ_0(t) dt    (11)

and is interpreted as the mean difference in survival time if all individuals were treated versus untreated. Equation 11 calculates the difference between the areas under the survival curves over all time points t. In observational studies, Equation 11 is rarely used in practice due to censoring: not all time points t are observed for all individuals and the distribution of S_a(t) is not identifiable nonparametrically.

Furthermore, it might be of clinical interest to find causal survival differences at a specific time point t*; there might be differences in the short term but no differences in the long term. The restricted average survival causal effect is calculated to discern causal differences up until a prespecified time point t = t*,

Δ̂_RACE(t*) = ∫_0^{t*} Ŝ_1(t) dt − ∫_0^{t*} Ŝ_0(t) dt    (12)

The attentive reader may note that Δ̂_RACE(t*) → Δ̂_ACE as t* → ∞ [Mao et al., 2018]. Equation 12 can be estimated iteratively for different time points and is valid if the propensity score model is correctly specified.
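The area-under-the-curve contrast in Equation 12 can be sketched numerically. The survival curves below are toy exponentials, not estimates from the thesis; in the actual analysis Ŝ_1 and Ŝ_0 would be weighted step functions.

```python
import math

# Restricted average causal effect (Equation 12): difference in area under
# two potential survival curves up to t*, via a simple Riemann sum.
def rmst(surv, t_star, grid=10_000):
    """Area under a survival function surv(t) on [0, t*]."""
    dt = t_star / grid
    return sum(surv(i * dt) * dt for i in range(grid))

s1 = lambda t: math.exp(-0.10 * t)  # toy potential survival if treated
s0 = lambda t: math.exp(-0.15 * t)  # toy potential survival if untreated

race = rmst(s1, t_star=5.0) - rmst(s0, t_star=5.0)
print(round(race, 3))  # mean survival-time gain within the first 5 time units
```

For these exponentials the exact value is (1 − e^{−0.5})/0.10 − (1 − e^{−0.75})/0.15 ≈ 0.42, so the Riemann sum can be checked against the closed form.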

Another causal parameter of interest is the marginal hazard ratio, estimated from a weighted Cox regression as a function of hazard rates. The marginal hazard rate for a potential outcome is defined as Pr(T^a = t + 1 | T^a > t) and the marginal hazard ratio is the ratio of the hazard rates for the two levels a = 0, 1. The hazard rate is measured throughout the follow-up and is consequently dependent on follow-up time. If the hazard ratio changes over time during follow-up, but only one hazard ratio is reported, the researcher fails to capture interesting dependencies. One remedy is to define hazard ratios for a specific time period, a subset of the complete follow-up. This makes it possible for the hazard ratio to change between time periods but might introduce a selection bias [Hernán, 2010] [Hernán et al., 2004]. Another method is to report average hazard ratios for increasingly longer time periods. The survival probability causal effect is the difference between two survival probabilities at a time point t* and is calculated in Equation 13.

Δ̂_SPCE(t*) = Ŝ_1(t*) − Ŝ_0(t*)    (13)

Even if the proportional hazards assumption does not hold, it is still valid to estimate Δ̂_SPCE(t*) [Mao et al., 2018].


4 Estimation

This chapter discusses the theoretical background of inverse probability weights as an approach to estimate causal parameters from the marginal structural Cox model. The Cox model will be implemented to estimate the survival function and adjusted survival curves. Survival curve differences may have a causal interpretation when the treatment is independent of the potential outcome, as previously discussed. However, if there are variables present that differ in distribution between treatment groups (confounders), we need to adjust for them and obtain an adjusted survival curve, the estimate Ŝ_a(t) of S_a(t). To adjust for confounding, we implement a parametric model, referred to as a propensity score model, to estimate inverse probability of treatment weights and inverse probability of censoring weights, respectively.

4.1 Inverse Probability Weights

Inverse probability weights (IPW) are weights defined by the inverse of the probability of treatment assignment. The main objective of IPW is to achieve covariate balance between treatment groups so as to uphold the assumption of conditional exchangeability. The propensity score is defined as the probability of treatment assignment given a set of covariates, e_a(L) = Pr(A = 1 | L = l). Similarly, the probability of censoring given a set of covariates is defined as e_c(L) = Pr(C = 1 | L = l). Both are estimated using logistic regression as in Equation 14.

e(L) = exp(L^T β) / (1 + exp(L^T β))    (14)

The inverse probability of treatment weight and the inverse probability of censoring weight are implemented to calculate the final weight in Equation 15 [Mao et al., 2018].

w = ω(e_a(L)) / [A e_a(L) + (1 − A)(1 − e_a(L))] · ω(e_c(L)) / [A e_c(L) + (1 − A)(1 − e_c(L))]    (15)

where ω(e) in Equation 15 depends on the causal parameter of interest. When estimating the average treatment effect, ATE, ω(e) = 1 and the weights are denoted w_ATE. When estimating the average treatment effect on the treated, ATT, ω(e) = e and the weights are denoted w_ATT. In the estimation of the ATT, this formula makes the treated group the reference group and the covariate distribution is the same as in the sample of the treated group. If there is instability in the estimated propensity score, yielding extreme values for the estimated weights, one can use stabilized weights [Cole and Hernán, 2004].
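The treatment part of the weighting scheme can be sketched as follows. This is an illustrative helper (the function name and the optional truncation parameter are hypothetical); it covers only the treatment factor of Equation 15, given already-estimated propensity scores.

```python
# ATE and ATT weights (treatment part of Equation 15) from a treatment
# indicator A and estimated propensity scores e.
def ipw_weights(A, e, estimand="ATE", truncate_at=None):
    weights = []
    for a, ei in zip(A, e):
        denom = ei if a else 1 - ei               # Pr(observed treatment | L)
        omega = 1.0 if estimand == "ATE" else ei  # ATT: omega(e) = e
        w = omega / denom
        if truncate_at is not None:
            w = min(w, truncate_at)               # cap extreme weights
        weights.append(w)
    return weights

A = [1, 1, 0, 0]
e = [0.8, 0.5, 0.5, 0.2]
print(ipw_weights(A, e))         # ATE: 1/e for treated, 1/(1-e) for untreated
print(ipw_weights(A, e, "ATT"))  # ATT: treated get weight 1, controls e/(1-e)
```

Under ATT weighting the treated keep weight 1 while controls are reweighted by the odds e/(1 − e), which reproduces the covariate distribution of the treated group among the controls.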

The researcher then multiplies the estimated weight by the marginal probability of receiving the actually received treatment. Another remedy for extreme weights is truncation: weights that exceed a specified threshold value are set to the threshold. These weights, together with the censoring weights, are incorporated in the Cox proportional hazards modeling to estimate the ATE and ATT, respectively [Austin and Stuart, 2017]. Furthermore, to check if the inverse probability weights are valid to use as a balancing score, we calculate standardized mean differences, d, between the treatment groups' covariate levels

d = (x̄_0 − x̄_1) / √[(s_0² + s_1²) / 2]

where x̄_a is the weighted sample mean for treatment group a and s_a² is the weighted sample variance in treatment group a. It has been suggested that the balance should be improved if the imbalance exceeds 10%, since the balancing method is then not considered adequate; if the imbalance is below 10%, the researcher can consider the covariates properly balanced across treatment groups [Austin, 2009].
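The balance diagnostic above can be sketched for a single covariate; the function name and toy data are hypothetical, and weights default to 1 so the same helper covers the unweighted case.

```python
import math

# Weighted standardized mean difference for one covariate, following the
# formula above: d = (mean0 - mean1) / sqrt((var0 + var1) / 2).
def smd(x0, x1, w0=None, w1=None):
    w0 = w0 or [1.0] * len(x0)
    w1 = w1 or [1.0] * len(x1)
    def wmean(x, w):
        return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)
    def wvar(x, w):
        m = wmean(x, w)
        return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w)
    m0, m1 = wmean(x0, w0), wmean(x1, w1)
    return (m0 - m1) / math.sqrt((wvar(x0, w0) + wvar(x1, w1)) / 2)

x0 = [60, 62, 70, 75]  # toy ages, untreated group
x1 = [61, 63, 69, 74]  # toy ages, treated group
print(abs(smd(x0, x1)))  # |d| < 0.10 suggests adequate balance
```

In the weighted analysis one would pass the estimated IPW weights as w0 and w1 and recompute d per covariate before and after weighting.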

4.2 Marginal Structural Cox Models

The Cox proportional hazards model is the most commonly used model in multivariable survival analysis. In comparison with a Kaplan-Meier estimator, the Cox model estimates the instantaneous risk corresponding to each covariate level. Consider the potential hazard when assigned treatment a at time t given a set of baseline covariates L_0, that is h_{T^a}(t | L_0), which can be modeled as in Equation 16.

h_{T^a}(t \mid L_0) = h_0(t) \exp(\beta_1 A_t + \beta_p' L_0) \qquad (16)

where h_0(t) is the unspecified baseline hazard, equal to the hazard at time t when the linear predictor is zero. The time-varying treatment indicator A_{it} at times t = 1, 2, ..., T can take any vector value from never treated {0, 0, ..., 0} to always treated {1, 1, ..., 1}. The vector \beta_p collects the log hazard ratios for the p baseline covariates L_0. The hazard ratio is defined as the ratio of hazard rates as in

\frac{h(t \mid A = 1, L)}{h(t \mid A = 0, L)} = \exp(\beta).

In this study, the marginal structural Cox regression is implemented using inverse probability weights and the hazard ratio is interpreted as the marginal hazard ratio [Austin, 2013]. The marginal hazard ratio is the average effect between two patient cohorts that are identical except for treatment status. If \beta > 0, the hazard ratio is greater than 1, indicating that the hazard rate of the outcome increases as the covariate value increases; the covariate is then positively associated with the event probability. Lastly, T^a is the potential survival time under treatment a and, consequently, h_{T^a}(t | L_{i0}) is the potential hazard of the event. If the identifiability assumptions hold and the propensity score model is correctly specified, we can estimate Equation 16 using the inverse probability of treatment weights for the observed treatment a, as in Equation 17:

h^w_{T^a}(t \mid L_0) = h_0(t) \exp(\hat{\beta}_1 A_t + \hat{\beta}_p' L_0) \qquad (17)


where \hat{\beta}_1 \to \beta_1 and h^w_{T^a}(t | L_0) \to h_{T^a}(t | L_0) [Westreich et al., 2010]. However, if the inverse probability weights are estimated from a model including the baseline covariates L_0, Equation 17 reduces to h^w_{T^a}(t) = h_0(t) \exp(\hat{\beta}_1 A_t); that is, only the potential hazard ratio of the treatment is estimated. This potential hazard ratio contrasts the hazard if everyone received the treatment with the hazard if everyone did not receive the treatment. To estimate the coefficients \beta we first consider the partial likelihood L_i(\beta) in Equation 18:

L_i(\beta) = \frac{h(T_i \mid L_i)}{\sum_{T_n \ge T_i} h(T_i \mid L_n)} \qquad (18)

= \frac{h_0(T_i) \exp(\beta_1 A_{it} + \beta_p' L_{i0})}{\sum_{T_n \ge T_i} h_0(T_i) \exp(\beta_1 A_{nt} + \beta_p' L_{n0})} \qquad (19)

= \frac{\exp(\beta_1 A_{it} + \beta_p' L_{i0})}{\sum_{T_n \ge T_i} \exp(\beta_1 A_{nt} + \beta_p' L_{n0})} \qquad (20)

The sum in the denominator of Equation 18 is over the set of patients n for whom the event has not occurred before time T_i. It follows that 0 < L_i(\beta) < 1 and, after assuming that all patients are statistically independent, the joint partial likelihood becomes L(\beta) = \prod_{i: Y_i = 1} L_i(\beta). The log partial likelihood is then

l(\beta) = \sum_{i: Y_i = 1} \left( \beta' L_i - \log \sum_{T_n \ge T_i} \exp(\beta' L_n) \right)

and is maximized over \beta to produce the partial maximum likelihood estimates \hat{\beta} [Cox, 1972]. The baseline hazard function h_0(t) cancels in the likelihood function and can remain unspecified. The Cox model is therefore considered semiparametric due to the presence of an unspecified function (although an estimate of the baseline hazard function is required for the estimated survival function). To estimate standard errors of the log hazard ratio estimate we implement a bootstrap method, since likelihood methods and robust sandwich estimators may yield biased estimates when the Cox model is weighted. The implemented bootstrap method draws 200 bootstrap samples and estimates inverse probability weights and log hazard ratios from the marginal structural Cox model in each iteration. The standard deviation of the 200 estimated log hazard ratios then serves as the working standard error of the log hazard ratio estimated on the complete data. The confidence interval is \hat{\beta} \pm 1.96 \cdot se(\hat{\beta}) [Austin, 2016].
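The bootstrap procedure described above can be sketched generically. Here `estimator` stands in for the whole pipeline (re-estimating the inverse probability weights and refitting the weighted Cox model on each resample), which we abstract as a single function; this is an illustrative sketch, not the thesis code:

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=200, seed=1):
    """Bootstrap standard error: resample patients with replacement,
    re-run the full estimation pipeline on each resample, and take the
    standard deviation of the resulting estimates."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample patient indices
        estimates[b] = estimator(data[idx])
    return estimates.std(ddof=1)

# A Wald-type 95% interval is then: beta_hat +/- 1.96 * bootstrap_se(...)
```

Resampling whole patients (rather than rows of a person-period file) keeps the within-patient dependence intact, which is why the weights are re-estimated inside each iteration.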


4.3 Survival Function Estimands

When all parameters in the Cox model have been estimated, we can use the model to predict survival probabilities and produce the adjusted survival curve. Recalling the relationship between the hazard function and the survival function, the survival function is estimated as in Equation 21 [Link, 1984].

\hat{S}(t \mid L) = \hat{E}_L\left[\exp(-\hat{\Lambda}_0(t) \exp(\hat{\beta}' L_i))\right] = \frac{1}{N} \sum_{i=1}^{N} \exp(-\hat{\Lambda}_0(t) \exp(\hat{\beta}' L_i)) \qquad (21)

where ˆΛ0(t) is the estimated cumulative hazard function. When estimating the average treatment effect on the treated we instead implement Equation 22.

\hat{S}(t \mid L) = \frac{1}{N_1} \sum_{i: A_i = 1} \exp(-\hat{\Lambda}_0(t) \exp(\hat{\beta}' L_i)) \qquad (22)

where N_1 is the number of treated patients.

The cumulative hazard function is estimated in Equation 23.

\hat{\Lambda}_0(t) = \int_0^t \hat{h}_0(x) \, dx \qquad (23)

The baseline hazard function \hat{h}_0 is estimated by the Breslow estimator in Equation 24,

\hat{h}_0(t) = \sum_{i=1}^{n} \frac{I(\tilde{T}_i \le t) \, \delta_i}{\sum_{j \in R_i} \exp(\hat{\beta}' L_j(\tilde{T}_i))} \qquad (24)

where \tilde{T}_i = \min(T_i, C_i) and \delta_i = I(T_i \le C_i). The time until an occurring event for individual i is denoted T_i and the time until censoring for individual i is denoted C_i. Lastly, R_i denotes the risk set for individual i at time \tilde{T}_i [Lin, 2007]. The Cox adjusted survival curve will be used as an estimate of the survival function and to calculate the causal parameters in Equation 12. Confidence intervals for the estimated survival curves are estimated with a bootstrap method. The working method was to draw 200 bootstrap samples from the patient data and estimate inverse probability weights, log hazard ratios and survival functions in each iteration, as in the aforementioned procedure. Survival probabilities at each time point were stored for each iteration, and the 2.5th and 97.5th percentiles across iterations at each time point were considered the lower and upper bounds, respectively. This method was implemented for both treatment groups.
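Equations 21 to 24 can be combined into a small estimator of the adjusted survival curve. The sketch below (our own function names; `lp` denotes the fitted linear predictors \hat{\beta}'L_i) computes the Breslow baseline cumulative hazard directly and then averages the individual survival curves as in Equation 21; ties in event times and the weighting are ignored for brevity:

```python
import numpy as np

def breslow_cumhaz(times, events, lp, t_grid):
    """Breslow estimate of the baseline cumulative hazard Lambda_0(t).

    times  : observed times T~_i = min(T_i, C_i)
    events : delta_i = 1 if the event occurred, 0 if censored
    lp     : fitted linear predictors beta'L_i from the Cox model
    t_grid : time points at which to evaluate Lambda_0
    """
    times = np.asarray(times, float)
    events = np.asarray(events)
    risk = np.exp(np.asarray(lp, float))
    event_times = np.sort(times[events == 1])
    out = []
    for t in t_grid:
        lam = 0.0
        for ti in event_times[event_times <= t]:
            # risk set: everyone still under observation just before ti
            lam += 1.0 / risk[times >= ti].sum()
        out.append(lam)
    return np.array(out)

def survival_curve(times, events, lp, t_grid):
    """Cox-adjusted marginal survival curve: average of the individual
    curves exp(-Lambda_0(t) * exp(lp_i)) over the sample (Equation 21)."""
    lp = np.asarray(lp, float)
    cumhaz = breslow_cumhaz(times, events, lp, t_grid)
    return np.array([np.mean(np.exp(-L * np.exp(lp))) for L in cumhaz])
```

With all linear predictors equal to zero, `breslow_cumhaz` reduces to the Nelson-Aalen estimator, which is a convenient sanity check.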

Interpreting hazard ratios as constant throughout the study period relies on the proportional hazards assumption. This assumption is formally tested using the \chi^2 test statistic in Therneau and Grambsch (1994). If the test is significant, we reject the assumption of proportional hazards and conclude that there is a time-dependent component not captured by the model [Grambsch and Therneau, 1994].


5 Empirical Application

The empirical application of the aforementioned methodology is on a cohort of DLBCL patients. The study question is whether relapse treatment has a causal effect on acute myocardial infarction incidence.

The patients are included in the study cohort based on a set of inclusion criteria. The included patients were diagnosed with DLBCL between January 1, 2007 and December 31, 2014 and later started chemotherapy treatment with curative intent. After the curative treatment, the disease is evaluated and classified into complete remission, partial remission, stable disease or progressive disease. Patients in complete or partial remission from diffuse large B-cell lymphoma between January 1, 2007 and December 31, 2018 are included in the study. After partial or complete remission, some patients relapse, which may require additional chemotherapy. Relapse status is therefore a proxy for relapse treatment and is defined as the treatment variable in this study. The binary event variable is acute myocardial infarction incidence, which every patient is at risk of during follow-up. However, data on the exact treatment type is not available. The hypothesis is that relapse treatment is cardiotoxic.

If the study results in significant causal effects on AMI incidence, there is reason to analyze the specific assigned chemotherapy types for the relapse patients. The patients can be censored for two different reasons.

Either the patient dies during the follow-up period, or the patient is alive at the end of the follow-up period without having suffered the outcome (AMI). The data is collected from the Swedish Lymphoma Register and the National Patient Register. There are only a few missing values in the data; these are considered missing completely at random and are consequently omitted listwise.

There are 2887 patients in the analysis data set that fulfill the inclusion criteria, of whom 560 patients received treatment while not having experienced the event AMI since the remission date. Of the 560 treated patients, 12 suffered from AMI during follow-up and 548 were censored. Of the 2327 untreated patients, 116 experienced the event and 2211 were censored. The proposed confounders for the treatment and outcome are considered well established and are included in most epidemiological studies of DLBCL. After including the clinical characteristics as covariates in the model, we assume no unmeasured confounding for the treatment, outcome and censoring. All covariates are measured at the time of remission and held fixed throughout the study period.

Two covariate sets are implemented in the propensity score modeling. The first set includes Ann Arbor stage, ECOG performance status, s-LDH level, comorbidity level, year of diagnosis and age at baseline.

This covariate set includes detailed information for each variable. However, to mimic other DLBCL epidemiology studies, the variables age at baseline, Ann Arbor stage, s-LDH level and ECOG performance status have, together with the number of extranodal sites, been merged into a widely used index, the IPI, which is described in more detail below.


Table 1: Patient characteristics

Characteristics                  DLBCL patients N(%)   Relapse patients N(%)   No relapse patients N(%)
All individuals                  2887 (100)            560 (19.4)              2327 (80.6)
Age at remission
  < 60                           731 (25.3)            121 (21.6)              610 (26.2)
  60-69                          839 (29.1)            171 (30.5)              668 (28.7)
  70-79                          859 (29.8)            168 (30.0)              691 (29.7)
  80-89                          441 (15.3)            97 (17.3)               344 (14.8)
  > 89                           17 (0.6)              3 (0.5)                 14 (0.6)
Sex
  Male                           1636 (56.7)           330 (58.9)              1306 (56.1)
  Female                         1251 (43.3)           230 (41.1)              1021 (43.9)
Year of diagnosis
  2007-2010                      1499 (51.9)           323 (57.7)              1176 (50.5)
  2011-2014                      1388 (48.1)           237 (42.3)              1151 (49.5)
Pre-existing comorbidities
  None                           1315 (45.5)           250 (44.6)              1065 (45.8)
  Mild/Moderate                  651 (22.5)            139 (24.8)              512 (22.0)
    Atrial fibrillation          81 (2.8)              13 (2.3)                68 (2.9)
    Hypertension                 566 (19.6)            121 (21.6)              445 (19.1)
    Diabetes                     142 (4.9)             34 (6.1)                108 (3.9)
    Gastrointestinal bleeding    10 (0.3)              1 (0.2)                 9 (0.4)
  Severe                         921 (31.9)            171 (30.5)              750 (32.2)
    Stroke                       199 (6.9)             31 (5.5)                168 (7.2)
    Heart failure                115 (4.0)             18 (3.2)                97 (4.2)
    Myocardial infarction        143 (5.0)             27 (4.8)                116 (5.0)
    Peripheral arterial disease  76 (2.6)              21 (3.8)                55 (2.4)
    Angina pectoris              258 (8.9)             49 (8.8)                209 (9.0)
    Renal failure                49 (1.7)              6 (1.1)                 43 (1.9)
    Intracranial bleeding        28 (1.0)              4 (0.7)                 24 (1.0)
    Cancer                       358 (12.4)            64 (11.4)               294 (12.6)
    Dementia                     4 (0.1)               1 (0.2)                 3 (0.1)
    COPD                         244 (8.5)             47 (8.4)                197 (8.5)
ECOG Performance Status
  ECOG 0                         1532 (53.1)           257 (45.9)              1275 (54.8)
  ECOG 1                         954 (33.0)            199 (35.5)              755 (32.4)
  ECOG 2                         232 (8.0)             63 (11.2)               169 (7.3)
  ECOG 3                         133 (4.6)             26 (4.6)                107 (4.6)
  ECOG 4                         36 (1.3)              15 (2.7)                21 (0.9)
s-LDH
  Normal                         1250 (43.3)           170 (30.4)              1080 (46.4)
  Increased                      1637 (56.7)           390 (69.6)              1247 (53.6)
Cancer stage
  Stage 1                        633 (21.9)            60 (10.7)               573 (24.6)
  Stage 2                        638 (22.1)            89 (15.9)               549 (23.6)
  Stage 3                        633 (21.9)            148 (26.4)              485 (20.8)
  Stage 4                        983 (34.0)            263 (47.0)              720 (30.9)


5.1 Swedish Lymphoma Register

The Swedish Lymphoma Register (SLR) is a patient register initiated by the Swedish Lymphoma Group in 2000, kept at the regional cancer center in Lund, and includes all lymphoma patients in Sweden.

The register includes details about clinical characteristics, treatment and outcome for about 36000 patients.

All subgroups of lymphoma are recorded in the SLR and the most common diagnosis subgroup (33%) is Diffuse Large B-cell Lymphoma [Swedish Lymphoma Group, 2019]. Relapse status has been recorded in the register by clinicians since 2010. However, the clinical epidemiology research group at Karolinska Institutet in Stockholm gathered relapse data retrospectively through an extensive review of patient medical records. Hence, patient relapse status is available from the start of the study follow-up window in 2007.

Eastern Cooperative Oncology Group Performance Status (ECOG) is a measurement of the level of impact the disease has on the patient's daily living activities. ECOG is coded from 0 to 5, where 0 denotes a fully active patient with no negative impact on daily living activities, 4 denotes a patient who is completely disabled and cannot carry out any self-care, and 5 denotes a patient who has died from the disease. Since ECOG is a categorical variable, the results will include hazard ratios for each ECOG level [ECOG-Acrin, 2021].

Ann Arbor staging (Stage) is a staging system for lymphoma that measures disease progression. Stage is a categorical variable with levels 1 to 4. Stage 1 indicates involvement of a single lymph node region, Stage 2 indicates involvement of two or more lymph node regions on the same side of the diaphragm, Stage 3 indicates involvement of lymph node regions on both sides of the diaphragm, and Stage 4 indicates involvement of one or more extralymphatic organs or tissues. The variable included in the model is coded as binary, taking value 1 if the patient is in stage 1 or stage 2 and value 2 if the patient is in stage 3 or stage 4 [Carbone et al., 1971].

Lactate dehydrogenase (s-LDH) is an enzyme which, when increased, can be an indicator of tissue damage or unsuccessful chemotherapy treatment [Liu et al., 2016]. The variable is binary: the level is either normal or increased. The patient's sex and age at remission are also collected from the SLR.

5.2 National Patient Register

The National Patient Register (NPR) is the largest patient register in Sweden, containing both inpatient and outpatient care records. The NPR covers all nationwide instances of inpatient care since 1987, and patient details within specialised outpatient care have been recorded since 2001. Data about patient characteristics, such as comorbidities and acute myocardial infarction during follow-up, is collected from the NPR and linked with the SLR. In addition, the death date is collected from the cause of death register.


Comorbidity is coded into a categorical variable with levels "None", "Mild" and "Severe". A patient is considered to have severe comorbidities in the presence of stroke, heart failure, myocardial infarction, peripheral arterial disease, angina pectoris, renal failure, intracranial bleeding, cancer, dementia or COPD prior to the diagnosis date. A patient is considered to have mild comorbidities in the presence of atrial fibrillation, hypertension, diabetes or gastrointestinal bleeding at the diagnosis date, while in absence of the comorbidities included in the severe level. If none of the aforementioned comorbidities are observed, the patient has comorbidity level "None".
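The coding rule above can be expressed as a simple lookup. The sketch below is illustrative only; the plain-text condition names stand in for the diagnosis codes actually used in the register:

```python
# Conditions defining each level (stand-ins for the register's diagnosis codes)
SEVERE = {"stroke", "heart failure", "myocardial infarction",
          "peripheral arterial disease", "angina pectoris", "renal failure",
          "intracranial bleeding", "cancer", "dementia", "copd"}
MILD = {"atrial fibrillation", "hypertension", "diabetes",
        "gastrointestinal bleeding"}

def comorbidity_level(diagnoses):
    """Map a patient's pre-diagnosis conditions to the three-level variable:
    any severe condition dominates, then any mild condition, else None."""
    d = {x.lower() for x in diagnoses}
    if d & SEVERE:
        return "Severe"
    if d & MILD:
        return "Mild"
    return "None"
```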

The variable "Year of diagnosis" is the DLBCL diagnosis year for the patient and is coded as a categorical variable with two levels, each referencing a time interval: one level references 2007 to 2010 and the other references 2011 to 2014.

Another way of combining variables of importance is via a risk score, the International Prognostic Index (IPI). The IPI is a point system used in oncology research to estimate lifetime survival prognosis and is widely used in DLBCL epidemiology studies [Project, 1993]. Further, clinicians use the IPI to decide the treatment type for first-line treatment with curative intent. One point is assigned for each true statement below.

• The patient is older than 60 years.

• The patient's cancer progression is in stage 3 or stage 4.

• The patient has an increased s-LDH level.

• The patient has an ECOG performance status of 2, 3 or 4.

• The patient has more than one extranodal site.

Consequently, IPI can take value 0 to 5.
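The five criteria translate directly into a scoring function; a minimal sketch (the argument names are our own):

```python
def ipi_score(age, stage, ldh_increased, ecog, extranodal_sites):
    """International Prognostic Index: one point per adverse factor (0-5)."""
    points = 0
    points += age > 60                 # older than 60 years
    points += stage in (3, 4)          # Ann Arbor stage 3 or 4
    points += bool(ldh_increased)      # increased s-LDH
    points += ecog >= 2                # ECOG performance status 2, 3 or 4
    points += extranodal_sites > 1     # more than one extranodal site
    return int(points)
```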


6 Results

This section reports results based on the aforementioned methodology for the two covariate sets.

Each categorical variable has one level as the reference level, which is not displayed in the results tables.

6.1 Propensity Score Modeling

To attain covariate balance between treatment groups we estimate a propensity score model using logistic regression. This model predicts the propensity score from each patient's set of covariate values.

The inverse of that score serves as a weight to create a pseudopopulation that is in covariate balance across treatment groups. The estimated propensity score models, with both treatment and censoring as outcome variables, are found in Table 2 for covariate set 1 and in Table 3 for covariate set 2.
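As a self-contained stand-in for the logistic propensity score model, the sketch below fits logistic regression by iteratively reweighted least squares (Newton-Raphson); in practice any standard GLM routine would be used, and this minimal version assumes no separation in the data:

```python
import numpy as np

def logistic_fit(X, y, n_iter=25):
    """Logistic regression via iteratively reweighted least squares (IRLS);
    a minimal, unregularized stand-in for the propensity score model."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, float)])  # intercept
    y = np.asarray(y, float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: beta += (X' W X)^{-1} X' (y - p)
        beta = beta + np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - p))
    return beta

def propensity(X, beta):
    """Predicted propensity scores e(L) = P(A = 1 | L) for each patient."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, float)])
    return 1.0 / (1.0 + np.exp(-X @ beta))
```

The fitted scores can then be passed to the weight formula of Equation 15 to build the pseudopopulation.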

Table 2: Propensity score model for covariate set 1

                            Treatment              Censoring
Variable                    Estimate   p-value     Estimate   p-value
Intercept                   -2.992     0.000       6.612      0.000
Stage 2                     0.317      0.079       -0.129     0.659
Stage 3                     0.872      0.000       -0.271     0.350
Stage 4                     1.028      0.000       -0.100     0.722
ECOG 234                    0.089      0.499       -0.148     0.566
s-LDH Increased             -0.416     0.000       0.293      0.143
Comorbidity Mild            0.064      0.616       -0.686     0.014
Comorbidity Severe          -0.166     0.173       -1.010     0.000
Diagnosis Year 2011-2014    -0.272     0.005       1.188      0.000
Age Baseline                0.012      0.005       -0.048     0.000

The variable selection is based on clinical arguments and is not subject to change after obtaining nonsignificant coefficients.

Table 3: Propensity score model for covariate set 2

                            Treatment              Censoring
Variable                    Estimate   p-value     Estimate   p-value
Intercept                   -2.727     0.000       4.815      0.000
IPI 1                       1.019      0.000       -1.461     0.047
IPI 2                       1.416      0.000       -1.248     0.090
IPI 3                       1.900      0.000       -1.251     0.091
IPI 4                       1.942      0.000       -1.531     0.042
IPI 5                       2.204      0.000       -1.369     0.143
Comorbidity Mild            0.028      0.789       -0.998     0.000
Comorbidity Severe          -0.171     0.244       -1.427     0.000
Diagnosis Year 2011-2014    -0.287     0.002       1.136      0.000

The results in Table 2 and Table 3 are compared with clinical claims to discern if they align. For example, later cancer stages, increased s-LDH or a one-unit increase in age will increase the natural log odds of relapse,


which is in line with clinical claims. Reciprocally, the reverse relationship is observed for the censoring propensity score. Combining some variables from covariate set 1 into the International Prognostic Index and estimating the propensity score model for both treatment and censoring, shown in Table 3, yields estimated regression coefficients in line with clinical assumptions. From previous studies, an increase in IPI is related to decreased survival prognosis, and the estimated propensity score model yields corresponding associations with relapse status. An increase in IPI increases the natural log odds of relapse and is significant at the 5% significance level across all IPI levels. After estimating the propensity score model for the treatment for set 1 and set 2, the propensity score is estimated for each patient and its distribution is plotted, grouped by treatment status.

Figure 3: Treatment propensity score distribution for covariate set 1 (left) and covariate set 2 (right)

The distributions of the estimated treatment propensity scores for both covariate sets are found in Figure 3.

The distribution plots are used to assess the positivity assumption: patients with every combination of covariate values L = l should have the possibility of being assigned the treatment. There is distributional overlap for every estimated propensity score across treatment groups, which supports the positivity assumption, and there is no need to truncate the data. Similarly, the distributions of the estimated censoring propensity scores for both treatment groups overlap for all estimated scores. However, estimating weights using the propensity score has drawbacks when the estimated score is close to 0 or 1: this can yield estimated weights approaching infinity and, consequently, biased estimates and large variances [Cole and Hernán, 2008].

To assess whether problematic weights are estimated we check the distribution summary in Table 4. Some maximum weights seem large but are not considered extreme, and consequently not an indication of misspecified propensity score models.


Figure 4: Censoring propensity score distribution for covariate set 1 (left) and covariate set 2 (right)

Table 4: Distribution of unstabilized inverse probability weights

               wATE                              wATT
        Treatment       Censoring         Treatment       Censoring
Set     Min    Max      Min    Max        Min    Max      Min    Max
Set 1   1.05   16.09    1.01   120.31     0.05   1.00     1.00   119.31
Set 2   1.04   21.37    1.00   111.29     0.04   1.00     1.00   110.29

Covariate balance is assessed in Table 5 and Table 6 by calculating standardized mean differences between treatment groups for each variable in the unweighted population and the weighted pseudopopulation, respectively. The covariates in the raw data are not balanced between treatment groups, except comorbidity. The results in Table 5 show how the standardized mean differences decrease when creating a pseudopopulation with inverse probability weights. If the standardized mean difference is less than 10%, the covariate imbalance is not considered meaningful, which here is observed for all covariates; the inverse probability weighting is thus considered adequate as a balancing technique. Similarly, after weighting covariate set 2 to create a pseudopopulation, we observe adequate balance across all covariates.

That is, the standardized mean differences are less than 10%. The conclusion from inverse probability weighting as a balancing technique is that the treatment groups are now more comparable in terms of covariate distributions, which was the objective.


Table 5: Balancing diagnostics for covariate set 1 using d · 100

Variable                    Non-weighted   wATE    wATT
Stage 1                     -37.07         -0.22   0.00
Stage 2                     -19.42         0.36    -0.02
Stage 3                     13.17          -0.05   0.03
Stage 4                     33.29          -0.09   -0.01
ECOG 01                     -16.02         0.14    0.10
ECOG 234                    16.02          -0.14   -0.10
s-LDH 1                     -33.45         0.19    -0.10
s-LDH 2                     33.45          -0.19   0.10
Comorbidity None            -2.26          -3.49   1.01
Comorbidity Mild            6.66           -0.79   -0.53
Comorbidity Severe          -3.65          4.40    -0.59
Diagnosis Year 2007-2010    14.36          -0.17   -0.39
Diagnosis Year 2011-2014    -14.36         0.17    0.39
Age Baseline                13.65          6.16    -1.09

Table 6: Balancing diagnostics for covariate set 2 using d · 100

Variable                    Non-weighted   wATE    wATT
IPI 0                       -30.44         -0.40   0.01
IPI 1                       -24.38         0.51    -0.04
IPI 2                       -4.41          -0.10   0.01
IPI 3                       25.75          0.04    -0.07
IPI 4                       17.51          -0.22   0.09
IPI 5                       10.62          -0.14   0.06
Comorbidity None            -2.26          -2.23   0.49
Comorbidity Mild            6.66           0.94    -0.44
Comorbidity Severe          -3.65          1.53    -0.11
Diagnosis Year 2007-2010    14.36          0.14    -0.01
Diagnosis Year 2011-2014    -14.36         -0.14   0.01


6.2 Causal Effects

When the population is considered adequately balanced in confounder distributions across treatment groups, we estimate causal effects. All causal effects in this section assume exchangeability, positivity and consistency. The results in this section are produced by the marginal structural Cox model on the pseudopopulations created by the two sets of weights: the weights for the average treatment effect (ATE) and for the average treatment effect on the treated (ATT). Since the data is standardized using inverse probability weights, we estimate marginal causal effects. Causal hazard ratios for treatment are found in Table 7. The hazard ratios are estimated over the complete study period since we want to capture lasting effects of relapse treatment on acute myocardial infarction.

Table 7: Causal hazard ratios for treatment and 95% bootstrap confidence intervals

                 wATE                          wATT
Variable   Estimate   Lower    Upper     Estimate   Lower    Upper
Set 1      0.629      0.352    1.125     0.788      0.440    1.413
Set 2      0.668      0.396    1.125     0.814      0.475    1.395

The marginal hazard ratios can be interpreted as the average causal effect if the identifiability assumptions hold when estimated with wATE. When marginal hazard ratios are estimated using wATT, we interpret the coefficient as the average causal effect on the treated. The estimates in Table 7 are ratios of the hazards between two theoretical populations: the population that would have been observed if all patients were treated and no patients were censored, and the population that would have been observed if no patients were treated and no patients were censored.

However, the confidence intervals generated by the bootstrap method cover 1 for all estimates, so the average causal effects of treatment are nonsignificant at the 5% significance level. This interpretation holds for both the average treatment effect and the average treatment effect on the treated, for covariate set 1 and covariate set 2.

The estimated survival function, based on the MSM Cox model weighted with wATE, predicts survival probabilities at each time point both for a hypothetical population where all patients are treated and for a hypothetical population where no patients are treated. The survival function generates marginal potential survival curves using wATE in Figure 5 and using wATT in Figure 6. The difference between the potential survival curves is interpreted as the average causal effect. Similar to the causal hazard ratios, the potential survival curves visualize a protective average causal effect. The potential survival probabilities are relatively close between the pseudopopulations for the first 1000 days and then diverge somewhat. However, the bootstrap confidence intervals widen with time and overlap at all time points. That is, no significant average causal effects are discerned from the marginal potential survival curves for either covariate set.


Figure 5: Potential survival curves using ATE weights for set 1 (left) and set 2 (right)

Figure 6: Potential survival curves using ATT weights for set 1 (left) and set 2 (right)


The potential survival curves predicted from the Cox model using wATT lie closer together than those predicted using wATE; the average treatment effect on the treated appears smaller than the average treatment effect on the complete cohort. Although the differences in potential survival curves are nonsignificant, the causal parameters of interest are estimated and reported in Table 8. The survival probability causal effect \hat{\Delta}_{SPCE}(t*) and the restricted average causal effect \hat{\Delta}_{RACE}(t*) at prespecified time points t* (number of days after the remission date) are estimated for both weighting schemes and both covariate sets.

Table 8: Estimated causal parameters \hat{\Delta}_{RACE}(t*) and \hat{\Delta}_{SPCE}(t*)

                  wATE               wATT
Statistic         Set 1     Set 2    Set 1     Set 2
RACE(365)         4.29      5.06     2.48      2.76
RACE(730)         20.17     21.00    11.59     11.56
SPCE(365)         0.0245    0.0288   0.0146    0.0163
SPCE(730)         0.0631    0.0588   0.0363    0.0326

The positive differences in both \hat{\Delta}_{RACE}(t*) and \hat{\Delta}_{SPCE}(t*) at years one and two indicate a causal increase in survival probability between a hypothetical population in which everyone was treated and a hypothetical population in which no one was treated. Specifically, \hat{\Delta}_{RACE}(t*) is the causal difference in mean potential survival times between the two pseudopopulations: after 1 year the difference is about 4 to 5 days (ATE) and after 2 years about 20 to 21 days, with smaller ATT differences of about 2 to 3 days and about 12 days, respectively. After 1 year, the potential survival probability differs by 1 to 3 percentage points between treatment groups for both covariate sets; after 2 years, the difference increases to 3 to 6 percentage points. The results are contrary to the clinical belief that the treatment would be cardiotoxic and consequently increase the risk of AMI; instead, the results in Table 8 suggest that relapse treatment would decrease the risk of AMI. However, since there is no significant difference between the potential survival curves, we cannot conclude that the causal parameters are different from zero.
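Given the estimated potential survival curves on a common time grid, the two causal parameters can be read off or integrated numerically. A small numpy sketch (our own helper names; assumes t* is a point on the time grid):

```python
import numpy as np

def spce(s_treated, s_control, t_grid, t_star):
    """Survival probability causal effect at t*: S^{a=1}(t*) - S^{a=0}(t*)."""
    i = int(np.searchsorted(t_grid, t_star))
    return float(s_treated[i] - s_control[i])

def race(s_treated, s_control, t_grid, t_star):
    """Restricted average causal effect up to t*: the integral of the
    difference between the potential survival curves (trapezoidal rule),
    i.e. the difference in restricted mean potential survival time."""
    t = np.asarray(t_grid, float)
    diff = np.asarray(s_treated, float) - np.asarray(s_control, float)
    m = t <= t_star
    t, diff = t[m], diff[m]
    return float(np.sum(np.diff(t) * (diff[1:] + diff[:-1]) / 2.0))
```

Because the curves are survival probabilities, the RACE comes out in the same units as the time grid (here, days).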


6.3 Model Diagnostics

Some causal parameters rely on the assumption of proportional hazards. To assess whether this assumption holds, we plot the estimated regression coefficients from the Cox model at each time point. The proportional hazards assumption is taken to hold if the estimated hazard ratio appears as a horizontal line throughout the study period in Figure 7 and Figure 8. A formal hypothesis test of proportional hazards is conducted using Schoenfeld residuals to examine whether any time-dependent coefficients are present. No test was significant, so we fail to reject the hypothesis of no time-dependent coefficients for both covariate sets and both weighting methods.

Figure 7: Proportional hazard assumption for set 1 using wAT E (left) and wAT T (right)

Figure 8: Proportional hazard assumption for set 2 using wAT E (left) and wAT T (right)


7 Conclusion

The identifiability assumptions for valid causal inference are assessed through both modeling results and subject-matter knowledge. Based on clinical advice, this study assumes conditional exchangeability. The assumption of positivity is assumed to hold since the estimated propensity scores for the treatment groups overlap in distribution.

A marginal structural Cox model was estimated using inverse probability weights for both treatment and censoring, yielding marginal hazard ratios with a causal interpretation. The corresponding confidence intervals, estimated by a bootstrap method, all cover 1 and suggest no significant average causal effect of relapse treatment on acute myocardial infarction. Further, marginal adjusted survival curves were estimated by the MSM Cox model with an interpretation within the potential outcome framework, but the curves were not significantly different at the 5% significance level. Lastly, causal parameters were estimated from the potential survival curves, but the differences were interpreted with caution. No average causal effects of relapse treatment on acute myocardial infarction were found to be significant.

8 Discussion

Valid causal inference relies on the assumption of conditional exchangeability. This assumption is not testable and requires subject-matter knowledge to deduce. Even though the covariates included in this report are well established as relevant confounders for the treatment and outcome of DLBCL patients, the results are restricted by the available data. The Swedish cancer patient registers are considered to have almost complete coverage of all completed cancer treatments and include all relevant clinical characteristics of each patient. However, there is reason to believe that features other than clinical characteristics can be relevant confounders. Features such as socioeconomic variables may also be confounders that potentially contribute to an increased risk of relapse, AMI or censoring.

Furthermore, estimation of causal effects also relies on a correctly specified propensity score model. In addition to variable selection, there is an interest in extending the model with squared terms or interaction effects to capture more realistic relationships. However, the propensity score model was considered adequate as a balancing score after obtaining standardized mean differences of less than 10% for each covariate. Other types of covariate balancing, such as propensity score matching, are proposed as possible extensions of the study: instead of weighting the data to create a pseudopopulation, treated patients are compared with untreated patients that are close in estimated propensity score. However, IPW estimates marginal effects on the complete population, which is of interest for obtaining interpretations similar to those of a randomized controlled trial. In the presence of a rich set of covariates, it is recommended to conduct a sensitivity analysis to examine the robustness of the propensity score model; for example, the researcher can exclude confounders from the propensity score model in an iterative manner to examine fluctuations in the parameter estimates. The confounders in this study are relatively few, however, and the mentioned approach is


not considered applicable. Further, if there is an imminent risk of misspecified propensity score models, an alternative is the doubly robust estimator [Wang et al., 2016]. The doubly robust estimator combines a propensity score model, for example estimated by logistic regression, with a standardized outcome regression model to isolate the average causal effect; only one of the two models needs to be correctly specified.

The marginal hazard ratios were estimated using unstabilized weights in this study. Unstabilized weights run the risk of becoming extreme, effectively copying a single patient a large number of times into the pseudopopulation and inflating the estimated variance of the marginal hazard ratios. Although the diagnostics showed no extreme weights, one extension of the weighting methodology is to use stabilized weights.
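As an illustration of the difference, here is a small sketch of unstabilized versus stabilized treatment weights, with hypothetical treatment indicators and propensity scores. The stabilizing numerator is the marginal treatment probability, so stabilized weights average close to one while unstabilized weights can blow up for patients with propensity scores far from their observed treatment.

```python
import numpy as np

def iptw_weights(A, e, stabilized=True):
    """Inverse probability of treatment weights.

    A: binary treatment indicator, e: estimated propensity scores P(A=1 | L).
    Unstabilized: 1/e for treated, 1/(1-e) for untreated.
    Stabilized: multiply by the marginal probability P(A=a) in the numerator.
    """
    w = np.where(A == 1, 1 / e, 1 / (1 - e))
    if stabilized:
        w = w * np.where(A == 1, A.mean(), 1 - A.mean())
    return w

# Hypothetical example: the last patient is untreated despite e = 0.9,
# producing an unstabilized weight of 10 (ten "copies" in the pseudopopulation)
A = np.array([1, 1, 0, 0, 0])
e = np.array([0.8, 0.5, 0.3, 0.2, 0.9])
uw = iptw_weights(A, e, stabilized=False)
sw = iptw_weights(A, e)
```

Stabilization shrinks that extreme weight from 10 to 6 here; the ranking of weights is preserved, but the variance of the weighted estimator is reduced.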

The treatment variable in the empirical application was the relapse date, which serves as a proxy for relapse treatment with chemotherapy that is assumed to be cardiotoxic. However, the relapse date may be a poor proxy in this patient cohort. In reality, rough estimates from more recently obtained relapse data suggest that about 60% of the patients with a relapse date receive an aggressive relapse treatment.

The specific treatment details can vary; anthracycline-based chemotherapy and stem cell transplants are two examples. About 20% of the patients with a relapse date receive a less aggressive treatment intended to prolong life expectancy, and some patients with a relapse date receive no relapse treatment at all and commence palliative care instead. Using this proxy to create pseudopopulations and treatment groups, and consequently to isolate the causal effect, introduces bias, and the causal effect should therefore be interpreted with caution.

Further, the pseudopopulation is rather unrealistic and may lack a valuable real-world interpretation. Even though we estimate the potential survival curves that would have been observed if all patients were treated and no patients were censored, it is important to realize that such curves would never be observed in practice: patients will certainly be censored during follow-up. The potential survival curves are estimated to capture the causal effect given that the identifiability assumptions hold, not to resemble a realistic real-world scenario.

Patients with a low probability of remaining uncensored receive large censoring weights and contribute strongly to the features of the pseudopopulation. However, the few patients with large weights may not be representative of the population that would be observed if no one was censored, whether by death or for administrative reasons. If the methodology, using relapse date as a proxy as in this study, were to find strong causal effects on AMI incidence, it would be necessary to acquire more detailed data about relapse treatment and redo the analysis for each treatment type. For now, detailed data about relapse treatment are not available, hence the use of the proxy.
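The treatment and censoring weights discussed above are combined multiplicatively for each patient. In a standard textbook formulation (the notation here is generic and may differ from the exact specification used in the estimation chapter), the unstabilized combined weight for patient $i$ at time $t$ is

```latex
W_i(t) \;=\;
\underbrace{\frac{1}{P(A_i = a_i \mid L_i)}}_{\text{treatment weight}}
\;\times\;
\underbrace{\prod_{k \le t} \frac{1}{P(C_i > k \mid C_i > k - 1, A_i, L_i)}}_{\text{censoring weight}}
```

where $A_i$ is the treatment, $L_i$ the confounders and $C_i$ the censoring time. A patient who is both unlikely to receive the observed treatment and unlikely to remain uncensored accumulates a large product weight, which is the source of the representativeness concern raised above.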

Had a significant average causal effect been present, and had there been additional interest in causal effects in the real world, the inclusion of competing events would be a natural extension. Instead of censoring patients who die during follow-up, death is then included as an event, one that can be caused by other, competing, causes.

The methodology of competing events within the potential outcome framework requires different causal estimands.

References
