Mediation and Interaction with Application to Survival After Myocardial Infarction


Uppsala University

Bachelor Thesis

Department of Statistics

Authors:

David Grannas Edvin Volgsten

Supervisor:

Ronnie Pingel

January 24, 2018


Abstract

Previous studies have found that the socioeconomic risk factors education, income and family type were independently associated with mortality after first myocardial infarction (MI). In this population-based cohort study we study how these socioeconomic risk factors are related to the mortality among first MI survivors within 365 days. To this end, we use the four-way decomposition analysis which decomposes the total effect of the exposure into the effect that is due to only mediation, due to only interaction, due to both mediation and interaction and due to neither mediation nor interaction. Using education as exposure and income or family type as mediator two different models are estimated (adjusted for gender, age, region of birth and year of admission). For the model in which income was used as mediator, results show that individuals with lower education and a lower income are at greater risk of mortality, compared to individuals with the highest income level. The mediated effect of income on mortality decreases as the income approaches the highest income. In the model with family type as mediator the mediated effect on mortality was found to be low, meanwhile the direct effect of lower education was most substantial to the total risk of mortality.

Keywords: Potential outcome, four-way decomposition.


Contents

1 Introduction

2 Statistical Theoretical Framework

2.1 Causal Inference

2.2 Mediation

2.3 Interaction

2.4 Four-Way Decomposition

2.5 Assumptions for Four-Way Decomposition

2.6 Estimating the Four-Way Decomposition Using Statistical Models with Binary Mediator and Binary Outcome

2.6.1 Logistic Regression for the Mediator

2.6.2 Poisson Regression for the Outcome

2.6.3 Estimation of Risk Ratios for Binary Mediator and Binary Outcome

3 Four-Way Decomposition Analysis of Socioeconomic Risk Factors on Survival Post First Myocardial Infarction

3.1 Method

3.1.1 Design and Data

3.1.2 Study Population

3.1.3 Outcome

3.1.4 Exposure

3.1.5 Mediator

3.1.6 Covariates

3.2 Results

3.2.1 Model with Income as Mediator

3.2.2 Model with Family Type as Mediator

4 Discussion


1 Introduction

Myocardial infarction (MI) is the leading single cause of death in Sweden. In 2015 almost 30000 persons suffered an MI and 34% died within the first year. After extensive research on potential risk factors and improvements in health care, this is a decrease in case fatality rate from 60% in 1987 (National Board of Health and Welfare, 2017). By focusing on secondary prevention, potential risk factors can be detected and resources can be directed to disadvantaged groups, hopefully further decreasing the number of deaths among first MI survivors. In this study the three potential socioeconomic risk factors of interest are education, income and family type.

A study conducted on individuals in Denmark between 1995 and 2002 concluded that both income and education were independently associated with mortality within 30 days after first MI (Rasmussen et al, 2006). Another study (Gerber et al, 2008) investigated the relationship between neighborhood income, the individual's education and long-term survival after first MI for individuals in Israel between 1992 and 1993. The study found that both risk factors were associated with an increased risk of mortality, and that a low income was associated with a large increase in the risk of mortality when accompanied by a low education status. In Canada (Alter et al, 1999) neighborhood income was shown to have an effect on one-year mortality after MI. Living alone after experiencing an MI was shown to be associated with an increased risk of mortality 4 years post MI and of readmission to the hospital 365 days post MI (Bucholz et al, 2011).

This study aims not only to study socioeconomic risk factors independently but also to disentangle the relationship between them and their association with mortality. A newly developed statistical method called four-way decomposition (VanderWeele, 2015) is used, in which mediation and interaction analysis are unified. The gain from using the four-way decomposition is a complete model for measuring the total effect of the exposure on the outcome, decomposed into four components: due to only mediation, due to only interaction, due to both mediation and interaction and due to neither mediation nor interaction.

Thus, the questions this study aims to answer are:

• Using income as a mediator, what is the total effect of education on 365 day mortality and how much of the total effect is due to each component in the four-way decomposition?

• Using family type as a mediator, what is the total effect of education on 365 day mortality and how much of the total effect is due to each component in the four-way decomposition?

In Section 2 the statistical theoretical framework for this study is described, showing how mediation- and interaction analysis are unified into the four-way decomposition analysis.

Section 3.1 presents the data that are used and defines the variables. In Section 3.2 the four-way decomposition is used to answer the questions of this study. Lastly, the results found are discussed in Section 4.


2 Statistical Theoretical Framework

2.1 Causal Inference

It is seldom sufficient to look solely at the association between an exposure A and the outcome Y; the interest lies in whether and how the exposure causes the outcome. This study aims to estimate such causal effects through mediation and interaction analysis using the four-way decomposition, which requires the framework of so-called potential outcomes, or counterfactuals.

The framework of potential outcomes for causal inference was first introduced by Neyman (1923) and later extended by Rubin (1974). A potential outcome is defined as the outcome that would have occurred had the circumstances been different than they actually were. For a binary exposure and outcome, let $Y_1$ denote the potential outcome if the exposure had been set to 1 and let $Y_0$ denote the potential outcome if the exposure had been set to 0 (for an unexposed individual, $Y_0$ is the outcome actually observed). If the two potential outcomes differ, the exposure causes the outcome for that individual. The total effect of the exposure on the outcome is thus written as $Y_1 - Y_0$ (VanderWeele, 2015). In general, this difference cannot be observed for an individual, but under certain assumptions it is possible to make inferences about causal effects on average for a population.
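To make the definition concrete, the following minimal sketch computes individual and average causal effects in a toy population. The four individuals and their potential outcomes are hypothetical; in real data only one of the two potential outcomes is observed per person, which is why the individual effect cannot be observed directly.

```python
# Hypothetical toy population. In real data only one of y1/y0 is observed
# per person; both are listed here purely to illustrate the definitions.
population = [
    {"y1": 1, "y0": 0},
    {"y1": 1, "y0": 1},
    {"y1": 0, "y0": 0},
    {"y1": 1, "y0": 0},
]


def individual_effects(people):
    """Individual causal effects Y1 - Y0."""
    return [person["y1"] - person["y0"] for person in people]


def average_effect(people):
    """Average causal effect E[Y1] - E[Y0] in the population."""
    effects = individual_effects(people)
    return sum(effects) / len(effects)


print(individual_effects(population))
print(average_effect(population))
```

The exposure changes the outcome for the first and last individual only, so the average effect is the share of such individuals in this toy population.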

2.2 Mediation

Once a relationship between the exposure variable and the outcome is found, it is sensible to search for explanations of this relationship and to ask whether other variables might be affecting it. A variable that underlies the relationship between an exposure and an outcome is called a mediator M. The use of mediator variables was popularized by Baron and Kenny (1986), who proposed that in addition to a direct effect of the exposure variable on the outcome variable there exists an indirect effect, also known as a mediated effect. This means that the exposure variable affects the mediator variable, which in turn affects the outcome variable. The effect of the exposure on the mediator is thus reflected in the outcome. Mediation analysis requires temporal precedence: the exposure must precede and cause the mediator, and the mediator must precede and cause the outcome. Without this temporal order the effect could precede the cause, which is not possible.

Suppose that two levels of the exposure, $a$ and $a^*$, are compared. The natural indirect effect (NIE) is the average change in outcome when changing the exposure relative to the indirect pathway, but keeping the exposure constant relative to the direct pathway. This can be simplified as observing the average change in outcome when comparing with and without an indirect effect, when keeping the direct effect constant,

$$Y_{aM_a} \text{ vs } Y_{aM_{a^*}}$$

On the other hand, there is the effect of the exposure on the outcome that is not mediated. This effect is called the natural direct effect (NDE) and is defined as the average change in outcome when changing the exposure relative to the direct pathway, but keeping the exposure constant relative to the indirect pathway. This can be simplified as observing the average change in outcome when comparing with and without a direct effect, when keeping the mediator constant at the level it would take under exposure $a^*$,

$$Y_{aM_{a^*}} \text{ vs } Y_{a^*M_{a^*}}$$

In the simple mediation model the total effect of the exposure on the outcome is the sum of the direct and indirect effects, i.e. $TE = NDE + NIE$. The total effect is thus the effect of changing the exposure from $a^*$ to $a$, that is,

$$Y_{aM_a} \text{ vs } Y_{a^*M_{a^*}}$$

With the use of mediation analysis the size of the direct and indirect effects can be determined, so that the researcher can see which pathway contributes most to the effect on the outcome.
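The relation between the total, direct and indirect effects can be illustrated for a single hypothetical individual. The sketch below assumes that the comparisons are made on the difference scale and that the reference exposure level is written `a_star`; the dictionaries `y` and `m` and the helper `nested_outcome` are illustrative names, not part of the method's notation.

```python
def nested_outcome(y, m, a_outcome, a_mediator):
    """Outcome under exposure a_outcome with the mediator set to the
    value it would take under exposure a_mediator."""
    return y[(a_outcome, m[a_mediator])]


# Hypothetical individual: y[(a, m)] are potential outcomes,
# m[a] is the potential mediator value under exposure a.
y = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
m = {0: 0, 1: 1}
a, a_star = 1, 0

# Indirect effect: change the mediator pathway, hold the direct pathway at a.
nie = nested_outcome(y, m, a, a) - nested_outcome(y, m, a, a_star)
# Direct effect: change the direct pathway, hold the mediator at its a_star level.
nde = nested_outcome(y, m, a, a_star) - nested_outcome(y, m, a_star, a_star)
# Total effect: change both pathways at once.
te = nested_outcome(y, m, a, a) - nested_outcome(y, m, a_star, a_star)

print(nde, nie, te)
```

For this individual the exposure only matters when the mediator is also present, so the whole effect runs through the mediator pathway.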

A simple mediation model is presented in Figure 1. Note that in order to make causal claims certain assumptions must hold. These assumptions are described in Section 2.5.

Figure 1: Simple mediation model with exposure A, mediator M and outcome Y.


2.3 Interaction

In a mediation analysis the researcher seeks to answer why the exposure variable affects the outcome by observing the indirect effect of the mediator variable. However, when analyzing the interaction the researcher seeks to answer for whom the effect is largest by examining the difference in effect when the exposure and mediator variable are combined, as opposed to each considered separately. Observing the effect that the mediator variable and exposure variable have on one another is of importance to determine their reliance on each other in a model (VanderWeele, 2015). The study of interaction seeks to identify these relations.

For the sake of simplicity, let the variables of interest be a binary exposure and a binary mediator, both coded as either 1 or 0. Let $p_{am} = E[Y \mid A = a, M = m]$ denote the probability of a binary outcome given exposure $A = a$ and mediator $M = m$. The additive interaction, which is the interaction on the difference scale, is then defined by

$$(p_{11} - p_{00}) - [(p_{10} - p_{00}) + (p_{01} - p_{00})] \tag{1}$$

where $(p_{11} - p_{00})$ is the effect of exposure $A$ and mediator $M$ combined, compared to a reference category where both are set to 0. This is a risk difference, i.e. the risk of the event in the exposed group minus the risk of the event in the unexposed group. The expressions $(p_{10} - p_{00})$ and $(p_{01} - p_{00})$ are the effect of exposure $A$ alone and the effect of the mediator $M$ alone, respectively. These two expressions are summed and then subtracted from the combined effect. Simplifying Equation 1 yields conditions for a positive and a negative interaction between the two variables,

$$p_{11} - p_{10} - p_{01} + p_{00} > 0, \quad \text{positive interaction}$$

$$p_{11} - p_{10} - p_{01} + p_{00} < 0, \quad \text{negative interaction}$$

A positive interaction implies that the effect of the two variables combined exceeds the sum of their effects when they are considered separately. A negative interaction implies the opposite, meaning that the combined effect is lower than the sum of the separate effects.
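As a small numerical illustration of the additive interaction, the sketch below evaluates the definition for a set of hypothetical risks (the four probabilities are made up for the example and are not estimates from the study).

```python
def additive_interaction(p11, p10, p01, p00):
    """Combined effect minus the sum of the two separate effects."""
    combined = p11 - p00
    exposure_alone = p10 - p00
    mediator_alone = p01 - p00
    return combined - (exposure_alone + mediator_alone)


# Hypothetical risks for the four exposure/mediator combinations.
risks = {"p11": 0.40, "p10": 0.15, "p01": 0.20, "p00": 0.10}
value = additive_interaction(**risks)

print(round(value, 3))
print("positive" if value > 0 else "negative or zero")
```

With these numbers the combined effect is 0.30 while the separate effects sum to 0.15, so the interaction is positive.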

Interaction in terms of ratio measures is called multiplicative interaction, which uses risk ratios, also known as relative risks (RR). Risk ratios are calculated as the probability of the event occurring when either the exposure, the mediator or both the exposure and the mediator are set to 1, divided by the probability of the event occurring when both the exposure and the mediator are set to 0,

$$RR_{10} = p_{10}/p_{00}, \quad RR_{01} = p_{01}/p_{00}, \quad RR_{11} = p_{11}/p_{00}$$

The multiplicative interaction is then measured by the ratio

$$\frac{RR_{11}}{RR_{10}RR_{01}} = \frac{p_{11}p_{00}}{p_{10}p_{01}} > 1, \quad \text{positive multiplicative interaction}$$

$$\frac{RR_{11}}{RR_{10}RR_{01}} = \frac{p_{11}p_{00}}{p_{10}p_{01}} < 1, \quad \text{negative multiplicative interaction}$$


A positive multiplicative interaction means that the effect of the exposure and mediator together exceeds the product of the effects the two variables have when they are considered separately. A negative multiplicative interaction is interpreted in the opposite way, meaning that together the variables have a lower effect than the product of their separate effects.

Dividing $p_{11} - p_{10} - p_{01} + p_{00}$ by $p_{00}$ yields Rothman's excess relative risk due to interaction (Rothman, 1986), known as RERI,

$$RERI = RR_{11} - RR_{10} - RR_{01} + 1$$

This measure is similar to the additive interaction but uses risk ratios (relative risks) rather than risks. The importance of RERI is that it makes it possible to determine whether the additive interaction is positive, negative or equal to zero using risk ratios: $RERI > 0$ if and only if $p_{11} - p_{10} - p_{01} + p_{00} > 0$, $RERI < 0$ if and only if $p_{11} - p_{10} - p_{01} + p_{00} < 0$, and $RERI = 0$ if and only if $p_{11} - p_{10} - p_{01} + p_{00} = 0$. In this way, RERI can be used to state the direction of the additive interaction, although it is not possible to say anything about the magnitude of the additive interaction without knowing $p_{00}$.
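The link between RERI and the additive interaction can be checked numerically. The sketch below uses hypothetical risks and illustrative function names; it computes the three risk ratios, the multiplicative interaction and RERI.

```python
def risk_ratios(p11, p10, p01, p00):
    """Return (RR10, RR01, RR11), each relative to the doubly unexposed group."""
    return p10 / p00, p01 / p00, p11 / p00


def reri(p11, p10, p01, p00):
    """Excess relative risk due to interaction."""
    rr10, rr01, rr11 = risk_ratios(p11, p10, p01, p00)
    return rr11 - rr10 - rr01 + 1


def multiplicative_interaction(p11, p10, p01, p00):
    """Ratio RR11 / (RR10 * RR01); values above 1 indicate positive interaction."""
    rr10, rr01, rr11 = risk_ratios(p11, p10, p01, p00)
    return rr11 / (rr10 * rr01)


# Hypothetical risks (not estimates from the study).
p11, p10, p01, p00 = 0.40, 0.15, 0.20, 0.10
additive = p11 - p10 - p01 + p00

print(reri(p11, p10, p01, p00))
print(multiplicative_interaction(p11, p10, p01, p00))
print(additive / p00)  # equals RERI up to rounding
```

Because RERI is the additive interaction divided by the positive number $p_{00}$, the two always share the same sign.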


2.4 Four-Way Decomposition

If a mediator exists that the exposure could interact with, it is described by VanderWeele (2015) how the total effect of the exposure on the outcome can be expressed as a sum of four components; due to neither mediation nor interaction, due to interaction only, due to mediation and interaction and due to mediation only.

The total effect, $Y_1 - Y_0$, of the exposure $A$ is written as,

$$\begin{aligned}
Y_1 - Y_0 ={} & (Y_{10} - Y_{00}) + (Y_{11} - Y_{10} - Y_{01} + Y_{00})(M_0) \\
& + (Y_{11} - Y_{10} - Y_{01} + Y_{00})(M_1 - M_0) + (Y_{01} - Y_{00})(M_1 - M_0)
\end{aligned}$$

The controlled direct effect (CDE) is the term $(Y_{10} - Y_{00})$. This is the first component, defined as the effect on the outcome if the mediator were set to 0. That is, the effect of the exposure on the outcome that is not due to either the mediator or any interaction.

The second component $(Y_{11} - Y_{10} - Y_{01} + Y_{00})(M_0)$ is an additive interaction that will only be non-zero when the mediator $M$ is present in the absence of the exposure. This component is called the reference interaction ($INT_{ref}$). It captures the effect that is due to interaction only.

The third component $(Y_{11} - Y_{10} - Y_{01} + Y_{00})(M_1 - M_0)$ is called the mediated interaction ($INT_{med}$). For this additive interaction to be non-zero it is required that $(M_1 - M_0) \neq 0$, that is, the exposure has an effect on the mediator. The third component in the four-way decomposition captures the effect that is due to both mediation and interaction.

The last component $(Y_{01} - Y_{00})(M_1 - M_0)$ is the effect of the mediator on the outcome, when the exposure is absent, multiplied by the effect of the exposure on the mediator. For this component to be non-zero the mediator must affect the outcome in the absence of the exposure and the exposure must affect the mediator itself. That is, it is the effect that is due to mediation only, not to any interaction. This component is called the pure indirect effect ($PIE$). Illustrations of each of the four components are depicted in Figure 2.

Hence, the total effect can be summarized as,

$$TE = CDE + INT_{ref} + INT_{med} + PIE$$
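The decomposition is an algebraic identity for a binary mediator, which can be verified by brute force. The sketch below enumerates every combination of binary potential outcomes and potential mediator values and checks that the four components add up to the total effect. It takes the interaction contrast to be $Y_{11} - Y_{10} - Y_{01} + Y_{00}$, in agreement with the additive interaction of Section 2.3; the function and variable names are illustrative.

```python
from itertools import product


def four_way(y, m0, m1):
    """Individual-level components. y[(a, m)] are the potential outcomes;
    m0 and m1 are the potential mediator values under A=0 and A=1."""
    interaction = y[(1, 1)] - y[(1, 0)] - y[(0, 1)] + y[(0, 0)]
    cde = y[(1, 0)] - y[(0, 0)]
    int_ref = interaction * m0
    int_med = interaction * (m1 - m0)
    pie = (y[(0, 1)] - y[(0, 0)]) * (m1 - m0)
    return cde, int_ref, int_med, pie


def total_effect(y, m0, m1):
    """Y1 - Y0, where the mediator takes its natural value under each exposure."""
    return y[(1, m1)] - y[(0, m0)]


def identity_holds_everywhere():
    """Check TE = CDE + INTref + INTmed + PIE for all 64 binary configurations."""
    for y00, y01, y10, y11, m0, m1 in product((0, 1), repeat=6):
        y = {(0, 0): y00, (0, 1): y01, (1, 0): y10, (1, 1): y11}
        if sum(four_way(y, m0, m1)) != total_effect(y, m0, m1):
            return False
    return True


print(identity_holds_everywhere())
```

The check relies only on integer arithmetic, so it confirms the identity exactly rather than up to rounding.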

If the assumptions presented in Section 2.5 are fulfilled and $p_{am} = E[Y \mid A = a, M = m]$, then the expected values of the components are,

$$\begin{aligned}
E[CDE] &= (p_{10} - p_{00}) \\
E[INT_{ref}] &= (p_{11} - p_{10} - p_{01} + p_{00})\Pr(M = 1 \mid A = 0) \\
E[INT_{med}] &= (p_{11} - p_{10} - p_{01} + p_{00})(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0)) \\
E[PIE] &= (p_{01} - p_{00})(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0))
\end{aligned} \tag{2}$$


Figure 2: Graphs illustrating the four components.

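Let $p_a = E[Y \mid A = a]$, which then yields the four-way decomposition expressed as,

$$\begin{aligned}
E[TE] = p_{a=1} - p_{a=0} ={} & (p_{10} - p_{00}) + (p_{11} - p_{10} - p_{01} + p_{00})\Pr(M = 1 \mid A = 0) \\
& + (p_{11} - p_{10} - p_{01} + p_{00})(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0)) \\
& + (p_{01} - p_{00})(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0))
\end{aligned} \tag{3}$$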
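A short numerical sketch of the population-level decomposition follows. All probabilities are hypothetical, covariates are ignored for simplicity, and the controlled direct effect is taken as $p_{10} - p_{00}$, which follows from its definition $Y_{10} - Y_{00}$. The marginal risks $p_{a}$ are obtained by averaging $p_{am}$ over the mediator distribution given $A = a$.

```python
def interaction_contrast(p):
    """Additive interaction p11 - p10 - p01 + p00; p is keyed by (a, m)."""
    return p[(1, 1)] - p[(1, 0)] - p[(0, 1)] + p[(0, 0)]


def expected_components(p, pm1_a0, pm1_a1):
    """Expected four-way components; pm1_a0 and pm1_a1 are Pr(M=1|A=0) and Pr(M=1|A=1)."""
    shift = pm1_a1 - pm1_a0  # effect of the exposure on the mediator
    return {
        "CDE": p[(1, 0)] - p[(0, 0)],
        "INTref": interaction_contrast(p) * pm1_a0,
        "INTmed": interaction_contrast(p) * shift,
        "PIE": (p[(0, 1)] - p[(0, 0)]) * shift,
    }


def marginal_risk(p, a, pm1):
    """p_a = sum over m of p_am * Pr(M=m | A=a)."""
    return p[(a, 1)] * pm1 + p[(a, 0)] * (1 - pm1)


# Hypothetical inputs (not estimates from the study).
p = {(1, 1): 0.40, (1, 0): 0.15, (0, 1): 0.20, (0, 0): 0.10}
components = expected_components(p, pm1_a0=0.3, pm1_a1=0.6)
total_effect = marginal_risk(p, 1, 0.6) - marginal_risk(p, 0, 0.3)

for name, value in components.items():
    print(name, round(value, 4))
print("TE", round(total_effect, 4))
```

With these inputs the four components sum to the difference in marginal risks, as the decomposition requires.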

Using the estimated expected values it is possible to calculate how much of the total effect on the outcome is due to each component. Since the expected value of the total effect is $E[TE] = p_{a=1} - p_{a=0}$, the proportions attributable, PA, of the total effect for each component can be formulated as,

$$PA_{CDE} = \frac{E[CDE]}{E[TE]}, \quad PA_{INT_{ref}} = \frac{E[INT_{ref}]}{E[TE]}$$

$$PA_{INT_{med}} = \frac{E[INT_{med}]}{E[TE]}, \quad PA_{PIE} = \frac{E[PIE]}{E[TE]}$$

Adding the expected value due to mediated interaction to the expected value due to mediation only (the pure indirect effect), and dividing by the expected total effect, gives the overall proportion due to mediation,

$$OPA_{med} = \frac{E[INT_{med}] + E[PIE]}{E[TE]}$$

In the same way the overall proportion due to interaction is calculated as the expected value of the reference interaction plus the expected value of the mediated interaction, divided by the expected value of the total effect,

$$OPA_{int} = \frac{E[INT_{ref}] + E[INT_{med}]}{E[TE]}$$
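The proportions attributable are simple ratios, as the following sketch shows. The component values are hypothetical numbers chosen for illustration, and the dictionary keys are illustrative labels.

```python
def proportions_attributable(components):
    """Share of the total effect due to each component, plus the two overall shares."""
    total = sum(components.values())
    shares = {name: value / total for name, value in components.items()}
    shares["overall_mediation"] = (components["INTmed"] + components["PIE"]) / total
    shares["overall_interaction"] = (components["INTref"] + components["INTmed"]) / total
    return shares


# Hypothetical expected values of the four components.
components = {"CDE": 0.05, "INTref": 0.045, "INTmed": 0.045, "PIE": 0.03}
shares = proportions_attributable(components)

for name, value in shares.items():
    print(name, round(value, 3))
```

Note that the overall proportions due to mediation and due to interaction both include the mediated interaction, so they do not sum to the complement of the controlled direct effect.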


In the case of a binary exposure and a binary mediator the decomposition is obtained on the ratio scale by dividing all the terms in Equation 3 by $p_{a=0}$,

$$\begin{aligned}
RR_{a=1} - 1 ={} & \kappa(RR_{10} - 1) + \kappa(RR_{11} - RR_{10} - RR_{01} + 1)\Pr(M = 1 \mid A = 0) \\
& + \kappa(RR_{11} - RR_{10} - RR_{01} + 1)(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0)) \\
& + \kappa(RR_{01} - 1)(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0))
\end{aligned} \tag{4}$$

where $RR_{a=1}$ is the relative risk for the exposure variable, when comparing $A = 1$ to the reference $A = 0$, and the scaling factor is $\kappa = p_{00}/p_{a=0}$. When both exposure and mediator are binary variables the relative risks are $RR_{00}$, $RR_{01}$, $RR_{10}$ and $RR_{11}$. The term $(RR_{11} - RR_{10} - RR_{01} + 1)$ is Rothman's excess relative risk due to interaction (RERI). The decomposition in Equation 4 thus involves decomposing the excess relative risk for $A$, i.e. $RR_{a=1} - 1$, into the four components.

Dividing each component in Equation 4 by the sum of all four components on the right-hand side yields the proportion attributable (PA) for each component. Note that the scaling factor, $\kappa = p_{00}/p_{a=0}$, then cancels in the division. This allows for estimation of the proportion of the total effect due to neither mediation nor interaction ($PA_{CDE}$), due to reference interaction only ($PA_{INT_{ref}}$), due to the mediated interaction ($PA_{INT_{med}}$) and due to mediation only ($PA_{PIE}$).

$$\begin{aligned}
PA_{CDE} &= \frac{RR_{10} - 1}{D} \\
PA_{INT_{ref}} &= \frac{RERI \cdot \Pr(M = 1 \mid A = 0)}{D} \\
PA_{INT_{med}} &= \frac{RERI \cdot (\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0))}{D} \\
PA_{PIE} &= \frac{(RR_{01} - 1)(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0))}{D}
\end{aligned} \tag{5}$$

where the common denominator is

$$D = (RR_{10} - 1) + RERI \cdot \Pr(M = 1 \mid A = 1) + (RR_{01} - 1)(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0))$$
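The ratio-scale proportions can be computed directly from the three risk ratios and the two mediator probabilities. The sketch below uses hypothetical inputs and an illustrative function name; the denominator is the sum of the four numerators, which equals $(RR_{10} - 1) + RERI \cdot \Pr(M = 1 \mid A = 1) + (RR_{01} - 1)(\Pr(M = 1 \mid A = 1) - \Pr(M = 1 \mid A = 0))$.

```python
def proportions_attributable_ratio_scale(rr10, rr01, rr11, pm1_a0, pm1_a1):
    """Proportions attributable from risk ratios; the scaling factor kappa
    cancels and is therefore not needed."""
    reri = rr11 - rr10 - rr01 + 1
    shift = pm1_a1 - pm1_a0
    terms = {
        "CDE": rr10 - 1,
        "INTref": reri * pm1_a0,
        "INTmed": reri * shift,
        "PIE": (rr01 - 1) * shift,
    }
    denominator = sum(terms.values())
    return {name: value / denominator for name, value in terms.items()}


# Hypothetical risk ratios and mediator probabilities.
pa = proportions_attributable_ratio_scale(
    rr10=1.5, rr01=2.0, rr11=4.0, pm1_a0=0.3, pm1_a1=0.6
)

for name, value in pa.items():
    print(name, round(value, 3))
```

These inputs correspond to risks of 0.10, 0.15, 0.20 and 0.40, so the resulting proportions agree with those obtained on the difference scale from the same risks.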


2.5 Assumptions for Four-Way Decomposition

In order for the four-way decomposition to give useful results, four no-confounding assumptions need to be fulfilled. They are required in order to derive the four components from the data using population averages as in Equation 2. A confounder is a variable that affects both variables in a relationship of interest: the exposure and the outcome, the mediator and the outcome, or the exposure and the mediator. The four assumptions are listed below,

I. There are no unmeasured confounders affecting the exposure and outcome variables, expressed notationally as $Y_{am} \perp\!\!\!\perp A \mid C$

II. There are no unmeasured confounders affecting the mediator and outcome variables, expressed notationally as $Y_{am} \perp\!\!\!\perp M \mid \{A, C\}$

III. There are no unmeasured confounders affecting the exposure and mediator variables, expressed notationally as $M_a \perp\!\!\!\perp A \mid C$

IV. The confounder(s) affecting the mediator and outcome variables are not affected by the exposure variable, expressed notationally as $Y_{am} \perp\!\!\!\perp M_{a^*} \mid C$

where $\perp\!\!\!\perp$ indicates independence between the counterfactual and the other variable or counterfactual.

The independence between the different variables relates to the temporal ordering that is required in a mediation analysis. The cause must always precede the effect which requires that the researcher knows that the exposure precedes the mediator and the mediator precedes the outcome.

Estimating the controlled direct effect, CDE, only requires that the first two assumptions are fulfilled. These two assumptions mean that the covariates included in the model are sufficient when controlling for confounding between the exposure- and outcome variable and mediator- and outcome variable.

Apart from the first two assumptions the remaining three components of the four-way decomposition model also require the third and fourth assumptions to be fulfilled. The third assumption is interpreted in the same way as the first and the second, which means that the included covariates are sufficient in controlling for confounding between the exposure and the mediator. The fourth assumption is to be interpreted as no confounder affecting the mediator and outcome variables is affected by the exposure variable.

If the assumptions regarding unmeasured confounding do not hold, the biases may be large and the results questionable. There is no way to verify whether these assumptions are fulfilled, but a possible remedy is to perform a sensitivity analysis in order to assess the consequences of unmeasured confounders (VanderWeele, 2015).

In Figure 3 the four assumptions are visualized. C1 is an unmeasured exposure-outcome confounder, which must not exist for assumption I to hold. C2 is an unmeasured mediator-outcome confounder, which according to assumption II must not exist. No unmeasured exposure-mediator confounder such as C3 may exist for assumption III to be fulfilled. Finally, the arrow from A to C2 must not exist for assumption IV to be fulfilled, indicating that the mediator-outcome confounder should not be affected by the exposure variable.

The fourth assumption assumes independence between the two counterfactuals $Y_{am}$ and $M_{a^*}$, given a set of covariates $C$. The problem with this assumption is that these counterfactuals cannot be observed together in an empirical setting, because it is not possible to observe both the outcome where the exposure is set to 1 and the mediator where the exposure is set to 0. This is called a "cross-world" independence, indicating that both exposure settings are involved at once, which is not possible to observe. However, when interpreting a causal diagram as a non-parametric structural equation model (NPSEM) this assumption still holds (VanderWeele, 2015). This follows because if the error term for $Y$ is $\varepsilon_Y$ and the error term for $M$ is $\varepsilon_M$, then the counterfactuals $Y_{am}$ and $M_{a^*}$ will respectively be the functions $f(a, m, c, \varepsilon_Y)$ and $h(a^*, c, \varepsilon_M)$. Because $a$, $a^*$, $m$ and $c$ are fixed, each counterfactual is solely a function of its respective error term. NPSEM assumes independence of these two error terms and thus any functions of them will also be independent, which results in independence between $Y_{am}$ and $M_{a^*}$ (Pearl, 2009).

However the use of NPSEM has been criticized and shown to be ineffective when using different graphical causal models (Robins and Richardson, 2010). In a series of papers and seminars Donald B. Rubin (2004, 2005, 2010) has also revealed problematic situations which use direct and indirect effects and the assumptions based on them. The use of NPSEM is thus a choice that has to be considered when presenting the results of the four-way model.

Furthermore, it is possible to make a weaker interpretation of the four-way model using solely assumptions (I)-(III) (VanderWeele, 2015).

Figure 3: Graph illustrating the four assumptions. There should not exist any confounder as C1, C2 or C3. Also, the arrow from A to C2 should not exist.


2.6 Estimating the Four-Way Decomposition Using Statistical Models with Binary Mediator and Binary Outcome

In this study the focus is on the specific and common case of a binary mediator and a binary outcome. The four-way decomposition uses two separate regressions to estimate the risk ratios that are used to approximate the value of each component. In the first regression the mediator is the dependent variable and in the second regression the outcome is the dependent variable. A logistic regression is used for the model with the mediator as the dependent variable, explained in Section 2.6.1. As mortality within 365 days from the discharge date is not a rare outcome in the data (> 10%), a Poisson regression is used for the outcome model in place of a logistic regression, explained in Section 2.6.2 (VanderWeele, 2015). Assume that assumptions (I)-(IV), described in Section 2.5, are fulfilled and that the Poisson and logistic regressions below are correctly specified. The models for estimating the risk ratios in the four-way decomposition are then,

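$$\begin{aligned}
\operatorname{logit} \Pr(M = 1 \mid a, c) &= \beta_0 + \beta_1 a + \beta_2' c \\
\log E[Y \mid a, m, c] &= \theta_0 + \theta_1 a + \theta_2 m + \theta_3 am + \theta_4' c
\end{aligned} \tag{6}$$

where $a$ denotes the exposure variable, $m$ denotes the mediator variable and $c$ denotes a set of covariates. The $\beta_i$ and the $\theta_i$ are the regression coefficients.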

2.6.1 Logistic Regression for the Mediator

Logistic regression is used to estimate the parameters $\beta_i$ in the four-way decomposition where the mediator is the dependent variable, as in the first model of Equation 6.

The logistic regression model is based on the logistic function (Kleinbaum et al. 2010). Since the dependent variable of interest is binary, the logistic model gives the probability of the mediator variable taking the value 1, given an exposure and a set of covariates,

$$\Pr(M = 1 \mid a, c) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 a + \beta_2' c)}}$$

Taking the natural logarithm of the odds yields the logit form,

$$\operatorname{logit} \Pr(M = 1 \mid a, c) = \beta_0 + \beta_1 a + \beta_2' c$$

in which the $\beta_i$ are estimated by maximum likelihood. Using the logit function it is possible to draw comparisons between groups using odds ratios (OR). For a binary independent variable, exponentiating the coefficient yields the odds ratio, $OR = e^{\beta_i}$, for the effect of that variable, given that all other independent variables are held fixed.
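The relation between the logistic model and the odds ratio can be checked numerically. The coefficients and covariate values below are hypothetical, and the function names are illustrative; the sketch evaluates the mediator probability for exposed and unexposed individuals with the same covariates and compares the resulting odds ratio with the exponentiated coefficient.

```python
import math


def logistic(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def mediator_probability(a, c, beta0, beta1, beta2):
    """Pr(M = 1 | a, c) under the logistic model; c and beta2 are equal-length lists."""
    linear = beta0 + beta1 * a + sum(b * x for b, x in zip(beta2, c))
    return logistic(linear)


def odds(p):
    return p / (1.0 - p)


# Hypothetical coefficients and covariate values.
beta0, beta1, beta2 = -0.5, 0.8, [0.2, -0.1]
c = [1.0, 0.5]

p_exposed = mediator_probability(1, c, beta0, beta1, beta2)
p_unexposed = mediator_probability(0, c, beta0, beta1, beta2)

print(odds(p_exposed) / odds(p_unexposed))
print(math.exp(beta1))
```

The two printed values agree up to rounding regardless of the covariate values, since the covariate terms cancel in the ratio of odds.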


2.6.2 Poisson Regression for the Outcome

Following Greene (2012), Poisson regression is used to estimate the outcome-model parameters for the four-way decomposition when the outcome is not rare (> 10%). Poisson regression is used for count data and models the expected number of events conditional on the covariates. It assumes that each individual's outcome follows the Poisson probability function,

$$p(y) = \Pr(Y = y) = \frac{\lambda^y e^{-\lambda}}{y!}, \quad y = 0, 1, 2, \ldots \text{ and } \lambda > 0$$

Given exposure $a$, mediator $m$ and a set of covariates $c$, the Poisson regression assumes that each individual has its own expected value, defined as,

$$\lambda = E[Y \mid a, m, c] = e^{\theta_0 + \theta_1 a + \theta_2 m + \theta_3 am + \theta_4' c}$$

Taking the natural logarithm then yields

$$\ln(\lambda) = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 am + \theta_4' c$$

which is the formula for a log-linear Poisson regression. The $\theta_i$ are estimated by maximum likelihood. Exponentiating a $\theta_i$ yields the corresponding relative risk, $RR = e^{\theta_i}$.
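The log-linear form makes ratios of expected outcomes easy to read off the coefficients. In the sketch below (hypothetical coefficients, illustrative function names), the relative risk for the exposure at a fixed mediator level $m$ is $e^{\theta_1 + \theta_3 m}$, because the interaction term contributes to the exposure effect whenever $m = 1$.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability Pr(Y = y) with mean lam."""
    return lam ** y * math.exp(-lam) / math.factorial(y)


def expected_outcome(a, m, c, theta0, theta1, theta2, theta3, theta4):
    """lambda = E[Y | a, m, c] under the log-linear model; c and theta4 are lists."""
    linear = (theta0 + theta1 * a + theta2 * m + theta3 * a * m
              + sum(t * x for t, x in zip(theta4, c)))
    return math.exp(linear)


# Hypothetical coefficients and covariate values.
theta = dict(theta0=-2.0, theta1=0.3, theta2=0.4, theta3=0.2, theta4=[0.1, -0.2])
c = [1.0, 0.5]

for m in (0, 1):
    rr = expected_outcome(1, m, c, **theta) / expected_outcome(0, m, c, **theta)
    print(m, rr, math.exp(theta["theta1"] + theta["theta3"] * m))
```

The covariate terms cancel in the ratio, so the relative risk for the exposure depends only on $\theta_1$, $\theta_3$ and the mediator level.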

The main consideration before using Poisson regression is the assumption that the individuals in the data are independent of each other. An underlying property of the Poisson distribution is that if the outcomes for two independent individuals are Poisson distributed then the sum of the outcomes also follows a Poisson distribution. The Poisson distribution further implies that the mean and the variance are equal, $E(y_i) = V(y_i) = \mu$. A violation of this assumption could lead to $V(y_i) > E(y_i)$, known as overdispersion, which could result in an underestimation of the standard errors and bias in the parameter estimates (Berk, MacDonald, 2008).

Poisson random variables always have larger variance than binary variables when the mean values are equal (Lumley et al, 2006). Since the study uses binary data, the Poisson variance assumption is therefore violated: the variance implied by the model is larger than the variance of the binary outcome. To obtain valid standard errors despite this, we use robust sandwich variance estimators (RSVE). RSVE is suitable for large samples and estimates the covariance matrix of the parameter estimates of the Poisson model. The method yields consistent covariance matrix estimates, makes no distributional assumptions, and does not require the underlying model to be correctly specified. This results in a general robustness of the estimator without the required assumptions for a Poisson regression (Carroll et al, 1998).
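The variance comparison between a binary and a Poisson variable with the same mean can be verified in a few lines. The probabilities below are arbitrary illustrative values and the function names are hypothetical.

```python
def bernoulli_variance(p):
    """Variance of a binary variable with mean p."""
    return p * (1 - p)


def poisson_variance(mean):
    """Variance of a Poisson variable, which equals its mean."""
    return mean


def variance_ratio(p):
    """Binary variance relative to the Poisson variance at the same mean (equals 1 - p)."""
    return bernoulli_variance(p) / poisson_variance(p)


for p in (0.05, 0.10, 0.34, 0.50):
    print(p, bernoulli_variance(p), poisson_variance(p), variance_ratio(p))
```

The ratio equals $1 - p$, so the gap between the two variances is small for rare outcomes and grows as the outcome becomes more common, which is why the choice of variance estimator matters more when the outcome is not rare.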

2.6.3 Estimation of Risk Ratios for Binary Mediator and Binary Outcome

A risk ratio (RR) is calculated as the probability of an event occurring in the exposed group divided by the probability of the same event occurring in the non-exposed group, and makes it possible to quantify differences between groups of individuals. Suppose that the assumptions in Section 2.5 are fulfilled. The parameters estimated in the logistic and Poisson regression models are then used to obtain expressions for the four components in the four-way decomposition model.

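We have that the scaling factor $\kappa$ is given by,

$$\kappa = \frac{E[Y_{a^*m^*} \mid c]}{E[Y_{a^*} \mid c]} \approx \frac{e^{\theta_2 m^* + \theta_3 a^* m^*}\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)}{1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}}$$

The component due to interaction only, $\kappa RR_c^{INT_{ref}}(m^*)$, is given by,

$$\begin{aligned}
\kappa RR_c^{INT_{ref}}(m^*) ={} & \frac{e^{\theta_1(a - a^*)}\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a}\right)}{1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}} - 1 \\
& - \frac{e^{\theta_1(a - a^*) + \theta_2 m^* + \theta_3 a m^*}\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)}{1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}} \\
& + \frac{e^{\theta_2 m^* + \theta_3 a^* m^*}\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)}{1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}}
\end{aligned}$$

The component due to the mediated interaction, $\kappa RR_c^{INT_{med}}$, is given by,

$$\begin{aligned}
\kappa RR_c^{INT_{med}} ={} & \frac{e^{\theta_1(a - a^*)}\left(1 + e^{\beta_0 + \beta_1 a + \beta_2' c + \theta_2 + \theta_3 a}\right)\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)}{\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}\right)\left(1 + e^{\beta_0 + \beta_1 a + \beta_2' c}\right)} \\
& - \frac{\left(1 + e^{\beta_0 + \beta_1 a + \beta_2' c + \theta_2 + \theta_3 a^*}\right)\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)}{\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}\right)\left(1 + e^{\beta_0 + \beta_1 a + \beta_2' c}\right)} \\
& - \frac{e^{\theta_1(a - a^*)}\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a}\right)}{1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}} + 1
\end{aligned}$$

The component due to the controlled direct effect is,

$$\begin{aligned}
\kappa\left[RR_c^{CDE}(m^*) - 1\right] &= \kappa\left[e^{(\theta_1 + \theta_3 m^*)(a - a^*)} - 1\right] \\
&= \frac{e^{\theta_1(a - a^*) + \theta_2 m^* + \theta_3 a m^*}\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)}{1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}} - \frac{e^{\theta_2 m^* + \theta_3 a^* m^*}\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)}{1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}}
\end{aligned}$$

The component due to the pure indirect effect is,

$$\kappa \int_m \frac{E[Y_{a^*m} \mid c]}{E[Y_{a^*m^*} \mid c]}\left(dP(m \mid a, c) - dP(m \mid a^*, c)\right) = \frac{\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)\left(1 + e^{\beta_0 + \beta_1 a + \beta_2' c + \theta_2 + \theta_3 a^*}\right)}{\left(1 + e^{\beta_0 + \beta_1 a + \beta_2' c}\right)\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}\right)} - 1$$

The average total effect, conditioned on $C = c$, is thus given by,

$$RR_c^{TE} \approx \frac{e^{\theta_1 a}\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c}\right)\left(1 + e^{\beta_0 + \beta_1 a + \beta_2' c + \theta_2 + \theta_3 a}\right)}{e^{\theta_1 a^*}\left(1 + e^{\beta_0 + \beta_1 a + \beta_2' c}\right)\left(1 + e^{\beta_0 + \beta_1 a^* + \beta_2' c + \theta_2 + \theta_3 a^*}\right)}$$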
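The expressions above can be evaluated directly once the regression coefficients are available. The following sketch does so for hypothetical coefficient values and checks that the four components add up to the excess relative risk $RR_c^{TE} - 1$. It assumes that the pure indirect effect is expressed as an excess relative risk (a ratio minus 1), which is what makes the components sum to $RR_c^{TE} - 1$; the function name, argument names and numerical values are illustrative and not taken from the study.

```python
import math


def four_way_ratio_scale(a, a_star, m_star, c, beta0, beta1, beta2,
                         theta1, theta2, theta3):
    """Four components of the excess relative risk for a binary mediator.
    c and beta2 are equal-length lists of covariate values and coefficients."""
    e = math.exp
    bc = sum(b * x for b, x in zip(beta2, c))
    lin_a = beta0 + beta1 * a + bc            # mediator linear predictor under a
    lin_s = beta0 + beta1 * a_star + bc       # mediator linear predictor under a*
    denom = 1 + e(lin_s + theta2 + theta3 * a_star)
    delta = theta1 * (a - a_star)

    kappa = e(theta2 * m_star + theta3 * a_star * m_star) * (1 + e(lin_s)) / denom
    y_a_mstar = e(delta + theta2 * m_star + theta3 * a * m_star) * (1 + e(lin_s)) / denom
    # Ratios of E[Y_{a M_{a'}} | c] to E[Y_{a*} | c] for each pair of settings.
    r_a_ma = (e(delta) * (1 + e(lin_a + theta2 + theta3 * a)) * (1 + e(lin_s))
              / (denom * (1 + e(lin_a))))
    r_a_mastar = e(delta) * (1 + e(lin_s + theta2 + theta3 * a)) / denom
    r_astar_ma = ((1 + e(lin_a + theta2 + theta3 * a_star)) * (1 + e(lin_s))
                  / (denom * (1 + e(lin_a))))

    return {
        "CDE": y_a_mstar - kappa,
        "INTref": r_a_mastar - 1 - y_a_mstar + kappa,
        "INTmed": r_a_ma - r_astar_ma - r_a_mastar + 1,
        "PIE": r_astar_ma - 1,
        "RR_TE": r_a_ma,
    }


# Hypothetical coefficients: exposure 1 vs 0, mediator reference level 0.
result = four_way_ratio_scale(a=1, a_star=0, m_star=0, c=[1.0, 0.5],
                              beta0=-0.5, beta1=0.8, beta2=[0.2, -0.1],
                              theta1=0.3, theta2=0.4, theta3=0.2)
components = {k: v for k, v in result.items() if k != "RR_TE"}
print(components)
print(sum(components.values()), result["RR_TE"] - 1)
```

Dividing each component by their sum gives the proportions attributable, in the same way as for the expressions based on risk ratios.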


3 Four-Way Decomposition Analysis of Socioeconomic Risk Factors on Survival Post First Myocardial Infarction

3.1 Method

3.1.1 Design and Data

The data used in this study are from national registries from multiple institutions in Sweden.

• ”The SWEDEHEART national quality Registry for Information and Knowledge about Swedish Heart Intensive Care Admissions” (RIKS-HIA), SWEDEHEART

• ”National Board of Health and Welfare’s Cause of Death Register”, The National Board of Health and Welfare

• ”Registret över totalbefolkningen” (RTB), Statistics Sweden

• ”Inkomst- och taxeringsregistret” (IoT), Statistics Sweden

• ”Yrkesregistret” (SSYK), Statistics Sweden

• ”Utbildningsregistret” (UREG), Statistics Sweden

The study is a population-based cohort study, meaning that it consists of a population of individuals with a common defining characteristic. The aim is to study the mortality among first MI survivors within 365 days after discharge from the hospital, between the years 2006 and 2014.

3.1.2 Study Population

The study population consists of individuals who experienced their first MI between the years 2006 and 2014. In total, 165000 individuals were initially included in the study. After filtering out patients who had already experienced a myocardial infarction and removing observations with missing values, 120000 individuals remained, of which a random subset of 58641 individuals was selected and used for the results, due to confidentiality reasons. Excluding patients with missing values on some of the included variables did not yield different results, and they were hence excluded from the study.

3.1.3 Outcome

The outcome in this study is defined as death within the first 365 days after discharge from the hospital following the first MI. Counting the number of days from discharge excludes individuals who died while still in the hospital, that is, before they were discharged.
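The outcome definition can be sketched as a small function. This is only an illustration under stated assumptions: the function and argument names are hypothetical, day 365 is assumed to count as "within 365 days", and a death recorded before discharge is returned as `None` to signal that the individual is not part of the cohort.

```python
from datetime import date
from typing import Optional


def death_within_365_days(discharge: date, death: Optional[date]) -> Optional[int]:
    """Binary outcome: 1 if death occurs within 365 days after discharge.

    Returns None for deaths before discharge (in-hospital deaths are
    excluded from the cohort, so no outcome is defined for them).
    """
    if death is None:
        return 0  # no recorded death
    days = (death - discharge).days
    if days < 0:
        return None  # died before discharge: excluded from the cohort
    # Assumption of this sketch: day 365 itself is included
    return 1 if days <= 365 else 0


# Hypothetical dates, for illustration only
print(death_within_365_days(date(2010, 1, 1), date(2010, 6, 30)))
```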


3.1.4 Exposure

The exposure of interest in this study is lower education, and the analysis concerns its effect on the outcome mortality. The education variable is coded as the highest attained education recorded the year before admission to the hospital. Primary school or secondary school is denoted lower education, and a higher degree is denoted higher education.

The education variable also contained unassigned values for some individuals. These individuals were removed from the dataset, since their education status could not be determined with certainty and their exclusion did not affect the results.

3.1.5 Mediator

The first mediator of interest in this study is the individual’s income. The variable is measured in SEK and is divided into quintiles, ordered from lowest to highest income. Four sets of results are presented, in each of which the mediator is a binary variable comparing the first, second, third or fourth quintile with the fifth quintile. The income variable is measured as the individual's taxed annual income the year before the individual was admitted to the hospital. From 2006 to 2014 the median value of this variable ranges from 127 000 SEK to 150 000 SEK.

The other mediator of interest in this study is family type. This is a binary variable, taking the value 1 if the individual is living alone and 0 otherwise. Around 40% of the individuals in the data were reported as living alone and 60% as not living alone.
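The pairwise coding of the income mediator described above can be sketched as follows. This is an illustration under assumptions that the text does not confirm: the cut points are hypothetical, the lower quintile is coded 1 and the fifth quintile 0, and individuals in the other quintiles are left out of that particular comparison.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def income_quintile(income: float, cut_points: Sequence[float]) -> int:
    """Map an income to a quintile 1..5 given four ascending cut points."""
    if len(cut_points) != 4:
        raise ValueError("four cut points are needed to form five groups")
    # Incomes equal to a cut point fall in the lower group
    return bisect_left(list(cut_points), income) + 1


def mediator_vs_top(quintile: int, compare: int) -> Optional[int]:
    """Binary mediator for the 'Q<compare> vs Q5' analysis.

    1 if the individual is in quintile `compare`, 0 if in quintile 5,
    None if in any other quintile (not used in this comparison).
    """
    if compare not in (1, 2, 3, 4):
        raise ValueError("compare must be one of the first four quintiles")
    if quintile == compare:
        return 1
    if quintile == 5:
        return 0
    return None


# Hypothetical cut points in SEK, for illustration only
cuts = [90_000, 130_000, 180_000, 260_000]
print(mediator_vs_top(income_quintile(75_000, cuts), compare=1))
```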

3.1.6 Covariates

The results are controlled for four covariates. None of them can be modified by the exposure: every individual was assigned their value before exposure, so the exposure has no effect on any of the covariates.

• Gender. A binary variable taking the value 1 if male and 0 if female. About 64% of the individuals examined are reported as male and 36% as female.

• Age. A categorical variable measuring the age of the individual at the time of their first MI. The variable is divided into ten categories to account for non-linearity; the categories are shown in Table 1.

• Year of Admission. A categorical variable measuring the year the individual was admitted to the hospital. The year ranges from 2006 to 2014 and is controlled for since there is a decreasing time trend in mortality after first myocardial infarction.

• Region of Birth. A categorical variable included to control for any potential differences associated with being born in different regions of the world. The regions are shown in Table 1. Due to few observations, individuals born in Oceania are combined with those born in North America. About 86% of the individuals in the data were born in Sweden.
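The ten age categories used for the age covariate can be reproduced with a simple lookup. This is a minimal sketch: the category boundaries are those listed in Table 1, while the function name and the assumption that age is recorded in whole years are illustrative.

```python
from bisect import bisect_left

# Upper bounds and labels of the ten age categories in Table 1
AGE_UPPER_BOUNDS = [40, 50, 55, 60, 65, 70, 75, 80, 90, 105]
AGE_LABELS = ["0-40", "41-50", "51-55", "56-60", "61-65",
              "66-70", "71-75", "76-80", "81-90", "91-105"]


def age_category(age: int) -> str:
    """Return the Table 1 age category for an age in whole years."""
    if age < 0 or age > AGE_UPPER_BOUNDS[-1]:
        raise ValueError("age outside the range covered by the categories")
    # bisect_left finds the first upper bound that is >= age
    return AGE_LABELS[bisect_left(AGE_UPPER_BOUNDS, age)]


print(age_category(67))
```

Treating age as a set of categories rather than a single linear term lets each age band have its own coefficient in the regression.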


Table 1: Baseline characteristics stratified by education

                    Higher Education              Lower Education
Variable            Frequency   Proportion (%)    Frequency   Proportion (%)

Family Type
  Living alone           3309             36.7        20824             47.5
  Not living alone       6949             63.3        27559             52.5

Income
  Q1                      853              8.3        10211             21.1
  Q2                      800              7.8        10893             22.5
  Q3                     1732             16.9        10143             21.0
  Q4                     2572             21.1         9337             19.3
  Q5                     4253             41.5         7666             15.8

Gender
  Male                   7289             71.1        30037             62.1
  Female                 2969             28.9        18346             37.9

Age
  0-40                    154              1.5          433              0.9
  41-50                   881              8.6         2767              5.7
  51-55                   892              8.7         2930              6.1
  56-60                  1365             13.3         4440              9.2
  61-65                  1673             16.3         5811             12.0
  66-70                  1596             15.6         6699             13.8
  71-75                  1296             12.6         6705             13.9
  76-80                  1113             10.9         6816             14.1
  81-90                  1177             11.5        10594             21.9
  91-105                  111              1.1         1188              2.5

Year of Admission
  2006                    822              8.0         4907             10.1
  2007                    984              9.6         5154             10.7
  2008                    935              9.1         4995             10.3
  2009                    921              9.0         4776              9.9
  2010                    992              9.7         4932             10.2
  2011                   1028             10.0         4898             10.1
  2012                   1114             10.9         5029             10.4
  2013                   1090             10.6         4637              9.6
  2014                   1152             11.2         4521              9.3

Region of Birth
  Africa                   59              0.6          114              0.2
  Asia                    446              4.3          873              1.8
  EU28                    452              4.4         1311              2.7
  European                234              2.3         1058              2.2
  Nordic                  386              3.8         2697              5.6
  North America            51              0.5           61              0.1
  South America            44              0.4          145              0.3
  Sweden                 8586             83.7        42124             87.1

Total                   10258            100.0        48383            100.0


3.2 Results

Two Poisson regression models are estimated, one in which income is the mediator and one in which family type is the mediator:

\[
\log\left[E(Y = 1 \mid a, m, c)\right] = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m + \theta_4' c \tag{7}
\]

Exponentiating the estimated coefficients in Equation 7 yields the risk ratios, controlled for the covariates, and the corresponding confidence intervals presented in Table 2.

Similarly, the unadjusted risk ratios are obtained by exponentiating the estimated coefficients in Equation 8, and are also shown in Table 2.

\[
\log\left[E(Y = 1 \mid a, m)\right] = \theta_0 + \theta_1 a + \theta_2 m + \theta_3 a m \tag{8}
\]
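Exponentiating a coefficient from a log-link model can be illustrated with a few lines of code. The sketch below shows one common way to turn a coefficient and its standard error into a risk ratio with a Wald-type interval, by building the interval on the log scale and exponentiating its endpoints. It is an assumption about a typical procedure rather than a description of how the tables in this thesis were produced, and the standard error in the usage line is hypothetical.

```python
import math


def risk_ratio_with_ci(coef: float, se: float, z: float = 1.96):
    """Risk ratio and Wald interval from a log-scale coefficient.

    coef: estimated coefficient on the log scale
    se:   its standard error (also on the log scale)
    z:    normal quantile, 1.96 for an approximate 95% interval
    """
    rr = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return rr, lower, upper


# A coefficient of log(1.18) with a hypothetical standard error of 0.05
rr, lower, upper = risk_ratio_with_ci(math.log(1.18), 0.05)
print(f"RR = {rr:.2f} ({lower:.2f}:{upper:.2f})")
```

An interval built this way is always positive and is symmetric around the risk ratio on the multiplicative scale rather than the additive scale.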

Turning to the adjusted results in Table 2, the exposure education has approximately the same risk ratio (about 1.18) in each of the four quintile comparisons. For the mediator income, the results suggest that the excess risk of mortality relative to the highest income quintile shrinks as income rises. Belonging to the lowest income quintile is associated with an estimated 83% increased risk of mortality, whereas belonging to the fourth income quintile is associated with an estimated 16% increased risk of mortality, compared with individuals in the highest income quintile. For the interaction term, the risk ratios for the first three income quintiles are approximately 1.00, indicating no additional risk. For the fourth income quintile, on the other hand, the interaction term corresponds to an estimated 14% increased risk of mortality, suggesting that having lower education and belonging to the fourth quintile together increase the risk of mortality.

For the model in which family type is the mediator, having a lower education is associated with an estimated 40% increased risk of mortality, compared with individuals with higher education. Individuals who live alone have an estimated 46% increased risk of mortality, compared with individuals who do not live alone. The interaction term between the exposure and the mediator has little effect on the risk of mortality, with an estimated risk ratio of 0.98.

Comparing the adjusted results to the unadjusted results suggests that not controlling for the covariates in general leads to higher risk ratio estimates and wider confidence intervals.

The risk ratios from regressing the outcome on the exposure alone are presented in Table 3.

Exponentiating the estimated coefficients in Equation 9 yields the risk ratios, controlled for the covariates gender, age, year of admission and region of birth.

\[
\log\left[E(Y = 1 \mid a, c)\right] = \theta_0 + \theta_1 a + \theta_2' c \tag{9}
\]


In the same way, the unadjusted risk ratios are obtained by exponentiating the estimated coefficients in Equation 10.

\[
\log\left[E(Y = 1 \mid a)\right] = \theta_0 + \theta_1 a \tag{10}
\]

Table 3 shows that when no mediator is included and the covariates are controlled for, individuals with lower education have an estimated 41% higher risk of mortality than those with higher education. When the covariates are not controlled for, the risk ratio for individuals with lower education is estimated at 2.02.

Table 2: Risk Ratios (RR) with 95% CI for the exposure A and mediators M regressed on the outcome, with covariates controlled for in the top section of the table (Adjusted) and not controlled for in the lower section of the table (Unadjusted).

Variable    Q1 vs Q5             Q2 vs Q5             Q3 vs Q5             Q4 vs Q5             Family Type

Adjusted
  A         1.18 (-1.02:3.36)    1.17 (-1.04:3.38)    1.18 (-1.03:3.39)    1.18 (-1.03:3.37)    1.40 (-0.71:3.51)
  M         1.83 (-0.58:4.24)    1.80 (-0.51:4.11)    1.43 (-0.82:3.68)    1.16 (-1.09:3.41)    1.46 (-0.70:3.62)
  A ∗ M     1.03 (-1.41:3.47)    1.00 (-1.34:3.34)    0.99 (-1.29:3.27)    1.14 (-1.16:3.44)    0.98 (-1.19:3.15)

Unadjusted
  A         1.17 (-1.04:3.38)    1.17 (-1.04:3.38)    1.17 (-1.04:3.38)    1.17 (-1.04:3.38)    1.85 (-0.26:3.97)
  M         1.56 (-0.85:3.97)    3.04 (0.73:5.36)     2.83 (0.58:5.08)     1.72 (-0.54:3.98)    1.83 (-0.34:4.00)
  A ∗ M     2.73 (0.29:5.18)     1.42 (-0.93:3.77)    1.00 (-0.29:3.30)    0.95 (-1.36:3.26)    1.04 (-1.14:3.22)

Table 3: Risk Ratios (RR) with 95% CI for the exposure A regressed on the outcome, with covariates controlled for in the top section of the table (Adjusted) and not controlled for in the lower section of the table (Unadjusted).

Variable    Education

Adjusted
  A         1.41 (-0.65:3.47)

Unadjusted
  A         2.02 (-0.04:4.09)


3.2.1 Model with Income as Mediator

In this four-way decomposition analysis, the controlled direct effect component, CDE, is interpreted as the excess relative risk (ERR) of mortality for an individual with lower education whose income is in the fifth quintile, compared with individuals with higher education whose income is also in the fifth quintile. That is, it is the effect of education alone on mortality.

The reference interaction component, INT_ref, is interpreted as the excess relative risk of mortality due to the interaction between having lower education and having an income level in the first, second, third or fourth quintile.

The mediated interaction component, INT_med, is interpreted as the excess relative risk of mortality due both to the interaction between having lower education and an income level in the first, second, third or fourth quintile, and to the effect that lower education has on the individual’s income level and, through it, on mortality.

The pure indirect effect component, PIE, is interpreted as the excess relative risk of mortality that results from the effect of lower education on the individual’s income level and the effect of that income level on mortality.

According to the results in Table 4, the first income quintile has a total ERR of 0.58 (0.29:0.87), meaning that individuals with lower education whose income is in the first quintile had on average a 58% (95% CI: 29:87) increased risk of mortality. This total effect is decomposed into the four components. The two largest components are the pure indirect effect, which accounts for on average 53% (3.2:103) of the total ERR, and the controlled direct effect of education, which accounts for on average 28% (-2.8:59.6) of the total ERR. The overall proportion attributable to mediation was on average 67% (40.9:92.7), indicating that mediation accounts for the largest share of the total ERR. The overall proportion attributable to interaction was relatively low and not statistically significant at any conventional significance level.
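The proportions attributable follow directly from the four components. The sketch below assumes that each proportion is the component divided by the total ERR, that the overall proportion due to mediation combines INT_med and PIE, and that the overall proportion due to interaction combines INT_ref and INT_med. The inputs are the rounded first-quintile ERR values from Table 4, so the output differs slightly from the percentages reported there, which were presumably computed from unrounded estimates. The function name is hypothetical.

```python
def proportions_attributable(cde, int_ref, int_med, pie):
    """Proportion of the total ERR attributable to each component, in %."""
    total = cde + int_ref + int_med + pie
    pa = {
        "CDE": 100.0 * cde / total,
        "INTref": 100.0 * int_ref / total,
        "INTmed": 100.0 * int_med / total,
        "PIE": 100.0 * pie / total,
    }
    overall = {
        # Mediation: everything that operates through the mediator
        "OPAmed": pa["INTmed"] + pa["PIE"],
        # Interaction: everything that involves the interaction
        "OPAint": pa["INTref"] + pa["INTmed"],
    }
    return total, pa, overall


# Rounded ERR values for the first income quintile in Table 4
total, pa, overall = proportions_attributable(0.16, 0.03, 0.08, 0.31)
print(round(total, 2), {k: round(v, 1) for k, v in pa.items()}, overall)
```

Because INT_med enters both overall proportions, the two overall proportions need not sum to 100%.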

For the second income quintile, the results are similar to those for the first income quintile. The average total ERR is 0.53 (0.26:0.79), the most important component is the pure indirect effect with an average of 57% (17.4:96.6), and the overall proportion due to mediation is 67% (39.7:94.3). The overall proportion attributable to interaction was relatively low and not statistically significant at any conventional significance level.

For individuals belonging to the third income quintile, the average total ERR was 0.31 (0.11:0.51), meaning an average 31% increased risk of mortality for individuals with lower education in the third income quintile, compared with those in the highest income quintile. The components contributing most to the total ERR were CDE, 53% (2.4:102.9), and PIE, 37% (6:68), meaning that the direct effect of education and the mediated effect of income level contributed most to the total ERR. The overall proportion attributable to mediation was 43% (13.2:71.9). The overall proportion attributable to interaction was relatively low and not statistically significant at any conventional significance level.

Individuals in the fourth quintile had a similar average total ERR, 32% (11:52), to individuals in the third quintile. However, the components contributing most to the total ERR were CDE, an average of 54% (0.0:108.3), and INT_ref, an average of 24% (-17.9:64.9). The overall proportion due to interaction was larger than in the first, second and third income quintiles, but not statistically significant at any conventional significance level. The overall proportion attributable to mediation was 22% (5.2:39.4), a lower proportion due to mediation than in the first, second and third income quintiles.

Table 4: Excess relative risks (ERR), proportion attributable (PA) and overall proportion attributable (OPA) for the model with income as mediator, comparing the first four income quintiles to the fifth income quintile. OPA_int spans the components INT_ref and INT_med; OPA_med spans the components INT_med and PIE. (*p < .1, **p < .05, ***p < .01).

Note: n(Q1 vs Q5) = 19932, n(Q2 vs Q5) = 20513, n(Q3 vs Q5) = 20756, n(Q4 vs Q5) = 21013.

Comparison  Component  ERR (95% CI)            PA (%) (95% CI)

Q1 vs Q5    CDE        0.16 (-0.07:0.40)       28.4 (-2.8:59.6)
            INT_ref    0.03 (-0.07:0.12)       4.7 (-10.9:20.4)
            INT_med    0.08 (-0.19:0.35)       13.6 (-31.2:58.4)
            PIE        0.31** (0.06:0.56)      53.3** (3.2:103)
            Total      0.58*** (0.29:0.87)     100.0

Q2 vs Q5    CDE        0.15 (-0.08:0.39)       29.3 (-4.4:63.1)
            INT_ref    0.02 (-0.05:0.09)       3.6 (-10.0:17.2)
            INT_med    0.05 (-0.14:0.25)       10.0 (-27.4:47.5)
            PIE        0.30*** (0.11:0.49)     57.0*** (17.4:96.6)
            Total      0.53*** (0.26:0.79)     100.0

Q3 vs Q5    CDE        0.16 (-0.07:0.40)       52.6** (2.4:102.9)
            INT_ref    0.01 (-0.07:0.10)       4.8 (-23.7:33.3)
            INT_med    0.02 (-0.08:0.12)       5.6 (-27.3:38.4)
            PIE        0.11** (0.02:0.21)      37.0** (6.0:68.0)
            Total      0.31*** (0.11:0.51)     100.0

Q4 vs Q5    CDE        0.17 (-0.08:0.42)       54.2** (0.0:108.3)
            INT_ref    0.07 (-0.05:0.19)       23.5 (-17.9:64.9)
            INT_med    0.04 (-0.03:0.11)       12.9 (-1:35.7)
            PIE        0.03 (-0.03:0.08)       9.4 (-7.9:26.7)
            Total      0.32*** (0.11:0.52)     100.0

Comparison  OPA_int (%) (95% CI)    OPA_med (%) (95% CI)

Q1 vs Q5    18.3 (-42.2:78.8)       66.8*** (40.9:92.7)
Q2 vs Q5    13.6 (-37.4:64.7)       67.0*** (39.7:94.3)
Q3 vs Q5    10.4 (-51:71.8)         42.5*** (13.2:71.9)
Q4 vs Q5    36.5 (-27.7:100.6)      22.3** (5.2:39.4)
