
Outcome regression methods in causal inference: The difference LASSO and selection of effect modifiers

Moa Edin


Abstract

In causal inference, a central aim of covariate selection is to provide a subset of covariates that is sufficient for confounding adjustment. One approach is to construct a subset of covariates associated with the outcome.

This is sometimes referred to as the outcome approach, which is the subject of this thesis. Apart from confounding, there may also be effect modification. This occurs when a treatment has different effects on the outcome in different subgroups, defined by effect modifiers.

We describe how the outcome approach, implemented through regression models, can be used for estimating the ATE, and how sufficient subsets of covariates may be constructed for these models. We also describe a novel method, called the difference LASSO, which results in identification of effect modifiers rather than determination of sufficient subsets.

The method is defined by an algorithm in which, in the first step, an incorrectly specified model is fitted. We investigate the bias arising from this misspecification, analytically and numerically, for OLS.

The difference LASSO is also compared with a regression estimator in a simulation study, where the identification of effect modifiers is evaluated. This is done by analyzing the proportion of times a selection procedure results in a set of covariates containing only the effect modifiers, or a set in which the effect modifiers are included as a subset.

The results show that the difference LASSO works relatively well for identification of effect modifiers. Across four designs, a set containing only the true effect modifiers was selected in at least 83.2% of the cases; the corresponding result for the regression estimator was 27.9%. However, the difference LASSO builds on biased estimation, and the method is therefore not suitable for interpretation of treatment effects.

Sammanfattning

Outcome regression methods in causal inference: the difference LASSO and selection of effect modifiers

A central aim of selecting background variables (covariates) in causal inference is to choose a subset that is sufficient for adjusting for confounding. One way of constructing such a subset is to choose the covariates that have an observed association with the response variable. This is sometimes called the outcome approach and is the approach used in this thesis. Apart from confounding, there may also be effect modification, which arises when a treatment has different effects in different groups defined by effect modifiers.

We describe how regression models can be used to estimate the ATE, and how sufficient subsets can be constructed for these models. We also describe a new method called the difference LASSO. The method identifies covariates that are effect modifiers, rather than constructing a subset of covariates that is sufficient for confounding adjustment. The method consists of an algorithm whose first step fits an incorrectly specified model. We investigate the bias that arises from this, both analytically and numerically, for OLS.

The difference LASSO is also compared with a regression estimator. The comparison is made in a simulation study where the identification of effect modifiers is evaluated. This is done by analyzing how many times the covariate selection results in a subset containing only true effect modifiers, or a subset in which they are included.

The results show that the difference LASSO works relatively well as a method for identifying effect modifiers. Across four designs, a subset containing only true effect modifiers was selected in at least 83.2% of the cases, which can be compared with 27.9% for the regression estimator. However, the difference LASSO builds on an incorrectly specified model, which gives rise to systematic errors in the estimates. This means that the method is not suited to interpreting the effect of a treatment.


Popular science summary

In causal inference we are interested in making causal statements about the effect of a treatment on some outcome.

Consider an example where we study the effect of participation in a reading program on the reading ability among students. In this example, the treatment would be participation in the reading program, and the outcome could, for example, be a score on a given reading test. If the treatment is not randomly assigned there may be variables that affect the outcome or the treatment assignment, that is, the test score or if a student participates in the reading program or not. We call these variables covariates and an example of a covariate could be a student’s health condition.

There may be covariates that affect both the treatment assignment and the outcome. We call these covariates confounding covariates, and the confounding makes it difficult to draw conclusions about the true treatment effect.

For example, if students who enjoy reading decide to participate in the reading program to a greater extent, and they also score higher on the reading test, the effect of the reading program will be overestimated due to the confounding variable “reading enjoyment”. This bias can be removed if we, for instance, only compare test scores of students who enjoy reading.

In order to make valid inference about the true treatment effect, we must have access to all confounding covariates in our data. This is sometimes referred to as an assumption of “no unmeasured confounding”. In many situations we have access to a large number of covariates, and it is then possible to find smaller sets of covariates that imply no unmeasured confounding. In the example above, one approach for finding such a set of covariates could be to include only covariates that affect the test score, which is the approach we use in this thesis.

The treatment may also have different effects on the outcome in subgroups defined by some covariate(s). These covariates are called effect modifiers. Continuing the example above, suppose that students whose parents are highly educated are used to focusing and can therefore benefit from the reading program to a greater extent than other students.

In this case the parents' education is an effect modifier, and it may be of interest to identify these kinds of covariates.

In this thesis we describe standard regression methods for estimating the effect of a treatment on an outcome. We also describe how, as a part of these methods, a set of covariates that implies no unmeasured confounding is chosen.

Additionally, we describe a method, called the difference LASSO. Instead of finding confounding covariates, this method identifies effect modifiers. The method is defined by a procedure which causes systematic errors (bias), and we investigate these errors as well.

Furthermore, a comparison between the difference LASSO and one of the standard methods is made in a simulation study. The comparison is based on the number of times each method found the true effect modifiers, under different scenarios. From this we have learned that the difference LASSO works relatively well for finding effect modifiers: for instance, the method found the true effect modifiers in at least 83.2% of the cases, compared with 27.9% for the standard method. However, the difference LASSO is based on a procedure that causes bias. This suggests that the method should only be used for identification of effect modifiers, and not for interpretation of treatment effects.


Acknowledgements

I would like to thank my supervisor Ingeborg Waernbaum for her valuable help and guidance during the work of this thesis.


Contents

1 Introduction
  1.1 Purpose and aims
2 Model and theory
  2.1 The framework of potential outcomes
  2.2 Effect modification
  2.3 Regression estimators for causal effects
    2.3.1 Outcome model
    2.3.2 Fitting a model for the whole sample
    2.3.3 Fitting a model conditional on the treatment variable
  2.4 Variable selection and selection of effect modifiers
    2.4.1 The LASSO
  2.5 The difference LASSO
3 Simulations
  3.1 Designs
  3.2 Results
4 Discussion
References
Appendix


1 Introduction

In causal inference we are interested in making causal statements about the effect of a treatment on some outcome of interest.

In order to do so, we would like to compare the outcomes for a unit under different treatment assignments, i.e., we would like to observe the so-called potential outcomes. Unfortunately, a unit cannot be assigned more than one of the treatments at the same time, and therefore we can only observe one of the potential outcomes. This has been referred to as the “fundamental problem of causal inference” (Holland, 1986). Depending on the data at hand, there are different approaches for handling this problem.

The procedures for the inference will differ depending on whether or not the treatment was randomly assigned. If the treatment was randomly assigned, the study is called a randomized study or an experiment. When analyzing experimental data we can often assume that the only thing causing differences in the outcome between the treatment groups is the treatment itself. Therefore, we can make a direct comparison of, e.g., the mean outcomes between the treatment groups to learn about the causal effect. This corresponds to the average treatment effect (ATE), which is the parameter of interest in this thesis (Rubin, 1974).

In observational studies, as opposed to randomized studies, units that are assigned different treatments may also differ in other covariates that affect the outcome. This is called confounding (Rosenbaum and Rubin, 1983).

In order for estimators of causal effects to be unbiased, we have to adjust for the confounding. This is possible only if all confounding covariates have been observed. In general, we cannot test whether or not all confounders have been observed; therefore, we assume that they have. This assumption is sometimes referred to as no unmeasured confounding, and when it is valid, it is possible to adjust for the confounding.

Variable or predictor selection is a topic of much interest in many fields within statistics, and the same also applies to causal inference, where it is instead called covariate or confounder selection (Vansteelandt et al., 2012). Henceforth, these terms will be used interchangeably. A central aim of covariate selection in causal inference is to provide a subset of covariates that is sufficient for confounding adjustment. In a given data set there may be several sufficient subsets, e.g., subsets that are determined based on the probability to be treated given the covariates, i.e., the propensity score.

Another approach is to choose covariates that are associated with the outcome. With this approach, we may end up with a subset that includes covariates that are not confounders, but the subset will still be sufficient for confounding adjustment. This approach is sometimes referred to as the outcome approach and is the subject of this thesis (Witte and Didelez, 2018).

When we have access to a large number of covariates, which is often the case in observational studies, a useful method for the covariate selection is the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996). When fitting a regression model with the LASSO, the coefficients of covariates that are not associated with the outcome are set to exactly zero.

Therefore, the covariate selection is made at the same time as the model is fitted.

Sufficient subsets are used to estimate potential outcomes, which are used to estimate the treatment effect. One way of estimating the ATE is with regression estimators (Imbens, 2004, Section III). Considering the outcome approach for covariate selection, the regression estimators are based on models fitted for the outcome. The models are used to predict values for the potential outcomes, in order to estimate the treatment effects. In this thesis we consider two common approaches for estimating these models. In the first one, the model for the outcome is fitted by using the whole sample. In the second, the model is instead fitted conditioning on the treatment, and the covariate selection is made for these models.

In addition to confounders, a set of covariates may include so-called effect modifiers. Effect modification occurs when a treatment has different effects among different subgroups defined by some covariate(s). The covariates defining the subgroups are called effect modifiers. It is often of interest to identify the effect modifiers since they, e.g., could be helpful for finding subgroups that would benefit the most from a treatment (Hernan and Robins, 2010, Part I, ch. 4).

None of the described approaches for estimating the models for the outcome, in the regression estimators, will automatically give knowledge about effect modifiers. To begin with, consider the model that is fitted by using the whole sample. In this case, effect modifiers are included in the model by adding interactions between the treatment variable and the effect modifiers. If, e.g., the true model includes effect modifiers and we assume that there are effect modifiers, i.e., we add interactions in the model, then the interactions will be selected in a covariate selection procedure. If we instead assume a constant treatment effect, i.e., we do not include any interactions in the fitted model, no interactions can be selected. In other words, with this approach we have to specify whether or not to include effect modifiers in the outcome model. Now consider the approach where the models for the outcome are fitted conditional on the treatment. In this case, because the effect modification is implied within the models, no interactions will be selected in a covariate selection procedure. That is, we do not identify any effect modifiers using this method.

Thus, even if the regression methods are unbiased for the ATE, they do not result in automatic identification of effect modifiers. However, there is a novel method that aims at achieving this. It is called the difference LASSO, and it was introduced by Ghosh et al. (2015). The method is defined by an algorithm where a new outcome is defined as the difference between the observed and the potential outcome. This quantity is predicted and then used as an outcome in a regression model fitted with the LASSO. The covariates selected in a covariate selection procedure are in this case the covariates that interact with the treatment. Therefore, identification of effect modifiers is made at the same time as the model is fitted.

1.1 Purpose and aims

The purpose of this thesis is to describe how regression models can be used for estimating the ATE, and how sufficient subsets of covariates may be constructed for these models by using the outcome approach. Additionally, it is to describe, investigate and provide a bias analysis for a recently proposed method, called the difference LASSO.

Furthermore, this thesis aims at comparing a commonly used regression estimator with the difference LASSO in a simulation study, in which the identification of effect modifiers is evaluated. This is done by analyzing the proportion of times the selection procedures result in a set of covariates including only the effect modifiers, or a set in which the effect modifiers are included as a subset.

2 Model and theory

This section begins with a brief introduction to the framework of potential outcomes. Then follows a description of effect modification, regression estimators as well as covariate selection. Finally, this section ends with a presentation of a method called the difference LASSO.

2.1 The framework of potential outcomes

The framework of potential outcomes is a standard for experimental and observational studies of causal effects (Imbens, 2004, Section II.A), and it is the model we use to define the effect of a treatment on an outcome.

The following notations are used. We consider an iid sample i = 1, ..., N, drawn from a large population. For each unit in the sample, an indicator variable for treatment, T_i, is observed, i.e., it takes the value 1 if unit i receives treatment and 0 otherwise. Let Y_i(1) be the potential outcome if unit i receives treatment, and Y_i(0) be the potential outcome if unit i does not receive treatment. A unit can only be assigned one of the treatment levels at a time, i.e., it is not possible to observe the outcome for a unit for both T_i = 1 and T_i = 0. The observed outcome for unit i is defined by Y_i = T_i Y_i(1) + (1 − T_i) Y_i(0). Let X_i = (X_i1, X_i2, ..., X_ip) be a p × 1 vector of covariates for unit i. Thus, for each unit we observe a triple, (Y_i, T_i, X_i). Henceforth, we will drop the index i where it is not necessary. Finally, let µ_t(x) = E[Y(t) ∣ X = x, T = t], t = 0, 1, be the two conditional regression functions.
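To make the notation concrete, the following minimal Python sketch constructs observed data (Y_i, T_i, X_i) from simulated potential outcomes; the covariate and outcome models are arbitrary illustrations and not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 1000, 3

# Hypothetical covariates and a randomly assigned treatment indicator.
X = rng.normal(size=(N, p))
T = rng.binomial(1, 0.5, size=N)

# Hypothetical potential outcomes Y(1) and Y(0); only one is ever observed per unit.
Y1 = 1.0 + X @ np.array([0.5, 0.2, 0.0]) + rng.normal(size=N)
Y0 = 0.0 + X @ np.array([0.5, 0.2, 0.0]) + rng.normal(size=N)

# Observed outcome: Y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0).
Y = T * Y1 + (1 - T) * Y0
```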

The treatment effect for unit i is Y_i(1) − Y_i(0), but since we observe the outcome for either T_i = 1 or T_i = 0, we cannot compute treatment effects on the unit level. Instead we have to compare units in the population, and then summarize this information. One parameter of interest when considering causal effects is the average treatment effect (ATE). It is defined by

ATE = E[Y(1) − Y(0)],     (1)

and is the focus in this thesis. There may be other parameters of interest, e.g., the average treatment effect on the treated (ATT), which is defined by,

ATT = E[Y (1) − Y (0)∣T = 1].

In order to identify the ATE we make two assumptions. In observational studies, units that are assigned different treatments may differ in covariates other than the treatment variable that are also associated with the outcome.

Such variables are called confounders. In Assumption 1 we state that, if we have observed all confounders, we can identify the ATE (Rosenbaum and Rubin, 1983),

Assumption 1 (Unconfoundedness)

Y (1), Y (0) ⊥ T ∣X.

This is sometimes referred to as unconfoundedness (Imbens, 2004, Section II.C). When it holds, we can regard the data as observations from an experiment instead of an observational study. In Assumption 2 we assume that, given the covariates, all units have a positive probability of receiving treatment,

Assumption 2 (Overlap)

0 < Pr(T = 1∣X) < 1.

If this is true, we may say that there is overlap. When Assumptions 1 and 2 hold we can identify the ATE, i.e., the causal parameter of interest, from the observed data. To begin with, if we assume unconfoundedness, the conditional regression functions can be expressed as

µ_0(x) = E[Y(0) ∣ X = x] = E[Y(0) ∣ X = x, T = 0] = E[Y ∣ X = x, T = 0],     (2)
µ_1(x) = E[Y(1) ∣ X = x] = E[Y(1) ∣ X = x, T = 1] = E[Y ∣ X = x, T = 1].     (3)

Furthermore, if there is overlap, Equation 1 can be expressed in terms of Equations 2–3 (Imbens, 2004, Section II.C),

ATE = E_X[ E[Y ∣ X = x, T = 1] − E[Y ∣ X = x, T = 0] ]
    = E_X[ E[Y(1) ∣ X = x, T = 1] − E[Y(0) ∣ X = x, T = 0] ]
    = E_X[ E[Y(1) ∣ X = x] − E[Y(0) ∣ X = x] ]
    = E[Y(1) − Y(0)],

thus, the ATE is identified. To examine whether or not Assumption 2 is met, we usually compare the univariate distributions of the covariates between treated and controls. We consider the assumption as valid if there is support for all values of the covariates and the treatment variable.

In observational studies, Assumption 1 is usually not known to hold (Rosenbaum and Rubin, 1983). Determining whether or not the assumption is met is often a question of subject matter knowledge, and may be more reliable in cases when we have a large number of covariates. In order to achieve unconfoundedness we usually have to adjust for the confounders by using different subsets of covariates. One alternative is to define a subset that includes covariates that are related to the outcome of interest. This approach is sometimes referred to as the outcome approach (Witte and Didelez, 2018).

2.2 Effect modification

When a treatment has different effects among different subgroups, there is effect modification. The covariates that generate effect modification are called effect modifiers. If we let M be a binary effect modifier, this means that E[Y(1) − Y(0) ∣ M = 1] ≠ E[Y(1) − Y(0) ∣ M = 0]. Note that a covariate can be both a confounder and an effect modifier, but an effect modifier is not necessarily a confounder. This is illustrated in the Venn diagram in Figure 1.

Figure 1: Illustration of the considered types of covariates (Venn diagram of confounders, effect modifiers, and non-confounders within the set of covariates).

In many applications effect modifiers may provide important information, since the magnitude of the effect of a treatment will vary with the value of the effect modifier.

There may be several reasons why it could be important to identify the effect modifiers. One example is that identification could be helpful for policy makers, in order to find subgroups that would benefit the most from, e.g., a political program.

Another example is that identification could help in finding subgroups where a treatment is in fact harmful.

In order to explain the impact of an effect modifier on the inference, a numerical example is given in Table 1. If we consider the ATE for the entire population, i.e., E[Y(1) − Y(0)], it would be zero. In other words, there is no effect of the treatment on the outcome in the entire population. If we instead consider the ATE in subgroups, created by stratifying on the effect modifier M, we obtain different treatment effects in the different subgroups. When M = 1 the effect is −0.2, and when M = 0 the effect is instead 0.2. That is, in the entire population the effects for the subgroups eliminate each other, since they are of equal size but with opposite signs. In addition to this, we have created another example, which is found in Figure 2. This example illustrates the difference between when there is effect modification by M and when there is not. In panel a), the variable M is a binary effect modifier, which is visible since the ATE is different for different values of M; this is not the case in panel b).

Figure 2: Example illustrating the impact of the effect modifier M on Y(1) − Y(0). In panel a) there is effect modification by M, and in panel b) there is no effect modification by M.

Table 1: Numerical example illustrating the impact of the effect modifier M.

  i   M   Y(1)   Y(0)
  1   1    0      1
  2   1    0      1
  3   1    1      0
  4   1    1      1
  5   1    0      0
  6   0    0      0
  7   0    0      1
  8   0    1      0
  9   0    1      0
 10   0    1      1
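As a quick check of the numbers in Table 1, the short snippet below computes the overall ATE and the stratified ATEs directly from the table.

```python
import numpy as np

# Data from Table 1.
M  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
Y1 = np.array([0, 0, 1, 1, 0, 0, 0, 1, 1, 1])
Y0 = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 1])

diff = Y1 - Y0
print(diff.mean())          # 0.0  -> no treatment effect in the whole population
print(diff[M == 1].mean())  # -0.2 -> effect in the subgroup with M = 1
print(diff[M == 0].mean())  #  0.2 -> effect in the subgroup with M = 0
```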

One way to identify the effect modifiers is to stratify by the covariates we believe are effect modifiers. This means that the causal effect of a treatment on some outcome is computed within each stratum (Hernan and Robins, 2010, Part I, ch. 4). In other words, either we have to know beforehand which covariates are possible effect modifiers, or we have to search among all covariates, which may be inefficient, especially when the number of observed covariates is large.

2.3 Regression estimators for causal effects

One alternative for estimating treatment effects when analyzing observational data is to use regression estimators. A regression estimator of the ATE includes estimation of the conditional regression functions, µ_t(X) = E[Y(t) ∣ X]. If parametric models are assumed for these expectations, the functions follow a model, µ_t(X, θ_t), with an assumed functional form but unknown parameter(s) represented by the vector θ_t. Let µ̂_t(X) = µ_t(X, θ̂_t) be the estimated outcome regression. If the model assumptions are correct, and θ̂_t is consistent for θ_t, then µ̂_t(X) is consistent for µ_t(X). As a result, the estimator ÂTE of the ATE is consistent (Imbens, 2004, Section III.A). Therefore, when using parametric models in the regression estimators, besides Assumptions 1 and 2, a third assumption has to be made about the correctness of the model specification:

Assumption 3 (Correctness of outcome model)

E[Y(t) ∣ X] = µ_t(X, θ_t), t = 0, 1, is correctly specified.

With a regression approach for estimating the treatment effects, we fit a regression model and use the model fit to predict values, which are used as potential outcomes. We now describe two general regression estimators.

Given that we have estimated the conditional regression functions, one way of estimating the ATE is by averaging the difference between the estimated conditional regression functions over the empirical distribution of the covariates. Let µ̂_0(X) and µ̂_1(X) be estimators of µ_0(X) and µ_1(X), respectively; then we can estimate the ATE by (Imbens, 2004, Equation 2)

ÂTE_1 = (1/N) ∑_{i=1}^{N} [µ̂_1(X_i) − µ̂_0(X_i)].     (4)

Another way of estimating the ATE is to use the observed values of the potential outcomes corresponding to the treatment levels, instead of predicting both potential outcomes. With this approach we get the following estimator,

ÂTE_2 = (1/N) ∑_{i=1}^{N} { T_i [Y_i − µ̂_0(X_i)] + (1 − T_i)[µ̂_1(X_i) − Y_i] }.     (5)

For some regression estimators, the average predicted outcome for those who received treatment is the same as their average observed values, and the same holds for the other treatment group. In these situations the resulting estimate of ÂTE_1 is the same as the resulting estimate of ÂTE_2 (Imbens, 2004, Section III.A). The outcome in Equations 4–5 can be either discrete or continuous, but henceforth it is assumed to be continuous. The estimators described in Equations 4 and 5 are general, since the forms of the conditional regression functions are not specified, but these functions have to be estimated in order to calculate ÂTE_1 and ÂTE_2. This is discussed in the subsequent sections.
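The two estimators in Equations 4–5 can be written compactly as in the sketch below, assuming that mu1_hat and mu0_hat are fitted conditional mean functions obtained from some regression model; the function names are ours, not the thesis's.

```python
import numpy as np

def ate_1(X, mu1_hat, mu0_hat):
    """Equation 4: average the difference between the two fitted regressions over all units."""
    return np.mean(mu1_hat(X) - mu0_hat(X))

def ate_2(Y, T, X, mu1_hat, mu0_hat):
    """Equation 5: keep the observed outcome for the received treatment, predict the other one."""
    return np.mean(T * (Y - mu0_hat(X)) + (1 - T) * (mu1_hat(X) - Y))
```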

2.3.1 Outcome model

In order to estimate ÂTE_1 and ÂTE_2, the conditional regression functions have to be specified. We now consider the estimator of the ATE corresponding to Equation 4, and for simplicity we present parametric linear models for the conditional regression functions. However, semi-parametric and non-parametric models can also be used. The models are specified through

µ_t(X) = ϕ + β′X + τT,     (6)

where β is a p × 1 vector containing the regression coefficients, ϕ is an intercept and τ is the coefficient for the treatment. Let ϕ̂, β̂ and τ̂ be estimators of the parameters ϕ, β and τ in Equation 6. With this model the ATE is equal to τ, i.e., a constant treatment effect, and the estimated ATE is τ̂. This is visible when substituting µ̂_0(X_i) and µ̂_1(X_i) in Equation 4 with the estimated model corresponding to Equation 6,

ÂTE_1 = (1/N) ∑_{i=1}^{N} [(ϕ̂ + β̂′X_i + τ̂) − (ϕ̂ + β̂′X_i)] = τ̂.

If the model in Equation 6 is correctly specified, and τ̂ is a consistent estimator of τ, then this regression estimator is a consistent estimator of the ATE. In the following sections, two common approaches for fitting the outcome models in regression estimators are described.

2.3.2 Fitting a model for the whole sample

In the first approach, we consider fitting a model by using the whole sample. If the true ATE is constant, we can estimate the conditional regression functions with µ̂_t(X_i) by using the regression function Y_i = ϕ + β′X_i + τT_i + ε_i, and estimating τ by least squares (LS). As was shown earlier, this model implies that we assume that the ATE is constant (Imbens, 2004, Section III.A).

If we do not wish to assume a constant treatment effect, i.e., if we assume effect modification, we have to include interactions between the treatment variable and the assumed effect modifiers when specifying the models for the conditional regression functions. If, e.g., the conditional regression functions are specified through

µ_t(X) = ϕ + β′X + τT + γX_1 T,     (7)

then X_1 is an effect modifier. By substituting µ̂_0(X_i) and µ̂_1(X_i) in Equation 4 with the estimated model corresponding to Equation 7, we get

ÂTE_1 = (1/N) ∑_{i=1}^{N} [(ϕ̂ + β̂′X_i + τ̂ + γ̂X_i1) − (ϕ̂ + β̂′X_i)] = (1/N) ∑_{i=1}^{N} [τ̂ + γ̂X_i1] →_p τ + γE[X_1].     (8)

See Appendix A1 for more details. As illustrated in this example, we have to assume what kind of treatment effect we are expecting in order to include the relevant variables in our model. If, e.g., there is a model misspecification due to omitted interactions, this will lead to biased regression coefficients, and therefore the estimation of the treatment effects will be biased. However, there is a general approach in which we do not have to assume what kind of treatment effect there is, and it is described in the next section.
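Before moving on, the whole-sample approach with an explicit interaction can be sketched as follows; the helper name, the single-modifier design matrix, and the use of scikit-learn are assumptions of this sketch, not part of the thesis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ate_whole_sample(Y, T, X, modifier_idx=0):
    """Fit the outcome model of Equation 7 (one interaction, here with column
    `modifier_idx`) on the whole sample and estimate the ATE as in Equation 8."""
    p = X.shape[1]
    Z = np.column_stack([X, T, T * X[:, modifier_idx]])
    fit = LinearRegression().fit(Z, Y)
    tau_hat = fit.coef_[p]          # coefficient for T
    gamma_hat = fit.coef_[p + 1]    # coefficient for the interaction T * X_1
    return tau_hat + gamma_hat * X[:, modifier_idx].mean()
```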

2.3.3 Fitting a model conditional on the treatment variable

In the second approach, instead of fitting one model, we consider fitting two models, one for each potential outcome.

In this case two subsets are defined, conditional on the treatment, and one model is fitted for each subset (Imbens, 2004, Section III.A),

µ_0(X) = α_0 + β_0 X,
µ_1(X) = α_1 + β_1 X,     (9)

where α_0 and α_1 are intercepts and β_0 and β_1 are 1 × p vectors containing regression coefficients. With these models, we predict values which we use in Equation 4 as

ÂTE_1 = (1/N) ∑_{i=1}^{N} [(α̂_1 + β̂_1 X_i) − (α̂_0 + β̂_0 X_i)]
      = (1/N) ∑_{i=1}^{N} [α̂_1 − α̂_0 + (β̂_1 − β̂_0) X_i].     (10)

If we consider Equation 8 again, we can find the corresponding estimated coefficients in Equation 10, namely τ̂ = α̂_1 − α̂_0 and γ̂ = β̂_1 − β̂_0. Thus, in this regression estimator we do not specify any interactions between the treatment variable and assumed effect modifiers, because they are implicitly specified by the models for the potential outcomes.

Regression estimators may be problematic to use in some situations, e.g., when the distributions of the covariates differ substantially between the treatment groups. This is because the estimation then relies heavily on extrapolation. In other words, we extrapolate fitted values from the models even in regions where we do not have support from the data (Imbens, 2004, Section III.A). As mentioned in Section 2.3, regression estimators depend on the correctness of the outcome model. In other words, if we omit a variable in the model specification or incorrectly specify its functional form, the resulting ÂTE will be biased. There exist methods to handle these kinds of problems, e.g., inverse probability weighting (IPW), but since that subject is beyond the scope of this text, we refer the reader to Imbens (2004, Section III.D).
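Returning to the estimator of this section, a minimal sketch of fitting one linear outcome model per treatment group and plugging the fits into Equation 4 could look as follows (the function name and the use of scikit-learn are ours):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def ate_conditional_on_treatment(Y, T, X):
    """Fit one linear outcome model per treatment group (Equation 9) and
    average the predicted difference over all units (Equation 10)."""
    fit1 = LinearRegression().fit(X[T == 1], Y[T == 1])  # model for mu_1(X)
    fit0 = LinearRegression().fit(X[T == 0], Y[T == 0])  # model for mu_0(X)
    return np.mean(fit1.predict(X) - fit0.predict(X))
```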

2.4 Variable selection and selection of effect modifiers

In regression analysis the choice of variables for a model, given its functional form, depends on the context in which the model is to be used. If the model is supposed to be used for describing associations between variables, a common choice is to reduce the number of variables, in order to get a model that is easy to interpret. If the model instead is supposed to be used for prediction, one may argue that variables that increase the prediction accuracy should be included in the model.

In causal inference the intention is to use models in order to estimate treatment effects. For estimators to be unbiased, it is necessary to adjust for confounding; hence, in this context, covariates that create a sufficient subset for confounding adjustment should be included in the models, e.g., the covariates affecting the outcome form a sufficient adjustment set (Koch et al., 2018). However, this does not have to be the only aim of a covariate selection procedure.

Another aim could be efficiency of estimation. For a linear regression function, which is the context of this thesis, the estimates are most precise when the model includes the covariates that affect the outcome, even when these are not necessary for adjustment of the confounding (Witte and Didelez, 2018). That is, the aim is to construct a subset of covariates that are associated with the outcome. In terms of the regression estimators described in the previous section, we assumed models for the conditional expectations. The covariate selection is done while fitting these assumed models, and the selected covariates are those associated with the outcome.

Considering the regression estimator fitted for the whole sample, described in Section 2.3.2, covariates that could be selected in a covariate selection procedure are those included in the model specification for the outcome. In other words, if the model specification only includes first order terms of covariates and the treatment variable, we only assess information about main effects on the outcome. In order to get knowledge about effect modification, interactions between the treatment variable and covariates have to be included in the model specification. That is, if the true model includes effect modifiers and we do not specify the corresponding interactions in the outcome model, we will not get any information about the effect modification. Additionally, the estimated main effect will be biased. Hence, for effect modifiers to be selected in a covariate selection procedure, they have to be specified as interactions, in the outcome model.

In the regression estimator described in Section 2.3.3, remember that two models are estimated, one for each potential outcome. Variables that could be included in a subset consisting of variables associated with the outcome are those included in the specification of either model. If, e.g., one variable is associated with µ_1(X) and not µ_0(X), the variable would be included in the sufficient subset. If there are interactions between the treatment and the effect modifiers, they are implicitly specified by the two models for the potential outcomes. Therefore, using this approach, we will not get explicit information about effect modifiers.

Since variable selection is a popular topic in many areas of statistics, several methods have been developed and proposed to implement it. One example is subset selection, which reduces the number of variables and thereby yields interpretable models. However, if there are small changes in the data, this method can result in very different models.

Another example is a method called the LASSO. It was proposed by Tibshirani (1996), and has grown in popularity. The LASSO tries to retain the interpretability of the resulting model, but is more stable than subset selection. This method is described in the following section.

2.4.1 The LASSO

The LASSO was proposed as a method for estimation in linear models, and can be used as a method for variable selection. During the model fit, depending on the association with the outcome, regression coefficients for some variables are shrunken and others are set to zero. Therefore, variable selection can be made at the same time as the model for the outcome is fitted (Tibshirani, 1996).

For example, the ordinary least squares (OLS) fitting process involves estimating the regression coefficients, β = (β_0, β_1, ..., β_p), by minimizing the residual sum of squares (RSS), i.e., by minimizing

RSS = ∑_{i=1}^{N} (Y_i − β_0 − ∑_{j=1}^{p} β_j X_ij)².     (11)
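Minimizing the RSS in Equation 11 is ordinary least squares; a minimal sketch (the helper name is ours):

```python
import numpy as np

def ols_coefficients(X, Y):
    """Minimize the RSS in Equation 11; returns (beta_0, beta_1, ..., beta_p)."""
    Z = np.column_stack([np.ones(len(Y)), X])   # prepend an intercept column
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return beta
```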

When using the LASSO with OLS, the regression parameters are instead taken to minimize the quantity

∑_{i=1}^{N} (Y_i − β_0 − ∑_{j=1}^{p} β_j X_ij)² + λ ∑_{j=1}^{p} ∣β_j∣ = RSS + λ ∑_{j=1}^{p} ∣β_j∣,  where λ ≥ 0.     (12)

The last term in Equation 12 is called the LASSO penalty, and it has the property of forcing some coefficients to be exactly zero. The λ in Equation 12 is called a tuning parameter and can be selected by, e.g., comparing prediction errors obtained by cross-validation (Tibshirani, 1996).

In this thesis we only consider the LASSO applied to OLS, but it can also be applied to more general classes of models. This means that Equation 12 can be expressed more generally, e.g., the LASSO can be applied to generalized linear models (GLMs) (Park and Hastie, 2007).
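A minimal sketch of fitting Equation 12 with λ chosen by cross-validation; the use of scikit-learn and the simulated data are assumptions of this sketch, not part of the thesis.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical data: only the first two of ten covariates affect the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
Y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=500)

lasso = LassoCV(cv=5).fit(X, Y)                # lambda chosen by 5-fold cross-validation
selected = np.flatnonzero(lasso.coef_ != 0)    # covariates whose coefficients were not set to zero
print(lasso.alpha_, selected)
```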

2.5 The difference LASSO

Instead of minimizing Equation 11 with respect to the observed Y, we now describe two algorithms in which the minimization is carried out with respect to predicted quantities instead of observed values. The motivation behind these algorithms is that, in causal inference, we are interested in variable selection with an explicit predictive motivation, since we predict values for the potential outcomes. The first algorithm is for the so-called predictive LASSO and the second is for a method called the difference LASSO. The idea of the difference LASSO is that it should result in a model containing only effect modifiers, i.e., it should result in identification of effect modifiers. The method builds on the predictive LASSO, which was proposed by Tran et al. (2012); below we describe a general algorithm for the predictive LASSO, formulated by Ghosh et al. (2015) and based on the work of Tran et al. (2012). To begin with, the data are divided into two sets. The first set is called a training set, and the second is called a validation set. The procedure of the method is described in Algorithm 1. In the first two steps a prediction is made; the predictive LASSO occurs in the third step of the algorithm.

Algorithm 1 (The predictive LASSO)

1. Fit a linear regression for Y on T and X, using the training set.

2. Predict values for Y based on the model fitted in previous step, by using the validation set.

3. Fit a LASSO regression for the predicted values of Y on X and T , using the validation set.
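A sketch of Algorithm 1, assuming a linear first-stage model and a scikit-learn LASSO with cross-validated penalty in the third step; the function name and library choice are ours.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV

def predictive_lasso(Y_train, T_train, X_train, T_val, X_val):
    """Algorithm 1: (1) fit Y on (T, X) in the training set, (2) predict Y in the
    validation set, (3) fit a LASSO of the predictions on (T, X) in the validation set."""
    stage1 = LinearRegression().fit(np.column_stack([T_train, X_train]), Y_train)  # step 1
    Y_pred = stage1.predict(np.column_stack([T_val, X_val]))                       # step 2
    return LassoCV(cv=5).fit(np.column_stack([T_val, X_val]), Y_pred)              # step 3
```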

As mentioned in Section 2.1, within the potential outcomes framework only one of the potential outcomes, Y(1) and Y(0), is observed. This motivates viewing the potential outcomes as a missing data problem (Rosenbaum and Rubin, 1983). Assumption 1 in Section 2.1 corresponds to the data being missing at random (Ghosh et al., 2015). This means that the missingness mechanism depends on the covariates, X (Rubin, 1976). Therefore, one way to handle the missingness is to use X to impute the missing responses. This means that the estimated treatment effects rely on variables that involve prediction. Because of this, Ghosh et al. (2015) argue that the LASSO should be applied to functions of predicted/imputed values of the response, rather than to goodness-of-fit measures from standard regression models, e.g., the sum of squares term in Equation 12.

Table 2: Example of how training and validation data are used in the difference LASSO algorithm.

Training data:                      Validation data:
  i   y_i    t_i   x_1i   x_2i       i   y_i    t_i   x_1i   x_2i   µ̂_1i(x)   µ̂_0i(x)   ỹ_i
  1   0.12   1     0      2.21       8   0.98   0     1      0.25   0.75      0.98      −0.23
  5   0.34   0     0      3.25       3   0.52   0     0      0.67   0.70      0.52       0.18
  2   0.55   1     0      4.76      10   0.13   1     1      0.21   0.13      1.09      −0.91
  6   0.21   1     1      0.12       9   0.62   1     0      0.11   0.62      1.11      −0.49
  7   0.89   0     1      1.23       4   0.76   1     0      0.91   0.76      1.00      −0.24

Model fit: µ̂_ti(x) = 1.13 − 0.01x_1i − 0.14x_2i − 0.34t_i

In the difference LASSO a univariate outcome variable, Ỹ_i, is defined as the difference between the observed and predicted potential outcomes for individual i. The new variable is used as the outcome variable in the third step of the algorithm described for the predictive LASSO regression. In other words, it will be Y(1) − µ̂_0(x) when T = 1 and µ̂_1(x) − Y(0) when T = 0, where µ̂_1(X) and µ̂_0(X) are estimated using some regression model. Ghosh et al. (2015) used random forests to estimate µ̂_1(X) and µ̂_0(X); for simplicity we have instead used OLS estimation. Table 2 gives an example demonstrating how this procedure is done. The algorithm for the difference LASSO regression is described in Algorithm 2.

Algorithm 2 (The difference LASSO)

1. Fit a regression for Y on T and X, using a training set.

2. Based on the model fitted in previous step, compute the predicted/fitted values for the missing potential outcome using the validation set. Use these values as the imputed values for the potential outcome that is missing, i.e., for an individual that has T = 1 this means that Y (0) is predicted and for an individual that has T = 0, Y (1) is predicted.

3. Create a univariate response variable for each individual, defined as Ỹ = Y(1) − µ̂_0(x) if T = 1 and Ỹ = µ̂_1(x) − Y(0) if T = 0. Perform a LASSO for Ỹ on the covariates X.
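A sketch of Algorithm 2, using OLS in the first step (as in this thesis, rather than the random forest used by Ghosh et al. (2015)) and a cross-validated LASSO in the final step; the function name and library choice are ours.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV

def difference_lasso(Y_train, T_train, X_train, Y_val, T_val, X_val):
    """Algorithm 2 with an OLS first step: impute the missing potential outcome,
    form Y_tilde, and fit a LASSO of Y_tilde on the covariates X."""
    # Step 1: fit the (interaction-free) working model on the training set.
    stage1 = LinearRegression().fit(np.column_stack([T_train, X_train]), Y_train)

    # Step 2: predicted potential outcomes for the validation units under T = 1 and T = 0.
    mu1_hat = stage1.predict(np.column_stack([np.ones_like(T_val), X_val]))
    mu0_hat = stage1.predict(np.column_stack([np.zeros_like(T_val), X_val]))

    # Step 3: Y_tilde = Y(1) - mu0_hat if T = 1, and mu1_hat - Y(0) if T = 0,
    # followed by a LASSO of Y_tilde on X; nonzero coefficients flag effect modifiers.
    Y_tilde = np.where(T_val == 1, Y_val - mu0_hat, mu1_hat - Y_val)
    return LassoCV(cv=5).fit(X_val, Y_tilde)
```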

This procedure can be demonstrated in an example. Let Equation 13 be the true model, where α is a p × 1 vector containing regression coefficients and α_0 is an intercept. Here, the covariate X_1 is an effect modifier, interacting with the treatment. Furthermore, let Equation 14 be the working model used for estimation, where β̂_0 is an estimated intercept and η̂ is the estimated regression coefficient for the treatment variable. Thus, we fit a model without an interaction between the effect modifier and the treatment.

µ_t(X) = α_0 + α′X + τT + γX_1 T,     (13)
µ̂_t(X) = β̂_0 + β̂′X + η̂T.     (14)

This results in the univariate outcome

Ỹ = T[µ_1(X) − µ̂_0(X)] + (1 − T)[µ̂_1(X) − µ_0(X)]
  = T(α_0 + α′X + τ + γX_1 − β̂_0 − β̂′X) + (1 − T)(β̂_0 + β̂′X + η̂ − α_0 − α′X)
  = T(α_0 + α′X + τ + γX_1 − β̂_0 − β̂′X) − T(β̂_0 + β̂′X + η̂ − α_0 − α′X) + β̂_0 + β̂′X + η̂ − α_0 − α′X
  = β̂_0 − α_0 + η̂ + (β̂ − α)′X + (2α_0 − 2β̂_0 + τ − η̂)T + 2(α − β̂)′XT + γX_1 T,

i.e., if α_0 − β̂_0 = 0, α − β̂ = 0 and τ − η̂ = 0, then the LASSO model fitted in the third step of the algorithm only includes an intercept, η̂, and the interaction with the treatment, γX_1 T, which would be the desired objective. However, the model in the first step of the algorithm is incorrectly specified, since the interaction with the treatment variable is omitted, which may result in biased parameter estimates (Elwert and Winship, 2010). This omitted variable bias is

Bias(β̂) = (τ − E[η̂]) E[(X′X)^{−1} X′T] + γ E[(X′X)^{−1} X′Z],     (15)

see Appendix A2 for details of the computation of the bias. We provide two numerical examples, Examples 3–4, where we demonstrate the properties of the difference LASSO. In Example 3 there is only one effect modifier and in Example 4 there are two effect modifiers. As can be seen from these examples, the difference LASSO identifies the effect modifiers correctly.

Example 3 (One effect modifier) Assume a case with one effect modifier. Here Y is the true model and Ŷ is the estimated model. The model is estimated by OLS and X_1 ∼ N(0, 1), X_2 ∼ N(0, 1), T ∼ Be(p(X_1, X_2)),

p(X_1, X_2) = [1 + exp(0.1 − 0.75X_1 + 0.1X_2)]^{−1},
Y = 1 + 4X_1 + 0.2X_2 + 3T + 0.45TX_2,
Ŷ = 0.98 + 4X_1 + 0.41X_2 + 3T.

The univariate response variable becomes

Ỹ = T[Y(1) − Ŷ(0)] + (1 − T)[Ŷ(1) − Y(0)]
  = T(4 + 4X_1 + 0.65X_2 − 0.98 − 4X_1 − 0.41X_2) + (1 − T)(0.98 + 4X_1 + 0.41X_2 + 3 − 1 − 4X_1 − 0.2X_2)
  = T(3.02 + 0.24X_2) + (1 − T)(2.98 + 0.21X_2)
  = 3.02T + 0.24X_2T + 2.98 + 0.21X_2 − 2.98T − 0.21X_2T
  = 2.98 + 0.21X_2 + 0.04T + 0.03X_2T,

and X_1 has been eliminated. The univariate response variable, estimated with the LASSO, is

Ỹ̂ = −0.001 + 0.14X_2,

and only X_2 is included in the model.
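Example 3 can be reproduced approximately by simulation; the sketch below generates data from the stated model (with a small added noise term, which is an assumption of this sketch) and applies the difference_lasso function from the sketch after Algorithm 2. Exact coefficient values will vary with the sample, the random seed, and the selected λ.

```python
import numpy as np

rng = np.random.default_rng(123)
n = 4000
X1, X2 = rng.normal(size=n), rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(0.1 - 0.75 * X1 + 0.1 * X2))
T = rng.binomial(1, p)
# True model of Example 3, plus a small noise term (an assumption of this sketch).
Y = 1 + 4 * X1 + 0.2 * X2 + 3 * T + 0.45 * T * X2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([X1, X2])
train = np.arange(n) < n // 2
val = ~train
fit = difference_lasso(Y[train], T[train], X[train], Y[val], T[val], X[val])
print(fit.coef_)   # the X2 coefficient should be clearly nonzero, the X1 coefficient near zero
```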

Example 4 (Two effect modifiers) Assume a case where both variables are effect modifiers. Again, Y is the true model and Ŷ is the estimated model. The model is estimated by OLS and X_1 ∼ N(0, 1), X_2 = X_1², T ∼ Be(p(X_1, X_2)),

p(X_1, X_2) = [1 + exp(0.1 − 0.75X_1 + 0.1X_2)]^{−1},
Y = 1 + 2X_1 + 0.2X_2 + 3T + 2X_1T + 0.45X_2T,
Ŷ = 1.3 + 2.94X_1 + 0.42X_2 + 3.07T.

The univariate outcome variable becomes

Ỹ = T[Y(1) − Ŷ(0)] + (1 − T)[Ŷ(1) − Y(0)]
  = T(4 + 4X_1 + 0.65X_2 − 1.3 − 2.94X_1 − 0.42X_2) + (1 − T)(1.3 + 2.94X_1 + 0.42X_2 + 3.07 − 1 − 2X_1 − 0.2X_2)
  = T(2.7 + 1.06X_1 + 0.23X_2) + (1 − T)(3.37 + 0.94X_1 + 0.22X_2)
  = 2.7T + 1.06X_1T + 0.23X_2T + 3.37 + 0.94X_1 + 0.22X_2 − 3.37T − 0.94X_1T − 0.22X_2T
  = 3.37 + 0.94X_1 + 0.22X_2 − 0.67T + 0.12X_1T + 0.01X_2T.

The univariate outcome variable, estimated with the LASSO, is

Ỹ̂ = −0.001 + 0.83X_1 + 0.17X_2,

thus, both X_1 and X_2 are included in the model.

From this we can conclude that the difference LASSO is based on an incorrectly specified model. The result is that the parameters for the effect modifiers are biased, while covariates that are not effect modifiers are eliminated. This is a property that was not addressed in Ghosh et al. (2015). In terms of the traditional LASSO regression described in Equation 12, the difference LASSO results in identification of the effect modifiers, i.e., the covariates defining subgroups in which the treatment has different effects. From this information we can, e.g., stratify by the effect modifiers and estimate the ATE within the created groups using Equation 4 or 5; the estimated effects then reflect the effect modification.

3 Simulations

In this section a simulation study is presented. The data generating processes are described for the considered designs, and the dependencies between the variables are illustrated. The section ends with a presentation of the results from the simulation study.

3.1 Designs

We consider four different data generating processes for (T, X, Y(1), Y(0)), Designs A–D, described in Tables 3–4. All designs have a total of 20 covariates, of which ten are associated with neither the treatment nor the potential outcomes. Five covariates are discrete and the remaining are continuous. There are three variables, U_1, U_2 and U_3, which are generated in order to create dependencies between covariates, but these variables are not included in the analysis. Table 3 describes the data generating process for the covariates and the treatment variable, which is the same for all designs. The dependencies between the covariates, the treatment variable and the outcomes are described in the directed acyclic graph (DAG) in Figure 3. From this we see that there are twelve potential confounders, {X_1, X_2, ..., X_10} and {U_1, U_2}.

Figure 3: Illustration of the dependencies between the variables (a DAG over X_1, ..., X_20, U_1, U_2, U_3, T and Y(t)). Details about the data generating process can be found in Table 4.

Let X̃ denote a subset of the covariates in X. The probability of receiving treatment given the covariates, i.e., the propensity score, is generated as a function of this subset, i.e., Pr(T = 1 ∣ X = x) = p(X̃ = x̃). The potential outcomes are also generated from a subset of the covariates. The designs have different models for the potential outcomes. In Design A the models are linear in X, and in the other designs they are non-linear. In those designs, X_3 and X_9 enter through transformations. In Designs B–D, X_9 is modelled through a logarithm, and the transformation of X_3 differs between the designs. A more detailed description is given in Table 4, together with numerical approximations of the different causal parameters and the amount of effect modification in each design. In Figure 4 the assumption of overlap, Assumption 2 in Section 2.1, is evaluated. The figure shows the distribution of the propensity score for the different treatment groups, and it indicates that the assumption of overlap is valid.

Figure 4: Distribution of the propensity score for both treatment groups, i.e., the overlap, which is the same for all designs. The propensity score is the probability of receiving treatment, conditional on the covariates.

Table 3: Description of the data generating processes for the covariates and the treatment variable, where X̃ is a vector containing only the covariates that affect the treatment, X̃ ⊆ X.

The data generating processes

Covariates:
U_1 ∼ U(0, 1), X_1 ∼ Be(0.2U_1), X_2 ∼ Be(U_1), X_3 ∼ U(−0.5, 0.5), X_4 ∼ U(−1, 1),
X_5 ∼ N(0, Σ_{X5,X6}), X_6 ∼ N(0, Σ_{X5,X6}), Σ_{X5,X6} = ( 0.216 0.9 1.2 0.216 ),
X_7 ∼ N(0, Σ_{X7,X8}), X_8 ∼ N(0, Σ_{X7,X8}), Σ_{X7,X8} = ( 0.051 0.051 ),
X_9 ∼ U(0, 1), U_2 ∼ Po(3), X_10 ∼ Mult(π_j(X_9, U_2)), where j = 1, 2, 3 and
π_1 = exp(−0.25 − 0.02X_9 + 0.01U_2) / [1 + exp(−0.25 − 0.02X_9 + 0.01U_2)],
π_2 = exp(0.7 − 0.02X_9 + 0.01U_2) / [1 + exp(0.7 − 0.02X_9 + 0.01U_2)],
π_3 = 1 − π_1 − π_2,
U_3 ∼ U(0, 1), X_11 ∼ Be(U_3), X_12 ∼ Be(0.1U_3),
X_13 ∼ U(0, 1), X_14 ∼ U(−1, 1), X_15 ∼ N(0, 1),
X_16 ∼ N(0, Σ_{X16,X17}), X_17 ∼ N(0, Σ_{X16,X17}), Σ_{X16,X17} = ( 0.2 1 1 0.2 ),
X_18 ∼ N(0, 1), X_19 ∼ N(0, 1), X_20 ∼ N(2, 1.2).

Treatment:
T ∼ Be(p(X̃)),
p(X̃) = [1 + exp(0.02 + 0.09X_1 − 0.1X_2 + 0.2X_3 + 0.8X_4 − 0.01X_5 − 0.02X_6 − 0.02X_7 + 0.01X_8 + 0.01X_9 + 0.01X_10)]^{−1}.

Table 4: Description of the data generating processes for the different designs used in the simulation study. In all designs, X_1 and X_3 are effect modifiers, which is visible from the definitions of the potential outcomes. The models in Design A are linear in X; the other designs are not.

Design A
Outcome models:
  Y(1) = −2.5 + 3.7X_1 + 0.1X_2 + 1.9X_3 + 0.1X_4 + 0.2X_9 + ε_1, ε_1 ∼ N(0, 1),
  Y(0) = −2.1 + 0.3X_1 + 0.1X_2 − 1.1X_3 + 0.1X_4 + 0.2X_9 + ε_0, ε_0 ∼ N(0, 1).
Parameters (numerical approximations):
  E[T = 1 ∣ X = x] = 0.5, ATE = −0.08, ATE_unadj. = −0.12.
Effect modification:
  E[Y(1) − Y(0) ∣ X_1 = 0] = −0.41, E[Y(1) − Y(0) ∣ X_1 = 1] = 3,
  E[Y(1) − Y(0) ∣ X_3 ∈ [−0.5, −0.25]] = −1.18, E[Y(1) − Y(0) ∣ X_3 ∈ (−0.25, 0]] = −0.46,
  E[Y(1) − Y(0) ∣ X_3 ∈ (0, 0.25]] = 0.33, E[Y(1) − Y(0) ∣ X_3 ∈ (0.25, 0.5]] = 1.1.

Design B
Outcome models:
  Y(1) = 0.2 + 3.7X_1 + 0.1X_2 + 0.3(X_3 + 1)² − 0.1X_4 + 0.2 log(X_9) + ε_1, ε_1 ∼ N(0, 0.1),
  Y(0) = 3 + 0.3X_1 + 0.1X_2 − 2.1(X_3 + 1)² − 0.1X_4 + 0.2 log(X_9) + ε_0, ε_0 ∼ N(0, 0.1).
Parameters (numerical approximations):
  E[T = 1 ∣ X = x] = 0.5, ATE = 0.1, ATE_unadj. = 0.16.
Effect modification:
  E[Y(1) − Y(0) ∣ X_1 = 0] = −0.24, E[Y(1) − Y(0) ∣ X_1 = 1] = 3.17,
  E[Y(1) − Y(0) ∣ X_3 ∈ [−0.5, −0.25]] = −1.5, E[Y(1) − Y(0) ∣ X_3 ∈ (−0.25, 0]] = −0.64,
  E[Y(1) − Y(0) ∣ X_3 ∈ (0, 0.25]] = 0.58, E[Y(1) − Y(0) ∣ X_3 ∈ (0.25, 0.5]] = 2.12.

Design C
Outcome models:
  Y(1) = 0.2 + 3.7X_1 + 0.1X_2 + 0.3∣X_3 + 1∣ − 0.1X_4 + 0.2 log(X_9) + ε_1, ε_1 ∼ N(0, 1),
  Y(0) = 3 + 0.3X_1 + 0.1X_2 − 2.1∣X_3 + 1∣ − 0.1X_4 + 0.2 log(X_9) + ε_0, ε_0 ∼ N(0, 1).
Parameters (numerical approximations):
  E[T = 1 ∣ X = x] = 0.5, ATE = −0.07, ATE_unadj. = −0.03.
Effect modification:
  E[Y(1) − Y(0) ∣ X_1 = 0] = −0.41, E[Y(1) − Y(0) ∣ X_1 = 1] = 3.01,
  E[Y(1) − Y(0) ∣ X_3 ∈ [−0.5, −0.25]] = −0.96, E[Y(1) − Y(0) ∣ X_3 ∈ (−0.25, 0]] = −0.39,
  E[Y(1) − Y(0) ∣ X_3 ∈ (0, 0.25]] = 0.26, E[Y(1) − Y(0) ∣ X_3 ∈ (0.25, 0.5]] = 0.87.

Design D
Outcome models:
  Y(1) = 1.1 + 3.7X_1 + 0.1X_2 − 0.9 log(X_3 + 0.6) − 0.1X_4 + 0.2 log(X_9) + ε_1, ε_1 ∼ N(0, 1),
  Y(0) = −1.3 + 0.3X_1 + 0.1X_2 − 4.5 log(X_3 + 0.6) − 0.1X_4 + 0.2 log(X_9) + ε_0, ε_0 ∼ N(0, 1).
Parameters (numerical approximations):
  E[T = 1 ∣ X = x] = 0.5, ATE = 0.31, ATE_unadj. = 0.45.
Effect modification:
  E[Y(1) − Y(0) ∣ X_1 = 0] = −0.03, E[Y(1) − Y(0) ∣ X_1 = 1] = 3.38,
  E[Y(1) − Y(0) ∣ X_3 ∈ [−0.5, −0.25]] = −2.82, E[Y(1) − Y(0) ∣ X_3 ∈ (−0.25, 0]] = −0.01,
  E[Y(1) − Y(0) ∣ X_3 ∈ (0, 0.25]] = 1.58, E[Y(1) − Y(0) ∣ X_3 ∈ (0.25, 0.5]] = 2.67.
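A simplified sketch of the data generating process for Design A: only the covariates that enter the outcome models are generated in full, and the propensity score uses only those covariates, so this is an approximation of Table 3 rather than a complete implementation.

```python
import numpy as np

rng = np.random.default_rng(2022)
n = 1000

# Covariates entering the outcome models of Design A (Table 3, simplified).
U1 = rng.uniform(0, 1, n)
X1 = rng.binomial(1, 0.2 * U1)
X2 = rng.binomial(1, U1)
X3 = rng.uniform(-0.5, 0.5, n)
X4 = rng.uniform(-1, 1, n)
X9 = rng.uniform(0, 1, n)

# Treatment assignment with a simplified propensity score (only X1-X4 and X9 are used here).
lin = 0.02 + 0.09 * X1 - 0.1 * X2 + 0.2 * X3 + 0.8 * X4 + 0.01 * X9
T = rng.binomial(1, 1.0 / (1.0 + np.exp(lin)))

# Potential outcomes of Design A (Table 4); X1 and X3 are the effect modifiers.
Y1 = -2.5 + 3.7 * X1 + 0.1 * X2 + 1.9 * X3 + 0.1 * X4 + 0.2 * X9 + rng.normal(size=n)
Y0 = -2.1 + 0.3 * X1 + 0.1 * X2 - 1.1 * X3 + 0.1 * X4 + 0.2 * X9 + rng.normal(size=n)
Y = T * Y1 + (1 - T) * Y0
```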
