
Semiparametric inference with missing data: robustness to outliers and model misspecification


Robust semiparametric inference with missing data

Eva Cantoni 1 and Xavier de Luna 2

1 Research Center for Statistics and Geneva School of Economics and Management, University of Geneva, Geneva 4, 1211, Switzerland.

2 Department of Statistics, Umeå School of Business, Economics and Statistics, Umeå University, Umeå, SE-90187, Sweden.

October 8, 2018

Abstract

Classical semiparametric inference with missing outcome data is not robust to contamination of the observed data and a single observation can have arbitrarily large influence on estimation of a parameter of interest. This sensitivity is exacerbated when inverse probability weighting methods are used, which may overweight contaminated observations. We introduce inverse probability weighted, double robust and outcome regression estimators of location and scale parameters, which are robust to contamination in the sense that their influence function is bounded. We give asymptotic properties and study finite sample behaviour. Our simulated experiments show that contamination can be a more serious threat to the quality of inference than model misspecification. An interesting aspect of our results is that the auxiliary outcome model, used to adjust for ignorable missingness by some of the estimators, is also useful to protect against contamination. We also illustrate through a case study how both adjustment to ignorable missingness and protection against contamination are achieved through weighting schemes, which can be contrasted to gain further insights.

Keywords: doubly robust estimator; influence function; inverse probability weighting; outcome regression.

arXiv:1803.08764v2 [stat.ME] 5 Oct 2018


1 Introduction

Many data analyses are concerned with drawing inference on a parameter β partially characterising a distribution law of interest from which data is assumed to be a random sample. However, most often the observed data deviates from this ideal random sample scenario, for instance because some of the observations are contaminated, i.e. drawn from a nuisance distribution law. Another common deviation is that the random sample is incomplete: some data are missing, due to dropout in follow up studies, non-response in surveys, etc. Such corrupted random samples are indeed the rule rather than the exception in applications, and we give a telling example in Section 4, where we study BMI change in a ten year follow up study. While methods are available to deal with these two different problems separately as described below, it is essential to have inferential methods able to deal with situations where both types of corruption (missingness and contamination) arise simultaneously. Indeed, while it is well known that many estimating procedures, including OLS, ML, and method of moments, lack robustness to contamination (a single observation can have arbitrarily large influence, e.g., Hampel et al., 1986; Heritier et al., 2009), it is seldom acknowledged that this sensitivity to contamination can be exacerbated with estimators able to deal with missing data; see, however, Hulliger (1995) and Beaumont et al. (2013), where this increased sensitivity has been pointed out in the context of surveys of finite populations. The potentially increased sensitivity to contamination arises in particular for estimators overweighting some of the observations (those representing part of the data which are missing): if the overweighted observations are by chance contaminated, this will have a large negative impact on the inference. Thus, while robust methods are important in general, they are even more so when missing data needs to be accounted for.

In this paper we focus on situations where, while observations may be missing for the response, there is a set of background variables (covariates) which are observed for all units, and we can assume that outcomes are independent of missingness given the observed covariates (ignorable missingness assumption). Under the latter assumption, auxiliary (nuisance) models explaining the missingness mechanism and the outcome given the covariates can be combined in different ways to obtain semiparametric estimators of β. Classical examples include inverse probability weighted estimators (IPW), using the missingness mechanism model as weights (Horvitz and Thompson, 1952), and augmented inverse probability weighted estimators (AIPW, Robins et al., 1994) using both auxiliary models. AIPW estimators are then robust to the misspecification of one of these two auxiliary models at a time (thus the name doubly robust estimator often used); see, e.g., Tsiatis (2006), Rotnitzky and Vansteelandt (2015). Finally, outcome regression imputation (OR) estimators using only the model for the outcome may also be used, thereby avoiding weighting (Kang and Schafer, 2007; Tan, 2007).

Within this context of ignorable missing data in the outcome, we introduce and study estimators that are able to deal with situations where most of the units in the sample are randomly drawn from the distribution of interest while a smaller number of units is possibly drawn from another nuisance distribution. An estimator is considered robust to such contamination if it has bounded influence function, see Hampel et al. (1986). This is because the influence function measures the asymptotic bias due to an infinitesimal contamination. A single observation can thus yield arbitrarily large bias if the influence function of the estimator is not bounded. Classical IPW, AIPW and OR estimators have unbounded influence function. They are not robust in this sense, even though AIPW has a robustness property, but merely to misspecification of one of the auxiliary models used. In a full data and finite parametric context, bounded influence function estimators are most naturally introduced as M-estimators (Huber, 1964; Hampel, 1974). Here we take advantage of the fact that IPW, AIPW and OR estimators are partial M-estimators (Newey and McFadden, 1994; Stefanski and Boos, 2002; Zhelonkin et al., 2012) to propose bounded influence function estimators. An interesting result of the introduced estimators is that the auxiliary outcome regression model used by AIPW to improve on efficiency compared to IPW, happens to also be useful in improving on the robustness properties of AIPW and OR. Robustness to contamination is typically obtained at the price of a loss in efficiency, although the latter can be controlled and set to say approximately 5% under some conditions.

On the other hand, our simulated experiments show that moderate contamination seriously affects the quality of classical semiparametric inference, more so than model misspecification. Our approach is general and we fully spell out the case where β is the two dimensional location-scale parameter.

The paper is organized as follows. Section 2 presents formally the context, and introduces robust estimators for missing data situations, together with their asymptotic properties. Section 3 studies finite sample properties through simulation designs previously used by Lunceford and Davidian (2004), to which we have added several contamination schemes. This allows us to study robustness due both to model misspecification and to contamination. In Section 4 a longitudinal study of BMI based on electronic record linkage data is used to illustrate, e.g., how the robustification introduced can be seen as a weighting scheme which can be compared to the weighting used to correct for ignorable missingness. The paper is concluded with a discussion in Section 5. Regularity conditions, proofs, implementation details and exhaustive results from the simulations are relegated to the Appendix.

2 Theory and method

2.1 Notation and context

Let a vector variable $Z$ be partitioned as $(Z_2', Z_1')'$, and consider the ideal situation when $(Z_{2i}, Z_{1i}, i = 1, \ldots, n)$ are independently drawn from a probability law with density $p(Z_{2i}, Z_{1i}; \beta, \eta) = p(Z_{2i}; \beta, \eta)\, p(Z_{1i} \mid Z_{2i}; \eta)$ for unknown values $\beta = \beta_0$ and $\eta = \eta_0$, where $\beta$, of finite dimension, is the parameter of interest describing some aspects of the distribution, $\eta$ is a nuisance parameter possibly of infinite dimension, and $\beta$ and $\eta$ are variationally independent (semiparametric model; see Tsiatis, 2006, Chap. 4). We consider simultaneously two types of deviation from the above ideal random sample setting.

First, situations where atypical observations can occur in $Z_{2i}$ (and possibly $Z_{1i}$), i.e. where the majority of the data is generated as described above, but some of the observations may be issued from a different, but unknown, distribution. The final goal is to draw inference about $\beta$, even in the presence of a small fraction of spurious data points.

Further, we also want to allow for incomplete data situations, where we observe only $(R_i Z_{2i}, Z_{1i}, R_i, i = 1, \ldots, n)$, with $R_i$ a binary variable indicating the observation status of $Z_{2i}$: $R_i = 1$ if observed and $R_i = 0$ if missing. We make throughout the missing at random assumption (also called ignorable missingness), i.e. $\Pr(R_i = 1 \mid Z_{2i}, Z_{1i}) = \pi(Z_{1i})$, with $\pi(Z_{1i}) > 0$ on the support of $Z_{1i}$. The missing assignment mechanism is modelled up to a parameter $\gamma$, $\pi(Z_{1i}; \gamma)$, and we distinguish cases where this model is correctly specified, i.e. $\Pr(R_i = 1 \mid Z_{1i}) = \pi(Z_{1i}; \gamma_0)$ for a given but unknown $\gamma_0$, and cases where it is misspecified, i.e. an incorrect model for $\Pr(R_i = 1 \mid Z_{1i})$ is used.

2.2 Full data case: robust M-estimators

Let us first consider an estimating function $m(Z_2; \beta)$, which would be used if we had no missing data ($R_i = 1$ for all $i$):

$$\sum_{i=1}^{n} m(Z_{2i}; \beta) = 0. \tag{1}$$

The choice of $m(Z_{2i}; \beta)$ may be made based on desired properties for the resulting M-estimator for $\beta$ (in the complete data case); e.g., such that $E(m(Z_{2i}; \beta_0)) = 0$ for consistency. The study of robustness properties to contamination was formalised in Hampel (1974). The influence function plays a central role because it can be interpreted as measuring the asymptotic bias due to an infinitesimal contamination. Here, the influence function for the resulting estimator $\hat\beta$ solution of (1) is

$$\left[ E\left\{ -\frac{\partial m(Z_{2i}; \beta)}{\partial \beta} \right\} \right]^{-1} m(Z_{2i}; \beta) \tag{2}$$

under suitable regularity conditions (Stefanski and Boos, 2002).

In the sequel we focus on the location-scale parameter $\beta = (\mu = E(Z_{2i}), \sigma^2 = Var(Z_{2i}))'$. A commonly used choice is $m(Z_{2i}; \beta) = (Z_{2i} - \mu, (Z_{2i} - \mu)^2 - \sigma^2)'$, because the resulting estimator is efficient in the Gaussian case. For this choice of estimating function $m$, the influence function is not bounded in $Z_{2i}$ and the resulting estimator is therefore not robust to contamination; see, e.g., Maronna et al. (2006, Chap. 2).
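As a short worked check of this claim, expression (2) can be evaluated for the classical choice of $m$, writing $\beta = (\mu, \sigma^2)'$ and differentiating with respect to $\beta'$ (a sketch that takes the regularity conditions for granted):

$$\frac{\partial m(Z_{2i}; \beta)}{\partial \beta'} = \begin{pmatrix} -1 & 0 \\ -2(Z_{2i} - \mu) & -1 \end{pmatrix}, \qquad E\left\{ -\frac{\partial m(Z_{2i}; \beta_0)}{\partial \beta'} \right\} = \begin{pmatrix} 1 & 0 \\ 2E(Z_{2i} - \mu_0) & 1 \end{pmatrix} = I_2,$$

so that the influence function reduces to $m(Z_{2i}; \beta_0) = (Z_{2i} - \mu_0, (Z_{2i} - \mu_0)^2 - \sigma_0^2)'$. Its first component grows linearly and its second quadratically in $Z_{2i}$, hence neither is bounded.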

A general class of M-estimators for $\mu$ and $\sigma^2$ is obtained as solution of (1) for

$$m_\psi(Z_{2i}; \beta) = \begin{pmatrix} \psi_{c_\mu}\!\left( \dfrac{Z_{2i} - \mu}{\sigma} \right) - A \\[1.5ex] \psi^2_{c_\sigma}\!\left( \dfrac{Z_{2i} - \mu}{\sigma} \right) - B \end{pmatrix}, \tag{3}$$

where $\psi_c(\cdot)$ is an odd function, and where $A = E\{\psi_{c_\mu}(\sigma_0^{-1}(Z_{2i} - \mu_0))\}$ and $B = E\{\psi^2_{c_\sigma}(\sigma_0^{-1}(Z_{2i} - \mu_0))\}$ in order to ensure that $E(m_\psi(Z_{2i}; \beta)) = 0$ at $\beta_0 = (\mu_0, \sigma_0)$, the true unknown value for $(\mu, \sigma)$. Bounded influence function estimators are obtained by using bounded $\psi_c(\cdot)$ functions, e.g., the Huber function $\psi_c(t) = \min\{c, \max\{t, -c\}\}$, and the Tukey biweight function $\psi_c(t) = ((t/c)^2 - 1)^2 t$ if $|t| < c$ and 0 otherwise; see Heritier et al. (2009) for further details. The value for $c$ can be chosen appropriately to control efficiency under the non-contaminated Gaussian case. Equations (1) using (3) need to be solved simultaneously for $\mu$ and $\sigma$.
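The two ψ functions named above can be written down directly. The following is a minimal sketch; the default tuning constants 1.345 and 4.685 are the conventional choices for roughly 95% Gaussian efficiency of a location estimator and are an assumption here, not the values used in the paper (those are given in its Appendix D.1).

```python
import numpy as np

def huber_psi(t, c=1.345):
    """Huber psi: min{c, max{t, -c}}, i.e. t clipped to [-c, c]."""
    return np.clip(t, -c, c)

def tukey_psi(t, c=4.685):
    """Tukey biweight psi: ((t/c)^2 - 1)^2 * t if |t| < c, and 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < c, ((t / c) ** 2 - 1.0) ** 2 * t, 0.0)

# Both functions are odd and bounded; Tukey's psi redescends to zero,
# so gross outliers receive no influence at all.
grid = np.linspace(-10.0, 10.0, 2001)
huber_values = huber_psi(grid)
tukey_values = tukey_psi(grid)
```

Huber's ψ keeps a constant influence for large residuals, whereas Tukey's ψ returns exactly zero beyond c.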

2.3 Robust estimation with missing data

Semiparametric estimation with missing data has been reviewed for instance in Tsiatis (2006). We introduce below novel bounded influence function estimators.

Let $\pi(Z_{1i}; \gamma)$ be a well specified parametric model, i.e. such that $\Pr(R_i = 1 \mid Z_{1i}) = \pi(Z_{1i}; \gamma_0)$ for an unknown value $\gamma_0$. Assume that we have an estimator $\hat\gamma$ of $\gamma$ solution of estimating equations

$$\sum_{i=1}^{n} m_\gamma(R_i, Z_{1i}; \gamma) = 0, \tag{4}$$

such that $\text{p-lim}_{n \to \infty} \hat\gamma = \gamma_0$.

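Definition 1. A robust inverse probability weighted (RIPW) estimator $(\hat\mu^{RIPW}, \hat\sigma^{RIPW})'$ of $(\mu, \sigma)'$ is solution of the estimating equation:

$$\sum_{i=1}^{n} \varphi^{RIPW}(Z_i, R_i; \beta, \hat\gamma) = 0, \tag{5}$$

where

$$\varphi^{RIPW}(Z_i, R_i; \beta, \gamma) = \begin{pmatrix} \dfrac{R_i \left\{ \psi_{c_\mu}\!\left( \sigma^{-1}(Z_{2i} - \mu) \right) - A \right\}}{\pi(Z_{1i}; \gamma)} \\[2ex] \dfrac{R_i \left\{ \psi^2_{c_\sigma}\!\left( \sigma^{-1}(Z_{2i} - \mu) \right) - B \right\}}{\pi(Z_{1i}; \gamma)} \end{pmatrix},$$

with $A = E\{\psi_{c_\mu}(\sigma_0^{-1}(Z_{2i} - \mu_0))\}$ and $B = E\{\psi^2_{c_\sigma}(\sigma_0^{-1}(Z_{2i} - \mu_0))\}$.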

A similar estimator was proposed and studied in Hulliger (1995), although in the context of finite populations and surveys. Note that letting $\psi_c(t) = t$, the identity function, yields a classical inverse probability weighted estimator (Horvitz and Thompson, 1952).

Remark 1. RIPW estimation can be interpreted as a double weighting scheme estimator, where $Z_{2i}$ observations are weighted with inverse propensity scores $1/\pi(Z_{1i}; \gamma)$ (i.e., observations lying on the covariate support where the probability of dropout is higher are overweighted) and with $\psi$ weights $\psi_{c_\mu}(\sigma^{-1}(Z_{2i} - \mu)) / (\sigma^{-1}(Z_{2i} - \mu))$ (i.e., outlying observations are downweighted). These weights as well as the compound weights $1/\pi(Z_{1i}; \gamma) \times \psi_{c_\mu}(\sigma^{-1}(Z_{2i} - \mu)) / (\sigma^{-1}(Z_{2i} - \mu))$ may be looked at in applications to gain insight into how the two weighting schemes interact; see Section 4 for an illustration.
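The double weighting interpretation of Remark 1 can be sketched in a few lines. The function name `compound_weights`, its arguments, and the Tukey constant 4.685 are illustrative assumptions; the propensities and the values of µ and σ are taken as given rather than estimated.

```python
import numpy as np

def tukey_psi(t, c):
    """Tukey biweight psi: ((t/c)^2 - 1)^2 * t if |t| < c, and 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < c, ((t / c) ** 2 - 1.0) ** 2 * t, 0.0)

def compound_weights(z2, r, pi, mu, sigma, c=4.685):
    """Return (inverse propensity weights, psi weights, compound weights).

    z2 may hold NaN where r == 0; such units get inverse propensity weight 0.
    """
    z2 = np.asarray(z2, dtype=float)
    r = np.asarray(r)
    pi = np.asarray(pi, dtype=float)
    # Replace unobserved outcomes by mu so that no NaN propagates.
    res = (np.where(r == 1, z2, mu) - mu) / sigma
    # psi(t)/t, with its limiting value 1 at t = 0.
    res_safe = np.where(res == 0.0, 1.0, res)
    psi_w = np.where(res == 0.0, 1.0, tukey_psi(res, c) / res_safe)
    ipw = r / pi
    return ipw, psi_w, ipw * psi_w

# Hypothetical toy data: the third unit is a gross outlier, the fourth is missing.
z2 = np.array([0.0, 1.0, 100.0, np.nan])
r = np.array([1, 1, 1, 0])
pi = np.array([0.5, 0.25, 0.5, 0.5])
ipw, psi_w, compound = compound_weights(z2, r, pi, mu=0.0, sigma=1.0)
```

In this toy example the second unit is strongly upweighted by its low propensity, while the outlying third unit ends up with compound weight zero despite its inverse propensity weight of 2.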

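Proposition 1. Let $\pi(Z_{1i}; \gamma)$ be correctly specified with (4) such that $\text{p-lim}_{n \to \infty} \hat\gamma = \gamma_0$. Then, under regularity conditions given in Appendix B, $(\hat\mu^{RIPW}, \hat\sigma^{RIPW})'$ is consistent for $(\mu_0, \sigma_0)'$ and has the following asymptotic multivariate normal distribution as $n \to \infty$:

$$\sqrt{n} \left\{ (\hat\mu^{RIPW}, \hat\sigma^{RIPW})' - (\mu_0, \sigma_0)' \right\} \xrightarrow{d} N\left( 0, E\left\{ IF^{RIPW} (IF^{RIPW})' \right\} \right),$$

where $IF^{RIPW}$ is the influence function:

$$IF^{RIPW}(Z_i, R_i; \beta) = -\left[ E\left\{ \frac{\partial m_\psi(Z_{2i}; \beta)}{\partial \beta'} \right\} \right]^{-1} \Bigg( \varphi^{RIPW}(Z_i, R_i; \beta, \gamma_0) - E\left\{ \frac{\partial \varphi^{RIPW}(Z_i, R_i; \beta, \gamma_0)}{\partial \gamma'} \right\} \left[ E\left\{ \frac{\partial m_\gamma(R_i, Z_{1i}; \gamma_0)}{\partial \gamma'} \right\} \right]^{-1} m_\gamma(R_i, Z_{1i}; \gamma_0) \Bigg). \tag{6}$$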

Thus, from (6) we see that the influence function of RIPW is bounded in $Z_{2i}$ if the function $\psi_c(\cdot)$ is bounded. This is not the case for the classical IPW, corresponding to $\psi_c(t) = t$.
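The practical meaning of a bounded influence function can be illustrated numerically. The sketch below solves the RIPW equations (5) by a simple fixed-point iteration and compares the result with a classical weighted mean on contaminated data. Several choices are assumptions made for illustration only: the Huber ψ with a single constant c = 1.345 for both equations, A = 0 and B computed in closed form under a standard normal approximation, known propensities π, the iteration scheme itself, and the toy data generating process in `demo`.

```python
import math
import numpy as np

def huber_psi(t, c):
    """Huber psi: t clipped to [-c, c]."""
    return np.clip(t, -c, c)

def huber_B(c):
    """Closed form of B = E[psi_c(Z)^2] for Z ~ N(0, 1)."""
    Phi = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    phi = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    return (2.0 * Phi - 1.0) - 2.0 * c * phi + 2.0 * c * c * (1.0 - Phi)

def ripw_location_scale(z2, r, pi, c=1.345, n_iter=500, tol=1e-10):
    """Fixed-point iteration for the two weighted equations in (5), with A = 0."""
    obs = r == 1
    z, w = z2[obs], 1.0 / pi[obs]
    B = huber_B(c)
    mu = np.median(z)                                # robust starting values
    sigma = 1.4826 * np.median(np.abs(z - mu))
    for _ in range(n_iter):
        t = (z - mu) / sigma
        psi = huber_psi(t, c)
        t_safe = np.where(t == 0.0, 1.0, t)
        u = np.where(t == 0.0, 1.0, psi / t_safe)    # psi weights psi(t)/t
        mu_new = np.sum(w * u * z) / np.sum(w * u)
        sigma_new = sigma * math.sqrt(np.sum(w * psi ** 2) / (B * np.sum(w)))
        converged = abs(mu_new - mu) + abs(sigma_new - sigma) < tol
        mu, sigma = mu_new, sigma_new
        if converged:
            break
    return mu, sigma

def demo(n=20000, seed=1):
    """Toy example: Z2 is marginally N(1, 2); 2% of observed outcomes are set to 50."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = 1.0 + z1 + rng.standard_normal(n)
    pi = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * z1)))     # Pr(R = 1 | Z1), assumed known
    r = rng.binomial(1, pi)
    observed = np.flatnonzero(r == 1)
    bad = rng.choice(observed, size=observed.size // 50, replace=False)
    z2[bad] = 50.0
    mu_rob, sigma_rob = ripw_location_scale(z2, r, pi)
    w = r / pi
    mu_cls = np.sum(w * z2) / np.sum(w)              # classical weighted mean
    return mu_rob, sigma_rob, mu_cls

mu_rob, sigma_rob, mu_cls = demo()
```

At a fixed point of the iteration both weighted sums in (5) vanish, so the returned pair solves the estimating equations under the stated assumptions. In this toy setting the robust estimates should stay close to the uncontaminated values, while the classical weighted mean is pulled towards the contaminating value.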

The implementation of RIPW requires the computation of $A$ and $B$. If the standardized quantity $\sigma_0^{-1}(Z_{2i} - \mu_0)$ is satisfactorily approximated by a $N(0, 1)$ variate, then $A = 0$ (since $\psi_c$ is odd) and $B$ can be approximated by Monte Carlo simulations.
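A minimal sketch of this Monte Carlo approximation, under the N(0, 1) assumption stated above, could look as follows. The function name `monte_carlo_AB`, the number of draws and the tuning constants are illustrative assumptions.

```python
import numpy as np

def huber_psi(t, c=1.345):
    """Huber psi: t clipped to [-c, c]."""
    return np.clip(t, -c, c)

def tukey_psi(t, c=4.685):
    """Tukey biweight psi: ((t/c)^2 - 1)^2 * t if |t| < c, and 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < c, ((t / c) ** 2 - 1.0) ** 2 * t, 0.0)

def monte_carlo_AB(psi, n_draws=200_000, seed=0):
    """Approximate A = E[psi(Z)] and B = E[psi(Z)^2] for Z ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_draws)
    values = psi(z)
    return values.mean(), (values ** 2).mean()

A_huber, B_huber = monte_carlo_AB(huber_psi)
A_tukey, B_tukey = monte_carlo_AB(tukey_psi)
```

The estimate of A should be close to zero for any odd ψ, which gives a cheap sanity check on the simulation size.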

In an attempt to improve efficiency one may consider $h(Z_{1i}; \beta, \xi)$, a working model (parametrised with $\xi$) for $E(m(Z_{2i}; \beta) \mid Z_{1i})$. This model is correctly specified if $h(Z_{1i}; \beta, \xi_0) = E(m(Z_{2i}; \beta) \mid Z_{1i})$ for a value $\xi_0$. However, we call it a working model because we will also consider situations where it is not necessarily correctly specified. Assume we have estimators $\hat\gamma$ of $\gamma$ and $\hat\xi$ of $\xi$, respectively solutions of (4) and

$$\sum_{i=1}^{n} R_i\, m_\xi(Z_i; \xi) = 0, \tag{7}$$

such that $\text{p-lim}_{n \to \infty} \hat\xi = \xi^*$ and $\text{p-lim}_{n \to \infty} \hat\gamma = \gamma^*$, for some fixed values $\xi^*$ and $\gamma^*$. In the correctly specified cases $\xi^* = \xi_0$ and $\gamma^* = \gamma_0$.

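Definition 2. A robust augmented IPW (RAIPW) estimator $(\hat\mu^{RAIPW}, \hat\sigma^{RAIPW})'$ of $(\mu, \sigma)'$ is solution of the estimating equation:

$$\sum_{i=1}^{n} \varphi^{RAIPW}(Z_i, R_i; \beta, \hat\gamma, \hat\xi) = 0, \tag{8}$$

where

$$\varphi^{RAIPW}(Z_i, R_i; \beta, \gamma, \xi) = \begin{pmatrix} \dfrac{R_i \left\{ \psi_{c_\mu}\!\left( \sigma^{-1}(Z_{2i} - \mu) \right) - A \right\}}{\pi(Z_{1i}; \gamma)} - \left[ \dfrac{R_i - \pi(Z_{1i}; \gamma)}{\pi(Z_{1i}; \gamma)}\, h_1(Z_{1i}; \beta, \xi) \right] \\[2ex] \dfrac{R_i \left\{ \psi^2_{c_\sigma}\!\left( \sigma^{-1}(Z_{2i} - \mu) \right) - B \right\}}{\pi(Z_{1i}; \gamma)} - \left[ \dfrac{R_i - \pi(Z_{1i}; \gamma)}{\pi(Z_{1i}; \gamma)}\, h_2(Z_{1i}; \beta, \xi) \right] \end{pmatrix},$$

$h_1(Z_{1i}; \beta, \xi)$ is a working model for $E\{\psi_{c_\mu}(\sigma^{-1}(Z_{2i} - \mu)) - A \mid Z_{1i}\}$ and $h_2(Z_{1i}; \beta, \xi)$ for $E\{\psi^2_{c_\sigma}(\sigma^{-1}(Z_{2i} - \mu)) - B \mid Z_{1i}\}$, and $A = E\{\psi_{c_\mu}(\sigma_0^{-1}(Z_{2i} - \mu_0))\}$ and $B = E\{\psi^2_{c_\sigma}(\sigma_0^{-1}(Z_{2i} - \mu_0))\}$.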

Using the identity function for $\psi_c$ yields a classical augmented inverse probability weighting (AIPW) estimator (Robins et al., 1994).

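Proposition 2. Let $\pi(Z_{1i}; \gamma)$ be correctly specified with (4) such that $\text{p-lim}_{n \to \infty} \hat\gamma = \gamma_0$ and/or let $h(Z_{1i}; \beta, \xi) = (h_1(Z_{1i}; \beta, \xi), h_2(Z_{1i}; \beta, \xi))'$ be correctly specified with (7) such that $\text{p-lim}_{n \to \infty} \hat\xi = \xi_0$. Then, under regularity conditions given in Appendix A, $(\hat\mu^{RAIPW}, \hat\sigma^{RAIPW})'$ is consistent for $(\mu_0, \sigma_0)'$ and has the following asymptotic multivariate normal distribution as $n \to \infty$:

$$\sqrt{n} \left\{ (\hat\mu^{RAIPW}, \hat\sigma^{RAIPW})' - (\mu_0, \sigma_0)' \right\} \xrightarrow{d} N\left( 0, E\left\{ IF^{RAIPW} (IF^{RAIPW})' \right\} \right),$$

where

$$\begin{aligned} IF^{RAIPW}(Z_i, R_i; \beta) = -\left[ E\left\{ \frac{\partial m_\psi(Z_{2i}; \beta)}{\partial \beta'} \right\} \right]^{-1} \Bigg( & \varphi^{RAIPW}(Z_i, R_i; \beta, \gamma^*, \xi^*) \\ & - E\left\{ \frac{\partial \varphi^{RAIPW}(Z_i, R_i; \beta, \gamma^*, \xi^*)}{\partial \gamma'} \right\} \left[ E\left\{ \frac{\partial m_\gamma(R_i, Z_{1i}; \gamma^*)}{\partial \gamma'} \right\} \right]^{-1} m_\gamma(R_i, Z_{1i}; \gamma^*) \\ & - E\left\{ \frac{\partial \varphi^{RAIPW}(Z_i, R_i; \beta, \gamma^*, \xi^*)}{\partial \xi'} \right\} \left[ E\left\{ \frac{\partial m_\xi(Z_i; \xi^*)}{\partial \xi'} \right\} \right]^{-1} m_\xi(Z_i; \xi^*) \Bigg). \end{aligned} \tag{9}$$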

Thus, RAIPW is, like AIPW, doubly robust in the sense that only one of the two auxiliary models used must be correctly specified in order to obtain consistent and asymptotically normal estimators. Moreover, the influence function of RAIPW is bounded in $Z_{2i}$ if the function $\psi_c(\cdot)$ is bounded (assuming the estimating equation (7) of the auxiliary model also has a bounded influence function in $Z_{2i}$; see Example 1), while this is not the case for the classical AIPW.

Example 1 (RAIPW estimator of location and scale). Let us specify a working model parametrized by $\xi = (\xi_1', \xi_2)'$ as

$$Z_{2i} = \tilde h(Z_{1i}; \xi_1) + \xi_2 \nu, \tag{10}$$

with $\nu \sim N(0, 1)$. Note that this does not constrain $Z_{2i}$ to have a symmetric distribution as was the case for RIPW. The corresponding working model for $E(m(Z_{2i}; \beta) \mid Z_{1i})$ is such that $h_1(Z_{1i}; \beta, \xi) = \tilde h(Z_{1i}; \xi_1) - \mu$ and $h_2(Z_{1i}; \beta, \xi) = (\tilde h(Z_{1i}; \xi_1) - \mu)^2 + \xi_2^2 - \sigma^2$. Estimators of $\xi$ with bounded influence function in this context are described in Appendix C.2. Then,

$$h_1(Z_{1i}; \beta, \xi) = E\left\{ \psi_{c_\mu}\!\left( \sigma^{-1}(\tilde h(Z_{1i}; \xi_1) + \xi_2 \nu - \mu) \right) \mid Z_{1i} \right\} - A, \tag{11}$$

$$h_2(Z_{1i}; \beta, \xi) = E\left\{ \psi^2_{c_\sigma}\!\left( \sigma^{-1}(\tilde h(Z_{1i}; \xi_1) + \xi_2 \nu - \mu) \right) \mid Z_{1i} \right\} - B, \tag{12}$$

may be computed using numerical integration for the conditional expectations, where $E(\cdot \mid Z_{1i})$ is the expectation under model (10). Under the latter model, $A = 0$ and Monte Carlo simulations can be used to obtain $B$. Both (11) and (12) can be used to obtain RAIPW estimators through (8). See Appendix C for more implementation details.
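One possible way to carry out the numerical integration in (11) and (12) is Gauss-Hermite quadrature over ν. The sketch below is an illustration under assumptions: the quadrature rule and its degree, the Tukey ψ, the function name `h_robust` and the example values are choices made here and not taken from the paper's Appendix C.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def tukey_psi(t, c):
    """Tukey biweight psi: ((t/c)^2 - 1)^2 * t if |t| < c, and 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) < c, ((t / c) ** 2 - 1.0) ** 2 * t, 0.0)

def h_robust(h_tilde, xi2, mu, sigma, c_mu, c_sigma, A=0.0, B=0.0, deg=60):
    """Approximate (11) and (12) for each fitted value h_tilde under model (10)."""
    # Nodes and weights for the weight function exp(-x^2 / 2); dividing the
    # weights by sqrt(2 pi) turns the rule into an expectation under N(0, 1).
    nodes, weights = hermegauss(deg)
    weights = weights / np.sqrt(2.0 * np.pi)
    h_tilde = np.atleast_1d(np.asarray(h_tilde, dtype=float))
    # Standardised outcome under model (10), one row per unit, one column per node.
    t = (h_tilde[:, None] + xi2 * nodes[None, :] - mu) / sigma
    h1 = tukey_psi(t, c_mu) @ weights - A
    h2 = (tukey_psi(t, c_sigma) ** 2) @ weights - B
    return h1, h2

# Hypothetical fitted values of the outcome model for three units.
fitted = np.array([-1.0, 1.0, 3.0])
h1, h2 = h_robust(fitted, xi2=1.0, mu=1.0, sigma=2.0, c_mu=4.685, c_sigma=4.685)
```

Because ψ is only piecewise smooth, the quadrature is an approximation; a unit whose fitted value equals µ gets h1 equal to zero by symmetry when A = 0.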

Finally, when the outcome model is correctly specified yet another robust estimator can be introduced.

Definition 3. A robust outcome regression estimator (ROR) $(\hat\mu^{ROR}, \hat\sigma^{ROR})'$ of $(\mu, \sigma)'$ is solution of

$$\sum_{i=1}^{n} h(Z_{1i}; \beta, \hat\xi) = \sum_{i=1}^{n} \varphi^{ROR}(Z_{1i}; \beta, \hat\xi) = 0 \tag{13}$$

by using a correctly specified working model $h(Z_{1i}; \beta, \xi_0) = E(m(Z_{2i}; \beta) \mid Z_{1i})$ together with $\hat\xi$, an M-estimator (7) of $\xi$ with bounded influence function.

Proposition 3. Let $h(Z_{1i}; \beta, \xi) = (h_1(Z_{1i}; \beta, \xi), h_2(Z_{1i}; \beta, \xi))'$ be correctly specified with (7) such that $\text{p-lim}_{n \to \infty} \hat\xi = \xi_0$. Then, under regularity conditions given in Appendix B, $(\hat\mu^{ROR}, \hat\sigma^{ROR})'$ is consistent for $(\mu_0, \sigma_0)'$ and has the following asymptotic multivariate normal distribution as $n \to \infty$:

$$\sqrt{n} \left\{ (\hat\mu^{ROR}, \hat\sigma^{ROR})' - (\mu_0, \sigma_0)' \right\} \xrightarrow{d} N\left( 0, E\left\{ IF^{ROR} (IF^{ROR})' \right\} \right),$$

where

$$IF^{ROR}(Z_i, R_i; \beta) = -\left[ E\left\{ \frac{\partial \varphi^{ROR}(Z_{1i}; \beta, \xi_0)}{\partial \beta'} \right\} \right]^{-1} \Bigg( \varphi^{ROR}(Z_{1i}; \beta, \xi_0) - E\left\{ \frac{\partial \varphi^{ROR}(Z_{1i}; \beta, \xi_0)}{\partial \xi'} \right\} \left[ E\left\{ \frac{\partial m_\xi(Z_i; \xi_0)}{\partial \xi'} \right\} \right]^{-1} m_\xi(Z_i; \xi_0) \Bigg). \tag{14}$$

Example 2 (ROR estimator of location and scale). Within the context of Example 1, assume that model (10) holds. Then, $h(Z_{1i}; \beta, \xi) = (\tilde h(Z_{1i}; \xi_1) - \mu, (\tilde h(Z_{1i}; \xi_1) - \mu)^2 + \xi_2^2 - \sigma^2)'$, and $\xi$ is estimated with a bounded influence function estimator; see Appendix C.2 for details.

Unlike for RAIPW, the regularity conditions apply to the working model $h(Z_{1i}; \beta, \xi)$ only. For instance, to characterise the influence function one needs to be more specific about the working model (which needs to be correctly specified). On the other hand, the results of Propositions 1 and 2 for RIPW and RAIPW, respectively, give specifically the regularity conditions that must apply to the $\psi_c$ functions used, and the resulting influence functions.

We have focused on robustness properties to contamination in the outcome $Z_{2i}$. Contamination in the covariates $Z_{1i}$ may also happen. This is typically tackled by using Tukey's redescending $\psi$ function, which protects against high leverage points, i.e. outlying values in the design space; see, e.g., Maronna et al. (2006, Chap. 4 and 5) and Cantoni and Ronchetti (2001).

3 Simulation experiments

We present a large simulation exercise to assess several aspects of our procedure for the joint estimation of location and scale: behaviour for clean data, robustness to the presence of contamination, and sensitivity to model misspecification.

3.1 Simulation setting

We implement the same simulation design as Lunceford and Davidian (2004). We consider the covariates $X = (X_1, X_2, X_3)'$ associated with both the missingness mechanism and the outcome, and the covariates $V = (V_1, V_2, V_3)'$ which are associated only with the outcome. The variables $(X_1, X_2, X_3, V_1, V_2, V_3)'$ are realizations of the joint distribution of $(X', V')'$ built by first taking $X_3 \sim \text{Bernoulli}(0.2)$. Then, conditionally on $X_3$, $V_3$ is generated as Bernoulli with $\Pr(V_3 = 1 \mid X_3) = 0.75 X_3 + 0.25 (1 - X_3)$ and finally $(X_1, V_1, X_2, V_2)' \mid X_3$ is taken from a multivariate normal distribution $N(\tau_{X_3}, \Sigma_{X_3})$, where $\tau_1 = (1, 1, -1, -1)'$, $\tau_0 = (-1, -1, 1, 1)'$ and

$$\Sigma_1 = \Sigma_0 = \begin{pmatrix} 1 & 0.5 & -0.5 & -0.5 \\ 0.5 & 1 & -0.5 & -0.5 \\ -0.5 & -0.5 & 1 & 0.5 \\ -0.5 & -0.5 & 0.5 & 1 \end{pmatrix}.$$

For each individual $i = 1, \ldots, n$, the missingness mechanism indicator $R_i$ is generated as a Bernoulli variable with probability of missingness ($R_i = 0$) defined by

$$\Pr(R_i = 0 \mid X, V) = \frac{\exp(\gamma_1 + \gamma_2 X_{1i} + \gamma_3 X_{2i} + \gamma_4 X_{3i})}{1 + \exp(\gamma_1 + \gamma_2 X_{1i} + \gamma_3 X_{2i} + \gamma_4 X_{3i})},$$

which corresponds to the control group in Lunceford and Davidian (2004).

The response $Z_{2i}$ is generated according to the model

$$Z_{2i} = \xi_{10} + \xi_{11} X_{1i} + \xi_{12} X_{2i} + \xi_{13} X_{3i} + \xi_{14} V_{1i} + \xi_{15} V_{2i} + \xi_{16} V_{3i} + \epsilon_i, \tag{15}$$

where $\epsilon_i \sim N(0, \xi_2^2 = 1)$ and in our notation $\xi_1 = (\xi_{10}, \xi_{11}, \cdots, \xi_{16})$.

The parameter values $(\xi_{10}, \xi_{11}, \xi_{12}, \xi_{13})' = (0, -1, 1, -1)'$ are kept fixed throughout, whereas different scenarios are considered for $(\xi_{14}, \xi_{15}, \xi_{16})'$ and $\gamma$, namely

$$(\xi_{14}, \xi_{15}, \xi_{16})' = \begin{cases} (-1, 1, 1)' & \text{strong association} \\ (-0.5, 0.5, 0.5)' & \text{moderate association} \\ (0, 0, 0)' & \text{no association} \end{cases} \tag{16}$$

and

$$\gamma = (\gamma_1, \gamma_2, \gamma_3, \gamma_4)' = \begin{cases} (0, 0.6, -0.6, 0.6)' & \text{strong association} \\ (0, 0.3, -0.3, 0.3)' & \text{moderate association.} \end{cases} \tag{17}$$

Notice that when $(\xi_{14}, \xi_{15}, \xi_{16})' = (0, 0, 0)'$, $V$ is associated with neither the outcome nor the missingness mechanism. The values of $\xi$ and $\gamma$ are such that lower response values and lower probabilities of missingness are obtained when $X_3 = 1$, and conversely when $X_3 = 0$.
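The data generating process of this section can be sketched as a single function. This is an illustrative reading of the design described above, not the authors' simulation code; the function name `simulate_design`, its arguments and the returned dictionary layout are assumptions.

```python
import numpy as np

def simulate_design(n, gamma, xi_v, seed=0):
    """Draw one clean dataset: covariates X and V, indicator R and outcome Z2.

    gamma: (gamma_1, ..., gamma_4) as in (17); xi_v: (xi_14, xi_15, xi_16) as in (16).
    """
    rng = np.random.default_rng(seed)
    x3 = rng.binomial(1, 0.2, size=n)
    v3 = rng.binomial(1, 0.75 * x3 + 0.25 * (1 - x3))
    tau1 = np.array([1.0, 1.0, -1.0, -1.0])
    tau0 = -tau1
    cov = np.array([[1.0, 0.5, -0.5, -0.5],
                    [0.5, 1.0, -0.5, -0.5],
                    [-0.5, -0.5, 1.0, 0.5],
                    [-0.5, -0.5, 0.5, 1.0]])
    # (X1, V1, X2, V2)' | X3 ~ N(tau_{X3}, Sigma)
    means = np.where(x3[:, None] == 1, tau1, tau0)
    x1, v1, x2, v2 = (means + rng.multivariate_normal(np.zeros(4), cov, size=n)).T
    # Logistic model for the probability of missingness, Pr(R = 0 | X, V).
    lin = gamma[0] + gamma[1] * x1 + gamma[2] * x2 + gamma[3] * x3
    p_missing = 1.0 / (1.0 + np.exp(-lin))
    r = 1 - rng.binomial(1, p_missing)
    # Outcome model (15), with (xi_10, ..., xi_13) = (0, -1, 1, -1) and unit error variance.
    z2 = (0.0 - x1 + x2 - x3
          + xi_v[0] * v1 + xi_v[1] * v2 + xi_v[2] * v3
          + rng.standard_normal(n))
    return {"x1": x1, "x2": x2, "x3": x3, "v1": v1, "v2": v2, "v3": v3,
            "r": r, "z2": z2}

# The "gamma moderate, xi moderate" scenario.
data = simulate_design(5000, gamma=(0.0, 0.3, -0.3, 0.3), xi_v=(-0.5, 0.5, 0.5))
```

The returned `z2` holds the outcome for all units; in an analysis only the entries with `r == 1` would be treated as observed.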

We generate 1000 realisations of size n = 1000 and of size n = 5000, called clean datasets, i.e. free of contamination. We present results for n = 1000; the results for the larger sample size confirm these findings and are omitted. Starting from the clean datasets, we obtain corresponding contaminated datasets according to different schemes, as described in Section 3.3.

The combination of parameters in (16) and (17) gives six designs. For each design, we fit a total of 20 estimators of $\beta = (\mu, \sigma)'$. They differ in the choice of estimation strategy (IPW, AIPW, OR), whether they are in their classical or robust versions, and whether the auxiliary models are misspecified or not. Thus, we consider

IPW(X), AIPW(X, X), AIPW(X, XV), OR(X) and OR(XV),

and their robust versions

RIPW(X), RAIPW(X, X), RAIPW(X, XV), ROR(X) and ROR(XV),

where the covariate sets used in the auxiliary models are given within parentheses; e.g., AIPW(X, XV) means that the first set X is used to explain $R_i$ and the second set XV := (X, V) is used to explain $Z_{2i}$. All these estimators use well specified auxiliary models. We, moreover, consider estimators using misspecified auxiliary models as follows:

IPW(X_), AIPW(X_, XV), AIPW(X, X_V), AIPW(X_, X_V) and OR(X_V),

and their robust versions

RIPW(X_), RAIPW(X_, XV), RAIPW(X, X_V), RAIPW(X_, X_V) and ROR(X_V),

where X_ := X \ X_1 and X_V := (X_, V). Auxiliary models explaining $R_i$ and $Z_{2i}$ are fitted using, respectively, logistic regression and ordinary least squares for the classical versions, and robust logistic regression and robust linear regression for the robust versions. For RIPW and RAIPW estimators Tukey's ψ function is used in (8). Tukey's ψ function is usually preferred over Huber's with asymmetric contamination. The robust estimators are tuned to have approximately 95% efficiency at the correctly specified models for clean data. The values of the corresponding tuning constants are given in Appendix D.1. For details on the computation see Appendix C.

3.2 Results for clean data

The top half of Figure 1 summarises with boxplots the estimates of µ (left) and σ (right) for clean data, i.e. when the 1000 replicates are generated from the design introduced in Section 3.1, with γ moderate and ξ moderate. The first row of panels shows that for both µ and σ all the estimators (classical and robust) except RIPW are, as expected, unbiased. The bias of RIPW is due to the correction terms (A and B in (5)), which are in this setting badly approximated based on the assumption that $Z_{2i}$ is normally distributed. This is improved for RAIPW, because the use of the outcome model allows for a better approximation of the correction terms. Also as expected, (R)IPW is more variable than (R)AIPW.

[Figure 1 (boxplots, not reproduced): four rows of two panels, with µ on the left and σ on the right. Rows 1 and 2 are titled "Clean", rows 3 and 4 "C-asym". Rows 1 and 3 show IPW(X), AIPW(X,X), AIPW(X,XV), OR(X), OR(XV), RIPW(X), RAIPW(X,X), RAIPW(X,XV), ROR(X) and ROR(XV); rows 2 and 4 show IPW(X_), AIPW(X_,XV), AIPW(X,X_V), AIPW(X_,X_V), OR(X_V), RIPW(X_), RAIPW(X_,XV), RAIPW(X,X_V), RAIPW(X_,X_V) and ROR(X_V).]

Figure 1: Estimates of µ (left) and σ (right) for the γ moderate-ξ moderate scenario for clean data and under the C-asym contamination. The vertical lines represent the true underlying values.

The second row of panels confirms some other well known properties of the (A)IPW estimators: the bias due to misspecification of the missingness mechanism for IPW, the double robustness property of AIPW (i.e., unbiasedness if only one of the auxiliary models is misspecified) and the sensitivity of the OR estimator to the misspecification of the outcome regression model. Essentially, these properties are preserved for the robust versions introduced herein. These results are also summarised numerically in Table 4 (Appendix D.3), where bias, standard deviations and root mean squared error of the estimators are reported. The results for the other five combinations of parameters ((16) - (17)) deliver a similar general message, with different magnitudes. The corresponding figures and tables supporting this claim are provided in Appendix D.4.

3.3 Results under contamination

Having confirmed that the expected behaviours are obtained with clean data, we now study the effect on estimation of deviations from the data generating mechanism of interest. To generate a contaminated sample, 5% of the observed responses (i.e. data points for which $R_i = 1$) issued from model (15) were randomly chosen and changed to the realization of:

C-asym: $U \sim U(-20, -12)$,

C-sym: $W = BU - (1 - B)U$, where $B$ is Bernoulli(probability = 0.5),

C-hidden: $N \sim N(-10, 0.4)$.

For the C-asym case, the range of the uniform distribution has been set such that it falls approximately outside the observed range of clean $Z_{2i}$. C-sym is the symmetric version of C-asym, and the C-hidden case is such that the contamination is not clearly visible when looking at the observed $Z_{2i}$ marginally. Figure 2 displays a realization of each scheme for the scenario with γ moderate and ξ moderate: the values of $Z_{2i}$ are plotted against $E(Z_{2i} \mid Z_{1i})$, the linear predictor of the outcome model (15), with a histogram of the marginal distribution of $Z_{2i}$.
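The three contamination schemes can be sketched as follows. The function name `contaminate` and its arguments are illustrative, and two points are assumptions: the second argument of N(−10, 0.4) is read as a variance (by analogy with the notation used for the error term in (15)), and the number of contaminated points is rounded to the nearest integer.

```python
import numpy as np

def contaminate(z2, r, scheme, frac=0.05, seed=0):
    """Return a copy of z2 in which a fraction of the observed responses is replaced."""
    rng = np.random.default_rng(seed)
    z = np.array(z2, dtype=float, copy=True)
    observed = np.flatnonzero(np.asarray(r) == 1)
    k = int(round(frac * observed.size))
    idx = rng.choice(observed, size=k, replace=False)
    if scheme == "C-asym":
        z[idx] = rng.uniform(-20.0, -12.0, size=k)
    elif scheme == "C-sym":
        u = rng.uniform(-20.0, -12.0, size=k)
        b = rng.binomial(1, 0.5, size=k)
        z[idx] = b * u - (1 - b) * u             # W = BU - (1 - B)U
    elif scheme == "C-hidden":
        # Assumption: 0.4 is the variance, so the standard deviation is sqrt(0.4).
        z[idx] = rng.normal(-10.0, np.sqrt(0.4), size=k)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return z

# Hypothetical clean outcomes (all zero, to make the changes easy to spot).
clean = np.zeros(1000)
r = np.tile([1, 0], 500)
z_asym = contaminate(clean, r, "C-asym")
z_sym = contaminate(clean, r, "C-sym")
z_hidden = contaminate(clean, r, "C-hidden")
```

Only units with `r == 1` are ever modified, matching the description that contamination is applied to observed responses.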

We present the results for the γ moderate and ξ moderate design and contamination C-asym in the bottom half of Figure 1. The results for the other γ-ξ combinations are given in Appendix D.4. The third row of panels of Figure 1 displays the results for the correctly specified models. We can see that for µ all the classical methods suffer a negative bias (underestimation) due to the presence of contamination, and these biases are of similar magnitude. For σ, the biases of the
