Variance Estimation of the Calibration Estimator with Measurement Errors in the Auxiliary Information

(1)

Variance Estimation of the Calibration

Estimator with Measurement Errors in the

Auxiliary Information

Author: Martin Carlsson

Spring 2018

Statistics, Advanced level Master Thesis II, 15 ECTS credits

Master programme in Applied Statistics, 120 ECTS credits

(2)

Abstract

The calibration estimator is in widespread use today in statistical surveys conducted all over the world. The estimator uses information in auxiliary variables to improve estimates. Mea-surement error is a prevalent problem in practically all surveys. In this thesis, the performance of the calibration estimator and a number of variance estimation methods were investigated with systematic measurement error present in the auxiliary information. The jackknife and bootstrap replicate methods as well as Taylor linearization were utilized. Results show that small amounts of systematic measurement error can severely influence the usefulness of variance estimates.

(3)

1 Introduction

According to S¨arndal & Lundstr¨om (2005) survey sampling first became popular as a cost-effective and practical alternative to a census in the 1930s. However, with sampling comes two types of errors, sampling and non-sampling errors. Sampling errors stem from the use of samples instead of complete enumeration. Traditional estimators and sampling designs control for the sampling errors. Non-sampling errors such as nonresponse, coverage errors or measurement errors presents problems that often severely limits the usefulness of classical estimators.

The calibration estimator is one technique developed to correct for nonresponse and/or coverage errors (S¨arndal & Lundstr¨om (2005)). The estimator uses auxiliary information about elements in the target population to calibrate the design weights that results from the use of a particular sampling design.

It is vital to assess the precision of an estimator and the estimates it produces. The standard way of measuring precision is by coupling the point estimate with an estimate of the variance of an estimator. The calibration estimator uses auxiliary information known about at least the respondents in a survey to improve point and variance estimates. One possible source of problems is the presence of measurement errors in said information. How does different types and degrees of measurement error influence the calibration technique?

1.1 Purpose

The purpose of this thesis is to investigate the empirical performance of the calibration estimator with measurement error present in the auxiliary information. In particular we are interested in comparing the properties of a number of variance estimation methods in a simulation study.

1.2 Outline

Section 2 gives a brief overview of the literature on the calibration estimator, measurement errors and variance estimation. The method and the design of the simulation study is given in section 3. The dataset(s) used in the study is detailed in section 6. Section 4 presents the calibration estimator, some theory on measurement errors and a number of variance estimation techniques. In section 7, a Monte Carlo simulation is described to test the point and variance estimator(s) in section 4 and 5. Results are discussed in Section 8.

2 Literature review

2.1 The Calibration Estimator

The calibration approach was first suggested as a method to estimate population totals and means in survey sampling by Deville & Särndal (1992) Since then calibration estimation has become a widely used tool at various statistics institutions, see Särndal (2010). A standard formulation, developed in a non-response framework, of the estimator is found in Särndal & Lundström (2005) along with a suggested variance estimator. Särndal & Lundström (2005) also treats the problem of frame imperfections.

There is a vast amount of research available on the estimator and examples include small area estimation in Lehtonen & Vejanen (2012) and Lehtonen & Vejanen (2015), combining calibration

(6)

and principal component analysis in Rota (2016) whereas Arcos et al. (2014) and Mourya et al. (2016) applies the estimator in a cluster sampling context.

The implementation and quality of the auxiliary variables and the information they contain is a key part of calibration. Särndal & Lundström (2005), Särndal (2008), Kreuter & Olson (2011) and Särndal & Lundström (2010) all analyze ways to evaluate the auxiliary information.

2.2 Measurement Errors

The presence of measurement errors is a common problem in survey sampling, see Bound et al. (2001) or Buonaccorsi (2010, p.1). Detailed overviews are found in Groves (2004), Biemer & Lyberg (2003) and Mathiowetz et al. (2002).

The errors are divided in two main groups.Systematic errors, where means of errors are non-zero and random errors where the means of the errors are zero. A scale that shows weight with errors typically normally distributed with the true value as the mean exhibits random errors. If it instead always measures five kgs. over the true value it exhibits a systematic error. S¨arndal & Lundstr¨om (2005, p.55) notes that especially systematic errors in the auxiliary variables can be problematic as calibrated weights might be severely affected.

The calibration estimator allows for both categorical and quantitative variables and both types may contain measurement errors. Measurement errors in categorical variables are named misclassifi-cations. The resulting estimators are often regression estimators with or without post-stratification resulting from the use of categorical variables. Buonaccorsi (2010), Fuller (2009) and Cheng & Van Ness (1999) presents a theoretical framework for regression models with measurement errors. Buonaccorsi (2010) and Grace (2016) addresses misclassification errors.

There are several ways to compensate for measurement errors. The aforementioned textbooks as well as an abundance of scientific articles lays out methods to improve the theoretical models in the presence of measurement errors. Another area of research to improve survey quality is to improve questions, interviewer techniques or other ways to improve precision and validity of the data. Examples can be seen in Saris & Revilla (2016), Biemer & Caspar (1994) and Billiet & Matsuo (2012).

Some examples of where measurement errors have been found to affect the results of a survey are Bollinger (1998) where bias was found in the U.S Current Population Survey of March 1978 where low income earning men reported larger than true income rates. In Butler et al. (1987) self reporting of arthritis among unemployed was found errant with over-reporting on the respondents part. In Kaestner et al. (1996) underreporting of prenatal drug use from mothers-to-be influenced the estimated danger of such drug use upwards.

2.3 Variance Estimation

Survey estimates are of limited value if we know nothing about the precision of said estimates. Coupled with point estimates of parameters are almost always variance estimates as a way to address how precise our estimates are. Variance estimates lets us construct confidence intervals or do hypothesis testing and understanding how errors influence the results is important. The most basic textbooks as well as the most complex survey designs will provide theory on variance and how to estimate it in conjunction with their estimators.

(7)

and II)

An overview of replication methods and the use of Taylor linearization in variance estimation in survey sampling can be found in Wolter (2007). Random groups as a method was introduced in Mahalanobis (1940), balanced half-samples in McCarthy (1966), the jackknife method by Quenouille (1949) and Quenouille (1956) in an attempt to reduce bias in certain estimators. The bootstrap method was first proposed by Efron (1979). According to Wolter (2007, p.226) Taylor techniques are old, established and not credited to any particular author(s).

Since their respective emergence all methods have been subject to much research and are all considered well established even if usage has varied over time, especially with the growing calculating power of computers, see Wolter (2007, p.v-vi)

3 Method

Systematic measurement error in a continuous variable was considered. Monte Carlo simulations were used to evaluate the performance of a number of variance estimation methods. Relative bias of the simulated variance estimators and the point estimator as well as the true coverage rates and mean width of the resulting confidence intervals were used as indicators on how measurement errors affect the estimates. Asymptotic properties were investigated with the help of the t-statistic and various visualization tools.

4 Theory and Model

4.1 Basic Notation

Define a finite target population, U = {1, ..., k, ..., N } with N elements and let k indicate the kth element. An estimated total of a study variable y is desired. The value of the study variable of element k is denoted yk. The population total is written as

Y =X U

yk. (1)

To estimate the total we draw, s, a probability sample of size n from the target population with probability according to a function p(s), the sampling design. The sampling design follows from the set of rules by which the sample is selected.

Now define the first order inclusion probability for element k πk = P (k ∈ s) =

X

s3k

p(s), (2)

a sum of probabilities of all possible samples that includes element k. The second order inclusion probability for elements k and l is defined as

πkl = P ({k, l} ⊂ s) = X

s⊃{k,l}

p(s), (3)

a sum of probabilities of all the samples that include both k and l. Let design weights be defined as

(8)

dk= 1/πk (4)

dkl= 1/πkl. (5)

Also note that if k = l

πkl= πkk= πk (6)

and similarly

1/πkl = dkl= dkk= dk. (7)

4.2 The Calibration Estimator

Note that the construction of the calibration estimator in Särndal & Lundström (2005) is done with nonresponse present, however, for the purposes of this thesis nonresponse is considered nonexistent. The auxiliary information used in calibration estimation is contained in auxiliary variables. They should ideally fulfill three important criterion (see Särndal & Lundström (2005)). The aux. variables should correlate with the study variable(s), explain nonresponse and identify important domains.

4.2.1 The Auxiliary Vector

Let x denote the the vector containing the auxiliary information. For element k ∈ U the vector is written xk. Further needed for calibration are the population totals of the auxiliary variables, either known or estimated. Let the general vector of population totals beP

Uxk = X. In S¨arndal & Lundstr¨om (2005) we find the following ways the auxiliary information can be distinguished: InfoU :P

Ux ∗

k is known. Denote the auxiliary vector of J∗≥ 1 dimensions by x∗k and ∀k ∈ s x∗k is known. This is information on the “star” level.

InfoS :P Ux

∗

k is not known. Instead use the Horvitz-Thompson estimator, P

sdkx ◦

k = X◦. Denote the auxiliary vector of J◦≥ 1 dimensions by x◦

k. Here ∀k ∈ s, x◦k is known. This is information on the “moon” level.

InfoUS : Some variables use totals known on the star level and some use estimated totals with information only on the moon level. Let the auxiliary vector be x∗k

x◦ k. 4.2.2 The Standard Weighting System

Since the calibration estimator in S¨arndal & Lundstr¨om (2005) is developed in the context of nonresponse note that r ∈ s is the response set, consisting of the elements selected in the sample that responded. The sampling design determines design weights, dk, but ˆYHT = Prdkyk will underestimate the total of Y . The calibration idea is to find new weights, using the auxiliary

(9)

information, that adjusts for the nonresponse. Let the new weights be wk and let vk be the factor of inflation (deflation) to be applied to the original design weight.

wk= vkdk. (8)

The new weights should fulfill a calibration equation or constraint X

r

wkxk= X. (9)

In a sense the wks replicates the auxiliary information when the sum of the aux.vector over the response set matches the vector of population totals exactly.

S¨arndal & Lundstr¨om (2005) lets the vks linearly depend on the xks by

vk = 1 + λ0xk (10)

with λ0 a vector of constants. Replacing the wks with vkdks in (9) and solving for λ0 yields

λ0= X −X r dkxk !0 X r dkxkx0k !−1 . (11)

The calibration estimator of the total of Y then becomes ˆ

YCal= X

r

wkyk. (12)

4.2.3 The Generalized Weighting System

The final weights, wks in the standard system are not uniquely determined. In S¨arndal & Lund-str¨om (2005) the calibration system is generalized further and as a result allows for some desirable formulations of the estimator. Any set of positive initial weights, not just the design weights, dks, can be put into the calibration system and generate final weights calibrated to the information in X.

Let a vector zk, called instrument vector, be a function of xk and/or other data on the element k. Let ∀k ∈ r, dαk be a set of positive weights and make the following substitutions

wk= vkdαk (13) vk = 1 + λ0zk (14) with λ0_r= X −X r dαkxk !0 X r dαkzkx0k !−1 (15) and we can use the generalized weights in (12) instead.

(10)

4.2.4 The Variance Estimator

S¨arndal & Lundstr¨om (2005, p.136) proposes the following variance estimator using the generalized weighting system with dαk = dk, zk and the case of InfoUS

ˆ V ˆYCal

= ˆVSAM + ˆVN R (16)

splitting the variance in two parts, a estimated sampling part as well as an estimated non-response part. Here ˆ VSAM = X X r (dkdl− dkl) (vkeˆ∗k) (vlˆe∗) − X r dk(dk− 1)vk(vk− 1)(ˆe∗k) 2 ₍₁₇₎ and ˆ VN R= X r vk(vk− 1) (dkeˆk) 2 (18) where the residuals are

ˆ e∗_k= yk− (x∗k) 0 B∗_r;dv (19) ˆ ek= yk− x0kB ∗ r;dv= yk− (x∗k) 0 B∗_r;dv− (x∗ k) B ◦ r;dv (20) with Br;dv = B∗r;dv B◦r;dv = X r dkvkzkx0k !−1 X r dkvkzkyk ! (21) With InfoU we get

ˆ e∗_k = ˆek= yk− (x∗k) 0 X r dkvkz∗k(x ∗ k) 0 !−1 X r dkvkz∗kyk ! (22)

and with InfoS

ˆ e∗_k= yk (23) ˆ ek= yk− (x◦k) 0 X r dkvkz◦k(x ◦ k) 0 !−1 X r dkvkz◦kyk ! (24)

5 Variance Estimation Methods

5.1 The Jackknife

Let ˆθ, an estimator of a parameter θ, be calculated from a parent sample. Divide the parent sample into g number of groups, each with m elements with n = gm for simplicity.

Let θ(a) be the same estimator, calculated on a subsample constructed by leaving out the ath group from the parent sample.

Further let

ˆ

(11)

and ˆ ¯ θ = g X a=1 ˆ θa g . (26)

To calculate a jackknife estimate of variance use

VJ K1= 1 g (g − 1) g X a=1 ˆ_θ_a_{− ˆ}_θ¯2 ₍₂₇₎ or alternatively VJ K2= 1 g (g − 1) g X a=1 ˆ_θ a− ˆθ 2 . (28)

5.2 The Bootstrap

Let θ be a parameter of interest and let ˆθ be an estimator of said parameter. Assume a parent sample of size n drawn from the population U . A set of bootstrap subsamples, A, of size n∗ _are then drawn from the parent sample using simple random sampling with replacement. For each bootstrap sample a let ˆθ∗

a be the estimate calculated using the same estimator as with the parent sample. Then the bootstrap estimator of variance is

VBoot(ˆθ) = 1 A − 1 A X a=1 (ˆθ∗_a− ˆθ¯∗)2 (29) where ˆ ¯ θ∗= 1 A A X a=1 ˆ θ∗a. (30)

5.3 Taylor Series

A fruitful approach to variance estimation is the use of Taylor linearization. A complex estimator that is often non-linear can lead to intractable variance expressions. If the estimator can be approx-imated by a first order Taylor (linear) function/polynomial following Taylor’s theorem a variance estimator is then easier to derive.

Let Y = (Y1, ..., Yp) be a vector of population parameters and let ˆY = ˆY1, ..., ˆYp

a vector of estimators of said parameters. Consider θ = f (Y ), a function of the parameters in Y and the estimator ˆθ = f ˆY

Using a first order Taylor approximation of f ˆY centered on the point (Y1, ..., Yp) and dis-carding the rest term

ˆ θ ≈ ˆθT ay = θ + p X i=1 di(ˆyi− yi) (31) with di= δf (Y ) δ ˆYi _{Y =Y}_ˆ (32)

(12)

and with some rearrangements, see S¨arndal et al. (2003, p.174-175), the approximate variance is estimated by ˆ V ˆθT ay =X X s πkl− πkπl πkl Dk πk Dl πl (33) with Dk= p X i=1 diyik. (34)

6 Data

6.1 Overview

A synthetic data set was created using the MU284 set found in the sampling package and originally in S¨arndal et al. (2003, p.652). The following variables are included:

List of variables Variable Description

LABEL A number identifying each element from 1 to 284 P85 Population in 1985 in thousands

P75 Population in 1975 in thousands

RMT85 Municipal tax revenue in 1985 in millions of SEK

CS82 Number of Conservative seats in the municipal legislature SS82 Number of Social-Democratic seats in the municipal legislature S82 Total number of seats in the municipal legislature

ME84 Number of municipal employees

REV84 Real estate values in 1984 in millions of SEK REG A indicator of geographical region

CL A cluster indicator indicating a set of neighboring municipalities GROUP Added variable indicating membership in one of four groups

Table 1: Variables of the MU284/MU281 population

Three outliers of MU284 were removed, Stockholm, Gothenburg and Malmoe thus creating MU281. The population size was inflated to N = 10116 from N = 281. The study variable is RMT85 and auxiliary variables are REV84, CS82 and SS82. The population totals are:

(13)

MU281 population Variable Total RMT85 1914632 CS82 90238 SS82 223199 REV84 27261475

Table 2: Population totals of the MU10116 population

6.2 Creating the Data Set

The data set used in the simulation was created by merging multiples of the MU281 (N = 281) data set to itself. To avoid adding exact copies small amounts of random noise was added to all variables after merging. The resulting set was MU10116 (N = 10116). The Kolmogorov-Smirnov two-sample test was used to verify that the variables of MU10116 were generated from the same distribution as the corresponding variables in MU281. R code for the procedure can be found in section 10.3. Results were:

Kolmogorov-Smirnov test on pairwise variables

Variable D P-value

RMT85 0.01206 1

CS82 0.024911 0.9958

SS82 0.012456 1

REV84 0.0048438 1

Table 3: Two sample K-S test on MU281 and MU10116 variables

The null hypothesis that the variables are generated from the same distribution cannot be rejected. The correlations between the variables were also nearly identical:

Correlation

CS82 0.642

SS82 0.646

REV84 0.907

(14)

Correlation

CS82 0.657

SS82 0.651

REV84 0.907

Table 5: Correlation of auxiliary variables with RMT85 in MU281

preserving the correlations to a satisfying degree.

7 Simulation study

7.1 Overview

We were estimating the total of the variable RMT85. Monte Carlo simulations with a varying number of trials were used. Using the R survey package the Taylor linearization technique, the bootstrap method as well as the jackknife were evaluated.

The calibration estimator in the study utilized information on the InfoU level i.e. totals were known for all auxiliary variables on the target population level and values of the aux. variables were known for all elements in the sample/response set. Together with the assumption of no non-response the calibration estimator is then equivalent to the general regression estimator (see S¨arndal & Lundstr¨om (2005, Remark 6.5)).

The population totals of the aux. variables were inflated to test the effect of a general systematic measurement error. Typically, this situation arises when a population total of the auxiliary variable is imported from a precise and secure source outside the survey at the same time as over- or underreporting among respondents arise from some form of prestige bias connected to a sensitive variable (i.e. alcohol consumption, smoking, drug use, or income as examples)

7.2 Sampling design

Simple regression estimators were used with CS82, REV84 respectively as well as a multiple re-gression estimator using CS82 and SS82. REV84 is very strongly correlated to RMT85. The most interesting case in terms of changing explanatory power is adding SS82 to the simple model with CS82. Tables and plots to inform these choices can be found in section 10.

Samples were selected using simple random sampling without replacement.

7.3 Performance indicators

The relative bias of the variance estimator(s)

RBSIMVˆ ˆYCal = ESIM[ ˆV ˆYCal ] − VSIM ˆYCal VSIM ˆYCal (35)

(15)

ESIM[ ˆV ˆYCal ] = 1 T T X t=1 ˆ V ˆYCal(t) (36) VSIM ˆYCal = 1 T − 1 T X t=1

[ ˆYCal(t)− ESIM ˆYCal ]2 (37) ESIM ˆYCal = 1 T T X t=1 ˆ YCal(t). (38)

The coverage rate of a would-be 95% confidence interval

CRSIMh ˆV ˆYCal i

= number of trials where the conf. int. covers the true value of RMT85

total number of trials .

(39) The mean width of the confidence intervals

1 T T X t=1 2 ∗ r ˆ V ˆYCal(t) ∗ 1.96 ∗r N − n N − 1 ! . (40)

The relative bias of the point estimator

RBSIM ˆYCal

= ESIM( ˆYCal) − Y

Y . (41)

To test the normality of the distribution of the estimated totals and standard errors the Shapiro-Wilks test was employed. In the S-W-test the null hypothesis is that the tested distribtion is normal. A p-value smaller than 0.05 rejects the null hypothesis on a 95% significance level.

Histograms of the empirical distributions were also plotted to visualize a possible convergence towards a normal distribution of estimates.

QQ-plots were also used where, again, the empirical distributions of the estimates where com-pared to a normal distribution. A QQ-plot, in this case, compares quantiles of a theoretical normal distribution with the corresponding quantiles of the empirical distribution. With close to normality the empirical quantiles should lie on the line y = x

Finally, given the distribution of the t-statistic

t = ˆ θ − θ0 r ˆ V ˆθ d −→ N (0, 1) (42)

and if we have consistent estimates of the total ˆθ and we know the true value, θ0, we can use the estimated standard error in the denominator. If the resulting empirical distribution of the t-statistic converges to N (0, 1) we have consistent estimates of the standard error as well.

(16)

8 Results and Discussion

8.1 Results and Analysis

All results and plots generated from the simulation study can be found in the Appendix. Without measurement error all variance estimation methods had confidence interval coverage rates reason-ably close to the nominal 95% level. However with an added measurement error of 2.5% to the population totals the coverage rate fell well short in almost all cases. The results using Taylor linearization with CS82 as auxiliary variable

Aux. var. CS82,5000 MC trials, inflation percentage in the top row

n = 500 0 2.5% 5%

Coverage rate 0.944 0.874 0.622

Mean C.I width 257535.505 270630.5 284377.1

Rel.bias Var 0.026 0.025 0,024 Rel.bias Tot -0.009 0.039 0.063 Shapiro-Wilks(Tot) 0.596 0.5374 0.4748 Shapiro-Wilks(s.e) 0.044 0.018 0.01 n = 1000 0 2.5% 5% Coverage rate 0.940 0.745 0.3

Mean C.I width 172845.945 181678.993 190951.615

Rel.bias Var 0.046 0.047 0.048

Rel.bias Tot -0.004 0.031 0.0632

Shapiro-Wilks(Tot) 0.47 0.559 0.668 Shapiro-Wilks(s.e) 0.009 0.003 0.001 Table 6: RMT85 estimated, CS82 auxiliary variable

are typical. With a sample size of n = 1000 the coverage rate drop from 0.940 to 0.745 with 2.5% measurement error. At 5% the coverage rate drops to 0.3.

The variance bias is positive. This should contribute to an increase in the coverage rate with wider intervals. The loss in coverage rate instead results from increasing bias in the point estimator. Comparing the histograms of the t-statistic:

(17)

Figure 1: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82, n = 1000, no measurement error compared to 2.5%

The empirical distributions fit well to the standard normal distribution in both cases. With measurement error the distribution is off-center. This, again, as a result of the bias in the point estimates. With the variable REV84 (see Table 12), the drop-off in coverage rate is even larger. REV84 has a higher correlation to the study variable RMT85 than CS82 with smaller variance estimates as a result. With even narrower confidence intervals with a similar bias in the point estimator a lower coverage rate should follow.

The use of the bootstrap and the jackknife to estimate variance suggests similar results. However, it should be noted that due to computational restraints the results are less accurate.

8.2 Discussion and Results

The results of the simulation study causes concern. Measurement errors are an inevitable part of surveys. In S¨arndal & Lundstr¨om (2005, p.144-149) a similar simulation investigated the influence of non-response on variance estimation and the calibration estimator. The conclusion there was the same as here. Even if variance can be estimated to a satisfying degree the bias in the point estimates renders confidence intervals invalid.

8.3 Improvements and Further Study

The Monte Carlo simulations on Taylor linearization reached sufficient precision. A test simulation using 10000 trials and n = 1000 and CS82 gained no significant improvement compared to 5000 trials. The coverage rate was 0, 938 (compared to 0.944) and the average width of the confidence intervals 172783.6 (172845.945)

However with both the bootstrap and the jackknife computational power limited the results. Testing a number of different combinations of Monte Carlo trials, number of bootstrap replicates and n = 1000 the following linear equation could reasonably predict bootstrap simulation time:

(18)

A simulation of 5000 trials and 200 bootstrap replicates would result in a computation time of 14 hours, a prohibitive amount of time. With a faster computer and optimized R code these times can be reduced immensely.

The sampling designs used in this study is elementary. More complex designs are typically found in surveys at statistical agencies. The influence of measurement error on the calibration estimator is well worth investigating in a more complex setting.

In this simulation study the systematic measurement error is not specified by a particular distribution. A model where only part of the population is affected by systematic measurement error is of interest. Random measurement error could also investigated.

(19)

9 References

References

Arcos, A., Contreras, J. M. & Rueda, M. M. (2014), ‘A novel calibration estimator in social surveys’, Sociological Methods & Research 43(3), 465–489.

Biemer, P. & Caspar, R. (1994), ‘Continuous quality improvement for survey operations: Some general principles and applications’, Journal of official statistics 10(3), 307.

Biemer, P. P. & Lyberg, L. E. (2003), Introduction to survey quality, Vol. 335, John Wiley & Sons. Billiet, J. & Matsuo, H. (2012), Non-response and measurement error, in ‘Handbook of survey

methodology for the social sciences’, Springer, pp. 149–178.

Bollinger, C. R. (1998), ‘Measurement error in the current population survey: A nonparametric look’, Journal of Labor Economics 16(3), 576–594.

Bound, J., Brown, C. & Mathiowetz, N. (2001), Measurement error in survey data, in ‘Handbook of econometrics’, Vol. 5, Elsevier, pp. 3705–3843.

Buonaccorsi, J. P. (2010), Measurement error: models, methods, and applications, CRC Press. Butler, J. S., Burkhauser, R. V., Mitchell, J. M. & Pincus, T. P. (1987), ‘Measurement error in

self-reported health variables’, The Review of Economics and Statistics pp. 644–650.

Cheng, C. & Van Ness, J. (1999), Statistical regression with measurement error, Arnold London. Deville, J. & S¨arndal, C. (1992), ‘Calibration estimators in survey sampling’, Journal of the

Amer-ican Statistical Association pp. 376–382.

Efron, B. (1979), ‘Bootstrap methods: Another look at the jackknife’, The Annals of Statistics 7(1), 1–26.

URL: http://www.jstor.org/stable/2958830

Fuller, W. A. (2009), Measurement error models, Vol. 305, John Wiley & Sons.

Grace, Y. Y. (2016), Statistical Analysis with Measurement Error Or Misclassification, Springer. Groves, R. M. (2004), Survey errors and survey costs, Vol. 536, John Wiley & Sons.

Hansen, M. H. & Hurwitz, W. N. (1953), Sample survey methods and theory. Vol. I, John Wiley And Sons, Inc.; New York.

Kaestner, R., Joyce, T. & Wehbeh, H. (1996), The effect of maternal drug use on birth weight: Measurement error in binary variables, Working Paper 5434, National Bureau of Economic Re-search.

URL: http://www.nber.org/papers/w5434

Kreuter, F. & Olson, K. (2011), ‘Multiple auxiliary variables in nonresponse adjustment’, Sociolog-ical Methods & Research 40(2), 311–332.

(20)

Lehtonen, R. & Vejanen, A. (2012), ‘Small area poverty estimation by model calibration’, J. Indian Soc. Agric. Stat 66(1), 125–133.

Lehtonen, R. & Vejanen, A. (2015), ‘Estimation of poverty rate for small areas by model calibration and ”hybrid” calibration methods’.

URL: goo.gl/Tqqmyi

Mahalanobis, P. C. (1940), ‘A sample survey of the acreage under jute in bengal’, Sankhy¯a: The Indian Journal of Statistics pp. 511–530.

Mathiowetz, N., Brown, C. & Bound, J. (2002), ‘Measurement error in surveys of the low-income population’, Studies of welfare populations: Data collection and research issues pp. 157–194. McCarthy, P. J. (1966), ‘Replication; an approach to the analysis of data from complex surveys’. Mourya, K., Sisodia, B. & Chandra, H. (2016), ‘Calibration approach for estimating finite

popula-tion parameters in two-stage sampling’, Journal of Statistical Theory and Practice 10(3), 550–562. Quenouille, M. H. (1949), Approximate tests of correlation in time-series 3, in ‘Mathematical Pro-ceedings of the Cambridge Philosophical Society’, Vol. 45, Cambridge University Press, pp. 483– 484.

Quenouille, M. H. (1956), ‘Notes on bias in estimation’, Biometrika 43(3/4), 353–360.

Rota, B. J. (2016), Calibration Adjustment for Nonresponse in Sample Surveys, PhD thesis, ¨Orebro university.

Saris, W. E. & Revilla, M. (2016), ‘Correction for measurement errors in survey research: necessary and possible’, Social Indicators Research 127(3), 1005–1020.

S¨arndal, C.-E. (2008), ‘Assessing auxiliary vectors for control of nonresponse bias in the calibration estimator’, Journal of Official Statistics 24(2), 167.

S¨arndal, C.-E. (2010), ‘The calibration approach in survey theory and practice’, Survey Methodology 33(2), 99–119.

S¨arndal, C. & Lundstr¨om, S. (2005), Estimation in surveys with nonresponse, John Wiley & Sons Inc.

S¨arndal, C. & Lundstr¨om, S. (2010), ‘Design for estimation: Identifying auxiliary vectors to reduce nonresponse bias’, Survey Methodology 36, 131–144.

S¨arndal, C., Swensson, B. & Wretman, J. (2003), Model assisted survey sampling, Springer Verlag. Sukhatme, P. V. (1957), Sampling theory of surveys with applications, The Indian Society Of

Agri-cultural Statistics; New Delhi.

(21)

10 Appendix

10.1 Regression Data and Plots

Figure 2: Scatterplots of CS82 and SS82 with regression lines added

(22)

Regression data RMT on CS82 Coefficients P-value Intercept -51.731 < 2, 2 ∗ 10−16 CS82 27.0169 < 2, 2 ∗ 10−16 Adj. R2 _0.412 Table 7: RMT85 regressed on CS82 Regression data RMT on SS82 Coefficients P-value Intercept -205.716 < 2, 2 ∗ 10−16 SS82 17.902 < 2, 2 ∗ 10−16 Adj. R2 _0.417 Table 8: RMT85 regressed on SS82 Regression data

RMT on REV84 Coefficients P-value

Intercept -15.49 < 2, 2 ∗ 10−16 REV84 0.076 < 2, 2 ∗ 10−16 Adj. R2 _0.823

Table 9: RMT85 regressed on REV84

Regression data RMT on CS82,SS82 Coefficients P-value Intercept -367.790 < 2, 2 ∗ 10−16 CS82 23.636 < 2, 2 ∗ 10−16 SS82 15.692 < 2, 2 ∗ 10−16 Adj. R2 _0.723

(23)

Models using REV84 together with CS82 or SS82 respectively resulted in small increases in Adj.R2_{. In the case of CS82 Adj.R}2 _{increased from 0.823 to 0.840. Adding SS82 to the REV84} model increased Adj.R2_{to 0.841. Therefor these two models where not considered in the simulation} study. A model using REV84,CS82 and SS82 resulted in a Adj.R2_{of 0.874 and was also not included} in the simulation.

10.2 R Code

The functions addpopsize, srsworsample, truecovrate and ciwidth are user created. The functions svydesign, svytot, calibrate, and as.rep.design are from the survey package, srswor is from the sampling package.

10.2.1 Various Support Functions addpopsize:

#to add a column of pop.size, the function takes the data set and the popsize #and creates a column of repeated entries of pop.size and adds it to the data #set

addpopsize <- function (data,N) { popsize <- rep(N,nrow(data)) data$popnr <- c(popsize) return(data) } srsworsample:

#simple random sampling without replacement, adds a column of #design weights

srsworsample <- function(data,n) {

internalsample <- srswor(n,nrow(data)) finalsample <- data[internalsample == 1,]

finalsample <- cbind(finalsample, dk = n/nrow(data)) return(finalsample)

}

truecovrate:

#to test the true coverage rate of intervals calculated from samples in a MC #simulation, here the variable SE contains the estimated standard errors truecovrate <- function(data,truevalue,N,n)

(24)

testdata <- subset(data,Total+1.96*SE*sqrt(1-n/N) > truevalue & Total-1.96*SE*(sqrt(1-n/N)) < truevalue) rate <- nrow(testdata)/nrow(data)

return(rate) }

ciwidth:

#to calculate the mean width of C.Is

#here the variable containing the s.e is named SE ciwidth <- function(data,) {

width <- mean(2*data$SE*1.96*sqrt(1-n/N)) return(width)}

10.2.2 Taylor Linearization

A complete simulation with CS82 and SS82 as aux.vars specified inside the MCtotCalib function. 2.5% measurement error is added to CS82 here. Various results and plots are produced and the estimates are saved to file in case they need to be retrieved later.

#Taylor linearization

MCtotCalib <- function(data,N,n,reps) {set.seed(1978)

#add a variable containing the pop.size data <- addpopsize(data,N)

estimates <- matrix(0,nrow = reps,ncol = 2) estimates <- as.data.frame(estimates) for (i in 1:reps)

{ #draw a sample

sample <- srsworsample(data,n) #construct survey object

design <- svydesign(id=~1,fpc = ~popnr, data = sample) #calibrate the design object

calibdesign <- calibrate(design,formula=~CS82+SS82,population = c(’(Intercept)’ = length(data$RMT85), CS82 = 1.025*sum(data$CS82),SS82 = sum(data$SS82))) #extract the estimates

estimates[i,] <- as.data.frame(svytotal(~RMT85,calibdesign)) }

names(estimates) <- c("Total","SE") return(estimates)

(25)

reps = 5000 n = 1000 N = 10116 attach(newdata) myestimates <- MCtotCalib(newdata,10116,n,reps) AAcovrate <- truecovrate(myestimates,truevalue = 1914632,N,n) AAci.width <- ciwidth(myestimates,N,n) myestimates.var <- myestimates$SE^2 exp.var.sim <-mean(myestimates.var) meantot <- mean(myestimates$Total) varsq.sum <- sum((myestimates$Total-meantot)^2) v.sim <- (1/(reps-1))*varsq.sum

AArel.var.bias <- (exp.var.sim - v.sim)/v.sim

AArel.tot.bias <- (meantot-sum(newdata$RMT85))/sum(newdata$RMT85) detach(newdata)

proc.time()-t

#qq-plot, Shapior-Wilks test and histogram of s.e. to test/inspect normality shapiro.test(myestimates$Total)

shapiro.test(myestimates$SE)

qqnorm(myestimates$Total, main = "Total estimates,Taylor, n = 1000, 5000 trials") qqline(myestimates$Total)

qqnorm(myestimates$SE, main = "Standard error estimates,Taylor, n = 1000, 5000 trials") qqline(myestimates$SE)

hist(myestimates$SE,breaks = 50, main = "Histogram of s.e. estimates, Taylor, n = 1000, 5000 trials", xlab = "Standard error estimates") hist(myestimates$Total, breaks = 50, main = "Histogram of estimated totals, Taylor, n = 1000, 5000 trials", xlab = "Total estimates")

#to see if the t-stat of the var.est approaches N(0,1)-distr. tstat = (myestimates$Total-trueval)/(myestimates$SE)

hist(tstat, breaks = 50, freq = FALSE, xlim = range(-4,4), ylim =range(0,0.5), main = "Histogram of the t-statistic with distr. of N(0,1) overlay", xlab = "t-statistic") curve(dnorm(x, mean=0, sd=1), col="darkblue", lwd = 2 ,add=TRUE)

#saving the myestimates data to file

saveRDS(myestimates,"0025CS82SS82cal10005000.rds")

10.2.3 Bootstrap

The boot package is used to perform bootstrap simulations. The boot function requires the boot.tot function which is user supplied. It is inside boot.tot the aux. variables are specified.

#Bootstrap library(survey)

(26)

library(sampling) library(boot)

MCtotCalibBoot <- function(data,N,n,reps,bootreps) { set.seed(1978)

#add a variable containing the pop.size data <- addpopsize(data,N)

estimates <- matrix(0,nrow = reps,ncol = 3) estimates <- as.data.frame(estimates) for (i in 1:reps)

{ #draw a sample

int.sample <- srsworsample(data,n)

bootobj <- boot(int.sample,boot.tot, R = bootreps) estimates[i,1] <- bootobj$t0

estimates[i,2] <- sd(bootobj$t)

estimates[i,3] <- mean(bootobj$t)-bootobj$t0 }

names(estimates) <- c("ORG","SE", "BIAS") return(estimates)

}

boot.tot<-function(orgsample,i) {

#get a bootstrap sample bootsample<- orgsample[i,] #create design object

design <- svydesign(id=~1,fpc = ~popnr, data = bootsample)

#calibrate the design object, put it in compatible form and return the est.total

calibdesign <- calibrate(design,formula=~CS82,population = c(’(Intercept)’ = length(newdata$RMT85), CS82 = sum(newdata$CS82))) tot <- as.data.frame(svytotal(~RMT85,calibdesign)) return(tot[1,1]) } attach(newdata) n = 1000 N = 10116

(27)

bootreps = 30 t <- proc.time() myestimates<- MCtotCalibBoot(newdata,N,n,reps,bootreps) proc.time()-t BBboot.cov <- bootcovrate(myestimates,sum(newdata$RMT85),N,n) BBci.width <- ciwidth(myestimates,N,n) myestimates.var <- myestimates$SE^2 exp.var.sim <-mean(myestimates.var) meantot <- mean(myestimates$ORG) varsq.sum <- sum((myestimates$ORG-meantot)^2) v.sim <- (1/(reps-1))*varsq.sum

BBrel.var.bias <- (exp.var.sim - v.sim)/v.sim

BBrel.tot.bias <- (meantot-sum(newdata$RMT85))/sum(newdata$RMT85)

#qq-plot, Shapior-Wilks test and histogram of s.e. to test/inspect normality shapiro.test(myestimates$ORG)

shapiro.test(myestimates$SE)

qqnorm(myestimates$ORG, main = "Total estimates,30 B-reps, n = 1000, 5000 trials") qqline(myestimates$ORG)

qqnorm(myestimates$SE, main = "Standard error estimates,30 B-reps, n = 1000, 5000 trials") qqline(myestimates$SE)

hist(myestimates$SE,breaks = 50, main = "Histogram of s.e. estimates, n = 1000, 5000 trials,30 B-reps", xlab = "Standard error estimates") hist(myestimates$ORG, breaks = 50, main = "Histogram of estimated totals, n = 1000, 5000 trials, 30 B-reps", xlab = "Total estimates")

#to see if the t-stat of the var.est approaches N(0,1)-distr. tstat = (myestimates$ORG-trueval)/(myestimates$SE)

#save the estimates for future repetition of calculations saveRDS(myestimates,"CS82boot1000120012")

#function to calc true covrate in the bootstrap simulations bootcovrate <- function(data,truevalue,N,n)

{

testdata <- subset(data,ORG+1.96*SE*sqrt(1-n/N) > truevalue & ORG-1.96*SE*sqrt(1-n/N) < truevalue) rate <- nrow(testdata)/nrow(data)

return(rate) }

(28)

10.2.4 Jackknife content...#jackkknife

MCtotCalJK <- function(data,N,n,reps) {set.seed(1978)

#add a variable containing the pop.size, needed in svydesign-func. data <- addpopsize(data,N)

estimates <- matrix(0,nrow = 1,ncol = 2) estimates <- as.data.frame(estimates) jk_estimates <- matrix(0,nrow = 1,ncol = 2) jk_estimates <- as.data.frame(jk_estimates) for (i in 1:reps)

{ #draw a sample

sample <- srsworsample(data,n) #construct survey object

design <- svydesign(id=~1,fpc = ~popnr, data = sample) #calibrate the design object

calibdesign <- calibrate(design,formula=~CS82,population = c(’(Intercept)’ = length(data$RMT85), CS82 = sum(data$CS82))) #extract the estimates

estimates[i,] <- as.data.frame(svytotal(~RMT85,calibdesign)) jks <- jackknife(x = 1:n,theta = cal.func, sample)

jk_estimates[i,1] = jks$jack.se jk_estimates[i,2] = jks$jack.bias } results <- cbind(estimates,jk_estimates) names(results) <- c("Total","SE","JK.se","JK.bias") return(results) } JKcovrate <- function(data,truevalue,N,n) {

testdata <- subset(data,Total+1.96*JK.se*sqrt(1-n/N) > truevalue & Total-1.96*JK.se*sqrt(1-n/N) < truevalue) rate <- nrow(testdata)/nrow(data) return(rate) } JKciwidth <- function(data,N,n) { width <- mean(2*data$JK.se*1.96*sqrt(1-n/N)) return(width)}

(29)

design <- svydesign(id=~1,fpc = ~popnr, data = internal.sample[x,]) #calibrate the design object

calibdesign <- calibrate(design,formula=~CS82,population = c(’(Intercept)’ = length(newdata$RMT85), CS82 = sum(newdata$CS82))) #extract the estimates

estimates[1,] <- as.data.frame(svytotal(~RMT85,calibdesign)) return(estimates[1,1]) } #example of a simulation reps = 100 n = 500 N = 10116 t <- proc.time() myestimates <- MCtotCalJK(newdata,N,n,reps) proc.time()-t JKcr <- JKcovrate(myestimates,truevalue = 1914632,N, n) JKci.width <- JKciwidth(myestimates,N,n) myestimates.var <- myestimates$JK.se^2 exp.var.sim <-mean(myestimates.var) meantot <- mean(myestimates$Total) varsq.sum <- sum((myestimates$Total-meantot)^2) v.sim <- (1/(reps-1))*varsq.sum

JKrel.var.bias <- (exp.var.sim - v.sim)/v.sim

JKrel.tot.bias <- (meantot-sum(newdata$RMT85))/sum(newdata$RMT85)

#qq-plot, Shapior-Wilks test and histogram of s.e. to test/inspect normality shapiro.test(myestimates$ORG)

shapiro.test(myestimates$JK.se)

qqnorm(myestimates$Total, main = "Total estimates,Jackknife, n = 300, 100 trials") qqline(myestimates$Total)

qqnorm(myestimates$JK.se, main = "Standard error estimates,Jackknife, n = 300, 100 trials") qqline(myestimates$JK.se)

hist(myestimates$JK.se,breaks = 50, main = "Histogram of s.e. estimates, Jackknife, n = 300, 100 trials", xlab = "Standard error estimates") hist(myestimates$Total, breaks = 50, main = "Histogram of estimated totals, Jackknife, n = 300, 100 trials", xlab = "Total estimates")

#to see if the t-stat of the var.est approaches N(0,1)-distr. tstat = (myestimates$Total-trueval)/(myestimates$JK.se)

(30)

10.2.5 Creating the Data Set #remove unused variables from MU281

MU281trimmed <-subset(MU281,select = c(RMT85,CS82,SS82,REV84,ME84)) #save data in a newly created set

newdata <- MU281trimmed

#add copies of MU281 to the data set until N = ~10000 for(i in (1:35))

{

newdata <- rbind(newdata,MU281trimmed) }

#check to see min and max and decide on appropriate size of noise min(newdata$RMT85) max(newdata$RMT85) min(newdata$CS82) max(newdata$CS82) min(newdata$SS82) max(newdata$SS82) min(newdata$REV84) max(newdata$REV84) min(newdata$ME84) max(newdata$ME84)

#create a vector of noise using normal distribution, tested to keep correlations roughly as in set MU281 set.seed(1978) normbrusRMT85 <- rnorm(length(newdata$RMT85),0,5) normbrusCS82 <- rnorm(length(newdata$CS82),0,1) normbrusSS82 <- rnorm(length(newdata$SS82),0,1) normbrusREV84 <- rnorm(length(newdata$REV84),0,5) normbrusME84 <- rnorm(length(newdata$ME84),0,5) #add noise using normal distr.

newdata$RMT85 <- round(newdata$RMT85 + normbrusRMT85) newdata$CS82 <- round(newdata$CS82 + normbrusCS82) newdata$SS82 <- round(newdata$SS82 + normbrusSS82) newdata$REV84 <- round(newdata$REV84 + normbrusREV84) newdata$ME84 <- round(newdata$ME84 + normbrusME84) #check that correlations are about the same

cor(newdata) cor(MU281trim)

(31)

ks.test(MU281$RMT85,newdata$RMT85) ks.test(MU281$CS82,newdata$CS82) ks.test(MU281$SS82,newdata$SS82) ks.test(MU281$REV84,newdata$REV84) ks.test(MU281$ME84,newdata$ME84) sum(newdata$RMT85) sum(newdata$CS82) sum(newdata$SS82) sum(newdata$REV84)

10.3 Tables

10.3.1 Taylor Linearization

Aux. var. CS82,5000 MC trials, inflation percentage in the top row

n = 500 0 2.5% 5%

Coverage rate 0.944 0.874 0.622

Mean C.I width 257535.505 270630.5 284377.1

Rel.bias Var 0.026 0.025 0,024 Rel.bias Tot -0.009 0.039 0.063 Shapiro-Wilks(Tot) 0.596 0.5374 0.4748 Shapiro-Wilks(s.e) 0.044 0.018 0.01 n = 1000 0 2.5% 5% Coverage rate 0.940 0.745 0.3

Mean C.I width 172845.945 181678.993 190951.615

Rel.bias Var 0.046 0.047 0.048

Rel.bias Tot -0.004 0.031 0.0632

Shapiro-Wilks(Tot) 0.47 0.559 0.668 Shapiro-Wilks(s.e) 0.009 0.003 0.001 Table 11: RMT85 estimated, CS82 auxiliary variable

(32)

Aux. var.REV84, 5000 MC trials, infl. perc. in the top row

n = 500 0 2.5%

Coverage rate 0.932 0.685

Mean C.I width 140829.202 148673.9

Rel.bias Var -0.04 -0.041 Rel.bias Tot 0.001 0.027 Shapiro-Wilks(Tot) 0.166 0.054 Shapiro-Wilks(s.e) 0.000 0.000 n = 1000 0 2.5% Coverage rate 0.933 0.47

Mean C.I width 94777.120 100111.494 Rel.bias Var -0.014 -0.015

Rel.bias Tot 0 0.027

Shapiro-Wilks(Tot) 0.099 0.06 Shapiro-Wilks(s.e) 0 0.0006

Table 12: RMT85 estimated, REV84 auxiliary variable

Aux. var.CS82, SS82, 5000 MC trials, infl. perc. in the top row

n = 500 0 2.5%

Coverage rate 0.9402 0.801 Mean C.I width 176419.948 181734.3 Rel.bias Var -0.005 -0.004 Rel.bias Tot -0.0004 0.027 Shapiro-Wilks(Tot) 0.872 0.870 Shapiro-Wilks(s.e) 0 0 n = 1000 0 2.5% Coverage rate 0.943 0.603

Mean C.I width 118203.01 121778.014

Rel.bias Var 0.034 0.037

Rel.bias Tot 0 0.028

Shapiro-Wilks(Tot) 0.14 0.142

Shapiro-Wilks(s.e) 0 0

(33)

10.3.2 Bootstrap

Aux. var. CS82, 1200 MC trials, n = 1000, 30 Bootstrap repl.

Inflation 0 2.5%

Mean C.I width 180294.938 189532.8

Rel.bias Tot -0.0009 0.031 Shapiro-Wilks(Tot) 0.427 0.4171 Shapiro-Wilks(s.e) 0.045 0.0441

Table 14: RMT85 estimated, CS82 aux. variable

Aux. var. REV84, 1200 MC trials, n = 1000,30 Bootstrap repl.

Inflation 0 2.5%

Mean C.I width 99407.28 104967.318 Rel.bias Var -0.008 -0.009 Rel.bias Tot -0.0006 0.026 Shapiro-Wilks(Tot) 0.355 0.350

Shapiro-Wilks(s.e) 0 0

Table 15: RMT85 estimated, REV84 aux. variable

Aux. var. CS82 and SS82, 1200 MC trials, n = 1000,30 Bootstrap repl.

Inflation 0 2.5%

Mean C.I width 124067.486 127847.398

Rel.bias Tot -0.001 0.027

Shapiro-Wilks(Tot) 0.593 0.626 Shapiro-Wilks(s.e) 0.5 0.303

(34)

10.3.3 Jackknife

Aux. var. CS82, 100 MC trials, n = 500

Inflation 0 2.5%

Mean C.I width 264426 278305,089

Table 17: RMT85 estimated, CS82 aux. variable

Aux. var. REV84, 100 MC trials, n = 500

Inflation 0 2.5%

Mean C.I width 145936,4 154183,1 Rel.bias Var -0.056 -0,053

Shapiro-Wilks(Tot) 0.287 0.265 Shapiro-Wilks(s.e) 0.39 0.427 Table 18: RMT85 estimated, REV84 aux. variable

Aux. var. CS82 and SS82, 100 MC trials, n = 500 Inflation (of CS82) 0 2.5%

Mean C.I width 182484.9 183307.56

Table 19: RMT85 estimated,CS82 and SS82 )aux. variable

10.4 Simulation Plots

(35)

Figure 4: Histogram of estimated totals and SEs, aux.var. CS82

Figure 5: QQ-plots of estimated totals and SEs, aux.var. CS82

(36)

(37)

Figure 9: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82, n = 1000 Auxiliary variable CS82 with 2.5% measurement error:

(38)

Figure 11: QQ-plots of estimated totals and SEs, aux.var. CS82 inflated 2.5%

Figure 12: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82, infl. 2.5%, n = 500

(39)

Figure 13: Histogram of estimated totals and SEs, aux.var. CS82 inflated 2.5%

(40)

Figure 15: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82, infl. 2.5%, n= 1000

Auxiliary variable CS82 with 5% measurement error

(41)

Figure 17: QQ-plots of estimated totals and SEs, aux.var. CS82 inflated 5%

Figure 18: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82, infl. 5%, n = 500

(42)

Figure 19: Histogram of estimated totals and SEs, aux.var. CS82 inflated 5%

(43)

Figure 21: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82, infl. 5%, n = 1000

Auxiliary variable REV84 with no measurement error:

(44)

Figure 23: QQ-plots of estimated totals and SEs, aux.var. REV84

Figure 24: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. REV84, n = 500

(45)

Figure 25: Histogram of estimated totals and SEs, aux.var. REV84

(46)

Figure 27: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. REV84, n = 1000

Auxiliary variable REV84 with 2.5% measurement error:

(47)

Figure 29: QQ-plots of estimated totals and SEs, aux.var REV84. infl. 2.5%

Figure 30: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. REV84 infl 2.5%, n = 500

(48)

Figure 31: Histogram of estimated totals and SEs, aux.var. REV84 infl. 2.5%

(49)

Figure 33: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. REV84 infl 2.5%, n = 1000

Auxiliary variables CS82 and SS82 with no measurement error:

(50)

Figure 35: QQ-plots of estimated totals and SEs, aux.var. CS82 and SS82

Figure 36: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 and SS82, n = 500

(51)

Figure 37: Histogram of estimated totals and SEs, aux.var. CS82 and SS82, CS82 infl. 2.5%

(52)

Figure 39: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 and SS82, n = 500

(53)

Figure 41: QQ-plots of estimated totals and SEs, aux.var. CS82 and SS82, CS82 infl. 2.5%

Figure 42: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 and SS82, CS82 infl. 2.5%, n = 1000

10.4.2 Bootstrap

(54)

Figure 43: Histogram of estimated totals and SEs, aux.var.CS82

(55)

Figure 45: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82, n = 1000, 30 Bootstrap replicates, 1200 MC trials

Auxiliary variable CS82 with 2.5% measurement error:

(56)

Figure 47: QQ-plots of estimated totals and SEs, aux.var. CS82 infl. 2.5%

Figure 48: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 infl. 2.5%, n = 1000, 30 Bootstrap replicates, 1200 MC trials

(57)

(58)

Figure 51: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. REV84, n = 1000, 30 Bootstrap replicates, 1200 MC trials

Auxiliary variable REV84 with 2.5% measurement error:

(59)

Figure 53: QQ-plots of estimated totals and SEs, aux.var. REV84 infl. 2.5%

Figure 54: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. REV84 infl 2.5%, n = 1000, 30 Bootstrap replicates, 1200 MC trials

(60)

Figure 55: Histogram of estimated totals and SEs, aux.var. CS82 and SS82

(61)

Figure 57: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 and SS82, n = 1000, 30 Bootstrap replicates, 1200 MC trials

Auxiliary variables CS82 and SS82 with 2.5% measurement error added to CS82:

(62)

Figure 60: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 and SS82, n = 1000, 30 Bootstrap replicates, 1200 MC trials, CS82 infl. 2.5%

10.4.3 Jackknife Auxiliary variable CS82:

(63)

(64)

Figure 63: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82, n = 500, 100 MC trials

Auxiliary variable CS82 with 2.5% measurement error added:

(65)

Figure 65: QQ-plots of estimated totals and SEs, aux.var. CS82 infl. 2.5%

Figure 66: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 infl. 2.5%, n = 500, 100 MC trials

(66)

(67)

Figure 69: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. REV84, n = 500, 100 MC trials

Auxiliary variable REV84 with 2.5% measurement error added:

(68)

Figure 71: QQ-plots of estimated totals and SEs, aux.var. REV84 infl. 2.5%

Figure 72: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. REV84 infl. 2.5%, n = 500, 100 MC trials

(69)

Figure 73: Histogram of estimated totals and SEs, aux.var. CS82 and SS82

(70)

Figure 75: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 and SS82, n = 500, 100 MC trials

Auxiliary variables CS82 and SS82 with 2.5% measurement error added to CS82:

(71)

Figure 78: T-statistic histogram based on est. SEs with std.normal overlay, aux.var. CS82 and SS82, n = 500, 100 MC trials, CS82 infl. 2.5%

Variance Estimation of the Calibration Estimator with Measurement Errors in the Auxiliary Information