ON THE EFFECTS OF THE FOLLOW-UP IN THE STATISTICS SWEDEN SURVEY OF HOUSEHOLD FINANCES


Statistics C and D

Supervisor: Thomas Laitila Examiner: Sune Karlsson Spring term 2009

Milica Petric 720720


The Survey of Household Finances is one of Statistics Sweden’s household surveys carried out every year. The frame for the survey is the Swedish Total Population Register (TPR). A stratified network sampling with seven strata is used as the sampling design. The preliminary household to which each selected person belongs is identified through the TPR, i.e. the sampling unit is a person and the reporting unit is a household. The expected sample size is approximately 17000 households each year. Data is collected with two modes. The first mode is Computer Aided Telephone Interviews (CATI) and the second one is matching of information from the administrative data registers.

The targeted follow-up is used to reduce and minimize nonresponse in data collection and its effect on the estimates. The targeted follow-up used in the survey is described in the essay and the outcomes for the years 2005, 2006 and 2007 are analysed. The relationship between the relative nonresponse error and sample-weighted response rate is studied on the basis of the results with or without follow-up based on the administrative data. Comparisons of the precision gains for estimates of ratios and totals through employing follow-up are studied by comparisons of the expansion and GREG estimators. A study of the correlation between relative nonresponse error and sample-weighted response rate is conducted.

The estimation stage of the survey is studied by using the two-phase sampling when data from the follow-up is included. Effects on the estimates by using the two-phase sampling design are examined.

Key words: stratified network sampling, follow-up, relative nonresponse error, sample-weighted response rate, two-phase sampling.


Contents

1 Introduction
1.1 The Aim and Objectives of the Essay
1.2 Comment
1.3 Remark
2 Description of the Survey of Household Finances
2.1 General Description of the Survey
2.2 The Estimation in the Survey and Notation
2.2.1 Estimation of Totals Based on Administrative Registers
2.2.2 Estimation of Totals Based on the Response Set in the Survey
2.2.3 Estimation of the Other Parameters than Totals
2.3 Error Sources in the HEK Survey
3 Precision Gain of the Follow-up in the HEK Survey
3.1 Description of Follow-up in the Data Collection Step of the Survey
3.2 Some Results from the Targeted Follow-up
4 Nonresponse Adjustment with the Calibration Approach
4.1 Calibration with or without the Follow-up in the HEK Survey
5 The General Analysis of the Correlation between the Nonresponse Rate and the Nonresponse Error
5.1 Results of the Regression Analysis
6 Using Two-Phase Sampling in the Survey
6.1 The Present Procedure and Why Two-Phase Sampling is More Suitable
6.2 Results of the Two-Phase Sampling and Comparisons with Regular Results in the HEK Survey
7 Discussion and Conclusions
References
Appendix
Appendix 1 – Status Codes from CATI Used at Statistics Sweden
Appendix 2 – The Estimated Disposable Income per Consuming Unit for a CATI Period
Appendix 3 – Results Excl. and Incl. Follow-up in the HEK Survey
Appendix 4 – Point- and Standard Error Estimates Used for the Tables
Appendix 5 – Precision Measure of the Estimates
Appendix 6 – Results Excl. and Incl. Follow-up in the HEK Survey Using the Calibration Estimator
Appendix 7 – Results Given by the Calibration Estimator for Phases I and II
Appendix 8 – Results of the Estimation in the Case of Two-Phase Sampling and Present Estimation


1 Introduction

Since 1975, Statistics Sweden has conducted a national income distribution study commissioned by the Swedish Parliament (Survey of Household Finances, acronym HEK, formerly called the Income Survey, HINK). It is a sample survey carried out every year.

The aim of the survey is to describe the distribution of disposable income among households, to illustrate income structures and to describe expenditures and living situations for various types of households. As of the 2003 survey, a survey on housing and housing expenses is included. The target population is households. A household consists of people who were registered in the Swedish Total Population Register (TPR) at some time during the reference year (income year). The main published statistics are based on people who were registered on both the 1st of January and the 31st of December of the income year. Institutional households are excluded from the statistics.

Population characteristics of interest for people and households are estimated by means, medians, totals, number of people/households with certain characteristics and Gini coefficients. The Gini coefficient is a measure used to describe the inequality of the income distribution. Margins of error defined as two times the standard error of the estimates are used as measures for estimation uncertainty.

Estimates are derived for several domains of study, which are based on background characteristics. The main characteristics used are: type of household, gender, age, socio-economic classification, country of birth and deciles.

The concept of household that is used in the statistics is a housekeeping unit. This is the main household unit used since 1991 and it means a household of people who live together in the same dwelling, prepare and have meals together, and share the housekeeping. Composition of the household is measured with the reference time of the 31st of December. For other data that are included in the survey, the reference period is generally an income year.


1.1 The Aim and Objectives of the Essay

In this essay I am going to take up different issues with the aim of improving the quality of the HEK survey in some parts of the statistics production.

• The aim of the essay is to describe and document the estimation process of the survey in use.

• After the initial data collection in the survey there is a reminder or follow-up for those households that are nonrespondents. This involves the following objectives:

o Describe and document the follow-up in the survey. Up to now there is no documentation and description of the targeted follow-up in the survey.

o Examine the effects of the targeted follow-up on the estimates. The targeted follow-up is used with the idea of increasing the precision of the survey estimates. No examination of its effects has been done yet.

o Evaluate the option of using calibration instead of the follow-up. Examine the possibility of using these methods as alternatives to each other.

o Illustrate the relationship between the nonresponse bias and the nonresponse rate on the basis of the results from the survey. In the statistical literature this relationship is often described but rarely illustrated.

• Suggest a two-phase sampling design for the survey. Apply a two-phase sampling design in the estimation phase of the survey where follow-up is recognized as the second phase of the survey. Illustrate the effects of the design change on the empirical data.

1.2 Comment

In Sweden, many population studies are made possible by the existence of many person-based registers of good accuracy and timeliness, such as the Total Population Register and the Income Register. These registers offer the possibility of making inferences about the Swedish population in many fields via direct sampling. One of the fields where the situation is different is statistics on households. Since there are different definitions of a household unit (e.g. register-based household, family and housekeeping unit), different studies will have different emphases.

For studies using a family or a housekeeping unit there is no possibility of using direct sampling because there is no available frame. Hence indirect sampling, in this case network sampling, is used. In the future, the premises for these studies may change because work on the creation of a dwelling register has begun.

1.3 Remark

In the essay I am going to assume that improving the quality of the survey implies an additive model, i.e. improvements in every step of the statistical process result in quality improvement at the survey level.

In the essay I am not considering the cost of the survey or the cost of the different processes of the survey.

The SAS¹ codes and programs used in the estimation of the results are not included in the essay.

¹ Software for statistical analysis.


2 Description of the Survey of Household Finances

2.1 General Description of the Survey

The Survey of Household Finances is a yearly survey. The primary aim of the survey is to describe the distribution of disposable income among households and to illustrate income structures. The secondary aim is to collect data on expenditures to be used as input in complex analyses of relationships among income, expenditures, and household as well as individual characteristics. The collected data are also used by different Swedish departments and agencies.

The target population is all households in Sweden. The households are composed of people who were registered in the TPR at some time during the reference year (income year). Published statistics are usually estimated for so-called “whole year residence” persons and households. A “whole year residence household” is a household where all the adults (18 years of age or more) were registered in Sweden both at the beginning (1 January) and at the end (31 December) of the reference year. A “whole year residence person” is an adult living in a “whole year residence household”. In published statistics, children under the age of 18 are always included in a “whole year residence household” according to the definition above.

The frame population is people in the Swedish TPR who are at least 18 years old. The sample is co-ordinated² with the LINDA³ survey and the Household Budget Survey (HBS). The HEK survey is maximally positively co-ordinated with the LINDA survey to enable longitudinal studies of certain variables. At the same time the HEK survey is maximally negatively co-ordinated with the HBS to minimize the response burden of the persons in the frame. The HBS is maximally positively co-ordinated with the LINDA survey.

Stratified network sampling (Rosén 1987, Lavallée 2007) is used as the sampling design in the survey. Probability samples of persons are obtained in each stratum. The preliminary household composition to which the selected person belongs is identified through the TPR, i.e. the sampling unit is a person and the reporting unit is a household. Then, by means of the questions in the interview, the final composition of the household is established, including all children. Data are collected for both individuals and households. One person can be included in only one household.

² For more detailed information on the co-ordinated samples see Lindblom (2003).

³ LINDA is a register-based longitudinal data set. It consists of a large panel of individuals, and their household members, which is representative for the population since 1960.

Three of the strata used in the survey are defined as:

• persons living in an owner-occupied flat,

• persons not living in an owner-occupied flat who are older than 74 years, and

• persons not living in an owner-occupied flat who are between 18 and 74 years old.

People living in owner-occupied flats are an important study domain in the survey, which is why they form a stratum of their own; this guarantees estimates with satisfactory precision.

Additionally, four strata are defined comprising people with large capital gain and/or loss. These four strata are used to increase the precision of estimates of totals and to limit the effect of outliers on the estimated results. The four strata are excluded from the Computer Aided Telephone Interviews (CATI). One of the characteristics of this group of people is that its composition is stable over time, especially for those with the largest values of the capital variables, because capital is accumulated over years and generations. This group is very small in Sweden. Because of the large sampling fraction within these strata, many of these people are selected for the survey often. At the same time, the response burden placed on them is likely to be unrealistic, and their busy lives limit the spare time they are willing to give up to being interviewed. The conclusion is that the probability of getting answers is very low. Therefore they are counted as nonresponse.

A sample among the four strata is selected one year after the reference year, i.e. six months after the data collection phase. This late selection is due to the fact that information about large capital gain and/or loss is not available until tax returns are completed for a fiscal year. In practice, when information about large capital gain and/or loss is available, the frame population is sorted into seven strata instead of the initial three strata.

All individuals are sorted into the seven so-called new strata, and within each stratum persons are sorted in accordance with their permanent random numbers. All persons and their households sorted into the original three strata remain in these strata; those sorted into the additional four strata are selected, and extra persons are also selected so that the sample from every additional stratum consists of ten persons and their households. One practical result of this approach is that the initial inclusion probabilities of households change between the beginning of the survey and the estimation step.
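The role of permanent random numbers in co-ordinating samples can be illustrated with a minimal sketch for a single stratum. It is a generic illustration and not the procedure actually used at Statistics Sweden (for that, see Lindblom 2003): the function names `assign_prn` and `prn_sample`, the frame size, the sample sizes and the starting points are all assumptions made for the example. Two surveys that start at the same point on the unit interval are positively co-ordinated, while a survey that starts far away is negatively co-ordinated.

```python
import random

def assign_prn(unit_ids, seed=2009):
    """Give every frame unit one permanent uniform random number in [0, 1)."""
    rng = random.Random(seed)
    return {unit: rng.random() for unit in unit_ids}

def prn_sample(prn, n, start=0.0):
    """Select the n units whose PRNs come first when moving right from
    `start`, wrapping around at 1 (hypothetical selection rule)."""
    ordered = sorted(prn, key=lambda unit: (prn[unit] - start) % 1.0)
    return set(ordered[:n])

frame = list(range(1000))          # hypothetical stratum of 1000 persons
prn = assign_prn(frame)

survey_a = prn_sample(prn, 100, start=0.0)
survey_b = prn_sample(prn, 100, start=0.0)  # same start: positive co-ordination
survey_c = prn_sample(prn, 100, start=0.5)  # distant start: negative co-ordination

overlap_ab = len(survey_a & survey_b)  # identical samples, maximal overlap
overlap_ac = len(survey_a & survey_c)  # disjoint samples, minimal overlap
```

Because the random numbers are permanent, the overlap between surveys is controlled only through the choice of starting point, which is what makes both kinds of co-ordination possible from one frame.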

The total number of strata in the survey is seven. The expected sample size is approximately 17 000 households per year. Data is collected by means of two modes. The first mode is CATI and the second mode is matching of information from the administrative data registers that contain data about e.g. income, taxes and transfer payments for each income year.

Selected households are divided into four CATI groups according to certain characteristics. The CATI groups comprise about 4200 households each. The first group consists mainly of households living in a rental flat and the second group mainly of households living in an owner-occupied flat; households living in a one- or two-dwelling building or on an agricultural property are divided between the third and the fourth group. The time of data collection differs between the groups for practical reasons and because of differences in the sizes of the groups. To reduce nonresponse in the survey a follow-up of the nonrespondents is used. After every initial data collection there is a time period for the follow-up. A more detailed description of the follow-up study is given in Chapter 3.

Target tables with target variables and domains are defined by the tables in the Publishing Plan. Collected data is edited, coded and corrected. There is nonresponse at both the unit and the item level in the survey. Nonresponse at the unit level means that data is missing for a household. Nonresponse at the item level means that data is missing for some questions concerning a household. Item nonresponse is adjusted for by means of different imputation methods such as regression, nearest neighbour, respondent mean and cold deck imputation. Calibration⁴ is used as a method for adjusting for the negative effects of unit nonresponse in the survey.

Two estimators are used in the survey. If the collected data comes from administrative registers, the π-estimator is used for the estimation of parameters (see Särndal, Swensson and Wretman 1992). This estimator is used because the data is based on administrative registers and is complete, i.e. without nonresponse. For parameters estimated on the basis of questionnaire variables the Generalised REGression (GREG) estimator is used. The GREG estimator is used to reduce nonresponse bias and variance as it corresponds to a calibration estimator with standard weighting. Three principles should be considered when selecting a vector of auxiliary variables. According to Lundström and Särndal (2001, 2005) the auxiliary vector should:

• explain the variation of the response probabilities,

• explain the variation of the main study variables, and

• identify the most important domains.

⁴ See Lundström and Särndal (2001).

A calibration approach is described here, but in fact the GREG estimator is used in the estimation of the HEK survey. Calibration estimators form a broad family of estimators which contains the GREG estimator as a special case. One basic difference is that in the case of the GREG estimator the response probabilities of the units in the survey are estimated by the proportion of responding units among the selected units. Similarities and differences between the estimators and some of their characteristics are described in Särndal and Lundström (2005, Chapters 6 and 7) as well as in Andersson (2007, Chapters 4 and 5).

After a nonresponse analysis in the survey, the final auxiliary vector used in the estimation is described in Table 1. The choice of variables is made based on nonresponse analysis and analysis of the distributions of weights. In the table the aggregate levels of the auxiliary variable are marked in columns.


Table 1. The auxiliary vector used in the estimation of the HEK survey; the aggregation levels of each auxiliary variable are marked in the columns

Auxiliary variables (number of classes); Number of individuals; Number of months; Total

1 Age group (10) X

2 Sex X

3 Geographical regions, so-called H regions (9) X

4 Social allowance X X

5 National child allowance X X

6 Maintenance for a child

7 Parents’ allowance for birth of child X X

8 Temporary parents’ allowance for sick child X X

9 Housing allowance X X

10 Rent allowance for pensioners X X

11 Sickness allowance X X

12 Unemployment benefits X X

13 Educational grant X X

14 Income from wage or salary (5) X X

15 Country of birth (6) X

16 Immigration year (3) X

17 Type of household (10) X

2.2 The Estimation in the Survey and Notation

Let $U_h$ denote the set of $N_h$ persons that belong to stratum $h$ of the frame population ($h = 1, \dots, 7$). Further let $U'$ denote the population of $N'$ households, which is identifiable from the $N = \sum_{h=1}^{7} N_h$ persons in the frame population $U = \bigcup_{h=1}^{7} U_h$. For a household $i \in U'$, let $U'_i$ denote the set of $N'_i$ persons who compose the household and let $U_i$ denote the set of $N_i \le N'_i$ persons in the household who are included in the population $U$ and hence eligible for selection. Figures 1 and 2 illustrate the situation. Every person $k$ belongs to one and only one household in $U'$. At the same time, there can be persons in household $i$ that do not belong to $U$ because of e.g. the age limitation of the frame population or the undercoverage of the frame.


Figure 1. Network sampling (X are selected, * are eligible but not selected and O are ineligible)

Note 2.2.1: Statistical measures describing households are indexed with HH, while IND is used as the index for measures describing the population of people.

Let $Y$ denote a study variable and let $y_k$ denote its value for person $k$. Then, if the targeted objects are households in domain $d$, the total of variable $y$ can be expressed as

$$t_{yHH_d} = \sum_{i \in U'} I_{di}\, t_{yi}$$

where $t_{yi} = \sum_{k \in U'_i} y_k$ is the total of the variable $y$ for household $i$ and

$$I_{di} = \begin{cases} 1 & \text{if household } i \text{ belongs to domain } d \\ 0 & \text{otherwise} \end{cases}$$

An expression for a total of variable $y$ when the targeted objects are persons is

$$t_{yIND_d} = \sum_{i \in U'} \sum_{k \in U'_i} I_{dk}\, y_k = \sum_{i \in U'} t_{y_d i}$$

where $t_{y_d i} = \sum_{k \in U'_i} I_{dk}\, y_k$ is the total of the variable $y$ for the persons in household $i$ who belong to domain $d$ and

$$I_{dk} = \begin{cases} 1 & \text{if person } k \text{ belongs to domain } d \\ 0 & \text{otherwise} \end{cases}$$

In the case where a total is the number of persons having some desired characteristic the expression is

$$t_{I\,IND_d} = \sum_{i \in U'} \sum_{k \in U'_i} I_{dk}$$

and for a total of households having some desired characteristic the expression is

$$t_{I\,HH_d} = \sum_{i \in U'} I_{di}$$
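The four population totals defined above can be checked on a toy population. The following is only a sketch: the three households, their incomes and the domain indicators are invented for illustration, and the variable names are hypothetical.

```python
# Toy target population U': each household i carries its domain indicator
# I_di and its persons, each with a value y_k and a domain indicator I_dk.
households = [
    {"in_domain": True,
     "persons": [{"y": 200.0, "in_domain": True},
                 {"y": 150.0, "in_domain": False}]},
    {"in_domain": False,
     "persons": [{"y": 300.0, "in_domain": True}]},
    {"in_domain": True,
     "persons": [{"y": 100.0, "in_domain": False}]},
]

def household_total(household):
    """t_yi: sum of y_k over all persons k in household i."""
    return sum(person["y"] for person in household["persons"])

# t_yHH_d: household totals summed over the households in the domain.
t_y_hh = sum(household_total(h) for h in households if h["in_domain"])

# t_yIND_d: y_k summed over the persons in the domain.
t_y_ind = sum(p["y"] for h in households for p in h["persons"] if p["in_domain"])

# Counts of households and of persons in the domain.
n_hh = sum(1 for h in households if h["in_domain"])
n_ind = sum(1 for h in households for p in h["persons"] if p["in_domain"])
```

The example shows that the household and the person versions of a total generally differ, because a household can belong to a domain while some of its members do not, and vice versa.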

To estimate parameters such as $t_{yHH_d}$ and $t_{yIND_d}$, a primary sample $s$ of $n$ persons is taken from the population $U$ of size $N$. Assume that the inclusion probabilities $\pi_k$ for all persons $k \in U$ are known and $\pi_k > 0$. The sampling design used in the survey is stratified network sampling with seven strata $h$ ($h = 1, \dots, 7$). All the persons belonging to the households selected via the primary sample $s$ constitute the secondary sample $s'$. Household $i$ consists of $N'_i$ persons, among whom $N_i$ belong to the frame population $U$ and are eligible for selection to the primary sample (see Figure 2). The secondary sample $s'$ consists of $n'$ households.

Figure 2. Illustration of the HEK population (households in the target population consist of the eligible persons, black circles, and sometimes also of one or more non-eligible persons, grey circles). The figure shows the frame population $U$ (persons $1, \dots, N$), the target population $U'$ and an example household $U'_i$ with $N'_i = 5$ (three adults and two children), of whom $N_i = 3$ (the three adults) belong to $U$.

A household can be selected more than once in the primary sample $s$ since a household consists of one or more eligible persons from the frame population $U$. The inclusion probability of a household $i$ in the secondary sample depends on the number of eligible persons from the frame, i.e. the inclusion probability is a function of the number of persons in the household who are eligible from the frame and of their inclusion probabilities. It means that a household composed of more than one eligible person has a higher inclusion probability. If estimates for both persons and households are desired on the basis of sample $s'$, correct selection probabilities or design weights should be associated with all persons in every selected household. By including a household weight (Rosén 1987) in the estimation, the problems of varying inclusion probabilities within the same stratum and within a selected household are taken into account. In the survey a person's household or network weight is

$$a_k = \frac{n_{h(k|i)} / N_{h(k|i)}}{\sum_{l \in U_{i(k)}} \left( n_{h(l)} / N_{h(l)} \right)}, \qquad k \in U_i$$

where $h(k|i)$ is the stratum to which the selected person $k$ from the given household $i$ belongs and $U_{i(k)}$ is the set of all eligible persons in the household $i$ to which $k$ belongs. Also, $n_{h(k|i)} / N_{h(k|i)}$ is the inclusion probability of a person $k$ from the given household $i$ and $\sum_{l \in U_{i(k)}} \left( n_{h(l)} / N_{h(l)} \right)$ is the sum of the inclusion probabilities of all eligible persons $l$ belonging to household $i$. This implies that over all eligible persons in household $i$

$$\sum_{k \in U_i} a_k = 1$$

If all eligible persons of a household belong to the same stratum the household weight is

$$a_k = \frac{1}{\sum_{l \in U_{i(k)}} I_l}$$

where $\sum_{l \in U_{i(k)}} I_l$ is the number of eligible persons from the given household. If the eligible persons of a household belong to different strata, the household's weight varies between them. Once a person of the household is selected the household's weight is fixed. Similar suggestions for the definition of the network weight are described in Lavallée (2007) and Andersson (1997). According to Rosén (2000) the choice of household weight does not affect the estimator's unbiasedness, only its precision.
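The behaviour of the network weight can be illustrated with a short sketch. The function name `network_weights`, the two strata and their sampling fractions are hypothetical; only the formula for $a_k$ is taken from the text above.

```python
def network_weights(household, sampling_fraction):
    """Compute a_k for every eligible person in one household.

    household:         dict mapping eligible person -> stratum h(k)
    sampling_fraction: dict mapping stratum h -> n_h / N_h
    """
    # Denominator: sum of inclusion probabilities over all eligible persons.
    denominator = sum(sampling_fraction[h] for h in household.values())
    return {person: sampling_fraction[h] / denominator
            for person, h in household.items()}

fractions = {1: 0.002, 2: 0.004}  # hypothetical n_h / N_h for two strata

# Three eligible persons in the same stratum: each weight is 1/3.
same_stratum = network_weights({"p1": 1, "p2": 1, "p3": 1}, fractions)

# Two eligible persons in different strata: the weights differ (1/3 and 2/3),
# so the household's weight depends on which person was selected.
mixed_strata = network_weights({"p1": 1, "p2": 2}, fractions)
```

In both cases the weights of the eligible persons in the household sum to one, which is the property stated above.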


2.2.1 Estimation of Totals Based on Administrative Registers

For parameters based on administrative registers, data are matched for all persons belonging to $s'$. The unbiased⁷ weighted⁸ π-estimator of the population total $t_{y_d}$ for a domain can for households be written as

$$\hat{t}_{yHH_d} = \sum_{h=1}^{7} \sum_{k \in s_h} w_k\, I_{di(k)}\, t_{yi(k)}$$

where $w_k = \dfrac{N_h a_k}{n_h}$ is a weight for $k \in s_h$ ($h = 1, \dots, 7$) and

$$t_{yi(k)} = \sum_{l \in s'_{i(k)}} y_l$$

while for persons

$$\hat{t}_{yIND_d} = \sum_{h=1}^{7} \sum_{k \in s_h} w_k\, t_{y_d i(k)} \qquad \text{where} \qquad t_{y_d i(k)} = \sum_{l \in s'_{i(k)}} I_{dl}\, y_l$$

A variance estimator is given by

$$\hat{V}\left(\hat{t}_{yHH_d}\right) = \sum_{h=1}^{7} \frac{N_h^2}{n_h} \left(1 - \frac{n_h}{N_h}\right) \frac{1}{n_h - 1} \sum_{k \in s_h} \left( a_k I_{di(k)} t_{yi(k)} - \frac{1}{n_h} \sum_{l \in s_h} a_l I_{di(l)} t_{yi(l)} \right)^2$$

and

$$\hat{V}\left(\hat{t}_{yIND_d}\right) = \sum_{h=1}^{7} \frac{N_h^2}{n_h} \left(1 - \frac{n_h}{N_h}\right) \frac{1}{n_h - 1} \sum_{k \in s_h} \left( a_k t_{y_d i(k)} - \frac{1}{n_h} \sum_{l \in s_h} a_l t_{y_d i(l)} \right)^2$$

⁷ Rosén (1987) states that if there is no measurement error in the survey this is an unbiased estimator.

⁸ What is here called the weighted π-estimator is just the π-estimator extended with a network weight $a_k$.
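As a minimal sketch of how the weighted π-estimator and its variance estimator for households could be computed, the code below works on a list of sampled persons. The function name `pi_estimate`, the input layout and the example figures are assumptions for illustration and are not taken from the SAS programs used in the survey. Strata with a single sampled person contribute nothing to the variance here, which is also an assumption of the sketch.

```python
from collections import defaultdict

def pi_estimate(sample, stratum_sizes):
    """Weighted pi-estimator of a household total and its variance estimate.

    sample:        list of (h, a_k, z_k), where h is the stratum of the
                   sampled person k, a_k the network weight and
                   z_k = I_di(k) * t_yi(k) the household total in the domain
    stratum_sizes: dict mapping stratum h -> N_h
    """
    by_stratum = defaultdict(list)
    for h, a_k, z_k in sample:
        by_stratum[h].append(a_k * z_k)

    total, variance = 0.0, 0.0
    for h, values in by_stratum.items():
        n_h, N_h = len(values), stratum_sizes[h]
        mean = sum(values) / n_h
        total += N_h * mean                      # sum of (N_h / n_h) a_k z_k
        if n_h > 1:                              # assumption: skip singletons
            s2 = sum((v - mean) ** 2 for v in values) / (n_h - 1)
            variance += N_h ** 2 * (1 - n_h / N_h) / n_h * s2
    return total, variance

# Hypothetical sample: two strata with two sampled persons each.
N = {1: 10, 2: 20}
sample = [(1, 1.0, 4.0), (1, 0.5, 12.0), (2, 1.0, 5.0), (2, 1.0, 7.0)]
total, variance = pi_estimate(sample, N)
margin_of_error = 2 * variance ** 0.5  # two times the standard error
```

The estimator for persons has the same form, with $z_k$ replaced by $t_{y_d i(k)}$.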

2.2.2 Estimation of Totals Based on the Response Set in the Survey

As in most sample surveys, there is a nonresponse¹⁰ problem in the HEK survey. Some desired data is missing because of refusal to provide information or because no contact could be established with the selected persons and their households. Data is available only for a response set $r$. Full response in a survey gives $r = s$, so in general $r \subseteq s$. Since responding and nonresponding households may differ systematically regarding study variables, there are potential nonresponse errors. Nonresponse can have a negative effect on the quality of estimates in terms of bias and lower precision. The size of the nonresponse bias cannot be quantified, since that would require information about the whole target population, and if such information were available there would be no need to select a sample. However, it is assumed that the nonresponse bias can often be reduced by using auxiliary variables in the estimation. One such approach is the calibration method (Lundström and Särndal, 2001). If the auxiliary variables are selected so as to comply with the principles in Section 2.1, the result is reduced nonresponse bias and standard error for the estimates. CLAN97¹¹ and ETOS¹² are programs developed at Statistics Sweden which are straightforward and flexible to use in the estimation. These programs simplify the handling of complex sample designs in the estimation phase of the surveys at Statistics Sweden.

In stratum $h$ there are $m_h$ responses from the $n_h$ persons who belong to sample $s_h$. The set of responses in stratum $h$ is denoted $r_h$. The auxiliary variables used in the HEK survey are shown in Table 1.

Let $\mathbf{x}$ denote the auxiliary vector based on the selected auxiliary variables. The total $t_{yHH_d}$ is estimated by

$$\hat{t}_{yHH_d} = \sum_{h=1}^{7} \sum_{k \in r_h} w^{*}_k\, I_{di(k)}\, t_{yi(k)}$$

while the total $t_{yIND_d}$ is estimated by

$$\hat{t}_{yIND_d} = \sum_{h=1}^{7} \sum_{k \in r_h} w^{*}_k\, t_{y_d i(k)}$$

where $w^{*}_k = \dfrac{N_h a_k g_{i(k)}}{m_h}$ for $k \in r_h$ and $g_{i(k)}$ is the calibration weight. Calibration as a method of compensating for nonresponse in the HEK survey is done at the household level, which means that all persons in the same household $i$ have the same calibration weight. The weight $g_{i(k)}$¹³ can be written as

$$g_{i(k)} = 1 + \left( \mathbf{t}_x - \sum_{h=1}^{7} \frac{N_h}{m_h} \sum_{l \in r_h} a_l\, \mathbf{t}_{xi(l)} \right)' \left( \sum_{h=1}^{7} \frac{N_h}{m_h} \sum_{l \in r_h} a_l\, \mathbf{t}_{xi(l)} \mathbf{t}'_{xi(l)} \right)^{-1} \mathbf{t}_{xi(k)}$$

where $\mathbf{t}_{xi} = \sum_{k \in U'_i} \mathbf{x}_k$ and $\mathbf{t}_x$ is the total for all households in the target population, i.e. in the frame.

The variance estimators are then

$$\hat{V}\left(\hat{t}_{yHH_d}\right) = \sum_{h=1}^{7} \frac{N_h^2}{m_h} \left(1 - \frac{m_h}{N_h}\right) \frac{1}{m_h - 1} \sum_{k \in r_h} \left( u_{HH_d k} - \frac{1}{m_h} \sum_{l \in r_h} u_{HH_d l} \right)^2$$

and

$$\hat{V}\left(\hat{t}_{yIND_d}\right) = \sum_{h=1}^{7} \frac{N_h^2}{m_h} \left(1 - \frac{m_h}{N_h}\right) \frac{1}{m_h - 1} \sum_{k \in r_h} \left( u_{IND_d k} - \frac{1}{m_h} \sum_{l \in r_h} u_{IND_d l} \right)^2$$

where

$$u_{HH_d k} = g_{i(k)}\, a_k \left( I_{di(k)}\, t_{yi(k)} - \mathbf{t}'_{xi(k)} \hat{\mathbf{B}}_{HH_d} \right)$$

$$\hat{\mathbf{B}}_{HH_d} = \left( \sum_{h=1}^{7} \frac{N_h}{m_h} \sum_{k \in r_h} a_k\, \mathbf{t}_{xi(k)} \mathbf{t}'_{xi(k)} \right)^{-1} \sum_{h=1}^{7} \frac{N_h}{m_h} \sum_{k \in r_h} a_k\, \mathbf{t}_{xi(k)}\, I_{di(k)}\, t_{yi(k)}$$

and

$$u_{IND_d k} = g_{i(k)}\, a_k \left( t_{y_d i(k)} - \mathbf{t}'_{xi(k)} \hat{\mathbf{B}}_{IND_d} \right)$$

where

$$\hat{\mathbf{B}}_{IND_d} = \left( \sum_{h=1}^{7} \frac{N_h}{m_h} \sum_{k \in r_h} a_k\, \mathbf{t}_{xi(k)} \mathbf{t}'_{xi(k)} \right)^{-1} \sum_{h=1}^{7} \frac{N_h}{m_h} \sum_{k \in r_h} a_k\, \mathbf{t}_{xi(k)}\, t_{y_d i(k)}$$

¹⁰ When I write about nonresponse I consider unit nonresponse. Item nonresponse of the survey is not treated in the essay. Moreover item nonresponse in the HEK survey is treated by imputation.

¹¹ A SAS program for computation of point and standard error estimates in sample surveys used at Statistics Sweden (Andersson and Nordberg, 1998).

¹² “ETOS (Estimation of Totals and Order Statistics) is a program designed to compute point- and standard error estimates from sample surveys of totals and order statistics.” and “The ideas in ETOS rely heavily on the program CLAN97, developed at Statistics Sweden by Andersson and Nordberg (1998).” Andersson (2007)

¹³ Lundström and Särndal (2001, Chapter 4), SCB (2006) and modified from Andersson (2007, Chapter 5.3).
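The calibration weight can be sketched in a few lines of plain Python. This is an illustration under stated assumptions and not the CLAN97 or ETOS implementation: the function names `solve` and `calibration_factors` are hypothetical, `d` stands for the weights $N_h a_k / m_h$ of the responding units, `X` holds their auxiliary vectors $\mathbf{t}_{xi(k)}$ and `t_x` the known auxiliary totals. The example uses two group indicators as auxiliary variables.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination
    with partial pivoting (small helper for the sketch)."""
    n = len(b)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def calibration_factors(d, X, t_x):
    """g_k = 1 + (t_x - sum d_l x_l)' (sum d_l x_l x_l')^{-1} x_k."""
    p = len(t_x)
    T = [[sum(dk * xk[a] * xk[b] for dk, xk in zip(d, X)) for b in range(p)]
         for a in range(p)]
    residual = [t_x[a] - sum(dk * xk[a] for dk, xk in zip(d, X))
                for a in range(p)]
    lam = solve(T, residual)  # T is symmetric, so lam' x_k gives the same value
    return [1.0 + sum(l * x for l, x in zip(lam, xk)) for xk in X]

# Hypothetical response set: four respondents in two auxiliary groups.
d = [10.0, 10.0, 20.0, 20.0]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
t_x = [30.0, 30.0]  # known group totals

g = calibration_factors(d, X, t_x)
calibrated = [dk * gk for dk, gk in zip(d, g)]
# The calibrated weights reproduce the known auxiliary totals.
check = [sum(w * xk[a] for w, xk in zip(calibrated, X)) for a in range(2)]
```

In this example the first group is under-represented among the respondents and gets a factor above one, while the second group is over-represented and gets a factor below one.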

2.2.3 Estimation of the Other Parameters than Totals

Population totals are a set of the characteristics of interest in the HEK study. Other parameters and statistical measures of interest are functions of totals or functions of order statistics such as means, percentiles, total for the given percentiles, medians and Gini-coefficients. A description of estimators of order statistics can be found in Andersson (2007) and SCB (2006). Estimation of other parameters is briefly described here.

Expressions of parameters depend on the objects targeted in the survey, i.e. on whether we are using HH or IND.

Mean

The mean of variable $y$ for households in domain $d$ can be expressed by

$$\bar{y}_{HH_d} = \frac{\sum_{i \in U'} I_{di} \sum_{k \in U'_i} y_k}{\sum_{i \in U'} I_{di}} = \frac{t_{yHH_d}}{N'_d}$$

Let $z_k = 1 / N'_i$ for $k \in U'_i$. Then $\sum_{k \in U'_i} z_k = 1$ for all households $i \in U'$ and it follows that $t_{zHH_d} = N'_d$. Thus the denominator in the expression of the mean is defined as a total.

The mean of variable $y$ for persons in domain $d$ can be expressed by

$$\bar{y}_{IND_d} = \frac{\sum_{i \in U'} \sum_{k \in U'_i} I_{dk}\, y_k}{\sum_{i \in U'} \sum_{k \in U'_i} I_{dk}} = \frac{t_{yIND_d}}{N'_{IND_d}}$$

Let $z_k = 1$ for all $k$ and it follows that $t_{zIND_d} = N'_{IND_d}$. Thus the denominator in the expression of the mean is defined as a total.

The corresponding estimators are

$$\hat{\bar{y}}_{HH_d} = \frac{\hat{t}_{yHH_d}}{\hat{N}'_d} \qquad \text{and} \qquad \hat{\bar{y}}_{IND_d} = \frac{\hat{t}_{yIND_d}}{\hat{N}'_{IND_d}}$$

For expressions of totals in the ratio above, see Section 2.2.1.

Expressions of the variances for $\hat{\bar{y}}_{HH_d}$ and $\hat{\bar{y}}_{IND_d}$ can be found in Section 5 of Särndal, Swensson and Wretman (1992).
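Since both the numerator and the denominator of a mean are totals, the estimated mean is a ratio of two estimated totals computed with the same weights. A minimal sketch follows, assuming final weights $w_k$ that already include the network weight; the function name `estimate_mean` and all figures are hypothetical.

```python
def estimate_mean(weights, y_totals, z_totals, in_domain):
    """Ratio of two estimated totals: sum(w * y) / sum(w * z) over the domain.

    For household means z is 1 per household (the sum of z_k = 1/N'_i over
    the household); for person means z is the number of persons in the domain.
    """
    t_y = sum(w * y for w, y, d in zip(weights, y_totals, in_domain) if d)
    t_z = sum(w * z for w, z, d in zip(weights, z_totals, in_domain) if d)
    return t_y / t_z

# Hypothetical sample of four households, three of them in the domain.
weights = [100.0, 200.0, 100.0, 50.0]
incomes = [300.0, 150.0, 400.0, 999.0]
ones = [1.0, 1.0, 1.0, 1.0]
in_domain = [True, True, True, False]

mean_hh = estimate_mean(weights, incomes, ones, in_domain)
```

The denominator here is the estimated number of households in the domain, so the same routine that estimates totals also delivers the means.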

2.3 Error Sources in the HEK Survey

A survey consists of many operations, from describing a problem to the final output of results. Each step of a survey can be associated with an error source. The types of error can be divided into two groups: sampling and nonsampling errors.

The sampling error is due to selecting a sample instead of enumeration. This type of error is considered by the presentation of point and standard error estimates. The group of nonsampling errors consists of nonresponse, coverage, measurement and processing errors. These errors and their influence on the survey’s results are briefly described below.

Nonresponse error is due to not observing all selected persons and their households. Nonresponse studies, which are carried out in the survey from time to time, lead to the conclusion that nonresponse in the survey is not random. The auxiliary variables used in the calibration should reduce nonresponse error in the estimates.

Coverage error occurs when there are differences between the target and the frame population. Overcoverage in the survey consists of persons and households that are wrongly selected into the survey. This type of error is very small in the HEK survey. Before collecting data an updated version of the frame is used to minimize overcoverage. Also households that do not exist in administrative registers other then TPR or about whose economic transactions within the welfare system there information is missing are considered as overcoverage in spite of actually belonging to the nonresponse. In this way the risk of overestimating income for households is minimized.

Undercoverage in the survey is made up of those persons and households that should be in the frame but are not. These are persons, and their households, who have recently moved to Sweden and are not yet registered in the TPR. Undercoverage also includes households consisting only of persons younger than 18 years. No correction for coverage errors is made. Periodically, the survey analyses the number of persons constituting coverage error and their characteristics. Up to now the coverage errors and their effects on the estimates have been considered small.

An important source of measurement error is data collection, i.e. the respondents give, intentionally or unintentionally, wrong answers. The quality of the data matched from administrative registers has improved yearly and is of a satisfactory standard. The variable type of household is used as a study domain for estimates of income. These estimates are based on the sample: for responding households the type of household is based on interviews, while data from the TPR are used for imputation for households in the nonresponse group. A TPR family consists of persons who are married, registered partners, or cohabitants with a common child or children, regardless of the children’s ages. The TPR holds no information about cohabitants without a common child or without children. Such households are registered in the TPR as single households. Consequently, the estimated number of single households is slightly too large because it is partly based on register information.


3 Precision Gain of the Follow-up in the HEK Survey

The response rate in a survey gives information on the realisation of a certain sample; it is not a direct quality indicator of the survey’s results. A few years ago the selection of units for the follow-up study was changed from random selection of households to a targeted follow-up among 24 groups, yielding a stratified sampling design using permanent random numbers. Previously the follow-up sample was selected among all eligible households belonging to the nonresponse group within the CATI system, and households were selected at random regardless of their characteristics. The reason for changing the method was that the random follow-up mainly increased the share of respondents from groups that are easier to get answers from. The sample size allocation of the follow-up is based on the nonresponse error of the estimated disposable income per consuming unit and the response rate in every group. Units in groups with a large absolute nonresponse error of the estimated disposable income per consuming unit and a low response rate have a higher probability of being selected for the follow-up study. This strategy is expected to yield higher precision and to reduce nonresponse error in the survey estimates.

One reason for still using the response rate as an indicator when allocating the follow-up sample, despite the fact that it is not a direct quality indicator, is the use of the survey data for tailor-made statistics. These tailor-made statistics use various domains that are not taken into account when the survey is designed. The number of responding units is also the number of observations on which tailor-made estimates are based in every domain, and it is one of the indicators that Statistics Sweden examines when judging whether an assignment can be undertaken and carried out at a satisfactory level of quality.

To be able to evaluate the follow-up of the HEK survey it is important that it is possible to compute point estimates and estimates of the standard error both before and after the follow-up study. This is also important when it comes to being able to compare estimates of the different stages of the survey.

Information about the consuming unit for a household and the household’s weight is based on information about the composition of the household. Register information from the TPR about the household’s composition is imputed for nonresponding households. This means that households that are still nonresponding after the follow-up, and households belonging to the overcoverage, are treated here as responding, since information about the household’s composition is available from the TPR. These imputed data are treated as correct information; the estimation does not take into account that they come from the TPR. This does not interfere with the analyses in the following sections because the validity of the level of the estimates is not essential.

The groups used as study domains in the tables are described in Appendix 9.

3.1 Description of Follow-up in the Data Collection Step of the Survey

A HEK sample is divided into four non-random CATI groups for fieldwork. Each group comprises approximately 4200 households and is formed on the basis of a certain characteristic, e.g. type of dwelling. The first group mainly consists of households living in rental flats and the second mainly of households living in owner-occupied flats. Households living in one- or two-dwelling buildings or on agricultural property are divided into the third and the fourth group.

One week before the fieldwork, all selected households receive an information letter about the survey, describing its purpose, the importance of their response and examples of the questions. Data is collected during the CATI period of four to five weeks. See the figure below for a scheme of the survey’s fieldwork. In spite of all the efforts of the interviewers and different strategies to maximize the number of respondents, nonresponse occurs. Every selected household in the CATI group is given a status code (see Appendix 1). These codes are used to sort households into response, nonresponse, non-contact or overcoverage. It is assumed that all non-contacted households represent nonresponse. This is possible because a so-called ID-run is done before every wave, which means that the sample is updated with the most recent information from the TPR. In this way the risk of having out-of-date information about households in the sample is kept very small, assuming that the information in the TPR is up to date. These updates often concern whether a person and household still belong to the target population.

Figure 3. Illustration of the HEK’s data collection scheme (2008). [Chart: timeline over weeks 1 to 20 with one row per group and wave (Group I / Wave I to Group IV / Wave IV); legend: introductory letter to the selected households, fieldwork, follow-up.]

After the ordinary data collection period, the CATI period, a follow-up study is carried out. Only a sample of the nonresponding households is selected for the follow-up study.

A sample for follow-up is selected as follows. All households included in one CATI period are divided into 24 groups according to background characteristics. Variables that are used to form groups are part of the auxiliary vector for the survey. These are:

• strata: stratum II (older than 74 years), stratum I (living in an owner-occupied flat), and others; three groups,

• nationality (Swedish or Nordic, and others); two groups,

• metropolitan (living in Stockholm, Göteborg or Malmö, and others); two groups,

• type of household (one-person household, and others); two groups.

All combinations of the variables give 24 groups and are described in Table A9a (Appendix 9).
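As a minimal sketch, the 24 groups can be enumerated as the Cartesian product of the four classification variables. The category labels below are illustrative shorthand chosen for this example, not the codes used in the survey (those are given in Table A9a).

```python
from itertools import product

# Illustrative labels only (assumption); the survey's own group
# definitions are listed in Table A9a (Appendix 9).
strata = ["stratum II", "stratum I", "other strata"]
nationality = ["Swedish or Nordic", "other"]
metropolitan = ["metropolitan", "non-metropolitan"]
household_type = ["one-person", "other"]

# Every combination of the four variables defines one follow-up group.
groups = list(product(strata, nationality, metropolitan, household_type))
print(len(groups))  # 3 * 2 * 2 * 2 = 24
```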

Information about persons’ yearly income17 is matched from the income register, and the disposable income per consuming unit is estimated for the responding, the nonresponding and all selected households in each of the 24 groups. A response rate after the initial data collection is calculated. See Table A2a (Appendix 2) for more details.

17 Income data refer to the year before the reference year of the survey. The reason is that income data become available only at the end of the year after the reference year, whereas the follow-up is done at the beginning of the year after the reference year. It is assumed that income does not change radically between two years, and the very few persons without income are assumed to have zero disposable income. The group with zero income often consists of newborns or people who have recently moved to Sweden.

Let the estimated disposable income per consuming unit be designated $\hat{y}_{HHd(S)}$ for the sample, $\hat{y}_{HHd(R)}$ for the response set and $\hat{y}_{HHd(O)}$ for the nonresponse set. The absolute relative nonresponse error of the disposable income per consuming unit is defined as

$$\text{Nonresponse error}_d = \psi_d = \frac{\left|\hat{y}_{HHd(R)} - \hat{y}_{HHd(S)}\right|}{\hat{y}_{HHd(S)}}$$

for domain $d$ of the given sample and wave. A more general way of expressing the absolute relative nonresponse error for an estimate $\hat{\theta}$ of the desired variable is

$$\text{Nonresponse error}_d = \psi_d = \frac{\left|\hat{\theta}_{HHd(R)} - \hat{\theta}_{HHd(S)}\right|}{\hat{\theta}_{HHd(S)}}$$
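The definition can be sketched in a few lines of code. This is only an illustration of the formula as read here (the difference between the response-set estimate and the full-sample estimate, relative to the full-sample estimate, in absolute value); the function name and the numbers in the example are hypothetical.

```python
def abs_relative_nonresponse_error(est_response, est_sample):
    """Absolute relative nonresponse error for one domain.

    est_response -- estimate based on the response set (R)
    est_sample   -- estimate based on the whole sample (S),
                    available here from register data
    """
    if est_sample == 0:
        raise ValueError("the sample estimate must be non-zero")
    return abs((est_response - est_sample) / est_sample)


# Hypothetical figures: respondents average 210 000, the whole sample 200 000.
psi_d = abs_relative_nonresponse_error(210_000, 200_000)
print(round(psi_d, 3))  # 0.05
```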

Not all nonresponding households are eligible for selection for the follow-up study, since some status codes are excluded. A sample is selected among households with status codes 41, 50, 52, 56 and 60-67 (see Appendix 1). The other nonresponse status codes are excluded from selection because they have extremely low response probabilities. This sorting of nonrespondents into two groups, eligible and non-eligible for selection to the follow-up study, is based on experience.

The survey budget specifies a certain sample size for the follow-up study. In the follow-up study scheme it was planned to select approximately 700 households per wave. The sample for the follow-up is allocated to the different groups depending on the group’s nonresponse error of the disposable income per consuming unit, the nonresponse rate in the group and the number of eligible households in the group. The allocation is an iterative process that continues until the expected size of the follow-up study is reached. The process is described below.

• The main condition for the allocation of the sample for the follow-up study is the group’s share of the nonresponse error summed over all groups. It can be expressed as

$$\Psi_d = \frac{\psi_d}{\Psi} \quad\text{where}\quad \Psi = \sum_d \psi_d.$$

• The upper limit of the allocated number of households in the sample for a given group is the number of eligible households in the group, $n_{o'}$, where $O'$ is the set of eligible households among the nonresponding households, $O' \subseteq O$. The upper limit for the allocation of the sample among groups can be affected by the number of households needed to achieve the desired response rate for a group. Often the desired response rate of a group is 70 percent. If the response rate of the survey is

$$p_{R,d} = \frac{n_{R,d}}{n_{S,d}}$$

for group $d$ then the desired response rate is $p_{r,d} = 0.70$. The following criterion is calculated for every group:

$$n_d^{Rr} = p_{r,d}\, n_{S,d} - n_{R,d}$$

and $n_d^{Rr} \geq 0$ where $p_{R,d} \leq p_{r,d} \leq 1$. In the survey the value of the desired response rate is often used.

• The lower limit of the allocation is five households from every group. If the number of eligible households in a group is less than five, all of them are included in the follow-up study of the wave.
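The allocation rules listed above can be sketched as a simple iterative routine. The essay does not give the exact algorithm, so the following is only one possible reading: each group starts at its lower limit, and households are then added one at a time to the group that is furthest below its proportional target, never exceeding the number of eligible households. The function name, argument names and example figures are hypothetical.

```python
def allocate_followup(psi, eligible, total, lower=5):
    """Allocate `total` follow-up households to groups (illustrative sketch).

    psi      -- dict: group -> absolute relative nonresponse error
    eligible -- dict: group -> number of eligible nonresponding households
                (used as the upper limit of the allocation)
    """
    # Lower limit: `lower` households per group, or every eligible
    # household when a group has fewer than `lower`.
    alloc = {g: min(lower, eligible[g]) for g in psi}
    psi_sum = sum(psi.values())
    if psi_sum <= 0:
        return alloc

    # Add one household at a time to the group that is furthest below
    # its proportional target (psi_d / Psi) * total, skipping groups
    # that have reached their upper limit.
    while sum(alloc.values()) < total:
        open_groups = [g for g in psi if alloc[g] < eligible[g]]
        if not open_groups:
            break  # every eligible household is already selected
        best = max(open_groups,
                   key=lambda g: psi[g] / psi_sum * total - alloc[g])
        alloc[best] += 1
    return alloc


# Hypothetical example with three groups and a budget of 40 households.
psi = {"A": 0.10, "B": 0.05, "C": 0.05}
eligible = {"A": 100, "B": 3, "C": 50}
alloc = allocate_followup(psi, eligible, total=40)
print(alloc)
```

In this example group B has only three eligible households, so all of them are taken, and the households it cannot absorb are passed on to the groups that still have eligible units.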

If we assume that all non-interviewed households are nonresponse, then the sample-weighted response rate $p_R^*$ is defined as

$$p_R^* = \frac{\sum_{r'} w_i}{\sum_{r'} w_i + \sum_{o'} w_i}$$

where $w_i$ is the design weight for household $i$, $r'$ is the response set of households and $o'$ designates the households belonging to the nonresponse set. The corresponding measure of the sample-weighted nonresponse rate is

$$p_O^* = 1 - p_R^* = 1 - \frac{\sum_{r'} w_i}{\sum_{r'} w_i + \sum_{o'} w_i}$$
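A minimal sketch of the sample-weighted response and nonresponse rates, assuming the design weights of the responding and nonresponding households are available as two lists. The function name and the weights in the example are hypothetical.

```python
def sample_weighted_response_rate(weights_resp, weights_nonresp):
    """Share of the summed design weights that belongs to respondents."""
    total = sum(weights_resp) + sum(weights_nonresp)
    if total == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(weights_resp) / total


# Hypothetical design weights for three respondents and one nonrespondent.
p_r = sample_weighted_response_rate([100, 200, 300], [400])
p_o = 1 - p_r  # sample-weighted nonresponse rate
print(round(p_r, 2), round(p_o, 2))  # 0.6 0.4
```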

In Table A2a (Appendix 2) the calculated criteria can be found. Figure 4 below shows the results regarding the response rate including and excluding the follow-up study. Figure 5 shows the results from all four waves in the survey divided into the 24 groups, including and excluding the follow-up study. In total, approximately 2 200 households are selected for the follow-up study over the four waves of the survey. With this strategy the sample-weighted response rate in 2007 is increased by roughly 7 percent (4.3 percentage points).

Figure 4. Categories of the realisation of wave 1 survey excl. and incl. follow-up study

Figure 5. Results concerning the sample-weighted response rates for total and 24 groups used in the follow-up study, shown excl. and incl. follow-up study for all four waves of the 2007 survey. [Chart: vertical axis 0% to 90%; horizontal axis total and groups 1 to 24.]


In Figures 5 and 6 the sample-weighted response rate is shown excluding and including the follow-up study. In Figure 5 the follow-up groups are used to show the realisation of the follow-up work. Figures with results for 2005 and 2006 can be found in Appendix 3 (Figures A3a and A3b). As can be seen in Figure 6, the sample-weighted response rate did not change radically during 2005, 2006 and 2007. The increase in the response rate due to the follow-up has been stable over these three years.

Figure 6. The sample-weighted response rate excl. and incl. follow-up study in the survey for the years 2005, 2006 and 2007. [Chart: vertical axis 0% to 90%; horizontal axis 2005, 2006, 2007; series: sample-weighted response rate excl. follow-up and incl. follow-up.]

In Table 2 estimated sizes of the groups used for the follow-up study are shown. In spite of the various levels of the sample-weighted response rate among groups it can be seen that larger groups, such as groups 3, 11 and 17-20, are very stable for the years 2005, 2006 and 2007 with respect to response rate. Tables for 2005 and 2006 are not shown.


Table 2. Estimated number of households in the 24 follow-up groups, 2007 (rounded figures)

Number of households

Group   Point estimate   Standard error     Group   Point estimate   Standard error
1               53 570            3 620     13              31 800            2 930
2               26 480            1 960     14              26 110            1 980
3              264 840            6 380     15              21 120            2 360
4              117 060            3 510     16              22 600            1 870
5               11 160            1 710     17             289 150           11 650
6                1 490              510     18             331 710            9 210
7                8 740            1 520     19             986 810           19 740
8                4 600              870     20           1 240 630           14 140
9              197 020            6 710     21              91 690            6 810
10             117 020            3 930     22              88 750            5 030
11             289 880            7 760     23             109 590            7 370
12             166 200            4 550     24             136 030            6 130
Total        4 634 060           15 150

Many households are contacted in the follow-up study. The table below shows that only about half of the households selected for the follow-up study keep the same status code between the initial data collection step and the follow-up study. This does not mean that these households were not contacted. Recall that status codes beginning with 4 denote unable to answer, 5 non-contact, 6 refusal and 9 overcoverage. The table is based on one-digit status codes, so a household can change between two-digit codes within one and the same one-digit status code. A positive result of the follow-up study is that about one fourth of the refusals in the ordinary data collection step are responses after the follow-up study. Data for the previous years can be found in Appendix 3 (Tables A3g and A3h).

Table 3. Sample realisation for households selected for the survey’s follow-up study, 2007

Status code after         Status code after follow-up
ordinary collection       1      4      5      6      9    Total
4*                        8     12      3     11      0       34
5                       380     23    282    130     17      832
6                       358     31    105    826      3    1 323
Total                   746     66    390    967     20    2 189
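The shares quoted in the text can be recomputed from the counts in Table 3. The short sketch below does only that arithmetic; it reads status code 1 as response, which is how the text describes households that respond after the follow-up.

```python
# Rows: one-digit status code after the ordinary data collection.
# Columns: one-digit status code after the follow-up (counts from Table 3).
table3 = {
    4: {1: 8, 4: 12, 5: 3, 6: 11, 9: 0},
    5: {1: 380, 4: 23, 5: 282, 6: 130, 9: 17},
    6: {1: 358, 4: 31, 5: 105, 6: 826, 9: 3},
}

total = sum(sum(row.values()) for row in table3.values())

# Households whose one-digit status code is the same in both steps.
unchanged = sum(table3[code][code] for code in table3)
share_unchanged = unchanged / total

# Refusals (code 6) that are responses (code 1) after the follow-up.
refusals_converted = table3[6][1] / sum(table3[6].values())

# Overall response rate among households selected for the follow-up.
followup_response_rate = sum(row[1] for row in table3.values()) / total

print(total)                             # 2189
print(round(share_unchanged, 3))         # 0.512
print(round(refusals_converted, 3))      # 0.271
print(round(followup_response_rate, 3))  # 0.341
```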


3.2 Some Results from the Targeted Follow-up

To be able to consider the accuracy of an estimate, when an unbiased estimator is used and the survey suffers from nonresponse, it is important to determine how strong the relationship is between the survey’s variables of interest and the probability of being a respondent. It is assumed that the probability of being a respondent is connected to the variables being collected in the survey. This direct analysis is not possible since information on the entire sample is not available. It is assumed that these characteristics can be estimated by studying the relationship between the response rate and the nonresponse error.

To compare the relationship between the sample-weighted response rate and the nonresponse error, an expansion estimator is used to calculate the nonresponse error of two variables, the disposable income per consuming unit and the number of households. These are considered to be the two main study variables in the HEK survey. In practice the expansion estimator is often used as a benchmark for other types of estimators, which is the case here. It is compared with the GREG estimator in the next chapter. The expansion estimator is also a calibration estimator with the simplest formulation of an auxiliary vector, $x_i = 1$ for all households $i$. This gives the weight

$$g_i = \frac{n_h}{m_h}$$

for every household $i$. This estimator is also often used when no useful auxiliary information is available and it can be assumed that nonresponse in the survey occurs at random.
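A minimal sketch of an expansion estimator of a total is given below. It assumes that the adjustment weight is the ratio of the number of sampled households to the number of responding households within each stratum, which is how the weight above is read here, and that each household carries a design weight. The function name, the record layout and the example data are hypothetical.

```python
from collections import Counter


def expansion_total(units):
    """Expansion estimate of a total (illustrative sketch).

    units -- list of dicts with keys
             'stratum', 'd' (design weight), 'responded' (bool)
             and 'y' (study variable; used only for respondents)
    """
    n_h = Counter(u["stratum"] for u in units)                    # sampled
    m_h = Counter(u["stratum"] for u in units if u["responded"])  # responding
    total = 0.0
    for u in units:
        if u["responded"]:
            g = n_h[u["stratum"]] / m_h[u["stratum"]]  # nonresponse adjustment
            total += u["d"] * g * u["y"]
    return total


# Hypothetical stratum: 4 sampled households, 2 respond, design weight 10.
# With y = 1 the estimate is the number of households represented.
sample = [
    {"stratum": "h1", "d": 10, "responded": True, "y": 1},
    {"stratum": "h1", "d": 10, "responded": True, "y": 1},
    {"stratum": "h1", "d": 10, "responded": False, "y": None},
    {"stratum": "h1", "d": 10, "responded": False, "y": None},
]
print(expansion_total(sample))  # 40.0
```

With equal design weights within the stratum, the adjusted respondent weights add up to the summed design weights of the whole sample, which is what the adjustment is meant to achieve.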

Note 3.2.1: Many of the ideas for analysing the covariance between the response rate and the nonresponse error in this chapter are inspired by Groves (2006).

With a successful follow-up, the nonresponse error of the target variable should decrease significantly while the response rate, or the sample-weighted response rate, increases. With a well-planned and well-implemented targeted follow-up, the results presented in a figure should therefore show an overall pattern of reduced nonresponse error for the targeted variable as the response rate increases. An increase in the response rate is not always directly proportional to a decrease in the nonresponse error, but the overall pattern of reduction should be visible. In the figures, this expected pattern should be clearly seen within groups that have a very low response rate: with a sharp increase in the response rate after the follow-up, the nonresponse error should decrease.

Note 3.2.2: Sometimes the nonresponse error can increase because an outlier becomes a respondent. When a household with special characteristics responds and diverges from the other responding households in the group, the nonresponse error can increase.

In every wave of the HEK survey, about 70 percent of the households that are eligible nonrespondents after the initial data collection phase are selected for the follow-up. After completion of the follow-up the response rate is slightly increased, while the level of the nonresponse error is almost unchanged. The response rate of the follow-up itself is only about 35 percent. The desirable feature is that the nonresponse error decreases when the response rate increases. In Figure 7 we should then see that the nonresponse error after the follow-up study (blue circles in the figure) is almost always below the nonresponse error before the follow-up study (red circles in the figure). These desired results of the follow-up are not seen in Figure 7. On the basis of the results shown in the figure, it cannot be claimed that the nonresponse error and the response rate are highly negatively correlated for the variable disposable income per consuming unit. The year 2007 is not special in any way: the same pattern can be seen in the figures concerning 2005 and 2006, to be found in Appendix 3 (Figures A3c and A3d).


Figure 7. Results concerning the sample-weighted response rate and the absolute nonresponse error of the estimated disposable income per consuming unit for total and 24 groups used for the targeted follow-up, 2007, expansion estimator. [Chart: vertical axis 0% to 90%; horizontal axis total and groups 1 to 24; series: weighted response rate excl. and incl. follow-up, absolute nonresponse error excl. and incl. follow-up; annotation: effect of an extreme outlier being a respondent.]

Note 3.2.3: In the results and analysis ahead I assume that all overcoverage in the sample belongs to the target population. Some of the selected households are already coded as overcoverage in the ID-run but others are coded after the ordinary data collection step. Some of the households that belong to the non-contacted group after the ordinary data collection step are found to belong to the overcoverage after the follow-up. With this assumption the sample is divided into two non-overlapping sets, respondents and nonrespondents. This simplifies the analysis and the comparisons between different steps of the survey and between different estimators.

The study variable disposable income per consuming unit is a ratio. As a parameter a ratio is a function of two totals and has less variability than a parameter based on a total. This means that if changes in the totals are represented almost equally in the numerator and the denominator, then the change in the ratio is marginal. This holds under the assumption that the changes in the numerator and the denominator are of the same kind, i.e. both are decreased or both are increased. This can be one explanation of the very weak relationship between the response rate and the nonresponse error of the disposable income per consuming unit. If we instead compare the nonresponse error calculated for the estimate of the total number of households with the sample-weighted response rate, after the initial data collection and after the follow-up, the properties illustrated in a figure for a total can differ from those illustrated for a ratio. Results for 2007 are shown in Figure 8 and results for 2005 and 2006 are to be found in Appendix 3 (Tables A3e and A3f).
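The stability of a ratio can be made explicit with a short worked expression. The symbols $a$ and $b$ and the numbers used are introduced here only for illustration. Let the ratio be $R = t_y / t_z$ and suppose that the follow-up changes the two estimated totals to $(1+a)\,t_y$ and $(1+b)\,t_z$. Then

$$\frac{R' - R}{R} = \frac{1+a}{1+b} - 1 = \frac{a-b}{1+b}.$$

If, for example, the numerator increases by 5 percent and the denominator by 4 percent ($a = 0.05$, $b = 0.04$), the ratio changes by only $0.01 / 1.04 \approx 0.0096$, i.e. by less than 1 percent, although both totals change by 4 to 5 percent.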

Figure 8. Results concerning the sample-weighted response rate and the absolute nonresponse error of the estimated total number of households for total and 24 groups used for the targeted follow-up, 2007, expansion estimator is used. [Chart: vertical axis 0% to 90%; horizontal axis total and groups 1 to 24; series: weighted response rate excl. and incl. follow-up, absolute nonresponse error excl. and incl. follow-up.]

A reduction of the nonresponse error can be seen in almost every group, and overall, after the follow-up. The greater the increase in the response rate, the greater the reduction of the absolute nonresponse error, i.e. the correlation between the sample-weighted response rate and the absolute nonresponse error is negative. The same pattern can be seen for the years 2005 and 2006. In spite of the small increase and low level of the sample-weighted response rate after the follow-up, some reduction of the nonresponse error can be observed for this variable.

Up to now I have concentrated on the analysis of measures based on the point estimator (nonresponse error). To judge a change in quality, the precision of an estimator should also be taken into account by using the estimate of the standard error. The effects of a well-designed and well-implemented follow-up with an adequate targeting plan cannot be seen only in the changes of the point estimates. An increasing response rate may not always lead to a lower nonresponse error for a certain point estimator. The position and precision of a certain estimator depend on the number of individuals in the target population, the number and type of respondents and the variable’s range. An estimate for a small study domain can suffer from low precision regardless of the response rate of the domain; rather, the precision depends on the domain’s size and its homogeneity with respect to the target variable. An often used indicator of the precision achieved in the survey is the coefficient of variation (cve) of the estimator for a given sample. The estimate of the coefficient of variation for the given estimator $\hat{\theta}$ takes the form

$$cve(\hat{\theta}) = \frac{\hat{V}(\hat{\theta})}{\hat{\theta}}$$

where $\hat{V}(\hat{\theta})$ is the estimated standard error of the estimator $\hat{\theta}$.
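As a small sketch, the coefficient of variation is the estimated standard error divided by the point estimate. The function name is hypothetical; the example uses the point estimate and standard error of the total number of households reported in Table 2.

```python
def cve(point_estimate, standard_error):
    """Estimated coefficient of variation: standard error / point estimate."""
    if point_estimate == 0:
        raise ValueError("the point estimate must be non-zero")
    return standard_error / abs(point_estimate)


# Total row of Table 2: point estimate 4 634 060, standard error 15 150.
print(f"{cve(4_634_060, 15_150):.4f}")  # 0.0033, i.e. about 0.3 percent
```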

Table 4. The cve estimates, 2007, expansion estimator is used (cve greater than 10% is marked with *)

            Estimate of disposable income        Estimate of the total number
            per consuming unit                   of households
2007        Excl. follow-up   Incl. follow-up    Excl. follow-up   Incl. follow-up
Total            1.0%              0.9%               0.4%              0.4%
1               10.1% *            9.1%               8.8%              8.1%
2                5.7%              5.7%               9.7%              9.2%
3                3.8%              3.7%               3.1%              3.0%
4                4.9%              4.6%               3.5%              3.4%
5                6.8%              6.4%              22.4% *           21.9% *
6                0%                3.1%             100% *             71.2% *
7                6.5%              6.1%              25.8% *           24.9% *
8                9.0%             58.2% *            27.9% *           25.5% *
9                3.6%              3.3%               4.6%              4.3%
10               6.5%              6.2%               4.2%              3.9%
11               2.8%              2.7%               3.2%              3.2%
12               2.6%              2.5%               3.2%              3.1%
13               9.1%              8.0%              13.8% *           13.0% *
14               6.1%              5.7%              10.5% *            9.4%
15               7.0%              6.5%              14.6% *           13.2% *
16               7.7%              7.1%              10.7% *           10.2% *
17              15.4% *           14.3% *             5.4%              5.1%
18               3.7%              3.0%               3.6%              3.3%
19               1.6%              1.5%               2.5%              2.4%
20               1.4%              1.3%               1.3%              1.3%
21               6.9%              6.6%              13.4% *           11.8% *
22               4.4%              7.6%               7.9%              7.3%
23               5.8%              6.3%              10.2% *            9.5%
24               3.2%              3.0%               5.9%              5.6%

Row colours in the original indicate the estimated number of households in the domain: under 100 000; between 100 000 and 300 000; more than 300 000.

As illustrated in Figure 8, most of the groups show a reduction of the nonresponse error after the follow-up. This pattern of reduction is more apparent for the total number of households than for the disposable income per consuming unit. The estimate of the cve takes into account changes in both the point estimates and the standard error estimates. Table 4 presents estimates of the cve for both variables for the year 2007. The years 2005 and 2006 are to be found in Appendix 5 (Table A5a). As can be seen from Table 4 (and Table A5a), the precision changes only marginally between the initial data collection and the follow-up. In both data collection phases the precision is lower for small domains than for larger domains, and the precision of the estimates of the total is on average particularly low for very small domains. A comparison of the two variables indicates that, for the small domains, the precision of the total is almost always lower than that of the ratio. The pattern of nonresponse error reduction between the data collection phases therefore cannot be confirmed as a quality improvement of the estimates when measured by the estimates of the cve.


4 Nonresponse Adjustment with the Calibration Approach

Once data collection is completed, some data are missing because of nonresponse. A few nonresponse adjustment techniques have been developed. At Statistics Sweden two techniques are often used when the desired data are missing from a survey: reweighting and imputation. Imputation is frequently used in business statistics and is sometimes used for item nonresponse in individual-based statistics. Reweighting is often used for individual-based statistics. In such statistics it is possible to find relevant auxiliary information that can be used for estimation under nonresponse. The auxiliary information can concern the frame population, persons in the population, and persons at the sample level. The reweighting method is also called the calibration approach.

By using auxiliary information in the estimation phase of the survey the design weights for the respondents are changed to compensate for the nonresponse. The auxiliary information used in the estimation phase must be powerful enough to reduce both the nonresponse bias and the variance. Statistics Sweden has an advantage over many other National Statistical Institutes in the estimation process because of its access to many registers, of high quality and accuracy, which can be used as the presumptive auxiliary variables. The final auxiliary information to be used in the estimation is selected in accordance with analyses done in advance. It is important that selected auxiliary information should possess the desirable properties mentioned in Section 2.1 and if possible follow the guidelines for the construction of an auxiliary vector according to Särndal and Lundström (2005, Chapter 10).
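How auxiliary information changes the respondents’ weights can be illustrated with post-stratification, which is one of the simplest special cases of calibration. This is not the auxiliary vector used in the HEK survey; the function name, the group labels and the population counts below are hypothetical.

```python
from collections import defaultdict


def poststratified_weights(design_weights, groups, population_counts):
    """Rescale respondents' design weights within each group so that
    they sum to the known population count of that group."""
    weight_sum = defaultdict(float)
    for d, g in zip(design_weights, groups):
        weight_sum[g] += d
    return [d * population_counts[g] / weight_sum[g]
            for d, g in zip(design_weights, groups)]


# Hypothetical respondents: two in group "a", one in group "b".
d = [10, 10, 20]
grp = ["a", "a", "b"]
known_counts = {"a": 50, "b": 30}  # assumed register totals
w = poststratified_weights(d, grp, known_counts)
print(w)  # [25.0, 25.0, 30.0]
```

After the adjustment the weights of the respondents in each group add up to the register count for that group, so the nonrespondents in the group are represented by the respondents who share the same auxiliary characteristics.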

Selection of the auxiliary vector is sample-based. The vector should be evaluated regularly in repeated surveys. Assumptions about certain characteristics of the respondents and nonrespondents are adopted. These characteristics can change over the years, i.e. the structure of the nonresponse can change.

As previously mentioned, a calibration approach is described, but in fact the GREG estimator is used in the estimation of the HEK survey. The calibration estimators are a broad group of estimators that includes the GREG estimator. Hence the names are used as synonyms.
