Pareto πps sampling design vs. Poisson πps sampling design. : Comparison of performance in terms of mean-squared error and evaluation of factors influencing the performance measures.


Örebro University School of Business
Department of Statistics

Course ST3001-28106V18
Bachelor Thesis in Statistics, 15 credits
Supervisor: Ann-Marie Flygare
Examiner: Thomas Laitila
Spring Semester of 2018
Date of Submission: 2018-06-18

Pareto πps sampling design vs. Poisson πps sampling design.

Comparison of performance in terms of mean-squared error and evaluation of factors influencing the performance measures.

Author: Natalia Ogorodnikova


Abstract

The aim of the study was to compare the performance of the Pareto πps-based and Poisson πps-based estimators and to determine how the sample size and dataset characteristics affect the relative efficiency of the two estimators. The sample selection procedures, estimators of a population total, asymptotic variance expressions and their estimators were studied theoretically for each of the two schemes. Based on a comparison of asymptotic variances calculated on the population data, the POIπps π-weighted ratio estimator was found to be the most similar to the PARπps π-estimator. The simulations showed that when the study and auxiliary variables are highly positively correlated, the PARπps π-estimator is more beneficial than the POIπps π-weighted ratio estimator, and that the POIπps π-weighted ratio estimator is more competitive with larger sample sizes. The experiments showed no evidence of any detectable trend in the efficiency advantage that might depend on the skewness, the correlation level or the sample size. However, it was found that the sample size (in highly skewed populations) and the strength of the auxiliary information are positively associated with the coverage rate, which in turn can be the decisive factor in the choice between the two schemes.

Key words: Pareto πps, Poisson πps, comparison of estimators, performance comparison, relative efficiency of an estimator, Monte Carlo simulation, Pareto πps π-estimator, Poisson πps π-weighted ratio estimator.


Table of Contents

1. Introduction
2. Theory
2.1 Pareto πps sampling design
2.1.1 Sample selection procedures
2.1.2 Estimation procedures. Estimator of a population total, asymptotic variance expression and its estimator
2.2 Poisson πps sampling design
2.2.1 Sample selection procedures
2.2.2 Estimation procedures. Three different estimators of a population total, asymptotic variance expressions and their estimators
2.3 The theoretical comparison of the two schemes
2.4 Measures for evaluation of estimators
2.4.1 Expected value of an estimator
2.4.2 Bias and MSE
2.4.3 Coverage rate
3. Data
4. Method
4.1 Data collection
4.1.1 Common assumptions
4.1.2 Experiment 1,2,3. The same sample size and different populations
4.1.3 Experiments 4,5,6. Different sampling fractions and the same population
4.2 Data analysis
4.3 Comparison of asymptotic variances
5. Results and Analysis
5.1 Experiments 1-3. The same sample size and different populations
5.2 Experiments 4-6. Different sampling fractions and the same population
6. Conclusions
Reference List
Appendix
Appendix 1 Simulation results. Experiments 1-3. The same sample size and different populations
Appendix 2 Simulation results. Experiments 4-6. Different sampling fractions and the same population


Acknowledgements

This great opportunity to learn from practice was provided by my supervisor Ann-Marie Flygare, who has piqued my curiosity about sample design methods. Stefan Berg and Pär Lindholm, statisticians at the methodology department of Statistics Sweden, formulated the research problem and simulated the synthetic population microdata.

1. Introduction

The choice of an optimal sampling strategy is the most crucial question every survey statistician has to deal with at the planning stage of a survey (Särndal, Swensson & Wretman 2003, p. 447). If auxiliary information is available, unequal probability designs, or πps schemes, which are characterized by inclusion probabilities proportional to a size measure, are considered more efficient and are used extensively in survey practice (Särndal, Swensson & Wretman 2003, p. 85; Rosén 1997b).

Pareto πps sampling design (PARπps) is a πps scheme with a predetermined sample size which is considered asymptotically optimal among the category of order¹ πps schemes (Rosén 2000b, p. 2).

Another πps scheme, Poisson πps sampling design (POIπps), is well known for its simple sample selection and estimation properties, but its random sample size makes this design less convenient (Rosén 2000b, p. 1).

For a given population, a given design and a given estimator, the set of all possible samples from the population generates the set of all possible values of the estimator. This set has its own probability distribution, called the sampling distribution of the statistic (Lohr 2010, p. 31). Evaluation of sampling strategies is in practice based on a comparison of statistical properties of the respective sampling distributions. From this point of view, the properties of the sampling distributions of the PARπps-based and POIπps-based estimators are the object of this study. The population total (Särndal, Swensson & Wretman 2003, p. 25) is considered as the parameter of interest. The performance measures of the π-estimator of the total and its variance estimator under Pareto πps will be compared to the respective measures under Poisson πps. Furthermore, to determine whether the sample size or dataset characteristics such as skewness and the correlation between the auxiliary variable x and the study variable y may affect the outcome of the study, the performance of the above estimators will be evaluated on two different study variables and with different sampling fractions.

The purpose of this study is to compare the performance of the Pareto πps-based and Poisson πps-based estimators and to determine how the sample size and dataset characteristics affect the relative efficiency of the two estimators.

This study will therefore address the following research questions:

• What are the most important differences between the Pareto πps and Poisson πps schemes with respect to their sample selection and estimation procedures, and what properties make these two schemes similar?

• Which of the three POIπps-based estimators is the most comparable to the PARπps-based π-estimator, and which of these two performs better?

• How do dataset characteristics affect the relative efficiency of the two estimators?

• How does sample size influence the relative efficiency of the two estimators?

¹ The particular subgroup “order πps schemes” was introduced by Bengt Rosén (1997b): “central idea in order sampling: The population units are ordered by a ranking variable, and the sample consists of the n units with the smallest ranking variable values”.

We consider the situation in which all the necessary information has been obtained from all sample units. Hence, non-response adjustment is not treated in this paper.

This paper has the following structure. The sample selection and estimation procedures for each of the two schemes, as well as the measures of the quality of an estimator, are studied in the Theory section. The characteristics of the synthetic dataset, including the descriptive measures that might be related to the performance of the two schemes, are described in the Data section. The procedures for data collection and data analysis are described in the Method section, which also provides the comparison of the asymptotic variances calculated on the population data. The detailed simulation results are presented in Appendices 1 and 2, and the most essential results of the experiments are discussed in the Results and Analysis section. The paper's main points are summarized in the Conclusions section.

2. Theory

2.1 Pareto πps sampling design

2.1.1 Sample selection procedures

Suppose that we have a sampling frame $U = \{1, 2, \ldots, N\}$ with size measures $x = (x_1, x_2, \ldots, x_N)$, where $x_i > 0$, and desired inclusion probabilities

$$\lambda_i = n \, \frac{x_i}{\sum_{j=1}^{N} x_j}, \qquad i = 1, 2, \ldots, N.$$

A set of independent random numbers $U_1, U_2, \ldots, U_N$ from the uniform distribution on (0, 1) and a set of ranking variables $Q_1, Q_2, \ldots, Q_N$ are considered, where

$$Q_i = \frac{U_i (1 - \lambda_i)}{\lambda_i (1 - U_i)}, \qquad i = 1, 2, \ldots, N.$$

The units corresponding to the n smallest Q values constitute the sample (Rosén 1997b, p. 173). The condition $\lambda_i < 1$ for all $i = 1, 2, \ldots, N$ is crucial for the scheme to work. If this requirement is not satisfied, a common method for dealing with the issue is briefly described by Holmberg (2003, p. 20). All elements with $\lambda_i \geq 1$ are placed in a “Take all” stratum (A) specially created for this purpose. The elements with $\lambda_i < 1$ form another stratum (B) of size $N_B = N - N_A$. The size measures and the inclusion probabilities for the elements in stratum B must then be recalculated, taking into consideration the new sizes $N_B$ and $n_B$. Any elements of stratum B that now have $\lambda_i \geq 1$ are moved to the “Take all” stratum (A), where they are selected with probability 1, and the size measures and the inclusion probabilities for the remaining elements in stratum B are recalculated again with the new $N_B$ and $n_B$. The procedure is repeated until all units in stratum B satisfy the condition $\lambda_{iB} < 1$ for $i = 1, 2, \ldots, N_B$ (Holmberg 2003, p. 20).
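The selection procedure above can be illustrated with a minimal Python sketch. It is not the author's Stata program: the function names are hypothetical, and the sketch assumes that the sample size left for stratum B is $n_B = n - N_A$, which the text does not spell out.

```python
import random


def inclusion_probabilities(x, n):
    """Return (lam, take_all) for size measures x and total sample size n.

    lam maps each unit index left in stratum B to its target inclusion
    probability; take_all is the set of indices moved to stratum A.
    Assumption: the sample size left for stratum B is n minus the number
    of take-all units.
    """
    take_all = set()
    while True:
        rest = [i for i in range(len(x)) if i not in take_all]
        n_b = n - len(take_all)
        total_b = sum(x[i] for i in rest)
        lam = {i: n_b * x[i] / total_b for i in rest}
        too_big = [i for i in rest if lam[i] >= 1.0]
        if not too_big:
            return lam, take_all
        take_all.update(too_big)   # recompute with the reduced stratum B


def pareto_pips_sample(x, n, rng=random):
    """Draw one Pareto pips sample of fixed size n; returns sorted indices."""
    lam, take_all = inclusion_probabilities(x, n)
    q = {}
    for i, lam_i in lam.items():
        u = rng.random()                       # U_i ~ Uniform[0, 1)
        q[i] = (u * (1.0 - lam_i)) / (lam_i * (1.0 - u))
    n_b = n - len(take_all)
    chosen = sorted(q, key=q.get)[:n_b]        # n_B smallest ranking values
    return sorted(take_all | set(chosen))
```

For example, `pareto_pips_sample([1.0, 2.0, 3.0, 4.0, 100.0], 2)` always contains unit 4, because its initial target probability exceeds 1, plus one unit drawn from the remaining four.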

2.1.2 Estimation procedures. Estimator of a population total, asymptotic variance expression and its estimator

The estimator of the population total is

$$\hat{t}(y) = \sum_{k \in \text{Sample}} \frac{y_k}{\lambda_k}$$

(3.1 Rosén 2000b). Rosén (2000b, p. 3) described this estimator as a “quasi H-T estimator”. The condition $\pi_k \approx \lambda_k$ for $k = 1, 2, \ldots, N$ (1.4 Rosén 2001b, p. 1) is met for all πps schemes. In this case the desired and factual inclusion probabilities do not agree exactly, which in turn leads to some bias in $\hat{t}(y)$. On the other hand, it was also pointed out that this bias is negligible in practice (Rosén 2001b, p. 6). The approximate variance of the estimator is defined as

$$V[\hat{t}(y)] \approx \frac{N}{N-1} \sum_{k=1}^{N} \left[ \frac{y_k}{\lambda_k} - \frac{\sum_{j=1}^{N} y_j (1 - \lambda_j)}{\sum_{j=1}^{N} \lambda_j (1 - \lambda_j)} \right]^2 \lambda_k (1 - \lambda_k).$$

This formula is asymptotically correct (3.2 Rosén 2000b).

According to Rosén (3.4 2000b), the estimator of the approximate variance is

$$\hat{V}[\hat{t}(y)] = \frac{n}{n-1} \sum_{k \in \text{Sample}} \left[ \frac{y_k}{\lambda_k} - \frac{\sum_{j \in \text{Sample}} y_j (1 - \lambda_j) / \lambda_j}{\sum_{j \in \text{Sample}} (1 - \lambda_j)} \right]^2 (1 - \lambda_k).$$
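As an illustration, the point estimator (3.1) and the variance estimator (3.4) above could be computed along the following lines. This is a sketch with hypothetical function names; `y` and `lam` are assumed to hold the study values and target inclusion probabilities of the sampled units only, all with $\lambda_k < 1$.

```python
def pareto_total_estimate(y, lam):
    """Quasi H-T estimate: sum of y_k / lambda_k over the sampled units."""
    return sum(yk / lk for yk, lk in zip(y, lam))


def pareto_variance_estimate(y, lam):
    """Variance estimate for the quasi H-T estimator under Pareto pips.

    Requires at least two sampled units and lambda_k < 1 for all of them,
    otherwise a division by zero occurs.
    """
    n = len(y)
    # Weighted centre: sum_j y_j (1 - lam_j) / lam_j over sum_j (1 - lam_j)
    centre = (sum(yj * (1.0 - lj) / lj for yj, lj in zip(y, lam))
              / sum(1.0 - lj for lj in lam))
    return n / (n - 1) * sum((yk / lk - centre) ** 2 * (1.0 - lk)
                             for yk, lk in zip(y, lam))
```

When y is exactly proportional to the size measure, every ratio $y_k / \lambda_k$ equals the same constant and the variance estimate is zero, which is the situation πps designs are built to exploit.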

2.2 Poisson πps sampling design

Poisson πps is an unequal probability sampling design with a random sample size $n_s$ with mean $E_{PO}(n_s) = \sum_U \pi_k$ and variance $V_{PO}(n_s) = \sum_U \pi_k (1 - \pi_k)$ (Särndal, Swensson & Wretman 2003, pp. 85-86). If $x_1, x_2, \ldots, x_N$ are known positive values of an auxiliary variable x, and if it is possible to assume that the study variable y is approximately proportional to x, the inclusion probability $\pi_k$ is defined as

$$\pi_k = \frac{n x_k}{\sum_U x_k}$$

(3.5.8 Särndal, Swensson & Wretman 2003). The condition

$$x_k \leq \frac{\sum_U x_k}{n}$$

must be satisfied for all k for the scheme to work. This condition is practically equivalent to the condition described in Section 2.1.1 for PARπps. If the value of $x_k$ exceeds $\sum_U x_k / n$, the method described in Section 2.1.1 for PARπps can be used (Holmberg 2003, p. 20).

2.2.1 Sample selection procedures

The implementation of this design is quite simple. A set of inclusion probabilities $\pi_1, \pi_2, \ldots, \pi_N$ and a set of independent random numbers $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$ from the uniform distribution on (0, 1) are created. The element $k = 1, \ldots, N$ is selected into the sample if $\varepsilon_k < \pi_k$ (Särndal, Swensson & Wretman 2003, p. 85).
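The rule $\varepsilon_k < \pi_k$ translates almost directly into code. The following Python sketch uses a hypothetical function name and simply raises an error when some $\pi_k$ would exceed 1, instead of building a “Take all” stratum.

```python
import random


def poisson_pips_sample(x, n, rng=random):
    """Draw one Poisson pips sample with expected size n.

    Returns (sample, pi): the selected unit indices and the inclusion
    probabilities pi_k = n * x_k / sum(x) of all units.
    """
    total = sum(x)
    pi = [n * xk / total for xk in x]
    if any(p > 1.0 for p in pi):
        raise ValueError("some pi_k exceed 1; a take-all stratum is needed")
    # Independent Bernoulli trial per unit: select k if eps_k < pi_k
    sample = [k for k, p in enumerate(pi) if rng.random() < p]
    return sample, pi
```

Unlike the Pareto scheme, the length of `sample` varies from draw to draw and can even be zero, which is the random sample size discussed above.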

2.2.2 Estimation procedures. Three different estimators of a population total, asymptotic variance expressions and their estimators

Three different estimators are considered for POIπps: the H-T estimator and two ratio estimators.

The unbiased Horvitz–Thompson estimator is $\hat{t}_\pi = \sum_s y_k / \pi_k$ (3.5.4 Särndal, Swensson & Wretman 2003). The variance of the estimator is defined as follows:

$$V_{PO}(\hat{t}_\pi) = \sum_U \left( \frac{1}{\pi_k} - 1 \right) y_k^2.$$

Due to the variability in the sample size, it is generally large in practice (3.5.5 Särndal, Swensson & Wretman 2003). An unbiased variance estimator is

$$\hat{V}_{PO}(\hat{t}_\pi) = \sum_s (1 - \pi_k)\, \check{y}_k^2, \qquad \check{y}_k = y_k / \pi_k$$

(3.5.6 Särndal, Swensson & Wretman 2003).

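The alternative estimator,

$$\hat{t}_{alt} = N \, \frac{\sum_s \check{y}_k}{\hat{N}}, \qquad \hat{N} = \sum_s \frac{1}{\pi_k},$$

is a biased ratio-type estimator (3.5.9 Särndal, Swensson & Wretman 2003). Its variance is ordinarily smaller than that given by (3.5.5), and $\hat{t}_{alt}$ is generally preferred to the π-estimator (Särndal, Swensson & Wretman 2003, p. 87). The variance expression is

$$V(\hat{t}_{alt}) = \sum\sum_U (\pi_{kl} - \pi_k \pi_l)\, \frac{y_k - B x_k}{\pi_k}\, \frac{y_l - B x_l}{\pi_l} = \sum_U \frac{1 - \pi_k}{\pi_k} (y_k - B x_k)^2$$

(7.2.10 Särndal, Swensson & Wretman 2003), where B is defined as in Särndal, Swensson & Wretman (2003, p. 258). The second equality holds because under Poisson sampling $\pi_{kl} = \pi_k \pi_l$ for $k \neq l$.

The corresponding variance estimator is

$$\hat{V}(\hat{t}_{alt}) = \sum_s \frac{1 - \pi_k}{\pi_k^2} \left( \frac{N}{\hat{N}} \right)^2 (y_k - \hat{B} x_k)^2 = \left( \frac{N}{\hat{N}} \right)^2 \sum_s \frac{1 - \pi_k}{\pi_k^2} (y_k - \tilde{y}_s)^2$$

$$= \left( \frac{N}{\hat{N}} \right)^2 \sum_s \left( \frac{(y_k - \tilde{y}_s)^2}{\pi_k^2} - \frac{(y_k - \tilde{y}_s)^2 \pi_k}{\pi_k^2} \right) = \left( \frac{N}{\hat{N}} \right)^2 \left( \sum_s \frac{(y_k - \tilde{y}_s)^2}{\pi_k^2} - \sum_s \frac{(y_k - \tilde{y}_s)^2}{\pi_k} \right),$$

where $\tilde{y}_s = \sum_s \check{y}_k / \hat{N}$ (7.2.11; 7.4.7 Särndal, Swensson & Wretman 2003). The step from $\hat{B} x_k$ to $\tilde{y}_s$ uses $x_k = 1$ for all k, so that $\hat{B} = \tilde{y}_s$.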

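The π-weighted ratio estimator is another biased ratio-type estimator which is commonly used in survey research:

$$\hat{t}_{y,r} = \left( \sum_U x_k \right) \frac{\sum_s y_k / \pi_k}{\sum_s x_k / \pi_k}$$

(7.3.2 Särndal, Swensson & Wretman 2003). The approximate variance of the estimator under POIπps was derived based on 7.2.10 in Särndal, Swensson and Wretman (2003) as follows:

$$V(\hat{t}_{y,r}) \approx AV(\hat{t}_{y,r}) = \sum_U \frac{1 - \pi_k}{\pi_k} (y_k - B x_k)^2, \qquad B = \frac{t_y}{t_x}.$$

The variance estimator was derived based on 7.2.11 in Särndal, Swensson and Wretman (2003) as

$$\hat{V}(\hat{t}_{y,r}) = \sum_s \frac{1 - \pi_k}{\pi_k^2} \left( \frac{t_x}{\hat{t}_x} \right)^2 (y_k - \hat{B} x_k)^2, \qquad \hat{B} = \frac{\sum_s \check{y}_k}{\sum_s \check{x}_k},$$

where $\check{y}_k = y_k / \pi_k$, $\check{x}_k = x_k / \pi_k$, $t_x = \sum_U x_k$ and $\hat{t}_x = \sum_s \check{x}_k$.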
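To make the Poisson πps formulas above concrete, here is a small Python sketch of the H-T estimator, the π-weighted ratio estimator and their variance estimators. The function names are hypothetical; `y`, `x` and `pi` are assumed to contain the values of the sampled units only, and `t_x` is the known population total of x.

```python
def ht_estimate(values, pi):
    """Horvitz-Thompson estimate: sum of values_k / pi_k over the sample."""
    return sum(vk / pk for vk, pk in zip(values, pi))


def ht_variance_estimate(y, pi):
    """Unbiased variance estimate under Poisson sampling."""
    return sum((1.0 - pk) * (yk / pk) ** 2 for yk, pk in zip(y, pi))


def ratio_estimate(y, x, pi, t_x):
    """pi-weighted ratio estimate of the total of y.

    Fails with a division by zero for an empty sample, which can occur
    under Poisson sampling.
    """
    return t_x * ht_estimate(y, pi) / ht_estimate(x, pi)


def ratio_variance_estimate(y, x, pi, t_x):
    """Variance estimate for the pi-weighted ratio estimator."""
    t_x_hat = ht_estimate(x, pi)
    b_hat = ht_estimate(y, pi) / t_x_hat
    residual_sum = sum((1.0 - pk) / pk ** 2 * (yk - b_hat * xk) ** 2
                       for yk, xk, pk in zip(y, x, pi))
    return (t_x / t_x_hat) ** 2 * residual_sum
```

If y is an exact multiple of x, the residuals $y_k - \hat{B} x_k$ vanish and the ratio estimator reproduces the true total regardless of the realized sample size, whereas the H-T estimate still varies with $n_s$.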

2.3 The theoretical comparison of the two schemes

Besides the fixed vs. random sample size, the two schemes differ in their sample selection procedures. Both schemes have easy-to-implement variance estimators as well as simple sample selection procedures. It can be shown analytically from the equations that the variance estimates are nonnegative under both schemes.

2.4 Measures for evaluation of estimators

2.4.1 Expected value of an estimator

Properties of the estimator of the population total and its variance estimator under Pareto πps vs. Poisson πps were specified as the object of this study (see the Introduction section). It was also pointed out that the sampling distribution of a statistic consists of all possible estimates that might be obtained from all possible samples taken from the population. Since it is practically impossible to calculate the true characteristics of both estimators, the expected value of $\hat{t}$,

$$E[\hat{t}] = \sum_S \hat{t}_S \, P(S),$$

and the variance of $\hat{t}$,

$$V(\hat{t}) = \sum_S P(S) \left[ \hat{t}_S - E(\hat{t}) \right]^2,$$

in which the probabilities P(S) with which the sample S is selected act as weights (Särndal, Swensson and Wretman 2003, p. 40; 2.3, 2.5 Lohr 2010), will be replaced by the calculated simulation summary measures. The mean of $\hat{t}$ estimates the true expected value of $\hat{t}$, the mean of $\hat{V}$ estimates the true expected value of $\hat{V}$, and $S^2_{\hat{t}}$ estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

2.4.2 Bias and MSE

The bias and the mean squared error are the two most commonly used measures of the quality of an estimator. The mean squared error (MSE) is

$$MSE[\hat{t}] = E[(\hat{t} - t)^2] = E[(\hat{t} - E[\hat{t}] + E[\hat{t}] - t)^2] = E[(\hat{t} - E[\hat{t}])^2] + (E[\hat{t}] - t)^2 + 2E[(\hat{t} - E[\hat{t}])(E[\hat{t}] - t)] = V(\hat{t}) + [Bias(\hat{t})]^2$$

(Lohr 2010, p. 31); the cross term vanishes because $E[\hat{t} - E[\hat{t}]] = 0$. The bias of the estimator $\hat{t}$ is defined as $Bias[\hat{t}] = E[\hat{t}] - t$. For any unbiased estimator $\hat{t}$, $E[\hat{t}] = t$ and $MSE(\hat{t}) = V(\hat{t})$ (2.4 Lohr 2010; Särndal, Swensson & Wretman 2003, p. 40). An estimator $\hat{t}$ of t is precise if $V[\hat{t}]$ is small and accurate if $MSE[\hat{t}]$ is small (Lohr 2010, p. 32).

2.4.3 Coverage rate

Särndal, Swensson and Wretman (2003, p. 55) exemplify how to check whether the desired confidence level is attained for a given sample size, sampling design and population shape. Empirical validation can be carried out by Monte Carlo simulation. After the statistics $\hat{t}$, $\hat{V}(\hat{t})$ and the confidence interval $\hat{t} \pm 1.96 \sqrt{\hat{V}(\hat{t})}$ (2.11.2 Särndal, Swensson & Wretman 2003) have been computed for each sample, the coverage probability, or the expected coverage rate (ECR), can be obtained as ECR = 100·R/H, where R is the number of intervals that contain the true value $t_y$ and H is the number of samples.

Highly skewed populations with outlying values require larger sample sizes for the normal approximation to hold (Särndal, Swensson & Wretman 2003, pp. 55-57). Following this reasoning, insufficient coverage might indicate that the sampling fraction is not large enough for the normal approximation to apply. A poor coverage rate will invalidate conclusions about the performance of an estimator.
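The simulation summary measures of Sections 2.4.1 to 2.4.3 can be sketched in a few lines of Python. The function name and the returned dictionary keys are hypothetical, and the sketch assumes that the simulation variance $S^2_{\hat{t}}$ is computed with the divisor H − 1.

```python
import math


def summarize(estimates, var_estimates, t_true, z=1.96):
    """Summary measures over H simulated samples (H >= 2).

    estimates[h] is the estimated total and var_estimates[h] the
    estimated variance from sample h; t_true is the known population total.
    """
    h = len(estimates)
    mean_t = sum(estimates) / h
    # Simulation variance of the estimates (assumed divisor: H - 1)
    s2 = sum((t - mean_t) ** 2 for t in estimates) / (h - 1)
    bias = mean_t - t_true
    mse = s2 + bias ** 2
    # R = number of intervals t_hat +/- z * sqrt(V_hat) covering t_true
    covered = sum(1 for t, v in zip(estimates, var_estimates)
                  if abs(t - t_true) <= z * math.sqrt(v))
    return {"mean": mean_t, "variance": s2, "bias": bias,
            "mse": mse, "ecr": 100.0 * covered / h}
```

With the toy input `summarize([9, 11, 10, 14], [1, 1, 1, 1], 10)`, three of the four intervals contain the true total, so the coverage rate is 75%.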

3. Data

The synthetic population microdata reflecting the main properties of the real population were provided by SCB (Statistics Sweden). This simulated dataset comprises observations on three variables for 4463 statistical units for a single time period. Each unit represents a part of an enterprise associated with a specific economic activity. The population includes the following two industries: 1026 units have industry code 22 “Manufacture of rubber and plastic products” (SNI2007) and 3437 units belong to the industry group with code 77 “Rental and leasing activities” (SNI2007). The variables used in this study are:

Gross Wages (Wages, y1): A study variable defined as “a personnel cost component, made up of net earnings, included family allowances payable by an employer to an employee in exchange for work done during the reference period including income tax and social contributions, payable by the employee” (Eurostat: Statistics Explained 2015). 17% of all observations are zero values.

Figure 1. Histogram. Distribution of Wages (y1)

Figure 2. Box plot. Distribution of Wages (y1)

The histogram (Figure 1) indicates that Wages has a right-skewed distribution. The box plot (Figure 2) shows that there are many outliers in the data. The median is at the level 0, which indicates the prevalence of zero or close-to-zero values. The box is shifted to the right and the right whisker is longer, which is a sign of a right-skewed distribution.

Gross investment in tangible goods (Investment, y2): A study variable defined as “ investment during the reference period in all tangible goods. Included are new and existing tangible capital goods, whether bought from third parties or produced for own use (i.e. capitalized production of tangible capital goods), having a useful life of more than one year including non-produced tangible goods such as land. Investments in intangible and financial assets are excluded” (Eurostat: Statistics Explained 2015). 50% of all observations are zero values.

Figure 3. Histogram. Distribution of Investment (y2)

Figure 4. Box plot. Distribution of Investment (y2)

The figures above illustrate that Investment (y2) is highly skewed and has a high prevalence of zero values. The number of outliers is small, but several of them are extreme.

Turnover from the previous reference period (Turnover, x): An auxiliary variable defined as “the totals invoiced by the observation unit which correspond to the total value of market sales of goods and services to third parties included invoiced taxes and other affiliated charges passed on to the customer” (Eurostat: Statistics Explained 2015).

Figure 5. Histogram. Distribution of Turnover (x)

Figure 6. Box plot. Distribution of Turnover (x)

According to Figures 5 and 6, Turnover (x) has a right-skewed distribution. The data consist of positive non-zero values. There are many outliers.

None of the three variables has missing values. The study variables differ in the prevalence of zero values (17% for Wages and 50% for Investment), in the number of outliers and in skewness (5.7 for Wages and 23.7 for Investment). Additionally, the linear correlation between the study and auxiliary variables has been measured with the Pearson correlation coefficient (PCC). Wages and Turnover are highly positively correlated (PCC = 0.78), while Investment and Turnover are practically uncorrelated (PCC = 0.18).

4. Method

4.1 Data collection

4.1.1 Common assumptions

Simulations were performed to compare the two estimators. Based on the theory and the assumptions above, a program performing Monte Carlo-type simulations⁵ has been written in Stata.

From a population of size N, the program generates a certain number of independent samples of predetermined size n under both PARπps and POIπps (under POIπps, n is the expected sample size). The specific characteristics described in the experiments below have been computed.

For any Monte Carlo simulation, the number of replications is the key factor in determining the accuracy of the approximations (Särndal, Swensson & Wretman 2003, p. 35). In line with Särndal, Swensson and Wretman (2003, p. 35), it was considered that a close approximation to the true values would be attained with 10 000 independent samples. For all experiments (1-6), the specified sampling fractions were not large enough to violate the assumption $\lambda_k < 1$ for all $k \in U$ under PARπps, respectively $x_k \leq \sum_U x_k / n$ for all $k \in U$ under POIπps, and therefore none of the original units has been set aside in a “Take all” stratum.

The following experimental procedures were carried out based on the theory and the assumptions from the previous sections:

4.1.2 Experiment 1,2,3. The same sample size and different populations

Step 1. 10 000 independent samples with sampling fraction f = 0.034⁶ have been drawn⁷ from the population using PARπps. The following quantities have been calculated⁸ for each of the 10 000 realized samples:

• estimates of the population totals: $\hat{t}_{Wages}(y_1)$, $\hat{t}_{Investment}(y_2)$

• corresponding variance estimates: $\hat{V}(\hat{t}_{Wages}(y_1))$, $\hat{V}(\hat{t}_{Investment}(y_2))$

• corresponding confidence intervals of the population totals

Step 2. The estimates created above have been saved as a new dataset. The following summary statistics and efficiency measures have been calculated for each of the two study variables:

• the mean and the variance, $S^2(\hat{t})$, of the 10 000 estimates of the total $\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_{10\,000}$. In line with Särndal, Swensson and Wretman (2003, p. 280) it was assumed that the mean of $\hat{t}$ would estimate the true expected value of the estimator of the population total, $E(\hat{t})$, and that $S^2(\hat{t})$ would estimate the true variance, to the degree of precision obtained with 10 000 repetitions.

• the bias of the estimator of the population total, $B(\hat{t})$

• the mean squared error of the estimator of the population total, $MSE(\hat{t})$

• the corresponding coverage rate (ECR) for the estimates of the population total

• the mean and the variance, $S^2[\hat{V}(\hat{t})]$, of the 10 000 variance estimates $[\hat{V}(\hat{t})]_1, [\hat{V}(\hat{t})]_2, \ldots, [\hat{V}(\hat{t})]_{10\,000}$. In line with Särndal, Swensson and Wretman (2003, p. 280) it was assumed that the mean of $\hat{V}(\hat{t})$ would estimate the true expected value of the variance estimator, $E[\hat{V}(\hat{t})]$.

• the bias of the variance estimator, $B[\hat{V}(\hat{t})]$. Since $S^2(\hat{t})$ estimates the true variance, the bias of $\hat{V}(\hat{t})$ was calculated as $Bias[\hat{V}(\hat{t})] = E[\hat{V}(\hat{t})] - S^2(\hat{t})$

• the mean squared error of the variance estimator, $MSE[\hat{V}(\hat{t})]$

⁵ A special method of approximating the value of quantities which is applicable when a deterministic procedure is not useful, called Monte Carlo simulation (Särndal, Swensson & Wretman 2003, p. 35)

⁶ size n = 150 out of N = 4463

⁷ see the sample selection procedures for PARπps in the Theory section

⁸ see the estimation procedures for the π-estimator under PARπps in the Theory section

Step 3. Step 1 has been repeated using the sample selection procedures for POIπps described in the Theory section. Additionally, the number of observations $n_1, n_2, \ldots, n_{10\,000}$ has been calculated for each of the 10 000 realized samples.

Step 4. Step 2 has been repeated for the POIπps-based estimator. Additionally, the mean and the variance of the 10 000 realized sample sizes ($n_s$) have been calculated.

Step 5. The summary statistics and efficiency measures created in steps 2-4 have been summarized for analysis in the tables presented in Appendix 1 (Experiment 1: steps 1-5).

Step 6. We assume that the two industries “22” and “77” could have different characteristics which, in turn, might affect the outcome measures. To inspect whether the previous results would hold up in a more detailed analysis, we consider industry “22” as a population and repeat steps 1-5 with 10 000 samples of the equivalent sampling fraction f = 0.034¹⁰ (Experiment 2).

Step 7. We consider industry “77” as a population and repeat steps 1-5 with 10 000 samples of the equivalent sampling fraction f = 0.034¹¹ (Experiment 3).

4.1.3 Experiments 4,5,6. Different sampling fractions and the same population

We consider industry “77” as a population and repeat steps 1-5 with 10 000 samples of the sampling fractions f = 0.033 (Experiment 4), f = 0.067 (Experiment 5), and f = 0.10 (Experiment 6).

¹⁰ 34 observations out of N = 1026

¹¹ 116 observations out of N = 3437
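The experimental procedure can be condensed into a compact Python sketch. It is an illustration only, not the Stata program used in the thesis: the function name, the fixed seed and the decision to skip empty Poisson samples are assumptions, all $\lambda_k$ are assumed to be below 1, and the MSE is computed directly as the mean squared deviation from the true total rather than as $S^2$ plus squared bias.

```python
import random


def simulate(x, y, n, reps, seed=0):
    """Compare the PAR pips quasi H-T estimator with the POI pips ratio estimator.

    Returns (mse_par, mse_poi), each the mean squared deviation of the
    simulated estimates from the true total of y.
    """
    rng = random.Random(seed)
    t_x, t_y = sum(x), sum(y)
    size = len(x)
    lam = [n * xk / t_x for xk in x]      # assumes every lam_k < 1
    par, poi = [], []
    for _ in range(reps):
        # Pareto pips: keep the n units with the smallest ranking values
        q = []
        for k in range(size):
            u = rng.random()
            q.append((u * (1.0 - lam[k]) / (lam[k] * (1.0 - u)), k))
        s = [k for _, k in sorted(q)[:n]]
        par.append(sum(y[k] / lam[k] for k in s))
        # Poisson pips: independent draws, pi-weighted ratio estimator
        s = [k for k in range(size) if rng.random() < lam[k]]
        if s:                              # assumption: skip empty samples
            ty_hat = sum(y[k] / lam[k] for k in s)
            tx_hat = sum(x[k] / lam[k] for k in s)
            poi.append(t_x * ty_hat / tx_hat)

    def mse(values):
        return sum((v - t_y) ** 2 for v in values) / len(values)

    return mse(par), mse(poi)
```

A call such as `simulate(x, y, n=150, reps=10000)` would mirror the scale of Experiment 1, provided `x` and `y` hold the population values of Turnover and a study variable.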

4.2 Data analysis

The two estimators have been compared using the ratio of their efficiencies. The relative efficiency loss/gain (RE) (Holmberg 2003, p. 986; Rosén 1997b, p. 176) was defined as follows:

$$RE = \left[ \frac{\text{Performance measure for POIπps-based estimator}}{\text{Performance measure for PARπps π-estimator}} - 1 \right] \cdot 100$$

This index indicates the percentage change that will occur in the respective performance measure if the PARπps π-estimator is chosen as the preferred estimator. Positive values of RE indicate efficiency gains and negative values indicate efficiency losses. In this paper, RE values greater than 1 in absolute value indicate dissimilarity between two measures.
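As a worked example of the RE index, the short Python sketch below (with a hypothetical function name) applies the definition to the asymptotic variances reported for Wages in the whole population in Table 1.

```python
def relative_efficiency(measure_poi, measure_par):
    """RE in percent: positive values favour the PAR pips pi-estimator."""
    return (measure_poi / measure_par - 1.0) * 100.0


# Asymptotic variances for Wages, whole population (Table 1)
av_par = 344158453937          # pi-estimator under PAR pips
av_poi_pi = 926038550859       # pi-estimator under POI pips
av_poi_ratio = 346995266950    # pi-weighted ratio estimator under POI pips

re_pi = round(relative_efficiency(av_poi_pi, av_par), 2)        # 169.07
re_ratio = round(relative_efficiency(av_poi_ratio, av_par), 2)  # 0.82
```

The second value lies between -1 and 1, which is why the π-weighted ratio estimator is treated as similar to the PARπps π-estimator.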

4.3 Comparison of asymptotic variances

Since the complete dataset for the entire population was available the values of asymptotic variances have been calculated for each of the four strategies. Relative efficiency losses/gains (RE) described in 4.2 have been calculated to compare the three POIπps estimators with the PARπps π-estimator (Table 1).

Table 1. Comparison between the values of the asymptotic variances (AV) of the estimator $\hat{t}$

Columns: (1) AV of the π-estimator of t under PARπps; (2) AV of the π-estimator of t under POIπps; (3) AV of the alt-estimator of t under POIπps; (4) AV of the π-weighted ratio estimator of t under POIπps.

Statistic            (1)              (2)              (3)              (4)

The whole population
Wages                344158453937     926038550859     4298115572898    346995266950
RE (%)               X                169.07           1148.88          0.82
Investment           5055429152353    5149254943210    5624593543707    5054858402561
RE (%)               X                1.86             11.26            -0.01

Industry "22"
Wages_div22          207816099242     632698646904     3161179043576    208035909712
RE (%)               X                204.45           1421.14          0.11
Investment_div22     586146698772     636155518770     898509546657     585586860794
RE (%)               X                8.53             53.29            -0.10

Industry "77"
Wages_div77          153483699714     419479089320     1457666040811    153439139975
RE (%)               X                173.31           849.72           -0.03
Investment_div77     2946960517173    2997733847673    3172288304855    2946106986806
RE (%)               X                1.72             7.65             -0.03

Table 1 indicates that the asymptotic variance of the POIπps π-estimator (2) as well as the asymptotic variance of the POIπps alternative ratio estimator (3) are greater than the asymptotic variance of the PARπps π-estimator (1). It has been noticed that the POIπps π-weighted ratio estimator (4) and the PARπps π-estimator (1) have similar¹² asymptotic variances. These two estimators (1 and 4) have been selected for further comparison.

5. Results and Analysis

The performance measures of the POIπps π-weighted ratio estimator were compared to the corresponding performance measures of the PARπps π-estimator. Tables 2 and 3 present the results of the comparisons. Information that supports the analysis is included in the appendix.

Since the biases were not large enough to affect the mean squared errors, the values of $MSE(\hat{t})$ and $MSE[\hat{V}(\hat{t})]$ essentially reflect the values of $S^2(\hat{t})$ and $S^2[\hat{V}(\hat{t})]$ respectively. In other words, $RE_{MSE(\hat{t})}$ is equal to $RE_{S^2(\hat{t})}$ and $RE_{MSE[\hat{V}(\hat{t})]}$ is equal to $RE_{S^2[\hat{V}(\hat{t})]}$. The MSE measures are therefore approximated by the corresponding variances, setting the bias to zero.

5.1 Experiments 1-3 The same sample size and different populations.

Table 2. The PARπps π-estimator vs. the POIπps π-weighted ratio estimator. Comparison of the efficiency measures obtained using Monte Carlo simulation involving 10 000 samples of f = 0.034 (150 of 4463).

Columns: (1) RE (%), comparison of $MSE(\hat{t})$; (2) RE (%), comparison of $MSE[\hat{V}(\hat{t})]$; (3) ECR (%), both designs; (4) variation of $n_s$ under POI, +/- (%), one value per experiment; (5) data: skewness (coeff.); (6) data: correlation with x (coeff.).

Study variable     (1)      (2)      (3)    (4)      (5)      (6)

Experiment 1 (f=0.034)
Wages              -0.78    1.86     94     7.18     5.70     0.78
Investment         -3.14    -7.32    57     7.18     23.70    0.18

Experiment 2 (f=0.034)
Wages_22           2.96     19.12    93     15.65    3.90     0.82
Investment_22      22.16    52.30    82     15.65    12.36    0.37

Experiment 3 (f=0.034)
Wages_77           3.07     31.80    94     8.97     3.61     0.71
Investment_77      0.41     2.18     54     8.97     26.02    0.09

Columns 1 and 2 show the similarity (RE values between -1 and 1), the efficiency gains (positive values greater than 1) or the efficiency losses (negative values below -1) that will occur if the respective PARπps π-estimator is preferred. The results of the comparison of MSE are obtained using

$$RE_{MSE} \text{ of estimator of } t_y = \left[ \frac{\text{MSE of estimator of total under POIπps}}{\text{MSE of estimator of total under PARπps}} - 1 \right] \cdot 100,$$

where $MSE[\hat{t}] = S^2_{\hat{t}} + [Bias(\hat{t})]^2$ and $Bias[\hat{t}] = E(\hat{t}) - t$, and

$$RE_{MSE} \text{ of variance estimator of } t_y = \left[ \frac{\text{MSE of variance estimator of total under POIπps}}{\text{MSE of variance estimator of total under PARπps}} - 1 \right] \cdot 100,$$

where $MSE[\hat{V}(\hat{t})] = S^2[\hat{V}(\hat{t})] + [Bias[\hat{V}(\hat{t})]]^2$ and $Bias[\hat{V}(\hat{t})] = E(\hat{V}(\hat{t})) - S^2_{\hat{t}}$.

¹² In this paper, RE values greater than 1 in absolute value indicate the dissimilarity between two measures.

Both estimators provide similar coverage rates, which are reported in column 3. The variation of the sample size under POIπps, the skewness coefficient and the correlation between the study and auxiliary variables are presented in columns 4, 5 and 6 respectively.

Wages, or the case when the study and auxiliary variables are strongly positively correlated: The coverage rate is close to the desired 95% for both estimators. The comparison indicated that a 2% efficiency gain associated with the variance estimator can be achieved by choosing the PARπps π-estimator over the POIπps π-weighted ratio estimator.

Investment, or the case when the study and auxiliary variables are not correlated: Choosing the PARπps π-estimator entails an efficiency loss of 3% in estimating the population total and of 7% in estimating the variance. The coverage rate is low (57%). It has been discussed in the Theory section that highly skewed populations with outliers require larger sample sizes (Särndal, Swensson & Wretman 2003, pp. 55-57). From this perspective it is possible to assume that the sampling fraction used for the estimation of total Investment was not large enough to provide the desired coverage rate. In this case a new experiment with a larger sampling fraction could be conducted.

Wages_22: The correlation coefficient is slightly higher (0.82) than in the previous population (0.78). The coverage rate is almost the same (93%). The comparison led to the conclusion that the PARπps π-estimator provides a 3% efficiency gain in estimating the population total and a 19% efficiency gain in estimating the variance in comparison to the POIπps π-weighted ratio estimator.

Investment_22: The correlation coefficient increased to a “moderate” level (0.37) compared to the previous population. The comparison of MSE led to the conclusion that the PARπps π-estimator provides a 22% efficiency gain in estimating the population total and a 52% efficiency gain in estimating the variance in comparison to the POIπps π-weighted ratio estimator.

The distribution of Investment_22 is less skewed (skewness 12.35) than the distribution of Investment in the merged dataset (skewness 23.69). The coverage rate increased from 57% (Experiment 1, Investment) to 82-83% (Experiment 2, Investment_22), which is still below the desired 95%, so neither of the two estimators provides sufficient coverage. In line with Särndal, Swensson and Wretman (2003, pp. 55-57), it is reasonable to assume that population skewness negatively affects the coverage rate.
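For readers who want to see how a coverage rate of this kind is obtained from simulation output, the following is a minimal sketch. It assumes that the ECR is the share of normal-theory 95% intervals t̂ ± 1.96·√V̂(t̂) that contain the true total; the exact definition is given in an earlier section of the thesis, and the function name and toy numbers here are made up for illustration.

```python
import math


def empirical_coverage(estimates, variance_estimates, true_total, z=1.96):
    """Share (%) of intervals t_hat +/- z*sqrt(V_hat) that contain true_total."""
    hits = 0
    for t_hat, v_hat in zip(estimates, variance_estimates):
        half_width = z * math.sqrt(v_hat)
        if t_hat - half_width <= true_total <= t_hat + half_width:
            hits += 1
    return 100.0 * hits / len(estimates)


# Toy input (illustrative only, not thesis data): four simulated samples
t_hats = [98.0, 103.0, 91.0, 110.0]
v_hats = [4.0, 4.0, 4.0, 9.0]
ecr = empirical_coverage(t_hats, v_hats, true_total=100.0)
print(ecr)
```

In a skewed population, a sample that misses the outlying units tends to underestimate both the total and its variance, so the interval is too short and centred too low; this is one way to read the low coverage for Investment.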

Wages_77: The correlation coefficient decreased from 0.82 to 0.71 compared with the previous experiment with Wages_22. The corresponding coverage is still close to the desired value. The PARπps π-estimator performs better than the POIπps π-weighted ratio estimator. The comparison of the respective MSE shows that the PARπps π-estimator has a 3% performance advantage in estimating the population total and a 32% advantage in estimating the variance.

Investment_77: The correlation coefficient decreased to nearly zero compared with the previous experiment with Investment_22. The coverage rate is extremely low (54-55% for both estimators). The comparison of MSE shows that the PARπps π-estimator provides a 2% efficiency gain in estimating the variance compared with the POIπps π-weighted ratio estimator. This efficiency advantage is not relevant, since the coverage rate is so low. The distribution of Investment_77 is more asymmetrical (skewness 26.02) than the distribution of Investment_22 (skewness 12.35). In line with Särndal, Swensson and Wretman (2003, pp. 55-57), it is reasonable to assume that population skewness negatively affects the coverage rate.

Regardless of whether the desired coverage rate has been achieved, for subpopulation_22 the relative efficiency advantage becomes smaller as the correlation between x and y increases. In contrast, for subpopulation_77 the relative efficiency advantage decreases as the correlation coefficient becomes lower. No evidence has been found of any detectable trend in the efficiency advantage that might depend on the correlation level.

Additionally, changes in the coverage rates were found to reflect changes in the respective correlation coefficient. As the correlation between Wages and Turnover remains strong, the coverage rate remains the same. In the case of Investment, the coverage rate increases as the correlation coefficient increases.

Figure 7. Scatterplot of the ECR values for Wages_22, Wages_77, Investment_22 and Investment_77 (horizontal axis) against the respective correlation coefficient (vertical axis).

The scatterplot shows evidence of a positive association between the strength of the auxiliary information and the coverage rate. For each of the two industries, the strength of the auxiliary information might be associated with the respective coverage rate.

5.2 Experiments 4-6. Different sampling fractions and the same population

Table 3. The PARπps π-estimator vs. the POIπps π-weighted ratio estimator. Comparison of the performance measures obtained using Monte Carlo simulation involving 10 000 samples with f = 0.033, 0.067 and 0.10. Industry_77.

| Experiment | Study variable | 1: RE (%), comparison of MSE(t̂) | 2: RE (%), comparison of MSE[V̂(t̂)] | 3: ECR (%), both designs | 4: Variation of n_s under POI, +/- (%) | 5: Data: skewness (coeff.) | 6: Data: corr. with x (coeff.) |
|---|---|---|---|---|---|---|---|
| Experiment 4 (f=0.033) | Wages_77 | 3.71 | 39.02 | 95 | 9.08 | 3.61 | 0.71 |
| Experiment 4 (f=0.033) | Investment_77 | 3.10 | 3.29 | 54 | 9.08 | 26.02 | 0.09 |
| Experiment 5 (f=0.067) | Wages_77 | -0.27 | 13.54 | 95 | 6.03 | 3.61 | 0.71 |
| Experiment 5 (f=0.067) | Investment_77 | 1.68 | 3.01 | 70 | 6.03 | 26.02 | 0.09 |
| Experiment 6 (f=0.10) | Wages_77 | -0.31 | 13.48 | 95 | 4.72 | 3.61 | 0.71 |
| Experiment 6 (f=0.10) | Investment_77 | -0.51 | 2.06 | 81 | 4.72 | 26.02 | 0.09 |

Columns 1-6 were described in the previous section 5.1 (Table 2). According to column 2, the values of MSE[V̂(t̂)] provided by the PARπps π-estimator are smaller than those provided by the POIπps π-weighted ratio estimator. In this respect, for both variables and for all sampling fractions, the PARπps π-estimator is more beneficial than the POIπps π-weighted ratio estimator. These results are consistent with experiments 2 and 3. Moreover, when the sample size increases, the corresponding RE values change in different directions; hence no detectable trend in the relative efficiency advantage (RE) that might depend on the sampling fraction has been found.

Wages, or the case when the study and auxiliary variables are strongly positively correlated: Both estimators provide sufficient coverage (95%), which is constant for all sampling fractions.

Investment, or the case when the study and auxiliary variables are not correlated and the study variable is highly skewed: There is some evidence of a positive association between the sampling fraction and the coverage rate. This result is consistent with the statement of Särndal, Swensson and Wretman (2003, pp. 55-57) that highly skewed populations with outlying values require larger sample sizes.

For both variables, it is reasonable to assume that, for the POIπps π-weighted ratio estimator, the sampling fraction and the variation of the sample size are negatively correlated; in other words, the POIπps π-weighted ratio estimator can be more competitive with larger sample sizes.
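The fixed versus random sample size contrast behind this observation can be illustrated with a minimal sketch of the two selection procedures. It assumes the standard descriptions: Poisson πps selects unit k independently with probability λ_k = n·x_k/Σx, while Pareto πps keeps the n units with the smallest ranking variables Q_k = [U_k/(1−U_k)] / [λ_k/(1−λ_k)], which gives inclusion probabilities only approximately equal to λ_k. The population x, the seed and all function names are hypothetical and are not the thesis data or code.

```python
import random


def inclusion_probabilities(x, n):
    """Target inclusion probabilities lambda_k = n * x_k / sum(x)."""
    total = sum(x)
    lam = [n * xk / total for xk in x]
    if max(lam) >= 1.0:
        raise ValueError("some lambda_k >= 1; such units need separate treatment")
    return lam


def poisson_pps(lam, rng):
    """Poisson pps: independent Bernoulli(lambda_k) trials, random sample size."""
    return [k for k, lk in enumerate(lam) if rng.random() < lk]


def pareto_pps(lam, n, rng):
    """Pareto pps: keep the n units with the smallest ranking variables Q_k."""
    ranked = []
    for k, lk in enumerate(lam):
        u = rng.random()
        q = (u / (1.0 - u)) / (lk / (1.0 - lk))
        ranked.append((q, k))
    ranked.sort()
    return sorted(k for _, k in ranked[:n])


rng = random.Random(2018)
x = [rng.uniform(1.0, 10.0) for _ in range(500)]  # hypothetical size measure
n = 50
lam = inclusion_probabilities(x, n)

pareto_sample = pareto_pps(lam, n, rng)  # always exactly n units
poisson_sizes = [len(poisson_pps(lam, rng)) for _ in range(1000)]
var_ns = sum(lk * (1.0 - lk) for lk in lam)  # Var(n_s) under Poisson sampling
print(len(pareto_sample), min(poisson_sizes), max(poisson_sizes), round(var_ns, 1))
```

Under Poisson sampling the expected sample size is Σλ_k = n and its variance is Σλ_k(1−λ_k), which is below n, so the standard deviation of n_s grows more slowly than n itself and the relative variation of the sample size shrinks as the sampling fraction grows.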

6. Conclusions

The aim of the study was to compare the performances of the Pareto πps-based vs. Poisson πps-based estimators and to determine how the sample size and dataset characteristics affect the relative efficiency of the two estimators.

What are the most important differences between the Pareto πps and Poisson πps schemes with respect to their sample selection and estimation procedures, and what properties make these two schemes similar?

The theoretical comparison between the two schemes showed that, besides fixed vs. random sample size, the two schemes differ in their sample selection procedures. Both schemes have quite simple sample selection procedures, an understandable variance estimator and a nonnegative variance estimate.

Which of the three POIπps-based estimators is the most comparable to the PARπps-based π-estimator, and which of these two performs better?

Based on the comparison of asymptotic variances calculated on the population data, the POIπps π-weighted ratio estimator has been found to be the most similar to the PARπps π-estimator. Small differences between the calculated asymptotic variances of the two estimators were almost unnoticeable, while the experimental results showed that, for the given type of data and the given sampling fractions, when the study and auxiliary variables are highly positively correlated (Wages), the PARπps π-estimator is more beneficial than the POIπps π-weighted ratio estimator.

How do dataset characteristics affect the relative efficiency of the two estimators?

The experiments showed no evidence of any detectable trend in the efficiency advantage that might depend on the skewness or on the correlation between the auxiliary and study variables. However, it was found that the strength of the auxiliary information can be positively associated with the coverage rate.

How does sample size influence the relative efficiency of the two estimators?

The POIπps π-weighted ratio estimator can be more competitive with larger sample sizes, because an increasing sampling fraction decreases the sample size variation. No trend in the relative efficiency advantage (RE) that might depend on the sample size was evident. However, for highly skewed populations with outlying values, the sample size is positively correlated with the coverage rate, which in turn can be the decisive factor in the choice between the two schemes.
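The decrease in sample size variation can be checked with simple arithmetic on the Appendix 2 values. The sketch below assumes that the "variation of n_s" reported in Table 3 is the coefficient of variation of the realised POIπps sample size, that is, 100·√Var(n_s)/mean(n_s); this reading is an assumption, and because the tabulated variances are rounded the results only approximately match the reported 9.08, 6.03 and 4.72.

```python
import math

# (mean, variance) of the realised POIpps sample size, Appendix 2 (Tables 10, 12, 14)
runs = {
    "f=0.033": (112, 103),
    "f=0.067": (230, 192),
    "f=0.10": (344, 264),
}

# Coefficient of variation of n_s in percent (assumed meaning of column 4 in Table 3)
cv = {f: 100.0 * math.sqrt(var) / mean for f, (mean, var) in runs.items()}

for f, value in cv.items():
    print(f"{f}: CV of n_s = {value:.2f}%")
```

The relative variation falls by roughly half between the smallest and the largest sampling fraction, which is the mechanism offered above for why the POIπps π-weighted ratio estimator becomes more competitive as the sample grows.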


Reference List

Eurostat: Statistics Explained (2015). Glossary: Structural business statistics (SBS). ISSN 2443-8219. http://ec.europa.eu/eurostat/statistics-explained/index.php?title=Glossary:Structural_business_statistics_(SBS) [2018-05-25]

Holmberg, A. (2003). Essays on Model Assisted Survey Planning. Acta Universitatis Upsaliensis. Comprehensive Summaries of Uppsala Dissertations from the Faculty of Social Sciences 126. 40 pp. Uppsala. ISBN 91-554-5623-5. http://www.diva-portal.org/smash/get/diva2:162729/FULLTEXT01.pdf [2018-05-10]

Lohr, S. (2010). Sampling: Design and Analysis. 2nd ed. Boston, MA: Cengage Brooks/Cole.

Rosén, B. (1997b). On sampling with probability proportional to size. Journal of Statistical Planning and Inference, 62 (1997), pp. 159-191. https://doi.org/10.1016/S0378-3758(96)00186-3. Available: https://www-sciencedirect-com.db.ub.oru.se/science/article/pii/S0378375896001863 [2018-04-17]

Rosén, B. (2000b). A User's Guide to Pareto πps Sampling (R&D Report 2000:6). Stockholm: Statistics Sweden. https://www.scb.se/contentassets/14f5e346f4814dd0acd52d10b23286c6/rnd-report-2000-06-green.pdf

SNI2007. The Swedish Standard Industrial Classification. Stockholm: Statistics Sweden. http://www.sni2007.scb.se/snisokeng.asp [2018-04-17]

Särndal, C., Swensson, B. & Wretman, J. (2003). Model Assisted Survey Sampling (Springer Series in Statistics). New York; Berlin; Heidelberg; Hong Kong; London; Milan; Paris; Tokyo: Springer.


Appendix

Appendix 1 Simulation results. Experiments 1-3. The same sample size and different populations

The results of the simulations are presented for the π-estimator under PARπps vs. the ratio estimator under POIπps. Note that the estimates in columns 1-6, 8-12 and 14-16 have been calculated with the degree of precision obtainable with 10 000 repetitions. For each of the two designs, the quantities S²(t̂) (column 4), E[V̂(t̂)] (column 9) and AV (column 13) agree closely for each estimator (columns 14, 15 and 16):

a. S²(t̂) and AV are close → the tailor-made AV represents the true variance.

b. S²(t̂) agrees closely with the expected value of the variance estimator E[V̂(t̂)] → the variance estimator is nearly unbiased, as was to be expected according to CGS.
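The relations between the tabulated quantities can be checked with a few lines of arithmetic. The sketch below uses the PARπps Wages row of Experiment 1 (Tables 4 and 5) and assumes the usual decomposition MSE = variance + bias², together with the bias of the variance estimator measured as E[V̂(t̂)] − S²(t̂); both relations are inferred from the tabulated values, and the variable names are hypothetical.

```python
# PARpps pi-estimator, Wages, Experiment 1 (values copied from Tables 4 and 5)
s2_t = 354497358701          # column 4: simulation variance of t_hat
bias_t = -681                # column 5: bias of t_hat (rounded in the table)
mse_reported = 354497821875  # column 6: MSE of t_hat
expected_v = 344260230599    # column 9: mean of the 10 000 variance estimates

# Assumed decomposition: MSE = variance + bias^2
mse_from_parts = s2_t + bias_t ** 2

# Bias of the variance estimator relative to the simulation variance (column 11)
bias_v = expected_v - s2_t

# Ratio close to 1 indicates a nearly unbiased variance estimator
ratio = s2_t / expected_v

print(mse_from_parts, bias_v, round(ratio, 2))
```

The recomputed MSE differs from the reported one only through rounding of the bias, and the recomputed bias of the variance estimator matches column 11 exactly, which supports reading column 11 as E[V̂(t̂)] − S²(t̂).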

Experiment 1. The whole (22+77) population. Wages vs. Investment

Table 4. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.034 (150 of 4463). Experiment 1. The whole population: Wages vs. Investment.

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 1: Sample size n_s, mean | 2: Sample size n_s, var | 3: E(t̂) | 4: S²(t̂) | 5: B(θ̂) | 6: MSE(θ̂) | 7: t | 8: ECR (%) |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages | 150 | 0 | 9828704 | 354497358701 | -681 | 354497821875 | 9829385 | 95 |
| POIπps_wages | 150 | 116 | 9821382 | 351656548862 | -8003 | 351720591971 | 9829385 | 94 |
| RE (%)_Wages | 0.00 | X | X | -0.80 | 1075.88 | -0.78 | | X |
| PARπps_investment | 150 | 0 | 3959029 | 5116286701818 | 14897 | 5116508627207 | 3944132 | 57 |
| POIπps_investment | 150 | 116 | 3942909 | 4955936325894 | -1224 | 4955937823097 | 3944132 | 57 |
| RE (%)_Investment | 0.00 | X | X | -3.13 | -91.79 | -3.14 | | X |

Columns 3-6 summarise the 10 000 estimates t̂.

Figure 8. Wages. PARπps π-estimator of t

Figure 9. Wages. POIπps ratio estimator of t

Figure 10. Investment. PARπps π-estimator of t

Figure 11. Investment. POIπps ratio estimator of t

Table 5. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.034 (150 of 4463). Experiment 1. The whole population: Wages vs. Investment.

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 9: E[V̂(t̂)] | 10: S²[V̂(t̂)] | 11: B(θ̂) | 12: MSE(θ̂) | 13: V(t̂) | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages | 344260230599 | 4.90164E+21 | -10237128102 | 5.00644E+21 | 344158453937 | 0.97 | 1.03 | 1.00 |
| POIπps_wages | 345940582872 | 5.0669E+21 | -5715965990 | 5.09957E+21 | 346995266950 | 0.99 | 1.02 | 1.00 |
| RE (%)_Wages | 0.49 | 3.37 | -44.16 | 1.86 | 0.82 | X | X | X |
| PARπps_investment | 5115459452734 | 1.03606E+26 | -827249083 | 1.03606E+26 | 5055429152353 | 0.99 | 1.00 | 0.99 |
| POIπps_investment | 4923812420091 | 9.60E+25 | -32123905803 | 9.60266E+25 | 5054858402561 | 1.02 | 1.01 | 1.03 |
| RE (%)_Investment | -3.75 | -7.32 | 3783.22 | -7.32 | -0.01 | X | X | X |

Columns 9-12 summarise the 10 000 variance estimates V̂(t̂). Columns 14-16: Var of t̂ (S²)/AV; E[V̂(t̂)]/AV; Var of t̂ (S²)/E[V̂(t̂)].

Figure 12. Wages. PARπps π-estimator of V(t̂)

Figure 13. Wages. POIπps ratio estimator of V(t̂)

Figure 14. Investment. PARπps π-estimator of V(t̂)

Figure 15. Investment. POIπps ratio estimator of V(t̂)

Figure 16. Wages. PARπps π-estimator of total vs POIπps ratio estimator of total

Figure 17. Wages. PARπps π variance estimator vs POIπps ratio variance estimator

Figure 18. Investment. PARπps π-estimator of total vs POIπps ratio estimator of total

Figure 19. Investment. PARπps π-estimator of V(t̂) vs POIπps ratio estimator of V(t̂)

Experiment 2. Summary measures industry “22”

Table 6. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.034 (34 of 1026). Experiment 2. Industry "22".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 1: Sample size n_s, mean | 2: Sample size n_s, var | 3: E(t̂) | 4: S²(t̂) | 5: B(θ̂) | 6: MSE(θ̂) | 7: t | 8: ECR (%) |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_div22 | 34 | 0 | 4038866 | 208813876143 | 3927 | 208829296511 | 4034940 | 93 |
| POIπps_wages_div22 | 34 | 28 | 4032846 | 215007522541 | -2094 | 215011907659 | 4034940 | 93 |
| RE (%)_Wages_22 | 0.18 | X | X | 2.97 | -46.67 | 2.96 | | X |
| PARπps_investment_div22 | 34 | 0 | 1410875 | 544040069363 | -4675 | 544061927736 | 1415551 | 83 |
| POIπps_investment_div22 | 34 | 28 | 1418140 | 664613610206 | 2589 | 664620315528 | 1415551 | 82 |
| RE (%)_Investment_22 | 0.18 | X | X | 22.16 | -44.61 | 22.16 | | X |

Columns 3-6 summarise the 10 000 estimates t̂.

Figure 20. Wages_22. PARπps π-estimator of t

Figure 21. Wages_22. POIπps ratio estimator of t

Figure 22. Investment_22. PARπps π-estimator of t

Figure 23. Investment_22. POIπps ratio estimator of t

Table 7. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.034 (34 of 1026). Experiment 2. Industry "22".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 9: E[V̂(t̂)] | 10: S²[V̂(t̂)] | 11: B(θ̂) | 12: MSE(θ̂) | 13: V(t̂) | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_div22 | 209333609644 | 7.43609E+21 | 519733501 | 7.43636E+21 | 207816099242 | 1.00 | 1.00 | 0.99 |
| POIπps_wages_div22 | 206033642125 | 8.77774E+21 | -8973880416 | 8.85827E+21 | 208035909712 | 0.97 | 1.04 | 1.01 |
| RE (%)_Wages_22 | -1.58 | 18.04 | 31.94 | 19.12 | 0.11 | X | X | X |
| PARπps_investment_div22 | 551237232339 | 7.35266E+24 | 7197162976 | 7.35272E+24 | 586146698772 | 1.08 | 0.99 | 1.06 |
| POIπps_investment_div22 | 634813203508 | 1.11975E+25 | -29800406698 | 1.11983E+25 | 585586860794 | 0.88 | 1.05 | 0.92 |
| RE (%)_Investment_22 | 15.16 | 52.29 | 41.01 | 52.30 | -0.10 | X | X | X |

Columns 9-12 summarise the 10 000 variance estimates V̂(t̂). Columns 14-16: Var of t̂ (S²)/AV; E[V̂(t̂)]/AV; Var of t̂ (S²)/E[V̂(t̂)].

Figure 24. Wages_22. PARπps π-estimator of V(t̂)

Figure 25. Wages_22. POIπps ratio estimator of V(t̂)

Figure 26. Investment_22. PARπps π-estimator of V(t̂)

Figure 27. Investment_22. POIπps ratio estimator of V(t̂)

Figure 28. Wages_22. PARπps π-estimator of total vs POIπps ratio estimator of total

Figure 29. Wages_22. PARπps π-estimator of V(t̂) vs POIπps ratio estimator of V(t̂)

Figure 30. Investment_22. PARπps π-estimator of total

Figure 31. Investment_22. PARπps π-estimator

Experiment 3. Summary measures industry “77”

Table 8. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.034 (116 of 3437). Experiment 3. Industry "77".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 1: Sample size n_s, mean | 2: Sample size n_s, var | 3: E(t̂) | 4: S²(t̂) | 5: B(θ̂) | 6: MSE(θ̂) | 7: t | 8: ECR (%) |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_div77 | 116 | 0 | 5797292 | 152032132002 | 2847 | 152040237737 | 5794445 | 95 |
| POIπps_wages_div77 | 116 | 108 | 5802603 | 156645598844 | 8158 | 156712145500 | 5794445 | 94 |
| RE (%)_Wages_77 | -0.06 | X | X | 3.03 | 186.53 | 3.07 | | X |
| PARπps_investment_div77 | 116 | 0 | 2539366 | 2973408768773 | 10784 | 2973525063374 | 2528582 | 55 |
| POIπps_investment_div77 | 116 | 108 | 2517057 | 2985664083992 | -11525 | 2985796907002 | 2528582 | 54 |
| RE (%)_Investment_77 | -0.06 | X | X | 0.41 | 6.87 | 0.41 | | X |

Columns 3-6 summarise the 10 000 estimates t̂.

Figure 32. Wages_77. PARπps π-estimator of t

Figure 33. Wages_77. POIπps ratio estimator of t

Figure 34. Investment_77. PARπps π-estimator of t

Figure 35. Investment_77. POIπps ratio estimator of t

Table 9. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.034 (116 of 3437). Experiment 3. Industry "77".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 9: E[V̂(t̂)] | 10: S²[V̂(t̂)] | 11: B(θ̂) | 12: MSE(θ̂) | 13: V(t̂) | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_div77 | 153360005113 | 9.12463E+20 | 1327873111 | 9.14226E+20 | 153483699714 | 1.01 | 0.99 | 1.00 |
| POIπps_wages_div77 | 154008515629 | 1.19798E+21 | -2637083215 | 1.20494E+21 | 153439139975 | 0.98 | 1.02 | 1.00 |
| RE (%)_Wages_77 | 0.42 | 31.29 | 98.59 | 31.80 | -0.03 | X | X | X |
| PARπps_investment_div77 | 2982938630473 | 2.35406E+25 | 9529861700 | 2.35407E+25 | 2946960517173 | 0.99 | 1.00 | 0.99 |
| POIπps_investment_div77 | 2929259533960 | 2.41E+25 | -56404550032 | 2.41E+25 | 2946106986806 | 0.99 | 1.02 | 1.01 |
| RE (%)_Investment_77 | -1.80 | 2.17 | 491.87 | 2.18 | -0.03 | X | X | X |

Columns 9-12 summarise the 10 000 variance estimates V̂(t̂). Columns 14-16: Var of t̂ (S²)/AV; E[V̂(t̂)]/AV; Var of t̂ (S²)/E[V̂(t̂)].

Figure 36. Wages_77. PARπps π-estimator of V(t̂)

Figure 37. Wages_77. POIπps ratio estimator of V(t̂)

Figure 38. Investment_77. PARπps π-estimator of V(t̂)

Figure 39. Investment. POIπps ratio estimator of V(t̂)

Figure 40. Wages_77. PARπps π-estimator of total vs POIπps ratio estimator of total

Figure 41. Wages_77. PARπps π-estimator of V(t̂) vs POIπps ratio estimator of V(t̂)

Figure 42. Investment_77. PARπps π-estimator of total vs POIπps ratio estimator of total

Figure 43. Investment_77. PARπps π-estimator of V(t̂) vs POIπps ratio estimator of V(t̂)

Appendix 2 Simulation results. Experiments 4-6. Different sampling fractions and the same population

Table 10. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.033 (112 of 3437). Experiment 4. Industry "77".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 1: Sample size n_s, mean | 2: Sample size n_s, var | 3: E(t̂) | 4: S²(t̂) | 5: B(θ̂) | 6: MSE(θ̂) | 7: t | 8: ECR (%) |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_77 | 112 | 0 | 5795941 | 159937089418 | 1496 | 159939327716 | 5794445 | 95 |
| POIπps_wages_77 | 112 | 103 | 5791750 | 165866246185 | -2695 | 165873509947 | 5794445 | 94 |
| RE (%)_Wages | 0.00 | X | X | 3.71 | 80.14 | 3.71 | | X |
| PARπps_investment_77 | 112 | 0 | 2544601 | 3063173996067 | 16019 | 3063430618678 | 2528582 | 54 |
| POIπps_investment_77 | 112 | 103 | 2541132 | 3158295399365 | 12551 | 3158452924272 | 2528582 | 54 |
| RE (%)_Investment_77 | 0.00 | X | X | 3.11 | -21.65 | 3.10 | | X |

Columns 3-6 summarise the 10 000 estimates t̂.

Table 11. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.033 (112 of 3437). Experiment 4. Industry "77".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 9: E[V̂(t̂)] | 10: S²[V̂(t̂)] | 11: B(θ̂) | 12: MSE(θ̂) | 13: V(t̂) | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_77 | 159183696268 | 8.77E+20 | -753393149 | 8.77468E+20 | 159441138377 | 1.00 | 1.00 | 1.00 |
| POIπps_wages_77 | 159427518163 | 1.18E+21 | -6438728022 | 1.21984E+21 | 159394841521 | 0.96 | 1.04 | 1.00 |
| RE (%)_Wages | 0.15 | 34.38 | 754.63 | 39.02 | -0.03 | X | X | X |
| PARπps_investment_77 | 3110953435285 | 2.63E+25 | 47779439219 | 2.62563E+25 | 3059317879173 | 1.00 | 0.98 | 0.98 |
| POIπps_investment_77 | 3071227024119 | 2.71E+25 | -87068375246 | 2.71195E+25 | 3058431591492 | 0.97 | 1.03 | 1.00 |
| RE (%)_Investment_77 | -1.28 | 3.27 | 82.23 | 3.29 | -0.03 | X | X | X |

Columns 9-12 summarise the 10 000 variance estimates V̂(t̂). Columns 14-16: Var of t̂ (S²)/AV; E[V̂(t̂)]/AV; Var of t̂ (S²)/E[V̂(t̂)].

Table 12. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.067 (230 of 3437). Experiment 5. Industry "77".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 1: Sample size n_s, mean | 2: Sample size n_s, var | 3: E(t̂) | 4: S²(t̂) | 5: B(θ̂) | 6: MSE(θ̂) | 7: t | 8: ECR (%) |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_div77 | 230 | 0 | 5794580 | 71671734643 | 1759641 | 3168007338728 | 5794445 | 95 |
| POIπps_wages_div77 | 230 | 192 | 5792020 | 72049060016 | 1757080 | 3159379791211 | 5794445 | 95 |
| RE (%)_Wages_77 | 0.00 | X | X | 0.53 | -0.15 | -0.27 | | X |
| PARπps_investment_div77 | 230 | 0 | 2540886 | 1396414925561 | 1125335 | 2662793723347 | 2528582 | 70 |
| POIπps_investment_div77 | 230 | 192 | 2546422 | 1428697579997 | 1130871 | 2707566969274 | 2528582 | 71 |
| RE (%)_Investment_77 | 0.00 | X | X | 2.31 | 0.49 | 1.68 | | X |

Columns 3-6 summarise the 10 000 estimates t̂.

Table 13. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.067 (230 of 3437). Experiment 5. Industry "77".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 9: E[V̂(t̂)] | 10: S²[V̂(t̂)] | 11: B(θ̂) | 12: MSE(θ̂) | 13: V(t̂) | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_div77 | 70905409103 | 1.04E+20 | -766325540 | 1.04527E+20 | 70804745757 | 0.99 | 1.01 | 1.00 |
| POIπps_wages_div77 | 70601913155 | 1.17E+20 | -1447146861 | 1.18677E+20 | 70784355578 | 0.98 | 1.02 | 1.00 |
| RE (%)_Wages_77 | -0.43 | 12.16 | 88.84 | 13.54 | -0.03 | X | X | X |
| PARπps_investment_div77 | 1403546635078 | 2.51E+24 | 7131709518 | 2.51285E+24 | 1387633496619 | 0.99 | 0.99 | 0.99 |
| POIπps_investment_div77 | 1399005133713 | 2.59E+24 | -29692446284 | 2.58848E+24 | 1387238449254 | 0.97 | 1.02 | 0.99 |
| RE (%)_Investment_77 | -0.32 | 2.98 | 316.34 | 3.01 | -0.03 | X | X | X |

Columns 9-12 summarise the 10 000 variance estimates V̂(t̂). Columns 14-16: Var of t̂ (S²)/AV; E[V̂(t̂)]/AV; Var of t̂ (S²)/E[V̂(t̂)].

Table 14. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.10 (344 of 3437). Experiment 6. Industry "77".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 1: Sample size n_s, mean | 2: Sample size n_s, var | 3: E(t̂) | 4: S²(t̂) | 5: B(θ̂) | 6: MSE(θ̂) | 7: t | 8: ECR (%) |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_div77 | 344 | 0 | 5797259 | 43025984349 | 2814 | 43033904804 | 5794445 | 95 |
| POIπps_wages_div77 | 344 | 264 | 5793147 | 42899295124 | -1298 | 42900979852 | 5794445 | 95 |
| RE (%)_Wages_77 | 0.00 | X | X | -0.29 | -53.88 | -0.31 | | X |
| PARπps_investment_div77 | 344 | 0 | 2530930 | 852041093179 | 2348 | 852046607346 | 2528582 | 81 |
| POIπps_investment_div77 | 344 | 264 | 2539661 | 847563468039 | 11079 | 847686218454 | 2528582 | 81 |
| RE (%)_Investment_77 | 0.00 | X | X | -0.53 | 371.81 | -0.51 | | X |

Columns 3-6 summarise the 10 000 estimates t̂.

Table 15. Summary measures after Monte Carlo simulation involving 10 000 samples of fixed sample size with design PARπps, π-estimator, versus 10 000 samples of random sample size with design POIπps, π-weighted ratio estimator. f=0.10 (344 of 3437). Experiment 6. Industry "77".

Note: The mean of t̂ estimates the true expected value of t̂, the mean of V̂ estimates the true expected value of V̂, and S²(t̂) estimates the true variance (Särndal, Swensson and Wretman 2003, p. 280).

| Statistic | 9: E[V̂(t̂)] | 10: S²[V̂(t̂)] | 11: B(θ̂) | 12: MSE(θ̂) | 13: V(t̂) | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|
| PARπps_wages_div77 | 42953883437 | 2.80E+19 | -72100912 | 2.7992E+19 | 42924535105 | 1.00 | 1.00 | 1.00 |
| POIπps_wages_div77 | 42988478913 | 3.18E+19 | 89183788 | 3.17657E+19 | 42912393168 | 1.00 | 1.00 | 1.00 |
| RE (%)_Wages_77 | 0.08 | 13.47 | 23.69 | 13.48 | -0.03 | X | X | X |
| PARπps_investment_div77 | 864593154113 | 6.31E+23 | 12552060934 | 6.31512E+23 | 861809852936 | 1.01 | 0.99 | 1.00 |
| POIπps_investment_div77 | 874196024221 | 6.44E+23 | 26632556182 | 6.45E+23 | 861573484655 | 1.02 | 0.97 | 0.99 |
| RE (%)_Investment_77 | 1.11 | 1.97 | 112.18 | 2.06 | -0.03 | X | X | X |

Columns 9-12 summarise the 10 000 variance estimates V̂(t̂). Columns 14-16: Var of t̂ (S²)/AV; E[V̂(t̂)]/AV; Var of t̂ (S²)/E[V̂(t̂)].
