Calibration Estimation under Nonresponse based on Simple Random Sampling Vs Pareto Sampling

Örebro University

Örebro University School of Business Statistics, Paper, Second level

Supervisor: Per Gösta Andersson. Examiner: Sune Karlsson

Spring, 4 June 2014


Muhammad Qasim 850224


Abstract

This study compares estimators under two sampling designs using calibration estimation in the presence of nonresponse. Calibration estimation is a weighting technique that compensates for unit nonresponse in survey analysis. The calibration estimators are compared under different settings, varying the nonresponse rate and the sample size. Calibration estimation using Pareto design weights is a relatively new idea in the literature. The study uses simulation on synthetic data, with a nonresponse mechanism applied to the sample data. Three estimators of the finite population total are compared with respect to their bias and variance under different response distributions. The results indicate that calibration under the Pareto design produces reasonable results compared with simple random sampling.

Keywords: Calibration estimator, Design, Nonresponse, Pareto sampling, Simple random sampling, Weighting.


Acknowledgments

I would like to acknowledge the assistance of my supervisor, Per Gösta Andersson. His remarkable support and encouragement made it possible to complete this thesis.

I would also like to express my gratitude to Sune Karlsson for his valuable suggestions and comments. Finally, many thanks go out to my friend Zulqarnain Haider and my brother Abdullah for their endless support during my thesis.


1. Introduction

Nonresponse is considered a major problem in statistical surveys. It is an undesirable feature that affects the quality of statistics. Nonresponse means that the desired information is not obtained for all elements of the sample (Särndal, Swensson and Wretman, 1992).

Statistical results from surveys play a crucial role in decision making, and these results are computed from survey data. If the survey data are affected by non-negligible nonresponse, the results are more likely to be flawed, which may lead to wrong decisions. The treatment of nonresponse is therefore very important for sound decision making.

In the past, surveys had less nonresponse because attitudes towards surveys were more positive, so survey results were not much affected. Standard statistical tools that ignored nonresponse were therefore enough to yield acceptable results. Recently, however, a rapid increase in nonresponse has seriously affected the quality of statistics. As a result, the usual standard tools are no longer enough to produce acceptable estimates, because estimated totals tend to be underestimated (Särndal and Lundström, 2005; Särndal, Swensson and Wretman, 1992).

The effect and amount of nonresponse are not the same in all kinds of surveys, but statisticians broadly agree that nonresponse can severely affect the quality of statistics (Särndal and Lundström, 2005). Various techniques have therefore been proposed to develop estimators whose aim is either to eliminate the nonresponse error completely or at least to reduce it. These techniques include the following:

- subsampling of nonrespondents;
- measures taken before and after data collection;
- callbacks and follow-ups;
- randomized response;
- techniques based on response modeling.

This discussion, however, focuses only on how to treat nonresponse at the estimation stage.

No approach can guarantee unbiasedness in the presence of nonresponse unless the nonresponse occurs completely at random, which is unlikely in practice (Särndal and Lundström, 2005). However, some estimators are nearly unbiased under nonresponse, and calibration is one such approach. Calibration is a general method that computes weights using auxiliary information, subject to a calibration equation. Its strength is that it is model free and can use any kind of auxiliary information, as long as the calibration equation can be satisfied. The approach is primarily suited to variables that are linearly related. When calibration weights are applied to an auxiliary variable, they reproduce its known total exactly. If there is a strong relationship between the study variable and the auxiliary variables, calibration also performs well for the study variable (Särndal and Lundström, 2005; Montanari and Ranalli, 2005; Deville and Särndal, 1992).


Using auxiliary information, calibration weights modify the design weights while aiming to stay close to them, because the design weights have desirable properties such as design unbiasedness. Calibration weights are not unique, since different sampling designs and different combinations of auxiliary variables yield different sets of weights. In this paper two sampling designs are considered: simple random sampling (SI) and Pareto sampling. Calibration weights are constructed from both sets of design weights with different combinations of auxiliary variables. These weights are then used to produce estimators that treat nonresponse. The main aim is to identify which sampling design produces better estimates in the presence of nonresponse.

1.2 Background & Related work

Between 1930 and 1940, statisticians were already aware that a high amount of nonresponse can severely affect the quality of survey estimates, and some initial methods were introduced to deal with it. Cochran (1951) states: “unfortunately, any sizeable percentage of nonresponse makes the results open to question by anyone who cares to do so” (Särndal, Swensson and Wretman, 1992).

Before the 1980s, a method to adjust for nonresponse was introduced that was based on a deterministic model. In this method the population is divided into two nonoverlapping groups, a respondent group and a nonrespondent group. Every element of the respondent group has response probability one, while every element of the second group has response probability zero. The drawback of this method is that it is overly simple and not realistic in practice (Särndal and Lundström, 2005).

In the 1980s, another method to adjust for nonresponse was introduced. In this method a sample is randomly selected from a finite population, and the observed sample consists of two groups, a response group and a nonresponse group. The response set is regarded as a random subset of the sample. This method is more realistic than the first one because it allows every element to have its own individual response probability $\theta_k$ in the second phase. Whether an element responds is an unknown binary outcome, like a coin toss that lands heads or tails with an unknown probability. In practice $\theta_k$ is replaced by an estimate $\hat\theta_k$. Every individual response probability is assumed to be positive in order to obtain an unbiased or nearly unbiased estimator. According to Kott, however, this requirement is not realistic, since many surveys contain some elements that would never respond under any circumstances (Särndal and Lundström, 2005).

Complete elimination of nonresponse is the ideal solution in sample surveys, since it would produce unbiased estimates. However, it is very costly and time consuming, especially in large surveys. Statisticians have therefore introduced techniques that modify the estimators rather than trying to eliminate all nonresponse (Särndal, Swensson and Wretman, 1992).

The calibration approach of Särndal and Lundström (2005) is explained in Section 2, where calibration estimators of the population total and their variances are described under the SI and Pareto designs. Section 3 presents the simulation study, comparing SI and Pareto in tables and figures. Finally, Section 4 presents the conclusions.

2. Methodology

2.1 Calibration

Nonresponse severely affects the quality of survey estimates. It is a common but unwanted characteristic of surveys, and interest in it has grown in the literature in recent years. There are two types of nonresponse, item nonresponse and unit nonresponse. “A unit nonresponse element is one for which information is missing on all the questionnaire variables. An item nonresponse element is one for which information is missing on at least one, but not all, of the questionnaire variables” (Särndal and Lundström, 2005).

Unit nonresponse normally poses a much greater risk to survey research than item nonresponse, and it is normally much larger in quantity. Item nonresponse also threatens data quality, since it reduces the effective sample size if only complete records are used in the analysis. Various statistical techniques have been developed to address unit nonresponse (Yan and Curtin, 2010). Calibration can be used to correct for nonresponse in sample surveys (Chang and Kott, 2008). Calibration weights compensate for unit nonresponse, while imputation is used for item nonresponse. A combined approach also exists that uses imputation for item nonresponse and weighting for unit nonresponse (Särndal and Lundström, 2005).

The main concern of this paper is to calibrate two different sets of design weights and compare the resulting estimates under certain response distributions. Calibration estimation is a weighting method in which the weights are calculated using auxiliary variables. The weights are constrained by a calibration equation, which adjusts well for nonresponse provided that suitable auxiliary data are used. The main purpose of calibration weights is to reduce the nonresponse bias of the estimator as much as possible (Särndal and Lundström, 2005).

An important question about the calibration method is why calibration is needed at all.

Under full response, each unit in the sample has its own design weight, and the design-weighted estimators are unbiased and of good quality. When nonresponse occurs, however, the design weights are no longer enough. For a positive study variable, summing the design-weighted values over the respondents only gives a clear underestimate of the population total. A method is therefore needed that creates new weights, larger than the design weights, to compensate for the nonrespondents. Calibration is one such method: it modifies the design weights to produce estimates that reduce both bias and variance (Särndal and Lundström, 2005).
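The following minimal Python sketch illustrates this underestimation using hypothetical data, not the thesis data set. The population of N = 3000 chi-square(2) values, the SI sample of n = 300 and the 50% Bernoulli response are all assumptions chosen for illustration. The design-weighted sum over the full sample lands near the total. The same sum over respondents only drops to roughly half of it.

```python
import random

rng = random.Random(1)
N, n = 3000, 300
# Hypothetical population: chi-square(2) values (exponential with mean 2)
population = [rng.expovariate(0.5) for _ in range(N)]
Y = sum(population)  # true population total

sample = rng.sample(population, n)  # SI sample drawn without replacement
d = N / n                           # common SI design weight d_k = N/n

# Full response: the design-weighted sum estimates Y without bias
full_response_estimate = sum(d * y for y in sample)

# Nonresponse: each sampled unit responds independently with probability 0.5
respondents = [y for y in sample if rng.random() < 0.5]
respondent_only_estimate = sum(d * y for y in respondents)

print(f"Y                : {Y:.1f}")
print(f"full response    : {full_response_estimate:.1f}")
print(f"respondents only : {respondent_only_estimate:.1f}")
```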

Let $U = \{1, \dots, k, \dots, N\}$ be the set of labels of a finite population. Let $s$, of size $n$, be a randomly selected sample from the target population, and let $p(s)$ be the probability of selecting sample $s$ from this population. The probability that the $k$th element is included in the sample is called the inclusion probability of element $k$. It can be written as follows (Särndal, Swensson and Wretman, 1992):

$\pi_k = \sum_{s \ni k} p(s)$

where $\pi_k > 0$ for all elements $k \in U$. According to Särndal and Lundström (2005), the inclusion probability of an element is the known probability with which the element is selected under a given sampling design. The probability that elements $k$ and $l$ are both included in the sample is

$\pi_{kl} = \sum_{s \ni k,\, l} p(s)$

and if $k = l$, then $\pi_{kk} = \pi_k$.
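To make the definitions of $\pi_k$ and $\pi_{kl}$ concrete, the Python sketch below uses a hypothetical SI design with N = 5 and n = 2. It enumerates every possible sample and sums $p(s)$ over the samples that contain the element or elements. Exact fractions are used so the results can be compared directly with the SI closed forms $n/N$ and $n(n-1)/\{N(N-1)\}$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical small population U = {1, ..., 5} under an SI design with n = 2
U = range(1, 6)
n = 2
samples = list(combinations(U, n))
p = {s: Fraction(1, len(samples)) for s in samples}  # p(s) = 1 / C(N, n)

def pi_k(k):
    """First-order inclusion probability: sum of p(s) over samples s containing k."""
    return sum(p[s] for s in samples if k in s)

def pi_kl(k, l):
    """Second-order inclusion probability: sum of p(s) over samples containing k and l."""
    return sum(p[s] for s in samples if k in s and l in s)

print(pi_k(1))      # 2/5, equal to n/N
print(pi_kl(1, 2))  # 1/10, equal to n(n-1)/(N(N-1))
print(pi_kl(3, 3))  # 2/5, since pi_kk = pi_k
```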

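Let $y_k$ and $\mathbf{x}_k$ be the values of the study variable $y$ and of the auxiliary vector $\mathbf{x}$ for element $k$. The total of the auxiliary vector, $\mathbf{X} = \sum_U \mathbf{x}_k$, is assumed to be known.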

Let $d_k$ denote the design weight, which is the reciprocal of the inclusion probability, as in (1). As an example, consider simple random sampling (SI sampling). If 10 elements are selected from a population of size 50, the design weight is $d_k = N/n = 50/10 = 5$. One interpretation is that an element with design weight 5 represents itself and four other nonsampled elements in the population (Särndal and Lundström, 2005). Another interpretation is that the design weight of unit $k$ is the number of population units represented by unit $k$ (Lohr, 2010).

$d_k = 1/\pi_k \quad (1)$

The individual response probability of an element $k$ in the sample is

$\theta_k = \Pr(k \in r \mid s) \quad (2)$

In the first phase, a sample $s$ is drawn from the population with a known sampling design $p(s)$. In the second phase, a response set $r \subseteq s$ is realized, given $s$, under the unknown response distribution $q(r \mid s)$. The theory treats the individual response probability as known. In practice it is always unknown and is estimated by replacing $\theta_k$ with an estimate $\hat\theta_k$.

The reciprocal of the individual response probability is known as the response influence and is denoted by $\phi_k = 1/\theta_k$.

The target population total is defined as:

$Y = \sum_U y_k \quad (3)$

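Equations (4) and (5) define the calibration estimator and the calibrated weights (Särndal and Lundström, 2005). Here $\hat Y_W$, $w_k$, $y_k$ and $\mathbf{x}_k$ denote the estimated total, the calibrated weight, the study variable value and the auxiliary vector value of element $k$, respectively.

$\hat Y_W = \sum_r w_k y_k \quad (4)$

$w_k = d_k\left\{1 + \left(\sum_U \mathbf{x}_k - \sum_r d_k\mathbf{x}_k\right)'\left(\sum_r d_k\mathbf{x}_k\mathbf{x}_k'\right)^{-1}\mathbf{x}_k\right\} \quad (5)$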

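In the two-phase setting, the response probability of each element is taken into account through the response influence $\phi_k = 1/\theta_k$. The calibration weight is then:

$w_{\theta k} = d_k\phi_k\left\{1 + \left(\sum_U \mathbf{x}_k - \sum_r d_k\phi_k\mathbf{x}_k\right)'\left(\sum_r d_k\phi_k\mathbf{x}_k\mathbf{x}_k'\right)^{-1}\mathbf{x}_k\right\} \quad (6)$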

Finally, the calibration equation, or calibration constraint, can be written as:

$\sum_r w_k \mathbf{x}_k = \sum_U \mathbf{x}_k \quad (7)$

Auxiliary variables whose population totals are known, so that equation (7) can be imposed, are eligible for use in calibration estimation (Särndal and Lundström, 2005).
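The weights in (5) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the thesis implementation. The auxiliary vector is taken as $\mathbf{x}_k = (1, x_k)'$, so the known totals are $N$ and $\sum_U x_k$. The population, the linear relation between $y$ and $x$, and the 60% Bernoulli response are all hypothetical. The 2 x 2 system is solved by Cramer's rule so that the example needs no external libraries. The checks confirm that the weights satisfy the calibration equation (7).

```python
import random

def calibration_weights(d, x, totals):
    """Linear calibration weights of the form in equation (5), for the auxiliary
    vector x_k = (1, x_k)':  w_k = d_k * (1 + lambda' x_k), where
    lambda = (sum_r d_k x_k x_k')^{-1} (X - sum_r d_k x_k)."""
    # Entries of the 2x2 matrix T = sum_r d_k x_k x_k'
    t11 = sum(d)
    t12 = sum(dk * xk for dk, xk in zip(d, x))
    t22 = sum(dk * xk * xk for dk, xk in zip(d, x))
    # g = known totals minus design-weighted respondent totals
    g1 = totals[0] - t11
    g2 = totals[1] - t12
    # Solve T lambda = g by Cramer's rule
    det = t11 * t22 - t12 * t12
    lam1 = (g1 * t22 - t12 * g2) / det
    lam2 = (t11 * g2 - t12 * g1) / det
    return [dk * (1 + lam1 + lam2 * xk) for dk, xk in zip(d, x)]

# Hypothetical population with y strongly linearly related to x
rng = random.Random(2)
N, n = 3000, 300
pop_x = [rng.expovariate(0.5) for _ in range(N)]
pop_y = [5 + 2 * xk + rng.gauss(0, 1) for xk in pop_x]
X_totals = (N, sum(pop_x))  # known totals of the auxiliary vector (1, x)
Y = sum(pop_y)

sample = rng.sample(range(N), n)                     # SI sample of labels
resp = [k for k in sample if rng.random() < 0.6]     # assumed 60% Bernoulli response
d = [N / n] * len(resp)                              # SI design weights
x_r = [pop_x[k] for k in resp]
y_r = [pop_y[k] for k in resp]

w = calibration_weights(d, x_r, X_totals)
Y_cal = sum(wk * yk for wk, yk in zip(w, y_r))       # calibration estimator (4)
Y_design = sum(dk * yk for dk, yk in zip(d, y_r))    # design weights, respondents only
print(f"Y = {Y:.0f}, calibrated = {Y_cal:.0f}, design-weighted = {Y_design:.0f}")
```

Because the relation between $y$ and $x$ is strong, the calibrated estimate lands close to $Y$. The design-weighted sum over respondents stays well below it.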

2.2 Sampling Design

A sampling design is a scheme in which each unit of the population has a known probability of selection, and a random mechanism decides which units become part of the sample. Let $s_1, s_2, \dots$ be the possible samples, each with a known probability of selection $p(s)$, under the condition $\sum_s p(s) = 1$ (Särndal, Swensson and Wretman, 1992). Two sampling designs, SI and Pareto, are used to compute the estimators in three situations: full response, nonresponse, and nonresponse with individual response probabilities.

First, the target population parameter and its variance are computed. Then the estimators $\hat Y_{SI}$ under SI and $\hat Y_{PAR}$ under the Pareto design, with their variances, are calculated in the full response case. In the nonresponse case, three estimators of the population total are compared: $\hat Y_1$, $\hat Y_2$ and $\hat Y_3$ under SI, and $\hat Y_{1P}$, $\hat Y_{2P}$ and $\hat Y_{3P}$ under Pareto. These estimators are defined in the following subsections. Under SI, both the theoretical and the simulated variances of the three estimators are computed; under the Pareto design, only the simulated variances are computed. The simulated variances are then compared. Three data sets are used, with response rates of 50%, 65% and 80%.

2.2.1 Simple random sampling

SI sampling is the most basic form of probability sampling, in which every possible subset of $n$ units has the same probability of being selected as the sample. There are $\binom{N}{n}$ equally likely possible samples, and any individual sample of $n$ units has probability (Lohr, 2010)

$p(s) = 1\Big/\binom{N}{n}$

The inclusion probability under SI is given below, where $n$ and $N$ denote the sample and population sizes, respectively.

$\pi_k = n/N \quad (8)$

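Consider first the estimators in the full response case. $\hat Y_{SI}$ is the Horvitz-Thompson estimator of the population total under the SI design:

$\hat Y_{SI} = \sum_s \left(\frac{y_k}{\pi_k}\right) = \frac{N}{n}\sum_s y_k = N\bar y_s \quad (9)$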

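The variance of the estimated total, and its estimator, are:

$V(\hat Y_{SI}) = N^2\left(\frac{1}{n} - \frac{1}{N}\right)S_{yU}^2 \quad (10)$

$\hat V(\hat Y_{SI}) = N^2\left(\frac{1}{n} - \frac{1}{N}\right)S_{ys}^2 \quad (11)$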

where,

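$S_{yU}^2 = \left(\sum_U y_k^2 - \left(\sum_U y_k\right)^2\Big/N\right)\Big/(N-1) \quad (12)$

$S_{ys}^2 = \left(\sum_s y_k^2 - \left(\sum_s y_k\right)^2\Big/n\right)\Big/(n-1) \quad (13)$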
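Equations (9)-(13) can be checked with a small Python sketch that assumes a hypothetical population of six values. Every SI sample of size $n$ is equally likely, so averaging over all $\binom{6}{3}$ samples gives exact expectations. Under these assumptions, the average of (9) equals $Y$, its spread around $Y$ equals the variance (10), and the average of the variance estimator (11) equals the same value. This illustrates that (9) and (11) are unbiased under SI.

```python
from itertools import combinations

def si_total(sample_y, N):
    """Horvitz-Thompson estimator (9) under SI: (N/n) * sum of sampled y."""
    return N / len(sample_y) * sum(sample_y)

def si_var_hat(sample_y, N):
    """Variance estimator (11), using the sample variance S_ys^2 of (13)."""
    n = len(sample_y)
    s2 = (sum(y * y for y in sample_y) - sum(sample_y) ** 2 / n) / (n - 1)
    return N * N * (1 / n - 1 / N) * s2

# Hypothetical population of N = 6 values, SI samples of size n = 3
pop = [2.0, 5.0, 1.0, 7.0, 3.0, 6.0]
N, n = len(pop), 3
Y = sum(pop)
S2_U = (sum(y * y for y in pop) - Y ** 2 / N) / (N - 1)  # equation (12)
V_true = N * N * (1 / n - 1 / N) * S2_U                  # equation (10)

# All samples are equally likely, so plain averages over them are expectations
samples = list(combinations(pop, n))
mean_total = sum(si_total(s, N) for s in samples) / len(samples)
var_total = sum((si_total(s, N) - Y) ** 2 for s in samples) / len(samples)
mean_var_hat = sum(si_var_hat(s, N) for s in samples) / len(samples)
print(Y, mean_total, V_true, var_total, mean_var_hat)
```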

Now consider the nonresponse case, for which three estimators of the population total are introduced. The first, $\hat Y_1$, extends the Horvitz-Thompson estimator to two-phase sampling. $\hat Y_2$ and $\hat Y_3$ are calibration estimators, the latter in two-phase sampling. These estimators and their variance estimators are defined below, where $\theta_k$ denotes the individual response probability and $r$ the response set.

$\hat Y_1 = \sum_r \left(\frac{y_k}{\pi_k\theta_k}\right) \quad (14)$

$\hat V(\hat Y_1) = N^2\left(\frac{1}{n} - \frac{1}{N}\right)S_{\theta}^2 \quad (15)$


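where

$S_{\theta}^2 = \left(\sum_r \left(\frac{y_k}{\theta_k}\right)^2 - \left(\sum_r \frac{y_k}{\theta_k}\right)^2\Big/n\right)\Big/(n-1) \quad (16)$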
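The following Python sketch uses Monte Carlo simulation to check that estimator (14) is approximately unbiased when the response probabilities are known. The population, the values of $\theta_k$ (drawn in (0.5, 1]) and the number of replicates are hypothetical assumptions. The variance formulas (15)-(16) are not implemented here.

```python
import random

def two_phase_ht(resp_y, resp_theta, N, n):
    """Equation (14) under SI: sum over respondents of y_k / (pi_k * theta_k), pi_k = n/N."""
    pi = n / N
    return sum(y / (pi * th) for y, th in zip(resp_y, resp_theta))

rng = random.Random(3)
N, n, R = 500, 50, 4000
y = [rng.expovariate(0.5) for _ in range(N)]        # hypothetical study variable
theta = [1 - 0.5 * rng.random() for _ in range(N)]  # hypothetical known theta_k in (0.5, 1]
Y = sum(y)

estimates = []
for _ in range(R):
    s = rng.sample(range(N), n)                     # phase 1: SI sample
    r = [k for k in s if rng.random() <= theta[k]]  # phase 2: Bernoulli response
    estimates.append(two_phase_ht([y[k] for k in r], [theta[k] for k in r], N, n))

mc_mean = sum(estimates) / R
print(f"Y = {Y:.1f}, Monte Carlo mean of Y_1 = {mc_mean:.1f}")
```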

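The single-phase calibration estimator uses the calibrated weights $w_k$ of (5):

$\hat Y_2 = \sum_r w_k y_k \quad (17)$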

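The variance estimator of $\hat Y_2$ is divided into two components: the estimated sampling variance component $\hat V_{SAM}$ and the estimated nonresponse variance component $\hat V_{NR}$ (Särndal and Lundström, 2005).

$\hat V(\hat Y_2) = \hat V_{SAM} + \hat V_{NR} \quad (18)$

where $\hat V_{SAM}$ is defined as

$\hat V_{SAM} = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\frac{1}{n-1}\left[\sum_r v_k\hat y_k^2 - \frac{1}{n}\left(\sum_r v_k\hat y_k\right)^2\right] - \frac{N}{n}\left(\frac{N}{n} - 1\right)\sum_r v_k(v_k - 1)\hat e_k^2 \quad (19)$

with $v_k = 1 + \left(\sum_U \mathbf{x}_k - \sum_r d_k\mathbf{x}_k\right)'\left(\sum_r d_k\mathbf{x}_k\mathbf{x}_k'\right)^{-1}\mathbf{x}_k$, so that $w_k = d_k v_k$, and $\hat y_k = \mathbf{x}_k'\hat{\mathbf{B}}$. The component $\hat V_{NR}$ is defined as

$\hat V_{NR} = \left(\frac{N}{n}\right)^2\sum_r v_k(v_k - 1)\hat e_k^2 \quad (20)$

where $\hat e_k$ is the residual of element $k$:

$\hat e_k = y_k - \mathbf{x}_k'\hat{\mathbf{B}}, \qquad \hat{\mathbf{B}} = \left(\sum_r d_k v_k\mathbf{x}_k\mathbf{x}_k'\right)^{-1}\sum_r d_k v_k\mathbf{x}_k y_k \quad (21)$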

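$\hat Y_3$ is the same as $\hat Y_2$ except that its weights $w_{\theta k}$, given by (6), include the response probability of each element:

$\hat Y_3 = \sum_r w_{\theta k} y_k \quad (22)$

Its variance estimator is again the sum of two components in two-phase sampling: a sampling variance component $\hat V_{SAM\theta}$ and a nonresponse variance component $\hat V_{NR\theta}$.

$\hat V(\hat Y_3) = \hat V_{SAM\theta} + \hat V_{NR\theta} \quad (23)$

$\hat V_{SAM\theta} = N^2\left(\frac{1}{n} - \frac{1}{N}\right)\frac{1}{n-1}\left[\sum_r v_{\theta k}\hat y_k^2 - \frac{1}{n}\left(\sum_r v_{\theta k}\hat y_k\right)^2\right] - \frac{N}{n}\left(\frac{N}{n} - 1\right)\sum_r v_{\theta k}(v_{\theta k} - 1)\hat e_k^2 \quad (24)$

where $v_{\theta k} = w_{\theta k}/d_k$.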

2.2.2 Pareto sampling

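Consider a πps sampling design with target inclusion probabilities $\lambda_k$, where $0 < \lambda_k < 1$. The target inclusion probability of element $k$ is

$\lambda_k = n\,p_k\Big/\sum_U p_k \quad (26)$

where $n$ is the sample size and $p_k > 0$ is a size measure of element $k$. The condition $\lambda_k < 1$ is assumed in advance; if it is not satisfied, the size measure must be modified. Let $u_1, \dots, u_N$ be independent standard uniform random numbers. These generate the ranking variables

$Q_k = \frac{u_k/(1 - u_k)}{\lambda_k/(1 - \lambda_k)}, \quad k = 1, \dots, N \quad (27)$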

To select a sample, the $n$ elements with the smallest values of $Q_k$ are chosen, and the corresponding values $y_k$ and $\mathbf{x}_k$ form the sample data (Rosén, 1997). Consider first the estimator in the full response case. $\hat Y_{PAR}$ is the estimator of the population total under the Pareto design:

$\hat Y_{PAR} = \sum_s \left(\frac{y_k}{\lambda_k}\right)$

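The variance of the estimated total, and its estimator, are:

$V(\hat Y_{PAR}) = \frac{N}{N-1}\left\{\sum_U \lambda_k(1-\lambda_k)\left[\frac{y_k}{\lambda_k} - \frac{\sum_U y_l(1-\lambda_l)}{\sum_U \lambda_l(1-\lambda_l)}\right]^2\right\} \quad (28)$

$\hat V(\hat Y_{PAR}) = \frac{n}{n-1}\sum_s (1-\lambda_k)\left\{\frac{y_k}{\lambda_k} - \frac{\sum_s y_l(1-\lambda_l)/\lambda_l}{\sum_s (1-\lambda_l)}\right\}^2 \quad (29)$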

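For computational convenience, the variance estimator can be rewritten as:

$\hat V(\hat Y_{PAR}) = \frac{n}{n-1}\left(C - B^2/A\right) \quad (30)$

where $A = \sum_s (1-\lambda_k)$, $B = \sum_s (1-\lambda_k)\,y_k/\lambda_k$ and $C = \sum_s (1-\lambda_k)\,(y_k/\lambda_k)^2$.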
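The Python sketch below illustrates that (30) is an algebraic rearrangement of (29). It computes both forms with $A = \sum_s(1-\lambda_k)$, $B = \sum_s (1-\lambda_k)y_k/\lambda_k$ and $C = \sum_s (1-\lambda_k)(y_k/\lambda_k)^2$. The sample values and the $\lambda_k$ are hypothetical. The two results should agree up to floating-point rounding.

```python
def pareto_var_direct(y_s, lam_s):
    """Variance estimator (29): n/(n-1) * sum_s (1 - lam_k) * (y_k/lam_k - ratio)^2."""
    n = len(y_s)
    z = [y / l for y, l in zip(y_s, lam_s)]  # y_k / lambda_k
    ratio = sum((1 - l) * zk for l, zk in zip(lam_s, z)) / sum(1 - l for l in lam_s)
    return n / (n - 1) * sum((1 - l) * (zk - ratio) ** 2 for l, zk in zip(lam_s, z))

def pareto_var_computational(y_s, lam_s):
    """Computational form (30): n/(n-1) * (C - B^2 / A)."""
    n = len(y_s)
    z = [y / l for y, l in zip(y_s, lam_s)]
    A = sum(1 - l for l in lam_s)
    B = sum((1 - l) * zk for l, zk in zip(lam_s, z))
    C = sum((1 - l) * zk * zk for l, zk in zip(lam_s, z))
    return n / (n - 1) * (C - B * B / A)

# Hypothetical sample values and their lambda_k
y_s = [4.0, 7.5, 2.0, 9.0, 5.5]
lam_s = [0.2, 0.35, 0.1, 0.4, 0.25]
print(pareto_var_direct(y_s, lam_s), pareto_var_computational(y_s, lam_s))
```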

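Now consider the nonresponse case, with three estimators of the population total. The first, $\hat Y_{1P}$, is the Horvitz-Thompson estimator of the population total under the Pareto design in two-phase sampling. $\hat Y_{2P}$ and $\hat Y_{3P}$ are calibration estimators, the latter in two-phase sampling. All three estimators use the Pareto design weights $d_k = 1/\lambda_k$, and $\theta_k$ denotes the individual response probability.

$\hat Y_{1P} = \sum_r \frac{y_k}{\lambda_k\theta_k} \quad (31)$

$\hat Y_{2P} = \sum_r w_k y_k \quad (32)$

$\hat Y_{3P} = \sum_r w_{\theta k} y_k \quad (33)$

Here $w_k$ and $w_{\theta k}$ are the weights (5) and (6) computed with $d_k = 1/\lambda_k$.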

No theoretical variance expression is available for these three estimators under the Pareto design. Their simulated variances are therefore computed instead, as described in the simulation section.

2.3 Flow chart

The following flow chart summarizes the simulated estimators and their variances under both designs.

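Flow chart: Population (total $Y$ and variance $V(\hat Y)$), branching into the full sample case, with the SI design ($\hat Y_{SI}$, $\hat V(\hat Y_{SI})$) and the Pareto design ($\hat Y_{PAR}$, $\hat V(\hat Y_{PAR})$), and the nonresponse case, with the SI design ($\hat Y_1$, $\hat Y_2$, $\hat Y_3$ and their variances) and the Pareto design ($\hat Y_{1P}$, $\hat Y_{2P}$, $\hat Y_{3P}$ and their variances).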

3. Data description

A synthetic data set is generated from a chi-square distribution with two degrees of freedom. This data set is treated as the population and is denoted by $U$. The population contains four variables: one study variable $y$ and three auxiliary variables $x_1$, $x_2$ and $x_3$. The subscript $k$ denotes the element, where $k = 1, \dots, N$ and $N = 3000$. A Cholesky decomposition is used to create relationships between the study variable and the auxiliary variables. All the auxiliary variables and the study variable are continuous.
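The thesis does not spell out how the Cholesky decomposition is applied. The Python sketch below therefore shows one possible construction, which is an assumption. Independent chi-square(2) columns are standardized, mixed with the Cholesky factor of a hypothetical target correlation matrix, and rescaled to mean 2 and standard deviation 2. Only the $y$ column remains exactly chi-square(2). The auxiliary columns take on the target correlations but lose their chi-square marginals.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite matrix a."""
    m = len(a)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def make_population(N, corr, rng):
    """Hypothetical generator: standardized chi-square(2) columns are mixed with the
    Cholesky factor of corr, then rescaled to mean 2 and standard deviation 2.
    Column 0 (y) stays exactly chi-square(2); the x columns are mixtures."""
    L = cholesky(corr)
    m = len(corr)
    rows = []
    for _ in range(N):
        # chi-square(2) = exponential with mean 2 and standard deviation 2
        z = [(rng.expovariate(0.5) - 2.0) / 2.0 for _ in range(m)]
        mixed = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        rows.append([2.0 + 2.0 * v for v in mixed])
    return rows  # each row: y, x1, x2, x3

def pearson(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

# Assumed target correlations among (y, x1, x2, x3); not taken from the thesis
target = [[1.0, 0.8, 0.6, 0.5],
          [0.8, 1.0, 0.5, 0.4],
          [0.6, 0.5, 1.0, 0.3],
          [0.5, 0.4, 0.3, 1.0]]
population = make_population(3000, target, random.Random(5))
y = [row[0] for row in population]
x1 = [row[1] for row in population]
print(round(pearson(y, x1), 3))
```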

There are several ways to generate response probabilities. One example is the increasing exponential response distribution described in Särndal and Lundström (2005). Its parameter is set to 0.69031, 1.27891 and 2.7088 to obtain the desired average response probabilities of 50%, 65% and 80%, respectively.

Let $u_1, \dots, u_{3000}$ be independent realizations of a Uniform(0,1) random variable, and let $\theta_k$ be the response probability of element $k$. A Bernoulli experiment determines whether element $k$ responds, using the following rule: if $u_k \le \theta_k$, element $k$ responds; otherwise it does not. The Bernoulli response indicator is defined as

$R_k = \begin{cases} 1 & \text{if } u_k \le \theta_k \\ 0 & \text{otherwise} \end{cases}$
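The Bernoulli response rule can be sketched in Python as follows. The response probabilities here are hypothetical values between 0.6 and 1.0; the sketch does not reproduce the increasing exponential response distribution used in the thesis. The realized response rate should be close to the average of $\theta_k$.

```python
import random

def response_indicators(theta, rng):
    """Bernoulli response rule: R_k = 1 if u_k <= theta_k (element responds), else 0."""
    return [1 if rng.random() <= th else 0 for th in theta]

rng = random.Random(7)
N = 3000
# Hypothetical response probabilities with mean about 0.8 (not the thesis's
# increasing exponential response distribution)
theta = [0.6 + 0.4 * rng.random() for _ in range(N)]
R_ind = response_indicators(theta, rng)
print(f"mean theta = {sum(theta) / N:.3f}, realized response rate = {sum(R_ind) / N:.3f}")
```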

3.1 Simulation

In this section the calibration method is applied to the synthetic data set using the SI and Pareto designs. The three estimators are compared under both designs in different settings. The sample size $n$ takes the values 50, 100, 500, 800, 1000, 1300 and 1500. The nonresponse rate takes the values 50%, 35% and 20%, that is, response rates of 50%, 65% and 80%. All estimators are computed over $R = 10000$ replicates, where $R$ is the number of replicates. The target population total is $Y = 3113.6$. $\hat Y_{SI}$ is the estimator of the total under simple random sampling, and $\hat Y_{PAR}$ is the estimator of the total under the Pareto design.

The simulation is first run in the full response case and then in the nonresponse case, for each design. The simulated mean and the simulated variance of the estimated totals are defined as:

$\bar{\hat Y} = \frac{1}{R}\sum_{i=1}^{R}\hat Y^{(i)} \quad (34)$
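The Python sketch below computes, for an estimator with R replicates, the simulated mean (34), the bias and the simulated variance. Using $R - 1$ as the variance divisor is an assumption. The demonstration applies the sketch to the SI Horvitz-Thompson estimator on a hypothetical chi-square(2) population rather than the thesis data. This lets the simulated variance be compared with the theoretical variance (10).

```python
import random

def simulation_summary(estimates, true_total):
    """Simulated mean (34), bias and variance of an estimator over R replicates.
    Using R - 1 as the variance divisor is an assumption."""
    R = len(estimates)
    mean = sum(estimates) / R
    bias = mean - true_total
    var = sum((e - mean) ** 2 for e in estimates) / (R - 1)
    return mean, bias, var

# Hypothetical check with the SI Horvitz-Thompson estimator on a chi-square(2) population
rng = random.Random(6)
N, n, R = 3000, 100, 2000
pop = [rng.expovariate(0.5) for _ in range(N)]
Y = sum(pop)
replicates = [N / n * sum(rng.sample(pop, n)) for _ in range(R)]
mean, bias, var = simulation_summary(replicates, Y)
print(f"bias = {bias:.2f}, simulated variance = {var:.0f}")
```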


3.2 Results

The simulation is first carried out in the full response case (Table 1) and then in the nonresponse case (Tables 2, 3 and 4). In the nonresponse case, the three estimators are compared with respect to bias and variance for the data sets with 50%, 65% and 80% response rates. The figures and tables for the 50% response rate are presented in this section, while the figures for the 65% and 80% response rates are given in the appendix.

Table 1: Bias of the estimated total and bias of the variance estimator v(·), SI vs Pareto, full response, R = 10000.

Sample size n         50          100         500         800         1000        1300        1500
SI      B(Ŷ_SI)       5.054827    -6.54755    -0.46502    -0.10378    0.377239    -0.10068    -0.24472
SI      B(v(Ŷ_SI))    2139.319    -933.63     -57.2451    -11.2194    0.038023    4.67908     -0.33094
Pareto  B(Ŷ_PAR)      -1.46935    5.787729    5.266838    32.73403    57.48357    99.07335    122.8409
Pareto  B(v(Ŷ_PAR))   -1.29581    94.97147    50.03936    301.3107    544.4961    1174.234    -605.771


Table 2: Comparison of the bias and variance of the estimated totals, SI vs Pareto, R = 10000, 50% response rate. B(·): bias of the estimated total; V(·): simulated variance; B(v(·)): bias of the variance estimator (SI only).

Sample size n         50          100         500         800         1000        1300        1500
SI      B(Ŷ1)         -17.7149    -17.4333    -17.9097    -17.0038    -19.0149    -20.7243    -20.7637
SI      B(Ŷ2)         804.5798    674.0316    580.7945    569.4414    565.044     560.336     562.7463
SI      B(Ŷ3)         304.5425    147.169     26.73449    14.0278     9.955311    4.369207    5.212072
SI      V(Ŷ1)         525207      260891.6    44509.57    24042.05    17255.61    11204.03    8797.017
SI      V(Ŷ2)         856047.2    333878.4    51015.75    28435.03    20095.32    12572.52    9918.749
SI      V(Ŷ3)         587174.9    218718.8    31060.89    17058.67    12282.45    7590.736    5930.286
SI      B(v(Ŷ1))      9940968     9518684     8023003     7049349     6395674     5426901     4787117
SI      B(v(Ŷ2))      1.07E+09    39230104    37112.7     20868.11    15637.87    10918.57    8895.056
SI      B(v(Ŷ3))      271883.8    150366.8    28714.76    16288.65    12165.78    8423.168    6806.107
Pareto  B(Ŷ1P)        -31.502     -18.4489    -15.6337    12.40954    37.12634    79.81873    103.9676
Pareto  B(Ŷ2P)        983.9513    782.1723    596.6507    528.8093    463.2604    364.1458    296.8474
Pareto  B(Ŷ3P)        634.5989    383.3958    119.1623    68.5673     29.52698    -17.7937    -53.4684
Pareto  V(Ŷ1P)        374788.8    189567.4    32226.58    18602.48    14735.98    11368.3     9974.979
Pareto  V(Ŷ2P)        672437.8    279573.1    40553.41    19998.99    12366.77    6121.884    3930.042
Pareto  V(Ŷ3P)        688687.3    298160.4    68925.53    46350.82    34694.67    22626.35    19568.57


Table 3: Comparison of the bias and variance of the estimated totals, SI vs Pareto, R = 10000, 65% response rate. B(·): bias of the estimated total; V(·): simulated variance; B(v(·)): bias of the variance estimator (SI only).

Sample size n         50          100         500         800         1000        1300        1500
SI      B(Ŷ1)         18.79918    22.87238    16.7897     17.91968    16.79133    18.04105    18.65632
SI      B(Ŷ2)         481.6279    399.2841    318.7976    316.0726    311.9406    309.7897    309.9312
SI      B(Ŷ3)         178.4901    77.76691    -13.8729    -17.4658    -22.3474    -24.6086    -24.8822
SI      V(Ŷ1)         438952.1    206126      35848.48    20369.11    14780.93    9686.684    7358.975
SI      V(Ŷ2)         474057.8    202978      30047.66    16612.67    12298.92    7782.792    5906.052
SI      V(Ŷ3)         365679.3    153114.4    22423.4     12523.62    9192.921    5905.542    4436.351
SI      B(v(Ŷ1))      10079787    9708583     8195819     7206159     6542618     5562774     4909093
SI      B(v(Ŷ2))      33555554    7003100     25896.37    14486.82    10697.02    7243.255    5733.118
SI      B(v(Ŷ3))      223902.3    120918.7    22743.55    12747.28    9414.466    6356.841    5009.798
Pareto  B(Ŷ1P)        18.47349    19.90629    23.04249    50.29884    79.31547    119.6486    143.5468
Pareto  B(Ŷ2P)        659.4502    490.526     342.0357    296.6672    253.1883    183.4417    136.2551
Pareto  B(Ŷ3P)        438.2472    244.0523    53.60404    18.3946     -7.17676    -43.7894    -69.3146
Pareto  V(Ŷ1P)        222293.4    113091.9    18694.25    10429.37    7911.876    5972.852    5113.559
Pareto  V(Ŷ2P)        432485      177177.5    26490.52    12607.53    7831.648    3913.806    2389.12
Pareto  V(Ŷ3P)        455205.1    192809.9    35795.6     19703.19    13151.93    7561.563    5188.067


Table 4: Comparison of the bias and variance of the estimated totals, SI vs Pareto, R = 10000, 80% response rate. B(·): bias of the estimated total; V(·): simulated variance; B(v(·)): bias of the variance estimator (SI only).

Sample size n         50          100         500         800         1000        1300        1500
SI      B(Ŷ1)         -8.15556    1.425367    5.376795    4.666161    4.332517    2.923186    2.149807
SI      B(Ŷ2)         294.9884    229.3116    186.6317    180.3983    177.1338    175.661     173.4633
SI      B(Ŷ3)         124.5371    45.06588    -16.9066    -25.7534    -30.1862    -33.5279    -35.8392
SI      V(Ŷ1)         384573.1    194690.3    33856.43    18468.36    13436.62    8680.319    6791.317
SI      V(Ŷ2)         326682.6    147124.3    23004.22    12240.58    9160.947    5777.086    4514.208
SI      V(Ŷ3)         298087.2    135909.5    22384.08    11841.78    8910.836    5636.941    4439.124
SI      B(v(Ŷ1))      9860984     9567925     8134723     7144023     6489825     5508677     4857243
SI      B(v(Ŷ2))      4304019     136866.4    21098.59    11684.79    8554.612    5682.251    4405.148
SI      B(v(Ŷ3))      204351      109991.5    21244.88    11972.62    8851.54     5966.772    4676.827
Pareto  B(Ŷ1P)        3.880608    2.55223     9.598401    35.95397    61.33523    102.6815    129.3095
Pareto  B(Ŷ2P)        473.976     331.586     209.0309    174.2853    145.1396    95.91313    64.357
Pareto  B(Ŷ3P)        377.8666    216.2584    62.00354    26.3426     6.876599    -28.8369    -52.2721
Pareto  V(Ŷ1P)        154109.7    74839.25    11700.86    6133.215    4458.795    3105.708    2599.741
Pareto  V(Ŷ2P)        324293      139119.5    20587.31    9706.973    6108.762    3115.849    2053.588
Pareto  V(Ŷ3P)        348008.6    161738.1    31702.99    17524.47    11928.28    6901.003    5005.939


Figures 1 and 2 compare SI and Pareto in the full response case for different sample sizes. Figure 1 shows the bias of the estimated population total, and Figure 2 shows the bias of its variance estimator.

Figure 3 compares the bias of $\hat Y_1$ under SI with the bias of $\hat Y_{1P}$ under the Pareto design. These are the Horvitz-Thompson estimators of the population total in two-phase sampling.

Figure 4 compares the bias of $\hat Y_2$ under SI with the bias of $\hat Y_{2P}$ under the Pareto design. These are the single-phase calibration estimators proposed by Särndal and Lundström (2005).

Figure 5 compares the bias of $\hat Y_3$ under SI with the bias of $\hat Y_{3P}$ under the Pareto design. These are the two-phase calibration estimators proposed by Särndal and Lundström (2005).

Figures 6, 7 and 8 compare the variances of the three estimators mentioned above.

Figure 1: Bias of the estimated total, SI vs Pareto, full response case (x-axis: sample size; series: SI, Pareto).

Figure 2: Bias of the variance estimator of the total, SI vs Pareto, full response case (x-axis: sample size; series: SI, Pareto).

Figure 3: $B(\hat Y_1)$ under SI vs $B(\hat Y_{1P})$ under Pareto at 50% response rate.

Figure 4: $B(\hat Y_2)$ under SI vs $B(\hat Y_{2P})$ under Pareto at 50% response rate.

Figure 5: $B(\hat Y_3)$ under SI vs $B(\hat Y_{3P})$ under Pareto at 50% response rate.

(Figures 3-5: x-axis, sample size; series, SI and Pareto.)

Figure 6: $\hat V(\hat Y_1)$ under SI vs $\hat V(\hat Y_{1P})$ under Pareto at 50% response rate.

Figure 7: $\hat V(\hat Y_2)$ under SI vs $\hat V(\hat Y_{2P})$ under Pareto at 50% response rate.

Figure 8: $\hat V(\hat Y_3)$ under SI vs $\hat V(\hat Y_{3P})$ under Pareto at 50% response rate.

(Figures 6-8: x-axis, sample size; series, SI and Pareto.)


4. Conclusion and Discussion

This paper compares the SI design with the Pareto design in the full response case and in the nonresponse case, using both single-phase and two-phase approaches. Calibration estimation under nonresponse with the Pareto design has recently gained popularity among researchers. Calibration under the Pareto design is more sensitive to the sample size than calibration under SI. Three estimators were computed:

- $\hat Y_1$, the extension of the Horvitz-Thompson estimator to two-phase sampling;
- $\hat Y_2$, a calibration estimator;
- $\hat Y_3$, a calibration estimator in two-phase sampling.

In the full response case, for $n < 500$, the Pareto design gives a smaller bias than the SI design, both for the estimated total and for its variance estimator. In the two-phase approach, the Pareto design failed to produce smaller bias and variance than SI. Hence, the Pareto design is not a good choice of sampling design for two-phase calibration estimation.

In single-phase calibration estimation, the Pareto design yielded smaller bias and variance than SI for $n > 500$. As the sample size increased, the bias of the Pareto estimator decreased towards zero faster than the bias of the SI estimator.

The results also show that a data set with a higher response rate gives more protection against bias and variance. The two-phase estimators performed better than the single-phase estimators. However, bias and variance are high for the Pareto design in the two-phase approach, so further research on reducing them is recommended.

In this paper, only continuous auxiliary and study variables are used. Further examination is required for categorical variables under the Pareto design. There are some difficulties in using the Pareto design with categorical data. One is how to deal with multicollinearity among categorical variables; another is how to choose the important auxiliary variables when several are available in the data set.

In this study, the calibration estimation of Särndal and Lundström is used. The method depends on known totals of the auxiliary variables, so it cannot be used if these totals are unavailable. This is one limitation of the method: it mainly benefits countries with high-quality register systems, where such totals are available.


References

Braun, W.J. and Murdoch, D.J. (2007). A First Course in Statistical Programming with R. New York: Cambridge University Press.

Chang, T. and Kott, P.S. (2008). Using calibration weighting to adjust for nonresponse under a plausible model. Biometrika, Vol. 95, No. 3, pp. 555-571.

Deville, J.C. and Särndal, C.E. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, Vol. 87, No. 418, pp. 376-382.

Kott, P.S. (1994). A note on handling nonresponse in sample surveys. Journal of the American Statistical Association, Vol. 89, No. 426, pp. 693-696.

Lohr, S.L. (2010). Sampling: Design and Analysis. Boston: Brooks/Cole, Cengage Learning.

Montanari, G.E. and Ranalli, M.G. (2005). Nonparametric model calibration estimation in survey sampling. Journal of the American Statistical Association, Vol. 100, No. 472, pp. 1429-1442.

Rizzo, M.L. (2008). Statistical Computing with R. Florida: Chapman & Hall.

Rosén, B. (1997). On sampling with probability proportional to size. Journal of Statistical Planning and Inference, Vol. 62, pp. 159-191.

Särndal, C.E. and Lundström, S. (2005). Estimation in Surveys with Nonresponse. West Sussex, England: John Wiley & Sons Ltd.

Särndal, C.E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

Särndal, C.E. (2007). The calibration approach in survey theory and practice. Survey Methodology (Statistics Canada), Vol. 33, No. 2, pp. 99-119.

Yan, T. and Curtin, R. (2010). The relation between unit nonresponse and item nonresponse: A response continuum perspective. International Journal of Public Opinion Research (Oxford University Press), Vol. 22, No. 4, pp. 535-551.


Appendix

Figures

Figure 9: $B(\hat Y_1)$ under SI vs $B(\hat Y_{1P})$ under Pareto at 65% response rate.

Figure 10: $B(\hat Y_2)$ under SI vs $B(\hat Y_{2P})$ under Pareto at 65% response rate.

Figure 11: $B(\hat Y_3)$ under SI vs $B(\hat Y_{3P})$ under Pareto at 65% response rate.

(Figures 9-11: x-axis, sample size; series, SI and Pareto.)

Figure 12: $\hat V(\hat Y_1)$ under SI vs $\hat V(\hat Y_{1P})$ under Pareto at 65% response rate.

Figure 13: $\hat V(\hat Y_2)$ under SI vs $\hat V(\hat Y_{2P})$ under Pareto at 65% response rate.

Figure 14: $\hat V(\hat Y_3)$ under SI vs $\hat V(\hat Y_{3P})$ under Pareto at 65% response rate.

(Figures 12-14: x-axis, sample size; series, SI and Pareto.)

Figure 15: $B(\hat Y_1)$ under SI vs $B(\hat Y_{1P})$ under Pareto at 80% response rate.

Figure 16: $B(\hat Y_2)$ under SI vs $B(\hat Y_{2P})$ under Pareto at 80% response rate.

Figure 17: $B(\hat Y_3)$ under SI vs $B(\hat Y_{3P})$ under Pareto at 80% response rate.

(Figures 15-17: x-axis, sample size; series, SI and Pareto.)

Figure 18: $\hat V(\hat Y_1)$ under SI vs $\hat V(\hat Y_{1P})$ under Pareto at 80% response rate.

Figure 19: $\hat V(\hat Y_2)$ under SI vs $\hat V(\hat Y_{2P})$ under Pareto at 80% response rate.

Figure 20: $\hat V(\hat Y_3)$ under SI vs $\hat V(\hat Y_{3P})$ under Pareto at 80% response rate.

(Figures 18-20: x-axis, sample size; series, SI and Pareto.)


SI sampling: The following figures show how each estimator is affected by nonresponse. Each figure compares one estimator across the data sets with different response rates.

Figure 21: $B(\hat Y_1)$ at 50%, 65% and 80% response rates.

Figure 22: $B(\hat Y_2)$ at 50%, 65% and 80% response rates.

Figure 23: $B(\hat Y_3)$ at 50%, 65% and 80% response rates.

(Figures 21-23: x-axis, sample size; series, 50%, 65% and 80% response rate.)

Figure 24: ̂( ̂ ) at 50%, 65% and 80% response rate.

Figure 25: ̂ ( ̂ ) at 50%, 65% and 80% response rate.

Figure 26: ̂( ̂ ) at 50%, 65% and 80% response rate.

(Plots not recoverable from the source. Each panel shows the 50%, 65% and 80% response rate series over sample sizes 50, 100, 500, 800, 1000, 1300 and 1500.)


Pareto sampling: The following figures show how each estimator is affected by nonresponse. Each estimator is compared across the data sets with 50%, 65% and 80% response rates to show the effect of nonresponse.

Figure 27: ( ̂ ) at 50%, 65% and 80% response rate.

Figure 28: ( ̂ ) at 50%, 65% and 80% response rate.

Figure 29: ( ̂ ) at 50%, 65% and 80% response rate.

(Plots not recoverable from the source. Each panel shows the 50%, 65% and 80% response rate series over sample sizes 50, 100, 500, 800, 1000, 1300 and 1500.)

Figure 30: ̂( ̂ ) at 50%, 65% and 80% response rate.

Figure 31: ̂ ( ̂ ) at 50%, 65% and 80% response rate.

Figure 32: ̂ ( ̂ ) at 50%, 65% and 80% response rate.

(Plots not recoverable from the source. Each panel shows the 50%, 65% and 80% response rate series over sample sizes 50, 100, 500, 800, 1000, 1300 and 1500.)


Programming code

Programming code: 1

# This program estimates the bias of the HT total, B(t), and of its variance estimator, B(v(t)), in the full response case.
# t=(N/n)*sum(yk) and v(t)=(N*N*((1/n)-(1/N)))*s^2
f<-function(n,r){
  U<-0
  U<-read.table("\\\\edunet\\dfs\\home05\\ht10\\muhqah101\\Desktop\\thesis data\\data50.R") # data set with a 50% response rate (file name data50); columns: y,x1,x2,x3,yr,teta
  U<-U[,-1] # remove the extra column created by read.table
  N<-3000
  T<-sum(U[,1]) # population total of y
  t<-matrix(0,r,1) # HT totals, one per replicate
  vts<-matrix(0,r,1) # estimated variances of the HT totals
  b<-matrix(0,r,n) # row i holds the i-th random sample of size n
  ff<-(N*N*((1/n)-(1/N)))
  vtp<-ff*(sum(U[,1]*U[,1])-((sum(U[,1]))^2)/N)/(N-1) # variance of the HT total for sample size n
  for(i in 1:r){ # repeat the HT total and v(t) r times
    b[i,]<-sample(U[,1],n) # select a simple random sample of size n
    #cat("\n",b[i,]) # uncomment to print each selected sample
    t[i,]<-sum(b[i,])*(N/n) # HT total for replicate i
    vts[i,]<-var(b[i,])*ff # HT variance estimate for replicate i
  } # end loop
  #print(T) # population total
  #print(vtp) # variance of the population total estimator for sample size n
  print(mean(t)-T) # bias of the total estimator
  print(mean(vts)-vtp) # bias of the variance estimator
  #print(mean(vts))
  #print(var(vts))
} # function end
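Programming code 1 can also be written as a small self-contained function. The following is a minimal sketch, not the code used for the thesis results. It assumes a synthetic population vector in place of the data files, and the name `simulate_srs` is hypothetical. It returns the Monte Carlo bias of the HT total and of its variance estimator, and both should be close to zero under full response.

```r
# Hypothetical helper (not from the thesis): Monte Carlo bias of the HT total
# t = (N/n) * sum(y_k) and of v(t) = N^2 (1/n - 1/N) s^2 under SRS without
# replacement, full response.
simulate_srs <- function(y, n, r) {
  N <- length(y)
  total <- sum(y)
  ff <- N^2 * (1 / n - 1 / N)        # same factor as ff in Programming code 1
  v_true <- ff * var(y)              # var() uses the N - 1 divisor, matching vtp
  t_hat <- numeric(r)
  v_hat <- numeric(r)
  for (i in seq_len(r)) {
    s <- y[sample.int(N, n)]         # SRS of size n without replacement
    t_hat[i] <- (N / n) * sum(s)     # HT estimator of the total
    v_hat[i] <- ff * var(s)          # unbiased estimator of V(t_hat)
  }
  c(bias_t = mean(t_hat) - total, bias_v = mean(v_hat) - v_true)
}

# Usage with a hypothetical skewed population of size N = 3000
set.seed(1)
y_pop <- rgamma(3000, shape = 2, scale = 50)
simulate_srs(y_pop, n = 100, r = 1000)
```

Indexing with `sample.int` rather than `sample(y, n)` avoids the special case where `sample` treats a length-one numeric argument as `1:y`.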

Programming code: 2

# This program estimates the bias of the Pareto estimator of the total, t, and of its variance estimator, v(t), in the full response case.
f<-function(n,r){
  U<-0
  N<-3000
  t<-matrix(0,r,1) # Pareto estimates of the population total
  vts<-matrix(0,r,1) # variance estimates of the total
  vtp<-matrix(0,r,1)
  Q<-matrix(0,3000,1)
  A<-matrix(0,r,1) # A, B and C are used in the variance estimation
  B<-matrix(0,r,1)
  C<-matrix(0,r,1)
  U<-read.table("\\\\edunet\\dfs\\home05\\ht10\\muhqah101\\Desktop\\thesis data\\data50.R") # data set with a 50% response rate (file name data50); columns: y,x1,x2,x3,yr,teta
  U<-U[,-1] # remove the extra column created by read.table
  lemda<-(n*U[,2]/sum(U[,2])) # target inclusion probabilities, proportional to x1
  T<-sum(U[,1])
  for(j in 1:r){ # start loop 1
    u<-runif(3000) # generate uniform random numbers
    for(i in 1:3000){ # start loop 2
      Q[i,1]<-((u[i]*(1-lemda[i]))/(lemda[i]*(1-u[i]))) # generate the ranking variable
    } # end loop 2
    U2<-cbind(U,lemda,Q)
    OU<-U2[order(Q),] # sort according to Q
    OUn<-OU[1:n,] # take the first n rows
    #--- estimate the Pareto total ---
    t[j,]<-sum(OUn[,1]/OUn[,7]) # j-th estimated total; see (4.8), p. 173 of Rosén's paper
    #--- Pareto variance of the population total ---
    f1<-(sum(U[,1]*(1-lemda))/sum(lemda*(1-lemda))) # see (4.1), p. 171 of Rosén's paper
    f2<-sum((((U[,1]/lemda)-f1)^2)*lemda*(1-lemda))
    vtp<-(N/(N-1))*f2
    #--- estimate the Pareto variance ---
    A[j,]<-sum(((OUn[,1]/OUn[,7])^2)*(1-OUn[,7])) # see (4.12), p. 173 of Rosén's paper
    B[j,]<-sum((OUn[,1]/OUn[,7])*(1-OUn[,7]))
    C[j,]<-sum(1-OUn[,7])
    vts[j,]<-(n/(n-1))*((A[j,])-(((B[j,])^2)/C[j,])) # variance estimator of the total, built from A, B and C
  } # end loop 1
  print(mean(t)-T) # bias of the total estimator
  print(mean(vts)-vtp) # bias of the variance estimator
  #print(mean(vts))
}
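The Pareto steps in Programming code 2 can be packaged as two small functions: the ranking variable Q with selection of the n smallest values, and the estimator sum(y/lemda) with the variance estimator built from A, B and C. This is a sketch under stated assumptions, not the thesis code. The function names are hypothetical, every target inclusion probability is assumed to lie strictly between 0 and 1, and the uniform numbers can be passed in explicitly so that a draw is reproducible.

```r
# Hypothetical helpers (not from the thesis) mirroring Programming code 2.

# Pareto order sampling: indices of the n units with the smallest ranking
# variable Q_k = u_k (1 - lambda_k) / (lambda_k (1 - u_k)).
# Assumes 0 < lambda_k < 1 for every unit.
pareto_sample <- function(lambda, n, u = runif(length(lambda))) {
  q <- u * (1 - lambda) / (lambda * (1 - u))
  order(q)[seq_len(n)]
}

# Estimated total sum(y_k / lambda_k) and variance estimator
# n/(n-1) * (A - B^2 / C), computed from the sampled units only.
pareto_total <- function(y, lambda) {
  n <- length(y)
  z <- y / lambda
  a <- 1 - lambda
  A <- sum(z^2 * a)
  B <- sum(z * a)
  C <- sum(a)
  c(total = sum(z), var = n / (n - 1) * (A - B^2 / C))
}

# Usage with a hypothetical population where lambda is proportional to x1
set.seed(2)
x1 <- runif(3000, 1, 10)
y <- 5 * x1 + rnorm(3000, sd = 4)
n <- 100
lambda <- n * x1 / sum(x1)
s <- pareto_sample(lambda, n)
pareto_total(y[s], lambda[s])
```

Computing Q for all units at once with vector arithmetic replaces the inner `for(i in 1:3000)` loop and gives the same ranking.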

Programming code: 3

# This program estimates the population total in the nonresponse case.
# t1=sum(dk*yk/teta), t2=sum(wk*yk), t3=sum(wk*yk/teta)
# Variance estimates of the population total: v(t2) in section 4, v(t3) in section 5B and v(t1) in section 6.
# The calibration technique is the one suggested by C.E. Särndal (the green book).
# Three auxiliary variables are used; see page 62 of the green book.
#--- how to load a file ---
# There are three data files: data50, data65 and data80. Click File -> Open script -> thesis data -> choose data50, data65 or data80.
# data50, data65 and data80 have 50%, 65% and 80% response rates, respectively.
# The read.table command adds an unwanted column; use U<-U[,-1] to remove it.
# Do not change the position of any column in the matrix.
#---
f<-function(n,r){
  U<-0
  U<-read.table("\\\\edunet\\dfs\\home05\\ht10\\muhqah101\\Desktop\\thesis data\\data50.R") # data set with a 50% response rate (file name data50); columns: y,x1,x2,x3,yr,teta
  U<-U[,-1] # remove the extra column created by read.table
  T<-sum(U[,1])
  S<-0
  R<-0
  N<-3000
  L<-0
  vk<-0
  X<-matrix(c(sum(U[,2]),sum(U[,3]),sum(U[,4])),,1) # population totals of x1, x2, x3
  y<-matrix(0,r,1)

  yt<-matrix(0,r,1)
  ydt<-matrix(0,r,1)
  VYW<-matrix(0,r,1)
  VSAM<-matrix(0,r,1)
  VNR<-matrix(0,r,1)
  VYWt<-matrix(0,r,1)
  VSAMt<-matrix(0,r,1)
  VNRt<-matrix(0,r,1)
  VYWt1<-matrix(0,r,1)
  VSAMt1<-matrix(0,r,1)
  VNRt1<-matrix(0,r,1)
  Vydt<-matrix(0,r,1)
  for(i in 1:r){ # loop
    S<-U[sample(nrow(U),n),] # select a simple random sample of size n from the population
    R<-S[S[,5]!=0,] # keep only the respondents (nonzero yr in column 5)
    #cat("\n",dim(R),"\n",dim(S),"\n",dim(U),"\n") # print the dimensions of the respondent set, the sample and U
    dk<-length(U[,1])/length(S[,1]) # design weight N/n
    #--- 1: calibration total of yw ---
    Xr<-matrix(c(sum(R[,2]),sum(R[,3]),sum(R[,4])),,1) # respondent totals of x1, x2, x3
    XXr<-matrix(c(sum(R[,2]*R[,2]),sum(R[,3]*R[,2]),sum(R[,4]*R[,2]),
                  sum(R[,2]*R[,3]),sum(R[,3]*R[,3]),sum(R[,4]*R[,3]),
                  sum(R[,2]*R[,4]),sum(R[,3]*R[,4]),sum(R[,4]*R[,4])),,3) # element-wise products arranged as a 3x3 matrix
    mat2<-solve(dk*XXr,tol=1e-30) # inverse of the matrix
    mat1<-(X-dk*Xr)
    L<-(t(mat1)%*%mat2) # lemda (Lagrange multiplier)
    vk<-matrix(c(1+(L[1,1]*R[,2]+L[1,2]*R[,3]+L[1,3]*R[,4])),,) # (rx1)
    wk<-dk*vk # (rx1)
    y[i,1]<-sum(wk*R[,5]) # calibration total of y
    #--- 2: calibration total of yw/teta ---
    Xrt<-matrix(c(sum(R[,2]/R[,6]),sum(R[,3]/R[,6]),sum(R[,4]/R[,6])),,1) # respondent totals of x weighted by 1/teta
    XXrt<-matrix(c(sum(R[,2]*R[,2]/R[,6]),sum(R[,3]*R[,2]/R[,6]),sum(R[,4]*R[,2]/R[,6]),
                   sum(R[,2]*R[,3]/R[,6]),sum(R[,3]*R[,3]/R[,6]),sum(R[,4]*R[,3]/R[,6]),
                   sum(R[,2]*R[,4]/R[,6]),sum(R[,3]*R[,4]/R[,6]),sum(R[,4]*R[,4]/R[,6])),,3)
    mat2t<-solve(dk*XXrt,tol=1e-30) # inverse of the matrix
    mat1t<-(X-dk*Xrt)
    Lt<-(t(mat1t)%*%mat2t) # lemda (Lagrange multiplier)
    vkt<-matrix(c(1+(Lt[1,1]*R[,2]+Lt[1,2]*R[,3]+Lt[1,3]*R[,4])),,) # (rx1)
    wkt<-dk*vkt/R[,6]
    yt[i,1]<-sum(wkt*R[,5]) # calibration total of y including the individual response probabilities
    #--- 3: total = dk*yk/teta ---
    ydt[i,1]<-sum(dk*R[,5]/R[,6]) # R[,5] is yr and R[,6] is teta k

    #--- 4: yw variance estimator in the nonresponse case ---
    VXYr<-matrix(c(sum(vk*R[,2]*R[,1]),sum(vk*R[,3]*R[,1]),sum(vk*R[,4]*R[,1])),,1)
    dVXYr<-dk*VXYr # (3x1), summed over all respondents
    VXXr<-matrix(c(sum(vk*R[,2]*R[,2]),sum(vk*R[,3]*R[,2]),sum(vk*R[,4]*R[,2]),
                   sum(vk*R[,2]*R[,3]),sum(vk*R[,3]*R[,3]),sum(vk*R[,4]*R[,3]),
                   sum(vk*R[,2]*R[,4]),sum(vk*R[,3]*R[,4]),sum(vk*R[,4]*R[,4])),,3)
    matV<-solve(dk*VXXr,tol=1e-30) # (3x3), inverse of the matrix
    M<-(matV%*%dVXYr) # (3x1)
    Yhat<-R[,2]*M[1,1]+R[,3]*M[2,1]+R[,4]*M[3,1] # element-wise multiplication
    ek<-R[,1]-Yhat # residuals
    #ek<-matrix(ek,,1) # uncomment to see the residuals as a column vector
    f1<-0
    f2<-0
    f3<-0
    f1<-((N^2)*(1-(n/N))*(1/n)*(1/(n-1)))
    f2<-((sum((vk*ek)^2))-(1/n)*(sum(vk*ek))^2)
    f3<-(sum(vk*(vk-1)*(ek^2)))
    VSAM[i,1]<-((f1*f2)-(((N/n)*(N/n)-1)*f3)) # sampling variance
    VNR[i,1]<-((N/n)^2)*f3 # nonresponse variance
    VYW[i,1]<-VSAM[i,1]+VNR[i,1] # calibrated variance of the y total

    #--- 5A: ywt variance estimator in the nonresponse case ---
    # Uses (11.3) with vkt and teta. This code does not need to be run because section 5B already covers it.
    #VXYrt<-matrix(c(sum(vkt*R[,2]*R[,1]),sum(vkt*R[,3]*R[,1]),sum(vkt*R[,4]*R[,1])),,1) # vkt, divided by teta
    #dVXYrt<-dk*VXYrt # (3x1), summed over all respondents
    #VXXrt<-matrix(c(sum(vkt*R[,2]*R[,2]/R[,6]),sum(vkt*R[,3]*R[,2]/R[,6]),sum(vkt*R[,4]*R[,2]/R[,6]),
    #                sum(vkt*R[,2]*R[,3]/R[,6]),sum(vkt*R[,3]*R[,3]/R[,6]),sum(vkt*R[,4]*R[,3]/R[,6]),
    #                sum(vkt*R[,2]*R[,4]/R[,6]),sum(vkt*R[,3]*R[,4]/R[,6]),sum(vkt*R[,4]*R[,4]/R[,6])),,3)
    #matVt<-solve(dk*VXXrt,tol=1e-30) # (3x3), inverse of the matrix
    #Mt<-(matVt%*%dVXYrt) # (3x1)
    #Yhatt<-R[,2]*Mt[1,1]+R[,3]*Mt[2,1]+R[,4]*Mt[3,1] # element-wise multiplication
    #ekt<-R[,1]-Yhatt
    #f2t<-0
    #f3t<-0
    #f2t<-((sum((vkt*ekt)^2))-(1/n)*(sum(vkt*ekt))^2)
    #f3t<-(sum(vkt*(vkt-1)*(ekt^2)))
    #VSAMt[i,1]<-((f1*f2t)-(((N/n)*(N/n)-1)*f3t)) # sampling variance
    #VNRt[i,1]<-((N/n)^2)*f3t # nonresponse variance
    #VYWt[i,1]<-VSAMt[i,1]+VNRt[i,1] # calibrated variance of the y total

    #--- 5B: ywt2 variance estimator in the nonresponse case ---
    # Uses Proposition 11.12 of the green book with teta only; vk is not included.
    VXYrt1<-matrix(c(sum(R[,2]*R[,1]/R[,6]),sum(R[,3]*R[,1]/R[,6]),sum(R[,4]*R[,1]/R[,6])),,1) # divided by teta
    dVXYrt1<-dk*VXYrt1 # (3x1), summed over all respondents
    VXXrt1<-matrix(c(sum(R[,2]*R[,2]/R[,6]),sum(R[,3]*R[,2]/R[,6]),sum(R[,4]*R[,2]/R[,6]),
                     sum(R[,2]*R[,3]/R[,6]),sum(R[,3]*R[,3]/R[,6]),sum(R[,4]*R[,3]/R[,6]),
                     sum(R[,2]*R[,4]/R[,6]),sum(R[,3]*R[,4]/R[,6]),sum(R[,4]*R[,4]/R[,6])),,3)
    matVt1<-solve(dk*VXXrt1,tol=1e-30) # (3x3), inverse of the matrix
    Mt1<-(matVt1%*%dVXYrt1) # (3x1)
    Yhatt1<-R[,2]*Mt1[1,1]+R[,3]*Mt1[2,1]+R[,4]*Mt1[3,1] # element-wise multiplication
    ekt1<-R[,1]-Yhatt1 # residuals
    #ekt1<-matrix(ekt1,,1) # uncomment to see the residuals as a column vector
    f2t1<-0
    f3t1<-0
    f2t1<-((sum((ekt1/R[,6])^2))-(1/n)*(sum(ekt1/R[,6]))^2)
    f3t1<-(sum((1/R[,6])*((1/R[,6])-1)*(ekt1^2)))
    VSAMt1[i,1]<-((f1*f2t1)-(((N/n)*(N/n)-1)*f3t1)) # sampling variance
    VNRt1[i,1]<-((N/n)^2)*f3t1 # nonresponse variance
    VYWt1[i,1]<-VSAMt1[i,1]+VNRt1[i,1] # calibrated variance of the y total/teta
    #--- 6: ydt variance estimator in the nonresponse case ---
    # sum of squares of y/teta minus (sum of y/teta)^2/n, divided by n-1
    Vydt[i,1]<-(N*N*((1/n)-(1/N))*((sum((R[,1]/R[,6])^2)-((sum(R[,1]/R[,6]))^2/n))/(n-1)))

    #--- 7: the following three checks verify the calibrated weights ---
    #print(sum(wk*R[,2])); print(sum(U[,2]))
    #print(sum(wk*R[,3])); print(sum(U[,3]))
    #print(sum(wk*R[,4])); print(sum(U[,4]))
  } # loop ends
  #--- 8: bias of t1, t2, t3 and its plot ---
  print(mean(ydt)-T) # dk*yr/teta k: bias
  print(mean(y)-T) # wk*yr: bias
  print(mean(yt)-T) # wk*yr/teta k: bias
  #plot(ydt,type="o",col="blue")
  #lines(y,type="o",col="red")
  #lines(yt,type="o",col="green")
  #--- 9: variances of yw and ywt, and their sampling and nonresponse variances ---
  print(mean(Vydt)) # section 6: dk*yr/teta k
  print(mean(VYW)) # section 4: wk*yr
  print(mean(VYWt1)) # section 5B: wk*yr/teta k
  print(var(y))
  print(var(yt))
} # function ends

Programming code: 4
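Programming code 3 above and the program that follows build the same linear calibration weights w_k = d_k(1 + x_k'L) several times, changing only the base weight d_k. That base weight is N/n (dk) under SI, dk/teta when the response probabilities are included, 1/lemda under Pareto, and 1/(lemda*teta). The sketch below writes that step once as a function of d. It is an illustration under stated assumptions, not the thesis code: the name `calibrate_linear` is hypothetical. It also solves the linear system with `solve(A, b)` instead of forming the inverse with `tol = 1e-30`, which should give the same weights when the matrix is well conditioned.

```r
# Hypothetical helper (not from the thesis): linear calibration weights
# w_k = d_k (1 + x_k' L), with L chosen so that sum_k w_k x_k equals the
# known population totals of the auxiliary variables.
calibrate_linear <- function(d, x, totals) {
  x <- as.matrix(x)
  ht <- colSums(d * x)              # base-weighted totals, e.g. dk*Xr
  tmat <- crossprod(x, d * x)       # sum_k d_k x_k x_k', e.g. dk*XXr
  L <- solve(tmat, totals - ht)     # multiplier vector (L in the listings)
  d * as.vector(1 + x %*% L)        # calibrated weight for each respondent
}

# Usage with hypothetical respondent data (m = 40) and known totals
set.seed(3)
m <- 40
x <- cbind(x1 = runif(m, 1, 10), x2 = runif(m, 1, 10), x3 = runif(m, 1, 10))
yr <- 2 * x[, "x1"] + x[, "x2"] + rnorm(m)
d <- rep(3000 / 60, m)              # N/n with N = 3000, n = 60 (hypothetical)
totals <- c(x1 = 11500, x2 = 11000, x3 = 10500)
w <- calibrate_linear(d, x, totals)
colSums(w * x)                      # reproduces totals
sum(w * yr)                         # calibration estimate of the y total
```

The listings compute vk as 1 + L x with L stored as a row vector. That equals 1 + x'L here because the matrix sum_k d_k x_k x_k' is symmetric.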

# This program applies calibration under the Pareto design.
# t1=sum(Lk*yk/teta), t2=sum(wk*yk), t3=sum(wk*yk/teta)
# Variance estimate of the population total: v(t2). v(t1) and v(t3)
# X1, X2, X3 are used in sections 2A and 2B, and X2, X3 are used in sections 3A and 3B.
f<-function(n,r){
  U<-0
  yteta<-matrix(0,r,1) # yteta: uses teta weights only
  y<-matrix(0,r,1) # yw: uses X1, X2, X3
  yt<-matrix(0,r,1) # ywt: uses X1, X2, X3 and teta
  y1<-matrix(0,r,1) # yw: uses X2, X3
  y1t<-matrix(0,r,1) # ywt: uses X2, X3 and teta
  Q<-matrix(0,3000,1)
  U<-read.table("\\\\edunet\\dfs\\home05\\ht10\\muhqah101\\Desktop\\thesis data\\data65.R") # data set with a 65% response rate (file name data65); columns: y,x1,x2,x3,yr,teta
  U<-U[,-1] # remove the extra column created by read.table
  T<-sum(U[,1])
  lemda<-(n*U[,2]/sum(U[,2])) # target inclusion probabilities, proportional to x1
  X<-matrix(c(sum(U[,2]),sum(U[,3]),sum(U[,4])),,1) # population totals (X1, X2, X3)
  X1<-matrix(c(sum(U[,3]),sum(U[,4])),,1) # population totals (X2, X3)

  for(j in 1:r){ # start loop 1
    u<-runif(3000) # generate uniform random numbers
    for(i in 1:3000){ # start loop 2
      Q[i,1]<-((u[i]*(1-lemda[i]))/(lemda[i]*(1-u[i]))) # generate the ranking variable
    } # end loop 2
    U2<-cbind(U,lemda,Q)
    OU<-U2[order(Q),] # sort according to Q
    OUn<-OU[1:n,] # take the first n rows
    R<-OUn[OUn[,5]!=0,] # keep only the respondents
    #--- 1: (ty = L*yk/teta) ---
    yteta[j,1]<-sum(R[,5]/(R[,7]*R[,6])) # uses only the Pareto and teta weights; R[,6] is teta and R[,7] is lemda
    #--- 2A: calibration weights using X1, X2, X3 ---
    LXr<-matrix(c(sum(R[,2]/R[,7]),sum(R[,3]/R[,7]),sum(R[,4]/R[,7])),,1) # respondent totals of x weighted by 1/lemda
    LXXr<-matrix(c(sum(R[,2]*R[,2]/R[,7]),sum(R[,3]*R[,2]/R[,7]),sum(R[,4]*R[,2]/R[,7]),
                   sum(R[,2]*R[,3]/R[,7]),sum(R[,3]*R[,3]/R[,7]),sum(R[,4]*R[,3]/R[,7]),
                   sum(R[,2]*R[,4]/R[,7]),sum(R[,3]*R[,4]/R[,7]),sum(R[,4]*R[,4]/R[,7])),,3) # element-wise products arranged as a 3x3 matrix
    mat2<-solve(LXXr,tol=1e-30) # inverse of the matrix
    mat1<-(X-LXr)
    L<-(t(mat1)%*%mat2) # lemda (Lagrange multiplier)
    vk<-matrix(c(1+(L[1,1]*R[,2]+L[1,2]*R[,3]+L[1,3]*R[,4])),,) # (rx1)
    wk<-vk/R[,7] # (rx1)

    y[j,1]<-sum(wk*R[,5]) # calibration total of y
    #--- 2B: calibration weights using X1, X2, X3 and teta ---
    LXrt<-matrix(c(sum(R[,2]/(R[,6]*R[,7])),sum(R[,3]/(R[,6]*R[,7])),sum(R[,4]/(R[,6]*R[,7]))),,1) # respondent totals of x weighted by 1/(teta*lemda)
    LXXrt<-matrix(c(sum(R[,2]*R[,2]/(R[,6]*R[,7])),sum(R[,3]*R[,2]/(R[,6]*R[,7])),sum(R[,4]*R[,2]/(R[,6]*R[,7])),
                    sum(R[,2]*R[,3]/(R[,6]*R[,7])),sum(R[,3]*R[,3]/(R[,6]*R[,7])),sum(R[,4]*R[,3]/(R[,6]*R[,7])),
                    sum(R[,2]*R[,4]/(R[,6]*R[,7])),sum(R[,3]*R[,4]/(R[,6]*R[,7])),sum(R[,4]*R[,4]/(R[,6]*R[,7]))),,3) # element-wise products arranged as a 3x3 matrix
    mat2t<-solve(LXXrt,tol=1e-30) # inverse of the matrix
    mat1t<-(X-LXrt)
    Lt<-(t(mat1t)%*%mat2t) # lemda (Lagrange multiplier)
    vkt<-matrix(c(1+(Lt[1,1]*R[,2]+Lt[1,2]*R[,3]+Lt[1,3]*R[,4])),,) # (rx1)
    wkt<-vkt/(R[,7]*R[,6]) # R[,7] and R[,6] are lemda and teta, respectively; (rx1)
    yt[j,1]<-sum(wkt*R[,5])

    #--- 3A: calibration weights using X2, X3 only ---
    LXr1<-matrix(c(sum(R[,3]/R[,7]),sum(R[,4]/R[,7])),,1) # respondent totals of x2, x3 weighted by 1/lemda
    LXXr1<-matrix(c(sum(R[,3]*R[,3]/R[,7]),sum(R[,4]*R[,3]/R[,7]),
                    sum(R[,3]*R[,4]/R[,7]),sum(R[,4]*R[,4]/R[,7])),,2) # element-wise products arranged as a 2x2 matrix
    mat21<-solve(LXXr1,tol=1e-30) # inverse of the matrix
    mat11<-(X1-LXr1)
    L1<-(t(mat11)%*%mat21) # lemda (Lagrange multiplier)
    vk1<-matrix(c(1+(L1[1,1]*R[,3]+L1[1,2]*R[,4])),,) # (rx1)
    wk1<-vk1/R[,7] # (rx1)
    y1[j,1]<-sum(wk1*R[,5]) # calibration total of y using X2, X3

    #--- 3B: calibration weights using X2, X3 and teta ---
    LXr1t<-matrix(c(sum(R[,3]/(R[,6]*R[,7])),sum(R[,4]/(R[,6]*R[,7]))),,1) # respondent totals of x2, x3 weighted by 1/(teta*lemda)
    LXXr1t<-matrix(c(sum((R[,3]*R[,3])/(R[,6]*R[,7])),sum((R[,4]*R[,3])/(R[,6]*R[,7])),
                     sum((R[,3]*R[,4])/(R[,6]*R[,7])),sum((R[,4]*R[,4])/(R[,6]*R[,7]))),,2) # element-wise products arranged as a 2x2 matrix
    mat21t<-solve(LXXr1t,tol=1e-30) # inverse of the matrix
    mat11t<-(X1-LXr1t)
    L1t<-(t(mat11t)%*%mat21t) # lemda (Lagrange multiplier)
    vk1t<-matrix(c(1+(L1t[1,1]*R[,3]+L1t[1,2]*R[,4])),,) # (rx1)
    wk1t<-vk1t/(R[,7]*R[,6]) # R[,7] and R[,6] are lemda and teta, respectively; (rx1)
    y1t[j,1]<-sum(wk1t*R[,5])
    #---
  } # end loop 1
  print(mean(yteta)-T) # bias of yteta (teta weights only), where T is the population total of the study variable y