A simulation study of Poisson Regression model with sample selection effect


Örebro University School of Business

Master Thesis

Supervisor: Professor Sune Karlsson

Examiner: Lecturer Panagiotis Mantalos

Semester: 20112


Zengyi Hao

1987-09-12

Abstract
1. Introduction
2. A review of adjusted Poisson regression models
2.1 truncation
2.2 censored
2.3 zero inflated count data
2.4 Under reporting model
2.5 endogenous switching and sample selection
3. Estimators under the sample selection effect
3.1 FIML estimator
3.2 TSM estimator
3.3 NWLS method
3.4 Poisson regression model
4. Simulation design
5. Simulation results
6. Comments on simulation results
6.1 The bias of estimates
6.1.1 The impact of σ and ρ on estimate bias
6.1.2 The impact of the common variable on estimate bias
6.1.3 The impact of λ on estimate bias
6.2 The Mean Square Error (MSE)
6.2.1 The impact of σ and ρ on MSE
6.2.2 The impact of common variable
6.2.3 The impact of λ
7. The conclusion
Reference

List of Tables

Table 1 The average percentage that the observed y is larger than unobserved y
Table 2 The summary of simulation set up
Table 3 FIML estimator, λ=8 and has common variable
Table 4 TSM estimator, λ=8 and has common variable
Table 5 NWLS estimator, λ=8 and has common variable
Table 6 Poisson_s estimator, λ=8 and has common variable
Table 7 Poisson_f estimator, λ=8 and has common variable
Table 8 FIML estimator, λ=8 and does not have common variable
Table 9 TSM estimator, λ=8 and does not have common variable
Table 10 NWLS estimator, λ=8 and does not have common variable
Table 11 Poisson_s estimator, λ=8 and does not have common variable
Table 12 Poisson_f estimator, λ=8 and does not have common variable
Table 13 FIML estimator, λ=4 and has common variable
Table 14 TSM estimator, λ=4 and has common variable
Table 15 NWLS estimator, λ=4 and has common variable
Table 16 Poisson_s estimator, λ=4 and has common variable
Table 17 Poisson_f estimator, λ=4 and has common variable
Table 18 FIML estimator, λ=4 and does not have common variable
Table 19 TSM estimator, λ=4 and does not have common variable
Table 20 NWLS estimator, λ=4 and does not have common variable
Table 21 Poisson_s estimator, λ=4 and does not have common variable

List of Figures

Figure 1 The estimates bias and standard deviation of β0 in case 1
Figure 2 The estimates bias and standard deviation of β1 in case 1
Figure 13 Relative MSE, TSM estimator as the benchmark, in case 1

Abstract

Keywords: Poisson regression model, sample selection effect

This paper examines the properties of estimators of the Poisson regression model with a sample selection effect. The model can be estimated by the full information maximum likelihood (FIML) method, which is the straightforward choice. However, like maximum likelihood in general, the FIML estimator is not robust to a misspecified distribution, and it is computationally burdensome. Alternatives are the two-stage method of moments (TSM) estimator, which is generally more robust, and the nonlinear weighted least-squares (NWLS) estimator, which is both robust and more efficient. This paper compares the finite-sample properties of these estimators with each other and with the standard Poisson regression estimator. The simulation results imply that there is no simple rule for choosing the best estimator. The variance of the random error term in the Poisson regression has a significant influence on the performance of the estimators: the larger the variance, the larger the bias and standard deviation of the estimators.


1. Introduction

In practice one may need to explain a non-negative integer variable. For example, a government may want to know the determinants of the number of children in a family, or a car insurance company may want to know the expected number of accidents given some properties of a car. For these purposes, count data regression models play a crucial role, and the Poisson regression model is one of the most widely used models in applications. The general form of the Poisson distribution is

$$p(y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!} \tag{1}$$

where λ_i is the parameter of the Poisson distribution and is a function of some explanatory variables x_i. Usually this function takes the exponential form

$$\lambda_i = \exp(\mathbf{x}_i'\boldsymbol{\beta})$$

Then the conditional mean of y is

$$E[y_i \mid \mathbf{x}_i] = \exp(\mathbf{x}_i'\boldsymbol{\beta})$$

It is also the conditional variance of y, since the Poisson distribution has equal mean and variance.

Under general conditions, the maximum likelihood (ML) estimator is preferred because, when the distribution is correctly specified, it is consistent and asymptotically more efficient than other estimators. The log-likelihood function is

$$\ln L(\boldsymbol{\beta} \mid \mathbf{Y}, \mathbf{X}) = -\sum_{i=1}^{n} \exp(\mathbf{x}_i'\boldsymbol{\beta}) + \sum_{i=1}^{n} y_i\,\mathbf{x}_i'\boldsymbol{\beta} - \sum_{i=1}^{n} \ln(y_i!) \tag{2}$$

To maximize the log-likelihood function, the first-order condition must be satisfied:

$$\frac{d \ln L}{d\beta_j} = \sum_{i=1}^{n} \left[ y_i - \exp(\mathbf{x}_i'\boldsymbol{\beta}) \right] x_{ij} = 0 \tag{3}$$

A solution exists and is easily found by numerical methods, since the Hessian matrix is negative definite:

$$\frac{d^2 \ln L}{d\beta_j\, d\beta_k} = -\sum_{i=1}^{n} \exp(\mathbf{x}_i'\boldsymbol{\beta})\, x_{ij}\, x_{ik} \tag{4}$$
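The first-order condition (3) and the Hessian (4) are all that a Newton-Raphson iteration needs. The following is a minimal sketch, not the thesis's own program: the function name `poisson_mle`, the simulated design and the "true" coefficients are illustrative assumptions.

```python
import numpy as np

def poisson_mle(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for the Poisson log-likelihood (2). Hypothetical helper."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = np.exp(X @ beta)                # conditional mean exp(x_i' beta)
        score = X.T @ (y - mu)               # gradient, equation (3)
        hessian = -(X * mu[:, None]).T @ X   # equation (4), negative definite
        step = np.linalg.solve(hessian, score)
        beta = beta - step                   # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data with assumed (illustrative) coefficients
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = poisson_mle(X, y)
print(beta_hat)  # should be close to beta_true
```

Because the Hessian is negative definite for any full-rank X, the iteration converges quickly from the zero starting value in well-behaved designs such as this one.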

Generally speaking, the ML estimator is not robust when the distribution or conditional distribution is misspecified. In this situation the ML estimator can be substantially biased.

In general, there are two strategies for obtaining an estimator that is more robust to possible misspecification of the distribution. The first is to specify an adjusted probability density function and the corresponding model. The most common models are the censoring model (Famoye and Wang, 2004), the truncation model (Grogger and Carson, 1991), the hurdle model (Mullahy, 1986), the zero-inflated model (Lambert, 1992), and the count regression model with endogenous switching and sample selection (Terza, 1998). The second strategy is to apply estimators that are more robust than the ML estimator, such as the two-stage method, the nonlinear weighted least-squares method, the generalized method of moments and some nonparametric methods. This paper focuses on the second strategy, that is, on the estimators that Terza (1998) introduces. Terza's model can handle both endogenous switching and sample selection effects; he gives the details of the estimators and offers an application to endogenous vehicle ownership. Oya (2005) uses Monte Carlo simulation to examine the properties of the estimators in Terza's model with endogenous switching. Furthermore, Oya relaxes

the assumption on random error term in Terza's model and test these estimators'


It is useful to begin with a practical example that illustrates how sample selection arises. Gronau (1974) and Heckman (1974) first described the sample selection effect and selection bias in their research on the determinants of wages and the labor supply behavior of women. Suppose one surveys a sample of women, of whom only some have a job and report their wages, and one is interested in how a woman's characteristics influence the wage she receives. Selection bias arises if workers and non-workers differ in certain properties. For a clear explanation, divide those characteristics into two groups: observable characteristics and unobservable characteristics. If the two groups of women have similar characteristics, or if the decision to work is independent of a woman's characteristics, there is no reason to suspect a selection bias problem. However, whether or not to work generally depends on a woman's characteristics, for example the number of children and educational background (Heckman, 1974). The decision to work is then not random and, as a consequence, the working and non-working subpopulations potentially have different characteristics. Further, when the characteristics that drive the decision to work also determine a woman's wage, the sample selection effect arises and selection bias affects the estimator. Here one needs to pay attention to which characteristics are involved, because the two groups of characteristics differ in whether they give rise to a sample selection effect. In the unrealistic situation where only observable characteristics determine both the decision to work and the wage of a working woman, one can add appropriate independent variables to control the selection bias. In most cases, both observable and unobservable characteristics affect the wage and the decision to work. Since one cannot add independent variables to control for unobservable characteristics (otherwise they would be observable), inference in the model is incorrect and the estimator is biased.


Theoretically, Terza's model is able to deal with both endogenous switching and sample selection effects; however, the two cases may differ in practice. One possible reason is that with endogenous switching one can use all observations, whereas under sample selection part of the observations cannot be observed. To some extent, missing values or unobserved elements of the population are more harmful to the estimated model. This paper therefore aims to examine the properties of these estimators under sample selection effects.

This paper mainly focuses on the sample selection model in which the count variable is presumed to follow a Poisson distribution. Section 2 reviews adjusted Poisson regression models. Section 3 presents the estimators under the sample selection effect. Section 4 gives the simulation design. Section 5 shows the simulation results, Section 6 comments on them, and Section 7 concludes.

2. A review of adjusted Poisson regression models

2.1 truncation

Grogger and Carson (1991) find that when sample selection rules lead to truncated count data in the dependent variable, substantial estimation bias results, especially for the Poisson regression model. Assuming the dependent variable is truncated at zero, the Poisson regression model becomes

$$\Pr(y_i \mid y_i > 0) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!\,[1 - \Pr(y_i = 0)]} = \frac{\lambda_i^{y_i}}{(\exp(\lambda_i) - 1)\, y_i!} \tag{5}$$

where $\lambda_i = \exp(\mathbf{x}_i\boldsymbol{\beta})$. A maximum likelihood estimator is then obtained by maximizing the log-likelihood function

$$\ln L = \sum_{i=1}^{m} \left( y_i\,\mathbf{x}_i\boldsymbol{\beta} - \ln[\exp(\lambda_i) - 1] - \ln(y_i!) \right) \tag{6}$$

where m is the truncated sample size. The last term in the log-likelihood function can be ignored since it does not involve the parameters.
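As a quick numerical check of the zero-truncated probability in (5), the sketch below evaluates its logarithm and confirms that the probabilities over y = 1, 2, ... sum to one. The function name and the value of λ are illustrative assumptions, not part of the thesis.

```python
import math

def truncated_poisson_logpmf(y, lam):
    """log Pr(y | y > 0) from equation (5); defined for y = 1, 2, ..."""
    if y < 1:
        raise ValueError("zero-truncated: y must be a positive integer")
    # y*log(lam) - log(exp(lam) - 1) - log(y!)
    return y * math.log(lam) - math.log(math.expm1(lam)) - math.lgamma(y + 1)

lam = 2.5  # hypothetical value of lambda_i = exp(x_i beta)
probs = [math.exp(truncated_poisson_logpmf(y, lam)) for y in range(1, 100)]
total = sum(probs)
mean = sum(y * p for y, p in zip(range(1, 100), probs))

print(total)                              # close to 1
print(mean, lam / (1 - math.exp(-lam)))   # truncated mean is larger than lam
```

The truncated mean λ/(1 − exp(−λ)) exceeds λ, which is why fitting an ordinary Poisson model to zero-truncated data gives biased estimates.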

2.2 censored

Famoye and Wang (2004) introduce a censored generalized Poisson regression (CGPR) model, which can deal with censored data and model over- or under-dispersion. The censored generalized Poisson regression model assumes that the non-negative integer dependent variable Y follows a generalized Poisson distribution, that is,

$$f(y_i \mid \mathbf{x}_i) = \Pr(Y_i = y_i \mid \mathbf{x}_i) = \left(\frac{\theta_i}{1+\alpha\theta_i}\right)^{y_i} \frac{(1+\alpha y_i)^{y_i-1}}{y_i!} \exp\!\left(-\frac{\theta_i(1+\alpha y_i)}{1+\alpha\theta_i}\right) \tag{7}$$

and

$$E(Y_i \mid \mathbf{x}_i) = \theta_i, \qquad Var(Y_i \mid \mathbf{x}_i) = \theta_i (1+\alpha\theta_i)^2$$

where θ_i is defined as a function of the independent variables, such as θ_i = exp(x_iβ). Suppose the dependent variable is censored at y*, so that every value larger than or equal to y* is recorded as y*. Then the probability function is

$$\Pr(Y_i = y_i \mid \mathbf{x}_i) = \begin{cases} f(y_i \mid \mathbf{x}_i) & \text{if } y_i < y^* \\ \Pr(Y_i \ge y^* \mid \mathbf{x}_i) = 1 - \sum_{k=0}^{y^*-1} \Pr(Y_i = k \mid \mathbf{x}_i) = 1 - F(y^*-1 \mid \mathbf{x}_i) & \text{if } y_i \ge y^* \end{cases} \tag{8}$$

where f is the generalized Poisson probability in (7) and F is its distribution function. The likelihood function of the sample (Y, X) under censored generalized Poisson regression is then

$$L(\alpha, \boldsymbol{\beta} \mid \mathbf{Y}, \mathbf{X}) = \prod_{\{k \,\mid\, y_k < y^*\}} f(y_k \mid \mathbf{x}_k) \prod_{\{k \,\mid\, y_k \ge y^*\}} \left[ 1 - F(y^*-1 \mid \mathbf{x}_k) \right] \tag{9}$$

The maximum likelihood estimator is obtained by maximizing the likelihood function or the log-likelihood function.
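The generalized Poisson probability (7) and the censored probability in (8) can be checked numerically. The sketch below is only an illustration under assumed values of θ, α and the censoring point; the function name is hypothetical.

```python
import math

def gen_poisson_logpmf(y, theta, alpha):
    """log of equation (7) for y = 0, 1, 2, ... (assumes 1 + alpha*y > 0)."""
    return (y * math.log(theta / (1 + alpha * theta))
            + (y - 1) * math.log(1 + alpha * y)
            - math.lgamma(y + 1)
            - theta * (1 + alpha * y) / (1 + alpha * theta))

theta, alpha = 3.0, 0.1   # hypothetical values; alpha > 0 gives over-dispersion
pmf = [math.exp(gen_poisson_logpmf(y, theta, alpha)) for y in range(400)]

total = sum(pmf)
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
print(total, mean, var)   # var should be near theta * (1 + alpha*theta)**2

# Censoring at y* = 6 (assumed): mass at the censoring point is 1 - F(y* - 1)
y_star = 6
p_censored = 1.0 - sum(pmf[:y_star])
print(p_censored)
```

With α = 0 the expression reduces to the ordinary Poisson probability, so α governs how far the variance departs from the mean.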


2.3 zero inflated count data

In practice, survey data often contain more observations of a certain value, usually zero, than the Poisson model predicts. This makes the conditional variance larger than the conditional mean, that is, the data are over-dispersed. One reason why there are more zeros than the model predicts is a sample selection process that combines a binary distribution with a Poisson distribution. Such a process is plausible because some survey questions admit two kinds of zero answers. A survey, for example, asks selected families how many children they have. If a family answers zero, it could mean that the family does not want children, or that it wants one or more children but does not have any yet. These two kinds of families have different properties even though they give the same answer. A model that can handle this problem is the zero-inflated Poisson (ZIP) model (Lambert, 1992). The model implies that an observed zero has two sources: it may come from a binary distribution or from a Poisson distribution. The model can be written as

$$y_i \sim \begin{cases} \text{binary distribution } (y_i = 0) & \text{with probability } q_i \\ \text{Poisson } f(y_i) & \text{with probability } 1 - q_i \end{cases}$$

where

$$\lambda_i = \exp(\mathbf{x}_i\boldsymbol{\beta}), \qquad q_i = \frac{\exp(\mathbf{w}_i\boldsymbol{\gamma})}{1 + \exp(\mathbf{w}_i\boldsymbol{\gamma})}$$

Here w_i is a vector of variables that explains the probability q_i, and w_iγ is set to be a constant times x_iβ. The probability function of y is then

$$p(y_i) = \begin{cases} q_i + (1 - q_i) f(0) & \text{if } y_i = 0 \\ (1 - q_i) f(y_i) & \text{otherwise} \end{cases} \tag{10}$$

The likelihood function is

$$L(\boldsymbol{\beta}, \boldsymbol{\gamma} \mid \mathbf{Y}, \mathbf{X}, \mathbf{W}) = \prod_{i \in A} p(y_i \mid \mathbf{x}_i, \mathbf{w}_i, \boldsymbol{\beta}, \boldsymbol{\gamma}) = \prod_{i \in A_0} \left[ \frac{\exp(\mathbf{w}_i\boldsymbol{\gamma})}{1 + \exp(\mathbf{w}_i\boldsymbol{\gamma})} + \frac{\exp(-\exp(\mathbf{x}_i\boldsymbol{\beta}))}{1 + \exp(\mathbf{w}_i\boldsymbol{\gamma})} \right] \prod_{i \in A \setminus A_0} \frac{\exp(-\exp(\mathbf{x}_i\boldsymbol{\beta}))\, \exp(\mathbf{x}_i\boldsymbol{\beta})^{y_i}}{[1 + \exp(\mathbf{w}_i\boldsymbol{\gamma})]\; y_i!} \tag{11}$$

where A denotes the sample set and A_0 contains the observations for which y is zero. The maximum likelihood estimates are obtained by maximizing the log-likelihood function.
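The probability function (10) is easy to evaluate directly. The following sketch, with assumed values for λ_i and q_i and a hypothetical function name, shows that zero inflation raises the share of zeros and makes the variance exceed the mean.

```python
import math

def zip_pmf(y, lam, q):
    """Equation (10): a zero with probability q, Poisson(lam) with probability 1 - q."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        return q + (1 - q) * poisson
    return (1 - q) * poisson

lam, q = 2.0, 0.3   # hypothetical values of lambda_i and q_i
pmf = [zip_pmf(y, lam, q) for y in range(100)]

total = sum(pmf)
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

print(pmf[0], math.exp(-lam))  # share of zeros: ZIP versus plain Poisson
print(mean, var)               # variance exceeds the mean (over-dispersion)
```

For these values the mean is (1 − q)λ and the variance is (1 − q)λ(1 + qλ), which illustrates the over-dispersion described in the text.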

2.4 Under reporting model

The under-reporting sample selection effect arises when there is a reporting mechanism. Suppose every survey element has to submit a report for every event, and there are y_i* events. Let u_ij denote the utility that the ith survey element derives from reporting its jth event, and assume the utility can be modeled as

$$u_{ij} = \mathbf{z}_i'\boldsymbol{\alpha} + \varepsilon_i$$

Here the utility is assumed to be constant across all events of the ith survey element. An indicator variable d_ij is defined as

$$d_{ij} = \begin{cases} 1 & \text{if } u_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$

Then the ith survey element submits y_i reports, where

$$y_i = \sum_{j=1}^{y_i^*} d_{ij}$$

and

$$f(y_i) = \sum_{k=0}^{\infty} p(y_i \mid y_i^* = k)\, p(y_i^* = k) \tag{12}$$

where the conditional distribution of y_i is binomial:

$$(y_i \mid y_i^* = k) \sim \text{Binomial}\left(k,\ \Pr(\mathbf{z}_i'\boldsymbol{\alpha} + \varepsilon_i > 0)\right)$$

Winkelmann and Zimmermann (1993) give the complete model by assuming that y_i* is Poisson distributed with mean exp(x_iβ) and that ε follows a logistic distribution. Under these assumptions

$$f(y_i \mid \mathbf{x}_i, \mathbf{z}_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!} \tag{13}$$

where

$$\lambda_i = \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta} + \mathbf{z}_i'\boldsymbol{\gamma})}{1 + \exp(\mathbf{z}_i'\boldsymbol{\gamma})}$$

They further provide the maximum likelihood estimates.

2.5 endogenous switching and sample selection

Terza (1998) proposes a model and three estimators that deal with both sample selection and endogenous switching. The model consists of two parts: a Poisson equation that describes how the independent variables influence the discrete dependent variable, and a selection equation that describes whether an element of the population is observed or is affected by a treatment. The endogenous switching model is given as

$$p(y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}, \qquad \lambda_i = \exp(\mathbf{x}_i'\boldsymbol{\alpha} + \beta c_i + \varepsilon_i) \tag{14}$$

$$c_i = \begin{cases} 1 & \text{if } \gamma_1 + \gamma_2 z_i + \upsilon_i > 0 \\ 0 & \text{if } \gamma_1 + \gamma_2 z_i + \upsilon_i \le 0 \end{cases}$$

$$(\varepsilon, \upsilon) \sim N(\mathbf{0}, \Sigma), \quad \text{where } \Sigma = \begin{pmatrix} \sigma^2 & \sigma\rho \\ \sigma\rho & 1 \end{pmatrix}$$

Here the conditional mean of the dependent variable y is, as usual, influenced by a specification error, and this error is correlated with the random error term in the selection equation. Oya (2005) uses Monte Carlo simulation to examine the finite-sample properties of Terza's estimators under endogenous switching. Oya's simulation includes three cases. In case 0, the random error terms are correctly specified but an invalid constraint, ρ = 0, is imposed. The simulation results show that the larger the difference between 0 and the true value of ρ, the larger the bias of the estimators. The FIML estimator's standard deviations are the smallest, and the TSM estimator's are the largest. In addition, as the true value of ρ decreases from 1 to -1, the standard deviations of the NWLS estimator become larger. In case 1, the random error terms are correctly specified and no constraint is imposed on ρ. In this case, the FIML estimator has the smallest bias and standard deviations and the TSM estimator the largest. On the other hand, the properties of the estimates of α₀, α₁, β₀, β₁ and σ are highly similar; only those of ρ differ. In case 2, the random error terms are misspecified: a gamma-distributed random error term is misspecified as normally distributed. In this situation, the results are similar to those of case 1.

3. Estimators under the sample selection effect

Suppose the count variable y is the dependent variable and is assumed to follow a Poisson distribution. The parameter of the Poisson distribution is determined by the equation

$$\lambda = \exp(\mathbf{x}\boldsymbol{\beta} + \varepsilon)$$

Here x is a vector of exogenous independent variables including a constant term, and ε is a random error. For some reason, not all y can be observed; whether y is observed depends on the following equation:

$$y \text{ is } \begin{cases} \text{observed} & \text{if } \mathbf{z}\boldsymbol{\alpha} + \upsilon > 0 \\ \text{not observed} & \text{if } \mathbf{z}\boldsymbol{\alpha} + \upsilon \le 0 \end{cases}$$

where z is a vector of exogenous variables including a constant term and υ is a random error. If ε and υ are correlated, sample selection effects arise: the value of y, which partly depends on ε, is related to whether y is observed. For example, when ε and υ are positively correlated, a large υ tends to come with a large ε. Since a large υ generally leads to y being observed, and a large ε generally leads to a large value of y, larger values of y are more likely to be observed. In other words, in survey data large values of y are over-represented relative to small values. The result is a non-random sample, even if the survey is based on a random design.
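The mechanism described above can be illustrated with a small simulation. This is a sketch under assumed parameter values (constants only, no other regressors), not the simulation design of this thesis; the predicted mean uses the standard result for a bivariate normal pair, with θ = σρ taken as an assumption.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sigma, rho = 0.5, 0.6      # hypothetical error parameters
beta0, alpha0 = 1.0, 0.5   # hypothetical constants in the two equations

# (eps, ups) bivariate normal: Var(eps) = sigma^2, Var(ups) = 1, Corr = rho
cov = np.array([[sigma**2, sigma * rho], [sigma * rho, 1.0]])
eps, ups = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

y = rng.poisson(np.exp(beta0 + eps))   # outcome equation
observed = alpha0 + ups > 0            # selection equation

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Mean of the observed y implied by the model, with theta = sigma * rho (assumption)
predicted = (math.exp(beta0 + sigma**2 / 2)
             * norm_cdf(sigma * rho + alpha0) / norm_cdf(alpha0))

print(y.mean(), y[observed].mean(), y[~observed].mean(), predicted)
```

With a positive ρ the observed subsample has a higher mean than the full sample, and the unobserved part a lower one, even though the underlying draw is random.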

Under Terza's model, three estimators can be used. The following subsections give the formulas of the three estimators under sample selection effects.


3.1 FIML estimator

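Assume that ε and υ are jointly distributed as bivariate normal:

$$(\varepsilon, \upsilon) \sim N(\mathbf{0}, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2 & \sigma\rho \\ \sigma\rho & 1 \end{pmatrix}$$

The unconditional joint discrete density for an observed y is given as

$$\begin{aligned} P(y, d=1 \mid \mathbf{x}, \mathbf{z}) &= \int_{-\infty}^{\infty} p(y, d=1 \mid \mathbf{x}, \mathbf{z}, \varepsilon)\, f(\varepsilon)\, d\varepsilon \\ &= \int_{-\infty}^{\infty} p(y \mid d=1, \mathbf{x}, \mathbf{z}, \varepsilon)\, \Pr(d=1 \mid \mathbf{x}, \mathbf{z}, \varepsilon)\, f(\varepsilon)\, d\varepsilon \\ &= \int_{-\infty}^{\infty} p(y \mid d=1, \mathbf{x}, \mathbf{z}, \varepsilon)\, [1 - \Pr(d=0 \mid \mathbf{x}, \mathbf{z}, \varepsilon)]\, f(\varepsilon)\, d\varepsilon \\ &= \int_{-\infty}^{\infty} p(y \mid d=1, \mathbf{x}, \mathbf{z}, \varepsilon)\, \Phi\!\left(\frac{\mathbf{z}\boldsymbol{\alpha} + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}}\right) f(\varepsilon)\, d\varepsilon \end{aligned} \tag{15}$$

By exploiting the symmetry of the normal cdf, the probability that d = 0 is

$$\Pr(d=0 \mid \mathbf{x}, \mathbf{z}, \varepsilon) = 1 - \Phi\!\left(\frac{\mathbf{z}\boldsymbol{\alpha} + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}}\right) = \Phi\!\left(-\frac{\mathbf{z}\boldsymbol{\alpha} + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}}\right) \tag{16}$$

and

$$\Pr(d=0 \mid \mathbf{x}, \mathbf{z}) = \int_{-\infty}^{\infty} \left[1 - \Phi\!\left(\frac{\mathbf{z}\boldsymbol{\alpha} + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}}\right)\right] f(\varepsilon)\, d\varepsilon$$

Hence the likelihood contribution of an observation is

$$\begin{aligned} P(y, d \mid \mathbf{x}, \mathbf{z}) &= \int_{-\infty}^{\infty} \left[(1-d) + d\, p(y \mid \mathbf{x}, \varepsilon)\right] \Phi\!\left(\frac{(2d-1)(\mathbf{z}\boldsymbol{\alpha} + (\rho/\sigma)\varepsilon)}{\sqrt{1-\rho^2}}\right) f(\varepsilon)\, d\varepsilon \\ &= \int_{-\infty}^{\infty} \left[(1-d) + d\, \frac{\exp(\mathbf{x}\boldsymbol{\beta}+\varepsilon)^{y} \exp(-\exp(\mathbf{x}\boldsymbol{\beta}+\varepsilon))}{y!}\right] \Phi\!\left(\frac{(2d-1)(\mathbf{z}\boldsymbol{\alpha} + (\rho/\sigma)\varepsilon)}{\sqrt{1-\rho^2}}\right) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\varepsilon^2}{2\sigma^2}\right) d\varepsilon \end{aligned} \tag{17}$$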

This integral can be approximated by Hermite quadrature. Hermite quadrature is efficient if the integrand has the particular form

$$\int_{-\infty}^{\infty} g(x)\, dx = \int_{-\infty}^{\infty} \exp(-x^2)\, f(x)\, dx \tag{18}$$

and

$$\int_{-\infty}^{\infty} \exp(-x^2)\, f(x)\, dx \approx \sum_{i=1}^{n} w_i f(x_i), \quad \text{where } w_i = \frac{2^{n-1}\, n!\, \sqrt{\pi}}{n^2\, [H_{n-1}(x_i)]^2}$$

and the x_i are the chosen evaluation points (the roots of the Hermite polynomial H_n). Butler and Moffitt (1982) state that the accuracy of Hermite quadrature is sufficient when n is 3 or 4. In this paper n is therefore chosen to be 3, and the corresponding values of x and w are x = (-1.224744, 0, 1.224744) and w = (0.295408, 1.181635, 0.295408) (Beyer, 1987). To apply the Hermite quadrature method, the likelihood contribution must be transformed into this special form; after the transformation the likelihood contribution is given as

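$$P(y, d \mid \mathbf{x}, \mathbf{z}) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left[(1-d) + d\, \frac{\exp(\mathbf{x}\boldsymbol{\beta}+\sqrt{2}\sigma\xi)^{y} \exp(-\exp(\mathbf{x}\boldsymbol{\beta}+\sqrt{2}\sigma\xi))}{y!}\right] \Phi\!\left(\frac{(2d-1)(\mathbf{z}\boldsymbol{\alpha} + \sqrt{2}\rho\,\xi)}{\sqrt{1-\rho^2}}\right) \exp(-\xi^2)\, d\xi \tag{19}$$

where ξ = ε/(√2 σ). The conditional likelihood function is then easily computed, and the full information maximum likelihood estimates are obtained by maximizing it.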
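The three-point nodes and weights quoted from Beyer (1987) can be reproduced with NumPy's `hermgauss`, and the change of variable ε = √2·σ·ξ can be tried on a simple expectation. This is a sketch only: the helper `normal_expectation` and the value of σ are illustrative assumptions, and the example integrates exp(ε) rather than the full likelihood contribution.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Nodes and weights of the 3-point Gauss-Hermite rule (weight function exp(-x^2))
nodes, weights = hermgauss(3)
print(nodes)
print(weights)

def normal_expectation(g, sigma, x=nodes, w=weights):
    """Approximate E[g(eps)], eps ~ N(0, sigma^2), using eps = sqrt(2)*sigma*x."""
    return float(np.sum(w * g(math.sqrt(2.0) * sigma * x)) / math.sqrt(math.pi))

sigma = 0.3  # hypothetical value
approx = normal_expectation(np.exp, sigma)
exact = math.exp(sigma**2 / 2)   # E[exp(eps)] for a normal eps
print(approx, exact)
```

With a small σ the three-point rule is already very close to the exact value; for larger σ or a less smooth integrand more nodes may be needed, so the choice n = 3 should be read as a trade-off between accuracy and computing time.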

3.2 TSM estimator

The FIML estimator is not robust when the distribution of y is not correctly specified. A more robust estimator is the two-stage method of moments (TSM) estimator. This estimator only assumes that the conditional mean of y is

$$E[y \mid \mathbf{x}, \mathbf{z}, d, \varepsilon] = \exp(\mathbf{x}\boldsymbol{\beta} + \varepsilon)$$

This assumption on the conditional mean is the same as in Terza's paper, and the random error terms have the same bivariate normal distribution, so the mean of y conditional on x, z and d is the same under the sample selection effect as under endogenous switching. From Terza's paper, the conditional mean after integrating out ε is

$$E[y \mid \mathbf{x}, \mathbf{z}, d] = \exp(\mathbf{x}\boldsymbol{\beta}^*) \left[ d\, \frac{\Phi(\theta + \mathbf{z}\boldsymbol{\alpha})}{\Phi(\mathbf{z}\boldsymbol{\alpha})} + (1-d)\, \frac{1 - \Phi(\theta + \mathbf{z}\boldsymbol{\alpha})}{1 - \Phi(\mathbf{z}\boldsymbol{\alpha})} \right] \tag{20}$$

The conditional mean for observed y is obtained by setting d = 1 in the above equation:

$$E[y \mid \mathbf{x}, \mathbf{z}, d=1] = \exp(\mathbf{x}\boldsymbol{\beta}^*)\, \frac{\Phi(\theta + \mathbf{z}\boldsymbol{\alpha})}{\Phi(\mathbf{z}\boldsymbol{\alpha})} \tag{21}$$

where β* is the same as β, except that its first element is shifted by σ²/2


term on the right side of the above equation is larger than one. It can be seen as an adjustment term on the conditional mean of y that accounts for the sample selection effect. As mentioned above, a positive correlation between ε and υ increases the proportion of larger values of y in the observed data and therefore increases the mean as well. The adjustment term at least brings the "inflated" mean of y closer to its original level. The expected difference between observed y and unobserved y is

$$\begin{aligned} E[y \mid \mathbf{x}, \mathbf{z}, d=1] - E[y \mid \mathbf{x}, \mathbf{z}, d=0] &= \exp(\mathbf{x}\boldsymbol{\beta}^*)\, \frac{\Phi(\theta + \mathbf{z}\boldsymbol{\alpha})}{\Phi(\mathbf{z}\boldsymbol{\alpha})} - \exp(\mathbf{x}\boldsymbol{\beta}^*)\, \frac{1 - \Phi(\theta + \mathbf{z}\boldsymbol{\alpha})}{1 - \Phi(\mathbf{z}\boldsymbol{\alpha})} \\ &= \exp(\mathbf{x}\boldsymbol{\beta}^*)\, \frac{\Phi(\theta + \mathbf{z}\boldsymbol{\alpha}) - \Phi(\mathbf{z}\boldsymbol{\alpha})}{\Phi(\mathbf{z}\boldsymbol{\alpha})\,(1 - \Phi(\mathbf{z}\boldsymbol{\alpha}))} \end{aligned} \tag{22}$$

For example, when the expectation of zα is 0.5 and σ is 0.3, the difference increases with the absolute value of ρ (Table 1).

Table 1 The average percentage that the observed y is larger than unobserved y

ρ    -0.8   -0.6   -0.4   -0.2   0     0.2   0.4   0.6   0.8
%    -41    -30    -20    -10    0     9     19    28    36

Given that the expectation of zα is 0.5 and σ is 0.3.
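The entries of Table 1 can be approximated from equation (22). The sketch below rests on two assumptions that are not stated explicitly in the text: that θ = σρ, and that the percentage is the difference in (22) expressed relative to exp(xβ*) and truncated toward zero. Under this reading the computed values appear to match the table.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def relative_gap(rho, sigma=0.3, z_alpha=0.5):
    """Equation (22) divided by exp(x beta*), assuming theta = sigma * rho."""
    theta = sigma * rho
    p = norm_cdf(z_alpha)
    p_shift = norm_cdf(theta + z_alpha)
    return p_shift / p - (1.0 - p_shift) / (1.0 - p)

rhos = [-0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8]
table = [int(100 * relative_gap(r)) for r in rhos]  # int() truncates toward zero
print(table)
```

The gap is not symmetric in ρ: a negative correlation of a given size produces a larger absolute difference than a positive one, because the observed group is the larger group when zα is positive.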

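$$y = h(\mathbf{x}, \mathbf{z}, \boldsymbol{\alpha}, \boldsymbol{\beta}^*, \theta) + e = \exp(\mathbf{x}\boldsymbol{\beta}^*)\, \frac{\Phi(\theta + \mathbf{z}\boldsymbol{\alpha})}{\Phi(\mathbf{z}\boldsymbol{\alpha})} + e \tag{23}$$

where e is a random error term. This equation can be estimated by nonlinear least squares in one step, or by a two-stage technique if β and α have large dimensions. The first stage is a simple probit regression, which gives a consistent estimate of α₀ and α₁. The second stage applies nonlinear least squares to

$$y = h(\mathbf{x}, \mathbf{z}, \hat{\boldsymbol{\alpha}}, \boldsymbol{\beta}^*, \theta) + e$$

where α̂ denotes the first-stage estimates. Denote the vector b₁ = (β*, θ). Terza (1998) shows that the approximate distribution of the estimator of b₁ is given by

$$\sqrt{n}\,(\hat{\mathbf{b}}_1 - \mathbf{b}_1) \xrightarrow{d} N[\mathbf{0}, \mathbf{D}], \quad \text{where } \mathbf{D} = E[\mathbf{g}_1'\mathbf{g}_1]^{-1} \left( E[e^2\, \mathbf{g}_1'\mathbf{g}_1] + E[\mathbf{g}_1'\mathbf{g}_2]\; VAR(\hat{\boldsymbol{\alpha}})\; E[\mathbf{g}_2'\mathbf{g}_1] \right) E[\mathbf{g}_1'\mathbf{g}_1]^{-1} \tag{24}$$

VAR(α̂) denotes the asymptotic covariance matrix of the first-stage probit estimator of α. In practice, a heteroskedasticity-consistent estimator of D can be computed as

$$\hat{\mathbf{D}} = (\mathbf{G}_1'\mathbf{G}_1)^{-1} \left[ \mathbf{G}_1'\hat{\boldsymbol{\Psi}}\mathbf{G}_1 + \mathbf{G}_1'\mathbf{G}_2\; \widehat{VAR}(\hat{\boldsymbol{\alpha}})\; \mathbf{G}_2'\mathbf{G}_1 \right] (\mathbf{G}_1'\mathbf{G}_1)^{-1}$$

where G₁ and G₂ are matrices whose typical rows are

$$\mathbf{g}_1 = \frac{\partial \hat{h}}{\partial \mathbf{b}_1} \quad \text{and} \quad \mathbf{g}_2 = \frac{\partial \hat{h}}{\partial \boldsymbol{\alpha}}$$

and

$$\hat{\boldsymbol{\Psi}} = diag\{\hat{e}_i^2\}_{n \times n}, \qquad \widehat{VAR}(\hat{\boldsymbol{\alpha}}) = \left[ \sum_{i=1}^{N} \frac{\phi(\mathbf{z}_i'\hat{\boldsymbol{\alpha}})^2}{\Phi(\mathbf{z}_i'\hat{\boldsymbol{\alpha}})\,(1 - \Phi(\mathbf{z}_i'\hat{\boldsymbol{\alpha}}))}\; \mathbf{z}_i \mathbf{z}_i' \right]^{-1}$$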

3.3 NWLS method

Since the variance of a Poisson distribution is λ, the conditional variance of y generally differs across observations. A weighted least-squares method can therefore gain considerable efficiency.

From Terza's paper, the conditional variance is given as

$$Var(e \mid \mathbf{x}, \mathbf{z}) = E[Var(y \mid \mathbf{x}, \mathbf{z}, \varepsilon)] + Var(E[y \mid \mathbf{x}, \mathbf{z}, \varepsilon]) = \delta\psi + \delta^2 \left[ \exp(\sigma^2)\,\psi_2 - \psi^2 \right]$$

where

$$\delta = \exp(\mathbf{x}\boldsymbol{\beta}^*), \qquad \psi = \frac{\Phi(\theta + \mathbf{z}\boldsymbol{\alpha})}{\Phi(\mathbf{z}\boldsymbol{\alpha})}, \qquad \psi_2 = \frac{\Phi(2\theta + \mathbf{z}\boldsymbol{\alpha})}{\Phi(\mathbf{z}\boldsymbol{\alpha})} \tag{25}$$

The parameters α, β* and θ can be obtained from the two-stage estimator, while σ² can be estimated by a regression approach or by a conditional maximum likelihood approach. The conditional maximum likelihood approach is reliable but computationally cumbersome. Therefore, the regression-based approach is used in this paper. One can rearrange the terms in Var(e) such that

$$r = a\,t + \text{error}, \quad \text{where } a = \exp(\sigma^2), \quad r = \hat{e}^2 - \hat{\delta}\hat{\psi} + \hat{\delta}^2\hat{\psi}^2, \quad t = \hat{\delta}^2\hat{\psi}_2$$

and δ̂, ψ̂ and ψ̂₂ are defined as in (25), evaluated at the two-stage estimates. A consistent estimator of σ² is

$$\hat{\sigma}^2 = \ln(\hat{a}), \quad \text{where } \hat{a} \text{ denotes the OLS estimate of } a \tag{26}$$

In some situations â is smaller than zero and the regression approach fails. In this paper, when a simulated data set leads to an â smaller than zero, the program stops and tries another simulated data set. This situation does not always occur, however. Given the computational burden of the conditional maximum likelihood approach, the regression-based approach is preferred.

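The NWLS estimates are obtained as

$$\hat{\mathbf{b}}_{NWLS} = (\hat{\boldsymbol{\beta}}^*_{NWLS}, \hat{\theta}_{NWLS}) = \arg\min_{\boldsymbol{\beta}^*,\, \theta}\; Q(\boldsymbol{\beta}^*, \theta)$$

where

$$Q(\boldsymbol{\beta}^*, \theta) = \sum_{i=1}^{n} \frac{e_i^2}{\hat{v}_i}, \qquad e_i = y_i - h(\mathbf{x}_i, \mathbf{z}_i, \hat{\boldsymbol{\alpha}}, \boldsymbol{\beta}^*, \theta), \qquad \hat{v}_i = \exp(\mathbf{x}_i\hat{\boldsymbol{\beta}}^*)\,\hat{\psi}_i + \exp(2\mathbf{x}_i\hat{\boldsymbol{\beta}}^*) \left[ \exp(\hat{\sigma}^2)\,\hat{\psi}_{2,i} - \hat{\psi}_i^2 \right]$$

and v̂_i is the estimate of the conditional variance of e_i, evaluated at the estimates α̂, β̂*, θ̂ and σ̂². The approximate distribution of the estimator is

$$\sqrt{n}\,(\hat{\mathbf{b}}_{NWLS} - \mathbf{b}_1) \xrightarrow{d} N[\mathbf{0}, \mathbf{D}^*], \quad \text{where } \mathbf{D}^* = E[(1/v)\,\mathbf{g}_1'\mathbf{g}_1]^{-1} \left( E[(1/v)\,\mathbf{g}_1'\mathbf{g}_1] + E[(1/v)\,\mathbf{g}_1'\mathbf{g}_2]\; VAR(\hat{\boldsymbol{\alpha}})\; E[(1/v)\,\mathbf{g}_2'\mathbf{g}_1] \right) E[(1/v)\,\mathbf{g}_1'\mathbf{g}_1]^{-1} \tag{27}$$

In practice, a consistent estimator of D* can be computed as

$$\hat{\mathbf{D}}^* = (\mathbf{G}_1'\hat{\boldsymbol{\Lambda}}^{-1}\mathbf{G}_1)^{-1} \left[ \mathbf{G}_1'\hat{\boldsymbol{\Lambda}}^{-1}\mathbf{G}_1 + \mathbf{G}_1'\hat{\boldsymbol{\Lambda}}^{-1}\mathbf{G}_2\; \widehat{VAR}(\hat{\boldsymbol{\alpha}})\; \mathbf{G}_2'\hat{\boldsymbol{\Lambda}}^{-1}\mathbf{G}_1 \right] (\mathbf{G}_1'\hat{\boldsymbol{\Lambda}}^{-1}\mathbf{G}_1)^{-1}, \quad \text{where } \hat{\boldsymbol{\Lambda}} = diag\{\hat{v}_i\}_{n \times n}$$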

3.4 Poisson regression model

For comparison, a standard Poisson regression model that uses only the observed elements of the simulated data is also estimated. It is of interest to see whether the FIML, TSM and NWLS estimators can handle the sample selection effect when it occurs, and whether they perform better than the standard Poisson regression estimator. The estimator of the standard Poisson regression model is the maximum likelihood estimator; the estimates are obtained by solving equation (3).

Since the data are simulated, it is also possible to use the whole data set rather than only the observed part. A Poisson regression model is therefore also fitted to the whole simulated data set.


4. Simulation design

The count dependent variable y_i, i = 1, 2, ..., 500, is generated from the conditional Poisson distribution, which is called the outcome equation:

$$f(y_i \mid \mathbf{x}_i, \varepsilon_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}$$

and the conditional mean function is

$$\lambda_i = \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i)$$

Not all y_i can be observed; this is decided by the following selection mechanism, which is called the selection equation:

$$y_i \text{ is } \begin{cases} \text{observed} & \text{if } \alpha_0 + \alpha_1 z_{1i} + \alpha_2 z_{2i} + \upsilon_i > 0 \\ \text{not observed} & \text{if } \alpha_0 + \alpha_1 z_{1i} + \alpha_2 z_{2i} + \upsilon_i \le 0 \end{cases}$$

The coefficient parameters in the model, (α₀, α₁, α₂, β₀, β₁, β₂), are set to different values across simulation designs. The variance-covariance matrix of the error terms ε_i and υ_i is set as

$$\Sigma = \begin{pmatrix} \sigma^2 & \sigma\rho \\ \sigma\rho & 1 \end{pmatrix}$$

In earlier papers, such as Oya (2005), the simulation set-ups differ only in the value of ρ. There are, however, more factors that might affect the performance of the estimators. In this paper, the simulation designs include more variation and examine four factors: the value of ρ, the value of σ, the value of λ, and whether the conditional mean equation has a variable in common with the selection equation.


The value of ρis the first factor that should be examine, and in theory the

estimates bias would not appears whenρis zero. So it should be expected in the

simulation results that, whenρis zero the estimates’ bias is not statistically

significant, or is the smallest one if other factors also cause estimates bias.

The second factor is the value of σ. The random error term presents the un-control

part of the regression model, so it also likely has influence on the performance of

estimators. On the other hand, the random error term in selection equation is not

changed, because the probit regression model has a constant variance of random

error term.

The third factor is the expected conditional mean, λ. A characteristic of the Poisson distribution is that its mean equals its variance, a property referred to as equidispersion (Winkelmann, 2008, p. 8). As the conditional mean increases, the conditional variance also becomes larger, which might influence the estimates. It should also be mentioned that the sample selection effect changes the conditional mean of the observed sub-sample relative to the whole population. Because of equidispersion, the conditional variance of the sub-sample then also differs from the conditional variance of the whole population. This may lead to a complicated interaction between the value of ρ and the value of λ.
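The equidispersion property can be checked numerically. The sketch below computes the mean and variance of a Poisson distribution directly from its probability mass function for the two values of λ used in this study; the function name `poisson_mean_var` and the truncation point `kmax` are illustrative assumptions.

```python
import math


def poisson_mean_var(lam, kmax=200):
    """Mean and variance of Poisson(lam) from its pmf, truncated at kmax."""
    pmf = math.exp(-lam)  # P(Y = 0)
    mean = 0.0
    second_moment = 0.0
    for k in range(1, kmax + 1):
        pmf *= lam / k  # P(Y = k) obtained from P(Y = k - 1)
        mean += k * pmf
        second_moment += k * k * pmf
    return mean, second_moment - mean ** 2


for lam in (4, 8):
    print(lam, poisson_mean_var(lam))
```

For both values the computed mean and variance agree with λ up to floating-point error.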

The last factor is whether or not the conditional mean equation has a common variable with the selection equation. This factor is more likely to affect the TSM estimator and the NWLS estimator. In Terza's paper, "the TSM estimator is a nonlinear least-squares analog to the popular Heckman estimator", and the NWLS estimator is a weighted TSM estimator. The TSM and NWLS estimators therefore both belong to the family of Heckman's two-stage estimators. Puhani (2002) points out that there are three main disadvantages or limitations of Heckman's estimator.

The first is that, in terms of prediction, a regression model using only the subsample can perform at least as well as Heckman's estimator or the FIML estimator. The second is the assumption that the random error terms are normally distributed. The last is the potential collinearity problem. The first two criticisms are not discussed here, since the simulation study in this paper does not examine the predictive power of the estimators, and the random error terms are assumed to be normally distributed. The potential collinearity problem comes from the fact that the inverse Mills ratio is roughly a linear function over a certain range. When most observations in a particular sample do not take extreme values, the inverse Mills ratio is approximately a linear function of the explanatory variables, and is therefore nearly collinear with them and the constant term. For more detailed explanations, see Puhani (2002). Little and Rubin (1987, p. 230) say that "for the (Heckman) method to work in practice, variables are needed in x₂ [that] are good predictors of y₂* and do not appear in x₁, that is, are not associated with y₁ when other covariates are controlled". In this paper, this means that the variables z₁ and z₂ should both be independent of x₁ and x₂.
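The near-linearity of the inverse Mills ratio can be illustrated with a short calculation. The sketch below assumes the ratio takes the usual form φ(c)/Φ(c) and evaluates it on an evenly spaced grid of index values between 0.2 and 1.4, the range implied by selection coefficients (0.2, 0.2, 0.4) and regressors in [0, 2]. The even grid and the helper names are illustrative simplifications, not the setup used in the study.

```python
import math


def inverse_mills(c):
    """Inverse Mills ratio phi(c) / Phi(c) for the standard normal."""
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    return pdf / cdf


def correlation(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Index values c = a0 + a1*z1 + a2*z2 lie between 0.2 and 1.4 here.
grid = [0.2 + 1.2 * i / 100 for i in range(101)]
imr = [inverse_mills(c) for c in grid]
print(correlation(grid, imr))
```

A correlation close to -1 over this range indicates that the ratio adds little variation beyond a linear function of the index, which is the source of the collinearity concern.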

In summary, there are four different simulation setups, and all the simulation results are used to examine the four factors. In each simulation case, ρ takes the values -0.8, -0.4, -0.2, 0, 0.2, 0.4, 0.8 and σ takes the values 0.3, 0.6, 1.0, 1.3, 1.6, 2.0. Each value of ρ is combined with each value of σ, so there are 42 different combinations in total. In case 1, the conditional mean is set to 8, and the conditional mean equation and the selection equation have a common variable, x₂ = z₂. In case 2, the conditional mean is set to 8, and the conditional mean equation and the selection equation do not have a common variable. In case 3, the conditional mean is set to 4, and the conditional mean equation and the selection equation have a common variable, x₂ = z₂. In case 4, the conditional mean is set to 4, and the conditional mean equation and the selection equation do not have a common variable.
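The full set of design points can be enumerated as follows. This is a minimal sketch: the names `RHOS`, `SIGMAS`, `CASES` and `design_grid` are illustrative, and the case definitions follow the description in the text (λ of 8 or 4, with or without a common variable).

```python
from itertools import product

RHOS = [-0.8, -0.4, -0.2, 0.0, 0.2, 0.4, 0.8]
SIGMAS = [0.3, 0.6, 1.0, 1.3, 1.6, 2.0]

# case number -> (conditional mean lambda, has common variable?)
CASES = {1: (8, True), 2: (8, False), 3: (4, True), 4: (4, False)}


def design_grid():
    """Return one dict per (case, rho, sigma) design point."""
    return [
        {"case": case, "lam": lam, "common": common, "rho": rho, "sigma": sigma}
        for (case, (lam, common)), rho, sigma in product(CASES.items(), RHOS, SIGMAS)
    ]


grid = design_grid()
print(len(grid))  # 4 cases x 7 values of rho x 6 values of sigma = 168
```

Within each case, σ varies fastest and ρ slowest, so each case contributes 42 consecutive design points.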

The explanatory variables x₁ and x₂ are generated from a uniform distribution over the interval between 0 and 2. z₂ is also generated from a uniform distribution over the interval between 0 and 2. z₁ is either set equal to x₁ or generated from a uniform distribution. In this paper, each simulation experiment is repeated 1000 times.
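One replication of the data-generating process could look like the sketch below. Several points are assumptions made for illustration rather than details taken from this section: the count is drawn as Poisson with mean exp(β₀ + β₁x₁ + β₂x₂ + ɛᵢ), the selection indicator is 1 when ɑ₀ + ɑ₁z₁ + ɑ₂z₂ + υᵢ > 0, the common variable is taken to be z₂ = x₂ as in the case descriptions, and the sample size is arbitrary. The function name `simulate_sample` is hypothetical.

```python
import numpy as np


def simulate_sample(n, beta, alpha, sigma, rho, common, seed=0):
    """Simulate one sample of size n (illustrative sketch, see assumptions)."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0.0, 2.0, n)
    x2 = rng.uniform(0.0, 2.0, n)
    z1 = rng.uniform(0.0, 2.0, n)
    # Assumption: the shared regressor, when present, is z2 = x2.
    z2 = x2.copy() if common else rng.uniform(0.0, 2.0, n)

    # Correlated errors: Var(eps) = sigma^2, Var(ups) = 1, Cov = rho * sigma.
    cov = [[sigma ** 2, rho * sigma], [rho * sigma, 1.0]]
    eps, ups = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    lam = np.exp(beta[0] + beta[1] * x1 + beta[2] * x2 + eps)
    y = rng.poisson(lam)  # count outcome for every unit
    d = (alpha[0] + alpha[1] * z1 + alpha[2] * z2 + ups > 0).astype(int)
    # y would be treated as observed only where d == 1.
    return {"y": y, "d": d, "x1": x1, "x2": x2, "z1": z1, "z2": z2}


sample = simulate_sample(
    n=1000, beta=(0.5794415, 0.6, 0.4), alpha=(0.2, 0.2, 0.4),
    sigma=1.0, rho=0.4, common=True,
)
print(sample["d"].mean())  # share of units selected into the observed sample
```

Repeating this with different seeds and applying each estimator to the selected observations would give the replications over which bias and MSE are computed.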

One detail should be mentioned: in order to test whether σ influences the performance of the estimators, σ takes different values. However, changing the value of σ also changes the expected conditional mean, E(λᵢ). This can be shown by the following equation:

$$E[\lambda_i] = E[\exp(x_i\beta + \varepsilon_i)] = \exp(x_i\beta)\,E[\exp(\varepsilon_i)] = \exp(x_i\beta)\exp\left(\tfrac{1}{2}\sigma^2\right) \tag{28}$$
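The last equality in equation (28) uses the mean of a log-normal variable. Assuming that the error term is normally distributed with mean zero and variance σ², the step can be written out by completing the square in the exponent:

```latex
\begin{aligned}
E[\exp(\varepsilon_i)]
  &= \int_{-\infty}^{\infty} \exp(\varepsilon)\,
     \frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\!\left(-\frac{\varepsilon^{2}}{2\sigma^{2}}\right) d\varepsilon \\
  &= \exp\!\left(\tfrac{1}{2}\sigma^{2}\right)
     \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\!\left(-\frac{(\varepsilon-\sigma^{2})^{2}}{2\sigma^{2}}\right) d\varepsilon \\
  &= \exp\!\left(\tfrac{1}{2}\sigma^{2}\right).
\end{aligned}
```

The remaining integral equals one because its integrand is the density of a normal distribution with mean σ² and variance σ².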

To keep the conditional mean the same for different values of σ, β₀ is adjusted so that the conditional mean is unchanged. The remaining coefficients are set to (ɑ₀, ɑ₁, ɑ₂, β₁, β₂) = (0.2, 0.2, 0.4, 0.6, 0.4) when λ is 8, and (ɑ₀, ɑ₁, ɑ₂, β₁, β₂) = (0.2, 0.2, 0.4, 0.3, 0.2) when λ is 4. Table 2 is a summary of the simulation setup.


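Table 2. Summary of the simulation setup

| | Case 1 | Case 2 | Case 3 | Case 4 |
|---|---|---|---|---|
| λ | 8 | 8 | 4 | 4 |
| x₁ | U(0,2) | U(0,2) | U(0,2) | U(0,2) |
| x₂ | U(0,2) | U(0,2) | U(0,2) | U(0,2) |
| z₁ | U(0,2) | U(0,2) | U(0,2) | U(0,2) |
| z₂ | = x₁ | U(0,2) | = x₁ | U(0,2) |
| (ɑ₀, ɑ₁, ɑ₂) | (0.2, 0.2, 0.4) | (0.2, 0.2, 0.4) | (0.2, 0.2, 0.4) | (0.2, 0.2, 0.4) |
| (β₁, β₂) | (0.6, 0.4) | (0.6, 0.4) | (0.3, 0.2) | (0.3, 0.2) |
| β₀, σ = 0.3 | 1.034442 | 1.034442 | 0.3412944 | 0.3412944 |
| β₀, σ = 0.6 | 0.8994415 | 0.8994415 | 0.2062944 | 0.2062944 |
| β₀, σ = 1.0 | 0.5794415 | 0.5794415 | -0.1137056 | -0.1137056 |
| β₀, σ = 1.3 | 0.2344415 | 0.2344415 | -0.4587056 | -0.4587056 |
| β₀, σ = 1.6 | -0.2005585 | -0.2005585 | -0.8937056 | -0.8937056 |
| β₀, σ = 2.0 | -0.9205585 | -0.9205585 | -1.613706 | -1.613706 |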
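The β₀ entries in Table 2 are numerically consistent with β₀ = ln λ - 1 - σ²/2. The sketch below reproduces them under that reading. The constant 1 is an inference from the tabulated numbers rather than a formula stated in the text; for (β₁, β₂) = (0.6, 0.4) it equals β₁x₁ + β₂x₂ evaluated at x₁ = x₂ = 1, the mean of U(0,2). The function name `beta0` and the parameter `offset` are illustrative.

```python
import math

SIGMAS = [0.3, 0.6, 1.0, 1.3, 1.6, 2.0]


def beta0(lam, sigma, offset=1.0):
    """Intercept solving exp(beta0 + offset + sigma^2 / 2) = lam.

    `offset` stands in for the contribution of beta1*x1 + beta2*x2; the
    value 1.0 is an assumption inferred from the numbers in Table 2.
    """
    return math.log(lam) - offset - 0.5 * sigma ** 2


for sigma in SIGMAS:
    print(sigma, round(beta0(8, sigma), 7), round(beta0(4, sigma), 7))
```

The printed values can be compared row by row with the λ = 8 and λ = 4 columns of Table 2.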


5. Simulation results

The simulation results are given in Appendix A, including the bias of all estimators, the standard deviation of all estimators, Student's t-test statistics on whether the bias is significantly different from zero, and the mean square error (MSE) of all estimators. The bias and standard deviation of the estimates of each coefficient in each simulation case are presented in Figures 1 to 12. The MSE of all estimators in every simulation case is presented in Figures 13 to 16. In Figures 1 to 12, the circles stand for the estimated bias of the estimates, and the dashed lines show the estimated bias plus/minus the standard deviation of the estimates. The circles are arranged on two levels: the first level is the value of ρ, and the second level is the value of σ. For each value of ρ, there are six values of σ: 0.3, 0.6, 1.0, 1.3, 1.6, 2.0. Circles 1 to 6 are estimates with ρ = -0.8; circles 7 to 12 are estimates with ρ = -0.4; circles 13 to 18 are estimates with ρ = -0.2; circles 19 to 24 are estimates with ρ = 0; circles 25 to 30 are estimates with ρ = 0.2; circles 31 to 36 are estimates with ρ = 0.4; circles 37 to 42 are estimates with ρ = 0.8.
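The summary measures and the ordering of the circles can be sketched as follows. The names `design_at` and `summarize` are illustrative. The t statistic is computed here as the bias divided by its standard error sd/√R, where R is the number of replications; this is an assumption, since the exact formula is not spelled out in this section.

```python
import math
from statistics import mean, stdev

RHOS = [-0.8, -0.4, -0.2, 0.0, 0.2, 0.4, 0.8]
SIGMAS = [0.3, 0.6, 1.0, 1.3, 1.6, 2.0]


def design_at(position):
    """Map a circle position (1 to 42) to its (rho, sigma) pair."""
    i = position - 1
    return RHOS[i // 6], SIGMAS[i % 6]


def summarize(estimates, true_value):
    """Bias, standard deviation, t statistic and MSE over replications."""
    r = len(estimates)
    bias = mean(estimates) - true_value
    sd = stdev(estimates)
    # Assumed form of the test of H0: bias = 0.
    t_stat = bias / (sd / math.sqrt(r))
    mse = mean([(e - true_value) ** 2 for e in estimates])
    return {"bias": bias, "sd": sd, "t": t_stat, "mse": mse}


print(design_at(19))  # first circle of the block with rho = 0
print(summarize([0.58, 0.61, 0.63, 0.57, 0.60], true_value=0.6))
```

The five estimates in the usage line are made-up numbers for illustration only, not results from the study.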