Örebro University School of Business
Master Thesis
Supervisor: Professor Sune Karlsson
Examiner: Lecturer Panagiotis Mantalos
Semester: 20112
A simulation study of Poisson Regression
model with sample selection effect
Zengyi Hao
1987-09-12
Abstract
1. Introduction
2. A review of adjusted Poisson regression models
2.1 truncation
2.2 censored
2.3 zero inflated count data
2.4 Under reporting model
2.5 endogenous switching and sample selection
3. Estimators under the sample selection effect
3.1 FIML estimator
3.2 TSM estimator
3.3 NWLS method
3.4 Poisson regression model
4. Simulation design
5. Simulation results
6. Comments on simulation results
6.1 The bias of estimates
6.1.1 The impact of σ and ρ on estimate bias
6.1.2 The impact of the common variable on estimate bias
6.1.3 The impact of λ on estimate bias
6.2 The Mean Square Error (MSE)
6.2.1 The impact of σ and ρ on MSE
6.2.2 The impact of common variable
6.2.3 The impact of λ
7. The conclusion
Reference
List of Tables
Table 1 The average percentage that the observed y is larger than unobserved y
Table 2 The summary of simulation set up
Table 3 FIML estimator, λ=8 and has common variable
Table 4 TSM estimator, λ=8 and has common variable
Table 5 NWLS estimator, λ=8 and has common variable
Table 6 Poisson_s estimator, λ=8 and has common variable
Table 7 Poisson_f estimator, λ=8 and has common variable
Table 8 FIML estimator, λ=8 and does not have common variable
Table 9 TSM estimator, λ=8 and does not have common variable
Table 10 NWLS estimator, λ=8 and does not have common variable
Table 11 Poisson_s estimator, λ=8 and does not have common variable
Table 12 Poisson_f estimator, λ=8 and does not have common variable
Table 13 FIML estimator, λ=4 and has common variable
Table 14 TSM estimator, λ=4 and has common variable
Table 15 NWLS estimator, λ=4 and has common variable
Table 16 Poisson_s estimator, λ=4 and has common variable
Table 17 Poisson_f estimator, λ=4 and has common variable
Table 18 FIML estimator, λ=4 and does not have common variable
Table 19 TSM estimator, λ=4 and does not have common variable
Table 20 NWLS estimator, λ=4 and does not have common variable
Table 21 Poisson_s estimator, λ=4 and does not have common variable
List of Figures
Figure 1 The estimates bias and standard deviation of β0 in case 1
Figure 2 The estimates bias and standard deviation of β1 in case 1
Figure 3 The estimates bias and standard deviation of β2 in case 1
Figure 4 The estimates bias and standard deviation of β0 in case 2
Figure 5 The estimates bias and standard deviation of β1 in case 2
Figure 6 The estimates bias and standard deviation of β2 in case 2
Figure 7 The estimates bias and standard deviation of β0 in case 3
Figure 8 The estimates bias and standard deviation of β1 in case 3
Figure 9 The estimates bias and standard deviation of β2 in case 3
Figure 10 The estimates bias and standard deviation of β0 in case 4
Figure 11 The estimates bias and standard deviation of β1 in case 4
Figure 12 The estimates bias and standard deviation of β2 in case 4
Figure 13 Relative MSE, TSM estimator as the benchmark, in case 1
Figure 14 Relative MSE, TSM estimator as the benchmark, in case 2
Figure 15 Relative MSE, TSM estimator as the benchmark, in case 3
Abstract
Keywords: Poisson regression model, sample selection effect
This paper examines the properties of estimators of the Poisson regression model
with a sample selection effect. The natural choice for estimating this model is the
full information maximum likelihood (FIML) method. However, like maximum
likelihood in general, the FIML method is not robust to a misspecified
distribution, and the FIML estimator is computationally burdensome. Two
alternatives are the two-stage method of moments (TSM) estimator, which is
usually robust, and the nonlinear weighted least-squares (NWLS) estimator, which
is robust and more efficient. This paper compares the finite-sample properties of
these estimators with those of the standard Poisson regression estimator. The
simulation results imply that there is no simple rule for choosing the best
estimator. The variance of the random error term in the Poisson equation has a
significant influence on the performance of the estimators: the larger the
variance, the larger the bias and standard deviation of the estimators.
1. Introduction
In practice one often needs to explain a non-negative integer variable. A
government may want to know the determinants of the number of children in a
family, and a car insurance company may want to know the expected number of
accidents given some properties of a car. For these purposes count data
regression models play a crucial role, and the Poisson regression model is one of
the most widely used models in applications. The Poisson distribution is
given as
$$p(y_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!} \tag{1}$$
where $\lambda_i$ is the parameter of the Poisson distribution, which is a function of some
explanatory variables $x_i$. Usually this function takes the exponential form
$$\lambda_i = \exp(x_i'\beta)$$
Then the conditional mean of y is
$$E[y_i \mid x_i] = \exp(x_i'\beta)$$
This is also the conditional variance of y, since the mean and the variance of a
Poisson distribution are equal.
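As a quick numerical check of the equality of mean and variance, the following minimal sketch evaluates the Poisson probabilities for a single rate. The regressor values and coefficients are hypothetical and serve only as an illustration.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability exp(-lam) * lam**y / y!, evaluated on the log scale."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def poisson_rate(x, beta):
    """Exponential link: lam = exp(x'beta)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

x = [1.0, 0.5]      # hypothetical regressors: a constant and one covariate
beta = [1.0, 0.8]   # hypothetical coefficients
lam = poisson_rate(x, beta)

support = range(200)  # the probability mass beyond 200 is negligible for this rate
total = sum(poisson_pmf(y, lam) for y in support)
mean = sum(y * poisson_pmf(y, lam) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)
print(f"lambda={lam:.4f} mean={mean:.4f} variance={var:.4f}")
```

Working on the log scale with `math.lgamma` avoids overflow in the factorial for large counts.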
Under general regularity conditions, the maximum likelihood (ML) estimator is
preferred because it is consistent and asymptotically more efficient than other
estimators. The log-likelihood function is
$$\ln L(\beta \mid Y, X) = -\sum_{i=1}^{n} \exp(x_i'\beta) + \sum_{i=1}^{n} y_i\, x_i'\beta - \sum_{i=1}^{n} \ln(y_i!) \tag{2}$$
In order to maximize the log-likelihood function the first order condition should
be satisfied:
$$\frac{d \ln L}{d\beta_j} = \sum_{i=1}^{n} \left[ y_i - \exp(x_i'\beta) \right] x_{ij} = 0 \tag{3}$$
The solution is easily found by numerical methods, since the Hessian matrix is
negative definite and the log-likelihood function is therefore globally concave:
$$\frac{d^2 \ln L}{d\beta_j\, d\beta_k} = -\sum_{i=1}^{n} \exp(x_i'\beta)\, x_{ij}\, x_{ik} \tag{4}$$
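The first-order condition (3) and the Hessian (4) are all that a Newton-Raphson iteration needs. The following is a minimal sketch for a model with a constant and one regressor; the data are made up for illustration and the function names are not taken from any library.

```python
import math

def poisson_score(xs, ys, b0, b1):
    """Score vector of equation (3) for a constant and one regressor."""
    g0 = sum(y - math.exp(b0 + b1 * x) for x, y in zip(xs, ys))
    g1 = sum((y - math.exp(b0 + b1 * x)) * x for x, y in zip(xs, ys))
    return g0, g1

def poisson_newton(xs, ys, iterations=25):
    """Newton-Raphson updates beta <- beta - H^{-1} g with the Hessian of equation (4)."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the constant-only fit
    for _ in range(iterations):
        g0, g1 = poisson_score(xs, ys, b0, b1)
        mus = [math.exp(b0 + b1 * x) for x in xs]
        h00 = -sum(mus)
        h01 = -sum(mu * x for mu, x in zip(mus, xs))
        h11 = -sum(mu * x * x for mu, x in zip(mus, xs))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = g by writing out the inverse of H.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - step0, b1 - step1
    return b0, b1

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]  # made-up regressor values
ys = [1, 2, 2, 4, 3, 6, 9, 12]                 # made-up counts
b0_hat, b1_hat = poisson_newton(xs, ys)
print(b0_hat, b1_hat, poisson_score(xs, ys, b0_hat, b1_hat))
```

At the returned estimates the score is numerically zero, which is the condition stated in equation (3).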
Generally speaking, the ML estimator is not robust when the distribution or the
conditional distribution is misspecified. In that situation the ML estimator can
be substantially biased.
In general, there are two strategies for obtaining an estimator that is more
robust to misspecification of the distribution. The first is to identify an adjusted
probability density function and the corresponding model. The most common
models are the censoring model (Famoye and Wang, 2004), the truncation model
(Grogger and Carson, 1991), the hurdle model (Mullahy, 1986), the zero-inflated
model (Lambert, 1992), and the count regression model with endogenous switching
and sample selection (Terza, 1998). The second strategy is to apply estimators that
are more robust than the ML estimator, such as the two-stage method, the nonlinear
weighted least-squares method, the generalized method of moments and some
nonparametric methods. This paper focuses on the second strategy, that is to say,
on the estimators that Terza (1998) introduces. Terza's model can handle both
endogenous switching and sample selection effects; he gives the details of the
estimators and offers an application to endogenous vehicle ownership.
Oya (2005) uses Monte Carlo simulation to examine the properties of the
estimators in Terza's model with endogenous switching. Furthermore, Oya relaxes
the assumption on the random error term in Terza's model and tests these estimators'
It is helpful to begin with a practical example that illustrates how sample
selection arises. Gronau (1974) and Heckman (1974) first described the sample
selection effect and selection bias when studying the determinants of wages and
the labor supply behavior of women. Suppose one surveys a sample of women of
whom only some have a job and report a wage. One is interested in how a woman's
characteristics influence the wage she receives. Selection bias arises if workers
and non-workers differ in certain properties. For a clearer explanation, we divide
the characteristics into two groups: observable characteristics and unobservable
characteristics. If the two groups of women have similar characteristics, or if the
decision whether to work is independent of a woman's characteristics, there is no
reason to suspect a selection bias problem. However, whether or not to work
generally depends on a woman's characteristics, for example the number of
children and the educational background (Heckman, 1974). The decision to work is
then not random and, as a consequence, the working and non-working
subpopulations potentially have different characteristics. When the characteristics
that drive this decision also determine a woman's wage, the sample selection effect
arises and selection bias affects the estimator. Here one needs to pay attention to
the kind of characteristics involved, because the two groups of characteristics
differ in how they give rise to a sample selection effect. In the unrealistic
situation in which only observable characteristics determine both the decision to
work and the wage of a working woman, one can add appropriate independent
variables and the selection bias is controlled. In most cases, both observable and
unobservable characteristics affect the wage and the decision to work. Since one
cannot add independent variables to control for unobservable characteristics
(otherwise they would be observable), inference in the model is incorrect and the
estimator is biased.
Theoretically, Terza's model is able to deal with both endogenous switching and
sample selection effects; however, the two cases may differ in practice. One
possible reason is that under endogenous switching all observations can be used,
whereas under sample selection part of the observations cannot be observed. To
some extent, missing values or unobserved elements of the population are more
harmful to the estimated model. This paper therefore aims to examine the
properties of these estimators under sample selection effects.
This paper mainly focuses on the sample selection model in which the count
variable is assumed to follow a Poisson distribution. Section 2 reviews adjusted
Poisson regression models. Section 3 presents the estimators under the sample
selection effect. Section 4 gives the simulation design. Section 5 shows the
simulation results, and section 6 comments on them. Section 7 concludes.
2. A review of adjusted Poisson regression models
2.1 truncation
Grogger and Carson (1991) find that when sample selection rules lead to
truncated counts in the dependent variable, substantial estimation bias results,
especially for the Poisson regression model. Assuming the dependent variable is
truncated at zero, the Poisson regression model is derived as:
$$\Pr(Y_i = y_i \mid y_i > 0) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!\,[1 - \Pr(y_i = 0)]} = \frac{\lambda_i^{y_i}}{[\exp(\lambda_i) - 1]\; y_i!} \tag{5}$$
where $\lambda_i = \exp(x_i\beta)$. The maximum likelihood estimator is then obtained by
maximizing the log-likelihood function:
$$\ln L = \sum_{i=1}^{m} \left\{ y_i\, x_i\beta - \ln[\exp(\lambda_i) - 1] - \ln(y_i!) \right\} \tag{6}$$
where m is the size of the truncated sample. The last term of the log-likelihood
function can be ignored since it does not involve the parameters.
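The zero-truncated probabilities of equation (5) and the log-likelihood of equation (6) can be sketched as follows. This is an illustrative sketch under the assumption of a constant and a single regressor; the rate 1.7 and the function names are hypothetical.

```python
import math

def truncated_poisson_logpmf(y, lam):
    """log Prob(Y = y | Y > 0) = y*log(lam) - log(exp(lam) - 1) - log(y!), as in equation (5)."""
    if y < 1:
        raise ValueError("the zero-truncated support starts at 1")
    return y * math.log(lam) - math.log(math.expm1(lam)) - math.lgamma(y + 1)

def truncated_loglik(beta, xs, ys):
    """Equation (6) for a constant plus one regressor: lam_i = exp(b0 + b1 * x_i)."""
    b0, b1 = beta
    return sum(truncated_poisson_logpmf(y, math.exp(b0 + b1 * x)) for x, y in zip(xs, ys))

lam = 1.7  # hypothetical rate
total = sum(math.exp(truncated_poisson_logpmf(y, lam)) for y in range(1, 150))
trunc_mean = sum(y * math.exp(truncated_poisson_logpmf(y, lam)) for y in range(1, 150))
print(total, trunc_mean, lam / (1.0 - math.exp(-lam)))
```

The truncated probabilities sum to one over y = 1, 2, ..., and the truncated mean exceeds λ, which is why ignoring the truncation biases the Poisson estimates.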
2.2 censored
Famoye and Wang (2004) introduce a censored generalized Poisson regression
(CGPR) model, which can deal with censored data and with over- or
under-dispersion. The model assumes that the non-negative integer dependent
variable Y follows a generalized Poisson distribution, which means
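$$\Pr(Y_i = y_i) = \left( \frac{\theta_i}{1+\alpha\theta_i} \right)^{y_i} \frac{(1+\alpha y_i)^{y_i-1}}{y_i!} \exp\!\left[ -\frac{\theta_i (1+\alpha y_i)}{1+\alpha\theta_i} \right] \tag{7}$$
and
$$E(Y_i \mid x_i) = \theta_i, \qquad \mathrm{Var}(Y_i \mid x_i) = \theta_i (1+\alpha\theta_i)^2$$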
where $\theta_i$ is a function of the independent variables, such as $\theta_i = \exp(x_i\beta)$.
Suppose the dependent variable is censored at $y^*$, so that every value larger than
or equal to $y^*$ is recorded as $y^*$. The probability function is then:
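$$\Pr(Y_i = y_i \mid x_i) = \begin{cases} \left( \dfrac{\theta_i}{1+\alpha\theta_i} \right)^{y_i} \dfrac{(1+\alpha y_i)^{y_i-1}}{y_i!} \exp\!\left[ -\dfrac{\theta_i (1+\alpha y_i)}{1+\alpha\theta_i} \right] & \text{if } y_i < y^* \\[2ex] \Pr(Y_i \ge y^* \mid x_i) = 1 - \sum_{k=0}^{y^*-1} \Pr(Y_i = k \mid x_i) = 1 - F(y^*-1 \mid x_i) & \text{if } y_i \ge y^* \end{cases} \tag{8}$$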
Then the likelihood function of sample (Y, X) under censored generalized
Poisson regression is
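$$L(\alpha, \beta \mid Y, X) = \prod_{\{i \mid y_i < y^*\}} \left( \frac{\theta_i}{1+\alpha\theta_i} \right)^{y_i} \frac{(1+\alpha y_i)^{y_i-1}}{y_i!} \exp\!\left[ -\frac{\theta_i (1+\alpha y_i)}{1+\alpha\theta_i} \right] \prod_{\{k \mid y_k \ge y^*\}} \left[ 1 - F(y^*-1 \mid x_k) \right] \tag{9}$$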
The maximum likelihood estimator is obtained by maximizing the
likelihood function or, equivalently, the log-likelihood function.
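A small numerical sketch may help to see how the censoring point collects the upper tail. It assumes the generalized Poisson probability function of Famoye and Wang (2004) with mean θ and variance θ(1+αθ)²; the parameter values and function names are hypothetical.

```python
import math

def gp_logpmf(y, theta, alpha):
    """Generalized Poisson log-probability; alpha = 0 gives the ordinary Poisson."""
    return (y * math.log(theta / (1 + alpha * theta))
            + (y - 1) * math.log(1 + alpha * y)
            - math.lgamma(y + 1)
            - theta * (1 + alpha * y) / (1 + alpha * theta))

def censored_gp_pmf(y, theta, alpha, y_star):
    """Ordinary mass below y_star; the whole upper tail is placed at y_star."""
    if y < y_star:
        return math.exp(gp_logpmf(y, theta, alpha))
    if y == y_star:
        return 1.0 - sum(math.exp(gp_logpmf(k, theta, alpha)) for k in range(y_star))
    return 0.0

theta, alpha, y_star = 2.0, 0.1, 5  # hypothetical mean, dispersion and censoring point
gp_total = sum(math.exp(gp_logpmf(y, theta, alpha)) for y in range(400))
gp_mean = sum(y * math.exp(gp_logpmf(y, theta, alpha)) for y in range(400))
censored_total = sum(censored_gp_pmf(y, theta, alpha, y_star) for y in range(y_star + 1))
print(gp_total, gp_mean, censored_total)
```

The censored probabilities sum to one over 0, 1, ..., y*, so the likelihood needs only the uncensored probabilities and the tail probability at the censoring point.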
2.3 zero inflated count data
In practice, survey data often contain more observations of a certain value,
usually zero, than the Poisson model predicts. This makes the conditional
variance larger than the conditional mean, that is, the data are over-dispersed.
One reason for an excess of zeros is a selection process that combines a binary
distribution with a Poisson distribution. Such a process is plausible because some
survey questions can produce the same answer for two different reasons. A survey,
for example, asks selected families how many children they have. If a family
answers zero, it could mean that the family does not want a child, or that it wants
one or more children but does not have any yet. These two kinds of family have
different properties even though they give the same answer. A model that can
handle this problem is the zero-inflated Poisson (ZIP) model (Lambert, 1992).
This model implies that an observed zero has two possible sources: it might come
from a binary distribution or from a Poisson distribution. The model can be
written as:
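$$y_i \sim \begin{cases} \text{binary distribution (a zero)} & \text{with probability } q_i \\ \text{Poisson}(\lambda_i) \equiv f(y_i) & \text{with probability } 1 - q_i \end{cases}$$
where
$$\lambda_i = \exp(x_i\beta), \qquad q_i = \frac{\exp(w_i\gamma)}{1 + \exp(w_i\gamma)}$$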
Here $w_i$ is a vector of variables that explains the probability $q_i$, and $w_i\gamma$ is set
to a constant times $x_i\beta$. The probability function of y is then
$$p(y_i) = \begin{cases} q_i + (1 - q_i)\, f(0) & \text{if } y_i = 0 \\ (1 - q_i)\, f(y_i) & \text{otherwise} \end{cases} \tag{10}$$
The likelihood function is
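$$\begin{aligned} L(\beta, \gamma \mid Y, X, W) &= \prod_{i \in A} p(y_i \mid x_i, w_i, \beta, \gamma) \\ &= \prod_{i \in A_0} \left[ \frac{\exp(w_i\gamma)}{1+\exp(w_i\gamma)} + \frac{\exp[-\exp(x_i\beta)]}{1+\exp(w_i\gamma)} \right] \prod_{i \in A \setminus A_0} \frac{\exp[-\exp(x_i\beta)]\, [\exp(x_i\beta)]^{y_i}}{[1+\exp(w_i\gamma)]\; y_i!} \end{aligned} \tag{11}$$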
where A denotes the set of all sample observations and $A_0$ contains the
observations for which y is zero. The maximum likelihood estimates are obtained
by maximizing the log-likelihood function.
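The probability function (10) and the log of the likelihood (11) translate directly into code. The sketch below assumes scalar $x_i$ and $w_i$ without a constant term for brevity; all numerical values are hypothetical.

```python
import math

def zip_pmf(y, lam, q):
    """Equation (10): a zero arises with probability q, otherwise y ~ Poisson(lam)."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return q + (1 - q) * poisson if y == 0 else (1 - q) * poisson

def zip_loglik(beta, gamma, xs, ws, ys):
    """Log of equation (11) with scalar x_i and w_i."""
    total = 0.0
    for x, w, y in zip(xs, ws, ys):
        lam = math.exp(x * beta)
        q = math.exp(w * gamma) / (1 + math.exp(w * gamma))
        total += math.log(zip_pmf(y, lam, q))
    return total

lam, q = 3.0, 0.25  # hypothetical Poisson rate and zero-inflation probability
zip_total = sum(zip_pmf(y, lam, q) for y in range(200))
zip_mean = sum(y * zip_pmf(y, lam, q) for y in range(200))
print(zip_total, zip_mean, zip_pmf(0, lam, q), math.exp(-lam))
```

The zero probability of the ZIP model is larger than the Poisson zero probability exp(-λ), and the mean shrinks to (1-q)λ.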
2.4 Under reporting model
The under-reporting sample selection effect arises when there is a reporting
mechanism. Suppose every survey element is expected to submit a report for every
event, and that $y_i^*$ events occur for the ith survey element. Let $u_{ij}$ denote the
utility that the ith survey element derives from reporting its jth event, and
assume that the utility can be modeled as:
$$u_{ij} = z_i'\alpha + \varepsilon_i$$
Here the utility is assumed to be constant across all events of the ith survey
element. An indicator variable, $d_{ij}$, is defined as
$$d_{ij} = \begin{cases} 1 & \text{if } u_{ij} > 0 \\ 0 & \text{otherwise} \end{cases}$$
Then the ith survey element submits $y_i$ reports, where
$$y_i = \sum_{j=1}^{y_i^*} d_{ij}$$
and
$$f(y_i) = \sum_{k=0}^{\infty} p(y_i \mid y_i^* = k)\; p(y_i^* = k) \tag{12}$$
where, conditional on $y_i^* = k$, $y_i$ follows a binomial distribution:
$$(y_i \mid y_i^* = k) \sim \text{Binomial}\big(k,\ \Pr(z_i'\alpha + \varepsilon_i > 0)\big)$$
Winkelmann and Zimmermann (1993) complete the model by assuming that $y_i^*$
follows a Poisson distribution with mean $\exp(x_i'\beta)$ and that ε follows a
logistic distribution. Under these assumptions
$$f(y_i \mid x_i, z_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!} \tag{13}$$
where
$$\lambda_i = \exp(x_i'\beta)\, \frac{\exp(z_i'\gamma)}{1 + \exp(z_i'\gamma)}$$
Further they provide the maximum likelihood estimates.
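That the mixture in equation (12) collapses to the Poisson form (13) can be checked numerically: a Poisson count whose events are each reported independently with probability p is again Poisson, with the mean scaled by p. The sketch below uses hypothetical values for the latent mean and the reporting index.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def binomial_pmf(y, k, p):
    return math.comb(k, y) * p ** y * (1.0 - p) ** (k - y)

def reported_pmf(y, lam_star, p, k_max=200):
    """Equation (12): sum over the latent count k of Binomial(y | k, p) * Poisson(k | lam_star).

    Terms with k < y are zero, so the sum starts at k = y; k_max truncates the infinite sum.
    """
    return sum(binomial_pmf(y, k, p) * poisson_pmf(k, lam_star) for k in range(y, k_max))

lam_star = 5.0                                  # hypothetical latent mean exp(x'beta)
p_report = math.exp(0.4) / (1 + math.exp(0.4))  # hypothetical logistic reporting probability
for y in range(4):
    print(y, reported_pmf(y, lam_star, p_report), poisson_pmf(y, lam_star * p_report))
```

The two printed columns agree, which is the content of equation (13): the reported counts are Poisson with mean equal to the latent mean times the logistic reporting probability.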
2.5 endogenous switching and sample selection
Terza (1998) proposes a model and three estimators that deal with both sample
selection and endogenous switching. The model consists of two parts: a Poisson
equation that describes how the independent variables influence the count
dependent variable, and a selection equation that describes whether an element of
the population is observed or whether it is affected by a treatment. The
endogenous switching model is given as
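$$p(y_i \mid \varepsilon_i) = \frac{\exp(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}, \qquad \lambda_i = \exp(x_i'\alpha + \beta c_i + \varepsilon_i) \tag{14}$$
$$c_i = \begin{cases} 1 & \text{if } \gamma_1 + \gamma_2 z_i + \upsilon_i > 0 \\ 0 & \text{if } \gamma_1 + \gamma_2 z_i + \upsilon_i \le 0 \end{cases}$$
where
$$(\varepsilon_i, \upsilon_i) \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2 & \sigma\rho \\ \sigma\rho & 1 \end{pmatrix}$$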
Here the conditional mean of the dependent variable y is affected by a
specification error, and this error is correlated with the random error term in the
selection equation. Oya (2005) uses Monte Carlo simulation to examine the
finite-sample properties of Terza's estimators under endogenous switching. Oya's
simulation includes three cases. In case 0, the random error terms are correctly
specified but an invalid constraint, ρ=0, is imposed. The simulation results show
that the larger the difference between 0 and the true value of ρ, the larger the bias
of the estimators. The standard deviations of the FIML estimator are the smallest
and those of the TSM estimator are the largest. In addition, as the true value of ρ
decreases from 1 to -1, the standard deviations of the NWLS estimator become
larger. In case 1, the random error terms are correctly specified and no constraint
is imposed on ρ. In this case the FIML estimator has the smallest bias and
standard deviations, and those of the TSM estimator are the largest. On the other
hand, the properties of the estimates of α0, α1, β0, β1 and σ are highly similar,
except for those of ρ. In case 2, the random error terms are misspecified: a
gamma-distributed random error term is wrongly specified as normally
distributed. In this situation the results are similar to those of case 1.
3. Estimators under the sample selection effect
Suppose the count variable y is the dependent variable and is assumed to follow
a Poisson distribution. The parameter of the Poisson distribution is
determined by the equation
$$\lambda = \exp(x\beta + \varepsilon)$$
Here x is a vector of exogenous independent variables including a constant term,
and ε is a random error. For some reason not all y can be observed; whether y is
observed depends on the following equation
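$$y \text{ is } \begin{cases} \text{observed} & \text{if } z\alpha + \upsilon > 0 \\ \text{unobserved} & \text{if } z\alpha + \upsilon \le 0 \end{cases}$$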
where z is a vector of exogenous variables including a constant term, and υ is a
random error. If ε and υ are correlated, sample selection effects arise: the value
of y, which partially depends on ε, is related to whether y is observed. For
example, when ε and υ are positively correlated, a large υ tends to come with a
large ε. Since a large υ makes y more likely to be observed and a large ε generally
leads to a large value of y, large values of y are more likely to be observed. In
other words, large values of y are over-represented in the survey data relative to
small values. The result is a non-random sample, even if the survey is based on a
random design.
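The mechanism can be illustrated with a small simulation. The sketch below is not the simulation design of this paper; the coefficients, sample size and seed are hypothetical, and the Poisson draws use Knuth's multiplication method because the Python standard library has no Poisson sampler.

```python
import math
import random

def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate(rho, sigma=0.3, n=20000, seed=1):
    """Return the mean of y among observed and among unobserved elements."""
    rng = random.Random(seed)
    observed, unobserved = [], []
    for _ in range(n):
        x, z = rng.gauss(0, 1), rng.gauss(0, 1)
        v = rng.gauss(0, 1)  # selection error with unit variance
        # eps has variance sigma**2 and correlation rho with v
        eps = sigma * (rho * v + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
        y = draw_poisson(math.exp(1.0 + 0.5 * x + eps), rng)  # hypothetical beta = (1.0, 0.5)
        if 0.5 + 0.5 * z + v > 0:                             # hypothetical alpha = (0.5, 0.5)
            observed.append(y)
        else:
            unobserved.append(y)
    return sum(observed) / len(observed), sum(unobserved) / len(unobserved)

for rho in (-0.8, 0.0, 0.8):
    print(rho, simulate(rho))
```

With a positive ρ the observed elements have the larger mean, and with a negative ρ the smaller one, although x and z are drawn independently.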
Under Terza's model three estimators can be used. The following subsections
give the formulas of the three estimators under sample selection effects.
3.1 FIML estimator
Assume that ε and υ jointly follow a bivariate normal distribution,
$$(\varepsilon, \upsilon) \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2 & \sigma\rho \\ \sigma\rho & 1 \end{pmatrix}$$
The unconditional joint discrete density for an observed y is given as
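$$\begin{aligned} P(y, d=1 \mid x, z) &= \int_{-\infty}^{\infty} P(y, d=1 \mid x, z, \varepsilon)\, f(\varepsilon)\, d\varepsilon \\ &= \int_{-\infty}^{\infty} p(y \mid d=1, x, z, \varepsilon)\, \Pr(d=1 \mid z, \varepsilon)\, f(\varepsilon)\, d\varepsilon \\ &= \int_{-\infty}^{\infty} p(y \mid d=1, x, z, \varepsilon)\, [1 - \Pr(d=0 \mid x, z, \varepsilon)]\, f(\varepsilon)\, d\varepsilon \\ &= \int_{-\infty}^{\infty} p(y \mid d=1, x, z, \varepsilon)\, \Phi\!\left( \frac{z\alpha + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}} \right) f(\varepsilon)\, d\varepsilon \end{aligned} \tag{15}$$
where d=1 if y is observed and d=0 otherwise, f(ε) is the density of ε and Φ is
the standard normal cdf.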
By exploiting the symmetry of the normal cdf, the probability that d=0 is
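$$\Pr(d=0 \mid x, z, \varepsilon) = \Phi\!\left( -\frac{z\alpha + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}} \right) \tag{16}$$
and
$$\Pr(d=0 \mid x, z) = \int_{-\infty}^{\infty} \Phi\!\left( -\frac{z\alpha + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}} \right) f(\varepsilon)\, d\varepsilon$$
Both cases can be combined in one expression:
$$\begin{aligned} P(y, d \mid x, z) &= \int_{-\infty}^{\infty} \left[ (1-d) + d\, P(y \mid x, \varepsilon) \right] \Phi\!\left( (2d-1)\, \frac{z\alpha + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}} \right) f(\varepsilon)\, d\varepsilon \\ &= \int_{-\infty}^{\infty} \left[ (1-d) + d\, \frac{\exp(x\beta+\varepsilon)^{y} \exp(-\exp(x\beta+\varepsilon))}{y!} \right] \Phi\!\left( (2d-1)\, \frac{z\alpha + (\rho/\sigma)\varepsilon}{\sqrt{1-\rho^2}} \right) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{\varepsilon^2}{2\sigma^2} \right) d\varepsilon \end{aligned} \tag{17}$$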
This integral can be approximated by Hermite quadrature. The Hermite quadrature
formula is efficient when the integrand has the particular form
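$$\int_{-\infty}^{\infty} g(x)\, dx = \int_{-\infty}^{\infty} \exp(-x^2)\, f(x)\, dx \tag{18}$$
and
$$\int_{-\infty}^{\infty} \exp(-x^2)\, f(x)\, dx \approx \sum_{i=1}^{n} w_i\, f(x_i), \qquad \text{where } w_i = \frac{2^{n-1}\, n!\, \sqrt{\pi}}{n^2\, [H_{n-1}(x_i)]^2}$$
and the $x_i$ are some chosen points (the roots of the Hermite polynomial $H_n$).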
Butler and Moffitt (1982) report that the accuracy of Hermite quadrature is
sufficient when n is 3 or 4. In this paper n is therefore chosen to be 3, and the
corresponding values of x and w are x = (-1.224744, 0, 1.224744) and
w = (0.295408, 1.181635, 0.295408) (Beyer, 1987). In order to apply Hermite
quadrature, the likelihood contribution has to be transformed into this
special form; after the transformation the likelihood contribution is given as
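$$P(y, d \mid x, z) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \left[ (1-d) + d\, \frac{\exp(x\beta + \sqrt{2}\sigma\xi)^{y} \exp(-\exp(x\beta + \sqrt{2}\sigma\xi))}{y!} \right] \Phi\!\left( (2d-1)\, \frac{z\alpha + \sqrt{2}\rho\xi}{\sqrt{1-\rho^2}} \right) \exp(-\xi^2)\, d\xi \tag{19}$$
where $\xi = \varepsilon / (\sqrt{2}\sigma)$.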
The conditional likelihood function is then easily computed, and the full
information maximum likelihood (FIML) estimates are obtained by maximizing the
conditional likelihood function.
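The three-point rule can be sketched in a few lines. The nodes and weights are the ones quoted above; the likelihood contribution follows the transformed integral of equation (19) with the substitution ε = √2σξ, and the argument names `xb` and `za` (for xβ and zα) as well as the numerical values are hypothetical.

```python
import math

NODES = (-1.224744, 0.0, 1.224744)        # three-point nodes quoted in the text
WEIGHTS = (0.295408, 1.181635, 0.295408)  # matching weights

def gauss_hermite(f):
    """Approximate the integral of exp(-t**2) * f(t) over the real line."""
    return sum(w * f(t) for t, w in zip(NODES, WEIGHTS))

def normal_cdf(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def likelihood_contribution(y, d, xb, za, sigma, rho):
    """Three-point approximation to equation (19) for one observation."""
    def integrand(t):
        lam = math.exp(xb + math.sqrt(2.0) * sigma * t)
        # Poisson probability for an observed y (d = 1); for d = 0 the count is unobserved
        count_part = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1)) if d == 1 else 1.0
        index = (za + math.sqrt(2.0) * rho * t) / math.sqrt(1.0 - rho ** 2)
        return count_part * normal_cdf((2 * d - 1) * index)
    return gauss_hermite(integrand) / math.sqrt(math.pi)

print(gauss_hermite(lambda t: 1.0), math.sqrt(math.pi))
print(likelihood_contribution(3, 1, 1.0, 0.5, 0.3, 0.4))
```

A simple sanity check is that the contributions of d=0 and of d=1 over all counts y add up to approximately one, since the weights sum to √π up to rounding.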
3.2 TSM estimator
The FIML estimator is not robust when the distribution of y is not correctly
specified. A more robust estimator is the two-stage method of moments (TSM)
estimator. This estimator only assumes that the conditional mean of y is
$$E[y \mid x, z, d, \varepsilon] = \exp(x\beta + \varepsilon)$$
This assumption on the conditional mean is the same as in Terza's paper, and the
random error terms have the same bivariate normal distribution, so the mean of y
conditional on x, z and d is the same under sample selection as under
endogenous switching. From Terza's paper, the conditional mean after integrating
out ε is given as
$$E[y \mid x, z, d] = \exp(x\beta^*) \left[ d\, \frac{\Phi(z\alpha+\theta)}{\Phi(z\alpha)} + (1-d)\, \frac{1-\Phi(z\alpha+\theta)}{1-\Phi(z\alpha)} \right] \tag{20}$$
Setting d=1 in the above equation gives the conditional mean for an observed y:
$$E[y \mid x, z, d=1] = \exp(x\beta^*)\, \frac{\Phi(z\alpha+\theta)}{\Phi(z\alpha)} \tag{21}$$
where β* is the same as β except that its first element is shifted by σ²/2, and
θ = σρ. When ρ is positive, the last term on the right side of the above equation
is larger than one. It can be seen as an adjustment to the conditional mean of y
for the sample selection effect. As mentioned above, a positive correlation
between ε and υ increases the proportion of large values of y in the observed data
and therefore increases the mean as well. The adjustment term accounts for this
"inflated" mean of y. The expected difference between observed y and
unobserved y is:
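$$\begin{aligned} E[y \mid x, z, d=1] - E[y \mid x, z, d=0] &= \exp(x\beta^*)\, \frac{\Phi(z\alpha+\theta)}{\Phi(z\alpha)} - \exp(x\beta^*)\, \frac{1-\Phi(z\alpha+\theta)}{1-\Phi(z\alpha)} \\ &= \exp(x\beta^*)\, \frac{\Phi(z\alpha+\theta)\,[1-\Phi(z\alpha)] - \Phi(z\alpha)\,[1-\Phi(z\alpha+\theta)]}{\Phi(z\alpha)\,[1-\Phi(z\alpha)]} \\ &= \exp(x\beta^*)\, \frac{\Phi(z\alpha+\theta) - \Phi(z\alpha)}{\Phi(z\alpha)\,[1-\Phi(z\alpha)]} \end{aligned} \tag{22}$$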
For example, when the expectation of zα is 0.5 and σ is 0.3, the absolute
difference increases as the absolute value of ρ increases (Table 1).
Table 1 The average percentage that the observed y is larger than unobserved y
ρ    -0.8   -0.6   -0.4   -0.2   0    0.2   0.4   0.6   0.8
%    -41    -30    -20    -10    0    9     19    28    36
Given that the expectation of zα is 0.5 and σ is 0.3.
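The size of the selection effect in equation (22) can be explored with a short sketch. It evaluates the factor that multiplies exp(xβ*), taking θ = σρ and, as a simplifying assumption, holding zα fixed at 0.5 instead of averaging over z. The values therefore need not coincide exactly with Table 1.

```python
import math

def normal_cdf(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def selection_gap(za, sigma, rho):
    """E[y | d=1] - E[y | d=0] of equation (22), in units of exp(x beta*)."""
    theta = sigma * rho
    p = normal_cdf(za)
    return (normal_cdf(za + theta) - p) / (p * (1.0 - p))

for rho in (-0.8, -0.4, 0.0, 0.4, 0.8):
    print(rho, round(100 * selection_gap(0.5, 0.3, rho)))
```

The gap is zero at ρ=0 and has the sign of ρ, in line with the pattern of the table.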
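Equation (21) can be written in regression form as
$$y = h(x, z, \alpha, \beta^*, \theta) + e = \exp(x\beta^*)\, \frac{\Phi(z\alpha+\theta)}{\Phi(z\alpha)} + e \tag{23}$$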
where e is a random error term. This equation can be estimated by nonlinear
least squares, or by a two-stage technique if β and α have large dimensions. The
first stage is a simple probit regression that yields consistent estimates of α0
and α1. The second stage applies nonlinear least squares to
$$y = h(x, z, \hat\alpha, \beta^*, \theta) + e$$
where $\hat\alpha$ denotes the estimates from the first stage. Denote the vector b1 = (β*, θ).
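Terza (1998) shows that the asymptotic distribution of the estimator of b1 is given as
$$\begin{aligned} &\sqrt{n}\,(\hat b_1 - b_1) \xrightarrow{d} N(0, D), \quad \text{where} \\ &D = E[g_1'g_1]^{-1} \left\{ E[e^2\, g_1'g_1] + E[g_1'g_2]\; \mathrm{VAR}(\hat\alpha)\; E[g_2'g_1] \right\} E[g_1'g_1]^{-1}, \qquad g_1 = \frac{\partial h}{\partial b_1}, \quad g_2 = \frac{\partial h}{\partial \alpha} \end{aligned} \tag{24}$$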
VAR($\hat\alpha$) denotes the asymptotic covariance matrix of the first-stage probit
estimator of α. In practice, a heteroskedasticity-consistent estimator of D can be
computed as
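$$\hat D = (G_1'G_1)^{-1} \left[ G_1'\hat\Psi G_1 + G_1'G_2\; \widehat{\mathrm{VAR}}(\hat\alpha)\; G_2'G_1 \right] (G_1'G_1)^{-1}$$
where $G_1$ and $G_2$ are matrices whose typical rows are
$$\hat g_1 = \frac{\partial \hat h}{\partial b_1}, \qquad \hat g_2 = \frac{\partial \hat h}{\partial \alpha}$$
and
$$\hat\Psi = \mathrm{diag}\{\hat e_i^2\}_{n \times n}, \qquad \widehat{\mathrm{VAR}}(\hat\alpha) = \left[ \sum_{i=1}^{N} \frac{\phi(z_i'\hat\alpha)\, \phi(z_i'\hat\alpha)}{\Phi(z_i'\hat\alpha)\, [1 - \Phi(z_i'\hat\alpha)]}\; z_i z_i' \right]^{-1}$$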
3.3 NWLS method
Since the variance of a Poisson distribution is λ, the conditional variance of y
generally differs across observations. A weighted least-squares method can
therefore yield a large gain in efficiency.
From Terza's paper, the conditional variance is given as
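$$\mathrm{Var}(e \mid x, z) = E[\mathrm{Var}(y \mid x, z, \varepsilon)] + \mathrm{Var}(E[y \mid x, z, \varepsilon]) = \psi + \exp(\sigma^2)\, \delta - \psi^2$$
where
$$\psi = \exp(x\beta^*)\, \frac{\Phi(z\alpha+\theta)}{\Phi(z\alpha)}, \qquad \delta = \exp(2x\beta^*)\, \frac{\Phi(z\alpha+2\theta)}{\Phi(z\alpha)} \tag{25}$$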
The parameters α, β* and θ can be obtained from the two-stage estimator, while σ²
can be estimated either by a regression approach or by a conditional maximum
likelihood approach. The conditional maximum likelihood approach is reliable but
computationally cumbersome. Therefore, the regression-based approach is used in
this paper. The terms in Var(e) can be rearranged in such a way that
$$\hat t = a\, \hat\delta + r, \qquad \text{where } \hat t = \hat e^2 - \hat\psi + \hat\psi^2 \text{ and } a = \exp(\sigma^2)$$
Here r is an error term, and $\hat e$, $\hat\psi$ and $\hat\delta$ are e, ψ and δ as defined in (25), taking
the value of the two-stage estimates.
A consistent estimator of σ² is
$$\hat\sigma^2 = \ln(\hat a) \tag{26}$$
where $\hat a$ denotes the OLS estimate of a.
In some situations $\hat a$ is smaller than zero and the regression approach fails. In
this paper, when a simulated data set leads to an $\hat a$ smaller than zero, the
program stops and tries another simulated data set. This situation does not
always occur. Since the conditional maximum likelihood approach is
computationally cumbersome, the regression-based approach is preferred.
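The regression step can be sketched as follows. The sketch assumes that the auxiliary regression has the form $t_i = a\,\delta_i + r_i$ without a constant, with $a = \exp(\sigma^2)$, so that the OLS estimate has a closed form; the inputs `t` and `delta` stand for the quantities built from the two-stage estimates, and the function name and data are hypothetical.

```python
import math

def sigma2_from_regression(t, delta):
    """OLS through the origin of t_i on delta_i, then sigma^2 = ln(a_hat) as in equation (26)."""
    a_hat = sum(d * ti for d, ti in zip(delta, t)) / sum(d * d for d in delta)
    if a_hat <= 0.0:
        return None  # the logarithm is undefined; the text redraws the data in this case
    # An a_hat between zero and one would give a negative value; the text does not discuss that case.
    return math.log(a_hat)

delta = [1.0, 2.0, 3.0]                       # made-up regressor values
t_exact = [math.exp(0.09) * d for d in delta]  # noise-free data with sigma^2 = 0.09
print(sigma2_from_regression(t_exact, delta))
print(sigma2_from_regression([-1.0, -2.0, -3.0], delta))
```

Returning `None` instead of raising an error mirrors the rule above of discarding a replication and drawing a new data set.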
The NWLS estimators are estimated by
$$\hat b_{NWLS} = (\hat\beta^*_{NWLS}, \hat\theta_{NWLS}) = \arg\min_{\beta^*,\, \theta}\; Q_{NWLS}(\beta^*, \theta)$$
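where
$$Q_{NWLS}(\beta^*, \theta) = \sum_{i=1}^{n} \frac{e_i^2}{\hat v_i}, \qquad e_i = y_i - h(x_i, z_i, \hat\alpha, \beta^*, \theta), \qquad \hat v_i = \hat\psi_i + \exp(\hat\sigma^2)\, \hat\delta_i - \hat\psi_i^2$$
and $\hat\alpha$, $\hat\beta^*$, $\hat\theta$ and $\hat\sigma^2$ are the estimates of α, β*, θ and σ². The asymptotic
distribution of the NWLS estimator is
$$\begin{aligned} &\sqrt{n}\,(\hat b_{NWLS} - b) \xrightarrow{d} N(0, D^*), \quad \text{where} \\ &D^* = E[(1/v)\, g_1'g_1]^{-1} \left\{ E[(1/v)\, g_1'g_1] + E[(1/v)\, g_1'g_2]\; \mathrm{VAR}(\hat\alpha)\; E[(1/v)\, g_2'g_1] \right\} E[(1/v)\, g_1'g_1]^{-1}, \\ &g_1 = \frac{\partial h}{\partial b}, \quad g_2 = \frac{\partial h}{\partial \alpha}, \quad v = \psi + \exp(\sigma^2)\, \delta - \psi^2 \end{aligned} \tag{27}$$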
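In practice, a consistent estimator of D* can be computed as
$$\hat D^* = (G_1'\hat\Lambda^{-1}G_1)^{-1} \left[ G_1'\hat\Lambda^{-1}G_1 + G_1'\hat\Lambda^{-1}G_2\; \widehat{\mathrm{VAR}}(\hat\alpha)\; G_2'\hat\Lambda^{-1}G_1 \right] (G_1'\hat\Lambda^{-1}G_1)^{-1}$$
where $\hat\Lambda = \mathrm{diag}\{\hat v_i\}_{n \times n}$.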
3.4 Poisson regression model
For comparison, a standard Poisson regression model that uses only the observed
elements of the simulated data is also estimated. It is of interest to see whether
the FIML, TSM and NWLS estimators can handle the sample selection effect when
it is present, and whether they perform better than the standard Poisson
regression estimator. The standard Poisson regression model is estimated by
maximum likelihood; the estimates are obtained by solving equation (3).
Since the data are simulated, it is also possible to use the whole data set rather
than only the observed part. A Poisson regression model is therefore also fitted
to the whole simulated data set.
4. Simulation design
The count dependent variable $y_i$, i = 1, 2, ..., 500, is generated from the
conditional Poisson distribution, which is referred to as the outcome equation:
$$f(y_i \mid x_i, \varepsilon_i) = \frac{\exp(-\lambda_i)\, \lambda_i^{y_i}}{y_i!}$$
and the conditional mean function is
i i i
i