BAYESIAN AND FREQUENTIST HYPOTHESIS TESTS OF HETEROSCEDASTICITY

(1)

Örebro University

School of Business and Economics

Statistics, advanced level thesis, 15hp

Superviser: Sune Karlsson

Examiner: Niklas Karlsson

Spring 2014

BAYESIAN AND FREQUENTIST HYPOTHESIS

TESTS OF HETEROSCEDASTICITY

WeiWei Feng

1981-‐09-‐01

(2)

Abstract

The homoscedasticity assumption is important for the classical linear regression. This assumption is often violated in time series data, cross section data or panel data. In order to address this issue, the generalized linear regression or feasible generalized linear regression is suggested. However, applying those methods requires either

knowledge of structure of variances or estimation of this structure. Before running into such a complex process, testing the heteroscedasticity is not only important but also necessary. In this paper, we used the Monte Carlo simulation to compare both the size and the power of Bayesian hypothesis test with the frequentist hypothesis tests for heteroscedasticity. The Bayesian hypothesis in this case is unpractical and less effective comparing with frequentist hypothesis tests. However, the Bayesian heteroscedasticity test could be possibly improved and the Bayesian heteroscedasticity model could be applied in other situations such as estimations of parameters or the structure of variances.

Keywords: Bayesian Hypothesis test, the White test, the Breusch-‐Pagan Test, the

Koenkar-‐Basset Test, the Gibbs sampling, the Metropolis-‐hasting sampler, the marginal likelihood simulation.

(3)

Introduction

When testing the assumption of homoscedasticity in the classical linear regression, a number of the literatures can be gathered within the classical statistic. White (1980) is the most cited article, the well-‐known Breusch and Pagan (1979) and an improved test proposed by Koenker (1981), Godfrey (1996) and Koenker and Bassett (1982). On the other hand, there are few literatures about the Bayesian hypothesis test for

heteroscedasticity. Richard Startz (2012) constructed Bayesian heteroscedasticity-‐ Robust Standard Errors. Differing from his model, a hierarchical Bayesian model is constructed in order to conduct a Bayesian hypothesis test for heteroscedasticity. Both the size and the power will be compared with three common used frequentist

hypothesis tests: the White’s test (White, 1980), the Breusch-‐Pagan test (Breusch and Pagan, 1979), and the Koenkar-‐Basset test (Koenker and Bassett, 1982).

In the classical linear regression model, one of the important assumptions is

homoscedasticity and nonautocorrelation. It means each disturbance, εi, has the same finite variance, σ2_{,
and
is
uncorrelated
with
any
other
disturbances.
This
assumption}

limits the generality of the model and is often violated in the real world.

Heteroscedasticity means that disturbances have different variances, which often occurs in time-‐series data, cross-‐section data or panel data. In this paper, nonautocorrelation is assumed for simplicity. The linear regression model with heteroscedasticity can be expressed as follow:

y= Xβ + ε E[ε X] = 0 E[ε ε X] = σ′ 2Λ = Σ σ2_{Λ =}σ2 λ1 … 0 ! " ! 0 # λn ⎛ ⎝ ⎜ ⎜ ⎜ ⎞ ⎠ ⎟ ⎟ ⎟ = σ1 2 … 0 ! " ! 0 # σn 2 ⎛ ⎝ ⎜ ⎜ ⎜ ⎞ ⎠ ⎟ ⎟ ⎟ (1)

(5)

Where y is a vector of dependent variables, X is a k× nmatrix of k independent variables

including the constant term, n is number of observations, and _{β
is
a
vector
of}

coefficients. While ε is a vector of disturbances and assumed normally distributed in ordinary situations. Σ is a variance and covariance matrix.Λis a positive definite matrix where Λ = Ι means homoscedasticity.

Even with heteroscedasticity, under the assumption E[ε X] = 0 and the normality assumption of the disturbance, the least squares estimator ˆβ = ( ′X X)−1X y is still ′

unbiased, consistent and asymptotically normally distributed. However, the estimates are not BLUE (best linear unbiased estimator) as showed in the Gauss-‐Markov Theorem. The variance estimator s2

(X X)′ −1is incorrect and leads to unreliable confidence interval

estimation and hypothesis tests. Estimation of the asymptotic covariance matrix then could be based on

var( ˆβ X)= ( ′X X)−1X (s′ 2Λ)X( ′

X X)−1 (2)

If Λ’s structure is known, the model could be transformed to a homoscedasticity model and ordinary least square (OLS) is still BLUE. In a more common case, the structure is impossible to know. So ˆΛ is estimated according to a different approach called the method of feasible generalize least squares (FGLS). (Green, 2012)

Before rushing into such a complex process, it is best to test whether heteroscedasticity exists in the data. This paper mainly focuses on comparing both the size and the power between frequentist heteroscedasticity tests and the Bayesian hypothesis test.

(6)

The frequentist hypothesis tests

The three common used classical tests are the White test (White, 1980), the Breusch-‐ Pagan test (Breusch and Pagan, 1979), and the Koenkar-‐Basset test (Koenker and Bassett, 1982).

They are based on the idea that OLS is a consistent estimator of β even in the presence of heteroscedasticity. In this case, the OLS residuals will preserve information, although not perfectly, about the heteroscedasticity. Therefore, the tests are applied to the OLS residuals in order to detect heteroscedasticity in the data.

The above-‐mentioned three tests are all derived from the Lagrange multiplier test (LM). The LM test assumes a properly scaled Lagrange multiplier has an asymptotically

normal distribution. The LM test statistic is a quadratic form of the Lagrange multiplier. The test statistic, under the null hypothesis, is then chi-‐squared distributed.

The White test

The White test for heteroscedasticity has a basic idea: if the model is homoscedastic, then the disturbances are randomly distributed and have no relationship with any independent variables and their combinations such as squared form or a cross product form. The test statistic is derived based on the data with no heteroscedasticity, and then the least squares estimator of variance gives a consistent estimator of var[b X] as in the equation (2) (Green, 2012, p.315).

Null hypothesis H₀: Homoscedasticity (σ_i2 ₌σ2_{for
all
i
in
model
(1))}

Alternative hypothesis H₁: heteroscedasticity (not all σi

2_{are
equal)}

The testing process can be simplified as follows:

(7)

2) Compute the OLS residuals, e₁,e₂,...,e_n. e is the OLS estimator of ε in the model (1).

3) Regress ei

2_{against
a
constant,
all
independent
variables,
their
squared
form,
and}

their cross products.

4) Compute R2 from this “ auxiliary regression” in Step 3.

5) Compare nR2 to the critical value from the Chi-‐squared distribution with p

degrees of freedom.

The test statistic is nR2 ∼ χ_p2₋₁, where p is the number of the regressor in this “ auxiliary

regression” including the constant in step 3 and n is the number of observations.

The White test is very general and the specific assumption about the nature of the heteroscedasticity is not necessary to make. However, because of its generality, it has a serious shortcoming. The test may reveal heteroscedasticity, but does not detect the reason for it. This causes some trouble when applying the generalized least squares (GLS) method, for which we need to know the structure of the variance. At the same time, the degrees of freedom grow rapidly with the number of independent variables. Hence, it may be just appropriate for the relatively large sample size.

One modification can be done in order to lower the degrees of freedom. In step 3, we regress e2_{against
ˆy
and
ˆy}2

instead, where ˆy = Xb , b is the OLS estimator of the parameters in model (1). In this way, the regressors actually include all independent variables, their squared form and their cross products. In this method, the degrees of freedom decrease to 2 and the test statistic is nR2_{∼ χ}2

(2) . This method is used instead of the original White test in our study.

The Breusch-‐Pagan Test

The Breusch-‐Pagan Test is derived from an LM test with the hypothesis that σi

2₌_σ2

f (Zα) , where Z is a p × n matrix of special combination of independent

variables, squared form of independent variables and the cross products of independent variables that is suspected to cause the heteroscedasticity, p is the number of regressors

(8)

including a constant and n is the number of observations, and f is a function. The model is homoscedastic if α = 0, _{α
is
a
vector
of
parameters
in
the
equation.

The
test
can
be} carried out with a simple regression.

Null hypothesis H₀: Homoscedasticity (α = 0)

Alternative hypothesis H₁: Heteroscedasticity (α ≠ 0)

1) Regress the dependent variable y with all the independent variables X.

2) Compute the OLS residuals, e₁,e₂,...,e_n. e is the vector with all OLS estimator of disturbanceεi(i= 1,2,...,n)in the model (1) and estimates the variance of disturbance as e e / n′ .

3) Regress e2

/ (e e′

n ) against Z, e

2_{is
the
vector
of
all
the
squared
e}

i and compute the ESS (explained sum of square) of the regression. The test statistic ESS is

asymptotically distributed asχp−1

2 _{,
p
is
the
number
of
regressors
in
Z
including
a}

constant. This test statistic ESS can be directly computed with the matrix

formLM = 1

2[g Z(′ Z Z )′

−1_Zg]_{;
g
is
the
vector
of
observations
of
g}

i= ei

2_{/ (}_{e e / n)}_′ −1.

(Breusch and Pagan, 1979)

4) Compare the value of the test statistic from step 4 to the critical value from the Chi-‐squared distribution with p-‐1 degrees of freedom. Homoscedasticity is rejected if the value exceeds the critical value.

The Breusch-‐Pagan Lagrange multiplier test is claimed to be sensitive to the normality assumption. Violation of this assumption does not guarantee that the variance of ε_i2_is

equal to 2σ4_{under
homoscedasticity.

In
this
situation,
the
test
statistic
is
incorrect.}

(Green, 2012, p.276)

(9)

The Koenker-‐Basset Test

Koenker and Basset (1982) has the same H₀ and H₁and almost the same process as the Breusch-‐Pagan test.

Null hypothesis H₀: Homoscedasticity (α = 0)

Alternative hypothesis H₁: Heteroscedasticity (α ≠ 0)

However, it suggests a more robust estimator of the variance of εi

2_:V ₌ 1 n (ei 2− ′e e n ) 2 i=1 n

∑

. With this change the test statistic in Koenker and Basset test becomes

LM = [1 V](e 2_{− ′}e e n ′)Z( ′Z Z ) −1 _′ Z (e2− ′e e

n ) , which is asymptotically distributed as χp−1

2 _.

Where p is the number of regressors in Z (has the same structure as in the Breusch-‐ Pagan test) including a constant (Green, 2012). Other steps are the same as the Breusch-‐ Pagan test.

Under normality, the Koenker-‐Basset test has the same limiting distribution as the Breusch-‐Pagan statistic, but violating the normality, it can be a more powerful test. And if Z has the same structure as the regressor in the white test, the Koenker-‐Basset test is algebraically the same as the White test (Waldman, 1983).

In the simulation part of the Breusch-‐Pagan test and the Koenker-‐Basset test, Z is some partition of combination of independent variables, square of independent variables and cross product of independent variables.

(10)

The Bayesian Hypothesis test

In the frequentist hypothesis tests framework, there are two hypotheses: the Null hypothesis H₀ and the alternative hypothesis H₁. A decision is made based on the collected data. The Null hypothesis is either rejected in favor of the alternative

hypothesis, or accepted. In Bayesian hypothesis testing, there could be more than two hypotheses.

The Bayesian hypothesis test will be explained in the way that only two of the hypothesis will be chosen between.

The model can be generally expressed as follows:

p(yθMi, Mi)-‐-‐ A data likelihood distribution

p(θMi Mi)-‐-‐A prior distribution under the model Mi (3)

p(Mi) -‐-‐ A prior probability that the model is “correct”

The models can be different on the data likelihood distribution, a prior distribution or both. In this paper, the heteroscedasticity model and the homoscedasticity model are different in both the data distribution and prior distributions.

Before explaining the Bayesian hypothesis test, the loss function is introduced first.

L(Mi, Mj)= 0 when model Mi is “ correct” L(Mi, Mj)> 0 when model Mi is “ wrong”, i≠ j

The posterior expected loss from choosing model i based on the data yo_is

E[L(Mi, Mj)]= L(Mi, Mj)p(Mj

j

∑

yo

) , the model with the smallest posterior expected

loss would be chosen. yo_{means
the
collected
data.}

In a situation with only two hypotheses, the expected lost for model M_i:

E[L(M_i, M_j)]= L(M_i, M_j)p(M_j yo₎

The expected loss for model Mj = E[L(Mj, Mi)]= L(Mj, Mi)p(Mi y o

(11)

If L(M_i, M_j)p(M_j yo₎< L(M

j, Mi)p(Mi y

o_{) ,
the
model
M}

i will be chosen over model Mj based on the data information yo_.

In this inequality, the p(Mi y o

) is the posterior model probability which follows Bayes

rules P(Mi y)= p(y Mi)p(Mi) p(y Mj)p(Mj) j

∑

(4) p(y Mi)= p(y

∫

θMi) p(θMi Mi)dθ (5)

By putting the equality (4) into the inequality, because the denominators are the same for both models, we get:

Bij = p(yo Mi) p(yo Mj) > L(Mi, Mj)p(Mj) L(Mj, Mi)p(Mi) (6)

Bij is a Bayes factor. When it is larger than the right-‐hand side in the inequality (6), the model M_i is preferred. (Gelman et al., 2004)

In the classical hypothesis test, the null hypothesis is special which requires

overwhelming evidence in order to reject it. However, in the Bayesian hypothesis test framework, all hypotheses are treated equally. In this situation, assigning a large prior probability to one hypothesis or specifying a large loss from erroneously choosing the other hypothesis make it possible to simulate the sense of null hypothesis as in the frequentist hypothesis test. The left-‐hand side of the inequality (6) could be considered as “the test statistic” and the right-‐hand side of the inequality (6) could be considered as “the critical value” as in the frequentist hypothesis test framework. The critical value is estimated with particular Bayesian hypothesis tests to simulate the similar effect as the significance level.

(12)

To consider the Bayesian hypothesis test for heteroscedasticity, two different models are involved in the paper: M_heteris the heteroscedasticity model and M_hom is the model for homoscedasticity.

H₀: Homoscedasticity H₁: Heteroscedasticity

If the Bayes factorBheter/hom=

f (yo Mheter) f (yo

Mhom)

> c(c is the critical value) is true, reject the null hypothesis.

The heteroscedasticity Bayesian model Mheter and the homoscedasticity Mhom

Heteroscedasticity model Mheter

The heteroscedasticity model is a hierarchical Bayesian model:

yβ,Σ ∼ N(Xβ,Σ) β ∼ N(β,Ω) Σα,w∼ log N(Xα,w2 ) α ∼ N(α,Δ) w2 ∼ IG(ν,ρ) (7)

Where y is a vector of n dependent variables, X is a n× k matrix of independent

variables including a constant term, β is a vector of k parameters. β is the parameters vector and Ω is the variance and covariance matrix for the β prior distribution. Σ is the diagonal variance matrix and its prior distribution is a lognormal distribution with parameters α and w2_{which
have
their
own
prior
distributions.}α and Δ are the

parameters of the α prior distribution, while v and ρ are parameters of the w2 prior

(13)

means inverse gamma distribution. The model is a hierarchical model because the parameters in the _{Σ
have
their
own
distributions.}

The data likelihood distribution:

f (yβ,Σ, X) = (2π)− n2 Σ−12exp[−1

2(y− Xβ ′)Σ −1

(y− Xβ) (8)

This is the likelihood function for linear regression. The details of every part are expressed in the model (7).

Σ = σ1 2 … 0 ! " ! 0 # σn 2 ⎛ ⎝ ⎜ ⎜ ⎜ ⎞ ⎠ ⎟ ⎟ ⎟

σ₁,σ₂,...,σ_n are different which represents the heteroscedasticity in the linear regression.

The conjugate prior for β is

p(β)= (2π)−k2 Ω−12exp[−1

2(β−β ′)Ω

−1

(β−β)] (9)

Where k is the number of parameters including a constant.

The Σ is assumed to have a lognormal distribution and expected value is Xα :

lnσ2 = Xα+ u

u∼ N(0,w2

) (10)

Where σ2is vector of all the σ

i

2_.

The likelihood function for Σ is

(14)

The conjugate prior distribution for α is: p(α)= (2π) −k 2 Δ−12exp[−1 2(α−α ′)Δ −1 (α −α)] (12)

The conjugate prior distribution for w2 is:

p(w2 )= ρ v Γ(ν)(w 2 )−(v+1)exp(− ρ w2) (13)

By multiplying all the parts in this hierarchical model, the posterior distribution becomes

p(β,Σ,α,w2

y)∝ f (yβ,Σ)* p(β)* f (Σα,w2

)* p(α)* p(w2

) (14)

In this situation, it is easy to get the conditional posterior distributions for all

parameters and the Gibb sampling can be applied to get marginal distributions for all parameters (Gelman et al., 2004). The details of Gibb sampling are explained in the part of the Bayes factor simulation.

1) Conditioning on Σ,α and w2, the conditional posterior for β is

p(β y, X,Σ,α,w2)∝ exp[−1 2(y− Xβ ′)Σ −1 (y− Xβ)* exp[−1 2(β−β ′)Ω −1 (β−β)] (15)

This is the kernel of the normal distributions, so we have

β y, X,Σ,α,w2 _{∼ N(β,Ω)}

Ω = ( ′X Σ−1X+ Ω−1)−1 β = Ω( ′X Σ−1y+ Ω−1β)

(16)

2) Conditioning on β,α and w2_{,
the
conditional
posterior
for}_{Σ
is:}

(15)

p(Σ y, X,β,α,w2 )∝ (2π)− n2 Σ−12_exp[−1 2(y− Xβ ′)Σ −1_(y_{− Xβ)]*} (2πw2 )− n2 Σ−1exp[− 1 2w2(lnσ 2− Xα ′)(lnσ2− Xα )] = (2πw)−n _Σ−32 * exp{−1 2[(y− Xβ ′)Σ −1_(y_{− Xβ) −} 1 w2(lnσ 2− Xα ′)(lnσ2− Xα )]} (17) There is no closed form of this distribution, and the normalized constant part is impossible to calculate.

3) Conditioning on the Σ,w2_andβ , the conditional posterior for α is:

p(α y, X,β,Σ,w2)∝ exp[− 1 2w2(lnσ 2_{− X}α _′)(lnσ2_{− X}α_{)]* exp[−}1 2(α −α ′)Δ −1 (α−α)] = exp{−1 2[_w12(lnσ 2_{− X}α _′)(lnσ2_{− X}α )+ (α−α ′)Δ−1(α−α)]} (18) This is the kernel of the normal distribution:

α y, X,Σ,β,w2 ∼ N(α,Δ) Δ = ( 1 w2 X X′ + Δ −1 )−1 α = Δ( 1 w2 X ln′ σ 2_{+ Δ}−1α ) (19) 4) Conditioning on the Σ,β and α , the conditional posterior for w2_is:

p(w2 y, X,β,α,Σ) ∝ (w2)−(v+n2+1)exp(− ρ

w2)exp[− 1 2w2(lnσ

2_{− Xα ′)(lnσ}2_{− Xα )]

(20)}

This is the kernel for the inverse gamma distribution: w2 y, X,Σ,β,α ∼ IG(v,ρ) v= v +n 2 ρ=ρ+1 2(lnσ 2_{− X}_α _′)(ln_σ2_{− X}_α ) (21)

(16)

Homoscedasticity model M_hom

With homoscedasticity, the Σ = γ2 … 0 ! " ! 0 # γ2 ⎛ ⎝ ⎜ ⎜ ⎜ ⎞ ⎠ ⎟ ⎟ ⎟

, the disturbances have the same

variance. The likelihood function in (8) can be simplified as:

f (yβ,γ2, X)= (2πγ2)− n2exp[− 1

2γ2(y− Xβ ′)(y − Xβ)] (22) The model becomes simpler as follows:

yβ,γ2 ∼ N(Xβ,γ2 ) β ∼ N(β,Ω) γ2 _{∼ IG(τ,ϕ)} (23)

The prior for β is the same as the one that was used in the Mheter and γ

2_{is
similar
to}

w2

with different parameters:

p(β)= (2π) −k 2 Ω−12exp[−1 2(β−β ′)Ω −1 (β−β)] (24) p(γ)= ϕ τ Γ(τ)(γ 2 )−(τ +1)exp(− ϕ γ2) (25)

Conditioning on each other, the posteriors for β and γ are:

p(β y, X,γ2)∝ exp[−2γ12(y− Xβ ′)(y − Xβ)]exp[−1₂(β − β ′)Ω

−1_{(β − β)]} β y, X,γ2 ∼ N(β,Ω) Ω = ( ′X X γ2 + Ω −1 )−1 β = Ω( ′X y γ2 + Ω −1_β ) (26) p(γ2 _{y, X,}β) ∝ (γ2₎−(τ+1)(γ2₎−n/2exp(−ϕ γ2)exp[− 1 2γ2(y− Xβ ′)(y − Xβ)]

(17)

γ2 y, X,β ∼ IG(τ,ϕ) τ = τ + n 2 ϕ = ϕ +1 2(y− Xβ ′)(y − Xβ) (27)

Bayes factor simulation

In the Bayesian hypothesis test, the Bayes factor (6) plays an important role. In order to calculate the Bayes factor, the marginal likelihood is needed which is obtained by integrating over the prior distribution as in equation (11). In the high dimension parameter space (more than three parameters in the model), it is almost impossible to do this integrating most of times.

In order to estimate the marginal likelihoods, Chib (1995) suggested a method based on the Gibbs sampler, and Chib and Jeliazkov (2001) generalized this to Metropolis-‐Hasting samplers. Both these two samplers are based on Markov Chain Monte Carlo simulation and the full conditional posterior is needed in order to perform the simulation. In this paper, both methods are used to estimate the marginal likelihoods.

The key insight of those methods is based on the Bayes rules.

p(θM_i y o , Mi)= f (yo Mi,θM_i)p(θM_i Mi) p(yo Mi) , where p(θMi y o

, Mi) is the posterior distribution for parameters θMi in the model Mi and the meaning of other parts are described in the

model expression (3). The marginal likelihood p(yo Mi) is actually the normalizing constant of the posterior density. Therefore, when the data is known, this is the same constant for any points inside the parameter space. The marginal likelihood is changed to the right side, and taking logarithms on both sides can avoid underflow in

computation. Estimating the marginal likelihood becomes:

log[ p(yo Mi)]= log[ f (y o Mi,θMi * )]+ log[ p(θMi * Mi)]− log[ p(θMi * yo , Mi)]₍₂₈₎ In this way, the calculation of the marginal likelihood is reduced to find the left three parts with a single point θ_M*_i_{.

This
could
be
any
point
in
the
parameter
space.
Normally,}

(18)

the high-‐density point is used to increase the efficiency. The first two parts in the left-‐ hand side of equation (28) are easy to calculate because the distributions and the data are both known. The last part of equality (28) is mostly focused on in the simulation.

1) Simulation for marginal likelihood in Heteroscedasticity model Mheter:

log( p(yα,β,Σ,w2 , Mheter))= log( f (y oα* ,β* ,Σ* ,w2* )+ log( f (Σ*α* ,w2* )× p(α* )× p(β* )× p(w2* )) − log(p(α* ,β* ,Σ* ,w2* yo )) (29) In the heteroscedasticity model, the marginal likelihood for yo_{can
be
calculated
by} equation (29). Where * means the fixed point. If α*,β*,Σ*and w2*are all fixed points, the first two parts f (yoα*,β*,Σ*,w2*) andf (Σ*α*,w2*)× p(α*)× p(β*)× p(w2*) are just

simple calculations. Chib (1995) suggests a method to estimate the posterior in the context of Gibb MCMC sampling. In the heteroscedasticity model, the posterior

p(α*

,β*

,Σ*

,w2*

yo

) needs to be estimated. Then, applying the law of total probability, we

have:

p(α*,β*,Σ*,w2* yo)= p(Σ* yo)× p(β* Σ*, yo)× p(α* Σ*,w2, yo)× p(w2* Σ*,α*, yo)₍₃₀₎

Where p(Σ*

yo

) is the marginal posterior density for special point Σ*_{,
which
can
use
the}

ordinary Gibb sampler based on the full conditional distributions for all parameters.

p(β* Σ*

, yo) and p(α* Σ*

, yo

) are both the marginal posterior density for special points β* and α*_{conditioning
on}Σ*_.

p(w2* _Σ*

,α*

, yo

) is the marginal posterior density for

w2*_{conditioning
on}_Σ* and α*_.

The mixed stage Gibb sampler and Metropolis-‐Hastings (Chib and Jeliazkov, 2001) are conducted to estimate the marginal likelihood for the heteroscedasticity model.

(A) Estimate p(Σ*

yo

)

1. Specify the initial values for Σ0,α0,β0 and w0 2

(19)

2. Conduct the Metropolis-‐Hastings method for sampling from (17) because there is no closed form for the posterior distribution of Σ .

a) Draw the proposal pointsΣp from an independent multivariable normal distribution MVN(Σ0,ω ). ω affects the acceptance rate that is the proportion of

proposal Σp being accepted during next steps b), c) and d).

b) Calculate p=exp{− 1 2[(y− Xβ0 ′)Σp −1 (y− Xβ₀)− 1 w₀2(lnσ_p 2_{− X}α 0 ′)(lnσp 2_{− X}α 0)]} exp{−1 2[(y− Xβ0 ′)Σ0 −1 (y− Xβ0)− 1 w02(lnσ0 2_{− X}α 0 ′)(lnσ0 2_{− X}α 0)]}

c) Draw a value q from the uniform distribution U(0,1) , where U means the uniform distribution.

d) If p> q , set the Σ1 value toΣp. Otherwise, keep the old value Σ0.

3. Sample β1 from the posterior multinomial distribution (16) with the value Σ1

from step 2.

4. Sample α₁ from the posterior multinomial distribution (19) with the value Σ1

from step 2 and initial value w0 2_.

5. Sample w1

2_{from
the
posterior
inverse
gamma
distribution
(21)
with
the
value}Σ 1

from step 2 and the value α₁ from step 4.

6. Go back to step 2 with the values Σ1,α1,β1 and w1 2_.

7. Repeat step 2 to step 6 N times including burn-‐in time B till all the marginal distributions converge.

Burn-‐in time is the sufficient time to throw away before the Markov Chain is converging to the marginal distributions.

8. Find the high-‐density value Σ*_{in
the
converged
marginal
distribution
for}_{Σ
and}

(20)

(B) Estimate p(β* Σ*, yo)

1. Sample β₁ from the posterior multi-‐normal distribution (16) with the special value Σ*_{from
the
part
(A).}

2. Repeat step 1 N times including burn-‐in time B till the marginal distribution converges.

3. Find the high-‐density value β*_{and
its
estimated
density}_p(β* _Σ*

, yo

) with the

converged marginal distribution.

(C) Estimate p(α* Σ*

,w2

, yo

)

1. Sample α₁ from the posterior multi-‐normal distribution (19) with the special value Σ*_{from
part
(A)
and
the
initial
value
w}

0 2_.

2. Sample w1

2_{from
the
posterior
inverse
gamma
distribution
(21)
with
the
value}_Σ*

from part (A) and the value α₁ from step 1. 3. Go back to step 1 with the value w1

2

.

5. Find the high-‐density value α*_{and
its
estimated
density}

p(α* Σ*,w2, yo).

(D) Estimate p(w2* _Σ*

,α*

, yo

)

1. Sample w₁2_{from
the
posterior
inverse
gamma
distribution
(21)
with
the
value}

Σ* from part (A) and the value α*_{from
part
(C).}

2. Repeat step 1 N times including burn-‐in time B till all the marginal distribution converges.

3. Find the high-‐density value w2*_{and
its
estimated
density}

p(w2* _Σ*

,α*

, yo

).

(21)

(E) Estimate p(yα,β,Σ,w2

, Mheter) The p(yα,β,Σ,w2

, Mheter)is estimated by putting all the special parameters Σ*,α*,β*_,w2*_{and
the
densities}_p(_Σ*

yo ), p(β* _Σ* , yo ), p(α* _Σ* ,w2 , yo ), p(w2* _Σ* ,α* , yo ) into

the equation (29) and (30).

2) Homoscedasticity model

log( p(yβ,γ2, Mheter))= log( f (y o _β* ,γ2*)+ log(p(β*)× p(γ2*))− log(p(β*,γ2* yo)) p(β* ,γ2* yo )= p(β* yo )× p(γ2* β* , yo ) (31)

The homoscedasticity model is simpler with only two parameters while they all have a closed form. Gibbs sampling can be used here to estimate the marginal likelihood (Gelman et al., 2004).

(A) Estimate p(β* yo)

1. Set initial value γ0.

2. Sample β1 from the posterior multi-‐normal distribution (26) with the initial value

γ₀.

3. Sample γ1

2_{from
the
posterior
inverse
gamma
distribution
(27)
with
the
value}β 1

from step 2.

4. Go back to step 2 with the value γ₁2_{from
step
3.}

6. Find the high-‐density value β*_{and
its
estimated
density}_p(β*

yo

).

(22)

(B) Estimate p(γ2* β*, yo)

1. Sample γ1

2_{from
the
posterior
inverse
gamma
distribution
(27)
with
the
value}β*

from part (A).

2. Repeat step 1 N times including burn-‐in time B till all the marginal distributions converge.

3. Find the high-‐density value γ 2*_{and
its
estimated
density}

p(γ2* β*, yo) .

(C) Calculate p(yβ,γ2

, Mhom) according to equation (31) in the same way as for the

heteroscedasticity part.

3) Bayes factor is Bheter/hom =

p(yα,β,Σ,w2, Mheter)

p(yβ,γ2

, M_hom) , in which p(yβ,γ

2

, Mhom) and

p(yα,β,Σ,w2

, Mheter) are separately estimated in part 1) and part 2).

The tests for convergence

The successful Markov Chain Monte Carlo simulation is mainly based on if the

simulation can actually converge to the target distributions. Before estimating the Bayes Factor, it is necessary to perform the tests for convergence and further decide the

proper burn-‐in draws.

One way to check if the Markov Chain has converged is to see how well the chain is mixing or moving around in the parameter space. If the chain is taking a long time to move around the parameter space, then it will take longer to converge.

Some visual inspections can be conducted to see how well the chain mixing and it needs to be performed for every parameters. Trace plots, running mean plots and

autocorrelation plots can be used as visual inspections. There are also a lot of diagnostics for testing convergence. The Heidelberg and Welch diagnostic

(Heidelberger and Welch, 1983) together with visual diagnostic running mean and autocorrelation plot will be used in this paper.

(23)

The Heidelberg and Welch diagnostic has two part.

The first part is a stationary test. It calculates the test statistic to accept or reject the null hypothesis: the Markov chain is from a stationary distribution. The test is applied first to the whole chain, if the null hypothesis is rejected, discarding the 10%, 20%, etc., of the chain until either the null hypothesis is accepted, or 50% of the chain is discarded which indicates a longer running is needed. If the null hypothesis is accepted, the number of iterations to keep and burn-‐in times are reported.

The second part is the half-‐width test, which uses the portion of the chain passed the stationary test to calculate a 95% confidence interval for the mean. The half-‐width of the interval is compared with the estimated mean. If the ratio between the half-‐width interval and the estimated mean is lower than a value ξ , the half-‐width test is passed. Otherwise, the length of the passed chain is not enough to estimate the sufficient accuracy mean.

The running mean plot is a plot of the iteration against the mean of all draws up to that iteration. The law of Large Numbers concerns the stability of the mean as the sample size increases. If the Chain converges, the running mean plot should show than the mean is moving tightly around a fixed value as the sample size increases.

The lag k autocorrelation ρkis the correlation between every draw and its kth lag. ρk = (xi− x i=1 n−k

∑

)(xi+k− x) (xi− x) 2 i=1 n

∑

The kth lag autocorrelation is expected to be smaller as k increases. If autocorrelation is still relatively high for higher k, it indicates slow mixing.

For Metropolis-‐Hasting Markov Chain Monte Carlo, the acceptance rate is also important diagnostic for convergence. The acceptance rate measures the proportion of new

parameters from a proposal distribution being accepted. It suggests the proper acceptance rate between 20% and 40% (Gelman et al., 2004).

(24)

The size and the power

In the frequentist hypothesis test framework

Size is the probability of rejecting H₀ when it is true. In order to evaluate the size of a frequentist hypothesis test, generate N times data under the null hypothesis and calculate proportion of rejections of H₀. Proportion should be close to the significant level α .

Power is the probability of rejecting H0 when it is false. To evaluate power, generate

data under the alternative hypothesis H₁ N times and calculate proportion of rejections of H0.

When controlling the size around the significance level, the bigger power implies better performance of a test.

In the Bayesian hypothesis test framework

Recall from the part Bayesian hypothesis test:

H₀: Homoscedasticity H₁: Heteroscedasticity

If Bayes factor Bheter/hom=

f (y Mheter)

f (y M_heter)> c (c is the critical value), the null hypothesis is rejected. For simulating the significance level in the frequentist hypothesis test, the N times Bayes Factor is calculated. The critical value c corresponds to the “significance level α ” when classical test is chosen. The N should be large enough to represent a general situation. Generate Bayes Factor N times with data under null hypothesis H0

and find the “ critical value c”. The proportion, when the Bayes Factors are bigger than c, should be equal to the significance level α . The power is the proportion when the

(25)

Bayes Factors are larger than c with the data generated under the alternative hypothesis

H₁N times.

In the M_heter and M_hom the Bayes Factor is actually affected by the sample size because the number of elements in the diagonal of Σ in the heteroscedasticity model M_heter is growing as the sample size is growing. The bigger the size gets, the smaller the Bayes Factor will become. So the different sample sizes have different “ critical value c” in our Bayesian hypothesis test.

(26)

Monte Carlo Simulation

Data generating process

In order to simplify the problem without losing generality, four independent variables from normal distributions are used in the simulation. The heteroscedasticity appeared normally in panel data in which R2_{is
normally
quite
low.
Therefore,
R}2_{is
chosen}

around 30% to control the coefficients β and α in (32), where β = (β0,β1,β2,β3,β4) and

α = (α0,α1,α2,α3,α4) .

Here is the model used to generate data:

yi =β0+β1xi1+β2xi 2+β3xi 3+β4xi 4 +εi εi ∼ N(0,σi) σi 2_{= exp(}α 0+α1xi1+α2xi 2+α3xi 3+α4xi 4) (32)

To make things simple, the parameter _{β = (1,1,1,1,1)
is
chosen
here
and
the
x}₁∼ N(1,1) , x2 ∼ N(1,1.2) , x3∼ N(1,1.5) and x4 ∼ N(1,1.8) are generated, where N stands for normal

distribution. In order to control the R2_{around
30%,
the
variance
of
the
disturbances}

should be around 18. In other words, the expected value of the variance in model (32) exp(α0+α1+α2+α3+α4) is around 18. So α = (2,0.3,0.2,0.25,0.25) is chosen in the

simulation study for the data with heteroscedasticity. With this parameter _{α ,
even} when no heteroscedasticity is in the data, the R2_{is
still
quite
low
around
50%.}

Three different comparisons are performed in our study: different sample size, normality assumption and the structure of variances.

1) Small sample size behavior is normally interesting because the asymptotically distribution of the test statistic is violated when the sample size is small. In this special case, the degrees of freedom can grow dramatically if the test statistic is estimated by regressing variances against all the independent variables, their squared forms and their cross products.

BAYESIAN AND FREQUENTIST HYPOTHESIS TESTS OF HETEROSCEDASTICITY

Örebro University

School of Business and Economics

Statistics, advanced level thesis, 15hp

Superviser: Sune Karlsson

Examiner: Niklas Karlsson

Spring 2014

BAYESIAN AND FREQUENTIST HYPOTHESIS

TESTS OF HETEROSCEDASTICITY

WeiWei Feng

1981-­‐09-­‐01

Abstract

Table of Contents

Introduction

The frequentist hypothesis tests

∑

The Bayesian Hypothesis test

∑

∑

∫

∑

∑

The size and the power

Monte Carlo Simulation

1981-‐09-‐01