On Median and Ridge Estimation of SURE Models


This doctoral dissertation is a progressive generalization of some robust estimation methods to the estimation of Seemingly Unrelated Regression Equations (SURE) models. The robust methods are the Least Absolute Deviations (LAD) estimation method, also known as median regression, and the ridge estimation method. The dissertation consists of five articles, as follows.

The first article is a generalization of the median regression to the estimation of the SURE models. The second article generalizes the median regression to the conventional multivariate regression analysis, i.e., the SURE models with identical design matrices across equations. In the third article, the author develops ridge estimation for the median regression; some properties and the asymptotic distribution of the proposed estimator are investigated. In the fourth article, the properties of some biasing parameters used in the literature for ridge regression are investigated when they are applied to the new methodology proposed in the third article. In the last article, the methodologies of the four preceding articles are assembled into a more generalized methodology to develop the ridge-type estimation of the LAD method for the SURE models.

ISSN 1403-0470
ISBN 978-91-86345-36-5
Jönköping International Business School, Jönköping University
JIBS Dissertation Series No. 083 • 2012

On Median and Ridge Estimation of SURE Models

ZANGIN ZEEBARI


Jönköping International Business School P.O. Box 1026

SE-551 11 Jönköping Tel.: +46 36 10 10 00 E-mail: info@jibs.hj.se www.jibs.se

On Median and Ridge Estimation of SURE Models

JIBS Dissertation Series No. 083

© 2012 Zangin Zeebari and Jönköping International Business School

ISSN 1403-0470

ISBN 978-91-86345-36-5


Acknowledgements

Achieving graduation is not always an easy process. This has certainly been true in my case, too. Without all the support and encouragement I received when I needed it, it would not have been possible to achieve this end. I would, therefore, like to acknowledge my debt to all whose support was in one way or another instrumental in helping me complete this work.

First and foremost, I have many debts of gratitude to my main advisor for my Ph.D. study, Prof. Ghazi Shukur, for his patience, generous help and solid support at each stage of this dissertation. The first few emails we exchanged were crucial to my future. He introduced me to academic research after he provided me with an invitation letter to join the Ph.D. program at Växjö University. I would not have been able to complete this course without his support. I am very happy that we subsequently worked together at Jönköping International Business School. I would not have been able to present my work at international conferences without his support. He has always given me the courage to write on new topics and to teach new subjects, and he helped me towards a feeling of freedom of choice and thought. He was always able to provide the necessary push forward to get the work done. Thank you, Prof. Shukur, for your help over all these years!

I would also like to thank my second advisor, Prof. Thomas Holgersson, for his support and invaluable comments on my work. He made a remarkable contribution to completing my dissertation by providing me with some of his books and several computers to run my simulations on. I strongly appreciate this.

I would like to especially thank Prof. Johan Lyhagen from Uppsala University for his invaluable comments and constructive suggestions on earlier versions of the articles, not least for the two occasions on which he did me the honor of going through my work, when he was the discussant at the defense of my Licentiate thesis and in my final seminar. All of these helped me make noticeable improvements.

I am also especially grateful to Prof. Jan Ekberg from Linnaeus University (Växjö) for his generous help in providing me with the data for the first article, his useful comments on an earlier draft of the article and his support in presenting it at the JSM 2009 international conference in Washington, D.C., USA. He was a great source of support and help at Växjö University.

I greatly appreciate the many kinds of help Associate Prof. Håkan Locking offered over the years. He was my second advisor at Linnaeus University (Växjö), where we worked together to prepare my thesis for the Licentiate of Philosophy of Science in Statistics. With his sharp analytical viewpoint he taught me to be more cautious about the data and the terminology I was using. His efforts were essential to my gaining admission to Linnaeus University (Växjö) as a Ph.D. student. The support and help he offered during my stay in Växjö made the department a very convenient and useful place to do research. Thank you for everything!

I should also like to thank some anonymous referees for their valuable comments on those articles already published. I also thank my teachers at Lund University: Prof. Krzysztof Nowicki for teaching me advanced probability theory and especially Prof. Björn Holmquist for his valuable and interesting lectures on advanced statistical inference and multivariate methods.

I would like to express my strong appreciation for the support I received from Prof. Börje Johansson during my Ph.D. study. I remember his confidence and the positive energy he gave during the periodic follow-up meetings for postgraduate study with Ph.D. candidates, and on the occasion on which I was interviewed by him and my advisors. I am grateful to Prof. Scott Hacker, Assistant Prof. Agostino Manduchi and Prof. Andreas Stephan for their patience in listening; I benefited greatly from talking with them and receiving their comments and guidance. I would like to thank Prof. Åke E. Andersson, Prof. Per-Olof Bjuggren, Associate Prof. Johan Klaesson and Prof. Charlotta Mellander for their support and feedback during the periodic follow-up meetings for postgraduate study. My special thanks are due to the administrator of the department, Kerstin Ferroukhi, for always leaving the door of her office open for me to discuss any problems and receive help. I would also like to thank Monica Bartels and Maria Carlén at the administration unit for their help. The technical assistance and advice of Susanne Hansson in formatting and printing this dissertation, and of Björn Kjellander and Annika Hjalmarsson in the endless corrections and spell checking of the articles, were greatly appreciated.

To all my colleagues at the Department of Economics, Finance and Statistics at Jönköping University, thank you for the joyful and stimulating work environment you provided for me there. In particular, I would like to express my appreciation to: Assistant Prof. Johan Eklund, Mikaela Backhman (especially for your assistance with the courses we taught together), Dr. James Dzansi, Tina Alpfält, Dr. Lina Bjerke, Dr. Johanna Palmberg, Peter Wards, Therese Norman, Özge Öner, Dr. Andreas Högberg, Sara Johansson, Johan P Larsson, Pia Nilsson, Louise Nordström, Lars Pettersson, Erik Wallentin, Diogenis Baboukardos, Hamid Jafari and Viroj Jienwatcharamongkhol.

I am extremely grateful to my friends and colleagues Dr. Kristofer Månsson, Rashid Mansoor, Dr. Peter Karlsson, Dr. Hyunjoo Kim Karlsson, Jan Weiss, Gabriel Bake and Dr. Pär Sjölander. Thank you for the best thing you shared with me over the years: your friendship. I benefited most from the warm and friendly discussions we had together. The coffee breaks, having lunch together, Friday after-works, watching movies (with Peter and Hyunjoo), and the countless other times I spent with you all created a stimulating atmosphere for my work.

During my Ph.D. study, my first and longest stay was in Växjö. For the 3 years and 8 months I stayed there, I worked in a comfortable and friendly environment created by the colleagues at the School of Business and Economics, Linnaeus University (Växjö). I would like to thank Prof. Ali Ahmad, Prof. Thomas Lindh, Prof. Dominique Anxo, Mirza Tassaduq Baig, Lars Andersson, Abdulla Almasri, Jonas Månsson and Lars Tomsmark. I would especially like to thank my fellow Ph.D. candidates there: Dr. Mikael Ohlson, Dr. Jonas Söderberg, Dr. Monika Hjeds Löfmark, Dr. Lina Andersson, Dr. Susanna Holzer, Dr. Maria Mikkonen, Andreas Mångs, Hans Jonsson and my roommate and friend Joel Karlsson. I am grateful for the enormous technical support with computers I received from Arthur Micallef and Ulf Eclund. The pleasant help and kindness of Hanna Sandqvist, Bernana Delic, Micael Jönsson, Senior Lecturer Christine Tidåsen, Senior Lecturer Anders Hytter, Erika Lundvall, Michaela Sandell and Katharina Lindroos were greatly appreciated. Many thanks are also due to my friend Farvid Mojtaba, a Ph.D. candidate at the School of Engineering, for his friendship and encouragement. I would also like to express my gratitude to Nasik Najar at the School of Engineering and her family for the joyful time we shared in Växjö and for their care and attention; I needed those beautiful moments of relaxation after work. I would like to thank Rojda for the friendship we shared; your friendship on tough days always helped me sustain the courage to go on. Thank you for that. The help and assistance of Mohamad Fardmoshiri and Saeid Bagherykhomamy were also greatly appreciated. I would also like to thank all my friends who gave me the courage and hope to complete this work.

Last but not least, my parents and family! Countless thanks are due to my siblings. I owe the greatest debt of gratitude to my parents, who brought me up to be proud of having them. Without your love and your belief in me I would have gotten nowhere. My most heartfelt thanks are also due to my wife Martyna. Your love has always been a great source of inspiration and courage.

Nevertheless, perfection is not obligatory for graduation. All remaining errors are, of course, of my own making.


Abstract

This doctoral dissertation is a progressive generalization of some robust estimation methods, making those methods applicable to the estimation of Seemingly Unrelated Regression Equations (SURE) models. The robust methods are the Least Absolute Deviations (LAD) estimation method, also known as median regression, and the ridge estimation method. The first part of the dissertation consists of a brief explanation of the LAD and the ridge methods. The contribution of this investigation to statistical methodology is the focus of the second part of the dissertation, which consists of five articles.

The first article is a generalization of the median regression to the estimation of the SURE models. The proposed methodology is compared with each of the Generalized Least Squares (GLS) method and the median regression of individual regression equations.

The second article generalizes the median regression to the conventional multivariate regression analysis, i.e., the SURE models with identical design matrices across equations. The results are compared with the median regression of individual regression equations and the conventionally used OLS estimation method for such models (which is equivalent to the GLS estimation, as well).

In the third article, the author develops ridge estimation for the median regression. Some properties and the asymptotic distribution of the presented estimator are investigated, as well. An empirical example is used to assess the performance of the new methodology.

In the fourth article, the properties of some biasing parameters used in the literature for ridge regression are investigated when they are used for the new methodology proposed in the third article.

In the last article, the methodologies of the four preceding articles are assembled in a more generalized methodology to develop the ridge-type estimation of the LAD method for the SURE models. This article has also provided an opportunity to investigate the behavior of some biasing parameters for the SURE models, which were previously used by some researchers in a non-SURE context.


Preface

With a prespecified model, even when the model specification itself is not a problem, the available data are often not perfectly suited to the estimation problem. Undesirability arises from the conflict between God's will (the data) and man's will (the model specification). Blanchard (1986) is quoted in Gujarati (2003) as saying that "Multicollinearity is God's will, not a problem with OLS or statistical technique in general". Gujarati also quotes Achen (1982) as follows.

“Beginning students of methodology occasionally worry that their independent variables are correlated – the so-called multicollinearity problem. But multicollinearity violates no regression assumptions. Unbiased, consistent estimates will occur, and their standard errors will be correctly estimated. The only effect of multicollinearity is to make it hard to get coefficient estimates with small standard errors. But having a small number of observations also has that effect, as does having independent variables with small variances. In fact, at a theoretical level, multicollinearity, few observations and small variances on the independent variables are essentially all the same problem. Thus, ‘What should I do about multicollinearity?’ is a question like ‘What should I do if I don’t have many observations?’ No statistical answers can be given”.

In the theory of statistics, it is not desirable (or not allowed) to retouch God’s will, but it is necessary to adapt man’s will to suit God’s will. The problem progresses from undesirable to disputable when man’s will clashes with itself. Pro-least-squares and anti-least-squares modelers (not to say scholars), for instance, may blame each other for dealing with problems which have actually arisen from God’s will. Applied statisticians can model the estimation problem differently based on the available data. Koenker (2005) remarks:

“Why does least-squares estimation of the linear regression model so pervade applied statistics? What makes it such a successful tool? Three possible answers suggest themselves. One should not discount the obvious fact that the computational tractability of linear estimators is extremely appealing. Surely this was the initial impetus for their success. Second, if observational noise is normally distributed (i.e., Gaussian), least-squares methods are known to enjoy a certain optimality. But, as it was for Gauss himself, this answer often appears to be an ex post rationalization designed to replace the first response. More compelling is the relatively recent observation that least-squares methods provide a general approach to estimating conditional mean functions”.

In applied statistics, problems with estimation are de jure due to model specification and de facto due to the nature of the data. The author is not among those who believe that man’s will should not be affected by God’s will. Gujarati names them the “do-nothing” school of thought. The nature of the data is as much God’s will as the Great Flood once was. For applied statisticians, the nature of the data is not a problem in itself; the estimation is the problem. On the one hand, applied statisticians do not want to, or cannot, change the nature of the data (stop the flood from happening, for instance). On the other hand, doing nothing (drowning in the flood) does not attract most of them. They try hard to survive the estimation problem and come up with a solution (safely landing on the summit) through their specified model. An impeccable model specification is one adapted to suit the data.

In an estimation problem, it is usually the case that some assumptions are imposed on the specified model by theoretical, mathematical-tractability or data-availability limitations. On the one hand, the stricter the assumptions, the more sensitive the estimation usually is to any violation of those assumptions by the data. On the other hand, the stronger the level (scale) of measurement for which the estimator is permitted, the stricter the assumptions imposed on the specified model are likely to be. For instance, reliance on the median is more robust than reliance on the mean. Robustness in linear regression analysis is one axis of this dissertation, and we pay particular attention to assumption-violating data. Some data violate one of several assumptions, while some beset the estimation by violating more than one. Data that cause multicollinear independent variables and non-Gaussian disturbances are dealt with emphatically in this thesis.

For statisticians, an estimator possesses desirable properties (is the most accurate estimator) if it is unbiased with the least variance. Or, more precisely, among a set of estimators, one with less bias and smaller variance is more desirable (more accurate). In this thesis, the accuracy of an estimator for comparative purposes is measured by the Mean Squared Error (MSE) of the estimator. Accuracy is usually interpreted as efficiency; thus, efficiency based on the MSE comprises another axis of the dissertation.
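As a minimal illustration of MSE-based comparison (a sketch with simulated data, not taken from the dissertation), the following Python snippet estimates by Monte Carlo the MSE of the sample mean and the sample median as estimators of a normal location parameter; under normality the mean is the more efficient of the two.

```python
import random
import statistics

random.seed(1)

TRUE_MU = 5.0     # true location parameter
N, REPS = 25, 500 # sample size and number of Monte Carlo replications

def mse(estimates, truth):
    """Monte Carlo MSE: average squared deviation from the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(TRUE_MU, 1.0) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

mse_mean = mse(means, TRUE_MU)
mse_median = mse(medians, TRUE_MU)
# Under Gaussian errors the mean attains the smaller MSE (higher efficiency);
# under heavier-tailed errors the ranking can reverse.
```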

The main research throughout the dissertation is presented in five articles. The articles are presented in the dissertation exactly as they were published or submitted for publication, with any further changes or comments outlined in footnotes.


Contents

Acknowledgements ... iii
Abstract ... vii
Preface ... ix
Contents ... xi

Part I ... 1

Introduction and Summary ... 3

1.1) Introduction ... 3

1.1.1) Median Regression ... 5

1.1.2) Seemingly Unrelated Regression Equations ... 7

1.1.3) Ridge Regression ... 8

1.2) Summary of the Included Articles ... 9

1.2.1) Article I: ... 9
1.2.2) Article II: ... 10
1.2.3) Article III: ... 11
1.2.4) Article IV: ... 11
1.2.5) Article V: ... 12
References ... 13

Part II ... 15
Article I ... 17
ABSTRACT ... 19
2.1) Introduction ... 20
2.2) Methodology ... 22
2.2.1) Median Regression ... 22
2.2.2) SURE Models ... 25

2.2.3) Seemingly Unrelated Median Regression Equations (SUMRE) Models ... 27

2.3) Efficiency ...31

2.4) Monte Carlo Design and Experiment ...32

2.5) Results ...35

2.5.1) SUMRE GLAD versus SURE GLS ...35

2.5.2) SUMRE GLAD versus Separate Median Regression ...36

2.6) Empirical Application ...41

2.7) Summary and Conclusions ...45

Appendix ...46

2.A.1) Multivariate Skew Normal Distribution ...46

2.A.2) Multivariate Skew-t Distribution: ...47

2.A.3) Parameter Selection: ...49

References ...54

Article II ...55

ABSTRACT ...57

3.1) Introduction ...58

3.2) Methodology ...59

3.3) Monte Carlo experiment ...63

3.4) Empirical example ...69

3.5) Conclusion ...71

Appendix ...72

3.A.1) Multivariate Skew-t Distribution ...73

3.A.2) Parameter Selection ...74

References ... 77
Article III ... 79
ABSTRACT ... 81
4.1) Introduction ... 82
4.2) Methodology ... 83
4.3) Simulation ... 88

4.4) Empirical Example ... 89
4.5) Conclusions ... 92
References ... 92
Appendix ... 94
Article IV ... 101
ABSTRACT ... 103
5.1) Introduction ... 104
5.2) Methodology ... 105
5.3) Simulation ... 107
5.4) Simulation Results ... 110

5.4.1) The LAD Ridge Estimation Results ... 111

5.4.2) The OLS Ridge Estimation Results ... 111

5.5) Empirical Example ... 112
5.6) Conclusions ... 114
References ... 115
Appendix ... 117
Article V ... 129
ABSTRACT ... 131
6.1) Introduction ... 132
6.2) Methodology ... 133

6.3) Monte Carlo Experiment ... 136

6.4) Simulation Results ... 140

6.5) Conclusions ... 142

References ... 143


Part I


Introduction and Summary

1.1) Introduction

Most applied statistical methodology involves an assumption-based approach to modeling different phenomena. Apart from the logical pre-assumptions, it is mainly based on probabilistic and mathematical assumptions. However, the extent to which each probabilistic and/or mathematical assumption is logical can be debated among applied statisticians. When it comes to probabilistic assumptions, the most popular are those that lead to mathematically and probabilistically tractable estimators; the easier the tractability of the estimator, the more popular the assumption. Some of the most popular assumptions in the statistical methodology of regression analysis are those that model the phenomena in a way that prepares the ground for adopting the Ordinary Least Squares (OLS) method of estimation. This is done mainly for the ease of mathematical and probabilistic tractability of the OLS estimator.

A strong assumption in regression analysis, which can be used to demonstrate the excellence of the OLS estimator, is the sphericity of the error terms. With spherical error terms, the Gauss–Markov theorem states that the OLS estimator is the Best Linear Unbiased Estimator (BLUE). All this is based on strong assumptions: the linearity of the estimator in terms of the dependent variable, the unbiasedness of the estimator and the minimum variance of the estimator. Furthermore, among spherical distributions, a Gaussian distribution of the error terms makes the OLS estimator a probabilistically tractable BLUE, which is of great interest in hypothesis testing and interval estimation. However, error normality is an idealized assumption.

The unbiasedness and the minimum variance together are strongly desirable statistical properties of estimators, since with these two properties the Mean Squared Error (MSE) of the estimator can be improved. That is the appeal of the assumption of error sphericity for linear estimators and the Gauss–Markov theorem. Nonetheless, one should not conclude that the MSE cannot be improved by other, biased and/or nonlinear, estimators; in fact, they can improve the MSE even further. However, it is easier to keep track of the properties of linear estimators. For instance, the probabilistic tractability and the robustness of linear estimators are much easier to investigate than those of nonlinear estimators. Therefore, throughout this dissertation, the focus is on multiple linear regression models.

Consider the classical multiple linear regression model

y_i = \mathbf{x}_i' \boldsymbol{\beta} + \varepsilon_i,    (1.1)

with the pairs (y_i, \mathbf{x}_i) a sample of dependent–independent variable observations, \varepsilon_i unobservable error terms, for i = 1, \dots, n, and the unknown (p+1) \times 1 vector parameter \boldsymbol{\beta}, including the intercept. In addition, let us assume the classical mathematical and probabilistic assumptions which are necessary for the OLS estimation of the parameter in the model (1.1). It is well known that with the idealized assumption of error normality, the maximum likelihood estimator of \boldsymbol{\beta} is the OLS estimator, which in turn leads to the sample estimate of the conditional mean of the dependent variable Y given \mathbf{x}, i.e., \hat{E}(Y \mid \mathbf{x}) = \mathbf{x}' \mathbf{b}, where \mathbf{b} is the OLS estimator.
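As a small illustration (a Python sketch with made-up data, not part of the dissertation), the OLS estimator for a single regressor with intercept can be computed from the closed-form normal equations; the fitted line then estimates the conditional mean of Y given x.

```python
# Ordinary least squares for one regressor with intercept,
# computed from the closed-form normal equations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly y = 2x with noise

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: Sxy / Sxx; intercept: chosen so the line passes through the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# The OLS fit estimates the conditional mean of Y given x.
fitted = [intercept + slope * xi for xi in x]
```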

A serious problem may arise when one or more of the assumptions necessary for the OLS estimation, or the idealized error normality assumption, are not convincingly satisfied. In real-life studies, it is quite common to face situations in which the available data cannot be framed to follow all the assumptions necessary for making the OLS estimate the idealized one. The question is whether to preserve the prespecified model and change the procedure for dealing with the data, by using proper transformations of the variables, or to draw up new model specifications.

The transformation of the variables can also be considered a change of the model, since the new model includes some new variables, but it can also still be seen as a mirror of the original model; at least, in many cases an estimate of the original model can be retrieved from the findings of the transformed model. By different model specifications, the author means different statistical analysis procedures or different regression models, such as generalized linear models.

This dissertation expresses concern over the violation of the error normality assumption and high multicollinearity of the independent variables, though neither causes problems for the OLS estimation itself. The only problem is that the OLS estimator will lack the ideal assumptions necessary for being probabilistically tractable and precise. Another assumption maintained throughout the dissertation is the nonrandomness of the independent variables in a regression model, though in many areas of research both the dependent and independent variables are randomly collected. However, except for experimental designs, building models upon the assumption of fixed independent variables is adopted just for the sake of the simplicity of the probabilistic tractability of the estimators.

With non-normal errors, one way of dealing with the model (1.1) is to make other assumptions about the error terms and to continue to find the maximum likelihood estimate of the model. With different distributions assumed for the error terms, a class of robust estimators is defined, called in the literature (in the simplest and incomplete definition) the class of M-estimators (see Maronna, Martin & Yohai, 2006). The OLS estimator is, of course, also an M-estimator.

A common way of dealing with the problem of multicollinearity is to shrink the estimator through a penalty term imposed on the objective function. The simplest shrinkage estimation method is the ordinary ridge regression method (see Hoerl & Kennard, 1970a,b). This section is briefly followed by a discussion of a special M-estimator and of the ridge-type estimator of the regression parameter, both used as robust estimators for the linear regression model (1.1).
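As a small illustration of ridge shrinkage (a sketch with made-up data, not taken from the dissertation), for a single centered regressor the ordinary ridge estimator adds a biasing constant k to the denominator of the OLS slope, shrinking the estimate toward zero as k grows.

```python
# Ordinary ridge regression for one centered regressor:
# beta(k) = sum(x*y) / (sum(x^2) + k), with k >= 0 the biasing parameter.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # already centered
y = [-4.2, -1.9, 0.1, 2.1, 3.9]

sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

def ridge_slope(k):
    """Ridge estimate of the slope for biasing parameter k (k = 0 gives OLS)."""
    return sxy / (sxx + k)

beta_ols = ridge_slope(0.0)     # ordinary least squares
beta_ridge = ridge_slope(5.0)   # biased, but shrunk toward zero
```

The bias introduced by k can be traded for a large reduction in variance when the regressors are nearly collinear, which is the motivation behind the ridge method.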

1.1.1) Median Regression

It is well known that the maximum likelihood estimator of a location parameter is the OLS estimator when the sample is from a normal distribution, and the Least Absolute Deviations (LAD) estimator when the distribution is Laplace (double exponential). Additionally, the OLS estimate of the location parameter leads to the sample mean and the LAD estimate to the sample median. Both are consistent and asymptotically unbiased estimators. Whatever the distribution, if it is symmetric and the mean exists, then the mean and the median coincide and their estimates will be quite close to each other.
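This can be seen numerically (a seeded Python sketch, not taken from the dissertation): for a symmetric sample the two location estimates nearly coincide, while skewness pulls them apart.

```python
import random
import statistics

random.seed(7)

# Symmetric case: N(0, 1) -- mean and median estimate the same location.
symmetric = [random.gauss(0.0, 1.0) for _ in range(10_000)]
gap_symmetric = abs(statistics.mean(symmetric) - statistics.median(symmetric))

# Skewed case: exponential with mean 1 -- the population median is
# ln 2 (about 0.693), so the two estimates differ systematically.
skewed = [random.expovariate(1.0) for _ in range(10_000)]
gap_skewed = abs(statistics.mean(skewed) - statistics.median(skewed))
```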

Similarly, for the linear regression model (1.1), the maximum likelihood estimator of the parameter is the OLS estimator if the independent identically distributed (iid) error terms are supposed to come from a normal distribution, while it is the LAD estimator if the distribution is Laplace. Therefore, the LAD estimator is an M-estimator as well. The LAD estimation method for the model (1.1) is also called median regression, a special case of quantile regression at the 0.5 quantile (the median) (see Bloomfield & Steiger, 1983). If the supposed error distribution is symmetric and the mean exists, the OLS and the LAD estimates of the model (1.1) will be quite close to each other. It is well known that when the error mean shifts from zero, the OLS estimator of model (1.1) is biased. Similarly, a nonzero error median leads to a biased LAD estimator. With an intercept involved, for both estimators the slopes are unbiased, but the bias of the intercept will be equal to the amount of the location shift from zero.

For many real data sets, the two estimates differ considerably from each other because of skewness in the error terms. If the maximum likelihood estimation is based on the normality assumption of the error terms, the residuals resulting from the OLS fit must, to a large extent, follow a normal distribution. The same argument holds for the LAD fit residuals and the Laplace distribution. However, any violation of the assumptions about the error distribution affects the OLS estimate more than the LAD estimate.

The influence function of M-estimates, explained by Hampel et al. (1987), can be factorized into two parts: the influence of the residuals, which is bounded, and the influence of the position of the independent variables, which is unbounded. For a linear model with random independent variables, the M-estimates are generally nonrobust, since they are sensitive to leverage observations and have a zero breakdown point. However, for models with fixed independent variables, the robustness of some other M-estimates, e.g., the LAD estimate, improves further compared to the robustness of the OLS estimate. For those estimates, the influence of the residuals is more tightly bounded than the influence of the OLS fit residuals. The following theorem (see Koenker, 2005) shows the robustness of quantile regression estimates to outliers in the dependent variable.

Theorem:

Let D be an n \times n diagonal matrix with nonnegative elements and \hat{u} = y - X\hat{\beta}(\tau; y, X) be the residual vector of the \tau th quantile regression fit, with \hat{\beta}(\tau; y, X) the \tau th quantile regression estimate of the model (1.1), y the vector of the observed dependent variable and X the design matrix. Then,

\hat{\beta}(\tau; y, X) = \hat{\beta}(\tau; X\hat{\beta}(\tau; y, X) + D\hat{u}, X).

The above theorem indicates that the quantile regression estimate (including the LAD estimate) is not affected by any change in the values of the dependent variable for some observations, as long as the relative positions of the observation points to the fitted hyperplane are maintained. Unlike the quantile regression estimates (with the LAD estimate as a special case), the OLS estimate is highly sensitive to any outliers in the residuals.


1.1.2) Seemingly Unrelated Regression Equations

Let us consider the multiple linear regression model (1.1) suffering from autocorrelation and heteroscedasticity problems, i.e., E(ε) = 0ₙ and E(εε′) = σ²V ≠ σ²Iₙ. Then the OLS estimator is still unbiased but no longer the BLUE. An alternative to the OLS estimation method is Aitken's Generalized Least Squares (GLS) estimation method. Roughly speaking, the idea behind the GLS estimation method is to apply a transformation that removes the non-scalar error covariance matrix, i.e., restores σ²Iₙ, while keeping the regression parameter the same as in the original model. That transformation is carried out by multiplying both sides of the model (1.1) by the inverse of the square root of the error covariance matrix, i.e.,

V^(−1/2)Y = V^(−1/2)Xβ + V^(−1/2)ε.    (1.2)

According to the Gauss-Markov theorem, the OLS estimator of the transformed model (1.2) is the BLUE. This property is the core argument behind Zellner's GLS estimation of the Seemingly Unrelated Regression Equations (SURE) models.

Let us consider a system of M multiple linear regression equations whose error terms are contemporaneously correlated and intertemporally independent. Such systems of equations were called Seemingly Unrelated Regression Equations (SURE) models by Zellner (1962) in his seminal paper. With a block-diagonal design matrix, the observation vector of the dependent variable of the SURE model is a stack of the observation vectors of the dependent variables of the M multiple linear regressions, and the error vector is likewise a stack of the error vectors of the individual regression equations.

The individual regression equations are assumed to be free from heteroscedasticity and autocorrelation, but they do not necessarily share the same scalar error covariance matrix. Therefore, stacking the error vectors of the individual equations into the error vector of the SURE model introduces heteroscedasticity. Additionally, the inter-equation correlation of the error terms induces autocorrelation at lag n, where n is the sample size. Consequently, the GLS estimator of the SURE model, not the OLS estimator, is the BLUE. Note that using any positive definite matrix in place of the inverse of the square root of the error covariance matrix in the GLS transformation gives an estimator at least as efficient as the OLS estimator, but only those positive definite matrices that make the transformed errors spherical yield the BLUE.
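Zellner's two-step (feasible) GLS procedure for a SURE model can be sketched in a few lines of numpy; the two-equation system, coefficient values, and error covariance below are invented for illustration, and the sketch is not the dissertation's code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400                                    # observations per equation
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta1 = np.array([1.0, 2.0, -1.0])
beta2 = np.array([0.5, 1.0, 0.0, 3.0])

# Contemporaneously correlated, intertemporally independent errors
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
E = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
y1 = X1 @ beta1 + E[:, 0]
y2 = X2 @ beta2 + E[:, 1]

# Stacked SURE system with block-diagonal design
X = np.block([[X1, np.zeros((n, X2.shape[1]))],
              [np.zeros((n, X1.shape[1])), X2]])
y = np.concatenate([y1, y2])

# Step 1: equation-by-equation OLS, then estimate Sigma from the residuals
r1 = y1 - X1 @ np.linalg.lstsq(X1, y1, rcond=None)[0]
r2 = y2 - X2 @ np.linalg.lstsq(X2, y2, rcond=None)[0]
S = np.cov(np.vstack([r1, r2]))            # 2 x 2 estimate of Sigma

# Step 2: GLS with Cov(errors) = Sigma kron I_n, so Omega^{-1} = S^{-1} kron I_n
Omega_inv = np.kron(np.linalg.inv(S), np.eye(n))
XtOi = X.T @ Omega_inv
beta_fgls = np.linalg.solve(XtOi @ X, XtOi @ y)
print(beta_fgls.round(2))
```

With uncorrelated equations S is close to diagonal and the estimate collapses toward the equation-by-equation OLS fits, in line with the "never-lose" property discussed below.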

Here it may come to mind that the GLS estimator is the BLUE for a model whose problems were caused by the SURE block-diagonal structure in the first place. In fact, the OLS estimator of the SURE model is equivalent to the OLS estimators of the individual regression equations. Therefore, even when the inter-equation errors are uncorrelated, so that their covariance matrix is a diagonal positive definite matrix, the GLS estimation is a never-lose procedure. The GLS estimation of the SURE model is not helpful, however, when the design matrices of the individual regression equations are identical. By an identical design matrix is meant not merely the same independent variables in each equation, but the same observations on the independent variables of each regression equation. In the literature, such SURE models are known as conventional multivariate regression models.

1.1.3) Ridge Regression

Ridge regression, proposed by Hoerl & Kennard (1970a,b), is the most common way of dealing with the problem of multicollinearity when none of the highly collinear independent variables is to be removed from the model. Perfect multicollinearity is a severe mathematical problem for the OLS estimation and many other statistical methods and procedures. Throughout the dissertation, references to multicollinearity indicate less than perfect multicollinearity, unless clearly stated otherwise.

Simply expressed, the problem of multicollinearity arises when we try to retrieve more information than the data embody. If we regard the variation in the data as the information they embody, the problem appears when there is little information in one or more orthogonal directions. If more data are available, collecting them resolves the problem; otherwise, ridge regression adds some fictitious information to the data. Most of the research on ridge regression has concerned that fictitious information added to the sample. In ordinary ridge regression, it is represented by a single value called the biasing parameter.

The amount of information in each orthogonal direction is the eigenvalue of the covariance matrix of the independent variables corresponding to an eigenvector of that matrix; the orthogonal direction is the direction of the eigenvector itself. Therefore, with little information in one or more orthogonal directions, one or more eigenvalues are very close to zero, and so is the determinant. The covariance matrix of the OLS estimator (and generally of M-estimators) depends on the reciprocal of the determinant of the covariance matrix of the independent variables, so multicollinear independent variables result in large variances of the estimated slopes. With its fictitious information, ridge regression reduces the variances of the slopes by expanding the determinant of the covariance matrix of the independent variables, i.e., changing it to a larger positive number. But the fictitious information, being unreal, introduces a bias into the estimates, in the form of shrinking the estimated parameter vector to a vector of shorter length. The ridge estimation is therefore a trade-off between bias and variance, made so as to give a smaller MSE of the parameter estimator.

There is no rule of thumb for how severe the multicollinearity should be before ridge regression is warranted. With perfect multicollinearity, any positive biasing parameter improves the MSE of the estimator, while with perfectly orthogonal independent variables, i.e., zero multicollinearity, the MSE of the ridge estimator is always bigger than the MSE of the OLS estimator. Ridge regression can improve the MSE of the parameter estimator even if the multicollinearity is not severe, but with low multicollinearity there is little need to adopt it; there is, however, no specific rule for deciding when multicollinearity is low or high.
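The eigenvalue view and the shrinkage described above can be sketched with numpy; the nearly collinear design is simulated and the biasing parameter k = 1 is an arbitrary choice for illustration, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
z = rng.normal(size=n)
# Two nearly collinear regressors: little information in one direction
X = np.column_stack([z, z + 0.01 * rng.normal(size=n)])
beta = np.array([1.0, 1.0])
y = X @ beta + rng.normal(size=n)

def ridge(X, y, k):
    """Ordinary ridge estimator (X'X + k I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# One eigenvalue of X'X is near zero, so its determinant is near zero
eigvals = np.linalg.eigvalsh(X.T @ X)
print("smallest eigenvalue:", eigvals[0])

b_ols   = ridge(X, y, 0.0)                 # k = 0 gives the OLS estimate
b_ridge = ridge(X, y, 1.0)
# Ridge shrinks the estimated vector to a shorter one (the bias side of the trade-off)
print(np.linalg.norm(b_ridge), "<", np.linalg.norm(b_ols))
```

Adding k to every eigenvalue expands the determinant of X'X + kI, which is how the variance reduction in the text comes about.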

1.2) Summary of the Included Articles

In this section, the problems investigated in the dissertation, along with the suggested solutions, are briefly discussed. The discussion is organized around the five articles, each proposing a new statistical methodology.

1.2.1) Article I:

In “On the Median Regression for SURE Models with Applications to 3-Generation Immigrants Data in Sweden”, a LAD estimation method is proposed for estimating the SURE models in the presence of skewed errors (see Shukur & Zeebari, 2011). The same transformation used for the GLS estimation of the SURE models is exploited in developing the LAD estimation method called the Generalized Least Absolute Deviations (GLAD) estimation method. More precisely, instead of the OLS estimate, the LAD estimate of the transformed SURE model is calculated.

Some properties of the new estimator are investigated. For instance, with no correlation between equations, the GLAD estimator is equivalent to the LAD estimators of the individual regression equations. Additionally, the LAD estimator of the SURE model is again equivalent to the LAD estimators of the individual regression equations. However, with identical design matrices of individual regressions, the GLAD estimator is not equivalent to the LAD estimators of individual regression equations.


A simulation study shows the efficiency of the GLAD estimator over the LAD estimators of the individual regression equations in the presence of inter-equation error correlations. The higher the level of those correlations, the more efficient the GLAD estimator is likely to be relative to the LAD estimators. It is also found that the GLAD estimator has smaller total variance and generalized variance in the presence of skewed intra-equation errors. The further the errors depart from normality, the bigger the gap between the total and generalized variances of the GLAD estimator and those of the GLS estimator, in favor of the GLAD estimator.

Additionally, the GLAD estimation method is used for some real data and the results of the estimation are compared with the results of the GLS method and the LAD method of individual equations.

1.2.2) Article II:

In “Median Regression for SUR Models with the Same Explanatory Variables in Each Equation”, the GLAD estimation method is suggested for SURE models with an identical design matrix in each equation (see Shukur & Zeebari, 2012). In such cases, the GLS estimation collapses to the OLS estimation of the individual regression equations, meaning that the information embedded in the inter-equation error correlation is abandoned. In conventional multivariate regression analysis, therefore, if the error covariance matrix is not diagonal, the GLAD estimation method can still exploit it to gain efficiency.

Contrary to the GLS estimator, which collapses to the OLS estimators of the individual equations, the GLAD estimator is mathematically proved not to be equivalent to the LAD estimators of the individual equations. A simulation study shows the smaller total variance and generalized variance of the GLAD estimator compared to those of the OLS estimator in the presence of inter-equation error correlations and intra-equation error skewness.

An example previously used in the literature is taken up again to show the applicability of the GLAD estimation method to conventional multivariate regression models. The GLAD estimates are compared with the OLS and the LAD estimates, and are shown to have lower variances than their OLS and LAD counterparts.


1.2.3) Article III:

In “Developing Ridge Estimation Method for Median Regression”, the problem of multicollinearity in the median regression is targeted (see Zeebari, 2012a). After adding a fictitious portion to the data, as is done in ridge-type estimation, the LAD estimation is used instead of the OLS estimation. The resulting estimator is called the LAD ridge estimator. Some properties of the LAD ridge estimator, such as asymptotic normality, are investigated.

A simulation study shows the relative efficiency of the LAD ridge estimator over the LAD estimator in the presence of multicollinearity. The higher the multicollinearity, the greater the relative efficiency of the LAD ridge estimator over the LAD estimator. Furthermore, the MSE of the LAD ridge estimator is shown to be much smaller than the MSE of the OLS ridge estimator when the errors are skewed; for any increase in the error skewness, the gap between the two MSEs increases in favor of the LAD ridge estimator.

An empirical example from the literature is used to show the relative efficiency of the LAD ridge estimator over the LAD estimator and the greater robustness of the LAD ridge estimator compared to the OLS ridge estimator.
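One minimal reading of the augmentation idea described above can be sketched as follows: append the fictitious observations √k·I to X and zeros to y (the augmentation that reproduces the OLS ridge estimator under least squares), then apply an LP-based LAD fit. The data and the helper names `lad_fit`/`lad_ridge` are invented for this illustration; it is not the article's exact estimator, and note that under the absolute-deviations criterion the appended rows act as an absolute-value penalty √k Σ|bⱼ| rather than the quadratic penalty k Σ bⱼ².

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """Median regression via LP: minimize the sum of absolute residuals."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    return linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs").x[:p]

def lad_ridge(X, y, k):
    """LAD fit after appending the fictitious rows sqrt(k) I -> 0."""
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(k) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    return lad_fit(X_aug, y_aug)

rng = np.random.default_rng(3)
n = 80
z = rng.normal(size=n)
X = np.column_stack([z, z + 0.05 * rng.normal(size=n)])     # collinear pair
y = X @ np.array([1.0, 1.0]) + rng.standard_t(df=3, size=n)  # heavy-tailed errors

b_lad   = lad_fit(X, y)
b_ridge = lad_ridge(X, y, k=1.0)
print(b_lad.round(2), b_ridge.round(2))
```

The penalized fit never has a larger absolute coefficient sum than the plain LAD fit, which is the shrinkage effect of the fictitious data in this setting.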

1.2.4) Article IV:

In “A Simulation Study on the Least Absolute Deviations Method for Ridge Regression”, 17 biasing parameters previously used in the literature for ridge regression are investigated for the LAD ridge method (see Zeebari, 2012b). Furthermore, a LAD version of each of those biasing parameters, based on the LAD fit of the regression model, is developed. In a simulation study, the 17 biasing parameters, along with their proposed LAD versions, are used for both the LAD ridge and the OLS ridge estimation.

With each combination of sample size, level of multicollinearity, level of error skewness and number of independent variables imposed on the simulated data, an optimum value of the biasing parameter is detected for each of the LAD ridge and the OLS ridge methods. By the optimum value of the biasing parameter is meant a positive value that gives the minimum MSE of the estimator. Each calculated biasing parameter is considered an estimate of the optimum biasing parameter. Proceeding in this manner, an MSE is calculated for each biasing parameter, along with the MSEs of the LAD ridge estimator and the OLS ridge estimator using those biasing parameters.

The same empirical example as in Article III is used to compute the LAD and the OLS versions of the biasing parameters. Then, with each calculated biasing parameter, the LAD ridge and the OLS ridge estimates are calculated as well.

1.2.5) Article V:

In “On the Least Absolute Deviations Method for Ridge Estimation of SURE Models”, the LAD ridge estimation method proposed in Article III is used for the estimation of the SURE model, after performing the same transformation as in the GLAD and the GLS estimation (see Zeebari & Shukur, 2012). With the new methodology, the two problems of intra-equation multicollinearity and error skewness in the SURE context are targeted together.

In a Monte Carlo simulation study, the same 17 biasing parameters investigated in Article IV were used for the LAD ridge estimation and the Least Squares (LS) ridge estimation of the SURE model. Some of the biasing parameters were not previously used in the SURE context. Based on the simulation results, with many of those biasing parameters, the LAD ridge estimator is more efficient than the GLAD estimator, in the presence of multicollinearity. The same argument holds for the LS ridge estimator and the GLS estimator of the SURE model.

Ridge estimation is more beneficial in the LS context, as can be seen from the smaller MSE of the LS ridge estimator compared to that of the LAD ridge estimator. A likely reason is that the LAD estimation is already more robust than the LS estimation, so ridge-type estimation improves the LS estimation more than it does the LAD estimation. However, with an increase in the level of skewness of the intra-equation errors, the MSE of the LAD ridge estimator decreases while that of the LS ridge estimator increases; this closes the gap between the two MSEs and can even make the MSE of the LAD ridge estimator the smaller one when the level of error skewness gets much higher.


References

[1] Aitken, A. C. (1935), “On Least-Squares and Linear Combination of Observations”, Proceedings of the Royal Society of Edinburgh, 55: 42-48.

[2] Bloomfield, P. & Steiger, W. L. (1983), “Least Absolute Deviations: Theory, Applications, and Algorithms”, Birkhäuser Boston, Inc., USA.

[3] Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. & Stahel, W. A. (1986), “Robust Statistics: The Approach Based on Influence Functions”, Wiley & Sons, Ltd., Canada.

[4] Hoerl, A. E. & Kennard, R. W. (1970a), “Ridge Regression: Biased Estimation for Nonorthogonal Problems”, Technometrics, 12, 1: 55-67.

[5] Hoerl, A. E. & Kennard, R. W. (1970b), “Ridge Regression: Applications to Nonorthogonal Problems”, Technometrics, 12, 1: 69-82.

[6] Koenker, R. (2005), “Quantile Regression”, Cambridge University Press, New York, USA: 138-141.

[7] Maronna, R. A., Martin, R. D. & Yohai, V. J. (2006), “Robust Statistics: Theory and Methods”, Wiley & Sons, Ltd., England.

[8] Shukur, G. & Zeebari, Z. (2011), “On the Median Regression for SURE Models with Applications to 3-Generation Immigrants Data in Sweden”, Economic Modelling, 28, 6: 2566-2578.

[9] Shukur, G. & Zeebari, Z. (2012), “Median Regression for SUR Models with the Same Explanatory Variables in Each Equation”, Journal of Applied Statistics, 39, 8: 1765-1779.

[10] Zeebari, Z. (2012a), “Developing Ridge Estimation Method for Median Regression”, Journal of Applied Statistics, DOI: 10.1080/02664763.2012.724663.

[11] Zeebari, Z. (2012b), “A Simulation Study on the Least Absolute Deviations Method for Ridge Regression”, forthcoming in Communications in Statistics–Theory and Methods.

[12] Zeebari, Z. & Shukur, G. (2012), “On the Least Absolute Deviations Method for Ridge Estimation of SURE Models”, submitted to Communications in Statistics–Theory and Methods.

[13] Zellner, A. (1962), “An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias”, Journal of the American Statistical Association, 57: 348-368.


Part II

Collection of Articles

Article 1

On the Median Regression for SURE Models with Applications to 3-Generation Immigrants Data in Sweden

Zangin Zeebari & Ghazi Shukur

Article 2

Median Regression for SUR Models with the Same Explanatory Variables in Each Equation

Zangin Zeebari & Ghazi Shukur

Article 3

Developing Ridge Estimation Method for Median Regression Zangin Zeebari

Article 4

A Simulation Study on the Least Absolute Deviations Method for Ridge Regression

Zangin Zeebari

Article 5

On the Least Absolute Deviations Method for Ridge Estimation of SURE Models
