
Linköping Studies in Science and Technology. Dissertations.

No. 1255

Studies in Estimation of Patterned

Covariance Matrices

Martin Ohlson

Department of Mathematics

Linköping University, SE–581 83 Linköping, Sweden

Linköping 2009

Martin Ohlson
martin.ohlson@liu.se
www.mai.liu.se

Mathematical Statistics
Department of Mathematics
Linköping University
SE–581 83 Linköping, Sweden

ISBN 978-91-7393-622-4    ISSN 0345-7524

Copyright © 2009 Martin Ohlson

Printed by LiU-Tryck, Linköping, Sweden 2009


Abstract

Ohlson, M. (2009). Studies in Estimation of Patterned Covariance Matrices. Doctoral dissertation. ISBN 978-91-7393-622-4. ISSN 0345-7524.

Many testing, estimation and confidence interval procedures discussed in the multivariate statistical literature are based on the assumption that the observation vectors are independent and normally distributed. The main reason for this is that sets of multivariate observations often are, at least approximately, normally distributed. Normally distributed data can be modeled entirely in terms of their means and variances/covariances. Estimating the mean and the covariance matrix is therefore a problem of great interest in statistics, and it is of great significance to consider the correct statistical model. The estimator of the covariance matrix is important since inference on the mean parameters strongly depends on the estimated covariance matrix, and the dispersion matrix for the estimator of the mean is a function of it.

In this thesis the problem of estimating parameters for a matrix normal distribution with different patterned covariance matrices, i.e., different statistical models, is studied.

A p-dimensional random vector is considered for a banded covariance structure reflecting m-dependence. A simple non-iterative estimation procedure is suggested which gives an explicit, unbiased and consistent estimator of the mean and an explicit and consistent estimator of the covariance matrix for arbitrary p and m.

Estimation of parameters in the classical Growth Curve model when the covariance matrix has some specific linear structure is considered. In our examples maximum likelihood estimators cannot be obtained explicitly and must rely on numerical optimization algorithms. Therefore explicit estimators are obtained as alternatives to the maximum likelihood estimators. From a discussion about residuals, a simple non-iterative estimation procedure is suggested which gives explicit and consistent estimators of both the mean and the linearly structured covariance matrix.

This thesis also deals with the problem of estimating the Kronecker product structure. The sample observation matrix is assumed to follow a matrix normal distribution with a separable covariance matrix; in other words, it can be written as a Kronecker product of two positive definite matrices. The proposed estimators are used to derive a likelihood ratio test for spatial independence. Two cases are considered: when the temporal covariance is known and when it is unknown. When the temporal covariance is known, the maximum likelihood estimates are computed and the asymptotic null distribution is given. In the case when the temporal covariance is unknown, the maximum likelihood estimates of the parameters are found by an iterative alternating algorithm and the null distribution for the likelihood ratio statistic is discussed.


Popular Science Summary

In many estimation problems, testing procedures and confidence interval computations discussed in the multivariate statistical literature, the observed vectors or matrices are assumed to be independent and normally distributed. The main reason for this is that the observations usually have these properties, at least approximately. Normally distributed data can be modeled entirely through their mean structure and variances/covariances. It is therefore a problem of great interest to estimate the mean and the covariance matrix well, and at the same time it is important to assume a correct statistical model. The estimation of the covariance matrix is also important because conclusions about the mean depend on the estimated covariance matrix.

This thesis discusses the problem of estimating the parameters, that is, the mean and the covariance matrix, of a matrix normal distribution when the covariance matrix has different patterns, i.e., under different statistical models.

Several different structures are considered. First, a p-dimensional random vector is discussed which is assumed to have a covariance matrix with banded structure, corresponding to m-dependence. A simple algorithm is proposed which gives an explicit, unbiased and consistent estimator of the mean and an explicit and consistent estimator of the covariance matrix for arbitrary dimension p and bandwidth m.

Estimation of the parameters in the classical Growth Curve model when the covariance matrix has a linear structure is a problem of great interest. In many examples, maximum likelihood estimates cannot be obtained explicitly and must therefore be computed with some numerical optimization algorithm. We compute explicit estimators as a good alternative to the maximum likelihood estimators. From a discussion of the residuals, a simple algorithm is given which results in unbiased and consistent estimators of both the mean and the linearly patterned covariance matrix.

The thesis also treats the problem of estimating the Kronecker product structure. The observations are assumed to follow a matrix normal distribution with a separable covariance matrix, that is, it can be written as a Kronecker product of two positive definite matrices. These matrices can be interpreted as the spatial and the temporal covariance. The main goal is to compute the likelihood ratio test for spatial independence. First, the temporal covariance is assumed known and the maximum likelihood estimates are computed. It is shown that the distribution of the likelihood ratio equals that of the case with independent observations. When the temporal covariance is unknown, an iterative algorithm for finding the maximum likelihood estimates is then derived, and the distribution of the likelihood ratio is discussed. When various tests on covariance matrices are performed, quadratic forms arise. In this thesis a generalization of the distribution of quadratic forms to matrices is derived. It is shown that the distribution of a quadratic form is the same as the distribution of a weighted sum of non-central Wishart distributed matrices.


Acknowledgments

First of all I would like to thank my supervisor Professor Timo Koski for giving me the opportunity to work on the problems discussed in this thesis. It has been very interesting and I am grateful for all the freedom in my work.

I am also very grateful to my assistant supervisor Professor Dietrich von Rosen for all the ideas and all the discussions we have had. Thank you for showing me the "multivariate world" and for all the inspiration.

During my time as a PhD student I have visited several people around the world. My deepest thanks go to Professor Muni S. Srivastava for my time at the Department of Statistics, University of Toronto. During my time in Canada I was also invited to give a seminar at the Department of Mathematics & Statistics, University of Maryland, Baltimore County. Thank you, Professor Bimal Sinha, for that opportunity.

I would also like to thank my present and former colleagues at the Department of Mathematics. In particular I wish to thank all the PhD students at the department and my colleagues at the Division of Mathematical Statistics, especially Dr. Eva Enqvist. Thank you for many interesting discussions and ideas.

For the LaTeX layout of this thesis I am grateful to Dr. Gustaf Hendeby. The LaTeX template has been very convenient and easy to work with. Thank you very much.

I am very grateful to my dear friend Dr. Thomas Schön for his support and for proofreading this thesis. We have had a lot of interesting discussions over the years, and the coffee breaks with "nice" coffee have been valuable.

Finally, I would like to thank my family for all your support. You have always believed in me and encouraged me. Especially you, Lotta! You are very special in my life, you are my everything!

Linköping, April 16, 2009 Martin Ohlson


Contents

1 Introduction
  1.1 Background
  1.2 Outline
    1.2.1 Outline of Part I
    1.2.2 Outline of Part II
  1.3 Other Publications
  1.4 Contributions

I Estimation of Patterned Covariance Matrices

2 Estimation of the Covariance Matrix for a Multivariate Normal Distribution
  2.1 Multivariate Distributions
    2.1.1 Matrix Normal Distribution
    2.1.2 Wishart and Non-central Wishart Distribution
    2.1.3 Results on Quadratic Forms
  2.2 Estimation of the Parameters
    2.2.1 Maximum Likelihood Estimators
    2.2.2 Patterned Covariance Matrix
    2.2.3 Estimating the Kronecker Product Covariance
    2.2.4 Estimating the Kronecker Product Covariance (One Observation Matrix)
    2.2.5 The Distribution of a Special Sample Covariance Matrix

3 Estimation of the Covariance Matrix for a Growth Curve Model
  3.1 The Growth Curve Model
  3.2 Estimation of the Parameters
    3.2.1 Maximum Likelihood Estimators
    3.2.2 Growth Curve Model with Patterned Covariance Matrix

4 Concluding Remarks
  4.1 Conclusion
  4.2 Future Research

Bibliography

Due to copyright restrictions the articles are not included in the electronic version.

II Papers

A On Distributions of Matrix Quadratic Forms
  1 Introduction
  2 Distribution of Multivariate Quadratic Forms
  3 Complex Matrix Quadratic Forms
  4 A Special Sample Covariance Matrix
  References

B Explicit Estimators under m-Dependence for a Multivariate Normal Distribution
  1 Introduction
  2 Definitions and Notation
  3 Explicit Estimator of a Banded Covariance Matrix
  4 Simulation
  References

C The Likelihood Ratio Statistic for Testing Spatial Independence using a Separable Covariance Matrix
  1 Introduction
  2 Known Dependency Structure
  3 Unknown Dependency Structure
    3.1 Ψ has AR(1) Structure
    3.2 Ψ has Intraclass Structure
  References

D Explicit Estimators of Parameters in the Growth Curve Model with Linearly Structured Covariance Matrices
  1 Introduction
  2 Main Idea
  3 Maximum Likelihood Estimators
  4 Growth Curve Model with a Linearly Structured Covariance Matrix
  5 Properties of the Proposed Estimators
  6 Examples

1 Introduction

This thesis is concerned with the problem of estimating patterned covariance matrices for different kinds of statistical models. Patterned covariance matrices can arise in a variety of contexts and can be both linear and non-linear. One example of a linearly structured covariance matrix arises in the theory of graphical modeling. In graphical modeling, a bi-directed graph represents marginal independences among random variables that are identified with the vertices of the graph. Gaussian graphical models of this kind are called covariance graph models and can be used for estimation and testing.

If two vertices are not joined by an edge, then the two associated random variables are assumed to be marginally independent. For example, in Figure 1.1 the graph represents the covariance structure for a random vector x = (x1, x2, x3, x4)′.

[Figure 1.1: Covariance graph for a random vector x = (x1, x2, x3, x4)′.]

Covariance graph models will generate a patterned covariance matrix with some of the covariances equal to zero, i.e., if the vertices in the graph are not connected by a bi-directed edge i ↔ j, then σij = 0. In Figure 1.1 the graph imposes σ12 = σ14 = σ23 = 0, and the covariance matrix for x is given by

Σ = ( σ11  0    σ13  0
      0    σ22  0    σ24
      σ13  0    σ33  σ34
      0    σ24  σ34  σ44 ).

The covariance graph can also represent more advanced models if some constraints are added. Another linearly structured covariance matrix is the Toeplitz matrix, and a special case is a Toeplitz matrix with zeros and different variances. The covariance graph for this special Toeplitz structure is given in Figure 1.2. This structure can for example represent covariances that depend on the distance between equally spaced locations or time points. The covariance matrix is then given by

Σ = ( σ1²  ρ    0    0
      ρ    σ2²  ρ    0
      0    ρ    σ3²  ρ
      0    0    ρ    σ4² ).

[Figure 1.2: Covariance graph for a kind of Toeplitz covariance matrix with zeros.]

In this thesis, primarily linearly structured covariance matrices are considered. We know that inference on the mean parameters strongly depends on the estimated covariance matrix, and the dispersion matrix for the estimator of the mean is a function of the covariance matrix. Hence, when testing the mean parameters the estimator of the covariance matrix is very important, and it is of great significance to have good estimators for the correct model.

1.1 Background

One of the first to discuss a patterned covariance matrix was Wilks (1946), who considered the uniform (also known as intraclass) covariance structure when dealing with measurements on k equivalent psychological tests. The uniform structure is a linear covariance structure with equal diagonal elements and equal off-diagonal elements. This structure is also of interest since it arises as the marginal distribution in a random-effects model. Votaw (1948) extended the uniform model to the compound symmetry structure, a covariance structure similar to the uniform structure but with blocks, each having uniform structure. The intraclass model can also be generalized to the Toeplitz or circular Toeplitz structure, which is discussed by Olkin and Press (1969), Olkin (1973) and later Nahtman (2006). These models are all special cases of the invariant normal models considered by Andersson (1975), and a proper review of the results related to the invariant normal models can be found in Perlman (1987).

More recently, linearly structured covariance matrices have been discussed in the framework of graphical models, e.g., see Drton and Richardson (2004) and Chaudhuri et al. (2007), where an iterative algorithm for the maximum likelihood estimators was given. Originally, many estimators of the covariance matrix were obtained from non-iterative least squares methods. With increasing computational power, iterative methods such as maximum likelihood and restricted maximum likelihood, among others, were introduced to estimate the patterned covariance matrix. Nowadays, the data sets and the dimension of the models are very large and non-iterative methods are preferable again. In this thesis we mainly consider explicit estimators for patterned covariance matrices in the multivariate linear model, but an iterative estimator for the Kronecker product covariance structure is also considered.

The Growth Curve model introduced by Potthoff and Roy (1964) has been extensively studied over the years. The mean structure for the Growth Curve model is bilinear, instead of linear as for the ordinary multivariate linear model. Potthoff and Roy (1964) originally derived a class of weighted estimators for the parameter matrix which are a function of an arbitrary positive definite matrix. Khatri (1966a) extended this result and showed that the maximum likelihood estimator is also a weighted estimator.

Patterned covariance matrices for the Growth Curve model have been discussed in the literature; e.g., the intraclass covariance structure has been considered by Khatri (1973), Arnold (1981) and Lee (1988). Since the mean structure is bilinear, we will have a decomposition of the space generated by the design matrices as tensor spaces instead of linear spaces as in the linear model case. This decomposition makes maximum likelihood estimation of a patterned covariance matrix more complicated. In this thesis we use the decomposition of tensor spaces and derive explicit estimators of linearly structured covariance matrices.

1.2 Outline

This thesis consists of two parts and the outline is as follows.

1.2.1 Outline of Part I

In Part I the background and theory are given. Chapter 2 starts with definitions and some results for the multivariate distributions that are used, i.e., the matrix normal and Wishart distributions. A review of univariate and matrix quadratic forms is also presented. The second part of Chapter 2 discusses estimation of the mean and covariance matrix in a multivariate linear model. The estimators discussed are maximum likelihood for the non-patterned case and several methods for various structures.

In Chapter 3 we consider the Growth Curve model. The maximum likelihood estimators are given for the ordinary non-patterned case, and various structures for the covariance matrix are discussed.

Part I ends with conclusions and some pointers for future work in Chapter 4.

1.2.2 Outline of Part II

Part II consists of four papers. Below follows a short summary for each of the papers.

Paper A: On Distributions of Matrix Quadratic Forms

Ohlson, M. and Koski, T. (2009b). On distributions of matrix quadratic forms. Submitted to Communications in Statistics – Theory and Methods.

A characterization of the distribution of the multivariate quadratic form given by XAX′, where X is a p × n normally distributed matrix and A is an n × n symmetric real matrix, is presented. We show that the distribution of the quadratic form is the same as the distribution of a weighted sum of non-central Wishart distributed matrices. This is applied to derive the distribution of the sample covariance between the rows of X when the expectation is the same for every column and is estimated with the regular mean.

Paper B: Explicit Estimators under m-Dependence for a Multivariate Normal Distribution

Ohlson, M., Andrushchenko, Z., and von Rosen, D. (2009). Explicit estimators under m-dependence for a multivariate normal distribution. Accepted for publication in Annals of the Institute of Statistical Mathematics.

The problem of estimating parameters of a multivariate normal p-dimensional random vector is considered for a banded covariance structure reflecting m-dependence. A simple non-iterative estimation procedure is suggested which gives an explicit, unbiased and consistent estimator of the mean and an explicit and consistent estimator of the covariance matrix for arbitrary p and m.

Paper C: The Likelihood Ratio Statistic for Testing Spatial Independence using a Separable Covariance Matrix

Ohlson, M. and Koski, T. (2009a). The likelihood ratio statistic for testing spatial independence using a separable covariance matrix. Technical Report LiTH-MAT-R-2009-06, Department of Mathematics, Linköping University.

This paper deals with the problem of testing spatial independence for dependent observations. The sample observation matrix is assumed to follow a matrix normal distribution with a separable covariance matrix; in other words, it can be written as a Kronecker product of two positive definite matrices. Two cases are considered: when the temporal covariance is known and when it is unknown. When the temporal covariance is known, the maximum likelihood estimates are computed and the asymptotic null distribution is given. In the case when the temporal covariance is unknown, the maximum likelihood estimates of the parameters are found by an iterative alternating algorithm and the null distribution for the likelihood ratio statistic is discussed.

Paper D: Explicit Estimators of Parameters in the Growth Curve Model with Linearly Structured Covariance Matrices

Ohlson, M. and von Rosen, D. (2009). Explicit estimators of parameters in the Growth Curve model with linearly structured covariance matrices. Submitted to Journal of Multivariate Analysis.

Estimation of parameters in the classical Growth Curve model when the covariance matrix has some specific linear structure is considered. In our examples maximum likelihood estimators cannot be obtained explicitly and must rely on optimization algorithms. Therefore explicit estimators are obtained as alternatives to the maximum likelihood estimators. From a discussion about residuals, a simple non-iterative estimation procedure is suggested which gives explicit and consistent estimators of both the mean and the linearly structured covariance matrix.

1.3 Other Publications

Other publications from conferences are listed below.

• Distribution of Quadratic Forms at MatTriad 2007, 22–24 March 2007, Bedlewo, Poland.

• More on Distribution of Quadratic Forms at The 8th Tartu Conference on Multivariate Statistics and The 6th Conference on Multivariate Distributions with Fixed Marginals, 26–29 June 2007, Tartu, Estonia.

• Explicit Estimators under m-Dependence for a Multivariate Normal Distribution at LinStat 2008, 21–25 April 2008, Bedlewo, Poland.

• The Likelihood Ratio Statistic for Testing Spatial Independence using a Separable Covariance Matrix at Swedish Society for Medical Statistics, Spring conference, 27 March 2009, Uppsala, Sweden.

1.4 Contributions

The main contributions of the thesis are as follows.

• In Paper A a characterization of the distribution of a matrix quadratic form is derived. This characterization is similar to the one for the univariate case and can be used to prove several properties of the matrix quadratic form.

• Estimation of a banded covariance matrix, i.e., a matrix with zeros outside an arbitrarily large band, is considered in Paper B. A non-iterative estimation procedure is suggested which gives explicit, unbiased and consistent estimators.

• In Paper C an iterative alternating algorithm for the maximum likelihood estimators is derived for the case when we have a Kronecker product covariance structure and only one observation matrix. The cases with intraclass and autoregressive structure of order one are handled.

• Linearly structured covariance matrices for the Growth Curve model are discussed in Paper D. Using the residuals, a simple non-iterative estimation procedure is suggested which gives explicit and consistent estimators of both the mean and the linearly structured covariance matrix.

Part I

Estimation of Patterned Covariance Matrices

2 Estimation of the Covariance Matrix for a Multivariate Normal Distribution

Many testing, estimation and confidence interval procedures discussed in the multivariate statistical literature are based on the assumption that the observation vectors are independent and normally distributed (Anderson, 2003, Srivastava and Khatri, 1979, Muirhead, 1982). There are two main reasons for this. Firstly, sets of multivariate observations are often, at least approximately, normally distributed. Secondly, the multivariate normal distribution is mathematically tractable. Normally distributed data can be modeled entirely in terms of their means and variances/covariances. Estimating the mean and the covariance matrix are therefore problems of great interest in statistics, as well as in many related, more applied areas.

2.1 Multivariate Distributions

2.1.1 Matrix Normal Distribution

Suppose that X : p × n is a random matrix. Let the expectation of X be E(X) = M : p × n and the covariance matrix be cov(X) = Ω : (pn) × (pn).

Definition 2.1. A positive definite matrix A is said to be separable if it can be written as a Kronecker product of two positive definite matrices B and C,

A = B ⊗ C.

Here ⊗ is the Kronecker product, see Kollo and von Rosen (2005). Suppose that the covariance matrix Ω is separable, i.e., Ω = Ψ ⊗ Σ, where Ψ : n × n and Σ : p × p are two positive definite covariance matrices. Assume that X is matrix normal distributed, denoted by X ∼ Np,n(M, Σ, Ψ), which is equivalent to

vec X ∼ Npn(vec M, Ψ ⊗ Σ),

where vec( · ) is the vectorization operator (Kollo and von Rosen, 2005). The covariance matrix Σ can be interpreted as the covariance between the rows of X, and Ψ can be interpreted as the covariance between the columns of X. Since Σ and Ψ are positive definite, written Σ > 0 and Ψ > 0, the density function of X is

f(X) = (2π)^{−pn/2} |Σ|^{−n/2} |Ψ|^{−p/2} etr{ −(1/2) Σ^{−1}(X − M)Ψ^{−1}(X − M)′ },   (2.1)

which is the same density function as for vec X, since we can write |Ψ ⊗ Σ| = |Ψ|^p |Σ|^n and

vec′X (Ψ ⊗ Σ)^{−1} vec X = tr{ Σ^{−1}XΨ^{−1}X′ },

where etr(A) = exp(tr(A)) and |Σ| is the determinant of the matrix Σ.
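As a quick numerical illustration of the separable structure and the determinant identity above, the following minimal numpy sketch (the dimensions, seed and the particular Σ and Ψ are my own illustrative choices) draws one X ∼ Np,n(M, Σ, Ψ) through vec X ∼ Npn(vec M, Ψ ⊗ Σ):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 4

# Illustrative positive definite Sigma (p x p) and Psi (n x n).
A = rng.standard_normal((p, p)); Sigma = A @ A.T + p * np.eye(p)
B = rng.standard_normal((n, n)); Psi = B @ B.T + n * np.eye(n)

# |Psi (x) Sigma| = |Psi|^p |Sigma|^n.
Omega = np.kron(Psi, Sigma)
assert np.isclose(np.linalg.det(Omega),
                  np.linalg.det(Psi) ** p * np.linalg.det(Sigma) ** n)

# Draw X ~ N_{p,n}(M, Sigma, Psi) via vec X ~ N_{pn}(vec M, Psi (x) Sigma);
# vec stacks columns, hence order='F' in flatten/reshape.
M = rng.standard_normal((p, n))
vecX = rng.multivariate_normal(M.flatten(order='F'), Omega)
X = vecX.reshape((p, n), order='F')
```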

One of the most important properties of the matrix normal distribution is that it is invariant under bilinear transformations.

Theorem 2.1
Let X ∼ Np,n(M, Σ, Ψ). For any matrices B : q × p and C : m × n,

BXC′ ∼ Nq,m(BMC′, BΣB′, CΨC′).

2.1.2 Wishart and Non-central Wishart Distribution

The Wishart distribution is the multivariate generalization of the χ² distribution. The matrix W : p × p is said to be Wishart distributed if and only if W = YY′ for some matrix Y ∼ Np,n(M, Σ, I), where Σ is positive definite (Kollo and von Rosen, 2005). If the mean is zero, M = 0, the Wishart distribution is said to be central, denoted W ∼ Wp(n, Σ). Otherwise, if M ≠ 0, the Wishart distribution is non-central, W ∼ Wp(n, Σ, Ω), where Ω = MM′.

The density function for W ∼ Wp(n, Σ) exists if n ≥ p and is given by

fW(W) = ( 2^{pn/2} Γp(n/2) |Σ|^{n/2} )^{−1} |W|^{(n−p−1)/2} exp{ −(1/2) tr(Σ^{−1}W) },

for W > 0, where the multivariate gamma function Γp(n/2) is given by

Γp(n/2) = π^{p(p−1)/4} ∏_{i=1}^p Γ( (n + 1 − i)/2 ).   (2.2)
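The defining property W = YY′ makes the central Wishart distribution easy to simulate; the short sketch below (with an illustrative Σ, sizes and a Monte Carlo check on the mean E W = nΣ, all my own choices) stays within numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 3, 10, 20_000

A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)           # illustrative Sigma > 0
L = np.linalg.cholesky(Sigma)

# W = Y Y' with Y ~ N_{p,n}(0, Sigma, I) is W_p(n, Sigma);
# for the central Wishart, E W = n Sigma, checked by Monte Carlo.
W_mean = np.zeros((p, p))
for _ in range(reps):
    Y = L @ rng.standard_normal((p, n))   # columns are i.i.d. N_p(0, Sigma)
    W_mean += Y @ Y.T / reps

assert np.allclose(W_mean, n * Sigma, rtol=0.05, atol=0.5)
```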

If W ∼ Wp(n, Σ, Ω), the characteristic function of W can be found in Muirhead (1982) and is given by

φW(T) = |I − iΓΣ|^{−n/2} etr{ −(1/2) Σ^{−1}Ω } etr{ (1/2) Σ^{−1}Ω (I − iΓΣ)^{−1} },

where T = (tij), Γ = (γij) = ((1 + δij) tij), δij is the Kronecker delta and tij = tji. Hence, for the case of the central Wishart distribution it is nothing more than φW(T) = |I − iΓΣ|^{−n/2}.

It is known that if a Wishart distributed matrix W is transformed as BWB′ for some matrix B : q × p, we have a new Wishart distributed matrix (Kollo and von Rosen, 2005, Muirhead, 1982).

Theorem 2.2
Let W ∼ Wp(n, Σ, Ω) and let B : q × p be a real matrix. Then

BWB′ ∼ Wq(n, BΣB′, BΩB′).

It is also easy to see from the definition of a Wishart distribution that the sum of independent Wishart distributed variables is again Wishart distributed.

Theorem 2.3
Let the random matrix W1 ∼ Wp(n, Σ, Ω1) be independent of the matrix W2 ∼ Wp(m, Σ, Ω2). Then

W1 + W2 ∼ Wp(n + m, Σ, Ω1 + Ω2).

2.1.3 Results on Quadratic Forms

In this section we will study the distribution of a multivariate quadratic form Q = XAX′, where X : p × n is a random matrix and A : n × n is a real symmetric matrix. The distribution of Q has been studied by many authors, see for example Khatri (1962), Hogg (1963), Khatri (1966b), Hayakawa (1966), Shah (1970), Rao and Mitra (1971), Hayakawa (1972), and more recently Gupta and Nagar (2000), Vaish and Chaganty (2004), Kollo and von Rosen (2005) and the references therein. Wong and Wang (1993), Mathew and Nordström (1997), Masaro and Wong (2003) and Hu (2008) discussed Wishartness of the quadratic form when the covariance matrix is non-separable, i.e., when it cannot be written as a Kronecker product.

Univariate Quadratic Forms

The quadratic form Q = XAX′ is the multivariate generalization of the univariate quadratic form

q = x′Ax,

where x has a multivariate normal distribution, x ∼ Np(µ, Σ), e.g., see Rao (1973), Graybill (1976), Srivastava and Khatri (1979), Muirhead (1982). We will start by considering the univariate case when the variables are independent. Later on we consider the dependent case with a covariance matrix Σ. The results are given below for the sake of completeness. There is a large literature on this topic, and the results quoted below can be found in for example Graybill (1976), Srivastava and Khatri (1979), Muirhead (1982). In the subsequent section these results are recapitulated for the multivariate case too. An application of the distribution of a specific quadratic form is the sample variance s².

Example 2.1
Let C = In − (1/n)11′, where 1 = (1, . . . , 1)′ : n × 1 and In : n × n is the identity matrix. Then we have

q = x′Cx = x′( I − (1/n)11′ )( I − (1/n)11′ )x = (x − x̄1)′(x − x̄1) = ∑_{i=1}^n (xi − x̄)² = (n − 1)s²,

where the second equality follows from the fact that C = C², i.e., C is idempotent. Furthermore, x̄ = (1/n)1′x, and the distribution of q = x′Cx is the same as the distribution of (n − 1)s².

In the simplest case we suppose that the variables are standard normal, x ∼ Np(0, I). The distribution of q is central χ² under some restrictions on A.

Theorem 2.4
Suppose that x ∼ Np(0, I) and let q = x′Ax. Then the distribution of q is central χ², i.e., q ∼ χ²_k, if and only if A is a symmetric idempotent matrix and rank(A) = k.

Again, if we assume the variables to be independent, but with nonzero means, the distribution of q is non-central χ², under the same restrictions on A as in Theorem 2.4. The non-centrality parameter will depend on A and µ.

Theorem 2.5
Suppose that x ∼ Np(µ, I) and let q = x′Ax. Then the distribution of q is non-central χ², i.e., q ∼ χ²_k(δ), where δ = (1/2)µ′Aµ, if and only if A is an idempotent matrix and rank(A) = k.

Assume now that the variables in x are dependent, i.e., let x ∼ Np(µ, Σ), where Σ > 0. Since Σ is positive definite, Σ can be decomposed as Σ = Γ′Γ, where Γ : p × p and rank(Γ) = p. Using this decomposition the following theorem from Graybill (1976) is easy to prove.

Theorem 2.6
Suppose that x ∼ Np(µ, Σ), where rank(Σ) = p, and let q = x′Ax. Then the distribution of q is non-central χ², i.e., q ∼ χ²_k(δ), where δ = (1/2)µ′Aµ, if and only if any of the following three conditions is satisfied:

• AΣ is an idempotent matrix of rank k.
• ΣA is an idempotent matrix of rank k.
• Σ is a generalized inverse of A and rank(A) = k.

Here Σ is a generalized inverse of A, denoted Σ = A⁻, if and only if AΣA = A. A special case of Theorem 2.6 is the following corollary (Graybill, 1976), where the covariance matrix is supposed to be diagonal and the mean is the same for every variable.

Corollary 2.1
Suppose that x ∼ Np(µ, D), where D is a diagonal matrix of rank p. Then q = x′Ax ∼ χ²_{p−1}(δ), where δ = (1/2)µ′Aµ, if

A = D^{−1} − (1′D^{−1}1)^{−1} D^{−1}11′D^{−1}.

Also δ = 0 if µ = µ1 for any scalar µ.

Corollary 2.1 gives the distribution of the ordinary sample variance.

Example 2.2
Let x1, . . . , xn be a random sample with xi ∼ N(µ, σ²) for i = 1, . . . , n, and let the variables be independent. We can then write

x = (x1, . . . , xn)′ ∼ Nn(µ1, σ²I),

and using Corollary 2.1 we have D = σ²I and

A = D^{−1} − (1′D^{−1}1)^{−1} D^{−1}11′D^{−1} = σ^{−2}( I − n^{−1}11′ ).

Using this we see that the quadratic form x′Ax has the following χ² distribution:

x′Ax = x′( I − n^{−1}11′ )x / σ² = (x − x̄1)′(x − x̄1) / σ² = ∑_{i=1}^n (xi − x̄)² / σ² = (n − 1)s²/σ² ∼ χ²_{n−1},

and this is as we expected.

The next theorem is the most general one concerning the distribution of the quadratic form. It states that every quadratic form x′Ax has the same distribution as a weighted sum of independent non-central χ² variables, see for example Baldessari (1967), Tan (1977), Graybill (1976) for more details.

Theorem 2.7
Suppose that x ∼ Np(µ, Σ), where rank(Σ) = p. The random variable q = x′Ax has the same distribution as the random variable w = ∑_{i=1}^p di wi, where di are the latent roots of the matrix AΣ, and where wi are independent non-central χ² random variables, each with one degree of freedom.

Theorem 2.7, together with the fact that a sum of independent non-central χ² variables is again non-central χ² distributed, shows that all the theorems above are special cases.
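A Monte Carlo sketch of Theorem 2.7 (the specific A, Σ, µ and sample sizes are illustrative choices of mine): with Σ = LL′ and L′AL = UDU′, the latent roots of L′AL equal those of AΣ, and completing the square shows q to be a weighted sum of non-central χ²(1) variables; the additive constant c below vanishes exactly when A is nonsingular.

```python
import numpy as np

rng = np.random.default_rng(2)
p, reps = 4, 200_000

G = rng.standard_normal((p, p))
Sigma = G @ G.T + p * np.eye(p)                 # illustrative Sigma > 0
A = rng.standard_normal((p, p)); A = (A + A.T) / 2
mu = rng.standard_normal(p)

# Direct simulation of q = x'Ax with x ~ N_p(mu, Sigma).
x = rng.multivariate_normal(mu, Sigma, size=reps)
q = np.einsum('ij,jk,ik->i', x, A, x)

# Representation from Theorem 2.7: with Sigma = L L' and L'AL = U D U',
#   q = sum_i d_i (y_i + b_i/d_i)^2 + c,   y ~ N_p(0, I),
# a weighted sum of non-central chi^2(1) variables (c = 0 for nonsingular A).
L = np.linalg.cholesky(Sigma)
d, U = np.linalg.eigh(L.T @ A @ L)              # latent roots of A Sigma
b = U.T @ L.T @ A @ mu
c = mu @ A @ mu - np.sum(b ** 2 / d)            # numerically ~ 0 here
y = rng.standard_normal((reps, p))
w = (d * (y + b / d) ** 2).sum(axis=1) + c

# The two samples agree in distribution (compare a few quantiles).
print(np.quantile(q, [0.1, 0.5, 0.9]))
print(np.quantile(w, [0.1, 0.5, 0.9]))
```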

One of the first to consider independence between quadratic forms was Cochran (1934).

Theorem 2.8 (Cochran's Theorem)
Given x ∼ Np(0, I), suppose that x′x is decomposed into k quadratic forms, qi = x′Bix, i = 1, 2, . . . , k, where rank(Bi) = ri and Bi ≥ 0. Then any one of the following conditions implies the other two:

• ∑_{i=1}^k ri = p,
• each qi ∼ χ²_{ri},
• all the qi are mutually independent.

Many authors have generalized Cochran’s theorem, see for example Chipman and Rao (1964), Styan (1970), Tan (1977) and the references therein.

Multivariate Quadratic Forms

Several authors, see for example Khatri (1962), Shah (1970), Vaish and Chaganty (2004), have investigated the conditions under which the quadratic form Q = XAX′, where X ∼ Np,n(M, Σ, Ψ), has a Wishart distribution. Rao (1973) answered the question with the relation

XAX′ ∼ Wp ⇔ l′XAX′l ∼ χ²,

for any fixed vector l. Hence, the theory of univariate quadratic forms can be applied to the multivariate case. If M = 0 and Ψ = I, we have a multivariate version of Theorem 2.4 (see for example Rao (1973), Gupta and Nagar (2000) for the details).

Theorem 2.9
Let Y ∼ Np,n(M, Σ, I) and let A : n × n be a symmetric real matrix. Then Q = YAY′ is Wishart if and only if A is an idempotent matrix.

Example 2.3
Let Y ∼ Np,n(µ1′, Σ, I), with µ = (µ1, . . . , µp)′, i.e., Y consists of n independent p-vectors, Y = (y1, . . . , yn). The sample mean vector ȳ and sample covariance matrix S are defined by

ȳ = (1/n) ∑_{i=1}^n yi = (1/n)Y1,

S = (1/n)(Y − ȳ1′)(Y − ȳ1′)′ = (1/n)YCY′,

where C is the centralization matrix, C = I − (1/n)11′. We know that C is idempotent and rank(C) = n − 1. Using Theorem 2.9 we have

nS ∼ Wp(n − 1, Σ),

since MCM′ = µ1′C1µ′ = 0.

For a general Ψ, the corresponding condition is that AΨ is an idempotent matrix.

Corollary 2.2
Let X ∼ Np,n(M, Σ, Ψ) and let A : n × n be a symmetric real matrix. Then

Q = XAX′ ∼ Wp(r, Σ, MAM′),

where r = rank(AΨ), if and only if AΨ is an idempotent matrix.

The next theorem, from Kollo and von Rosen (2005) (Theorem 2.2.4), gives necessary and sufficient conditions for two quadratic forms to be independent.

Theorem 2.10
Let X ∼ Np,n(M, Σ, Ψ), Y ∼ Np,n(0, Σ, Ψ) and let A : n × n and B : n × n be non-random matrices. Then

(i) YAY′ is independent of YBY′ if and only if

ΨAΨB′Ψ = 0,  ΨA′ΨBΨ = 0,  ΨAΨBΨ = 0,  ΨA′ΨB′Ψ = 0;

(ii) YAY′ is independent of YB if and only if

B′ΨA′Ψ = 0,  B′ΨAΨ = 0.

Theorem 2.10 can be used to show the independence between the sample mean and the sample covariance.

Example 2.4
Let Y ∼ Np,n(µ1′, Σ, I), with µ = (µ1, . . . , µp)′. Using Theorem 2.10 (ii) we see that ȳ and S given in Example 2.3 are independent.

Khatri (1962) extended Cochran's theorem (Theorem 2.8) to the multivariate case by discussing conditions for Wishartness and independence of second degree polynomials. Several others have also generalized Cochran's theorem to the multivariate case, see for example Rao and Mitra (1971), Khatri (1980), Vaish and Chaganty (2004), Tian and Styan (2005).

Khatri (1966b) derived the density for Q = XAX′, when X ∼ Np,n(0, Σ, Ψ), i.e., for the central case M = 0, as

2^{−pn/2} ( Γp(n/2) )^{−1} |AΨ|^{−p/2} |Σ|^{−n/2} |Q|^{(n−p−1)/2} exp{ −(1/2) q^{−1} tr(Σ^{−1}Q) } 0F0( T, (1/2) q^{−1} Σ^{−1}Q ),   (2.3)

where q > 0 is an arbitrary constant and T = In − q A^{−1/2} Ψ^{−1} A^{−1/2}. The density function involves the hypergeometric function of matrix argument 0F0 and is relatively cumbersome to handle. The hypergeometric function can be expanded in terms of zonal polynomials Cκ( · ) (see Muirhead (1982) for details about the zonal polynomials) as

0F0(R, S) = ∑_{k=0}^∞ ∑_κ Cκ(R)Cκ(S) / ( Cκ(I) k! ),   (2.4)

which is slowly convergent, and the expansion in terms of Laguerre polynomials may be preferable for computational purposes.

The probability density function (2.3) is written as the product of a Wishart density function and a generalized hypergeometric function. This form is not always convenient for studying properties of Q. For M = 0 and Ψ = In, both Hayakawa (1966) and Shah (1970) derived the probability density function for Q. Using the moment generating function of Q, Shah (1970) expressed the density function of Q in terms of Laguerre polynomials with matrix argument. Hayakawa (1966) gave the probability density function for Q when M = 0 and Ψ = In as

2^{−pn/2} ( Γp(n/2) )^{−1} |A|^{−p/2} |Q|^{(n−p−1)/2} 0F0( A^{−1}, −(1/2) Σ^{−1}Q ).   (2.5)

Hayakawa (1966) also showed that any quadratic form Q can be decomposed into a linear combination of independent central Wishart or pseudo Wishart matrices with coefficients equal to the eigenvalues of A.

The Laplace transform was used in Khatri (1977) to generalize the results of Shah (1970) to the non-central case. When A = I, Khatri (1977) also obtained a similar representation of the non-central Wishart density in terms of the generalized Laguerre polynomial with matrix argument.

In the non-central case, when X ∼ Np,n(M, Σ, Ψ), Gupta and Nagar (2000) gave the non-central density for Q = XAX′ in terms of generalized Hayakawa polynomials, which are expectations of certain zonal polynomials. Gupta and Nagar (2000) also computed the moment generating function and used it for proving Wishartness and independence of quadratic forms.

Ohlson and Koski (2009b) characterized the distribution of Q using the characteristic function. The characteristic function for Q = YAY′ is given by the following theorem due to Ohlson and Koski (2009b).

Theorem 2.11
Let Y ∼ Np,n(M, Σ, I). Then the characteristic function of Q = YAY′ is

φQ(T) = ∏_{j=1}^r |Ip − iλjΓΣ|^{−1/2} etr{ −(1/2) Σ^{−1}Ωj } etr{ (1/2) Σ^{−1}Ωj (Ip − iλjΓΣ)^{−1} },

where T = (tij), Γ = (γij) = ((1 + δij) tij), tij = tji and δij is the Kronecker delta. The non-centrality parameters are Ωj = mjm′j, where mj = M∆j, and the vectors ∆j are the latent vectors corresponding to the nonzero latent roots λj of A.

Ohlson and Koski (2009b) showed that the distribution of Q coincides with the distribution of a weighted sum of non-central Wishart distributed matrices, similar to the case M = 0 and Ψ = I treated by Hayakawa (1966). A multivariate version of Theorem 2.7 was given by Ohlson and Koski (2009b) for the matrix quadratic form Q = YAY′.

Theorem 2.12
Suppose Y ∼ Np,n(M, Σ, I) and let Q be the quadratic form Q = YAY′, where A : n × n is any real matrix. Then the distribution of Q is the same as that of W = ∑_j λj Wj, where λj are the nonzero latent roots of A and Wj are independent non-central Wishart matrices,

Wj ∼ Wp(1, Σ, mjm′j),

where mj = M∆j and ∆j are the corresponding latent vectors.

Since this class of distributions is rather common, Ohlson and Koski (2009b) defined the distribution and gave several properties.

Definition 2.2. Assume Y ∼ Np,n(M, Σ, I). Define the distribution of the multivariate quadratic form Q = YAY′ to be Qp(A, M, Σ).

If we transform a Wishart distributed matrix W as BWB′, we have a new Wishart distributed matrix. In the same way we can transform our quadratic form.

Theorem 2.13
Let Q ∼ Qp(A, M, Σ) and let B : q × p be a real matrix. Then

BQB′ ∼ Qq(A, BM, BΣB′).

More theorems and properties for the distribution of Q = YAY′ can be found in Ohlson and Koski (2009b).

The standard results about distributions of quadratic forms, such as Theorem 2.9, follow from Theorem 2.12 and the fact that idempotent matrices only have latent roots equal to zero or one (Horn and Johnson, 1990). The expectation of the quadratic form YAY′ is another property that is easy to derive using Theorem 2.12.

Theorem 2.14
Let Q = YAY′ ∼ Qp(A, M, Σ). Then

E(YAY′) = tr(A) Σ + MAM′.

Now, suppose that X ∼ Np,n(M, Σ, Ψ), i.e., the columns are dependent as well.

Corollary 2.3
The distribution of Q = XAX′ is the same as that of

W = ∑_j λj Wj,

where λj are the nonzero latent roots of Ψ^{1/2}AΨ^{1/2} and Wj are independent non-central Wishart matrices,

Wj ∼ Wp(1, Σ, mjm′j),

where mj = MΨ^{−1/2}∆j and ∆j are the corresponding latent vectors.

Hence, we see that the distribution of Q is Qp(Ψ^{1/2}AΨ^{1/2}, MΨ^{−1/2}, Σ).

Remark 2.1. The latent roots are the same for Ψ^{1/2}AΨ^{1/2} and AΨ.
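To make the matrix quadratic form concrete, here is a small Monte Carlo sketch of the expectation in Theorem 2.14 (the dimensions and matrices are illustrative choices of mine, not taken from the papers):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 3, 5, 50_000

G = rng.standard_normal((p, p))
Sigma = G @ G.T + p * np.eye(p)          # illustrative Sigma > 0
Lc = np.linalg.cholesky(Sigma)
M = rng.standard_normal((p, n))
A = rng.standard_normal((n, n)); A = (A + A.T) / 2

# Monte Carlo mean of Q = Y A Y' with Y ~ N_{p,n}(M, Sigma, I).
Q_mean = np.zeros((p, p))
for _ in range(reps):
    Y = M + Lc @ rng.standard_normal((p, n))
    Q_mean += Y @ A @ Y.T / reps

# Theorem 2.14: E(Y A Y') = tr(A) Sigma + M A M'.
assert np.allclose(Q_mean, np.trace(A) * Sigma + M @ A @ M.T, atol=0.5)
```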

2.2 Estimation of the Parameters

Originally, many estimators of the covariance matrix were obtained from non-iterative least squares methods such as the ANOVA and MINQUE approaches, for example when estimating variance components. When computer resources became more powerful, iterative methods such as maximum likelihood, restricted maximum likelihood and generalized estimating equations, among others, were introduced. However, nowadays one is interested in applying covariance structures, including variance components models, to very large data sets. These are found for example in QTL analysis in genetics, in time series with densely sampled observations in meteorology, or in EEG/EKG studies in medicine.

2.2.1 Maximum Likelihood Estimators

Let x ∼ Np(µ, Σ), where µ = (µ1, . . . , µp)′ and Σ > 0. Furthermore, let xi, i = 1, . . . , n, be an independent random sample on x. The observation matrix can then be written as

X = (x1, . . . , xn) ∼ Np,n(µ1′, Σ, I).

From (2.1) the likelihood function is given by

L(µ, Σ) = (2π)^{−pn/2} |Σ|^{−n/2} etr{ −(1/2) Σ^{−1}(X − µ1′)(X − µ1′)′ }.   (2.6)

Since

(X − µ1′)(X − µ1′)′ = S + n(x̄ − µ)(x̄ − µ)′,

where

x̄ = (1/n)X1

and

S = X( I − 1(1′1)^{−1}1′ )X′ = X( I − n^{−1}11′ )X′,   (2.7)

the likelihood function (2.6) can be written as

L(µ, Σ) = (2π)^{−pn/2} |Σ|^{−n/2} etr{ −(1/2) Σ^{−1}( S + n(x̄ − µ)(x̄ − µ)′ ) }.   (2.8)

Hence, from the Fisher–Neyman factorization theorem (Schervish (1995), page 89), (x̄, S) is a sufficient statistic for (µ, Σ). The maximum likelihood estimators can be established through a series of inequalities:

L(µ, Σ) = (2π)^{−pn/2} |Σ|^{−n/2} etr{ −(1/2) Σ^{−1}( S + n(x̄ − µ)(x̄ − µ)′ ) }   (2.9)
        ≤ (2π)^{−pn/2} |Σ|^{−n/2} etr{ −(1/2) Σ^{−1}S }   (2.10)
        ≤ (2π)^{−pn/2} |(1/n)S|^{−n/2} exp(−pn/2),   (2.11)

where the first inequality holds since (x̄ − µ)′Σ^{−1}(x̄ − µ) ≥ 0, and the second since we can use

|Σ|^{−n/2} etr{ −(1/2) Σ^{−1}S } ≤ |(1/n)S|^{−n/2} exp(−pn/2),

given in Srivastava and Khatri (1979), page 25. Equality in (2.9)–(2.11) holds if and only if µ = x̄ and Σ = (1/n)S. Hence, the maximum likelihood estimators for µ and Σ are

µ̂ML = (1/n)X1   (2.12)

and

Σ̂ML = (1/n)X( I − 1(1′1)^{−1}1′ )X′.   (2.13)

Since I − 1(1′1)^{−1}1′ is a projection onto the space orthogonal to the column space generated by the vector 1, it can be seen using Theorem 2.10 (ii) that µ̂ML and Σ̂ML are independent, and they are distributed according to the following theorem (Anderson (2003), pages 77 and 255).

Theorem 2.15
If X ∼ Np,n(µ1′, Σ, I), then the maximum likelihood estimators given in (2.12) and (2.13) are independently distributed as

µ̂ML ∼ Np( µ, (1/n)Σ )  and  nΣ̂ML ∼ Wp(n − 1, Σ).
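In code, the estimators (2.12) and (2.13) are one-liners; a minimal numpy sketch (the simulated example at the end uses my own illustrative parameter values):

```python
import numpy as np

def mn_mle(X):
    """ML estimators (2.12)-(2.13) for X ~ N_{p,n}(mu 1', Sigma, I),
    where X is p x n with the observations x_1, ..., x_n as columns."""
    p, n = X.shape
    ones = np.ones((n, 1))
    mu_hat = (X @ ones / n).ravel()              # (1/n) X 1
    C = np.eye(n) - ones @ ones.T / n            # I - 1(1'1)^{-1} 1'
    Sigma_hat = X @ C @ X.T / n                  # (1/n) X C X'
    return mu_hat, Sigma_hat

# For large n the estimates approach the true mu and Sigma.
rng = np.random.default_rng(4)
p, n = 3, 2000
mu = np.array([1.0, -2.0, 0.5])
G = rng.standard_normal((p, p)); Sigma = G @ G.T + np.eye(p)
X = mu[:, None] + np.linalg.cholesky(Sigma) @ rng.standard_normal((p, n))
mu_hat, Sigma_hat = mn_mle(X)
```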

2.2.2 Patterned Covariance Matrix

Patterned covariance matrices arise in a variety of contexts and have been studied by many authors. In a seminal paper, Wilks (1946) considered patterned structures when dealing with measurements on k equivalent psychological tests. This led to a covariance matrix with equal diagonal elements and equal off-diagonal elements, i.e., a covariance matrix given by

Σ = σ²( (1 − ρ)I + ρ11′ ) : p × p,   (2.14)

with −1/(p − 1) < ρ < 1. This structure is called the uniform, complete symmetry or intraclass covariance structure. Wilks (1946) developed statistical test criteria for testing equality of means, equality of variances and equality of covariances. The structure implies that both the inverse and the determinant have closed form expressions (Muirhead (1982), page 114), and the maximum likelihood estimators are given by

σ̂² = tr S / ( p(n − 1) )

and

ρ̂ = ( 1′S1 − tr S ) / ( (p − 1) tr S ),

where S is given in (2.7).

Votaw (1948) extended the intraclass model to a model with blocks called compound symmetry, of type I and type II. The compound symmetry covariance matrices are, for the p = 4 case, given as

ΣI = ( α  β  β  β          ΣII = ( α  β  κ  σ
       β  γ  δ  δ                  β  α  σ  κ
       β  δ  γ  δ                  κ  σ  γ  δ
       β  δ  δ  γ ),               σ  κ  δ  γ ).   (2.15)

Votaw (1948) considered different psychometric and medical research problems where compound symmetry is applicable. In Szatrowski (1982) block compound symmetry was discussed, and the models were applied to the analysis of an educational testing problem. In a series of papers, Szatrowski (1985) discussed how to obtain maximum likelihood estimators for the elements of a class of patterned covariance matrices in the presence of missing data.

Another type of block structure arises when the multivariate complex normal distribution is considered. The multivariate complex normal distribution can be defined as in Srivastava and Khatri (1979).

Definition 2.3. Let z = x + iy with mean θ and covariance matrix Q = Σ1 + iΣ2. Then z ∼ CNp(θ, Q) if and only if

(x′, y′)′ ∼ N2p( (θ1′, θ2′)′, ΣC ),

where

ΣC = (1/2)( Σ1  −Σ2
            Σ2   Σ1 ),   θ = θ1 + iθ2.

Goodman (1963) was one of the first to study the covariance matrix of the multivariate complex normal distribution, which for example arises in spectral analysis of multiple time series. A direct extension is to study quaternions, e.g., see Andersson (1975) and Andersson et al. (1983). The covariance matrix in the quaternion case is a 4p × 4p matrix structured as

ΣQ = (  Σ1   Σ2   Σ3   Σ4
       −Σ2   Σ1  −Σ4   Σ3
       −Σ3   Σ4   Σ1  −Σ2
       −Σ4  −Σ3   Σ2   Σ1 ).

A circular stationary model, where variables are thought of as being equally spaced around a circle, was considered by Olkin and Press (1969). The covariance between two variables in the circular stationary model depends on the distance between the variables, and the covariance matrices for p = 4 and p = 5 have the structures

ΣCS = σ0²( 1   ρ1  ρ2  ρ1          ΣCS = σ0²( 1   ρ1  ρ2  ρ2  ρ1
           ρ1  1   ρ1  ρ2                     ρ1  1   ρ1  ρ2  ρ2
           ρ2  ρ1  1   ρ1                     ρ2  ρ1  1   ρ1  ρ2
           ρ1  ρ2  ρ1  1 ),                   ρ2  ρ2  ρ1  1   ρ1
                                              ρ1  ρ2  ρ2  ρ1  1 ).

Olkin and Press (1969) considered three symmetries, namely circular, intraclass and spherical, and derived the likelihood ratio tests and the asymptotic distributions under the hypothesis and the alternative. Olkin (1973) generalized the circular stationary model to a multivariate version in which each element is a vector and the covariance matrix can be written as a block circular matrix.

The covariance symmetries investigated by for example Wilks (1946), Votaw (1948) and Olkin and Press (1969) are all special cases of the invariant normal models considered by Andersson (1975). The invariant normal models include not only all models specified by symmetries of the covariance matrix, but also the linear models for the mean. The symmetry model defined by a group G is the family of covariance matrices given by

S+_G = { Σ | Σ > 0, GΣG′ = Σ for all G ∈ G },

i.e., if x is a random vector with cov(x) = Σ such that Σ ∈ S+_G, then there is a symmetry restriction on the covariance matrix, namely that cov(x) = cov(Gx) for all G ∈ G. Perlman (1987) summarized and discussed the group symmetry covariance models suggested by Andersson (1975). Furthermore, several examples of different symmetries, maximum likelihood estimators and likelihood ratio tests are given in Perlman (1987). The following example by Perlman (1987) shows the connection between a certain symmetry and a group representation.

Example 2.5
Let the random vector x be partitioned as x = (x1′, . . . , xk′)′ and assume that cov(x) = Σ has circular block symmetry, i.e.,

cov(x) = cov(P^r x),  r = 1, . . . , k − 1,   (2.16)

where P is the (qk) × (qk) block shift matrix

P = ( 0   Iq  0   · · ·  0
      0   0   Iq         0
                  . . .
      0   0   0          Iq
      Iq  0   0   · · ·  0 ).

Let G1 be the cyclic group of order k given by

G1 = { I, P, . . . , P^{k−1} }.

For k = 4 one can verify that S+_{G1} consists of all positive definite matrices of the form

ΣCBS = ( A   B   C   B′
         B′  A   B   C
         C   B′  A   B
         B   C   B′  A ),   A = A′, C = C′,

where A, B, C : q × q, and it follows that the circular block symmetry condition (2.16) is equivalent to Σ ∈ S+_{G1}.

In Jensen (1988) a class of covariance models larger than the class of invariant normal models was obtained. Jensen (1988) considered structures which are linear in both the covariance and the inverse covariance and proved that these can be parameterized by Jordan algebras. The structures which are linear in the covariance are given by

Σ = ∑_{i=1}^m σi Gi,   (2.17)

where Gi, i = 1, . . . , m, are symmetric and linearly independent known matrices, and σi, i = 1, . . . , m, are unknown real parameters such that Σ is positive definite. The structure (2.17) was first discussed by Anderson (1969, 1970, 1973), where the likelihood equations, an iterative method for solving these equations and the asymptotic distribution of the estimates were given.

Permutation invariant covariance matrices were considered in Nahtman (2006), and it was proven that permutation invariance implies a specific structure for the covariance matrix. Nahtman and von Rosen (2005) showed that shift invariance implies Toeplitz covariance matrices and that marginal shift invariance gives block Toeplitz covariance matrices.

There exist many papers on Toeplitz covariance matrices, e.g., see Burg et al. (1982), Fuhrmann and Barton (1990), Marin and Dhorne (2002) and Christensen (2007). To have a Toeplitz structure means that certain invariance conditions are fulfilled, e.g., equality of variances and covariances. Toeplitz covariance matrices are all banded matrices. Banded covariance matrices and their inverses frequently arise in biological and economical time series and in time series in engineering, for example in signal processing applications, including autoregressive or moving average image modeling, covariances of Gauss-Markov random processes (Woods, 1972, Moura and Balram, 1992), and numerical approximations of partial differential equations based on finite differences. Banded matrices are also used to model the correlation of cyclostationary processes in periodic time series (Chakraborty, 1998).

One type of Toeplitz matrix is the covariance matrix arising in the theory of time series. Durbin (1959) suggested an efficient estimation procedure for the parameters in a Gaussian moving average time series of order one. Godolphin and De Gooijer (1982) computed the exact maximum likelihood estimators of the parameters for a moving average process. The autoregressive moving average process has been discussed by many authors. Anderson (1975) gave the likelihood equations for the parameters, which are non-linear in most cases, and proposed some iterative solutions. Anderson (1977) derived Newton-Raphson procedures for the same likelihood equations. For more references on time series analysis, see for example Box and Jenkins (1970), Hannan (1970), Anderson (1971). See Figure 2.2 for the connections between different covariance structures.

Several authors have considered other structures. For example, Jöreskog (1981) considered linear structural relations (LISREL) models, and Lauritzen (1996) considered more sophisticated structures within the framework of graphical models. Browne (1977) reviewed patterned correlation matrices arising from multiple psychological measurements.

Linearly Structured Covariance Matrices with Zeros

Covariance matrices with zeros have been considered by several authors, see for example Grzebyk et al. (2004), Drton and Richardson (2004), Mao et al. (2004), Chaudhuri et al. (2007).

A new algorithm for the maximum likelihood estimators in the multivariate normal case, called iterative conditional fitting, is derived in Drton and Richardson (2004) and Chaudhuri et al. (2007). The iterative conditional fitting algorithm is based on covariance graph models. Suppose we have a random vector x = (x1, x2, x3, x4)′ with the covariance matrix

Σ = ( σ11  0    σ13  0
      0    σ22  0    σ24
      σ13  0    σ33  σ34
      0    σ24  σ34  σ44 ).   (2.18)

A covariance graph is a graph with a vertex for every random variable in the random vector x. The vertices i and j in the graph are connected by a bi-directed edge i ↔ j unless σij = 0. Hence, the covariance graph corresponding to the covariance matrix (2.18) is given in Figure 2.1.

[Figure 2.1: Covariance graph for the covariance matrix (2.18).]

The iterative conditional fitting algorithm developed in Drton and Richardson (2004) and Chaudhuri et al. (2007) proceeds as follows. First, fix the marginal distribution of the variables other than i. Then estimate, by maximum likelihood, the conditional distribution of variable i given all the other variables, under the constraints implied by the covariance graph model. Last, calculate a new estimate of the joint distribution by multiplying together the fixed marginal and the estimated conditional distributions. Repeat until convergence is obtained.

Let the index set of the variables be V = {1, . . . , p}, let −i = V \ {i} denote all variables but i, and let sp(i) = {j | i ↔ j}, i.e., the set of variables that are dependent on variable i. Let also nsp(i) = V \ (sp(i) ∪ {i}). Assume that we have n observations from a normal distribution with mean zero and a covariance matrix with some zeros. Let the observation matrix be X = (x1, . . . , xn) ∼ Np,n(0, Σ, In). Furthermore, the distribution of the variables X−i is given by

X−i ∼ Np−1,n(0, Σ−i,−i, In),

where XS is the submatrix of X including the rows given by the index set S, and ΣS,T is the submatrix of Σ including the rows and columns given by the index sets S and T, respectively.

The conditional distribution of Xi | X−i is then

Xi | X−i ∼ N1,n( Σi,−i Σ−i,−i^{−1} X−i, λi In ),   (2.19)

where

λi = σii − Σi,−i Σ−i,−i^{−1} Σ−i,i.

Using the fact that σij = 0 if j ∈ nsp(i), we have from (2.19)

Xi | X−i ∼ N1,n( ∑_{j∈sp(i)} σij Z^{(i)}_j, λi In ),   (2.20)

where the pseudo-variables Z^{(i)}_{sp(i)} are defined as

Z^{(i)}_{sp(i)} = ( Σ−i,−i^{−1} )_{sp(i),−i} X−i.   (2.21)

Algorithm 2.1 (Iterative conditional fitting algorithm)

1. Set the iteration counter r = 0 and choose a starting value Σ̂^{(0)} ≥ 0, for example the identity matrix.

2. Set Σ̂^{(r,0)} = Σ̂^{(r)} and repeat the following steps for all i ∈ V:

(i) Let Σ̂^{(r,i)}_{−i,−i} = Σ̂^{(r,i−1)}_{−i,−i} and calculate from this submatrix the pseudo-variables Z^{(i)}_{sp(i)} given in (2.21).

(ii) Compute the maximum likelihood estimates

Σ̂^{(r,i)}_{i,sp(i)} = Xi Z^{(i)′}_{sp(i)} ( Z^{(i)}_{sp(i)} Z^{(i)′}_{sp(i)} )^{−1},

λ̂i = (1/n)( Xi − Σ̂^{(r,i)}_{i,sp(i)} Z^{(i)}_{sp(i)} )( Xi − Σ̂^{(r,i)}_{i,sp(i)} Z^{(i)}_{sp(i)} )′,

for the linear regression given in (2.20).

(iii) Complete Σ̂^{(r,i)} by solving for σii, setting

σ̂^{(r,i)}_{ii} = λ̂i + Σ̂^{(r,i)}_{i,sp(i)} ( ( Σ̂^{(r,i)}_{−i,−i} )^{−1} )_{sp(i),sp(i)} Σ̂^{(r,i)}_{sp(i),i}.

3. Set Σ̂^{(r+1)} = Σ̂^{(r,p)}, increment the counter r to r + 1, and repeat step 2 until convergence is obtained.
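A compact numpy sketch of Algorithm 2.1 (with my own simplifications: a fixed number of sweeps instead of a convergence test, zero-mean data, and an illustrative example matching the zero pattern (2.18)):

```python
import numpy as np

def icf(X, adj, sweeps=50):
    """Iterative conditional fitting for a covariance graph zero pattern.
    X is p x n with zero-mean columns; adj[i, j] = True iff i <-> j."""
    p, n = X.shape
    Sigma = np.eye(p)                           # step 1: starting value
    for _ in range(sweeps):
        for i in range(p):                      # step 2: cycle over vertices
            mi = np.array([j for j in range(p) if j != i])
            sp = np.array([j for j in range(p) if adj[i, j] and j != i])
            if sp.size == 0:
                Sigma[i, i] = X[i] @ X[i] / n
                continue
            inv = np.linalg.inv(Sigma[np.ix_(mi, mi)])
            rows = np.searchsorted(mi, sp)
            Z = inv[rows] @ X[mi]               # pseudo-variables (2.21)
            # (ii) ML estimates in the regression (2.20)
            coef = np.linalg.solve(Z @ Z.T, Z @ X[i])
            resid = X[i] - coef @ Z
            lam = resid @ resid / n
            Sigma[i, sp] = coef
            Sigma[sp, i] = coef
            # (iii) complete sigma_ii
            Sigma[i, i] = lam + coef @ inv[np.ix_(rows, rows)] @ coef
    return Sigma

# Fit the zero pattern (2.18): edges 1<->3, 2<->4, 3<->4 (0-indexed below).
rng = np.random.default_rng(6)
adj = np.array([[0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=bool)
Sigma_true = np.array([[2.0, 0.0, 0.5, 0.0],
                       [0.0, 1.0, 0.0, 0.3],
                       [0.5, 0.0, 1.5, 0.4],
                       [0.0, 0.3, 0.4, 1.0]])
X = np.linalg.cholesky(Sigma_true) @ rng.standard_normal((4, 2000))
print(np.round(icf(X, adj), 2))
```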

Ohlson et al. (2009) studied banded matrices with unequal elements, except that certain covariances are zero. This banded structure is a special case of the structure considered by Chaudhuri et al. (2007), see Figure 2.2. The basic idea is that widely separated observations in time or space often appear to be uncorrelated. Therefore, it is reasonable to work with a banded covariance structure where all covariances more than m steps apart equal zero, a so-called m-dependent structure. Let Σ^{(m)}_{(k)} : k × k be an m-dependent banded covariance matrix and partition Σ^{(m)}_{(k)} as

Σ^{(m)}_{(k)} = ( Σ^{(m)}_{(k−1)}  σ1k
                 σ′k1             σkk ),   (2.22)

where

σ′k1 = (0, . . . , 0, σk,k−m, . . . , σk,k−1).

The procedure suggested by Ohlson et al. (2009) to estimate a banded covariance matrix is given in Algorithm 2.2.

Algorithm 2.2 (Explicit estimators for a banded covariance matrix)

Let X ∼ Np,n(µ1′n, Σ^{(m)}_{(p)}, In), with arbitrary integer m. The estimators of µ and Σ^{(m)}_{(p)} are given by the following two steps.

(i) Use the maximum likelihood estimators for µ1, . . . , µm+1 and Σ^{(m)}_{(m+1)}.

(ii) Calculate the following estimators for k = m + 2, . . . , p in increasing order, where for each k let i = k − m, . . . , k − 1:

µ̂k = (1/n) x′k 1n,   (2.23)

σ̂ki = β̂ki |Σ̂_{(k−1)}| / |Σ̂_{(k−2)}|,   (2.24)

σ̂kk = (1/n) x′k ( In − X̂k−1( X̂′k−1 X̂k−1 )^{−1} X̂′k−1 ) xk + σ̂′k1 Σ̂^{−1}_{(k−1)} σ̂1k,   (2.25)

where

σ̂′k1 = (0, . . . , 0, σ̂k,k−m, . . . , σ̂k,k−1),

β̂k = ( β̂k0, β̂k,k−m, . . . , β̂k,k−1 )′ = ( X̂′k−1 X̂k−1 )^{−1} X̂′k−1 xk,   (2.26)

X̂k−1 = ( 1n, x̂k−1,k−m, . . . , x̂k−1,k−1 )

and

x̂k−1,i = ∑_{j=1}^{k−1} (−1)^{i+j} ( |M̂^{ji}_{(k−1)}| / |Σ̂_{(k−2)}| ) xj,

where M̂^{ji}_{(k−1)} is the matrix obtained when the jth row and ith column have been removed from Σ̂_{(k−1)}.

The estimators in Algorithm 2.2 are fairly natural, but they are ad hoc estimators, and Ohlson et al. (2009) motivated them with the following theorem; a numerical sketch of the algorithm is given after the theorem.

Theorem 2.16
The estimator $\hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_p)'$ given in Algorithm 2.2 is unbiased and consistent, and the estimator $\hat{\Sigma}^{(m)}_{(p)} = (\hat{\sigma}_{ij})$ is consistent.
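A minimal numpy sketch of Algorithm 2.2 follows. It rewrites the cofactor sums through the standard identity $(-1)^{i+j}|\hat{M}^{ji}_{(k-1)}| = |\hat{\Sigma}_{(k-1)}|\,(\hat{\Sigma}^{-1}_{(k-1)})_{ij}$, so the pseudo-variables become rows of $\frac{|\hat{\Sigma}_{(k-1)}|}{|\hat{\Sigma}_{(k-2)}|}\hat{\Sigma}^{-1}_{(k-1)}X$; the code illustrates the procedure as stated above (with 0-based indices) and is not code from Ohlson et al. (2009).

```python
import numpy as np

def banded_estimators(X, m):
    """Explicit estimators for an m-dependent banded covariance matrix.

    X : p x n data matrix whose columns x_1, ..., x_n are i.i.d. N_p(mu, Sigma).
    Returns (mu_hat, Sigma_hat).
    """
    p, n = X.shape
    mu = X.mean(axis=1)                           # mu_k = x_k' 1 / n for all k
    Sigma = np.zeros((p, p))
    # step (i): unrestricted MLE for the leading (m+1) x (m+1) block
    q = min(m + 1, p)
    Xc = X[:q, :] - mu[:q, None]
    Sigma[:q, :q] = Xc @ Xc.T / n
    # step (ii): recursion, 0-based k = m+1, ..., p-1 (k = m+2, ..., p in the text)
    for k in range(m + 1, p):
        Sk1 = Sigma[:k, :k]                       # Sigma_(k-1)
        ratio = np.linalg.det(Sk1) / np.linalg.det(Sigma[:k - 1, :k - 1])
        # pseudo-variables: x_hat_{k-1,i} = ratio * (Sigma_(k-1)^{-1} X_{1:k-1})_i
        Xhat = ratio * np.linalg.solve(Sk1, X[:k, :])
        D = np.column_stack([np.ones(n)] + [Xhat[i, :] for i in range(k - m, k)])
        beta, *_ = np.linalg.lstsq(D, X[k, :], rcond=None)   # (2.26)
        sig = beta[1:] * ratio                    # (2.24)
        Sigma[k, k - m:k] = sig
        Sigma[k - m:k, k] = sig
        resid = X[k, :] - D @ beta
        s_k1 = Sigma[k, :k]                       # sigma_k1' with its leading zeros
        Sigma[k, k] = resid @ resid / n + s_k1 @ np.linalg.solve(Sk1, s_k1)  # (2.25)
    return mu, Sigma
```

For $m \geq p - 1$ the recursion is empty and the sketch simply returns the ordinary maximum likelihood estimators.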

2.2.3 Estimating the Kronecker Product Covariance

If the covariance matrix for a matrix normal distribution is separable, i.e., if it can be written as a Kronecker product of two matrices, the model belongs to the curved exponential family. Thus, under the Kronecker product structure the parameter space


Figure 2.2: Different covariance structures ($\Sigma_{WZ}$ = with zeros, $\Sigma^{(m)}$ = banded, $\Sigma_T$ = Toeplitz, $\Sigma_{CT}$ = circular Toeplitz, $\Sigma_{AR}$ = autoregressive and $\Sigma_{IC}$ = intraclass).

is of lower dimension and the model has different statistical properties, so one has to be careful since estimation and testing can be more complicated. Kronecker product structures in covariance matrices have recently been studied by Dutilleul (1999), Naik and Rao (2001), Lu and Zimmerman (2005), Roy and Khattree (2005a), Mitchell et al. (2005, 2006) and Srivastava et al. (2007).

Let $X_1, \ldots, X_N$ be a random sample from the matrix normal distribution, $X_i \sim N_{p,n}(M, \Sigma, \Psi)$ for $i = 1, \ldots, N$. From (2.1) the logarithm of the likelihood function for the sample $X_1, \ldots, X_N$ can, ignoring the normalizing constant, be written as

$$\ln L(M, \Sigma, \Psi) = -\frac{nN}{2}\ln|\Sigma| - \frac{pN}{2}\ln|\Psi| - \frac{1}{2}\sum_{i=1}^{N}\mathrm{tr}\left\{\Sigma^{-1}(X_i - M)\Psi^{-1}(X_i - M)'\right\}. \qquad (2.27)$$
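As a small illustration, (2.27) can be evaluated directly; the following numpy sketch (the function name is ours) drops the normalizing constant exactly as above.

```python
import numpy as np

def log_lik(M, Sigma, Psi, X_list):
    """Log-likelihood (2.27) for X_i ~ N_{p,n}(M, Sigma, Psi), constant ignored."""
    N = len(X_list)
    p, n = M.shape
    _, logdet_s = np.linalg.slogdet(Sigma)       # Sigma, Psi assumed positive definite
    _, logdet_p = np.linalg.slogdet(Psi)
    Sinv, Pinv = np.linalg.inv(Sigma), np.linalg.inv(Psi)
    tr = sum(np.trace(Sinv @ (Xi - M) @ Pinv @ (Xi - M).T) for Xi in X_list)
    return -n * N / 2 * logdet_s - p * N / 2 * logdet_p - tr / 2
```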


The maximum likelihood estimators were given by Dutilleul (1999) as

$$\hat{M} = \frac{1}{N}\sum_{i=1}^{N} X_i = \bar{X}, \qquad (2.28)$$

$$\hat{\Sigma} = \frac{1}{nN}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\hat{\Psi}^{-1}\left(X_i - \bar{X}\right)' \qquad (2.29)$$

and

$$\hat{\Psi} = \frac{1}{pN}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)'\hat{\Sigma}^{-1}\left(X_i - \bar{X}\right). \qquad (2.30)$$

There is no explicit solution to (2.29) and (2.30). Dutilleul (1999) derived an estimator for $\Psi \otimes \Sigma$ using the flip-flop algorithm given in Algorithm 2.3. Srivastava et al. (2007) pointed out that the Gaussian model with a separable covariance matrix, i.e., $X \sim N_{p,n}(M, \Sigma, \Psi)$, belongs to the curved exponential family, and the convergence and uniqueness have to be carefully considered.

Algorithm 2.3 (The flip-flop algorithm)

1. Choose a starting value for $\hat{\Psi} = \hat{\Psi}^{(0)}$.
2. Estimate $\hat{\Sigma}^{(r)}$ from (2.29) with $\hat{\Psi} = \hat{\Psi}^{(r-1)}$.
3. Estimate $\hat{\Psi}^{(r)}$ from (2.30) with $\hat{\Sigma} = \hat{\Sigma}^{(r)}$.
4. Repeat Steps 2 and 3 until some convergence criterion is fulfilled.
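A minimal numpy sketch of Algorithm 2.3, assuming the sample is given as a list of $p \times n$ matrices; the starting value, the convergence criterion and the rescaling that fixes $\psi_{nn} = 1$ (one of the constraints discussed below) are illustrative choices.

```python
import numpy as np

def flip_flop(X_list, tol=1e-8, max_iter=100):
    """MLE of (M, Sigma, Psi) for X_i ~ N_{p,n}(M, Sigma, Psi), i = 1, ..., N."""
    N = len(X_list)
    p, n = X_list[0].shape
    M = sum(X_list) / N                                   # (2.28)
    R = [Xi - M for Xi in X_list]                         # residual matrices
    Psi = np.eye(n)                                       # starting value Psi^(0)
    for _ in range(max_iter):
        Psi_old = Psi.copy()
        Pinv = np.linalg.inv(Psi)
        Sigma = sum(Ri @ Pinv @ Ri.T for Ri in R) / (n * N)    # (2.29)
        Sinv = np.linalg.inv(Sigma)
        Psi = sum(Ri.T @ Sinv @ Ri for Ri in R) / (p * N)      # (2.30)
        # resolve the scale indeterminacy Psi x Sigma = cPsi x (1/c)Sigma
        c = Psi[n - 1, n - 1]
        Psi, Sigma = Psi / c, Sigma * c
        if np.max(np.abs(Psi - Psi_old)) < tol:
            break
    return M, Sigma, Psi
```

Note that the rescaling leaves the Kronecker product $\hat{\Psi} \otimes \hat{\Sigma}$ unchanged; it only pins down which representative of the equivalence class the iteration converges to.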

Another problem with the estimation of $\Sigma$ and $\Psi$, respectively, is that the parameters are not uniquely defined, since for every scalar $c \neq 0$ we have

$$\Psi \otimes \Sigma = c\Psi \otimes \frac{1}{c}\Sigma.$$

Srivastava et al. (2007) gave two ways to obtain unique estimates: either set $\psi_{nn} = 1$ or $\psi_{ii} = 1$ for $i = 1, \ldots, n$, respectively, and then use Algorithm 2.3 under these constraints.

2.2.4 Estimating the Kronecker Product Covariance (One Observation Matrix)

In many situations in statistical analysis the assumption of independence between observations is violated. Observations from a spatio-temporal stochastic process can under some conditions be described by a matrix normal distribution with a separable covariance matrix. Often, when the stochastic process is spatio-temporal, some structure can also be assumed for one or both of the matrices in the Kronecker product. Shitan and Brockwell (1995) and Ohlson and Koski (2009a) have considered the Kronecker product structure in time series analysis and structures for the covariance matrices.


The spatio-temporal processes typically only have one observation matrix. Let a random matrix $X : p \times n$ have a matrix normal distribution with a separable covariance matrix, i.e., $X \sim N_{p,n}(M, \Sigma, \Psi)$, where the covariance between the rows is $\Sigma : p \times p$ and the covariance between the columns is $\Psi : n \times n$. We will call $\Sigma$ and $\Psi$ the spatial and temporal covariance matrix, respectively. Furthermore, we will start by assuming that $\Psi$ is known.

Assume that $X = (x_1, x_2, \ldots, x_n)$, where $x_i \sim N_p(\mu, \Sigma)$. Since the temporal covariance is $\Psi$ we have $n$ dependent vectors. The expectation of $X$ is given by $E(X) = \mu 1'$, where $\mu = (\mu_1, \ldots, \mu_p)'$. With $N = 1$ in the log-likelihood function (2.27) it is easy to prove the following theorem.

Theorem 2.17
Let $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$, where $\Sigma > 0$ and $\Psi$ is known. The maximum likelihood estimators for $\mu$ and $\Sigma$ are given by

$$\hat{\mu} = (1'\Psi^{-1}1)^{-1}X\Psi^{-1}1$$

and

$$n\hat{\Sigma} = XHX',$$

where $H$ is the weighted centralization matrix

$$H = \Psi^{-1} - \Psi^{-1}1\left(1'\Psi^{-1}1\right)^{-1}1'\Psi^{-1}. \qquad (2.31)$$

We know that the ordinary sample covariance matrix $nS = X(I - \frac{1}{n}11')X'$ is Wishart distributed, and we can show that the same is valid for the sample covariance matrix $A = XHX'$. Since $H\Psi$ is idempotent and $\mathrm{rank}(H) = n - 1$, using Corollary 2.2 we have the following corollary.

Corollary 2.4
Let $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$, where $\Psi$ is known. Then $n\hat{\Sigma} = XHX' \sim W_p(n - 1, \Sigma)$.
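Since everything in Theorem 2.17 is explicit, the estimators are a few lines of numpy; the sketch below (the function name is ours) uses (2.31) directly, and replacing $\Psi$ by an explicit estimate $\hat{\Psi}$ gives the estimators of Corollary 2.5 further down.

```python
import numpy as np

def mle_known_psi(X, Psi):
    """MLE of mu and Sigma for X ~ N_{p,n}(mu 1', Sigma, Psi), Psi known."""
    p, n = X.shape
    one = np.ones(n)
    Pinv_one = np.linalg.solve(Psi, one)          # Psi^{-1} 1
    w = one @ Pinv_one                            # 1' Psi^{-1} 1
    mu = X @ Pinv_one / w                         # (1'Psi^{-1}1)^{-1} X Psi^{-1} 1
    # weighted centralization matrix H, equation (2.31)
    H = np.linalg.inv(Psi) - np.outer(Pinv_one, Pinv_one) / w
    Sigma = X @ H @ X.T / n                       # n Sigma_hat = X H X'
    return mu, Sigma
```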

In most cases, both the spatial and temporal covariance matrices are unknown. If the two covariance matrices have no structure it is impossible to estimate all parameters from one sample observation matrix $X$; there are $p(p+1)/2 + n(n+1)/2 + p$ parameters to estimate. In many applications we reduce the number of parameters by assuming that the temporal covariance matrix $\Psi$ has some structure, see for example Chaganty and Naik (2002), Huizenga et al. (2002), Roy and Khattree (2005a,b).

Let $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$, where $\Sigma > 0$ and $\Psi > 0$. The following corollary, given by Ohlson and Koski (2009a), gives the maximum likelihood estimators when the temporal covariance matrix $\Psi$ can be estimated explicitly.

Corollary 2.5
Let $X \sim N_{p,n}(\mu 1', \Sigma, \Psi)$ and assume that the temporal covariance matrix $\Psi$ can be estimated explicitly. The maximum likelihood estimators for $\mu$ and $\Sigma$ are given by

$$\hat{\mu} = (1'\hat{\Psi}^{-1}1)^{-1}X\hat{\Psi}^{-1}1$$

and

$$n\hat{\Sigma} = X\hat{H}X',$$

where $\hat{H}$ is the estimated weighted centralization matrix, i.e.,

$$\hat{H} = \hat{\Psi}^{-1} - \hat{\Psi}^{-1}1\left(1'\hat{\Psi}^{-1}1\right)^{-1}1'\hat{\Psi}^{-1}. \qquad (2.32)$$

Ohlson and Koski (2009a) presented an iterative algorithm to find the maximum likelihood estimates of the parameters in a matrix normal distribution, similar to the algorithm in Dutilleul (1999). The big difference is that Ohlson and Koski (2009a) only have one sample observation matrix and assume that the temporal covariance $\Psi$ has an autoregressive or intraclass structure, whereas the spatial covariance matrix is unstructured.

Let $\Psi$ be the covariance matrix of an autoregressive process of order one, i.e., AR(1) (see Brockwell and Davis (2002) and Ljung (1999) for more details). The covariance matrix is then given by

$$\Psi(\theta) = \frac{1}{1-\theta^2}
\begin{pmatrix}
1 & \theta & \theta^2 & \cdots & \theta^{n-1} \\
\theta & 1 & \theta & \cdots & \theta^{n-2} \\
\vdots & & \ddots & & \vdots \\
\theta^{n-1} & \theta^{n-2} & \cdots & \theta & 1
\end{pmatrix}.$$

Further, let the covariance matrix $\Sigma > 0$ be unstructured. This model implies that every row in $X$ comes from the same stationary AR(1) time series, but with different variances $\sigma_{ii}$. The model can be written as

$$x_{it} - \mu_i = \theta\left(x_{i,t-1} - \mu_i\right) + \varepsilon_{it}, \qquad i = 1, \ldots, p, \quad t = 1, \ldots, n,$$

for some expectations $\mu_i$, where $\varepsilon_{it} \sim N(0, \sigma_{ii})$, $|\theta| < 1$ and $\varepsilon_{it}$ is uncorrelated with $x_{is}$ for each $s < t$. The different AR(1) time series are dependent since $\sigma_{ij} \neq 0$.

A reasonable first estimate of the parameter $\theta$ could be the mean of the Yule-Walker estimates,

$$\hat{\theta}^{(0)} = \frac{1}{p}\sum_{i=1}^{p}\hat{\theta}_i, \qquad (2.33)$$

where $\hat{\theta}_i$ is the Yule-Walker estimate of $\theta$ (see Brockwell and Davis (2002) for the theory around the Yule-Walker estimate) in each time series, i.e.,

$$\hat{\theta}_i = \frac{\sum_{t=1}^{n-1}(x_{it} - \bar{x}_i)(x_{i,t+1} - \bar{x}_i)}{\sum_{t=1}^{n}(x_{it} - \bar{x}_i)^2}, \qquad \text{where} \quad \bar{x}_i = \frac{1}{n}\sum_{t=1}^{n} x_{it}.$$
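In code, (2.33) is essentially a one-liner per row; a minimal numpy sketch, assuming the $p$ time series are stored as the rows of `X`:

```python
import numpy as np

def theta_init(X):
    """Mean of the row-wise Yule-Walker estimates of theta, equation (2.33)."""
    Xc = X - X.mean(axis=1, keepdims=True)        # center each time series
    num = (Xc[:, :-1] * Xc[:, 1:]).sum(axis=1)    # lag-one cross products per row
    den = (Xc ** 2).sum(axis=1)
    return (num / den).mean()                     # average over the p rows
```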

The determinant and the inverse of $\Psi(\theta)$ can easily be calculated.
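For completeness, the standard results for this AR(1) structure are

$$|\Psi(\theta)| = \frac{1}{1 - \theta^2}, \qquad
\Psi(\theta)^{-1} =
\begin{pmatrix}
1 & -\theta & & & \\
-\theta & 1 + \theta^2 & -\theta & & \\
& \ddots & \ddots & \ddots & \\
& & -\theta & 1 + \theta^2 & -\theta \\
& & & -\theta & 1
\end{pmatrix},$$

i.e., the inverse is tridiagonal, which is what makes the likelihood computations with this $\Psi$ explicit.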
