
Degree Thesis (Examensarbete)

Explicit Estimators for a Banded Covariance Matrix in a Multivariate Normal Distribution


Explicit Estimators for a Banded Covariance Matrix in a Multivariate Normal Distribution

Emil Karlsson
Mathematical Statistics, Linköpings Universitet

LiTH-MAT-EX--2014/01--SE

Degree thesis: 16 hp
Level: G2

Supervisor: Martin Singull, Mathematical Statistics, Linköpings Universitet
Examiner: Martin Singull


Abstract

The problem of estimating the mean and covariances of a multivariate normally distributed random vector has been studied in many forms. This thesis focuses on the estimators proposed in [15] for a banded covariance structure with m-dependence. It presents the previous results for the estimator and rewrites the estimator when m = 1, thus making it easier to analyze. This leads to an adjustment, and a proposition for an unbiased estimator can be presented. A new and easier proof of consistency is then presented. This theory is later generalized to a general linear model, where the corresponding theorems and propositions are established for unbiasedness and consistency. In the last chapter, simulations with the previous and new estimators verify that the theoretical results indeed make an impact.

Keywords: banded covariance matrices, covariance matrix estimation, explicit estimators, multivariate normal distribution, general linear model.



Acknowledgements

First of all, I want to thank my supervisor Martin Singull. You introduced me to the world of mathematical statistics and have inspired me a lot. You have read, helped and commented on my work for a long time and have always been helpful when I ran into questions, whether they were about style, the thesis or mathematics in general.

Further I would like to thank my opponent Jonas Granholm for the comments regarding the thesis and for the time taken to improve it.

I also want to thank Tahere for your encouragement and continuous support. I want to thank mum and dad as well.

Lastly I would like to thank the most important person in my life, Ghazaleh. You have always been an incredible source of inspiration and happiness for me and have helped me in ways I never could have imagined. You are the most incredible person I have ever met. Thank you for making me a better person. I love you.


Nomenclature

Most of the recurring abbreviations and symbols are described here.

Notation

Throughout the thesis, matrices and vectors are denoted by boldface transcription. Matrices are denoted by capital letters and vectors by small letters of the Latin or Greek alphabets. Scalars and matrix elements are denoted by ordinary letters of the Latin or Greek alphabets. Random variables are denoted by capital letters from the end of the Latin alphabet. The end of a proof is marked by □.

LIST OF NOTATION

A_{m,n} - matrix of size m × n
M_{m,n} - the set of all matrices of size m × n
a_{ij} - matrix element in the i-th row and j-th column
a_n - vector of size n
c - scalar
A' - transpose of the matrix A
I_n - identity matrix of size n
|A| - determinant of A
rank(A) - rank of A
tr(A) - trace of A
X - random matrix
x - random vector
X - random variable
E[X] - expectation
var(X) - variance
cov(X, Y) - covariance of X and Y
S - sample dispersion matrix
N_p(µ, Σ) - multivariate normal distribution
N_{p,n}(M, Σ, Ψ) - matrix normal distribution


Contents

1 Introduction
   1.1 Chapter outline

2 Mathematical background
   2.1 Linear algebra
      2.1.1 General definitions and theorems
      2.1.2 Trace
      2.1.3 Idempotence
      2.1.4 Vectorization and Kronecker product
   2.2 Statistics
      2.2.1 General concepts
      2.2.2 The Maximum Likelihood Estimator
      2.2.3 The matrix normal distribution
      2.2.4 General Linear Model
      2.2.5 Quadratic and bilinear forms

3 Explicit estimators of a banded covariance matrix
   3.1 Previous results
   3.2 Remodeling of explicit estimators
   3.3 Proof of unbiasedness and consistency

4 Generalization to a general linear model
   4.1 Presentation
   4.2 B̂ instead of µ̂
   4.3 Proposed estimators
   4.4 Proof of unbiasedness and consistency

5 Simulations
   5.1 Simulations of the regular normal distribution
   5.2 Simulations of the estimators for a general linear model

6 Discussion and further research
   6.1 Discussion
   6.2 Further research


Chapter 1

Introduction

There exist many tests, estimators, confidence intervals and types of regression models in the multivariate statistical literature that are based on the assumption that the underlying distribution is normal [21, 11, 1]. The primary reason is that multivariate datasets often are, at least approximately, normally distributed. The multivariate normal distribution is also simpler to analyze than many other distributions; for example, all the information in a multivariate normal distribution is contained in its mean and covariances. Because of this, estimating the mean and covariances are subjects of importance in statistics.

This thesis will study an estimation procedure for a patterned covariance matrix. Patterned covariance matrices arise in a variety of situations and applications and have been studied by many authors. A seminal paper from the 1940s [23] considered patterned covariances when studying psychological tests; this was the introduction of the covariance matrix with equal diagonal and equal off-diagonal elements. Two years later the author extended this structure into blocks with a certain pattern [22]. During the late 1960s a covariance model where the covariances depend on the distance between variables, a circular stationary model, was developed by [17]. There have been other developments and directions for structured covariance matrices, e.g., [16, 12, 3].

Some extensions from a linear model with one error term into mixed linear models, growth curves and variance component models were made, e.g., [6, 20]. More recent results, e.g., on block structures of matrices, can be found in [13, 14] and others. A special type of patterned covariance matrix, called a banded covariance matrix, will be discussed in this thesis. Banded covariance matrices are common in applications and often arise in association with time series, for example in signal processing, as covariances of Gauss-Markov random processes or cyclostationary processes [24, 10, 5]. In this thesis we will study a special case of banded matrices with unequal elements, except that certain covariances are zero. These covariance matrices will have a tridiagonal structure.

Originally, estimates of covariance matrices were obtained using non-iterative methods such as analysis of variance (ANOVA) and minimum norm quadratic unbiased estimation (MINQUE). The introduction of the modern computer changed a lot of things, and the cheap processing power made it possible to use iterative methods which performed


better. With this came the rise of the maximum likelihood method and more general estimating equations. These methods have surely dominated during the last years. But nowadays we see a shift back to non-iterative methods, since datasets have grown tremendously; there are examples in EEG/EKG studies in medicine, QTL analysis in genetics and time series in meteorology. With such huge datasets, estimating with iterative methods can be a slow and tedious job.

This thesis will discuss some properties of an explicit non-iterative estimator for a banded covariance matrix derived in [15] and present some improvements to this estimator. The improvements give an unbiased and consistent estimator under the special case of first-order dependence for the mean and the covariance matrix.

1.1 Chapter outline

Chapter 2:

Introduces the necessary concepts and results in mathematics, mainly in linear algebra and statistics, which are required for studying this thesis. Some of these results are referenced throughout the thesis.

Chapter 3:

Presents the explicit estimator and some results regarding it. From these results a new unbiased explicit estimator is suggested.

Chapter 4:

The explicit estimator is generalized for estimating the covariance matrix in a general linear model. Later an unbiased estimator is proposed.

Chapter 5:

Simulations based on the new unbiased explicit estimator in comparison to the previous explicit estimators are presented.

Chapter 6:

Discusses further improvements and directions of research which are interesting with regard to the explicit estimator.

Appendix:


Chapter 2

Mathematical background

This chapter will introduce some of the mathematics needed to understand this thesis. Some of the material may be new and some not, but the aim of this chapter is to give a person studying a Bachelor programme in Mathematics sufficient background for this thesis. Most theorems will be given without proof, but the proofs can be accessed through the referenced sources of each section.

2.1 Linear algebra

Since this thesis makes extensive use of linear algebra and matrices some of the more common definitions and theorems will be introduced here.

2.1.1 General definitions and theorems

The first part presents some general definitions and theorems in real linear algebra, along with some notation and expressions that are required later on. These results will be given without proof but can be found in [7, 8, 2].

Definition 1. The set of all matrices with m rows and n columns is denoted M_{m,n}.

Definition 2. A matrix A ∈ M_{n,n} is called the identity matrix of size n, denoted I_n, if the diagonal elements are 1 and the off-diagonal elements are 0.

Definition 3. A matrix A ∈ M_{m,n} is called a tridiagonal matrix if a_{ij} = 0 for all |i − j| > 1, where a_{ij} is the element in row i and column j of the matrix A.

Definition 4. A vector 1_n ∈ M_{n,1} of size n is called the one-vector of size n, e.g., 1_5 = (1 1 1 1 1)', where 1' denotes the transpose of 1.

Definition 5. A matrix A ∈ M_{m,n} is called the zero-matrix, denoted 0_{m,n}, if all matrix elements of A equal 0.

Definition 6. The rank of a matrix A ∈ M_{m,n}, denoted rank(A), is defined as the maximal number of linearly independent columns of A.

Definition 7. A matrix A ∈ M_{n,n} is called symmetric if A' = A.

Definition 8. A matrix A ∈ M_{n,n} is called normal if AA' = A'A.

Definition 9. A matrix A ∈ M_{n,n} is called orthogonal if A'A = I_n. In that case AA' = I_n as well.

Definition 10. A symmetric square matrix A ∈ M_{n,n} is positive (semi-)definite if x'Ax > (≥) 0 for any vector x ≠ 0.

Definition 11. The values λ_i which satisfy Ax_i = λ_ix_i are called the eigenvalues of the matrix A. The vector x_i which corresponds to the eigenvalue λ_i is called the eigenvector of A corresponding to λ_i.

And lastly, a theorem regarding matrix decomposition, called the spectral theorem.

Theorem 1. Any normal matrix A ∈ M_{n,n} has an orthonormal basis of eigenvectors. In other words, any normal matrix A can be represented as

A = UDU',

where U ∈ M_{n,n} is an orthogonal matrix and D ∈ M_{n,n} is a diagonal matrix with the eigenvalues of A on the diagonal.

2.1.2 Trace

This section introduces the linear operator called trace which is frequently used in this thesis. These results can be found in [7, 2].

Definition 12. The trace of a square matrix A ∈ M_{n,n}, denoted tr(A), is defined to be the sum of its diagonal entries, that is,

tr(A) = Σ_{i=1}^n a_{ii}.

Here follow some common properties of the trace of a matrix.

Lemma 1.
(i) tr(A + B) = tr(A) + tr(B) for A, B ∈ M_{n,n}.
(ii) tr(cA) = c tr(A) for all scalars c. Thus, the trace is a linear map.
(iii) tr(A') = tr(A).
(iv) tr(AB) = tr(BA) for A ∈ M_{n,m} and B ∈ M_{m,n}. Equivalently, the trace is invariant under cyclic permutations, i.e.,
(v) tr(ABCD) = tr(BCDA) = tr(CDAB) = tr(DABC) for A, B, C and D of proper sizes. This is known as the cyclic property.

The trace of a matrix A can also be written as the sum of its eigenvalues.

Lemma 2. Let A ∈ M_{n,n} and let λ_1, ..., λ_n be the eigenvalues of A, listed according to their algebraic multiplicities. Then

tr(A) = Σ_{i=1}^n λ_i.
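These identities are easy to check numerically. The sketch below (plain NumPy; not part of the thesis) verifies the cyclic property of Lemma 1 and the eigenvalue identity of Lemma 2 on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))

# Lemma 1 (iv): tr(AB) = tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

# Lemma 1 (v), cyclic property: tr(ABC) = tr(BCA) = tr(CAB)
t = np.trace(A @ B @ C)
assert np.isclose(t, np.trace(B @ C @ A))
assert np.isclose(t, np.trace(C @ A @ B))

# Lemma 2: tr(A) equals the sum of the eigenvalues of A
# (complex conjugate pairs cancel, so the sum is real up to rounding)
assert np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real)
```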


2.1.3 Idempotence

Here follow a definition and some results regarding idempotent matrices. This type of matrix arises frequently in multivariate statistics. These results rely on [7, 2].

Definition 13. A matrix A ∈ M_{n,n} is called idempotent if A² = AA = A.

In terms of Theorem 1, an idempotent matrix is always diagonalizable.

Lemma 3. An idempotent matrix is always diagonalizable and its eigenvalues are either 0 or 1.

Since an idempotent matrix only has eigenvalues 0 or 1, it is easy to express its trace.

Lemma 4. For an idempotent matrix A ∈ M_{n,n}, it follows that

tr(A) = rank(A).

Idempotent matrices have a strong connection to orthogonal projections in the following sense.

Theorem 2. A matrix A ∈ M_{n,n} is idempotent and symmetric if and only if A is an orthogonal projection.

It is also possible to construct new idempotent matrices by subtracting an idempotent matrix from the identity matrix.

Theorem 3. Let A ∈ M_{n,n} be an idempotent matrix. Then the matrix B = I_n − A is also idempotent.

Proof. BB = (I_n − A)(I_n − A) = I_n − 2A + A² = I_n − 2A + A = I_n − A = B. □
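As a quick numerical illustration (NumPy; not part of the thesis), the centering matrix C = I_n − (1/n)1_n1_n', used heavily later in the thesis, satisfies all of these results: it is idempotent and symmetric (an orthogonal projection, Theorem 2), its trace equals its rank (Lemma 4), and I_n − C is again idempotent (Theorem 3):

```python
import numpy as np

n = 6
C = np.eye(n) - np.ones((n, n)) / n   # centering matrix I_n - (1/n) 1 1'

# idempotent and symmetric, hence an orthogonal projection (Theorem 2)
assert np.allclose(C @ C, C)
assert np.allclose(C, C.T)

# Lemma 4: tr(A) = rank(A) for idempotent A; here both equal n - 1 = 5
assert np.isclose(np.trace(C), np.linalg.matrix_rank(C))

# Theorem 3: I_n - C is idempotent as well
B = np.eye(n) - C
assert np.allclose(B @ B, B)
```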

2.1.4 Vectorization and Kronecker product

In this section some definitions and results on vectorization and the Kronecker product are presented. These results simplify the notation for the matrix normal distribution and make it easier to interpret. The results rely on [8].

Definition 14. Let A = (a_{ij}) be a p × q matrix and B = (b_{ij}) an r × s matrix. Then the pr × qs matrix A ⊗ B is called the Kronecker product of the matrices A and B, if A ⊗ B = [a_{ij}B], i = 1, ..., p; j = 1, ..., q, where

a_{ij}B = [ a_{ij}b_{11} ··· a_{ij}b_{1s} ]
          [      ⋮      ⋱       ⋮      ]
          [ a_{ij}b_{r1} ··· a_{ij}b_{rs} ].

Definition 15. Let A = (a_1, ..., a_q) be a p × q matrix, where a_i, i = 1, ..., q, is the i-th column vector. The vectorization operator vec(·) is an operator from R^{p×q} to R^{pq}, with

vec A = [ a_1 ]
        [  ⋮  ]
        [ a_q ].
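The two operators interact through the well-known identity vec(AXB) = (B' ⊗ A) vec X, which is exactly what links the matrix normal and multivariate normal formulations later in this chapter. A small NumPy check (not part of the thesis; note that column-stacking vec corresponds to `order="F"` in NumPy, whose default is row-major):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))

def vec(M):
    # stack the columns of M on top of each other (column-major order)
    return M.flatten(order="F")

# vec(A X B) = (B' kron A) vec(X)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
assert np.allclose(lhs, rhs)
```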


2.2 Statistics

This section presents some general definitions in statistics as well as some results in multivariate statistics and the general linear model. The ambition is to give enough substance to establish a basis for the understanding of this thesis. Most of the results will be given without proof but can be found in the references for each subsection.

2.2.1 General concepts

Here follow some definitions and properties regarding estimation in statistics. These definitions and theorems come from [4].

First we present the definition of an unbiased estimator.

Definition 16. The bias of a point estimator θ̂ of a parameter θ is the difference between the expected value of θ̂ and θ; that is, Bias_θ(θ̂) = E[θ̂] − θ. An estimator whose bias is identically (in θ) equal to 0 is called unbiased and satisfies E[θ̂] = θ for all θ.

Here follows the definition of a consistent estimator.

Definition 17. A sequence of estimators θ̂_n = θ̂_n(X_1, ..., X_n) is a consistent sequence of estimators of the parameter θ if, for every ε > 0 and every θ in the parameter space,

lim_{n→∞} P_θ(|θ̂_n − θ| < ε) = 1.

And lastly, a theorem which is widely used to establish consistency when unbiasedness has already been proved.

Theorem 4. If θ̂_n is a sequence of estimators of a parameter θ satisfying

(i) lim_{n→∞} var(θ̂_n) = 0,
(ii) lim_{n→∞} Bias_θ(θ̂_n) = 0,

for every θ in the parameter space, then θ̂_n is a consistent sequence of estimators of θ.

2.2.2 The Maximum Likelihood Estimator

In this section the estimation procedure called maximum likelihood is examined more closely and some results regarding the properties of these estimators are presented. These results can be found in [4].

First we present a formal definition of the maximum likelihood estimator (MLE).

Definition 18. For each sample point x, let θ̂(x) be a parameter value at which L(θ|x) attains its maximum as a function of θ, with x held fixed. A maximum likelihood estimator (MLE) of the parameter θ based on a sample X is θ̂(X).

The main reasons why MLEs are so popular are the following theorems regarding their asymptotic properties.


Theorem 5. Let X_1, X_2, ... be independent and identically distributed with density f(x|θ), and let L(θ|x) = ∏_{i=1}^n f(x_i|θ) be the likelihood function. Let θ̂ denote the MLE of θ and let τ(θ) be a continuous function of θ. Under some regularity conditions on f(x|θ), and hence on L(θ|x), which can be found in Miscellanea 10.6.2 in [4], for every ε > 0 and every θ ∈ Θ,

lim_{n→∞} P_θ(|τ(θ̂) − τ(θ)| ≥ ε) = 0.

That is, τ(θ̂) is a consistent estimator of τ(θ).

Theorem 6. Let X_1, X_2, ... be independent and identically distributed with density f(x|θ), let θ̂ denote the MLE of θ, and let τ(θ) be a continuous function of θ. Under some regularity conditions on f(x|θ), and hence on L(θ|x), which can be found in Miscellanea 10.6.2 in [4],

√n (τ(θ̂) − τ(θ)) → N(0, var_LB(θ̂)),

where var_LB(θ̂) is the Cramér-Rao lower bound. That is, τ(θ̂) is a consistent, asymptotically unbiased and asymptotically efficient estimator of τ(θ).

Note 1. The regularity conditions in the two previous theorems are rather loose conditions on the probability density function. Since the underlying distribution in this thesis is the normal distribution, the regularity conditions impose no restriction on the development of results in this thesis.

2.2.3 The matrix normal distribution

This section presents some definitions and estimators for the normal distribution, generalized into the multivariate and matrix normal distributions. These results can be found in [4, 8, 9, 21, 11].

First, the univariate normal distribution is defined.

Definition 19. A random variable X is said to have a univariate normal distribution, i.e., X ∼ N(µ, σ²), if the probability density function of X is

f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).

In order to generalize this into a multivariate setting we need to define the covariance of a vector, i.e., the covariance matrix.

Definition 20. If the random vector x of size m has mean µ, the covariance matrix of x is defined to be the m × m matrix

Σ = (σ_{ij}) = cov(x) = E[(x − µ)(x − µ)'],

where σ_{ij} = cov(X_i, X_j).


Definition 21. A covariance matrix Σ = (σ_{ij}) of size k has a banded structure of order m, 1 ≤ m < k, denoted Σ^{(m)}_{(k)}, if σ_{ij} = 0 whenever |i − j| > m, i.e., all elements outside a band of width m around the diagonal are zero. For m = 1 the matrix is tridiagonal.
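A matrix with this structure is easy to construct by zeroing out everything outside the band. The sketch below (illustrative NumPy, not from the thesis; the matrix entries are arbitrary) bands a covariance matrix; with m = 1 the result is tridiagonal. Note that banding an arbitrary covariance matrix this way need not preserve positive definiteness:

```python
import numpy as np

def band(sigma, m):
    """Keep only the elements with |i - j| <= m of a covariance matrix."""
    k = sigma.shape[0]
    i, j = np.indices((k, k))
    return np.where(np.abs(i - j) <= m, sigma, 0.0)

sigma = np.array([[4.0, 1.0, 0.5, 0.2],
                  [1.0, 3.0, 0.8, 0.1],
                  [0.5, 0.8, 2.0, 0.6],
                  [0.2, 0.1, 0.6, 1.5]])

tri = band(sigma, 1)    # banded of order m = 1: tridiagonal
assert tri[0, 2] == 0.0 and tri[0, 3] == 0.0   # outside the band
assert tri[0, 1] == 1.0 and tri[2, 3] == 0.6   # band is untouched
```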

Here follows the definition of a multivariate (vector-) normal distribution.

Definition 22. A random vector x of size m is said to have an m-variate normal distribution if, for every a ∈ R^m, the distribution of a'x is univariate normal.

Theorem 7. If x has an m-variate normal distribution then both µ = E[x] and Σ = cov(x) exist and the distribution of x is determined by µ and Σ.

In the standard case the following estimators are most frequently used.

Theorem 8. Let x_1, ..., x_n be a random sample from a multivariate normal population with mean µ and covariance Σ. Then

µ̂ = x̄ = (1/n) Σ_{i=1}^n x_i  and  Σ̂ = (1/(n − 1)) Σ_{i=1}^n (x_i − x̄)(x_i − x̄)'

are the MLE and the corrected MLE of µ and Σ, respectively.

Theorem 9. The estimators in Theorem 8 are sufficient, consistent and unbiased with respect to the multivariate normal distribution.
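The estimators of Theorem 8 are the familiar sample mean and (corrected) sample covariance. A quick sketch (NumPy, not from the thesis) confirms they agree with `np.cov`, whose default divisor is also n − 1:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 50
X = rng.standard_normal((n, p))     # n observations of a p-vector, one per row

mu_hat = X.mean(axis=0)             # sample mean (MLE of mu)
D = X - mu_hat
sigma_hat = D.T @ D / (n - 1)       # corrected MLE of Sigma

assert np.allclose(mu_hat, X.sum(axis=0) / n)
assert np.allclose(sigma_hat, np.cov(X, rowvar=False))   # same n - 1 divisor
```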

The matrix normal distribution adds another dimension of dependence in the data and is often used to model multivariate datasets. Here follows its definition.

Definition 23. Let Σ = ττ' and Ψ = γγ', where τ ∈ M_{p,r} and γ ∈ M_{n,s}. A matrix X ∈ M_{p,n} is said to be matrix normally distributed with parameters M, Σ and Ψ if it has the same distribution as

M + τUγ',

where M ∈ M_{p,n} is non-random and U ∈ M_{r,s} consists of s independent and identically distributed (i.i.d.) N_r(0, I) vectors U_i, i = 1, 2, ..., s. If X ∈ M_{p,n} is matrix normally distributed, this is denoted X ∼ N_{p,n}(M, Σ, Ψ).

This definition makes it possible to rewrite the likelihood function in Theorem 8 into a matrix normal setting.

Let X = (x_1, ..., x_n). Then X ∼ N_{p,n}(M, Σ, I_n), where M = µ1_n'. This is a random sample from a matrix normal population, and the estimator of Σ can be written in the following way,

Σ̂ = (1/(n − 1)) XCX',

where C = I_n − (1/n)1_n1_n'.
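In this matrix form the whole covariance estimator is a single product with the centering matrix C. The sketch below (NumPy, not from the thesis) checks that XCX'/(n − 1), with the columns of X as observations, reproduces the elementwise estimator of Theorem 8:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 40
X = rng.standard_normal((p, n))          # columns x_1, ..., x_n

C = np.eye(n) - np.ones((n, n)) / n      # C = I_n - (1/n) 1_n 1_n'
sigma_hat = X @ C @ X.T / (n - 1)        # Sigma-hat = X C X' / (n - 1)

# agrees with the elementwise definition in Theorem 8
xbar = X.mean(axis=1, keepdims=True)
direct = (X - xbar) @ (X - xbar).T / (n - 1)
assert np.allclose(sigma_hat, direct)
```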

The matrix normal distribution is nothing else than a multivariate normal distribution: let X ∼ N_{p,n}(M, Σ, Ψ); then vec X = x ∼ N_{pn}(vec M, Ω), where Ω = Ψ ⊗ Σ.

Here follows a theorem which shows a property of the matrix normal distribution; it is later used in the proof of the expectation of a bilinear form.

Theorem 10. Let X ∼ N_{p,n}(0, Σ, I) and let Q ∈ M_{n,n} be an orthogonal matrix which is independent of X. Then X and XQ have the same distribution.

2.2.4 General Linear Model

This section introduces the general linear model and some estimators frequently associated with it. The results are taken from [11, 21, 1].

The general linear model extends the multivariate linear model in the sense that it allows the observations, given by the rows of a matrix Y, to correspond to the rows of a known design matrix X. The multivariate model takes the form

Y = XB + E,

where Y ∈ M_{n,m} and E ∈ M_{n,m} are random matrices, X ∈ M_{n,p} is a known matrix, called the design matrix, and B ∈ M_{p,m} is an unknown matrix of parameters called regression coefficients. We will assume throughout this chapter that X has rank p, that n > m + p, and that the rows of the error matrix E are independent N_m(0, Σ) random vectors. Using the notation introduced in the earlier subsection, this means that E is N_{n,m}(0, I_n, Σ), so that Y is N_{n,m}(XB, I_n, Σ). Here follows a presentation of the MLEs and some of their properties.

Theorem 11. Let Y ∼ N_{n,m}(XB, I_n, Σ), where n > m + p and p = rank(X). Then the MLE of B and the corrected MLE of Σ are

B̂ = (X'X)⁻¹X'Y

and

Σ̂ = (1/(n − p))(Y − XB̂)'(Y − XB̂) = (1/(n − p)) Y'HY, where H = I_n − X(X'X)⁻¹X'.
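A minimal sketch of these estimators in NumPy (not from the thesis; the design matrix, coefficients and sample sizes are arbitrary illustration choices). It also checks that the two forms of Σ̂ in Theorem 11 coincide, since HY equals the residual matrix Y − XB̂:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, m = 60, 2, 3
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # design matrix, rank p = 2
B = np.array([[1.0, 2.0, 0.5],
              [0.3, -1.0, 1.5]])                           # "true" coefficients
Y = X @ B + rng.standard_normal((n, m))                    # rows of E ~ N_m(0, I)

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)                  # (X'X)^{-1} X'Y
H = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)          # H = I_n - X (X'X)^{-1} X'
sigma_hat = Y.T @ H @ Y / (n - p)                          # corrected MLE of Sigma

# the two forms of Sigma-hat in Theorem 11 coincide
R = Y - X @ B_hat
assert np.allclose(sigma_hat, R.T @ R / (n - p))
```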

The non-corrected MLE of Σ above is not unbiased; therefore the adjustment with the rank of X is required to establish unbiasedness. Here follows a theorem regarding the properties of the estimators.


2.2.5 Quadratic and bilinear forms

In this section some results regarding quadratic and bilinear forms in statistics are presented. Since the results here are less common, the references are given within the section.

First a small presentation of the quadratic form taken from [9].

Definition 24. Let x ∼ N_n(µ, Σ) and A ∈ M_{n,n}. Then x'Ax is called a quadratic form.

Here follow the expressions for the first two moments of the quadratic form.

Theorem 13. Let x ∼ N_n(µ, Σ) and let A ∈ M_{n,n} be symmetric. Then

(i) E[x'Ax] = tr(AΣ) + µ'Aµ,
(ii) var[x'Ax] = 2 tr((AΣ)²) + 4µ'AΣAµ.
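The expectation formula in (i) can be checked by Monte Carlo. The sketch below (NumPy, not from the thesis; µ, Σ and A are arbitrary test values) compares the simulated mean of x'Ax with tr(AΣ) + µ'Aµ:

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.5, 0.3],
                  [0.0, 0.3, 1.0]])
A = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.1],
              [0.0, 0.1, 0.5]])            # symmetric

x = rng.multivariate_normal(mu, Sigma, size=200_000)
q = np.einsum("ki,ij,kj->k", x, A, x)      # x'Ax for each draw

theory = np.trace(A @ Sigma) + mu @ A @ mu
# Monte Carlo mean should be close to the theoretical value
assert abs(q.mean() - theory) < 0.15
```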

Here follow some results on the bilinear form. This form has not been studied as much, so a definition and some results based on the same ideas as for the quadratic form are given here.

Definition 25. Let x ∼ N_n(µ_x, I_n), y ∼ N_n(µ_y, I_n) and A ∈ M_{n,n}. Then x'Ay is called a bilinear form.

Here the first two moments of the bilinear form are presented.

Theorem 14. Let x ∼ N_n(0, Σ_x) and y ∼ N_n(0, Σ_y) be jointly normal and let A ∈ M_{n,n} be symmetric. The bilinear form x'Ay has the following properties:

(i) E[x'Ay] = tr(A cov(x, y)),
(ii) var[x'Ay] = tr(A cov(x, y) A cov(x, y)) + tr(A var(x) A var(y)).

In particular, if cov(x, y) = σ_{xy}I_n, var(x) = σ_x²I_n and var(y) = σ_y²I_n, then var[x'Ay] = tr(A²)(σ_{xy}² + σ_x²σ_y²).

Proof. Here follows a proof of property (i) for the slightly simpler case when A is normal.

Since A is normal, it can be written as A = BDB' by the spectral decomposition theorem (Theorem 1), where B is orthogonal and D is a diagonal matrix. It is now possible to write the bilinear form as

Q = x'Ay = x'(BDB')y = (B'x)'D(B'y) = z'Dw, where z = B'x and w = B'y.

Since B is orthogonal, Theorem 10 implies that z ∼ N_n(0, I_n) and w ∼ N_n(0, I_n) in the setting of Definition 25. It is now possible to compute the expected value of Q. Since D is diagonal, z'Dw = Σ_i d_{ii}z_iw_i, and hence

E[Q] = E[z'Dw] = Σ_i d_{ii} E[z_iw_i] = tr(D cov(z, w)) = tr(D cov(B'x, B'y))
     = tr(DB' cov(x, y)B) = tr(BDB' cov(x, y)) = tr(A cov(x, y)). □
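Property (i) can likewise be checked by Monte Carlo. The sketch below (NumPy, not from the thesis; the cross-covariance level c and the matrix A are arbitrary test values) draws jointly normal x and y with cov(x, y) = cI_n and compares the simulated mean of x'Ay with tr(A cov(x, y)) = c tr(A):

```python
import numpy as np

rng = np.random.default_rng(6)
n, c = 3, 0.6
# joint covariance of (x, y): var(x) = var(y) = I_n, cov(x, y) = c * I_n
joint = np.block([[np.eye(n), c * np.eye(n)],
                  [c * np.eye(n), np.eye(n)]])
draws = rng.multivariate_normal(np.zeros(2 * n), joint, size=200_000)
x, y = draws[:, :n], draws[:, n:]

A = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, 0.3],
              [0.0, 0.3, 1.0]])            # symmetric

b = np.einsum("ki,ij,kj->k", x, A, y)      # x'Ay for each draw

theory = c * np.trace(A)                   # tr(A cov(x, y)) with cov(x, y) = c I_n
assert abs(b.mean() - theory) < 0.05
```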


Chapter 3

Explicit estimators of a banded covariance matrix

In this chapter we introduce the estimator developed in [15]. This thesis will focus on the case where the covariance matrix is banded of order one.

3.1 Previous results

In [15] an explicit estimator of the covariance matrix of a multivariate normal distribution, when the covariance matrix has an m-dependence structure, is presented. The authors propose estimators for the general case when m + 1 < p < n and establish some of their properties. Furthermore, the article considers the special case m = 1 in detail, and in this section some of the results from the article are presented.

Proposition 1. Let X ∼ N_{p,n}(µ1_n', Σ^{(1)}_{(p)}, I_n). Explicit estimators are given by

µ̂_i = (1/n) x_i'1_n for i = 1, ..., p,
σ̂_ii = (1/n) x_i'Cx_i for i = 1, ..., p,
σ̂_{i,i+1} = (1/n) r̂_i'Cx_{i+1} for i = 1, ..., p − 1,

where r̂_1 = x_1 and r̂_i = x_i − ŝ_i r̂_{i−1} for i = 2, ..., p − 1, with

ŝ_i = (r̂_{i−1}'Cx_i)/(r̂_{i−1}'Cx_{i−1}) and C = I_n − (1/n)1_n1_n'.
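A direct transcription of Proposition 1 can be sketched as follows (illustrative NumPy, not the authors' code; the data layout assumed here is a p × n matrix whose row i holds the n observations of variable i):

```python
import numpy as np

def explicit_estimators(X):
    """Explicit estimators of Proposition 1 (m = 1 banded covariance).

    X is p x n: row i holds the n observations of variable i.
    """
    p, n = X.shape
    C = np.eye(n) - np.ones((n, n)) / n        # C = I_n - (1/n) 1 1'
    mu = X.mean(axis=1)                        # mu_i = (1/n) x_i' 1_n
    sigma = np.zeros((p, p))
    for i in range(p):
        sigma[i, i] = X[i] @ C @ X[i] / n      # sigma_ii = (1/n) x_i' C x_i
    r = X[0].copy()                            # r_1 = x_1
    for i in range(p - 1):
        # sigma_{i,i+1} = (1/n) r_i' C x_{i+1}; filled in symmetrically
        sigma[i, i + 1] = sigma[i + 1, i] = r @ C @ X[i + 1] / n
        # residual recursion: r_{i+1} = x_{i+1} - s_{i+1} r_i
        s = (r @ C @ X[i + 1]) / (r @ C @ X[i])
        r = X[i + 1] - s * r
    return mu, sigma

rng = np.random.default_rng(7)
mu, sigma = explicit_estimators(rng.standard_normal((4, 30)))
# entries beyond the first off-diagonal are zero by construction
assert sigma[0, 2] == 0.0 and sigma[0, 3] == 0.0
```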

Since these estimators are ad hoc it is important to establish some properties to motivate them. Thus the following theorem.

Theorem 15. The estimator µ̂ = (µ̂_1, ..., µ̂_p)' given in Proposition 1 is unbiased and consistent.


The previous theorem presents two crucial properties of the estimator: consistency and unbiasedness. However, the sample covariance matrix above lacks the property of unbiasedness, and one of the main goals of this thesis was the development of this desired property. Regardless, simulations of the estimators in comparison to standard methods, e.g., maximum likelihood, have shown great promise, which made it probable that an unbiased variation of the sample covariance matrix exists.

The result of this goal will be presented in the following sections.

3.2 Remodeling of explicit estimators

This section presents some rewriting of the estimator above, yielding a clearer expression more suitable for interpretation and analysis. The estimators presented in Proposition 1 are partly composed of MLEs. The estimator for µ_i,

µ̂_i = (1/n) x_i'1_n,

is the sample mean, which is the MLE.

The proposed estimator and the MLE share a further resemblance, which can be seen by looking at the estimators of the diagonal elements of the covariance matrix,

σ̂_ii = (1/n) x_i'Cx_i for i = 1, ..., p,

which are the MLEs of the diagonal elements of the covariance matrix. Also, the first off-diagonal element of the covariance matrix is estimated by

σ̂_12 = (1/n) r̂_1'Cx_2, where r̂_1 = x_1,

so that

σ̂_12 = (1/n) r̂_1'Cx_2 = (1/n) x_1'Cx_2,

which is the MLE of the same element for an unstructured covariance matrix. This can also be seen by looking at the covariance matrix as a whole,

Σ^{(1)}_{(p)} = [ σ_11  σ_12  0     ...   0
                 σ_12  σ_22  σ_23  ...   0
                 0     σ_23  σ_33  ⋱    ⋮
                 ⋮     ⋱    ⋱    ⋱    σ_{p−1,p}
                 0     0     ...   σ_{p−1,p}  σ_pp ],

where the upper left 2 × 2 partition of the matrix,

Σ^{(1)}_{(2)} = [ σ_11  σ_12
                  σ_21  σ_22 ],


is the unstructured covariance matrix.

However, the estimators of the off-diagonal elements of the covariance matrix, except σ̂_12, are not MLEs and are therefore harder to analyze.

But when i > 1 we can write the estimators as

σ̂_{i,i+1} = (1/n) r̂_i'Cx_{i+1} = (1/n)(x_i − ŝ_i r̂_{i−1})'Cx_{i+1}
          = (1/n)(x_i − (r̂_{i−1}'Cx_i)/(r̂_{i−1}'C r̂_{i−1}) · r̂_{i−1})'Cx_{i+1}
          = (1/n)(x_i'Cx_{i+1} − (r̂_{i−1}'Cx_i)/(r̂_{i−1}'C r̂_{i−1}) · r̂_{i−1}'Cx_{i+1}),

and since r̂_{i−1}'C r̂_{i−1} is a scalar it is possible to write

σ̂_{i,i+1} = (1/n)(x_i'Cx_{i+1} − x_i'C r̂_{i−1}(r̂_{i−1}'C r̂_{i−1})⁻¹ r̂_{i−1}'Cx_{i+1})
          = (1/n) x_i'(C − C r̂_{i−1}(r̂_{i−1}'C r̂_{i−1})⁻¹ r̂_{i−1}'C)x_{i+1},

which can be written as

σ̂_{i,i+1} = (1/n) x_i'A_{i−1}x_{i+1}, where A_i = C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C.

This makes it suitable to propose another form of the estimator in Proposition 1 justified by the calculations above.

Hence, the following proposition.

Proposition 2. Let X ∼ N_{p,n}(µ1_n', Σ^{(1)}_{(p)}, I_n). Explicit estimators are given by

µ̂_i = (1/n) x_i'1_n for i = 1, ..., p,
σ̂_ii = (1/n) x_i'Cx_i for i = 1, ..., p,
σ̂_{i,i+1} = (1/n) x_i'A_{i−1}x_{i+1} for i = 1, ..., p − 1,

where A_i = C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C with A_0 = C, r̂_1 = x_1,

r̂_i = x_i − (r̂_{i−1}'Cx_i)/(r̂_{i−1}'Cx_{i−1}) · r̂_{i−1} for i = 2, ..., p − 1,

and C = I_n − (1/n)1_n1_n'.


Theorem 16. The matrix A_i in Proposition 2 is idempotent and symmetric with rank n − 2.

Proof. Idempotence: with A_i = C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C and C² = C,

A_i² = (C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C)(C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C)
     = C² − C²r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C² + C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C²r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C
     = C − 2C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C + C r̂_i(r̂_i'C r̂_i)⁻¹(r̂_i'C r̂_i)(r̂_i'C r̂_i)⁻¹ r̂_i'C
     = C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C = A_i.

Symmetry: since C is symmetric,

A_i' = (C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C)' = C' − C'r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C' = C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C = A_i.

Rank: since A_i is idempotent, rank(A_i) = tr(A_i). This implies, using the cyclic property of the trace and C² = C,

rank(A_i) = tr(C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C)
          = tr(C) − tr(C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C)
          = (n − 1) − tr((r̂_i'C r̂_i)⁻¹ r̂_i'C²r̂_i)
          = n − 1 − tr((r̂_i'C r̂_i)⁻¹(r̂_i'C r̂_i)) = n − 1 − tr(1) = n − 2. □
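All three properties are easy to verify numerically for a concrete A_i. The sketch below (NumPy, not from the thesis) builds the matrix from a random vector playing the role of the residual r̂_i:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10
C = np.eye(n) - np.ones((n, n)) / n     # centering matrix, tr(C) = n - 1
r = rng.standard_normal(n)              # stand-in for the residual r_i

Cr = C @ r
A = C - np.outer(Cr, Cr) / (r @ C @ r)  # A_i = C - C r (r'C r)^{-1} r'C

assert np.allclose(A @ A, A)            # idempotent
assert np.allclose(A, A.T)              # symmetric
assert np.isclose(np.trace(A), n - 2)   # tr(A_i) = n - 2
assert np.linalg.matrix_rank(A) == n - 2
```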

3.3 Proof of unbiasedness and consistency

The last section presented some alterations to the original estimator, which made it possible to rewrite it as a quadratic or bilinear form centered around an idempotent matrix, i.e.,

σ̂_ii = (1/n) x_i'Cx_i for i = 1, ..., p,

and

σ̂_{i,i+1} = (1/n) x_i'A_{i−1}x_{i+1} for i = 1, ..., p − 1.

Section 2.2.5 presented some results regarding the expectation and variance of quadratic and bilinear forms. It is now possible to combine those results with the results of the previous section to develop an unbiased estimator of the covariance matrix. It is also possible to present a new and much simpler proof of the consistency of the sample covariance matrix compared to the proof in [15].


Proposition 3. Let X ∼ N_{p,n}(µ1_n', Σ^{(1)}_{(p)}, I_n). Explicit estimators are given by

µ̂_i = (1/n) x_i'1_n for i = 1, ..., p,
σ̂_ii = (1/(n − 1)) x_i'Cx_i for i = 1, ..., p,
σ̂_12 = (1/(n − 1)) x_1'Cx_2,
σ̂_{i,i+1} = (1/(n − 2)) x_i'A_{i−1}x_{i+1} for i = 2, ..., p − 1,

where A_i = C − C r̂_i(r̂_i'C r̂_i)⁻¹ r̂_i'C, r̂_1 = x_1,

r̂_i = x_i − (r̂_{i−1}'Cx_i)/(r̂_{i−1}'Cx_{i−1}) · r̂_{i−1} for i = 2, ..., p − 1,

and C = I_n − (1/n)1_n1_n'.
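The effect of the corrected divisors can be seen in a small Monte Carlo experiment (illustrative NumPy, not the thesis's simulation code; Σ and the sample size are arbitrary choices). For a fixed tridiagonal Σ, the average of σ̂_23 over many samples is close to σ_23 = 0.5 with the divisor n − 2, while the 1/n version of Proposition 1 is biased downward:

```python
import numpy as np

rng = np.random.default_rng(9)
p, n, reps = 3, 10, 10_000
Sigma = np.array([[2.0, 0.8, 0.0],
                  [0.8, 1.5, 0.5],
                  [0.0, 0.5, 1.0]])          # tridiagonal, sigma_23 = 0.5
C = np.eye(n) - np.ones((n, n)) / n

new_est = np.empty(reps)
old_est = np.empty(reps)
for k in range(reps):
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n).T   # p x n sample
    r1 = X[0]                                # r_1 = x_1
    Cr = C @ r1
    A1 = C - np.outer(Cr, Cr) / (r1 @ C @ r1)
    b = X[1] @ A1 @ X[2]                     # bilinear form x_2' A_1 x_3
    new_est[k] = b / (n - 2)                 # Proposition 3 divisor
    old_est[k] = b / n                       # Proposition 1 divisor

# E[b] = (n - 2) sigma_23, so the corrected estimator is centred at 0.5
# while the 1/n version is centred at (n - 2)/n * 0.5 = 0.4
assert abs(new_est.mean() - 0.5) < 0.03
assert abs(old_est.mean() - 0.4) < 0.03
```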

Here follows the proof of the unbiasedness of the previous estimator.

Theorem 17. The estimators from Proposition 3 are unbiased.

Proof. The estimators µ̂_i, σ̂_ii for i = 1, ..., p and σ̂_12 coincide with the corrected maximum likelihood estimators and are thus, according to Theorem 9, unbiased. Therefore it remains to prove that σ̂_{i,i+1} is unbiased for i = 2, ..., p − 1.

When the estimators are derived in the Appendix, they are conditioned on the previous x:s; that is, the calculation of σ̂_{i,i+1} assumes x_1, ..., x_{i−1} to be known constants. Therefore, the matrix A_{i−1} below can be considered a non-random matrix.

We can then consider σ̂_{i,i+1} as a bilinear form and calculate its expectation,

E[σ̂_{i,i+1}] = E[(1/(n − 2)) x_i'A_{i−1}x_{i+1}] = (1/(n − 2)) E[E(x_i'A_{i−1}x_{i+1} | x_1, ..., x_{i−1})].

By the banded structure, cov(x_i, x_{i+1} | x_1, ..., x_{i−1}) = σ_{i,i+1}I_n, so Theorem 14 gives

E(x_i'A_{i−1}x_{i+1} | x_1, ..., x_{i−1}) = tr(A_{i−1})σ_{i,i+1} = rank(A_{i−1})σ_{i,i+1} = (n − 2)σ_{i,i+1},

where we have used Theorem 16 for the last equality. Thus E[σ̂_{i,i+1}] = σ_{i,i+1} and the theorem has been proved. □

The proof of consistency follows the same structure as the proof above, but instead uses that the estimators are unbiased and studies the variance of the estimators.

(28)

16 Chapter 3. Explicit estimators of a banded covariance matrix

Proof. The estimators ˆµi, ˆσii for i = 1, . . . , p and ˆσ12 coincide with the corrected

maximum likelihood and are thus according to Theorem 5 consistent. Therefore it re-mains to prove that ˆσi,i+1are consistent for i = 2, . . . , p − 1.

When the estimators are derived in the Appendix, they are conditioned on the previous x:s. That is, the calculation of \hat{\sigma}_{i,i+1} assumes x_1, \dots, x_{i-1} to be known constants. Therefore, the matrix A_{i-1} below can be considered a non-random matrix.

We can then consider \hat{\sigma}_{i,i+1} as a bilinear form and calculate its variance:

var(\hat{\sigma}_{i,i+1}) = var\left( \frac{1}{n-2} x_i' A_{i-1} x_{i+1} \right) = var\left( \frac{1}{n-2} x_i' A_{i-1} x_{i+1} \,\Big|\, x_1, \dots, x_{i-1} \right) = \frac{1}{(n-2)^2} var( x_i' A_{i-1} x_{i+1} \mid x_1, \dots, x_{i-1} ).

According to Theorem 14,

var(\hat{\sigma}_{i,i+1}) = \frac{1}{(n-2)^2} \left( \operatorname{rank}(A_{i-1}) \sigma_{i,i+1}^2 + \operatorname{tr}(A_{i-1}^2) \sigma_{ii} \sigma_{i+1,i+1} \right).

Since A_{i-1} is idempotent by Theorem 19,

var(\hat{\sigma}_{i,i+1}) = \frac{1}{(n-2)^2} \operatorname{tr}(A_{i-1}) \left( \sigma_{i,i+1}^2 + \sigma_{ii} \sigma_{i+1,i+1} \right).

It is possible to simplify this further because \operatorname{rank}(A_{i-1}) = \operatorname{tr}(A_{i-1}) = n - 2, thus

var(\hat{\sigma}_{i,i+1}) = \frac{\sigma_{i,i+1}^2 + \sigma_{ii} \sigma_{i+1,i+1}}{n-2} \to 0,

when the number of observations goes to infinity. Since the estimator also is unbiased, it is according to Theorem 4 consistent. Thus the theorem has been proved.

Chapter 4

Generalization to a general linear model

In this chapter the estimator presented earlier will be extended to a general linear model, and the results proved in the last chapter will be proved for this new estimator. Since the estimation procedure for the covariance matrix in a linear model remains similar to the estimation of a regular multivariate normal covariance matrix, the reasoning will be similar. Two differences require attention: the effect of estimating the regression parameters rather than the expected value, and the degrees of freedom of the covariance matrix estimator, which now depend on the rank of the design matrix.

4.1 Presentation

In this section the presumptions and some notation for the estimators will be given.

We will apply presumptions and a model similar to those of Chapter 2 to the general linear model. Thus the multivariate model takes the form

Y = XB + E,

where Y and E are n × m random matrices, X is a known n × p-design matrix, and B is an unknown p × m-matrix of regression coefficients. We will assume throughout this chapter that X has full rank p, that n ≥ m + p, and the rows of the error matrix E are independent Nm(0, Σ) random vectors.

We will also assign Σ a certain structure as in the previous chapter, hence Σ is banded of order one.

4.2 \hat{B} instead of \hat{\mu}

This section motivates why \hat{B} maximizes the conditional likelihood function in the same way as \hat{\mu} does.

The main difference between a normal model and a general linear model is the regression parameters, which affect the mean. Therefore one estimates B instead of \mu.


In a general linear model the MLE for B is \hat{B} = (X'X)^{-1}X'Y. Since the general linear model combines several response variables, it is possible to determine the individual b-vectors separately with the following expression:

\hat{b}_i = (X'X)^{-1}X'y_i, \quad for i = 1, \dots, k.

The explicit estimators in this thesis are derived by stepwise maximization of the likelihood function; the derivation can be found in the Appendix. The estimators, as we have seen, coincide with the mean for a normal distribution.

The same principle applies to the general linear model in the following way. With B = (b_1, \dots, b_k)',

\hat{b}_i \mid y_1, \dots, y_{i-1} = \hat{b}_i = (X'X)^{-1}X'y_i.

Since each individual b-vector can be determined independently, and because the estimator \hat{b}_i above is the MLE, it will maximize the conditional distribution because of the independence of the b:s.

It is then possible to write \hat{B} = (\hat{b}_1, \dots, \hat{b}_k)'. The estimators \hat{B} and \hat{\Sigma} are sufficient for B and \Sigma.

This altogether makes a good basis to propose explicit estimators for a general linear model with a banded covariance structure of order one.
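The claim that the columns of \hat{B} can be computed one response at a time is easy to verify numerically. A sketch with arbitrary simulated data (the dimensions are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 30, 3, 4                      # observations, regressors, responses
X = rng.normal(size=(n, k))             # design matrix, full rank with prob. 1
Y = rng.normal(size=(n, m))

# MLE for the whole coefficient matrix: B_hat = (X'X)^{-1} X'Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# column by column: b_i = (X'X)^{-1} X' y_i
cols = np.column_stack(
    [np.linalg.solve(X.T @ X, X.T @ Y[:, i]) for i in range(m)]
)
assert np.allclose(B_hat, cols)         # the two computations agree
```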

4.3 Proposed estimators

In this section we propose explicit estimators for a general linear model. The last section motivated \hat{B} = (X'X)^{-1}X'Y; here follows a motivation for the estimator of the covariance matrix.

In the previous chapter we assumed X \sim N_{p,n}(\mu 1_n', \Sigma^{(1)}_{(p)}, I_n). We now study the general linear model Y \sim N_{n,p}(X_{design}B, I_n, \Sigma^{(1)}_{(p)}) and see that the transformation Y - X_{design}B yields the same model as in the last chapter (up to transposition), hence Y - X_{design}B \sim N_{n,p}(0, I_n, \Sigma^{(1)}_{(p)}).

Replacing the observation x_i with y_i - X_{design}\hat{b}_i in Proposition 3 gives the following appearance of the estimator.

Proposition 4. Let Y \sim N_{n,p}(XB, I_n, \Sigma^{(1)}_{(p)}). Explicit estimators are given by

\hat{B} = (X'X)^{-1}X'Y,

\hat{\sigma}_{ii} = \frac{1}{n-1}(y_i - X\hat{b}_i)' C (y_i - X\hat{b}_i) \quad for i = 1, \dots, p,

\hat{\sigma}_{12} = \frac{1}{n-1}(y_1 - X\hat{b}_1)' C (y_2 - X\hat{b}_2),

\hat{\sigma}_{i,i+1} = \frac{1}{n-2}(y_i - X\hat{b}_i)' A_{i-1} (y_{i+1} - X\hat{b}_{i+1}) \quad for i = 2, \dots, p-1,

where A_i = C - C\hat{r}_i(\hat{r}_i'C\hat{r}_i)^{-1}\hat{r}_i'C, \hat{r}_1 = y_1 and

\hat{r}_i = y_i - X\hat{b}_i - \frac{\hat{r}_{i-1}'C(y_i - X\hat{b}_i)}{\hat{r}_{i-1}'C(y_{i-1} - X\hat{b}_{i-1})}\hat{r}_{i-1} \quad for i = 2, \dots, p-1,

with C = I_n - \frac{1}{n}1_n 1_n'.

This can be simplified by changing the main matrix C into D = I_n - X(X'X)^{-1}X'. This removes the need for the residuals y_i - X\hat{b}_i, since Dy_i = y_i - X\hat{b}_i. This leads us to the following proposition.

Proposition 5. Let Y = XB + E \sim N_{n,p}(XB, I_n, \Sigma^{(1)}_{(p)}), where \operatorname{rank}(X) = k. Explicit estimators are given by

\hat{B} = (X'X)^{-1}X'Y,

\hat{\sigma}_{ii} = \frac{1}{n-1} y_i' D y_i \quad for i = 1, \dots, p,

\hat{\sigma}_{12} = \frac{1}{n-2} y_1' D y_2,

\hat{\sigma}_{i,i+1} = \frac{1}{n-2} y_i' A_{i-1} y_{i+1} \quad for i = 2, \dots, p-1,

where A_i = D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D with A_0 = D, \hat{r}_1 = y_1 and

\hat{r}_i = y_i - \frac{\hat{r}_{i-1}'Dy_i}{\hat{r}_{i-1}'Dy_{i-1}}\hat{r}_{i-1} \quad for i = 2, \dots, p-1,

with D = I_n - X(X'X)^{-1}X'.

Since in the last chapter we saw that the correction for unbiasedness depended on the matrix A_i above, of which C is a part, we need to study the properties of the new A_i to determine what kind of estimator gives unbiasedness for a general linear model. Here follows a theorem regarding the properties of the matrix A_i above.

Theorem 19. The matrix A_i in Proposition 5 is idempotent and symmetric with rank n - k - 1.

Proof. Idempotence: With A_i = D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D,

A_i^2 = (D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D)(D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D)
= D^2 - DD\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'DD + D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'DD\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D
= D - 2D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D + D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}(\hat{r}_i'D\hat{r}_i)(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D
= D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D = A_i,

where we used that D is idempotent.

Symmetry:

A_i' = (D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D)' = D' - D'\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D' = D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D = A_i.

Rank: Since A_i is idempotent, \operatorname{rank}(A_i) = \operatorname{tr}(A_i). This implies

\operatorname{rank}(A_i) = \operatorname{tr}(A_i) = \operatorname{tr}(D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D)
= \operatorname{tr}(D) - \operatorname{tr}(D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D) = (n - k) - \operatorname{tr}(\hat{r}_i'DD\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1})
= n - k - \operatorname{tr}((\hat{r}_i'D\hat{r}_i)(\hat{r}_i'D\hat{r}_i)^{-1}) = n - k - \operatorname{tr}(1) = n - k - 1.
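Theorem 19 is easy to check numerically, using a random design matrix and a random vector as a stand-in for \hat{r}_i (a sketch; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 10, 3
Xd = rng.normal(size=(n, k))                        # design matrix, rank k w.p. 1
D = np.eye(n) - Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T  # D = I - X(X'X)^{-1}X'

r = rng.normal(size=n)                              # stand-in for r_hat_i
Dr = D @ r
A = D - np.outer(Dr, Dr) / (r @ Dr)                 # A_i of Theorem 19

assert np.allclose(A, A.T)                          # symmetric
assert np.allclose(A @ A, A)                        # idempotent
assert round(np.trace(A)) == n - k - 1              # rank = trace = n - k - 1
```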

This gives us the possibility to present the following explicit estimators, which in the next section will be proved to be unbiased and consistent.

Proposition 6. Let Y = XB + E \sim N_{n,p}(XB, I_n, \Sigma^{(1)}_{(p)}), where \operatorname{rank}(X) = k. Explicit estimators are given by

\hat{B} = (X'X)^{-1}X'Y,

\hat{\sigma}_{ii} = \frac{1}{n-k} y_i' D y_i \quad for i = 1, \dots, p,

\hat{\sigma}_{12} = \frac{1}{n-k-1} y_1' D y_2,

\hat{\sigma}_{i,i+1} = \frac{1}{n-k-1} y_i' A_{i-1} y_{i+1} \quad for i = 2, \dots, p-1,

where A_i = D - D\hat{r}_i(\hat{r}_i'D\hat{r}_i)^{-1}\hat{r}_i'D with A_0 = D, \hat{r}_1 = y_1 and

\hat{r}_i = y_i - \frac{\hat{r}_{i-1}'Dy_i}{\hat{r}_{i-1}'Dy_{i-1}}\hat{r}_{i-1} \quad for i = 2, \dots, p-1,

with D = I_n - X(X'X)^{-1}X'.
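Proposition 6 can be implemented in the same way as Proposition 3, with C replaced by D and the adjusted degrees of freedom. A minimal NumPy sketch (the function name and the column-wise layout, responses in the columns of Y, are our own choices):

```python
import numpy as np

def banded_glm_estimates(Y, X):
    """Unbiased explicit estimators of Proposition 6.
    Y is n x p (responses in columns), X is an n x k design matrix of full rank."""
    n, p = Y.shape
    k = X.shape[1]
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)            # (X'X)^{-1} X'Y
    D = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T     # projection on residual space

    S = np.zeros((p, p))
    for i in range(p):
        S[i, i] = Y[:, i] @ D @ Y[:, i] / (n - k)        # sigma_hat_ii
    S[0, 1] = S[1, 0] = Y[:, 0] @ D @ Y[:, 1] / (n - k - 1)

    r = Y[:, 0]                                          # r_hat_1 = y_1
    for i in range(1, p - 1):                            # 1-based i = 2, ..., p-1
        Dr = D @ r
        A = D - np.outer(Dr, Dr) / (r @ Dr)              # A_{i-1}, rank n-k-1
        S[i, i + 1] = S[i + 1, i] = Y[:, i] @ A @ Y[:, i + 1] / (n - k - 1)
        r = Y[:, i] - (r @ D @ Y[:, i]) / (r @ D @ Y[:, i - 1]) * r
    return B_hat, S
```

Since D is symmetric and idempotent, the diagonal entries equal the usual residual variances \|y_i - X\hat{b}_i\|^2 / (n - k).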

4.4 Proof of unbiasedness and consistency

This section provides the proofs of unbiasedness and consistency of the estimators presented above. Since the structure of the estimators is similar to that of an ordinary normal model, the proofs will be similar to the ones presented in Chapter 3.

We start by proving that the estimators are unbiased.

Theorem 20. The estimators from Proposition 6 are unbiased.

Proof. The estimators \hat{B}, \hat{\sigma}_{ii} for i = 1, \dots, p and \hat{\sigma}_{12} coincide with the corrected maximum likelihood estimators and are thus, according to Theorem 12, unbiased. Therefore it remains to prove that \hat{\sigma}_{i,i+1} is unbiased for i = 2, \dots, p-1.

When the estimators are derived in the Appendix, they are conditioned on the previous y:s. That is, the calculation of \hat{\sigma}_{i,i+1} assumes y_1, \dots, y_{i-1} to be known constants. Therefore, the matrix A_{i-1} below can be considered a non-random matrix.

We can consider \hat{\sigma}_{i,i+1} as a bilinear form and calculate its expectation as

E(\hat{\sigma}_{i,i+1}) = E\left( \frac{1}{n-k-1} y_i' A_{i-1} y_{i+1} \right) = E\left( \frac{1}{n-k-1} y_i' A_{i-1} y_{i+1} \,\Big|\, y_1, \dots, y_{i-1} \right)
= \frac{1}{n-k-1} E( y_i' A_{i-1} y_{i+1} \mid y_1, \dots, y_{i-1} )
= \frac{1}{n-k-1} \operatorname{rank}(A_{i-1}) \sigma_{i,i+1}
= \frac{1}{n-k-1} (n-k-1) \sigma_{i,i+1} = \sigma_{i,i+1},

according to Theorem 14 and Theorem 19. Thus E(\hat{\sigma}_{i,i+1}) = \sigma_{i,i+1} and the theorem has been proved.

The proof of consistency follows the same structure as the proof above, but instead uses that the estimators are unbiased and studies the variance of the estimators.

Theorem 21. The estimators from Proposition 6 are consistent.

Proof. The estimators \hat{B}, \hat{\sigma}_{ii} for i = 1, \dots, p and \hat{\sigma}_{12} coincide with the corrected maximum likelihood estimators and are thus, according to Theorem 12, consistent. Therefore it remains to prove that \hat{\sigma}_{i,i+1} is consistent for i = 2, \dots, p-1.

When the estimators are derived in the Appendix, they are conditioned on the previous y:s. That is, the calculation of \hat{\sigma}_{i,i+1} assumes y_1, \dots, y_{i-1} to be known constants. Therefore, the matrix A_{i-1} below can be considered a non-random matrix.

We can then consider \hat{\sigma}_{i,i+1} as a bilinear form and calculate its variance as

var(\hat{\sigma}_{i,i+1}) = var\left( \frac{1}{n-k-1} y_i' A_{i-1} y_{i+1} \right) = var\left( \frac{1}{n-k-1} y_i' A_{i-1} y_{i+1} \,\Big|\, y_1, \dots, y_{i-1} \right)
= \frac{1}{(n-k-1)^2} var( y_i' A_{i-1} y_{i+1} \mid y_1, \dots, y_{i-1} )
= \frac{1}{(n-k-1)^2} \operatorname{tr}(A_{i-1}) \left( \sigma_{i,i+1}^2 + \sigma_{ii}\sigma_{i+1,i+1} \right)
= \frac{\sigma_{i,i+1}^2 + \sigma_{ii}\sigma_{i+1,i+1}}{n-k-1},

according to Theorem 14, Theorem 19 and since \operatorname{rank}(A_{i-1}) = \operatorname{tr}(A_{i-1}) = n-k-1. One can see that when n \to \infty, var(\hat{\sigma}_{i,i+1}) \to 0.

Estimators which are both unbiased and whose variance tends to 0 when n \to \infty are consistent according to Theorem 4. Thus the theorem has been proved.

We have thus presented explicit estimators for a general linear model and proved that they have good properties.

Chapter 5

Simulations

In this chapter we display some simulations of the unbiased covariance matrix estimators presented in Proposition 3 and Proposition 6 and compare them to the previous estimators in Proposition 1 and Proposition 5. This will give an idea of how much better the improved estimators perform.

5.1 Simulations of the regular normal distribution

In this section we assumed that x \sim N_4(\mu, \Sigma^{(1)}_{(4)}), where

\mu = (1, 1, 1, 1)'  and  \Sigma =
    5 2 0 0
    2 5 1 0
    0 1 5 3
    0 0 3 5.

In this simulation a sample of n = 20 observations was randomly generated using MATLAB Version 7.13, and the unbiased explicit estimators and the previous explicit estimators were calculated for each sample. This was repeated 100,000 times and the average values of the obtained estimators were calculated.

Based on the 100,000 replications, the averages of the unbiased explicit estimators are

\hat{\Sigma}_{new} =
    4.99501 1.99590 0.00000 0.00000
    1.99590 4.99238 0.99678 0.00000
    0.00000 0.99678 5.00026 3.00265
    0.00000 0.00000 3.00265 5.00368,

and the averages of the previous estimators are

\hat{\Sigma}_{prev} =
    4.74526 1.89611 0.00000 0.00000
    1.89611 4.74276 0.89710 0.00000
    0.00000 0.89710 4.75025 2.70239
    0.00000 0.00000 2.70239 4.75350.

In this simulation the unbiased estimators perform considerably better than the previous ones. This is in line with the theory presented.
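The bias pattern in \hat{\Sigma}_{prev} is consistent with a plain 1/n (MLE-style) normalization: at n = 20, the diagonal entries shrink by roughly (n-1)/n = 0.95 (e.g. 5 \cdot 0.95 = 4.75) and the later band entries by roughly (n-2)/n = 0.90 (e.g. 3 \cdot 0.9 = 2.7). A small Monte Carlo for a single variance illustrates the effect of the normalization alone (a sketch, not the full banded estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma2 = 20, 100_000, 5.0

# centered sums of squares for `reps` independent samples of size n
X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

mle_avg = (ss / n).mean()             # 1/n normalization, biased
unbiased_avg = (ss / (n - 1)).mean()  # 1/(n-1) normalization, unbiased

# mle_avg is close to sigma2*(n-1)/n = 4.75, unbiased_avg close to 5.0
```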

5.2 Simulations of the estimators for a general linear model

In this section we assumed that Y = XB + E \sim N_{n,5}(XB, I_n, \Sigma^{(1)}_{(5)}), where

\Sigma =
    4 1 0 0 0
    1 3 2 0 0
    0 2 5 3 0
    0 0 3 5 3
    0 0 0 3 5.

For each simulation the matrices X and B were randomly generated using MATLAB's random number generator, to avoid any systematic effect on the estimation process.

In this simulation a sample of n = 80 observations was randomly generated using MATLAB Version 7.13, and the unbiased explicit estimators and the previous explicit estimators were calculated for each sample. This was repeated 100,000 times and the average values of the obtained estimators were calculated.

Based on the 100,000 replications, the averages of the unbiased explicit estimators are

\hat{\Sigma}_{new} =
    3.99865 0.99971 0.00000 0.00000 0.00000
    0.99971 3.00511 2.00246 0.00000 0.00000
    0.00000 2.00246 4.99898 2.99769 0.00000
    0.00000 0.00000 2.99769 4.99412 2.99504
    0.00000 0.00000 0.00000 2.99504 4.99352,

and the averages of the previous estimators are

\hat{\Sigma}_{prev} =
    2.99899 0.74978 0.00000 0.00000 0.00000
    0.74978 2.25383 1.47682 0.00000 0.00000
    0.00000 1.47682 3.74924 2.21079 0.00000
    0.00000 0.00000 2.21079 3.74559 2.20884
    0.00000 0.00000 0.00000 2.20884 3.74514.

In this simulation the unbiased estimators perform vastly better than the previous estimators. This is also in line with the theory, since the adjustment for unbiasedness is larger than for an ordinary normal distribution.

Chapter 6

Discussion and further research

6.1 Discussion

This thesis presents unbiased and consistent estimators for a covariance matrix with a banded structure of order one. This makes the estimators more suitable for use in real situations, since unbiasedness is a highly desired property, in combination with the generalization to a general linear model. The simulation study in Chapter 5 shows that the effect is prominent, and since the estimator in the earlier paper [15] was comparable to other methods in use, this improved version should be efficient as well.

6.2 Further research

The main direction for future research in this area is to extend these results to a banded covariance matrix of any order. This would provide explicit estimators with good qualities. Another step is to compare these estimators with the MLEs for a banded covariance matrix and determine how well they perform in comparison. Lastly, one could study the efficiency of the estimators, which would give a comparison to all other possible unbiased estimators of a banded covariance matrix.


Appendix A

Motivation of the estimator

In this chapter the motivation of the explicit estimators presented in [15] will be given. Here follows some notation specific to the appendix.

We will denote by M^{ji}_{(k)} the matrix obtained when the j:th row and the i:th column have been removed from \Sigma_{(k)}. Moreover, we will partition the matrix \Sigma^{(m)}_{(k)} as

\Sigma^{(m)}_{(k)} = \begin{pmatrix} \Sigma^{(m)}_{(k-1)} & \sigma_{1k} \\ \sigma_{k1}' & \sigma_{kk} \end{pmatrix},

where \sigma_{k1}' = (0, \dots, 0, \sigma_{k,k-m}, \dots, \sigma_{k,k-1}).

For a matrix X, the notation X_{i:j} will be used for the matrix including rows i to j, i.e.,

X_{i:j}' = (x_i, x_{i+1}, \dots, x_j).

The estimators are based on the likelihood. However, instead of maximizing the complete likelihood we factor the likelihood and maximize each term; in this way explicit estimators are obtained. By conditioning, the probability density equals

f(X) = f(x_p' \mid X_{1:p-1}) \cdots f(x_{m+2}' \mid X_{1:m+1}) f(X_{1:m+1}).

Hence, for k = m+2, \dots, p, partition the covariance matrix \Sigma_{(k)}. We have

x_k' \mid X_{1:k-1} \sim N_{1,n}(\mu_{k|1:k-1}, \sigma_{k|1:k-1}, I_n),

where the conditional variance equals

\sigma_{k|1:k-1} = \sigma_{kk} - \sigma_{k1}' \Sigma_{(k-1)}^{-1} \sigma_{1k},

and the conditional expectation equals

\mu_{k|1:k-1}' = \mu_k 1_n' + \sigma_{k1}' \Sigma_{(k-1)}^{-1} \begin{pmatrix} x_1' - \mu_1 1_n' \\ \vdots \\ x_{k-1}' - \mu_{k-1} 1_n' \end{pmatrix}
= r_{k0} 1_n' + \sum_{i=k-m}^{k-1} \sigma_{ki} \sum_{j=1}^{k-1} \sigma^{ij}_{(k-1)} x_j'.   (A.1)
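The conditional variance above is the Schur complement of \Sigma_{(k-1)} in \Sigma_{(k)}, which equals the reciprocal of the (k, k):th element of \Sigma_{(k)}^{-1}. A quick numerical check for an order-1 banded matrix (entries chosen for illustration, k = p = 3):

```python
import numpy as np

# Conditional variance: sigma_{k|1:k-1} = sigma_kk - sigma_k1' Sigma_{(k-1)}^{-1} sigma_1k
Sigma = np.array([[5.0, 2.0, 0.0],
                  [2.0, 5.0, 1.0],
                  [0.0, 1.0, 5.0]])
S11 = Sigma[:2, :2]          # Sigma_{(k-1)}
s1k = Sigma[:2, 2]           # sigma_1k = (0, sigma_{k,k-1})'
cond_var = Sigma[2, 2] - s1k @ np.linalg.solve(S11, s1k)

# Schur-complement identity: cond_var = 1 / [Sigma^{-1}]_{kk}
assert np.isclose(cond_var, 1.0 / np.linalg.inv(Sigma)[2, 2])
```

Here cond_var = 5 - 5/21 = 100/21, strictly smaller than the marginal variance \sigma_{kk} = 5, as conditioning requires.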

Here \sigma^{ij}_{(k-1)} are the elements of the matrix

\Sigma_{(k-1)}^{-1} = \left( \sigma^{ij}_{(k-1)} \right)_{i,j} = \left( (-1)^{i+j} \frac{|M^{ji}_{(k-1)}|}{|\Sigma_{(k-1)}|} \right)_{i,j}.

The first regression coefficient equals

r_{k0} = \mu_k - \sum_{i=k-m}^{k-1} \sigma_{ki} \sum_{j=1}^{k-1} \sigma^{ij}_{(k-1)} \mu_j = \mu_k - \sum_{i=k-m}^{k-1} \sigma_{ki} \sum_{j=1}^{k-1} (-1)^{i+j} \frac{|M^{ji}_{(k-1)}|}{|\Sigma_{(k-1)}|} \mu_j.

We may rewrite equation (A.1) as

\mu_{k|1:k-1} = r_{k0} 1_n + \sum_{i=k-m}^{k-1} \sigma_{ki} \sum_{j=1}^{k-1} \sigma^{ij}_{(k-1)} x_j
= r_{k0} 1_n + \sum_{i=k-m}^{k-1} \sigma_{ki} \frac{|\Sigma_{(k-2)}|}{|\Sigma_{(k-1)}|} \sum_{j=1}^{k-1} (-1)^{i+j} \frac{|M^{ji}_{(k-1)}|}{|\Sigma_{(k-2)}|} x_j
= r_{k0} 1_n + \sum_{i=k-m}^{k-1} r_{ki} s_{k-1,i} = S_{k-1} r_k,

where

r_k = (r_{k0}, r_{k,k-m}, \dots, r_{k,k-1})', \qquad S_{k-1} = (1_n, s_{k-1,k-m}, \dots, s_{k-1,k-1}),

r_{ki} = \sigma_{ki} \frac{|\Sigma_{(k-2)}|}{|\Sigma_{(k-1)}|}, \quad for i = k-m, \dots, k-1, \qquad and

s_{k-1,i} = \sum_{j=1}^{k-1} (-1)^{i+j} \frac{|M^{ji}_{(k-1)}|}{|\Sigma_{(k-2)}|} x_j, \quad for i = k-m, \dots, k-1.

The proposed estimators for the regression coefficients in the k:th step are

\hat{r}_k = (\hat{r}_{k0}, \hat{r}_{k,k-m}, \dots, \hat{r}_{k,k-1})' = (\hat{S}_{k-1}'\hat{S}_{k-1})^{-1}\hat{S}_{k-1}' x_k,

where \hat{S}_{k-1} = (1_n, \hat{s}_{k-1,k-m}, \dots, \hat{s}_{k-1,k-1}), and

\hat{s}_{k-1,i} = \sum_{j=1}^{k-1} (-1)^{i+j} \frac{|\hat{M}^{ji}_{(k-1)}|}{|\hat{\Sigma}_{(k-2)}|} x_j, \quad for i = k-m, \dots, k-1.

Here the estimators from the previous terms (1, 2, \dots, k-1) are inserted in \hat{s}_{k-1,i} for all i = k-m, \dots, k-1. The estimator for the conditional variance is given by

\hat{\sigma}_{k|1:k-1} = \frac{1}{n}(x_k - \hat{\mu}_{k|1:k-1})'(x_k - \hat{\mu}_{k|1:k-1}) = \frac{1}{n} x_k'(I_n - \hat{S}_{k-1}(\hat{S}_{k-1}'\hat{S}_{k-1})^{-1}\hat{S}_{k-1}') x_k.

The estimators for the original parameters may be calculated as

\hat{\sigma}_{ki} = \hat{r}_{ki} \frac{|\hat{\Sigma}_{(k-1)}|}{|\hat{\Sigma}_{(k-2)}|}, \quad for i = k-m, \dots, k-1,

\hat{\mu}_k = \hat{r}_{k0} + \sum_{i=k-m}^{k-1} \hat{\sigma}_{ki} \sum_{j=1}^{k-1} (-1)^{i+j} \frac{|\hat{M}^{ji}_{(k-1)}|}{|\hat{\Sigma}_{(k-1)}|} \hat{\mu}_j,

and

\hat{\sigma}_{kk} = \frac{1}{n} x_k'(I_n - \hat{S}_{k-1}(\hat{S}_{k-1}'\hat{S}_{k-1})^{-1}\hat{S}_{k-1}') x_k + \hat{\sigma}_{k1}' \hat{\Sigma}_{(k-1)}^{-1} \hat{\sigma}_{1k}.

It remains to show that the estimator \hat{\mu}_k is the mean of x_k, i.e., \hat{\mu}_k = \frac{1}{n} x_k' 1_n for all k = 1, \dots, p, and a proof by induction is now presented.

Base step: For k = 1, 2, \dots, m+1, \hat{\mu}_k = \frac{1}{n} x_k' 1_n, since the estimators are MLEs in a model with an unstructured covariance matrix.

Inductive step: For some k with m+1 < k, assume that \hat{\mu}_j = \frac{1}{n} x_j' 1_n for all j \le k-1. Then

\hat{\mu}_k = \hat{r}_{k0} + \sum_{i=k-m}^{k-1} \hat{\sigma}_{ki} \sum_{j=1}^{k-1} (-1)^{i+j} \frac{|\hat{M}^{ji}_{(k-1)}|}{|\hat{\Sigma}_{(k-1)}|} \hat{\mu}_j
= \hat{r}_{k0} + \sum_{i=k-m}^{k-1} \hat{r}_{ki} \sum_{j=1}^{k-1} (-1)^{i+j} \frac{|\hat{M}^{ji}_{(k-1)}|}{|\hat{\Sigma}_{(k-2)}|} \frac{1}{n} x_j' 1_n
= \hat{r}_{k0} + \sum_{i=k-m}^{k-1} \hat{r}_{ki} \frac{1}{n} \hat{s}_{k-1,i}' 1_n = \frac{1}{n} 1_n' \hat{S}_{k-1} \hat{r}_k
= \frac{1}{n} 1_n' \hat{S}_{k-1}(\hat{S}_{k-1}'\hat{S}_{k-1})^{-1}\hat{S}_{k-1}' x_k.

Since \hat{S}_{k-1}(\hat{S}_{k-1}'\hat{S}_{k-1})^{-1}\hat{S}_{k-1}' is a projection onto a space which contains the vector 1_n, we have

\hat{\mu}_k = \frac{1}{n} 1_n' \hat{S}_{k-1}(\hat{S}_{k-1}'\hat{S}_{k-1})^{-1}\hat{S}_{k-1}' x_k = \frac{1}{n} 1_n' x_k.

Hence, by induction, all the estimators of the expectations are sample means, i.e., \hat{\mu}_k = \frac{1}{n} 1_n' x_k for k = 1, \dots, p.

Bibliography

[1] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley-Interscience, 2003.

[2] D. S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, 2009.

[3] M. W. Browne. The analysis of patterned correlation matrices by generalized least squares. British Journal of Mathematical and Statistical Psychology, 30(1):113– 124, 1977.

[4] G. Casella. Statistical inference. Duxbury Thomson Learning, Pacific Grove, CA, 2002.

[5] M. Chakraborty. An efficient algorithm for solving general periodic Toeplitz systems. Signal Processing, IEEE Transactions on, 46(3):784–787, 1998.

[6] V. M. Chinchilli and W. H. Carter. A likelihood ratio test for a patterned covariance matrix in a multivariate growth-curve model. Biometrics, 40(1):151–156, 1984.

[7] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2012.

[8] T. Kollo and D. von Rosen. Advanced Multivariate Statistics with Matrices. Springer, 2011.

[9] C. E. McCulloch, S. R. Searle, and J. M. Neuhaus. Generalized, Linear, and Mixed Models. Wiley-Interscience, 2008.

[10] J. M. F. Moura and N. Balram. Recursive structure of noncausal Gauss-Markov random fields. Information Theory, IEEE Transactions on, 38(2):334–354, 1992.

[11] R. J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley-Interscience, 2005.

[12] T. Nahtman. Marginal permutation invariant covariance matrices with applications to linear models. Linear Algebra and Its Applications, 417(1):183–210, 2006.

[13] D. N. Naik and S. S. Rao. Analysis of multivariate repeated measures data with a Kronecker product structured covariance matrix. Journal of Applied Statistics, 28(1):91–105, 2001.

[14] L. Nelson and D. L. Zimmerman. The likelihood ratio test for a separable covariance matrix. Statistics and Probability Letters, 73(4):449–457, 2005.

[15] M. Ohlson, Z. Andrushchenko, and D. von Rosen. Explicit estimators under m-dependence for a multivariate normal distribution. Annals of the Institute of Statistical Mathematics, 63(1):29–42, 2011.

[16] I. Olkin. Testing and Estimation for Structures which are Circularly Symmetric in Blocks. Research bulletin. Educational Testing Service, 1972.

[17] I. Olkin and S. J. Press. Testing and estimation for a circular stationary model. The Annals of Mathematical Statistics, 40(4):1358–1373, 1969.

[18] S. R. Searle. Linear Models. Wiley, 1971.

[19] S. R. Searle. Estimating multivariate variance and covariance components using quadratic and bilinear forms. Biometrical Journal, 21(4):389–398, 1979.

[20] S. R. Searle, G. Casella, and C. E. McCulloch. Variance Components. Wiley-Interscience, 2006.

[21] M. S. Srivastava and C. G. Khatri. Introduction to Multivariate Statistics. Elsevier Science Ltd, 1979.

[22] D. F. Votaw. Testing compound symmetry in a normal multivariate distribution. The Annals of Mathematical Statistics, 19(4):447–473, 1948.

[23] S. S. Wilks. Sample criteria for testing equality of means, equality of variances, and equality of covariances in a normal multivariate distribution. The Annals of Mathematical Statistics, 17(3):257–281, 1946.

[24] J. W. Woods. Two-dimensional discrete markovian fields. Information Theory, IEEE Transactions on, 18(2):232–240, 1972.


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

