
Department of Mathematics
Linköping University
SE-581 83 Linköping

HIGH-DIMENSIONAL PROFILE ANALYSIS

Cigdem Cengiz¹ and Dietrich von Rosen¹,²

¹Department of Energy and Technology,
Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden.
²Department of Mathematics,
Linköping University, SE-581 83 Linköping, Sweden.

Abstract

The three tests of profile analysis, the test of parallelism, the test of level and the test of flatness, have been studied, and likelihood ratio tests have been derived. Firstly, a traditional setting, where the sample size is greater than the dimension of the parameter space, is considered. Then, all tests are derived in a high-dimensional setting. In high-dimensional data analysis, techniques are required to tackle the problems which arise with the dimensionality. We propose a dimension reduction method using scores, which was first proposed by Läuter et al. (1996).

Keywords: High-dimensional data; hypothesis testing; linear scores; multivariate analysis; profile analysis; spherical distributions.


Notation

Abbreviations p.d. : positive definite

p.s.d. : positive semi-definite

i.i.d. : independently and identically distributed i.e. : that is

e.g. : for example

MANOVA : multivariate analysis of variance

GMANOVA : generalized multivariate analysis of variance BRM : bilinear regression model

PLS : partial least squares

PCA : principal component analysis PCR : principal component regression CLT : central limit theorem

LLN : law of large numbers

Symbols x : column vector X : matrix A0 : transpose of A A−1 : inverse of A A+ : Moore-Penrose inverse of A A− : generalized inverse of A |A| : determinant of A C(A) : column space of A r(A) : rank of A

A⊥ : orthocomplement of subspace A A◦ : C(A◦) = C(A)⊥

⊗ : Kronecker product

 : orthogonal sum of linear spaces vec: vec-operator

In : n × n identity matrix

1n : n × 1 vector of ones

E[x] : expectation of x D[x] : dispersion matrix of x

Np(µ, Σ) : multivariate normal distribution

Np,n(µ, Σ, Ψ) : matrix normal distribution

Wp(Σ, n, ∆) : non-central Wishart distribution with n degrees of freedom

Wp(Σ, n) : central Wishart distribution with n degrees of freedom d

= : equal in distribution (A)( )0 : (A)(A)0

1 Introduction

In this report, we are going to construct test statistics for each of the three hypotheses in profile analysis, first in a classical setting, where the number of parameters is less than the number of subjects, and then in a high-dimensional setting where the opposite holds, i.e., the number of parameters exceeds the number of individuals.

In profile analysis, we have multiple variables for each individual, the individuals form different groups (at least two), and the groups are compared based on the mean vectors of these variables. The idea is to see if there is an interaction between groups and responses. Assume we have $p$ variables and $q$ independent groups (treatments); the $p$-dimensional observation vectors are denoted by $x_1, x_2, \ldots, x_q$ with mean vectors $\mu_1, \mu_2, \ldots, \mu_q$. The mean profile for the $i$-th group is obtained by connecting the points $(1, \mu_{i1}), (2, \mu_{i2}), \ldots, (p, \mu_{ip})$ with lines. We can then consider profile analysis as the comparison of these $q$ lines of mean vectors. See Figure 1 for an illustration.

Figure 1: Profiles of q groups.

There are two possible scenarios that can be considered for the responses:

I. The same variable can be compared between the $q$ groups over several time points (repeated measurements).

II. One can measure different variables for each subject and compare their mean levels between the $q$ groups.

In the literature, the topic has been investigated by many researchers. One of the first and leading papers on this topic was published by Greenhouse and Geisser (1959), and the topic has been revisited by Geisser (2003). Srivastava (1987) derived the likelihood ratio tests together with their distributions for the three hypotheses. A chapter on profile analysis can be found in the books by Srivastava (2002) and Srivastava and Carter (1983). Potthoff and Roy (1964) presented the growth curve model for the first time, and other


extensions within the framework of the growth curve model can be found in Fujikoshi (2009), where Fujikoshi extended profile analysis, especially statistical inference on the parallelism hypothesis. Ohlson and Srivastava (2010) considered profile analysis of several groups, where the groups have partly equal means. Seo, Sakurai and Fujikoshi (2011) derived the likelihood ratio tests for the two hypotheses, level and flatness, in profile analysis of growth curve data. Another focus was on the profile analysis with random effects covariance structure. Srivastava and Singull (2012) constructed tests based on the likelihood ratio, without any restrictions on the parameter space, for testing the covariance matrix for random-effects structure or sphericity. Yokoyama (1995) derived the likelihood ratio criterion with random-effects covariance structure under the parallel profile model. Yokoyama and Fujikoshi (1993) conducted analysis of parallel growth curves of groups where they assumed a random-effects covariance structure. They also gave the asymptotic null distributions of the tests.

1.1 Profile analysis of several groups

As mentioned before, there are three types of tests which are commonly considered in profile analysis: the test of parallelism, the test of levels and the test of flatness (Srivastava and Carter, 1983; Srivastava, 1987, 2002). Assume that the $n_i$ $p$-dimensional random vectors $x_{ij}$ are independently normally distributed as $N_p(\mu_i, \Sigma)$, $j = 1, \ldots, n_i$, $i = 1, \ldots, q$, where $\mu_i = (\mu_{1,i}, \ldots, \mu_{p,i})'$.

(1) Parallelism hypothesis

\[ H_1: \mu_i - \mu_q = \gamma_i 1_p, \quad i = 1, \ldots, q-1, \quad \text{and} \quad A_1 \neq H_1, \]

where $A_1$ stands for the alternative hypothesis and $1_p$ is a $p$-dimensional vector of ones.

(2) Level hypothesis

\[ H_2|H_1: \gamma_i = 0, \quad i = 1, \ldots, q-1, \quad \text{and} \quad A_2 \neq H_2|H_1, \]

where $H_2|H_1$ means $H_2$ under the assumption that $H_1$ is true.

(3) Flatness hypothesis

\[ H_3|H_1: \mu_\bullet = \gamma_q 1_p \quad \text{and} \quad A_3 \neq H_3|H_1, \]

where $H_3|H_1$ means $H_3$ under the assumption that $H_1$ is true.

The parameters $\gamma_i$ represent unknown scalars and $\mu_\bullet = \frac{1}{N}\sum_{i=1}^{q} n_i \mu_i$, where $N$ is the total sample size, that is, $N = n_1 + \cdots + n_q$.

If the profiles are parallel, we can say that there is no interaction between the responses and the treatments (groups). Given that the parallelism hypothesis holds, one may want to proceed with testing the second hypothesis, $H_2$, which indicates that there is no column or treatment (group) effect. Alternatively, if the first hypothesis holds, one may want to proceed with testing the third hypothesis, $H_3$, which indicates that there is no row effect. Briefly speaking, the level hypothesis under the parallelism hypothesis ($H_2|H_1$) indicates that the $q$ profiles are coincident with each other. The flatness hypothesis under the parallelism hypothesis ($H_3|H_1$) indicates that the $q$ profiles are constant. It is useful to note that failing to reject $H_1$ does not mean that it is true, that is, that the profiles are parallel; it only means that we do not have enough evidence against $H_1$. The second and the third hypotheses are therefore always tested under the assumption that $H_1$ holds.
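As a small numerical illustration (our example, not part of the original text), take $q = 2$ and $p = 3$ with
\[ \mu_1 = (1, 2, 3)', \qquad \mu_2 = (2, 3, 4)'. \]
Then $\mu_1 - \mu_2 = -1 \cdot 1_3$, so $H_1$ holds with $\gamma_1 = -1$: the two profiles are parallel. They are, however, not coincident ($\gamma_1 \neq 0$, so $H_2|H_1$ fails) and not flat, since the components of each mean vector are not constant ($H_3|H_1$ fails).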


1.2 Test statistics for the two-sample case

In this section, only the special case of two groups is considered. Let the $p$-dimensional random vectors $x^{(i)}_1, \ldots, x^{(i)}_{n_i}$, $i = 1, 2$, be independently normally distributed with mean vector $\mu_i$ and covariance matrix $\Sigma$. The sample mean vectors, the sample covariance matrices and the pooled sample covariance matrix are given by
\[
\bar{x}^{(i)} = \frac{1}{n_i}\sum_{j=1}^{n_i} x^{(i)}_j, \qquad
S^{(i)} = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (x^{(i)}_j - \bar{x}^{(i)})(x^{(i)}_j - \bar{x}^{(i)})',
\]
\[
S_p = \frac{1}{n_1 + n_2 - 2}\left[(n_1 - 1)S^{(1)} + (n_2 - 1)S^{(2)}\right].
\]
Define a $(p-1) \times p$ matrix $C$ which satisfies $C 1_p = 0$ and is of rank $r(C) = p - 1$. Let
\[
b = \frac{n_1 n_2}{n_1 + n_2}, \qquad f = n_1 + n_2 - 2, \qquad u = \bar{x}^{(1)} - \bar{x}^{(2)}.
\]
Then the three hypotheses and related test statistics can be written as below (Srivastava and Carter, 1983; Srivastava, 1987, 2002):

(1) Parallelism hypothesis: $H_1: C\mu_1 = C\mu_2$.

The null hypothesis is rejected if
\[
\frac{f - (p-1) + 1}{f(p-1)}\, b\, u'C'(CS_pC')^{-1}Cu \geq F_{p-1, f-p+2, \alpha},
\]
where $F_{p-1, f-p+2, \alpha}$ denotes the $\alpha$-percentile of the $F$-distribution with $p-1$ and $f-p+2$ degrees of freedom.

(2) Level hypothesis: $H_2|H_1: 1_p'\mu_1 = 1_p'\mu_2$.

The null hypothesis is rejected if
\[
\frac{f - p + 1}{f}\, b\, (1'S_p^{-1}u)^2 (1'S_p^{-1}1)^{-1} (1 + f^{-1}T^2_{p-1})^{-1} \geq t^2_{f-p+1, \alpha/2} = F_{1, f-p+1, \alpha},
\]
where $T^2_{p-1} = b\, u'C'(CS_pC')^{-1}Cu$ and $t_{f-p+1, \alpha/2}$ is the $\alpha/2$-percentile of the $t$-distribution with $f-p+1$ degrees of freedom.

(3) Flatness hypothesis: $H_3|H_1: C(\mu_1 + \mu_2) = 0$.

The null hypothesis is rejected if
\[
\frac{n(f - p + 3)}{p - 1}\, \bar{x}'C'(CVC' + b\, Cuu'C')^{-1}C\bar{x} \geq F_{p-1, n-p+1, \alpha},
\]
where $n = n_1 + n_2$, $\bar{x} = (n_1\bar{x}^{(1)} + n_2\bar{x}^{(2)})/(n_1 + n_2)$ and $V = fS_p$.

As mentioned before, the second hypothesis is tested given that $H_1$ is true; if $H_1$ is rejected, it is not meaningful to proceed with the level or flatness tests.
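To make the formulas above concrete, the following minimal sketch computes the three two-sample statistics and their $p$-values in the classical setting ($f \geq p$). It is our illustration, not part of the original report; NumPy and SciPy are assumed, and all function and variable names are ours.

import numpy as np
from scipy import stats

def two_sample_profile_tests(X1, X2):
    # X1: p x n1, X2: p x n2 data matrices (columns are individuals)
    p, n1 = X1.shape
    _, n2 = X2.shape
    f = n1 + n2 - 2
    b = n1 * n2 / (n1 + n2)
    u = X1.mean(axis=1) - X2.mean(axis=1)
    Sp = ((n1 - 1) * np.cov(X1) + (n2 - 1) * np.cov(X2)) / f
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)   # C 1_p = 0, r(C) = p - 1
    Cu = C @ u
    T2 = b * Cu @ np.linalg.solve(C @ Sp @ C.T, Cu)
    # (1) parallelism
    F1 = (f - (p - 1) + 1) / (f * (p - 1)) * T2
    p1 = stats.f.sf(F1, p - 1, f - p + 2)
    # (2) level, given parallelism
    one = np.ones(p)
    F2 = (f - p + 1) / f * b * (one @ np.linalg.solve(Sp, u)) ** 2 \
         / (one @ np.linalg.solve(Sp, one)) / (1 + T2 / f)
    p2 = stats.f.sf(F2, 1, f - p + 1)
    # (3) flatness, given parallelism; V = f * Sp, n = n1 + n2
    n = n1 + n2
    xbar = (n1 * X1.mean(axis=1) + n2 * X2.mean(axis=1)) / n
    Cx = C @ xbar
    F3 = n * (f - p + 3) / (p - 1) \
         * Cx @ np.linalg.solve(C @ (f * Sp) @ C.T + b * np.outer(Cu, Cu), Cx)
    p3 = stats.f.sf(F3, p - 1, n - p + 1)
    return (F1, p1), (F2, p2), (F3, p3)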


2 Useful definitions and theorems

Definition 2.1. The vector space generated by the columns of an arbitrary matrix $A: p \times q$ is denoted $C(A)$:
\[ C(A) = \{a : a = Ax,\ x \in \mathbb{R}^q\}. \]

Definition 2.2. A matrix whose columns generate the orthogonal complement to $C(A)$ is denoted $A^\circ$, i.e., $C(A^\circ) = C(A)^\perp$. Similar to the generalized inverse, $A^\circ$ is not unique. One can choose $A^\circ = I - (A')^- A'$ or $A^\circ = I - A(A'A)^- A'$, in addition to some other choices.

Definition 2.3. The space $C_V(A)$ denotes a column vector space with an inner product defined through the positive definite matrix $V$; i.e., for any pair of vectors $x$ and $y$, the inner product is $x'V^{-1}y$. If $V = I$, instead of $C_I(A)$ one writes $C(A)$.

Definition 2.4. The orthogonal complement to $C_V(A)$ is denoted by $C_V(A)^\perp$ and is generated by all the vectors orthogonal to all the vectors in $C_V(A)$; i.e., for an arbitrary $a \in C_V(A)$, all the $y$ satisfying $y'V^{-1}a = 0$ generate the linear space (column vector space) $C_V(A)^\perp$.

Definition 2.5. Let $V_1$ and $V_2$ be disjoint subspaces and $y = x_1 + x_2$, where $x_1 \in V_1$ and $x_2 \in V_2$. The mapping $Py = x_1$ is called a projection of $y$ on $V_1$ along $V_2$, and $P$ is a projector. If $V_1$ and $V_2$ are orthogonal, we say that we have an orthogonal projector. A projector $P$ has the following properties:

(i) $PP = P$, which means $P$ is an idempotent matrix.
(ii) If $P$ is a projector, then $I - P$ is also a projector.
(iii) $P$ is unique.
(iv) $P_A = A(A'A)^-A'$ is a projector on $C(A)$, for which the standard inner product is assumed to hold.
(v) $P_{A,V} = A(A'V^{-1}A)^-A'V^{-1}$ is a projector on $C_V(A)$, for which an inner product defined by $(x, y) = x'V^{-1}y$ is assumed to hold and $V$ is p.d.

Definition 2.6. The matrix $W: p \times p$ is said to be Wishart distributed if and only if $W = XX'$ for some matrix $X$, where $X \sim N_{p,n}(\mu, \Sigma, I)$, $\Sigma \geq 0$. If $\mu = 0$, we have a central Wishart distribution, which is denoted by $W \sim W_p(\Sigma, n)$, and if $\mu \neq 0$, we have a non-central Wishart distribution, which is denoted by $W \sim W_p(\Sigma, n, \Delta)$, where $\Delta = \Sigma^{-1}\mu\mu'$.
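For example (our illustration of Definition 2.6): if $x_1, \ldots, x_n$ are i.i.d. $N_p(0, \Sigma)$ and $X = (x_1 : \cdots : x_n)$, then $X \sim N_{p,n}(0, \Sigma, I_n)$ and
\[ XX' = \sum_{j=1}^n x_j x_j' \sim W_p(\Sigma, n). \]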

Definition 2.7. Let $X$ and $Y$ be two arbitrary matrices. The covariance of these two matrices is defined by
\[ \mathrm{Cov}[X, Y] = E[(\mathrm{vec}X - E[\mathrm{vec}X])(\mathrm{vec}Y - E[\mathrm{vec}Y])'], \]
if the expectations exist.


Definition 2.8. The general multivariate linear model equals
\[ X = MB + E, \]
where $X: p \times n$ is a random matrix which corresponds to the observations, $M: p \times q$ is an unknown parameter matrix and $B: q \times n$ is a known design matrix. Moreover, $E \sim N_{p,n}(0, \Sigma, I)$, where $\Sigma$ is an unknown p.d. matrix. This is also called the MANOVA model.

Definition 2.9. A bilinear model can be defined as
\[ X = AMB + E, \]
where $X: p \times n$ is the observation matrix, $M: q \times k$ the unknown mean parameter matrix, $A: p \times q$ and $B: k \times n$ the two design matrices, and $E$ the error matrix. This is also called the GMANOVA model or the growth curve model.

Theorem 2.1. The general solution of the consistent equation in $X$:
\[ AXB = C \]
can be given by any of these three formulas:

(i) $X = X_0 + (A')^\circ Z_1 B' + A' Z_2 B^{\circ\prime} + (A')^\circ Z_3 B^{\circ\prime}$,
(ii) $X = X_0 + (A')^\circ Z_1 + A' Z_2 B^{\circ\prime}$,
(iii) $X = X_0 + Z_1 B^{\circ\prime} + (A')^\circ Z_2 B'$,

where $X_0$ represents a particular solution and $Z_i$, $i = 1, 2, 3$, represent arbitrary matrices of proper sizes.

Theorem 2.2. The equation $AXB = C$ is consistent if and only if $C(C) \subseteq C(A)$ and $C(C') \subseteq C(B')$. A particular solution of the equation is given by
\[ X_0 = A^- C B^-, \]
where $^-$ denotes an arbitrary g-inverse.

Theorem 2.3. If $S$ is positive definite and $C(B) \subseteq C(A)$,
\[ P_{A,S} = P_{B,S} + S P'_{A,S} B^\circ (B^{\circ\prime} S P'_{A,S} B^\circ)^- B^{\circ\prime} P_{A,S}. \]
A special case is
\[ S^{-1} - B^\circ (B^{\circ\prime} S B^\circ)^- B^{\circ\prime} = S^{-1} B (B' S^{-1} B)^- B' S^{-1}. \]

Theorem 2.4. For $A: m \times n$ and $B: n \times m$,
\[ |I_m + AB| = |I_n + BA|. \]

Theorem 2.5. Let $X \sim N_{p,n}(\mu, \Sigma, \Psi)$. For any $A: q \times p$ and $B: n \times m$,
\[ AXB \sim N_{q,m}(A\mu B, A\Sigma A', B'\Psi B). \]

Theorem 2.6. Let $W_1 \sim W_p(\Sigma, n, \Delta_1)$ be independent of $W_2 \sim W_p(\Sigma, m, \Delta_2)$. Then
\[ W_1 + W_2 \sim W_p(\Sigma, n + m, \Delta_1 + \Delta_2). \]

Theorem 2.7. Let $X \sim N_{p,n}(0, \Sigma, I)$ and let $Q$ be any idempotent matrix of proper size. Then
\[ XQX' \sim W_p(\Sigma, r(Q)). \]

Theorem 2.8. Let $A \in \mathbb{R}^{q \times p}$ and $W \sim W_p(\Sigma, n)$. Then
\[ AWA' \sim W_q(A\Sigma A', n). \]

Theorem 2.9. Let $A \in \mathbb{R}^{p \times q}$ and $W \sim W_p(\Sigma, n)$. Then
\[ A(A'W^{-1}A)^- A' \sim W_p(A(A'\Sigma^{-1}A)^- A', n - p + r(A)). \]

Theorem 2.10. Let a partitioned non-singular matrix $A$ be given by
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}. \]
If $A_{22}$ is non-singular, then $|A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}|$; if $A_{11}$ is non-singular, then $|A| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|$.

Theorem 2.11. Let $S$ be positive definite and suppose that $V$, $W$ and $H$ are of proper sizes, assuming $H^{-1}$ exists. Then
\[ (S + VHW')^{-1} = S^{-1} - S^{-1}V(W'S^{-1}V + H^{-1})^{-1}W'S^{-1}. \]

Theorem 2.12. Let $E[X] = \mu$ and $D[X] = \Psi \otimes \Sigma$. Then
(i) $E[AXB] = A\mu B$,
(ii) $D[AXB] = B'\Psi B \otimes A\Sigma A'$.

Theorem 2.13. Let $A: n \times m$ and $B: m \times n$. Then
\[ r(A - ABA) = r(A) + r(I_m - BA) - m = r(A) + r(I_n - AB) - n. \]

Theorem 2.14.

(i) For $A$, $B$, $C$ and $D$ of proper sizes,
\[ \mathrm{tr}(ABCD) = \mathrm{tr}(DABC) = \mathrm{tr}(CDAB) = \mathrm{tr}(BCDA). \]

(ii) If $A$ is idempotent, then
\[ \mathrm{tr}(A) = r(A). \]

(iii) For any $A$

Theorem 2.15. Let $A$, $B$ and $C$ be matrices of proper sizes. Then
\[ \mathrm{vec}(ABC) = (C' \otimes A)\mathrm{vec}B. \]

Theorem 2.16.

(i) $(A \otimes B)' = A' \otimes B'$.

(ii) Let $A$, $B$, $C$ and $D$ be matrices of proper sizes. Then $(A \otimes B)(C \otimes D) = AC \otimes BD$.

(iii) $A \otimes B = 0$ if and only if $A = 0$ or $B = 0$.

Theorem 2.17. Let $A$ and $B$ be matrices of proper sizes. If $C(A) \subseteq C(B)$, then
\[ C(B) = C(A) \boxplus (C(B) \cap C(A)^\perp). \]

3 Likelihood ratio tests for the three hypotheses

In Section 1.1, we have given the three hypotheses of profile analysis of $q$ groups, and the test statistics of the likelihood ratio tests for two groups have been presented in Section 1.2 (Srivastava and Carter, 1983; Srivastava, 1987, 2002). Srivastava (1987) derived the test statistics for $q$ groups, but in this section we will reformulate the problems, since we are indeed in a multivariate analysis of variance (MANOVA) or a generalized multivariate analysis of variance (GMANOVA) testing situation. This will require a matrix reformulation of the hypotheses and a derivation of the likelihood ratio tests based on this matrix notation.

3.1 The model

The model for one group, say the $k$-th group, can be written as
\[ X_k = M_k D_k + E_k, \]
where $X_k$ represents the matrix of observations, $M_k$ the $p$-vector of mean parameters, $D_k$ a vector of $n_k$ ones, i.e., $D_k = 1'_{n_k}$, and $E_k$ the error matrix. The columns of $X_k$ are independently distributed, which means that the columns of $E_k$ are independently distributed. The assumption for the distribution of $E_k$ is that the column vectors of $E_k$ follow a multivariate normal distribution: $e_{jk} \sim N_p(0, \Sigma)$.

When we have $q$ groups, we have $q$ models:
\[
(X_1 : X_2 : \cdots : X_q) = (M_1 D_1 : M_2 D_2 : \cdots : M_q D_q) + (E_1 : E_2 : \cdots : E_q)
= (M_1 : M_2 : \cdots : M_q)D + (E_1 : E_2 : \cdots : E_q), \tag{1}
\]
where $D$ is a $q \times N$ matrix, $N = \sum_{k=1}^q n_k$, which equals
\[
D = \begin{pmatrix}
1 & \cdots & 1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & 0 & 1 & \cdots & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix},
\]
and where $E_k \sim N_{p,n_k}(0, \Sigma, I_{n_k})$. The relation in (1) can be written
\[
\underset{(p \times N)}{X} = \underset{(p \times q)}{M}\,\underset{(q \times N)}{D} + \underset{(p \times N)}{E}, \qquad X \sim N_{p,N}(MD, \Sigma, I_N),
\]
where $X = (X_1 : X_2 : \cdots : X_q)$, $M = (M_1 : M_2 : \cdots : M_q)$ and $E = (E_1 : E_2 : \cdots : E_q)$.

Moreover, let $F$ be a $q \times (q-1)$ matrix and $C$ a $(p-1) \times p$ matrix which satisfy $1'F = 0$ and $C1 = 0$, respectively, e.g.,
\[
F = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
-1 & 1 & \cdots & 0 \\
0 & -1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
0 & 0 & \cdots & -1
\end{pmatrix}, \qquad
C = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 & 0 \\
0 & 1 & -1 & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & -1
\end{pmatrix}.
\]
These two matrices, $F$ and $C$, will be used in the derivations of the tests below; since the same $F$ and $C$ appear in each hypothesis, they are introduced here. A concrete low-dimensional example is given after this paragraph.

3.2 Derivation of the tests

The model for $q$ groups has been given above by
\[ X = MD + E, \tag{2} \]
where $X = (X_1 : X_2 : \cdots : X_q)$. This model is often called the MANOVA model. If we want to draw any inference from the model, the unknown parameters $M$ and $\Sigma$ need to be estimated.

For the model in (2), the likelihood function equals
\[
L(M, \Sigma) = (2\pi)^{-\frac{1}{2}pN}|\Sigma|^{-N/2}\exp\left\{-\tfrac{1}{2}\mathrm{tr}\,\Sigma^{-1}(X - MD)(X - MD)'\right\}
\leq (2\pi)^{-\frac{1}{2}pN}\left|\tfrac{1}{N}(X - MD)(X - MD)'\right|^{-\frac{1}{2}N} e^{-\frac{1}{2}Np},
\]
where $N = \sum_{k=1}^q n_k$ and equality holds if and only if $N\Sigma = (X - MD)(X - MD)'$ (see Srivastava and Khatri, 1979, Theorem 1.10.4). Now, we need to find a lower bound for $|(X - MD)(X - MD)'|$.

Use that $I = (I - P_{D'}) + P_{D'}$, and then
\[
|(X - MD)(X - MD)'| = |(X - MD)P_{D'}(X - MD)' + (X - MD)(I - P_{D'})(X - MD)'|
\geq |(X - MD)(I - P_{D'})(X - MD)'| = |X(I - P_{D'})X'|,
\]
and equality holds if and only if $(X - MD)P_{D'} = 0$. Thus $XP_{D'} = \widehat{M}D$ and $N\widehat{\Sigma} = RR'$, where $R = X(I - P_{D'})$ with $P_{D'} = D'(DD')^- D$. This can be considered as a decomposition of the whole space into two spaces which are orthogonal to each other, $C(D')$ and $C(D')^\perp$, which correspond to the mean space and the residual space, respectively.

Figure 2: The decomposition of the space with no restriction on the mean parameter space; the mean part $\widehat{M}D = XP_{D'}$ lies in $C(D')$ and the residual $R = X(I - P_{D'})$ in $C(D')^\perp$.

Note that since $D$ is a full rank matrix, one can write $P_{D'} = D'(DD')^- D = D'(DD')^{-1}D$.

Now we will move on to the derivation of the test statistics.

3.2.1 Parallelism Hypothesis

The null hypothesis and the alternative hypothesis for parallelism can be written
\[
H_1: E(X) = MD, \quad CMF = 0, \qquad
A_1: E(X) = MD, \quad CMF \neq 0, \tag{3}
\]
where $C$ and $F$ are defined in Section 3.1.

Theorem 3.1. The likelihood ratio statistic for the parallelism hypothesis presented in (3) can be given as
\[
\lambda^{2/N} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}, \tag{4}
\]
where $S = X(I - P_{D'})X'$ and $K$ is any matrix satisfying $C(K) = C(D) \cap C(F)$,
\[
CSC' \sim W_{p-1}(C\Sigma C', N - r(D)), \qquad
CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', r(K)).
\]
Then
\[
\lambda^{2/N} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|} \sim \Lambda(p-1, N - r(D), r(K)),
\]
where $\Lambda(\cdot, \cdot, \cdot)$ denotes the Wilks' lambda distribution.

Proof. We have restrictions on the mean parameter space:
\[
CMF = 0 \Leftrightarrow (F' \otimes C)\mathrm{vec}M = 0,
\]
which means that $\mathrm{vec}M$ belongs to $C(F \otimes C')^\perp$. By Theorem 2.1, the general solution of the equation $CMF = 0$ equals
\[
M = (C')^\circ \theta_1 + C'\theta_2 F^{\circ\prime},
\]

where $\theta_1$ and $\theta_2$ are new parameters. Inserting this solution into (2) yields
\[
X = (C')^\circ \theta_1 D + C'\theta_2 F^{\circ\prime} D + E.
\]
This is the reparameterization of the first model given by (2) after applying the restrictions $CMF = 0$. Here we notice that we are outside the MANOVA and GMANOVA models. Recall the inequality
\[
|\Sigma|^{-N/2} e^{-\frac{1}{2}\mathrm{tr}\{\Sigma^{-1}(X - E(X))(X - E(X))'\}}
\leq \left|\tfrac{1}{N}(X - E(X))(X - E(X))'\right|^{-N/2} e^{-Np/2},
\]
with equality if and only if $N\Sigma = (X - E(X))(X - E(X))'$.

Now, we start performing some calculations under the null hypothesis. Using $I = P_{D'} + (I - P_{D'})$,
\[
|(X - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)(\;)'|
= |(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)(\;)' + \underbrace{X(I - P_{D'})X'}_{S}|
\]
\[
= |S|\,|S^{-1}(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)(\;)' + I|
= |S|\,|(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'S^{-1}(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D) + I|. \tag{5}
\]
Recall from Theorem 2.3,
\[
S^{-1} = C'(CSC')^- C + S^{-1}(C')^\circ[(C')^{\circ\prime}S^{-1}(C')^\circ]^-(C')^{\circ\prime}S^{-1}.
\]
If we replace $S^{-1}$ in (5) with this expression,
\[
(5) = |S|\,\big|(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'S^{-1}(C')^\circ[(C')^{\circ\prime}S^{-1}(C')^\circ]^-(C')^{\circ\prime}S^{-1}(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)
\]
\[
\qquad + (XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'C'(CSC')^- C(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D) + I\big|
\]
\[
\geq |S|\,|(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'C'(CSC')^- C(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D) + I|, \tag{6}
\]
with equality if and only if
\[
(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D)'S^{-1}(C')^\circ[(C')^{\circ\prime}S^{-1}(C')^\circ]^-(C')^{\circ\prime}S^{-1}(XP_{D'} - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D) = 0.
\]
Notice that $D'\theta_1'(C')^{\circ\prime}C'(CSC')^- C = 0$, since $(C')^{\circ\prime}$ and $C'$ are orthogonal. Hence, using $C'(CSC')^- C = C'(CSC')^- CSC'(CSC')^- C$, (6) is equal to
\[
|S|\,|(XP_{D'} - C'\theta_2 F^{\circ\prime}D)'C'(CSC')^- CSC'(CSC')^- C(XP_{D'} - C'\theta_2 F^{\circ\prime}D) + I|.
\]
By Theorem 2.4,
\[
= |S|\,|C'(CSC')^- C(XP_{D'} - C'\theta_2 F^{\circ\prime}D)(XP_{D'} - C'\theta_2 F^{\circ\prime}D)'C'(CSC')^- CS + I|
\]
\[
= |\underbrace{SC'(CSC')^- C}_{P'_{C',S^{-1}}}(XP_{D'} - C'\theta_2 F^{\circ\prime}D)(\;)'\underbrace{C'(CSC')^- CS}_{P_{C',S^{-1}}} + S|
= |(P'_{C',S^{-1}}XP_{D'} - P'_{C',S^{-1}}C'\theta_2 F^{\circ\prime}D)(\;)' + S|.
\]
Using $I = (I - P_{D'F^\circ}) + P_{D'F^\circ}$,
\[
= |P'_{C',S^{-1}}XP_{D'}(I - P_{D'F^\circ})(P'_{C',S^{-1}}XP_{D'})' + (P'_{C',S^{-1}}XP_{D'}P_{D'F^\circ} - P'_{C',S^{-1}}C'\theta_2 F^{\circ\prime}D)(\;)' + S|
\]
\[
\geq |P'_{C',S^{-1}}XP_{D'}(I - P_{D'F^\circ})(P'_{C',S^{-1}}XP_{D'})' + S|. \tag{7}
\]

Since $P_{A^\circ} = I - P_A$, one can write $I - P_{D'F^\circ} = P_{(D'F^\circ)^\circ}$. From the definition of the column spaces given in the Notation part, $C[(D'F^\circ)^\circ] = C(D'F^\circ)^\perp$. Using Theorem 2.17, $C(D'F^\circ)^\perp$ can be decomposed into two orthogonal subspaces:
\[
C(D'F^\circ)^\perp = C(D')^\perp \boxplus \underbrace{C(D') \cap C(D'F^\circ)^\perp}_{C(D'(DD')^{-1}K)}, \tag{8}
\]
where $C(K) = C(D) \cap C(F)$. The space $C(D')^\perp$ will correspond to $I - P_{D'}$ and the space $C(D'(DD')^{-1}K)$ will correspond to $P_{D'(DD')^{-1}K}$. Then
\[
(7) = |P'_{C',S^{-1}}XP_{D'}[(I - P_{D'}) + P_{D'(DD')^{-1}K}](P'_{C',S^{-1}}XP_{D'})' + S|
= |P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}(P'_{C',S^{-1}}X)' + S|,
\]
since $P_{D'}(I - P_{D'}) = 0$ and $P_{D'}P_{D'(DD')^{-1}K} = P_{D'(DD')^{-1}K}$.

For the alternative hypothesis, we do not have any restrictions on the mean parameter space. Thus, we will use the results from the introduction of Section 3.2, where $N\widehat{\Sigma} = RR' = S$ was found:
\[
N\widehat{\Sigma}_{A_1} = S, \qquad
N\widehat{\Sigma}_{H_1} = S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}. \tag{9}
\]
Thus,
\[
\lambda^{2/N} = \frac{|N\widehat{\Sigma}_{A_1}|}{|N\widehat{\Sigma}_{H_1}|} = \frac{|S|}{|S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}|}.
\]

The numerator and the denominator are not independently distributed. To be able to achieve this independence, we introduce the full rank matrix
\[
H = (C', S^{-1}(C')^\circ).
\]
Multiplying both the numerator and the denominator by $H'$ and $H$ from the left and the right, respectively, yields
\[
\lambda^{2/N} = \frac{|H'\widehat{\Sigma}_{A_1}H|}{|H'\widehat{\Sigma}_{H_1}H|} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}.
\]

By Theorem 2.7 and Theorem 2.8, the following two relations hold:
\[
S = X(I - P_{D'})X' \sim W_p(\Sigma, N - r(D)) \;\Rightarrow\; CSC' \sim W_{p-1}(C\Sigma C', N - r(D)),
\]
\[
XP_{D'(DD')^{-1}K}X' \sim W_p(\Sigma, r(K)) \;\Rightarrow\; CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', r(K)).
\]
The ratio given by $\lambda^{2/N}$ does not depend on $\Sigma$; consequently we can replace $C\Sigma C'$ with $I_{p-1}$. For a detailed explanation, see Appendix A, Result A1. Therefore,
\[
\lambda^{2/N} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|} \sim \Lambda(p-1, N - r(D), r(K)),
\]
which completes the proof.

Note that the distribution of $\lambda^{2/N}$, that is, the Wilks' lambda distribution, can be approximated very accurately (Läuter, 2016; Mardia, Kent and Bibby, 1979).
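As a numerical illustration of Theorem 3.1 (our sketch, not part of the original report; valid in the classical setting $N > p$ where $S$ is non-singular; NumPy and SciPy are assumed, and all function and variable names are ours), $\lambda^{2/N}$ can be computed directly from $D$, $C$, $F$ and $K = D(D'F^\circ)^\circ$, obtaining the orthocomplements as null spaces:

import numpy as np
from scipy.linalg import null_space

def wilks_parallelism(X, n_sizes):
    # X: p x N data matrix; n_sizes: group sizes (n_1, ..., n_q)
    p, N = X.shape
    q = len(n_sizes)
    D = np.zeros((q, N)); i = 0
    for k, nk in enumerate(n_sizes):       # design matrix D: q x N
        D[k, i:i + nk] = 1.0; i += nk
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)    # C 1_p = 0
    F = np.eye(q, q - 1) - np.eye(q, q - 1, k=-1)   # 1' F = 0
    Fo = null_space(F.T)                   # F°: basis of C(F)-perp
    K = D @ null_space((D.T @ Fo).T)       # K = D (D'F°)°, C(K) = C(D) ∩ C(F)
    PD = D.T @ np.linalg.solve(D @ D.T, D)          # projector on C(D')
    S = X @ (np.eye(N) - PD) @ X.T
    B = D.T @ np.linalg.solve(D @ D.T, K)           # D'(DD')^{-1} K
    PK = B @ np.linalg.pinv(B)             # projector on C(D'(DD')^{-1}K)
    G = C @ S @ C.T
    H = C @ X @ PK @ X.T @ C.T
    return np.linalg.det(G) / np.linalg.det(G + H)  # λ^{2/N} in (4)

Under $H_1$ the returned value follows $\Lambda(p-1, N - r(D), r(K))$, so small values are evidence against parallelism.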

3.2.2 Level Hypothesis

Assuming that the profiles are parallel, we will construct a test to check if they coincide. The null hypothesis and the alternative hypothesis for the level test can be written
\[
H_2|H_1: E(X) = MD, \quad MF = 0, \qquad
A_2|H_1: E(X) = MD, \quad CMF = 0, \tag{10}
\]
where $C$ and $F$ are defined in Section 3.1.

Theorem 3.2. The likelihood ratio statistic for the level hypothesis can be expressed as
\[
\lambda^{2/N} = \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}}{|((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} + ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}|}, \tag{11}
\]
where $Q = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K$ and $C(K) = C(D) \cap C(F)$,
\[
((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, N - r(D) - p + 1),
\]
\[
((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, r(K)).
\]
Then $\lambda^{2/N} \sim \Lambda(1, N - r(D) - p + 1, r(K))$.

Proof. Equivalent expressions for the restrictions in both hypotheses can be written as
\[
H_2: MF = 0 \Leftrightarrow M = \theta F^{\circ\prime}, \qquad
A_2: CMF = 0 \Leftrightarrow M = (C')^\circ\theta_1 + C'\theta_2 F^{\circ\prime}.
\]
Plugging these solutions into the models gives
\[
H_2: X = \theta F^{\circ\prime}D + E, \qquad
A_2: X = (C')^\circ\theta_1 D + C'\theta_2 F^{\circ\prime}D + E.
\]

First the null hypothesis will be studied. Under $H_2$, using $I = P_{D'F^\circ} + (I - P_{D'F^\circ})$,
\[
|(X - \theta F^{\circ\prime}D)(\;)'| = |(XP_{D'F^\circ} - \theta F^{\circ\prime}D)(\;)' + X(I - P_{D'F^\circ})X'| \geq |X(I - P_{D'F^\circ})X'|,
\]
with equality if and only if $XP_{D'F^\circ} = \widehat{\theta}F^{\circ\prime}D$. As with the parallelism hypothesis, we will partition the space $C(D'F^\circ)^\perp$, which corresponds to $I - P_{D'F^\circ}$, into two orthogonal parts, that is, $C(D'F^\circ)^\perp = C(D')^\perp \boxplus C(D') \cap C(D'F^\circ)^\perp$, where $C(D') \cap C(D'F^\circ)^\perp = C(D'(DD')^{-1}K)$ with $C(K) = C(D) \cap C(F)$.

We have already derived the maximum of the likelihood under the restriction $CMF = 0$ while considering the parallelism hypothesis; this restriction appeared in the null hypothesis for testing parallelism. For the second hypothesis, we assume that the profiles are parallel (or we do not reject $H_1$), and the test is conducted to see if they have equal levels. The restrictions for this test can be summarised with $MF = 0$. The alternative hypothesis will not simply be $MF \neq 0$, because we already assume that the profiles are parallel. Consequently, the level hypothesis is tested against the parallelism hypothesis due to this prior knowledge. This is the reason why $CMF = 0$ appears here in the alternative hypothesis. Thus,
\[
|N\widehat{\Sigma}_{H_2}| = |X(I - P_{D'F^\circ})X'| = |X(I - P_{D'})X' + XP_{D'(DD')^{-1}K}X'|,
\]
\[
|N\widehat{\Sigma}_{A_2}| = |X(I - P_{D'})X' + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}|,
\]
where $X(I - P_{D'})X' = S$. We know that $S$ and $XP_{D'(DD')^{-1}K}X'$ are Wishart distributed, but $P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}$ is not. Therefore the likelihood function will be manipulated similarly to the treatment of the parallelism hypothesis. Put $H = (C', S^{-1}(C')^\circ)$, which is a full rank matrix, and multiply both the numerator and the denominator in the likelihood ratio with $H'$ and $H$ from the left and the right, respectively:
\[
\lambda^{2/N} = \frac{|H'\widehat{\Sigma}_{A_2}H|}{|H'\widehat{\Sigma}_{H_2}H|}.
\]

We start with calculating $|H'\widehat{\Sigma}_{A_2}H|$:
\[
|H'\widehat{\Sigma}_{A_2}H| = \left|\begin{pmatrix} C \\ (C')^{\circ\prime}S^{-1} \end{pmatrix}(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})(C', S^{-1}(C')^\circ)\right| = \left|\begin{matrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{matrix}\right|,
\]
where
\[
V_{11} = C(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})C', \qquad
V_{12} = C(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})S^{-1}(C')^\circ,
\]
\[
V_{21} = (C')^{\circ\prime}S^{-1}(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})C', \qquad
V_{22} = (C')^{\circ\prime}S^{-1}(S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}})S^{-1}(C')^\circ.
\]
It follows that
\[
V_{21} = \underbrace{(C')^{\circ\prime}C'}_{=0} + (C')^{\circ\prime}S^{-1}[C'(CSC')^- CS]'XP_{D'(DD')^{-1}K}X'C'(CSC')^- CSC'
= (C')^{\circ\prime}\underbrace{S^{-1}S}_{=I}C'(CSC')^- CXP_{D'(DD')^{-1}K}X'C'(CSC')^- CSC' = 0,
\]
again since $(C')^{\circ\prime}C' = 0$. Notice that $V_{12} = V_{21}'$; therefore $V_{12} = 0$. Moreover,
\[
V_{11} = CSC' + \underbrace{CSC'(CSC')^- C}_{=C}XP_{D'(DD')^{-1}K}X'\underbrace{C'(CSC')^- CSC'}_{=C'} = CSC' + CXP_{D'(DD')^{-1}K}X'C',
\]
\[
V_{22} = (C')^{\circ\prime}S^{-1}SS^{-1}(C')^\circ + (C')^{\circ\prime}\underbrace{S^{-1}S}_{=I}C'(CSC')^- CXP_{D'(DD')^{-1}K}X'C'(CSC')^- CSS^{-1}(C')^\circ = (C')^{\circ\prime}S^{-1}(C')^\circ.
\]
Then the determinant can be written
\[
|H'\widehat{\Sigma}_{A_2}H| = \left|\begin{matrix} CSC' + CXP_{D'(DD')^{-1}K}X'C' & 0 \\ 0 & (C')^{\circ\prime}S^{-1}(C')^\circ \end{matrix}\right|
= \left|\begin{matrix} CX(I - P_{D'F^\circ})X'C' & 0 \\ 0 & (C')^{\circ\prime}S^{-1}(C')^\circ \end{matrix}\right|.
\]
Let us move on to the null hypothesis:
\[
|H'\widehat{\Sigma}_{H_2}H| = \left|\begin{pmatrix} C \\ (C')^{\circ\prime}S^{-1} \end{pmatrix}(S + XP_{D'(DD')^{-1}K}X')(C', S^{-1}(C')^\circ)\right|
= \left|\begin{matrix} CX(I - P_{D'F^\circ})X'C' & CX(I - P_{D'F^\circ})X'S^{-1}(C')^\circ \\ (C')^{\circ\prime}S^{-1}X(I - P_{D'F^\circ})X'C' & (C')^{\circ\prime}S^{-1}X(I - P_{D'F^\circ})X'S^{-1}(C')^\circ \end{matrix}\right|.
\]
Note that we used the relation $X(I - P_{D'F^\circ})X' = X(I - P_{D'})X' + XP_{D'(DD')^{-1}K}X' = S + XP_{D'(DD')^{-1}K}X'$. The determinant for the alternative hypothesis is straightforward; for the null hypothesis we use Theorem 2.10 for partitioned matrices. Hence,
\[
|H'\widehat{\Sigma}_{A_2}H| = |CX(I - P_{D'F^\circ})X'C'|\,|(C')^{\circ\prime}S^{-1}(C')^\circ|,
\]
\[
|H'\widehat{\Sigma}_{H_2}H| = |CX(I - P_{D'F^\circ})X'C'|\,|(C')^{\circ\prime}S^{-1}X(I - P_{D'F^\circ})X'S^{-1}(C')^\circ - (C')^{\circ\prime}S^{-1}X(I - P_{D'F^\circ})X'C'[CX(I - P_{D'F^\circ})X'C']^{-1}CX(I - P_{D'F^\circ})X'S^{-1}(C')^\circ|.
\]
When we take the ratio of these two quantities, the first factor, $|CX(I - P_{D'F^\circ})X'C'|$, cancels out. Put $S_1 = X(I - P_{D'F^\circ})X'$. Then the ratio becomes
\[
\frac{|H'\widehat{\Sigma}_{A_2}H|}{|H'\widehat{\Sigma}_{H_2}H|}
= \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|}{|(C')^{\circ\prime}S^{-1}[S_1 - S_1C'(CS_1C')^{-1}CS_1]S^{-1}(C')^\circ|}.
\]
By the special case of Theorem 2.3,
\[
S_1C'(CS_1C')^{-1}CS_1 = S_1 - (C')^\circ[(C')^{\circ\prime}S_1^{-1}(C')^\circ]^{-1}(C')^{\circ\prime}.
\]
Thus,
\[
\frac{|H'\widehat{\Sigma}_{A_2}H|}{|H'\widehat{\Sigma}_{H_2}H|}
= \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|}{|(C')^{\circ\prime}S^{-1}(C')^\circ((C')^{\circ\prime}S_1^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}(C')^\circ|}
= \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}}{|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1}}.
\]
Notice that, with $P_1 = P_{D'(DD')^{-1}K}$,
\[
S_1^{-1} = [X(I - P_{D'F^\circ})X']^{-1} = (S + (XP_1)(XP_1)')^{-1}
= S^{-1} - S^{-1}XP_1(P_1X'S^{-1}XP_1 + I)^{-1}P_1X'S^{-1}, \tag{12}
\]
since $P_1$ is idempotent and symmetric (by Theorem 2.11). If we replace $S_1^{-1}$ in $|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1}$ with (12),
\[
|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1}
= |(C')^{\circ\prime}S^{-1}(C')^\circ - (C')^{\circ\prime}S^{-1}XP_1(P_1X'S^{-1}XP_1 + I)^{-1}P_1X'S^{-1}(C')^\circ|^{-1}
\]
\[
= |(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}\,|I - P_1X'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XP_1(P_1X'S^{-1}XP_1 + I)^{-1}|^{-1}
\]
\[
= |(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}\,|P_1X'S^{-1}XP_1 + I|\,|I + P_1X'S^{-1}XP_1 - P_1X'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XP_1|^{-1}. \tag{13}
\]

For the last determinant in (13),
\[
I + P_1X'S^{-1}XP_1 - P_1X'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XP_1
= I + P_1X'[S^{-1} - S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}]XP_1. \tag{14}
\]
Using the relation given in Theorem 2.3,
\[
(14) = I + P_1X'[C'(CSC')^{-1}C]XP_1.
\]
Thus,
\[
|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1} = |(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}\,|P_1X'S^{-1}XP_1 + I|\,|I + P_1X'C'(CSC')^{-1}CXP_1|^{-1}.
\]
If we put this result back into the ratio,
\[
\frac{|H'\widehat{\Sigma}_{A_2}H|}{|H'\widehat{\Sigma}_{H_2}H|} = \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}}{|(C')^{\circ\prime}S_1^{-1}(C')^\circ|^{-1}}
= \frac{|I + P_1X'C'(CSC')^{-1}CXP_1|}{|I + P_1X'S^{-1}XP_1|}
= \frac{|I + X'C'(CSC')^{-1}CXP_1|}{|I + X'S^{-1}XP_1|}. \tag{15}
\]
Note that
\[
P_1 = P_{D'(DD')^{-1}K} = D'(DD')^{-1}K(K'(DD')^{-1}DD'(DD')^{-1}K)^- K'(DD')^{-1}D.
\]
Plug this into the ratio above and move $K'(DD')^{-1}D$ to the left:
\[
(15) = \frac{|I + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K(K'(DD')^{-1}DD'(DD')^{-1}K)^-|}{|I + K'(DD')^{-1}DX'S^{-1}XD'(DD')^{-1}K(K'(DD')^{-1}DD'(DD')^{-1}K)^-|}.
\]
Now we take $(K'(DD')^{-1}DD'(DD')^{-1}K)^-$ out of both the numerator and the denominator. Then
\[
(15) = \frac{|K'(DD')^{-1}DD'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K|}{|K'(DD')^{-1}DD'(DD')^{-1}K + K'(DD')^{-1}DX'S^{-1}XD'(DD')^{-1}K|},
\]
where $DD'(DD')^{-1} = I$, so that $K'(DD')^{-1}DD'(DD')^{-1}K = K'(DD')^{-1}K$. Moreover, we know that
\[
S^{-1} = S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1} + C'(CSC')^{-1}C,
\]
which implies
\[
(15) = \frac{|K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K|}{|K'(DD')^{-1}K + K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K|}
\]
\[
= |I + [K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K]^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}K|^{-1}.
\]
Put $Q = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K$ and use the rotation in Theorem 2.4:
\[
(15) = |I + (C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}|^{-1}
\]
\[
= \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}}{|((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} + ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}|}.
\]

Now we will find the distributions of the expressions in this ratio. Let us start with $((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}$. We multiply this expression with identity matrices from the left and the right, and use $(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime} = S - SC'(CSC')^{-1}CS$:
\[
\underbrace{((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}(C')^\circ}_{I}((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}\underbrace{(C')^{\circ\prime}(C')^\circ((C')^{\circ\prime}(C')^\circ)^{-1}}_{I}
= ((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}[S - SC'(CSC')^{-1}CS](C')^\circ((C')^{\circ\prime}(C')^\circ)^{-1}. \tag{16}
\]
Recall that $S = X(I - P_{D'})X'$. Then (16) becomes
\[
((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}X(I - P_{D'})(I - X'C'[CX(I - P_{D'})X'C']^{-1}CX(I - P_{D'}))X'(C')^\circ((C')^{\circ\prime}(C')^\circ)^{-1}. \tag{17}
\]
From Theorem 2.3, the two relations
\[
I = (C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1} + \Sigma C'(C\Sigma C')^{-1}C,
\]
\[
I = \Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime} + C'(C\Sigma C')^{-1}C\Sigma
\]
are obtained, which will then be used in (17):
\[
(17) = ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X(I - P_{D'})(I - X'C'[CX(I - P_{D'})X'C']^{-1}CX(I - P_{D'}))X'\Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}.
\]
We have achieved a structure of the form $\cdots X(\text{idempotent})X'\cdots$. Now the rank of this idempotent matrix is checked (see Appendix A, Result A2):
\[
r[(I - P_{D'})(I - X'C'[CX(I - P_{D'})X'C']^{-1}CX(I - P_{D'}))] = N - r(D) - p + 1. \tag{18}
\]
We also need to show that $(C')^{\circ\prime}\Sigma^{-1}X$ and $X'C'$ are independent, which is verified in Appendix A, Result A3.

Thus, we can conclude that the conditional distribution of $((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X(I - P_{D'})(I - X'C'[CX(I - P_{D'})X'C']^{-1}CX(I - P_{D'}))$, conditioned on $CX$, is normal with mean $0$ and dispersion matrix $((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}$. Then, by Theorem 2.7,
\[
((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, N - r(D) - p + 1),
\]

which is independent of $CX$. Now we will move on to the second expression in the ratio given by (15), which equals
\[
((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}. \tag{19}
\]
First focus on $((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}X$. We use the identity matrix $((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}(C')^\circ$ and the relations $(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1} = I - SC'(CSC')^{-1}C$ and $S = X(I - P_{D'})X'$. Then
\[
((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}X
= ((C')^{\circ\prime}(C')^\circ)^{-1}(C')^{\circ\prime}X[I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]. \tag{20}
\]
Similar to the calculations done for (17), the identity matrix
\[
I = (C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1} + \Sigma C'(C\Sigma C')^{-1}C
\]
will be used in (20):
\[
(20) = ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X[I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]. \tag{21}
\]
Moreover, $(C')^{\circ\prime}\Sigma^{-1}X$ and $X'C'$ are independently distributed. Thus, (21) is normally distributed given $CX$, and so is
\[
((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1/2}. \tag{22}
\]
From the definition of the Wishart distribution given by Definition 2.6, if $X$ is normally distributed with mean $0$ and dispersion $I \otimes \Sigma$, then $XX' \sim W(\Sigma, n)$. So we need to check the mean and the dispersion of (22). The mean is zero, and for the dispersion recall that
\[
D[AXB] = (B' \otimes A)\underbrace{D[X]}_{I \otimes \Sigma}(B \otimes A') = (B'B) \otimes (A\Sigma A').
\]
Using this formula, the dispersion of (22) for given $CX$ equals
\[
Q^{-1/2}K'(DD')^{-1}D[I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'})][I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]D'(DD')^{-1}KQ^{-1/2}
\]
\[
\otimes\; ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}\Sigma\Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}. \tag{23}
\]

The details of the calculation of (23) are given in Appendix A, Result A4. Based on this result, we can conclude that (22) is normally distributed with mean $0$ and dispersion $I \otimes ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}$, conditional on $CX$. Therefore, the square of the matrix in (22) is Wishart distributed. Notice that

\[
[I - (I - P_{D'})X'C'(CX(I - P_{D'})X'C')^{-1}CX]D'(DD')^{-1}KQ^{-1/2}\,Q^{-1/2}K'(DD')^{-1}D[I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'})] \tag{24}
\]
is idempotent. To show this, put $G = Q^{-1/2}K'(DD')^{-1}D[I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'})]$, so that (24) equals $G'G$ and
\[
G'\underbrace{GG'}_{=I}G = G'G.
\]
For the details of $GG' = I$, see Appendix A, Result A4. Thus, (24) is idempotent. We need to check the rank of this idempotent matrix to determine the degrees of freedom in the Wishart distribution (see Theorem 2.7):
\[
r((24)) = r(G'G) \overset{\text{Thm. 2.14(ii)}}{=} \mathrm{tr}(G'G) \overset{\text{Thm. 2.14(i)}}{=} \mathrm{tr}(GG') = \mathrm{tr}(I).
\]
Furthermore, $\mathrm{tr}(I)$ will be equal to the size of $GG'$. To be able to find the size of $GG'$, one needs to check $Q = K'(DD')^{-1}K + K'(DD')^{-1}DX'C'(CSC')^{-1}CXD'(DD')^{-1}K$. Say $K$ is a $q \times s$ matrix and $r(K) = s$. Then $Q$ is $s \times s$, so the size of $GG'$ is $s = r(K)$. As a conclusion,
\[
((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, r(K)).
\]

Thus, the distribution of (15) is given by $\frac{|U|}{|U + V|}$, where
\[
U \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, N - r(D) - p + 1), \qquad
V \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, r(K)).
\]
If we pre- and post-multiply $U$ and $V$ with $((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{1/2}$, and denote the new expressions by $\widetilde{U}$ and $\widetilde{V}$, respectively, then the ratio becomes $\frac{|\widetilde{U}|}{|\widetilde{U} + \widetilde{V}|}$, where
\[
\widetilde{U} \sim W_1(I, N - r(D) - p + 1) \quad \text{and} \quad \widetilde{V} \sim W_1(I, r(K)).
\]
Then
\[
\lambda^{2/N} = \frac{|\widetilde{U}|}{|\widetilde{U} + \widetilde{V}|} \sim \Lambda(1, N - r(D) - p + 1, r(K)),
\]
which completes the proof.

3.2.3 Flatness Hypothesis

Assuming that the profiles are parallel, we will test if they are flat or not:
\[
H_3|H_1: E(X) = MD, \quad CM = 0, \qquad
A_3|H_1: E(X) = MD, \quad CMF = 0, \tag{25}
\]
where $C$ and $F$ are defined in Section 3.1.

Theorem 3.3. The likelihood ratio statistic for the flatness hypothesis is given by
\[
\lambda^{2/N} = \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C' + CXP_{D'F^\circ}X'C'|}, \tag{26}
\]
where
\[
CXP_{D'F^\circ}X'C' \sim W_{p-1}(C\Sigma C', r(D'F^\circ)), \qquad
CSC' + CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', N - r(D) + r(K)).
\]
Then $\lambda^{2/N} \sim \Lambda(p-1, N - r(D) + r(K), r(D'F^\circ))$.

Proof. Equivalent expressions for the restrictions in both hypotheses can be written
\[
H_3: CM = 0 \Leftrightarrow M = (C')^\circ\theta, \qquad
A_3: CMF = 0 \Leftrightarrow M = (C')^\circ\theta_1 + C'\theta_2 F^{\circ\prime}.
\]
If we plug these solutions into the model given in (2), then
\[
H_3: X = (C')^\circ\theta D + E, \qquad
A_3: X = (C')^\circ\theta_1 D + C'\theta_2 F^{\circ\prime}D + E.
\]

We will first look at the null hypothesis. Under $H_3$, using $I = P_{D'} + (I - P_{D'})$,
\[
|(X - (C')^\circ\theta D)(X - (C')^\circ\theta D)'| = |(XP_{D'} - (C')^\circ\theta D)(\;)' + X(I - P_{D'})X'|. \tag{27}
\]
We cannot simply say that $(27) \geq |X(I - P_{D'})X'|$, with equality if and only if $XP_{D'} = (C')^\circ\theta D$, because $XP_{D'} = (C')^\circ\theta D$ is not necessarily a consistent equation. Recall the two conditions for consistency from Theorem 2.2:

(i) $C(P_{D'}X') \subseteq C(D')$ is satisfied;
(ii) $C(XP_{D'}) \subseteq C((C')^\circ)$, which is not necessarily true.

Thus, we need some further steps:
\[
(27) = |(XP_{D'} - (C')^\circ\theta D)(\;)' + S| = |S|\,|S^{-1}(XP_{D'} - (C')^\circ\theta D)(\;)' + I|.
\]

Let $G = XP_{D'} - (C')^\circ\theta D$. Using Theorem 2.3,
\[
(27) = |S|\,|I + G'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^-(C')^{\circ\prime}S^{-1}G + G'C'(CSC')^- CG| \geq |S|\,|I + G'C'(CSC')^- CG|, \tag{28}
\]
which is independent of $\theta$ since
\[
CG = C(XP_{D'} - (C')^\circ\theta D) = CXP_{D'} - \underbrace{C(C')^\circ}_{=0}\theta D = CXP_{D'}.
\]
Equality holds if and only if
\[
G'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^-(C')^{\circ\prime}S^{-1}G = 0,
\]
which is equivalent to
\[
G'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^- = 0.
\]
Thus, the lower bound which we were seeking in (28), and which equals $|S|\,|I + G'C'(CSC')^- CG| = |S|\,|I + P_{D'}X'C'(CSC')^- CXP_{D'}|$, has been obtained. The situation for the third hypothesis is similar to the level hypothesis, where we have $CMF = 0$ as the alternative hypothesis. We assume that the profiles are parallel and the test is conducted to see if they are flat or not. The restrictions for this test can be summarised with $CM = 0$. Due to the assumption that parallelism holds, the alternative hypothesis becomes $CMF = 0$. Then (9) from Section 3.2.1, where the likelihood for $CMF = 0$ has been derived, will be used for $|N\widehat{\Sigma}_{A_3}|$. As a result,
\[
|N\widehat{\Sigma}_{H_3}| = |S|\,|I + P_{D'}X'C'(CSC')^- CXP_{D'}|, \qquad
|N\widehat{\Sigma}_{A_3}| = |S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}|.
\]
In order to get a familiar structure for the ratio of these two quantities, we need to rewrite $|N\widehat{\Sigma}_{H_3}|$. Use the rotation given by Theorem 2.4, the idempotency $P_{D'}P_{D'} = P_{D'}$, and $I = SS^{-1}$:
\[
|N\widehat{\Sigma}_{H_3}| = |S|\,|I + P_{D'}X'C'(CSC')^- CSS^{-1}XP_{D'}|
\overset{\text{rotate}}{=} |S|\,|I + XP_{D'}X'\underbrace{C'(CSC')^- CS}_{=P_{C',S^{-1}}}S^{-1}|
\]
\[
\overset{P_{C',S^{-1}} = P^2_{C',S^{-1}},\ \text{rotate}}{=} |S|\,|I + P_{C',S^{-1}}S^{-1}XP_{D'}X'P_{C',S^{-1}}|
= |S|\,|I + C'(CSC')^- C\underbrace{SS^{-1}}_{=I}XP_{D'}X'P_{C',S^{-1}}|
\]
\[
= |S + SC'(CSC')^- CXP_{D'}X'P_{C',S^{-1}}| = |S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}|.
\]
Hence,
\[
|N\widehat{\Sigma}_{H_3}| = |S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}|, \qquad
|N\widehat{\Sigma}_{A_3}| = |S + P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}|.
\]


We already know that $XP_{D'}X'$ is Wishart distributed (see Theorem 2.7), but $P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}$ is not. Similarly, $XP_{D'(DD')^{-1}K}X'$ is Wishart distributed, but $P'_{C',S^{-1}}XP_{D'(DD')^{-1}K}X'P_{C',S^{-1}}$ is not. Let $H = (C', S^{-1}(C')^\circ)$ be of full rank:
\[
\lambda^{2/N} = \frac{|H'\widehat{\Sigma}_{A_3}H|}{|H'\widehat{\Sigma}_{H_3}H|}.
\]

We start with calculating $|H'\widehat{\Sigma}_{H_3}H|$:
\[
|H'\widehat{\Sigma}_{H_3}H| = \left|\begin{pmatrix} C \\ (C')^{\circ\prime}S^{-1} \end{pmatrix}(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})(C', S^{-1}(C')^\circ)\right| = \left|\begin{matrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{matrix}\right|,
\]
where
\[
V_{11} = C(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})C', \qquad
V_{12} = C(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})S^{-1}(C')^\circ,
\]
\[
V_{21} = (C')^{\circ\prime}S^{-1}(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})C', \qquad
V_{22} = (C')^{\circ\prime}S^{-1}(S + P'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}})S^{-1}(C')^\circ.
\]
Let us check $V_{12}$:
\[
V_{12} = CSS^{-1}(C')^\circ + CP'_{C',S^{-1}}XP_{D'}X'P_{C',S^{-1}}S^{-1}(C')^\circ
= \underbrace{C(C')^\circ}_{=0} + CSC'(CSC')^- CXP_{D'}X'\underbrace{C'(CSC')^- CSS^{-1}(C')^\circ}_{=0} = 0.
\]
Notice that $V_{21} = V_{12}' = 0$. The other elements become
\[
V_{11} = CSC' + CSC'(CSC')^- CXP_{D'}X'C'(CSC')^- CSC' = CSC' + CXP_{D'}X'C',
\]
\[
V_{22} = (C')^{\circ\prime}S^{-1}SS^{-1}(C')^\circ + (C')^{\circ\prime}S^{-1}SC'(CSC')^- CXP_{D'}X'C'(CSC')^- CSS^{-1}(C')^\circ = (C')^{\circ\prime}S^{-1}(C')^\circ.
\]
If the established relations are put together,
\[
|H'\widehat{\Sigma}_{H_3}H| = \left|\begin{matrix} CSC' + CXP_{D'}X'C' & 0 \\ 0 & (C')^{\circ\prime}S^{-1}(C')^\circ \end{matrix}\right|.
\]

The alternative hypothesis for the flatness test is the same as for the level test. Consequently, the corresponding likelihood will have the same form, so $H'\widehat{\Sigma}_{A_2}H = H'\widehat{\Sigma}_{A_3}H$. Notice that the matrix $H$ used during the level test is the same as the matrix $H$ introduced here. Then
\[
|H'\widehat{\Sigma}_{A_3}H| = |CSC' + CXP_{D'(DD')^{-1}K}X'C'|\,|(C')^{\circ\prime}S^{-1}(C')^\circ|.
\]

Thus, the ratio becomes
\[
\lambda^{2/N} = \frac{|H'\widehat{\Sigma}_{A_3}H|}{|H'\widehat{\Sigma}_{H_3}H|}
= \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|\,|(C')^{\circ\prime}S^{-1}(C')^\circ|}{|CSC' + CXP_{D'}X'C'|\,|(C')^{\circ\prime}S^{-1}(C')^\circ|}
= \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CXP_{D'}X'C'|},
\]
where $CSC'$, $CXP_{D'(DD')^{-1}K}X'C'$ and $CXP_{D'}X'C'$ are all Wishart distributed. However, we are trying to find a structure $\frac{|U|}{|U + V|}$, where $U$ and $V$ are independently Wishart distributed. Recall the space decomposition given in Equation (8) for the parallelism hypothesis. It implies
\[
I - P_{D'F^\circ} = I - P_{D'} + P_{D'(DD')^{-1}K}, \qquad
P_{D'} = P_{D'(DD')^{-1}K} + P_{D'F^\circ}.
\]
Then we can write the ratio as
\[
\lambda^{2/N} = \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CX(P_{D'(DD')^{-1}K} + P_{D'F^\circ})X'C'|}
= \frac{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C' + CXP_{D'F^\circ}X'C'|}.
\]

Since $X$ is normally distributed and $P_{D'F^\circ}$ is idempotent, by Theorem 2.7,
\[
XP_{D'F^\circ}X' \sim W_p(\Sigma, r(D'F^\circ)) \overset{\text{Thm. 2.8}}{\Rightarrow} CXP_{D'F^\circ}X'C' \sim W_{p-1}(C\Sigma C', r(D'F^\circ)).
\]
We already know that $CSC' \sim W_{p-1}(C\Sigma C', N - r(D))$, and by Theorem 2.6 the sum of two independently distributed Wishart matrices with the same scale matrix is again Wishart. Then
\[
CSC' + CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', N - r(D) + r(K)).
\]
Using the result given in Appendix A, Result A1,
\[
\lambda^{2/N} \sim \Lambda(p-1, N - r(D) + r(K), r(D'F^\circ)).
\]
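Analogously to the parallelism sketch after Section 3.2.1, the flatness statistic (26) can be evaluated numerically; again this is our illustration under the same assumptions (classical setting, NumPy/SciPy, our function names), reusing $D$, $C$, $F$ and $K = D(D'F^\circ)^\circ$.

import numpy as np
from scipy.linalg import null_space

def wilks_flatness(X, n_sizes):
    # computes λ^{2/N} of Theorem 3.3; D, C, F, K as in wilks_parallelism
    p, N = X.shape
    q = len(n_sizes)
    D = np.zeros((q, N)); i = 0
    for k, nk in enumerate(n_sizes):
        D[k, i:i + nk] = 1.0; i += nk
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
    F = np.eye(q, q - 1) - np.eye(q, q - 1, k=-1)
    PD = D.T @ np.linalg.solve(D @ D.T, D)
    S = X @ (np.eye(N) - PD) @ X.T
    K = D @ null_space((D.T @ null_space(F.T)).T)     # K = D (D'F°)°
    B = D.T @ np.linalg.solve(D @ D.T, K)
    PK = B @ np.linalg.pinv(B)                        # projector on C(D'(DD')^{-1}K)
    DFo = D.T @ null_space(F.T)                       # D'F°
    PDF = DFo @ np.linalg.pinv(DFo)                   # projector on C(D'F°)
    U = C @ (S + X @ PK @ X.T) @ C.T
    V = C @ X @ PDF @ X.T @ C.T
    return np.linalg.det(U) / np.linalg.det(U + V)    # λ^{2/N} in (26)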

4 High-dimensional setting

4.1 Background

The classical setting for data analysis usually consists of a large number of experimental units and a small number of variables. For estimability reasons, the number of data points, $n$, needs to be larger than the number of parameters, $p$. Asymptotic properties have been derived in the classical setting: theorems such as the Law of Large Numbers and the Central Limit Theorem focus on the case when $p$ is fixed and $n \to \infty$.


In recent years, due to the development of information technology and data storage, the direction of the relationship between $p$ and $n$ has started to change. We face more research questions where we have more parameters than data points:
\[
p > n \quad \text{or} \quad p \gg n. \tag{29}
\]
Here we can mention different types of asymptotics as $n \to \infty$:

(i) $\frac{p}{n} \to c$, where $c \in (a, b)$,
(ii) $\frac{p}{n} \to \infty$.

Classical multivariate approaches fail when the dimension of the repeated measurements, $p$, starts to exceed the number of observations, $n$. The sample covariance matrix becomes singular, and consequently the likelihood ratio tests are not well-defined. As a result, classical tests are not feasible in the high-dimensional setting, and we need to investigate and extend the current approaches within the high-dimensional framework. Our focus in this report will not be on any specific type of asymptotics mentioned above: $p$ and $n$ will be fixed, and they will satisfy the condition given by (29). Ledoit and Wolf (2002) derived hypothesis tests for the covariance matrix in a high-dimensional setting. Srivastava (2005) also developed tests for certain hypotheses on the covariance matrix in high dimensions. Srivastava and Fujikoshi (2006), Srivastava (2007) and Srivastava and Du (2008) are other examples in the multivariate area. Kollo, von Rosen and von Rosen (2011) focused on estimating the parameters describing the mean structure in the growth curve model. Testing for the mean matrix in a growth curve model in high dimensions was studied by Srivastava and Singull (2017) as well. Fujikoshi, Ulyanov and Shimizu (2010) focus on high-dimensional and large-sample approximations for multivariate statistics.

The focus in this report is on high-dimensional profile analysis. Onozawa, Nishiyama and Seo (2016) derived test statistics for profile analysis with unequal covariance matrices in high dimensions. Similarly, Harrar and Kong (2016) worked on this topic. Shutoh and Takahashi (2016) proposed new test statistics in profile analysis with high-dimensional data by using the Cauchy-Schwarz inequality. All these references study the asymptotic distributions of the test statistics: they introduce different high-dimensional asymptotic frameworks and derive the test statistics in profile analysis under these frameworks. Our approach will be different from the approaches mentioned above. As noted before, we will not focus on the asymptotic distributions of the test statistics. In this report, fixed $p$ and $n$ are of interest, and the method that is going to be used is introduced in the following section.

4.2 Dimension reduction using scores and spherical distributions

Läuter (1996, 2016) and Läuter, Glimm and Kropf (1996, 1998) proposed a new method for dealing with the problem that arises in high-dimensional settings. The tests they proposed are based on linear scores, which are obtained by using score coefficients that are determined from the data via sums of products matrices. These scores are basically linear combinations of the repeated measures, and the coefficients defining these combinations are called score coefficients or weights. With this approach, high-dimensional observations are compressed into low-dimensional scores, which are then used for the analysis instead of the original data. This approach can be useful in many situations, because we often do not have knowledge of the effect of each single variable, or one may want to investigate the joint effect of several variables.

Let us give the mathematical representation of the theory. Suppose
\[
x = (x_i) \sim N_p(\mu, \Sigma),
\]
and $n$ individual $p$-dimensional vectors form the $p \times n$ matrix $X$ which satisfies
\[
X = (x_{ij}) \sim N_{p \times n}(\mu 1_n', \Sigma, I_n).
\]
Consider a single score
\[
z' = (z_1, z_2, \cdots, z_n) = (d_1, d_2, \cdots, d_p)X = d'X,
\]
where $d$ is the vector of weights and the $z_j$, $j = 1, \ldots, n$, are the individual scores. The rule for choosing the vector $d$ of coefficients is that it has to be a unique function of $XX'$, which is the $p \times p$ matrix of the sums of products. Moreover, the condition $d'X \neq 0$ with probability 1 needs to be satisfied. The total sums of products matrix $XX'$ corresponds to the hypothesis $\mu = 0$; consequently, depending on the hypothesis, the structure of the function can change. We will try to illustrate the idea with two primary theorems presented in Läuter, Glimm and Kropf (1996).

Theorem 4.1. (Läuter et al., 1996) Assume that $X$ is a $p \times n$ matrix consisting of $n$ $p$-dimensional observations ($p \geq 1$, $n \geq 2$) that follows the normal distribution $X \sim N_{p \times n}(0, \Sigma, I_n)$. Define a $p$-dimensional vector of weights $d$ which is a function of $XX'$, and assume $d'X \neq 0$ with probability 1. Then
\[
t = \frac{\sqrt{n}\,\bar{z}}{s_z} \tag{30}
\]
has the exact $t$ distribution with $n - 1$ degrees of freedom, where
\[
z' = (z_j)' = d'X, \qquad \bar{z} = \frac{1}{n}z'1_n, \qquad s_z^2 = \frac{1}{n-1}(z'z - n\bar{z}^2).
\]
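As an illustration of Theorem 4.1 (our sketch, not from the original text; NumPy and SciPy are assumed, function names are ours), one admissible choice of weights is $d_j = (XX')_{jj}^{-1/2}$, the reciprocal square roots of the diagonal of $XX'$, which is a unique function of $XX'$ alone:

import numpy as np
from scipy import stats

def score_t_test(X):
    # X: p x n observation matrix; tests mu = 0 via a single linear score
    p, n = X.shape
    W = X @ X.T                       # total sums-of-products matrix
    d = 1.0 / np.sqrt(np.diag(W))     # weights: a function of X X' only
    z = d @ X                         # individual scores z_1, ..., z_n
    zbar = z.mean()
    s2 = (z @ z - n * zbar ** 2) / (n - 1)
    t = np.sqrt(n) * zbar / np.sqrt(s2)
    return t, 2 * stats.t.sf(abs(t), n - 1)   # exact t_{n-1} under mu = 0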

Theorem 4.2. (Läuter et al., 1996) Assume that $H \sim W_p(\Sigma, m)$ and $G \sim W_p(\Sigma, f)$, and that they are independently distributed. Define a $p$-dimensional vector of weights $d$ which is a function of $H + G$, and assume $d'(H + G)d \neq 0$ with probability 1. Then
\[
F = \frac{f}{m}\,\frac{d'Hd}{d'Gd}
\]
follows an $F$-distribution with $m$ and $f$ degrees of freedom.

The idea behind these theorems is based on the theory of spherical distributions, which has been treated extensively in the book by Fang and Zhang (1990). Elliptically contoured distributions can be considered as a generalization of the class of Gaussian distributions, which has been the centre of multivariate theory. Normality is assumed for many testing problems, but in practice this is often not true. Thus, there has been an effort to extend the class of normal distributions to a wider class which still keeps the basic properties of the normal distribution.

We know that if $z \sim N_n(0, I_n)$, the statistic $t$ given in (30) has a $t$-distribution with $n - 1$ degrees of freedom, so we need to show a connection between $z$ and the standard normal distribution. Since the normal distribution is in the class of spherical distributions, if one can show that $z$ is spherically distributed, then this connection will be provided. We also need to show that the test statistics' distributions remain the same when we use spherically distributed random vectors. These ideas are given by a corollary and a theorem, among others, by Fang and Zhang (1990).

Corollary 4.1. (Fang and Zhang, 1990) An $n \times 1$ random vector $x$ is spherically distributed if and only if, for every $n \times n$ orthogonal matrix $\Gamma$, $x \overset{d}{=} \Gamma x$.

Theorem 4.3. (Fang and Zhang, 1990) A statistic $t(x)$'s distribution remains the same whenever $x \sim S_n^+(\phi)$ if $t(\alpha x) = t(x)$ for each $\alpha > 0$ and each $x \sim S_n^+(\cdot)$, where $S_n(\phi)$ denotes the spherical distribution with parameter $\phi$, and $\phi(\cdot)$ is a function of a scalar variable called the characteristic generator of the spherical distribution. If $x \sim S_n(\phi)$ and $P(x = 0) = 0$, this is denoted by $x \sim S_n^+(\phi)$.
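To see how Theorem 4.3 applies to the score statistic (our added remark): the statistic in (30) is scale-invariant, since for any $\alpha > 0$
\[
t(\alpha z) = \frac{\sqrt{n}\,\alpha\bar{z}}{\alpha s_z} = \frac{\sqrt{n}\,\bar{z}}{s_z} = t(z),
\]
so its distribution is the same for every $z \sim S_n^+(\phi)$, in particular the $t_{n-1}$ distribution obtained under $z \sim N_n(0, I_n)$.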

4.3 The derivation of the tests in the high-dimensional setting

We have derived the tests for the three hypotheses in the classical setting, where $N > p$. Now we will focus on the high-dimensional setting, where $p > N$ or $p \gg N$. First we will construct the scores and then derive the likelihood ratio tests based on these scores. One should notice that we use capital $N$ for the total sample size when there are several groups, whose sizes may differ from each other; this refers to the parts related to profile analysis. In our case, there are $q$ groups with group sizes $n_k$, $k = 1, \ldots, q$, and therefore $N = \sum_{k=1}^q n_k$. In Sections 4.1 and 4.2, where the general introduction is given, the total sample size is denoted by $n$, since there is only one group in the analysis or theory.

4.3.1 Parallelism Hypothesis

Recall (3) and (4) from Section 3.2.1,
\[
H_1: E(X) = MD, \quad CMF = 0, \qquad A_1: E(X) = MD, \quad CMF \neq 0,
\]
and
\[
LR = \frac{|\widehat{\Sigma}_{A_1}|}{|\widehat{\Sigma}_{H_1}|} = \frac{|CSC'|}{|CSC' + CXP_{D'(DD')^{-1}K}X'C'|},
\]
where
\[
CSC' \sim W_{p-1}(C\Sigma C', N - r(D)), \qquad CXP_{D'(DD')^{-1}K}X'C' \sim W_{p-1}(C\Sigma C', r(K)). \tag{31}
\]


In the beginning it was assumed that $X \sim N_{p,N}(MD, \Sigma, I_N)$. If we multiply $X$ with $C$, then $CX \sim N_{(p-1),N}(CMD, C\Sigma C', I_N)$. As we can see through the derivation of the statistics in (31), $X$ appears together with $C$. Let $Y = CX$. Then
\[
CSC' = CX(I - P_{D'})X'C' = Y(I - P_{D'})Y' \sim W_{p-1}(C\Sigma C', N - r(D)),
\]
\[
CXP_{D'(DD')^{-1}K}X'C' = YP_{D'(DD')^{-1}K}Y' \sim W_{p-1}(C\Sigma C', r(K)).
\]
Instead of applying the vector $d$ to $X$, we will apply it to $Y$. Notice that $d$ is a $(p-1) \times 1$ vector. When we multiply $Y$ with $d'$ from the left (and $Y'$ with $d$ from the right), $Y$ will be reduced to a vector; we call this new vector the score vector and denote it by $z$, that is, $z' = d'Y$:
\[
\lambda^{2/N} = \frac{d'Y(I - P_{D'})Y'd}{d'Y(I - P_{D'})Y'd + d'YP_{D'(DD')^{-1}K}Y'd}
= \frac{z'(I - P_{D'})z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z}. \tag{32}
\]

Now we are going to find the distribution of this ratio.

Theorem 4.4. The ratio given in (32) follows the Wilks' lambda distribution with parameters $1$, $N - r(D)$ and $r(K)$, denoted by $\Lambda(1, N - r(D), r(K))$, which is equivalent to $B\!\left(\frac{N - r(D)}{2}, \frac{r(K)}{2}\right)$, where $B(\cdot, \cdot)$ denotes the Beta distribution.

Proof. First, we should note that since $d$ is a function of $YY'$,
\[
d'Y(I - P_{D'})Y'd \nsim W_1(d'C\Sigma C'd, N - r(D)), \qquad
d'YP_{D'(DD')^{-1}K}Y'd \nsim W_1(d'C\Sigma C'd, r(K)),
\]
which means that we cannot find the distribution of the ratio directly. This is where we need the theory of spherical distributions. We begin by showing that the scores are spherically distributed. To show this, first we need to check whether $Y$ is spherically distributed. Let $Y_\Gamma = Y\Gamma$, where $\Gamma$ is orthogonal. Then
\[
D(Y_\Gamma) = (\Gamma'\Gamma) \otimes \Sigma = I \otimes \Sigma, \quad \text{but} \quad E(Y_\Gamma) \neq E(Y).
\]
Thus, $Y$ is not spherically distributed. Therefore, we need to adapt the test statistic, without changing its overall value, in order to achieve sphericity.

Recall that the model under the null hypothesis is
\[
X = (C')^\circ\theta_1 D + C'\theta_2 F^{\circ\prime}D + E,
\]
and accordingly the mean under the null hypothesis equals
\[
E(X) = (C')^\circ\theta_1 D + C'\theta_2 F^{\circ\prime}D.
\]
If the mean is subtracted from $X$, the two quadratic forms in the ratio given by (32) become as follows:

(i)
\[
d'C[X - E(X)](I - P_{D'})[X - E(X)]'C'd
= d'C[X - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D](I - P_{D'})[X - (C')^\circ\theta_1 D - C'\theta_2 F^{\circ\prime}D]'C'd
\]
\[
= d'CX(I - P_{D'})X'C'd = d'Y(I - P_{D'})Y'd;
\]

(ii)
\[
d'C[X - E(X)]P_{D'(DD')^{-1}K}[X - E(X)]'C'd
= d'[CX - \underbrace{C(C')^\circ}_{=0}\theta_1 D - CC'\theta_2 F^{\circ\prime}D]P_{D'(DD')^{-1}K}[CX - \underbrace{C(C')^\circ}_{=0}\theta_1 D - CC'\theta_2 F^{\circ\prime}D]'d.
\]
Let us look at the middle part:
\[
CC'\theta_2 F^{\circ\prime}DP_{D'(DD')^{-1}K}
= CC'\theta_2 F^{\circ\prime}\underbrace{DD'(DD')^{-1}}_{=I}K[(D'(DD')^{-1}K)'(D'(DD')^{-1}K)]^-(D'(DD')^{-1}K)'. \tag{33}
\]
Recall that $C(K) = C(D) \cap C(F)$ and that $K$ can be written as $K = D(D'F^\circ)^\circ$. Then
\[
(33) = CC'\theta_2\underbrace{F^{\circ\prime}D(D'F^\circ)^\circ}_{=0}[(D'(DD')^{-1}K)'(D'(DD')^{-1}K)]^-(D'(DD')^{-1}K)' = 0.
\]
Then
\[
\text{(ii)} = d'CXP_{D'(DD')^{-1}K}X'C'd = d'YP_{D'(DD')^{-1}K}Y'd.
\]

This means that we can subtract the mean from $X$ and the expression of the likelihood ratio remains the same, which means that the distribution will remain the same:
\[
\lambda^{2/N} = \frac{d'Y(I - P_{D'})Y'd}{d'Y(I - P_{D'})Y'd + d'YP_{D'(DD')^{-1}K}Y'd}
= \frac{d'[Y - CE(X)](I - P_{D'})[Y - CE(X)]'d}{d'[Y - CE(X)](I - P_{D'})[Y - CE(X)]'d + d'[Y - CE(X)]P_{D'(DD')^{-1}K}[Y - CE(X)]'d}.
\]
If we denote this new variable, $Y - CE(X)$, by $\widetilde{Y}$, then
\[
\lambda^{2/N} = \frac{d'\widetilde{Y}(I - P_{D'})\widetilde{Y}'d}{d'\widetilde{Y}(I - P_{D'})\widetilde{Y}'d + d'\widetilde{Y}P_{D'(DD')^{-1}K}\widetilde{Y}'d}.
\]

The reason why we needed the adaptation of $Y$ was that it was not spherically distributed. Now, without changing the statistic overall, we have achieved a new variable $\widetilde{Y}$ which is spherically distributed. To show this,
\[
E(\widetilde{Y}) = E(Y) - CE(X) = 0, \qquad D(\widetilde{Y}) = D(Y).
\]
We have shown that $\widetilde{Y}$ is distributed with mean $0$ and with the same covariance as $Y$. Define $\widetilde{Y}_\Gamma = \widetilde{Y}\Gamma$, where $\Gamma$ is an $N \times N$ orthogonal matrix; the subscript $\Gamma$ means multiplication with $\Gamma$ from the right-hand side. Then
\[
E(\widetilde{Y}_\Gamma) = E(\widetilde{Y}\Gamma) = 0, \qquad D(\widetilde{Y}_\Gamma) = (\Gamma'\Gamma) \otimes \Sigma = I \otimes \Sigma.
\]

This proves that $\widetilde{Y}_\Gamma$ and $\widetilde{Y}$ follow the same $(p-1) \times N$ normal distribution. Now we can show that the scores, $z' = d'\widetilde{Y}$, are spherically distributed. First, define $z_\Gamma' = d_\Gamma'\widetilde{Y}_\Gamma$. Notice that $d_\Gamma = d$, since they are derived from the same matrix:
\[
\widetilde{Y}_\Gamma\widetilde{Y}_\Gamma' = \widetilde{Y}\Gamma(\widetilde{Y}\Gamma)' = \widetilde{Y}\Gamma\Gamma'\widetilde{Y}' = \widetilde{Y}\widetilde{Y}'.
\]
In Section 4.2, it was stated that $d$ needs to be a unique function of the total sums of products matrix; the derivation above is based on this information. Then

\[
z_\Gamma' = d'\widetilde{Y}\Gamma = z'\Gamma. \tag{34}
\]
From this result, we can say that the vectors $z'$ and $z_\Gamma'$ follow the same distribution, which is spherical. It is well known that every spherically distributed random vector $x$ has a stochastic representation $x \overset{d}{=} Ru^{(n)}$, where $u^{(n)}$ is a uniformly distributed random vector and $R \geq 0$ is independent of $u^{(n)}$. Furthermore, if $x \sim S_n(\phi)$, then $x \overset{d}{=} Rw$, where $w \sim N_n(0, I_n)$ is independent of $R \geq 0$. Lastly, recall Theorem 2.5.8 from Fang and Zhang (1990), which is also given in Section 4.2 as Theorem 4.3. Now we can work on the ratio $\lambda^{2/N}$ and implement these results:
\[
f(z) = \lambda^{2/N} = \frac{z'(I - P_{D'})z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z}
\overset{d}{=} \frac{(Rw)'(I - P_{D'})(Rw)}{(Rw)'(I - P_{D'})(Rw) + (Rw)'P_{D'(DD')^{-1}K}(Rw)}
= \frac{R^2 w'(I - P_{D'})w}{R^2\left[w'(I - P_{D'})w + w'P_{D'(DD')^{-1}K}w\right]},
\]
which does not depend on $R$. The condition of the theorem from Fang and Zhang is satisfied. Thus, we can conclude that
\[
\lambda^{2/N} = \frac{z'(I - P_{D'})z}{z'(I - P_{D'})z + z'P_{D'(DD')^{-1}K}z}
\sim \Lambda(1, N - r(D), r(K)) \equiv B\!\left(\frac{N - r(D)}{2}, \frac{r(K)}{2}\right). \tag{35}
\]

Wilks' lambda distribution can be written as a product of independently distributed Beta variables (Mardia, Kent and Bibby, 1979; Läuter, 2016), that is, $\Lambda(p, m, n) \sim \prod_{i=1}^p B_i$, where $B_1, B_2, \ldots, B_p$ are $p$ independent random variables which follow the Beta distribution, $B_i \sim B\!\left(\frac{m - i + 1}{2}, \frac{n}{2}\right)$, $i = 1, \ldots, p$. Notice that we have one-dimensional Wishart distributed expressions in the ratio given by (32), which means $p = 1$. Then the overall ratio reduces to the single Beta distribution given at the end of the proof in (35).
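Putting the pieces of this section together, a minimal numerical sketch follows (ours, not from the original report; NumPy/SciPy assumed, function names ours, and the reciprocal-root-diagonal weights are just one admissible choice of $d$ as a function of $YY'$):

import numpy as np
from scipy import stats
from scipy.linalg import null_space

def hd_parallelism_score_test(X, n_sizes):
    # X: p x N with possibly p >> N; D, C, F, K as in Section 3.1
    p, N = X.shape
    q = len(n_sizes)
    D = np.zeros((q, N)); i = 0
    for k, nk in enumerate(n_sizes):
        D[k, i:i + nk] = 1.0; i += nk
    C = np.eye(p - 1, p) - np.eye(p - 1, p, k=1)
    F = np.eye(q, q - 1) - np.eye(q, q - 1, k=-1)
    Y = C @ X                                    # (p-1) x N
    d = 1.0 / np.sqrt(np.diag(Y @ Y.T))          # weights: function of Y Y' only
    z = d @ Y                                    # score vector of length N
    PD = D.T @ np.linalg.solve(D @ D.T, D)
    K = D @ null_space((D.T @ null_space(F.T)).T)         # K = D (D'F°)°
    B = D.T @ np.linalg.solve(D @ D.T, K)
    PK = B @ np.linalg.pinv(B)                   # projector on C(D'(DD')^{-1}K)
    num = z @ (np.eye(N) - PD) @ z
    lam = num / (num + z @ PK @ z)               # λ^{2/N} in (32)
    pval = stats.beta.cdf(lam, (N - q) / 2, (q - 1) / 2)  # r(D)=q, r(K)=q-1
    return lam, pval

Under $H_1$, by Theorem 4.4, the returned $\lambda^{2/N}$ is $B\!\left(\frac{N-q}{2}, \frac{q-1}{2}\right)$-distributed, so small values (the left tail) are evidence against parallelism.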


4.3.2 Level Hypothesis

First, let us recall how the null and the alternative hypotheses looked in the classical setting:
\[
H_2|H_1: E(X) = MD, \quad MF = 0, \qquad
A_2|H_1: E(X) = MD, \quad CMF = 0,
\]
and the likelihood ratio was
\[
LR = \frac{|(C')^{\circ\prime}S^{-1}(C')^\circ|^{-1}}{|((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} + ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}XD'(DD')^{-1}KQ^{-1}K'(DD')^{-1}DX'S^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}|}. \tag{36}
\]

For this hypothesis, it is clear that the expressions in the likelihood ratio are already one-dimensional. The contrast matrix was given explicitly in Section 3.1; it is a $(p-1) \times p$ matrix, and
\[
C: (p-1) \times p \;\Rightarrow\; C': p \times (p-1) \;\Rightarrow\; (C')^\circ: p \times 1 \;\Rightarrow\; (C')^{\circ\prime}: 1 \times p.
\]
However, dimension reduction is needed due to the degrees of freedom in the Wishart distribution. Recall
\[
((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1} \sim W_1(((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}, N - r(D) - p + 1).
\]

When $p > N$, the degrees of freedom would become negative, which cannot take place; moreover, $S^{-1}$ does not exist in this case. Thus, we need to take care of these issues. One can notice that we have a more complicated situation here, because the expressions in the ratio are more complex than for the parallelism and the flatness hypotheses. To solve the problem, we expand $((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}$ and $((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}S^{-1}X$. The expansions will be investigated in two parts, (i) and (ii), respectively:

(i) We start with $((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}$:
\[
((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}
= ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}(C')^\circ((C')^{\circ\prime}S^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}
\]
\[
= ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}[S - SC'(CSC')^{-1}CS]\Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}. \tag{37}
\]

Recall that $S = X(I - P_{D'})X'$. Then (37) becomes
\[
((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X[(I - P_{D'})(I - X'C'(CX(I - P_{D'})X'C')^{-1}CX(I - P_{D'}))]X'\Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}.
\]
Apply the $d$ vector to $CX$, i.e., $d'CX$:
\[
((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}(C')^{\circ\prime}\Sigma^{-1}X[(I - P_{D'})(I - X'C'd(d'CX(I - P_{D'})X'C'd)^{-1}d'CX(I - P_{D'}))]X'\Sigma^{-1}(C')^\circ((C')^{\circ\prime}\Sigma^{-1}(C')^\circ)^{-1}. \tag{38}
\]
