Small Area Estimation under a Multivariate Linear Model for Repeated Measures Data
Innocent Ngaruye1,2, Joseph Nzabanita1,2, Dietrich von Rosen1,3
and Martin Singull1
1Department of Mathematics, Linköping University,
SE-581 83 Linköping, Sweden
E-mail: {innocent.ngaruye, joseph.nzabanita, martin.singull}@liu.se
2Department of Mathematics,
College of Science and Technology, University of Rwanda, P.O. Box 3900 Kigali, Rwanda
E-mail: {i.ngaruye, j.nzabanita}@ur.ac.rw
3Department of Energy and Technology,
Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden
E-mail: dietrich.von.rosen@slu.se
Abstract
In this paper, we consider small area estimation under a multivariate linear regression model for repeated measures data. The aim is to obtain a model which borrows strength both across small areas and over time by simultaneously incorporating area effects and time correlation. The model accounts for repeated surveys, grouped individuals and random effects variations. Estimation of the model parameters is discussed within a restricted maximum likelihood based approach. Prediction of the random effects, and prediction of the small area means across time points and per group units for all time points, are derived. The results are supported by a simulation study.
Keywords: Maximum likelihood; Multivariate linear model; Prediction of random effects; Repeated measures data; Small Area Estimation
1 Introduction
Population surveys are carried out via sampling designs and data collection on individual units, with the intention of making statistical inferences about a larger population of which these units are members. One commonly used design is simple random sampling, which assigns equal selection probabilities to all elements in the population. Such surveys are usually designed to provide efficient estimates of parameters of interest for large populations. In most cases, surveys are not originally designed to produce estimates for small domains, and hence these domains are poorly represented in the sample. Thus, the surveys often provide very little information on the small area level, and direct survey estimates for a target small area are not reliable due to the small sample size connected to this area.
In recent years, Small Area Estimation (SAE) methods have received much attention due to their usefulness in both the public and private sectors, and their demand has greatly increased worldwide. Several approaches and new developments in small area estimation have been investigated by different authors, for example Pfeffermann (2002, 2013), Rao (2003), and Chambers and Clark (2012). The demand for small area statistics has increased due to their use in formulating social and economic policies, allocation of government funds, regional planning, business decision making, etc. SAE has been used in a wide range of applications such as unemployment rates, poverty mapping, disease mapping and demography. One may refer to Ghosh and Rao (1994) and Rao (2003) for some examples and case studies in SAE.
Repeated measures data, which refer to response outcomes taken on the same experimental unit at different time points, have been widely used in research. The analysis of repeated measures data allows us to study trends over time. The demand for small area statistics covers both cross-sectional and repeated measures data. For instance, small area estimates for repeated measures data may be used by public policy makers for different purposes, such as fund allocation or new educational and health programs, and in some cases they might be interested in a given group of the population.
It has been shown that the multivariate approach for model-based methods in small area estimation may achieve substantial improvement over the usual univariate approach (Datta et al., 1999). Some studies dealing with SAE problems for longitudinal surveys have been discussed by various authors, for example Consortium (2004); Nissinen (2009); Singh and Sisodia (2011). The latter developed direct, synthetic and composite estimators for small area means at a given time point when the population units contain non-overlapping groups.
In contrast to these estimators, we propose a model which borrows strength across both small areas and over time by simultaneously incorporating area effects and time correlation. The model allows for finding the small area means at each time point, per group units and, in particular, the pattern of changes or mean growth profiles over time. This model accounts for repeated surveys, groups of individuals and random effects variations.
The paper is organized as follows. After the Introduction, the second section is devoted to the model formulation and the notation used, and Section 3 deals with estimation of the model parameters. Sections 4 and 5 discuss the prediction of random effects and the prediction of target small area means, respectively. We end with a simulation study in Section 6 and general concluding remarks in Section 7.
2 Model formulation
We consider repeated measurements of the variable of interest $y$ taken at $p$ time points $t_1, \dots, t_p$ from a finite population $U$ of size $N$, which is partitioned into $m$ disjoint subpopulations $U_1, \dots, U_m$, called small areas, of sizes $N_i$, $i = 1, \dots, m$, such that $\sum_{i=1}^m N_i = N$. We also assume that in every area there are $k$ different groups of units, where $N_{ig}$ denotes the size of group $g$ within the $i$-th area, such that $\sum_{g=1}^k N_{ig} = N_i$ and $\sum_{i=1}^m \sum_{g=1}^k N_{ig} = N$.
Suppose that a sample $s = (s_1, \dots, s_m)$ is selected from the population using simple random sampling without replacement, where $s_i$ is the sample of size $n_i$ observed from area $i$. The sample remains the same over time.

Let $\mathbf{y}_{ij}$ be the $p$-vector of measurements on the $j$-th unit in the $i$-th area, that is, $\mathbf{y}_{ij} = (y_{ij1}, \dots, y_{ijp})'$, $j = 1, \dots, N_i$, $i = 1, \dots, m$.
We assume the mean growth of the $j$-th unit in area $i$ for each group to be a polynomial in time of degree $q - 1$. Furthermore, we suppose that we have auxiliary data $\mathbf{x}_{ij}$ of $r$ concomitant variables (covariates) whose values are known for all units in all $m$ small areas. These auxiliary variables are included in the model to strengthen the limited sample information from the areas. The values $\mathbf{x}_{ij}$ can be values from a past survey of the same area and/or values of other variables related to the variable of interest. Moreover, they can be register-based information or data measured on characteristics of interest in other similar areas.
The relationship between $\mathbf{y}_{ij}$ and $\mathbf{x}_{ij}$ in each small area cannot always be considered the same as the relationship between these variables in the population as a whole (Chambers and Clark, 2012). Therefore, to better account for the between-area variability, we add area-specific random effects which vary over time. Thus, for each of the $k$ groups, the unit-level regression model for the $j$-th unit from small area $i$ at time $t$ can be expressed as
$$y_{ijt} = \beta_0 + \beta_1 t + \cdots + \beta_{q-1} t^{q-1} + \boldsymbol{\gamma}'\mathbf{x}_{ij} + u_{it} + e_{ijt},$$
$$j = 1, \dots, N_i; \quad i = 1, \dots, m; \quad t = t_1, \dots, t_p,$$
where the $e_{ijt}$ are random sampling errors depending on the sampling scheme, assumed to be i.i.d. normal with mean zero and known sampling variance $\sigma_e^2$, independent of $u_{it}$. The vector $\boldsymbol{\gamma}$ contains the fixed regression coefficients of the auxiliary variables. The parameters $\beta_0, \dots, \beta_{q-1}$ are unknown and assumed to be the same in all areas, under the assumption that there is no area effect on the polynomial growth in time.
For all time points, the model can be written in matrix form as
$$\mathbf{y}_{ij} = \mathbf{A}\boldsymbol{\beta} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{x}_{ij} + \mathbf{u}_i + \mathbf{e}_{ij},$$
where
$$\mathbf{A} = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{q-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{q-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_p & t_p^2 & \cdots & t_p^{q-1} \end{pmatrix}, \qquad \mathbf{1}_p = (1, 1, \dots, 1)' : p \times 1.$$
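As a quick aside (not from the paper), the within-individuals design matrix is simply a Vandermonde matrix in the time points. A minimal Python/NumPy sketch, where the helper name `within_design` and the example time points are our own choices:

```python
import numpy as np

def within_design(time_points, q):
    """Build the p x q within-individuals design matrix whose columns
    are 1, t, t^2, ..., t^(q-1) evaluated at the given time points."""
    t = np.asarray(time_points, dtype=float)
    return np.vander(t, N=q, increasing=True)

# Time points 1,...,4 with a linear growth profile (q = 2),
# matching the design used later in the simulation section.
A = within_design([1, 2, 3, 4], q=2)
print(A)
```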
The vector $\mathbf{u}_i$ is assumed to be multivariate normally distributed with zero mean and unknown covariance matrix $\boldsymbol{\Sigma}_u$. Collecting the vectors $\mathbf{y}_{ij}$ for all units in small area $i$ coming from group $g$ gives
$$\mathbf{Y}_{ig} = \mathbf{A}\mathbf{B}_g + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X}_{ig} + \mathbf{u}_i\mathbf{z}_{ig}' + \mathbf{E}_{ig}, \quad g = 1, \dots, k, \tag{1}$$
where $\mathbf{Y}_{ig} = (\mathbf{y}_{i1}, \dots, \mathbf{y}_{iN_{ig}})$, $\mathbf{B}_g = (\boldsymbol{\beta}_g, \dots, \boldsymbol{\beta}_g) : q \times N_{ig}$, $\mathbf{X}_{ig} = (\mathbf{x}_{i1}, \dots, \mathbf{x}_{iN_{ig}})$, $\mathbf{z}_{ig} = \frac{1}{\sqrt{N_{ig}}}\mathbf{1}_{N_{ig}}$ and $\mathbf{E}_{ig} = (\mathbf{e}_{i1}, \dots, \mathbf{e}_{iN_{ig}})$.
The model presented in (1) holds for each group in area $i$. Since we are interested in every group, we need to include a known design matrix $\mathbf{C}_i : k \times N_i$ of group-separation indicators when collecting all $k$ groups in the $i$-th area; the groups are supposed to be the same in all areas.

The model at small area level for the $k$ groups is then written as
$$\mathbf{Y}_i = \mathbf{A}\mathbf{B}\mathbf{C}_i + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X}_i + \mathbf{u}_i\mathbf{z}_i' + \mathbf{E}_i,$$
where $\mathbf{Y}_i = (\mathbf{Y}_{i1}, \dots, \mathbf{Y}_{ik})$, $\mathbf{B} = (\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_k) : q \times k$, $\mathbf{X}_i = (\mathbf{X}_{i1}, \dots, \mathbf{X}_{ik})$, $\mathbf{z}_i = (\mathbf{z}_{i1}', \dots, \mathbf{z}_{ik}')'$, $\mathbf{E}_i = (\mathbf{E}_{i1}, \dots, \mathbf{E}_{ik})$, and
$$\mathbf{C}_i = \begin{pmatrix} \mathbf{1}_{N_{i1}}' & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{1}_{N_{ik}}' \end{pmatrix},$$
where $\mathbf{E}_i \sim N_{p,N_i}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_{N_i})$ stands for the matrix normal distribution with mean zero, positive definite covariance matrix between rows $\boldsymbol{\Sigma}_e = \sigma_e^2\mathbf{I}_p$, and independent columns. Here $\mathbf{Y}_i$ is a $p \times N_i$ data matrix; $\mathbf{A}$ is a $p \times q$, $q \le p$, known within-individuals design matrix for fixed effects; $\mathbf{B}$ is a $q \times k$ unknown parameter matrix; $\mathbf{C}_i$, with $\mathrm{rank}(\mathbf{C}_i) + p \le N_i$, is a $k \times N_i$ known between-individuals design matrix for fixed effects; and $\mathbf{X}_i$ is an $r \times N_i$ known matrix containing the values of the covariates.
Combining all small areas for $N = \sum_{i=1}^m N_i$ units, we get the model
$$\underset{p\times N}{\mathbf{Y}} = \underset{p\times q}{\mathbf{A}}\;\underset{q\times k}{\mathbf{B}}\;\underset{k\times mk}{\mathbf{H}}\;\underset{mk\times N}{\mathbf{C}} + \underset{p\times r}{\mathbf{1}_p\boldsymbol{\gamma}'}\;\underset{r\times N}{\mathbf{X}} + \underset{p\times m}{\mathbf{U}}\;\underset{m\times N}{\mathbf{Z}} + \underset{p\times N}{\mathbf{E}}, \tag{2}$$
where $\mathbf{Y} = [\mathbf{Y}_1, \dots, \mathbf{Y}_m]$, $\mathbf{H} = [\mathbf{I}_k : \cdots : \mathbf{I}_k]$, $\mathbf{X} = [\mathbf{X}_1, \dots, \mathbf{X}_m]$, $\mathbf{U} = [\mathbf{u}_1, \dots, \mathbf{u}_m]$, $\mathbf{E} = [\mathbf{E}_1, \dots, \mathbf{E}_m]$,
$$\mathbf{C} = \begin{pmatrix} \mathbf{C}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{C}_m \end{pmatrix} \quad\text{and}\quad \mathbf{Z} = \begin{pmatrix} \mathbf{z}_1' & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{z}_m' \end{pmatrix}.$$
It is assumed that $\mathbf{E} \sim N_{p,N}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_N)$, $\mathbf{U} \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u, \mathbf{I}_m)$, $p \le m$. Furthermore, $\mathrm{vec}(\mathbf{Y}) \sim N_{pN}\big(\mathrm{vec}(\mathbf{ABHC} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X}), \boldsymbol{\Sigma}\big)$ for $\boldsymbol{\Sigma} = \mathbf{Z}'\mathbf{Z} \otimes \boldsymbol{\Sigma}_u + \mathbf{I}_N \otimes \boldsymbol{\Sigma}_e$ and
$$\boldsymbol{\Sigma}_u = \begin{pmatrix} \sigma_{u1}^2 & \sigma_{u12} & \cdots & \sigma_{u1p} \\ \sigma_{u12} & \sigma_{u2}^2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{u1p} & \cdots & \cdots & \sigma_{up}^2 \end{pmatrix} : p \times p,$$
where the covariance between the random effects at two distinct time points $t$ and $s$ is $\mathrm{Cov}(u_t, u_s) = \sigma_{uts} = \rho_{t,s}\sigma_{ut}\sigma_{us}$, with $\rho_{t,s}$ standing for the correlation.
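To make the covariance structure concrete, here is a small numerical sketch (toy dimensions and a toy Σu of our own choosing) that builds Σ = Z'Z ⊗ Σu + I_N ⊗ Σe for a Z with orthonormal rows, and checks that ZZ' = I_m and that Σ is symmetric positive definite:

```python
import numpy as np

# Toy dimensions (illustrative only): m = 2 areas with N_i = 2 units each,
# p = 3 time points, so N = 4.
p, m, Ni = 3, 2, 2
N = m * Ni

# Z : m x N, block diagonal with rows z_i' = (1/sqrt(N_i)) 1'_{N_i}
Z = np.zeros((m, N))
for i in range(m):
    Z[i, i * Ni:(i + 1) * Ni] = 1.0 / np.sqrt(Ni)

Sigma_u = np.array([[2.0, 0.5, 0.3],
                    [0.5, 1.5, 0.4],
                    [0.3, 0.4, 1.0]])      # p x p random-effects covariance (toy)
Sigma_e = 0.16 * np.eye(p)                 # sampling covariance sigma_e^2 I_p

# Covariance of vec(Y): Sigma = Z'Z (x) Sigma_u + I_N (x) Sigma_e
Sigma = np.kron(Z.T @ Z, Sigma_u) + np.kron(np.eye(N), Sigma_e)

assert np.allclose(Z @ Z.T, np.eye(m))      # ZZ' = I_m
assert np.allclose(Sigma, Sigma.T)          # symmetric
assert np.linalg.eigvalsh(Sigma).min() > 0  # positive definite
```

The minimum eigenvalue is bounded below by sigma_e^2, which is why the sum is positive definite even though Z'Z ⊗ Σu alone is only positive semidefinite.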
3 Estimation for the mean and covariance
The model defined in (2) is a sum of two matrix normal distributions; it is normally distributed but not, in general, matrix normally distributed. Therefore, in order to use the maximum likelihood estimation approach, we make some transformations to achieve a matrix normal distribution, which is easier to handle. Let $\mathbf{A}^o$ denote any matrix spanning $\mathcal{C}(\mathbf{A})^\perp$, i.e., $\mathcal{C}(\mathbf{A}^o) = \mathcal{C}(\mathbf{A})^\perp$, where $\mathcal{C}(\mathbf{A})$ denotes the column vector space generated by the columns of the matrix $\mathbf{A}$. Moreover, $\mathbf{A}^-$ denotes an arbitrary generalized inverse of the matrix $\mathbf{A}$, i.e., a matrix such that $\mathbf{A}\mathbf{A}^-\mathbf{A} = \mathbf{A}$. We also write $\mathbf{P}_A = \mathbf{A}(\mathbf{A}'\mathbf{A})^-\mathbf{A}'$, $\mathbf{Q}_A = \mathbf{I} - \mathbf{P}_A$, $\mathbf{P}_{A,B} = \mathbf{A}(\mathbf{A}'\mathbf{B}\mathbf{A})^-\mathbf{A}'\mathbf{B}$ and $\mathbf{Q}_{A,B} = \mathbf{I} - \mathbf{P}_{A,B}$. The matrices $\mathbf{P}_A$ and $\mathbf{Q}_A$ are projectors and, if $\mathbf{B}$ is positive definite, $\mathbf{P}_{A,B}$ and $\mathbf{Q}_{A,B}$ are also projectors.
First of all, we observe that the matrices $\mathbf{Z}$, $\mathbf{H}$ and $\mathbf{C}$ in model (2) are of full row rank and that the matrix $\mathbf{A}$ is of full column rank. Moreover, $\mathcal{C}(\mathbf{Z}') \subseteq \mathcal{C}(\mathbf{C}')$ and $\mathbf{Z}\mathbf{Z}' = \mathbf{I}_m$.
Both relations imply that $(\mathbf{CC}')^{-1/2}\mathbf{CZ}'\mathbf{ZC}'(\mathbf{CC}')^{-1/2}$ is an idempotent matrix and thus can be diagonalized by an orthogonal matrix, say $\boldsymbol{\Gamma}$, with eigenvalues 1 and 0, i.e.,
$$(\mathbf{CC}')^{-1/2}\mathbf{CZ}'\mathbf{ZC}'(\mathbf{CC}')^{-1/2} = \boldsymbol{\Gamma}\mathbf{D}\boldsymbol{\Gamma}' = \boldsymbol{\Gamma}\begin{pmatrix}\mathbf{I}_m & \mathbf{0} \\ \mathbf{0} & \mathbf{0}\end{pmatrix}\boldsymbol{\Gamma}'.$$
Since each column of $\boldsymbol{\Gamma}$ is an eigenvector associated with the corresponding eigenvalue in $\mathbf{D}$, we can partition $\boldsymbol{\Gamma} = [\boldsymbol{\Gamma}_1 : \boldsymbol{\Gamma}_2]$ such that $\boldsymbol{\Gamma}_1$ corresponds to the block $\mathbf{I}_m$ and $\boldsymbol{\Gamma}_2$ to the block $\mathbf{0}$, with $\boldsymbol{\Gamma}_1 : mk \times m$ and $\boldsymbol{\Gamma}_2 : mk \times (mk - m)$. It follows that $\boldsymbol{\Gamma}_1'\boldsymbol{\Gamma}_1 = \mathbf{I}_m$ and $\boldsymbol{\Gamma}_2'\boldsymbol{\Gamma}_2 = \mathbf{I}_{mk-m}$.
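The diagonalization argument can be checked numerically. The following sketch (toy area and group sizes of our own; $z_i$ is taken as the normalized area indicator so that $ZZ' = I_m$ and $\mathcal{C}(Z') \subseteq \mathcal{C}(C')$ hold) verifies that $(CC')^{-1/2}CZ'ZC'(CC')^{-1/2}$ is idempotent with exactly $m$ unit eigenvalues:

```python
import numpy as np

# Toy sizes (illustrative): m = 2 areas, k = 2 groups,
# group sizes N_ig chosen arbitrarily.
N_ig = [[2, 3], [4, 2]]                    # N_ig[i][g]
m, k = 2, 2
Ni = [sum(row) for row in N_ig]
N = sum(Ni)

# C : mk x N, block diagonal group-indicator matrix
C = np.zeros((m * k, N))
col = 0
for i in range(m):
    for g in range(k):
        C[i * k + g, col:col + N_ig[i][g]] = 1.0
        col += N_ig[i][g]

# Z : m x N with rows z_i' = (1/sqrt(N_i)) 1'_{N_i}, so that Z Z' = I_m
Z = np.zeros((m, N))
col = 0
for i in range(m):
    Z[i, col:col + Ni[i]] = 1.0 / np.sqrt(Ni[i])
    col += Ni[i]

CC = C @ C.T                                # diagonal, entries N_ig
CC_inv_half = np.diag(1.0 / np.sqrt(np.diag(CC)))

M = CC_inv_half @ C @ Z.T @ Z @ C.T @ CC_inv_half
assert np.allclose(M @ M, M)                # idempotent
eigvals = np.sort(np.linalg.eigvalsh(M))
assert np.allclose(eigvals[-m:], 1.0)       # m eigenvalues equal to 1
assert np.allclose(eigvals[:m * k - m], 0.0)  # the remaining mk - m equal 0
```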
Choose $\mathbf{C}'^o$ to be a matrix whose columns are unit-length eigenvectors corresponding to the non-zero eigenvalues of $\mathbf{I} - \mathbf{C}'(\mathbf{CC}')^-\mathbf{C}$. Then $(\mathbf{C}'^o)'\mathbf{C}'^o = \mathbf{I}_{N-mk}$ and we can make a one-to-one transformation of (2) to get $[\mathbf{V} : \mathbf{W}] = \mathbf{Y}[\mathbf{C}'(\mathbf{CC}')^{-1/2} : \mathbf{C}'^o]$, where
$$\mathbf{V} = \mathbf{Y}\mathbf{C}'(\mathbf{CC}')^{-1/2} = \mathbf{ABH}(\mathbf{CC}')^{1/2} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2} + (\mathbf{UZ} + \mathbf{E})\mathbf{C}'(\mathbf{CC}')^{-1/2}, \tag{3}$$
$$\mathbf{W} = \mathbf{Y}\mathbf{C}'^o = \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o + \mathbf{EC}'^o, \tag{4}$$
since $\mathbf{CC}'^o = \mathbf{0}$ and $\mathcal{C}(\mathbf{Z}') \subseteq \mathcal{C}(\mathbf{C}')$ implies that $\mathbf{ZC}'^o = \mathbf{0}$. Make a further transformation of (3) so that $[\mathbf{V}_1 : \mathbf{V}_2] = \mathbf{Y}\mathbf{C}'(\mathbf{CC}')^{-1/2}[\boldsymbol{\Gamma}_1 : \boldsymbol{\Gamma}_2]$, where
$$\mathbf{V}_1 = \mathbf{Y}\mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 = \mathbf{ABH}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_1 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 + (\mathbf{UZ} + \mathbf{E})\mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1, \tag{5}$$
$$\mathbf{V}_2 = \mathbf{Y}\mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 = \mathbf{ABH}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 + \mathbf{EC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2. \tag{6}$$

Theorem 3.1 Let $\mathbf{V}_1$, $\mathbf{V}_2$ and $\mathbf{W}$ be given by (5), (6) and (4), respectively. Then
$$\mathbf{V}_1 \sim N_{p,m}(\mathbf{M}_1, \boldsymbol{\Sigma}_u + \boldsymbol{\Sigma}_e, \mathbf{I}_m), \quad \mathbf{V}_2 \sim N_{p,mk-m}(\mathbf{M}_2, \boldsymbol{\Sigma}_e, \mathbf{I}_{mk-m}), \quad \mathbf{W} \sim N_{p,N-mk}(\mathbf{M}_3, \boldsymbol{\Sigma}_e, \mathbf{I}_{N-mk}),$$
where
$$\mathbf{M}_1 = \mathbf{ABH}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_1 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1,$$
$$\mathbf{M}_2 = \mathbf{ABH}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2,$$
$$\mathbf{M}_3 = \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o.$$
Moreover, the matrices $\mathbf{V}_1$, $\mathbf{V}_2$ and $\mathbf{W}$ are independently distributed.
Proof 3.1 The independence of $\mathbf{V}_1$, $\mathbf{V}_2$ and $\mathbf{W}$ follows from the fact that the matrices $\boldsymbol{\Gamma}_1$, $\boldsymbol{\Gamma}_2$ and $\mathbf{C}'^o$ are all of full rank and pairwise orthogonal. We observe that $\boldsymbol{\Gamma}'\boldsymbol{\Gamma} = \mathbf{I}_{mk}$, and
$$\boldsymbol{\Gamma}'\boldsymbol{\Gamma}_1 = \begin{pmatrix}\mathbf{I}_m \\ \mathbf{0}\end{pmatrix}, \quad \boldsymbol{\Gamma}'\boldsymbol{\Gamma}_2 = \begin{pmatrix}\mathbf{0} \\ \mathbf{I}_{mk-m}\end{pmatrix}, \quad \boldsymbol{\Gamma}_1'\boldsymbol{\Gamma} = (\mathbf{I}_m \; \mathbf{0}), \quad \boldsymbol{\Gamma}_2'\boldsymbol{\Gamma} = (\mathbf{0} \; \mathbf{I}_{mk-m}).$$
From $\mathbf{E} \sim N_{p,N}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_N)$ and $\mathbf{U} \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u, \mathbf{I}_m)$, it follows that
$$\mathbf{UZC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 \sim N_{p,m}\big(\mathbf{0}, \boldsymbol{\Sigma}_u, \boldsymbol{\Gamma}_1'(\mathbf{CC}')^{-1/2}\mathbf{CZ}'\mathbf{ZC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1\big),$$
$$\mathbf{EC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 \sim N_{p,m}\big(\mathbf{0}, \boldsymbol{\Sigma}_e, \boldsymbol{\Gamma}_1'(\mathbf{CC}')^{-1/2}\mathbf{CC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1\big),$$
$$\mathbf{EC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 \sim N_{p,mk-m}\big(\mathbf{0}, \boldsymbol{\Sigma}_e, \boldsymbol{\Gamma}_2'(\mathbf{CC}')^{-1/2}\mathbf{CC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2\big),$$
$$\mathbf{EC}'^o \sim N_{p,N-mk}\big(\mathbf{0}, \boldsymbol{\Sigma}_e, (\mathbf{C}'^o)'\mathbf{C}'^o\big).$$
Hence,
$$(\mathbf{UZ} + \mathbf{E})\mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u + \boldsymbol{\Sigma}_e, \mathbf{I}_m),$$
$$\mathbf{EC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 \sim N_{p,mk-m}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_{mk-m}),$$
$$\mathbf{EC}'^o \sim N_{p,N-mk}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_{N-mk}),$$
since
$$\boldsymbol{\Gamma}_1'(\mathbf{CC}')^{-1/2}\mathbf{CZ}'\mathbf{ZC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 = \boldsymbol{\Gamma}_1'\boldsymbol{\Gamma}\mathbf{D}\boldsymbol{\Gamma}'\boldsymbol{\Gamma}_1 = \mathbf{I}_m,$$
$$\boldsymbol{\Gamma}_1'(\mathbf{CC}')^{-1/2}\mathbf{CC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 = \mathbf{I}_m, \quad \boldsymbol{\Gamma}_2'(\mathbf{CC}')^{-1/2}\mathbf{CC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 = \mathbf{I}_{mk-m}, \quad (\mathbf{C}'^o)'\mathbf{C}'^o = \mathbf{I}_{N-mk},$$
which completes the proof of the theorem.
Now consider the two components $\mathbf{W}$ and $\mathbf{V}_2$, and recall that the covariance matrix $\boldsymbol{\Sigma}_e = \sigma_e^2\mathbf{I}_p$ is known. Put
$$\mathbf{K}_1 = \mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_1, \quad \mathbf{K}_2 = \mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2, \quad \mathbf{R}_1 = \mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1, \quad \mathbf{R}_2 = \mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2.$$
Then,
$$\mathbf{W} = \mathbf{Y}\mathbf{C}'^o = \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o + \mathbf{EC}'^o, \qquad \mathbf{V}_2 = \mathbf{Y}\mathbf{R}_2 = \mathbf{ABK}_2 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2 + \mathbf{ER}_2.$$
The corresponding log-likelihood of the joint density function is given by
$$l(\boldsymbol{\gamma}, \mathbf{B}) = l_W(\boldsymbol{\gamma}) + l_{V_2}(\boldsymbol{\gamma}, \mathbf{B}),$$
where $l_W(\boldsymbol{\gamma})$ and $l_{V_2}(\boldsymbol{\gamma}, \mathbf{B})$ denote the log-likelihood functions for $\mathbf{W}$ and $\mathbf{V}_2$, respectively. We use the notation $(\mathbf{Y})()'$ instead of $(\mathbf{Y})(\mathbf{Y})'$ in order to shorten matrix expressions. Therefore,
$$\begin{aligned} l(\boldsymbol{\gamma}, \mathbf{B}) ={}& -\frac{p(N-mk)}{2}\log(2\pi) - \frac{N-mk}{2}\log|\boldsymbol{\Sigma}_e| - \frac{p(mk-m)}{2}\log(2\pi) - \frac{mk-m}{2}\log|\boldsymbol{\Sigma}_e| \\ &- \tfrac{1}{2}\mathrm{tr}\big\{\boldsymbol{\Sigma}_e^{-1}(\mathbf{W} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o)()'\big\} - \tfrac{1}{2}\mathrm{tr}\big\{\boldsymbol{\Sigma}_e^{-1}(\mathbf{V}_2 - \mathbf{ABK}_2 - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2)()'\big\} \\ ={}& c_o - \tfrac{1}{2}\mathrm{tr}\big\{\boldsymbol{\Sigma}_e^{-1}(\mathbf{W} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o)()' + \boldsymbol{\Sigma}_e^{-1}(\mathbf{V}_2 - \mathbf{ABK}_2 - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2)()'\big\}, \end{aligned}$$
where $c_o$ is a constant not depending on the parameters and $\mathrm{tr}$ stands for the trace.
Taking the first and second derivatives with respect to $\boldsymbol{\gamma}$ and $\mathbf{B}$, the likelihood equations can be expressed as
$$\big(\mathbf{XC}'^o(\mathbf{C}'^o)' + \mathbf{XR}_2\mathbf{R}_2'\big)\mathbf{Y}'\mathbf{1}_p - p\big(\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{X}'\big)\boldsymbol{\gamma} - \mathbf{XR}_2\mathbf{K}_2'\mathbf{B}'\mathbf{A}'\mathbf{1}_p = \mathbf{0}, \tag{7}$$
$$\mathbf{A}'\big(\mathbf{Y}\mathbf{R}_2 - \mathbf{ABK}_2 - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2\big)\mathbf{K}_2' = \mathbf{0}. \tag{8}$$
Lemma 3.1 The likelihood equation (7) admits a unique solution for the parameter vector $\boldsymbol{\gamma}$ if the matrix $\mathbf{X}$ is of full rank, while the likelihood equation (8) admits a non-unique solution for the parameter matrix $\mathbf{B}$.
The proof of the lemma can be found in the Appendix. The second equation (8) gives
$$\mathbf{B} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- - (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- + \mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)',$$
for an arbitrary matrix $\mathbf{T}_1$. Plugging this expression for $\mathbf{B}$ into equation (7) yields
$$\big(\mathbf{XC}'^o(\mathbf{C}'^o)' + \mathbf{XR}_2\mathbf{R}_2'\big)\mathbf{Y}'\mathbf{1}_p - p\big(\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{X}'\big)\boldsymbol{\gamma} - \mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{Y}'\mathbf{1}_p + p\,\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{X}'\boldsymbol{\gamma} = \mathbf{0}.$$
Then, the restricted maximum likelihood estimators (RMLEs) of $\boldsymbol{\gamma}$ and $\mathbf{B}$ are obtained as
$$\hat{\boldsymbol{\gamma}} = \big(p\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + p\mathbf{XR}_2\mathbf{R}_2'\mathbf{X}' - p\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{X}'\big)^-\big(\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{Y}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{Y}' - \mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{Y}'\big)\mathbf{1}_p$$
$$\qquad + \big(p\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + p\mathbf{XR}_2\mathbf{R}_2'\mathbf{X}' - p\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{X}'\big)^o\mathbf{t}_2,$$
$$\hat{\mathbf{B}} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- - (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- + \mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)',$$
for an arbitrary vector $\mathbf{t}_2$ and an arbitrary matrix $\mathbf{T}_1$. Put
$$\mathbf{P} = \mathbf{XC}'^o(\mathbf{C}'^o)' + \mathbf{XR}_2\mathbf{R}_2' - \mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'.$$
Then
$$\hat{\boldsymbol{\gamma}} = \frac{1}{p}(\mathbf{PX}')^-\mathbf{PY}'\mathbf{1}_p + (\mathbf{PX}')^o\mathbf{t}_2, \tag{9}$$
$$\hat{\mathbf{B}} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- - \frac{1}{p}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-$$
$$\qquad - (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- + \mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)'. \tag{10}$$
The estimators $\hat{\boldsymbol{\gamma}}$ of $\boldsymbol{\gamma}$ and $\hat{\mathbf{B}}$ of $\mathbf{B}$ depend on the arbitrary vector $\mathbf{t}_2$ and the arbitrary matrix $\mathbf{T}_1$, respectively. However, there is also information about $\mathbf{B}$ and $\boldsymbol{\gamma}$ in $\mathbf{V}_1$, which we now utilize in order to find expressions for $\mathbf{T}_1$ and $\mathbf{t}_2$. Recall that
$$\mathbf{V}_1 = \mathbf{Y}\mathbf{R}_1 = \mathbf{ABK}_1 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_1 + (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1, \qquad (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1 \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u + \boldsymbol{\Sigma}_e, \mathbf{I}_m).$$
Inserting the expressions for $\boldsymbol{\gamma}$ and $\mathbf{B}$ into the model for $\mathbf{V}_1$ above yields
$$\begin{aligned} \mathbf{V}_1 ={}& \mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 - \frac{1}{p}\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 \\ &- \mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 + \mathbf{A}\mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)'\mathbf{K}_1 \\ &+ \frac{1}{p}\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_1 + \mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_1 + (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1. \end{aligned}$$
Set
$$\mathbf{V}_3 = \mathbf{Y}\mathbf{R}_1 - \mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 + \frac{1}{p}\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 - \frac{1}{p}\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_1,$$
with model equation
$$\mathbf{V}_3 = \mathbf{A}\mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)'\mathbf{K}_1 + \mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_1 - \mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 + (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1,$$
or equivalently,
$$\mathbf{V}_3 = \mathbf{A}\mathbf{T}_1\tilde{\mathbf{C}}_1 + \mathbf{1}_p\mathbf{t}_2'\tilde{\mathbf{C}}_2 + \tilde{\mathbf{E}}, \tag{11}$$
where
$$\tilde{\mathbf{C}}_1 = \big((\mathbf{K}_2\mathbf{K}_2')^o\big)'\mathbf{K}_1, \qquad \tilde{\mathbf{C}}_2 = \big((\mathbf{PX}')^o\big)'\mathbf{XR}_1 - \big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1,$$
and $\tilde{\mathbf{E}} = (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1$, with $\tilde{\mathbf{E}} \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u + \boldsymbol{\Sigma}_e, \mathbf{I}_m)$.
Hence, the model obtained in (11) is an extended multivariate linear growth curve model with the nested subspace condition $\mathcal{C}(\mathbf{1}_p) \subseteq \mathcal{C}(\mathbf{A})$ on the within-individuals design matrices. One may refer to Filipiak and von Rosen (2012) for more details about the extended multivariate linear growth curve model and the corresponding maximum likelihood estimators.
Lemma 3.2 Let $\mathbf{V}_3$ be defined as in (11). The RMLEs of the parameters $\mathbf{T}_1$, $\mathbf{t}_2$ and $\boldsymbol{\Sigma}_u$ are given by
$$\hat{\mathbf{t}}_2 = (\mathbf{1}_p'\mathbf{S}_1^{-1}\mathbf{1}_p)^{-1}\big(\tilde{\mathbf{C}}_2\mathbf{Q}_{\tilde{\mathbf{C}}_1'}\tilde{\mathbf{C}}_2'\big)^-\tilde{\mathbf{C}}_2\mathbf{Q}_{\tilde{\mathbf{C}}_1'}\mathbf{V}_3'\mathbf{S}_1^{-1}\mathbf{1}_p + \big((\tilde{\mathbf{C}}_2\mathbf{Q}_{\tilde{\mathbf{C}}_1'})^o\big)'\mathbf{t}_{21}'\mathbf{1}_p, \tag{12}$$
$$\hat{\mathbf{T}}_1 = (\mathbf{A}'\mathbf{S}_2^{-1}\mathbf{A})^{-1}\mathbf{A}'\mathbf{S}_2^{-1}(\mathbf{V}_3 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)\tilde{\mathbf{C}}_1'(\tilde{\mathbf{C}}_1\tilde{\mathbf{C}}_1')^- + \mathbf{A}'\mathbf{T}_{11}\tilde{\mathbf{C}}_1^o, \tag{13}$$
$$\hat{\boldsymbol{\Sigma}}_u = \frac{1}{m}(\mathbf{V}_3 - \mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)(\mathbf{V}_3 - \mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)' - \boldsymbol{\Sigma}_e, \tag{14}$$
for an arbitrary vector $\mathbf{t}_{21}$ and an arbitrary matrix $\mathbf{T}_{11}$, where
$$\mathbf{S}_1 = \mathbf{V}_3\mathbf{Q}_{(\tilde{\mathbf{C}}_1' : \tilde{\mathbf{C}}_2')}\mathbf{V}_3', \qquad \mathbf{S}_2 = \mathbf{S}_1 + \mathbf{Q}_{1_p, \mathbf{S}_1^{-1}}\mathbf{V}_3\mathbf{P}_{\mathbf{Q}_{\tilde{\mathbf{C}}_1'}\tilde{\mathbf{C}}_2'}\mathbf{V}_3'\mathbf{Q}_{1_p, \mathbf{S}_1^{-1}}'.$$
The proof follows similarly to that of Theorem 1 in Filipiak and von Rosen (2012), in the particular case of two profiles.

Following Filipiak and von Rosen (2012), from model (11), $\hat{\mathbf{T}}_1$ and $\hat{\mathbf{t}}_2$ are unique if and only if the matrices $\tilde{\mathbf{C}}_1$ and $\tilde{\mathbf{C}}_2$ are of full rank, respectively, and $\mathcal{C}(\tilde{\mathbf{C}}_1) \cap \mathcal{C}(\tilde{\mathbf{C}}_2) = \{\mathbf{0}\}$. However, $\mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1$ and $\mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2$ are always unique, and hence $\hat{\boldsymbol{\Sigma}}_u$ given in (14) is unique.

With the above calculations, we are now ready to state a theorem which summarizes the main results about estimation in the formulated model.
Theorem 3.2 Consider the model given by (2). Then, the RMLEs of $\boldsymbol{\gamma}$, $\mathbf{B}$ and $\boldsymbol{\Sigma}_u$ can be expressed as
$$\hat{\boldsymbol{\gamma}} = \frac{1}{p}(\mathbf{PX}')^-\mathbf{PY}'\mathbf{1}_p + (\mathbf{PX}')^o\hat{\mathbf{t}}_2,$$
$$\hat{\mathbf{B}} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- - \frac{1}{p}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-$$
$$\qquad - (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\hat{\mathbf{t}}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- + \hat{\mathbf{T}}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)',$$
$$\hat{\boldsymbol{\Sigma}}_u = \frac{1}{m}(\mathbf{V}_3 - \mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)(\mathbf{V}_3 - \mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)' - \boldsymbol{\Sigma}_e,$$
where $\hat{\mathbf{t}}_2$ and $\hat{\mathbf{T}}_1$ are given by (12) and (13), respectively.

The theorem follows from relations (9), (10) and Lemma 3.2.
4 Prediction of random effects
The small area means in a given area $i$ are defined as the conditional means given the realized area effects (Battese et al., 1988). Thus, estimates of the random area effects are needed for the estimation of small area means. As pointed out by Pinheiro and Bates (2000), although technically the random effects are not model parameters, in some ways they do behave like parameters, and since they are unobservable we want to predict their values. The idea is to predict an unobservable random variable from some realized values. Robinson (1991) discusses the need for prediction of random effects and summarizes the theory of the best linear unbiased predictor (BLUP). Searle et al. (2009) present the theory of prediction of random variables. As stated by these authors, minimum variance is used for estimation of a fixed parameter, while minimum mean squared error is used for estimation of the realized value of a random variable. In this section, we use the approach developed by Henderson (1973) for the prediction of random effects, which consists of maximizing the joint density of the observable and the unobservable random variables.
Consider the model in (2) given by
$$\mathbf{Y} = \mathbf{ABHC} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X} + \mathbf{UZ} + \mathbf{E},$$
and maximize the joint density $f(\mathbf{Y}, \mathbf{U})$ with respect to $\mathbf{U}$, assuming the covariance matrices $\boldsymbol{\Sigma}_u$ and $\boldsymbol{\Sigma}_e$ to be known. We have
$$f(\mathbf{Y}, \mathbf{U}) = f(\mathbf{U})f(\mathbf{Y}|\mathbf{U}) = \lambda\exp\Big\{-\tfrac{1}{2}\mathrm{tr}\{\mathbf{U}'\boldsymbol{\Sigma}_u^{-1}\mathbf{U}\}\Big\}\exp\Big\{-\tfrac{1}{2}\mathrm{tr}\{\boldsymbol{\Sigma}_e^{-1}(\mathbf{Y} - \mathbf{ABHC} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X} - \mathbf{UZ})()'\}\Big\},$$
where $\lambda$ is a constant. Then, the estimating equation for $\mathbf{U}$ equals
$$\boldsymbol{\Sigma}_e^{-1}(\mathbf{Y} - \mathbf{ABHC} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X} - \mathbf{UZ})\mathbf{Z}' - \boldsymbol{\Sigma}_u^{-1}\mathbf{U} = \mathbf{0},$$
which is equivalent to
$$\boldsymbol{\Sigma}_e\boldsymbol{\Sigma}_u^{-1}\mathbf{U} + \mathbf{UZZ}' = (\mathbf{Y} - \mathbf{ABHC} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X})\mathbf{Z}',$$
and, since $\mathbf{ZZ}' = \mathbf{I}_m$, it follows that
$$(\boldsymbol{\Sigma}_e\boldsymbol{\Sigma}_u^{-1} + \mathbf{I}_p)\mathbf{U} = (\mathbf{Y} - \mathbf{ABHC} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X})\mathbf{Z}'.$$
Thus, we get the following theorem about the prediction of the random effects.

Theorem 4.1 Consider the model defined by (2). Then, the prediction of the random effects is given by
$$\hat{\mathbf{U}} = \big(\boldsymbol{\Sigma}_e\hat{\boldsymbol{\Sigma}}_u^{-1} + \mathbf{I}_p\big)^{-1}\big(\mathbf{Y} - \mathbf{A}\hat{\mathbf{B}}\mathbf{HC} - \mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{X}\big)\mathbf{Z}',$$
where $\hat{\boldsymbol{\gamma}}$, $\hat{\mathbf{B}}$ and $\hat{\boldsymbol{\Sigma}}_u$ are given in Theorem 3.2.
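As a numerical sanity check (toy dimensions of our own; the residual matrix Y − ABHC − 1pγ'X is replaced by a random stand-in), the predictor of Theorem 4.1 indeed solves Henderson's estimating equation when ZZ' = I_m:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, Ni = 3, 2, 2
N = m * Ni

# Z with orthonormal rows (ZZ' = I_m), as in the model
Z = np.zeros((m, N))
for i in range(m):
    Z[i, i * Ni:(i + 1) * Ni] = 1.0 / np.sqrt(Ni)

Sigma_u = np.array([[2.0, 0.5, 0.3],
                    [0.5, 1.5, 0.4],
                    [0.3, 0.4, 1.0]])
Sigma_e = 0.16 * np.eye(p)

Y = rng.standard_normal((p, N))  # stand-in for Y - ABHC - 1p gamma' X

# Predictor: U_hat = (Sigma_e Sigma_u^{-1} + I_p)^{-1} (residual) Z'
U_hat = np.linalg.solve(Sigma_e @ np.linalg.inv(Sigma_u) + np.eye(p), Y @ Z.T)

# Henderson's estimating equation:
# Sigma_e^{-1} (Y - U_hat Z) Z' - Sigma_u^{-1} U_hat = 0
lhs = (np.linalg.inv(Sigma_e) @ (Y - U_hat @ Z) @ Z.T
       - np.linalg.inv(Sigma_u) @ U_hat)
assert np.allclose(lhs, 0.0)
```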
5 Prediction of target small area means
We assume that the small area model holds for the sample data, which means that there is no sample-selection bias and that the sampling design is not informative. The model in (2) comprises, in theory, both sampled and non-sampled units. That is,
$$\mathbf{Y} = (\mathbf{Y}_1^{(s)}, \dots, \mathbf{Y}_m^{(s)}, \mathbf{Y}_1^{(r)}, \dots, \mathbf{Y}_m^{(r)}) : p \times N,$$
where $\mathbf{Y}_i^{(s)} = (\mathbf{y}_{i1}, \dots, \mathbf{y}_{in_i}) : p \times n_i$ represents the $n_i$ sampled observations from the $i$-th small area and $\mathbf{Y}_i^{(r)} = (\mathbf{y}_{i,n_i+1}, \dots, \mathbf{y}_{iN_i}) : p \times (N_i - n_i)$ corresponds to the $(N_i - n_i)$ non-sampled units from the $i$-th small area.
Then, split the sample $s_i$ into $s_{ig}$, $g = 1, \dots, k$, with corresponding sample sizes $n_{ig}$ for the $k$ groups. The target vector in small area $i$, whose elements are area means at each time point, is given by
$$\bar{\mathbf{y}}_i = f_i\bar{\mathbf{Y}}_i^{(s)} + (1 - f_i)\hat{\bar{\mathbf{Y}}}_i^{(r)},$$
where $\bar{\mathbf{Y}}_i^{(s)}$ is the vector of small area means corresponding to the sampled units, $\hat{\bar{\mathbf{Y}}}_i^{(r)}$ is the vector of predicted small area means corresponding to the non-sampled units, and $1 - f_i$ is the finite population correction factor, with $f_i = \frac{n_i}{N_i}$ the sampling fraction, that is, the fraction of the population that is sampled. Therefore,
$$\bar{\mathbf{y}}_i = \frac{f_i}{n_i}\mathbf{Y}_i^{(s)}\mathbf{1}_{n_i} + \frac{1 - f_i}{N_i - n_i}\hat{\mathbf{Y}}_i^{(r)}\mathbf{1}_{N_i - n_i} = \frac{1}{N_i}\Big(\mathbf{Y}_i^{(s)}\mathbf{1}_{n_i} + \hat{\mathbf{Y}}_i^{(r)}\mathbf{1}_{N_i - n_i}\Big),$$
where
$$\hat{\mathbf{Y}}_i^{(r)} = \mathbf{A}\hat{\mathbf{B}}\mathbf{C}_i + \mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{X}_i + \hat{\mathbf{u}}_i\mathbf{z}_i'. \tag{15}$$
It is convenient to note that $\hat{\boldsymbol{\gamma}}$, $\hat{\mathbf{B}}$ and $\hat{\mathbf{u}}_i$ used in (15) are computed from Theorem 3.2 and Theorem 4.1, respectively, using the observed data. Then, the target vector of small area means for each group $g$ across all time points is given by
$$\bar{\mathbf{y}}_{ig} = \frac{1}{N_{ig}}\Big(\mathbf{Y}_{ig}^{(s)}\mathbf{1}_{n_{ig}} + \hat{\mathbf{Y}}_{ig}^{(r)}\mathbf{1}_{N_{ig} - n_{ig}}\Big), \quad g = 1, \dots, k.$$
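The identity behind the weighted combination can be illustrated with a tiny numerical example (all sizes and values are our own stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
p, Ni, ni = 3, 10, 4                        # toy area size and sample size
fi = ni / Ni                                # sampling fraction

Ys = rng.standard_normal((p, ni))           # observed sampled units
Yr_hat = rng.standard_normal((p, Ni - ni))  # predictions for non-sampled units

# Weighted combination of sample mean and predicted non-sample mean ...
ybar = fi * Ys.mean(axis=1) + (1 - fi) * Yr_hat.mean(axis=1)
# ... equals pooling all N_i columns and averaging
ybar2 = (Ys.sum(axis=1) + Yr_hat.sum(axis=1)) / Ni
assert np.allclose(ybar, ybar2)
```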
Equivalently,
$$\bar{\mathbf{y}}_{ig} = \frac{1}{N_{ig}}\sum_{j \in s_{ig}}\mathbf{y}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\hat{\boldsymbol{\beta}}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\hat{\mathbf{u}}_i + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{e}_{ij},$$
where $\hat{\boldsymbol{\beta}}_g$ is the column of the estimated parameter matrix $\hat{\mathbf{B}}$ corresponding to group $g$, and the predicted vectors $\hat{\mathbf{u}}_i$ are the columns of the predicted matrix $\hat{\mathbf{U}}$. The first term on the right-hand side of this expression is known and, by the strong law of large numbers, if $N_{ig}$ is large, the last term is approximately equal to zero. Following Henderson (1975), the linear predictor of
$$\Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\boldsymbol{\beta}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\boldsymbol{\gamma}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{u}_i$$
is given by
$$\Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\hat{\boldsymbol{\beta}}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\hat{\mathbf{u}}_i.$$
Hence,
$$\bar{\mathbf{y}}_{ig} = \frac{1}{N_{ig}}\sum_{j \in s_{ig}}\mathbf{y}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\hat{\boldsymbol{\beta}}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\hat{\mathbf{u}}_i.$$
Note that the population means of the auxiliary variables $\mathbf{x}$ in area $i$ at time $t$ must be known, so that the non-sampled sum $\sum_{j \notin s_i}\mathbf{x}_{ijt}'$ can be obtained by subtracting the corresponding sample sum from the population total.

Theorem 5.1 Consider the multivariate linear regression model defined in (2). The target small area means at each time point are the elements of the vectors
$$\bar{\mathbf{y}}_i = \frac{1}{N_i}\Big(\mathbf{Y}_i^{(s)}\mathbf{1}_{n_i} + \hat{\mathbf{Y}}_i^{(r)}\mathbf{1}_{N_i - n_i}\Big), \quad\text{where}\quad \hat{\mathbf{Y}}_i^{(r)} = \mathbf{A}\hat{\mathbf{B}}\mathbf{C}_i + \mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{X}_i + \hat{\mathbf{u}}_i\mathbf{z}_i'.$$
The target small area means for each group across all time points are the elements of the vectors
$$\bar{\mathbf{y}}_{ig} = \frac{1}{N_{ig}}\sum_{j \in s_{ig}}\mathbf{y}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\hat{\boldsymbol{\beta}}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\hat{\mathbf{u}}_i.$$
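The remark on the auxiliary variables amounts to simple bookkeeping: a toy sketch (our own values) recovering the non-sampled total from the known population mean and the sampled values:

```python
import numpy as np

rng = np.random.default_rng(2)
r, Ni, ni = 3, 8, 3                       # toy numbers of covariates and units
X_pop = rng.standard_normal((r, Ni))      # covariates of all N_i units
X_s = X_pop[:, :ni]                       # covariates of the sampled units

# Non-sampled total recovered from the known population mean:
# sum_{j not in s_i} x_ij = N_i * xbar_pop - sum_{j in s_i} x_ij
nonsampled_sum = Ni * X_pop.mean(axis=1) - X_s.sum(axis=1)
assert np.allclose(nonsampled_sum, X_pop[:, ni:].sum(axis=1))
```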
6 Simulation study
In this section, we present a simulation study in which we consider a survey repeated at four time points on a population comprising 8 small areas, and draw a sample of size n = 450 with the small area sample sizes given in Table 1.
Table 1: Sample sizes
Area    Group 1     Group 2     Group 3     Total
1       n11 = 12    n12 = 18    n13 = 16    n1 = 46
2       n21 = 21    n22 = 23    n23 = 12    n2 = 56
3       n31 = 10    n32 = 20    n33 = 15    n3 = 45
4       n41 = 16    n42 = 24    n43 = 17    n4 = 57
5       n51 = 24    n52 = 26    n53 = 21    n5 = 71
6       n61 = 20    n62 = 12    n63 = 28    n6 = 60
7       n71 = 27    n72 = 13    n73 = 14    n7 = 54
8       n81 = 20    n82 = 14    n83 = 27    n8 = 61
m = 8   g1 = 150    g2 = 150    g3 = 150    n = 450
We assume that we have $r = 3$ covariates. The design matrices are chosen as
$$\mathbf{A} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, \qquad \mathbf{C} = \begin{pmatrix} \mathbf{C}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{C}_8 \end{pmatrix}$$
for
$$\mathbf{C}_i = \Big(\mathbf{1}_{n_{i1}}' \otimes \begin{pmatrix}1\\0\\0\end{pmatrix} : \mathbf{1}_{n_{i2}}' \otimes \begin{pmatrix}0\\1\\0\end{pmatrix} : \mathbf{1}_{n_{i3}}' \otimes \begin{pmatrix}0\\0\\1\end{pmatrix}\Big), \quad i = 1, \dots, 8.$$
The parameter matrices, the sampling variance and the covariance matrix of the random effects are
$$\mathbf{B} = \begin{pmatrix} 8 & 10 & 12 \\ 9 & 11 & 13 \end{pmatrix}, \quad \boldsymbol{\gamma} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \sigma_e^2 = 0.16 \quad\text{and}\quad \boldsymbol{\Sigma}_u = \begin{pmatrix} 4.1 & 1.8 & 1.2 & 2.4 \\ 1.8 & 3.6 & 2.4 & 1.4 \\ 1.2 & 2.4 & 6.0 & 2.2 \\ 2.4 & 1.4 & 2.2 & 9.6 \end{pmatrix}.$$
Then, the data are randomly generated from $\mathrm{vec}(\mathbf{Y}) \sim N_{pn}\big(\mathrm{vec}(\mathbf{ABHC} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X}), \boldsymbol{\Sigma}\big)$ using MATLAB Version 8.3.0.532 (The MathWorks, Inc., USA), where the matrix of covariates $\mathbf{X}$ is randomly generated from the uniform distribution on the interval $[0, 1]$ and then taken as fixed.
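For readers without MATLAB, here is a sketch in Python/NumPy of how data could be generated from model (2). The equal group sample sizes and the normalization $z_i = (1/\sqrt{n_i})\mathbf{1}_{n_i}$ (so that $ZZ' = I_m$) are our own simplifications of the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy version of the simulation setup (smaller than the paper's n = 450)
p, q, k, m, nig = 4, 2, 3, 8, 5              # nig: equal group sample sizes
ni = k * nig
n = m * ni

A = np.vander(np.arange(1, p + 1), N=q, increasing=True).astype(float)
B = np.array([[8., 10., 12.], [9., 11., 13.]])
gamma = np.array([1., 2., 3.])
r = gamma.size
sigma2_e = 0.16
Sigma_u = np.array([[4.1, 1.8, 1.2, 2.4],
                    [1.8, 3.6, 2.4, 1.4],
                    [1.2, 2.4, 6.0, 2.2],
                    [2.4, 1.4, 2.2, 9.6]])

H = np.hstack([np.eye(k)] * m)                          # k x mk
Ci = np.kron(np.eye(k), np.ones((1, nig)))              # k x ni (one area)
C = np.kron(np.eye(m), Ci)                              # mk x n
Z = np.kron(np.eye(m), np.ones((1, ni)) / np.sqrt(ni))  # m x n, ZZ' = I_m

X = rng.uniform(0.0, 1.0, size=(r, n))                  # covariates, then fixed
L = np.linalg.cholesky(Sigma_u)
U = L @ rng.standard_normal((p, m))                     # N_{p,m}(0, Sigma_u, I_m)
E = np.sqrt(sigma2_e) * rng.standard_normal((p, n))     # N_{p,n}(0, s2_e I_p, I_n)

Y = A @ B @ H @ C + np.outer(np.ones(p), gamma @ X) + U @ Z + E
assert Y.shape == (p, n)
```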
The simulations were repeated 500 times using the formulas presented in Theorem 3.2, and the following average estimates were obtained:
$$\hat{\mathbf{B}} = \begin{pmatrix} 8.0226 & 9.9728 & 11.9997 \\ 9.0551 & 11.0002 & 13.0002 \end{pmatrix}, \quad \hat{\boldsymbol{\gamma}} = \begin{pmatrix} 0.9681 \\ 1.9743 \\ 3.0222 \end{pmatrix},$$
$$\hat{\boldsymbol{\Sigma}}_u = \begin{pmatrix} 4.1683 & 1.8835 & 1.2804 & 2.5056 \\ 1.8835 & 3.6705 & 2.4316 & 1.4471 \\ 1.2804 & 2.4316 & 5.9460 & 2.1355 \\ 2.5056 & 1.4471 & 2.1355 & 9.3509 \end{pmatrix}.$$
From the above simulations, we see that the average estimates are close to the true values, and thus the proposed estimators support the theoretical results.
7 Concluding remarks
The main task of the present paper has been the prediction of small area means for repeated measures data using a model-based approach within the SAE framework. We have considered longitudinal surveys under simple random sampling without replacement, repeated over time, whose target population is divided into non-overlapping groups present in all small areas.
In order to address the problem of SAE under these settings, we have proposed a multivariate linear regression model that borrows strength across both small areas and over time. This model accounts for repeated measures data, grouped individuals and random effects varying over time. The estimation of the model parameters has been discussed within a restricted maximum likelihood based approach. The model is split into three components, and some algebraic transformations are performed so that each component follows a matrix normal distribution, from which explicit restricted maximum likelihood estimators are derived. Prediction of small area means is presented at all time points, at each time point and by group units. These theoretical results have also been illustrated in a simulation study.
In future work we wish to study the properties of all proposed estimators, in particular the moments and approximate distributions of the estimators for this SAE multivariate linear regression model, as well as mean squared error estimation. We also wish to apply the results of this work to a specific real data set.
A Appendix

A.1 Proof of Lemma 3.1
Proof A.1 Obviously, the likelihood equation (7) admits a unique solution for the parameter vector $\boldsymbol{\gamma}$ if the matrix $\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{X}'$ is of full rank. We have $\mathrm{rank}(\mathbf{C}'^o(\mathbf{C}'^o)') = N - mk$ (with $N > mk$) and $\mathrm{rank}(\mathbf{R}_2\mathbf{R}_2') = mk - m$. Moreover,
$$\mathbf{C}'^o(\mathbf{C}'^o)'\mathbf{R}_2\mathbf{R}_2' = \mathbf{R}_2\mathbf{R}_2'\mathbf{C}'^o(\mathbf{C}'^o)' = \mathbf{0},$$
so $\mathcal{C}(\mathbf{C}'^o(\mathbf{C}'^o)') \cap \mathcal{C}(\mathbf{R}_2\mathbf{R}_2') = \{\mathbf{0}\}$ and then
$$\mathrm{rank}\big(\mathbf{C}'^o(\mathbf{C}'^o)' + \mathbf{R}_2\mathbf{R}_2'\big) = \mathrm{rank}(\mathbf{C}'^o(\mathbf{C}'^o)') + \mathrm{rank}(\mathbf{R}_2\mathbf{R}_2') = N - m.$$
Therefore,
$$\mathrm{rank}\big(\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{X}'\big) = \mathrm{rank}\big(\mathbf{X}(\mathbf{C}'^o(\mathbf{C}'^o)' + \mathbf{R}_2\mathbf{R}_2')\mathbf{X}'\big) = \mathrm{rank}(\mathbf{X}),$$
provided that $\mathrm{rank}(\mathbf{X}) \le N - m$.
The likelihood equation (8) is equivalent to
$$\mathbf{A}'\mathbf{ABK}_2\mathbf{K}_2' = \mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2' - \mathbf{A}'\mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2\mathbf{K}_2'.$$
This equation in $\mathbf{B}$ admits a non-unique solution if and only if one or both of the matrices $\mathbf{A}$ and $\mathbf{K}_2$ are not of full rank. Since $\mathbf{A}$ is a full rank matrix, we need to show that the matrix $\mathbf{K}_2 = \mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2$ is not of full rank. It follows from the construction of $\boldsymbol{\Gamma}_2$ that $\mathcal{C}(\boldsymbol{\Gamma}_2) = \mathcal{C}\big((\mathbf{CC}')^{1/2}(\mathbf{CZ}')^o\big)$. But for any matrices of proper sizes, if $\mathcal{C}(\mathbf{F}) = \mathcal{C}(\mathbf{G})$, then $\mathcal{C}(\mathbf{EF}) = \mathcal{C}(\mathbf{EG})$ (see, for example, Harville (1997) for more details). Therefore,
$$\mathcal{C}\big(\mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2\big) = \mathcal{C}\big(\mathbf{HCC}'(\mathbf{CZ}')^o\big).$$
Moreover, if two matrices have the same column space, then they have the same rank. Using this result and the rank formula $\mathrm{rank}(\mathbf{F} : \mathbf{G}) = \mathrm{rank}(\mathbf{F}'\mathbf{G}^o) + \mathrm{rank}(\mathbf{G})$, which can be found, for example, in Kollo and von Rosen (2005), and since $\mathbf{Z}' = \mathbf{C}'\mathbf{Q}$ for
$$\mathbf{Q} = \begin{pmatrix} \frac{1}{\sqrt{N_1}}\mathbf{1}_k & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \frac{1}{\sqrt{N_m}}\mathbf{1}_k \end{pmatrix},$$
it follows that
$$\begin{aligned} \mathrm{rank}\big(\mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2\big) &= \mathrm{rank}\big(\mathbf{HCC}'(\mathbf{CZ}')^o\big) = \mathrm{rank}(\mathbf{CC}'\mathbf{H}' : \mathbf{CC}'\mathbf{Q}) - \mathrm{rank}(\mathbf{CC}'\mathbf{Q}) \\ &= \mathrm{rank}(\mathbf{H}' : \mathbf{Q}) - \mathrm{rank}(\mathbf{Q}) \\ &= \mathrm{rank}(\mathbf{H}') + \mathrm{rank}(\mathbf{Q}) - \dim\big(\mathcal{C}(\mathbf{H}') \cap \mathcal{C}(\mathbf{Q})\big) - \mathrm{rank}(\mathbf{Q}) \\ &= \mathrm{rank}(\mathbf{H}') - \dim\big(\mathcal{C}(\mathbf{H}') \cap \mathcal{C}(\mathbf{Q})\big), \end{aligned}$$
since the matrix $\mathbf{CC}'$ is of full rank, and where $\dim$ denotes the dimension of a subspace. It remains to show that $\mathcal{C}(\mathbf{H}')$ and $\mathcal{C}(\mathbf{Q})$ are not disjoint. Let $\mathbf{v}_1 = \mathbf{1}_k$ and $\mathbf{v}_2 = (\sqrt{N_1}, \dots, \sqrt{N_m})'$, and recall that $\mathbf{H} = [\mathbf{I}_k : \cdots : \mathbf{I}_k]$. Then we have $\mathbf{H}'\mathbf{v}_1 = \mathbf{1}_{mk}$ and $\mathbf{Qv}_2 = \mathbf{1}_{mk}$. Thus, the two spaces are not disjoint since they contain a common vector. This completes the proof of the lemma.
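The final step of the proof is easy to verify numerically; a toy sketch (arbitrary area sizes of our own) checking that H'v1 = Qv2 = 1_mk:

```python
import numpy as np

m, k = 3, 4
N_area = np.array([5.0, 7.0, 11.0])           # toy area sizes N_1,...,N_m

H = np.hstack([np.eye(k)] * m)                # k x mk
# Q : mk x m, block diagonal with blocks (1/sqrt(N_i)) 1_k
Q = np.kron(np.diag(1.0 / np.sqrt(N_area)), np.ones((k, 1)))

v1 = np.ones(k)
v2 = np.sqrt(N_area)
assert np.allclose(H.T @ v1, np.ones(m * k))  # H'v1 = 1_mk
assert np.allclose(Q @ v2, np.ones(m * k))    # Q v2 = 1_mk
```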
Acknowledgements
The research of Innocent Ngaruye is supported by the Swedish International Development Cooperation Agency (SIDA) in collaboration with the University of Rwanda. Both institutions are acknowledged.
References
Battese, G. E., Harter, R. M., and Fuller, W. A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83(401):28–36.
Chambers, R. and Clark, R. G. (2012). An Introduction to Model-Based Survey Sampling with Applications. Oxford University Press, Oxford.
Consortium, T. E. (2004). Enhancing small area estimation techniques to meet European needs. Technical report, Office for National Statistics, London.
Datta, G. S., Lahiri, P., Maiti, T., and Lu, K. L. (1999). Hierarchical Bayes estimation of unemployment rates for the states of the U.S. Journal of the American Statistical Association, 94(448):1074–1082.
Filipiak, K. and von Rosen, D. (2012). On MLEs in an extended multivariate linear growth curve model. Metrika, 75(8):1069–1092.
Ghosh, M. and Rao, J. N. K. (1994). Small area estimation: an appraisal. Statistical Science, 9(1):55–76.
Harville, D. A. (1997). Matrix Algebra from a Statistician’s Perspective,
volume 157. Springer.
Henderson, C. R. (1973). Sire evaluation and genetic trends. Journal of Animal Science, 1973(Symposium):10–41.
Henderson, C. R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31(2):423–447.
Kollo, T. and von Rosen, D. (2005). Advanced Multivariate Statistics with Matrices, volume 579. Springer, Dordrecht.
Nissinen, K. (2009). Small Area Estimation with Linear Mixed Models from Unit-Level Panel and Rotating Panel Data. PhD thesis, University of Jyväskylä.
Pfeffermann, D. (2002). Small area estimation: new developments and directions. International Statistical Review, 70(1):125–143.
Pfeffermann, D. (2013). New important developments in small area estimation. Statistical Science, 28(1):40–68.
Pinheiro, J. C. and Bates, D. M. (2000). Mixed-Effects Models in S and S-plus. Springer-Verlag, New York.
Rao, J. N. K. (2003). Small Area Estimation, volume 331. John Wiley and Sons, New York.
Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects. Statistical Science, 6(2):15–32.
Searle, S. R., Casella, G., and McCulloch, C. E. (2009). Variance Components, volume 391. John Wiley and Sons, New York.
Singh, B. and Sisodia, B. S. V. (2011). Small area estimation in longitudinal surveys. Journal of Reliability and Statistical Studies, 4(2):83–91.