Small Area Estimation under a Multivariate Linear Model for Repeated Measures Data
Innocent Ngaruye1,2, Joseph Nzabanita1,2, Dietrich von Rosen1,3
and Martin Singull1
1Department of Mathematics, Linköping University,
SE-581 83 Linköping, Sweden
E-mail: {innocent.ngaruye, joseph.nzabanita, martin.singull}@liu.se
2Department of Mathematics,
College of Science and Technology, University of Rwanda, P.O. Box 3900 Kigali, Rwanda
E-mail: {i.ngaruye, j.nzabanita}@ur.ac.rw
3Department of Energy and Technology,
Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden
E-mail: dietrich.von.rosen@slu.se
Abstract
In this paper, we consider small area estimation under a multivariate linear regression model for repeated measures data. The aim is to obtain a model which borrows strength both across small areas and over time by simultaneously incorporating area effects and time correlation. The model accounts for repeated surveys, grouped individuals and random effects variations. Estimation of the model parameters is discussed within a restricted maximum likelihood based approach. Prediction of the random effects, and prediction of the small area means across time points and per group units for all time points, are derived. The results are supported by a simulation study.
Keywords: Maximum likelihood; Multivariate linear model; Prediction of random effects; Repeated measures data; Small Area Estimation
1 Introduction
Population surveys are carried out via sampling designs and data collection on individual units, with the intention of making statistical inferences about a larger population of which these units are members. One commonly used design is simple random sampling, which assigns equal selection probabilities to all elements in the population. Such surveys are usually designed to provide efficient estimates of parameters of interest for large populations. In most cases, surveys are not originally designed to produce estimates for small domains, and hence these domains are poorly represented in the sample. Thus, the surveys often provide very little information on the small area level, and direct survey estimates for a target small area are not reliable due to the small sample size connected to this area.
In recent years, Small Area Estimation (SAE) methods have received much attention due to their usefulness in both the public and private sectors, and their demand has greatly increased worldwide. Several approaches and new developments in small area estimation have been investigated by different authors, for example Pfeffermann (2002, 2013), Rao (2003), and Chambers and Clark (2012). The demand for small area statistics has increased due to their use in formulating social and economic policies, allocation of government funds, regional planning, business decision making, etc. SAE has been used in a wide range of applications such as unemployment rates, poverty mapping, disease mapping and demography. One may refer to Ghosh and Rao (1994) and Rao (2003) for some examples and case studies in SAE.
Repeated measures data, which refer to response outcomes taken on the same experimental unit at different time points, have been widely used in research. The analysis of repeated measures data allows us to study trends over time. The demand for small area statistics covers both cross-sectional and repeated measures data. For instance, small area estimates for repeated measures data may be used by public policy makers for different purposes, such as fund allocation or new educational and health programs, and in some cases they might be interested in a given group of the population.
It has been shown that the multivariate approach for model-based methods in small area estimation may achieve substantial improvement over the usual univariate approach (Datta et al., 1999). Some studies dealing with SAE problems for longitudinal surveys have been discussed by various authors, for example Consortium (2004); Nissinen (2009); Singh and Sisodia (2011). The latter developed direct, synthetic and composite estimators for small area means at a given time point when the population units contain non-overlapping groups.
In contrast to these estimators, we propose a model which borrows strength across both small areas and over time by simultaneously incorporating area effects and time correlation. The model allows for finding the small area means at each time point, per group units and, in particular, the pattern of changes or mean growth profiles over time. This model accounts for repeated surveys, groups of individuals and random effects variations.
The paper is organized as follows. After the Introduction, the second section is devoted to the model formulation and the notation used, and Section 3 deals with estimation of the model parameters. Sections 4 and 5 discuss the prediction of random effects and the prediction of target small area means, respectively. We end with a simulation study in Section 6 and general concluding remarks in Section 7.
2 Model formulation
We consider repeated measurements of the variable of interest $y$ taken at $p$ time points $t_1, \dots, t_p$ from a finite population $U$ of size $N$, which is partitioned into $m$ disjoint subpopulations $U_1, \dots, U_m$, called small areas, of sizes $N_i$, $i = 1, \dots, m$, such that $\sum_{i=1}^m N_i = N$. We also assume that in every area there are $k$ different groups of units, where $N_{ig}$ denotes the size of group $g$ within the $i$-th area, such that $\sum_{g=1}^k N_{ig} = N_i$ and $\sum_{i=1}^m \sum_{g=1}^k N_{ig} = N$.
Suppose that a sample $s = (s_1, \dots, s_m)$ is selected from the population using simple random sampling without replacement, where $s_i$ is the sample of size $n_i$ observed from area $i$. The sample remains the same over time.

Let $\mathbf{y}_{ij}$ be the $p$-vector of measurements on the $j$-th unit in the $i$-th area, that is, $\mathbf{y}_{ij} = (y_{ij1}, \dots, y_{ijp})'$, $j = 1, \dots, N_i$, $i = 1, \dots, m$.
We assume the mean growth of the $j$-th unit in area $i$ for each group to be a polynomial in time of degree $q - 1$. Furthermore, we suppose that we have auxiliary data $\mathbf{x}_{ij}$ of $r$ concomitant variables (covariates) whose values are known for all units in all $m$ small areas. These auxiliary variables are included in the model to strengthen the limited sample information from the areas. The values $\mathbf{x}_{ij}$ can be values from a past survey of the same area and/or values of other variables related to the variable of interest. Moreover, they can be register-based information or data measured on characteristics of interest in other similar areas.
The relationship between $\mathbf{y}_{ij}$ and $\mathbf{x}_{ij}$ in each small area cannot always be considered the same as the relationship between these variables in the population as a whole (Chambers and Clark, 2012). Therefore, to better account for the between-area variability, we add area-specific random effects which vary over time. Thus, for each of the $k$ groups, the unit-level regression model for the $j$-th unit from small area $i$ at time $t$ can be expressed as
$$y_{ijt} = \beta_0 + \beta_1 t + \cdots + \beta_{q-1} t^{q-1} + \boldsymbol{\gamma}'\mathbf{x}_{ij} + u_{it} + e_{ijt},$$
$$j = 1, \dots, N_i; \quad i = 1, \dots, m; \quad t = t_1, \dots, t_p,$$
where the $e_{ijt}$ are random sampling errors depending on the sampling scheme, assumed to be i.i.d. normal with mean zero and known sampling variance $\sigma_e^2$, independent of $u_{it}$. The vector $\boldsymbol{\gamma}$ contains the fixed regression coefficients of the auxiliary variables. The parameters $\beta_0, \dots, \beta_{q-1}$ are unknown and assumed to be the same in all areas, under the assumption that there is no area effect on the polynomial growth in time.
For all time points, the model can be written in matrix form as
$$\mathbf{y}_{ij} = \mathbf{A}\boldsymbol{\beta} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{x}_{ij} + \mathbf{u}_i + \mathbf{e}_{ij},$$
where
$$\mathbf{A} = \begin{pmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{q-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{q-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_p & t_p^2 & \cdots & t_p^{q-1} \end{pmatrix}, \qquad \mathbf{1}_p = (1, 1, \dots, 1)' : p \times 1.$$
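As a quick aside (not from the paper), the within-individuals design matrix is simply a Vandermonde matrix in the time points. A minimal Python/NumPy sketch, where the helper name `within_design` and the example time points are our own choices:

```python
import numpy as np

def within_design(time_points, q):
    """Build the p x q within-individuals design matrix whose columns
    are 1, t, t^2, ..., t^(q-1) evaluated at the given time points."""
    t = np.asarray(time_points, dtype=float)
    return np.vander(t, N=q, increasing=True)

# Time points 1,...,4 with a linear growth profile (q = 2),
# matching the design used later in the simulation section.
A = within_design([1, 2, 3, 4], q=2)
print(A)
```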
The vector $\mathbf{u}_i$ is assumed to be multivariate normally distributed with zero mean and unknown covariance matrix $\boldsymbol{\Sigma}_u$. Collecting the vectors $\mathbf{y}_{ij}$ for all units in small area $i$ coming from group $g$ gives
$$\mathbf{Y}_{ig} = \mathbf{A}\mathbf{B}_g + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X}_{ig} + \mathbf{u}_i\mathbf{z}_{ig}' + \mathbf{E}_{ig}, \quad g = 1, \dots, k, \tag{1}$$
where $\mathbf{Y}_{ig} = (\mathbf{y}_{i1}, \dots, \mathbf{y}_{iN_{ig}})$, $\mathbf{B}_g = (\boldsymbol{\beta}_g, \dots, \boldsymbol{\beta}_g) : q \times N_{ig}$, $\mathbf{X}_{ig} = (\mathbf{x}_{i1}, \dots, \mathbf{x}_{iN_{ig}})$, $\mathbf{z}_{ig} = \frac{1}{\sqrt{N_{ig}}}\mathbf{1}_{N_{ig}}$ and $\mathbf{E}_{ig} = (\mathbf{e}_{i1}, \dots, \mathbf{e}_{iN_{ig}})$.
The model presented in (1) holds for each group in area $i$. Since we are interested in every group, we need to include a known design matrix $\mathbf{C}_i : k \times N_i$ of group-separation indicators when collecting all $k$ groups in the $i$-th area; the groups are supposed to be the same in all areas.

The model at small area level for the $k$ groups is then written as
$$\mathbf{Y}_i = \mathbf{A}\mathbf{B}\mathbf{C}_i + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X}_i + \mathbf{u}_i\mathbf{z}_i' + \mathbf{E}_i,$$
where $\mathbf{Y}_i = (\mathbf{Y}_{i1}, \dots, \mathbf{Y}_{ik})$, $\mathbf{B} = (\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_k) : q \times k$, $\mathbf{X}_i = (\mathbf{X}_{i1}, \dots, \mathbf{X}_{ik})$, $\mathbf{z}_i = (\mathbf{z}_{i1}', \dots, \mathbf{z}_{ik}')'$, $\mathbf{E}_i = (\mathbf{E}_{i1}, \dots, \mathbf{E}_{ik})$, and
$$\mathbf{C}_i = \begin{pmatrix} \mathbf{1}_{N_{i1}}' & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{1}_{N_{ik}}' \end{pmatrix},$$
where $\mathbf{E}_i \sim N_{p,N_i}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_{N_i})$ stands for the matrix normal distribution with mean zero, positive definite covariance matrix between rows $\boldsymbol{\Sigma}_e = \sigma_e^2\mathbf{I}_p$, and independent columns. Here $\mathbf{Y}_i$ is a $p \times N_i$ data matrix; $\mathbf{A}$ is a $p \times q$, $q \le p$, known within-individuals design matrix for fixed effects; $\mathbf{B}$ is a $q \times k$ unknown parameter matrix; $\mathbf{C}_i$, with $\mathrm{rank}(\mathbf{C}_i) + p \le N_i$, is a $k \times N_i$ known between-individuals design matrix for fixed effects; and $\mathbf{X}_i$ is an $r \times N_i$ known matrix containing the values of the covariates.
Combining all small areas for $N = \sum_{i=1}^m N_i$ units, we get the model
$$\underset{p\times N}{\mathbf{Y}} = \underset{p\times q}{\mathbf{A}}\;\underset{q\times k}{\mathbf{B}}\;\underset{k\times mk}{\mathbf{H}}\;\underset{mk\times N}{\mathbf{C}} + \underset{p\times r}{\mathbf{1}_p\boldsymbol{\gamma}'}\;\underset{r\times N}{\mathbf{X}} + \underset{p\times m}{\mathbf{U}}\;\underset{m\times N}{\mathbf{Z}} + \underset{p\times N}{\mathbf{E}}, \tag{2}$$
where $\mathbf{Y} = [\mathbf{Y}_1, \dots, \mathbf{Y}_m]$, $\mathbf{H} = [\mathbf{I}_k : \cdots : \mathbf{I}_k]$, $\mathbf{X} = [\mathbf{X}_1, \dots, \mathbf{X}_m]$, $\mathbf{U} = [\mathbf{u}_1, \dots, \mathbf{u}_m]$, $\mathbf{E} = [\mathbf{E}_1, \dots, \mathbf{E}_m]$,
$$\mathbf{C} = \begin{pmatrix} \mathbf{C}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{C}_m \end{pmatrix} \quad\text{and}\quad \mathbf{Z} = \begin{pmatrix} \mathbf{z}_1' & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{z}_m' \end{pmatrix}.$$
It is assumed that $\mathbf{E} \sim N_{p,N}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_N)$, $\mathbf{U} \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u, \mathbf{I}_m)$, $p \le m$. Furthermore, $\mathrm{vec}(\mathbf{Y}) \sim N_{pN}\big(\mathrm{vec}(\mathbf{ABHC} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X}), \boldsymbol{\Sigma}\big)$ for $\boldsymbol{\Sigma} = \mathbf{Z}'\mathbf{Z} \otimes \boldsymbol{\Sigma}_u + \mathbf{I}_N \otimes \boldsymbol{\Sigma}_e$ and
$$\boldsymbol{\Sigma}_u = \begin{pmatrix} \sigma_{u1}^2 & \sigma_{u12} & \cdots & \sigma_{u1p} \\ \sigma_{u12} & \sigma_{u2}^2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{u1p} & \cdots & \cdots & \sigma_{up}^2 \end{pmatrix} : p \times p,$$
where the covariance between the random effects at two distinct time points $t$ and $s$ is $\mathrm{Cov}(u_t, u_s) = \sigma_{uts} = \rho_{t,s}\sigma_{ut}\sigma_{us}$, with $\rho_{t,s}$ standing for the correlation.
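To make the covariance structure concrete, here is a small numerical sketch (toy dimensions and a toy Σu of our own choosing) that builds Σ = Z'Z ⊗ Σu + I_N ⊗ Σe for a Z with orthonormal rows, and checks that ZZ' = I_m and that Σ is symmetric positive definite:

```python
import numpy as np

# Toy dimensions (illustrative only): m = 2 areas with N_i = 2 units each,
# p = 3 time points, so N = 4.
p, m, Ni = 3, 2, 2
N = m * Ni

# Z : m x N, block diagonal with rows z_i' = (1/sqrt(N_i)) 1'_{N_i}
Z = np.zeros((m, N))
for i in range(m):
    Z[i, i * Ni:(i + 1) * Ni] = 1.0 / np.sqrt(Ni)

Sigma_u = np.array([[2.0, 0.5, 0.3],
                    [0.5, 1.5, 0.4],
                    [0.3, 0.4, 1.0]])      # p x p random-effects covariance (toy)
Sigma_e = 0.16 * np.eye(p)                 # sampling covariance sigma_e^2 I_p

# Covariance of vec(Y): Sigma = Z'Z (x) Sigma_u + I_N (x) Sigma_e
Sigma = np.kron(Z.T @ Z, Sigma_u) + np.kron(np.eye(N), Sigma_e)

assert np.allclose(Z @ Z.T, np.eye(m))      # ZZ' = I_m
assert np.allclose(Sigma, Sigma.T)          # symmetric
assert np.linalg.eigvalsh(Sigma).min() > 0  # positive definite
```

The minimum eigenvalue is bounded below by sigma_e^2, which is why the sum is positive definite even though Z'Z ⊗ Σu alone is only positive semidefinite.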
3 Estimation for the mean and covariance
The model defined in (2) is a sum of two matrix normal distributions; it is normally distributed but not, in general, matrix normally distributed. Therefore, in order to use the maximum likelihood estimation approach, we make some transformations to achieve a matrix normal distribution, which is easier to handle. Let $\mathbf{A}^o$ denote any matrix spanning $\mathcal{C}(\mathbf{A})^\perp$, i.e., $\mathcal{C}(\mathbf{A}^o) = \mathcal{C}(\mathbf{A})^\perp$, where $\mathcal{C}(\mathbf{A})$ denotes the column vector space generated by the columns of the matrix $\mathbf{A}$. Moreover, $\mathbf{A}^-$ denotes an arbitrary generalized inverse of the matrix $\mathbf{A}$, i.e., a matrix such that $\mathbf{A}\mathbf{A}^-\mathbf{A} = \mathbf{A}$. We also write $\mathbf{P}_A = \mathbf{A}(\mathbf{A}'\mathbf{A})^-\mathbf{A}'$, $\mathbf{Q}_A = \mathbf{I} - \mathbf{P}_A$, $\mathbf{P}_{A,B} = \mathbf{A}(\mathbf{A}'\mathbf{B}\mathbf{A})^-\mathbf{A}'\mathbf{B}$ and $\mathbf{Q}_{A,B} = \mathbf{I} - \mathbf{P}_{A,B}$. The matrices $\mathbf{P}_A$ and $\mathbf{Q}_A$ are projectors and, if $\mathbf{B}$ is positive definite, $\mathbf{P}_{A,B}$ and $\mathbf{Q}_{A,B}$ are also projectors.
First of all, we observe that the matrices $\mathbf{Z}$, $\mathbf{H}$ and $\mathbf{C}$ in model (2) are of full row rank and that the matrix $\mathbf{A}$ is of full column rank. Moreover, $\mathcal{C}(\mathbf{Z}') \subseteq \mathcal{C}(\mathbf{C}')$ and $\mathbf{Z}\mathbf{Z}' = \mathbf{I}_m$.
Both relations imply that $(\mathbf{CC}')^{-1/2}\mathbf{CZ}'\mathbf{ZC}'(\mathbf{CC}')^{-1/2}$ is an idempotent matrix and thus can be diagonalized by an orthogonal matrix, say $\boldsymbol{\Gamma}$, with eigenvalues 1 and 0, i.e.,
$$(\mathbf{CC}')^{-1/2}\mathbf{CZ}'\mathbf{ZC}'(\mathbf{CC}')^{-1/2} = \boldsymbol{\Gamma}\mathbf{D}\boldsymbol{\Gamma}' = \boldsymbol{\Gamma}\begin{pmatrix}\mathbf{I}_m & \mathbf{0} \\ \mathbf{0} & \mathbf{0}\end{pmatrix}\boldsymbol{\Gamma}'.$$
Since each column of $\boldsymbol{\Gamma}$ is an eigenvector associated with the corresponding eigenvalue in $\mathbf{D}$, we can partition $\boldsymbol{\Gamma} = [\boldsymbol{\Gamma}_1 : \boldsymbol{\Gamma}_2]$ such that $\boldsymbol{\Gamma}_1$ corresponds to the block $\mathbf{I}_m$ and $\boldsymbol{\Gamma}_2$ to the block $\mathbf{0}$, with $\boldsymbol{\Gamma}_1 : mk \times m$ and $\boldsymbol{\Gamma}_2 : mk \times (mk - m)$. It follows that $\boldsymbol{\Gamma}_1'\boldsymbol{\Gamma}_1 = \mathbf{I}_m$ and $\boldsymbol{\Gamma}_2'\boldsymbol{\Gamma}_2 = \mathbf{I}_{mk-m}$.
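The diagonalization argument can be checked numerically. The following sketch (toy area and group sizes of our own; $z_i$ is taken as the normalized area indicator so that $ZZ' = I_m$ and $\mathcal{C}(Z') \subseteq \mathcal{C}(C')$ hold) verifies that $(CC')^{-1/2}CZ'ZC'(CC')^{-1/2}$ is idempotent with exactly $m$ unit eigenvalues:

```python
import numpy as np

# Toy sizes (illustrative): m = 2 areas, k = 2 groups,
# group sizes N_ig chosen arbitrarily.
N_ig = [[2, 3], [4, 2]]                    # N_ig[i][g]
m, k = 2, 2
Ni = [sum(row) for row in N_ig]
N = sum(Ni)

# C : mk x N, block diagonal group-indicator matrix
C = np.zeros((m * k, N))
col = 0
for i in range(m):
    for g in range(k):
        C[i * k + g, col:col + N_ig[i][g]] = 1.0
        col += N_ig[i][g]

# Z : m x N with rows z_i' = (1/sqrt(N_i)) 1'_{N_i}, so that Z Z' = I_m
Z = np.zeros((m, N))
col = 0
for i in range(m):
    Z[i, col:col + Ni[i]] = 1.0 / np.sqrt(Ni[i])
    col += Ni[i]

CC = C @ C.T                                # diagonal, entries N_ig
CC_inv_half = np.diag(1.0 / np.sqrt(np.diag(CC)))

M = CC_inv_half @ C @ Z.T @ Z @ C.T @ CC_inv_half
assert np.allclose(M @ M, M)                # idempotent
eigvals = np.sort(np.linalg.eigvalsh(M))
assert np.allclose(eigvals[-m:], 1.0)       # m eigenvalues equal to 1
assert np.allclose(eigvals[:m * k - m], 0.0)  # the remaining mk - m equal 0
```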
Choose $\mathbf{C}'^o$ to be a matrix whose columns are unit-length eigenvectors corresponding to the non-zero eigenvalues of $\mathbf{I} - \mathbf{C}'(\mathbf{CC}')^-\mathbf{C}$. Then $(\mathbf{C}'^o)'\mathbf{C}'^o = \mathbf{I}_{N-mk}$ and we can make a one-to-one transformation of (2) to get $[\mathbf{V} : \mathbf{W}] = \mathbf{Y}[\mathbf{C}'(\mathbf{CC}')^{-1/2} : \mathbf{C}'^o]$, where
$$\mathbf{V} = \mathbf{Y}\mathbf{C}'(\mathbf{CC}')^{-1/2} = \mathbf{ABH}(\mathbf{CC}')^{1/2} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2} + (\mathbf{UZ} + \mathbf{E})\mathbf{C}'(\mathbf{CC}')^{-1/2}, \tag{3}$$
$$\mathbf{W} = \mathbf{Y}\mathbf{C}'^o = \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o + \mathbf{EC}'^o, \tag{4}$$
since $\mathbf{CC}'^o = \mathbf{0}$ and $\mathcal{C}(\mathbf{Z}') \subseteq \mathcal{C}(\mathbf{C}')$ implies that $\mathbf{ZC}'^o = \mathbf{0}$. Make a further transformation of (3) so that $[\mathbf{V}_1 : \mathbf{V}_2] = \mathbf{Y}\mathbf{C}'(\mathbf{CC}')^{-1/2}[\boldsymbol{\Gamma}_1 : \boldsymbol{\Gamma}_2]$, where
$$\mathbf{V}_1 = \mathbf{Y}\mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 = \mathbf{ABH}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_1 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 + (\mathbf{UZ} + \mathbf{E})\mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1, \tag{5}$$
$$\mathbf{V}_2 = \mathbf{Y}\mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 = \mathbf{ABH}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 + \mathbf{EC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2. \tag{6}$$

Theorem 3.1 Let $\mathbf{V}_1$, $\mathbf{V}_2$ and $\mathbf{W}$ be given by (5), (6) and (4), respectively. Then
$$\mathbf{V}_1 \sim N_{p,m}(\mathbf{M}_1, \boldsymbol{\Sigma}_u + \boldsymbol{\Sigma}_e, \mathbf{I}_m), \quad \mathbf{V}_2 \sim N_{p,mk-m}(\mathbf{M}_2, \boldsymbol{\Sigma}_e, \mathbf{I}_{mk-m}), \quad \mathbf{W} \sim N_{p,N-mk}(\mathbf{M}_3, \boldsymbol{\Sigma}_e, \mathbf{I}_{N-mk}),$$
where
$$\mathbf{M}_1 = \mathbf{ABH}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_1 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1,$$
$$\mathbf{M}_2 = \mathbf{ABH}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2,$$
$$\mathbf{M}_3 = \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o.$$
Moreover, the matrices $\mathbf{V}_1$, $\mathbf{V}_2$ and $\mathbf{W}$ are independently distributed.
Proof 3.1 The independence of $\mathbf{V}_1$, $\mathbf{V}_2$ and $\mathbf{W}$ follows from the fact that the matrices $\boldsymbol{\Gamma}_1$, $\boldsymbol{\Gamma}_2$ and $\mathbf{C}'^o$ are all of full rank and pairwise orthogonal. We observe that $\boldsymbol{\Gamma}'\boldsymbol{\Gamma} = \mathbf{I}_{mk}$, and
$$\boldsymbol{\Gamma}'\boldsymbol{\Gamma}_1 = \begin{pmatrix}\mathbf{I}_m \\ \mathbf{0}\end{pmatrix}, \quad \boldsymbol{\Gamma}'\boldsymbol{\Gamma}_2 = \begin{pmatrix}\mathbf{0} \\ \mathbf{I}_{mk-m}\end{pmatrix}, \quad \boldsymbol{\Gamma}_1'\boldsymbol{\Gamma} = (\mathbf{I}_m \; \mathbf{0}), \quad \boldsymbol{\Gamma}_2'\boldsymbol{\Gamma} = (\mathbf{0} \; \mathbf{I}_{mk-m}).$$
From $\mathbf{E} \sim N_{p,N}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_N)$ and $\mathbf{U} \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u, \mathbf{I}_m)$, it follows that
$$\mathbf{UZC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 \sim N_{p,m}\big(\mathbf{0}, \boldsymbol{\Sigma}_u, \boldsymbol{\Gamma}_1'(\mathbf{CC}')^{-1/2}\mathbf{CZ}'\mathbf{ZC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1\big),$$
$$\mathbf{EC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 \sim N_{p,m}\big(\mathbf{0}, \boldsymbol{\Sigma}_e, \boldsymbol{\Gamma}_1'(\mathbf{CC}')^{-1/2}\mathbf{CC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1\big),$$
$$\mathbf{EC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 \sim N_{p,mk-m}\big(\mathbf{0}, \boldsymbol{\Sigma}_e, \boldsymbol{\Gamma}_2'(\mathbf{CC}')^{-1/2}\mathbf{CC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2\big),$$
$$\mathbf{EC}'^o \sim N_{p,N-mk}\big(\mathbf{0}, \boldsymbol{\Sigma}_e, (\mathbf{C}'^o)'\mathbf{C}'^o\big).$$
Hence,
$$(\mathbf{UZ} + \mathbf{E})\mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u + \boldsymbol{\Sigma}_e, \mathbf{I}_m),$$
$$\mathbf{EC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 \sim N_{p,mk-m}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_{mk-m}),$$
$$\mathbf{EC}'^o \sim N_{p,N-mk}(\mathbf{0}, \boldsymbol{\Sigma}_e, \mathbf{I}_{N-mk}),$$
since
$$\boldsymbol{\Gamma}_1'(\mathbf{CC}')^{-1/2}\mathbf{CZ}'\mathbf{ZC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 = \boldsymbol{\Gamma}_1'\boldsymbol{\Gamma}\mathbf{D}\boldsymbol{\Gamma}'\boldsymbol{\Gamma}_1 = \mathbf{I}_m,$$
$$\boldsymbol{\Gamma}_1'(\mathbf{CC}')^{-1/2}\mathbf{CC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1 = \mathbf{I}_m, \quad \boldsymbol{\Gamma}_2'(\mathbf{CC}')^{-1/2}\mathbf{CC}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2 = \mathbf{I}_{mk-m}, \quad (\mathbf{C}'^o)'\mathbf{C}'^o = \mathbf{I}_{N-mk},$$
which completes the proof of the theorem.
Now consider the two components $\mathbf{W}$ and $\mathbf{V}_2$, and recall that the covariance matrix $\boldsymbol{\Sigma}_e = \sigma_e^2\mathbf{I}_p$ is known. Put
$$\mathbf{K}_1 = \mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_1, \quad \mathbf{K}_2 = \mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2, \quad \mathbf{R}_1 = \mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_1, \quad \mathbf{R}_2 = \mathbf{C}'(\mathbf{CC}')^{-1/2}\boldsymbol{\Gamma}_2.$$
Then,
$$\mathbf{W} = \mathbf{Y}\mathbf{C}'^o = \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o + \mathbf{EC}'^o, \qquad \mathbf{V}_2 = \mathbf{Y}\mathbf{R}_2 = \mathbf{ABK}_2 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2 + \mathbf{ER}_2.$$
The corresponding log-likelihood of the joint density function is given by
$$l(\boldsymbol{\gamma}, \mathbf{B}) = l_W(\boldsymbol{\gamma}) + l_{V_2}(\boldsymbol{\gamma}, \mathbf{B}),$$
where $l_W(\boldsymbol{\gamma})$ and $l_{V_2}(\boldsymbol{\gamma}, \mathbf{B})$ denote the log-likelihood functions for $\mathbf{W}$ and $\mathbf{V}_2$, respectively. We use the notation $(\mathbf{Y})()'$ instead of $(\mathbf{Y})(\mathbf{Y})'$ in order to shorten matrix expressions. Therefore,
$$\begin{aligned} l(\boldsymbol{\gamma}, \mathbf{B}) ={}& -\frac{p(N-mk)}{2}\log(2\pi) - \frac{N-mk}{2}\log|\boldsymbol{\Sigma}_e| - \frac{p(mk-m)}{2}\log(2\pi) - \frac{mk-m}{2}\log|\boldsymbol{\Sigma}_e| \\ &- \tfrac{1}{2}\mathrm{tr}\big\{\boldsymbol{\Sigma}_e^{-1}(\mathbf{W} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o)()'\big\} - \tfrac{1}{2}\mathrm{tr}\big\{\boldsymbol{\Sigma}_e^{-1}(\mathbf{V}_2 - \mathbf{ABK}_2 - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2)()'\big\} \\ ={}& c_o - \tfrac{1}{2}\mathrm{tr}\big\{\boldsymbol{\Sigma}_e^{-1}(\mathbf{W} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XC}'^o)()' + \boldsymbol{\Sigma}_e^{-1}(\mathbf{V}_2 - \mathbf{ABK}_2 - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2)()'\big\}, \end{aligned}$$
where $c_o$ is a constant not depending on the parameters and $\mathrm{tr}$ stands for the trace.
Taking the first and second derivatives with respect to $\boldsymbol{\gamma}$ and $\mathbf{B}$, the likelihood equations can be expressed as
$$\big(\mathbf{XC}'^o(\mathbf{C}'^o)' + \mathbf{XR}_2\mathbf{R}_2'\big)\mathbf{Y}'\mathbf{1}_p - p\big(\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{X}'\big)\boldsymbol{\gamma} - \mathbf{XR}_2\mathbf{K}_2'\mathbf{B}'\mathbf{A}'\mathbf{1}_p = \mathbf{0}, \tag{7}$$
$$\mathbf{A}'\big(\mathbf{Y}\mathbf{R}_2 - \mathbf{ABK}_2 - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2\big)\mathbf{K}_2' = \mathbf{0}. \tag{8}$$
Lemma 3.1 The likelihood equation (7) admits a unique solution for the parameter vector $\boldsymbol{\gamma}$ if the matrix $\mathbf{X}$ is of full rank, while the likelihood equation (8) admits a non-unique solution for the parameter matrix $\mathbf{B}$.
The proof of the lemma can be found in the Appendix. The second equation (8) gives
$$\mathbf{B} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- - (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- + \mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)',$$
for an arbitrary matrix $\mathbf{T}_1$. Plugging this expression for $\mathbf{B}$ into equation (7) yields
$$\big(\mathbf{XC}'^o(\mathbf{C}'^o)' + \mathbf{XR}_2\mathbf{R}_2'\big)\mathbf{Y}'\mathbf{1}_p - p\big(\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{X}'\big)\boldsymbol{\gamma} - \mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{Y}'\mathbf{1}_p + p\,\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{X}'\boldsymbol{\gamma} = \mathbf{0}.$$
Then, the restricted maximum likelihood estimators (RMLEs) of $\boldsymbol{\gamma}$ and $\mathbf{B}$ are obtained as
$$\hat{\boldsymbol{\gamma}} = \big(p\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + p\mathbf{XR}_2\mathbf{R}_2'\mathbf{X}' - p\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{X}'\big)^-\big(\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{Y}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{Y}' - \mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{Y}'\big)\mathbf{1}_p$$
$$\qquad + \big(p\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + p\mathbf{XR}_2\mathbf{R}_2'\mathbf{X}' - p\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'\mathbf{X}'\big)^o\mathbf{t}_2,$$
$$\hat{\mathbf{B}} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- - (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- + \mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)',$$
for an arbitrary vector $\mathbf{t}_2$ and an arbitrary matrix $\mathbf{T}_1$. Put
$$\mathbf{P} = \mathbf{XC}'^o(\mathbf{C}'^o)' + \mathbf{XR}_2\mathbf{R}_2' - \mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_2\mathbf{R}_2'.$$
Then
$$\hat{\boldsymbol{\gamma}} = \frac{1}{p}(\mathbf{PX}')^-\mathbf{PY}'\mathbf{1}_p + (\mathbf{PX}')^o\mathbf{t}_2, \tag{9}$$
$$\hat{\mathbf{B}} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- - \frac{1}{p}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-$$
$$\qquad - (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- + \mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)'. \tag{10}$$
The estimators $\hat{\boldsymbol{\gamma}}$ of $\boldsymbol{\gamma}$ and $\hat{\mathbf{B}}$ of $\mathbf{B}$ depend on the arbitrary vector $\mathbf{t}_2$ and the arbitrary matrix $\mathbf{T}_1$, respectively. However, there is also information about $\mathbf{B}$ and $\boldsymbol{\gamma}$ in $\mathbf{V}_1$, which we now utilize in order to find expressions for $\mathbf{T}_1$ and $\mathbf{t}_2$. Recall that
$$\mathbf{V}_1 = \mathbf{Y}\mathbf{R}_1 = \mathbf{ABK}_1 + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_1 + (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1, \qquad (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1 \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u + \boldsymbol{\Sigma}_e, \mathbf{I}_m).$$
Inserting the expressions for $\boldsymbol{\gamma}$ and $\mathbf{B}$ into the model for $\mathbf{V}_1$ above yields
$$\begin{aligned} \mathbf{V}_1 ={}& \mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 - \frac{1}{p}\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 \\ &- \mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 + \mathbf{A}\mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)'\mathbf{K}_1 \\ &+ \frac{1}{p}\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_1 + \mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_1 + (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1. \end{aligned}$$
Set
$$\mathbf{V}_3 = \mathbf{Y}\mathbf{R}_1 - \mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 + \frac{1}{p}\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 - \frac{1}{p}\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_1,$$
with model equation
$$\mathbf{V}_3 = \mathbf{A}\mathbf{T}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)'\mathbf{K}_1 + \mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_1 - \mathbf{1}_p\mathbf{t}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1 + (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1,$$
or equivalently,
$$\mathbf{V}_3 = \mathbf{A}\mathbf{T}_1\tilde{\mathbf{C}}_1 + \mathbf{1}_p\mathbf{t}_2'\tilde{\mathbf{C}}_2 + \tilde{\mathbf{E}}, \tag{11}$$
where
$$\tilde{\mathbf{C}}_1 = \big((\mathbf{K}_2\mathbf{K}_2')^o\big)'\mathbf{K}_1, \qquad \tilde{\mathbf{C}}_2 = \big((\mathbf{PX}')^o\big)'\mathbf{XR}_1 - \big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-\mathbf{K}_1,$$
and $\tilde{\mathbf{E}} = (\mathbf{UZ} + \mathbf{E})\mathbf{R}_1$, with $\tilde{\mathbf{E}} \sim N_{p,m}(\mathbf{0}, \boldsymbol{\Sigma}_u + \boldsymbol{\Sigma}_e, \mathbf{I}_m)$.
Hence, the model obtained in (11) is an extended multivariate linear growth curve model with the nested subspace condition $\mathcal{C}(\mathbf{1}_p) \subseteq \mathcal{C}(\mathbf{A})$ on the within-individuals design matrices. One may refer to Filipiak and von Rosen (2012) for more details about the extended multivariate linear growth curve model and the corresponding maximum likelihood estimators.
Lemma 3.2 Let $\mathbf{V}_3$ be defined as in (11). The RMLEs of the parameters $\mathbf{T}_1$, $\mathbf{t}_2$ and $\boldsymbol{\Sigma}_u$ are given by
$$\hat{\mathbf{t}}_2 = (\mathbf{1}_p'\mathbf{S}_1^{-1}\mathbf{1}_p)^{-1}\big(\tilde{\mathbf{C}}_2\mathbf{Q}_{\tilde{\mathbf{C}}_1'}\tilde{\mathbf{C}}_2'\big)^-\tilde{\mathbf{C}}_2\mathbf{Q}_{\tilde{\mathbf{C}}_1'}\mathbf{V}_3'\mathbf{S}_1^{-1}\mathbf{1}_p + \big((\tilde{\mathbf{C}}_2\mathbf{Q}_{\tilde{\mathbf{C}}_1'})^o\big)'\mathbf{t}_{21}'\mathbf{1}_p, \tag{12}$$
$$\hat{\mathbf{T}}_1 = (\mathbf{A}'\mathbf{S}_2^{-1}\mathbf{A})^{-1}\mathbf{A}'\mathbf{S}_2^{-1}(\mathbf{V}_3 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)\tilde{\mathbf{C}}_1'(\tilde{\mathbf{C}}_1\tilde{\mathbf{C}}_1')^- + \mathbf{A}'\mathbf{T}_{11}\tilde{\mathbf{C}}_1^o, \tag{13}$$
$$\hat{\boldsymbol{\Sigma}}_u = \frac{1}{m}(\mathbf{V}_3 - \mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)(\mathbf{V}_3 - \mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)' - \boldsymbol{\Sigma}_e, \tag{14}$$
for an arbitrary vector $\mathbf{t}_{21}$ and an arbitrary matrix $\mathbf{T}_{11}$, where
$$\mathbf{S}_1 = \mathbf{V}_3\mathbf{Q}_{(\tilde{\mathbf{C}}_1' : \tilde{\mathbf{C}}_2')}\mathbf{V}_3', \qquad \mathbf{S}_2 = \mathbf{S}_1 + \mathbf{Q}_{1_p, \mathbf{S}_1^{-1}}\mathbf{V}_3\mathbf{P}_{\mathbf{Q}_{\tilde{\mathbf{C}}_1'}\tilde{\mathbf{C}}_2'}\mathbf{V}_3'\mathbf{Q}_{1_p, \mathbf{S}_1^{-1}}'.$$
The proof follows similarly to that of Theorem 1 in Filipiak and von Rosen (2012), in the particular case of two profiles.

Following Filipiak and von Rosen (2012), from model (11), $\hat{\mathbf{T}}_1$ and $\hat{\mathbf{t}}_2$ are unique if and only if the matrices $\tilde{\mathbf{C}}_1$ and $\tilde{\mathbf{C}}_2$ are of full rank, respectively, and $\mathcal{C}(\tilde{\mathbf{C}}_1) \cap \mathcal{C}(\tilde{\mathbf{C}}_2) = \{\mathbf{0}\}$. However, $\mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1$ and $\mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2$ are always unique, and hence $\hat{\boldsymbol{\Sigma}}_u$ given in (14) is unique.

With the above calculations, we are now ready to state a theorem which summarizes the main results about estimation in the formulated model.
Theorem 3.2 Consider the model given by (2). Then, the RMLEs of $\boldsymbol{\gamma}$, $\mathbf{B}$ and $\boldsymbol{\Sigma}_u$ can be expressed as
$$\hat{\boldsymbol{\gamma}} = \frac{1}{p}(\mathbf{PX}')^-\mathbf{PY}'\mathbf{1}_p + (\mathbf{PX}')^o\hat{\mathbf{t}}_2,$$
$$\hat{\mathbf{B}} = (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- - \frac{1}{p}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\mathbf{1}_p'\mathbf{Y}\mathbf{P}'(\mathbf{XP}')^-\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^-$$
$$\qquad - (\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'\mathbf{1}_p\hat{\mathbf{t}}_2'\big((\mathbf{PX}')^o\big)'\mathbf{XR}_2\mathbf{K}_2'(\mathbf{K}_2\mathbf{K}_2')^- + \hat{\mathbf{T}}_1\big((\mathbf{K}_2\mathbf{K}_2')^o\big)',$$
$$\hat{\boldsymbol{\Sigma}}_u = \frac{1}{m}(\mathbf{V}_3 - \mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)(\mathbf{V}_3 - \mathbf{A}\hat{\mathbf{T}}_1\tilde{\mathbf{C}}_1 - \mathbf{1}_p\hat{\mathbf{t}}_2'\tilde{\mathbf{C}}_2)' - \boldsymbol{\Sigma}_e,$$
where $\hat{\mathbf{t}}_2$ and $\hat{\mathbf{T}}_1$ are given by (12) and (13), respectively.

The theorem follows from relations (9), (10) and Lemma 3.2.
4 Prediction of random effects
The small area means in a given area $i$ are defined as the conditional means given the realized area effects (Battese et al., 1988). Thus, estimates of the random area effects are needed for the estimation of small area means. As pointed out by Pinheiro and Bates (2000), although technically the random effects are not model parameters, in some ways they do behave like parameters, and since they are unobservable we want to predict their values. The idea is to predict an unobservable random variable from some realized values. Robinson (1991) discusses the need for prediction of random effects and summarizes the theory of the best linear unbiased predictor (BLUP). Searle et al. (2009) present the theory of prediction of random variables. As stated by these authors, minimum variance is used for estimation of a fixed parameter, while minimum mean squared error is used for estimation of the realized value of a random variable. In this section, we use the approach developed by Henderson (1973) for the prediction of random effects, which consists of maximizing the joint density of the observable and the unobservable random variables.
Consider the model in (2) given by
$$\mathbf{Y} = \mathbf{ABHC} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X} + \mathbf{UZ} + \mathbf{E},$$
and maximize the joint density $f(\mathbf{Y}, \mathbf{U})$ with respect to $\mathbf{U}$, assuming the covariance matrices $\boldsymbol{\Sigma}_u$ and $\boldsymbol{\Sigma}_e$ to be known. We have
$$f(\mathbf{Y}, \mathbf{U}) = f(\mathbf{U})f(\mathbf{Y}|\mathbf{U}) = \lambda\exp\Big\{-\tfrac{1}{2}\mathrm{tr}\{\mathbf{U}'\boldsymbol{\Sigma}_u^{-1}\mathbf{U}\}\Big\}\exp\Big\{-\tfrac{1}{2}\mathrm{tr}\{\boldsymbol{\Sigma}_e^{-1}(\mathbf{Y} - \mathbf{ABHC} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X} - \mathbf{UZ})()'\}\Big\},$$
where $\lambda$ is a constant. Then, the estimating equation for $\mathbf{U}$ equals
$$\boldsymbol{\Sigma}_e^{-1}(\mathbf{Y} - \mathbf{ABHC} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X} - \mathbf{UZ})\mathbf{Z}' - \boldsymbol{\Sigma}_u^{-1}\mathbf{U} = \mathbf{0},$$
which is equivalent to
$$\boldsymbol{\Sigma}_e\boldsymbol{\Sigma}_u^{-1}\mathbf{U} + \mathbf{UZZ}' = (\mathbf{Y} - \mathbf{ABHC} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X})\mathbf{Z}',$$
and, since $\mathbf{ZZ}' = \mathbf{I}_m$, it follows that
$$(\boldsymbol{\Sigma}_e\boldsymbol{\Sigma}_u^{-1} + \mathbf{I}_p)\mathbf{U} = (\mathbf{Y} - \mathbf{ABHC} - \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X})\mathbf{Z}'.$$
Thus, we get the following theorem about the prediction of the random effects.

Theorem 4.1 Consider the model defined by (2). Then, the prediction of the random effects is given by
$$\hat{\mathbf{U}} = \big(\boldsymbol{\Sigma}_e\hat{\boldsymbol{\Sigma}}_u^{-1} + \mathbf{I}_p\big)^{-1}\big(\mathbf{Y} - \mathbf{A}\hat{\mathbf{B}}\mathbf{HC} - \mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{X}\big)\mathbf{Z}',$$
where $\hat{\boldsymbol{\gamma}}$, $\hat{\mathbf{B}}$ and $\hat{\boldsymbol{\Sigma}}_u$ are given in Theorem 3.2.
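As a numerical sanity check (toy dimensions of our own; the residual matrix Y − ABHC − 1pγ'X is replaced by a random stand-in), the predictor of Theorem 4.1 indeed solves Henderson's estimating equation when ZZ' = I_m:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, Ni = 3, 2, 2
N = m * Ni

# Z with orthonormal rows (ZZ' = I_m), as in the model
Z = np.zeros((m, N))
for i in range(m):
    Z[i, i * Ni:(i + 1) * Ni] = 1.0 / np.sqrt(Ni)

Sigma_u = np.array([[2.0, 0.5, 0.3],
                    [0.5, 1.5, 0.4],
                    [0.3, 0.4, 1.0]])
Sigma_e = 0.16 * np.eye(p)

Y = rng.standard_normal((p, N))  # stand-in for Y - ABHC - 1p gamma' X

# Predictor: U_hat = (Sigma_e Sigma_u^{-1} + I_p)^{-1} (residual) Z'
U_hat = np.linalg.solve(Sigma_e @ np.linalg.inv(Sigma_u) + np.eye(p), Y @ Z.T)

# Henderson's estimating equation:
# Sigma_e^{-1} (Y - U_hat Z) Z' - Sigma_u^{-1} U_hat = 0
lhs = (np.linalg.inv(Sigma_e) @ (Y - U_hat @ Z) @ Z.T
       - np.linalg.inv(Sigma_u) @ U_hat)
assert np.allclose(lhs, 0.0)
```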
5 Prediction of target small area means
We assume that the small area model holds for the sample data, which means that there is no sample-selection bias and that the sampling design is not informative. The model in (2) comprises, in theory, both sampled and non-sampled units. That is,
$$\mathbf{Y} = (\mathbf{Y}_1^{(s)}, \dots, \mathbf{Y}_m^{(s)}, \mathbf{Y}_1^{(r)}, \dots, \mathbf{Y}_m^{(r)}) : p \times N,$$
where $\mathbf{Y}_i^{(s)} = (\mathbf{y}_{i1}, \dots, \mathbf{y}_{in_i}) : p \times n_i$ represents the $n_i$ sampled observations from the $i$-th small area and $\mathbf{Y}_i^{(r)} = (\mathbf{y}_{i,n_i+1}, \dots, \mathbf{y}_{iN_i}) : p \times (N_i - n_i)$ corresponds to the $(N_i - n_i)$ non-sampled units from the $i$-th small area.
Then, split the sample $s_i$ into $s_{ig}$, $g = 1, \dots, k$, with corresponding sample sizes $n_{ig}$ for the $k$ groups. The target vector in small area $i$, whose elements are area means at each time point, is given by
$$\bar{\mathbf{y}}_i = f_i\bar{\mathbf{Y}}_i^{(s)} + (1 - f_i)\hat{\bar{\mathbf{Y}}}_i^{(r)},$$
where $\bar{\mathbf{Y}}_i^{(s)}$ is the vector of small area means corresponding to the sampled units, $\hat{\bar{\mathbf{Y}}}_i^{(r)}$ is the vector of predicted small area means corresponding to the non-sampled units, and $1 - f_i$ is the finite population correction factor, with $f_i = \frac{n_i}{N_i}$ the sampling fraction, that is, the fraction of the population that is sampled. Therefore,
$$\bar{\mathbf{y}}_i = \frac{f_i}{n_i}\mathbf{Y}_i^{(s)}\mathbf{1}_{n_i} + \frac{1 - f_i}{N_i - n_i}\hat{\mathbf{Y}}_i^{(r)}\mathbf{1}_{N_i - n_i} = \frac{1}{N_i}\Big(\mathbf{Y}_i^{(s)}\mathbf{1}_{n_i} + \hat{\mathbf{Y}}_i^{(r)}\mathbf{1}_{N_i - n_i}\Big),$$
where
$$\hat{\mathbf{Y}}_i^{(r)} = \mathbf{A}\hat{\mathbf{B}}\mathbf{C}_i + \mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{X}_i + \hat{\mathbf{u}}_i\mathbf{z}_i'. \tag{15}$$
It is convenient to note that $\hat{\boldsymbol{\gamma}}$, $\hat{\mathbf{B}}$ and $\hat{\mathbf{u}}_i$ used in (15) are computed from Theorem 3.2 and Theorem 4.1, respectively, using the observed data. Then, the target vector of small area means for each group $g$ across all time points is given by
$$\bar{\mathbf{y}}_{ig} = \frac{1}{N_{ig}}\Big(\mathbf{Y}_{ig}^{(s)}\mathbf{1}_{n_{ig}} + \hat{\mathbf{Y}}_{ig}^{(r)}\mathbf{1}_{N_{ig} - n_{ig}}\Big), \quad g = 1, \dots, k.$$
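The identity behind the weighted combination can be illustrated with a tiny numerical example (all sizes and values are our own stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
p, Ni, ni = 3, 10, 4                        # toy area size and sample size
fi = ni / Ni                                # sampling fraction

Ys = rng.standard_normal((p, ni))           # observed sampled units
Yr_hat = rng.standard_normal((p, Ni - ni))  # predictions for non-sampled units

# Weighted combination of sample mean and predicted non-sample mean ...
ybar = fi * Ys.mean(axis=1) + (1 - fi) * Yr_hat.mean(axis=1)
# ... equals pooling all N_i columns and averaging
ybar2 = (Ys.sum(axis=1) + Yr_hat.sum(axis=1)) / Ni
assert np.allclose(ybar, ybar2)
```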
Equivalently,
$$\bar{\mathbf{y}}_{ig} = \frac{1}{N_{ig}}\sum_{j \in s_{ig}}\mathbf{y}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\hat{\boldsymbol{\beta}}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\hat{\mathbf{u}}_i + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{e}_{ij},$$
where $\hat{\boldsymbol{\beta}}_g$ is the column of the estimated parameter matrix $\hat{\mathbf{B}}$ corresponding to group $g$, and the predicted vectors $\hat{\mathbf{u}}_i$ are the columns of the predicted matrix $\hat{\mathbf{U}}$. The first term on the right-hand side of this expression is known and, by the strong law of large numbers, if $N_{ig}$ is large, the last term is approximately equal to zero. Following Henderson (1975), the linear predictor of
$$\Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\boldsymbol{\beta}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\boldsymbol{\gamma}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{u}_i$$
is given by
$$\Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\hat{\boldsymbol{\beta}}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\hat{\mathbf{u}}_i.$$
Hence,
$$\bar{\mathbf{y}}_{ig} = \frac{1}{N_{ig}}\sum_{j \in s_{ig}}\mathbf{y}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\hat{\boldsymbol{\beta}}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\hat{\mathbf{u}}_i.$$
Note that the population means of the auxiliary variables $\mathbf{x}$ in area $i$ at time $t$ must be known, so that the non-sampled sum $\sum_{j \notin s_i}\mathbf{x}_{ijt}'$ can be obtained by subtracting the corresponding sample sum from the population total.

Theorem 5.1 Consider the multivariate linear regression model defined in (2). The target small area means at each time point are the elements of the vectors
$$\bar{\mathbf{y}}_i = \frac{1}{N_i}\Big(\mathbf{Y}_i^{(s)}\mathbf{1}_{n_i} + \hat{\mathbf{Y}}_i^{(r)}\mathbf{1}_{N_i - n_i}\Big), \quad\text{where}\quad \hat{\mathbf{Y}}_i^{(r)} = \mathbf{A}\hat{\mathbf{B}}\mathbf{C}_i + \mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{X}_i + \hat{\mathbf{u}}_i\mathbf{z}_i'.$$
The target small area means for each group across all time points are the elements of the vectors
$$\bar{\mathbf{y}}_{ig} = \frac{1}{N_{ig}}\sum_{j \in s_{ig}}\mathbf{y}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\mathbf{A}\hat{\boldsymbol{\beta}}_g + \frac{1}{N_{ig}}\sum_{j \notin s_{ig}}\mathbf{1}_p\hat{\boldsymbol{\gamma}}'\mathbf{x}_{ij} + \Big(1 - \frac{n_{ig}}{N_{ig}}\Big)\hat{\mathbf{u}}_i.$$
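The remark on the auxiliary variables amounts to simple bookkeeping: a toy sketch (our own values) recovering the non-sampled total from the known population mean and the sampled values:

```python
import numpy as np

rng = np.random.default_rng(2)
r, Ni, ni = 3, 8, 3                       # toy numbers of covariates and units
X_pop = rng.standard_normal((r, Ni))      # covariates of all N_i units
X_s = X_pop[:, :ni]                       # covariates of the sampled units

# Non-sampled total recovered from the known population mean:
# sum_{j not in s_i} x_ij = N_i * xbar_pop - sum_{j in s_i} x_ij
nonsampled_sum = Ni * X_pop.mean(axis=1) - X_s.sum(axis=1)
assert np.allclose(nonsampled_sum, X_pop[:, ni:].sum(axis=1))
```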
6 Simulation study
In this section, we present a simulation study in which we consider a survey repeated at four time points on a population comprising 8 small areas, and draw a sample of size n = 450 with the small area sample sizes given in Table 1.
Table 1: Sample sizes
Area    Group 1     Group 2     Group 3     Total
1       n11 = 12    n12 = 18    n13 = 16    n1 = 46
2       n21 = 21    n22 = 23    n23 = 12    n2 = 56
3       n31 = 10    n32 = 20    n33 = 15    n3 = 45
4       n41 = 16    n42 = 24    n43 = 17    n4 = 57
5       n51 = 24    n52 = 26    n53 = 21    n5 = 71
6       n61 = 20    n62 = 12    n63 = 28    n6 = 60
7       n71 = 27    n72 = 13    n73 = 14    n7 = 54
8       n81 = 20    n82 = 14    n83 = 27    n8 = 61
m = 8   g1 = 150    g2 = 150    g3 = 150    n = 450
We assume that we have $r = 3$ covariates. The design matrices are chosen as
$$\mathbf{A} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, \qquad \mathbf{C} = \begin{pmatrix} \mathbf{C}_1 & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \mathbf{C}_8 \end{pmatrix}$$
for
$$\mathbf{C}_i = \Big(\mathbf{1}_{n_{i1}}' \otimes \begin{pmatrix}1\\0\\0\end{pmatrix} : \mathbf{1}_{n_{i2}}' \otimes \begin{pmatrix}0\\1\\0\end{pmatrix} : \mathbf{1}_{n_{i3}}' \otimes \begin{pmatrix}0\\0\\1\end{pmatrix}\Big), \quad i = 1, \dots, 8.$$
The parameter matrices, the sampling variance and the covariance matrix of the random effects are
$$\mathbf{B} = \begin{pmatrix} 8 & 10 & 12 \\ 9 & 11 & 13 \end{pmatrix}, \quad \boldsymbol{\gamma} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \sigma_e^2 = 0.16 \quad\text{and}\quad \boldsymbol{\Sigma}_u = \begin{pmatrix} 4.1 & 1.8 & 1.2 & 2.4 \\ 1.8 & 3.6 & 2.4 & 1.4 \\ 1.2 & 2.4 & 6.0 & 2.2 \\ 2.4 & 1.4 & 2.2 & 9.6 \end{pmatrix}.$$
Then, the data are randomly generated from $\mathrm{vec}(\mathbf{Y}) \sim N_{pn}\big(\mathrm{vec}(\mathbf{ABHC} + \mathbf{1}_p\boldsymbol{\gamma}'\mathbf{X}), \boldsymbol{\Sigma}\big)$ using MATLAB Version 8.3.0.532 (The MathWorks, Inc., USA), where the matrix of covariates $\mathbf{X}$ is randomly generated from the uniform distribution on the interval $[0, 1]$ and then taken as fixed.
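For readers without MATLAB, here is a sketch in Python/NumPy of how data could be generated from model (2). The equal group sample sizes and the normalization $z_i = (1/\sqrt{n_i})\mathbf{1}_{n_i}$ (so that $ZZ' = I_m$) are our own simplifications of the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy version of the simulation setup (smaller than the paper's n = 450)
p, q, k, m, nig = 4, 2, 3, 8, 5              # nig: equal group sample sizes
ni = k * nig
n = m * ni

A = np.vander(np.arange(1, p + 1), N=q, increasing=True).astype(float)
B = np.array([[8., 10., 12.], [9., 11., 13.]])
gamma = np.array([1., 2., 3.])
r = gamma.size
sigma2_e = 0.16
Sigma_u = np.array([[4.1, 1.8, 1.2, 2.4],
                    [1.8, 3.6, 2.4, 1.4],
                    [1.2, 2.4, 6.0, 2.2],
                    [2.4, 1.4, 2.2, 9.6]])

H = np.hstack([np.eye(k)] * m)                          # k x mk
Ci = np.kron(np.eye(k), np.ones((1, nig)))              # k x ni (one area)
C = np.kron(np.eye(m), Ci)                              # mk x n
Z = np.kron(np.eye(m), np.ones((1, ni)) / np.sqrt(ni))  # m x n, ZZ' = I_m

X = rng.uniform(0.0, 1.0, size=(r, n))                  # covariates, then fixed
L = np.linalg.cholesky(Sigma_u)
U = L @ rng.standard_normal((p, m))                     # N_{p,m}(0, Sigma_u, I_m)
E = np.sqrt(sigma2_e) * rng.standard_normal((p, n))     # N_{p,n}(0, s2_e I_p, I_n)

Y = A @ B @ H @ C + np.outer(np.ones(p), gamma @ X) + U @ Z + E
assert Y.shape == (p, n)
```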
The simulations were repeated 500 times using the formulas presented in Theorem 3.2, and the following average estimates were obtained:
$$\hat{\mathbf{B}} = \begin{pmatrix} 8.0226 & 9.9728 & 11.9997 \\ 9.0551 & 11.0002 & 13.0002 \end{pmatrix}, \quad \hat{\boldsymbol{\gamma}} = \begin{pmatrix} 0.9681 \\ 1.9743 \\ 3.0222 \end{pmatrix},$$
$$\hat{\boldsymbol{\Sigma}}_u = \begin{pmatrix} 4.1683 & 1.8835 & 1.2804 & 2.5056 \\ 1.8835 & 3.6705 & 2.4316 & 1.4471 \\ 1.2804 & 2.4316 & 5.9460 & 2.1355 \\ 2.5056 & 1.4471 & 2.1355 & 9.3509 \end{pmatrix}.$$
From the above simulations, we see that the average estimates are close to the true values, and thus the proposed estimators support the theoretical results.
7 Concluding remarks
The main task of the present paper has been the prediction of small area means for repeated measures data using a model-based approach within the SAE framework. We have considered longitudinal surveys under simple random sampling without replacement, repeated over time, whose target population is divided into non-overlapping groups present in all small areas.
In order to address the problem of SAE under these settings, we have proposed a multivariate linear regression model that borrows strength across both small areas and over time. This model accounts for repeated measures data, grouped individuals and random effects varying over time. The estimation of the model parameters has been discussed within a restricted maximum likelihood based approach. The model is split into three components, and some algebraic transformations are performed so that each component follows a matrix normal distribution, from which explicit restricted maximum likelihood estimators are derived. Prediction of small area means is presented at all time points, at each time point and by group units. These theoretical results have also been illustrated in a simulation study.
In future work we wish to study the properties of all proposed estimators, in particular the moments and approximate distributions of the estimators for this SAE multivariate linear regression model, as well as mean squared error estimation. We also wish to apply the results of this work to a specific real data set.
A Appendix

A.1 Proof of Lemma 3.1
Proof A.1 Obviously, the likelihood equation (7) admits a unique solution for the parameter vector $\boldsymbol{\gamma}$ if the matrix $\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{X}'$ is of full rank. We have $\mathrm{rank}(\mathbf{C}'^o(\mathbf{C}'^o)') = N - mk$ (with $N > mk$) and $\mathrm{rank}(\mathbf{R}_2\mathbf{R}_2') = mk - m$. Moreover,
$$\mathbf{C}'^o(\mathbf{C}'^o)'\mathbf{R}_2\mathbf{R}_2' = \mathbf{R}_2\mathbf{R}_2'\mathbf{C}'^o(\mathbf{C}'^o)' = \mathbf{0},$$
so $\mathcal{C}(\mathbf{C}'^o(\mathbf{C}'^o)') \cap \mathcal{C}(\mathbf{R}_2\mathbf{R}_2') = \{\mathbf{0}\}$ and then
$$\mathrm{rank}\big(\mathbf{C}'^o(\mathbf{C}'^o)' + \mathbf{R}_2\mathbf{R}_2'\big) = \mathrm{rank}(\mathbf{C}'^o(\mathbf{C}'^o)') + \mathrm{rank}(\mathbf{R}_2\mathbf{R}_2') = N - m.$$
Therefore,
$$\mathrm{rank}\big(\mathbf{XC}'^o(\mathbf{C}'^o)'\mathbf{X}' + \mathbf{XR}_2\mathbf{R}_2'\mathbf{X}'\big) = \mathrm{rank}\big(\mathbf{X}(\mathbf{C}'^o(\mathbf{C}'^o)' + \mathbf{R}_2\mathbf{R}_2')\mathbf{X}'\big) = \mathrm{rank}(\mathbf{X}),$$
provided that $\mathrm{rank}(\mathbf{X}) \le N - m$.
The likelihood equation (8) is equivalent to
$$\mathbf{A}'\mathbf{ABK}_2\mathbf{K}_2' = \mathbf{A}'\mathbf{Y}\mathbf{R}_2\mathbf{K}_2' - \mathbf{A}'\mathbf{1}_p\boldsymbol{\gamma}'\mathbf{XR}_2\mathbf{K}_2'.$$
This equation in $\mathbf{B}$ admits a non-unique solution if and only if one or both of the matrices $\mathbf{A}$ and $\mathbf{K}_2$ are not of full rank. Since $\mathbf{A}$ is a full rank matrix, we need to show that the matrix $\mathbf{K}_2 = \mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2$ is not of full rank. It follows from the construction of $\boldsymbol{\Gamma}_2$ that $\mathcal{C}(\boldsymbol{\Gamma}_2) = \mathcal{C}\big((\mathbf{CC}')^{1/2}(\mathbf{CZ}')^o\big)$. But for any matrices of proper sizes, if $\mathcal{C}(\mathbf{F}) = \mathcal{C}(\mathbf{G})$, then $\mathcal{C}(\mathbf{EF}) = \mathcal{C}(\mathbf{EG})$ (see, for example, Harville (1997) for more details). Therefore,
$$\mathcal{C}\big(\mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2\big) = \mathcal{C}\big(\mathbf{HCC}'(\mathbf{CZ}')^o\big).$$
Moreover, if two matrices have the same column space, then they have the same rank. Using this result and the rank formula $\mathrm{rank}(\mathbf{F} : \mathbf{G}) = \mathrm{rank}(\mathbf{F}'\mathbf{G}^o) + \mathrm{rank}(\mathbf{G})$, which can be found, for example, in Kollo and von Rosen (2005), and since $\mathbf{Z}' = \mathbf{C}'\mathbf{Q}$ for
$$\mathbf{Q} = \begin{pmatrix} \frac{1}{\sqrt{N_1}}\mathbf{1}_k & & \mathbf{0} \\ & \ddots & \\ \mathbf{0} & & \frac{1}{\sqrt{N_m}}\mathbf{1}_k \end{pmatrix},$$
it follows that
$$\begin{aligned} \mathrm{rank}\big(\mathbf{H}(\mathbf{CC}')^{1/2}\boldsymbol{\Gamma}_2\big) &= \mathrm{rank}\big(\mathbf{HCC}'(\mathbf{CZ}')^o\big) = \mathrm{rank}(\mathbf{CC}'\mathbf{H}' : \mathbf{CC}'\mathbf{Q}) - \mathrm{rank}(\mathbf{CC}'\mathbf{Q}) \\ &= \mathrm{rank}(\mathbf{H}' : \mathbf{Q}) - \mathrm{rank}(\mathbf{Q}) \\ &= \mathrm{rank}(\mathbf{H}') + \mathrm{rank}(\mathbf{Q}) - \dim\big(\mathcal{C}(\mathbf{H}') \cap \mathcal{C}(\mathbf{Q})\big) - \mathrm{rank}(\mathbf{Q}) \\ &= \mathrm{rank}(\mathbf{H}') - \dim\big(\mathcal{C}(\mathbf{H}') \cap \mathcal{C}(\mathbf{Q})\big), \end{aligned}$$
since the matrix $\mathbf{CC}'$ is of full rank, and where $\dim$ denotes the dimension of a subspace. It remains to show that $\mathcal{C}(\mathbf{H}')$ and $\mathcal{C}(\mathbf{Q})$ are not disjoint. Let $\mathbf{v}_1 = \mathbf{1}_k$ and $\mathbf{v}_2 = (\sqrt{N_1}, \dots, \sqrt{N_m})'$, and recall that $\mathbf{H} = [\mathbf{I}_k : \cdots : \mathbf{I}_k]$. Then we have $\mathbf{H}'\mathbf{v}_1 = \mathbf{1}_{mk}$ and $\mathbf{Qv}_2 = \mathbf{1}_{mk}$. Thus, the two spaces are not disjoint since they contain a common vector. This completes the proof of the lemma.
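The final step of the proof is easy to verify numerically; a toy sketch (arbitrary area sizes of our own) checking that H'v1 = Qv2 = 1_mk:

```python
import numpy as np

m, k = 3, 4
N_area = np.array([5.0, 7.0, 11.0])           # toy area sizes N_1,...,N_m

H = np.hstack([np.eye(k)] * m)                # k x mk
# Q : mk x m, block diagonal with blocks (1/sqrt(N_i)) 1_k
Q = np.kron(np.diag(1.0 / np.sqrt(N_area)), np.ones((k, 1)))

v1 = np.ones(k)
v2 = np.sqrt(N_area)
assert np.allclose(H.T @ v1, np.ones(m * k))  # H'v1 = 1_mk
assert np.allclose(Q @ v2, np.ones(m * k))    # Q v2 = 1_mk
```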
Acknowledgements
The research of Innocent Ngaruye is supported by the Swedish International Development Cooperation Agency (SIDA) in collaboration with the University of Rwanda. Both institutions are acknowledged.
References
Battese, G. E., Harter, R. M., and Fuller, W. A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83(401):28–36.
Chambers, R. and Clark, R. G. (2012). An Introduction to Model-Based Survey Sampling with Applications. Oxford University Press, Oxford.
Consortium, T. E. (2004). Enhancing small area estimation techniques to meet European needs. Technical report, Office for National Statistics, London.
Datta, G. S., Lahiri, P., Maiti, T., and Lu, K. L. (1999). Hierarchical Bayes estimation of unemployment rates for the states of the U.S. Journal of the American Statistical Association, 94(448):1074–1082.
Filipiak, K. and von Rosen, D. (2012). On MLEs in an extended multivariate linear growth curve model. Metrika, 75(8):1069–1092.
Ghosh, M. and Rao, J. N. K. (1994). Small area estimation: an appraisal. Statistical Science, 9(1):55–76.
Harville, D. A. (1997). Matrix Algebra from a Statistician’s Perspective,
volume 157. Springer.
Henderson, C. R. (1973). Sire evaluation and genetic trends. Journal of Animal Science, 1973(Symposium):10–41.
Henderson, C. R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31(2):423–447.
Kollo, T. and von Rosen, D. (2005). Advanced Multivariate Statistics with Matrices, volume 579. Springer, Dordrecht.
Nissinen, K. (2009). Small Area Estimation with Linear Mixed Models from Unit-Level Panel and Rotating Panel Data. PhD thesis, University of Jyväskylä.
Pfeffermann, D. (2002). Small area estimation: new developments and directions. International Statistical Review, 70(1):125–143.
Pfeffermann, D. (2013). New important developments in small area estimation. Statistical Science, 28(1):40–68.
Pinheiro, J. C. and Bates, D. M. (2000). Mixed-Effects Models in S and S-plus. Springer-Verlag, New York.
Rao, J. N. K. (2003). Small Area Estimation, volume 331. John Wiley and Sons, New York.
Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects. Statistical Science, 6(2):15–32.
Searle, S. R., Casella, G., and McCulloch, C. E. (2009). Variance Components, volume 391. John Wiley and Sons, New York.
Singh, B. and Sisodia, B. S. V. (2011). Small area estimation in longitudinal surveys. Journal of Reliability and Statistical Studies, 4(2):83–91.