Department of Statistics

Efficiency Comparison of Distribution-Free Transformations

in the Straight-Line Regression Problem

By

Ling Zhang

Supervised by

Prof. Rolf Larsson

Submitted in partial fulfillment of the requirements for the degree of

Master of Social Science

May, 2010

ABSTRACT

In statistical inference for the distribution-free straight-line regression problem, two common transformations, the rank transformation and the sign transformation, are used to construct test statistics. When do we need to use the transformations, and which transformation is more efficient? These are two common questions met by researchers. In this thesis, we compare the efficiencies of the statistics before and after the rank transformation or the sign transformation, both theoretically and practically. Simulation is also used to compare the efficiencies of the statistics under different distributions. Based on the conclusions drawn from this work, we give some recommendations about when to use a transformation and which one to choose.

Key Words: Rank Transformation, Sign Transformation, Efficiency, Efficacy, Distribution-Free Straight-Line Regression.

1. Introduction

In the linear regression problem, a number of parametric techniques are available for inference about the parameters, and parametric methods always rest on some basic assumptions. Taking the simple linear regression problem as an example, the ordinary least squares (OLS) method, which is equivalent to the maximum likelihood (ML) method in this case, is often used. The classical assumption for the simple linear regression model is that the residuals are independently, identically and normally distributed with mean 0. Sometimes the classical assumption is not satisfied, for example because the residuals are not normally distributed. It may then help to apply some parametric transformation to the variables before the regression analysis. However, if the classical assumption does not hold even after the transformation, what should we do? An alternative is to consider distribution-free regression and the distribution-free transformations.

Distribution-free methods are also known as nonparametric methods. Here, I prefer "distribution-free methods" to "nonparametric methods", since "distribution-free methods" tells us more precisely that the methods do not rely on any assumptions about the probability distribution of the data sets.

In distribution-free statistical methods, two common transformations are the rank transformation and the sign transformation. Suppose $x_1, x_2, \ldots, x_n$ are observations of the random variables $X_1, X_2, \ldots, X_n$. The rank transformation is based on the rank function, i.e.

$$\operatorname{rank}\left(x_{(i)}\right) = i, \qquad i = 1, 2, \ldots, n$$

where $x_{(1)} < x_{(2)} < \ldots < x_{(n)}$ are the ordered $x_1, x_2, \ldots, x_n$. For a test statistic $T(\mathbf{X})$, where $\mathbf{X} = (X_1, X_2, \ldots, X_n)$, according to the rank transformation, we can construct the rank test statistic by substituting the variables $X_1, X_2, \ldots, X_n$ with their ranks $\operatorname{rank}(X_1), \operatorname{rank}(X_2), \ldots, \operatorname{rank}(X_n)$.

Similarly, the sign transformation is defined by the sign function, i.e.

$$\operatorname{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases}$$

Then we can construct the sign test statistic with the signs $\operatorname{sgn}(X_1), \operatorname{sgn}(X_2), \ldots, \operatorname{sgn}(X_n)$ instead of the random variables $X_1, X_2, \ldots, X_n$. From the construction of the two transformed statistics, we can see that we need no assumption of any particular distribution and no assumption about the parameters of the probability distributions.
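As a small illustration (not taken from the thesis; the numbers are arbitrary), both transformations are available directly in R, the language used for the simulations in the appendix:

# Illustration only: how the two transformations act on a small sample
x <- c(2.3, -0.7, 1.1, -1.8, 0.4)
rank(x)   # rank transformation: 5 2 4 1 3
sign(x)   # sign transformation:  1 -1  1 -1  1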

Author e-mail: Ling.Zhang.5841@student.uu.se


In the statistical inference of distribution-free regression, one of the problems of interest is the change of the power or efficiency of the estimator after transformation. From the changes, we will know whether the transformations are useful for statistical inference. We will also know which transformation is better than the others.

In distribution-free statistics, efficiency is often used for the discussion of different kinds of transformed statistics. Efficacy, closely related to efficiency, is another term used for the comparison of different test statistics, introduced by Maritz (1981). He compared the efficacies of the Wilcoxon signed-rank, median, mean and maximum likelihood (ML) statistics in the one-sample location problem under different distributions. He concluded that, in the normal case, the efficacy of the Wilcoxon signed-rank statistic is close to that of the ML statistic. He also concluded that, the better the efficacy of the Wilcoxon signed-rank statistic, the more satisfactory the efficacy of the median statistic. In his conclusions, however, Maritz did not mention the effects of the two common transformations on the efficiency. Nor did he compare the efficacies of the different transformed statistics in the two-sample location problem or the regression problem.

Stuart (1954) discussed the efficiencies of some distribution-free tests of randomness against normal alternatives, and gave some comments on the choice of these test statistics. Iman (1974) suggested using the rank transformation for the two-way classification model when interaction may be present. He also pointed out that, when the data are normally distributed, the loss of power for the rank transformation is small. From Iman (1974), we know that, for the rank transformation based on a normally distributed data set, there is a small loss of power. However, none of these authors discussed both the change of the power or efficiency after the sign transformation and the difference in the efficiency change between the two different transformation methods.

This thesis concentrates on the two kinds of transformed statistics in the distribution-free regression problem, and compares the changes of the efficiencies after the two transformations. The aim of this thesis is to find the difference between the effects of the two transformations from both theoretical and practical points of view. In the second section, the basic definitions are introduced. In the third section, we compare the efficiencies of the transformed statistics in the distribution-free straight-line regression problem theoretically. Simulations are done to compare the efficiencies practically in the fourth section. The fifth section shows the results of the empirical examples. Finally, in the conclusion section, we compare the results obtained in sections three, four and five, and give the conclusions and some recommendations.

2. Efficiency, ARE and Efficacy

Before carrying out the comparison of the transformations, we should know some important concepts related to the comparison. We have mentioned the concepts efficiency and efficacy in the introduction, and we also need to know what they are and how to measure them.

2.1 Efficiency of the Estimator and ARE

Efficiency is a measure of the desirability of an estimator. It is often used to make a choice between estimators by a quantitative comparison. Relative efficiency, the ratio of two efficiencies, is a relative term used to compare two distinct statistical tests of a certain statistical hypothesis. For example, suppose $y_1, y_2, \ldots, y_n$ are observations of a sample $Y_1, Y_2, \ldots, Y_n$, and the sample comes from a distribution with parameter $\theta$. We want to test the null hypothesis $H_0: \theta = \theta_0$ against an alternative hypothesis $H_1$. There are two different test statistics $T_A(\mathbf{Y}; \theta)$ and $T_B(\mathbf{Y}; \theta)$ that can be used in the test, where $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)$. Which one shall we choose? Define the two tests with statistics $T_A(\mathbf{Y}; \theta)$ and $T_B(\mathbf{Y}; \theta)$ as TEST A and TEST B, respectively. If we fix the significance level and the statistical power, we can calculate the number of observations that each test needs. Denote this number $a$ for TEST A and $b$ for TEST B. The relative efficiency of TEST A with respect to TEST B is defined as $e_{AB} = b/a$, that is, the ratio of the numbers of observations needed when both statistics test the same null hypothesis $H_0: \theta = \theta_0$ against the same alternative hypothesis $H_1$ at the same significance level (both either one-tailed or two-tailed) and reach the same statistical power.

Pitman (1948) fixed the significance level and the statistical power, and took the limit of $e_{AB}$ as the alternative approaches the null, which gives the Pitman asymptotic relative efficiency (ARE). Similarly, Bahadur (1960) considered the case where the statistical power and the parameter $\theta$ are fixed, and took the limit of $e_{AB}$ as the significance level tends to 0, while Hodges and Lehmann (1956) fixed the significance level and the parameter $\theta$, and took the limit as the statistical power tends to 1. In this way, different kinds of asymptotic relative efficiencies have been developed. In this thesis, we only consider the Pitman asymptotic relative efficiency (ARE).

We have discussed the efficiency and the relative efficiency theoretically. In practice, how do we calculate them? Casella and Berger (2002) mentioned that efficiency is concerned with the asymptotic variance of an estimator. According to Maritz (1981), the efficiency of an estimator $\hat{\theta}$ is measured by its variance $\operatorname{Var}(\hat{\theta})$. So, a common measurement of the efficiency is the variance of the estimator: the smaller this variance, the more efficient the estimator. In order to get the efficiency of the estimator, we just need to calculate its variance, and the relative efficiency of two estimators is the ratio of their variances.

2.2 Efficacy of the Test Statistic

Maritz (1981) also introduced another term, efficacy: the absolute value of the first derivative, at $\theta_0$, of the expectation of the test statistic $T(\mathbf{Y}; \theta)$ standardized with respect to its standard deviation under the null hypothesis $H_0: \theta = \theta_0$, i.e.

$$e_T(\theta_0) = \left| \frac{\partial}{\partial \theta} E\!\left[ \frac{T(\mathbf{Y}; \theta)}{\sqrt{\operatorname{Var}[T(\mathbf{Y}; \theta_0)]}} \right] \right|_{\theta = \theta_0} = \frac{\left| \dfrac{\partial E[T(\mathbf{Y}; \theta)]}{\partial \theta} \right|_{\theta = \theta_0}}{\sqrt{\operatorname{Var}[T(\mathbf{Y}; \theta_0)]}} \qquad (2.1)$$

For the test statistic $T(\mathbf{Y}; \theta)$, we can view $e_T(\theta_0)$ as the slope at $\theta_0$, which is a natural measure of the efficiency of $T(\mathbf{Y}; \theta)$.

According to Maritz (1981), the variance of the estimator $\hat{\theta}$ is approximately

$$\operatorname{Var}_T(\hat{\theta}) \approx \frac{\operatorname{Var}[T(\mathbf{Y}; \theta_0)]}{\left( \dfrac{\partial E[T(\mathbf{Y}; \theta)]}{\partial \theta}\Big|_{\theta = \theta_0} \right)^2} \qquad (2.2)$$

From formulae (2.1) and (2.2), it is obvious that

$$\operatorname{Var}_T(\hat{\theta}) \approx [e_T(\theta_0)]^{-2} \qquad (2.3)$$

i.e. the efficiency of the estimate of $\theta_0$ approximately equals the reciprocal of the square of the efficacy $e_T(\theta_0)$. So, we can approximately calculate the efficiency of the estimator $\hat{\theta}$ from the efficacy of the test statistic $T(\mathbf{Y}; \theta)$.
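As a quick sanity check of (2.1)-(2.3), not taken from the thesis, consider the one-sample mean-type statistic $T(\mathbf{Y}; \theta_0) = \sum_{j=1}^{n} (Y_j - \theta_0)$ with $Y_1, \ldots, Y_n$ i.i.d. with mean $\theta$ and variance $\sigma^2$. Then

$$E[T(\mathbf{Y}; \theta_0)] = n(\theta - \theta_0), \qquad \operatorname{Var}[T(\mathbf{Y}; \theta_0)] = n\sigma^2, \qquad e_T(\theta_0) = \frac{n}{\sqrt{n\sigma^2}} = \frac{\sqrt{n}}{\sigma},$$

so (2.3) gives $\operatorname{Var}_T(\hat{\theta}) \approx \sigma^2 / n$, the familiar variance of the sample mean $\bar{Y}$, which is the estimator solving $T(\mathbf{Y}; \hat{\theta}) = 0$.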

Pitman (1948) gave the ARE of test statistic $T_A(\mathbf{Y}; \theta)$ with respect to the test statistic $T_B(\mathbf{Y}; \theta)$, i.e.

$$\operatorname{ARE}(T_A, T_B) = \frac{\left( \dfrac{\partial E[T_B(\mathbf{Y}; \theta)]}{\partial \theta}\Big|_{\theta = \theta_0} \right)^2}{\operatorname{Var}[T_B(\mathbf{Y}; \theta_0)]} \cdot \frac{\operatorname{Var}[T_A(\mathbf{Y}; \theta_0)]}{\left( \dfrac{\partial E[T_A(\mathbf{Y}; \theta)]}{\partial \theta}\Big|_{\theta = \theta_0} \right)^2} \qquad (2.4)$$

From formulae (2.1) - (2.4), it is also obvious that

$$\operatorname{ARE}(T_A, T_B) = \frac{\operatorname{Var}_{T_A}(\hat{\theta})}{\operatorname{Var}_{T_B}(\hat{\theta})} \approx \left( \frac{e_{T_B}(\theta_0)}{e_{T_A}(\theta_0)} \right)^2 \qquad (2.5)$$

The first equality is the definition of ARE. The relation between the efficacy and the ARE is that the ratio of the squared efficacy of statistic $T_B(\mathbf{Y}; \theta)$ to that of statistic $T_A(\mathbf{Y}; \theta)$ approximately equals the ARE of $T_A(\mathbf{Y}; \theta)$ with respect to $T_B(\mathbf{Y}; \theta)$. Maritz (1981) gave the same conclusion. Then, as mentioned in the introduction, efficacy can also be used to compare efficiencies.

3. Efficiency Comparison in the Distribution-Free Straight-Line Regression Problem

In the following sections of this thesis, we will discuss the efficiency comparison of the transformed test statistics in distribution-free regression problems. Without loss of generality, we talk about the distribution-free straight-line regression problem.

3.1 Straight-line Regression Model

Suppose $Y_1, Y_2, \ldots, Y_n$ are independent continuous random variables that are observed at the values $x_1, x_2, \ldots, x_n$ of a nonrandom variable $X$. The cumulative distribution function and the density function of $Y_j$ $(j = 1, 2, \ldots, n)$ are denoted $F_j(y)$ and $f_j(y)$, respectively. We assume that the $F_j(y)$ are identical distributions except for location. $\theta_j$ is a location parameter of $Y_j$, and we take $\theta_j$ as the median of $Y_j$ in this problem. Then the straight-line regression can be specified by formula (3.1),

$$\operatorname{Median}(Y_j) = \theta_j = \alpha + \beta x_j \qquad (3.1)$$

where $\alpha$ and $\beta$ are parameters.

Now, we consider parameter estimation and inference. According to the assumption, the $F_j(y)$ are identical except for location, so we can view this regression problem as an extension of the two-sample location problem. We also get that the $Y_j - \theta_j$ are identically distributed. Define

$$D_j(\beta) = Y_j - \beta x_j \qquad (3.2)$$

Since the $Y_j - \theta_j = Y_j - \alpha - \beta x_j$ are identically distributed, and $D_j(\beta)$ differs from $Y_j - \theta_j$ only by the constant $\alpha$, the differences $D_j(\beta)$ defined in formula (3.2) are also identically distributed. What is more, for inference based on the $D_j(\beta)$, we just need to concentrate on the parameter $\beta$.

3.2 Basic Test Statistic and Efficacy Calculation

From the simple linear regression model $y_j = \alpha + \beta x_j + \varepsilon_j$ $(j = 1, 2, \ldots, n)$, where $\varepsilon_j \overset{i.i.d.}{\sim} N(0, \sigma^2)$ is the error term, we know that $\sum_{j=1}^{n} x_j \hat{\varepsilon}_j = \sum_{j=1}^{n} x_j (y_j - \hat{\alpha} - \hat{\beta} x_j) = 0$. Similarly, in order to test the null hypothesis $H_0: \beta = \beta_0$ against an alternative hypothesis $H_1$ in a distribution-free straight-line regression model with $E(Y_j) = \alpha + \beta x_j$, we use the basic test statistic shown in (3.3), built from the differences $D_j(\beta)$, where $\sum_{j=1}^{n} x_j = 0$ is assumed,

$$T(\beta) = \sum_{j=1}^{n} x_j D_j(\beta) \qquad (3.3)$$

From

$$T(\hat{\beta}) = \sum_{j=1}^{n} x_j (y_j - \hat{\beta} x_j) = \sum_{j=1}^{n} x_j y_j - \hat{\beta} \sum_{j=1}^{n} x_j^2 = 0$$

we get

$$\hat{\beta} = \frac{\sum_{j=1}^{n} x_j y_j}{\sum_{j=1}^{n} x_j^2} \qquad (3.4)$$

Then

$$\operatorname{Var}_T(\hat{\beta}) = \frac{\operatorname{Var}\left( \sum_{j=1}^{n} x_j y_j \right)}{\left( \sum_{j=1}^{n} x_j^2 \right)^2} = \frac{\sum_{j=1}^{n} x_j^2 \operatorname{Var}(y_j)}{\left( \sum_{j=1}^{n} x_j^2 \right)^2} = \frac{\sigma^2}{\sum_{j=1}^{n} x_j^2} \qquad (3.5)$$

So, the efficiency of the estimate of $\beta$ based on the statistic $T$ is $\sigma^2 \big/ \sum_{j=1}^{n} x_j^2$. Since, using $\sum_{j=1}^{n} x_j = 0$,

$$\frac{\partial E[T(\beta_0)]}{\partial \beta}\Big|_{\beta = \beta_0} = \frac{\partial \sum_{j=1}^{n} x_j \left[ E(Y_j) - \beta_0 x_j \right]}{\partial \beta}\Big|_{\beta = \beta_0} = \frac{\partial \sum_{j=1}^{n} x_j (\alpha + \beta x_j - \beta_0 x_j)}{\partial \beta}\Big|_{\beta = \beta_0} = \frac{\partial (\beta - \beta_0) \sum_{j=1}^{n} x_j^2}{\partial \beta}\Big|_{\beta = \beta_0} = \sum_{j=1}^{n} x_j^2 \qquad (3.6)$$

and

$$\operatorname{Var}[T(\beta_0)] = \sum_{j=1}^{n} x_j^2 \operatorname{Var}(Y_j) = \sigma^2 \sum_{j=1}^{n} x_j^2 \qquad (3.7)$$

then, from formulae (2.1), (3.6) and (3.7), we obtain

$$e(T) = \frac{\left| \dfrac{\partial E[T(\mathbf{Y}; \beta)]}{\partial \beta} \right|_{\beta = \beta_0}}{\sqrt{\operatorname{Var}[T(\mathbf{Y}; \beta_0)]}} = \frac{\sum_{j=1}^{n} x_j^2}{\sqrt{\sigma^2 \sum_{j=1}^{n} x_j^2}} = \frac{\sqrt{\sum_{j=1}^{n} x_j^2}}{\sigma} \qquad (3.8)$$

That is, the efficacy of the statistic $T = \sum_{j=1}^{n} x_j D_j(\beta)$ is $e(T) = \sqrt{\sum_{j=1}^{n} x_j^2} \big/ \sigma$.

As discussed around formula (2.3) of section 2.2, from formulae (3.5) and (3.8) we can see that

$$\operatorname{Var}_T(\hat{\beta}) = [e(T)]^{-2} \qquad (3.9)$$

so, for the statistic $T$, the equality sign holds exactly. We can also express the relation between the efficiency of the estimator $\hat{\beta}$ based on the statistic $T$ and the efficacy of $T$ in the following way,

$$\operatorname{Var}_T(\hat{\beta}) \propto \frac{1}{e(T)^2} \qquad (3.10)$$

i.e. the efficiency is inversely proportional to the square of the efficacy.
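As an illustration (a minimal sketch, not the thesis code; the data here are simulated, and var(y) is used as a plug-in for the unknown sigma^2, as in the appendix code), the quantities in (3.4) and (3.8)-(3.9) can be computed directly in R:

# Sketch: estimator and efficacy of the untransformed statistic T
set.seed(1)
x <- runif(100, -3, 3); x <- x - mean(x)   # centred regressor, so sum(x) = 0
y <- rnorm(100)                            # dependent variable, as in the simulations of section 4
beta_T <- sum(x * y) / sum(x^2)            # formula (3.4)
e_T <- sqrt(sum(x^2) / var(y))             # formula (3.8), with var(y) estimating sigma^2
c(beta_T = beta_T, efficacy = e_T, var_T = 1 / e_T^2)   # 1/e_T^2 as in (3.9)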


3.3 Transformed Statistics and Theoretical Efficiency Calculation

We have discussed the two important transformations in the introduction, i.e. the rank transformation and the sign transformation, which are often used in distribution-free problems. According to the two transformations, we get the two transformed test statistics shown in formulae (3.11) and (3.12),

$$T_R(\beta) = \sum_{j=1}^{n} x_j \operatorname{rank}(D_j(\beta)) \qquad (3.11)$$

$$T_S(\beta) = \sum_{j=1}^{n} x_j \operatorname{sgn}\!\left( D_j(\beta) - \hat{D}(\beta) \right) \qquad (3.12)$$

where $\hat{D}(\beta) = \operatorname{median}(D_j(\beta))$.

Maritz (1981) gave the efficacies of the two statistics, i.e.

$$e(T_R) = \int f^2(y)\, dy \,\sqrt{12 \sum_{j=1}^{n} x_j^2} \qquad (3.13)$$

$$e(T_S) = 2 f(z_{0.5}(\beta)) \sqrt{\sum_{j=1}^{n} x_j^2} \qquad (3.14)$$

where $z_{0.5}(\beta) = \hat{D}(\beta)$.

From formulae (3.8), (3.13) and (3.14), we can work out the ARE of statistic $T$ with respect to $T_R$, and the ARE of statistic $T$ with respect to $T_S$, respectively. They are shown in the following two formulae,

$$\operatorname{ARE}(T, T_R) = \left( \frac{e(T_R)}{e(T)} \right)^2 = 12 \sigma^2 \left( \int f^2(y)\, dy \right)^2 \qquad (3.15)$$

$$\operatorname{ARE}(T, T_S) = \left( \frac{e(T_S)}{e(T)} \right)^2 = 4 \sigma^2 f^2(z_{0.5}(\beta)) \qquad (3.16)$$

From the two AREs, we can see that $\operatorname{ARE}(T, T_R)$ in (3.15) depends on the mean density of the distribution $F$, $\int f^2(y)\, dy$, while $\operatorname{ARE}(T, T_S)$ in (3.16) depends on the density of $F$ at the median, $f(z_{0.5}(\beta))$. The choice between the statistics $T_R$ and $T_S$ depends on the ratio $\int f^2(y)\, dy \big/ f(z_{0.5}(\beta))$. If $\int f^2(y)\, dy \big/ f(z_{0.5}(\beta)) > 1/\sqrt{3}$, the statistic $T_R$ is more efficient than the statistic $T_S$, and we prefer $T_R$ to $T_S$; otherwise, we prefer $T_S$. Thus, the theoretical results cannot tell us directly which transformed statistic is more efficient.
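As a quick numerical check of this criterion (not part of the thesis; the standard normal case is used for concreteness), the two quantities can be computed in R:

# Criterion for choosing between T_R and T_S under N(0,1)
f2 <- function(y) dnorm(y)^2
mean_density <- integrate(f2, -Inf, Inf)$value   # integral of f^2 = 1/(2*sqrt(pi)) ~ 0.282
f_at_median  <- dnorm(0)                         # median of N(0,1) is 0, f(0) ~ 0.399
mean_density / f_at_median                       # ~ 0.707 > 1/sqrt(3) ~ 0.577, so T_R is preferred

This agrees with the normal-case simulation results reported in section 4.1.3.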

4. Simulation

From the theoretical comparison, we cannot be sure which statistic is more efficient. But, from a practical point of view, we can do some simulations to calculate the values of the efficiencies and compare them. Then, we can make some sense of the simulation results. In the following part, we will simulate data sets from different kinds of continuous distributions, and compare both the results based on each distribution and the simulation results from different distributions.

In order to carry out the simulation research based on the distribution-free straight-line regression, we need to determine the values of the independent variable and the dependent variable. For the independent variable, since we assumed that $\sum_{j=1}^{n} x_j = 0$, we should make an adjustment to each $x_j$ in the simulation. First, we define a sequence from $-3$ to $3$ with step $0.01$. After that, we sample 100 $x_j$'s from the sequence with replacement, and subtract from each $x_j$ the mean $\bar{x}$, where $\bar{x} = \sum_{j=1}^{n} x_j / n$. Then, we get a set of observations of the $x_j$'s that satisfies the assumption $\sum_{j=1}^{n} x_j = 0$, and these observations are kept fixed throughout the whole simulation process.
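This adjustment corresponds to the first lines of the simulation code in the appendix (reformatted here):

set.seed(9999)
xx <- sample(seq(-3, 3, 0.01), 100, replace = TRUE)   # 100 draws from the grid -3, -2.99, ..., 3
x <- xx - mean(xx)                                    # centring, so that sum(x) = 0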

4.1 Simulation Based on Normal Distribution

For the dependent variable, we should determine its distribution first. There are many kinds of distributions that can be used. A distribution that is often used in study and research is the normal distribution.

4.1.1 Simulation Process

When we have got the observations of the independent variable and decided what distribution to use for the dependent variable, we can continue by simulating the statistics and calculating the efficiencies. The simulation process can be carried out in the following three steps.

Step 1: For the normal case, we simulate 100 $y_j$'s from the standard normal distribution, i.e. $Y_j \sim N(0, 1)$. Using the same data set, we can calculate the point estimates of $\beta$ based on $T$, $T_R$ and $T_S$, respectively, i.e. $\hat{\beta}_T$, $\hat{\beta}_{T_R}$ and $\hat{\beta}_{T_S}$. We can also work out the efficacies of the three statistics. In this problem, the point estimates based on $T_R$ and $T_S$ are not easily given in closed form. Maritz (1981) introduced a smoothing method to get the point estimate. According to the smoothing idea, we can work out an estimate that is close to the true value of the point estimator.

Step 2: Repeat the process in step 1 $M = 100$ times, which gives 100 point estimates of $\beta$ based on each statistic, i.e. $\hat{\beta}_{Tji}$, $\hat{\beta}_{T_Rji}$ and $\hat{\beta}_{T_Sji}$ $(i = 1, 2, \ldots, M;\ j = 1, 2, \ldots, N)$. Then, we can calculate the means $\bar{\beta}_{Tj}$, $\bar{\beta}_{T_Rj}$ and $\bar{\beta}_{T_Sj}$, and the variances $s^2_{Tj}$, $s^2_{T_Rj}$ and $s^2_{T_Sj}$, respectively, i.e.

$$\bar{\beta}_{Tj} = \frac{1}{M} \sum_{i=1}^{M} \hat{\beta}_{Tji}, \qquad s^2_{Tj} = \frac{1}{M-1} \sum_{i=1}^{M} \left( \hat{\beta}_{Tji} - \bar{\beta}_{Tj} \right)^2$$

$$\bar{\beta}_{T_Rj} = \frac{1}{M} \sum_{i=1}^{M} \hat{\beta}_{T_Rji}, \qquad s^2_{T_Rj} = \frac{1}{M-1} \sum_{i=1}^{M} \left( \hat{\beta}_{T_Rji} - \bar{\beta}_{T_Rj} \right)^2$$

$$\bar{\beta}_{T_Sj} = \frac{1}{M} \sum_{i=1}^{M} \hat{\beta}_{T_Sji}, \qquad s^2_{T_Sj} = \frac{1}{M-1} \sum_{i=1}^{M} \left( \hat{\beta}_{T_Sji} - \bar{\beta}_{T_Sj} \right)^2$$

Besides those, we can also work out the mean of the efficacies for each statistic. From formula (3.13), we see that the efficacy of $T_R$ does not depend on the data set of the dependent variable, so we do not need to calculate this value repeatedly; we just calculate it once at the end of the simulation. By comparing the variances of the estimators calculated from the three different statistics, we will know which statistic is more efficient. However, drawing conclusions from the results of only one simulation is not sufficient, since we do not know whether the simulation result is stable from simulation to simulation. We need to find the variances of the estimates calculated from the three statistics, i.e. $V_T = \operatorname{Var}_T(\hat{\beta})$, $V_{T_R} = \operatorname{Var}_{T_R}(\hat{\beta})$ and $V_{T_S} = \operatorname{Var}_{T_S}(\hat{\beta})$, based on many simulations.

Step 3: In order to know whether the simulation result is stable, we should do the simulation several times. Repeat the simulation process in step 2 $N = 100$ times, and calculate the mean and the variance of the estimates obtained from each of the three different statistics.
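A condensed sketch of this nested structure (a simplification of the appendix code, shown here only for the untransformed statistic T and using the fixed design vector x constructed earlier; the appendix also handles T_R and T_S via the smoothing method):

# Skeleton of the simulation: N outer simulations, each with M replications
M <- 100; N <- 100
beta_T <- matrix(NA, nrow = N, ncol = M)
for (r in 1:N) {                   # step 3: repeat the whole simulation
  for (j in 1:M) {                 # step 2: M replications within simulation r
    y <- rnorm(length(x))          # step 1: Y_j ~ N(0,1)
    beta_T[r, j] <- sum(x * y) / sum(x^2)
  }
}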

4.1.2 Variance Analysis

After completing the three simulation steps, $V_T$ may be split into two parts: the variance within each simulation result, $W_T$, and the variance between different simulation results, $B_T$. The same holds for $V_{T_R}$ and $V_{T_S}$, i.e.

$$W_T = \frac{1}{N} \sum_{j=1}^{N} s^2_{Tj}, \qquad B_T = \frac{1}{N-1} \sum_{j=1}^{N} \left( \bar{\beta}_{Tj} - \bar{\beta}_T \right)^2$$

$$W_{T_R} = \frac{1}{N} \sum_{j=1}^{N} s^2_{T_Rj}, \qquad B_{T_R} = \frac{1}{N-1} \sum_{j=1}^{N} \left( \bar{\beta}_{T_Rj} - \bar{\beta}_{T_R} \right)^2$$

$$W_{T_S} = \frac{1}{N} \sum_{j=1}^{N} s^2_{T_Sj}, \qquad B_{T_S} = \frac{1}{N-1} \sum_{j=1}^{N} \left( \bar{\beta}_{T_Sj} - \bar{\beta}_{T_S} \right)^2$$

where

$$\bar{\beta}_T = \frac{1}{N} \sum_{j=1}^{N} \bar{\beta}_{Tj}, \qquad \bar{\beta}_{T_R} = \frac{1}{N} \sum_{j=1}^{N} \bar{\beta}_{T_Rj}, \qquad \bar{\beta}_{T_S} = \frac{1}{N} \sum_{j=1}^{N} \bar{\beta}_{T_Sj}$$

Then we get $V_T = W_T + B_T$, $V_{T_R} = W_{T_R} + B_{T_R}$ and $V_{T_S} = W_{T_S} + B_{T_S}$.
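A minimal sketch of this decomposition, assuming the matrix beta_T from the loop sketched above (rows are simulations, columns are replications):

W_T <- mean(apply(beta_T, 1, var))   # average within-simulation variance
B_T <- var(rowMeans(beta_T))         # variance between the simulation means
V_T <- W_T + B_T                     # total variance of the estimate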

Table 1. Variance analysis of $\hat{\beta}$ based on the three statistics ($Y_j \sim N(0,1)$, sample size 100)

                                  T        T_R      T_S
Variance within group, W        0.00317  0.00335  0.00483
Variance between groups, B      0.00003  0.00003  0.00005
Variance of estimate, V         0.0032   0.0034   0.0049

From table 1, we can see that, for all three statistics, the variances of the estimates between different simulations are much smaller than those within each simulation, which reflects that the simulation process is stable. Since the simulation process is really time consuming, we set only 100 estimates within each simulation, and the process is already stable. If we used more estimates within each simulation, such as 10000, the variance between groups would become much smaller and could be ignored. Actually, even with only 100 estimates within each simulation, table 1 shows that ignoring the variance between simulations will not affect the conclusions drawn from the simulation results. The variance analysis has thus shown that the simulation process based on the normal distribution is stable, which means the simulation results are credible. For other distributions, we can verify this in the same way, so in the following simulations we will not discuss it further.

4.1.3 Simulation Results

Table 2. Simulation results of different statistics under $N(0,1)$ (sample size 100)

                           T       T_R     T_S
Efficiency of Beta       0.0032  0.0034  0.0049
Efficacy of statistics   18.00   17.44   14.13

Table 2 shows the simulation results under the normal distribution. We can see that, for the standard normal distribution, the efficiency of the point estimate of $\beta$ based on the statistic $T$ is close to that based on the statistic $T_R$. The efficiency calculated based on the statistic $T$ is the smallest while the value from $T_S$ is the largest. Similarly, the efficacies of the statistics show the reverse ordering of the efficiencies: the efficacy of $T$ is the largest value, while the efficacy of $T_S$ is the smallest one. So, $T_R$ has larger efficacy than $T_S$. From formulae (3.15) and (3.16), we obtain $\operatorname{ARE}(T, T_R) = 0.94 > \operatorname{ARE}(T, T_S) = 0.62$, i.e. the statistic $T_R$ is more efficient than $T_S$. We get the same conclusion when comparing the efficacies of $T_R$ and $T_S$ directly. As a result, in the normal case, we conclude that the transformations result in information loss, even though the statistic after rank transformation loses only a little efficiency. And compared to the sign transformation, the rank transformation is better.

4.2 Simulation Results under Different Distributions

More simulations can be done by changing the distribution of $Y_j$ from the normal distribution to some other distributions. We choose some common distributions for the following simulations: Cauchy, t, Double Exponential, Exponential, Gamma and Chi-square. We simulate $Y_j$ from $Cauchy(0,1)$, $t(3)$, $DoubleExp(1)$, $Exp(1)$, $Gamma(10,1)$ and $Chisq(2)$, respectively. The simulation process is the same as in the standard normal case discussed in section 4.1.1, so we do not describe the details again. Tables 3 and 4 in the following subsections show the simulation results under the different distributions.

4.2.1 Efficiency Comparison Based on Different Distributions

The simulation results of the efficiencies based on the different statistics are shown in table 3.

Table 3. The efficiencies of $\hat{\beta}$ simulated under different distributions (sample size 100)

                   Var_T      Var_TR     Var_TS
N(0,1)             0.0032     0.0034     0.0049
Cauchy(0,1)        20349.8    0.0116     0.0086
t(3)               0.0091     0.0052     0.0060
DoubleExp(1)       0.0063     0.0044     0.0039
Exp(1)             0.0032     0.0013     0.0030
Gamma(10,1)        0.0317     0.0312     0.0468
Chisq(2)           0.0127     0.0052     0.0125

4.2.1.1 Efficiency Comparison of Symmetric Distributions

The top four distributions in table 3 are symmetric about 0. By comparing the four cases, we can see that the efficiencies in the case $Cauchy(0,1)$, which is equivalent to $t(1)$, are much larger than those in the other cases, especially in the first column.

Figure 1. Density curves of the four different symmetric distributions in the simulation: $N(0,1)$, $t(3)$, $Cauchy(0,1)$ and $DoubleExp(1)$.

Figure 1 shows the density curves of the four symmetric distributions. We can see that $Cauchy(0,1)$ is a heavy-tailed distribution. Thus, it is much more probable to simulate some extreme values from $Cauchy(0,1)$ than from any of the other distributions. The value of the estimator $\hat{\beta}$ obtained under $Cauchy(0,1)$ may therefore have a large dispersion, which means the variance of the estimator will be larger than those of the other distributions. That is why the efficiencies in the case $Cauchy(0,1)$ in table 3 are much larger than those in the other cases.

From figure 1, we can also see that the tails of $t(3)$ are very close to those of $DoubleExp(1)$, and both are a little heavier than those of the normal case. For heavy-tailed distributions, we have discussed that it is more probable to get some extreme values from the simulation, and the variance of the estimators will be larger than for distributions that are not heavy-tailed. If we take the rank transformation or the sign transformation, the range of the values simulated from the heavy-tailed distributions will become smaller after the transformation. Then there will be no extreme values in the regression, and the variance of the estimate will be much smaller than before the transformation. The simulation results in table 3 confirm this: apart from the normal case, the efficiencies decreased after the transformations. In these cases, the transformations have good effects on the regression. Thus, we can draw the conclusion that, for heavy-tailed distributions, the results after transformation are better than those without transformation.

4.2.1.2 Efficiency Comparison of Asymmetric Distributions

Figure 2. Density curves of the three different asymmetric distributions in the simulation ($Gamma(10,1)$, $Exp(1)$ and $Chisq(2)$), together with the $N(10, \sqrt{10})$ density.

Figure 2 shows the density curves of the last three distributions in table 3. One more density curve, from the normal distribution that has the same mean and variance as the distribution $Gamma(10,1)$, is also shown in figure 2. $Gamma(10,1)$ is a unimodal distribution. Compared to the normal density curve, we can see that the distribution $Gamma(10,1)$ is a little right skewed, but not heavy-tailed. From the simulation results in table 3, we know that the efficiencies before and after the rank transformation are very close to each other, even though the latter becomes a little smaller. $Gamma(10,1)$ is also the only non-normal unimodal distribution with the property that the efficiency became larger after the sign transformation, just as in the standard normal case. We have concluded that the estimates are more efficient after transformation for heavy-tailed distributions. In other words, if the distribution is centralized with light tails, such as a normal distribution, the estimate of $\beta$ based on the statistic without transformation is more efficient. Once again, the results of $Gamma(10,1)$ show that the transformations may result in information loss.

4.2.1.3 Efficiency Comparison of All Distributions

Comparing the columns of table 3 across all distributions, we find that, apart from the normal case, the efficiencies of $\beta$ based on $T$ are larger than the efficiencies of $\beta$ based on $T_R$. So, for most of the non-normal distributions, the rank transformation is more efficient than no transformation. What is more, apart from the cases $Cauchy(0,1)$ and $DoubleExp(1)$, the values based on $T_R$ are smaller than those based on $T_S$. We may draw another conclusion: for most of the distributions, the rank transformation is better than the sign transformation.

Table 4. The efficacies and AREs of the statistics simulated under different distributions (sample size 100)

                 e(T)    e(T_R)   e(T_S)   ARE(T,T_R)   ARE(T,T_S)
N(0,1)          18.00    17.44    14.13       0.94         0.62
Cauchy(0,1)      1.82     9.83    11.08      29.16        37.01
t(3)            11.29    14.20    12.96       1.58         1.32
DoubleExp(1)    12.85    15.45    16.41       1.45         1.63
Exp(1)          18.38    30.91    17.77       2.83         0.94
Gamma(10,1)      5.69     5.73     4.56       1.01         0.64
Chisq(2)         9.20    15.45     8.86       2.82         0.93

4.2.2 Efficacy and ARE Comparison Based on Different Distributions

Table 4 shows the simulation results for the efficacies and AREs based on the different distributions. Comparing the top four cases in table 4, we find that the efficacies in the case $Cauchy(0,1)$ are smaller than those of the other distributions, while the ARE values are much larger than the other results. The results are consistent with what we got from table 3: since, from formula (3.10), the efficiency is inversely proportional to the square of the efficacy, a large efficiency corresponds to a small efficacy. Thus, the efficacies in the case $Cauchy(0,1)$ are obviously small, especially in the first column. The large AREs in this case tell us that both transformations are highly efficient here. So, for heavy-tailed distributions, carrying out a transformation is a good choice.

Comparing the columns, similarly to what we discussed for table 3, apart from the case $N(0,1)$, the efficacies of $T$ under the different distributions are smaller than those of the statistic $T_R$. That means the statistic $T_R$ has higher efficacy than the statistic before transformation, so the rank transformation is more efficient in most cases. Comparing the two columns of ARE values, we can see that, apart from the cases $Cauchy(0,1)$ and $DoubleExp(1)$, the ARE values of $T_R$ are larger than those of $T_S$. Again, the simulation results tell us that the rank transformation is better than the sign transformation for most of the distributions. That is consistent with the conclusion we drew from the efficiency results. What is more, the results of the last three distributions have the same pattern: for the last three cases, the efficacy of $T_R$ is the largest of the three efficacies in each case while the efficacy of $T_S$ is the smallest, and the AREs of $T_R$ in the three cases are all larger than those of $T_S$. That is because the last three distributions are all gamma distributions, since both $Exp(1)$ and $Chisq(2)$ are special cases of the gamma distribution; they are $Gamma(1,1)$ and $Gamma(1,2)$, respectively. Thus, it is not surprising that they have so many results in common.

4.2.3 Some Inconsistency between the Simulation Results of Table 3 and Table 4

If we compare the results of table 3 and table 4 very carefully, we may find that, for the cases $Exp(1)$ and $Chisq(2)$, both the efficacy and the efficiency based on the statistic $T$ are larger than those based on the statistic $T_S$. That seems impossible, since formula (3.10) shows that the efficiency is inversely proportional to the square of the efficacy. However, we may also note that the results for $T$ are very close to those for $T_S$. As we discussed around formula (2.3) in section 2.2, the relation between the efficiency and the efficacy is not exact but approximate, based on a large sample size. In the simulations, we set the sample size to 100, since the simulation process is very time consuming. For most of the distributions, the sample size 100 is enough to get consistent results. But for the cases $Exp(1)$ and $Chisq(2)$, this sample size may not be large enough, so we increase the sample size to 10000. The results are shown in table 5.

Table 5. Simulation results of $Exp(1)$ and $Chisq(2)$ based on the different statistics (sample size 10000)

                            T          T_R        T_S
Exp(1)     Efficiency    3.44E-5     1.23E-5    3.47E-5
           Efficacy      172.52      298.74     172.43
Chisq(2)   Efficiency    1.344E-4    4.589E-5   1.345E-4
           Efficacy      86.28       149.37     86.25

From table 5, we can see that the contradictions no longer exist, and the conclusions we discussed still hold. Thus, the small sample size of 100 caused the contradiction between the results of tables 3 and 4. We can also draw the conclusion that, for the cases $Exp(1)$ and $Chisq(2)$, the statistics before and after the sign transformation have similar efficiencies.

Comparing table 3 with table 4, we may also find that, in table 3, the efficiency result for $Cauchy(0,1)$ based on the statistic $T$ is extremely large. It is much larger than the value $0.30$, which is calculated from the efficacy of the statistic $T$ in the case $Cauchy(0,1)$ of table 4, i.e. $1.82^{-2} \approx 0.30$. We have seen in formula (3.9) that $\operatorname{Var}_T(\hat{\beta}) = [e(T)]^{-2}$, so the two values should be the same; yet formula (3.9) seems not to hold. According to the calculation process, the result 20349.8 in table 3 is approximately calculated from the formula $\sum_{j=1}^{100} \sum_{i=1}^{100} [e_{ji}(T)]^{-2} \big/ 10000$, while the result $0.30$ comes from the formula $\left[ \sum_{j=1}^{100} \sum_{i=1}^{100} e_{ji}(T) \big/ 10000 \right]^{-2}$. It is obvious that the function $f(x) = x^{-2}$ $(x > 0)$ is convex. From Jensen's inequality¹, we get that

$$\sum_{j=1}^{100} \sum_{i=1}^{100} [e_{ji}(T)]^{-2} \Big/ 10000 \;\geq\; \left[ \sum_{j=1}^{100} \sum_{i=1}^{100} e_{ji}(T) \Big/ 10000 \right]^{-2}$$

This is consistent with the result $20349.8 > 0.30$, and it explains the large difference between the two values. In this case, the value $0.30$ is much smaller than the true value, but it is still a reasonable result, since the conclusions do not change when it is used.
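A tiny numerical illustration of this Jensen gap (the efficacy values below are made up, not thesis output; one very small efficacy, as can happen under the Cauchy distribution, dominates the mean of the inverse squares):

e <- c(0.01, 1.5, 2.0, 2.5)   # hypothetical simulated efficacies
mean(e^-2)                    # ~ 2500, blown up by the smallest value
mean(e)^-2                    # ~ 0.44, much smaller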

5. Demonstration Research with Empirical Examples

Even though we did not obtain any clear-cut results from the theoretical comparison of the efficiencies based on the different test statistics, we got some conclusions from the simulations. In this section, we give some empirical examples to check the conclusions drawn from the simulations.

5.1 Empirical Example I: Straight-Line Regression Based on Centralized Light-Tailed Distributions

The first empirical example concerns the relation between the American College Testing (ACT) score and the first-year grade point average (GPA) of freshman students. A random sample² of size 50 is taken from the freshman students in the College of Liberal Arts at the University of Iowa who had composite ACT scores of 30 or greater. The ACT scores and the first-year GPA of the 50 students are recorded. Here, the ACT scores recorded are the original ACT scores minus 30.

1Jensen’s inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral

of the convex function. In its simplest form the inequality states, "the convex transformation of a mean is less than or equal to the mean after convex transformation."

2Hogg, R. V. and Randles, R. H. (1975): Adaptive Distribution-Free Regression Methods and Their Applications. Technometrics, Vol. 17, No.

In the regression problem, we take the ACT score as the independent variable and the first-year GPA as the dependent variable. In order to satisfy the assumption mentioned in section 3.2, we make an adjustment to the independent variable ACT score, i.e. $x_j = ACT_j - \sum_{j=1}^{n} ACT_j / n$.

. n.

5.1.1 Residual Analysis of Simple Linear Regression

First, we consider the simple linear regression between the ACT scores and the first-year GPA. From the simple linear regression model $y_j = \alpha + \beta x_j + \varepsilon_j$ $(j = 1, 2, \ldots, n)$, where $\varepsilon_j \overset{i.i.d.}{\sim} N(0, \sigma^2)$ is the error term, we can work out the residuals $\hat{\varepsilon}_j$ of the regression. The two graphs in figure 3 are the histogram and the normal Q-Q plot of the residuals of the simple linear regression model. Both of them show that the residuals are not normally distributed. The Shapiro-Wilk normality test provided by R gives the p-value 0.01385. As a result, at the significance level 0.05, we can reject the null hypothesis $H_0$: the residuals are normally distributed. All the results tell us that we cannot use the simple linear regression model, since the basic normality assumption, $\hat{\varepsilon}_j \sim N(0, \sigma^2)$, is not satisfied. Then, we do the distribution-free straight-line regression instead.

Figure 3. Histogram (a) and normal Q-Q plot (b) of the residuals of the simple linear regression model.

5.1.2 Kernel Density Estimation

According to what we discussed in section 3 and the smoothing method mentioned in the simulation section 4.1, we can calculate the point estimate of $\beta$ based on $T$, $T_R$ and $T_S$, respectively. We can also work out the efficacy of the statistic $T$. In order to calculate the efficacies of $T_R$ and $T_S$, we need to know the density function $f(y)$ of the dependent variable; in this empirical problem, that is the density function of GPA.

There are several techniques to estimate the density from a data set. In this thesis, we choose kernel density estimation. The density function can be estimated using formula (5.1),

$$\hat{f}(y) = \frac{1}{nB} \sum_{j=1}^{n} K\!\left( \frac{y - Y_j}{B} \right) \qquad (5.1)$$

where $K$ is the kernel function, satisfying $\int K(y)\, dy = 1$, and $B > 0$ is the bandwidth or window width.

From formula (5.1), we know that, in order to estimate the density function, we need to decide on the kernel function and the bandwidth. The choice of kernel and bandwidth has been researched by many statisticians. Eggermont and LaRiccia (2001) note that, in practice, the kernel is not very important. A common kernel is the Gaussian kernel, which is the standard normal density function.

The kernel $K$ does not matter too much; the choice of the smoothing parameter $B$ is much more critical. Several methods have been developed for choosing the bandwidth. Silverman (1986) suggested a rule of thumb³ for choosing the bandwidth of a Gaussian kernel density estimator, which is also the default method used by the R program.

³ The rule of thumb is $B = 0.9 A n^{-1/5}$, where $A = \min(\text{standard deviation},\ \text{interquartile range}/1.34)$ and $n$ is the sample size.
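For concreteness, a minimal sketch in R of the bandwidth comparison behind figure 4 (the GPA data are not reproduced here, so a placeholder sample is used; bw.nrd0, Silverman's rule, is the default in density()):

set.seed(1)
gpa <- rnorm(50, mean = 3.3, sd = 0.5)                      # placeholder stand-in for the GPA sample
d_default <- density(gpa, kernel = "gaussian")              # default: Silverman's rule of thumb
d_sj  <- density(gpa, kernel = "gaussian", bw = "SJ")       # Sheather-Jones
d_ucv <- density(gpa, kernel = "gaussian", bw = "ucv")      # unbiased cross-validation
d_bcv <- density(gpa, kernel = "gaussian", bw = "bcv")      # biased cross-validation
c(default = d_default$bw, sj = d_sj$bw, ucv = d_ucv$bw, bcv = d_bcv$bw)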

Figure 4. Kernel density estimation of GPA under the Gaussian kernel and different choices of bandwidth (default, sj, ucv, bcv; default bandwidth about 0.2).

Figure 4 shows the kernel density estimates under the Gaussian kernel and different choices of bandwidth. We can see that the default method gives a good estimate of the density, and the other methods are either undersmoothed or very close to the density curve estimated by the default method. So, we decide to use the Gaussian kernel and Silverman's bandwidth choosing method. In the following empirical examples, the methods will be the same, and we will not discuss this point in detail again.

5.1.3 Efficiency Comparison among Distribution-Free Test Statistics

When we have fixed the Gaussian kernel function and the bandwidth (0.1984 in this problem), we can estimate the density function $f(y)$ from formula (5.1) and the data set. Then, the efficacies of $T_R$ and $T_S$ can also be estimated. According to formulae (2.3) and (3.9), we can also work out the efficiencies of the estimators approximately.

Table 6. Results of the distribution-free straight-line regression of ACT and GPA

                           T       T_R     T_S
Efficiency of Beta       0.0045  0.0050  0.0095
Efficacy of statistics   14.86   14.13   10.27
ARE with respect to T             0.90    0.48

Table 6 shows the results of the empirical straight-line regression problem. Looking at the rows, we can see that the efficiency of the point estimate of $\beta$ based on statistic $T$ is the smallest of the three, while the efficiency calculated based on statistic $T_S$ is the largest. In contrast, the efficacy of $T$ is the largest value while the efficacy of $T_S$ is the smallest.

According to the simulation results in table 3, we know that, for most non-normal distributions, the rank transformation is more efficient than no transformation. Looking at figure 5, it is obvious that GPA is not normally distributed, so the efficacy of $T_R$ should probably be larger than that of $T$. However, compared to the normal distribution in figure 5, which has the same mean and variance as GPA, we can see that the distribution of GPA is a little left skewed with a short tail on its left. It can be considered a centralized distribution with light tails. As we discussed, if the distribution is centralized with light tails, the estimate of $\beta$ based on the statistic without transformation is more efficient.

Figure 5. Density curve of GPA v.s. the normal density curve with the same mean and standard deviation.

Therefore, the results are consistent with what we discussed in the simulation section, i.e. the transformations result in information loss, and the efficiency (the variance) of the estimate of the parameter becomes larger after transformation. Comparing the columns, we can see that the results calculated based on the statistic $T$ are very close to those based on $T_R$. That is similar to the cases $N(0,1)$ and $Gamma(10,1)$ discussed in the simulation section.

What’s more, the ARE of TRwith respect to T is obviously

larger than that of TS. That means TRhas larger relative

ef-…ciency than TS. It is also consistent with the conclusion we

drew from the simulation results. This empirical example il-lustrates that the sign transformation loses more information

(13)

than the rank transformation. Thus, in this example, com-paring to the sign transformation, the rank transformation is better.

5.2 Empirical Example II: Straight-Line Regression under Non-Normal Heavy-Tailed Distributions

The second empirical example concerns the relation between the body weight and the brain weight of several species of mammals. Allison and Cicchetti (1976) studied ecological and constitutional correlates by researching the sleep of mammals. The body weight in kilograms (kg) and the brain weight in grams (g) for 62 species of land mammals are recorded.

In this problem, we take the body weight and the brain weight as the independent variable and the dependent variable, respectively. As we did in the first example, we adjust the body weight data to satisfy the distribution-free regression assumption $\sum_{j=1}^{n} x_j = 0$, i.e. $x_j = Body_j - \sum_{j=1}^{n} Body_j / n$. Then, we can carry out the regression analysis.

5.2.1 Case 1: Regression Analysis Based on the Original Data Set

Figure 6. Histogram (a) and normal Q-Q plot (b) of the residuals of the simple linear regression model based on the original data set.

First, we consider the simple linear regression model $y_j = \alpha + \beta x_j + \varepsilon_j$ $(j = 1, 2, \ldots, n)$, where $\varepsilon_j \overset{i.i.d.}{\sim} N(0, \sigma^2)$ is the error term, and work out the residuals $\hat{\varepsilon}_j$ of the regression problem.

Figure 6 shows the histogram and the normal Q-Q plot of the residuals from the simple linear regression model. It is obvious that the residuals are not normally distributed. The p-value of the Shapiro-Wilk normality test is 2.316E-14. So, at the significance level 0.01, we have enough evidence to reject the null hypothesis $H_0$: the residuals are normally distributed. We therefore consider the distribution-free straight-line regression methods instead.

We discussed kernel density estimation in the first empirical example. In this distribution-free regression problem, we again choose the Gaussian kernel and Silverman's bandwidth choosing method, i.e. the default method. Figure 7 shows the density curves based on the different bandwidth choosing methods. We can see that the default method is neither undersmoothed nor oversmoothed. Besides that, the unbiased cross-validation (ucv) method gives the same estimate as the default method in this problem. Thus, in this problem, the default method is also a good choice.

Figure 7. Kernel density estimation of brain weight under the Gaussian kernel and different choices of bandwidth (default, sj, ucv, bcv; default bandwidth 47.59).

According to section 3 and section 4.1, we can calculate the point estimate of $\beta$ based on $T$, $T_R$ and $T_S$, respectively. After we have worked out the density function of the brain weight, $f(y)$, we can also work out the efficacies of the statistics $T$, $T_R$ and $T_S$. Then, according to formulae (2.3) and (3.9), we get the efficiencies of the estimators based on the three statistics.

Table 7. Results of the distribution-free straight-line regression of body weight and brain weight

                           T       T_R      T_S
Efficiency of Beta       0.0175  0.0002   0.0054
Efficacy of statistics    7.55   68.85    13.60
ARE with respect to T            83.19     3.25

Table 7 gives the results for the three different statistics. We can see that the efficiency based on $T$ is the largest while the efficiency based on $T_R$ is the smallest. In contrast, the efficacy of $T_R$ is the largest while the efficacy of $T$ is the smallest. The ARE of $T_R$ with respect to $T$ is much larger than that of $T_S$. In general, the results are consistent with those we got from the simulation. The density curve in figure 7 shows that the distribution of the dependent variable is non-normal with a heavy right tail. We have discussed that, for heavy-tailed distributions, the results after transformation are better than those without transformation, and that, for most non-normal distributions, the rank transformation is more efficient than no transformation. This problem verifies those conclusions. It also verifies that the rank transformation is more efficient than the sign transformation.

5.2.2 Case 2: Regression Analysis Based on the Log Transformed Data Set

5.2.2.1 Log Transformation for Linear Regression

Figure 8. Scatter plots of brain weight against body weight with the simple linear regression line: (a) full data set, (b) the lower left corner of (a) enlarged.

Even though the variance of the estimate based on the rank transformed statistic is very small, the straight-line regression using the original data set does not seem to be a good regression. Figure 8 shows the scatter plot of the brain weight against the body weight with a simple linear regression line. From graph (a), we see that there are some extreme points in the data set. If we zoom in on the lower left corner of graph (a), we get graph (b), which also contains some extreme points. From both graphs, we see that it is not a good choice to carry out a straight-line regression based on the original data set directly, no matter what the slope of the regression line is or how small the variance of the estimate of the slope. We should consider some other methods.

Figure 9. Scatter plots of body weight (a) and brain weight (b).

As we mentioned in the introduction, for data sets for which the classical assumptions of simple linear regression are not satisfied, there is another alternative besides distribution-free methods: carrying out some parametric transformation of the original data before the regression analysis. One common transformation is the log transformation. It is suitable especially when there are large differences in the distances between adjacent values of the sample points, since the log transformation decreases these differences and makes the distribution more concentrated, so that it approximates the normal distribution. When the distribution is approximately normal, many kinds of classical methods can be used. Looking at figure 9, the differences in the distances between adjacent values in the data set of the second empirical problem appear very large, so we can try the log transformation.

Now, we consider the simple linear regression model based on the data set after the log transformation. First, we apply the log transformation to both the independent variable and the dependent variable. Then we take $x_j = \log(Body)_j - \sum_{j=1}^{n} \log(Body)_j / n$ such that the assumption $\sum_{j=1}^{n} x_j = 0$ is satisfied. As we did before, we calculate the residuals of the simple linear model and check whether the residuals are normally distributed.

Figure 10. Histogram (a) and normal Q-Q plot (b) of the residuals of the simple linear regression model based on the log transformed data set.

From figure 10, we can see that the residuals of the linear model based on the log transformed data set are roughly normally distributed. The p-value from the Shapiro-Wilk normality test is 0.5292, which means that, at the significance level 0.1, we cannot reject the null hypothesis $H_0$: the residuals are normally distributed. Thus, after the log transformation, the least squares estimation works. What about the distribution-free methods?
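A minimal sketch of this check in R (the mammal data are not included here, so a stand-in data frame is simulated; the original Allison and Cicchetti data are also distributed in R packages, for example as MASS::mammals):

set.seed(3)
body  <- exp(rnorm(62, mean = 1, sd = 2.5))            # stand-in body weights (kg)
brain <- exp(0.75 * log(body) + rnorm(62, 2, 0.7))     # stand-in brain weights (g)
fit <- lm(log(brain) ~ log(body))                      # simple linear regression on the log scale
shapiro.test(residuals(fit))                           # a large p-value supports normal residuals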

Similarly, we first work out the density function of the dependent variable. Then, according to section 3 and section 4.1, we calculate the point estimates of $\beta$, the efficiencies of the estimates and the efficacies based on $T$, $T_R$ and $T_S$, respectively.

5.2.2.2 Efficiency Comparison among Different Distribution-Free Statistics

Table 8. Results of the distribution-free straight-line regression of log(BodyWeight) and log(BrainWeight)

                           T       T_R     T_S
Efficiency of Beta       0.0101  0.0125  0.0229
Efficacy of statistics    9.97    8.94    6.61
ARE with respect to T             0.80    0.44

Table 8 shows the results of the empirical distribution-free straight-line regression. We can see that the efficacy of $T$ is the largest while the efficacy of $T_S$ is the smallest. The ARE of $T_R$ with respect to $T$ is larger than that of $T_S$.

From figure 11, we see that the distribution of the dependent variable after the log transformation is centralized with light tails. In this case, as we discussed in section 4.2, the distribution-free method without transformation is more efficient than those with transformations, and the rank transformation is better than the sign transformation, just like the results shown in table 8.

Figure 11. Density curve of log(brain weight) v.s. the normal density curve with the same mean and standard deviation.

Again, the empirical data set shows that, in the regression problem for centralized light-tailed distributions, the distribution-free method without transformation is more efficient than those with transformations, and the rank transformation is more efficient than the sign transformation.

5.2.3 Comparison of the Distribution-Free Rank Transformation and the Parametric Log Transformation

Comparing the results in tables 7 and 8, we find that, based on the log transformed data set, the statistics $T_R$ and $T_S$ are no longer more efficient than the statistic $T$. In particular, the change in the efficacy of $T_R$ after the log transformation is much larger than those of the other statistics. It seems that the rank transformation loses most of its efficacy after the log transformation.

We mentioned in subsection 4.2.1 that the rank transformation and the sign transformation can decrease the range of the values simulated from heavy-tailed distributions. The rank transformation maps the data set to its ranks, so it is a monotonic transformation. That is different from the sign transformation, since different values may have the same result after the sign transformation. From this point of view, the rank transformation decreases the differences in the distances between adjacent values of the sample points. As we discussed in the last subsection, the log transformation is also a monotonic transformation with the property of decreasing the differences in the distances between adjacent values of the sample points. So, for the data set in the second example, which has a heavy-tailed distribution, the two methods have similar effects. That is why the rank transformation lost efficacy after the log transformation.

Under some conditions, the two transformation methods may therefore be viewed as alternatives to each other. In the case of simple linear regression, if the classical assumptions are still not satisfied after the log transformation, we can choose the distribution-free rank transformation method instead. In the case of distribution-free straight-line regression, if the straight-line model does not fit well, we can also consider taking a log transformation and carrying out OLS estimation.

6. Discussion and Conclusions

In this thesis, we discussed two common transformations in the distribution-free straight-line regression problem. We tried to find out whether it is better to carry out a transformation of the test statistic, and which transformation works better.

We first carried out the comparison theoretically, that is, comparing the formulae directly and trying to find out which efficiency is larger and which one is smaller. From the discussion in section 3, we found that the AREs with respect to the statistic without transformation depend on either the mean density of the distribution or the density of the distribution at the median. We could not get a clear-cut result by comparing the formulae directly.

Then we did some simulations based on different kinds of commonly used continuous distributions. From the simulation results, we got some sense of the comparison of the statistics.

For the normal distribution, there is information loss after transformation, even though the loss for the rank transformation is small. In this case, the rank transformation is better than the sign transformation. Beyond the normal distribution, if the distribution of the dependent variable is centralized with light tails, transformations may also result in information loss, and the statistic without transformation is more efficient. Conversely, if the distribution is heavy-tailed, the results based on the transformed statistics are better than those of the statistic without transformation. For most non-normal distributions, the rank transformation is more efficient than no transformation. What is more, apart from some cases such as $Cauchy(0,1)$ and $DoubleExp(1)$, for most distributions the rank transformation is better than the sign transformation.

The results of the empirical examples are consistent with the conclusions drawn from the simulations. In the case where the dependent variable is centralized with light tails, which is close to the normal distribution, the statistic without transformation is more efficient. In the case where the distribution of the dependent variable is non-normal with a heavy right tail, the statistic with transformation is more efficient than that without transformation. And, in both empirical examples, the rank transformation is better than the sign transformation.

As an extension of the discussion of the distribution-free transformations, we compared the distribution-free rank transformation with the parametric log transformation, and found that they have something in common. Both methods are monotonic transformations, and they have similar effects in decreasing the range of the variables. So, sometimes they can be viewed as alternatives to each other when solving straight-line regression problems.

References

Allison, T. and Cicchetti, D. V. (1976): Sleep in Mammals: Ecological and Constitutional Correlates. Science, Vol. 194, pp. 732-734

Bahadur, R. R. (1960): Simultaneous Comparisons of the Optimum and the Sign Tests of a Normal Mean, in Contributions to Probability and Statistics: Essay in Honor of Harold Hotelling, Stanford University Press, Stanford, Calif., pp. 79-88

Casella, G. and Berger, R. L. (2002): Statistical Inference. China Machine Press, Beijing

Eggermont, P. P. B., and LaRiccia, V. N. (2001): Maximum Penalized Likelihood Estimation. Springer, New York

Hodges, J. L., Jr. and Lehmann, E. L. (1956): The Efficiency of Some Nonparametric Competitors of the t-Test. The Annals of Mathematical Statistics, Vol. 27, No. 2, pp. 324-335

Hogg, R. V. and Randles, R. H. (1975): Adaptive Distribution-Free Regression Methods and Their Applications. Technometrics, Vol. 17, No. 4, pp. 399-407

Hollander, M. and Wolfe, D. A. (1973): Nonparametric Statistical Methods. Wiley, New York

Iman, R. L. (1974): A Power Study of a Rank Transform for the Two-Way Classification Model When Interaction may be Present. The Canadian Journal of Statistics / La Revue Canadienne de Statistique, Vol. 2, No. 2, pp. 227-239


Lehmann, E. L. (1975): Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco

Maritz, J. S. (1981): Distribution-Free Statistical Methods. Chapman and Hall, London

Pitman, E. J. G. (1948): Lecture Notes on Nonparametric Statistics. Columbia University Press, New York

Silverman, B. W. (1986): Density Estimation for Statistics and Data Analysis. Chapman and Hall, London

Stuart, A. (1954): Asymptotic Relative Efficiencies of Distribution-Free Tests of Randomness against Normal Alternatives. Journal of the American Statistical Association, Vol. 49, No. 265, pp. 147-157

Appendix

R Code for the Simulation
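# Simulation outline: 100 outer replications; within each, 100 samples of y (n = 100) are drawn
# from the chosen error distribution, the slope is estimated from the statistics T, TR and TS
# (grid search over b for TR and TS), the efficacies are computed, and the variances and means
# of the estimates are then summarised over the replications.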

# Package (distr)
library(distr)

rm(list=ls())
set.seed(9999)
xx<-sample(seq(-3,3,0.01),100,replace=T)
x<-xx-mean(xx)                     # sum(x)=0
n<-length(x)
b<-seq(-0.5,0.5,0.01)
m<-length(b)
D<-array(dim=c(m,n))
TR<-TS<-c()
bT<-bTR<-bTS<-c()
ebT<-ebTR<-ebTS<-c()
eT<-eTR<-eTS<-meT<-meTS<-c()
mbT<-mbTR<-mbTS<-c()

for(r in 1:100){
  for(j in 1:100){
    y<-rnorm(n,0,1)                # Y~N(0,1)
    #y<-rcauchy(n,0,1)             # Y~Cauchy(0,1)
    #y<-rt(n,3)                    # Y~t(3)
    #y<-r(DExp(1))(n)              # Y~Double Exp(0,1)
    #y<-rexp(n)                    # Y~Exp(1)
    #y<-rgamma(n,10,1)             # Y~Gamma(10,1)
    #y<-rchisq(n,2)                # Y~Chisq(2)

    #T<-sum(x*D)                   # Statistic T
    bT[j]<-sum(x*y)/sum(x^2)       # Estimation of beta based on statistic T
    eT[j]<-sqrt(sum(x^2)/var(y))   # Efficacy of statistic T

    #TR<-sum(x*rank(D))            # Statistic TR
    for(i in 1:m){
      D[i,]<-y-b[i]*x
      TR[i]<-sum(x*rank(D[i,]))
    }
    Tr<-min(abs(TR))
    wr<-0
    for(i in 1:m){
      if(abs(TR[i])==Tr) wr<-wr+1
    }
    for(i in 1:(m-wr)){
      if(abs(TR[i])==Tr&abs(TR[i-1+wr])==Tr){
        hr<-TR[i]+(TR[i-1]-TR[i])/2
        lr<-2*hr*wr/(TR[i-1]-TR[i+wr])
        bTR[j]<-round(b[i]+lr*0.01,3)
      }
    }
    #bTR                           # Estimation of beta based on statistic TR

    #TS<-sum(x*sign(D-median(D)))  # Statistic TS
    for(i in 1:m){
      D[i,]<-y-b[i]*x
      TS[i]<-sum(x*sign(D[i,]-median(D[i,])))
    }
    Ts<-min(abs(TS))
    ws<-0
    for(i in 1:m){
      if(abs(TS[i])==Ts) ws<-ws+1
    }
    for(i in 1:(m-ws)){
      if(abs(TS[i])==Ts&abs(TS[i-1+ws])==Ts){
        hs<-TS[i]+(TS[i-1]-TS[i])/2
        ls<-2*hs*ws/(TS[i-1]-TS[i+ws])
        bTS[j]<-round(b[i]+ls*0.01,3)
      }
    }
    #bTS                           # Estimation of beta based on statistic TS
    DD<-y-bTS[j]*x

    # Efficacy of statistic TS
    eTS[j]<-sqrt(2/pi)*exp(-(median(DD))^2/2)*sqrt(sum(x^2))            # Y~N(0,1)
    #eTS[j]<-2/(pi*(1+(median(DD))^2))*sqrt(sum(x^2))                   # Y~Cauchy(0,1)
    #eTS[j]<-4/(sqrt(3)*pi)*(1+(median(DD))^2/3)^(-2)*sqrt(sum(x^2))    # Y~t(3)
    #eTS[j]<-exp(-abs(median(DD)))*sqrt(sum(x^2))                       # Y~Double Exp(0,1)
    #eTS[j]<-2*exp(-median(DD))*sqrt(sum(x^2))                          # Y~Exp(1)
    #eTS[j]<-2*median(DD)^9*exp(-median(DD))*sqrt(sum(x^2))/362880      # Y~Gamma(10,1)
    #eTS[j]<-exp(-median(DD)/2)*sqrt(sum(x^2))                          # Y~Chisq(2)
  }
  ebT[r]<-var(bT)
  ebTR[r]<-var(bTR)
  ebTS[r]<-var(bTS)
  mbT[r]<-mean(bT)
  mbTR[r]<-mean(bTR)
  mbTS[r]<-mean(bTS)
  meT[r]<-mean(eT)
  meTS[r]<-mean(eTS)
  print(paste("Simulation",r,"Complete!"))
}
mebT<-mean(ebT)
mebTR<-mean(ebTR)
mebTS<-mean(ebTS)
mmeT<-mean(meT)

# f(y)^2
fy2<-function(y){exp(-y^2)/(2*pi)}             # Y~N(0,1)
#fy2<-function(y){1/(pi*(1+y^2))^2}            # Y~Cauchy(0,1)
#fy2<-function(y){4/(3*pi^2)*(1+y^2/3)^(-4)}   # Y~t(3)
#fy2<-function(y){0.25*exp(-2*abs(y))}         # Y~Double Exp(0,1)
fbar<-integrate(fy2,-Inf,Inf)$value


#fy2<-function(y){exp(-2*y)}                   # Y~Exp(1)
#fy2<-function(y){y^18*exp(-2*y)/362880^2}     # Y~Gamma(10,1)
#fy2<-function(y){exp(-y)/4}                   # Y~Chisq(2)
#fbar<-integrate(fy2,0,Inf)$value

eTR<-fbar*sqrt(12*sum(x^2))                    # Efficacy of statistic TR
mmeTS<-mean(meTS)

AREr<-round((eTR/mmeT)^2,3)
AREs<-round((mmeTS/mmeT)^2,3)

WT<-mean(ebT)                                  # Variance within group
WTR<-mean(ebTR)
WTS<-mean(ebTS)
BT<-var(mbT)                                   # Variance between groups
BTR<-var(mbTR)
BTS<-var(mbTS)
VT<-WT+BT                                      # Variance of estimate
VTR<-WTR+BTR
VTS<-WTS+BTS

data.frame(row.names=c("Variance within group","Variance between groups","Variance of estimate"),
           "T"=round(c(WT,BT,VT),5),"TR"=round(c(WTR,BTR,VTR),5),"TS"=round(c(WTS,BTS,VTS),5))
data.frame(row.names=c("Efficiency of Beta","Efficacy of statistics"),
           "T"=round(c(VT,mmeT),4),"TR"=round(c(VTR,eTR),4),"TS"=round(c(VTS,mmeTS),4))

Figure 1

# Package (distr)
library(distr)
plot(seq(-9,9,0.01),dnorm(seq(-9,9,0.01)),ylim=c(0,0.5),xlab='',ylab='Density',type='l',lwd=2)
lines(seq(-9,9,0.01),dt(seq(-9,9,0.01),3),col=2,lty=2)
lines(seq(-9,9,0.01),dcauchy(seq(-9,9,0.01)),col=3,lty=4,lwd=2)
lines(seq(-9,9,0.01),d(DExp(1))(seq(-9,9,0.01)),type='l',col=4)
legend(1,0.5,legend=c("N(0,1)","t(3)","Cauchy(0,1)","DoubleExp(1)"),lwd=c(2,1,2,1),col=1:4,lty=c(1,2,4,1),bty='n')

Figure 2

plot(seq(0,20,0.01),dnorm(seq(0,20,0.01),10,sqrt(10)),ylim=c(0,0.5),xlab='',ylab='Density',type='l',lwd=2)
lines(seq(0,20,0.01),dgamma(seq(0,20,0.01),10),lty=2,lwd=1,col=2)
lines(seq(0,20,0.01),dexp(seq(0,20,0.01)),lty=1,lwd=1,col=3)
lines(seq(0,20,0.01),dchisq(seq(0,20,0.01),2),lty=4,lwd=1,col=4)
legend(10,0.5,legend=c("N(10,sqrt(10))","Gamma(10,1)","Exp(1)","Chisq(2)"),lwd=c(2,1,1,1),col=1:4,lty=c(1,2,1,4),bty='n')

R Code for Empirical Example I
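# Empirical Example I: straight-line regression of GPA on centred ACT score for 50 students;
# the slope is estimated from T, TR and TS, and the efficacies of TR and TS are computed from
# a Gaussian kernel density estimate of the GPA distribution.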

rm(list=ls())
ACT<-c(1,2,1,0,0,0,3,2,0,2,1,2,4,0,0,1,3,0,2,3,1,1,0,0,1,0,0,0,0,0,0,1,1,1,2,0,0,0,0,0,0,1,1,1,1,2,2,0,0,1)
GPA<-c(4,1.93,3.47,3,3.27,4,3.62,3.89,3.87,4,3,3.73,4,3.56,3.36,3.55,3.2,3.3,3,2.88,3.06,3,3.47,3.27,3.75,3.62,3.25,3.18,2.33,3.75,3.14,
       3.06,3.33,3.92,3.6,3,3.43,2.4,4,2.5,4,3.77,4,3.5,3,3.06,4,3.27,3.5,3.76)
Iowa<-data.frame(ACT,GPA)
xx<-Iowa[,1]
x<-xx-mean(xx)
y<-Iowa[,2]
n<-length(x)
l<-lm(y~x)


Figure 3

resid<-summary(l)$residuals

hist(resid,freq=F,ylim=c(0,0.8),main='Histogram of Residuals',xlab='Residuals')
lines(density(resid))

qqnorm(resid,main='Normal Q-Q Plot of Residuals')
qqline(resid)

shapiro.test(resid)

Figure 4

plot(density(y),xlab=paste('GPA, Bandwidth=',round(density(y)$bw,2),sep=''),main='Kernel Density Estimation',
     lwd=2,ylim=c(0,0.8))
lines(density(y,kernel="gaussian",bw="sj"),col=2,lty=2,lwd=2)
lines(density(y,kernel="gaussian",bw="ucv"),col=3)
lines(density(y,kernel="gaussian",bw="bcv"),col=4,lty=4)
legend(1.4,0.8,legend=c("B=default","B=sj","B=ucv","B=bcv"),lwd=c(2,2,1,1),col=1:4,lty=c(1,2,1,4),bty="n")

Figure 5

plot(seq(0.5,6.5,0.01),dnorm(seq(0.5,6.5,0.01),mean(y),sd(y)),ylim=c(0,1),type='l',lty=2,main='',xlab='',ylab='Density',col=2)
lines(density(y),col=1)
legend(1,1,legend=c("GPA","N(mean(GPA), sd(GPA))"),col=1:2,lty=c(1,2),bty="n")

bw<-density(y)$bw
f<-function(z){
  s<-0
  for(i in 1:n){
    s<-s+exp(-((z-y[i])/bw)^2/2)
  }
  s/(n*bw*sqrt(2*pi))
}
f2<-function(z){
  s<-0
  for(i in 1:n){
    s<-s+exp(-((z-y[i])/bw)^2/2)
  }
  (s/(n*bw*sqrt(2*pi)))^2
}

#T<-sum(x*D)                       # Statistic T

bT<-sum(x*y)/sum(x^2)              # Estimation of beta based on statistic T
eT<-sqrt(sum(x^2)/var(y))          # Efficacy of statistic T

b<-seq(0.001,0.15,0.001)
m<-length(b)
D<-array(dim=c(m,n))
TR<-TS<-c()

#TR<-sum(x*rank(D))                # Statistic TR
for(i in 1:m){
  D[i,]<-y-b[i]*x
  TR[i]<-sum(x*rank(D[i,]))
}
Tr<-min(abs(TR))
wr<-0
for(i in 1:m){
  if(abs(TR[i])==Tr) wr<-wr+1
}


for(i in 1:(m-wr)){
  if(abs(TR[i])==Tr&abs(TR[i-1+wr])==Tr){
    hr<-TR[i]+(TR[i-1]-TR[i])/2
    lr<-2*hr*wr/(TR[i-1]-TR[i+wr])
    bTR<-round(b[i]+lr*0.01,3)
  }
}

#bTR                               # Estimation of beta based on statistic TR

#TS<-sum(x*sign(D-median(D)))      # Statistic TS
for(i in 1:m){
  D[i,]<-y-b[i]*x
  TS[i]<-sum(x*sign(D[i,]-median(D[i,])))
}
Ts<-min(abs(TS))
ws<-0
for(i in 1:m){
  if(abs(TS[i])==Ts) ws<-ws+1
}
for(i in 1:(m-ws)){
  if(abs(TS[i])==Ts&abs(TS[i-1+ws])==Ts){
    hs<-TS[i]+(TS[i-1]-TS[i])/2
    ls<-2*hs*ws/(TS[i-1]-TS[i+ws])
    bTS<-round(b[i]+ls*0.01,3)
  }
}

#bTS                               # Estimation of beta based on statistic TS
DD<-y-bTS*x

eTS<-2*f(median(DD))*sqrt(sum(x^2))   # Efficacy of statistic TS
fbar<-integrate(f2,-Inf,Inf)$value
eTR<-fbar*sqrt(12*sum(x^2))
ebT<-eT^(-2)
ebTR<-eTR^(-2)
ebTS<-eTS^(-2)
AREr<-ebT/ebTR
AREs<-ebT/ebTS

data.frame(row.names=c("Efficiency of Beta","Efficacy of statistics","ARE"),"T"=round(c(ebT,eT,1),4),
           "TR"=round(c(ebTR,eTR,AREr),4),"TS"=round(c(ebTS,eTS,AREs),4))

R Code for Empirical Example II
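# Empirical Example II: mammals data from the MASS package; brain weight is regressed on body
# weight on the original scale (Case 1) and on the log scale (Case 2), with the efficacies of
# TR and TS computed from a Gaussian kernel density estimate of the response.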

Case 1

rm(list=ls())
library(MASS)
attach(mammals)
xx<-body
x<-xx-mean(xx)
y<-brain
n<-length(x)
l<-lm(y~x)


Figure 6

resid<-summary(l)$residuals

hist(resid,xlab='Residuals',main='Histogram of Residuals')
qqnorm(resid,main='Normal Q-Q Plot of Residuals')
qqline(resid)

shapiro.test(resid)

Figure 7

plot(density(y),xlab=paste('Brain Weight (g), Bandwidth=',round(density(y)$bw,2),sep=''),lwd=3,ylim=c(0,0.0055),
     main='Kernel Density Estimation',col=5)

lines(density(y,kernel="gaussian",bw="sj"),col=4,lty=4)

lines(density(y,kernel="gaussian",bw="ucv"),col=2,lty=2,lwd=2)
lines(density(y,kernel="gaussian",bw="bcv"),col=1)

legend(3000,0.004,legend=c("B=default","B=sj","B=ucv","B=bcv"),lwd=c(3,1,2,1),lty=c(1,4,2,1),col=c(5,4,2,1),bty="n")

Figure 8

plot(xx,y,xlab='Body Weight (kg)',ylab='Brain Weight (g)')
alpha<-summary(lm(y~xx))$coefficients[1,1]
beta<-summary(lm(y~xx))$coefficients[2,1]
lines(xx,alpha+beta*xx)

plot(xx,y,xlim=c(0,600),ylim=c(0,700),xlab='Body Weight (kg)',ylab='Brain Weight (g)')
lines(xx,alpha+beta*xx)

Figure 9

plot(sort(xx),ylab='Body Weight (kg)',xlab='',ylim=c(0,600))
plot(sort(y),ylab='Brain Weight (g)',xlab='',ylim=c(0,700))

bw<-density(y)$bw
f<-function(z){
  s<-0
  for(i in 1:n){
    s<-s+exp(-((z-y[i])/bw)^2/2)
  }
  s/(n*bw*sqrt(2*pi))
}
f2<-function(z){
  s<-0
  for(i in 1:n){
    s<-s+exp(-((z-y[i])/bw)^2/2)
  }
  (s/(n*bw*sqrt(2*pi)))^2
}

#T<-sum(x*D)                       # Statistic T

bT<-sum(x*y)/sum(x^2)              # Estimation of beta based on statistic T
eT<-sqrt(sum(x^2)/var(y))          # Efficacy of statistic T

b<-seq(0.01,1.5,0.01)
m<-length(b)
D<-array(dim=c(m,n))
TR<-TS<-c()

#TR<-sum(x*rank(D))                # Statistic TR
for(i in 1:m){
  D[i,]<-y-b[i]*x
  TR[i]<-sum(x*rank(D[i,]))
}
Tr<-min(abs(TR))
wr<-0
for(i in 1:m){
  if(abs(TR[i])==Tr) wr<-wr+1
}
for(i in 1:(m-wr)){
  if(abs(TR[i])==Tr&abs(TR[i-1+wr])==Tr){
    hr<-TR[i]+(TR[i-1]-TR[i])/2
    lr<-2*hr*wr/(TR[i-1]-TR[i+wr])
    bTR<-round(b[i]+lr*0.01,3)
  }
}

#bTR                               # Estimation of beta based on statistic TR

#TS<-sum(x*sign(D-median(D)))      # Statistic TS
for(i in 1:m){
  D[i,]<-y-b[i]*x
  TS[i]<-sum(x*sign(D[i,]-median(D[i,])))
}
Ts<-min(abs(TS))
ws<-0
for(i in 1:m){
  if(abs(TS[i])==Ts) ws<-ws+1
}
for(i in 1:(m-ws)){
  if(abs(TS[i])==Ts&abs(TS[i-1+ws])==Ts){
    hs<-TS[i]+(TS[i-1]-TS[i])/2
    ls<-2*hs*ws/(TS[i-1]-TS[i+ws])
    bTS<-round(b[i]+ls*0.01,3)
  }
}

#bTS                               # Estimation of beta based on statistic TS
DD<-y-bTS*x

eTS<-2*f(median(DD))*sqrt(sum(x^2))   # Efficacy of statistic TS
fbar<-integrate(f2,-Inf,Inf)$value
eTR<-fbar*sqrt(12*sum(x^2))
ebT<-eT^(-2)
ebTR<-eTR^(-2)
ebTS<-eTS^(-2)
AREr<-round(ebT/ebTR,3)
AREs<-round(ebT/ebTS,3)

data.frame(row.names=c("Efficiency of Beta","Efficacy of statistics","ARE"),"T"=round(c(ebT,eT,1),4),
           "TR"=round(c(ebTR,eTR,AREr),4),"TS"=round(c(ebTS,eTS,AREs),4))

Case 2

rm(list=ls())
library(MASS)
attach(mammals)
xx<-log(body)
x<-xx-mean(xx)
y<-log(brain)
n<-length(x)


l<-lm(y~x)

Figure 10

resid<-summary(l)$residuals

hist(resid,xlab='Residuals',main='Histogram of Residuals',freq=F)
lines(density(resid))

qqnorm(resid,main='Normal Q-Q Plot of Residuals')
qqline(resid)
shapiro.test(resid)

Figure 11

plot(seq(-6,13,0.1),dnorm(seq(-6,13,0.1),mean(y),sd(y)),ylim=c(0,0.2),type='l',lty=2,main='',xlab='',ylab='Density',col=2)
lines(density(y),col=1)
legend(-5,0.2,legend=c("log(brain)","N(mean(log(brain)), sd(log(brain)))"),col=1:2,lty=c(1,2),bty="n")

bw<-density(y)$bw
f<-function(z){
  s<-0
  for(i in 1:n){
    s<-s+exp(-((z-y[i])/bw)^2/2)
  }
  s/(n*bw*sqrt(2*pi))
}
f2<-function(z){
  s<-0
  for(i in 1:n){
    s<-s+exp(-((z-y[i])/bw)^2/2)
  }
  (s/(n*bw*sqrt(2*pi)))^2
}

#T<-sum(x*D)                       # Statistic T

bT<-sum(x*y)/sum(x^2)              # Estimation of beta based on statistic T
eT<-sqrt(sum(x^2)/var(y))          # Efficacy of statistic T

b<-seq(0.01,1.5,0.01)
m<-length(b)
D<-array(dim=c(m,n))
TR<-TS<-c()

#TR<-sum(x*rank(D))                # Statistic TR
for(i in 1:m){
  D[i,]<-y-b[i]*x
  TR[i]<-sum(x*rank(D[i,]))
}
Tr<-min(abs(TR))
wr<-0
for(i in 1:m){
  if(abs(TR[i])==Tr) wr<-wr+1
}
for(i in 1:(m-wr)){
  if(abs(TR[i])==Tr&abs(TR[i-1+wr])==Tr){
    hr<-TR[i]+(TR[i-1]-TR[i])/2
    lr<-2*hr*wr/(TR[i-1]-TR[i+wr])
    bTR<-round(b[i]+lr*0.01,3)
  }
}

#bTR                               # Estimation of beta based on statistic TR

#TS<-sum(x*sign(D-median(D)))      # Statistic TS
for(i in 1:m){
  D[i,]<-y-b[i]*x
  TS[i]<-sum(x*sign(D[i,]-median(D[i,])))
}
Ts<-min(abs(TS))
ws<-0
for(i in 1:m){
  if(abs(TS[i])==Ts) ws<-ws+1
}
for(i in 1:(m-ws)){
  if(abs(TS[i])==Ts&abs(TS[i-1+ws])==Ts){
    hs<-TS[i]+(TS[i-1]-TS[i])/2
    ls<-2*hs*ws/(TS[i-1]-TS[i+ws])
    bTS<-round(b[i]+ls*0.01,3)
  }
}

#bTS                               # Estimation of beta based on statistic TS
DD<-y-bTS*x

eTS<-2*f(median(DD))*sqrt(sum(x^2))   # Efficacy of statistic TS
fbar<-integrate(f2,-Inf,Inf)$value
eTR<-fbar*sqrt(12*sum(x^2))
ebT<-eT^(-2)
ebTR<-eTR^(-2)
ebTS<-eTS^(-2)
AREr<-round(ebT/ebTR,3)
AREs<-round(ebT/ebTS,3)

data.frame(row.names=c("Efficiency of Beta","Efficacy of statistics","ARE"),"T"=round(c(ebT,eT,1),4),
           "TR"=round(c(ebTR,eTR,AREr),4),"TS"=round(c(ebTS,eTS,AREs),4))
