
April 1979

SUBSET SELECTION BASED ON LIKELIHOOD RATIOS: THE NORMAL MEANS CASE

By

Jayanti Chotai

American Mathematical Society 1970 subject classification: Primary 62F07; Secondary 62A10, 62F05.

Key words and phrases: Subset selection, likelihood ratio, order restrictions, loss function, normal distribution.


Let π_1, ..., π_k be k (≥ 2) populations such that π_i, i = 1, 2, ..., k, is characterized by the normal distribution with unknown mean μ_i and variance a_i σ², where a_i is known and σ² may be unknown. Suppose that, on the basis of independent samples of size N_i from π_i (i = 1, 2, ..., k), we are interested in selecting a random-size subset of the given populations which hopefully contains the population with the largest mean. Based on likelihood ratios, several new procedures for this problem are derived in this report. Some of these procedures are compared with the classical procedure of Gupta (1956, 1965) and are shown to be better in certain respects.


CONTENTS

1. INTRODUCTION AND SUMMARY
2. A SHORT REVIEW AND PRELIMINARIES
   2.1 A short review with comments
   2.2 Some preliminaries
3. THE SELECTION PROCEDURE R1
   3.1 Derivation of the procedure R1
   3.2 Some properties of R1
   3.3 Determination of the constant satisfying the P*-condition
       3.3.1 Some general results
       3.3.2 The probabilities P(M_i = m | k; w)
   3.4 The case of unknown variances
4. PROCEDURES BASED ON SOME OTHER LIKELIHOOD RATIOS
   4.1 The class R2(A) of procedures
   4.2 The class R3(A) of procedures
   4.3 The class R4(A) of procedures
   4.4 The procedure R5
       4.4.1 Derivation of the procedure
       4.4.2 P(CS|R5) and its infimum
5. PROCEDURES DERIVED BY ASSUMING A FIXED PARAMETER VECTOR
   5.1 The procedure R6
   5.2 Some other likelihood ratios
6. COMPARISONS BETWEEN SOME OF THE PROCEDURES
   6.1 The case of three populations and fixed configurations
   6.2 The case of ten populations and random configurations
   6.3 Conclusions
ACKNOWLEDGEMENTS
REFERENCES


1. INTRODUCTION AND SUMMARY

Let π_1, ..., π_k be k (≥ 2) populations such that π_i (i = 1, 2, ..., k) is characterized by the normal distribution with unknown mean μ_i and variance a_i σ², where a_i is known and σ² may be unknown. Let μ_[1] ≤ μ_[2] ≤ ... ≤ μ_[k] denote the ordered means, and for i = 1, ..., k let π_[i] be the unknown population corresponding to μ_[i]. Suppose that we are interested in selecting the best population, on the basis of an independent sample {X_ij, j = 1, ..., n_i} of size n_i from π_i for each i = 1, 2, ..., k. As a first step in achieving this goal, one may wish to select a subset of the given populations, the size of the subset being a random variable depending on the data obtained. This report is concerned with selection procedures that accomplish this step. The words procedure and rule will be used interchangeably. The procedures given here are new and are obtained through likelihood ratios as described below. The consequences of applying these methods to distributions other than normal, and to other goals like the complete ranking problem and selection of the t best, will be considered in forthcoming reports.

Let μ be the unknown vector of means and let Ω be the parameter space. We shall assume that Ω is the k-dimensional Euclidean space. The likelihood function obtained by considering the total sample of N = Σ_{i=1}^k n_i observations is denoted by L(x, μ, σ²). Let μ̂_0 denote the maximum-likelihood estimate of μ. For a given constant c, 0 < c < 1, consider the region Ω(c) consisting of all μ ∈ Ω such that L(x, μ, σ²) ≥ c · L(x, μ̂_0, σ²). Thus Ω(c) is simply a likelihood-based confidence region for the unknown μ. Now consider the selection procedure, denoted by R1, which includes the population π_i in the selected subset iff Ω(c) contains at least one point (μ_1, μ_2, ..., μ_k) having μ_i as its largest component. This is the topic of Section 3.


Section 2 contains a short review of the field of ranking and selection procedures, together with some preliminaries. Rule R1 is considered in Section 3. In Section 4 we generalize the idea of relative likelihood to consideration of ratios of suprema of likelihoods over some intuitively reasonable regions in Ω. Classes R2(A), R3(A) and R4(A) of selection procedures are derived in Sections 4.1, 4.2 and 4.3 respectively. The rules contained in R2(A) and R3(A) depend on a prespecified constant A ≥ 0 and coincide with R1 for A = 0. The rules in R4(A) depend on a prespecified constant A > 0 (strictly), and the limiting case when A → 0 is denoted by R5 and considered in Section 4.4.

If the parameter space Ω is restricted to the set of k! permutations of fixed values μ_1', ..., μ_k', then likelihood ratios lead to other interesting rules. In particular, the whole of Seal's (1955) class may be generated by assigning different values to μ_1', ..., μ_k'. This is considered in Section 5, where an important rule R6 is derived and studied.

Section 6 is devoted to comparisons between the rules R, R1, R5 and R6, where R denotes the traditional rule proposed by Gupta (1956). The comparisons are made both in terms of the P*-approach (see Gupta (1965)) and in terms of four loss functions that have appeared in the literature. These loss functions are explicitly given in Section 6.

As it turns out, the rules R1, R5 and R6 seem to do better than R in certain respects. However, if the goal is only to minimize the expected number of nonbest populations selected, then R and R1 do better than R5 and R6. Rule R has an advantage over R1 if the good populations lie far apart; if the good populations lie close together, rule R1 appears to have an advantage over R. Here, the terms "good" and "bad" are used loosely to mean "having a high rank" and "having a low rank", when the populations are ranked in terms of their means. The main conclusions from the comparisons made appear in Section 6.3.


2. A SHORT REVIEW AND PRELIMINARIES

2.1 A short review with comments

Very often the experimenter is faced with the problem of comparing a set of k (≥ 2) populations (categories, drugs, etc.) in some sense on the basis of independent samples from the populations. The classical tests of homogeneity provide only a partial solution to the problem. Therefore, procedures have been developed within the last thirty years which enable the experimenter to rank the populations and/or select a subset of the populations in a manner that satisfies some reasonable requirements that the experimenter may impose. References to work done in this field may be obtained, among others, from Bechhofer, Kiefer and Sobel (1968), Gibbons, Olkin and Sobel (1977), Gupta (1965, 1977) and Gupta and Panchapakesan (1972). An extensive bibliography containing over six hundred items has been compiled by Kulldorff (1977).

The subject of selection procedures as it exists today may be formulated in two directions: fixed-size subset selection and random-size subset selection. The discussion that follows will be mainly confined to the normal means problem with a common known variance, when a sample of size n is taken from each population.

Fixed-size subset selection

Let Ω denote the k-dimensional Euclidean space, and for a given integer t, 1 ≤ t < k, let

    Ω_t(δ*) = {μ ∈ Ω : μ_[k−t+1] ≥ μ_[k−t] + δ*}

for any specified δ* > 0. Then Ω_t(δ*) is known as the preference zone and Ω − Ω_t(δ*) is known as the indifference zone. Consider the problem of selecting the t best populations, corresponding to μ_[k−t+1], μ_[k−t+2], ..., μ_[k].


In the present formulation, a procedure is employed that selects a subset of a prespecified size s, 1 ≤ s < k, from the k given populations. Suppose that a correct selection (CS) is said to be made if all the t best populations are included in the s populations selected by the procedure. One requires that

(2.1)    P(CS) ≥ P*  if  μ ∈ Ω_t(δ*),

where P* and δ* > 0 are prespecified constants, P* lying strictly between the probability of a correct selection under purely random choice and 1. Obviously, s ≥ t is required for the above definition of CS.

The case when s = t is commonly known as the "indifference zone approach". Since Bechhofer (1954), many modifications have been made to obtain procedures for distributions other than normal and for other definitions of "correct selection". For example, CS may be defined as inclusion in the selected subset of at least t_1 of the t best populations. Generalizations to the cases when the observations are taken sequentially or in several stages have also been considered in the literature; see Bechhofer and Tamhane (1977).

Random-size subset selection

In this formulation, the size of the selected subset is not fixed in advance but depends on the data obtained. Different approaches have been used. We confine our discussion to the case t = 1.

(A) The P*-approach: This approach is considered extensively in the literature and is commonly known as the "subset selection approach". By a "correct selection" is meant selection of a subset that includes the best population. One requires that

(2.2)    P(CS) ≥ P*  for all  μ ∈ Ω.

This requirement is known as the basic probability requirement or the

P*-condition.


For this problem, Seal (1954, 1955) defined a class, denoted by C, of selection procedures as follows. Let c_1, ..., c_{k−1} be any non-negative real numbers such that Σ_{j=1}^{k−1} c_j = 1. Let x̄_(1) ≤ x̄_(2) ≤ ... ≤ x̄_(k) be the ordered sample means from the populations. Now consider the following selection procedure:

"Include the population corresponding to x̄_(i) in the selected subset iff

    x̄_(i) ≥ c_1 x̄_(1) + c_2 x̄_(2) + ... + c_{i−1} x̄_(i−1) + c_i x̄_(i+1) + ... + c_{k−1} x̄_(k) − d,

where d is the smallest number such that the P*-condition is satisfied."

The class C is obtained by assigning different values to the c's.

Under the assumption that all the population means except one are equal, Seal proposed to find the rule in C which maximizes the probability of including in the selected subset the population with the unequal mean if this mean is larger than the common mean of the other k−1 populations, and which minimizes this probability if the unequal mean is smaller. His calculations indicated that this rule, denoted here by R̄, would be obtained by assigning the common value (k−1)^{−1} to each c_j (j = 1, 2, ..., k−1). That this actually is the case has been elegantly proved in Berger (1977).

However, intuitively reasonable arguments led Gupta (1956, 1965) to propose another member of C. This rule, denoted by R in this report, may be obtained by setting c_{k−1} = 1 and c_j = 0 for j ≠ k−1. Gupta showed that the expected number of nonbest populations included in the selected subset is often smaller if one uses R rather than R̄. Further comparisons between R and the other members of C appear in Deely and Gupta (1968) and in Gupta and Hsu (1977).
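As a concrete illustration, the class C and its two special members can be sketched in a few lines of Python. This is not code from the report: the function name `seal_rule` and the numerical values are invented for illustration, and the constant d is treated as given rather than solved for from the P*-condition.

```python
import numpy as np

def seal_rule(xbar, c, d):
    """Seal's class C: retain the population whose ordered mean is x_(i)
    iff x_(i) >= the c-weighted combination of the other k-1 ordered
    means, minus d.  c: k-1 non-negative weights summing to 1."""
    xbar = np.asarray(xbar, float)
    k = len(xbar)
    order = np.argsort(xbar)            # population indices, smallest mean first
    sx = xbar[order]                    # ordered sample means
    selected = []
    for pos in range(k):
        others = np.delete(sx, pos)     # the remaining k-1 ordered means
        if sx[pos] >= np.dot(c, others) - d:
            selected.append(order[pos])
    return sorted(selected)

k = 5
c_gupta = np.zeros(k - 1); c_gupta[-1] = 1.0      # Gupta's rule R
c_seal = np.full(k - 1, 1.0 / (k - 1))            # Seal's rule: equal weights

xbar = [0.1, 1.3, 0.9, 2.0, 1.8]                  # illustrative sample means
print(seal_rule(xbar, c_gupta, d=0.5))  # R keeps i with xbar_i >= max - d
print(seal_rule(xbar, c_seal, d=0.5))
```

Note that with Σ c_j = 1 and d ≥ 0, the population with the largest sample mean is always retained, for every member of C.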

It may be remarked that a "natural" development of the fixed-size subset formulation to allow random-size subset selection appears to have been considered in Ryan and Antle (1976). Generally, however, tables are available mostly for the requirement (2.2).

It may also be remarked that many generalizations and modifications of the approach under consideration exist in the literature. One line of generalization is to consider populations that are characterized by distributions other than normal. Most of the rules which have been considered under this generalization may be expressed as

"Include π_i in the selected subset iff

    h(T_i) ≥ max_{1≤j≤k} T_j,"

where h(x) is a suitable function and, for each i, T_i is a suitable estimate of the parameter characterizing π_i. For the normal means problem with common known variance σ², we may let the T's be the sample means and set h(x) = x + dσ/√n, thereby obtaining Gupta's (1956) rule R.

Before we conclude our review of the P*-approach, the following is in order. When several rules are available for a given problem, their performances are usually compared under certain selected configurations of the parameters involved. Let P_[i], i = 1, 2, ..., k, denote the probability that π_[i] is included in the selected subset. Let a denote the set of ranks of the selected populations and let |a| denote its cardinality. Then E(|a|) = Σ_{j=1}^k P_[j] denotes the expected size of the selected subset. A quantity that has been proposed as a comparison criterion is E(|a|)/P(CS). However, apart from our desire to obtain E(|a|) as small as possible and P(CS) as large as possible, it may be difficult to justify the use of this quantity in many practical situations.

Intuitively, the expected value of the average rank Σ_{j∈a} j / |a| of the selected subset appears to be a reasonable criterion for comparing different rules. Straightforward computation shows that, for rule R in the normal means problem, the expected average rank is given by

    Σ_{p=1}^{k} Σ_{a∈S_p} { p^{−1} Σ_{j∈a} j } { Σ_{i∈a} ∫_{−∞}^{∞} Π_{j∉a} Φ(y − λ + δ_ij) · Π_{j∈a, j≠i} [Φ(y + δ_ij) − Φ(y − λ + δ_ij)] dΦ(y) },

where λ = dσ/√n, δ_ij = μ_i − μ_j and S_p = {a : |a| = p}.

In general, expressions and values of P_[i] (i = 1, 2, ..., k) may be easier to obtain than those of the expected average rank. In that case, the expected value of Σ_{j∈a} j / |a| may be roughly approximated by the ratio of the expectations taken separately over the numerator and the denominator. This leads us to suggest the quantity Σ_{j=1}^k j P_[j] / Σ_{j=1}^k P_[j] as a comparison criterion.
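In practice such criteria are easy to estimate by simulation. The following is a rough Monte Carlo sketch (mine, not from the report; the function names and the slippage configuration are invented for illustration) of how P(CS), E(|a|) and the expected average rank might be estimated for Gupta's rule R:

```python
import numpy as np

rng = np.random.default_rng(0)

def gupta_select(xbar, lam):
    # Gupta's rule R: keep population i iff xbar_i >= max_j xbar_j - lambda
    xbar = np.asarray(xbar)
    return np.flatnonzero(xbar >= xbar.max() - lam)

def criteria(mu, n, sigma, lam, reps=20000):
    """Monte Carlo estimates of P(CS), E|a| and the expected average rank
    of the selected subset (ranks: 1 = smallest mean, ..., k = largest)."""
    mu = np.asarray(mu, float)
    best = np.argmax(mu)
    ranks = np.argsort(np.argsort(mu)) + 1   # rank of each population
    cs = size = avg_rank = 0.0
    for _ in range(reps):
        xbar = rng.normal(mu, sigma / np.sqrt(n))
        sel = gupta_select(xbar, lam)
        cs += best in sel
        size += len(sel)
        avg_rank += ranks[sel].mean()
    return cs / reps, size / reps, avg_rank / reps

# slippage configuration: one mean 0.5 above the common mean of the rest
pcs, esize, erank = criteria(mu=[0, 0, 0, 0.5], n=10, sigma=1.0, lam=0.8)
print(pcs, esize, erank)
```

The same loop, with `gupta_select` swapped out, serves to compare any of the rules discussed in this report under a chosen parameter configuration.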

(B) The restricted random-size approach: Here, restrictions like an upper bound on the size of the selected subset are imposed.

(C) The decision theoretic approach: In this approach, a loss function is given that takes into consideration not only "correct selection" but also other aspects, like the parameter values of the populations selected and their distances from the largest. A prior distribution is given to the unknown parameters. The loss functions that have appeared in this connection are explicitly given in Section 6.

2.2 Some preliminaries

The following simple result is often useful when looking for the parameter configuration that minimizes the probability of a correct selection. It is a minor generalization of a theorem appearing in Seal (1955).


LEMMA 2.2.1 Let X = (X_1, ..., X_k) be a k-dimensional random variable (r.v.). Let T_i, i = 1, ..., k, be given one-to-one transformations on R^1. Let Y = (Y_1, ..., Y_k) be another r.v. having the same joint distribution as (T_1(X_1), ..., T_k(X_k)). Now let W be an arbitrary given subset of R^k and set

    W* = {x ∈ R^k : (T_1^{−1}(x_1), ..., T_k^{−1}(x_k)) ∈ W}.

If W* ⊇ W then we have

    P[(X_1, ..., X_k) ∈ W] ≥ P[(Y_1, ..., Y_k) ∈ W].

PROOF

    P[(X_1, ..., X_k) ∈ W] = P[(T_1(X_1), ..., T_k(X_k)) ∈ W*] = P[(Y_1, ..., Y_k) ∈ W*] ≥ P[(Y_1, ..., Y_k) ∈ W],

and the proof is complete.

The following definitions from Gupta and Nagel (1971) will be used in this report. Let X̄ = (X̄_1, ..., X̄_k) denote the vector of sample means from π_1, ..., π_k.

DEFINITION 2.2.2 Given that X̄ = x̄ = (x̄_1, x̄_2, ..., x̄_k), the function φ_i(x̄) is defined as

    φ_i(x̄) = P[π_i included in the selected subset | X̄ = x̄]

for i = 1, ..., k.

DEFINITION 2.2.3 A selection rule is called just if, for every i = 1, ..., k, φ_i(x̄_1, ..., x̄_k) is a non-decreasing function of x̄_i and a non-increasing function of x̄_j, j ≠ i.

DEFINITION 2.2.4 A selection rule is called translation invariant if, for every x̄ ∈ R^k, for every c ∈ R and for every i = 1, ..., k,

    φ_i(x̄_1 + c, ..., x̄_k + c) = φ_i(x̄_1, ..., x̄_k).

DEFINITION 2.2.5 A selection rule is called monotone if

    μ_i ≥ μ_j  ⟹  P(π_i included in the selected subset) ≥ P(π_j included in the selected subset),

where μ_1, ..., μ_k are the means characterizing π_1, ..., π_k.

DEFINITION 2.2.6 A selection rule is called permutation invariant (or symmetric) if

    (φ_1(g(x̄)), φ_2(g(x̄)), ..., φ_k(g(x̄))) = g(φ_1(x̄), φ_2(x̄), ..., φ_k(x̄))

for every x̄, where g(y_1, y_2, ..., y_k) denotes any arbitrary permutation of the components y_1, y_2, ..., y_k.

REMARK 2.2.7 For the normal means case it can be deduced from Lemma 2.2.1 that, for a just rule, the probability of including π_[k] in the selected subset attains its minimum in the parameter space at a point where μ_1 = μ_2 = ... = μ_k. Furthermore, if the rule is also translation invariant then this minimum is independent of the common mean. Note that these results have been obtained in Gupta and Nagel (1971).


3. THE SELECTION PROCEDURE R1

Suppose that we have k (≥ 2) populations π_1, ..., π_k. From each π_i we obtain n_i i.i.d. observations X_ij, j = 1, ..., n_i, where X_ij has the normal distribution with unknown mean μ_i and known variance σ_i². The case when the variances can be written in the form σ_i² = a_i σ², where the a_i are known constants and σ² is unknown, will be treated in Section 3.4.

Now denote the ordered means by μ_[1] ≤ μ_[2] ≤ ... ≤ μ_[k] and the corresponding unknown populations by π_[1], π_[2], ..., π_[k]. Then π_[k] will be called the best population, and if several of the means are equal to μ_[k] then one of them is assumed to be tagged as the best. Suppose that we are interested in selecting the best population. For this goal, we derive a procedure in this section that selects a nonempty, random-size subset of the populations such that it includes the best population with a "high" probability. The event that the selected subset contains the best population will be denoted by "Correct Selection" (CS).

In Section 3.1 we derive a procedure, R1, from a likelihood-based confidence region for μ = (μ_1, ..., μ_k). We show in Section 3.2 that R1 is just, translation invariant and monotone. Section 3.3 deals with the constants necessary to implement R1 so that it satisfies the P*-condition. In Section 3.4, we extend the rule to the case when σ_i² = a_i σ² with σ² unknown and the a_i known constants. Tables at the end of this section give, for k ≤ 12 and for selected values of P* and n, the values of d_1 required to satisfy the P*-condition for the cases of known variances (Table IA) and unknown variances (Table IB).

3.1 Derivation of the procedure R1

The total likelihood for the N = Σ_{i=1}^k n_i observations is given by

(3.1)    L(x, μ, σ²) = Π_{i=1}^k (2πσ_i²)^{−n_i/2} exp{ −(1/(2σ_i²)) Σ_{j=1}^{n_i} (x_ij − μ_i)² }.

The maximum-likelihood estimate of μ will be denoted by μ̂_0 = (x̄_1, ..., x̄_k), where x̄_1, ..., x̄_k are the sample means from the populations. Let Ω denote the parameter space for μ, and for each i = 1, ..., k define

    Ω_i = {μ ∈ Ω : μ_i ≥ μ_j ∀j}.

Let c be a predetermined number such that 0 < c < 1. The approach used to construct the selection procedure R1 of this section can be expressed as follows:

"Let Ω(c) = {μ ∈ Ω : L(x, μ, σ²) ≥ c · L(x, μ̂_0, σ²)}. Now include π_i in the selected subset iff Ω(c) ∩ Ω_i ≠ ∅."

In words, we construct a region having relative likelihood at least c, and then select π_i iff there exists at least one point μ = (μ_1, ..., μ_k) in this region such that μ_i ≥ μ_j ∀j. We now proceed to express this procedure explicitly.

It is easy to see that

    L(x, μ, σ²) / L(x, μ̂_0, σ²) = exp{ −(1/2) Σ_{j=1}^k w_j (x̄_j − μ_j)² },

where w_j = n_j/σ_j².

Therefore, R1 may be obtained through the inequality

    sup_{μ∈Ω_i} L(x, μ, σ²) ≥ c · L(x, μ̂_0, σ²),

which is equivalent to

(3.2)    inf_{μ∈Ω_i} Σ_{j=1}^k w_j (x̄_j − μ_j)² ≤ −2 log c = d_1.

Note that if we set μ_j = x̄_j ∀j then the sum on the left in (3.2) equals zero, and so the population with the largest observed mean is always selected. Now for j = 1, ..., k, the jth component of the value of μ that gives the infimum in (3.2) is given by "the isotonic regression of x̄_j with weights w_j, with respect to the partial ordering given in Ω_i," in the terminology of Barlow et al. (1972). These regression values may in fact be calculated by using the algorithms given in §2.3 of that reference.

However, some of the results in the present report require for their solution the direct approach given below.

Let X̄ = x̄ be given and define

    Ω_i* = {μ ∈ Ω_i : (μ_i ≥ x̄_i) ∧ (μ_j = x̄_j ∀j with x̄_j ≤ μ_i) ∧ (μ_j = μ_i ∀j with x̄_j > μ_i)}.

Then it may be observed that for every μ' ∈ Ω_i there exists a μ'' ∈ Ω_i* such that

    Σ_{j=1}^k w_j (x̄_j − μ_j'')² ≤ Σ_{j=1}^k w_j (x̄_j − μ_j')²,

and so we may restrict ourselves to the set Ω_i* when looking for the infimum in (3.2).

Let x̄_(1) ≤ x̄_(2) ≤ ... ≤ x̄_(k) denote the ordered values of the sample means, and let w_(j) and π_(j) be the characters corresponding to x̄_(j). Now fix i. For each m = i+1, ..., k, let I_m = [x̄_(m−1), x̄_(m)). To determine the infimum on Ω_i* we consider the function

(3.3)    f_i(y) = w_i (x̄_i − y)² + Σ_{j=m}^k w_(j) (x̄_(j) − y)²   if y ∈ I_m.

Note that the above function is defined for every y ≥ x̄_i. The following lemma gives some properties of this function.

LEMMA 3.1.1 For each i = 1, ..., k, there exists a unique m ∈ J_i = {j : x̄_(j) > x̄_i} such that the quantity y_{m,i} defined by

(3.4)    y_{m,i} = ( w_i x̄_i + Σ_{j=m}^k w_(j) x̄_(j) ) / ( w_i + Σ_{j=m}^k w_(j) )

has the property that y_{m,i} ∈ I_m = [x̄_(m−1), x̄_(m)). Moreover, f_i(y) attains a unique minimum at y = y_{m,i}.

PROOF Let i be given. Then on each I_m for m ∈ J_i we have

    (d/dy) f_i(y) = −2w_i (x̄_i − y) − 2 Σ_{j=m}^k w_(j) (x̄_(j) − y),

and so

    (d²/dy²) f_i(y) = 2w_i + Σ_{j=m}^k 2w_(j) > 0.

Therefore, f_i'(y) is strictly increasing in each I_m. Since f_i(y) is continuous, and since it is nonincreasing at y = x̄_i and nondecreasing at y = x̄_(k), f_i attains a unique minimum for some y ∈ [x̄_i, x̄_(k)]. Furthermore, f_i(y) is differentiable at every y ∈ (x̄_i, x̄_(k)). Therefore, the minimum is attained at a point where f_i'(y) = 0. Now f_i'(y) = 0 in I_m is equivalent to y = y_{m,i}. Thus set m to be the number such that

    x̄_(m−1) ≤ ( w_i x̄_i + Σ_{j=m}^k w_(j) x̄_(j) ) / ( w_i + Σ_{j=m}^k w_(j) ) < x̄_(m).

The proof of the lemma is now complete.

We may now express our selection rule R1 as follows:

"Retain π_i in the selected subset iff

    w_i (x̄_i − y_{m,i})² + Σ_{j=m}^k w_(j) (x̄_(j) − y_{m,i})² ≤ d_1,

where y_{m,i} is defined by (3.4) and where the unique m ∈ J_i = {j : x̄_(j) > x̄_i} is determined by the property y_{m,i} ∈ I_m = [x̄_(m−1), x̄_(m))."


REMARK 3.1.2 It is interesting to note that the selection of π_(i) depends on the distances x̄_(j) − x̄_(i), j = i+1, ..., k. This property is also satisfied by the rules obtained in Goel and Rubin (1977) and Chernoff and Yahav (1977). However, for the rule R1, if there exists an m ∈ {i+1, ..., k} such that

    x̄_(m−1) ≤ ( w_i x̄_i + Σ_{j=m}^k w_(j) x̄_(j) ) / ( w_i + Σ_{j=m}^k w_(j) ) < x̄_(m),

then R1 does not care where all the other x̄_(j) with j ∉ {m, m+1, ..., k} lie, as long as this condition remains satisfied.

REMARK 3.1.3 Gupta and Wong (1976) have generalized the procedure given by Gupta and Huang (1976), and give the following procedure for the present problem:

"Retain π_i in the selected subset iff

    x̄_i ≥ max_{j} [ x̄_j − b ( σ_i²/n_i + σ_j²/n_j )^{1/2} ]."

It may be noted that for k = 2, this procedure is equivalent to R1.

REMARK 3.1.4 Implementation of R1. It may be of interest to note that for given i < k the value of y_{m,i} may in practice be obtained by the following procedure. Calculate

    b_0 = ( w_i x̄_i + w_(k) x̄_(k) ) / ( w_i + w_(k) ).

If b_0 ≥ x̄_(k−1), set m = k; otherwise calculate b_1, b_2, ..., where

    b_ℓ = ( w_i x̄_i + Σ_{j=0}^{ℓ} w_(k−j) x̄_(k−j) ) / ( w_i + Σ_{j=0}^{ℓ} w_(k−j) ),   ℓ = 0, 1, ... .

Continue in this manner until the first time that b_ℓ ≥ x̄_(k−ℓ−1). Set m = k − ℓ for this ℓ.

It is also worth noting that if w_1 = w_2 = ... = w_k, then the rejection of π_(i) for any i implies that all the π_(j) with j < i are also rejected. However, this is not necessarily true if the w's are unequal.
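The scan just described translates directly into code. The following is a sketch of my own (not code from the report; names such as `A_stat` are invented) that computes the left-hand side of (3.2) by pooling downward from the largest order statistic, and then applies R1:

```python
import numpy as np

def A_stat(xbar, w, i):
    """A_i = inf over {mu : mu_i >= mu_j for all j} of
    sum_j w_j (xbar_j - mu_j)^2, computed by the scan of Remark 3.1.4."""
    xbar = np.asarray(xbar, float)
    w = np.asarray(w, float)
    k = len(xbar)
    if xbar[i] >= xbar.max():
        return 0.0                       # largest observed mean: always selected
    order = np.argsort(xbar)             # ascending order of the sample means
    sx, sw = xbar[order], w[order]
    num, den = w[i] * xbar[i], w[i]
    # pool population i with the largest, next-largest, ... order statistics
    # until the pooled mean y lands in its interval [x_(m-1), x_(m))
    for j in range(k - 1, -1, -1):
        num += sw[j] * sx[j]
        den += sw[j]
        y = num / den
        if j == 0 or y >= sx[j - 1]:
            break
    return w[i] * (xbar[i] - y) ** 2 + float(np.sum(sw[j:] * (sx[j:] - y) ** 2))

def R1_select(xbar, w, d1):
    # Rule R1: retain population i iff A_i <= d1 = -2 log c
    return [i for i in range(len(xbar)) if A_stat(xbar, w, i) <= d1]

print(R1_select([1.0, 2.0, 0.0], w=[1, 1, 1], d1=1.0))   # -> [0, 1]
```

In the printed example the A-values are 0.5, 0 and 2.0, so with d_1 = 1 the populations with means 1.0 and 2.0 are retained and the third is rejected.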

3.2 Some properties of R1

The following theorem summarizes the main properties of R1.

THEOREM 3.2.1 The rule R1 is just and translation invariant. Furthermore, it is monotone if the w's are equal.

PROOF For the sake of simplicity in notation we assume μ_1 ≤ μ_2 ≤ ... ≤ μ_k. Let now x̄ = (x̄_1, ..., x̄_k) and ȳ = (ȳ_1, ..., ȳ_k) be any two points in the k-dimensional Euclidean space such that ȳ_k = x̄_k and ȳ_j ≥ x̄_j for j ≠ k. Further, let μ_0 be any arbitrary real number such that μ_0 ≥ x̄_k. Now it is easy to see that

(3.6)    w_k (x̄_k − μ_0)² + Σ_{j∈J_x} w_j (x̄_j − μ_0)² ≤ w_k (ȳ_k − μ_0)² + Σ_{j∈J_y} w_j (ȳ_j − μ_0)²,

where J_x = {j : x̄_j > μ_0} and J_y = {j : ȳ_j > μ_0}. We may take the infimum over {μ_0 : μ_0 ≥ x̄_k} on both sides of (3.6) and thereby show that A_k, given by

    A_k = inf_{μ∈Ω_k} Σ_{j=1}^k w_j (x̄_j − μ_j)²,

is nondecreasing in each x̄_j, j ≠ k. Now obviously A_k is translation invariant, and it therefore follows that it is nonincreasing in x̄_k when the other x̄_j are kept fixed. So R1 is just. Now Lemma 2.2.1 may be applied to complete the proof of the theorem.


REMARK 3.2.2 An alternative proof of the above theorem may be furnished by a theorem in Nagel (1970, p.21, theorem 1.5.2), which states that a rule that is just and symmetric is also monotone.

3.3 Determination of the constant satisfying the P*-condition

It follows from Theorem 3.2.1 that inf_{μ∈Ω} P(CS|R1) is attained at a point where μ_1 = μ_2 = ... = μ_k = μ and is independent of μ. We shall now proceed to determine the constant d_1(P*) which is necessary to implement R1, if we require that inf_{μ∈Ω} P(CS|R1) ≥ P* for a given value of P*. In Section 3.3.1 we give the distribution of each random variable A_i, i = 1, ..., k, given by

(3.7)    A_i = inf_{μ∈Ω_i} Σ_{j=1}^k w_j (X̄_j − μ_j)²,

where the X̄_j, j = 1, ..., k, have common mean and variances w_j^{−1} = σ_j²/n_j, j = 1, ..., k, respectively. This distribution depends on certain probabilities which will be treated in Section 3.3.2. Obviously, the constant d_1 is determined by solving the equation

(3.8)    min_{1≤i≤k} P(A_i ≤ d_1) = P*.

We show in Section 3.3.2 that for k ≤ 4, the minimum in (3.8) is given by the value of i for which w_i = max{w_1, ..., w_k}. We do not know if this holds generally for k > 4.


3.3.1 Some general results

First, we introduce some notations. Let X., i = 1, ..., k be k indepen­

dent normally distributed random variables with common mean and variances

- 1 2 -

W i = °i / ' n i' i = 1 k respectively. Let < X (2) < . .. < X^

denote their order statistics. For each i = 1, .... k, let M. denote the i

r.v. taking values in {2,3,...,k+l} and given by

(3.9) M i =

where

m if X. < X. . and X. .. < v . < X. . i - (m-1) (m-1) - m,i (m)

k + 1 if X. > X. Vj i - J

k

w.X. + E W/ .,X...

1 1

.i-, <j>

(

J>

v , = d

#

m,i k

I w . X j =m (J) (J)

It is important to note that for each i the distribution of depends on the number k and also on the vector w = (w^,...,w^). Therefore, the notation P(M^=m|k;w) or P(M^=m|k;w^,...,w^) will be used. However, if obvious from the context, the dependence on the particular k or w under consideration will sometimes be suppressed in the notation. Now we may rewrite (3.7) as

2 Ï ,= ,2

w.(X.-y .) + E w / ..(X . v-Yvr •) t k 1

i i T M.,I . M (j) (j) M.,i i T

A. = i

i j=M. J J l'

J

l

0 if M. = k + 1 i

We are now in a position to state the following theorem.

THEOREM 3.3.1 For each i = 1, ..., k and d_1 > 0, we have

(3.10)    P(A_i ≤ d_1) = P(M_i=k+1|k;w) + Σ_{m=2}^k P(χ²_{k−m+1} ≤ d_1) · P(M_i=m|k;w),

where the notation χ²_ν is used to denote a random variable having the χ² distribution with ν degrees of freedom.


Before we prove the above theorem, some comments are in order. Firstly, if all the w's are equal then each A_i (i = 1, 2, ..., k) has the same distribution. This is not true for the case of unequal w's. Secondly, the distribution of A_i depends partly on the χ² distribution and partly on the distribution of M_i. Extensive tables of the χ² distribution appear in Khamis (1965). The distribution of M_i is of course independent of d_1 and will be discussed in Section 3.3.2. Thirdly, it may be noted that the approach used to derive the distribution of A_i is very similar to the one in Barlow et al. (1972, §3), where tests of homogeneity against ordered alternatives with general partial orders are discussed. For the present problem, however, the direct proof given below is shorter and may be of interest.

We now give two lemmas which appear as Lemmas B and C in Barlow et al. (1972, p. 128). Their statement of the second lemma is somewhat different from ours, but the proof is exactly the same. Lemma 3.3.3 is needed in the proof of Theorem 3.3.1, whereas Lemma 3.3.2 is required only to prove Lemma 3.3.3.

LEMMA 3.3.2 If Y_1, Y_2, ..., Y_r are independent normal variables with zero means and unit variances, and Q is a set of restrictions on the Y's of the form

    Σ_{i=1}^r c_i Y_i ≥ 0

such that the probability that Q is satisfied is non-zero, then the conditional distribution of

    Σ_{i=1}^r Y_i²,

given Q, is that of a χ² random variable with r degrees of freedom. □


LEMMA 3.3.3 If Z_1, Z_2, ..., Z_r are independent normally distributed random variables with common mean and variances b_1^{−1}, b_2^{−1}, ..., b_r^{−1}, then the conditional distribution of

(3.11)    Σ_{i=1}^r b_i (Z_i − Z̄)²,

given a set of restrictions of the type Z_i ≤ Z̄ or Z_i ≥ Z̄, is that of χ² with r−1 degrees of freedom, where

    Z̄ = Σ_{i=1}^r b_i Z_i / Σ_{i=1}^r b_i.

Moreover, the distribution of (3.11) is independent of Z̄.

PROOF OF THEOREM 3.3.1 Let k and i be given, and let d_1 > 0 be a given real number. Then

    P(A_i ≤ d_1) = Σ_{m=2}^{k+1} P(A_i ≤ d_1, M_i=m) = Σ_{m=2}^{k} P(A_i ≤ d_1, M_i=m) + P(M_i=k+1).

For the rest of the proof keep m ∈ {2, ..., k} fixed. Partition the k−1 integers 1, 2, ..., i−1, i+1, ..., k into S_1^m = {j_1, ..., j_{m−2}} and S_2^m = {j_{m−1}, ..., j_{k−1}}. Let Lm denote the collection of all the (k−1 choose m−2) such distinct partitions.

For any given partition (S_1^m, S_2^m), let Q^m denote the event

(3.12)    X̄_j < X̄_2  ∀j ∈ S_1^m,   X̄_i < X̄_2,   X̄_j > X̄_2  ∀j ∈ S_2^m,

where

    X̄_2 = ( w_i X̄_i + Σ_{j∈S_2^m} w_j X̄_j ) / ( w_i + Σ_{j∈S_2^m} w_j ).

Then it is easy to see that

    P(A_i ≤ d_1, M_i=m) = Σ_{Lm} P(A_i ≤ d_1 | Q^m) P(Q^m).

Now by Lemma 3.3.3, the random variable

    w_i (X̄_i − X̄_2)² + Σ_{j∈S_2^m} w_j (X̄_j − X̄_2)²

is independent of Q^m, and thus

    P(A_i ≤ d_1 | Q^m) = P(χ²_{k−m+1} ≤ d_1)

for any given partition in Lm. Now the relation

    Σ_{Lm} P(Q^m) = P(M_i=m)

completes the proof of the theorem.
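Theorem 3.3.1 is easy to check by simulation. The sketch below (mine, not from the report) does so for k = 3 with equal weights, for which the mixing probabilities work out to P(M=4) = 1/3, P(M=3) = 1/2 and P(M=2) = 1/6; the χ² c.d.f.'s for 1 and 2 degrees of freedom are written in closed form to keep the block self-contained.

```python
import numpy as np
from math import erf, exp, sqrt

rng = np.random.default_rng(1)

def A_i(x, i):
    # inf over {mu_i >= mu_j for all j} of sum_j (x_j - mu_j)^2, k = 3, w = (1,1,1)
    if x[i] >= x.max():
        return 0.0
    s = np.sort(x)
    y = (x[i] + s[2]) / 2.0              # try pooling i with the largest mean only
    if y >= s[1]:
        return (x[i] - y) ** 2 + (s[2] - y) ** 2
    y = x.mean()                         # otherwise pool all three
    return float(np.sum((x - y) ** 2))

d1 = 2.5
# left-hand side of (3.10): direct simulation under a common mean
sim = np.mean([A_i(rng.normal(size=3), 0) <= d1 for _ in range(100000)])

# right-hand side of (3.10): P(M=4) + P(chi2_1 <= d1) P(M=3) + P(chi2_2 <= d1) P(M=2)
p_chi2_1 = erf(sqrt(d1 / 2.0))           # chi-square c.d.f., 1 d.f.
p_chi2_2 = 1.0 - exp(-d1 / 2.0)          # chi-square c.d.f., 2 d.f.
theory = 1.0 / 3.0 + 0.5 * p_chi2_1 + (1.0 / 6.0) * p_chi2_2
print(sim, theory)                       # the two values should agree closely
```

The same kind of simulation can be used to solve (3.8) numerically for d_1 when closed expressions for the distribution of M_i are not available.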

3.3.2 The probabilities P(M_i = m | k; w)

We now give a brief description on determination of the probabilities P(M^=m|k;w). A detailed discussion on such probabilities in a general set­

ting appears in Barlow et.al. (1972, §3.3), where they use the notation P(m-l,k;w) to denote P^^mjkjw) .

It became clear in the proof of Theorem 3.3.1 that determination of the probability P(M^=m|k;w) involves determination of probabilities of type P(Q m ), where Q m is given by (3.12). The index sets and in (3.12) are disjoint and S™ U S™ = {1,2,...,i-1,i+1,...,k}.

Let i be given. We first discuss the case m > 2. Since for each j € the random variables 5L - and are independent by appeal to standard results, we may write

p (Q

M

) = P(L<X

2

vj es") • p (X.<X

2

,XJ>X

2

vj ES

2

) S P(Q") • P(Q

2

).

Now if we let S™ » {j., , j_,. • .,j and

1 i i m-z

w 0 = Z w. + w.

2 .-„m j J 1 j€S 2

then it can be seen that

(3.13) P(Q?) = P(M =m|m-l;w. ,w ,...,w. ,w ).

1 n _ 1 J 1 J 2 J m - 2 2

Now, since the correlation coefficient between X̄_ℓ - X̄_2 and X̄_j - X̄_2 for ℓ, j ∈ S_1^m is

ρ_{ℓj} = a_ℓ a_j = (w_ℓ/(w_2+w_ℓ))^{1/2} · (w_j/(w_2+w_j))^{1/2},

it is well known (see Gupta (1963)) that the probability in (3.13) may be written as

(3.14)  ∫_{-∞}^{∞} [ Π_{j∈S_1^m} Φ(a_j x/(1-a_j²)^{1/2}) ] φ(x) dx,

where Φ(x) and φ(x) are the c.d.f. and the p.d.f. of the standard normal distribution.

As regards P(Q_2^m), it can be seen that P(Q_2^m) = P(M_1=2 | k-m+2; w_i, w_{j_{m-1}}, w_{j_m}, …, w_{j_{k-1}}), with S_2^m = {j_{m-1}, …, j_{k-1}}. Now a probability of the type P(M_1=2|ℓ;w) for arbitrary ℓ and w may be determined through the relation

(3.15)  Σ_{j=2}^{ℓ+1} P(M_1=j|ℓ;w) = 1

if P(M_1=j|ℓ;w) were known for j = 3, 4, …, ℓ+1. Since k-m+2 < k for m > 2, we therefore have a recurrence relation whereby P(M_i=m|k;w) may be determined through determination of probabilities of the type P(M_1=j|ℓ;w) for ℓ < k and suitable w, and through integrals of the type (3.14) used to determine P(Q_1^m).

As for the case m = 2, it is clear by now that P(M_1=2|k;w_1,…,w_k) may be determined by taking ℓ = k in (3.15) once P(M_1=j|k;w_1,…,w_k) for j = 3, 4, …, k+1 have been determined.

Finally, the required probability is obtained from

P(M_i=m|k;w) = Σ_{L_m} P(Q^m),

where L_m denotes the collection of all the (k-1 choose m-2) distinct partitions of {1,2,…,i-1,i+1,…,k} into S_1^m and S_2^m.

Especially for large values of k and general w, the calculations needed to compute P(M_i=m|k;w) may be quite time consuming. However, the computations involve only integrals of the type (3.14) with m < k and various a_j, and these may be obtained by numerical integration.
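The recurrence just described can be programmed directly. The following is a minimal sketch of our own (the names level_prob and top_prob are illustrative, and we assume the convention that the tracked population's weight is listed first); integrals of the type (3.14) are evaluated by the trapezoidal rule, using the simplification a_j x/(1-a_j²)^{1/2} = (w_j/w_2)^{1/2} x.

```python
import math
from itertools import combinations

def phi(x):  # standard normal p.d.f.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):  # standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def top_prob(w_block, w_others):
    # Integral (3.14): probability that every mean whose weight is listed in
    # w_others falls below the mean of a block of total weight w_block.
    n, lo, hi = 4000, -8.0, 8.0
    h = (hi - lo) / n
    total = 0.0
    for s in range(n + 1):
        x = lo + s * h
        f = phi(x)
        for wj in w_others:
            f *= Phi(math.sqrt(wj / w_block) * x)
        total += f if 0 < s < n else 0.5 * f
    return h * total

def level_prob(m, w):
    # P(M_1 = m | k; w), the tracked population's weight being w[0].
    k = len(w)
    if m == k + 1:                 # nothing amalgamated with population 1
        return top_prob(w[0], w[1:])
    if m == 2:                     # relation (3.15)
        return 1.0 - sum(level_prob(j, w) for j in range(3, k + 2))
    others = w[1:]
    total = 0.0
    for s1 in combinations(range(k - 1), m - 2):
        s2 = [t for t in range(k - 1) if t not in s1]
        w2 = w[0] + sum(others[t] for t in s2)     # weight of the block
        total += (top_prob(w2, [others[t] for t in s1])
                  * level_prob(2, [w[0]] + [others[t] for t in s2]))
    return total
```

For equal weights this reproduces the tabled values, e.g. level_prob(4, [1, 1, 1]) ≈ 1/3 and level_prob(4, [1, 1, 1, 1]) ≈ 0.456.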

An important point in this connection is that we do not know the value of i that gives the minimum in (3.8) for general k. Closed expressions for the distribution of M_i are available for k ≤ 4, in which case we may prove that this minimum is attained for the value of i corresponding to max_{1≤j≤k} w_j. For this, define

ρ_{jp(i)} = [ w_j w_p / ((w_i+w_j)(w_i+w_p)) ]^{1/2}

for any set {i,j,p} of distinct integers. Define also

ρ_{jp·ℓ(i)} = [ ρ_{jp(i)} - ρ_{jℓ(i)} ρ_{pℓ(i)} ] / [ (1-ρ_{jℓ(i)}²)(1-ρ_{pℓ(i)}²) ]^{1/2}

for any set {i,j,p,ℓ} of distinct integers. Then we have the following expressions from Bartholomew (1961):

i) k=2

P(M_i=3|2;w_1,w_2) = P(M_i=2|2;w_1,w_2) = 1/2   for i = 1, 2.

ii) k=3

P(M_i=4|3;w_1,w_2,w_3) = 1/4 + (1/2π) sin⁻¹ ρ_{jp(i)}

P(M_i=3|3;w_1,w_2,w_3) = 1/2

P(M_i=2|3;w_1,w_2,w_3) = 1/4 - (1/2π) sin⁻¹ ρ_{jp(i)}

where (i,j,p) is any permutation of (1,2,3).

iii) k=4

P(M_i=5|4;w_1,w_2,w_3,w_4) = 1/8 + (1/4π)(sin⁻¹ ρ_{jp(i)} + sin⁻¹ ρ_{jℓ(i)} + sin⁻¹ ρ_{pℓ(i)})

P(M_i=4|4;w_1,w_2,w_3,w_4) = 3/8 + (1/4π)(sin⁻¹ ρ_{jp·ℓ(i)} + sin⁻¹ ρ_{pℓ·j(i)} + sin⁻¹ ρ_{jℓ·p(i)})

P(M_i=3|4;w_1,w_2,w_3,w_4) = 1/2 - P(M_i=5|4;w_1,w_2,w_3,w_4)

P(M_i=2|4;w_1,w_2,w_3,w_4) = 1/2 - P(M_i=4|4;w_1,w_2,w_3,w_4)

where (i,j,p,ℓ) is any permutation of (1,2,3,4).
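As a numerical check (ours, not part of the report), the closed expressions above can be coded directly; for equal weights they give the values 1/3, 1/2, 1/6 for k = 3, and the four probabilities for k = 4 sum to one for any weights.

```python
import math

def rho(w, j, p, i):
    # rho_{jp(i)} of the text; arguments are 1-based population labels
    wj, wp, wi = w[j - 1], w[p - 1], w[i - 1]
    return math.sqrt(wj * wp / ((wi + wj) * (wi + wp)))

def rho_part(w, j, p, l, i):
    # partial correlation rho_{jp.l(i)}
    a, b, c = rho(w, j, p, i), rho(w, j, l, i), rho(w, p, l, i)
    return (a - b * c) / math.sqrt((1.0 - b * b) * (1.0 - c * c))

def probs_k3(w, i):
    # {m: P(M_i=m|3;w)} from Bartholomew's closed expressions
    j, p = [t for t in (1, 2, 3) if t != i]
    s = math.asin(rho(w, j, p, i)) / (2.0 * math.pi)
    return {2: 0.25 - s, 3: 0.5, 4: 0.25 + s}

def probs_k4(w, i):
    # {m: P(M_i=m|4;w)} from Bartholomew's closed expressions
    j, p, l = [t for t in (1, 2, 3, 4) if t != i]
    s5 = (math.asin(rho(w, j, p, i)) + math.asin(rho(w, j, l, i))
          + math.asin(rho(w, p, l, i))) / (4.0 * math.pi)
    s4 = (math.asin(rho_part(w, j, p, l, i)) + math.asin(rho_part(w, p, l, j, i))
          + math.asin(rho_part(w, j, l, p, i))) / (4.0 * math.pi)
    p5, p4 = 0.125 + s5, 0.375 + s4
    return {2: 0.5 - p4, 3: 0.5 - p5, 4: p4, 5: p5}
```

For instance, with equal weights probs_k4 gives P(M_i=5|4) = 1/4 and P(M_i=4|4) ≈ 0.456.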

We need the following lemma from Seal (1954) to prove the theorem that follows.

LEMMA 3.3.4  Let a_1, …, a_k be real numbers such that a_1 ≤ a_2 ≤ … ≤ a_k. Further, let c_{1j} and c_{2j}, j = 1, 2, …, k, be real numbers such that

a) Σ_{j=1}^k c_{1j} = Σ_{j=1}^k c_{2j}, and

b) Σ_{j=r}^k c_{1j} ≤ Σ_{j=r}^k c_{2j} for r = 1, 2, …, k.

Then we have

Σ_{j=1}^k c_{1j} a_j ≤ Σ_{j=1}^k c_{2j} a_j.

PROOF  Set s_r = Σ_{j=1}^r (c_{1j} - c_{2j}). Then by a) we have s_k = 0, and so

Σ_{j=1}^k (c_{1j} - c_{2j}) a_j = Σ_{r=2}^k (s_r - s_{r-1}) a_r + s_1 a_1 = Σ_{r=1}^{k-1} s_r (a_r - a_{r+1}).

The proof follows by observing that, by a) and b), s_r ≥ 0 for r = 1, …, k. □
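A quick numerical sanity check of the lemma (our illustration): moving mass in c_2 from a higher index to a lower one preserves condition a) and produces condition b), and the weighted sums then order as the lemma claims.

```python
import random

random.seed(2)
violations = 0
for _ in range(500):
    k = 6
    a = sorted(random.uniform(-2.0, 2.0) for _ in range(k))  # a_1 <= ... <= a_k
    c2 = [random.uniform(-1.0, 1.0) for _ in range(k)]
    c1 = c2[:]
    lo, hi = sorted(random.sample(range(k), 2))
    d = random.uniform(0.0, 1.0)
    c1[hi] -= d          # conditions a) and b) now hold by construction
    c1[lo] += d
    lhs = sum(c * x for c, x in zip(c1, a))
    rhs = sum(c * x for c, x in zip(c2, a))
    if lhs > rhs + 1e-9:
        violations += 1
```

Here the difference of the two weighted sums is d(a_lo - a_hi) ≤ 0, so no violation can occur.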

THEOREM 3.3.5  Let k ≤ 4 be a given integer, and let X̄_j, j = 1, …, k, be normally distributed independent random variables with common mean and variances w_1⁻¹, …, w_k⁻¹ such that w_1 ≤ … ≤ w_k. Then for the A_i defined by (3.7) we have

min_{1≤i≤k} P(A_i < d_1) = P(A_k < d_1)   for every d_1 > 0.

PROOF  Since we know regarding the χ² distribution that

P(χ²_{ν+1} < d_1) ≤ P(χ²_ν < d_1)   for d_1 > 0,

it suffices to prove, by virtue of Theorem 3.3.1 and Lemma 3.3.4, that

(3.16)  Σ_{m=r}^{k+1} P(M_i=m|k;w) ≥ Σ_{m=r}^{k+1} P(M_k=m|k;w)

for each i < k and each r = 2, …, k+1. We use the explicit expressions given above to show this.

For k ≤ 2 the relation (3.16) is obvious. For k = 3, it suffices to prove that ρ_{12(3)} ≤ ρ_{23(1)} and ρ_{12(3)} ≤ ρ_{13(2)}. But this is easy to see by examining the expressions for these ρ:s. As regards k = 4, let (i,j,ℓ) denote any permutation of (1,2,3). The theorem would follow if we show that if, in the expression for ρ_{ij·ℓ(4)}, the integer 4 is exchanged with any one of the integers i, j or ℓ, then the resulting ρ is not less than ρ_{ij·ℓ(4)}. This we now show by first rewriting ρ_{ij·ℓ(4)} as

(3.17)  ρ_{ij·ℓ(4)} = [ w_i w_j / ((w_4+w_i+w_ℓ)(w_4+w_j+w_ℓ)) ]^{1/2}.

Now interchange 4 and i in (3.17). The inequality ρ_{ij·ℓ(4)} ≤ ρ_{4j·ℓ(i)} is then equivalent to

(3.18)  w_i (w_i+w_j+w_ℓ) ≤ w_4 (w_4+w_j+w_ℓ).

Now the function

f(y) = y (y + w_j + w_ℓ)

is nondecreasing in y for positive y, and so (3.18) follows from w_i ≤ w_4. By symmetry we have also proved ρ_{ij·ℓ(4)} ≤ ρ_{i4·ℓ(j)}. Finally, the right-hand side of (3.17) is unchanged when 4 and ℓ are interchanged, so that ρ_{ij·ℓ(4)} = ρ_{ij·4(ℓ)}, and so the theorem is proved. □
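The stochastic-ordering condition (3.16) can also be checked numerically for k = 3 (an illustration of ours, using the closed expressions above): for randomly drawn weights with w_1 ≤ w_2 ≤ w_3, the tail sums of M_i for i = 1, 2 dominate those of M_3.

```python
import math
import random

def tail_sums(w, i):
    # P(M_i=m|3;w) for m = 2,3,4 from the closed expressions, followed by
    # the tail sums over m >= r (r = 4, 3, 2) appearing in (3.16)
    j, p = [t for t in (1, 2, 3) if t != i]
    r = math.sqrt(w[j - 1] * w[p - 1]
                  / ((w[i - 1] + w[j - 1]) * (w[i - 1] + w[p - 1])))
    p4 = 0.25 + math.asin(r) / (2.0 * math.pi)
    p2 = 0.25 - math.asin(r) / (2.0 * math.pi)
    return [p4, p4 + 0.5, p4 + 0.5 + p2]

random.seed(7)
violations = 0
for _ in range(200):
    w = sorted(random.uniform(0.1, 5.0) for _ in range(3))  # w1 <= w2 <= w3
    t3 = tail_sums(w, 3)            # i = 3 carries the largest weight
    for i in (1, 2):
        ti = tail_sums(w, i)
        if any(x < y - 1e-12 for x, y in zip(ti, t3)):
            violations += 1
```

Since ρ_{jp(i)} decreases in w_i, the tail for r = 4 is smallest when i carries the largest weight, and the remaining tails agree, so no violation occurs.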

When all the w:s are equal there is considerable simplification since, for fixed k, all the M_i are identically distributed in this case and their distribution is independent of the common w. We have

P(M_i=m|k;w,w,…,w) = (k-1 choose m-2) · P(M_1=m|m-1; (k-m+2)w, w, …, w) · P(M_1=2|k-m+2; w, w, …, w)

for each i = 1, …, k. A table of these probabilities appears in Bartholomew (1961, Table 2), which is also reproduced in Barlow et al. (1972, Table A.6). Using this table we have computed Table IA given at the end of Section 3. This table gives the values of d_1 required to satisfy the P*-condition for k = 2, …, 12 and P* = 0.10(0.10)0.60(0.05)0.95, 0.975, 0.99.

3.4 The case of unknown variances

The selection procedure R1 can be extended to the case when the variances are unknown, if the unknown variances are of the form σ_i² = a_i σ², where a_i, i = 1, …, k, are known constants and σ² is unknown.

Let L(x,μ,σ²) denote the likelihood function as before. Let also μ̂_0 = (x̄_1,…,x̄_k) and

σ̂_0² = Σ_{i=1}^k a_i⁻¹ Σ_{j=1}^{n_i} (x_{ij} - x̄_i)² / N

be the maximum likelihood estimators of μ and σ², where (μ,σ²) ∈ Ω×Θ and where N is the total number of observations from all the populations.

In the parameter space Ω×Θ, we may now construct a region having relative likelihood at least c, and then select π_i iff there exists at least one point (μ,σ²) in this region also belonging to Ω_i×Θ. Now it has been shown by Barlow et al. (1972, Example 2.4) that in the restricted space Ω_i×Θ, the maximum likelihood estimator μ̂ of μ is the same as in the case of σ² known, and that the maximum likelihood estimator of σ² in Ω_i×Θ is given by

σ̂² = Σ_{i=1}^k a_i⁻¹ [ Σ_{j=1}^{n_i} (x_{ij} - x̄_i)² + n_i (x̄_i - μ̂_i)² ] / N = Σ_{i=1}^k a_i⁻¹ Σ_{j=1}^{n_i} (x_{ij} - μ̂_i)² / N.
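The second equality in the expression for σ̂² is just the usual within/between decomposition of a sum of squares about a fitted value; a one-population numerical check (our illustration, with an arbitrary fitted value):

```python
import random

random.seed(11)
x = [random.gauss(1.0, 2.0) for _ in range(8)]   # one population's sample
n = len(x)
xbar = sum(x) / n
mu_hat = 0.7    # an arbitrary fitted value standing in for mu-hat_i
lhs = sum((v - mu_hat) ** 2 for v in x)
rhs = sum((v - xbar) ** 2 for v in x) + n * (xbar - mu_hat) ** 2
```

The two sides agree to floating-point accuracy, since the cross term vanishes when deviations are taken about the sample mean.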

By considering the ratio

(3.19)  sup_{(μ,σ²)∈Ω_i×Θ} L(x,μ,σ²) / L(x,μ̂_0,σ̂_0²),

we arrive at the selection rule

"Retain π_i in the selected subset iff

[ inf_{μ∈Ω_i} Σ_{ℓ=1}^k (n_ℓ/a_ℓ)(x̄_ℓ - μ_ℓ)² ] / [ (N-k)⁻¹ Σ_{ℓ=1}^k a_ℓ⁻¹ Σ_{j=1}^{n_ℓ} (x_{ℓj} - x̄_ℓ)² ] < d_1".

The infimum over Ω_i is obtained through (3.4).

It may be remarked that if, instead of (3.19), we use the ratio

sup_{μ∈Ω_i} L(x,μ,σ̂_0²) / L(x,μ̂_0,σ̂_0²),

then we arrive at exactly the same selection rule. It may also be noted that the distribution of M_i given by (3.9) is independent of the unknown value of σ².

Now let Ã_i denote the expression in the selection rule, so that π_i is selected iff Ã_i < d_1. Then, by noting the independence between the numerator and the denominator of Ã_i, and by using arguments similar to those in the proof of Theorem 3.3.1, the following theorem can be proved.

THEOREM 3.4.1  For each i = 1, …, k and d_1 > 0, we have

P(Ã_i < d_1) = P(M_i=k+1|k;w) + Σ_{m=2}^k P(F_{k-m+1, N-k} < d_1/(k-m+1)) · P(M_i=m|k;w),

where the notation F_{ν_1,ν_2} is used to denote a random variable having the F distribution with parameters ν_1 and ν_2. □
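Theorem 3.4.1 can be illustrated by simulation in the simplest case k = 2 with a_1 = a_2 = 1 and equal means, where P(M_i=3|2;w) = P(M_i=2|2;w) = 1/2 and the theorem reads P(Ã_i < d_1) = 1/2 + (1/2)·P(F_{1,N-2} < d_1). The sketch below (ours) estimates both sides by Monte Carlo, simulating the F variable as a ratio of chi-squares.

```python
import random

random.seed(3)
n, k = 5, 2
N = n * k
d1 = 2.0
trials = 40000

hits_A = 0   # direct simulation of the selection statistic for population 1
hits_F = 0   # simulation of an F(1, N-k) variable
for _ in range(trials):
    x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [random.gauss(0.0, 1.0) for _ in range(n)]
    m1, m2 = sum(x1) / n, sum(x2) / n
    s2 = (sum((v - m1) ** 2 for v in x1)
          + sum((v - m2) ** 2 for v in x2)) / (N - k)
    # inf over {mu_1 >= mu_2}: zero if m1 >= m2, otherwise pool the two means
    num = 0.0 if m1 >= m2 else n * (m1 - m2) ** 2 / 2.0
    hits_A += num / s2 < d1
    z = random.gauss(0.0, 1.0)
    chi = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(N - k))
    hits_F += z * z / (chi / (N - k)) < d1

lhs = hits_A / trials
rhs = 0.5 + 0.5 * hits_F / trials   # Theorem 3.4.1 with P(M=3)=P(M=2)=1/2
```

The two estimates agree to within Monte Carlo error, reflecting that the numerator is a σ²χ²_1 variable on the pooling event and the denominator an independent σ²χ²_{N-2}/(N-2) variable.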

The following theorem is analogous to an earlier result.

THEOREM 3.4.2  In Theorem 3.3.5, if each A_i is replaced by Ã_i, and if each w_i is assumed to be of the form n_i/(a_i σ²), where σ² is unknown, then the theorem still holds. □

At the end of this section we give Table IB. For P* = 0.75, 0.90, 0.95, 0.99 and n = 2(1)20(5)50, this table provides the values of d_1 required to satisfy the P*-condition when σ_i² = σ², i = 1, …, k, where σ² is assumed unknown and when random samples, each of size n, are taken from each of the populations.

TABLE IA and TABLE IB

(The printed tables of d_1-values satisfying the P*-condition belong here; the numerical entries are not legible in this copy of the report.)

References
