APPLIED PHYSICS AND ELECTRONICS
UMEÅ UNIVERSITY, SWEDEN
DIGITAL MEDIA LAB
Document distances using the Zipf distribution and a novel metric
Apostolos A. Georgakis
1 Dept. of Applied Physics and Electronics
Umeå University, SE-90187 Umeå, Sweden
e-mail: apostolos.georgakis@tfe.umu.se
H. Li
Dept. of Applied Physics and Electronics, Umeå University
SE-90187 Umeå, Sweden
e-mail: haibo.li@tfe.umu.se
DML Technical Report: DML-TR-2003:01
ISSN Number: 1652-8441
Report Date: December 1, 2003
1 This work was supported by the European Union Research Training Network (RTN) “MUHCI: Multi-modal Human Computer Interaction” (HPRN-CT-2000-00111).
A novel metric is proposed in the present report for evaluating the goodness of fit between the distribution functions of two samples. We extend the use of the proposed criterion to the case of the generalized Zipf distribution. A detailed mathematical analysis of the proposed metric, which is embedded in a hypothesis test, is also provided.
Keywords
Zipf distribution, n-gram frequencies, Bhattacharyya metric
1 Introduction
In a plethora of natural phenomena the distribution of the characteristic under consideration is heavily skewed. For example, biological, ecological, and chemical systems sometimes exhibit an exponentially decaying model. Web-site popularity, web access statistics, Internet traffic, and the population and growth of cities also comply with the same decaying model. Furthermore, many references can be found in bibliometrics, informetrics, and library science. A plethora of distributions capable of modeling the above phenomena exists in the literature, the most prevalent among them being the well-known Zipf distribution [1, 3].
The Zipf distribution relies on an empirical law discovered by Estoup in 1916 and named after the Harvard linguistics professor G. K. Zipf (1902-1950). The distribution relates the frequency of occurrence of an event α to the rank m_α of the event, when the rank is determined by the above frequency of occurrence. The relationship is the power-law function:

P(\alpha) \sim \frac{1}{m_\alpha^{\theta}} \quad (1)

with the exponent θ close to unity. The probability distribution in Eq. (1) is an instance of a power law. Zipf's law is an experimental law, not a theoretical one, and the causes of Zipfian distributions in real life are a matter of some controversy. Nevertheless, Zipfian distributions are commonly observed in many kinds of phenomena.
Initially the Zipf distribution was confined to the linguistic community, where it associated the frequency of a word in a document with its rank [4, 7]. The prerequisite for the above law to be applicable in linguistics is that the document be fairly large.
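The rank-frequency law of Eq. (1) is easy to check empirically on any word-frequency list. The sketch below is our own illustration, not part of the report; the toy corpus and function names are invented. It estimates the exponent θ by a least-squares fit of log-frequency against log-rank:

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Estimate the Zipf exponent theta of Eq. (1) via a least-squares
    fit of log(frequency) against log(rank)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # P(alpha) ~ 1 / rank^theta

# toy corpus whose counts decay exactly like 60/rank
text = ("the " * 60 + "of " * 30 + "and " * 20 + "to " * 15
        + "in " * 12 + "a " * 10).split()
theta = zipf_exponent(text)
```

On real documents the fit is only approximate, which is one reason the law requires fairly large documents.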
2 Document distance
It is generally accepted that the contextual “similarity” between documents (regardless of their size) can be based on their structural textual elements, namely the words forming these documents. This fact is the basic principle behind the vector space model (VSM) [6]. In the VSM, the available textual data are encoded into a numerical form and are represented by numerical vectors. Furthermore, it is generally agreed that the contextual similarity between documents carries over to their vectorial representation. Since the Zipf distribution of a document employs the frequencies of the words forming that particular document, it is justified to evaluate the contextual similarity based on the numerical encoding produced by that distribution.
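As a concrete illustration of such an encoding, two documents can be mapped to word-frequency vectors over a common vocabulary. This is a minimal sketch of ours, not the report's implementation; all names are invented:

```python
from collections import Counter

def word_pmf(doc, vocab):
    """Encode a document as a vector of word relative frequencies over a
    fixed vocabulary (a minimal VSM-style encoding)."""
    counts = Counter(doc.split())
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total for w in vocab]

docs = ["the cat sat on the mat", "the dog sat on the log"]
vocab = sorted(set(" ".join(docs).split()))
vectors = [word_pmf(d, vocab) for d in docs]
```

Each vector sums to one, so it can be treated as the word-probability distribution that the Zipf model is fitted to.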
A novel distance measure is provided in the current chapter. In Appendix A it is proven that the proposed distance measure is also a metric. This metric is used to evaluate the similarity between Zipf-distributed vectors. The suggested metric can easily be proven to be computationally cheaper (faster) than the Euclidean distance: for two N_w-dimensional vectors, the computational cost of the suggested metric is N_w multiplications, a bit-shift operation, and N_w additions, compared to the N_w multiplications and (2N_w - 1) additions of the Euclidean distance.
Furthermore, exploiting the fact that the vectors under consideration are distributed according to Zipf's law enables us to extend the suggested metric in the direction of a statistical hypothesis test. The hypothesis under consideration is whether two Zipf-distributed vectors, and consequently two documents, are similar or not.
For this reason a detailed distribution for the proposed metric is provided, along with a detailed proof. Two distribution tables for the proposed metric are also supplied, to make the chapter self-contained.
In what follows, section 2.1 provides a description of the proposed metric, and section 2.2 describes the process of incorporating the Zipf distribution into the proposed metric; it also provides a detailed proof for the evaluation of the distribution associated with the proposed metric. Finally, section 2.3 provides the hypothesis test for the evaluation of the similarity between two Zipf-distributed vectors.
2.1 Proposed metric
Let us suppose that X_N = {x_1, x_2, ..., x_N} is a collection of N_w-dimensional random vectors, where x_i = (x_{i1}, x_{i2}, ..., x_{iN_w})^T with cumulative probability density function f_x(i). Let also x_{im} denote the univariate random variable with distribution function f_i(m), where f_i(m) corresponds to the probability of the m-th element of the i-th vector, that is, f_i(m) = P(x_{im}), where

\sum_{m=1}^{N_w} P(x_{im}) = 1. \quad (2)
We further assume that the probabilities in Eq. (2) follow the Zipf distribution. In order to assess whether two vectors drawn independently from the set X_N are of the same “shape”, one needs to compare their distribution functions. For this purpose a novel metric is introduced. Let x_i and x_j denote two vectors randomly drawn from the set X_N (x_i, x_j ∈ X_N). The hypothesis whose validity is to be tested is:

H_0: The two cumulative distribution functions are “identical” ⇒ f_x(i) = f_x(j), or equivalently

H_0: f_i(m) ≅ f_j(m), for almost every m,

against the negation of H_0. If the null hypothesis is true, the population distributions are identical and the two samples are drawn from the same population, meaning that the vectors x_i and x_j should be regarded as instances of the same population. Therefore, allowing for statistically negligible sampling variations, under H_0 there should be reasonable agreement between the two distributions. The proposed criterion between the i-th and j-th distributions, henceforth denoted by D_ij, is defined as:
D_{ij} = h(x_i, x_j, x_i \circ x_j) = \left( x_i + x_j + g(x_i, x_j) \right) \left( x_i + x_j + g(x_i, x_j) \right)^T, \quad (3)

where the notation (∘) denotes the Hadamard product between two vectors and g(x_i, x_j) denotes the N_w-dimensional vector whose k-th element is \sqrt{P(x_{ik}) P(x_{jk})} (the square root of the Hadamard product of the vectors x_i and x_j).
From Eq. (3), the following form for the variable D_ij derives:

D_{ij} \triangleq \sum_{m=1}^{N_w} \frac{\left( f_i(m) + \sqrt{f_i(m) f_j(m)} \right)^2}{f_i(m)} \quad (4)

= \sum_{m=1}^{N_w} \frac{f_i^2(m) + f_i(m) f_j(m) + 2 f_i(m) \sqrt{f_i(m) f_j(m)}}{f_i(m)}

= \sum_{m=1}^{N_w} \left( f_i(m) + f_j(m) + 2 \sqrt{f_i(m) f_j(m)} \right)

= \sum_{m=1}^{N_w} \left( f_i(m) + f_j(m) \right) + \sum_{m=1}^{N_w} 2 \sqrt{f_i(m) f_j(m)}

= 2 + 2 \sum_{m=1}^{N_w} \sqrt{f_i(m) f_j(m)} \quad (5)

= 2 + 2 L_{ij} \quad (6)
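The quantity appearing in Eq. (6) is the Bhattacharyya coefficient L_ij of the two pmfs, so the criterion can be computed in a few lines. A minimal sketch of ours (function names are invented):

```python
import math

def d_metric(f_i, f_j):
    """Proposed criterion of Eq. (6): D_ij = 2 + 2 * L_ij, where
    L_ij = sum_m sqrt(f_i(m) * f_j(m)) is the Bhattacharyya coefficient."""
    l_ij = sum(math.sqrt(p * q) for p, q in zip(f_i, f_j))
    return 2.0 + 2.0 * l_ij

# two toy pmfs over the same support
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
d_pq = d_metric(p, q)
```

Identical pmfs attain the maximum D_ij = 4, while pmfs with disjoint supports attain the minimum D_ij = 2, matching the bounds established later in the report.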
From Eq. (5) it is evident that only the square roots of x_im and x_jm are needed. Therefore, instead of storing the actual values of x_im and x_jm, one can retain only their square roots. In that way there is no need
Figure 1: The divergence viewed under different metrics (θ_i = 1.35, θ_j = 1.55). The grayed area corresponds to the divergence measured by: (a) the Euclidean distance, which relies on the shaded area between the distribution functions, and (b) the proposed metric, which is based on the shaded area in the bottom left side of the plot.
to evaluate the square roots each time the value of the random variable D_ij is needed, thus limiting the computational cost to just N_w multiplications, a bit-shift operation (the multiplication by 2), and N_w additions.
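The storage trick above can be sketched as follows (our illustration; names are invented): the square roots are computed once per vector, and every later evaluation of D_ij is a plain inner product followed by a doubling:

```python
import math

def store_roots(f):
    """Pre-compute and store sqrt(f(m)) once per vector."""
    return [math.sqrt(p) for p in f]

def d_from_roots(r_i, r_j):
    """D_ij from stored roots: N_w multiplications, N_w additions and
    one doubling (the bit-shift mentioned in the text)."""
    return 2.0 + 2.0 * sum(a * b for a, b in zip(r_i, r_j))

f_i = [0.5, 0.25, 0.125, 0.125]
f_j = [0.25, 0.25, 0.25, 0.25]
d = d_from_roots(store_roots(f_i), store_roots(f_j))
```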
Appendix A proves that the proposed distance measure also satisfies the three properties of a metric, so it will henceforth be referred to as a metric.
From Eq. (25) it is obvious that D_ij ∈ [2, 4], where D_ij equals four when f_i(m) = f_j(m), ∀m. On the other hand, D_ij equals two only in the extreme case where the distributions of the i-th and j-th RVs have disjoint supports, that is:

P(x_{im}) \begin{cases} \neq 0, & \text{when } P(x_{jm}) = 0 \\ = 0, & \text{elsewhere} \end{cases} \quad \forall m \quad (7)

in which case the product f_i(m) f_j(m) equals zero and therefore D_ij attains the value two. The closer the pdf of the i-th RV is to the pdf of the j-th RV, the larger the value of L_ij and, subsequently, the closer the value of D_ij is to four. So the hypothesis test mentioned earlier is transformed into:
H_0: D_ij is statistically equal to four
H_1: the negation of H_0
It must be noted here that Eq. (4) resembles the Chi-square goodness-of-fit test proposed by Pearson, but there is no other resemblance to that particular test. In fact, since the Chi-square test uses the maximum divergence between the distributions under consideration, it might lead to unexpected results when the distributions differ in just two samples out of the N_w samples comprising the N_w-dimensional vectors.
Figure 1 depicts the areas used by the proposed metric and the Euclidean distance in evaluating the similarity between the distributions1.
2.2 The Zipf distribution and the proposed metric
In order to evaluate the hypothesis test mentioned in section 2.1, the probability density function of the random variable D_ij must be computed. In doing so, one must first determine the probability of the random variable
1The vectors used in this figure were artificially generated.
x_{im}. For the case under consideration the probability of the random variable is:

f_i(m) = \frac{1}{m^{\theta_i} H_{N_w,\theta_i}}, \quad (8)

where θ_i is a parameter dependent on the data set under consideration and H_{N_w,θ_i} is the so-called N_w-th Harmonic number of order θ_i, a normalizing factor equal to:

H_{N_w,\theta_i} = \sum_{m=1}^{N_w} \frac{1}{m^{\theta_i}}. \quad (9)
Equation (8) is the well-known generalized Zipf distribution [1].
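Eqs. (8) and (9) translate directly into code. The sketch below is ours (names invented) and builds the pmf for a given N_w and θ:

```python
def zipf_pmf(n_w, theta):
    """Generalized Zipf pmf of Eq. (8); the normalizer is the harmonic
    number H_{N_w, theta} of Eq. (9)."""
    h = sum(1.0 / m ** theta for m in range(1, n_w + 1))
    return [1.0 / (m ** theta * h) for m in range(1, n_w + 1)]

f = zipf_pmf(100, 1.35)
```

The resulting probabilities sum to one and decrease strictly with the rank m, as required by the power law.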
The first step towards the computation of the distribution of the variable D_ij is to evaluate the distribution of the elements of the random vector z_{ij} = (z_{ij1}, z_{ij2}, ..., z_{ijN_w}) = x_i ∘ x_j = (x_{i1}x_{j1}, x_{i2}x_{j2}, ..., x_{iN_w}x_{jN_w}). Since the formation of the m-th element of z_{ij} requires multiplying the corresponding m-th elements of both x_i and x_j, this leads to the following: P(z_{ijm}) = P(x_{im}x_{jm}). In the previous expression the random variable x_{im} is independent of the variable x_{jm}, since they refer to two different random vectors, which leads to: P(z_{ijm}) = P(x_{im})P(x_{jm}).
For the evaluation of the probability of z_{ijm} it is first necessary to determine the cdf for a given number m, where m ∈ ℕ. Let this distribution be denoted by F_ij(m):

F_{ij}(m) = P(\text{up to the } m\text{-th element of } z_{ij}) \quad (10)

= F_i(m) \cdot F_j(m) = \sum_{s=1}^{m} f_i(s) \cdot \sum_{t=1}^{m} f_j(t)

= \sum_{s=1}^{m} P(x_{is}) \cdot \sum_{t=1}^{m} P(x_{jt})

= \sum_{s=1}^{m} \frac{1}{s^{\theta_i} H_{N_w,\theta_i}} \cdot \sum_{t=1}^{m} \frac{1}{t^{\theta_j} H_{N_w,\theta_j}}

= \frac{1}{H_{N_w,\theta_i} H_{N_w,\theta_j}} \cdot \sum_{s=1}^{m} \frac{1}{s^{\theta_i}} \cdot \sum_{t=1}^{m} \frac{1}{t^{\theta_j}} \quad (11)

where F_i(m) and F_j(m) are the cdfs of the i-th and j-th RVs respectively. The next step is to find the pdf of the random variable z_{ijm}, that is:
f_{ij}(m) = P(z_{ijm}) = F_{ij}(m) - F_{ij}(m-1)

= a_{ij} \left( \sum_{s=1}^{m} \sum_{t=1}^{m} \frac{1}{s^{\theta_i}} \cdot \frac{1}{t^{\theta_j}} - \sum_{s=1}^{m-1} \sum_{t=1}^{m-1} \frac{1}{s^{\theta_i}} \cdot \frac{1}{t^{\theta_j}} \right) \quad (12)
Figure 2: The probability density function of the Zipf distribution for N_w = 100 and for (a) θ_i = 1.35, (b) θ_j = 1.55, and the pdf of the product z*_{ijm}: (c) 3D display and (d) 2D projection.
where a_{ij} denotes the fraction 1/(H_{N_w,θ_i} H_{N_w,θ_j}). From (12) it follows that:

f_{ij}(m) = a_{ij} \left( \frac{1}{m^{\theta_i+\theta_j}} + \frac{1}{m^{\theta_i}} \sum_{t=1}^{m-1} \frac{1}{t^{\theta_j}} + \frac{1}{m^{\theta_j}} \sum_{s=1}^{m-1} \frac{1}{s^{\theta_i}} \right)

= a_{ij} \left( \frac{1}{m^{\theta_i+\theta_j}} + \frac{H_{N_w,\theta_j}}{m^{\theta_i}} F_j(m-1) + \frac{H_{N_w,\theta_i}}{m^{\theta_j}} F_i(m-1) \right) \Rightarrow

f_{ij}(m) = \begin{cases} a_{ij}, & m = 1 \\[4pt] a_{ij} \left( \dfrac{1}{m^{\theta_i+\theta_j}} + \dfrac{H_{N_w,\theta_j}}{m^{\theta_i}} F_j(m-1) + \dfrac{H_{N_w,\theta_i}}{m^{\theta_j}} F_i(m-1) \right), & \forall m \in \{2, \ldots, N_w\} \\[4pt] 0, & \text{elsewhere} \end{cases} \quad (13)
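The closed form of Eq. (13) can be cross-checked numerically against the defining cdf difference of Eq. (12). The sketch below is ours (function names invented):

```python
def zipf_pmf(n_w, theta):
    """Generalized Zipf pmf, Eq. (8)."""
    h = sum(1.0 / m ** theta for m in range(1, n_w + 1))
    return [1.0 / (m ** theta * h) for m in range(1, n_w + 1)]

def product_pmf_cdf(n_w, ti, tj):
    """f_ij(m) as the cdf difference of Eq. (12)."""
    fi, fj = zipf_pmf(n_w, ti), zipf_pmf(n_w, tj)
    out, big_fi, big_fj, prev = [], 0.0, 0.0, 0.0
    for m in range(n_w):
        big_fi += fi[m]
        big_fj += fj[m]
        out.append(big_fi * big_fj - prev)
        prev = big_fi * big_fj
    return out

def product_pmf_closed(n_w, ti, tj):
    """f_ij(m) from the closed form of Eq. (13)."""
    fi, fj = zipf_pmf(n_w, ti), zipf_pmf(n_w, tj)
    hi = sum(1.0 / m ** ti for m in range(1, n_w + 1))
    hj = sum(1.0 / m ** tj for m in range(1, n_w + 1))
    aij = 1.0 / (hi * hj)
    out, big_fi, big_fj = [], 0.0, 0.0   # big_f* hold F(m-1)
    for m in range(1, n_w + 1):
        out.append(aij * (1.0 / m ** (ti + tj)
                          + hj * big_fj / m ** ti
                          + hi * big_fi / m ** tj))
        big_fi += fi[m - 1]
        big_fj += fj[m - 1]
    return out

f_cdf = product_pmf_cdf(50, 1.35, 1.55)
f_closed = product_pmf_closed(50, 1.35, 1.55)
```

Both forms agree term by term, and the resulting pmf sums to one because the cdf differences telescope to F_i(N_w) F_j(N_w) = 1.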
Figure 2 depicts the process of obtaining the distribution of the random variable z_{ijm}.
After the computation of the pdf of z_{ijm} it is necessary to compute the density function of the random variable √z_{ijm}, since D_ij is a linear combination of √z_{ijm}. Let z*_{ijm} denote the square root of z_{ijm}, that is, z*_{ijm} = √z_{ijm}, where m ∈ {1, 2, ..., N_w}. Since the sample space of the RV z_{ijm} is the set Z_1 = {1, 2, ..., N_w}, the sample space corresponding to z*_{ijm} is the set Z_2 = {1, √2, ..., √N_w}. It must be noted here that the cardinality of the set Z_2 is equal to N_w, since each element of Z_2 is the square root of an element of Z_1. Since z_{ijm} is a discrete RV, the RV z*_{ijm} has the same pdf as the RV z_{ijm} [5]; if f*_{ij}(m) denotes the pdf of the RV z*_{ijm}, then f*_{ij}(m) = f_{ij}(m), ∀m.
The final step is to evaluate the pdf of the random variable L_{ij} = \sum_{m=1}^{N_w} \sqrt{z_{ijm}} = \sum_{m=1}^{N_w} z^*_{ijm}. For a large value of N_w, and due to the central limit theorem (CLT), the pdf of the above sum tends toward the normal distribution with mean value μ and variance σ² [5]. The mean value is:

\mu = E[L_{ij}] = E\left[ \sum_{m=1}^{N_w} z^*_{ijm} \right] = \sum_{m=1}^{N_w} E[z^*_{ijm}] = N_w E[z^*_{ijm}] = N_w \sum_{m=1}^{N_w} \sqrt{m}\, f^*_{ij}(m)

= N_w a_{ij} \sum_{m=1}^{N_w} \sqrt{m} \left( \frac{1}{m^{\theta_i+\theta_j}} + \frac{H_{N_w,\theta_j}}{m^{\theta_i}} F_j(m-1) + \frac{H_{N_w,\theta_i}}{m^{\theta_j}} F_i(m-1) \right)

= N_w a_{ij} \sum_{m=1}^{N_w} \left( \frac{1}{m^{(\theta_i+\theta_j)-0.5}} + \frac{H_{N_w,\theta_j}}{m^{\theta_i-0.5}} F_j(m-1) + \frac{H_{N_w,\theta_i}}{m^{\theta_j-0.5}} F_i(m-1) \right)

= N_w a_{ij} \left( H_{N_w,(\theta_i+\theta_j)-0.5} + H_{N_w,\theta_j} \sum_{m=1}^{N_w} \frac{F_j(m-1)}{m^{\theta_i-0.5}} + H_{N_w,\theta_i} \sum_{m=1}^{N_w} \frac{F_i(m-1)}{m^{\theta_j-0.5}} \right) \quad (14)
and the variance is:

\sigma^2 = E\left[ (L_{ij} - \mu)^2 \right] = E\left[ (L_{ij})^2 \right] - \mu^2

= E\left[ \left( \sum_{m=1}^{N_w} z^*_{ijm} \right)^2 \right] - \mu^2

= E\left[ \sum_{m=1}^{N_w} (z^*_{ijm})^2 + 2 \sum_{\substack{m_1,m_2=1 \\ m_1 \neq m_2}}^{N_w} z^*_{ijm_1} z^*_{ijm_2} \right] - \mu^2

= \sum_{m=1}^{N_w} E\left[ (z^*_{ijm})^2 \right] + 2 \sum_{\substack{m_1,m_2=1 \\ m_1 \neq m_2}}^{N_w} E\left[ z^*_{ijm_1} z^*_{ijm_2} \right] - \mu^2

= N_w E\left[ (z^*_{ijm})^2 \right] + 2 N_w (N_w - 1) E\left[ z^*_{ijm_1} z^*_{ijm_2} \right] - \mu^2 \quad (15)

At this point, and without loss of generality, the RVs z^*_{ijm_1} and z^*_{ijm_2} can be regarded as independent. Under this postulate:

E\left[ z^*_{ijm_1} z^*_{ijm_2} \right] = E\left[ z^*_{ijm_1} \right] E\left[ z^*_{ijm_2} \right] \quad (16)
The first term on the right-hand side of the variance equation is equal to:

E\left[ (z^*_{ijm})^2 \right] = a_{ij} \sum_{m=1}^{N_w} (\sqrt{m})^2 \left( \frac{1}{m^{\theta_i+\theta_j}} + \frac{H_{N_w,\theta_j}}{m^{\theta_i}} F_j(m-1) + \frac{H_{N_w,\theta_i}}{m^{\theta_j}} F_i(m-1) \right)

= a_{ij} \left( H_{N_w,(\theta_i+\theta_j)-1} + H_{N_w,\theta_j} \sum_{m=1}^{N_w} \frac{F_j(m-1)}{m^{\theta_i-1}} + H_{N_w,\theta_i} \sum_{m=1}^{N_w} \frac{F_i(m-1)}{m^{\theta_j-1}} \right) \quad (17)
whereas the second term equals:

E\left[ z^*_{ijm_1} \right] E\left[ z^*_{ijm_2} \right] = \mu^2 \quad (18)
So the total variance of the random variable L_ij is:

\sigma^2 = N_w a_{ij} \left( H_{N_w,(\theta_i+\theta_j)-1} + H_{N_w,\theta_j} \sum_{m=1}^{N_w} \frac{F_j(m-1)}{m^{\theta_i-1}} + H_{N_w,\theta_i} \sum_{m=1}^{N_w} \frac{F_i(m-1)}{m^{\theta_j-1}} \right) + \left[ 2N_w(N_w-1) - 1 \right] \mu^2

= N_w a_{ij} \left( H_{N_w,(\theta_i+\theta_j)-1} + H_{N_w,\theta_j} \sum_{m=1}^{N_w} \frac{F_j(m-1)}{m^{\theta_i-1}} + H_{N_w,\theta_i} \sum_{m=1}^{N_w} \frac{F_i(m-1)}{m^{\theta_j-1}} \right)
+ N_w a_{ij} \left[ 2N_w(N_w-1) - 1 \right] \left( H_{N_w,(\theta_i+\theta_j)-0.5} + H_{N_w,\theta_j} \sum_{m=1}^{N_w} \frac{F_j(m-1)}{m^{\theta_i-0.5}} + H_{N_w,\theta_i} \sum_{m=1}^{N_w} \frac{F_i(m-1)}{m^{\theta_j-0.5}} \right)

= N_w a_{ij} \left( H_{N_w,(\theta_i+\theta_j)-1} + \left[ 2N_w(N_w-1) - 1 \right] H_{N_w,(\theta_i+\theta_j)-0.5} \right)
+ 2 a_{ij} N_w^2 (N_w-1) H_{N_w,\theta_j} \left( \sum_{m=1}^{N_w} \frac{F_j(m-1)}{m^{\theta_i-1}} + \sum_{m=1}^{N_w} \frac{F_j(m-1)}{m^{\theta_i-0.5}} \right)
+ 2 a_{ij} N_w^2 (N_w-1) H_{N_w,\theta_i} \left( \sum_{m=1}^{N_w} \frac{F_i(m-1)}{m^{\theta_j-1}} + \sum_{m=1}^{N_w} \frac{F_i(m-1)}{m^{\theta_j-0.5}} \right) \quad (19)

which is equal to:

\sigma^2 = N_w a_{ij} \sum_{m=1}^{N_w} \frac{1 + \left[ 2N_w(N_w-1) - 1 \right] m^{-0.5}}{m^{(\theta_i+\theta_j)-1}}
+ 2 a_{ij} N_w^2 (N_w-1) H_{N_w,\theta_j} \sum_{m=1}^{N_w} F_j(m-1) \frac{1 - m^{-0.5}}{m^{\theta_i-1}}
+ 2 a_{ij} N_w^2 (N_w-1) H_{N_w,\theta_i} \sum_{m=1}^{N_w} F_i(m-1) \frac{1 - m^{-0.5}}{m^{\theta_j-1}} \quad (20)
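The closed forms for the first two moments, Eqs. (14) and (17), can be verified numerically against the defining sums over the pmf of z_ijm. The sketch below is ours (names invented; μ is the mean of L_ij as in the text):

```python
import math

def harmonic(n_w, t):
    """Generalized harmonic number H_{N_w, t} of Eq. (9)."""
    return sum(1.0 / m ** t for m in range(1, n_w + 1))

def moments_closed(n_w, ti, tj):
    """mu = E[L_ij] per Eq. (14) and E[(z*_ijm)^2] per Eq. (17)."""
    hi, hj = harmonic(n_w, ti), harmonic(n_w, tj)
    aij = 1.0 / (hi * hj)
    s_mu = harmonic(n_w, ti + tj - 0.5)
    s_2 = harmonic(n_w, ti + tj - 1.0)
    big_fi = big_fj = 0.0   # hold F_i(m-1), F_j(m-1)
    for m in range(1, n_w + 1):
        s_mu += hj * big_fj / m ** (ti - 0.5) + hi * big_fi / m ** (tj - 0.5)
        s_2 += hj * big_fj / m ** (ti - 1.0) + hi * big_fi / m ** (tj - 1.0)
        big_fi += 1.0 / (m ** ti * hi)
        big_fj += 1.0 / (m ** tj * hj)
    return n_w * aij * s_mu, aij * s_2

n_w, ti, tj = 200, 1.35, 1.55
mu, second = moments_closed(n_w, ti, tj)

# direct evaluation from the pmf of z_ijm (cdf difference, Eq. (12))
hi, hj = harmonic(n_w, ti), harmonic(n_w, tj)
f_prod, big_fi, big_fj, prev = [], 0.0, 0.0, 0.0
for m in range(1, n_w + 1):
    big_fi += 1.0 / (m ** ti * hi)
    big_fj += 1.0 / (m ** tj * hj)
    f_prod.append(big_fi * big_fj - prev)
    prev = big_fi * big_fj
mu_direct = n_w * sum(math.sqrt(m + 1) * p for m, p in enumerate(f_prod))
second_direct = sum((m + 1) * p for m, p in enumerate(f_prod))
```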
Finally, the pdf of the RV D_ij = 2 + 2L_ij has to be computed. Given the fact that L_ij is normally distributed, we obtain the following pdf for the RV D_ij:

f_{D_{ij}}(t) = \frac{1}{2}\, f_{L_{ij}}\!\left( \frac{t-2}{2} \right) = \frac{1}{\sqrt{8\pi}\,\sigma} \exp\!\left( -\frac{1}{8\sigma^2} (t - 2 - 2\mu)^2 \right) \quad (21)

where μ and σ are the expected value and the standard deviation of the random variable L_ij.
But since the random variable D_ij is confined to the interval [2, 4] (D_ij ∈ [2, 4]), Eq. (21) obviously underestimates the true pdf of D_ij. The accurate form of the pdf is:

f_{D_{ij}}(t) = \begin{cases} 0, & -\infty \le t \le 2 \\[4pt] \dfrac{\exp\!\left( -\frac{1}{8\sigma^2}(t-2-2\mu)^2 \right)}{\displaystyle\int_2^4 \exp\!\left( -\frac{1}{8\sigma^2}(t-2-2\mu)^2 \right) dt}, & 2 \le t \le 4 \\[4pt] 0, & 4 \le t \le +\infty \end{cases} \quad (22)

which is the so-called truncated normal distribution [5]. Equation (22) can be simplified into the following form:

f_{D_{ij}}(t) = \begin{cases} \dfrac{\exp\!\left( -\frac{1}{8\sigma^2}\left(t-2(1+\mu)\right)^2 \right)}{\sqrt{2\pi}\,\sigma \left[ \operatorname{erf}\!\left( \frac{x-2(1+\mu)}{2\sqrt{2}\,\sigma} \right) \right]_{x=2}^{4}}, & 2 \le t \le 4 \\[4pt] 0, & \text{elsewhere} \end{cases} \quad (23)

where \operatorname{erf}(a) = \frac{2}{\sqrt{\pi}} \int_0^a e^{-t^2}\, dt denotes the so-called error function [2].
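The truncated-normal pdf of Eqs. (22)-(23) can be evaluated with the error function available in most standard libraries. The sketch below is ours, and the values of μ and σ are arbitrary placeholders, not values derived from real data:

```python
import math

def f_dij(t, mu, sigma):
    """Truncated-normal pdf of D_ij on [2, 4], Eq. (23); the normalizer
    is expressed through the error function."""
    if not 2.0 <= t <= 4.0:
        return 0.0
    z = lambda x: (x - 2.0 * (1.0 + mu)) / (2.0 * math.sqrt(2.0) * sigma)
    num = math.exp(-((t - 2.0 * (1.0 + mu)) ** 2) / (8.0 * sigma ** 2))
    den = math.sqrt(2.0 * math.pi) * sigma * (math.erf(z(4.0)) - math.erf(z(2.0)))
    return num / den

mu, sigma = 0.9, 0.05          # placeholder moments of L_ij
step = 0.0005
area = sum(f_dij(2.0 + k * step, mu, sigma)
           for k in range(int(2.0 / step) + 1)) * step
```

The Riemann sum confirms that the density integrates to one over [2, 4].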
2.3 Hypothesis test evaluation
To assess whether the i-th and j-th distributions are of the same “shape”, the RV D_ij will be employed in a hypothesis test. The hypothesis is as follows:
H_0: the i-th and j-th pdfs are statistically identical, which corresponds to D_ij → 4.
H_1: the i-th and j-th pdfs are not identical.
Given a pre-defined significance level α, the rejection region for the above hypothesis test is formulated as follows:

\alpha = P(D_{ij} \le z_\alpha) = \int_2^{z_\alpha} f_{D_{ij}}(t)\, dt

= \frac{\operatorname{erf}\!\left( \frac{1+\mu}{\sqrt{2}\,\sigma} - \frac{z_\alpha}{2\sqrt{2}\,\sigma} \right) - \operatorname{erf}\!\left( \frac{1+\mu}{\sqrt{2}\,\sigma} - \frac{1}{\sqrt{2}\,\sigma} \right)}{\operatorname{erf}\!\left( \frac{1+\mu}{\sqrt{2}\,\sigma} - \frac{2}{\sqrt{2}\,\sigma} \right) - \operatorname{erf}\!\left( \frac{1+\mu}{\sqrt{2}\,\sigma} - \frac{1}{\sqrt{2}\,\sigma} \right)}. \quad (24)
In Eq. (24) the only unknown is the parameter z_α. After evaluating the parameter z_α, the null hypothesis is accepted if the expression z_α ≤ D_ij is true; otherwise it is rejected. Figure 3 depicts the distribution of the RV D_ij along with the support regions for the hypothesis H_0 and the alternative hypothesis H_1.
Appendix B provides a brief description of the computation of the critical values for the acceptance or rejection of the null hypothesis, along with two tables of critical values computed by the proposed Eq. (27).
Figure 3: The support regions for the null and the alternative hypothesis for the RV D_ij for N_w = 2000, θ_i = 1.35 and θ_j = 1.55 at a significance level of α = 0.90. (Important remark: although the above graph implies a uniform distribution, this is not true; the slope of the line in the graph approaches zero but is still significantly different from that value.)
3 Conclusions
The present report provides a preliminary mathematical analysis of a novel metric, also introduced in the report, for the evaluation of the contextual similarity between documents. The proposed metric is computationally cheaper than the Euclidean distance, which is often employed in similar tasks. Further investigation will be performed in the direction of the bias of the introduced metric, that is, whether the proposed metric is biased or not.
A Is it a metric?
In order to prove that the proposed statistic, D_ij, is also a distance metric, the following properties have to be proven:
Positiveness: Since f_i(m) and f_j(m) for m = 1, 2, ..., N_w contain the total probability mass of the i-th and j-th RVs, the following stems out:

0 \le x_{im} < 1 \text{ and } \sum_{m=1}^{N_w} x_{im} = 1, \qquad 0 \le x_{jm} < 1 \text{ and } \sum_{m=1}^{N_w} x_{jm} = 1

\Rightarrow 0 \le x_{im} x_{jm} \le 1 \Rightarrow 0 \le \sqrt{x_{im} x_{jm}} \le 1 \Rightarrow 0 \le \sum_{m=1}^{N_w} \sqrt{x_{im} x_{jm}} \le 1

\Rightarrow 0 \le 2L_{ij} \le 2 \Rightarrow 2 \le 2 + 2L_{ij} \le 4 \Rightarrow 2 \le D_{ij} \le 4 \quad (25)

In the case where i = j, L_{ii} = \sum_{m=1}^{N_w} \sqrt{x_{im} x_{im}} = \sum_{m=1}^{N_w} x_{im} = 1 \Rightarrow D_{ii} = 2 + 2L_{ii} = 4.
Symmetry: Since x_{im} x_{jm} = x_{jm} x_{im} ⇒ D_ij = D_ji.
Triangular inequality: In order to prove the triangular inequality it suffices to show that:

D_{ij} + D_{jm} \ge D_{im} \Rightarrow 2 + 2L_{ij} + 2 + 2L_{jm} \ge 2 + 2L_{im} \Rightarrow 1 + L_{ij} + L_{jm} \ge L_{im} \quad (26)

which is obvious since L_ij, L_jm ≥ 0 and L_im ≤ 1.
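The bounds and symmetry above are easy to confirm numerically. The sketch below (ours; names invented) draws random pmfs and checks Eq. (25) together with the symmetry property:

```python
import math
import random

def d_metric(p, q):
    """D_ij = 2 + 2 * sum_m sqrt(p_m q_m), as in Eq. (6)."""
    return 2.0 + 2.0 * sum(math.sqrt(a * b) for a, b in zip(p, q))

def random_pmf(n, rng):
    """A random probability vector of length n."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
p, q = random_pmf(8, rng), random_pmf(8, rng)
```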
B Critical values
The critical values for the hypothesis test associated with the RV D_ij are computed using the following:

\int_0^{\frac{1}{2\sqrt{2}\,\sigma}\left( 2(1+\mu) - z_\alpha \right)} e^{-t^2}\, dt = \alpha \frac{\sqrt{\pi}}{2} \operatorname{erf}\!\left( \frac{\mu - 1}{\sqrt{2}\,\sigma} \right) + (1-\alpha) \frac{\sqrt{\pi}}{2} \operatorname{erf}\!\left( \frac{\mu}{\sqrt{2}\,\sigma} \right)

\Rightarrow z_\alpha = 2(1+\mu) - 2\sqrt{2}\,\sigma\, \operatorname{erfinv}\!\left( \alpha \operatorname{erf}\!\left( \frac{\mu-1}{\sqrt{2}\,\sigma} \right) + (1-\alpha) \operatorname{erf}\!\left( \frac{\mu}{\sqrt{2}\,\sigma} \right) \right), \quad (27)
where erfinv is the inverse of the error function [2]. Using Eq. (27) and pre-defined significance levels, two tables of critical values for the hypothesis test were computed. Table I² corresponds to a significance level of α = 0.90 when the dimensionality of the feature vectors is N_w = 2000, whereas Table II³ corresponds to a significance level of α = 0.95 under the same feature vector dimensionality.
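Eq. (27) needs the inverse error function. Where no library routine is available, erf can be inverted by bisection, since it is strictly increasing. The sketch below is ours, with placeholder values of μ, σ and α:

```python
import math

def erfinv(y):
    """Invert math.erf by bisection (a simple stand-in for a library
    erfinv routine)."""
    lo, hi = -6.0, 6.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def critical_value(alpha, mu, sigma):
    """z_alpha per Eq. (27)."""
    s = math.sqrt(2.0) * sigma
    y = alpha * math.erf((mu - 1.0) / s) + (1.0 - alpha) * math.erf(mu / s)
    return 2.0 * (1.0 + mu) - 2.0 * s * erfinv(y)

z_a = critical_value(0.90, 0.9, 0.05)   # placeholder mu and sigma
```

The returned z_α lies inside the support [2, 4] of D_ij and satisfies the erf relation that Eq. (27) inverts.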
2 The values of the table are a scaled version of the original values: original value = 3.8 + scaled value × 10⁻⁹.
3 The values of the table are a scaled version of the original values: original value = 3.8 + scaled value × 10⁻⁹.
References
[1] References on Zipf's law. http://linkage.rockefeller.edu/wli/zipf.
[2] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, 10th edition, 1974.
[3] L. A. Adamic. Zipf, Power-laws, and Pareto - a ranking tutorial. http://ginger.hpl.hp.com/shl/papers/ranking/ranking.html.
[4] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999.
[5] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York, 1984.
[6] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
[7] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, 1999.
Table I: Distribution table (critical values) for the RV D_ij for N_w = 2000 at α = 0.90 (10% Confidence Level). The row label is θ_i; the entries in each row correspond to θ_j = θ_i + 0.05, θ_i + 0.10, ..., 2.00.

1.05: 1.2081 1.2117 1.2146 1.2183 1.2234 1.2256 1.2277 1.2314 1.2314 1.2328 1.2365 1.2372 1.2365 1.2372 1.2379 1.2379 1.2394 1.2387 1.2394
1.10: 1.2161 1.2197 1.2263 1.2299 1.2336 1.2372 1.2408 1.2430 1.2452 1.2459 1.2481 1.2488 1.2488 1.2503 1.2503 1.2510 1.2510 1.2503
1.15: 1.2263 1.2328 1.2379 1.2438 1.2481 1.2510 1.2547 1.2576 1.2590 1.2612 1.2627 1.2641 1.2649 1.2656 1.2659 1.2663 1.2659
1.20: 1.2394 1.2467 1.2539 1.2583 1.2634 1.2667 1.2710 1.2736 1.2761 1.2776 1.2790 1.2805 1.2805 1.2823 1.2823 1.2820
1.25: 1.2561 1.2619 1.2685 1.2754 1.2812 1.2852 1.2889 1.2929 1.2958 1.2972 1.2983 1.2998 1.3001 1.3016 1.3012
1.30: 1.2725 1.2816 1.2885 1.2954 1.3012 1.3056 1.3092 1.3132 1.3151 1.3172 1.3187 1.3201 1.3205 1.3216
1.35: 1.2929 1.3023 1.3100 1.3161 1.3231 1.3282 1.3322 1.3362 1.3380 1.3409 1.3427 1.3442 1.3445
1.40: 1.3151 1.3238 1.3329 1.3405 1.3471 1.3529 1.3569 1.3609 1.3634 1.3656 1.3674 1.3682
1.45: 1.3383 1.3489 1.3580 1.3660 1.3733 1.3787 1.3831 1.3871 1.3898 1.3918 1.3935
1.50: 1.3642 1.3751 1.3853 1.3938 1.4005 1.4060 1.4113 1.4144 1.4173 1.4196
1.55: 1.3915 1.4027 1.4135 1.4216 1.4287 1.4344 1.4389 1.4424 1.4451
1.60: 1.4195 1.4313 1.4418 1.4502 1.4575 1.4628 1.4673 1.4706
1.65: 1.4484 1.4604 1.4704 1.4784 1.4851 1.4900 1.4942
1.70: 1.4769 1.4882 1.4975 1.5059 1.5119 1.5170
1.75: 1.5046 1.5153 1.5241 1.5315 1.5370
1.80: 1.5301 1.5402 1.5483 1.5544
1.85: 1.5539 1.5626 1.5697
1.90: 1.5752 1.5824
1.95: 1.5930
Table II: Distribution table (critical values) for the RV D_ij for N_w = 2000 at α = 0.95 (5% Confidence Level). The row label is θ_i; the entries in each row correspond to θ_j = θ_i + 0.05, θ_i + 0.10, ..., 2.00.

1.05: 6.3825 6.3970 6.4116 6.4552 6.4480 6.4625 6.4771 6.4916 6.5062 6.4989 6.5207 6.5280 6.5207 6.5207 6.5425 6.5425 6.5280 6.5498 6.5425
1.10: 6.4188 6.4334 6.4698 6.4843 6.5134 6.5207 6.5498 6.5571 6.5644 6.5716 6.5935 6.6007 6.6007 6.6080 6.5935 6.6080 6.6007 6.6007
1.15: 6.4625 6.5134 6.5280 6.5571 6.5935 6.6080 6.6226 6.6299 6.6517 6.6517 6.6662 6.6590 6.6808 6.6808 6.6808 6.6844 6.6808
1.20: 6.5425 6.5789 6.6153 6.6444 6.6808 6.6844 6.7099 6.7244 6.7390 6.7463 6.7499 6.7608 6.7608 6.7645 6.7645 6.7608
1.25: 6.6226 6.6590 6.6990 6.7317 6.7608 6.7826 6.8045 6.8227 6.8372 6.8445 6.8518 6.8554 6.8590 6.8663 6.8736
1.30: 6.7172 6.7608 6.7936 6.8409 6.8736 6.8918 6.9100 6.9354 6.9391 6.9573 6.9609 6.9645 6.9718 6.9827
1.35: 6.8227 6.8736 6.9063 6.9464 6.9900 7.0046 7.0373 7.0482 7.0628 7.0773 7.0846 7.0919 7.0992
1.40: 6.9464 6.9864 7.0373 7.0810 7.1101 7.1392 7.1610 7.1828 7.1937 7.2083 7.2119 7.2228
1.45: 7.0591 7.1210 7.1646 7.2083 7.2483 7.2738 7.3029 7.3211 7.3338 7.3484 7.3520
1.50: 7.2010 7.2556 7.3102 7.3538 7.3938 7.4193 7.4448 7.4666 7.4811 7.4921
1.55: 7.3393 7.4029 7.4593 7.4993 7.5393 7.5703 7.5957 7.6121 7.6285
1.60: 7.4902 7.5557 7.6066 7.6539 7.6921 7.7231 7.7413 7.7613
1.65: 7.6430 7.7049 7.7594 7.8067 7.8377 7.8631 7.8868
1.70: 7.7922 7.8559 7.9050 7.9468 7.9814 8.0050
1.75: 7.9413 7.9959 8.0432 8.0814 8.1123
1.80: 8.0778 8.1287 8.1705 8.2051
1.85: 8.2015 8.2506 8.2833
1.90: 8.3142 8.3506
1.95: 8.4070