
On Rank-invariant Methods for Ordinal Data


Academic year: 2021


To the lost romance (致我们终将逝去的青春, "To our youth that will eventually fade away")


Örebro Studies in Statistics 9

YISHEN YANG

On Rank-invariant Methods for Ordinal Data


© Yishen Yang, 2016

Title: On Rank-invariant Methods for Ordinal Data.

Publisher: Örebro University 2017 www.publications.oru.se

Print: Örebro University, Repro December/2016

ISSN 1651-8608

ISBN 978-91-7529-171-0


Abstract

Yishen Yang (2016): On Rank-invariant Methods for Ordinal Data. Örebro Studies in Statistics 9.

Data from rating scale assessments have rank-invariant properties only, which means that the data represent an ordering but lack standardized magnitude, inter-categorical distances, and linearity. Even though the judgments are often coded by natural numbers, they are not truly metric. The aim of this thesis is to further develop the nonparametric rank-based Svensson methods for paired ordinal data, which rely on the rank-invariant properties only.

The thesis consists of five papers. In Paper I the asymptotic properties of the measure of systematic disagreement in paired ordinal data, the Relative Position (RP), and of the difference in RP between groups were studied. Based on the finding of asymptotic normality, two tests were proposed for analyses of change within a group and between groups. In Paper II the asymptotic properties of rank-based measures, e.g. Svensson's measures of systematic disagreement and of additional individual variability, were discussed, and a numerical method for approximation was suggested. In Paper III the asymptotic properties of the measures for paired ordinal data discussed in Paper II were verified by simulations. Furthermore, the Spearman rank-order correlation coefficient (rs) and Svensson's augmented rank-order agreement coefficient (ra) were compared. By demonstrating how and why they differ, the paper emphasizes that they measure different things. In Paper IV the test proposed in Paper I for comparing systematic changes between two groups of paired ordinal data was compared with other non-parametric tests for group changes, under different approaches to categorising changes. The simulation reveals that the proposed test works better for small and unbalanced samples. Paper V demonstrates that rank-invariant approaches can also be used in the analysis of ordinal data from multi-item scales, which is an appealing and appropriate alternative to calculating sum scores.

Keywords: Ordinal data, rank-invariance, systematic change, agreement, association, inter-rater reliability, ranking, Spearman, multi-item scale.

Yishen Yang, Department of Statistics, School of Business, Örebro University, SE-701 82 Örebro, Sweden, e-mail: yys1-1@hotmail.com


LIST OF PAPERS

The following papers, referred to in the text by their numerals, are included in this thesis.

1. Yang, Y. and Svensson, E. (2016), Non-parametric analyses of change within group and between two groups of paired assessments on rating scales. Manuscript.

2. Svensson, E., Yang, Y., Holm, S. (2016), Asymptotic distribution of rank-based measures for paired ordinal data and their use in interval estimations. Manuscript.

3. Yang, Y. and Svensson, E. (2016), Analysing inter-rater agreement: Spearman's rank order correlation coefficient vs. Svensson's augmented rank order agreement coefficient. Manuscript.

4. Yang, Y. (2016), Comparison of methods for comparing two groups of paired ordinal data. Manuscript.

5. Yang, Y. (2016), The use of rank-invariant method in analysis of change in multi-item scales. Manuscript.


CONTENTS

PART I: INTRODUCTION

Ordinal data and rank-invariant properties

Svensson methods

Change and comparison of changes

Multi-item scales

Contributions of this thesis

PART II: SUMMARY

REFERENCES

PART III: INCLUDED PAPERS

Non-parametric analyses of change within group and between two groups of paired assessments on rating scales

Asymptotic distribution of rank-based measures for paired ordinal data and their use in interval estimations

Analysing inter-rater agreement: Spearman's rank order correlation coefficient vs. Svensson's augmented rank order agreement coefficient

Comparison of methods for comparing two groups of paired ordinal data

The use of rank-invariant method in analysis of change in multi-item scales


YISHEN YANG On Rank-invariant Methods for Ordinal Data 13

PART I: INTRODUCTION

Ordinal data and rank-invariant properties

Questionnaires are important tools for collecting information. A questionnaire can consist of different kinds of questions, for example 'Describe the level of your perceived bodily pain.' (negligible, mild, moderate, and severe); 'How often do you have the feeling of anger?' (never, occasionally, often and very often); 'What is your attitude towards the death penalty?' (disagree in all cases, agree only in certain cases and agree in all cases). Such questions are assessed on rating scales. Assessments on rating scales often produce ordered categorical data, or ordinal data (Stevens 1949; Kind 1988; Merbitz et al. 1989; Svensson 1996).

Ordinal data can also be generated by categorizing continuous data. Such data are pervasive in health and social research, but their applications are not restricted to these areas. Methods for ordinal data have been developed substantially over more than a century. Drawing on the large body of published work, we summarize the available methods from two perspectives, strategies and methods, to shed at least some light on this area.

Generally, there are four strategies for analysing ordinal data. Type I treats ordinal data as if they were quantitative and normally distributed: scores are assigned to the categories, and parametric methods such as linear regression and ANOVA are then applied to the scores (Liu and Agresti 2005). This kind of method ignores the ordered categorical nature of the data but is actually the most popular choice among applied researchers who commonly encounter ordinal data. Type II treats ordinal data as nominal by utilizing only the categorical characteristics (Agresti 2010), so the ordering information is lost. Examples are Pearson's chi-square test of independence and loglinear models. Type III uses only the ordered structure of the categories without further assumptions, e.g. some non-parametric rank-based methods and some model-based methods. Type IV assumes that an unobserved continuous variable, called a "latent variable", underlies the ordinal scale (Liu and Agresti 2005; Agresti 2010). The four types of strategies differ in their assumptions and in how they use the information. We summarize them in Figure 1.


Figure 1. Four types of strategies for treating ordinal data.

Figure 2 is not a complete picture of all methods for ordinal data but serves as a guide to help readers understand this area. The methods fall mainly into two kinds, model-based and rank-based. Usually the purpose of modelling is to represent the association pattern when the data are summarized in a cross table, e.g. GLMs with common links (McCullagh 1980) and association models (Agresti 2010). Unlike regression models, association models do not distinguish between response variables and explanatory variables, so the choice between the two depends on whether it is important to identify the response variable. GLMs with common links are more useful in practice, and the choice of "link" is influenced by the distribution of the underlying latent response variable. For clustered or repeated data, multi-level modelling approaches can be used, e.g. marginal models and generalized linear mixed models (Goldstein 2011). Furthermore, Bayesian approaches have received increasing attention for modelling ordinal data, and recent developments are summarized in a survey (Agresti and Hitchcock 2004).

In addition, measures of association can be helpful in describing association in cross tables. Such measures condense the cell proportions into a single number, which makes them easier for applied researchers to use and interpret than complex models. Non-parametric tests make few assumptions about the distributions and are often good choices when data are ordinal.

Several examples are given in Figure 2. The Kruskal-Wallis test compares more than two samples, and its parametric counterpart is one-way ANOVA. The remaining tests compare paired samples. Lehmann's book (1975) gives details.


Figure 2. General map of various methods dealing with ordinal data


Attention should be paid to the fact that many methods, both model-based and non-parametric rank-based approaches, assign numerical scores to ordinal data. The idea is to incorporate the ordering information into the model, measure or test. However, ordinal data have rank-invariant properties only (Svensson 2010). A set of sequential numbers is often used as a convenient way to record the ordinal responses, but these numbers are labels rather than real quantities.

Besides numbers, any other set of ordered symbols or verbal descriptions could be chosen arbitrarily, and changing the coding of the data must not change the statistical treatment. These rank-invariant properties restrict the use of statistical methods that assume quantitative data. Svensson (1993) developed a family of non-parametric methods for paired ordinal data that assume rank-invariant properties only, which makes the methods applicable to many kinds of data. This thesis is based on, and extends, the methods developed by Svensson (1993).
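The following minimal Python sketch illustrates the rank-invariance requirement with hypothetical data. An order-preserving recoding of the categories changes a mean difference of the codes. It leaves a purely comparison-based quantity unchanged. That quantity is a plug-in estimate of the position parameter P(X < Y) - P(Y < X) used in Svensson's methods, computed here over all n·n comparisons of X and Y assessments. The function names are illustrative only and are not taken from any published software.

```python
from fractions import Fraction


def mean_difference(before, after):
    """Mean of (after - before): treats the category codes as real numbers."""
    return Fraction(sum(b - a for a, b in zip(before, after)), len(before))


def position_contrast(before, after):
    """Plug-in estimate of P(X < Y) - P(Y < X) over all n*n (X, Y) comparisons.

    Only the order relations < and > are used, so any order-preserving
    recoding of the categories leaves the result unchanged.
    """
    n = len(before)
    score = sum((a < b) - (b < a) for a in before for b in after)
    return Fraction(score, n * n)


# Hypothetical paired assessments, categories coded 1 < 2 < 3.
before = [1, 1, 2, 2, 3]
after = [2, 1, 3, 2, 3]

# The same data after the order-preserving recoding 1 -> 1, 2 -> 2, 3 -> 10.
recode = {1: 1, 2: 2, 3: 10}
before_r = [recode[v] for v in before]
after_r = [recode[v] for v in after]

print(mean_difference(before, after), mean_difference(before_r, after_r))      # 2/5 9/5
print(position_contrast(before, after), position_contrast(before_r, after_r))  # 7/25 7/25
```

The arithmetic mean changes with the arbitrary coding. The comparison-based contrast does not, because it depends only on the ordering of the categories.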

Svensson methods

Her methods have wide applicability, for example in analysis of change (Svensson and Starmark 2002; Svensson and Sonn 1997), development and validation of scales and global scales (Svensson 2001; Svensson et al. 2009; Allvin et al. 2011), and evaluation of inter- and intra-rater reliability, inter-scale comparisons and responsiveness (Svensson et al. 1996; Svensson 1998; Svensson 2000; Svensson et al. 2012; Svensson et al. 2015).

Several aspects of Svensson’s methods play an important role.

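[Figure 3: contingency table with columns for X (before), categories 1, …, i, …, j, …, m, and rows for Y (after), categories 1 (bottom) to m (top); the cells shown include x11, xii, xij, xjj and xmm.]

Figure 3. The m×m contingency table showing the joint frequency of the paired assessments of variables X and Y. The main diagonal runs from the lower left to the upper right corner and shows the unchanged categories.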

First, we introduce the notation used in this thesis, which is based on an m×m square contingency table (Figure 3). Denote the variable measured before treatment as X and after treatment as Y; the two are paired. A scale of m categories (C1 < … < Cm) is used. We regard the data as a sample of size n from a population. Let $(X_t, Y_t)$ be the paired assessments for individual t, for t = 1, …, n. The (i, j):th cell frequency is $x_{ij}$ and the corresponding probability is $p_{ij}$. The v:th marginal category frequencies are

$$x_v = \sum_{j=1}^{m} x_{vj} \quad \text{and} \quad y_v = \sum_{i=1}^{m} x_{iv}, \qquad v = 1, \dots, m.$$
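As an illustration of this notation, the sketch below builds the table $x_{ij}$ from the paired assessments of Table 1 and returns the two sets of marginal frequencies. The helper names are hypothetical, and categories are coded 1, …, m purely as labels for their order. Each row of the nested list is one X category, so the list is laid out transposed relative to the drawing in Figure 3.

```python
def contingency_table(pairs, m):
    """m x m table: x[i][j] counts individuals with X in category i+1 and Y in j+1.

    Categories are coded 1..m purely as labels for their order.
    """
    x = [[0] * m for _ in range(m)]
    for cat_x, cat_y in pairs:
        x[cat_x - 1][cat_y - 1] += 1
    return x


def marginal_frequencies(x):
    """Return (x_v, y_v) with x_v = sum_j x[v][j] and y_v = sum_i x[i][v]."""
    m = len(x)
    x_marg = [sum(x[v][j] for j in range(m)) for v in range(m)]
    y_marg = [sum(x[i][v] for i in range(m)) for v in range(m)]
    return x_marg, y_marg


# The ten (X, Y) pairs of Table 1, with C1 < C2 < C3 coded 1 < 2 < 3.
pairs = [(1, 1), (1, 1), (1, 2), (2, 2), (2, 2), (2, 2), (2, 2), (3, 3), (3, 3), (3, 3)]
table = contingency_table(pairs, 3)
print(table)                        # [[2, 1, 0], [0, 4, 0], [0, 0, 3]]
print(marginal_frequencies(table))  # ([3, 4, 3], [2, 5, 3])
```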

• Systematic change in position and concentration

The advantage of Svensson's methods is their ability to separate the observed change into systematic and random components. Systematic, or group-related, change consists of two parts, change in position and change in concentration, which are defined as

$$\gamma = P(X < Y) - P(Y < X)$$

and

$$\delta = P(X_{l_1} < Y_k < X_{l_2}) - P(Y_{l_1} < X_k < Y_{l_2}).$$

The empirical measures are RP (relative position) and RC (relative concentration), which are based on the observed cell frequencies in Figure 3. Both measures range from -1 to 1, and values close to zero imply a negligible systematic change in the categorical distributions between X and Y (Svensson 1993; Svensson and Starmark 2002). The variances of the measures are estimated by the jackknife technique.
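Because γ and δ involve only the marginal distributions, their plug-in estimates can be computed from the marginal frequencies alone. The sketch below does this for the margins X = (1, 2, 7) and Y = (3, 3, 4) that appear in Figures 4-6. It assumes that RP is the proportion-based estimate of γ over all n² X-Y comparisons. For δ it returns only the unscaled plug-in difference over n³ marginal triples drawn with replacement. Svensson's RC applies a further normalization that is not reproduced here.

```python
from fractions import Fraction


def rp_from_marginals(xm, ym):
    """Plug-in estimate of gamma = P(X < Y) - P(Y < X) from marginal frequencies."""
    n, m = sum(xm), len(xm)
    p0 = sum(xm[i] * ym[j] for i in range(m) for j in range(m) if i < j)  # X < Y
    p1 = sum(xm[i] * ym[j] for i in range(m) for j in range(m) if j < i)  # Y < X
    return Fraction(p0 - p1, n * n)


def delta_plugin(xm, ym):
    """Unscaled plug-in estimate of
    delta = P(X_l1 < Y_k < X_l2) - P(Y_l1 < X_k < Y_l2) over n**3 marginal triples.
    NOTE: Svensson's RC additionally rescales this quantity (not reproduced here).
    """
    n, m = sum(xm), len(xm)

    def between(a, b):
        # number of triples (a_l1, b_k, a_l2) with a_l1 < b_k < a_l2
        return sum(b[v] * sum(a[:v]) * sum(a[v + 1:]) for v in range(m))

    return Fraction(between(xm, ym) - between(ym, xm), n ** 3)


# Margins used in Figures 4-6: X = (1, 2, 7), Y = (3, 3, 4).
print(rp_from_marginals([1, 2, 7], [3, 3, 4]))  # -33/100
print(delta_plugin([1, 2, 7], [3, 3, 4]))       # -3/1000
```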

• Two ranking approaches

Traditionally, the paired assessments from two raters are ranked separately. All subjects assigned to the same category by rater X (or Y) receive the same mean rank, meaning that the observations are tied to the marginal categories (Lehmann 1975). We will refer to this as the ordinary ranking approach.

Alternatively, Svensson (1993) proposed the augmented ranking approach, which also uses the information on the mutual relationship between the paired assessments. The augmented ranks for X and Y are $R^{(X)}_{ij}$ and $R^{(Y)}_{ij}$, respectively. Observations within the same cell (Figure 3) share the same mean rank. Hence, ranks are tied to the cells in the contingency table, not to the marginal distributions. This ranking approach makes it possible to measure the systematic change, which is determined by the marginal distributions, separately from the individual variation, which is not reflected in the marginal distributions.


Table 1 presents a hypothetical example. In ordinary ranking, individuals ① to ③ are classified into category C1 by rater X and given the same ordinary mean rank of 2. In augmented ranking, by contrast, the ordering of these individuals by rater Y divides them into two levels, with augmented mean ranks 1.5 and 3, respectively.

Table 1. The ordinary and augmented mean ranks allocated to the assessments given by X and Y. The scale categories are C1 < C2 < C3.

| Observation | ① | ② | ③ | ④ | ⑤ | ⑥ | ⑦ | ⑧ | ⑨ | ⑩ |
|---|---|---|---|---|---|---|---|---|---|---|
| X: assessment | C1 | C1 | C1 | C2 | C2 | C2 | C2 | C3 | C3 | C3 |
| X: ordinary mean rank | 2 | 2 | 2 | 5.5 | 5.5 | 5.5 | 5.5 | 9 | 9 | 9 |
| X: augmented mean rank | 1.5 | 1.5 | 3 | 5.5 | 5.5 | 5.5 | 5.5 | 9 | 9 | 9 |
| Y: assessment | C1 | C1 | C2 | C2 | C2 | C2 | C2 | C3 | C3 | C3 |
| Y: ordinary mean rank | 1.5 | 1.5 | 5 | 5 | 5 | 5 | 5 | 9 | 9 | 9 |
| Y: augmented mean rank | 1.5 | 1.5 | 3 | 5.5 | 5.5 | 5.5 | 5.5 | 9 | 9 | 9 |
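Both ranking approaches can be expressed with a single mid-rank routine. Ordinary ranks sort each rater's assessments on their own. Augmented ranks, as we read the description above, sort by one rater and break ties by the other rater's category, so that ties remain only within cells. The Python sketch below uses hypothetical function names and reproduces the mean ranks of Table 1.

```python
def mean_ranks(keys):
    """Ranks 1..n after sorting by key; observations with equal keys share the mean rank."""
    n = len(keys)
    order = sorted(range(n), key=lambda t: keys[t])
    ranks = [0.0] * n
    start = 0
    while start < n:
        stop = start
        while stop + 1 < n and keys[order[stop + 1]] == keys[order[start]]:
            stop += 1
        shared = (start + stop + 2) / 2  # mean of the ranks start+1 .. stop+1
        for t in order[start:stop + 1]:
            ranks[t] = shared
        start = stop + 1
    return ranks


def ordinary_ranks(x, y):
    # ties follow each rater's marginal category separately
    return mean_ranks(x), mean_ranks(y)


def augmented_ranks(x, y):
    # ties follow the cell: sort by the rater's own category, then by the other rater's
    return mean_ranks(list(zip(x, y))), mean_ranks(list(zip(y, x)))


# Table 1, with C1 < C2 < C3 coded 1 < 2 < 3.
X = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
Y = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
print(ordinary_ranks(X, Y))
print(augmented_ranks(X, Y))
```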

• Individual variation

Individual variation is related to the rank-transformable pattern of change (RTPC). This is the pattern expected when the internal ordering of all observations is preserved between the paired assessments, even though the ordinal responses themselves may change. The RTPC is determined by the marginal distributions of the paired data. Figures 4-6 show how an RTPC is constructed.

In Figure 4, datasets A and B have the same marginal distributions, (1, 2, 7) for rater X and (3, 3, 4) for rater Y, so they share the same RTPC (the grey area in Figure 4), even though they differ in their paired frequency distributions. The RTPC is constructed from the marginal frequencies alone.

Figure 4. Two datasets sharing the same marginal distributions and RTPC.

Figure 5 shows the simplest way to understand the RTPC. Here dataset A is listed observation by observation instead of as the contingency table in Figure 4. In the actual pattern, the assessments given by Y to observations ③-④ disturb the common ordering. Swapping the order of the assessments marked by red circles produces the expected pattern, in which the ordering of all observations is preserved between the paired assessments. Cross-tabulating this expected pattern generates the RTPC.

Figure 5. An example showing how the RTPC is formed from the expected pattern, in which the ordering of all observations is preserved between paired assessments.


Figure 6 illustrates the construction in more formal detail. The objective is to fill the cells with observations so that the marginal frequencies of X and Y match. The marginal frequency of Y in category C1 is 3 (①②③). In step 1 we place ① in cell (C1, C1) so that X and Y match. Two observations, ② and ③, remain, and in step 2 we place them in cell (C2, C1) to match the marginal frequency of X in category C2. At this point all three observations (①②③) have been placed, and the filled cells are marked in grey. We continue this "number placement" and "marginal matching" in steps 3-4 until all marginal frequencies of Y are matched with those of X, which completes the RTPC (the grey area in step 4). Matching X to Y instead results in the same RTPC.

Figure 6. Example showing the steps to construct the RTPC (rank-transformable pattern of change) from the marginal distributions of X (1, 2, 7) and Y (3, 3, 4).
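As far as we can see, the marginal matching of Figure 6 is equivalent to a simpler procedure. Line up the n observations in category order, separately for X and for Y, and pair the k-th position of X with the k-th position of Y. The hypothetical function `rtpc` below implements this shortcut and reproduces the grey cells for the margins (1, 2, 7) and (3, 3, 4).

```python
from collections import Counter


def rtpc(x_marg, y_marg):
    """Rank-transformable pattern of change from marginal frequencies only.

    Lists the n observations in category order for X and for Y and pairs the
    k-th position of X with the k-th position of Y. Returns a dict mapping
    (X category, Y category) -> frequency, with categories numbered 1..m.
    """
    if sum(x_marg) != sum(y_marg):
        raise ValueError("both margins must describe the same n individuals")
    xs = [c for c, f in enumerate(x_marg, start=1) for _ in range(f)]
    ys = [c for c, f in enumerate(y_marg, start=1) for _ in range(f)]
    return dict(Counter(zip(xs, ys)))


print(rtpc([1, 2, 7], [3, 3, 4]))  # {(1, 1): 1, (2, 1): 2, (3, 2): 3, (3, 3): 4}
```

The first two cells match steps 1 and 2 of Figure 6: one observation in (C1, C1) and two in (C2, C1).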

The RTPC is used to visualize the nature of change. When the observed change comes only from the systematic component, all observations lie within the RTPC. Two sets of classifications are called rank transformable when $R^{(X)}_{ij} = R^{(Y)}_{ij}$ for all cells, meaning that the augmented ranks determined by X equal those determined by Y. Dispersion from the RTPC implies $R^{(X)}_{ij} \neq R^{(Y)}_{ij}$ for at least some cells. The relative rank variance (RV) is defined to measure individual variability, describing the dispersion from the RTPC (Svensson 1993):

$$RV = \frac{6}{n^3} \sum_{i=1}^{m} \sum_{j=1}^{m} x_{ij} \left( R^{(X)}_{ij} - R^{(Y)}_{ij} \right)^2$$
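The RV formula can be evaluated directly from individual-level data. All individuals in a cell share both augmented ranks, so summing $(R^{(X)} - R^{(Y)})^2$ over individuals equals the cell-weighted sum in the formula. The sketch computes augmented ranks as in Table 1 (sort by one rater, break ties by the other). The helper names are ours and hypothetical.

```python
def mean_ranks(keys):
    """Ranks 1..n after sorting by key; observations with equal keys share the mean rank."""
    n = len(keys)
    order = sorted(range(n), key=lambda t: keys[t])
    ranks = [0.0] * n
    start = 0
    while start < n:
        stop = start
        while stop + 1 < n and keys[order[stop + 1]] == keys[order[start]]:
            stop += 1
        shared = (start + stop + 2) / 2
        for t in order[start:stop + 1]:
            ranks[t] = shared
        start = stop + 1
    return ranks


def relative_rank_variance(x, y):
    """RV = 6/n^3 * sum over cells of x_ij * (R_ij^(X) - R_ij^(Y))^2."""
    n = len(x)
    r_x = mean_ranks(list(zip(x, y)))  # augmented ranks determined by X
    r_y = mean_ranks(list(zip(y, x)))  # augmented ranks determined by Y
    return 6 * sum((a - b) ** 2 for a, b in zip(r_x, r_y)) / n ** 3


# Table 1 data are rank transformable, so RV = 0.
print(relative_rank_variance([1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
                             [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]))  # 0.0
# A hypothetical 2 x 2 example with one individual in every cell.
print(relative_rank_variance([1, 1, 2, 2], [1, 2, 1, 2]))      # 0.1875
```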

Change is a type of disagreement. In the context of disagreement, the RTPC becomes the rank-transformable pattern of agreement (RTPA). The internal rank variance (IV) measures unobserved individual variability when ties occur. Individuals judged into the same category by both raters may have an internal rank order that the data do not reveal (Svensson 1993).

• Augmented rank-order agreement coefficient

The augmented rank-order agreement coefficient (ra) (Svensson 1997) measures the agreement in pairs of augmented mean ranks. The relative rank variance, RV, and the augmented rank-order agreement coefficient, ra, are two sides of the same coin: RV measures the dispersion from the RTPA, while ra describes the closeness of observations to the RTPA.

The general expression is given as

$$r_a = \frac{\left(n^3 - n - n^3 \cdot IV\right) - 6 \sum_{i=1}^{m} \sum_{j=1}^{m} x_{ij}\, \Delta R_{ij}^2}{\sqrt{\left(n^3 - n\right)^2 - 2\left(n^3 - n\right) n^3 \cdot IV + \left(n^3 \cdot IV\right)^2}}$$

where IV is the internal rank variance and $\Delta R_{ij}^2 = \left(R^{(X)}_{ij} - R^{(Y)}_{ij}\right)^2$. It can be simplified as

$$r_a = 1 - \frac{n^3}{n^3 - n - n^3 \cdot IV}\, RV$$

The upper limit, ra = 1, corresponds to RV = 0 and indicates that the observed disagreement is completely described by the systematic component (Svensson 1993).
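Since both rs and ra are Pearson correlations of mean ranks that differ only in the ranking approach, a sketch can compute both with one correlation routine. It computes ra directly from the augmented ranks rather than through IV, whose defining formula is not reproduced here. For the Table 1 data the two raters' augmented ranks coincide, so ra = 1 while rs is below 1. The function names are hypothetical.

```python
import math


def mean_ranks(keys):
    """Ranks 1..n after sorting by key; observations with equal keys share the mean rank."""
    n = len(keys)
    order = sorted(range(n), key=lambda t: keys[t])
    ranks = [0.0] * n
    start = 0
    while start < n:
        stop = start
        while stop + 1 < n and keys[order[stop + 1]] == keys[order[start]]:
            stop += 1
        shared = (start + stop + 2) / 2
        for t in order[start:stop + 1]:
            ranks[t] = shared
        start = stop + 1
    return ranks


def pearson(u, v):
    """Pearson product-moment correlation of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def spearman_rs(x, y):
    # Pearson correlation of ordinary (marginal) mean ranks
    return pearson(mean_ranks(x), mean_ranks(y))


def augmented_ra(x, y):
    # Pearson correlation of augmented (cell-tied) mean ranks
    return pearson(mean_ranks(list(zip(x, y))), mean_ranks(list(zip(y, x))))


X = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
Y = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
print(round(spearman_rs(X, Y), 3), round(augmented_ra(X, Y), 3))  # 0.927 1.0
```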

• Measures of association

The quality of data from rating scale assessments is determined by the degree of agreement between or within raters. The level of agreement between raters, for example, is important in determining whether a particular scale is appropriate for measuring a particular variable (Stevens 1949; Lehmann 1975; Merbitz et al. 1989). Disagreement may include both systematic and individual variability components, which have different impacts on the quality of data. The systematic component, once identified, can be understood and adjusted for, e.g. by retraining the raters. Non-negligible individual variability, in contrast, is a sign of uncertainty in the paired assessments or of unsatisfactory quality of the scale (Svensson 1993).

According to Daniels (1944), the Pearson product-moment correlation coefficient, the Spearman rank-order correlation coefficient and Kendall's tau share the properties of a general class of correlation coefficients.

One of these measures, Spearman's rs, is frequently used as an agreement or reliability measure (Craig et al. 2003, Gedikoglu et al. 2005, Vandelanotte et al. 2005, Kurtze et al. 2008), although some researchers (Svensson 1997; Agresti 2002) have doubted that it is strictly appropriate in reliability studies. As an alternative, Svensson proposed the augmented rank-order agreement coefficient (ra), which measures the association between pairs of augmented mean ranks (Svensson 1997). She also suggested (Svensson 1997) that ra should be added as a new member of this general class, since it is calculated as the Pearson product-moment correlation coefficient of the augmented mean ranks. The higher the value of ra, the more reliable the ordering of the paired assessments. Together with the other measures (RP, RC, RV), ra provides a comprehensive evaluation of agreement (Svensson 1993; Svensson and Holm 1994; Svensson 1998).

Change and comparison of changes

Change may concern treatment or intervention effects or natural changes over time. Statistical evaluation of change in subjective variables often refers to assessments made by the same patients on the same rating scales before and after treatment. Change in paired assessments on rating scales cannot be defined as a difference, because calculations based on adding and subtracting are meaningless for such data (Stevens 1949; Chatfield 1991; Agresti 2002).

Hence statistical methods must be chosen carefully.

Besides analysis of change in a single group, comparison of changes between groups is also a common question. Standard statistical approaches are available when the variables are quantitative and normally distributed (Senn 1997; David 2000; Matthews 2000). However, when the variable is recorded on an ordinal scale, fewer choices are available. Moses, Emerson and Hosseini (1992) discussed various testing approaches for comparing two groups with ordinal data, for example the chi-squared test, the Wilcoxon Mann-Whitney test, and the t-test applied to numerical scores assigned to the categories. Their recommendations are widely accepted in practice. Generally, the tests are linear functions of the cell frequencies. Graubard and Korn (1987) classified this type of test into two categories: methods that need pre-assigned numerical scores for each category level and methods that do not. They concluded that the former are superior to the latter.

However, these methods violate the rank-invariant properties of ordinal data by defining change in ordered categories as a difference. We therefore turn to rank-invariant methods.

The Pearson chi-squared test compares two groups in terms of the proportions of deterioration, no change and improvement. The Cochran-Armitage test modifies the Pearson chi-squared test by incorporating a suspected ordering in the effects. The stratified Wilcoxon Mann-Whitney test is recommended more often (Emerson and Moses 1985; Agresti and Finlay 1997; Rahlfs and Zimmerman 1993). Alternatively, analysis of change can be based on RP (relative position), a measure of systematic change in position defined by Svensson (1993). In this thesis, analysis of change in a single group is extended to comparison of change between groups.

Multi-item scales

Multi-item scales are commonly used in measuring health-related quality of life (HRQoL). They consist of multiple items (questions), and each item comprises a single attribute to be recorded, usually as an ordinal response (Svensson 2001). When a dimension consists of more than one item, a summary or global score is necessary in order to make inferences about the concept being measured (Svensson 2001; Hadzibajramovic 2013). Among all methods of combining items into a total score, the sum score is regarded as the standard approach (Coste et al. 1995; Agresti 2002; Schwab et al. 2003).

When the purpose is analysis of change in data from multi-item scales, the dominant methods are also based on sum scores and on differences between scores. For example, Cohen's d is frequently used (Cohen 1977). It is calculated as the difference between the baseline and follow-up scores divided by the standard deviation of the baseline score. Negative values indicate improvement and positive values indicate deterioration.

Some researchers have questioned the appropriateness of the sum score approach in analysis of change in multi-item scales. The sum score is based on the assumptions of quantitative data and equidistance (Jenkinson et al. 1994; Coste et al. 1995; McDowell and Newell 1996; Malik et al. 1999; Agresti 2002), which violate the rank-invariant properties of ordinal data (Svensson 2000). Furthermore, different response profiles may result in the same sum score, indicating that the sum score is not a sufficient statistic (Hadzibajramovic 2013). This thesis shows that rank-invariant methods can be used instead of sum score approaches to analyse change in multi-item scales.
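A small hypothetical two-item example makes both objections concrete. Two different response profiles share a sum score. An order-preserving recoding of the categories, which must not change the analysis of ordinal data, changes the sum scores. The function name and data are illustrative only.

```python
def sum_score(profile, coding=None):
    """Sum of item responses; `coding` optionally maps each category label to a number."""
    if coding is None:
        return sum(profile)
    return sum(coding[v] for v in profile)


# Two hypothetical respondents on a two-item scale, items coded 1 < 2 < 3.
profile_a = (1, 3)
profile_b = (2, 2)
print(sum_score(profile_a), sum_score(profile_b))  # 4 4: different profiles, same sum

# An order-preserving recoding keeps every item's ordering but changes the sums.
recode = {1: 1, 2: 2, 3: 5}
print(sum_score(profile_a, recode), sum_score(profile_b, recode))  # 6 4
```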

Contributions of this thesis

The overall aim of the thesis is to further develop the nonparametric rank-based Svensson methods for paired ordinal data. The thesis consists of five papers. The first three are theoretical papers dealing with properties of estimators and tests; the last two are applied papers verifying the applicability of the methods.

More specifically, Paper I focuses on one measure, RP, and its asymptotic properties. Based on these, two tests are proposed, one for analysis of systematic change in a single sample and one for comparison of changes between groups of paired ordinal data. Paper II gives a general form of the asymptotic distribution for rank-based measures, e.g. various Svensson measures based on the contingency table. Paper III discusses the measures that are used in inter-rater agreement studies and verifies the asymptotic results of Paper II by simulation. Paper IV compares the test for comparison of changes proposed in Paper I with other relevant non-parametric tests by simulation, showing that the proposed test works better for small and unbalanced samples. Paper V shows that, besides sum score methods, rank-invariant methods can also be used in analysis of multi-item scales.


PART II: SUMMARY

Paper I: Non-parametric analyses of change within group and between two groups of paired assessments on rating scales

Statistical analysis of change or treatment effect on subjective variables often refers to assessments on the same rating scale pre- and post-treatment. It is of great clinical interest both in single-sample studies and in comparisons of two groups. Change in paired assessments cannot be defined as a difference because of the non-metric properties of ordinal data. Although the judgements are usually coded by numbers, the numbers have no mathematical meaning and there is no real distance between them. This rules out statistical methods that assume quantitative data and normality.

In this paper we study a non-parametric measure, the relative position (RP), which is an empirical estimate of the parameter of systematic change in position (γ) between paired rating scale assessments (Svensson 1993). We show that both RP and the difference in RP between groups, ∆RP, are asymptotically normally distributed. The asymptotic normality of RP and ∆RP is proved by applying the theory of U-statistics to the indicator expression, and it is also supported by simulation evidence.

Based on the asymptotic results, we propose two tests for analyses of systematic change within a group and between two independent groups of paired ordinal data. These two tests answer two questions. The first is whether a non-zero RP value gives enough evidence for a general conclusion of a treatment effect in a population. The second is whether the difference between two RP values is large enough to conclude that the two populations differ in treatment effects. A simulation study of the type I error rate is performed for the suggested test for comparison of systematic changes between groups. Generally, the normal approximation of the test statistic is reasonably good. However, one should be cautious about using the asymptotic results when the nominal level is 1% and when comparing two groups whose sample sizes differ greatly.
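Both tests rest on approximate normality of RP and ∆RP. The sketch below forms z = RP/SE within a group and z = (RP1 - RP2)/sqrt(SE1² + SE2²) between two independent groups, with two-sided p-values from the standard normal distribution. As an assumption, it estimates the standard errors with the leave-one-out jackknife mentioned for Svensson's measures, not with the U-statistic variance derived in Paper I. It should therefore be read as an illustration of the test structure, not as the paper's exact procedure. All function names and data are hypothetical. A group whose jackknife SE is zero would raise a division error.

```python
import math


def rp(pairs):
    """Plug-in estimate of P(X < Y) - P(Y < X) over all n*n X-Y comparisons."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    n = len(pairs)
    return sum((a < b) - (b < a) for a in xs for b in ys) / (n * n)


def jackknife_se(pairs):
    """Leave-one-individual-out jackknife standard error of rp()."""
    n = len(pairs)
    loo = [rp(pairs[:t] + pairs[t + 1:]) for t in range(n)]
    mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in loo))


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def within_group_test(pairs):
    z = rp(pairs) / jackknife_se(pairs)
    return z, two_sided_p(z)


def between_group_test(pairs1, pairs2):
    # independent groups: the variances of the two RP estimates add
    se = math.sqrt(jackknife_se(pairs1) ** 2 + jackknife_se(pairs2) ** 2)
    z = (rp(pairs1) - rp(pairs2)) / se
    return z, two_sided_p(z)


# Hypothetical (before, after) assessments on a three-category scale.
group_1 = [(1, 2), (1, 3), (2, 3), (2, 2)]
group_2 = [(1, 1), (2, 1), (3, 2), (2, 3)]
print(within_group_test(group_1))
print(between_group_test(group_1, group_2))
```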

Our proposed tests are two small but needed (Cohen et al. 2000) contributions to the few statistical methods that require only the rank-invariant properties of ordinal data. The results of this paper can be applied to clinical and epidemiological studies of qualitative variables when the research purpose is analysis of change in single-sample studies or comparison of changes between two groups. The statistical methods are demonstrated on an empirical dataset from a prospective study of the sensitivity to change of assessments (Svensson et al. 2015). Three groups of diagnoses were included in that study; for demonstration, only two of the diagnoses are used for comparing changes. For these two groups, the RP values provide enough evidence for a general conclusion of treatment effects after surgery. However, there is not enough evidence to conclude that the treatment effects differ between the two diagnoses.

Paper II: Asymptotic distribution of rank-based measures for paired ordinal data and their use in interval estimations

Assessments on rating scales generate ordinal data. Although the data are usually coded by numbers, there is no standardized magnitude, inter-categorical distance or linearity between the categories (Stevens 1946, Dybkaer and Jorgensen 1989, Hand 1996). Therefore, statistical methods for ordinal data should be based only on orders, or ranks. Rank-based measures of systematic disagreement in position and concentration in paired ordered categorical judgments, as well as measures of random variation and alignment, have been proposed and studied.

In this paper we present and study asymptotic distributions for a set of measures that evaluate the observed disagreement in paired ordered categorical data and assume only rank-invariant properties. They are measures of systematic disagreement, namely relative position (RP) and relative concentration (RC); measures of individual variability, namely relative rank variance (RV), the augmented rank-order agreement coefficient (ra) and internal rank variance (IV) (Svensson 1993); and the Spearman rank-order correlation coefficient (rs).

The key property of these rank-based measures is that they are completely determined by the observed cell frequencies in the contingency table. Hence, by using the asymptotic theory of the cell frequencies, we are able to prove asymptotic normality and give a general form of the asymptotic variance. A numerical approach to approximate estimation of the parameters is also suggested: an approximation can be obtained from a rather straightforward computer calculation without any theoretical effort. This enables potential users in applied studies to construct confidence intervals for the corresponding parameters. The presentation is illustrated with two application examples.
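Because the measures are functions of the cell proportions, one general way to carry out such a computer calculation is the delta method. With multinomial cell counts, Var(f(p̂)) ≈ g'[diag(p) - pp']g / n, where g is the gradient of f, approximated below by central differences. The sketch applies this only to RP, written as a plug-in function of the cells. It is our assumption about the general approach, not a reproduction of Paper II's procedure, and the counts are hypothetical.

```python
import math


def rp_from_cells(p):
    """RP as a function of the cell proportions p[i][j] (X = i, Y = j)."""
    m = len(p)
    px = [sum(p[i][j] for j in range(m)) for i in range(m)]  # X margin
    py = [sum(p[i][j] for i in range(m)) for j in range(m)]  # Y margin
    return sum(px[i] * py[j] * ((i < j) - (j < i)) for i in range(m) for j in range(m))


def delta_method_variance(measure, counts, h=1e-6):
    """Approximate variance of measure(p_hat) under multinomial sampling of the cells.

    Var ~ (sum_c p_c g_c^2 - (sum_c p_c g_c)^2) / n, i.e. g' [diag(p) - p p'] g / n,
    where the gradient g is obtained by central differences.
    """
    m = len(counts)
    n = sum(sum(row) for row in counts)
    p = [[c / n for c in row] for row in counts]
    grad = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            up = [row[:] for row in p]
            down = [row[:] for row in p]
            up[i][j] += h
            down[i][j] -= h
            grad[i][j] = (measure(up) - measure(down)) / (2 * h)
    cells = [(i, j) for i in range(m) for j in range(m)]
    first = sum(p[i][j] * grad[i][j] for i, j in cells)
    second = sum(p[i][j] * grad[i][j] ** 2 for i, j in cells)
    return (second - first ** 2) / n


# Hypothetical 3 x 3 table of counts (rows: X category, columns: Y category).
counts = [[5, 3, 1], [2, 6, 4], [0, 2, 7]]
n = sum(map(sum, counts))
estimate = rp_from_cells([[c / n for c in row] for row in counts])
se = math.sqrt(delta_method_variance(rp_from_cells, counts))
print(f"RP = {estimate:.3f}, approx. 95% CI {estimate - 1.96 * se:.3f} to {estimate + 1.96 * se:.3f}")
```

The same `delta_method_variance` function accepts any other measure written as a function of the cell proportions.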


Paper III: Analysing inter-rater agreement: Spearman's rank order correlation coefficient vs. Svensson's augmented rank order agreement coefficient

This paper compares Spearman's rank-order correlation coefficient (rs) with the augmented rank-order agreement coefficient (ra). We illustrate the differences between these two coefficients, evaluate their appropriateness in agreement studies and emphasize the special features of ra as a member of the Pearson correlation family.

Spearman's rank-order correlation coefficient (rs) is commonly used to measure the agreement between or within raters using the same rating scale. Alternatively, Svensson (1997) proposed the augmented rank-order agreement coefficient (ra), which expresses the association of paired augmented mean ranks, or the agreement of the data with the best possible ordering (RTPA). Both coefficients belong to the Pearson correlation family (Daniels 1944; Svensson 1997) and differ only in the ranking approach. The population parameters estimated by Spearman's rs and the augmented ra are ρs and ρa, respectively.

A simulation study compares the distributional properties of the two coefficients in three situations:

1. ρs = 0: two raters classify the subjects independently and randomly.
2. ρa ≈ 0: data are generated from populations in which the association between the augmented mean ranks is close to 0.
3. ρa ≈ 1: data are generated from populations in which the association between the augmented mean ranks is close to 1. In this case the observed disagreement is caused only by the systematic component.

The simulations have the following implications:

• ra ≥ rs for positive correlations in all simulated groups. This is reasonable, as Spearman’s rs always relates to the main diagonal, while the augmented ra relates to the RTPA, which is data-dependent.

• The number of categories affects ra and rs in opposite directions: for a fixed n, as the number of categories m increases, ra decreases while rs increases. A large Spearman coefficient from a scale with many categories (e.g. a visual analogue scale, VAS) should therefore be interpreted with care.

• Spearman's rs might fail to detect disagreement when one rater systematically gives higher assessments than the other according to some strong relationship, as in Case 3 of the third situation.

To conclude, we show the differences between Spearman’s rs and the augmented ra in analysing inter-rater agreement. The two coefficients focus on different properties of the data. Spearman’s rs is a summary measure and can be useful for describing how strongly the paired assessments are associated with each other. However, the disagreement pattern remains hidden: a high value of rs does not necessarily mean good agreement. The augmented ra, on the other hand, measures the association of the data with the best possible ordering, RTPA. Although it is formally a member of Pearson’s family, its interpretation differs, as it is not a “usual” correlation. Looking at both the systematic disagreement in position and concentration (RP and RC) and the random variation and alignment (ra and RV) gives more detailed information about the data and makes it possible to draw more precise conclusions.

Paper IV: Comparison of methods for comparing two groups of paired ordinal data

Comparing changes in subjective variables from rating scales between groups is a common question in medical research. In Paper I we proposed a nonparametric test based on the difference in RP (relative position). RP is an empirical measure of the parameter of systematic change in position between paired assessments, $\gamma = P(X_l < Y_k) - P(Y_l < X_k)$ (Svensson 1993). The test answers the question of whether the difference between two RP values is large enough to conclude that the two populations differ in change.
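Because RP depends only on the two marginal distributions, its empirical value can be computed directly from the paired contingency table. X_l and Y_k are treated as independent draws from the marginal distributions of the first and second assessments. The sketch below is a plain implementation under that reading. The function name and the example table are hypothetical. For two groups, the ∆RP-based test would compare RP_1 − RP_2 with its standard error from the asymptotic results of Paper I, which are not reproduced here.

```python
import numpy as np


def relative_position(counts):
    """Empirical RP = P(X < Y) - P(Y < X) from a paired contingency table.

    Rows index the first assessment X and columns the second assessment Y.
    X and Y are treated as independent draws from the two marginal distributions.
    """
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    px = p.sum(axis=1)              # marginal distribution of X
    py = p.sum(axis=0)              # marginal distribution of Y
    below_x = np.cumsum(px) - px    # below_x[j] = P(X < j)
    below_y = np.cumsum(py) - py    # below_y[i] = P(Y < i)
    p_x_below_y = float(py @ below_x)   # P(X < Y)
    p_y_below_x = float(px @ below_y)   # P(Y < X)
    return p_x_below_y - p_y_below_x


counts = [[10, 5, 0],
          [2, 15, 6],
          [0, 3, 9]]
rp = relative_position(counts)  # 0.366 - 0.2784 = 0.0876
```

A positive RP means that the second assessments tend to lie in higher categories than the first. RP = 1 (or −1) when every second assessment is above (or below) every first one.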

In this paper we compare the ∆RP-based test with other relevant tests for comparing changes between groups that do not violate the rank-invariant properties of ordinal data. These are the Pearson chi-squared test, the Cochran-Armitage test and the stratified Wilcoxon-Mann-Whitney test. All of them take the properties of ordinal data into consideration and aim to detect differences in change between groups. However, they focus on different aspects of the comparison, and their null hypotheses differ.

The Pearson chi-squared test and the Cochran-Armitage test reduce the data to a 2×3 table. It gives the number of observations in each group that become worse, remain the same and become better. Their null hypotheses are stated in terms of the 2×3 cell probabilities. The null hypothesis of the stratified Wilcoxon-Mann-Whitney test concerns differences in the cell probabilities of the two contingency tables. The ∆RP-based test, on the other hand, concerns the difference in group-level changes and is based on the marginal distributions. This paper shows that the null hypothesis of the stratified Wilcoxon-Mann-Whitney test implies the null hypotheses of the $\chi^2$ test, the Cochran-Armitage test and the ∆RP-based test. This means that whenever $H_0(\mathrm{SWMW})$ is true, $H_0(\chi^2)$ and $H_0(\mathrm{CA})$ are also true, but not the other way around.
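The reduction to a 2×3 table and the two statistics computed from it can be sketched as follows. The sketch makes three assumptions:

- a higher category means improvement;
- the Cochran-Armitage test uses the scores (−1, 0, 1) for worse/same/better;
- its variance takes the common form Var(T) = (R_1R_2/N)[Σ t_i² C_i(N − C_i) − 2Σ_{i<j} t_i t_j C_i C_j].

The scores, the counts and the function names are illustrative choices, not necessarily those of the paper. The stratified Wilcoxon-Mann-Whitney test is not included.

```python
import numpy as np


def change_counts(before, after):
    """Counts of (worse, same, better) changes, assuming higher = better."""
    before, after = np.asarray(before), np.asarray(after)
    return np.array([np.sum(after < before),
                     np.sum(after == before),
                     np.sum(after > before)])


def pearson_chi2(table):
    """Pearson chi-squared statistic; df = (rows - 1) * (columns - 1)."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def cochran_armitage_z(table, scores=(-1, 0, 1)):
    """Cochran-Armitage trend statistic Z for a 2 x k table with column scores."""
    table = np.asarray(table, dtype=float)
    t = np.asarray(scores, dtype=float)
    r1, r2 = table.sum(axis=1)   # group totals
    c = table.sum(axis=0)        # column totals
    n = table.sum()
    stat = np.sum(t * (table[0] * r2 - table[1] * r1))
    tc = t * c
    cross = (tc.sum() ** 2 - (tc ** 2).sum()) / 2   # sum over i < j of t_i t_j C_i C_j
    var = r1 * r2 / n * (np.sum(t ** 2 * c * (n - c)) - 2 * cross)
    return float(stat / np.sqrt(var))


# Rows: groups; columns: worse, same, better (hypothetical counts).
table = np.array([[5, 10, 15],
                  [15, 10, 5]])
x2 = pearson_chi2(table)        # 10.0
z = cochran_armitage_z(table)   # sqrt(10), about 3.162
```

The p-values follow from the χ² distribution with 2 degrees of freedom and from the standard normal distribution, for example via `scipy.stats.chi2.sf(x2, 2)` and `2 * scipy.stats.norm.sf(abs(z))`.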

Examples illustrate that different tests do not always tell the same story, even when applied to the same data. The conflicts can be explained by the nature of the test statistics and the differences in the null hypotheses.

Simulations are performed mainly to compare the power of the tests to detect differences in changes. They are accompanied by a simulation of significance levels, which investigates for which sample sizes the asymptotic approximations are applicable. The design takes into account the potential impact of the number of categories, the sample sizes, and balanced or unbalanced group comparisons. In the power study, the two groups of data are generated from two different joint probability distributions (the treatment and control sub-tables) in each data-generating probability table. In the type I error study, random samples for both groups are instead drawn from the same joint distribution (the treatment sub-table only), and the nominal and actual significance levels are compared. The simulation evidence shows that the chi-squared test has unsatisfactory power and size. For small samples with many categories, the stratified Wilcoxon-Mann-Whitney test has slightly worse power and size. Generally speaking, the ∆RP-based test works better than the other methods when the samples are small and unbalanced.
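The type I error design can be sketched for the Pearson chi-squared test on the 2×3 change table. Both groups are drawn from one and the same joint (before × after) distribution, and the share of rejections at the nominal level estimates the actual significance level. The joint probabilities, the sample sizes and the function names are hypothetical; they are not the data-generating tables of the paper. `scipy.stats.chi2_contingency` is called with `correction=False`, so no Yates correction is applied when an empty change category is dropped.

```python
import numpy as np
from scipy.stats import chi2_contingency


def change_table(cell_counts):
    """Collapse an m x m (before x after) table into worse/same/better counts.

    Assumes that a higher category means improvement.
    """
    cell_counts = np.asarray(cell_counts)
    m = cell_counts.shape[0]
    i, j = np.indices((m, m))  # i: before category, j: after category
    return np.array([cell_counts[j < i].sum(),
                     cell_counts[j == i].sum(),
                     cell_counts[j > i].sum()])


def empirical_size(joint_probs, n1, n2, alpha=0.05, reps=1000, seed=1):
    """Rejection rate of the 2x3 chi-squared test when H0 holds by construction."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(joint_probs, dtype=float)
    flat = probs.ravel() / probs.sum()
    rejections = 0
    for _ in range(reps):
        # Both groups come from the same joint distribution.
        g1 = rng.multinomial(n1, flat).reshape(probs.shape)
        g2 = rng.multinomial(n2, flat).reshape(probs.shape)
        table = np.vstack([change_table(g1), change_table(g2)])
        table = table[:, table.sum(axis=0) > 0]  # drop empty change categories
        if table.shape[1] < 2:
            continue  # nothing to test; counted as a non-rejection
        _, p_value, _, _ = chi2_contingency(table, correction=False)
        rejections += int(p_value < alpha)
    return rejections / reps


# Hypothetical joint distribution of (before, after) on a 3-category scale.
probs = [[0.20, 0.10, 0.02],
         [0.05, 0.25, 0.08],
         [0.02, 0.08, 0.20]]
size = empirical_size(probs, n1=20, n2=60, reps=500)  # unbalanced groups
```

An empirical size clearly different from the nominal 0.05 suggests that the asymptotic approximation is not yet adequate for those sample sizes.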

Paper V: The use of rank-invariant methods in analysis of change in multi-item scales

Multi-item scales consist of multiple items (questions). When a dimension consists of more than one item, a summary or global score is needed.

Among all methods to construct a global score the sum score approach is regarded as the standard way (Coste et al. 1995; Agresti 2002; Schwab et al. 2003).

The primary aim of the paper is to show that rank-invariant methods can be used in the analysis of change in multi-item questionnaires, instead of methods based on sum scores. To do so, we take several rank-invariant global scores proposed by different authors (median, criterion-based and indicator sum). As examples, we use selected dimensions of the Short-Form-36 Health Survey (Ware et al. 1993).
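The motivation for rank-invariant global scores can be illustrated with a toy example using hypothetical responses, not SF-36 data. An order-preserving relabelling of the response categories is admissible for ordinal data, yet it can reverse the ordering of two respondents' sum scores. It cannot change the ordering of their median scores. The sketch uses the lower median as the global score; this is one possible rule and an assumption here. The criterion-based and indicator-sum scores from the paper are not reproduced.

```python
def lower_median(items):
    """Lower median of the item responses; always an observed category (assumed rule)."""
    ordered = sorted(items)
    return ordered[(len(ordered) - 1) // 2]


def recode(items, mapping):
    """Relabel the response categories with an order-preserving mapping."""
    return [mapping[k] for k in items]


# Hypothetical responses of two persons to three items scored 1..5.
a = [1, 5, 5]
b = [4, 4, 4]

# Squaring the codes 1..5 keeps their order, so it is an admissible relabelling
# of an ordinal scale; it is used here only to expose the dependence on coding.
square = {k: k * k for k in range(1, 6)}

sum_before = sum(a) < sum(b)                                   # 11 < 12: True
sum_after = sum(recode(a, square)) < sum(recode(b, square))    # 51 < 48: False
med_before = lower_median(a) > lower_median(b)                 # 5 > 4: True
med_after = lower_median(recode(a, square)) > lower_median(recode(b, square))  # 25 > 16: True
```

The sum score ranks the two persons differently depending on the arbitrary numeric coding of the categories. The median-based score gives the same ordering under any strictly increasing recoding.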


First, we use both the sum score and the rank-invariant global scores to analyse change at the dimensional level. The comparisons show a similar pattern of group-related systematic change, but the scores differ in individual variability: less individual fluctuation is found when the rank-invariant global scores are used. In the analysis of change, less individual variability makes the results easier to interpret.

Second, we compare change at the dimensional level with change at the item level. Change at the dimensional level measured with the rank-invariant global scores is in accord with change at the item level. The sum scores do not always show this accordance, in either the systematic component or the individual variability component of change. The implication is that aggregating the original items into the rank-invariant global scores captures useful information from them.

To conclude, the rank-invariant global scores have three advantages in the analysis of change for multi-item scales. They are representative of the original items. The analysis of change using them shows a pattern close to that at the item level. Finally, compared to sum scores, they are easy to describe in words and to interpret. Rank-invariant global scores are therefore recommended for analysing change in multi-item scales.


REFERENCES

Agresti, A. and Finlay, B. (1997). Statistical Methods for the Social Sciences, 3rd edn, Prentice-Hall, Upper Saddle River NJ.

Agresti, A. (2002). Categorical Data Analysis. 2nd ed, Wiley, New York.

Agresti, A. and Hitchcock, D. (2004). Bayesian inference for categorical data analysis: A survey. Technical report, Department of Statistics, University of Florida.

Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd edn, Wiley, New York.

Allvin, R., Svensson, E., Rawal, N., Ehnfors, M., Kling, A.M. and Idvall, E. (2011). The Postoperative Recovery Profile (PRP) – a multidimensional questionnaire for evaluation of recovery profiles, Journal of Evaluation in Clinical Practice 17(2): 236–243. doi: 10.1111/j.1365-2753.2010.01428.

Bowling, A. (2004). Measuring Health: A Review of Quality of Life Measurement Scales, 3rd ed, Open University Press, Philadelphia.

Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences, Academic Press, New York.

Chatfield, C. (1991). Problem Solving: A Statistician’s Guide, Chapman and Hall, London.

Coste, J., Fermanian, J. and Venot, A. (1995). Methodological and statistical problems in the construction of composite measurement scales: a survey of six medical and epidemiological journals, Statistics in Medicine 14: 331–345.

Cohen, A., Sackrowitz, H.B. and Sackrowitz, M. (2000). Testing whether treatment is ‘better’ than control with ordered categorical data: an evaluation of new methodology, Statistics in Medicine 19: 2699–2712.

Craig, C.L., Marshall, A.L., Sjostrom, M., Bauman, A., Booth, M.L., Ainsworth, B.E., Pratt, M., Ekelund, U., Yngve, A., Sallis, J.F. and Oja, P. (2003). International Physical Activity Questionnaire: 12-country reliability and validity, Medicine and Science in Sports and Exercise 35: 1381–1395.


Daniels, H.E. (1944). The relation between measures of correlation in the universe of sample permutations, Biometrika 33: 129–135.

Dybkaer, R. and Jorgensen, K. (1989). Measurement, value and scale, Scand J Clin Lab Invest 49, Suppl 194: 69–76.

David, J.S. (2000). Handbook of parametric & nonparametric statistical procedures, Chapman & Hall/ CRC, Florida.

Emerson, J.D. and Moses, L.E. (1985). A note on the Wilcoxon-Mann-Whitney test for ordered tables, Biometrics 41: 303–309.

Fayers, P.M. and Machin, D. (2000). Quality of Life. Assessment, Analysis and Interpretation, Wiley, Chichester.

Graubard, B.I. and Korn, E.I. (1987). Choice of column scores for testing independence in ordered contingency tables, Biometrics 43: 471–476.

Gedikoglu, U., Coskun, O., Inan, L.E., Ucler, S., Tunc, T. and Emre, U. (2005). Validity and reliability of Turkish translation of Migraine Disability Assessment (MIDAS) questionnaire in patients with migraine, Cephalalgia 25(6): 452–456.

Goldstein, H. (2011). Multilevel Statistical Models, 4th edn, Wiley, London.

Hand, D.J. (1996). Statistics and the theory of measurement, J R Statist Soc A 159: 445–492.

Hadzibajramovic, E. (2013). Methodological approaches to the analysis of psychosocial work environment (Dissertation), Örebro University, Örebro.

Jenkinson, C., Peto, V. and Coulter, A. (1994). Measuring change over time: a comparison of results from a global single item of health status and the multi-dimensional SF-36 health status survey questionnaire in patients presenting with menorrhagia, Quality of Life Research 3: 317–321.

Kind, P. (1988). The development of health indices. In Measuring Health: a Practical Approach, Teeling Smith G, ed, John Wiley & Sons, Chichester.

Kurtze, N., Rangul, V. and Hustvedt, B. (2008). Reliability and validity of the international physical activity questionnaire in the Nord-Trondelag health study (HUNT) population of men, BMC Medical Research Methodology 8: 63.

Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks, Holden-Day, San Francisco.

Liu, I. and Agresti, A. (2005). Analysis of ordered categorical data: An overview and a survey of recent development, TEST 14(1): 1–73.

McCullagh, P. (1980). Regression models for ordinal data (with discussion), Journal of the Royal Statistical Society, Series B 42: 109–142.

Merbitz, C., Morris, J. and Grip, J.C. (1989). Ordinal scales and foundations of misinference, Archives of Physical Medicine and Rehabilitation 70: 308–312.

Moses, L.E., Emerson, J.D. and Hosseini, H. (1992). Analyzing data from ordered categories. In Medical Uses of Statistics, 2nd edn, Bailar, J.C. and Mosteller, F. (eds.), NEJM Books, Boston.

McDowell, I. (1996). Measuring Health: A Guide to Rating Scales and Questionnaires, Oxford University Press, New York.

McDowell, I. and Newell, C. (1996). Measuring Health. A Guide to Rating Scales and Questionnaires, 2nd ed, Oxford University Press, Oxford, pp. 10–46.

Malik, M.L., Connor, K.M., Sutherland, S.M., Smith, R.D., Davison, R.M. and Davidson, J.R. (1999). Quality of life and posttraumatic stress disorder: a pilot study assessing changes in SF-36 scores before and after treatment in a placebo-controlled trial of fluoxetine, Journal of Traumatic Stress 12(2): 387–393.

Matthews, J.N.S. (2000). An Introduction to Randomized Controlled Clinical Trials, Arnold, London.

Plackett, R.L. (1983). Karl Pearson and the Chi-Squared Test, International Statistical Review 51(1): 59–72.

Rahlfs, V.W. and Zimmerman, H. (1993). Scores: ordinal data with few categories – how they should be analyzed, Drug Information Journal 27: 1227–1240.

Stevens, S.S. (1949). On the theory of scales of measurement, Science 103: 677–680.


Svensson, E. (1993). Analysis of Systematic and Random differences between paired ordinal categorical data (Dissertation), Göteborg University, Göteborg.

Svensson, E. (1996). Guidelines to statistical evaluation of data from rating scales and questionnaires, Journal of Rehabilitation Medicine 33: 47–48.

Svensson, E. (1997). A coefficient of agreement adjusted for bias in paired ordered categorical data, Biometrical Journal 39: 643–657.

Svensson, E. (1998). Ordinal invariant measures for individual and group changes in ordered categorical data, Statistics in Medicine 17: 2923–2936.

Svensson, E. (2000). Concordance between ratings using different scales for the same variable, Statistics in Medicine 19: 3483–3496.

Svensson, E. (2001). Construction of a single global scale for multi-item assessments of the same variable, Statistics in Medicine 20: 3831–3846.

Svensson, E. (2010). Rank invariance. In: Everitt BS, Palmer CR, eds. Encyclopaedic Companion to Medical Statistics, 2nd ed, Wiley, New York, pp. 381–382.

Svensson, E. (2012). Different ranking approaches defining association and agreement measures of paired ordinal data, Statistics in Medicine 31: 3104–3117.

Svensson, E. and Holm, S. (1994). Separation of systematic and random differences in ordinal rating scales, Statistics in Medicine 13: 2437–2453.

Svensson, E., Starmark, J.E., Ekholm, S., von Essen, C. and Johansson, A. (1996). Analysis of interobserver disagreement in the assessment of subarachnoid blood and acute hydrocephalus on CT scans, Neurological Research 18: 487–494.

Svensson, E. and Sonn, U. (1997). Measures of individual and group changes in ordered categorical data: application to the ADL staircase, Scandinavian Journal of Rehabilitation Medicine 29: 233–242.

Svensson, E. and Starmark, J.E. (2002). Evaluation of individual and group changes in social outcome after aneurysmal subarachnoid haemorrhage: a long-term follow-up study, Journal of Rehabilitation Medicine 34: 251–259.

Svensson, E., Schillberg, B., Kling, A.M. and Nyström, B. (2009). The balanced inventory for spinal disorders. The validity of a disease specific questionnaire for evaluation of outcomes in patients with various spinal disorders, Spine 34: 1976–1983.

Svensson, E., Schillberg, B., Zhao, X., Nyström, B. (2015). Responsiveness of the Balanced Inventory for Spinal Disorders, A Questionnaire for Evaluation of Outcomes in Patients with various Spinal Disorders, J Spine Neurosurg 4:2. http://dx.doi.org/10.4172/2325-9701.1000184.

Vandelanotte, C., De Bourdeaudhuij, I.M., Philippaerts, R.M., Sjostrom, M. and Sallis, J.F. (2005). Reliability and validity of a computerized and Dutch version of the International Physical Activity Questionnaire (IPAQ), Journal of Physical Activity and Health 2: 63–75.

Ware, J.E., Kosinski, M. and Gandek, B. (1993). SF-36® Health Survey: Manual & Interpretation Guide, Quality Metric Incorporated, Lincoln, RI.

Yang, Y. and Svensson, E. (2016). Analysis of differences in systematic change between two groups of ordered categorical data, manuscript.


Publications in the series Örebro Studies in Statistics

1. Werner, Peter (2003). On the Cost-Efficiency of Mixed Mode Surveys Using the Web.

2. Wahlström, Helen (2004). Nonparametric Tests for Comparing Two Treatments by Using Ordinal Data.

3. Westling, Sara (2008). Cost efficiency of nonresponse rate reduction efforts – an evaluation approach.

4. Högberg, Hans (2010). Some properties of measures of disagreement and disorder in paired ordinal data.

5. Alam, Moudud Md. (2010). Feasible computation of the generalized linear mixed models with application to credit risk modelling.

6. Li, Dao (2013). Common Features in Vector Nonlinear Time Series Models.

7. Ding, Shutong (2014). Model Choice in Bayesian VAR Models.

8. Rota, Bernardo João (2016). Calibration Adjustment for Nonresponse in Sample Surveys.

9. Yang, Yishen (2017). On Rank-invariant Methods for Ordinal Data.
