How to calibrate a questionnaire: quality-assuring categorical data with psychometric measurement theory

(1)


L R Pendrill

Research Institutes of Sweden, Metrology,

Eklandagatan 86, 41261 Göteborg (SE),

phone:+46 767 88 54 44, mailto:leslie.pendrill@ri.se


(3)

3 Man as Measurement Instrument: Counting ’M dots’

http://itre.cis.upenn.edu/~myl/CommuniqueMundurucuENG.pdf

(4)

4 Measuring Man:

- Status, function of person

- Test against specifications

Man as Measurement Instrument:

- Perception of product/service function, comfort etc.

- Propose improvements in product

Man as Measurement Instrument

(5)

5 Quality-assured categorical measurement

(6)

6

(7)

7

Tukey [Chapter 8, “Data analysis and behavioural science”; quoted in “The collected works of John A Tukey, Volume III, Philosophy and principles of data analysis: 1949 – 1964”, ed. L V Jones, Univ. North Carolina, Chapel Hill]

Counted fractions

“Beware of attempts to interpret correlations between ratios whose numerators and denominators contain common parts”

[Pearson 1897]

$X_j^{\%} = \frac{X_j}{\sum_{k=1}^{K} X_k}$

Relative-number problems:

• Counting (sheep & goats)

• How many affected at this dose

• How many of the pebbles are quartz…


(8)

8 Different scales of measurement






http://www.123rf.com

’Counted fractions’

$X_j^{\%} = \frac{X_j}{\sum_{k=1}^{K} X_k}$

(9)

9 Logistic ruling

$z = \log\left(\frac{P_{success}}{1 - P_{success}}\right)$

’Counted fractions’

(10)

10 Probability

$q_c = \sum_{k=1}^{K} p_k \, P_{c,k}$

$P_{c,k}$ = probability of classification c when true level is k

$p_k$ = a priori probability that true level is k
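One way to read the two definitions above is through the law of total probability: the probability $q_c$ of observing classification c is the prior-weighted sum of the conditional classification probabilities. The sketch below illustrates that reading only; the function name and all numbers are hypothetical and are not taken from the slides.

```python
def classification_probabilities(prior, confusion):
    """Return q_c = sum_k p_k * P_{c,k} for every category c.

    prior[k]        : a priori probability that the true level is k
    confusion[c][k] : probability of classification c when the true level is k
    """
    n_levels = len(prior)
    return [
        sum(confusion[c][k] * prior[k] for k in range(n_levels))
        for c in range(len(confusion))
    ]


# Hypothetical example with three levels (each column of `confusion` sums to 1).
prior = [0.5, 0.3, 0.2]
confusion = [
    [0.8, 0.1, 0.0],
    [0.2, 0.8, 0.2],
    [0.0, 0.1, 0.8],
]
q = classification_probabilities(prior, confusion)
print([round(x, 3) for x in q])
```

Because every column of the hypothetical confusion matrix sums to one, the resulting $q_c$ values also sum to one.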

Counted fractions

$\sum_{i=1}^{n} q_i - 1 = 0$

$\sum_{i=1}^{n} y_i q_i = E(y)$

$q_c = \frac{e^{-b y_c}}{\sum_{i=1}^{C} e^{-b y_i}}$

b = Lagrange multiplier

J M Linacre 2002, “Optimizing Rating Scale Category Effectiveness”, Journal of Applied Measurement, 3:1, pp. 85-106
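The slide pairs a normalisation constraint and an expectation constraint with a Lagrange multiplier b, which is the usual setting for a maximum-entropy distribution over categories. A minimal sketch under that assumption follows: it finds b by bisection so that category probabilities proportional to exp(-b·y_c) reproduce a given mean. The sign convention, the function name and the example values are assumptions, not taken from the slides or from Linacre's paper.

```python
import math


def maxent_category_probabilities(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Probabilities q_c proportional to exp(-b * y_c) with mean equal to target_mean.

    ASSUMPTION: sign convention exp(-b * y); target_mean must lie strictly
    between min(values) and max(values).
    """
    def probs(b):
        exponents = [-b * y for y in values]
        shift = max(exponents)  # subtract the maximum for numerical stability
        weights = [math.exp(e - shift) for e in exponents]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(b):
        return sum(q * y for q, y in zip(probs(b), values))

    # mean(b) decreases monotonically as b increases, so bisection applies.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    b = (lo + hi) / 2.0
    return b, probs(b)


categories = [1, 2, 3, 4, 5]  # hypothetical rating-scale category values
b, q = maxent_category_probabilities(categories, target_mean=2.0)
print(round(b, 4), [round(x, 4) for x in q])
```

With a target mean equal to the midpoint of the scale the multiplier is zero and all categories are equally probable; moving the mean away from the midpoint tilts the distribution exponentially.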



[Figure: rating-scale categories $y_1, y_2, \ldots, y_C$; category $y_c$; response R]

(11)

11 Kaffe.11

W P Fisher Jr. 1999

Difficulty ($\delta$)

$P_{success} = \frac{e^{\theta - \delta}}{1 + e^{\theta - \delta}}$

(12)

12 Wright B.D. (1997) Fundamental measurement for outcome evaluation. Physical

medicine and rehabilitation : State of the Art Reviews. 11(2) : 261-288

$P_{success} = \frac{e^{\theta - \delta}}{1 + e^{\theta - \delta}}$


(15)

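15

          PT1  PT2  PT3  PT4  PT5  PT6  PT7  PT8  PT9 PT10 PT11 PT12 PT13 PT14 PT15  ∑PT
RMI-1       0    1    0    0    0    0    1    0    0    0    0    0    0    0    0    2
RMI-2       0    1    0    0    1    0    1    0    0    0    0    0    0    0    0    3
RMI-3       0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    1
RMI-4       0    1    0    0    1    0    1    0    1    0    1    1    0    0    0    6
RMI-5       0    1    0    0    1    0    1    1    1    1    1    1    0    0    0    8
RMI-6       0    1    0    0    1    0    1    0    0    0    1    0    0    0    0    4
RMI-7       0    1    0    0    1    1    1    1    1    1    1    1    0    0    0    9
RMI-8       0    1    0    0    1    0    1    0    1    0    1    0    0    0    0    5
RMI-9       0    1    1    0    1    1    1    1    1    1    1    1    0    0    0   10
RMI-10      1    1    1    1    1    1    1    1    1    1    1    1    0    0    0   12
RMI-11      0    1    1    1    1    1    1    1    1    1    1    1    0    0    0   11
RMI-12      1    1    1    1    1    1    1    1    1    1    1    1    0    1    0   13
RMI-13      0    1    0    0    1    0    1    0    1    1    1    1    0    0    0    7
RMI-14      1    1    1    1    1    1    1    1    1    1    1    1    1    1    0   14
RMI-15      1    1    1    1    1    1    1    1    1    1    1    1    1    1    1   15
∑RMI        4   14    6    5   13    7   15    8   11    9   12   10    2    3    1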

How does this work?
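One possible illustration of the question: if rows and columns of the 0/1 matrix are sorted by their raw scores, a perfectly ordered (Guttman) response pattern appears as a triangle of ones. The sketch below uses the marginal sums printed on the slide (∑PT and ∑RMI) and assumes, purely for illustration, that each cell is 1 exactly when its row sum plus its column sum exceeds 15; that rule is an assumption, although it does reproduce the printed sums.

```python
# Marginal sums as printed on the slide (rows RMI-1..RMI-15, columns PT1..PT15).
row_sums = [2, 3, 1, 6, 8, 4, 9, 5, 10, 12, 11, 13, 7, 14, 15]
col_sums = [4, 14, 6, 5, 13, 7, 15, 8, 11, 9, 12, 10, 2, 3, 1]
n = len(col_sums)

# ASSUMPTION: perfect Guttman pattern, i.e. a cell is 1 exactly when
# row sum + column sum exceeds the number of columns.
matrix = [[1 if r + c > n else 0 for c in col_sums] for r in row_sums]

# The assumed cells reproduce the printed marginal sums.
assert [sum(row) for row in matrix] == row_sums
assert [sum(col) for col in zip(*matrix)] == col_sums

# Sort rows and columns by raw score, highest first.
row_order = sorted(range(len(row_sums)), key=lambda i: -row_sums[i])
col_order = sorted(range(n), key=lambda j: -col_sums[j])
sorted_matrix = [[matrix[i][j] for j in col_order] for i in row_order]

print("        " + " ".join(f"PT{j + 1:<2d}" for j in col_order))
for i, row in zip(row_order, sorted_matrix):
    print(f"RMI-{i + 1:<3d} " + " ".join(f"{x:^4d}" for x in row))
```

After sorting, every row is a run of ones followed by a run of zeros, so the raw scores alone fix the ordering of both rows and columns. Real response data deviate from this ideal, which is where a probabilistic model becomes useful.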

(16)

16 [Figure: items including “7. Achieve” and “3. Length” on a scale from “Worse quality” to “Better quality”; annotation: “underestimated”]

$\text{Average } CTT_i = \frac{1}{L} \sum_{j=1}^{L} P_{success,i,j}$

$\ln\left(\frac{P_{success,i,j}}{1 - P_{success,i,j}}\right) = \theta_i - \delta_j$
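The two expressions on this slide can be compared numerically. The sketch below computes the average score, $\frac{1}{L}\sum_j P_{success,i,j}$, for a range of $\theta_i$ values with $P_{success}$ taken from the logistic relation $\ln(P/(1-P)) = \theta_i - \delta_j$. The item difficulties are hypothetical, and the comparison is only one possible reading of the "underestimated" annotation.

```python
import math


def p_success(theta, delta):
    """Logistic relation: ln(P / (1 - P)) = theta - delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def average_ctt(theta, deltas):
    """Average of P_success over the L items."""
    return sum(p_success(theta, d) for d in deltas) / len(deltas)


deltas = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical item values (logits)

for theta in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"theta = {theta:3.1f}   average CTT = {average_ctt(theta, deltas):.3f}")

# The same one-logit step changes the average score much less near the
# end of the scale than in the middle.
gain_mid = average_ctt(1.0, deltas) - average_ctt(0.0, deltas)
gain_top = average_ctt(5.0, deltas) - average_ctt(4.0, deltas)
print(f"gain 0 -> 1 logit: {gain_mid:.3f},  gain 4 -> 5 logits: {gain_top:.3f}")
```

Equal steps in $\theta$ therefore do not correspond to equal steps in the average raw score, so differences read directly from raw scores are compressed towards the ends of the scale.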

(17)

17 Measurement and ordinal data

Required - epoch (/Myear)

Measured - depth (/cm)

https://cdn-assets.answersingenesis.org/img/articles/ee/v2/geological-layers.jpg

(18)

18 My health?

Quality-assured measurement

Cognitive ability?

0.8 units ± 0.2 units

Object: Health

Person-centred care (PCC)

• Focus on health (not illness)

• People partners in care

• More symptoms

• Impact on Activities of Daily Living

• Subjective & perceptive

• …

(19)

19

OECD Health Statistics 2015, http://dx.doi.org/10.1787/health_glance-2015-graph191-en

Variation in Primary Care Indicators

Potential causes of variation:

• Disease prevalence

• How physicians diagnose

• How data coders interpret diagnoses


(21)

21

http://en.wikipedia.org/wiki/Bulletproof_vest

(quality), (leniency)

(ability), (challenge)

(penetration), (resistance)

Rasch (1963)

$\ln\left(\frac{P_{success,i,j}}{1 - P_{success,i,j}}\right) = \theta_i - \delta_j$

(22)

22 Example

• i-th person of ability $\theta_i$ faced with task of level of difficulty $\delta_j$

• probability, $P_{success}$, of achieving task

Category attribute    Object attribute, $\delta_j$          Person characteristic, $\theta_i$
Satisfaction          Quality of product                User leniency
Difficulty            Level of difficulty of activity   (Dis-)ability
Accessibility         Accessibility of transport mode   Utility (or net benefit, …)

Rasch psychometric model

Logistic regression: ability ($\theta$), difficulty ($\delta$)

Generalised linear model, link function, z:

$z = \ln\left(\frac{P_{success,i,j}}{1 - P_{success,i,j}}\right) = \theta_i - \delta_j$

(23)

23 Rasch (1961)

Level of ability

Level of difficulty

$P_{success} = \frac{e^{\theta - \delta}}{1 + e^{\theta - \delta}}$

$\theta$: Ability

$\delta$: Difficulty
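The Rasch formula on this slide can be evaluated directly. The sketch below is a minimal illustration, with hypothetical function names and values: it shows that $P_{success}$ is 50% when ability equals difficulty, and that the log-odds difference between two persons is the same whichever item they face.

```python
import math


def rasch_p_success(theta, delta):
    """P_success = exp(theta - delta) / (1 + exp(theta - delta))."""
    z = theta - delta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)


def log_odds(p):
    return math.log(p / (1.0 - p))


# Ability equal to difficulty gives a 50 % chance of success.
print(rasch_p_success(1.0, 1.0))

# Two hypothetical persons compared on three hypothetical items: the log-odds
# difference equals theta_a - theta_b whatever the item difficulty.
theta_a, theta_b = 1.5, 0.5
for delta in (-1.0, 0.0, 2.0):
    diff = log_odds(rasch_p_success(theta_a, delta)) - log_odds(rasch_p_success(theta_b, delta))
    print(f"delta = {delta:+.1f}   log-odds difference = {diff:.6f}")
```

The second loop illustrates why person and item parameters can be separated: the comparison between persons does not depend on which item was used.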

(24)

24 Construct map: Vision functionality

Pesudovs 2010

Man as Measurement Instrument


(26)

26 Person

Tool

Task

Task - tool

Environment*

• Body structures*

• Body functions*

Person – tool – task

• Activity*

• Participation*

*Five components of health:

[International Classification of Functioning, Disability and Health (ICF)]

(27)

27 [Figure: items including “7. Sports motiv” and “1. Healthy” on a scale from “Less difficult” to “More difficult”; annotation: “underestimated”]

$\text{Average } CTT_i = \frac{1}{L} \sum_{j=1}^{L} P_{success,i,j}$

$\ln\left(\frac{P_{success,i,j}}{1 - P_{success,i,j}}\right) = \theta_i - \delta_j$

(28)

28 Difficulty ($\delta$)

Ability ($\theta$)



Rasch (1961)

Measuring People

• Correct ordinal data treatment

• Better resolution

(29)

29 Balance as Measurement Instrument - Sensitivity (C)

R = C·S + “additional terms”

Stimulus (S): Mass of weight

Response (R): Mass of weight × Balance sensitivity

Measurand ’restitution’, $S = C_{cal}^{-1} \cdot R$

Calibration

$R_{cal} = C_{cal} \cdot S_{cal}$ + “additional terms”
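The calibration and restitution steps for the balance can be sketched numerically. The example below neglects the "additional terms" and estimates the sensitivity from known calibration masses by a least-squares line through the origin; both simplifications are assumptions, and the function names and all numbers are hypothetical.

```python
def calibrate_sensitivity(s_cal, r_cal):
    """Least-squares estimate of C_cal in R_cal = C_cal * S_cal (no offset term)."""
    numerator = sum(s * r for s, r in zip(s_cal, r_cal))
    denominator = sum(s * s for s in s_cal)
    return numerator / denominator


def restitute(c_cal, response):
    """Restitution of the stimulus: S = C_cal^-1 * R."""
    return response / c_cal


# Hypothetical calibration run: known masses (g) and balance readings (divisions).
s_cal = [10.0, 20.0, 50.0, 100.0]
r_cal = [20.1, 39.8, 100.2, 199.9]

c_cal = calibrate_sensitivity(s_cal, r_cal)
print(f"estimated sensitivity C_cal = {c_cal:.4f} divisions per g")

# A later reading on an unknown weight is converted back to mass.
reading = 120.0
print(f"restituted mass = {restitute(c_cal, reading):.2f} g")
```

The same two steps, calibration against known stimuli followed by inversion of the response, are what the slides carry over to a person acting as the instrument.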

(30)

30 Man as Measurement Instrument - Sensitivity (C)

R = C·S + “additional terms”

Stimulus (S): Task difficulty

Response (R): Task difficulty × ’Instrument’ sensitivity

Ability ($\theta$)

Measurand ’restitution’, $S = C^{-1} \cdot R$

Measurement systems

$P_{success} = \frac{e^{\theta - \delta}}{1 + e^{\theta - \delta}}$

$\delta$: Difficulty

(31)

31 Metrological references

Difficulty

Mass

Tasks

Physical disability

$\delta$: Difficulty

(32)

32 Adapted from: Cano, Hobart IOMW 2014

Functional Independence Measure (FIM)

Barthel Index (BI)

Ability of patient ($\theta$)



(33)

33 $N_{TP}$ = 52

Less able

More able

$\theta$: Ability

TP32 +

TP42

TP47

k = 2

$P_{success}$ = 50%

$P_{success}$ = 18%

$P_{success}$ = 98%

(34)

34 $P_{success}$ = 18%

$P_{success}$ = 98%
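The success probabilities quoted on these slides can be converted into logit distances with the relation $\ln(P/(1-P)) = \theta - \delta$ used throughout the deck. The short sketch below does only that arithmetic; which person and item each percentage refers to is not recoverable from the slide text, so no such mapping is assumed.

```python
import math


def logit(p):
    """Log-odds of a probability: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


for p in (0.18, 0.50, 0.98):
    print(f"P_success = {p:.0%}  ->  theta - delta = {logit(p):+.2f} logits")
```

A success probability of 18% corresponds to an ability about 1.5 logits below the task difficulty, 50% to equal ability and difficulty, and 98% to an ability about 3.9 logits above it.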

(35)

35 Ability of patient ($\theta$)


(37)

37 NeuroMet

EMPIR 15HLT04: Innovative measurements for improved diagnosis and management of neurodegenerative diseases

June 2016 – June 2019

Acknowledgments

The European Metrology Programme for Innovation & Research (EMPIR, Horizon 2020, Art. 185) is jointly funded by the EMPIR participating countries within EURAMET (www.euramet.org) and the European Union in this EMPIR 15HLT04 NeuroMet project (coordinator: LGC (UK)).

(38)

38 Less able

More able

Uncertain, Possible, Probable

$\ln\left(\frac{P_{success,i,j}}{1 - P_{success,i,j}}\right) = \theta_i - \delta_j$

$\text{Sum } CTT_i = \sum_{j=1}^{L} P_{success,i,j}$

(39)

39 Less difficult

More difficult

Naming objects

Delayed recall

(40)

40 Metrological references

Task difficulty, δ

Difficulty

Mass

Naming objects

Orientation

Delayed recall

(41)

41

PLOS ONE | DOI:10.1371/journal.pone.0162889 October 14, 2016

Under-estimate

Mini Mental State Examination

More able

Less able

Healthy

Mild cognitive impairment

AD

(42)

42

PROPERTY / CTT, IMPLICATION / RMT, IMPLICATION

Group-level summary statistics legitimate
  CTT: YES. Mean (SD) achievable for samples
  RMT: YES. Mean (SD) achievable for samples

Item/person parameters can be estimated separately
  CTT: NO. Equivalence of measurement across [...] not guaranteed
  RMT: YES. All patients scored within an equivalent frame of reference

Total score is a sufficient statistic
  CTT: NO. Different patterns of item responses can lead to same score
  RMT: YES. Higher/lower scores reflect more/less of the construct

Clinical hierarchy maintained (calibrated items)
  CTT: NO. Meaning can change each time the scale is used
  RMT: YES. Meaning of measurements same every time scale used

Missing data is handled appropriately
  CTT: NO. Reduce the power of your analysis
  RMT: YES. Maximise all of the patient scores in a dataset

Invariance across the scale
  CTT: NO. Score change across whole range scale does not mean same thing
  RMT: YES. Score change across whole range scale means same thing

Automatic measurement in real time
  CTT: NO. Impedes usability of the instrument and increases potential for error
  RMT: YES. Improves usability of the instrument / reduces error

Individual patient scores with bespoke standard errors
  CTT: NO. Unable to legitimately use individual patient scores
  RMT: YES. Can use individual patient scores

Tests for person fit
  CTT: NO. Unable to legitimately examine patient fit
  RMT: YES. Patterns of persons' responses falling outside accepted range can be assessed
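Two rows of this comparison, the total score as a sufficient statistic and individual scores with their own standard errors, can be illustrated with a small calculation. The sketch below assumes the dichotomous Rasch model with item difficulties already calibrated, and estimates a person's ability from the raw score by Newton-Raphson maximum likelihood. It is a textbook-style illustration with hypothetical difficulties, not necessarily the procedure used in the work presented.

```python
import math


def rasch_p(theta, delta):
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def estimate_ability(raw_score, deltas, tol=1e-10, max_iter=100):
    """Maximum-likelihood ability and its standard error for a given raw score.

    ASSUMPTION: item difficulties `deltas` are known; 0 < raw_score < len(deltas).
    """
    n_items = len(deltas)
    if not 0 < raw_score < n_items:
        raise ValueError("zero and perfect scores have no finite estimate")
    # Starting value: mean difficulty plus the logit of the proportion correct.
    theta = sum(deltas) / n_items + math.log(raw_score / (n_items - raw_score))
    for _ in range(max_iter):
        probs = [rasch_p(theta, d) for d in deltas]
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - sum(probs)) / information
        step = max(-1.0, min(1.0, step))  # damp large steps
        theta += step
        if abs(step) < tol:
            break
    probs = [rasch_p(theta, d) for d in deltas]
    information = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(information)


deltas = [-1.5, -0.5, 0.5, 1.5]  # hypothetical calibrated item difficulties (logits)
for score in (1, 2, 3):
    theta, se = estimate_ability(score, deltas)
    print(f"raw score {score}:  theta = {theta:+.2f} logits,  SE = {se:.2f}")
```

Only the raw score enters the estimate, and the standard error differs from score to score, being larger towards the ends of the scale where the items provide less information.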