
Repeatability imprecision from analysis of duplicates of patient samples and control materials



Scandinavian Journal of Clinical and Laboratory Investigation

ISSN: 0036-5513 (Print) 1502-7686 (Online) Journal homepage: https://www.tandfonline.com/loi/iclb20


Anders Kallner & Elvar Theodorsson

To cite this article: Anders Kallner & Elvar Theodorsson (2020): Repeatability imprecision from analysis of duplicates of patient samples and control materials, Scandinavian Journal of Clinical and Laboratory Investigation, DOI: 10.1080/00365513.2019.1710243

To link to this article: https://doi.org/10.1080/00365513.2019.1710243

© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

Published online: 03 Jan 2020.


ORIGINAL ARTICLE


Anders Kallner (a) and Elvar Theodorsson (b)

(a) Department of Clinical Chemistry, Karolinska University Hospital, Stockholm, Sweden; (b) Department of Clinical Chemistry and Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden

ABSTRACT

Measurement imprecision is usually calculated from measurement results of the same stabilized control material(s) obtained over time, and is therefore, in principle, only valid at the concentration(s) of the selected control material(s). The resulting uncertainty has been obtained under reproducibility conditions and corresponds to the conventional analytical goals. Furthermore, the commutability of the control materials used determines whether the imprecision calculated from the control materials reflects the imprecision of measuring patient samples. Imprecision estimated by measurements of patient samples uses fully commutable samples, freely available in the laboratories. It is commonly estimated from the results of routine patient samples, each measured twice. Since the duplicates are usually analysed throughout the entire concentration interval of the patient samples processed in the laboratory, the result will be a weighted average of the repeatability imprecision measured in the chosen measurement intervals or throughout the entire interval of concentrations encountered in patient care. In contrast, the uncertainty derived from many measurements of control materials over periods of weeks is usually obtained under reproducibility conditions. Consequently, the repeatability and reproducibility imprecision play different roles in the inference of results in clinical medicine. The purpose of the present review is to detail the properties of the imprecision calculated from duplicates of natural samples, to explain how it differs from imprecision calculated from single concentrations of control materials, and to elucidate what precautions need to be taken in case of bias, e.g. due to carry-over effects.

ARTICLE HISTORY

Received 22 September 2019; Revised 24 November 2019; Accepted 26 December 2019

KEYWORDS

Dahlberg formula; analysis of variance components; repeatability; reproducibility; total variance; bias

Variance components

Results of repeated measurements of the same sample are assumed to vary randomly and to be normally distributed, making the average and the standard deviation optimal measures of the central tendency and the variation, respectively. However, in the analytical laboratory the measurement methods need to be characterized in more detail, in particular the variance of results obtained under repeatability conditions, i.e. no changes in the experimental conditions between measurements (within series or 'runs'), in contrast to reproducibility conditions, where one or several experimental conditions are changed between measurements (between series or 'runs'). Typically, in laboratory medicine practice, changes in conditions are limited to measuring systems, time and possibly, but not necessarily, reagents and calibrators. The actual changes between days or runs are not known in detail in all cases, which further distinguishes reproducibility from repeatability. These conditions are also addressed by the concept 'intermediate measurement precision'.

The one-way analysis of variance (ANOVA) is designed to compare the averages of several series of measurements, investigating whether there is a difference between the averages of the series. Calculation tools are readily available in statistical and spreadsheet programs, and the results are commonly reported in a standardized format comprising the within-, between- and total sums of squares, the degrees of freedom for each category and the 'mean squares' (MS) of the between- and within-groups.

The information provided by the ANOVA can also be used to analyse the variance components, i.e. the repeatability and the reproducibility, since the mean squares ($MS_b$ and $MS_w$) are measures of the corresponding variances. However, the calculated between-series variance necessarily includes the within-series variance and therefore needs to be corrected to obtain a 'pure' between-series variance (reproducibility), $s_b^2$, before the total variance is calculated. The correction is:

$$ s_b^2 = \frac{MS_b - MS_w}{n_0} \qquad (1) $$

where $n_0$ is acceptable as the average number of observations in the groups, even in a slightly unbalanced design, i.e. a different number of observations in each series (group).

CONTACT Anders Kallner, anders.kallner@ki.se, Department of Clinical Chemistry, Karolinska University Hospital, Stockholm, Sweden. This article has been republished with minor changes. These changes do not impact the academic content of the article.


This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.


The total variance is the sum of $MS_w$ and $s_b^2$. If $MS_b < MS_w$, expression (1) yields a negative value, which is not a valid variance; $s_b^2$ is therefore set to zero and the total imprecision becomes equal to the repeatability. If this occurs even though the conditions are varied, it signifies stability against changing external conditions. A typical consequence would be that extended periods between calibrations can be allowed.
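The calculation of the variance components can be sketched in Python, assuming the results for one control material are available as one list per run. The function name `variance_components` and the example data are hypothetical; $n_0$ is taken as the average run size, as suggested above for slightly unbalanced designs, and a negative estimate from (1) is set to zero.

```python
from statistics import fmean

def variance_components(groups: list[list[float]]) -> dict[str, float]:
    """Repeatability, 'pure' between-run and total variance from a one-way ANOVA.

    groups: results for ONE control material, one inner list per run (hypothetical layout).
    """
    k = len(groups)                                  # number of runs (groups)
    n_total = sum(len(g) for g in groups)            # total number of observations
    grand_mean = fmean([x for g in groups for x in g])

    # Within-run sum of squares: each result against its own run average
    ss_w = sum(sum((x - fmean(g)) ** 2 for x in g) for g in groups)
    # Between-run sum of squares: run averages against the grand average, weighted by run size
    ss_b = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)

    ms_w = ss_w / (n_total - k)                      # repeatability variance (MS_w)
    ms_b = ss_b / (k - 1)                            # between-run mean square (MS_b)
    n0 = n_total / k                                 # average number of observations per run

    # Eq. (1); if MS_b < MS_w the estimate would be negative and is set to zero
    s2_b = max(0.0, (ms_b - ms_w) / n0)
    return {"repeatability": ms_w, "between": s2_b, "total": ms_w + s2_b}


# Hypothetical results: three runs, two measurements in each run
print(variance_components([[10.0, 12.0], [14.0, 16.0], [18.0, 20.0]]))
# {'repeatability': 2.0, 'between': 15.0, 'total': 17.0}
```

The truncation at zero reproduces the rule described above: when the runs do not differ more than expected from the within-run scatter, the total variance equals the repeatability.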

The calculated variances are strictly true only for the concentration – or assigned concentration – of the sample used in the ANOVA design.

The Dahlberg formula

In the days of Laplace [1] and Gauss [2] (beginning of the nineteenth century) and Pearson (end of the nineteenth century), there was a struggle regarding optimal methods for describing the variation of measured values. This was beautifully resolved by the Gauss normal frequency distribution function, as discussed for instance in the seminal publication by the astronomer Charlier in 1910 [3]. It is challenging to follow the train of thought of the authors of the time in their attempts to determine the 'dispersion', Streuung (in German), or 'standard deviation' of the random results of measurements. The archaic terminology in the literature varies, and the conclusions are often difficult to follow for the non-statistician.

Before the advent of analogue and digital computers, the numerical treatment of databases was a major hurdle, and special tricks were tried to estimate the dispersion and the average. Dahlberg eventually discussed and formulated [4] the calculation of the dispersion of results from duplicate measurements, and it became recognized as the standard deviation from duplicates:

$$ s = \sqrt{\frac{\sum_{i=1}^{N} (x_{i,1} - x_{i,2})^2}{2N}} = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{2N}} \qquad (2) $$

where $d_i$ is the difference between the duplicate results of pair $i$ and $N$ the number of duplicate pairs. It may be difficult to see immediately how this expression relates to the definition of the standard deviation and to understand its inference; it is therefore worthwhile to derive the Dahlberg formula from the definition of the standard deviation ($s$) and variance ($s^2$) of a sample.
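Formula (2) translates directly into code. The following is a minimal Python sketch, assuming the duplicates are given as (first, repeat) pairs; the function name and the example values are hypothetical.

```python
from math import sqrt

def dahlberg_sd(pairs: list[tuple[float, float]]) -> float:
    """Dahlberg repeatability standard deviation, eq. (2): sqrt(sum(d_i^2) / (2N))."""
    n_pairs = len(pairs)                                   # N, number of duplicate pairs
    sum_d2 = sum((x1 - x2) ** 2 for x1, x2 in pairs)       # sum of squared differences d_i^2
    return sqrt(sum_d2 / (2 * n_pairs))


# Hypothetical duplicates: differences of -2 and +2 give sqrt(8 / 4) = sqrt(2)
print(dahlberg_sd([(10.0, 12.0), (20.0, 18.0)]))  # 1.4142135623730951
```

Only the differences within each pair enter the calculation, so the samples themselves may have widely different concentrations.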

The standard deviation ($s$), which in a graph of the normal distribution represents the distance between the maximum (average) of the bell-shaped curve and the first inflexion point, and for which the area under the curve between the average and one standard deviation corresponds to about one third of the total area, is calculated as

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \qquad (3) $$

for a sample of the population, where $x_i$ is the individual observation, $\bar{x}$ the average of the observations and $n$ the number of observations in the sample.

The Dahlberg formula can be understood from the formula for pooling variances. If the variance ($s^2$) has been calculated for the same quantity using the same measurement procedure in a commutable matrix but on various occasions, it is reasonable to assume that the calculated standard deviations belong to identical distributions. The uncertainty of the overall estimate of the variance can then be reduced by pooling the estimates from many experiments. This is the same principle as reducing the uncertainty of an average by pooling the averages calculated for many similar or identical experiments. Unless the number of observations in each experiment is the same, the individually calculated results must be weighted by the number of observations in each experiment (for variances, by the degrees of freedom, $n_i - 1$); this is true when pooling averages as well as variances. Therefore, the general expression of a pooled variance, a 'best estimate' of the common variance of several samples [5] within a measurement interval, is

$$ s_{pool}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_N - 1)s_N^2}{(n_1 + n_2 + \dots + n_N) - N} = \frac{\sum_{i=1}^{N} (n_i - 1)s_i^2}{\sum_{i=1}^{N} n_i - N} \qquad (4) $$

where the variance $s_i^2$ has been calculated for each of the $N$ studies (runs), each comprising $n_i$ observations.
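Expressions (4) and (5) can be checked with a short Python sketch, assuming each run is given as a list of results; `statistics.variance` supplies $s_i^2$ with the $n_i - 1$ denominator of (3). The function name and the data are hypothetical.

```python
from statistics import variance

def pooled_variance(runs: list[list[float]]) -> float:
    """Pooled variance, eq. (4): sum((n_i - 1) * s_i^2) / (sum(n_i) - N)."""
    weighted = sum((len(r) - 1) * variance(r) for r in runs)  # each s_i^2 weighted by its df
    df_total = sum(len(r) for r in runs) - len(runs)          # sum(n_i) - N
    return weighted / df_total


# Unequal run sizes: (2 * 1.0 + 1 * 2.0) / (5 - 2) = 4/3
print(pooled_variance([[1.0, 2.0, 3.0], [4.0, 6.0]]))
# Equal run sizes, eq. (5): the plain average of the run variances, (2.0 + 8.0) / 2 = 5.0
print(pooled_variance([[1.0, 3.0], [2.0, 6.0]]))
```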

If the same quantity is measured the same number of times ($n$) and by the same procedure in $N$ runs, expression (4) simplifies to

$$ s_{pool}^2 = \frac{(n - 1)(s_1^2 + s_2^2 + \dots + s_N^2)}{N n - N} = \frac{s_1^2 + s_2^2 + \dots + s_N^2}{N} = \frac{\sum_{i=1}^{N} s_i^2}{N} \qquad (5) $$

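In duplicate measurements there are only two observations in each run, and the average can be calculated as $(x_{i,1} + x_{i,2})/2$. When this expression of the average is inserted into formula (3) for each pair ($n = 2$, so $n - 1 = 1$) and the results are pooled according to (5), the algebraic simplifications give the pooled variance based on two observations (duplicates):

$$ s^2 = \frac{1}{N}\sum_{i=1}^{N}\left[\left(x_{i,1} - \frac{x_{i,1} + x_{i,2}}{2}\right)^2 + \left(x_{i,2} - \frac{x_{i,1} + x_{i,2}}{2}\right)^2\right] = \frac{1}{N}\sum_{i=1}^{N}\frac{(2x_{i,1} - x_{i,1} - x_{i,2})^2 + (2x_{i,2} - x_{i,1} - x_{i,2})^2}{4} = \frac{\sum_{i=1}^{N}(x_{i,1} - x_{i,2})^2}{2N} = \frac{\sum_{i=1}^{N} d_i^2}{2N} \qquad (6) $$

where $N$ is the number of pairs (runs) and $d_i$ the difference between the duplicates in run $i$.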
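The equivalence in (6) can also be confirmed numerically. The sketch below, with hypothetical duplicate results, computes the variance of each pair with the ordinary $n - 1$ denominator, averages the pair variances as in (5), and compares the result with $\sum d_i^2/(2N)$.

```python
from statistics import fmean, variance

# Hypothetical duplicate results (first, repeat) for five patient samples
pairs = [(4.8, 5.1), (6.2, 6.0), (7.9, 8.3), (5.5, 5.4), (9.1, 8.8)]

# Route 1: sample variance of each pair (n - 1 = 1), pooled as the plain average, eq. (5)
pooled = fmean([variance(pair) for pair in pairs])

# Route 2: Dahlberg variance, eq. (6): sum(d_i^2) / (2N)
dahlberg = sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * len(pairs))

# The two routes agree up to floating-point rounding
print(abs(pooled - dahlberg) < 1e-12)  # True
```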

The Dahlberg formula calculates the repeatability imprecision of measurements, i.e. of measurements performed under 'repeatability conditions' according to the VIM definition [6]. Notably, the repeatability imprecision (within series) is different from the reproducibility imprecision (between series) obtained by repeated measurements of the same sample over days and weeks; reproducibility is not addressed by the Dahlberg formula. In contrast to the variance calculated from single control samples, the Dahlberg formula calculates the weighted average of the estimated repeatability imprecision over the entire studied concentration interval. It is also different from the total imprecision, or method imprecision, which is the sum of the variances of the repeatability and the 'pure' reproducibility (1).

By partitioning the results from a patient cohort into concentration intervals and calculating the Dahlberg repeatability for each partition, an imprecision profile can be estimated [7].
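A partitioned calculation of this kind can be sketched as follows. The sketch assumes that each pair is assigned to a concentration interval by its own average and that the interval limits (`cut_points`) are chosen by the user; the function name, limits and data are hypothetical, and intervals without pairs return `None`.

```python
from bisect import bisect_right
from math import sqrt
from typing import Optional

def dahlberg_profile(pairs: list[tuple[float, float]],
                     cut_points: list[float]) -> list[Optional[float]]:
    """Dahlberg SD, eq. (2), for each concentration interval defined by ascending cut points."""
    bins: list[list[float]] = [[] for _ in range(len(cut_points) + 1)]
    for x1, x2 in pairs:
        # Interval index from the pair average; a value equal to a cut point goes to the upper interval
        idx = bisect_right(cut_points, (x1 + x2) / 2)
        bins[idx].append((x1 - x2) ** 2)
    return [sqrt(sum(d2) / (2 * len(d2))) if d2 else None for d2 in bins]


# Hypothetical data: low and high concentrations separated at 5.0
pairs = [(1.0, 1.2), (2.0, 2.2), (10.0, 11.0), (12.0, 13.0)]
print(dahlberg_profile(pairs, [5.0]))  # about [0.14, 0.71]
```

Narrow intervals give a more detailed profile but fewer pairs, and hence a less certain estimate, per interval.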

If the standard deviation can be assumed to be constant (homoscedastic) in the measuring interval, the Dahlberg uncertainty can be calculated from formula (2). If, however, it is more likely that the standard deviation is proportional to the concentration (heteroscedastic), a relative difference of each pair is usually preferred to calculate a relative Dahlberg standard deviation:

$$ s_{rel} = \sqrt{\frac{\sum_{i=1}^{N}\left(\frac{2(x_{i,1} - x_{i,2})}{x_{i,1} + x_{i,2}}\right)^2}{2N}} \qquad (7) $$

The result of the calculation of the relative Dahlberg uncertainty is commonly expressed as the coefficient of variation, $CV_D$, or as a percentage, $\%CV_D$. The $CV_D$ is appropriately viewed as the best estimate of a relative variation within a given measuring interval. Each difference is relative to a different number, i.e. the average of its own pair of duplicates within the measuring interval, and the result is therefore not directly comparable to a conventional relative standard deviation, which is relative to the average of the measurement results of the control sample. The Dahlberg %CV is based on the averages of all measured samples, which are not always explicitly known.
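For the heteroscedastic case, (7) can be sketched in the same way; the function name and the data are hypothetical. In the example, the two pairs differ tenfold in concentration but have the same relative difference, so they contribute equally to $CV_D$, whereas the absolute formula (2) would be dominated by the high pair.

```python
from math import sqrt

def dahlberg_cv(pairs: list[tuple[float, float]]) -> float:
    """Relative Dahlberg SD, eq. (7); each difference is scaled by the average of its own pair."""
    rel_d2 = [(2 * (x1 - x2) / (x1 + x2)) ** 2 for x1, x2 in pairs]
    return sqrt(sum(rel_d2) / (2 * len(pairs)))


# Hypothetical pairs at low and high concentration, both with a 20 % relative difference
cv_d = dahlberg_cv([(9.0, 11.0), (90.0, 110.0)])
print(f"%CV_D = {100 * cv_d:.1f}")  # %CV_D = 14.1
```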

Numerous authors [8,9] have described the use of duplicate measurements for the estimation of imprecision, and the approach has been especially popularized in the field of dentistry/orthodontics [10]. A recent collection of papers in the journal Accreditation and Quality Assurance [11–14] has also elucidated aspects of the matter.

Bias and the expanded Dahlberg formula

The Dahlberg formula is defined for measurements under true repeatability conditions, i.e. with no systematic changes between the first and the repeat measurements. However, there may be non-random differences (bias) caused by, for instance, sample degradation, 'carry-over' or reagent decay, even if the measurements are made in immediate succession. This bias may be constant for all measurements or vary between the pairs of measurements; in the latter case the bias can be expressed as the average of the non-random differences. To retain a primary repeatability, i.e. to minimize the influence of the bias or of changes in the measurement conditions, a correction can be applied.

To understand this correction intuitively, one has to consider the nature of an average. If a series of repeated observations is related to the first by a randomly distributed difference, which may be reasonable to assume, the averages of the first and the repeat series of observations will be the same. A consequence is that the average of the differences between the duplicates is zero.

For example, if we have the results of two observations of a series of samples ($x_1, x_2, \dots, x_n$) and the pairs of observations differ by a random value $\pm d$, the results will be ($x_1$, $x_1 + d$), ($x_2$, $x_2 - d$), and so on.

This can be summarized in a formula:

$$ \text{Average} = \bar{x}_{rep} = \frac{\sum_{i=1}^{n} (x \pm d)_i}{n}, \quad \text{where } d = f(N, x_i, s) \qquad (8) $$

If the average of the repeat series of observations is calculated from a large enough number of observations, the random differences will cancel out and the averages of the series will be the same, i.e. $\bar{x}_{orig} = \bar{x}_{rep} = \frac{\sum_{i=1}^{n} x_i}{n}$.

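The average of the differences will be zero because the sum of the positive deviations will be the same as the sum of the negative deviations in absolute value:

$$ \sum_{d_i > 0} d_i = \sum_{d_i < 0} \lvert d_i \rvert \quad \Longleftrightarrow \quad \sum_{i=1}^{n} d_i = 0 \qquad (9) $$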

If the variance of the original series is $Var_1$ and the variance of the random variation is $Var_d$, the variance of the second series will be $Var_1 + Var_d$ (provided the random differences are independent of the original values). As a seemingly paradoxical consequence, the uncertainty of the repeat series of measurements will be larger than that of the first.
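The reasoning in (8) and (9), and the larger variance of the repeat series, can be illustrated by a small simulation. It is a sketch under stated assumptions: normally distributed original values with variance 25, random differences with mean zero and variance 4 that are independent of the original values, and an arbitrary seed and sample size.

```python
import random
from statistics import fmean, pvariance

random.seed(1)   # arbitrary seed, for reproducibility of the sketch
N = 20_000       # arbitrary, large number of samples

# Original series: Var_1 = 5**2 = 25 (assumption)
first = [random.gauss(100.0, 5.0) for _ in range(N)]
# Random differences: mean 0, Var_d = 2**2 = 4, independent of the original values (assumption)
diffs = [random.gauss(0.0, 2.0) for _ in range(N)]
repeat = [x + d for x, d in zip(first, diffs)]

print(f"average difference: {fmean(diffs):.3f}")                        # close to zero
print(f"averages: {fmean(first):.2f} vs {fmean(repeat):.2f}")           # practically equal
print(f"variances: {pvariance(first):.1f} vs {pvariance(repeat):.1f}")  # repeat about 4 larger
```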

If there is a non-random component in the difference between the series, the average, written in the same format as above and assuming that the average of the error term is $e$, would be

$$ \bar{x}_{rep} = \frac{\sum_{i=1}^{n} (x_i \pm d + e)}{n} \qquad (10) $$

Since the random deviations cancel out in an average calculated from a large number of observations, the average in (10) is reduced to $\bar{x}_{rep} = \frac{\sum_{i=1}^{n} (x_i + e)}{n}$.

We apply this reasoning to the Dahlberg formula (2) and assume that the repeat observation is $x_2 + e$. Then the average of the duplicate results will be $[x_1 + (x_2 + e)]/2$.

Thus,

$$ s^2 = \frac{\sum_{i=1}^{N}\left\{\left[x_{i,1} - (x_{i,2} + e)\right] - \bar{d}\right\}^2}{2(N - 1)} = \frac{\sum_{i=1}^{N} (d_i - \bar{d})^2}{2(N - 1)}, \qquad \bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i \qquad (11) $$

which will accommodate a bias, i.e. the average of the non-random differences, the magnitude of which is the difference between the averages of the original and the duplicate results [14]. Since one degree of freedom is lost in the calculation of the average of the differences, the Bessel correction, $N - 1$, is required for small samples. This correction was described by Ingervall in 1964 [15], recognized as the 'method of moments error' (MME) by Springate [8] and discussed by Hyslop [9].
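The effect of a systematic difference on (2) and (11) can be sketched as follows. The data are hypothetical and mimic a carry-over in which the repeat result is about 0.5 units higher; the function names are assumptions, not taken from the cited sources.

```python
from math import sqrt
from statistics import fmean

def dahlberg_sd(pairs):
    """Original Dahlberg SD, eq. (2): differences assumed to be centred at zero."""
    d = [x1 - x2 for x1, x2 in pairs]
    return sqrt(sum(di ** 2 for di in d) / (2 * len(d)))

def expanded_dahlberg_sd(pairs):
    """Expanded Dahlberg SD, eq. (11): differences centred at their average, N - 1 in the denominator."""
    d = [x1 - x2 for x1, x2 in pairs]
    d_bar = fmean(d)   # average difference, i.e. the estimated non-random bias
    return sqrt(sum((di - d_bar) ** 2 for di in d) / (2 * (len(d) - 1)))


# Hypothetical duplicates with a systematic shift of the repeat results (d_bar = -0.5)
pairs = [(5.0, 5.6), (6.0, 6.4), (7.0, 7.5), (8.0, 8.5)]
print(f"Dahlberg (2):  {dahlberg_sd(pairs):.3f}")           # 0.357, inflated by the bias
print(f"ExpD (11):     {expanded_dahlberg_sd(pairs):.3f}")  # 0.058, bias removed
```

Equivalently, (2) is the root mean square of the differences divided by $\sqrt{2}$, whereas (11) is the ordinary sample standard deviation of the differences divided by $\sqrt{2}$.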


The Dahlberg formula (2) and the expanded Dahlberg formula (ExpD, 11) derived above can be compared to the 'root mean square' (RMS) of the differences and to the 'mean square error' (MSE) of the differences around their average, respectively:

$$ MSE = \frac{\sum_{i=1}^{N} (x_i - x_0)^2}{N - 1} \quad \text{and} \quad RMS = \sqrt{\frac{\sum_{i=1}^{N} x_i^2}{N}} $$

where $x_0$ is a predetermined value, e.g. the assumed target or the average, and the RMS describes the width of a distribution centred at zero.

Even a small non-random deviation may influence the calculations. Since a zero non-random bias cannot always be assumed or justified, the expanded Dahlberg formula is a safe option for estimating the repeatability. If the non-random error is absent or small (as indicated by the average of the differences), the two formulas will yield practically identical results; otherwise the original Dahlberg formula (2) runs the risk of overestimating the variance. The inconvenience of having to calculate the average of the differences is hardly a hurdle with access to ample computer capacity.

The clinical interpretation of repeatability and reproducibility

The total imprecision represents the sum of two variances, the repeatability and the pure reproducibility variances, but disregards any bias. Some ambiguity exists in expressing the total imprecision, since the imprecision calculated as the variance across all observations of a control sample is also commonly called the total variance. That is only statistically appropriate if the pure reproducibility is negligible. Although this approach risks underestimating the true total variance, the effect is generally small in practical work.

For the interpretation of results of clinical investigations, at least three situations can be identified with different demands on the analytical sensitivity and therefore on the measurement uncertainty. Generally, the smaller the uncertainty, the higher the analytical sensitivity, as expressed in the 'minimal difference' (MD), i.e. the smallest significant difference between two results:

$$ MD = k \cdot \sqrt{u_1^2 + u_2^2} \qquad (12) $$

If $u_1 = u_2$, as would be expected under repeatability conditions, (12) simplifies to

$$ MD_i = u_i \cdot k \cdot \sqrt{2} \qquad (13) $$

where $u_i$ is the measurement uncertainty and $k$ the coverage factor (usually 2 for a confidence level of 95%).
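Expressions (12) and (13) can be sketched as a small function; the function name and the example uncertainties are hypothetical, and $k = 2$ is used as the default coverage factor, as in the text.

```python
from math import sqrt
from typing import Optional

def minimal_difference(u1: float, u2: Optional[float] = None, k: float = 2.0) -> float:
    """Minimal significant difference, eq. (12); if u2 is omitted, u1 = u2 and eq. (13) applies."""
    if u2 is None:
        u2 = u1
    return k * sqrt(u1 ** 2 + u2 ** 2)


# Hypothetical standard uncertainties in the same unit: repeatability 0.8, reproducibility 1.5
print(f"MD, repeatability:   {minimal_difference(0.8):.2f}")   # 2.26
print(f"MD, reproducibility: {minimal_difference(1.5):.2f}")   # 4.24
```

The smaller repeatability uncertainty gives a smaller minimal difference, i.e. a higher analytical sensitivity for changes between two results.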

Consequently, in the monitoring of short-term and fast events, e.g. changes of P/S-Troponin in myocardial infarctions, the repeatability would be the most appropriate information; it suggests a higher analytical sensitivity, i.e. it indicates a smaller significant change than if the inference were based on reproducibility. On the other hand, in the monitoring of chronic diseases, e.g. diabetes, or when comparing a result with a reference interval, reproducibility may be the adequate choice, i.e. a larger difference would be necessary for a clinically significant difference. In screening of patient populations, imprecisions based on further assumptions may be necessary, e.g. the uncertainty caused by preanalytical conditions or special effects of the prevalence of the condition.

Comparison between the analysis of variance components and Dahlberg repeatability

The repeatability in the ANOVA is calculated from the sum of squares of the differences between the observations and their average for each group (series) in the ANOVA. The repeatability variance, expressed as the mean square ($MS_w$), is obtained by dividing the sum of squares by the degrees of freedom, i.e. the total number of observations ($n$) minus the number of groups ($k$), i.e. $(n - k)$. With this method, the repeatability can be calculated from any number of observations in each group. If the groups contain only two observations, the number of groups will always be $n/2$, and the degrees of freedom are therefore also $n/2$, i.e. the number of pairs, or samples. In that case $MS_w$ equals the Dahlberg variance.

In the Dahlberg calculation, the sum of squares of the differences between the observations and the average of each pair is also the starting point. A further simplification (6) of the formula uses the fact that the average is the sum of the two observations divided by 2; thus, the pooled variance becomes the sum of the squared differences between the observations divided by the number of observations, which equals twice the number of samples measured. The Dahlberg expression is only applicable to duplicates. The degrees of freedom will be the number of pairs, or samples, i.e. the same as for the repeatability variance in the analysis of variance components [16,17].

The similarity of the procedures emphasizes that both estimate repeatability. A difference is that the comprehensive analysis of variance components is only applicable to one sample concentration but allows any number of observations in the groups, whereas the Dahlberg approach can only be applied to duplicate samples, but within an extended measuring interval, and thus represents a best estimate of the repeatability of measurements in that interval.

Conclusions

The repeatability imprecision calculated by the Dahlberg formula is a 'best estimate' of an average repeatability imprecision in a chosen measurement interval, using commutable samples. This imprecision may vary substantially within the interval of concentrations encountered in medical laboratories. In case a non-random variation between the first and the repeat results cannot be excluded, a correction should be applied, which results in a different formula, the expanded Dahlberg formula (method of moments error). It is important to carry out the repeat measurements as close in time as possible to the first, to be vigilant to systematic influences and to observe the measuring interval. A relative repeatability estimated by the Dahlberg formula is based on the averages of the pairs of results and is thus not directly comparable to a variance relative to a defined value, e.g. the average of a set of results.

It is argued that the repeatability calculated by the analysis of variance components technique only represents the variance at one concentration but can be based on many repeated observations in each group.

In an ideal situation, the user should match the analytical goal with the clinical need; thus, repeatability should be the goal of choice when short-term developments and small changes of a biomarker are evaluated, whereas the reproducibility variation is more appropriate when monitoring changes over extended periods of time.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

[1] Laplace PS. Théorie analytique des probabilités. Paris: Courcier; 1812.

[2] Gauss CF. Theoria motus corporum coelestium. Göttingen: Gesellschaft Wissenschaft; 1809. (Werke 7 K).

[3] Charlier C. Grunddragen av den matematiska statistiken. Lund: Statsvetenskaplig Tidskrifts Expedition; 1910.

[4] Dahlberg G. Statistical methods for medical and biological students. London: G. Allen & Unwin Ltd.; 1940.

[5] Pearson K. On lines and planes of closest fit to systems of points in space. Phil Mag. 1901;2(11):559–572.

[6] JCGM. International vocabulary of metrology – Basic and general concepts and associated terms (VIM 3). 3rd ed. Bureau International des Poids et Mesures; 2012 [cited 2019 Apr 8]. Available from: https://www.bipm.org/utils/common/documents/jcgm/JCGM_200_2012.pdf.

[7] Kallner A, Petersmann A, Nauck M, et al. Measurement repeatability profiles of eight frequently requested measurands in clinical chemistry determined by duplicate measurements of patient samples. Scand J Clin Lab Invest. 2020.

[8] Springate SD. The effect of sample size and bias on the reliability of estimates of error: a comparative study of Dahlberg's formula. Eur J Orthod. 2012;34(2):158–163.

[9] Hyslop NP, White WH. Estimating precision using duplicate measurements. J Air Waste Manag Assoc. 2009;59(9):1032–1039.

[10] Houston W. The analysis of errors in orthodontic measurements. Am J Orthod. 1983;83(5):382–390.

[11] Roesslein M, Wolf M, Wampfler B, et al. A forgotten fact about the standard deviation. Accred Qual Assur. 2007;12(9):495–496.

[12] Rosslein M, Rezzonico S, Hedinger R, et al. Repeatability: some aspects concerning the evaluation of the measurement uncertainty. Accred Qual Assur. 2007;12:425–434.

[13] Hall BD, Willink R. A comment on: A forgotten fact about the standard deviation. Accred Qual Assur. 2008;13(1):57–58.

[14] Synek V. Evaluation of the standard deviation from duplicate results. Accred Qual Assur. 2008;13(6):335–337.

[15] Ingervall B. Retruded contact position of mandible. A comparison between children and adults. Odontologisk Revy. 1964;15:130–149.

[16] Kallner A, Theodorsson E. An experimental study of methods for the analysis of variance components in the inference of laboratory information. Scand J Clin Lab Invest. 2019:1–8.

[17] IUPAC. Compendium of analytical nomenclature – Definitive Rules 2000 ('The Orange Book'), 3rd ed. [Updated 2002]. IUPAC; 1998. Available from: http://old.iupac.org/publications/
