Reproducibility and repeatability of MRI-based body composition analysis


Magn Reson Med. 2020;00:1–11. wileyonlinelibrary.com/journal/mrm

FULL PAPER


Magnus Borga1,2,3 | André Ahlgren3 | Thobias Romu3 | Per Widholm2,3,4 | Olof Dahlqvist Leinhard2,3,4 | Janne West1,2,3

1Department of Biomedical Engineering, Linköping University, Linköping, Sweden

2Center for Medical Image Science and Visualization, Linköping University, Linköping, Sweden

3AMRA Medical AB, Linköping, Sweden

4Department of Health, Medicine and Caring Science, Linköping University, Linköping, Sweden

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.

© 2020 The Authors. Magnetic Resonance in Medicine published by Wiley Periodicals LLC on behalf of International Society for Magnetic Resonance in Medicine

Correspondence

Magnus Borga, Department of Biomedical Engineering, Linköping University, SE-58183 Linköping, Sweden. Email: magnus.borga@liu.se

Funding information

Vetenskapsrådet, Grant/Award Number: 2019-04751

Purpose: Reproducibility studies on MRI-based body composition analysis are absent from the current literature. Therefore, the aim of this study was to investigate the between-scanner reproducibility and the repeatability of a method for MRI-based body composition analysis.

Methods: Eighteen healthy volunteers of varying body mass index and adiposity were each scanned twice on five different 1.5T and 3T scanners from three different vendors. Two-point Dixon neck-to-knee images and two additional liver scans were acquired with similar protocols. Visceral adipose tissue (VAT) volume, abdominal subcutaneous adipose tissue (ASAT) volume, thigh muscle volume, and muscle fat infiltration (MFI) in the thigh muscle were measured. Liver proton density fat fraction (PDFF) was assessed using two different methods: the scanner vendor's 6-point method and an in-house 2-point method. Within-scanner test-retest repeatability and between-scanner reproducibility were calculated using analysis of variance.

Results: Repeatability coefficients were 13 centiliters (cl) (VAT), 24 cl (ASAT), 17 cl (total thigh muscle volume), 0.53% (MFI), and 1.27-1.37% for liver PDFF. Reproducibility coefficients were 24 cl (VAT), 42 cl (ASAT), 31 cl (total thigh muscle volume), 1.44% (MFI), and 2.37-2.40% for liver PDFF.

Conclusion: For all measures except MFI, the within-scanner repeatability explained much of the overall reproducibility. For MFI, differences between scanners were the main source of variability. The two methods for measuring liver fat had similar reproducibility. The results can be used for power calculations in clinical studies or to better understand the scanner-induced variability in clinical applications.

KEYWORDS

body composition analysis, chemical shift encoded MRI, fat fraction, MRI, repeatability, reproducibility, water-fat imaging

1 | INTRODUCTION

It is well known that the metabolic risk related to fat accumulation is strongly dependent on its distribution. Large amounts of visceral adipose tissue (VAT) are related to increased cardiovascular risk,1-4 type 2 diabetes,5,6 liver disease,7 and cancer.8,9 High levels of liver fat increase the risk for liver disease and type 2 diabetes,10,11 and increased muscle fat has been associated with increased risk for insulin resistance and type 2 diabetes12 as well as reduced mobility.13,14 It is well recognized that body mass index (BMI) and other anthropometric surrogate measures are poor predictors of individual metabolic risk15-17 and that imaging tools may be more accurate in describing metabolic risk related to body composition.3,18,19

MRI is increasingly used for body composition analysis and is today considered the gold standard for compartmental quantification of adipose tissue20 and muscle volumes.21 MRI-based body composition analysis can be divided into four main steps: image acquisition, image reconstruction, image segmentation, and tissue quantification. A number of more or less automated methods for MRI-based body composition analysis have been published over the past years; see refs. 22,23 for recent reviews.

Several of these publications have reported the accuracy of the automated segmentation compared to manual segmentation. However, evaluation of the segmentation alone does not address the quantitative properties of the complete imaging chain from image acquisition to measurement results. For example, sensitivity to differences in scanning parameters such as image resolution is not reflected by comparing automatic and manual segmentation, while different methods may be more or less sensitive to partial volume effects, which are directly related to image resolution.22,24

Accuracy reflects the distance between the expected measurement and the “true” value of the measured property, that is, the magnitude of the bias of the measurement error. Precision, on the other hand, reflects the variability of the measurement that is induced by the measurement device, that is, the variance of the measurement error. The precision of a measurement device is often measured as repeatability, which addresses the device's ability to produce similar output when a measurement is repeated on the same subject under similar and controlled conditions within a short period of time.25 Repeatability of MRI-based body composition analysis has been reported in a number of studies.26-31

Test-retest repeatability can be used to show the precision of a particular MRI experiment, which is affected by variability caused by the MRI scanner hardware, interactions between the scanner and the subject, and (unless the analysis software is fully automated) the variability of the operator. However, repeatability does not describe the ability to reproduce a measurement under different conditions, for example when using another scanner. Different scanners have different pulse sequence implementations, different image reconstruction algorithms, and different hardware characteristics such as field strength, gradient systems, and coils, and as a consequence, differences in repetition time, flip angle, excitation pulse profiles, image resolution, and image matrix. Also, interactions between patient and MRI technician differ between scanners. To characterize the variance of a measurement when conducted under different conditions, the concept of reproducibility is used.25 However, reproducibility studies on MRI-based body composition analysis are conspicuous by their absence.22 Therefore, the aim of this study was to investigate the between-scanner reproducibility and the repeatability of a method for MRI-based body composition analysis.

2 | METHODS

2.1 | Overview

This study was designed to determine test-retest repeatability (same experiment setup within one MRI scanner, repeated), reproducibility (repeatability conditions not controlled, several scanners), and operator variability (same MRI data acquisition, multiple analyses). In-vivo MRI data were acquired at a scanning facility where five different MRI scanners were available within walking distance, using a multi-scanner test-retest setup. The repeatability conditions were the same observation procedure, same scanner, same location, and repeated observations over a short period of time.32 For reproducibility, all five MRI scanners were included and the repeatability conditions were not controlled. An overview of the study design is shown in Figure 1.

Intra- and interoperator variability was determined separately by controlling which operator performed the analysis, as described in West et al.33 More specifically, a subset of the datasets was used (16 subjects, one time point, one MRI scanner), and two operators analyzed this subset of data two times each, in randomized order.

2.2 | Subjects

Eighteen healthy volunteers were included (13 male, 5 female, aged 37 ± 8 years, range 24-51 years, mean weight 81 kg, range 52-111 kg, mean BMI 26 kg/m2, range 18-31 kg/m2, median liver proton density fat fraction (PDFF) 2.2%, range 1.5-14.3%). Study participants were enrolled by local advertisement. The exclusion criterion was contraindications for MRI, which were assessed using a questionnaire. The study was approved by the regional ethical review board (DNR: 2017/184-31). Written informed consent was obtained from all participants prior to study entry.

2.3 | MRI acquisitions

Five MRI scanners (GE Optima MR450w 1.5T, GE Discovery MR750 3T, Philips Achieva 1.5T, Philips Ingenia 3T, and Siemens Prisma 3T) were included, and similar MRI protocols were used for all scanners. Each subject was scanned during a single session within 1 d, where the subject was scanned twice on each of the MRI scanners and walked between the scanners. Subjects were always scanned twice on the same scanner consecutively, but the order of the scanners differed for logistic reasons. The walking distance was up to a few hundred meters. Each complete scan session (five scanners with two scans per scanner) took about 5 h, although the duration was not measured. The repeatability conditions were realized by letting each subject exit and re-enter the scanner room between subsequent test-retest acquisitions on the same MRI scanner. Food intake between examinations was not strictly controlled, and some subjects had a light lunch between scanners if the scanning sessions spanned lunchtime. No subjects ate between scans on the same scanner.

On all scanners, the body-MRI protocol was based on fat-water separated 3D complex-based 2-point (2p) chemical shift encoded MRI (CSE-MRI), a.k.a. 2-point Dixon water-fat separation imaging, with neck-to-knee scan coverage and adjacent slab overlap of 2-4 cm. The vendors' water-fat reconstruction was used on all scanners. Two additional imaging slabs covering the liver were acquired. The first used a multi-echo spoiled gradient echo (GRE) sequence for liver T2* estimation (for the purpose of correcting the 2p liver PDFF estimates for T2* effects) with echo times approximately in-phase, optimized for T2* estimation. The second additional liver slab was acquired using the vendor's liver fat protocol with factory parameter settings. All neck-to-knee images were T1-weighted with 10° flip angle and repetition time set to the shortest possible. Fat, water, in-phase, and opposite-phase images were acquired, where fat and water images were reconstructed using the on-scanner software for all acquisitions. The integrated body coils were used for neck-to-knee scans, whereas manufacturers' standard auxiliary surface coils were used for liver slab acquisitions. Scan time for each complete scan (neck-to-knee scan, including the extra liver slabs) was approximately 10 min. All scans were performed in supine position. Details on the MRI acquisitions and protocol settings are listed in Table 1. Since the researchers were blinded to MRI scanner in all analyses, the scanners were labeled as “Scanner 1” to “Scanner 5,” in randomized order. The performance of a particular scanner in this experiment might not be representative of that scanner model in general, but may depend on individual factors of that particular scanner and on how these factors interact with the post-processing methods. Therefore, the scanner labeling is left undisclosed.

2.4 | Body composition analysis

Body composition analyses were performed for all acquisitions to measure visceral adipose tissue (VAT) volume, abdominal subcutaneous adipose tissue (ASAT) volume, thigh muscle volume (left/right, anterior/posterior, and total), muscle fat infiltration (MFI) in the thigh muscle (left/right, anterior/posterior, and average anterior), and liver fat. Both MFI and liver fat were measured as PDFF. The method has been described in more detail and evaluated in terms of accuracy elsewhere.19,31,34-36 Briefly, fat and muscle compartments were determined using the following steps: (1) fat images were calibrated using fat-referenced image calibration, (2) atlases with ground truth labels for fat and muscle compartments were registered to the acquired MRI dataset, (3) quality control was performed by trained operators, who could interactively adjust and approve the final segmentation, and (4) fat and muscle volumes were quantified within the segmented regions. All steps except step 3 were fully automated. The operators analyzed datasets from a queue of de-identified data and were, therefore, blinded to both subject and scanner. Typical review time for each complete data set (neck-to-knee scan including the extra liver slabs) was 15-20 min.

Measurements were calculated using the cloud-based service AMRA Researcher (AMRA Medical AB, Linköping, Sweden) with an additional correction for left/right asymmetries in the signals caused by eddy-current effects. Liver fat was calculated both from the neck-to-knee 2p images and from the 6-point CSE-MRI (6p) images separately, using at least three and up to nine regions of interest (ROIs) manually placed in the liver, avoiding vessels, bile ducts, and image artifacts. The 2p liver images were calibrated using our in-house fat-referenced method and corrected for T2* effects using T2* values estimated from the multi-echo spoiled GRE sequence. The fat-referenced liver fat measurements were subsequently rescaled to PDFF by assuming pure adipose tissue PDFF of 93.7%.36 The 6p liver images were reconstructed using the vendor's on-scanner software for each scanner.
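The T2* correction and PDFF rescaling described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a mono-exponential signal decay S(TE) = S0·exp(−TE/T2*) estimated from two echoes, and it assumes that the rescaling to PDFF is a simple multiplication of the fat-referenced fraction by the 93.7% adipose tissue PDFF quoted in the text. All function names are hypothetical.

```python
import math

# Pure adipose tissue PDFF assumed in the text (percent).
ADIPOSE_TISSUE_PDFF = 93.7


def estimate_t2star_ms(signal_te1, signal_te2, te1_ms, te2_ms):
    """Mono-exponential T2* from two echoes (assumed model, not from the paper)."""
    return (te2_ms - te1_ms) / math.log(signal_te1 / signal_te2)


def t2star_corrected(signal, te_ms, t2star_ms):
    """Undo T2* decay of a signal acquired at echo time te_ms (assumed model)."""
    return signal * math.exp(te_ms / t2star_ms)


def fat_referenced_to_pdff(fat_referenced_fraction):
    """Rescale a fat-referenced fraction (1.0 = pure adipose tissue) to PDFF in percent.

    Assumption: a linear rescaling by the adipose tissue PDFF.
    """
    return fat_referenced_fraction * ADIPOSE_TISSUE_PDFF
```

For example, under these assumptions a fat-referenced liver fat fraction of 0.05 would correspond to a PDFF of about 4.7%.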

TABLE 1 Details on MRI sequences and acquisition settings

| MRI scanner system | MRI sequence (coverage) | Voxel size (abdomen, thighs) [mm3] | FOV [mm2] | TR [ms] | TE [ms] | FA | ETL (no. of shots) |
|---|---|---|---|---|---|---|---|
| GE 1.5T (Optima MR450w GEM DV25) | LAVA Flex (neck-to-knee) | 2.0 × 2.0 × 5, 2.0 × 2.0 × 4 | 500 × 350 | 6.1 | 2.1/4.2 | 10° | 1 (2 shots) |
| | Multi-echo spoiled GRE (liver) | 2.0 × 2.0 × 10 | 500 × 500 | 150 | 4.6/9.2 | 10° | 1 (2 shots) |
| | IDEAL IQ (liver) | 1.6 × 1.6 × 10 | 400 × 320 | 17 | 1.8/3.9/6.1/8.2/10.4/12.5 | 8° | 6 (1 shot) |
| GE 3T (Discovery MR750 DV25) | LAVA Flex (neck-to-knee) | 1.9 × 1.9 × 5, 1.9 × 1.9 × 4 | 480 × 350 | 3.6 | 1.1/2.2 | 10° | 1 (2 shots) |
| | Multi-echo spoiled GRE (liver) | 1.8 × 1.8 × 8 | 450 × 360 | 6.5 | 2.3/4.9 | 10° | 1 (2 shots) |
| | IDEAL IQ (liver) | 1.8 × 1.8 × 8 | 450 × 360 | 6.5 | 0.8/1.6/2.4/3.3/4.1/4.9 | 3° | 3 (2 shots) |
| Philips 1.5T (Achieva dStream R5.3) | mDIXON FFE (neck-to-knee) | 3.3 × 3.3 × 6, 1.7 × 1.7 × 4 | 530 × 370 | 5.9 | 2.4/4.7 | 10° | 2 (1 shot) |
| | Multi-echo spoiled GRE (liver) | 3.7 × 3.7 × 5 | 530 × 340 | 11 | 4.6/9.2 | 10° | 2 (1 shot) |
| | mDIXON Quant (liver) | 2.0 × 2.0 × 6 | 375 × 300 | 9.0 | 1.1/2.4/3.7/5.0/6.3/7.6 | 5° | 6 (1 shot) |
| Philips 3T (Ingenia R5.3) | mDIXON FFE (neck-to-knee) | 3.3 × 3.3 × 4, 1.8 × 1.8 × 4 | 530 × 370 | 3.5 | 1.2/2.4 | 10° | 2 (1 shot) |
| | Multi-echo spoiled GRE (liver) | 2.4 × 2.4 × 5 | 530 × 340 | 6 | 2.3/4.6 | 10° | 2 (1 shot) |
| | mDIXON Quant (liver) | 2.1 × 2.1 × 6 | 400 × 350 | 6.9 | 1.0/1.9/2.8/3.7/4.6/5.5 | 3° | 6 (1 shot) |
| Siemens 3T (Prisma E11) | Dixon VIBE (neck-to-knee) | 2.0 × 2.0 × 5, 2.0 × 2.0 × 4 | 500 × 375 | 3.8 | 1.2/2.5 | 10° | 2 (1 shot) |
| | Multi-echo spoiled GRE (liver) | 2.0 × 2.0 × 5 | 500 × 360 | 176 | 2.3/4.6 | 10° | 2 (1 shot) |
| | LiverLab q-Dixon (liver) | 2.8 × 2.8 × 4 | 450 × 400 | 9.0 | 1.1/2.5/3.7/4.9/6.2/7.4 | 4° | 6 (1 shot) |

Abbreviations: FOV, field of view; TR, repetition time; TE, echo time; FA, flip angle; ETL, echo train length.

2.5 | Statistical analysis

Descriptive statistics were calculated for all measurements with all study subjects pooled. Measurements were not performed in regions marked as not analyzable by the operators, and consequently, such regions were not included in the statistical analysis in this study. Overall within-scanner repeatability was calculated using one-way analysis of variance, so that all test-retest acquisitions from all MRI scanners were included. In particular, measurements were used as the dependent variable, and the subject-scanner combination (eg, “Subject 1 - Scanner 1”) was used as the independent variable in terms of a single random effect. Hence, the repeatability was pooled over all scanners and subjects. The repeatability was also calculated for each MRI scanner separately. Overall reproducibility was calculated using one-way analysis of variance, where the independent variable was modeled as the subject (eg, “Subject 1”) so that the within-subject variance across all scanners was estimated. Repeatability and reproducibility coefficients were calculated as 2.77 times the within-subject standard deviation.37 The within-subject coefficient of variation (CV) was estimated by first calculating all subject-wise coefficients of variation over all scanners and subjects, and then taking the root mean square of those.

To assess the differences (bias) between the MRI scanners, the differences in measurement values between all pairs of scanners were calculated. By averaging across all subjects and combinations of scanners, and shifting to a common zero, the average differences between MRI scanners were obtained on an interval scale. These differences were used in the visualization of the results to facilitate evaluation of the contribution of bias and repeatability, respectively, to the overall reproducibility. Liver PDFF values from the 2p and 6p methods were compared by linear correlation and Bland-Altman analysis. Pooled correlation analysis was performed by first averaging the test-retest measurements for each subject and scanner. Bland-Altman analysis was performed with correction for multiple observations.38
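The scanner bias estimate described above (pairwise differences, averaged and shifted to a common zero) can be sketched as follows. This is one possible reading of the description, not the authors' code: it assumes one value per subject and scanner (for instance the test-retest mean), averages each scanner's mean difference to every scanner including itself, and then centers the result. The function name and the toy data layout are hypothetical.

```python
from statistics import mean


def scanner_bias(values):
    """Relative scanner bias around a common zero.

    values: {subject: {scanner: measurement}}. Assumes every pair of scanners
    shares at least one subject.
    """
    scanners = sorted({sc for per_subject in values.values() for sc in per_subject})

    def mean_difference(a, b):
        # Average of (a - b) over subjects measured on both scanners.
        diffs = [v[a] - v[b] for v in values.values() if a in v and b in v]
        return mean(diffs)

    # Average each scanner's difference to all scanners (the self-difference is zero).
    raw = {a: mean([mean_difference(a, b) for b in scanners]) for a in scanners}
    offset = mean(list(raw.values()))
    return {a: raw[a] - offset for a in scanners}
```

With complete data, the difference between two entries of the returned dictionary equals the average pairwise difference between those two scanners, which is what places the scanners on a common interval scale.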

Operator variability was calculated using one-way analysis of variance, where the independent variable was modeled as the subject (eg, “Subject 1”) for interoperator variability and as the subject-operator combination (eg, “Subject 1 - Operator 1”) for the intraoperator variability.
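The one-way analysis of variance used throughout this section differs only in how observations are grouped: by subject-scanner combination for repeatability, by subject for reproducibility, and analogously with operators for operator variability. A minimal Python sketch is given below; the authors worked in R, so this is an illustration only. The record layout `(subject, scanner, value)`, the function names, and the use of the sample standard deviation in the CV are assumptions. The factor 2.77 is taken from the text and corresponds to 1.96·√2.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def group_values(records, key):
    """Collect measurement values (third field of each record) per group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record[2])
    return groups


def within_sd(records, key):
    """Within-group SD: square root of the one-way ANOVA within-group mean square."""
    ss_within, df_within = 0.0, 0
    for values in group_values(records, key).values():
        m = mean(values)
        ss_within += sum((v - m) ** 2 for v in values)
        df_within += len(values) - 1
    return math.sqrt(ss_within / df_within)


def coefficient(sw):
    """Repeatability or reproducibility coefficient, 2.77 times the within-subject SD."""
    return 2.77 * sw


def rms_cv(records, key):
    """Root mean square of the group-wise coefficients of variation."""
    cvs = [stdev(v) / mean(v) for v in group_values(records, key).values() if len(v) > 1]
    return math.sqrt(mean([cv ** 2 for cv in cvs]))


def by_subject_scanner(record):
    return (record[0], record[1])  # repeatability grouping


def by_subject(record):
    return record[0]  # reproducibility grouping
```

Calling `coefficient(within_sd(records, by_subject_scanner))` then gives a pooled repeatability coefficient, and the same call with `by_subject` gives a reproducibility coefficient. As a check against Table 3, `coefficient(4.55)` is about 12.6 cl for VAT.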

All statistical analyses were performed using R (version 3.5.2).

3 | RESULTS

Examples of the segmentations of adipose tissue and thigh muscles as well as placements of ROIs in the liver are shown in Figure 2.

Descriptive statistics (median and quartiles of each measurement variable) for the study cohort are presented in Table 2. Of the planned 180 examinations, 8 examinations were not performed due to insufficient time to carry out all examinations within 1 d or due to technical issues. Hence, a total of 172 examinations were performed. The most notable image quality issue was that MFI was not analyzable in the first five subjects due to insufficient slab overlap in the MRI protocol. This was adjusted for the subsequent 13 subjects. Since the number of observations included in each statistical analysis depends on the results of the quality control and the type of statistics, all statistical results include the number of subjects and observations included in that particular analysis.

FIGURE 2 Examples of segmentation of subcutaneous adipose tissue (blue), visceral adipose tissue (red), right anterior thigh muscles (yellow), left anterior thigh muscles (pink), right posterior thigh muscles (blue), and left posterior thigh muscles (green) in an obese subject (panels 1 and 2) and in a lean subject (panels 3 and 4). Right panel: Example of regions of interest for liver proton density fat fraction in coronal view (top) and axial view (bottom)

The within-scanner repeatability results (within-subject SD (s_w), range of s_w across the scanners, repeatability coefficient, and CV) for each measurement variable are reported in Table 3. The repeatability coefficients were 13 centiliters (cl) (VAT), 24 cl (ASAT), 17 cl (total thigh muscle volume), 0.53% (MFI), and 1.37% and 1.27% for 2p and 6p liver PDFF, respectively. The reproducibility results (s_w, reproducibility coefficient, and CV) across different scanners are reported in Table 4. Reproducibility coefficients were 24 cl (VAT), 42 cl (ASAT), 31 cl (total thigh muscle volume), 1.44% (MFI), and 2.37% and 2.40% for 2p and 6p liver PDFF, respectively. The repeatability for each scanner, average difference (bias) between scanners, and between-scanner reproducibility are illustrated in Figure 3. For all measures except MFI, the within-scanner repeatability explained much of the overall reproducibility. The linear correlation between the 2p and 6p liver PDFF measurements was r = 0.953. The bias in the Bland-Altman analysis (2p − 6p) was −0.63 percentage units, with limits of agreement of −3.09 to 1.84 percentage units. The inter- and intraoperator variability are reported in Supporting Information Table S1, which is available online. The intraoperator variability was on par with the interoperator variability, and the operator variability was in general small compared to the overall variability.
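The bias and limits of agreement reported above follow the usual Bland-Altman construction, which can be sketched in Python as below. This is the plain version for one paired observation per subject; it does not include the correction for multiple observations per subject that the authors applied, and the function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Plain Bland-Altman limits of agreement: bias ± 1.96 times the SD of the
    paired differences. No correction for repeated observations per subject.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

Because the limits are symmetric around the bias, the reported limits of −3.09 and 1.84 percentage units have their midpoint at about −0.63, in agreement with the reported bias.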

TABLE 2 Descriptive statistics of the study cohort

| Measurement | Median | Q1 | Q3 | No. of subjects (observations) |
|---|---|---|---|---|
| Visceral adipose tissue volume [cl] | 174 | 137 | 334 | 18 (166) |
| Abdominal subcutaneous adipose tissue volume [cl] | 392 | 270 | 871 | 18 (166) |
| Total thigh muscle volume [cl] | 1428 | 1324 | 1537 | 18 (171) |
| Left anterior thigh muscle volume [cl] | 260 | 249 | 276 | 18 (171) |
| Left posterior thigh muscle volume [cl] | 444 | 404 | 491 | 18 (171) |
| Right anterior thigh muscle volume [cl] | 261 | 241 | 277 | 18 (172) |
| Right posterior thigh muscle volume [cl] | 448 | 408 | 489 | 18 (172) |
| Mean anterior thigh muscle fat infiltration [%] | 4.0 | 3.4 | 5.4 | 13 (122) |
| Left anterior thigh muscle fat infiltration [%] | 4.1 | 3.4 | 5.5 | 13 (122) |
| Left posterior thigh muscle fat infiltration [%] | 7.6 | 5.8 | 8.8 | 13 (122) |
| Right anterior thigh muscle fat infiltration [%] | 3.9 | 3.3 | 5.4 | 13 (122) |
| Right posterior thigh muscle fat infiltration [%] | 7.8 | 5.6 | 8.9 | 13 (122) |
| Liver fat (2p) [%] | 2.0 | 1.7 | 2.5 | 18 (137) |
| Liver fat (6p) [%] | 2.2 | 1.9 | 3.6 | 18 (140) |

Note: Volume measures in centiliters (cl).

TABLE 3 Within-scanner repeatability

| Measurement | s_w (range) | Repeatability coefficient | Coefficient of variation | No. of subjects (observations) |
|---|---|---|---|---|
| Visceral adipose tissue volume [cl] | 4.55 (2.73-6.18) | 12.60 | 2.89% | 18 (160) |
| Abdominal subcutaneous adipose tissue volume [cl] | 8.82 (6.01-10.63) | 24.44 | 1.81% | 18 (160) |
| Total thigh muscle volume [cl] | 6.28 (4.02-9.09) | 17.40 | 0.45% | 18 (168) |
| Left anterior thigh muscle volume [cl] | 1.76 (0.93-2.69) | 4.89 | 0.72% | 18 (168) |
| Left posterior thigh muscle volume [cl] | 2.71 (1.81-3.65) | 7.50 | 0.64% | 18 (168) |
| Right anterior thigh muscle volume [cl] | 1.93 (1.21-2.57) | 5.36 | 0.95% | 18 (170) |
| Right posterior thigh muscle volume [cl] | 3.15 (2.75-3.36) | 8.74 | 0.76% | 18 (170) |
| Mean anterior thigh muscle fat infiltration [%] | 0.19 (0.08-0.41) | 0.53 | 3.90% | 13 (116) |
| Left anterior thigh muscle fat infiltration [%] | 0.25 (0.11-0.51) | 0.68 | 5.41% | 13 (116) |
| Left posterior thigh muscle fat infiltration [%] | 0.18 (0.12-0.34) | 0.51 | 2.94% | 13 (116) |
| Right anterior thigh muscle fat infiltration [%] | 0.21 (0.13-0.35) | 0.59 | 5.14% | 13 (116) |
| Right posterior thigh muscle fat infiltration [%] | 0.22 (0.11-0.50) | 0.61 | 4.41% | 13 (116) |
| Liver fat (2p) [%] | 0.50 (0.18-0.87) | 1.37 | 27.94% | 18 (130) |
| Liver fat (6p) [%] | 0.46 (0.21-0.70) | 1.27 | 15.70% | 18 (124) |

Note: Units are given in the left column, except for coefficient of variation, which is measured in percent for all measurements.

4 | DISCUSSION AND CONCLUSIONS

Repeatability and reproducibility are both important, for different reasons and in different contexts. In longitudinal studies, where the patient is scanned multiple times on the same scanner, and where the primary variable is the difference between time points, the repeatability may be more important than the reproducibility, as long as the scanner's software or hardware is not upgraded during the study period. In clinical practice and in multicenter clinical studies, however, the reproducibility is often the primary quality parameter. Reproducibility is also important to enable comparison to reference values extracted by the same method but from other sources. One example is the use of large imaging biobanks, such as the UK Biobank, for determining normative values for body composition19 or for computing virtual control groups for determining the propensity for different diseases for a certain body composition profile.39 The validity of using such reference data outside the study in which they were collected relies heavily on the reproducibility of the measurement method.

In this study we analyzed the reproducibility of a cloud-based body composition analysis service. The reproducibility coefficients for the volume measurements ranged from 7 cl (left and right anterior thigh muscles) to 42 cl (ASAT). The reproducibility coefficient for muscle fat infiltration was 1.4% and for liver fat 2.4%. The interpretation of the reproducibility coefficient (also known as the “smallest detectable difference”) is that, in 95% of cases, the magnitude of the reproducibility error will not exceed this coefficient. Whether the precision found in this study is sufficient depends on the specific application. Unfortunately, we have not found any other studies reporting reproducibility of similar methods for volumetric measures of muscle and adipose tissue to which these results could be compared. However, the reproducibility can be compared to the variation in a general population. The variance of VAT in the first 10,000 scanned participants in the UK Biobank was 502 cl.39 The ratio (F-ratio) of this variance to the within-subject reproducibility variance s_w^2 = 0.0064 found here is 784. The F-ratio in this context can be interpreted as the ratio between the actual variance in the population and the variance of the measurement error or, in other words, the signal-to-noise ratio of the measurement. The corresponding F-ratios were 455 for ASAT, 542 for total thigh muscle volume, and 12.8 for mean anterior thigh MFI. The reproducibility coefficient for ASAT is about twice as large as for VAT. However, the same relation holds for the average volumes of ASAT and VAT and, hence, in relative terms the reproducibility is very similar for ASAT and VAT.
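The F-ratio described above is simply the population variance divided by the within-subject reproducibility variance. The short Python sketch below reproduces the quoted VAT value under one assumption about units: the ratio of 784 is obtained if the population variance is read as 5.02 in the same squared units as s_w^2 = 0.0064 (presumably squared liters). That unit reading is an assumption and is not stated in the text. The function name is hypothetical.

```python
def f_ratio(population_variance, within_subject_variance):
    """Ratio of between-person variance to measurement-error variance.

    Both variances must be expressed in the same squared units.
    """
    return population_variance / within_subject_variance


# Assumed unit reading: 5.02 and 0.0064 in the same squared units.
vat_f_ratio = f_ratio(5.02, 0.0064)
```

Under this reading `vat_f_ratio` evaluates to about 784, matching the value given for VAT.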

TABLE 4 Overall reproducibility

| Measurement | s_w | Reproducibility coefficient | Coefficient of variation | No. of subjects (observations) |
|---|---|---|---|---|
| Visceral adipose tissue volume [cl] | 7.51 | 20.81 | 4.43% | 18 (166) |
| Abdominal subcutaneous adipose tissue volume [cl] | 15.33 | 42.49 | 3.42% | 18 (166) |
| Total thigh muscle volume [cl] | 11.22 | 31.10 | 0.87% | 18 (171) |
| Left anterior thigh muscle volume [cl] | 2.49 | 6.89 | 1.03% | 18 (171) |
| Left posterior thigh muscle volume [cl] | 4.86 | 13.46 | 1.19% | 18 (171) |
| Right anterior thigh muscle volume [cl] | 2.48 | 6.87 | 1.14% | 18 (172) |
| Right posterior thigh muscle volume [cl] | 4.92 | 13.63 | 1.21% | 18 (172) |
| Mean anterior thigh muscle fat infiltration [%] | 0.52 | 1.44 | 12.29% | 13 (122) |
| Left anterior thigh muscle fat infiltration [%] | 0.57 | 1.59 | 13.19% | 13 (122) |
| Left posterior thigh muscle fat infiltration [%] | 0.80 | 2.23 | 11.41% | 13 (122) |
| Right anterior thigh muscle fat infiltration [%] | 0.58 | 1.61 | 14.36% | 13 (122) |
| Right posterior thigh muscle fat infiltration [%] | 0.74 | 2.04 | 10.76% | 13 (122) |
| Liver fat (2p) [%] | 0.86 | 2.37 | 36.63% | 18 (137) |
| Liver fat (6p) [%] | 0.86 | 2.40 | 28.76% | 18 (140) |

Note: Units are given in the left column, except for coefficient of variation, which is measured in percent for all measurements. s_w, within-subject standard deviation; cl, centiliter.

FIGURE 3 Plots showing the within-scanner repeatability, the difference in scanner bias, and the total between-scanner reproducibility for the different measurements. The markers (o) show the bias of each scanner, around a common zero. The inset text shows the largest average difference between two scanners. The error bars show the interval of ± the repeatability coefficient around the relative bias for each scanner. The red band shows the interval of ± the reproducibility coefficient, that is, the 95% limits of agreement of two measurements on the same subject on different scanners. All volumetric measurements and percentage measurements, respectively, share common axes for comparability

The reproducibility can also be compared to differences between healthy people and people with different diseases. In a study comparing healthy people to those with coronary heart disease (CHD) and those with type-2 diabetes (T2D) from the UK Biobank,19 there was a significant difference (P < .001) between people with CHD and sex- and age-matched controls of 82 cl in VAT and 0.5% in MFI. The reproducibility coefficient found in the present study of 21 cl for VAT shows that differences in VAT related to CHD can be detected by this method on an individual level. For MFI, we found a reproducibility coefficient of 1.44%, which is not sufficient to detect CHD-related differences on an individual basis. Looking at T2D, Linge et al found significant (P < .001) differences to sex- and age-matched controls of 152 (21) cl for VAT, 170 (42) cl for ASAT, 4.08 (2.4)% for liver fat, and 1.19 (1.44)% for MFI (reproducibility coefficients from the present study in parentheses).19 Hence, for all these measures except MFI, the reproducibility is sufficient to detect differences related to T2D on an individual basis.
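The comparison above amounts to checking whether each disease-related group difference exceeds the corresponding reproducibility coefficient. A small Python sketch of that check, using only the numbers quoted in this paragraph, is given below. The dictionary layout and the function name are illustrative choices, and treating "difference larger than the reproducibility coefficient" as the detectability criterion is our reading of the argument.

```python
# (group difference, reproducibility coefficient), values quoted in the text.
T2D_DIFFERENCES = {
    "VAT [cl]": (152, 21),
    "ASAT [cl]": (170, 42),
    "Liver fat [%]": (4.08, 2.4),
    "MFI [%]": (1.19, 1.44),
}
CHD_DIFFERENCES = {
    "VAT [cl]": (82, 21),
    "MFI [%]": (0.5, 1.44),
}


def detectable(difference, reproducibility_coefficient):
    """True if the difference exceeds the reproducibility coefficient."""
    return abs(difference) > reproducibility_coefficient


t2d_detectable = {name: detectable(d, rc) for name, (d, rc) in T2D_DIFFERENCES.items()}
chd_detectable = {name: detectable(d, rc) for name, (d, rc) in CHD_DIFFERENCES.items()}
```

With these inputs, every measure except MFI is flagged as detectable on an individual level, for both T2D and CHD.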

The repeatability found in the present study can be related to previous studies on earlier versions of the same technology. In a study by Newman et al,28 the within-scanner

repeatability coefficient for a single scanner was 9 cl for VAT, which is slightly better than the general repeatability coefficient of 13 cl found here, but within the range of the different scanners. For ASAT, Newman et al found a repeatability coefficient of 46 cl, which is higher than the 24 cl found in this study. A corresponding repeatability study29 on muscle quantification reported 95% limits of agreement corresponding to a repeatability coefficient of 5 cl and 6 cl for total right and left thigh muscle volumes, respectively. (The range between the 95% limits of agreement is twice the repeatability coefficient37.) This is also within the range found in this study but lower than the 17 cl found when all scanners were pooled. West et al investigated the repeatability of the same method on 36 postmenopausal women with a sedentary lifestyle, a quite different cohort from that of the present study. The repeatability coefficients in the study by West et al were 12 cl for VAT, 35 cl for ASAT, 25 cl for total thigh muscle, and 1.69% for liver PDFF, which are all comparable to those of the present study, despite the rather different cohorts. Yet another study24 investigated the repeatability of VAT, ASAT, thigh muscle volume, and liver PDFF and found 95% limits of agreement corresponding to repeatability coefficients of 32 cl for VAT, 68 cl for ASAT, 30 cl for total thigh muscle volume, and 1.45% for liver PDFF, all being higher than those reported in the present study. In terms of CV, the agreement reported by Middleton et al (VAT, 3.6%; ASAT, 2.6%; thigh, 1.5%) was also worse than that reported here, except for liver PDFF, which was 7.3% in their study. This could, however, be explained by the fact that the cohort studied by Middleton et al had a significantly wider range of liver PDFF with a higher mean (average, 7.1%; range, 1.8-27.9%) compared to our cohort. The limited range of PDFF values is a limitation of the present study. A meta-analysis of MR imaging methods for liver PDFF quantification showed a general reproducibility coefficient between methods of 5.47% and a repeatability coefficient of 4.68% on a subject level.40 Our results show that the investigated methods are well below these levels. The high CV numbers for PDFF, in particular for the 2p method, are likely due to the relatively low PDFF values, which drive the CV to higher values. In a power calculation for a clinical study, however, the repeatability and reproducibility coefficients are more relevant since they show the actual expected variation of the measurements.
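The relations used in the comparisons above can be illustrated with a short sketch. It assumes the standard definitions given by Bartlett and Frost37: the repeatability coefficient is 1.96·√2·sw, where sw is the within-subject standard deviation, so that the 95% limits of agreement of a test-retest difference span twice the repeatability coefficient. The function names and the value of sw are illustrative assumptions; of the two cohort means, 7.1% is the average reported by Middleton et al and 3.0% is a hypothetical lower value, not a result from this study.

```python
import math

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def rc_from_sw(sw):
    """Repeatability coefficient from the within-subject standard deviation."""
    return Z95 * math.sqrt(2.0) * sw


def rc_from_loa(lower, upper):
    """Repeatability coefficient implied by 95% limits of agreement.

    The range between the limits is twice the repeatability coefficient.
    """
    return (upper - lower) / 2.0


def cv_percent(sw, mean):
    """Within-subject coefficient of variation in percent."""
    return 100.0 * sw / mean


# Same absolute precision (hypothetical sw, in PDFF percentage units),
# evaluated at a low and a higher cohort mean.
sw = 0.5
cv_low_mean = cv_percent(sw, 3.0)   # hypothetical low-PDFF cohort
cv_high_mean = cv_percent(sw, 7.1)  # cohort average reported by Middleton et al
print(f"CV at mean 3.0%: {cv_low_mean:.1f}%, CV at mean 7.1%: {cv_high_mean:.1f}%")
```

With identical sw, the cohort with the lower mean gets the higher CV, which is why the repeatability coefficient is the more informative number when cohorts differ in mean PDFF.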

While accuracy was not the main question for this study, an investigation of the agreement between fat-referenced 2p PDFF and the vendors' own 6p PDFF methods was included since accuracy of the fat-referenced 2p method has not been published earlier. Furthermore, 2-point methods in general are known to be unreliable for PDFF quantification as they are sensitive to a number of confounding factors such as T2* effects and T1 weighting. The fat-referenced 2p PDFF method used here corrects for T2* effects using an estimate of the T2* from a separate GRE scan over the liver. Furthermore, the fat-referenced method avoids T1-induced bias as shown in a study by Peterson et al.41 This is possible since the fat-referenced method does not base the calibration on the relationship between water and fat signals, but on the fat signal in a reference tissue.42,43 The linear correlation of 0.953 between the 2p and 6p methods must be considered good given the relatively low range of PDFF values. More importantly, the range of the limits of agreement between the 2p and 6p methods, from −3.09 to 1.84 percentage units, is well below the reproducibility coefficient between PDFF methods reported in the meta-analysis by Yokoo et al.40
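As a reminder of how such limits of agreement are obtained, the following minimal sketch computes Bland-Altman 95% limits (mean difference ± 1.96 standard deviations of the paired differences) for two methods. The PDFF values are made-up numbers for illustration only, and the function name is an assumption; this is not the analysis code used in the study.

```python
import statistics


def limits_of_agreement(method_a, method_b):
    """Return (lower, bias, upper) Bland-Altman 95% limits for paired data."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)   # mean difference between the methods
    sd = statistics.stdev(diffs)    # sample SD of the paired differences
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


# Hypothetical paired liver PDFF measurements (percentage units)
pdff_2p = [1.2, 2.5, 3.1, 4.8, 6.0, 9.5]
pdff_6p = [1.8, 2.9, 3.0, 5.6, 6.9, 10.1]

lower, bias, upper = limits_of_agreement(pdff_2p, pdff_6p)
print(f"bias {bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f}")
```

The limits are centered on the mean difference, not on zero. For the limits reported above (−3.09 to 1.84), the midpoint is about −0.6 percentage units.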

Figure 3 shows how the reproducibility is a combined effect of bias between scanners and within-scanner repeatability. For all measures except MFI, the within-scanner repeatability explains much of the overall reproducibility. For MFI, the reproducibility coefficient is much larger than the repeatability coefficient, indicating that the differences between scanners are the main source of variability in MFI measurements. A potential reason for this is the different imaging parameters used on the different scanners, such as image resolution, which may cause different partial volume effects that could affect the MFI measurements differently. For individual thigh muscle volumes, the repeatability and reproducibility coefficients are rather similar, indicating that any difference in scanner bias is negligible for this measurement. The two methods for measuring liver fat had very similar reproducibility. For an individual scanner, the repeatability differed between the two methods, but in different ways for different scanners (compare, eg, scanners 2 and 3 in Figure 3), and the aggregated repeatability (Table 2) was similar for the two methods. This indicates that the two methods have similar performance in terms of precision. It can also be noted in Figure 3 that the bias between the different vendors' 6p liver fat measurements is quite small, even though the vendors likely use different algorithms for computing the PDFF maps. There is, however, a noticeable difference in repeatability between these five scanners.
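The way scanner bias and repeatability combine can be sketched with a simple variance-components model. The sketch assumes that the reproducibility variance is the sum of a within-scanner (repeatability) variance sw² and a between-scanner variance sb², and that both coefficients are 1.96·√2 times the corresponding standard deviation. This is a simplification of the analysis of variance used in the study, and the numbers are hypothetical.

```python
import math

K = 1.96 * math.sqrt(2.0)  # factor from standard deviation to 95% coefficient


def repeatability_coefficient(sw):
    """Coefficient for repeated measurements on the same scanner."""
    return K * sw


def reproducibility_coefficient(sw, sb):
    """Coefficient for measurements on different scanners.

    sw: within-scanner standard deviation (repeatability)
    sb: between-scanner standard deviation (scanner bias), assumed independent of sw
    """
    return K * math.sqrt(sw ** 2 + sb ** 2)


# Hypothetical standard deviations in arbitrary units
rc = repeatability_coefficient(1.0)
rdc_small_bias = reproducibility_coefficient(1.0, 0.1)  # negligible scanner bias
rdc_large_bias = reproducibility_coefficient(1.0, 3.0)  # scanner bias dominates

print(f"ratio with small scanner bias: {rdc_small_bias / rc:.2f}")
print(f"ratio with large scanner bias: {rdc_large_bias / rc:.2f}")
```

A ratio close to 1 corresponds to the situation described for thigh muscle volume, while a ratio well above 1 corresponds to the situation described for MFI.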

The investigated analysis method performs automated segmentation of the different compartments followed by a manual quality control step, where the operator has the possibility to adjust the anatomical definition from the automated segmentation in cases where the operator deems it necessary. Comparing the operator repeatability (see Supporting Information Table S1) with the reproducibility (Table 4), it can be concluded that the operator variability contributes only moderately to the overall variability, except for individual parts of the thigh muscles. It can also be seen that the interoperator repeatability was almost as good as the intraoperator repeatability. Hence, with this analysis method, operators can be used interchangeably even in longitudinal studies. There is a higher inter- and intraoperator variability for the two liver fat methods than for MFI. A potential reason for this is that liver fat is measured in a number of manually positioned ROIs, while MFI is measured in the complete muscle. Particularly in the presence of motion artifacts, this might increase variability.

In conclusion, we have presented what, to the best of our knowledge, is the first reproducibility study of an MRI-based body composition analysis method. It was shown that this method effectively eliminates effects due to MRI scanner differences. The results from this study can be used for power calculations in clinical studies or to better understand the scanner-induced variability in clinical applications. The results can also be used as a benchmark in future reproducibility studies of other methods for body composition analysis.
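As an illustration of how a repeatability coefficient could enter a power calculation, the sketch below estimates the number of subjects needed to detect a mean change between two scans on the same scanner. It rests on strong assumptions that are not part of this study: measurement error is the only source of variation in the change, a normal approximation of a paired test is used, and the target change of 5 cl is hypothetical. The 13 cl is the pooled VAT repeatability coefficient discussed above. The function name and defaults are illustrative.

```python
import math
from statistics import NormalDist


def subjects_needed(rc, delta, alpha=0.05, power=0.80):
    """Approximate sample size to detect a mean change `delta` between two scans.

    rc: repeatability coefficient (1.96 * SD of the test-retest difference)
    delta: mean change to detect, in the same unit as rc
    Assumes measurement error is the only source of variation (a simplification).
    """
    sd_diff = rc / 1.96  # SD of the difference between two measurements
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


# VAT: repeatability coefficient 13 cl, hypothetical target change 5 cl
n_vat = subjects_needed(rc=13.0, delta=5.0)
print(f"subjects needed: {n_vat}")
```

In a real study, biological variability in the response would add to the variance of the change and increase the required sample size. For measurements on different scanners, the reproducibility coefficient would take the place of the repeatability coefficient.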

ACKNOWLEDGMENTS

The authors acknowledge Dr. Mikael Petersson for reviewing the statistical methodology and calculations, and Erika Snygg for assistance with data acquisition. Financial support by the Swedish Research Council (VR 2019-04751) is gratefully acknowledged.

CONFLICTS OF INTEREST

M.B., T.R., O.D.L., and J.W. are shareholders of AMRA Medical AB. All authors are employed by AMRA Medical AB.

ORCID

Magnus Borga  https://orcid.org/0000-0002-9267-2191

Thobias Romu  https://orcid.org/0000-0003-0607-9795

Janne West  https://orcid.org/0000-0001-8369-0075

REFERENCES

1. Liu J, Fox CS, Hickson DA, et al. Impact of abdominal visceral and subcutaneous adipose tissue on cardiometabolic risk factors: The Jackson heart study. J Clin Endocrinol Metab. 2010;95:5419-5426.

2. Neeland IJ, Ayers CR, Rohatgi AK, et al. Associations of visceral and abdominal subcutaneous adipose tissue with markers of cardiac and metabolic risk in obese adults. Obesity. 2013;21:E439-E447.

3. Neeland IJ, Turer AT, Ayers CR, et al. Body fat distribution and incident cardiovascular disease in obese adults. J Am Coll Cardiol. 2015;65:2150-2151.

4. Piché M-E, Poirier P, Lemieux I, Després J-P. Overview of epidemiology and contribution of obesity and body fat distribution to cardiovascular disease: An update. Prog Cardiovasc Dis. 2018;61:103-113.

5. Iwasa M, Mifuji-Moroka R, Hara N, et al. Visceral fat volume predicts new-onset type 2 diabetes in patients with chronic hepatitis C. Diabetes Res Clin Pract. 2011;94:468-470.

6. Kurioka S, Murakami Y, Nishiki M, Sohmiya M, Koshimura K, Kato Y. Relationship between visceral fat accumulation and anti-lipolytic action of insulin in patients with type 2 Diabetes Mellitus. Endocr J. 2002;49:459-464.

7. van der Poorten D, Milner K-L, Hui J, et al. Visceral fat: A key mediator of steatohepatitis in metabolic liver disease. Hepatology. 2008;48:449-457.

8. Britton KA, Massaro JM, Murabito JM, Kreger BE, Hoffmann U, Fox CS. Body fat distribution, incident cardiovascular disease, cancer, and all-cause mortality. J Am Coll Cardiol. 2013;62:921-925.

9. Doyle SL, Donohoe CL, Lysaght J, Reynolds JV. Visceral obesity, metabolic syndrome, insulin resistance and cancer. Proceedings of the Nutrition Society. 2011;71:181-189.

10. Ekstedt M, Franzén LE, Mathiesen UL, et al. Long-term follow-up of patients with NAFLD and elevated liver enzymes. Hepatology. 2006;44:865-873.

11. Ekstedt M, Nasr P, Kechagias S. Natural history of NAFLD/NASH. Curr Hepatol Rep. 2017;16:391-397.

12. Goodpaster BH, Kelley DE, Thaete FL, He J, Ross R. Skeletal muscle attenuation determined by computed tomography is associated with skeletal muscle lipid content. J Appl Physiol. 2000;89:104-110.

13. Marcus RL, Addison O, Dibble LE, Foreman KB, Morrell G, LaStayo P. Intramuscular adipose tissue, sarcopenia, and mobility function in older individuals. J Aging Res. 2012;2012:629637.

14. Linge J, Heymsfield SB, Dahlqvist Leinhard O. On the definition of sarcopenia in the presence of aging and obesity: Initial results from UK biobank. J Gerontol: Series A. 2019:1-8.

15. Prentice AM, Jebb SA. Beyond body mass index. Obes Rev. 2001;2:141-147.

16. Thomas EL, Frost G, Taylor-Robinson SD, Bell JD. Excess body fat in obese and normal-weight subjects. Nutr Res Rev. 2012;25:150-161.

17. Tomiyama AJ, Hunger JM, Nguyen-Cuu J, Wells C. Misclassification of cardiometabolic health when using body mass index categories in NHANES 2005–2012. Int J Obes. 2016;40:883-886.

18. Neeland IJ, Poirier P, Despres J-P. Cardiovascular and metabolic heterogeneity of obesity clinical challenges and implications for management. Circulation. 2018;137:1391-1406.

19. Linge J, Borga M, West J, et al. Body composition profiling in the UK biobank imaging study. Obesity. 2018;26:1785-1795.

20. Thomas EL, Fitzpatrick JA, Malik SJ, Taylor-Robinson SD, Bell JD. Whole body fat: Content and distribution. Prog Nucl Magn Reson Spectrosc. 2013;73:56-80.

21. Cruz-Jentoft AJ, Morley JE. Sarcopenia; Chichester, West Sussex; Hoboken, NJ: Wiley-Blackwell, 2012;2012:219-220.

22. Borga M. MRI adipose tissue and muscle composition analysis: A review of automation techniques. Brit J Radiol. 2018;91:20180252.

23. Hu H, Chen J, Shen W. Segmentation and quantification of adipose tissue by magnetic resonance imaging. MAGMA: Magn Reson Mater Phys, Biol Med. 2016;29:259-276.

24. Middleton MS, Haufe W, Hooker J, et al. Quantifying abdominal adipose tissue and thigh muscle volume and hepatic proton density fat fraction: Repeatability and accuracy of an MR imaging–based semiautomated analysis method. Radiology. 2017;283:438-449.

25. Sullivan DC, Obuchowski NA, Kessler LG, et al. Metrology standards for quantitative imaging biomarkers. Radiology. 2015;277:813-825.

26. Joshi AA, Hu HH, Leahy RM, Goran MI, Nayak KS. Automatic intra-subject registration-based segmentation of abdominal fat from 3D water-Fat MRI. J Magn Reson Imaging. 2013;37:423-430.

27. Grimm A, Meyer H, Nickel MD, et al. Repeatability of Dixon magnetic resonance imaging and magnetic resonance spectroscopy for quantitative muscle fat assessments in the thigh. J Cachexia Sarcopenia Muscle. 2018;9:1093-1100.

28. Newman D, Kelly-Morland C, Leinhard OD, et al. Test–retest reliability of rapid whole body and compartmental fat volume quantification on a widebore 3T MR system in normal-weight, overweight, and obese subjects. J Magn Reson Imaging. 2016;44:1464-1473.

29. Thomas MS, Newman D, Leinhard OD, et al. Test-retest reliability of automated whole body and compartmental muscle volume measurements on a wide bore 3T MR system. Eur Radiol. 2014;24:2279-2291.

30. Sorace AG, Wu C, Barnes SL, et al. Repeatability, reproducibility, and accuracy of quantitative MRI of the breast in the community radiology setting. J Magn Reson Imaging. 2018;48:695-707.

31. Karlsson A, Rosander J, Romu T, et al. Automatic and quantitative assessment of regional muscle volume by multi-atlas segmentation using whole-body water-fat MRI. J Magn Reson Imaging. 2015;41:1558-1569.

32. Sullivan DC, Obuchowski NA, Kessler LG, et al. Metrology standards for quantitative imaging biomarkers. Radiology. 2015;277:813-825.

33. West J, Dahlqvist Leinhard O, Romu T, et al. Feasibility of MR-based body composition analysis in large scale population studies. PLoS One. 2016;11:e0163332.

34. Leinhard OD, Johansson A, Rydell J, et al. Quantitative abdominal fat estimation using MRI. In: 19th International Conference on Pattern Recognition, Vols 1-6, International Conference on Pattern Recognition; 2008. p 2137-2140.

35. Borga M, West J, Bell JD, et al. Advanced body composition assessment: From body mass index to body composition profiling. J Investig Med. 2018;66:887-895.

36. West J, Romu T, Thorell S, et al. Precision of MRI-based body composition measurements of postmenopausal women. PLoS ONE. 2018;13:e0192495.

37. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: Analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol. 2008;466-475.

38. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135-160.

39. Linge J, Whitcher B, Borga M, Dahlqvist LO. Sub-phenotyping metabolic disorders using body composition: An individualized, nonparametric approach utilizing large data sets. Obesity. 2019;27:1190-1199.

40. Yokoo T, Serai SD, Pirasteh A, et al. Committee FtR-QPB. Linearity, bias, and precision of hepatic proton density fat fraction measurements by using MR imaging: A meta-analysis. Radiology. 2018;286:486-498.

41. Peterson P, Romu T, Brorson H, Dahlqvist Leinhard O, Månsson S. Fat quantification in skeletal muscle using multigradient-echo imaging: Comparison of fat and water references. J Magn Reson Imaging. 2016;43:203-212.

42. Dahlqvist Leinhard O, Johansson A, Rydell J, Borga M, Lundberg P. Intensity Inhomogeneity Correction in Two Point Dixon Imaging. In Proceedings of the ISMRM Annual Meeting. International Society of Magnetic Resonance in Medicine; 2008 May; Toronto, Canada. p 1519.

43. Hu HH, Nayak KS. Quantification of absolute fat mass using an adipose tissue reference signal model. J Magn Reson Imaging. 2008;28:1483-1491.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the Supporting Information section.

TABLE S1 Intra- and inter-operator variability. sw, within-subject standard deviation; cl, centiliter

How to cite this article: Borga M, Ahlgren A, Romu T, Widholm P, Dahlqvist Leinhard O, West J. Reproducibility and repeatability of MRI-based body composition analysis. Magn Reson Med. 2020;00:1–11. https://doi.org/10.1002/mrm.28360
