

To cite this article: Fredrik Falkenström, Nili Solomonov & Julian A. Rubel (2020) Do therapist effects really impact estimates of within-patient mechanisms of change? A Monte Carlo simulation study, Psychotherapy Research, 30:7, 885-899, DOI: 10.1080/10503307.2020.1769875

To link to this article: https://doi.org/10.1080/10503307.2020.1769875


METHOD PAPER

Do therapist effects really impact estimates of within-patient mechanisms of change? A Monte Carlo simulation study

FREDRIK FALKENSTRÖM 1, NILI SOLOMONOV 2, & JULIAN A. RUBEL 3

1Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden; 2Weill Cornell Institute of Geriatric Psychiatry, Weill Cornell Medical College, White Plains, NY, USA; 3Department of Psychology, Justus Liebig University Giessen, Giessen, Germany

(Received 5 April 2020; revised 8 May 2020; accepted 9 May 2020)

ABSTRACT

Objective: Existing evidence highlights the importance of modeling differential therapist effectiveness when studying psychotherapy outcome. However, no study to date has examined whether this assertion applies to the study of within-patient effects in mechanisms of change. The present study investigated whether therapist effects should be modeled when studying mechanisms of change at the within-patient level.

Methods: We conducted a Monte Carlo simulation study, varying patient- and therapist-level sample sizes, degree of therapist-level nesting (intra-class correlation), balanced vs. unbalanced assignment of patients to therapists, and fixed vs. random within-patient coefficients. We estimated all models using longitudinal multilevel and structural equation models that either ignored (2-level models) or modeled (3-level models) therapist effects.

Results: Across all conditions, 2-level models performed as well as or better than 3-level models. Within-patient coefficients were unbiased in both 2- and 3-level models. In 3-level models, standard errors were biased when the number of therapists was small, and this bias increased in unbalanced designs. Ignoring random slopes led to biased standard errors when slope variance was large, but 2-level models still outperformed 3-level models.

Conclusions: In contrast to treatment outcome research, when studying mechanisms of change at the within-patient level, modeling therapist effects may even reduce model performance and increase bias.

Keywords: Mechanisms of change; Therapist effects; Cross-Lagged Panel Model; Multilevel Modeling; Structural Equation Modeling

Clinical or Methodological Significance of this Article: Our findings suggest that therapist effects can be ignored in longitudinal studies focusing on within-patient effects. Including therapist effects in statistical models focusing on within-patient effects when the number of therapists is small and/or therapists treat unequal numbers of patients may even increase the risk of biased inference.

In recent years, there has been an upsurge of interest in studying mechanisms of change in psychotherapy research. A mechanism of change is a theoretically postulated underlying target for therapeutic interventions that, if changed, theoretically leads to change in outcome (Kazdin, 2007). Therapeutic approaches differ in the hypothesized mechanisms presumed to lead to clinical improvement. For example, cognitive therapy for depression targets patients' distorted cognitions about the self, the world, and the future (Beck et al., 1979). Psychodynamic therapy facilitates patients' insight into problematic relationship patterns, awareness of avoided emotions, and/or a corrective emotional experience with the therapist (Summers & Barber, 2010).


Some mechanisms are trans-theoretical and assumed to facilitate therapeutic change across treatment models (e.g., the working alliance).

This increased focus on change mechanisms has been accompanied by an emphasis on collecting longitudinal data with repeated measurements of both candidate mechanisms and outcome throughout treatment, allowing researchers to study nuanced and complex associations over time (Falkenström et al., 2017). Recent decades have also been marked by developments in the statistical modeling of longitudinal data and increased awareness of their complexity (e.g., Allison et al., 2017; Curran et al., 2014; Curran & Bauer, 2011; Hamaker et al., 2015; Wang & Maxwell, 2015; Zyphur et al., 2019). One such complexity is the issue of separating within- from between-patient effects. Traditionally, mechanisms of change research focused on between-patient effects, examining whether average differences in a mechanism across patients predict change in outcome across patients. Within-patient effects are relationships between fluctuations over time in two or more variables for each given patient (Hamaker, 2012; Lundh & Falkenström, 2019; Molenaar & Campbell, 2009). That is, within-patient effects focus on associations between variables over time (across time-points) for each given patient. In contrast to between-patient effects, within-patient effects are not influenced by any factor that does not change during the time of the study. This characteristic of within-patient effects allows the exclusion of a number of alternative explanations connected to potentially confounding stable variables, which are relegated to the between-patient level, and thus provides stronger evidence for causality. Conceptually, within-patient effects are also better aligned with theories of therapeutic change that focus on change processes at the individual-patient level, as well as with the recent emphasis on personalization of therapeutic processes (Barber & Solomonov, 2019; Rubel et al., 2019). Our focus on mechanisms of change should not be confused with mediation analysis, a method commonly used for studying mechanisms of change by estimating indirect effects of treatment on outcome via a candidate mechanism. Our focus is instead on within-patient fluctuations in a candidate mechanism predicting within-patient changes in outcome, which we would argue is a method well suited for studying mechanisms of change (Falkenström et al., 2020).

Therapist Effects in Mechanisms of Change Research

Therapists vary in their ability to help their patients get better (e.g., Baldwin & Imel, 2013). While therapists' efficacy in reducing symptomatic distress has been widely studied, little is known about the effects of therapists' variability in facilitating improvement in candidate mechanisms of change. Inclusion of therapist effects when modeling change over time requires attention to "nesting": most therapists treat more than one patient, and thus observations collected from their patients are dependent. This nested structure violates the assumption of independence of observations required for most statistical tests (Adelson & Owen, 2012).

Early on, Crits-Christoph and Mintz (1991) noted the risk for increased Type-I error rate, and overestimation of population treatment effects, if nesting of patients within therapists is ignored. Since then, a number of simulation studies have confirmed this premise, suggesting that therapist effects should be modeled when examining outcome over time to avoid bias in the estimation of treatment effects and increased risk for Type-I error (de Jong et al., 2010; Magnusson et al., 2018; Wampold & Serlin, 2000). The most common method to adjust for therapist effects is allowing for variation in outcome among therapists by inclusion of random effects in multilevel models (e.g., Snijders & Bosker, 2012). While therapist effects in outcome studies have been relatively widely studied, much less is known regarding the impact of modeling therapist effects in the study of mechanisms of change, especially when studying within-patient effects.

Therapist Effects in Within-patient Studies

Consider first a simple within-patient regression model, also called a fixed effects or mean centered model, in which patient means in both independent and dependent variables are eliminated by person-mean centering, i.e.:

$Y_{i,t} - \bar{Y}_{i} = \beta_0 + \beta_1\,(X_{i,t-1} - \bar{X}_{i}) + \varepsilon_{i,t}$

(Fixed effects/mean centered model)

In this model, $Y_{i,t}$ is the outcome for individual i at time t, $\beta_0$ is the intercept term, $\beta_1$ is the effect of X for individual i at time t-1, and $\varepsilon_{i,t}$ is the model error term.1 All between-patient variance is eliminated by person-mean centering, i.e., subtracting the patient means from each observation of Y and X. What is left after centering is time-specific deviations from each patient's mean, which constitute the within-patient scores. Regression of within-patient scores in Y on within-patient scores in X yields the within-patient effect. In this model, therapist mean differences in Y and X are both eliminated through mean centering, because when there are no mean differences among patients, there are no therapist differences either.
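As a concrete illustration, the person-mean centering step and the fixed effects regression can be sketched in R. The sketch below is illustrative only, not the code used in the study; it assumes a hypothetical long-format data frame dat with columns id (patient), session (time), alliance (a candidate mechanism, playing the role of X), and distress (outcome, Y).

```r
# Minimal sketch of the fixed effects / mean centered model (hypothetical data).
library(dplyr)

dat <- dat %>%
  group_by(id) %>%
  arrange(session, .by_group = TRUE) %>%
  mutate(
    alliance_cw = alliance - mean(alliance, na.rm = TRUE),   # person-mean centered X
    distress_cw = distress - mean(distress, na.rm = TRUE),   # person-mean centered Y
    alliance_cw_lag = dplyr::lag(alliance_cw)                # X at time t-1
  ) %>%
  ungroup()

# Within-patient regression: fluctuations in Y regressed on lagged fluctuations in X
fit_fe <- lm(distress_cw ~ alliance_cw_lag, data = dat)
summary(fit_fe)
```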


Multilevel modeling (MLM). In MLM, person-level means in Y are not eliminated by mean centering, but rather modeled using a random intercept term, which captures the variance in means across individuals. Similar to mean centering, what remains is within-patient deviation scores in Y. However, in standard MLM it is not possible to estimate a random intercept for X. In order to acquire a true within-patient regression model, X still needs to be person-mean centered manually. The person-mean centered model (Firebaugh et al., 2013; Hoffman & Stawski, 2009) combines these two methods by using mean centering/fixed effects for X but a random effect for Y. It is one of the most commonly used within-patient effects models in psychotherapy research:

Level 1:

$Y_{i,t} = \beta_{0i} + \beta_1\,(X_{i,t-1} - \bar{X}_{i}) + \varepsilon_{i,t}$

Level 2:

$\beta_{0i} = \gamma_{00} + u_{0i}$

(2-level Person-mean centered model)

In this model, Level-1 refers to the within-patient level, i.e., repeated measurements across time, and Level-2 refers to the between-patient level. Between-person variation in Y is captured by the patient-level random intercept $u_{0i}$, while X is still person-mean centered. The estimate for coefficient $\beta_1$ should, however, be the same as in the fixed effects model. The advantage of the hybrid random effects model, compared to the fixed effects model, is that between-patient variance in Y is not eliminated but can be modeled and potentially explained using between-patient predictors. For instance, the Level-2 intercept $u_{0i}$ can, in principle, be modeled further by adding a random intercept on the therapist level, thus creating a three-level model with repeated measures nested within patients, which in turn are nested within therapists:

Level 1:

$Y_{T,i,t} = \beta_{0T,i} + \beta_1\,(X_{T,i,t-1} - \bar{X}_{T,i}) + \varepsilon_{T,i,t}$

Level 2:

$\beta_{0T,i} = \delta_{00T} + u_{0T,i}$

Level 3:

$\delta_{00T} = \gamma_{000} + v_{0T}$

(3-level Person-mean centered model)

In this model, we have added Level-3, which is the between-therapist level. $Y_{T,i,t}$ represents the outcome for therapist T with patient i at time t. X is centered at the patient level, as in the previous model. The only difference between this and the previous model is that the additional random effect $v_{0T}$ models the between-therapist variance among patient-level means; i.e., it is $u_{0i}$ from the previous model which is further divided into one between-therapist component ($v_{0T}$) and one within-therapist deviation component ($u_{0T,i}$).
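For readers who prefer R notation, the contrast between the 2-level and 3-level person-mean centered models can be sketched with lme4; this is purely illustrative (the study estimated its models in Mplus), and the data frame and variable names continue the hypothetical example above, with therapist as an assumed therapist identifier.

```r
# Hedged sketch: 2-level vs. 3-level person-mean centered models in lme4.
library(lme4)

# 2-level model: repeated measures (Level 1) nested within patients (Level 2)
m2 <- lmer(distress ~ alliance_cw_lag + (1 | id), data = dat, REML = FALSE)

# 3-level model: adds a therapist-level random intercept (v0T), splitting the
# patient-mean variance into between-therapist and within-therapist components
m3 <- lmer(distress ~ alliance_cw_lag + (1 | therapist/id), data = dat, REML = FALSE)

summary(m2)
summary(m3)
```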

The consensus in psychotherapy research suggests that ignoring a level of nesting in the model (in our case, therapists) can lead to bias in standard errors, resulting in too liberal testing (e.g., Crits-Christoph & Mintz, 1991; Wampold & Serlin, 2000). Although usually not explicitly stated, this refers to situations in which the ignored level is the one immediately "above" the one on which the effect of interest is located, e.g., ignoring therapist effects (Level-3) when focusing on differential outcome among patients (a Level-2 effect). However, in the case of Level-1 effects, Moerbeek (2004) argued that in balanced designs these are unaffected by the omission of Level-3 nesting, because variance components at Level-3 will only be redistributed at Level-2, leaving Level-1 unaffected. Thus, leaving out Level-3 will affect standard errors at Level-2, but not at Level-1. Given that psychotherapy studies are almost invariably unbalanced, i.e., in our case, when therapists treat unequal numbers of patients, this finding may not apply. Thus, there is a need to evaluate the impact of ignoring therapist effects in within-person mechanisms of change studies.

So far, we have only considered therapist effects in the form of average differences in the predictor and/or outcome variable among therapists. However, there is also the possibility that the within-patient coefficient itself (i.e., $\beta_1$) may vary among therapists. For instance, in alliance-outcome research, it is possible that the lagged effect of alliance on symptomatic distress in the next session will be stronger for some therapists than for others, perhaps because some therapists are more able to utilize the alliance in an effective way. Alternatively, the effect of challenging cognitions on subsequent depressive symptoms may be more pronounced in therapists working within a cognitive model than in therapists working within a psychodynamic model of depression. The following equations show the hybrid random effects model with a therapist-level random slope:

Level 1:

$Y_{T,i,t} = \beta_{0T,i} + \beta_{1T,i}\,(X_{T,i,t-1} - \bar{X}_{T,i}) + \varepsilon_{T,i,t}$

Level 2:

$\beta_{0T,i} = \delta_{00T} + u_{0T,i}$
$\beta_{1T,i} = \delta_{01T} + u_{1T,i}$

Level 3:

$\delta_{00T} = \gamma_{000} + v_{0T}$
$\delta_{01T} = \gamma_{001} + v_{1T}$

(3-level Person-mean centered model with random slopes)

In this model we have allowed the slope of the within-patient coefficient $\beta_1$ to vary across therapists and patients. The variation in slopes across patients, within therapists, is captured by the term $u_{1T,i}$, and the variation in slopes across therapists is captured by the term $v_{1T}$. If the above is the true population model, and estimation is carried out using a fixed coefficient model, results may be biased (Baird & Maxwell, 2016). To our knowledge, there are no studies testing whether ignoring therapist-level random slopes affects estimates of within-person effects.
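Continuing the lme4 sketch above (again illustrative only; the study used Mplus), random slopes for the within-patient coefficient at both the patient and the therapist level could be specified as follows, with the same hypothetical variable names.

```r
# Hedged sketch: therapist- and patient-level random slopes for the lagged
# within-patient coefficient (variable names are hypothetical).
library(lme4)

m3_rs <- lmer(
  distress ~ alliance_cw_lag +
    (1 + alliance_cw_lag | therapist) +      # therapist-level intercept and slope (v0T, v1T)
    (1 + alliance_cw_lag | therapist:id),    # patient-level intercept and slope (u0T,i, u1T,i)
  data = dat, REML = FALSE
)
summary(m3_rs)
```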

Limitations of MLM estimation of lagged within-person effects. So far, we have considered only a fairly simple lagged effect model within an MLM framework. This model ignores several complexities that may be present in time-series or panel data frameworks, some of which can be addressed within an SEM framework. First, there is the issue of autoregression, i.e., the influence of the outcome variable at time t on itself at time t+1. This effect needs to be adjusted for; otherwise we need to assume that there are no pre-existing differences in Y at time t. Adjusting for the prior value of Y means that lagged X predicts residualized change in Y, thus ensuring correct temporality, i.e., that the prediction of Y at time t+1 from X at time t is not purely a consequence of pre-existing differences in Y at time t. Within the standard MLM framework, it is complicated to estimate autoregression while simultaneously separating within- from between-patient variance. If a lagged dependent variable is added as a covariate, the estimated model coefficients will be biased. This is variously called "dynamic panel bias" or "Nickell's bias" (Nickell, 1981). It is possible to allow an autoregressive structure of the residuals, but this model still does not fully take into account the dynamic implications of the time-series data structure (Asparouhov & Muthén, 2019).

Another limitation of the standard MLM framework is that it assumes unidirectionality and thus does not include reverse effects, i.e., "feedback effects," from the dependent variable back to the predictor (i.e., from Y to X). In studies of working alliance and psychological distress, for instance, it is now fairly well established that these variables mutually influence each other over time, so that higher alliance scores in one session predict lower distress in the next session, which in turn predicts higher alliance quality, and so on (e.g., Falkenström et al., 2013; Xu & Tracey, 2015; Zilcha-Mano & Errázuriz, 2015). In the SEM framework, this issue can be addressed through modeling of bidirectional relationships in cross-lagged panel models (e.g., Hamaker et al., 2015).

Structural Equation Modeling. In SEM, several equations are estimated simultaneously, with data in wide format (i.e., one row per patient in a data frame), where each variable and each wave of data has its own equation, e.g.:

$Y_{i,t2} = \beta_0 + \beta_1 X_{i,t1} + \beta_2 Y_{i,t1} + \varepsilon_1$

$Y_{i,t3} = \beta_0 + \beta_1 X_{i,t2} + \beta_2 Y_{i,t2} + \varepsilon_2$

(Unidirectional Cross-lagged panel model)

In these equations, $\beta_0$ is the intercept, while $\beta_1$ is the coefficient for the cross-lagged effect of X on Y.2 In addition, we have here incorporated the autoregressive effect, which is captured in $\beta_2$. The above equations only incorporate three waves, but additional waves can easily be added.

To model the reverse effects, i.e., the feedback from Y to X, we add the following equations:

$X_{i,t2} = \gamma_0 + \gamma_1 Y_{i,t1} + \gamma_2 X_{i,t1} + \varepsilon_3$

$X_{i,t3} = \gamma_0 + \gamma_1 Y_{i,t2} + \gamma_2 X_{i,t2} + \varepsilon_4$

(Bidirectional Cross-lagged panel model)

Here, $\gamma_0$ is the intercept, $\gamma_1$ is the coefficient for the cross-lagged effect of Y on X, and $\gamma_2$ is the autoregression coefficient for X. This model, with all the above equations estimated simultaneously, is called the cross-lagged panel model. Usually, the error terms at the same time-point are allowed to covary (i.e., $\varepsilon_1$ and $\varepsilon_3$, $\varepsilon_2$ and $\varepsilon_4$, and so on) to capture contemporaneous relationships between variables.
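A bidirectional cross-lagged panel model of this kind can be written compactly in lavaan syntax. The sketch below is illustrative only (the study used Mplus) and assumes a hypothetical wide-format data frame dat_wide with columns y1-y3 and x1-x3; the labels b1, b2, g1, and g2 constrain the coefficients to be equal across waves, as described in Note 2.

```r
# Hedged sketch of a three-wave bidirectional cross-lagged panel model in lavaan.
library(lavaan)

clpm <- '
  # cross-lagged and autoregressive paths, constrained equal across waves
  y2 ~ b1*x1 + b2*y1
  y3 ~ b1*x2 + b2*y2
  x2 ~ g1*y1 + g2*x1
  x3 ~ g1*y2 + g2*x2

  # contemporaneous (residual) covariances between X and Y at waves 2 and 3
  y2 ~~ x2
  y3 ~~ x3
'

fit_clpm <- sem(clpm, data = dat_wide)
summary(fit_clpm, standardized = TRUE)
```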

In the classical cross-lagged panel model, within- and between-patient effects are conflated. The separation of within- and between-patient variances in the dependent variable is accomplished in both MLM and SEM through the inclusion of random intercept terms. This turns SEM into Multilevel SEM (ML-SEM; e.g., Mehta & Neale, 2005), which will be represented by the addition of the person-specific random intercept $u_{0i}$. This is shown in the following equation (now using general rather than time-specific notation as above):

$Y_{i,t} = \beta_0 + \beta_1 X_{i,t-1} + \beta_2 Y_{i,t-1} + u_{0i} + \varepsilon_{0i,t}$

The difference from standard MLM is that in ML-SEM it is possible to add a random intercept term for X:

$X_{i,t} = \gamma_0 + \gamma_1 Y_{i,t-1} + \gamma_2 X_{i,t-1} + u_{1i} + \varepsilon_{1i,t}$

Similar to person-mean centering in MLM, the inclusion of a random intercept term for X separates the person-specific means, which are captured in the random intercept, from the time-specific deviations from the person's mean that are used for the cross-lagged model. The above model is called the Random Intercept Cross-Lagged Panel Model (RI-CLPM; Hamaker et al., 2015). The random intercepts $u_{0i}$ and $u_{1i}$ are allowed to covary, since it is likely that if there is a relationship between X and Y at the within-person level, there will also be a relationship between X and Y at the between-person level. Because the between and within levels are separated by means of random effects in both X and Y, the problem of modeling autoregression that is present in MLM does not occur in SEM (Allison et al., 2017). Again, however, adding a third-level random intercept will simply model the between-therapist variances in patient-level means; i.e., $u_{0i}$ and $u_{1i}$ are further separated into between- and within-therapist components:

$Y_{T,i,t} = \beta_0 + \beta_1 X_{T,i,t-1} + \beta_2 Y_{T,i,t-1} + u_{0T,i} + v_{00T} + \varepsilon_{1T,i,t}$

$X_{T,i,t} = \gamma_0 + \gamma_1 Y_{T,i,t-1} + \gamma_2 X_{T,i,t-1} + u_{1T,i} + v_{01T} + \varepsilon_{2T,i,t}$

(RI-CLPM with random intercepts for therapists)

Theoretically, the within-patient estimates for $\beta_1$ and $\gamma_1$ should be unaffected, since $v_{00T}$ and $v_{01T}$ only model the variances in $u_{0T,i}$ and $u_{1T,i}$, respectively.
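The RI-CLPM itself can likewise be sketched in lavaan (for illustration only; the study estimated it in Mplus). The specification below follows the commonly published lavaan form of this model, again with hypothetical wide-format columns y1-y3 and x1-x3 in dat_wide, and with cross-lagged and autoregressive paths constrained equal over time as in the study.

```r
# Hedged sketch of a three-wave RI-CLPM in lavaan (illustration only).
library(lavaan)

ri_clpm <- '
  # Between-person part: random intercepts for Y and X
  RIy =~ 1*y1 + 1*y2 + 1*y3
  RIx =~ 1*x1 + 1*x2 + 1*x3

  # Within-person part: wave-specific deviations from the person means
  wy1 =~ 1*y1 ; wy2 =~ 1*y2 ; wy3 =~ 1*y3
  wx1 =~ 1*x1 ; wx2 =~ 1*x2 ; wx3 =~ 1*x3

  # Cross-lagged and autoregressive effects, constrained equal over time
  wy2 ~ b1*wx1 + b2*wy1
  wy3 ~ b1*wx2 + b2*wy2
  wx2 ~ g1*wy1 + g2*wx1
  wx3 ~ g1*wy2 + g2*wx2

  # Wave-1 covariance and later residual covariances between X and Y
  wy1 ~~ wx1 ; wy2 ~~ wx2 ; wy3 ~~ wx3

  # Variances and covariance of the random intercepts
  RIy ~~ RIy ; RIx ~~ RIx ; RIy ~~ RIx

  # (Residual) variances of the within-person components
  wy1 ~~ wy1 ; wy2 ~~ wy2 ; wy3 ~~ wy3
  wx1 ~~ wx1 ; wx2 ~~ wx2 ; wx3 ~~ wx3
'

fit_ri <- lavaan(ri_clpm, data = dat_wide, meanstructure = TRUE, int.ov.free = TRUE)
summary(fit_ri, standardized = TRUE)
```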

Limitations of SEM estimation of lagged within-person effects. As in MLM, there is the possibility that $\beta_1$ (or any other within-level coefficient) varies among patients and/or therapists. Estimating random slope models in ML-SEM is possible, just as in standard MLM, although this requires numerical integration, which is computationally burdensome and time-consuming, and may become intractable when the number of integration dimensions becomes large (Asparouhov & Muthén, 2007).

The Simulation Study

The aim of this Monte Carlo simulation study was to test the impact of ignoring the nesting of patients within therapists in within-patient analyses, under various conditions reasonably representative of psychotherapy research. The focus of this analysis was $\beta_1$ in the previous equations, i.e., the within-patient cross-lagged effect of X on Y. A recent study of therapist effects in outcome research suggested bias when ignoring therapist effects in samples with few therapists, many patients per therapist, a large proportion of variance at the therapist level (relative to the within-patient residual variance), and unbalanced allocation of patients to therapists (Magnusson et al., 2018). Thus, we decided to experimentally manipulate the following: (a) degree of therapist-level nesting; (b) number of therapists; (c) number of patients per therapist; (d) MLM and SEM models; and (e) balanced and unbalanced allocation of patients to therapists. In addition, we tested the impact of ignoring therapist-level random slopes for the within-patient coefficient under (1) different total slope variances and (2) different proportions of slope variance at the therapist level, as well as conditions (a)-(d) above.

Models and Software

We created the datasets and estimated the MLM simulations using the Person-mean centered model and the SEM simulations using the RI-CLPM (for equations, see above), using Mplus v.8.1.7 (Muthén & Muthén, 1998-2017) and the MplusAutomation package in R (Hallquist & Wiley, 2018). All models were estimated using Maximum Likelihood with robust standard errors.

Fixed Conditions

We set the effect size (standardized regression coefficient) for the cross-lagged effect to β = .30. In addition, we also tested β = 0 to see whether the models correctly indicate non-significance when there is no effect in the population, and to calculate the empirical alpha level. For the RI-CLPM, we set the autoregressive effects $\beta_2$ and $\gamma_2$ to .50 and the contemporaneous error correlations to .20.

Experimental Conditions for Data Generation

Degree of nesting. We included four conditions for the Intraclass Correlation Coefficient (ICC) at the therapist level: (a) ICC = 0, i.e., no nesting at the therapist level; (b) ICC = .05, the average ICC reported in meta-analyses of therapist effects in psychotherapy research (e.g., Baldwin & Imel, 2013); (c) ICC = .20; and (d) ICC = .40. Although an ICC of .40 may seem large, this ICC size may be found when therapists differ in their self-report style on change in mechanisms (Hatcher et al., 2019). In all simulations, we assigned 50% of variance to Level-1, the within-patient (repeated measures) level, while splitting the remaining 50% of variance between Level-2 and Level-3 depending on the therapist-level ICC. For example, when the therapist-level ICC was set to .05, Level-3 was assigned 5% and Level-2 45% of total variance, while when the ICC was set to .40, Level-3 was assigned 40% and Level-2 10%.
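A small R sketch may clarify how the ICC conditions translate into level-specific variances and nested random intercepts. The total variance of 1, the use of rnorm, and the example sizes are illustrative assumptions; the actual data were generated in Mplus.

```r
# Hedged sketch: translating the therapist-level ICC into level variances
# (total variance fixed to 1 for illustration).
icc_therapist <- c(0, .05, .20, .40)
var_level1 <- 0.50                    # within-patient (repeated measures) variance
var_level3 <- icc_therapist           # therapist-level variance
var_level2 <- 0.50 - icc_therapist    # patient-level variance
data.frame(ICC = icc_therapist, Level1 = var_level1,
           Level2 = var_level2, Level3 = var_level3)

# Drawing nested random intercepts for, e.g., 20 therapists with 5 patients each
set.seed(1)
n_ther <- 20; n_pat <- 5; icc <- .20
u_ther <- rnorm(n_ther, 0, sqrt(icc))                  # therapist effects (v0T)
u_pat  <- rnorm(n_ther * n_pat, 0, sqrt(.50 - icc))    # patient deviations (u0T,i)
patient_means <- rep(u_ther, each = n_pat) + u_pat     # patient-specific means
```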

Sample size. We created datasets with five repeated measurements, nested within 50, 100, or 200 patients, to reflect fairly realistic scenarios in psychotherapy studies (e.g., Lambert, 2013).

Number of therapists and number of patients per therapist. Each sample size was created with a different number of therapists and a different number of patients per therapist. Thus, N = 50 was either five therapists with 10 patients each or 10 therapists with five patients each; N = 100 was either five therapists with 20 patients each or 20 therapists with five patients each; and N = 200 was five therapists with 40 patients each, 20 therapists with 10 patients each, or 40 therapists with five patients each.

Unbalanced allocation. We created six conditions with unbalanced allocation of patients to therapists, as is common in psychotherapy studies. Within each of the three sample sizes we created two unbalanced conditions, one with relatively many therapists and one with relatively few therapists. We varied the numbers of patients treated by the therapists as much as possible, within reasonable limits. The conditions were the following:

N = 50. (a) Two therapists treating 15 patients each and ten therapists treating two patients each (total of 12 therapists); (b) One therapist treating 30 patients, two therapists treating eight patients each, and two therapists treating two patients each (total of five therapists).

N = 100. (a) Two therapists treating 20 patients each, four therapists treating ten patients each, and ten therapists treating two patients each (total of 16 therapists); (b) Two therapists treating 25 patients each, two therapists treating 15 patients each, and two therapists treating 10 patients each (total of six therapists).

N = 200. (a) Three therapists treating 25 patients each, five therapists treating 15 patients each, and ten therapists treating five patients each (total of 18 therapists); (b) Two therapists treating 50 patients each, three therapists treating 30 patients each, and two therapists treating five patients each (total of seven therapists).

Random slopes for the cross-lagged effect. We chose large values for the slope variances to create conditions where potential problems can emerge while remaining within reasonably realistic conditions. Since the fixed effect was set to a standardized regression coefficient of .30, a therapist-level standard deviation of 0.15 would mean that approximately 95% (corresponding to about 2 standard deviation units) of the therapists had effects that were larger than zero. With larger variation than that, a substantial proportion of patients would have zero or even negative effects of the mechanism on outcome, which seems unlikely. Thus, we chose SD = 0.15 as our upper boundary for the total between-therapist variation in slopes. We also examined a condition of SD = 0.10, to test a somewhat smaller variation in slopes. In the next step, we included the Slope Intraclass Correlation, or Slope ICC (Magnusson et al., 2018), which represents the therapists' contribution to the total slope variance. The Slope ICC is calculated as the therapist-level variance in slopes divided by the total variance in slopes (i.e., therapist + patient level variances). Since meta-analytic evidence has shown therapist effects on outcome to be between 5-10% (Baldwin & Imel, 2013), we set the Slope ICC to be either .05, .10, or .20.
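Written as a formula, and using the notation of the random slope model above, the Slope ICC is simply the therapist-level share of the total slope variance:

$\mathrm{ICC}_{\mathrm{slope}} = \dfrac{\operatorname{Var}(v_{1T})}{\operatorname{Var}(v_{1T}) + \operatorname{Var}(u_{1T,i})}$

where $\operatorname{Var}(v_{1T})$ is the therapist-level and $\operatorname{Var}(u_{1T,i})$ the patient-level variance in slopes.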

Estimation and Assessment of Estimator Performance

In each condition, we created 2000 samples and subsequently analyzed the data using the Person-mean centered MLM or the RI-CLPM, in one of two formats: (1) a 2-level model which ignored therapist-level nesting and (2) a 3-level model which took therapist-level nesting into account. The difference between these two models is thus whether the v variables were included or not, i.e., $v_{0T}$/$v_{1T}$ in the Person-mean centered MLM and $v_{00T}$/$v_{01T}$ in the RI-CLPM. We assessed each model's estimates for:

(1) Proportional coefficient bias. Calculated as the difference between the average estimate of β and the true population value for β, divided by the true population value for β, i.e., $(\hat{\beta} - \beta)/\beta$. This represents the percentage average bias from the true coefficient value. A rule of thumb for unbiased models is that this should not exceed 5% (Muthén & Muthén, 2002). For the models with true population value β = 0, coefficient bias cannot be calculated as a proportion since that would involve division by zero. Instead, we report the average deviation score (which will be the average estimate of β).


(2) Coverage of the 95% confidence intervals. This is calculated as the proportion of times the estimated 95% confidence interval of β contains the true effect. Coverage is considered a robust indicator of standard error estimation (T. Asparouhov, personal communication, January 2nd, 2020). Coverage should be close to 95%, and a rule of thumb is that it should be between 91 and 98% (Muthén & Muthén, 2002).

(3) Statistical power. This is calculated as the proportion of times the model yielded a statistically significant effect when the population effect was non-zero. As is common in the behavioral sciences, we used 80% power as the criterion for large-enough statistical power (Cohen, 1992).

(4) Empirical alpha level. In models in which the population effect of β was set to zero, the proportion of times the model yielded a false-positive statistically significant effect can be interpreted as the empirical alpha level. This should be close to 5%.
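These four measures can be computed from the replication output with a few lines of R. The sketch below is illustrative, and the vectors est and se (one entry per replication) and the object sim are hypothetical names, not output produced by the study's scripts.

```r
# Hedged sketch: estimator performance measures across replications.
performance <- function(est, se, beta_true, alpha = .05) {
  z_crit <- qnorm(1 - alpha / 2)
  lower  <- est - z_crit * se
  upper  <- est + z_crit * se
  sig    <- abs(est / se) > z_crit               # two-sided Wald test per replication

  list(
    prop_bias      = if (beta_true != 0) (mean(est) - beta_true) / beta_true else NA,
    mean_deviation = mean(est) - beta_true,      # reported instead when beta_true = 0
    coverage       = mean(lower <= beta_true & beta_true <= upper),
    power_or_alpha = mean(sig)                   # power if beta_true != 0, empirical alpha if beta_true = 0
  )
}

# Example call with hypothetical simulation output:
# performance(est = sim$beta_hat, se = sim$beta_se, beta_true = .30)
```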

Results

Overall, results of the MLM and SEM models were similar across conditions (instances of differences are described below). Model convergence rates were good, with 100% of MLM models and between 97.8% and 100% of SEM models converging, with the exception of the 3-level SEM models with random slopes, for which none of the estimations converged. The complete results are summarized in Tables I–V.3

Balanced Allocation of Patients to Therapists (Table I and Tables S2-S3)

Degree of therapist-level nesting (ICC). With balanced assignment of patients to therapists, ignoring therapist-level nesting did not impact estimated within-patient coefficients. In MLM, with manually patient-mean centered predictor variables, the estimates were identical for conditions of ICC = .00, .05, .20, or .40 (Table I). In SEM, where centering is accomplished by latent variable modeling, estimates were highly similar but not identical (Tables S2 and S3).

Number of therapists. The 3-level models showed poor performance in the estimation of standard errors when the number of therapists was small. For the 3-level model, at least 10 therapists were needed to achieve accurate estimation of standard errors, as shown by coverage rates falling below 91% and alpha levels increasing to approximately 15% in the conditions with only five therapists (Table I, Table S2, and Table S3). Thus, estimating a 3-level model with too few therapists increases the risk of Type-I error. In contrast, the 2-level model, which ignores therapist-level nesting, showed no problems regardless of the number of therapists.

Number of patients per therapist. Overall, the number of patients per therapist did not impact the performance of either 2- or 3-level models for MLM and SEM (see Tables I, S2, and S3).

Unbalanced Allocation of Patients to Therapists (Tables II, IV, and S4)

For both MLM (Table II) and SEM (Tables IV and S4), the 2-level model was unaffected by unbalanced allocation, while the 3-level model showed poorer performance in terms of bias in standard errors compared to the balanced conditions. Coverage rates for the 3-level models were below the 91% threshold in almost all conditions, and alpha levels ranged between 9 and 24%.

Degree of therapist-level nesting (ICC). Results were comparable to the balanced conditions, such that estimates did not depend on the degree of nesting (for MLM, exactly identical results across all ICC levels; for SEM, similar but not identical results).

Number of therapists. In the 3-level models, results were similar to the balanced conditions, such that the largest bias was seen in conditions with few therapists. In contrast, the 2-level models were unaffected by unbalanced assignment, regardless of the number of therapists.

Number of patients per therapist. Since the number of patients per therapist varied within each condition (by definition, in order to create unbalanced data), it is not possible to make inferences with certainty regarding results under this condition.

Varying Coefficients among Therapists (Random Slope Models; Tables III, V, and S1)

We created data for which the slopes for the Level-1 coefficient varied at both Level-2 (patients) and Level-3 (therapists). The 2-level estimation models were the same as previously, i.e., assuming a fixed Level-1 coefficient and thus ignoring random slopes altogether,4 while the 3-level model estimated random slopes on both Level-2 and Level-3. Overall, results were mostly unbiased in the fixed coefficient (2-level) model, in both MLM (Tables III and S1) and SEM (Table V). Of note, none of the 3-level SEM models converged. In MLM, 3-level models converged, but there were problems in the estimation of standard errors in some of the conditions.


Table I. Simulation results for Person-mean centered model with balanced assignment of patients to therapists at ICC = 0, .05, .20, and .40.a
Columns 3-8 show 2-level estimates; columns 9-14 show 3-level estimates.

β = .30

| NL2 | NL3 | Est.b | Biasc | Powerd | SDe | SEf | Coverg | Est.b | Biasc | Powerd | SDe | SEf | Coverg |
|-----|-----|-------|-------|--------|-----|-----|--------|-------|-------|--------|-----|-----|--------|
| 50  | 5   | 0.298 | −0.7% | 95.6%  | 0.081 | 0.076 | 93.3% | 0.298 | −0.7% | 96.6%  | 0.081 | 0.066 | 83.0% |
| 50  | 10  | 0.298 | −0.6% | 96.2%  | 0.078 | 0.076 | 93.7% | 0.298 | −0.6% | 96.2%  | 0.078 | 0.071 | 89.9% |
| 100 | 5   | 0.297 | −1.0% | 100.0% | 0.055 | 0.055 | 95.0% | 0.297 | −1.0% | 99.9%  | 0.055 | 0.046 | 84.0% |
| 100 | 20  | 0.297 | −0.9% | 99.9%  | 0.056 | 0.055 | 94.2% | 0.297 | −0.9% | 100.0% | 0.056 | 0.053 | 92.1% |
| 200 | 5   | 0.299 | −0.2% | 100.0% | 0.039 | 0.039 | 95.1% | 0.299 | −0.2% | 100.0% | 0.039 | 0.033 | 84.8% |
| 200 | 20  | 0.300 | −0.1% | 100.0% | 0.039 | 0.039 | 95.0% | 0.300 | −0.1% | 100.0% | 0.039 | 0.038 | 93.4% |
| 200 | 40  | 0.299 | −0.3% | 100.0% | 0.040 | 0.039 | 93.8% | 0.299 | −0.3% | 100.0% | 0.040 | 0.038 | 92.4% |

β = 0

| NL2 | NL3 | Est.b | Biasc | Alphah | SDe | SEf | Coverg | Est.b | Biasc | Alphah | SDe | SEf | Coverg |
|-----|-----|-------|-------|--------|-----|-----|--------|-------|-------|--------|-----|-----|--------|
| 50  | 5   | −0.001 | – | 6.9% | 0.084 | 0.080 | 93.0% | −0.001 | – | 16.0% | 0.084 | 0.068 | 84.0% |
| 50  | 10  | −0.001 | – | 6.4% | 0.083 | 0.080 | 93.7% | −0.001 | – | 11.2% | 0.083 | 0.074 | 88.8% |
| 100 | 5   | −0.003 | – | 5.8% | 0.058 | 0.057 | 94.2% | −0.003 | – | 15.6% | 0.058 | 0.049 | 84.4% |
| 100 | 20  | −0.003 | – | 6.0% | 0.060 | 0.057 | 94.1% | −0.003 | – | 8.0%  | 0.060 | 0.055 | 92.0% |
| 200 | 5   | −0.001 | – | 4.6% | 0.040 | 0.041 | 95.4% | −0.001 | – | 15.0% | 0.040 | 0.035 | 85.0% |
| 200 | 20  | −0.001 | – | 4.5% | 0.041 | 0.041 | 95.5% | −0.001 | – | 6.4%  | 0.041 | 0.039 | 93.7% |
| 200 | 40  | −0.001 | – | 6.3% | 0.042 | 0.041 | 93.7% | −0.001 | – | 7.3%  | 0.042 | 0.040 | 92.7% |

a Since estimates were exactly identical for all ICC levels, only one set of results is presented in the table.
b Average estimate of β over 2000 replications.
c Calculated as (β̂ − β)/β.
d Proportion of 2000 replications yielding a significant estimate when β = .30.
e Standard deviation of β over 2000 replications (used as estimate of the population standard error).
f Average of estimated standard errors over 2000 replications.
g Proportion of 2000 replications when 95% confidence intervals contain the population value of β.
h Proportion of 2000 replications yielding a significant estimate when β = 0.

Table II. Simulation results for Person-mean centered model with unbalanced assignment of patients to therapists at ICC = 0, .05, .20, and .40.

2-level estimates 3-level estimates

β = .30 β SE β SE

NL2 NL3 Est.b Biasc Powerd SDe Est.f Coverg Est.b Biasc Powerd SDe Est.f Coverg

50 5 0.297 −0.9% 96.1% 0.080 0.076 93.7% 0.297 −0.9% 97.5% 0.080 0.055 74.7% 50 10 0.298 −0.6% 96.7% 0.078 0.076 93.9% 0.298 −0.6% 96.9% 0.078 0.067 87.7% 100 5 0.298 −0.8% 100.0% 0.055 0.055 94.4% 0.298 −0.8% 99.9% 0.055 0.047 84.7% 100 16 0.297 −1.1% 100.0% 0.056 0.055 94.0% 0.297 −1.1% 99.9% 0.056 0.050 89.0% 200 7 0.299 −0.4% 100.0% 0.039 0.039 94.9% 0.299 −0.4% 100.0% 0.039 0.033 85.8% 200 18 0.300 −0.1% 100.0% 0.039 0.039 94.7% 0.300 −0.1% 100.0% 0.039 0.036 90.8% β = 0 Est.b Biasc Alphah SDe Est.f Coverg Est.b Biasc Alphah SDe Est.f Coverg

50 5 −0.003 – 6.2% 0.082 0.080 93.8% −0.0025 – 24.0% 0.082 0.057 76.0% 50 10 −0.002 – 6.4% 0.082 0.080 93.6% −0.0021 – 12.9% 0.082 0.070 87.1% 100 5 −0.002 – 5.4% 0.058 0.057 94.7% −0.0018 – 14.7% 0.058 0.049 85.3% 100 16 −0.003 – 6.4% 0.059 0.057 93.6% −0.0032 – 11.3% 0.059 0.052 88.7% 200 7 −0.001 – 6.0% 0.041 0.041 94.0% −0.0014 – 15.8% 0.041 0.035 84.2% 200 18 0.000 – 5.4% 0.041 0.041 94.6% −0.0001 – 8.6% 0.041 0.038 91.4% a

a Since estimates were exactly identical for all ICC levels, only one set of results is presented in the table.
b Average estimate of β over 2000 replications.
c Calculated as (β̂ − β)/β.
d Proportion of 2000 replications yielding a significant estimate when β = .30.
e Standard deviation of β over 2000 replications (used as estimate of the population standard error).
f Average of estimated standard errors over 2000 replications.
g Proportion of 2000 replications when 95% confidence intervals contain the population value of β.
h Proportion of 2000 replications yielding a significant estimate when β = 0.

Table III. Simulation results for Person-mean centered model with random slopes for the cross-lagged effect at ICCslope = .05, .10, and .20.

2-level estimates 3-level estimates

ICCslope= .05 β SE

β SE

NL2 NL3 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.299 −0.2% 94.8% 0.083 0.081 93.3% 0.300 −0.1% 91.4% 0.083 0.087 85.9% 50 10 0.298 −0.8% 95.1% 0.082 0.081 94.1% 0.298 −0.8% 89.6% 0.083 0.111 91.9% 100 5 0.299 −0.4% 99.8% 0.061 0.058 92.7% 0.299 −0.3% 98.9% 0.061 0.055 84.0% 100 20 0.299 −0.2% 100.0% 0.060 0.058 94.1% 0.299 −0.3% 98.9% 0.059 0.063 93.2% 200 5 0.299 −0.2% 100.0% 0.045 0.041 92.8% 0.299 −0.2% 99.8% 0.045 0.038 84.2% 200 20 0.300 0.1% 100.0% 0.042 0.041 94.4% 0.300 0.1% 99.8% 0.042 0.041 93.2% 200 40 0.301 0.4% 100.0% 0.042 0.041 93.9% 0.301 0.4% 99.9% 0.042 0.042 93.8%

ICCslope= .10 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.299 −0.2% 94.5% 0.084 0.081 92.9% 0.300 −0.1% 91.0% 0.084 0.119 85.9% 50 10 0.298 −0.7% 94.9% 0.083 0.081 93.9% 0.298 −0.7% 89.6% 0.083 0.164 92.0% 100 5 0.299 −0.4% 99.8% 0.063 0.058 92.4% 0.299 −0.3% 98.5% 0.063 0.058 84.8% 100 20 0.299 −0.2% 100.0% 0.060 0.058 94.2% 0.299 −0.2% 99.1% 0.060 0.061 93.1% 200 5 0.299 −0.3% 100.0% 0.048 0.041 91.0% 0.299 −0.3% 99.9% 0.047 0.039 85.2% 200 20 0.300 0.1% 100.0% 0.043 0.041 94.2% 0.300 0.1% 99.8% 0.043 0.044 93.2% 200 40 0.301 0.4% 100.0% 0.042 0.041 93.6% 0.301 0.4% 100.0% 0.043 0.042 93.8%

ICCslope= .20 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.300 −0.2% 94.2% 0.087 0.081 92.4% 0.300 −0.1% 89.2% 0.087 0.100 85.8% 50 10 0.298 −0.5% 94.7% 0.084 0.081 93.8% 0.298 −0.5% 89.3% 0.084 0.105 92.2% 100 5 0.299 −0.3% 99.7% 0.066 0.058 91.0% 0.299 −0.3% 98.5% 0.066 0.063 84.7% 100 20 0.300 −0.2% 100.0% 0.061 0.058 93.8% 0.299 −0.2% 98.7% 0.060 0.066 93.3% 200 5 0.299 −0.4% 100.0% 0.052 0.041 87.7% 0.299 −0.4% 99.8% 0.052 0.044 85.3% 200 20 0.300 0.1% 100.0% 0.044 0.041 93.5% 0.300 0.1% 99.8% 0.044 0.043 93.0% 200 40 0.301 0.4% 100.0% 0.043 0.041 93.3% 0.301 0.4% 99.9% 0.043 0.042 93.7% a

a Average estimate of β over 2000 replications.
b Calculated as (β̂ − β)/β.
c Proportion of 2000 replications yielding a significant estimate when β = .30.
d Standard deviation of β over 2000 replications (used as estimate of the population standard error).
e Average of estimated standard errors over 2000 replications.
f Calculated as (SE − SD)/SD.
g Proportion of 2000 replications when 95% confidence intervals contain the population value of β.


Total slope variance. In conditions with larger total slope variance (SD = 0.15, compared to SD = 0.10), the 2-level model showed slightly worse coverage rates (Tables III, S1, and V). For the 3-level model, the amount of total slope variance did not impact estimation performance. However, in both conditions the 2-level model performed better than the 3-level model.

Proportion of slope variance at the therapist level (ICCslope). For the 2-level model, a larger proportion of slope variance at the therapist level reduced coverage rates slightly. The 3-level model was almost unaffected by the increased slope ICC.

Number of therapists. In this condition, increasing the number of therapists improved estimates in both the 3-level and 2-level models. However, this effect was smaller for the 2-level model than the 3-level model.

Number of patients per therapist. In the 2-level model, a larger number of patients per therapist was related to worse estimation of standard errors. However, this effect was relatively small when evaluated by the coverage rate of the 95% confidence intervals. For the 3-level model, the number of patients per therapist did not affect estimation performance.

In sum, the 2-level model was superior to the 3-level model in all random slope conditions. The 2-level model's performance was reduced in the following conditions: (1) the variance in slopes at the therapist level was very large; (2) the number of therapists was small (< 10); (3) the number of patients per therapist was very large (40 or more). Nevertheless, even in its worst condition (slope SD = 0.15, slope ICC = .20, N = 200, and five therapists), the 2-level model still outperformed the 3-level model.

Table IV. Simulation results for Random Intercept Cross-Lagged Panel Model with unbalanced assignment of patients to therapists at ICC = 0, .05, .20, and .40 and β = .30.

2-level estimates 3-level estimates

ICC = 0

β SE β SE

NL2 NL3 Est. a

Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg 50 5 0.287 −4.3% 84.4% 0.097 0.093 93.3% 0.289 −3.8% 88.1% 0.095 20.887 78.0% 50 10 0.286 −4.6% 84.4% 0.095 0.093 93.8% 0.288 −4.1% 83.3% 0.094 11.554 87.8% 100 5 0.295 −1.7% 99.1% 0.064 0.065 95.1% 0.295 −1.6% 98.5% 0.064 0.057 87.0% 100 16 0.296 −1.5% 98.4% 0.066 0.065 94.3% 0.296 −1.4% 97.9% 0.066 0.061 90.2% 200 7 0.298 −0.7% 100.0% 0.044 0.045 95.6% 0.298 −0.6% 100.0% 0.044 0.039 86.4% 200 18 0.298 −0.8% 100.0% 0.044 0.045 96.1% 0.298 −0.8% 100.0% 0.044 0.043 92.7% ICC = .05 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.287 −4.2% 84.4% 0.097 0.093 93.2% 0.289 −3.8% 87.3% 0.096 0.370 78.0% 50 10 0.287 −4.5% 84.3% 0.094 0.093 94.0% 0.288 −4.0% 81.8% 0.095 >100 88.0% 100 5 0.295 −1.6% 99.0% 0.065 0.065 95.0% 0.295 −1.5% 97.9% 0.064 0.061 87.4% 100 16 0.296 −1.5% 98.6% 0.066 0.065 94.3% 0.296 −1.5% 97.2% 0.066 0.077 90.8% 200 7 0.298 −0.7% 100.0% 0.045 0.045 95.2% 0.298 −0.6% 100.0% 0.044 0.039 86.3% 200 18 0.298 −0.8% 100.0% 0.045 0.045 96.1% 0.298 −0.8% 99.8% 0.044 0.045 92.9% ICC = .20 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.288 −3.9% 85.2% 0.096 0.092 92.9% 0.290 −3.3% 88.5% 0.094 4.853 78.0% 50 10 0.288 −4.1% 84.9% 0.094 0.093 94.1% 0.289 −3.5% 84.1% 0.094 >100 87.7% 100 5 0.296 −1.5% 98.8% 0.065 0.065 94.8% 0.296 −1.2% 98.5% 0.063 >100 87.6% 100 16 0.296 −1.5% 98.4% 0.067 0.065 94.3% 0.296 −1.2% 97.6% 0.065 0.061 90.5% 200 7 0.298 −0.7% 100.0% 0.045 0.045 95.2% 0.299 −0.5% 99.9% 0.044 0.039 86.2% 200 18 0.298 −0.8% 100.0% 0.045 0.045 95.6% 0.298 −0.6% 100.0% 0.044 0.042 92.9% ICC = .40 Est.a Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.290 −3.3% 86.6% 0.095 0.091 93.3% 0.294 −2.2% 90.6% 0.090 20.178 77.2% 50 10 0.289 −3.6% 85.4% 0.093 0.092 94.0% 0.293 −2.2% 87.2% 0.091 >100 87.6% 100 5 0.296 −1.2% 99.1% 0.065 0.064 94.1% 0.298 −0.6% 99.0% 0.061 0.054 86.6% 100 16 0.296 −1.4% 98.6% 0.067 0.065 94.2% 0.298 −0.5% 98.9% 0.064 0.059 89.7% 200 7 0.298 −0.6% 100.0% 0.045 0.045 95.5% 0.300 −0.2% 99.9% 0.043 0.037 85.9% 200 18 0.298 −0.7% 100.0% 0.045 0.045 95.4% 0.299 −0.3% 100.0% 0.043 0.041 92.6%

a Average estimate of β over 2000 replications.
b Calculated as (β̂ − β)/β.
c Proportion of 2000 replications yielding a significant estimate when β = .30.
d Standard deviation of β over 2000 replications (used as estimate of the population standard error).
e Average of estimated standard errors over 2000 replications.
f Calculated as (SE − SD)/SD.
g Proportion of 2000 replications when 95% confidence intervals contain the population value of β.
h Proportion of 2000 replications yielding a significant estimate when β = 0.


Table V. Simulation results for Random Intercept Cross-Lagged Panel Model with random slopes for the cross-lagged effect (2-level estimates).

Slope SD = 0.10 Slope SD = 0.15

ICCslope= .05 β SE β SE

NL2 NL3 Est.

a

Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.284 −5.3% 83.1% 0.100 0.093 92.7% 0.283 −5.8% 81.2% 0.103 0.094 92.2% 50 10 0.286 −4.8% 82.8% 0.102 0.093 92.4% 0.285 −5.1% 82.2% 0.105 0.094 91.6% 100 5 0.296 −1.3% 98.7% 0.068 0.065 93.8% 0.296 −1.5% 98.3% 0.071 0.066 92.5% 100 20 0.295 −1.6% 98.5% 0.068 0.065 94.0% 0.294 −1.9% 98.1% 0.070 0.066 93.8% 200 5 0.297 −0.9% 100.0% 0.048 0.046 94.2% 0.296 −1.2% 100.0% 0.050 0.046 92.6% 200 20 0.297 −1.0% 100.0% 0.047 0.046 94.7% 0.296 −1.3% 100.0% 0.049 0.046 93.8% 200 40 0.297 −1.0% 100.0% 0.047 0.046 94.7% 0.296 −1.3% 100.0% 0.049 0.046 93.7%

ICCslope= .10 Est. a

Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.284 −5.3% 82.7% 0.100 0.093 92.4% 0.283 −5.8% 80.8% 0.105 0.094 91.9% 50 10 0.286 −4.8% 82.7% 0.102 0.093 92.2% 0.284 −5.2% 82.5% 0.105 0.094 91.5% 100 5 0.296 −1.3% 98.7% 0.069 0.065 93.3% 0.296 −1.4% 98.1% 0.073 0.066 91.9% 100 20 0.295 −1.6% 98.5% 0.068 0.065 94.2% 0.294 −1.9% 98.2% 0.070 0.066 93.4% 200 5 0.297 −0.9% 100.0% 0.049 0.046 93.3% 0.296 −1.2% 100.0% 0.053 0.046 91.4% 200 20 0.297 −1.0% 100.0% 0.047 0.046 94.4% 0.296 −1.3% 100.0% 0.049 0.046 93.8% 200 40 0.297 −1.0% 100.0% 0.047 0.046 94.3% 0.296 −1.3% 100.0% 0.049 0.046 93.8%

ICCslope= .20 Est. a

Biasb Powerc SDd Est.e Coverg Est.a Biasb Powerc SDd Est.e Coverg

50 5 0.284 −5.4% 82.2% 0.102 0.093 92.0% 0.283 −5.7% 80.5% 0.107 0.094 91.4% 50 10 0.285 −4.9% 83.0% 0.102 0.093 92.2% 0.284 −5.3% 81.7% 0.106 0.094 91.1% 100 5 0.296 −1.2% 98.3% 0.070 0.065 92.7% 0.296 −1.3% 97.8% 0.0754 0.0658 91.2% 100 20 0.295 −1.7% 98.5% 0.068 0.065 94.2% 0.294 −1.9% 98.0% 0.0705 0.066 93.3% 200 5 0.297 −0.9% 100.0% 0.051 0.046 91.8% 0.296 −1.2% 100.0% 0.0567 0.0461 89.3% 200 20 0.297 −1.0% 100.0% 0.047 0.046 94.6% 0.296 −1.2% 100.0% 0.0502 0.0463 93.4% 200 40 0.297 −1.0% 100.0% 0.047 0.046 94.3% 0.296 −1.3% 100.0% 0.0493 0.0462 93.3%

a Average estimate of β over 2000 replications.
b Calculated as (β̂ − β)/β.
c Proportion of 2000 replications yielding a significant estimate when β = .30.
d Standard deviation of β over 2000 replications (used as estimate of the population standard error).
e Average of estimated standard errors over 2000 replications.
f Calculated as (SE − SD)/SD.
g Proportion of 2000 replications when 95% confidence intervals contain the population value of β.
h Proportion of 2000 replications yielding a significant estimate when β = 0.


Discussion

Our results suggest that therapist effects do not have a significant impact on within-patient estimates in mechanisms of change studies. Consistent with previous work (Moerbeek, 2004), therapist effects did not affect estimates in balanced design conditions (when all therapists treat the same number of patients) and with within-patient coefficients fixed to be the same across all therapists. This is the first study to expand this finding to various study conditions, including unbalanced designs (unequal assignment of patients to therapists) and variation in within-patient coefficients across therapists. We found that, if anything, inclusion of therapist effects in 3-level models in these designs reduced model performance compared to when therapist effects were ignored (i.e., 2-level models).

This is the first study to show that therapist effects can be ignored when within-patient effects (i.e., Level-1 effects) are the primary focus, in either MLM or SEM frameworks, across various conditions. Taking therapist effects into account did not improve model performance in any of the simulation conditions. When the number of therapists was small, 3-level models resulted in an increased risk of Type-I error. Although it is possible to improve estimation of MLM with small samples on the highest level, e.g., by using Restricted Maximum Likelihood with adjusted degrees of freedom (McNeish, 2017), this is seldom done in psychotherapy studies.

Previous simulation studies that focused on the study of treatment efficacy have shown that therapist effects should be modeled when investigating differential treatment effects, estimated at between-patient levels (Magnusson et al., 2018). Our results suggest that this conclusion does not apply to the study of within-patient effects (i.e., Level-1 effects in MLM terminology). This result challenges a common tendency to include therapist effects as a "better safe than sorry" strategy in all psychotherapy studies.

Our findings also suggest that ignoring therapist-level random slopes does not, generally, affect coefficient estimates (which represent a weighted average across therapists). An exception was the SEM models with very small samples (N = 50), in which the coefficients were slightly downward biased (just above the 5% criterion). Standard errors were biased only if the slope variance at the therapist level was large, the number of therapists was small, and/or the number of patients per therapist was large. Still, 2-level models outperformed 3-level models even in the most stringent condition, likely uncommon in psychotherapy studies, with a slope SD = 0.15 (in the context of standardized regression coefficients), slope ICC = .20, and only five therapists who treat 40 patients each. Notably, the simulation study by Magnusson et al. (2018) found substantial bias when ignoring random slopes of similar magnitudes5 when studying the effects of ignoring therapist effects in outcome research (i.e., the effect of ignoring Level-3 on Level-2 coefficients).

Our simulations showed that statistical power to find significant effects was above 90% in all MLM conditions tested, and above 80% for all SEM conditions, even when the sample size was as small as N = 50. Although the effect size used in the simulations was fairly large (β = .30), this finding is still notable. The high statistical power of these models is most likely attributable to the additional information provided by repeated observations beyond sample size, i.e., with N = 50 and T = 5, there are 50 × 4 = 200 observations6 used to calculate the within-patient effect.

MLM and SEM showed similar findings, although with slightly higher bias and lower power for the SEM models. Considering the greater complexity of the SEM models used, the reduction in power was smaller than expected, suggesting good performance of these models. Notably, an important disadvantage of MLM compared to SEM is that it does not allow adjusting the cross-lagged coefficient estimate for the dependent variable at the prior session. Ignoring the autoregressive effect in panel data results in mis-specified models and, in turn, in an increased risk of biased estimates of the cross-lagged effects of interest.

Our findings should be considered in the context of several limitations. First, we were unable to estimate therapist-level random slope models in any of the SEM conditions tested. If a researcher is interested in the average effect of a putative mechanism (i.e., the within-patient effect), results from our 2-level models show that random slopes can be ignored as long as they are not very large, the number of therapists is not very small, and the number of patients per therapist is not very large. In practice, the size of random slopes is of course unknown, but if the other conditions (i.e., a small number of therapists and a large number of patients per therapist) are fulfilled, a researcher should consider whether large slope variance at the therapist level is expected. For our simulation study, we chose the maximum slope variance so that 95% of the slopes would not cross zero, because it is unlikely that any theoretically proposed mechanism in psychotherapy research would have opposite effects among therapists. However, if this is a possibility in a particular research context, our simulation results may not hold.

If therapist-level random slopes are the focus of research interest, SEM might not be the best modeling framework to use. As mentioned, estimating random slopes in SEM is complex and requires numerical integration, which is highly computationally demanding and time-consuming (Asparouhov & Muthén, 2007). Other estimators, such as Bayesian estimation, might be preferable but are beyond the scope of the current study. Additionally, it is likely that the performance of the 3-level MLM models with small numbers of therapists would have been improved with Restricted Maximum Likelihood, perhaps with some adjustment to degrees of freedom (McNeish, 2017). However, since the focus of our study was to investigate whether therapist effects impact within-patient estimates in 2-level models rather than to test the performance of 3-level models in small samples, we did not pursue this further.

Second, for feasibility purposes and given the number of conditions tested, we chose to focus on an effect size of β = .30, which represents a medium-sized effect, common in mechanisms of change research. Of note, the β = 0 models (Tables I and II) showed findings very similar to the β = .30 models, and we also tested some of our models using smaller (.10 and .20) and larger (.40 and .50) effects, and results were comparable. Future studies can expand our work by testing these models using a range of effect sizes.

Conclusions, Recommendations, and Suggestions for Future Research

Due to the increased interest in mechanisms of change research in general, and within-patient effects in particular, it is vital that researchers use the most robust and accurate methods available. Since adjusting for therapist effects has proven to be important in other areas of psychotherapy research, it is understandable that many researchers believe that this is the case for within-patient mechanism studies as well. However, our simulation study shows that when interest is in within-patient associations, and models are used that separate these from between-patient differences, the safest strategy is to ignore therapist effects. Thus, we recommend using 2-level models (i.e., repeated measures nested within patients) when estimating these effects. If researchers are specifically interested in therapist effects, these should be modeled, with an effort to recruit a sufficient number of therapists. Assuming an unbalanced assignment of patients to therapists, this should likely be at least 15–20 therapists. Alternatively, with fewer therapists, differences among therapists can be modeled using a fixed effects approach (Snijders & Bosker, 2012).

Further research is needed on optimal estimation of random slopes in SEM models. The newly developed Dynamic Structural Equation Model (DSEM; Asparouhov et al., 2018), which can be used for similar research questions (i.e., estimating within-patient cross-lagged coefficients), may be optimal, given its combination of the advantages of both MLM and SEM approaches. However, at present, it requires at least around 10 repeated measurements for stable effects and good model performance (Schultzberg & Muthén, 2018).

The growing interest in personalization of psychotherapy processes and the study of individual differences across treatment requires the use of complex modeling techniques. Our study demonstrates that such modeling should be used with caution and awareness of potential pitfalls and challenges that may affect inferences from estimation results. When studying within-person effects, we encourage researchers to expand our line of inquiry by carefully considering the impact of their study conditions on modeling, selecting the most appropriate statistical approach, and investigating whether results are replicated under a range of conditions. As with treatment packages, therapeutic processes, and mechanisms of change, when it comes to modeling, one size does not fit all.

Data Transparency Statement

The data for this study has never been used before.

Supplemental Data

Supplemental data for this article can be accessed at https://doi.org/10.1080/10503307.2020.1769875.

Funding

This work was supported by National Institute of Mental Health: [grant number T32 MH01913].

Notes

1. Errors are assumed to be independent in all of the models, unless otherwise specified.

2. It is also possible in SEM to assign different coefficients to each time-point, thus allowing for different effects between, say, time 1 to time 2 than from time 2 to time 3, and so on. However, we focus on comparable effects in SEM and MLM and thus only discuss models in which (cross-)associations are constrained to be equal over time, i.e., there is only one coefficient for the within-person effect of X on Y.

3. More tables are included in the online supplement, see Tables S1–S4.

4. A model that estimated random slopes on Level 2, i.e., still ignoring the therapist level, turned out to result in very similar estimates as the model with completely fixed slopes.

5. Our largest slope variance represents a variance ratio (slope variance/within-patient variance) of 0.022, which is in between the two largest slope variance conditions (0.02 and 0.03) used in the study by Magnusson et al. (2018). In that study, however, the largest slope ICC tested was .10, while we also included slope ICC = .20.

6. Due to lagging, there are only T − 1 repeated observations used in the estimate of β, which is why we multiply by 4 instead of 5.

ORCID

FREDRIK FALKENSTRÖM http://orcid.org/0000-0002-2486-6859

NILI SOLOMONOV http://orcid.org/0000-0003-1573-5715

JULIAN A. RUBEL http://orcid.org/0000-0002-9625-6611

References

Adelson, J. L., & Owen, J. (2012). Bringing the psychotherapist back: Basic concepts for reading articles examining therapist effects using multilevel modeling. Psychotherapy, 49(2), 152–162. https://doi.org/10.1037/a0023990

Allison, P. D., Williams, R., & Moral-Benito, E. (2017). Maximum likelihood for cross-lagged panel models with fixed effects. Socius: Sociological Research for a Dynamic World, 3, 2378023117710578. https://doi.org/10.1177/2378023117710578

Asparouhov, T., Hamaker, E. L., & Muthén, B. (2018). Dynamic structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 25(3), 359–388. https://doi.org/10.1080/10705511.2017.1406803

Asparouhov, T., & Muthén, B. (2007). Computationally efficient estimation of multilevel high-dimensional latent variable models. Paper presented at the 2007 JSM meeting, Salt Lake City, Utah, Section on Statistics in Epidemiology.

Asparouhov, T., & Muthén, B. (2019). Comparison of models for the analysis of intensive longitudinal data. Structural Equation Modeling: A Multidisciplinary Journal, 1–23. https://doi.org/10.1080/10705511.2019.1626733

Baird, R., & Maxwell, S. E. (2016). Performance of time-varying predictors in multilevel models under an assumption of fixed or random effects. Psychological Methods, 21(2), 175–188. https://doi.org/10.1037/met0000070

Baldwin, S. A., & Imel, Z. E. (2013). Therapist effects: Findings and methods. In M. J. Lambert (Ed.), Bergin and Garfield's handbook of psychotherapy and behavior change (6th ed., pp. 258–297). Wiley.

Barber, J. P., & Solomonov, N. (2019). Toward a personalized approach to psychotherapy outcome and the study of therapeutic change. World Psychiatry, 18(3), 291–292. https://doi.org/10.1002/wps.20666

Beck, A. T., Rush, A., Shaw, B., & Emery, G. (1979). Cognitive therapy of depression. Guilford Press.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155

Crits-Christoph, P., & Mintz, J. (1991). Implications of therapist effects for the design and analysis of comparative studies of psychotherapies. Journal of Consulting and Clinical Psychology, 59(1), 20–26. https://doi.org/10.1037/0022-006X.59.1.20

Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person effects in longitudinal models of change. Annual Review of Psychology, 62(1), 583–619. https://doi.org/10.1146/annurev.psych.093008.100356

Curran, P. J., Howard, A. L., Bainter, S. A., Lane, S. T., & McGinley, J. S. (2014). The separation of between-person and within-person components of individual change over time: A latent curve model with structured residuals. Journal of Consulting and Clinical Psychology, 82(5), 879–894. https://doi.org/10.1037/a0035297

de Jong, K., Moerbeek, M., & van der Leeden, R. (2010). A priori power analysis in longitudinal three-level multilevel models: An example with therapist effects. Psychotherapy Research, 20(3), 273–284. https://doi.org/10.1080/10503300903376320

Falkenström, F., Finkel, S., Sandell, R., Rubel, J. A., & Holmqvist, R. (2017). Dynamic models of individual change in psychotherapy process research. Journal of Consulting and Clinical Psychology, 85(6), 537–549. https://doi.org/10.1037/ccp0000203

Falkenström, F., Granström, F., & Holmqvist, R. (2013). Therapeutic alliance predicts symptomatic improvement session by session. Journal of Counseling Psychology, 60(3), 317–328. https://doi.org/10.1037/a0032258

Falkenström, F., Solomonov, N., & Rubel, J. (2020). Using time-lagged panel data analysis to study mechanisms of change in psychotherapy research: Methodological recommendations. Counselling and Psychotherapy Research. https://doi.org/10.1002/capr.12293

Firebaugh, G., Warner, C., & Massoglia, M. (2013). Fixed effects, random effects, and hybrid models for causal analysis. In S. L. Morgan (Ed.), Handbook of causal analysis for social research (pp. 113–132). Springer Netherlands.

Hallquist, M. N., & Wiley, J. F. (2018). MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 621–638. https://doi.org/10.1080/10705511.2017.1402334

Hamaker, E. L. (2012). Why researchers should think 'within-person': A paradigmatic rationale. Handbook of research methods for studying daily life, 43–61.

Hamaker, E. L., Kuiper, R. M., & Grasman, R. P. P. P. (2015). A critique of the cross-lagged panel model. Psychological Methods, 20(1), 102–116. https://doi.org/10.1037/a0038889

Hatcher, R. L., Lindqvist, K., & Falkenström, F. (2019). Psychometric evaluation of the Working Alliance Inventory—Therapist version: Current and new short forms. Psychotherapy Research, 1–12. https://doi.org/10.1080/10503307.2019.1677964

Hoffman, L., & Stawski, R. S. (2009). Persons as contexts: Evaluating between-person and within-person effects in longitudinal analysis. Research in Human Development, 6(2–3), 97–120. https://doi.org/10.1080/15427600902911189

Kazdin, A. E. (2007). Mediators and mechanisms of change in psychotherapy research. Annual Review of Clinical Psychology, 3(1), 1–27. https://doi.org/10.1146/annurev.clinpsy.3.022806.091432

Lambert, M. J. (Ed.). (2013). Bergin and Garfield's handbook of psychotherapy and behavior change (6th ed.). Wiley.

Lundh, L. G., & Falkenström, F. (2019). Towards a person-oriented approach to psychotherapy research. Journal for Person-Oriented Research, 5(2), 65–79. https://doi.org/10.17505/jpor.2019.07

Magnusson, K., Andersson, G., & Carlbring, P. (2018). The consequences of ignoring therapist effects in trials with longitudinal data: A simulation study. Journal of Consulting and Clinical Psychology, 86(9), 711–725. https://doi.org/10.1037/ccp0000333

McNeish, D. (2017). Small sample methods for multilevel modeling: A colloquial elucidation of REML and the Kenward-Roger correction. Multivariate Behavioral Research, 52(5), 661–670. https://doi.org/10.1080/00273171.2017.1344538

Mehta, P. D., & Neale, M. C. (2005). People are variables too: Multilevel structural equations modeling. Psychological Methods, 10(3), 259–284. https://doi.org/10.1037/1082-989X.10.3.259

Moerbeek, M. (2004). The consequence of ignoring a level of nesting in multilevel analysis. Multivariate Behavioral Research, 39(1), 129–149. https://doi.org/10.1207/s15327906mbr3901_5

Molenaar, P. C. M., & Campbell, C. G. (2009). The new person-specific paradigm in psychology. Current Directions in Psychological Science, 18(2), 112–117. https://doi.org/10.1111/j.1467-8721.2009.01619.x

Muthén, L. K., & Muthén, B. O. (1998–2017). Mplus user's guide. Muthén & Muthén.

Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling: A Multidisciplinary Journal, 9(4), 599–620. https://doi.org/10.1207/S15328007SEM0904_8

Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica, 49(6), 1417–1426. https://doi.org/10.2307/1911408

Rubel, J. A., Zilcha-Mano, S., Giesemann, J., Prinz, J., & Lutz, W. (2019). Predicting personalized process-outcome associations in psychotherapy using machine learning approaches—A demonstration. Psychotherapy Research, 1–10. https://doi.org/10.1080/10503307.2019.1597994

Schultzberg, M., & Muthén, B. (2018). Number of subjects and time points needed for multilevel time-series analysis: A simulation study of dynamic structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 25(4), 495–515. https://doi.org/10.1080/10705511.2017.1392862

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Sage.

Summers, R. F., & Barber, J. P. (2010). Psychodynamic therapy: A guide to evidence-based practice. Guilford Press.

Wampold, B. E., & Serlin, R. C. (2000). The consequence of ignoring a nested factor on measures of effect size in analysis of variance. Psychological Methods, 5(4), 425–433. https://doi.org/10.1037/1082-989X.5.4.425

Wang, L. P., & Maxwell, S. E. (2015). On disaggregating between-person and within-person effects with longitudinal data using multilevel models. Psychological Methods, 20(1), 63–83. https://doi.org/10.1037/met0000030

Xu, H., & Tracey, T. J. G. (2015). Reciprocal influence model of working alliance and therapeutic outcome over individual therapy course. Journal of Counseling Psychology, 62(3), 351–359. https://doi.org/10.1037/cou0000089

Zilcha-Mano, S., & Errázuriz, P. (2015). One size does not fit all: Examining heterogeneity and identifying moderators of the alliance–outcome association. Journal of Counseling Psychology, 62(4), 579–591. https://doi.org/10.1037/cou0000103

Zyphur, M. J., Allison, P. D., Tay, L., Voelkle, M., Preacher, K. J., Zhang, Z., Hamaker, E. L., Shamsollahi, A., Pierides, D. C., Koval, P., & Diener, E. (2019). From data to causes I: Building a general cross-lagged panel model (GCLM). Organizational Research Methods. https://doi.org/10.1177/1094428119847278
