

C-LEVEL THESIS

Feeling the zeros

Modeling individual responses, measured against time, to treatment of chronic myeloid leukemia.

June 8, 2017


In this paper the response curves of patients with chronic myeloid leukemia are modeled using one- and two-level censored models. Two-level models (also called mixed models) allow random effects, and censored models are used to account for the large number of values too small to be detected. The curves are observed from the start of medication to a maximum of 36 months (9 measurement points). The data set is divided into two groups: "excellent responders" and "other". The "excellent responders" are modeled with a simple cubic censored model, and only one of the background variables measured at time zero ("blasts") is found to be significant in explaining variation in the change curves, and even this with certain reservations. The "other" group is modeled with a cubic two-level censored model, and hemoglobin and eosinophil levels, as well as the amount of blasts, are significant in explaining variation in this group.


Contents

1 Introduction

2 Data

3 Choosing a method

3.1 Transforming the data and what that leads to

3.2 Feeling the zeros

3.3 Assumptions and other considerations

4 Methodology

4.1 Modeling censored data vs. truncated data

4.2 Two-level model for longitudinal data

4.2.1 Choosing model form

4.3 Censored model

4.4 Censored two-level model

5 Analysis and results

5.1 Models for the "Other" group

5.2 Results for "Other"

5.3 Models for the "Excellent responders" group

5.4 Results for "Excellent responders"

5.5 Comparing the two data sets

6 Conclusion

List of Figures

1.1 Plot of one observation's percent reduction in BCR-ABL1 protein in logarithmic scale

2.1 The raw data

3.1 Logarithm transformed data

3.2 Data set with zeros replaced by small values, logarithm transformed

3.3 The inverse hyperbolic sine transformation applied to the data set

3.4 Residuals, group "excellent responders" - model with no random effects

3.5 Level-1 residuals, group "excellent responders" - model with random effects

3.6 Level-2 residuals, group "excellent responders" - model with random effects

3.7 Residuals, group "other" - model with no random effects

3.8 Level-1 residuals, group "other" - model with random effects

3.9 Level-2 residuals, group "other" - model with random effects


List of Tables

5.1 Results: group other

5.2 Results: NLMIXED vs. QLIM

5.3 Results: NLMIXED vs. MIXED

5.4 Results: "excellent responders"

1. Introduction

Chronic myeloid leukemia is caused by the translocation that produces the Philadelphia chromosome, and is diagnosed through discovery of the BCR-ABL1 fusion protein. The arrival of a new generation of medication (so-called TKI medication) in the early 2000s has revolutionised the treatment of chronic phase myeloid leukemia[1]. However, some patients do not respond to the medication with the speed and depth required for a good long-term prognosis. To enable the best possible treatment, it is imperative that these patients are found as early as possible.

The objective of this paper is to model the response curves of chronic phase myeloid leukemia patients to medication (TKI medication; Dasatinib and Imatinib, see section 2 "Data" below). By modeling the observed curves two things can be achieved. First and foremost, we can gain information on what the response curves of strongly and poorly responding patients look like. This information can be used to learn about the characteristics of the illness and to create more accurate expectations among the treating staff as to the progress of the illness. Secondly, combined with information from the background variables, it can eventually lead to a tool for separating the non-responding patients from the patient body as soon as possible.

In this context, treatment success is measured by MMR (Major Molecular Response), a one-logarithmic-scale reduction in the amount of BCR-ABL1 fusion protein found (Figure 1.1), starting from the initial value. Reaching MMR by 12 months is considered a successful response to treatment[2].


Figure 1.1: Plot of one observation's percent reduction in BCR-ABL1 protein in logarithmic scale

If the data was partly chosen to fit two-level longitudinal analysis, it was most certainly the characteristics of the data that gave rise to the need for a censored model. A censored model takes the zero values into consideration and estimates the given model on the condition that the values are non-zero. The data pile up at the zero lower boundary, which they cannot cross. This had to be taken into account, and a censored model was a way to achieve it. Censored models are sensitive to departures from normality (see section 3.3 "Assumptions and other considerations"), which highlights the need to transform the highly skewed and heteroscedastic data. The logarithmic transformation used alleviates the heteroscedasticity and the heavy skewness of the distribution, while simultaneously creating new problems.

2. Data

We begin by introducing the background variables and the data set. The data consist of 86 patients with newly diagnosed chronic phase myeloid leukemia, originally gathered for two different studies[3, 4, 2]. Patients enrolled in the first study (46 patients) were diagnosed and began treatment in 2009-2010. For the second study, patients were enrolled and began treatment in 2013-2014 (40 patients after exclusions). Patients have been followed for up to 36 months, but after 15 months there are several drop-outs. All patients gave informed written consent for participation in the studies, in accordance with the Helsinki protocol, and the original studies were approved by the ethical committees of the participating countries[3, 4]. Both studies followed an ITT (Intention To Treat) structure, meaning that all patients were kept in the studies regardless of possible changes in their treatment. This prevents the sample from being reduced to only successful patients through the exclusion of those who had to be treated with other medications due to nonresponse. Both studies were inter-Nordic, including patients who received treatment at different hospitals in Sweden, Norway and Finland. In the first study the patients were divided into two groups, one receiving treatment with Dasatinib and the other Imatinib, the two groups consisting of 22 and 24 patients respectively. In the second study all patients received Dasatinib. The data consisted of 30 women and 55 men (with one missing value), and the median age was 52 years.

The variables available for the analysis are as follows:

• Patient age at the time of enrollment in the study

• Patient gender

• The drug given (Dasatinib or Imatinib)


• Hasford, SOKAL and EUTOS risk scores (see below)

• The following values at time zero:

– Hemoglobin (g/l)

– Spleen (cm from costal arch)

– Basophils in PB (percent)

– Blasts in PB (percent)

– Eosinophils in PB (percent)

– Leukocytes in PB (x10⁹/l)

– Platelet count

Also, the following information is used:

• BCR-ABL1 international scale value at the three-month measurement.

• The percent reduction in BCR-ABL1 level reached at three months from the initial value.

The SOKAL, Hasford and EUTOS scores are composite scores which use different combinations of the information included as background variables in this analysis[5]. The BCR-ABL1 fusion protein level is calculated on the international scale, where 100 percent functions as an internationally agreed reference point.

Figure 2.1: The raw data

(a) BCR-ABL1 values on international scale (IS). (b) BCR-ABL1 values (IS scale) in logarithmic scale.


3. Choosing a method

This paper has two objectives: to model the recovery curves in order to investigate their form and to find out how differences in recovery curves relate to the different background variables. In the following the method selection process is explained. After that, the assumptions of the chosen methods are investigated and discussed.

When modeling longitudinal data of the type at hand, several options are available. A generalised linear model could be fitted to the raw data. However, fitting a model that takes into consideration both the heteroscedasticity and the skewness of the data would have caused difficulties. Also, as it was decided to fit a censored model (see section 3.2 "Feeling the zeros"), the importance of normality is highlighted and fitting a model to the raw data stops being an option. Once the data have been transformed, generalised linear models become unnecessary and are disregarded.

One viable option would have been ANOVA, but a two-level regression model was found superior due to its less stringent assumptions, greater flexibility and ability to model continuous effects rather than only between-group differences, a prerequisite at the heart of what is attempted with this analysis. (For weaknesses of ANOVA with longitudinal data, see [6].)

Survival analysis could also have been of interest, but as the objective was to model the curve form and not the occurrence of a specific event, it was not used.

As the primary interest of this analysis was to model the differences of the individual curves over time, a two-level model of change was an option readily available. A two-level, random intercept and random slope model for repeated measurements provided the possibility to model individual recovery curves without limiting the analysis to between-group differences, nor excluding the possibility of them. This was of great importance in allowing a very free model building process, which suited both the (numeric) data set at hand and the interests of the author.

The Hasford, SOKAL and EUTOS risk scores are all based on different subsets of the same variables as are at hand for the author. Of these, SOKAL is the oldest, and thus its suitability for classifying patients receiving TKI treatment is in question[5]. Also, some relatively recent studies attempt to predict recovery success from the responses received at one, three and six months[7, 2]. Grouping the data based on information found in the background variables is thus a viable option, and it is indeed used in this paper as well.

3.1 Transforming the data and what that leads to

It was clear that the distribution of the data was highly skewed and that the dependent variable (the BCR-ABL1 score) was not normally distributed, which is an assumption of the two-level censored models. The reason for the skewness was the tendency of all observations to move towards zero, without ever crossing it. This constant move towards zero also caused severe heteroscedasticity, as the starting points had much wider variation than the measurements at later time points. To be able to fit a two-level model, the data was transformed by taking the logarithm. This helped against the problems of skewness and heteroscedasticity quite effectively (Figure 3.1, below), though it simultaneously brought an extra layer of complexity to the eventual interpretation of the results. However, as the numbers all tended towards zero, many of them reaching it, a simple logarithm created a considerable amount of missing values, the logarithm of zero being undefined. Thus the logarithm-transformed data is truncated.
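To make the truncation concrete, the following minimal Python sketch (with invented values standing in for a real series) shows how a plain logarithm silently removes every zero observation:

```python
import numpy as np

# Invented values standing in for one patient's BCR-ABL1 series:
# the series falls toward zero and finally reaches it.
y = np.array([66.0, 12.0, 1.4, 0.08, 0.0003, 0.0, 0.0])

log_y = np.log(y)                      # log(0) evaluates to -inf
n_lost = np.sum(~np.isfinite(log_y))   # observations unusable after transform
print(log_y)
print(f"{n_lost} of {y.size} observations lost; the data set is truncated")
```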

3.2 Feeling the zeros

The logarithmic transformation alleviated the problems of heteroscedasticity and skewness, reducing them to more acceptable levels, but created a new problem by truncating the data set. Modeling a truncated data set is also an option, but it causes a bias in the estimation, and thus makes estimation an operation of questionable usefulness. And thus, we turn to look at the zeros and what they actually are.


Figure 3.1: Logarithm transformed data

One interpretation is that a zero means there is no BCR-ABL1 fusion protein in the blood whatsoever. This interpretation would indicate that a zero reached for the first time should have the same implications as a zero maintained for several measurements in a row. In this case, the data are concentrated at zero, but all values observed are the actual values in the blood, excluding measurement error. The other interpretation is that at some point the level of BCR-ABL1 fusion protein becomes too small for the tests to detect. In this case a zero signifies that the value is beyond the lower detection bound, but might or might not actually be zero; the data are censored. This feels like a very natural interpretation of the data behavior in this context, and it is supported by the tendency to bounce back slightly from zero, clearly visible in the logarithm-transformed "fake zero" data (Figure 3.2, below). It should be noted that in this case the detection limit is arbitrary. It is not known what the true detection limit is, or even whether it is the same for all observations: the analysis of different samples has been done in different laboratories, and it is not known if their detection limits are the same. This problem of values below the detection line is not unheard of in the natural sciences, and work in this area has been done particularly on data from rebound curves related to Human Immunodeficiency Virus medication[8, 9, 10].

As a first attempt to "deal with" the zeros, several approaches were tried out. In addition to logarithmic transformation, the following transformations were attempted so as to create a data set usable in a two-level model and where the zeros would be included:

• Replacing the zeros with a small positive value before taking the logarithm has potential, particularly when zero is a proxy for "too small to measure" and the amount of zero values is not proportionately large. However, this approach is very sensitive to the value chosen, which can cause considerable differences in the actual logarithm-transformed data[11, 12] (see the sketch after this list). To test it out, this approach was fitted to the data, mainly as it offers very easy access to visualisation of the logarithm-transformed variables. The value chosen was approximately half the smallest value in the data set (0.000136; smallest value in the data set: 0.0002710714). It is evident from the plot (Figure 3.2, below) that choosing a slightly smaller value would have changed the distribution considerably.

Figure 3.2: Data set with zeros replaced by small values, logarithm transformed

• The transformation log(y + 1) was also attempted. To try out the suitability of the approach, all observations with zero values were removed, the remaining data set was both log(y)- and log(1 + y)-transformed, and then identical models were fit to both transformed data sets. The results were completely different, producing different estimates and totally different fit statistics, effectively confirming that the log(1 + y) transformation is not a valid approach to dealing with zero values, at least not when the data consist of small values like the data set at hand does.


• The inverse hyperbolic sine transformation, $y_i = \log\left(y_i + \sqrt{y_i^2 + 1}\right)$, was also applied to the data, but it did not have the desired effect either (Figure 3.3, below).

Figure 3.3: The inverse hyperbolic sine transformation applied to the data set.

• One option, also mentioned above, would have been to disregard the zeros and only model the curves "on their way to zero", so to speak. However, this would cause considerable bias in the estimates and could not be regarded as a viable route of action.

From this it has become quite obvious that transformation methods that allow the inclusion of zero are not strong enough to deal with the heteroscedasticity, and particularly the skewness, of the data in a satisfactory way. However, as the zeros could not be disregarded, a model eventually had to be built around them.
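The sketch below, referred to in the substitution bullet above, illustrates these attempts side by side on synthetic data (the lognormal sample and the detection limit are assumptions for illustration, not the thesis data): the substitution approach is sensitive to the imputed value, while log(1 + y) and the inverse hyperbolic sine barely move the mass piled at zero.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
# Synthetic left-censored, right-skewed sample: lognormal values with
# everything under an assumed "detection limit" recorded as exactly zero.
latent = rng.lognormal(mean=-4.0, sigma=3.0, size=500)
limit = 2.7e-4
y = np.where(latent < limit, 0.0, latent)

half_min = y[y > 0].min() / 2          # substitution value, as in the text
tenth_min = y[y > 0].min() / 10        # a slightly smaller choice
candidates = {
    "log(y), zeros -> half of min": np.log(np.where(y == 0, half_min, y)),
    "log(y), zeros -> tenth of min": np.log(np.where(y == 0, tenth_min, y)),
    "log(1 + y)": np.log1p(y),
    "asinh(y)": np.arcsinh(y),
}
for name, t in candidates.items():
    print(f"{name:30s} skewness = {skew(t):6.2f}")
# The two substitution variants differ only in the imputed value yet give
# different distributions; log1p and asinh leave the spike at zero intact.
```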

From the investigation above the following decisions were made:

• Firstly, a two-level model will be fitted to the logarithm-transformed, truncated (zeros removed) data set. This truncated model is fitted partly to obtain adequate starting values for the parameters of the eventual censored model, partly as a comparison to it.

• Secondly, a simple censored model will be fitted to the logarithm-transformed data with "fake zeros", where a value just above the logarithm-transformed imputed value will be used as the detection boundary (-8.9). This allows testing for parameter significance while the zeros are simultaneously accounted for.

• Thirdly, a two-level censored model will be fitted to the logarithm-transformed data with "fake zeros". This model both allows for random effects and takes the zeros into account. Thus it is a combination of the first two models.

The author admits that the above solution is not optimal, but rather a creative solution built upon, and taking into consideration, the resources and time available. In an ideal situation a censored two-level non-linear model with non-normal residuals would have been fitted to the raw (non-transformed) data. Such models do exist[8, 13, 9]. However, in defense of the approach taken, it must be noted that the current solution enables modeling the whole data set through a relatively familiar transformation and with relative ease. Yet, to what extent this modeling technique (building a censored model upon transformed data) produces problems of its own remains an open question.

3.3 Assumptions and other considerations

The assumptions for two-level models are[14, 15]:

1. Normality of both level-1 and level-2 error terms.
2. The covariance structure can be modeled (for SAS PROC MIXED).

The assumptions for censored models are[16]:

1. Independent, identically distributed error terms.
2. The existence of censoring.

Censored models are known to be sensitive to deviations from normality, which might cause problems (see section 4.3 "Censored model"). Also, as the sample consists of longitudinal data, it is evident that the error terms are not independent of each other.


As always in time series data, the errors of within-series observations, the different observed values for the same person, are expected to be (positively) correlated with each other. To keep this from distorting the standard errors, the covariance structure of the observed series has to be accounted for[17]. The covariance structure for the two-level model on truncated data (estimated in SAS PROC MIXED) was decided upon largely through trial and error[18]. The unequal distances between measurement points caused problems[14, 17]. In the end, the default option (standard parabolic increase) was used, as it was considered appropriate and produced the best fit[14].

Figure 3.4: Residuals group "excellent responders" - model with no random effects

(a) Quantile-quantile normality plot. (b) Histogram compared to normal curve.

Figure 3.5: Level-1 residuals group "excellent responders" - model with random effects


Figure 3.6: Level-2 residuals group "excellent responders" - model with random effects

(a) Quantile-quantile normality plot. (b) Histogram compared to normal curve.

Figure 3.7: Residuals group "other" - model with no random effects

(a) Quantile-quantile normality plot. (b) Histogram compared to normal curve.

Figure 3.8: Level-1 residuals group "other" - model with random effects


Figure 3.9: Level-2 residuals group "other" - model with random effects

(a) Quantile-quantile normality plot. (b) Histogram compared to normal curve.

The situation can be summed up as follows:

• Though slight deviations from normality should not cause large problems for the two-level linear model, the estimates are expected to be biased due to the truncation of the data.

• Censored models and censored two-level models will quite possibly be affected by the slight deviations from normality. Also, their standard errors are expected to be incorrectly estimated due to the inability to model the covariance structure (see section 4.1 "Modeling censored data vs. truncated data").

And yet, proceed we must.

Other considerations that might affect the validity of the analysis, and that were thus investigated, are sample size, outliers, missing values and multicollinearity. All are discussed below, along with the significance level and fit statistic used.

For the sample size of two-level models, a rule of thumb of 30 can be cited for both level-1 and level-2 units[19]. This requirement is initially fulfilled for our data. Later, when the data set is divided into two, one of the groups does not fulfill this requirement. However, as two-level models could not be fitted to this particular subgroup due to noncompliance with the assumptions, this was not a problem. No explicit sample size recommendations have been found for censored models. As they are effectively normal regression and mixed-effects models with a conditionality added to them, the usual recommendations for linear models (more observations than parameters) and for two-level models will be used, albeit with some caution[20].

Outliers were investigated through plots (see Figure 3.1, above).

Missing values were left as they were. In the whole data set, with 774 unique patient and time combinations, there were 96 missing values in all. Within-series missing values were few and did not constitute a problem. Missing values due to discontinued series, on the other hand, were common. In fact, 43 of the missing values were at the 36-month time point, meaning that slightly less than half of the missing data is clustered at the last time point. Considering the sample size, this should not cause inference problems. A lost-to-follow-up scenario can be expected to cause inference problems when the missingness is concentrated, but only at even greater concentrations and smaller sample sizes[21]. This result is valid when the observations are missing at random. In this case the large amount of missing values at 36 months is caused by the lack of that information for the patients in the second study. As there is nothing to indicate that patients enrolled in a study in different years should behave differently, these observations can be considered missing at random, with some reservations.

Regression models are always sensitive to multicollinearity. Thus, multicollinearity among the background variables was investigated. Unsurprisingly, some of the background variables were clearly, though not very highly, correlated with each other. Most notable were the correlations between Hemoglobin and Leukocytes (approximately -0.66) and between Platelet count and Basophils (approximately 0.5). There was also a clear (and significant) correlation between Gender and Platelet count (approximately -0.38), which was considered interesting. Platelet count and Leukocytes were not included in the final model, so this did not cause problems.
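As an illustration of this check, a short pandas sketch on synthetic stand-in variables (the real patient data are not public, so the values and correlation strengths below are arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 86
leuko = rng.lognormal(3.0, 0.8, n)
df = pd.DataFrame({                      # synthetic stand-ins, not thesis data
    "hemoglobin":  130 - 8 * np.log(leuko) + rng.normal(0, 10, n),
    "leukocytes":  leuko,
    "platelets":   rng.normal(350, 120, n),
    "basophils":   rng.gamma(2.0, 1.5, n),
    "eosinophils": rng.gamma(1.5, 1.0, n),
    "blasts":      rng.gamma(1.2, 2.0, n),
})

corr = df.corr()                                   # pairwise Pearson correlations
notable = (corr.abs() > 0.5) & (corr.abs() < 1.0)  # flag pairs above |r| = 0.5
print(corr.round(2))
print(corr.where(notable).stack().round(2))        # only the flagged pairs
```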

Finally, the significance level used and the approach to model comparison are as follows:

• When statistical tests are used, and unless otherwise mentioned, the standard significance level of 0.05 is applied. This is a well understood and common benchmark, which will be deviated from only for situation-specific reasons.


4. Methodology

In this section, some differences in fitting a model to truncated and censored data sets will first be discussed. After that, important technical aspects of the program used will be taken up along with what implications they have for our models. Finally, the chosen models and how they are implemented will be discussed in more detail.

4.1 Modeling censored data vs. truncated data

When the zeros are ignored, as in truncated data, what is modeled is no longer the response curves but the response before reaching the lower measurement limit, and not even that without bias. Truncation causes the parameter estimates to be biased. A model fitted to the truncated data set fails to account for the zero observations, creating a positive bias; truncating the data set creates missing values that are most certainly not missing at random[22]. Technically, the following happens: when a truncated data set is modeled, the mean of the error term, assumed to be zero, is actually non-zero. This is because what should be estimated is a conditional mean, the mean on the condition that the observation is above zero (or another detection boundary). As the error is assumed to have mean zero, and as the conditional expectation differs from it, an extra term appears in the regression equation. This term is naturally not accounted for when fitting a model to truncated data[22]. This means that a model fitted on truncated data will automatically be biased. To avoid this bias, a censored model is fitted to the whole data set (for examples see [23, 24]).
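The bias is easy to reproduce in a few lines. The simulation below (all numbers invented) fits ordinary least squares to a latent linear decline once on the full data and once after discarding everything below the bound, as truncation does; the truncated fit visibly attenuates the slope:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
time = rng.uniform(0, 9, n)
y_star = 4.0 - 1.0 * time + rng.normal(0, 1.5, n)  # latent values, true slope -1
bound = 0.0

def ols_slope(t, y):
    X = np.column_stack([np.ones(t.size), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

keep = y_star > bound                              # truncation drops the rest
print(f"full-data slope  {ols_slope(time, y_star): .2f}")              # ~ -1.00
print(f"truncated slope  {ols_slope(time[keep], y_star[keep]): .2f}")  # biased toward 0
```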


The PROC NLMIXED and QLIM procedures have not been constructed for longitudinal data (for coding censored models in NLMIXED, see [25, 26, 27, 24]). This means that they do not allow taking the covariance structure into account, which is a major limitation. This should not cause bias in the estimates (as truncation would), but it is expected to cause incorrectly estimated standard errors. Thus, by construction, there will be differences in the results that are beyond the differences in the model structures. Code for modeling covariance structures in NLMIXED exists, but combining it with the rather complex code for a censored model is deemed beyond the scope of this thesis, and possibly even beyond the scope of what NLMIXED can handle (for code see [28]). Estimation in all three procedures will be done with Maximum Likelihood to make the results as comparable as possible[14].

4.2 Two-level model for longitudinal data

As the name indicates, the two-level model consists of models at two different levels, which together form a hierarchical model; the level-two sub-models can be collapsed into the level-one model by writing the former inside the latter. Longitudinal two-level models were developed in the 1980s to enable modeling change in longitudinal data like the set at hand[14, 17].

In its basic form, the model with random slope and random intercept is:

$$y_{ij} = \pi_{0i} + \pi_{1i} \cdot TIME + \varepsilon_{ij} \tag{4.1}$$

where:

$$\pi_{0i} = \gamma_{00} + \zeta_{0i} \tag{4.2}$$

$$\pi_{1i} = \gamma_{10} + \zeta_{1i} \tag{4.3}$$

Here $\pi_{0i}$ is the random intercept, individual for each observed series: the zero denotes that this is the intercept parameter and the $i$ denotes that it is individual. $\pi_{1i}$ is the random slope parameter of the model, also unique for each series; the one denotes the second parameter and the $i$ again denotes its individuality. $\varepsilon_{ij}$ is the level-one error term, where the $ij$ denotes that it is specific to each individual and measurement point.

$\gamma_{00}$ is the intercept of the level-two model and is common to all series; in this case it is the grand mean. $\zeta_{0i}$ is the level-two error term, separate for each individual. $\gamma_{10}$ is the fixed effect explaining variation in $\pi_{1i}$, and $\zeta_{1i}$ is the error term of that equation, also individual.


At level two, the variation in the individual growth parameters is explained by the fixed effects[14]. Collapsed into one, separating the structural and stochastic parts with brackets, this becomes:

$$y_{ij} = (\gamma_{00} + \gamma_{10} \cdot TIME) + (\varepsilon_{ij} + \zeta_{1i} \cdot TIME + \zeta_{0i}) \tag{4.4}$$

Later, a model with a fixed intercept and both random and fixed effects will be used, which can also be written in the general form[29]:

$$y_{ij} = \pi_0 + \pi_1 x_{1ij} + \pi_2 x_{2ij} + (\varepsilon_{ij} + \zeta_{1i} x_{1ij}) \tag{4.5}$$

In this presentation $\pi_1$ is a random effect; because it is a random effect, it is also included in the error term. $x_{1ij}$ is the variable value associated with the random effect; the values are individual for all individuals and measurement points. $\pi_2$ is a fixed effect, and $x_{2ij}$ is the variable value associated with it, likewise separate for all individuals and measurement points. This last presentation has the advantage of simplicity, which is why it is used below, but it lacks the intuitiveness of the first two representations.
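A minimal simulation of equations 4.1-4.4, fitted here with Python's statsmodels rather than SAS PROC MIXED (an assumed stand-in for the thesis workflow; all parameter values are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_persons, n_times = 40, 9
df = pd.DataFrame({
    "id":   np.repeat(np.arange(n_persons), n_times),
    "time": np.tile(np.arange(n_times, dtype=float), n_persons),
})
zeta0 = rng.normal(0, 1.0, n_persons)     # random intercepts (zeta_0i)
zeta1 = rng.normal(0, 0.3, n_persons)     # random slopes (zeta_1i)
gamma00, gamma10 = 5.0, -1.0              # fixed effects
df["y"] = (gamma00 + zeta0[df["id"]]
           + (gamma10 + zeta1[df["id"]]) * df["time"]
           + rng.normal(0, 0.5, len(df)))

# Random intercept and random slope on time, estimated by ML (reml=False),
# matching the thesis's choice of Maximum Likelihood estimation.
fit = smf.mixedlm("y ~ time", df, groups=df["id"], re_formula="~time").fit(reml=False)
print(fit.summary())
```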

The level-one model (equation 4.1) represents the individual response curves, measured against time. As these curves are individual, their intercepts and slopes in the general model are random. That is, the form of the level-1 model is the same for all observed series (a straight line, in this example), but the coefficients that define the slopes and the intercept differ for each individual curve. Deciding on a model form amounts to a belief that the chosen form is the form of the underlying data generating process. Already in section 2 ("Data"), when examining graphs, attempts were made to identify that form as possibly cubic or a discontinuous straight line where the slope changes at three months. The level-one parameters are interpreted as the average change between two measurement points, as in normal regression. However, as they are person-specific, they are too numerous to be analysed here in detail.


The level-one error term $\varepsilon_{ij}$ captures the variation around the individual slope trajectories[14]. The level-two error terms, $\zeta_{1i}$ and $\zeta_{0i}$, capture the variation in the individual growth parameters that remains unexplained by the level-two predictors[14]. The composite error term represents the difference between the predicted mean value and the observed value, and is in that respect not unusual. However, in the presence of omitted-variable bias (which is bound to exist), all the measurement-point-specific error terms for the same individual are affected by it. This causes autocorrelation (and possibly also heteroscedasticity) among the error terms[14]. To come to terms with this autocorrelation, the within-person covariance structure needs to be taken into account, as is done for our truncated two-level model.
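The within-person covariance implied by the composite error of equation 4.4 can be written out explicitly; assuming the standard independence of $\varepsilon_{ij}$ from the random effects, and allowing $\zeta_{0i}$ and $\zeta_{1i}$ to covary, a textbook derivation gives:

$$\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_{\zeta_0}^2 + (t_j + t_k)\,\sigma_{\zeta_0 \zeta_1} + t_j t_k\,\sigma_{\zeta_1}^2 + \mathbf{1}\{j = k\}\,\sigma_\varepsilon^2$$

which is nonzero even for $j \neq k$: the random effects alone make observations on the same person correlated, which is exactly the autocorrelation described above.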

4.2.1 Choosing model form


4.3 Censored model

Though the idea behind censored models was introduced by Tobin in 1958[30], it seems that they have not been in wide use over the decades. They are used when data are censored from the left, the right, or both, meaning that the exact values are not observable below (or above) a certain threshold, and all values beyond the threshold pile up at the last observable value. This problem is encountered, for example, when viral suppression curves for AIDS patients are modeled: an observed zero is not genuinely zero, it is just too small to be detected. A censored model takes into consideration that the distribution continues below the detection line and calculates the regression estimates conditional on the censored observations.

The general form of the model is:

$$y_{ij} = \begin{cases} y_{ij}^{*} & \text{if } y_{ij}^{*} > y_u \\ y_u & \text{if } y_{ij}^{*} \leq y_u \end{cases} \tag{4.6}$$

where, in this example:

$$y_{ij}^{*} = \pi_0 + \pi_1 \cdot TIME + \varepsilon_{ij} \tag{4.7}$$

In the above, $y_{ij}^{*}$ and equation 4.7 describe the model when it is above the detection line. For the model below the detection line, the probability of the values being there is calculated. $y_{ij}$ is our dependent variable of interest and $y_u$ is the detection limit above which we have known values for the dependent variable.

The sensitivity to deviations from normality derives from the following reasoning, used to construct the likelihood functions. First we observe that $x_i\pi + \varepsilon_i > 0$ is the same as $\varepsilon_i > -x_i\pi$. As we expect $\varepsilon_i$ to be normally distributed with zero mean, and the normal distribution is symmetric, then:

$$pr(\varepsilon_i > -x_i\pi) = pr(\varepsilon_i \leq x_i\pi) \tag{4.8}$$

Based on this, the likelihood functions for estimates above the detection boundary, as well as for being below it, are formulated[31]. This results in:


where, in this example:

$$E(y_{ij} \mid y_u) = \pi_0 + \pi_1 \cdot TIME + \varepsilon_{ij} \tag{4.10}$$

The parameters estimated by this model can be interpreted as standard regression parameters[31, 16]. In this case: a one-unit increase in time increases/decreases the estimated average level of the logarithm-transformed BCR-ABL1 IS%-scale protein by the amount indicated by the parameter, other factors kept constant.
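For concreteness, here is a sketch of the censored (tobit) likelihood of equations 4.6-4.8 in Python with scipy, assuming left-censoring at a known bound and normal errors; this is an illustration, not the SAS QLIM/NLMIXED code actually used in the thesis, and all data values are invented:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, t, y, yu):
    """Negative log-likelihood for y* = pi0 + pi1*TIME + eps, observed
    as yu whenever y* falls at or below the detection bound yu."""
    pi0, pi1, log_sigma = params
    sigma = np.exp(log_sigma)              # keeps sigma positive
    mu = pi0 + pi1 * t
    censored = y <= yu
    ll = np.where(censored,
                  norm.logcdf((yu - mu) / sigma),            # P(y* <= yu)
                  norm.logpdf((y - mu) / sigma) - np.log(sigma))
    return -ll.sum()

# Invented data: 30 series of 9 time points, censored at an arbitrary bound.
rng = np.random.default_rng(4)
t = np.tile(np.arange(9.0), 30)
y_star = 3.0 - 0.9 * t + rng.normal(0, 1.0, t.size)
yu = -2.0
y = np.maximum(y_star, yu)

res = minimize(tobit_negloglik, x0=[0.0, 0.0, 0.0], args=(t, y, yu))
pi0_hat, pi1_hat, log_s_hat = res.x
print(pi0_hat, pi1_hat, np.exp(log_s_hat))   # should land near 3.0, -0.9, 1.0
```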

4.4 Censored two-level model

A censored two-level model combines the above models, taking the general form:

$$y_{ij} = \begin{cases} y_{ij}^{*} & \text{if } y_{ij}^{*} > y_u \\ y_u & \text{if } y_{ij}^{*} \leq y_u \end{cases} \tag{4.11}$$

where:

$$y_{ij}^{*} = \pi_{0i} + \pi_{1i} \cdot TIME + \varepsilon_{ij} \tag{4.12}$$

where again:

$$\pi_{0i} = \gamma_{00} + \zeta_{0i} \tag{4.13}$$

$$\pi_{1i} = \gamma_{10} + \zeta_{1i} \tag{4.14}$$

The interpretation of the parameters has become somewhat complex by now. Effectively, a level-two parameter indicates the estimated average change in the level-one parameters when its predictor changes by one unit, all other factors kept constant and on the condition of being above the detection line. The level-one parameters (which are individual for all observed curves) in turn express the average change in the estimated logarithm-transformed BCR-ABL1 IS-scale score when time changes by one unit, all other factors kept constant and on the condition of being above the detection line.
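Combining the two pieces means integrating the censored likelihood over the random effects. The sketch below does this for a single random intercept with (non-adaptive) Gauss-Hermite quadrature; PROC NLMIXED uses adaptive Gaussian quadrature, so this is a conceptual illustration only, with all names my own:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import logsumexp
from scipy.stats import norm

nodes, weights = hermgauss(25)   # quadrature rule for integrals against e^{-x^2}

def marginal_negloglik(params, ids, t, y, yu):
    """Random-intercept tobit: per-person censored likelihood integrated
    over b_i ~ N(0, sb^2) by Gauss-Hermite quadrature."""
    pi0, pi1, log_s, log_sb = params
    s, sb = np.exp(log_s), np.exp(log_sb)
    total = 0.0
    for i in np.unique(ids):
        ti, yi = t[ids == i], y[ids == i]
        node_ll = np.empty(nodes.size)          # conditional log-lik per node
        for k, x in enumerate(nodes):
            mu = pi0 + np.sqrt(2.0) * sb * x + pi1 * ti   # change of variables
            ll = np.where(yi <= yu,
                          norm.logcdf((yu - mu) / s),
                          norm.logpdf((yi - mu) / s) - np.log(s))
            node_ll[k] = ll.sum()
        # log( (1/sqrt(pi)) * sum_k w_k * exp(node_ll_k) ), computed stably
        total += logsumexp(node_ll, b=weights) - 0.5 * np.log(np.pi)
    return -total

# This function can be handed to scipy.optimize.minimize exactly like the
# single-level version above; convergence is far more delicate, mirroring
# the problems reported for NLMIXED in section 5.1.
```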


5. Analysis and results

The Hasford, SOKAL and EUTOS scores divide patients into groups according to the severity of their symptoms (See section 2 "Data"). There have also been attempts to predict recovery success using values at one, three and six months[7, 2]. It was seen as meaningful to follow up results from these studies, particularly the ones where measurements at one month and three months were used to predict success. This resulted in two decisions:

• Firstly, the final models were built using both the background information taken at the start of treatment (hemoglobin etc.) and the data points (measured BCR-ABL1 IS%-level) as variables. The models include the measured BCR-ABL1 IS%-value at three months (variable 3months) as well as the percent value of the three-month measurement against the starting-point measurement (3 months value / 0 months value = variable first).

• Secondly, based on the reduction in BCR-ABL1 IS%-value at three months compared to the starting point (variable first), the data set is divided into two. The logic behind this decision is the attempt to find different respondent groups. The division process is explained below.

and above the 10 percent line, respectively). Whereas Marin et al. compared the best two thirds to the worst third, this paper compares the best third to the worst two thirds. One reason to use such a strict value is that patients in at least some Nordic studies have been found to perform better than in other international studies. The reason for this is unknown, but it might be related to the study format[4].

This cut-off at one percent produces a group of meaningful size (27/86 = approximately 31 percent of the data) and considerable heterogeneity (Figure 5.1 (a)). Although this subsample is a good portion of the data set, it is under the recommended minimum of 30 for two-level models (see section 3.3 "Assumptions and other considerations"). However, as this group will be modeled with a normal one-level regression due to the non-normality of the level-2 residuals, the smallish sample size does not cause problems.
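The split itself is a one-liner once the variable first is available; a hypothetical pandas sketch (the values are invented, on the percent scale):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
patients = pd.DataFrame({
    "id": np.arange(86),
    "first": rng.lognormal(1.0, 2.0, 86),  # invented 3-month/0-month percent values
})
patients["group"] = np.where(patients["first"] < 1.0,
                             "excellent responders", "other")
print(patients["group"].value_counts())
print(patients["group"].value_counts(normalize=True).round(2))  # shares of 86
```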

This results in two data sets: "excellent responders" and "other". For "other", three models will be fitted: a two-level model (truncated data), a simple censored model and a two-level censored model. After this, two models will be fitted on "excellent responders": a simple regression model on the truncated data and a censored model on the whole data set.

At the outset it had been planned to fit a random-slope and random-intercept model. However, taking the logarithm had collapsed the initially widely spread starting values (values at time zero) close to each other (see Figures 5.1 (a) and (b), below). Due to this, differences in the intercept were not large enough to allow for individual intercepts. This demonstrated itself in the Hessian matrix (used to estimate the standard errors) not being positive definite, even when convergence was achieved. The problem was solved by treating the intercept as a fixed effect, thus making all intercepts within a model equal.

Figure 5.1: The two data sets "excellent" and "other" compared


5.1 Models for "Other" -group

The following models are fitted on the "other" data set: a two-level model on the truncated data, a simple censored model and a two-level censored model.

First, the two-level model is fitted to the truncated (zeros removed) data. As discussed in sections 3.3 and 4.1 ("Assumptions and other considerations" and "Modeling censored data vs. truncated data"), this model will be biased by construction due to truncation. However, it is also the only model where the intra-series covariance structure is taken into account. This means that we can expect the parameter estimates of this model to be biased, but the standard errors to be reliably estimated.

Two-level model (no-zeros data):

$$y_{ij} = \pi_0 + \pi_1\,Time_{ij} + \pi_2\,Time_{ij}^2 + \pi_3\,Time_{ij}^3 + \gamma_{11}\,3m + \gamma_{12}\,first + \gamma_{13}\,first \cdot 3m + \gamma_{14}\,hemog + \gamma_{15}\,blasts + \gamma_{16}\,eosino + \zeta_{1j}\,Time_{ij} + \zeta_{2j}\,Time_{ij}^2 + \zeta_{3j}\,Time_{ij}^3 + \varepsilon_{ij} \tag{Model 1.1}$$

In this model $y_{ij}$ is the dependent variable, the $\pi$s are the coefficients of the random effects, the $\gamma$s are the coefficients of the fixed effects, the $\zeta$s are the individual error terms associated with the random effects, and $\varepsilon_{ij}$ is the individual- and measurement-point-specific error term of the whole model. As can be seen, the model is cubic and has six fixed effects: 3months, first, 3months*first, hemoglobin, blasts and eosinophils. Of these, the first three derive from the data points and are present in all models, for both data sets. This is because the interaction term and at least one of first or 3months are always statistically significant. The background variables included have been chosen solely for their statistical significance.


Simple censored model:

$$y_{ij} = \begin{cases} y_{ij}^{*} & \text{if } y_{ij}^{*} > -8.9 \\ -8.9 & \text{if } y_{ij}^{*} \leq -8.9 \end{cases} \tag{Model 1.2}$$

where, with the censored structure as in section 4.3 ("Censored model"):

$$y_{ij}^{*} = \pi_0 + \pi_1\,Time + \pi_2\,Time^2 + \pi_3\,Time^3 + \pi_4\,3m + \pi_5\,first + \pi_6\,3m \cdot first + \pi_7\,hemog + \pi_8\,blasts + \pi_9\,eosino + \varepsilon_{ij}$$

Here the $\pi$s are normal regression coefficients. In this model, and in all censored models below, the detection line is -8.9. This is due to the logarithmic transformation: as explained in section 3.2 ("Feeling the zeros"), approximately half the smallest value was used to indicate zeros before taking the logarithm (0.000136). A value just above the logarithm-transformed "fake zero" was then used as the detection boundary, resulting in the figure -8.9. The same background variables (and only they) were significant for this model as for the previous truncated-data model, so model selection was unproblematic.
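The boundary can be verified in two lines (doing so also shows that the logarithm used is the natural one):

```python
import numpy as np

smallest = 0.0002710714   # smallest nonzero value in the data set (section 3.2)
fake_zero = smallest / 2  # imputed value for the zeros, ~0.000136
print(fake_zero)          # 0.0001355357
print(np.log(fake_zero))  # about -8.906, so a bound just above it: -8.9
```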

Lastly, a censored two-level model is fitted. Like the first model, it allows the slopes to be individual for all observations. Yet, as with the second model, the software does not allow accounting for the intra-series covariance structure, which again is a problem for estimating the standard errors. As PROC NLMIXED does not by construction support two-level censored models, the code becomes slightly more complex, and considerable convergence problems were encountered. Fitting a cubic model was not successful despite attempts; PROC NLMIXED was unable to estimate three random effects simultaneously. Thus, eventually, a straight-line model was fitted. A comparison of the fitted straight-line model to its counterpart with no random effects (model 1.3) can be found in the combined table of results (Table 5.1).

Two-level censored model:

$$y_{ij} = \begin{cases} y_{ij}^{*} & \text{if } y_{ij}^{*} > -8.9 \\ -8.9 & \text{if } y_{ij}^{*} \leq -8.9 \end{cases} \tag{Model 1.4}$$

where:

$$y_{ij}^{*} = \pi_0 + \pi_1\,Time_{ij} + \gamma_{11}\,3m + \gamma_{12}\,first + \gamma_{13}\,first \cdot 3m + \gamma_{14}\,hemog + \gamma_{15}\,blasts + \gamma_{16}\,eosino + \zeta_{1j}\,Time_{ij} + \varepsilon_{ij}$$


This model was based on the previous two, and no further models were attempted. Thus we do not know what would have happened if other variables had been included. However, looking at the results (Table 5.1, below), there is little indication that further statistically significant variables would have been found.

5.2 Results for "Other"

Table 5.1 below presents the coefficients estimated for the different models and methods. The p-values presented after the coefficients are as follows: for the two-level model[15] the test is an F-test; for the censored and two-level censored models[35, 34] a t-test. All tests test $H_0: \beta_i = 0$. As discussed above, the significance level 0.05 is used.

Table 5.1: Results: group other (p-values in parentheses)

| Variable | 1.1 Two-level model (truncated data, cubic) | 1.2 Censored model (cubic) | 1.3 Censored model (straight line) | 1.4 Two-level censored model (straight line) |
|---|---|---|---|---|
| Intercept | 6.3380 (<.0001) | 7.71034 (<.0001) | 4.25065 (<.0001) | 3.3027 (0.0024) |
| 3Months | 0.03693 (<.0001) | 0.05434 (<.0001) | 0.05512 (<.0001) | 0.04023 (<.0001) |
| First | 0.9118 (0.0182) | -1.09503 (0.0037) | -1.08074 (0.0072) | 0.3862 (0.5752) |
| 3months*first | -0.00823 (0.0121) | 0.00808 (0.0126) | 0.00788 (0.0225) | -0.00449 (0.4453) |
| Hemoglobin | -0.00906 (0.0238) | -0.01243 (0.0239) | -0.01239 (0.0342) | -0.00760 (0.3550) |
| Blasts | 0.1530 (<.0001) | 0.21859 (<.0001) | 0.22101 (<.0001) | 0.04668 (0.4489) |
| Eosinophils | -0.1193 (0.0002) | -0.23654 (<.0001) | -0.23968 (<.0001) | -0.03433 (0.5889) |
| Time | NA | -3.51689 (<.0001) | -0.98831 (<.0001) | NA |
| Time² | NA | 0.45022 (0.0005) | - | - |
| Time³ | NA | -0.02231 (0.0105) | - | - |
| AIC | 1636.8 | 2036 | 2078 | 1915.2 |

first and 3months*first, both of which switch signs, the first from negative to positive and the second from positive to negative; the two "change places", as it were.

The estimation of the cubic two-level censored model was not successful. Thus a straight-line censored regression model was estimated (model 1.3), so that the two-level censored model (1.4) could be meaningfully compared to something. Estimating the two-level censored model produced rather interesting results. Firstly, even though the fit statistic for the straight-line simple censored model was slightly larger than for the cubic model (AIC 2036 and 2078 for models 1.2 and 1.3, respectively), the straight-line two-level censored model (1.4) was better than both of them (AIC 1915.2). This indicates that adding inter-individual variation does improve the model. The second, and more surprising, finding is that whereas going from a two-level model on truncated data to a simple censored model produced comparable results (models 1.1 and 1.2), going from the two simpler models to a two-level censored model (1.4) turns the significance of the variables almost upside down. The coefficient estimates are in line with the two simpler models, but their significance tests are not: suddenly, it seems that no background variable is significant. Two trials were made to investigate why this happens:

• Firstly, the code of the two-level censored model (NLMIXED) was simplified by excluding the random effect. What was left was a model identical to the simple censored model, so the two can be compared: effectively the same model is estimated, just using different SAS procedures. If this produces clearly different results, that would indicate that the problem is in the code for SAS PROC NLMIXED.

• Secondly, if the first trial is successful and the code does not seem to be the problem, PROC NLMIXED is used to estimate a censored model on the truncated data set. Now, the only difference between this model estimated in PROC NLMIXED and the model earlier estimated in PROC MIXED is that NLMIXED does not allow taking the covariance structure into consideration, whereas PROC MIXED does. Thus, if these models clearly differ from each other, there is strong reason to believe that the considerable difference in significance levels is caused by the inability to model the intra-individual covariance structure in PROC NLMIXED.

The results from these trials are as follows:


Table 5.2: Results: NLMIXED vs. QLIM (p-values in parentheses)

| Variable | PROC QLIM | PROC NLMIXED |
|---|---|---|
| Intercept | 4.250654 (<.0001) | 4.2074 (<.0001) |
| 3Months | 0.055125 (<.0001) | 0.04892 (<.0001) |
| First | -1.080744 (0.0072) | -1.0746 (0.0052) |
| 3months*first | 0.007880 (0.0225) | 0.009601 (0.0038) |
| Hemoglobin | -0.012395 (0.0342) | -0.00417 (0.4466) |
| Blasts | 0.221010 (<.0001) | 0.2747 (<.0001) |
| Eosinophils | -0.239689 (<.0001) | -0.2121 (<.0001) |
| Time | -0.988311 (<.0001) | -1.1066 (<.0001) |
| AIC | 2078 | 2132.4 |

Table 5.3: Results: NLMIXED vs. MIXED (p-values in parentheses)

| Variable | PROC MIXED | PROC NLMIXED |
|---|---|---|
| Intercept | 3.1222 (<.0001) | 3.8817 (<.0001) |
| 3Months | 0.03749 (<.0001) | 0.04535 (<.0001) |
| First | 1.1394 (0.0083) | -0.2631 (0.6638) |
| 3months*first | -0.01027 (0.0051) | 0.000785 (0.8788) |
| Hemoglobin | -0.00931 (0.0381) | -0.01289 (0.0714) |
| Blasts | 0.1549 (<.0001) | 0.07602 (0.1496) |
| Eosinophils | -0.1252 (0.0006) | -0.08282 (0.1505) |
| AIC | 1715.5 | 1701.8 |

As Table 5.3 shows, the two procedures produce clearly different estimates and significance levels, supporting the conclusion that the considerable difference in significance is caused by the inability of PROC NLMIXED to model the intra-series covariance structure.

5.3 Models for "Excellent responders" -group

Next we will inspect the models estimated for the "excellent responders" group and the results attained. Due to the non-normality of the level-2 errors and the small sample size, two-level models could not be fitted to this smaller group. Because of this, only a simple regression (on truncated data) and a simple censored model are used. Initially a discontinuous-slope model was fit to the "excellent responders" data. However, model fit was improved by using a cubic model instead of a discontinuous slope. For the "excellent responders", first a simple regression model is fitted. Again, we know in advance that the simple regression model is biased due to truncation. However, again, the model has the advantage of including an intra-series covariance structure.

Simple regression model (no-zeros data):

$$y_{ij} = \beta_0 + \beta_1\,Time + \beta_2\,Time^2 + \beta_3\,Time^3 + \beta_4\,3m + \beta_5\,first + \beta_6\,3m \cdot first + \beta_7\,blasts + \varepsilon_{ij} \tag{Model 2.1}$$

Here the $\beta$s are normal regression coefficients. It is easily seen that only one non-data-point background variable is included in this model (blasts). In fact, even that is not statistically significant for this particular model (see Table 5.4, below).

The second model fitted for the "excellent responders" group is a simple censored model. A similar model was fitted to the "other" data set. Because two-level models cannot be fitted for the group "excellent responders", this censored model is the only model that is comparable across the two data sets. Again, we know in advance that the inability to account for the covariance structure is a problem for the estimation of the standard errors.

Simple censored model:

$$y_{ij} = \begin{cases} y_{ij}^{*} & \text{if } y_{ij}^{*} > -8.9 \\ -8.9 & \text{if } y_{ij}^{*} \leq -8.9 \end{cases} \tag{Model 2.2}$$

where:

$$y_{ij}^{*} = \beta_0 + \beta_1\,Time + \beta_2\,Time^2 + \beta_3\,Time^3 + \beta_4\,3m + \beta_5\,first + \beta_6\,3m \cdot first + \beta_7\,blasts + \varepsilon_{ij}$$


5.4 Results for "Excellent responders"

The two models we have fitted for the "excellent responders" are almost identical in form. They differ in that the censored model takes the zeros into account but fails to account for the covariance structure, while the simple regression model takes the covariance structure into account but fails to account for the zeros. So both are effectively impaired. The results for "excellent responders" are summed up in Table 5.4 below. (Tests, significance levels and null hypothesis again as above.)

Table 5.4: Results: "excellent responders"

Results for "excellent responders"

Variable 2.1 Simple regression

(truncated data, cubic)

2.2 Censored model (cubic) Intercept 9.4572 (<.0001) 6.96083 (<.0001) 3Months 11.5256 (0.0017) 43.1100 (<.0001) First 74.6462 (0.2883) 698.668 (<.0001) 3months*first -1210.48 (0.0202) -6177.00 (<.0001) Blasts 0.06700 (0.2348) 0.63415 (<.0001) T i me -9.0523 (<.0001) -9.53918 (<.0001) T i me2 1.7130 (<.0001) 1.50948 (<.0001) T i me3 -0.1012 (<.0001) -0.07868 (0.0003) AIC 401.1 738.3

parameter blasts. As noted above, the large difference in the number of observations used really impacts the comparison between the normal regression model and the censored model. It seems that when the zeros are excluded in such great quantity, the implications are severe.

5.5 Comparing the two data sets

From the results it can be seen that even if different background variables are not equally good for all models, the value at 3 months and the percent reduction after three months (first), as well as their interaction term first*3months, are always included in the models. This indicates that they are indeed a promising option when change in the recovery curves is modeled. The intercept of the censored model for "excellent responders" (model 2.2) is 6.96; for the similar cubic model for the "other" group (model 1.2) it is 7.71. As the intercepts are logarithm-transformed, they cannot be interpreted in a particularly meaningful way. However, their clear proximity indicates that the differences in intercept are not enormous, which is in line with previous findings[2]. The intercepts of the straight-line models for the "other" data set are lower, but we have no comparison for them in the "excellent responders" group.

The AIC is best for the models fitted to the truncated data sets in both groups. This is not at all surprising, considering that the zeros, which are a cause of considerable heterogeneity in the data, are excluded. The AICs for the two data sets are very different, but again, that was to be expected: a simple visual inspection of the data sets shows that the "excellent responders" group is more homogeneous.


6. Conclusion


Another option would have been to start from the other end and separate the worst responders. Indeed, some patients' values increased after starting treatment. However, this very poorly performing group was so small in the data sample that modeling it would have been extremely unreliable. Given a larger sample size, this would be a very interesting approach.

The behavior of the recovery curves and change curves close to, and in and out of, zero, along with the possibility of different and even person-specific detection boundaries, is a topic meriting further investigation.


Bibliography

[1] Regionalt cancercentrum Uppsala Örebro, Nationellt vårdprogram för kronisk myeloisk leukemi, 2015

[2] ElMissiry et al., "Early BCR-ABL1 Transcript Decline after 1 Month of Tyrosine Kinase Inhibitor Therapy as an Indicator for Treatment Response in Chronic Myeloid Leukemia.", PLoS ONE 2017;12(1): e0171041, doi:10.1371/journal.pone.0171041

[3] Hjorth-Hansen, H., et al., "Safety and efficacy of the combination of pegylated

interferon-α2b and dasatinib in newly diagnosed chronic-phase chronic myeloid leukemia

pa-tients.",Leukemia 2016;30(9): 1853–1860, doi: 10.1038/leu.2016.121.

[4] Hjorth-Hansen, H., et al., "Dasatinib induces fast and deep responses in newly diag-nosed chronic myeloid leukaemia patients in chronic phase: clinical results from a randomised phase-2 study (NordCML006)", EEur J Haematol., 2015;94(3):1243–250, doi: 10.1111/ejh.12423.

[5] Dybko et al., "The Hasford Score May Predict Molecular Response in Chronic Myeloid Leukemia Patients: A Single Institution Experience", Dis Markers 2016: 7531472, doi:10.1155/2016/7531472

[6] Diggle, P.J., Liang, K. and Zeger, S.L., Analysis of Longitudinal Data, Oxford Statistical Science Series 13: Oxford Science Publications, Oxford University Press, 1994

[7] Marin et al., "Assessment of BCR-ABL1 transcript levels at 3 months is the only requirement for predicting outcome for patients with chronic myeloid leukemia treated with tyrosine kinase inhibitors.", J Clin Oncol. 2012;30(3): 232-8, doi:10.1200/JCO.2011.38.6565

[8] Dagne, A.G., Huang, Y., "Mixed-Effects Tobit Joint Models for Longitudinal Data with Skewness, Detection Limits, and Measurement Errors", Journal of Probability and Statistics

[9] Fitzgerald, A.P., DeGruttola, V.G., Vaida, F., "Modelling HIV viral rebound using non-linear mixed effects models", Stat Med. 2002;21(14): 2093-108, doi:10.1002/sim.1155

[10] Lu, T. and Wang, M., "Nonlinear mixed-effects HIV dynamic models with considering left-censored measurements", J Stat Distrib App. 2014;13(1), doi:10.1186/2195-5832-1-13

[11] Helsel, D.L., "Much Ado About Next to Nothing: Incorporating Nondetects in Science", Ann Occup Hyg. 2010;54(3): 257-62, doi:10.1093/annhyg/mep092

[12] Helsel, D.L., "Fabricating data: How substituting values for nondetects can ruin results, and what can be done about it", Chemosphere 2006;65(11): 2434–2439, doi:10.1016/j.chemosphere.2006.04.051

[13] Dagne, G.A., "Piecewise Growth Mixture Tobit Models: Application to AIDS Studies.", J Biopharm Stat. 2015;26(6): 1339-52, doi:10.1080/10543406.2014.1002363

[14] Singer, J.D., Willett, J.B., Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, Oxford University Press; 2003, Part one

[15] SAS Institute Inc., SAS/STAT® 9.2 User’s Guide: The MIXED Procedure 2009. Cary, NC: SAS Institute Inc.

[16] Wooldridge, J. M., Econometric Analysis of Cross Section and Panel Data, MIT press; 2010, Section IV

[17] Fitzmaurice, G.M., Laird, N.M. and Ware, J.H. Applied Longitudinal Analysis, Wiley: Wiley Series in Probability and Statistics; 2004

[18] Brown, H., Prescott, R., Applied Mixed Models in Medicine, John Wiley & Sons; 2014

[19] Bell, B.A., Morgan, Schoeneberger, Loudermilk, "Dancing the Sample Size Limbo with Mixed Models: How Low Can You Go?", Poster, SAS Global Forum 2010

[20] Gujarati, D.N., Basic Econometrics, Fourth Edition, McGraw-Hill; 2003, Chapter three

[21] Johnson, M., "The Effect of Missing Data on Repeated Measures Models", SAS paper 626, SAS Institute Inc., Cary, NC, 1996:1104


[23] Lorimer, M.F. and Kiermeier, A., "Analysing Microbiological data: Tobit or not Tobit", International Journal of Food Microbiology 2007;116(3): 313–318, doi:10.1016/j.ijfoodmicro.2007.02.001

[24] University of California Los Angeles, Institute for Digital Research and Education, "How do I run a random effect tobit model using nlmixed?: SAS FAQ", https://stats.idre.ucla.edu/sas/faq/how-do-i-run-a-random-effect-tobit-model-using-nlmixed/

[25] Thiebaut, R. and Jacqmin-Gadda, H., "Mixed models for longitudinal left-censored repeated measures", Comput Methods Programs Biomed. 2004;74(3): 255–260, doi:10.1016/j.cmpb.2003.08.004

[26] Jin, Y., Hein, M.J., Deddens, J.A. and Hines, C.J., "Analysis of Lognormally Distributed Exposure Data with Repeated Measures and Values below the Limit of Detection Using SAS", Ann Occup Hyg. 2011;55(1): 97–112, doi:10.1093/annhyg/meq061

[27] University of California Los Angeles, Institute for Digital Research and Education, "Random effect tobit model in nlmixed: SAS Code Fragments", https://stats.idre.ucla.edu/sas/code/random-effect-tobit-model-in-nlmixed/

[28] Harring, J.R., Blozis, S.A., "Fitting correlated residual error structures in nonlinear mixed-effects models using SAS PROC NLMIXED", Behav Res Methods 2014;46(2): 372-84, doi:10.3758/s13428-013-0397-z

[29] University of Bristol, Centre for Multilevel Modelling, "Random slope models", http://www.bristol.ac.uk/cmm/learning/videos/random-slopes.html

[30] Tobin, J., "Estimation of Relationships for Limited Dependent Variables", Econometrica 1958;26(1): 24-36

[31] Wooldridge, J. M. Introduction to Econometrics - Europe, Middle East and Africa Edition, Cengage;2013

[32] Twisk, J., Rijmen, F., "Longitudinal tobit regression: A new approach to analyze outcome variables with floor or ceiling effects", J Clin Epidemiol. 2009;62(9): 953-958, doi:10.1016/j.jclinepi.2008.10.003


[34] SAS Institute Inc., SAS/STAT® 14.1 User’s Guide: The NLMIXED Procedure 2015. Cary, NC: SAS Institute Inc.
