Mining the WHO Drug Safety Database Using Lasso Logistic Regression

(1)

Mining the WHO Drug Safety Database Using Lasso Logistic Regression

Ola Caster

U.U.D.M. Project Report 2007:16

Examensarbete i matematisk statistik, 20 poäng Handledare: Andrew Bate, the Uppsala Monitoring Centre

Examinator: Silvelyn Zwanzig Juni 2007

Department of Mathematics

(2)

(3)

Mining the WHO Drug Safety Database Using Lasso Logistic Regression

Ola Caster

Master of Science Project in Mathematical Statistics

Supervisor: Ph.D. Andrew Bate, the Uppsala Monitoring Centre

Examiner: Ph.D. Silvelyn Zwanzig, Uppsala University

(4)

Abstract

For reasons such as low incidence, occurrence in groups frequently excluded from clinical trials and long onset times, some adverse drug reactions (ADRs) of a new medicinal product stay unnoticed until after market launch. The World Health Organization (WHO) in collaboration with the Uppsala Monitoring Centre (UMC) continuously collect spontaneous ADR reports from the entire world and use data mining approaches to detect which drugs are most likely to cause which previously unanticipated ADRs. This WHO drug safety database, being the largest of its kind, comprises about 3.8 million accumulated reports.

The currently used data mining methods are based on two-dimensional projections of the data with respect to a given drug-ADR combination.

This combination is then given an association score based on the dis- crepancy between the observed and expected number of reports on it.

In this thesis these disproportionality-based methods are represented by the information component (IC) measure of the UMC, a shrunk Bayesian measure.

A limitation with the IC is its incapability to deal with confounding by co-medication and masking. Confounding by co-medication means that the association between a drug and a certain ADR might seem stronger than it really is because that drug is used together with another drug, which in turn is truly associated with the ADR. Masking, on the other hand, is a phenomenon whereby a very strong association between an ADR and some drug might weaken the associations between that ADR and other drugs.

Here a novel method to mine the WHO drug safety database is proposed to address these issues, the lasso logistic regression (LLR). Instead of studying each combination separately, in the LLR model the ADR under study is fixed and its presence on a report is predicted by the presence of all occurring drugs in the database, thus yielding a logistic regression framework. Further, independent prior Laplace distributions are put on the parameters, resulting in a lasso-type shrinkage where a subset of the parameters are shrunk to exactly zero.

The LLR was confirmed to correct for confounding by co-medication and masking in simulated scenarios and specific clinical examples. Fur- ther, with a specific degree of shrinkage the LLR had 10 % higher recall and maintained precision in comparison to the IC with respect to a test database. Although its transparency is limited, the LLR has an important role to play in the future of ADR monitoring.

(5)

Abbreviations

ADR Adverse Drug Reaction WHO World Health Organization UMC Uppsala Monitoring Centre IC Information Component PRR Proportional Reporting Ratio LLR Lasso Logistic Regression

OR Odds Ratio

MAP Maximum a Posteriori BBR Bayesian Binary Regression

NRTI Nucleoside Reverse Transcriptase Inhibitor

NaRTI Nucleoside Analogue Reverse Transcriptase Inhibitor NNRTI Non-Nucleoside Reverse Transcriptase Inhibitor PI Protease Inhibitor

B Biguanide

S Sulfonylurea

T Thiazolidinedione

HIV Human Immunodeficiency Virus

SSRI Selective Serotonin Re-Uptake Inhibitors MS Multiple Sclerosis

FDA Food and Drug Administration

HBLR Hierarchical Bayesian Logistic Regression

(8)

1 Introduction

The issue of drug safety is one of the greatest - if not the greatest - concerns within the pharmaceutical community. It has been well-known at least since the thalidomide tragedy in 1961 that some adverse drug reactions (ADRs) will stay unnoticed until after drug launch [1]. Typically, these ADRs have a low incidence, occur in groups frequently excluded from clinical trials (such as preg- nant women) or have long onset times [2]. With a pharmaceutical industry having severe problems already with the current legislation to be able to afford their clinical trial programs, the way out seems to be post-marketing drug safety surveillance rather than intensified pre-marketing testing.

One of several approaches within post-marketing drug safety surveillance is to collect a large number of spontaneous ADR reports and mine the database thus constructed. When this mining is performed repeatedly over a period of time it is called ADR monitoring, an idea dating back to the 1960’s [3]. Already then the need for international cooperation within ADR monitoring was noticed [1].

In this spirit, the World Health Organization (WHO) in collaboration with the Uppsala Monitoring Centre (UMC) collect reports from about 80 countries worldwide, and now have over 3.8 million accumulated reports. This WHO drug safety database, which is the largest of its kind, is mined quarterly.

The database includes about 14000 drug terms and 2000 ADR terms. There are reports on about 750000 out of the approximately 20 million theoretical combinations of drug and ADR terms. From here on, the word combination will always have this meaning, unless explicitly stated otherwise. In addition to its size, the mining of the database is complex because of huge underreporting, heterogeneity of data on reports and reporting biases [4].

The primary task when mining this and similar databases is to find out which drugs are most likely to cause which previously unanticipated ADRs, i.e. to highlight combinations which seem to be too frequently reported. These combinations are then normally further investigated, often by clinical review of the evidence. Mainstay statistical inference such as e.g. hypothesis testing is not feasible for databases of this kind due to the non-systematic data collection and lack of comparison groups. To avoid confusion for readers familiar with ADR monitoring, a signal is here defined as the highlighting of a combination for clinical review by judging the reporting on that combination as unexpectedly high [5]. This use of the word does not take any clinical evidence of causality into consideration, as has been suggested when trying to define various concepts within this field of research [6].

(9)

The currently most popular data mining methods for quantitative analysis of drug safety databases are the so called disproportionality-based methods, which are either Bayesian or non-Bayesian two-dimensional projections of the data [7].

An example of the former is the information component (IC) measure of the UMC [8], and an example of the latter is the proportional reporting ratio (PRR) [9]. The major advantage with Bayesian measures in this context is that they exhibit an a priori belief of no disproportionality, which means that they are shrunk towards a value of no association. By varying the shrinkage one can control how much data that is needed to shift away from the a priori belief towards a suspicion that the combination is actually disproportionally over-reported. Due to the nature of the two-dimensional projections each combination is studied separately, and in some way the observed number of reports on that combination is compared to what would be expected based on the rest of the reports.

However, the disproportionality-based methods suffer from two clear limitations.

Firstly, they are unable to correct for confounding by co-medication [10]. This means that if one drug A, say, truly causes some ADR and another drug B, say, is frequently co-medicated with drug A, then these methods will signal that both drugs are likely to cause the ADR. Some researchers use the wording

’signal leakage’ from drug A to drug B, and drug B is sometimes referred to as an

’innocent bystander’. This is a well-documented phenomenon within statistics, where it is called Simpson’s paradox. It is, however, not a paradox but rather a simple consequence of the fact that each combination is studied separately.

The second problem is related to the background reporting rate. When studying a particular combination, the disproportionality-based methods naively assume that all reports on the ADR excluding the drug of interest constitute some sort of general background reporting of the ADR. However, if there are one or more drugs which occur frequently on those reports the background reporting becomes substantial, which increases the expected number of reports on the combination currently under study. Thus, when the observed number of reports is compared to the expected number of reports, the false conclusion could be drawn that the combination between the drug and the ADR should not be highlighted as a signal. This phenomenon is called ’masking’, or sometimes ’cloaking’ [10]. Note that this description of masking does not imply that there is any causal link between the ADR and the drug of the masked combination.

In this thesis, a fundamentally different approach to mining the WHO drug safety database is studied, namely regression. Generally speaking, in a regression model the value of some dependent variable is explained by a set of

(10)

predictor variables, and their respective degrees of contribution are estimated in terms of parameters. The nature of drug safety databases, where on each report a certain ADR is either present or absent, makes the use of logistic regression plausible [11]. By simultaneously using all drugs as predictors for the presence of the ADR on a report a multivariate framework is set up which, at least theoretically, corrects for confounding by co-medication. Further, because the regression models include an intercept which is a function of the background reporting rate, the regression approach could in theory be anticipated to avoid problems with masking. Even though there seems to be a huge gap between the disproportionality-based methods and the regression framework, there is a clear link. In fact it can be shown that the parameters of a logistic regression model are some sort of measures of disproportionality [12], however adjusted with respect to the other predictors, i.e. the presence of other drugs in this application.

To be precise, the regression method used is lasso logistic regression (LLR) [13].

This is one of many shrinkage regression methods, which all have the basic idea of shrinking the parameters towards zero. Generally speaking, shrinkage regression is appealing for two reasons: Firstly, whereas the estimates are no longer unbiased they exhibit a lower variance which gives an overall better predictability of the models, and secondly, the models become easier to interpret because a smaller subset of effects are highlighted [14]. The probably most well- known shrinkage regression method is ridge regression which was introduced as early as 1962 [15], six years after the idea of biased estimation had first been proposed by Stein [16].

In ADR monitoring one is primarily not interested in neither predictability nor interpretability of models, but rather in finding out which drugs are likely to cause which ADRs. However, the shrinkage approach still offers three potential advantages over ordinary logistic regression (as normally implemented) in this application where the number of predictors is of the magnitude 10⁴: Numerical instability in the estimation is avoided, the computations become considerably faster and finally drugs with very few reports on the ADR of interest will not be highlighted, given that the shrinkage is strong enough [17, 13].

This final property is completely analogous to the shrinkage of the Bayesian disproportionality-based methods.

The lasso (least absolute shrinkage and selection operator) was proposed by Tibshirani in 1996 [18], however similar ideas seem to have emerged earlier in related fields [19]. It is a regression method that imposes an L1 penalty on the parameters, which means that the sum of the absolute values of the

(11)

parameters is restricted. Already in the original description of the lasso the connection to Bayesian statistics was stated: The L1 penalty is equivalent to putting independent prior Laplace distributions on the parameters. Because of the special nature of this penalty, a subset of the parameters are shrunk to exactly zero. This is appealing in the context of ADR monitoring since it gives a natural signalling threshold at zero, although, admittedly, it is not altogether obvious that lasso logistic regression should be superior to e.g. ridge logistic regression in this application.

It might be clarifying to think of the LLR as a multivariate extension of the Bayesian measures of disproportionality, e.g. the IC, which in turn are based on the non-Bayesian measures of disproportionality. However, this extension does not only bring advantages but also possible limitations, such as the need for model assumptions and presumably heavier computational load. The sheer complexity of the method is also an issue since it might be difficult to interpret the models produced.

The aim of the project is to:

1. implement the method of LLR on the WHO drug safety database 2. investigate the properties of the LLR in ADR monitoring to see if this

method is practically useful and

3. to attempt to characterize when the LLR is most and least likely to be beneficial in ADR monitoring.

(12)

2 Models and Methods

2.1 The Database Structure

The structure of the n reports is:

{([a11, . . . , a1r1]^T, [d11, . . . , d1s1]^T), . . . , ([an1, . . . , anrn]^T, [dn1, . . . , dnsn]^T)}

(1) where a_ijrepresents the j:th ADR in the i:th report and, similarly, d_ijrepresents the j:th drug in the i:th report. We have that aij ∈ A = {α1, . . . , α_|A|}, the set of all ADRs, and that d_ij ∈ D = {δ₁, . . . , δ_|D|}, the set of all drugs. Every report includes at least one ADR and one drug, and we call a report with only a single drug listed a sole drug report.

2.2 Two-Dimensional Measures

Whereas the focus of this thesis is LLR, a couple of two-dimensional measures are needed to set up the framework. A two-dimensional projection of the database is given in table 1.

There are several disproportionality measures defined based on this projection.

In this project comparisons against the IC measure are frequently occurring.

This measure is well approximated by [20]:

IC = log₂ a + 0.5

(a+c)∗(a+b)

a+b+c+d + 0.5 (2)

The numerator is the observed number of reports plus a constant (0.5) and the denominator is the expected number of reports (under the assumption of mutual independence) plus the same constant. The addition of this constant to both the numerator and denominator shrinks the ratio towards 1, and thus the measure towards 0 since the logarithm to the base 2 is used. The shrinkage will be stronger for combinations with few reports. In practice, in addition to a point estimate also the lower endpoint of a 95 % credibility interval (which will

Table 1 . Two-dimensional projection of the database for ADR j and drug k.

ADR j yes ADR j no Total

Drug k yes a b a + b

Drug k no c d c + d

Total a + c b + d a + b + c + d

(13)

here be called a confidence interval) is calculated [4]. In screening the WHO database, the IC highlights all combinations for which this 95 % lower confidence bound called IC025 is strictly positive [8].

Another two-dimensional measure, the odds ratio (OR), is of interest because it is connected to logistic regression (see section 2.3.1). The odds ratio is defined as:

OR = a/b

c/d (3)

To understand this definition we need to know that the odds for an event A is given by P (A)/(1 − P (A)). Thus, the odds ratio is interpreted in this context as the quotient between the odds that the ADR is present on a report with the drug and the odds that the ADR is present on a report without the drug.

Strictly speaking, since we are dealing with the total database, this should be called the reporting odds ratio [7]. Non-shrunk crude reporting odds ratios have themselves been proposed as sensible measures of disproportionality.

2.3 The Lasso Logistic Regression Model

The notation of this section follows that of Section 2.1.

Contrasting the two-dimensional measures, in LLR one ADR rather than one combination is studied at the time. If we study the j:th ADR the data are transformed into the following form: {(x₁, y₁), . . . , (x_n, y_n)}

where y_i=

( 1 if aik= αj for some k ∈ {1, . . . , ri} 0 otherwise

and xip =

( 1 if d_il= δ_p for some l ∈ {1, . . . , s_i} 0 otherwise

if xip is the p:th element of xi. Note that xi stays the same irrespective of which ADR that is studied.

Now, let the ADR under study be fixed. Let xi be extended with the constant term 1 as its first input to accommodate the intercept β₀ and consider the ordinary logistic regression model:

P (y = 1|β, xi) = exp (β^Txi)

1 + exp (β^Txi) (4)

We extend this model to the lasso logistic regression model by imposing an L1

constraint on the parameters:

X

j

|β_j| ≤ t (5)

(14)

It is well known that finding the solution β to the model in Equation 4 under this constraint is the same as finding the (Bayes) maximum a posteriori (MAP) estimate under independent Laplace (double exponential) priors for the βjs [18]:

f (βj|λ) = λ

2exp (−λ|βj|) (6)

This prior distribution has mean 0, mode 0 and variance 2/λ². It can be shown that there is a one-to-one transformation between the bound t and the hyperparameter λ [14].

From now on we adopt the Bayesian perspective. Putting independent Laplace priors with mean 0 and variance 2/λ² on the parameters yields the following posterior log-likelihood (except a normalizing constant) [13]:

l(β) = −

n

X

i=1

log (1 + exp (−β^Txiyi)) −

|D|

X

j=0

(log2

λ+ λ|βj|) (7) The MAP estimate is the β that maximizes this posterior log-likelihood.

2.3.1 Interpretation of the β vector

In a univariate ordinary logistic regression model with a dichotomous predictor variable β consists of only one parameter β1 (if the intercept is neglected). It is well-known and easily shown [12] that, if the predictor variable is coded as either 0 or 1, β1 is the same as the log of the odds ratio (recall section 2.2) for the predictor in the two possible outcomes of the dependent variable. This connection between logistic regression and odds ratios is very important and serves as a foundation for our interpretation of the β vector.

The univariate case is easily extended to a multivariate setting with more than one predictor variable, as is the case in ADR monitoring. In this case each β_j (except the intercept) could be interpreted as an adjusted log odds ratio. In other words it is the log odds ratio for a certain predictor variable under the condition that all other predictor variables are identically distributed within the two outcomes of the dependent variable. Since this condition seldom holds, adjusted log odds ratios often differ from crude log odds ratios calculated from two-dimensional projections. This regression property is the reason why we would expect LLR to correct for confounding by co-medication.

By looking at the definition of an odds ratio, it seems reasonable that a value of greater than one, equivalent to a positive βj, would suggest an association between the ADR under study and the drug corresponding to the βj. In this

(15)

case, when the effect of all other drugs has been accounted for, the odds that a report including the drug will also include the ADR exceeds the odds that a report not including the drug will include the ADR. Note that this chain of logic assumes that the database truly reflects the real world. Since it is well-known that the database suffers from numerous flaws (see Section 1), thus we can only use the estimated β_j values as a means of highlighting combinations for clinical review, and not for common statistical inference.

In the LLR model the parameters are to varying degrees shrunk towards zero (equivalent of shrinking the log odds ratios towards unity) and the magnitude of the shrinkage depends on the value of the hyperparameter λ. Therefore the interpretation of the β vector is a much more delicate issue. However, our main interest will still be the positive parameters.

Equation 4 can aid in the interpretation of the intercept:

P (y = 1|β, xi0= 1; xip= 0 ∀p 6= 0) = exp (β0)

1 + exp (β0) ≈ exp (β0)

where the approximation will always be accurate for this database. Since xip = 0 for all p 6= 0 so that the effect of all drugs has been taken out, the intercept could be interpreted as the multivariate estimate of the log background reporting frequency of the ADR. This is the reason why we would expect the LLR to correct for masking when the background reporting is very unevenly spread out among the drugs.

2.4 The CLG-lasso Algorithm

The issue of finding the MAP estimate of β is a convex optimization problem.

In ordinary logistic regression β is estimated using the maximum likelihood method. The standard algorithm is the iteratively reweighted least squares (IRLS) algorithm, which is based on the Newton-Raphson method [14]. The disadvantage with this method in problems with many predictors, such as this, is the high memory load [13]. Also, when the lasso-type shrinkage is imposed on the parameters the IRLS algorithm can not ensure convergence [18]. Instead, a recently proposed algorithm, the CLG-lasso algorithm, was used for fitting the logistic model with Laplace priors [13]. CLG-lasso, which is a so called cyclic coordinate descent algorithm, is based on the CLG algorithm of Zhang and Oles [21]. An exhaustive description of the algorithm is beyond the scope of this thesis, and interested readers are referred to the detailed description by Genkin et al [13].

(16)

The basis of all cyclic coordinate descent algorithms is to optimize with respect to only one variable at the time while all others are held constant. When this one-dimensional optimization problem has been solved, optimization is performed with respect to the next variable, and so on. When the procedure has gone through all variables it starts all over with the first one again, and the iterations proceed in this manner until some pre-defined convergence criterion is met.

The one-dimensional optimization problem in LLR is to find β_j^new, the value for the j:th parameter that maximizes the posterior log-likelihood assuming that all other βjs are held constant. Looking at Equation 7, the objective function g() becomes:

g(z) =

n

X

i=1

log (1 + exp ((β_j− z)x_ijy_i− β^Tx_iy_i)) + λ|z| (8)

The CLG-lasso algorithm uses an approximation to g() such that, in the end, the update equation for β_j becomes

β_j^new=







βj− ∆j if ∆vj< −∆j

β_j+ ∆v_j if −∆_j ≤ ∆v_j≤ ∆_j βj+ ∆j if ∆j < ∆vj

(9)

where the interval [βj− ∆j, βj+ ∆j] is an iteratively adapted trust region for the suggested update ∆v_j. The width of this interval is determined based on its previous value and the previous update made to βj. The suggested update is given by

∆vj = Pn

i=1x_ijy_i_{1+exp (β}¹_T_x

iy_i)− λsgn(β_j) Pn

i=1x²_ijF (β^Txiyi, ∆jxij) (10) where, for some δ > 0, F () is defined as

F (r, δ) =

( 1/4 if |r| ≤ δ

1/(2 + exp (|r| − δ) + exp (δ − |r|)) otherwise (11)

The suggested update ∆v_j has two problems. First, because sgn() is undefined at 0, the update itself is undefined at βj= 0 and second, there is no guarantee that g() decreases if βjchanges sign. The first problem is solved by testing both sgn(β_j) = 1 and sgn(β_j) = −1 for a decrease in g() when β_j= 0. The convexity of the objective function guarantees that at most one option can be successful.

The second issue is solved by simply setting β_j^new to 0 if the update suggests a change of sign for βj.

(17)

This update is performed once for each β_j in each iteration. The algorithm stops when convergence has been reached as defined by

Pn

i=1|(β^Tx_iy_i)^new− (β^Tx_iy_i)^old| 1 +Pn

i=1|(β^Txiyi)^new| ≤ (12)

In this paper, = 0.0005 throughout.

2.5 Implementation

To implement the model presented in Section 2.3 using the algorithm in Sec- tion 2.4 on the WHO drug safety database, a freely available software called BBR (Bayesian Binary Regression) was used [13, 22]. It is an efficient implementation of the CLG-lasso algorithm written in the C language available for both Windows and Linux environments, operated via the command line. For a detailed description of the software and its options the reader is referred to the web page [22].

The starting point was a transcript from the database on the format presented in Section 2.1. Since the BBR software requires a datafile with a slightly different format, a translating script was written in the Perl language [23]. It prints a datafile based on which ADR the user wishes to study. The syntax looks like

> perl translator.plx adr=<adr number> <infile> <outfile>

where the command line options have obvious meanings.

When called, the BBR software by default prints a so called model file which basically includes the identification numbers for the used predictors (in this case all the drugs) and the respective parameter estimates. The model files give little information when inspected visually if there are more than just a few predictors. The software also prints information such as estimation diagnostics and numerically sorted parameter estimates to the screen. For easier use of the software and facilitated presentation of the results, a Perl script was written. It takes four arguments: Commands to pass on to the BBR software, the name of the data file, the name of the result file it should produce and the number of the ADR studied. These arguments are simply split by a semicolon:

> perl BBR.plx ;<BBR options>;<data file>;<result file>;<adr number>

(18)

This script first starts the BBR software with the given options and data file.

It then reads the information sent to the screen and saves it as the result file specified. Finally, it reads the result file and inserts the following information for drugs that have at least one report with the ADR studied:

• The name of the drug

• The IC value for the combination between this drug and the ADR

• The 95% lower confidence bound for the IC value (IC025)

All other lines are left untouched. In this way a result file is produced which includes estimation diagnostics as well as easily overviewed parameter estimates and corresponding IC values. An example of such a result file is given in Ap- pendix A.

Often it is desirable to get not only point estimates of model parameters but also at least approximate confidence intervals. Whereas there exist different approximate formulae for the calculation of the covariance matrix of the lasso parameter estimates in the context of linear models [18, 24], we know of nothing equivalent for logistic models. However, because the bootstrap is such a general approach, the bootstrap percentile method [25] was used to obtain approximate confidence intervals for the parameter estimates in this application of the LLR.

The method is very straightforward: Sampling with replacement from the original n reports is performed to construct B bootstrap datasets of size n, and then β is estimated from each of the bootstrap datasets. Finally, he approximate 1 − α confidence interval for β_j is given by [β_j^α/2, β^1−α/2_j ], where β_j^k is the k:th percentile of the B estimates for βj.

The property of the L1 penalty to set some parameters to exactly zero introduced a small problem to the bootstrap procedure. It can very well happen that parameters get strictly positive or strictly negative estimates based on the original data and some of the bootstrap datasets, whereas they get estimated to zero based on other bootstrap datasets. This means that the distribution of the bootstrap estimates will be bell-shaped around some value and have an isolated peak at zero. It was chosen to accept this, because if a lower confidence bound for a parameter is set to zero this simply reflects the fact that there is little data to support the association between the ADR and the drug corresponding to that parameter.

The bootstrap procedure was also implemented as a Perl script. It is operated the same way as BBR.plx above, but with an extra argument (<B>) determining the number of bootstrap samples:

(19)

> perl BBRboot.plx ;<B>;<BBR options>;<data file>;<result file>;<adr number>

It produces a result file similar to BBR.plx, however with an extra column for the lower confidence bound of the parameter estimates.

2.6 User Choices

2.6.1 Hyperparameter

As suggested in Section 2.3, the LLR is dependent on the user choosing a hyperparameter λ or, equivalently, a bound t. The value will decide how much the parameters are shrunk. Because the degree of shrinkage is specified in terms of prior variance rather than λ in the BBR software (the prior variance is equal to 2/λ², see Section 2.3), the choice of hyperparameter is presented as prior variance in this thesis. In many applications of the lasso and similar methods the hyperparameter value is chosen with respect to its ability of producing models with low prediction error. In such cases the method of cross-validation [14]

provides an easy and efficient way of estimating the hyperparameter.

The application of ADR monitoring is somewhat different since the models produced are not primarily intended for prediction. In fact, once the estimation of a given model is complete, the model is used to raise hypotheses about possible associations between drugs and the ADR under study. Thus, one could suspect that cross-validation would not be a good way of choosing the hyperparameter value in this application. This suspicion was supported by simple preliminary testing (results not shown) where cross-validation suggested far too little shrinkage; for instance drugs with only a single report on an ADR received unrealistically high parameter estimates. The reason might be that some of the data can add to the predictability although it presents far too weak evidence for potential highlights.

In this study it was assumed that a prior variance of 1 would work well, or at least not be harmful to the method. This hyperparameter value was used in the entire study except the systematic comparisons (see Section 3.4), where different hyperparameter values were used. Interestingly, a similar pragmatic approach regarding a suitable shrinkage is used also for the IC measure [4].

(20)

2.6.2 Signalling Threshold

In the setting of ADR monitoring one needs a signalling threshold, i.e. a decision rule specifying when the output of the method should lead to further investigation of a particular combination. Since the interpretation of the model parameters of the LLR is quite complex (see Section 2.2), it is far from obvious how this threshold should be chosen.

In this study two intuitive signalling thresholds were used, β₀₂₅ > 0 and β >

0. The former has a clear connection to the corresponding threshold for the IC measure, which is IC025 > 0. The main disadvantage with this choice is that the problems of the bootstrap procedure (see Section 2.5) propagate into the signalling decision. The latter one, which was only used in the systematic comparisons, lacks the fine-tuning that a varying confidence level offers, however the low computational load is a clear advantage. See also Section 3.4 for more elaboration on the two different signalling thresholds.

(21)

3 Experiments

This section covers three different parts: Simulation studies, studies of specific real examples and systematic comparisons between the LLR and the IC. For each part the experimental setup is described, followed by the results and a brief discussion. The reason for this separation is to emphasize that the three parts follow a natural chronological course. Note that in quite a few graphs the y-axis label is ’signal score’ which means IC value or, in the case of the LLR, β estimate. These are not expected to be on the same scale and can therefore not be compared in an absolute sense.

3.1 Simulation

In order to investigate whether the LLR was likely to be able to correct for confounding by co-medication and masking, simple simulated examples were studied first. The original database was used as background and then reports on two made up drugs, A and B, and a made up ADR, X, were added. Be- cause of this simple approach the examples were transparent which made the interpretation easier. The LLR was run with a prior variance of 1 without bootstrapping.

Using a notation of A/X for the combination between drug A and ADR X;

¬A/X for the combination between other drugs than A and ADR X; A, B/X for the combination of A and B together with X and so forth, the two scenarios are presented in Table 2.

Scenario 1 was intended to simulate a situation with confounding by co-medication.

The most important features are the 100 sole drug reports on A/X, the 10 reports on A, B/X and the small number of sole drug reports on B/X. This reflects a strong association between A and X and a weak association between B and X probably confounded by co-medication with A, at least at the point

Table 2 . Summary of the number of added reports in the simulated scenarios.

Scenario A/X A, B/X B/X A/¬X B/¬X A, B/¬X ¬A, ¬B/X

(spread)

1 100 10 0-10 1000 1000 100 1000 (100)

2 5000 0 0-10 5000 5000 0 1000 (100)

A and B are made up drugs and X is a made up ADR. The numbers indicate how many reports that were added on certain combinations in the different scenarios. The

’spread’ is the number of drugs that the reports are spread out on.

(22)

where there is not a single sole drug report on B/X. The other number of reports that were used were more or less chosen arbitrarily.

The results are shown in Figure 1. As expected the IC could not differentiate between reports on the combination between drug B and ADR X that included only this drug and those reports that also included drug A. This means that the IC highlighted the combination B/X already at x = 0, i.e. where the only reporting of this combination consisted of ten reports that also included A, a drug strongly associated with X. As the number of sole drug reports between B and X increased the LLR signal score also increased, however it never reached as high as that of the IC. The reason is that the LLR continuously corrected the IC upward bias in the strength of association for this combination. The interpretation of the results was that the two methods performed as expected, and the implication was that a search for real examples of confounding by co- medication was started (see Section 3.2).

# Drug B reports

Signal score

0 2 4 6 8

0 2 4 6 8 10

● ● ● ● ● ● ● ● ●

IC

0 2 4 6 8 10

● ● ● ● ● ● ● ● ●

LLR

● ● ●

Drug A Drug B

Figure 1 . Results from simulation scenario 1, which attempted to mimic a situation with confounding by co-medication. Apart from some background reporting, drug A is extensively reported with ADR X and there are also joint reports on drugs A and B with the ADR. The x-axis shows how many reports on the combination between B and X that have been added, and the y-axis shows the signal score, i.e. the IC values for A and B in the left panel and the LLR β-values in the right panel.

(23)

Scenario 2 was set up to simulate a situation with masking. Here the most important features are the 5000 sole drug reports on A/X and the 1000 reports on X with 100 other drugs, which reflects a very uneven reporting on X where 5/6 of all reports are with A. It was therefore thought that if the LLR were capable of correcting for masking, it would highlight an association between B and X much earlier than the IC, e.g. with fewer B/X reports added. The other number of reports were chosen more or less arbitrarily, however it would be expected that the number of reports on B/¬X could not be set to low. The reason is that it seems unlikely that a drug which is rarely reported would have its association with an ADR masked, irrespective of how uneven the reporting on that ADR is.

The results are shown in Figure 2. The effect was not as dramatic as in the first simulation scenario, but it is clear that the LLR signal score was shifted upwards in comparison to the IC. It is difficult to draw any conclusions regarding how many reports the respective methods would need to highlight the combination B/X because that depends on the particular choices of signalling threshold and hyperparameter value for the LLR. Also, the IC lower confidence bounds were not calculated. Nevertheless, the interpretation of the results was that the LLR had the potential of correcting for occurring masking in the database, and therefore a real example was studied more extensively with respect to this property (see Section 3.3).

(24)

# Drug B reports

Signal score

0 5 10

0 2 4 6 8 10

●

● ● ● ● ●

● ●

IC

0 2 4 6 8 10

● ● ●

● ● ● ● ● ●

LLR

● ● ●

Drug A Drug B

Figure 2 . Results from simulation scenario 2, which attempted to mimic a situation with masking. Apart from some background reporting, drug A was heavily over-reported with ADR X in comparison to all other drugs. The x-axis shows how many reports on the combination between B and X that have been added, and the y-axis shows the signal score, i.e. the IC values for A and B in the left panel and the LLR β-values in the right panel.

3.2 Confounding by Co-Medication

As the results from simulation scenario 1 suggested that the LLR was capable of correcting for confounding by co-medication of drugs, a search for real examples of this phenomenon was initiated. The starting point was an ADR term where it has already been shown that confounding by co-medication is an issue, lactic acidosis.

In this section three different ADRs are presented. For each of them the analysis was performed in the same way: First, the LLR with a prior variance of 1 was run in 200 bootstrap replicates. This is generally considered to be too few replicates, however the lengthy computations allowed no more. Then a subset of drugs from the class(es) of drug(s) under study were selected and displayed separately. For all three ADRs, only drugs with at least five reports on the ADR were selected. In addition a graph displaying all drugs reported with the ADR was drawn where the different class(es) of drug(s) under study were compared to

(25)

all other drugs. Finally the reasons to observed differences between the methods were investigated, mainly by looking at raw counts of reports.

3.2.1 Example 1: Lactic Acidosis

This is a well studied example within this field of research, and has often been presented as a motivation for the need of using regression methods [26, 27]. The condition lactic acidosis is serious and highly undesirable, meaning that the pH level in the blood decreases to potentially lethal levels.

Two groups of drugs are particularly related to this ADR: anti-HIV drugs and anti-diabetes drugs. Anti-HIV therapy is very complex when it comes to ADR monitoring, partly because the drugs cause many different ADRs, partly because they are frequently co-medicated and partly because the reporting systems in the countries where the drugs are mostly used are severely under-developed.

The current question is which class or classes of anti-HIV drugs that actually cause lactic acidosis. Here we consider, like Szarfman et al. [27], four distinct classes of drugs: Nucleoside Reverse Transcriptase Inhibitors (NRTIs), Nucleo- side Analogue Reverse Transcriptase Inhibitors (NaRTIs), Non-Nucleoside Re- verse Transcriptase Inhibitors (NNRTIs) and Protease Inhibitors (PIs).

Regarding the anti-diabetics, it is particularly one class of drugs, the biguanides (B), that are undoubtedly causing lactic acidosis to some extent. Two of these drugs, phentermin and buformin, were actually withdrawn from the market because of their association with lactic acidosis. In addition to this class, the study included two other classes of peroral antidiabetics, the sulfonylurea (S) and the thiazolidinediones (T).

Figure 3 displays the signal scores (point estimates and 95 % lower confidence bounds) for all drugs from the selected classes with at least five reports on lactic acidosis. To simplify matters somewhat, all drug terms of combinations of substances were excluded.

The results look very much like those produced by Szarfman et al. Among the anti-HIV drugs (left panel), the NRTIs stavudine and zidovudine together with the NaRTI tenofovir were the drugs with the highest LLR signal scores.

The other NRTIs and NaRTIs, with the exception of zalcitabine, got lower, yet positive signal scores whereas all PIs and NNRTIs got negative signal scores.

The IC measure, on the contrary, gave positive signals for all the studied drugs.

Moving on to the anti-diabetes drugs (right panel), both methods gave the three biguanides very high signal scores. However, five out of six drugs in the class

(26)

sulfonylurea got positive signal scores by the IC and negative signal scores by the regression method. The β for the sixth drug, tolbutamide, was just slightly above 0 and β025 was clearly negative. The thiazolidinediones also got lowered signal scores but just slightly.

However not shown in the graph, calculated crude odds ratios were consistent with the IC values for almost all drugs. The only exception was lodenosine, and the explanation is probably that the IC is shrunk whereas crude odds ratios are not.

In Figure 4 all drugs reported with lactic acidosis are shown. Here the biguanides, the NaRTIs and some of the NRTIs are placed further up in the upper-right corner than any other drugs, indicating that these drugs are the strongest associated with this ADR. All other anti-HIV and anti-diabetes drugs are placed above or slightly to the left of the mass of all other drugs. The reason is that they have received too high IC values because of confounding. In the interest of clarity the 95 % lower confidence bounds are not presented in the graph, and as expected they added no further information to that given by the point estimates.

The results are readily explained by looking at raw counts of reports, see Ta- ble 3. Starting with the anti-diabetics, the biguanides obviously should be highly ranked. The sulfonylurea were so frequently co-medicated with metformin that it is natural that they got low signal scores by the LLR. The most obvious example is gliclazide, which had 18 reports on lactic acidosis, all including metformin.

Tolbutamide had 75 % sole drug reports, and probably would have got a higher positive signal if the absolute number of counts had been higher. Glibenclamide had 10 sole drug reports, but because 32 of its 52 reports also included metformin, which received a very strong signal, the LLR lowered its signal score markedly. The thiazolidinediones were also co-medicated with metformin, but to a much lower extent. Thus, their very moderate signal dampening seems to be in line with what could be expected.

Among the anti-HIV drugs, it seems natural that the PIs and NNRTIs got negative signal scores because of their very frequent co-medication with stavudine.

This does not apply to amprenavir, but that drug only had 2 sole drug reports, so its low signal score still seems fair. Stavudine and zidovudine had very many sole drug reports and seem to truly cause the ADR. The low LLR scores for the other NRTIs seem reasonable in most cases, however the drug abacavir is a bit of an exception. It had 18 sole drug reports on lactic acidosis but still got a signal score just above 0 because 38 of its total 76 reports also included the strongly associated drug stavudine.

(27)

Signal score −202468 NRTI−Stavudine NRTI−Lodenosine NRTI−Didanosine NRTI−Zidovudine NRTI−Lamivudine NRTI−Abacavir NRTI−Zalcitabine NRTI−Emtricitabine NaRTI−Tenofovir NaRTI−Adefovir NNRTI−Efavirenz NNRTI−Nevirapine NNRTI−Delavirdine PI−Saquinavir PI−Ritonavir PI−Nelfinavir PI−Indinavir PI−Amprenavir PI−Atazanavir B−Phenformin B−Metformin B−Buformin S−Glibenclamide S−Tolbutamide S−Gliclazide S−Glipizide S−Chlorpropamide S−Glimepiride T−Troglitazone T−Rosiglitazone T−Pioglitazone

●

●●

●

● ●

●

●●

●

● ●

●

●●

● IC LLR

Figure 3 . Signal scores for all drugs coming from the pre-defined classes of interest with at least five reports on lactic acidosis. Anti-HIV drugs are presented in the left panel and anti- diabetes drugs are presented in the right panel. Drug class abbreviations: NRTI Nucleoside Reverse Transcriptase Inhibitor; NaRTI Nucleoside Analogue Reverse Transcriptase Inhibitor;

NNRTI Non-Nucleoside Reverse Transcriptase Inhibitor; PI Protease Inhibitor; B Biguanide;

S Sulphon amide; T Thiazolidinediones.

Mining the WHO Drug Safety Database Using Lasso Logistic Regression

Mining the WHO Drug Safety Database Using Lasso Logistic Regression

Ola Caster

U.U.D.M. Project Report 2007:16

Department of Mathematics

Mining the WHO Drug Safety Database Using Lasso Logistic Regression

Contents

Abbreviations

1 Introduction

2 Models and Methods

3 Experiments