Assurance for binary response with and without covariates


Mattias Östlund

U.U.D.M. Project Report 2007:28

Degree project in mathematical statistics, 30 higher education credits

Supervisor: Anna Stoltenberg, AstraZeneca

Examiner: Hans Garmo

October 2007

Department of Mathematics

Uppsala University


Abstract

This paper describes the difference between statistical power and assurance when the response is binary. Sample size is typically chosen to achieve a desired power conditionally on a pre-specified treatment effect. In practice, power will not always give a good estimate of the probability that the trial will produce a positive outcome, because there may be uncertainty about the true underlying treatment effect. Assurance is the unconditional probability that the trial will end with the desired outcome. Sample size calculation using assurance is a Bayesian approach and needs a prior distribution for the unknown parameter, which will be described. This paper also describes how to perform a power simulation and an assurance simulation when the response is binary, with and without adjustment for covariates.


Acknowledgment

I would like to thank Anna Stoltenberg and Olivier Guilbaud at AstraZeneca for all the advice and help they gave me. I would also like to thank my supervisor Hans Garmo for his help and support.

1 BASIC CONCEPT OF POWER AND ASSURANCE
1.1 STATISTICAL HYPOTHESES
1.2 THE USE OF P-VALUES IN HYPOTHESIS TESTING
1.3 POWER
1.4 ASSURANCE
2 POWER AND ASSURANCE FOR TWO PROPORTIONS
2.1 LOGISTIC REGRESSION
2.1.1 Wald test
2.2 POWER SIMULATION
2.3 COMPUTE ASSURANCE
2.4 PRIOR DISTRIBUTIONS
2.4.1 Beta distribution
2.4.1.1 Shapes of the beta density function
2.4.2 Normal distribution with a logit transformation
2.4.2.1 logit transformation
2.4.2.2 Distribution and generation of P
2.5 ILLUSTRATION OF POWER AND ASSURANCE
3 POWER AND ASSURANCE FOR TWO PROPORTIONS WITH COVARIATES
3.1 LOGISTIC REGRESSION WITH COVARIATES
3.2 POWER SIMULATION WITH COVARIATES
3.3 COMPUTE ASSURANCE WITH COVARIATES
3.3.1 GENERATION OF P
3.4 ILLUSTRATION OF POWER AND ASSURANCE WITH COVARIATES
3.4.1 Example 1

1 BASIC CONCEPT OF POWER AND ASSURANCE

1.1 Statistical Hypotheses

A null hypothesis, H0, is a hypothesis set up to be rejected in order to support an alternative hypothesis, H1. The null hypothesis is considered to be true until statistical evidence indicates otherwise. When testing a hypothesis, the null hypothesis should be the opposite of what you want to show.

For example, say that you want to test the null hypothesis H0 that the average length of life is equal for men and women in Sweden, versus the alternative that it is not equal. This may be stated formally as

$$H_0: \mu_1 = \mu_2 \qquad H_1: \mu_1 \neq \mu_2$$

where µ1 is the average length of life for a man in Sweden, and µ2 is the average length of life for a woman in Sweden. Here the alternative hypothesis is called a two-sided alternative hypothesis because it is true if either µ1 < µ2 or µ1 > µ2.

Let’s look at another example, this time with binary data. Suppose a new drug is tested against some illness and there are only two possible outcomes: either the patient dies (denoted by 0) or survives (denoted by 1). The null hypothesis is that the new drug does not give the patient a greater chance of survival than the old one, versus the alternative hypothesis that the patient has a greater chance of survival with the new drug. This is stated formally as

$$H_0: \mu_1 \le \mu_2 \qquad H_1: \mu_1 > \mu_2$$

Here µ1 is the probability of survival with the new drug and µ2 is the probability of survival with the old one. The alternative hypothesis is called a one-sided alternative hypothesis because it is true only if the new drug is better than the old one.

Two kinds of errors may be committed when testing a hypothesis. If the null hypothesis is rejected when it is true, a type I error has occurred. If the null hypothesis is not rejected when it is false, a type II error has occurred. The probabilities of these two errors are given by:

$$\alpha = P(\text{type I error}) = P(\text{reject } H_0 \text{ when } H_0 \text{ is true})$$

$$\beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \text{ when } H_0 \text{ is false})$$

In general when testing a hypothesis, one specifies the type I error probability α of the statistical test, called the significance level, and then ensures that the probability of a type II error β has a suitably small value, e.g. by choosing the sample size appropriately.

1.2 The use of P-values in hypothesis testing

One way to report the result of a test is to state that the null hypothesis was or was not rejected at a specified α-value, for example 0.05. A disadvantage of this approach is that it gives the decision maker no idea whether the null hypothesis was just barely rejected or not. To avoid this, the p-value approach has been adopted. The p-value is the probability that the test statistic will take on a value that is at least as extreme as the observed value of the statistic when the null hypothesis is true. Equivalently, the p-value is the smallest level of significance that would lead to rejection of the null hypothesis H0. The test statistic is called significant when the null hypothesis is rejected; therefore, you may think of the p-value as the smallest level α at which the test is significant. Once the p-value is known, the decision maker can determine how significant the test is without a pre-selected level of significance.
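As a small illustration of the definition, the sketch below computes a p-value for a test statistic that is assumed to be standard normal under H0 (as is approximately the case for the Wald test used in chapter 2). The function names and the observed value 2.1 are illustrative and not part of the original report.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1): p-value against a two-sided alternative."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def one_sided_p_value(z):
    """P(Z >= z) for Z ~ N(0, 1): p-value when large z supports H1."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# The null hypothesis is rejected at level alpha exactly when p < alpha.
z_observed = 2.1  # hypothetical observed test statistic
p = two_sided_p_value(z_observed)
reject_at_5_percent = p < 0.05
```

Reporting `p` rather than only `reject_at_5_percent` lets the reader apply any significance level afterwards.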

1.3 Power

Power, denoted by π, is the probability that the test will reject the null hypothesis given (i.e. as a function of) the true treatment effect. In other words, given that the true treatment effect differs from what H0 states, the power of a statistical test is the probability of not making a type II error. As power increases, the chance of a type II error decreases; the power is equal to 1 − β. The power is often set to 80% or 90%, and it is recommended that the power be as large as possible.

The power is important when you decide what sample size to choose. Let R denote rejection of the null hypothesis and θ = θ* denote the assumed parameter values corresponding to the specified treatment effect. To choose a sample size in a clinical trial to test a null hypothesis against an alternative hypothesis, you choose the sample size that solves

π(θ*) = π*

where π* is the desired power and π(·) denotes the power function

π(θ) = P(R | θ)

This is the conditional probability of R. That is, it is conditional on the true, but unknown, treatment effect.

1.4 Assurance

O’Hagan, Stevens and Campbell introduce assurance as an alternative or complement to power. Assurance is the unconditional probability that a trial will lead to a rejection outcome and it is denoted by γ. Choose the sample size that solves

γ = γ*

where γ = P(R) and γ* is the desired value of γ. In fact, the assurance of the event R is the expected power

P(R) = E(π(θ))

where the expectation is with respect to the prior probability distribution of θ. In this way assurance avoids the need to condition on the fixed treatment effect θ* at the design stage. Instead it quantifies the ability of the trial to achieve a desired outcome based on the available evidence.
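Written out, with f(θ) denoting the prior density of θ (a symbol introduced here only for illustration), the expected power is the power function averaged over the prior:

```latex
\gamma = P(R) = \int P(R \mid \theta)\, f(\theta)\, d\theta
       = \int \pi(\theta)\, f(\theta)\, d\theta
       = E\bigl(\pi(\theta)\bigr)
```

If the prior puts all its mass at the single value θ*, the integral reduces to π(θ*), so power can be read as the special case of assurance with no prior uncertainty.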

You need a Bayesian perspective at the design stage because assurance requires the specification of a prior distribution for the unknown population parameters. The prior distribution expresses prior uncertainty about the fixed but unknown true value of θ.

It is reasonable to adopt a Bayesian perspective at the design stage because the design should take all available information into account. Doing so increases the chance of a successful outcome.


2 POWER AND ASSURANCE FOR TWO PROPORTIONS

Let’s consider a clinical trial where the outcome is binary, i.e. with only two possible outcomes. The response is denoted by y and can only take the values 0 and 1, where a successful outcome is denoted by 1. Let ri denote the number of successes observed among ni patients for treatment i = A, B, and let Pi denote the underlying population success rate for patients receiving treatment i. Treatment A is the treatment for the active group and treatment B is the treatment for the placebo group, so PA is the probability of a successful outcome in the active group and PB is the probability of a successful outcome in the placebo group. The null hypothesis is PA = PB. Logistic regression and a Wald test will be used to test this.

2.1 Logistic regression

Logistic regression is primarily used when the output variable is, as in this case, binary. It can also be modified to handle data that are nominal or ordinal. The response, Yi, of a patient receiving treatment i can only take on the values 0 or 1. This response is assumed to be a Bernoulli random variable with the probability distribution

Pr(Yi = 1) = Pi

Pr(Yi = 0) = 1 − Pi

When the response is binary, the Pi’s are often expressed through the logistic response function or logit. This has the form

$$\log\left(\frac{P_i}{1 - P_i}\right) = \alpha_i, \qquad i = A, B$$

This means that

$$P_i = \frac{e^{\alpha_i}}{1 + e^{\alpha_i}}$$

and that PA equals PB if and only if αA = αB.
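The two transformations above can be sketched in a few lines of Python. The function names are illustrative, and the probabilities 0.15 and 0.11 are the ones used in the illustration in section 2.5.

```python
import math

def logit(p):
    """Log odds: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(alpha):
    """Inverse of the logit: e^alpha / (1 + e^alpha)."""
    return 1.0 / (1.0 + math.exp(-alpha))

alpha_a = logit(0.15)   # about -1.73
alpha_b = logit(0.11)   # about -2.09
```

Because the logit is strictly increasing, comparing αA with αB is equivalent to comparing PA with PB.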

2.1.1 Wald test

The null hypothesis that PA = PB, i.e. that αA = αB, can be tested through the Wald test. If the sample sizes nA and nB are large, the Wald test compares the maximum likelihood estimate of αA − αB with an estimate of its standard error, using the test statistic

$$z = \frac{\hat{\alpha}_A - \hat{\alpha}_B}{se(\hat{\alpha}_A - \hat{\alpha}_B)}$$

which is approximately standard normally distributed, N(0,1), under the null hypothesis. In normal theory models, tests based on this principle, but with z replaced by t, are exact. In other cases, including the present case, the Wald test is an asymptotic test that is valid only in large samples. In some cases the Wald test is presented as a χ² test. This is based on the fact that if z is asymptotically standard normal, then z² asymptotically follows a χ² distribution with one degree of freedom.
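For the two-group model of section 2.1 the maximum likelihood estimates have a closed form: the estimate of αi is log(ri/(ni − ri)), with estimated variance 1/ri + 1/(ni − ri), where ri and ni are the success counts and group sizes defined at the start of this chapter. The sketch below uses these standard results to compute the Wald statistic without fitting software. The function name, the two-sided p-value and the handling of empty cells are assumptions made for illustration.

```python
import math

def wald_test(r_a, n_a, r_b, n_b):
    """Wald z and two-sided p-value for H0: alpha_A = alpha_B.

    Returns None when a group has no successes or no failures, because
    the maximum likelihood estimate is then infinite.
    """
    if min(r_a, n_a - r_a, r_b, n_b - r_b) <= 0:
        return None
    diff = math.log(r_a / (n_a - r_a)) - math.log(r_b / (n_b - r_b))
    se = math.sqrt(1 / r_a + 1 / (n_a - r_a) + 1 / r_b + 1 / (n_b - r_b))
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical data: 135 of 900 successes on A, 99 of 900 on B.
z, p_value = wald_test(135, 900, 99, 900)
```

Squaring `z` gives the χ² form of the same test with one degree of freedom.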

2.2 Power simulation

Power simulation can be used to determine the sample size. To do a power simulation you draw a large number of patients from the placebo group and the active group. Draw nA Y-values, each with probability Pr(Y = 1) = PA of success and probability Pr(Y = 0) = 1 − PA of failure, for the active group. In the same way, draw nB Y-values, each with probability Pr(Y = 1) = PB and Pr(Y = 0) = 1 − PB, for the placebo group. Then perform a Wald test of the null hypothesis that PA = PB. Repeat this M times. The power is then estimated by the proportion of times that the test is significant. Note that in this simulation the parameter θ = (PA, PB) is fixed at the value θ* at which the power π* is desired.

If you have an earlier study you can also select N patients with replacement from that study, say nA = N/2 from treatment A and nB = N/2 from treatment B. Then perform a Wald test on each set of data to see if you can reject the null hypothesis that PA = PB. Do this M times. The power is then estimated by the proportion of times that the test is significant. The assumption is that PA and PB in the planned study are the same as their estimates from the earlier study.

One can then repeat this simulation for different sample sizes until a sample size that yields the desired power (usually 80% or 90%) is obtained.
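The first simulation scheme above can be sketched with NumPy as follows. This is a minimal illustration, not the author's program: it assumes a two-sided Wald test at the 5% level, uses the closed-form two-group estimates (log odds with variance 1/r + 1/(n − r)), treats trials with an empty cell as non-significant, and borrows PA = 0.15 and PB = 0.11 from section 2.5. All function names are hypothetical.

```python
import numpy as np

Z_CRIT = 1.959964  # two-sided 5% critical value of N(0, 1) (assumed level)

def wald_z(r_a, n_a, r_b, n_b):
    """Vectorised Wald z for alpha_A - alpha_B; 0 where an estimate is infinite."""
    r_a = np.atleast_1d(r_a).astype(float)
    r_b = np.atleast_1d(r_b).astype(float)
    ok = (r_a > 0) & (r_a < n_a) & (r_b > 0) & (r_b < n_b)
    z = np.zeros(r_a.shape)
    ra, rb = r_a[ok], r_b[ok]
    diff = np.log(ra / (n_a - ra)) - np.log(rb / (n_b - rb))
    se = np.sqrt(1 / ra + 1 / (n_a - ra) + 1 / rb + 1 / (n_b - rb))
    z[ok] = diff / se
    return z

def simulate_power(p_a, p_b, n_a, n_b, n_sims, rng):
    """Proportion of n_sims simulated trials in which the Wald test rejects."""
    # The number of successes among n Bernoulli draws is binomial.
    r_a = rng.binomial(n_a, p_a, size=n_sims)
    r_b = rng.binomial(n_b, p_b, size=n_sims)
    return float(np.mean(np.abs(wald_z(r_a, n_a, r_b, n_b)) > Z_CRIT))

rng = np.random.default_rng(2007)
for n_total in (500, 1000, 2000):
    power = simulate_power(0.15, 0.11, n_total // 2, n_total // 2, 5000, rng)
    print(n_total, round(power, 2))
```

Because the sidedness and level of the test are assumptions here, the printed estimates need not agree with the figures reported in section 2.5.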

2.3 Compute assurance

To compute assurance you simply extend the power simulation method. The process involves sampling new values of PA and PB from a prior distribution each time before sampling the data. The general algorithm to compute assurance is as follows:

1. Define counters I for iterations and T for the assurance. Set both counters to 0 from the start. Set the required number, N, of iterations.

2. Sample new values of PA and PB from the prior distribution.

3. Sample the sufficient statistic using the model and the new PA and PB.

4. Increment T if the test is statistically significant at the chosen α level.

5. Increment I. If I < N, go to step 2.

6. Estimate the assurance γ = P(R) by T/N.

Note that the assurance equals the power when you do not sample a new PA and PB for each iteration. Assurance is a different concept from power. Whereas power equals the probability of rejecting the null hypothesis given that the true values of the unknown PA and PB are as specified, the assurance equals the probability of rejecting the null hypothesis unconditionally on the values of PA and PB.
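The six steps can be written almost line for line as a loop. In the sketch below the prior is passed in as a function, so the same code gives a power estimate (point prior) or an assurance estimate (proper prior). The two-sided 5% Wald test, the closed-form two-group estimates and the Beta parameters in the usage example are assumptions chosen only for illustration; R here means any two-sided rejection.

```python
import math
import numpy as np

def wald_significant(r_a, n_a, r_b, n_b, z_crit=1.959964):
    """True if the two-sided Wald test of alpha_A = alpha_B rejects."""
    if min(r_a, n_a - r_a, r_b, n_b - r_b) == 0:
        return False  # infinite estimate: treated as not significant
    diff = math.log(r_a / (n_a - r_a)) - math.log(r_b / (n_b - r_b))
    se = math.sqrt(1 / r_a + 1 / (n_a - r_a) + 1 / r_b + 1 / (n_b - r_b))
    return abs(diff / se) > z_crit

def estimate_assurance(sample_prior, n_a, n_b, n_iter, rng):
    t = 0                                   # step 1: counter T
    for _ in range(n_iter):                 # steps 1 and 5: counter I
        p_a, p_b = sample_prior(rng)        # step 2: new P_A and P_B
        r_a = rng.binomial(n_a, p_a)        # step 3: sufficient statistics
        r_b = rng.binomial(n_b, p_b)
        if wald_significant(r_a, n_a, r_b, n_b):
            t += 1                          # step 4
    return t / n_iter                       # step 6

rng = np.random.default_rng(11)
# Point prior: P_A and P_B never change, so this is a power estimate.
power = estimate_assurance(lambda g: (0.15, 0.11), 900, 900, 2000, rng)
# Illustrative Beta priors with means 0.15 and 0.11 (parameters are made up).
assurance = estimate_assurance(
    lambda g: (g.beta(15, 85), g.beta(11, 89)), 900, 900, 2000, rng)
```

Passing the prior as a callable keeps the algorithm independent of the choice of prior family discussed in section 2.4.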

2.4 Prior distributions

2.4.1 Beta distribution

The beta distribution Beta(a,b) is a popular prior distribution for a proportion (or probability parameter). It has some advantages but also some disadvantages, which are discussed later. If P ~ Beta(a,b), the density function of P is

$$f(x; a, b) = \frac{1}{B(a, b)}\, x^{a-1} (1 - x)^{b-1}, \qquad 0 < x < 1$$

where

$$B(a, b) = \int_0^1 t^{a-1} (1 - t)^{b-1}\, dt$$

The mean and variance of P are

$$E(P) = \frac{a}{a + b} \qquad Var(P) = \frac{ab}{(a + b)^2 (a + b + 1)}$$

To get a new P for each iteration, just sample a new value from Beta(a,b). Given a desired value E for E(P) and v for Var(P) you get:

$$a = \frac{E^2 - E^3 - vE}{v}$$

and

$$b = \frac{a(1 - E)}{E}$$
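The two formulas for a and b translate directly into code. The sketch below is illustrative (the function name is not from the report); it also checks that the requested variance is attainable, since a Beta distribution with mean E must have variance below E(1 − E).

```python
def beta_params(mean, var):
    """Beta(a, b) parameters giving the requested mean E and variance v."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below E(1 - E)")
    a = (mean**2 - mean**3 - var * mean) / var
    b = a * (1.0 - mean) / mean
    return a, b

# Mean 0.15 with a range of variances, as in the example in section 2.4.1.1.
for v in (0.001, 0.005, 0.01, 0.02, 0.05, 0.1):
    a, b = beta_params(0.15, v)
    print(v, round(a, 4), round(b, 4))
```

A new P can then be drawn with any Beta sampler, for example `numpy.random.default_rng().beta(a, b)`.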

2.4.1.1 Shapes of the beta density function

The beta density function will take different shapes depending on the values of a and b:

• If a < 1 and b < 1 it is U-shaped

• If a < 1 and b ≥ 1, or a = 1 and b > 1, it is strictly decreasing

• If a = 1 and b > 2 it is strictly convex

• If a = 1 and b = 2 it is a straight line

• If a = 1 and 1 < b < 2 it is strictly concave

• If a = 1 and b = 1 it is the uniform distribution

• If a = 1 and b < 1, or a > 1 and b ≤ 1, it is strictly increasing

• If a > 2 and b = 1 it is strictly convex

• If a = 2 and b = 1 it is a straight line

• If 1 < a < 2 and b = 1 it is strictly concave

• If a > 1 and b > 1 it is inverted U-shaped.

Moreover, if a = b then the density function is symmetric around 1/2.

When the expected value is fixed at a certain value and you vary the variance, you get different shapes of the density function. Let’s look at an example where the expected value is fixed at 0.15 and the variance is 0.001, 0.005, 0.01, 0.02, 0.05 and 0.1.

One problem with the beta distribution is that there is no easy way to calculate the assurance when you have a model with covariates. Another problem is that it is impossible to center the P-distribution (in the mean or median sense) around a given value E and at the same time let the variance σ² → 0.

2.4.2 Normal distribution with a logit transformation

The density function of the normal distribution can be written as

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

I will use the normal distribution with a logit link as the prior distribution. The reason is that with the normal distribution it is possible to fix the mean value and take the variance to zero. It is also easy to calculate the assurance when you have one or more covariates.

2.4.2.1 logit transformation

As mentioned in section 2.1, the logit link transforms P as

$$\text{logit}(P) = \log\left(\frac{P}{1 - P}\right)$$

The ratio P/(1−P) is the odds of success, so the logit is often called the log odds. The logit function is an inverted sigmoid function; in other words, it produces a curve that has an inverted S-form (see picture below). The function is symmetric around P = 1/2 in the sense that logit(1 − P) = −logit(P). The logit link is often used for binary or binomial data.

2.4.2.2 Distribution and generation of P

Given a desired value E for the center of the P-distribution, and a value of σ², let α be distributed according to

$$\alpha \sim N\left(\log\left(\frac{E}{1 - E}\right), \sigma^2\right)$$

To sample a new value of P, sample a value of α and solve

$$\text{logit}(P) = \log\left(\frac{P}{1 - P}\right) = \alpha$$

The new value of P is then given by

$$P = \frac{e^{\alpha}}{1 + e^{\alpha}}$$
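Sampling from this prior takes two lines: draw α from the normal distribution, then apply the inverse logit. The sketch below is a minimal NumPy illustration with a hypothetical function name; it uses the identity e^α/(1 + e^α) = 1/(1 + e^(−α)).

```python
import numpy as np

def sample_p(center, sigma2, size, rng):
    """Draw P = inverse logit(alpha), alpha ~ N(logit(center), sigma2)."""
    mu = np.log(center / (1.0 - center))
    alpha = rng.normal(mu, np.sqrt(sigma2), size=size)
    return 1.0 / (1.0 + np.exp(-alpha))

rng = np.random.default_rng(0)
for sigma2 in (0.01, 0.1):
    p = sample_p(0.15, sigma2, 20000, rng)
    print(sigma2, round(float(np.median(p)), 3), round(float(p.std()), 3))
```

Setting `sigma2` to 0 returns the value `center` for every draw, which corresponds to the fixed-parameter case used for power.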

The probability density function of P is the derivative of the cumulative distribution function of P:

$$f_P(c) = \frac{\partial}{\partial c} \Pr(P \le c) = \frac{\partial}{\partial c} \Pr\left(\alpha \le \log\left(\frac{c}{1 - c}\right)\right) = \frac{1}{\sigma}\,\varphi\left(\frac{\log\left(\frac{c}{1 - c}\right) - \mu}{\sigma}\right)\left(\frac{1}{c} + \frac{1}{1 - c}\right)$$

$$= \frac{1}{\sigma\sqrt{2\pi}}\,\frac{1}{c(1 - c)} \exp\left(-\frac{1}{2}\left(\frac{\log\left(\frac{c}{1 - c}\right) - \mu}{\sigma}\right)^2\right)$$

where φ is the standard normal density and µ = log(E/(1 − E)). Note also that E equals the median of the P-distribution. This follows from the fact that the cumulative distribution function of P equals ½ when c is set equal to E. Let us look at a case where E is fixed at 0.15. This gives µ = log(0.15/(1 − 0.15)). Vary the variance σ² of α in a similar way as was done for the beta prior. The shapes of the P-distributions will be:

Note that with this approach, the top of the probability density function for P will be close to E = 0.15, no matter what σ² is. That was not the case with the beta distribution.

2.5 Illustration of Power and Assurance

Consider a phase II study of a new stroke drug. Prior information about PA (the new drug) and PB (the old drug) is specified via

            0      1
New drug    0.85   0.15
Old drug    0.89   0.11

where 1 indicates no symptoms (i.e. success).

To estimate the power for given values of N you have to:

1. Create N patients (say N/2 patients/group) where the probability of a successful outcome is 0.15 in the active group and 0.11 in the placebo group.

2. Perform logistic regression and a Wald test to test the null hypothesis that PA = PB, i.e. αA = αB.

3. Define a counter T, set to zero from the start. If p < 0.05 for H0, increase T by one.

4. Repeat steps 1-3 5000 times.

5. Estimate the power by π = T/5000.

6. Repeat for different N until you find the pre-specified levels of interest (π = 80% or 90%).

Now, let’s estimate the assurance for given values of N, using the normal distribution as a prior distribution. If the variance σ² for α is taken to be 0.01 and 0.1 (remember the picture in section 2.4.2.2) for both the active and the placebo group, the calculations are as follows:

1. Simulate an αA from $N\left(\log\left(\frac{0.15}{1 - 0.15}\right), \sigma^2\right)$ and an αB from $N\left(\log\left(\frac{0.11}{1 - 0.11}\right), \sigma^2\right)$.

2. Calculate $P_A = \frac{e^{\alpha_A}}{1 + e^{\alpha_A}}$ and $P_B = \frac{e^{\alpha_B}}{1 + e^{\alpha_B}}$.

3. Create N/2 patients/group, where the probability of a successful outcome is PA for the active group and PB for the placebo group.

4. Perform logistic regression and a Wald test to test the null hypothesis that PA = PB, i.e. αA = αB.

5. Define a counter T, set to zero from the start. If p < 0.05 for H0, increase T by one.

6. Repeat from the first step 5000 times.

7. Estimate the assurance by γ = T/5000.

8. Perform this from the first step for the same N as in the power simulation.
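Both procedures can be run side by side for a list of sample sizes. The sketch below is one possible NumPy version under stated assumptions: a two-sided Wald test at the 5% level, the closed-form two-group estimates instead of a regression routine, and trials with an empty cell counted as non-significant. The report does not state these details, so the output need not match the reported results; all names are hypothetical.

```python
import numpy as np

Z_CRIT = 1.959964  # two-sided 5% critical value (sidedness is an assumption)

def reject_rate(p_a, p_b, n, rng):
    """Share of simulated trials (one per entry of p_a, p_b) with |z| > Z_CRIT."""
    r_a = rng.binomial(n, p_a).astype(float)
    r_b = rng.binomial(n, p_b).astype(float)
    ok = (r_a > 0) & (r_a < n) & (r_b > 0) & (r_b < n)
    z = np.zeros(r_a.shape)
    a, b = r_a[ok], r_b[ok]
    z[ok] = (np.log(a / (n - a)) - np.log(b / (n - b))) / np.sqrt(
        1 / a + 1 / (n - a) + 1 / b + 1 / (n - b))
    return float(np.mean(np.abs(z) > Z_CRIT))

def power_and_assurance(n_total, sigma2, n_sims, rng):
    n = n_total // 2
    mu_a, mu_b = np.log(0.15 / 0.85), np.log(0.11 / 0.89)
    # Power: P_A and P_B fixed at 0.15 and 0.11 in every simulated trial.
    power = reject_rate(np.full(n_sims, 0.15), np.full(n_sims, 0.11), n, rng)
    # Assurance: new alpha_A and alpha_B (hence P_A and P_B) for every trial.
    alpha_a = rng.normal(mu_a, np.sqrt(sigma2), n_sims)
    alpha_b = rng.normal(mu_b, np.sqrt(sigma2), n_sims)
    p_a = 1.0 / (1.0 + np.exp(-alpha_a))
    p_b = 1.0 / (1.0 + np.exp(-alpha_b))
    return power, reject_rate(p_a, p_b, n, rng)

rng = np.random.default_rng(2007)
for n_total in (100, 500, 1000, 1500, 1800, 2000, 2500, 3000):
    power, assurance = power_and_assurance(n_total, 0.01, 5000, rng)
    print(n_total, round(100 * power), round(100 * assurance))
```

Drawing the number of successes from a binomial distribution is equivalent to creating the N/2 patients one by one, and lets all 5000 trials be simulated in a single vectorised call.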

Result:

N      Power   Assurance σ² = 0.01   Assurance σ² = 0.1
100    8 %     10 %                  14 %
500    32 %    32 %                  43 %
1000   57 %    54 %                  57 %
1500   73 %    67 %                  64 %
1800   80 %    72 %                  68 %
2000   84 %    76 %                  69 %
2500   90 %    80 %                  71 %
3000   96 %    84 %                  73 %

[Figure: Power and Assurance. Power/Assurance (%) plotted against N for three curves: Power, Assurance v = 0.01 and Assurance v = 0.1.]

Straight lines connect the points given above. The non-smoothness of the curves is due to the uncertainty in the results.

Notice from the results that:

• If the variance σ² for α is taken to be 0.01 then the assurance is lower than the 80% or 90% statistical power we want the study to have. When the power is 80% the probability of actually getting a good result is 72%, and when the power is 90% it is 80%.

• Increasing statistical power from 80% to 90% does not lead to a 10 percentage point increase of the assurance.

• If the variance σ² for α is taken to be 0.1 then the assurance is much lower than the power. The trial sponsor needs to decide if it is really worthwhile to increase the number of patients from 1800 to 2500 when the assurance only increases by 3 percentage points.

3 POWER AND ASSURANCE FOR TWO PROPORTIONS WITH COVARIATES

In this chapter I will extend this technique to the case with covariates. Let’s start by looking at logistic regression with covariates.

3.1 Logistic Regression with covariates

Logistic regression works in almost the same way as in the case with no covariates. The response, Yi, can only take on the values 0 or 1. The response is assumed to be a Bernoulli random variable with the probability distribution

Pr(Yi = 1) = Pi

Pr(Yi = 0) = 1 − Pi

A logit link will be used, just as in the case without covariates. If we have k covariates, x1,…,xk, that are binary or continuous, then

$$\log\left(\frac{P_i}{1 - P_i}\right) = \alpha_i + \beta_1 x_1 + \dots + \beta_k x_k, \qquad i = A, B$$

If we instead have a covariate that is categorical, say x1 with m categories, then

$$\log\left(\frac{P_i}{1 - P_i}\right) = \alpha_i + \beta_{11} x_{11} + \dots + \beta_{1(m-1)} x_{1(m-1)}, \qquad i = A, B$$

where x11,…,x1(m−1) are dummy variables for m − 1 of the categories.
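A categorical covariate with m categories enters the model through m − 1 indicator (dummy) columns. The sketch below shows one way to build them with NumPy, assuming the categories are labelled 1,…,m and that the last category is the reference, which is the coding used in Example 2 of section 3.4.2. The function name is hypothetical.

```python
import numpy as np

def dummy_code(categories, m):
    """Code categories 1..m as m - 1 indicator columns; category m is the reference."""
    categories = np.asarray(categories)
    return (categories[:, None] == np.arange(1, m)[None, :]).astype(float)

x1 = np.array([1, 2, 3, 4])
dummies = dummy_code(x1, 4)
# Rows: category 1 -> (1, 0, 0), 2 -> (0, 1, 0), 3 -> (0, 0, 1), 4 -> (0, 0, 0)
```

Leaving out one category avoids a redundant column, since the αi already act as group-specific intercepts.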


3.2 Power simulation with covariates

A power simulation can be performed in a similar way as in the previous chapter. Here is the algorithm.

1. Define a counter T = 0.

2. Create N/2 patients for the active group and N/2 patients for the placebo group.

3. Simulate the covariates for each patient. The covariate distributions, as well as the regression coefficients (assumed fixed), are estimated from the earlier study.

4. Let 1 denote a successful outcome. The probability of a successful outcome will be PA and PB. PA and PB will depend on the covariates, the regression coefficients and the given values of αA and αB of interest.

5. Perform logistic regression and a Wald test to test the null hypothesis that PA = PB.

6. If p < 0.05 for H0, increase T by one.

7. Repeat steps 2-6 M times.

8. Estimate the power by π = T/M.

Another possibility is to select N patients with replacement from the earlier study, N/2 from the active group and N/2 from the placebo group. Each patient will then have covariates. Then perform a test on each set of data. Do this M times; the power is then estimated by the proportion of times that the test is significant. The assumption is that the PA-distribution and PB-distribution in the earlier study are the same as in the planned study.
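The eight steps above can be sketched for a single binary covariate as follows. With a covariate in the model there is no closed-form estimate, so the sketch fits the logistic regression by Newton-Raphson. It uses the equivalent parametrisation with an intercept and a treatment indicator, whose coefficient equals αA − αB, and applies a two-sided Wald test at the 5% level to that coefficient. These choices, the function names and the parameter values in the usage line (borrowed from Example 1 in section 3.4.1) are assumptions for illustration.

```python
import numpy as np

def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Maximum likelihood fit by Newton-Raphson; returns (coef, covariance)."""
    coef = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ coef)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        step = np.linalg.solve(info, X.T @ (y - p))
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ coef)))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return coef, cov

def simulate_power_cov(alpha_a, alpha_b, beta, p_x, n_per_group, n_sims, rng,
                       z_crit=1.959964):
    treat = np.repeat([1.0, 0.0], n_per_group)        # 1 = active, 0 = placebo
    rejections = 0                                    # step 1: counter T
    for _ in range(n_sims):                           # step 7
        x = rng.binomial(1, p_x, size=treat.size).astype(float)   # step 3
        eta = np.where(treat == 1.0, alpha_a, alpha_b) + beta * x
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)  # step 4
        X = np.column_stack([np.ones_like(x), treat, x])
        try:
            coef, cov = fit_logistic(X, y)            # step 5
        except np.linalg.LinAlgError:
            continue                                  # failed fit: not significant
        z = coef[1] / np.sqrt(cov[1, 1])              # Wald z for alpha_A - alpha_B
        rejections += int(abs(z) > z_crit)            # step 6
    return rejections / n_sims                        # step 8

rng = np.random.default_rng(5)
power = simulate_power_cov(np.log(0.15 / 0.85), np.log(0.11 / 0.89),
                           -0.08, 0.45, 900, 200, rng)
```

In practice a statistical package would replace `fit_logistic`; the hand-written version is used here only to keep the example self-contained.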

3.3 Compute Assurance with covariates

The algorithm for assurance calculation with covariates is the same as in the case without covariates, but P will be generated in a different way, because the regression coefficients, as well as αA and αB, are assumed to be random.

3.3.1 Generation of P

If the model has k covariates that are binary, then the logistic model for the treatments looks like this:

$$\log\left(\frac{P_i}{1 - P_i}\right) = \alpha_i + \beta_1 (x_1 - E(x_1)) + \dots + \beta_k (x_k - E(x_k)), \qquad i = A, B$$

Here x1,…,xk are covariates, which can be dummy variables if categorical covariates are involved. The covariates are centered at their expectations so that the interpretation of αi in the model with covariates is the same as in the case without covariates (to make comparisons simple between the cases).

To summarize the algorithm for generation of P for a model that has k covariates:

1. Generate covariates from a covariate distribution with known expectations E(x1),…,E(xk) from an earlier study.

2. Generate αi. Given a desired value Ei for the center of the Pi-distribution and a value σ², let αi be distributed according to $N\left(\log\left(\frac{E_i}{1 - E_i}\right), \sigma^2\right)$, i = A, B.

3. Given a desired value $E_{\beta_h}$ for E(βh) and a chosen variance $\sigma^2_{\beta_h}$ for βh, generate βh from $N(E_{\beta_h}, \sigma^2_{\beta_h})$. Do this for h = 1,…,k, independently of each other.

4. Now solve the logistic equation to get a new P. This will give

$$P_i = \frac{e^{\alpha_i + \beta_1 (x_1 - E(x_1)) + \dots + \beta_k (x_k - E(x_k))}}{1 + e^{\alpha_i + \beta_1 (x_1 - E(x_1)) + \dots + \beta_k (x_k - E(x_k))}}$$

If the variance is set to 0 then this is a third method to determine the power. The power may differ slightly from that of the other methods, because this logistic method assumes that the odds ratio between the treatments is the same in each category of the covariates.
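Steps 2 to 4 of this algorithm can be sketched in NumPy as two small functions: one draws the random parameters once per simulated trial (the same βh for both arms), the other turns them into a success probability for each patient. The function names and argument layout are assumptions for illustration; the numbers in the usage example are those of Example 1 in section 3.4.1.

```python
import numpy as np

def draw_parameters(center_a, center_b, sigma2_alpha, beta_mean, sigma2_beta, rng):
    """Steps 2 and 3: draw alpha_A, alpha_B and the vector (beta_1, ..., beta_k)."""
    sd_alpha = np.sqrt(sigma2_alpha)
    alpha_a = rng.normal(np.log(center_a / (1.0 - center_a)), sd_alpha)
    alpha_b = rng.normal(np.log(center_b / (1.0 - center_b)), sd_alpha)
    beta = rng.normal(beta_mean, np.sqrt(sigma2_beta))   # independent draws
    return alpha_a, alpha_b, beta

def success_probability(alpha, beta, x, x_mean):
    """Step 4: P for each row of x, with covariates centered at x_mean."""
    eta = alpha + (x - x_mean) @ beta
    return 1.0 / (1.0 + np.exp(-eta))

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.45, size=(5, 1)).astype(float)     # step 1: one binary covariate
alpha_a, alpha_b, beta = draw_parameters(
    0.15, 0.11, 0.01, np.array([-0.08]), np.array([0.15]), rng)
p_active = success_probability(alpha_a, beta, x, np.array([0.45]))
p_placebo = success_probability(alpha_b, beta, x, np.array([0.45]))
```

With both variances set to 0 the draws collapse to their means, which gives the fixed-parameter case described in the last paragraph above.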


3.4 Illustration of Power and Assurance with covariates

3.4.1 Example 1

Let’s look at the same example as in section 2.5 but this time with sex as a covariate. In the earlier study there were 1705 patients. Data were not available for 6 of them; thus, 1699 patients were included in the efficacy analysis. Of those patients 850 received the new drug and 849 received the placebo. Among the patients who received the new drug 378 were female, and among the patients in the placebo group 381 were female, i.e. about 45% were female in both the active group and the placebo group. Let a female be denoted by 1 and a male by 0.

Remember that PA is the probability of a successful outcome in the active group and that PB is the probability of a successful outcome in the placebo group. Prior information about PA and PB:

Sex          Sex distribution
Male (0)     55%
Female (1)   45%

Let’s look at a power simulation and then the assurance calculation when the variance σ² for α is taken to be 0.01 and 0.1 (remember the picture in section 2.4.2.2). When logistic regression is performed on the data from the earlier study, β is estimated to be −0.08 and the variance of β is estimated to be 0.15. When the power is simulated the variance σ² is set to 0.

1. Create a counter T = 0.

2. Generate N/2 patients/group.

3. Generate an αA from $N\left(\log\left(\frac{0.15}{1 - 0.15}\right), \sigma^2\right)$ and an αB from $N\left(\log\left(\frac{0.11}{1 - 0.11}\right), \sigma^2\right)$; EA = 0.15 and EB = 0.11 are estimated in the previous study.

4. Generate a β from $N(-0.08, \sigma_\beta^2)$; E(β) = −0.08 from the previous study.

5. Generate the covariate x. The probability of a 1 (female) is 0.45.

6. Calculate

$$P_i = \frac{e^{\alpha_i + \beta (x - E(x))}}{1 + e^{\alpha_i + \beta (x - E(x))}}, \qquad i = A, B$$

7. Simulate a response for each patient. Probabilities for a successful outcome are given by PA for the active group and PB for the placebo group.

8. Perform logistic regression and then a Wald test for the hypothesis that PA = PB.

9. Increase T if p < 0.05.

10. Repeat steps 2-9 5000 times.

11. Then the power (or the assurance, when the variances are nonzero) is estimated by T/5000.

Result (in %):

N      Power   Assurance σ² = 0.01   Assurance σ² = 0.1
100    7       9                     13
500    31      33                    44
1000   55      53                    56
1500   72      66                    64
1800   80      72                    68
2000   84      76                    68
2500   90      81                    72
3000   95      85                    76

[Figure: Power and Assurance. Power/Assurance (%) plotted against N for three curves: Power, Assurance v = 0.01 and Assurance v = 0.1.]

Sex does not seem to be strongly associated with the response, because the result is almost the same as in the case without sex as a covariate.


3.4.2 Example 2

In this example I have the same data as in the last example but also a covariate with four categories, specified via:

Category   % of patients
1          36
2          30
3          22
4          12

Prior information about β:

       E     σ²
β11    3.0   0.6
β12    2.1   0.6
β13    1.1   0.6

To perform a power simulation or an assurance calculation, follow this algorithm:

1. Create a counter T = 0.

2. Generate N/2 patients/group.

3. Generate an αA from $N\left(\log\left(\frac{0.15}{1 - 0.15}\right), \sigma_\alpha^2\right)$ and an αB from $N\left(\log\left(\frac{0.11}{1 - 0.11}\right), \sigma_\alpha^2\right)$.

4. Generate β11 from $N(3.0, \sigma^2_{\beta_{11}})$, β12 from $N(2.1, \sigma^2_{\beta_{12}})$ and β13 from $N(1.1, \sigma^2_{\beta_{13}})$, independently, where E(β11) = 3.0, E(β12) = 2.1 and E(β13) = 1.1 from the previous study.

5. Generate the covariate x1 with probabilities from the table.

6. Make dummy variables depending on the category:

Category   x11   x12   x13
1          1     0     0
2          0     1     0
3          0     0     1
4          0     0     0

7. Calculate

$$P_i = \frac{e^{\alpha_i + \beta_{11}(x_{11} - E(x_{11})) + \beta_{12}(x_{12} - E(x_{12})) + \beta_{13}(x_{13} - E(x_{13}))}}{1 + e^{\alpha_i + \beta_{11}(x_{11} - E(x_{11})) + \beta_{12}(x_{12} - E(x_{12})) + \beta_{13}(x_{13} - E(x_{13}))}}, \qquad i = A, B$$

where E(x11) = 0.36, E(x12) = 0.30 and E(x13) = 0.22.

8. Simulate a response for each patient. Probabilities for a successful outcome are given by PA for the active group and PB for the placebo group.

9. Perform logistic regression and then a Wald test for the hypothesis that PA = PB.

10. Increase T if p < 0.05.

11. Repeat steps 2-10 5000 times.

12. Then the power (or the assurance, when the variances are nonzero) is estimated by T/5000.

The power is obtained if all the variances are set to zero. Let’s look at the assurance when the variance σα² of α is taken to be 0.01 and 0.1.

Result:

N      Power   Assurance σα² = 0.01   Assurance σα² = 0.1
100    11 %    12 %                   16 %
500    38 %    40 %                   48 %
1000   64 %    62 %                   61 %
1500   81 %    75 %                   69 %
2000   91 %    81 %                   72 %
2500   95 %    85 %                   74 %
3000   98 %    88 %                   76 %

[Figure: Power and Assurance. Power/Assurance (%) plotted against N for three curves: Power, Assurance v = 0.01 and Assurance v = 0.1.]

There are a few things you can notice from the result. If the variance for α is taken to be 0.01 then:

1. Increasing the sample size so that the power rises from 81% to 91% does not lead to a 10 percentage point increase of the assurance, i.e. of the average success probability. It only increases by 6 percentage points.

2. The assurance is lower than the 80% or 90% power we want the study to have.

3. Given N, both the power and the assurance increase when the model takes the covariate into account.

Suppose an expert actually estimated the variance for α to be 0.1; then increasing the sample size from 1500 to 2000 only increases the assurance by 3 percentage points. For the trial sponsor it would only be worthwhile to increase the sample size from 1500 to 2000 if the potential value of the drug is very large.

4 DISCUSSION

The use of assurance differs in some ways from the use of power. Whereas a power calculation gives the probability of rejecting the null hypothesis if the true treatment effect equals the specified treatment effect, the assurance specifies the unconditional probability of a successful outcome. Determining assurance entails specifying prior distributions for the unknown parameters. Good approximations of the variance of the parameters are needed to calculate the assurance. In the examples above, at the sample sizes that give high power, the assurance decreases as the variance of the parameters increases. Assurance can easily be determined using Bayesian clinical trial simulation.

Clinical trials are almost always designed to achieve a power of 80% or 90%. That does not mean that there is an 80% or 90% chance of a significant result. Instead that figure only applies conditionally on the treatment effect equaling the specified treatment effect. The assurance figure is often much lower than the power because it is the unconditional probability of a successful outcome. The power of 80% or 90% is often misleading and may lead to unfounded optimism.

It is important to specify prior knowledge of the unknown treatment effect carefully. As seen in my last example, the power and the assurance are higher when the covariate is included. If the covariate is not taken into account, the trial sponsor may have to pay for more patients than needed. It is also important to have a good estimate of the variance for α. Notice from my examples how the assurance is affected by the form of the prior distribution. In my last example a trial sponsor may think that it is worthwhile to increase the sample size from 1500 to 2000 when the assurance increases by 6 percentage points, but not when it only increases by 3 percentage points. Assurance will not replace power, but it is a good complement.

REFERENCES

1. O’Hagan A, Stevens JW, Campbell MJ. Assurance in clinical trial design. Pharmaceutical Statistics 2005; 4: 187-201.

2. SAS help and documentation: Overview of Power Concepts.

3. Chuang-Stein C. Sample size and the probability of a successful trial. Pharmaceutical Statistics (in press).

4. O’Hagan A, Stevens JW. Bayesian assessment of sample size for clinical trials of cost-effectiveness. Medical Decision Making; 21(3): 219-230.

5. http://www.wikipedia.org/ Beta distribution.

6. Lawson C, Montgomery DC. Logistic regression analysis of customer satisfaction data. Quality and Reliability Engineering International 2006 (in press).

7. Olsson U. Generalized linear models – an applied approach; 2002: 47, 86.