Department of Mathematics, Linköping University
Johan Persson
LiTH-MAT-EX–2017/12–SE
Credits: 16 hp
Level: G2
Supervisor: Frank Miller, Department of Statistics, Stockholm University
Examiner: Zhenxia Liu, Department of Mathematics, Linköping University
Linköping: July 2017
Problem statement: The D-optimal design is often used in clinical research. In multi-factor clinical experiments it is natural to restrict the experiment's design space so as not to give a patient the combination of several high dose treatments simultaneously. Under such design space restrictions it is unknown what designs are D-optimal. The goal of the thesis has been to find D-optimal designs for these design spaces. Approach: Two new algorithms for finding D-optimal designs with one, two or three factors with linear models have been developed and implemented in MATLAB. Two restricted design spaces were explored. In cases when the program could not find the D-optimal design an analytic approach was used. Results: Special attention was given to the two factor model with interaction. All of the D-optimal designs for this model with N ≤ 30, together with their permutations, have been listed, as well as their continuous designs. Conclusion: In one of the restricted design regions a simple design pattern appeared for N ≥ 7. In the other restricted design region no obvious pattern was found, but its continuous design could be calculated through analysis. It turned out that the number of trials at the lowest dose combination did not change when moving from the full space design to the restricted design regions.
Key words: exact design, D-optimality, D-optimal designs, design of experiments, Fedorov algorithm, restricted experiment region, restricted design region, restricted design space, information matrix, clinical experiments, isomorphic designs, isoinformatic designs.
URL for electronic version:
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-138614
I would like to thank my supervisor Frank Miller for his great help and guidance. I would also like to thank my parents for their everlasting support. I am grateful to have had company from my lovely classmates on long days of hard work. A double thanks goes to Erik Landstedt, both for his friendship and for being the opponent of my thesis.
Lastly I would like to thank the fixed point in my life, Patricia Carlbom.
Nomenclature
d number of factors (dimensions) in an experiment
N number of trials in an experiment
xi = (x1, ..., xd) design point number i, i = 1,...,N
S = {x1,...,xt}⊆ Rd design space
si(j) level j in factor i, i = 1,...,d, j = 1,...,li
li the highest level in factor i
t = l1· l2· ... · ld number of (possible) design points in S
n ≤ N number of design points used in the experiment
R = {x1, ..., xr} ⊆ S design region
r ≤ t number of (possible) design points in R
x factors
xi value of factor i
Y = (Y1, ..., YN)T responses
(xi, Yi) trial
E = {(xi, Yi)}i∈N experiment with N trials, i = 1,...,N
Ni number of trials in design point i
ωi = lim_{N→∞} Ni/N weighted amount of trials in design point i
X design matrix
XT transpose of X
M model matrix
det(A) determinant of A
md,model(X) polynomial that expands X to M
βi model coefficient i, i = 1, 2, ..., p
β̂i coefficient estimate
β = (β1, ..., βp)^T coefficient vector
ε = (ε1, ..., εN)^T experimental error
(X^T X) information matrix
(X^T X)^{-1} dispersion/covariance matrix
N (µ, σ2) normal distribution with mean µ and variance σ2
E(Y) expected value of Y
V(Y) variance of Y
DN design with N trials
Dopt,N D-optimal design with N trials
1 Background
2 Method
  2.1 Theory of Experiments
  2.2 Models
  2.3 Number of Trials
  2.4 Least Squares Estimation
  2.5 D-optimality
  2.6 Scaling
  2.7 Levels
  2.8 Design Regions
  2.9 Isomorphic Designs
    2.9.1 d = 1
    2.9.2 d = 2
    2.9.3 d = 3
    2.9.4 Isomorphic Designs and Information
    2.9.5 Isoinformative Designs
  2.10 Program
    2.10.1 Program Idea
    2.10.2 Program Description
  2.11 Computational considerations
3 Results
  3.1 Cut-out Design Region, d = 2
    3.1.1 Exact Designs
    3.1.2 Continuous Designs
  3.2 Cut-off Design Region, d = 2
    3.2.1 Exact designs, N = 4 and N = 5
    3.2.2 Exact designs, N = 6
    3.2.3 Continuous designs, N ≥ 7
  3.3 Algorithm Comparison
  3.4 Design area and information, d = 2
4 Discussion
  4.1 Conclusions
  4.2 Restrictions and Further Studies
A Appendix
  A.1 Code
  A.2 Isomorphic and Isoinformatic designs
  A.3 Isomorphism Table
  A.4 Cut-out Design Region vs Full Design Space
  A.5 Cut-out Design Region vs Cut-off Design Region
Background
Experiments are probably just as old as thinking man himself. Throughout the ages people have been able to construct good experiments through intuition and reflection alone. Let us take a look at stone skipping on water. If, with only two throws, we want to find out the correlation between the number of stone skips and the stone's shape, then what type of stones should we throw? The answer is one stone that is as flat as possible and one stone that is as far from flat as possible; let us call it round. We understand intuitively that throwing these two stones would be a better choice than throwing two stones that look more or less the same.
(Figure: two stone-skipping experiments compared. The information-rich experiment throws one perfectly round stone and one perfectly flat stone; the information-poor experiment throws two similar stones on the first and second throw.)
Let us have a look at another experiment. Say that we want to estimate Swedish university students’ enjoyment of dinner as a function of three factors: the amount of food they eat, the quality of the wine they drink and the rating of the restaurant where they have the meal. Each of these three factors we give
three possible levels.
Food = {one dish, two dishes, three or more dishes}.
Wine = {bad wine, mediocre wine, good wine}.
Restaurant = {low ranked restaurant, medium ranked restaurant, high ranked restaurant}.

Now let us code this to the numbers −1, 0, 1.

Food = {−1, 0, 1}.
Wine = {−1, 0, 1}.
Restaurant = {−1, 0, 1}.
So (0, 1, −1) means the combination (two dishes, good wine, low ranked restaurant). Let us further say that for economic reasons we will not allow the combination (1, 1, 1), that is, three or more dishes with good wine at a high ranked restaurant, because it would be too expensive. We thus restrict the design space and get the design region below, containing 26 possible design points.
(Figure: the design region for dinner, the 3 × 3 × 3 grid of coded levels with the point (1, 1, 1) removed, leaving 26 design points; axes Food, Wine, Restaurant.)
In pharmaceutical development it is sometimes interesting to measure the effect not of one, but of two drugs combined. It could for example be of interest to design an experiment measuring pain relief as a function of taking two different pain killers, see [9]. In such an experiment it can be desirable, for safety reasons, to restrict the design of the experiment so as not to give any patients a simultaneous full dose of both pain killers. This is called a restricted design region. Finding effective designs in restricted design regions is what this bachelor's thesis is about.
(Figure: a restricted design region with axes dose of painkiller 1 and dose of painkiller 2.)
Method
2.1 Theory of Experiments
Doing an experiment is the best method for examining a causal relation [11, p11-12]. In an experiment the experimenter can control some variables, called factors (also known as independent or explanatory variables). The number of factors is written d and the value of factor i is written xi, i = 1, 2, ..., d. The
experiment's outputs are called responses (also known as dependent or explained variables); these are observed by the experimenter. The responses, denoted Y, are stochastic variables since they depend not only on the controlled factors, but also on non-controlled explanatory variables, random noise and possible bias; the latter three are collectively named error and denoted ε.
This work explores what values the factors should take in an arbitrary experiment with ordinal data. Therefore we focus on the factors in the definition of experiment below.

(Figure: the experimenter controls the factors of the experiment and observes its responses.)
Definition 2.1.1. Experiment
An experiment is an examination of a causal relation between factors and responses, in which the experimenter controls some factors. [13]
For this property and others of experiments, see [13, p14-17].
Now that we have got a sense of what an experiment is, we will try to make a mathematical definition. But for that we need to know what a trial is.

Definition 2.1.2. Trial
A trial is the pair (x, Y ), where x is a d dimensional row vector, x ∈ Rd, and
Y ∈ R is an observed response. Y = f(x) + ε, f being the experiment function mapping a design point, x, to its response Y . ε is the error and its expected value is assumed to be zero, that is E(ε) = 0.
Definition 2.1.3. Experiment (mathematical definition)
An experiment, E, is a set of trials under the same function f. We write E = {(x1, Y1), (x2, Y2), ..., (xN, YN)}. The number of trials in an experiment is denoted by N.
Figure 2.1: An experiment containing twelve trials (N = 12). Factor interval scaled to [-1,1].
One trial is the smallest possible experiment since the trial (x, Y ) is also an experiment in itself, namely E = {(x, Y )}. The experiment E = {(x1, Y1),
(x2, Y2),..., (xN, YN)} could be said to have the design D = {x1, x2,..., xN}
since these are the design points chosen for examination. Because design points are often repeated in an experiment the design is written in the following more concise manner.
Definition 2.1.4. Exact design
D = ( x1 x2 x3 . . . xn
      N1 N2 N3 . . . Nn ),  with ∑_{i=1}^{n} Ni = N, Ni ∈ ℕ⁺,

and N is the total number of trials.

The first row gives the values of the factors at the design points. Ni gives the number of trials at xi. n is the number of design points chosen; by necessity n ≤ N. D is the exact design [2, p120].
When we want to underline the number of trials used in a design, we will write DN instead of D. For example D8 would be a design containing eight
trials.
Definition 2.1.5. Continuous design
D = ( x1 x2 x3 . . . xn
      w1 w2 w3 . . . wn ),  with ∑_{i=1}^{n} wi = 1 and wi ∈ (0, 1] ∩ ℝ.

The first row gives the values of the factors at the design points. wi gives the weighted amount of trials at xi. n is the number of design points chosen. D is the continuous design [2, p119].
This work deals almost exclusively with exact designs D, except for the result sections 3.1.2 and 3.2.3.
2.2 Models
If we seek a causal relation:
Y = f (x), (2.1)
with x being the factors’ values and Y being the responses’ values, we can construct a model:
Y = f (x) + ε, E(ε) = 0. (2.2)
As mentioned, we denote the number of factors (dimensions) by d. In general d ∈ ℕ; in this thesis d ∈ {1, 2, 3}. We assume that the factors are defined on, or scaled to, the interval [−1, 1], see section 2.6. The factors generate a design space S = ×_{i=1}^{d} [−1, 1] = [−1, 1]^d. S is a cartesian product, thus d = 1 generates a line of length 2, d = 2 generates a square of area 4 and d = 3 generates a cube of volume 8; these are our design spaces.
If we restrict the design space we get what we call the design region (also known as restricted design region), denoted R. R = S \ F, where F is an arbitrary region in S that we forbid. In an experiment we choose (with repetition of design space points allowed) N design points from R; these choices, together with their responses, are the trials defined above (definition 2.1.2). If we arrange the N design points as rows we get an N × d matrix; this matrix we denote X and call the design matrix. We will now put all of this into a small example.
Example 2.2.1. Say that d = 2, x = (x1, x2), x1 ∈ [−1, 1], x2 ∈ [−1, 1] and F = {(x1, x2) ∈ S : x1 + x2 > 1}. Then R is the cut-off design region. Putting one trial at each corner and two trials in the middle of the lower edge we get the following design and design matrix:

D = ( (−1, −1) (0, −1) (1, −1) (1, 0) (0, 1) (−1, 1)
          1        2       1      1      1       1    ),

X = [ −1 −1
       0 −1
       0 −1
       1 −1
       1  0
       0  1
      −1  1 ],

with rows x1, x2, x2, x3, x4, x5, x6.

When we have more than one trial we write the model in the following manner: Y = f(X) + ε, where X is the design matrix consisting of N trials and Y is the vector of responses, Y = (Y1, ..., YN)^T. f(X) is a function f(X): I^{N×d} → ℝ^N, where I = [−1, 1]. The function f depends linearly on the p-dimensional coefficient vector β = (β1, β2, ..., βp)^T and on the function md,model: I^d → ℝ^p, applied to each row xi of X.
For an experiment with N trials we get the following model:
f(X) = f( [ x1 ; x2 ; ... ; xN ] ) = f( [ x1,1 x1,2 ... x1,d ; x2,1 x2,2 ... x2,d ; ... ; xN,1 xN,2 ... xN,d ] )

     = [ md,model(x1,1, x1,2, ..., x1,d)
         md,model(x2,1, x2,2, ..., x2,d)
         ⋮
         md,model(xN,1, xN,2, ..., xN,d) ] (β1, β2, ..., βp)^T = M β,   (2.3)

where the N × p matrix is the model matrix M and the vector is the coefficient vector β,
where d is the number of factors, p is the number of parameters in the chosen model and β1, β2,..., βp ∈ R.
The models used in this thesis are five polynomials. Experience indicates that in very many experiments the response can be described by polynomial models of order no greater than two, and that curves with three or more points of inflection are rare [2, p36]. This is why we have restricted the number of polynomials to five. The models presented below have all been multiplied with their respective coefficient vector (β).
In one dimension:

Linear, m1,L:      y = β1x1 + ε
Affine, m1,A:      y = β1 + β2x1 + ε
Quadratic, m1,Q:   y = β1 + β2x1 + β3x1² + ε

In two dimensions:

Linear, m2,L:      y = β1x1 + β2x2 + ε
Affine, m2,A:      y = β1 + β2x1 + β3x2 + ε
Interaction, m2,I: y = β1 + β2x1 + β3x2 + β4x1x2 + ε
Quadratic, m2,Q:   y = β1 + β2x1 + β3x2 + β4x1x2 + β5x1² + β6x2² + ε
Cubic, m2,C:       y = β1 + β2x1 + β3x2 + β4x1x2 + β5x1² + β6x2² + β7x1³ + β8x2³ + ε

In three dimensions:

Linear, m3,L:      y = β1x1 + β2x2 + β3x3 + ε
Affine, m3,A:      y = β1 + β2x1 + β3x2 + β4x3 + ε
Interaction, m3,I: y = β1 + β2x1 + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x2x3 + ε
Quadratic, m3,Q:   y = Interaction + β8x1² + β9x2² + β10x3² + ε
Cubic, m3,C:       y = Quadratic + β11x1³ + β12x2³ + β13x3³ + ε
We notice that all models are symmetrical in the sense that all factors are handled in the same manner, that is, all terms are symmetrical. This means that if x1² is included in a three dimensional model then so are x2² and x3²; if x1x3 is included, then so are x1x2 and x2x3, and so on.
The reason that terms with higher-order interaction, such as βnx1²x2 or βnx1x2x3, are not included in the models above is that they are seldom seen in experiments [2, p41-42]. It is now high time for an example.
Example 2.2.2. Set d = 2, put no restrictions on the design space and use the interaction model y = β1 + β2x1 + β3x2 + β4x1x2 + ε. The function m2,I is then m2,I(x1, x2) = (1, x1, x2, x1x2). The design space is a square with side length equal to two. We fill the design space with six trials, one in each corner and two in the center.
The design places one trial at each of (−1, −1), (1, −1), (−1, 1) and (1, 1), and two trials at (0, 0). This gives

f(X) = f( [ −1 −1
            −1  1
             1 −1
             1  1
             0  0
             0  0 ] ) = [ m2,I(−1, −1)
                           m2,I(−1, 1)
                           m2,I( 1, −1)
                           m2,I( 1, 1)
                           m2,I( 0, 0)
                           m2,I( 0, 0) ] (β1, β2, β3, β4)^T

     = [ 1 −1 −1  1
         1 −1  1 −1
         1  1 −1 −1
         1  1  1  1
         1  0  0  0
         1  0  0  0 ] (β1, β2, β3, β4)^T = M β.
M is the model matrix and β is the coefficient vector. They both depend on which model we have chosen. M also depends on the design D.
So to summarize,

Y = (Y1, Y2, ..., YN)^T = [ md,model(x1,1, x1,2, ..., x1,d)
                            md,model(x2,1, x2,2, ..., x2,d)
                            ⋮
                            md,model(xN,1, xN,2, ..., xN,d) ] (β1, β2, ..., βp)^T + (ε1, ε2, ..., εN)^T.   (2.4)
Writing the equation more economically:

Y = M β + ε.   (2.5)
2.3 Number of Trials
The more trials we do, the more information we obtain. Since each trial has a cost, we want to gain as much information as possible out of as few trials as possible.

The minimum number of trials needed to get an information greater than zero depends on the model. In fact, we need at least as many trials as we have parameters, since it is necessary to have as many equations as there are unknown variables. Besides this minimum number of trials we also need the trials to be spread over at least as many design points in the design region as there are parameters.
N is the number of trials, n is the number of design points used in the design and p is the number of coefficients in the model. A necessary criterion for the information to be greater than zero is: N ≥ n ≥ p.
Shown below is a table of the minimum n needed, for each model used by us, to give an information greater than zero. (L means linear model, A means affine model, etc.)
L A I Q C
d = 1 1 2 - 3 4
d = 2 2 3 4 6 8
d = 3 3 4 7 10 13
2.4 Least Squares Estimation
The following theory is in large part taken from [10] and [14]. In the model
Y = M β + ε (2.6)
M is known since we chose it ourselves. The vector β is unknown and contains the parameters we want to estimate. The error ε = (ε1, ε2, ..., εN)^T is a stochastic vector and therefore Y is a stochastic vector, but we can observe Y's values.
We want to estimate β with least squares estimation. We work with the assumption that ε1, ..., εN are independent and identically N(0, σ²)-distributed. This means that E(Y) = Mβ and V(Y) = σ²·I_N, where I_N is the N-dimensional identity matrix.

Definition 2.4.1. Rank
The rank of a matrix A, denoted by rank A, is the dimension of the column space of A [6, p178].
We remind the reader of the fact that rank A = rank AT.
Definition 2.4.2. Full rank
Full rank means that rank A = min(m,n) where m is the number of rows and n is the number of columns in the matrix A [6, p270].
Theorem 2.4.3. Let Y = Mβ + ε, where M is an N × p matrix of full rank, N ≥ p; β = (β1, ..., βp)^T and ε = (ε1, ε2, ..., εN)^T, with ε1, ..., εN independent and identically N(0, σ²)-distributed. Then the least squares estimator of β is:

β̂ = (M^T M)^{-1} M^T Y.
Proof. εi = Yi − E(Yi), and we wish to estimate the parameters in β so that the error becomes as small as possible. Set

Ê(Y) = M β̂.

The i-th residual is defined as:

ei = Yi − Ê(Yi) = Yi − (M β̂)i.

We want to minimize the errors in absolute terms and do this by minimizing the squares of the residuals. One reason for using the squares of the residuals instead of their absolute values is that the absolute value function is not differentiable everywhere. We therefore minimize

∑_{i=1}^{N} ei² = e^T e,
e being the column vector of residuals. We can then write Y = M β̂ + e. Developing e^T e we get

e^T e = (Y − M β̂)^T (Y − M β̂) = Y^T Y − Y^T M β̂ − β̂^T M^T Y + β̂^T M^T M β̂,

which we simplify to

e^T e = Y^T Y − 2 β̂^T M^T Y + β̂^T M^T M β̂.

This expression has a minimum when its derivative with respect to β̂ vanishes, that is when

∂(e^T e)/∂β̂ = −2 M^T Y + 2 (M^T M) β̂ = 0.

Hence,

−2 M^T Y + 2 (M^T M) β̂ = 0  ⇔  β̂ = (M^T M)^{-1} M^T Y.   (2.7)
It is in the last step of (2.7) that we need the assumption of full rank, because without it the matrix M^T M cannot be inverted [2, p52]. Without this assumption the pseudo-inverse is used; the idea is analogous but the algebra somewhat different. We refer the interested reader to [15].
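The estimator of Theorem 2.4.3 can be sketched in a few lines of Python/NumPy (the thesis's own implementation is in MATLAB). The model matrix and responses below are made-up illustrative values for the affine one-factor model m_{1,A}, with rows (1, x1) at x1 = −1, 0, 1.

```python
import numpy as np

# Model matrix for m_{1,A}: rows (1, x1) at the three levels x1 = -1, 0, 1.
M = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0]])
Y = np.array([0.9, 2.1, 2.9])   # hypothetical observed responses

# Normal-equations form beta_hat = (M^T M)^(-1) M^T Y, valid since M has full rank:
beta_hat = np.linalg.inv(M.T @ M) @ M.T @ Y

# Numerically preferred equivalent:
beta_lstsq, *_ = np.linalg.lstsq(M, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat)
```

In practice `np.linalg.lstsq` (or a QR decomposition) is preferred over forming `(M^T M)^{-1}` explicitly, since the normal equations square the condition number of M.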
2.5 D-optimality
In this section we will define D-optimal designs. Such designs maximize the determinant of what is called the information matrix. The information matrix presented here can be generalized to non-linear models and is then called the Fisher information matrix [16, p5]. For the reader interested in this generalization I refer to appendix A.6.
We observe that the covariance matrix of the least squares estimator is:

V(β̂) = σ²(M^T M)^{-1}   (2.8)

[2, p52] (for proof see [17, p46]). When working with linear models in an experiment we want to estimate the parameters β1, ..., βp with the smallest possible variance. But we notice that "If interest is in the comparison of experimental designs, the value of σ² is not relevant, since the value is the same for all proposed designs for a particular experiment." [2, p53]. It is thus only the determinant of the matrix (M^T M)^{-1} that we try to minimize in this work.
(M^T M)^{-1} is called the dispersion matrix. The determinant of the dispersion matrix is proportional to the volume of the confidence ellipsoid of the parameters, see Figure 2.2 [3, p40].
Minimizing the determinant of the dispersion matrix is one type of design criterion (D-optimal design), another design criterion would be to minimize the sum of the squared parameter estimates (A-optimality), a third would be to minimize the squared length of the longest parameter estimate (E-optimality) [3, p40-43].
Figure 2.2: With only two parameters to estimate, β1 and β2, the determinant of the dispersion matrix is proportional to the area of the ellipse.
Definition 2.5.1. Optimal design
An experiment designed so that it optimizes a given design criterion is called an optimal design.
There are several optimal statistical design criteria. We will use the most popular, which is called D-optimality. One reason for its popularity is the fact that the factor levels can be scaled (linearly transformed/coded) without problems.
Definition 2.5.2. Information matrix
MTM is the information matrix for β as well as for the design D [2, p52].
It is D that together with the function md,model generates M.
Definition 2.5.3. Information
I(D) = |MTM | (2.9)
The information, I(D), is the determinant of the information matrix which is generated by the design D, see 2.1.4.
Example 2.5.4. Recall example 2.2.2, which gave us the model matrix:

M = [ 1 −1 −1  1
      1 −1  1 −1
      1  1 −1 −1
      1  1  1  1
      1  0  0  0
      1  0  0  0 ].

The determinant of the information matrix is then

|M^T M| = 384.

Thus I(D) = 384. When it is obvious which design D in I(D) we are speaking of, we will write only I.
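The calculation above is easy to reproduce numerically. The following is a small Python/NumPy sketch (the thesis's own code is in MATLAB) that rebuilds the model matrix of example 2.2.2 under m_{2,I}(x1, x2) = (1, x1, x2, x1·x2) and computes the information:

```python
import numpy as np

def m2I(x1, x2):
    """Row of the model matrix under the two-factor interaction model."""
    return [1.0, x1, x2, x1 * x2]

# One trial in each corner of the square, two trials in the center:
design_points = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0), (0, 0)]
M = np.array([m2I(x1, x2) for x1, x2 in design_points])

information = np.linalg.det(M.T @ M)
print(round(information))   # 384
```

Here M^T M happens to be the diagonal matrix diag(6, 4, 4, 4), so the determinant is 6 · 4 · 4 · 4 = 384, matching I(D) above.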
Definition 2.5.5. D-optimal design
A D-optimal design, Dopt, maximizes the determinant of the information matrix. Equivalently, a D-optimal design minimizes the determinant of the dispersion matrix.

Henceforth we will often write only optimal design but always mean D-optimal design, unless clearly stated otherwise.
Definition 2.5.6. Relative design efficiency
r(D) = |M^T M|^{1/p} / N   (2.10)
r(D) is a function that takes a design, D, as input data and scales its information according to the model and the number of trials.
Relative design efficiency gives us a way of comparing designs with the same model but a different number of trials, see tables A.2 or A.3 for examples of this.
Definition 2.5.7. D-efficiency

e(D) = I(D)^{1/p} / I(Dopt)^{1/p}   (2.11)

where Dopt is a D-optimal design [2, p151].
The D-efficiency compares a design, D, with the optimal design, Dopt; see the calculation in 3.2.1 for an example of this. We could of course compare two sub-optimal designs with each other instead of with an optimal design [2, p151]. Besides calculating the information of a design by taking the determinant of the information matrix, the information of an N-trial design can also be calculated in the following manner:
I(D) = | ∑_{i=1}^{N} (md,model(xi))^T md,model(xi) |.   (2.12)
While the information of a continuous design can be calculated with:

I(D) = | ∑_{i=1}^{n} ωi (md,model(xi))^T md,model(xi) |,   (2.13)

where ωi is the weight to be put in design point xi.
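The sum in equation (2.12) rebuilds the information matrix M^T M from rank-one terms m(xi)^T m(xi), so both formulas give the same value. The following Python/NumPy sketch (a stand-in for the thesis's MATLAB code) checks this on the design of example 2.2.2:

```python
import numpy as np

def m2I(x1, x2):
    # m_{2,I} as a 1 x p row vector, so m.T @ m is a p x p rank-one matrix.
    return np.array([[1.0, x1, x2, x1 * x2]])

points = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0), (0, 0)]

# Information matrix via the sum of outer products, as in (2.12):
S = sum(m2I(*pt).T @ m2I(*pt) for pt in points)

# Information matrix via the stacked model matrix:
M = np.vstack([m2I(*pt) for pt in points])

assert np.allclose(S, M.T @ M)
print(round(np.linalg.det(S)))   # 384, as in the example above
```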
2.6 Scaling
In the dinner example in Background we transformed the levels to {−1, 0, 1}. This might seem strange. If D-optimality can be coded arbitrarily, why not code the interval to [0, 1], [0, li − 1] or [1, li], where li is the number of levels (parts that we divide the interval in)? One reason not to code the levels to [0, li − 1] or [1, li] is that the information would then grow proportionally to the number of levels. A reason that [−1, 1] is preferred over [0, 1] is that the two end points should be weighted the same in the calculation of D-optimality, which would not be the case on the [0, 1] interval. Scaling the interval to [−1, 1] is good for symmetry reasons.
[-1,1] scaling
For each factor separately, let ui be an unscaled variable such that ui ∈ [min(ui), max(ui)]. We then scale the variable to xi ∈ [−1, 1] by setting:

xi = (2ui − max(ui) − min(ui)) / (max(ui) − min(ui)).
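The scaling formula maps the endpoints of the unscaled range to −1 and 1 and the midpoint to 0. As a small Python sketch (the dose range is a hypothetical example, not taken from the thesis):

```python
def scale(u, u_min, u_max):
    """Scale u from [u_min, u_max] to [-1, 1]."""
    return (2 * u - u_max - u_min) / (u_max - u_min)

# E.g. a hypothetical dose range of 0-100 mg:
print([scale(u, 0, 100) for u in (0, 25, 50, 100)])   # [-1.0, -0.5, 0.0, 1.0]
```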
2.7 Levels
In this work we use equidistant levels to divide each factor's interval. The reason for dividing the intervals into levels is that it is a natural consequence of the implementation of our program. Each factor can take on an arbitrary number of levels in our program, but in this work we have only been interested in the case when each interval has the same number of levels. Factor i's levels are written si(j), i = 1, 2, ..., d, j = 1, 2, ..., li; s2(3) means factor two at level three. If we have three factors of which the first takes on seven levels, the second three levels and the third four levels, we write this as the vector s = (l1, l2, l3) = (7, 3, 4). This gives us ∏_{i=1}^{3} li = l1 · l2 · l3 = 7 · 3 · 4 = 84 possible design points. The design points make up the design space S = {x1, x2, ..., x84}. In general S = {x1, x2, ..., xt}, where t = ∏_{i=1}^{d} li.

li = 2 → {−1, 1}
li = 3 → {−1, 0, 1}
li = 4 → {−1, −1/3, 1/3, 1}
li = 5 → {−1, −1/2, 0, 1/2, 1}
li = 6 → {−1, −3/5, −1/5, 1/5, 3/5, 1}
li = 7 → {−1, −2/3, −1/3, 0, 1/3, 2/3, 1}
li = k → {−1, −1 + 2/(k−1), −1 + 2·2/(k−1), −1 + 3·2/(k−1), ..., 1}
Figure 2.3: d = 2, s = (5, 5), upper right corner not permitted, this gives us 21 design points.
In the results section, where we fix the number of factors to two, we will mainly use levels set at s = (5, 5) and s = (9, 9), because with these we get better designs than with levels of about the same magnitude, such as s = (6, 6), s = (7, 7) or s = (8, 8). This is because with s = (5, 5) and s = (9, 9) we include the (for our design regions) important design points (0, 1), (1, 0) and (0.5, 0.5). Set:

(i) −1 + n · 2/(k−1) = −1/2, or
(ii) −1 + n · 2/(k−1) = 0, or
(iii) −1 + n · 2/(k−1) = 1/2.

Solving (i) for k gives k = 4n + 1, and these k also satisfy (ii) and (iii) for integer n. Since n ∈ ℕ we get k ∈ {5, 9, 13, 17, ...}. For the cut-off region the point (0.5, 0.5) is not included, and it thus gives optimal designs also when s = (3, 3).
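The equidistant levels tabulated above, and the k = 4n + 1 condition, can be checked with a short Python sketch (a stand-in for the thesis's MATLAB implementation):

```python
def levels(k):
    """The k equidistant levels on [-1, 1]."""
    return [-1 + 2 * j / (k - 1) for j in range(k)]

print(levels(5))   # [-1.0, -0.5, 0.0, 0.5, 1.0]

# For k = 4n + 1 the points -1/2, 0 and 1/2 are all levels:
for k in (5, 9, 13):
    assert {-0.5, 0.0, 0.5} <= set(levels(k))

# For k = 7 the point 0.5 is not a level:
assert 0.5 not in levels(7)
```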
2.8 Design Regions
Definition 2.8.1. Convex region
A convex region, R, has the property that any line segment joining any two points of the region lies within it (within meaning int(R) ∪ ∂R) [8, p121].
Convex design region. Non-convex design region.
Figure 2.4: The red lines are line segments connecting two points from respective region.
In this thesis three design regions have been explored for each dimension:

1) The full design space.
2) A cut-out design region (non-convex).
3) A cut-off design region (convex).
Even though the design regions in our program are always partitioned by the chosen levels, we will here present continuous design regions. The continuous design regions could be seen as design regions where the number of levels has gone to infinity, that is li → ∞, for all i.
One factor (d = 1).

Full design space: the interval [−1, 1]. With only one factor we have designed the restricted design regions to be the same as the full design space, that is the interval [−1, 1].
Two factors (d = 2).

Full design space: {x = (x1, x2) : x1, x2 ∈ [−1, 1]}, the square with corners (−1, −1), (1, −1), (1, 1) and (−1, 1).

Cut-out design region (non-convex): the square with the upper-right quadrant removed, with corners (−1, −1), (1, −1), (1, 0), (0, 0), (0, 1) and (−1, 1).

Cut-off design region (convex): the square with the upper-right corner cut off, with corners (−1, −1), (1, −1), (1, 0), (0, 1) and (−1, 1).
Three factors (d = 3).

Full design space: {x = (x1, x2, x3) : x1, x2, x3 ∈ [−1, 1]}.

Cut-out design region (non-convex): {x = (x1, x2, x3) : x1, x2, x3 ∈ [−1, 1], min(x1, x2, x3) ≤ 0}.

Cut-off design region (convex): {x = (x1, x2, x3) : x1, x2, x3 ∈ [−1, 1], x1 + x2 + x3 ≤ 1}.
2.9 Isomorphic Designs
Definition 2.9.1. Isomorphic designs
We call two designs isomorphic if they can be made indistinguishable by the operations reflection and rotation. We further demand that they belong to the same experiment, meaning they have the same model md,model.
In the analysis throughout this section we will concentrate on the cut-off and cut-out design regions and merely look at the full space regions for pedagogical purposes.
2.9.1 d = 1
Rotations are not interesting in one dimension, but the line has one symmetry line and it is of interest.
Figure 2.5: We can reflect a one-factor design through the symmetry line.
Figure 2.6: Two designs on [−1, 1], call them D1,9 and D2,9: the first places 5 trials at −1 and 4 at 1, the second 4 at −1 and 5 at 1. They are isomorphic, denoted ≅, since they can be made indistinguishable through reflection.

D1,9 = ( −1 1 ; 5 4 ) ≅ D2,9 = ( −1 1 ; 4 5 ).

We say that the design D1,9 is of size two since it can generate another isomorphic design. We write this |D1,9| = 2, or equivalently |D2,9| = 2.
Figure 2.7: Two designs that each place 3 trials at both endpoints are already identical, and we therefore say that they are of size one.
2.9.2 d = 2
Rotating the cut-out or the cut-off design region is not of interest in two dimensions, since this would change the shape of the design region. The restricted design regions have only one symmetry line.
Figure 2.8: A square has four symmetry lines. Reflection through the fat red line could be seen as flipping the coordinate axes.
Figure 2.9: When we reflect through the symmetry line the point a = (x, y) changes place with the point a′ = (y, x). We call a and a′ paired points. Assuming a ≠ a′ this gives us two isomorphic designs.
Figure 2.10: Reflection through the red line with two pairs of paired points, a = (x, y), a′ = (y, x) and b = (x′, y′), b′ = (y′, x′). Assuming at least one of the paired points differs, we get two isomorphic designs. That is: (a ≠ a′) ∨ (b ≠ b′) ⇒ |D| = 2.
2.9.3 d = 3
In one and two dimensions we reflect the design regions through a symmetry line. In three dimensions we can rotate the restricted design regions around the only rotation line that preserves the design region's shape. Rotation around this line can generate three isomorphic designs. We see from the figure below that we can rotate the cube 2π/3 around the rotation line and in this way keep the shape of the design region, but change the place of the trials. After three 2π/3 rotations we return to the design we started with.
We can also reflect, not through a line as in two dimensions but through a plane. We have three planes to reflect through, the planes that hold one of the points a, b or c fix at a time and reflects the two other points. These reflections give us three more designs.
The rotations and reflections can also be seen as rotating and reflecting the factors' coordinate axes.
[Cube with space diagonal from (−1, −1, −1) to (1, 1, 1); the points a = (1, 1, −1), c = (1, −1, 1) and b = (−1, 1, 1) lie around the diagonal.]
Figure 2.11: If a, b and c all differ, this design can generate five others, giving a total of six isomorphic designs. If two points are equal but the third point differs we get three isomorphic designs. If all points are the same we have only one design.
Figure 2.12: One rotation line (the space diagonal from (−1, −1, −1) to (1, 1, 1)) and three planes of reflection, thus just as above.
If we construct design spaces with a different number of levels for each factor we get a rectangle (d = 2) or a cuboid (d = 3). The (non-square) restricted-region rectangle has no symmetry or reflection lines. The (non-cube) restricted-region cuboid has no lines of rotation or planes of reflection. But if we allowed two of the cuboid's factors to have the same number of levels we would get one plane of reflection. This plane of reflection can give two isomorphic designs if the two paired design points have numbers of trials that differ from each other.
We will here give two examples where d = 2.
Figure 2.13: Two example designs (d = 2). For the leftmost design |D| = 2; for the rightmost design |D| = 1.
2.9.4 Isomorphic Designs and Information
We want to show that isomorphic designs have the same information. We start off by looking at d = 2 and d = 3. The reflection that we do when d = 2 can be seen as flipping the coordinate axes, which means that the columns in X change places in Xref:

$$X = \begin{pmatrix} x_{1,1} & x_{1,2} \\ x_{2,1} & x_{2,2} \\ \vdots & \vdots \\ x_{N,1} & x_{N,2} \end{pmatrix} \quad\text{and}\quad X_{\mathrm{ref}} = \begin{pmatrix} x_{1,2} & x_{1,1} \\ x_{2,2} & x_{2,1} \\ \vdots & \vdots \\ x_{N,2} & x_{N,1} \end{pmatrix}$$
When d = 3 the reflections in a plane hold one column fixed and interchange the other two, while the rotation 2π/3 moves all columns one step in the following manner:

$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ \vdots & \vdots & \vdots \\ x_{N,1} & x_{N,2} & x_{N,3} \end{pmatrix} \quad\text{after rotating } 2\pi/3 \text{ becomes}\quad X_{\mathrm{rot}} = \begin{pmatrix} x_{1,3} & x_{1,1} & x_{1,2} \\ x_{2,3} & x_{2,1} & x_{2,2} \\ \vdots & \vdots & \vdots \\ x_{N,3} & x_{N,1} & x_{N,2} \end{pmatrix}.$$
Since we have both symmetrical models and symmetrical scaling, it is clear that reflections and rotations for d = 2 and d = 3 do not change the information. Thus the designs of our 2- and 3-factor models have the same information if they are isomorphic.
Now we will look at d = 1. When reflecting the line we change the sign of all design points. If X is the design matrix of D and Xref is the reflected design matrix (call the reflected design Dref) we get:

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} \quad\text{and}\quad X_{\mathrm{ref}} = \begin{pmatrix} -x_1 \\ -x_2 \\ \vdots \\ -x_N \end{pmatrix}.$$
For m1,L: I(D) = I(Dref) = Σᵢ₌₁ᴺ xᵢ².

For m1,A: I(D) = I(Dref) = N Σᵢ xᵢ² − (Σᵢ xᵢ)².

For m1,Q: I(D) = I(Dref) = |N(Σᵢ xᵢ² Σᵢ xᵢ⁴ − (Σᵢ xᵢ³)²) − Σᵢ xᵢ (Σᵢ xᵢ Σᵢ xᵢ⁴ − Σᵢ xᵢ³ Σᵢ xᵢ²) + Σᵢ xᵢ² (Σᵢ xᵢ Σᵢ xᵢ³ − (Σᵢ xᵢ²)²)|, where all sums run over i = 1, ..., N.
For the model m1,C:

$$I(D) = \begin{vmatrix} N & \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \sum x_i^5 \\ \sum x_i^3 & \sum x_i^4 & \sum x_i^5 & \sum x_i^6 \end{vmatrix}$$

The information of the reflected design is:

$$I(D_{\mathrm{ref}}) = \begin{vmatrix} N & -\sum x_i & \sum x_i^2 & -\sum x_i^3 \\ -\sum x_i & \sum x_i^2 & -\sum x_i^3 & \sum x_i^4 \\ \sum x_i^2 & -\sum x_i^3 & \sum x_i^4 & -\sum x_i^5 \\ -\sum x_i^3 & \sum x_i^4 & -\sum x_i^5 & \sum x_i^6 \end{vmatrix} = \begin{vmatrix} N & -\sum x_i & \sum x_i^2 & -\sum x_i^3 \\ \sum x_i & -\sum x_i^2 & \sum x_i^3 & -\sum x_i^4 \\ \sum x_i^2 & -\sum x_i^3 & \sum x_i^4 & -\sum x_i^5 \\ \sum x_i^3 & -\sum x_i^4 & \sum x_i^5 & -\sum x_i^6 \end{vmatrix} = I(D),$$

where we first multiplied rows 2 and 4 by −1 and then factored −1 out of columns 2 and 4. Thus, for all models we work with, isomorphic designs have the same information.
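As a quick numerical sanity check (a Python sketch rather than the thesis's MATLAB program; the helper names and the example points are our own), we can verify for the two-factor interaction model that interchanging the columns of X leaves |MᵀM| unchanged:

```python
import numpy as np

def information(X):
    # Expand X according to the interaction model m_{2,I}: rows (1, x1, x2, x1*x2).
    M = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
    return np.linalg.det(M.T @ M)

# A two-factor design and its reflection (columns interchanged).
X = np.array([[-1, -1], [1, -1], [-1, 1], [0, 1], [1, 0]], dtype=float)
X_ref = X[:, ::-1]

I_orig, I_refl = information(X), information(X_ref)
```

Because the model treats x1 and x2 symmetrically, the column swap only permutes the columns of the model matrix M, which cannot change |MᵀM|.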
2.9.5 Isoinformative Designs

Definition 2.9.2. Isoinformative designs
We call two designs isoinformative if they have both the same number of trials and the same information but are not identical. We further demand that they belong to the same experiment, meaning they have the same model md,model.
When the models and scaling are symmetrical, the isomorphic designs form a subset of the isoinformative designs; that is, if DISO is the set of isomorphic designs of size greater than one and I is the set of isoinformative designs, then DISO ⊂ I.
The set of isoinformative designs, IN, contains all designs with N trials, the same information and the same model. We denote the size of this set |IN|. The set of isomorphic designs for a given N we denote DISO,N, and the size of this set is |DISO,N|. Then obviously |IN| ≥ |DISO,N| for all N. There are isoinformative designs that are not isomorphic (see section 3.2.1 for examples), so the inequality is sometimes strict.
All designs with information greater than zero make up the set A. The smallest designs in A belong to the set Dp, containing all designs with p trials, where p is the number of parameters of our chosen model. There is no upper limit on the number of trials a design can have, and A is thus split into an infinite number of subsets, A = {Dp, Dp+1, ...}. Further, DISO ⊂ I ⊂ A. See Figure A.1 for an illustration of these relations.
2.10 Program

2.10.1 Program Idea
The program we have developed finds D-optimal designs in one, two or three dimensions. The program starts by generating a number of random designs. Out of these random designs, the ones with the highest information are sent to one of the two algorithms we have developed. Which algorithm is used depends on which of the two GUIs the user has executed.
The reason that two algorithms were developed is that the first one, which we have called One Step, had the weakness that it ended up in local maxima and thus failed to find the D-optimal design. Therefore another algorithm, which we have called Systematic Search, was developed. Systematic Search avoided ending up in local maxima and thus found the D-optimal designs. It is more important that the program finds an optimal design, even if it takes a few times longer to run, than that it finds a sub-optimal design in less time. Therefore Systematic Search is better than One Step. For a comparison see section 3.3.
We later found out that Systematic Search is closely related to the Fedorov Algorithm (1972). The Fedorov Algorithm is described in [4].
Figure 2.14: A schematic of what the program does: the design generator creates a number (m) of random designs, the designs with the highest information are sent to an optimization algorithm for fine search, and the result is given as output.
2.10.2 Program Description
The program is written in MATLAB. The program and instructions on how to run it are linked for download in appendix A.1. The user has to give the program the following inputs: factors (d), levels (s), region shape, the number of trials, the model, the number of random designs to be generated (simulations) and the number of designs to be improved upon by the algorithm (prospects). All of these inputs are given in the program's GUI, shown below.
The user can choose one, two or three dimensions. The level windows pop up after the number of dimensions has been chosen. The levels can be arbitrarily chosen, but the more levels, the longer the program takes to run. One of the three regions and one of the five (four if d = 1) models are chosen. Trials, simulations and prospects are arbitrarily chosen, but prospects must be less than or equal to simulations. The "Show matrix" box decides whether or not the design matrix of the best design will be shown in the output. The help button opens a pdf manual (works both in Windows and Linux).
The program output consists of: the input data, elapsed time, a picture of the design with the highest information found, the design’s information and its matrix (optional).
The inputs are: d, the number of dimensions (1, 2 or 3); s = (l1, ..., ld), the vector of levels each dimension may take; shape ∈ {full design space, cut-out design region, cut-off design region}; N ∈ ℕ, the number of trials; model ∈ {linear, affine, interaction, quadratic, cubic}; simulations ∈ ℕ, the number of random designs created; and prospects ≤ simulations, the number of designs with the highest information sent to fine search.

The program's functions are:

- space: spans a 1-, 2- or 3-dimensional space; outputs S (a t × d matrix).
- region: restricts the design space due to technical or medical reasons; outputs R (an r × d matrix).
- design: randomly generates designs D (1 × r vectors); the entries in D are the numbers of trials Ni at each design point xi.
- matrix: constructs a matrix X (an N × d matrix) according to D and R.
- information: calculates the information |MᵀM| (a number I); M is an expansion of X according to the model.
- top: takes out the matrices with the highest information.
- one step search and systematic search: two different algorithms for finding designs with maximum information.
- GUI: graphical user interface. Output: the highest information found and its design.
We will now describe in detail what our program does and how each algorithm works. The program starts off by generating a number of random designs; this is done by the code modules shape, region and design. In the program implementation the random designs are called simulations. The code modules matrix and information create model matrices and calculate the designs' information. The number of highest-information designs to be sent to an algorithm is given by the number of prospects. The module top picks out this (prospect) number of best designs and sends them to one of the two algorithms. Which algorithm is used depends on whether you have opened the program Systematic Search or One Step.
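The pipeline up to the fine search can be sketched as follows (a Python sketch, not the MATLAB implementation; the grid, N, simulations and prospects values are illustrative choices, and the plain functions stand in for the code modules named above):

```python
import numpy as np

rng = np.random.default_rng(0)

def information(X):
    # information: |M^T M| under the interaction model m_{2,I},
    # model matrix rows (1, x1, x2, x1*x2).
    M = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
    return np.linalg.det(M.T @ M)

# shape + region: a cut-out region on a 3 x 3 grid (the point (1, 1) removed).
points = np.array([(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)
                   if (x, y) != (1, 1)], dtype=float)

N, simulations, prospects = 10, 500, 5

# design: each random design places N trials uniformly on the allowed points.
designs = [points[rng.integers(len(points), size=N)] for _ in range(simulations)]

# top: keep the `prospects` designs with the highest information.
best = sorted(designs, key=information, reverse=True)[:prospects]
```

The designs in `best` would then be handed to One Step or Systematic Search for fine search.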
This is how One Step works:
Prospects: contains the best randomly generated designs and their respective information. One prospect at a time is sent to One Step for improvement.

One Step: move a trial from its design point to the two closest design points in each factor's direction. This creates new designs: d = 1 ⇒ 2 new designs, d = 2 ⇒ 4 new designs, d = 3 ⇒ 8 new designs. Calculate the new information for each. If the new information is equal or lower, move directly to the next trial; if the new information is higher, update the design and move to the next trial. After all trials have been moved, One Step sends back a new design matrix X′ and its information I′.
Figure 2.15: The functions inside the dashed box are iterated as long as the designs improve. This is done until the design has reached an information/design maximum, that is, the algorithm cannot improve the design any more.
Figure 2.16: A design that goes through One Step and ends up in a sub-optimal design (three stages of the same design are shown).
In Figure 2.16 the first design is a randomly generated design which is among the prospects highest-information designs out of the simulations randomly generated designs. The second design has gone through the One Step algorithm once. The third design has gone through One Step once more and has ended up in a local maximum that it cannot get out of.
We will now look at how Systematic Search works:
Prospects: contains the best randomly generated designs and their respective information. One prospect at a time is sent to Systematic Search for improvement.

Systematic Search: move a trial from its design point to all other design points. This creates r − 1 new designs. Calculate the new information for each. If the new information is equal or lower, move directly to the next trial; if the new information is higher, update the design and move to the next trial. After all trials have been moved, Systematic Search sends back a new design matrix X′ and its information I′.
Figure 2.17: Notice that Systematic Search is not iterated. As long as simulations and prospects are large it needs only one pass to find a maximum (or the unique maximum, if only one exists). An advantage over One Step is that it is far less likely to end up in a local maximum.
When Systematic Search has moved all trials once, and when One Step has moved all trials until the algorithm cannot improve the designs any more, the program produces output.
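One Systematic Search pass can be sketched as below (a Python sketch under our own conventions, not the MATLAB code: a design is represented as a vector of trial counts over the r allowed points, as in the program's D vectors, and the example region and start design are illustrative):

```python
import numpy as np

def info(counts, points):
    # |M^T M| for the design with counts[i] trials at points[i], model m_{2,I}.
    M = np.array([[1.0, x, y, x * y]
                  for (x, y), n in zip(points, counts) for _ in range(n)])
    return np.linalg.det(M.T @ M) if len(M) >= 4 else 0.0

def systematic_pass(counts, points):
    """Offer every trial, one at a time, each of the other r - 1 design
    points; keep a move only when the information strictly increases."""
    counts = list(counts)
    current = info(counts, points)
    for i in range(len(points)):
        for _ in range(counts[i]):
            if counts[i] == 0:
                break
            best_j, best_val = None, current
            for j in range(len(points)):
                if j == i:
                    continue
                counts[i] -= 1; counts[j] += 1   # try the move ...
                val = info(counts, points)
                counts[i] += 1; counts[j] -= 1   # ... and undo it
                if val > best_val:
                    best_j, best_val = j, val
            if best_j is not None:
                counts[i] -= 1; counts[best_j] += 1
                current = best_val
    return counts, current

# Cut-out region on a 3 x 3 grid, N = 8 trials, one at each allowed point.
pts = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1) if (x, y) != (1, 1)]
start = [1] * len(pts)
improved, I_new = systematic_pass(start, pts)
```

Since a move is only accepted when the information strictly increases, a pass can never make the design worse.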
2.11 Computational considerations
It should be noted that older algorithms such as DETMAX (from 1974) are less compute-intensive than the two algorithms we have developed, since they calculate the determinant of the information matrix in another manner. But computing time has not been a big problem during the thesis. It is worth noticing that an Intel 8080 from 1974 could do 0.29 MIPS (million instructions per second) while an Intel i7 from 2016 can do around 300 000 MIPS: 300 000/0.29 ≈ 1 000 000.
To do an exhaustive search, that is, to calculate all designs for a certain design region and a given number of trials, gets too compute-intensive as soon as r and/or N gets "big". With r allowed design points in our design region and N trials in our design there are r^N designs. Let us try this out with some numbers and see what "big" means. Set N = 10 and s = (5, 5) in the cut-off design region. Then r = 21 and we get: 21^10 ≈ 20^10 = (2 · 10)^10 = 2^10 · 10^10 ≈ 10^3 · 10^10 = 10^13.
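The estimate is easily checked (a Python one-liner; note that r^N counts the assignments of N labeled trials to r points, which is how the text counts designs):

```python
# r allowed design points, N trials: r^N possible designs.
r, N = 21, 10
total = r ** N          # 16 679 880 978 201, i.e. on the order of 10^13
```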
By setting prospects to 1 and only changing simulations we get an approximation of how long it would take to generate these 10^13 designs and calculate their information. We use the interaction model and get the elapsed calculation times below.
simulations     10^3   10^4   10^5   10^6   10^7
time (seconds)  0.13   0.86   8.1    81     825
It is obviously unreasonable to solve this problem exhaustively, though it would be possible when s = (3, 3), giving us 8^10 designs in the cut-out design region.
Results
We have looked at the two restricted design regions (cut-off and cut-out) in two dimensions with the model m2,I. The program could just as easily have produced analogous results for the other two-dimensional spaces, for another model or for the full design space.
The designs presented here are called optimal if they have the highest information found, either by our program or by analysis. The reader should keep in mind that we do not provide proofs that these designs are optimal.
3.1 Cut-out Design Region, d = 2

3.1.1 Exact Designs
Below are designs for d = 2, s = (5, 5), the interaction model and a variable number of trials. The isoinformative designs are marked with a thin symmetry line. For the cut-out design region the program generates D-optimal designs when the number of levels belongs to {3, 5, 9, 13, ...} (see section 2.7). We have chosen s = (5, 5) here because the figures are of a nice size. All designs except N = 11 and N = 24 have zero or one pair of paired design points whose numbers of trials differ from each other; the designs for N = 11 and N = 24 have two such pairs (see 2.9 for an explanation of paired points). In Figure 3.1 we see three examples of cut-out designs; only five design points are used in all D-optimal designs found for this region and the interaction model.
Figure 3.1: m2,I, N = 4, 11, 22. The first two designs are isomorphic of size two.
Table 3.1: Designs for N = 4, 5, ..., 30; model m2,I; |D| is the number of isomorphic designs. Below each design point is given its number of trials.

 N   (-1,-1)  (1,-1)  (-1,1)  (1,0)  (0,1)  |D|
 4      1       1       1       1      0     2
 5      1       1       1       1      1     1
 6      2       1       1       1      1     1
 7      2       2       1       1      1     2
 8      2       2       2       1      1     1
 9      2       2       2       2      1     2
10      3       2       2       2      1     2
11      3       3       2       2      1     2
12      3       3       3       2      1     2
13      3       3       3       2      2     1
14      4       3       3       2      2     1
15      4       4       3       2      2     2
16      4       4       4       2      2     1
17      4       4       4       3      2     2
18      5       4       4       3      2     2
19      5       4       4       3      3     1
20      5       5       4       3      3     2
21      5       5       5       3      3     1
22      6       5       5       3      3     1
23      6       5       5       4      3     2
24      6       6       5       3      4     2
25      6       6       6       4      3     2
26      6       6       6       4      4     1
27      7       6       6       4      4     1
28      7       7       6       4      4     2
29      7       7       7       4      4     1
30      8       7       7       4      4     1
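The table's internal consistency, and the property that each listed design for N + 1 trials can be obtained from the design for N trials (or from its mirror image under the region's symmetry, which swaps (1,−1) ↔ (−1,1) and (1,0) ↔ (0,1)) by adding a single trial, can be checked mechanically (a Python sketch over the table data):

```python
# Trial counts at ((-1,-1), (1,-1), (-1,1), (1,0), (0,1)), from Table 3.1.
table = {
    4: (1,1,1,1,0), 5: (1,1,1,1,1), 6: (2,1,1,1,1), 7: (2,2,1,1,1),
    8: (2,2,2,1,1), 9: (2,2,2,2,1), 10: (3,2,2,2,1), 11: (3,3,2,2,1),
    12: (3,3,3,2,1), 13: (3,3,3,2,2), 14: (4,3,3,2,2), 15: (4,4,3,2,2),
    16: (4,4,4,2,2), 17: (4,4,4,3,2), 18: (5,4,4,3,2), 19: (5,4,4,3,3),
    20: (5,5,4,3,3), 21: (5,5,5,3,3), 22: (6,5,5,3,3), 23: (6,5,5,4,3),
    24: (6,6,5,3,4), 25: (6,6,6,4,3), 26: (6,6,6,4,4), 27: (7,6,6,4,4),
    28: (7,7,6,4,4), 29: (7,7,7,4,4), 30: (8,7,7,4,4),
}

def reflect(D):
    # Reflection through the symmetry line swaps (1,-1)<->(-1,1), (1,0)<->(0,1).
    a, b, c, d, e = D
    return (a, c, b, e, d)

def adds_one_trial(D, E):
    diff = [y - x for x, y in zip(D, E)]
    return sum(diff) == 1 and all(v >= 0 for v in diff)

sequential_ok = all(adds_one_trial(table[N], table[N + 1])
                    or adds_one_trial(table[N], reflect(table[N + 1]))
                    for N in range(4, 30))
```

For N = 23 → 24 and N = 24 → 25 the added trial matches the mirror image of the listed design rather than the design itself, so the sequential construction holds up to the choice of isomorphic variant.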
More data on isomorphic designs for this design region and model can be found in the appendix A.3.
When we study the table we see that a design DN+1 can be constructed from the earlier design DN by simply adding a trial to a design point. No earlier trials have to be moved. This is a very nice property for sequential experiments, since we can then have an optimal design for each experiment in the sequence.
We also notice that it is not always easy to guess the design DN+1 from the design DN. By taking a look at designs with a much larger number of trials we will approximate the continuous design, and then calculate the continuous design analytically.
3.1.2 Continuous Designs
We are now fairly convinced that all D-optimal designs of the cut-out design region with the interaction model have the following form:
         (−1,−1)  (1,−1)  (−1,1)  (0,1)  (1,0)
Dopt =    N1/N     N2/N    N3/N    N4/N   N5/N .
All optimal designs that we have looked at have used exactly these design points. We would guess that the continuous design uses the same five points,

         (−1,−1)  (1,−1)  (−1,1)  (0,1)  (1,0)
Dopt =     ω1       ω2      ω3      ω4     ω5  ,

and we would like to know what ωi is, i = 1, ..., 5.
Figure 3.2: Exactly these five design points are used in all optimal designs found by the program, for N ≥ 5 and s = (l1, l2) where l1 = l2 ≥ 3.
Assuming that only these five points are used, we might use the levels s = (3, 3) when we make designs with a large number of trials, to make the program run fast. (We have run a similar analysis with the levels set much larger than this, such as 17 and 21, and always found the same result.)
Looking at Figure 3.3 we find the quite surprising result that the design point (−1,−1) has one fourth of the trials in the cut-out design region, just as it does in the full design space. In the leftmost design the levels are s = (4, 4), which gives us the design

      (−1,−1)  (1,−1)  (−1,1)  (−1/3,1)  (1,−1/3)
D ≈    0.25    0.220   0.220    0.155     0.155

with I = 3.5867 · 10^13, to be compared with I(Dopt) = 7.2775 · 10^13 when s = (3, 3).
Figure 3.3: Exact optimal designs with m2,I and N = 4000. In the leftmost design the levels are l1 = l2 = 4 and the design is cut-out (trial counts 1000, 881, 881, 619, 619; note (881 + 619)/4000 = 3/8). In the two other designs the levels are l1 = l2 = 3 and the designs are cut-out (counts 1000, 934, 934, 566, 566; (934 + 566)/4000 = 3/8) and full space (1000 trials at each corner).
The middle design has the following proportions:

              (−1,−1)      (1,−1)           (−1,1)           (0,1)            (1,0)
Dopt,4000 =     1/4     (3/8)(934/1500)  (3/8)(934/1500)  (3/8)(566/1500)  (3/8)(566/1500) .
This gives us the idea that the continuous design looks like this:

         (−1,−1)  (1,−1)   (−1,1)   (0,1)        (1,0)
Dopt =     1/4    (3/8)ω   (3/8)ω   (3/8)(1−ω)   (3/8)(1−ω) ,   ω ∈ (0, 1).

We will calculate ω by using formula 2.13, that is:

$$I(D) = \Big| \sum_{i=1}^{n} \omega_i \, (m_{d,\text{model}}(x_i))^T m_{d,\text{model}}(x_i) \Big|,$$

with ω1 = 1/4, ω2 = ω3 = (3/8)ω and ω4 = ω5 = (3/8)(1 − ω).
m2,I((−1, −1)) = (1, −1, −1, 1)
m2,I((1, −1)) = (1, 1, −1, −1)
m2,I((−1, 1)) = (1, −1, 1, −1)
m2,I((0, 1)) = (1, 0, 1, 0)
m2,I((1, 0)) = (1, 1, 0, 0)
Set I(ω) to be the information as a function of ω. Then by using formula 2.22 we have

I(ω) = (27/64)ω + (27/32)ω² − (81/64)ω³,   so   I′(ω) = 27/64 + (27/16)ω − (243/64)ω².

By setting I′(ω) = 0 we get the two roots ω = 2/9 ± √13/9, of which only one is positive, ω = 2/9 + √13/9 ≈ 0.6228. Thus the continuous design is

         (−1,−1)    (1,−1)       (−1,1)       (0,1)        (1,0)
Dopt =     1/4    (2+√13)/24   (2+√13)/24   (7−√13)/24   (7−√13)/24 ,

which is approximately

         (−1,−1)  (1,−1)   (−1,1)   (0,1)    (1,0)
Dopt ≈    0.25    0.2336   0.2336   0.1414   0.1414 .
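The value of ω can be double-checked without the closed-form polynomial, by maximizing the determinant of the continuous information matrix directly (a Python sketch; the grid resolution is our choice):

```python
import numpy as np

def cont_info(w):
    # Information matrix M(xi) = sum of w_i * m(xi)^T m(xi), m(x) = (1, x1, x2, x1*x2).
    pts = [(-1, -1), (1, -1), (-1, 1), (0, 1), (1, 0)]
    wts = [1/4, 3*w/8, 3*w/8, 3*(1 - w)/8, 3*(1 - w)/8]
    M = sum(wt * np.outer(m, m)
            for wt, m in zip(wts, (np.array([1.0, x, y, x*y]) for x, y in pts)))
    return np.linalg.det(M)

w_star = (2 + np.sqrt(13)) / 9          # the positive root, ~0.6228
grid = np.linspace(0.0, 1.0, 20001)
w_hat = grid[np.argmax([cont_info(w) for w in grid])]
```

The determinant vanishes at ω = 0 and ω = 1 (only three support points remain for four parameters) and has its single interior maximum at ω ≈ 0.6228.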
3.2 Cut-off Design Region, d = 2

We continue with the model m2,I to the cut-off design region, which is an expansion of the cut-out design region.
3.2.1 Exact designs, N = 4 and N = 5
We will here look at the optimal designs for N = 4 and N = 5. We will see that while the optimal design for N = 4 is unique (see Figure 3.4), the optimal design for N = 5 depends on the number of levels we use. The optimal design for N = 5 on a continuous design region does in fact have to be calculated by analysis, even though our program finds very good approximations of this design as soon as the levels are equal to or greater than nine.
Dopt,4 has one trial at each of the four points (−1,−1), (1,−1), (−1,1) and (1/2, 1/2). When we expand Dopt,4 in search of Dopt,5, a reasonable try would be to put the fifth trial in one of the already used design points (−1,−1), (1,−1), (−1,1) or (1/2, 1/2). Then the highest possible information for l1 = l2 = 5 is I(D5) = 162, which is the information of the following four designs:

  (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)          (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)
     2       1       1        1        and       1       1       1        2      ;

neither of the above two designs is isomorphic to these two designs:

  (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)          (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)
     1       2       1        1         ≅        1       1       2        1      .

Thus we have found our first isoinformative designs that are not isomorphic. But the designs we just found are not optimal compared to designs with levels li ≥ 9.
Figure 3.4: Left design: Dopt,4, with one trial at each of the four points. Right design: optimal design for N = 5 and l1 = l2 = 5; the fifth trial can go in any one of the red design points, and each such design has I(Dopt,5) = 162.
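The value I = 162 for all four isoinformative designs can be reproduced directly (a Python sketch computing |MᵀM| for the exact designs):

```python
import numpy as np

def info(points, counts):
    # |M^T M| for the exact design with counts[i] trials at points[i], model m_{2,I}.
    M = np.array([[1.0, x, y, x * y]
                  for (x, y), n in zip(points, counts) for _ in range(n)])
    return np.linalg.det(M.T @ M)

pts = [(-1, -1), (1, -1), (-1, 1), (0.5, 0.5)]
values = [info(pts, c)
          for c in [(2, 1, 1, 1), (1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2)]]
```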
Figure 3.5: N = 5; I = 162.6953; when s1 = s2 ≥ 9 the trials separate into five distinct design points: (−1,−1), (1,−1), (−1,1), (0.25, 0.75) and (0.75, 0.25).
But the design might get even better if we have a continuous design region, that is we let the levels go to infinity.
Figure 3.6: N = 5 on a continuous design region; one trial at each of (−1,−1), (1,−1), (−1,1), (p, 1−p) and (1−p, p).
Assuming that the schematic design above is optimal, we want to find the value of p. We do this by examining the model matrix:
$$M = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & p & 1-p & p(1-p) \\ 1 & 1-p & p & p(1-p) \end{pmatrix}$$
The information is:

I = |MᵀM| = 160 + 32p − 96p⁵ + 32p⁶ + 144p³ + 8p⁴ − 120p².

We set the information to be a function I(p) of p ∈ [0, 1] and plot the function.
Figure 3.7: Plot of I(p) over [0, 1].
Solving I′(p) = 0 for p ∈ [0, 1] we get p1 = 1/2 − √3/6 ≈ 0.2113 and p2 = 1/2 + √3/6 ≈ 0.7887. These solutions give us the maximal information.
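Both the polynomial and its maximizers can be verified numerically (a Python sketch; the grid resolution is our choice):

```python
import numpy as np

def info_p(p):
    # |M^T M| for the N = 5 design: three corners plus (p, 1-p) and (1-p, p).
    M = np.array([[1, -1, -1, 1], [1, 1, -1, -1], [1, -1, 1, -1],
                  [1, p, 1 - p, p * (1 - p)], [1, 1 - p, p, p * (1 - p)]])
    return np.linalg.det(M.T @ M)

def I(p):
    return 160 + 32*p - 96*p**5 + 32*p**6 + 144*p**3 + 8*p**4 - 120*p**2

p2 = 0.5 + np.sqrt(3) / 6
grid = np.linspace(0.0, 1.0, 20001)
p_hat = grid[np.argmax(I(grid))]
```

By the symmetry I(p) = I(1 − p), the grid maximum may land on either root; both give the same information.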
Figure 3.8: Optimal design for N = 5 and a continuous design region: one trial at each of (−1,−1), (1,−1), (−1,1), (1/2 − √3/6, 1/2 + √3/6) and (1/2 + √3/6, 1/2 − √3/6). I(Dopt,5) = 162.7407.
This design is slightly better than the rightmost design in Figure 3.4. We can compare the two designs' D-efficiency with formula 2.11:

e(D) = |M(D)|^(1/p) / |M(Dopt)|^(1/p) = 162^(1/4) / 162.7407^(1/4) ≈ 0.998860.
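For reference, the efficiency computation as a snippet (Python; p = 4 is the number of parameters of m2,I):

```python
I_grid = 162.0        # optimum when l1 = l2 = 5
I_cont = 162.7407     # optimum on the continuous design region
p = 4                 # parameters in the interaction model
efficiency = (I_grid / I_cont) ** (1 / p)
```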
So the four isoinformative designs with information equal to 162, which are optimal when l1 = l2 = 5, are almost as good as the optimal designs when l1 = l2 = 9 or the optimal design on the continuous design region.
3.2.2 Exact designs, N = 6
We will now explore the D-optimal design for N = 6, both in the case of a continuous design region and in the case when the levels are l1 = l2 = 5.
Figure 3.9: Left design: on a continuous design region, I(Dopt,6) ≈ 325.4815; two trials at (−1,−1) and one trial at each of (1,−1), (−1,1), (1/2 − √3/6, 1/2 + √3/6) and (1/2 + √3/6, 1/2 − √3/6). This is an expansion of Dopt,5. Right design: optimal design for N = 6 and l1 = l2 = 5, using the four points (−1,−1), (1,−1), (−1,1) and (1/2, 1/2); I(D6) = 324. Note that 324^(1/4)/325.4815^(1/4) = 162^(1/4)/162.7407^(1/4) ≈ 0.998860.
The leftmost design in Figure 3.9 was found by analysis analogous to how we found Dopt,5. The rightmost design in Figure 3.9 was found by our program.
This design has six isoinformative variants (compare with N = 10 in Table 3.2). As we will see in section 3.2.3, optimal designs for N ≥ 7 only use the four points (−1,−1), (1,−1), (−1,1) and (1/2, 1/2), and thus Dopt,5 and Dopt,6 cannot be expanded to the optimal designs for N ≥ 7. Neither can Dopt,4 be expanded to Dopt,5 or Dopt,6. The slightly less informative designs with I(D5) = 162 and I(D6) = 324 can, however, be included in a sequential experiment where the designs for N = 5 and N = 6 are the only non-optimal designs.
3.2.3 Continuous designs, N ≥ 7
All optimal designs found for N ≥ 7 follow a simple pattern and can be constructed sequentially: the trials split equally over four design points.
             (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)
Dopt,N≥7 =     N/4      N/4     N/4      N/4
This design can be implemented exactly when N is divisible by four. In the other cases the optimum is attained by several isoinformative designs. The pattern is repeated each fourth trial.
Table 3.2: Implementation of Dopt for N = 7, 8, 9, 10. This pattern holds as N grows and is repeated in a mod-4 fashion. |I| is the number of isoinformative designs; |D| is the number of isomorphic designs.

 N   N (mod 4)  (−1,−1)  (−1,1)  (1,−1)  (1/2,1/2)     I    |I|  |D|  I^(1/4)/N
 7       3         2        1       2        2         648    4    2   0.720768
 7       3         1        2       2        2         648    4    1   0.720768
 7       3         2        2       2        1         648    4    1   0.720768
 8       0         2        2       2        2        1296    1    1   0.75
 9       1         2        3       2        2        1944    4    2   0.737788
 9       1         3        2       2        2        1944    4    1   0.737788
 9       1         2        2       2        3        1944    4    1   0.737788
10       2         2        3       2        3        2916    6    2   0.734847
10       2         3        2       3        2        2916    6    2   0.734847
10       2         3        2       2        3        2916    6    1   0.734847
10       2         2        3       3        2        2916    6    1   0.734847
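The information values and the I^(1/4)/N column of the table can be reproduced (a Python sketch; one representative design per N is checked, the others follow by the region's symmetry):

```python
import numpy as np

def info(points, counts):
    # |M^T M| for the exact design with counts[i] trials at points[i], model m_{2,I}.
    M = np.array([[1.0, x, y, x * y]
                  for (x, y), n in zip(points, counts) for _ in range(n)])
    return np.linalg.det(M.T @ M)

pts = [(-1, -1), (-1, 1), (1, -1), (0.5, 0.5)]
rows = [(7, (2, 1, 2, 2), 648), (8, (2, 2, 2, 2), 1296),
        (9, (2, 3, 2, 2), 1944), (10, (2, 3, 2, 3), 2916)]
checks = [(N, info(pts, c), I, I ** 0.25 / N) for N, c, I in rows]
```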
Considering sequential experiments for the cut-off design region and m2,I, we have now seen that Dopt,N≥7 can be constructed sequentially. However, Dopt,4 cannot be expanded to Dopt,5, and while Dopt,5 can be expanded to Dopt,6, Dopt,6 cannot be expanded to Dopt,7. We have also seen that the optimal designs for N = 5 and N = 6 are only slightly more informative than the designs D5 and D6 that give information 162 and 324 respectively. These latter two designs belong to a sequence of designs from Dopt,4 to Dopt,N≥7 in which they are the only two sub-optimal designs.
3.3 Algorithm Comparison
As mentioned in 2.11 there are less computation-intensive approaches, but they are quite irrelevant for the problem our program was set to solve. Had we built a program for many more factors than three, we could have used the more complex but less computation-intensive method explained in chapter 12.3 of Optimum Experimental Designs, with SAS (2007).
As we can see from the table below, the computing time for Systematic Search with the cut-off and cut-out design regions and the quadratic model is certainly reasonable when d ∈ {1, 2, 3}, but had the program been extended to further dimensions it would probably reach computation times measured in hours already at d = 5 or d = 6.
One Step is roughly three times faster than Systematic Search given the same input. But it gives worse results than Systematic Search, even when we run it with a higher number of prospects and/or simulations, that is, for a longer time. The reason for this is that One Step tends to end up in local maxima as soon as a factor's number of levels gets larger than three.
Table 3.3: Algorithm comparison. In the original, the highest information found is marked green for d = 1, yellow for d = 2 and blue for d = 3.

 d  Levels   N   Simulations  Prospects  Time (sec.)  Information  Program
 1  9        20       1 000         10         0.16         1176   Systematic
 2  9,9      20       1 000         10         0.67       265630
 3  9,9,9    20       1 000         10         6.49   35403·10^5
 1  9        20      10 000         10         0.9          1176
 2  9,9      20      10 000         10         1.45       265430
 3  9,9,9    20      10 000         10         7.18   37732·10^5
 1  9        20      10 000        100         1.57         1176
 2  9,9      20      10 000        100         6.52       267540
 3  9,9,9    20      10 000        100        65      39648·10^5
 1  9        20      50 000        500         7.84         1176
 2  9,9      20      50 000        500        30          267540
 3  9,9,9    20      50 000        500       329      39648·10^5
 1  9        20       1 000         10         0.1        1128.4   One step
 2  9,9      20       1 000         10         0.24       260960
 3  9,9,9    20       1 000         10         2      24774·10^5
 1  9        20      10 000         10         0.75       1109.4
 2  9,9      20      10 000         10         0.93       262570
 3  9,9,9    20      10 000         10         2.4    16815·10^5
 1  9        20      10 000        100         0.97         1136
 2  9,9      20      10 000        100         2.36       265650
 3  9,9,9    20      10 000        100        19.7    21952·10^5
3.4 Design area and information, d = 2
If we set the full design space to have area 1, then the cut-off area is 0.875 and the cut-out area is 0.75. The relative design efficiency (see 2.5.6) of the full-space design is 1, whilst this number goes towards 0.75 and 0.73 for the cut-off and cut-out design regions respectively, as N goes to infinity.
Discussion
4.1 Conclusions
For the two dimensional cut-off design region with the interaction model the D-optimal designs for N = 4,5,6 were found. For N ≥ 7 a pattern emerged that gives the rest of the D-optimal designs for this design region and model.
Such a pattern was not found for the cut-out design region but the continuous design could be calculated nonetheless and its exact designs are easily found by the program.
It was quite surprising that for both restricted design regions the design point (−1,−1) kept one fourth of the trials, just as it does in the full-space design. D-optimal sequential designs are possible to make from our smallest design, D4, and onwards in the cut-out design region, whilst for the cut-off design region D-optimal sequential designs are possible starting from D7. Designs in the cut-off design region can, however, be constructed sequentially starting from D4 without great loss of information.
4.2 Restrictions and Further Studies
We have had to make many restrictions: to three dimensions, to (five) linear models, to only one optimality (D-optimality) and to two types of restricted regions.
If someone should continue this work, the following topics would be interesting to investigate.
The program should be extended to other optimality criteria such as A-optimality, E-optimality and C-optimality. I believe that C-optimality, above all, would be easy to implement. The first step in improving the Systematic Search algorithm would be to make it iterative; the next would be to combine it with One Step. One could also try to combine the program with a combinatorial approach in order to handle non-ordinal data, see [5].
Regarding the design restrictions, a few more design regions come to mind. The first one is the set {x = (x1, ..., xd) : xi ∈ [0, 1] ∀i, Σᵢ₌₁ᵈ xi ≤ 1} and the second one is the set {x = (x1, ..., xd) : xi ∈ [0, 1] ∀i, Σᵢ₌₁ᵈ xi² ≤ 1}. Designs where the design regions have a different number of levels for each factor can already be constructed by the program, and these design regions ought to be explored. All of these extensions could be of interest for clinical research.
[1] Sven Erick Alm and Tom Britton. Stokastik. 2009.
[2] A.C. Atkinson, A.N. Donev, and R.D. Tobias. Optimum Experimental Designs, with SAS. Oxford University Press, 2007.
[3] P.F. Berger and W.K. Wong. An Introduction to Optimal Designs for Social and Biomedical Research. 2009.
[4] P. de Aguiar, B. Bourguignon, M. Khots, D. Massart, and R. Phan-Than-Luu. D-optimal designs. Chemometrics and Intelligent Laboratory Systems, 30:199–210, 1995.
[5] Gustaf Hector and Ludvig Vikström. Optimala komplementära försöksplaner för experiment med flernivåfaktorer. 2014.
[6] David C. Lay. Linear Algebra and Its Applications. 2003.
[7] O. Lombardi, F. Holik, and L. Vanni. What is Shannon information? PhilSci Archive, 2014.
[8] Carla C. Morris and Robert M. Stark. Finite Mathematics. 2016.
[9] Donald R. Mehlisch, Sue Aspley, Stephen E. Daniels, Kristin A. Southerden, and Kyle S. Christensen. A single-tablet fixed-dose combination of racemic ibuprofen/paracetamol in the management of moderate to severe postoperative dental pain in adult and adolescent patients: a multicenter, two-stage, randomized, double-blind, parallel-group, placebo-controlled, factorial study. Clinical Therapeutics, 32(6), 2010.
[10] Sara van de Geer. Least squares estimation. Encyclopedia of Statistics in Behavioral Science, 2, 2005.
[11] Joakim Westerlund and Elisabet Borg. Statistik för beteendevetare. 2012.
[12] Songfeng Zheng. Fisher information and Cramér–Rao bound. 2017.
[13] Douglas C. Montgomery. Design and Analysis of Experiments. 8th edition, 2012.
[14] Y.B. Chan. http://www.ms.unimelb.edu.au/ s620371/slides/ch04.pdf. Ch. 4, 2016.
[15] Y.B. Chan. http://www.ms.unimelb.edu.au/ s620371/slides/ch06.pdf. Ch. 6, 2016.
[16] L. Zhu, T. Dasgupta, and Q. Huang. A D-optimal design for estimation of parameters of an exponential-linear growth curve of nanostructures. 2014.
[17] William H. Greene. Econometric Analysis. 5th edition, 2002.
Appendix
A.1 Code
All code and instructions on how to run it can be found at this link Folder in One Drive.
A.2 Isomorphic and Isoinformative Designs
Figure A.1: The outer box represents the set A of all designs with information greater than zero, split into the subsets Dp, Dp+1, ..., DN, DN+1, .... The light gray box, I, is the set of all isoinformative designs and the dark gray box, DISO, is the set of all isomorphic designs with size greater than one.