Department of Mathematics, Linköping University
Johan Persson
LiTH-MAT-EX–2017/12–SE
Credits: 16 hp
Level: G2
Supervisor: Frank Miller, Department of Statistics, Stockholm University
Examiner: Zhenxia Liu, Department of Mathematics, Linköping University
Linköping: July 2017
Problem statement: The D-optimal design is often used in clinical research. In multi-factor clinical experiments it is natural to restrict the experiment's design space so as not to give a patient the combination of several high dose treatments simultaneously. Under such design space restrictions it is unknown what designs are D-optimal. The goal of the thesis has been to find D-optimal designs for these design spaces. Approach: Two new algorithms for finding D-optimal designs with one, two or three factors with linear models have been developed and implemented in MATLAB. Two restricted design spaces were explored. In cases when the program could not find the D-optimal design an analytic approach was used. Results: Special attention was given to the two factor model with interaction. All of the D-optimal designs for this model with N ≤ 30, together with their permutations, have been listed, as well as their continuous designs. Conclusion: In one of the restricted design regions a simple design pattern appeared for N ≥ 7. In the other restricted design region no obvious pattern was found, but its continuous design could be calculated through analysis. It turned out that the number of trials at the lowest dose combination did not change when moving from the full space design to the restricted design regions.
Key words: exact design, D-optimality, D-optimal designs, design of experiments, Fedorov algorithm, restricted experiment region, restricted design region, restricted design space, information matrix, clinical experiments, isomorphic designs, isoinformatic designs.
URL for electronic version:
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-138614
I would like to thank my supervisor Frank Miller for his great help and guidance. I would also like to thank my parents for their everlasting support. I am grateful to have had company from my lovely classmates on long days of hard work. A double thanks goes to Erik Landstedt, both for his friendship and for being the opponent of my thesis.
Lastly I would like to thank the fixed point in my life, Patricia Carlbom.
Nomenclature
d number of factors (dimensions) in an experiment
N number of trials in an experiment
xi = (x1, ..., xd) design point number i, i = 1,...,N
S = {x1,...,xt}⊆ Rd design space
si(j) level j in factor i, i = 1,...,d, j = 1,...,li
li the highest level in factor i
t = l1· l2· ... · ld number of (possible) design points in S
n ≤ N number of design points used in the experiment
R = {x1, ..., xr} ⊆ S design region
r ≤ t number of (possible) design points in R
x factors
xi value of factor i
Y = (Y1, ..., YN)T responses
(xi, Yi) trial
E = {(xi, Yi)}i∈N experiment with N trials, i = 1,...,N
Ni number of trials in design point i
ωi = lim_{N→∞} Ni/N weighted amount of trials in design point i
X design matrix
XT transpose of X
M model matrix
det(A) determinant of A
md,model(X) polynomial that expands X to M
βi model coefficient i, i = 1, 2, ..., p
β̂i coefficient estimate
β = (β1, ..., βp)^T coefficient vector
ε = (ε1, ..., εN)^T experimental error
(X^T X) information matrix
(X^T X)^{-1} dispersion/covariance matrix
N (µ, σ2) normal distribution with mean µ and variance σ2
E(Y) expected value of Y
V(Y) variance of Y
DN design with N trials
Dopt,N D-optimal design with N trials
1 Background
2 Method
  2.1 Theory of Experiments
  2.2 Models
  2.3 Number of Trials
  2.4 Least Squares Estimation
  2.5 D-optimality
  2.6 Scaling
  2.7 Levels
  2.8 Design Regions
  2.9 Isomorphic Designs
    2.9.1 d = 1
    2.9.2 d = 2
    2.9.3 d = 3
    2.9.4 Isomorphic Designs and Information
    2.9.5 Isoinformative Designs
  2.10 Program
    2.10.1 Program Idea
    2.10.2 Program Description
  2.11 Computational considerations
3 Results
  3.1 Cut-out Design Region, d = 2
    3.1.1 Exact Designs
    3.1.2 Continuous Designs
  3.2 Cut-off Design Region, d = 2
    3.2.1 Exact designs, N = 4 and N = 5
    3.2.2 Exact designs, N = 6
    3.2.3 Continuous designs, N ≥ 7
  3.3 Algorithm Comparison
  3.4 Design area and information, d = 2
4 Discussion
  4.1 Conclusions
  4.2 Restrictions and Further Studies
A Appendix
  A.1 Code
  A.2 Isomorphic and Isoinformatic designs
  A.3 Isomorphism Table
  A.4 Cut-out Design Region vs Full Design Space
  A.5 Cut-out Design Region vs Cut-off Design Region
Background
Experiments are probably just as old as thinking man himself. Throughout the ages people have been able to construct good experiments through intuition and reflection alone. Let us take a look at stone skipping on water. If, with only two throws, we want to find out the correlation between the number of stone skips and the stone's shape, then what type of stones should we throw? The answer is one stone that is as flat as possible and one stone that is as far from flat as possible; let us call it round. We understand intuitively that throwing these two stones would be a better choice than throwing two stones that look more or less the same.
(Figure: two stone-skipping experiments compared. The information-rich experiment throws one perfectly round stone and one perfectly flat stone; the information-poor experiment throws two similar stones on the first and second throw.)
Let us have a look at another experiment. Say that we want to estimate Swedish university students’ enjoyment of dinner as a function of three factors: the amount of food they eat, the quality of the wine they drink and the rating of the restaurant where they have the meal. Each of these three factors we give
three possible levels.
Food = {one dish, two dishes, three or more dishes}.
Wine = {bad wine, mediocre wine, good wine}.
Restaurant = {low ranked restaurant, medium ranked restaurant, high ranked restaurant}.

Now let us code this to the numbers −1, 0, 1.

Food = {−1, 0, 1}.
Wine = {−1, 0, 1}.
Restaurant = {−1, 0, 1}.
So (0, 1, −1) means the combination (two dishes, good wine, low ranked restaurant). Let us further say that for economic reasons we will not allow the combination (1, 1, 1), that is, three or more dishes with good wine at a high ranked restaurant, because it would be too expensive. We thus restrict the design space and get the design region below, containing 26 possible design points.
(Figure: the design region for dinner, the 3 × 3 × 3 grid of coded levels with the point (1, 1, 1) removed, leaving 26 design points; axes Food, Wine, Restaurant.)
In pharmaceutical development it is sometimes interesting to measure the effect not of one, but of two drugs combined. It could for example be of interest to design an experiment measuring pain relief as a function of taking two different pain killers, see [9]. In such an experiment it can be desirable, for safety reasons, to restrict the design of the experiment so as not to give any patients a simultaneous full dose of both pain killers. This is called a restricted design region. Finding effective designs in restricted design regions is what this bachelor's thesis is about.
(Figure: a restricted design region with axes dose of painkiller 1 and dose of painkiller 2.)
Method
2.1 Theory of Experiments
Doing an experiment is the best method for examining a causal relation [11, p11-12]. In an experiment the experimenter can control some variables, called factors (also known as independent or explanatory variables). The number of factors is written d and the value of factor i is written xi, i = 1, 2, ..., d. The
experiment's outputs are called responses (also known as dependent or explained variables); these are observed by the experimenter. The responses, denoted Y, are stochastic variables since they depend not only on the controlled factors, but also on non-controlled explanatory variables, random noise and possible bias; the latter three are collectively named error and denoted ε.
This work explores what values the factors should take in an arbitrary experiment with ordinal data. Therefore we focus on the factors in the definition of experiment below.

(Figure: the experimenter controls the factors of the experiment and observes its responses.)
Definition 2.1.1. Experiment
An experiment is an examination of a causal relation between factors and responses, in which the experimenter controls some factors. [13]
For this property and others of experiments, see [13, p14-17].
Now that we have got a sense of what an experiment is, we will try to make a mathematical definition. But for that we need to know what a trial is.

Definition 2.1.2. Trial
A trial is the pair (x, Y ), where x is a d dimensional row vector, x ∈ Rd, and
Y ∈ R is an observed response. Y = f(x) + ε, f being the experiment function mapping a design point, x, to its response Y . ε is the error and its expected value is assumed to be zero, that is E(ε) = 0.
Definition 2.1.3. Experiment (mathematical definition)
An experiment, E, is a set of trials under the same function f. We write E = {(x1, Y1), (x2, Y2), ..., (xN, YN)}. The number of trials in an experiment is denoted by N.
Figure 2.1: An experiment containing twelve trials (N = 12). Factor interval scaled to [-1,1].
One trial is the smallest possible experiment since the trial (x, Y ) is also an experiment in itself, namely E = {(x, Y )}. The experiment E = {(x1, Y1),
(x2, Y2),..., (xN, YN)} could be said to have the design D = {x1, x2,..., xN}
since these are the design points chosen for examination. Because design points are often repeated in an experiment the design is written in the following more concise manner.
Definition 2.1.4. Exact design
D = ( x1 x2 x3 . . . xn
      N1 N2 N3 . . . Nn ),  with ∑_{i=1}^{n} Ni = N, Ni ∈ ℕ⁺,

and N is the total number of trials.

The first row gives the values of the factors at the design points. Ni gives the number of trials at xi. n is the number of design points chosen; by necessity n ≤ N. D is the exact design [2, p120].
When we want to underline the number of trials used in a design, we will write DN instead of D. For example D8 would be a design containing eight
trials.
Definition 2.1.5. Continuous design
D = ( x1 x2 x3 . . . xn
      w1 w2 w3 . . . wn ),  with ∑_{i=1}^{n} wi = 1 and wi ∈ (0, 1] ∩ ℝ.

The first row gives the values of the factors at the design points. wi gives the weighted amount of trials at xi. n is the number of design points chosen. D is the continuous design [2, p119].
This work deals almost exclusively with exact designs D, except for the result sections 3.1.2 and 3.2.3.
2.2 Models
If we seek a causal relation:
Y = f (x), (2.1)
with x being the factors’ values and Y being the responses’ values, we can construct a model:
Y = f (x) + ε, E(ε) = 0. (2.2)
As mentioned, we denote the number of factors (dimensions) by d. In general d ∈ ℕ; in this thesis d ∈ {1, 2, 3}. We assume that the factors are defined on, or scaled to, the interval [−1, 1], see section 2.6. The factors generate a design space S = ×_{i=1}^{d} [−1, 1] = [−1, 1]^d. S is a cartesian product, thus d = 1 generates a line of length 2, d = 2 generates a square of area 4 and d = 3 generates a cube of volume 8; these are our design spaces.
If we restrict the design space we get what we call the design region (also known as restricted design region), denoted R. R = S \ F, where F is an arbitrary region in S that we forbid. In an experiment we choose (with repetition of design space points allowed) N design points from R; these choices, together with their responses, are the trials defined above (definition 2.1.2). If we arrange the N design points as rows we get an N × d matrix; this matrix we denote X and call the design matrix. We will now put all of this into a small example.
Example 2.2.1. Say that d = 2, x = (x1, x2), x1 ∈ [−1, 1], x2 ∈ [−1, 1] and F = {(x1, x2) ∈ S : x1 + x2 > 1}. Then R is the cut-off design region. Putting one trial at each corner and two trials in the middle of the lower edge we get the following design and design matrix:

D = ( (−1, −1) (0, −1) (1, −1) (1, 0) (0, 1) (−1, 1)
          1        2       1      1      1       1    ),

X = [ −1 −1
       0 −1
       0 −1
       1 −1
       1  0
       0  1
      −1  1 ],

with rows x1, x2, x2, x3, x4, x5, x6.

When we have more than one trial we write the model in the following manner: Y = f(X) + ε, where X is the design matrix consisting of N trials and Y is the vector of responses, Y = (Y1, ..., YN)^T. f(X) is a function f(X): I^{N×d} → ℝ^N, where I = [−1, 1]. The function f depends linearly on the p-dimensional coefficient vector β = (β1, β2, ..., βp)^T and on the function md,model: I^d → ℝ^p, applied to each row xi of X.
For an experiment with N trials we get the following model:
f(X) = f( [ x1 ; x2 ; ... ; xN ] ) = f( [ x1,1 x1,2 ... x1,d ; x2,1 x2,2 ... x2,d ; ... ; xN,1 xN,2 ... xN,d ] )

     = [ md,model(x1,1, x1,2, ..., x1,d)
         md,model(x2,1, x2,2, ..., x2,d)
         ⋮
         md,model(xN,1, xN,2, ..., xN,d) ] (β1, β2, ..., βp)^T = M β,   (2.3)

where the N × p matrix is the model matrix M and the vector is the coefficient vector β,
where d is the number of factors, p is the number of parameters in the chosen model and β1, β2,..., βp ∈ R.
The models used in this thesis are five polynomials. Experience indicates that in very many experiments the response can be described by polynomial models of order no greater than two, and that curves with three or more points of inflection are rare [2, p36]. This is why we have restricted the number of polynomials to five. The models presented below have all been multiplied with their respective coefficient vector (β).
In one dimension:

Linear, m1,L:      y = β1x1 + ε
Affine, m1,A:      y = β1 + β2x1 + ε
Quadratic, m1,Q:   y = β1 + β2x1 + β3x1² + ε

In two dimensions:

Linear, m2,L:      y = β1x1 + β2x2 + ε
Affine, m2,A:      y = β1 + β2x1 + β3x2 + ε
Interaction, m2,I: y = β1 + β2x1 + β3x2 + β4x1x2 + ε
Quadratic, m2,Q:   y = β1 + β2x1 + β3x2 + β4x1x2 + β5x1² + β6x2² + ε
Cubic, m2,C:       y = β1 + β2x1 + β3x2 + β4x1x2 + β5x1² + β6x2² + β7x1³ + β8x2³ + ε

In three dimensions:

Linear, m3,L:      y = β1x1 + β2x2 + β3x3 + ε
Affine, m3,A:      y = β1 + β2x1 + β3x2 + β4x3 + ε
Interaction, m3,I: y = β1 + β2x1 + β3x2 + β4x3 + β5x1x2 + β6x1x3 + β7x2x3 + ε
Quadratic, m3,Q:   y = Interaction + β8x1² + β9x2² + β10x3² + ε
Cubic, m3,C:       y = Quadratic + β11x1³ + β12x2³ + β13x3³ + ε
We notice that all models are symmetrical in the sense that all factors are handled in the same manner, that is, all terms are symmetrical. This means that if x1² is included in a three dimensional model then so are x2² and x3²; if x1x3 is included, then so are x1x2 and x2x3, and so on.
The reason that terms with higher-order interaction, such as βnx1²x2 or βnx1x2x3, are not included in the models above is that they are seldom seen in experiments [2, p41-42]. It is now high time for an example.
Example 2.2.2. Set d = 2, put no restrictions on the design space and use the interaction model y = β1 + β2x1 + β3x2 + β4x1x2 + ε. The function m2,I is then m2,I(x1, x2) = (1, x1, x2, x1x2). The design space is a square with side length equal to two. We fill the design space with six trials, one in each corner and two in the center.
The design places one trial at each of (−1, −1), (1, −1), (−1, 1) and (1, 1), and two trials at (0, 0). This gives

f(X) = f( [ −1 −1
            −1  1
             1 −1
             1  1
             0  0
             0  0 ] ) = [ m2,I(−1, −1)
                           m2,I(−1, 1)
                           m2,I( 1, −1)
                           m2,I( 1, 1)
                           m2,I( 0, 0)
                           m2,I( 0, 0) ] (β1, β2, β3, β4)^T

     = [ 1 −1 −1  1
         1 −1  1 −1
         1  1 −1 −1
         1  1  1  1
         1  0  0  0
         1  0  0  0 ] (β1, β2, β3, β4)^T = M β.
M is the model matrix and β is the coefficient vector. They both depend on which model we have chosen. M also depends on the design D.
So to summarize,

Y = (Y1, Y2, ..., YN)^T = [ md,model(x1,1, x1,2, ..., x1,d)
                            md,model(x2,1, x2,2, ..., x2,d)
                            ⋮
                            md,model(xN,1, xN,2, ..., xN,d) ] (β1, β2, ..., βp)^T + (ε1, ε2, ..., εN)^T.   (2.4)
Writing the equation more economically:

Y = M β + ε.   (2.5)
2.3 Number of Trials
The more trials we do, the more information we obtain. Since each trial has a cost, we want to gain as much information as possible out of as few trials as possible.

The minimum number of trials needed to get an information greater than zero depends on the model. In fact, we need at least as many trials as we have parameters, since it is necessary to have as many equations as there are unknown variables. Besides this minimum number of trials we also need the trials to be spread over at least as many design points in the design region as there are parameters.
N is the number of trials, n is the number of design points used in the design and p is the number of coefficients in the model. A necessary criterion for the information to be greater than zero is: N ≥ n ≥ p.
Shown below is a table of the minimum n needed, for each model used by us, to give an information greater than zero. (L means linear model, A means affine model, etc.)
L A I Q C
d = 1 1 2 - 3 4
d = 2 2 3 4 6 8
d = 3 3 4 7 10 13
2.4 Least Squares Estimation
The following theory is in large part taken from [10] and [14]. In the model
Y = M β + ε (2.6)
M is known since we chose it ourselves. The vector β is unknown and contains the parameters we want to estimate. The error ε = (ε1, ε2, ..., εN)^T is a stochastic vector and therefore Y is a stochastic vector, but we can observe Y's values.
We want to estimate β with least squares estimation. We work with the assumption that ε1, ..., εN are independent and identically N(0, σ²)-distributed. This means that E(Y) = Mβ and V(Y) = σ²·I_N, where I_N is the N-dimensional identity matrix.

Definition 2.4.1. Rank
The rank of a matrix A, denoted by rank A, is the dimension of the column space of A [6, p178].
We remind the reader of the fact that rank A = rank AT.
Definition 2.4.2. Full rank
Full rank means that rank A = min(m,n) where m is the number of rows and n is the number of columns in the matrix A [6, p270].
Theorem 2.4.3. Let Y = Mβ + ε, where M is an N × p matrix of full rank, N ≥ p; β = (β1, ..., βp)^T and ε = (ε1, ε2, ..., εN)^T, with ε1, ..., εN independent and identically N(0, σ²)-distributed. Then the least squares estimator of β is:

β̂ = (M^T M)^{-1} M^T Y.
Proof. εi = Yi − E(Yi), and we wish to estimate the parameters in β so that the error becomes as small as possible. Set

Ê(Y) = M β̂.

The i-th residual is defined as:

ei = Yi − Ê(Yi) = Yi − (M β̂)i.

We want to minimize the errors in absolute terms and do this by minimizing the squares of the residuals. One reason for using the squares of the residuals instead of their absolute values is that the absolute value function is not differentiable everywhere. We therefore minimize

∑_{i=1}^{N} ei² = e^T e,
e being the column vector of residuals. We can then write Y = M β̂ + e. Developing e^T e we get

e^T e = (Y − M β̂)^T (Y − M β̂) = Y^T Y − Y^T M β̂ − β̂^T M^T Y + β̂^T M^T M β̂,

which we simplify to

e^T e = Y^T Y − 2 β̂^T M^T Y + β̂^T M^T M β̂.

This expression has a minimum when its derivative with respect to β̂ vanishes, that is when

∂(e^T e)/∂β̂ = −2 M^T Y + 2 (M^T M) β̂ = 0.

Hence,

−2 M^T Y + 2 (M^T M) β̂ = 0  ⇔  β̂ = (M^T M)^{-1} M^T Y.   (2.7)
It is in the last step of (2.7) that we need the assumption of full rank, because without it the matrix M^T M cannot be inverted [2, p52]. Without this assumption the pseudo-inverse is used; the idea is analogous but the algebra somewhat different. We refer the interested reader to [15].
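The estimator of Theorem 2.4.3 can be sketched in a few lines of Python/NumPy (the thesis's own implementation is in MATLAB). The model matrix and responses below are made-up illustrative values for the affine one-factor model m_{1,A}, with rows (1, x1) at x1 = −1, 0, 1.

```python
import numpy as np

# Model matrix for m_{1,A}: rows (1, x1) at the three levels x1 = -1, 0, 1.
M = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  1.0]])
Y = np.array([0.9, 2.1, 2.9])   # hypothetical observed responses

# Normal-equations form beta_hat = (M^T M)^(-1) M^T Y, valid since M has full rank:
beta_hat = np.linalg.inv(M.T @ M) @ M.T @ Y

# Numerically preferred equivalent:
beta_lstsq, *_ = np.linalg.lstsq(M, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat)
```

In practice `np.linalg.lstsq` (or a QR decomposition) is preferred over forming `(M^T M)^{-1}` explicitly, since the normal equations square the condition number of M.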
2.5 D-optimality
In this section we will define D-optimal designs. Such designs maximize the determinant of what is called the information matrix. The information matrix presented here can be generalized to non-linear models and is then called the Fisher information matrix [16, p5]. For the reader interested in this generalization I refer to appendix A.6.
We observe that the covariance matrix of the least squares estimator is:

V(β̂) = σ²(M^T M)^{-1}   (2.8)

[2, p52] (for proof see [17, p46]). When working with linear models in an experiment we want to estimate the parameters β1, ..., βp with the smallest possible variance. But we notice that "If interest is in the comparison of experimental designs, the value of σ² is not relevant, since the value is the same for all proposed designs for a particular experiment." [2, p53]. It is thus only the determinant of the matrix (M^T M)^{-1} that we try to minimize in this work.
(M^T M)^{-1} is called the dispersion matrix. The determinant of the dispersion matrix is proportional to the volume of the confidence ellipsoid of the parameters, see Figure 2.2 [3, p40].
Minimizing the determinant of the dispersion matrix is one type of design criterion (D-optimal design), another design criterion would be to minimize the sum of the squared parameter estimates (A-optimality), a third would be to minimize the squared length of the longest parameter estimate (E-optimality) [3, p40-43].
Figure 2.2: With only two parameters to estimate, β1 and β2, the determinant of the dispersion matrix is proportional to the area of the ellipse.
Definition 2.5.1. Optimal design
An experiment designed so that it optimizes a given design criterion is called an optimal design.
There are several optimal statistical design criteria. We will use the most popular, which is called D-optimality. One reason for its popularity is the fact that the factor levels can be scaled (linearly transformed/coded) without problems.
Definition 2.5.2. Information matrix
MTM is the information matrix for β as well as for the design D [2, p52].
It is D that together with the function md,model generates M.
Definition 2.5.3. Information
I(D) = |MTM | (2.9)
The information, I(D), is the determinant of the information matrix which is generated by the design D, see 2.1.4.
Example 2.5.4. Recall example 2.2.2, which gave us the model matrix:

M = [ 1 −1 −1  1
      1 −1  1 −1
      1  1 −1 −1
      1  1  1  1
      1  0  0  0
      1  0  0  0 ].

The determinant of the information matrix is then

|M^T M| = 384.

Thus I(D) = 384. When it is obvious which design D in I(D) we are speaking of, we will write only I.
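The calculation above is easy to reproduce numerically. The following is a small Python/NumPy sketch (the thesis's own code is in MATLAB) that rebuilds the model matrix of example 2.2.2 under m_{2,I}(x1, x2) = (1, x1, x2, x1·x2) and computes the information:

```python
import numpy as np

def m2I(x1, x2):
    """Row of the model matrix under the two-factor interaction model."""
    return [1.0, x1, x2, x1 * x2]

# One trial in each corner of the square, two trials in the center:
design_points = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0), (0, 0)]
M = np.array([m2I(x1, x2) for x1, x2 in design_points])

information = np.linalg.det(M.T @ M)
print(round(information))   # 384
```

Here M^T M happens to be the diagonal matrix diag(6, 4, 4, 4), so the determinant is 6 · 4 · 4 · 4 = 384, matching I(D) above.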
Definition 2.5.5. D-optimal design
A D-optimal design, Dopt, maximizes the determinant of the information matrix. Equivalently, a D-optimal design minimizes the determinant of the dispersion matrix.

Henceforth we will often write only optimal design but always mean D-optimal design, unless clearly stated otherwise.
Definition 2.5.6. Relative design efficiency
r(D) = |M^T M|^{1/p} / N   (2.10)
r(D) is a function that takes a design, D, as input data and scales its information according to the model and the number of trials.
Relative design efficiency gives us a way of comparing designs with the same model but a different number of trials, see tables A.2 or A.3 for examples of this.
Definition 2.5.7. D-efficiency

e(D) = I(D)^{1/p} / I(Dopt)^{1/p}   (2.11)

where Dopt is a D-optimal design [2, p151].
The D-efficiency compares a design, D, with the optimal design, Dopt; see the calculation in 3.2.1 for an example of this. We could of course compare two sub-optimal designs with each other instead of with an optimal design [2, p151]. Besides calculating the information of a design by taking the determinant of the information matrix, the information of an N-trial design can also be calculated in the following manner:
I(D) = | ∑_{i=1}^{N} (md,model(xi))^T md,model(xi) |.   (2.12)
While the information of a continuous design can be calculated with:

I(D) = | ∑_{i=1}^{n} ωi (md,model(xi))^T md,model(xi) |,   (2.13)

where ωi is the weight to be put in design point xi.
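The sum in equation (2.12) rebuilds the information matrix M^T M from rank-one terms m(xi)^T m(xi), so both formulas give the same value. The following Python/NumPy sketch (a stand-in for the thesis's MATLAB code) checks this on the design of example 2.2.2:

```python
import numpy as np

def m2I(x1, x2):
    # m_{2,I} as a 1 x p row vector, so m.T @ m is a p x p rank-one matrix.
    return np.array([[1.0, x1, x2, x1 * x2]])

points = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0), (0, 0)]

# Information matrix via the sum of outer products, as in (2.12):
S = sum(m2I(*pt).T @ m2I(*pt) for pt in points)

# Information matrix via the stacked model matrix:
M = np.vstack([m2I(*pt) for pt in points])

assert np.allclose(S, M.T @ M)
print(round(np.linalg.det(S)))   # 384, as in the example above
```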
2.6 Scaling
In the dinner example in Background we transformed the levels to {−1, 0, 1}. This might seem strange. If D-optimality can be coded arbitrarily, why not code the interval to [0, 1], [0, li − 1] or [1, li], where li is the number of levels (parts that we divide the interval in)? One reason not to code the levels to [0, li − 1] or [1, li] is that the information would then grow proportionally to the number of levels. A reason that [−1, 1] is preferred over [0, 1] is that the two end points should be weighted the same in the calculation of D-optimality, which would not be the case on the [0, 1] interval. Scaling the interval to [−1, 1] is good for symmetry reasons.
[-1,1] scaling
For each factor separately, let ui be an unscaled variable such that ui ∈ [min(ui), max(ui)]. We then scale the variable to xi ∈ [−1, 1] by setting:

xi = (2ui − max(ui) − min(ui)) / (max(ui) − min(ui)).
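The scaling formula maps the endpoints of the unscaled range to −1 and 1 and the midpoint to 0. As a small Python sketch (the dose range is a hypothetical example, not taken from the thesis):

```python
def scale(u, u_min, u_max):
    """Scale u from [u_min, u_max] to [-1, 1]."""
    return (2 * u - u_max - u_min) / (u_max - u_min)

# E.g. a hypothetical dose range of 0-100 mg:
print([scale(u, 0, 100) for u in (0, 25, 50, 100)])   # [-1.0, -0.5, 0.0, 1.0]
```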
2.7 Levels
In this work we use equidistant levels to divide each factor's interval. The reason for dividing the intervals into levels is that it is a natural consequence of the implementation of our program. Each factor can take on an arbitrary number of levels in our program, but in this work we have only been interested in the case when each interval has the same number of levels. Factor i's levels are written si(j), i = 1, 2, ..., d, j = 1, 2, ..., li; s2(3) means factor two at level three. If we have three factors of which the first takes on seven levels, the second three levels and the third four levels, we write this as the vector s = (l1, l2, l3) = (7, 3, 4). This gives us ∏_{i=1}^{3} li = l1 · l2 · l3 = 7 · 3 · 4 = 84 possible design points. The design points make up the design space S = {x1, x2, ..., x84}. In general S = {x1, x2, ..., xt}, where t = ∏_{i=1}^{d} li.

li = 2 → {−1, 1}
li = 3 → {−1, 0, 1}
li = 4 → {−1, −1/3, 1/3, 1}
li = 5 → {−1, −1/2, 0, 1/2, 1}
li = 6 → {−1, −3/5, −1/5, 1/5, 3/5, 1}
li = 7 → {−1, −2/3, −1/3, 0, 1/3, 2/3, 1}
li = k → {−1, −1 + 2/(k−1), −1 + 2·2/(k−1), −1 + 3·2/(k−1), ..., 1}
Figure 2.3: d = 2, s = (5, 5), upper right corner not permitted, this gives us 21 design points.
In the results section, where we fix the number of factors to two, we will mainly use levels set at s = (5, 5) and s = (9, 9), because with these we get better designs than with levels of about the same magnitude, such as s = (6, 6), s = (7, 7) or s = (8, 8). This is because with s = (5, 5) and s = (9, 9) we include the (for our design regions) important design points (0, 1), (1, 0) and (0.5, 0.5). Set:

(i) −1 + n · 2/(k−1) = −1/2, or
(ii) −1 + n · 2/(k−1) = 0, or
(iii) −1 + n · 2/(k−1) = 1/2.

Solving (i) for k gives k = 4n + 1, and these k also satisfy (ii) and (iii) for integer n. Since n ∈ ℕ we get k ∈ {5, 9, 13, 17, ...}. For the cut-off region the point (0.5, 0.5) is not included, and it thus gives optimal designs also when s = (3, 3).
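The equidistant levels tabulated above, and the k = 4n + 1 condition, can be checked with a short Python sketch (a stand-in for the thesis's MATLAB implementation):

```python
def levels(k):
    """The k equidistant levels on [-1, 1]."""
    return [-1 + 2 * j / (k - 1) for j in range(k)]

print(levels(5))   # [-1.0, -0.5, 0.0, 0.5, 1.0]

# For k = 4n + 1 the points -1/2, 0 and 1/2 are all levels:
for k in (5, 9, 13):
    assert {-0.5, 0.0, 0.5} <= set(levels(k))

# For k = 7 the point 0.5 is not a level:
assert 0.5 not in levels(7)
```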
2.8 Design Regions
Definition 2.8.1. Convex region
A convex region, R, has the property that any line segment joining any two points of the region lies within it (within meaning int(R) ∪ ∂R) [8, p121].
Convex design region. Non-convex design region.
Figure 2.4: The red lines are line segments connecting two points from respective region.
In this thesis three design regions have been explored for each dimension:

1) The full design space.
2) A cut-out design region (non-convex).
3) A cut-off design region (convex).
Even though the design regions in our program are always partitioned by the chosen levels, we will here present continuous design regions. The continuous design regions could be seen as design regions where the number of levels has gone to infinity, that is li → ∞, for all i.
One factor (d = 1).

Full design space: the interval [−1, 1]. With only one factor we have designed the restricted design regions to be the same as the full design space, that is the interval [−1, 1].
Two factors (d = 2).

Full design space: {x = (x1, x2) : x1, x2 ∈ [−1, 1]}, the square with corners (−1, −1), (1, −1), (1, 1) and (−1, 1).

Cut-out design region (non-convex): the square with the upper-right quadrant removed, with corners (−1, −1), (1, −1), (1, 0), (0, 0), (0, 1) and (−1, 1).

Cut-off design region (convex): the square with the upper-right corner cut off, with corners (−1, −1), (1, −1), (1, 0), (0, 1) and (−1, 1).
Three factors (d = 3).

Full design space: {x = (x1, x2, x3) : x1, x2, x3 ∈ [−1, 1]}.

Cut-out design region (non-convex): {x = (x1, x2, x3) : x1, x2, x3 ∈ [−1, 1], min(x1, x2, x3) ≤ 0}.

Cut-off design region (convex): {x = (x1, x2, x3) : x1, x2, x3 ∈ [−1, 1], x1 + x2 + x3 ≤ 1}.
2.9 Isomorphic Designs
Definition 2.9.1. Isomorphic designs
We call two designs isomorphic if they can be made indistinguishable by the operations reflection and rotation. We further demand that they belong to the same experiment, meaning they have the same model md,model.
In the analysis throughout this section we will concentrate on the cut-off and cut-out design regions and merely look at the full space regions for pedagogical purposes.
2.9.1 d = 1
Rotations are not interesting in one dimension, but the line has one symmetry line and it is of interest.
Figure 2.5: We can reflect a one-factor design through the symmetry line.
Figure 2.6: Two designs on [−1, 1], call them D1,9 and D2,9: the first places 5 trials at −1 and 4 at 1, the second 4 at −1 and 5 at 1. They are isomorphic, denoted ≅, since they can be made indistinguishable through reflection.

D1,9 = ( −1 1 ; 5 4 ) ≅ D2,9 = ( −1 1 ; 4 5 ).

We say that the design D1,9 is of size two since it can generate another isomorphic design. We write this |D1,9| = 2, or equivalently |D2,9| = 2.
Figure 2.7: Two designs that each place 3 trials at both endpoints are already identical, and we therefore say that they are of size one.
2.9.2 d = 2
Rotating the cut-out or the cut-off design region is not of interest in two dimensions, since this would change the shape of the design region. The restricted design regions have only one symmetry line.
Figure 2.8: A square has four symmetry lines. Reflection through the fat red line could be seen as flipping the coordinate axes.
Figure 2.9: When we reflect through the symmetry line the point a = (x, y) changes place with the point a′ = (y, x). We call a and a′ paired points. Assuming a ≠ a′ this gives us two isomorphic designs.
Figure 2.10: Reflection through the red line with two pairs of paired points, a = (x, y), a′ = (y, x) and b = (x′, y′), b′ = (y′, x′). Assuming at least one of the paired points differs, we get two isomorphic designs. That is: (a ≠ a′) ∨ (b ≠ b′) ⇒ |D| = 2.
2.9.3 d = 3
In one and two dimensions we reflect the design regions through a symmetry line. In three dimensions we can rotate the restricted design regions around the only rotation line that preserves the design region's shape. Rotation around this line can generate three isomorphic designs. We see from the figure below that we can rotate the cube 2π/3 around the rotation line and in this way keep the shape of the design region, but change the place of the trials. After three 2π/3 rotations we return to the design we started with.
We can also reflect, not through a line as in two dimensions but through a plane. We have three planes to reflect through, the planes that hold one of the points a, b or c fix at a time and reflects the two other points. These reflections give us three more designs.
The rotations and reflections can also be seen as rotating and reflecting the factors' coordinate axes.
[Cube with space diagonal from (−1, −1, −1) to (1, 1, 1); the points a = (1, 1, −1), c = (1, −1, 1) and b = (−1, 1, 1) lie around the diagonal.]
Figure 2.11: If a, b and c all differ, this design can generate five others, giving a total of six isomorphic designs. If two points are equal but the third point differs we get three isomorphic designs. If all points are the same we have only one design.
Figure 2.12: One rotation line (the space diagonal from (−1, −1, −1) to (1, 1, 1)) and three planes of reflection, thus just as above.
If we construct design spaces with a different number of levels for each factor we get a rectangle (d = 2) or a cuboid (d = 3). The (non-square) restricted-region rectangle has no symmetry or reflection lines. The (non-cube) restricted-region cuboid has no lines of rotation or planes of reflection. But if we allowed two of the cuboid's factors to have the same number of levels we would get one plane of reflection. This plane of reflection can give two isomorphic designs if the two paired design points have numbers of trials that differ from each other.
We will here give two examples where d = 2.
Figure 2.13: Two example designs (d = 2). For the leftmost design |D| = 2; for the rightmost design |D| = 1.
2.9.4 Isomorphic Designs and Information
We want to show that isomorphic designs have the same information. We start off by looking at d = 2 and d = 3. The reflection that we do when d = 2 can be seen as flipping the coordinate axes, which means that the columns in X change places in Xref:

$$X = \begin{pmatrix} x_{1,1} & x_{1,2} \\ x_{2,1} & x_{2,2} \\ \vdots & \vdots \\ x_{N,1} & x_{N,2} \end{pmatrix} \quad\text{and}\quad X_{\mathrm{ref}} = \begin{pmatrix} x_{1,2} & x_{1,1} \\ x_{2,2} & x_{2,1} \\ \vdots & \vdots \\ x_{N,2} & x_{N,1} \end{pmatrix}$$
When d = 3 the reflections in a plane hold one column fixed and interchange the other two, while the rotation 2π/3 moves all columns one step in the following manner:

$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{1,3} \\ x_{2,1} & x_{2,2} & x_{2,3} \\ \vdots & \vdots & \vdots \\ x_{N,1} & x_{N,2} & x_{N,3} \end{pmatrix} \quad\text{after rotating } 2\pi/3 \text{ becomes}\quad X_{\mathrm{rot}} = \begin{pmatrix} x_{1,3} & x_{1,1} & x_{1,2} \\ x_{2,3} & x_{2,1} & x_{2,2} \\ \vdots & \vdots & \vdots \\ x_{N,3} & x_{N,1} & x_{N,2} \end{pmatrix}.$$
Since we have both symmetrical models and symmetrical scaling, it is clear that reflections and rotations for d = 2 and d = 3 do not change the information. Thus the designs of our 2- and 3-factor models have the same information if they are isomorphic.
Now we will look at d = 1. When reflecting the line we change the sign of all design points. If X is the design matrix of D and Xref is the reflected design matrix (call the reflected design Dref) we get:

$$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix} \quad\text{and}\quad X_{\mathrm{ref}} = \begin{pmatrix} -x_1 \\ -x_2 \\ \vdots \\ -x_N \end{pmatrix}.$$
For m1,L: I(D) = I(Dref) = Σᵢ₌₁ᴺ xᵢ².

For m1,A: I(D) = I(Dref) = N Σᵢ xᵢ² − (Σᵢ xᵢ)².

For m1,Q: I(D) = I(Dref) = |N(Σᵢ xᵢ² Σᵢ xᵢ⁴ − (Σᵢ xᵢ³)²) − Σᵢ xᵢ (Σᵢ xᵢ Σᵢ xᵢ⁴ − Σᵢ xᵢ³ Σᵢ xᵢ²) + Σᵢ xᵢ² (Σᵢ xᵢ Σᵢ xᵢ³ − (Σᵢ xᵢ²)²)|, where all sums run over i = 1, ..., N.
For the model m1,C:

$$I(D) = \begin{vmatrix} N & \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 & \sum x_i^5 \\ \sum x_i^3 & \sum x_i^4 & \sum x_i^5 & \sum x_i^6 \end{vmatrix}$$

The information of the reflected design is:

$$I(D_{\mathrm{ref}}) = \begin{vmatrix} N & -\sum x_i & \sum x_i^2 & -\sum x_i^3 \\ -\sum x_i & \sum x_i^2 & -\sum x_i^3 & \sum x_i^4 \\ \sum x_i^2 & -\sum x_i^3 & \sum x_i^4 & -\sum x_i^5 \\ -\sum x_i^3 & \sum x_i^4 & -\sum x_i^5 & \sum x_i^6 \end{vmatrix} = \begin{vmatrix} N & -\sum x_i & \sum x_i^2 & -\sum x_i^3 \\ \sum x_i & -\sum x_i^2 & \sum x_i^3 & -\sum x_i^4 \\ \sum x_i^2 & -\sum x_i^3 & \sum x_i^4 & -\sum x_i^5 \\ \sum x_i^3 & -\sum x_i^4 & \sum x_i^5 & -\sum x_i^6 \end{vmatrix} = I(D),$$

where we first multiplied rows 2 and 4 by −1 and then factored −1 out of columns 2 and 4. Thus, for all models we work with, isomorphic designs have the same information.
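As a quick numerical sanity check (a Python sketch rather than the thesis's MATLAB program; the helper names and the example points are our own), we can verify for the two-factor interaction model that interchanging the columns of X leaves |MᵀM| unchanged:

```python
import numpy as np

def information(X):
    # Expand X according to the interaction model m_{2,I}: rows (1, x1, x2, x1*x2).
    M = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
    return np.linalg.det(M.T @ M)

# A two-factor design and its reflection (columns interchanged).
X = np.array([[-1, -1], [1, -1], [-1, 1], [0, 1], [1, 0]], dtype=float)
X_ref = X[:, ::-1]

I_orig, I_refl = information(X), information(X_ref)
```

Because the model treats x1 and x2 symmetrically, the column swap only permutes the columns of the model matrix M, which cannot change |MᵀM|.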
2.9.5 Isoinformative Designs

Definition 2.9.2. Isoinformative designs
We call two designs isoinformative if they have both the same number of trials and the same information but are not identical. We further demand that they belong to the same experiment, meaning they have the same model md,model.
When the models and scaling are symmetrical, the isomorphic designs form a subset of the isoinformative designs; that is, if DISO is the set of isomorphic designs of size greater than one and I is the set of isoinformative designs, then DISO ⊂ I.
The set of isoinformative designs, IN, contains all designs with N trials, the same information and the same model. We denote the size of this set |IN|. The set of isomorphic designs for a given N we denote DISO,N, and the size of this set is |DISO,N|. Then obviously |IN| ≥ |DISO,N| for all N. There are isoinformative designs that are not isomorphic (see section 3.2.1 for examples), so the inequality is sometimes strict.
All designs with information greater than zero make up the set A. The smallest designs in A belong to the set Dp, containing all designs with p trials, where p is the number of parameters of our chosen model. There is no upper limit on the number of trials a design can have, and A is thus split into an infinite number of subsets, A = {Dp, Dp+1, ...}. Further, DISO ⊂ I ⊂ A. See Figure A.1 for an illustration of these relations.
2.10 Program

2.10.1 Program Idea
The program we have developed finds D-optimal designs in one, two or three dimensions. The program starts by generating a number of random designs. Out of these random designs, the ones with the highest information are sent to one of the two algorithms we have developed. Which algorithm is used depends on which of the two GUIs the user has executed.
The reason that two algorithms were developed is that the first one, which we have called One Step, had the weakness that it ended up in local maxima and thus failed to find the D-optimal design. Therefore another algorithm, which we have called Systematic Search, was developed. Systematic Search avoided ending up in local maxima and thus found the D-optimal designs. It is more important that the program finds an optimal design, even if it takes a few times longer to run, than that it finds a sub-optimal design in less time. Therefore Systematic Search is better than One Step. For a comparison see section 3.3.
We later found out that Systematic Search is closely related to the Fedorov Algorithm (1972). The Fedorov Algorithm is described in [4].
Figure 2.14: A schematic of what the program does: the design generator creates a number (m) of random designs, the designs with the highest information are sent to an optimization algorithm for fine search, and the result is given as output.
2.10.2 Program Description
The program is written in MATLAB. The program and instructions on how to run it are linked for download in appendix A.1. The user has to give the program the following inputs: factors (d), levels (s), region shape, the number of trials, the model, the number of random designs to be generated (simulations) and the number of designs to be improved upon by the algorithm (prospects). All of these inputs are given in the program's GUI, shown below.
The user can choose one, two or three dimensions. The level windows pop up after the number of dimensions has been chosen. The levels can be arbitrarily chosen, but the more levels, the longer the program takes to run. One of the three regions and one of the five (four if d = 1) models are chosen. Trials, simulations and prospects are arbitrarily chosen, but prospects must be less than or equal to simulations. The "Show matrix" box decides whether or not the design matrix of the best design will be shown in the output. The help button opens a pdf manual (works both in Windows and Linux).
The program output consists of: the input data, elapsed time, a picture of the design with the highest information found, the design’s information and its matrix (optional).
The inputs are: d, the number of dimensions (1, 2 or 3); s = (l1, ..., ld), the vector of levels each dimension may take; shape ∈ {full design space, cut-out design region, cut-off design region}; N ∈ ℕ, the number of trials; model ∈ {linear, affine, interaction, quadratic, cubic}; simulations ∈ ℕ, the number of random designs created; and prospects ≤ simulations, the number of designs with the highest information sent to fine search.

The program's functions are:

- space: spans a 1-, 2- or 3-dimensional space; outputs S (a t × d matrix).
- region: restricts the design space due to technical or medical reasons; outputs R (an r × d matrix).
- design: randomly generates designs D (1 × r vectors); the entries in D are the numbers of trials Ni at each design point xi.
- matrix: constructs a matrix X (an N × d matrix) according to D and R.
- information: calculates the information |MᵀM| (a number I); M is an expansion of X according to the model.
- top: takes out the matrices with the highest information.
- one step search and systematic search: two different algorithms for finding designs with maximum information.
- GUI: graphical user interface. Output: the highest information found and its design.
We will now describe in detail what our program does and how each algorithm works. The program starts off by generating a number of random designs; this is done by the code modules shape, region and design. In the program implementation the random designs are called simulations. The code modules matrix and information create model matrices and calculate the designs' information. The number of highest-information designs to be sent to an algorithm is given by the number of prospects. The module top picks out this (prospect) number of best designs and sends them to one of the two algorithms. Which algorithm is used depends on whether you have opened the program Systematic Search or One Step.
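The pipeline up to the fine search can be sketched as follows (a Python sketch, not the MATLAB implementation; the grid, N, simulations and prospects values are illustrative choices, and the plain functions stand in for the code modules named above):

```python
import numpy as np

rng = np.random.default_rng(0)

def information(X):
    # information: |M^T M| under the interaction model m_{2,I},
    # model matrix rows (1, x1, x2, x1*x2).
    M = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
    return np.linalg.det(M.T @ M)

# shape + region: a cut-out region on a 3 x 3 grid (the point (1, 1) removed).
points = np.array([(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)
                   if (x, y) != (1, 1)], dtype=float)

N, simulations, prospects = 10, 500, 5

# design: each random design places N trials uniformly on the allowed points.
designs = [points[rng.integers(len(points), size=N)] for _ in range(simulations)]

# top: keep the `prospects` designs with the highest information.
best = sorted(designs, key=information, reverse=True)[:prospects]
```

The designs in `best` would then be handed to One Step or Systematic Search for fine search.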
This is how One Step works:
Prospects: contains the best randomly generated designs and their respective information. One prospect at a time is sent to One Step for improvement.

One Step: move a trial from its design point to the two closest design points in each factor's direction. This creates new designs: d = 1 ⇒ 2 new designs, d = 2 ⇒ 4 new designs, d = 3 ⇒ 8 new designs. Calculate the new information for each. If the new information is equal or lower, move directly to the next trial; if the new information is higher, update the design and move to the next trial. After all trials have been moved, One Step sends back a new design matrix X′ and its information I′.
Figure 2.15: The functions inside the dashed box are iterated as long as the designs improve. This is done until the design has reached an information/design maximum, that is, the algorithm cannot improve the design any more.
Figure 2.16: A design that goes through One Step and ends up in a sub-optimal design (three stages of the same design are shown).
In Figure 2.16 the first design is a randomly generated design which is among the prospects highest-information designs out of the simulations randomly generated designs. The second design has gone through the One Step algorithm once. The third design has gone through One Step once more and has ended up in a local maximum that it cannot get out of.
We will now look at how Systematic Search works:
Prospects: contains the best randomly generated designs and their respective information. One prospect at a time is sent to Systematic Search for improvement.

Systematic Search: move a trial from its design point to all other design points. This creates r − 1 new designs. Calculate the new information for each. If the new information is equal or lower, move directly to the next trial; if the new information is higher, update the design and move to the next trial. After all trials have been moved, Systematic Search sends back a new design matrix X′ and its information I′.
Figure 2.17: Notice that Systematic Search is not iterated. As long as simulations and prospects are large it needs only one pass to find a maximum (or the unique maximum, if only one exists). An advantage over One Step is that it is far less likely to end up in a local maximum.
When Systematic Search has moved all trials once, and when One Step has moved all trials until the algorithm cannot improve the designs any more, the program produces output.
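One Systematic Search pass can be sketched as below (a Python sketch under our own conventions, not the MATLAB code: a design is represented as a vector of trial counts over the r allowed points, as in the program's D vectors, and the example region and start design are illustrative):

```python
import numpy as np

def info(counts, points):
    # |M^T M| for the design with counts[i] trials at points[i], model m_{2,I}.
    M = np.array([[1.0, x, y, x * y]
                  for (x, y), n in zip(points, counts) for _ in range(n)])
    return np.linalg.det(M.T @ M) if len(M) >= 4 else 0.0

def systematic_pass(counts, points):
    """Offer every trial, one at a time, each of the other r - 1 design
    points; keep a move only when the information strictly increases."""
    counts = list(counts)
    current = info(counts, points)
    for i in range(len(points)):
        for _ in range(counts[i]):
            if counts[i] == 0:
                break
            best_j, best_val = None, current
            for j in range(len(points)):
                if j == i:
                    continue
                counts[i] -= 1; counts[j] += 1   # try the move ...
                val = info(counts, points)
                counts[i] += 1; counts[j] -= 1   # ... and undo it
                if val > best_val:
                    best_j, best_val = j, val
            if best_j is not None:
                counts[i] -= 1; counts[best_j] += 1
                current = best_val
    return counts, current

# Cut-out region on a 3 x 3 grid, N = 8 trials, one at each allowed point.
pts = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1) if (x, y) != (1, 1)]
start = [1] * len(pts)
improved, I_new = systematic_pass(start, pts)
```

Since a move is only accepted when the information strictly increases, a pass can never make the design worse.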
2.11 Computational considerations
It should be noted that older algorithms such as DETMAX (from 1974) are less compute-intensive than the two algorithms we have developed, since they calculate the determinant of the information matrix in another manner. But computing time has not been a big problem during the thesis. It is worth noticing that an Intel 8080 from 1974 could do 0.29 MIPS (million instructions per second) while an Intel i7 from 2016 can do around 300 000 MIPS: 300 000/0.29 ≈ 1 000 000.
To do an exhaustive search, that is, to calculate all designs for a certain design region and a given number of trials, gets too compute-intensive as soon as r and/or N gets "big". With r allowed design points in our design region and N trials in our design there are r^N designs. Let us try this out with some numbers and see what "big" means. Set N = 10 and s = (5, 5) in the cut-off design region. Then r = 21 and we get: 21^10 ≈ 20^10 = (2 · 10)^10 = 2^10 · 10^10 ≈ 10^3 · 10^10 = 10^13.
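The estimate is easily checked (a Python one-liner; note that r^N counts the assignments of N labeled trials to r points, which is how the text counts designs):

```python
# r allowed design points, N trials: r^N possible designs.
r, N = 21, 10
total = r ** N          # 16 679 880 978 201, i.e. on the order of 10^13
```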
By setting prospects to 1 and only changing simulations we get an approximation of how long it would take to generate these 10^13 designs and calculate their information. We use the interaction model and get the elapsed calculation times below.
simulations     10^3   10^4   10^5   10^6   10^7
time (seconds)  0.13   0.86   8.1    81     825
It is obviously unreasonable to solve this problem exhaustively, though it would be possible when s = (3, 3), giving us 8^10 designs in the cut-out design region.
Results
We have looked at the two restricted design regions (cut-off and cut-out) in two dimensions with the model m2,I. The program could just as easily have produced analogous results for the other two-dimensional spaces, for another model or for the full design space.
The designs presented here are called optimal if they have the highest information found, either by our program or by analysis. The reader should keep in mind that we do not provide proofs that these designs are optimal.
3.1 Cut-out Design Region, d = 2

3.1.1 Exact Designs
Below are designs for d = 2, s = (5, 5), the interaction model and a variable number of trials. The isoinformative designs are marked with a thin symmetry line. For the cut-out design region the program generates D-optimal designs when the number of levels belongs to {3, 5, 9, 13, ...} (see section 2.7). We have chosen s = (5, 5) here because the figures are of a nice size. All designs except N = 11 and N = 24 have zero or one pair of paired design points whose numbers of trials differ from each other; the designs for N = 11 and N = 24 have two such pairs (see 2.9 for an explanation of paired points). In Figure 3.1 we see three examples of cut-out designs; only five design points are used in all D-optimal designs found for this region and the interaction model.
Figure 3.1: m2,I, N = 4, 11, 22. The first two designs are isomorphic of size two.
Table 3.1: Designs for N = 4, 5, ..., 30; model m2,I; |D| is the number of isomorphic designs. Below each design point is given its number of trials.

 N   (-1,-1)  (1,-1)  (-1,1)  (1,0)  (0,1)  |D|
 4      1       1       1       1      0     2
 5      1       1       1       1      1     1
 6      2       1       1       1      1     1
 7      2       2       1       1      1     2
 8      2       2       2       1      1     1
 9      2       2       2       2      1     2
10      3       2       2       2      1     2
11      3       3       2       2      1     2
12      3       3       3       2      1     2
13      3       3       3       2      2     1
14      4       3       3       2      2     1
15      4       4       3       2      2     2
16      4       4       4       2      2     1
17      4       4       4       3      2     2
18      5       4       4       3      2     2
19      5       4       4       3      3     1
20      5       5       4       3      3     2
21      5       5       5       3      3     1
22      6       5       5       3      3     1
23      6       5       5       4      3     2
24      6       6       5       3      4     2
25      6       6       6       4      3     2
26      6       6       6       4      4     1
27      7       6       6       4      4     1
28      7       7       6       4      4     2
29      7       7       7       4      4     1
30      8       7       7       4      4     1
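The table's internal consistency, and the property that each listed design for N + 1 trials can be obtained from the design for N trials (or from its mirror image under the region's symmetry, which swaps (1,−1) ↔ (−1,1) and (1,0) ↔ (0,1)) by adding a single trial, can be checked mechanically (a Python sketch over the table data):

```python
# Trial counts at ((-1,-1), (1,-1), (-1,1), (1,0), (0,1)), from Table 3.1.
table = {
    4: (1,1,1,1,0), 5: (1,1,1,1,1), 6: (2,1,1,1,1), 7: (2,2,1,1,1),
    8: (2,2,2,1,1), 9: (2,2,2,2,1), 10: (3,2,2,2,1), 11: (3,3,2,2,1),
    12: (3,3,3,2,1), 13: (3,3,3,2,2), 14: (4,3,3,2,2), 15: (4,4,3,2,2),
    16: (4,4,4,2,2), 17: (4,4,4,3,2), 18: (5,4,4,3,2), 19: (5,4,4,3,3),
    20: (5,5,4,3,3), 21: (5,5,5,3,3), 22: (6,5,5,3,3), 23: (6,5,5,4,3),
    24: (6,6,5,3,4), 25: (6,6,6,4,3), 26: (6,6,6,4,4), 27: (7,6,6,4,4),
    28: (7,7,6,4,4), 29: (7,7,7,4,4), 30: (8,7,7,4,4),
}

def reflect(D):
    # Reflection through the symmetry line swaps (1,-1)<->(-1,1), (1,0)<->(0,1).
    a, b, c, d, e = D
    return (a, c, b, e, d)

def adds_one_trial(D, E):
    diff = [y - x for x, y in zip(D, E)]
    return sum(diff) == 1 and all(v >= 0 for v in diff)

sequential_ok = all(adds_one_trial(table[N], table[N + 1])
                    or adds_one_trial(table[N], reflect(table[N + 1]))
                    for N in range(4, 30))
```

For N = 23 → 24 and N = 24 → 25 the added trial matches the mirror image of the listed design rather than the design itself, so the sequential construction holds up to the choice of isomorphic variant.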
More data on isomorphic designs for this design region and model can be found in the appendix A.3.
When we study the table we see that a design DN+1 can be constructed from the earlier design DN by simply adding a trial to a design point. No earlier trials have to be moved. This is a very nice property for sequential experiments, since we can then have an optimal design for each experiment in the sequence.
We also notice that it is not always easy to guess the design DN+1 from the design DN. By taking a look at designs with a much larger number of trials we will approximate the continuous design, and then calculate the continuous design analytically.
3.1.2 Continuous Designs
We are now fairly convinced that all D-optimal designs of the cut-out design region with the interaction model have the following form:
         (−1,−1)  (1,−1)  (−1,1)  (0,1)  (1,0)
Dopt =    N1/N     N2/N    N3/N    N4/N   N5/N .
All optimal designs that we have looked at have used exactly these design points. We would guess that the continuous design uses the same five points,

         (−1,−1)  (1,−1)  (−1,1)  (0,1)  (1,0)
Dopt =     ω1       ω2      ω3      ω4     ω5  ,

and we would like to know what ωi is, i = 1, ..., 5.
Figure 3.2: Exactly these five design points are used in all optimal designs found by the program, for N ≥ 5 and s = (l1, l2) where l1 = l2 ≥ 3.
Assuming that only these five points are used, we might use the levels s = (3, 3) when we make designs with a large number of trials, to make the program run fast. (We have run a similar analysis with the levels set much larger than this, such as 17 and 21, and always found the same result.)
Looking at Figure 3.3 we find the quite surprising result that the design point (−1,−1) has one fourth of the trials in the cut-out design region, just as it does in the full design space. In the leftmost design the levels are s = (4, 4), which gives us the design

      (−1,−1)  (1,−1)  (−1,1)  (−1/3,1)  (1,−1/3)
D ≈    0.25    0.220   0.220    0.155     0.155

with I = 3.5867 · 10^13, to be compared with I(Dopt) = 7.2775 · 10^13 when s = (3, 3).
Figure 3.3: Exact optimal designs with m2,I and N = 4000. In the leftmost design the levels are l1 = l2 = 4 and the design is cut-out (trial counts 1000, 881, 881, 619, 619; note (881 + 619)/4000 = 3/8). In the two other designs the levels are l1 = l2 = 3 and the designs are cut-out (counts 1000, 934, 934, 566, 566; (934 + 566)/4000 = 3/8) and full space (1000 trials at each corner).
The middle design has the following proportions:

              (−1,−1)      (1,−1)           (−1,1)           (0,1)            (1,0)
Dopt,4000 =     1/4     (3/8)(934/1500)  (3/8)(934/1500)  (3/8)(566/1500)  (3/8)(566/1500) .
This gives us the idea that the continuous design looks like this:

         (−1,−1)  (1,−1)   (−1,1)   (0,1)        (1,0)
Dopt =     1/4    (3/8)ω   (3/8)ω   (3/8)(1−ω)   (3/8)(1−ω) ,   ω ∈ (0, 1).

We will calculate ω by using formula 2.13, that is:

$$I(D) = \Big| \sum_{i=1}^{n} \omega_i \, (m_{d,\text{model}}(x_i))^T m_{d,\text{model}}(x_i) \Big|,$$

with ω1 = 1/4, ω2 = ω3 = (3/8)ω and ω4 = ω5 = (3/8)(1 − ω).
m2,I((−1, −1)) = (1, −1, −1, 1)
m2,I((1, −1)) = (1, 1, −1, −1)
m2,I((−1, 1)) = (1, −1, 1, −1)
m2,I((0, 1)) = (1, 0, 1, 0)
m2,I((1, 0)) = (1, 1, 0, 0)
Set I(ω) to be the information as a function of ω. Then by using formula 2.22 we have

I(ω) = (27/64)ω + (27/32)ω² − (81/64)ω³,   so   I′(ω) = 27/64 + (27/16)ω − (243/64)ω².

By setting I′(ω) = 0 we get the two roots ω = 2/9 ± √13/9, of which only one is positive, ω = 2/9 + √13/9 ≈ 0.6228. Thus the continuous design is

         (−1,−1)    (1,−1)       (−1,1)       (0,1)        (1,0)
Dopt =     1/4    (2+√13)/24   (2+√13)/24   (7−√13)/24   (7−√13)/24 ,

which is approximately

         (−1,−1)  (1,−1)   (−1,1)   (0,1)    (1,0)
Dopt ≈    0.25    0.2336   0.2336   0.1414   0.1414 .
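The value of ω can be double-checked without the closed-form polynomial, by maximizing the determinant of the continuous information matrix directly (a Python sketch; the grid resolution is our choice):

```python
import numpy as np

def cont_info(w):
    # Information matrix M(xi) = sum of w_i * m(xi)^T m(xi), m(x) = (1, x1, x2, x1*x2).
    pts = [(-1, -1), (1, -1), (-1, 1), (0, 1), (1, 0)]
    wts = [1/4, 3*w/8, 3*w/8, 3*(1 - w)/8, 3*(1 - w)/8]
    M = sum(wt * np.outer(m, m)
            for wt, m in zip(wts, (np.array([1.0, x, y, x*y]) for x, y in pts)))
    return np.linalg.det(M)

w_star = (2 + np.sqrt(13)) / 9          # the positive root, ~0.6228
grid = np.linspace(0.0, 1.0, 20001)
w_hat = grid[np.argmax([cont_info(w) for w in grid])]
```

The determinant vanishes at ω = 0 and ω = 1 (only three support points remain for four parameters) and has its single interior maximum at ω ≈ 0.6228.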
3.2 Cut-off Design Region, d = 2

We continue with the model m2,I to the cut-off design region, which is an expansion of the cut-out design region.
3.2.1 Exact designs, N = 4 and N = 5
We will here look at the optimal designs for N = 4 and N = 5. We will see that while the optimal design for N = 4 is unique (see Figure 3.4), the optimal design for N = 5 depends on the number of levels we use. The optimal design for N = 5 on a continuous design region does in fact have to be calculated by analysis, even though our program finds very good approximations of this design as soon as the levels are equal to or greater than nine.
Dopt,4 has one trial at each of the four points (−1,−1), (1,−1), (−1,1) and (1/2, 1/2). When we expand Dopt,4 in search of Dopt,5, a reasonable try would be to put the fifth trial in one of the already used design points (−1,−1), (1,−1), (−1,1) or (1/2, 1/2). Then the highest possible information for l1 = l2 = 5 is I(D5) = 162, which is the information of the following four designs:

  (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)          (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)
     2       1       1        1        and       1       1       1        2      ;

neither of the above two designs is isomorphic to these two designs:

  (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)          (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)
     1       2       1        1         ≅        1       1       2        1      .

Thus we have found our first isoinformative designs that are not isomorphic. But the designs we just found are not optimal compared to designs with levels li ≥ 9.
Figure 3.4: Left design: Dopt,4, with one trial at each of the four points. Right design: optimal design for N = 5 and l1 = l2 = 5; the fifth trial can go in any one of the red design points, and each such design has I(Dopt,5) = 162.
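The value I = 162 for all four isoinformative designs can be reproduced directly (a Python sketch computing |MᵀM| for the exact designs):

```python
import numpy as np

def info(points, counts):
    # |M^T M| for the exact design with counts[i] trials at points[i], model m_{2,I}.
    M = np.array([[1.0, x, y, x * y]
                  for (x, y), n in zip(points, counts) for _ in range(n)])
    return np.linalg.det(M.T @ M)

pts = [(-1, -1), (1, -1), (-1, 1), (0.5, 0.5)]
values = [info(pts, c)
          for c in [(2, 1, 1, 1), (1, 2, 1, 1), (1, 1, 2, 1), (1, 1, 1, 2)]]
```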
Figure 3.5: N = 5; I = 162.6953; when s1 = s2 ≥ 9 the trials separate into five distinct design points: (−1,−1), (1,−1), (−1,1), (0.25, 0.75) and (0.75, 0.25).
But the design might get even better if we have a continuous design region, that is we let the levels go to infinity.
Figure 3.6: N = 5 on a continuous design region; one trial at each of (−1,−1), (1,−1), (−1,1), (p, 1−p) and (1−p, p).
Assuming that the schematic design above is optimal, we want to find the value of p. We do this by examining the model matrix:
$$M = \begin{pmatrix} 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & p & 1-p & p(1-p) \\ 1 & 1-p & p & p(1-p) \end{pmatrix}$$
The information is:

I = |MᵀM| = 160 + 32p − 96p⁵ + 32p⁶ + 144p³ + 8p⁴ − 120p².

We set the information to be a function I(p) of p ∈ [0, 1] and plot the function.
Figure 3.7: Plot of I(p) over [0, 1].
Solving I′(p) = 0 for p ∈ [0, 1] we get p1 = 1/2 − √3/6 ≈ 0.2113 and p2 = 1/2 + √3/6 ≈ 0.7887. These solutions give us the maximal information.
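Both the polynomial and its maximizers can be verified numerically (a Python sketch; the grid resolution is our choice):

```python
import numpy as np

def info_p(p):
    # |M^T M| for the N = 5 design: three corners plus (p, 1-p) and (1-p, p).
    M = np.array([[1, -1, -1, 1], [1, 1, -1, -1], [1, -1, 1, -1],
                  [1, p, 1 - p, p * (1 - p)], [1, 1 - p, p, p * (1 - p)]])
    return np.linalg.det(M.T @ M)

def I(p):
    return 160 + 32*p - 96*p**5 + 32*p**6 + 144*p**3 + 8*p**4 - 120*p**2

p2 = 0.5 + np.sqrt(3) / 6
grid = np.linspace(0.0, 1.0, 20001)
p_hat = grid[np.argmax(I(grid))]
```

By the symmetry I(p) = I(1 − p), the grid maximum may land on either root; both give the same information.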
Figure 3.8: Optimal design for N = 5 and a continuous design region: one trial at each of (−1,−1), (1,−1), (−1,1), (1/2 − √3/6, 1/2 + √3/6) and (1/2 + √3/6, 1/2 − √3/6). I(Dopt,5) = 162.7407.
This design is slightly better than the rightmost design in Figure 3.4. We can compare the two designs' D-efficiency with formula 2.11:

e(D) = |M(D)|^(1/p) / |M(Dopt)|^(1/p) = 162^(1/4) / 162.7407^(1/4) ≈ 0.998860.
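For reference, the efficiency computation as a snippet (Python; p = 4 is the number of parameters of m2,I):

```python
I_grid = 162.0        # optimum when l1 = l2 = 5
I_cont = 162.7407     # optimum on the continuous design region
p = 4                 # parameters in the interaction model
efficiency = (I_grid / I_cont) ** (1 / p)
```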
So the four isoinformative designs with information equal to 162, which are optimal when l1 = l2 = 5, are almost as good as the optimal designs when l1 = l2 = 9 or the optimal design on the continuous design region.
3.2.2 Exact designs, N = 6
We will now explore the D-optimal design for N = 6, both in the case of a continuous design region and in the case when the levels are l1 = l2 = 5.
Figure 3.9: Left design: on a continuous design region, I(Dopt,6) ≈ 325.4815; two trials at (−1,−1) and one trial at each of (1,−1), (−1,1), (1/2 − √3/6, 1/2 + √3/6) and (1/2 + √3/6, 1/2 − √3/6). This is an expansion of Dopt,5. Right design: optimal design for N = 6 and l1 = l2 = 5, using the four points (−1,−1), (1,−1), (−1,1) and (1/2, 1/2); I(D6) = 324. Note that 324^(1/4)/325.4815^(1/4) = 162^(1/4)/162.7407^(1/4) ≈ 0.998860.
The leftmost design in Figure 3.9 was found by analysis analogous to how we found Dopt,5. The rightmost design in Figure 3.9 was found by our program.
This design has six isoinformative variants (compare with N = 10 in Table 3.2). As we will see in section 3.2.3, optimal designs for N ≥ 7 only use the four points (−1,−1), (1,−1), (−1,1) and (1/2, 1/2), and thus Dopt,5 and Dopt,6 cannot be expanded to the optimal designs for N ≥ 7. Neither can Dopt,4 be expanded to Dopt,5 or Dopt,6. The slightly less informative designs with I(D5) = 162 and I(D6) = 324 can, however, be included in a sequential experiment where the designs for N = 5 and N = 6 are the only non-optimal designs.
3.2.3 Continuous designs, N ≥ 7
All optimal designs found for N ≥ 7 follow a simple pattern and can be constructed sequentially: the trials split equally over four design points.
             (−1,−1)  (1,−1)  (−1,1)  (1/2,1/2)
Dopt,N≥7 =     N/4      N/4     N/4      N/4
This design can be implemented exactly when N is divisible by four. In the other cases the optimum is attained by several isoinformative designs. The pattern is repeated each fourth trial.
Table 3.2: Implementation of Dopt for N = 7, 8, 9, 10. This pattern holds as N grows and is repeated in a mod-4 fashion. |I| is the number of isoinformative designs; |D| is the number of isomorphic designs.

 N   N (mod 4)  (−1,−1)  (−1,1)  (1,−1)  (1/2,1/2)     I    |I|  |D|  I^(1/4)/N
 7       3         2        1       2        2         648    4    2   0.720768
 7       3         1        2       2        2         648    4    1   0.720768
 7       3         2        2       2        1         648    4    1   0.720768
 8       0         2        2       2        2        1296    1    1   0.75
 9       1         2        3       2        2        1944    4    2   0.737788
 9       1         3        2       2        2        1944    4    1   0.737788
 9       1         2        2       2        3        1944    4    1   0.737788
10       2         2        3       2        3        2916    6    2   0.734847
10       2         3        2       3        2        2916    6    2   0.734847
10       2         3        2       2        3        2916    6    1   0.734847
10       2         2        3       3        2        2916    6    1   0.734847
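The information values and the I^(1/4)/N column of the table can be reproduced (a Python sketch; one representative design per N is checked, the others follow by the region's symmetry):

```python
import numpy as np

def info(points, counts):
    # |M^T M| for the exact design with counts[i] trials at points[i], model m_{2,I}.
    M = np.array([[1.0, x, y, x * y]
                  for (x, y), n in zip(points, counts) for _ in range(n)])
    return np.linalg.det(M.T @ M)

pts = [(-1, -1), (-1, 1), (1, -1), (0.5, 0.5)]
rows = [(7, (2, 1, 2, 2), 648), (8, (2, 2, 2, 2), 1296),
        (9, (2, 3, 2, 2), 1944), (10, (2, 3, 2, 3), 2916)]
checks = [(N, info(pts, c), I, I ** 0.25 / N) for N, c, I in rows]
```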
Considering sequential experiments for the cut-off design region and m2,I, we have now seen that Dopt,N≥7 can be constructed sequentially. However, Dopt,4 cannot be expanded to Dopt,5, and while Dopt,5 can be expanded to Dopt,6, Dopt,6 cannot be expanded to Dopt,7. We have also seen that the optimal designs for N = 5 and N = 6 are only slightly more informative than the designs D5 and D6 that give information 162 and 324 respectively. These latter two designs belong to a sequence of designs from Dopt,4 to Dopt,N≥7 in which they are the only two sub-optimal designs.
3.3 Algorithm Comparison
As mentioned in 2.11 there are less computation-intensive approaches, but they are quite irrelevant for the problem our program was set to solve. Had we built a program for many more factors than three, we could have used the more complex but less computation-intensive method explained in chapter 12.3 of Optimum Experimental Designs, with SAS (2007).
As we can see from the table below, the computing time for Systematic Search with the cut-off and cut-out design regions and the quadratic model is certainly reasonable when d ∈ {1, 2, 3}, but had the program been extended to further dimensions it would probably reach computation times measured in hours already at d = 5 or d = 6.
One Step is roughly three times faster than Systematic Search given the same input. But it gives worse results than Systematic Search, even when we run it with a higher number of prospects and/or simulations, that is, for a longer time. The reason for this is that One Step tends to end up in local maxima as soon as a factor's number of levels gets larger than three.
Table 3.3: Algorithm comparison. In the original, the highest information found is marked green for d = 1, yellow for d = 2 and blue for d = 3.

 d  Levels   N   Simulations  Prospects  Time (sec.)  Information  Program
 1  9        20       1 000         10         0.16         1176   Systematic
 2  9,9      20       1 000         10         0.67       265630
 3  9,9,9    20       1 000         10         6.49   35403·10^5
 1  9        20      10 000         10         0.9          1176
 2  9,9      20      10 000         10         1.45       265430
 3  9,9,9    20      10 000         10         7.18   37732·10^5
 1  9        20      10 000        100         1.57         1176
 2  9,9      20      10 000        100         6.52       267540
 3  9,9,9    20      10 000        100        65      39648·10^5
 1  9        20      50 000        500         7.84         1176
 2  9,9      20      50 000        500        30          267540
 3  9,9,9    20      50 000        500       329      39648·10^5
 1  9        20       1 000         10         0.1        1128.4   One step
 2  9,9      20       1 000         10         0.24       260960
 3  9,9,9    20       1 000         10         2      24774·10^5
 1  9        20      10 000         10         0.75       1109.4
 2  9,9      20      10 000         10         0.93       262570
 3  9,9,9    20      10 000         10         2.4    16815·10^5
 1  9        20      10 000        100         0.97         1136
 2  9,9      20      10 000        100         2.36       265650
 3  9,9,9    20      10 000        100        19.7    21952·10^5
3.4 Design area and information, d = 2
If we set the full design space to have area 1, then the cut-off area is 0.875 and the cut-out area is 0.75. The relative design efficiency (see 2.5.6) of the full-space design is 1, whilst this number goes towards 0.75 and 0.73 for the cut-off and cut-out design regions respectively, as N goes to infinity.
Discussion
4.1 Conclusions
For the two dimensional cut-off design region with the interaction model the D-optimal designs for N = 4,5,6 were found. For N ≥ 7 a pattern emerged that gives the rest of the D-optimal designs for this design region and model.
Such a pattern was not found for the cut-out design region but the continuous design could be calculated nonetheless and its exact designs are easily found by the program.
It was quite surprising that for both restricted design regions the design point (−1,−1) kept one fourth of the trials, just as it does in the full-space design. D-optimal sequential designs are possible to make from our smallest design, D4, and onwards in the cut-out design region, whilst for the cut-off design region D-optimal sequential designs are possible starting from D7. Designs in the cut-off design region can, however, be constructed sequentially starting from D4 without great loss of information.
4.2 Restrictions and Further Studies
We have had to make many restrictions: to three dimensions, to (five) linear models, to only one optimality (D-optimality) and to two types of restricted regions.
If someone should continue this work, the following topics would be interesting to investigate.
The program should be extended to other optimality criteria such as A-optimality, E-optimality and C-optimality. I believe that C-optimality, above all, would be easy to implement. The first step in improving the Systematic Search algorithm would be to make it iterative; the next would be to combine it with One Step. One could also try to combine the program with a combinatorial approach in order to handle non-ordinal data, see [5].
Regarding the design restrictions, a few more design regions come to mind. The first one is the set {x = (x1, ..., xd) : xi ∈ [0, 1] ∀i, Σᵢ₌₁ᵈ xi ≤ 1} and the second one is the set {x = (x1, ..., xd) : xi ∈ [0, 1] ∀i, Σᵢ₌₁ᵈ xi² ≤ 1}. Designs where the design regions have a different number of levels for each factor can already be constructed by the program, and these design regions ought to be explored. All of these extensions could be of interest for clinical research.
[1] Sven Erick Alm and Tom Britton. Stokastik. 2009.
[2] A.C. Atkinson, A.N. Donev, and R.D. Tobias. Optimum Experimental Designs, with SAS. Oxford University Press, 2007.
[3] P.F. Berger and W.K. Wong. An Introduction to Optimal Designs for Social and Biomedical Research. 2009.
[4] P. de Aguiar, B. Bourguignon, M. Khots, D. Massart, and R. Phan-Than-Luu. D-optimal designs. Chemometrics and Intelligent Laboratory Systems, 30:199–210, 1995.
[5] Gustaf Hector and Ludvig Vikström. Optimala komplementära försöksplaner för experiment med flernivåfaktorer. 2014.
[6] David C. Lay. Linear Algebra and Its Applications. 2003.
[7] O. Lombardi, F. Holik, and L. Vanni. What is Shannon information? PhilSci Archive, 2014.
[8] Carla C. Morris and Robert M. Stark. Finite Mathematics. 2016.
[9] Donald R. Mehlisch, Sue Aspley, Stephen E. Daniels, Kristin A. Southerden, and Kyle S. Christensen. A single-tablet fixed-dose combination of racemic ibuprofen/paracetamol in the management of moderate to severe postoperative dental pain in adult and adolescent patients: a multicenter, two-stage, randomized, double-blind, parallel-group, placebo-controlled, factorial study. Clinical Therapeutics, 32(6), 2010.
[10] Sara van de Geer. Least squares estimation. Encyclopedia of Statistics in Behavioral Science, 2, 2005.
[11] Joakim Westerlund and Elisabet Borg. Statistik för beteendevetare. 2012.
[12] Songfeng Zheng. Fisher information and Cramér–Rao bound. 2017.
[13] Douglas C. Montgomery. Design and Analysis of Experiments. 8th edition, 2012.
[14] Y.B. Chan. http://www.ms.unimelb.edu.au/ s620371/slides/ch04.pdf. Ch. 4, 2016.
[15] Y.B. Chan. http://www.ms.unimelb.edu.au/ s620371/slides/ch06.pdf. Ch. 6, 2016.
[16] L. Zhu, T. Dasgupta, and Q. Huang. A D-optimal design for estimation of parameters of an exponential-linear growth curve of nanostructures. 2014.
[17] William H. Greene. Econometric Analysis. 5th edition, 2002.
Appendix
A.1 Code
All code and instructions on how to run it can be found at this link Folder in One Drive.
A.2 Isomorphic and Isoinformative Designs
Figure A.1: The outer box represents the set A of all designs with information greater than zero, split into the subsets Dp, Dp+1, ..., DN, DN+1, .... The light gray box, I, is the set of all isoinformative designs and the dark gray box, DISO, is the set of all isomorphic designs with size greater than one.