Bias, Variance and Optimal Experiment Design: Some Comments on Closed Loop Identification

Lennart Ljung and Urban Forssell
Division of Automatic Control, Department of Electrical Engineering,
Linköping University, S-581 83 Linköping, Sweden
E-mail: ljung@isy.liu.se, ufo@isy.liu.se
URL: http://www.control.isy.liu.se/

March 3, 1999

Report no.: LiTH-ISY-R-2100
Submitted to "Perspectives in Control, a tribute to I.D. Landau", Paris, June 1998

Technical reports from the Automatic Control group in Linköping are available by anonymous ftp at the address ftp.control.isy.liu.se. This report is contained in the compressed postscript file 2100.ps.Z.
Abstract
In this contribution we describe a rather unified way of expressing bias and variance in prediction error estimates. The emphasis is on systems operating in closed loop. We describe the identification criterion function in the frequency domain. The crucial entity is the joint spectrum of the input and the noise source. Different factorizations of this spectrum give different insights into the bias mechanisms of closed loop identification.

It will be shown that so-called indirect identification is the answer to the question of how to obtain consistent estimates of the dynamics part, even with an erroneous noise model. We also consider optimal design of experiments that seek to minimize the weighted variance of the dynamics estimate. It is shown that open loop experiments are optimal if the input power is constrained. However, for any criterion that involves a constraint on the output power, closed loop experiments will be optimal. The optimal regulator does not depend on the weighting function in the criterion to be minimized.
1 Introduction and Setup
Identification of systems operating in closed loop has long been of interest. One reason is that many systems are not allowed to operate in open loop during an identification experiment. Adaptive control is another situation where closed loop identification issues naturally arise; see, among many references, [15]. The recent interest in so-called identification for control has also spurred new methods and results, [4], [13], [18], [6] and [17]. See, among many general references on closed loop identification, [8], [16], [1], [9], [3], [2], [5], [11] and [12].
We shall consider identification of a linear system in a traditional prediction error setup. See [14] for all technical details. The true system is supposed to be described by

    y(t) = G_0(q) u(t) + H_0(q) e(t)    (1)

where $q$ is the delay (shift) operator, $u$ is the input, $y$ is the output and $e$ is white noise with covariance matrix $\Lambda_0$.

The system is operating under arbitrary feedback, but we assume that all signals are quasi-stationary, so that the spectrum of

    \chi_0(t) = \begin{pmatrix} u(t) \\ e(t) \end{pmatrix}    (2)

is well defined, and denoted by

    \Phi_{\chi_0}(\omega) = \begin{pmatrix} \Phi_u(\omega) & \Phi_{ue}(\omega) \\ \Phi_{eu}(\omega) & \Lambda_0 \end{pmatrix}    (3)

The system is identified within the model structure
    y(t) = G(q,\theta) u(t) + H(q,\theta) e(t)    (4)

$G$ will be called the dynamics model and $H$ the noise model. The parameter $\theta$ is estimated by

    \hat\theta_N = \arg\min_{\theta \in D_M} V_N(\theta, Z^N)    (5)

    V_N(\theta, Z^N) = \frac{1}{N} \sum_{t=1}^{N} \varepsilon^T(t,\theta) \Lambda^{-1} \varepsilon(t,\theta)    (6)

    \varepsilon(t,\theta) = y(t) - \hat y(t|\theta) = H^{-1}(q,\theta) \bigl( y(t) - G(q,\theta) u(t) \bigr)    (7)

Here $\Lambda$ is a symmetric, positive definite weighting matrix. We shall discuss the asymptotic properties of $\hat\theta_N$ in the sequel.
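As a concrete illustration of the criterion (5)-(7), the following sketch estimates a one-parameter model $y(t) = g\,u(t-1) + e(t)$ with a fixed noise model $H \equiv 1$ by direct minimization of $V_N$. The example system and all numerical values are our own illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar example: the "true" system is y(t) = g0*u(t-1) + e(t),
# i.e. G0(q) = g0*q^{-1} and H0(q) = 1 (hypothetical choices).
N, g0 = 2000, 0.7
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.concatenate(([0.0], g0 * u[:-1])) + e

def V_N(g):
    # prediction error (7) for the model y(t) = g*u(t-1) + e(t) with H = 1,
    # and the quadratic criterion (6) with Lambda = 1
    eps = y[1:] - g * u[:-1]
    return np.mean(eps ** 2)

# (5): minimize V_N over a grid of candidate parameter values
grid = np.linspace(0.0, 1.5, 1501)
g_hat = grid[int(np.argmin([V_N(g) for g in grid]))]
assert abs(g_hat - g0) < 0.05   # the estimate is close to the true g0
```

With this much data the grid minimizer essentially coincides with the least-squares estimate, so the recovered gain is close to the true value.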
2 Expressions for the Data Spectrum
The data spectrum plays an important role in the analysis, and we shall therefore collect some results on it here. We introduce the following spectrum:

    \Phi_u^r = \Phi_u - \Phi_{ue} \Lambda_0^{-1} \Phi_{eu}    (8)

where we suppress the argument $\omega$. This is the spectrum of that part of the input $u$ that cannot be estimated from $e$ by a linear, time-invariant filter. Similarly we introduce

    \Phi_e^r = \Lambda_0 - \Phi_{eu} \Phi_u^{-1} \Phi_{ue}    (9)

The data spectrum can now be written as

    \Phi_{\chi_0} = \begin{pmatrix} \Phi_u & \Phi_{ue} \\ \Phi_{eu} & \Lambda_0 \end{pmatrix}
    = \begin{pmatrix} I & \Phi_{ue} \Lambda_0^{-1} \\ 0 & I \end{pmatrix}
      \begin{pmatrix} \Phi_u^r & 0 \\ 0 & \Lambda_0 \end{pmatrix}
      \begin{pmatrix} I & 0 \\ \Lambda_0^{-1} \Phi_{eu} & I \end{pmatrix}    (10)

    = \begin{pmatrix} I & 0 \\ \Phi_{eu} \Phi_u^{-1} & I \end{pmatrix}
      \begin{pmatrix} \Phi_u & 0 \\ 0 & \Phi_e^r \end{pmatrix}
      \begin{pmatrix} I & \Phi_u^{-1} \Phi_{ue} \\ 0 & I \end{pmatrix}    (11)

From these factorizations we also find an expression for the inverse:

    \Phi_{\chi_0}^{-1} = \begin{pmatrix} (\Phi_u^r)^{-1} & -(\Phi_u^r)^{-1} \Phi_{ue} \Lambda_0^{-1} \\ -\Lambda_0^{-1} \Phi_{eu} (\Phi_u^r)^{-1} & (\Phi_e^r)^{-1} \end{pmatrix}    (13)

In case the regulator is linear and time-invariant, we have
    u(t) = r(t) - K(q) y(t)    (14)

where $K(q)$ is a linear regulator of appropriate dimensions and where the reference signal $\{r(t)\}$ is independent of $\{e(t)\}$. We then have the following expressions. Let $S_0$ and $S_0^i$ be the output and input sensitivity functions:

    S_0 = (I + G_0 K)^{-1}, \qquad S_0^i = (I + K G_0)^{-1}    (15)

Then

    \Phi_u = S_0^i \Phi_r (S_0^i)^* + K S_0 \Phi_v S_0^* K^*    (16)

where $\Phi_r$ is the spectrum of the reference signal and $\Phi_v = H_0 \Lambda_0 H_0^*$ is the noise spectrum. Superscript $*$ denotes complex conjugate transpose. We shall denote the two terms in (16)

    \Phi_u^r = S_0^i \Phi_r (S_0^i)^*    (17)

and

    \Phi_u^e = K S_0 \Phi_v S_0^* K^* = S_0^i K \Phi_v K^* (S_0^i)^*    (18)

The cross spectrum between $u$ and $e$ is

    \Phi_{ue} = -K S_0 H_0 \Lambda_0 = -S_0^i K H_0 \Lambda_0    (19)
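These relations are easy to sanity-check numerically. The sketch below evaluates (15)-(19) for an illustrative SISO loop at one frequency, verifies that the closed-loop expression (17) agrees with the definition (8), and checks the factorization (10) and the inverse formula (13). All numerical values are hypothetical.

```python
import numpy as np

# SISO closed loop at one fixed frequency z = e^{i*omega}; G0, K, H0 and
# the spectra below are illustrative choices, not from the text.
z = np.exp(0.7j)
G0 = 0.5 / (1 - 0.8 / z)          # true dynamics G0(e^{iw})
K = 0.4                           # constant-gain regulator
H0 = 1 / (1 - 0.3 / z)            # true noise model H0(e^{iw})
lam0, Phi_r = 0.5, 2.0            # noise variance and reference spectrum

S0 = 1 / (1 + G0 * K)             # sensitivity (15); S0^i = S0 in SISO
Phi_v = abs(H0) ** 2 * lam0       # noise spectrum

Phi_u_r = abs(S0) ** 2 * Phi_r            # (17)
Phi_u_e = abs(K * S0) ** 2 * Phi_v        # (18)
Phi_u = Phi_u_r + Phi_u_e                 # (16)
Phi_ue = -K * S0 * H0 * lam0              # (19)
Phi_eu = np.conj(Phi_ue)

# (17) agrees with definition (8): Phi_u - Phi_ue lam0^{-1} Phi_eu = Phi_u^r
assert np.isclose(Phi_u - Phi_ue * Phi_eu / lam0, Phi_u_r)

# Factorization (10) and the inverse expression (13)
Phi_chi0 = np.array([[Phi_u, Phi_ue], [Phi_eu, lam0]])
L = np.array([[1, Phi_ue / lam0], [0, 1]])
D = np.diag([Phi_u_r, lam0])
assert np.allclose(L @ D @ L.conj().T, Phi_chi0)

Phi_e_r = lam0 - Phi_eu * Phi_ue / Phi_u          # (9)
inv13 = np.array([[1 / Phi_u_r, -Phi_ue / (Phi_u_r * lam0)],
                  [-Phi_eu / (lam0 * Phi_u_r), 1 / Phi_e_r]])
assert np.allclose(inv13, np.linalg.inv(Phi_chi0))
```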
3 The Main Expression
From standard asymptotic theory we know that

    \hat\theta_N \to \arg\min_\theta \bar V(\theta) \quad \text{w.p. 1 as } N \to \infty    (20)

where

    \bar V(\theta) = \bar E V_N(\theta, Z^N)
    = \int_{-\pi}^{\pi} \mathrm{tr}\left\{ \Lambda^{-1} H^{-1} \begin{pmatrix} \Delta G & \Delta H \end{pmatrix} \Phi_{\chi_0} \begin{pmatrix} \Delta G & \Delta H \end{pmatrix}^* (H^{-1})^* \right\} d\omega    (21)

Here we have introduced the simplified notation

    \Delta G = G_0(e^{i\omega}) - G(e^{i\omega}, \theta), \qquad \Delta H = H_0(e^{i\omega}) - H(e^{i\omega}, \theta)

To see (21) we rewrite (7) using (1) as

    \varepsilon = H^{-1}(y - G u)
    = H^{-1}(G_0 - G) u + H^{-1} H_0 e
    = H^{-1}(G_0 - G) u + (H^{-1} H_0 - I) e + e
    = H^{-1} \bigl[ (G_0 - G) u + (H_0 - H) e \bigr] + e

We then use that $H_0$ and $H$ are both monic (so that the difference $\Delta H$ starts with a delay) and that $e(t)$ is independent of $(G_0 - G) u(t)$, which is the case if either the regulator or the model/system contains a delay. Parseval's relationship then gives (21).
4 Identifiability, Bias and the Indirect Method

4.1 Consistency and Identifiability
Identifiability essentially means that the estimate is consistent and will converge to the true system, when the system is contained in the model structure. This is a joint requirement on the model structure and on the experimental condition, i.e., the data spectrum $\Phi_{\chi_0}$.

The basic expression (20) with (21) shows that consistency and identifiability follow when no model in the structure gives a mismatch $(\Delta G \;\; \Delta H)$ lying in the null space of $\Phi_{\chi_0}$.

A sufficient condition for identifiability is thus that the data spectrum is positive definite (non-singular) almost everywhere. From (10) we see that this happens if and only if (we assume $\Lambda_0$ to be non-singular)

    \Phi_u^r(\omega) > 0 \quad \text{a.e. } \omega    (22)

In [14] this is termed that the experiment is informative enough. With the definition (8) this means that there should be a full rank part of the input $u$ that cannot be estimated from $e$ by a linear, time-invariant filter. It essentially means that the regulator should not be linear and time-invariant with no extra signals (disturbances or setpoints). A persistently exciting setpoint (reference signal), a non-linear regulator or a time-varying regulator will basically secure (22).

Constraining the model structure may however secure identifiability even when (22) does not hold. Suppose, for example, that the noise model $H$ is fixed to the true value $H_0$ (so that $\Delta H = 0$), and that the dynamics model structure $G(q,\theta)$ contains the true system $G_0$. Then we see directly from (21) that consistency follows as soon as $\Phi_u(\omega) > 0$ a.e. $\omega$. However, it is essential that the noise model is the true one; otherwise the limit of $G$ will still be unique, but biased.
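The failure of (22) under a purely linear, time-invariant loop can be seen numerically: with no reference signal ($r \equiv 0$), the whole input is generated from $e$, and the definition (8) gives $\Phi_u^r = 0$. A one-frequency check with hypothetical values:

```python
import numpy as np

# LTI regulator, no reference signal: u is generated entirely from e,
# so the "informative" part Phi_u^r of the input spectrum (8) vanishes.
# Illustrative SISO values at one frequency.
z = np.exp(1.1j)
G0 = 1 / (1 - 0.5 / z)
K = 0.8
H0 = 1 / (1 - 0.4 / z)
lam0 = 1.0

S0 = 1 / (1 + G0 * K)
Phi_v = abs(H0) ** 2 * lam0
Phi_u = abs(K * S0) ** 2 * Phi_v          # (16) with Phi_r = 0
Phi_ue = -K * S0 * H0 * lam0              # (19)

Phi_u_r = Phi_u - abs(Phi_ue) ** 2 / lam0  # definition (8)
assert np.isclose(Phi_u_r, 0.0)            # condition (22) fails
```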
4.2 Bias Distribution
To see bias issues and lack of identifiability more clearly, we can use the factorization (11) to rewrite the convergence expression as

    \hat\theta_N \to \arg\min_{\theta \in D_M} \int_{-\pi}^{\pi} \mathrm{tr}\left\{ \Lambda^{-1} H^{-1} \bigl[ (G_0 + B - G) \Phi_u (G_0 + B - G)^* + (H_0 - H) \Phi_e^r (H_0 - H)^* \bigr] (H^{-1})^* \right\} d\omega
    \quad \text{w.p. 1 as } N \to \infty    (23)

where

    B = (H_0 - H) \Phi_{eu} \Phi_u^{-1}    (24)

In the SISO, linear regulator case, we can characterize the "size" of $B$ as

    |B|^2 = \frac{\lambda_0 \Phi_u^e}{\Phi_u^2} \, |H_0 - H|^2    (25)

(see (16) and (19)). For a fixed noise model $H = H_*$, we see that the dynamics model will approximate the biased frequency function $G_0 + B$ in a frequency domain norm determined by $H_*^{-1}$ and $\Phi_u$ (the ratio $\Phi_u / |H_*|^2$ in the SISO case). We obtain an approximation of the correct dynamics in case $B = 0$, which means that the noise model is correct, $H_* = H_0$, or the system operates in open loop: $\Phi_{ue} = 0$.
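The scalar formula (25) follows from (19), since $|\Phi_{ue}|^2 = \lambda_0 \Phi_u^e$ in the SISO case. A quick numerical check of (24)-(25), with hypothetical values for the loop and a deliberately erroneous fixed noise model:

```python
import numpy as np

# Numerical check of (24)-(25) at one frequency in the SISO case, with
# illustrative (hypothetical) choices of G0, K, H0 and the fixed,
# erroneous noise model H_star.
z = np.exp(0.5j)
G0 = 1 / (1 - 0.7 / z)
K = 0.6
H0 = 1 / (1 - 0.2 / z)            # true noise model
H_star = 1.0                      # fixed, erroneous noise model
lam0, Phi_r = 1.0, 1.0

S0 = 1 / (1 + G0 * K)             # S0 = S0^i in SISO
Phi_v = abs(H0) ** 2 * lam0
Phi_u_e = abs(K * S0) ** 2 * Phi_v
Phi_u = abs(S0) ** 2 * Phi_r + Phi_u_e    # (16)
Phi_eu = np.conj(-K * S0 * H0 * lam0)     # (19)

B = (H0 - H_star) * Phi_eu / Phi_u        # bias term (24)

# (25): |B|^2 = (lam0 * Phi_u^e / Phi_u^2) |H0 - H_star|^2
assert np.isclose(abs(B) ** 2,
                  lam0 * Phi_u_e / Phi_u ** 2 * abs(H0 - H_star) ** 2)
```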
4.3 The Indirect Method
We saw above that the noise model has to be correct in order to ensure consistency of the dynamics part. Let us now ask the question: Suppose the dynamics model structure $G(q,\theta)$ contains the true dynamics $G_0$, and that we are not interested in the noise characteristics. Is it then possible to obtain a consistent estimate of $G$?

To answer that question, we rewrite the basic expression (21) using the alternative factorization (10):

    \bar V(\theta) = \int_{-\pi}^{\pi} \mathrm{tr}\left\{ \Lambda^{-1} H^{-1} \Delta G \, \Phi_u^r \, \Delta G^* H^{-*} \right\} d\omega
    + \int_{-\pi}^{\pi} \mathrm{tr}\left\{ \Lambda^{-1} H^{-1} \bigl( \Delta G \, \Phi_{ue} \Lambda_0^{-1} + \Delta H \bigr) \Lambda_0 \bigl( \Delta G \, \Phi_{ue} \Lambda_0^{-1} + \Delta H \bigr)^* H^{-*} \right\} d\omega    (26)

To assure consistency even if the noise model is not correct, we should make the second term (the second integral) independent of $G(q,\theta)$. We then expand the factor in the integral as follows (using the expression (19) for $\Phi_{ue}$):

    H^{-1} \bigl( \Delta G \, \Phi_{ue} \Lambda_0^{-1} + \Delta H \bigr)
    = H^{-1} \bigl( G K S_0 H_0 - G_0 K S_0 H_0 + H_0 \bigr) - I
    = H^{-1} \bigl( G K - G_0 K + I + G_0 K \bigr) S_0 H_0 - I
    = H^{-1} (I + G K) S_0 H_0 - I
    = \tilde H^{-1} S_0 H_0 - I

where we introduced the noise model parameterization

    H(q, \theta, \eta) = \bigl( I + G(q,\theta) K(q) \bigr) \tilde H(q, \eta) = S_\theta^{-1} \tilde H_\eta    (27)

where $\eta$ is a parameterization independent of $\theta$. Here $S_\theta$ is the model sensitivity function, compare (15). With (27) we have thus achieved that the second integral of (26) is independent of the parameterization of $G(q,\theta)$. The first integral will thus determine to what the dynamics model converges: If $\tilde H$ is a fixed model we have

    \hat\theta_N \to \arg\min_{\theta \in D_M} \int_{-\pi}^{\pi} \mathrm{tr}\left\{ \Lambda^{-1} \tilde H^{-1} (S_\theta \Delta G) \, \Phi_u^r \, (S_\theta \Delta G)^* \tilde H^{-*} \right\} d\omega    (28)

With (17) we find that

    (S_\theta \Delta G) \, \Phi_u^r \, (S_\theta \Delta G)^* = (S_\theta \Delta G \, S_0^i) \, \Phi_r \, (S_\theta \Delta G \, S_0^i)^*    (29)

with

    S_0 G_0 - S_\theta G_\theta = G_0 S_0^i - S_\theta G_\theta = S_\theta \bigl( (I + G_\theta K) G_0 - G_\theta (I + K G_0) \bigr) S_0^i = S_\theta \, \Delta G \, S_0^i

This shows that the noise model parameterization (27) will fit the closed loop model $S_\theta G_\theta$ to the closed loop system $S_0 G_0$ in a norm that is determined by the reference signal spectrum $\Phi_r$ and the fixed noise model $\tilde H$.

In fact, the parameterization (27) corresponds to a well known method for dealing with closed loop identification data: The predictor for

    y(t) = G(q,\theta) u(t) + H(q,\theta) e(t)    (30)

is

    \hat y(t|\theta) = H^{-1}(q,\theta) G(q,\theta) u(t) + \bigl( I - H^{-1}(q,\theta) \bigr) y(t)    (31)

Using $u = r - K y$ and inserting (27) we get

    \hat y(t|\theta) = \tilde H^{-1}(q) \bigl( I + G(q,\theta) K(q) \bigr)^{-1} G(q,\theta) r(t) + \bigl( I - \tilde H^{-1}(q) \bigr) y(t)    (32)

But this is exactly the predictor also for the closed-loop model structure

    y(t) = \bigl( I + G(q,\theta) K(q) \bigr)^{-1} G(q,\theta) r(t) + \tilde H(q) e(t)    (33)

Identifying the closed loop, and then solving for the open loop dynamics, is called the indirect method. We have here derived this approach as the answer to the question of how to obtain consistent dynamics models, without dealing with a noise model. Of course, in the open loop case ($K = 0$), (27) tells us that this is achieved by letting the noise model be parameterized independently of the dynamics.
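The recovery step of the indirect method, solving the identified closed-loop transfer function for the open-loop dynamics, and the identity used for (29) can both be checked numerically. The SISO sketch below uses illustrative choices of $G_0$, $K$ and a model $G_\theta$, not taken from the text.

```python
import numpy as np

# The indirect method identifies the closed-loop transfer function
# Gc = G/(1 + G K) from r to y, cf. (33), and then solves for the open-loop
# dynamics; in the SISO case G = Gc/(1 - K Gc). Algebraic check on a
# frequency grid with hypothetical G0, K and model G_theta.
w = np.linspace(0.1, 3.0, 50)
z = np.exp(1j * w)
G0 = 0.5 / (1 - 0.8 / z)
K = 0.4

Gc = G0 / (1 + G0 * K)            # closed-loop system S0 G0
G_rec = Gc / (1 - K * Gc)         # solve the closed-loop relation for G
assert np.allclose(G_rec, G0)

# The identity S0 G0 - S_theta G_theta = S_theta (G0 - G_theta) S0^i used
# for (29) (with S0^i = S0 in the SISO case):
G_th = 0.3 / (1 - 0.6 / z)
S0 = 1 / (1 + G0 * K)
S_th = 1 / (1 + G_th * K)
assert np.allclose(S0 * G0 - S_th * G_th, S_th * (G0 - G_th) * S0)
```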
5 Asymptotic Variance
5.1 Parameter and Transfer Function Covariance
The classical result on the asymptotic distribution of the parameter estimates (cf. [14], Chapter 9) is as follows. Suppose that the true system is contained in the model structure, and that the experimental conditions are such that identifiability is secured. Let the true parameters be denoted by $\theta_0$. In the multi-output case we also assume that $\Lambda = \Lambda_0$. Then $\sqrt{N}(\hat\theta_N - \theta_0)$ converges in distribution to the normal distribution with zero mean and covariance matrix

    P_\theta = \bigl[ \bar V''(\theta_0) \bigr]^{-1}    (34)

where $\bar V$ is defined by (21) and prime denotes differentiation with respect to $\theta$.

Let us from now on concentrate on the SISO case for notational simplicity and denote

    T(q,\theta) = \begin{pmatrix} G(q,\theta) & H(q,\theta) \end{pmatrix}, \qquad \hat T_N = \begin{pmatrix} \hat G_N & \hat H_N \end{pmatrix}    (35)

Then, using (21) we can write

    P_\theta = \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{\Phi_v} \, T_0'(e^{i\omega}) \, \Phi_{\chi_0}(\omega) \, \bigl( T_0'(e^{i\omega}) \bigr)^* d\omega \right]^{-1}    (36)

Here $\Phi_v = |H_0|^2 \lambda_0$. From this expression and the factorizations (10) and (11), explicit expressions can be found for how $\Phi_r$ and $K$ affect the parameter accuracy.
If we are interested in the covariance of $\hat T_N$, rather than that of the parameters, Gauss' approximation formula gives the expression

    N \, \mathrm{Cov}\, \hat T_N(e^{i\omega}) \approx \bigl( T_0'(e^{i\omega}) \bigr)^T \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{\Phi_v(\xi)} \, T_0'(e^{i\xi}) \begin{pmatrix} \Phi_u(\xi) & \Phi_{ue}(\xi) \\ \Phi_{eu}(\xi) & \lambda_0 \end{pmatrix} \bigl( T_0'(e^{-i\xi}) \bigr)^T d\xi \right]^{-1} T_0'(e^{-i\omega})    (37)
5.2 Asymptotic Black Box Expressions
The expression (37) shows an intriguing symmetry, with the factors $T_0'$ in "cancelling positions". In fact, suppose that the parameterization of $T$ has the following shift structure:

    \theta = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}, \qquad \frac{d}{d\theta_k} T(q,\theta) = q^{-k+1} \frac{d}{d\theta_1} T(q,\theta)

which is satisfied by many black-box model parameterizations. Then, as $n$ tends to infinity, we have (see [14], Chapter 9):

    \mathrm{Cov} \begin{pmatrix} \hat G_N(e^{i\omega}) \\ \hat H_N(e^{i\omega}) \end{pmatrix} \approx \frac{n}{N} \, \Phi_v(\omega) \begin{pmatrix} \Phi_u(\omega) & \Phi_{eu}(\omega) \\ \Phi_{ue}(\omega) & \lambda_0 \end{pmatrix}^{-1}    (38)

Here $n$ is the "model order" and $N$ is the number of data. From (13) we then find that

    \mathrm{Cov}\, \hat G_N(e^{i\omega}) \approx \frac{n}{N} \, \frac{\Phi_v(\omega)}{\Phi_u^r(\omega)}    (39)
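The step from (38) to (39) is just the (1,1) element of the 2x2 inverse, cf. (13). A numerical check with hypothetical spectra:

```python
import numpy as np

# Check that the G-entry of the inverse in (38) equals 1/Phi_u^r, giving
# the variance expression (39). Illustrative SISO numbers at one frequency.
Phi_u, lam0 = 2.0, 0.5
Phi_ue = 0.3 - 0.4j
Phi_eu = np.conj(Phi_ue)

M = np.array([[Phi_u, Phi_eu],
              [Phi_ue, lam0]])
Phi_u_r = Phi_u - abs(Phi_ue) ** 2 / lam0      # definition (8), SISO case

n, N, Phi_v = 10, 1000, 0.8                    # model order, data, noise spectrum
cov_G = n / N * Phi_v * np.linalg.inv(M)[0, 0]  # (38), G-entry
assert np.isclose(cov_G, n / N * Phi_v / Phi_u_r)  # agrees with (39)
```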
6 Optimal Experiment Design
In this section we will consider experiment design problems where the goal is to minimize a weighted norm of the covariance of $\hat G$:

    J = \int_{-\pi}^{\pi} \mathrm{Cov}\, \hat G_N(e^{i\omega}) \, C(\omega) \, d\omega    (40)

The minimization shall be carried out with respect to the experiment design variables, which we take to be $K$ (the regulator) and $\Phi_r$ (the reference signal spectrum). Other equivalent choices are also possible, e.g., $\Phi_u$ and $\Phi_{ue}$, or $K$ and $\Phi_u^r$. To make the designs realistic we will also impose constraints on the input power or the output power, or both.

Consider the problem of minimizing $J$ given by (40), using the asymptotic expression (39). The minimization is to be carried out under the constraint

    \int_{-\pi}^{\pi} \bigl\{ \alpha \Phi_u + (1 - \alpha) \Phi_y \bigr\} d\omega \le 1, \qquad \alpha \in [0, 1]    (41)

with respect to the design variables $K$ and $\Phi_r$. The solution is to select the regulator $u = -K y$ that solves the standard LQG problem

    K_{\mathrm{opt}} = \arg\min_K E\bigl[ \alpha u^2 + (1 - \alpha) y^2 \bigr], \qquad y = G_0 u + H_0 e    (42)

The reference signal spectrum shall be chosen as

    \Phi_r^{\mathrm{opt}}(\omega) = \mu \sqrt{\Phi_v(\omega) C(\omega)} \; \frac{|1 + G_0(e^{i\omega}) K_{\mathrm{opt}}(e^{i\omega})|^2}{\sqrt{\alpha + (1 - \alpha) |G_0(e^{i\omega})|^2}}    (43)

where $\mu$ is a constant, adjusted so that

    \int_{-\pi}^{\pi} \bigl\{ \alpha \Phi_u + (1 - \alpha) \Phi_y \bigr\} d\omega = 1    (44)

This result can be proved as follows:
Proof. Replace the design variables $K$ and $\Phi_r$ by the equivalent pair $K$ and $\Phi_u^r$. Then, by using the expressions for the input and output spectra in terms of $K$ and $\Phi_u^r$, we can rewrite the problem as

    \min_{K, \Phi_u^r} \int_{-\pi}^{\pi} \frac{\Phi_v}{\Phi_u^r} C \, d\omega

under the constraint

    \int_{-\pi}^{\pi} \left\{ \bigl( \alpha + (1 - \alpha) |G_0|^2 \bigr) \Phi_u^r + \frac{\alpha |K|^2 + (1 - \alpha)}{|1 + G_0 K|^2} \Phi_v \right\} d\omega \le 1, \qquad \alpha \in [0, 1]    (45)

The criterion function is independent of $K$, hence the optimal controller $K_{\mathrm{opt}}$ can be found by solving the LQ problem

    \min_K \int_{-\pi}^{\pi} \frac{\alpha |K|^2 + (1 - \alpha)}{|1 + G_0 K|^2} \Phi_v \, d\omega    (46)

(Here it is implicitly assumed that $y(t) = G_0(q) u(t) + v(t)$, $u(t) = -K(q) y(t)$, and $\alpha \in [0,1]$.) This proves (42). Define the constant $\beta$ as

    \beta = 1 - \int_{-\pi}^{\pi} \frac{\alpha |K_{\mathrm{opt}}|^2 + (1 - \alpha)}{|1 + G_0 K_{\mathrm{opt}}|^2} \Phi_v \, d\omega    (47)

Problem (45) now reads

    \min_{\Phi_u^r} \left\{ \int_{-\pi}^{\pi} \frac{\Phi_v}{\Phi_u^r} C \, d\omega \; : \; \int_{-\pi}^{\pi} \bigl( \alpha + (1 - \alpha) |G_0|^2 \bigr) \Phi_u^r \, d\omega \le \beta \right\}    (48)

This problem has the solution (cf. [14], p. 376)

    \Phi_u^r = \mu \sqrt{ \frac{\Phi_v C}{\alpha + (1 - \alpha) |G_0|^2} }    (49)

where $\mu$ is a constant, adjusted so that

    \int_{-\pi}^{\pi} \bigl( \alpha + (1 - \alpha) |G_0|^2 \bigr) \Phi_u^r \, d\omega = \beta    (50)

or, in other words, so that

    \int_{-\pi}^{\pi} \bigl\{ \alpha \Phi_u + (1 - \alpha) \Phi_y \bigr\} d\omega = 1    (51)

Consequently the optimal $\Phi_r$ is

    \Phi_r^{\mathrm{opt}} = \mu \sqrt{\Phi_v C} \; \frac{|1 + G_0 K_{\mathrm{opt}}|^2}{\sqrt{\alpha + (1 - \alpha) |G_0|^2}}

which is (43).