Inconsistency of an Approximate Prediction Error Method for Wiener Model Identification



Anna Hagenblad

Department of Electrical Engineering

Linköping University, SE-581 83 Linköping, Sweden

WWW: http://www.control.isy.liu.se

Email: annah@isy.liu.se

August 17, 2000

REGLERTEKNIK

AUTOMATIC CONTROL

LINKÖPING

Report no.: LiTH-ISY-R-2275

Technical reports from the Automatic Control group in Linköping are available by anonymous ftp at the address ftp.control.isy.liu.se. This report is contained in the file 2275.pdf.


Abstract

A Wiener model consists of a linear dynamic block followed by a nonlinear static block. When identifying the parameters of such a system, the Prediction Error Method (PEM) can be used. Depending on how noise enters the system, the predictor can be difficult to express, and an approximate predictor may be of interest. The estimate obtained from this approximate predictor is, however, not always consistent. In this report we investigate this inconsistency.

1 Introduction

The Wiener model consists of a linear dynamic system, which we shall denote G, followed by a static nonlinearity, here denoted f. It is depicted in Figure 1.

[Figure 1: block diagram. The input u(t) enters G(q, θ); the process noise v(t) is added to its output to give x(t), which passes through f(·, η); the measurement noise e(t) is then added to give y(t).]

Figure 1: The Wiener model. G is a linear dynamic system, f is a static nonlinearity. u(t) is the input signal, y(t) the output signal, and the unmeasurable intermediate signal is given by x(t). v(t) denotes process noise, and e(t) measurement noise.

We will in this paper assume that the input signal u(t) is known, and that the output y(t) is measured, possibly with noise. The intermediate signal x(t) cannot be measured. In Figure 1, v(t) denotes process noise, and e(t) measurement noise. The model class consists of parametric models in discrete time. We assume that the linear and the nonlinear subsystems are independently parameterized: the linear system as G(q, θ), where q denotes the time shift operator and θ the parameters, and the nonlinear system as f(·, η), where the dot denotes the input to the nonlinear system and η the parameters. The output of the Wiener model is

$$y(t) = f\big(x(t), \eta\big) + e(t) = f\big(G(q,\theta)u(t) + v(t), \eta\big) + e(t). \tag{1}$$
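Equation (1) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the function name `simulate_wiener`, the first-order choice of G and the quadratic nonlinearity in the usage lines are illustrative and not taken from the report.

```python
import random


def simulate_wiener(u, b, a, f, v=None, e=None):
    """Simulate y(t) = f(G(q)u(t) + v(t)) + e(t).

    Assumed (hypothetical) linear block: G(q) = b q^-1 / (1 - a q^-1),
    i.e. x0(t) = a * x0(t-1) + b * u(t-1), with zero initial state.
    """
    n = len(u)
    v = v if v is not None else [0.0] * n   # process noise (default: none)
    e = e if e is not None else [0.0] * n   # measurement noise (default: none)
    y, x0, prev_u = [], 0.0, 0.0
    for t in range(n):
        x0 = a * x0 + b * prev_u    # noise-free linear output G(q)u(t)
        x = x0 + v[t]               # process noise enters before f
        y.append(f(x) + e[t])       # measurement noise enters after f
        prev_u = u[t]
    return y


# Usage: white input, both noise sources active, quadratic nonlinearity.
rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(100)]
v = [rng.gauss(0.0, 0.1) for _ in range(100)]
e = [rng.gauss(0.0, 0.1) for _ in range(100)]
y = simulate_wiener(u, b=1.0, a=0.5, f=lambda x: x ** 2, v=v, e=e)
```

Only `u` and `y` would be available to an identification method; `x`, `v` and `e` are internal to the simulation.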


A Wiener model can be used to describe a linear system with a nonlinear measurement device. Zhu (1999a) uses a Wiener model to identify a distillation column, and Kalafatis et al. (1995) cite several biological examples.

Several approaches to the identification of Wiener models can be found in the literature. Some references are Zhu (1999b), Kalafatis et al. (1995), Bruls et al. (1997), Wigren (1993), Hunter and Korenberg (1986) and Billings and Fakhouri (1982). Most of these deal with the special case that either v(t) or e(t) is zero. In this paper, we focus on the Prediction Error Method, which is also discussed in Hagenblad (1999).

2 The Prediction Error Method

The idea behind the Prediction Error Method (Ljung, 1999) in identification is to measure the quality of the estimated model in terms of how well it predicts future output. For a model parameterized with the parameter vector Θ (in our case, Θ = (θ, η)), the predictor of the output y is defined as

$$\hat{y}(t, \Theta) = E\big[\, y(t) \mid Z^{t-1}, \Theta \,\big], \tag{2}$$

where $E$ denotes expectation and $Z^{t-1}$ the set of old inputs and outputs, $Z^{t-1} = \{u(1), y(1), u(2), y(2), \ldots, u(t-1), y(t-1)\}$.

We compare the prediction with the measured value, and write the prediction error criterion as

$$V_N(\Theta) = \frac{1}{N} \sum_{t=1}^{N} \big( y(t) - \hat{y}(t, \Theta) \big)^2. \tag{3}$$

Norms other than the mean square error can also be used (Ljung, 1999). The prediction error estimate is the one that minimizes the criterion (3), i.e.,

$$\hat{\Theta} = \arg\min_{\Theta} V_N(\Theta). \tag{4}$$

Minimization of the prediction error criterion (3) poses (at least) two problems: how to find the predictor, and how to minimize the criterion. In this report we will focus on the first one. For more discussion on the minimization of the criterion, numerical methods, and problems related to that, see Hagenblad (1999).

2.1 An Approximation of the Predictor

Assume in the Wiener model described in Equation (1) that the process noise v(t) is zero, and that the measurement noise e(t) is white with zero mean. It is then easy to specify the predictor from Equation (2) as

$$\hat{y}_a(t, \Theta) = E\big[\, y(t) \mid Z^{t-1}, \Theta \,\big] = f\big(G(q,\theta)u(t), \eta\big). \tag{5}$$

If we have non-zero process noise, the predictor is much harder to calculate, but it can be approximated by an extended Kalman filter (cf. Hagenblad, 1999). This may however be more complicated, and it may seem reasonable to use Equation (5) as an approximation instead.
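The criterion (3) evaluated with the approximate predictor (5) can be sketched as follows. The helper names `approximate_criterion` and `grid_argmin`, the static gain used for G and the cubic nonlinearity are illustrative assumptions, not taken from the report; in practice a numerical optimizer would replace the grid search.

```python
def approximate_criterion(y, u, G, f):
    """V_N = (1/N) * sum_t (y(t) - f(G u(t)))^2 with the noise-free predictor (5)."""
    x_hat = G(u)                      # G(q, theta) u(t); process noise is ignored
    y_hat = [f(x) for x in x_hat]     # approximate predictor y_a(t, Theta)
    return sum((yt - yh) ** 2 for yt, yh in zip(y, y_hat)) / len(y)


def grid_argmin(criterion, candidates):
    """Return the candidate parameter with the smallest criterion value."""
    return min(candidates, key=criterion)


# Usage: noise-free data from a static gain theta0 = 2.0 followed by a cube.
u = [0.5, -1.0, 2.0, 1.5]
y = [(2.0 * ut) ** 3 for ut in u]


def criterion(theta):
    # Hypothetical model structure: G = static gain theta, f = cube.
    return approximate_criterion(
        y, u, G=lambda s: [theta * x for x in s], f=lambda x: x ** 3
    )


best = grid_argmin(criterion, [1.0, 1.5, 2.0, 2.5])
```

With noise-free data the predictor (5) is exact, so the criterion is zero at the true gain and the grid search returns it.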


3 Consistency

If the measured data are generated from the assumed model (in this case a Wiener model) for a particular parameter vector (θ0, η0), will the identification method yield these true parameters, at least when the number of data, N, tends to infinity? This is the question of consistency. Under general assumptions, it can be shown that the prediction error method is consistent (Ljung, 1978). This report investigates the consistency of the estimate obtained with the approximate predictor defined in Equation (5).

The approximate prediction error criterion, using the approximate predictor, is

$$V_N(\theta, \eta) = \frac{1}{N} \sum_{t=1}^{N} \big( y(t) - \hat{y}_a(t, \theta, \eta) \big)^2 = \frac{1}{N} \sum_{t=1}^{N} \Big( f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) + e(t) - f\big(G(q,\theta)u(t), \eta\big) \Big)^2. \tag{6}$$

The consistency question can be stated as follows:

Given the true parameters $(\theta_0, \eta_0)$, do there exist parameters $(\theta_1, \eta_1)$ such that $V_N(\theta_1, \eta_1) < V_N(\theta_0, \eta_0)$ when $N \to \infty$?

If the answer is no, the estimate obtained from the approximate prediction error method is consistent. We disregard here the cases when there are parameter vectors (θ1, η1) that give the same criterion value as (θ0, η0). In these cases, the prediction error estimate is not unique. Depending on the situation, one may be content with any estimate that gives the same output as the true one, or one may pose additional constraints to enforce uniqueness of the estimate. In the following, “inconsistent estimates” refers to the parameter values that give a strictly lower criterion value.

We will denote $\lim_{N\to\infty} \frac{1}{N} \sum_{t=1}^{N}$ by $\bar{E}$. If a stochastic variable $w(t)$ is ergodic, $\bar{E}w(t)$ coincides with $Ew(t)$. We assume that all stochastic signals are ergodic.

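Letting N tend to infinity, so that the time average becomes $\bar{E}$, we can then expand Equation (6) as follows:

$$\begin{aligned} V(\theta, \eta) ={}& \bar{E} f^2\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) + \bar{E} e^2(t) + \bar{E} f^2\big(G(q,\theta)u(t), \eta\big) \\ &+ 2\bar{E} f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) e(t) - 2\bar{E} e(t) f\big(G(q,\theta)u(t), \eta\big) \\ &- 2\bar{E} f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) f\big(G(q,\theta)u(t), \eta\big). \end{aligned} \tag{7}$$

Assuming that e(t) has zero mean and is independent of the process noise v(t), this simplifies to

$$\begin{aligned} V(\theta, \eta) ={}& \bar{E} f^2\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) + \bar{E} e^2(t) + \bar{E} f^2\big(G(q,\theta)u(t), \eta\big) \\ &- 2\bar{E} f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) f\big(G(q,\theta)u(t), \eta\big). \end{aligned} \tag{8}$$

We may disregard the term $\bar{E} e^2(t)$ since it is independent of the parameters. An equivalent error criterion is then

$$\tilde{V}(\theta, \eta) = \bar{E} \Big( f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) - f\big(G(q,\theta)u(t), \eta\big) \Big)^2, \tag{9}$$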


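and the estimate is inconsistent if we can find parameters $(\theta_1, \eta_1)$ such that

$$\tilde{V}(\theta_1, \eta_1) < \tilde{V}(\theta_0, \eta_0), \tag{10}$$

or, expanding,

$$\bar{E} \Big( f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) - f\big(G(q,\theta_1)u(t), \eta_1\big) \Big)^2 < \bar{E} \Big( f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big)^2. \tag{11}$$

From this criterion it can be seen that the consistency depends on the following: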

• The true system (the parameters θ0 and η0).

• The distribution (the properties) of the process noise v(t).

• The input signal u(t).

We can also see that if there is no process noise, the right-hand side of Equation (11) is zero. Since the left-hand side is the mean of a square, and thus always greater than or equal to zero, there are no parameters θ1 and η1 that give a lower value of the criterion than the true θ0 and η0. The estimate is thus consistent. (This is of course not surprising, since the approximate predictor then coincides with the true one.)

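Equation (11) is equivalent to

$$\begin{aligned} \bar{E} \Big[ &\Big( f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) + f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) - f\big(G(q,\theta_1)u(t), \eta_1\big) \Big) \\ \cdot &\Big( f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) - f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) + f\big(G(q,\theta_1)u(t), \eta_1\big) \Big) \Big] > 0, \end{aligned} \tag{12}$$

which can be simplified to

$$\bar{E} \Big[ \Big( 2 f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) - f\big(G(q,\theta_1)u(t), \eta_1\big) \Big) \cdot \Big( f\big(G(q,\theta_1)u(t), \eta_1\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big) \Big] > 0. \tag{13}$$

Now, using Taylor expansion,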

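$$f\big(G(q,\theta_0)u(t) + v(t), \eta_0\big) = f\big(G(q,\theta_0)u(t), \eta_0\big) + O\big(v(t)\big), \tag{14}$$

where $O(\cdot)$ denotes big ordo, and entering that into Equation (13), we have

$$\bar{E} \Big[ \Big( f\big(G(q,\theta_0)u(t), \eta_0\big) - f\big(G(q,\theta_1)u(t), \eta_1\big) + O\big(v(t)\big) \Big) \cdot \Big( f\big(G(q,\theta_1)u(t), \eta_1\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big) \Big] > 0, \tag{15}$$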


or equivalently,

$$\bar{E} \Big[ \Big( f\big(G(q,\theta_1)u(t), \eta_1\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big) O\big(v(t)\big) \Big] > \bar{E} \Big( f\big(G(q,\theta_1)u(t), \eta_1\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big)^2. \tag{16}$$

For a fixed $(\theta_1, \eta_1)$ and small noise levels, the ordo term tends to zero, so the inequality is false (meaning that the true parameters $(\theta_0, \eta_0)$ give a smaller criterion value than $(\theta_1, \eta_1)$), since the right-hand side is always greater than or equal to zero. To get an inconsistent estimate, the effect of the noise must be large enough, and also have the correct sign.

To get further insight into the consistency conditions, we use a higher order Taylor expansion. The inconsistency condition is then

$$\bar{E} \Big[ \Big( f\big(G(q,\theta_1)u(t), \eta_1\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big) \cdot \Big( 2 f'\big(G(q,\theta_0)u(t), \eta_0\big) v(t) + f''\big(\xi_{\theta_0,\eta_0,u,v}(t)\big) v^2(t) \Big) \Big] > \bar{E} \Big( f\big(G(q,\theta_1)u(t), \eta_1\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big)^2, \tag{17}$$

where $\xi(t)$ is a number between $G(q,\theta_0)u(t)$ and $G(q,\theta_0)u(t) + v(t)$. Without loss of generality, the process noise may be assumed to have zero mean. We also assume that it is independent of the input. The criterion is then

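$$\bar{E} \Big[ \Big( f\big(G(q,\theta_1)u(t), \eta_1\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big) f''\big(\xi_{\theta_0,\eta_0,u,v}(t)\big) v^2(t) \Big] > \bar{E} \Big( f\big(G(q,\theta_1)u(t), \eta_1\big) - f\big(G(q,\theta_0)u(t), \eta_0\big) \Big)^2. \tag{18}$$

To get an inconsistent estimate, the noise variance must be large enough, and the second derivative of the nonlinearity must have the appropriate sign. Recall that $\bar{E} = \lim_{N\to\infty} \frac{1}{N} \sum_{t=1}^{N}$, so the statements above should be interpreted in a mean sense.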
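As a quick check of condition (18), consider the special case where the nonlinearity is affine in its input, $f(x, \eta) = \eta_1 x + \eta_2$. This case is an illustrative assumption and is not treated in the report. The second derivative then vanishes, and the condition can never hold:

```latex
\begin{align*}
f''\big(\xi(t)\big) &= 0 \quad \text{for all } \xi(t), \\
\text{left-hand side of (18)} &=
  \bar{E}\Big[\big(f(G(q,\theta_1)u(t),\eta_1) - f(G(q,\theta_0)u(t),\eta_0)\big)
  \cdot 0 \cdot v^2(t)\Big] = 0, \\
\text{so (18) would require}\quad
  0 &> \bar{E}\big(f(G(q,\theta_1)u(t),\eta_1) - f(G(q,\theta_0)u(t),\eta_0)\big)^2 \;\ge\; 0 .
\end{align*}
```

No parameter pair can therefore give a strictly lower criterion value in the affine case, which agrees with the role that (18) assigns to the second derivative of f.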

4 An Example

We will examine the following simple example in more detail:

$$x(t) = u(t) + v(t) \tag{19}$$

$$y(t) = a x^2(t) \tag{20}$$

For the true system, $a = a_0$. According to the inconsistency criterion (13), we want to know if there exists a parameter value $a_1$ such that

$$\bar{E} \Big[ \Big( 2a_0 \big(u(t) + v(t)\big)^2 - a_0 u^2(t) - a_1 u^2(t) \Big) \Big( a_1 u^2(t) - a_0 u^2(t) \Big) \Big] > 0, \tag{21}$$

or equivalently

$$\bar{E} \Big[ \Big( (a_0 - a_1) u^2(t) + 4a_0 u(t)v(t) + 2a_0 v^2(t) \Big) (a_1 - a_0) u^2(t) \Big] > 0. \tag{22}$$


Assuming, as above, that u and v are independent and that v has zero mean, this is equivalent to

$$2a_0 (a_1 - a_0)\, \bar{E} v^2(t)\, \bar{E} u^2(t) > (a_1 - a_0)^2\, \bar{E} u^4(t). \tag{23}$$

This implies different conditions for inconsistency (cf. above):

• $a_0 (a_1 - a_0) > 0$.

• The noise variance $\bar{E} v^2(t)$ must be large enough.

• Or equivalently, the input signal must be small enough.
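The step from (22) to (23) can be written out term by term. The sketch below assumes that u and v are independent, so that the mixed moments factor, and that v has zero mean; the time argument is dropped for brevity.

```latex
\begin{align*}
&\bar{E}\Big[\big((a_0-a_1)u^2 + 4a_0 uv + 2a_0 v^2\big)(a_1-a_0)u^2\Big] \\
&\quad = -(a_1-a_0)^2\,\bar{E}u^4
        + 4a_0(a_1-a_0)\,\bar{E}u^3\,\bar{E}v
        + 2a_0(a_1-a_0)\,\bar{E}u^2\,\bar{E}v^2 \\
&\quad = 2a_0(a_1-a_0)\,\bar{E}v^2\,\bar{E}u^2 - (a_1-a_0)^2\,\bar{E}u^4 ,
\end{align*}
```

where the middle term vanishes because $\bar{E}v = 0$. Requiring the last expression to be positive gives (23).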

We may regard this criterion from two different viewpoints, and get two slightly different results. Assume in both cases a fixed input signal u, and that $a_0 > 0$.

1. Consider a fixed parameter value $a_1$ such that $a_1 > a_0$. If the noise variance is large,

$$\bar{E} v^2(t) > \frac{(a_1 - a_0)\, \bar{E} u^4(t)}{2a_0\, \bar{E} u^2(t)},$$

the inequality in Equation (23) holds, so $a_1$ gives a lower (approximate) prediction error criterion value than the true parameter $a_0$. On the other hand, if the noise is small enough, the true parameter will give a lower criterion value (as desired).

2. Consider instead a fixed noise level, $\bar{E} v^2(t) = \sigma_v^2$. Equation (23) may then be transformed into the condition

$$a_0 < a_1 < a_0 + \frac{2a_0 \sigma_v^2\, \bar{E} u^2(t)}{\bar{E} u^4(t)}. \tag{24}$$

Values $a_1 < a_0$ are ruled out here, since $a_0 > 0$ and (23) requires $a_0 (a_1 - a_0) > 0$. For any nonzero noise variance it is always possible to find such an $a_1$. The pessimistic conclusion is that no matter how low the noise level is, there is always some parameter value that gives a lower criterion value than the true parameter. However, this parameter value will in general be close to the true parameter, and the smaller the noise variance $\sigma_v^2$ is, the closer it will be. This need thus not have any practical importance.

Similar results hold for $a_0 < 0$, in which case the interval in (24) becomes $a_0 + 2a_0 \sigma_v^2\, \bar{E} u^2(t) / \bar{E} u^4(t) < a_1 < a_0$.
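The example can be checked numerically. The sketch below is an illustration under stated assumptions: the uniform input, the Gaussian process noise, the sample size and the helper names are arbitrary choices, not taken from the report. It also uses the closed-form limit a0 (1 + σv² Ēu² / Ēu⁴) for the minimizer, which follows from setting the derivative of (9) to zero for this example but is not stated in the report.

```python
import random


def approx_pem_estimate(u, y):
    """Minimize (1/N) * sum (y - a*u^2)^2 over a (linear least squares in a)."""
    num = sum(yt * ut ** 2 for ut, yt in zip(u, y))
    den = sum(ut ** 4 for ut in u)
    return num / den


def criterion(a, u, y):
    """Approximate prediction error criterion for the model y_hat = a*u^2."""
    return sum((yt - a * ut ** 2) ** 2 for ut, yt in zip(u, y)) / len(u)


rng = random.Random(0)
N, a0, sigma_v = 20000, 1.0, 0.3
u = [rng.uniform(-1.0, 1.0) for _ in range(N)]   # assumed input signal
v = [rng.gauss(0.0, sigma_v) for _ in range(N)]  # assumed process noise
y = [a0 * (ut + vt) ** 2 for ut, vt in zip(u, v)]  # true system (19)-(20)

a_hat = approx_pem_estimate(u, y)
m2 = sum(ut ** 2 for ut in u) / N                # sample estimate of E u^2
m4 = sum(ut ** 4 for ut in u) / N                # sample estimate of E u^4
a_limit = a0 * (1.0 + sigma_v ** 2 * m2 / m4)    # derived limit (assumption)
upper = a0 + 2.0 * a0 * sigma_v ** 2 * m2 / m4   # upper bound of the a1 interval

print(f"a_hat={a_hat:.3f}  a_limit={a_limit:.3f}  interval=({a0}, {upper:.3f})")
print(f"V(a_hat)={criterion(a_hat, u, y):.4f}  V(a0)={criterion(a0, u, y):.4f}")
```

In such a run the estimate should land above a0, inside the interval between a0 and the upper bound, with a lower criterion value than the true parameter, as the analysis of Equation (23) predicts.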

5 Conclusions

We have analyzed the properties of an approximate prediction error criterion for identification of Wiener models with process noise, and shown that it is in general not consistent. The inconsistency depends on the input signal u(t), the properties of the process noise v(t) and the true system parameters, θ0 and η0, in particular the second order derivative of the nonlinearity f(·, η0). For a special case, it has been shown that for any nonzero noise variance, there is always a parameter that gives a lower criterion value than the true parameter. For small noise variances, this biased parameter is, however, close to the true one.


References

Billings, S. A. and Fakhouri, S. Y. (1982). Identification of systems containing linear dynamics and static nonlinear elements. Automatica, 18(1):15–26.

Bruls, J., Chou, C. T., Haverkamp, B. R. J., and Verhaegen, M. (1997). Linear and non-linear system identification using separable least-squares. Submitted to European Journal of Control.

Hagenblad, A. (1999). Aspects of the identification of Wiener models. Technical Report Licentiate Thesis no. 793, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden.

Hunter, I. W. and Korenberg, M. J. (1986). The identification of nonlinear biological systems: Wiener and Hammerstein cascade models. Biological Cybernetics, 55:135–144.

Kalafatis, A., Arifin, N., Wang, L., and Cluett, W. R. (1995). A new approach to the identification of pH processes based on the Wiener model. Chemical Engineering Science, 50(23):3693–3701.

Ljung, L. (1978). Convergence analysis of parametric identification methods. IEEE Transactions on Automatic Control, AC-23:770–783.

Ljung, L. (1999). System Identification, Theory for the User. Prentice Hall, Englewood Cliffs, New Jersey, USA, second edition.

Wigren, T. (1993). Recursive prediction error identification using the nonlinear Wiener model. Automatica, 29(4):1011–1025.

Zhu, Y. (1999a). Distillation column identification for control using Wiener model. In 1999 American Control Conference, Hyatt Regency San Diego, California, USA.

Zhu, Y. (1999b). Parametric Wiener model identification for control. In 14th World Congress of IFAC, pages 37–42, Beijing, China.