On Input Design for System Identification

Input Design Using Markov Chains

CHIARA BRIGHENTI

Master's Degree Project, Stockholm, Sweden, March 2009

XR-EE-RT 2009:002


Abstract

When system identification methods are used to construct mathematical models of real systems, it is important to collect data that reveal useful information about the system's dynamics. Experimental data are always corrupted by noise, and this causes uncertainty in the model estimate. Therefore, the design of input signals that guarantee a certain model accuracy is an important issue in system identification.

This thesis studies input design problems for system identification where time domain constraints have to be considered. A finite Markov chain is used to model the input of the system. This makes it possible to include input amplitude constraints directly in the input model, by properly choosing the state space of the Markov chain. The state space is defined so that the model generates a binary signal. The probability distribution of the Markov chain is shaped in order to minimize an objective function defined in the input design problem.

Two identification issues are considered in this thesis: parameter estimation and NMP zero estimation for linear systems. Stochastic approximation is needed to minimize the objective function in the parameter estimation problem, while an adaptive algorithm is used to consistently estimate NMP zeros.

One of the main advantages of this approach is that the input signal can be easily generated by extracting samples from the designed optimal distribution. No spectral factorization techniques or realization algorithms are required to generate the input signal.

Numerical examples show how these models can improve system identification with respect to other input realization techniques.


Acknowledgements

Working on my thesis at the Automatic Control department at KTH has been a great experience.

I would like to thank my supervisor, Professor Bo Wahlberg, and my advisor, Dr. Cristian R. Rojas, for giving me the possibility to work on very interesting research topics. Thank you for your guidance. A special thank-you goes to Cristian R. Rojas, for the time he devoted to me and for the many ideas he shared with me.

Thanks to all the people I had the pleasure to know during this work period: Mitra, Andre B., Fotis, Pedro, Mohammad, Andre, Pierluigi, Matteo, Davide and Alessandro. I really enjoyed your company.

I would like to thank Professor Giorgio Picci who made this experience possible.

Finally, thanks to my family and my friends for their continuous help and support.


Acronyms

AR      Autoregressive
FDSA    Finite Difference Stochastic Approximation
FIR     Finite Impulse Response
LMI     Linear Matrix Inequality
LTI     Linear Time Invariant
NMP     Non-Minimum Phase
PEM     Prediction Error Method
PRBS    Pseudo Random Binary Signal
RLS     Recursive Least Squares
SISO    Single Input Single Output
SPSA    Simultaneous Perturbation Stochastic Approximation
WN      White Noise


Contents

Abstract

1 Introduction
  1.1 Thesis outline and contributions

2 System Identification
  2.1 System and model description
  2.2 Identification method
  2.3 Estimate uncertainty
    2.3.1 Parameter uncertainty
    2.3.2 Frequency response uncertainty
  2.4 Conclusions

3 Input Design for System Identification
  3.1 Optimal input design problem
  3.2 Measures of estimate accuracy
    3.2.1 Quality constraint based on the parameter covariance
    3.2.2 Quality constraint based on the model variance
    3.2.3 Quality constraint based on the confidence region
  3.3 Input spectra parametrization
    3.3.1 Finite dimensional spectrum parametrization
    3.3.2 Partial correlation parametrization
  3.4 Covariance matrix parametrization
  3.5 Signal constraints parametrization
  3.6 Limitations of input design in the frequency domain
  3.7 Conclusions

4 Markov Chain Input Model
  4.1 Introduction
  4.2 Markov chains model
    4.2.1 Markov chain state space
    4.2.2 Markov chains spectra
  4.3 More general Markov chains
  4.4 Conclusions

5 Estimation Using Markov Chains
  5.1 Problem formulation
  5.2 Solution approach
  5.3 Cost function evaluation
  5.4 Algorithm description
  5.5 Numerical example
  5.6 Conclusions

6 Zero Estimation
  6.1 Problem formulation
  6.2 FIR
  6.3 ARX
  6.4 General linear SISO systems
  6.5 Adaptive algorithm for time domain input design
    6.5.1 A motivation
    6.5.2 Algorithm description
    6.5.3 Algorithm modification for Markov chain signals generation
    6.5.4 Numerical example
  6.6 Conclusions

7 Summary and future work
  7.1 Summary
  7.2 Future work

References

List of Figures

4.1 Graph representation of the two-state Markov chain S2.
4.2 Graph representation of the four-state Markov chain S4.
4.3 Graph representation of the three-state Markov chain S3.
5.1 Mass-spring-damper system.
5.2 Cost functions f2 and f3 in the interval of acceptable values of the variables umax and ymax, respectively.
5.3 Cost functions f1, f4, f5 in the interval of acceptable values.
5.4 Estimate of the cost function on a discrete set of points for the two-state Markov chain in case 2.
5.5 Estimation of the best transition probability for the two-state Markov chain in case 2.
5.6 Estimation of the best transition probability p for the four-state Markov chain in case 2.
5.7 Estimation of the best transition probability r for the four-state Markov chain in case 2.
5.8 Estimate of the cost function on a discrete set of points for the two-state Markov chain in case 1.
5.9 Estimate of the cost function on a discrete set of points for the four-state Markov chain in case 1.
5.10 Bode diagrams of the optimal spectra of the two-state Markov chains in cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
5.11 Bode diagrams of the optimal spectra of the four-state Markov chains in cases 1, 2 and 3 of Table 5.3, and of the real discrete system.
5.12 Estimates of the frequency response of the system using the optimal two-state Markov chains for cases 1, 2 and 3 in Table 5.1.
5.13 Estimates of the frequency response of the system using the optimal four-state Markov chains for cases 1, 2 and 3 in Table 5.1.
6.1 Representation of the adaptive algorithm iteration at step k. uk denotes the vector of all collected input values from the beginning of the experiment, used for the output prediction ŷ(k).
6.2 A zero estimate trajectory produced by the adaptive algorithms described in Sections 6.5.2 and 6.5.3.
6.3 Normalized variance of the estimation error for the adaptive algorithms described in Sections 6.5.2 and 6.5.3.

List of Tables

4.1 Poles and zeros of the canonical spectral factor of the spectrum of sm of Example 4.2.4.
5.1 Maximum threshold values in the three analyzed cases.
5.2 Results of 100 Monte Carlo simulations of the algorithm with the two-state Markov chain.
5.3 Optimal values of the transition probabilities in cases 1, 2 and 3, obtained after 30000 algorithm iterations.
5.4 Total cost function values obtained with the optimal Markov inputs, a PRBS and white noise in case 1.
5.5 Trace of the covariance matrix obtained with the optimal Markov inputs, a PRBS, white noise, a binary input having the optimal correlation function, and the optimal spectrum in case 2.
5.6 Total cost function value obtained with the optimal Markov inputs, a PRBS and white noise in case 3.
5.7 Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal two-state Markov chains.
5.8 Estimated values of the parameters of the continuous real system and relative percentage errors, obtained with the optimal four-state Markov chains.

Chapter 1

Introduction

Mathematical models of systems are needed to predict their behavior and to serve as parts of their control systems. This work focuses on models constructed and validated from experimental input/output data by means of identification methods.

The information obtained through experiments on the real system depends on the input excitation, which is often limited by amplitude or power constraints. For this reason, experiment design is necessary in order to obtain system estimates within a given accuracy while saving experiment time and cost [1]. Robustness of input design for system identification is also an important issue, especially when the model of the system is used to design its control system. In [2]-[7] some studies on this problem are presented. The effects of undermodeling on input design are pointed out in [8] and [9].

Depending on the cost function considered in this setting, input design can typically be solved as a constrained optimization problem. In the Prediction Error Method (PEM) framework it is common to use, as a measure of the estimate accuracy, a function of the asymptotic covariance matrix of the parameter estimate. This matrix depends on the input spectrum, which can then be shaped in order to obtain a "small" covariance matrix and improve the estimate accuracy (see [10], [11]). Usually, a constraint on the input power is also included; in this way, time domain amplitude constraints are approximately translated into the frequency domain [12].

A first disadvantage of these methods is that they are strongly influenced by the initial knowledge of the system. Secondly, solving the problem in the frequency domain does not provide any further information on how to generate the input signal in the time domain: the input can be represented as filtered white noise, but many probability distributions can be used to generate white noise. Furthermore, in practical applications time domain constraints on signals have to be considered, and the power constraint that is usually set in the frequency domain does not ensure that these constraints are respected. For this reason, in [13] a method is proposed to generate a binary input with a prescribed correlation function; once an optimal spectrum or correlation function is found by solving the input design problem in the frequency domain, it is possible to generate a binary signal which approximates the optimal input. Also, in [14] a method is proposed that provides a locally optimal binary input in the time domain.

This thesis studies the input design problem in the probability domain.

Compared to design methods in the frequency domain, a solution in the probability domain makes it easier to generate input trajectories to apply to the real system, by extracting samples from a given distribution. Inputs are modeled by finite stationary Markov chains which generate binary signals.

Binary signals are often used in system identification; one reason is that they achieve the largest power among all signals with the same maximum amplitude, which is well known to improve parameter estimation for linear models.

The idea of modeling the input by a finite Markov chain derives from the possibility of including input amplitude constraints directly into the input model, by suitably choosing the state space of the Markov chain.

Furthermore, unlike design in the frequency domain, this approach keeps more degrees of freedom in the choice of the optimal spectrum, which in general is non-unique [12].
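The basic idea can be sketched in a few lines of Python (an illustrative helper, not the thesis' implementation; the function name and switching probability are assumptions for the example): a symmetric two-state chain on {−1, +1} flips with probability p_switch at each step, so every generated sample satisfies the amplitude constraint by construction.

```python
import numpy as np

def markov_binary_input(n, p_switch, amplitude=1.0, seed=0):
    """Generate a binary input of +-amplitude from a symmetric two-state
    Markov chain: at each step the signal flips with probability p_switch."""
    rng = np.random.default_rng(seed)
    u = np.empty(n)
    state = 1.0
    for t in range(n):
        u[t] = amplitude * state
        if rng.random() < p_switch:
            state = -state
    return u

u = markov_binary_input(1000, p_switch=0.2)
# every sample respects the amplitude constraint by construction
assert np.all(np.abs(u) == 1.0)
```

A small switching probability concentrates the input energy at low frequencies, while p_switch = 0.5 reproduces a white binary sequence, illustrating how the transition probabilities shape the spectrum.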

Two identification problems are considered here: parameter estimation and non-minimum phase zeros estimation for LTI systems.

For the first problem, the optimal distribution is found by minimizing the cost function defined in the input design problem with respect to the one-step transition probabilities of the Markov chain. A stochastic approximation algorithm is used, since a closed-form solution to the optimization problem is not available and the cost is a stochastic function of these transition probabilities, contaminated with noise (see [15], [16] for details).

For the second problem, it will be shown that the Markov chain input model has exactly the optimal spectrum for the zero estimation problem of an FIR or ARX model [6]. In general, the spectrum of a two-state Markov chain can be made equal to the spectrum of the AR process which guarantees a consistent estimate of the NMP zero of a linear SISO system [17]. Therefore, an optimal or consistent input can be generated in the time domain by Markov chain distributions. The adaptive algorithm introduced in [18] for input design in the time domain under undermodeling is modified here in order to generate Markov chain signals having the same spectrum as the general inputs designed in the original version of the algorithm. The advantage is that a binary signal is then used to identify the non-minimum phase zero, while keeping the same input variance and spectrum.

The outline of the thesis is presented in the next section.

1.1 Thesis outline and contributions

The subject of this thesis is input design for system identification. In particular, the objective of this study is to analyze a new approach to input design: working in the probability domain, the input signal is modeled as a finite Markov chain. The first chapters summarize some known results on system identification in the PEM framework and input design in the frequency domain, in order to compare the method proposed here with the classical input design method in the frequency domain.

In Chapter 2 PEM and its asymptotic properties are reviewed.

In Chapter 3 the most commonly adopted input design methods are described. These formulate input design problems as convex optimization problems. The solution is given as an optimal input spectrum.

In Chapter 4 general Markov chains that model binary input signals are defined. Some spectral properties are also described.

Chapter 5 presents the input design problem for parameter estimation and the solution approach based on the Markov chain input model.

In Chapter 6, input design for identification of NMP zeros of an LTI system is considered. The chapter discusses classical solutions in the frequency domain and adaptive solutions in the time domain under undermodeling conditions, where the input is modeled as an AR or a Markov chain process.

Chapter 7 concludes the thesis.


Chapter 2

System Identification

This chapter introduces system identification in the PEM framework for parameter estimation of LTI models. Once a model structure is defined, this method finds the model’s parameters that minimize the prediction error.

Even when the model structure is able to capture the true system dynamics, the estimate error will not be zero, since data used in the identification procedure are finite and corrupted by noise. The purpose of input design is to minimize the estimate error by minimizing the variance of the parameter estimates, assuming the estimation method is consistent. Section 2.1 defines systems and model structures considered in this work, while Section 2.2 discusses PEM and its asymptotic properties.

2.1 System and model description

This thesis considers discrete-time LTI SISO systems, lying in the set M of parametric models

y(t) = G(q, θ) u(t) + H(q, θ) e(t),    (2.1)

G(q, θ) = q^{-n_k} (b_1 + b_2 q^{-1} + ⋯ + b_{n_b} q^{-n_b+1}) / (1 + a_1 q^{-1} + ⋯ + a_{n_a} q^{-n_a}),

H(q, θ) = (1 + c_1 q^{-1} + ⋯ + c_{n_c} q^{-n_c}) / (1 + d_1 q^{-1} + ⋯ + d_{n_d} q^{-n_d}),

θ = [b_1, …, b_{n_b}, a_1, …, a_{n_a}, c_1, …, c_{n_c}, d_1, …, d_{n_d}]^T ∈ R^{n×1},

where u(t) is the input, y(t) is the output and e(t) is zero-mean white noise with finite variance. The symbol q^{-1} represents the delay operator (q^{-1} u(t) = u(t − 1)). Assume H(q, θ) is stable, monic and minimum phase, i.e. its poles and zeros lie inside the unit circle.
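As a minimal, hypothetical instance of the model class (2.1) (not code from the thesis), consider the ARX special case H(q) = 1/A(q), so that A(q) y(t) = B(q) u(t) + e(t); the coefficient values below are arbitrary illustrative choices.

```python
import numpy as np

def simulate_arx(u, e, a, b):
    """Simulate an ARX special case of (2.1): H(q) = 1/A(q), so that
    A(q) y(t) = B(q) u(t) + e(t), with A = 1 + a1 q^-1 + ... and
    B = b1 q^-1 + b2 q^-2 + ... (one-sample delay, nk = 1)."""
    n = len(u)
    y = np.zeros(n)
    for t in range(n):
        y[t] = e[t]
        for k, bk in enumerate(b, start=1):   # add B(q) u(t)
            if t - k >= 0:
                y[t] += bk * u[t - k]
        for k, ak in enumerate(a, start=1):   # subtract (A(q) - 1) y(t)
            if t - k >= 0:
                y[t] -= ak * y[t - k]
    return y

rng = np.random.default_rng(1)
u = np.sign(rng.standard_normal(500))           # binary excitation
e = 0.1 * rng.standard_normal(500)              # white measurement noise
y = simulate_arx(u, e, a=[-0.7], b=[1.0, 0.5])  # hypothetical coefficients
```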


The real system S is given as

y(t) = G_0(q) u(t) + H_0(q) e_0(t),    (2.2)

where e_0(t) has finite variance λ_0. Assume there exists a parameter vector θ_0 such that G(q, θ_0) = G_0(q) and H(q, θ_0) = H_0(q), i.e. assume there is no undermodeling:

S ∈ M.    (2.3)

This condition is hardly satisfied in practice, since real systems are often of high order or nonlinear. Nevertheless, as will be explained in the next section, it is crucial for the consistency and the asymptotic properties of PEM. In Chapter 5, regarding the parameter estimation problem, (2.3) will be assumed to hold, while in Chapter 6, where the zero estimation problem is analyzed, this condition will not necessarily be imposed.

2.2 Identification method

System identification aims at describing a real system through a mathematical model constructed and validated from experimental input-output data. The identification method considered here is PEM [19]. This method minimizes a function of the prediction error ε_F(t, θ):

V_N(θ, Z^N) = (1/2N) Σ_{t=1}^{N} ε_F²(t, θ),    (2.4)

where Z^N is a vector containing the collected input-output data, i.e. Z^N = [y(1), u(1), y(2), u(2), …, y(N), u(N)].

The prediction error is defined as ε_F(t, θ) = y(t) − ŷ(t, θ), where the one-step-ahead predictor is given by

ŷ(t, θ) = H^{-1}(q, θ) G(q, θ) u(t) + [1 − H^{-1}(q, θ)] y(t).

Suppose all the hypotheses for the consistency of PEM are satisfied; in that case, the parameter estimate θ̂_N converges to the true parameter vector θ_0 as N tends to infinity. Briefly, these conditions are:

1. Condition (2.3) holds, i.e. there is no undermodeling.
2. The signals y(t) and u(t) are jointly quasi-stationary.
3. u(t) is persistently exciting of sufficiently large order.
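For an FIR model the predictor is linear in θ, so the PEM criterion (2.4) is quadratic and its minimizer is the least-squares solution. The sketch below (hypothetical helper name and simulated data, not the thesis' code) illustrates the consistency of the estimate under the three conditions above.

```python
import numpy as np

def pem_fir(u, y, nb):
    """Minimize the PEM criterion (2.4) for an FIR model
    y_hat(t) = b1 u(t-1) + ... + b_nb u(t-nb); since the predictor is
    linear in theta, the minimizer is the least-squares solution."""
    N = len(y)
    Phi = np.column_stack([np.r_[np.zeros(k), u[:N - k]]
                           for k in range(1, nb + 1)])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

rng = np.random.default_rng(2)
u = np.sign(rng.standard_normal(2000))        # persistently exciting binary input
theta0 = np.array([1.0, 0.5])                 # hypothetical true FIR parameters
y = np.r_[0.0, theta0[0] * u[:-1]] \
    + np.r_[0.0, 0.0, theta0[1] * u[:-2]] \
    + 0.05 * rng.standard_normal(2000)        # white noise, S in M
theta_hat = pem_fir(u, y, nb=2)
# the estimate is close to the true parameter vector
assert np.allclose(theta_hat, theta0, atol=0.05)
```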


2.3 Estimate uncertainty

Measuring the quality of the model estimate is an important issue in system identification. The measure is chosen depending on the application for which the model is required. One possibility is to use a function of the covariance matrix of the parameter estimate. In other cases, such as in control applications, it may be better to use the variance of the frequency response estimate in the frequency domain. These two cases are now presented in more detail.

2.3.1 Parameter uncertainty

Under the assumptions for the consistency of PEM, it holds that

√N (θ̂_N − θ_0) → N(0, P_{θ_0}) as N → ∞,    (2.5)

P_{θ_0}^{-1} = (1/λ_0) Ē ψ(t, θ_0) ψ^T(t, θ_0),    ψ(t, θ_0) = ∂ŷ(t, θ)/∂θ |_{θ_0},

where N(·, ·) denotes the normal distribution [19]. Therefore, when the model class is sufficiently flexible to describe the real system, the parameter estimate converges to the true parameter vector as the number of data N used in the estimation goes to infinity, with covariance decaying as 1/N.

From (2.5) it follows that a confidence region in which the parameter estimate will lie with probability α is

U_θ = { θ : N (θ − θ̂_N)^T P_{θ_0}^{-1} (θ − θ̂_N) ≤ χ²_α(n) }.    (2.6)

The covariance matrix defines an ellipsoid asymptotically centered at θ_0. Under the condition that u and e are independent (that is, data are collected in open loop), the asymptotic expression, in the number of data points N, of the inverse of the covariance matrix of the parameter estimate is

P_{θ_0}^{-1} = (N / 2πλ_0) ∫_{−π}^{π} F_u(e^{iω}, θ_0) Φ_u(ω) F_u^⋆(e^{iω}, θ_0) dω + R_e(θ_0),    (2.7)

R_e(θ_0) = (N / 2π) ∫_{−π}^{π} F_e(e^{iω}, θ_0) F_e^⋆(e^{iω}, θ_0) dω,

where

F_u(e^{iω}, θ_0) = H^{-1}(e^{iω}, θ_0) ∂G(e^{iω}, θ)/∂θ |_{θ_0},    (2.8)

F_e(e^{iω}, θ_0) = H^{-1}(e^{iω}, θ_0) ∂H(e^{iω}, θ)/∂θ |_{θ_0},    (2.9)

and Φ_u(ω) is the power spectral density of the input u(t). Here ⋆ denotes the complex conjugate transpose.

Expression (2.7) shows that the asymptotic covariance matrix of the parameter estimate depends on the input spectrum. Therefore, by shaping Φu(ω) it is possible to obtain estimates within a given accuracy.
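For an FIR model, ψ(t) = [u(t−1), …, u(t−n_b)]^T, so (2.7) reduces (up to the factor N) to the inverse of the Toeplitz matrix of the input correlations. The sketch below, with hypothetical correlation values, shows that two inputs of equal power r_0 can yield quite different parameter covariances:

```python
import numpy as np

def fir_param_covariance(r, lam0):
    """For an FIR model psi(t) = [u(t-1), ..., u(t-nb)]^T, so (2.7) gives,
    up to the factor N, P^{-1} = (1/lam0) * R_u, where R_u is the Toeplitz
    matrix built from the input correlations r_0, ..., r_{nb-1}."""
    nb = len(r)
    R = np.array([[r[abs(i - j)] for j in range(nb)] for i in range(nb)])
    return lam0 * np.linalg.inv(R)

lam0 = 1.0
P_white = fir_param_covariance([1.0, 0.0, 0.0], lam0)  # white binary input
P_low = fir_param_covariance([1.0, 0.6, 0.36], lam0)   # low-frequency input
# same input power (r0 = 1) but different spectra give different accuracy
print(np.trace(P_white), np.trace(P_low))
```

Here the white input attains the smaller trace, illustrating why the input spectrum, not just the input power, matters for the estimate accuracy.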

Throughout the thesis it will be assumed that there is no feedback in the system, i.e. that u and e are independent.

2.3.2 Frequency response uncertainty

In many applications it may be preferable to measure the quality of the model estimate by the variance of the frequency response estimate, frequency by frequency. In [19] it is shown that under condition (2.3), the variance of G(e^{iω}, θ̂_N) can be approximated by

Var G(e^{iω}, θ̂_N) ≈ (m/N) Φ_v(ω) / Φ_u(ω)    (2.10)

for large but finite model order m and number of data N, where v is the process defined as v(t) = H_0(q) e_0(t).

If the model order is not large enough, the previous expression is not a good approximation. Instead, by Gauss' approximation formula, it is possible to write

Var G(e^{iω}, θ̂_N) ≈ (1/N) [∂G(e^{iω}, θ)/∂θ |_{θ_0}]^⋆ P_{θ_0} [∂G(e^{iω}, θ)/∂θ |_{θ_0}].    (2.11)

Equation (2.11) expresses the frequency response uncertainty in terms of the parameter uncertainty. Therefore, both (2.10) and (2.11) show that a proper choice of the input spectrum can reduce the variance of the frequency response estimate. This is the purpose of input design.

2.4 Conclusions

This chapter introduced PEM for system identification and its asymptotic properties that are often used to solve input design problems.


Models constructed from experimental data are always affected by uncertainty. In Section 2.3 two possible measures of model uncertainty were discussed: parameter uncertainty and frequency response uncertainty. The choice between the two depends on the application; typically, in control applications a measure in the frequency domain is preferable. This work considers parameter uncertainty.


Chapter 3

Input Design for System Identification

This chapter presents general input design problems for system identification. Typically, input design aims at optimizing some performance function under constraints on the estimate accuracy and on the input signal. The solution approach will be presented in detail, describing how input design problems can be formulated as convex optimization problems.

Ideas and drawbacks of the general input design framework are reviewed in Section 3.1. The most widely used measures of estimate accuracy are presented in Section 3.2, where it is also shown how quality constraints can be written as convex constraints. Sections 3.3 to 3.5 describe some techniques for parametrizing spectra and signal constraints, needed to obtain finitely parametrized problems.

3.1 Optimal input design problem

In a general formulation, input design problems are constrained optimization problems, where the constraints are typically on the input signal spectrum or power and the estimate accuracy. In this framework the objective function to be optimized can be any performance criterion, which usually depends on the practical application. For example, input power or experiment time can be minimized.

Common input design problem formulations are:

1. Optimize some measure of the estimate accuracy, under constraints on input excitation.

2. Optimize some property of the input signal, given constraints on the estimate accuracy.

As it will be discussed in the next section, typical measures of the estimate accuracy are functions of the uncertainty in the model estimate, like (2.7), (2.10) and (2.11). As was shown in Section 2.3, these functions depend on the input spectrum, which therefore can be used to optimize the objective function.

A formal expression of the first problem formulation is

min_{Φ_u} f(P_{θ_0})    subject to g(Φ_u) ≤ α,    (3.1)

which can also be written as

min_{Φ_u, γ} γ    subject to f(P_{θ_0}) ≤ γ, g(Φ_u) ≤ α.    (3.2)

The next sections will show how these constraints can be formulated as convex constraints, under certain conditions on the functions f and g.

The following drawbacks of input design problems like (3.2) should be highlighted. First of all, the asymptotic expression of the covariance matrix depends on the true parameter θ_0, which is not known. Secondly, the constraints may be non-convex and infinite dimensional; in that case, a parametrization of the input spectrum is necessary in order to obtain a finitely parametrized optimization problem. Furthermore, once the optimal input spectrum has been found, an input signal having that spectrum has to be generated. This can be done by filtering white noise with an input spectral factor¹. Nevertheless, no information on the probability distribution of the white noise is given in this solution approach.

3.2 Measures of estimate accuracy

In the usual input design framework, three types of quality measures are typically considered. These are described in the following sections.

¹By spectral factor is meant an analytic function L(z) such that Φ_u(z) = L(z) L(z^{-1}).


3.2.1 Quality constraint based on the parameter covariance

In Section 2.3 it was shown that the asymptotic covariance matrix of the parameter estimate is an affine function of the input spectrum, through (2.7). If the purpose of the identification procedure is parameter estimation, typical scalar measures of estimate accuracy are the trace, the determinant or the maximum eigenvalue of the covariance matrix [12]. The following quality constraints can then be introduced:

Tr P_{θ_0} ≤ γ,    (3.3)
det P_{θ_0} ≤ δ,    (3.4)
λ_max(P_{θ_0}) ≤ ε.    (3.5)

It is possible to prove that these constraints can be manipulated to be convex in P_{θ_0}^{-1}; proofs can be found in [20] and [21] for (3.4) and (3.5), respectively. For example, (3.5) is equivalent to

[ εI    I           ]
[ I     P_{θ_0}^{-1} ] ≥ 0,    (3.6)

which is an LMI in P_{θ_0}^{-1}.

The constraint (3.3) is a special case of the more general weighted trace constraint that will be considered in the next subsection.

Notice that all these quality constraints depend on the true parameter vector θ0. Many solutions have been presented in the literature to handle this problem, as will be discussed afterwards.
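The equivalence between the maximum-eigenvalue bound (3.5) and its LMI form follows from the Schur complement, and it can be checked numerically. The sketch below (illustrative only, with a randomly generated positive definite P) tests the eigenvalues of the block matrix [[εI, I], [I, P^{-1}]] against λ_max(P):

```python
import numpy as np

def lmi_holds(P, eps):
    """Check the LMI [[eps*I, I], [I, P^{-1}]] >= 0, which by the Schur
    complement (eps*I - I P I >= 0) is equivalent to lambda_max(P) <= eps
    whenever P > 0."""
    n = P.shape[0]
    I = np.eye(n)
    M = np.block([[eps * I, I], [I, np.linalg.inv(P)]])
    return np.min(np.linalg.eigvalsh(M)) >= -1e-9

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
P = A @ A.T + 0.1 * np.eye(3)          # random positive definite covariance
lmax = np.max(np.linalg.eigvalsh(P))
assert lmi_holds(P, 1.1 * lmax)        # eps above lambda_max: LMI feasible
assert not lmi_holds(P, 0.9 * lmax)    # eps below lambda_max: LMI infeasible
```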

3.2.2 Quality constraint based on the model variance

Consider the quality constraint based on the variance of the frequency response,

(1/2π) ∫_{−π}^{π} F(ω) Var G(e^{iω}, θ̂_N) dω ≤ γ,    (3.7)

where F(ω) is a weighting function. By substituting the variance expression (2.11), this quality constraint can be written as

Tr W P_{θ_0} ≤ γ,    (3.8)

where

W = (1/2π) ∫_{−π}^{π} (1/N) ∂G(e^{iω}, θ)/∂θ |_{θ_0} F(ω) [∂G(e^{iω}, θ)/∂θ |_{θ_0}]^⋆ dω.    (3.9)

See [12] for details. The following lemma generalizes the previous result; a proof can be found in [6].


Lemma 3.2.1 The problem

Tr W(ω) P_{θ_0} ≤ γ, ∀ω,
W(ω) = V(ω) V^⋆(ω) ≥ 0, ∀ω,
P_{θ_0} ≥ 0,

can be formulated as an LMI of the form

γ − Tr Z ≥ 0,

[ Z       V^⋆(ω)      ]
[ V(ω)    P_{θ_0}^{-1} ] ≥ 0, ∀ω.    (3.10)

Notice that this formulation is convex in the matrices P_{θ_0}^{-1} and Z (see [12], [6]). Notice also that this type of constraint includes the trace constraint (3.3) as a special case, when W is the identity matrix.

3.2.3 Quality constraint based on the confidence region

In control applications, it is often preferable to have frequency-by-frequency constraints on the estimate error. Consider the measure [6, 12]

∆(e^{iω}, θ) = T(e^{iω}) (G_0(e^{iω}) − G(e^{iω}, θ)) / G(e^{iω}, θ).    (3.11)

The input has to be designed so that

|∆(e^{iω}, θ)| ≤ γ, ∀ω, ∀θ ∈ U_θ.    (3.12)

This constraint can also be formulated as a convex constraint in P_{θ_0}^{-1}, as proven in [22].

3.3 Input spectra parametrization

As discussed in the last section, the typical measures of estimate accuracy are functions of the covariance matrix P_{θ_0}. Expression (2.7), derived in the asymptotic analysis of PEM, shows that the input spectrum Φ_u can be used to optimize the estimate performance. The problem of finding an optimal input spectrum has an infinite number of parameters, since Φ_u(ω) is a continuous function of the frequency ω. Nevertheless, by a proper spectrum parametrization, it is possible to formulate the problem as a convex and finite dimensional optimization problem, since the parametrization of the input spectrum leads to a parametrization of the inverse of the covariance matrix [6, 12].

A spectrum can always be written in the general form

Φ_u(ω) = Σ_{k=−∞}^{∞} c̃_k B_k(e^{iω}),    (3.13)

where {B_k(e^{iω})}_{k=−∞}^{∞} are proper stable rational basis functions that span² L₂. It is always possible to choose basis functions having the hermitian properties B_{−k} = B_k^⋆ and B_k(e^{−iω}) = B_k^⋆(e^{iω}) [6]. The coefficients c̃_k satisfy the symmetry property c̃_{−k} = c̃_k and must be such that

Φ_u(ω) ≥ 0, ∀ω,    (3.14)

otherwise Φ_u would not be a spectrum.

For example, the FIR representation of a spectrum is obtained by choosing B_k(e^{iω}) = e^{−iωk}; it then follows that c̃_k = r_k, where r_k is the correlation function of the process u.
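With the FIR basis, evaluating the parametrized spectrum on a frequency grid is immediate, using the symmetry c̃_{−k} = c̃_k. A sketch with hypothetical correlation coefficients:

```python
import numpy as np

def fir_spectrum(r, omega):
    """FIR parametrization of a spectrum: with B_k = e^{-i w k} the
    coefficients are the correlations r_k, and symmetry r_{-k} = r_k gives
    Phi_u(w) = r0 + 2 * sum_{k>=1} r_k cos(k w)."""
    omega = np.asarray(omega, dtype=float)
    phi = np.full_like(omega, r[0])
    for k in range(1, len(r)):
        phi += 2.0 * r[k] * np.cos(k * omega)
    return phi

w = np.linspace(-np.pi, np.pi, 1001)
phi = fir_spectrum([1.0, 0.4], w)  # hypothetical correlation coefficients
assert np.all(phi >= 0)            # these coefficients satisfy (3.14)
```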

By substituting (3.13) into (2.7), the inverse of the covariance matrix becomes an affine function of the coefficients c̃_k, of the form

P_{θ_0}^{-1} = Σ_{k=−∞}^{∞} c̃_k Q_k + Q̄.    (3.15)

This parametrization of the input spectrum leads to a countable but still infinite number of parameters in the optimization problem.

Two possible spectra parametrizations are described in the following subsections, which make the problem finitely parametrized.

3.3.1 Finite dimensional spectrum parametrization

The finite dimensional spectrum parametrization has the form

Φ_u(ω) = Ψ(e^{iω}) + Ψ^⋆(e^{iω}),    Ψ(e^{iω}) = Σ_{k=0}^{M−1} c̃_k B_k(e^{iω}).    (3.16)

This parametrization forces the coefficients c̃_M, c̃_{M+1}, … to be zero. Therefore, the condition Φ_u(ω) ≥ 0 must be assured through the coefficients {c̃_k}_{k=0}^{M−1}. The following result, deriving from an application of the Positive Real Lemma [23], can be used to assure the constraint (3.14).

²L₂ denotes the set {f | ∫ |f(x)|² dx < ∞}.


Lemma 3.3.1 Let {A, B, C, D} be a controllable state-space realization of Ψ(e^{iω}). Then there exists a matrix Q = Q^T such that

[ Q − A^T Q A    −A^T Q B ]     [ 0    C^T     ]
[ −B^T Q A      −B^T Q B  ]  +  [ C    D + D^T ] ≥ 0    (3.17)

if and only if Φ_u(ω) := Σ_{k=0}^{M−1} c̃_k [B_k(e^{iω}) + B_k^⋆(e^{iω})] ≥ 0, ∀ω.

The state-space realization of the positive real part of the input spectrum can be easily constructed. For example (Example 3.5 in [6]), an FIR spectrum has positive real part given by

Ψ(e^{iω}) = (1/2) r_0 + Σ_{k=1}^{M−1} r_k e^{−iωk}.    (3.18)

A controllable state-space realization of Ψ(e^{iω}) is

A = [ O_{1×(M−2)}    0           ],    B = [1 0 … 0]^T,
    [ I_{M−2}        O_{(M−2)×1} ]

C = [r_1 r_2 … r_{M−1}],    D = (1/2) r_0.    (3.19)

Therefore, in this example the constraint (3.14) can be written as an LMI in Q and r_1, …, r_{M−1}.

3.3.2 Partial correlation parametrization

The partial correlation parametrization uses the finite expansion

Σ_{k=−(M−1)}^{M−1} c̃_k B_k(e^{iω})    (3.20)

in order to design only the first M coefficients ˜ck. In this case, it is necessary to assure that there exists a sequence ˜cM, ˜cM +1, . . . such that the complete sequence {˜ck}k=0 defines a spectrum. That is, the condition (3.14) must hold. This means that (3.20) does not necessary define a spectrum itself, but the designed coefficients are extendable to a sequence that parametrizes a spectrum. As explained in [6], if an FIR spectrum is considered, a necessary and sufficient condition for (3.14) to hold is





    [ r_0       r_1       . . .  r_{M−1} ]
    [ r_1       r_0       . . .  r_{M−2} ]
    [ ⋮         ⋮         ⋱      ⋮       ]
    [ r_{M−1}   r_{M−2}   . . .  r_0     ]  ≥ 0 (3.21)


(see [24] and [25]).
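Condition (3.21) is straightforward to check numerically: build the Toeplitz matrix from the designed correlation coefficients and test positive semidefiniteness. A minimal sketch with NumPy follows; the function name and the example coefficient values are illustrative, not from the thesis.

```python
import numpy as np

def is_extendable_fir(r):
    """Check condition (3.21): the Toeplitz matrix built from the
    correlation coefficients r_0, ..., r_{M-1} must be positive
    semidefinite for the sequence to extend to a valid spectrum."""
    M = len(r)
    T = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    # Symmetric matrix: use eigvalsh; allow small numerical slack.
    return bool(np.min(np.linalg.eigvalsh(T)) >= -1e-12)

# Correlations of white noise: trivially extendable.
print(is_extendable_fir([1.0, 0.0, 0.0]))   # True
# r_1 larger than r_0 violates positive semidefiniteness.
print(is_extendable_fir([1.0, 1.5, 0.0]))   # False
```

This is also the test one would embed as an LMI constraint in the convex program discussed in the next paragraph.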

This condition also applies to more general basis functions, like B_k(e^{iω}) = L(e^{iω}) e^{−iωk}, where L(e^{iω}) > 0 [6, 12].

The constraint (3.21) is an LMI in the first M correlation coefficients and is therefore convex in these variables. Notice that (3.21) is less restrictive than the condition imposed in Lemma 3.3.1 for the finite dimensional parametrization. Furthermore, as will be discussed in the next section, the finite spectrum parametrization can handle spectral constraints on input and output signals, which the partial correlation parametrization cannot, since the parametrization (3.20) is not itself a spectrum. An advantage of the latter parametrization, though, is that it uses the minimum number of free parameters.

3.4 Covariance matrix parametrization

By using one of the input spectrum parametrizations in the expression (2.7), the inverse of the asymptotic covariance matrix can be written as

P_{θ_0}^{−1} = Σ_{k=−(M−1)}^{M−1} c̃_k Q_k + Q̄, (3.22)

where Q_k = N/(2πλ_0) ∫_{−π}^{π} F_u(e^{iω}, θ_0) B_k(e^{iω}) F_u*(e^{iω}, θ_0) dω and Q̄ = R_e(θ_0).

Then, P_{θ_0}^{−1} is expressed as a linear and finitely parametrized function of the coefficients c̃_0, . . . , c̃_{M−1} (since the symmetry condition c̃_{−k} = c̃_k holds). Therefore, any quality constraint that is convex in P_{θ_0}^{−1} is also convex in c̃_0, . . . , c̃_{M−1}. Some common quality constraints have been introduced in Section 3.2, which were all convex functions of P_{θ_0}^{−1}.
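The linear structure of (3.22) can be illustrated numerically. In the sketch below, the basis B_k(e^{iω}) = e^{−iωk}, the 2×1 frequency function F_u, and the values N = λ_0 = 1 are all made-up choices for illustration; the point is that each Q_0 and each sum Q_k + Q_{−k} comes out real and symmetric, so P_{θ_0}^{−1} is a real symmetric linear matrix function of the real coefficients c̃_0, . . . , c̃_{M−1}.

```python
import numpy as np

w = np.linspace(-np.pi, np.pi, 4001)
z = np.exp(1j * w)
# Hypothetical 2x1 frequency function F_u(e^{iw}, theta0) (not from the thesis).
Fu = np.stack([np.ones_like(z), 1 / z]) / (1 - 0.5 / z)

def Qk(k):
    # Q_k = N/(2*pi*lambda0) * integral of F_u B_k F_u^* dw, with
    # B_k = e^{-iwk} and N = lambda0 = 1, via the trapezoid rule.
    integrand = np.einsum('iw,w,jw->ijw', Fu, z ** (-k), Fu.conj())
    dw = w[1] - w[0]
    return ((integrand[..., :-1] + integrand[..., 1:]) / 2).sum(axis=-1) * dw / (2 * np.pi)

# With c_{-k} = c_k, P^{-1} is a real symmetric linear function of the c's.
P_inv = 1.0 * Qk(0) + 0.4 * (Qk(1) + Qk(-1))
print(np.allclose(P_inv.imag, 0, atol=1e-10), np.allclose(P_inv, P_inv.T, atol=1e-10))
```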

3.5 Signal constraints parametrization

Constraints on the input spectrum are also considered in input design. Typically, they are frequency by frequency or power constraints. A detailed discussion of the power constraints parametrization is presented in [6]. Briefly, consider power constraints of the type

(1/2π) ∫_{−π}^{π} |W_u(e^{iω})|² Φ_u(ω) dω ≤ α_u (3.23)

(1/2π) ∫_{−π}^{π} |W_y(e^{iω})|² Φ_y(ω) dω ≤ α_y. (3.24)


By using a finite spectrum parametrization these constraints can be written as convex finite-dimensional functions of {c̃_k}_{k=0}^{M−1}. For example, a constraint on the input power for an FIR spectrum becomes r_0 ≤ α_u.
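A quick numerical illustration of why the input power constraint collapses to r_0 ≤ α_u for an FIR spectrum with W_u = 1: integrating Φ_u(ω) = r_0 + 2 Σ_k r_k cos(ωk) over a full period leaves only r_0, since the cosine terms integrate to zero. The coefficient values below are made up.

```python
import numpy as np

# Hypothetical FIR correlation coefficients r0, r1, r2 (illustrative only).
r = [2.0, 0.5, -0.3]
w = np.linspace(-np.pi, np.pi, 20001)
# FIR spectrum: Phi_u(w) = r0 + 2*(r1*cos(w) + r2*cos(2w) + ...)
phi = r[0] + 2 * sum(rk * np.cos(k * w) for k, rk in enumerate(r) if k > 0)
# Input power (1/2pi) * integral of Phi_u over [-pi, pi], trapezoid rule.
dw = w[1] - w[0]
power = np.sum((phi[:-1] + phi[1:]) / 2) * dw / (2 * np.pi)
print(round(power, 6))   # 2.0 -> equals r0
```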

For frequency by frequency constraints of the form

β_u(ω) ≤ Φ_u(ω) ≤ γ_u(ω) (3.25)
β_y(ω) ≤ Φ_y(ω) ≤ γ_y(ω), (3.26)

Lemma 3.3.1 can be applied to write them as convex constraints in {c̃_k}_{k=0}^{M−1}, upon the condition that the constraining functions are rational [6].

3.6 Limitations of input design in the frequency domain

The previous sections introduced input design problems where constraints on the signal spectra as well as on a measure of the estimate accuracy are considered. It has been shown that they can be formulated as finitely parametrized convex optimization problems, upon the condition that the measure of the estimate accuracy is a convex function of P_{θ_0}^{−1} and the input spectrum is parametrized as proposed in Section 3.3. Therefore, by solving a constrained optimization problem, the optimal variables c̃_0, . . . , c̃_{M−1} are found. The FIR spectrum representation is commonly used, so that the optimization procedure returns the first M terms of the correlation function.

If a partial correlation parametrization is used, the optimal spectrum can be found by solving the Yule-Walker equations as described in [26].

From the optimal spectrum it is then necessary to generate a signal in the time domain to apply to the real system. This is a realization problem that characterizes solutions in the frequency domain. The input can be generated as filtered white noise, by spectral factorization of the optimal spectrum.

Nevertheless, many probability distributions can be used to generate white noise.

Also, it has to be noticed that in general the optimal spectrum is non-unique and the input design approach considered so far only finds one of the optimal spectra. In fact, a finite dimensional spectrum parametrization forces the input correlation coefficients r_M, r_{M+1}, . . . to be zero; on the other hand, the partial correlation parametrization needs to complete the correlation sequence by solving the Yule-Walker equations, which give only one particular correlation sequence.

Furthermore, in practical applications time domain constraints on the signals have to be considered, and the power constraint that is usually set in the frequency domain does not assure that these constraints are respected.

For these reasons, this thesis proposes to analyze the performance of an input design method in the probability domain, as it will be presented in the next chapters.

3.7 Conclusions

Classical input design in the frequency domain has been presented. The advantage of this approach is that input design problems can be formulated as convex optimization problems. Some limitations of the method concern time domain constraints on signals and realization techniques.


Chapter 4

Markov Chain Input Model

The drawbacks of input design in the frequency domain, presented in Section 3.6, suggest studying the possibility of a different approach.

This chapter will introduce the idea of input design in the probability domain. In particular, reasons and advantages of modeling the input signal as a Markov chain will be presented in Section 4.1. In Section 4.2 the Markov chain input model will be described in detail.

4.1 Introduction

What is generally required for system identification in practical applications is an input signal in the time domain that guarantees a sufficiently accurate estimate of the system while respecting some amplitude constraints.

As discussed above, input design in the frequency domain does not handle time domain constraints on signals. Another disadvantage is that it does not define how to generate, from the optimal spectrum, the input signal to apply to the real system. Furthermore, input design in the frequency domain does not use the degrees of freedom in the choice of the optimal spectrum, which is generally non-unique. That approach, in fact, only finds an optimal solution that fits the optimal correlation coefficients r_0, . . . , r_{M−1}. All the other possible solutions are not considered.

The idea of input design in the probability domain arises from this observation: a solution in the probability domain makes it easier to generate input trajectories to apply to the real system, by extracting samples from the optimal distribution. In this way, no spectral factorization or realization algorithms are required.

Markov chains having a finite state space could then be used in order to directly include the time domain amplitude constraints into the input model, by suitably choosing the state space. The idea is to use Markov chain distributions to generate binary signals. Binary signals are often used in system identification; one of the reasons is that they achieve the largest power in the set of all signals having the same maximum amplitude, and this improves parameter estimation for linear models.

Also, if the Markov chain has a spectrum of sufficiently high order (and this depends on the state space dimension), there are more degrees of freedom in the choice of the optimal spectrum when designing the optimal probability distribution.

A finite stationary Markov chain is used as the input model for system identification. The probability distribution will be shaped in order to optimize the objective function defined in the input design problem. This function is then minimized with respect to the transition probabilities of the Markov chain, which completely define its distribution [27].

4.2 Markov chain model

This section describes the state space structure of the general Markov chain input model and some of its spectral features.

4.2.1 Markov chain state space

Consider a finite stationary Markov chain having states of the form

(u_{t−n}, u_{t−n+1}, . . . , u_t) (4.1)

where u_i represents the value of the input at time instant i; it can be equal to either u_max or −u_max, where u_max is the maximum tolerable input amplitude, imposed by the real system. This model allows the present value of the input to depend on the last n past values, rather than only on the previous one. Note that at time instant t, the state can transit only to either the state (u_{t−n+1}, u_{t−n+2}, . . . , u_t, u_max) or (u_{t−n+1}, u_{t−n+2}, . . . , u_t, −u_max), with probabilities p_{(u_{t−n},...,u_t)} and 1 − p_{(u_{t−n},...,u_t)}, respectively. Not all transitions between states are possible; therefore the transition matrix will contain several zeros, corresponding to those forbidden state transitions.

The last component of the Markov chain state will generate the binary signal to apply to the real system.

Example 4.2.1 Consider a Markov chain having state space S_2 = {1, −1}. The graph representation is shown in Figure 4.1 and the corresponding transition matrix is

    Π_2 = [ p      1 − p ]
          [ 1 − q  q     ].  (4.2)

Figure 4.1: Graph representation of the two states Markov chain S2.

This simple model generates a binary signal where each sample at time t depends only on the previous value at time t − 1.
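Sampling from this model is straightforward, which is the practical appeal of the approach. A minimal simulation sketch follows; u_max = 1 and the probability values are arbitrary choices for illustration.

```python
import numpy as np

# Generate a binary +/-1 input from the two-state chain of Example 4.2.1,
# with stay-probabilities p (in state 1) and q (in state -1); umax = 1.
rng = np.random.default_rng(0)

def simulate(p, q, N):
    u = np.empty(N)
    u[0] = 1.0
    for t in range(1, N):
        stay = p if u[t - 1] == 1.0 else q
        u[t] = u[t - 1] if rng.random() < stay else -u[t - 1]
    return u

u = simulate(p=0.9, q=0.9, N=50000)
print(set(np.unique(u)))        # {-1.0, 1.0}: the amplitude bound holds by construction
print(abs(np.mean(u)) < 0.1)    # p = q gives (approximately) zero mean
```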

Example 4.2.2 A more general model is the four states Markov chain, with state space S = {(1, 1), (1, −1), (−1, −1), (−1, 1)}. The transition matrix is

    Π = [ p  1 − p  0  0     ]
        [ 0  0      s  1 − s ]
        [ 0  0      q  1 − q ]
        [ r  1 − r  0  0     ]  (4.3)

and the corresponding graph is shown in Figure 4.2.

Figure 4.2: Graph representation of the four states Markov chain S4.

Note that when p = r and s = q the four states Markov chain model is equivalent to the two states Markov chain of Example 4.2.1.


These examples show one of the advantages of the proposed Markov chain input model: each model includes all the models of lower dimension as special cases, for proper choices of the transition probabilities.

4.2.2 Markov chain spectra

This section presents a method to calculate the expression of the Markov chain spectra. Some examples will illustrate the type of signals these models generate.

By means of Markov chain and state-space realization theory (see [27] and [28]), it is possible to derive a general expression for the spectrum of a finite stationary Markov chain s_m having state space S = {S_1, S_2, . . . , S_J}. For the general Markov chains considered in the previous section, each state has the form (4.1) and the number of states is J = 2^{n+1}.

Let Π denote the transition matrix whose elements are the conditional probabilities Π(i, j) = P{s_{m+1} = S_j | s_m = S_i}, and p̄ = (p̄_1 . . . p̄_J) the solution of the linear system p̄ = p̄Π, containing the stationary probabilities p̄_i = P{s_m = S_i}. Consider the states S_i as column vectors. Defining

A_s = ( S_1  . . .  S_J )

and

D_s = diag(p̄_1, . . . , p̄_J),

it is possible to write the correlation coefficients of the output signal in the matrix form

r_k = A_s D_s Π^k A_s^T,   k = 0, 1, 2, . . . (4.4)

For k < 0 the correlation can be obtained by the symmetry condition r_k = r_{−k}, since the process s_m is real.
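Formula (4.4) can be checked numerically on the two-state chain of Example 4.2.1. Using the mean value E[s_m] = (p − q)/(2 − p − q) given later in Example 4.2.3, the covariance r_k − m² decays geometrically with ratio α = p + q − 1, consistent with the first order AR spectrum derived there. The probability values below are arbitrary.

```python
import numpy as np

# Evaluate (4.4) for the two-state chain: A_s = (1, -1),
# D_s = diag(stationary probabilities), r_k = A_s D_s Pi^k A_s^T.
p, q = 0.8, 0.7
Pi = np.array([[p, 1 - p], [1 - q, q]])
pbar = np.array([1 - q, 1 - p]) / (2 - p - q)   # solves pbar = pbar @ Pi
As = np.array([[1.0, -1.0]])
Ds = np.diag(pbar)

def r(k):
    return (As @ Ds @ np.linalg.matrix_power(Pi, k) @ As.T).item()

# Covariance r_k - m^2 decays geometrically with ratio alpha = p + q - 1.
m = (p - q) / (2 - p - q)
alpha = p + q - 1
print(all(abs((r(k) - m**2) - (1 - m**2) * alpha**k) < 1e-12 for k in range(6)))  # True
```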

To calculate the spectrum of s_m as the Fourier transform of the correlation function, note that r_k can be viewed as the impulse response, for k = 1, 2, . . ., of the linear system

    x_{k+1} = Π x_k + Π A_s^T u_k
    y_k     = A_s D_s x_k.  (4.5)

Therefore, the transfer function W(z) = A_s D_s (zI − Π)^{−1} Π A_s^T of the system (4.5) is the Z-transform of the causal part of the correlation function, that is {r_k}_{k=1}^{∞}.


Consequently, W(z^{−1}) is the Z-transform of the anticausal part of the correlation function, {r_k}_{k=−∞}^{−1}.

The spectrum of the Markov chain signal s_m can then be expressed as

Φ_s(z) = W(z) + r_0 + W(z^{−1}). (4.6)

The correlation r_k is in general a matrix function, i.e. for each k, r_k is an n × n matrix. The correlation function of the input signal is given by the sequence obtained for the element in position (n, n): {r_k(n, n)}_{k=0}^{∞}. Then, the input signal spectrum is given by the element in position (n, n) of the matrix Φ_s(z).

Consider now the two Markov chains in Examples 4.2.1 and 4.2.2 of the previous section.

Example 4.2.3 By calculating the transfer function W(z) and the autocorrelation r_0, the following expression for the spectrum of the Markov chain of Figure 4.1 is obtained:

Φ_s(z) = (1 + α)(1 − γ) / [(z − α)(z^{−1} − α)], (4.7)

where α = p + q − 1 ∈ [−1, 1] and γ = [(p + q − 1)(p + q − 2) − (p − q)²] / (p + q − 2).

Notice that (4.7) is the spectrum of a first order AR process. This also means that it is possible to generate any first order AR process through a two states Markov chain.

The mean value of s_m is E[s_m] = (p − q)/(2 − p − q). By forcing p = q the mean value would be zero and, since γ = α, the spectrum would depend on only one parameter. If p = q the variance of the Markov chain is 1.

Example 4.2.4 Consider the Example 4.2.2 of the previous section, where the mean value of the Markov chain is set to zero. Then s = r(1 − q)/(1 − p) and the stationary probabilities are

p̄ = (1/Z) ( r(1 − q)   (1 − p)(1 − q)   r(1 − q)   (1 − p)(1 − q) ),   Z = 2(1 − p)(1 − q) + 2r(1 − q).

By definition,

    A_s = [ 1   1   −1  −1 ]
          [ 1  −1   −1   1 ].

However, only the second component of the Markov chain is of interest, since it will represent the input signal. Therefore, the correlation r_k of the input process can be calculated using the vector Ā_s = ( 1  −1  −1  1 ) instead of A_s. The analytic calculation of the spectrum turns out to be too involved, so it has only been evaluated numerically. Some values of poles and zeros of the canonical spectral factor¹ are reported in Table 4.1. These data show that it is not possible to model s_m by an AR process, because in general there are also non null zeros in the spectrum. The four states Markov chain has a higher order spectrum than the two states Markov chain of Example 4.2.3; the number of poles and zeros depends on the values of the probabilities p, r and q and can be up to eight. For some values of the probabilities p, r and q, there are zero-pole cancellations that reduce the spectrum order; in particular, when p = q = r the spectrum has the same simple structure obtained for the previous case (see Table 4.1), as has already been shown in the previous section.

A particular choice of the transition matrix for this Markov chain is

    Π_4 = [ p  1 − p  0  0     ]
          [ 0  0      r  1 − r ]
          [ 0  0      p  1 − p ]
          [ r  1 − r  0  0     ]  (4.8)

This choice of the transition matrix makes the Markov chain symmetric, in the sense that the transition probabilities are invariant with respect to sign changes of the state components.

Even if the input is designed in the probability domain, the spectrum is shaped by the choice of n and of the transition probabilities of the Markov chain.

Notice that Φ_u is only subject to (4.6) and no other structural constraints are imposed, except for the constraint that the transition probabilities have to lie in [0, 1]. For the spectrum parametrizations described in Section 3.3 this does not happen. For example, if the FIR representation of the spectrum is used, the finite dimensional parametrization forces the positive real part of the spectrum to be an FIR system. The Markov chain signals have spectra whose poles and zeros are related to each other, since the number of free parameters in the problem is J (the transition probabilities), equal to the number of poles of W(z). However, the positive real part of the spectrum is not forced to be FIR. From these observations it is possible to conclude that, looking at input design using Markov chains in the frequency domain, this

¹By canonical spectral factor is meant the analytic and minimum phase function L(z) such that Φ_s(z) = L(z) L(z^{−1}).
