
Frequency Domain Identification of

Continuous-Time ARMA Models:

Interpolation and Non-uniform Sampling

Jonas Gillberg,

Lennart Ljung

Division of Automatic Control

Department of Electrical Engineering

Linköpings universitet, SE-581 83 Linköping, Sweden

WWW: http://www.control.isy.liu.se

E-mail: gillberg@isy.liu.se, ljung@isy.liu.se

27th September 2004

AUTOMATIC CONTROL
COMMUNICATION SYSTEMS
LINKÖPING

Report no.:

LiTH-ISY-R-2625

Submitted to

Technical reports from the Control & Communication group in Linköping are available at http://www.control.isy.liu.se/publications.


Abstract

This paper discusses how to estimate irregularly sampled continuous-time ARMA models in the frequency domain. In the process, the model output signal is assumed to be piecewise constant or piecewise linear, and an approximation of the continuous-time Fourier transform is calculated. ML estimation in the frequency domain is then used to obtain parameter estimates.


1 Introduction

Approaching parameter estimation from the discrete-time domain has been the dominating paradigm in system identification, partly because the mathematics is straightforward and partly because the methods are easily implemented on digital computers [8]. In the black-box discrete-time framework the identified model parameters often lack physical relevance. This is not a problem in some applications, for instance the synthesis of regulators. Modeling of physical systems, on the other hand, is often performed in continuous time, and in many applications there is a genuine interest in the physical parameters.

Uniform sampling has also been a standard assumption: a single sensor delivers measurements at a constant rate. With the advent of networked asynchronous sensors this assumption has changed. In fields such as economics and finance uniform sampling might not be practically possible.

One route to obtain continuous-time parameters is via identification of a discrete-time model. This so-called indirect approach has proved to be sensitive to the size of the sampling interval. Numerical ill-conditioning can occur for very small sampling intervals, since the discrete-time poles gather around 1 in the complex plane [7].

1.1 Problem Formulation

In this report we take the so-called direct approach, where the continuous-time parameters are identified directly from sampled data. This approach has been used in the time domain for non-uniformly sampled continuous-time ARMA (CARMA) models [3]. We discuss how to estimate an irregularly sampled continuous-time ARMA model in the frequency domain. In the process, the model output is assumed to be piecewise constant or piecewise linear, and an approximation of the continuous-time Fourier transform is calculated. ML estimation in the frequency domain is then used to obtain parameter estimates. Rules of thumb concerning the model bias and variance are derived and, most importantly, used to select the frequencies to be used in estimation.

1.2 Outline

The report is structured as follows. In section 1 we introduce the CARMA model together with basic properties of continuous-time stationary processes. In section 2 we address the topic of spectral estimation and the use of interpolation to approximate the continuous-time Fourier transform. We also treat the effects of small sampling intervals on the spectral bias. After that, in sections 4 and 5, we describe the maximum-likelihood approach in the time domain and, most importantly, in the frequency domain. In section 6 expressions relating the spectral bias to the parameter bias are derived; asymptotic expressions for the variance can also be found there. These expressions are used in section 7 to provide users with rules of thumb on which frequencies to choose. Finally, the preceding arguments are exemplified by numerical experiments in section 8. Conclusions, further research and acknowledgments follow in sections 9, 10 and 11.


2 Modeling and Simulation

In this section the basic model and data setup are introduced. First we present the continuous-time autoregressive moving average (CARMA) model in an informal manner. Then we introduce the model more formally in state-space form. We also explain how this stochastic continuous-time dynamical system is simulated and how measurements are taken.

2.1 Continuous-Time ARMA Model

The continuous-time autoregressive moving average (CARMA) model can informally be described as

$$y_t = \sigma \frac{B(p)}{A(p)} e_t \qquad (1)$$

where $e_t$ is continuous-time white noise. This means that $e_t\,dt = dw_t$, where $w_t$ is a continuous-time Wiener process with

$$E[dw_t] = 0, \qquad E[dw_t\,dw_s] = \begin{cases} dt & t = s \\ 0 & t \neq s. \end{cases}$$

The operator $p$ is here the differentiation operator, while

$$A(p) = p^n + a_1 p^{n-1} + a_2 p^{n-2} + \dots + a_n, \qquad B(p) = p^m + b_1 p^{m-1} + \dots + b_m.$$

We denote the vector of parameters by $\theta = [a_1\; a_2\; \dots\; a_n\; b_1\; b_2\; \dots\; b_{n-1}\; \lambda]^T$, where $\lambda = \sigma^2$.

2.2 State-Space Representation

The model in (1) can be formally represented in state-space controller canonical form

$$dx_t = A x_t\,dt + B\,de_t, \qquad y_t = C x_t \qquad (2)$$

where

$$A = \begin{pmatrix} -a_1 & -a_2 & \dots & -a_{n-1} & -a_n \\ 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & \dots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} \sigma & 0 & 0 & \dots & 0 \end{pmatrix}^T, \qquad C = \begin{pmatrix} 1 & b_1 & b_2 & \dots & b_m \end{pmatrix}.$$

The exact solution to the state-space representation in (2) can be written as

$$y_t = C e^{At} x_0 + \int_0^t C e^{A(t-\tau)} B\,de_\tau \qquad (3)$$

where the integral is in the sense of Itô [5]. If $y_t$ is to be a zero-mean stationary Gaussian process, the initial value $x_0$ must be Gaussian and distributed such that $E[x_0] = 0$ and $Q_0 = \mathrm{Cov}[x_0]$ satisfies the Lyapunov equation

$$A Q_0 + Q_0 A^T + \sigma^2 B B^T = 0.$$

For the rest of this report we will assume that the process $y_t$ is stationary Gaussian with zero mean [2].
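The stationarity condition above is easy to realize numerically. The sketch below (the helper names are ours, not from the report) builds the controller canonical form of (2) and solves the Lyapunov equation for $Q_0$ with SciPy; as one simplifying assumption, we keep the noise intensity $\sigma^2$ as an explicit factor and take the input vector as $[1\;0\;\dots\;0]^T$:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov  # solves A X + X A^T = Q

def carma_state_space(a, b):
    """Controller canonical form of (2) for monic A(p) (coeffs a) and B(p) (coeffs b)."""
    n = len(a)
    A = np.zeros((n, n))
    A[0, :] = -np.asarray(a, dtype=float)
    A[1:, :-1] = np.eye(n - 1)
    B = np.zeros((n, 1))
    B[0, 0] = 1.0               # noise gain kept as sigma^2 in the Lyapunov equation
    C = np.zeros((1, n))
    C[0, 0] = 1.0
    C[0, 1:1 + len(b)] = b
    return A, B, C

def stationary_covariance(A, B, sigma2):
    """Solve A Q0 + Q0 A^T + sigma^2 B B^T = 0 for the stationary state covariance."""
    return solve_continuous_lyapunov(A, -sigma2 * (B @ B.T))
```

With $Q_0$ in hand, $r(\tau) = C e^{A\tau} Q_0 C^T$ for $\tau \ge 0$ gives the covariance function used below.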

Finally, we repeat the definitions of the covariance function

$$r(\tau) = E[y_{t+\tau} y_t]$$

and the power spectrum

$$\Phi(i\omega) = \sigma^2 \frac{B(i\omega)B(-i\omega)}{A(i\omega)A(-i\omega)}$$

of the stationary process modeled in (1). Moving between the spectrum $\Phi(i\omega)$ and covariance $r(\tau)$ representations of the process is facilitated by the well-known formulas

$$\Phi(i\omega) = \int_{-\infty}^{\infty} r(\tau) e^{-i\omega\tau}\,d\tau \quad \text{and} \quad r(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi(i\omega) e^{i\omega\tau}\,d\omega. \qquad (4)$$

Assume that we only observe the process $y_t$ at the discrete time instances $t = kT_s$. Then the covariance function of the discrete-time process $y_{kT_s}$ will be

$$r_d(k) = r(kT_s)$$

and the discrete-time power spectrum will be

$$\Phi_d(z) = T_s \sum_{k=-\infty}^{\infty} r_d(k) z^{-k}.$$

Finally, the following relationship, similar to the Poisson summation formula, exists between the discrete-time and continuous-time spectra [10]:

$$\Phi_d(z) = \sum_{k=-\infty}^{\infty} \Phi\left(\frac{\log z + i 2\pi k}{T_s}\right).$$

It can also be written as

$$\Phi_d(e^{i\omega T_s}) = \sum_{k=-\infty}^{\infty} \Phi\left(i\omega + i\frac{2\pi k}{T_s}\right). \qquad (5)$$
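The folding relation (5) can be evaluated numerically by truncating the infinite sum. A minimal sketch (the function names are ours):

```python
import numpy as np

def carma_spectrum(omega, a, b, sigma):
    """Phi(i w) = sigma^2 |B(iw)|^2 / |A(iw)|^2 with monic A(p) and B(p)."""
    w = np.asarray(omega, dtype=float)
    A = np.polyval([1.0] + list(a), 1j * w)
    B = np.polyval([1.0] + list(b), 1j * w)
    return sigma**2 * np.abs(B)**2 / np.abs(A)**2

def folded_spectrum(omega, a, b, sigma, Ts, K=200):
    """Truncated right-hand side of (5): sum over |k| <= K of Phi(i(w + 2 pi k / Ts))."""
    w = np.asarray(omega, dtype=float)
    total = np.zeros_like(w)
    for k in range(-K, K + 1):
        total += carma_spectrum(w + 2 * np.pi * k / Ts, a, b, sigma)
    return total
```

A quick sanity check of the relation is that the folded sum is $2\pi/T_s$-periodic in $\omega$ and dominates $\Phi(i\omega)$, since all folded terms are non-negative.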


3 Estimation of Power Spectrum

As mentioned earlier, the parameter estimation method is divided into two stages. First the power spectrum is estimated, and in a second stage the spectrum is used to estimate the model parameters. In the first stage we resort to interpolation in order to obtain an approximation of a continuous-time realization of the output. It turns out that this approach also interpolates the covariance function in two dimensions. The Fourier transform of the interpolated realization can then be computed and an estimate of the spectrum can be formed. The bias in this estimate can then be assessed.

3.1 The Fourier Transform

Let us define the Fourier transform of the continuous-time output $\{y_t : t \in [0, T]\}$ as

$$Y_T(i\omega) = \frac{1}{\sqrt{T}} \int_0^T y_t e^{-i\omega t}\,dt. \qquad (6)$$

The values obtained for $\omega = \frac{2\pi k}{T}$, $k = -\infty, \dots, -1, 0, 1, \dots, \infty$ are proportional to the familiar coefficients of the Fourier series. Since $y_t$ is a Gaussian process and $Y_T(i\omega)$ is a linear combination of $y_t$, it will be Gaussian too.
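For intuition, (6) can be evaluated by quadrature on a densely sampled realization. The helper below is an illustrative sketch (not from the report); for $y(t) = \cos(\omega_0 t)$ with $\omega_0 = 2\pi k/T$ a Fourier frequency, one gets $Y_T(i\omega_0) = \sqrt{T}/2$.

```python
import numpy as np

def fourier_transform(omega, t_grid, y_grid):
    """Y_T(iw) = (1/sqrt(T)) * int_0^T y(t) e^{-i w t} dt, by the trapezoidal rule."""
    T = t_grid[-1] - t_grid[0]
    return np.trapz(y_grid * np.exp(-1j * omega * t_grid), t_grid) / np.sqrt(T)
```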

3.2 Interpolating the Continuous-Time Realization

A complicating element in the estimation procedure is that we do not have access to the entire continuous-time realization of the output. Instead we have, as we pointed out earlier, a finite number of samples of the continuous output $y_t$ at time instances $\{t_1, t_2, \dots, t_N\}$. Therefore it is in some way necessary to approximate or reconstruct the continuous-time realization. In this report we reconstruct the output as

$$\hat{y}_{T,k}(t) = \sum_{i=1}^{N} y_{t_i}\, \varphi_i^k(t - t_i) \qquad (7)$$

where $T$ is the time the process is observed, $k \in \{-1, 0, 1\}$ is the order of interpolation, and the $\varphi_i^k$ are the interpolation kernels. Three types of kernels are used in this report. First we introduce

$$\varphi_i^{-1}(t) = (t_{i+1} - t_i)\,\delta(t - t_i) \qquad (8)$$

which we term "Riemann interpolation". Next we introduce the piecewise-constant interpolation

$$\varphi_i^0(t) = \begin{cases} 0 & t < t_i \\ 1 & t_i \le t \le t_{i+1} \\ 0 & t_{i+1} < t \end{cases}$$

which often goes under the name zero-order hold (ZOH). Finally we have the piecewise-linear interpolation

$$\varphi_i^1(t) = \begin{cases} 0 & t < t_{i-1} \\ \frac{t - t_{i-1}}{t_i - t_{i-1}} & t_{i-1} \le t < t_i \\ \frac{t_{i+1} - t}{t_{i+1} - t_i} & t_i \le t < t_{i+1} \\ 0 & t_{i+1} \le t \end{cases}$$

which is usually termed first-order hold (FOH).
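The two hold reconstructions are easy to realize numerically from non-uniform samples. A minimal sketch (the function name is ours, not from the report):

```python
import numpy as np

def interpolate_output(t_grid, t, y, order):
    """Reconstruct y on t_grid from samples (t, y) in the spirit of (7):
    order 0 = zero-order hold (ZOH), order 1 = first-order hold (FOH)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    if order == 0:
        # hold the most recent sample (left-continuous step function)
        idx = np.clip(np.searchsorted(t, t_grid, side='right') - 1, 0, len(t) - 1)
        return y[idx]
    if order == 1:
        # piecewise-linear interpolation between neighboring samples
        return np.interp(t_grid, t, y)
    raise ValueError("order must be 0 (ZOH) or 1 (FOH)")
```

Both reconstructions reproduce the data exactly at the sampling instants, which is the defining property of the kernels above.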

3.3 Transforming the Interpolated Output

From the interpolated output it is possible to compute the Fourier transform

$$\hat{Y}_T(i\omega) = \frac{1}{\sqrt{T}} \int_0^T \hat{y}_{T,k}(t)\,u(t)\,e^{-i\omega t}\,dt \qquad (9)$$

where $u(t)$ will be called the sampling function. If we sample in the discrete points $t_k$, then

$$u(t) = \sum_{k=-\infty}^{\infty} \delta(t - t_k).$$

If we sample uniformly at $t_k = kT_s$ we get

$$u(t) = \sum_{k=-\infty}^{\infty} \delta(t - kT_s)$$

and if we sample continuously

$$u(t) = 1.$$

Since multiplication in the time domain is equivalent to convolution in the frequency domain, the transform will be

$$\hat{Y}_T(i\omega) = \int_{-\infty}^{\infty} \hat{Y}(i(\omega - v))\,K(iv)\,dv$$

where

$$K(i\omega) = \int_{-\infty}^{\infty} u(t)\,e^{-i\omega t}\,dt \quad \text{and} \quad \hat{Y}(i\omega) = \int_{-\infty}^{\infty} \hat{y}(t)\,e^{-i\omega t}\,dt.$$

If we sample the interpolated signal equidistantly we will get

$$K(i\omega) = \sum_{k=-\infty}^{\infty} \delta\left(\omega - \frac{2\pi}{T_s} k\right)$$

and

$$\hat{Y}_T(i\omega) = \sum_{k=-\infty}^{\infty} \hat{Y}\left(i\left(\omega - \frac{2\pi}{T_s} k\right)\right).$$

From this we see that the only effect of re-sampling is that the transform is folded around the sampling frequency.

3.4 Fourier Transforms for Continuous Sampling of Interpolated Output

If we sample the interpolated output $\hat{y}(t)$ continuously, the "Riemann interpolation" case will yield the transform

$$\hat{Y}_T^{(-1)}(i\omega) = \frac{1}{\sqrt{T}} \sum_{k=1}^{N_t - 1} (t_{k+1} - t_k)\,y(t_k)\,e^{-i\omega t_k}.$$

In the piecewise-constant case it will be

$$\hat{Y}_T^{(0)}(i\omega) = \frac{1}{\sqrt{T}}\,\frac{1}{i\omega} \sum_{k=1}^{N-1} y(t_k)\left(e^{-i\omega t_k} - e^{-i\omega t_{k+1}}\right)$$

while in the piecewise-linear case we have

$$\hat{Y}_T^{(1)}(i\omega) = \frac{1}{\sqrt{T}}\,\frac{1}{i\omega}\left(y(t_1)\,e^{-i\omega t_1} - y(t_N)\,e^{-i\omega t_N}\right) + \frac{1}{\sqrt{T}}\,\frac{1}{(i\omega)^2} \sum_{k=1}^{N-1} \frac{y(t_{k+1}) - y(t_k)}{t_{k+1} - t_k}\left(e^{-i\omega t_k} - e^{-i\omega t_{k+1}}\right).$$

This might seem an awkward and expensive way of computing the Fourier transform. If we do this on-line, however, we can restrict ourselves to a time window of size $T$ and just remove old data as new samples arrive.
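These closed-form sums are direct to implement for non-uniform sampling instants. A sketch, assuming the $e^{-i\omega t}$ sign convention and the $1/\sqrt{T}$ normalization of (6) (function names are ours):

```python
import numpy as np

def ft_riemann(omega, t, y, T):
    """Riemann-interpolation transform: sum of (t_{k+1}-t_k) y(t_k) e^{-i w t_k}."""
    dt = np.diff(t)
    return (dt * y[:-1] * np.exp(-1j * omega * t[:-1])).sum() / np.sqrt(T)

def ft_zoh(omega, t, y, T):
    """Exact Fourier transform of the zero-order-hold interpolant."""
    e = np.exp(-1j * omega * t)
    return (y[:-1] * (e[:-1] - e[1:])).sum() / (1j * omega * np.sqrt(T))

def ft_foh(omega, t, y, T):
    """Exact Fourier transform of the first-order-hold (piecewise-linear) interpolant."""
    e = np.exp(-1j * omega * t)
    slope = np.diff(y) / np.diff(t)
    boundary = (y[0] * e[0] - y[-1] * e[-1]) / (1j * omega)
    interior = (slope * (e[:-1] - e[1:])).sum() / (1j * omega) ** 2
    return (boundary + interior) / np.sqrt(T)
```

The ZOH and FOH expressions can be checked against brute-force quadrature of the corresponding interpolants, since they are exact integrals of piecewise-constant and piecewise-linear signals.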

3.5 Spectrum Estimates for Uniform Sampling of Output

In the case of uniform sampling, $t_k = kT_s$, the Fourier transform of the interpolated data will be

$$\hat{Y}_T^{(k)}(i\omega) = \frac{1}{\sqrt{T}} \int_0^T \sum_{i=1}^{N} y_{iT_s}\,\varphi_i^k(t - iT_s)\,e^{-i\omega t}\,dt \qquad (10)$$
$$= F_k(i\omega)\,\sqrt{\frac{T_s}{N_t}} \sum_{k=1}^{N_t} y_{kT_s}\,e^{-i\omega k T_s} \qquad (11)$$

where

$$F_k(i\omega) = \frac{1}{T_s} \int_{-\infty}^{\infty} \varphi^k(t)\,e^{-i\omega t}\,dt.$$

The "Riemann interpolation" will yield the ordinary discrete-time Fourier transform

$$\hat{Y}_T^{(-1)}(i\omega) = \sqrt{\frac{T_s}{N_t}} \sum_{k=1}^{N_t - 1} y(kT_s)\,e^{-i\omega k T_s}$$

which is related to the discrete-time power spectrum by

$$E\hat{\hat{\Phi}}(i\omega) = \lim_{T \to \infty} E\left|\hat{Y}_T^{(-1)}(i\omega)\right|^2 = \Phi_d(e^{i\omega T_s}) = \sum_{k=-\infty}^{\infty} \Phi\left(i\omega + i\frac{2\pi}{T_s} k\right).$$

Zero-order hold and first-order hold sampling, on the other hand, will provide filtered versions of the discrete-time Fourier transform, since

$$\hat{F}_0(i\omega) = \mathrm{sinc}\left(\frac{\omega T_s}{2}\right), \qquad \hat{F}_1(i\omega) = \mathrm{sinc}^2\left(\frac{\omega T_s}{2}\right)$$

and hence

$$E\hat{\hat{\Phi}}_{ZOH}(i\omega) = \mathrm{sinc}^2\left(\frac{\omega T_s}{2}\right) \sum_{k=-\infty}^{\infty} \Phi\left(i\omega + i\frac{2\pi}{T_s} k\right), \qquad E\hat{\hat{\Phi}}_{FOH}(i\omega) = \mathrm{sinc}^4\left(\frac{\omega T_s}{2}\right) \sum_{k=-\infty}^{\infty} \Phi\left(i\omega + i\frac{2\pi}{T_s} k\right).$$

The objective of the interpolation is to make the estimate consistent with the continuous-time power spectrum. Therefore we would like to remove the effects of folding. In view of what has been said earlier about interpolation and filtering, we realize that the optimal interpolation kernel $\varphi^{OPT}$ should have a power spectrum

$$\left|F^{OPT}(i\omega)\right|^2 = \frac{\Phi(i\omega)}{\sum_{k=-\infty}^{\infty} \Phi\left(i\omega + i\frac{2\pi}{T_s} k\right)} \qquad (12)$$

where $\Phi$ is the power spectrum for the true parameters. In Figure 1 we have compared the spectrum of the optimal kernel with ZOH and FOH. From this perspective we notice that FOH, as an approximation of the optimal kernel, seems to be worse than ZOH. This will also be confirmed by experiments later on.
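The comparison in Figure 1 can be reproduced from these expressions. A sketch for the CAR(2) example $A(p) = p^2 + 2p + 1$, with the folded sum in (12) truncated (function names are ours):

```python
import numpy as np

def car_spectrum(omega, a, sigma):
    """Phi(iw) = sigma^2 / |A(iw)|^2 for a CAR model with monic A(p)."""
    A = np.polyval([1.0] + list(a), 1j * np.asarray(omega, dtype=float))
    return sigma**2 / np.abs(A)**2

def kernel_spectra(omega, a, sigma, Ts, K=200):
    """|F_ZOH|^2 = sinc^2(w Ts / 2), |F_FOH|^2 = sinc^4(w Ts / 2), and the
    optimal kernel spectrum of (12). Note np.sinc(x) = sin(pi x) / (pi x)."""
    s = np.sinc(omega * Ts / (2 * np.pi))
    zoh, foh = s**2, s**4
    folded = sum(car_spectrum(omega + 2 * np.pi * k / Ts, a, sigma)
                 for k in range(-K, K + 1))
    opt = car_spectrum(omega, a, sigma) / folded
    return opt, zoh, foh
```

Since the $k = 0$ term appears in the denominator of (12), the optimal kernel spectrum always lies between 0 and 1, and at $\omega = 0$ it is close to 1 for this well-damped example.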

3.6 Approximate Covariance Function

The approximation $\hat{y}_{T,k}(t)$ of a realization of the stationary process $y_t$ will have the following second-order properties:

$$\hat{m}_{T,k}(t) \triangleq E[\hat{y}(t)] = 0$$
$$\hat{r}_{T,k}(t, s) \triangleq E[\hat{y}(t) - \hat{m}(t)][\hat{y}(s) - \hat{m}(s)] \qquad (13)$$
$$= \sum_{i=1}^{N} \sum_{j=1}^{N} r(t_i - t_j)\,\varphi_i^k(t)\,\varphi_j^k(s) \qquad (14)$$

Figure 1: Kernel spectra $|F^{OPT}|^2$, $|F^{ZOH}|^2$ and $|F^{FOH}|^2$ for $T_s = 0.1$. The optimal spectrum is for a second-order CAR model where $A(p) = p^2 + 2p + 1$, $B(p) = 1$ and $\sigma = 1$.

where $r$ is the covariance function defined in (4).

An interesting question is how well $\hat{r}_{T,k}$ approximates $r$. From the representation in (14) we see that piecewise-constant interpolation of $y(t)$ will result in a piecewise-constant interpolation of $r(t - s)$. Piecewise-linear interpolation of $y(t)$, on the other hand, will result in a piecewise-bilinear interpolation of $r(t - s)$.

3.6.1 Interpolation in Two Dimensions

In the following lemma we explain how we can bound the interpolation error.

Lemma 3.1 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a continuous, once differentiable function with bounded first derivatives. Assume that the function is interpolated by a constant function $\hat{f} : \mathbb{R}^2 \to \mathbb{R}$ in a rectangle $\Omega_{ij} = \{x, y : 0 \le x \le h_i,\; 0 \le y \le h_j\}$. Then we have, for some constant $C > 0$,

$$\max_{(x,y) \in \Omega_{ij}} |f(x, y) - \hat{f}(x, y)| \le C h_{max}. \qquad (15)$$

If we interpolate $f$ with a bilinear polynomial and assume it is twice differentiable with bounded second derivatives, we get for some $C > 0$

$$\max_{(x,y) \in \Omega_{ij}} |f(x, y) - \hat{f}(x, y)| \le C h_{max}^2. \qquad (16)$$

Proof: Both results come from Taylor's theorem. In the case of constant interpolation we have

$$f(x, y) - f(0, 0) = f_x(0, 0)x + f_y(0, 0)y + O(x^2 + y^2). \qquad (17)$$

In the bilinear case we have the interpolant

$$\hat{f}(x, y) = f(0, 0) + \frac{f(h_1, 0) - f(0, 0)}{h_1}x + \frac{f(0, h_2) - f(0, 0)}{h_2}y + \frac{f(h_1, h_2) - f(h_1, 0) - f(0, h_2) + f(0, 0)}{h_1 h_2}xy. \qquad (18)$$

Define $\tilde{f}(x, y) = f(x, y) - \hat{f}(x, y)$. Then

$$\tilde{f}(x, y) = \tilde{f}(0, 0) + \tilde{f}_x(0, 0)x + \tilde{f}_y(0, 0)y + \frac{1}{2}\begin{pmatrix} x \\ y \end{pmatrix}^T \begin{pmatrix} \tilde{f}_{xx}(0, 0) & \tilde{f}_{xy}(0, 0) \\ \tilde{f}_{yx}(0, 0) & \tilde{f}_{yy}(0, 0) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + O\left(\sqrt{x^2 + y^2}^{\,3}\right). \qquad (20)$$

Now

$$\tilde{f}(0, 0) = 0 \qquad (22)$$
$$\tilde{f}_x(0, 0) = f_x(0, 0) - \frac{f(h_1, 0) - f(0, 0)}{h_1} = O(h_1) \qquad (23)$$
$$\tilde{f}_y(0, 0) = O(h_2). \qquad (24)$$

Similarly we have

$$\tilde{f}_{xx}(0, 0) = f_{xx}(0, 0) \qquad (25)$$
$$\tilde{f}_{yy}(0, 0) = f_{yy}(0, 0) \qquad (26)$$
$$\tilde{f}_{xy}(0, 0) = f_{xy}(0, 0) - \frac{f(h_1, h_2) - f(h_1, 0) - f(0, h_2) + f(0, 0)}{h_1 h_2} = \frac{O\left(\sqrt{h_1^2 + h_2^2}^{\,3}\right)}{h_1 h_2}. \qquad (27)$$

Thus we have

$$\tilde{f}(x, y) = O(h_1)x + O(h_2)y + \frac{1}{2}\begin{pmatrix} x \\ y \end{pmatrix}^T \begin{pmatrix} f_{xx}(0, 0) & \frac{O(\sqrt{h_1^2 + h_2^2}^{\,3})}{h_1 h_2} \\ \frac{O(\sqrt{h_1^2 + h_2^2}^{\,3})}{h_1 h_2} & f_{yy}(0, 0) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + O\left(\sqrt{x^2 + y^2}^{\,3}\right). \qquad (29)$$

From this we get the inequality

$$|\tilde{f}(x, y)| \le \frac{1}{2}|f_{xx}(0, 0)|\,h_1^2 + \frac{1}{2}|f_{yy}(0, 0)|\,h_2^2 + O\left(\sqrt{h_1^2 + h_2^2}^{\,3}\right) \le \frac{|f_{xx}(0, 0)| + |f_{yy}(0, 0)|}{2}\,h_{max}^2 + O(h_{max}^3). \qquad (31)$$

This result can be used to bound the covariance function approximation error and its contribution to the bias in the periodogram.

3.6.2 Covariance Function Interpolation Error

From Lemma 3.1 we can now prove that the interpolation error in the whole area $\Omega = \{t, s : 0 \le t \le T,\; 0 \le s \le T\}$ is bounded. Let

$$h_{max} = \max_{1 \le i \le N_t - 1} (t_{i+1} - t_i). \qquad (33)$$

Corollary 3.1 Assume that $r$ is bounded, continuous and has bounded first derivatives. Let $\hat{r}_{T,0}$ be defined as in (13). Then

$$\max_{t,s \in [0,T]} \left| r(t - s) - \hat{r}_{T,0}(t, s) \right| \le C h_{max}. \qquad (34)$$

Proof: Follows directly from Lemma 3.1.

In the case of piecewise-linear interpolation of $y_t$ there will be a bilinear interpolation of $r(t - s)$, where the approximation error is given by the following lemma.

Lemma 3.2 Assume that $r$ is bounded, continuous and has bounded second derivatives. Let $\hat{r}_{T,1}$ be defined as in (13). Then

$$\max_{t,s \in [0,T]} \left| r(t - s) - \hat{r}_{T,1}(t, s) \right| \le C h_{max}^2. \qquad (35)$$

Proof: Follows directly from Lemma 3.1.

We will now continue to treat the bias in the periodogram.

3.7 Periodogram Bias

From (6) and (7) we can compute the approximate periodogram, which we denote

$$\hat{\hat{\Phi}}_{T,k}(i\omega) = \left| \hat{Y}_T^k(i\omega) \right|^2. \qquad (36)$$

The periodogram, which is an estimate of the power spectrum, will be biased due to interpolation and leakage effects. This bias, as we will see later on, translates into bias in the parameter estimates.

3.7.1 Interpolation Contribution to the Periodogram Bias

The following expression relates the bias in the approximate periodogram to the error from the interpolation of the covariance function.

Lemma 3.3 Let $y(t)$ be a stationary stochastic process. Let $\hat{\hat{\Phi}}_T(i\omega)$ be defined by (51) and let $\hat{\hat{\Phi}}_{T,k}(i\omega)$ be defined by (36). Then, for

$$\Delta\hat{\Phi}_1 = \left| E\left[ \hat{\hat{\Phi}}_{T,k}(i\omega) - \hat{\hat{\Phi}}_T(i\omega) \right] \right| \qquad (37)$$

we have

$$\Delta\hat{\Phi}_1 \le C_1 \max_{t,s \in [0,T]} \left| r(t - s) - \hat{r}_{T,k}(t, s) \right|. \qquad (38)$$

Proof: From the definitions we get

$$\Delta\hat{\Phi}_1 = \left| E|\hat{Y}_T^k(i\omega)|^2 - E|Y_T(i\omega)|^2 \right| = \left| \frac{1}{T} \int_0^T \int_0^T \left( \hat{r}_{T,k}(t, s) - r(t - s) \right) e^{-i\omega(t - s)}\,dt\,ds \right| \le \frac{1}{T} \int_0^T \int_0^T \left| \hat{r}_{T,k}(t, s) - r(t - s) \right|\,dt\,ds.$$


Let $\Delta r(t, s) = \left| \hat{r}_{T,k}(t, s) - r(t - s) \right|$ and make a change of variables. Then we get

$$\frac{1}{T} \int_0^T \int_0^T \Delta r(t, s)\,dt\,ds = \frac{1}{T} \int_0^T \int_{-t}^{T-t} \Delta r(t, t + \tau)\,d\tau\,dt$$
$$= \frac{1}{T} \int_0^T \int_{\max\{-\alpha, -t\}}^{\min\{\alpha, T-t\}} \Delta r(t, t + \tau)\,d\tau\,dt + \frac{1}{T} \int_\alpha^T \int_{-t}^{-\alpha} \Delta r(t, t + \tau)\,d\tau\,dt + \frac{1}{T} \int_0^{T-\alpha} \int_\alpha^{T-t} \Delta r(t, t + \tau)\,d\tau\,dt.$$

These terms can be bounded separately. For the first term we have

$$\frac{1}{T} \int_0^T \int_{\max\{-\alpha, -t\}}^{\min\{\alpha, T-t\}} \Delta r(t, t + \tau)\,d\tau\,dt \le \frac{1}{T} \int_0^T \int_{\max\{-\alpha, -t\}}^{\min\{\alpha, T-t\}} C h_{max}^{k+1}\,d\tau\,dt \le 2\alpha C h_{max}^{k+1}.$$

For some $\alpha > 0$ and $\lambda > 0$ we have

$$\Delta r(t, t + \tau) \le |r(\tau)| + |\hat{r}(t, t + \tau)| < e^{-\lambda|\tau|}.$$

The second term can then be bounded the following way:

$$\frac{1}{T} \int_\alpha^T \int_{-t}^{-\alpha} \Delta r(t, t + \tau)\,d\tau\,dt \le \frac{1}{T} \int_\alpha^T \int_{-t}^{-\alpha} e^{-\lambda|\tau|}\,d\tau\,dt \le \int_{-\infty}^{-\alpha} e^{-\lambda|\tau|}\,d\tau,$$

and similarly for the third term. We then choose $\alpha > 0$ such that

$$\alpha C h_{max}^{k+1} \ge \int_{-\infty}^{-\alpha} e^{-\lambda|\tau|}\,d\tau$$

and we have our result.

This is the first part of the bias. The second part comes from leakage.

3.7.2 Leakage Contribution to Bias

Since we only observe the process during a finite time interval $[0, T]$, the expected periodogram and the spectrum will be slightly different. The following lemma by Ljung [4] quantifies this difference.

Lemma 3.4 Let $y(t)$ be a stationary stochastic process with spectrum $\Phi$ and let $\hat{\hat{\Phi}}_T(i\omega)$ be defined by (51). Then

$$\Delta\hat{\Phi}_2 = \left| E\hat{\hat{\Phi}}_T(i\omega) - \Phi(i\omega) \right| \le \frac{C_2}{T}. \qquad (39)$$

Proof: Lemma 6.1 in [4]; the argument is as follows.

$$E\,Y_T(i\omega)\,Y_T(-i\xi) = \frac{1}{T} \int_0^T \int_0^T E\,y(t)\,y(s)\,e^{-i(\omega t - \xi s)}\,ds\,dt = \frac{1}{T} \int_0^T \int_0^T r(t - s)\,e^{-i(\omega t - \xi s)}\,ds\,dt = \frac{1}{T} \int_0^T e^{-i(\omega - \xi)t} \int_{t-T}^{t} r(\tau)\,e^{-i\xi\tau}\,d\tau\,dt.$$

Now

$$\int_{t-T}^{t} r(\tau)\,e^{-i\xi\tau}\,d\tau = \Phi(i\xi) - \int_{-\infty}^{t-T} r(\tau)\,e^{-i\xi\tau}\,d\tau - \int_{t}^{\infty} r(\tau)\,e^{-i\xi\tau}\,d\tau$$

and

$$\frac{1}{T} \int_0^T e^{-i(\omega - \xi)t}\,dt = \begin{cases} 1 & \text{if } \omega = \xi \\ 0 & \text{if } \omega - \xi = \frac{2\pi k}{T},\; k = \pm 1, \pm 2, \dots \end{cases}$$

Consider

$$\left| \frac{1}{T} \int_0^T e^{-i(\omega - \xi)t} \int_{-\infty}^{t-T} r(\tau)\,e^{-i\xi\tau}\,d\tau\,dt \right| \le \frac{1}{T} \int_0^T \int_{-\infty}^{t-T} |r(\tau)|\,d\tau\,dt \le \frac{1}{T} \int_{-\infty}^{0} |\tau|\,|r(\tau)|\,d\tau \le \frac{C}{T}$$

provided

$$\int_{-\infty}^{\infty} |\tau\,r(\tau)|\,d\tau < \infty.$$

Similarly

$$\left| \frac{1}{T} \int_0^T e^{-i(\omega - \xi)t} \int_{t}^{\infty} r(\tau)\,e^{-i\xi\tau}\,d\tau\,dt \right| \le \frac{1}{T} \int_0^{\infty} |\tau|\,|r(\tau)|\,d\tau \le \frac{C}{T}.$$

3.7.3 Power Spectrum Bias Expression

The previous two lemmas can now be used to estimate the difference between the expected approximate periodogram and the spectrum.

Theorem 3.1 If we define

$$\Delta\hat{\Phi} = \left| E\hat{\hat{\Phi}}_{T,k} - \Phi \right|$$

then we have

$$\Delta\hat{\Phi} \le C_1 h_{max}^{k+1} + \frac{C_2}{T}. \qquad (40)$$

Proof: Application of the triangle inequality, Lemma 3.4 and Lemma 3.3 yields

$$\Delta\hat{\Phi} = \left| E\hat{\hat{\Phi}}_{T,k} - \Phi \right| \le \left| E\left[ \hat{\hat{\Phi}}_{T,k} - \hat{\hat{\Phi}}_T \right] \right| + \left| E\left[ \hat{\hat{\Phi}}_T - \Phi \right] \right| \le C_1 \max_{t,s \in [0,T]} \left| \hat{r}_{T,k}(t, s) - r(t - s) \right| + \frac{C_2}{T},$$

where the first term is bounded in terms of $h_{max}^{k+1}$ by Corollary 3.1 and Lemma 3.2.

We will now turn to the procedure of estimating parameters from the approximate periodogram.


4 Maximum Likelihood Estimation in the Time Domain

A set of samples $y(t_k)$, $k = 1, \dots, N_t$ of the stationary output of the CARMA process in (1) will be distributed as

$$Y = \begin{pmatrix} y(t_1) \\ \vdots \\ y(t_{N_t}) \end{pmatrix} \in N(0, R(\theta)), \quad \text{where} \quad R(\theta) = \begin{pmatrix} r(t_1 - t_1, \theta) & \dots & r(t_1 - t_{N_t}, \theta) \\ r(t_2 - t_1, \theta) & \dots & r(t_2 - t_{N_t}, \theta) \\ \vdots & & \vdots \\ r(t_{N_t} - t_1, \theta) & \dots & r(t_{N_t} - t_{N_t}, \theta) \end{pmatrix}.$$

For the stationary case [3],

$$r(\tau, \theta) = C e^{A\tau} P C^T$$

where

$$AP + PA^T + \sigma^2 B B^T = 0. \qquad (41)$$

This paves the way for straightforward maximum-likelihood estimation by the criterion

$$\hat{\theta} = \arg\min_\theta\; Y^T R(\theta)^{-1} Y + \log\det R(\theta). \qquad (42)$$

The reason for this is that if the data $y_{t_k}$ is the output of the stochastic process in (1), the samples will be distributed as

$$p(y(t_1), \dots, y(t_{N_t}) \mid \theta) = \frac{1}{(2\pi)^{N_t/2} \sqrt{\det R(\theta)}}\, e^{-\frac{1}{2} Y^T R(\theta)^{-1} Y}.$$

The negative log-likelihood function of this distribution will be

$$-\log p(Y \mid \theta) = \frac{N_t}{2} \log 2\pi + \frac{1}{2} \log\det R(\theta) + \frac{1}{2} Y^T R(\theta)^{-1} Y,$$

so the maximum likelihood (ML) estimate of the parameters is given by (42). A big obstacle is that we have to compute $R(\theta)^{-1} Y$ and retrieve the determinant of $R(\theta)$.


4.1 Efficient Computation

The matrix $R(\theta)$ will have an almost banded structure, since $r(\tau) \to 0$ like $e^{-\lambda\tau}$ for some $\lambda$. Hence we can effectively set $r(\tau) = 0$ for $\tau > B$ for some bandwidth $B$. This yields an $R(\theta)$ which is exactly banded, and algorithms optimized for such matrices can be used.

Since the new matrix $R(\theta)$ is symmetric, positive definite and banded, it is possible to perform a banded $LDL^T$ factorization, which is a special formulation of the Cholesky factorization:

$$R(\theta) = L(\theta)\,D(\theta)\,L^T(\theta). \qquad (44)$$

The objective function in the optimization can then be rewritten as

$$\hat{\theta} = \arg\min_\theta\; \left\| L(\theta)^{-1} Y \right\|^2_{D(\theta)^{-1}} + \sum_{k=1}^{N_t} \log D_{kk}(\theta),$$

since $\sum_k \log D_{kk}(\theta) = \log\det R(\theta)$. This approach tends to work only if we have a medium or small number of measurements $N_t$, since the size of $R(\theta)$ grows as $N_t \times N_t$.
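A sketch of the banded likelihood evaluation using SciPy's banded Cholesky routines; we use a plain Cholesky factorization $R = U^T U$ rather than the $LDL^T$ form, which gives the same quadratic form and log-determinant (the function name and test covariance are ours, not from the report):

```python
import numpy as np
from scipy.linalg import cholesky_banded, cho_solve_banded

def neg_log_likelihood_banded(r_fun, times, Y, bandwidth):
    """Evaluate Y^T R^{-1} Y + log det R of (42), with r(tau) = 0 for tau > bandwidth."""
    N = len(times)
    p = 0  # number of superdiagonals actually needed
    for i in range(N):
        for j in range(i + 1, N):
            if times[j] - times[i] <= bandwidth:
                p = max(p, j - i)
    # LAPACK upper banded storage: ab[p + i - j, j] = R[i, j] for i <= j
    ab = np.zeros((p + 1, N))
    for j in range(N):
        for i in range(max(0, j - p), j + 1):
            if times[j] - times[i] <= bandwidth:
                ab[p + i - j, j] = r_fun(times[j] - times[i])
    cb = cholesky_banded(ab)                      # R = U^T U in banded storage
    quad = Y @ cho_solve_banded((cb, False), Y)   # Y^T R^{-1} Y
    logdet = 2.0 * np.log(cb[-1]).sum()           # log det R from diag(U)
    return quad + logdet
```

With a large bandwidth (no truncation) the result agrees with the dense computation, while a small bandwidth trades accuracy for the O(N p²) banded factorization cost.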

4.2 Cramér-Rao Lower Bound

A property of the maximum likelihood estimator is that, under certain conditions, it is consistent, i.e.

$$\hat{\theta} \to \theta^* \quad \text{as } N_t \to \infty,$$

and the covariance matrix of the estimated parameters is bounded from below as

$$P = E(\hat{\theta} - \theta^*)(\hat{\theta} - \theta^*)^T \ge M^{-1}.$$

Here

$$M = -E\left[ \frac{d^2}{d\theta^2} \log p(Y \mid \theta) \right]$$

is known as the Fisher information matrix, and the bound is known as the Cramér-Rao lower bound (CRLB) [4].

In the case that the distribution is Gaussian it is possible to give a more explicit expression for the CRLB, known as the Slepian-Bangs formula [9]:

$$M_{ij} = \frac{1}{2} \mathrm{Tr}\left[ R(\theta)^{-1} \frac{\partial R(\theta)}{\partial \theta_i}\, R(\theta)^{-1} \frac{\partial R(\theta)}{\partial \theta_j} \right].$$

This expression can and will be used later on in order to evaluate the efficiency of different parameter estimators for a fairly large number of samples. We will now move on and consider an ML method in the frequency domain.
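The Slepian-Bangs formula is easy to evaluate once $R(\theta)$ and its parameter derivatives are available. The sketch below approximates $\partial R / \partial \theta_i$ by central finite differences (an illustrative choice of ours, not from the report):

```python
import numpy as np

def fisher_information(R_fun, theta, eps=1e-6):
    """Slepian-Bangs: M_ij = 0.5 * Tr(R^-1 dR/dtheta_i R^-1 dR/dtheta_j),
    with dR/dtheta approximated by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    Rinv = np.linalg.inv(R_fun(theta))
    d = len(theta)
    dR = []
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        dR.append((R_fun(theta + e) - R_fun(theta - e)) / (2 * eps))
    M = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            M[i, j] = 0.5 * np.trace(Rinv @ dR[i] @ Rinv @ dR[j])
    return M
```

As a sanity check, for $R(\theta) = \theta I_n$ (i.i.d. Gaussian samples of variance $\theta$) the formula gives the familiar $M = n/(2\theta^2)$.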


5 Maximum Likelihood Estimation in the Frequency Domain

When a continuous-time measurement $y(t)$, $t \in [0, T]$ of the output is available, the parameters of CARMA models can be estimated by solving the following minimization problem:

$$\hat{\theta} \triangleq \arg\min_\theta \sum_{k=1}^{N_\omega} \frac{\hat{\hat{\Phi}}(i\omega_k)}{\Phi(i\omega_k, \theta)} + \log \Phi(i\omega_k, \theta) \qquad (45)$$

where the frequencies $\omega_k$, $k = 1, \dots, N_\omega$ are chosen such that

$$\omega_k = \frac{2\pi}{T} l, \quad l \in \mathbb{Z}. \qquad (46)$$

This is a maximum-likelihood procedure where the model has been transformed into the frequency domain. The frequencies have been deliberately selected such that the Fourier transforms of the output at different frequencies are uncorrelated. If we compare the objective function with the time-domain expression, we see that the quadratic form in the time-domain criterion has been replaced by a summation in the frequency domain. Hence, the complexity has been greatly reduced.

The criterion here is the negative log-likelihood function for the real and imaginary parts of the periodogram given the parameters $\theta$. Most of the material in this section can be found in classical textbooks such as the one by Brillinger [1], but we provide a different perspective on the problem.

5.1 Frequency Domain Model

In this subsection we show what the result will be if we apply the Fourier transform

$$Y_T(i\omega) = \frac{1}{\sqrt{T}} \int_0^T y_t\,e^{-i\omega t}\,dt$$

to the stochastic process $y_t$ defined in (1). This will produce a set of Gaussian complex-valued stochastic variables $Y(i\omega)$.

Theorem 5.1 Assume that we have a stochastic process $y_t$ generated by the CARMA model in (1). Then the Fourier transform of the process from time $t = 0$ to $t = T$ will be

$$Y(i\omega) = \sigma \frac{B(i\omega)}{A(i\omega)} \frac{1}{\sqrt{T}} \int_0^T e^{-i\omega t}\,de_t + \frac{1}{\sqrt{T}} C(i\omega I - A)^{-1}\left( x_0 - e^{-i\omega T} x_T \right) \qquad (47)$$

where $x_0$ and $x_T$ are the states at the initial point and the end point.

Proof: Assume that we have the following system:

$$y_t = C e^{At} x_0 + \int_0^t C e^{A(t-\tau)} B\,de_\tau.$$

Assume that this signal is observed through a window

$$W(t) = \frac{1}{\sqrt{T}}\,I_{[0,T]}(t)$$

where $I$ is the indicator function. Then we have the transform

$$Y(i\omega) = \int_{-\infty}^{\infty} W(t)\,y_t\,e^{-i\omega t}\,dt,$$

which is now a stochastic variable. The integral of the first term will be

$$\frac{1}{\sqrt{T}} \int_0^T C e^{(A - i\omega I)t} x_0\,dt = \frac{1}{\sqrt{T}}\,C(i\omega I - A)^{-1}\left( I - e^{(A - i\omega I)T} \right) x_0,$$

where $x_0$ can be a stochastic variable. Integration by parts,

$$\int_0^T f(t)\,B_t\,dt = F(T)\,B_T - \int_0^T F(t)\,dB_t$$

with

$$f(t) = C e^{(A - i\omega I)t}, \qquad F(t) = C(A - i\omega I)^{-1} e^{(A - i\omega I)t}, \qquad B_t = \int_0^t e^{-A\tau} B\,de_\tau,$$

will give

$$\frac{1}{\sqrt{T}} \int_0^T \int_0^t C e^{A(t-\tau)} B\,de_\tau\,e^{-i\omega t}\,dt = \frac{1}{\sqrt{T}}\,C(A - i\omega I)^{-1} e^{(A - i\omega I)T} \int_0^T e^{-A\tau} B\,de_\tau - \frac{1}{\sqrt{T}} \int_0^T C(A - i\omega I)^{-1} e^{(A - i\omega I)t} e^{-At} B\,de_t$$
$$= \frac{1}{\sqrt{T}}\,C(A - i\omega I)^{-1} e^{-i\omega T} \int_0^T e^{A(T-\tau)} B\,de_\tau - \frac{1}{\sqrt{T}} \int_0^T C(A - i\omega I)^{-1} B\,e^{-i\omega t}\,de_t$$
$$= \frac{1}{\sqrt{T}}\,C(A - i\omega I)^{-1} e^{-i\omega T} \int_0^T e^{A(T-\tau)} B\,de_\tau + \sigma \frac{B(i\omega)}{A(i\omega)} \frac{1}{\sqrt{T}} \int_0^T e^{-i\omega t}\,de_t.$$

One effect of the initial and end point conditions is that

$$\int_0^T e^{A(T-\tau)} B\,de_\tau + e^{AT} x_0 = x_T,$$

and by this the transform becomes

$$Y(i\omega) = \sigma \frac{B(i\omega)}{A(i\omega)} \frac{1}{\sqrt{T}} \int_0^T e^{-i\omega t}\,de_t + \frac{1}{\sqrt{T}}\,C(i\omega I - A)^{-1}\left( x_0 - e^{-i\omega T} x_T \right).$$


We will now show that for a specific set of frequencies $\omega_k$ the random variables $E_T(i\omega_k)$ will be uncorrelated with each other.

Theorem 5.2 If

$$E_T(i\omega_k) = \frac{1}{\sqrt{T}} \int_0^T e^{-i\omega_k t}\,de_t \quad \text{and} \quad \omega_k = \frac{2\pi}{T} l, \quad l \in \mathbb{Z},$$

then for $k, l \in \mathbb{Z}$

$$E\left[ E_T(i\omega_k)\,E_T(-i\omega_l) \right] = \delta_{k-l}.$$

Proof: Study the following vector of random variables:

$$\begin{pmatrix} \mathrm{Re}\,E_T(i\omega) \\ \mathrm{Im}\,E_T(i\omega) \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{T}} \int_0^T \cos\omega t\,dw_t \\ -\frac{1}{\sqrt{T}} \int_0^T \sin\omega t\,dw_t \end{pmatrix}.$$

Assume that we take two different frequencies $\omega_1$ and $\omega_2$ and check the correlation:

$$E\begin{pmatrix} \mathrm{Re}\,E_T(i\omega_1) \\ \mathrm{Im}\,E_T(i\omega_1) \end{pmatrix}\begin{pmatrix} \mathrm{Re}\,E_T(i\omega_2) \\ \mathrm{Im}\,E_T(i\omega_2) \end{pmatrix}^T = \begin{pmatrix} \frac{1}{T}\int_0^T \cos\omega_1 t \cos\omega_2 t\,dt & -\frac{1}{T}\int_0^T \cos\omega_1 t \sin\omega_2 t\,dt \\ -\frac{1}{T}\int_0^T \sin\omega_1 t \cos\omega_2 t\,dt & \frac{1}{T}\int_0^T \sin\omega_1 t \sin\omega_2 t\,dt \end{pmatrix}.$$

The term in the upper left corner becomes

$$\frac{1}{T}\int_0^T \cos\omega_1 t \cos\omega_2 t\,dt = \frac{1}{2T}\int_0^T \cos(\omega_1 - \omega_2)t + \cos(\omega_1 + \omega_2)t\,dt = \frac{1}{2}\left( \frac{\sin(\omega_1 - \omega_2)T}{(\omega_1 - \omega_2)T} + \frac{\sin(\omega_1 + \omega_2)T}{(\omega_1 + \omega_2)T} \right)$$

and the term in the lower right becomes

$$\frac{1}{T}\int_0^T \sin\omega_1 t \sin\omega_2 t\,dt = \frac{1}{2T}\int_0^T \cos(\omega_1 - \omega_2)t - \cos(\omega_1 + \omega_2)t\,dt = \frac{1}{2}\left( \frac{\sin(\omega_1 - \omega_2)T}{(\omega_1 - \omega_2)T} - \frac{\sin(\omega_1 + \omega_2)T}{(\omega_1 + \omega_2)T} \right).$$

The term in the upper right will be

$$-\frac{1}{T}\int_0^T \cos\omega_1 t \sin\omega_2 t\,dt = -\frac{1}{2T}\int_0^T \sin(\omega_2 - \omega_1)t + \sin(\omega_2 + \omega_1)t\,dt = \frac{1}{2}\left( \frac{\cos(\omega_2 - \omega_1)T - 1}{(\omega_2 - \omega_1)T} + \frac{\cos(\omega_2 + \omega_1)T - 1}{(\omega_2 + \omega_1)T} \right)$$

and the lower left is

$$-\frac{1}{T}\int_0^T \sin\omega_1 t \cos\omega_2 t\,dt = -\frac{1}{2T}\int_0^T \sin(\omega_1 - \omega_2)t + \sin(\omega_1 + \omega_2)t\,dt = \frac{1}{2}\left( \frac{\cos(\omega_1 - \omega_2)T - 1}{(\omega_1 - \omega_2)T} + \frac{\cos(\omega_1 + \omega_2)T - 1}{(\omega_1 + \omega_2)T} \right).$$

If we want the random variables at different frequencies to be uncorrelated, then the following must be satisfied:

$$\begin{cases} \frac{\sin(\omega_1 - \omega_2)T}{(\omega_1 - \omega_2)T} + \frac{\sin(\omega_1 + \omega_2)T}{(\omega_1 + \omega_2)T} = 0 \\ \frac{\sin(\omega_1 - \omega_2)T}{(\omega_1 - \omega_2)T} - \frac{\sin(\omega_1 + \omega_2)T}{(\omega_1 + \omega_2)T} = 0 \\ \frac{\cos(\omega_2 - \omega_1)T - 1}{(\omega_2 - \omega_1)T} + \frac{\cos(\omega_2 + \omega_1)T - 1}{(\omega_2 + \omega_1)T} = 0 \\ \frac{\cos(\omega_1 - \omega_2)T - 1}{(\omega_1 - \omega_2)T} + \frac{\cos(\omega_1 + \omega_2)T - 1}{(\omega_1 + \omega_2)T} = 0. \end{cases}$$

By subtracting the second equation from the first and the fourth equation from the third, and then eliminating, we get

$$\frac{\sin(\omega_1 - \omega_2)T}{(\omega_1 - \omega_2)T} = 0, \quad \frac{\sin(\omega_1 + \omega_2)T}{(\omega_1 + \omega_2)T} = 0, \quad \frac{\cos(\omega_1 - \omega_2)T - 1}{(\omega_1 - \omega_2)T} = 0, \quad \frac{\cos(\omega_1 + \omega_2)T - 1}{(\omega_1 + \omega_2)T} = 0.$$

This yields

$$\begin{cases} \omega_1 - \omega_2 = \frac{2\pi}{T} k, & k \in \mathbb{Z} \\ \omega_1 + \omega_2 = \frac{2\pi}{T} l, & l \in \mathbb{Z}. \end{cases}$$

This motivates choosing the frequencies $\omega_k$, $k = 0, \dots, N_\omega$ such that

$$\omega_k = \frac{2\pi}{T} l, \quad l \in \mathbb{Z}. \qquad (48)$$

If we make the above choice, then

$$Y_T(i\omega_k) \in N_c\left( K(i\omega_k),\; \sigma^2 \frac{|B(i\omega_k)|^2}{|A(i\omega_k)|^2} \right) \qquad (49)$$

where $N_c$ denotes the complex normal distribution [1] and $K(i\omega_k)$ is the transient term in (47). That is,

$$\begin{pmatrix} \mathrm{Re}\,Y_T(i\omega_k) \\ \mathrm{Im}\,Y_T(i\omega_k) \end{pmatrix} \in N\left( \begin{pmatrix} \mathrm{Re}\,K(i\omega_k) \\ \mathrm{Im}\,K(i\omega_k) \end{pmatrix},\; \frac{1}{2}\begin{pmatrix} \sigma^2 \frac{|B(i\omega_k)|^2}{|A(i\omega_k)|^2} & 0 \\ 0 & \sigma^2 \frac{|B(i\omega_k)|^2}{|A(i\omega_k)|^2} \end{pmatrix} \right) \qquad (50)$$

for this particular choice of frequencies.

5.2 Objective Function

The likelihood function for the estimated values of the Fourier transform $\{Y_T(i\omega_1), Y_T(i\omega_2), \dots, Y_T(i\omega_{N_\omega})\}$ will, according to the expression in (50), be

$$p(Y_T(i\omega_1), \dots, Y_T(i\omega_{N_\omega}) \mid \theta) = \prod_{k=1}^{N_\omega} \frac{1}{2\pi \sigma^2 \frac{|B(i\omega_k)|^2}{|A(i\omega_k)|^2}}\, e^{-\frac{|Y_T(i\omega_k) - K(i\omega_k)|^2}{\sigma^2 |B(i\omega_k)|^2 / |A(i\omega_k)|^2}}.$$

The negative log-likelihood function will be

$$L(\theta) = -\log p(Y_T(i\omega_1), \dots, Y_T(i\omega_{N_\omega}) \mid \theta) = N_\omega \log 2\pi + \sum_{k=1}^{N_\omega} \frac{|Y_T(i\omega_k) - K(i\omega_k)|^2}{\sigma^2 \frac{|B(i\omega_k)|^2}{|A(i\omega_k)|^2}} + \log \sigma^2 \frac{|B(i\omega_k)|^2}{|A(i\omega_k)|^2}.$$

Suppose now that a whole continuous-time realization $\{y(t) : t \in [0, T]\}$ of the output of a CAR model is known. Define the periodogram of this output as

$$\hat{\hat{\Phi}}_T(i\omega, \theta) = |Y_T(i\omega) - K(i\omega)|^2, \qquad (51)$$

where $K(i\omega)$ depends on the parameters, endpoint and initial values. The ML procedure for estimating the parameters and the covariance is then

$$\hat{\theta} \triangleq \arg\min_\theta V(\theta, \hat{\hat{\Phi}}_T) \quad \text{where} \quad V(\theta, \hat{\hat{\Phi}}_T) \triangleq \sum_{k=1}^{N_\omega} \frac{\hat{\hat{\Phi}}_T(i\omega_k, \theta)}{\Phi(i\omega_k, \theta)} + \log \Phi(i\omega_k, \theta) \qquad (52)$$

and

$$\Phi(i\omega, \theta) = \sigma^2 \frac{|B(i\omega)|^2}{|A(i\omega)|^2}.$$

In the remaining theory in this report we will disregard the effects of initial and endpoint conditions. In practice, while estimating models, we will always estimate initial and endpoint values. We refer the reader to the book by Pintelon and Schoukens [6], which treats the topic of initial values more thoroughly.
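The criterion (52) for a CAR model, with $K(i\omega)$ disregarded as stated above, can be sketched as follows (the function names are ours); the resulting function can be handed to any numerical optimizer:

```python
import numpy as np

def car_spectrum(omega, theta):
    """Phi(iw, theta) = sigma^2 / |A(iw)|^2 for a CAR model, theta = [a_1..a_n, sigma^2]."""
    a, lam = list(theta[:-1]), theta[-1]
    A = np.polyval([1.0] + a, 1j * np.asarray(omega, dtype=float))
    return lam / np.abs(A) ** 2

def ml_criterion(theta, omegas, periodogram):
    """V(theta) from (52): sum of Phihat/Phi + log Phi over the chosen frequencies."""
    phi = car_spectrum(omegas, theta)
    return np.sum(periodogram / phi + np.log(phi))
```

Because $x/\varphi + \log\varphi$ is minimized pointwise at $\varphi = x$, the criterion is minimized at the true parameters whenever the periodogram equals the true spectrum, which gives a convenient sanity check.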

6 Properties of Bias and Variance

In this section we show how the bias and variance of the parameter estimates are related to the bias and variance of the periodogram. In the case of bias we will have

$$E(\hat{\theta}_T^k - \theta^*) \approx \left( V''_{\theta\theta}(\theta^*, \Phi) \right)^{-1} \sum_{k=1}^{N_\omega} V''_{\theta\hat{\hat{\Phi}}_k}(\theta^*, \Phi)\left( \Phi(i\omega_k, \theta^*) - E\hat{\hat{\Phi}}_{T,k}(i\omega_k) \right)$$

where

$$V''_{\theta\theta}(\theta^*, \Phi) \approx \sum_{k=1}^{N_\omega} \frac{\Phi'_\theta(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)} \left( \frac{\Phi'_\theta(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)} \right)^T \qquad (53)$$

if $T$ is large and $h_{max}$ is small, and

$$V''_{\theta\hat{\hat{\Phi}}_k}(\theta^*, \hat{\hat{\Phi}}) = -\frac{\Phi'_\theta(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)^2}.$$

In the case of variance we resort to asymptotics, where

$$E(\hat{\theta}_T^k - \theta^*)(\hat{\theta}_T^k - \theta^*)^T \to \left( V''_{\theta\theta}(\theta^*, \Phi) \right)^{-1}$$

when $T \to \infty$ and $h_{max} \to 0$. In the subsections below we derive these expressions for the derivatives, bias and variance.

6.1 Derivatives

In both the expression for the bias and the asymptotic variance, second-order derivatives with respect to the parameters are needed.

Lemma 6.1 Let $V$ be defined as in (52). Then

$$V''_{\theta\theta}(\theta, \hat{\hat{\Phi}}) = \sum_{k=1}^{N_\omega} \frac{\Phi''_{\theta\theta}(i\omega_k, \theta)\,\Phi(i\omega_k, \theta) - \Phi'_\theta(i\omega_k, \theta)\,\Phi'_\theta(i\omega_k, \theta)^T}{\Phi(i\omega_k, \theta)^2}\left( 1 - \frac{\hat{\hat{\Phi}}(i\omega_k)}{\Phi(i\omega_k, \theta)} \right) + \sum_{k=1}^{N_\omega} \frac{\Phi'_\theta(i\omega_k, \theta)\,\hat{\hat{\Phi}}(i\omega_k)\,\Phi'_\theta(i\omega_k, \theta)^T}{\Phi(i\omega_k, \theta)^3}.$$

Proof: From the definition in (52) we have

$$V'_\theta(\theta, \hat{\hat{\Phi}}) = \sum_{k=1}^{N_\omega} \frac{\Phi'_\theta(i\omega_k, \theta)}{\Phi(i\omega_k, \theta)}\left( 1 - \frac{\hat{\hat{\Phi}}(i\omega_k)}{\Phi(i\omega_k, \theta)} \right),$$

and differentiating once more with respect to $\theta$ gives the stated expression.

If $T$ is large and $h_{max}$ is small, then $\hat{\hat{\Phi}} \approx \Phi$ and

$$V''_{\theta\theta}(\theta^*, \hat{\hat{\Phi}}) \approx \sum_{k=1}^{N_\omega} \frac{\Phi'_\theta(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)}\left( \frac{\Phi'_\theta(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)} \right)^T.$$

We also take a look at the derivative with respect to the parameters and the estimated spectrum. This is summarized in the following lemma.

Lemma 6.2 Let $V$ be defined as in (52). Then

$$V''_{\theta\hat{\hat{\Phi}}_k}(\theta^*, \hat{\hat{\Phi}}) = -\frac{\Phi'_\theta(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)^2}.$$

Proof: As in the previous lemma we have

$$V'_\theta(\theta, \hat{\hat{\Phi}}) = \sum_{k=1}^{N_\omega} \frac{\Phi'_\theta(i\omega_k, \theta)}{\Phi(i\omega_k, \theta)}\left( 1 - \frac{\hat{\hat{\Phi}}(i\omega_k)}{\Phi(i\omega_k, \theta)} \right).$$

Writing $\hat{\hat{\Phi}}_k = \hat{\hat{\Phi}}(i\omega_k)$ and differentiating with respect to $\hat{\hat{\Phi}}_k$ gives

$$V''_{\theta\hat{\hat{\Phi}}_k}(\theta^*, \hat{\hat{\Phi}}) = -\frac{\Phi'_\theta(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)^2}.$$

6.2 Bias Expression

If we define the estimated parameters as

$$\hat{\theta}_T^k \triangleq \arg\min_\theta V(\theta, \hat{\hat{\Phi}}_{T,k})$$

and the exact parameters as

$$\theta^* \triangleq \arg\min_\theta V(\theta, \Phi), \qquad (54)$$

we get the following result for the bias.

Theorem 6.1

$$E(\hat{\theta}_T^k - \theta^*) \approx \left( V''_{\theta\theta}(\theta^*, \Phi) \right)^{-1} \sum_{k=1}^{N_\omega} V''_{\theta\hat{\hat{\Phi}}_k}(\theta^*, \Phi)\left( \Phi(i\omega_k, \theta^*) - E\hat{\hat{\Phi}}_{T,k}(i\omega_k) \right) \qquad (55)$$

Proof: From the definitions we have

$$V'_\theta(\hat{\theta}_T^k, \hat{\hat{\Phi}}_{T,k}) = 0, \qquad V'_\theta(\theta^*, \Phi) = 0.$$

A Taylor expansion of $V'_\theta(\theta, \Phi)$ around $\theta^*$ and $\Phi_k = \Phi(i\omega_k)$ yields

$$0 = V'_\theta(\hat{\theta}_T^k, \hat{\hat{\Phi}}_{T,k}) \approx V'_\theta(\theta^*, \Phi) + V''_{\theta\theta}(\theta^*, \Phi)(\hat{\theta}_T^k - \theta^*) + \sum_{k=1}^{N_\omega} V''_{\theta\hat{\hat{\Phi}}_k}(\theta^*, \Phi)\left( \hat{\hat{\Phi}}_{T,k}(i\omega_k) - \Phi(i\omega_k) \right).$$

From this we get

$$\hat{\theta}_T^k - \theta^* \approx \left( V''_{\theta\theta}(\theta^*, \Phi) \right)^{-1} \sum_{k=1}^{N_\omega} V''_{\theta\hat{\hat{\Phi}}_k}(\theta^*, \Phi)\left( \Phi(i\omega_k, \theta^*) - \hat{\hat{\Phi}}_{T,k}(i\omega_k) \right).$$

Taking expectations on both sides gives the conclusion.

6.3 Variance Expressions

An expression for the bias can be computed in a similar manner.

Theorem 6.2 The asymptotic variance will be

$$
E(\hat\theta_T - \theta^*)(\hat\theta_T - \theta^*)^T \approx \sum_{k=1}^{N_\omega}\sum_{l=1}^{N_\omega} \Psi(i\omega_k, \theta^*)\Psi(i\omega_l, \theta^*)^T\, E[\Delta\Phi(i\omega_k, \theta^*)\Delta\Phi(i\omega_l, \theta^*)]
$$

where

$$
\Psi(i\omega_k, \theta^*) = V''_{\theta\theta}(\theta^*, \Phi)^{-1} V''_{\theta\hat\Phi_k}(\theta^*, \Phi)
$$

and

$$
\Delta\Phi(i\omega_k, \theta^*) = \hat\Phi_{T,k}(i\omega_k) - \Phi(i\omega_k, \theta^*).
$$

Proof: This is a simple application of the same Taylor expansion as in the case of the bias expression:

$$
\hat\theta_T - \theta^* \approx (V''_{\theta\theta}(\theta^*, \Phi))^{-1} \sum_{k=1}^{N_\omega} V''_{\theta\hat\Phi_k}(\theta^*, \Phi)\left(\Phi(i\omega_k, \theta^*) - \hat\Phi_{T,k}(i\omega_k)\right).
$$

Multiplying this expression by its own transpose and taking expectations completes the proof.

This translates the covariance of the spectrum into a covariance for the parameter estimates; a bound on the spectrum covariance is therefore needed. The expression above is quite elaborate to compute, so we settle for the asymptotic properties. When $T \to \infty$ and $h_{\max} \to 0$ we get

$$
E\,\Delta\Phi(i\omega_k, \theta^*)\Delta\Phi(i\omega_l, \theta^*) \to
\begin{cases}
\Phi(i\omega_k, \theta^*)^2 & \text{if } \omega_k = \omega_l \\
0 & \text{if } \omega_k \ne \omega_l.
\end{cases}
$$

A simple calculation then gives the asymptotic result

$$
E(\hat\theta_T - \theta^*)(\hat\theta_T - \theta^*)^T \approx V''_{\theta\theta}(\theta^*, \Phi)^{-1}. \qquad (56)
$$

7 Practical Considerations for Frequency Selection

So far we have said very little about which frequencies to use in the criterion (52) in order to make a trade-off between bias and variance. We have only pointed out earlier that we are restricted to the frequencies

$$
\omega_k = \frac{2\pi}{T}\, l, \qquad l \in \mathbb{Z}
$$

where $k = 1, \ldots, N_\omega$. As will be explained below, it is important to avoid high frequencies, because they will produce a considerable parameter bias if the estimated spectrum is biased. We will also explain why some frequencies have a significant effect on the bias while others have very little effect.

7.1 Minimizing the Variance

According to the asymptotic expression in (56), the variance is roughly inversely proportional to the quantity

$$
\sum_{k=1}^{N_\omega} \Psi(i\omega_k, \theta^*)\Psi(i\omega_k, \theta^*)^T \qquad \text{where} \qquad \Psi(i\omega, \theta) = \frac{\Phi'_{\theta}(i\omega, \theta)}{\Phi(i\omega, \theta)}.
$$

For a CAR model, where $\theta = [a^T\ \sigma^2]^T$, $a = [a_1 \ \ldots \ a_n]^T$ and

$$
\Phi(i\omega, \theta) = \frac{\sigma^2}{|A(i\omega, \theta)|^2},
$$

we have

$$
\Psi(i\omega, \theta) = \begin{bmatrix} -\dfrac{(|A(i\omega)|^2)'_a}{|A(i\omega)|^2} \\[2ex] \dfrac{1}{\sigma^2} \end{bmatrix}.
$$

In Figure 2 we show the first two elements of $\Psi(i\omega, \theta^*)$ for a CAR model. From the figure we see that these quantities are significantly different from zero only on a small frequency interval. A rule of thumb is therefore to include only frequencies where the magnitudes of the elements of $\Psi(i\omega, \theta)$ are large, in order to reduce the variance of the parameter estimates. The circles represent the discrete frequencies we are allowed to use when T = 25.
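This rule of thumb can be sketched for the CAR(2) model of Figure 2. The 0.05 threshold below is an arbitrary illustrative choice, not a value from the report.

```python
import numpy as np

# Relative sensitivity Psi(iw, theta) = Phi'_theta / Phi for the CAR(2) model
# Phi(iw) = s2 / |A(iw)|^2 with |A(iw)|^2 = (a2 - w^2)^2 + (a1*w)^2.
def rel_sensitivity(w, a1, a2):
    A2 = (a2 - w**2) ** 2 + (a1 * w) ** 2
    psi_a1 = -2 * a1 * w**2 / A2       # d(ln Phi)/d(a1)
    psi_a2 = -2 * (a2 - w**2) / A2     # d(ln Phi)/d(a2)
    return psi_a1, psi_a2

T = 25.0
w = 2 * np.pi * np.arange(1, 200) / T  # admissible grid w_k = 2*pi*l/T
p1, p2 = rel_sensitivity(w, a1=2.0, a2=1.0)

# Keep only frequencies where either sensitivity is non-negligible.
mask = np.maximum(np.abs(p1), np.abs(p2)) > 0.05
print("keep %d of %d frequencies, up to w = %.2f"
      % (mask.sum(), w.size, w[mask].max()))
```

For this model $A(p) = (p+1)^2$, so the $a_1$ sensitivity peaks at $\omega = 1$ with value $-1$, matching the upper panel of Figure 2.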

7.2 Minimizing the Bias

From expression (55) we know that the contribution to the bias from each individual frequency component of the power spectrum is roughly proportional to

$$
V''_{\theta\theta}(\theta^*, \Phi)^{-1}\, \frac{\Phi'_{\theta}(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)}\, \frac{E\hat\Phi(i\omega_k) - \Phi(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)}.
$$

Figure 2: Relative sensitivity for $a_1$ (upper) and $a_2$ (lower) for a CAR model. Here $A(p) = p^2 + a_1 p + a_2$, $B(p) = 1$, $\sigma = 1$, $a_1 = 2$ and $a_2 = 1$.

In Figure 3 the term

$$
\frac{E\hat\Phi(i\omega_k) - \Phi(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)}
$$

is plotted as a function of the frequency $\omega$. The system is the same as in the previous figure and the output is uniformly sampled with sampling interval $T_s = 0.5$. The solid line is the relative spectral bias for the Discrete-Time Fourier Transform. The dotted line indicates the same bias for ZOH interpolation and the dash-dotted for FOH. The contribution from an individual frequency to the

Figure 3: Relative spectral bias for a uniformly sampled CAR model. Here $A(p) = p^2 + 2p + 1$, $B(p) = 1$, $\sigma = 1$ and $T_s = 0.5$.

total bias is proportional to

$$
\frac{\Phi'_{\theta}(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)}\, \frac{E\hat\Phi(i\omega_k) - \Phi(i\omega_k, \theta^*)}{\Phi(i\omega_k, \theta^*)}.
$$

In Figure 4 we illustrate the contribution to the bias as a function of $\omega$ for the same system and circumstances as above.

Figure 4: Bias contribution at each frequency for $a_1$ (left) and $a_2$ (right). The model is uniformly sampled and of CAR type. Here $A(p) = p^2 + a_1 p + a_2$, $B(p) = 1$, $\sigma = 1$, $a_1 = 2$, $a_2 = 1$ and $T_s = 0.5$.

7.3 Re-parameterization

Consider the second order CAR model

$$
y_t = \frac{\sigma}{p^2 + a_1 p + a_2}\, e_t. \qquad (57)
$$

We can also re-parameterize this model as

$$
y_t = \frac{\omega_0^2}{p^2 + 2\zeta\omega_0 p + \omega_0^2}\, \sigma e_t \qquad (58)
$$

where $\zeta$ and $\omega_0$ are called the damping ratio and the undamped natural frequency. The vector of parameters for this model is

$$
\theta = [\zeta\ \omega_0\ \lambda]^T \qquad (59)
$$

where $\lambda = \sigma^2$.
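The mapping between the two parameterizations follows directly from matching coefficients in (57) and (58), and can be sketched as:

```python
import math

# Conversion between the CAR(2) parameterizations (57)-(58):
# a1 = 2*zeta*w0, a2 = w0^2  <=>  w0 = sqrt(a2), zeta = a1 / (2*sqrt(a2)).
def to_physical(a1, a2):
    w0 = math.sqrt(a2)
    zeta = a1 / (2 * w0)
    return zeta, w0

def to_polynomial(zeta, w0):
    return 2 * zeta * w0, w0**2

# The example used in Figure 5: a1 = 2, a2 = 2 gives zeta = 1/sqrt(2), w0 = sqrt(2).
zeta, w0 = to_physical(2.0, 2.0)
print(zeta, w0)
```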

For this particular parameterization the relative sensitivity functions with respect to the respective parameters are shown in Figure 5. Here we have chosen $a_1 = 2$ and $a_2 = 2$. This means that $\omega_0 = \sqrt{2}$ and $\zeta = 1/\sqrt{2}$. From this figure we conclude that the relative damping $\zeta$ is sensitive near the natural resonance frequency $\omega_0$ of the system. The natural frequency $\omega_0$, on the other hand, is sensitive over a wider range of frequencies.

Figure 5: Relative sensitivities for $\zeta$ (upper) and $\omega_0$ (lower). Here $A(p) = p^2 + 2\zeta\omega_0 p + \omega_0^2$, $B(p) = \omega_0^2$, $\sigma = 1$, $\zeta = 1/\sqrt{2}$ and $\omega_0 = \sqrt{2}$.

8 Numerical Experiments

In this section we illustrate the preceding theory with a few examples. First the method is applied to a first order continuous-time autoregressive model; then we proceed with a second order model. The examples show the effects of interpolation and observation time on the parameter estimates. Finally we illustrate how the choice of frequencies affects the parameter estimates.

8.1 Measurement & Simulation

In this report we set out to estimate $\theta$ from uniformly and non-uniformly distributed discrete-time samples of the continuous-time output process $\{y_t : t \in [0, T]\}$. We denote the sampling instants by $\{t_1, t_2, \ldots, t_{N_t}\}$ and the observations themselves by $\{y_{t_1}, y_{t_2}, \ldots, y_{t_{N_t}}\}$. The sampling instants are additively randomly distributed around a nominal sampling time $T_s$ according to the following model:

$$
t_k = k T_s + \sum_{l=1}^{k} \delta_l \qquad \text{where} \qquad \delta_l \in U(-\delta_0, \delta_0).
$$

In order to simulate the discrete-time samples we resort to discretizing the model at $T_{is} = T_s/100$, where $T_{is}$ stands for the "inter-sample" sample time. The discretized system becomes

$$
\begin{cases}
x_{kT_{is}} = e^{AT_{is}} x_{(k-1)T_{is}} + \displaystyle\int_{(k-1)T_{is}}^{kT_{is}} e^{A(kT_{is}-\tau)} B\, dW_\tau \\
y_{kT_{is}} = C x_{kT_{is}}.
\end{cases}
$$

The last term in this expression is approximated as

$$
\int_{(k-1)T_{is}}^{kT_{is}} e^{A(kT_{is}-\tau)} B\, dW_\tau \approx \frac{1}{\sqrt{T_{is}}} \int_{(k-1)T_{is}}^{kT_{is}} e^{A(kT_{is}-\tau)} B\, d\tau \cdot v
$$

where $v \in N(0, 1)$. This approach has proved to work well in practice, and samples have been picked randomly according to a discrete version of the sampling scheme above.
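A minimal sketch of this simulation scheme for a scalar CAR(1) model, $dx = -a\,x\,dt + \sigma\,dW$, is given below. One assumption differs from the text above: in the scalar case the one-step noise variance is available in closed form, so the sketch uses it directly instead of the averaged-integral approximation. Seed and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma = 1.0, 1.0
Ts, T = 0.1, 200.0
Tis = Ts / 100.0                     # fine "inter-sample" grid

# Exact discretization of dx = -a*x dt + sigma dW on the fine grid.
n = round(T / Tis)
F = np.exp(-a * Tis)
q = sigma**2 * (1 - np.exp(-2 * a * Tis)) / (2 * a)   # exact noise variance
x = np.zeros(n + 1)
for k in range(n):
    x[k + 1] = F * x[k] + np.sqrt(q) * rng.standard_normal()

# Jittered sampling instants t_k = k*Ts + cumulative uniform jitter,
# rounded to the nearest fine-grid point.
delta0 = Ts / 5
nk = round(T / Ts)
tk = Ts * np.arange(1, nk + 1) + np.cumsum(rng.uniform(-delta0, delta0, nk))
idx = np.clip(np.round(tk / Tis).astype(int), 0, n)
y = x[idx]

print(y.shape, np.var(y))  # sample variance should be near sigma^2/(2a) = 0.5
```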

8.2 First Order CAR Model

In this section we apply the method described earlier to a simple first order continuous-time autoregressive model,

$$
y_t = \frac{\sigma}{p + a}\, e_t \qquad (60)
$$

where $a = 1$ and $\sigma = 1$. We observe the effects of the sampling time, the amount of non-uniformity in the sampling, and the observation time. We will see that the amount of non-uniformity has little effect on the bias and variance, that the bias and variance decrease as the observation time increases, and that the bias decreases as the sampling interval decreases.

8.2.1 Observation Time

In Figure 6 we have simulated and estimated the system for different values of $T$. The dotted lines represent the Cramer-Rao lower bound computed by

Figure 6: Monte-Carlo simulations of estimates of $a$ for different observation times $T$. The model is $A(p) = p + a$, $B(p) = 1$, $\sigma = 1$ and $a = 1$. Here $N_{MC} = 100$, $T_s = 0.1$ and $\delta_0 = T_s/5$. The dotted line represents a CRLB estimate. Vertical bars indicate the standard deviation of estimates.

Monte-Carlo simulations using the Slepian-Bang formula [9]

$$
E[(\hat a - a)^2] \ge \frac{2}{\mathrm{tr}\left(R^{-1} R'_a R^{-1} R'_a\right)} \qquad (61)
$$

where $R$ is the covariance matrix of the measurements and $R'_a$ is the derivative of the covariance matrix with respect to $a$. The bias and standard deviation decrease with the observation time. For moderately large $T$ the standard deviation is approximately the same as the Cramer-Rao bound.

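The bound (61) can be evaluated directly for the first order model, using the stationary covariance function $r(\tau) = \frac{\sigma^2}{2a} e^{-a|\tau|}$ of the CAR(1) process. The sampling grid below is an illustrative assumption, not the exact setup of Figure 6.

```python
import numpy as np

# Slepian-Bang bound (61) for the CAR(1) parameter a:
# var(a_hat) >= 2 / tr(R^{-1} R'_a R^{-1} R'_a), with
# R_ij = sigma^2/(2a) * exp(-a*|t_i - t_j|) for the sampled output.
def crlb_car1(t, a, sigma2=1.0):
    D = np.abs(t[:, None] - t[None, :])
    R = sigma2 / (2 * a) * np.exp(-a * D)
    # elementwise derivative dR/da
    Ra = sigma2 * (-1 / (2 * a**2) - D / (2 * a)) * np.exp(-a * D)
    M = np.linalg.solve(R, Ra)       # R^{-1} R'_a
    return 2.0 / np.trace(M @ M)

t = 0.1 * np.arange(1, 501)          # uniform sampling, Ts = 0.1, T = 50
print(crlb_car1(t, a=1.0))           # roughly 2a/T = 0.04 for dense sampling
```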

8.2.2 Sampling Interval

In Figure 7 we have varied the sampling time $T_s$ in order to illustrate the effect on the bias. From the figure we see that the jitter has very little effect on the

Figure 7: Monte-Carlo simulations of estimates of $a$ for different sampling times $T_s$. The model is $A(p) = p + a$, $B(p) = 1$, $\sigma = 1$ and $a = 1$. Here $N_{MC} = 250$, $T = 200$ and $\delta_0 = T_s/5$. The vertical bars indicate the standard deviation of estimates.

quality of the estimates.

8.2.3 Jitter

In Figure 8 we have plotted the effect of different amounts of jitter $\delta_0$ on the sampled data. The magnitude of $\delta_0$ has very little effect on the quality of the estimates.

8.2.4 Effect of Linear Interpolation

An interesting observation is that FOH interpolation does not improve the performance of the method. This is illustrated in Figure 9, where there is significant bias compared to ZOH. The reason, as we have seen earlier, is that the spectrum is more biased for FOH than for ZOH.


Figure 8: Monte-Carlo simulations of estimates of $a$ for different $\delta_0$. The model is $A(p) = p + a$, $B(p) = 1$, $\sigma = 1$ and $a = 1$. Here $N_{MC} = 250$, $T = 200$ and $T_s = 0.1$. The vertical bars indicate the standard deviations.

8.3 Numerical Example for a Second Order CAR Model

We have also estimated the parameters of a second order continuous-time autoregressive model,

$$
y_t = \frac{\sigma}{p^2 + a_1 p + a_2}\, e_t. \qquad (62)
$$

First we illustrate how different kinds of interpolation affect the parameter estimates. Then we show how the choice of frequency interval affects the quality of the estimates.

8.3.1 Interpolation

The effects of "Riemann", ZOH and FOH interpolation are illustrated in Figure 10. As in the first order CAR case, FOH is no better than ZOH, and the performance of the "Riemann" case is even worse. This can be attributed to the spectral bias described earlier.

8.4 Choice of Frequencies

In Figure 11 we have estimated the parameters of the model in (62). First we used the frequency span $\{2\pi\frac{1}{200}, \ldots, 2\pi\frac{200}{200}\}$, while in the second case we

used $\{2\pi\frac{8}{200}, \ldots, 2\pi\frac{208}{200}\}$. As can be seen from the figure, this more than doubled the standard deviation of the $a_2$ parameter estimate: it is 0.0201 in the first case (indicated by dots) and 0.0491 in the second (indicated by plus signs). The reason, as can be seen from Figure 2, is that the parameter is especially sensitive at low frequencies.
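This effect can be sketched with the asymptotic covariance expression (56), using $\Psi = \Phi'_\theta/\Phi$ for the CAR(2) model (62). Treating $\sigma$ as fixed is a simplifying assumption made here; the comparison of the two frequency sets mirrors the experiment above.

```python
import numpy as np

# Relative sensitivities for Phi = s2/|A(iw)|^2, |A|^2 = (a2-w^2)^2 + (a1*w)^2.
def psi(w, a1, a2):
    A2 = (a2 - w**2) ** 2 + (a1 * w) ** 2
    return np.stack([-2 * a1 * w**2 / A2, -2 * (a2 - w**2) / A2])

# Asymptotic covariance (56), up to a constant factor: (sum_k Psi Psi^T)^{-1}.
def asymptotic_cov(w, a1=2.0, a2=1.0):
    P = psi(w, a1, a2)
    return np.linalg.inv(P @ P.T)

T = 200.0
full = 2 * np.pi * np.arange(1, 201) / T      # {2*pi*1/200, ..., 2*pi*200/200}
shifted = 2 * np.pi * np.arange(8, 209) / T   # {2*pi*8/200, ..., 2*pi*208/200}

v_full = asymptotic_cov(full)[1, 1]           # variance entry for a2
v_shifted = asymptotic_cov(shifted)[1, 1]
print(v_shifted / v_full)                     # > 1: dropping low frequencies hurts a2
```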

While there are negative effects of excluding low frequencies, the use of high frequencies can be equally detrimental. This is illustrated in Figure 12, where we have used a frequency window up to the frequency indicated on the x-axis. The reason for the large RMSE


Figure 9: Monte-Carlo simulations of estimates of $a$ for different $T_s$ under ZOH (solid) and FOH (dot-dashed) assumptions. The model is $A(p) = p + a$, $B(p) = 1$, $\sigma = 1$ and $a = 1$. Here $N_{MC} = 250$, $T = 250$ and $\delta_0 = T_s/5$.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N_{MC}} \sum_{j=1}^{N_{MC}} \left\| \hat\theta_j - \theta^* \right\|^2} \qquad (63)
$$

at high frequencies is that there is a very large relative bias at these frequencies.


Figure 10: Monte-Carlo simulations of estimates of $a_1$, $a_2$ and $\sigma$ for different $T_s$, for three interpolation schemes: "Riemann" (solid), ZOH (dotted) and FOH (dash-dotted). The model is $A(p) = p^2 + a_1 p + a_2$, $B(p) = 1$, $\sigma = 1$, $a_1 = 2$ and $a_2 = 1$. Here $N_{MC} = 100$, $T = 200$ and $\delta_0 = T_s/5$.

9 Conclusion

In this paper we identify a continuous-time ARMA process from non-uniformly distributed samples of the output. The continuous-time realization is approximated by piecewise constant or piecewise linear interpolation, the continuous-time Fourier transform is computed for the interpolated output, and the parameters are estimated from the power spectrum obtained from the Fourier transform. A bound on the parameter bias in terms of interpolation and leakage is derived; these results focus on the effect of the sampling interval. Through a few numerical examples we show that piecewise linear and "Riemann" interpolation are not necessarily better than piecewise constant interpolation. Guidelines on frequency selection in order to reduce the parameter bias and variance are also provided.

10 Further Research

One way to proceed would be to consider the output error model structure together with the Box-Jenkins case. It would also be interesting to see if tapering and faders could reduce the bias due to leakage.

References

[1] D.R. Brillinger. Time Series. Holden-Day, San Francisco, CA, 1981.

[2] M.H.A. Davis. Linear Estimation and Stochastic Control. Chapman & Hall, London, 1998.

[3] E.K. Larsson. Identification of Stochastic Continuous-time Systems. PhD thesis, Division of Systems and Control, Uppsala University, 2003.


Figure 11: Parameter estimate dependence on frequency choice. Here $N_{MC} = 200$, $T = 200$, $T_s = 0.1$ and $\delta_0 = T_s/5$.

[4] L. Ljung. Identification: Theory for the User. Prentice-Hall, Upper Saddle River, 1999.

[5] B. Oksendal. Stochastic Differential Equations. Springer, Berlin, Heidel-berg, 1998.

[6] R. Pintelon and J. Schoukens. System Identification: A Frequency Domain Approach. IEEE Press, Piscataway, NJ, 2001.

[7] G.P. Rao and H. Garnier. Numerical illustrations of the relevance of direct continuous-time model identification. In IFAC 2002, July 2002.

[8] N.K. Sinha and G.P. Rao. Identification of Continuous-Time Systems: Methodology and Computer Implementation. Kluwer, Dordrecht, 1991.

[9] P. Stoica and R.L. Moses. Introduction to Spectral Analysis. Prentice-Hall, Upper Saddle River, 1996.

[10] B. Wahlberg. Limit results for sampled systems. International Journal of Control.

Figure 12: RMSE for a second order CAR model as a function of the maximum frequency included in the criterion.

Report no.: LiTH-ISY-R-2625, ISSN 1400-3902.
