Approximative Linear and Logarithmic Interpolation of Spectra


Joint work with Per Enqvist

Abstract

Given output data of a stationary stochastic process, estimates of covariance and cepstrum parameters can be obtained. These estimates can be used to determine ARMA models that approximately fit the data by matching the parameters exactly. However, the estimates of the parameters may contain large errors, especially if they are determined from short data sequences, and thus it makes sense to match the parameters in an approximate way. Here we consider a convex method for solving an approximate linear and logarithmic spectrum interpolation problem while maximizing the entropy and penalizing the quadratic deviation from the nominal parameters.

B.1 Introduction

A common approach to system identification is to use a method of moments where some parameters estimated from the observed system are interpolated exactly. For example, when applying the maximum entropy method of Burg [4], a set of covariances (or reflection coefficients) is estimated from the observed data. A rational model that exactly matches these covariances is then determined. The estimated parameters can have a large variance, especially if they are based on a small observed data set. Therefore, it is important that the model fitted to the parameters is relatively insensitive to small variations in the parameter values. The method of Burg deals with this problem by maximizing an entropy measure, finding the most unpredictable model consistent with the estimated covariances.

In this paper, a method is proposed that provides extra control of the robustness of the model, so that the designer can adapt the sensitivity of the model to the quality of the parameter estimates. The exact interpolation constraints are replaced by soft constraints by introducing an extra design parameter, the price of deviating from the nominal interpolation values.

It was recognized early that when the maximum entropy principle for probability functions is applied to noisy data, the result is improved if the interpolation constraints are not enforced exactly; an example for image reconstruction is given in [20]. For spectral density estimation this insight still has to mature. One of the first references we have found on this is [24], and it is also mentioned in [31, Sec. 2.3.4]. Other approaches that deal with interpolation of spectral densities with uncertain parameters, such as [28, 25] and [9, 29], used hard constraints in the form of prespecified ellipsoidal, or interval, regions for the parameters that had to be met.

B.2 Notation

For a matrix $M$, $\overline{M}$ denotes the component-wise complex conjugate, $M^T$ the transpose, and $M^*$ the Hermitian conjugate.

Let $\langle \cdot, \cdot \rangle$ denote the standard $L_2[-\pi,\pi]$ inner product for matrix-valued square-integrable functions on the unit circle, i.e.,
\[
\langle A, B \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} \operatorname{tr}\, A(e^{i\theta}) B(e^{i\theta})^* \, d\theta. \qquad (B.1)
\]
This inner product induces the norm $\|A\| = \sqrt{\langle A, A\rangle}$. In particular, if $A$ and $B$ are constant matrices then $\langle A, B\rangle = \operatorname{tr} AB^*$ and $\|A\|$ is the Frobenius norm. Define also the weighted inner product $\langle A, B\rangle_W = \langle AW, B\rangle$, where $W$ is a positive definite matrix, and the weighted norm $\|A\|_W = \sqrt{\langle A, A\rangle_W}$.

It can be shown that, for any $B$ and any Hermitian $A$,
\[
\Re\,\langle A, B\rangle_W = \langle A, B_W\rangle, \qquad (B.2)
\]
where $B_W$ is the Hermitian part of $BW$, i.e., $B_W = \tfrac{1}{2}(WB^* + BW)$.

In the norm (B.1) the adjoint is defined by $A^*(z) = \overline{A(\bar z^{-1})}^{\,T}$, and then
\[
\langle A, B \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} \operatorname{tr}\, A(e^{i\theta}) B^*(e^{i\theta}) \, d\theta.
\]

B.3 State-covariances and interpolation

Let $\Phi$ be a bona fide spectral density function, i.e., a Hermitian-valued analytic function defined on an annulus including the unit circle that is non-negative definite on the whole unit circle. Then $\Phi$ can be expressed as a Laurent series,
\[
\Phi(e^{i\theta}) = \sum_{k=-\infty}^{\infty} r_k e^{ik\theta}. \qquad (B.3)
\]

Interpolation problems for $\Phi$ will be considered. In general, any set of (conjugate) interpolation points in the unit disc can be considered. Input-to-state filters are a simple tool that can be used to formulate many interpolation problems in a uniform fashion. An input-to-state filter $G$ is defined by a matrix $A$ with all its eigenvalues in the open unit disc and a full-column-rank matrix $B$ such that $(A, B)$ is reachable, by
\[
G(z) = (zI - A)^{-1}B. \qquad (B.4)
\]

For a spectral density $\Phi$ the state-covariances are defined by
\[
\Sigma = \Gamma(\Phi) = \int G\,\Phi\, G^*. \qquad (B.5)
\]
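To make the map $\Gamma$ concrete, the following sketch (not from the paper) evaluates (B.5) numerically for a scalar spectral density by averaging $G(e^{i\theta})\Phi(e^{i\theta})G(e^{i\theta})^*$ over a uniform frequency grid; the shift-register pair $(A, B)$ used for the test anticipates Example B.3.1, and the flat spectrum $\Phi \equiv 1$ is only an illustrative choice.

```python
import numpy as np

def G(A, B, theta):
    """Input-to-state filter G(e^{i*theta}) = (e^{i*theta} I - A)^{-1} B, cf. (B.4)."""
    z = np.exp(1j * theta)
    return np.linalg.solve(z * np.eye(A.shape[0]) - A, B)

def state_covariance(A, B, Phi, n_grid=4096):
    """Riemann-sum approximation of Sigma = (1/2pi) int G Phi G^* dtheta for a scalar Phi."""
    thetas = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    Sigma = sum(Phi(th) * (G(A, B, th) @ G(A, B, th).conj().T) for th in thetas)
    return Sigma / n_grid

# Shift-register pair (anticipating Example B.3.1) and a flat spectrum Phi = 1:
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
B = np.array([[0.], [0.], [1.]])
print(np.round(state_covariance(A, B, lambda th: 1.0).real, 3))  # approximately the identity
```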

Let $\Gamma$ be this map, defined on the set of Hermitian-valued continuous functions on the unit circle, and let $L$ be the range of $\Gamma$ (which is a linear subspace of the Hermitian matrices). Now let $L_+$ be those matrices that are the image of some nonnegative continuous function under the map $\Gamma$, i.e., the set of attainable state-covariances.

It has been shown in [17] that there exists a spectral density $\Phi$ with the state-covariance $\Sigma$ if and only if there exists a matrix $H$ such that
\[
\Sigma - A\Sigma A^* = BH^* + HB^*, \qquad (B.6)
\]
i.e., (B.6) has a solution if and only if $\Sigma \in L_+$.

An important result in [19, 8] is that for every choice of a “prior” spectral density $\Psi$, there exists a unique spectral density of the form $\Phi = \Psi Q^{-1}$, where $Q = G^*\Pi G$ for some matrix $\Pi \in L$ such that $Q$ is positive definite on the unit circle. Furthermore, this spectral density minimizes the Kullback-Leibler distance $d_{KL}(\Phi\|\Psi)$, which will be defined later.

An important special case of the general interpolation problem is described in the next example.

Example B.3.1 (The Caratheodory interpolation problem) Let
\[
A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
Using (B.6) we get the well-known result that $L$ is the set of Hermitian Toeplitz matrices
\[
\begin{bmatrix}
h_3 + \bar h_3 & h_2 & h_1 \\
\bar h_2 & h_3 + \bar h_3 & h_2 \\
\bar h_1 & \bar h_2 & h_3 + \bar h_3
\end{bmatrix}.
\]

Then
\[
G(z) = (zI - A)^{-1}B = \begin{bmatrix} z^{-3} \\ z^{-2} \\ z^{-1} \end{bmatrix},
\]
and
\[
Q(z) = G(z)^*\Pi G(z) =
\begin{bmatrix} z^{3} & z^{2} & z \end{bmatrix}
\begin{bmatrix} \pi_{11} & \pi_{12} & \pi_{13} \\ \pi_{21} & \pi_{22} & \pi_{23} \\ \pi_{31} & \pi_{32} & \pi_{33} \end{bmatrix}
\begin{bmatrix} z^{-3} \\ z^{-2} \\ z^{-1} \end{bmatrix}
= \pi_{11} + \pi_{22} + \pi_{33} + (\pi_{12} + \pi_{23})z + (\pi_{21} + \pi_{32})z^{-1} + \pi_{13}z^{2} + \pi_{31}z^{-2}.
\]
An orthogonal basis for the Hermitian Toeplitz matrices is
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad
\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},\quad
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix},\quad
\begin{bmatrix} 0 & i & 0 \\ -i & 0 & i \\ 0 & -i & 0 \end{bmatrix},\quad
\begin{bmatrix} 0 & 0 & i \\ 0 & 0 & 0 \\ -i & 0 & 0 \end{bmatrix},
\]
and it spans a subspace of dimension 5 [19, Lemma 4].

Considering that $\Pi$ should be in the image of $\Gamma$, it must be a Hermitian Toeplitz matrix and we can write $\pi_0 := \pi_{11} = \pi_{22} = \pi_{33}$, $\pi_1 := \pi_{12} = \pi_{23}$ and $\pi_2 := \pi_{13}$, so $Q(z) = \pi_0 + \pi_1 z + \bar\pi_1 z^{-1} + \pi_2 z^2 + \bar\pi_2 z^{-2}$.
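As a small numerical illustration (not part of the paper), the sketch below builds $Q(e^{i\theta}) = G(e^{i\theta})^*\Pi G(e^{i\theta})$ for a Hermitian Toeplitz $\Pi$ and checks its non-negativity on a grid of the unit circle; the entries of $\Pi$ are arbitrary illustrative values.

```python
import numpy as np
from scipy.linalg import toeplitz

# Hermitian Toeplitz Pi built from an (arbitrary, illustrative) first column.
first_col = np.array([2.0, 0.5 + 0.2j, 0.1 - 0.1j])
Pi = toeplitz(first_col)            # toeplitz(c) uses conj(c) for the first row

A = np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])
B = np.array([[0.], [0.], [1.]])

def Q_of_theta(theta):
    """Q(e^{i*theta}) = G^* Pi G, a scalar pseudo-polynomial on the unit circle."""
    z = np.exp(1j * theta)
    G = np.linalg.solve(z * np.eye(3) - A, B)       # here G = [z^-3, z^-2, z^-1]^T
    return (G.conj().T @ Pi @ G).item().real

thetas = np.linspace(-np.pi, np.pi, 1000)
print("min Q on the circle:", min(Q_of_theta(th) for th in thetas))  # >= 0 iff this Pi is admissible
```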

The interpolation of such Toeplitz matrices has a common application. For a centered wide-sense stationary stochastic process $\{y_t\}_{t\in\mathbb{Z}}$ the covariances $E\{y_{t+k}\bar y_t\}$ are independent of $t$, and the power spectral density (B.3) is determined by $r_k = E\{y_k \bar y_0\}$ for all $k$.

True covariances satisfy the properties that the Toeplitz matrix
\[
T(r) = \begin{bmatrix}
r_0 & \bar r_1 & \cdots & \bar r_n \\
r_1 & r_0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \bar r_1 \\
r_n & \cdots & r_1 & r_0
\end{bmatrix} \qquad (B.7)
\]
is non-negative definite, and that $r_k = \bar r_{-k}$ for all $k$.

One way to obtain such estimates, given some samples $\{y_1, y_2, \dots, y_N\}$ from a stationary process, is the (biased) estimate of the covariances
\[
\hat r_k = \frac{1}{N}\sum_{\ell=1}^{N-k} y_{\ell+k}\bar y_\ell, \qquad k \ge 0. \qquad (B.8)
\]
These estimates can be used to form a non-negative definite Toeplitz matrix, which can then be used as the matrix $\Sigma$ in the Caratheodory special case of the interpolation problem.

In many applications the estimates are based on a small number of samples, i.e., a small $N$, and hence the bias in (B.8) may be significant; if $N = 100$ and $n = 10$ it is 10%. It has been shown in [13] that even small perturbations of the covariance estimates can generate large errors in the estimates of AR models. Therefore we propose to use unbiased estimates of the covariances, i.e., estimates

\[
\hat r_k = \frac{1}{N-k}\sum_{\ell=1}^{N-k} y_{\ell+k}\bar y_\ell, \qquad k \ge 0, \qquad (B.9)
\]
such that $E\{\hat r\} = r$, and then apply a robust matching method that is able to deal with, for example, indefinite covariance matrices.
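The sketch below (an illustration, not code from the paper) implements the biased estimator (B.8), the unbiased estimator (B.9), and the Toeplitz matrix (B.7) built from the estimates; the AR(1)-like data sequence is an arbitrary test signal.

```python
import numpy as np
from scipy.linalg import toeplitz

def covariance_estimates(y, n, biased=True):
    """r_hat[k] for k = 0..n from samples y, cf. (B.8) (biased) and (B.9) (unbiased)."""
    y = np.asarray(y)
    N = len(y)
    r_hat = np.empty(n + 1, dtype=complex)
    for k in range(n + 1):
        s = np.sum(y[k:] * np.conj(y[:N - k]))
        r_hat[k] = s / N if biased else s / (N - k)
    return r_hat

# Illustrative data: a short AR(1)-like sequence.
rng = np.random.default_rng(0)
y = np.zeros(100)
for t in range(1, 100):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()

r_biased = covariance_estimates(y, n=10, biased=True)
r_unbiased = covariance_estimates(y, n=10, biased=False)
T = toeplitz(r_unbiased)              # Hermitian Toeplitz matrix as in (B.7)
print(np.linalg.eigvalsh(T).min())    # may be negative: the unbiased T(r) can be indefinite
```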

Much more is actually known about the estimation errors; for Gaussian processes [26, Thm 4.3], or more general stationary processes with bounded fourth moments satisfying some other technical constraints [27, p. 58], it can be shown that for the estimate in (B.8), $\sqrt{N}(\hat r - r)$ is asymptotically normally distributed with mean zero and a covariance given by Bartlett's formula [1, 10, 26].

Another important special case of the general interpolation problem is described in the next example.

Example B.3.2 (The Nevanlinna-Pick case) Let
\[
A = \begin{bmatrix} p_1 & 0 & 0 \\ 0 & p_2 & 0 \\ 0 & 0 & p_3 \end{bmatrix}, \qquad
B = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
\]
It is easy to see that $\det[B \;\; AB \;\; A^2B] = (p_1 - p_2)(p_1 - p_3)(p_3 - p_2)$, and $(A, B)$ is reachable iff $p_1, p_2, p_3$ are distinct. We also assume that the diagonal elements all have absolute value less than one.

If $F$ is the positive real part of $\Phi$, i.e., the function, analytic in the unit disc, mapping the unit disc into the right half plane such that $F + F^* = \Phi$, then the constraints on $\Phi$ are equivalent to $F(\bar p_k) = h_k$ for $k = 1, \dots, 3$.

Using (B.6) we get the well-known result that $L$ is the set of Pick matrices
\[
\begin{bmatrix}
\dfrac{h_1+\bar h_1}{1-|p_1|^2} & \dfrac{h_2+\bar h_1}{1-p_1\bar p_2} & \dfrac{h_3+\bar h_1}{1-p_1\bar p_3} \\[2mm]
\dfrac{h_1+\bar h_2}{1-p_2\bar p_1} & \dfrac{h_2+\bar h_2}{1-|p_2|^2} & \dfrac{h_3+\bar h_2}{1-p_2\bar p_3} \\[2mm]
\dfrac{h_1+\bar h_3}{1-p_3\bar p_1} & \dfrac{h_2+\bar h_3}{1-p_3\bar p_2} & \dfrac{h_3+\bar h_3}{1-|p_3|^2}
\end{bmatrix}.
\]

A basis for this set of Pick matrices is
\[
\begin{bmatrix}
\frac{1}{1-|p_1|^2} & \frac{1}{1-p_1\bar p_2} & \frac{1}{1-p_1\bar p_3} \\
\frac{1}{1-p_2\bar p_1} & 0 & 0 \\
\frac{1}{1-p_3\bar p_1} & 0 & 0
\end{bmatrix},\quad
\begin{bmatrix}
0 & \frac{-i}{1-p_1\bar p_2} & \frac{-i}{1-p_1\bar p_3} \\
\frac{i}{1-p_2\bar p_1} & 0 & 0 \\
\frac{i}{1-p_3\bar p_1} & 0 & 0
\end{bmatrix},\quad
\begin{bmatrix}
0 & \frac{1}{1-p_1\bar p_2} & 0 \\
\frac{1}{1-p_2\bar p_1} & \frac{1}{1-|p_2|^2} & \frac{1}{1-p_2\bar p_3} \\
0 & \frac{1}{1-p_3\bar p_2} & 0
\end{bmatrix},\quad
\begin{bmatrix}
0 & \frac{i}{1-p_1\bar p_2} & 0 \\
\frac{-i}{1-p_2\bar p_1} & 0 & \frac{-i}{1-p_2\bar p_3} \\
0 & \frac{i}{1-p_3\bar p_2} & 0
\end{bmatrix},
\]
and
\[
\begin{bmatrix}
0 & 0 & \frac{1}{1-p_1\bar p_3} \\
0 & 0 & \frac{1}{1-p_2\bar p_3} \\
\frac{1}{1-p_3\bar p_1} & \frac{1}{1-p_3\bar p_2} & \frac{1}{1-|p_3|^2}
\end{bmatrix},
\]
and it spans a subspace of dimension 5 [19, Lemma 4]. (The lemma says that $i\alpha B$ spans the kernel of the map from $H$ to the matrix $\Sigma$, so in this example we can, for example, assume that $h_3$ is real.)

Then
\[
G(z) = (zI - A)^{-1}B = \begin{bmatrix} \dfrac{1}{1-p_1 z} \\[1.5mm] \dfrac{1}{1-p_2 z} \\[1.5mm] \dfrac{1}{1-p_3 z} \end{bmatrix},
\]
and $Q(z) = G(z)^*\Pi G(z)$ is given by
\[
Q(z) =
\begin{bmatrix} \dfrac{1}{1-\bar p_1 z^{-1}} & \dfrac{1}{1-\bar p_2 z^{-1}} & \dfrac{1}{1-\bar p_3 z^{-1}} \end{bmatrix}
\begin{bmatrix} \pi_{11} & \pi_{12} & \pi_{13} \\ \pi_{21} & \pi_{22} & \pi_{23} \\ \pi_{31} & \pi_{32} & \pi_{33} \end{bmatrix}
\begin{bmatrix} \dfrac{1}{1-p_1 z} \\[1.5mm] \dfrac{1}{1-p_2 z} \\[1.5mm] \dfrac{1}{1-p_3 z} \end{bmatrix}.
\]

The matrix
\[
P_G = \int GG^* = \begin{bmatrix}
\dfrac{1}{1-|p_1|^2} & \dfrac{1}{1-\bar p_2 p_1} & \dfrac{1}{1-\bar p_3 p_1} \\[1.5mm]
\dfrac{1}{1-\bar p_1 p_2} & \dfrac{1}{1-|p_2|^2} & \dfrac{1}{1-\bar p_3 p_2} \\[1.5mm]
\dfrac{1}{1-\bar p_1 p_3} & \dfrac{1}{1-\bar p_2 p_3} & \dfrac{1}{1-|p_3|^2}
\end{bmatrix},
\]
and the range of $\Gamma$ is the set of matrices $\Sigma$ of Pick type with elements
\[
\Sigma_{i,j} = \frac{w_i + \bar w_j}{1 - p_i \bar p_j},
\]
where $w_1, w_2, w_3$ are free parameters and the matrix should be non-negative definite.

Then $Q(z) = G(z)^*\Pi G(z)$ is given by
\[
Q(z) = \operatorname{tr}\{P_G\Pi\} + \sum_{k=1}^{3}\sum_{j=1}^{3} \frac{\pi_{kj}}{1 - p_k\bar p_j}\left(\frac{1}{1-\bar p_k z^{-1}} + \frac{1}{1-p_j z}\right),
\]
and, considering the structure of the matrix $\Pi$,
\[
Q(z) = \operatorname{tr}\{P_G\Pi\} + \sum_{k=1}^{3}\sum_{j=1}^{3} \frac{v_k + \bar v_j}{(1 - p_k\bar p_j)^2}\left(\frac{1}{1-\bar p_k z^{-1}} + \frac{1}{1-p_j z}\right)
\]
for some parameters $v_1 \in \mathbb{C}$, $v_2 \in \mathbb{C}$, $v_3 \in \mathbb{R}$ such that $Q$ is nonnegative on the unit circle.
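As a numerical aside (not from the paper), the sketch below builds $P_G$ and a Pick-type $\Sigma$ with entries $(w_i + \bar w_j)/(1 - p_i\bar p_j)$ for illustrative choices of $p$ and $w$, and tests non-negative definiteness through the smallest eigenvalue.

```python
import numpy as np

p = np.array([0.3, -0.5, 0.2 + 0.4j])        # distinct points inside the unit disc (illustrative)
w = np.array([1.0 + 0.1j, 0.8, 0.6 - 0.2j])  # free parameters w_1, w_2, w_3 (illustrative)

denom = 1.0 - np.outer(p, p.conj())          # (i, j) entry: 1 - p_i * conj(p_j)
P_G = 1.0 / denom                            # the matrix P_G = int G G^*
Sigma = (w[:, None] + w.conj()[None, :]) / denom   # Pick-type Sigma_{ij} = (w_i + conj(w_j)) / (1 - p_i conj(p_j))

print("P_G positive definite:", np.linalg.eigvalsh(P_G).min() > 0)
print("Sigma non-negative definite:", np.linalg.eigvalsh(Sigma).min() >= -1e-12)
```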


When using input-to-state filters for interpolation, as in the Kullback-Leibler minimization methods [7], [18, Sec. II.E and IV.C] and the Hellinger minimization approach in [14], it has been noted that the estimated interpolation parameters are usually not attainable. Since the estimated $\Sigma$ cannot be interpolated exactly, some kind of approximation has to be performed. Two different approaches to deal with this are

• Modifying the estimate of Σ so that it lies in L+, i.e., can be interpolated exactly

• Regularizing the interpolation problem to determine approximative solutions.

The first approach is based on the fact that the estimated parameter $\hat\Sigma$ has to be in $L_+$ for the interpolation to be feasible. One way to ensure that $\hat\Sigma$ belongs to $L_+$ is to project the initially estimated $\Sigma$ on $L$ and then modify it to lie in $L_+$.

In a recent, still unpublished, paper [15] the state-covariance in $L_+$ with minimal information divergence (Kullback-Leibler index) to the estimated state-covariance is determined, and this is likely the best current approach for modifying the estimate. This approach does not, however, take into account the structure of the model that will be fitted to the data.

The second approach is based on finding approximative solutions which are close to the interpolation values in some measure. A common approach is to run an optimization procedure on the exact problem and to stop after some iterations or when some stopping criterion says that you are close enough. This is a bit ad hoc: it gives different solutions depending on the starting point, and it is not clear in what sense the result is close to the given interpolation data.

An alternative method proposed here is based on minimizing the quadratic penalty on deviations from the given interpolation parameters. Assume that $\Sigma$ is any complex-valued matrix representing some estimate of a state-covariance (B.5), and define the deviations from these parameters by
\[
D = \Sigma - \int G\Phi G^*. \qquad (B.10)
\]

Define the Kullback-Leibler “distance” [19] as
\[
d_{KL}(\Phi, \Psi) = \int \Psi \log\det(\Psi\Phi^{-1}) \qquad (B.11)
\]
\[
\phantom{d_{KL}(\Phi, \Psi)} = \langle \Psi, \log(\Psi\Phi^{-1})\rangle. \qquad (B.12)
\]

Remark B.3.1 The function in (B.11) is positive if $\int \Phi = \int I\Psi$, and is zero if and only if $\Phi$ is equal to $I\Psi$ almost everywhere. The distance is usually used for probability densities, in which case both integrals equal one. To satisfy this condition it is necessary to assume that the interpolation of the variance is exact. Without this exact interpolation condition it is still a smooth convex function, but it may be negative.
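For scalar densities (B.11) reduces to $\int \Psi\log(\Psi/\Phi)$; the following sketch (illustrative only) approximates it with a Riemann sum over a frequency grid, using an AR(1)-type spectrum and a flat prior normalized to have the same integral.

```python
import numpy as np

def d_kl(Phi, Psi, n_grid=4096):
    """Riemann-sum approximation of (1/2pi) int Psi log(Psi/Phi) dtheta for scalar densities."""
    thetas = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    phi = np.array([Phi(th) for th in thetas])
    psi = np.array([Psi(th) for th in thetas])
    return np.mean(psi * np.log(psi / phi))

# Illustrative pair: an AR(1)-type spectrum against a flat prior with the same integral.
Phi = lambda th: 1.0 / abs(1.0 - 0.5 * np.exp(1j * th)) ** 2
r0 = np.mean([Phi(th) for th in np.linspace(-np.pi, np.pi, 4096, endpoint=False)])
Psi = lambda th: r0            # flat prior normalized so that int Phi = int Psi
print(d_kl(Phi, Psi))          # non-negative when the integrals match, cf. Remark B.3.1
```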

In some applications it makes sense to assume that $\int \Phi = I r_0 = \int I\Psi$ is an exact interpolation condition in order to have a positive Kullback-Leibler distance. Then $\Phi - Ir_0$ is the unknown part of the spectral density and the error due to this unknown part is given by
\[
D = \Sigma - P_0 - \int G(\Phi - Ir_0)G^*, \qquad (B.13)
\]
where $P_0 = \int G\,Ir_0\,G^* = r_0 P_G$. But we will not assume this here, since we need the current version of the interpolation setup for the proof of the main theorem.

Consider minimizing the weighted sum of the Kullback-Leibler distance from the prior scalar spectral density $\Psi$ to $\Phi$ and the weighted quadratic penalty on the deviations $D$ from the interpolation constraints. This is equivalent to maximizing the following function:
\[
f_\Psi(\Phi, D) = -\langle \Psi, \log(\Psi\Phi^{-1})\rangle - \frac{1}{2}\|D\|_W^2. \qquad (B.14)
\]
It will be assumed throughout that the prior spectral density $\Psi$ is positive and continuous on the unit circle.

This defines the (primal) problem
\[
(P_\Psi)\qquad
\begin{array}{ll}
\displaystyle\max_{\Phi, D} & f_\Psi(\Phi, D) \\[1mm]
\text{s.t.} & D = \Sigma - \int G\Phi G^*, \\
& \Phi(e^{i\theta}) \ge 0 \;\; \forall\theta, \quad \Phi = \Phi^*.
\end{array}
\]

Remark B.3.2 Without loss of generality $\Sigma$ can also be assumed to be in $L$, i.e., the range space of the operator $\Gamma$. $\Sigma$ can always be decomposed, uniquely, as $\Sigma = \Sigma_L + \Sigma_\perp$, where $\Sigma_L$ lies in $L$ and $\Sigma_\perp$ in the orthogonal complement in the $\|\cdot\|_W$-norm. With a similar decomposition of $D$, it follows from (B.10) that
\[
D_L = \Sigma_L - \int G\Phi G^*, \qquad D_\perp = \Sigma_\perp,
\]
and $\|D\|_W^2 = \|D_L\|_W^2 + \|D_\perp\|_W^2$. So for a general $\Sigma$, the optimal $\Phi$ is the same if $\Sigma$ is replaced with its projection on $L$ in the $\|\cdot\|_W$-norm and $D$ is constrained to $L$.

Proposition B.3.1 The objective function $f_\Psi$ is strictly concave and the constraints are linear. There always exists a finite optimal solution to $(P_\Psi)$, which is unique.

Proof.


The strict concavity follows by considering the second Gâteaux differential. The first differential is given by
\[
\delta f_\Psi(\Phi, D; \delta\Phi, \delta D) = \langle \Psi\Phi^{-1}, \delta\Phi\rangle - \Re\,\langle D, \delta D\rangle_W, \qquad (B.15)
\]
and the second differential is given by
\[
\delta^2 f_\Psi(\Phi, D; \delta\Phi, \delta D) = -\langle \Psi\Phi^{-1}(\delta\Phi)\Phi^{-1}, \delta\Phi\rangle - \|\delta D\|_W^2. \qquad (B.16)
\]
The first term can be rewritten as
\[
-\langle \Psi\Phi^{-1}(\delta\Phi)\Phi^{-1}, \delta\Phi\rangle = -\langle \Psi U(\delta\Phi)U^*, U(\delta\Phi)U^*\rangle,
\]
using that $\Phi^{-1} = U^*U$ for some non-singular matrix function $U$, the commuting property of the trace, and that $\Psi$ is a scalar function. It now follows that the second differential $\delta^2 f_\Psi$ in (B.16) is non-positive, and equal to zero if and only if $\delta\Phi$ and $\delta D$ both are zero.

We can also show that $(P_\Psi)$ always has a finite optimum. Any spectral density $\Phi$ can be written as $\Phi = \alpha\Phi_0$ where $\Phi_0$ has norm one. Define $D = \Sigma - \alpha\int G\Phi_0 G^*$; then $(\Phi, D)$ is a feasible pair. Letting $\delta D = -\int G\,\delta\Phi\, G^*$, then $(\Phi + \delta\Phi, D + \delta D)$ will also be a feasible pair. From (B.15) it follows that with this choice of $\delta D$
\[
\begin{aligned}
\delta f_\Psi &= \Big\langle \tfrac{1}{\alpha}\Psi\Phi_0^{-1} + G^*D_W G, \,\delta\Phi\Big\rangle \\
&= \Big\langle \tfrac{1}{\alpha}\Psi\Phi_0^{-1} + G^*\Sigma_W G, \,\delta\Phi\Big\rangle - \alpha\Big\langle \Big(\int G\Phi_0G^*\Big)_W, \,G\,\delta\Phi\,G^*\Big\rangle \\
&= \Big\langle \tfrac{1}{\alpha}\Psi\Phi_0^{-1} + G^*\Sigma_W G, \,\delta\Phi\Big\rangle - \alpha\,\Re\Big\langle \int G\Phi_0G^*, \,G\,\delta\Phi\,G^*\Big\rangle_W,
\end{aligned}
\]
where (B.2) was used. Inserting $\delta\Phi = -\Phi_0$, it holds that
\[
\delta f_\Psi = -\Big\langle \tfrac{1}{\alpha}\Psi + G^*\Sigma_W G\,\Phi_0, \,1\Big\rangle + \alpha\Big\|\int G\Phi_0G^*\Big\|_W^2,
\]
which is positive for $\alpha$ big enough, showing that if $\Phi$ is big enough the objective function increases if $\Phi$ is decreased by linear scaling; thus the optimum has to be finite. $\square$

The spectral density has no finite-dimensional parameterization and this makes $(P_\Psi)$ difficult to solve with a direct approach. Using duality theory, a more numerically tractable optimization problem will now be derived.

Proposition B.3.2 The dual optimization problem $(D_\Psi)$ to $(P_\Psi)$, where $\Sigma$ is assumed to be in $L$, is given by
\[
(D_\Psi)\qquad
\begin{array}{ll}
\displaystyle\min_{\Lambda} & \varphi_\Psi(\Lambda) \\[1mm]
\text{s.t.} & Q(e^{i\theta}) \ge 0 \;\; \forall\theta, \quad \Lambda W \in L,
\end{array}
\]
where the dual objective function is given by
\[
\varphi_\Psi(\Lambda) = \frac{1}{2}\|\Lambda\|_W^2 + \langle \Lambda W, \Sigma\rangle - \langle \Psi(z), \log Q(z) + I\rangle, \qquad (B.17)
\]
where $Q(z)$ is a Hermitian matrix-valued rational function defined by
\[
Q(z) = G(z)^*\Lambda_W G(z). \qquad (B.18)
\]

Proof. Consider the Lagrange relaxed objective function
\[
\begin{aligned}
L_\Psi &= -\frac{1}{2}\|D\|_W^2 - \langle \Psi, \log(\Psi\Phi^{-1})\rangle + \Re\Big\langle \Sigma - \int G\Phi G^* - D, \,\Lambda\Big\rangle_W \\
&= -\frac{1}{2}\|D\|_W^2 + \Re\,\langle \Sigma - D, \Lambda\rangle_W - \Re\Big\langle \int G\Phi G^*, \Lambda\Big\rangle_W + \langle \Psi, \log(\Phi\Psi^{-1})\rangle,
\end{aligned}
\]
where $\Lambda$ is the Lagrange multiplier with respect to the inner product $\langle\cdot,\cdot\rangle_W$.

Here, the following holds:
\[
\begin{aligned}
\Re\Big\langle \int G\Phi G^*, \Lambda\Big\rangle_W
&= \Re\int \operatorname{tr}\big(G\Phi G^* W\Lambda^*\big)
= \Re\int \operatorname{tr}\big(G^* W\Lambda^* G\Phi\big)
= \Re\int \operatorname{tr}\big(G^* \Lambda W G\Phi\big) \\
&= \int \operatorname{tr}\Big(G^*\,\frac{W\Lambda^* + \Lambda W}{2}\,G\Phi\Big)
= \langle Q, \Phi\rangle, \qquad (B.19)
\end{aligned}
\]
where
\[
Q(z) = G(z)^*\Lambda_W G(z). \qquad (B.20)
\]

Then, since both $Q$ and $\Phi$ are Hermitian-valued,
\[
\Im\,\langle Q, \Phi\rangle = -\Im\,\langle Q, \Phi\rangle,
\]
so the imaginary part is equal to zero. The Lagrange function can then be rewritten as
\[
L_\Psi = -\frac{1}{2}\|D\|_W^2 + \langle \Sigma, \Lambda W\rangle - \langle D, \Lambda W\rangle - \langle Q(z), \Phi\rangle + \langle \Psi, \log(\Phi\Psi^{-1})\rangle.
\]
Furthermore, w.l.o.g. it can also be assumed that $\Lambda W \in L$, since if $\Lambda W = \Lambda_L + \Lambda_\perp$, where $\Lambda_L \in L$ and $\Lambda_\perp \in L^\perp$, then $\langle \Sigma, \Lambda W\rangle = \langle \Sigma, \Lambda_L\rangle$ and
\[
\langle G^*\Lambda_W G, \Phi\rangle = \Big\langle \Lambda_L + \Lambda_\perp, \int G\Phi G^*\Big\rangle = \langle G^*\Lambda_L G, \Phi\rangle.
\]
If $Q(z)$ is negative for some $z = e^{i\theta}$ in some direction, then $\Phi(e^{i\theta})$ can be chosen arbitrarily large in the same direction and the problem is unbounded from above. Therefore, the corresponding $\Lambda$ will be, by definition, dual infeasible.


Whenever the spectral density is strictly positive, the differential of $L_\Psi$ is given by
\[
\begin{aligned}
\delta L_\Psi &= \Re\,\langle -\Lambda - D, \delta D\rangle_W + \langle -Q(z) + \Phi^{-1}\Psi, \delta\Phi\rangle \\
&= \langle -\Lambda - D, (\delta D)_W\rangle + \langle -Q(z) + \Phi^{-1}\Psi, \delta\Phi\rangle,
\end{aligned}
\]
where we considered (B.2). Therefore the stationarity conditions are that
\[
\hat D = -\Lambda, \qquad (B.21)
\]
and
\[
\hat\Phi(z) = \Psi(z)Q^{-1}(z). \qquad (B.22)
\]
This defines the structure of the global optimum; clearly it will have a rational form and is non-negative on the unit circle by the assumptions.

The dual objective function $\varphi_\Psi(\Lambda)$ is defined by
\[
\varphi_\Psi(\Lambda) = \max_{\Phi, D} L_\Psi(\Phi, D, \Lambda) = L_\Psi(\hat\Phi, \hat D, \Lambda),
\]
and using (B.21) and (B.22) the expression (B.17) is obtained. $\square$

If $\Phi$ is not strictly positive then the differential of $L$ cannot be determined this way. Nonetheless, it can be proved that conditions (B.21) and (B.22) hold at the optimal saddle point of the Lagrangian function.

To show this, we verify that the duality gap is zero at that point.

Lemma B.3.1 The following inequality
\[
\varphi_\Psi(\Lambda) \ge f_\Psi(\Phi, D) \qquad (B.23)
\]
holds for all feasible $\Lambda$, $\Phi$ and $D$.

Proof. By the argument above it follows that
\[
\varphi_\Psi(\Lambda) = \max_{\Phi, D} L_\Psi(\Phi, D) \ge L_\Psi(\Phi, D) \ge f_\Psi(\Phi, D),
\]
and thus (B.23) holds for all feasible $\Lambda$ such that $Q(z)$ is strictly positive on the unit circle and for all feasible $\Phi$ and $D$ where $\Phi$ is positive on the unit circle. By continuity, it then holds for all non-negative $\Phi$. $\square$

Proposition B.3.3 Suppose the optimization problem $(D_\Psi)$, where $\varphi_\Psi$ is as defined in (B.17), has an internal solution $\hat\Lambda$. Then strong duality holds, i.e., the optimal value for $(D_\Psi)$ is the same as for problem $(P_\Psi)$.


Proof. Since the dual function is convex and continuously differentiable the optimum must occur at a stationary point, which we will now characterize.

Consider the minimization with respect to $\Lambda$. The Gateaux derivative of $\varphi_\Psi$ with respect to $\Lambda$ is given by
\[
\delta\varphi_\Psi(\Lambda; \delta\Lambda) = -\int \operatorname{tr}\big(\Psi Q^{-1}(G^*\delta\Lambda_W G)\big) + \langle \Lambda, \delta\Lambda W\rangle + \langle \Sigma, \delta\Lambda W\rangle.
\]
Then
\[
\delta\varphi_\Psi(\Lambda; \delta\Lambda) = \Big\langle \Lambda + \Sigma - \int G\Phi G^*, \,\delta\Lambda W\Big\rangle, \qquad (B.24)
\]
where $\Phi = \Psi Q(z)^{-1}$.

Clearly, if $\hat\Lambda$ is the internal minimizer of the dual, the differential (B.24) has to be zero, and then the primal variables, defined as in (B.21) and (B.22),
\[
\hat D = -\hat\Lambda, \qquad \hat\Phi(z) = \Psi(z)\hat Q(z)^{-1}, \qquad (B.25)
\]
where
\[
\hat Q(z) = G^*\hat\Lambda_W G,
\]
form a feasible point for the original problem $(P_\Psi)$, i.e., satisfy (B.13) and the positivity constraint for $\Phi$.

Consider the duality gap, which is given by $\Delta_\Psi = \varphi_\Psi - f_\Psi$,
\[
\Delta_\Psi = \frac{1}{2}\|D\|_W^2 + \frac{1}{2}\|\Lambda\|_W^2 + \langle \Sigma, \Lambda W\rangle - \langle \Psi, \log Q + I\rangle + \langle \Psi, \log(\Psi\Phi^{-1})\rangle.
\]
Using (B.25) it holds that
\[
\Delta_\Psi(\hat\Phi, \hat D, \hat\Lambda) = \|\hat\Lambda\|_W^2 + \langle \hat\Lambda W, \Sigma\rangle - \langle \Psi, I\rangle,
\]
and from (B.24) we use that $\Sigma = \int G\hat\Phi G^* - \hat\Lambda$ to obtain
\[
\langle \Sigma, \hat\Lambda W\rangle = \Re\,\langle \Sigma, \hat\Lambda\rangle_W = \langle G^*\hat\Lambda_W G, \Psi\hat Q^{-1}\rangle - \|\hat\Lambda\|_W^2,
\]
showing that $\Delta_\Psi(\hat\Phi, \hat D, \hat\Lambda) = 0$.

Since there is no duality gap at the evaluated points, they have to be optimal for the primal and dual problems, respectively. This holds regardless of whether the spectral density is strictly positive definite or singular at isolated points on the unit circle. $\square$

Proposition B.3.4 The optimization problem (DΨ) always has a unique optimal solution which is internal.


Proof.

The proof is based on the proofs in [2, Thm 5.1] and [8, Sec.4], and similarly we show that ϕΨ is strictly convex and has compact sublevel sets.

First we show that there is a finite solution to $(D_\Psi)$. Any feasible $\Lambda$ can be written $\Lambda = \lambda\tilde\Lambda$, where $\lambda$ is a positive scalar, $\tilde\Lambda W \in L_+$ and $\|\tilde\Lambda\|_W = 1$. Then
\[
\varphi_\Psi(\lambda\tilde\Lambda) = \frac{1}{2}\lambda^2 + \langle \tilde\Lambda, \Sigma\rangle_W\,\lambda - \langle \Psi, \log\tilde Q + I\rangle - \langle \Psi, I\rangle\log\lambda,
\]
where $\tilde Q = G^*\tilde\Lambda_W G$, is an increasing function in $\lambda$ for $\lambda$ big enough. This ensures that $\varphi_\Psi$ has compact sublevel sets and that the optimum is attained for a finite $\Lambda$.

The first Gateaux differential of $\varphi_\Psi$ is
\[
\delta\varphi_\Psi(\Lambda; \delta\Lambda) = \langle \Lambda, \delta\Lambda W\rangle + \langle \delta\Lambda W, \Sigma\rangle - \langle \Psi, Q^{-1}G^*\delta\Lambda_W G\rangle. \qquad (B.26)
\]
It is shown in [2, 8] that if $Q_k$ is a sequence of feasible points converging to a $Q$ such that $\det Q(e^{i\theta}) = 0$ for some $\theta$, then the differential at $Q_k$ in the direction of $Q_{k+1} - Q_k$ goes to plus infinity. This shows that the problem $(D_\Psi)$ cannot have a solution on the boundary.

The second Gateaux differential is then
\[
\delta^2\varphi_\Psi(\Lambda; \delta\Lambda) = \|\delta\Lambda\|_W^2 + \langle \Psi, Q^{-1}\delta Q\, Q^{-1}\delta Q\rangle, \qquad (B.27)
\]
where $\delta Q = G^*\delta\Lambda_W G$. Since $Q$ is non-negative it has a spectral factor $S$ such that $Q = S^*S$, and then the second term in (B.27) is $\langle \Psi, (S^{-*}\delta Q S^{-1})^2\rangle$, and it follows that $\varphi_\Psi$ is strictly convex. $\square$

The spectral density obtained by solving $(D_\Psi)$ is of the form $\Phi = \Psi Q^{-1}$, where $Q = G^*\Lambda_W G$ is a rational function whose poles are the eigenvalues of $A$ and $A^{-1}$. It is common to choose the prior $\Psi$ of the same form as $Q$, since then these poles cancel and the determined spectral density has a lower degree. The zeros of $\Phi$ will then be the zeros of the prior $\Psi$ (unless they are cancelled), and no information about the zeros is obtained from the interpolation constraints. In order to obtain information about the zeros it is necessary to incorporate further information.
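As a purely illustrative numerical sketch (not the algorithm of the paper), the dual objective (B.17) can be evaluated and minimized for the Carathéodory setup with $W = I$ and a flat prior $\Psi \equiv 1$, parameterizing $\Lambda$ by the first column of a real symmetric Toeplitz matrix and replacing the constraint $Q(e^{i\theta}) \ge 0$ by a crude penalty; the covariance data in `Sigma` are arbitrary test values. The spectral density is then recovered as $\Phi = \Psi Q^{-1}$, cf. (B.22).

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.optimize import minimize

A = np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]])
B = np.array([[0.], [0.], [1.]])
Sigma = toeplitz([1.0, 0.5, 0.25])                     # illustrative covariance data
thetas = np.linspace(-np.pi, np.pi, 512, endpoint=False)
Gs = [np.linalg.solve(np.exp(1j * th) * np.eye(3) - A, B) for th in thetas]

def dual_objective(lam_col):
    """phi_Psi(Lambda) of (B.17) with W = I, Psi = 1, Lambda real symmetric Toeplitz."""
    Lam = toeplitz(lam_col)
    Q = np.array([(G.conj().T @ Lam @ G).item().real for G in Gs])
    if Q.min() <= 1e-9:               # crude replacement for the constraint Q(e^{i theta}) >= 0
        return 1e9
    return 0.5 * np.sum(Lam ** 2) + np.trace(Lam @ Sigma) - np.mean(np.log(Q) + 1.0)

res = minimize(dual_objective, x0=np.array([1.0, 0.0, 0.0]), method="Nelder-Mead")
Lam_hat = toeplitz(res.x)
Q_hat = np.array([(G.conj().T @ Lam_hat @ G).item().real for G in Gs])
Phi_hat = 1.0 / Q_hat                 # recovered spectral density Phi = Psi * Q^{-1}, cf. (B.22)
print("optimal Toeplitz column:", res.x)
```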

B.4 Linear and logarithmic approximative interpolation

Similarly as with the linear interpolation constraints (B.5), we may consider logarithmic interpolation constraints
\[
\Xi = \Upsilon(\Phi) = \int g\,\log\det(\Phi)\,g^*, \qquad (B.28)
\]
where $g$ is a possibly different (from (B.4)) input-to-state filter
\[
g(z) = (zI - a)^{-1}b, \qquad (B.29)
\]
$a$ is a matrix with all eigenvalues inside the unit circle, $b$ is a vector, and the pair $(a, b)$ is reachable. Denote the range space of the map $\Upsilon$ by $L_g$.

If $\Phi$ is scalar, the parameters in the Laurent expansion of $\log\Phi$ around the unit circle,
\[
\log\Phi(z) = \sum_{k=-\infty}^{\infty} c_k z^k, \qquad (B.30)
\]
are called the cepstrum parameters. Because of their analogous role to the covariances in (B.3), they were originally called vocariances [3]. The cepstrum parameters for small $k$ describe the envelope of the spectrum well, and are therefore the basis for popular methods used for smoothing periodogram estimates. Therefore, the matrix in (B.28) will be called the state-cepstrum of $\Phi$. Generalized cepstrum parameters are also used in [2].
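For a strictly positive scalar spectrum sampled uniformly on the unit circle, the coefficients $c_k$ in (B.30) can be approximated with an FFT of $\log\Phi$; the sketch below is only an illustration, using an AR(1)-type spectrum whose cepstrum is known in closed form.

```python
import numpy as np

def cepstrum(phi_samples):
    """Approximate c_k = (1/2pi) int log Phi(e^{i theta}) e^{-i k theta} dtheta from uniform samples."""
    N = len(phi_samples)
    return np.fft.fft(np.log(phi_samples)) / N

thetas = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)
phi = 1.0 / np.abs(1.0 - 0.5 * np.exp(1j * thetas)) ** 2     # AR(1)-type spectrum (illustrative)
c = cepstrum(phi)
print(np.round(c[:4].real, 4))   # c_0 = 0 here, and c_k = 0.5**k / k for k >= 1 in this example
```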

The interpolation problem to match both covariances $r_0, r_1, \dots, r_n$ and cepstrum parameters $c_1, \dots, c_m$ was solved using convex optimization methods in [22, 11, 5, 6], and a new approximation-theoretic justification was derived in [16]. The main problem with this approach is that it is difficult to satisfy both interpolation constraints simultaneously; in fact, they cannot be attained for most values of $\hat r$, $\hat c$. Furthermore, to the authors' knowledge, there is no method that gives estimates $\hat r$, $\hat c$ such that they can be interpolated exactly by such spectra.

Let us consider the problem of simultaneously interpolating (B.5) and (B.28).

This would be a natural generalization of the approximative covariance-cepstrum interpolation problem considered in [12].

Finding a spectral density matching both Σ and Ξ requires considering the image of the simultaneous map (Γ(Φ), Υ(Φ)). Therefore, it becomes even more important to be able to handle approximation of the given interpolation parameters.

Remark B.4.1 It should be noted that $\Upsilon(\Phi)$ depends on the entropy $c_0 = \int \log\det\Phi$. Since $\log\det\Phi = c_0 + (\log\det\Phi - c_0)$,
\[
\int g\,\log\det\Phi\,g^* = c_0\int gg^* + \int (\log\det\Phi - c_0)\,gg^*.
\]
Then
\[
\int g\,\log\det\Phi\,g^* = c_0 P_g + \int \sum_{k=1}^{\infty} c_k(z^k + z^{-k})\,gg^*,
\]
where the first term depends linearly on the entropy while the second is independent of it, and $P_g = \int gg^*$ is the solution to the Lyapunov equation
\[
P_g = aP_ga^* + bb^*.
\]
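$P_g$ can be obtained directly from this Lyapunov equation; a minimal sketch using SciPy's discrete Lyapunov solver, with illustrative filter data $a$, $b$, is given below.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative filter data for g(z) = (zI - a)^{-1} b with a stable.
a = np.array([[0.5, 0.1],
              [0.0, -0.3]])
b = np.array([[1.0],
              [1.0]])

# solve_discrete_lyapunov(a, q) returns the X solving X = a X a^H + q,
# so with q = b b^* we obtain P_g = a P_g a^* + b b^*.
P_g = solve_discrete_lyapunov(a, b @ b.conj().T)
print(P_g)
```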

As in (B.10), define the deviation
\[
D = \Sigma - \int G\Phi G^*, \qquad (B.31)
\]

and analogously for the logarithmic constraints
\[
H = \Xi - \int g\,\log\det(\Phi)\,g^* - \lambda_0 P_g, \qquad (B.32)
\]
where, considering Remark B.4.1, $\lambda_0$ is an arbitrary scalar that compensates for $\Xi$'s dependence on the entropy. Consider minimizing the weighted sum of the quadratic penalties on the deviations $D$ and $H$ from the interpolation constraints, which is equivalent to maximizing the following function:

\[
f(\Phi, D, H) = \langle \log\Phi, I\rangle - \frac{1}{2}\|D\|_W^2 - \frac{1}{2}\|H\|_V^2, \qquad (B.33)
\]
where $W$ and $V$ are given positive definite matrices. This leads to the definition of the following optimization problem:

\[
(P)\qquad
\begin{array}{ll}
\displaystyle\max_{\Phi, D, H, \lambda_0} & f(\Phi, D, H) \\[1mm]
\text{s.t.} & D = \Sigma - \int G\Phi G^*, \\
& H = \Xi - \int g\,\log\det(\Phi)\,g^* - \lambda_0 P_g, \\
& \Phi(e^{i\theta}) \ge 0 \;\; \forall\theta, \quad \Phi = \Phi^*.
\end{array}
\]

It follows from the argument in Remark B.3.2 that, without loss of generality, it can be assumed that Σ ∈ L, Ξ ∈ Lg, D ∈ L and H ∈ Lg.

Remark B.4.2 We could skip the first term, the entropy, in the objective function, and only minimize the quadratic deviations, but there are some unwanted consequences of this. It can be shown that the coefficients of the numerator and denominator of the optimal spectral density $\Phi$ will then tend to zero. Their quotient will still define the optimal spectral density, but there will likely be numerical problems. The entropy term actually helps us in this respect, acting as a normalization of the numerator.

A numerically more tractable problem is again determined by using duality theory.

Proposition B.4.1 The dual optimization problem to $(P)$ is given by
\[
(D)\qquad
\begin{array}{ll}
\displaystyle\min_{\Pi, \Lambda} & \dfrac{1}{2}\|\Lambda\|_{W^{-1}}^2 + \dfrac{1}{2}\|\Pi\|_{V^{-1}}^2 + \langle \Sigma, \Lambda\rangle - \langle \Xi, \Pi\rangle + \big\langle p(z), \log\big(p(z)Q^{-1}(z)\big) - 1\big\rangle \\[2mm]
\text{s.t.} & p(e^{i\theta}) \ge 0, \;\; Q(e^{i\theta}) \ge 0 \;\; \forall\theta, \\
& \Pi V \in L_g, \quad \langle \Pi V, P_g\rangle = 0, \quad \Lambda W \in L,
\end{array}
\]
where
\[
p(z) = 1 + g(z)^*\Pi_V g(z), \qquad Q(z) = G(z)^*\Lambda_W G(z). \qquad (B.34)
\]
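As a last illustrative sketch (not from the paper), the functions $p(e^{i\theta})$ and $Q(e^{i\theta})$ of (B.34) can be evaluated on a frequency grid to test the two positivity constraints of $(D)$ for a candidate pair $(\Pi, \Lambda)$; the subspace constraints $\Pi V \in L_g$, $\langle \Pi V, P_g\rangle = 0$ and $\Lambda W \in L$ are not checked here, all matrices are illustrative placeholders, and $W = V = I$ is assumed.

```python
import numpy as np

def filt(A, B, theta):
    """(e^{i*theta} I - A)^{-1} B for either of the input-to-state filters G or g."""
    z = np.exp(1j * theta)
    return np.linalg.solve(z * np.eye(A.shape[0]) - A, B)

# Illustrative problem data (W = V = I, so Lambda_W = Lambda and Pi_V = Pi).
A = np.array([[0., 1., 0.], [0., 0., 1.], [0., 0., 0.]]); B = np.array([[0.], [0.], [1.]])
a = np.array([[0.5, 0.0], [0.0, -0.3]]);                  b = np.array([[1.0], [1.0]])
Lam = np.diag([1.0, 1.0, 1.0])          # candidate Lambda (Hermitian)
Pi = np.diag([0.1, -0.05])              # candidate Pi (Hermitian)

thetas = np.linspace(-np.pi, np.pi, 1024, endpoint=False)
Q = np.array([(filt(A, B, th).conj().T @ Lam @ filt(A, B, th)).item().real for th in thetas])
p = np.array([1.0 + (filt(a, b, th).conj().T @ Pi @ filt(a, b, th)).item().real for th in thetas])
print("positivity constraints satisfied on the grid:", p.min() >= 0 and Q.min() >= 0)
```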

References
