
On time-frequency resolution of signal properties using parametric techniques

F. Gustafsson, S. Gunnarsson and L. Ljung

Department of Electrical Engineering, Linköping University, S-581 83 Linköping

September 6, 1993

Abstract

The problem of tracking time-varying properties of a signal is studied. The somewhat contradictory notion of a "time-varying spectrum", and how to estimate the "current" spectrum in an on-line fashion, is discussed. The traditional concepts and relations between time and frequency resolution are crucial for this problem. An adaptive estimation algorithm is used to estimate the parameters of a time-varying autoregressive model of the signal. It is shown how this algorithm can be equipped with a feature such that the time-frequency resolution trade-off favors quick detection of changes at higher frequencies and slower adaptation at lower frequencies. This is an attractive feature, comparable, for example, to what wavelet transform techniques achieve for the same problem.


1 Introduction

It is a basic problem in many applications to study and track the time-varying properties of various signals. This is at the heart of adaptation and detection mechanisms, and there is a rich literature on the subject, e.g. [16] and [12].

In many contexts it is very attractive to describe the signal characteristics in the frequency domain, i.e. its spectral properties. The spectrum is itself an averaged, time-invariant concept, and generalization to a "time-varying" spectrum is somewhat tricky.

One aspect of this problem lies in the well-known frequency-time uncertainty relation, i.e. that the frequency resolution depends on the time span.

We will argue that it is natural to demand a quicker response time, i.e. better time resolution from the adaptive algorithm, at the high-frequency end than at the low-frequency end. In other words, we seek a frequency dependent time resolution of our algorithm.

This, as such, is nothing new. A typical use of the wavelet transform is exactly to have different trade-offs between time and frequency resolution in different frequency bands.

From this perspective we shall examine current parametric adaptation algorithms and see if they can offer this desired feature. It will turn out that the most used adaptation algorithms, Least Mean Squares (LMS) and Recursive Least Squares (RLS), do not give this kind of trade-off: the time-window for RLS is frequency independent, while for LMS it depends on the level of the spectrum (not the frequency).

The major point of this contribution is, however, that a frequency-time trade-off of the desired type can also be achieved in parametric modeling. The key is to use a Kalman-filter based algorithm with a carefully tailored state noise covariance matrix.

The paper is organized as follows: In Section 2 we discuss the notion of a "time-varying" spectrum and how it can be formalized. Section 3 deals with non-parametric methods for adaptive spectrum estimation, such as the short-time Fourier transform and the wavelet transform. In Section 4 parametric spectrum modeling, in particular adaptive modeling, is discussed with an emphasis on what the time resolution depends on. Section 5 describes the suggested parametric algorithm with the desired feature of a time resolution that improves with increasing frequency. The properties of this algorithm are illustrated on real and simulated data in Section 6.


2 Time varying spectra

The basis of signal processing is to find good descriptions of a signal's properties. Consider a signal y(t), which we for this discussion take to be observed in discrete time:

y(t), \quad t = 0, 1, 2, \ldots   (1)

One of the most successful ways to describe the properties of y(t) is to study its spectrum

\Phi_y(\omega) = \sum_{k=-\infty}^{\infty} R_y(k)\, e^{-ik\omega}   (2)

where

R_y(k) = \lim_{N\to\infty} \frac{1}{N} \sum_{t=1}^{N} y(t)\, y(t-k)   (3)

assuming that the limits exist for all k. There is of course a huge literature on how to estimate and utilize spectra.

Now, the spectrum is inherently a time-invariant, or time-averaged, property. If the signal has time-varying properties (whatever that means), they will not show up in (2), other than in a time-averaged fashion.

Nevertheless we may want to capture "time-varying properties" in spectral terms, at least intuitively. There are many attempts to describe such a time-varying spectrum,

"\Phi_y(\omega, t)"   (4)

from simple spectrograms (using spectral estimates computed from finite and moving blocks of observed data) to sophisticated transforms of various kinds. Lately there has been substantial interest in the wavelet transform, also as a means to capture some variant of (4). We shall review some of these approaches in the next section.

We can think of \Phi_y(\omega, t) as a "snapshot" of the signal's frequency contents at time t. It is clear, though, that due to the uncertainty relationship between time and frequency there will be problems interpreting what a "momentary frequency" might be.

Let us here introduce a formal definition of \Phi_y(\omega, t) that in itself is non-contradictory.

We shall assume that the signal y is generated from a stationary signal source e as an AR process:

A_t(q)\, y(t) = e(t)   (5)

or, in longhand,

y(t) = -a_1(t)\, y(t-1) - \ldots - a_n(t)\, y(t-n) + e(t)   (6)


where

A_t(q) = 1 + a_1(t)\, q^{-1} + \ldots + a_n(t)\, q^{-n}   (7)

Here e(t) is white noise with variance r_2, and q^{-1} is the backward shift operator.

For the signal y(t), generated by (6), we define the momentary spectrum as

\Phi_y(\omega, t) = \frac{r_2}{|A_t(e^{i\omega})|^2}   (8)

This is an exact definition of a momentary spectrum, but the question is whether (8) captures what we intuitively have in mind with the concept "spectrum". We can make two rather obvious observations around this:

• "A quick change" in the spectrum at low frequencies is rather to be interpreted as a high frequency component in the signal.

• To be perceived as a variation of the spectrum at a certain frequency, the rate of change must be significantly slower (a factor of 10 or so) than the frequency itself.

All this is of course well in agreement with well-known practical ways of handling "time-varying spectra". In amplitude or frequency modulation, the modulating signal must change much more slowly than the carrier. That will also allow the signal to pass with the carrier through the band pass filters designed for the carrier.
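For concreteness, the definition (8) is easy to evaluate numerically. The sketch below (our own illustration in Python/NumPy; the function name is not from the paper) computes \Phi_y(\omega, t) = r_2/|A_t(e^{i\omega})|^2 for an AR(2) coefficient vector whose resonance sits at a chosen angle:

```python
import numpy as np

def momentary_ar_spectrum(a, r2, omegas):
    """Momentary spectrum (8): Phi_y(w, t) = r2 / |A_t(e^{iw})|^2,
    where a = (a_1(t), ..., a_n(t)) are the AR coefficients at time t."""
    k = np.arange(1, len(a) + 1)
    A = 1 + np.exp(-1j * np.outer(omegas, k)) @ a   # A_t(e^{iw})
    return r2 / np.abs(A) ** 2

# AR(2) with a complex pole pair at radius 0.9 and angle w0:
# a_1 = -2 * 0.9 * cos(w0), a_2 = 0.81, so the spectrum peaks near w0.
omegas = np.linspace(0.01, 3.13, 1000)
w0 = 1.0
spec = momentary_ar_spectrum(np.array([-1.8 * np.cos(w0), 0.81]), 1.0, omegas)
```

Letting the pole angle w0 drift slowly with t gives a "time-varying spectrum" in exactly the sense discussed above: the peak location then moves slowly compared to the frequencies involved.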

Still another aspect is the following: the rate of decay of the impulse response of a band pass filter depends essentially only on the absolute width of the pass band (a smaller band gives a slower filter). A pass band of a given relative width will thus correspond to a "faster" filter at higher frequencies than at lower ones, again reinforcing the notion that we can allow quicker responses to changes at higher frequencies.

The bottom line of this discussion is thus: while (6)-(8) makes perfect sense as a formal definition, it is only meaningful as a definition of a "time-varying spectrum" if the time variation of A_t(q) is such that \Phi_y(\omega, t) changes significantly more slowly than the frequency \omega in question.

We could formalize this notion in the following way: fix \omega to a certain frequency and think of \Phi_y(\omega, t) as a stochastic process over time. The increment

\Delta\Phi_y(\omega, t) = \Phi_y(\omega, t+1) - \Phi_y(\omega, t)   (9)

is thus a random variable. Let its standard deviation be \sigma(\omega):

\sqrt{E\, \Delta\Phi_y^2(\omega, t)} = \sigma(\omega)   (10)


This should vary with frequency according to

\sigma(\omega) = \frac{\omega}{2\pi\kappa}   (11)

where \kappa would be the number of frequency cycles it takes, on the average, for the spectrum to change 100%.

The immediate question thus becomes: can we give rules for the time variation of the AR coefficients a_i(t) so that (9)-(10) is secured? We shall answer that question in Sections 4 and 5.

Before that, let us note that the discussion also implies that it will be possible to detect changes in the spectrum at high frequencies much more quickly than changes at low frequencies. There is thus a direct relationship between a reasonable definition of a time-varying spectrum and the possibilities to estimate such a thing from data. Basically, we can use a shorter time-window to track high frequency spectral properties, while we need a longer one to track the low frequency behavior. This feature of a frequency-dependent time window is also central when the wavelet transform is used to estimate a time-varying spectrum.

3 Non-parametric spectrum modeling

In the time-invariant off-line case, there are basically two different non-parametric ways to estimate the spectrum, see for example [9, 10, 6]. The first one is based on an estimate of the covariance function R(k),

\hat R(k) = \frac{1}{N} \sum_{t=1}^{N-k} y(t)\, y(t+k)\, w(k)

and the spectrum definition \Phi_y(\omega) = \mathcal{F}\{R(k)\}. Here w(k) is a window function used to decrease the variance of the estimate. The second approach is to use the squared magnitude of the Fourier transform of the signal, referred to as the periodogram,

\hat\Phi_y(\omega) = |\mathcal{F}(y)|^2.

This estimate is then often smoothed by an appropriate window. These two methods are in principle the same, and we can concentrate on either of them. We will follow the latter approach when generalizing to the time-varying spectrum estimation problem.

The simplest way to achieve a time-frequency representation of a signal y(t) is to use the Short-Time Fourier Transform (STFT):

Y_{STFT}(t, \omega) = \int y(\tau)\, w_N(t - \tau)\, e^{-j\omega\tau}\, d\tau   (12)


That is, we take the Fourier transform of the time-windowed signal, where the time window has compact support: w_N(t) = 0 for t < -N and t > N. Thus, we here use 2N+1 data points to form the estimate.

A related but conceptually different method that recently has met considerable interest is the Wavelet Transform (WT), defined as

Y_{WT}(t, a) = \frac{1}{\sqrt{a}} \int y(\tau)\, w_N\!\left(\frac{t - \tau}{a}\right) d\tau.   (13)

Surveys of wavelet theory are provided by [4] and [14]. The basic wavelet w_N(t) is a bandpass function of effective time width 2N_0 + 1, where N_0 is the time width of the basic wavelet. The wavelet transform was originally introduced as a time-scale representation, since the transform is obtained as the inner product of scaled versions of the basic wavelet. For this reason, the number of data used to form the estimate is scale dependent, N(a) = N_0 a. However, if we assume that the Fourier transform of w_N(t) is essentially concentrated at its center frequency, say \omega_0, then we can put (13) into a time-frequency representation,

Y_{WT}(t, \omega) = \int y(\tau)\, \sqrt{\frac{\omega}{\omega_0}}\; w_N\!\left(\frac{\omega}{\omega_0}(\tau - t)\right) d\tau.   (14)

That is, the window size is frequency dependent,

N(\omega) = \frac{\omega_0 N_0}{\omega},

so that we have a narrower time window at higher frequencies.

Since our aim is to compare different representations in the time-frequency domain, we next try to formalize the use of wavelets in this context. A more formal way to change scale into frequency is to use the wavelet representation. For very special choices of the basic wavelet, see Daubechies (1990a), only a countable number of scales and dilations need to be evaluated. The most commonly used family of wavelets is

w_{jk}(t) = 2^{-j/2}\, w_N(2^{-j}(t - kT)),

and the signal can be represented as

y(t) = \sum_j \sum_k c_{jk}\, w_{jk}(t).

Taking the Fourier transform of the summation terms and summing over all scales j gives a time-frequency representation of y(t) at time t = kT,

Y_{WT}(kT, \omega) = \sum_j c_{jk}\, \mathcal{F}\{w_{jk}(t)\}.


Figure 1: Time-frequency resolution for the STFT (left) and WT (right).

This, rather than (14), is how the wavelet transform as a function of frequency should be computed.

Comparing (12) and (14) reveals the well-known difference in time-frequency resolution between the STFT and the WT, illustrated in Figure 1. For the STFT, the resolution is fixed and depends only on the chosen analyzing window. For the WT, the frequency resolution is worse at higher frequencies, where on the other hand the time resolution is better. Note, however, that the resolutions in time and frequency cannot be chosen independently.

Heisenberg's inequality implies a lower bound on the area of each rectangle in the figure in both cases.

One can interpret Figure 1 as an idealized contour plot of the time-frequency transform. Then, the wavelet coefficients correspond to the height of each rectangle. (Since it is only possible to have compact support in either the time or the frequency direction, the rectangles cannot be as distinct as indicated.)

It is now straightforward to define quadratic representations with the interpretation of time-frequency energy distributions, or instantaneous power spectrum as it is sometimes referred to. The spectrogram \Phi_{STFT}(t, \omega) is defined as

\Phi_{STFT}(t, \omega) = |Y_{STFT}(t, \omega)|^2.

In the same way, the scalogram is defined as

\Phi_{WT}(t, \omega) = |Y_{WT}(t, \omega)|^2,

where we used frequency rather than scale in the second argument.

There is a large family of quadratic time-frequency representations defined directly rather than through a frequency transform. A quite general description is provided by Cohen's class:

\Phi_y(t, \omega) = \iiint y(\mu + \tau/2)\, y(\mu - \tau/2)\, e^{-j\omega\tau}\, w(\xi, \tau)\, e^{-j\xi(\mu - t)}\, d\mu\, d\tau\, d\xi


Particular methods correspond to special choices of the window w, where perhaps the most well-known one is the Wigner distribution. A survey of these methods is found in [8].

We now turn our interest to parametric methods of spectral estimation.

4 Parametric spectrum modeling

4.1 Off-line modeling

Our starting point for parametric spectrum modeling is, as mentioned in Section 2, to describe the observed signal as the output of a linear system, i.e.

y(t) = H(q)\, e(t)   (15)

where the input e(t) is zero mean white noise with variance r_2. This signal description implies that the signal spectrum is given by

\Phi_y(\omega) = |H(e^{i\omega})|^2\, r_2   (16)

and consequently the spectrum modeling problem is viewed as the problem of estimating the coefficients in the transfer operator H(q). We shall concentrate on AR models, which means

H(q) = \frac{1}{A(q)}   (17)

where, as in (7),

A(q) = 1 + a_1 q^{-1} + \ldots + a_n q^{-n}   (18)

We can then describe the output signal as a linear regression

y(t) = \varphi^T(t)\,\theta + e(t)   (19)

where

\varphi(t) = (-y(t-1), \ldots, -y(t-n))^T   (20)

and

\theta = (a_1, \ldots, a_n)^T   (21)

Given an estimate \hat\theta_N of the parameters, based on observations y(1), \ldots, y(N), and an estimate \hat r_2 of the input noise variance, the spectrum estimate is formed as

\hat\Phi_y(\omega, N) = |\hat H_N(e^{i\omega})|^2\, \hat r_2   (22)

where

\hat H_N(e^{i\omega}) = \frac{1}{A(e^{i\omega}, \hat\theta_N)}   (23)


Here \hat\theta_N typically is the least squares estimate of the parameters in (19), i.e.

\hat\theta_N = \arg\min_\theta \sum_{k=1}^{N} (y(k) - \varphi^T(k)\,\theta)^2   (24)

A relevant question is now what we can say about the accuracy of the spectrum estimate. One answer to this question is found by investigating the variance of the estimate of the spectral factor H(e^{i\omega}). It can then be shown, see [3] and [13], that asymptotically in n and N,

E\,|H(e^{i\omega}) - \hat H_N(e^{i\omega})|^2 \approx \frac{n}{N}\,|H(e^{i\omega})|^2   (25)

The variance is hence inversely proportional to the number of data used in the parameter estimation. We can therefore consider N as a window width, and equation (25) tells us that the accuracy of the spectrum estimate decreases as the window width N decreases. This interpretation will be used further below.

4.2 Modeling variations in the "momentary spectrum"

Recall from Section 2 our definition of the "momentary spectrum"

\Phi_y(\omega, t) = |H_t(e^{i\omega})|^2\, r_2   (26)

where

H_t(e^{i\omega}) = \frac{1}{A_t(e^{i\omega})}   (27)

Introducing

W(\omega) = (e^{i\omega}, \ldots, e^{in\omega})^T   (28)

we can express A_t(e^{i\omega}) as

A_t(e^{i\omega}) = 1 + W^*(\omega)\,\theta(t)   (29)

Since we here study parametric methods for spectrum modeling, it is natural that an attempt to describe variations of the momentary spectrum begins by stating a model for the parameter variation. We will therefore assume that the true parameters vary according to

\theta(t) = \theta(t-1) + \nu(t)   (30)

where \nu(t), which denotes the change in the parameter vector, is a vector of zero mean white noise with covariance matrix

E[\nu(t)\,\nu^T(t)] = R_1   (31)

Equation (30) implies that the polynomial A_t(e^{i\omega}) varies according to

A_t(e^{i\omega}) = A_{t-1}(e^{i\omega}) + \Delta A_t(e^{i\omega})   (32)


Using equation (29) we obtain

\Delta A_t(e^{i\omega}) = W^*(\omega)\,\nu(t)   (33)

The average change per sample of the polynomial A_t(e^{i\omega}) can then be expressed as

E[\,|\Delta A_t(e^{i\omega})|^2\,] = W^*(\omega)\, R_1\, W(\omega) = n\, r_A(\omega)   (34)

where r_A(\omega) is a scalar function. This means that the choice of R_1 has a frequency domain interpretation as a measure of the variation of the polynomial A_t(e^{i\omega}). As an illustration we consider a simple example.

Example: Assume that R_1 is the Toeplitz matrix

R_1 = \begin{pmatrix}
r_0 & r_1 & 0 & \ldots & 0 \\
r_1 & r_0 & r_1 & 0 & \ldots \\
0 & r_1 & r_0 & r_1 & \ldots \\
\vdots & & \ddots & \ddots & \vdots
\end{pmatrix}   (35)

Equation (34) then gives

W^*(\omega)\, R_1\, W(\omega) = n\left(r_0 + \frac{n-1}{n}\, 2 r_1 \cos(\omega)\right)   (36)

i.e.

r_A(\omega) = r_0 + \frac{n-1}{n}\, 2 r_1 \cos(\omega)   (37)

or

r_A(\omega) \approx r_0 + 2 r_1 \cos(\omega)   (38)

for large values of n. This means that if r_1 is a positive number, r_A(\omega) will be larger in the low frequency range than in the high frequency range. We can also note that when the dimension of R_1 in (35) increases, the magnitude of W^*(\omega)\, R_1\, W(\omega), at some fixed frequency, increases with n. It is therefore natural to write this quantity as n\, r_A(\omega), where r_A(\omega) describes the "shape" of the variation. □
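The shape (36)-(37) induced by the Toeplitz choice (35) is easy to check numerically (our own verification sketch, not code from the paper):

```python
import numpy as np

def r_A(omega, R1):
    """Shape function of (34): W^*(w) R1 W(w) / n."""
    n = R1.shape[0]
    W = np.exp(1j * omega * np.arange(1, n + 1))   # W(w), eq (28)
    return (W.conj() @ R1 @ W).real / n

# Tridiagonal Toeplitz R1 of (35)
n, r0, r1 = 50, 1.0, 0.4
R1 = r0 * np.eye(n) + r1 * (np.eye(n, k=1) + np.eye(n, k=-1))
```

For r1 > 0, r_A(\omega) is indeed larger at low frequencies, i.e. this choice lets A_t(e^{i\omega}) vary more at the low-frequency end.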

Assuming the parameters to vary according to equation (30) also implies that the transfer function H_t(e^{i\omega}) varies according to

H_t(e^{i\omega}) = H_{t-1}(e^{i\omega}) + \Delta H_t(e^{i\omega})   (39)

The relationship between the variation of H_t and A_t can be seen as follows, where we for convenience skip the arguments:

\Delta H_t = H_t - H_{t-1} = \frac{1}{A_t} - \frac{1}{A_{t-1}} = \frac{1}{A_{t-1} + \Delta A_t} - \frac{1}{A_{t-1}} = \frac{-\Delta A_t}{(A_{t-1} + \Delta A_t)\, A_{t-1}} \approx \frac{-\Delta A_t}{A_{t-1}^2}   (40)


and this implies the approximate relationship

|\Delta H_t(e^{i\omega})|^2 = |H_{t-1}(e^{i\omega})|^4\, |\Delta A_t(e^{i\omega})|^2   (41)

The quantity of primary interest is of course the momentary spectrum \Phi_y(\omega, t), which will vary according to

\Phi_y(\omega, t) = \Phi_y(\omega, t-1) + \Delta\Phi_y(\omega, t)   (42)

Straightforward calculations give that

\Delta\Phi_y(\omega, t) = \left(|\Delta H_t(e^{i\omega})|^2 + 2\,\mathrm{Re}\,\Delta H_t(e^{i\omega})\,\bar H_{t-1}(e^{i\omega})\right) r_2   (43)

The procedure for expressing the change in the momentary spectrum as a function of the change in the parameter vector hence consists of three steps, i.e. equations (33), (40) and (43). An alternative way to describe the change in \Phi_y(\omega, t) is of course to express it directly in terms of parameter changes. This approach will be utilized in Section 5 below.

4.3 Estimating time-varying spectra

Let us now consider the problem of estimating the parameters of the "momentary spectrum". This is of course a special case of the general problem of recursive identification or adaptive modeling of signals and systems having time-varying character. General presentations of this topic are found in e.g. [12], [16] and [1]. For applications of these methods to estimation of spectral properties we refer to, for example, [15] and [5].

The estimation algorithms that we will consider can all be described by the following common algorithm structure:

\hat\theta(t) = \hat\theta(t-1) + K(t)\,\varepsilon(t)   (44)

where

\varepsilon(t) = y(t) - \varphi^T(t)\,\hat\theta(t-1)   (45)

and K(t) denotes a gain vector. The (time-varying) spectrum estimate is then formed as

\hat\Phi_y(\omega, t) = |\hat H_t(e^{i\omega})|^2\, \hat r_2   (46)

where

\hat H_t(e^{i\omega}) = \frac{1}{\hat A_t(e^{i\omega})}   (47)

and

\hat A_t(e^{i\omega}) = 1 + W^*(\omega)\,\hat\theta(t)   (48)


Having set up the algorithm structure, the next step is to select a suitable gain vector K(t). Recall the linear regression

y(t) = \varphi^T(t)\,\theta(t) + e(t)   (49)

and the model of parameter variations

\theta(t) = \theta(t-1) + \nu(t)   (50)

Equations (49) and (50) define a dynamic system in state space form, where \theta(t) is the state vector. The parameter estimation problem can hence be seen as a state estimation problem, where the optimal parameter estimate is given by the Kalman filter; see for example [12]. The Kalman filter applied to the system defined by (49) and (50) results in the update equation (44) where

K(t) = \frac{P(t-1)\,\varphi(t)}{\hat r_2 + \varphi^T(t)\, P(t-1)\,\varphi(t)}   (51)

and

P(t) = P(t-1) - \frac{P(t-1)\,\varphi(t)\,\varphi^T(t)\, P(t-1)}{\hat r_2 + \varphi^T(t)\, P(t-1)\,\varphi(t)} + \hat R_1   (52)

The design variables \hat R_1 and \hat r_2 denote the assumed covariance matrix of the parameter variations and the assumed input noise variance, respectively, and a key issue when applying the Kalman filter algorithm is to assign suitable values to these variables.

From the discussion above we also know that the choice of \hat R_1 in the Kalman filter algorithm, equation (52), can be interpreted as an assumption on how the variation of the polynomial A_t(e^{i\omega}), and thereby also of the momentary spectrum, is distributed over frequency.

Another way of deriving an algorithm is to recursively minimize the following weighted version of the criterion (24):

V_t(\theta) = \sum_{k=1}^{t} \lambda^{t-k}\,(y(k) - \varphi^T(k)\,\theta)^2   (53)

Such an operation results in the recursive least squares (RLS) method, see [12], where the parameter estimate is updated according to (44) and the gain vector is

K(t) = \frac{P(t-1)\,\varphi(t)}{\lambda + \varphi^T(t)\, P(t-1)\,\varphi(t)}   (54)

and

P(t) = \frac{1}{\lambda}\left[P(t-1) - \frac{P(t-1)\,\varphi(t)\,\varphi^T(t)\, P(t-1)}{\lambda + \varphi^T(t)\, P(t-1)\,\varphi(t)}\right]   (55)

Here \lambda denotes the so-called forgetting factor, which is used to control the length of the update step in the algorithm; it is typically chosen in the interval 0.9 \le \lambda \le 1.


Finally, the third algorithm to be studied is the so-called least mean squares (LMS) algorithm; see for example [16] for details. The LMS algorithm corresponds to choosing

K(t) = \mu\,\varphi(t)   (56)

where \mu is a positive scalar. Updating the parameter estimates according to (44) with K(t) given by (56) corresponds to taking update steps in the (approximate) gradient direction of the criterion

V(\theta) = E\,\varepsilon^2(t)   (57)

It should also be emphasized that the RLS and LMS algorithms can be interpreted as special cases of the Kalman filter algorithm with particular ad hoc choices of the matrix \hat R_1. This is discussed in [11] and will also be illustrated below.
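The Kalman filter algorithm (44), (51), (52) can be sketched as follows (our own Python/NumPy illustration; the default value of R1_hat is an arbitrary assumption for the demo, not a recommendation from the paper):

```python
import numpy as np

def kalman_ar_tracker(y, n, r2_hat=1.0, R1_hat=None):
    """Track a time-varying AR(n) model of y with the update (44),
    gain (51) and covariance recursion (52)."""
    if R1_hat is None:
        R1_hat = 1e-4 * np.eye(n)              # assumed parameter drift covariance
    theta = np.zeros(n)                        # current estimate of theta(t)
    P = np.eye(n)
    thetas = []
    for t in range(n, len(y)):
        phi = -y[t - n:t][::-1]                # phi(t) = (-y(t-1), ..., -y(t-n))^T, eq (20)
        eps = y[t] - phi @ theta               # prediction error (45)
        denom = r2_hat + phi @ P @ phi
        K = P @ phi / denom                    # Kalman gain (51)
        theta = theta + K * eps                # parameter update (44)
        P = P - np.outer(P @ phi, phi @ P) / denom + R1_hat   # covariance update (52)
        thetas.append(theta.copy())
    return np.array(thetas)
```

Setting R1_hat to the zero matrix recovers RLS with \lambda = 1 (growing memory); the spectrum estimate (46) is obtained by inserting each \hat\theta(t) into (48).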

4.4 Evaluating algorithm performance

Our aim is now to derive measures of the algorithm performance when applied to signals having time-varying spectra. The performance of different recursive parameter estimation methods has of course been studied by several authors and there are many publications available; see for example [11] and [2] for surveys. Our contribution here is measures of the quality of the estimate of the spectrum \Phi_y(\omega, t), or rather of the spectral factor H_t(e^{i\omega}), when the signal is modeled using parametric methods. We will also show that the concepts of time and frequency resolution from nonparametric spectral estimation have their interpretations also for parametric modeling methods.

Time invariant signal properties

For simplicity we will start by considering a time-invariant problem, and analogously to (25) we will use the quality measure

\Pi(\omega) = E\,|H(e^{i\omega}) - \hat H_t(e^{i\omega})|^2   (58)

Here H(e^{i\omega}) denotes the spectral factor of the "true" time-invariant spectrum, and \hat H_t(e^{i\omega}) is the estimate obtained by one of the adaptive estimation algorithms presented above. The quantity \Pi(\omega) hence measures how (the average of) the model error is distributed over frequency.

A rigorous treatment of equation (58) would result in complicated expressions for the model quality. Our approach is instead to make some simplifications and suitable approximations in order to achieve simpler, but still useful, expressions. The approximations involve the assumptions that the adaptation is "slow", which means that \hat R_1 and \mu are small and that \lambda is close to one, and that the model order n is large. Using these assumptions it can be shown, see [7], that for RLS

\Pi(\omega) \approx \frac{n\,(1-\lambda)}{2}\,|H(e^{i\omega})|^2   (59)

Comparing this equation with equation (25),

E\,|H(e^{i\omega}) - \hat H_N(e^{i\omega})|^2 \approx \frac{n}{N}\,|H(e^{i\omega})|^2   (60)

shows that the number 2/(1-\lambda) can be interpreted as an equivalent data length or "time-window". Choosing \lambda close to one hence corresponds to using a large set of data in the parameter estimation and obtaining high accuracy (low variance) in the spectrum estimate.

A similar treatment of the LMS algorithm gives (for small \mu), see [7],

\Pi(\omega) \approx \frac{n}{2}\,\mu\,\Phi_y(\omega)\,|H(e^{i\omega})|^2   (61)

Comparing this expression with (60) yields the following expression for the "equivalent time-window":

N(\omega) = \frac{2}{\mu\,\Phi_y(\omega)}   (62)

Running the LMS algorithm thus means that we use a data window with frequency dependent width. In a region with high signal energy, i.e. where \Phi_y(\omega) is large, the window is short, and this gives reduced accuracy. Equation (62) illustrates an important difference compared to the RLS algorithm: for LMS the time-window depends on the signal the algorithm is applied to, while for RLS it depends only on the design variable.

Finally, for the Kalman filter the accuracy of the estimated model is approximately given by, see [7],

\Pi(\omega) \approx \frac{n}{2}\,\sqrt{\Phi_y(\omega)}\,\sqrt{\frac{\hat r_A(\omega)}{\hat r_2}}\;|H(e^{i\omega})|^2   (63)

and this implies that the equivalent time-window has width

N(\omega) = \frac{2}{\sqrt{\Phi_y(\omega)}}\,\sqrt{\frac{\hat r_2}{\hat r_A(\omega)}}   (64)

where

\hat r_A(\omega) = \frac{1}{n}\,W^*(\omega)\,\hat R_1\,W(\omega)   (65)

and

W(\omega) = (e^{i\omega}, \ldots, e^{in\omega})^T   (66)


Also for this algorithm, the spectrum of the observed signal affects the window width, since N(\omega) is inversely proportional to the square root of the signal spectrum. However, we also see that the design variable \hat R_1 can be given a frequency domain interpretation, and this will give us a method to affect the width of the equivalent data window.
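Equations (64)-(65) can be evaluated directly for any candidate \hat R_1 (our own sketch; the argument Phi_y holds values of the signal spectrum at the grid frequencies and is assumed known or separately estimated):

```python
import numpy as np

def equiv_window(omegas, R1_hat, r2_hat, Phi_y):
    """Equivalent time-window N(w) of the Kalman filter, eqs (64)-(65)."""
    n = R1_hat.shape[0]
    W = np.exp(1j * np.outer(omegas, np.arange(1, n + 1)))        # rows W(w)^T
    rA = np.einsum('ij,jk,ik->i', W.conj(), R1_hat, W).real / n   # r_A(w), eq (65)
    return 2.0 / np.sqrt(Phi_y) * np.sqrt(r2_hat / rA)            # N(w), eq (64)
```

With the Toeplitz choice (35) and r_1 > 0, \hat r_A(\omega) is larger at low frequencies, so for a flat \Phi_y the window N(\omega) comes out shorter at the low-frequency end; obtaining the opposite, wavelet-like behavior is exactly the topic of the next section.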

We can now illustrate the statement that the RLS and LMS algorithms can be seen as special cases of the Kalman filter with ad hoc choices of \hat R_1 and \hat r_2. We will do this by showing that some particular choices of design variables in the Kalman filter algorithm result in the same expressions for the "equivalent time-window" as obtained for RLS and LMS. Another way of verifying the statement is presented in [11]. Let us start by choosing

\hat R_1 = \mu^2\, E[\varphi(t)\,\varphi^T(t)], \qquad \hat r_2 = 1   (67)

Inserting \hat R_1 into (65) gives

\hat r_A(\omega) = \mu^2\,\Phi_y(\omega)   (68)

(when n becomes large). Inserting this expression into (64) gives equation (62), i.e. the window width of the LMS algorithm. We then consider the choice

\hat R_1 = (1-\lambda)^2\, E[\varphi(t)\,\varphi^T(t)]^{-1}, \qquad \hat r_2 = 1   (69)

which (for large n) yields

\hat r_A(\omega) = \frac{(1-\lambda)^2}{\Phi_y(\omega)}   (70)

Inserted in (64) this gives

N(\omega) = \frac{2}{1-\lambda}   (71)

which we recognize from the RLS algorithm.

To summarize the presentation of adaptive spectrum modeling so far: we have introduced three different algorithms and discussed the accuracy of the estimated spectrum in terms of an "equivalent window width" when we model signals having time-invariant spectra.

The model accuracy is given by

E\,|H(e^{i\omega}) - \hat H_t(e^{i\omega})|^2 \approx \frac{n}{N(\omega)}\,|H(e^{i\omega})|^2   (72)

with N(\omega) given by (71), (62) and (64). When the variance of the spectrum estimate is high in some frequency range, for example when the AR model has a resonance peak, it will be difficult to determine the exact location and height of the peak. Referring to the presentation of the wavelet transform in the previous section, this corresponds to having poor frequency resolution. On the other hand, we know from the wavelet discussion that poor frequency resolution also means good time resolution, i.e. that we are able to detect changes in the spectrum rapidly. The question is then how this trade-off can be described when working with parametric adaptive spectrum modeling methods.


Time varying signal properties

Even though we are primarily interested in the estimate of \Phi_y(\omega, t), we will for convenience evaluate the algorithm performance in terms of the mean square error of the spectral factor estimate \hat H_t(e^{i\omega}). We will therefore represent the variation of the signal character by the quantity r_H(\omega), given by

E\,|\Delta H_t(e^{i\omega})|^2 = n\, r_H(\omega)   (73)

and measure the algorithm performance by

\Pi(\omega) = E\,|H_t(e^{i\omega}) - \hat H_t(e^{i\omega})|^2   (74)

where the spectral factor H_t(e^{i\omega}) now is time-varying. Applying the same ideas as in the time-invariant case, it can be shown, see [7], that

\Pi(\omega) \approx \frac{n}{N(\omega)}\,|H(e^{i\omega})|^2 + \frac{N(\omega)}{4}\, n\, r_H(\omega)   (75)

where N(\omega) is given by (71), (62) and (64) for the RLS, LMS and Kalman filter algorithms, respectively.

Equation (75) illustrates the trade-off between variance and tracking ability in terms of the "time-window width" N. While a large width is preferred for keeping the variance down, it gives a large tracking error, and vice versa. This is analogous to the time-frequency resolution trade-off in non-parametric spectrum modeling techniques: a wide data window reduces the variance contribution to the mean square error while, on the other hand, the tracking error contribution increases, which corresponds to having good frequency resolution and poor time resolution. While we for both the RLS and LMS algorithms can affect the window only by a scaling, we can for the Kalman filter also affect its shape by an appropriate choice of \hat R_1. The possibility to affect the time-frequency resolution by a suitable choice of \hat R_1 will be further discussed in the next section.

5 A frequency selective Kalman filter

As has been argued previously in this paper, the frequency dependent time window in the wavelet transform is intuitively very appealing. On the other hand, parametric methods have found many practical applications where the wavelet transform cannot be applied. Interpreting parametric methods as special choices of \hat R_1 in the Kalman filter, the problem here is to find a generically plausible choice of it.


In this section we will try to combine the advantages of the wavelet transform and the parametric methods. We will derive an expression for \hat R_1(t) which makes the Kalman filter frequency selective, in the sense that high frequencies are tracked faster than low frequencies.

5.1 Transforming spectral variations to \hat R_1

Before the proposed algorithm is presented, we will give a lemma for how a model of the spectral variation should be transformed to a suitable value of \hat R_1 in the Kalman filter. This means that we assume how the quantity

E\,|\Delta\Phi_y(\omega, t)|^2 = \sigma_\Phi(\omega, t)   (76)

is distributed over frequency and transform this variation back to parameter variations, represented by \hat R_1.

Lemma 5.1 Suppose the spectral variations $\Psi(\omega,t)$ are specified independently at the frequencies $\omega_1,\ldots,\omega_n$. The parameter variation $\hat{R}_1 = \mathrm{E}(\Delta\theta\,\Delta\theta^T)$ of an AR($n$) model is then, approximately for small $|\Delta\Phi_y(\omega,t)|$, given by
$$\hat{R}_1 = M^{(t-1),-T}\,\hat{R}_\Psi\,M^{(t-1),-1} \qquad (77)$$
where
$$M^{(t-1)} = \bigl(m^{(t-1)}(\omega_1)\;\ldots\;m^{(t-1)}(\omega_n)\bigr),$$
$$m_\theta = -r_2\,\frac{\bar{W}(1 + W^T\theta) + W(1 + \bar{W}^T\theta)}{|1 + W^T\theta|^4},$$
$$\hat{R}_\Psi = \mathrm{diag}\bigl(\Psi(\omega_1,t),\ldots,\Psi(\omega_n,t)\bigr)$$
and $W(\omega)$ is defined in (28).

Proof: Using a first order Taylor expansion we can express the momentary spectrum as
$$\Phi_y(\omega,t) = \Phi_y(\omega,t-1) + m_\theta^T\,\Delta\theta(t) \qquad (78)$$
where $m_\theta$ is the gradient of $\Phi_y$ with respect to the parameter vector $\theta$. Writing
$$\Phi_y = \frac{r_2}{(1 + W^T\theta)(1 + \bar{W}^T\theta)} \qquad (79)$$
we get
$$m_\theta = \frac{d\Phi_y}{d\theta} = -r_2\,\frac{\bar{W}(1 + W^T\theta) + W(1 + \bar{W}^T\theta)}{|1 + W^T\theta|^4}. \qquad (80)$$
Then recalling (42), i.e.
$$\Phi_y(\omega,t) = \Phi_y(\omega,t-1) + \Delta\Phi_y(\omega,t), \qquad (81)$$
gives
$$\Delta\Phi_y(\omega,t) = m_\theta^T\,\Delta\theta(t) \qquad (82)$$
and consequently
$$|\Delta\Phi_y(\omega,t)|^2 = m_\theta^T\,\Delta\theta(t)\,\Delta\theta^T(t)\,m_\theta. \qquad (83)$$
Thus, we have
$$\Psi(\omega,t) = m_\theta^T\,\hat{R}_1\,m_\theta. \qquad (84)$$
Since independence is assumed, we obtain a diagonal matrix $\hat{R}_\Psi$ given by
$$\hat{R}_\Psi = \mathrm{diag}\bigl(\Psi(\omega_1,t),\ldots,\Psi(\omega_n,t)\bigr). \qquad (85)$$
By evaluating the vector $m_\theta$ at these frequency points and using (84) we get the equation
$$\hat{R}_\Psi = M^{(t-1),T}\,\hat{R}_1\,M^{(t-1)} \qquad (86)$$
where
$$M^{(t-1)} = \bigl(m^{(t-1)}(\omega_1)\;\ldots\;m^{(t-1)}(\omega_n)\bigr). \qquad (87)$$
From equation (86), the corresponding parameter covariance matrix $\hat{R}_1$ in (77) follows. $\Box$

The vector $m_\theta$ is of course a function of the true, unknown parameters, but a feasible alternative is to evaluate $m_\theta$ at the current parameter estimate $\hat\theta(t-1)$. A generically plausible choice of $\Psi$ is given in the next subsection.
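The gradient expression can be checked numerically. The sketch below uses helper names of my own and assumes the AR($n$) conventions $W(\omega) = (e^{-i\omega},\ldots,e^{-in\omega})^T$, $H = 1 + W^T\theta$ and $\Phi_y = r_2/|H|^2$; it compares the closed-form $m_\theta$ against a central difference:

```python
import numpy as np

# Closed-form gradient of the AR spectrum Phi_y(w) = r2 / |1 + W(w)^T theta|^2
# with respect to theta, checked against numerical differentiation.
def spectrum(theta, w, r2=1.0):
    W = np.exp(-1j * w * np.arange(1, len(theta) + 1))
    return r2 / np.abs(1.0 + W @ theta)**2

def m_theta(theta, w, r2=1.0):
    W = np.exp(-1j * w * np.arange(1, len(theta) + 1))
    H = 1.0 + W @ theta
    return -r2 * 2.0 * np.real(np.conj(W) * H) / np.abs(H)**4

theta = np.array([0.5, 0.3])      # an arbitrary stable AR(2) parameter vector
w = 3 * np.pi / 8                 # one of the octave center frequencies
eps = 1e-6
numeric = np.array([(spectrum(theta + eps * e, w) -
                     spectrum(theta - eps * e, w)) / (2 * eps)
                    for e in np.eye(2)])
print(m_theta(theta, w), numeric)  # the two gradients agree
```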

5.2 The proposed algorithm

There are two problems to solve before the algorithm can be applied:

• How to model the spectrum variation.

• How to introduce a design parameter, corresponding to the forgetting factor in RLS, the step size in LMS and the signal to noise ratio in the Kalman filter.

For the first problem, we propose to use the same assumptions as in the wavelet transform. That is, the spectral variation in the interval $[\pi/4, \pi/2]$ is half as fast as in the interval $[\pi/2, \pi]$, and so on. This is done by picking out the center frequency of each interval, that is
$$\omega_i = \frac{3\pi}{2^{i+1}}, \qquad i = 1, 2, \ldots, n,$$
and letting
$$\Psi(\omega_i, t) = C\,2^{-2i}.$$

We can summarize the choice of $\hat{R}_1(t)$ as follows:
$$\hat{R}_1(t) = \hat{M}^{(t-1),-T}\,\hat{R}_\Psi\,\hat{M}^{(t-1),-1} \qquad (88)$$
$$\hat{R}_\Psi = C\,\mathrm{diag}\Bigl(\frac{1}{2^2},\,\frac{1}{2^4},\,\ldots,\,\frac{1}{2^{2n}}\Bigr) \qquad (89)$$
$$\hat{M}^{(t-1)} = \Bigl(\hat{m}^{(t-1)}\bigl(\tfrac{3\pi}{2^2}\bigr)\;\ldots\;\hat{m}^{(t-1)}\bigl(\tfrac{3\pi}{2^{n+1}}\bigr)\Bigr) \qquad (90)$$
$$\hat{m}^{(t-1)}(\omega) = -r_2\,\frac{2\,\mathrm{Re}\bigl(\bar{W}(\omega)\,H(e^{i\omega}, \hat\theta(t-1))\bigr)}{|H(e^{i\omega}, \hat\theta(t-1))|^4} \qquad (91)$$
Here the hats on $\hat{M}$ and $\hat{m}$ indicate that the gradients are evaluated at the current estimate $\hat\theta(t-1)$.

The only parameters left to choose are the constant $C$ in the spectral variation model and the measurement noise variance $r_2$. They have the same effect on the filter, so one of them may be used as the user knob to tune the trade-off between time and frequency resolution. However, since the norm of $\hat{R}_1(t)$ may vary considerably, simulations have shown that the filter is very difficult to tune in this way. A better choice is to use
$$\hat{\hat{R}}_1(t) = \mu\,\frac{\hat{R}_1(t)}{\|\hat{R}_1(t)\|}. \qquad (92)$$
The influence of $C$ and $r_2$ is here eliminated. The scalar $\mu$ is more easily interpreted, as a signal to noise ratio, and it has turned out to be as simple to tune as its counterpart in the Kalman filter with constant $\hat{R}_1$.
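As a sanity check of equations (88)-(92), the construction can be sketched for an AR(2) model. The helper names and the use of the Frobenius norm in the normalization are my assumptions; the gradient follows $\Phi_y = r_2/|H|^2$ with $H = 1 + W^T\theta$.

```python
import numpy as np

# Assemble the frequency selective R1_hat for an AR(2) model: evaluate the
# spectrum gradient at the octave center frequencies, map the spectral
# variation model back to parameter space, then normalize as in Eq. (92).
def m_theta(theta, w, r2=1.0):
    W = np.exp(-1j * w * np.arange(1, len(theta) + 1))
    H = 1.0 + W @ theta
    return -r2 * 2.0 * np.real(np.conj(W) * H) / np.abs(H)**4

n, mu = 2, 1e-4
theta_hat = np.array([0.5, 0.3])                      # current estimate theta(t-1)
omegas = 3 * np.pi / 2.0**(np.arange(1, n + 1) + 1)   # octave centers 3pi/4, 3pi/8
M = np.column_stack([m_theta(theta_hat, w) for w in omegas])
R_psi = np.diag(2.0**(-2.0 * np.arange(1, n + 1)))    # Psi(w_i) = C 2^{-2i}, C = 1
Minv = np.linalg.inv(M)
R1 = Minv.T @ R_psi @ Minv                            # Eq. (88)
R1 = mu * R1 / np.linalg.norm(R1)                     # Eq. (92), Frobenius norm
print(np.linalg.norm(R1), np.linalg.eigvalsh(R1))
```

By construction $\hat{R}_1$ is symmetric and positive semidefinite, so it is a valid state noise covariance, and thanks to the normalization its overall size is set by $\mu$ alone.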

6 Examples

In this section we first illustrate the performance of the frequency selective Kalman filter on the simplest possible example, and then analyse the result on a real speech signal and compare it to other methods.

Figure 2: Pole configuration of the three test signals before (poles close to the unit circle) and after the spectral change.

6.1 A simulated signal

The behavior of the frequency selective Kalman filter in a very complex situation is hard to illustrate. Simple test signals consisting of pure sinusoids are not suitable, because of known practical difficulties in estimating AR models with poles on the unit circle. Instead, we start with the simplest possible example to focus on the principles. The test signal is generated by a second order AR model, which is also the chosen model structure in the filtering; hence we allow perfect modelling here. The poles of the AR model are at first located close to the unit circle, and then suddenly moved towards the origin. In this way, we can compare the tracking ability of the frequency selective Kalman filter with that of, for instance, RLS, by varying the resonance frequency.

The poles of the three tested models are shown in Figure 2. The resonance frequencies are $3\pi/4$, $3\pi/8$ and $3\pi/16$, which are the three highest center frequencies in the perfect wavelet transform. The magnitude of the poles is changed at time 100 from 0.99 to 0.90. 100 simulations were performed with 200 data each, and the mean parameter estimate was computed. The first plot in Figure 3 shows the magnitude of the poles of the estimated AR model. As seen, the step response is slower for lower frequencies. As a comparison, the second plot shows the same estimate for the RLS algorithm with forgetting factor 0.98, and the third plot shows the standard Kalman filter with $R_1 = 10^{-4} I$. Here, there is no visible frequency dependence. We remark that the relative difference in the convergence properties of the three methods depends on the tuning parameters, which is not interesting in this context where we investigate frequency dependence.


Figure 3: The magnitude of the poles for frequencies $\omega_1 > \omega_2 > \omega_3$, estimated using the frequency selective Kalman filter, RLS and the standard Kalman filter, respectively.

6.2 A speech signal

The purpose of this subsection is to compare parametric and non-parametric methods for spectral analysis, and to test the frequency selective Kalman filter on a real example. We examine a speech signal where the letter s is pronounced like "ess". The signal is shown in Figure 4. First there is silence for about 1000 samples, then the e-sound for about 1500 samples, followed by the high-frequency dominated s-sound and then silence again.

Figure 4 also shows the spectrogram, computed using a Hamming window of width 30, for three segments of the speech signal, corresponding to silence (with some quantization noise in the measurements), the e-sound and the s-sound. We note that the e-sound has two frequency peaks, at 0.2 and 1.5 rad/s respectively, and that the s-sound is dominated by the peak at 3 rad/s.

We now examine how well the aforementioned methods find these spectral peaks and what the time responses look like. We remark again that we will not try to optimize the tuning parameters for the best possible tracking performance, since that is a rather subjective matter. Instead, the design parameters of the parametric methods are tuned to give an approximate window size of 2000 for the lowest resonance peak of the e-sound. The three plots to the right in Figure 5 show the result of

• RLS with forgetting factor 0.99,

• the Kalman filter with $R_1 = \frac{1}{16}\cdot 10^{-4}\,I$ and $R_2 = 1$,

• the frequency selective Kalman filter with $\|R_1\|/R_2 = 10^{-4}$.

The peak frequency lies in the interval $[\pi/8, \pi/4]$, where the window size is increased by a factor $4^2 = 16$ compared to the highest frequencies, which explains the factor 1/16 in the Kalman filter. As seen, this peak frequency is tracked almost identically by the three methods. Note that the frequency selective Kalman filter is faster than the standard Kalman filter at tracking the high frequency peaks.

The left column of plots in Figure 5 shows the non-parametric methods

• DFT with a sliding rectangular window of width 2000,

• wavelet transform using a 16-point approximation of the ideal high-pass filter as mother wavelet.

For the windowed DFT, the time resolution is better compared to the parametric methods. On the other hand, the frequency resolution is poorer.

Figure 4: The speech signal of "ess" and time-invariant spectrograms of its three parts.

An interesting question is how to control the tracking ability of the wavelet transform. The current plot is not quite fair to compare with the other methods, because of its very good time resolution and poor frequency resolution. Obviously, the only parameter which influences the tracking ability of the wavelet transform is the sampling period, at least if we consider the mother wavelet as given. In the highest octave, $[\pi/2, \pi]$, the time window width is approximately 2. Thus, in the third octave, which contains the peak frequency of the e-sound, the window width is $2^3 = 8$. To get the desired window width of 2000, we would have to increase the sample rate by a factor 256!
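The arithmetic behind that factor is worth spelling out. This is a back-of-the-envelope sketch under the octave numbering above, with the highest octave $[\pi/2, \pi]$ having width about 2 samples:

```python
# Wavelet window widths per octave: the highest octave has width ~2 samples
# and each lower octave doubles it, so octave i has width ~2**i samples.
width = {i: 2**i for i in range(1, 4)}   # octaves 1..3
target = 2000                            # desired width at the e-sound peak
factor = target / width[3]               # rate increase needed in octave 3
print(width[3], factor)                  # 8 samples; factor 250 (~256)
```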

7 Conclusions

In this contribution we have focused on the time and frequency resolution of several parametric methods for spectral estimation, using the terminology of the non-parametric context. In the parametric approach, we computed the spectrum from a recursively estimated AR model. It was shown that the time windows, that is, the effective number of samples used to compute the spectrum at a certain frequency, for common adaptive methods such as LMS, RLS and the Kalman filter are inherently frequency independent. The time resolution depends only on the design parameters and the spectrum itself.

We have argued that the time resolution should increase with higher frequencies, similarly to the wavelet transform. The proposed method is based on the Kalman filter, where the state noise covariance matrix is adapted recursively. This frequency selective Kalman filter was compared to other approaches on a speech signal.

Figure 5: The speech signal and its time-frequency representations. The left column contains the non-parametric methods, windowed DFT and wavelet transform; the right column contains the parametric methods, RLS, the Kalman filter and the frequency selective Kalman filter.

References

[1] M. Bellanger. Adaptive Digital Filters and Signal Analysis. Marcel Dekker, Boston, 1988.

[2] A. Benveniste. "Design of adaptive algorithms for the tracking of time-varying systems". International Journal of Adaptive Control and Signal Processing, 1:3–29, 1987.

[3] K.N. Berk. "Consistent autoregressive spectral estimates". Annals of Statistics, 2:489–502, 1974.

[4] I. Daubechies. Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.

[5] B. Widrow et al. "Adaptive noise cancelling: Principles and applications". Proceedings of the IEEE, 63:1692–1716, 1975.

[6] W.A. Gardner. Statistical Spectral Analysis. Prentice-Hall, 1988.

[7] S. Gunnarsson. "Frequency domain accuracy of recursively identified ARX models". International Journal of Control, 54:465–480, 1991.

[8] F. Hlawatsch and G.F. Boudreaux-Bartels. "Linear and quadratic time-frequency signal representations". IEEE Signal Processing Magazine, 9(2):21–68, 1992.

[9] G.M. Jenkins and D.G. Watts. Spectral Analysis and Its Applications. Holden-Day, 1968.

[10] S.M. Kay. Modern Spectral Estimation. Prentice-Hall, 1988.

[11] L. Ljung and S. Gunnarsson. "Adaptation and tracking in system identification – A survey". Automatica, 26:7–21, 1990.

[12] L. Ljung and T. Söderström. Theory and Practice of Recursive Identification. M.I.T. Press, Cambridge, MA, 1983.

[13] L. Ljung and Z.D. Yuan. "Asymptotic properties of black-box identification of transfer functions". IEEE Trans. Automatic Control, AC-30:514–530, 1985.

[14] O. Rioul and M. Vetterli. "Wavelets and signal processing". IEEE Signal Processing Magazine, 8(4):14–38, 1991.

[15] J.R. Treichler. "Transient and convergent behavior of the adaptive line enhancer". IEEE Trans. Acoustics, Speech and Signal Processing, ASSP-27:53–62, 1979.

[16] B. Widrow and S.D. Stearns. Adaptive Signal Processing. Prentice-Hall, Englewood Cliffs, N.J., 1985.
