Some Essentials of Data Analysis with Wavelets


Lecture notes in the wavelet part of the course in data analysis at The Swedish National Graduate School of Space Technology, Kiruna and Luleå, November–December 2009.

Niklas Grip^a

Department of Mathematics, Luleå University of Technology, SE-971 87 Luleå, Sweden. Niklas.Grip@ltu.se.

^a The author was partially supported by the Swedish Research Council (project registration number 2004-3862).


The lectures and the accompanying slides will give an overview and more example applications.

These lecture notes partly follow the lectures, leave out some applications mentioned there, and cover the underlying mathematics in somewhat more depth, with references to some (of many) books that give a broader and deeper coverage.


Contents

1 The discrete wavelet transform and MRA
  1.1 Basic idea: Approximation + details
  1.2 Multiresolution analysis adds smoothness and more
    1.2.1 Notation and preliminaries
    1.2.2 Multiresolution analysis
  1.3 Different wavelet families and characterizing properties
2 Time-frequency analysis and the CWT
  2.1 Time-frequency analysis
  2.2 Computing the continuous wavelet transform
3 Computer lab Matlab exercises
  3.1 Friday November 13th: Trying out DWT with Wavelet Toolbox
    3.1.1 Getting started
    3.1.2 Homework assignments
  3.2 Monday November 30th: CWT
    3.2.1 Getting started
    3.2.2 Homework assignments

Index



List of Figures

1.1 Mean value approximation on intervals of length 1, 1/2, 1/4 and 1/8
1.2 Details
1.3 The Haar wavelet ψ and its scaling function φ
1.4 Plots of a_k and a_k + d_k = a_{k+1} for k = 0, 1, . . . , 5
1.5 Pyramid algorithm
1.6 Wavelet subspaces
1.7 Noise reduction and compression
1.8 Daubechies wavelets
2.1 Time-frequency analysis in the inner ear
2.2 Time-frequency analysis with the STFT and continuous wavelet transform
2.3 Dilation and translation of a mother wavelet
2.4 Heisenberg boxes
2.5 CWT computed with the Wavelet Toolbox cwt command
2.6 CWT computed with the command cwtWTmod in homework assignment 2
2.7 CWT computed with the command cwtSparse in homework assignment 2
2.8 CWT computed with the command cwtFFT in homework assignment 2
2.9 x
2.10 Same as Figure 2.5 but for a longer time interval
2.11 Same as Figure 2.6 but for a longer time interval
2.12 Same as Figure 2.7 but for a longer time interval
2.13 Same as Figure 2.8 but for a longer time interval
3.1 Transverse strain measurements on a railway bridge
3.2 Hard, soft and firm thresholding



Chapter 1

The discrete wavelet transform and multiresolution analysis

1.1 Basic idea: Approximation + details

Suppose that you want to compute a rough approximation of some function or signal 𝑓 (𝑥) on the interval [0, 1]. The simplest approximation you can come up with is probably an approximation by one number, say the mean value:

f(x) ≈ α₀ φ(x) := a₀(x),   α₀ = ∫₀¹ f(x) dx = ∫ f(x) φ(x) dx,   φ(x) := { 1 if 0 ≤ x ≤ 1,  0 otherwise }.

This approximation is plotted in Figure 1.1 (a), and to be honest, it is a pretty rough approximation, so perhaps we want to add some more details, say, by switching to mean value approximation on intervals of length 1/2, as in Figure 1.1 (b). This is an approximation by two numbers

f(x) ≈ a₁(x) := α_{1,0} φ(2x) + α_{1,1} φ(2x − 1),

because φ(2x) and φ(2x − 1) equal one exactly on the intervals [0, 1/2] and [1/2, 1], respectively.

(This may take a little practice to see, but note, for example, that φ(2x − 1) equals one on an interval with endpoints given by 2x − 1 = 0 and 2x − 1 = 1, that is, x = 1/2 and x = 1. Hence the factor 2 shrinks the length of the interval where the function lives by a factor 2, and the “−1” moves it 1/2 to the right.) Once again, we can write the coefficients α_{j,k} in terms of φ and f by writing the mean values in the form

α_{1,k} = ∫ f(x) ⋅ 2φ(2x − k) dx,   k = 0, 1.

Similarly, we can add more details by computing mean values on intervals of length 1/4:

f(x) ≈ a₂(x) := α_{2,0} φ(4x) + α_{2,1} φ(4x − 1) + α_{2,2} φ(4x − 2) + α_{2,3} φ(4x − 3), as in Figure 1.1 (c), with coefficients being the mean values

α_{2,k} = ∫ f(x) ⋅ 4φ(4x − k) dx,   k = 0, 1, 2, 3.

In general, we can get an arbitrarily fine approximation by mean values on intervals of length 2−𝑛:

f(x) ≈ a_n(x) := Σ_{k=0}^{2^n−1} α_{n,k} φ(2^n x − k),   with   α_{n,k} = ∫ f(x) ⋅ 2^n φ(2^n x − k) dx,
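As a concrete numerical illustration, the coefficients α_{n,k} are just local mean values of f and can be computed by averaging samples. The following is a sketch in Python; the test signal, quadrature grid size and approximation levels are arbitrary choices, not anything prescribed by the text:

```python
import numpy as np

def haar_approximation(f, n, num_quad=2**12):
    """Mean values alpha[n, k] of f on the 2**n dyadic subintervals of [0, 1].

    alpha[n, k] = 2**n * integral of f over [k / 2**n, (k + 1) / 2**n],
    approximated here with the midpoint rule on a fine grid.
    """
    x = (np.arange(num_quad) + 0.5) / num_quad    # midpoint quadrature grid
    return f(x).reshape(2**n, -1).mean(axis=1)

def evaluate(alpha, x):
    """Evaluate a_n(x) = sum_k alpha[k] * phi(2**n x - k) for x in [0, 1)."""
    n = int(np.log2(len(alpha)))
    k = np.minimum((x * 2**n).astype(int), 2**n - 1)
    return alpha[k]

f = lambda x: np.sin(2 * np.pi * x) + x           # arbitrary test signal
xs = np.linspace(0, 1, 1000, endpoint=False)
for n in range(4):
    alpha = haar_approximation(f, n)
    err = np.max(np.abs(f(xs) - evaluate(alpha, xs)))
    print(f"n = {n}: {2**n:2d} coefficients, max error {err:.3f}")
```

Each extra level doubles the number of coefficients and refines the piecewise constant approximation, exactly as in Figure 1.1.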



Figure 1.1: Mean value approximation on intervals of length 1, 1/2, 1/4 and 1/8. ... (Panels (a)–(d) compare f(x) with a₀(x), a₁(x), a₂(x) and a₃(x).)

as shown in Figure 1.1 (d) for n = 3 and, for some larger n, in Figure 1.4.

Suppose now that we want to send this approximation to a friend somewhere far away, and we do not know in advance how detailed an approximation of f our friend wants to see before being satisfied and interrupting the transmission. Then it would be a waste of time and bandwidth to first send approximation a₀, then approximation a₁, followed by a₂, a₃ and so on up to some a_n, because that would require sending Σ_{k=0}^{n} 2^k = 2^{n+1} − 1 coefficients, only the last 2^n of which our friend will need and have any actual use of. A simple trick to get around this problem is to rewrite a_n as a telescope sum

a_n = a₀ + (a₁ − a₀) + (a₂ − a₁) + (a₃ − a₂) + ⋅⋅⋅ + (a_n − a_{n−1}).

The important difference is that for each new 𝑘, we will now only send the additional details needed for computing 𝑎𝑘 from 𝑎𝑘−1:

a_k = a_{k−1} + d_{k−1},   d_{k−1} := a_k − a_{k−1}.
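The bandwidth argument is easy to check numerically. The sketch below (Python; the signal and the number of levels are arbitrary choices) builds the details d_k = a_{k+1} − a_k on a fine grid, verifies the telescope sum, and counts the numbers that actually have to be sent:

```python
import numpy as np

f = lambda x: np.exp(-x) * np.cos(4 * np.pi * x)   # arbitrary signal on [0, 1]
N = 2**10
x = (np.arange(N) + 0.5) / N                       # fine evaluation grid

def a(n):
    """Mean value approximation a_n, evaluated on the fine grid."""
    means = f(x).reshape(2**n, -1).mean(axis=1)
    return np.repeat(means, N // 2**n)

n = 6
details = [a(k + 1) - a(k) for k in range(n)]      # d_0, ..., d_{n-1}
reconstructed = a(0) + sum(details)                # the telescope sum
assert np.allclose(reconstructed, a(n))

# d_k is piecewise constant on 2**(k+1) intervals but carries only 2**k new
# numbers (one per wavelet), so the total cost of a_n is 2**n numbers:
total_sent = 1 + sum(2**k for k in range(n))
assert total_sent == 2**n == 64
```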



Figure 1.2: (a) The approximations 𝑎0 and 𝑎1. (b) The approximations 𝑎1 and 𝑎2.

Figure 1.3: The Haar wavelet 𝜓 and its scaling function 𝜑.

Moreover, in our particular case, this can be done in such a way that no time or bandwidth is wasted, that is, exactly 2^k coefficients are needed for transmitting all the approximations a₀, a₁, a₂, . . . , a_k. For example, Figure 1.2 (a) shows the approximations a₀ and a₁. Their difference d₀ = a₁ − a₀ seems to be some multiple of the function ψ(x) = φ(2x) − φ(2x − 1) plotted in Figure 1.3. Similarly, Figure 1.2 (b) shows the approximations a₁ and a₂. Their difference d₁ = a₂ − a₁ looks pretty much like a linear combination β_{1,0} ψ(2x) + β_{1,1} ψ(2x − 1).

This is in fact the case. For example, for the details d₀, it holds that

d₀(x) := a₁(x) − a₀(x)
  = [∫ f(t) ⋅ 2φ(2t) dt] φ(2x) + [∫ f(t) ⋅ 2φ(2t − 1) dt] φ(2x − 1) − [∫ f(t) φ(t) dt] φ(x)
  = [∫ f(t) ⋅ 2φ(2t) dt] φ(2x) + [∫ f(t) ⋅ 2φ(2t − 1) dt] φ(2x − 1) − [∫ f(t) φ(t) dt] (φ(2x) + φ(2x − 1))
  = [∫ f(t)(2φ(2t) − φ(t)) dt] φ(2x) + [∫ f(t)(2φ(2t − 1) − φ(t)) dt] φ(2x − 1).

Since φ(t) = φ(2t) + φ(2t − 1), the two integrands simplify, so that

d₀(x) = [∫ f(t)(φ(2t) − φ(2t − 1)) dt] φ(2x) + [∫ f(t)(φ(2t − 1) − φ(2t)) dt] φ(2x − 1)
  = [∫ f(t)(φ(2t) − φ(2t − 1)) dt] (φ(2x) − φ(2x − 1)) = [∫ f(t) ψ(t) dt] ψ(x)
  = β_{0,0} ψ(x)   with   β_{0,0} = ∫ f(t) ψ(t) dt,

with ψ(x) = φ(2x) − φ(2x − 1) as depicted in Figure 1.3. Similarly, for d₁ we can repeat exactly the same kind of computation on each of the intervals [0, 1/2] and [1/2, 1] to get

d₁(x) := a₂(x) − a₁(x) = β_{1,0} ψ(2x) + β_{1,1} ψ(2x − 1)   with   β_{1,k} = ∫ f(x) ⋅ 2ψ(2x − k) dx.   (1.1)

We leave it to the doubting reader(s) to confirm this in Exercise 1 below. For larger n, it follows in exactly the same way for d_n that

d_n(x) := a_{n+1}(x) − a_n(x) = Σ_{k=0}^{2^n−1} β_{n,k} ψ(2^n x − k)   with   β_{n,k} = ∫ f(x) ⋅ 2^n ψ(2^n x − k) dx.   (1.2)

The approximations up to approximation level 6 are illustrated in Figure 1.4. The number of coefficients α_{0,k} and β_{j,k} needed to transmit a_n(x) = a₀(x) + d₀(x) + d₁(x) + ⋅⋅⋅ + d_{n−1}(x) is 1 + Σ_{k=0}^{n−1} 2^k = 2^n.
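A numerical sanity check of (1.2) — with an arbitrary test signal and level, everything evaluated on a midpoint grid so that the integrals for β_{n,k} are exact for the piecewise constant functions involved:

```python
import numpy as np

N = 2**12
x = (np.arange(N) + 0.5) / N                       # midpoint grid on [0, 1]

phi = lambda t: ((t >= 0) & (t < 1)).astype(float)
psi = lambda t: phi(2 * t) - phi(2 * t - 1)        # Haar mother wavelet

f = np.sqrt(x) * np.sin(3 * np.pi * x)             # arbitrary test signal

def a(n):
    """Mean value approximation a_n on the fine grid."""
    return np.repeat(f.reshape(2**n, -1).mean(axis=1), N // 2**n)

n = 3
# beta_{n,k} = integral of f(x) * 2**n * psi(2**n x - k) dx (midpoint rule)
beta = [(f * 2**n * psi(2**n * x - k)).mean() for k in range(2**n)]
d_n = sum(b * psi(2**n * x - k) for k, b in enumerate(beta))
assert np.allclose(d_n, a(n + 1) - a(n))           # d_n = a_{n+1} - a_n
```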

So far we have restricted attention to the interval [0, 1], but the same procedure can be repeated on each integer endpoint interval [k, k + 1]. In the terminology of the next section (with a few small changes in notation, see Exercise 2), the functions φ(x − k) and 2^{j/2} ψ(2^j x − k) (j, k integers, j ≥ 0) form the Haar wavelet basis for the function space of square integrable functions (that is, all f for which ∫ |f(x)|² dx < ∞). Every such f has a wavelet series expansion

f(x) = Σ_{k∈ℤ} a_{0,k} φ(x − k) + Σ_{j=0}^{∞} Σ_{k∈ℤ} d_{j,k} ψ_{j,k}(x),   ψ_{j,k}(x) := 2^{j/2} ψ(2^j x − k).

An arbitrarily detailed approximation can be computed by truncation to finite sums. We will see that the Haar wavelet basis is orthonormal (explained in the next section). This is the underlying mathematical reason for the simple integral formulas for the coefficients that we found above. It also follows that the sum Σ_{k∈ℤ} a_{0,k} φ(x − k) is the orthogonal projection of f on the space V₀ of square integrable functions that are constant on integer endpoint intervals [k, k + 1). For j = 0, the sum Σ_{k∈ℤ} d_{j,k} ψ_{j,k}(x) adds the details necessary to get an approximation in the space V₁ of square integrable functions that are constant on all intervals


Figure 1.4: For k = 0, 1, . . . , 5, the kth row shows a comparison of f with a_k on the left and, on the right, a plot of a_k and a_k + d_k. A comparison with row k + 1 illustrates that a_k + d_k = a_{k+1}.

[𝑘/2, (𝑘 + 1)/2) with integer 𝑘, and so on, each 𝑗 adding more details, giving a function space 𝑉𝑗 of approximations that are constant on the intervals [2−𝑗𝑘, 2−𝑗(𝑘 + 1)) for 𝑘 integer.

The Haar wavelet basis is well localized, meaning that if a function has some transient localized behaviour of importance, then we can easily make a more detailed approximation (larger j) exactly where things happen and settle for rougher approximations (smaller j) in regions where f is more slowly varying. For example, in Figure 1.4, adding details up to level j = 5 can perhaps be considered enough in the interval 5/8–7/8, but a more detailed approximation (larger j) may be desired in the interval 1/8–3/8. This localization is a clear difference from the Fourier basis functions e^{i2πξx}, which have no localization at all.

One disadvantage of the Haar basis is that it is discontinuous, hence the staircase shape of the approximation that is clearly visible in Figure 1.4, which causes larger j to be needed for a good approximation of a smooth function f. This raises a natural question: can we keep all the good properties of the Haar wavelet basis but add smoothness? Can we, somehow, “round off the corners” of φ and ψ in such a way that the functions φ(x − k) and ψ_{j,k}(x) still form an orthonormal basis, with the associated simple integral formula for the coefficients and with basis functions living on a relatively small interval?

This is, indeed, possible. The first step in that direction was a wavelet basis constructed by


Strömberg [Str81], although it got little publicity at that time. He was followed by Meyer [Mey85] a few years later. Meyer found his orthonormal wavelet as an unexpected counterexample when he, unaware of Strömberg's results, tried to, basically, prove that a well-localized wavelet basis cannot be very smooth. Soon after, other wavelet bases were found. These first constructions built on calculations that seemed miraculous at that time, but got a simple and natural explanation in the fall of 1986, when Mallat and Meyer [Mal89, Mey86] developed the multiresolution analysis framework for construction of orthonormal and biorthogonal wavelets. When he first heard about the Meyer basis, Mallat was working on image analysis, where the idea of simultaneously studying images at different scales had been popular for many years. Multiresolution analysis uses wavelets as a tool for describing the “increment of information” needed to go from a coarse approximation to a higher resolution approximation. (Source for these historical notes: [Dau92, pages 116, 129] and [Mey92, page 125].)

Exercises

1. Confirm (1.1) on at least one of the intervals [0, 1/2] and [1/2, 1] by repeating a slightly modified version of the computations done for d₀ (or by using the result (1.1) together with some suitable substitution of variables).

2. Show that if we set ψ_{n,k}(x) := 2^{n/2} ψ(2^n x − k), then we can rewrite (1.2) in a more symmetric form with sums Σ_k d_{n,k} ψ_{n,k}(x), where d_{n,k} = ∫ f(x) ψ_{n,k}(x) dx.

1.2 Multiresolution analysis adds smoothness and more

We begin with a review of some notation and terminology that will be used. The reader can either browse through Section 1.2.1 before continuing with Section 1.2.2 or go directly there and use Section 1.2.1 as reference when necessary.

1.2.1 Notation and preliminaries

This text requires no prerequisites in functional analysis, but will use some standard notation that is central for the wavelet theory and useful for reading other books or papers on the subject.

For sets 𝐴 and 𝐵, 𝑎 ∈ 𝐴 means that 𝑎 is an element in the set 𝐴 and 𝐴 ⊂ 𝐵 means that 𝐴 is a subset of 𝐵 (that is, if 𝑎 ∈ 𝐴, then 𝑎 ∈ 𝐵). We write 𝐴 ⊆ 𝐵 if 𝐴 ⊂ 𝐵 or 𝐴 = 𝐵. The union 𝐴 ∪ 𝐵 is the set of all 𝑥 such that 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵. The intersection 𝐴 ∩ 𝐵 is the set of all 𝑥 such that 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵.

Moreover, ℤ is the set of all integers and l² is the set of all sequences x = (x_k)_{k∈ℤ} of complex numbers that have a finite l² norm

‖x‖_{l²} := ( Σ_{k∈ℤ} |x_k|² )^{1/2}.

Similarly, ℝ denotes the set of all real numbers. The set of square integrable functions is denoted L²(ℝ) and consists of all complex valued functions with finite L² norm

‖f‖_{L²} := ( ∫ |f(x)|² dx )^{1/2}


(in engineering terminology, ‖f‖²_{L²} is sometimes called the energy of f). For two square integrable functions f and g, the inner product ⟨f, g⟩ is the integral

⟨f, g⟩ := ∫ f(x) g̅(x) dx,

with overline notation z̅ for the complex conjugate of a complex number z. We say that f and g are orthogonal if ⟨f, g⟩ = 0. A sequence of mutually orthogonal functions g_k is called orthonormal if all elements have norm ‖g_k‖_{L²(ℝ)} = 1.
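Orthonormality of the Haar system from Section 1.1 can be checked numerically. The sketch below (Python, midpoint-rule quadrature on an arbitrarily chosen grid) builds the Gram matrix of φ and the first few ψ_{j,k} on [0, 1] and compares it with the identity:

```python
import numpy as np

N = 2**12
x = (np.arange(N) + 0.5) / N                    # midpoint grid on [0, 1]

def phi(t):
    return ((t >= 0) & (t < 1)).astype(float)

def psi_jk(j, k):
    """Haar wavelet psi_{j,k}(x) = 2**(j/2) psi(2**j x - k) on the grid."""
    t = 2**j * x - k
    return 2**(j / 2) * (phi(2 * t) - phi(2 * t - 1))

def inner(u, v):
    return (u * v).mean()                       # integral over [0, 1]

# phi together with the Haar wavelets psi_{j,k} for j = 0, 1, 2 on [0, 1]
funcs = [phi(x)] + [psi_jk(j, k) for j in range(3) for k in range(2**j)]
G = np.array([[inner(u, v) for v in funcs] for u in funcs])   # Gram matrix
assert np.allclose(G, np.eye(len(funcs)))       # orthonormal family
```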

A sequence of functions g_k in L²(ℝ) is called a Riesz basis for a subspace V of L²(ℝ) if there exist positive real numbers A and B such that

A ‖c‖²_{l²} ≤ ‖ Σ_{k∈ℤ} c_k g_k ‖²_{L²} ≤ B ‖c‖²_{l²}   for all l² sequences c = (c_k)_{k∈ℤ}.   (1.3)

For a more thorough review of Riesz bases and related subjects we refer, for example, to [Grö00, Chr02, Gri02]. The property of crucial importance in this text is that if (g_k)_{k∈ℤ} is a Riesz basis for V, then there exists a corresponding dual Riesz basis (g̃_k)_{k∈ℤ} for V with the characterizing property that every function f in V has the unique series expansions

f = Σ_{k∈ℤ} ⟨f, g̃_k⟩ g_k = Σ_{k∈ℤ} ⟨f, g_k⟩ g̃_k   (1.4)

with convergence in norm, which means that

‖ Σ_{|k|≥N} ⟨f, g̃_k⟩ g_k ‖_{L²(ℝ)} → 0   and   ‖ Σ_{|k|≥N} ⟨f, g_k⟩ g̃_k ‖_{L²(ℝ)} → 0

when N → ∞. For the wavelet spaces V_j of main importance in this text, convergence in norm implies uniform pointwise convergence [EG05, Lemma 2.1]. The type of convergence is of great mathematical importance, but not anything that the reader needs to worry about for a sufficient understanding of the material presented in this text.

From the uniqueness of (1.4) it follows that the sequences (g_k) and (g̃_k) are biorthogonal, that is,

⟨g_n, g̃_k⟩ = δ_k[n] := { 1 if k = n,  0 otherwise }.   (1.5)

For A = B = 1, (1.3) becomes the Parseval relation ‖ Σ_{k∈ℤ} c_k g_k ‖²_{L²} = ‖c‖²_{l²}, from which it follows [Gri02, Theorem 1.54] that g_k = g̃_k and that the functions g_k are orthonormal; hence (g_k) is called an orthonormal basis. One of the most well-known orthonormal bases is the Fourier basis (e^{i2πkξ})_{k∈ℤ} for the set L²([0, 1)) of square integrable functions on the interval [0, 1).

1.2.2 Multiresolution analysis

Definition 1 (Multiresolution analysis). A multiresolution analysis (MRA) consists of a sequence (V_j)_{j∈ℤ} of closed subspaces of L²(ℝ) satisfying

V_j ⊂ V_{j+1} for all j ∈ ℤ;   (1.6a)
L²(ℝ) is the closure of ∪_{j∈ℤ} V_j, that is, the smallest closed subspace of L²(ℝ) that contains all V_j;   (1.6b)
f ∈ V_j if and only if f(2(⋅)) ∈ V_{j+1}, for all j ∈ ℤ;   (1.6c)
∩_{j∈ℤ} V_j = {0};   (1.6d)
For some φ ∈ V₀, the sequence (φ(⋅ − k))_{k∈ℤ} is a Riesz basis for V₀.   (1.6e)


Figure 1.5: The pyramid algorithm. (a) Decomposition of a function into its wavelet coefficients. (b) (Re)construction of a function from its wavelet coefficients. For “large enough” J, the projection P_J on V_J can be neglected and the pre- and postfilters can be approximated with an identity mapping. Hence it is common practice to choose a big J and use sample values of f as a good approximation of the input coefficient vector a_J in (a), and similarly for the output vector a_J in (b). Convolution with the filters h, l, h̃ and l̃ is here denoted by the corresponding frequency responses, for example H(ξ) := Σ_{k∈ℤ} h[k] e^{−i2πξk} and, for the involution h̃⁻[k] := h̃[−k], H̃⁻(ξ) := Σ_{k∈ℤ} h̃[−k] e^{−i2πξk}, and similarly for l and l̃.

The function φ in (1.6e) is called a scaling function.

Conditions (1.6a) and (1.6b) guarantee that (V_j) is a chain of growing subspaces of L²(ℝ) such that any function in L²(ℝ) can be approximated arbitrarily well by some function f_j ∈ V_j for a large enough j. Condition (1.6c) guarantees that the functions in V_j contain smaller and smaller details as j increases: V_{j+1} approximates functions at twice as fine a scale as V_j. Condition (1.6d) means that the only finite energy function that can be approximated at an arbitrarily coarse scale is the zero function.

Condition (1.6e), finally, is clearly necessary for φ to have the same role in an MRA wavelet basis that the Haar scaling function had in Section 1.1.

All practically useful scaling functions have a finite integral ∫ φ(x) dx < ∞, which implies (see Remark 1 below) that conditions (1.6b) and (1.6d) follow from the other conditions (1.6a), (1.6c) and (1.6e) and, for orthonormal wavelets, it also follows that

∫ φ(x) dx = 1.   (1.7)

The theoretical importance of the MRA definition cannot be overestimated. For example, the conditions (1.6) can be viewed as a mathematical problem that can be solved under different additional constraints on, for example, the smoothness or support of φ or the corresponding wavelet (some such properties are explained in Section 1.3). The mathematical machinery used to solve such a problem (see, for example, [HW96, BEL99, Mal99]) is outside the scope of this more applications-oriented text, so here we only note that it can be done and that the resulting solutions are different families of scaling functions satisfying the conditions (1.6) as well as some additional constraints.

For each such scaling function, there are several important consequences of the MRA definition that we will now describe in somewhat more detail.


Figure 1.6: Wavelet subspaces. (a) A Venn diagram that illustrates that V_j ⊂ V_{j+1} and that V_j ⊕ W_j = V_{j+1}. (b) Illustration of the fact that V_j is spanned by the sequence (φ_{j,k})_{k∈ℤ}, and similarly for Ṽ_j, W_j and W̃_j, as well as the biorthogonality conditions between the spaces and their basis functions: for example, V_j and W_j can be non-orthogonal, but W̃_j ⊥ V_j and Ṽ_j ⊥ W_j.

Perhaps the most important consequence of (1.6) is the existence of the fast so-called pyramid algorithm for computing the coefficients of the MRA wavelet series expansion from sample values of the analysed signal. This mapping from sample values to wavelet coefficients is sometimes called the discrete wavelet transform (DWT). The pyramid algorithm computes both the DWT and its inverse in O(N) time for signal length N (that is, for large N the number of arithmetic operations is asymptotically proportional to N when N → ∞). This can be compared to the fast Fourier transform, which computes the discrete Fourier transform in O(N log(N)) time for signal length N.

The pyramid algorithm consists of repeated use of certain lowpass filters L and highpass filters H in the way depicted in Figure 1.5.

Important MRA consequences leading to the pyramid algorithm

We will now list some very important consequences of the MRA definition (1.6) and use them for deriving the pyramid algorithm in step 5 of the list.

1. It follows from (1.6a) and elementary functional analysis (see, for example, [Kre78]) that each space V_{j+1} can be decomposed into V_j and a subspace W_j such that every f in V_{j+1} can be uniquely decomposed into f = u + v with u ∈ V_j and v ∈ W_j. We write this

V_{j+1} = V_j ⊕ W_j,   ∀j ∈ ℤ   (1.8)

(corresponding to a_{n+1}(x) = a_n(x) + d_n(x) in (1.2)). This relation between the subspaces is illustrated in Figure 1.6. If all such u and v are orthogonal (⟨u, v⟩ = 0), then W_j is the orthogonal complement of V_j in V_{j+1} (V_j ⊥ W_j) and the construction below will give the scaling function and mother wavelet of an orthonormal wavelet basis for L²(ℝ). In general, however, W_j does not have to be the orthogonal complement of V_j^a (see Figure 1.6 (a)).

^a This is just like in linear algebra, where, for example, the vector space ℝ³ can be decomposed into ℝ³ = U ⊕ V


One example of this will be the (biorthogonal) spline wavelets whose scaling function is defined in Example 4.

2. Since {φ(x − k)}_{k∈ℤ} is a Riesz basis for V₀, there is a corresponding dual Riesz basis that turns out to have the same structure [Gröch/Aldroubi citation here???], that is, the basis elements are integer shifts φ̃(x − k) of the dual scaling function φ̃.

3. From (1.6a) it follows that φ, φ̃ ∈ V₀ ⊆ V₁. Moreover, from (1.6e) and (1.6c) it follows that {√2 φ(2x − k)}_{k∈ℤ} and {√2 φ̃(2x − k)}_{k∈ℤ} are dual Riesz bases for V₁. Hence we get the important scaling equation for φ and φ̃, respectively:

φ(x) = Σ_{k∈ℤ} l[k] √2 φ(2x − k)   and   φ̃(x) = Σ_{k∈ℤ} l̃[k] √2 φ̃(2x − k),   (1.9)

with coefficients computed with the standard formula as in (1.4):

l[k] = ∫ φ(x) √2 φ̃(2x − k) dx   and   l̃[k] = ∫ φ̃(x) √2 φ(2x − k) dx.
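For the Haar scaling function, the integral formula for l[k] can be evaluated directly. In the orthonormal Haar case φ̃ = φ, so the sketch below (Python, arbitrarily chosen quadrature grid) recovers the familiar two-tap filter l[0] = l[1] = 1/√2 and verifies the scaling equation:

```python
import numpy as np

N = 2**12
x = (np.arange(N) + 0.5) / N                    # midpoint grid on [0, 1]
phi = lambda t: ((t >= 0) & (t < 1)).astype(float)

# l[k] = integral of phi(x) * sqrt(2) * phi(2x - k) dx, for a few values of k
ks = range(-1, 3)
l = [np.mean(phi(x) * np.sqrt(2) * phi(2 * x - k)) for k in ks]
# Only k = 0 and k = 1 give nonzero taps, both equal to 1/sqrt(2):
assert abs(l[1] - 2**-0.5) < 1e-12 and abs(l[2] - 2**-0.5) < 1e-12
assert abs(l[0]) < 1e-12 and abs(l[3]) < 1e-12

# Check the scaling equation phi(x) = sum_k l[k] sqrt(2) phi(2x - k):
rhs = sum(lk * np.sqrt(2) * phi(2 * x - k) for lk, k in zip(l, ks))
assert np.allclose(phi(x), rhs)
```

The same two taps reappear as the Haar lowpass filter in the pyramid algorithm below.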

4. Now set

ψ(x) := Σ_{k∈ℤ} h[k] √2 φ(2x − k)   and   ψ̃(x) := Σ_{k∈ℤ} h̃[k] √2 φ̃(2x − k),   (1.10a)

with

h[k] := (−1)^{1−k} l̃[1 − k]   and   h̃[k] := (−1)^{1−k} l[1 − k].   (1.10b)

Then it can be shown ([BEL99, Section 4.6], [Mal99, Section 7.4] or the more general result [HW96, p. 57–59] proved for orthonormal wavelets) that {ψ(x − k)}_{k∈ℤ} is a Riesz basis for W₀ with a dual basis of the same structure {ψ̃(x − k)}_{k∈ℤ}. Moreover, with the notation

φ_{j,k}(x) = 2^{j/2} φ(2^j x − k), and similarly for φ̃, ψ and ψ̃,   (1.11)

a substitution of variables gives corresponding biorthogonal bases {ψ_{j,k}}_{k∈ℤ} and {ψ̃_{j,k}}_{k∈ℤ} for corresponding subspaces W_j and W̃_j, respectively, in a decomposition of L²(ℝ) into subspaces V₀ ⊕ W₀ ⊕ W₁ ⊕ W₂ ⊕ W₃ ⊕ ⋅⋅⋅ with corresponding dual subspaces Ṽ₀ ⊕ W̃₀ ⊕ W̃₁ ⊕ W̃₂ ⊕ W̃₃ ⊕ ⋅⋅⋅. In other words, every square integrable function f has the series expansion

f(x) = Σ_{k∈ℤ} ⟨f, φ̃_{0,k}⟩ φ_{0,k}(x) + Σ_{j=0}^{∞} Σ_{k∈ℤ} ⟨f, ψ̃_{j,k}⟩ ψ_{j,k}(x)

(and similarly with the roles of the basis and the dual basis interchanged, as in (1.4)).

Here, just like for the Haar wavelet in Section 1.1, Σ_{k∈ℤ} ⟨f, φ̃_{0,k}⟩ φ_{0,k}(x) is a coarse scale V₀-approximation of f and, for every j, the sum Σ_{k∈ℤ} ⟨f, ψ̃_{j,k}⟩ ψ_{j,k}(x) adds the details in W_j, that is, exactly the details needed to get from the V_j-approximation to the next finer approximation with a function in V_{j+1}.

with U being the x-y-plane and V the z-axis. In this case, U and V are orthogonal, but we can just as well write ℝ³ = U ⊕ V with V being the straight line spanned by the vector (1 1 1)ᵀ.


5. It follows that every f in V_j = V_{j−1} ⊕ W_{j−1} can be written in the two equivalent forms

f(x) = Σ_{k∈ℤ} a_{j,k} φ_{j,k}(x) = Σ_{k∈ℤ} ( a_{j−1,k} φ_{j−1,k}(x) + d_{j−1,k} ψ_{j−1,k}(x) ).   (1.12)

The decomposition part of the pyramid algorithm (depicted in Figure 1.5 (a)) computes the coefficients a_{j−1,k} and d_{j−1,k} from the coefficients a_{j,k}.

To do this we first need to reformulate the scaling equation (1.9) into the corresponding equations for V_{j−1}:

φ(x) = Σ_{n∈ℤ} l[n] √2 φ(2x − n)
2^{(j−1)/2} φ(2^{j−1}x) = Σ_{n∈ℤ} l[n] √2 ⋅ 2^{(j−1)/2} φ(2 ⋅ 2^{j−1}x − n)
2^{(j−1)/2} φ(2^{j−1}x) = Σ_{n∈ℤ} l[n] 2^{j/2} φ(2^j x − n)   (substitute x ↦ x − 2^{−(j−1)}k)
2^{(j−1)/2} φ(2^{j−1}x − k) = Σ_{n∈ℤ} l[n] 2^{j/2} φ(2^j x − (2k + n))
φ_{j−1,k}(x) = Σ_{n∈ℤ} l[n] φ_{j,2k+n}(x),   and similarly,   φ̃_{j−1,k}(x) = Σ_{n∈ℤ} l̃[n] φ̃_{j,2k+n}(x).   (1.13)

Insertion in the standard formula for the coefficients a_{j−1,k} in (1.12) gives

a_{j−1,k} = ⟨f, φ̃_{j−1,k}⟩ = Σ_{n∈ℤ} l̃[n] ⟨f, φ̃_{j,2k+n}⟩ = Σ_{n∈ℤ} l̃[−n] ⟨f, φ̃_{j,2k−n}⟩ = Σ_{n∈ℤ} l̃[−n] a_{j,2k−n}.

Hence, if we set u_j[k] := a_{j,k} and l̃⁻[n] := l̃[−n] (the involution of l̃), then the last equation can be reformulated as the convolution u_{j−1}[k] = (u_j ∗ l̃⁻)[2k] = (↓(u_j ∗ l̃⁻))[k], with ↓ being the downsampling operator that throws away the odd numbered samples. Similarly to (1.13) we can show that

ψ_{j−1,k}(x) = Σ_{n∈ℤ} h[n] φ_{j,2k+n}(x)   and   ψ̃_{j−1,k}(x) = Σ_{n∈ℤ} h̃[n] φ̃_{j,2k+n}(x),   (1.14)

so that it follows for the coefficients d_{j−1,k} in (1.12) that

d_{j−1,k} = ⟨f, ψ̃_{j−1,k}⟩ = Σ_{n∈ℤ} h̃[−n] a_{j,2k−n}.   (1.15)

Hence we get exactly the algorithm depicted in Figure 1.5 (a). Finally, the (re)construction part of the pyramid algorithm follows from inserting (1.13) and (1.14) into (1.12), setting m = 2k + n and identifying the coefficients of φ_{j,m}(x):

Σ_{k∈ℤ} a_{j,k} φ_{j,k}(x) = Σ_{k∈ℤ} ( a_{j−1,k} φ_{j−1,k}(x) + d_{j−1,k} ψ_{j−1,k}(x) )
  = Σ_{k∈ℤ} ( a_{j−1,k} Σ_{n∈ℤ} l[n] φ_{j,2k+n}(x) + d_{j−1,k} Σ_{n∈ℤ} h[n] φ_{j,2k+n}(x) )
  = Σ_{k∈ℤ} Σ_{m∈ℤ} ( a_{j−1,k} l[m − 2k] + d_{j−1,k} h[m − 2k] ) φ_{j,m}(x)
  = Σ_{m∈ℤ} ( Σ_{k∈ℤ} a_{j−1,k} l[m − 2k] + d_{j−1,k} h[m − 2k] ) φ_{j,m}(x),

so that

a_{j,m} = Σ_{k∈ℤ} a_{j−1,k} l[m − 2k] + d_{j−1,k} h[m − 2k].

This is nothing but convolution with either the odd or the even numbered entries of l and h. More precisely, with the notation u_j[m] = a_{j,m} and v_j[m] = d_{j,m}, we can rephrase it as applying first an upsampling operator ↑ and then convolution with l and h, respectively:

u_j[m] = ((↑u_{j−1}) ∗ l)[m] + ((↑v_{j−1}) ∗ h)[m]   with   (↑x)[n] := { x[n/2] if n is even,  0 if n is odd },

as illustrated in Figure 1.5 (b).
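One decomposition/reconstruction step can be spelled out in a few lines. The sketch below (Python) uses the orthonormal Haar filters, so l = l̃ and h = h̃, and the analysis formula a_{j−1,k} = Σ_n l[n] a_{j,2k+n} is the same convolve-and-downsample operation derived above; the circular indexing is an arbitrary choice of boundary handling that makes reconstruction exact on a finite signal:

```python
import numpy as np

l = np.array([1.0, 1.0]) / np.sqrt(2)               # Haar lowpass taps l[0], l[1]
h = np.array([(-1)**(1 - k) * l[1 - k] for k in range(2)])   # highpass from l

def decompose(a):
    """One pyramid step: a_j -> (a_{j-1}, d_{j-1}), circular boundaries."""
    N, k = len(a), np.arange(len(a) // 2)
    approx = sum(l[n] * a[(2 * k + n) % N] for n in range(2))
    detail = sum(h[n] * a[(2 * k + n) % N] for n in range(2))
    return approx, detail

def reconstruct(approx, detail):
    """Inverse step: a_j[m] = sum_k approx[k] l[m-2k] + detail[k] h[m-2k]."""
    N = 2 * len(approx)
    a = np.zeros(N)
    for k in range(len(approx)):
        for n in range(2):                          # m = 2k + n
            a[(2 * k + n) % N] += approx[k] * l[n] + detail[k] * h[n]
    return a

a = np.random.default_rng(0).normal(size=16)        # arbitrary coefficient vector
ap, de = decompose(a)
assert np.allclose(reconstruct(ap, de), a)          # perfect reconstruction
assert np.isclose((a**2).sum(), (ap**2).sum() + (de**2).sum())  # energy kept
```

Iterating `decompose` on the approximation output gives the full pyramid algorithm.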

Complexity of the pyramid algorithm

The claimed O(N) complexity of the pyramid algorithm depends on the fact that the MRA scaling functions and wavelets have finite support, that is, they live on some finite length interval and therefore have finite (and typically relatively small) filter lengths (see Exercise 4).

For example, if the filters have length M and the signal has length N = 2^J much larger than M, then the pyramid algorithm consists of J steps, where step number j works on a signal of length 2^{−j}N and therefore requires O(M ⋅ 2^{−j}N) arithmetic operations for the up/downsampling and convolutions. Summing over all steps gives a total of O(MN(1 + 1/2 + 1/4 + ⋅⋅⋅)) ≤ 2MN = O(N) operations for a fixed filter length M.
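The geometric-series bound is easy to make concrete. A small sketch (Python; M = 4 is an arbitrary filter length) counts the operations per level and checks that the total stays below 2MN:

```python
# Each level applies about M operations per sample and halves the signal,
# so the total work is M*N + M*N/2 + M*N/4 + ... < 2*M*N = O(N).
def pyramid_cost(N, M):
    ops = 0
    while N >= 2:
        ops += M * N          # filtering + downsampling at this level
        N //= 2
    return ops

for J in (10, 14, 18):
    N = 2**J
    assert pyramid_cost(N, M=4) < 2 * 4 * N
print(pyramid_cost(2**10, 4))   # 8184, just below 2*4*1024 = 8192
```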

Input and output of the pyramid algorithm

It is common practice to use sample values of the analysed signal as the input coefficient vector of the pyramid algorithm. This means that the first two “boxes” in Figure 1.5 are ignored.

This is easily motivated as long as the analysed signal or function f can be assumed to be continuous^b. If J is large enough, then we can assume that f is well approximated by its projection P_J f := Σ_{k∈ℤ} ⟨f, φ̃_{J,k}⟩ φ_{J,k} = Σ_{k∈ℤ} a_{J,k} φ_{J,k} on V_J. Thus the first box in Figure 1.5 is negligible and we can assume f to be in V_J with negligible errors. Moreover, recall that φ̃(x) is nonzero only on some not so large interval [A, B], and set c := ∫ φ̃(x) dx. Then φ̃(2^J x − k) is nonzero only for 2^{−J}A ≤ x − 2^{−J}k ≤ 2^{−J}B, so for large enough J we can make the assumption f(x) ≈ f(2^{−J}k) in the integral

a_{J,k} = ∫ f(x) 2^{J/2} φ̃(2^J x − k) dx ≈ 2^{J/2} f(2^{−J}k) ∫ φ̃(2^J x − k) dx = (set y = 2^J x − k)
  = 2^{−J/2} f(2^{−J}k) ∫ φ̃(y) dy = 2^{−J/2} c f(2^{−J}k).   (1.16)

Moreover, for orthonormal wavelets φ̃ = φ and it follows from (1.7) that c = ∫ φ(x) dx = 1. (See also (1.18).)
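The approximation (1.16) can be checked for the Haar scaling function, where a_{J,k} is exactly 2^{−J/2} times the local mean of f and c = 1. The sketch below (Python, arbitrary test signal) compares the exact coefficients with the scaled sample values and shows the error shrinking as J grows:

```python
import numpy as np

f = lambda x: np.cos(2 * np.pi * x) + 0.5 * x       # arbitrary test signal

def haar_coeffs(J, num_quad=2**16):
    """Exact a_{J,k} = <f, phi_{J,k}> for the Haar scaling function."""
    x = (np.arange(num_quad) + 0.5) / num_quad
    means = f(x).reshape(2**J, -1).mean(axis=1)     # local mean values of f
    return 2**(-J / 2) * means

errs = []
for J in (4, 6, 8):
    exact = haar_coeffs(J)
    sampled = 2**(-J / 2) * f(np.arange(2**J) / 2**J)   # pyramid input guess
    errs.append(np.max(np.abs(exact - sampled)))
    print(f"J = {J}: max coefficient error {errs[-1]:.2e}")
assert errs[2] < errs[1] < errs[0]                  # error shrinks with J
```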

Remark 1. Definition 1 can be written in several equivalent but seemingly more general forms (see [HW96, pages 45, 46 and 49]):

^b This continuity assumption is realistic in typical signal processing applications, but not, for example, in wavelet analysis of a fractal.


∙ It can be shown that the conditions (1.6) imply the existence of some γ ∈ V₀ whose integer shifts γ(x − k) form an orthonormal basis for V₀. Hence we get an equivalent definition of an MRA if we replace “Riesz” with the word orthonormal in (1.6e).

∙ Conditions (1.6a), (1.6c) and (1.6e) imply (1.6d), which therefore can be removed from the definition of an MRA.

∙ If in addition |φ̂| is continuous at 0, then (1.6a), (1.6c) and (1.6e) also imply that (1.6b) is equivalent to requiring that φ̂(0) ≠ 0. Moreover, it follows for orthonormal wavelet scaling functions that |φ̂(0)| = 1.

Exercises

1. For the two dual bases {φ_{j,k}}_{k∈ℤ} ∪ {ψ_{j,k}}_{k∈ℤ} and {φ̃_{j,k}}_{k∈ℤ} ∪ {ψ̃_{j,k}}_{k∈ℤ} for V_{j+1}, convince yourself that the biorthogonality condition (1.5) becomes

⟨φ_{j,k}, φ̃_{j,l}⟩ = ⟨ψ_{j,k}, ψ̃_{j,l}⟩ = δ_{k,l}   and   ⟨φ_{j,k}, ψ̃_{j,l}⟩ = ⟨ψ_{j,k}, φ̃_{j,l}⟩ = 0.

2. Fill in the left out steps in the computation of (1.14), how it leads to (1.15) and how (1.15) is related to the filter H̃ and the downsampling ↓ of its output in Figure 1.5 (a).

3. Why is L called a lowpass filter?

Hint: Let's restrict to orthonormal wavelets, for which we know from (1.7) that φ̂(0) = ∫ φ(x) dx = 1. With L(ξ) := Σ_{k∈ℤ} l[k] e^{−i2πξk}, taking the Fourier transform of the left-hand and right-hand sides of (1.9) gives

φ̂(ξ) = Σ_{k∈ℤ} l[k] √2 ∫ φ(2x − k) e^{−i2πξx} dx = Σ_{k∈ℤ} l[k] √2 ⋅ (1/2) ∫ φ(y) e^{−i2πξ(y/2 + k/2)} dy
  = (1/√2) Σ_{k∈ℤ} l[k] e^{−i2πk(ξ/2)} ∫ φ(y) e^{−i2π(ξ/2)y} dy = (1/√2) L(ξ/2) φ̂(ξ/2),

or equivalently,

φ̂(2ξ) = (1/√2) L(ξ) φ̂(ξ).

Suppose that φ̂ has 99 % of its energy in an interval [a, b] with a < 0 < b, that is, that ∫ₐᵇ |φ̂(ξ)|² dξ ≥ 0.99 ∫ |φ̂(ξ)|² dξ. Find some interval where φ̂(2⋅) has 99 % of its energy. What does this say about the action of L on φ̂?

4. Explain why scaling functions φ, φ̃ living on finite length intervals have finite length filters l, h, l̃ and h̃.

Hint: Do you know some useful formula(e) for computing these coefficients?

5. Suppose that you have made a software implementation of the pyramid algorithm based on the formulas in this section, and that you want to plug in some orthonormal wavelet filter from a table in some book. However, suddenly you see that the author of that book has derived the pyramid algorithm from the scaling equation φ(x) = Σ_{k∈ℤ} m_k C φ(2x + k) for some constant C ≠ √2 instead of the scaling equation φ(x) = Σ_{k∈ℤ} l[k] √2 φ(2x − k) that your implementation is based on. How should you translate the filter coefficients m_k from the book to the filter coefficients l[k] that your implementation is based on?


Figure 1.7: The principle behind noise reduction and compression. If the signal is well represented by a small number of large wavelet coefficients (here sorted in decreasing order), then by setting a threshold and removing all coefficients smaller than this threshold, one can get a good approximation of the signal with a small number of coefficients (compression), as well as removing most of the noise in the process.

1.3 Different wavelet families and characterizing properties

Wavelet bases are not the only available tool for analysing and constructing functions, and no tool is the best tool for all applications.

For the condition monitoring application described in the slides from day two of the lectures (and described in full in [EGJ+04]), a certain continuous wavelet transform seemed to be a suitable choice due to its property of giving better time resolution at higher frequencies.

For OFDM signal synthesis, as described in the slides from day one of the lectures, Gabor type bases are usually used, and these can be argued to have some advantages over wavelets for that particular application [KPZ02]. Both for the FBI fingerprint storage and the JPEG 2000 standard (also described in the slides from day 1), different compression methods were compared and the best one found was of wavelet type, and there is ongoing research on constructing bases (beamlets, wedgelets, curvelets) that are argued to be even better suited for image applications.

For the MP3 standard, local cosine bases were chosen (these bases are described, for example, in [Mal99]).

So for a given application where some function (signal/image) shall be synthesized or analyzed, one can always try to argue from known properties of this kind of function which tool is best suited. If wavelets are chosen, there are many different kinds of wavelets to choose between. In general, one wants to find wavelets that give sparse representations of the synthesized or analyzed signal, meaning that almost all information about the signal is stored in a small number of large wavelet coefficients. This is clearly an advantage for compression applications, since all but a few coefficients can be set to zero with only a small error, and also for many noise reduction purposes, since orthogonal bases map additive white Gaussian noise in the samples to additive white Gaussian noise spread as a relatively small error evenly over all coefficients [BEL99, p. 166]ᶜ. See Figure 1.7.
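The thresholding principle can be sketched with a one-level orthonormal Haar transform. This is a minimal numpy-only illustration; the test signal, noise level and threshold are made-up values for demonstration, not taken from the text:

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform: averages and details."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
    return a, d

def haar_idwt(a, d):
    """Invert one level of the orthonormal Haar transform."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

# Piecewise-smooth test signal (a tent) plus small additive white noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
clean = np.where(t < 0.5, t, 1 - t)
noisy = clean + 0.01 * rng.standard_normal(t.size)

a, d = haar_dwt(noisy)
d_thresh = np.where(np.abs(d) > 0.02, d, 0.0)  # keep only the large details
denoised = haar_idwt(a, d_thresh)

kept = np.count_nonzero(d_thresh)        # few coefficients survive (compression)
err = np.max(np.abs(denoised - clean))   # yet the reconstruction error stays small
print(kept, err)
```

Because the Haar basis is orthonormal, discarding coefficients of total energy ε² changes the reconstruction by exactly ε in the 𝐿²-sense, which is why thresholding small coefficients is so benign.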

Still, even if other bases can be better for a particular application, wavelets are quite useful. In this section we describe a couple of different wavelets, some of their characterizing properties, and how these can be of importance when choosing a particular wavelet basis for a given application. We describe the wavelets only briefly. For more details and examples, we

ᶜTry to find a better citation.


Figure 1.8: Daubechies 𝑛 scaling functions. Larger 𝑛 means a smoother wavelet and more vanishing moments, but at the cost of scaling functions and wavelets having larger support.

refer, for example, to the extracts from the Wavelet Toolbox User's Guide [MMOP09] available at http://www.ltu.se/tfm/Fysik/ngsst/d2212/1.51590. For example, check pages 1-39–1-47 to see plots and some properties of different wavelets and scaling functions (or make your own plots), or, for a more extensive comparison and a summarizing table of different families, see pages 6-71–6-91.

Wavelets with vanishing moments and minimal support: Daubechies and Symmlets

We say that a wavelet 𝜓 has 𝑁 vanishing moments if

∫ 𝑥𝑛𝜓(𝑥) 𝑑𝑥 = 0 for 𝑛 = 0, 1, . . . , 𝑁 − 1.

Note that 𝜓𝑗,𝑘 lives on an interval 𝐼𝑗,𝑘 of length proportional to 2−𝑗. Hence, if 𝑓 has 𝑛 ≤ 𝑁 continuous derivatives on 𝐼𝑗,𝑘, then it can be approximated there with a Taylor polynomial 𝑓(𝑥) = 𝑃𝑛−1(𝑥) + 𝑂((2−𝑗)𝑛), so that

⟨𝑓, 𝜓𝑗,𝑘⟩ = ∫ 𝑓(𝑥)𝜓𝑗,𝑘(𝑥) 𝑑𝑥 = ∫ 𝑃𝑛−1(𝑥)𝜓𝑗,𝑘(𝑥) 𝑑𝑥 + 𝑂(2−𝑗𝑛) = 𝑂(2−𝑗𝑛) (1.17)

[BEL99, p. 79]. In other words, many vanishing moments means fast decay of the wavelet coefficients provided that 𝑓 is smooth. At the same time, more vanishing moments can only be bought at the price of longer wavelet and scaling function support (as in Figure 1.8). Thus, if 𝑓 is not smooth everywhere but has some high-frequency transient behaviour in some place(s), then these transients affect the wavelet coefficients whose (dual) wavelet or scaling function support overlaps the transient behaviour: for example, the detail coefficient 𝑑𝑗,𝑘 = ⟨𝑓, ˜𝜓𝑗,𝑘⟩ can be large for all ˜𝜓𝑗,𝑘 whose support overlaps some transient behaviour or a discontinuity of 𝑓. Hence there can be a tradeoff between support length and smoothness/vanishing moments affecting the speed of decay of the wavelet coefficients. This is further discussed in [Mal99, p. 559–560].
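A discrete counterpart of the vanishing-moment condition is that the highpass filter 𝑔 satisfies ∑𝑘 𝑘𝑛𝑔[𝑘] = 0 for 𝑛 = 0, . . . , 𝑁 − 1. This is easy to check numerically; the sketch below hardcodes the Daubechies 2 lowpass filter from its well-known closed form (treat the exact coefficient values as an illustration rather than a reference table):

```python
import numpy as np

# Daubechies 2 lowpass filter (4 taps, closed form involving sqrt(3))
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))

# Highpass filter by the usual alternating-flip construction g[k] = (-1)^k h[L-1-k]
g = np.array([(-1) ** k * h[len(h) - 1 - k] for k in range(len(h))])

k = np.arange(len(g))
moments = [np.sum(k ** n * g) for n in range(2)]  # discrete moments for n = 0, 1

print(moments)          # both vanish up to rounding: db2 has 2 vanishing moments
print(np.sum(h ** 2))   # orthonormality: the filter energy is 1
```

The same check applied to the Haar highpass filter (1, −1)/√2 gives a vanishing moment only for 𝑛 = 0, consistent with Haar having a single vanishing moment.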

Example 1 (Daubechies 𝑁 wavelets). Minimal support [0, 2𝑁 − 1] for 𝑁 vanishing moments. Filter length 2𝑁. Highly asymmetric, with energy optimally concentrated near the starting point of their support. The only (anti)symmetric compactly supported orthonormal wavelet is the Haar wavelet. The number of continuous derivatives grows asymptotically as 0.2075𝑁. (See, e.g., [BEL99, p. 126], [Mal99, p. 244, 253] or [HW96].)

Example 2 (Symmlet wavelets). 𝑁 vanishing moments with support [−𝑁 + 1, 𝑁] of the same length, but less asymmetric and less smooth than the Daubechies 𝑁 wavelets. (See, for example, [BEL99, p. 126], [Mal99, p. 253] or [HW96].)

Scaling function with vanishing moments: Coiflets

Example 3 (Coiflets). For Coiflets, in addition to the vanishing moments of the Daubechies wavelets, the scaling function has 𝑁 − 1 vanishing moments:

∫ 𝑥𝑛𝜑(𝑥) 𝑑𝑥 = 0, 𝑛 = 1, 2, . . . , 𝑁 − 1.

Hence, reasoning as in (1.16) and (1.17), if 𝑓 is 𝑛 ≤ 𝑁 times differentiable in the interval of length 𝑂(2−𝑗) where 𝜑𝑗,𝑘 lives, then 𝑓(𝑥) = 𝑃𝑛−1(𝑥) + 𝑂((2−𝑗)𝑛) in that interval, so that

⟨𝑓, ˜𝜑𝑗,𝑘⟩ = ∫ 𝑓(𝑥) ˜𝜑𝑗,𝑘(𝑥) 𝑑𝑥 = 2−𝑗/2𝑐𝑓(2−𝑗𝑘) + 𝑂(2−𝑗𝑝), with 𝑐 as in (1.16). (1.18)

Coiflets are even more symmetric than Symmlets with the same number of vanishing moments, at the cost of a longer support length 3𝑁 − 1, as compared to 2𝑁 − 1 for Symmlets. The filter length is 3𝑁. (See, for example, [BEL99, p. 130], [Mal99, p. 254] or [HW96].)

Symmetry gives linear phase: Biorthogonal wavelets

The Haar wavelet is symmetric but also discontinuous, and it is impossible to construct smooth orthonormal wavelets of compact support that are symmetric or antisymmetric [Mal99, p. 269]. For biorthogonal wavelets, however, this is possible: one can construct biorthogonal wavelets that are symmetric or antisymmetric around the centerpoint of their support (ℎ[𝑘] = ℎ[𝑁 − 𝑘] or ℎ[𝑘] = −ℎ[𝑁 − 𝑘]). Such filters have linear phase [BEL99, p. 25], which means that their Fourier series expansion 𝐻(𝜉) is of the form

𝐻(𝜉) = 𝑟(𝜉)ei2𝜋𝜙(𝜉) with real-valued 𝑟 and 𝜙(𝜉) = 𝑎𝜉 + 𝑏.

(More generally, ℎ also has linear phase if 𝜙 is piecewise linear with constant slope and discontinuities allowed only for 𝜉 such that 𝑟(𝜉) = 0 [BEL99, p. 24].) If a signal 𝑠[𝑘] = ei2𝜋𝜉0𝑘 is sent through a linear phase filter, then the output is the convolutionᵈ

𝑠 ∗ ℎ[𝑘] def= ∑𝑛∈ℤ 𝑠[𝑘 − 𝑛]ℎ[𝑛] = ∑𝑛∈ℤ ei2𝜋𝜉0(𝑘−𝑛)ℎ[𝑛] = ei2𝜋𝜉0𝑘 ∑𝑛∈ℤ e−i2𝜋𝜉0𝑛ℎ[𝑛]
= ei2𝜋𝜉0𝑘𝐻(𝜉0) = ei2𝜋𝜉0𝑘𝑟(𝜉0)ei2𝜋𝜙(𝜉0) = 𝑟(𝜉0)ei2𝜋(𝜉0𝑘+𝜙(𝜉0)) (𝜙(𝜉) = 𝑎𝜉 + 𝑏)
= 𝑟(𝜉0)ei2𝜋𝑏ei2𝜋𝜉0(𝑘+𝑎), (1.19)

which means that a linear phase filter adds the same delay (here 𝑎) to all frequency components of a signal. This is an important property in applications such as speech and sound processing [BEL99, p. 24].

ᵈReaders familiar with the discrete Fourier transform (DFT) would reduce this computation to a short one-line application of standard properties of the DFT.
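The pure-delay conclusion of (1.19) can be verified numerically: a symmetric FIR filter has linear phase, so a complex exponential at any frequency comes out scaled and delayed by the same number of samples. The taps below are arbitrary symmetric values chosen for illustration (the real amplitude 𝑟(𝜉0) happens to be positive at the tested frequencies, so ∣𝐻(𝜉0)∣ = 𝑟(𝜉0)):

```python
import numpy as np

# Symmetric 5-tap filter: linear phase with group delay a = 2 samples.
# (Arbitrary illustrative taps, not a wavelet filter.)
h = np.array([1.0, 3.0, 5.0, 3.0, 1.0])

results = {}
for xi0 in (0.05, 0.1, 0.2):
    k = np.arange(64)
    s = np.exp(1j * 2 * np.pi * xi0 * k)        # input s[k] = e^{i 2 pi xi0 k}
    y = np.convolve(s, h)[len(h) - 1 : len(k)]  # steady-state part of s * h
    kk = k[len(h) - 1 :]
    # Transfer function H(xi0) = sum_n h[n] e^{-i 2 pi xi0 n}
    H = np.sum(h * np.exp(-1j * 2 * np.pi * xi0 * np.arange(len(h))))
    # Linear phase => output is the input scaled by |H| and delayed by 2 samples:
    expected = np.abs(H) * np.exp(1j * 2 * np.pi * xi0 * (kk - 2))
    results[xi0] = np.allclose(y, expected)

print(results)  # every tested frequency sees the same 2-sample delay
```

Replacing h by an asymmetric filter (say, swapping two unequal taps) makes the comparison fail, since the phase is then no longer linear and different frequencies are delayed differently.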


Example 4 (B-spline wavelets). Biorthogonal. See, e.g., [HW96, p. 152–154], [Mal99, p. 152–153, 224]. 𝜑0(𝑥) is the Haar scaling function and

𝜑𝑛(𝑥) = 𝜑𝑛−1 ∗ 𝜑0(𝑥), 𝑛 = 1, 2, 3, . . .

Support and symmetry: [HW96, p. 154]. Dual scaling function and wavelet: [BEL99, p. 138]. See also, e.g., [BEL99, p. 134], [HW96, p. 152–154].
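The recursion in Example 4 can be sanity-checked numerically: convolving the (sampled) Haar box function with itself gives the tent function 𝜑1. A rough Riemann-sum sketch (the grid size is an arbitrary choice):

```python
import numpy as np

n = 1000                  # samples per unit interval
dx = 1.0 / n
box = np.ones(n)          # Haar scaling function phi_0 on [0, 1], sampled

# Riemann-sum approximation of the continuous convolution (phi_0 * phi_0)(x)
tent = np.convolve(box, box) * dx
x = np.arange(len(tent)) * dx        # grid covering [0, 2]

# Compare with the tent function: x on [0, 1], 2 - x on [1, 2]
expected = np.where(x <= 1, x, 2 - x)
print(np.max(np.abs(tent - expected)))  # small discretization error, order 1/n
```

Convolving once more with the box gives the quadratic B-spline 𝜑2 supported on [0, 3], and so on; each convolution adds one degree of smoothness and one unit of support length.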

See, for example, the Wavelet Toolbox User's Guide for some more examples of biorthogonal wavelets.

Some more wavelets

Example 5 (Spline wavelets of order 𝑛). See, e.g., [HW96, p. 161–162]. Called Battle–Lemarié wavelets in [BEL99, p. 141–142]. Obtained by applying a certain orthonormalization trick to the B-spline wavelets. Both 𝜑, 𝜓 and their derivatives of order 1, 2, . . . , 𝑛 have exponential decay. (Note, however, that there is no infinitely many times differentiable orthonormal wavelet with exponential decay [HW96, p. 197].)

Example 6 (The Shannon wavelet). 𝜑̂(𝜉) = 𝜒[−1/2,1/2](𝜉). The scaling function 𝜑(𝑥) = sinc(𝑥) decays only as 1/∣𝑥∣, which makes it practically useless. See, e.g., [HW96, p. 61–62].
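The slow decay is easy to verify numerically: at the half-integers the normalized sinc attains its envelope 1/(𝜋∣𝑥∣) exactly (numpy's np.sinc uses the sin(𝜋𝑥)/(𝜋𝑥) convention):

```python
import numpy as np

x = np.arange(1, 200) + 0.5     # half-integers, where |sin(pi x)| = 1
envelope = 1.0 / (np.pi * x)    # the 1/|x| envelope of the Shannon scaling function
print(np.allclose(np.abs(np.sinc(x)), envelope))
```

So even 200 units away from the origin the scaling function is still only about 1/(200𝜋) ≈ 0.0016 in size; for comparison, an exponentially decaying scaling function would be negligible long before that.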

Example 7 (Lemarié–Meyer wavelets). The bad decay properties of the Shannon wavelet come from the discontinuities in the box-shaped Fourier transform 𝜑̂ of the scaling function. The Lemarié–Meyer family of wavelets has a scaling function whose Fourier transform 𝜑̂(𝜉) equals one for ∣𝜉∣ < 1/2 − 𝜀 and decays in a certain way to zero for 1/2 − 𝜀 ≤ ∣𝜉∣ < 1/2 + 𝜀, in order to give a bandlimited wavelet with faster decay. See, e.g., [HW96, p. 62–63].

Example 8 (Meyer wavelets). See, e.g., [BEL99, p. 141]. The wavelets have an infinite number of vanishing moments. 𝜑 and 𝜓 are very smooth in the sense that they are infinitely many times differentiable.

Exercises

1. (a) Show that the B-spline scaling function 𝜑1 in Example 4 is a so-called "tent function":

𝜑1(𝑥) = 𝑥 for 0 ≤ 𝑥 ≤ 1, 2 − 𝑥 for 1 ≤ 𝑥 ≤ 2, and 0 otherwise.

(b) 𝜑1(𝑥) is one example of a scaling function for which the dual scaling function ˜𝜑1 has the special property that it is a reproducing kernel, which means that for all 𝑓 ∈ 𝑉0, ⟨𝑓, ˜𝜑1⟩ = 𝑓(0) and, consequently, ⟨𝑓, ˜𝜑10,𝑘⟩ = 𝑓(𝑘), so that any 𝑓 in 𝑉0 has the series expansion

𝑓(𝑥) = ∑𝑘∈ℤ 𝑓(𝑘)𝜑10,𝑘(𝑥)

(see, for example, [EG05] for more about such bases).

Given 𝑓 ∈ 𝑉0 such that 𝑓(0) = 3, 𝑓(1) = −3, 𝑓(2) = 1 and 𝑓(𝑘) = 0 for all other integers 𝑘, sketch 𝑓.


2 Time-frequency analysis and the continuous wavelet transform

2.1 Time-frequency analysis

An everyday example of time-frequency analysis is the processing performed in the human inner ear (see Figure 2.1) when, for example, listening to someone playing a piano. The sound

Figure 2.1: Time-frequency analysis in the human ear: The sound reaches the outer ear as air pressure variations. It is transmitted through the middle and inner ear and converted to vibrations in the fluid within a spiral channel in the cochlea. The vibrations stimulate hair cells in the basilar membrane, which exhibits different degrees of stiffness, or resonance, along its length. As a result, the different frequencies of the sound are sorted out and transmitted to the brain stem by the cochlear nerve. After extensive processing, the nerve impulses are relayed to the cerebral cortex and the listener becomes aware of the sound. (Source: [Enc00])

