Some Essentials of Data Analysis with Wavelets


Lecture notes in the wavelet part of the course in data analysis at The Swedish National Graduate School of Space Technology, Kiruna and Luleå, November–December 2009.

Niklas Grip^a

Department of Mathematics, Luleå University of Technology, SE-971 87 Luleå, Sweden. Niklas.Grip@ltu.se.

^a The author was partially supported by the Swedish Research Council (project registration number 2004-3862).


The lectures and the accompanying slides will give an overview and more example applications.

These lecture notes partly follow the lectures, leave out some applications mentioned there, and cover the underlying mathematics in somewhat more depth, with references to some (of many) books that give a broader and deeper coverage.


Contents

1 The discrete wavelet transform and MRA
  1.1 Basic idea: Approximation + details
  1.2 Multiresolution analysis adds smoothness and more
    1.2.1 Notation and preliminaries
    1.2.2 Multiresolution analysis
  1.3 Different wavelet families and characterizing properties
2 Time-frequency analysis and the CWT
  2.1 Time-frequency analysis
  2.2 Computing the continuous wavelet transform
3 Computer lab Matlab exercises
  3.1 Friday November 13th: Trying out DWT with Wavelet Toolbox
    3.1.1 Getting started
    3.1.2 Homework assignments
  3.2 Monday November 30th: CWT
    3.2.1 Getting started
    3.2.2 Homework assignments

Index



List of Figures

1.1 Mean value approximation on intervals of length 1, 1/2, 1/4 and 1/8
1.2 Details
1.3 The Haar wavelet ψ and its scaling function φ
1.4 Plots of a_k and a_k + d_k = a_{k+1} for k = 0, 1, . . . , 5
1.5 Pyramid algorithm
1.6 Wavelet subspaces
1.7 Noise reduction and compression
1.8 Daubechies wavelets
2.1 Time-frequency analysis in the inner ear
2.2 Time-frequency analysis with the STFT and continuous wavelet transform
2.3 Dilation and translation of a mother wavelet
2.4 Heisenberg boxes
2.5 CWT computed with the Wavelet Toolbox cwt command
2.6 CWT computed with the command cwtWTmod in homework assignment 2
2.7 CWT computed with the command cwtSparse in homework assignment 2
2.8 CWT computed with the command cwtFFT in homework assignment 2
2.9 x
2.10 Same as Figure 2.5 but for a longer time interval
2.11 Same as Figure 2.6 but for a longer time interval
2.12 Same as Figure 2.7 but for a longer time interval
2.13 Same as Figure 2.8 but for a longer time interval
3.1 Transverse strain measurements on a railway bridge
3.2 Hard, soft and firm thresholding



Chapter 1

The discrete wavelet transform and multiresolution analysis

1.1 Basic idea: Approximation + details

Suppose that you want to compute a rough approximation of some function or signal 𝑓 (𝑥) on the interval [0, 1]. The simplest approximation you can come up with is probably an approximation by one number, say the mean value:

f(x) ≈ α₀ φ(x) := a₀(x),   α₀ = ∫₀¹ f(x) dx = ∫ f(x) φ(x) dx,   φ(x) := { 1 if 0 ≤ x ≤ 1,  0 otherwise }.

This approximation is plotted in Figure 1.1 (a), and to be honest, it is a pretty rough approximation, so perhaps we want to add some more details, say, by switching to mean value approximation on intervals of length 1/2, as in Figure 1.1 (b). This is an approximation by two numbers

f(x) ≈ a₁(x) := α_{1,0} φ(2x) + α_{1,1} φ(2x − 1),

because φ(2x) and φ(2x − 1) equal one exactly on the intervals [0, 1/2] and [1/2, 1], respectively.

(This may take a little practice to see, but note, for example, that φ(2x − 1) equals one on an interval with endpoints given by 2x − 1 = 0 and 2x − 1 = 1, that is, x = 1/2 and x = 1. Hence the factor 2 shrinks the length of the interval where the function lives by a factor 2, and the “−1” moves it 1/2 to the right.) Once again, we can write the coefficients α_{j,k} in terms of φ and f by writing the mean values in the form

α_{1,k} = ∫ f(x) ⋅ 2φ(2x − k) dx,   k = 0, 1.

Similarly, we can add more details by computing mean values on intervals of length 1/4:

f(x) ≈ a₂(x) := α_{2,0} φ(4x) + α_{2,1} φ(4x − 1) + α_{2,2} φ(4x − 2) + α_{2,3} φ(4x − 3), as in Figure 1.1 (c), with coefficients being the mean values

α_{2,k} = ∫ f(x) ⋅ 4φ(4x − k) dx,   k = 0, 1, 2, 3.

In general, we can get an arbitrarily fine approximation by mean values on intervals of length 2−𝑛:

f(x) ≈ a_n(x) := Σ_{k=0}^{2^n−1} α_{n,k} φ(2^n x − k),   with   α_{n,k} = ∫ f(x) ⋅ 2^n φ(2^n x − k) dx,
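As a concrete numerical illustration, the coefficients α_{n,k} are just local mean values of f and can be computed by averaging samples. The following is a sketch in Python; the test signal, quadrature grid size and approximation levels are arbitrary choices, not anything prescribed by the text:

```python
import numpy as np

def haar_approximation(f, n, num_quad=2**12):
    """Mean values alpha[n, k] of f on the 2**n dyadic subintervals of [0, 1].

    alpha[n, k] = 2**n * integral of f over [k / 2**n, (k + 1) / 2**n],
    approximated here with the midpoint rule on a fine grid.
    """
    x = (np.arange(num_quad) + 0.5) / num_quad    # midpoint quadrature grid
    return f(x).reshape(2**n, -1).mean(axis=1)

def evaluate(alpha, x):
    """Evaluate a_n(x) = sum_k alpha[k] * phi(2**n x - k) for x in [0, 1)."""
    n = int(np.log2(len(alpha)))
    k = np.minimum((x * 2**n).astype(int), 2**n - 1)
    return alpha[k]

f = lambda x: np.sin(2 * np.pi * x) + x           # arbitrary test signal
xs = np.linspace(0, 1, 1000, endpoint=False)
for n in range(4):
    alpha = haar_approximation(f, n)
    err = np.max(np.abs(f(xs) - evaluate(alpha, xs)))
    print(f"n = {n}: {2**n:2d} coefficients, max error {err:.3f}")
```

Each extra level doubles the number of coefficients and refines the piecewise constant approximation, exactly as in Figure 1.1.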



Figure 1.1: Mean value approximation on intervals of length 1, 1/2, 1/4 and 1/8. ... (Panels (a)–(d) compare f(x) with a₀(x), a₁(x), a₂(x) and a₃(x).)

as shown in Figure 1.1 (d) for n = 3 and, for some larger n, in Figure 1.4.

Suppose now that we want to send this approximation to a friend somewhere far away, and we do not know in advance how detailed an approximation of f our friend wants to see before being satisfied and interrupting the transmission. Then it would be a waste of time and bandwidth to first send approximation a₀, then approximation a₁, followed by a₂, a₃ and so on up to some a_n, because that would require sending Σ_{k=0}^{n} 2^k = 2^{n+1} − 1 coefficients, only the last 2^n of which our friend will need and have any actual use of. A simple trick to get around this problem is to rewrite a_n as a telescope sum

a_n = a₀ + (a₁ − a₀) + (a₂ − a₁) + (a₃ − a₂) + ⋅⋅⋅ + (a_n − a_{n−1}).

The important difference is that for each new 𝑘, we will now only send the additional details needed for computing 𝑎𝑘 from 𝑎𝑘−1:

a_k = a_{k−1} + d_{k−1},   d_{k−1} := a_k − a_{k−1}.
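The bandwidth argument is easy to check numerically. The sketch below (Python; the signal and the number of levels are arbitrary choices) builds the details d_k = a_{k+1} − a_k on a fine grid, verifies the telescope sum, and counts the numbers that actually have to be sent:

```python
import numpy as np

f = lambda x: np.exp(-x) * np.cos(4 * np.pi * x)   # arbitrary signal on [0, 1]
N = 2**10
x = (np.arange(N) + 0.5) / N                       # fine evaluation grid

def a(n):
    """Mean value approximation a_n, evaluated on the fine grid."""
    means = f(x).reshape(2**n, -1).mean(axis=1)
    return np.repeat(means, N // 2**n)

n = 6
details = [a(k + 1) - a(k) for k in range(n)]      # d_0, ..., d_{n-1}
reconstructed = a(0) + sum(details)                # the telescope sum
assert np.allclose(reconstructed, a(n))

# d_k is piecewise constant on 2**(k+1) intervals but carries only 2**k new
# numbers (one per wavelet), so the total cost of a_n is 2**n numbers:
total_sent = 1 + sum(2**k for k in range(n))
assert total_sent == 2**n == 64
```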



Figure 1.2: (a) The approximations 𝑎0 and 𝑎1. (b) The approximations 𝑎1 and 𝑎2.

Figure 1.3: The Haar wavelet 𝜓 and its scaling function 𝜑.

Moreover, in our particular case, this can be done in such a way that no time or bandwidth is wasted, that is, exactly 2^k coefficients are needed for transmitting all the approximations a₀, a₁, a₂, . . . , a_k. For example, Figure 1.2 (a) shows the approximations a₀ and a₁. Their difference d₀ = a₁ − a₀ seems to be some multiple of the function ψ(x) = φ(2x) − φ(2x − 1) plotted in Figure 1.3. Similarly, Figure 1.2 (b) shows the approximations a₁ and a₂. Their difference d₁ = a₂ − a₁ looks pretty much like a linear combination β_{1,0} ψ(2x) + β_{1,1} ψ(2x − 1).

This is in fact the case. For example, for the details d₀, it holds that

d₀(x) := a₁(x) − a₀(x)
  = [∫ f(t) ⋅ 2φ(2t) dt] φ(2x) + [∫ f(t) ⋅ 2φ(2t − 1) dt] φ(2x − 1) − [∫ f(t) φ(t) dt] φ(x)
  = [∫ f(t) ⋅ 2φ(2t) dt] φ(2x) + [∫ f(t) ⋅ 2φ(2t − 1) dt] φ(2x − 1) − [∫ f(t) φ(t) dt] (φ(2x) + φ(2x − 1))
  = [∫ f(t)(2φ(2t) − φ(t)) dt] φ(2x) + [∫ f(t)(2φ(2t − 1) − φ(t)) dt] φ(2x − 1).

Since φ(t) = φ(2t) + φ(2t − 1), the two integrands simplify, so that

d₀(x) = [∫ f(t)(φ(2t) − φ(2t − 1)) dt] φ(2x) + [∫ f(t)(φ(2t − 1) − φ(2t)) dt] φ(2x − 1)
  = [∫ f(t)(φ(2t) − φ(2t − 1)) dt] (φ(2x) − φ(2x − 1)) = [∫ f(t) ψ(t) dt] ψ(x)
  = β_{0,0} ψ(x)   with   β_{0,0} = ∫ f(t) ψ(t) dt,

with ψ(x) = φ(2x) − φ(2x − 1) as depicted in Figure 1.3. Similarly, for d₁ we can repeat exactly the same kind of computation on each of the intervals [0, 1/2] and [1/2, 1] to get

d₁(x) := a₂(x) − a₁(x) = β_{1,0} ψ(2x) + β_{1,1} ψ(2x − 1)   with   β_{1,k} = ∫ f(x) ⋅ 2ψ(2x − k) dx.   (1.1)

We leave it to the doubting reader(s) to confirm this in Exercise 1 below. For larger n, it follows in exactly the same way for d_n that

d_n(x) := a_{n+1}(x) − a_n(x) = Σ_{k=0}^{2^n−1} β_{n,k} ψ(2^n x − k)   with   β_{n,k} = ∫ f(x) ⋅ 2^n ψ(2^n x − k) dx.   (1.2)

The approximations up to approximation level 6 are illustrated in Figure 1.4. The number of coefficients α_{0,k} and β_{j,k} needed to transmit a_n(x) = a₀(x) + d₀(x) + d₁(x) + ⋅⋅⋅ + d_{n−1}(x) is 1 + Σ_{k=0}^{n−1} 2^k = 2^n.
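A numerical sanity check of (1.2) — with an arbitrary test signal and level, everything evaluated on a midpoint grid so that the integrals for β_{n,k} are exact for the piecewise constant functions involved:

```python
import numpy as np

N = 2**12
x = (np.arange(N) + 0.5) / N                       # midpoint grid on [0, 1]

phi = lambda t: ((t >= 0) & (t < 1)).astype(float)
psi = lambda t: phi(2 * t) - phi(2 * t - 1)        # Haar mother wavelet

f = np.sqrt(x) * np.sin(3 * np.pi * x)             # arbitrary test signal

def a(n):
    """Mean value approximation a_n on the fine grid."""
    return np.repeat(f.reshape(2**n, -1).mean(axis=1), N // 2**n)

n = 3
# beta_{n,k} = integral of f(x) * 2**n * psi(2**n x - k) dx (midpoint rule)
beta = [(f * 2**n * psi(2**n * x - k)).mean() for k in range(2**n)]
d_n = sum(b * psi(2**n * x - k) for k, b in enumerate(beta))
assert np.allclose(d_n, a(n + 1) - a(n))           # d_n = a_{n+1} - a_n
```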

So far we have restricted attention to the interval [0, 1], but the same procedure can be repeated on each integer endpoint interval [k, k + 1]. In the terminology of the next section (with a few small changes in notation, see Exercise 2), the functions φ(x − k) and 2^{j/2} ψ(2^j x − k) (j, k integers, j ≥ 0) form the Haar wavelet basis for the function space of square integrable functions (that is, all f for which ∫ |f(x)|² dx < ∞). Every such f has a wavelet series expansion

f(x) = Σ_{k∈ℤ} a_{0,k} φ(x − k) + Σ_{j=0}^{∞} Σ_{k∈ℤ} d_{j,k} ψ_{j,k}(x),   ψ_{j,k}(x) := 2^{j/2} ψ(2^j x − k).

An arbitrarily detailed approximation can be computed by truncation to finite sums. We will see that the Haar wavelet basis is orthonormal (explained in the next section). This is the underlying mathematical reason for the simple integral formulas for the coefficients that we found above. It also follows that the sum Σ_{k∈ℤ} a_{0,k} φ(x − k) is the orthogonal projection of f on the space V₀ of square integrable functions that are constant on integer endpoint intervals [k, k + 1). For j = 0, the sum Σ_{k∈ℤ} d_{j,k} ψ_{j,k}(x) adds the details necessary to get an approximation in the space V₁ of square integrable functions that are constant on all intervals


Figure 1.4: For k = 0, 1, . . . , 5, the kth row shows a comparison of f with a_k on the left and, on the right, a plot of a_k and a_k + d_k. A comparison with row k + 1 illustrates that a_k + d_k = a_{k+1}.

[𝑘/2, (𝑘 + 1)/2) with integer 𝑘, and so on, each 𝑗 adding more details, giving a function space 𝑉𝑗 of approximations that are constant on the intervals [2−𝑗𝑘, 2−𝑗(𝑘 + 1)) for 𝑘 integer.

The Haar wavelet basis is well localized, meaning that if a function has some transient localized behaviour of importance, then we can easily make a more detailed approximation (larger j) exactly where things happen and settle for rougher approximations (smaller j) in regions where f is more slowly varying. For example, in Figure 1.4, adding details up to level j = 5 can perhaps be considered enough in the interval 5/8–7/8, but a more detailed approximation (larger j) may be desired in the interval 1/8–3/8. This localization is a clear difference from the Fourier basis functions e^{i2πξx}, which have no localization at all.

One disadvantage of the Haar basis is that it is discontinuous, hence the staircase shape of the approximation that is clearly visible in Figure 1.4, which causes larger j to be needed for a good approximation of a smooth function f. This raises a natural question: can we keep all the good properties of the Haar wavelet basis but add smoothness? Can we, somehow, “round off the corners” of φ and ψ in such a way that the functions φ(x − k) and ψ_{j,k}(x) still form an orthonormal basis, with the associated simple integral formula for the coefficients and with basis functions living on a relatively small interval?

This is, indeed, possible. The first step in that direction was a wavelet basis constructed by


Strömberg [Str81], although it got little publicity at that time. He was followed by Meyer [Mey85] a few years later. Meyer found his orthonormal wavelet as an unexpected counterexample when he, unaware of Strömberg's results, tried to, basically, prove that a well-localized wavelet basis cannot be very smooth. Soon after, other wavelet bases were found. These first constructions built on calculations that seemed miraculous at that time, but got a simple and natural explanation in the fall of 1986, when Mallat and Meyer [Mal89, Mey86] developed the multiresolution analysis framework for construction of orthonormal and biorthogonal wavelets. When he first heard about the Meyer basis, Mallat was working on image analysis, where the idea of simultaneously studying images at different scales had been popular for many years. Multiresolution analysis uses wavelets as a tool for describing the “increment of information” needed to go from a coarse approximation to a higher resolution approximation. (Source for these historical notes: [Dau92, pages 116, 129] and [Mey92, page 125].)

Exercises

1. Confirm (1.1) on at least one of the intervals [0, 1/2] and [1/2, 1] by repeating a slightly modified version of the computations done for d₀ (or by using the result (1.1) together with some suitable substitution of variables).

2. Show that if we set ψ_{n,k}(x) := 2^{n/2} ψ(2^n x − k), then we can rewrite (1.2) in a more symmetric form with sums Σ_k d_{n,k} ψ_{n,k}(x), where d_{n,k} = ∫ f(x) ψ_{n,k}(x) dx.

1.2 Multiresolution analysis adds smoothness and more

We begin with a review of some notation and terminology that will be used. The reader can either browse through Section 1.2.1 before continuing with Section 1.2.2 or go directly there and use Section 1.2.1 as reference when necessary.

1.2.1 Notation and preliminaries

This text requires no prerequisites in functional analysis, but will use some standard notation that is central for the wavelet theory and useful for reading other books or papers on the subject.

For sets 𝐴 and 𝐵, 𝑎 ∈ 𝐴 means that 𝑎 is an element in the set 𝐴 and 𝐴 ⊂ 𝐵 means that 𝐴 is a subset of 𝐵 (that is, if 𝑎 ∈ 𝐴, then 𝑎 ∈ 𝐵). We write 𝐴 ⊆ 𝐵 if 𝐴 ⊂ 𝐵 or 𝐴 = 𝐵. The union 𝐴 ∪ 𝐵 is the set of all 𝑥 such that 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵. The intersection 𝐴 ∩ 𝐵 is the set of all 𝑥 such that 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵.

Moreover, ℤ is the set of all integers and l² is the set of all sequences x = (x_k)_{k∈ℤ} of complex numbers that have a finite l² norm

‖x‖_{l²} := ( Σ_{k∈ℤ} |x_k|² )^{1/2}.

Similarly, ℝ denotes the set of all real numbers. The set of square integrable functions is denoted L²(ℝ) and consists of all complex valued functions with finite L² norm

‖f‖_{L²} := ( ∫ |f(x)|² dx )^{1/2}


(in engineering terminology, ‖f‖²_{L²} is sometimes called the energy of f). For two square integrable functions f and g, the inner product ⟨f, g⟩ is the integral

⟨f, g⟩ := ∫ f(x) g̅(x) dx,

with overline notation z̅ for the complex conjugate of a complex number z. We say that f and g are orthogonal if ⟨f, g⟩ = 0. A sequence of mutually orthogonal functions g_k is called orthonormal if all elements have norm ‖g_k‖_{L²(ℝ)} = 1.
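Orthonormality of the Haar system from Section 1.1 can be checked numerically. The sketch below (Python, midpoint-rule quadrature on an arbitrarily chosen grid) builds the Gram matrix of φ and the first few ψ_{j,k} on [0, 1] and compares it with the identity:

```python
import numpy as np

N = 2**12
x = (np.arange(N) + 0.5) / N                    # midpoint grid on [0, 1]

def phi(t):
    return ((t >= 0) & (t < 1)).astype(float)

def psi_jk(j, k):
    """Haar wavelet psi_{j,k}(x) = 2**(j/2) psi(2**j x - k) on the grid."""
    t = 2**j * x - k
    return 2**(j / 2) * (phi(2 * t) - phi(2 * t - 1))

def inner(u, v):
    return (u * v).mean()                       # integral over [0, 1]

# phi together with the Haar wavelets psi_{j,k} for j = 0, 1, 2 on [0, 1]
funcs = [phi(x)] + [psi_jk(j, k) for j in range(3) for k in range(2**j)]
G = np.array([[inner(u, v) for v in funcs] for u in funcs])   # Gram matrix
assert np.allclose(G, np.eye(len(funcs)))       # orthonormal family
```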

A sequence of functions g_k in L²(ℝ) is called a Riesz basis for a subspace V of L²(ℝ) if there exist positive real numbers A and B such that

A ‖c‖²_{l²} ≤ ‖ Σ_{k∈ℤ} c_k g_k ‖²_{L²} ≤ B ‖c‖²_{l²}   for all l² sequences c = (c_k)_{k∈ℤ}.   (1.3)

For a more thorough review of Riesz bases and related subjects we refer, for example, to [Grö00, Chr02, Gri02]. The property of crucial importance in this text is that if (g_k)_{k∈ℤ} is a Riesz basis for V, then there exists a corresponding dual Riesz basis (g̃_k)_{k∈ℤ} for V with the characterizing property that every function f in V has the unique series expansions

f = Σ_{k∈ℤ} ⟨f, g̃_k⟩ g_k = Σ_{k∈ℤ} ⟨f, g_k⟩ g̃_k   (1.4)

with convergence in norm, which means that

‖ Σ_{|k|≥N} ⟨f, g̃_k⟩ g_k ‖_{L²(ℝ)} → 0   and   ‖ Σ_{|k|≥N} ⟨f, g_k⟩ g̃_k ‖_{L²(ℝ)} → 0

when N → ∞. For the wavelet spaces V_j of main importance in this text, convergence in norm implies uniform pointwise convergence [EG05, Lemma 2.1]. The type of convergence is of great mathematical importance, but not anything that the reader needs to worry about for a sufficient understanding of the material presented in this text.

From the uniqueness of (1.4) it follows that the sequences (g_k) and (g̃_k) are biorthogonal, that is,

⟨g_n, g̃_k⟩ = δ_k[n] := { 1 if k = n,  0 otherwise }.   (1.5)

For A = B = 1, (1.3) becomes the Parseval relation ‖ Σ_{k∈ℤ} c_k g_k ‖²_{L²} = ‖c‖²_{l²}, from which it follows [Gri02, Theorem 1.54] that g_k = g̃_k and that the functions g_k are orthonormal; hence (g_k) is called an orthonormal basis. One of the most well-known orthonormal bases is the Fourier basis (e^{i2πkξ})_{k∈ℤ} for the set L²([0, 1)) of square integrable functions on the interval [0, 1).

1.2.2 Multiresolution analysis

Definition 1 (Multiresolution analysis). A multiresolution analysis (MRA) consists of a sequence (V_j)_{j∈ℤ} of closed subspaces of L²(ℝ) satisfying

V_j ⊂ V_{j+1} for all j ∈ ℤ;   (1.6a)
L²(ℝ) is the closure of ∪_{j∈ℤ} V_j, that is, the smallest closed subspace of L²(ℝ) that contains all V_j;   (1.6b)
f ∈ V_j if and only if f(2(⋅)) ∈ V_{j+1}, for all j ∈ ℤ;   (1.6c)
∩_{j∈ℤ} V_j = {0};   (1.6d)
For some φ ∈ V₀, the sequence (φ(⋅ − k))_{k∈ℤ} is a Riesz basis for V₀.   (1.6e)


Figure 1.5: The pyramid algorithm. (a) Decomposition of a function into its wavelet coefficients. (b) (Re)construction of a function from its wavelet coefficients. For “large enough” J, the projection P_J on V_J can be neglected and the pre- and postfilters can be approximated with an identity mapping. Hence it is common practice to choose a big J and use sample values of f as a good approximation of the input coefficient vector a_J in (a), and similarly for the output vector a_J in (b). Convolution with the filters h, l, h̃ and l̃ is here denoted by the corresponding frequency responses, for example H(ξ) := Σ_{k∈ℤ} h[k] e^{−i2πξk} and, for the involution h̃⁻[k] := h̃[−k], H̃⁻(ξ) := Σ_{k∈ℤ} h̃[−k] e^{−i2πξk}, and similarly for l and l̃.

The function φ in (1.6e) is called a scaling function.

Conditions (1.6a) and (1.6b) guarantee that (V_j) is a chain of growing subspaces of L²(ℝ) such that any function in L²(ℝ) can be approximated arbitrarily well by some function f_j ∈ V_j for a large enough j. Condition (1.6c) guarantees that the functions in V_j contain smaller and smaller details as j increases: V_{j+1} approximates functions at twice as fine a scale as V_j. Condition (1.6d) means that the only finite energy function that can be approximated at an arbitrarily coarse scale is the zero function.

Condition (1.6e), finally, is clearly necessary for φ to have the same role in an MRA wavelet basis that the Haar scaling function had in Section 1.1.

All practically useful scaling functions have a finite integral ∫ φ(x) dx < ∞, which implies (see Remark 1 below) that conditions (1.6b) and (1.6d) follow from the other conditions (1.6a), (1.6c) and (1.6e) and, for orthonormal wavelets, it also follows that

∫ φ(x) dx = 1.   (1.7)

The theoretical importance of the MRA definition cannot be overestimated. For example, the conditions (1.6) can be viewed as a mathematical problem that can be solved under different additional constraints on, for example, the smoothness or support of φ or the corresponding wavelet (some such properties are explained in Section 1.3). The mathematical machinery used to solve such a problem (see, for example, [HW96, BEL99, Mal99]) is outside the scope of this more applications-oriented text, so here we only note that it can be done and that the resulting solutions are different families of scaling functions satisfying the conditions (1.6) as well as some additional constraints.

For each such scaling function, there are several important consequences of the MRA definition that we will now describe in somewhat more detail.


Figure 1.6: Wavelet subspaces. (a) A Venn diagram that illustrates that V_j ⊂ V_{j+1} and that V_j ⊕ W_j = V_{j+1}. (b) Illustration of the fact that V_j is spanned by the sequence (φ_{j,k})_{k∈ℤ}, and similarly for Ṽ_j, W_j and W̃_j, as well as the biorthogonality conditions between the spaces and their basis functions: for example, V_j and W_j can be non-orthogonal, but W̃_j ⊥ V_j and Ṽ_j ⊥ W_j.

Perhaps the most important consequence of (1.6) is the existence of the fast so-called pyramid algorithm for computing the coefficients of the MRA wavelet series expansion from sample values of the analysed signal. This mapping from sample values to wavelet coefficients is sometimes called the discrete wavelet transform (DWT). The pyramid algorithm computes both the DWT and its inverse in O(N) time for signal length N (that is, for large N the number of arithmetic operations is asymptotically proportional to N when N → ∞). This can be compared to the fast Fourier transform, which computes the discrete Fourier transform in O(N log(N)) time for signal length N.

The pyramid algorithm consists of repeated use of certain lowpass filters L and highpass filters H in the way depicted in Figure 1.5.

Important MRA consequences leading to the pyramid algorithm

We will now list some very important consequences of the MRA definition (1.6) and use them for deriving the pyramid algorithm in step 5 of the list.

1. It follows from (1.6a) and elementary functional analysis (see, for example, [Kre78]) that each space V_{j+1} can be decomposed into V_j and a subspace W_j such that every f in V_{j+1} can be uniquely decomposed into f = u + v with u ∈ V_j and v ∈ W_j. We write this

V_{j+1} = V_j ⊕ W_j,   ∀j ∈ ℤ   (1.8)

(corresponding to a_{n+1}(x) = a_n(x) + d_n(x) in (1.2)). This relation between the subspaces is illustrated in Figure 1.6. If all such u and v are orthogonal (⟨u, v⟩ = 0), then W_j is the orthogonal complement of V_j in V_{j+1} (V_j ⊥ W_j) and the construction below will give the scaling function and mother wavelet of an orthonormal wavelet basis for L²(ℝ). In general, however, W_j does not have to be the orthogonal complement of V_j^a (see Figure 1.6 (a)).

^a This is just like in linear algebra, where, for example, the vector space ℝ³ can be decomposed into ℝ³ = U ⊕ V


One example of this will be the (biorthogonal) spline wavelets whose scaling function is defined in Example 4.

2. Since {φ(x − k)}_{k∈ℤ} is a Riesz basis for V₀, there is a corresponding dual Riesz basis that turns out to have the same structure [Gröch/Aldroubi citation here???], that is, the basis elements are integer shifts φ̃(x − k) of the dual scaling function φ̃.

3. From (1.6a) it follows that φ, φ̃ ∈ V₀ ⊆ V₁. Moreover, from (1.6e) and (1.6c) it follows that {√2 φ(2x − k)}_{k∈ℤ} and {√2 φ̃(2x − k)}_{k∈ℤ} are dual Riesz bases for V₁. Hence we get the important scaling equation for φ and φ̃, respectively:

φ(x) = Σ_{k∈ℤ} l[k] √2 φ(2x − k)   and   φ̃(x) = Σ_{k∈ℤ} l̃[k] √2 φ̃(2x − k),   (1.9)

with coefficients computed with the standard formula as in (1.4):

l[k] = ∫ φ(x) √2 φ̃(2x − k) dx   and   l̃[k] = ∫ φ̃(x) √2 φ(2x − k) dx.
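For the Haar scaling function, the integral formula for l[k] can be evaluated directly. In the orthonormal Haar case φ̃ = φ, so the sketch below (Python, arbitrarily chosen quadrature grid) recovers the familiar two-tap filter l[0] = l[1] = 1/√2 and verifies the scaling equation:

```python
import numpy as np

N = 2**12
x = (np.arange(N) + 0.5) / N                    # midpoint grid on [0, 1]
phi = lambda t: ((t >= 0) & (t < 1)).astype(float)

# l[k] = integral of phi(x) * sqrt(2) * phi(2x - k) dx, for a few values of k
ks = range(-1, 3)
l = [np.mean(phi(x) * np.sqrt(2) * phi(2 * x - k)) for k in ks]
# Only k = 0 and k = 1 give nonzero taps, both equal to 1/sqrt(2):
assert abs(l[1] - 2**-0.5) < 1e-12 and abs(l[2] - 2**-0.5) < 1e-12
assert abs(l[0]) < 1e-12 and abs(l[3]) < 1e-12

# Check the scaling equation phi(x) = sum_k l[k] sqrt(2) phi(2x - k):
rhs = sum(lk * np.sqrt(2) * phi(2 * x - k) for lk, k in zip(l, ks))
assert np.allclose(phi(x), rhs)
```

The same two taps reappear as the Haar lowpass filter in the pyramid algorithm below.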

4. Now set

ψ(x) := Σ_{k∈ℤ} h[k] √2 φ(2x − k)   and   ψ̃(x) := Σ_{k∈ℤ} h̃[k] √2 φ̃(2x − k),   (1.10a)

with

h[k] := (−1)^{1−k} l̃[1 − k]   and   h̃[k] := (−1)^{1−k} l[1 − k].   (1.10b)

Then it can be shown ([BEL99, Section 4.6], [Mal99, Section 7.4] or the more general result [HW96, p. 57–59] proved for orthonormal wavelets) that {ψ(x − k)}_{k∈ℤ} is a Riesz basis for W₀ with a dual basis of the same structure {ψ̃(x − k)}_{k∈ℤ}. Moreover, with the notation

φ_{j,k}(x) = 2^{j/2} φ(2^j x − k), and similarly for φ̃, ψ and ψ̃,   (1.11)

a substitution of variables gives corresponding biorthogonal bases {ψ_{j,k}}_{k∈ℤ} and {ψ̃_{j,k}}_{k∈ℤ} for corresponding subspaces W_j and W̃_j, respectively, in a decomposition of L²(ℝ) into subspaces V₀ ⊕ W₀ ⊕ W₁ ⊕ W₂ ⊕ W₃ ⊕ ⋅⋅⋅ with corresponding dual subspaces Ṽ₀ ⊕ W̃₀ ⊕ W̃₁ ⊕ W̃₂ ⊕ W̃₃ ⊕ ⋅⋅⋅. In other words, every square integrable function f has the series expansion

f(x) = Σ_{k∈ℤ} ⟨f, φ̃_{0,k}⟩ φ_{0,k}(x) + Σ_{j=0}^{∞} Σ_{k∈ℤ} ⟨f, ψ̃_{j,k}⟩ ψ_{j,k}(x)

(and similarly with the roles of the basis and the dual basis interchanged, as in (1.4)).

Here, just like for the Haar wavelet in Section 1.1, Σ_{k∈ℤ} ⟨f, φ̃_{0,k}⟩ φ_{0,k}(x) is a coarse scale V₀-approximation of f and, for every j, the sum Σ_{k∈ℤ} ⟨f, ψ̃_{j,k}⟩ ψ_{j,k}(x) adds the details in W_j, that is, exactly the details needed to get from the V_j-approximation to the next finer approximation with a function in V_{j+1}.

with U being the x-y-plane and V the z-axis. In this case, U and V are orthogonal, but we can just as well write ℝ³ = U ⊕ V with V being the straight line spanned by the vector (1 1 1)ᵀ.


5. It follows that every f in V_j = V_{j−1} ⊕ W_{j−1} can be written in the two equivalent forms

f(x) = Σ_{k∈ℤ} a_{j,k} φ_{j,k}(x) = Σ_{k∈ℤ} ( a_{j−1,k} φ_{j−1,k}(x) + d_{j−1,k} ψ_{j−1,k}(x) ).   (1.12)

The decomposition part of the pyramid algorithm (depicted in Figure 1.5 (a)) computes the coefficients a_{j−1,k} and d_{j−1,k} from the coefficients a_{j,k}.

To do this we first need to reformulate the scaling equation (1.9) into the corresponding equations for V_{j−1}:

φ(x) = Σ_{n∈ℤ} l[n] √2 φ(2x − n)
2^{(j−1)/2} φ(2^{j−1}x) = Σ_{n∈ℤ} l[n] √2 ⋅ 2^{(j−1)/2} φ(2 ⋅ 2^{j−1}x − n)
2^{(j−1)/2} φ(2^{j−1}x) = Σ_{n∈ℤ} l[n] 2^{j/2} φ(2^j x − n)   (substitute x ↦ x − 2^{−(j−1)}k)
2^{(j−1)/2} φ(2^{j−1}x − k) = Σ_{n∈ℤ} l[n] 2^{j/2} φ(2^j x − (2k + n))
φ_{j−1,k}(x) = Σ_{n∈ℤ} l[n] φ_{j,2k+n}(x),   and similarly,   φ̃_{j−1,k}(x) = Σ_{n∈ℤ} l̃[n] φ̃_{j,2k+n}(x).   (1.13)

Insertion in the standard formula for the coefficients a_{j−1,k} in (1.12) gives

a_{j−1,k} = ⟨f, φ̃_{j−1,k}⟩ = Σ_{n∈ℤ} l̃[n] ⟨f, φ̃_{j,2k+n}⟩ = Σ_{n∈ℤ} l̃[−n] ⟨f, φ̃_{j,2k−n}⟩ = Σ_{n∈ℤ} l̃[−n] a_{j,2k−n}.

Hence, if we set u_j[k] := a_{j,k} and l̃⁻[n] := l̃[−n] (the involution of l̃), then the last equation can be reformulated as the convolution u_{j−1}[k] = (u_j ∗ l̃⁻)[2k] = (↓(u_j ∗ l̃⁻))[k], with ↓ being the downsampling operator that throws away the odd numbered samples. Similarly to (1.13) we can show that

ψ_{j−1,k}(x) = Σ_{n∈ℤ} h[n] φ_{j,2k+n}(x)   and   ψ̃_{j−1,k}(x) = Σ_{n∈ℤ} h̃[n] φ̃_{j,2k+n}(x),   (1.14)

so that it follows for the coefficients d_{j−1,k} in (1.12) that

d_{j−1,k} = ⟨f, ψ̃_{j−1,k}⟩ = Σ_{n∈ℤ} h̃[−n] a_{j,2k−n}.   (1.15)

Hence we get exactly the algorithm depicted in Figure 1.5 (a). Finally, the (re)construction part of the pyramid algorithm follows from inserting (1.13) and (1.14) into (1.12), setting m = 2k + n and identifying the coefficients of φ_{j,m}(x):

Σ_{k∈ℤ} a_{j,k} φ_{j,k}(x) = Σ_{k∈ℤ} ( a_{j−1,k} φ_{j−1,k}(x) + d_{j−1,k} ψ_{j−1,k}(x) )
  = Σ_{k∈ℤ} ( a_{j−1,k} Σ_{n∈ℤ} l[n] φ_{j,2k+n}(x) + d_{j−1,k} Σ_{n∈ℤ} h[n] φ_{j,2k+n}(x) )
  = Σ_{k∈ℤ} Σ_{m∈ℤ} ( a_{j−1,k} l[m − 2k] + d_{j−1,k} h[m − 2k] ) φ_{j,m}(x)
  = Σ_{m∈ℤ} ( Σ_{k∈ℤ} a_{j−1,k} l[m − 2k] + d_{j−1,k} h[m − 2k] ) φ_{j,m}(x),

so that

a_{j,m} = Σ_{k∈ℤ} a_{j−1,k} l[m − 2k] + d_{j−1,k} h[m − 2k].

This is nothing but convolution with either the odd or the even numbered entries of l and h. More precisely, with the notation u_j[m] = a_{j,m} and v_j[m] = d_{j,m}, we can rephrase it as applying first an upsampling operator ↑ and then convolution with l and h, respectively:

u_j[m] = ((↑u_{j−1}) ∗ l)[m] + ((↑v_{j−1}) ∗ h)[m]   with   (↑x)[n] := { x[n/2] if n is even,  0 if n is odd },

as illustrated in Figure 1.5 (b).
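One decomposition/reconstruction step can be spelled out in a few lines. The sketch below (Python) uses the orthonormal Haar filters, so l = l̃ and h = h̃, and the analysis formula a_{j−1,k} = Σ_n l[n] a_{j,2k+n} is the same convolve-and-downsample operation derived above; the circular indexing is an arbitrary choice of boundary handling that makes reconstruction exact on a finite signal:

```python
import numpy as np

l = np.array([1.0, 1.0]) / np.sqrt(2)               # Haar lowpass taps l[0], l[1]
h = np.array([(-1)**(1 - k) * l[1 - k] for k in range(2)])   # highpass from l

def decompose(a):
    """One pyramid step: a_j -> (a_{j-1}, d_{j-1}), circular boundaries."""
    N, k = len(a), np.arange(len(a) // 2)
    approx = sum(l[n] * a[(2 * k + n) % N] for n in range(2))
    detail = sum(h[n] * a[(2 * k + n) % N] for n in range(2))
    return approx, detail

def reconstruct(approx, detail):
    """Inverse step: a_j[m] = sum_k approx[k] l[m-2k] + detail[k] h[m-2k]."""
    N = 2 * len(approx)
    a = np.zeros(N)
    for k in range(len(approx)):
        for n in range(2):                          # m = 2k + n
            a[(2 * k + n) % N] += approx[k] * l[n] + detail[k] * h[n]
    return a

a = np.random.default_rng(0).normal(size=16)        # arbitrary coefficient vector
ap, de = decompose(a)
assert np.allclose(reconstruct(ap, de), a)          # perfect reconstruction
assert np.isclose((a**2).sum(), (ap**2).sum() + (de**2).sum())  # energy kept
```

Iterating `decompose` on the approximation output gives the full pyramid algorithm.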

Complexity of the pyramid algorithm

The claimed O(N) complexity of the pyramid algorithm depends on the fact that the MRA scaling functions and wavelets have finite support, that is, they live on some finite length interval and therefore have finite (and typically relatively small) filter lengths (see Exercise 4).

For example, if the filters have length M and the signal has length N = 2^J much larger than M, then the pyramid algorithm consists of J steps, where step number j works on a signal of length 2^{−j}N and therefore requires O(M ⋅ 2^{−j}N) arithmetic operations for the up/downsampling and convolutions. Summing over all steps gives a total of O(MN(1 + 1/2 + 1/4 + ⋅⋅⋅)) ≤ 2MN = O(N) operations for a fixed filter length M.
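The geometric-series bound is easy to make concrete. A small sketch (Python; M = 4 is an arbitrary filter length) counts the operations per level and checks that the total stays below 2MN:

```python
# Each level applies about M operations per sample and halves the signal,
# so the total work is M*N + M*N/2 + M*N/4 + ... < 2*M*N = O(N).
def pyramid_cost(N, M):
    ops = 0
    while N >= 2:
        ops += M * N          # filtering + downsampling at this level
        N //= 2
    return ops

for J in (10, 14, 18):
    N = 2**J
    assert pyramid_cost(N, M=4) < 2 * 4 * N
print(pyramid_cost(2**10, 4))   # 8184, just below 2*4*1024 = 8192
```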

Input and output of the pyramid algorithm

It is common practice to use sample values of the analysed signal as the input coefficient vector of the pyramid algorithm. This means that the first two “boxes” in Figure 1.5 are ignored.

This is easily motivated as long as the analysed signal or function f can be assumed to be continuous^b. If J is large enough, then we can assume that f is well approximated by its projection P_J f := Σ_{k∈ℤ} ⟨f, φ̃_{J,k}⟩ φ_{J,k} = Σ_{k∈ℤ} a_{J,k} φ_{J,k} on V_J. Thus the first box in Figure 1.5 is negligible and we can assume f to be in V_J with negligible errors. Moreover, recall that φ̃(x) is nonzero only on some not so large interval [A, B], and set c := ∫ φ̃(x) dx. Then φ̃(2^J x − k) is nonzero only for 2^{−J}A ≤ x − 2^{−J}k ≤ 2^{−J}B, so for large enough J we can make the assumption f(x) ≈ f(2^{−J}k) in the integral

a_{J,k} = ∫ f(x) 2^{J/2} φ̃(2^J x − k) dx ≈ 2^{J/2} f(2^{−J}k) ∫ φ̃(2^J x − k) dx = (set y = 2^J x − k)
  = 2^{−J/2} f(2^{−J}k) ∫ φ̃(y) dy = 2^{−J/2} c f(2^{−J}k).   (1.16)

Moreover, for orthonormal wavelets φ̃ = φ and it follows from (1.7) that c = ∫ φ(x) dx = 1. (See also (1.18).)
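The approximation (1.16) can be checked for the Haar scaling function, where a_{J,k} is exactly 2^{−J/2} times the local mean of f and c = 1. The sketch below (Python, arbitrary test signal) compares the exact coefficients with the scaled sample values and shows the error shrinking as J grows:

```python
import numpy as np

f = lambda x: np.cos(2 * np.pi * x) + 0.5 * x       # arbitrary test signal

def haar_coeffs(J, num_quad=2**16):
    """Exact a_{J,k} = <f, phi_{J,k}> for the Haar scaling function."""
    x = (np.arange(num_quad) + 0.5) / num_quad
    means = f(x).reshape(2**J, -1).mean(axis=1)     # local mean values of f
    return 2**(-J / 2) * means

errs = []
for J in (4, 6, 8):
    exact = haar_coeffs(J)
    sampled = 2**(-J / 2) * f(np.arange(2**J) / 2**J)   # pyramid input guess
    errs.append(np.max(np.abs(exact - sampled)))
    print(f"J = {J}: max coefficient error {errs[-1]:.2e}")
assert errs[2] < errs[1] < errs[0]                  # error shrinks with J
```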

Remark 1. Definition 1 can be written in several equivalent but seemingly more general forms (see [HW96, pages 45, 46 and 49]):

^b This continuity assumption is realistic in typical signal processing applications, but not, for example, in wavelet analysis of a fractal.


∙ It can be shown that the conditions (1.6) imply the existence of some γ ∈ V₀ whose integer shifts γ(x − k) form an orthonormal basis for V₀. Hence we get an equivalent definition of an MRA if we replace “Riesz” with the word orthonormal in (1.6e).

∙ Conditions (1.6a), (1.6c) and (1.6e) imply (1.6d), which therefore can be removed from the definition of an MRA.

∙ If in addition |φ̂| is continuous at 0, then (1.6a), (1.6c) and (1.6e) also imply that (1.6b) is equivalent to requiring that φ̂(0) ≠ 0. Moreover, it follows for orthonormal wavelet scaling functions that |φ̂(0)| = 1.

Exercises

1. For the two dual bases {φ_{j,k}}_{k∈ℤ} ∪ {ψ_{j,k}}_{k∈ℤ} and {φ̃_{j,k}}_{k∈ℤ} ∪ {ψ̃_{j,k}}_{k∈ℤ} for V_{j+1}, convince yourself that the biorthogonality condition (1.5) becomes

⟨φ_{j,k}, φ̃_{j,l}⟩ = ⟨ψ_{j,k}, ψ̃_{j,l}⟩ = δ_{k,l}   and   ⟨φ_{j,k}, ψ̃_{j,l}⟩ = ⟨ψ_{j,k}, φ̃_{j,l}⟩ = 0.

2. Fill in the left out steps in the computation of (1.14), how it leads to (1.15) and how (1.15) is related to the filter H̃ and the downsampling ↓ of its output in Figure 1.5 (a).

3. Why is L called a lowpass filter?

Hint: Let's restrict to orthonormal wavelets, for which we know from (1.7) that φ̂(0) = ∫ φ(x) dx = 1. With L(ξ) := Σ_{k∈ℤ} l[k] e^{−i2πξk}, taking the Fourier transform of the left-hand and right-hand sides of (1.9) gives

φ̂(ξ) = Σ_{k∈ℤ} l[k] √2 ∫ φ(2x − k) e^{−i2πξx} dx = Σ_{k∈ℤ} l[k] √2 ⋅ (1/2) ∫ φ(y) e^{−i2πξ(y/2 + k/2)} dy
  = (1/√2) Σ_{k∈ℤ} l[k] e^{−i2πk(ξ/2)} ∫ φ(y) e^{−i2π(ξ/2)y} dy = (1/√2) L(ξ/2) φ̂(ξ/2),

or equivalently,

φ̂(2ξ) = (1/√2) L(ξ) φ̂(ξ).

Suppose that φ̂ has 99 % of its energy in an interval [a, b] with a < 0 < b, that is, that ∫ₐᵇ |φ̂(ξ)|² dξ ≥ 0.99 ∫ |φ̂(ξ)|² dξ. Find some interval where φ̂(2⋅) has 99 % of its energy. What does this say about the action of L on φ̂?

4. Explain why scaling functions φ, φ̃ living on finite length intervals have finite length filters l, h, l̃ and h̃.

Hint: Do you know some useful formula(e) for computing these coefficients?

5. Suppose that you have made a software implementation of the pyramid algorithm based on the formulas in this section, and that you want to plug in some orthonormal wavelet filter from a table in some book. However, suddenly you see that the author of that book has derived the pyramid algorithm from the scaling equation φ(x) = Σ_{k∈ℤ} m_k C φ(2x + k) for some constant C ≠ √2 instead of the scaling equation φ(x) = Σ_{k∈ℤ} l[k] √2 φ(2x − k) that your implementation is based on. How should you translate the filter coefficients m_k from the book to the filter coefficients l[k] that your implementation is based on?


Figure 1.7: The principle behind noise reduction and compression. If the signal is well represented by a small number of large wavelet coefficients (here sorted in decreasing order), then by setting a threshold and removing all coefficients smaller than this threshold, one can get a good approximation of the signal with a small number of coefficients (compression), as well as removing most of the noise in the process.

1.3 Different wavelet families and characterizing properties

Wavelet bases are not the only available tool for analysing and constructing functions, and no tool is the best tool for all applications.

For the condition monitoring application described in the slides from day two of the lectures (and described in full in [EGJ+04]), a certain continuous wavelet transform seemed to be a suitable choice due to its property of giving better time resolution at higher frequencies.

For OFDM signal synthesis, as described in the slides from day one of the lectures, Gabor type bases are usually used, and these can be argued to have some advantages over wavelets for that particular application [KPZ02]. Both for the FBI fingerprint storage and the JPEG 2000 standard (also described in the slides from day 1), different compression methods were compared and the best one found was of wavelet type, and there is ongoing research on constructing bases (beamlets, wedgelets, curvelets) that are argued to be even better suited for image applications.

For the MP3 standard, local cosine bases were chosen (these bases are described, for example, in [Mal99]).

So for a given application where some function (signal/image) shall be synthesized or analyzed, one can always try to argue from known properties of this kind of function which tool is best suited. If wavelets are chosen, there are many different kinds of wavelets to choose between. In general, one wants to find wavelets that give sparse representations of the synthesized or analyzed signal, meaning that almost all information about the signal is stored in a small number of large wavelet coefficients. This is clearly an advantage for compression applications, since all but a few coefficients can be set to zero with only a small error, and also for many noise reduction purposes, since orthogonal bases map additive white Gaussian noise in the samples to additive white Gaussian noise spread as a relatively small error evenly over all coefficients [BEL99, p. 166]ᶜ. See Figure 1.7.
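The thresholding principle can be sketched with a one-level orthonormal Haar transform. This is a minimal numpy-only illustration; the test signal, noise level and threshold are made-up values for demonstration, not taken from the text:

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform: averages and details."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
    return a, d

def haar_idwt(a, d):
    """Invert one level of the orthonormal Haar transform."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

# Piecewise-smooth test signal (a tent) plus small additive white noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)
clean = np.where(t < 0.5, t, 1 - t)
noisy = clean + 0.01 * rng.standard_normal(t.size)

a, d = haar_dwt(noisy)
d_thresh = np.where(np.abs(d) > 0.02, d, 0.0)  # keep only the large details
denoised = haar_idwt(a, d_thresh)

kept = np.count_nonzero(d_thresh)        # few coefficients survive (compression)
err = np.max(np.abs(denoised - clean))   # yet the reconstruction error stays small
print(kept, err)
```

Because the Haar basis is orthonormal, discarding coefficients of total energy ε² changes the reconstruction by exactly ε in the 𝐿²-sense, which is why thresholding small coefficients is so benign.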

Still, even if other bases can be better for a particular application, wavelets are quite useful. In this section we describe a couple of different wavelets, some of their characterizing properties, and how these can be of importance when choosing a particular wavelet basis for a given application. We describe the wavelets only briefly. For more details and examples, we

ᶜTry to find a better citation.


Figure 1.8: Daubechies 𝑛 scaling functions. Larger 𝑛 means a smoother wavelet and more vanishing moments, but at the cost of scaling functions and wavelets having larger support.

refer, for example, to the extracts from the Wavelet Toolbox User's Guide [MMOP09] available at http://www.ltu.se/tfm/Fysik/ngsst/d2212/1.51590. For example, check pages 1-39–1-47 to see plots and some properties of different wavelets and scaling functions (or make your own plots), or, for a more extensive comparison and a summarizing table of different families, see pages 6-71–6-91.

Wavelets with vanishing moments and minimal support: Daubechies and Symmlets

We say that a wavelet 𝜓 has 𝑁 vanishing moments if

∫ 𝑥𝑛𝜓(𝑥) 𝑑𝑥 = 0 for 𝑛 = 0, 1, . . . , 𝑁 − 1.

Note that 𝜓𝑗,𝑘 lives on an interval 𝐼𝑗,𝑘 of length proportional to 2−𝑗. Hence, if 𝑓 has 𝑛 ≤ 𝑁 continuous derivatives on 𝐼𝑗,𝑘, then it can be approximated there with a Taylor polynomial 𝑓(𝑥) = 𝑃𝑛−1(𝑥) + 𝑂((2−𝑗)𝑛), so that

⟨𝑓, 𝜓𝑗,𝑘⟩ = ∫ 𝑓(𝑥)𝜓𝑗,𝑘(𝑥) 𝑑𝑥 = ∫ 𝑃𝑛−1(𝑥)𝜓𝑗,𝑘(𝑥) 𝑑𝑥 + 𝑂(2−𝑗𝑛) = 𝑂(2−𝑗𝑛) (1.17)

[BEL99, p. 79]. In other words, many vanishing moments means fast decay of the wavelet coefficients provided that 𝑓 is smooth. At the same time, more vanishing moments can only be bought at the price of longer wavelet and scaling function support (as in Figure 1.8). Thus, if 𝑓 is not smooth everywhere but has some high-frequency transient behaviour in some place(s), then these transients affect the wavelet coefficients whose (dual) wavelet or scaling function support overlaps the transient behaviour: for example, the detail coefficient 𝑑𝑗,𝑘 = ⟨𝑓, ˜𝜓𝑗,𝑘⟩ can be large for all ˜𝜓𝑗,𝑘 whose support overlaps some transient behaviour or a discontinuity of 𝑓. Hence there can be a tradeoff between support length and smoothness/vanishing moments affecting the speed of decay of the wavelet coefficients. This is further discussed in [Mal99, p. 559–560].
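A discrete counterpart of the vanishing-moment condition is that the highpass filter 𝑔 satisfies ∑𝑘 𝑘𝑛𝑔[𝑘] = 0 for 𝑛 = 0, . . . , 𝑁 − 1. This is easy to check numerically; the sketch below hardcodes the Daubechies 2 lowpass filter from its well-known closed form (treat the exact coefficient values as an illustration rather than a reference table):

```python
import numpy as np

# Daubechies 2 lowpass filter (4 taps, closed form involving sqrt(3))
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))

# Highpass filter by the usual alternating-flip construction g[k] = (-1)^k h[L-1-k]
g = np.array([(-1) ** k * h[len(h) - 1 - k] for k in range(len(h))])

k = np.arange(len(g))
moments = [np.sum(k ** n * g) for n in range(2)]  # discrete moments for n = 0, 1

print(moments)          # both vanish up to rounding: db2 has 2 vanishing moments
print(np.sum(h ** 2))   # orthonormality: the filter energy is 1
```

The same check applied to the Haar highpass filter (1, −1)/√2 gives a vanishing moment only for 𝑛 = 0, consistent with Haar having a single vanishing moment.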

Example 1 (Daubechies 𝑁 wavelets). Minimal support [0, 2𝑁 − 1] for 𝑁 vanishing moments. Filter length 2𝑁. Highly asymmetric, with energy optimally concentrated near the starting point of their support. The only (anti)symmetric compactly supported orthonormal wavelet is the Haar wavelet. The number of continuous derivatives grows asymptotically as 0.2075𝑁. (See, e.g., [BEL99, p. 126], [Mal99, p. 244, 253] or [HW96].)

Example 2 (Symmlet wavelets). 𝑁 vanishing moments with support [−𝑁 + 1, 𝑁] of the same length, but less asymmetric and less smooth than the Daubechies 𝑁 wavelets. (See, for example, [BEL99, p. 126], [Mal99, p. 253] or [HW96].)

Scaling function with vanishing moments: Coiflets

Example 3 (Coiflets). For Coiflets, in addition to the vanishing moments of the Daubechies wavelets, the scaling function has 𝑁 − 1 vanishing moments:

∫ 𝑥𝑛𝜑(𝑥) 𝑑𝑥 = 0, 𝑛 = 1, 2, . . . , 𝑁 − 1.

Hence, reasoning as in (1.16) and (1.17), if 𝑓 is 𝑛 ≤ 𝑁 times differentiable in the interval of length 𝑂(2−𝑗) where 𝜑𝑗,𝑘 lives, then 𝑓(𝑥) = 𝑃𝑛−1(𝑥) + 𝑂((2−𝑗)𝑛) in that interval, so that

⟨𝑓, ˜𝜑𝑗,𝑘⟩ = ∫ 𝑓(𝑥) ˜𝜑𝑗,𝑘(𝑥) 𝑑𝑥 = 2−𝑗/2𝑐𝑓(2−𝑗𝑘) + 𝑂(2−𝑗𝑝), with 𝑐 as in (1.16). (1.18)

Coiflets are even more symmetric than Symmlets with the same number of vanishing moments, at the cost of a longer support length 3𝑁 − 1, as compared to 2𝑁 − 1 for Symmlets. The filter length is 3𝑁. (See, for example, [BEL99, p. 130], [Mal99, p. 254] or [HW96].)

Symmetry gives linear phase: Biorthogonal wavelets

The Haar wavelet is symmetric but also discontinuous, and it is impossible to construct smooth orthonormal wavelets of compact support that are symmetric or antisymmetric [Mal99, p. 269]. For biorthogonal wavelets, however, this is possible: one can construct biorthogonal wavelets that are symmetric or antisymmetric around the centerpoint of their support (ℎ[𝑘] = ℎ[𝑁 − 𝑘] or ℎ[𝑘] = −ℎ[𝑁 − 𝑘]). Such filters have linear phase [BEL99, p. 25], which means that their Fourier series expansion 𝐻(𝜉) is of the form

𝐻(𝜉) = 𝑟(𝜉)ei2𝜋𝜙(𝜉) with real-valued 𝑟 and 𝜙(𝜉) = 𝑎𝜉 + 𝑏.

(More generally, ℎ also has linear phase if 𝜙 is piecewise linear with constant slope and discontinuities allowed only for 𝜉 such that 𝑟(𝜉) = 0 [BEL99, p. 24].) If a signal 𝑠[𝑘] = ei2𝜋𝜉0𝑘 is sent through a linear phase filter, then the output is the convolutionᵈ

𝑠 ∗ ℎ[𝑘] def= ∑𝑛∈ℤ 𝑠[𝑘 − 𝑛]ℎ[𝑛] = ∑𝑛∈ℤ ei2𝜋𝜉0(𝑘−𝑛)ℎ[𝑛] = ei2𝜋𝜉0𝑘 ∑𝑛∈ℤ e−i2𝜋𝜉0𝑛ℎ[𝑛]
= ei2𝜋𝜉0𝑘𝐻(𝜉0) = ei2𝜋𝜉0𝑘𝑟(𝜉0)ei2𝜋𝜙(𝜉0) = 𝑟(𝜉0)ei2𝜋(𝜉0𝑘+𝜙(𝜉0)) (𝜙(𝜉) = 𝑎𝜉 + 𝑏)
= 𝑟(𝜉0)ei2𝜋𝑏ei2𝜋𝜉0(𝑘+𝑎), (1.19)

which means that a linear phase filter adds the same delay (here 𝑎) to all frequency components of a signal. This is an important property in applications such as speech and sound processing [BEL99, p. 24].

ᵈReaders familiar with the discrete Fourier transform (DFT) would reduce this computation to a short one-line application of standard properties of the DFT.
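The pure-delay conclusion of (1.19) can be verified numerically: a symmetric FIR filter has linear phase, so a complex exponential at any frequency comes out scaled and delayed by the same number of samples. The taps below are arbitrary symmetric values chosen for illustration (the real amplitude 𝑟(𝜉0) happens to be positive at the tested frequencies, so ∣𝐻(𝜉0)∣ = 𝑟(𝜉0)):

```python
import numpy as np

# Symmetric 5-tap filter: linear phase with group delay a = 2 samples.
# (Arbitrary illustrative taps, not a wavelet filter.)
h = np.array([1.0, 3.0, 5.0, 3.0, 1.0])

results = {}
for xi0 in (0.05, 0.1, 0.2):
    k = np.arange(64)
    s = np.exp(1j * 2 * np.pi * xi0 * k)        # input s[k] = e^{i 2 pi xi0 k}
    y = np.convolve(s, h)[len(h) - 1 : len(k)]  # steady-state part of s * h
    kk = k[len(h) - 1 :]
    # Transfer function H(xi0) = sum_n h[n] e^{-i 2 pi xi0 n}
    H = np.sum(h * np.exp(-1j * 2 * np.pi * xi0 * np.arange(len(h))))
    # Linear phase => output is the input scaled by |H| and delayed by 2 samples:
    expected = np.abs(H) * np.exp(1j * 2 * np.pi * xi0 * (kk - 2))
    results[xi0] = np.allclose(y, expected)

print(results)  # every tested frequency sees the same 2-sample delay
```

Replacing h by an asymmetric filter (say, swapping two unequal taps) makes the comparison fail, since the phase is then no longer linear and different frequencies are delayed differently.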


Example 4 (B-spline wavelets). Biorthogonal. See, e.g., [HW96, p. 152–154], [Mal99, p. 152–153, 224]. 𝜑0(𝑥) is the Haar scaling function and

𝜑𝑛(𝑥) = 𝜑𝑛−1 ∗ 𝜑0(𝑥), 𝑛 = 1, 2, 3, . . .

Support and symmetry: [HW96, p. 154]. Dual scaling function and wavelet: [BEL99, p. 138]. See also, e.g., [BEL99, p. 134], [HW96, p. 152–154].
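The recursion in Example 4 can be sanity-checked numerically: convolving the (sampled) Haar box function with itself gives the tent function 𝜑1. A rough Riemann-sum sketch (the grid size is an arbitrary choice):

```python
import numpy as np

n = 1000                  # samples per unit interval
dx = 1.0 / n
box = np.ones(n)          # Haar scaling function phi_0 on [0, 1], sampled

# Riemann-sum approximation of the continuous convolution (phi_0 * phi_0)(x)
tent = np.convolve(box, box) * dx
x = np.arange(len(tent)) * dx        # grid covering [0, 2]

# Compare with the tent function: x on [0, 1], 2 - x on [1, 2]
expected = np.where(x <= 1, x, 2 - x)
print(np.max(np.abs(tent - expected)))  # small discretization error, order 1/n
```

Convolving once more with the box gives the quadratic B-spline 𝜑2 supported on [0, 3], and so on; each convolution adds one degree of smoothness and one unit of support length.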

See, for example, the Wavelet Toolbox User's Guide for some more examples of biorthogonal wavelets.

Some more wavelets

Example 5 (Spline wavelets of order 𝑛). See, e.g., [HW96, p. 161–162]. Called Battle–Lemarié wavelets in [BEL99, p. 141–142]. Obtained by applying a certain orthonormalization trick to the B-spline wavelets. Both 𝜑, 𝜓 and their derivatives of order 1, 2, . . . , 𝑛 have exponential decay. (Note, however, that there is no infinitely many times differentiable orthonormal wavelet with exponential decay [HW96, p. 197].)

Example 6 (The Shannon wavelet). 𝜑̂(𝜉) = 𝜒[−1/2,1/2](𝜉). The scaling function 𝜑(𝑥) = sinc(𝑥) decays only as 1/∣𝑥∣, which makes it practically useless. See, e.g., [HW96, p. 61–62].
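The slow decay is easy to verify numerically: at the half-integers the normalized sinc attains its envelope 1/(𝜋∣𝑥∣) exactly (numpy's np.sinc uses the sin(𝜋𝑥)/(𝜋𝑥) convention):

```python
import numpy as np

x = np.arange(1, 200) + 0.5     # half-integers, where |sin(pi x)| = 1
envelope = 1.0 / (np.pi * x)    # the 1/|x| envelope of the Shannon scaling function
print(np.allclose(np.abs(np.sinc(x)), envelope))
```

So even 200 units away from the origin the scaling function is still only about 1/(200𝜋) ≈ 0.0016 in size; for comparison, an exponentially decaying scaling function would be negligible long before that.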

Example 7 (Lemarié–Meyer wavelets). The bad decay properties of the Shannon wavelet come from the discontinuities in the box-shaped Fourier transform 𝜑̂ of the scaling function. The Lemarié–Meyer family of wavelets has a scaling function whose Fourier transform 𝜑̂(𝜉) equals one for ∣𝜉∣ < 1/2 − 𝜀 and decays in a certain way to zero for 1/2 − 𝜀 ≤ ∣𝜉∣ < 1/2 + 𝜀, in order to give a bandlimited wavelet with faster decay. See, e.g., [HW96, p. 62–63].

Example 8 (Meyer wavelets). See, e.g., [BEL99, p. 141]. The wavelets have an infinite number of vanishing moments. 𝜑 and 𝜓 are very smooth in the sense that they are infinitely many times differentiable.

Exercises

1. (a) Show that the B-spline scaling function 𝜑1 in Example 4 is a so-called "tent function":

𝜑1(𝑥) = 𝑥 for 0 ≤ 𝑥 ≤ 1, 2 − 𝑥 for 1 ≤ 𝑥 ≤ 2, and 0 otherwise.

(b) 𝜑1(𝑥) is one example of a scaling function for which the dual scaling function ˜𝜑1 has the special property that it is a reproducing kernel, which means that for all 𝑓 ∈ 𝑉0, ⟨𝑓, ˜𝜑1⟩ = 𝑓(0) and, consequently, ⟨𝑓, ˜𝜑10,𝑘⟩ = 𝑓(𝑘), so that any 𝑓 in 𝑉0 has the series expansion

𝑓(𝑥) = ∑𝑘∈ℤ 𝑓(𝑘)𝜑10,𝑘(𝑥)

(see, for example, [EG05] for more about such bases).

Given 𝑓 ∈ 𝑉0 such that 𝑓(0) = 3, 𝑓(1) = −3, 𝑓(2) = 1 and 𝑓(𝑘) = 0 for all other integers 𝑘, sketch 𝑓.


2 Time-frequency analysis and the continuous wavelet transform

2.1 Time-frequency analysis

An everyday example of time-frequency analysis is the processing performed in the human inner ear (see Figure 2.1) when, for example, listening to someone playing a piano. The sound

Figure 2.1: Time-frequency analysis in the human ear: The sound reaches the outer ear as air pressure variations. It is transmitted through the middle and inner ear and converted to vibrations in the fluid within a spiral channel in the cochlea. The vibrations stimulate hair cells in the basilar membrane, which exhibits different degrees of stiffness, or resonance, along its length. As a result, the different frequencies of the sound are sorted out and transmitted to the brain stem by the cochlear nerve. After extensive processing, the nerve impulses are relayed to the cerebral cortex and the listener becomes aware of the sound. (Source: [Enc00])

