
Multidimensional signal recognition, invariant to affine transformation and time-shift, using canonical correlation

Thesis project done at Computer Vision Laboratory

Linköping University

Sweden

by

Björn Johansson

Reg. nr: LiTH-ISY-EX-1825

Supervisor:

Magnus Borga

Examiner:

Hans Knutsson

Linköping, October 6, 1997

Contents

1 Introduction
  1.1 Acknowledgements
  1.2 Notation
  1.3 Previous knowledge
  1.4 Outline

2 Canonical correlation
  2.1 Definition and a short explanation
  2.2 The gradient-search algorithm

3 What this thesis is about

4 LP-filtering signals
  4.1 Experiments
    4.1.1 One-dimensional signals
    4.1.2 Multidimensional signals
  4.2 An ounce of theory #1
    4.2.1 Multidimensional frequencies
    4.2.2 Time shift and linear transformation
  4.3 Discussion

5 LP-filtering canonical correlation
  5.1 An idea
  5.2 Experiments
  5.3 An ounce of theory #2
    5.3.1 Covariance and signal frequencies
    5.3.2 Filtered canonical correlation
    5.3.3 One step further
  5.4 Discussion

6 Signal norm
  6.1 Experiments
  6.2 An ounce of theory #3

7 Round-up
  7.1 Summary and comparison
  7.2 Future work

Introduction

1.1 Acknowledgements

I would like to thank my supervisor, licentiate Magnus Borga, and my examiner, associate professor Hans Knutsson, for many inspiring discussions and ideas that have moved the work forward (I got no help from literature - I could not find any concerning this topic). I would also like to thank Magnus Borga for all the constructive criticism in proof-reading my thesis. Thank you also to the rest of the staff at the Computer Vision Laboratory for helping me with practical things during the work. Thanks also to Stefan Eriksson for taking on the job as opponent.

Finally I would like to thank the word processing program LaTeX and the mathematical software MATLAB for being cooperative (most of the time).

1.2 Notation

Italics ($x$) are used for scalars.
Lowercase letters in boldface ($\mathbf{x}$) are used for vectors.
Uppercase letters in boldface ($\mathbf{X}$) are used for matrices.
Transpose is denoted by 'T' (e.g. $\mathbf{x}^T$).
Adjungation (transpose + conjugation) is denoted by '$*$' (e.g. $\mathbf{x}^*$).
A '$\hat{\ }$' over a vector indicates unit length (e.g. $|\hat{\mathbf{v}}| = 1$).
$\Re\{z\}$ and $\Im\{z\}$ denote the real resp. imaginary part of the complex variable $z$.


1.3 Previous knowledge

In order to appreciate this thesis you need some knowledge of linear algebra (e.g. terms like subspace, span and eigenvalue), statistics (correlation, covariance matrix, etc.) and Fourier series.

1.4 Outline

Chapter 2 describes the concept of canonical correlation, which you need to know about in order to understand the subsequent discussion.

Chapter 3 introduces the problem that was to be solved.

Chapters 4, 5 and 6 discuss three different suggestions for how to approach the problem. Each chapter begins with a section of experiments as a motivation of the approach. Then follows some theory and mathematical manipulations to structure the thoughts. The last sections contain discussions and suggestions concerning the approach.

Finally, chapter 7 contains a summary and a comparative discussion of the approaches.

Canonical correlation

Below follows a short explanation of canonical correlation. I will also mention a method for finding the maximum canonical correlation based on an iterative gradient search. This method has been developed at the Computer Vision Laboratory here in Linköping. If you want to know more about canonical correlation and the gradient search method I strongly recommend the licentiate thesis by Magnus Borga [1].

2.1 Definition and a short explanation

Canonical correlation is a similarity measure between two multidimensional signals. The idea is to only look at one direction (projection) at a time. Canonical correlation also has an advantage compared to ordinary correlation analysis in being invariant to affine transformations.

Assume we have two multidimensional signals $x \in X$ and $y \in Y$, where $X$ and $Y$ are multidimensional spaces, not necessarily of the same dimensionality. Let $x = \hat{w}_x^T x$ and $y = \hat{w}_y^T y$ be the projections of $x$ resp. $y$ in the directions $\hat{w}_x$ resp. $\hat{w}_y$. Assume $E\{x\} = E\{y\} = 0$. Then we can calculate the correlation between $x$ and $y$ as

  \rho = \frac{E\{xy\}}{\sqrt{E\{xx\}\,E\{yy\}}}
       = \frac{\hat{w}_x^T E\{xy^T\}\,\hat{w}_y}{\sqrt{\hat{w}_x^T E\{xx^T\}\,\hat{w}_x \;\; \hat{w}_y^T E\{yy^T\}\,\hat{w}_y}}
       = \frac{w_x^T C_{xy}\, w_y}{\sqrt{w_x^T C_{xx}\, w_x \;\; w_y^T C_{yy}\, w_y}}    (2.1)

By rotating the projection vectors $w_x$ and $w_y$ we can find the directions that maximize the correlation.

Example 2.1
A simple example of a multidimensional (periodic) signal: an "eight" $x(\alpha) = (\cos\alpha, \sin 2\alpha)$ (see fig. 2.1). Fig. 2.2 shows the projection $x(\alpha) = w_i^T x(\alpha)$ along the three directions $w_1 = (1, 0)$, $w_2 = (0.6, 0.8)$ and $w_3 = (0, 1)$.

Figure 2.1: Example of a multidimensional signal: an "eight"

Figure 2.2: Projection of the eight-signal along the directions $w_1 = (1, 0)$ (left), $w_2 = (0.6, 0.8)$ (middle) and $w_3 = (0, 1)$ (right)

Let $y(\alpha) = x(\alpha)$ and calculate the canonical correlation $\rho$. Figure 2.3 shows $\rho$ as a function of $w_x$ and $w_y$. Because the $w$-vectors are two-dimensional I have chosen to represent them in the figure as the angle between them and the horizontal axis (the $x_1$-axis - see fig. 2.1). For instance $w = (1, 0)$ is represented as $0^\circ$ and $w = (0, 1)$ corresponds to $90^\circ$. The figure shows that we have a maximum $\rho = 1$ wherever $w_x = w_y$ (and nowhere else).

The maximum can be found by looking at the derivatives of $\rho$ with respect to $w_x$ and $w_y$:

  \frac{\partial\rho}{\partial w_x} = 0, \qquad \frac{\partial\rho}{\partial w_y} = 0

  \Leftrightarrow\quad
  \begin{cases}
  C_{xy}\hat{w}_y - \dfrac{\hat{w}_x^T C_{xy}\hat{w}_y}{\hat{w}_x^T C_{xx}\hat{w}_x}\, C_{xx}\hat{w}_x = 0 \\[2mm]
  C_{yx}\hat{w}_x - \dfrac{\hat{w}_y^T C_{yx}\hat{w}_x}{\hat{w}_y^T C_{yy}\hat{w}_y}\, C_{yy}\hat{w}_y = 0
  \end{cases}

  \Leftrightarrow\quad
  \begin{cases}
  C_{xy}\hat{w}_y - \rho\sqrt{\dfrac{\hat{w}_y^T C_{yy}\hat{w}_y}{\hat{w}_x^T C_{xx}\hat{w}_x}}\, C_{xx}\hat{w}_x = 0 \\[2mm]
  C_{yx}\hat{w}_x - \rho\sqrt{\dfrac{\hat{w}_x^T C_{xx}\hat{w}_x}{\hat{w}_y^T C_{yy}\hat{w}_y}}\, C_{yy}\hat{w}_y = 0
  \end{cases}    (2.2)

(For a proof, see [1].) This equation system can be rewritten as

Figure 2.3: $\rho$ as a function of $w_x$ and $w_y$ (right). The left figure shows the same curve from above. The units of the $w_x$- and $w_y$-axes are the angle in degrees from the horizontal axis ($x_1$-axis).

  C_A w = \rho\, C_B w    (2.3)

where

  C_A = \begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}, \quad
  C_B = \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}, \quad
  w = \begin{pmatrix} \mu_x \hat{w}_x \\ \mu_y \hat{w}_y \end{pmatrix}, \quad
  \frac{\mu_x}{\mu_y} = \sqrt{\frac{\hat{w}_y^T C_{yy} \hat{w}_y}{\hat{w}_x^T C_{xx} \hat{w}_x}}

Eq. 2.3 is called a generalized eigenproblem. By solving it we find the directions $\hat{w}_x$ and $\hat{w}_y$, and thereby also the projections $\hat{w}_x^T x$ and $\hat{w}_y^T y$, that give the highest correlation. The maximum correlation is found as the largest eigenvalue ($\rho_1$).

The equation system 2.2 can also be rewritten in another way (this form will be used later on in the thesis):

  M_x \hat{w}_x = \rho^2 \hat{w}_x
  M_y \hat{w}_y = \rho^2 \hat{w}_y    (2.4)

where

  M_x = C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx}, \qquad
  M_y = C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy}

If we only want to find the eigenvalues and not the eigenvectors, it is enough to solve one of the equation systems in 2.4. I have assumed that the inverses of the covariance matrices exist. This is true in all practical cases where there is noise present.
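The eigenvalue formulation above translates directly into code. Below is a minimal numerical sketch in Python/numpy (rather than the MATLAB used in this work); the function name `canonical_correlations` and the sample-covariance estimates are my own choices, not part of the thesis:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between sample matrices X (N x m) and Y (N x n),
    found as the square roots of the eigenvalues of
    M_x = Cxx^-1 Cxy Cyy^-1 Cyx  (eq. 2.4)."""
    Xc = X - X.mean(axis=0)          # the derivation assumes zero-mean signals
    Yc = Y - Y.mean(axis=0)
    N = X.shape[0]
    Cxx, Cyy, Cxy = Xc.T @ Xc / N, Yc.T @ Yc / N, Xc.T @ Yc / N
    Mx = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    ev = np.clip(np.linalg.eigvals(Mx).real, 0.0, None)
    return np.sort(np.sqrt(ev))[::-1]        # rho_1 >= rho_2 >= ...

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 2))
rho_same = canonical_correlations(X, X)[0]
rho_indep = canonical_correlations(X, rng.standard_normal((2000, 2)))[0]
```

For identical signals every canonical correlation is 1 ($M_x$ is the identity); for independent signals with many samples the estimate is close to 0.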

What is the interpretation of the other eigenvalues ($\rho_2$, $\rho_3$, ...)? These eigenvalues correspond to canonical correlations in other directions. Let us say we could somehow eliminate the information in the signals from the direction that corresponds to the largest correlation. Then the second largest eigenvalue and the corresponding eigenvector would give the maximal correlation, and so forth. Often it is not enough to look in one direction only. Note that it is possible to get $\rho_1 = 1$ in the first direction, indicating that the signals are equal, and get $\rho_i \ll 1$ in the remaining directions, indicating that the signals only looked alike in one direction. So if we want to know the similarity between the signals in more than one direction we can make use of the other eigenvalues as well. We could for instance let

  (\rho_1^2 + \rho_2^2 + \dots + \rho_k^2)/k, \qquad \rho_1 \ge \rho_2 \ge \dots    (2.5)

for some suitable $k$ be a measure of similarity between the signals.

As mentioned earlier, canonical correlation is invariant to affine transformations. Here follows an example of a special case of that invariance: the maximum of the canonical correlation between $x(t)$ and $x(t)$ gives $\rho = 1$, as it should. If we do the same thing with $x(t)$ and $y(t) = A x(t)$, where $A$ is a linear transformation, we get:

  \rho = \frac{w_x^T E\{xy^T\}\, w_y}{\sqrt{w_x^T E\{xx^T\}\, w_x \;\; w_y^T E\{yy^T\}\, w_y}}
       = \frac{w_x^T E\{xx^T A^T\}\, w_y}{\sqrt{w_x^T E\{xx^T\}\, w_x \;\; w_y^T E\{A xx^T A^T\}\, w_y}}
       = \frac{w_x^T C_{xx} A^T w_y}{\sqrt{w_x^T C_{xx} w_x \;\; w_y^T A C_{xx} A^T w_y}}    (2.6)

and we get $\rho = 1$ if we choose $w_x = A^T w_y$.
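This invariance is easy to check numerically. In the sketch below (an illustration with arbitrary sample data of my own, not from the thesis), $y = Ax$ for a random, generically invertible $A$; the largest eigenvalue of $M_x$ from eq. 2.4 then comes out as 1, since $M_x$ reduces to the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
X = rng.standard_normal((N, 3))          # zero-mean 3D signal samples
A = rng.standard_normal((3, 3))          # a (generically invertible) transformation
Y = X @ A.T                              # y(t) = A x(t), no noise

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx, Cyy, Cxy = Xc.T @ Xc / N, Yc.T @ Yc / N, Xc.T @ Yc / N
# Mx = Cxx^-1 (Cxx A^T) (A Cxx A^T)^-1 (A Cxx) = I, so all eigenvalues are 1
Mx = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
rho_max = float(np.sqrt(np.max(np.linalg.eigvals(Mx).real)))
```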

2.2 The gradient-search algorithm

The second derivative of $\rho$, i.e. the Hessian, shows that there are no local maxima of $\rho$ except for the global one if all eigenvalues $\{\rho_i\}$ are distinct (for a proof, see [1]). This is an important condition if we want to use a gradient search to find the maximum of $\rho$. I will now describe the iterative gradient search method for finding the maximal canonical correlation.

The update rule is

  w_x(t+1) = w_x(t) + \Delta w_x(t)
  w_y(t+1) = w_y(t) + \Delta w_y(t)    (2.7)

where

  \Delta w_x = \gamma_x\, x\, (y^T \hat{w}_y - x^T w_x)
  \Delta w_y = \gamma_y\, y\, (x^T \hat{w}_x - y^T w_y)    (2.8)

$\gamma_x$ and $\gamma_y$ are positive update factors.

Why does this update rule work? The expected change of $w_x$ is

  E\{\Delta w_x\} = \gamma_x E\{x y^T \hat{w}_y - x x^T w_x\}
                  = \gamma_x (C_{xy}\hat{w}_y - \|w_x\|\, C_{xx}\hat{w}_x)
                  = [\text{identify with eq. 2.2}]
                  = \beta_x \frac{\partial\rho}{\partial w_x}    (2.9)

where $\beta_x$ is a positive constant and $\|w_x\| = \dfrac{\hat{w}_x^T C_{xy} \hat{w}_y}{\hat{w}_x^T C_{xx} \hat{w}_x} = \rho\,\dfrac{\mu_x}{\mu_y}$. Similar calculations give

  E\{\Delta w_y\} = \beta_y \frac{\partial\rho}{\partial w_y}, \qquad \|w_y\| = \rho\,\frac{\mu_y}{\mu_x}    (2.10)

This shows that the changes of $w_x$ and $w_y$ are on average in the same directions as the gradient of $\rho$, i.e. they move in a direction that increases $\rho$.

After convergence the canonical correlation can be calculated as

  \rho = \sqrt{\|w_x\|\,\|w_y\|}    (2.11)

To get the second largest eigenvalue, i.e. the second best directions, there is a way to subtract the information in the first direction and use a modified iterative gradient search. I will not go any further into this. Let us instead begin the work.
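The update rule can be sketched in its expected (batch) form, where the stochastic factors are replaced by the covariance matrices as in eq. 2.9. This is an illustration with example data and parameters of my own choosing ($\gamma = 0.02$, 20000 iterations), not the exact implementation used in the thesis; the result is compared with the eigenproblem solution from eq. 2.4:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2000
X = rng.standard_normal((N, 2))
M = np.array([[1.0, 0.2], [0.0, 0.5]])
Y = X @ M.T + 0.5 * rng.standard_normal((N, 2))     # correlated with X

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx, Cyy, Cxy = Xc.T @ Xc / N, Yc.T @ Yc / N, Xc.T @ Yc / N

# Reference: largest canonical correlation from the eigenproblem (eq. 2.4)
Mx = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
rho_eig = float(np.sqrt(np.max(np.linalg.eigvals(Mx).real)))

# Expected-value form of the update rule (eqs. 2.7-2.9)
wx = rng.standard_normal(2)
wy = rng.standard_normal(2)
gamma = 0.02
for _ in range(20000):
    wx_hat = wx / np.linalg.norm(wx)
    wy_hat = wy / np.linalg.norm(wy)
    wx = wx + gamma * (Cxy @ wy_hat - np.linalg.norm(wx) * Cxx @ wx_hat)
    wy = wy + gamma * (Cxy.T @ wx_hat - np.linalg.norm(wy) * Cyy @ wy_hat)

rho_grad = float(np.sqrt(np.linalg.norm(wx) * np.linalg.norm(wy)))   # eq. 2.11
```

After convergence the estimate from eq. 2.11 agrees with the largest eigenvalue of $M_x$.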

What this thesis is about

Let us say we have measured a multidimensional periodic signal $y(t)$. It could for instance be an image sequence or an ECG signal. We may suspect that this signal comes from one of many in a set of prototype signals $\{x_i(t)\}$. How do we find the most likely one? If the signal $y$ is measured by sensors (e.g. cameras) they will probably deform the signal (different angles and distances cause rotation, scaling, etc.) and they will also introduce noise. On top of that, the measured signal may be unsynchronized in time with the stored prototype signal. So all in all we can assume that the measured signal $y$ is a transformed, time-shifted and noisy version of a prototype signal $x_i$:

  y(t) = A x_i(t + \Delta t) + e(t)    (3.1)

where $A$ is a linear transformation matrix, $\Delta t$ is a time-shift (sometimes called time alignment) and $e(t)$ is some sort of unwanted noise signal.

In reality the transformation does not have to be linear. A non-linear transformation is a bit more complicated and is not treated in this thesis. So from now on I will assume a linear transformation.

If we want to find the prototype signal $x_i$ that is most similar to $y$ we must first define what we mean by similarity. If we want to handle cases like the one discussed above we should have a similarity measure that is invariant to linear transformations and time-shifts. In this thesis two signals $x(t)$ and $y(t)$ are said to be equal if they can be written $y(t) = A x(t + \Delta t)$ for some $A$ and $\Delta t$.

The canonical correlation algorithm described in chapter 2 can only handle affine transformations. That is not bad at all, but it would be even better if it could also handle time-shifts. The goal of this thesis is to find a way to detect (some form of) similarity between two signals $x(t)$ and $y(t) = A x(t + \Delta t)$ regardless of what affine transformation $A$ and time-shift $\Delta t$ we have.

Let $x_s(t) = x(t+s)$ and take the canonical correlation between $x_s$ and $y$ (note that $C_{x_s x_s} = C_{xx}$):

  \rho = \frac{w_x^T C_{x_s y}\, w_y}{\sqrt{w_x^T C_{xx}\, w_x \;\; w_y^T C_{yy}\, w_y}}    (3.2)

We can now think of $\rho$ as a function of $w_x$, $w_y$ and $s$. If we maximize $\rho$ we get a similarity measure between $x$ and $y$ that is invariant to $A$ and $\Delta t$.

One way to find that maximum would be to use the algorithm described in chapter 2, i.e. maximize with respect to $w_x$ and $w_y$ for each time-shift $s$, and then choose the maximal value among these.

Example 3.1
Take a look at figure 3.1. $\max_{w_x, w_y}(\rho)$ is calculated for each time-shift $s$ (with, for instance, the method described in ch. 2). The 2D-signals used are shown in figure 3.2. They have a period of 100 samples and one of them is shifted 16 samples (and transformed). This is detected in the graph by the global maximum.

Figure 3.1: $\max_{w_x, w_y}(\rho)$ vs. time-shift $s$

Figure 3.2: $x$ and $y$. The signals start where the interruption is.

This is not a very fast method. It is simply an exhaustive test of all possible time-shifts. If the dimensionality of the signal is high it will take time to calculate $\max_{w_x, w_y}(\rho)$ for each time-shift (e.g. an image of $64 \times 64$ pixels has a dimensionality of 4096).
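The exhaustive search can be sketched as follows (a numpy illustration; the "eight" signal and the matrix $A$ are example data of my own). For each shift $s$ the largest canonical correlation between $x_s$ and $y$ is computed via the eigenvalues of $M_x$ (eq. 2.4); at the true shift of 16 samples it reaches 1 exactly, although for signals like this one other shifts may tie with it (cf. the comment at the end of chapter 4):

```python
import numpy as np

def max_cancorr(X, Y):
    """Largest canonical correlation via the eigenvalues of Mx (eq. 2.4)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    N = X.shape[0]
    Cxx, Cyy, Cxy = Xc.T @ Xc / N, Yc.T @ Yc / N, Xc.T @ Yc / N
    Mx = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    return float(np.sqrt(np.max(np.linalg.eigvals(Mx).real)))

# An "eight" x(a) = (cos a, sin 2a), period 100 samples;
# y(t) = A x(t + 16) for a linear transformation A
a = 2 * np.pi * np.arange(100) / 100
X = np.column_stack([np.cos(a), np.sin(2 * a)])
A = np.array([[2.0, 1.0], [0.5, -1.0]])
Y = np.roll(X, -16, axis=0) @ A.T

# Exhaustive search over all time-shifts s
rho = [max_cancorr(np.roll(X, -s, axis=0), Y) for s in range(100)]
```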

How about incorporating the variable $s$ into the gradient search described in ch. 2? For a gradient search to be effective there can only be one optimum - the global one. Otherwise we can get stuck in a local optimum. Before we introduced the time-shift variable, $\rho$ had only one optimum and a gradient search was effective (see ch. 2). Now there are lots of local optima. This can for instance be seen in fig. 3.1. So we must find some other way to calculate the maximum of $\rho$.

I should mention that it may not always be necessary to try all time-shifts. In a report about ECG and vectorcardiographic loops [2] the signals are assumed to be reasonably well synchronized in time. The method used there was an exhaustive search over all time-shifts in an interval. (The similarity measure used was not canonical correlation but rather some matrix error function.) In such a case exhaustive search may be satisfactory. But as said before, the conditions can be more unpleasant.

In the next chapter I will discuss the perhaps most intuitive approach to finding the correct time-shift: LP-filtering of the signals.

LP-filtering signals

Take a second look at fig. 3.1. If we want to incorporate the time-shift in a gradient search we have to get rid of the local maxima. How does one 'usually' smooth out rapidly varying curves and get rid of local maxima? One uses LP-filtering. In this chapter I will discuss LP-filtering of the signals. It turns out not to be so effective with multidimensional signals, but I still want to discuss it because it is so fundamental. Let us start with some experiments to get us motivated to find out what is happening more theoretically.

4.1 Experiments

We begin with a successful example of correlating LP-filtered one-dimensional signals. After that we try the same thing with multidimensional signals.

4.1.1 One-dimensional signals

Look at the signals in fig. 4.1 (they have been generated by LP-filtering white noise and have zero mean). They are equal except for a time-shift of 16 samples and a scaling parameter. To find the time-shift we can calculate the absolute value of the correlation for each possible time-shift, as we did in ch. 3 (the signal period is 100 samples). We then get the curve in fig. 4.2. The global maximum at time-shift 16 is the one we are after.

If we filter the signals with an LP-filter of type $1/\omega$ we get the signals in fig. 4.3. Calculating the new correlation gives the curve in fig. 4.4. Now there are only two maxima left - one global and one (almost as big) local. Explanation: after LP-filtering, the lowest frequency component is the single largest component affecting the correlation, so a time-shift half a period away from the true time-shift will be almost as good. But one false optimum is not a big deal if this method turns out to be effective with multidimensional signals as well.

Figure 4.1: Signals $x(t)$ (upper) and $y(t) = -2x(t+16)$ (lower)

Figure 4.2: Abs(correlation) vs. time-shift

Figure 4.3: LP-filtered $x$ and $y$

Figure 4.4: Abs(correlation) of the LP-filtered signals vs. time-shift
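The one-dimensional experiment is easy to reproduce. The sketch below uses illustrative data of my own (a 100-sample periodic signal built from ten harmonics with a $1/k$ amplitude envelope as a stand-in for LP-filtered white noise) and recovers the 16-sample shift from the correlation curve:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
t = np.arange(N)

# A periodic "LP-filtered white noise" stand-in: ten harmonics, 1/k envelope
x = np.zeros(N)
for k in range(1, 11):
    x += (rng.standard_normal() * np.cos(2 * np.pi * k * t / N)
          + rng.standard_normal() * np.sin(2 * np.pi * k * t / N)) / k
y = -2.0 * np.roll(x, -16)            # y(t) = -2 x(t + 16)

def corr(u, v):
    u, v = u - u.mean(), v - v.mean()
    return float(u @ v / np.sqrt((u @ u) * (v @ v)))

c = [abs(corr(np.roll(x, -s), y)) for s in range(N)]
s_hat = int(np.argmax(c))             # the recovered time-shift
```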

4.1.2 Multidimensional signals

Now we do the same thing with two multidimensional signals. Start with two 6D-signals (generated by LP-filtering white noise), see fig. 4.5. One of the signals is a linear transformation plus a 16-sample time-shift of the other. A calculation of $\max_{w_x, w_y}(\rho)$ for each possible time-shift is shown in fig. 4.6.

Figure 4.5: Signal components of $x$ (left) and $y$ (right)

Figure 4.6: $\max_{w_x, w_y}(\rho)$ vs. time-shift

Next we low-pass filter with the same filter as before. The LP-filtered signals are shown in fig. 4.7 and the correlation between them in fig. 4.8.

Figure 4.7: Signal components of filtered $x$ (left) and $y$ (right) with filter $1/\omega$

Figure 4.8: $\max_{w_x, w_y}(\rho)$ vs. time-shift

As you can see, the number of local maxima has not been reduced by much. Instead the two signals have a high correlation at every time-shift. If we take an even more powerful LP-filter (type $1/\omega^2$) we get the result in fig. 4.9 and fig. 4.10.

Now there is (almost) full correlation at every time-shift! This is obviously not working. What is happening? By looking at fig. 4.9 you can see that all components look alike - they have lost their individuality. Intuitively this means that for each time-shift there is a high probability that at least one component in $x$ matches one component in $y$. That is all it takes to get a high canonical correlation.

Figure 4.9: Signal components of filtered $x$ (left) and $y$ (right) with filter $1/\omega^2$

Figure 4.10: $\max_{w_x, w_y}(\rho)$ vs. time-shift

4.2 An ounce of theory #1

In this section I will introduce some theory to give you a feeling for the world the multidimensional signals live in. First we will see how a signal can be described in terms of frequencies, and second how time-shift and linear transformation are related.

4.2.1 Multidimensional frequencies

Monofrequency signals

Consider a parameterized vector signal $x(t) \in R^m$, $m \in Z^+$. $t \in R$ is a parameter, for instance time. Assume the signal to be periodic, i.e. $x(t+T) = x(t)$ for some (known) $T \in R$.

If the signal $x(t)$ consists of only one frequency ($= \omega/2\pi$) we can write:

  x(t) = \begin{pmatrix} a_1 \sin(\omega t + \varphi_1) \\ a_2 \sin(\omega t + \varphi_2) \\ \vdots \\ a_m \sin(\omega t + \varphi_m) \end{pmatrix}
       = \begin{pmatrix} a_1 \sin\varphi_1 \cos\omega t + a_1 \cos\varphi_1 \sin\omega t \\ a_2 \sin\varphi_2 \cos\omega t + a_2 \cos\varphi_2 \sin\omega t \\ \vdots \\ a_m \sin\varphi_m \cos\omega t + a_m \cos\varphi_m \sin\omega t \end{pmatrix}
       = v_1 \cos\omega t + v_2 \sin\omega t    (4.1)

where

  v_1 = \begin{pmatrix} a_1 \sin\varphi_1 \\ a_2 \sin\varphi_2 \\ \vdots \\ a_m \sin\varphi_m \end{pmatrix}, \qquad
  v_2 = \begin{pmatrix} a_1 \cos\varphi_1 \\ a_2 \cos\varphi_2 \\ \vdots \\ a_m \cos\varphi_m \end{pmatrix}

Notice that this is simply a 2D-ellipse in $R^m$! The ellipse lies in the 2D-plane spanned by $v_1$ and $v_2$ (or on a 1D-line if $v_1 \parallel v_2$) (see fig. 4.11).

Figure 4.11: Illustration of a multidimensional frequency

In general, a periodic signal can be expanded in a Fourier series:

  x(t) = \sum_{k=-\infty}^{\infty} c_k e^{ik\omega_1 t}    (4.2)

where

  c_k = \frac{1}{T} \int_0^T x(t)\, e^{-ik\omega_1 t}\, dt    (4.3)

When we have only one frequency we get (remember that $c_{-k} = \overline{c_k}$ since $x(t)$ is real):

  x(t) = c_{-q} e^{-iq\omega_1 t} + c_q e^{iq\omega_1 t} = 2\Re\{c_q\}\cos q\omega_1 t - 2\Im\{c_q\}\sin q\omega_1 t

where $q = \omega/\omega_1$. So we have $v_1 = 2\Re\{c_q\}$ and $v_2 = -2\Im\{c_q\}$.
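The relations $v_1 = 2\Re\{c_q\}$ and $v_2 = -2\Im\{c_q\}$ can be verified numerically with a discretized version of eq. 4.3 (a numpy sketch with arbitrary random $v_1$, $v_2$ of my own; here $\omega_1 = 1$ and $q = 1$):

```python
import numpy as np

rng = np.random.default_rng(4)
m, N = 5, 200
v1, v2 = rng.standard_normal(m), rng.standard_normal(m)
t = 2 * np.pi * np.arange(N) / N                        # one period; omega_1 = 1
X = np.outer(np.cos(t), v1) + np.outer(np.sin(t), v2)   # rows are x(t_n)

# Discrete version of eq. 4.3: c_q = (1/T) * integral of x(t) exp(-i q w1 t) dt
c1 = (X * np.exp(-1j * t)[:, None]).sum(axis=0) / N

v1_rec = 2 * c1.real        # v1 = 2 Re{c_q}
v2_rec = -2 * c1.imag       # v2 = -2 Im{c_q}
```

For a pure sinusoid sampled over a whole number of periods the discrete sum is exact, so the recovered vectors match to floating-point precision.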

Multifrequency signals

What does a periodic signal with more than one frequency look like?

Example 4.1
An example of how multidimensional signals are composed of 2D-frequencies: the 3D-signal $x(\alpha)$ in fig. 4.12 has been composed of two frequencies $f_1(\alpha)$ and $f_2(\alpha)$, where

  f_1(\alpha) = \begin{pmatrix} -1 & -4 \\ 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix}, \qquad
  f_2(\alpha) = \begin{pmatrix} -1 & 1 \\ 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} \cos 2\alpha \\ \sin 2\alpha \end{pmatrix}

In fig. 4.12 I have included the signal's projection onto the plane defined by $(1, 0, 0)^T$ and $(0, 1, 0)^T$ as a help to visualize all 3 dimensions. $f_2$ has twice the frequency of $f_1$, and I have chosen to represent that in fig. 4.12 by one ellipse rotating one round and the other rotating two rounds. The zero DC-component is ignored.

As seen from the Fourier series expansion (eq. 4.2), we can always describe a multidimensional signal as a sum of frequencies or, as we have now seen, as a sum of two-dimensional elliptic signals. Below follows a definition that will be useful later on.

Figure 4.12: $x(\alpha) = f_1(\alpha) + f_2(\alpha)$

Definition 4.1 (Spatially independent frequencies)
Assume we have two multidimensional frequencies

  f_1(t) = v_1 \cos\omega_1 t + v_2 \sin\omega_1 t \qquad \text{and} \qquad f_2(t) = v_3 \cos\omega_2 t + v_4 \sin\omega_2 t

where $\omega_1 \neq \omega_2$ and $v_1, v_2, v_3, v_4 \in R^m$. Then the frequencies $f_1(t)$ and $f_2(t)$ are called spatially independent if $[v_1, v_2] \cap [v_3, v_4] = \{0\}$. $[v_i, v_j]$ means the subspace spanned by $v_i$ and $v_j$.

$N$ multidimensional frequencies are said to be spatially independent if no single vector in any frequency subspace can be written as a linear combination of vectors in the other $N-1$ frequency subspaces.

Or to put it another way: each frequency occupies a subspace (plane) of one or two dimensions. If each frequency can have one or two dimensions for itself, the frequencies are in a way independent and free from one another.

Example 4.2
In Example 4.1 the two frequencies $f_1$ and $f_2$ are not spatially independent because

  \left[ \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -4 \\ 0 \\ 0 \end{pmatrix} \right] \cap \left[ \begin{pmatrix} -1 \\ 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right] = \left[ \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right]

Example 4.3
In $R^3$ there cannot be four linearly independent vectors, so we must have $v_1 \parallel v_2$ or $v_3 \parallel v_4$ (or both) to have spatially independent frequencies. Fig. 4.13 shows the case $v_1 \nparallel v_2$ and $v_3 \parallel v_4$.

Example 4.4
An "eight" $x(\alpha) = (\cos\alpha, \sin 2\alpha)$ (see fig. 4.14) is an example of spatially independent frequencies in $R^2$. (The frequency subspaces in fig. 4.14 are supposed to be one-dimensional.)

Figure 4.13: Example of two spatially independent frequencies

Figure 4.14: Frequency decomposition of an eight

An interesting detail can be noticed: if a multidimensional periodic signal is band-limited to $N$ frequencies, we can be sure that the signal dwells in a subspace of $R^m$ with a dimension less than or equal to $2N$ (of course obvious if $m \le 2N$), because each frequency occupies (at most) a two-dimensional subspace, and together all $N$ frequencies must occupy a subspace of dimension $2N$ or less. Dimension $2N$ can only occur if and only if all frequencies are spatially independent and each frequency subspace is two-dimensional.

4.2.2 Time shift and linear transformation

In this section I will show that there are some connections between time-shift and linear transformation in a multidimensional signal.

One frequency

Since a multidimensional frequency is nothing but an ellipse, a time-shift is equivalent to a rotation plus scaling. Let $x(t) = v_1 \cos\omega t + v_2 \sin\omega t$ (as in eq. 4.1) and let $F = (v_1\; v_2)$. We can then write

  x(t) = F \begin{pmatrix} \cos\omega t \\ \sin\omega t \end{pmatrix}

and a time-shift can now be written

  x(t+\Delta t) = F \begin{pmatrix} \cos\omega(t+\Delta t) \\ \sin\omega(t+\Delta t) \end{pmatrix}
                = F \begin{pmatrix} \cos\omega\Delta t & -\sin\omega\Delta t \\ \sin\omega\Delta t & \cos\omega\Delta t \end{pmatrix} \begin{pmatrix} \cos\omega t \\ \sin\omega t \end{pmatrix}
                = F \begin{pmatrix} \cos\omega\Delta t & -\sin\omega\Delta t \\ \sin\omega\Delta t & \cos\omega\Delta t \end{pmatrix} (F^T F)^{-1} F^T F \begin{pmatrix} \cos\omega t \\ \sin\omega t \end{pmatrix}
                = A F \begin{pmatrix} \cos\omega t \\ \sin\omega t \end{pmatrix} = A x(t)    (4.4)

where

  A = F \begin{pmatrix} \cos\omega\Delta t & -\sin\omega\Delta t \\ \sin\omega\Delta t & \cos\omega\Delta t \end{pmatrix} (F^T F)^{-1} F^T

I have here assumed that $(F^T F)^{-1}$ exists, i.e. $v_1 \nparallel v_2$. If however $v_1 \parallel v_2$ we cannot find an equivalent transformation for every time-shift. But this assumption is not a very restrictive one, because even the slightest noise at this frequency will probably make it true.
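Eq. 4.4 can be verified numerically: with random (generically non-parallel) $v_1$, $v_2$, the matrix $A = F R(\omega\Delta t)(F^T F)^{-1} F^T$ reproduces the time-shift exactly (a small numpy sketch; the sample values are arbitrary choices of my own):

```python
import numpy as np

rng = np.random.default_rng(5)
m = 4
v1, v2 = rng.standard_normal(m), rng.standard_normal(m)  # generically not parallel
F = np.column_stack([v1, v2])
w, dt = 2.0, 0.37                                        # frequency and time-shift

def x(t):
    return F @ np.array([np.cos(w * t), np.sin(w * t)])

R = np.array([[np.cos(w * dt), -np.sin(w * dt)],
              [np.sin(w * dt),  np.cos(w * dt)]])
A = F @ R @ np.linalg.solve(F.T @ F, F.T)                # eq. 4.4

# x(t + dt) should equal A x(t) for every t
err = max(np.linalg.norm(x(s + dt) - A @ x(s)) for s in np.linspace(0.0, 3.0, 50))
```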

Example 4.5
If $v_1 = e_k$ and $v_2 = e_{k+1}$, where $e_k$ and $e_{k+1}$ are the $k$:th resp. $(k+1)$:th basis vectors in the unit basis (i.e. 1 in the $k$:th resp. $(k+1)$:th position and 0 elsewhere), we get

  A = \begin{pmatrix}
  0 & \cdots & 0 & 0 & \cdots & 0 \\
  \vdots & \ddots & \vdots & \vdots & & \vdots \\
  0 & \cdots & \cos\omega\Delta t & -\sin\omega\Delta t & \cdots & 0 \\
  0 & \cdots & \sin\omega\Delta t & \cos\omega\Delta t & \cdots & 0 \\
  \vdots & & \vdots & \vdots & \ddots & \vdots \\
  0 & \cdots & 0 & 0 & \cdots & 0
  \end{pmatrix}    (4.5)

which is just a rotation in the subspace spanned by $v_1$ and $v_2$. (The non-zero elements are in the $k$:th and $(k+1)$:th rows and columns.)

Many frequencies

Suppose the frequencies are spatially independent. Then they each have a dimension or two of their own. As above, a time-shift can be written as a linear transformation for each frequency (if each subspace is two-dimensional). Intuitively it should then be possible to find a linear transformation for the whole signal space that can be divided into sub-transformations, each one affecting only one of the frequency subspaces. If true, this would mean that we can still find an equivalent linear transformation for any time-shift. Let us confirm the intuition mathematically:

Theorem 4.1
Assume $N$ spatially independent frequencies and two-dimensional frequency subspaces. Then for each $\Delta t$ there is a linear transformation matrix $A_{\Delta t}$ so that

  x(t+\Delta t) = A_{\Delta t}\, x(t)    (4.6)

Proof: see appendix A.

4.3 Discussion

What use can we have of this theory?

It shows that there is a difference between multidimensional signals and one-dimensional signals (surprise?). If we wanted to find the time-shift between two one-dimensional signals it would have been enough to look at the lowest frequency component (if the noise is low at that frequency). That is not enough with multidimensional signals. Filtering independent frequencies does not have any effect at all - the filter is then equivalent to an affine transformation. I will not prove this, but it is pretty easy to imagine: if the frequencies lie in separate dimensions it is possible to design a transformation that scales each frequency separately, just as a filter would. And if we filter dependent frequencies we only reduce the dependence needed to distinguish a time-shift from a transformation. If we want to find the time-shift between two signals we have to make sure that we keep some spatially dependent frequencies, because that is where the information is hiding.

Also note that if the dimensionality of the signal is high, we are likely to need more frequencies than if the dimensionality were low, because the frequencies are more spread out, and thus less dependent, in a high-dimensional space.

So, is all the work in this chapter for nothing? Maybe not. Note that the idea behind LP-filtering is to emphasize the influence of the lower frequencies. This idea can be used in another way.

Remember the projection vectors $w_x$ and $w_y$ in the canonical correlation formula. Let us say we take an LP-filter that filters out everything but the lowest frequency component. Then we have also reduced the signal to a two-dimensional subspace. When we now maximize the canonical correlation it is only interesting to move the vectors $w_x$ and $w_y$ in this subspace. But at the same time as we filtered, we also lost the information that was lying in the dependence between the frequencies, so this method did not work.

But if we could limit the vectors $w_x$ and $w_y$ to only move in a suitable subspace without erasing the frequencies, we might be more successful. For instance, we know where the lowest frequency subspace is (with the help of the Fourier transform). Then it might be enough to maximize the canonical correlation in that subspace (in our search for the time-shift $\Delta t$).

This idea might become clearer if it is explained mathematically:

We have two signals $x(t)$ and $y(t)$. The lowest frequency component can be found by Fourier transformation. Let $c_1 = \mathcal{F}\{x(t)\}_1$ and $d_1 = \mathcal{F}\{y(t)\}_1$. To limit the motion of $w_x$ and $w_y$ to the lowest frequency subspaces we let

  w_x = \alpha_1 \Re\{c_1\} + \alpha_2 \Im\{c_1\} = \begin{pmatrix} \Re\{c_1\} & \Im\{c_1\} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = B_x \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}

  w_y = \beta_1 \Re\{d_1\} + \beta_2 \Im\{d_1\} = \begin{pmatrix} \Re\{d_1\} & \Im\{d_1\} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = B_y \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}

where

  B_x = \begin{pmatrix} \Re\{c_1\} & \Im\{c_1\} \end{pmatrix}, \qquad B_y = \begin{pmatrix} \Re\{d_1\} & \Im\{d_1\} \end{pmatrix}

and $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ are real scalars.

Insert this in the canonical correlation formula:

  \rho = \frac{(\alpha_1\;\alpha_2)\, B_x^T C_{x_s y} B_y \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}
              {\sqrt{(\alpha_1\;\alpha_2)\, B_x^T C_{xx} B_x \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}\;
                     (\beta_1\;\beta_2)\, B_y^T C_{yy} B_y \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}}
       = \frac{(\alpha_1\;\alpha_2)\, C_{x_P^s y_P} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}
              {\sqrt{(\alpha_1\;\alpha_2)\, C_{x_P x_P} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}\;
                     (\beta_1\;\beta_2)\, C_{y_P y_P} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}}    (4.7)

where

  x_P(t) = B_x^T x(t), \qquad y_P(t) = B_y^T y(t), \qquad x_P^s(t) = B_x^T x(t+s)

(P stands for Projection.) This is just the canonical correlation between $x_P$ and $y_P$ (without any restriction on the projection vectors). Note that $x_P$ and $y_P$ are the projections of $x$ and $y$ onto the subspace defined by the lowest frequency component. As when filtering, we have reduced the signal dimensionality, but this time we did not lose any frequencies except the ones that are spatially orthogonal to the subspace. The signals $x_P$ and $y_P$ are here two-dimensional. But we do not have to restrict ourselves to the subspace of the lowest frequency component only. We can for instance choose the two lowest frequency components and get four-dimensional signals $x_P$ and $y_P$, etc. Of course, the more frequency components we include, the higher the dimensionality of $x_P$ and $y_P$. This increases the complexity of the calculations needed to find $\Delta t$. Hence it is best to use as few components as possible.

Now that we have reduced the dimensionality, and thus the calculation complexity, we might use exhaustive search to find $\Delta t$. Furthermore, it is almost as efficient to analytically solve the eigenvalue system (eq. 2.3 or eq. 2.4 in ch. 2) to find $\rho$ as it is to use the gradient search algorithm.

Let us try this theory on the example with 6D-signals in sec. 4.1.2:

Let us try this theory on the example with 6D-signals in sec. 4.1.2:

1. First find $\mathbf{B}_x$ and $\mathbf{B}_y$ with the help of the Fourier transform. Then calculate $\mathbf{x}_P(t)$ and $\mathbf{y}_P(t)$.

2. Calculate $\mathbf{C}_{x_{Ps}y_P}$ for every possible time-shift $s$. This can be done efficiently by convolution on each component in $\mathbf{C}_{x_{Ps}y_P}$ separately. Then calculate $\mathbf{M}_{x_{Ps}} = \mathbf{C}_{x_P x_P}^{-1}\mathbf{C}_{x_{Ps}y_P}\mathbf{C}_{y_P y_P}^{-1}\mathbf{C}_{y_P x_{Ps}}$ (for each time-shift $s$).

3. Take the square root of the largest eigenvalue of $\mathbf{M}_{x_{Ps}}$ (i.e. the maximum canonical correlation) for every time-shift (an alternative could be $\mathrm{trace}(\mathbf{M}_{x_{Ps}})$). The result is shown in fig. 4.15.

[Figure 4.15: $\max\sqrt{\mathrm{eig}(\mathbf{M}_{x_{Ps}})}$ vs. time-shift $s$]

4. With $\Delta t$ known, we go back to the signals $\mathbf{x}$ and $\mathbf{y}$ and calculate the canonical correlation between them.
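A compact numpy sketch of steps 2-3 (my own naming; for clarity the covariances are computed directly for every shift rather than with FFT-based convolutions):

```python
import numpy as np

def max_canncorr_per_shift(xP, yP):
    """xP, yP: (2, N) zero-mean projected signals (one period).
    For every cyclic time-shift s, returns the square root of the largest
    eigenvalue of M_xPs = Cxx^-1 C_xsy Cyy^-1 C_yxs."""
    N = xP.shape[1]
    Cxx = xP @ xP.T / N
    Cyy = yP @ yP.T / N
    out = np.empty(N)
    for s in range(N):
        Cxy = np.roll(xP, -s, axis=1) @ yP.T / N   # uses xP(t + s)
        M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
        out[s] = np.sqrt(max(np.linalg.eigvals(M).real.max(), 0.0))
    return out

# example: a two-frequency 2D signal, transformed and shifted by 5 samples
N = 100
t = 2 * np.pi * np.arange(N) / N
xP = np.vstack([np.cos(t) + 0.3 * np.cos(2 * t + 0.5),
                np.sin(t) + 0.3 * np.sin(2 * t)])
yP = np.array([[2.0, 0.5], [-0.3, 1.0]]) @ np.roll(xP, -5, axis=1)
r = max_canncorr_per_shift(xP, yP)              # peaks at s = 5
```

With two frequencies present the maximum is unambiguous; with a single frequency every shift is itself a linear transformation and the curve would be flat at one.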

This method has one catch. As can be seen in fig. 4.15 we got $\rho < 1$ at the right time-shift even though there was no noise. Why did we not get $\rho = 1$? This is due to the projection. A projection is not invariant to linear transformation: even if $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+\Delta t)$ it is not certain that there is a matrix $\mathbf{A}_P$ so that $\mathbf{y}_P(t) = \mathbf{A}_P\mathbf{x}_P(t+\Delta t)$. So, we cannot be sure to get the right time-shift with this method. For example, it could fail if the transformation $\mathbf{A}$ is an extreme scaling in some dimensions but not in others.

And if there is noise present we can be even less sure to get exactly the right time-shift. We might have to perform an exhaustive search in a neighbourhood of the calculated time-shift. Exactly how to proceed depends on the application; the method presented here is just a suggestion.

Comment: The theory can also explain why canonical correlation can give rise to false global optima. Take for example the 'eight' in example 4.4 (fig. 4.14). A calculation of the canonical correlation between the eight and a transformation of the eight (no time-shift) gives the curve in fig. 4.16. There are four global maxima. The reason is that the two frequencies present are spatially independent, so the canonical correlation can look at one frequency at a time. The lowest frequency then contributes two global maxima and the other frequency four global maxima (some of the maxima have the same time-shift). This shows that it is not always enough to look at only one direction. For some signals we may have to go on with the next directions in the canonical correlation algorithm (see ch. 2) and somehow combine their results.

[Figure 4.16: $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ vs. time-shift]

LP-filtering canonical correlation

Take two one-dimensional periodic signals and calculate the correlation between them for every possible time-shift. A typical curve could look like the one in figure 3.1 on page 12. As said before, it would be nice to get rid of all local optima. Instead of LP-filtering the signals as we did in chapter 4, we could try to LP-filter the correlation itself (seen as a function of time-shift) more directly. The LP-filtered correlation could be used to locate the right time-shift; then we can get an equality measure by calculating $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ at this time-shift.

5.1 An idea

Let us begin with the correlation of one-dimensional signals. Assume we have two one-dimensional signals $x(t)$ and $y(t)$ with period $T = 2\pi/\omega_1$ and zero mean. Let $x_s(t) = x(t+s)$ and take the correlation between $x_s$ and $y$ (note that $E\{x_s x_s\} = E\{xx\}$):

$$\rho(s) = \frac{E\{x_s y\}}{\sqrt{E\{xx\}E\{yy\}}} = \frac{1}{\sigma_x\sigma_y}\frac{1}{T}\int_0^T x(t+s)y(t)\,dt = \frac{1}{\sigma_x\sigma_y}\frac{1}{T}\int_s^{T+s} x(u)y(u-s)\,du = \frac{1}{\sigma_x\sigma_y}\frac{1}{T}\int_0^T x(u)y(u-s)\,du \qquad (5.1)$$

Since the integral is the convolution between $x(t)$ and $y(-t)$, we get the Fourier series coefficients of $\rho$ as

$$P_k = \frac{1}{\sigma_x\sigma_y}c_k d_k^* \qquad (5.2)$$

where $c_k$ and $d_k$ are the Fourier series coefficients of $x(t)$ and $y(t)$ respectively.

Now LP-filter $\rho(s)$ with a filter that has the Fourier coefficients $F_k$, giving a filtered correlation $\rho_F(s)$ with the Fourier coefficients $P_{Fk} = P_k F_k$. If we assume real and positive coefficients $F_k$ we can write $P_{Fk}$ as

$$P_{Fk} = P_k F_k = \frac{1}{\sigma_x\sigma_y}c_k d_k^* F_k = \frac{1}{\sigma_x\sigma_y}\big(c_k\sqrt{F_k}\big)\big(d_k\sqrt{F_k}\big)^*$$

and we therefore get

$$\rho_F(s) = \frac{E\{x_{Fs} y_F\}}{\sqrt{E\{xx\}E\{yy\}}} \qquad (5.3)$$

where $x_F$ and $y_F$ are $x$ and $y$ filtered with a filter that has the Fourier coefficients $\sqrt{F_k}$. In other words, filtering $\rho$ (seen as a function of the time-shift $s$) is the same as using filtered signals in the numerator and non-filtered signals in the denominator of the correlation formula.
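Eq. 5.3 is easy to check numerically. The sketch below (my own construction) filters $\rho(s)$ with a $1/\omega^2$-type filter $F_k$ and compares with the correlation of signals filtered with $\sqrt{F_k}$; the two curves agree to machine precision.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 128
x = rng.standard_normal(N); x -= x.mean()
y = rng.standard_normal(N); y -= y.mean()
X, Y = np.fft.fft(x), np.fft.fft(y)
denom = N * x.std() * y.std()

# rho(s), eq. 5.1, via cyclic correlation computed in the Fourier domain
rho = np.fft.ifft(X * np.conj(Y)).real / denom

# a real, nonnegative filter F_k ~ 1/omega^2 (DC removed)
k = np.fft.fftfreq(N, d=1.0 / N)                # integer frequencies
Fk = 1.0 / np.maximum(np.abs(k), 1.0) ** 2
Fk[0] = 0.0

# left: filter rho directly; right: filter the signals with sqrt(F_k)
rho_F = np.fft.ifft(np.fft.fft(rho) * Fk).real
xF = np.fft.ifft(X * np.sqrt(Fk)).real
yF = np.fft.ifft(Y * np.sqrt(Fk)).real
rho_F2 = np.fft.ifft(np.fft.fft(xF) * np.conj(np.fft.fft(yF))).real / denom
```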

This is not so strange for one-dimensional signals. Only the numerator affects the maximum of $\rho$ seen as a function of $s$, so formula 5.3 is not so hard to understand. But is it possible that this works for multidimensional signals as well? That is not intuitively easy to answer: now the denominator also influences the maximum. If the projection directions $\mathbf{w}_x$ and $\mathbf{w}_y$ were fixed we could proceed as above and use filtered signals in the numerator. But for every fixed time-shift $s$ there will be different directions that give the maximum correlation!

And it is not safe to pick fixed directions at random. If we are unlucky we could choose directions that are uncorrelated at the time-shift where the maximum is. For instance, take the case of correlating two 'eights' like the one in example 4.4 on page 19: it would not be useful to choose $\mathbf{w}_x$ horizontal and $\mathbf{w}_y$ vertical.

All this complicates things a bit ... We can always try an experiment.

5.2 Experiments

Take the same 6D-signals as in section 4.1.2 (fig. 4.5). Use an LP-filter of type $1/\omega$ and filter the signals (which equals filtering $\rho$ with a filter of type $1/\omega^2$). The filtered signals $\mathbf{x}_F(t)$ and $\mathbf{y}_F(t)$ are shown in fig. 4.7 on page 16. Now we do as before: for each time-shift, calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho_F)$ where

$$\rho_F = \frac{\mathbf{w}_x^T\mathbf{C}_{x_{Fs}y_F}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x\,\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}}$$

(The gradient search in chapter 2 can still be used with a minor alteration.) The result is shown in fig. 5.1.

The maximum is found in the right place (time-shift 16 samples). We also get one false optimum, but that is not a problem: we can test both maxima and see which one is the true one. If we always got a function like the one in fig. 5.1 we would not have to calculate $\max(\rho_F)$ for every time-shift $s$; it would be enough to evaluate a few points and from there deduce where the maximum is. With the time-shift known we could then go back and calculate the canonical correlation with the non-filtered signals in the numerator. This method obviously works in this case. But one example does not make a theory.

[Figure 5.1: $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho_F)$ vs. time-shift]

5.3 An ounce of theory #2

5.3.1 Covariance and signal frequencies

I start by showing that the covariance matrix between two periodic signals can be expressed with the aid of the signal frequencies. Assume we have two signals $\mathbf{x}(t) \in \mathbb{R}^m$ and $\mathbf{y}(t) \in \mathbb{R}^p$, both with period $T = 2\pi/\omega_1$ and zero mean. Their Fourier series expansions are

$$\mathbf{x}(t) = \sum_{k=-\infty}^{\infty}\mathbf{c}_k e^{ik\omega_1 t} \qquad (5.4)$$

$$\mathbf{y}(t) = \sum_{l=-\infty}^{\infty}\mathbf{d}_l e^{il\omega_1 t} \qquad (5.5)$$

By covariance matrix I here mean the integral

$$\mathbf{C}_{xy} = E\{\mathbf{x}\mathbf{y}^T\} = \frac{1}{T}\int_0^T\mathbf{x}(t)\mathbf{y}^T(t)\,dt \qquad (5.6)$$

If we insert eq. 5.4 and eq. 5.5 in eq. 5.6 we get

$$\mathbf{C}_{xy} = \frac{1}{T}\int_0^T\sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}\mathbf{c}_k\mathbf{d}_l^{*T}e^{ik\omega_1 t}e^{-il\omega_1 t}\,dt = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}\mathbf{c}_k\mathbf{d}_l^{*T}\,\frac{1}{T}\int_0^T e^{i(k-l)\omega_1 t}\,dt = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}\mathbf{c}_k\mathbf{d}_l^{*T}\,\delta[k-l] = \sum_{k=-\infty}^{\infty}\mathbf{c}_k\mathbf{d}_k^{*T} \qquad (5.7)$$

By setting $\mathbf{y} = \mathbf{x}$ we get

$$\mathbf{C}_{xx} = \sum_{k=-\infty}^{\infty}\mathbf{c}_k\mathbf{c}_k^{*T} \qquad (5.8)$$

and similarly for $\mathbf{C}_{yy}$.
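Eq. 5.7 has a direct discrete analogue that is easy to verify with the DFT (my own sketch; the coefficients $\mathbf{c}_k = \mathrm{FFT}(\mathbf{x})_k/N$ play the role of the Fourier series coefficients):

```python
import numpy as np

rng = np.random.default_rng(2)
N, m, p = 64, 3, 2
x = rng.standard_normal((m, N)); x -= x.mean(axis=1, keepdims=True)
y = rng.standard_normal((p, N)); y -= y.mean(axis=1, keepdims=True)

# time-domain covariance, eq. 5.6 (discretized)
Cxy = x @ y.T / N

# frequency-domain sum of outer products c_k d_k^{*T}, eq. 5.7
c = np.fft.fft(x, axis=1) / N
d = np.fft.fft(y, axis=1) / N
Cxy_freq = sum(np.outer(c[:, k], np.conj(d[:, k]))
               for k in range(N)).real
```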

5.3.2 Filtered canonical correlation

Now we are ready to see what happens when we filter the canonical correlation. First we repeat the formula for $\rho_F$:

$$\rho_F(\mathbf{w}_x,\mathbf{w}_y,s) = \frac{\mathbf{w}_x^T\mathbf{C}_{x_{Fs}y_F}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x\,\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}} \qquad (5.9)$$

where $\mathbf{x}_{Fs}(t)$ and $\mathbf{y}_F(t)$ are the results of filtering $\mathbf{x}(t+s)$ and $\mathbf{y}(t)$.

Assume we have a filter that keeps only the lowest frequency component of the signals (i.e. $F_1 = 1$ and $F_k = 0$ for $k \neq \pm 1$). The filtered signals $\mathbf{x}_F(t)$ and $\mathbf{y}_F(t)$ then only have the Fourier coefficients corresponding to $k = \pm 1$. Eq. 5.7 thus gives (remember that $\mathcal{F}\{\mathbf{x}(t+s)\}_k = e^{ik\omega_1 s}\mathcal{F}\{\mathbf{x}(t)\}_k = e^{ik\omega_1 s}\mathbf{c}_k$ and $\mathbf{c}_{-k} = \mathrm{conj}(\mathbf{c}_k)$ for real functions):

$$\mathbf{C}_{x_{Fs}y_F} = \mathbf{c}_{-1}e^{-i\omega_1 s}\mathbf{d}_{-1}^{*T} + \mathbf{c}_1 e^{i\omega_1 s}\mathbf{d}_1^{*T} = \mathrm{conj}\big(\mathbf{c}_1 e^{i\omega_1 s}\mathbf{d}_1^{*T}\big) + \mathbf{c}_1 e^{i\omega_1 s}\mathbf{d}_1^{*T} = 2\,\Re\big\{\mathbf{c}_1 e^{i\omega_1 s}\mathbf{d}_1^{*T}\big\} =$$
$$= 2\,\Re\big\{(\Re\{\mathbf{c}_1\} + i\,\Im\{\mathbf{c}_1\})(\cos\omega_1 s + i\sin\omega_1 s)(\Re\{\mathbf{d}_1\}^T - i\,\Im\{\mathbf{d}_1\}^T)\big\} =$$
$$= 2\big[(\Re\{\mathbf{c}_1\}\cos\omega_1 s - \Im\{\mathbf{c}_1\}\sin\omega_1 s)\Re\{\mathbf{d}_1\}^T + (\Re\{\mathbf{c}_1\}\sin\omega_1 s + \Im\{\mathbf{c}_1\}\cos\omega_1 s)\Im\{\mathbf{d}_1\}^T\big] =$$
$$= 2\,\big(\Re\{\mathbf{c}_1\}\ \ \Im\{\mathbf{c}_1\}\big)\begin{pmatrix}\cos\omega_1 s & \sin\omega_1 s\\ -\sin\omega_1 s & \cos\omega_1 s\end{pmatrix}\begin{pmatrix}\Re\{\mathbf{d}_1\}^T\\ \Im\{\mathbf{d}_1\}^T\end{pmatrix} \qquad (5.10)$$

Insert eq. 5.10 in eq. 5.9 (the constant factor 2 is dropped from here on, since it does not affect the maximization):

$$\rho_F = \frac{\mathbf{w}_x^T\big(\Re\{\mathbf{c}_1\}\ \ \Im\{\mathbf{c}_1\}\big)}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x}}\begin{pmatrix}\cos\omega_1 s & \sin\omega_1 s\\ -\sin\omega_1 s & \cos\omega_1 s\end{pmatrix}\frac{\begin{pmatrix}\Re\{\mathbf{d}_1\}^T\\ \Im\{\mathbf{d}_1\}^T\end{pmatrix}\mathbf{w}_y}{\sqrt{\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}} = \mathbf{a}_x^T\begin{pmatrix}\cos\omega_1 s & \sin\omega_1 s\\ -\sin\omega_1 s & \cos\omega_1 s\end{pmatrix}\mathbf{a}_y \qquad (5.11)$$

where

$$\mathbf{a}_x = \frac{\begin{pmatrix}\Re\{\mathbf{c}_1\}^T\\ \Im\{\mathbf{c}_1\}^T\end{pmatrix}\mathbf{w}_x}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x}}, \qquad \mathbf{a}_y = \frac{\begin{pmatrix}\Re\{\mathbf{d}_1\}^T\\ \Im\{\mathbf{d}_1\}^T\end{pmatrix}\mathbf{w}_y}{\sqrt{\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}}$$

(30)

We know from Cauchys inequality that F  k

a

x kk

a

y k. Since

a

x and

a

y are two-dimensional and  cos!1s sin!1s ,sin! 1s cos!1s 

is a rotation matrix there is always

a solutionsfor every pair of vectors

a

x,

a

y so thatF =

k

a

x

kk

a

y

k.

Therefore if we want to nd argmaxw

x; w

x;s(F) we can proceed as follow:

1. Maximize k

a

x k,k

a

y k. This gives

w

x and

w

y.

2. Find the angle between

a

xand

a

y. This gives the time-shifts(two solutions)

Note that the procedure above gives two possible solutions to s (an angle

be-tween two vectors can be described either by or +). We have to calculate

maxwx;wy() for both solutions and take the largest one.
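Under the assumptions above, the two-step procedure can be sketched as follows (my own implementation, using the generalized eigensystem instead of the gradient search; `shift_candidates` returns the two solutions $s$ and $s + T/2$):

```python
import numpy as np

def top_gen_eigvec(A, B):
    """w maximizing (w^T A w) / (w^T B w): top generalized eigenvector."""
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    return vecs[:, np.argmax(vals.real)].real

def shift_candidates(x, y):
    """Steps 1-2 above with the lowest-frequency filter: maximize
    ||a_x||, ||a_y||, then read omega_1*s from the angle between
    a_x and a_y.  Returns the two candidate shifts (in samples)."""
    N = x.shape[1]
    c1 = np.fft.fft(x, axis=1)[:, 1] / N
    d1 = np.fft.fft(y, axis=1)[:, 1] / N
    Cxx, Cyy = x @ x.T / N, y @ y.T / N
    CxF = np.outer(c1.real, c1.real) + np.outer(c1.imag, c1.imag)
    CyF = np.outer(d1.real, d1.real) + np.outer(d1.imag, d1.imag)
    wx = top_gen_eigvec(CxF, Cxx)
    wy = top_gen_eigvec(CyF, Cyy)
    ax = np.array([c1.real @ wx, c1.imag @ wx])
    ay = np.array([d1.real @ wy, d1.imag @ wy])
    s = (np.arctan2(ay[1], ay[0]) - np.arctan2(ax[1], ax[0])) * N / (2 * np.pi)
    return s % N, (s + N / 2) % N               # two solutions (phi, phi + pi)

# noise-free check: y(t) = A x(t + 11), two frequencies in R^3
N = 64
t = 2 * np.pi * np.arange(N) / N
rng = np.random.default_rng(3)
a, b, e, f = rng.standard_normal((4, 3))
x = (np.outer(a, np.cos(t)) + np.outer(b, np.sin(t)) +
     np.outer(e, np.cos(2 * t)) + np.outer(f, np.sin(2 * t)))
y = rng.standard_normal((3, 3)) @ np.roll(x, -11, axis=1)
s1, s2 = shift_candidates(x, y)
```

Note that the second frequency is needed in the test signal only to make the generalized eigenvalues distinct, so that the maximizing directions are well defined.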

We have one thing left to discuss in the procedure above: how do we maximize $\|\mathbf{a}_x\|$ and $\|\mathbf{a}_y\|$? We have

$$\|\mathbf{a}_x\|^2 = \mathbf{a}_x^T\mathbf{a}_x = \frac{\mathbf{w}_x^T\big(\Re\{\mathbf{c}_1\}\Re\{\mathbf{c}_1\}^T + \Im\{\mathbf{c}_1\}\Im\{\mathbf{c}_1\}^T\big)\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x} = \frac{1}{2}\,\frac{\mathbf{w}_x^T\big(\mathbf{c}_{-1}\mathbf{c}_{-1}^{*T} + \mathbf{c}_1\mathbf{c}_1^{*T}\big)\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x} = \frac{1}{2}\,\frac{\mathbf{w}_x^T\mathbf{C}_{x_F x_F}\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x} \qquad (5.12)$$

$$\|\mathbf{a}_y\|^2 = \ldots = \frac{1}{2}\,\frac{\mathbf{w}_y^T\mathbf{C}_{y_F y_F}\mathbf{w}_y}{\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y} \qquad (5.13)$$

(again, the constant factor does not affect where the maximum is). This looks a lot like the canonical correlation formula, and we can in fact use the same type of gradient search algorithm to find $\max(\|\mathbf{a}_x\|)$ and $\max(\|\mathbf{a}_y\|)$ as we used to find $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ in chapter 2 (with $\mathbf{y} = \mathbf{x}$ and a minor adjustment due to the LP-filtered signals in the numerator):

$$\begin{cases}\mathbf{w}_x(t+1) = \mathbf{w}_x(t) + \Delta\mathbf{w}_x(t)\\[2pt] \mathbf{w}_y(t+1) = \mathbf{w}_y(t) + \Delta\mathbf{w}_y(t)\end{cases} \qquad (5.14)$$

where

$$\begin{cases}\Delta\mathbf{w}_x = \gamma_x\big(\mathbf{x}_F\mathbf{x}_F^T\hat{\mathbf{w}}_x - \mathbf{x}\mathbf{x}^T\mathbf{w}_x\big)\\[2pt] \Delta\mathbf{w}_y = \gamma_y\big(\mathbf{y}_F\mathbf{y}_F^T\hat{\mathbf{w}}_y - \mathbf{y}\mathbf{y}^T\mathbf{w}_y\big)\end{cases} \qquad (5.15)$$
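A deterministic toy run of the update rule (the expected-value form of eq. 5.15, with batch covariances in place of single samples; the step size $\gamma$ and the 2D test signal are my own choices). For this signal $\mathbf{C}_{xx} = \mathrm{diag}(0.5, 0.625)$ and $\mathbf{C}_{x_F x_F} = 0.25\,\mathbf{I}$, so the rule should converge to the first coordinate axis with Rayleigh quotient $0.25/0.5 = 0.5$:

```python
import numpy as np

# deterministic 2D example where the dynamics can be followed by hand:
# x(t) = [cos t; sin t + 0.5 sin 2t], filter = lowest frequency only
N = 256
t = 2 * np.pi * np.arange(N) / N
x = np.vstack([np.cos(t), np.sin(t) + 0.5 * np.sin(2 * t)])

Cxx = x @ x.T / N                               # = diag(0.5, 0.625)
c1 = np.fft.fft(x, axis=1)[:, 1] / N            # lowest-frequency coefficient
CxF = np.outer(c1.real, c1.real) + np.outer(c1.imag, c1.imag)  # = 0.25 I

# expected-value form of rule (5.15) with y = x:
# delta_w = gamma * (C_xFxF w_hat - C_xx w)
w, gamma = np.array([0.3, 0.7]), 0.1
for _ in range(3000):
    w = w + gamma * (CxF @ (w / np.linalg.norm(w)) - Cxx @ w)

quotient = (w @ CxF @ w) / (w @ Cxx @ w)        # approaches lambda_1 = 0.5
```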

5.3.3 One step further

What happens if the filter is broader (i.e. if we use more than one frequency)? Then $\mathbf{a}_x$ and $\mathbf{a}_y$ are no longer two-dimensional and the matrix between them in eq. 5.11 is more complex. It is hard to tell what happens then (and even harder if noise is present). We should still be able to use $\mathbf{a}_x$ and $\mathbf{a}_y$ though. The maxima of $\|\mathbf{a}_x\|$ and $\|\mathbf{a}_y\|$ are invariant to time-shifts and affine transformations. If there is no noise present, the maximum of $\|\mathbf{a}_x\|$ is characteristic for the signal (and dependent on the filter used) and independent of time-shift and affine transformation. An example to show what I mean:

Assume $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+\Delta t)$. Maximize

$$\|\mathbf{a}_x\|^2 = \frac{\mathbf{w}_x^T\mathbf{C}_{x_F x_F}\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x}, \qquad \|\mathbf{a}_y\|^2 = \frac{\mathbf{w}_y^T\mathbf{C}_{y_F y_F}\mathbf{w}_y}{\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y} = \frac{\mathbf{w}_y^T\mathbf{A}\mathbf{C}_{x_F x_F}\mathbf{A}^T\mathbf{w}_y}{\mathbf{w}_y^T\mathbf{A}\mathbf{C}_{xx}\mathbf{A}^T\mathbf{w}_y}$$

As mentioned in sec. 2.2, it can be shown that if the generalized eigensystem $\mathbf{C}_{x_F x_F}\mathbf{w} = \lambda_x\mathbf{C}_{xx}\mathbf{w}$ has distinct eigenvalues $\{\lambda_{xi}\}$ then there is only one maximum of $\|\mathbf{a}_x\|^2$. The same goes for $\|\mathbf{a}_y\|^2$. Maximizing the functions above should thus give two vectors, $\mathbf{w}_y$ and $\mathbf{w}_x = \mathbf{A}^T\mathbf{w}_y$. Then construct

$$x(t) = \mathbf{w}_x^T\mathbf{x}(t) = \mathbf{w}_y^T\mathbf{A}\mathbf{x}(t)$$
$$y(t) = \mathbf{w}_y^T\mathbf{y}(t) = \mathbf{w}_y^T\mathbf{A}\mathbf{x}(t+\Delta t) = x(t+\Delta t)$$

These two one-dimensional signals can help us detect the time-shift $\Delta t$.

If we know something about the noise we can choose a filter that attenuates frequencies with high noise. A new process could look something like this:

1. Find $\mathrm{argmax}(\|\mathbf{a}_x(\mathbf{w}_x)\|)$ and $\mathrm{argmax}(\|\mathbf{a}_y(\mathbf{w}_y)\|)$ using some suitable filter.

2. Create the one-dimensional signals $x(t) = \mathbf{w}_x^T\mathbf{x}(t)$ and $y(t) = \mathbf{w}_y^T\mathbf{y}(t)$. Use these signals to find a time-shift $s$.

3. With the time-shift found, calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ at that time-shift to get the similarity measure.
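The three-step process can be sketched end to end (my own code; a binary low-pass filter keeping the lowest frequencies stands in for the $1/\omega$-type filters discussed in the text):

```python
import numpy as np

def top_gen_eigvec(A, B):
    """w maximizing (w^T A w) / (w^T B w)."""
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    return vecs[:, np.argmax(vals.real)].real

def filtered_cov(x, keep):
    xF = np.fft.ifft(np.fft.fft(x, axis=1) * keep, axis=1).real
    return xF @ xF.T / x.shape[1]

def find_shift(x, y, n_freq):
    """Steps 1-2: maximize ||a_x||, ||a_y|| with an n_freq-wide low-pass
    filter, project to 1D and locate the shift by cyclic correlation."""
    N = x.shape[1]
    keep = np.zeros(N)
    keep[1:n_freq + 1] = keep[-n_freq:] = 1
    wx = top_gen_eigvec(filtered_cov(x, keep), x @ x.T / N)
    wy = top_gen_eigvec(filtered_cov(y, keep), y @ y.T / N)
    xs, ys = wx @ x, wy @ y                     # one-dimensional signals
    r = np.fft.ifft(np.fft.fft(xs) * np.conj(np.fft.fft(ys))).real
    return int(np.argmax(np.abs(r)))            # candidate time-shift

# noise-free test: y(t) = A x(t + 21), six frequencies, filter keeps three
rng = np.random.default_rng(8)
N = 128
t = 2 * np.pi * np.arange(N) / N
x = sum(np.outer(rng.standard_normal(4), np.cos(k * t)) +
        np.outer(rng.standard_normal(4), np.sin(k * t)) for k in range(1, 7))
y = rng.standard_normal((4, 4)) @ np.roll(x, -21, axis=1)
```

The signal carries more frequencies than the filter keeps; otherwise the filtered and unfiltered covariances coincide and the maximizing directions are not unique.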

5.4 Discussion

If there is noise present we cannot trust the process above completely. The second step, i.e. finding $s$, might be incorrect. But if the noise is not too high the process might find a time-shift near the right one. Step three can then be modified to test all time-shifts in a neighborhood of the calculated $s$, and we choose the time-shift that gives the highest value of $\rho$. This is still a lot cheaper than an exhaustive test of all time-shifts.

Let us try this new theory on the 6D-signals in section 4.1.2 (fig. 4.5). The filter used is of type $1/\omega^{0.2}$.

1. Filter $\mathbf{x}$ and $\mathbf{y}$. Maximize $\|\mathbf{a}_x\|$ and $\|\mathbf{a}_y\|$. This gives the vectors $\mathbf{w}_x$ and $\mathbf{w}_y$.

2. Create $x(t) = \mathbf{w}_x^T\mathbf{x}(t)$ and $y(t) = \mathbf{w}_y^T\mathbf{y}(t)$. Calculating the correlation between them for every time-shift gives the function in fig. 5.2; this can be done efficiently for one-dimensional signals. In the figure we can clearly see a global maximum at time-shift 16 samples.

[Figure 5.2: $|\mathrm{corr}(x(t+s), y(t))|$ vs. $s$]

3. Using $s = 16$ we can now calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho(16)) = 1$.

Like in the previous chapter, this is only a suggestion whose use should be investigated in each application. I have not thoroughly discussed the influence of noise, nor how wide a neighborhood should be tested in step 3 of the method above.

Signal norm

Finally I will discuss another method. Again the idea is to find the time-shift first and then calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ at that shift, but the method of finding the time-shift differs from the previous ones.

6.1 Experiments

Assume $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+\Delta t) + \mathbf{e}(t)$ where $\mathbf{e}(t)$ is a noise signal. Also assume that $\mathbf{e}(t)$ is uncorrelated with $\mathbf{A}\mathbf{x}(t+\Delta t)$ and that $E\{\mathbf{e}^T(t)\mathbf{e}(t)\} = \sigma_e$ (independent of time). If $\mathbf{A}$ is orthonormal, i.e. $\mathbf{A}^T\mathbf{A} = \mathbf{I}$, we can simply use the squared norms of the signals to find the time-shift. The squared norms are

$$u(t) = |\mathbf{x}(t)|^2 = \mathbf{x}^T(t)\mathbf{x}(t)$$
$$v(t) = |\mathbf{y}(t)|^2 = \mathbf{y}^T(t)\mathbf{y}(t)$$

and the expectation value of $v(t)$ is

$$E\{v(t)\} = E\{\mathbf{x}^T(t+\Delta t)\mathbf{A}^T\mathbf{A}\mathbf{x}(t+\Delta t) + 2\mathbf{e}^T(t)\mathbf{A}\mathbf{x}(t+\Delta t) + \mathbf{e}^T(t)\mathbf{e}(t)\} = u(t+\Delta t) + \sigma_e$$

If we look at the correlation between $u(t+s)$ and $v(t)$ for every possible time-shift $s$ (or the convolution between $u(t)$ and $v(-t)$), we will get a peak at the right time-shift $s = \Delta t$.
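This squared-norm approach is only a few lines of numpy (my own reconstruction of the setup: 5D LP-filtered noise, an orthonormal $\mathbf{A}$ and a 16-sample shift):

```python
import numpy as np

def shift_from_norms(x, y):
    """Estimate the time-shift from the squared signal norms."""
    u = np.sum(x ** 2, axis=0); u = u - u.mean()
    v = np.sum(y ** 2, axis=0); v = v - v.mean()
    r = np.fft.ifft(np.fft.fft(u) * np.conj(np.fft.fft(v))).real
    return int(np.argmax(np.abs(r)))            # argmax_s |corr(u(t+s), v(t))|

# experiment: orthonormal A, 5D LP-filtered noise, shift of 16 samples
rng = np.random.default_rng(4)
N = 128
X = np.fft.fft(rng.standard_normal((5, N)), axis=1)
mask = np.zeros(N); mask[1:7] = mask[-6:] = 1   # keep the 6 lowest frequencies
x = np.fft.ifft(X * mask, axis=1).real          # LP-filtered noise
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # orthonormal A
y = Q @ np.roll(x, -16, axis=1)                 # y(t) = Q x(t + 16)
```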

So far so good. But what will happen if $\mathbf{A}$ is not orthogonal (it might not even be square)? An experiment is a good start.

Generate a 5D-signal $\mathbf{x}(t)$ by LP-filtering noise. Also generate a random transformation matrix $\mathbf{A}$. The eigenvalues of $\mathbf{A}^T\mathbf{A}$ are

$$(\lambda_1,\ \lambda_2,\ \lambda_3,\ \lambda_4,\ \lambda_5) = (0.005,\ 0.0440,\ 0.2970,\ 0.5298,\ 1.2750)$$

These eigenvalues are of interest - see the theory in the next section. Note that $\mathbf{A}$ is far from orthonormal. Let $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+16)$ ($t$ is in samples). The squared norms of $\mathbf{x}$ and $\mathbf{y}$ (i.e. $u$ and $v$) are shown in fig. 6.1. The absolute value of the correlation between $u(t+s)$ and $v(t)$ as a function of $s$ is shown in fig. 6.2. The maximum value is found at time-shift 16.

[Figure 6.1: Squared norms of $\mathbf{x}$ (left) and $\mathbf{y}$ (right)]

[Figure 6.2: $|\mathrm{corr}(u(t+s), v(t))|$ vs. $s$]

It worked even though $\mathbf{A}$ was not orthonormal! Why? When does it not work?

6.2 An ounce of theory #3

Ignore the noise. If the noise and the signals are uncorrelated and the variance of the noise is constant in time, the noise only affects the mean of the signal norms.

Assume $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+\Delta t)$ where $\mathbf{x}(t) \in \mathbb{R}^m$ and $\mathbf{y}(t) \in \mathbb{R}^n$. The squared norms are

$$u(t) = \mathbf{x}^T(t)\mathbf{x}(t)$$
$$v(t) = \mathbf{x}^T(t+\Delta t)\mathbf{A}^T\mathbf{A}\mathbf{x}(t+\Delta t)$$

Since $\mathbf{A}^T\mathbf{A}$ is symmetric, there are matrices $\mathbf{U}$ (orthonormal) and $\mathbf{D} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ such that $\mathbf{A}^T\mathbf{A} = \mathbf{U}^T\mathbf{D}\mathbf{U}$ (the spectral theorem of linear algebra).

Let $\mathbf{z}(t) = (z_1(t), \ldots, z_m(t))^T = \mathbf{U}\mathbf{x}(t)$. We then have

$$u(t) = \mathbf{x}^T(t)\mathbf{x}(t) = \mathbf{x}^T(t)\mathbf{U}^T\mathbf{U}\mathbf{x}(t) = \mathbf{z}^T(t)\mathbf{z}(t)$$
$$v(t) = \mathbf{x}^T(t+\Delta t)\mathbf{U}^T\mathbf{D}\mathbf{U}\mathbf{x}(t+\Delta t) = \mathbf{z}^T(t+\Delta t)\mathbf{D}\mathbf{z}(t+\Delta t)$$

This can be rewritten as

$$u(t) = z_1^2(t) + z_2^2(t) + \ldots + z_m^2(t) = (1\ \ 1\ \ \ldots\ \ 1)\begin{pmatrix}z_1^2(t)\\ z_2^2(t)\\ \vdots\\ z_m^2(t)\end{pmatrix}$$

$$v(t) = \lambda_1 z_1^2(t+\Delta t) + \lambda_2 z_2^2(t+\Delta t) + \ldots + \lambda_m z_m^2(t+\Delta t) = (\lambda_1\ \ \lambda_2\ \ \ldots\ \ \lambda_m)\begin{pmatrix}z_1^2(t+\Delta t)\\ z_2^2(t+\Delta t)\\ \vdots\\ z_m^2(t+\Delta t)\end{pmatrix}$$

Hence $u$ and $v$ can be seen as two different projections of the signal $\mathbf{s}(t) = \big(z_1^2(t),\ z_2^2(t),\ \ldots,\ z_m^2(t)\big)^T$.
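The decomposition above can be checked numerically (my own sketch; note that numpy's `eigh` returns $\mathbf{A}^T\mathbf{A} = \mathbf{V}\,\mathrm{diag}(\lambda)\,\mathbf{V}^T$, so the thesis's $\mathbf{U}$ corresponds to $\mathbf{V}^T$):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, N, dt = 4, 6, 32, 7
x = rng.standard_normal((m, N))
A = rng.standard_normal((n, m))                 # need not be square
y = A @ np.roll(x, -dt, axis=1)                 # y(t) = A x(t + dt)

lam, V = np.linalg.eigh(A.T @ A)                # A^T A = V diag(lam) V^T
z = V.T @ x                                     # z(t) = U x(t), with U = V^T

u = np.sum(z ** 2, axis=0)                      # (1 ... 1) projection of z_i^2
v = lam @ np.roll(z, -dt, axis=1) ** 2          # (lam_1 ... lam_m) projection
```

Both projections reproduce the squared norms exactly: $u(t) = \|\mathbf{x}(t)\|^2$ and $v(t) = \|\mathbf{y}(t)\|^2$.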

So, if the transformation $\mathbf{A}$ is fairly isometric, i.e. the eigenvalues $\lambda_i$ are of the same order of magnitude, the two projections are similar and we should be able to determine the time-shift $\Delta t$ from $u(t)$ and $v(t)$.

But the matrix $\mathbf{A}$ in the experiment in the previous section was not isometric. Why did the method still work?

Assume for a moment that $\Delta t = 0$; this does not restrict the following discussion. A projection of a signal can be thought of as projecting the frequencies separately and then adding the results together. As said in chapter 4.2, each frequency is simply an ellipse in some 2D-subspace. A projection of a frequency gives a one-dimensional frequency with a phase that depends on the projection, so two different projections give two different phases. (Different time-shifts with equal projections also give different phases, but this is ignored here.) Hence, if there are two different projections, each frequency gives an incorrect contribution to the correlation above - they 'pull' the convolution peak away from the right time-shift. But if there are many frequencies and they are evenly spread in space the method might still work. Each frequency then indicates a slightly too high or too low time-shift, but when they are added together we see a peak near the right time-shift.

Also notice that all eigenvalues $\lambda_i$ are non-negative (because $\mathbf{A}^T\mathbf{A}$ is positive semidefinite). Therefore the projection vectors $(1\ 1\ \ldots\ 1)$ and $(\lambda_1\ \lambda_2\ \ldots\ \lambda_m)$ lie in the same 'multidimensional quadrant'. They will therefore never totally contradict each other, even if the eigenvalues differ in size.

Remember the example in sec. 6.1 and look at fig. 6.3. The left panel shows the convolution (absolute value) between the squared norms of the signals' lowest frequency; the peak is shifted a little to the left. Next the second lowest frequency is added and the squared norms are convolved again; now the global peak indicates a slightly too high time-shift. Finally, when the third lowest frequency is added, the peak is found at the right time-shift (16) and remains there when the rest of the frequencies are added.

[Figure 6.3: Absolute value of the convolution between the norms (with zero mean) of the signals: lowest frequency (left), two lowest frequencies (middle), three lowest frequencies (right)]

6.3 Discussion

A case where this method does not work is the example with the 6D-signals in chapter 4 (sec. 4.1.2). The correlation is shown in fig. 6.4. We have a local maximum at time-shift 16, but the global maximum is somewhere else. (It is not always certain that there even is a maximum at the right time-shift at all.) The global maximum is not much larger than the local ones, though, and that could be an indication that the method did not work.

[Figure 6.4: Absolute value of the correlation between the squared norms of the 6D-signals in sec. 4.1.2 as a function of time-shift]

The eigenvalues of $\mathbf{A}^T\mathbf{A}$ are here

$$(0.0030,\ 0.0100,\ 0.3696,\ 0.7575,\ 0.9036,\ 8.4344)$$

They are not of the same order of magnitude, and consequently the method is less likely to succeed.

The method discussed in this chapter is most reliable if the transformation matrix $\mathbf{A}$ is fairly isometric. If that is not the case it can still work if the signals are multifrequent and spread out in space. The motivation in the previous section is, however, rather 'ad hoc', and the method should be evaluated for each specific practical use. The method did not work in the example above, but on the other hand we did not get a distinguished global maximum, and this could serve as an indication of failure telling us to try something else.

The method is simpler than the ones described in the previous chapters and can be worth trying before anything more complicated.

Round-up

7.1 Summary and comparison

The idea throughout the whole thesis has been to first locate the time-shift $\Delta t$ and then find the maximum canonical correlation at that time-shift (or test all time-shifts in a neighborhood). In each of the previous three chapters I have presented a suggestion for how to approach the problem of finding the time-shift. They are summarized once more below, together with complexity estimates. Assume $\mathbf{x}(t)$ and $\mathbf{y}(t)$ are both $m$-dimensional and that we have $N$ sample points of each signal.

1. Project $\mathbf{x}$, $\mathbf{y}$ into suitable subspaces (chapter 4). Assume the subspaces to be $p$-dimensional (i.e. calculate $p/2$ Fourier coefficients).

[Block diagram: $\mathbf{x}(t), \mathbf{y}(t) \to$ Fourier transform $\to \{\mathbf{c}_k\}, \{\mathbf{d}_k\} \to$ projection $\to \mathbf{x}_P(t), \mathbf{y}_P(t) \to \mathrm{argmax}_s(\mathrm{trace}(\mathbf{M}_{x_{Ps}})) \to s$. Costs: $O(pmN)$, $O(pmN)$, $O(p^2 N\log N)$.]

Complexity: $O(pmN)$ (assuming $p$ small, so that $m > p\log N$).

2. LP-filtered canonical correlation (chapter 5).

[Block diagram: $\mathbf{x}(t) \to$ filter $\to \mathbf{x}_F \to \mathrm{argmax}_{\mathbf{w}_x}\big(\mathbf{w}_x^T\mathbf{C}_{x_F x_F}\mathbf{w}_x / \mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x\big) \to \mathbf{w}_x \to x(t) = \mathbf{w}_x^T\mathbf{x}(t)$, and similarly for $\mathbf{y}(t) \to y(t)$; then $\mathrm{argmax}_s|\mathrm{corr}(x_s, y)| \to s$. Costs: $O(mN\log N)$, $O(mN)$, $O(mN)$, $O(N\log N)$.]

Complexity: $O(mN\log N)$

3. Signal norms (chapter 6).

[Block diagram: $\mathbf{x}(t) \to \|\mathbf{x}(t)\|^2 \to u(t)$; $\mathbf{y}(t) \to \|\mathbf{y}(t)\|^2 \to v(t)$; $\mathrm{argmax}_s(|\mathrm{corr}(u_s, v)|) \to s$. Costs: $O(mN)$, $O(N\log N)$.]

Complexity: $O(mN)$

In the complexity estimates I have assumed a fairly high dimensionality $m$, so that $m > \log N$.

As a comparison, let us calculate the complexity of exhaustively testing all time-shifts as we did in chapter 3. Using the gradient search method for each time-shift gives $O(mN^2)$. Another approach is to first calculate the covariance matrix $\mathbf{C}_{x_s y}$ for each time-shift $s$. This can be done by correlating each element of the matrix separately with the aid of the FFT (cost: $O(m^2 N\log N)$). Then calculate $\mathbf{M}_{x_s}$, defined in chapter 3, for each time-shift $s$ (cost: $O(m^2 N)$). The time-shift can then be found by maximizing $\mathrm{trace}(\mathbf{M}_{x_s})$ with respect to $s$. Total complexity: $O(m^2 N\log N)$.

As mentioned before, methods 1 and 3 are not invariant to linear transformations. I still present them because they might be successful in certain applications, and they also reveal some of the problems we face when dealing with multidimensional signals. The second method is invariant to linear transformations, but each projection direction is found from one signal alone, without concern for the other signal. This makes the method more sensitive to noise than if we somehow had considered the mutual information between the signals, as we do in for instance canonical correlation (similar arguments can be made for the other two methods).

7.2 Future work

I have only occasionally mentioned the influence of noise on the suggested methods. This could be investigated more thoroughly, for instance what kind of filter should be used in method 2 in order to minimize the influence of noise. The test signals used in this thesis were all home-made, so a next step could be to try the methods in a practical application. This might reveal some new insights into the problems (like the ancient Greeks, I have only philosophized without testing my theories ...).

Before I end this thesis I briefly want to mention an idea that recently came up (suggested by my examiner) and that I have had no time to look deeper into. It is based on LP-filtered canonical correlation (chapter 5) and is an attempt to incorporate the time-shift into the gradient search method described in chapter 2. The idea is to maximize

$$\rho_F(\mathbf{w}_x,\mathbf{w}_y,s) = \frac{\mathbf{w}_x^T\mathbf{C}_{x_{Fs}y_F}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x\,\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}} \qquad (7.1)$$

with respect to $\mathbf{w}_x$, $\mathbf{w}_y$ and $s$, using a gradient search similar to the one described in chapter 2 (with the variable $s$ added to the search).

Step one of the idea is to start by letting $\mathbf{x}_F$ and $\mathbf{y}_F$ be the lowest frequency components of $\mathbf{x}$ and $\mathbf{y}$. Maximizing $\rho_F$ gives a certain solution $(\mathbf{w}_x^{(1)}, \mathbf{w}_y^{(1)}, s^{(1)})$. The next step is to add the second lowest frequency component of $\mathbf{x}$ and $\mathbf{y}$, i.e. to let $\mathbf{x}_F$ and $\mathbf{y}_F$ be the two lowest frequency components of $\mathbf{x}$ and $\mathbf{y}$. Now we use $(\mathbf{w}_x^{(1)}, \mathbf{w}_y^{(1)}, s^{(1)})$ as the starting point in a gradient search on the new $\rho_F$, and maximizing it gives a solution $(\mathbf{w}_x^{(2)}, \mathbf{w}_y^{(2)}, s^{(2)})$. We continue by adding more and more frequency components, so that in step $i$ we let $\mathbf{x}_F$ and $\mathbf{y}_F$ contain the $i$ lowest frequency components of $\mathbf{x}$ and $\mathbf{y}$ and maximize $\rho_F$ starting from the maximum point $(\mathbf{w}_x^{(i-1)}, \mathbf{w}_y^{(i-1)}, s^{(i-1)})$ found in step $i-1$.

The thought is that for certain signal spectra (maybe for instance spectra of type $1/\omega^2$) we can be sure that by starting at the point $(\mathbf{w}_x^{(i-1)}, \mathbf{w}_y^{(i-1)}, s^{(i-1)})$ in the gradient search in step $i$ we will end up in the global maximum of $\rho_F$ (i.e. $(\mathbf{w}_x^{(i-1)}, \mathbf{w}_y^{(i-1)}, s^{(i-1)})$ is in some sense closer to the global maximum than to any local maximum). If this holds for all $i$ we can be sure to finally end up in the global maximum of $\rho$ when letting $i \to \infty$.

I mention this idea because I find it interesting and because it (if it works) is a way to include the time-shift $s$ in the gradient search, which maybe originally ...
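A small numeric illustration of why this coarse-to-fine idea is attractive (my own sketch; `rho_F_max` evaluates the maximum of eq. 7.1 over $\mathbf{w}_x$, $\mathbf{w}_y$ by solving the corresponding eigensystem, and a binary filter keeping the $n$ lowest frequencies stands in for the LP-filter). With only the lowest frequency the maximum is ambiguous modulo half a period, while with all frequencies included it singles out the true shift:

```python
import numpy as np

def rho_F_max(x, y, n_freq, s):
    """max over w_x, w_y of rho_F in eq. 7.1 at shift s, with a binary
    filter keeping the n_freq lowest frequencies in the numerator."""
    N = x.shape[1]
    keep = np.zeros(N)
    keep[1:n_freq + 1] = keep[-n_freq:] = 1
    xF = np.fft.ifft(np.fft.fft(x, axis=1) * keep, axis=1).real
    yF = np.fft.ifft(np.fft.fft(y, axis=1) * keep, axis=1).real
    C = np.roll(xF, -s, axis=1) @ yF.T / N
    M = (np.linalg.solve(x @ x.T / N, C) @
         np.linalg.solve(y @ y.T / N, C.T))
    return np.sqrt(max(np.linalg.eigvals(M).real.max(), 0.0))

# y(t) = A x(t + 10), frequencies 1..4 in R^4
rng = np.random.default_rng(6)
N = 64
t = 2 * np.pi * np.arange(N) / N
x = sum(np.outer(rng.standard_normal(4), np.cos(k * t)) +
        np.outer(rng.standard_normal(4), np.sin(k * t)) for k in range(1, 5))
y = rng.standard_normal((4, 4)) @ np.roll(x, -10, axis=1)

r1 = np.array([rho_F_max(x, y, 1, s) for s in range(N)])   # coarsest level
r4 = np.array([rho_F_max(x, y, 4, s) for s in range(N)])   # all frequencies
```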

Proof of theorem 4.1

Assumptions: $N$ spatially independent frequencies and 2D frequency subspaces. Our mission is to show that for every time-shift $\Delta t$ there is an equivalent linear transformation $\mathbf{A}_{\Delta t}$ so that $\mathbf{x}(t+\Delta t) = \mathbf{A}_{\Delta t}\mathbf{x}(t)$.

Start with the Fourier series expansion of $\mathbf{x}(t) \in \mathbb{R}^m$ (assuming zero DC-component):

$$\mathbf{x}(t) = \sum_{k=-N}^{N}\mathbf{c}_k e^{ik\omega_1 t} = \sum_{k=1}^{N} 2\,\Re\{\mathbf{c}_k e^{ik\omega_1 t}\} = \sum_{k=1}^{N}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1 t\\ \sin k\omega_1 t\end{pmatrix} \qquad (A.1)$$

where $\mathbf{F}_k = 2\,\big(\Re\{\mathbf{c}_k\}\ \ {-\Im\{\mathbf{c}_k\}}\big)$. The number of terms in the sum is finite, since the frequencies are by assumption spatially independent and the signal space is of finite dimension $m$.

Let $\mathbf{F} = (\mathbf{F}_1\ \mathbf{F}_2\ \ldots\ \mathbf{F}_N)$. If $\mathbf{F}$ is an $m \times m$ matrix we can let the columns of $\mathbf{F}$ be a basis for $\mathbb{R}^m$; otherwise we fill it out with a set of vectors to form a basis of $\mathbb{R}^m$. Let the columns of the matrix $\mathbf{B}$ represent this basis, i.e.

$$\mathbf{B} = (\mathbf{F}\ \ \mathbf{G})$$

where the columns of $\mathbf{G}$ are the vectors needed to fill out $\mathbf{F}$ to a basis. We can transform this basis to the unit basis $\mathbf{E}$ by the transformation matrix $\mathbf{B}^{-1}$: $\mathbf{E} = \mathbf{B}^{-1}\mathbf{B}$. This means that

$$\mathbf{B}^{-1}\mathbf{F}_k = (\mathbf{e}_{2k-1}\ \ \mathbf{e}_{2k})$$

where $\mathbf{e}_{2k-1}$ and $\mathbf{e}_{2k}$ are the $(2k-1)$:th and $2k$:th unit vectors. In this basis a time-shift of the $k$:th frequency corresponds to a matrix $\mathbf{A}_k$ with a structure like the one in example 4.5 (eq. 4.5), i.e. a subspace rotation. $\mathbf{A}_k$ only affects the $k$:th frequency, and if we sum all the $\mathbf{A}_k$:s into one matrix $\mathbf{A}$ we get

$$\mathbf{A} = \begin{pmatrix}\mathbf{A}_1 & \cdots & \mathbf{0} & \mathbf{0}\\ \vdots & \ddots & \vdots & \vdots\\ \mathbf{0} & \cdots & \mathbf{A}_N & \mathbf{0}\\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{0}\end{pmatrix} = \begin{pmatrix}\cos\omega_1\Delta t & -\sin\omega_1\Delta t & & & &\\ \sin\omega_1\Delta t & \cos\omega_1\Delta t & & & &\\ & & \ddots & & &\\ & & & \cos N\omega_1\Delta t & -\sin N\omega_1\Delta t &\\ & & & \sin N\omega_1\Delta t & \cos N\omega_1\Delta t &\\ & & & & & \mathbf{0}\end{pmatrix} \qquad (A.2)$$

So finally a time-shift $\Delta t$ can be written as

$$\mathbf{x}(t+\Delta t) = \sum_{k=1}^{N}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1(t+\Delta t)\\ \sin k\omega_1(t+\Delta t)\end{pmatrix} = \mathbf{B}\sum_{k=1}^{N}\mathbf{B}^{-1}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1(t+\Delta t)\\ \sin k\omega_1(t+\Delta t)\end{pmatrix} = \mathbf{B}\sum_{k=1}^{N}(\mathbf{e}_{2k-1}\ \ \mathbf{e}_{2k})\begin{pmatrix}\cos k\omega_1(t+\Delta t)\\ \sin k\omega_1(t+\Delta t)\end{pmatrix} =$$
$$= \mathbf{B}\sum_{k=1}^{N}\mathbf{A}_k(\mathbf{e}_{2k-1}\ \ \mathbf{e}_{2k})\begin{pmatrix}\cos k\omega_1 t\\ \sin k\omega_1 t\end{pmatrix} = \mathbf{B}\sum_{k=1}^{N}\mathbf{A}_k\mathbf{B}^{-1}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1 t\\ \sin k\omega_1 t\end{pmatrix} = \mathbf{B}(\mathbf{A}_1 + \ldots + \mathbf{A}_N)\sum_{k=1}^{N}\mathbf{B}^{-1}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1 t\\ \sin k\omega_1 t\end{pmatrix} = \mathbf{B}\mathbf{A}\mathbf{B}^{-1}\mathbf{x}(t) \qquad (A.3)$$

and we have found the linear transformation $\mathbf{A}_{\Delta t} = \mathbf{B}\mathbf{A}\mathbf{B}^{-1}$.
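The proof can be checked numerically (my own sketch): build a signal with $N_f = 2$ spatially independent frequencies in $\mathbb{R}^5$, construct $\mathbf{B}$ and the block-rotation matrix $\mathbf{A}$, and verify that $\mathbf{B}\mathbf{A}\mathbf{B}^{-1}$ really shifts the signal.

```python
import numpy as np

m, Nf, Ns, dt = 5, 2, 64, 9
rng = np.random.default_rng(7)
t = 2 * np.pi * np.arange(Ns) / Ns
F = rng.standard_normal((m, 2 * Nf))            # F = (F_1 ... F_Nf)
x = sum(F[:, 2*k-2:2*k] @ np.vstack([np.cos(k * t), np.sin(k * t)])
        for k in range(1, Nf + 1))

# fill F out to a basis B of R^m
B = np.hstack([F, rng.standard_normal((m, m - 2 * Nf))])

# A: rotation by k*omega_1*dt in the k:th 2x2 block, zeros elsewhere (eq. A.2)
w1dt = 2 * np.pi * dt / Ns
A = np.zeros((m, m))
for k in range(1, Nf + 1):
    c, s = np.cos(k * w1dt), np.sin(k * w1dt)
    A[2*k-2:2*k, 2*k-2:2*k] = [[c, -s], [s, c]]

A_dt = B @ A @ np.linalg.inv(B)                 # the theorem's A_dt
```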

