Multidimensional signal recognition,
invariant to affine transformation and
time-shift, using canonical correlation
Thesis project done at Computer Vision Laboratory
Linkoping University
Sweden
by
Bjorn Johansson
Reg. nr: LiTH-ISY-EX-1825
Supervisor:
Magnus Borga
Examiner:
Hans Knutsson
Linkoping, October 6, 1997
Contents

1 Introduction
  1.1 Acknowledgements
  1.2 Notation
  1.3 Previous knowledge
  1.4 Outlines

2 Canonical correlation
  2.1 Definition and a short explanation
  2.2 The gradient-search algorithm

3 What this thesis is about

4 LP-filtering signals
  4.1 Experiments
    4.1.1 One-dimensional signals
    4.1.2 Multidimensional signals
  4.2 An ounce of theory #1
    4.2.1 Multidimensional frequencies
    4.2.2 Time shift and linear transformation
  4.3 Discussion

5 LP-filtering canonical correlation
  5.1 An idea
  5.2 Experiments
  5.3 An ounce of theory #2
    5.3.1 Covariance and signal frequencies
    5.3.2 Filtered canonical correlation
    5.3.3 One step further
  5.4 Discussion

6 Signal norm
  6.1 Experiments
  6.2 An ounce of theory #3

7 Round-up
  7.1 Summary and comparison
  7.2 Future work
Introduction
1.1 Acknowledgements
I would like to thank my supervisor, licentiate Magnus Borga, and my examiner, associate professor Hans Knutsson, for many inspiring discussions and ideas that have moved the work forward (I got no help from the literature; I could not find any concerning this topic). I would also like to thank Magnus Borga for all constructive criticism in proof-reading my thesis. Thank you also to the rest of the staff at the Computer Vision Laboratory for helping me with practical things during the work. Thanks also to Stefan Eriksson for taking on the job as opponent.
Finally I would like to thank the typesetting program LaTeX and the mathematical software MATLAB for being cooperative (most of the time).
1.2 Notation
Italics (x) are used for scalars.
Lowercase letters in boldface (x) are used for vectors.
Uppercase letters in boldface (X) are used for matrices.
Transpose is denoted by a 'T' (e.g. x^T).
Adjungation (transpose + conjugation) is denoted by a '*' (e.g. x^*).
'^' over a vector indicates unit length (e.g. |v̂| = 1).
ℜ{z} and ℑ{z} denote the real part resp. the imaginary part of the complex variable z.
1.3 Previous knowledge
In order to appreciate this thesis you need some knowledge of linear algebra (e.g. terms like subspace, span and eigenvalue), statistics (correlation, covariance matrix, etc.) and Fourier series.
1.4 Outlines
Chapter 2 describes the concept of canonical correlation. You need to know about this in order to understand the discussion that follows.

Chapter 3 introduces the problem that was to be solved.

Chapters 4, 5 and 6 discuss three different suggestions for how to approach the problem. Each chapter begins with a section of experiments as a motivation for the approach. Then follows some theory and mathematical manipulations to structure the thoughts. The last sections contain discussions and suggestions concerning the approach.

Finally, chapter 7 contains a summary and a comparative discussion of the approaches.
Canonical correlation
Below follows a short explanation of canonical correlation. I will also mention a method to find the maximum canonical correlation based on an iterative gradient search. This method has been developed at the Computer Vision Laboratory here in Linkoping. If you want to know more about canonical correlation and the gradient search method I strongly recommend the licentiate thesis by Magnus Borga [1].
2.1 Definition and a short explanation
Canonical correlation is a similarity measure between two multidimensional signals. The idea is to only look at one direction (projection) at a time. Canonical correlation also has an advantage compared to ordinary correlation analysis in being invariant to affine transformations.
Assume we have two multidimensional signals x ∈ X and y ∈ Y, where X and Y are multidimensional spaces, not necessarily of the same dimensionality. Let

\[ x = \hat{w}_x^T \mathbf{x} \qquad \text{and} \qquad y = \hat{w}_y^T \mathbf{y} \]

be the projections of x resp. y in the directions ŵ_x resp. ŵ_y. Assume E{x} = E{y} = 0. Then we can calculate the correlation between x and y as

\[ \rho = \frac{E\{xy\}}{\sqrt{E\{xx\}\,E\{yy\}}} = \frac{\hat{w}_x^T E\{\mathbf{x}\mathbf{y}^T\}\,\hat{w}_y}{\sqrt{\hat{w}_x^T E\{\mathbf{x}\mathbf{x}^T\}\,\hat{w}_x \;\; \hat{w}_y^T E\{\mathbf{y}\mathbf{y}^T\}\,\hat{w}_y}} = \frac{\hat{w}_x^T C_{xy}\,\hat{w}_y}{\sqrt{\hat{w}_x^T C_{xx}\,\hat{w}_x \;\; \hat{w}_y^T C_{yy}\,\hat{w}_y}} \tag{2.1} \]

By rotating the projection vectors w_x and w_y we can find the directions that give the maximal correlation.

Example 2.1
A simple example of a multidimensional (periodic) signal: an "eight", x(α) = (cos α, sin 2α) (see fig. 2.1). Fig. 2.2 shows the projection x(α) = w_i^T x(α) along the three directions w_1 = (1, 0), w_2 = (0.6, 0.8) and w_3 = (0, 1).
Figure 2.1: Example of a multidimensional signal: an "eight"
Figure 2.2: Projection of the eight-signal along the directions w_1 = (1, 0) (left), w_2 = (0.6, 0.8) (middle) and w_3 = (0, 1) (right)

Let y(α) = x(α) and calculate the canonical correlation ρ. Figure 2.3 shows ρ as a function of w_x and w_y. Because the w-vectors are two-dimensional I have chosen to represent them in the figure as the angle between them and the horizontal axis (the x1-axis, see fig. 2.1). For instance w = (1, 0) is represented as 0° and w = (0, 1) corresponds to 90°. The figure shows that we have a maximum ρ = 1 wherever w_x = w_y (and nowhere else).
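As a numerical illustration of eq. 2.1 and fig. 2.3, the following sketch (my own test code, not from the thesis; the sample count is arbitrary) scans projection directions by their angles for the eight-signal correlated with itself:

```python
import numpy as np

# Sketch: canonical correlation of the "eight" with itself, scanning the
# projection directions by their angles as in fig. 2.3.
alpha = np.linspace(0, 2 * np.pi, 400, endpoint=False)
X = np.stack([np.cos(alpha), np.sin(2 * alpha)])   # samples of x(alpha), 2 x N
Y = X.copy()                                       # y(alpha) = x(alpha)

N = X.shape[1]
Cxx, Cyy, Cxy = X @ X.T / N, Y @ Y.T / N, X @ Y.T / N

def rho(theta_x, theta_y):
    """Correlation (eq. 2.1) between projections at angles theta_x, theta_y."""
    wx = np.array([np.cos(theta_x), np.sin(theta_x)])
    wy = np.array([np.cos(theta_y), np.sin(theta_y)])
    return (wx @ Cxy @ wy) / np.sqrt((wx @ Cxx @ wx) * (wy @ Cyy @ wy))

print(round(rho(0.3, 0.3), 6))          # equal directions -> 1.0
print(abs(rho(0.0, np.pi / 2)) < 0.5)   # orthogonal directions decorrelate here
```

Whenever the two angles agree the projections coincide and ρ = 1, reproducing the ridge seen in fig. 2.3.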
The maximum can be found by looking at the derivatives of ρ with respect to ŵ_x and ŵ_y:

\[
\begin{cases} \dfrac{\partial\rho}{\partial \hat{w}_x} = 0 \\[6pt] \dfrac{\partial\rho}{\partial \hat{w}_y} = 0 \end{cases}
\;\Longleftrightarrow\;
\begin{cases} C_{xy}\hat{w}_y - \dfrac{\hat{w}_x^T C_{xy}\hat{w}_y}{\hat{w}_x^T C_{xx}\hat{w}_x}\, C_{xx}\hat{w}_x = 0 \\[8pt] C_{yx}\hat{w}_x - \dfrac{\hat{w}_y^T C_{yx}\hat{w}_x}{\hat{w}_y^T C_{yy}\hat{w}_y}\, C_{yy}\hat{w}_y = 0 \end{cases}
\;\Longleftrightarrow\;
\begin{cases} C_{xy}\hat{w}_y - \rho\sqrt{\dfrac{\hat{w}_y^T C_{yy}\hat{w}_y}{\hat{w}_x^T C_{xx}\hat{w}_x}}\, C_{xx}\hat{w}_x = 0 \\[8pt] C_{yx}\hat{w}_x - \rho\sqrt{\dfrac{\hat{w}_x^T C_{xx}\hat{w}_x}{\hat{w}_y^T C_{yy}\hat{w}_y}}\, C_{yy}\hat{w}_y = 0 \end{cases}
\tag{2.2}
\]

(For a proof, see [1].) This equation system can be rewritten as
Figure 2.3: ρ as a function of w_x and w_y (right). The left figure shows the same curve from above. The units of the w_x- and w_y-axes are the angle in degrees from the horizontal axis (the x1-axis).
\[ C_A \mathbf{w} = \rho\, C_B \mathbf{w} \tag{2.3} \]

where

\[ C_A = \begin{pmatrix} 0 & C_{xy} \\ C_{yx} & 0 \end{pmatrix}, \quad C_B = \begin{pmatrix} C_{xx} & 0 \\ 0 & C_{yy} \end{pmatrix}, \quad \mathbf{w} = \begin{pmatrix} \mu_x \hat{w}_x \\ \mu_y \hat{w}_y \end{pmatrix}, \quad \frac{\mu_x}{\mu_y} = \sqrt{\frac{\hat{w}_y^T C_{yy}\hat{w}_y}{\hat{w}_x^T C_{xx}\hat{w}_x}} \]

Eq. 2.3 is called a generalized eigenproblem. By solving it we find the directions ŵ_x and ŵ_y, and thereby also the projections ŵ_x^T x and ŵ_y^T y that give the highest correlation. The maximum correlation can be found as the largest eigenvalue (ρ_1).
The equation system 2.2 can also be rewritten in another way (this form will be used later on in the thesis):

\[ \begin{cases} M_x \hat{w}_x = \rho^2 \hat{w}_x \\ M_y \hat{w}_y = \rho^2 \hat{w}_y \end{cases} \tag{2.4} \]

where

\[ \begin{cases} M_x = C_{xx}^{-1} C_{xy} C_{yy}^{-1} C_{yx} \\ M_y = C_{yy}^{-1} C_{yx} C_{xx}^{-1} C_{xy} \end{cases} \]

If we only want to find the eigenvalues and not the eigenvectors, it is enough to solve one of the equation systems in 2.4. I have assumed that the inverses of the covariance matrices exist. This is true in all practical cases where noise is present.
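The following sketch (my own illustration, not code from the thesis; dimensions and data are arbitrary) shows eq. 2.4 in use: the maximal canonical correlation between a signal and an exact linear transformation of it comes out as 1.

```python
import numpy as np

# Sketch: maximal canonical correlation from eq. 2.4, M_x w = rho^2 w.
rng = np.random.default_rng(0)
N = 500
X = rng.standard_normal((3, N))                  # zero-mean test signal
A = rng.standard_normal((3, 3))                  # arbitrary linear transform
Y = A @ X                                        # y = Ax, so max rho should be 1

Cxx, Cyy, Cxy = X @ X.T / N, Y @ Y.T / N, X @ Y.T / N
Mx = np.linalg.inv(Cxx) @ Cxy @ np.linalg.inv(Cyy) @ Cxy.T

rho_max = np.sqrt(np.max(np.linalg.eigvals(Mx).real))
print(round(rho_max, 6))                         # -> 1.0
```

With y = Ax the matrix M_x reduces analytically to the identity, so all its eigenvalues are 1, illustrating the affine invariance discussed below.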
What is the interpretation of the other eigenvalues (ρ_2, ρ_3, ...)? These eigenvalues correspond to canonical correlations in other directions. Let us say we could somehow eliminate the information in the signals from the direction that corresponds to the largest correlation. Then the second largest eigenvalue and its corresponding eigenvector would give the maximal correlation, and so forth. Often it is not enough to look in one direction only. Note that it is possible to get ρ_1 = 1 in the first direction, indicating that the signals are equal, and get ρ_i ≪ 1 in the remaining directions, indicating that the signals only looked alike in one direction. So if we want to know the similarity between the signals in more than one direction we can make use of the other eigenvalues as well. We could for instance let

\[ (\rho_1^2 + \rho_2^2 + \dots + \rho_k^2)/k, \qquad \rho_1 \ge \rho_2 \ge \dots \tag{2.5} \]

for some suitable k be a measure of similarity between the signals.
As mentioned earlier, canonical correlation is invariant to affine transformations. Here follows an example of a special case of that invariance. The maximum of the canonical correlation between x(t) and x(t) gives ρ = 1, as it should. If we do the same thing with x(t) and y(t) = Ax(t), where A is a linear transformation, we get:

\[ \rho = \frac{w_x^T E\{\mathbf{x}\mathbf{y}^T\}\, w_y}{\sqrt{w_x^T E\{\mathbf{x}\mathbf{x}^T\}\, w_x \;\; w_y^T E\{\mathbf{y}\mathbf{y}^T\}\, w_y}} = \frac{w_x^T E\{\mathbf{x}\mathbf{x}^T A^T\}\, w_y}{\sqrt{w_x^T E\{\mathbf{x}\mathbf{x}^T\}\, w_x \;\; w_y^T E\{A\mathbf{x}\mathbf{x}^T A^T\}\, w_y}} = \frac{w_x^T C_{xx} A^T w_y}{\sqrt{w_x^T C_{xx} w_x \;\; w_y^T A C_{xx} A^T w_y}} \tag{2.6} \]

and we get ρ = 1 if we choose w_x = A^T w_y.
2.2 The gradient-search algorithm
The second derivative of ρ, i.e. the Hessian, shows that there are no local maxima of ρ except for the global one if all eigenvalues {ρ_i} are distinct (for a proof, see [1]). This is an important condition if we want to use a gradient search to find the maximum of ρ. I will now describe the iterative gradient search method for finding the maximal canonical correlation.
The update rule is

\[ \begin{cases} w_x(t+1) = w_x(t) + \Delta w_x(t) \\ w_y(t+1) = w_y(t) + \Delta w_y(t) \end{cases} \tag{2.7} \]

where

\[ \begin{cases} \Delta w_x = \gamma_x\,\mathbf{x}\,(\mathbf{y}^T\hat{w}_y - \mathbf{x}^T w_x) \\ \Delta w_y = \gamma_y\,\mathbf{y}\,(\mathbf{x}^T\hat{w}_x - \mathbf{y}^T w_y) \end{cases} \tag{2.8} \]

γ_x and γ_y are positive update factors.
Why does this update rule work? The expected change of w_x is

\[ E\{\Delta w_x\} = \gamma_x E\{\mathbf{x}\mathbf{y}^T\hat{w}_y - \mathbf{x}\mathbf{x}^T w_x\} = \gamma_x\big(C_{xy}\hat{w}_y - \|w_x\|\, C_{xx}\hat{w}_x\big) = \big[\text{identify with eq. 2.2}\big] = \gamma_x \alpha_x \frac{\partial\rho}{\partial \hat{w}_x} \tag{2.9} \]

where α_x is a positive constant and

\[ \|w_x\| = \frac{\hat{w}_x^T C_{xy}\hat{w}_y}{\hat{w}_x^T C_{xx}\hat{w}_x} \]

Similar calculations give

\[ E\{\Delta w_y\} = \gamma_y \alpha_y \frac{\partial\rho}{\partial \hat{w}_y}, \qquad \|w_y\| = \frac{\hat{w}_y^T C_{yx}\hat{w}_x}{\hat{w}_y^T C_{yy}\hat{w}_y} \tag{2.10} \]

This shows that the changes of w_x and w_y are on average in the same directions as the gradient of ρ, i.e. they move in a direction that increases ρ.
The canonical correlation can, after convergence, be calculated as

\[ \rho = \sqrt{\|w_x\|\,\|w_y\|} \tag{2.11} \]

To get the second largest eigenvalue, i.e. the second best directions, there is a way to subtract the information in the first direction and use a modified iterative gradient search. I will not go any further into this. Let us instead begin the work.
What this thesis is about
Let us say we have measured a multidimensional periodic signal y(t). It could for instance be an image sequence or an ECG signal. We may suspect that this signal comes from one of many in a set of prototype signals {x_i(t)}. How do we find the most likely one? If the signal y is measured by sensors (e.g. cameras) they will probably deform the signal (different angles and distances cause rotation, scaling etc.) and they will also introduce noise. On top of all this, the measured signal may be unsynchronized in time with the stored prototype signal. So all in all we could assume that the measured signal y is a transformed, time-shifted and noisy version of a prototype signal x_i:

\[ \mathbf{y}(t) = A\,\mathbf{x}_i(t + \Delta t) + \mathbf{e}(t) \tag{3.1} \]

where A is a linear transformation matrix, Δt is a time-shift (sometimes called time alignment) and e(t) is some sort of unwanted noise signal.

In reality the transformation does not have to be linear. A non-linear transformation is a bit more complicated and is not treated in this thesis. So from now on I will assume a linear transformation.
If we want to find the prototype signal x_i that is most similar to y we must first define what we mean by similarity. If we want to handle cases like the one discussed above we should have a similarity measure that is invariant to linear transformations and time-shifts. In this thesis two signals x(t) and y(t) are said to be equal if they can be written y(t) = Ax(t + Δt) for some A and Δt.

The canonical correlation algorithm described in chapter 2 can only handle affine transformations. That is not bad at all, but it would be even better if it could also handle time-shifts. The goal of this thesis is to find a way to detect (some form of) similarity between two signals x(t) and y(t) = Ax(t + Δt) regardless of which affine transformation A and time-shift Δt we have.

Let x_s(t) = x(t + s) and take the canonical correlation between x_s and y (note that C_{x_s x_s} = C_{xx}):

\[ \rho = \frac{w_x^T C_{x_s y}\, w_y}{\sqrt{w_x^T C_{xx} w_x \;\; w_y^T C_{yy} w_y}} \tag{3.2} \]

We can now think of ρ as a function of w_x, w_y and s. If we maximize ρ we get a similarity measure between x and y that is invariant to A and Δt. One way to find that maximum would be to use the algorithm described in chapter 2, i.e. maximize ρ with respect to w_x and w_y for each time-shift s, and then choose the maximal value among these.
Example 3.1
Take a look at figure 3.1. max_{w_x, w_y} ρ(s) is calculated for each time-shift s (with, for instance, the method described in ch. 2). The 2D-signals used are shown in figure 3.2. They have a period of 100 samples and one of them is shifted 16 samples (and transformed). This is detected in the graph by the global maximum.

Figure 3.1: max_{w_x, w_y} ρ(s) vs. time-shift s

Figure 3.2: x and y. The signals start where the interruption is.
This is not a very fast method. It is simply an exhaustive test of all possible time-shifts. If the dimensionality of the signal is high it will take time to calculate max_{w_x, w_y} ρ(s) for each time-shift (e.g. an image of 64×64 pixels has a dimensionality of 4096).
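The exhaustive search can be sketched as follows (my own test signals; the 100-sample period and the true shift of 16 samples mirror example 3.1, everything else is an arbitrary choice):

```python
import numpy as np

# Sketch of the exhaustive search in example 3.1. ASSUMPTIONS: my own 2-D
# band-limited test signals with period 100 and a true shift of 16 samples.
rng = np.random.default_rng(2)
T, dim, true_shift = 100, 2, 16
spec = np.fft.fft(rng.standard_normal((dim, T)), axis=1)
spec *= np.abs(np.fft.fftfreq(T)) < 0.1           # keep only low frequencies
X = np.real(np.fft.ifft(spec, axis=1))
A = rng.standard_normal((dim, dim))
Y = A @ np.roll(X, -true_shift, axis=1)           # y(t) = A x(t + 16)

def max_cancorr(U, V):
    # largest canonical correlation via the eigenvalues of M_x (eq. 2.4)
    Cuu, Cvv, n = np.cov(U), np.cov(V), U.shape[1]
    Uc, Vc = U - U.mean(1, keepdims=True), V - V.mean(1, keepdims=True)
    Cuv = Uc @ Vc.T / (n - 1)
    M = np.linalg.solve(Cuu, Cuv) @ np.linalg.solve(Cvv, Cuv.T)
    return np.sqrt(np.max(np.linalg.eigvals(M).real))

scores = [max_cancorr(np.roll(X, -s, axis=1), Y) for s in range(T)]
print(int(np.argmax(scores)))                     # -> 16
```

At the correct shift the relation y = Ax_s holds exactly, so the score reaches 1 there and stays strictly below 1 elsewhere, which is the global maximum seen in fig. 3.1.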
How about incorporating the variable s into the gradient search described in ch. 2? For a gradient search to be effective there can only be one optimum, the global one. Otherwise we can get stuck in a local optimum. Before we introduced the time-shift variable, ρ had only one optimum and a gradient search was effective (see ch. 2). Now there are lots of local optima. This can for instance be seen in fig. 3.1. So we should find some other way to calculate the maximum of ρ.
I should mention that sometimes it may not be necessary to try all time-shifts. In a report about ECG and vectorcardiographic loops [2] the signals are assumed to be reasonably well-synchronized in time. The method used there was to perform an exhaustive search over all time-shifts in an interval. (The similarity measure used was not canonical correlation but rather some matrix error function.) In such a case exhaustive search may be satisfactory. But as said before, the conditions can be more unpleasant.

In the next chapter I will discuss the perhaps most intuitive approach to finding the correct time-shift: LP-filtering of the signals.
LP-filtering signals
Take a second look at fig. 3.1 on page 12. If we want to incorporate the time-shift in a gradient search we have to get rid of the local maxima. How does one 'usually' smooth out rapidly varying curves and get rid of local maxima? One uses LP-filtering. In this chapter I will discuss LP-filtering of the signals. It turns out not to be so effective with multidimensional signals, but I still want to discuss it because it is so fundamental. Let us start with some experiments to get us motivated in finding out what is happening more theoretically.
4.1 Experiments
We begin with a successful example of correlating LP-filtered one-dimensional signals. After that we try the same thing with multidimensional signals.
4.1.1 One-dimensional signals
Look at the signals in fig. 4.1 (they have been generated by LP-filtering white noise and have zero mean). They are equal except for a time-shift of 16 samples and a scaling factor. To find the time-shift we can calculate the absolute value of the correlation for each possible time-shift, as we did in ch. 3 (the signal period is 100 samples). We would then get the curve in fig. 4.2. The global maximum at time-shift 16 is the one we are after.

If we filter the signals with an LP-filter of type 1/ω we get the signals in fig. 4.3. Calculating the new correlation gives the curve in fig. 4.4. Now there are only two maxima left: one global and one (almost as big) local. Explanation: after LP-filtering, the lowest frequency component is the single largest component affecting the correlation, so a time-shift of half the period from the true time-shift will be almost as good. But one false optimum is not a big deal if this method turns out to be effective with multidimensional signals as well.
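The 1-D experiment can be sketched as follows (my own LP-filtered white-noise signal; the circular correlation over all shifts is computed at once with an FFT identity):

```python
import numpy as np

# Sketch of the 1-D experiment in sec. 4.1.1. ASSUMPTIONS: my own LP-filtered
# white-noise signal with period 100; y(t) = -2 x(t + 16) as in fig. 4.1.
rng = np.random.default_rng(3)
T, shift = 100, 16
spec = np.fft.fft(rng.standard_normal(T)) * (np.abs(np.fft.fftfreq(T)) < 0.15)
x = np.real(np.fft.ifft(spec))
x -= x.mean()
y = -2 * np.roll(x, -shift)                      # y(t) = -2 x(t + 16)

def corr_all_shifts(x, y):
    # rho(s) = E{x(t+s) y(t)} / sqrt(E{xx} E{yy}) for all circular shifts s,
    # computed at once via the FFT cross-correlation identity
    num = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y).conj())) / len(x)
    return num / np.sqrt(np.mean(x * x) * np.mean(y * y))

rho = np.abs(corr_all_shifts(x, y))
print(int(np.argmax(rho)))                       # global maximum at s = 16
```

The scaling by −2 only flips the sign of the correlation, which is why the experiment looks at the absolute value.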
Figure 4.1: Signals x(t) (upper) and y(t) = −2x(t + 16) (lower)
Figure 4.2: Abs(correlation) vs. time-shift

Figure 4.3: LP-filtered x and y

Figure 4.4: Abs(correlation) of the LP-filtered signals vs. time-shift
4.1.2 Multidimensional signals
Now we do the same thing with two multidimensional signals. Start with two 6D-signals (generated by LP-filtering white noise), see fig. 4.5. One of the signals is a linear transformation plus a 16-sample time-shift of the other. A calculation of max_{w_x, w_y} ρ(s) for each possible time-shift is shown in fig. 4.6.

Figure 4.5: Signal components of x (left) and y (right)

Figure 4.6: max_{w_x, w_y} ρ(s) vs. time-shift
Next we low-pass filter with the same filter as before. The LP-signals are shown in fig. 4.7 and the correlation between them in fig. 4.8.

Figure 4.7: Signal components of filtered x (left) and y (right) with filter 1/ω

Figure 4.8: max_{w_x, w_y} ρ(s) vs. time-shift
As you can see, the number of local maxima has not been reduced by much. Instead the two signals have a high correlation at every time-shift. If we take an even more powerful LP-filter (type 1/ω²) we get the result in fig. 4.9 and fig. 4.10. Now there is (almost) full correlation at every time-shift! This is obviously not working. What is happening? By looking at fig. 4.9 you can see that all components look alike; they have lost their individuality. Intuitively this means that for each time-shift there is a high probability that at least one component in x matches one component in y. That is all it takes to get a high canonical correlation.
Figure 4.9: Signal components of filtered x (left) and y (right) with filter 1/ω²

Figure 4.10: max_{w_x, w_y} ρ(s) vs. time-shift
4.2 An ounce of theory #1
In this section I will introduce some theory to give you a feeling for the world the multidimensional signals live in. First we will see how a signal can be described in terms of frequencies, and second how time-shift and linear transformation are related.
4.2.1 Multidimensional frequencies
Monofrequency signals
Consider a parameterized vector signal x(t) ∈ R^m, m ∈ Z+. t ∈ R is a parameter, for instance time. Assume the signal to be periodic, i.e. x(t + T) = x(t) for some (known) T ∈ R.

If the signal x(t) consists of only one frequency (f = ω/2π) we can write:

\[ \mathbf{x}(t) = \begin{pmatrix} a_1\sin(\omega t + \varphi_1) \\ a_2\sin(\omega t + \varphi_2) \\ \vdots \\ a_m\sin(\omega t + \varphi_m) \end{pmatrix} = \begin{pmatrix} a_1\sin\varphi_1\cos\omega t + a_1\cos\varphi_1\sin\omega t \\ a_2\sin\varphi_2\cos\omega t + a_2\cos\varphi_2\sin\omega t \\ \vdots \\ a_m\sin\varphi_m\cos\omega t + a_m\cos\varphi_m\sin\omega t \end{pmatrix} = \mathbf{v}_1\cos\omega t + \mathbf{v}_2\sin\omega t \tag{4.1} \]

where

\[ \mathbf{v}_1 = \begin{pmatrix} a_1\sin\varphi_1 \\ a_2\sin\varphi_2 \\ \vdots \\ a_m\sin\varphi_m \end{pmatrix}, \qquad \mathbf{v}_2 = \begin{pmatrix} a_1\cos\varphi_1 \\ a_2\cos\varphi_2 \\ \vdots \\ a_m\cos\varphi_m \end{pmatrix} \]

Notice that this is simply a 2D-ellipse in R^m! The ellipse lies in the 2D-plane spanned by v_1 and v_2 (or a 1D-line if v_1 ∥ v_2) (see fig. 4.11).
Figure 4.11: Illustration of a multidimensional frequency
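This planarity is easy to check numerically; in the sketch below (arbitrary v1 and v2 in R^5, my own test choice) the sample matrix of a single-frequency signal has rank 2, confirming that it lives on a 2D-ellipse:

```python
import numpy as np

# Numerical check of eq. 4.1: a single-frequency signal in R^5 traces an
# ellipse, so its sample matrix has rank 2 (v1 and v2 are arbitrary here).
rng = np.random.default_rng(4)
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
v1, v2 = rng.standard_normal(5), rng.standard_normal(5)
X = np.outer(v1, np.cos(t)) + np.outer(v2, np.sin(t))   # 5 x 200 samples

print(np.linalg.matrix_rank(X))   # -> 2
```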
A periodic signal can be expanded in a Fourier series:

\[ \mathbf{x}(t) = \sum_{k=-\infty}^{\infty} \mathbf{c}_k\, e^{ik\omega_1 t} \tag{4.2} \]

where

\[ \mathbf{c}_k = \frac{1}{T}\int_0^T \mathbf{x}(t)\, e^{-ik\omega_1 t}\, dt \tag{4.3} \]

If we have only one frequency we get (remember c_{−k} = conj(c_k) since x(t) is real):

\[ \mathbf{x}(t) = \mathbf{c}_{-q}\, e^{-iq\omega_1 t} + \mathbf{c}_q\, e^{iq\omega_1 t} = 2\,\Re\{\mathbf{c}_q\}\cos q\omega_1 t - 2\,\Im\{\mathbf{c}_q\}\sin q\omega_1 t \]

where q = ω/ω_1. So we have v_1 = 2 ℜ{c_q} and v_2 = −2 ℑ{c_q}.

Multifrequency signals
What does a periodic signal with more than one frequency look like?

Example 4.1
An example of how multidimensional signals are composed of 2D-frequencies: the 3D-signal x(α) in fig. 4.12 has been composed of two frequencies f_1(α) and f_2(α), where

\[ \mathbf{f}_1(\alpha) = \begin{pmatrix} -1 & -4 \\ 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix}, \qquad \mathbf{f}_2(\alpha) = \begin{pmatrix} -1 & 1 \\ 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} \cos 2\alpha \\ \sin 2\alpha \end{pmatrix} \]

In fig. 4.12 I have included the signal's projection onto the plane defined by (1, 0, 0)^T and (0, 1, 0)^T as a help to visualize all 3 dimensions. f_2 has twice the frequency of f_1, and I have chosen to represent that in fig. 4.12 by one ellipse rotating one turn and the other rotating two turns. The zero DC-component is ignored.
As seen from the Fourier series expansion (eq. 4.2) we can always describe a multidimensional signal as a sum of frequencies or, as we now have seen, as a sum of two-dimensional elliptic signals. Below follows a definition that will be useful later on.

Figure 4.12: x(α) = f_1(α) + f_2(α)

Definition 4.1 (Spatially independent frequencies)
Assume we have two multidimensional frequencies

\[ \mathbf{f}_1(t) = \mathbf{v}_1\cos\omega_1 t + \mathbf{v}_2\sin\omega_1 t \qquad \text{and} \qquad \mathbf{f}_2(t) = \mathbf{v}_3\cos\omega_2 t + \mathbf{v}_4\sin\omega_2 t \]

where ω_1 ≠ ω_2 and v_1, v_2, v_3, v_4 ∈ R^m. Then the frequencies f_1(t) and f_2(t) are called spatially independent if [v_1, v_2] ∩ [v_3, v_4] = {0}. [v_i, v_j] means the subspace spanned by v_i and v_j.

N multidimensional frequencies are said to be spatially independent if no single vector in any frequency subspace can be written as a linear combination of vectors in the other N−1 frequency subspaces.
Or to put it another way: each frequency occupies a subspace (plane) of one or two dimensions. If each frequency has one or two dimensions to itself, the frequencies are in a way independent and free from one another.
Example 4.2
In Example 4.1 the two frequencies f_1 and f_2 are not spatially independent because

\[ \left[\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -4 \\ 0 \\ 0 \end{pmatrix}\right] \cap \left[\begin{pmatrix} -1 \\ 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right] = \left[\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\right] \]

Example 4.3
In R^3 there cannot be four linearly independent vectors, so we must have v_1 ∥ v_2 or v_3 ∥ v_4 (or both) to have spatially independent frequencies. Fig. 4.13 shows the case v_1 ∦ v_2 and v_3 ∥ v_4.

Example 4.4
An "eight", x(α) = (cos α, sin 2α), (see fig. 4.14) is an example of spatially independent frequencies in R^2. (The frequency subspaces in fig. 4.14 are supposed to be one-dimensional.)
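Definition 4.1 can be tested numerically: two subspaces intersect only in {0} exactly when the rank of their stacked bases equals the sum of their individual ranks. The sketch below (my own helper, not from the thesis) applies this to the vectors from examples 4.1/4.2 and 4.4:

```python
import numpy as np

# Sketch: span(U) and span(V) intersect only in {0}
# iff rank([U V]) = rank(U) + rank(V).
def spatially_independent(U, V):
    return (np.linalg.matrix_rank(np.hstack([U, V]))
            == np.linalg.matrix_rank(U) + np.linalg.matrix_rank(V))

# Example 4.1/4.2: the subspaces share the direction (0, 1, 0) -> dependent
U1 = np.array([[-1.0, -4.0], [1.0, 0.0], [0.0, 0.0]])
V1 = np.array([[-1.0, 1.0], [1.0, 1.0], [-1.0, 1.0]])
print(spatially_independent(U1, V1))    # -> False

# Example 4.4: the "eight", (1,0) for cos a and (0,1) for sin 2a -> independent
U2 = np.array([[1.0], [0.0]])
V2 = np.array([[0.0], [1.0]])
print(spatially_independent(U2, V2))    # -> True
```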
Figure 4.13: Example of two spatially independent frequencies

Figure 4.14: Frequency decomposition of an eight

An interesting detail can be noticed: if a multidimensional periodic signal is band-limited to N frequencies we can also be sure that the signal dwells in a subspace of R^m with a dimension less than or equal to 2N (of course obvious if m ≤ 2N), because each frequency occupies (at most) a two-dimensional subspace and together all N frequencies must occupy a subspace of dimension 2N or less. Dimension 2N can only occur iff all frequencies are spatially independent and each frequency subspace is two-dimensional.
4.2.2 Time shift and linear transformation
In this section I will show that there are some connections between time-shift and linear transformation in a multidimensional signal.
One frequency
Since a multidimensional frequency is nothing but an ellipse, a time-shift is equivalent to a rotation plus scaling. Let x(t) = v_1 cos ωt + v_2 sin ωt (as in eq. 4.1) and let F = (v_1 v_2). We can then write

\[ \mathbf{x}(t) = F\begin{pmatrix} \cos\omega t \\ \sin\omega t \end{pmatrix} \]

and a time-shift can now be written

\[ \mathbf{x}(t+\Delta t) = F\begin{pmatrix} \cos\omega(t+\Delta t) \\ \sin\omega(t+\Delta t) \end{pmatrix} = F\begin{pmatrix} \cos\omega\Delta t & -\sin\omega\Delta t \\ \sin\omega\Delta t & \cos\omega\Delta t \end{pmatrix}\begin{pmatrix} \cos\omega t \\ \sin\omega t \end{pmatrix} = F\begin{pmatrix} \cos\omega\Delta t & -\sin\omega\Delta t \\ \sin\omega\Delta t & \cos\omega\Delta t \end{pmatrix}(F^T F)^{-1}F^T F\begin{pmatrix} \cos\omega t \\ \sin\omega t \end{pmatrix} = A F\begin{pmatrix} \cos\omega t \\ \sin\omega t \end{pmatrix} = A\,\mathbf{x}(t) \tag{4.4} \]

where

\[ A = F\begin{pmatrix} \cos\omega\Delta t & -\sin\omega\Delta t \\ \sin\omega\Delta t & \cos\omega\Delta t \end{pmatrix}(F^T F)^{-1}F^T \]

I have here assumed that (F^T F)^{−1} exists, i.e. v_1 ∦ v_2. If however v_1 ∥ v_2 we cannot find an equivalent transformation for every time-shift. But this assumption is not a very restrictive one, because even the slightest noise at this frequency will probably make it true.
Example 4.5
If v_1 = e_k and v_2 = e_{k+1}, where e_k and e_{k+1} are the k:th resp. (k+1):th basis vectors in the unit basis (i.e. 1 in the k:th resp. (k+1):th position and 0 elsewhere), we get

\[ A = \begin{pmatrix} 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & \cos\omega\Delta t & -\sin\omega\Delta t & \cdots & 0 \\ 0 & \cdots & \sin\omega\Delta t & \cos\omega\Delta t & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix} \tag{4.5} \]

which is just a rotation in the subspace spanned by v_1 and v_2. (The non-zero elements are in the k:th and (k+1):th rows and columns.)
Many frequencies
Suppose the frequencies were spatially independent. Then they would each have a dimension or two of their own. Like above, a time-shift can be written as a linear transformation for each frequency (if each subspace is two-dimensional). Intuitively it should be possible to find a linear transformation for the whole signal space that can be divided into sub-transformations, each one affecting only one of the frequency subspaces. If true, this would mean that we can still find an equivalent linear transformation for any time-shift. Let us confirm the intuition mathematically:

Theorem 4.1
Assume N spatially independent frequencies and 2D frequency subspaces. Then for each Δt there is a linear transformation matrix A_{Δt} so that

\[ \mathbf{x}(t+\Delta t) = A_{\Delta t}\,\mathbf{x}(t) \tag{4.6} \]

Proof: see appendix A.

4.3 Discussion
What use can we have of this theory?
It shows that there is a difference between multidimensional signals and one-dimensional signals (surprise?). If we wanted to find the time-shift between two one-dimensional signals it would have been enough to look at the lowest frequency component (if the noise is low at that frequency). That is not enough with multidimensional signals. Filtering spatially independent frequencies does not have any effect at all; the filter is then equivalent to an affine transformation. I will not prove this, but it is pretty easy to imagine: if the frequencies lie in separate dimensions it is possible to design a transformation that scales each frequency separately, just as a filter would. And if we filter dependent frequencies we only reduce the dependence needed to distinguish a time-shift from a transformation. If we want to find a time-shift between two signals we have to make sure that we keep some spatially dependent frequencies, because that is where the information is hiding.

Also note that if the dimensionality of the signal is high, we are likely to need more frequencies than if the dimension were low, because the frequencies are more spread out and thus less dependent in a high-dimensional space.

So, is all the work in this chapter for nothing? Maybe not. Note that the idea behind LP-filtering is to emphasize the influence of the lower frequencies. This idea can be used in another way.
Remember the projection vectors w_x and w_y in the canonical correlation formula. Let us say we take an LP-filter that extracts the lowest frequency component. Then we have also reduced the signal to a two-dimensional subspace. When we now maximize the canonical correlation it is only interesting to move the vectors w_x and w_y in this subspace. But at the same time as we filtered, we also lost the information that was lying in the dependence between the frequencies, so this method did not work.

But if we could limit the vectors w_x and w_y to only move in a suitable subspace without erasing the frequencies, we might be more successful. For instance, we know where the lowest frequency subspace is (with the help of the Fourier transform). Then it might be enough to maximize the canonical correlation in that subspace (in our search for the time-shift Δt).
This idea might become clearer if it is explained mathematically. We have two signals x(t) and y(t). The lowest frequency component can be found by Fourier transformation. Let c_1 = F{x(t)}_1 and d_1 = F{y(t)}_1. To limit the motion of w_x and w_y to the lowest frequency subspaces we let:

\[ w_x = \alpha_1\,\Re\{\mathbf{c}_1\} + \alpha_2\,\Im\{\mathbf{c}_1\} = \big(\Re\{\mathbf{c}_1\}\;\; \Im\{\mathbf{c}_1\}\big)\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = B_x\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \]

\[ w_y = \beta_1\,\Re\{\mathbf{d}_1\} + \beta_2\,\Im\{\mathbf{d}_1\} = \big(\Re\{\mathbf{d}_1\}\;\; \Im\{\mathbf{d}_1\}\big)\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = B_y\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} \]

where

\[ B_x = \big(\Re\{\mathbf{c}_1\}\;\; \Im\{\mathbf{c}_1\}\big), \qquad B_y = \big(\Re\{\mathbf{d}_1\}\;\; \Im\{\mathbf{d}_1\}\big) \]

and α_1, α_2, β_1 and β_2 are real scalars.
Insert this in the canonical correlation formula:

\[ \rho = \frac{(\alpha_1\;\alpha_2)\, B_x^T C_{x_s y}\, B_y \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}{\sqrt{(\alpha_1\;\alpha_2)\, B_x^T C_{xx} B_x \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}\;(\beta_1\;\beta_2)\, B_y^T C_{yy} B_y \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}} = \frac{(\alpha_1\;\alpha_2)\, C_{x_{P_s} y_P} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}{\sqrt{(\alpha_1\;\alpha_2)\, C_{x_P x_P} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}\;(\beta_1\;\beta_2)\, C_{y_P y_P} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}}} \tag{4.7} \]

where x_P(t) = B_x^T x(t), y_P(t) = B_y^T y(t), and x_{P_s}(t) = B_x^T x(t + s). (P stands for Projection.)

This is just the canonical correlation between x_P and y_P (without any restriction on the projection vectors). Note that x_P and y_P are the projections of x and y into the subspace defined by the lowest frequency component. As when filtering, we have reduced the signal dimensionality, but this time we did not lose any frequencies except the ones that are spatially orthogonal to the subspace. The signals x_P and y_P are here two-dimensional. But we do not have to restrict ourselves to only the lowest frequency component subspace. We can for instance choose the two lowest frequency components and get four-dimensional signals x_P and y_P, etc. Of course, the more frequency components we include, the higher the dimensionality of x_P and y_P. This increases the complexity of the calculations needed to find Δt. Hence it is best to use as few components as possible.

Now that we have reduced the dimensionality, and thus the calculation complexity, we might use exhaustive search to find Δt. Furthermore, it is almost as effective to analytically solve the eigenvalue system (eq. 2.3 or eq. 2.4 in ch. 2) to find ρ as it is to use the gradient search algorithm.
Let us try this theory on the example with 6D-signals in sec. 4.1.2:

1. First find B_x and B_y with the help of the Fourier transform. Then calculate x_P(t) and y_P(t).

2. Calculate C_{x_{P_s} y_P} for every possible time-shift s. This can be efficiently done by convolution on each component in C_{x_{P_s} y_P} separately. Then calculate (for each time-shift s)

\[ M_{x_{P_s}} = C_{x_P x_P}^{-1}\, C_{x_{P_s} y_P}\, C_{y_P y_P}^{-1}\, C_{y_P x_{P_s}} \]

3. Take the square root of the largest eigenvalue of M_{x_{P_s}} (i.e. the maximal canonical correlation) for every time-shift (an alternative could be trace(M_{x_{P_s}})). The result is shown in fig. 4.15.

Figure 4.15: max √(eig(M_{x_{P_s}})) vs. time-shift s

4. With Δt known, we go back to the signals x and y and calculate the canonical correlation between them.
This method has one catch. As you can see in fig. 4.15 we got ρ < 1 at the right time-shift even though we did not have any noise. Why did we not get ρ = 1? This is due to the projection. A projection is not invariant to linear transformation. Even if y(t) = Ax(t + Δt) it is not certain that there is a matrix A_P so that y_P(t) = A_P x_P(t + Δt). So, we cannot be sure to get the right time-shift with this method. For example, it could fail if the transformation A is an extreme scaling in some dimensions and not in others.

And if there is noise present we can be even less sure to get exactly the right time-shift. We might have to perform an exhaustive search in a neighbourhood of the calculated time-shift. Exactly how to proceed depends on the application. The method presented here is just a suggestion.
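Steps 1-3 can be sketched as below. This is my own test setup, not the thesis experiment: I use band-limited random signals and, to sidestep the catch just discussed, an orthogonal transformation, for which the projection onto the lowest-frequency subspace is exact and the peak reaches ρ = 1 (for a general A the peak is lower, as in fig. 4.15).

```python
import numpy as np

# Sketch of steps 1-3 in sec. 4.3. ASSUMPTIONS: my own band-limited signals;
# A is chosen orthogonal so the projected search recovers the shift exactly.
rng = np.random.default_rng(6)
T, dim, true_shift = 100, 6, 16
spec = np.fft.fft(rng.standard_normal((dim, T)), axis=1)
spec[:, 10:-9] = 0                                # band-limit the prototype
X = np.real(np.fft.ifft(spec, axis=1))
A = np.linalg.qr(rng.standard_normal((dim, dim)))[0]
Y = A @ np.roll(X, -true_shift, axis=1)           # y(t) = A x(t + 16)

def lowest_freq_basis(S):
    c1 = np.fft.fft(S, axis=1)[:, 1]              # lowest frequency coefficient
    return np.column_stack([c1.real, c1.imag])    # B = (Re{c1}  Im{c1})

XP = lowest_freq_basis(X).T @ X                   # step 1: 2-D projections
YP = lowest_freq_basis(Y).T @ Y

def max_cancorr(U, V):
    # steps 2-3: largest eigenvalue of M_x (eq. 2.4) for the projected signals
    Cuu, Cvv, n = np.cov(U), np.cov(V), U.shape[1]
    Uc, Vc = U - U.mean(1, keepdims=True), V - V.mean(1, keepdims=True)
    Cuv = Uc @ Vc.T / (n - 1)
    M = np.linalg.solve(Cuu, Cuv) @ np.linalg.solve(Cvv, Cuv.T)
    return np.sqrt(np.max(np.linalg.eigvals(M).real))

scores = [max_cancorr(np.roll(XP, -s, axis=1), YP) for s in range(T)]
print(int(np.argmax(scores)))                     # -> 16
```

Only 2×2 covariance matrices appear in the search, which is the dimensionality reduction that makes the exhaustive scan cheap.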
Comment:
The theory can also explain why canonical correlation can give rise to false global optima. Take for example the 'eight' in example 4.4 (g. 4.14). A calculation of canonical correlation between the eight and a transformation of the eight (no time-shift) will give the curve in g. 4.16.
There are four global maxima. The reason is that the two frequencies present are spatially independent. Therefore the canonical correlation can look at one frequency at a time. The lowest frequency will then contribute with two global maxima and the other frequency with four global maxima (some of the maxima have the same time-shift). This shows that it is not always enough to look at only one direction. For some signals we may have to go on with the next directions in the canonical correlation algorithm (see ch. 2) and somehow put together the results from them.
Figure 4.16: $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ vs. time-shift
LP-filtering canonical correlation
Take two one-dimensional periodic signals and calculate the correlation between them for every possible time-shift. A typical curve could look like the one in figure 3.1, page 12. As said before, it would be nice to get rid of all local optima. Instead of LP-filtering the signals as we did in chapter 4 we could try to LP-filter the correlation (seen as a function of time-shift) more directly. This LP-filtered correlation could be used to locate the right time-shift. Then we can get an equality measure by calculating $\max_{w_x,w_y}(\rho)$ at this time-shift.
5.1 An idea
Let us begin with correlation of one-dimensional signals. Assume we have two one-dimensional signals $x(t)$ and $y(t)$ with a period $T = 2\pi/\omega_1$ and zero mean. Let $x_s(t) = x(t+s)$ and take the correlation between $x_s$ and $y$ (note that $E\{x_s x_s\} = E\{xx\}$):

$$\rho(s) = \frac{E\{x_s y\}}{\sqrt{E\{xx\}E\{yy\}}} = \frac{1}{\sigma_x\sigma_y}\frac{1}{T}\int_0^T x(t+s)y(t)\,dt = \frac{1}{\sigma_x\sigma_y}\frac{1}{T}\int_s^{T+s} x(u)y(u-s)\,du = \frac{1}{\sigma_x\sigma_y}\frac{1}{T}\int_0^T x(u)y(u-s)\,du \quad (5.1)$$

Since the integral is the convolution between $x(t)$ and $y(-t)$ we get the Fourier series coefficients of $\rho$ as

$$P_k = \frac{1}{\sigma_x\sigma_y}\,c_k\bar{d}_k \quad (5.2)$$

where $c_k$ and $d_k$ are the Fourier series coefficients of $x(t)$ and $y(t)$ respectively. Now assume we filter $\rho(s)$ with a filter $F(s)$ that has the Fourier coefficients $F_k$, so that the filtered correlation $\rho_F$ has the Fourier coefficients $P_{Fk} = P_k F_k$. If we assume real and positive coefficients $F_k$ we can write $P_{Fk}$ as

$$P_{Fk} = P_k F_k = \frac{1}{\sigma_x\sigma_y}\,c_k\bar{d}_k F_k = \frac{1}{\sigma_x\sigma_y}\,(c_k\sqrt{F_k})\overline{(d_k\sqrt{F_k})}$$

and we therefore get

$$\rho_F(s) = \frac{E\{x_{Fs}\,y_F\}}{\sqrt{E\{xx\}E\{yy\}}} \quad (5.3)$$

where $x_F$ and $y_F$ are $x$ and $y$ filtered with a filter that has the Fourier coefficients $\sqrt{F_k}$. In other words, filtering $\rho$ (seen as a function of the time-shift $s$) is the same as using filtered signals in the numerator and non-filtered signals in the denominator of the correlation formula.
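This equivalence is easy to check numerically for sampled periodic signals. The sketch below is my own NumPy formulation, with circular shifts playing the role of the periodic integrals: it filters the correlation function directly with coefficients $F_k$, and alternatively filters the signals with $\sqrt{F_k}$, and compares the two results:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 128
x = rng.standard_normal(N); x -= x.mean()
y = rng.standard_normal(N); y -= y.mean()

def circ_corr(a, b):
    """(1/N) * sum_t a(t+s) b(t) for every cyclic shift s, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b)))) / N

# real, positive filter coefficients F_k (symmetric so that sqrt(F) is a real filter)
k = np.fft.fftfreq(N, d=1.0 / N)
F = 1.0 / (1.0 + np.abs(k))

sx, sy = x.std(), y.std()

# way 1: filter the correlation function rho(s) itself
rho = circ_corr(x, y) / (sx * sy)
rho_F1 = np.real(np.fft.ifft(np.fft.fft(rho) * F))

# way 2: filter the signals with sqrt(F_k), keep the unfiltered denominator
xF = np.real(np.fft.ifft(np.fft.fft(x) * np.sqrt(F)))
yF = np.real(np.fft.ifft(np.fft.fft(y) * np.sqrt(F)))
rho_F2 = circ_corr(xF, yF) / (sx * sy)

assert np.allclose(rho_F1, rho_F2)
```

The two curves agree to machine precision, which is exactly the statement of eq. 5.3 in the sampled, periodic setting.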
This is not so strange for one-dimensional signals. Only the numerator affects the maximum of $\rho$ seen as a function of $s$. Formula 5.3 is then not so hard to understand. But is it possible that this works for multidimensional signals as well? This is not intuitively easy to answer. Now the denominator also influences the maximum. If the projection directions $\mathbf{w}_x$ and $\mathbf{w}_y$ were fixed we could proceed as above and use filtered signals in the numerator. But for every fixed time-shift $s$ there will be different directions that give the maximum correlation!

And it is not safe to take fixed random directions. If we are unlucky we could choose directions that are uncorrelated at the time-shift where the maximum is. For instance, take the case of correlating two 'eights' like the one in example 4.4, page 19. It would not be useful to choose $\mathbf{w}_x$ horizontal and $\mathbf{w}_y$ vertical.

All this complicates things a bit ... We can always try an experiment.
5.2 Experiments
Take the same 6D-signals as in section 4.1.2 (fig. 4.5). Use an LP-filter of type $1/\omega$ and filter the signals (equal to filtering $\rho$ with a filter of type $1/\omega^2$). The filtered signals $\mathbf{x}_F(t)$ and $\mathbf{y}_F(t)$ are shown in fig. 4.7, page 16. Now we do as before: for each time-shift calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho_F)$ where

$$\rho_F = \frac{\mathbf{w}_x^T\mathbf{C}_{x_{Fs}y_F}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x\;\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}}$$

(The gradient search in chapter 2 could still be used with a minor alteration.) The result is shown in fig. 5.1.

The maximum is found in the right place (time-shift 16 samples). We also get one false optimum, but that is not a problem (we can test both maxima and see which one is the true one). If we always got a function like the one in fig. 5.1 we would not have to calculate $\max(\rho_F)$ for every time-shift $s$. It would simply be enough to choose a few points and from there deduce where the maximum is. With the time-shift known we could then go back and calculate the canonical correlation with the non-filtered signals in the numerator. This method obviously works in this case. But one example does not make a theory.

Figure 5.1: $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho_F)$ vs. time-shift
5.3 An ounce of theory #2

5.3.1 Covariance and signal frequencies

I start by showing that the covariance matrix between two periodic signals can be expressed with the aid of the signal frequencies. Assume we have two signals $\mathbf{x}(t)\in\mathbb{R}^m$ and $\mathbf{y}(t)\in\mathbb{R}^p$, both with a period $T = 2\pi/\omega_1$ and zero mean. Their Fourier series expansions are

$$\mathbf{x}(t) = \sum_{k=-\infty}^{\infty}\mathbf{c}_k e^{ik\omega_1 t} \quad (5.4)$$

$$\mathbf{y}(t) = \sum_{l=-\infty}^{\infty}\mathbf{d}_l e^{il\omega_1 t} \quad (5.5)$$

By covariance matrix I here mean the integral

$$\mathbf{C}_{xy} = E\{\mathbf{x}\mathbf{y}^{*T}\} = \frac{1}{T}\int_0^T \mathbf{x}(t)\mathbf{y}^{*T}(t)\,dt \quad (5.6)$$

If we insert eq. 5.4 and eq. 5.5 in eq. 5.6 we get

$$\mathbf{C}_{xy} = \frac{1}{T}\int_0^T \mathbf{x}(t)\mathbf{y}^{*T}(t)\,dt = \frac{1}{T}\int_0^T \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}\mathbf{c}_k\mathbf{d}_l^{*T} e^{ik\omega_1 t}e^{-il\omega_1 t}\,dt = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}\mathbf{c}_k\mathbf{d}_l^{*T}\,\frac{1}{T}\int_0^T e^{i(k-l)\omega_1 t}\,dt = \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty}\mathbf{c}_k\mathbf{d}_l^{*T}\,\delta[k-l] = \sum_{k=-\infty}^{\infty}\mathbf{c}_k\mathbf{d}_k^{*T} \quad (5.7)$$

By setting $\mathbf{y} = \mathbf{x}$ we get

$$\mathbf{C}_{xx} = \sum_{k=-\infty}^{\infty}\mathbf{c}_k\mathbf{c}_k^{*T} \quad (5.8)$$

and similarly for $\mathbf{C}_{yy}$.

5.3.2 Filtered canonical correlation
Now we are ready to see what happens when we filter the canonical correlation. First we repeat the formula for $\rho_F$:

$$\rho_F(\mathbf{w}_x,\mathbf{w}_y,s) = \frac{\mathbf{w}_x^T\mathbf{C}_{x_{Fs}y_F}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x\;\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}} \quad (5.9)$$

where $\mathbf{x}_{Fs}(t)$ and $\mathbf{y}_F(t)$ are the results from filtering $\mathbf{x}(t+s)$ and $\mathbf{y}(t)$.

Assume we have a filter that filters out only the lowest frequency component of the signals (i.e. $F_1 = 1$ and $F_k = 0$ for $k \neq 1$). The filtered signals $\mathbf{x}_F(t)$ and $\mathbf{y}_F(t)$ would then only have the Fourier coefficients corresponding to $k = \pm 1$. Eq. 5.7 thus gives (remember that $\mathcal{F}\{\mathbf{x}(t+s)\}_k = e^{ik\omega_1 s}\mathcal{F}\{\mathbf{x}(t)\}_k = e^{ik\omega_1 s}\mathbf{c}_k$ and $\mathbf{c}_{-k} = \overline{\mathbf{c}_k}$ for real functions):

$$\mathbf{C}_{x_{Fs}y_F} = \mathbf{c}_{-1}e^{-i\omega_1 s}\mathbf{d}_{-1}^{*T} + \mathbf{c}_1 e^{i\omega_1 s}\mathbf{d}_1^{*T} = \overline{\mathbf{c}_1 e^{i\omega_1 s}\mathbf{d}_1^{*T}} + \mathbf{c}_1 e^{i\omega_1 s}\mathbf{d}_1^{*T} = 2\,\Re\{\mathbf{c}_1 e^{i\omega_1 s}\mathbf{d}_1^{*T}\} =$$
$$= 2\,\Re\left\{(\Re\{\mathbf{c}_1\} + i\Im\{\mathbf{c}_1\})(\cos\omega_1 s + i\sin\omega_1 s)(\Re\{\mathbf{d}_1\}^T - i\Im\{\mathbf{d}_1\}^T)\right\} =$$
$$= 2\,(\Re\{\mathbf{c}_1\}\cos\omega_1 s - \Im\{\mathbf{c}_1\}\sin\omega_1 s)\Re\{\mathbf{d}_1\}^T + 2\,(\Re\{\mathbf{c}_1\}\sin\omega_1 s + \Im\{\mathbf{c}_1\}\cos\omega_1 s)\Im\{\mathbf{d}_1\}^T =$$
$$= 2\begin{pmatrix}\Re\{\mathbf{c}_1\} & \Im\{\mathbf{c}_1\}\end{pmatrix}\begin{pmatrix}\cos\omega_1 s & \sin\omega_1 s\\ -\sin\omega_1 s & \cos\omega_1 s\end{pmatrix}\begin{pmatrix}\Re\{\mathbf{d}_1\}^T\\ \Im\{\mathbf{d}_1\}^T\end{pmatrix} \quad (5.10)$$

Insert eq. 5.10 in eq. 5.9:

$$\rho_F = 2\,\mathbf{a}_x^T\begin{pmatrix}\cos\omega_1 s & \sin\omega_1 s\\ -\sin\omega_1 s & \cos\omega_1 s\end{pmatrix}\mathbf{a}_y \quad (5.11)$$

where

$$\mathbf{a}_x = \frac{\begin{pmatrix}\Re\{\mathbf{c}_1\}^T\\ \Im\{\mathbf{c}_1\}^T\end{pmatrix}\mathbf{w}_x}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x}}\,, \qquad \mathbf{a}_y = \frac{\begin{pmatrix}\Re\{\mathbf{d}_1\}^T\\ \Im\{\mathbf{d}_1\}^T\end{pmatrix}\mathbf{w}_y}{\sqrt{\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}}$$

We know from Cauchy's inequality that $\rho_F \le 2\|\mathbf{a}_x\|\|\mathbf{a}_y\|$. Since $\mathbf{a}_x$ and $\mathbf{a}_y$ are two-dimensional and $\begin{pmatrix}\cos\omega_1 s & \sin\omega_1 s\\ -\sin\omega_1 s & \cos\omega_1 s\end{pmatrix}$ is a rotation matrix, there is always a solution $s$ for every pair of vectors $\mathbf{a}_x$, $\mathbf{a}_y$ such that $\rho_F = 2\|\mathbf{a}_x\|\|\mathbf{a}_y\|$.

Therefore, if we want to find $\operatorname{argmax}_{\mathbf{w}_x,\mathbf{w}_y,s}(\rho_F)$ we can proceed as follows:

1. Maximize $\|\mathbf{a}_x\|$ and $\|\mathbf{a}_y\|$. This gives $\mathbf{w}_x$ and $\mathbf{w}_y$.
2. Find the angle between $\mathbf{a}_x$ and $\mathbf{a}_y$. This gives the time-shift $s$ (two solutions).

Note that the procedure above gives two possible solutions for $s$ (an angle between two vectors can be described either by $\varphi$ or by $\varphi + \pi$). We have to calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ for both solutions and take the largest one.
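Equation 5.11 can be checked numerically on sampled periodic signals. The sketch below uses my own variable names, keeps the constant factor from eq. 5.10 explicit, and compares the directly computed $\rho_F$ (eq. 5.9 with the single-frequency filter) with the rotation-matrix form of eq. 5.11 for every shift:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, p = 100, 3, 4
w1 = 2 * np.pi / N                        # fundamental frequency (period N samples)
X = rng.standard_normal((m, N)); X -= X.mean(axis=1, keepdims=True)
Y = rng.standard_normal((p, N)); Y -= Y.mean(axis=1, keepdims=True)
wx = rng.standard_normal(m)               # arbitrary fixed projection directions
wy = rng.standard_normal(p)

c1 = np.fft.fft(X, axis=1)[:, 1] / N      # Fourier coefficients for k = 1
d1 = np.fft.fft(Y, axis=1)[:, 1] / N
t = np.arange(N)
XF = 2 * np.real(np.outer(c1, np.exp(1j * w1 * t)))   # keep only k = +-1
YF = 2 * np.real(np.outer(d1, np.exp(1j * w1 * t)))

Cxx = X @ X.T / N
Cyy = Y @ Y.T / N
dx = np.sqrt(wx @ Cxx @ wx)
dy = np.sqrt(wy @ Cyy @ wy)
ax = np.array([c1.real @ wx, c1.imag @ wx]) / dx      # a_x of eq. 5.11
ay = np.array([d1.real @ wy, d1.imag @ wy]) / dy      # a_y of eq. 5.11

rho_direct, rho_pred = [], []
for s in range(N):
    C = np.roll(XF, -s, axis=1) @ YF.T / N            # C_{x_Fs y_F}
    rho_direct.append(wx @ C @ wy / (dx * dy))        # eq. 5.9 with the LP-filter
    R = np.array([[np.cos(w1 * s),  np.sin(w1 * s)],
                  [-np.sin(w1 * s), np.cos(w1 * s)]])
    rho_pred.append(2 * ax @ R @ ay)                  # eq. 5.11

assert np.allclose(rho_direct, rho_pred)
```

The agreement over all shifts also makes the Cauchy bound visible: no value of `rho_direct` exceeds $2\|\mathbf{a}_x\|\|\mathbf{a}_y\|$.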
We have one thing left to discuss in the process above: how do we maximize $\|\mathbf{a}_x\|$ and $\|\mathbf{a}_y\|$? We have

$$\|\mathbf{a}_x\|^2 = \mathbf{a}_x^T\mathbf{a}_x = \frac{\mathbf{w}_x^T\begin{pmatrix}\Re\{\mathbf{c}_1\} & \Im\{\mathbf{c}_1\}\end{pmatrix}\begin{pmatrix}\Re\{\mathbf{c}_1\}^T\\ \Im\{\mathbf{c}_1\}^T\end{pmatrix}\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x} = \frac{\mathbf{w}_x^T\left(\Re\{\mathbf{c}_1\}\Re\{\mathbf{c}_1\}^T + \Im\{\mathbf{c}_1\}\Im\{\mathbf{c}_1\}^T\right)\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x} = \frac{1}{2}\,\frac{\mathbf{w}_x^T\left(\mathbf{c}_{-1}\mathbf{c}_{-1}^{*T} + \mathbf{c}_1\mathbf{c}_1^{*T}\right)\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x} = \frac{1}{2}\,\frac{\mathbf{w}_x^T\mathbf{C}_{x_F x_F}\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x} \quad (5.12)$$

$$\|\mathbf{a}_y\|^2 = \ldots = \frac{1}{2}\,\frac{\mathbf{w}_y^T\mathbf{C}_{y_F y_F}\mathbf{w}_y}{\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y} \quad (5.13)$$

This looks a lot like the canonical correlation formula. We can in fact use the same type of gradient search algorithm to find $\max(\|\mathbf{a}_x\|)$ and $\max(\|\mathbf{a}_y\|)$ as we used to find $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ in chapter 2 (with $\mathbf{y} = \mathbf{x}$ and a minor adjustment due to the LP-filtered signals in the numerator):

$$\begin{cases}\mathbf{w}_x(t+1) = \mathbf{w}_x(t) + \Delta\mathbf{w}_x(t)\\ \mathbf{w}_y(t+1) = \mathbf{w}_y(t) + \Delta\mathbf{w}_y(t)\end{cases} \quad (5.14)$$

where

$$\begin{cases}\Delta\mathbf{w}_x = \beta_x(\mathbf{x}_F\mathbf{x}_F^T\hat{\mathbf{w}}_x - \mathbf{x}\mathbf{x}^T\mathbf{w}_x)\\ \Delta\mathbf{w}_y = \beta_y(\mathbf{y}_F\mathbf{y}_F^T\hat{\mathbf{w}}_y - \mathbf{y}\mathbf{y}^T\mathbf{w}_y)\end{cases} \quad (5.15)$$
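A sketch of this gradient search in NumPy follows. It uses the batch form of eqs. 5.14 and 5.15 (the per-sample updates are replaced by their averages $\mathbf{C}_{x_Fx_F}\hat{\mathbf{w}} - \mathbf{C}_{xx}\mathbf{w}$); the step length, iteration count, and filter bandwidth are my own choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
m, N = 3, 256
X = rng.standard_normal((m, N)); X -= X.mean(axis=1, keepdims=True)

# LP-filter: keep only the lowest frequency components |k| <= 10
spec = np.fft.fft(X, axis=1)
k = np.fft.fftfreq(N, d=1.0 / N)
spec[:, np.abs(k) > 10] = 0
XF = np.real(np.fft.ifft(spec, axis=1))

Cxx = X @ X.T / N          # unfiltered covariance (denominator)
CxFxF = XF @ XF.T / N      # filtered covariance (numerator)

# gradient search of eqs. 5.14-5.15, averaged over the samples
w = rng.standard_normal(m)
beta = 0.05
for _ in range(30000):
    w_hat = w / np.linalg.norm(w)
    w = w + beta * (CxFxF @ w_hat - Cxx @ w)

# the converged direction should maximize the Rayleigh-type quotient of eq. 5.12
quotient = (w @ CxFxF @ w) / (w @ Cxx @ w)
lam_max = np.max(np.real(np.linalg.eigvals(np.linalg.solve(Cxx, CxFxF))))
assert quotient > 0.95 * lam_max
```

The fixed points of the update are the solutions of the generalized eigenproblem $\mathbf{C}_{x_Fx_F}\mathbf{w} = \lambda\,\mathbf{C}_{xx}\mathbf{w}$, which is why the converged quotient is compared with the largest generalized eigenvalue.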
5.3.3 One step further
What happens if the filter is broader (i.e. if we use more than one frequency)? Then $\mathbf{a}_x$ and $\mathbf{a}_y$ are no longer two-dimensional and the matrix between them in eq. 5.11 is more complex. It is hard to tell what is happening now (and even harder if noise is present). We should still be able to use $\mathbf{a}_x$ and $\mathbf{a}_y$, though, since their norms do not depend on time-shifts or affine transformations. If there is no noise present, the maximum of $\|\mathbf{a}_x\|$ would be characteristic for the signal (and dependent on the filter used) and independent of time-shift and affine transformation. An example to show you what I mean:
Assume $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+\Delta t)$. Maximize

$$\|\mathbf{a}_x\|^2 = \frac{\mathbf{w}_x^T\mathbf{C}_{x_F x_F}\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x}$$

$$\|\mathbf{a}_y\|^2 = \frac{\mathbf{w}_y^T\mathbf{C}_{y_F y_F}\mathbf{w}_y}{\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y} = \frac{\mathbf{w}_y^T\mathbf{A}\mathbf{C}_{x_F x_F}\mathbf{A}^T\mathbf{w}_y}{\mathbf{w}_y^T\mathbf{A}\mathbf{C}_{xx}\mathbf{A}^T\mathbf{w}_y}$$

As mentioned in sec. 2.2, it can be shown that if the generalized eigensystem $\mathbf{C}_{x_F x_F}\mathbf{w} = \lambda\mathbf{C}_{xx}\mathbf{w}$ has distinct eigenvalues $\{\lambda_{xi}\}$ then there is only one maximum of $\|\mathbf{a}_x\|^2$. The same goes for $\|\mathbf{a}_y\|^2$. Maximizing the functions above should thus give two vectors, $\mathbf{w}_y$ and $\mathbf{w}_x = \mathbf{A}^T\mathbf{w}_y$. Then construct

$$\tilde{x}(t) = \mathbf{w}_x^T\mathbf{x}(t) = \mathbf{w}_y^T\mathbf{A}\mathbf{x}(t)$$
$$\tilde{y}(t) = \mathbf{w}_y^T\mathbf{y}(t) = \mathbf{w}_y^T\mathbf{A}\mathbf{x}(t+\Delta t) = \tilde{x}(t+\Delta t)$$

These two one-dimensional signals can help to detect the time-shift $\Delta t$.
If we know something about the noise we can choose a filter that moderates frequencies with high noise. A new process could look something like this:

1. Find $\operatorname{argmax}(\|\mathbf{a}_x(\mathbf{w}_x)\|)$ and $\operatorname{argmax}(\|\mathbf{a}_y(\mathbf{w}_y)\|)$ using some suitable filter.
2. Create the one-dimensional signals $\tilde{x}(t) = \mathbf{w}_x^T\mathbf{x}(t)$ and $\tilde{y}(t) = \mathbf{w}_y^T\mathbf{y}(t)$. Use these signals to find a time-shift $s$.
3. With the time-shift found, calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ at that time-shift to get the similarity measure.
5.4 Discussion
If there is noise present we cannot trust the process above completely. The second step, i.e. finding $s$, might be incorrect. But if the noise is not too high the process might find a time-shift near the right one. Step three in the process can then be modified to test all time-shifts in a neighborhood of the calculated $s$. Then we can choose the time-shift that gives the highest value of $\rho$. This is still a lot cheaper than an exhaustive test of all time-shifts.
Let us try this new theory on the 6D-signals in section 4.1.2 (fig. 4.5). The filter used is of type $1/\omega^{0.2}$.

1. Filter $\mathbf{x}$ and $\mathbf{y}$. Maximize $\|\mathbf{a}_x\|$ and $\|\mathbf{a}_y\|$. This gives the vectors $\mathbf{w}_x$ and $\mathbf{w}_y$.
2. Create $\tilde{x}(t) = \mathbf{w}_x^T\mathbf{x}(t)$ and $\tilde{y}(t) = \mathbf{w}_y^T\mathbf{y}(t)$. Calculating the correlation between them for every time-shift gives the function in fig. 5.2. This can be done in effective ways with one-dimensional signals. In the figure we can clearly see a global maximum at time-shift 16 samples.

Figure 5.2: $|\mathrm{corr}(\tilde{x}(t+s),\tilde{y}(t))|$ vs. $s$

3. Using $s = 16$ we can now calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho(16)) = 1$.
Like in the previous chapter, this too is only a suggestion. Its use should be investigated in each application. I have not thoroughly discussed the influence of noise, nor how wide a neighborhood should be tested in step 3 of the method above.
Signal norm
Finally I will discuss another method. Again the idea is to find the time-shift first and then calculate $\max_{\mathbf{w}_x,\mathbf{w}_y}(\rho)$ at that shift, but the method of finding the time-shift differs from the previous ones.
6.1 Experiments
Assume $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+\Delta t) + \mathbf{e}(t)$ where $\mathbf{e}(t)$ is a noise signal. Also assume $\mathbf{e}(t)$ is uncorrelated with $\mathbf{A}\mathbf{x}(t+\Delta t)$ and that $E\{\mathbf{e}^T(t)\mathbf{e}(t)\} = \sigma_e$ (independent of time). If $\mathbf{A}$ is orthonormal, i.e. $\mathbf{A}^T\mathbf{A} = \mathbf{I}$, we can simply use the squared norms of the signals to find the time-shift. The squared norms are

$$u(t) = |\mathbf{x}(t)|^2 = \mathbf{x}^T(t)\mathbf{x}(t)$$
$$v(t) = |\mathbf{y}(t)|^2 = \mathbf{y}^T(t)\mathbf{y}(t)$$

and the expectation value of $v(t)$ is

$$E\{v(t)\} = E\{\mathbf{x}^T(t+\Delta t)\mathbf{A}^T\mathbf{A}\mathbf{x}(t+\Delta t) + 2\mathbf{e}^T(t)\mathbf{A}\mathbf{x}(t+\Delta t) + \mathbf{e}^T(t)\mathbf{e}(t)\} = u(t+\Delta t) + \sigma_e$$

If we look at the correlation between $u(t+s)$ and $v(t)$ for every possible time-shift $s$ (or, equivalently, the convolution between $u(t)$ and $v(-t)$) we will get a peak at the right time-shift $s = \Delta t$.
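For an orthonormal transformation the whole chain is a few lines of NumPy. This is a sketch; the band-limited test signal and the QR construction of the orthonormal matrix are my own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
m, N, shift = 5, 128, 16

# periodic band-limited noise signal x(t) in R^m (LP-filtered noise)
spec = np.zeros((m, N), dtype=complex)
k = np.arange(1, 8)
spec[:, k] = rng.standard_normal((m, 7)) + 1j * rng.standard_normal((m, 7))
spec[:, N - k] = np.conj(spec[:, k])               # Hermitian symmetry -> real signal
X = np.fft.ifft(spec, axis=1).real

A, _ = np.linalg.qr(rng.standard_normal((m, m)))   # orthonormal transformation
Y = A @ np.roll(X, -shift, axis=1)                 # y(t) = A x(t + 16)

u = (X ** 2).sum(axis=0); u -= u.mean()            # squared norms, zero mean
v = (Y ** 2).sum(axis=0); v -= v.mean()

# correlation between u(t+s) and v(t) for every cyclic shift s, via the FFT
r = np.real(np.fft.ifft(np.fft.fft(u) * np.conj(np.fft.fft(v))))
assert int(np.argmax(r)) == shift
```

Since $\mathbf{A}$ is orthonormal here, $v(t) = u(t+\Delta t)$ exactly, so the cross-correlation reduces to the autocorrelation of $u$ shifted by $\Delta t$ and peaks at the right place.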
So far so good. But what will happen if $\mathbf{A}$ is not orthogonal (it might not even be square)? An experiment is a good start.

Generate a 5D-signal $\mathbf{x}(t)$ by LP-filtering noise. Also generate a random transformation matrix $\mathbf{A}$; the eigenvalues of $\mathbf{A}^T\mathbf{A}$ are

$$(\lambda_1,\;\lambda_2,\;\lambda_3,\;\lambda_4,\;\lambda_5) = (0.005,\;0.0440,\;0.2970,\;0.5298,\;1.2750)$$

These eigenvalues are of interest; see the theory in the next section. Note that $\mathbf{A}$ is far from orthonormal. Let $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+16)$ ($t$ stands for samples). The squared norms of $\mathbf{x}$ and $\mathbf{y}$ (i.e. $u$ and $v$) are shown in fig. 6.1. The absolute values of the correlation between $u(t+s)$ and $v(t)$ as a function of $s$ are shown in fig. 6.2. The maximum value is found at time-shift 16.

Figure 6.1: Squared norms of $\mathbf{x}$ (left) and $\mathbf{y}$ (right)

Figure 6.2: $|\mathrm{corr}(u(t+s),v(t))|$ vs. $s$

It worked even though $\mathbf{A}$ was not orthonormal! Why? When does it not work?
6.2 An ounce of theory #3

Ignore the noise. If the noise and the signals are uncorrelated and the variance of the noise is constant in time it will only affect the mean of the signal norms.

Assume $\mathbf{y}(t) = \mathbf{A}\mathbf{x}(t+\Delta t)$ where $\mathbf{x}(t)\in\mathbb{R}^m$ and $\mathbf{y}(t)\in\mathbb{R}^n$. The squared norm is $v(t) = \mathbf{x}^T(t+\Delta t)\mathbf{A}^T\mathbf{A}\mathbf{x}(t+\Delta t)$. Since $\mathbf{A}^T\mathbf{A}$ is symmetric there are matrices $\mathbf{U}$, orthonormal, and

$$\mathbf{D} = \begin{pmatrix}\lambda_1 & 0 & \ldots & 0\\ 0 & \lambda_2 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & \lambda_m\end{pmatrix}$$

such that $\mathbf{A}^T\mathbf{A} = \mathbf{U}^T\mathbf{D}\mathbf{U}$ (the spectral theorem of linear algebra).

Let $\mathbf{z}(t) = (z_1(t),\ldots,z_m(t))^T = \mathbf{U}\mathbf{x}(t)$. We then have

$$u(t) = \mathbf{x}^T(t)\mathbf{x}(t) = \mathbf{x}^T(t)\mathbf{U}^T\mathbf{U}\mathbf{x}(t) = \mathbf{z}^T(t)\mathbf{z}(t)$$
$$v(t) = \mathbf{x}^T(t+\Delta t)\mathbf{U}^T\mathbf{D}\mathbf{U}\mathbf{x}(t+\Delta t) = \mathbf{z}^T(t+\Delta t)\mathbf{D}\mathbf{z}(t+\Delta t)$$

This can be rewritten as

$$u(t) = z_1^2(t) + z_2^2(t) + \ldots + z_m^2(t) = \begin{pmatrix}1 & 1 & \ldots & 1\end{pmatrix}\begin{pmatrix}z_1^2(t)\\ z_2^2(t)\\ \vdots\\ z_m^2(t)\end{pmatrix}$$

$$v(t) = \lambda_1 z_1^2(t+\Delta t) + \lambda_2 z_2^2(t+\Delta t) + \ldots + \lambda_m z_m^2(t+\Delta t) = \begin{pmatrix}\lambda_1 & \lambda_2 & \ldots & \lambda_m\end{pmatrix}\begin{pmatrix}z_1^2(t+\Delta t)\\ z_2^2(t+\Delta t)\\ \vdots\\ z_m^2(t+\Delta t)\end{pmatrix}$$

Hence $u$ and $v$ can be seen as two different projections of the signal

$$\mathbf{s}(t) = \begin{pmatrix}z_1^2(t)\\ z_2^2(t)\\ \vdots\\ z_m^2(t)\end{pmatrix}$$

So, if the transformation $\mathbf{A}$ is fairly isometric, i.e. the eigenvalues $\lambda_i$ are of the same order of magnitude, we should be able to determine the time-shift $\Delta t$ from $u(t)$ and $v(t)$.
But the matrix $\mathbf{A}$ in the experiment in the previous section was not isometric. Why did this method still work?

Assume for a minute that $\Delta t = 0$. This will not restrict the following discussion.

A projection of a signal can be thought of as projecting the frequencies separately and then adding the results together. As said in section 4.2, each frequency is simply an ellipse in some 2D-subspace. A projection of a frequency will give a one-dimensional frequency with a phase that depends on the projection. Two different projections will give two different phases. (Different time-shifts and equal projections will also give different phases, but this is ignored here.) Hence, if there were two different projections, each frequency will give an incorrect contribution to the correlation above; they will 'pull' the convolution peak away from the right time-shift. But if there are many frequencies and they are evenly spread in space it might still work. Then they will each indicate a too high or too low time-shift, but when added together we will see a peak near the right time-shift.

Also notice: all eigenvalues $\lambda_i$ are non-negative (because $\mathbf{A}^T\mathbf{A}$ is positive semidefinite). The projections $(1\;1\;\ldots\;1)$ and $(\lambda_1\;\lambda_2\;\ldots\;\lambda_m)$ are therefore in the same 'multidimensional quadrant'. They will therefore never totally contradict each other, even if the eigenvalues differ in size.
Remember the example in sec. 6.1 and look at fig. 6.3. The left figure shows the convolution (absolute value) between the squared norms of the signals' lowest frequency. The peak is shifted a little to the left. Next the second lowest frequency is added and the squared norms are convolved again. Now the global peak indicates a slightly too high time-shift. Finally, when the third lowest frequency is added, the peak is found at the right time-shift (16) and remains there when the rest of the frequencies are added.

Figure 6.3: Absolute value of the convolution between the norms (with zero mean) of the signals: lowest frequency (left), two lowest frequencies (middle), three lowest frequencies (right)
6.3 Discussion

A case where this method does not work is the example with the 6D-signals in chapter 4 (sec. 4.1.2). The correlation is shown in fig. 6.4. We have a local maximum at time-shift 16 but the global maximum is somewhere else. (It is not always certain that there even is a maximum at the right time-shift.) The global maximum is not much larger than the local ones, though, and that could be an indication that the method did not work.

Figure 6.4: Absolute value of the correlation between the squared norms of the 6D-signals in sec. 4.1.2 as a function of time-shift

The eigenvalues of $\mathbf{A}^T\mathbf{A}$ are here

$$(0.0030,\;0.0100,\;0.3696,\;0.7575,\;0.9036,\;8.4344)$$

They are not of the same order of magnitude, and consequently the method is less likely to succeed.
The method discussed in this chapter is most reliable if the transformation matrix $\mathbf{A}$ is fairly isometric. If that is not the case it can still work if the signals are multifrequent and spread out in space. The motivation in the previous section is, however, rather 'ad hoc', and the method should be evaluated for each specific practical use. The method did not work in the example above, but on the other hand we did not get a distinct global maximum, and this could serve as an indication of failure telling us to try something else.
The method is simpler than the ones described in the previous chapters and can be worth trying before using anything more complicated.
Round-up
7.1 Summary and comparison

The idea throughout the whole thesis has been to first locate the time-shift $\Delta t$ and then find the maximum canonical correlation at that time-shift (or test all time-shifts in a neighborhood). In each of the previous three chapters I have presented a suggestion for how to approach the problem of finding the time-shift. They are presented once again below as block schemes, together with complexity estimates.

Assume $\mathbf{x}(t)$ and $\mathbf{y}(t)$ are both $m$-dimensional and that we have $N$ sample points of each signal.
1. Project $\mathbf{x}$, $\mathbf{y}$ onto suitable subspaces (chapter 4). Assume the subspaces to be $p$-dimensional (i.e. calculate $p/2$ Fourier coefficients).

$$\mathbf{x}(t) \xrightarrow{\ \mathcal{F}\ } \{c_k\} \xrightarrow{\ \text{projection}\ } \mathbf{x}_P(t), \qquad \mathbf{y}(t) \xrightarrow{\ \mathcal{F}\ } \{d_k\} \xrightarrow{\ \text{projection}\ } \mathbf{y}_P(t), \qquad \operatorname{argmax}_s(\mathrm{trace}(\mathbf{M}_{x_P s})) \rightarrow s$$

Costs: $O(pmN)$, $O(pmN)$ and $O(p^2 N\log N)$. Complexity: $O(pmN)$ (assuming $p$ small, so that $m > p\log N$).

2. LP-filtered canonical correlation (chapter 5).

$$\mathbf{x}(t) \rightarrow \text{filter} \xrightarrow{\ \mathcal{F}\ } \operatorname{argmax}_{\mathbf{w}_x}\!\left(\frac{\mathbf{w}_x^T\mathbf{C}_{x_F x_F}\mathbf{w}_x}{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x}\right) \rightarrow \mathbf{w}_x \rightarrow \tilde{x}(t) = \mathbf{w}_x^T\mathbf{x}(t)$$

and similarly $\operatorname{argmax}_{\mathbf{w}_y}\!\left(\frac{\mathbf{w}_y^T\mathbf{C}_{y_F y_F}\mathbf{w}_y}{\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}\right) \rightarrow \mathbf{w}_y \rightarrow \tilde{y}(t) = \mathbf{w}_y^T\mathbf{y}(t)$, followed by $\operatorname{argmax}_s|\mathrm{corr}(\tilde{x}_s,\tilde{y})| \rightarrow s$.

Costs: $O(mN\log N)$, $O(mN)$, $O(mN)$ and $O(N\log N)$. Complexity: $O(mN\log N)$.

3. Signal norms (chapter 6).

$$\mathbf{x}(t) \rightarrow \|\mathbf{x}(t)\|^2 = u(t), \qquad \mathbf{y}(t) \rightarrow \|\mathbf{y}(t)\|^2 = v(t), \qquad \operatorname{argmax}_s(|\mathrm{corr}(u_s,v)|) \rightarrow s$$

Costs: $O(mN)$ and $O(N\log N)$. Complexity: $O(mN)$.

In the complexity estimations I have assumed a fairly high dimensionality $m$, so that $m > \log N$.
As a comparison, let us calculate the complexity of performing an exhaustive test of all time-shifts like we did in chapter 3. Using the gradient search method for each time-shift gives $O(mN^2)$. Another method is to first calculate the covariance matrix $\mathbf{C}_{x_s y}$ for each time-shift $s$. This can be done by correlating each element of the matrix separately with the aid of the FFT (cost: $O(m^2 N\log N)$). Then calculate $\mathbf{M}_{x_s}$ defined in chapter 3, for each time-shift $s$ (cost: $O(m^2 N)$). The time-shift can then be found by maximizing $\mathrm{trace}(\mathbf{M}_{x_s})$ with respect to $s$. Total complexity: $O(m^2 N\log N)$.
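The FFT-based computation of $\mathbf{C}_{x_s y}$ for all shifts at once can be sketched as follows (my own NumPy formulation; each of the $m\cdot p$ matrix elements is one circular cross-correlation):

```python
import numpy as np

def cross_cov_all_shifts(X, Y):
    """C[s] = (1/N) sum_t x(t+s) y(t)^T for every cyclic shift s,
    computed element-wise with the FFT in O(m^2 N log N)."""
    N = X.shape[1]
    FX = np.fft.fft(X, axis=1)
    FY = np.fft.fft(Y, axis=1)
    # element (i, j): ifft(FX_i * conj(FY_j)) gives sum_t x_i(t+s) y_j(t)
    C = np.real(np.fft.ifft(FX[:, None, :] * np.conj(FY[None, :, :]), axis=2)) / N
    return np.transpose(C, (2, 0, 1))      # shape (N, m, p)
```

The direct alternative, `np.roll(X, -s, axis=1) @ Y.T / N` for every $s$, costs $O(m^2N^2)$ instead, which is what the FFT formulation avoids.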
As mentioned before, methods 1 and 3 are not invariant to linear transformations. I still present them because they might be successful in certain applications, and they also reveal some of the problems we are faced with when dealing with multidimensional signals. The second method is invariant to linear transformations, but the projection direction for each signal is found without concern for the other signal. This makes the method more sensitive to noise than if we somehow had considered the mutual information between the signals, like we do in for instance canonical correlation (similar arguments can be made for the other two methods).
7.2 Future work
I have only occasionally mentioned the influence of noise on the suggested methods. This could be investigated more thoroughly, for instance what kind of filter should be used in method 2 in order to minimize the influence of noise. The test signals used in this thesis were all home-made. The next step could be to try the methods in a practical application. This might reveal some new insights into the problems (like the ancient Greeks I have only philosophized, without testing my theories ...).

Before I end this thesis I briefly want to mention an idea that recently came up (suggested by my examiner) which I have not had time to look deeper into.
It is based on LP-filtered canonical correlation (chapter 5) and is an attempt to incorporate the time-shift into the gradient search method described in chapter 2. The idea is to maximize

$$\rho_F(\mathbf{w}_x,\mathbf{w}_y,s) = \frac{\mathbf{w}_x^T\mathbf{C}_{x_{Fs}y_F}\mathbf{w}_y}{\sqrt{\mathbf{w}_x^T\mathbf{C}_{xx}\mathbf{w}_x\;\mathbf{w}_y^T\mathbf{C}_{yy}\mathbf{w}_y}} \quad (7.1)$$

with respect to $\mathbf{w}_x$, $\mathbf{w}_y$ and $s$, using some gradient search similar to the one described in chapter 2 (with the variable $s$ added to the gradient search).
Step one of the idea is to start by letting $\mathbf{x}_F$ and $\mathbf{y}_F$ be the lowest frequency components of $\mathbf{x}$ and $\mathbf{y}$ respectively. Maximizing $\rho_F$ will give a certain solution $(\mathbf{w}_x^{(1)},\mathbf{w}_y^{(1)},s^{(1)})$. The next step is to add the second lowest frequency component of $\mathbf{x}$ and $\mathbf{y}$, i.e. letting $\mathbf{x}_F$ and $\mathbf{y}_F$ be the two lowest frequency components of $\mathbf{x}$ and $\mathbf{y}$. Now we use $(\mathbf{w}_x^{(1)},\mathbf{w}_y^{(1)},s^{(1)})$ as the starting point in a gradient search on the new $\rho_F$. Maximizing the new $\rho_F$ will give a solution $(\mathbf{w}_x^{(2)},\mathbf{w}_y^{(2)},s^{(2)})$. Continue by adding more and more frequency components, so that in step $i$ we let $\mathbf{x}_F$ and $\mathbf{y}_F$ contain the $i$ lowest frequency components of $\mathbf{x}$ and $\mathbf{y}$, and maximize $\rho_F$ starting from the previous maximum point $(\mathbf{w}_x^{(i-1)},\mathbf{w}_y^{(i-1)},s^{(i-1)})$ found in step $i-1$.

The thought is that for certain signal spectra (maybe for instance spectra of type $1/\omega^2$) we can be sure that by starting at the point $(\mathbf{w}_x^{(i-1)},\mathbf{w}_y^{(i-1)},s^{(i-1)})$ in the gradient search in step $i$ we will end up in the global maximum of $\rho_F$ (i.e. $(\mathbf{w}_x^{(i-1)},\mathbf{w}_y^{(i-1)},s^{(i-1)})$ is in some way closer to the global maximum than to any local maximum). If this holds for all $i$ we can be sure to finally end up in the global maximum of $\rho$ when letting $i\to\infty$.

I mention this idea because I find it interesting and because it (if it works) is a way to include the time-shift $s$ directly into the gradient search.
Proof of theorem 4.1
Assumptions: N spatially independent frequencies and 2D-frequency subspaces.
Our mission is to show that for every time-shift $\Delta t$ there is an equivalent linear transformation $\mathbf{A}_{\Delta t}$ such that $\mathbf{x}(t+\Delta t) = \mathbf{A}_{\Delta t}\mathbf{x}(t)$.

Start with the Fourier series expansion of $\mathbf{x}(t)\in\mathbb{R}^m$ (assuming zero DC-component):

$$\mathbf{x}(t) = \sum_{k=-N}^{N}\mathbf{c}_k e^{ik\omega_1 t} = \sum_{k=1}^{N} 2\,\Re\{\mathbf{c}_k e^{ik\omega_1 t}\} = \sum_{k=1}^{N}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1 t\\ \sin k\omega_1 t\end{pmatrix} \quad (A.1)$$

where $\mathbf{F}_k = 2\begin{pmatrix}\Re\{\mathbf{c}_k\} & -\Im\{\mathbf{c}_k\}\end{pmatrix}$. The number of terms in the sum is finite since the frequencies are by assumption spatially independent and the signal space is of finite dimension ($m$).

Let $\mathbf{F} = (\mathbf{F}_1\,\mathbf{F}_2\ldots\mathbf{F}_N)$. If $\mathbf{F}$ is an $m\times m$ matrix we can let the columns of $\mathbf{F}$ be a basis for $\mathbb{R}^m$; otherwise we can fill out with a set of vectors to a basis of $\mathbb{R}^m$. Let the columns of the matrix $\mathbf{B}$ represent the basis, i.e.

$$\mathbf{B} = (\mathbf{F}\;\mathbf{G})$$

where the columns of $\mathbf{G}$ are the vectors needed to fill out $\mathbf{F}$ to a basis. We can transform this basis to the unit basis $\mathbf{E}$ by the transformation matrix $\mathbf{B}^{-1}$: $\mathbf{E} = \mathbf{B}^{-1}\mathbf{B}$. This means that

$$\mathbf{B}^{-1}\mathbf{F}_k = (\mathbf{e}_{2k-1}\;\mathbf{e}_{2k})$$

where $\mathbf{e}_{2k-1}$ and $\mathbf{e}_{2k}$ are the $(2k-1)$:th and $2k$:th unit vectors. In this basis a time shift of the $k$:th frequency can be described by a matrix $\mathbf{A}_k$ with a structure like the one in example 4.5 (eq. 4.5), i.e. a subspace rotation.
$\mathbf{A}_k$ will only affect the $k$:th frequency, and if we sum all the $\mathbf{A}_k$:s into one matrix $\mathbf{A}$ we get

$$\mathbf{A} = \begin{pmatrix}\mathbf{A}_1 & \ldots & \mathbf{0} & \mathbf{0}\\ \vdots & \ddots & \vdots & \vdots\\ \mathbf{0} & \ldots & \mathbf{A}_N & \mathbf{0}\\ \mathbf{0} & \ldots & \mathbf{0} & \mathbf{0}\end{pmatrix} = \begin{pmatrix}\cos\omega_1\Delta t & -\sin\omega_1\Delta t & \ldots & 0 & 0 & 0 & \ldots & 0\\ \sin\omega_1\Delta t & \cos\omega_1\Delta t & \ldots & 0 & 0 & 0 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & \ldots & \cos N\omega_1\Delta t & -\sin N\omega_1\Delta t & 0 & \ldots & 0\\ 0 & 0 & \ldots & \sin N\omega_1\Delta t & \cos N\omega_1\Delta t & 0 & \ldots & 0\\ 0 & 0 & \ldots & 0 & 0 & 0 & \ldots & 0\\ \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \ldots & 0 & 0 & 0 & \ldots & 0\end{pmatrix} \quad (A.2)$$

So finally a time shift $\Delta t$ can be written

$$\mathbf{x}(t+\Delta t) = \sum_{k=1}^{N}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1(t+\Delta t)\\ \sin k\omega_1(t+\Delta t)\end{pmatrix} = \mathbf{B}\sum_{k=1}^{N}\mathbf{B}^{-1}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1(t+\Delta t)\\ \sin k\omega_1(t+\Delta t)\end{pmatrix} =$$
$$= \mathbf{B}\sum_{k=1}^{N}(\mathbf{e}_{2k-1}\;\mathbf{e}_{2k})\begin{pmatrix}\cos k\omega_1(t+\Delta t)\\ \sin k\omega_1(t+\Delta t)\end{pmatrix} = \mathbf{B}\sum_{k=1}^{N}\mathbf{A}_k(\mathbf{e}_{2k-1}\;\mathbf{e}_{2k})\begin{pmatrix}\cos k\omega_1 t\\ \sin k\omega_1 t\end{pmatrix} =$$
$$= \mathbf{B}\sum_{k=1}^{N}\mathbf{A}_k\mathbf{B}^{-1}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1 t\\ \sin k\omega_1 t\end{pmatrix} = \mathbf{B}(\mathbf{A}_1 + \ldots + \mathbf{A}_N)\sum_{k=1}^{N}\mathbf{B}^{-1}\mathbf{F}_k\begin{pmatrix}\cos k\omega_1 t\\ \sin k\omega_1 t\end{pmatrix} = \mathbf{B}\mathbf{A}\mathbf{B}^{-1}\mathbf{x}(t) \quad (A.3)$$

and we have found the linear transformation $\mathbf{A}_{\Delta t} = \mathbf{B}\mathbf{A}\mathbf{B}^{-1}$.