
Passive Aircraft Altitude Estimation using Computer Vision



Linköping Studies in Science and Technology

Thesis No. 847

Passive Aircraft Altitude Estimation using Computer Vision

Anders Moe

LIU-TEK-LIC-2000:43
Department of Electrical Engineering

Linköpings universitet, SE-581 83 Linköping, Sweden
Linköping, September 2000


Abstract

This thesis presents a number of methods to estimate 3D structures with a single translating camera. The camera is assumed to be calibrated and to have a known translation and rotation.

Applications for aircraft altitude estimation and ground structure estimation ahead of the aircraft are discussed. The idea is to mount a camera on the aircraft and use the motion estimates obtained in the inertial navigation system. One reason for this arrangement is to make the aircraft more passive, in comparison to conventional radar based altitude estimation.

Two groups of methods are considered, optical flow based and region tracking based. Both groups have advantages and drawbacks.

Two methods to estimate the optical flow are presented. The accuracy of the estimated ground structure is increased by varying the temporal distance between the frames used in the optical flow estimation algorithms.

Four region tracking algorithms are presented. Two of them use canonical correlation, and the other two are based on the sum of squared differences and on complex correlation, respectively.

The depth estimates are then temporally filtered using weighted least squares or a Kalman filter.

A simple estimation of the computational complexity and memory requirements for the algorithms is presented, to aid estimation of the hardware requirements.

Tests on real flight sequences are performed, showing that the aircraft altitude can be estimated with good accuracy.


Acknowledgements

I would like to thank all the people who have helped and supported me, and especially:

All past and present colleagues and friends at the Computer Vision Laboratory for creating a great environment to work in.

My supervisor, Professor Gösta Granlund, for giving me the opportunity to work here.

Mats Andersson, my assistant supervisor, for giving comments and ideas on my work, and for helping me with all kinds of problems.

Per-Erik Forssén for reading, correcting and commenting on the manuscript.

Johan Wiklund for keeping the computers and other technical equipment in excellent condition.

Anders Erlandsson and Per-Erik Lilja at SAAB Aerospace in Linköping for providing test sequences and other information.

NFFP (Nationellt Flyg Forsknings Program) for funding the research presented in this thesis.


Contents

1 Introduction
  1.1 Motivation
  1.2 Contributions
  1.3 Thesis outline
  1.4 Notations

2 3D structure estimation
  2.1 Introduction
  2.2 Structure from known motion
    2.2.1 Coordinate transformations
  2.3 Error mechanisms
  2.4 Optical flow or tracking?

3 Optical flow estimation
  3.1 Introduction
  3.2 Flow constraint extraction
  3.3 Phase-based method
    3.3.1 Introduction
    3.3.2 Phase estimation
  3.4 Phase vs. intensity gradient method
  3.5 Confidence measures

4 Flow and displacement constraints
  4.1 Introduction
  4.2 Optical flow constraints
    4.2.1 Motion models
  4.3 Directional flow field
  4.4 Outlier rejection
  4.5 Complex correlation constraints

5 Region tracking
  5.1 Introduction
  5.2 Intensity correlation based method
  5.3 Complex correlation based method
  5.4 Canonical correlation based method
    5.4.1 Introduction
    5.4.2 Canonical correlation
    5.4.3 Quadrature filter based
    5.4.4 cos² filter based
  5.5 Some test results
  5.6 Image deformation
  5.7 Confidence measures
  5.8 Choosing region size
  5.9 Region selection
    5.9.1 When to select a new region
  5.10 Object tracking

6 Temporal filtering
  6.1 Introduction
  6.2 Weighted least squares
  6.3 Kalman filter
    6.3.1 Depth estimation from optical flow
    6.3.2 Depth estimation from region tracking
  6.4 Error analysis
  6.5 Confidence measures

7 Computational complexity and memory requirement
  7.1 Introduction
  7.2 Optical flow based methods
    7.2.1 Intensity gradient based optical flow
    7.2.2 Phase based optical flow
  7.3 Region tracking based methods
    7.3.1 Sum of squared difference
    7.3.2 Complex correlation
    7.3.3 Canonical correlation method using quadrature filters
    7.3.4 Canonical correlation method using cos² filters
    7.3.5 Region selection
    7.3.6 Image warping
  7.4 Sparse sampling

8 Experiments
  8.1 Introduction
  8.2 Test setups
  8.3 Controlled test sequence
  8.4 Real flight sequences
    8.4.1 Sea sequence
    8.4.2 Church sequence
    8.4.3 Airport sequence
    8.4.4 Runway sequence
    8.4.5 Maneuver sequence
  8.5 Conclusions

9 Summary
  9.1 Summary and discussion
  9.2 Future development
    9.2.1 Optical flow
    9.2.2 Tracking


Chapter 1

Introduction

1.1 Motivation

In this licentiate thesis, methods to estimate 3D structure from images captured with a camera with known translation and rotation are considered. The camera is assumed to be calibrated. Both optical flow and region tracking based methods are considered.

The work presented here is done within a joint venture project between SAAB Aerospace in Linköping and the Computer Vision Laboratory at Linköping University. The project aims at replacing the radar altitude meter with a camera, see figure 1.1. The purpose is to make the aircraft more passive, thereby reducing the risk of detection. This will naturally only work when the visibility of the ground is reasonably good. The camera translation and rotation are obtained from the aircraft's inertial navigation system. A further goal of this project is to estimate the 3D structure ahead of the aircraft and possibly to detect obstacles [4] [5].

Figure 1.1: Camera mounted to estimate 3D structure ahead of the aircraft.

Since this project also aims at determining the possibility to implement such a system in a not too distant future (about 10 years ahead), the computational complexity is of great importance. The memory consumption has also been considered. This has resulted in the exclusion of methods of high computational complexity and/or high memory consumption at an early stage.

The robustness of the method is of even greater importance since it might be dangerous to provide the pilot with wrong altitude estimates.

Estimation of 3D structure is also useful in other applications like autonomous robots, where the robot can for example use a 3D model of the world to be able to physically interact with it [1] [2] [3]. Another application is reverse CAD model generation [6].

1.2 Contributions

The main contribution is considered to be the canonical correlation based displacement estimation method that uses cos² filters, presented in section 5.4.4. Some other minor contributions are:

• To choose the temporal distance between the frames in the optical flow estimation algorithms depending on the aircraft altitude and motion. (Section 3.2)

• The use of plane constraints instead of line constraints on the displacement. (Section 4.5)

• To warp the tracked regions depending on the obtained depth estimates and the known motion, to improve the tracking. (Section 5.6)

• Use knowledge about the camera motion in the motion model used when computing the optical flow. (Section 4.2.1)

1.3 Thesis outline

Chapter 2 gives an introduction to the problem of estimating 3D structure, with special reference to the structure from known motion problem.

Chapter 3 presents some methods to obtain constraints on the image velocity. Methods to extract the velocity or displacement from the constraints are presented in chapter 4. Some motion models are introduced.

Chapter 5 presents some region tracking algorithms, estimating the positions of the regions with sub-pixel accuracy.

Temporal filtering of the depth estimates is introduced in chapter 6. Weighted least squares and Kalman filters are considered.

Chapter 7 gives some estimates of the computational complexity and memory requirements for the methods presented.

In chapter 8 some results are presented.

Finally in chapter 9 a short summary is given together with some suggestions on future developments.

1.4 Notations

Uppercase letters in boldface, $\mathbf{A}$, are used for matrices and lowercase letters in boldface, $\mathbf{a}$, are used for vectors. Vectors are always column vectors. The transpose with conjugate is denoted $\mathbf{A}'$, and without conjugate $\mathbf{A}^T$. Normalized vectors are denoted with a hat, $\hat{\mathbf{a}}$.


Chapter 2

3D structure estimation

2.1 Introduction

There are several ways to estimate 3D structure, and which one to choose often depends on the application. Some methods are:

• Laser or radar range meter.

• Range camera using triangulation. A light beam is projected on the ground.

• Ultrasound range estimation.

• Out-of-focus blur [11].

• Stereo vision [9] (disparity estimation)

• Structure from unknown motion [12] [10]. Note that this only gives the structure up to a scaling factor.

• Structure from known motion [7] [8].

Since the system should be passive, the first three methods are excluded. The out-of-focus method is not considered to give good enough estimates. Stereo vision might work if the separation between the cameras is large enough; in our case this would involve placing the cameras on each wing tip. The problem with this approach is that the wings are not rigid, which makes the disparity estimation hard. Structure from unknown motion is not considered since it is computationally demanding and the camera motion is known with quite good accuracy from the already present inertial system. The selected method is thus structure from known motion. Observe that some of the methods given above can be combined, for example disparity estimation and structure from known/unknown motion. This will increase the accuracy and robustness of the system, see [13].

2.2 Structure from known motion

We assume a pin-hole camera, and place a coordinate system with the origin at the focal point of the camera and the z axis parallel to the normal of the image plane, see figure 2.1.

Figure 2.1: Camera geometry.

Under perspective projection a scene point $(X, Y, Z)$ is projected onto the image plane $(x, y)$ with

$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{f}{Z}\begin{pmatrix} X \\ Y \end{pmatrix} \qquad (2.1)$$

where $f$ is the focal length. If the focal length is constant, the image velocity can be found by differentiating equation 2.1 with respect to time (only the x part is shown)

$$\dot{x} = \frac{f}{Z}\dot{X} - \frac{x}{Z}\dot{Z} \qquad (2.2)$$

where $\dot{X}$, $\dot{Z}$ are the time derivatives of $X$, $Z$ respectively. These can be separated into two parts, the part caused by the translation of the camera, $T_x$, and the part caused by the rotation, $T_{rx}$, respectively

$$\dot{X} = T_x + T_{rx} = T_x + (\boldsymbol{\Omega} \times \mathbf{r}) \cdot \hat{\mathbf{x}} \qquad (2.3)$$

where $\boldsymbol{\Omega}$ is the rotation of the camera, $\times$ is the cross product and $\mathbf{r} = (X, Y, Z)^T$. This gives

$$\dot{X} = T_x + \left(Z\Omega_y - \frac{Zy}{f}\Omega_z\right) \qquad (2.4)$$

This finally gives

$$\mathbf{v}(x, y) = \frac{1}{Z}\begin{pmatrix} f & 0 & -x \\ 0 & f & -y \end{pmatrix}\mathbf{T} + \begin{pmatrix} -\frac{xy}{f} & f + \frac{x^2}{f} & -y \\ -(f + \frac{y^2}{f}) & \frac{xy}{f} & x \end{pmatrix}\boldsymbol{\Omega} \qquad (2.5)$$


where T is the instantaneous translation and Ω the instantaneous rotation. Here we can see that if the translation component is known we may compute the scene structure from the estimated image velocity field. Note that the velocity caused by the rotation is an approximation if the above expression is used with time discrete variables.

Another way of estimating the 3D coordinates is to use triangulation, see figure 2.2 where the translation T , rotation R, and the angles α and β are known. From this, the distance r can be estimated with triangulation.

Figure 2.2: Triangulation gives r.

The law of sines gives

$$r = T\frac{\sin(\alpha)}{\sin(\gamma)} \qquad (2.6)$$

where the angles are found from the aircraft rotation, the camera mounting and the image coordinates in the new and old image. The first method is appropriate when using optical flow, while the second is more appropriate when using region tracking.
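As a rough illustration (not code from the thesis), eq. (2.6) can be evaluated directly; the helper name and the assumption that the third angle follows from the angle sum of the triangle in figure 2.2 are mine:

```python
import numpy as np

def triangulate_range(T, alpha, beta):
    """Range to a ground point from two views, eq. (2.6): r = T*sin(alpha)/sin(gamma).

    T     : length of the known translation between the two camera positions
    alpha : viewing angle at the new position (opposite the side r in figure 2.2)
    beta  : viewing angle at the old position
    The third angle gamma is assumed to follow from the angle sum of the triangle.
    """
    gamma = np.pi - alpha - beta
    return T * np.sin(alpha) / np.sin(gamma)

# Example: 50 m translation and viewing angles of 35 and 130 degrees.
print(triangulate_range(50.0, np.radians(35.0), np.radians(130.0)))
```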

2.2.1 Coordinate transformations

Three different coordinate systems are used: the ground coordinate system with x to the north, y to the east and z towards the earth center; the aircraft coordinate system with x in the direction of the aircraft, y to the right and z down; and the camera coordinate system given in figure 2.1. Since the mounting of the camera is known and the attitude of the aircraft is known from the inertial system, the transformations between the coordinate systems are known. The transformation matrices are $\mathbf{M}_{ga}$ and $\mathbf{M}_{ac}$, where $\mathbf{M}_{ga}$ is the transformation between ground and aircraft coordinates and $\mathbf{M}_{ac}$ is the transformation between aircraft and camera coordinates.
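A minimal sketch of how the two known transformations could be chained; the yaw-only rotations and the numeric angles are placeholders, not values from the thesis:

```python
import numpy as np

def rot_z(angle):
    """Rotation about the z axis; a real system would compose roll, pitch and yaw."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# M_ga: ground -> aircraft, from the attitude given by the inertial system (placeholder angle).
M_ga = rot_z(np.radians(12.0))
# M_ac: aircraft -> camera, from the known camera mounting (placeholder angle).
M_ac = rot_z(np.radians(-3.0))

# A direction expressed in ground coordinates, transformed to camera coordinates.
v_ground = np.array([1.0, 0.0, 0.0])
v_camera = M_ac @ M_ga @ v_ground
print(v_camera)
```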

2.3 Error mechanisms


• To determine the possible performance given an error in the measurements of the translation and rotation.

• To determine the necessary performance of the optical flow and the displacement estimates, to get a certain accuracy in the depth (Z) estimates.

• To determine when the depth estimates are reliable (useful).

If only the x-direction is considered, the depth estimate can be written as

$$Z = \frac{fT_x - xT_z}{v_x - v_{rx}} \qquad (2.7)$$

where $v_{rx}$ is the velocity induced by the rotation. From this we can see that the error in the depth is linearly dependent on the error in the measured translation $\mathbf{T}$, so large errors caused by this will probably not occur. On the other hand, the error caused by errors in the measured rotation and in the estimated velocity in the image is nonlinear. As an example we can see how large a rotation around the y-axis is needed to get a velocity of 1 pixel/frame in the x-direction: looking in the optical center and assuming a focal length of 2000 pixels gives a rotation of about 0.5 mrad/frame, which is about 0.03 degrees/frame. From this we can see that the depth estimates will be very sensitive to errors in the rotation estimates. To reduce the sensitivity we would like to have large velocities in the image which are caused by translation of the camera. To obtain as large image velocities as possible given a translation $\mathbf{T}$ of the camera, one would like to have the translation vector $\mathbf{T}$ in the x, y plane. In this application this implies that the camera should point straight down from the aircraft. However, this would make it impossible to estimate the 3D structure sufficiently far ahead of the aircraft (and also to detect obstacles), so a compromise has to be made. Some more examples of error mechanisms are given in chapter 6. A more thorough investigation of this topic has been done by SAAB Aerospace.
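The rotation-sensitivity example above can be checked numerically; this small calculation assumes only the stated focal length of 2000 pixels:

```python
import numpy as np

f = 2000.0      # focal length in pixels, the value used in the example above
v_wanted = 1.0  # image velocity in pixels/frame at the optical center

# At the optical center (x = y = 0) a pure rotation Omega_y about the y axis gives
# v_x = f * Omega_y according to eq. (2.5), so the rotation needed is:
omega_y = v_wanted / f
print(omega_y * 1e3, "mrad/frame")           # -> 0.5 mrad/frame
print(np.degrees(omega_y), "degrees/frame")  # -> about 0.03 degrees/frame
```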

2.4 Optical flow or tracking?

There are two main principles of image motion estimation. The first one is to estimate the velocities in the image; this is denoted optical flow. The other is to track image regions over a number of frames; this is denoted displacement estimation. Some optical flow methods are presented in chapter 3 and some region tracking methods in chapter 5. While the velocities are usually estimated in every pixel position, the displacements are usually estimated in a few selected positions, since the tracking algorithms are often computationally demanding.

So which method should one choose for this application? Some advantages with the different methods are given below.

Optical flow:

• A dense grid of height estimates is obtained.

• Suitable for detection of obstacles and moving objects.

• The amount of computations needed is the same for each frame with just a


Displacement estimation:

• More accurate depth estimates are obtained.

• The amount of computations needed can easily be decreased by selecting fewer regions.


Chapter 3

Optical flow estimation

3.1 Introduction

There are various ways to estimate the optical flow. The methods presented in this chapter are based on the methods by Lucas and Kanade [16, 17] and Fleet and Jepson [35]. These methods have achieved good results in a comparative study done by Barron et al. [34], and the resulting algorithm can be made quite efficient, which is important in this application.

3.2 Flow constraint extraction

A constraint on the motion can be obtained by assuming that some image property $G(x, y, t)$ is constant in time, $\frac{dG}{dt} = 0$. Using the chain rule gives us

$$\frac{dG}{dx}\frac{dx}{dt} + \frac{dG}{dy}\frac{dy}{dt} + \frac{dG}{dt}\frac{dt}{dt} = 0 \qquad (3.1)$$

which is the gradient constraint equation, see [14],

$$G_x v_x + G_y v_y + G_t = 0 \qquad (3.2)$$

where $(v_x, v_y)$ is the optical flow vector. The time derivative $G_t$ will here be approximated with the difference between two time instances of $G$, to make some extensions possible. This also reduces the computational complexity and the memory requirement. $G$ is usually the intensity image $I$ or some low-pass filtered version of this, but other properties can be used as well. One example is the phase, see next section. Since this only gives one constraint and we have two unknown variables, we must have at least two different properties $G$ or make some assumption about the spatial variation of the optical flow. Some solutions to the latter problem are given in chapter 4.

Some extensions to the constraint equation will now be made. Since we are not interested in the flow caused by rotations of the camera in this application, we would like to remove this part before solving the problem of finding the flow $(v_x, v_y)$ from the constraint equations. Since the rotations are known, this can be done, giving

$$G_x(v_x + v_{rx}) + G_y(v_y + v_{ry}) + G_t = 0 \qquad (3.3)$$

where

$$\begin{pmatrix} v_{rx} \\ v_{ry} \end{pmatrix} = \begin{pmatrix} -\frac{xy}{f} & f + \frac{x^2}{f} & -y \\ -(f + \frac{y^2}{f}) & \frac{xy}{f} & x \end{pmatrix}\boldsymbol{\Omega} \qquad (3.4)$$

from eq. 2.5. To handle interlaced sequences, the motion induced when using different half images is removed.

$$G_x v_x + G_y(v_y + v_i) + G_t = 0 \qquad (3.5)$$

where $v_i = 0.5, 0, -0.5$ depending on whether the time difference is taken between an odd and an even half image, two odd or two even half images, or an even and an odd half image, respectively.

The constraint can be seen as a linearization, so this will usually not work for large velocities. To make the algorithm able to handle large velocities, the image is either sub-sampled to get an initial estimate, or, if a prediction exists, this can be used instead. If a reliable depth estimate $Z$ exists, eq. 2.5 gives a good prediction. The time derivative is then estimated as $G_t(v_{cx}, v_{cy}) = G_n(x, y) - G_o(x - v_{cx}, y - v_{cy})$ so that the linearization is made near the solution. $(v_{cx}, v_{cy})$ is the estimated or predicted velocity. This needs to be compensated for in the constraint, which gives

$$G_x(v_x - v_{cx}) + G_y(v_y - v_{cy}) + G_t(v_{cx}, v_{cy}) = 0 \qquad (3.6)$$

In chapter 6 it will be made plausible that it is favorable to have quite large displacement estimates. This can be achieved by making the time differentiation between two frames with a suitable time difference, $G_t(k) = G(t) - G(t - k)$. If the velocity $(v_x, v_y)$ is assumed to be constant over a number of frames, this can be used to get several constraints, making the algorithm more robust. If the velocities between frame $t$ and frame $t - N$ should be computed, this gives the $N$ constraints

$$W(k)\left(G_x\frac{k}{N}v_x + G_y\frac{k}{N}v_y + G_t(k)\right) = 0 \qquad (3.7)$$

where $W$ is a set of weights for the different constraints and $k = 1, 2, \ldots, N$. The final constraints can now be written as

$$C^{(l)}W(k)\left(G^{(l)}_x\left(\frac{k}{N}v_x + v_{rx}(k) - v_{cx}(k)\right) + G^{(l)}_y\left(\frac{k}{N}v_y + v_{ry}(k) - v_{cy}(k) + v_i(k)\right) + G^{(l)}_t(v_{cx}(k), v_{cy}(k), k)\right) = 0 \qquad (3.8)$$

where $G^{(l)}$ are the properties used and $C^{(l)}$ is the confidence for the constraint, see section 3.5.
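A hedged sketch of how rotation-compensated gradient constraints (eqs. 3.3-3.4) could be stacked and solved for a constant velocity in a region; the function names are mine and the rotation-term signs follow eq. 2.5 as written above:

```python
import numpy as np

def rotation_flow(x, y, f, omega):
    """Rotation-induced image velocity (v_rx, v_ry) from eq. (3.4)/(2.5).
    omega = (Omega_x, Omega_y, Omega_z); signs follow eq. (2.5) as reconstructed above."""
    ox, oy, oz = omega
    vrx = -(x * y / f) * ox + (f + x**2 / f) * oy - y * oz
    vry = -(f + y**2 / f) * ox + (x * y / f) * oy + x * oz
    return vrx, vry

def flow_from_constraints(Gx, Gy, Gt, vr):
    """Least-squares flow from gradient constraints in a region, assuming one constant
    velocity in the region. Gx, Gy, Gt are derivative images of the property G over the
    region; vr = (vrx, vry) is the known rotation-induced flow (taken as scalars at the
    region centre for simplicity), compensated as in eq. (3.3)."""
    A = np.column_stack([Gx.ravel(), Gy.ravel()])
    b = -(Gt.ravel() + Gx.ravel() * vr[0] + Gy.ravel() * vr[1])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (vx, vy): the translation-induced optical flow
```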


3.3 Phase-based method

3.3.1 Introduction

The local phase is here used as $G$ because of some of the good properties it has, for example invariance to scalings of the intensity $I$ and also to changes in the DC level; for more details see [36]. The local phase is estimated by using quadrature filters. Each filter direction gives $N$ constraints, one $G^{(l)}$ for each direction. Several filters with different center frequencies can also be used to get even more constraints. However, the improvement from using $M + 1$ filters instead of $M$ is small for large values of $M$ ($M \approx 10$) if these filters have been selected in a proper way.

3.3.2 Phase estimation

As mentioned, the local phase is estimated with quadrature filters. The local phase can be seen as an approximation of the instantaneous phase. The instantaneous phase is the argument of the analytic part of the signal. In one dimension the analytic part of a signal can be written as

$$F_A(u) = 2F(u)\,\mathrm{step}(u) \qquad (3.9)$$

in the Fourier domain, where $F_A$ is the analytic part of $F$. This means that the analytic signal has no negative frequency components. The result of removing the negative frequencies can be seen in the example below. Consider the one-dimensional signal

$$f(x) = A\cos(\omega_0 x) \qquad (3.10)$$

The Fourier transform of $f$ is

$$F(\omega) = \frac{A}{2}(\delta(\omega - \omega_0) + \delta(\omega + \omega_0)) \qquad (3.11)$$

so

$$F_A(\omega) = A\,\delta(\omega - \omega_0) \qquad (3.12)$$

which gives

$$f_A(x) = Ae^{i\omega_0 x} \qquad (3.13)$$

So the instantaneous phase $p(x)$ is

$$p(x) = \arg[f_A(x)] = \omega_0 x \qquad (3.14)$$

A quadrature filter is, in one dimension, defined by

$$F(\omega) = 0 \quad \text{if } \omega \le 0 \qquad (3.15)$$

so a quadrature filter is used to estimate the instantaneous phase. An example of a one-dimensional quadrature filter is given in figure 3.1, spatial domain in the upper and Fourier domain in the lower image. The even curve is the real part and the odd the imaginary part.

Figure 3.1: Solid: real part, dashed: imaginary part.

The definition of a quadrature filter in two or more dimensions is:

$$F(\mathbf{w}) = 0 \quad \text{if } \mathbf{w}\cdot\hat{\mathbf{n}} \le 0 \qquad (3.16)$$

where $\hat{\mathbf{n}}$ is the filter direction. For more details about quadrature filters see for example [33].

Since $f_A(x + 1)f_A(x - 1)' = Be^{2i\omega_0}$, the differentiation is done by

$$d_x(x, y) = \arg(f_A(x + 1, y)\,f_A(x - 1, y)')/2 \qquad (3.17)$$

and similarly for $d_y$ and $d_t$. This removes the problems with phase wrap-arounds.
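A minimal sketch of the phase differentiation in eq. (3.17) applied to quadrature filter response images; the array layout and function name are assumptions of this illustration:

```python
import numpy as np

def phase_derivatives(qa_new, qa_old):
    """Phase differentiation from quadrature filter responses, following eq. (3.17);
    products of neighbouring responses avoid explicit phase unwrapping.
    qa_new, qa_old: complex filter response images at times t and t-k."""
    q = qa_new
    # d_x: arg(q(x+1, y) * conj(q(x-1, y))) / 2, valid away from the image border
    dx = np.angle(q[:, 2:] * np.conj(q[:, :-2])) / 2.0
    # d_y: the same construction along the other image axis
    dy = np.angle(q[2:, :] * np.conj(q[:-2, :])) / 2.0
    # d_t: phase difference between the two time instances
    dt = np.angle(qa_new * np.conj(qa_old))
    return dx, dy, dt
```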

3.4 Phase vs. intensity gradient method

The phase based algorithm has been tested on the Yosemite sequence [37], which has a known velocity field, see figure 3.2. The sky region is excluded from the error analysis since it does not have any true motion. The velocities are found from the constraints by using an affine motion model presented in chapter 4. These results are compared to the results obtained by using intensity instead of phase. Only two images are used, frames 8 and 9.

Figure 3.2: Frame 9 and estimated velocity field for the ground (and the 'correct' flow for the sky).

Two different quadrature filter directions are used (along the x and y axes), giving two constraints in each pixel. Each filter is implemented as two 1D filters with 13 complex coefficients in the filter direction and 13 real coefficients in the orthogonal direction, see figure 3.1. The resulting 2D filter is plotted in the Fourier domain in figure 3.3.

The intensity image differentiation in the intensity based method is done by first convolving both images with a Gaussian LP filter with 5 coefficients in each direction to get a better temporal derivative (this attenuates aliasing and noise sensitivity). The spatial derivatives are then estimated with $[-1, 8, 0, -8, 1]/12$ and a Gaussian LP filter with 5 coefficients in the orthogonal direction.

The error measure used is $\arccos(\mathbf{v}_{est}^T\mathbf{v}_{true})$ [34], where

$$\mathbf{v} = \frac{1}{\sqrt{v_x^2 + v_y^2 + 1}}(v_x, v_y, 1)^T.$$
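For concreteness, the angular error measure can be written as the following small helper (an illustration, not the evaluation code used in the thesis):

```python
import numpy as np

def angular_error(v_est, v_true):
    """Angular error arccos(v_est . v_true) with v = (vx, vy, 1)/sqrt(vx^2+vy^2+1), as in [34]."""
    def homog(v):
        v3 = np.array([v[0], v[1], 1.0])
        return v3 / np.linalg.norm(v3)
    c = np.clip(np.dot(homog(v_est), homog(v_true)), -1.0, 1.0)
    return np.degrees(np.arccos(c))

print(angular_error([1.0, 0.2], [1.1, 0.25]))  # error in degrees for two nearby flows
```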

The results obtained when some different region sizes are used in the affine motion model are given in table 3.1 together with some previously published results. One should be careful when comparing the different methods, since one usually gets very different results depending on the selected parameters. One can see that the results from the intensity and phase based methods change drastically with different region sizes. On the other hand, the implementations of the intensity and phase based methods are very similar, with practically only the filters and the confidence measure as changeable parameters. The phase based method seems to work better but is on the other hand quite computationally demanding compared to the intensity based method, see chapter 7.

Figure 3.3: Filter in the Fourier domain.

3.5 Confidence measures

It is very important to have confidence measures indicating the reliability of the constraints, especially since the velocity is found from the constraints by using the least squares method (see chapter 4) which is very sensitive to outliers.

The intensity based method does not need any separate confidence measure $C$, since the magnitudes of $\frac{dI}{dx}$ and $\frac{dI}{dy}$ act as confidence measures. This reduces the computations needed.

The phase based method on the other hand needs a separate confidence measure. This is because the phase is invariant to the magnitude of the signal variations, which should act as a confidence measure. The confidence measure used for the constraints obtained from a certain quadrature filter $f^{(l)}$ is

$$C^{(l)} = |q^{(l)}|^2 \qquad (3.18)$$

Technique              Average error   Standard deviation   Density
Lucas & Kanade [16]    2.80°           3.82°                35%
Fleet & Jepson [35]    2.97°           5.76°                34.1%
Black & Jepson [38]    2.29°           2.25°                100%
Ju et al. [39]         2.16°           2.00°                100%
Karlholm [16]          2.06°           1.72°                100%
Farnebäck [16]         1.14°           2.14°                100%
Intensity, 31*21       2.63°           4.08°                100%
Intensity, 15*15       2.94°           4.03°                100%
Intensity, 11*11       3.29°           4.07°                100%
Phase, 31*21           2.05°           2.00°                100%
Phase, 15*15           2.33°           2.40°                100%
Phase, 11*11           2.64°           2.68°                100%


Chapter 4

Flow and displacement constraints

4.1 Introduction

The optical flow methods described in section 3.3 and the two correlation based methods in sections 5.3 and 5.4.3 generate a small number of constraints on the motion. The gradient method generates one constraint, and the other methods generate one for each filter direction. If the number of constraints exceeds the number of sought parameters, this results in an over-determined equation system which can be solved in a number of different ways.

4.2 Optical flow constraints

Since the gradient constraint equation

$$E_x v_x + E_y v_y + E_t = 0 \qquad (4.1)$$

contains two unknown parameters $(v_x, v_y)$, we will need at least two constraints to obtain the parameters, see figure 4.1.

So in the case when we only have one constraint in each pixel (for example when only using the gradients of the image intensities), we must use constraints from a neighborhood to obtain the velocities. However, since gradients are noise sensitive, the gradient constraints will also be quite noise sensitive. So one would like to use the constraints from a neighborhood to reduce the noise sensitivity, improve the accuracy and spread the constraints from regions with good structure to regions of little structure. Another reason to use constraints from a neighborhood is that the probability of only having constraints from one one-dimensional structure is decreased, thus reducing the aperture problem. There are mainly two different methods to include the neighborhood. The first is to make some regularization of the velocity field. Horn and Schunck [14] did this by combining the gradient constraint with a global smoothness constraint. These methods have not been considered since they are solved iteratively with a quite high computational complexity and, maybe even more important, they lack a good confidence measure. The second method is to use some parameterized model to approximate the velocity in a region. This will lead to an over-determined equation system. The simplest and fastest way of solving this equation system is to use the least squares method. The largest drawback with this method is the sensitivity to outliers. Two methods which are not as sensitive are the median and the Hough transform; the drawback with these methods is the considerable increase in computational complexity. These and some other parameter estimation techniques are presented in [15]. Only the least squares method is considered because of the high computational complexity of the other methods. Instead some other efforts are made to remove the outliers, see section 4.4.

Figure 4.1: One constraint restricts the velocities to a line.

4.2.1 Motion models

Four different parameterized motion models have been considered. The models are:

• $\mathbf{v} = \mathbf{k}$ in a region, where $\mathbf{k}$ is a constant. Used by Lucas and Kanade [16, 17].

• An affine motion model

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} + \begin{pmatrix} a_0 & a_1 \\ a_2 & a_3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} \qquad (4.2)$$

The image motion generated by a planar surface under perspective projection may be approximated by this if the depth to the surface is much larger than the depth variation in the surface. See figure 4.2, where $d$ is the depth and $d_v$ the depth variation.

If this assumption is not met, an eight parameter model is needed to describe the flow:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} + \begin{pmatrix} a_0 & a_1 \\ a_2 & a_3 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} x^2 & xy \\ xy & y^2 \end{pmatrix}\begin{pmatrix} c_0 \\ c_1 \end{pmatrix} \qquad (4.3)$$

Figure 4.2: Affine transformation approximation.

For more details see [18].

• By assuming that the depth is constant in the region and that the observed region is static, and by knowing the motion of the camera, one can estimate the velocity changes in the image regions. This is done by using eq. 2.5. In one direction (x) it can be written as follows

$$v = \frac{fT_x - xT_z}{Z} \qquad (4.4)$$

where $v$ is the measured motion with the motion caused by rotations subtracted. This can then be written as

$$v = v_0\left(1 - \frac{d_x T_z}{fT_x - x_0 T_z}\right) \qquad (4.5)$$

where $v_0$ is the velocity in the center of the region and $x_0$, $d_x$ are the center location and the deviation from the center location respectively. So the wanted parameter is $v_0$.

• The next model is similar to the one above. The difference is that instead of assuming a constant depth, it assumes that the ground is a horizontal plane. In the camera coordinate system one gets

$$aX + bY + cZ = k \qquad (4.6)$$

where $a$, $b$, $c$ are calculated from the angles obtained from the inertial system and $k$ is unknown. Putting this into eq. 4.4 gives

$$v = v_0\left(1 - \frac{1}{c}\left(\frac{d_x T_z\left(c + a\frac{d_x}{f} + b\frac{d_y}{f}\right)}{fT_x - x_0 T_z} - \left(a\frac{d_x}{f} + b\frac{d_y}{f}\right)\right)\right) \qquad (4.7)$$

To use the motion models we first replace the velocities in eq. 3.2 and then estimate the new parameters. A second method is to first estimate the velocities in a small region with the first model, and then apply the model to the estimated velocities in a larger region. By doing this one reduces the number of unknown variables to be estimated at a time and obtains a confidence measure when solving the first equation system. This confidence measure can be used when solving the second equation system. However, this increases the number of equation systems to be solved.

The over-determined equation system can be written as

$$\mathbf{W}\mathbf{A}\mathbf{x} = \mathbf{W}\mathbf{b} \qquad (4.8)$$

where $\mathbf{x}$ contains the wanted parameters and $\mathbf{W}$ is a weighting matrix. The weighting matrix is used to decrease the influence of points far from the point where the parameters are calculated. The least squares solution is then

$$\mathbf{x} = (\mathbf{A}^T\mathbf{W}^T\mathbf{W}\mathbf{A})^{-1}\mathbf{A}^T\mathbf{W}^T\mathbf{W}\mathbf{b} \qquad (4.9)$$
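A minimal sketch of eqs. (4.8)-(4.9) applied to the affine model (4.2); the parameter ordering and helper name are choices made for this illustration, not the thesis implementation:

```python
import numpy as np

def affine_flow_wls(x, y, Gx, Gy, Gt, w):
    """Affine motion model (eq. 4.2) fitted to gradient constraints (eq. 3.2) with the
    weighted least-squares solution (4.8)-(4.9).
    x, y       : pixel coordinates in the region (1-D arrays)
    Gx, Gy, Gt : derivatives of the property G at those pixels
    w          : per-constraint weights (confidence and/or spatial weighting)."""
    # Substituting vx = b0 + a0*x + a1*y and vy = b1 + a2*x + a3*y into
    # Gx*vx + Gy*vy + Gt = 0 gives one linear equation per pixel in
    # p = (b0, a0, a1, b1, a2, a3).
    A = np.column_stack([Gx, Gx * x, Gx * y, Gy, Gy * x, Gy * y])
    b = -Gt
    p, *_ = np.linalg.lstsq(A * w[:, None], b * w, rcond=None)
    return p
```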

4.3 Directional flow field

If the scene is static we can calculate the direction of the rotation compensated image velocities $\mathbf{v}_t$, since the motion of the camera is known. With rotation compensated we mean that the velocity caused by rotations is removed, $\mathbf{v}_t = \mathbf{v} - \mathbf{v}_r$. The direction is found in equation 2.5. Thus

$$\mathbf{v}_t = v\frac{1}{\sqrt{(fT_x - xT_z)^2 + (fT_y - yT_z)^2}}\begin{pmatrix} fT_x - xT_z \\ fT_y - yT_z \end{pmatrix} = v\hat{\mathbf{n}} \qquad (4.10)$$

where $v$ is the magnitude of the velocity. Replacing the velocities in the gradient constraint equation 3.2 gives

$$v\begin{pmatrix} G_x \\ G_y \end{pmatrix}\cdot\hat{\mathbf{n}} + G_t + G_x v_{rx} + G_y v_{ry} = 0 \qquad (4.11)$$

So the velocity $v$ can be found from one constraint as long as $(G_x, G_y)^T$ is not orthogonal to $\hat{\mathbf{n}}$. To avoid this problem, to increase the accuracy, and to reduce the noise sensitivity, a neighboring region is still used when calculating $v$. The motion models presented above can still be used with half the number of parameters, giving a considerable speed-up.


4.4 Outlier rejection

The least squares method which was used when estimating the image velocities is very sensitive to erroneous gradient constraints. So one would like to remove these before calculating the velocities. This can to some extent be done if a prediction of the velocities is available. First the residuals R for the constraints with the predicted velocities are calculated.

$$R = G_x v_{px} + G_y v_{py} + G_t \qquad (4.12)$$

If the residual is larger than some threshold $T$ the constraint is removed. The threshold can be selected as

$$T = K\frac{k + v_p^2}{C_p}\sqrt{G_x^2 + G_y^2}$$

to allow larger residuals for large velocities and derivatives. $k$, $K$ are constants and $C_p$ is the confidence for the prediction.

If the directional flow is calculated, the removal can be done by removing the constraints with

$$|v - v_p| > K\frac{k + v_p^2}{C_p}$$

where $v$ is the magnitude of the velocity obtained from the constraint.

If an estimate of the depth $Z$ exists, a prediction is obtained from equation 2.5. If not, $v_p = 0$, or estimates from sub-sampled images can be used as predictions to remove some of the outliers (some changes to the thresholds must be done).
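A hedged sketch of the residual-based outlier rejection described above; the default values of the constants k and K are arbitrary placeholders:

```python
import numpy as np

def reject_outliers(Gx, Gy, Gt, vpx, vpy, Cp, k=0.1, K=1.0):
    """Remove gradient constraints whose residual against the predicted flow (vpx, vpy)
    exceeds the threshold of section 4.4. k and K are tuning constants, Cp is the
    confidence of the prediction; the defaults here are arbitrary placeholders."""
    R = Gx * vpx + Gy * vpy + Gt                      # residual, eq. (4.12)
    vp2 = vpx**2 + vpy**2
    T = K * (k + vp2) / Cp * np.sqrt(Gx**2 + Gy**2)   # allow more for fast, strong constraints
    return np.abs(R) <= T                             # boolean mask of constraints to keep
```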

4.5 Complex correlation constraints

From the complex correlations we obtain one constraint for each filter direction in the correlation point. Since these methods are used for tracking of image regions, the number of constraints is much smaller, and the motion models are not applicable since all the constraints are obtained in the same point. Another difference is that the constraint is not a line but a plane, or rather a surface: the phase as a function of position,

$$p = f(d_x, d_y) \qquad (4.13)$$

where $p$ is the phase and $(d_x, d_y)$ is the displacement. Apart from these constraints we have the constraint of zero phase (a plane). So we need at least two different filters to estimate the displacement; the intersection of the three planes. Two different methods to find the intersection have been considered.

The first method finds the intersection between the surfaces and the zero phase plane. This is done by, for each row in the phase mesh, looking for a zero crossing, see figure 4.3 (zero crossings caused by phase wrap-around are removed). If there is more than one crossing, the one with the highest correlation value in its closest mesh point is used. The same thing is then done for each column. The number of zero crossings for all rows together is then counted (one or zero for each row), and the same for the columns. If the count for the rows is larger than for the columns, the crossings found from the row search will be used to describe the curve generated by the crossing between the surface and the plane, and if the second count is larger the crossings from the column search will be used. This is done for each phase mesh. The number of counts will normally be smaller for the direction closest to the direction of the filter used to obtain the phase mesh. This is because the phase of the filter usually changes faster in the filter direction than in the orthogonal direction. Each curve is then linearized around the intersection point between the curve and the curve obtained from the filter with the most dissimilar direction. This gives displacement constraints in the same form as the gradient constraint equation eq. 3.2. If more than two constraints are produced, the equation system is solved with the least squares method.


Figure 4.3: Estimation of the zero crossing.

The second method finds the curves in the same manner as the first method does, but instead of linearizing the curves around the intersection point, the surfaces are linearized. This gives a plane:

$$E_x d_x + E_y d_y + p = k \qquad (4.14)$$

where $E_x$, $E_y$, $k$ are known and $(d_x, d_y)$ is the displacement. The parameter $p$ is the phase. This gives an equation system:

$$\begin{cases} C_1(E_{1x}d_x + E_{1y}d_y + p - k_1) = 0 \\ C_2(E_{2x}d_x + E_{2y}d_y + p - k_2) = 0 \\ \vdots \\ C_N(E_{Nx}d_x + E_{Ny}d_y + p - k_N) = 0 \\ \alpha p = 0 \end{cases} \qquad (4.15)$$

where $N$ is the number of filters used and $C_1, \ldots, C_N$ are their confidence estimates. $\alpha$ is the weight of the zero phase plane. This weight is given a quite large value compared to the values of the confidences for the other planes. If more than two filters are used, these plane equations will give an over-determined equation system which is solved with the least squares method. If only two filters are used (giving two planes and the zero plane), this will give the same result as the first method. However, if more than two filters are used the result will be different, giving the planes with a large slope greater influence than the planes with a small slope. This is beneficial since the zero crossing is better localized when the slope is large. This is because the linearization is made in a smaller neighborhood around the zero crossing. Tests made show that the accuracy is improved a little bit.
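The over-determined system (4.15) can be solved as an ordinary weighted least-squares problem; this sketch assumes the linearized plane parameters are already available, and the value of alpha is a placeholder:

```python
import numpy as np

def displacement_from_phase_planes(Ex, Ey, k, C, alpha=10.0):
    """Least-squares solution of the system (4.15) for the displacement (dx, dy).
    Each linearized phase surface i contributes Ex[i]*dx + Ey[i]*dy + p = k[i],
    weighted by its confidence C[i]; the row alpha*p = 0 pulls the solution
    towards the zero-phase plane. The value of alpha is a placeholder."""
    N = len(Ex)
    A = np.zeros((N + 1, 3))
    b = np.zeros(N + 1)
    A[:N, 0] = C * Ex
    A[:N, 1] = C * Ey
    A[:N, 2] = C              # coefficient of the unknown phase p
    b[:N] = C * k
    A[N, 2] = alpha           # the extra constraint alpha * p = 0
    dx, dy, p = np.linalg.lstsq(A, b, rcond=None)[0]
    return dx, dy
```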


Chapter 5

Region tracking

5.1 Introduction

As seen in chapter 6 it is favorable to have large displacements, provided that the measurement errors don’t increase very much because of this. It was further seen that this is not the case in most situations. Region tracking is one way to get large displacements by increasing the temporal distance between the matched regions. Another maybe not so obvious method is to zoom in (increase the focal length). This will however reduce the viewed area and thereby decrease the number of measurements obtained for a region moving through the image.

There are many ways to track a region; the most common methods are based on grey scale correlation type methods [19-21]. A simple implementation of this will be used as a reference method. Some other methods are feature based methods [24, 25] and phase based methods [22, 23].

Four different methods to estimate displacement are given in this chapter.

5.2 Intensity correlation based method

The first method is an intensity correlation based method which, as mentioned above, is intended to be used as a reference. One should note that this implementation is not considered to represent intensity based correlation methods in general. However, the characteristics of this implementation are the same as for many other intensity correlation based methods.

The integer position of the region is estimated by finding the position of maximal correlation. The correlations are written as

$$C(x, y) = \sum_{R} I_i(x + k, y + l)\,I_r(k, l) = (I_i(k, l) * I_r(-k, -l))(x, y) \qquad (5.1)$$

where $(I_r, I_i)$ are the image region looked for and the image respectively, and $*$ is a convolution. The results are improved by subtracting the mean intensity value from $I_r$. This will make the correlation independent of the DC level and also of changes in the DC level. A drawback with this method is that for a given image region $I_r$ there exist image regions $I_{ro}$ that have a higher correlation with $I_r$ than $I_r$ itself has. A method that avoids this problem is to use the SSD (sum of squared differences).

$$C(x, y) = \sum_{R}(I_r(k, l) - I_i(x + k, y + l))^2 \qquad (5.2)$$

This method has also produced better results in the evaluations made. Consequently the SSD is used as reference method. A drawback with this method is its sensitivity to changes in DC level.

To get sub-pixel resolution a quadratic polynomial is fitted to the correlation $C$ in a $3\times 3$ region around the integer position with maximal correlation. The position of the polynomial maximum is then used as the position estimate for the region. The results obtained with this method are not very impressive. A positive thing with the method is that it is quite fast when used for tracking, see chapter 7. For a comparison between a number of different intensity correlation methods see [20].
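A minimal sketch of SSD matching (eq. 5.2) with a simple sub-pixel refinement; note that it uses two 1-D quadratic fits rather than the 2-D polynomial fit described above, and it assumes the best integer position falls strictly inside the search window:

```python
import numpy as np

def ssd_track(region, image, px, py, search=5):
    """SSD matching (eq. 5.2) around the previous integer position (px, py), followed
    by two 1-D quadratic fits for sub-pixel precision. No border handling."""
    h, w = region.shape
    ssd, best, best_pos = {}, np.inf, (px, py)
    for y in range(py - search, py + search + 1):
        for x in range(px - search, px + search + 1):
            s = np.sum((region - image[y:y + h, x:x + w]) ** 2)
            ssd[(x, y)] = s
            if s < best:
                best, best_pos = s, (x, y)
    x0, y0 = best_pos

    def subpix(m, c, p):          # vertex of the parabola through (-1, m), (0, c), (1, p)
        denom = m - 2 * c + p
        return 0.0 if denom == 0 else 0.5 * (m - p) / denom

    dx = subpix(ssd[(x0 - 1, y0)], ssd[(x0, y0)], ssd[(x0 + 1, y0)])
    dy = subpix(ssd[(x0, y0 - 1)], ssd[(x0, y0)], ssd[(x0, y0 + 1)])
    return x0 + dx, y0 + dy
```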

5.3 Complex correlation based method

This method is based on correlating the complex responses from quadrature filters. At least two different filters with different filter directions are needed. The algorithm first finds the position with integer precision by finding the maximum absolute value of the complex correlation for one filter

$$\max_{x,y}\ \mathrm{abs}\left[\sum_{R} q_{1r}(x + k, y + l)\,q_{1i}(k, l)'\right] \qquad (5.3)$$

where $R$ is the matched region and $(q_{1r}, q_{1i})$ are the filter responses from filter one in the region and the new image respectively. It seems to be sufficient to use just one filter when finding the integer position. However, one can increase the robustness by maximizing the product of the correlation sums for the different filters

$$\max_{x,y}\ \prod_{j}\mathrm{abs}\left(\sum_{R} q_{jr}(x + k, y + l)\,q_{ji}(k, l)'\right) \qquad (5.4)$$

where j is the index for the responses from the different filters.

The sub-pixel precision can then be found by using the fact that the complex correlation maximum is real. So the sub-pixel resolution can be found by finding the zero crossing of the phase of the correlation. This is done by calculating the phase of the correlation in the integer positions in a $5\times 5$ neighborhood around the estimated integer position. This will usually give a phase plane, see figure 5.1. This means that at least two different filters must be used; if more filters are used they will define an over-determined system. The zero phase (or minimum phase in the case of an over-determined system) is then found as described in section 4.5. To get good results one should try to get planes with different normals; this is done by using different filter directions.


Figure 5.1: The phase as a function of the displacements. 5 corresponds to zero displacement.

5.4 Canonical correlation based method

5.4.1 Introduction

The use of canonical correlation is a natural way to find dependencies between two multidimensional data sets, by finding the vectors $(\mathbf{w}_x, \mathbf{w}_y)$ which maximize the correlation between the scalars $\mathbf{w}_x^T\mathbf{f}_x$ and $\mathbf{w}_y^T\mathbf{f}_y$, where $(\mathbf{f}_x, \mathbf{f}_y)$ are the two data sets. To illustrate how this can be used to estimate displacement, a simple example will be considered. Assume that the vector $\mathbf{f}_x$ is created by picking the intensity values from a grid in the new image and $f_y$ is created by picking intensity values from the old image, see figure 5.2. The vector $\mathbf{f}_x$ and the scalar $f_y$ are created for each pixel position $P$ in a region $R$. The linear combination $\mathbf{w}_x$ is then found from the two data sets using canonical correlation ($w_y$ is one since $f_y$ is a scalar). If the new and old images are identical we can see that component five in $\mathbf{f}_x$ will be identical to $f_y$, so these two will be totally correlated. Thus $\mathbf{w}_x = [0\,0\,0\,0\,1\,0\,0\,0\,0]$ will be obtained. This gives zero displacement. If we instead translate the new image one pixel to the right we will get $\mathbf{w}_x = [0\,0\,0\,1\,0\,0\,0\,0\,0]$, which gives a displacement of one pixel to the right. However, if we have sub-pixel translation, several components in $\mathbf{w}_x$ will be non-zero, and since the interpolation properties of the signal are unknown it will be difficult to get a good estimate. Many other grids used to create $\mathbf{f}_x$ and $f_y$ can be used, but the problem will be the same. This is the same problem as encountered when trying to get sub-pixel precision with intensity correlation based methods. One solution is to first project the signal on a complex function with relatively slowly changing and linear phase. An example of this solution is given in section 5.4.3, and a very similar solution has also been done by Magnus Hemmendorff [26]. Another way is to first project the signal on a function with known interpolation properties. An example of this solution is given in section 5.4.4.


Figure 5.2: Construction of data sets.

5.4.2 Canonical correlation

This section gives a short description of canonical correlation; for a more detailed description and a description of how it can be used to estimate disparity, see [27], [28], [29].

As mentioned above, the problem in canonical correlation analysis is to find the vectors $\hat{\mathbf{w}}_x$, $\hat{\mathbf{w}}_y$ of maximum correlation between $x$ and $y$, where $x = \mathbf{x}^T\hat{\mathbf{w}}_x$ and $y = \mathbf{y}^T\hat{\mathbf{w}}_y$, in a data set of $\mathbf{x}$ and $\mathbf{y}$ vectors. So the function to be maximized is

$$\rho = \frac{E[x'y]}{\sqrt{E[x'x]E[y'y]}} = \frac{\hat{\mathbf{w}}_x'\mathbf{C}_{xy}\hat{\mathbf{w}}_y}{\sqrt{\hat{\mathbf{w}}_x'\mathbf{C}_{xx}\hat{\mathbf{w}}_x\,\hat{\mathbf{w}}_y'\mathbf{C}_{yy}\hat{\mathbf{w}}_y}} \qquad (5.5)$$

where $E$ is the expectation value and $'$ is the conjugate transpose. The $\mathbf{C}$ matrices are the covariance matrices

$$\mathbf{C}_{xx} = E[((\mathbf{x} - \mathbf{m}_x)^T)'(\mathbf{x} - \mathbf{m}_x)^T] \qquad (5.6)$$

$$\mathbf{m}_x = E[\mathbf{x}] \qquad (5.7)$$

where T is the plain transpose. The problem of maximizing the canonical correlation can be solved in a number of different ways, see [27] or [26] for examples. One way is to solve the following equation system:

$$\mathbf{C}_{xy}\hat{\mathbf{w}}_y = \rho\lambda_x\mathbf{C}_{xx}\hat{\mathbf{w}}_x \qquad (5.8)$$

$$\mathbf{C}_{yx}\hat{\mathbf{w}}_x = \rho\lambda_y\mathbf{C}_{yy}\hat{\mathbf{w}}_y \qquad (5.9)$$

where

$$\lambda_x = \lambda_y^{-1} = \sqrt{\frac{\hat{\mathbf{w}}_y'\mathbf{C}_{yy}\hat{\mathbf{w}}_y}{\hat{\mathbf{w}}_x'\mathbf{C}_{xx}\hat{\mathbf{w}}_x}} \qquad (5.10)$$

Equations 5.8 and 5.9 can be rewritten as

$$\mathbf{C}_{xx}^{-1}\mathbf{C}_{xy}\mathbf{C}_{yy}^{-1}\mathbf{C}_{yx}\hat{\mathbf{w}}_x = \rho^2\hat{\mathbf{w}}_x \qquad (5.11)$$

$$\mathbf{C}_{yy}^{-1}\mathbf{C}_{yx}\mathbf{C}_{xx}^{-1}\mathbf{C}_{xy}\hat{\mathbf{w}}_y = \rho^2\hat{\mathbf{w}}_y \qquad (5.12)$$

Here one can see that $\hat{\mathbf{w}}_x$ and $\hat{\mathbf{w}}_y$ are eigenvectors of the matrices $\mathbf{C}_{xx}^{-1}\mathbf{C}_{xy}\mathbf{C}_{yy}^{-1}\mathbf{C}_{yx}$ and $\mathbf{C}_{yy}^{-1}\mathbf{C}_{yx}\mathbf{C}_{xx}^{-1}\mathbf{C}_{xy}$ respectively. Note that only one of the equation systems 5.11 and 5.12 needs to be solved, since if one of the eigenvectors is found the other can be found from equation 5.8 or 5.9. Since $\rho$ is the canonical correlation, the eigenvectors corresponding to the largest eigenvalue are the wanted solution. Do also note that since the correlation is normalized, $\rho$ will lie between minus one and one. It can be shown that the solution is invariant to affine transformations of $\mathbf{x}$ and $\mathbf{y}$. This will make the displacement estimation algorithms presented here almost invariant to illumination changes.
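A real-valued sketch of solving the canonical correlation via the eigenvalue problem (5.11) and eq. (5.8); the thesis works with complex filter responses, which this simplified illustration does not handle:

```python
import numpy as np

def canonical_correlation(X, Y):
    """Largest canonical correlation between two data sets and the corresponding vectors,
    via the eigenvalue problem (5.11) and eq. (5.8). X: (n, dim_x), Y: (n, dim_y),
    rows are paired observations. Real-valued simplification."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx, Cyy = Xc.T @ Xc, Yc.T @ Yc
    Cxy = Xc.T @ Yc
    Cyx = Cxy.T
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cyx)  # Cxx^-1 Cxy Cyy^-1 Cyx
    eigvals, eigvecs = np.linalg.eig(M)
    i = np.argmax(eigvals.real)
    rho = np.sqrt(max(eigvals[i].real, 0.0))
    wx = eigvecs[:, i].real
    wy = np.linalg.solve(Cyy, Cyx @ wx)   # eq. (5.8)/(5.9), up to scale
    return rho, wx / np.linalg.norm(wx), wy / np.linalg.norm(wy)
```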

5.4.3 Quadrature filter based

This is an extension of the disparity estimation algorithm by Magnus Borga [27] to displacement estimation.

The input vectors to the canonical correlation are quadrature filter responses $q$ picked from four different positions in each image, as shown in figure 5.3. Thus,

$$q(x, y) = \sum_k\sum_l I(x - k, y - l)f(k, l) \qquad (5.13)$$

where $I$ is the image and $f$ is the quadrature filter, and

$$q_1 = q(p_x - d, p_y - d) \qquad (5.14)$$

$$q_2 = q(p_x + d, p_y - d) \qquad (5.15)$$

$$q_3 = q(p_x - d, p_y + d) \qquad (5.16)$$

$$q_4 = q(p_x + d, p_y + d) \qquad (5.17)$$

Figure 5.3: The arrows give the filter direction and d is the displacement.

Note that this is equivalent to taking the filter responses from the quadrature filter $f$ translated $\pm d$ pixels:

$$f_1(x, y) = f(x - d, y - d) \qquad (5.18)$$

$$f_2(x, y) = f(x + d, y - d) \qquad (5.19)$$

$$f_3(x, y) = f(x - d, y + d) \qquad (5.20)$$

$$f_4(x, y) = f(x + d, y + d) \qquad (5.21)$$

The maximization of the canonical correlation then solves the problem of finding the two best linear combinations of the vector components. The covariance matrices are estimated by adding the products in a region $R$ around the position where the displacement is to be calculated.

$$\mathbf{C}_{xx} = \sum_{R} W(\mathbf{x}^T)'\mathbf{x}^T \qquad (5.22)$$

where $W$ is a weighting matrix. The mean value of $\mathbf{x}$ does not need to be removed since the quadrature filters are insensitive to DC.

Two new filters can then be constructed

$$f_x = \sum_i w_{xi}f_i \quad\text{and}\quad f_y = \sum_i w_{yi}f_i \qquad (5.23)$$

where $f_i$ are the displaced quadrature filters and $(w_{xi}, w_{yi})$ are the components from the two vectors solving the maximization of the canonical correlation, $i = 1, 2, 3, 4$ in this case. These two filters will have about the same displacement (of the correlation peak) to each other as the two regions in the images had. Note that these two filters will be quadrature filters as well, since a translation is only a phase shift in the Fourier domain. So the objective is now to find the maximum correlation between the filters $f_x$ and $f_y$, thus finding the displacement $(d_x, d_y)$ which maximizes

$$g(d_x, d_y) = \mathrm{corr}(f_x(x, y), f_y(x + d_x, y + d_y)) \qquad (5.24)$$

This can be done very fast since the correlation between the filters $(f_i, f_j)$ can be done in advance for different displacements

$$h_{ij}(d_x, d_y) = \mathrm{corr}(f_i(x, y), f_j(x + d_x, y + d_y)) \qquad (5.25)$$

Note that $(d_x, d_y)$ do not need to be integers since the correlation is easily computed for sub-pixels in the Fourier domain. The function $g(d_x, d_y)$ to be maximized can now be written as

$$g(d_x, d_y) = \sum_i\sum_j w_{xi}\,w_{yj}'\,h_{ij}(d_x, d_y) \qquad (5.26)$$

The final solution is then found by using the fact that the maximum correlation is real as was described in section 4.5. Thus we need to do the same computations for at least one more filter direction.

The maximum displacement that can be handled depends on the filter displacement $d$ in figure 5.3. For example, if $d = 1$ pixel, the maximum displacement is 2 pixels. However, the accuracy decreases with increasing filter displacement, and the filters must have a large overlap. The maximum filter displacement depends on the quadrature filter used. To be able to estimate large displacements with good accuracy, the displacement estimate can be refined by starting with a large filter displacement and then decreasing it. The refinement is for example done in three steps, $d = 2, 1, 0.5$. These refinements are however quite expensive since all computations except the filtering have to be done in each step, but it is still cheaper than sub-sampling the images.

5.4.4 cos² filter based

This method is based on the idea of projecting the signal on a known function with good interpolation properties before using canonical correlation. Like the method using quadrature filters, this will also give two new filters which are translated the same amount as the images. However, the choice of filters will make it possible to find the translation directly from the weights for the different filters. The choice of function was inspired by the use of cos² functions in channel representation for learning structures like artificial neural nets, see [31], [30].

Figure 5.4: $f(x, y) = \cos^2(Wx)\cos^2(Wy)$

$$f(x, y) = \cos^2(Wx)\cos^2(Wy), \qquad |x|, |y| \le \frac{\pi}{2W} \qquad (5.27)$$

The radially symmetric function $\cos^2(W\sqrt{x^2 + y^2})$ is not selected since it is not separable into two 1D parts, which makes the filtering expensive. Another reason is that the selected function for a fixed $y$ is a $\cos^2$ function of $x$, and analogously for a fixed $x$. This will later be proved to be advantageous. The interpolation properties of the $\cos^2$ function are much nicer in 1D than in 2D, so one would like to separate the problem into two 1D problems. This can be done by creating the vectors to the canonical correlation as shown in figure 5.5. Thus the vectors $\mathbf{x}$, $\mathbf{y}$ are created from responses of filters shifted horizontally and vertically respectively.

The filter responses are written as

$$c(x, y) = \sum_k\sum_l I(x - k, y - l)f(k, l) \qquad (5.28)$$

where $I$ is the image and $f$ is the $\cos^2$ filter. The vectors $(\mathbf{x}, \mathbf{y})$ are then created as

$$\mathbf{x}(x, y) = \begin{pmatrix} c(x, y - kd) \\ c(x, y - (k - 1)d) \\ \vdots \\ c(x, y + (k - 1)d) \\ c(x, y + kd) \end{pmatrix} \quad\text{and}\quad \mathbf{y}(x, y) = \begin{pmatrix} c(x - kd, y) \\ c(x - (k - 1)d, y) \\ \vdots \\ c(x + (k - 1)d, y) \\ c(x + kd, y) \end{pmatrix} \qquad (5.29)$$

Figure 5.5: The circles give the filter centers.

where d is the displacement between the filters and 2k + 1 is the number of filters. The number of filters can of course be even as well. Since translation of the image (or image region) is equivalent to translating the filters in the opposite direction figure 5.6 makes it plausible that this works.

The vectors (wx, wy) that maximize the correlation are then found giving the two new filters.

$$f_x = \sum_i w_{xi}f_{xi} \quad\text{and}\quad f_y = \sum_i w_{yi}f_{yi} \qquad (5.30)$$

where (fxi, fyi) are the translated filters in the new and old image respectively as in figure 5.5. Like before the maximum of the correlation between these two filters will be shifted the same amount as the image region.

Now, since the filter $f$ was a $\cos^2$ in both the x- and y-direction, a sum of versions of this filter translated in the y-direction (like $f_x$ is) will be a $\cos^2$ function in the x-direction.

$$f_x = \sum_i w_{xi}f_{xi} = \sum_i w_{xi}\cos^2(Wx)\cos^2(W(y - id)) = A\cos^2(Wx) \qquad (5.31)$$

where $A$ is independent of $x$ and $i = -2, -1, 0, 1, 2$ in this example (and correspondingly for $f_y$). This makes it possible to separate the problem into two 1D problems: finding the position of maximum correlation between $f_x$ (see figure 5.7) and $\cos^2(Wy)$ to get the translation in the y-direction, and between $f_y$ and $\cos^2(Wx)$ to get the x-translation. Since the signs of the $\mathbf{w}$ vectors are only determined relative to each other, it might be $-\cos^2$ we should correlate with. However, instead of finding the sign we can maximize the absolute value of the correlation.


Figure 5.6: Image translated 1d down and 1.5d right.

If we assume the discrete $\cos^2$ filter used to be approximately continuous and also that the maximum of the correlation coincides with the maximum of the reconstructed filter, this problem becomes quite simple. The latter assumption implies that the reconstructed filter is symmetric or that the $\cos^2$ filter used was periodic. This is clearly not true. However, the error introduced by these assumptions seems to be quite small, see section 5.5.

So, the problem is now to find the maximum of

$$\sum_i w_i\cos^2(W(x - id)) \qquad (5.32)$$

Using the trigonometric identities

$$\cos^2(x) = \frac{1 + \cos(2x)}{2} \quad\text{and}\quad \cos(x + y) = \cos(x)\cos(y) - \sin(x)\sin(y) \qquad (5.33)$$

gives

$$\sum_i\frac{w_i}{2} + \sum_i\frac{w_i}{2}\left(\cos(2W(id))\cos(2Wx) + \sin(2W(id))\sin(2Wx)\right) = \qquad (5.34)$$

$$= \alpha + \beta\cos(2Wx) + \gamma\sin(2Wx) \qquad (5.35)$$

where $\alpha$, $\beta$, $\gamma$ are known and

$$\alpha = \sum_i\frac{w_i}{2} \qquad (5.36)$$

$$\beta = \sum_i\frac{w_i}{2}\cos(2W(id)) \qquad (5.37)$$

$$\gamma = \sum_i\frac{w_i}{2}\sin(2W(id)) \qquad (5.38)$$

Figure 5.7: $f_x = \sum_i w_{xi}f_{xi}$

The function can now be written as

$$\alpha + \begin{pmatrix} \beta \\ \gamma \end{pmatrix}\cdot\begin{pmatrix} \cos(2Wx) \\ \sin(2Wx) \end{pmatrix} \qquad (5.39)$$

where $\cdot$ is the dot product. The absolute value should be maximized, so depending on the sign of $\alpha$ we should find a vector $(\cos(2Wx), \sin(2Wx))^T$ which is parallel ($\alpha$ positive) or anti-parallel ($\alpha$ negative) with the vector $(\beta, \gamma)^T$, see figure 5.8. So $x$ is given as

$$x = \frac{\mathrm{sign}(\alpha)\arctan\left(\frac{\gamma}{|\beta|}\right)}{2W} \qquad (5.40)$$

Once again we have assumed that the functions are periodic. So to make this work theoretically we must set the coefficients in $\mathbf{w}$ to zero for the filters not overlapping the solution, thus for $w_i$ with

$$|x - id| > \frac{\pi}{2W} \qquad (5.41)$$

where $x$ is the correct solution. This can for example be done by first finding an approximate solution and removing the coefficients to get the final solution; another way is to use a set of filters which all overlap some interval and then restrict the solution to this interval. In practice, however, it seems to work quite well without removing these coefficients.
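A small sketch of the sub-pixel estimate given by eqs. (5.36)-(5.40), assuming the canonical-correlation weights for the shifted cos² filters are already computed; the example numbers are made up:

```python
import numpy as np

def subpixel_from_weights(w, W, d):
    """Sub-pixel position of the maximum of sum_i w_i*cos^2(W*(x - i*d)), eqs. (5.36)-(5.40).
    w: canonical-correlation weights for the shifted cos^2 filters, indexed i = -k..k;
    W: angular frequency; d: spacing between the filters."""
    k = (len(w) - 1) // 2
    centers = np.arange(-k, k + 1) * d
    alpha = np.sum(w) / 2.0
    beta = np.sum(w * np.cos(2 * W * centers)) / 2.0
    gamma = np.sum(w * np.sin(2 * W * centers)) / 2.0
    return np.sign(alpha) * np.arctan2(gamma, abs(beta)) / (2 * W)  # eq. (5.40)

# Example with W = pi/8, d = 1 and five filters (a configuration mentioned in the text);
# the weight values themselves are made up.
print(subpixel_from_weights(np.array([0.0, 0.3, 1.0, 0.5, 0.0]), np.pi / 8, 1.0))
```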


Figure 5.8: Different solutions depending on the sign of α.

Choosing angular frequency W and filter displacement d

All values of $x$ on the interval between the center of the first and the center of the last filter should be possible to represent with the $\cos^2$ functions. This gives a restriction on $W$ and $d$ (see figure 5.9)

$$W \le \frac{\pi}{2d} \qquad (5.42)$$

To make the interpolation work properly one should however select a smaller

W d. Values between π4 and 16π seems to work well. Two different values on W have been tested, W = π8 and W = 16π, see figure 5.10. The improvement with using W = 16π was small, so all tests presented are with W = π8. With this W displacements d = 1 and d = 2 seem to work well. The choice of W , d and number of filters is a trade-off between precision, maximum handled displacements and computational complexity. Some test results are given in section 5.5.

Note that the angular frequency $W$ and displacement $d$ do not need to have the same values in the x and y directions. Neither does the number of filters, and $d$ does not need to be constant. Thus, $d(k)$ can be used. This means that we can make the algorithm handle different displacements in the x and y directions and also get different precisions for different displacements.

Using a sum of translated cos² functions instead of a single cos² function

The $\cos^2$ function does not have a zero mean like the quadrature filter has. So to be invariant to changes in DC level one has to remove the mean value when calculating the covariance matrices, which is quite computationally expensive. The correlation will however still be invariant to scalings of the intensity, which makes it rather invariant to illumination changes. In practice it seems to work well without removing the mean value. If the DC level changes drastically one must however remove the mean.

Figure 5.9: $Wd$ taking its maximum value, $Wd = \pi/2$.

It might anyway be interesting to see that the method works for a sum of $\cos^2$ functions after some small changes.

So the filter is now written as

$$f(x, y) = \sum_k a_{xk}\cos^2(W_x(x - m_x(k)))\sum_l a_{yl}\cos^2(W_y(y - m_y(l))) \qquad (5.43)$$

Substituting this into equation 5.32 results in the same equation system structure as before,

$$\alpha + \begin{pmatrix} \beta \\ \gamma \end{pmatrix}\cdot\begin{pmatrix} \cos(2Wx) \\ \sin(2Wx) \end{pmatrix} \qquad (5.44)$$

The coefficients have however changed; for the x direction we get

$$\alpha = \sum_i w_i\left(\sum_k\frac{a_k}{2}\right) \qquad (5.45)$$

$$\beta = \sum_i w_i\left(\sum_k\frac{a_k\cos(2W_x(x - m_x(k) + d_x(i)))}{2}\right) \qquad (5.46)$$

$$\gamma = \sum_i w_i\left(\sum_k\frac{a_k\sin(2W_x(x - m_x(k) + d_x(i)))}{2}\right) \qquad (5.47)$$

Figure 5.10: $\cos^2(Wx)$ with $W = \pi/8$ and $W = \pi/16$.

where $d_x(i)$ are the displacements between the filters. This can be solved in the same way as before. Two different filters have been tested, both aiming at removing the DC component. The two filters are plotted in figure 5.11. The angular frequencies are $W_x = W_y = \pi/8, \pi/16$, the displacements are $m_x = m_y = [-1\ 0\ 1], [-2\ 0\ 2]$ and the weights are $a_x = a_y = [-0.5\ 1\ -0.5]$ for both. The results achieved are not as good as with the single $\cos^2$ filter, and the needed filter size seems to be larger, increasing the computational complexity. This might not be so strange since the interpolation properties of these filters are not that nice.

Figure 5.11: Filters obtained by summing displaced $\cos^2$ filters.

Restricting the solution to a subspace

When some filters don't overlap the displacement it can be favorable to remove all or some of them before the canonical correlation is performed, since these can otherwise corrupt the solution. This can be done by first finding the solution with all filters and then removing the filters not wanted. The cost of doing this is almost negligible since the covariances between all filters have already been calculated, and this is the costly part of the algorithm, see chapter 7. So it is just a matter of extracting a part of the covariance matrices and using these parts as covariance matrices in the canonical correlation, see figure 5.12.


Figure 5.12: Restricting the solution to a smaller interval.

5.5 Some test results

A number of tests have been done to evaluate the performance of the different image matching algorithms. The results from two of the tests are presented here. The first test was performed by making a pure translation of an image (see figure 5.13) using bicubic interpolation. The translations are selected randomly with a maximum displacement of two pixels in both the x and y directions. Fifty points are then randomly selected, and the mean distance from the correct translation and the standard deviation of the error for these points are computed. The second test was performed in the same way but with some noise and deformations added to the translated image. The noise added is normally distributed white noise giving an SNR of 20 dB. The deformations are a one degree rotation and a five degree perspective change.

The mean distance from the correct displacement and the standard deviation of the error for the different methods are given in table 5.1, where:
