Master of Science Thesis in Electrical Engineering
Department of Electrical Engineering, Linköping University, 2019

Two Multimodal Image Registration Approaches for Positioning Purposes


Master of Science Thesis in Electrical Engineering

Two Multimodal Image Registration Approaches for Positioning Purposes

Linnea Fridman and Victoria Nordberg
LiTH-ISY-EX--19/5208--SE

Supervisor: Xuan Gu, IMT, Linköpings universitet
            Bertil Grelsson, Saab Dynamics
Examiner: Maria Magnusson, ISY, Linköpings universitet

Computer Vision Laboratory
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden


Abstract

This report is the result of a master thesis made by two students at Linköping University. The aim was to find an image registration method for visual and infrared images and to find an error measure for grading the registration performance. In practice this could be used for position determination by registering the infrared image taken at the current position to a set of visual images with known positions and determining which visual image matches the best. Two methods were tried, using different image feature extractors and different ways to match the features. The first method used phase information in the images to generate soft features and then minimised the square error of the optical flow equation to estimate the transformation between the visual and infrared image. The second method used the Canny edge detector to extract hard features from the images and Chamfer distance as an error measure. Both methods were evaluated for registration as well as position determination and yielded promising results. However, the performance of both methods was image dependent. The soft edge method proved to be more robust and precise and worked better than the hard edge method for both registration and position determination.


Acknowledgments

We want to thank Saab Dynamics for giving us the opportunity of doing this thesis. We also want to give a special thanks to our supervisors, Bertil Grelsson and Xuan Gu, and our examiner, Maria Magnusson. Thank you for sharing your ideas and knowledge, and for valuable feedback on this work. Lastly we want to thank each other for good and sometimes less productive discussions.

Linköping, May 2019 Linnea & Victoria


Contents

1 Introduction
  1.1 Motivation
  1.2 Purpose
  1.3 Delimitations
2 Theory
  2.1 Related Work
    2.1.1 Registration Papers Specified for Visual and IR Images
    2.1.2 Registration Papers not Specified for Visual and IR Images
  2.2 Image Filtering
  2.3 Edge Detectors
    2.3.1 Phase Congruency
    2.3.2 Sobel Operator
    2.3.3 The Canny Edge Detector
    2.3.4 Quadrature Filter
  2.4 Scale Pyramids
    2.4.1 Downsampling of Hard Edges
  2.5 Error Measurements
    2.5.1 Intensity Based Optical Flow
    2.5.2 Phase Based Optical Flow
    2.5.3 Chamfer Distance
    2.5.4 Hausdorff Distance
  2.6 Performance Measurements
    2.6.1 Cross-correlation
    2.6.2 Mutual Information
    2.6.3 Structural Similarity
3 Method
  3.1 Division of Labor
  3.2 General Outline
  3.3 Motivations for the Selected Methods
  3.4 Data Set
  3.5 Pre-Processing
    3.5.1 Application of Transformation
    3.5.2 Image Filtering
  3.6 Soft Edge Method
    3.6.1 Generation of Scale Pyramids
    3.6.2 Feature Extraction Using Phase Information
    3.6.3 Estimation of Transformation Parameters
    3.6.4 Refinement of the Transformation Parameters
  3.7 Hard Edge Method
    3.7.1 Feature Extraction Using the Canny Edge Detector
    3.7.2 Generation of Canny Scale Pyramid
    3.7.3 Hausdorff Matching
    3.7.4 Chamfer Matching
    3.7.5 Removal of Bad Points
    3.7.6 Handling Unknown Areas
  3.8 Performance Evaluation
    3.8.1 Confidence
    3.8.2 Structural Similarity
    3.8.3 Mutual Information
  3.9 Position Determination
    3.9.1 Soft Edge Method
    3.9.2 Hard Edge Method
  3.10 Algorithms
4 Results
  4.1 Simple Registration With Applied Transformation
    4.1.1 Soft Edge Method
    4.1.2 Hard Edge Method
  4.2 Simple Registration Without Applied Transformation
    4.2.1 Soft Edge Method
    4.2.2 Hard Edge Method
  4.3 Single KAIST Registration With Applied Transformation
    4.3.1 Soft Edge Method
    4.3.2 Hard Edge Method
  4.4 Single KAIST Registration Without Applied Transformation
    4.4.1 Soft Edge Method
    4.4.2 Hard Edge Method
  4.5 Multiple KAIST Registrations With Applied Transformation
    4.5.1 Soft Edge Method
    4.5.2 Hard Edge Method
  4.6 Multiple KAIST Registrations Without Applied Transformation
    4.6.1 Soft Edge Method
    4.6.2 Hard Edge Method
  4.7 Position Determination
    4.7.1 Soft Edge Method
    4.7.2 Hard Edge Method
  4.8 Discarded Methods
    4.8.1 Hard Edge Method: Hausdorff
    4.8.2 Performance Evaluation: SSIM
5 Discussion
  5.1 Data Set
  5.2 Registrations With Applied Transformation
    5.2.1 Soft Edge Method
    5.2.2 Hard Edge Method
    5.2.3 Comparison
  5.3 Registrations Without Applied Transformation
    5.3.1 Soft Edge Method
    5.3.2 Hard Edge Method
    5.3.3 Comparison
  5.4 Position Determination
    5.4.1 Soft Edge Method
    5.4.2 Hard Edge Method
    5.4.3 Comparison
  5.5 Discarded Methods
    5.5.1 Hard Edge Method: Hausdorff
    5.5.2 Performance Evaluation: SSIM
6 Conclusion
A Appendix - Additional Results
  A.1 Single KAIST Registration With Applied Transformation
    A.1.1 Soft Edge Method
    A.1.2 Hard Edge Method
  A.2 Multiple KAIST Registrations With Applied Transformations
    A.2.1 Soft Edge Method
    A.2.2 Hard Edge Method
  A.3 Multiple KAIST Registrations Without Applied Transformations
    A.3.1 Soft Edge Method
    A.3.2 Hard Edge Method
  A.4 Position Determination
    A.4.1 Soft Edge Method
    A.4.2 Hard Edge Method


1 Introduction

This report is the result of a master thesis conducted by the two students Linnea Fridman and Victoria Nordberg at Linköping University, in cooperation with Saab Dynamics. The thesis lies within the field of electrical engineering, and both students have chosen to focus their studies on image processing, which is also the focus of this report.

This chapter introduces the reader to the issue that is being investigated in the report and why this issue needs to be resolved. The ending of the chapter deals with some delimitations made in the project.

1.1 Motivation

Visual images and infrared (IR) images contain information about the portrayed surroundings, see Figure 1.1 for example images. Each camera captures a certain type of information, useful on its own but it could also be useful in a combined manner. IR images usually have relatively low resolution and the images contain information about temperature and emissivity [1]. Visual images on the other hand depend on the amount of source illumination incident to the scene being viewed and the amount of illumination reflected by the objects in the scene, see for example [4]. Good conditions and sensitive sensors can yield high resolution images. To be able to combine different image types, a process called image registration can be used to align the images.

Image registration is a procedure applicable in various fields. When applied to visual and IR images, one application is vehicle positioning. This is the underlying purpose of this thesis. The positioning is performed by trying to register an IR image taken from the vehicle to a set of visual reference images, from known locations. This is illustrated in Figure 1.2, where the blue reference images are a sequence of N geotagged visual images along a road. The red image is the IR


Figure 1.1: Images showing an example of an image pair from the KAIST dataset [16]. (a) Visual image. (b) IR image.

image taken at the current location, corresponding to visual image number n. After the registrations, the position of the vehicle is determined by comparing how well the registrations went. The visual image yielding the highest similarity, using a suitable measurement, is considered to be the correct image and location.

Figure 1.2: Illustration of the position determination. The blue images are geotagged visual images and the red image is an IR image taken from the vehicle at the current position.

1.2 Purpose

The aim of this thesis is to first find two robust methods for estimating the transformation parameters that align an IR image with a visual image. The aim is also to, from a sequence of visual images, be able to determine which visual image matches a specific IR image the best.


The following questions are sought to be answered throughout this project:

• Can the two methods be used to register visual and IR images?

• Which of the two approaches gives the best estimation of the parameters?

• Can the registration approaches be used for positioning purposes by finding the corresponding image to a specific IR image within a sequence of visual images?

• What strengths or weaknesses do the methods have depending on the captured scene?

1.3 Delimitations

This project is carried out during a limited amount of time. Due to this, methods that supposedly would take a long time to investigate are discarded. Another decision made due to the time limitation is that the two members will each focus on a single solution approach. Instead of trying multiple methods, the aim is to find one solid solution per member.

Another delimitation is the images used to evaluate the methods. As mentioned, one application is to register visual and IR images taken from a vehicle. These are the types of images that will be used for evaluation of the methods. To be able to say more about the general robustness of the methods other types of data sets would need to be used.

Another delimitation is that the registration methods only need to be able to handle similarity transformations and not more complex transformations.

A final delimitation is that the registration methods are only expected to be able to handle small transformations. Investigating methods that could handle larger transformations lies outside the scope of this thesis.


2 Theory

Image registration is a process used to align multiple images [12]. It can for example be used to correct for motion blur in an image sequence, to map subjects to templates, and to align images taken with different imaging techniques. Images can be aligned by applying a transformation based on a number of parameters. For similarity transformations there are four parameters corresponding to the x- and y-translation, rotation, and scaling between the images.

Image registration can handle alignment of images taken from different positions. It can also handle alignment of images captured with different sensors, so called multimodal images. This can be used to merge information from different sensors. One common application is clinical use, for example registering CT images with MR images [18], [23], [14], [9]. Another use of multimodal registration is the alignment of visual images and IR images [21], [15], [10], [22].

2.1 Related Work

This section summarises existing work on multimodal image registration using visual and IR images, but also other types of multimodal images. The reason why the section contains descriptions of papers using other images than those from the visual and IR spectra is to shed light on new approaches.

2.1.1 Registration Papers Specified for Visual and IR Images

This subsection describes existing papers on multimodal image registration using visual and IR images. This gives a hint of what has already been tested within this field, which methods work, which do not, and why.


• Hui H. Li and Yi-Tong Zhou describe an image registration algorithm for visual and IR images [21]. Their method includes a wavelet-based feature detector as well as a contour based critical point extractor to find features in the images.

The combination of contour detection, intensity-based interest point selection and the restrictions made when searching for feature points makes this algorithm fast and robust, since corner points on spurious edges are avoided.

The coarse point feature matching by Li and Zhou is performed in two steps to make the algorithm accurate. First an initial matching is performed to identify consistent points. Each feature point in image 1 is compared to all feature points in image 2 within a searching range. Secondly the best match is selected and consistency checking is applied to all matched points. Li and Zhou also propose a finer point feature matching. Image 1 is transformed to a new image 1 based on the matches found in the initial matching. Then every feature point in image 2 is compared with points in the corresponding area in the new image 1 and every feature point in the new image 1 is compared with points in the corresponding area in image 2. Matching points are selected and consistency checking is applied again to all the matching points. Lastly image 1 is again transformed to further improve registration with image 2. The refined matching can be iterated.

• Tomislav Hrkać, Zoran Kalafatic, and Josip Krapac apply a scene specific method for man-made structures [15]. They use corners as the matching features for aligning visual and IR images. The images they use to test their algorithm show facades which have many corners present. In cases with other images, corners can be hard to use because of their sensitivity to, for example, scale and rotation. In their algorithm, they use the Harris corner detector as the feature extractor.

Corners are detected in both visual and IR images. The following step is to find the right transformation to align one point in the visual image to the corresponding point in the IR image. Multiple guesses can be made to transform a corner in the visual image to a corner in the IR image. With the aim of decreasing the number of transformations to search among, two assumptions are made. The first assumption is that corners do not move too much in relation to each other. This assumption is based on the fact that most images of facades are captured directly in front of them and should therefore look similar. It makes it possible to only focus on corners within some specified radius of the current corner. The second assumption is that the noise and uncertainties of corner detection make it possible to only focus on points that lie outside some distance. Points lying too close will be strongly affected even by small changes in the transformation.

The kept guesses are tested using the Hausdorff distance. The result gives a rough estimation and hence needs improvement.


• Yanjia Chen et al. describe a coarse-to-fine image registration procedure [10]. For the coarse registration, Zernike moments [27] are used to describe salient regions [10]. Zernike moments can be useful for object recognition since they can be made independent of position, size, and orientation. The fine registration procedure uses an entropy optimal process based on edginess to align the images. This is because meaningful gradient information is limited to the vicinity of the edges, which have some similarity in visual images and corresponding IR images. Since many registration algorithms using edge detectors fail, the concept of edginess is used instead. Edginess is defined as the level of confidence that an edge pixel is present. This is based on the eigenvalues of the structure tensor. The entropy based similarity measure is defined by using edginess information and the best transformation is found by maximising the mutual information of corresponding edginess. The results prove that the method can handle large intensity distortions.

• Rok Istenic et al. describe an approach for multimodal registration in the Hough parameter space [22]. The method is based on straight lines but can be modified in such a way that the generalised Hough transform can be used to detect nonlinear shapes. For each image pair, the Canny edge detector is used to obtain a binary edge image. These images are later processed by the Hough transform to extract all linear image segments, since line segments appear frequently in the chosen data set. The results show that the method works well for translations and rotations but cannot handle scaling.

2.1.2 Registration Papers not Specified for Visual and IR Images

This subsection describes existing work on multimodal image registration that does not use visual and IR images, but instead other types of multimodal images, such as MRI and CT.

• Keyvan Kasiri, David Clausi and Paul Fieguth describe a solution to the registration problem when using multiple MR modalities (T1, T2 and PD) [18]. Their solution to the problem is extracting structural information from the images, such as the phase congruency and the magnitude of the gradient. Their aim is to convert the problem from being multimodal to monomodal. They achieve this by using a fusing function to map structural features based on gradient magnitude and phase congruency to the same intensity mapping. When this is achieved, any intensity-based similarity measure can be used to find the optimal transformation parameters. In the article, they choose to use the sum of squared intensity differences. According to their results, their method outperforms the mutual-information-based registration methods of that time.

• Daniel Forsberg et al. describe how to apply polynomial expansion on the images as a feature extractor and mutual information as a matching function to align a T1-weighted image with a T2-weighted image [14]. The polynomial expansion is used to speed up the calculations by locally approximating each signal value with a polynomial. This method is proven to be beneficial due to its fast convergence and high accuracy.

• Shabnam Saadat et al. describe a hybrid registration approach to reduce computational cost and increase accuracy [23]. They use projections of CT volumes and register to real-time fluoroscopy X-ray images. Their method is based on dividing the registration into a coarse and a fine registration. For the coarse registration, they use the Canny edge detector as a feature extractor and for the matching, they use edge position difference (EPD). The EPD is based on the 2D Chamfer distance method [5], which is a good approximation to Euclidean distances for small neighbourhoods.

For the fine registration, they use joint-histogram for feature detection and the sum-of-conditional-variance (SCV) as a similarity measure.

The paper provides promising results as the computational time is reduced by the initial coarse registration and the performance is kept equivalent to the previous best measures.

• Registration can be done not only in 2D but also in 3D. Faysal Boughorbel et al. perform rigid 3D registration using Gaussian fields [6]. They use 3D surfaces made from 3D scans of different objects to test their registration. The experiments show that the method is useful for real-world applications.

• Lance M. Kaplan and Nasser M. Nasrabadi describe a block Wiener-based image registration method to align consecutive frames in a video sequence with the purpose of indicating moving targets [17]. By adapting different Wiener filters to various regions in the images, the registration method can account for translational distortions, which is beneficial for performance.

• Xiaohuan Cao et al. describe a framework for registering multimodal (in the article called intermodal) images using neural networks [9]. The training uses pre-registered CT and MR images and an intramodality (monomodal) similarity metric. The network can use the intramodality similarity metric on the input CT image and the CT image paired with the input MRI image, and vice versa. The results prove that the method is both efficient and accurate.

2.2 Image Filtering

This section describes the image filtering used in Chapter 3. The specific type of image filtering used is bilateral filtering, described by Carlo Tomasi and Roberto Manduchi [28]. Bilateral filtering combines geometric closeness and photometric similarity of pixel values in the weighting of the filtered pixel values. This results in an image where the edges are preserved and the noise is reduced. The bilateral filtration of an image, I, is defined by

I_f(x) = k^{-1}(x) \sum_{x_i \in \Omega} I(x_i)\, c(x_i, x)\, s(I(x_i), I(x))    (2.1)

with the normalisation k(x) = \sum_{x_i \in \Omega} c(x_i, x)\, s(I(x_i), I(x)). The geometric closeness between the coordinates of the current pixel, x, and a nearby point, x_i \in \Omega, is measured by c(x_i, x) and is smoothed depending on the difference in coordinates. The similarity function, s(I(x_i), I(x)), is the range kernel for smoothing the difference in intensity. The smoothing functions can for example be Gaussian functions.
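As an illustration of (2.1), the sketch below evaluates the filter directly with Gaussian closeness and similarity kernels. It assumes NumPy; the function name, kernel widths and window radius are illustrative choices and not values taken from the thesis.

```python
import numpy as np

def bilateral_filter(image, sigma_spatial=2.0, sigma_range=0.1, radius=3):
    """Direct (and slow) evaluation of the bilateral filter in (2.1)."""
    image = image.astype(float)
    height, width = image.shape
    out = np.zeros_like(image)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    closeness = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_spatial ** 2))  # c(x_i, x)
    padded = np.pad(image, radius, mode="reflect")
    for y in range(height):
        for x in range(width):
            window = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            similarity = np.exp(-(window - image[y, x]) ** 2
                                / (2 * sigma_range ** 2))               # s(I(x_i), I(x))
            weights = closeness * similarity
            out[y, x] = np.sum(weights * window) / np.sum(weights)      # k^{-1}(x) normalisation
    return out
```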

2.3 Edge Detectors

This section describes the edge detectors used in Chapter 3.

2.3.1 Phase Congruency

Phase congruency is a contrast invariant edge and corner detector described by Peter Kovesi [19]. It marks features of interest without making any assumptions about the phase of the Fourier series of the signal. Instead it looks for points in the image where there is a high degree of order in the Fourier domain i.e. where the phases are going in the same direction.

The function used to calculate the phase congruency is defined in one dimension by

PC(x) = \frac{E(x)}{\sum_n A_n(x) + \varepsilon},    (2.2)

where E(x) represents the local energy function, A_n(x) represents the magnitude of the local frequency component and \varepsilon is a small value to prevent division by zero [19]. E(x) is calculated as the magnitude of the sum of the even and odd filter responses e_n(x) and o_n(x), E(x) = \sqrt{(\sum_n e_n(x))^2 + (\sum_n o_n(x))^2}. The filter responses are obtained from convolution between the signal and a log Gabor filter. The different scales, n, are used to weigh the significance of the spatial size of the feature. A_n(x) is calculated as the magnitude of e_n(x) and o_n(x) at scale n, A_n(x) = \sqrt{e_n(x)^2 + o_n(x)^2}. A commonly used log Gabor filter is one with the transfer function

G_n(\omega) = e^{-\frac{(\log(\omega/\omega_{0n}))^2}{2(\log(\kappa/\omega_{0n}))^2}},    (2.3)

where the scale, n, adjusts the value of the centre frequency, \omega_{0n}. A common preset value for \kappa/\omega_{0n} is 0.55.

The definition of phase congruency outlined above only applies to 1D signals. To obtain an overall measure of phase congruency in 2D, the local energy is first calculated in several orientations, typically six [20]. The maximum moment of the phase congruency can be used as a measurement of the strength of an edge.

(20)

10 2 Theory

This is calculated as the maximum singular value of the phase congruency covariance matrix,

PCCM = \begin{bmatrix} \sum PC_x^2 & \sum PC_x \cdot PC_y \\ \sum PC_x \cdot PC_y & \sum PC_y^2 \end{bmatrix},    (2.4)

where PC_x and PC_y are the x and y components of the phase congruency for each orientation.

An example of this measurement is illustrated in Figure 3.6a, which is the maximum moment of the phase congruency for the image in Figure 4.1a.
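The log Gabor transfer function (2.3) is straightforward to evaluate numerically. The sketch below builds the 1D transfer function on a normalised frequency axis, assuming NumPy; the function and parameter names are illustrative assumptions (a complete phase congruency implementation, such as Kovesi's, constructs these filters internally).

```python
import numpy as np

def log_gabor_1d(num_samples, wavelength, kappa_ratio=0.55):
    """1D log Gabor transfer function G_n(w) from (2.3).

    `wavelength` sets the centre frequency w0 = 1/wavelength and
    `kappa_ratio` is the kappa/w0 ratio (0.55 is the common preset).
    """
    w = np.abs(np.fft.fftfreq(num_samples))   # normalised frequency axis
    w[0] = 1.0                                # placeholder to avoid log(0) at DC
    w0 = 1.0 / wavelength
    G = np.exp(-np.log(w / w0) ** 2 / (2 * np.log(kappa_ratio) ** 2))
    G[0] = 0.0                                # log Gabor filters have zero DC
    return G
```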

2.3.2 Sobel Operator

The Sobel operator, described by Irwin Sobel, is a gradient operator that can be used to estimate the gradients in an image [25]. This is achieved by convolving the concerned image with the Sobel filters in the x- and y-direction. These filters are defined as

G_x = \begin{bmatrix} +1 & 0 & -1 \\ +2 & 0 & -2 \\ +1 & 0 & -1 \end{bmatrix}    (2.5)

and

G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}.    (2.6)

These filters extract approximations of the derivatives of the image in the x- and y-direction respectively. The norm of the Sobel filter responses is the gradient magnitude. An example of this is illustrated in Figure 3.6b, which is the gradient magnitude of the image in Figure 4.1a.
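A minimal sketch of the gradient magnitude computation with the filters (2.5) and (2.6), assuming NumPy and SciPy; the function name is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import convolve

def sobel_gradient_magnitude(image):
    """Gradient magnitude from the Sobel filters (2.5) and (2.6)."""
    gx = np.array([[+1, 0, -1],
                   [+2, 0, -2],
                   [+1, 0, -1]], dtype=float)
    gy = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [+1, +2, +1]], dtype=float)
    dx = convolve(image.astype(float), gx)   # approximate x-derivative
    dy = convolve(image.astype(float), gy)   # approximate y-derivative
    return np.hypot(dx, dy)                  # norm of the filter responses
```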

2.3.3 The Canny Edge Detector

The Canny edge detector is defined by John Canny [8]. Canny bases the detector on three criteria:

1. Good detection.
2. Good localisation.
3. Only one response for one edge.

The three criteria are expressed mathematically, combined, and optimised. The gradient in each pixel is computed by filtering with the x- and y- derivative of a Gaussian probability function. Edges can be extracted by thresholding the absolute value of the gradient. If the response is high enough, and the gradient direction is correct, an edge pixel is found, otherwise the pixel value is set to zero. To not lose weak edges that belong to objects, a second threshold is set that takes


into account the strength and length of surrounding edges. The Canny edge detector is beneficial to use since it is more likely to detect weak edges compared to other edge detectors. Since the derivative filters are based on a Gaussian probability function, it is robust to noise [23]. An example of the Canny edges of an image can be found in Figure 3.8a. It contains the Canny edges calculated from the image in Figure 4.1a.
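For reference, a short example of extracting Canny edges with scikit-image's implementation; the file name, sigma and default thresholds are illustrative assumptions, not the settings used in the thesis.

```python
from skimage import color, feature, io

# Boolean edge map from a grey-scale image; sigma controls the Gaussian smoothing.
img = color.rgb2gray(io.imread("visual.png"))   # hypothetical input file
edges = feature.canny(img, sigma=2.0)
```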

2.3.4 Quadrature Filter

Quadrature filters are filters which are zero over the negative half of the Fourier space; specifically, they have to be zero on the negative frequency axis [7]. For signals with a finite length, the filters yield a band pass approximation of the analytic signal.

Quadrature filters can be used to determine what type of local structures an image contains [13]. They are complex valued in the spatial domain. In 2D, the odd (imaginary) part of the filter detects edges and the even (real) part detects lines. The type of local structure within a region is described by the local phase, which is the relationship between the real and imaginary filter response. The concept of local phase is described by Figure 2.1. The magnitude of the filter response can be used as a certainty measurement describing how likely it is that the filter detected a structure (but not just noise).

Figure 2.1: Figure describing the concept of quadrature filters. Image source: [13].

To be able to perform well, and detect features accurately, the quadrature filter used should be invariant to smooth gradients. It is also necessary that the odd part performs well according to Canny's criteria found in Section 2.3.3 and that the even part satisfies the requirement of a zero DC component [7].
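As a rough 1D illustration of the local phase concept, the analytic signal can play the role of the quadrature filter response: its magnitude indicates how strongly a structure is present and its argument is the local phase. This is only an analogy under simplified assumptions; the thesis uses directional 2D quadrature filters, not the Hilbert transform shown here.

```python
import numpy as np
from scipy.signal import hilbert

t = np.linspace(0.0, 1.0, 512)
signal = np.sin(2 * np.pi * 8 * t)     # simple band-limited test signal
q = hilbert(signal)                    # complex analytic (quadrature-like) signal
magnitude = np.abs(q)                  # certainty that a structure is present
local_phase = np.angle(q)              # line vs. edge type of local structure
```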


2.4 Scale Pyramids

This section describes the basic concept of scale pyramids that are used in Chapter 3.

The scale is essential in computer vision and image analysis to be able to design methods for interpreting information from images and multidimensional signals [2]. The obtained information is dependent on the relationship between the size of the actual structures portrayed in the image and the resolution of the camera sensors. Real-world objects are composed of different structures at different scales. For example, points and lines may appear in different ways depending on the scale of observation. When using scale space analysis, structures at different scales in the images can be analysed. This is useful when developing automatic algorithms interpreting images of unknown scenes, since there is no way to initially determine which scale is relevant.

The images at different scales can form a so-called scale pyramid, composed of N images corresponding to the original image scaled down with the factor 2^n for n = {0, 1, ..., N − 1}. An example of an image pyramid with three scales is illustrated in Figure 2.2.

Figure 2.2: Image pyramid illustrating the KAIST image scaled down with factors 2^0, 2^1 and 2^2.

2.4.1 Downsampling of Hard Edges

Downsampling binary edges can be done as described by Gunilla Borgefors [5]. Downsampling by just taking every other pixel cannot be applied to binary edges, since edges that are only one pixel wide can disappear. To get around this problem, each pixel in the downsampled image depends on four pixels in the original image. The original image is divided into patches containing four pixels (creating a square patch). Then each patch gets to vote, edge or no edge. If at least one of the four pixels votes for edge, then the pixel in the downsampled image corresponding to the current patch in the original image is set to be an edge. This way no information will be lost. An example of a binary edge image (a Canny image) and a downsampled version can be found in Figure 3.8.
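A minimal sketch of this voting-based downsampling, assuming NumPy; the function name is an illustrative choice.

```python
import numpy as np

def downsample_binary_edges(edges):
    """Halve a binary edge image: a pixel in the result is an edge if any
    pixel in the corresponding 2x2 patch of the original image is an edge."""
    h, w = edges.shape
    h2, w2 = (h // 2) * 2, (w // 2) * 2              # crop to an even size
    patches = edges[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2)
    return np.any(patches, axis=(1, 3))              # "vote": edge or no edge
```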


2.5 Error Measurements

This section describes the error measurements used in Chapter 3.

2.5.1 Intensity Based Optical Flow

Anders Eklund describes an intensity based registration approach based on minimising the optical flow error [11]. It uses the following assumptions:

1. The motion between two images, I_1 and I_2, can locally be described as a movement v(x), i.e. there is no intensity change between the images: I_2(x + v(x)) ≈ I_1(x).

2. The intensity in the image can locally be described as a slanted plane, i.e. as a first order Taylor expansion: I(x + v(x)) ≈ I(x) + v(x)^T \nabla I(x).

When combining the assumptions, the classical optical flow equation, \nabla I^T v − \Delta I = 0, can be derived (where \nabla I is the image gradient and \Delta I is the intensity difference between I_1 and I_2). The registration is performed by finding the motion field that minimises the least square error \epsilon^2 = \sum_i (\nabla I(x_i)^T v(x_i) − \Delta I(x_i))^2. The motion field is modelled as

v(x) = B(x) \cdot p = \begin{bmatrix} 1 & 0 & x & y & 0 & 0 \\ 0 & 1 & 0 & 0 & x & y \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \\ p_6 \end{bmatrix} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} + \begin{bmatrix} p_3 & p_4 \\ p_5 & p_6 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}    (2.7)

where p is a 6-dimensional parameter vector containing the transformation parameters and B(x) is a base matrix.

When calculating the derivative of the least square error with respect to the parameter vector, p, and setting it to zero, an optimal parameter vector, p_optimal = A^{-1} h, can be found. Here A = \sum_i B_i^T \nabla I_i \nabla I_i^T B_i and h = \sum_i B_i^T \nabla I_i \Delta I_i. The displacement between the images cannot be found directly. This is because the assumptions made are not valid everywhere in the image. The parameter vector that is best in the least square sense might not be the best solution locally. Hence, the best solution is found by iterating the algorithm. For each iteration, a new p_optimal is calculated and added to the total parameter vector, p_total = p_total + p_optimal. The final p_total is then used to calculate a motion vector for each pixel using the model for the motion field (2.7). Interpolation is used to calculate the adjusted image.
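A minimal sketch of one least-squares step for the affine model (2.7), assuming NumPy. The sign convention for \Delta I and the choice of placing the coordinate origin at the image centre are assumptions made for illustration.

```python
import numpy as np

def estimate_affine_params(I1, I2):
    """One least-squares step: p_optimal = A^{-1} h for the model in (2.7)."""
    gy, gx = np.gradient(I1.astype(float))        # image gradients (rows, cols)
    dI = (I1.astype(float) - I2.astype(float)).ravel()   # Delta I (sign is a convention)
    h_img, w_img = I1.shape
    ys, xs = np.mgrid[0:h_img, 0:w_img]
    x = (xs - w_img / 2.0).ravel()                # centre the coordinate system
    y = (ys - h_img / 2.0).ravel()
    gxr, gyr = gx.ravel(), gy.ravel()
    # Each row of G is B_i^T grad(I)_i, a 6-vector, for one pixel.
    G = np.stack([gxr, gyr, gxr * x, gxr * y, gyr * x, gyr * y], axis=1)
    A = G.T @ G                                   # sum_i B_i^T gI_i gI_i^T B_i
    h = G.T @ dI                                  # sum_i B_i^T gI_i Delta I_i
    return np.linalg.solve(A, h)                  # p_optimal
```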

Figure 2.3 illustrates two images which contain structures with the same intensities. The structure in Figure 2.3b is a transformation of the structure in Figure 2.3a. The intensity based optical flow method can be used to generate a motion field that aligns the images, by comparing intensity values in each pixel. The motion field is illustrated with red arrows in Figure 2.3b.


(a) Example image containing a black rectangle. (b) Transformed example image with the motion field generated to realign it with the example image in Figure 2.3a.

Figure 2.3: Figure 2.3b is a transformation of Figure 2.3a. Since the structures have the same intensity, the motion field can be generated by the intensity based registration approach to align the images.

2.5.2 Phase Based Optical Flow

Another registration method described by Anders Eklund, Mats Andersson and Hans Knutsson is a phase based registration using quadrature filters [12]. This approach is very similar to the intensity based optical flow approach in Section 2.5.1, but instead of using the intensity from the images, the local phase is extracted. It is extracted by using quadrature filters. The updated optical flow equation, \nabla\varphi^T v − \Delta\varphi = 0, is minimised in a similar manner as in the intensity based approach.

The local phase from a complex valued filter response, such as the quadrature filtered images, will tell if there is a line or an edge present and also what kind of edge (dark to bright or vice versa). Since the concept of phase is only valid if a direction is defined, one filter in each wanted direction has to be used. For images, i.e. the 2D case, it is sufficient with two filters (typically in the x- and y-direction).

The phase information is only of interest if the magnitudes of the filter responses are high for both images, otherwise the phase is more or less noise. It is also important that the phase does not differ too much in order to make sure that the images contain the same type of structure. Due to this, it is important to use filters that cover the corresponding structures in both images.

In order to ensure that the structures are strong and of the same type, a certainty measurement is formed as

c = \sqrt{|q_1 q_2|} \cos^2\!\left(\frac{\Delta\varphi}{2}\right)    (2.8)

where q_1 and q_2 are the filter responses from the quadrature filters. Since the quadrature filters only give the phase in one direction, one certainty measurement for each direction needs to be used to know which filters to trust in each point. The phase difference, \Delta\varphi, is calculated by

\Delta\varphi = \arg(q_1 q_2^*),    (2.9)

where ^* denotes the complex conjugation. The least square error is calculated in the same way as in the intensity based approach, but the local phases are used instead of the local intensities and a certainty measurement, c_{ik}, is taken into account. The error for this approach is calculated according to

\epsilon^2 = \sum_k \sum_i c_{ik} \left( \nabla\varphi_k(x_i)^T v(x_i) - \Delta\varphi_k(x_i) \right)^2    (2.10)

i.e. a sum over all pixels, i, and all filters, k. Apart from this, the registration process works in the same way as for the intensity case where the parameter vector, p, is calculated and optimised.
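A small sketch of how the certainty (2.8) and phase difference (2.9) follow from two complex quadrature filter responses, assuming NumPy arrays q1 and q2 for the same filter direction in the two images; the function name is an illustrative choice.

```python
import numpy as np

def phase_certainty(q1, q2):
    """Certainty (2.8) and phase difference (2.9) from complex filter responses."""
    dphi = np.angle(q1 * np.conj(q2))                        # (2.9)
    c = np.sqrt(np.abs(q1 * q2)) * np.cos(dphi / 2.0) ** 2   # (2.8)
    return c, dphi
```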

Figure 2.4 illustrates two images with different intensities but the same structures present. The structure in Figure 2.4b is a transformation of the structure in Figure 2.4a. The phase based optical flow approach uses phase information to register images, and is therefore able to generate a motion field that aligns the images with each other. The motion field is illustrated as red arrows in Figure 2.4b.

(a) Example image containing a black rectangle. (b) Transformed example image with reversed intensity and the motion field generated to realign it with the example image in Figure 2.4a.

Figure 2.4: Figure 2.4b is a transformation of Figure 2.4a with reversed intensity. The motion field generated from the phase information in the images illustrates how Figure 2.4b should be moved to align the images.

2.5.3 Chamfer Distance

A measurement called Chamfer distance can be used for matching edges or lines in images (Chamfer matching) and hence it can also be used to align images [3].


The method needs to be informed where in the images the edges are. This information is provided by a binary edge array (for example from the Canny edge detector).

The array needs to be converted into an edge-distance map presenting the distance to the closest edge. The edge-distance map can be calculated by two passes over an image, where there are zeros on the positions of the edges and infinity otherwise. The first pass is made diagonally from the top left to the bottom right. The second pass is made diagonally from the bottom left to the top right. For each pass the edge distance for the current position is calculated as the smallest adjacent value plus some distance map weight. If the value for the current position is smaller than the adjacent ones when the distance map weight is added, the current value is kept. The distance map weight can be chosen such that one weight value is added if a step is taken horizontally or vertically and a higher weight value is added if the step is taken diagonally.

Borgefors states that the maximum difference between the Chamfer distance and the Euclidean distance is as low as 8% if the distance map weights 3 and 4 are used [5]. Distance map weights 3 and 4 means that each step taken vertically or horizontally is punished by 3 and each step taken diagonally is punished by 4. This can be seen in Figure 2.5. Borgefors also states that the Chamfer distance is good enough compared to the Euclidean distance. The difference between them is within the noise range and the Chamfer distance demands less computation. An example of how the distances from an edge can look is seen in Figure 2.5, where Figure 2.5a illustrates an example edge (the pixels with value one represent the edge) and Figure 2.5b illustrates the distances from the edges when the distance map weights are 3 and 4. The values in each pixel should be divided by three to generate a horizontal and vertical distance map weight equal to 1 and a diagonal distance map weight close to √2. This is to generate a distance map similar to a Euclidean one. In Figure 2.5b, the distance map weights have not been divided by three, to simplify the explanation by only using integers.

To calculate the error, the values in the edge-distance map, which correspond to the edges in the second image used in the matching process, are summed. The best match is found by moving the edge points in the second image such that the error is minimised. The movement is made in steps where the movement generating the smallest error from a local search is used to generate a new starting position for the search of the next movement. To speed up the Chamfer matching algorithm, the search can be done at different scales. This also makes the matching more robust to noise. Depending on which type of transformation separates the images, different optimal step lengths for the local search transformations can be calculated. The optimal step lengths for the local search are also dependent on the used scale of the images. The step length is optimal if it makes the edge points furthest away from the centre of the image move one pixel.

Bad matches are given a large impact on the error by using the root mean square instead of the plain sum when calculating the total error.

(a) Example of an edge where zeros represent "no edge" and ones represent "edge". (b) Example of an edge-distance map where the distance map weights are 3 and 4.

Figure 2.5: The edge and its corresponding edge-distance map. If the pixel values in the edge-distance map were to be divided by 3, the horizontal and vertical distance map weights would be close to 1, i.e. the Euclidean distance.
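A minimal sketch of the two-pass 3-4 distance transform described above, assuming NumPy; it is loop-based and unoptimised on purpose. The matching error can then be taken as, for example, the root mean square of the map values sampled at the other image's edge positions.

```python
import numpy as np

def chamfer_distance_map(edges, w_hv=3, w_diag=4):
    """Two-pass 3-4 chamfer distance transform of a binary edge image."""
    h, w = edges.shape
    big = (w_hv + w_diag) * (h + w)                 # acts as "infinity"
    d = np.where(edges > 0, 0, big).astype(int)

    # Forward pass: top-left to bottom-right.
    for y in range(h):
        for x in range(w):
            if x > 0:
                d[y, x] = min(d[y, x], d[y, x - 1] + w_hv)
            if y > 0:
                d[y, x] = min(d[y, x], d[y - 1, x] + w_hv)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y - 1, x - 1] + w_diag)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y - 1, x + 1] + w_diag)

    # Backward pass: bottom-right to top-left.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if x < w - 1:
                d[y, x] = min(d[y, x], d[y, x + 1] + w_hv)
            if y < h - 1:
                d[y, x] = min(d[y, x], d[y + 1, x] + w_hv)
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y + 1, x + 1] + w_diag)
                if x > 0:
                    d[y, x] = min(d[y, x], d[y + 1, x - 1] + w_diag)
    return d

# Example error, assuming `other_edges` is the transformed binary edge image:
# error = np.sqrt(np.mean(chamfer_distance_map(edges)[other_edges > 0] ** 2))
```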

2.5.4 Hausdorff Distance

Hausdorff is a similarity measurement which can be used as an error for image registration [26]. It finds the largest Euclidean distance among the smallest distances found between two sets of points. The alignment of images can be achieved by minimising this measurement. The set of smallest distances contains the shortest distance found from each point a_i \in A, for i = 0 ... N − 1, where N is the number of edge points in the first image, to any point in the second set, b_j \in B, for j = 0 ... M − 1, where M is the number of edge points in the second image. This can be summarised as

\hat{H}(A, B) = \max_{a \in A} \{ \min_{b \in B} \{ \|a - b\| \} \}.    (2.11)

An example of this for one point, a_i, in set A can be seen in Figure 2.6a. This has to be done in both directions, i.e. from set A to set B and from set B to set A. Calculating the largest distance for one of these directions is called the directed Hausdorff distance. An example of when the largest distance is found can be seen in Figure 2.6b. The final distance is the maximum found when comparing the largest smallest distance found from set A to set B and from set B to set A.

(a) The dashed lines are the distances from point a_i to all points in set B. The red dashed line is the shortest one. (b) The dashed lines are the shortest distances found from each point in set A to a point in set B. The green dashed line is the largest one.

Figure 2.6: The dashed lines are the distances from one point in set A to one point in set B. The left image shows how the smallest distance is found for one point in set A, and the right image shows how the "largest smallest distance" is found.

Abdel Aziz Taha and Allan Hanbury describe three different changes that can be made to the Hausdorff distance search to speed up the search process [26]. First they suggest what they call early stopping. This means that if a distance is found between a_i \in A and any b \in B that is smaller than the largest distance found, the search can stop for a_i and continue for a_{i+1}, since the distances will not be able to affect the final maximum. The second suggestion is to randomise the order of the points in set B. If the distance calculated between a point in set A and a point in set B is larger than the largest distance found in the stored set of smallest distances, but not the smallest for the point in A that is currently being looked at, the following distances will probably not fulfil the criteria either. When randomising B, the search process will be faster since early stopping can be triggered sooner. The third suggestion is to remove points from A which also exist in B, since the distance will be zero for those points and there is no value in calculating their distances.
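A minimal sketch of the directed Hausdorff distance with the early-stopping idea, assuming NumPy and (N, 2) arrays of edge point coordinates; the symmetric distance (2.11) is the maximum over both directions.

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed Hausdorff distance from point set A to point set B
    with the early-stopping break described above."""
    cmax = 0.0
    for a in A:
        cmin = np.inf
        for b in B:
            d = np.linalg.norm(a - b)
            if d < cmax:          # this a cannot raise the current maximum
                cmin = 0.0
                break
            cmin = min(cmin, d)
        cmax = max(cmax, cmin)
    return cmax

# Symmetric Hausdorff distance:
# H = max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```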

2.6 Performance Measurements

This section describes the performance measurements used in Chapter 3.

2.6.1 Cross-correlation

The cross-correlation between two images is a standard approach for calculating their degree of similarity. The cross-correlation, C, between I_1 and I_2 is calculated pixel-wise according to

C(i, j) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} I_1(m, n) \cdot I_2^*(m + i, n + j)    (2.12)

where ^* denotes the complex conjugate, 0 ≤ i < 2M − 1 and 0 ≤ j < 2N − 1. C forms an image of size (2N − 1) × (2M − 1) describing the correlation for every possible translation.
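A short example computing the full correlation surface with SciPy; the random test images and variable names are placeholders.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
I1 = rng.random((64, 64))
I2 = rng.random((64, 64))
C = correlate2d(I1, I2, mode="full")                 # (2N-1) x (2M-1) surface
peak = np.unravel_index(np.argmax(C), C.shape)       # strongest translation
```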


2.6.2 Mutual Information

The mutual information (MI) between two signals, for example between two images, can be calculated as

MI(I_1, I_2) = \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} P(I_1(n) \mid I_2(m))\, P(I_1(n)) \log\!\left( \frac{P(I_1(n) \mid I_2(m))}{P(I_1(n))} \right)    (2.13)

where P is the probability, I_1 and I_2 are the two images, and N and M are the number of pixels in the images [24].
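A hedged sketch of mutual information estimated from a joint histogram, which is the common practical form; it is written as the standard \sum p(a,b) \log(p(a,b)/(p(a)p(b))) and may differ in form from (2.13). The bin count is an arbitrary choice.

```python
import numpy as np

def mutual_information(I1, I2, bins=32):
    """Mutual information between two images from their joint histogram."""
    joint, _, _ = np.histogram2d(I1.ravel(), I2.ravel(), bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0                                    # avoid log(0)
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))
```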

2.6.3 Structural Similarity

The structural similarity (SSIM) index is a measurement of the similarity between two images [29]. The equation calculating the SSIM is based on comparing three image components: luminance, contrast, and structure. It is calculated by

SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}    (2.14)

where \mu_x is the average of window x, which is an image patch from one image, \mu_y is the average of window y, which is an image patch from the other image, \sigma_x^2 is the variance of x, \sigma_y^2 is the variance of y, \sigma_{xy} is the covariance of x and y, and c_1 and c_2 are two constants that stabilise the division when the denominators are small.
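A short example of the mean SSIM between two images using scikit-image; the test data and data_range argument are illustrative.

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(0)
img_a = rng.random((64, 64))
img_b = img_a + 0.05 * rng.random((64, 64))          # slightly perturbed copy
score = structural_similarity(img_a, img_b,
                              data_range=img_b.max() - img_b.min())
```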


3 Method

This chapter describes the methods used to fulfil the aims of the thesis. Section 3.1 describes how the work was divided between the two group members. Section 3.2 describes a general outline of the two methods. Section 3.3 contains justifications for each part of the selected methods. Section 3.4 describes the data set used to test the methods. Section 3.5 describes the pre-processing done before the registration methods were applied. Sections 3.6 and 3.7 describe the two methods (soft edge and hard edge) used to register the images. Section 3.8 describes the measures used to evaluate the performance of the methods. Lastly, Section 3.9 describes how the error measures from the registration methods were used for position determination and Section 3.10 contains two algorithms describing the methods. The implementation of the algorithms was made with Python as the framework and the computer used for the calculations had one CPU with a clock frequency of 3.4 GHz and 32 GB of RAM.

3.1 Division of Labor

The work was divided between the two group members as follows: Victoria Nordberg was responsible for the implementation and evaluation of the soft edge method and Linnea Fridman was responsible for the implementation and evaluation of the hard edge method. Parts of the two methods that coincided were jointly implemented and described.

3.2 General Outline

The general outline of the whole registration process is described by the flowcharts in Figure 3.1.


Figure 3.1a describes the pre-processing made for actual use of the algorithm. The pre-processing described in Figure 3.1b was not part of the algorithm made for real use but only to simulate distortions between the images in order to evaluate the performance. The difference between the two pre-processing approaches was that a transformation was defined and applied to the image in the second one. When evaluating the performance of the methods, the visual images were pre-processed according to Figure 3.1a and the IR images were pre-processed according to Figure 3.1b. That way, the visual and IR image used in the registration were both filtered and related to each other by a known transformation. Figure 3.1c describes the general workflow used by both registration methods.

After the pre-processing of the images, the registration process began. The first step was feature extraction and downsampling. These were performed for both methods but in different orders. The soft edge method downsampled the images and then extracted the features, while the hard edge method extracted the features and then downsampled the feature images. Then the features were registered in a way that minimised some chosen error measurement and estimated the transformation parameters, which formed a transformation matrix T. This registration process was performed on several image scales and the final transformation matrix T_final was applied to the original IR image to align it with the

visual image. When this was done, an evaluation was performed on the results to give a measurement of confidence for the proposed transformation.

(a) Flowchart describing the general pre-processing used for the images. (b) Flowchart describing the pre-processing used for the IR images for evaluation purposes. (c) Flowchart describing a general registration process used by both methods.

Figure 3.1: Flowcharts describing a general workflow used by both methods.


3.3 Motivations for the Selected Methods

The two methods were chosen to use different information from the images. The reason for this was to be able to investigate how the two separate approaches of information extraction affected the registration.

The soft edge method used a feature extractor inspired by the work of Kasiri, Clausi and Fieguth [18]. They use a mix of phase congruency and gradient information from images to extract edges and receive a measurement of edge strength. These edge detectors are described in Sections 2.3.1 and 2.3.2. The hypothesis was that the combination of two separate edge detectors would yield a more reliable edge strength measurement. The hypothesis was also that the more information that could be extracted from the images, the more certain the algorithm could be when searching for a solution.

The used registration method was the phase based optical flow approach, described in Section 2.5.2. This approach uses a certainty measurement which takes the magnitude and the phase difference of a quadrature filter response into consideration to decide if the detected edge is trustworthy or not. What differed between this registration approach and the soft edge method was the images used as input. The soft edge method used the feature images formed in a similar way as in [18] whereas the method described in Section 2.5.2 uses original images. The motivation for this was that the certainty measurement in the phase based optical flow depends on the magnitude, and the magnitude of the features depends on the strength of the edges.

Since the phase based optical flow method uses quadrature filters to find the transformation parameters, the magnitude of the transformation that can be handled is limited to the quadrature filter size. This is the motive for using downsampled images. Since the downsampled images create a rough estimation, scale pyramids were used to refine the transformation and yield a more accurate image registration.

The hard edge method used two different approaches. Saadat et al. gave inspiration for the first approach [23]. They use Canny edges from medical images to do a coarse registration. The aim was to investigate if their method could work for visual and IR images as well. The selected error measurement was the hierarchical Chamfer matching described by Borgefors in [5]. Some of the steps described by Borgefors were used in the implementation of the hard edge method. The decision of which parts were used was based on trying to create a global minimum at the correct position. The registration was performed on different scales to speed up the calculations and to decrease the number of faulty error minima. Between the scales, "bad points" were removed. "Bad points" were defined as those points in the edge images contributing the most to the error. These points were removed since they probably did not exist in both image domains and would therefore only disturb the global minimum.

The second hard edge approach was inspired by the work of Hrkać, Kalafatic, and Krapac [15]. This approach used the Hausdorff distance as the error measurement but was discarded because of early indications that it would yield insufficient results.


3.4 Data Set

During the implementation of the algorithms, a multimodal data set was used. To test the methods at an early stage, a simple test image pair was created in Microsoft PowerPoint, see Figure 3.2. This image pair contained some matching and some non-matching structures with different intensities to resemble attributes from real visual and IR images. One example of a non-matching structure is the black structure in the middle of the "visual image" in Figure 3.2a. The size of the images was 512 × 640 pixels.

(a) "Visual image" from the simple test image pair. (b) "IR image" from the simple test image pair.

Figure 3.2: Simple test image pair generated to resemble visual and IR image attributes.

In addition, subsets of the data set KAIST were used to test and evaluate the algorithms [16]. KAIST consists of a set of aligned visual and IR images captured from a driving vehicle. Since the car is driving along a road, many of the images are very similar to each other. The size of the images is 512 × 640 pixels. Figure 3.3 shows an example of an image pair from KAIST.

The KAIST images are captured using a specific hardware setup and a calibration procedure in order to align the images [16]. The result of the calibration procedure is that the two image domains share the same focal length, have the same principal point and no baseline. However, there still exists some misalignment between the images, as can be seen in Figure 3.4 where an overlay of the visual and IR image pair from Figure 3.3 has been created. The overlay was created by adding the greyscale version of the visual image to the greyscale IR image placed in the red colour channel. Figure 3.4a shows the overlay of the images in the original size with two red boxes highlighting areas where the distortion is clearly visible. Figures 3.4b and 3.4c show zoomed versions of these boxes. Potential causes for the distortions are lens distortion and differences in camera characteristics such as exposure time, which highly affects the imaging of dynamic objects. These misalignments are disregarded in the rest of the report and the images are referred to as aligned.


(a) Visual image from KAIST. (b) IR image from KAIST.

Figure 3.3: Two example images from the KAIST data set.

(a) Example image with red boxes where the misalignment is visible. (b) Example area where misalignment can be seen in the trees. (c) Example area where misalignment can be seen on the post.

Figure 3.4: Overlay of the visual image (in grey) and IR image (in red) showing examples of misalignment between the images.


3.5 Pre-Processing

This section describes the pre-processing applied to the images before the two methods were applied. First a transformation was applied, and secondly the images were filtered to enhance desired features.

3.5.1 Application of Transformation

Since the images used to test the registration methods were aligned, a similarity transformation was applied to the IR image in order to be able to test the performance of the registration algorithms. The applied transformation was defined as a translation in the x- and y-direction, a rotation angle and a scale factor. The result was an IR image rotated around its centre, scaled and translated with the chosen parameters. An example of an overlay of a visual image and a transformed IR image can be found in Figure 4.6a. The IR image was first rotated 4 degrees and scaled with a factor of 1.2. This caused some of its content to lie outside the visual image range, and that content was therefore cropped. After this, the translations were applied: 40 pixels to the right and 30 pixels down. Since the IR image was cropped, no content was visible at the left and top border of the overlay. During the registration, the information in this area was ignored; otherwise an edge would be extracted at the IR image border.
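A sketch of how such a similarity transformation could be applied with scikit-image, assuming the rotation is taken about the image centre; the function name, parameter defaults and sign conventions are assumptions and may differ from the thesis implementation.

```python
import numpy as np
from skimage import transform

def apply_similarity(ir_image, tx=40, ty=30, rotation_deg=4, scale=1.2):
    """Rotate about the image centre, scale, then translate an IR image."""
    h, w = ir_image.shape
    centre = np.array([w, h]) / 2.0
    t = (transform.SimilarityTransform(translation=-centre)
         + transform.SimilarityTransform(rotation=np.deg2rad(rotation_deg),
                                         scale=scale)
         + transform.SimilarityTransform(translation=centre)
         + transform.SimilarityTransform(translation=[tx, ty]))
    # warp takes the inverse mapping (output coords -> input coords).
    return transform.warp(ir_image, t.inverse, preserve_range=True)
```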

3.5.2 Image Filtering

The edges in the images were features that were desirable to enhance, since some of the edges probably existed in both image domains. To enhance the edges, bilateral filtering was applied according to (2.1). Figure 3.5 shows a zoomed in version of the visual image in Figure 3.3a before and after the bilateral filtration.

(a) Before bilateral filtration. (b) After bilateral filtration.

Figure 3.5: A zoomed in version of the visual image before and after bilateral filtering.


3.6 Soft Edge Method

The following section describes the method based on soft edge features. The general workflow is described by Algorithm 1 in Section 3.10.

3.6.1 Generation of Scale Pyramids

As described by Figure 3.1c and Algorithm 1, lines 2-3, the first task of the registration was to create an image pyramid with N images scaled down with a factor 2^n, where n = {0, 1, ..., N − 1}. To reduce the risk of aliasing while downsampling the images, a Gaussian filtration was performed beforehand. The selected sigma for the Gaussian filter was 2^(n+1)/6.0, which corresponds to a filter mask twice the size of the scale factor that covers more than 99% of the Gaussian distribution for an image scaled down with a factor 2^n. The number of scale levels N used in the registrations was chosen based on the size of the filters used. N was here set to N = 5, since the filters then covered 30% of the image, which should make it able to handle small transformations.
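A minimal sketch of a pyramid generation consistent with the description above, assuming NumPy and SciPy; the function name and the simple stride-based subsampling are illustrative choices, not a transcription of the thesis implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_pyramid(image, num_scales=5):
    """Scale pyramid: level n is smoothed with sigma = 2^(n+1)/6
    and subsampled by a factor 2^n."""
    pyramid = []
    for n in range(num_scales):
        factor = 2 ** n
        sigma = 2 ** (n + 1) / 6.0
        smoothed = gaussian_filter(image.astype(float), sigma)
        pyramid.append(smoothed[::factor, ::factor])
    return pyramid
```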

3.6.2 Feature Extraction Using Phase Information

The second task was to extract features from the images. This step corresponds to lines 11-12 in Algorithm 1. The soft edge method extracted features by using a combination of phase congruency and gradient magnitude. Figure 3.6 shows the two components and the final soft edge features of Figure 4.1a.

Figure 3.6a illustrates the phase congruency, which was calculated as described in Section 2.3.1. The number of orientations was set to six and the remaining parameter values were set to the recommended ones. Figure 3.6b illustrates the gradient magnitude, which was calculated as described in Section 2.3.2 but with a slight modification of the Sobel filters. The modified Sobel filters are defined as

G_x = \begin{bmatrix} 0 & +4 & 0 & -4 & 0 \\ 0 & +8 & 0 & -8 & 0 \\ 0 & +12 & 0 & -12 & 0 \\ 0 & +8 & 0 & -8 & 0 \\ 0 & +4 & 0 & -4 & 0 \end{bmatrix}    (3.1)

and

G_y = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ +4 & +8 & +12 & +8 & +4 \\ 0 & 0 & 0 & 0 & 0 \\ -4 & -8 & -12 & -8 & -4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.    (3.2)

Figure 3.6c illustrates the combination of phase congruency and gradient magnitude, which is referred to as the soft features.
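The sketch below computes the gradient magnitude with the modified Sobel filters in (3.1) and (3.2) and combines it with a phase congruency map. Two assumptions are made: the combination is taken as a pixel-wise product, and phase_congruency stands in for an existing phase congruency implementation (six orientations as above); the thesis does not spell out either detail here.

import numpy as np
from scipy.ndimage import convolve

# Modified Sobel filter from (3.1); (3.2) is its transpose.
GX = np.array([[0,  4, 0,  -4, 0],
               [0,  8, 0,  -8, 0],
               [0, 12, 0, -12, 0],
               [0,  8, 0,  -8, 0],
               [0,  4, 0,  -4, 0]], dtype=float)

def soft_features(image, phase_congruency):
    gx = convolve(image.astype(float), GX)
    gy = convolve(image.astype(float), GX.T)
    grad_mag = np.hypot(gx, gy)
    # Assumed combination: pixel-wise product of the two components.
    return grad_mag * phase_congruency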



Figure 3.6: Feature extraction using phase information for a visual KAIST image: (a) phase congruency, (b) gradient magnitude, (c) combination of phase congruency and gradient magnitude. White pixels correspond to 0 and black pixels to higher values.

3.6.3 Estimation of Transformation Parameters

The next task was to estimate the transformation parameters used to align the images. This corresponds to line 16 in algorithm 1. To do this, the soft edge method used the phase based optical flow method described in Section 2.5.2. The basic outline of the method was to filter the images with quadrature filters in a number of different directions, in this case two. From the filter responses, the magnitude and phase difference were extracted, and together they formed a certainty measurement for each pixel, calculated by (2.8). These three attributes are illustrated in Figure 3.7. Figure 3.7a shows the absolute value of the product of the filter responses $q_{1y}$ and $q_{2y}$. The filter responses $q_{1y}$ and $q_{2y}$ were generated by applying a quadrature filter in the y-direction to the soft feature images generated from the visual and IR image. Figure 3.7b shows the phase difference from these filter responses, calculated by (2.9). Figure 3.7c shows the certainty for each pixel, calculated by (2.8).



Figure 3.7: Components used to estimate the transformation parameters by minimising the optical flow error, produced by the filter responses from a quadrature filter in the y-direction: (a) the absolute value of the product of the filter responses $q_{1y}$ and $q_{2y}$, in $\log_{10}$ scale, (b) the phase difference $\Delta\varphi_y$ according to (2.9), and (c) the certainty measurement $c_y$ according to (2.8), in $\log_{10}$ scale.

These attributes, combined with a transformation model, were used to compute and minimise the optical flow error in order to obtain the optimal transformation parameters to align the images.

The phase based optical flow approach, described in Section 2.5.2, uses an affine transformation model and therefore estimates six parameters. This thesis only treats estimations of similarity transformations, hence their method was adjusted to estimate only four parameters. The modifications needed were a slight change in the $B(x)$-matrix and the $p$-vector used to model the motion field $v(x)$ in (2.7), giving
$$v(x) = B(x)p = \begin{bmatrix} 1 & 0 & x & -y \\ 0 & 1 & y & x \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{bmatrix} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} + \begin{bmatrix} p_3 & -p_4 \\ p_4 & p_3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \quad (3.3)$$



where $p_1$ and $p_2$ correspond to the translation, $p_3 = s \cdot \cos(\theta)$ and $p_4 = s \cdot \sin(\theta)$, where $\theta$ is the rotation angle around the image centre and $s$ is the scale factor.
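For illustration, the mapping between the four parameters in (3.3) and a 2x3 transformation matrix, and the recovery of s and θ from $p_3$ and $p_4$, can be written as below (the function names are illustrative):

import numpy as np

def similarity_matrix(p):
    # p = (p1, p2, p3, p4) with p3 = s*cos(theta), p4 = s*sin(theta).
    p1, p2, p3, p4 = p
    return np.array([[p3, -p4, p1],
                     [p4,  p3, p2]])

def scale_and_angle(p):
    # Recover the scale factor and rotation angle (radians) from p3 and p4.
    _, _, p3, p4 = p
    return np.hypot(p3, p4), np.arctan2(p4, p3)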

3.6.4 Refinement of the Transformation Parameters

The soft edge registration began with estimation of the transformation parameters for the coarsest visual and IR image pair in the pyramid. These parameters were applied to the next IR image in the pyramid, hence the transformation parameters were more and more refined for each level in the pyramid. This procedure is described on lines 7-18 in algorithm 1. Line 14 corresponds to the movement of the IR image before the refined registration and line 18 corresponds to the creation of the refined transformation matrix.

Because of the rescaling, the estimated parameters had to be adjusted when applying the previous transformation parameters to the current image pair. Given the transformation matrix at scale n,
$$T_n = \begin{bmatrix} p_{n3} & -p_{n4} & p_{n1} \\ p_{n4} & p_{n3} & p_{n2} \end{bmatrix},$$
the corresponding transformation for scale m (n ≥ m) was calculated as
$$T_m = \begin{bmatrix} p_{m3} & -p_{m4} & p_{m1} \\ p_{m4} & p_{m3} & p_{m2} \end{bmatrix} = \begin{bmatrix} p_{n3} & -p_{n4} & p_{n1} \cdot 2^{(n-m)} \\ p_{n4} & p_{n3} & p_{n2} \cdot 2^{(n-m)} \end{bmatrix}. \quad (3.4)$$
The product of all generated transformation matrices, adjusted to the original image scale, yielded the final transformation matrix T used to align the images. The number of scales was limited to three (n = {4, 3, 2}) in order to reduce computations.
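A sketch of the scale adjustment in (3.4) and of the composition of the per-level matrices; representing each 2x3 matrix in homogeneous 3x3 form for the multiplication is an implementation choice, not something prescribed by the thesis.

import numpy as np

def rescale_transform(T, n, m):
    # Adjust a 2x3 matrix estimated at scale n to scale m (n >= m):
    # only the translation column is scaled by 2^(n-m), cf. (3.4).
    T = T.copy()
    T[:, 2] *= 2 ** (n - m)
    return T

def compose(transforms):
    # Multiply the (already rescaled) 2x3 matrices together via their
    # homogeneous 3x3 counterparts to obtain the final transformation T.
    total = np.eye(3)
    for T in transforms:
        total = np.vstack([T, [0.0, 0.0, 1.0]]) @ total
    return total[:2, :]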

3.7 Hard Edge Method

This section describes the attempted methods for registration using the hard edge approach. For hard edges, two approaches were evaluated; the difference between them was that one used the Hausdorff distance, described in Section 2.5.4, and the other used the Chamfer distance, described in Section 2.5.3, as the error measurement. The Hausdorff method showed insufficient results early on and was not investigated further after that. The workflow for the approach using the Chamfer distance can be found in algorithm 2 in Section 3.10.

3.7.1 Feature Extraction Using the Canny Edge Detector

Both attempts made for registration using hard edges used Canny's algorithm to perform the edge detection, as described in Section 2.3.3; see lines 2 and 3 in algorithm 2. The parameters for Canny's algorithm were chosen such that the number of edge points in the binary edge images was larger than a set threshold. The threshold was first based on visual assessment of the content in one pair of the KAIST images; then a second threshold was tried based on the results from the first attempt. This made sure that only the strongest edges were chosen and that enough information was extracted. Canny's algorithm uses a lower and an upper threshold, and the ratio between these was set to 1:3. Figure 3.8a shows the Canny edges for Figure 3.3a after it has been filtered (Figure 4.1a).

Figure 3.8: Canny edge images for the visual image (a) at the original scale and (b) after downsampling with a factor $2^1$.
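A sketch of the threshold selection: the lower threshold is kept at one third of the upper one, and the upper threshold is lowered until the edge image contains at least a minimum number of edge points. The starting value, step and minimum count are illustrative, not the values used in the thesis.

import cv2
import numpy as np

def canny_with_min_points(image, min_points=5000, upper=200, step=10):
    # Lower/upper threshold ratio fixed to 1:3, as in Section 3.7.1.
    edges = cv2.Canny(image, upper / 3.0, upper)
    while np.count_nonzero(edges) < min_points and upper > step:
        upper -= step
        edges = cv2.Canny(image, upper / 3.0, upper)
    return edges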

3.7.2 Generation of Canny Scale Pyramid

From the Canny edge images, images at coarser scales were calculated using the downsampling described in Section 2.4.1. Figure 3.8 shows the Canny edges for the visual KAIST image at the original scale in Figure 3.8a and after the edges have been downsampled one level in Figure 3.8b. The scale levels used were n = {0, 1, 2}. This choice was based on the visual degradation of the structures and on the performance between the scales. This step can be found in lines 5 and 6 in algorithm 2.

3.7.3 Hausdorff Matching

The binary images obtained from the Canny algorithm were turned into two lists, one containing the edge point coordinates for the visual image and one containing the edge point coordinates for the IR image. These point lists were used for calculating the Hausdorff distance between the images, which was used as the error measurement, calculated by (2.11). To speed up the algorithm, the modifications described in Section 2.5.4 were applied.

The registration was performed by first calculating the Hausdorff distance for the following cases: no additional transformation applied, translation applied in the positive x-direction, translation applied in the negative x-direction, translation applied in the positive y-direction and translation applied in the negative y-direction. The applied translation was one pixel. These five errors were compared, and the transformation resulting in the smallest error was chosen and applied to the IR image. This was iterated until none of the five new errors were smaller than the error from the last iteration. The total translation obtained from these iterations was then used as the final estimated translation.
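The iteration amounts to a greedy hill climb over one-pixel translations; a sketch is shown below, where error(tx, ty) stands in for the Hausdorff (or, in Section 3.7.4, the Chamfer) error obtained after shifting the IR edge points by (tx, ty).

def greedy_translation(error, max_iter=500):
    tx, ty = 0, 0
    best = error(tx, ty)
    for _ in range(max_iter):
        # Evaluate one-pixel shifts in the four directions.
        candidates = [(tx + 1, ty), (tx - 1, ty), (tx, ty + 1), (tx, ty - 1)]
        e, cx, cy = min((error(cx, cy), cx, cy) for cx, cy in candidates)
        if e >= best:
            break  # no neighbouring shift improves the error
        best, tx, ty = e, cx, cy
    return tx, ty, best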

3.7.4 Chamfer Matching

The Canny response images were used for calculating the Chamfer distance between the images, which in turn was used as the error measurement. The Chamfer distance is described in Section 2.5.3. The edge-distance maps and Canny edge points used for calculating the Chamfer distance are shown in Figure 3.9. The root mean square average was used to calculate the error. Since the visual image and the IR image both contain some information which exists in only one of the domains, the Chamfer distance was calculated for both images and then summed to get the final error. The full equation for the error is

$$\epsilon = \frac{1}{3}\sqrt{\frac{1}{N_{Vis}}\sum_{p \in VisCny} IRChamf[p]^2} + \frac{1}{3}\sqrt{\frac{1}{N_{IR}}\sum_{p \in IRCny} VisChamf[p]^2}. \quad (3.5)$$

In (3.5), $VisCny$ and $IRCny$ are the Canny response images for the visual and IR image, $p$ is a point in the Canny response images, $N_{Vis}$ and $N_{IR}$ are the number of points in the visual and IR Canny responses, and $VisChamf$ and $IRChamf$ are the edge-distance maps for the visual and IR image. The average is divided by three to compensate for the distance map weight.
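The error in (3.5) can be evaluated with two distance transforms, one per edge image; a sketch using SciPy's Euclidean distance transform is shown below. Since distance_transform_edt gives, for every non-zero pixel, the distance to the nearest zero pixel, the binary edge images are inverted before the call. Exact Euclidean distances are used here, so the 1/3 factor that compensates for the integer chamfer weights in the thesis is omitted.

import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_error(vis_edges, ir_edges):
    # vis_edges and ir_edges are boolean Canny edge images.
    # Distance maps: for each pixel, distance to the nearest edge pixel.
    vis_dist = distance_transform_edt(~vis_edges)
    ir_dist = distance_transform_edt(~ir_edges)
    # RMS of the IR distance map sampled at the visual edge points, and vice versa.
    err_vis = np.sqrt(np.mean(ir_dist[vis_edges] ** 2))
    err_ir = np.sqrt(np.mean(vis_dist[ir_edges] ** 2))
    return err_vis + err_ir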

The estimation of the transformation parameters was performed in the same way as in Section 3.7.3, but with the Chamfer distance instead of the Hausdorff distance as error measurement. When rotation and scaling were added as transformation parameters, the same approach as for translations was used. Translation, rotation and scaling were all tested simultaneously, i.e. four translations, two rotations and two scalings, a total of eight different transformations. The transformation generating the smallest error was chosen. The step lengths for the transformation parameters were adjusted for each scale level in the scale pyramid and chosen based on visual assessment of how much the images moved. For translation the step was $2^n$ pixels, for rotation the step was $0.1 \cdot 2^n$ degrees and for scaling the step was $0.01 \cdot 2^n$, where n is the current scale level in the pyramid. This is performed on line 22 in algorithm 2. The iteration performed for each new step taken is performed on lines 20-22 in algorithm 2.
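For clarity, the eight candidate transformations tested at pyramid level n can be generated as below, with the step lengths stated above; the parameter tuple layout is illustrative.

def candidate_transforms(tx, ty, angle, scale, n):
    # Step lengths per scale level: 2^n px, 0.1*2^n degrees, 0.01*2^n.
    t_step, r_step, s_step = 2 ** n, 0.1 * 2 ** n, 0.01 * 2 ** n
    return [
        (tx + t_step, ty, angle, scale), (tx - t_step, ty, angle, scale),
        (tx, ty + t_step, angle, scale), (tx, ty - t_step, angle, scale),
        (tx, ty, angle + r_step, scale), (tx, ty, angle - r_step, scale),
        (tx, ty, angle, scale + s_step), (tx, ty, angle, scale - s_step),
    ]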

3.7.5 Removal of Bad Points

In this algorithm "removal of bad points" means removal of those points con-tributing the most to the total error. The information about which points were "bad" was stored in masks where all pixels were ones except the pixels which con-tributes most to the final error, those pixels were zero. These masks are stored in the lists defined in line 8 and 9 in algorithm 2. The application of these masks
