Institutionen för systemteknik
Department of Electrical Engineering

Examensarbete (Master's Thesis)

Image Enhancement over a Sequence of Images

Master's thesis carried out in Signal and Image Processing at Linköping Institute of Technology

by

Mikael Karelid

LiTH-ISY-EX--08/4013--SE

Linköping 2008

Supervisor: Peter Bergström, Statens Kriminaltekniska Laboratorium
Examiner: Klas Nordberg, ISY, Linköpings universitet

Avdelning, Institution (Division, Department): Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Datum (Date): 2008-05-28
Språk (Language): English
Rapporttyp (Report category): Examensarbete (Master's thesis)
URL för elektronisk version: http://www.isy.liu.se, http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11942
ISRN: LiTH-ISY-EX--08/4013--SE
Titel (Title): Bildförbättring över en bildsekvens, del 2 (Image Enhancement over a Sequence of Images)
Författare (Author): Mikael Karelid

Abstract

This Master's Thesis has been conducted at the National Laboratory of Forensic Science (SKL) in Linköping. When images presenting an interesting object that are to be analyzed at SKL are of bad quality, there may be a need to enhance them. If several images with the object are available, the total amount of information can be used in order to estimate one single enhanced image. A program to do this has been developed by studying methods for image registration and high resolution image estimation. Tests of important parts of the procedure have been conducted. The final results are satisfying, and the key to a good high resolution image seems to be the precision of the image registration. Improvements of this part may lead to even better results. More suggestions for further improvements have been proposed.

Sammanfattning

This Master's Thesis has been carried out on behalf of Statens Kriminaltekniska Laboratorium (SKL) in Linköping. Since images of an interesting object to be analyzed at SKL are sometimes of bad quality, there is a need to enhance them. If several images of the object are available, the total information from these can be used to estimate one single enhanced image. A program to do this has been developed through studies of methods for image registration and creation of a high resolution image. Tests of important parts of the procedure have been carried out. The final results are good, and the key to a good high resolution image seems to lie in the precision of the image registration. By improving this part, even better results can probably be obtained. Other suggestions for improvements have also been put forward.


Acknowledgments

I would especially like to thank Peter Bergström, Fredrik Eklöf, Tobias Höglund, Per Brolund and also Johnny Bengtsson for discussing problems, finding solutions and coming up with new ideas. I would also like to thank them for letting me write this thesis in the first place.


Contents

1 Introduction
1.1 Background
1.2 Purpose and hypothesis
1.3 Disposition
1.4 Notation

2 Image enhancement over a sequence of images
2.1 Image registration
2.2 Overview

3 Mathematical Introduction
3.1 Homogeneous coordinates
3.2 Introduction to transforms
3.3 Transformation of planes

4 Features in images
4.1 Textures
4.2 Edges
4.2.1 Canny edge detection
4.3 Lines
4.4 Corners
4.4.1 Detection with Harris corner detector
4.4.2 Detection with SUSAN corner detector

5 Feature matching
5.1 Manual matching
5.2 Automatic matching
5.2.1 Matching with cross-correlation
5.2.2 Matching with phase filtering

6 Homography estimation and transformation
6.1 Transforming

7 Estimation of high resolution image
7.1 Image superposition
7.2 Pixel-spraying

8 Important intermediate steps
8.1 Choosing area of interest
8.2 Interlaced images
8.3 Extraction of straight edges and lines
8.4 Interpolation
8.5 Automatic feature matching and homography re-estimation
8.6 Iterative point position adjustment

9 Tests, experiments and results
9.1 Automatic feature matching test
9.2 Iterative point position adjustment test
9.3 Estimation of high resolution image test
9.4 Final test: From low resolution to high resolution

10 Discussion and conclusions
10.1 Image registration conclusions
10.2 Data integration conclusions
10.3 Future improvements

Bibliography

A Further explanation of methods
A.1 Color to intensity conversion
A.2 Canny edge detection
A.3 SUSAN edge detection
A.4 Harris (Plessey) corner detection
A.5 SUSAN corner detection
A.6 Non-maximum suppression
A.7 Singular value decomposition (SVD)
A.8 Cross-correlation
A.9 Symmetrical Phase Filtering
A.10 Direct Linear Transformation (DLT)
A.11 FMOIS

Chapter 1

Introduction

This Master's Thesis has been conducted at the National Laboratory of Forensic Science, in Swedish abbreviated SKL for Statens Kriminaltekniska Laboratorium. SKL conducts technical investigations that require equipment, tools or competence not otherwise available to the police. The material to be investigated can be anything from videos to DNA or explosives. The Image Division deals with the incoming videos and images for analysis or comparison.

1.1 Background

The steadily increasing use of cameras for surveillance, industrial or more casual purposes, as in mobile phones, creates a need for efficient handling and analysis of this kind of data. SKL often receives images and videos where potentially important details are hard to distinguish as a result of poor image quality. Old or cheap equipment may be one cause of bad image quality, but light conditions and other variables are also important factors. Even if single images or frames are of bad quality and hard to enhance separately, it may be possible to use data from all available images to create one single image with improved resolution and more distinct details. Examples of objects in images that the forensic engineers at SKL want to study more carefully this way are patches on jackets and car registration plates.

1.2 Purpose and hypothesis

The purpose of this Master's Thesis is to study the possibilities of estimating an enhanced image of an object found in several input images, and then to create a program in Matlab that is able to do so, based on the hypothesis that it can be done. The enhanced image shall contain more information about the specified object than any single input image.

In four points, the purposes of the thesis are to:


• Study image registration methods, that is, methods that find relations between image points in different images. This may involve feature detection ([10] [17] [18] [21] [23] [24]) and matching ([8] [14] [22] [25]), homography estimation ([1] [6] [11] [13]) and transformations ([8] [25]).

• Study methods that integrate data in several images ([7] [15] [16] [19]). These methods take advantage of the information in all images and estimate one single image containing more information than any of the original ones.

• Create a program that estimates a high resolution image of a planar object found in several images.

• Evaluate key methods used in the procedure of estimating an enhanced image, and study whether the results of the created program are satisfying.

1.3 Disposition

This thesis is organized in the following manner.

• Chapter 2 gives a brief overview of the thesis and the different steps towards the estimation of a high resolution image.

• Chapter 3 gives an introduction to the mathematics used.

• Chapter 4 explains how to extract features, such as edges and corners, from images.

• Chapter 5 explains how to match features from different images in order to build correspondences.

• Chapter 6 discusses homography estimation and transformation.

• Chapter 7 explains how to estimate an image of high resolution.

• Chapter 8 describes the intermediate steps that may or may not be essential parts in the procedure of estimating a high resolution image.

• Chapter 9 shows, with a series of tests, how well methods from different parts of the procedure work.

• Chapter 10 discusses the results and methods of this thesis and proposes future improvements.

1.4 Notation

In order to keep text and formulas consistent, this thesis uses the following notation.


• Groups of data, like coordinates or translations, are written between braces: $\{1, 3\}$, $\{t_x, t_y\}$.

• Vectors, like homogeneous coordinates for image points in vector form, are written in bold lowercase letters: $\mathbf{x}$, $\mathbf{y}$.

• A certain element in a vector is obtained by typing its index between brackets: $\mathbf{x}[1]$, $\mathbf{y}[i]$.

• Matrices are written in bold capital letters: $\mathbf{M}$, $\mathbf{I}$.

• A certain element in a matrix is obtained by typing its indices between brackets: $\mathbf{M}[1, 3]$, $\mathbf{I}[i, n]$.

• Functions are written in italics: $f$, $F$.

• The Euclidean distance between two image points is written as a function $d$: $d(\mathbf{x}, \mathbf{y})$.

• Pairs of entities, like image points, are written between angle brackets: $\langle \mathbf{x}, \mathbf{y} \rangle$.

In some places, new words or expressions that need to be emphasized are written in italics.

Images with points marked out in figures have been darkened in order to make the white point markers more distinct.

The term cross-correlation refers in this thesis to the normalized version of the method, usually called normalized cross-correlation. Read more about it in appendix A.8.

The resolution of an image will be used as a measure of the amount of usable information in the image, as opposed to spatial resolution, which will be used as a measure of the image size in terms of pixels. Of course, there is often a relation between these two measures.


Chapter 2

Image enhancement over a sequence of images

An image of high resolution contains relatively much information per unit area. The goal of this master thesis is to estimate such an image based on several images of low resolution; maybe not the entire images, but small interesting parts of them. It is not possible to achieve a higher resolution just by enlarging an image (increasing the spatial resolution by interpolation), since the information it contains is constant. What could happen is that extra false information is added, or that some is lost. To achieve a higher resolution, information must be added from other sources, such as other images. This is also the approach of this thesis. The quality of the final result depends highly on the ability to meet the following conditions.

• For images free from noise, their samples must relate to different points in the scene, in order to prevent the information in the images from being the same.

• The relation between the samples in different images must be found.

• Given the relations, the samples in the images must be combined in such a way that an image of higher resolution can be estimated.

Noise reduction can be achieved by computing the mean image of several images. In the simplest case, all images present the same object from the same viewpoint. If the mean value of the noise is zero, the mean image should have a higher signal to noise ratio. In a more complex case, the object is photographed from different angles. Ann Holmberg addresses this issue in her master's thesis [12].
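As a minimal illustration of this idea (the scene and noise level below are synthetic values invented for this sketch, in Python rather than the Matlab used for the thesis program), averaging images with independent zero-mean noise raises the signal to noise ratio:

    import numpy as np

    rng = np.random.default_rng(0)
    scene = rng.uniform(0.0, 1.0, (64, 64))        # hypothetical noise-free scene
    # Ten images of the same scene from the same viewpoint, each with
    # independent zero-mean Gaussian noise.
    images = [scene + rng.normal(0.0, 0.1, scene.shape) for _ in range(10)]
    mean_image = np.mean(images, axis=0)

    # The noise standard deviation drops by roughly a factor sqrt(10).
    print(np.std(images[0] - scene), np.std(mean_image - scene))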

2.1 Image registration

Image registration aims to find the relation between two different images presenting the same object from different angles or positions. Once found, it can be used to align one of the images, called the sensed image, to the other one, called the reference image. For example, if there is a difference in perspective, the sensed image may need to be rectified. The relative motion between the images need not necessarily be expressed explicitly if the method used gives the final result in a direct manner. Image registration is used for several purposes:

• Integration of data from different kinds of sensors. In medical applications these data can come from CT (Computed Tomography) and PET (Positron Emission Tomography).

• Detection of changes in images taken on different occasions. This helps to spot growth of tumors or to study the use of land.

• Creation of images with higher resolution, mosaicing of images with in-common objects, or creation of a 3D representation of the scene.

• Comparison of objects to a model, as in automatic quality inspection.

Sensor noise, low resolution, changes in perspective, movements, deformations, compression, lens distortion or different light conditions and shadows are all possible problems that complicate the procedure. [8] [25]

Image registration techniques can be categorized as either global or local. A global method presumes that the entire image, or at least the interesting part of it, is the result of one single global transformation relative to the reference image. The parameters of the transformation can be estimated by the use of in-common features in the images, e.g. corners and edges, or by methods like optical flow. Local methods can divide the image into triangles or squares, where a unique transformation is computed for each area, or create a representation of the scene in three dimensions. Ann Holmberg's master thesis [12] illustrates image division. [25]

The procedure of image registration generally consists of four different steps. [25]

• Feature detection: Corners, edges, lines and contours are extracted from the images. Points that define these features are called control points (CP).

• Feature matching: Correspondences between points and features in different images are built.

• Homography estimation: Given the correspondences of the features in two images, transformations that map the set of control points in one of the images to the set of points in the other image are estimated. If points on a plane are mapped onto a plane in the other image, that transformation is called a homography.

• Resampling and transformation: One of the images is transformed in order to resemble the other one as well as possible. This may require interpolation of image point values.


It is not always necessary to follow these steps strictly. For example, a pattern recognition method could skip the first step and immediately try to match the images. If the transform estimation is done via the frequency domain, as in [13], the feature matching is not really a matching of extracted corners and edges, but of their contributions to the frequency domain.

2.2 Overview

The above steps are easier to understand if shown in a picture. Figure 2.1 illustrates the procedure in graphics. A couple of images of bad quality or low resolution serve as input. One of these (or an up-sampled version of it) may be chosen as reference image, and the others are then called sensed images. Distinct features are then marked in all of them. The features marked in the sensed images should correspond to those in the reference image. The next step is to match features with each other and build correspondences, because this gives information about the relations between the reference image and the sensed images. These relations, or transformations, should then be approximated. The last step is to decide whether the images should be rectified and superpositioned, or whether some method should use the data from the low resolution images and their respective transforms in a more direct manner. One method that does this, discussed later, is called pixel-spraying.

Figure 2.1. Overview of the procedure: low resolution images go through image registration (feature detection, feature matching and transform estimation), followed either by rectification and interpolation and then superposition, or by pixel-spraying, resulting in a high resolution image.

Chapter 3

Mathematical Introduction

A short introduction to the mathematics used in many of the methods described in this thesis is appropriate here. An easy and flexible way to work with transformations is in projective spaces, with image points in homogeneous coordinates.

3.1 Homogeneous coordinates

The homogeneous representation of image coordinates $\{x, y\}$ is given by a vector $\mathbf{x}_H = (wx, wy, w)^T$ for $w \neq 0$.

The homogeneous representation of the line
$$y = -\frac{a}{b}x - \frac{c}{b} \Leftrightarrow ax + by + c = 0$$
is given by a vector $\mathbf{l}_H = (a, b, c)^T$, and an image point $\{x, y\}$ is located on this line iff $\mathbf{l}_H^T \mathbf{x}_H = 0$.

Let $\mathbf{l}_H^T \mathbf{x}_H = 0$. If $\mathbf{T}$ is a non-singular $3 \times 3$ matrix and $\mathbf{x}'_H = \mathbf{T}\mathbf{x}_H$, then the corresponding transformation for the line, $\mathbf{l}'_H = \mathbf{T}^{-T}\mathbf{l}_H$, means that $\mathbf{l}'^T_H \mathbf{x}'_H = 0$. Lines and points are said to be dual in this projective space. The line connecting two points, $\mathbf{x}^1_H$ and $\mathbf{x}^2_H$, can be calculated with the cross product as $\mathbf{l}^{12}_H = \mathbf{x}^1_H \times \mathbf{x}^2_H$. [5]
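As a small sketch of the duality above (the coordinates are made up; Python is used for the sketches in this text, not the Matlab of the thesis program), the line through two homogeneous points is their cross product, and both points satisfy the line equation:

    import numpy as np

    x1 = np.array([1.0, 2.0, 1.0])   # homogeneous representation of {1, 2}
    x2 = np.array([4.0, 3.0, 1.0])   # homogeneous representation of {4, 3}
    l12 = np.cross(x1, x2)           # line through both points
    print(l12 @ x1, l12 @ x2)        # both products are zero: the points lie on the line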

3.2 Introduction to transforms

Chapter 6 addresses transformations and how to estimate them, but a mathematical introduction to transformations is given below.

In order to use information from a set of photos taken from different angles and positions, there is a need to know which points in the sensed images correspond to a point in the reference image. This means that, given
$$I(x_i, y_i) = I'(x'_i, y'_i)$$
where $\langle \{x_i, y_i\}, \{x'_i, y'_i\} \rangle$ are corresponding pairs of points in images $I$ and $I'$, the coordinate transform $f$ in $\{x_i, y_i\} = f(\{x'_i, y'_i\})$ must be estimated. It is the quality of this estimation that is the key to good image registration. The transformation of points on a 3D plane between two different viewpoints is given by the $3 \times 3$ matrix $\mathbf{T}$, so that $\mathbf{x}'_i = \mathbf{T}\mathbf{x}_i$, where $\mathbf{x}'_i$ and $\mathbf{x}_i$ are the homogeneous representations of image coordinates $\{x'_i, y'_i\}$ and $\{x_i, y_i\}$.

3.3 Transformation of planes

The homography between points on a plane from different viewpoints can be expressed with projective transformations as $3 \times 3$ matrices. A translation with $\{t_x, t_y\}$ is given by
$$\mathbf{x}' = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix} \mathbf{x}$$
Rotation with angle $\theta$ is given by
$$\mathbf{x}' = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \mathbf{x}$$
Skewing with $k$ along the x-axis and scaling with $s$ along the y-axis is given by
$$\mathbf{x}' = \begin{pmatrix} 1 & k & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{pmatrix} \mathbf{x}$$
The general perspective transformation is given by
$$\mathbf{x}' = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \mathbf{x}$$
Since $\mathbf{x}'$ and e.g. $\frac{1}{h_{33}}\mathbf{x}'$ correspond to the same image coordinate, $h_{33} = 1$ can be chosen. It is then apparent that this transformation has only eight degrees of freedom rather than nine. Examples of transformations like those listed above are illustrated in figure 3.1.

Figure 3.1. Examples of transformations: original image (a), translation (b), rotation (c), skewing and scaling transformation (d), perspective transformation (e).
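The matrices above translate directly into code. The following sketch (with arbitrary parameter values) builds the listed transformations and applies their composition to a homogeneous image point:

    import numpy as np

    def translation(tx, ty):
        return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

    def rotation(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def skew_scale(k, s):
        return np.array([[1.0, k, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]])

    # Transformations compose by matrix multiplication.
    T = translation(5.0, -2.0) @ rotation(np.pi / 6) @ skew_scale(0.2, 1.5)
    x = np.array([10.0, 20.0, 1.0])   # homogeneous image point
    xp = T @ x
    xp = xp / xp[2]                   # normalize so the third coordinate is 1
    print(xp[:2])

Since the composition is itself a 3 x 3 matrix, any chain of such operations is again a projective transformation.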


Chapter 4

Features in images

To choose in-common features in images and then to match them is a way to find the relation between the images, e.g. the reference and the sensed image. The features can be textures like forests and lakes, objects that have been placed out intentionally, or objects like houses in photos taken from high altitudes. When working on a higher, more abstract level, as in the rest of this thesis, lines, corners and contours can be extracted and used as features.

It is important that the areas chosen do not move between images in a way not supported by the method to be used. Choosing corresponding pairs of points from a moving car and from a bush by the side of the road when using a global method will probably give an unsatisfying result. [8] [25]

4.1 Textures

By immediately studying the entire images as textures, the second step in image registration (matching) is emphasized, and the first (finding features) is skipped. Symmetrical phase filtering (appendix A.9) may be an appropriate method for this purpose. If only parts of the images contain interesting textures, these areas should be marked out by control points in a standard manner and should then be matched. Read more about feature matching in chapter 5.

4.2 Edges

Edges in images typically occur in areas where objects partially occlude each other, where the surface of an object shifts orientation and therefore reflects light non-uniformly, or where textures and shadows are present. An edge detector should find edge positions (with or without sub-pixel precision) and maybe even magnitudes, orientations and steepness. Low-pass filtering, differencing and labeling are three common operations in the edge detection process. Low-pass filtering is used for noise removal and for regularization, in order to make derivatives useful. Of course, this results in loss of information. Labeling means that the edges are marked out. Canny (appendix A.2) is maybe the most well-known method. The edge response and accuracy of a method are related not only to the shape of the actual edge in the image, but also to the filters used to locate it, e.g. large low-pass filter kernels give less accurate responses. [24]

4.2.1 Canny edge detection

Studying image derivatives is an effective approach to edge detection. The presence of an edge is likely where the sum of the squared x- and y-derivatives is large. Long continuous edges are extracted by following large edge responses and linking these with weaker responses (this is called hysteresis). Along these resulting edges, non-maximum suppression is applied. Figure 4.1 shows an image and its edges. Notice how corners are smoothed because of the convolution with differentiated Gaussian filters, and that edges that should be connected are not. Canny is described more in appendix A.2. [2]

Figure 4.1. An image (a) and its edge image (b).
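In practice a ready-made Canny implementation can be used. A sketch with scikit-image (the file name is hypothetical); the sigma parameter corresponds to the width of the differentiated Gaussian filters discussed above:

    from skimage import feature, io

    img = io.imread("plate.png", as_gray=True)    # hypothetical input image
    edges_fine = feature.canny(img, sigma=1.0)    # sharp, but sensitive to noise
    edges_coarse = feature.canny(img, sigma=3.0)  # smoother, corners more rounded
    print(edges_fine.sum(), edges_coarse.sum())   # number of edge pixels found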

4.3 Lines

Lines in images, local extreme values forming ridges, can appear in photos where thin objects are located in front of others, or where roads and rivers are imaged from high altitudes. Blood vessels in medical images can form lines too. Methods for line detection can be categorized into three groups: those that use pixel values straightforwardly, those that search for two parallel edges, and those that search for ridges and ravines, e.g. with masks. According to [24], most methods are limited to the use of thinning algorithms, which means low precision. In [23], Ziou describes what he calls an optimal line detector, where non-maximum suppression is used to achieve sub-pixel precision. Steger uses image derivatives and a model of the cross section of a line in [18].


4.4 Corners

According to [25] and [17], corners can intuitively be defined as points on an edge where there is a large change in its orientation. Such points are located in areas where different intensities form what resembles an "L", "T", "Y" or "X", and these variations are often mentioned in the literature. Corners are important features in many computer vision and image processing applications since they are invariant to translation, rotation and (sensible) scaling. It is not easy to describe theoretically what is a corner and what is not, since there are lots of different variations. In [21], the methods for corner detection are grouped into three categories.

• Template based corner detection: Mathematical models of corners are correlated with the image. Problems can occur when the orientation of an edge differs from all of the models.

• Contour based corner detection: Edges are detected and chain coded. Points where their orientations change suddenly, e.g. by a certain angle, are likely to be marked as corners.

• Direct corner detection: These methods do not need edge images or correlation with corner masks, but use image derivatives or pixel values in a direct manner. Harris (Plessey) corner detection and SUSAN corner detection are examples of methods in this category.

4.4.1 Detection with Harris corner detector

By the use of image derivatives, a tensor is calculated for each pixel. Its eigenvalues describe the changes of pixel values in its proximity. Two large, almost equal eigenvalues indicate the presence of a corner. One large and one small eigenvalue indicate an edge. [10]

Figure 4.2 shows a car registration plate (figure 4.2(a)), the result of the Harris operator (figure 4.2(b)) and the extracted corner points after non-maximum suppression (figure 4.2(c)). Obviously, lots of points should be discarded, as they do not correspond to salient corners but to noise. By increasing the width of the Harris operator (increasing the variance of the Gaussian filter used in the method), the influence of noise is decreased and the ability to detect smoother corners is increased. The result of the wider Harris operator can be seen in figure 4.2(d), and the extracted points in figure 4.2(e). Notice that the two corners of each white light of the car almost blend into one. The corners of the plate are still detected, but not as distinctly as before. Larger scales also give larger displacements of the corners, which can be seen in this example. Read appendix A.4 for a more detailed description of the Harris detector.

Different scales can be combined in order to suppress noise and still detect both smooth and sharp corners, and points with low response values can be discarded by the use of a threshold. Figure 4.2(f) shows the result when using multiple scales. The points near the edges of the image should be removed, since they are products of convolutions where values outside the image are defined to be zero.

Figure 4.2. The original image (a), Harris response and points with a small Gaussian filter (b, c), Harris response and points with a large Gaussian filter (d, e), and Harris points with multi-scale filters (f).
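A minimal sketch of the Harris response using the structure tensor (the function and parameter names are inventions of this sketch, not taken from the thesis program); the expression det(S) - k * trace(S)^2 is large only where both eigenvalues are large:

    import numpy as np
    from scipy import ndimage

    def harris_response(img, sigma_d=1.0, sigma_i=2.0, k=0.04):
        # Image derivatives from differentiated Gaussian filters.
        ix = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))
        iy = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))
        # Structure tensor components, averaged over a neighbourhood.
        sxx = ndimage.gaussian_filter(ix * ix, sigma_i)
        syy = ndimage.gaussian_filter(iy * iy, sigma_i)
        sxy = ndimage.gaussian_filter(ix * iy, sigma_i)
        det = sxx * syy - sxy ** 2    # product of the two eigenvalues
        trace = sxx + syy             # sum of the two eigenvalues
        return det - k * trace ** 2   # large where both eigenvalues are large

Increasing sigma_i corresponds to the wider Harris operator discussed above: noise is suppressed, but corner positions are displaced more.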


4.4.2 Detection with SUSAN corner detector

The corner detection method called SUSAN compares every pixel value with the values of the other pixels in its proximity. If fewer than half of those values lie within a chosen threshold of the center pixel value, a corner is detected. The threshold value can be set automatically by calculating image contrasts, or manually in order to control the number of detected corners. [17]

Figure 4.3 shows the plate again, along with the final result from the SUSAN operator, where non-maximum suppression has also been applied. As can be seen, the operator is sensitive to changes in pixel values, which can be a problem in noisy images. SUSAN for corner detection is described more in appendix A.5.

Figure 4.3. The original image (a) and the SUSAN points (b).


Chapter 5

Feature matching

When features in the reference and sensed images have been extracted, these must be matched before the relationships of image coordinates can be approximated. This can be done either manually or automatically.

5.1 Manual matching

Manual matching of chosen features in images means that an operator matches them as well as he or she can. Depending on the images and the extracted features, this is a more or less difficult task, and different operators may match in different ways.

5.2 Automatic matching

Chosen features can be matched by comparison of the pixel values in the proximity of their control points, their relative spatial positions or their descriptions. Cross-correlation or a frequency domain method can be used, and sometimes there are benefits in correlating responses from a corner or an edge detector instead of the original images. Translations are the only allowed transformations when comparing pixel values like this. Processing of large areas can be sped up with a multi-scale approach, as in the use of Gaussian pyramids. Interpolation of the correlation response image may be used to achieve sub-pixel precision. [8] [25]

5.2.1 Matching with cross-correlation

Normalized cross-correlation gives a similarity measure of a template image for each pixel in a reference image. The position where the correlation response is largest is the most likely point in the reference image to correspond to the center point of the template. Figures 5.1(a) and 5.1(b) show the translation of a slightly surprised man. Correlation of these two images, with 5.1(a) as reference and 5.1(b) as template, yields the response image in 5.1(c). Its largest value is found in a pixel with a displacement from the center that corresponds to the translation.

Figure 5.1. Two images and their correlation response.

Cross-correlation can handle translations only. If images have been transformed in other ways, the cross-correlation method will compare wrong pixel values, since image points have been rearranged. Large templates give more stable responses if images have been translated only, but otherwise a small template can be more useful. This is illustrated in figure 5.2. The task is to find the corresponding upper right corner of the square in 5.2(a) in its skewed version 5.2(c). The zoomed template used is shown in 5.2(b). Correlation values from a large version of the template are shown in 5.2(d), and values from a small version of the template are shown in 5.2(e). The points with the largest correlation values have been marked out in figure 5.3. They are apparently displaced along the y-axis relative to the true corner of the skewed square. The approximation of the corner position is better with the smaller template. Better precision may require estimation of the skewing transform between the images (read section 8.6). Just using a very small template increases the risk of false matches, since the likelihood that other parts of the image resemble the template just as much as the true part increases.

Figure 5.2. Squares, corner and correlation responses: the square (a), the template (b), the skewed square (c), and the correlation responses for the large (d) and small (e) templates.

Figure 5.3. Zoomed corner of the skewed rectangle and the largest correlation values with the large (a) and small (b) templates.
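A brute-force sketch of normalized cross-correlation (a direct, unoptimized reimplementation for illustration, not the thesis program):

    import numpy as np

    def ncc(a, b):
        # Normalized cross-correlation of two equally sized patches.
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return float((a * b).sum() / denom) if denom > 0 else 0.0

    def ncc_map(reference, template):
        # NCC response for every valid placement of the template.
        th, tw = template.shape
        rows = reference.shape[0] - th + 1
        cols = reference.shape[1] - tw + 1
        out = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                out[i, j] = ncc(reference[i:i + th, j:j + tw], template)
        return out   # the argmax of this map gives the most likely match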

5.2.2 Matching with phase filtering

Phase filtering is a common alternative to cross-correlation, and means that the signals that build up the images are correlated. If all frequency amplitudes are normalized to 1, the method is called symmetrical phase filtering. Read more about this method in appendix A.9. The same images as in section 5.2.1 are correlated again, but now with symmetrical phase filtering. The result in figure 5.4(c) is very clear, with a sharp peak at a point that corresponds to the translation of 5.4(a) relative to 5.4(b) (or the position of 5.4(b) in 5.4(a)).

Figure 5.4. Reference and template images and correlation response.

Figure 5.5 illustrates the same problem as in the previous section: to find the upper right corner of the square in 5.5(a) in its skewed version in 5.5(c). This time the correlation is done with symmetrical phase filtering. The method is sensitive to changes in pixel values along the borders of the template towards the zeroes that are assumed outside the borders. Therefore the template is windowed with a Blackman window, as shown in 5.5(b). The result in 5.5(d) reveals that all edges are found with almost the same magnitude, and this is because they are all made of almost the same frequency components, but with negative phase in one or two directions.

Figure 5.5. Squares, corner and correlation responses using symmetrical phase filtering.

The position of the largest correlation value around the correct corner is plotted in figure 5.6 along with the corner of the skewed square. The coordinates seem to be correct, and this is because this specific template triggers the search for sudden changes along both the x- and y-axis. That the upper edge of the skewed square is slanted is not important in this case, since the conditions set by the template are still met best in the correct point.

Figure 5.6. Zoomed corner of the skewed rectangle and the largest correlation value using symmetrical phase filtering.

It may now seem as though symmetrical phase filtering is invariant to skewing, but this is not the case. To demonstrate this, figure 5.7 shows a smiling man in 5.7(a), his right (your left) nostril zoomed and windowed in 5.7(b), and a skewed version of the man in 5.7(c).

Figure 5.7. Original smiling man, windowed nostril and skewed man.

Figure 5.8(a) shows filtering with the nostril as template and the original image as reference. The sharp peak gives the correct position. When filtering with the skewed man as reference instead, the response image in 5.8(b) is harder to interpret. There is a tiny peak in the correct position, but it is weaker than many other false peaks.

Figure 5.8. Phase filtering responses for the original man (a) and the skewed man (b).
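A sketch of symmetrical phase filtering in the Fourier domain (windowing of the template, as above, is left out): every frequency amplitude of the cross spectrum is normalized to 1 before the inverse transform, so only the phase contributes:

    import numpy as np

    def phase_correlate(reference, template):
        f = np.fft.fft2(reference)
        g = np.fft.fft2(template, s=reference.shape)       # zero-pad the template
        cross = f * np.conj(g)
        cross = cross / np.maximum(np.abs(cross), 1e-12)   # all amplitudes set to 1
        response = np.real(np.fft.ifft2(cross))
        peak = np.unravel_index(np.argmax(response), response.shape)
        return response, peak   # the peak position gives the translation (row, col)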


Chapter 6

Homography estimation and transformation

If a plane has been imaged from two different viewpoints, its points in one of the images correspond to the points in the other by a homography. In order to approximate the transform, it is common to first choose a function model that is appropriate and believed to be true to the real transform, and then to calculate the parameters of the model. If all points of the interesting areas have been transformed with one single homography, a global method can be used. If objects have been deformed or moved between the images, a global method may not be enough, which leads to the use of a local method. Only the nearest control points then affect the transformation of a region, which makes the transformation local. Images can e.g. be divided into triangles, with a unique transform calculated for each of them. Optical flow is another alternative.

Multiple scales are often used to speed up operations. In practice it is hard to choose corresponding pairs of points that perfectly fit a given transform model, due to e.g. noise and discretization. Methods then use either approximation or interpolation. Approximation means that if more pairs of feature points than required are given, the transform is calculated with least squares, which means that the points are not necessarily transformed exactly onto their correspondences. Interpolation, on the other hand, makes the control points in one image transform exactly to their corresponding points in the other image and interpolates image points in-between with polynomials or cubic splines. [1] [8] [25]
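A minimal sketch of homography approximation with the Direct Linear Transformation (appendix A.10): each pair of points contributes two linear constraints on the entries of $\mathbf{H}$, and the solution is the right singular vector with the smallest singular value. The point normalization usually applied before the DLT is omitted for brevity:

    import numpy as np

    def estimate_homography(src, dst):
        # DLT estimate of H such that dst ~ H src; src, dst are (N, 2) arrays, N >= 4.
        rows = []
        for (x, y), (u, v) in zip(src, dst):
            rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
            rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        a = np.asarray(rows, dtype=float)
        _, _, vt = np.linalg.svd(a)
        h = vt[-1].reshape(3, 3)   # singular vector of the smallest singular value
        return h / h[2, 2]         # fix the scale by choosing h33 = 1

With more than four pairs, the SVD solution is exactly the least squares approximation mentioned above.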

6.1 Transforming

When the transformations between the reference and the sensed images have been approximated, they can be used in the search for differences or similarities in the images. Let $\mathbf{H}$ give the relationship $\mathbf{y} = \mathbf{H}\mathbf{x}$ between image coordinates $\mathbf{y}$ in the reference image and $\mathbf{x}$ in a sensed image. For a general $\mathbf{x}$, it may turn out that the calculated coordinates $\mathbf{y}$ are not integers. If the transformation method runs through all pixels in the sensed image, applies the transform and rounds to integers, it is likely that the result, which should be aligned to the reference image, is undefined in some points. This is why the most common approach is to use $\mathbf{x} = \mathbf{H}^{-1}\mathbf{y}$ instead: run through all pixels of the imagined result image $\mathbf{y}$, and set its values to the result of an interpolation of the pixels in the sensed image in the proximity of the calculated $\mathbf{x}$ coordinates. The choice of interpolation method depends on required precision and available computing power. [25]
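A sketch of this inverse mapping with nearest neighbour interpolation (a slow, direct implementation for illustration; image points are taken as (column, row) coordinates):

    import numpy as np

    def warp_to_reference(sensed, h, out_shape):
        # For every pixel y of the result image, look up x = H^-1 y in the sensed image.
        h_inv = np.linalg.inv(h)
        out = np.zeros(out_shape)
        for r in range(out_shape[0]):
            for c in range(out_shape[1]):
                x, y, w = h_inv @ np.array([c, r, 1.0])
                col, row = int(round(x / w)), int(round(y / w))
                if 0 <= row < sensed.shape[0] and 0 <= col < sensed.shape[1]:
                    out[r, c] = sensed[row, col]   # undefined points stay zero
        return out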

Figure 6.1 shows two images. The right one, 6.1(b) (sensed image), will be transformed to align with the left one, 6.1(a) (reference image).

Figure 6.1. A reference image and a sensed image.

Corresponding points have been marked in figures 6.2(a) and 6.2(b). A global transformation $\mathbf{H}$ from the sensed image to the reference image is computed with approximation. An empty result image is created. Its pixels $\mathbf{y}_i$ are run through, and corresponding points $\mathbf{x}_i = \mathbf{H}^{-1}\mathbf{y}_i$ in the sensed image are calculated. The pixels $\mathbf{y}_i$ in the empty result image are set to the values of $\mathbf{x}_i$ in the sensed image if their coordinates are integers; otherwise, values are interpolated from points neighbouring $\mathbf{x}_i$. This makes the result image shown in 6.2(c) look similar to 6.2(a) in the relatively flat area where the points were placed.

Figure 6.2. Corresponding points in the reference image (a) and the sensed image (b), and the transformed image (c).

Chapter 7

Estimation of high resolution image

There are a couple of different methods that can be used when trying to estimate a high resolution image from several low resolution images, e.g. [7] [15] [16] [19]. The input images may be rectified, scaled up and interpolated, and then superpositioned. Another approach is to start with an empty, large image and fill it with values from the input images according to their respective transformations relative to a reference image. This is called pixel-spraying in this thesis.

7.1 Image superposition

Image superposition means that images are somehow aggregated, for example by computing the mean values or by using a more advanced method. The intention is to reduce noise or to suppress values resulting from erroneous image registration or values from disturbing objects. First, all sensed images need to be rectified in order to align to the reference image. Each sensed image relates to the reference image through $\mathbf{y} = \mathbf{H}\mathbf{x}$, where $\mathbf{y}$ are image coordinates in the reference image, $\mathbf{x}$ in the sensed image, and $\mathbf{H}$ is a coordinate transform. The rectified image is then computed by running through all pixel coordinates $\mathbf{y}$, using the relationship $\mathbf{x} = \mathbf{H}^{-1}\mathbf{y}$, and then interpolating pixel values in the sensed image at the resulting coordinates $\mathbf{x}$.

The rear parts of a car have been photographed six times from different angles, and these photos are shown in figure 7.1. An enhanced image of the registration plate can be created by computing the mean values of the six images, but they first have to be transformed and aligned to a reference image. Image 7.1(d) is chosen as reference image. The other five images are transformed, with different results, see figure 7.2. The orientations of the edges of the plate differ a bit, but this is nothing to worry about, since a good method should be able to handle slightly misaligned images and make the best of them. Two different mean value images are shown in figure 7.3. The smoother image in 7.3(b) is the result of scaling the five transformed images and the reference image up by a factor of 4 before computing the mean values.

Figure 7.1. Six input images.

Figure 7.2. Five transformed sensed images and one reference image.

Figure 7.3. Two enhanced images: of the same size as the input images (a), and of larger size than the input images (b).

A picture of a man has been resampled into 10 slightly translated and down-scaled images. The translations are less than 4 pixels along each axis, which leads to translations of less than 1 pixel in the new images that are scaled down by a factor of 4. Figure 7.4 shows the original image and one of the new smaller images, zoomed. It is hard to recognize the man, since the small image only consists of 1/16 of the original amount of pixels, but if the homographies are known, a good reconstruction should not be impossible to estimate, thanks to the total amount of information in the small images.

Figure 7.4. Original image of the man and one of the down-sampled and translated images, zoomed.

In order to further demonstrate superposition methods, each one of the 10 images is transformed back and interpolated with its respective homography. Figure 7.5 illustrates different ways to superposition them. The computation of the mean image yields the result in figure 7.5(a), and 7.5(b) shows the median filtered image. Figure 7.5(c) illustrates the FMOIS method from [20], shortly described in appendix A.11, which gives the result of a weighted mean value computation along with slightly amplified high frequency data. As can be seen, the results are blurry, since high frequency data has been lost in the process of down- and up-sampling and interpolation.

Figure 7.5. Resulting images from three different superposition methods: mean image (a), median image (b), FMOIS image (c).

7.2 Pixel-spraying

Instead of using the relationship $\mathbf{x} = \mathbf{H}^{-1}\mathbf{y}$ as in image superposition, where $\mathbf{y}$ are reference image coordinates and $\mathbf{x}$ are sensed image coordinates, $\mathbf{y} = \mathbf{H}\mathbf{x}$ may be used in a direct manner.

1. Initialize a large, empty result image.

2. Run through all $\mathbf{x}$ coordinates, compute the corresponding $\mathbf{y}$ coordinates and round them to integers.

3. Set the pixel values at the rounded $\mathbf{y}$ coordinates in the result image to the values of the pixels at the corresponding sensed image coordinates.

This will most likely result in an image where some (or a lot of) pixel values have not been set, but by using more sensed images, these undefined pixels become fewer.


If a value is to be set more than once, it may be appropriate to let it be the mean of all aspiring values. In order to make the resulting image as complete as possible, all undefined pixels are run through, and their values are set to the mean or median value of their nearest neighbouring defined pixels. If using the same 10 input images of the face of a man as in the previous example (section 7.1), the first step in pixel-spraying results in the image in figure 7.6(a). Figure 7.6(b) shows the final result when mean values of the three nearest neighbouring defined pixels have been used, and 7.6(c) when median values have been used.

Figure 7.6. Three images illustrating pixel-spray methods: values from the input images only (a), filled with mean values (b), filled with median values (c).

An alternative is not to round the $\mathbf{y}$ coordinates to integers, but to keep them as float values and then interpolate image point values at integer coordinates from these.
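A sketch of the pixel-spraying accumulation (a direct reimplementation of the description above, not the thesis program); values that land on the same pixel are averaged, and a mask of still-undefined pixels is returned for the hole-filling step:

    import numpy as np

    def pixel_spray(sensed_images, homographies, out_shape):
        acc = np.zeros(out_shape)
        cnt = np.zeros(out_shape)
        for img, h in zip(sensed_images, homographies):
            for r in range(img.shape[0]):
                for c in range(img.shape[1]):
                    u, v, w = h @ np.array([c, r, 1.0])   # y = Hx
                    col, row = int(round(u / w)), int(round(v / w))
                    if 0 <= row < out_shape[0] and 0 <= col < out_shape[1]:
                        acc[row, col] += img[r, c]
                        cnt[row, col] += 1
        result = np.where(cnt > 0, acc / np.maximum(cnt, 1), 0.0)
        return result, cnt == 0   # result image and mask of undefined pixels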


Chapter 8

Important intermediate steps

Some important steps in the process of creating a high resolution image have been left out of the thesis so far, in order to make the structure of the procedure clear. They are instead described below, in the most natural order. Depending on material and approach, they may or may not be necessary.

8.1 Choosing area of interest

Since the methods in this thesis do not presume a certain kind of object to search for, the interesting parts of the images must somehow be specified. Salient features in the images must be chosen by the user, or the areas of interest must be specified for the feature detection algorithms. If corners are to be detected on the car registration plate in figure 8.1(a), it is appropriate to specify an area containing it, as in 8.1(b).

Figure 8.1. Specifying the area of interest: the original image (a) and the area of interest (b).

8.2 Interlaced images

An old and very common way to decrease the required bandwidth of video signals is to use interlacing. Every frame then consists of two fields: every other line is from field 1 and the rest from field 2. The data of field 1 is from time $t_1$ and the data of field 2 is from time $t_2$, where $t_1 \neq t_2$. If there has been any motion during that (often small) amount of time, there will be some disturbing distortions in the frame. Figure 8.2 shows the frame and fields when the image of a car has been translated along the x-axis between $t_1$ and $t_2$. It is difficult to detect salient features in frames with motion, and their fields may have to be studied separately. The undefined lines in the fields can be set by interpolation of the lines below and above. When using separate fields to estimate a high resolution image, it is important to try not to use any interpolated data in the estimation, but the original fields only. Figure 8.3 shows two fields where each undefined line has been set to the mean value of the lines below and above, and compares them to a frame without motion, which is therefore of full resolution. It is evident that interlaced images are harder to work with.

Figure 8.2. Field 1 (a), field 2 (b) and the frame (c) when the image has been translated between $t_1$ and $t_2$.

Figure 8.3. Field 1 interpolated (a), field 2 interpolated (b) and a frame without motion (c).
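A sketch of separating the two fields and filling each undefined line with the mean of the lines above and below (which field owns the even lines is an assumption of this sketch):

    import numpy as np

    def split_fields(frame):
        # Return the two fields of an interlaced frame, missing lines interpolated.
        field1 = frame.astype(float).copy()
        field2 = frame.astype(float).copy()
        n = frame.shape[0]
        for field, start in ((field1, 1), (field2, 0)):   # field 1 assumed on even lines
            for r in range(start, n, 2):
                above = max(r - 1, 0)
                below = min(r + 1, n - 1)
                field[r] = 0.5 * (frame[above] + frame[below])
        return field1, field2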


8.3 Extraction of straight edges and lines

Methods that use correspondences of lines for calculating transforms between images usually require the equations of the lines, not the edge maps given by edge detection algorithms like Canny. The picture of a registration plate and the response of an edge detector are shown in figure 8.4. A way to calculate the equation of the lower edge of the plate is discussed below. The method was written by the author and uses Canny edge detection.

Figure 8.4. Image of a car registration plate (a) and the response from an edge detector (b).

An operator has to define, with a point p, which line is interesting. All points of the edge closest to point p are grouped and called edge points. The edge points in this case are shown in figure 8.5(a). Strictly speaking, one point of the edge is left out. This is because this particular edge forms a closed contour, but a line in an image should have two end points. The next step is to search along the edge points, starting from the edge point closest to p. That point and one edge point on each side of it form the new group, line points. The line l that best fits the three points is computed by least squares. Further edge points are added to the line points group according to the scheme below.

1. Search for the next edge point in one direction (e.g. left). If this point has a distance to line l less than a certain threshold t, mark it as "good", else mark it as "bad".

2. Search for the next edge point in the other direction (e.g. right). If this point has a distance to line l less than a certain threshold t, mark it as "good", else mark it as "bad".

3. Add "good" points to the group line points, update the equation of line l and start over from step 1. If the edge point in step 1 or 2 was bad, then skip that step from now on. If both steps are skipped, then the group line points contains all edge points that form a straight line.

Figure 8.5(b) shows all edge points that were considered to be line points, and figure 8.5(c) shows the corresponding line, which seems to align well with the lower edge of the plate.

Figure 8.5. Points of the entire edge (a), points of the straight line (b) and the resulting line (c).

Rather than a constant threshold, it may be better to use a variable threshold t that is large (generous) to start with and becomes smaller (stricter) the more line points that have been added. As an example, t can be defined as
$$t = g \cdot e^{\sigma^{-n} \sigma^{-1}}$$
where $n$ is the current number of line points and $\sigma$ decides at which rate $t$ converges to $g$. Figure 8.6 shows this function with $g = 1.5$ and $\sigma = 1.5$, which was used above when determining the equation of the lower edge of the plate.


Figure 8.6. Example of threshold function.
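A sketch of the scheme above (a simplified reimplementation: total least squares line fit, the variable threshold, and two-directional growth; edge_points is assumed to be a list of (x, y) tuples ordered along the edge, with the seed in its interior):

    import numpy as np

    def fit_line(points):
        # Total least squares line (a, b, c) with a^2 + b^2 = 1.
        pts = np.asarray(points, dtype=float)
        centroid = pts.mean(axis=0)
        _, _, vt = np.linalg.svd(pts - centroid)
        a, b = vt[-1]   # unit normal of the best-fit line
        return a, b, -(a * centroid[0] + b * centroid[1])

    def threshold(n, g=1.5, sigma=1.5):
        # Variable threshold: generous at first, converging to g.
        return g * np.exp(sigma ** (-n) / sigma)

    def grow_line(edge_points, seed):
        idx = {seed - 1, seed, seed + 1}   # initial group of line points
        left, right = seed - 2, seed + 2
        go_left, go_right = left >= 0, right < len(edge_points)
        while go_left or go_right:
            a, b, c = fit_line([edge_points[i] for i in sorted(idx)])
            t = threshold(len(idx))
            if go_left:
                x, y = edge_points[left]
                if abs(a * x + b * y + c) < t:   # point-to-line distance
                    idx.add(left)
                    left -= 1
                    go_left = left >= 0
                else:
                    go_left = False
            if go_right:
                x, y = edge_points[right]
                if abs(a * x + b * y + c) < t:
                    idx.add(right)
                    right += 1
                    go_right = right < len(edge_points)
                else:
                    go_right = False
        return [edge_points[i] for i in sorted(idx)]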

8.4 Interpolation

A digital image consists of points in a raster pattern, as in figure 8.7(a). If the points are moved according to a general transformation, it is not likely that they still fit the pattern perfectly. This is illustrated in figure 8.7(b). In order to display the image or visually compare it with another, every raster point has to be set to a value. This is done by interpolation and resampling.

Figure 8.7. Image raster pattern and original and transformed image points.

Resampling of the transformed image is done in two steps. First, a continuous function $I_c$ is defined as the convolution of the image $I$ and a chosen interpolation kernel $h$ as
$$I_c(x, y) = I(x, y) * h(x)\delta(y) * h(y)\delta(x)$$
This function is then sampled at the points in the raster pattern, which means that $I_c$ in practice only needs to be computed at those points. [3]


A band-limited signal can be resampled perfectly with a sinc function as interpolation kernel, but this would lead to very large computations. Common simplified interpolation methods are called "nearest neighbour interpolation", "linear interpolation" or "cubic spline interpolation". Figure 8.8 shows their respective 1D kernels and compares them with a sinc function, which is also shown explicitly in 8.8(a). Of these three kernels, the cubic spline approximates the sinc function best and is in this case defined as
$$h_{cs}(x) = \begin{cases} x^3 - 2x^2 + 1 & , \; x \in [0, 1] \\ -x^3 + 5x^2 - 8x + 4 & , \; x \in \,]1, 2] \\ 0 & , \; x \in \,]2, \infty[ \end{cases}$$
Linear interpolation yields a smoother result than nearest neighbour, although some edges may look aliased. An even smoother result is given by the cubic spline, but the choice of kernel also depends on the application and the amount of computing power available.

Figure 8.8. 1D interpolation kernels: sinc function (a), nearest neighbour (b), linear (c), cubic spline (d).
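The kernel above in code, as a vectorized sketch (defined for all x through the absolute value, since the kernel is symmetric):

    import numpy as np

    def cubic_spline_kernel(x):
        x = np.abs(np.asarray(x, dtype=float))
        out = np.zeros_like(x)
        near = x <= 1
        far = (x > 1) & (x <= 2)
        out[near] = x[near] ** 3 - 2 * x[near] ** 2 + 1
        out[far] = -x[far] ** 3 + 5 * x[far] ** 2 - 8 * x[far] + 4
        return out

    # The kernel is 1 at x = 0 and 0 at the other integers, so existing
    # samples are preserved by the resampling.
    print(cubic_spline_kernel(np.array([0.0, 0.5, 1.0, 1.5, 2.0])))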


8.5 Automatic feature matching and homography re-estimation

Erroneous matchings of control points in different images easily occur through mistakes by an operator, or because the transformation of an image makes non-corresponding points receive a higher correlation value than the true pair of points. False pairs of points will henceforth be called outliers. Figure 8.9 shows two images, where 8.9(a) is the reference and 8.9(b) has been transformed with a perspective transform. In addition, the upper right corner of the image has been cut out and moved to the lower left corner.

Figure 8.9. Original image (a) and perspectively transformed and edited image (b).

An operator has marked out points in both images that seem to correspond to salient features, and these points are shown in figure 8.10. The same operator then tried to match the points and thereby create a series of corresponding pairs of points, as shown in figure 8.11. The operator has apparently made three bad matchings, and these are marked with numbers in the figure. The areas around the points in pair number 1 are rotated, scaled and mirrored versions of each other, but their relative positions in the images do not seem correct. Pair number 2 does not seem correct in any way. Pair number 3 reveals that the operator has been fooled by the area that was cut out and moved. Even though the correlation value is high, this pair cannot be used when estimating the global perspective transformation. In order to avoid outliers like those above, an automatic method to match the control points can be applied. The following method was written by the author and uses cross-correlation.

In the first step, all control points in the reference image are correlated with the points in the sensed image, and the ones with the largest correlation values are chosen as pairs. These are shown in figure 8.12. One pair seems to be an outlier.

Figure 8.10. Original image and perspectively transformed and edited image, both with points.

Figure 8.11. Manual point matchings between original and edited images.

A perspective transform can be defined by four pairs of points. That is why, as a next step, every possible combination of four pairs of points from the correlation step is studied. The combination that results in the homography $\mathbf{H}$ that best fits the rest of the given pairs of points is chosen as the one most likely to represent the true homography. The homography may then be recomputed by using all pairs of points that fit, or almost fit, this homography $\mathbf{H}$. When comparing all possible homographies out of four pairs, the best one is chosen as the one that gives the minimum mean or median value of all distances $d(\mathbf{y}, \mathbf{H}_i\mathbf{x})$, where $\mathbf{y}$ are the control points in the reference image, $\mathbf{x}$ the control points in the sensed image and $\mathbf{H}_i$ the different homographies. Also, the more outliers a homography results in, the less likely it is to be considered the best one. In this specific example, one pair out of those in figure 8.12 was considered an outlier, and the remaining pairs of points are shown in figure 8.13.

The final step is to rematch the points, but this time only points that fit the computed homography may build correspondences. The resulting matchings are shown in figure 8.14, and the homography should now be updated.

Figure 8.12. Point matchings based on high correlation values.

Figure 8.13. Possible inliers.

Figure 8.14. Pairs of points from cross-correlation that also fit the estimated homography.
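A sketch of the combinatorial search described above (simplified: only the median transfer error ranks the candidates, and the outlier count is not used; the minimal DLT from the chapter 6 sketch is repeated to keep the example self-contained):

    import itertools
    import numpy as np

    def estimate_homography(src, dst):
        # Minimal DLT; src, dst are (N, 2) arrays of corresponding points.
        rows = []
        for (x, y), (u, v) in zip(src, dst):
            rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
            rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
        h = vt[-1].reshape(3, 3)
        return h / h[2, 2]

    def transfer_error(h, src, dst):
        # Distances d(y, Hx) for all pairs.
        ones = np.ones((len(src), 1))
        proj = (h @ np.hstack([src, ones]).T).T
        proj = proj[:, :2] / proj[:, 2:3]
        return np.linalg.norm(proj - dst, axis=1)

    def best_homography(src, dst, tol=3.0):
        # Try every combination of four pairs; keep the homography whose
        # median distance over all pairs is smallest, then re-estimate it
        # from all pairs that fit (the inliers).
        best_h, best_score = None, np.inf
        for quad in itertools.combinations(range(len(src)), 4):
            h = estimate_homography(src[list(quad)], dst[list(quad)])
            score = np.median(transfer_error(h, src, dst))
            if score < best_score:
                best_h, best_score = h, score
        inliers = transfer_error(best_h, src, dst) < tol
        return estimate_homography(src[inliers], dst[inliers]), inliers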


8.6 Iterative point position adjustment

The control points of features are often a bit misplaced; that is, the two points of a pair are not well positioned relative to each other. The reason may be that the points were placed out by an operator, or that a feature detector has not been accurate enough. It may also be necessary to achieve sub-pixel precision in order to make the resulting enhanced image satisfying. Starting from a reference image where the positions of the feature points are defined as exact, the corresponding points in the sensed images must be aligned to these. Problems with using standard cross-correlation to make this alignment arise when the sensed images have been transformed in other ways than with pure translations (read section 5.2). The solution is to use a method that searches in more than two dimensions and therefore can handle other kinds of transformations, or to estimate initial homographies and then refine them. An example of the latter method is demonstrated below. It was written by the author and uses cross-correlation.

An image of a rectangle has been perspectively transformed, as shown in figure 8.15, and the corners in both images have been marked by points. As can be seen, the points in the sensed image have been placed inaccurately, and their positions need to be corrected.

Figure 8.15. Original and skewed rectangles and their respective corner points.

The method for refining the positions $\mathbf{x}_i$ of the control points, and thereby refining the homography, follows the scheme below.

1. An initial homography $H$ is estimated, defined by the current positions of the control points $y_i$ in the reference image and $x_i$ in the sensed image.

2. Each neighbourhood $Y_i$ of a control point $y_i$ in the reference image is transformed as $X_i^{est} = H^{-1} Y_i$ in order to estimate the neighbourhood $X_i$ of each control point $x_i$ in the sensed image.

3. Each $x_i$ is set to $x_i = x_i + \Delta x_i$, where all $\Delta x_i$ are computed by cross-correlation of $X_i^{est}$ and $X_i$. Sub-pixel precision is achieved by interpolation of the correlation responses. Then $H$ is recomputed. This step is repeated until $\Delta x_i < t$ for all $i$, where $t$ is a chosen threshold.

The new positions of the control points shown in figure 8.16 seem more accurate.
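A condensed sketch of the scheme, reusing the estimate_homography helper from the section 8.5 sketch and assuming grayscale float images with points as (x, y) arrays. The sub-pixel interpolation of the correlation response is omitted for brevity, and the patch radius, threshold and iteration cap are illustrative; a real implementation would also need to verify the sign convention of the correlation offset against the warp direction.

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.signal import correlate2d

def estimated_neighbourhood(ref_img, H, x, r):
    """Step 2: build X_i^est by sampling the reference image at H p for
    every pixel position p in the neighbourhood of x in the sensed frame."""
    cs, rs = np.meshgrid(np.arange(-r, r + 1) + x[0],
                         np.arange(-r, r + 1) + x[1])
    q = np.tensordot(H, np.stack([cs, rs, np.ones_like(cs)]), axes=1)
    u, v = q[0] / q[2], q[1] / q[2]
    return map_coordinates(ref_img, [v, u], order=1)  # (row, col) order

def correlation_offset(template, search):
    """Offset of the correlation peak relative to the patch centre."""
    c = correlate2d(search - search.mean(), template - template.mean(),
                    mode='same')
    pr, pc = np.unravel_index(np.argmax(c), c.shape)
    return np.array([pc - c.shape[1] // 2, pr - c.shape[0] // 2], float)

def refine_points(ref_img, sens_img, ys, xs, r=10, t=0.5, max_iter=20):
    """Steps 1-3: alternate homography estimation and point adjustment
    until every update Delta x_i is smaller than the threshold t."""
    xs = [np.asarray(x, dtype=float) for x in xs]
    for _ in range(max_iter):
        H = estimate_homography(xs, ys)  # step 1: sensed -> reference
        deltas = []
        for i, x in enumerate(xs):
            X_est = estimated_neighbourhood(ref_img, H, x, r)
            row, col = int(round(x[1])), int(round(x[0]))
            X_i = sens_img[row - r:row + r + 1, col - r:col + r + 1]
            dx = correlation_offset(X_est, X_i)  # step 3: Delta x_i
            xs[i] = x + dx
            deltas.append(float(np.hypot(dx[0], dx[1])))
        if max(deltas) < t:
            break
    return xs, estimate_homography(xs, ys)
```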


Chapter 9

Tests, experiments and results

It may be hard to measure the quality of a final high resolution image. It is easier to study the different steps of the process one by one. In this chapter methods for automatic feature matching, iterative point position adjustment and estimation of high resolution image are tested. Finally the entire procedure of estimating a high resolution image from several low resolution images is tested.

9.1 Automatic feature matching test

Thanks to automatic feature matching, there is no need for an operator to manually match the features. If the control points in the images have been detected by a feature detector, it is likely that some of those in a sensed image do not correspond to any in the reference image. A method that excludes such points and estimates the correct homography must be reliable enough to be worth using. The method from section 8.5 is tested on two images with 8 salient control points each, shown in figure 9.1. Additional random points are added as control points in both images, and table 9.1 shows the likelihood of estimating the correct homography (the one defined by the original 8 points). This success rate is based on how many times the method included the original 8 points when estimating the homography, out of 1000 tries with different random points. Either the mean or the median value of the distances between points is used in the search for the correct homography, as described in section 8.5. According to further, less rigorous tests, the trend of high success rates seems to continue with up to at least 36 random points.
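A sketch of how such a success-rate experiment could be scripted around the best_homography routine from the section 8.5 sketch. Here random pairs are added directly, which only approximates adding random points to both images and letting the matcher pair them; the image extent and the pair-membership test are the author's illustrative choices, and the exhaustive search makes this slow, so it is meant only to show the structure of the test.

```python
import numpy as np

def contains(pairs, pair):
    """True if an equivalent (reference, sensed) pair is in the list."""
    return any(np.allclose(y, pair[0]) and np.allclose(x, pair[1])
               for y, x in pairs)

def success_rate(true_pairs, n_random, trials=1000, extent=(640, 480)):
    """Fraction of trials in which all original pairs survive as inliers
    after n_random bogus pairs are added."""
    hits = 0
    for _ in range(trials):
        bogus = [(np.random.rand(2) * extent, np.random.rand(2) * extent)
                 for _ in range(n_random)]
        _, inliers = best_homography(list(true_pairs) + bogus)
        hits += all(contains(inliers, p) for p in true_pairs)
    return hits / trials
```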


(a) Reference image with points. (b) Sensed image with points.

Figure 9.1. Reference and sensed image with salient control points.

Random points   Median or mean   Success rate
4               Median           0.96
4               Mean             0.96
8               Median           0.98
8               Mean             0.98
12              Median           0.98
12              Mean             0.98

Table 9.1. Success rates for the automatic feature matching method with different numbers of added random points.


9.2 Iterative point position adjustment test

Control points marked by an operator or a feature detector have probably not been placed at exactly the correct positions. The iterative point position adjustment method from section 8.6 adjusts the control points in the sensed images to better correspond to the control points in the reference image. The precision of the method is tested by marking salient points in a high resolution image. This image is then down-scaled by different factors (2, 3 and 4) and transformed perspectively with a homography $H^{-1}$. The original image and an example of a transformed image are shown in figure 9.2. The control points $y_i$ of the original image are transformed with the same homography $H^{-1}$ and are therefore placed correctly in the scaled down images as $x_i$. The method then tries to adjust the control points to better correspond to the ones in the original reference image. Ideally, the method should not adjust the points at all in this case, since they are already in the right positions, but the information loss following the transformations makes small adjustments expected. The new points are called $x_i^{est}$, and the mean value of the distances $d(y_i, H x_i^{est})$ is computed. The method is tested both with and without prior smoothing of the sensed images in order to prevent aliasing, and the results are shown in figure 9.3.

(a) Reference image. (b) Example of a transformed image.

Figure 9.2. Reference image and an example of a transformed image.

In order to make the test more realistic, the transformed points may also be displaced a bit before being adjusted. The mean distances then come out as shown in figure 9.4.
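The error measure of this test could be computed as in the following small sketch, reusing the transfer_dist helper from the section 8.5 sketch; ys are the reference points and xs_est the adjusted points.

```python
import numpy as np

def mean_adjustment_error(H, ys, xs_est):
    """Mean of the distances d(y_i, H x_i^est) over all control points."""
    return float(np.mean([transfer_dist(H, x, y)
                          for x, y in zip(xs_est, ys)]))
```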


(a) Distances for images without smoothing. (b) Distances for images with smoothing.

Figure 9.3. Mean distances between correct and adjusted positions for images transformed and down-scaled by different factors, without and with prior smoothing.

(a) Distances for images without smoothing. (b) Distances for images with smoothing.

Figure 9.4. Mean distances between correct and adjusted erroneous positions for images transformed and down-scaled by different factors, without and with prior smoothing.


9.3 Estimation of high resolution image test

A few questions arise when it comes to the estimation of a high resolution image.

• To what extent does the estimated image correspond to the true scene that was photographed or drawn?

• How many input images are required for a good estimation?

• Which of the previously proposed methods (in chapter 7) yields the best result?

General answers are hard to give, since they depend on the material at hand and on what is regarded as a good result, which is ultimately a visual judgement. Given a specific set of input images, this section aims to answer the questions in that case and may also give hints towards general answers.

An image of the rear parts of a car is scaled down by different factors (2, 3 and 4) to form input sets of different sizes (4, 8, 12, 16, 20, 24, 28 and 32 images). Each of these images is also slightly transformed perspectively. All homographies are saved and later used to, with the different methods, estimate the original non-scaled image from the transformed ones. The estimated high resolution image is then compared to the non-scaled original image by means of the mean error of the pixel values. Figure 9.5 shows the original image and an example of a transformed image (in this case scaled down by a factor of 4).

(a) Original image. (b) Example of a transformed image.

Figure 9.5. Original image and example of transformed image.
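The comparison measure could be computed as in the sketch below, assuming the estimated and original images are aligned arrays of equal size; whether the thesis uses absolute or signed differences is not stated here, so absolute differences are assumed.

```python
import numpy as np

def mean_pixel_error(estimated, original):
    """Mean absolute pixel-value error between two equally sized images."""
    return float(np.mean(np.abs(estimated.astype(float)
                                - original.astype(float))))
```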

Figure 9.6 shows the mean errors resulting from the different methods, along with the mean error obtained when one of the scaled images is directly rectified and compared to the original image. Intuitively, the mean error of a useful method must be less than that of a single rectified image, and this is the case for all tested methods. As seen in the figure, the errors of the superposition methods stay almost constant and do not decrease much when more input images are used. The reason is that the
