FLIR Systems


1 ABSTRACT
2 SCOPE
3 INTRODUCTION
4 INFRARED IMAGING
4.1 INFRARED IMAGING SYSTEMS
4.2 INFRARED RADIATION
4.3 INFRARED DETECTORS
4.3.1 Detector characteristics
4.4 NOISE IN INFRARED SYSTEMS
4.4.1 Three-dimensional noise model
4.4.2 Noise calculation
5 INFRARED VIDEO PROCESSING
5.1 INFRARED VIDEO PROCESSING SIMULATOR
5.2 NON-UNIFORMITY CORRECTION
5.3 DEAD PIXEL REPLACEMENT
5.4 NOISE FILTERING
5.5 HISTOGRAM GENERATION
5.6 CONTRAST ENHANCEMENT
5.6.1 Continuous adjust
5.6.2 Histogram equalization
5.6.3 Bilateral filtering
5.7 OUTPUT PIXEL PROCESSING
6 ELECTRONIC VIDEO STABILIZATION
6.1 MOTION ESTIMATION
6.1.1 Optical flow
6.1.2 Gradient-based
6.1.3 Matching techniques
6.1.4 Frequency-based
6.2 TRANSFORM MODEL ESTIMATION
6.2.1 Parametric transform models
6.2.2 Robust estimation of transform model parameters
6.3 CAMERA MOTION ESTIMATION
6.4 MOTION COMPENSATION
7 RESULTS
8 SUMMARY
9 ABBREVIATIONS
10 REFERENCES


1 ABSTRACT

An infrared video processing simulator has been implemented in MATLAB to support the development and hardware implementation of image processing algorithms in infrared camera systems. The simulator supports functions like non-uniformity correction, dead-pixel replacement, noise filtering, contrast enhancement and output pixel mapping. By using the simulator, algorithms may be evaluated and optimized via a graphical user interface before the often time-consuming task of hardware

implementation. The simulator is also a useful tool to demonstrate new video processing algorithms.


2 SCOPE

This report describes image processing algorithms found in FLIR Systems AB's camera systems. An infrared video-processing simulator that has been implemented in MATLAB is also introduced. Finally, five different techniques for video stabilization are evaluated.

3 INTRODUCTION

FLIR Systems AB develops and manufactures infrared (IR) video camera systems. To achieve an acceptable image quality, signal processing of the detector signal is

necessary in order to compensate for noise and variations in response between detector pixels.

Infrared video sequences are often noisy and low in contrast. Image processing is therefore crucial to most infrared camera systems in order to provide a high quality video output. Infrared imagers typically have functions for non-uniformity correction of sensor deficiencies, dead-pixel replacement, noise filtering, histogram generation, contrast enhancement and output pixel processing. To produce frame rates of 30 or 60 frames per second much of the image processing must be implemented in hardware. There is often a need for testing or developing new algorithms. However,

implementing algorithms in hardware is a difficult and time-consuming task. It is much faster and easier to implement image-processing algorithms for evaluation in MATLAB. An infrared video processing simulator with a graphical user interface would therefore be a valuable tool in the development process.

An area of interest to thermal imaging is video stabilization. Video sequences taken by handheld or mobile cameras are often disturbed by unwanted motions. Removing these unwanted motions provides a smoother sequence that is more pleasant to observe and better suited for further analysis. Accurate motion estimation also lays a foundation for further analysis.


4 INFRARED IMAGING

Electromagnetic radiation comes in many different forms. Infrared (IR) radiation is found between visible light and microwaves in the electromagnetic spectrum. All objects with a temperature above 0 K emit IR radiation. The intensity and wavelengths of the radiation depend on the temperature. The higher the temperature, the more radiation emitted and the shorter the peak wavelength of the emissions. An object that absorbs all incoming radiation, reflecting none, is called a blackbody.

The human eye is not capable of detecting IR radiation but electronic sensors are. Infrared sensors measure energy emitted (as opposed to reflected) from objects in a scene and therefore work just as well in darkness as in daylight. Infrared or thermal imaging is the process of measuring the irradiance variations between objects in a scene and presenting it as a visible image.

4.1 INFRARED IMAGING SYSTEMS

There are two groups of infrared imaging systems:

• Measuring systems
• Imaging systems

Measuring systems are calibrated to provide accurate temperature information for each pixel in the focal plane array. Imaging systems are designed to provide a useful image without the exact temperature reference. From an image processing perspective measuring and imaging systems are similar and no further distinction will be made in this report.

Early infrared imaging systems were so called scanning systems. A scanning system has only a few detector elements that are used to scan the full scene. Modern systems are called staring systems and have a large matrix of detector elements, called a focal plane array (FPA). A modern FPA typically has 320x240 or 640x480 or more detector elements.

An infrared imaging system may be modeled as a two-dimensional linear system (Figure 1).

Figure 1. Infrared imaging system: input scene → atmosphere → optics → detectors → electronics → display → human eye → output scene.


The impulse response or point spread function (PSF) of the system is a convolution of the individual impulse responses:

hsystem(x,y) = hatm(x,y)*hoptics(x,y)*hdet(x,y)*helec(x,y)*hdisp(x,y)*heye(x,y)

Each of the components in the system contributes to blurring of the scene. The blur attributed to a component may be comprised of more than one physical effect. For example, the optical blur is a combination of the diffraction and aberration effects of the optical system.

The Fourier transform of the system's impulse response is called the optical transfer function (OTF). The magnitude part of the OTF is called the modulation transfer function (MTF).

4.2 INFRARED RADIATION

Infrared radiation extends from wavelengths of about 700 nm to 1 mm. Visible light extends from about 400 nm to 700 nm. Wavelengths shorter than visible are referred to as ultraviolet, x-rays and gamma rays. Wavelengths longer than IR are referred to as microwaves and radio waves (Figure 2).

Figure 2. Electromagnetic spectrum

The infrared spectrum is further subdivided into shortwave infrared (SWIR) (~1-3 μm), mid-wave infrared (MWIR) (~3-6 μm) and long-wave infrared (LWIR) (~6-14 μm). Furthermore, the region from 700 nm to 1 μm is usually called near infrared (NIR), the region from 14 to 25 μm very long-wave infrared (VLWIR), and the region from 25 μm to 1 mm far wave infrared (FWIR).



Some parts of the IR emission spectrum are not usable for imaging systems because the radiation is absorbed by water or carbon dioxide in the atmosphere (Figure 3).

Figure 3. IR atmospheric windows

Wavelength bands with good atmospheric transmission are the:

• 9-12 μm (LWIR) band that offers very good visibility of most terrestrial objects
• 3-5 μm (MWIR) band with the added benefit of lower ambient background noise
• 0.35-2.5 μm (visible, NIR and SWIR) band which, however, relies on illumination to provide good imagery of objects at room temperature.

Detector materials are selected for high sensitivity in a wavelength band. The band is selected based on the expected nature of the scene. For example, the LWIR band allows better transmission through smoke, dust and water vapor.

4.3 INFRARED DETECTORS

Infrared detectors are electro-optical detectors that absorb electromagnetic radiation and output an electrical signal proportional to the intensity of the incident

electromagnetic radiation (irradiance). Detectors are mainly of two categories: quantum detectors and thermal detectors [1].

Quantum detectors, like the quantum well infrared photodetector (QWIP), generate a voltage or current proportional to the number of incoming photons. Quantum detectors are very sensitive. However, to reduce dark current noise and give a detectable signal, cooling of the detector is required. A very stable temperature of typically 77 K (-196 °C) is used.

Thermal detectors work on the principle that the detector material is heated by the incoming radiation and that the temperature change can be detected by, for example,


temperature dependent resistors like microbolometers. Thermal detectors are slow compared to quantum detectors but they do not need the same advanced cooling. Infrared video sequences used in this report are taken from cooled InSb and QWIP detectors and from uncooled microbolometer detectors.

4.3.1 Detector characteristics

Infrared detectors are characterized by certain figures of merit that describe their performance. Quantum detectors and thermal detectors are usually characterized by the same figures of merit [2]. A difference is that quantum detectors often have a

spectral dependence, that is, a function of wavelength (λ), while thermal detectors

generally do not. Some significant figures of merit are listed below [3].

Responsivity

Responsivity (R) is the detector output-to-input ratio. That is, the detector photocurrent (or voltage) output per unit incident radiant power at a particular

wavelength. The responsivity Rλ is expressed in units of A/W or V/W.

Rλ = IS / (Lλ · Adet), [A/W]

where

IS is the root mean square (RMS) value of the signal current in [A],

Lλ is the RMS value of the incident radiation in [W/cm2], and

Adet is the active detector area in [cm2].

Noise equivalent power

Noise equivalent power (NEP) is the monochromatic signal power necessary to produce a root mean square (RMS) signal to noise ratio (SNR) of unity. That is the intensity of incident light required to generate an output signal equal to the detector noise. Thus NEP defines a lower limit of light detection. NEP is determined by dividing the detector noise signal by the responsivity:

NEPλ = IN / Rλ, [W]

where

IN is the RMS noise current in [A] and

Rλ is the responsivity in [A/W].

The unit of NEP is W but it is often normalized to W/√Hz to avoid the necessity of also specifying the electronics bandwidth used when it was measured.


Detectivity

Spectral detectivity (D) is used to specify the noise in a thermal detector. It is a measure of the least detectable signal to noise ratio. Quantum detectors are usually characterized by their RMS open circuit noise. The detectivity is the inverse of the NEP value:

Dλ = 1 / NEPλ, [W⁻¹]

Spectral D-star (D*) is a normalization of D to unit area and bandwidth.

D*λ = Dλ · √(Adet · Δf), [cm√Hz/W]

where

Δf is the bandwidth of the detection electronics in [Hz] and

Adet is the active detector area on the FPA in [cm2]

Thus, D* is an area independent measure.

Peak D*

D* or normalized detectivity is the primary detector sensitivity performance parameter. As mentioned above, D* is a function of wavelength and frequency and can be written as

D*(λ, f) = √(Adet · Δf) / NEP, [cm√Hz/W]

For quantum detectors the detectivity is a function of wavelength. The detectivity

above the cutoff wavelength (λc) is zero. Peak D* is the highest detectivity in the

spectral pass-band and occurs at a wavelength somewhat below λc (Figure 4).


Aperture diameter

The aperture diameter (D) of an infrared system is the clear aperture dimension of the collecting optics (Figure 5).

Figure 5. Aperture diameter (D) and focal length (f )

Frequently, imaging systems have a large number of lenses. However, the entrance aperture size is taken as the aperture diameter.

Focal length

Focal length (f ) is the distance between a lens and its focal point (Figure 5). For multiple lens systems the effective focal length is used.

F-number

The F-number (F/#) of an infrared system describes the light collection capabilities of the system. It is the ratio of the focal length to the aperture diameter of the collecting optics. It describes the collection cone of the imaging optics.

F/# = f / D

Fill factor

The fill factor is defined as the ratio of light-sensitive area to total pixel size. The fill factor also determines the maximum achievable sensitivity.

Instantaneous field of view

Instantaneous field of view (IFOV) is a measure of the spatial resolution of an imaging system. The IFOV is the range of incident angles seen by a single detecting element in the focal plane.


Noise equivalent temperature difference

Noise equivalent temperature difference (NETD) is defined as the temperature difference that will produce a signal to noise ratio of unity. This is the smallest temperature difference the system can detect and is called the thermal resolution. NETD can be improved by increasing the size of the detecting elements but this would also degrade the spatial resolution. In general the thermal and spatial resolutions are inversely proportional.

Minimum resolvable temperature difference

Minimum resolvable temperature difference (MRTD) combines both spatial and thermal resolution into a single quantity that can be used to compare systems. MRTD is determined experimentally by viewing a test pattern that is slowly heated until the pattern becomes visible against the background (Figure 6).

Figure 6. Four-bar MRTD target. Spacing (d) determines spatial frequency.

Integration time

Integration time in a QWIP detector is the amount of time light is integrated before reading out a detector output voltage. A short integration time gives less blur but will also reduce the signal to noise ratio. To achieve 60 frames per second the maximum integration time is 1/60 seconds minus the read out time.


4.4 NOISE IN INFRARED SYSTEMS

Images generated by infrared systems are in general low in contrast and are sensitive to any kind of noise. Infrared systems suffer from mainly two types of noise:

• Temporal noise
• Spatial noise

Temporal noise changes from frame to frame and is caused by:

• Dark current shot noise due to thermally generated charge carriers
• Electronic noise such as 1/f noise, thermal noise and reset noise

Scanning systems suffer mainly from temporal noise.

Spatial noise is more or less static between frames. It can be seen as a fixed pattern in each frame and is usually referred to as fixed pattern noise (FPN). FPN is the

dominant noise source in most staring infrared systems. The main reason for FPN is imperfections in the detectors. Each detector can show significant differences in responsivity, gain, and noise. A non-uniformity correction (NUC) process can efficiently cancel out FPN.

4.4.1 Three-dimensional noise model

Noise equivalent temperature difference (NETD) is limited in that it only characterizes temporal detector noise, whereas three-dimensional (3D) noise characterizes both spatial and temporal noises that are attributed to a wide variety of sources [3].

Successive frames of acquired noise are considered in the 3D noise model (Figure 7).

Figure 7. Three-dimensional noise coordinates.


A directional average is taken within the coordinate system shown in order to obtain eight parameters that describe the noise at the system's output. The noise is then calculated as the standard deviation of the noise values in the directions that were not averaged (Figure 8).

The parameter subscript that is missing gives the direction(s) that were averaged. The directional averages are converted to equivalent temperatures in a manner similar to NETD. The result is a set of eight noise parameters that can be used as analytical tools in sensor design, analyses, testing, and evaluation.

Figure 8. Three-dimensional noise parameters. The subscript that is missing gives the directions that were averaged.

Noise | Description | Source
σtvh | Random spatio-temporal noise | Detector temporal noise
σtv | Temporal row noise, line bounce | Line processing, 1/f, readout
σth | Temporal column noise, column bounce | Scan effects
σvh | Random spatial noise, bi-directional fixed pattern noise | Pixel processing, detector-to-detector non-uniformity, 1/f
σv | Fixed row noise, line-to-line non-uniformity | Detector-to-detector non-uniformity
σh | Fixed column noise, column-to-column non-uniformity | Scan effects, detector-to-detector non-uniformity
σt | Frame-to-frame noise, frame bounce | Frame processing
Ω | Mean of all noise components |

The majority of these parameters cannot be calculated like NETD, with the exception of σtvh, which is similar to NETD. It is actually identical to NETD except that the actual system noise bandwidth is used instead of the reference filter bandwidth. The other noise parameters can only be measured to determine the infrared sensor artifacts. Infrared sensor models use reasonable estimates for these parameters based on historical measurements.

If all the noise components are considered statistically independent, an overall parameter can be given at the system output as

Ω = (σ²tvh + σ²tv + σ²th + σ²vh + σ²v + σ²h + σ²t)^(1/2)

The frame-to-frame noise is typically negligible, so it is not included in most noise estimates.

The three-dimensional noise can be expanded further to include the perceived noise with eye and brain effects in the horizontal and vertical directions. Composite system noise (perceived) in the horizontal direction can be given by


Ω = [σ²tvh·Et·Ev(ξ)·Eh(ξ) + σ²vh·Ev(ξ)·Eh(ξ) + σ²th·Et·Eh(ξ) + σ²h·Eh(ξ)]^(1/2)

where Et, Ev(ξ), and Eh(ξ) are the eye and brain temporal integration, vertical spatial

integration, and horizontal spatial integration, respectively. In the vertical direction, the composite noise is given by

Ω = [σ²tvh·Et·Ev(η)·Eh(η) + σ²vh·Ev(η)·Eh(η) + σ²tv·Et·Ev(η) + σ²v·Ev(η)]^(1/2)

The noise terms included in each perceived composite signal correspond to only those terms that contribute in that particular direction.

4.4.2 Noise calculation

Staring arrays are dominated by random spatial noise, so a single fixed pattern noise model is usually used. Noise calculations are straightforward for most sensor

configurations except uncooled and PtSi sensors. The noise bandwidth is calculated and then σtvh is calculated for all sensors.

Noise bandwidth

For a staring imager the noise bandwidth is

Δfnoise = 1 / (2·tint)

where tint is the integration time.

Random spatial-temporal noise

The random spatial temporal noise, σtvh, is calculated assuming an ambient

temperature of 300K. The σtvh is

σtvh = 4·(F/#)²·√Δfnoise / ( √Adet · ∫ τoptics(λ)·D*(λ)·(∂Lλ/∂T) dλ )

where Adet is the detector area, τoptics(λ) is the optics transmission, D*(λ) is D*peak · Dnormalized(λ), and ∂Lλ/∂T is the partial derivative of the ambient radiance with respect to temperature.

5 INFRARED VIDEO PROCESSING

Infrared video processing comprises several image processing steps to compensate for noise and detector deficiencies. Individual detector elements in the focal plane array show significant differences in response to incoming radiation that must be corrected for. Furthermore, frames captured by an infrared video camera system are in general low in contrast compared to images acquired from a visible camera system. Low contrast images are sensitive to any kind of noise and noise filtering is therefore needed.

Some of the fundamental steps in the infrared video processing pipeline are [4]:

• Non-uniformity correction
• Dead pixel replacement
• Noise filtering
• Histogram generation
• Contrast enhancement
• Output pixel processing

The pipeline is depicted in Figure 9.

Figure 9. Infrared video processing pipeline

An IR camera system typically handles frame rates of up to 60 frames per second. In order to achieve real-time operation the image processing must be implemented in hardware. However, implementing algorithms in hardware is a hard and

time-consuming task. Hence, simulating the image processing algorithms in software would be a faster way to test and develop new algorithms.


5.1 INFRARED VIDEO PROCESSING SIMULATOR

An infrared video-processing simulator (VPSim) has been implemented in MATLAB. VPSim is a tool for rapid development and testing of image processing algorithms. The simulator does not provide any real-time operation but is run on video sequences acquired externally from an infrared camera system.

VPSim has a graphical user interface (GUI) for interaction with the simulator backbone. Each step in the video processing chain is presented to the user as a tab panel (Figure 10).

Figure 10. Infrared video processing simulator (VPSim) GUI.

The top panel in the simulator is a File tab where an input sequence is selected and a desired simulator system is selected. Depending on the selected system a number of tabs are loaded and displayed in the GUI.

Each tab contains information about a step in the image processing chain. A tab may be turned on or off to analyze its effect. Settings for each step are easily changed and the result is evaluated from various image displays.

Output sequences from VPSim are stored as AVI files. The simulator settings may be saved to file from the File tab for later recall. Different simulator settings can then be loaded and compared in a convenient way.


A separate GUI is used to create new simulator systems. A new system is defined by selecting tab panels from a list of existing image processing steps. The order in which the tab panels are selected is also the order in which incoming frames are processed by the simulator. A new system is saved to file and can later be loaded into VPSim.

5.2 NON-UNIFORMITY CORRECTION

Focal plane array (FPA) detectors are non-uniform in their response to incident radiation. To correct for the non-uniform response, a non-uniformity correction (NUC) is applied to each pixel in the FPA data. A linear correction of the FPA data is achieved by applying a gain and an offset correction to each pixel value (Figure 11).

Figure 11. Infrared scene before and after NUC

Gain and offset corrections are usually based on two temperature measurements that provide a linear correction of FPA data. However, the FPA has a nonlinear

temperature response. To compensate for the temperature nonlinearity, a separate NUC process can be initiated. The result of a NUC process is the generation of new offset data.

To maintain accuracy in temperature measurements, a NUC should be performed after changing temperature range, if the camera temperature changes, or if the lens is changed. A NUC process is usually performed by inserting a shutter with known temperature in front of the detector. Cameras with an automatic NUC capability are of course easier to use in the field. Techniques such as scene-based NUC do not need a shutter but rely on image statistics to calculate non-uniformities.
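As an illustration of the two-point correction described above, the following MATLAB sketch computes a per-pixel gain and offset from two uniform reference frames (for example shutter frames at two known temperatures). The function and variable names, and the normalization to the mean reference levels, are illustrative assumptions, not VPSim's or the camera's actual implementation.

% Two-point NUC sketch: refCold and refHot are uniform reference frames.
function frameOut = nucTwoPoint(frameIn, refCold, refHot)
    meanCold = mean(refCold(:));
    meanHot  = mean(refHot(:));
    % Per-pixel gain that equalizes the response span of every pixel
    gain   = (meanHot - meanCold) ./ (refHot - refCold);
    % Per-pixel offset that maps the cold reference to the mean cold level
    offset = meanCold - gain .* refCold;
    % Pixels with zero response difference (dead pixels) are assumed to be
    % handled later by the dead pixel replacement step.
    frameOut = gain .* frameIn + offset;
end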

5.3 DEAD PIXEL REPLACEMENT

Pixels in the FPA data that do not respond correctly to incoming radiation are called dead pixels. Dead pixels are caused by faults in the FPA detector. Dead pixels are replaced by interpolation of neighboring pixels. Pixel replication, bilinear interpolation and bicubic interpolation are common interpolation schemes.
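A minimal MATLAB sketch of dead pixel replacement by averaging the four nearest neighbours follows. It assumes a double-valued frame, a logical dead-pixel map (deadMask) and isolated dead pixels away from the image border; the names and the choice of interpolation are illustrative.

function f = replaceDeadPixels(f, deadMask)
    [rows, cols] = find(deadMask);          % coordinates of the dead pixels
    for k = 1:numel(rows)
        r = rows(k); c = cols(k);
        % Replace with the mean of the four horizontal/vertical neighbours
        f(r, c) = (f(r-1, c) + f(r+1, c) + f(r, c-1) + f(r, c+1)) / 4;
    end
end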


5.4 NOISE FILTERING

FPN is a dominant noise source for staring detectors. The NUC process removes much of the FPN noise. An adaptive temporal filter is used for additional noise filtering (Figure 12).

Figure 12. Adaptive temporal noise filter.

The difference (fdiff) between each pixel in the input frame and the previous processed frame is calculated. If the difference is below a certain threshold (T), the previous processed frame (fprev) is updated with a portion of the frame difference. If the difference is above the threshold, the temporal noise filter is reset and a spatial low-pass filter is applied to the input frame for these pixels (fLP). This is a recursive type of temporal filtering that functions as an IIR low-pass filter (Figure 13).
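The filter described above might be sketched in MATLAB as follows. The threshold T, the update fraction alpha and the 3x3 box kernel are illustrative choices, not the camera's actual parameters; fPrev is the previous processed frame and fOut is fed back as fPrev for the next frame.

function fOut = adaptiveTemporalFilter(f, fPrev, T, alpha)
    f = double(f);  fPrev = double(fPrev);
    fDiff = f - fPrev;                          % difference to the previous processed frame
    fLP   = conv2(f, ones(3)/9, 'same');        % spatial low-pass of the input frame
    fOut  = fPrev + alpha * fDiff;              % recursive (IIR) temporal update
    reset = abs(fDiff) > T;                     % large differences: reset the temporal filter
    fOut(reset) = fLP(reset);                   % and use the spatially filtered input instead
end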

Figure 13. Infrared scene before and after adaptive temporal noise filtering


5.5 HISTOGRAM GENERATION

A histogram describes the contrast in an image. The histogram shows how many pixels in the image belong to each gray level (Figure 14).

Figure 14. Infrared scene and its histogram normalized to [0,1].

An image with low contrast has pixel magnitudes concentrated within a narrow range. Histogram generation is fundamental to many contrast enhancement techniques. For example, stretching the distribution to a wider range will enhance the contrast.

5.6 CONTRAST ENHANCEMENT

Various methods exist for contrast enhancement.

5.6.1 Continuous adjust

Continuous adjust is a function for automatic control of brightness and contrast. The histogram functionality is used to adaptively control variations in brightness and contrast to provide an output image without flicker (Figure 15).

Figure 15. Histogram showing brightness and contrast


5.6.2 Histogram equalization

Histogram equalization is a method that flattens the image histogram. The contrast is increased in large uniform areas at the expense of contrast in areas whose intensities are less statistically represented (Figure 16).

Figure 16. Histogram equalization. An IR scene before and after histogram equalization along with the histogram equalized histogram.

As can be seen by observing the red circles in Figure 16, this method is not optimal if the object of interest is small and has a different temperature compared to the rest of the scene. Some solutions are to limit the region of interest (ROI) in the scene or to limit the contrast in large uniform areas (like the background) by taking the square root of the histogram before equalization.
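A compact MATLAB sketch of histogram equalization on a 16-bit frame, including the optional square root of the histogram mentioned above, could look like this. The number of bins and the 8-bit output range are illustrative choices.

nBins = 1024;
edges = linspace(double(min(frame(:))), double(max(frame(:))), nBins + 1);
h = histcounts(double(frame(:)), edges);     % image histogram
h = sqrt(h);                                 % optional: limit contrast of large uniform areas
cdf = cumsum(h) / sum(h);                    % normalized cumulative histogram
bin = discretize(double(frame), edges);      % histogram bin of every pixel
equalized = uint8(255 * cdf(bin));           % map each pixel through the CDF to 8 bits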


5.6.3 Bilateral filtering

The real world has a high dynamic range while image media in general has a low dynamic range (Figure 17).

Figure 17. Dynamic range

An image may be sampled and handled internally with 16 bits. However, before the image data is output to a monitor, the dynamic range is usually reduced from 16 to 8 bits. Fine details usually occupy only a small part of the dynamic range and may be lost in the compression. A method that decreases dynamic range while preserving details is based on the bilateral filter.

A bilateral filter is a nonlinear filter that depends on underlying image data and smoothes images while preserving edges [5]. Bilateral filtering derives from Gaussian low-pass filtering (smoothing) but preserves edges by decreasing the weights of pixels where the difference in intensity is large. The filter has a spatial and a radiometric part and the filter weights are determined by both geometric closeness and radiometric similarity to neighboring pixels (Figure 18).


Figure 18. Bilateral filtering

a - Input signal

b - Spatial domain filter kernel

c - Radiometric distance for the central pixel
d - Resulting filter kernel

Detail enhancement may be achieved by decomposing an image into layers using a bilateral filter. The bilaterally filtered image forms a base layer with large-scale information. A detail layer is formed by subtracting the base layer from the original image. Contrast is reduced in the base layer and the detail layer is added back to create an image with reduced contrast but with the details preserved (Figure 19).
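The decomposition can be sketched in MATLAB as below with a straightforward (and slow) bilateral filter. The parameters sigmaS, sigmaR and baseGain, the window radius and the loop over window offsets are illustrative choices written for clarity rather than speed.

function out = bilateralDetailEnhance(img, sigmaS, sigmaR, baseGain)
    img = double(img);
    r = ceil(2 * sigmaS);                               % window radius
    [dx, dy] = meshgrid(-r:r, -r:r);
    spatial = exp(-(dx.^2 + dy.^2) / (2 * sigmaS^2));   % geometric closeness weights
    ry = [ones(1, r), 1:size(img, 1), repmat(size(img, 1), 1, r)];
    rx = [ones(1, r), 1:size(img, 2), repmat(size(img, 2), 1, r)];
    padded = img(ry, rx);                               % replicate-padded image
    base = zeros(size(img));
    wSum = zeros(size(img));
    for i = 1:numel(dx)
        shifted = padded(r+1+dy(i):end-r+dy(i), r+1+dx(i):end-r+dx(i));
        range = exp(-(shifted - img).^2 / (2 * sigmaR^2));  % radiometric similarity weights
        w = spatial(i) * range;
        base = base + w .* shifted;
        wSum = wSum + w;
    end
    base = base ./ wSum;               % bilaterally filtered base layer
    detail = img - base;               % detail layer
    out = baseGain * base + detail;    % compress the base layer, keep the details
end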

Figure 19. Dynamic range compression

This will result in an image where the details use a larger portion of the available dynamic range (Figure 20).

Figure 20. Infrared scene before and after bilateral filtering

When a 16 to 8-bit data conversion is performed fine details and small relative temperature differences are still visible.


5.7 OUTPUT PIXEL PROCESSING

Before image data is sent to a display system the image data is converted from 16 to 8 bits by a lookup table (LUT) operation. A LUT is a table allowing a display system to map pixel values into colors or grey scale values with a convenient range of brightness and contrast. LUTs can be generated from histogram data and may have a linear or arbitrary transform (Figure 21).
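A linear 16-to-8-bit LUT derived from the histogram might look like the following MATLAB sketch. The 1% and 99% clip points are illustrative, and frame is assumed to be a uint16 image.

v = sort(double(frame(:)));
lo = v(max(1, round(0.01 * numel(v))));      % dark clip level (about the 1st percentile)
hi = v(round(0.99 * numel(v)));              % bright clip level (about the 99th percentile)
hi = max(hi, lo + 1);                        % guard against a completely flat image
lut = uint8(255 * min(max(((0:65535) - lo) / (hi - lo), 0), 1));
frame8 = lut(double(frame) + 1);             % map every 16-bit value through the LUT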

Figure 21. Example of 16-to-8 bit compression and LUT

In the case of color output, the input intensities are mapped to a color palette. Many cameras also have functions to overlay graphics and zoom.

6 ELECTRONIC VIDEO STABILIZATION

Video stabilization is the process of removing undesired camera motion (jitter) while keeping intentional camera motion in a video sequence. The main methods for video stabilization are optical stabilization and electronic stabilization. Optical (or

mechanical) stabilization involves motion sensors and active optical compensation. Optical stabilization is powerful but bulky and expensive. Electronic stabilization on the other hand does not require any external hardware but relies on image processing algorithms for stabilization.

Electronic video stabilization algorithms are based on optical flow measurements to estimate the motion between frames. Optical flow is an approximation of the apparent motion field, which in turn is a projection of the 3D motion field onto the 2D-plane.

Various methods for motion estimation exist and may be categorized as based on [6]:

• Gradients
• Matching
• Frequency

These methods produce displacement vectors of the estimated optical flow. In order to compensate for the unwanted motion, the desired camera motion must also be estimated. Unwanted motion is typically fast and jerky compared to desired motion, and video stabilization is therefore a matter of low-pass filtering.

Once the motion between frames and the desired motion has been estimated, the undesired motion can be calculated and compensated for. The input frames are

transformed, warped, into their desired position by a motion transform model. Various parametric models exist where transform parameters are fitted to the estimated

displacement vectors. The process of finding the correspondence between two or more images of the same scene is referred to as image registration.


6.1 MOTION ESTIMATION

The motion estimation procedure produces displacement vectors with magnitude and direction of the estimated optical flow. These displacement vectors are later used for motion compensation. All motion estimation algorithms are computationally

demanding. Recall that IR video in general is a low contrast and high noise sequence. Some correspondence problems in motion estimation are independently moving objects, occlusions and variations in brightness and contrast. For example, the

occlusion problem occurs when moving objects cover or uncover background objects (Figure 22). In such areas no correspondence between frames can be found and hence motion is undefined.

Figure 22. Occlusion problem

6.1.1 Optical flow

Calculating optical flow is not a trivial task. A common approach is to assume that the total spatial and temporal derivatives of the image brightness remain constant. This is known as the constant brightness assumption and is usually true when the frame rate is high. The constant brightness assumption allows pixels to move between frames but not their intensity to change. The constant brightness assumption is important to most correlation and gradient-based optical flow estimation algorithms.

Consider a small motion (u,v) (Figure 23).

Figure 23. Small displacement (u,v) under constant brightness assumption.

Under the constant brightness assumption,

I(x + u, y + v) = H(x, y)

When the motion is small (u and v typically less than one pixel), a Taylor series expansion of I is a valid approximation to the motion:

I(x + u, y + v) = I(x, y) + (∂I/∂x)·u + (∂I/∂y)·v + higher order terms
               ≈ I(x, y) + Ix·u + Iy·v

where Ix = ∂I/∂x and Iy = ∂I/∂y.

Combining the equations above gives

0 = I(x + u, y + v) − H(x, y)
  ≈ I(x, y) + Ix·u + Iy·v − H(x, y)
  ≈ [I(x, y) − H(x, y)] + Ix·u + Iy·v
  ≈ It + Ix·u + Iy·v
  ≈ It + ∇I·[u v]ᵀ

In the limit, when u and v go to zero, this becomes the optical flow constraint equation

0 = It + ∇I·[∂x/∂t  ∂y/∂t]ᵀ

The optical flow constraint equation is a single equation in two unknowns. Thus, it is only possible to determine the flow in the gradient direction and not flow parallel to an edge. This is referred to as the aperture problem (Figure 24).

Figure 24. The aperture problem: apparent motion versus true motion.

One solution to the aperture problem is to assume that the flow field is smooth in a local neighborhood and then consider a small window of pixels. Lucas and Kanade developed a method based on this assumption [7].

6.1.2 Gradient-based

Lucas and Kanade's method

Lucas and Kanade’s method is a gradient-based method for motion estimation [7]. Assuming that all pixels within a small window move similarly it is possible to overcome the aperture problem. For example, a 5x5 window gives 25 equations per pixel

[Ix(p1) Iy(p1); Ix(p2) Iy(p2); ... ; Ix(p25) Iy(p25)]·[u; v] = −[It(p1); It(p2); ... ; It(p25)]

or A·d = b, where

Ix = ∂I/∂x, Iy = ∂I/∂y, It = ∂I/∂t

Lucas and Kanade finds the displacement vectors u and v from the least squares solution to the over-constrained system

min ‖A·d − b‖²

The minimum least squares solution is given by the solution of

AᵀA·d = Aᵀb

which gives the Lucas and Kanade equations

[ΣIx²  ΣIx·Iy; ΣIx·Iy  ΣIy²]·[u; v] = −[ΣIx·It; ΣIy·It]

The summations are over all pixels in an NxN window. Too small a window will be noisy and too large a window will violate the locally constant flow field assumption. A window of size 5x5 or 7x7 is often used.

The translational displacement vectors (u,v) are given by d = (AᵀA)⁻¹Aᵀb.

To find a solution to the Lucas and Kanade equations:

• AᵀA should be invertible
• AᵀA should not be too small due to noise (the eigenvalues should not be too small)
• AᵀA should be well-conditioned (the ratio between the eigenvalues should not be too large)

Observing the eigenvalues of AᵀA leads to three cases:

• λ1 and λ2 are both large → large gradients → a highly textured area
• λ1 is large and λ2 is small → one large gradient → an edge
• λ1 and λ2 are both small → small gradients → a flat area

Thus flat areas are not good for motion estimation since AᵀA becomes singular. The

minimum eigenvalue could be compared to a threshold value for more robust motion estimation.

The Lucas and Kanade algorithm

1 Smooth the input windows to reduce noise in the gradient estimates
2 Compute spatial (Ix, Iy) and time (It) gradients for the windows
3 Form AᵀA and compute its inverse
4 Compute the translational displacement vector d = (AᵀA)⁻¹Aᵀb

Lucas and Kanade will not work if the constant brightness assumption is not satisfied or if the motion is not small or if a pixel does not move like its neighbors.

Improved performance is obtained by weighting the summation to give more significance to gradients towards the center of the window [6].

d = (AᵀW²A)⁻¹AᵀW²b

where W is a weighting function.

Iterative Lucas and Kanade

The Lucas and Kanade method is only approximate since it drops the higher order terms in the Taylor series expansion. The algorithm may be improved by iterative refinement in a Newton-Raphson style.

Iterative Lucas and Kanade algorithm

1 Estimate the motion at each pixel by solving the Lucas and Kanade equations
2 Warp H towards I by using the estimated flow field

3 Repeat until convergence or for a fixed number of iterations


Multiresolution Lucas and Kanade

Maximum detectable motion with Lucas and Kanade's method depends on the size of and the filtering applied to the gradient windows. Typically motion is assumed to be within one or two pixels. Larger motion may be handled by a multiresolution Lucas and Kanade method. Gaussian blur followed by subsampling by a factor of two creates Gaussian image pyramids (Figure 25).

Figure 25. Multiresolution image pyramid.

Initial displacement vectors are estimated at the highest level of the pyramid where motion is small. Motion estimates are then propagated down the pyramid to be refined at the next higher resolution.

Multiresolution Lucas and Kanade algorithm

Compute Lucas and Kanade (u,v) at the highest level
for each lower level i
1 Take motion estimates ui+1 and vi+1 from level i+1
2 Bilinearly interpolate to create matrices ui* and vi* of twice the resolution of level i+1
3 Scale ui* and vi* by multiplying with 2
4 Use a window displaced by ui* and vi*
5 Apply Lucas and Kanade to get the corrections in flow: ui'(x,y) and vi'(x,y)
6 Update the motion estimates: ui = ui* + ui' and vi = vi* + vi'
end


6.1.3 Matching techniques

Block-correlation

Block-based motion estimation algorithms consider a block of pixels in one image and search for the corresponding block in the next image. Some correlation measure is usually used for matching. These methods are also referred to as area-based, correlation-like, or template matching methods [8]. Block-based methods require a huge amount of computations since all possible motions within some search window must be evaluated (Figure 26).

Figure 26. Block-matching.

The size will affect the resolution of the estimated motion field. A small block-size provides a detailed motion field but is also more vulnerable to false motion estimates since the correlation measure might provide false matches due to noise. A large block-size gives a more robust but less detailed motion estimate. However, the total number of computations does not depend on the block-size.

It is possible to reduce the amount of computations by using hierarchical search patterns. A rough estimate is determined with larger blocks at low resolution levels. Estimates are then fine-tuned at higher resolution levels with smaller blocks. Block matching algorithms differ in search strategy and matching criteria. A search may be performed over the whole image or within a small window. The most common matching criteria are the sum of squared difference (SSD)

SSD(u,v) = Σx,y [I2(x+u, y+v) − I1(x,y)]²

or the sum of absolute difference (SAD)

SAD(u,v) = Σx,y |I2(x+u, y+v) − I1(x,y)|

The displacement estimate is the vector (u,v) that minimizes the SSD or SAD criterion. It is assumed that all pixels belonging to one block have a single

displacement vector, which is a special case of the local smoothness constraint (same as for Lucas and Kanade's method). Minimizing the SSD or SAD criteria can be seen as imposing the optical flow constraint on the entire block.
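Exhaustive SSD block matching for a single block might be sketched in MATLAB as follows. The block size B, the search radius R and the boundary handling are illustrative, and (x0, y0) is the top-left corner of a reference block assumed to lie inside I1.

function [u, v] = blockMatchSSD(I1, I2, x0, y0, B, R)
    ref = double(I1(y0:y0+B-1, x0:x0+B-1));    % reference block in the first frame
    best = inf;  u = 0;  v = 0;
    for dy = -R:R
        for dx = -R:R
            ys = y0 + dy;  xs = x0 + dx;
            if ys < 1 || xs < 1 || ys+B-1 > size(I2, 1) || xs+B-1 > size(I2, 2)
                continue;                      % candidate block outside the second frame
            end
            cand = double(I2(ys:ys+B-1, xs:xs+B-1));
            ssd = sum((cand(:) - ref(:)).^2);  % sum of squared differences
            if ssd < best
                best = ssd;  u = dx;  v = dy;  % keep the best displacement so far
            end
        end
    end
end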

Some limitations with block-based methods are that:

• They are computationally demanding.
• A flat area may be incorrectly matched to another flat area.
• Correlation based similarity measures are sensitive to intensity changes.
• Discontinuities at block boundaries.

In spite of these limitations, block-based methods are easy to implement in hardware and are often used in video codecs such as MPEG-2.

A multiresolution approach may reduce the number of computations.

Multiresolution block matching algorithm

Compute displacement vectors (u,v) at the highest level
for each lower level i
1 Take motion estimates ui+1 and vi+1 from level i+1
2 Bilinearly interpolate to create matrices ui* and vi* of twice the resolution of level i+1
3 Scale ui* and vi* by multiplying with 2
4 Use a window displaced by ui* and vi*
5 Compute displacement vectors to get the corrections in flow: ui'(x,y) and vi'(x,y)
6 Update the motion estimates: ui = ui* + ui' and vi = vi* + vi'
end

A translational block motion model cannot handle rotation or zooming. For larger blocks the translation model is a poor approximation. A generalized block matching method uses a higher order model and warps blocks prior to matching (step 4).

Radon transform

Instead of using computationally expensive 2D correlation, the Radon transform offers an efficient method for motion estimation based on 1D correlation. A 2x1D Radon transform uses projections of the input frame along its columns and rows and correlates them with projections of previous frames to estimate motion (Figure 27).

Figure 27. Row and column projections of an image

Motion estimation based on the horizontal and vertical image projections is

computationally very efficient and therefore suitable for hardware implementation. However, it may be less accurate than block-correlation when it is applied to small blocks since some information is lost in the projections.
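Translation estimation from row and column projections might look like the following MATLAB sketch. The search limit maxShift is illustrative, the projections are mean-subtracted, and for simplicity the correlation is computed over the overlapping samples only, without normalization.

function [u, v] = projectionShift(I1, I2, maxShift)
    rows1 = sum(double(I1), 2);  rows2 = sum(double(I2), 2);   % row projections
    cols1 = sum(double(I1), 1);  cols2 = sum(double(I2), 1);   % column projections
    v = bestShift(rows1 - mean(rows1), rows2 - mean(rows2), maxShift);        % vertical shift
    u = bestShift(cols1(:) - mean(cols1), cols2(:) - mean(cols2), maxShift);  % horizontal shift
end

function s = bestShift(p1, p2, maxShift)
    % Returns the shift s that maximizes the correlation between p1(i) and p2(i - s).
    best = -inf;  s = 0;
    for d = -maxShift:maxShift
        i1 = max(1, 1 + d):min(numel(p1), numel(p1) + d);   % overlapping sample indices
        c = sum(p1(i1) .* p2(i1 - d));
        if c > best, best = c; s = d; end
    end
end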

Feature-based

Distinctive interest points, like corners, do not suffer from the aperture problem and can be used for accurate motion estimation. Interest points are often called corners even though not all points are corners. However, good features are distinctive corners that are sparse, informative and reproducible.


Popular corner detectors are Förstner, Harris and SUSAN. Harris corner detector is probably the most common. Examples of popular similarity measures are normalized cross-correlation, sum of absolute differences and sum of squared differences. Usually the normalized cross correlation (NCC)

NCC(u,v) = Σx,y (I1 − Ī1)(I2 − Ī2) / √( Σx,y (I1 − Ī1)² · Σx,y (I2 − Ī2)² )

is used since it is invariant to changes in brightness and contrast. Note that NCC will fail if the image intensity is constant since this will result in a divide by zero.

The main concern with feature-based methods is to find invariant and robust features. Features detected in one image may be lost due to a number of reasons such as:

occlusions, perspective distortions or changes in intensity. Furthermore, many putative matches are erroneous or belong to outliers such as moving objects, and should

therefore not be used in the transform model estimation. Robust estimation methods, like RANSAC, automatically detect unreliable matches.

Algorithm for feature based methods

1 Detect features in frames

2 Find putative matches between frames

3 Estimate motion parameters by the use of robust estimation (e.g. RANSAC)

Harris corner detector

Harris corner detector [9] is a combined corner and edge detector based on the local autocorrelation function

E(u, v) = [u v]·M·[u v]ᵀ

where M is built from the horizontal and vertical gradients within a local neighborhood W

M = ΣW [Ix²  Ix·Iy; Ix·Iy  Iy²]

The gradients are smoothed with a Gaussian filter to reduce noise.

Let λ1 and λ2 be the eigenvalues of M. λ1 and λ2 will be proportional to the principal curvatures of the local autocorrelation function.

Depending on the eigenvalues there are three cases:

• If both curvatures are small, the local autocorrelation is flat, and the windowed image region is of approximately constant intensity. • If one curvature is high and the other low, the autocorrelation is ridge

shaped, and shifts along the ridge (i.e. along an edge) cause little change in E, indicating an edge.

• If both curvatures are high (sharp peak in autocorrelation) shifts in any direction will increase E, indicating a corner.

Harris corner edge response

A measure of corner and edge response is needed to select isolated corner pixels and

to thin the edge pixels. A response function computed from λ1 and λ2 alone is

attractive since it will be rotationally invariant. Using the trace of M

Tr(M) = λ1 + λ2 = Ix² + Iy²

and the determinant of M

Det(M) = λ1·λ2 = Ix²·Iy² − (Ix·Iy)²

avoids the explicit eigenvalue decomposition of M. The response function used in Harris' corner detector is

R = Det(M) − k·(Tr(M))² = Ix²·Iy² − (Ix·Iy)² − k·(Ix² + Iy²)²

where R is positive in the corner region, negative in the edge region, and small in the flat region. Corners are defined by local maxima of R.
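The response map R can be computed directly from smoothed gradient products, as in this MATLAB sketch; the Gaussian window scale sigma and the constant k = 0.04 are conventional but illustrative choices.

function R = harrisResponse(I, sigma, k)
    I = double(I);
    Ix = conv2(I, [-1 0 1]/2, 'same');          % horizontal gradient
    Iy = conv2(I, [-1; 0; 1]/2, 'same');        % vertical gradient
    x = -3*ceil(sigma):3*ceil(sigma);
    g = exp(-x.^2 / (2*sigma^2));
    g = g' * g;  g = g / sum(g(:));             % Gaussian window W
    Sxx = conv2(Ix.^2,  g, 'same');             % smoothed gradient products
    Syy = conv2(Iy.^2,  g, 'same');
    Sxy = conv2(Ix.*Iy, g, 'same');
    R = (Sxx.*Syy - Sxy.^2) - k * (Sxx + Syy).^2;   % Det(M) - k*Tr(M)^2
end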


6.1.4 Frequency-based

Phase-correlation

Phase-based motion estimation is performed in a transform domain using for example a Fourier or wavelet transform. A simple approach is to use a fast Fourier transform (FFT) and phase-correlation in the frequency domain. Consider two frames related by a translation (u,v)

H(x, y) = I(x − u, y − v)

which in the frequency domain becomes

F{H}(ωx, ωy) = e^(−j2π(ωx·u + ωy·v)) · F{I}(ωx, ωy)

where the translation corresponds to a phase-shift

F{H}(ωx, ωy) / F{I}(ωx, ωy) = e^(−j2π(ωx·u + ωy·v))

The translation (u,v) is found by taking the inverse FFT

IFFT{ e^(−j2π(ωx·u + ωy·v)) } = δ(x − u, y − v)

If the motion is a simple translation, this method produces exactly one peak at the current translation. If the motion is more complex more peaks will exist. Dividing the image into blocks allows for more accurate motion estimates. Observing only the phase and ignoring the amplitude makes the method robust to changes in brightness and contrast.

The algorithm for phase-correlation is the same as for block-matching except that the matching is performed in the transform domain. A problem is that even though the motion is estimated correctly the method does not tell where in the image the motion took place. One solution is to add a correlation-matching step similar to the block-matching method. However, the block-matching is steered by the motions from the phase correlation, and is therefore much more efficient than regular block matching and results in less false matches.
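A block or frame pair can be registered with phase correlation as in this MATLAB sketch. The small eps term guards against division by zero, and the wrap-around handling assumes the shift is less than half the block size.

function [u, v] = phaseCorrelation(H, I)
    FH = fft2(double(H));
    FI = fft2(double(I));
    cross = FH .* conj(FI);
    corr = real(ifft2(cross ./ (abs(cross) + eps)));   % phase-only correlation surface
    [~, idx] = max(corr(:));
    [v, u] = ind2sub(size(corr), idx);
    u = u - 1;  v = v - 1;                             % peak position gives the shift
    if u > size(corr, 2)/2, u = u - size(corr, 2); end % wrap negative horizontal shifts
    if v > size(corr, 1)/2, v = v - size(corr, 1); end % wrap negative vertical shifts
    % (u, v) is such that H(x, y) corresponds to I(x - u, y - v).
end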


6.2 TRANSFORM MODEL ESTIMATION

If the depth of objects in a scene is small compared to the distance to the camera, the transformation between images can be approximated by a 2D parametric

transformation. The choice of transform model should correspond to the geometric deformation of the scene. The choice of motion model is also a question of desired accuracy. However, a simpler model is more robust to noise.

6.2.1 Parametric transform models

Geometrical transformations are mathematically simplified by using homogeneous coordinates. The transform of one image may then be expressed as a linear

transformation X' = T·X where X holds the homogeneous image coordinates

X = [x1 x2 ... xn; y1 y2 ... yn; 1 1 ... 1]

T is the transformation matrix

T = [t1 t2 t3; t4 t5 t6; t7 t8 1]

and X' is the transformed image. A number of models exist for the transformation matrix (Figure 28).

Figure 28. Example of parametric transform models

None of the models mentioned below are able to handle zooming in the scene. However, these models are robust and the low contrast and high noise nature of IR images makes more complex models more likely to fail.


Translation model

A translation motion model is a two parameter model that handles horizontal and vertical motion. The transformed coordinates are given just by adding the translation (u,v) to the original coordinates

x' = x + u
y' = y + v

which in homogeneous matrix notation is

[x'; y'; 1] = [1 0 t1; 0 1 t2; 0 0 1]·[x; y; 1], and thus u = t1 and v = t2

A simple model like this is very robust and usually accurate when frame-to-frame motion is small and planar. A more complex model may be erroneous when motion is small and the translational model is therefore often a good solution for video

stabilization.

Similarity model

A similarity transform model is an angle preserving, four parameter model able to handle translation, uniform scale and rotation about the Z-axis. The transformed image coordinates are calculated as

x' = t1·x + t2·y + t3
y' = −t2·x + t1·y + t4

and the transformation matrix becomes

T = [t1 t2 t3; −t2 t1 t4; 0 0 1]

When the frame-to-frame motion is small a similarity transformation is sufficient to model most motions.

Affine model

An affine model is a six parameter model where transformed coordinates are calculated as

x' = t1·x + t2·y + t3
y' = t4·x + t5·y + t6

and its transformation matrix is

T = [t1 t2 t3; t4 t5 t6; 0 0 1]

This model handles non-uniform scale and shear. Parallel lines are still parallel after transformation but the length between lines is no longer preserved. The six degrees of freedom makes the model powerful but also sensitive to noise. Poorly estimated parameters may therefore introduce visible distortions.

Projective model

A projective model also handles perspective projections. It is an eight parameter model where

T = [t1 t2 t3; t4 t5 t6; t7 t8 1]

and the transformed image coordinates are

x' = (t1·x + t2·y + t3) / (t7·x + t8·y + 1)
y' = (t4·x + t5·y + t6) / (t7·x + t8·y + 1)

Due to the eight degrees of freedom distortions are likely to occur when frame to frame motion is small or when noise is present. The projective method is therefore not very robust for image registration of IR video.

6.2.2 Robust estimation of transform model parameters

Displacement vector estimates are sometimes false and rejection of outliers would improve the transform model. Especially feature matching methods seem to suffer from false matches. A simple approach would be to fit the motion transform model to the median of the displacement vectors. RANSAC is another method for robust estimation of transform model parameters.

RANSAC

Random sample consensus (RANSAC) is an algorithm for robust model estimation. Instead of using as much data as possible to obtain an initial solution and then attempting to eliminate invalid data points, RANSAC uses as small an initial data set as feasible and enlarges this set with consistent data when possible [10]. To estimate a projective model four samples are needed, an affine model needs three points and a similarity model only two. The points must not be collinear as this leads to a degenerate case, from which the model parameters cannot be calculated.

The RANSAC algorithm

Repeat for N samples
1 Select a random initial data set of 2, 3 or 4 correspondences depending on the model
2 Estimate model parameters (T)
3 Compute a geometric image distance error for each putative correspondence
4 Compute the number of inliers

Choose the model T with most inliers.
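A MATLAB sketch of RANSAC for the similarity model of section 6.2.1 is given below, assuming N putative correspondences p1 and p2 (N-by-2 matrices of matched point coordinates). The number of iterations nIter and the inlier threshold are illustrative, and degenerate (coincident) minimal samples are not handled.

function bestT = ransacSimilarity(p1, p2, nIter, thresh)
    N = size(p1, 1);
    bestT = eye(3);  bestInliers = 0;
    for it = 1:nIter
        s = randperm(N, 2);                        % 1. minimal sample: two correspondences
        T = fitSimilarity(p1(s,:), p2(s,:));       % 2. estimate the model parameters
        q = [p1 ones(N,1)] * T';                   %    transform all points with T
        err = sqrt(sum((q(:,1:2) - p2).^2, 2));    % 3. geometric distance errors
        nInliers = sum(err < thresh);              % 4. count the inliers
        if nInliers > bestInliers
            bestInliers = nInliers;  bestT = T;    % keep the model with most inliers
        end
    end
end

function T = fitSimilarity(a, b)
    % Least squares fit of x' = t1*x + t2*y + t3, y' = -t2*x + t1*y + t4
    n = size(a, 1);
    A = [a(:,1)  a(:,2) ones(n,1) zeros(n,1);
         a(:,2) -a(:,1) zeros(n,1) ones(n,1)];
    t = A \ [b(:,1); b(:,2)];
    T = [t(1) t(2) t(3); -t(2) t(1) t(4); 0 0 1];
end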


6.3 CAMERA MOTION ESTIMATION

Instead of just transforming the input frame to a fixed reference frame it is possible to estimate the desired camera motion and create a smooth motion in the sequence. A very simple way to estimate the camera motion (ego-motion) is to low-pass filter the frame-to-frame motion estimates.

This camera motion estimate is based on the assumption that unwanted motions typically are fast compared to the desired camera motion. A more advanced state-space model is used in [11].
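The low-pass filtering idea can be sketched in MATLAB as follows, assuming frameMotion is an n-by-2 matrix of estimated frame-to-frame translations (u, v). The exponential smoothing and the factor alpha are illustrative choices.

path = cumsum(frameMotion, 1);            % accumulated camera path, one row per frame
smoothPath = zeros(size(path));
smoothPath(1, :) = path(1, :);
alpha = 0.9;                              % closer to 1 gives a smoother estimated camera motion
for k = 2:size(path, 1)
    smoothPath(k, :) = alpha * smoothPath(k-1, :) + (1 - alpha) * path(k, :);
end
correction = smoothPath - path;           % per-frame shift that removes the unwanted motion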

6.4 MOTION COMPENSATION

A general method for video stabilization is outlined in Figure 29.

Figure 29. Video stabilization

Motion compensation is the step where the input frame is transformed into its

stabilized position. The stabilized position is determined by the difference between the estimated frame-to-frame motion and the estimated camera motion (Figure 30).


Figure 30. Frame difference before and after motion compensation. Note the moving objects in the scene.

Interpolation of the transformed image is required since the transformed image

coordinates are in general not integers. Even after interpolation there will be undefined areas outside the image borders. To handle undefined areas it is possible to zoom the image to fill the undefined areas or to use mosaicking to fill the empty parts.


7 RESULTS

Five different motion estimation methods for video stabilization were evaluated:

• Lucas and Kanade
• Block-correlation
• Phase-correlation
• Radon transform
• Harris corner detector

All methods were implemented and evaluated in MATLAB. The methods were tested on sequences of different complexity. All methods that were tested perform well on sequences where contrast is high, noise is low, and motion is small. In more complex sequences with noise, low contrast, multiple moving objects, occlusions, and/or variations in brightness and contrast, their performances are degraded to various extents.

Overall, Lucas and Kanade’s gradient-based method performs best, as was also found in [12]. Large motions are effectively handled by the multiresolution approach. The method is robust to occlusions and fails only when the image is very flat or noisy. Lucas and Kanade is also computationally effective compared to block-matching techniques.

The main drawback with block-correlation is the computational complexity, especially when image displacement is large. Furthermore, blocks that contain little information (i.e. flat or noisy areas) are often incorrectly matched. Another problem is the choice of correct block-size. These problems also arise in phase-correlation methods. Robust techniques for model estimation exist and improve the results.

The Radon transform is of great interest due to its computational efficiency. However, the method suffers from correspondence problems. Noise, variations in brightness and moving objects with deviant temperature all limit the method's performance. A more robust implementation of the Radon transform is based on binary gradients of the projections.

Feature-based methods like Harris corner detector overcome some of the

correspondence problems by only considering distinctive points in the image. Harris corner detector has the ability to handle large motions with low computational load. Along with RANSAC for outlier rejection, Harris' method performs very well on sequences with evenly distributed interest points. However, the background is often diffuse in IR sequences. If the method finds more corners that belong to moving objects than to the static scene, it will become a target tracker instead of a motion stabilizer.

One important note is that none of the motion-transform models mentioned above can handle zooming. Models with more degrees of freedom sometimes model zoom effects erroneously. This is also the case in noisy and low contrast sequences. Thus, more robust video stabilization of IR sequences is achieved with a simpler motion transform model like the translation or similarity transform model. In case of zooming the reference frame should be updated more frequently than during panning.

8 SUMMARY

An infrared video processing simulator (VPSim) has been developed for use with MATLAB. The simulator allows advanced image processing algorithms to be simulated without the painful work of first having to implement them in hardware. Algorithms for non-uniformity correction, dead pixel replacement, adaptive

spatiotemporal noise filtering, histogram generation, contrast enhancement and output pixel processing were implemented.

Parameters for each algorithm are accessed via tabs in the graphical user interface. The simulated image output is evaluated via image displays in the GUI. This way,

algorithms may be tuned and optimized before any time is spent on hardware implementation.

A modern QWIP detector along with a correct NUC process provides low noise IR images. When the integration time is long (i.e. 14 ms) noise filtering becomes less important. However, contrast may still be limited and hence, contrast enhancement plays an important part in the image processing chain. Very good contrast

enhancement based on bilateral filtering has been noted.


9 ABBREVIATIONS


AVI Audio video interleaved

FFT Fast Fourier transform

FPA Focal plane array

FPN Fixed pattern noise

GUI Graphical user interface

IFFT Inverse fast Fourier transform

IFOV Instantaneous field of view

InSb Indium antimonide

IR Infrared

LUT Lookup table

MPEG Moving pictures experts group

MRTD Minimum resolvable temperature difference

MTF Modulation transfer function

NCC Normalized cross correlation

NEP Noise equivalent power

NETD Noise equivalent temperature difference

NUC Nonuniformity correction

OTF Optical transfer function

PtSi Platinum silicide

QWIP Quantum well infrared photodetector

RANSAC Random sample consensus

RMS Root mean square

SAD Sum of absolute differences

SNR Signal to noise ratio

SSD Sum of squared differences

SUSAN Smallest univalue segment assimilating nucleus

VPSim Video processing simulator


10 REFERENCES

[1] A. Dahlberg. Infrared technology for military applications. Swedish Journal of Military Technology #4, pp. 22-28, 2003.

[2] Everett Companies LLC. Physics of electro-optic detectors, 2004

[3] U.S. Army Night Vision and Electronic Sensors Directorate. Night vision thermal imaging systems performance model, 2002.

[4] C. Mammen, W. Matseas. Specification CT HRVP Board, 2001.

[5] C. Tomasi, R. Manduchi. Bilateral filtering for gray and color images. Proceedings of IEEE International Conference on Computer Vision, pp. 836-846, 1998.

[6] S.S. Beauchemin and J.L. Barron. The Computation of Optical Flow. ACM Computing Surveys, Vol. 27 No. 3, pp. 433-467, 1995.

[7] B. Lucas, T. Kanade. An iterative image registration technique with an application to stereo vision. Proceedings of Image Understanding Workshop, pp. 121-130, 1981.

[8] B. Zitova, J. Flusser. Image registration methods: a survey. Image and Vision Computing 21, pp. 977–1000, 2003.

[9] C. Harris, M. Stephens. A combined corner and edge detector. Proceedings of 4th Alvey Vision Conference, pp. 147-150, 1988.

[10] M. Fischler, R. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), pp. 381-395, 1981.

[11] A. Litvin, J.Konrad, W. Karl. Probabilistic video stabilization using Kalman filtering and mosaicking. Proceedings of SPIE Conference on Electronic Imaging, pp. 663-674, 2003.
