DEGREE PROJECT IN THE FIELD OF TECHNOLOGY

MEDIA TECHNOLOGY

AND THE MAIN FIELD OF STUDY

COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018

Evaluation of Video Stabilisation Algorithms in Dynamic Capillaroscopy

OSKAR WILHELMSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY


KTH Royal Institute of Technology

School of Electrical Engineering and Computer Science


Author: Oskar Wilhelmsson (oskarwi@kth.se)

Examiner: Mårten Björkman

Supervisor: Alexander Kozlov

A thesis submitted for the degrees: Master of Science in Computer Science and Master of Science and Engineering in Media Technology

Performed at Karolinska Institutet

June 28, 2018


Abstract

In the field of dynamic capillaroscopy, measurements of the capillary blood cell velocity (CBV) give significant insight into the human body. For instance, diabetes, hypertension and peripheral arterial occlusive disease all affect CBV. However, the videos used to measure CBV – captured with a microscope – are often displaced in relation to the microscope by small motions of the finger or toe. Stabilisation algorithms are commonly used to reduce this problem, in order to carry out measurements such as CBV using the stabilised video. Artificial capillaroscopy videos were used to compare the stabilisation algorithms Mutual information, Single-step DFT, Block matching and Phase correlation in terms of computational time; RMSE, PSNR and MSE; and resistance to blurring effects. Single-step DFT was indicated to be the best suited algorithm in all aforementioned metrics.


Evaluation of video stabilisation algorithms in dynamic capillaroscopy (Swedish title: Utvärdering av videostabiliseringsalgoritmer inom dynamisk kapilläroskopi)

Summary (Sammanfattning)

In dynamic capillaroscopy, measurements of capillary blood cell velocity (CBV) give significant insights into the human body. For example, diabetes, hypertension and peripheral arterial occlusive disease affect CBV. However, the videos used to measure CBV – captured with a microscope – are often displaced in relation to the microscope because of small movements of a finger or a toe. Stabilisation algorithms are commonly used to reduce this problem, with the intention of subsequently using the stabilised video to measure important properties such as CBV. Artificial capillaroscopy videos were used to compare the stabilisation algorithms Mutual information, Single-step DFT, Block matching and Phase correlation in terms of computational time; RMSE, PSNR and MSE; and resistance to blurring. Single-step DFT was indicated as the best suited algorithm for the aforementioned metrics.


Abbreviations

AHE – Adaptive Histogram Equalization
CBV – Capillary Blood cell Velocity
CLAHE – Contrast Limited Adaptive Histogram Equalization
DFT – Discrete Fourier Transform
FFT – Fast Fourier Transform
GUI – Graphical User Interface
LED – Light-Emitting Diode
MAD – Mean Absolute Difference
MSD – Mean Squared Difference
MSE – Mean Squared Error
PSNR – Peak Signal-to-Noise Ratio
RGB – Red Green Blue (color channels)
RMSE – Root Mean Squared Error
SAD – Sum of Absolute Difference
pCBV – Peak Capillary Blood cell Velocity


1 Introduction
    1.1 Research question
    1.2 Purpose
    1.3 Delimitations
    1.4 Choice of algorithms
2 Background
    2.1 Preprocessing
        2.1.1 Color space
        2.1.2 Contrast Limited Adaptive Histogram Equalization
        2.1.3 Median filter
        2.1.4 Gaussian blur
    2.2 Stabilisation
        2.2.1 Phase correlation
        2.2.2 Single-step DFT
        2.2.3 Block matching
        2.2.4 Mutual information
    2.3 Artificial capillaroscopy videos
    2.4 Evaluation metrics
        2.4.1 Root Mean Squared Error
        2.4.2 Peak Signal-to-Noise Ratio and Mean Squared Error
3 Method
    3.1 Materials
    3.2 Recording capillaroscopy videos
    3.3 Artificial capillaroscopy videos
        3.3.1 Validity of ground truth translations
        3.3.2 Simulation program
        3.3.3 Modifications of simulation program
    3.4 Evaluation
4 Results
    4.1 Computational performance
    4.2 Precision
    4.3 Precision in relation to Gaussian blur
5 Discussion
    5.1 Assumptions and critique
    5.2 Social and ethical aspects
6 Conclusions
    6.1 Future research
7 References


1 Introduction

This section clarifies the area of research and provides related background information necessary for the thesis. The end of this section describes the research question, purpose and delimitations of this thesis. The motivation for the choice of algorithms is also described.

The primary function of the human cardiovascular system – the exchange of nutrients and waste products – is carried out by its smallest vessels, the capillaries. This exchange is affected by the state of the capillaries, such as pressure and the rate of blood flow. The state of the capillaries cannot be deduced from larger vessels – which are easier to examine – but has to be examined directly [1].

There are several methods to examine the capillary function, such as capillaroscopy, photoplethysmography, laser Doppler flowmetry, and electron microscopy. Out of these methods, capillaroscopy has been indicated as particularly useful for in vivo investigation of the capillaries in the nail-fold bed [2].

Since capillaries perform the primary function of the cardiovascular system, one can expect that their properties give considerable insight into the human body. Several human conditions affect the capillaries and can be detected by capillaroscopy, such as diabetes [1], hypertension [1], peripheral arterial occlusive disease [1], arterial hypertension [2], systemic sclerosis [3] and blackfoot [4].

Capillaries have been studied extensively using capillaroscopy. Capillaroscopy has been used to investigate several areas on the human body, such as the eyes [5], tongue [6] and skin [7]. However, most research in microcirculation has focused on capillaries in the skin; more specifically, in the nail fold of a finger or a toe [1, 8]. Uniquely, such capillaries lie parallel with the skin, instead of orthogonally [1, 8], as shown in Figure 1.

Figure 1: Left: illustration of the position of the capillaries on a nail-fold and the corresponding top view of the capillaries. Right: the top view of the capillaries elsewhere on the skin.

Naturally, this allows an observer to inspect a larger portion of the capillary, as well as the blood movement along the capillaries [8]. It is important to point out that the capillaries themselves are not visible; instead, it is the blood cells moving through them that are visible [5].

Significant insight into the state of the human body can be achieved via analysis of the capillary blood cell velocity (CBV), studied in dynamic capillaroscopy. For instance, diabetes, hypertension, peripheral arterial occlusive disease and blackfoot all affect CBV [1, 4].


The magnification necessary to observe individual capillaries is considerable. Figure 2 illustrates a capillary under 400x magnification. Due to the large magnification, small movements of the subject – so-called micro-movements – cause the capillary to move considerably within the resulting video frame, which reduces the accuracy of measurements [9].

Figure 2: Image of a capillary at 400x magnification [10].

In previous research, the micro-movements have been reduced both by physically stabilising the finger and by digitally stabilising the resulting video [11, 6, 8, 12]. The most successful method of physical stabilisation utilizes a metal bracket, see Figure 3, mark (1), which almost eliminates the movement of the finger in relation to the microscope [6]. However, Watanabe et al. [11] mentioned that slight finger movements are still present, despite the bracket.

Figure 3: Metal bracket attached to objective (1), and the cuff (4) [8]. Marks (2) and (3) of this figure were not used in this study.


CBV varies significantly even between adjacent capillaries, which limits comparative analysis between individuals [8]. In order to compare CBV between patients, a more consistent measurement is needed [8]. Such measurements can be achieved by using a cuff, see Figure 3 [8]. The cuff inflates rapidly and stops the blood flow for a period of one minute [12, 4]. Upon release, the time until the peak velocity (pCBV) of the blood flow is measured [8]. However, because an external force is applied to the finger, the deflation of the cuff may cause additional motion in the video. Moreover, Shore [1] mentioned that the cuff causes difficulties in examination.

A large body of previous research has successfully used digital video stabilisation to reduce motions in the capillaroscopy video. Hence, it is likely that digital video stabilisation can be used to reduce the motions caused by the sliding of the finger or toe and the deflation of the cuff. However, previous research has shown that different stabilisation algorithms produce varying quality of stabilisation, and at varying degrees of computational efficiency [10].

Figure 4: Diagram of a video stabilisation process.

The purpose of digital video stabilisation is to map the transformation of one image to another image so that the features in the two images have identical coordinates after transformation. The first step in the process is estimating the transformation of the features between two images, which is called global motion estimation. The estimated transformations are then applied to one of the images so that the images are aligned as precisely as possible, which is called motion compensation or motion correction [13, 14, 15]. A diagram outlining a process of digital video stabilisation is shown in Figure 4.
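The motion compensation step can be illustrated with a minimal sketch, restricted to whole-pixel translations. The function names `translate` and `compensate`, the list-of-lists image representation and the zero fill value are assumptions made for this illustration and are not taken from the thesis implementation.

```python
def translate(image, dx, dy, fill=0):
    """Shift a 2D list-of-lists image by (dx, dy) whole pixels.

    Positive dx moves content right, positive dy moves it down.
    Pixels shifted in from outside the frame are set to `fill`.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def compensate(image, estimated_dx, estimated_dy, fill=0):
    """Motion compensation: undo an estimated global translation."""
    return translate(image, -estimated_dx, -estimated_dy, fill)


frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
moved = translate(frame, dx=1, dy=2)   # simulated micro-movement
restored = compensate(moved, 1, 2)     # the estimate is assumed to be exact here
```

In this toy case the estimate equals the true displacement, so the compensated frame matches the original. In practice the quality of the result depends on how precisely the global motion was estimated.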

Several digital video stabilisation methods have been used in this field, such as optical flow [5], phase correlation [10], cross correlation [9], feature-based [10], intensity-based [10], and block matching algorithms using different metrics such as mutual information, cross correlation coefficient and mean squared error (MSE) [16, 5, 17, 18]. However, the literature study for this thesis revealed that only a few studies have validated the results of the stabilisation algorithms in the field of capillaroscopy. Wu et al. [16] and Dobbe et al. [9] indirectly validated the stabilisation methods they used by validating the subsequent capillaroscopy measurements. Only one study – by Karimov and Volkov [10] – directly validated its results using artificially created videos where the true geometrical transformations were known. The study compared three stabilisation algorithms for dynamic capillaroscopy: full-image superposition, feature-based, and phase correlation [10]. Applicability of other currently available stabilisation algorithms in dynamic capillaroscopy remains to be verified.

The root mean squared error (RMSE) is used for comparing stabilisation algorithms in the field of dynamic capillaroscopy [10], and in a related field [14]. Other evaluation metrics such as peak signal-to-noise ratio (PSNR) and mean squared error (MSE) are widely used in the digital stabilisation field as well [19].

In order to evaluate the exact precision of the stabilisation algorithms in terms of RMSE, PSNR and MSE, exact geometric transformations for each image of the video are needed [19, 10]. However, since the micro-movements of the finger are not known, these cannot be used as ground truth. In order to closely emulate the micro-movements of a finger in this study, the transformations present in the video are estimated. The estimated transformations can then be applied to a completely stable artificial video, making the video destabilised [19, 10]. This video is subsequently stabilised by a stabilisation algorithm, and the resulting video can be evaluated against the original stable video [19, 10]. The stabilisation algorithm also outputs an estimation of the transformations – described numerically – which can be compared to the transformations originally applied to the original stable video [10].

Given that the artificial video adequately represents real capillaroscopy video, a valid comparison of stabilisation algorithms can be achieved; thereby allowing researchers and developers to make an informed decision on which stabilisation algorithm to use for dynamic capillaroscopy.

1.1 Research question

Out of the stabilisation algorithms Mutual information, Single-step DFT, Block matching and Phase correlation, which stabilisation algorithm is best suited for the field of dynamic capillaroscopy in terms of computational time, RMSE, MSE, PSNR and resistance to blurring effects?

1.2 Purpose

The research in the field of dynamic capillaroscopy that is carried out after the videos have been stabilised is affected by the precision of the stabilisation algorithm. Researchers in the field of dynamic capillaroscopy often use different stabilisation algorithms, some of which have not been compared against alternative algorithms or have not been validated for this field [5, 9, 16, 18], and therefore may have a negative effect on subsequent study. Thus, the purpose of this thesis is to assist researchers in making an informed decision on which stabilisation algorithms to use in the field of dynamic capillaroscopy.

1.3 Delimitations

Due to time limitations, several delimitations were made in this study. Only five artificial videos were created. Four state-of-the-art stabilisation algorithms were compared. One considerable delimitation of this thesis was to only focus on the translational transformations in the stabilisation process. This is motivated by the fact that most research in this field has primarily focused on translational transformations. As estimated later in this report, in section 3.3, the rotational component of the transformations in a medical examination using dynamic capillaroscopy was relatively small compared to the translational component.

1.4 Choice of algorithms

The choice of stabilisation algorithms to include in this study was based on whether the algorithm had previously been used in research within the field of dynamic capillaroscopy. Another criterion was that the algorithms should not be based on the same principles. The reason for this was that future research could potentially gain insight into which principles work well in this field, based on the results of this study. However, both of the above-mentioned criteria were disregarded for Single-step DFT, which – as described in section 2.2 – is based on principles similar to those of Phase correlation. The reason for this was that Single-step DFT can be seen as an extension of Phase correlation – as described in section 2.2.2 – and could therefore potentially produce better results.


2 Background

This section outlines relevant theory necessary to justify the choice of methodology and to interpret the results of this study.

2.1 Preprocessing

"Digital NC [nail fold capillaroscopy] images are usually noisy and characterized by very low contrast between the background and capillaries. For this reason, NC [nail fold capillaroscopy] image processing is usually accomplished by transforming and analyzing the image onto a different color space where capillaries appear enhanced." [20]

As explained by the above quote, there is a need for preprocessing in capillaroscopy, caused by the typically poor contrast and high noise in the videos. The choice of preprocessing in this study was motivated by the choice of preprocessing in previous research in capillaroscopy.

2.1.1 Color space

Stabilisation algorithms typically use images with only one color channel as input [20]. Therefore, capillaroscopy videos typically need to be converted from RGB to a single-channel (grayscale) image. Different conversion processes produce different contrast in the resulting video [20], and stabilisation algorithms are affected by the contrast in the video [14]; therefore, a conversion resulting in higher contrast is advantageous in the stabilisation process.

Figure 5: Evaluation of the quantitative performance impact of the RGB color channels for a related capillaroscopy algorithm [20].

In a study on segmentation of capillaroscopy videos by Goffredo et al. [20], it is indicated that the highest contrast between capillary and background is achieved when subtracting 0.5 times the red channel from 1.5 times the green channel. The performance impact – as defined by Goffredo et al. [20] – of the aforementioned combination of the red and green color channels is illustrated by the combination's central position in the darkest red area in Figure 5 (pink dot).
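As a rough illustration of the channel combination described above, the following sketch converts RGB pixels to a single channel using the weights −0.5 (red) and +1.5 (green). The function name, the tuple-based pixel representation, and the rounding and clamping to the 8-bit range [0, 255] are assumptions of this sketch rather than details given in the source.

```python
def enhance_capillary_contrast(rgb_image, red_weight=-0.5, green_weight=1.5):
    """Combine the red and green channels into one channel.

    `rgb_image` is a list of rows, each row a list of (R, G, B) tuples.
    Results are rounded and clamped to [0, 255] (an assumption of this sketch).
    """
    gray = []
    for row in rgb_image:
        gray_row = []
        for red, green, _blue in row:
            value = red_weight * red + green_weight * green
            gray_row.append(min(255, max(0, round(value))))
        gray.append(gray_row)
    return gray


pixels = [[(200, 100, 50), (120, 160, 90)]]
single_channel = enhance_capillary_contrast(pixels)
```

The blue channel is ignored entirely, which reflects the weighting described above. How out-of-range values should be handled is a design choice that the sketch resolves by clamping.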


Figure 6: Before and after converting color space. (a) RGB; (b) −0.5x Red + 1.5x Green.

Figure 6 illustrates the difference in contrast between an RGB image from the microscope and the combination of the green and red color channels.

2.1.2 Contrast Limited Adaptive Histogram Equalization

Histogram equalization is a widely used technique to improve contrast. One underlying assumption of histogram equalization is that the image quality is uniform throughout all areas of the image, which may cause distortions such as amplification of noise if the assumption does not hold. To avoid noise amplification – and other artifacts from the equalization process – the image can be divided into blocks, where histogram equalization is applied to each block [21].

Adaptive Histogram Equalization, or AHE, divides the image into many small blocks and adjusts the histogram of each block so that it matches a certain distribution, such as the Uniform or Rayleigh distribution. Since the process changes the intensity of the pixels, the intensity values may no longer line up smoothly between blocks, causing an edge around each block. In AHE, these edges between the blocks are removed using bilinear interpolation [21].

In this context, the interpolation can be interpreted as the process of replacing the edge intensity values with new intensity values, which, according to Reza [22], are created by taking weighted averages of the intensity values of neighboring pixels. Bilinear interpolation is a commonly used interpolation technique [23], and is often used when scaling digital images.


Figure 7: Before and after CLAHE. (a) −0.5x Red + 1.5x Green; (b) image (a) enhanced with CLAHE.

If the image has very low contrast, AHE may cause neighboring blocks to have different histograms – and therefore appearance – even with the bilinear interpolation. Contrast Limited AHE, or CLAHE, is an extension of AHE where the contrast for each block is limited by a clip limit [22].

The clip limit reduces the effects between neighboring blocks; therefore, CLAHE can be used in images with very low contrast [22], such as capillaroscopy images. Figure 7 shows a close-up of an image from a capillaroscopy video before and after CLAHE was applied. The block size used in this study was 7x7 and the clip limit was 0.01.
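The effect of a clip limit can be sketched for a single block as follows. This is a simplified illustration: it builds the intensity mapping for one block only and omits the bilinear interpolation between blocks. The function name and the interpretation of the clip limit as the maximum fraction of block pixels allowed in one histogram bin are assumptions of this sketch, and may differ from the definition used by the Matlab implementation in the study.

```python
def clipped_equalization_map(block, levels=256, clip_fraction=0.01):
    """Return a lookup table mapping input intensity -> equalized intensity."""
    pixels = [value for row in block for value in row]
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Clip each bin and spread the clipped excess evenly over all bins.
    clip = max(1.0, clip_fraction * len(pixels))
    excess = sum(max(0.0, count - clip) for count in histogram)
    clipped = [min(count, clip) + excess / levels for count in histogram]

    # The cumulative distribution of the clipped histogram is the mapping.
    total = sum(clipped)
    lookup, running = [], 0.0
    for count in clipped:
        running += count
        lookup.append(round((levels - 1) * running / total))
    return lookup


# A low-contrast 8x8 block that only contains the intensities 100..103.
block = [[100 + (x % 4) for x in range(8)] for _ in range(8)]
limited = clipped_equalization_map(block, clip_fraction=0.01)
unlimited = clipped_equalization_map(block, clip_fraction=1.0)
```

With the clip limit in place, the four nearby intensities are spread over a much narrower output range than with plain histogram equalization, which is the mechanism that limits noise amplification in near-uniform blocks.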

Other contrast enhancements are available as well. Kwon et al. [24] used a reference histogram of one image and matched the remaining images in the video to the reference. Goffredo et al. [20] used a gamma correction algorithm for contrast adjustments. Lin [17] used a homomorphic filter to compress dynamic range and enhance contrast. Wang [18] and Demir [5] used CLAHE for enhancing the contrast. CLAHE is used in this study since it is the most commonly applied contrast enhancement found in the literature.

2.1.3 Median filter

A median filter was used by Goffredo et al. [20] and Karimov and Volkov [10], which both studies justified by the large amount of speckle noise present in capillaroscopy videos. Lin [17] also motivated the use of a median filter with the need to denoise the image, and added that the median filter also reduces blurring of edges. Moreover, Demir [5] used a Gaussian filter to smooth the video. Median filtering is used in this study for the purpose of denoising images from the videos, since a median filter is used in the majority of the relevant articles found in the literature.


Figure 8: Before and after median filtering. (a) −0.5x Red + 1.5x Green with CLAHE; (b) image (a) denoised using a median filter.

Median filtering is performed by applying a kernel on all pixels of the image, where the output value for each pixel is the median of the pixels within the kernel. The length and height of the kernel must both be odd numbers [25]. The kernel size used in this study was 7x7. A close-up example of the median filter being applied to a CLAHE-enhanced image is shown in Figure 8.
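A minimal sketch of this filtering step is given below. The function name and the handling of the image border (edge pixels are replicated outward) are assumptions of the sketch, since the source does not state how borders were treated.

```python
from statistics import median


def median_filter(image, kernel_size=7):
    """Replace every pixel by the median of its kernel_size x kernel_size window.

    Border pixels are handled by replicating the nearest edge pixel
    (an assumption of this sketch).
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    height, width = len(image), len(image[0])
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-half, half + 1)
                for dx in range(-half, half + 1)
            ]
            row.append(median(window))
        filtered.append(row)
    return filtered


# A flat 5x5 patch with one bright speckle in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
denoised = median_filter(noisy, kernel_size=3)
```

Because the speckle is a single outlier in every window that contains it, the median discards it completely, whereas a mean filter would smear it over the neighborhood.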

2.1.4 Gaussian blur

Demir [5] used Gaussian blur to smoothen capillaroscopy videos by applying a Gaussian kernel to each image of the videos, and defined the Gaussian kernel as follows.

$$ G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} \tag{1} $$

Where σ denotes the degree of smoothing, and (x, y) denotes the position in relation to the center of the kernel along each respective axis [5]. In this study, blurring of the videos was performed using the Matlab function imgaussfilt [26]. Moreover, the default kernel size ($G_{size}$) in Matlab was used, which is defined as follows [26].

$$ G_{size}(\sigma) = 2 \cdot \mathrm{ceil}(2\sigma) + 1 \tag{2} $$

Where the ceil function rounds the input up to the closest integer [27]. Gaussian blur was used in this study in order to evaluate each algorithm's resistance to blurring effects, as described in section 3.3.3.
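Equations (1) and (2) can be combined into a short sketch that builds a discrete Gaussian kernel. The function names are hypothetical, and the final normalisation (so that the kernel weights sum to one and the overall image brightness is preserved) is an assumption of this sketch rather than something stated in the text.

```python
import math


def gaussian_kernel_size(sigma):
    """Kernel side length according to equation (2)."""
    return 2 * math.ceil(2 * sigma) + 1


def gaussian_kernel(sigma):
    """Sample equation (1) on a square grid and normalise the weights."""
    size = gaussian_kernel_size(sigma)
    half = size // 2
    kernel = [
        [
            math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            for x in range(-half, half + 1)
        ]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


kernel = gaussian_kernel(1.5)
```

For σ = 1.5 the kernel is 7x7, and larger values of σ produce both a wider kernel and flatter weights, which corresponds to stronger blurring.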

2.2 Stabilisation

When performing stabilisation on a video, a common approach is to compare each image in the video against one or more reference images, in order to estimate the global motion. Global motion refers to the displacement – direction and magnitude of motion – in pixels of the entire image necessary to align the images. The process may start with estimating the transformation of two successive images and applying the said transformation to the latter image, so that the images are aligned. The process is then repeated on the successive images until all images have been aligned [14].

The choice of reference image, or reference images, for the stabilisation affects both the end result and the computational complexity of the stabilisation algorithm [10]. A common method is to select the first image of the video as a reference and map all subsequent images to that frame [18, 13]. Clearly, this method may result in poor stabilisation if the first image is of low quality.

Furthermore, using all images as reference images has resulted in comparatively lower stabilisation errors in previous research, at the cost of higher computational time [10]. A third – widely used – method is to use the previous image as reference, which has already been stabilised in relation to its previous image. However, in practice, this results in errors building up over time [14].

The type of transformation between the images is also an important factor to consider, since different stabilisation algorithms estimate different types of transformations [15]. Translational transformations – which are used in this thesis – are horizontal and vertical displacements in the plane [28]. Moreover, nonrigid transformations allow non-uniform mapping between images by using a displacement field [29]. According to Erturk [15], translation is the type of transformation that is most commonly encountered in videos. However, Liu et al. [30] indicate that heartbeat and breathing cause nonrigid transformations in images of microvascular blood flow.

Another distinction within the field of image stabilisation is between intensity-based and feature-based stabilisation algorithms. The word intensity refers to the pixel values of the image matrix. One of the aims of feature-based methods is to choose features in the frames that are recognizable and are likely to appear in both frames, such as edges and corners [28]. With regard to computational time, it is clearly advantageous to only map features that are easy to find in both frames, compared to mapping each pixel. However, because of the inherent sparsity of reliable features – especially in capillaroscopy [31, 14] – this method is comparatively less robust [10]. For this reason, no feature-based stabilisation algorithms are used in this thesis.

The algorithms used in this study are explained in more detail below. Note that all stabilisation algorithms in this study take two images as input and output translational displacements.

2.2.1 Phase correlation

A cross-correlation field can be used to determine the time shift between two signals. For instance, Bonnefous and Pesqué [32] used the peak of the cross-correlation field between two sound signals to determine the time shift between the two signals. This method can be expanded to 2D images, where the peak of the cross-correlation field corresponds to the translational displacement between two images [33]. Phase correlation exploits this method by using the Fast Fourier Transform (FFT) to calculate the cross-correlation field more efficiently. Similarly to the cross-correlation method, the subsequent step is to find the peak of the cross-correlation field, which corresponds to the translational displacement [10].
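The principle can be sketched on a tiny image as follows. This is an illustrative sketch and not the implementation evaluated in the thesis: it uses a naive discrete Fourier transform instead of an FFT so that it needs no external libraries, it normalises the cross-power spectrum to unit magnitude (the usual formulation of phase correlation), and it assumes a circular, whole-pixel shift. All function names are hypothetical.

```python
import cmath
import random


def dft2(image, inverse=False):
    """Naive 2D discrete Fourier transform (O(N^4); fine for tiny images)."""
    rows, cols = len(image), len(image[0])
    sign = 1 if inverse else -1
    scale = rows * cols if inverse else 1
    result = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = 2 * cmath.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(sign * 1j * angle)
            result[u][v] = total / scale
    return result


def phase_correlation(reference, moved):
    """Estimate the integer (dx, dy) that maps `reference` onto `moved`."""
    rows, cols = len(reference), len(reference[0])
    f_ref, f_mov = dft2(reference), dft2(moved)
    cross_power = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            product = f_mov[u][v] * f_ref[u][v].conjugate()
            magnitude = abs(product)
            # Keep only the phase; skip frequencies with no energy.
            cross_power[u][v] = product / magnitude if magnitude > 1e-12 else 0j
    correlation = dft2(cross_power, inverse=True)
    peak_y, peak_x = max(
        ((y, x) for y in range(rows) for x in range(cols)),
        key=lambda pos: correlation[pos[0]][pos[1]].real,
    )
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    dy = peak_y - rows if peak_y > rows // 2 else peak_y
    dx = peak_x - cols if peak_x > cols // 2 else peak_x
    return dx, dy


rng = random.Random(0)
reference = [[rng.random() for _ in range(8)] for _ in range(8)]
# Circularly shift the reference by dx = -3, dy = 2 to simulate motion.
moved = [[reference[(y - 2) % 8][(x + 3) % 8] for x in range(8)] for y in range(8)]
estimated = phase_correlation(reference, moved)
```

The estimate is limited to whole pixels because the peak is searched on the original sampling grid, which is the limitation that sub-pixel refinements address.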

2.2.2 Single-step DFT

Single-step DFT first calculates the cross-correlation field using FFT, similarly to Phase correlation. Unlike the implementation of Phase correlation described above, Single-step DFT uses an upsampling factor of two, which increases the resolution of the field. The reason for upsampling the cross-correlation field is to provide a more precise peak. The next step is to perform considerably more upsampling in order to even more precisely estimate the translational displacement between the two images. However, using FFT to upsample is time and memory consuming because FFT has to upsample the entire cross-correlation field of the two images. Instead, it is more efficient to focus on the small area of the image where the initial estimate of the peak was found. Therefore, instead of using FFT, this algorithm uses the Discrete Fourier Transform (DFT), which does not have to use the entire cross-correlation field. Since the peak corresponds to the translational displacement between the two images, the specificity of the estimate of the displacement between the two images is proportional to the amount of upsampling used, assuming a peak is found [33]. In this study, an upsampling factor of 100 was used.
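The idea of refining the peak only in a small neighborhood can be sketched in one dimension. This is a simplified illustration under several assumptions: it works on 1D periodic signals instead of images, it finds the coarse peak on the integer grid rather than on a grid upsampled by two, and it evaluates the inverse transform point by point instead of using the matrix-multiplication form of the DFT. The function names and the ±1.5 sample search window are choices made for this sketch.

```python
import cmath
import math


def dft(signal):
    """Naive 1D discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def signed_frequency(k, n):
    """Map DFT index k to its signed frequency so fractional lags are valid."""
    return k if k <= (n - 1) // 2 else k - n


def correlation_at(cross_spectrum, lag):
    """Evaluate the cross-correlation at one (possibly fractional) lag."""
    n = len(cross_spectrum)
    return sum(
        value * cmath.exp(2j * cmath.pi * signed_frequency(k, n) * lag / n)
        for k, value in enumerate(cross_spectrum)
    ).real / n


def refine_shift(reference, moved, upsampling=100):
    """Estimate the shift of `moved` relative to `reference` to 1/upsampling."""
    n = len(reference)
    f_ref, f_mov = dft(reference), dft(moved)
    cross_spectrum = [m * r.conjugate() for m, r in zip(f_mov, f_ref)]
    # Step 1: coarse peak on the integer grid.
    coarse = max(range(n), key=lambda lag: correlation_at(cross_spectrum, lag))
    # Step 2: evaluate the transform only on a fine grid around that peak.
    span = int(1.5 * upsampling)
    candidates = [coarse + step / upsampling for step in range(-span, span + 1)]
    best = max(candidates, key=lambda lag: correlation_at(cross_spectrum, lag))
    # Step 3: express lags beyond half the signal length as negative shifts.
    return best - n if best > n / 2 else best


def fractional_shift(signal, shift):
    """Shift a periodic signal by a fractional number of samples (demo helper)."""
    n = len(signal)
    spectrum = dft(signal)
    return [
        sum(
            spectrum[k]
            * cmath.exp(2j * cmath.pi * signed_frequency(k, n) * (t - shift) / n)
            for k in range(n)
        ).real / n
        for t in range(n)
    ]


reference = [math.exp(-((t - 7) ** 2) / 8.0) for t in range(15)]
moved = fractional_shift(reference, 2.37)
estimated = refine_shift(reference, moved)
```

Only 301 fine-grid positions are evaluated here, instead of an upsampled version of the whole correlation field, which mirrors the efficiency argument made above.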

2.2.3 Block matching

Instead of using entire images, a smaller area of each image can be used, referred to as the search region. The block matching technique divides the search region of one of the two images into many overlapping blocks. Thereafter, one block from the search region of one image is compared to all overlapping blocks in the other image. The comparison is done using a similarity metric, where the two blocks with the highest similarity are used to determine the translational displacement between the two images by comparing the pixel coordinates of each block [34, 35].

The similarity metrics mean absolute difference (MAD) and mean squared difference (MSD) are generally used. However, sum of absolute difference (SAD) was used in this study, since it has low computational complexity compared to MAD and MSD. The equation for SAD is:

$$ \mathrm{SAD} = \sum_{i=1}^{N} \sum_{j=1}^{M} \left| B_1(i, j) - B_2(i, j) \right| \tag{3} $$

Where $B_1$ is the first block and $B_2$ is one of the overlapping blocks, both of which have size N × M pixels [36].

The size of the search region used in this study was 18.75% of the image width and height and the block size used was 12.5%. The implementation of the algorithm used in this study places the search region in the center of the first image, but updates the location of the search region for the next image in the video sequence by the estimated displacement of the previous image.
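A minimal sketch of SAD-based block matching is shown below. The function names, the exhaustive search over a square window given by `search_radius`, and the toy ramp image are assumptions of this illustration. The sketch does not reproduce the search-region sizes or the search-region update used in the study.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(image, top, left, height, width):
    """Cut a height x width block whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def match_block(reference, target, top, left, size, search_radius):
    """Find where the reference block at (top, left) reappears in `target`.

    Returns (dx, dy, score) for the candidate with the lowest SAD.
    """
    block = crop(reference, top, left, size, size)
    rows, cols = len(target), len(target[0])
    best = None
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > rows or x + size > cols:
                continue  # candidate block would leave the frame
            score = sad(block, crop(target, y, x, size, size))
            if best is None or score < best[2]:
                best = (dx, dy, score)
    return best


# Toy 12x12 image in which every pixel value is unique.
reference = [[12 * y + x for x in range(12)] for y in range(12)]
# Move the content 2 pixels right and 1 pixel up (dx = 2, dy = -1).
target = [[reference[(y + 1) % 12][(x - 2) % 12] for x in range(12)] for y in range(12)]
result = match_block(reference, target, top=4, left=4, size=4, search_radius=3)
```

A lower SAD means a higher similarity, so the best match is the candidate that minimises the score. An exact copy of the block yields a score of zero.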

2.2.4 Mutual information

This algorithm is referred to as image registration in Matlab [37], which is a general term for aligning images [38]. Because it is a general term, it could be used to describe other algorithms used in this thesis, such as Phase correlation [15]. To avoid any ambiguity of this term, the algorithm described below is referred to by its similarity metric, Mutual information, in the remainder of this thesis.

The pyramid approach was used in the image registration process. In the pyramid approach, both of the two input images are initially downsampled, which reduces the resolution of the images. The registration algorithm then estimates the translational displacement which maximizes the similarity metric, as described below. The estimated displacement is passed on to – and refined at – the next level of the pyramid, which uses a less downsampled image. This process is repeated until the resolution of the original image is reached. It is worth mentioning that by starting at a coarse level, the pyramid approach helps avoid incorrect local optima, which may have been chosen if the algorithm had only used the highest resolution [39]. Three pyramid levels were used in this study.

In order to maximize the similarity metric, translational displacements are compared by iteratively displacing the first of the two images. For each iteration, the pixels that overlap are evaluated according to the similarity metric, which is Mutual information in this study. The displacement that results in the highest similarity is then returned as the estimated translational displacement [40].

A Parzen window is a non-parametric kernel density estimation of a random variable [41]. In this context, it can be viewed as estimating the probability density function of the histogram of an image. Thevenaz and Unser [40] defined Mutual information (S) as follows.

$$ S = \sum_{l=1}^{N} \sum_{k=1}^{M} h(l, k) \log \frac{h(l, k)}{h_T(l) \, h_R(k)} \tag{4} $$

Where $h(l, k)$ is the normalized joint Parzen window for the test and reference image; $h_T(l)$ and $h_R(k)$ are the marginal Parzen windows for the test and reference image respectively; and N and M are the number of overlapping pixels in the test and reference image respectively [40].
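The structure of equation (4) can be illustrated with a short sketch. As a simplification, the sketch estimates the joint and marginal distributions with a plain joint histogram rather than Parzen windows, so it is an approximation of the metric described above and not the implementation used in the study. The function name, the number of bins and the 8-bit intensity range are assumptions.

```python
import math


def mutual_information(test_pixels, reference_pixels, bins=4, max_value=255):
    """Mutual information from a plain joint histogram (no Parzen smoothing).

    Both inputs are flat lists holding the overlapping pixels of each image.
    """
    if len(test_pixels) != len(reference_pixels):
        raise ValueError("both images must cover the same overlapping pixels")
    count = len(test_pixels)

    def bin_of(value):
        return min(bins - 1, value * bins // (max_value + 1))

    # Normalised joint histogram h(l, k).
    joint = [[0.0] * bins for _ in range(bins)]
    for t, r in zip(test_pixels, reference_pixels):
        joint[bin_of(t)][bin_of(r)] += 1.0 / count

    # Marginal histograms h_T(l) and h_R(k).
    marginal_test = [sum(row) for row in joint]
    marginal_ref = [sum(joint[l][k] for l in range(bins)) for k in range(bins)]

    score = 0.0
    for l in range(bins):
        for k in range(bins):
            if joint[l][k] > 0.0:
                score += joint[l][k] * math.log(
                    joint[l][k] / (marginal_test[l] * marginal_ref[k])
                )
    return score


image = [0, 0, 64, 64, 128, 128, 192, 192]
aligned_score = mutual_information(image, image)
unrelated_score = mutual_information(image, [100] * 8)
```

An image compared with itself gives the highest score, while a constant image carries no information about the other image and gives a score of zero. This is why the displacement with the highest score is taken as the estimate.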

2.3 Artificial capillaroscopy videos

One method to evaluate a stabilisation algorithm’s precision is to compare successive images in a video that has been stabilised using the algorithm according to an evaluation metric. Hence, if a video has been stabilised with high precision, the evaluation metric should indicate a low error, compared to if the video is not stabilised [42]. In the context of this study, this method of evaluating stabilisation algorithms could be performed on the original capillaroscopy videos.

However, the method has its limitations, such as being dependent on the video being stabilised [42]. Furthermore, if an area of the image is displaced outside its frame by the stabilisation algorithm, only the overlapping pixels can be evaluated. Therefore, if the number of overlapping pixels is low, this method will not be meaningful [42]. While there are suggested methods for handling a small overlap between the images [42], previous studies indicate that artificial videos are necessary for accurate evaluation of stabilisation algorithms [43, 19]. Therefore, artificial capillaroscopy videos are used to evaluate the stabilisation algorithms’ precision in this study, instead of the original capillaroscopy videos. It should be noted, however, that real capillaroscopy videos are still used in this study in order to create realistic artificial capillaroscopy videos. Several simulation programs – that can be used to create artificial videos of dynamic capillaroscopy – are discussed below.

Huang et al. [44] and Tsukada et al. [45] used simulation programs to simulate cell movements within capillaries. However, because these simulations are of cell movements within a capillary, they do not contain all the components of real capillaroscopy videos, such as a background.

Artificial capillaroscopy videos were simulated by Dobbe et al. [9] and Karimov and Volkov [10]; however, the artificial videos that were used, and the simulation programs used to create them, are not available. Tresadern et al. [46] presented artificial nail-fold capillaroscopy videos that appear more realistic than the other simulations mentioned above. The simulation program used to create these artificial videos was made available for this study. However, it had previously been used to evaluate algorithms that estimate capillary blood cell velocity [46], so the program was modified for evaluating stabilisation algorithms. Moreover, the program required manual input of several parameters, for example the capillary width and cell velocity. In this study, all such parameters were chosen to make the resulting artificial videos appear visually similar to the corresponding real capillaroscopy videos.

2.4 Evaluation metrics

Three different precision metrics were used in this study. The first metric – RMSE – was used to compare the vector of ground truth translations against the vector of estimated translations. The other two metrics, MSE and PSNR, were used to evaluate pixel differences between the ground truth video and the stabilised video. PSNR and MSE are commonly used to compare stabilisation algorithms; however, these metrics do not correlate with perceived video quality [19].

2.4.1 Root Mean Squared Error

Karimov and Volkov [10] used RMSE to evaluate the difference between estimated translational displacements and ground truth displacements. The vector of ground truth translations and the vector of estimated translations both hold two values per image in the video, one value for each axis. Thus, the translations need to be combined in order to estimate the combined error for both the x and y axes. According to Karimov and Volkov [10], the vectors are combined as follows:

$$V_i^{EST} = \sqrt{\Delta x_i + \Delta y_i}, \quad V_i^{GT} = \sqrt{\Delta x_i + \Delta y_i} \tag{5}$$

However, this does not correspond to the plots provided in the same study. The plots instead follow the following equations:

$$V_i^{EST} = \sqrt{\Delta x_i^2 + \Delta y_i^2}, \quad V_i^{GT} = \sqrt{\Delta x_i^2 + \Delta y_i^2} \tag{6}$$

The Phase correlation algorithm was used both in this study and in the study by Karimov and Volkov [10]. However, using equation (5) results in errors about a factor of 10 lower than those reported by Karimov and Volkov [10]. Equation (6), on the other hand, resulted in a smaller difference between the errors of the two studies for Phase correlation. Therefore, it is more likely that Karimov and Volkov [10] used equation (6) to combine the x and y translations, which is the reason equation (6) was used in this study. See Figure 9 for an illustration of how the combined translation compares to the translation of the x and y axes.


[Plot "Ground truth translations"; x-axis: Frames; y-axis: Pixels; series: X, Y, Combined translation]

Figure 9: Plot of the relation between the combined translation and the translation of the x and y axes.

Let $V^{GT}$ be the vector of combined ground truth translations and $V^{EST}$ be the vector of combined estimated translations. In order to evaluate the difference between the combined ground truth translations and the combined estimated translations, the following equation was used:

$$RMSE(V^{GT}, V^{EST}) = \sqrt{\frac{\sum_{i=1}^{N} (V_i^{GT} - V_i^{EST})^2}{N}} \tag{7}$$

where $N$ is the number of elements in the vectors being compared.
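The combination of the per-axis translations in equation (6) and the RMSE in equation (7) can be sketched as follows. This is a minimal illustration in Python rather than the Matlab code used in the thesis; the function names and the three-frame translation values are hypothetical and are not data from this study.

```python
import math


def combined_translation(dx, dy):
    """Combine per-axis translations into one magnitude per frame (equation 6)."""
    return [math.sqrt(x * x + y * y) for x, y in zip(dx, dy)]


def rmse(v_gt, v_est):
    """Root mean squared error between two equally long vectors (equation 7)."""
    if len(v_gt) != len(v_est) or not v_gt:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(v_gt)
    return math.sqrt(sum((g - e) ** 2 for g, e in zip(v_gt, v_est)) / n)


# Hypothetical translations in pixels for three frames (not thesis data).
gt_dx, gt_dy = [3.0, 0.0, 6.0], [4.0, 5.0, 8.0]
est_dx, est_dy = [3.0, 0.0, 6.0], [4.0, 4.0, 8.0]

v_gt = combined_translation(gt_dx, gt_dy)     # combined ground truth
v_est = combined_translation(est_dx, est_dy)  # combined estimate
error = rmse(v_gt, v_est)                     # one frame is off by 1 pixel
```

Because the two axes are combined before the error is computed, the sketch measures differences in displacement magnitude only; errors in direction that leave the magnitude unchanged would not contribute.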

2.4.2 Peak Signal-to-Noise Ratio and Mean Squared Error

Let $I_i^{GT}$ be a vector of the intensity values in image $i$ of the ground truth video. Moreover, let $I_i^{EST}$ be a vector of the intensity values in image $i$ of the stabilised video provided by the stabilisation algorithm being evaluated. Then, the following equation describes the MSE between the videos:

$$MSE(I^{GT}, I^{EST}) = \frac{\sum_{i=1}^{N} (I_i^{GT} - I_i^{EST})^2}{N} \tag{8}$$

where $N$ is the number of images in the videos being compared [47]. Moreover, Horé and Ziou [47] describe PSNR as follows:

$$PSNR(I^{GT}, I^{EST}) = 10 \log_{10} \frac{255^2}{MSE(I^{GT}, I^{EST})} \tag{9}$$

where 255 is the maximum possible intensity value in the video, which has 256 quantization levels.
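A minimal Python sketch of equations (8) and (9) is given below. It assumes that the squared difference between two images is averaged over their pixels before being averaged over the images, which equation (8) does not spell out. The function names and the small example frame are hypothetical.

```python
import math


def frame_mse(frame_gt, frame_est):
    """Mean squared intensity difference over the pixels of two flattened frames."""
    if len(frame_gt) != len(frame_est) or not frame_gt:
        raise ValueError("frames must be non-empty and of equal size")
    return sum((g - e) ** 2 for g, e in zip(frame_gt, frame_est)) / len(frame_gt)


def video_mse(video_gt, video_est):
    """Average of the per-frame MSE over all N frames (assumed reading of equation 8)."""
    per_frame = [frame_mse(g, e) for g, e in zip(video_gt, video_est)]
    return sum(per_frame) / len(per_frame)


def psnr(mse_value, peak=255.0):
    """PSNR in dB from an MSE value (equation 9); infinite for identical videos."""
    if mse_value == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse_value)


# Hypothetical one-frame videos with four pixels each (not thesis data).
video_gt = [[10, 20, 30, 40]]
video_est = [[12, 20, 30, 36]]

example_mse = video_mse(video_gt, video_est)  # (4 + 0 + 0 + 16) / 4
example_psnr = psnr(example_mse)
```

As a sanity check of equation (9), an MSE of 6.6459 gives a PSNR of approximately 39.91 dB with this function.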


3 Method

The materials and methods used are described in this section, together with a validation of the simulation program used.

3.1 Materials

The microscope CapillaryScope 500 Pro was used in this study. It has LED lights; a polarization filter, which reduces glare effects; and a maximum magnification of 500x. See Figure 10 for an image of the microscope.

Figure 10: CapillaryScope 500 Pro

The USB microscope has a maximum resolution of 1280x1024 pixels. However, setting the microscope resolution to 640x480 pixels allows for a faster frame rate: 30 images per second. A higher frame rate allows more reliable measurements of the CBV [7], which is why the lower resolution – 640x480 pixels – was used in this study.

3.2 Recording capillaroscopy videos

A program with an accompanying GUI was created to record the images. The GUI allows the user to control the brightness and contrast of the displayed image; start and stop the recording; and note a point in time of the recording.

Figure 11: The GUI used to record capillaroscopy videos. The image displayed is of a ruler with 0.1 mm increments.


An image of the GUI is available in Figure 11. The microscope outputs one RGB frame at a time, which is converted to the -0.5x Red +1.5x Green color space as mentioned in section 2.1.1. The videos are saved in uncompressed AVI format.

Figure 12: Cuff (1), bracket (2) and foam (3)

The videos used in this study were recorded under controlled conditions by medical professionals. First, the participant was seated with one arm on the table. The position is illustrated in Figure 12. The cuff was then attached to the participant's finger, see Figure 12, mark (1). Subsequently, the finger was placed on a block of plastic foam – as shown in Figure 12, mark (3) – to compensate for the slight angle of the tip of the finger. The plastic foam thus allows for a viewing angle that is parallel to the skin surface, from the lens perspective.

Although the thin skin on the nail-folds is translucent, it does not allow a clear image of the capillaries. In this study, paraffin oil was applied by a medical professional in order to make the skin more transparent, and thereby make the capillaries more discernible. This is common practice in capillaroscopy, and is mentioned in Shore [1].

After allowing the participant time to adjust their position so as to minimise micro-movements, the medical professional adjusted the position and height of the microscope to bring the capillaries into focus. The recording was initiated after focus was obtained. After about 30 seconds of recording, the cuff was inflated to 200 mmHg of pressure. After 60 seconds, the cuff was rapidly deflated, which caused a rapid flow of blood. After the blood had returned to normal flow – which took approximately 30 seconds – the recording was stopped. Two participants were included in the study, resulting in five capillaroscopy videos.

The bracket – Figure 12, mark (2) – was included with the microscope. However, its shape constricted blood flow to the nail-folds, which may affect subsequent measurements of blood flow. For this reason, the bracket was reshaped to allow blood to flow freely to the nail-folds.

3.3 Artificial capillaroscopy videos

The validation of the artificial capillaroscopy videos is described throughout this section. The validation was done by matching the properties of the artificial videos to those of the real capillaroscopy videos.


3.3.1 Validity of ground truth translations

One of the most important properties of the artificial videos used in this study is that they accurately reflect the motion present during a real capillaroscopy examination, the procedure of which was outlined in section 3.2. The real translational displacements were estimated by stabilising the real capillaroscopy videos resulting from the examinations. The estimated translations output by the stabilisation algorithm were later applied to the artificial videos, and are referred to as ground truth translations in the remainder of this thesis. The stabilisation was done using the Single-step DFT algorithm. Moreover, in order to obtain a more precise result, 100 reference frames were used. Each of the stabilised videos was averaged to produce one image, which is referred to as the video average in the remainder of this thesis. Averaging a video is done by adding the intensity values of the same pixel index from each image in the video, and dividing the resulting value by the number of images in the video. This process was repeated for each pixel index. The precision of the aforementioned stabilisation procedure was evaluated, and the ground truth translations were validated, by inspecting the video averages.
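The video averaging described above can be sketched as follows. This is an illustrative Python version, not the Matlab code used in the thesis; the function name and the two small example frames are hypothetical.

```python
def video_average(frames):
    """Pixel-wise mean of a list of equally sized frames.

    Each frame is a list of rows, and each row is a list of intensity values.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / n for c in range(width)]
        for r in range(height)
    ]


# Two hypothetical 2x2 frames (not thesis data).
frames = [
    [[0, 10], [20, 30]],
    [[10, 10], [20, 60]],
]
average = video_average(frames)
```

In a well-stabilised video, static structures fall on the same pixel indices in every frame, so the average stays sharp; residual motion smears them, which is why the sharpness of the average can serve as a visual check.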


[Panels: (a) Video 1, (b) Video 2, (c) Video 3, (d) Video 4, (e) Video 5]

Figure 13: Video averages of corresponding videos.

Figure 13 shows the video average for each of the real capillaroscopy videos. The stabilisation result, and therefore the accuracy of the ground truth values, was evaluated subjectively based on the sharpness of each image. As shown in Figure 13, Video 3 and Video 5 have sharper averages than the other videos. However, considering the duration of each video, about 17 seconds, all video averages can be considered sharp. Furthermore, because of the sharpness of the video averages, it is unlikely that the rotational component of the transformations is greater than the translational component.


3.3.2 Simulation program

As mentioned in section 2.3, the simulation program used in this study is described in Tresadern et al. [46]. The provided simulation program is described below.

Figure 14: Outline of a capillary over a video average.

In order to create the artificial videos, the video averages were used to manually trace the centerlines of each clearly visible vessel. The tracing was done in Photoshop, and was saved as an image. The simulation program used each centerline image to label each pixel with a number, creating a chain of linked points in the same shape as the vessel centerline. Given the 2D nature of the centerline image, the simulation program was sometimes not able to link the points in the correct order when the centerline crossed itself. See Figure 14 for an example of a crossed vessel with its centerline marked in black. Because of this, the centerlines were adjusted slightly, so that the simulation would produce simulated blood flow in the correct direction for these crossings.

The capillaries in the simulation program were populated by cells whose morphology was approximated with a 2D Gaussian distribution. Because of this approximation, the simulation assumes a spherical shape of the blood cells. It is widely known that blood cells are not spherical; therefore, this is a limitation of the simulation.

Several additional parameters of the capillaries – other than shape – were manually input, and were chosen for each capillary to visually match the corresponding capillary in the real capillaroscopy videos. These parameters are: width, width ratio between the arterial limb and the venous limb, number of cells, direction of blood flow, contrast, and opacity.

In addition to parameters controlling the appearance of the capillaries, the simulation program also allowed controlling the background. According to Tresadern et al. [46], the background of the simulation is randomly generated with the intention to "simulate slight intensity variation in the underlying tissue". The simulation program allowed controlling the contrast of the background and the randomly generated intensity variations within the background. The background was also manually configured to visually match the real capillaroscopy video.

3.3.3 Modifications of simulation program

In order to enable evaluation of stabilisation algorithms, two modifications were made to the provided simulation program. One of these modifications was to apply the ground truth translations to the artificial videos, which is described in more detail in section 3.4. The second modification was to add padding to the artificial video, which increased the frame size of the artificial videos from 640x480 to 1280x960 pixels. This modification was necessary since applying transformations to the image would cause areas of the original image to be shifted outside the frame of the image. Such areas are padded with black pixels in Matlab, creating sharp edges around the image. The sharp edges may affect the precision of the stabilisation algorithms. In order to avoid these black pixels, the padding was filled with the same background as in the artificial videos. Additional modifications were made to the provided simulation program in order to further improve the realism of the artificial videos, as described below.

The overall velocity of the blood cells varies over time due to vasomotion [4], which is visible in real capillaroscopy videos. Since the movement of cells between images in the video can be considered as noise for stabilisation algorithms, the fluctuation of capillary blood cell velocity may affect the accuracy of the stabilisation algorithms. Hence, vasomotion was added to the simulation.

[Plot "Vasomotion"; x-axis: Frames; y-axis: Amplitude]

Figure 15: The vasomotion frequency and amplitude.

Figure 15 illustrates the change in overall velocity of the blood cells over time that was added to all artificial videos. The frequency and amplitude of the vasomotion were chosen to match the real capillaroscopy videos. Moreover, the 500 images in each artificial video are centered around the release of the cuff, which was emulated in the artificial videos by a rapid increase in velocity. This rapid increase in velocity can be seen starting at frame 250 in Figure 15. It was also observed that the vasomotion amplitude diminishes with increasing velocity in the real capillaroscopy videos. The diminishing amplitude with increasing velocity was approximated by setting the amplitude of the sinusoidal part of the signal to zero, which can be seen in Figure 15 between frame 250 and 500.
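A velocity signal of the kind described above can be sketched as follows. Only the structure is taken from the text: a sinusoidal variation before the cuff release at frame 250, and a rapid increase with the sinusoidal amplitude set to zero afterwards. The baseline, amplitude, period, peak value and the linear shape of the increase are hypothetical placeholders, not the values used in the thesis.

```python
import math


def vasomotion_velocity(frame, release_frame=250, baseline=0.5, amplitude=0.1,
                        period=100, peak=1.0, rise_frames=20):
    """Relative cell velocity for one frame (all parameter values are assumptions)."""
    if frame < release_frame:
        # Before the cuff release: sinusoidal vasomotion around the baseline.
        return baseline + amplitude * math.sin(2 * math.pi * frame / period)
    # After the release: rapid linear rise to the peak, no sinusoidal component.
    progress = min(1.0, (frame - release_frame) / rise_frames)
    return baseline + (peak - baseline) * progress


# One velocity scale factor per image of a 500-image artificial video.
velocity = [vasomotion_velocity(f) for f in range(500)]
```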

The brightness may vary over the images of the real capillaroscopy videos, since the position of the finger changes slightly relative to the light source on the microscope during the recording. To introduce these brightness variations into the simulation, the mean intensity of the real capillaroscopy videos was calculated for each image. These intensity values were then compared to the mean intensity values for each image of the artificial video. Subsequently, the differences in intensity were subtracted from the artificial videos, so that the mean intensity value of each image in the artificial videos corresponded with the mean intensity value of each image in the real capillaroscopy video. Similar changes of brightness over time are mentioned by Tresadern et al. [46].
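The brightness matching described above can be sketched as follows. This is an illustrative Python version with hypothetical function names and example frames; it does not clip the result to the valid intensity range, which an actual implementation may need to do.

```python
def frame_mean(frame):
    """Mean intensity of a frame given as a list of rows."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def match_brightness(artificial_frames, real_frames):
    """Shift each artificial frame so that its mean equals that of the real frame."""
    matched = []
    for artificial, real in zip(artificial_frames, real_frames):
        difference = frame_mean(artificial) - frame_mean(real)
        matched.append([[p - difference for p in row] for row in artificial])
    return matched


# Hypothetical single-frame videos (not thesis data).
artificial_video = [[[100, 110], [120, 130]]]  # mean intensity 115
real_video = [[[90, 100], [110, 120]]]         # mean intensity 105

matched_video = match_brightness(artificial_video, real_video)
```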


In addition to the above-mentioned properties of real capillaroscopy videos, defocus may also affect the precision of stabilisation algorithms. Because the finger moves relative to the lens of the microscope, the focus of the image is affected. It is widely known that blurring effects reduce the amount of information in the image. Because of this reduction, the precision of stabilisation algorithms may be affected by the blurring effects. Therefore, in the context of capillaroscopy, it is advantageous if a stabilisation algorithm is more resistant to blurring effects. In order to evaluate each stabilisation algorithm's resistance to blurring effects, using the chosen evaluation metrics, a Gaussian blur filter was applied to the simulated videos. The standard deviation of the Gaussian blur filter was set to 3, 9, 12 and 15, producing four blurred videos for each artificial video, in addition to the original artificial video.

According to Goffredo et al. [20], capillaroscopy images contain a large amount of speckle noise. In order to simulate the noise present in the original capillaroscopy videos, speckle noise was added to the artificial videos. The amount of speckle noise added was determined by comparing the speckle noise of a 50x50 pixel area of a frame from the real capillaroscopy video and an equally sized area from an artificial video. In this study, speckle noise was increased until the standard deviations of the two areas were equal to two decimal places.

The microscope produces frames with 256 quantization levels. The simulation program, however, produces frames in double format, which has more quantization levels, and therefore contains more information. This additional information could be leveraged by the stabilisation algorithms to produce a better result on the artificial videos than on the real capillaroscopy videos. For this reason, the simulation was quantized to 256 levels, matching the quantization level of the microscope.
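The quantization step can be sketched as follows. The sketch assumes that the double-format frames hold intensities in the range 0 to 1, which is a common convention but is not stated in the thesis; the function name and the example frame are hypothetical.

```python
def quantize(frame, levels=256):
    """Map intensities in [0, 1] to integers in [0, levels - 1], clipping outliers."""
    top = levels - 1
    return [[min(top, max(0, round(p * top))) for p in row] for row in frame]


# Hypothetical double-format frame; 1.2 is out of range and gets clipped.
frame = [[0.0, 0.25, 1.0], [0.1234, 0.999, 1.2]]
quantized = quantize(frame)
```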


[Panels: (a) Original artificial image, (b) Original capillaroscopy image, (c) CLAHE, (d) CLAHE, (e) CLAHE and median filter, (f) CLAHE and median filter]

Figure 16: A close up from the artificial video – Video 5 – on the left, compared to a close up from the real capillaroscopy video, on the right, with corresponding adjustments.


Figure 16 illustrates the differences between a close up of an image from a real capillaroscopy video – Video 5 – and the corresponding artificial video, at different stages of preprocessing. The most visible differences in Figure 16 are: opacity, width and edge contrast of the capillaries, and background contrast. While these factors can be adjusted and potentially improved upon using the simulation program, the artificial videos are subjectively assessed to adequately represent real capillaroscopy videos.

3.4 Evaluation

As mentioned in section 3.3, the simulation program outputs padded artificial videos. These padded videos, in combination with the ground truth translations, are used to evaluate each algorithm. However, there are several intermediate steps in the process, outlined below.

Figure 17: The vector of ground truth translations, in combination with the padded artificial video, was used to produce the cropped artificial video.

The reason for padding the artificial video becomes apparent in Figure 17, which shows a diagram of how the artificial video is processed. When applying the ground truth translations to the padded artificial video, black borders appeared around the edges of the padded destabilised artificial video because the center of the image had been displaced. These black borders needed to be eliminated from the evaluation because they could affect the result of the stabilisation algorithms. The reason for padding the artificial video was thus to eliminate these black borders in the resulting artificial destabilised video, referred to as Cropped artificial video in Figure 17.
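The idea of translating a padded frame and then cropping away the border can be sketched as follows. For simplicity, this Python sketch only handles whole-pixel translations and reads a shifted window directly from the padded frame, whereas the ground truth translations in this study are not restricted to whole pixels. The function name and the small example frame are hypothetical.

```python
def translate_and_crop(padded, dx, dy, out_width, out_height):
    """Shift the content by (dx, dy) whole pixels and return the central crop.

    Shifting the content by (dx, dy) is equivalent to moving the crop window
    by (-dx, -dy) inside the padded frame, so no black border is introduced
    as long as the translation fits within the padding.
    """
    pad_height, pad_width = len(padded), len(padded[0])
    top = (pad_height - out_height) // 2
    left = (pad_width - out_width) // 2
    if abs(dx) > left or abs(dy) > top:
        raise ValueError("translation exceeds the available padding")
    rows = padded[top - dy:top - dy + out_height]
    return [row[left - dx:left - dx + out_width] for row in rows]


# Hypothetical 6x6 padded frame whose pixel values encode row and column.
padded = [[r * 10 + c for c in range(6)] for r in range(6)]

stable = translate_and_crop(padded, 0, 0, 2, 2)         # no displacement
shifted_right = translate_and_crop(padded, 1, 0, 2, 2)  # content moved 1 px right
shifted_down = translate_and_crop(padded, 0, 1, 2, 2)   # content moved 1 px down
```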


Figure 18: Diagram describing how the stabilisation algorithms were evaluated.

As mentioned in section 2.4, the evaluation of the stabilisation algorithms was done using the metrics RMSE, MSE and PSNR. Initially, the padded artificial video – output from the simulation program – was cropped, see top left corner of Figure 18. This video contained no translations and was therefore completely stable, and acted as a reference to the best possible result of the stabilisation algorithm. Subsequently, the Cropped artificial video was preprocessed the same way the real capillaroscopy videos were, as described in section 2.1, excluding the color space changes.

One important detail to note is that, while the translational displacements were estimated on the preprocessed Stabilised artificial video, shown in Figure 18, the translational estimations were applied to the Destabilised artificial video, shown in Figure 17. This is important for two reasons. The first reason is that the Destabilised artificial video was not preprocessed; therefore, the resulting video was not distorted by preprocessing. The second reason is that the padding in the Destabilised artificial video helped avoid black borders around the resulting stabilised video, which would be present if the padding was not used. The resulting video, which is shown in Figure 18 as Stabilised artificial video, was then compared to the Cropped artificial video in terms of MSE and PSNR. These comparisons were performed according to equations (8) and (9), presented in section 2.4.2.

Another detail to note for the comparison of pixel-wise differences is that the center point of the reference image determined the general center point of the stabilised video, since all stabilised images were mapped to that image. Naturally, there are variations of the position of the center of the stabilised videos, caused by the inaccuracy of the stabilisation algorithms. The alignment between the stabilised video and the completely stable reference video is important because, if the videos were not aligned, the pixel-wise evaluation metrics – MSE and PSNR – would not be comparing the same pixel indices. To account for this, the first image was chosen as a reference image in all videos, and no displacement was added to the first image. This clearly aligns both the Cropped artificial video and the Stabilised artificial video, allowing pixel-wise comparison. In addition to calculating MSE and PSNR, RMSE is calculated by comparing the vector of ground truth translations to the vector of estimated translations, produced by the stabilisation algorithm.

Since the first image of each video was not displaced, only the last 499 images in each video were evaluated. The computational time was measured by using a timer in Matlab. The timer was started directly before each video was input to the stabilisation algorithm. The timer was stopped directly after the stabilisation algorithm had returned the translations for each video.
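The timing procedure can be sketched as follows, using Python's `time.perf_counter` in place of the Matlab timer. The function names and the stand-in stabiliser are hypothetical; a real stabilisation algorithm would take the place of `dummy_stabiliser`.

```python
import time


def time_stabilisation(stabilise, video):
    """Return the translations, the total time in seconds and the time per image."""
    start = time.perf_counter()       # started directly before the call
    translations = stabilise(video)
    elapsed = time.perf_counter() - start  # stopped directly after it returns
    return translations, elapsed, elapsed / len(video)


def dummy_stabiliser(video):
    """Hypothetical stand-in that reports zero translation for every image."""
    return [(0.0, 0.0) for _ in video]


# Hypothetical 500-image video of 1x1 frames.
video = [[[0]] for _ in range(500)]
translations, total_seconds, seconds_per_image = time_stabilisation(
    dummy_stabiliser, video
)
```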


4 Results

In this section, the results achieved on the artificial capillaroscopy videos in this study are objectively described in terms of computational time, precision – measured in RMSE, MSE and PSNR – and how these metrics vary with increased blurring. In addition, a couple of properties of the artificial videos that were used to estimate the translational displacements are described. Furthermore, the computer used in this study is described to enable comparison of the results related to computational time. Results from a similar study are also presented to enable comparison.

Properties of videos

Property                              Video 1   Video 2   Video 3   Video 4    Video 5
Mean combined displacement (pixels)   22.2454   19.9530   26.2458   49.2336   176.0921
Max combined displacement (pixels)    46.3255   37.9185   82.1428   89.0545   215.6740

Table 1: The mean and max combined displacements of the ground truth translations for each video (pixels).

As shown in Table 1, both the max combined displacement and the mean combined displacement were considerably higher in Video 5 than in the other videos. Moreover, Video 2 had both the lowest max combined displacement and the lowest mean combined displacement.

4.1 Computational performance

The computational performance was evaluated on a Mac with a 2.7 GHz Intel Core i7 processor and 16 GB of RAM.

Computational time (s) of algorithms

Algorithm            Mean over all videos   Mean per image
Single-step DFT                   43.1609           0.0863
Phase correlation                103.8988           0.2078
Mutual information               581.7218           1.1634
Block matching                   209.4725           0.4189

Table 2: The performance of the algorithms measured in computational time.

As shown in Table 2, the mean computational time of Single-step DFT was the lowest, followed by Phase correlation, Block matching and Mutual information, in that order. As mentioned in section 1, stabilisation results in dynamic capillaroscopy have previously only been directly validated in one study. In the aforementioned study by Karimov and Volkov [10], an algorithm referred to as Full-frame compensation method had a mean computational time of 1.24 seconds per image.

Note, however, that the resolution of the videos in the study by Karimov and Volkov [10] was about 20% of the resolution used in this study.


[Plot "Computational time of algorithms"; x-axis: Standard deviation of Gaussian kernel; y-axis: Seconds; series: Single-step DFT, Phase correlation, Mutual information, Block matching]

Table 3: Computational time of algorithms with 95% confidence intervals, calculated over all five videos for each of the five levels of Gaussian blur. Where the standard deviation of the Gaussian kernel is indicated to be 0, no Gaussian blur was applied.

Since none of the confidence intervals overlap between the algorithms in Table 3, the mean computational time of each algorithm was different from the mean computational time of all other algorithms for each level of Gaussian blur, with statistical significance at the 95% confidence level. Because most of the confidence intervals overlap for each algorithm in Table 3, these results cannot be used to determine an increase or decrease in computational time in relation to the level of Gaussian blur. Note that in all instances in this section where the standard deviation of the Gaussian kernel is indicated to be 0, no Gaussian blur was applied.

4.2 Precision

Root mean square error (RMSE)

Algorithm            Video 1   Video 2   Video 3   Video 4    Video 5      Mean      Min
Single-step DFT       0.4373    0.3103    0.2428    0.3649     0.2066    0.3124   0.2066
Phase correlation     0.5920    0.8815    2.2789    0.5128   131.5010   27.1532   0.5128
Mutual information   10.6541    2.0500    0.3246    0.2127   107.8163   24.2115   0.2127
Block matching        0.7624    0.5991    2.3059    0.7334     3.8596    1.6521   0.5991

Table 4: RMSE of each video for all algorithms, without Gaussian blur.

As shown in Table 4, Single-step DFT had the lowest mean RMSE, followed by Block matching, Mutual information and Phase correlation, in that order. The RMSE of Phase correlation and Mutual information on Video 5 was considerably higher than the mean RMSE. In the previously mentioned study by Karimov and Volkov [10], the Full-frame compensation method had a mean RMSE of 0.76.


Peak signal-to-noise ratio (PSNR)

Algorithm            Video 1   Video 2   Video 3   Video 4   Video 5      Mean       Max
Single-step DFT      39.9053   40.2054   40.2479   39.3543   38.9924   39.7411   40.2479
Phase correlation    34.5928   35.0595   34.8799   34.9000   31.3400   36.4621   38.6200
Mutual information   38.6395   39.7069   39.7238   40.1061   32.4664   38.1285   40.1061
Block matching       38.1203   38.3878   37.4853   37.3278   36.5190   37.5680   38.3878

Table 5: PSNR of each video for all algorithms, without Gaussian blur.

As shown in Table 5, Single-step DFT had the highest mean PSNR, followed by Mutual information, Block matching, and Phase correlation, in that order. The PSNR of Phase correlation and Mutual information on Video 5 was considerably lower than the mean PSNR.

Mean square error (MSE)

Algorithm            Video 1   Video 2   Video 3   Video 4   Video 5      Mean      Min
Single-step DFT       6.6459    6.2022    6.1417    7.5449    8.2004    6.9470   6.1417
Phase correlation     8.9346   11.0735   12.9158    9.8959   54.0053   19.3650   8.9346
Mutual information    8.8947    6.9566    6.9295    6.3456   36.8505   13.1954   6.3456
Block matching       10.0243    9.4255   11.6025   12.0309   14.4938   11.5154   9.4255

Table 6: MSE of each video for all algorithms, without Gaussian blur.

As shown in Table 6, Single-step DFT had the lowest mean MSE, followed by Block matching, Mutual information and Phase correlation, in that order. The MSE of Phase correlation and Mutual information on Video 5 was considerably higher compared to the mean MSE.

[Plot "RMSE between ground truth and estimated translations"; y-axis: Root mean squared error; categories: Single-step DFT, Phase correlation, Mutual information, Block matching, Full-frame compensation method]

Figure 19: RMSE on videos without Gaussian blur, with 95% confidence intervals calculated over all five videos. The result for Full-frame compensation method is from a similar study [10].


[Plot "PSNR between ground truth and estimated translations"; y-axis: dB; categories: Single-step DFT, Phase correlation, Mutual information, Block matching]

Figure 20: PSNR on videos without Gaussian blur, with 95% confidence intervals calculated over all five videos.

[Plot "MSE between ground truth and estimated translations"; y-axis: Mean squared error; categories: Single-step DFT, Phase correlation, Mutual information, Block matching]

Figure 21: MSE on videos without Gaussian blur, with 95% confidence intervals calculated over all five videos.

Because the confidence intervals of Single-step DFT and Block matching do not overlap in Figures 19, 20 and 21, Single-step DFT had a lower mean RMSE and MSE, and a higher mean PSNR, than Block matching, with statistical significance at the 95% confidence level. Moreover, Single-step DFT had the smallest confidence intervals for all evaluation metrics. As shown in Figure 19, the confidence intervals of Full-frame compensation method and Single-step DFT do not overlap, and the mean RMSE is higher for the Full-frame compensation method. Therefore, the mean RMSE of Single-step DFT was lower than the mean RMSE of Full-frame compensation method, with statistical significance at the 95% confidence level.
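The overlap reasoning can be illustrated with the per-video RMSE values from Table 4. The thesis does not state how its confidence intervals were computed, so the normal approximation used here (mean ± 1.96 standard errors over the five videos) is an assumption, and the function names are hypothetical. A Student's t critical value would give wider intervals for five samples.

```python
import math
import statistics


def confidence_interval(values, z=1.96):
    """Approximate 95% confidence interval of the mean (normal approximation)."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Per-video RMSE values copied from Table 4.
rmse_single_step_dft = [0.4373, 0.3103, 0.2428, 0.3649, 0.2066]
rmse_block_matching = [0.7624, 0.5991, 2.3059, 0.7334, 3.8596]

ci_dft = confidence_interval(rmse_single_step_dft)
ci_block = confidence_interval(rmse_block_matching)
overlap = intervals_overlap(ci_dft, ci_block)
```

Under this assumption the two intervals do not overlap, which agrees with the comparison between Single-step DFT and Block matching described above.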

4.3 Precision in relation to Gaussian blur

[Plot "RMSE between ground truth and estimated translations"; x-axis: Standard deviation of Gaussian kernel; y-axis: Root mean squared error]

Figure 22: RMSE of algorithms with 95% confidence intervals calculated over all five videos for each level of Gaussian blur. The results of the algorithms, except Single-step DFT, are shifted slightly on the x-axis in order to provide a clearer view.

As shown in Figure 22, the mean RMSE was the lowest for Single-step DFT for all levels of Gaussian blur, followed by Mutual information. The mean RMSE of Mutual information also showed a decrease with increasing amounts of Gaussian blur. Block matching had lower mean RMSE than Phase correlation at lower levels of Gaussian blur, but higher mean RMSE for higher levels. Furthermore, Single-step DFT had the smallest confidence intervals and Phase correlation had the largest confidence intervals.

Moreover, the confidence intervals of Single-step DFT do not overlap with those of Phase correlation or Block matching when the standard deviation of the Gaussian kernel is 9, 12 and 15. Therefore, Single-step DFT had a lower RMSE than Phase correlation and Block matching when the standard deviation of the Gaussian kernel was 9, 12 and 15, with statistical significance at the 95% confidence level.


[Plot "PSNR"; x-axis: Standard deviation of Gaussian kernel; y-axis: dB]

Figure 23: PSNR between the ground truth videos and the stabilised videos for all algorithms, with 95% confidence intervals, calculated over all five videos for each of the five levels of Gaussian blur. The results of the algorithms, except Single-step DFT, are shifted slightly on the x-axis in order to provide a clearer view.

As shown in Figure 23, Single-step DFT had the highest mean PSNR for the first two levels of Gaussian blur, and a mean PSNR similar to that of Mutual information for the remaining three levels. Block matching had a higher mean PSNR than Phase correlation at lower levels of Gaussian blur, but a lower mean PSNR at higher levels of blur. Furthermore, Single-step DFT had the smallest confidence intervals, followed by Mutual information.

Moreover, the confidence intervals of Single-step DFT do not overlap with the confidence intervals of Phase correlation or Block matching for any standard deviation of the Gaussian kernel. Therefore, Single-step DFT had a higher mean PSNR than Phase correlation and Block matching for all standard deviations of the Gaussian kernel, with statistical significance at the 95% confidence level. Furthermore, the mean PSNR of Mutual information was higher than the mean PSNR of Phase correlation and Block matching when the Gaussian standard deviation was 12 and 15, with statistical significance at the 95% confidence level.


[Plot "MSE"; x-axis: Standard deviation of Gaussian kernel; y-axis: Mean squared error]

Figure 24: MSE between the ground truth videos and the stabilised videos for all algorithms, with 95% confidence intervals calculated over all five videos for each of the five levels of Gaussian blur. The results of the algorithms, except Single-step DFT, are shifted slightly on the x-axis in order to provide a clearer view.

As shown in Figure 24, Single-step DFT had the lowest mean MSE for the first two levels of Gaussian blur, but had a mean MSE similar to that of Mutual information for the three remaining levels. Block matching had a lower mean MSE than Phase correlation at lower levels of Gaussian blur, but a higher mean MSE at higher levels of Gaussian blur. Furthermore, Single-step DFT had the smallest confidence intervals, followed by Mutual information.

Moreover, the confidence intervals of Single-step DFT do not overlap with those of Phase correlation or Block matching when the standard deviation of the Gaussian kernel is 3 and 12. Therefore, Single-step DFT had a lower mean MSE than Phase correlation and Block matching when the standard deviation of the Gaussian kernel was 3 and 12, with statistical significance at the 95% confidence level. In addition, the mean MSE of Single-step DFT was lower than the mean MSE of Phase correlation when the Gaussian standard deviation was 3, 9, 12 and 15, with statistical significance at the 95% confidence level.
