Low Light Video Enhancement along with Objective and Subjective Quality Assessment

Sri Krishna Jayanty
Venkata Gopi Krishna Dalasari

November 2016

Dept. Applied Signal Processing

Blekinge Institute of Technology

SE–371 79 Karlskrona, Sweden

This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering with emphasis on Signal Processing.

Contact Information:

Authors:
Sri Krishna Jayanty
E-mail: jayanty.srikrishna@gmail.com

Venkata Gopi Krishna Dalasari
E-mail: dalasarigopi@gmail.com

Industry Supervisor:
Dr. Benny Sällberg
Sällberg Technologies e.U.
Friedrich Schiller-Str. 11
A-4840 Vöcklabruck, Austria
Phone: +43 660 4849 960
E-mail: office@sallberg.at

Co-supervisor:
Dr. Josef Ström Bartunek
Dept. Applied Signal Processing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract

Enhancing low light videos has been quite a challenge over the years. A video taken in low light always suffers from low dynamic range and high noise. This master thesis presents a contribution within the field of low light video enhancement. Three models with different tone mapping algorithms are proposed for enhancing extremely low light, low quality video. For temporal noise removal, a motion compensated Kalman structure is presented. The dynamic range of the low light video is stretched using three different methods. In Model 1, the dynamic range is increased by adjusting the RGB histograms using gamma correction with a modified version of adaptive clipping thresholds. In Model 2, a shape preserving dynamic range stretch of the RGB histogram is applied using SMQT. In Model 3, contrast enhancement is done using CLAHE. In the final stage, the residual noise is removed using an efficient NLM. The performance of the models is compared on various objective VQA metrics such as NIQE, GCF and SSIM.

To evaluate the actual performance of the models, subjective tests are conducted, since a large number of applications target humans as the end users of the video. The performance of the three models is compared for a total of ten real input videos taken in extremely low light environments. A total of 25 human observers subjectively evaluated the performance of the three models based on the parameters: contrast, visibility, visually pleasing, amount of noise and overall quality. A detailed statistical evaluation of the relative performance of the three models is also provided.

Keywords: Contrast enhancement, Dynamic range, Kalman filter, Spatial denoising, Noise reduction, Temporal denoising, Tone mapping.

Contents

Abstract

1 Introduction
   1.1 Motivation
   1.2 Problem Statement
   1.3 Research Questions
   1.4 Survey of Related Works
   1.5 Proposed Solution Based on Related Work
   1.6 Outline of the Thesis

2 Background
   2.1 Video Compression
      2.1.1 H.264/AVC
   2.2 Kalman Filter
   2.3 Tone Mapping
      2.3.1 Gamma
      2.3.2 Successive Mean Quantization Transform (SMQT)
      2.3.3 Contrast Limited Adaptive Histogram Equalization (CLAHE)
   2.4 Non Local Means (NLM)
   2.5 Wiener Filter
   2.6 Spectral Subtraction

3 Low Light Video Enhancement Model
   3.1 Low Light Video Characteristics
   3.2 Temporal Noise Reduction
   3.3 Tone Mapping
      3.3.1 Histogram Adjustment with Gamma Correction
      3.3.2 Successive Mean Quantization Transform
      3.3.3 Contrast Limited Adaptive Histogram Equalization
   3.4 Spatial Noise Reduction

4 Implementation of Enhancement Model
   4.1 Temporal Noise Reduction
      4.1.1 Temporal Averaging Filter
      4.1.2 Spatial Averaging Filter
      4.1.3 Wiener Filter
   4.2 Contrast Enhancement
      4.2.1 Histogram Adjustment with Gamma Correction
      4.2.2 Successive Mean Quantization Transform
      4.2.3 Contrast Limited Adaptive Histogram Equalization
   4.3 Spatial Noise Reduction
      4.3.1 Fast NLM
      4.3.2 Spectral Subtraction

5 Objective and Subjective Quality Assessment
   5.1 Objective Quality Assessment
      5.1.1 No Reference Objective Metrics
      5.1.2 Full Reference Objective Metrics
   5.2 Subjective Quality Assessment
      5.2.1 Outline
      5.2.2 Test Environment
      5.2.3 Observers
      5.2.4 Performance Parameters
      5.2.5 Instructions for the Assessment
      5.2.6 Test Procedure

6 Results and Analysis
   6.1 Comparing the Performance of the Models
      6.1.1 Application 1
      6.1.2 Application 2
      6.1.3 Performance in Presence of Motion
   6.2 Analyzing Models with Objective Metrics
      6.2.1 No Reference Objective Metrics
      6.2.2 Full Reference Objective Metrics
   6.3 Subjective Quality Analysis
      6.3.1 Mean Opinion Score
      6.3.2 Standard Deviation
      6.3.3 Correlation of Parameters w.r.t. Overall Quality
      6.3.4 Analyzing Parameters w.r.t. Luminance Levels
      6.3.5 Output Luminance Levels

7 Conclusion and Future Work

References

Appendices

List of Figures

1.1 Block diagram
1.2 Process Flow Diagram
2.1 Successive Mean Quantization Transform for one operation
3.1 Temporal Noise Reduction Schema
3.2 Before and After Temporal Noise Reduction
3.3 Histogram of Low Light Frame
3.4 Histogram of Normal Light Frame
3.5 Histogram of Temporally Denoised Frame (Video Telephony)
3.6 Histogram after Gamma Correction (Video Telephony)
3.7 Histogram after SMQT (Video Telephony)
3.8 Histogram after CLAHE (Video Telephony)
3.9 Tone-Mapped Outputs
4.1 Flow chart of the implemented model
6.1 Video Telephony Application
6.2 Signboard
6.3 Motion compensation after Kalman filtering
6.4 Xylophone
6.5 MOS w.r.t. various parameters for sequence Video Telephony
6.6 Standard Deviation w.r.t. parameters for all 10 test sequences
6.7 Contrast w.r.t. Overall Quality
6.8 Visibility w.r.t. Overall Quality
6.9 Visually Pleasing w.r.t. Overall Quality
6.10 Amount of Noise w.r.t. Overall Quality
6.11 Contrast w.r.t. Luminance Levels
6.12 Visibility w.r.t. Luminance Levels
6.13 Amount of Noise w.r.t. Luminance Levels

Appendix figures:

1 Application 4 - Painting
2 Application 5 - Indoor Environment
3 Application 6 - Photography
4 Application 7 - Outdoor Environment
5 Application 8 - Pre-processing for Character Recognition
6 Application 9 - Surveillance Camera
7 Application 10 - Forest

List of Tables

2.1 Kalman Equations
4.1 Camera Specifications
4.2 Values of α and β for different low light inputs
5.1 System specifications
6.1 Performance comparison based on no reference objective metrics
6.2 Performance comparison of full reference objective metrics
6.3 Correlation of all parameters w.r.t. Overall Quality
6.4 Input and Output Relative Luminance Values

Acknowledgements

Firstly, we wish to thank our supervisor Dr. Benny Sällberg for his valuable expert advice, strong support and encouragement throughout our thesis work. Furthermore, we would like to thank him for introducing us to the topic and for helping us realize our true potential as researchers. Our sincere thanks to our co-supervisor Dr. Josef Ström Bartunek for his guidance and support in completing our thesis. We would also like to thank them for helping us overcome obstacles and emerge successful.

We would like to express our deepest gratitude to the entire Department of Applied Signal Processing for helping us throughout our research. Also, we would like to thank the participants who willingly shared their valuable time for the subjective test.

A special thank you also goes to our family members for all the support we have received throughout the years.

Abbreviations

AGC    Adaptive Gamma Correction
APMF   Adaptive Piecewise Mapping Function
ASTA   Adaptive Spatio Temporal Accumulation
ASTC   Adaptive Spatio Temporal Connective
AVC    Advanced Video Coding
CCD    Charge Coupled Device
CDF    Cumulative Distribution Function
CLAHE  Contrast Limited Adaptive Histogram Equalization
FPN    Fixed Pattern Noise
GCF    Global Contrast Factor
HAGC   Histogram Adjustment with Gamma Correction
HDR    High Dynamic Range
IPTV   Internet Protocol Television
LDR    Low Dynamic Range
NCV    Neighborhood Connective Value
NIQE   Natural Image Quality Evaluator
NLM    Non Local Means
RL     Relative Luminance
SDFT   Statistical Domain Temporal Filter
SHF    Spatial Hybrid Filter
SMQT   Successive Mean Quantization Transform
TM     Tone Mapping
VQA    Video Quality Assessment

1 Introduction

1.1 Motivation

Over the past few years there has been substantial growth in digital cameras in terms of sensitivity and resolution. Although the light sensitivity of image sensors has improved, modern digital cameras are still limited in capturing High Dynamic Range (HDR) images in low light conditions. Like the human eye, digital cameras find it hard to capture extremely low light scenes. Digital cameras can perform better at higher ISO levels and slower shutter speeds, as more light is received by the image sensors; these are the desired settings in low light conditions. However, higher ISO levels usually result in higher noise, and slower shutter speeds result in motion blur [1]. At even lower lighting conditions, the intensity of the noise grows higher than that of the signal, making it hard to reconstruct the image. Post processing videos taken in low lighting conditions to improve their visual appearance has therefore been an active research area that serves several video processing applications [2].

Video telephony has rapidly emerged as a noteworthy technology, replacing the conventional phone call. With the mobile market expanding relative to computers and laptops, video telephony is set to become a main means of communication. With the advent of powerful video codecs and high speed internet, video telephony became a practical, regularly used technology. H.264 is one of the most widespread codec standards used for video telephony. As low end video devices such as webcams and cell phones have become ubiquitous for video telephony, there is an ever greater need for reliable video enhancement technologies to improve their output. Video sequences are often corrupted by noise during acquisition and transmission, especially in low light environments. Low light videos have an extremely low dynamic range, so their quality is limited, and image sequences captured in low light conditions have a very low signal to noise ratio. It is therefore desirable to enhance the quality of the low light video. Low light video enhancement is used in several applications such as automated vehicles, video telephony, security and surveillance, satellite videography, traffic management and digital photography [3].

1.2 Problem Statement

The main obstacle with videos taken in extremely low light environments is the lack of visibility. The obstacles can be characterized as low dynamic range and a high amount of noise. A low light video contains various types of noise, including quantization noise, read out noise, thermal noise and photon shot noise. Directly stretching the dynamic range of a low light video produces various undesirable artifacts such as noise amplification, intensity saturation and loss of resolution. A suitable denoising technique therefore has to be applied before stretching the dynamic range, i.e., before tone mapping. Even though a considerable amount of noise is removed before the tone mapping step, the noise amplified during tone mapping has to be removed by a proper denoising technique. To design an effective low light video enhancement technique, temporal denoising is done first, followed by tone mapping for contrast enhancement, and finally residual noise is removed by spatial noise reduction [3]. The process is shown in Fig. 1.1.

Figure 1.1: Block diagram

1.3 Research Questions

• What are the characteristics of a low light video?

• How can a video taken in extremely low light be enhanced?

• What denoising and contrast enhancement techniques should be used to enhance the low light video?

• How are the subjective tests conducted in order to analyze the performance of the algorithms?

• How does the performance of the enhanced video vary at different luminance levels?

1.4 Survey of Related Works

Various approaches have been developed for enhancing low light videos. Xuan Dong and Dubok Park [4][5][6] presented a dehazing algorithm for low light video enhancement. This method builds on the observation that pixel-wise inversion of a low light video looks quite similar to a hazy video. However, the estimation of the medium transmission function in the hazy image acquisition model using a dark channel prior (DCP) becomes unreliable in very low light conditions and requires a large computational load. Qing Xu [7] proposed a three step method for denoising low light videos. In the first step, a modified version of NLM is performed in which NLM is applied separately in the spatial and temporal regions; the two results are combined with adaptive weights that depend on the amount of motion in the subject of the video. In the second step, tone mapping is done to stretch the dynamic range of the video, and in the final step filtering is done in the YCbCr color space. Although a modified version of 3D NLM is used, the computational cost of the method is still very high.

Dong [8] proposed a contrast enhancement algorithm for low light videos. It is based on a piecewise stretch of the brightness component extracted with Retinex theory in HSV space to improve the visibility of the image. The overall computational complexity of the algorithm is very high, and although it is able to increase the brightness to an ideal level, the algorithm fails to retain color information in all cases. Malm [9] presented a modified version of structure adaptive anisotropic filtering using the 3D structure tensor for adaptive spatio-temporal filtering. The 3D structure tensor obtained from the spatio-temporal gradient is used to estimate the kernel widths in the spatial and temporal directions in order to construct the adaptive anisotropic filter; contrast limited histogram equalization is used for tone mapping. Due to inaccurate estimation of the anisotropic kernel from the input signal, the method becomes unstable and produces blurry results with low output PSNR.

Seong [10] proposed a low light noise removal algorithm that mainly consists of a statistical domain temporal filter (SDFT) for moving areas and a spatial hybrid filter (SHF) for stationary areas. Poisson noise and false color noise are considered the main noise types in the input video. This algorithm is intended only for moderately dark videos, hence there is no tone mapping step to stretch the dynamic range. Eric [11] proposed a framework for video enhancement using per pixel virtual exposures. The algorithm mainly deals with enhancing Low Dynamic Range (LDR) videos based on a virtual exposure camera model, reducing noise in LDR videos using an Adaptive Spatio Temporal Accumulation (ASTA) filter, and a tone mapping approach to enhance LDR videos.

Chao [12] proposed a video enhancement algorithm that mainly consists of a local image statistic named the Neighborhood Connective Value (NCV) for identifying impulse pixels, an Adaptive Spatio Temporal Connective (ASTC) noise filter for reducing mixed noise, and an Adaptive Piecewise Mapping Function (APMF) to enhance video contrast. ASTC uses optical flow for motion estimation and the APMF is applied on segmentation results. Methods such as optical flow and segmentation do not provide reliable results for low light noisy videos.

1.5 Proposed Solution Based on Related Work

Most of the work mentioned so far deals only with moderately dark videos in RAW format. In our research an attempt is made to enhance videos processed by the H.264 codec with relative luminance lower than 0.1. In extremely low light videos the noise is very high, and reconstructing and enhancing the useful information is quite a challenge. The proposed method, which improves on the previous work in [13][2], is a novel method for enhancing low quality and extremely low light video. Prior to denoising the low light video, the various noise sources are categorized into Gaussian and FPN noise, based on which the denoising algorithms are designed. A low light input video processed by the H.264 codec is passed through the steps given in Fig. 1.2 to produce an enhanced video. Instead of implementing the computationally complex modified Kalman filter in [2], a simplified version is implemented in our proposed method.

The temporal denoising step, which eliminates noise in still regions while avoiding motion blur, is implemented according to [14]. In the first stage, temporal averaging is done to decrease the noise level and to support motion estimation. In the second stage, the remaining noise is eliminated by motion compensated spatio-temporal Kalman filtering, and impulse noise is removed using a median filter. After removing a considerable amount of noise, the dynamic range of the frames is increased by one of the three tone mapping algorithms stated in the block diagram:

• Algorithm 1: A modified version of the histogram adjustment with gamma correction in [2] is proposed, with a detailed explanation of how to select the lower and higher clipping thresholds.

• Algorithm 2: Mikael Nilsson [15] proposed SMQT for the enhancement of gray scale images. In this algorithm SMQT is applied to stretch the dynamic range of the input low light video frame.

• Algorithm 3: This algorithm uses the most common algorithm for contrast enhancement, i.e., CLAHE [16].

After one of the above algorithms has improved the dynamic range, the noise amplified by tone mapping is eliminated using the non-local means algorithm. In order to increase the efficiency of the code, a modified version of NLM [17] is implemented. In the last processing step, the remaining noise, i.e., FPN, is removed using spectral subtraction. The performance of the various tone mapping algorithms is compared using various subjective and objective metrics.

Figure 1.2: Process Flow Diagram

1.6 Outline of the Thesis

The report is structured as follows:

Chapter 2 provides the conceptual background on the various image and video processing algorithms used.

Chapter 3 gives a detailed description of the algorithms proposed for low light enhancement.

Chapter 4 provides the values taken for the dynamic parameters in each algorithm for a specific video sequence.

Chapter 5 discusses the procedure by which the subjective assessment has been conducted.

Chapter 6 presents the results and a detailed analysis.

Chapter 7 concludes the work and provides a path for continuing it.

2 Background

In this chapter a detailed explanation of all the selected methods used in the thesis is given, providing the background knowledge needed to comprehend each method in the context of the thesis.

2.1 Video Compression

Compression reduces resource usage, such as data storage space or transmission capacity. Video compression is a technique for reducing file size for efficient storage and transmission over a network by removing redundant video data. The process involves applying an algorithm to obtain a compressed file that can be stored or transmitted; an inverse algorithm is used to play back a video with the same content as the original source video [18].

Compression can be either lossless or lossy. Lossless compression guarantees full reconstruction of the original data without incurring any distortion in the process. Lossy compression is a class of data encoding methods that reduces bits by identifying and removing unnecessary information. For most day-to-day applications lossy compression is used, as its resource usage is very low compared to lossless compression [18]. Different compression standards offer different methods for reducing the data, which also creates variation in the quality, bitrate and latency of the video. The state of the art in video compression is currently represented by the H.264/AVC standard video codec, which has been deployed in several application domains from wireless video streaming to IPTV and Blu-ray Disc (BD) [19].

2.1.1 H.264/AVC

H.264 is a compression format identical to MPEG-4 Part 10, Advanced Video Coding. It is a block-oriented, motion-compensation based standard commonly used for recording, compressing and transmitting video content. It delivers an average bit rate reduction of 50% compared with other video standards without compromising video quality. H.264 has the flexibility to scale the latency depending on the application requirements, and it has 11 levels to vary the performance, bandwidth and memory requirements.

An H.264 encoder performs block based motion compensation, searching for matching blocks in several reference frames for compression; intra-coded macroblocks are used when no matching blocks are found. Motion compensation is the most demanding aspect of a video encoder, and the different ways and degrees with which an H.264 encoder implements it affect the efficiency of the compressed video [20]. An in-loop deblocking filter in the H.264 standard helps reduce the blocking artifacts caused by compression. This filter smooths block edges with an adaptive strength to deliver a cleaner decompressed video.

2.2 Kalman Filter

The Kalman filter is an efficient way of estimating the state of a process while minimizing the mean squared error [14]. The purpose of each iteration of the Kalman filter is to update the estimate of the state vector of a system, and the covariance of that vector, based on the information in a new observation. The Kalman filter assumes that observations occur at fixed discrete time intervals and addresses the general problem of estimating the state of a discrete-time controlled process. It estimates the process using a feedback loop in which the filter estimates the process state and obtains feedback in the form of noisy measurements [21]. The equations fall into prediction and update equations. In the prediction equations, the priori state is obtained from the amount of motion estimation error and the state covariance matrix for the current state. Both sets of equations are presented in Table 2.1.

In Table 2.1, $X^{pred}_t$ and $P^{pred}_t$ are the priori state estimate and the priori state covariance matrix computed in the prediction step, and $Q_t$ reflects the amount of motion estimation error. The update equations are responsible for incorporating a new measurement into the priori estimate to obtain an improved posteriori estimate. $K_t$ is the Kalman gain, which stabilizes quickly and remains constant, $C_t$ is the noise covariance of the current input frame $Y_t$, and $X^{est}_t$, $P^{est}_t$ are the denoised image frame and the posteriori estimate of the state covariance matrix for the current state.

Kalman Prediction Equations:

$$X^{pred}_t = X^{est}_{t-1} \qquad (2.1)$$

$$P^{pred}_t = P^{est}_{t-1} + Q_t \qquad (2.2)$$

Kalman Update Equations:

$$K_t = P^{pred}_t\,\big(P^{pred}_t + C_t\big)^{-1} \qquad (2.3)$$

$$X^{est}_t = X^{pred}_t + K_t\,\big(Y_t - X^{pred}_t\big) \qquad (2.4)$$

$$P^{est}_t = (I - K_t)\,P^{pred}_t \qquad (2.5)$$

Table 2.1: Kalman Equations
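To illustrate Table 2.1, the following is a minimal per-pixel sketch of the prediction and update steps in Python/NumPy. Scalar values for Q and C are an assumption made here for simplicity; in the thesis, $Q_t$ comes from motion estimation and $C_t$ from the noise of the current frame.

```python
import numpy as np

def kalman_temporal_denoise(frames, Q=1e-4, C=1e-2):
    """Per-pixel temporal Kalman filter over frames of shape (T, H, W).

    Q and C are illustrative scalar variances: Q models the prediction
    (motion) error, C the measurement noise of the incoming frame.
    """
    x_est = frames[0].astype(np.float64)   # initial state estimate
    p_est = np.ones_like(x_est)            # initial error covariance
    out = [x_est.copy()]
    for y in frames[1:]:
        x_pred, p_pred = x_est, p_est + Q  # prediction (Eqs. 2.1, 2.2)
        k = p_pred / (p_pred + C)          # Kalman gain (Eq. 2.3)
        x_est = x_pred + k * (y - x_pred)  # measurement update (Eq. 2.4)
        p_est = (1.0 - k) * p_pred         # covariance update (Eq. 2.5)
        out.append(x_est.copy())
    return np.stack(out)
```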

2.3 Tone Mapping

Tone mapping is a technique used in image processing and computer graphics to map one set of colors to another, approximating the appearance of high dynamic range images in a medium with a more limited dynamic range. In this section the background of three different tone mapping algorithms is given: gamma correction [22], SMQT [23] and CLAHE [16].

2.3.1 Gamma

Gamma defines the relation between a pixel's numeric value and its actual luminance. It is a non-linear operation used to encode and decode light and dark values in video images. Gamma correction is usually given in the form of the power function:

$$\text{intensity} = (\text{signal})^{\gamma}, \quad 1.8 < \gamma < 2.8 \qquad (2.6)$$

Human eyes do not perceive light the way cameras do. In a camera there is a linear relationship between the number of photons that hit the sensor and the recorded brightness, whereas human eyes have a non-linear relationship between the amount of light received and the perceived brightness. Compared to a camera, human eyes are much more sensitive to changes in dark tones than to similar changes in bright tones. Therefore, whenever a digital image is saved, it is gamma encoded so that the stored intensity values correspond to brightness as perceived by the human eye [22].

2.3.2 Successive Mean Quantization Transform (SMQT)

SMQT is a non-linear transformation that reveals the structure of the data. It preserves the shape of the histogram through an operation similar to a non-linear histogram stretch. SMQT can be viewed as a binary tree built of simple Mean Quantization Units (MQUs), where each level performs an automated breakdown of information [23]. It adjusts the dynamic range adaptively and non-linearly and is designed with only one parameter, L, which sets the number of levels in the binary tree.

The data I can be converted into a vector or any arbitrary form. Let x denote the data points and D(x) the value of each data point. The MQU is calculated for D at each level and consists of three steps: a mean calculation, a quantization, and a split of the input set [23].

In the first step of the MQU, the mean of the data, denoted $\bar{D}$, is calculated according to

$$\bar{D} = \frac{1}{|D|} \sum_{x \in D} D(x) \qquad (2.7)$$

The second step uses the mean to quantize the values of the data points into $\{0, 1\}$, with the comparison function defined as

$$\xi\big(D(x), \bar{D}\big) = \begin{cases} 1, & \text{if } D(x) > \bar{D} \\ 0, & \text{else} \end{cases} \qquad (2.8)$$

Letting $\otimes$ denote concatenation, the mean quantized set, which is the main output of an MQU, is

$$U = \otimes_{x \in D}\; \xi\big(D(x), \bar{D}\big) \qquad (2.9)$$

The third step splits the input into two subsets,

$$D_0 = \{x \mid D(x) \le \bar{D}\} \qquad (2.10)$$

$$D_1 = \{x \mid D(x) > \bar{D}\} \qquad (2.11)$$

where $D_0$ propagates left and $D_1$ propagates right in the binary tree, as shown in Fig. 2.1. The output U is interpreted as the structure of D(x); hence the MQU is independent of gain and bias adjustments of the input. The MQU is the main computing unit of the SMQT. The transform at the first level is $SMQT_1$ with output $U_{1,1}$, and the same operation continues recursively on each subset down to level L. The final result of $SMQT_L$ is obtained by adding the weighted outputs of all levels:

$$M(x) = \sum_{l=1}^{L} \sum_{n=1}^{2^{l-1}} U_{l,n}(x) \times 2^{L-l}, \quad \forall x \in M \qquad (2.12)$$

where $U_{l,n}$ denotes the output of the n-th MQU at level $l = 1, 2, \ldots, L$, with $n = 1, 2, \ldots, 2^{l-1}$ the output number for the MQUs at level l. The MQU, the basic building block of the SMQT, is insensitive to gain and bias.

Figure 2.1: Successive Mean Quantization Transform for one operation
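A compact recursive sketch of the transform described above, assuming a flat NumPy array as input (for an image, flatten each channel first):

```python
import numpy as np

def smqt(data, L=8):
    """Successive Mean Quantization Transform of a 1-D array (sketch).

    Each level quantizes its subset around the subset mean (Eqs. 2.7-2.9),
    splits it (Eqs. 2.10-2.11), and contributes one bit of weight
    2**(L - level) to the output (Eq. 2.12), giving values in [0, 2**L - 1].
    """
    out = np.zeros(data.shape, dtype=np.int64)

    def mqu(idx, level):
        if level > L or idx.size == 0:
            return
        above = data[idx] > data[idx].mean()   # quantize around the mean
        out[idx[above]] += 1 << (L - level)    # this level's output bit
        mqu(idx[above], level + 1)             # D1 propagates right
        mqu(idx[~above], level + 1)            # D0 propagates left

    mqu(np.arange(data.size), 1)
    return out
```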

2.3.3 Contrast Limited Adaptive Histogram Equalization (CLAHE)

Adaptive histogram equalization (AHE) is a technique that improves contrast by partitioning the image into regions and applying histogram equalization to each region [16]. It operates on small data regions, called tiles, rather than on the entire image as in ordinary histogram equalization. The contrast of each tile is enhanced so that the output histogram matches a specified histogram. Neighboring tiles are then consolidated by bilinear interpolation to remove the artificially induced boundaries. The contrast can be constrained in homogeneous areas to avoid amplifying the noise present in the image.

CLAHE differs from AHE in its contrast limiting. The contrast amplification in the vicinity of a given pixel value is given by the slope of the transformation function, which converts the density function approximately into a uniform distribution. The transformation function is proportional to the slope of the neighborhood cumulative distribution function (CDF) and therefore to the value of the histogram at that pixel value. CLAHE limits the amplification by clipping the histogram at a predefined value before computing the CDF, which limits the slope of the CDF and hence of the transformation function. The value at which the histogram is clipped, the so-called clip limit, depends on the normalization of the histogram and thereby on the size of the neighborhood region. The clip limit bounds the contrast enhancement; increasing the clip limit results in more contrast.
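As a usage sketch, OpenCV's CLAHE implementation applied per channel of an 8-bit frame (the 8 × 8 tile grid matches the setting used in Chapter 3; the clip limit value here is illustrative):

```python
import cv2

def clahe_rgb(frame, clip_limit=2.0, tile=(8, 8)):
    """Apply CLAHE to each channel of an 8-bit color frame.

    clip_limit caps each local histogram before the CDF is computed,
    limiting noise amplification; tile is the grid of contextual regions.
    """
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    return cv2.merge([clahe.apply(c) for c in cv2.split(frame)])
```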

2.4 Non Local Means (NLM)

Non local means is an image processing algorithm for spatial denoising. Non local means filtering takes a mean of the neighboring pixels, weighted by how similar those pixels are to the target pixel. This improves clarity with less loss of detail compared with local mean algorithms [24]. Each pixel's local vicinity is considered a patch and is compared with the other patches in the neighborhood up to a certain search vicinity; within this vicinity the algorithm searches for similar patches, and the patch sizes are adjustable. NLM avoids faulty inter-color similarity computations by only considering neighboring patches with the same pattern as the reference pattern [2]. The weighting function depends on the similarity between the patches, the weights generally lie between 0 and 1 [25], and the sum of all weights is 1.

Given a discrete noisy image $v = \{v(i) \mid i \in I\}$, the NLM estimate for a pixel i is computed as the weighted average of all the pixels in the vicinity of the patch:

$$NLM[v](i) = \sum_{j \in I} w(i,j)\, v(j) \qquad (2.13)$$

where $\{w(i,j)\}_j$ is the family of weights that depend on the similarity between the pixels i and j, satisfying $0 \le w(i,j) \le 1$ and $\sum_{j \in I} w(i,j) = 1$.

The similarity between pixels i and j depends on the similarity between the patches $v(R_i)$ and $v(R_j)$, where $R_k$ denotes a square neighborhood of fixed size centered at pixel k. The similarity is calculated as a decreasing function of the weighted Euclidean distance $\|v(R_i) - v(R_j)\|^2_{2,a}$, where $a > 0$ is the standard deviation of the Gaussian kernel. The Gaussian weighting function is given by

$$w(i,j) = \frac{1}{Z(i)}\; e^{-\frac{\|v(R_i) - v(R_j)\|^2_{2,a}}{h^2}} \qquad (2.14)$$

where $Z(i)$ is the normalization constant:

$$Z(i) = \sum_{j} e^{-\frac{\|v(R_i) - v(R_j)\|^2_{2,a}}{h^2}} \qquad (2.15)$$

The parameter h controls the degree of filtering. NLM considers not only the grey level at a single point but the geometrical configuration of a whole neighborhood; if the patches are not similar, the weight is nearly zero.
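A direct (slow) sketch of Eqs. 2.13-2.15 for a single pixel, using an unweighted squared patch distance (the a = 0 case is an assumption for brevity) and assuming the pixel lies at least a patch width away from the border:

```python
import numpy as np

def nlm_pixel(v, i, j, patch=3, search=10, h=0.1):
    """Naive non-local means estimate of one pixel (illustrative).

    v is a float image in [0, 1]; patch is the half-width of the
    similarity patch and search the half-width of the search window.
    """
    p = patch
    ref = v[i - p:i + p + 1, j - p:j + p + 1]
    num, Z = 0.0, 0.0
    for y in range(max(p, i - search), min(v.shape[0] - p, i + search + 1)):
        for x in range(max(p, j - search), min(v.shape[1] - p, j + search + 1)):
            cand = v[y - p:y + p + 1, x - p:x + p + 1]
            w = np.exp(-np.sum((ref - cand) ** 2) / h ** 2)  # Eq. 2.14 kernel
            num += w * v[y, x]
            Z += w
    return num / Z   # weights normalized by Z, as in Eq. 2.15
```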

2.5 Wiener Filter

The objective of the Wiener filter is to estimate an unknown signal by filtering a related, known signal to produce the estimate as output. The adaptive Wiener filter estimates the mean and variance of the additive noise by considering the neighboring pixels in a neighborhood of size M-by-N; the estimation follows [26]. The local mean is estimated as

$$\mu = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} A(n_1, n_2) \qquad (2.16)$$

where $\eta$ is the M-by-N neighborhood of each pixel in the image A and $(n_1, n_2)$ indexes the pixels. The local variance is calculated as

$$\sigma^2 = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} A^2(n_1, n_2) - \mu^2 \qquad (2.17)$$

The pixelwise Wiener filter estimate is then given by

$$B(n_1, n_2) = \mu + \frac{\sigma^2 - \nu^2}{\sigma^2}\,\big(A(n_1, n_2) - \mu\big) \qquad (2.18)$$

where $\nu^2$ is the noise variance.

2.6 Spectral Subtraction

Spectral subtraction restores the magnitude spectrum of a signal affected by noise by subtracting an estimate of the noise spectrum from the noisy spectrum. Adding noise to a signal mainly changes the magnitude spectrum, not the phase spectrum, so subtracting the noise spectrum estimate from the magnitude spectrum produces an estimate of the noiseless signal [27].

The noisy input frame is transformed with the Fast Fourier Transform (FFT) to express the input signal in terms of magnitude and phase spectra:

$$Y(\omega_1, \omega_2) = |Y(\omega_1, \omega_2)|\; e^{j\phi_Y(\omega_1, \omega_2)} \qquad (2.19)$$

where $Y(\omega_1, \omega_2)$ is the Fourier transform of the noisy signal $y(n_1, n_2)$, $|Y(\omega_1, \omega_2)|$ is the magnitude spectrum and $\phi_Y$ is the phase spectrum.

The estimated noise spectrum is subtracted from the magnitude spectrum of the noisy signal to obtain an estimate of the noise free spectrum:

$$|X_e(\omega_1, \omega_2)| = \begin{cases} |Y(\omega_1, \omega_2)| - |D(\omega_1, \omega_2)|, & \text{if } |D(\omega_1, \omega_2)| < |Y(\omega_1, \omega_2)| \\ 0, & \text{otherwise} \end{cases} \qquad (2.20)$$

where $|X_e(\omega_1, \omega_2)|$ is the magnitude spectrum of the noise free signal estimate and $|D(\omega_1, \omega_2)|$ is the estimated magnitude spectrum of the noise. The estimate is recombined with the noisy phase,

$$X_e(\omega_1, \omega_2) = \big[\,|Y(\omega_1, \omega_2)| - |D(\omega_1, \omega_2)|\,\big]\; e^{j\phi_Y(\omega_1, \omega_2)} \qquad (2.21)$$

and the inverse Fourier transform of $X_e(\omega_1, \omega_2)$, together with the phase spectrum of the noisy signal, gives the estimated noiseless signal $x_e(n_1, n_2)$.
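A compact sketch of Eqs. 2.19-2.21, where noise_mag is assumed to be a precomputed noise magnitude spectrum (e.g. from a dark frame, as described in Section 3.4):

```python
import numpy as np

def spectral_subtract(noisy, noise_mag):
    """2-D spectral subtraction (sketch of Eqs. 2.19-2.21).

    noise_mag is an estimate of the noise magnitude spectrum; the phase
    of the noisy input is kept unchanged, as in Eq. 2.21.
    """
    Y = np.fft.fft2(noisy)
    mag = np.maximum(np.abs(Y) - noise_mag, 0.0)   # Eq. 2.20, clipped at zero
    X = mag * np.exp(1j * np.angle(Y))             # reuse the noisy phase
    return np.real(np.fft.ifft2(X))
```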

3 Low Light Video Enhancement Model

In this chapter the proposed enhancement model, based on the algorithms given in Chapter 2, is explained. The problem with low light video is that its contents are poorly visible. In normal conditions spatio-temporal filtering can be used; however, in low light conditions the low dynamic range makes spatio-temporal filtering lead to over-smoothing and a blurring effect around moving objects, while stretching the dynamic range directly on the video amplifies noise along with improving visibility. The proposed enhancement model improves the visibility of extremely low light video while avoiding these unwanted effects. The model comprises three methods applied consecutively. First, temporal filtering is applied before stretching the dynamic range, to eliminate noise in motionless regions. Next, the dynamic range of the low light video is stretched by tone mapping and its visibility is enhanced. Finally, spatial denoising eliminates the noise amplified by tone mapping. The methods utilized in the proposed enhancement model are parameter dependent; the choice of parameters is described in Chapter 4.

3.1 Low Light Video Characteristics

The main obstacle to capturing a scene in low light is the lack of visibility of the subject in the video. To increase visibility, the amount of light captured by the camera must be increased. The most important parameters for increasing the light in the video are the ISO sensitivity, the aperture setting and the shutter speed. These three elements determine the final exposure of the input video, which in turn decides its visibility. ISO sensitivity is a standardized value that indicates the image sensor's sensitivity to light. Increasing the light captured through these elements, however, also increases the noise.

The two major characteristics of a low light video are low dynamic range and high noise. In order to decrease the amount of noise in a low light video it is important to analyze the various noise sources. There are basically two types of noise sources:

• Noise from the recorded video.

• Noise introduced by the measurement device.

These include quantization noise, read out noise, photon shot noise, and thermal or dark current noise [2]. The relevant noise sources produced by the camera are given below [28].

• Photon shot noise: This refers to the inherent natural variation of the photon flux. Photons collected by a CCD (Charge Coupled Device) exhibit a Poisson distribution, and the fluctuation in photon count is visible in the video frame as photon shot noise. The noise histogram has a Gaussian distribution. Fluctuations in photon counts from pixel to pixel are completely uncorrelated; in terms of its spatial variation, photon shot noise is white noise, meaning it has a uniform salt-and-pepper appearance with no structure or pattern.

• Read out noise: This refers to the uncertainty introduced while quantifying the electronic signal on the CCD. The major part arises from the amplifier when converting electrons to voltage. Each electronic circuit component in the signal processing chain, from sensel (the photosensitive part of a pixel or sensor element) readout, to ISO gain, to digitization, suffers voltage fluctuations that deviate the raw value from the ideal value proportional to the photon count. The fluctuations in the raw value due to the signal processing electronics constitute the read noise of the sensor. Under low light conditions read out noise is one of the dominant noise sources. Its histogram is approximately Gaussian in nature.

• Dark current noise: This arises from the statistical variation of thermally generated electrons within the silicon layer of the CCD. Dark current represents the rate of generation of CCD charge at a given temperature and follows a Poisson relationship. Some pixels may have higher dark currents than others, producing a fixed pattern noise; FPN can be removed using spectral subtraction.

• Quantization noise: The analog voltage signal from the sensor is digitized into a raw value rounded to a nearby integer. Due to this rounding, the raw value mis-states the actual signal by a slight amount; the error introduced by the digitization is called quantization noise. Its histogram is approximately Gaussian in nature.

Dark current noise is approximated as fixed pattern noise. Read out noise, photon shot noise, quantization noise and the remaining sources can be well approximated as Gaussian noise [2].

A detailed description of temporal noise reduction in extremely low light video using the Kalman filter is given in Section 3.2. The contrast enhancement algorithms are described in Section 3.3, and a detailed description of spatial noise reduction using NLM and spectral subtraction is given in Section 3.4.

3.2 Temporal Noise Reduction

Temporal noise reduction is implemented based on [14]. As video frames are temporally correlated, noise can be reduced effectively by temporal filtering. In order to remove temporal noise without motion blur, a motion compensated Kalman filter model is implemented. Firstly, pre-filtering is done in the form of temporal averaging over all video frames: each temporally denoised pixel in a frame is calculated as the average of the surrounding pixels within a certain radius in the neighboring frames. This blurs the sequence evolution, smoothing out temporal variation such as a rapid change in illumination or the movement of an object. After temporal filtering the noise in the video is decreased to quite an extent while the motion in the video remains undisturbed. This temporal denoising is implemented to increase the precision of motion estimation rather than for video denoising itself.

Figure 3.1: Temporal Noise Reduction Schema

The pre-filtered version of the current noisy frame, along with the previously denoised frame, is given as input to the motion estimation function. If there is a large amount of motion between the current and previous frames, the amount of Kalman filtering is decreased, and vice versa. The current noisy frame, the last denoised frame and the output of the motion estimation are given as input to the Kalman filtering process, which produces the temporally denoised frame. Fig. 3.1 illustrates the motion compensated Kalman filtering.

The Kalman filter estimates a process using a form of feedback control. The modified Kalman filter consists of two stages: prediction and update. The prediction stage finds the current state and obtains a priori estimate of the error covariance matrix for the next time step; the error is between the input video frame and its estimated version. The update stage incorporates a new measurement into the priori estimate to obtain an improved posteriori estimate. The prediction equations are given by

$$Xp_t = Xe_{t-1} \qquad (3.1)$$

and

$$Pp_t = Pe_{t-1} + Q_t \qquad (3.2)$$

where $Xp_t$ and $Pp_t$ are the priori state estimate and the priori state covariance matrix for the current state. Initially, $Xe_{t-1}$ is obtained as a spatially averaged version of the previous denoised frame, where the spatial averaging is done over the neighboring pixels. $Pe_{t-1}$ is the error state covariance matrix between the pre-filtered and noisy versions of the video frame, $Q_t$ reflects the amount of motion estimation error obtained from motion estimation, and $t$ and $t-1$ refer to the current and previous time frames.

The update equations are given by

$$K_t = Pp_t\,\big(Pp_t + C_t\big)^{-1} \qquad (3.3)$$

and

$$Xe_t = Xp_t + K_t\,\big(Y_t - Xp_t\big) \qquad (3.4)$$

and

$$Pe_t = (I - K_t)\,Pp_t \qquad (3.5)$$

where $K_t$ and $Y_t$ are the Kalman gain and the noisy image frame, and $C_t$ is the noise covariance of the current input frame. $Xe_t$ and $Pe_t$ are the denoised image frame and the posteriori estimate of the state covariance matrix for the current state. If there is a large amount of motion between the previous and current frames, the Kalman gain approaches one, so the filter follows the new measurement and the amount of temporal smoothing is decreased, and vice versa.
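To illustrate how motion awareness enters the recursion, a sketch of a per-pixel motion term for $Q_t$ is given below. This particular estimator, the local energy of the difference between the pre-filtered frame and the previous denoised frame, is an assumption for illustration, not the thesis's exact motion estimation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def motion_adaptive_Q(prefiltered, prev_denoised, size=5, scale=1.0):
    """Illustrative per-pixel motion term Q_t for Eq. 3.2.

    Large local differences between the pre-filtered current frame and
    the previous denoised frame indicate motion; a large Q_t inflates
    Pp_t, pushing the gain of Eq. 3.3 toward one so that moving regions
    follow the new measurement instead of being temporally smoothed.
    """
    diff2 = (prefiltered - prev_denoised) ** 2
    return scale * uniform_filter(diff2, size)   # local motion energy
```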

The Kalman filter reduces noise in still regions to a great extent; however, the moving regions of the video frame still contain noise. In low light conditions the contents of the video frame are not clearly visible, and denoising completely at this stage would result in loss of useful information. The noise in the motion regions is therefore eliminated after the tone mapping step, when the contents of the video frame are visible. The input and the temporally denoised output are shown in Fig. 3.2, and the histogram of the temporally denoised frame is given in Fig. 3.5.

Figure 3.2: Before and After Temporal Noise Reduction ((a) input frame; (b) temporally denoised frame)

Figure 3.3: Histogram of Low Light Frame ((a) low light frame; (b) red channel; (c) green channel; (d) blue channel)

3.3 Tone Mapping

The main aim of tone mapping is to stretch the dynamic range of the video frame for enhanced visibility. In this thesis, the following three different tone mapping algorithms are explained:

• Histogram Adjustment with Gamma Correction

• Successive Mean Quantization Transform

• Contrast Limited Adaptive Histogram Equalization

3.3.1 Histogram Adjustment with Gamma Correction

Histogram adjustment with gamma correction follows [2]. The major characteristic of low light video is its limited dynamic range: most of the information in an extremely dark video is present in a very narrow region of low intensities, corresponding to less than 5% of the maximum intensity. This can clearly be seen by comparing the histograms at different lighting conditions in Fig. 3.3 and Fig. 3.4; the peak of the histogram moves towards zero as the illumination level decreases.

Directly stretching the histogram increases the lighting in the video, but it also amplifies a lot of noise. To avoid an excessive amount of noise, clipping is done before stretching the histogram. The clipping thresholds are chosen with the following rule:

$$\lambda_{low,c} = \arg\max_{\lambda}\, h(\lambda) \qquad (3.6)$$

and

$$\lambda_{high,c} = \arg\min_{\lambda} \left( \sum_{x=0}^{\lambda} h(x) \ge \eta \times M \right), \quad \text{for } c \in \{r, g, b\} \qquad (3.7)$$

where $\lambda_{low,c}$ and $\lambda_{high,c}$ are the low and high thresholds for each color channel, $\eta$ represents the proportion of the M total pixels that are accumulated below the chosen high threshold, and $h(x)$ is the histogram for the normalized intensity value x. The histogram of a frame passed through gamma correction is shown in Fig. 3.6. The low and high values of an input frame are truncated as follows:

$$Z_{trc,c}(i,j) = lct, \quad \text{if } Z_{trc,c}(i,j) \le lct \qquad (3.8)$$

and

$$Z_{trc,c}(i,j) = hct, \quad \text{if } Z_{trc,c}(i,j) \ge hct \qquad (3.9)$$

where $Z_{trc,c}$ is the truncated frame. The values of lct and hct are derived from the thresholds of Eq. 3.6 and Eq. 3.7, as described in Section 4.2.1. After the truncated frame is obtained, each pixel is transformed using the following tone mapping operator:

$$Z_{T,c}(i,j) = T[Z_{trc,c}(i,j)] = \left( \frac{Z_{trc,c}(i,j) - lct}{hct - lct} \right)^{\gamma}, \quad 0 \le \gamma < 1 \qquad (3.10)$$

where $Z_{T,c}$ represents the tone mapped frame of each channel.

Figure 3.4: Histogram of Normal Light Frame ((a) normal light frame; (b) red channel; (c) green channel; (d) blue channel)

This formula stretches the dynamic range of the video. To transform dark pixels to a bright level, the gamma correction parameter is set to less than one, and η is empirically set to values from 0.99 to 0.999 [2]. This method eliminates a major amount of noise while increasing the lighting of the video, which in turn helps in identifying all the details in the video and increasing its visibility. The histograms of the temporally denoised RGB frames and of the output RGB frames after applying histogram adjustment with gamma correction are shown in Fig. 3.5 and Fig. 3.6. The tone-mapped output is shown in Fig. 3.9a, and the respective frames are given in Fig. 6.1.
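A sketch of Eqs. 3.6-3.10 for one normalized channel. The mapping from $\lambda_{low,c}$ and $\lambda_{high,c}$ to the clipping thresholds uses lct = α · λ_low and hct = β + λ_high from Section 4.2.1; the α, β, γ and η values below are examples from the ranges reported in the thesis:

```python
import numpy as np

def hagc_channel(Z, alpha=0.5, beta=0.5, gamma=0.4, eta=0.995, bins=256):
    """Histogram adjustment with gamma correction for one channel in [0, 1].

    Illustrative sketch: lambda_low is the histogram peak (Eq. 3.6),
    lambda_high is where a fraction eta of pixels is accumulated (Eq. 3.7).
    """
    h, edges = np.histogram(Z, bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2
    lam_low = centers[np.argmax(h)]                          # Eq. 3.6
    cdf = np.cumsum(h)
    lam_high = centers[np.searchsorted(cdf, eta * Z.size)]   # Eq. 3.7
    lct, hct = alpha * lam_low, min(beta + lam_high, 1.0)    # Section 4.2.1
    Z_trc = np.clip(Z, lct, hct)                             # Eqs. 3.8-3.9
    return ((Z_trc - lct) / (hct - lct)) ** gamma            # Eq. 3.10
```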

3.3.2 Successive Mean Quantization Transform

A straightforward way to stretch the dynamic range of an image is to use SMQT directly [23]. The SMQT performs an automatic structural breakdown of information, an operation that can be seen as a progressive focus on the details in an image. With regard to image processing, the SMQT parameter L corresponds to $2^L$ dynamic range levels; with L = 8 the SMQT yields an uncompressed image with enhanced details whose range matches the number of bits with which the image is represented. Let an RGB LDR frame be denoted $I_{LDR}(i, j, k, n)$, where $i = 0, 1, \ldots, M-1$ is the row index, $j = 0, 1, \ldots, N-1$ the column index, $k = 1, 2, 3$ the color channel and $n = 1, 2, \ldots, N$ the frame number.

SMQT is applied to each channel independently:

$$R_{HDR} = h(R_{LDR}) \qquad (3.11)$$

$$G_{HDR} = h(G_{LDR}) \qquad (3.12)$$

$$B_{HDR} = h(B_{LDR}) \qquad (3.13)$$

The input is a low light video, so the histogram of the input video has a low dynamic range with its peak concentrated at the low end of the histogram. After applying SMQT, the output histogram retains the basic shape of the original but is stretched to explore the entire dynamic range, as shown in Fig. 3.7. Hence, the SMQT adapts the shape of the histogram through a non-linear stretch and makes sure that the entire dynamic range of the histogram is explored; the non-linear properties of the SMQT lead to a balanced stretch of the histogram. These desirable properties come from the non-linear successive quantization based on the mean. The resulting wide histogram provides high dynamic range outputs that are visually pleasing, but along with the improvement in useful information there is also an increase in noise. An interesting thing to notice is that the output histogram in Fig. 3.7 is sparse and has tone jumps: in a few intervals the histogram has empty spaces, meaning a lack of smooth gradation. In most cases these minor gaps, only a few pixels wide and representing just a few tonal values, are not likely to be visible to the human eye. The phenomenon of huge gaps in histograms is called posterization. The tone-mapped output is shown in Fig. 3.9b.
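As a usage sketch, Eqs. 3.11-3.13 applied with the smqt() sketch from Section 2.3.2 (with L = 8 the output already spans 0-255, so it maps back to an 8-bit frame directly):

```python
import numpy as np

def smqt_frame(frame):
    """Apply the smqt() sketch from Section 2.3.2 to each RGB channel.

    frame is an (H, W, 3) uint8 array; with L = 8 the transform output
    lies in [0, 255] and converts back to uint8 without rescaling.
    """
    out = np.empty_like(frame)
    for k in range(3):                   # Eqs. 3.11-3.13, one channel each
        vals = smqt(frame[:, :, k].ravel().astype(np.float64), L=8)
        out[:, :, k] = vals.reshape(frame.shape[:2]).astype(np.uint8)
    return out
```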

3.3.3 Contrast Limited Adaptive Histogram Equalization

The dynamic range of the input video can also be stretched using CLAHE [16], one of the best available algorithms for contrast enhancement, with minimal complexity. Each RGB frame of the noisy low light video is given as input to the CLAHE algorithm, whose main aim is to optimize the local contrast. Each frame is divided into tiles of size 8 × 8, and the number of bins in all the histograms is fixed at 256 to make sure that the output has maximal dynamic range. For each of these tiles (contextual regions) the histogram of the contained pixels is calculated, and each histogram is clipped based on the clip limit. The purpose of the clip limit is to suppress the noise in the input video by scaling down the amount of contrast enhancement: it decreases the amount of noise by allowing only a maximum number of pixels in each bin of the local histograms. After clipping the histogram, the clipped pixels are redistributed equally over the whole histogram to keep the total histogram count identical. A clip limit value of one prohibits contrast enhancement and returns the original frame, while a very high clip limit avoids any redistribution of histogram bin values, which is equivalent to plain adaptive histogram equalization.

Calculating the corresponding cumulative histograms results in a look-up table that optimizes contrast in each of the contextual regions, essentially a histogram equalization based on local image data. To avoid visible region boundaries, a bilinear interpolation scheme is used. The only dynamic variable in CLAHE is the clip limit.

From Fig. 3.8 it can be seen that the dynamic range of the input frame is increased to a great extent, and the output of CLAHE in Fig. 3.9c shows good contrast enhancement and dynamic range stretching. Its main drawback is the amplification of noise after contrast enhancement.

Figure 3.5: Histogram of Temporally Denoised Frame (Video Telephony) ((a) red channel; (b) green channel; (c) blue channel)

Figure 3.6: Histogram after Gamma Correction (Video Telephony) ((a) red channel; (b) green channel; (c) blue channel)

Figure 3.7: Histogram after SMQT (Video Telephony) ((a) red channel; (b) green channel; (c) blue channel)

Figure 3.8: Histogram after CLAHE (Video Telephony) ((a) red channel; (b) green channel; (c) blue channel)

3.4 Spatial Noise Reduction

The low light video is passed only through temporal filtering before tone mapping; afterwards it is filtered in the spatial domain so that the remaining noise is removed. Even though a large amount of noise is attenuated by the temporal noise reduction, the remaining noise is amplified in the tone mapping step. For spatial denoising we use the non local means algorithm, which measures the similarity between neighboring patches and a reference patch and performs a spatially weighted average of the pixels according to that similarity to restore a denoised image. Even though non local means produces the finest results, it requires many computations, since it must consider the local neighborhood patches and calculate the weights for denoising. A fast non local means is therefore used to make the spatial noise reduction robust and to reduce the computational time [17].

Fast NLM: Fast non local means computes the weights very efficiently. In standard NLM the sum of squared differences is calculated per patch, which is computationally costly. Instead, the image is shifted by a translation vector $d_x$ and the squared difference is calculated once for the whole image, which greatly reduces the computations. To avoid memory corruption during implementation, the image boundaries are extended by the patch size P, with $d_x \in [-K, K]$, where K is the window size. The weights are calculated as

$$w(s,t) = g_h\!\left( \sum_{\delta \in \Delta} G_{\sigma}(\delta)\, \big(\nu(s+\delta) - \nu(t+\delta)\big)^2 \right) \qquad (3.14)$$

where $G_{\sigma}$ is the Gaussian kernel, the function $g_h(x) = e^{-x/h^2}$, and $\Delta$ represents the patch region containing the neighboring sites $\delta$. By splitting the sum and using an identity, Eq. 3.14 can be rewritten as

$$w(s,t) = g_h\big( S_{d_x}(s + P) - S_{d_x}(s - P) \big) \qquad (3.15)$$

where $S_{d_x}$ is the discrete integration of the squared difference between the image $\nu$ and its translation by $d_x$, obtained for the image as

$$S_{d_x}(p) = \sum_{k=0}^{p} \big(\nu(k) - \nu(k + d_x)\big)^2, \quad p \in \omega \qquad (3.16)$$

where $\omega$ is the discrete regular grid of the image, $\omega = [0, n-1]$, with n the number of pixels in the image.

Figure 3.9: Tone-Mapped Outputs ((a) gamma correction; (b) SMQT; (c) CLAHE)

The restored image is obtained as the convex combination

$$u(s) = \frac{1}{Z(s)} \sum_{t \in R(s)} w(s,t)\, \nu(t) \qquad (3.17)$$

where $Z(s)$ is the normalization constant:

$$Z(s) = \sum_{t \in R(s)} w(s,t) \qquad (3.18)$$

$R(\cdot)$ is the search window, $R = [-K, K]^d$, with d the space dimension.

Fast NLM works as follows: first, all values of $S_{d_x}$ are computed using Eq. 3.16, then the weights are computed using Eq. 3.15, and finally the filtering is performed using Eq. 3.17. This thesis employs an algorithm based on the pseudo code in [17].
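A NumPy sketch of this scheme: for each translation, the squared difference image is box-filtered once, so every patch distance is obtained in constant time per pixel (the spirit of Eqs. 3.15-3.16). For brevity this sketch wraps at the borders via np.roll instead of padding by the patch size, and it drops the Gaussian kernel $G_\sigma$ (box patches), both simplifications relative to the text:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fast_nlm(v, K=5, P=3, h=0.1):
    """Fast NLM: one integrated squared-difference image per translation.

    For each offset d in the search window, the squared difference with
    the shifted image is box-filtered once, giving all patch distances
    for that offset; the weights then accumulate the convex combination
    of Eq. 3.17. v is a float image in [0, 1].
    """
    num = np.zeros_like(v)
    Z = np.zeros_like(v)
    for dy in range(-K, K + 1):
        for dx in range(-K, K + 1):
            shifted = np.roll(v, (dy, dx), axis=(0, 1))        # translation d
            dist = uniform_filter((v - shifted) ** 2, 2 * P + 1)
            w = np.exp(-dist / h ** 2)                         # g_h of Eq. 3.15
            num += w * shifted
            Z += w
    return num / Z   # normalized as in Eqs. 3.17-3.18
```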

Spectral Subtraction: Spectral subtraction is a simple and effective method for noise removal in images [27]. In addition to the captured scenery, every frame of the video includes fixed pattern noise originating in the image sensor, and spectral subtraction is an effective method for eliminating it. A dark frame is obtained by capturing an image in the dark with the image sensor, which gives an image of the noise of the sensor. The magnitude spectrum of the dark frame is subtracted from the magnitude spectrum of the noisy frame to obtain an estimate of the noise free spectrum:

$$|X_e(\omega_1, \omega_2)| = \begin{cases} |Y(\omega_1, \omega_2)|^k - |D(\omega_1, \omega_2)|^k, & \text{if } |D(\omega_1, \omega_2)| < |Y(\omega_1, \omega_2)| \\ 0, & \text{otherwise} \end{cases} \qquad (3.19)$$

where $|X_e(\omega_1, \omega_2)|$ is the magnitude spectrum of the noise free signal estimate, $|D(\omega_1, \omega_2)|$ is the estimated magnitude spectrum of the dark frame, and k is the magnitude scaling factor.

Here, the output of the non-local means is passed through spectral subtraction to remove the fixed pattern noise and obtain the final enhanced frame.
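As a usage sketch, the dark frame can be plugged into the spectral_subtract() sketch from Section 2.6, taking the scaling factor k of Eq. 3.19 as 1 for simplicity (dark_frame and nlm_output are hypothetical arrays):

```python
import numpy as np

# Hypothetical inputs: nlm_output is the NLM-denoised frame, dark_frame a
# frame captured with the sensor covered (the FPN estimate).
noise_mag = np.abs(np.fft.fft2(dark_frame))          # |D| of Eq. 3.19, k = 1
enhanced = spectral_subtract(nlm_output, noise_mag)  # sketch from Section 2.6
```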

4 Implementation of Enhancement Model

In this chapter, the implementation aspects of the enhancement models proposed in this thesis are described. A total of three different low light video enhancement models were proposed in Chapter 3, each consisting of the three parts given in Fig. 4.1. The algorithms used for temporal and spatial noise reduction are the same for all three models; the contrast enhancement algorithm is application specific and differs between them. Each model is designed to extend the dynamic range and to eliminate Gaussian and fixed pattern noise, enhancing the quality of extremely low light video. The algorithms employed in each part of the enhancement model are parameter dependent. The three enhancement algorithms were implemented on a total of ten low light videos at different luminance levels, and the parameter setup in this thesis is based on empirical evaluation over all ten sequences. The choice of specific values in each part is explained in Sections 4.1, 4.2 and 4.3. Most of the parameter values are the same for all ten low light inputs, except the values of α and β in Subsection 4.2.1.

The extremely low light video processed by the H.264 codec is taken as input to each model. The entire implementation of the enhancement models is done in MATLAB R2016a. To evaluate the performance of the proposed method, several low light videos at different luminance levels were considered. All videos used for the evaluation were captured with a measurement device with the following specifications:

System Specifications:

Sensor: CMOS
Camera: 8 megapixel
Focal length: 33 mm
Aperture size: F2.4
Video size: 1920 × 1080
Video compression: H.264
Frames/second: 30

Table 4.1: Camera Specifications

Figure 4.1: Flow chart of the implemented model

Usually, the performance of a CMOS image sensor is better than that of a CCD image sensor in a low light environment. In terms of low light sensitivity, the CMOS sensor outperforms the CCD sensor in the following parameter: quantum efficiency (QE), a measure of how efficiently the sensor converts light (photons) to charge (electrons). The more electrons collected in a pixel during the integration period, the higher the output level of the sensor, and thus the more sensitive the sensor is for that specific wavelength of light. At room temperature and above, the QE of a CMOS sensor is better than that of a CCD sensor at all wavelengths of light [29].

The various noise sources introduced by the measurement device include quantization noise, read out noise, photon shot noise, and thermal or dark current noise [2]. The noise histograms of photon shot noise, read out noise and dark current noise are Gaussian in nature and can be characterized as Gaussian noise. Dark current noise is the same from frame to frame and can be characterized as fixed pattern noise [2].

4.1 Temporal Noise Reduction

4.1.1 Temporal Averaging Filter

Temporal denoising is implemented to increase the precision of motion estimation rather than for video denoising itself. The averaging factor in the temporal averaging filter is set slightly lower than its nominal value to improve the filtering: it is multiplied by δ = 0.93 to decrease the amount of smoothing and improve motion estimation.

4.1.2 Spatial Averaging Filter

An initial estimate for the Kalman filter is obtained by taking the spatial average of the first frame. The averaging factor of the spatial averaging filter is multiplied by a factor of γ = 0.975 to improve the filtering process.

4.1.3 Wiener Filter

Along with spatial averaging, Wiener filtering with a neighborhood of size 3 × 3 is also applied to obtain an initial estimate for the Kalman filter.

4.2 Contrast Enhancement

4.2.1 Histogram Adjustment with Gamma Correction

One of the main contributions of the thesis is the selection of the variable clipping thresholds lct and hct, which are used to improve the performance of the algorithm. They are given as follows:

$$lct = \alpha \times \lambda_{low,c} \qquad (4.1)$$

$$hct = \beta + \lambda_{high,c} \qquad (4.2)$$

where lct is the low clipping threshold and hct is the high clipping threshold. Most of the histogram values below the peak are noisy ones; to make sure that the noisy pixels are eliminated while avoiding loss of useful information due to clipping, $\lambda_{low,c}$ is scaled as proposed, with the value of α in the range of 0.35 to 0.7. The lct decides the amount of noise removal and information retrieval in the tone mapped video. The value of β must be in the range of 0.2 to 0.8; the intensity value below which the majority of the pixels in the video frame are accumulated is selected as the hct, which decides the amount of lighting in the tone mapped video. Too high an hct value results in loss of the true color of the subject.

The gamma value is taken as 0.4; gamma is inversely proportional to the lighting. The obtained LT value is multiplied by the factor α, which adjusts the range to be
