with Objective and Subjective Quality Assessment
Sri Krishna Jayanty
Venkata Gopi Krishna Dalasari
November 2016
Dept. Applied Signal Processing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
Electrical Engineering with emphasis on Signal Processing.
Contact Information:
Author(s):
Sri Krishna Jayanty
E-mail: jayanty.srikrishna@gmail.com
Venkata Gopi Krishna Dalasari
E-mail: dalasarigopi@gmail.com
Industry Supervisor:
Dr. Benny Sällberg
Sällberg Technologies e.U.
Friedrich Schiller-Str. 11
A-4840 Vöcklabruck, Austria
Phone: +43 660 4849 960
E-mail: office@sallberg.at
Co-supervisor:
Dr. Josef Ström Bartunek
Dept. Applied Signal Processing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57
Enhancing low light videos has been quite a challenge over the years. A video taken in low light always suffers from a low dynamic range and high noise. This master thesis presents a contribution within the field of low light video enhancement. Three models with different tone mapping algorithms are proposed for the enhancement of extremely low light, low quality video. For temporal noise removal, a motion compensated Kalman structure is presented. The dynamic range of the low light video is stretched using three different methods. In Model 1, the dynamic range is increased by adjusting the RGB histograms using gamma correction with a modified version of adaptive clipping thresholds. In Model 2, a shape preserving dynamic range stretch of the RGB histogram is applied using SMQT. In Model 3, contrast enhancement is done using CLAHE. In the final stage, the residual noise is removed using an efficient NLM. The performance of the models is compared on various objective VQA metrics such as NIQE, GCF and SSIM.
To evaluate the actual performance of the models, subjective tests are conducted, since a large number of applications target humans as the end users of the video. The performance of the three models is compared for a total of ten real-time input videos taken in extremely low light environments. A total of 25 human observers subjectively evaluated the performance of the three models based on the parameters contrast, visibility, visual pleasantness, amount of noise and overall quality. A detailed statistical evaluation of the relative performance of the three models is also provided.
Keywords: Contrast enhancement, Dynamic range, Kalman filter, Spatial denoising, Noise reduction, Temporal denoising, Tone mapping.
Abstract ii
1 Introduction 1
1.1 Motivation . . . . 1
1.2 Problem Statement . . . . 2
1.3 Research Questions . . . . 2
1.4 Survey of Related works . . . . 3
1.5 Proposed Solution Based on Related Work . . . . 4
1.6 Outline of the thesis . . . . 6
2 Background 7
2.1 Video Compression . . . . 7
2.1.1 H.264/AVC . . . . 7
2.2 Kalman Filter . . . . 8
2.3 Tone Mapping . . . . 8
2.3.1 Gamma . . . . 9
2.3.2 Successive Mean Quantization Transform (SMQT) . . . . . 10
2.3.3 Contrast Limited Adaptive Histogram Equalization (CLAHE) 11
2.4 Non Local Means (NLM) . . . . 12
2.5 Wiener Filter . . . . 13
2.6 Spectral Subtraction . . . . 13
3 Low Light Video Enhancement Model 15
3.1 Low Light Video Characteristics . . . . 15
3.2 Temporal Noise Reduction . . . . 17
3.3 Tone Mapping . . . . 20
3.3.1 Histogram Adjustment with Gamma Correction . . . . 21
3.3.2 Successive Mean Quantization Transform . . . . 23
3.3.3 Contrast Limited Adaptive Histogram Equalization . . . . 23
3.4 Spatial Noise Reduction . . . . 25
4 Implementation of Enhancement Model 31
4.1 Temporal Noise Reduction . . . . 33
4.1.1 Temporal Averaging Filter . . . . 33
4.2 Contrast Enhancement . . . . 33
4.2.1 Histogram Adjustment with Gamma Correction . . . . 33
4.2.2 Successive Mean Quantization Transform . . . . 34
4.2.3 Contrast Limited Adaptive Histogram Equalization . . . . 34
4.3 Spatial Noise Reduction . . . . 34
4.3.1 Fast NLM . . . . 34
4.3.2 Spectral Subtraction . . . . 35
5 Objective and Subjective Quality Assessment 37
5.1 Objective Quality Assessment . . . . 38
5.1.1 No Reference Objective Metrics . . . . 38
5.1.2 Full Reference Objective Metrics . . . . 38
5.2 Subjective Quality Assessment . . . . 39
5.2.1 Outline . . . . 39
5.2.2 Test Environment . . . . 39
5.2.3 Observers . . . . 40
5.2.4 Performance Parameters . . . . 40
5.2.5 Instructions for the assessment . . . . 41
5.2.6 Test Procedure . . . . 42
6 Results and Analysis 43
6.1 Comparing the Performance of the Models . . . . 43
6.1.1 Application 1 . . . . 43
6.1.2 Application 2 . . . . 46
6.1.3 Performance in Presence of Motion . . . . 47
6.2 Analyzing Models with Objective Metrics . . . . 49
6.2.1 No Reference Objective Metrics . . . . 49
6.2.2 Full Reference Objective Metrics . . . . 49
6.3 Subjective Quality Analysis . . . . 50
6.3.1 Mean Opinion Score . . . . 50
6.3.2 Standard Deviation . . . . 53
6.3.3 Correlation of Parameters w.r.t. Overall Quality . . . . 54
6.3.4 Analyzing Parameters w.r.t. Luminance Levels . . . . 59
6.3.5 Output Luminance Levels . . . . 65
7 Conclusion and Future Work 66
References 68
Appendices 72
1.1 Block diagram . . . . 2
1.2 Process Flow Diagram . . . . 5
2.1 Successive Mean Quantization Transform for one operation . . . . 11
3.1 Temporal Noise Reduction Schema . . . . 17
3.2 Before and After Temporal Noise Reduction . . . . 19
3.3 Histogram of Low Light Frame . . . . 20
3.4 Histogram of Normal Light Frame . . . . 22
3.5 Histogram of Temporally Denoised Frame (Video Telephony) . . . 24
3.6 Histogram after Gamma Correction (Video Telephony) . . . . 25
3.7 Histogram after SMQT (Video Telephony) . . . . 25
3.8 Histogram after CLAHE (Video Telephony) . . . . 25
3.9 Tone-Mapped Outputs . . . . 27
3.9 Tone-Mapped Outputs (continued) . . . . 28
4.1 Flow chart of the implemented model . . . . 32
6.1 Video Telephony Application . . . . 44
6.1 Video Telephony Application (Continued) . . . . 45
6.2 Signboard . . . . 46
6.2 Signboard (Continued) . . . . 47
6.3 Motion compensation after kalman filtering . . . . 48
6.4 Xylophone . . . . 51
6.4 Xylophone (Continued) . . . . 52
6.5 MOS w.r.t. various parameters for sequence Video Telephony . . . 53
6.6 Standard Deviation w.r.t. parameters for all 10 test sequences . . 54
6.7 Contrast w.r.t. Overall Quality . . . . 55
6.8 Visibility w.r.t. Overall Quality . . . . 56
6.9 Visually Pleasing w.r.t. Overall Quality . . . . 57
6.10 Amount of Noise w.r.t. Overall Quality . . . . 58
6.11 Contrast w.r.t. Luminance Levels . . . . 59
6.12 Visibility w.r.t. Luminance Levels . . . . 60
6.13 Amount of Noise w.r.t. Luminance Levels . . . . 61
1 Application 4 - Painting . . . . 73
1 Application 4 - Painting (Continued) . . . . 74
2 Application 5 - Indoor Environment . . . . 75
2 Application 5 - Indoor Environment (Continued) . . . . 76
3 Application 6 - Photography . . . . 77
3 Application 6 - Photography (Continued) . . . . 78
4 Application 7 - Outdoor Environment . . . . 79
4 Application 7 - Outdoor Environment (Continued) . . . . 80
5 Application 8 - Pre-processing for Character Recognition . . . . . 81
5 Application 8 - Pre-processing for Character Recognition (Continued) 82
6 Application 9 - Surveillance Camera . . . . 83
6 Application 9 - Surveillance Camera (continued) . . . . 84
7 Application 10 - Forest . . . . 85
7 Application 10 - Forest (Continued) . . . . 86
2.1 Kalman Equations . . . . 9
4.1 Camera Specifications . . . . 31
4.2 Values of α and β for different low light inputs . . . . 34
5.1 System specifications . . . . 39
6.1 Performance comparison based on no reference objective metrics . 49
6.2 Performance comparison of full reference objective metrics . . . . 50
6.3 Correlation of all parameters w.r.t. Overall Quality . . . . 54
6.4 Input and Output Relative Luminance Values . . . . 65
Firstly, we wish to thank our supervisor Dr. Benny Sällberg for his valuable expert advice, strong support and encouragement throughout our thesis work. Furthermore, we would like to thank him for introducing us to the topic and for helping us realize our true potential as researchers. Our sincere thanks to our co-supervisor Dr. Josef Ström Bartunek for his guidance and support in completing our thesis.
We would also like to thank them for helping us overcome obstacles and emerge out successful.
We would like to express our deepest gratitude to the entire Department of Ap- plied Signal Processing for helping us throughout our research. Also, we would like to thank the participants who have willingly shared their valuable time for the subjective test.
A special thank you also goes to our family members for all the support we have received throughout the years.
AGC Adaptive Gamma Correction
APMF Adaptive Piecewise Mapping Function
ASTA Adaptive Spatio Temporal Accumulation
ASTC Adaptive Spatio Temporal Connective
AVC Advanced Video Coding
CCD Charge Coupled Device
CDF Cumulative Distribution Function
CLAHE Contrast Limited Adaptive Histogram Equalization
FPN Fixed Pattern Noise
GCF Global Contrast Factor
HAGC Histogram Adjustment with Gamma Correction
HDR High Dynamic Range
IPTV Internet Protocol Television
LDR Low Dynamic Range
NCV Neighborhood Connective Value
NIQE Natural Image Quality Evaluator
NLM Non Local Means
RL Relative Luminance
SDFT Statistical Domain Temporal Filter
SHF Spatial Hybrid Filter
SMQT Successive Mean Quantization Transform
TM Tone Mapping
VQA Video Quality Assessment
Introduction
1.1 Motivation
Over the past few years there has been substantial growth in digital cameras in terms of sensitivity and resolution. Although the light sensitivity of image sensors has improved, modern digital cameras are still limited in capturing High Dynamic Range (HDR) images in low light conditions.
Like the human eye, digital cameras find it hard to capture extremely low light scenes. Digital cameras perform better at higher ISO levels and slower shutter speeds, as more light is received by the image sensors.
These are the desired settings in low light conditions, but higher ISO levels usually result in higher noise and slower shutter speeds result in motion blur [1].
At even lower lighting levels, the intensity of the noise grows beyond that of the signal, making it hard to reconstruct the image. Post-processing videos taken in low lighting conditions to improve their visual appearance has therefore been an active research area, serving several video processing applications [2].
Video telephony has rapidly emerged as a noteworthy technology, replacing the notion of a conventional phone call. With the mobile market expanding relative to computers and laptops, video telephony will be a main means of communication in the near future. With the advent of powerful video codecs and high speed internet, video telephony has become a practical technology in regular use. H.264 is one of the most widespread codec standards used for video telephony. As low-end video devices such as webcams and cell phones have become ubiquitous for video telephony, there is an ever greater need for reliable video enhancement technologies to improve their output. Video sequences are often corrupted by noise during image acquisition and transmission, especially in low light environments. Low light videos have an extremely low dynamic range, and as a result the quality of videos in low light is limited. Image sequences captured in low light conditions have a very low signal to noise ratio. Therefore, it is desirable to enhance the quality of the low light video. Low light video enhancement is used in several applications such as automated vehicles, video telephony, the security and surveillance industries, satellite videography, traffic management and digital photography [3].
1.2 Problem Statement
The main obstacle with videos taken in an extremely low light environment is the lack of visibility. The obstacles can be categorized into low dynamic range and a high amount of noise. There are various noise sources in a low light video, including quantization noise, readout noise, thermal noise and photon shot noise. Directly stretching the dynamic range of a low light video produces various undesirable artifacts such as noise amplification, intensity saturation and loss of resolution. Hence, a suitable denoising technique has to be applied before stretching the dynamic range, i.e., before tone mapping. Even though a considerable amount of noise is removed before the tone mapping step, the noise amplified during tone mapping has to be removed by a proper denoising technique. To design an effective low light video enhancement technique, temporal denoising is done first, followed by tone mapping for contrast enhancement, and at the end the residual noise is removed by spatial noise reduction [3]. The process is shown in Fig. 1.1.
Figure 1.1: Block diagram
1.3 Research Questions
• What are the characteristics of a low light video?
• How can a video taken in extremely low light be enhanced?
• What are the denoising and contrast enhancement techniques to be used to enhance the low-light video?
• How are the subjective tests conducted in order to analyze the performance of the algorithms?
• How does the performance of the enhanced video vary at different luminance levels?
1.4 Survey of Related works
Various approaches have been developed for enhancing low light videos. Xuan Dong and Dubok Park [4][5][6] presented a dehazing algorithm for low light video enhancement. This method is built on the observation that pixel-wise inversion of a low light video looks quite similar to a hazy video.
However, estimating the medium transmission function of the hazy image acquisition model with a dark channel prior (DCP) becomes unreliable in very low light conditions and requires a large computational load. Qing Xu [7] proposed a three-step method for denoising low light videos. In the first step, a modified version of NLM is performed in which NLM in the spatial and temporal regions is done separately.
The two results are combined using adaptive weights which depend on the amount of motion in the subject of the video. In the second step, tone mapping is done to stretch the dynamic range of the video, and in the final step filtering is done in the YCrCb color space. Although a modified version of 3D NLM is used, the computational cost of the method is very high.
Dong [8] proposed a contrast enhancement algorithm for low light videos. It is based on a piecewise stretch of the brightness component extracted with Retinex theory in HSV space to improve the visibility of the image. The overall computational complexity of the algorithm is very high, and although it is able to increase the brightness to an ideal level, the algorithm fails to retain color information in all cases. Malm [9] presented a modified version of structure adaptive anisotropic filtering using the 3D structure tensor for adaptive spatio-temporal filtering. The 3D structure tensor obtained from the spatio-temporal gradient is used to estimate the kernel widths in the spatial and temporal directions in order to construct the adaptive anisotropic filter. Contrast limited histogram equalization is used for tone mapping. Due to the inaccurate estimation of the anisotropic kernel from the input signal, the method becomes unstable and produces blurry results with low output PSNR.
Seong [10] proposed a low light noise removal algorithm; it mainly consists of a Statistical Domain Temporal Filter (SDFT) for moving areas and a Spatial Hybrid Filter (SHF) for stationary areas. Poisson noise and false color noise are considered the main types of noise in the input video. The algorithm targets only moderately dark videos, hence there is no tone mapping step to stretch the dynamic range.
Eric [11] proposed a framework for video enhancement using per-pixel virtual exposures. The algorithm mainly deals with enhancing Low Dynamic Range (LDR) videos based on a virtual exposure camera model, reducing noise in LDR videos with an Adaptive Spatio Temporal Accumulation (ASTA) filter, and a tone mapping approach to enhance LDR videos.
Chao [12] proposed a video enhancement algorithm; it mainly consists of a local image statistic named the Neighborhood Connective Value (NCV) for identifying impulse pixels, an Adaptive Spatio Temporal Connective (ASTC) noise filter for reducing mixed noise, and an Adaptive Piecewise Mapping Function (APMF) to enhance video contrast. ASTC uses optical flow for motion estimation and APMF is applied on segmentation results. Methods such as optical flow and segmentation do not provide reliable results for low light noisy videos.
1.5 Proposed Solution Based on Related Work
Most of the work mentioned so far deals only with moderately dark videos in RAW format. In our research an attempt is made to enhance videos processed by the H.264 codec with a relative luminance lower than 0.1. In extremely low light videos the noise is very high, and reconstructing and enhancing the useful information is quite a challenge. The proposed method, which improves on the previous work in [13][2], is a novel method for the enhancement of low quality and extremely low light video. Prior to denoising the low light video, the various sources of noise are categorized into Gaussian noise and FPN, based on which the denoising algorithms are designed. Initially, a low light input video processed by the H.264 codec is passed through the steps given in Fig. 1.2 to produce an enhanced video. Instead of implementing the computationally complex modified Kalman filter of [2], a simplified version is implemented in our proposed method.
The temporal denoising step for eliminating static noise and avoiding motion blur is implemented according to [14]. In the first stage, temporal averaging is done to decrease the level of noise and to support motion estimation.
In the second stage, the remaining noise is eliminated by motion compensated spatio-temporal Kalman filtering, and impulse noise is removed using a median filter. After removing a considerable amount of noise, the dynamic range of the frames is increased by one of the three tone mapping algorithms stated in the block diagram:
• Algorithm 1: A modified version of the histogram adjustment with gamma correction in [2] is proposed, with a detailed explanation of how the lower and higher clipping thresholds are selected.
• Algorithm 2: Mikael Nilsson [15] proposed SMQT for the enhancement of grayscale images. In this algorithm SMQT is applied to stretch the dynamic range of the input low light video frame.
• Algorithm 3: This algorithm uses the most common contrast enhancement technique, i.e., CLAHE [16].
After applying one of the above algorithms to improve the dynamic range, the noise amplified by tone mapping is eliminated using the non-local means algorithm. To increase the efficiency of the code, a modified version of NLM [17] is implemented. In the last processing step, the remaining noise, i.e., FPN, is removed using spectral subtraction. The performance of the various tone mapping algorithms is compared using various subjective and objective metrics.
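The three-stage structure described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the stand-in stages (recursive temporal averaging, a plain gamma stretch, a box blur) replace the motion compensated Kalman filter, the three tone mapping algorithms and the NLM/spectral-subtraction steps, and all function names are ours.

```python
import numpy as np

def temporal_denoise(frame, state, alpha=0.8):
    # Stand-in for stage 1: recursive temporal averaging
    # (the thesis uses motion compensated Kalman filtering).
    if state is None:
        state = frame.astype(float)
    state = alpha * state + (1 - alpha) * frame
    return state, state

def gamma_tone_map(frame, gamma=0.5):
    # Stand-in for stage 2: simple gamma stretch on [0, 1] data.
    return np.clip(frame, 0.0, 1.0) ** gamma

def spatial_denoise(frame):
    # Stand-in for stage 3: a 3x3 box blur instead of NLM and
    # spectral subtraction.
    h, w = frame.shape
    padded = np.pad(frame, 1, mode="edge")
    return sum(padded[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0

def enhance_low_light_video(frames):
    enhanced, state = [], None
    for frame in frames:
        denoised, state = temporal_denoise(frame, state)  # stage 1
        stretched = gamma_tone_map(denoised)              # stage 2
        enhanced.append(spatial_denoise(stretched))       # stage 3
    return enhanced
```

Each stand-in can be swapped for the corresponding algorithm from Chapters 3 and 4 without changing the overall flow.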
Figure 1.2: Process Flow Diagram
1.6 Outline of the thesis
The report has been structured as follows:
Chapter 2 provides the conceptual background on the various image and video processing algorithms used.
Chapter 3 gives a detailed description of the algorithms proposed for low light enhancement.
Chapter 4 provides the values chosen for the dynamic parameters of each algorithm for a specific video sequence.
Chapter 5 discusses the procedure in which the subjective assessment has been conducted.
Chapter 6 shows the results and a detailed analysis.
Chapter 7 concludes the work and provides a path for continuing the work.
Background
In this chapter a detailed explanation of all the methods used in the thesis is given, together with the background knowledge required to better comprehend each method in the context of the thesis.
2.1 Video Compression
Compression reduces resource usage, such as data storage space or transmission capacity. Video compression is a technique for reducing file size for efficient storage and transmission of data over a network by removing redundant video data. The process of video compression involves applying an algorithm to obtain a compressed file which can be stored or transmitted. An inverse algorithm is used to play back a video with the same content as the original source video [18].
Compression can be either lossless or lossy. A lossless compression technique guarantees full reconstruction of the original data without incurring any distortion in the process. Lossy compression is the class of data encoding methods that reduces bits by identifying and removing unnecessary information. For various day-to-day applications lossy compression is used, as its resource usage is very low compared to lossless compression [18]. Various compression standards offer different methods for reducing the data, which also creates variation in the quality, bitrate and latency of the video. The state of the art in video compression is nowadays represented by the H.264/AVC standard video codec, which has been deployed in several application domains from wireless video streaming to IPTV and Blu-ray Disc (BD) [19].
2.1.1 H.264/AVC
H.264 is a compression format identical to MPEG-4 Part 10, Advanced Video Coding. H.264 is a block-oriented motion-compensation standard commonly used for recording, compressing and transmitting video content. It delivers an average bit rate reduction of 50% compared with other video standards without compromising video quality. H.264 has the flexibility to scale the latency depending on the application requirements, and it has 11 levels to vary the performance, bandwidth and memory requirements.
An H.264 encoder performs block based motion compensation, searching for matching blocks in several reference frames for compression. Intra-coded macroblocks are used when matching blocks are not found. Motion compensation is the most demanding aspect of a video encoder, and the different ways and degrees in which it can be implemented by an H.264 encoder have an impact on the efficiency of the compressed video [20]. An in-loop de-blocking filter in the H.264 standard helps reduce the blocking artifacts caused by compression. This filter smoothes block edges with adaptive strength to deliver a cleaner decompressed video.
2.2 Kalman Filter
The Kalman filter is an efficient way of estimating the state of a process so as to minimize the mean squared error [14]. The purpose of each Kalman filter iteration is to update the estimate of the state vector of a system, and the covariance of that vector, based on the information in a new observation. The Kalman filter assumes that observations occur at fixed discrete time intervals and addresses the general problem of estimating the state of a discrete-time controlled process. It estimates the process using a feedback loop in which the filter predicts the process state and obtains feedback in the form of noisy measurements [21]. Its equations generally fall into prediction and update equations. In the prediction equations, the a priori state estimate and the state covariance for the current time step are obtained, accounting for the motion estimation error. The prediction equations are presented in Table 2.1.
In Table 2.1, X_pred_t and P_pred_t are the a priori state estimate and the state covariance for the current time step, and Q_t reflects the amount of motion estimation error.
The update equations are responsible for incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate. They are also presented in Table 2.1.
In Table 2.1, K_t is the Kalman gain, which stabilizes quickly and remains constant, and C_t is the noise covariance of the current input frame Y_t. X_est_t and P_est_t are the denoised image frame (a posteriori state estimate) and the a posteriori state covariance for the current time step.
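A minimal sketch of one prediction-update iteration, applied per pixel for temporal denoising, might look as follows. It assumes scalar (per-pixel) states, so the matrix inverse reduces to a division; the variable names mirror Table 2.1, with y_t standing for the current noisy frame (our notation, since the measurement symbol is not spelled out in the table).

```python
def kalman_temporal_step(y_t, x_est_prev, p_est_prev, q_t, c_t):
    """One per-pixel Kalman iteration for temporal denoising.
    y_t: current noisy frame, q_t: motion/process noise variance,
    c_t: measurement noise variance (scalars or NumPy arrays)."""
    # Prediction: carry the previous estimate forward and grow
    # its uncertainty by the motion estimation error.
    x_pred = x_est_prev
    p_pred = p_est_prev + q_t
    # Update: blend the prediction with the new observation.
    k = p_pred / (p_pred + c_t)          # Kalman gain
    x_est = x_pred + k * (y_t - x_pred)  # denoised frame
    p_est = (1 - k) * p_pred
    return x_est, p_est
```

For a static scene the gain shrinks over iterations, so the estimate converges toward the true pixel value while averaging out the measurement noise.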
Table 2.1: Kalman Equations

Kalman Prediction Equations
X_pred_t = X_est_{t-1}    (2.1)
P_pred_t = P_est_{t-1} + Q_t    (2.2)

Kalman Update Equations
K_t = P_pred_t (P_pred_t + C_t)^{-1}    (2.3)
X_est_t = X_pred_t + K_t (Y_t - X_pred_t)    (2.4)
P_est_t = (I - K_t) P_pred_t    (2.5)

2.3 Tone Mapping
Tone mapping is a technique used in image processing and computer graphics to map one set of colors to another in order to approximate the appearance of high-dynamic-range images in a medium that has a more limited dynamic range. In this section the background of three different tone mapping algorithms, i.e., gamma correction [22], SMQT [23] and CLAHE [16], is given.
2.3.1 Gamma
Gamma defines the relation between a pixel's numeric value and its actual luminance. It is a non-linear operation used to encode and decode light and dark values in video images. Gamma correction is usually given in the form of a power function:

intensity = signal^γ,  1.8 < γ < 2.8    (2.6)
Human eyes do not perceive light the way cameras do. In a camera there is a linear relationship between the number of photons that hit the sensor and the recorded brightness, whereas human eyes have a non-linear relationship between the amount of light received and the perceived brightness. Compared to a camera, human eyes are much more sensitive to changes in dark tones than to similar changes in bright tones. Therefore, whenever a digital image is saved, it is gamma encoded so that the intensity values in the digital image match the brightness perception of the human eye [22].
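A gamma-encoding step can be sketched as below. Note the exponent: Eq. 2.6 expresses the decoding direction (intensity = signal^γ), so encoding, which brightens dark tones as described above, uses the reciprocal exponent 1/γ. The function name is ours.

```python
import numpy as np

def gamma_encode(frame, gamma=2.2):
    """Gamma-encode a linear-intensity frame with values in [0, 1].
    Decoding would raise the result back to the power gamma."""
    return np.clip(frame, 0.0, 1.0) ** (1.0 / gamma)
```

For example, a dark linear value of 0.1 maps to roughly 0.35 after encoding with γ = 2.2, which is why gamma correction is attractive for stretching low light frames.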
2.3.2 Successive Mean Quantization Transform (SMQT)
SMQT is a non-linear transformation that reveals the structure of the data. It preserves the shape of the histogram by an operation similar to a non-linear histogram stretch. SMQT can be viewed as a binary tree built of simple Mean Quantization Units (MQUs), where each level performs an automated breakdown of information [23]. It adjusts the dynamic range adaptively and non-linearly and is designed with only one parameter, L, which sets the number of levels in the binary tree.
The data I can be converted into a vector or any arbitrary form. Let x denote a data point and D(x) the value of the data at point x. The MQU is calculated for D at each level. The MQU consists of three steps: a mean calculation, a quantization, and a split of the input set [23].
In the first step of the MQU, the mean of the data, denoted D̄, is calculated according to

D̄ = (1/|D|) Σ_{x∈D} D(x)    (2.7)

The second step uses the mean to quantize the values of the data points into {0, 1}. The comparison function is defined as

ξ(D(x), D̄) = 1 if D(x) > D̄, and 0 otherwise    (2.8)

Let ⊗ denote concatenation; then

U(x) = ⊗_{x∈D} ξ(D(x), D̄)    (2.9)

is the mean quantized set. The set U(x) is the main output of an MQU.
The third step splits the input into two subsets

D_0 = {x | D(x) ≤ D̄, x ∈ D}    (2.10)
D_1 = {x | D(x) > D̄, x ∈ D}    (2.11)

where D_0 propagates left and D_1 propagates right in the binary tree, as shown in Fig. 2.1. The output U(x) is interpreted as the structure of D(x); hence the MQU is independent of gain and bias adjustments of the input. The MQU is the main computing unit of the SMQT. The transform at the first level is SMQT_1 and its output is U_{1,1}. The same procedure continues down the levels, with the notation extended accordingly. The final SMQT_L result is found by weighting and adding the outputs of all MQUs:

M(x) = Σ_{l=1}^{L} Σ_{n=1}^{2^{l-1}} U_{l,n}(x) × 2^{L-l}, ∀x ∈ M    (2.12)
Figure 2.1: Successive Mean Quantization Transform for one operation
where the output set from one MQU in the tree is denoted U_{l,n}, with l = 1, 2, ..., L the current level and n = 1, 2, ..., 2^{l-1} the output number of the MQU at level l. The MQU is the basic building block of SMQT and is insensitive to gain and bias.
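The three MQU steps and the level weighting can be sketched for a 1-D set of data points as follows. This is an illustration under our reading of the recursion (each quantized bit at level l contributes with weight 2^{L-l}), not the thesis code.

```python
import numpy as np

def smqt(values, level):
    """Successive Mean Quantization Transform of a 1-D NumPy array.
    With level = 8 the output spans [0, 255] for an 8-bit image."""
    out = np.zeros(len(values), dtype=int)

    def mqu(indices, l):
        # Recursive MQU: stop below the last level or on empty sets.
        if l > level or len(indices) == 0:
            return
        mean = values[indices].mean()                # step 1: mean
        bits = (values[indices] > mean).astype(int)  # step 2: quantize
        out[indices] += bits * 2 ** (level - l)      # weight U_{l,n}
        mqu(indices[bits == 0], l + 1)               # D0 propagates left
        mqu(indices[bits == 1], l + 1)               # D1 propagates right

    mqu(np.arange(len(values)), 1)
    return out
```

Because every comparison is against a local mean, scaling or offsetting the input (gain and bias) leaves the output unchanged, which is the insensitivity property noted above.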
2.3.3 Contrast Limited Adaptive Histogram Equalization (CLAHE)
Adaptive histogram equalization (AHE) is a technique to improve contrast by partitioning the image into regions and applying histogram equalization to each region [16]. It operates on small data regions called tiles rather than on the entire image as in ordinary histogram equalization. The contrast of each tile is enhanced so that the output histogram matches a specified histogram. The neighboring tiles are then combined by bilinear interpolation to remove the artificially induced boundaries. The contrast can be constrained in homogeneous areas to avoid amplifying the noise present in the image.
CLAHE differs from AHE in its contrast limiting. The contrast amplification in the vicinity of a given pixel value is given by the slope of the transformation function. This transformation function converts the density function approximately into a uniform distribution, and it is proportional to the slope of the neighborhood cumulative distribution function (CDF) and therefore to the value of the histogram at that pixel value. CLAHE limits the amplification by clipping the histogram at a predefined value before computing the CDF, which limits the slope of the CDF and hence of the transformation function. The value at which the histogram is clipped, the so-called clip limit, depends on the normalization of the histogram and thereby on the size of the neighborhood region. The clip limit bounds the contrast enhancement; increasing it results in more contrast.
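The clipping-and-redistribution idea can be sketched for a single tile as follows. Full CLAHE would additionally blend the mappings of neighboring tiles bilinearly; here the clip limit is taken as an absolute bin count, which is a simplification of the normalized limit described above.

```python
import numpy as np

def clipped_equalize(tile, clip_limit, nbins=256):
    """Contrast-limited histogram equalization of a single tile
    (the per-tile core of CLAHE, without inter-tile blending).
    tile holds integers in [0, nbins - 1]."""
    hist, _ = np.histogram(tile, bins=nbins, range=(0, nbins))
    # Clip the histogram and redistribute the excess uniformly;
    # this bounds the slope of the CDF and of the mapping.
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess / nbins
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    mapping = np.round(cdf * (nbins - 1)).astype(int)
    return mapping[tile]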
2.4 Non Local Means (NLM)
Non-local means is a spatial denoising algorithm in image processing. Non-local means filtering takes a mean of the pixels in a search region, weighted by how similar each pixel's neighborhood is to that of the target pixel. This improves clarity with less loss of detail compared with local mean algorithms [24]. Each pixel's local vicinity is considered a patch and is compared with the other patches within a certain search region; both the patch size and the search region are adjustable. NLM avoids faulty inter-color similarity computations by only considering neighboring patches with the same pattern as the reference pattern [2]. The weighting function depends on the similarity between the patches, and the weights generally lie between 0 and 1 [25].
The sum of all the weights is 1. Given a discrete noisy image v = {v(i) | i ∈ I}, the NLM estimate for a pixel i is computed as the weighted average of all the pixels in the vicinity of the patch.
The equation is given as

NLM[v](i) = Σ_{j∈I} w(i, j) v(j)    (2.13)

where {w(i, j)}_j is the family of weights that depend on the similarity between the pixels i and j, satisfying the conditions 0 ≤ w(i, j) ≤ 1 and Σ_{j∈I} w(i, j) = 1.
The similarity between the pixels i and j depends on the similarity between the patches v(R_i) and v(R_j), where R_k denotes a square neighborhood of fixed size centered at pixel k. The similarity is calculated as a decreasing function of the weighted Euclidean distance ||v(R_i) − v(R_j)||²_{2,a}, where a > 0 is the standard deviation of the Gaussian kernel.
The Gaussian weighting function is given by

w(i, j) = (1/Z(i)) exp(−||v(R_i) − v(R_j)||²_{2,a} / h²)    (2.14)

where Z(i) is the normalizing constant, given by

Z(i) = Σ_j exp(−||v(R_i) − v(R_j)||²_{2,a} / h²)
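A naive per-pixel sketch of Eqs. 2.13 and 2.14 is given below. For brevity it uses a uniform patch kernel in place of the Gaussian-weighted distance, and the parameter names are ours; practical implementations (such as the fast NLM used in the thesis) restructure this computation heavily.

```python
import numpy as np

def nlm_pixel(v, i, j, patch=1, search=3, h=0.1):
    """NLM estimate for pixel (i, j) of a 2-D float image v:
    a weighted average over a (2*search+1)^2 window, with weights
    decreasing in the mean squared patch distance."""
    pad = patch + search
    vp = np.pad(v, pad, mode="reflect")
    ci, cj = i + pad, j + pad
    ref = vp[ci - patch:ci + patch + 1, cj - patch:cj + patch + 1]
    num = den = 0.0
    for di in range(-search, search + 1):
        for dj in range(-search, search + 1):
            cand = vp[ci + di - patch:ci + di + patch + 1,
                      cj + dj - patch:cj + dj + patch + 1]
            d2 = np.mean((ref - cand) ** 2)  # patch distance
            w = np.exp(-d2 / h ** 2)         # weight in (0, 1]
            num += w * vp[ci + di, cj + dj]
            den += w
    return num / den  # dividing by den normalizes the weights to sum 1
```

The filtering parameter h plays the same role as in Eq. 2.14: larger h admits less similar patches into the average, denoising more aggressively at the cost of detail.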